All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v2 -tip 0/7] tools/perf: Update rbtree implementation and optimize users
@ 2018-12-06 19:18 Davidlohr Bueso
  2018-12-06 19:18 ` [PATCH 1/7] tools/perf: Update rbtree implementation Davidlohr Bueso
                   ` (6 more replies)
  0 siblings, 7 replies; 18+ messages in thread
From: Davidlohr Bueso @ 2018-12-06 19:18 UTC (permalink / raw)
  To: acme; +Cc: mingo, linux-kernel, dave

Hi,

Per acme's request, this is a rebase (and basically rewrite) of v1.

The following optimizes the rb_first() lookups in perf tooling such that we
can avoid walking down the tree finding the first element. Tree traversals
(and overall computing the first node in the tree) is a surprisingly common
operation. On a Xeon E5-2450 @ 2.10GHz, the cost of an rb_first() was ~60 cycles
for 100 nodes, and ~75 cycles with 1000 nodes.

The first patch adds the updated implementation of rbtrees, including the
cached interfaces, taken from the kernel. The rest of the patches make use
of this for the users I thought might care the most.

With these patches I am able to build and use perf without anything going wrong
(but needs more testing no doubt, as changes while redundant can be a little tricky
depending on if the user get smart about the trees). I'm sorry if some patches
seem too big, I've tried to split them the best I could.

Applies on today's -tip tree. Please consider for v4.21.

Thanks!

Davidlohr Bueso (7):
  tools/perf: Update rbtree implementation
  perf machine: Use cached rbtrees
  perf callchain: Use cached rbtrees
  perf util: Use cached rbtree for rblists
  perf symbols: Use cached rbtrees
  perf hist: Use cached rbtrees
  perf sched: Use cached rbtrees

 tools/include/linux/rbtree.h           |  52 ++++++++-
 tools/include/linux/rbtree_augmented.h |  60 ++++++++--
 tools/lib/rbtree.c                     | 178 ++++++++++++++++++++++-------
 tools/perf/builtin-annotate.c          |   4 +-
 tools/perf/builtin-c2c.c               |   6 +-
 tools/perf/builtin-diff.c              |  10 +-
 tools/perf/builtin-report.c            |   3 +-
 tools/perf/builtin-sched.c             |  45 ++++----
 tools/perf/builtin-top.c               |   2 +-
 tools/perf/tests/hists_common.c        |   8 +-
 tools/perf/tests/hists_cumulate.c      |  14 +--
 tools/perf/tests/hists_link.c          |   8 +-
 tools/perf/tests/hists_output.c        |  32 +++---
 tools/perf/ui/browsers/hists.c         |   6 +-
 tools/perf/ui/gtk/hists.c              |   4 +-
 tools/perf/ui/stdio/hist.c             |   3 +-
 tools/perf/util/build-id.c             |  12 +-
 tools/perf/util/dso.c                  |   8 +-
 tools/perf/util/dso.h                  |  10 +-
 tools/perf/util/hist.c                 | 199 +++++++++++++++++++--------------
 tools/perf/util/hist.h                 |  10 +-
 tools/perf/util/intlist.h              |   2 +-
 tools/perf/util/machine.c              |  53 +++++----
 tools/perf/util/machine.h              |  12 +-
 tools/perf/util/map.c                  |   8 +-
 tools/perf/util/metricgroup.c          |   2 +-
 tools/perf/util/probe-event.c          |   3 +-
 tools/perf/util/rb_resort.h            |   8 +-
 tools/perf/util/rblist.c               |  28 +++--
 tools/perf/util/rblist.h               |   2 +-
 tools/perf/util/sort.h                 |   4 +-
 tools/perf/util/srcline.c              |  43 ++++---
 tools/perf/util/srcline.h              |  13 ++-
 tools/perf/util/stat-shadow.c          |   2 +-
 tools/perf/util/strlist.h              |   2 +-
 tools/perf/util/symbol.c               |  87 +++++++-------
 tools/perf/util/symbol.h               |  13 ++-
 tools/perf/util/symbol_fprintf.c       |   2 +-
 38 files changed, 598 insertions(+), 360 deletions(-)

-- 
2.16.4


^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH 1/7] tools/perf: Update rbtree implementation
  2018-12-06 19:18 [PATCH v2 -tip 0/7] tools/perf: Update rbtree implementation and optimize users Davidlohr Bueso
@ 2018-12-06 19:18 ` Davidlohr Bueso
  2019-01-26 10:01   ` [tip:perf/core] tools: " tip-bot for Davidlohr Bueso
  2018-12-06 19:18 ` [PATCH 2/7] perf machine: Use cached rbtrees Davidlohr Bueso
                   ` (5 subsequent siblings)
  6 siblings, 1 reply; 18+ messages in thread
From: Davidlohr Bueso @ 2018-12-06 19:18 UTC (permalink / raw)
  To: acme; +Cc: mingo, linux-kernel, dave, Davidlohr Bueso

There have been a number of changes in the kernel's rbrtee
implementation, including loose lockless searching guarantees
and rb_root_cached, which later patches will use as an
optimization.

Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
---
 tools/include/linux/rbtree.h           |  52 ++++++++--
 tools/include/linux/rbtree_augmented.h |  60 ++++++++---
 tools/lib/rbtree.c                     | 178 +++++++++++++++++++++++++--------
 3 files changed, 229 insertions(+), 61 deletions(-)

diff --git a/tools/include/linux/rbtree.h b/tools/include/linux/rbtree.h
index 112582253dd0..8e9ed4786269 100644
--- a/tools/include/linux/rbtree.h
+++ b/tools/include/linux/rbtree.h
@@ -43,13 +43,28 @@ struct rb_root {
 	struct rb_node *rb_node;
 };
 
+/*
+ * Leftmost-cached rbtrees.
+ *
+ * We do not cache the rightmost node based on footprint
+ * size vs number of potential users that could benefit
+ * from O(1) rb_last(). Just not worth it, users that want
+ * this feature can always implement the logic explicitly.
+ * Furthermore, users that want to cache both pointers may
+ * find it a bit asymmetric, but that's ok.
+ */
+struct rb_root_cached {
+	struct rb_root rb_root;
+	struct rb_node *rb_leftmost;
+};
 
 #define rb_parent(r)   ((struct rb_node *)((r)->__rb_parent_color & ~3))
 
 #define RB_ROOT	(struct rb_root) { NULL, }
+#define RB_ROOT_CACHED (struct rb_root_cached) { {NULL, }, NULL }
 #define	rb_entry(ptr, type, member) container_of(ptr, type, member)
 
-#define RB_EMPTY_ROOT(root)  ((root)->rb_node == NULL)
+#define RB_EMPTY_ROOT(root)  (READ_ONCE((root)->rb_node) == NULL)
 
 /* 'empty' nodes are nodes that are known not to be inserted in an rbtree */
 #define RB_EMPTY_NODE(node)  \
@@ -68,6 +83,12 @@ extern struct rb_node *rb_prev(const struct rb_node *);
 extern struct rb_node *rb_first(const struct rb_root *);
 extern struct rb_node *rb_last(const struct rb_root *);
 
+extern void rb_insert_color_cached(struct rb_node *,
+				   struct rb_root_cached *, bool);
+extern void rb_erase_cached(struct rb_node *node, struct rb_root_cached *);
+/* Same as rb_first(), but O(1) */
+#define rb_first_cached(root) (root)->rb_leftmost
+
 /* Postorder iteration - always visit the parent after its children */
 extern struct rb_node *rb_first_postorder(const struct rb_root *);
 extern struct rb_node *rb_next_postorder(const struct rb_node *);
@@ -75,6 +96,8 @@ extern struct rb_node *rb_next_postorder(const struct rb_node *);
 /* Fast replacement of a single node without remove/rebalance/add/rebalance */
 extern void rb_replace_node(struct rb_node *victim, struct rb_node *new,
 			    struct rb_root *root);
+extern void rb_replace_node_cached(struct rb_node *victim, struct rb_node *new,
+				   struct rb_root_cached *root);
 
 static inline void rb_link_node(struct rb_node *node, struct rb_node *parent,
 				struct rb_node **rb_link)
@@ -90,12 +113,29 @@ static inline void rb_link_node(struct rb_node *node, struct rb_node *parent,
 	   ____ptr ? rb_entry(____ptr, type, member) : NULL; \
 	})
 
-
-/*
- * Handy for checking that we are not deleting an entry that is
- * already in a list, found in block/{blk-throttle,cfq-iosched}.c,
- * probably should be moved to lib/rbtree.c...
+/**
+ * rbtree_postorder_for_each_entry_safe - iterate in post-order over rb_root of
+ * given type allowing the backing memory of @pos to be invalidated
+ *
+ * @pos:	the 'type *' to use as a loop cursor.
+ * @n:		another 'type *' to use as temporary storage
+ * @root:	'rb_root *' of the rbtree.
+ * @field:	the name of the rb_node field within 'type'.
+ *
+ * rbtree_postorder_for_each_entry_safe() provides a similar guarantee as
+ * list_for_each_entry_safe() and allows the iteration to continue independent
+ * of changes to @pos by the body of the loop.
+ *
+ * Note, however, that it cannot handle other modifications that re-order the
+ * rbtree it is iterating over. This includes calling rb_erase() on @pos, as
+ * rb_erase() may rebalance the tree, causing us to miss some nodes.
  */
+#define rbtree_postorder_for_each_entry_safe(pos, n, root, field) \
+	for (pos = rb_entry_safe(rb_first_postorder(root), typeof(*pos), field); \
+	     pos && ({ n = rb_entry_safe(rb_next_postorder(&pos->field), \
+			typeof(*pos), field); 1; }); \
+	     pos = n)
+
 static inline void rb_erase_init(struct rb_node *n, struct rb_root *root)
 {
 	rb_erase(n, root);
diff --git a/tools/include/linux/rbtree_augmented.h b/tools/include/linux/rbtree_augmented.h
index 43be941db695..d008e1404580 100644
--- a/tools/include/linux/rbtree_augmented.h
+++ b/tools/include/linux/rbtree_augmented.h
@@ -44,7 +44,9 @@ struct rb_augment_callbacks {
 	void (*rotate)(struct rb_node *old, struct rb_node *new);
 };
 
-extern void __rb_insert_augmented(struct rb_node *node, struct rb_root *root,
+extern void __rb_insert_augmented(struct rb_node *node,
+				  struct rb_root *root,
+				  bool newleft, struct rb_node **leftmost,
 	void (*augment_rotate)(struct rb_node *old, struct rb_node *new));
 /*
  * Fixup the rbtree and update the augmented information when rebalancing.
@@ -60,7 +62,16 @@ static inline void
 rb_insert_augmented(struct rb_node *node, struct rb_root *root,
 		    const struct rb_augment_callbacks *augment)
 {
-	__rb_insert_augmented(node, root, augment->rotate);
+	__rb_insert_augmented(node, root, false, NULL, augment->rotate);
+}
+
+static inline void
+rb_insert_augmented_cached(struct rb_node *node,
+			   struct rb_root_cached *root, bool newleft,
+			   const struct rb_augment_callbacks *augment)
+{
+	__rb_insert_augmented(node, &root->rb_root,
+			      newleft, &root->rb_leftmost, augment->rotate);
 }
 
 #define RB_DECLARE_CALLBACKS(rbstatic, rbname, rbstruct, rbfield,	\
@@ -93,7 +104,9 @@ rbname ## _rotate(struct rb_node *rb_old, struct rb_node *rb_new)	\
 	old->rbaugmented = rbcompute(old);				\
 }									\
 rbstatic const struct rb_augment_callbacks rbname = {			\
-	rbname ## _propagate, rbname ## _copy, rbname ## _rotate	\
+	.propagate = rbname ## _propagate,				\
+	.copy = rbname ## _copy,					\
+	.rotate = rbname ## _rotate					\
 };
 
 
@@ -126,11 +139,11 @@ __rb_change_child(struct rb_node *old, struct rb_node *new,
 {
 	if (parent) {
 		if (parent->rb_left == old)
-			parent->rb_left = new;
+			WRITE_ONCE(parent->rb_left, new);
 		else
-			parent->rb_right = new;
+			WRITE_ONCE(parent->rb_right, new);
 	} else
-		root->rb_node = new;
+		WRITE_ONCE(root->rb_node, new);
 }
 
 extern void __rb_erase_color(struct rb_node *parent, struct rb_root *root,
@@ -138,12 +151,17 @@ extern void __rb_erase_color(struct rb_node *parent, struct rb_root *root,
 
 static __always_inline struct rb_node *
 __rb_erase_augmented(struct rb_node *node, struct rb_root *root,
+		     struct rb_node **leftmost,
 		     const struct rb_augment_callbacks *augment)
 {
-	struct rb_node *child = node->rb_right, *tmp = node->rb_left;
+	struct rb_node *child = node->rb_right;
+	struct rb_node *tmp = node->rb_left;
 	struct rb_node *parent, *rebalance;
 	unsigned long pc;
 
+	if (leftmost && node == *leftmost)
+		*leftmost = rb_next(node);
+
 	if (!tmp) {
 		/*
 		 * Case 1: node to erase has no more than 1 child (easy!)
@@ -170,6 +188,7 @@ __rb_erase_augmented(struct rb_node *node, struct rb_root *root,
 		tmp = parent;
 	} else {
 		struct rb_node *successor = child, *child2;
+
 		tmp = child->rb_left;
 		if (!tmp) {
 			/*
@@ -183,6 +202,7 @@ __rb_erase_augmented(struct rb_node *node, struct rb_root *root,
 			 */
 			parent = successor;
 			child2 = successor->rb_right;
+
 			augment->copy(node, successor);
 		} else {
 			/*
@@ -204,19 +224,23 @@ __rb_erase_augmented(struct rb_node *node, struct rb_root *root,
 				successor = tmp;
 				tmp = tmp->rb_left;
 			} while (tmp);
-			parent->rb_left = child2 = successor->rb_right;
-			successor->rb_right = child;
+			child2 = successor->rb_right;
+			WRITE_ONCE(parent->rb_left, child2);
+			WRITE_ONCE(successor->rb_right, child);
 			rb_set_parent(child, successor);
+
 			augment->copy(node, successor);
 			augment->propagate(parent, successor);
 		}
 
-		successor->rb_left = tmp = node->rb_left;
+		tmp = node->rb_left;
+		WRITE_ONCE(successor->rb_left, tmp);
 		rb_set_parent(tmp, successor);
 
 		pc = node->__rb_parent_color;
 		tmp = __rb_parent(pc);
 		__rb_change_child(node, successor, tmp, root);
+
 		if (child2) {
 			successor->__rb_parent_color = pc;
 			rb_set_parent_color(child2, parent, RB_BLACK);
@@ -237,9 +261,21 @@ static __always_inline void
 rb_erase_augmented(struct rb_node *node, struct rb_root *root,
 		   const struct rb_augment_callbacks *augment)
 {
-	struct rb_node *rebalance = __rb_erase_augmented(node, root, augment);
+	struct rb_node *rebalance = __rb_erase_augmented(node, root,
+							 NULL, augment);
 	if (rebalance)
 		__rb_erase_color(rebalance, root, augment->rotate);
 }
 
-#endif	/* _TOOLS_LINUX_RBTREE_AUGMENTED_H */
+static __always_inline void
+rb_erase_augmented_cached(struct rb_node *node, struct rb_root_cached *root,
+			  const struct rb_augment_callbacks *augment)
+{
+	struct rb_node *rebalance = __rb_erase_augmented(node, &root->rb_root,
+							 &root->rb_leftmost,
+							 augment);
+	if (rebalance)
+		__rb_erase_color(rebalance, &root->rb_root, augment->rotate);
+}
+
+#endif /* _TOOLS_LINUX_RBTREE_AUGMENTED_H */
diff --git a/tools/lib/rbtree.c b/tools/lib/rbtree.c
index 17c2b596f043..904adb70a4f0 100644
--- a/tools/lib/rbtree.c
+++ b/tools/lib/rbtree.c
@@ -22,6 +22,7 @@
 */
 
 #include <linux/rbtree_augmented.h>
+#include <linux/export.h>
 
 /*
  * red-black trees properties:  http://en.wikipedia.org/wiki/Rbtree
@@ -43,6 +44,30 @@
  *  parentheses and have some accompanying text comment.
  */
 
+/*
+ * Notes on lockless lookups:
+ *
+ * All stores to the tree structure (rb_left and rb_right) must be done using
+ * WRITE_ONCE(). And we must not inadvertently cause (temporary) loops in the
+ * tree structure as seen in program order.
+ *
+ * These two requirements will allow lockless iteration of the tree -- not
+ * correct iteration mind you, tree rotations are not atomic so a lookup might
+ * miss entire subtrees.
+ *
+ * But they do guarantee that any such traversal will only see valid elements
+ * and that it will indeed complete -- does not get stuck in a loop.
+ *
+ * It also guarantees that if the lookup returns an element it is the 'correct'
+ * one. But not returning an element does _NOT_ mean it's not present.
+ *
+ * NOTE:
+ *
+ * Stores to __rb_parent_color are not important for simple lookups so those
+ * are left undone as of now. Nor did I check for loops involving parent
+ * pointers.
+ */
+
 static inline void rb_set_black(struct rb_node *rb)
 {
 	rb->__rb_parent_color |= RB_BLACK;
@@ -70,22 +95,35 @@ __rb_rotate_set_parents(struct rb_node *old, struct rb_node *new,
 
 static __always_inline void
 __rb_insert(struct rb_node *node, struct rb_root *root,
+	    bool newleft, struct rb_node **leftmost,
 	    void (*augment_rotate)(struct rb_node *old, struct rb_node *new))
 {
 	struct rb_node *parent = rb_red_parent(node), *gparent, *tmp;
 
+	if (newleft)
+		*leftmost = node;
+
 	while (true) {
 		/*
-		 * Loop invariant: node is red
-		 *
-		 * If there is a black parent, we are done.
-		 * Otherwise, take some corrective action as we don't
-		 * want a red root or two consecutive red nodes.
+		 * Loop invariant: node is red.
 		 */
-		if (!parent) {
+		if (unlikely(!parent)) {
+			/*
+			 * The inserted node is root. Either this is the
+			 * first node, or we recursed at Case 1 below and
+			 * are no longer violating 4).
+			 */
 			rb_set_parent_color(node, NULL, RB_BLACK);
 			break;
-		} else if (rb_is_black(parent))
+		}
+
+		/*
+		 * If there is a black parent, we are done.
+		 * Otherwise, take some corrective action as,
+		 * per 4), we don't want a red root or two
+		 * consecutive red nodes.
+		 */
+		if(rb_is_black(parent))
 			break;
 
 		gparent = rb_red_parent(parent);
@@ -94,7 +132,7 @@ __rb_insert(struct rb_node *node, struct rb_root *root,
 		if (parent != tmp) {	/* parent == gparent->rb_left */
 			if (tmp && rb_is_red(tmp)) {
 				/*
-				 * Case 1 - color flips
+				 * Case 1 - node's uncle is red (color flips).
 				 *
 				 *       G            g
 				 *      / \          / \
@@ -117,7 +155,8 @@ __rb_insert(struct rb_node *node, struct rb_root *root,
 			tmp = parent->rb_right;
 			if (node == tmp) {
 				/*
-				 * Case 2 - left rotate at parent
+				 * Case 2 - node's uncle is black and node is
+				 * the parent's right child (left rotate at parent).
 				 *
 				 *      G             G
 				 *     / \           / \
@@ -128,8 +167,9 @@ __rb_insert(struct rb_node *node, struct rb_root *root,
 				 * This still leaves us in violation of 4), the
 				 * continuation into Case 3 will fix that.
 				 */
-				parent->rb_right = tmp = node->rb_left;
-				node->rb_left = parent;
+				tmp = node->rb_left;
+				WRITE_ONCE(parent->rb_right, tmp);
+				WRITE_ONCE(node->rb_left, parent);
 				if (tmp)
 					rb_set_parent_color(tmp, parent,
 							    RB_BLACK);
@@ -140,7 +180,8 @@ __rb_insert(struct rb_node *node, struct rb_root *root,
 			}
 
 			/*
-			 * Case 3 - right rotate at gparent
+			 * Case 3 - node's uncle is black and node is
+			 * the parent's left child (right rotate at gparent).
 			 *
 			 *        G           P
 			 *       / \         / \
@@ -148,8 +189,8 @@ __rb_insert(struct rb_node *node, struct rb_root *root,
 			 *     /                 \
 			 *    n                   U
 			 */
-			gparent->rb_left = tmp;  /* == parent->rb_right */
-			parent->rb_right = gparent;
+			WRITE_ONCE(gparent->rb_left, tmp); /* == parent->rb_right */
+			WRITE_ONCE(parent->rb_right, gparent);
 			if (tmp)
 				rb_set_parent_color(tmp, gparent, RB_BLACK);
 			__rb_rotate_set_parents(gparent, parent, root, RB_RED);
@@ -170,8 +211,9 @@ __rb_insert(struct rb_node *node, struct rb_root *root,
 			tmp = parent->rb_left;
 			if (node == tmp) {
 				/* Case 2 - right rotate at parent */
-				parent->rb_left = tmp = node->rb_right;
-				node->rb_right = parent;
+				tmp = node->rb_right;
+				WRITE_ONCE(parent->rb_left, tmp);
+				WRITE_ONCE(node->rb_right, parent);
 				if (tmp)
 					rb_set_parent_color(tmp, parent,
 							    RB_BLACK);
@@ -182,8 +224,8 @@ __rb_insert(struct rb_node *node, struct rb_root *root,
 			}
 
 			/* Case 3 - left rotate at gparent */
-			gparent->rb_right = tmp;  /* == parent->rb_left */
-			parent->rb_left = gparent;
+			WRITE_ONCE(gparent->rb_right, tmp); /* == parent->rb_left */
+			WRITE_ONCE(parent->rb_left, gparent);
 			if (tmp)
 				rb_set_parent_color(tmp, gparent, RB_BLACK);
 			__rb_rotate_set_parents(gparent, parent, root, RB_RED);
@@ -223,8 +265,9 @@ ____rb_erase_color(struct rb_node *parent, struct rb_root *root,
 				 *      / \         / \
 				 *     Sl  Sr      N   Sl
 				 */
-				parent->rb_right = tmp1 = sibling->rb_left;
-				sibling->rb_left = parent;
+				tmp1 = sibling->rb_left;
+				WRITE_ONCE(parent->rb_right, tmp1);
+				WRITE_ONCE(sibling->rb_left, parent);
 				rb_set_parent_color(tmp1, parent, RB_BLACK);
 				__rb_rotate_set_parents(parent, sibling, root,
 							RB_RED);
@@ -268,15 +311,31 @@ ____rb_erase_color(struct rb_node *parent, struct rb_root *root,
 				 *
 				 *   (p)           (p)
 				 *   / \           / \
-				 *  N   S    -->  N   Sl
+				 *  N   S    -->  N   sl
 				 *     / \             \
-				 *    sl  Sr            s
+				 *    sl  Sr            S
 				 *                       \
 				 *                        Sr
+				 *
+				 * Note: p might be red, and then both
+				 * p and sl are red after rotation(which
+				 * breaks property 4). This is fixed in
+				 * Case 4 (in __rb_rotate_set_parents()
+				 *         which set sl the color of p
+				 *         and set p RB_BLACK)
+				 *
+				 *   (p)            (sl)
+				 *   / \            /  \
+				 *  N   sl   -->   P    S
+				 *       \        /      \
+				 *        S      N        Sr
+				 *         \
+				 *          Sr
 				 */
-				sibling->rb_left = tmp1 = tmp2->rb_right;
-				tmp2->rb_right = sibling;
-				parent->rb_right = tmp2;
+				tmp1 = tmp2->rb_right;
+				WRITE_ONCE(sibling->rb_left, tmp1);
+				WRITE_ONCE(tmp2->rb_right, sibling);
+				WRITE_ONCE(parent->rb_right, tmp2);
 				if (tmp1)
 					rb_set_parent_color(tmp1, sibling,
 							    RB_BLACK);
@@ -296,8 +355,9 @@ ____rb_erase_color(struct rb_node *parent, struct rb_root *root,
 			 *        / \         / \
 			 *      (sl) sr      N  (sl)
 			 */
-			parent->rb_right = tmp2 = sibling->rb_left;
-			sibling->rb_left = parent;
+			tmp2 = sibling->rb_left;
+			WRITE_ONCE(parent->rb_right, tmp2);
+			WRITE_ONCE(sibling->rb_left, parent);
 			rb_set_parent_color(tmp1, sibling, RB_BLACK);
 			if (tmp2)
 				rb_set_parent(tmp2, parent);
@@ -309,8 +369,9 @@ ____rb_erase_color(struct rb_node *parent, struct rb_root *root,
 			sibling = parent->rb_left;
 			if (rb_is_red(sibling)) {
 				/* Case 1 - right rotate at parent */
-				parent->rb_left = tmp1 = sibling->rb_right;
-				sibling->rb_right = parent;
+				tmp1 = sibling->rb_right;
+				WRITE_ONCE(parent->rb_left, tmp1);
+				WRITE_ONCE(sibling->rb_right, parent);
 				rb_set_parent_color(tmp1, parent, RB_BLACK);
 				__rb_rotate_set_parents(parent, sibling, root,
 							RB_RED);
@@ -334,10 +395,11 @@ ____rb_erase_color(struct rb_node *parent, struct rb_root *root,
 					}
 					break;
 				}
-				/* Case 3 - right rotate at sibling */
-				sibling->rb_right = tmp1 = tmp2->rb_left;
-				tmp2->rb_left = sibling;
-				parent->rb_left = tmp2;
+				/* Case 3 - left rotate at sibling */
+				tmp1 = tmp2->rb_left;
+				WRITE_ONCE(sibling->rb_right, tmp1);
+				WRITE_ONCE(tmp2->rb_left, sibling);
+				WRITE_ONCE(parent->rb_left, tmp2);
 				if (tmp1)
 					rb_set_parent_color(tmp1, sibling,
 							    RB_BLACK);
@@ -345,9 +407,10 @@ ____rb_erase_color(struct rb_node *parent, struct rb_root *root,
 				tmp1 = sibling;
 				sibling = tmp2;
 			}
-			/* Case 4 - left rotate at parent + color flips */
-			parent->rb_left = tmp2 = sibling->rb_right;
-			sibling->rb_right = parent;
+			/* Case 4 - right rotate at parent + color flips */
+			tmp2 = sibling->rb_right;
+			WRITE_ONCE(parent->rb_left, tmp2);
+			WRITE_ONCE(sibling->rb_right, parent);
 			rb_set_parent_color(tmp1, sibling, RB_BLACK);
 			if (tmp2)
 				rb_set_parent(tmp2, parent);
@@ -378,22 +441,41 @@ static inline void dummy_copy(struct rb_node *old, struct rb_node *new) {}
 static inline void dummy_rotate(struct rb_node *old, struct rb_node *new) {}
 
 static const struct rb_augment_callbacks dummy_callbacks = {
-	dummy_propagate, dummy_copy, dummy_rotate
+	.propagate = dummy_propagate,
+	.copy = dummy_copy,
+	.rotate = dummy_rotate
 };
 
 void rb_insert_color(struct rb_node *node, struct rb_root *root)
 {
-	__rb_insert(node, root, dummy_rotate);
+	__rb_insert(node, root, false, NULL, dummy_rotate);
 }
 
 void rb_erase(struct rb_node *node, struct rb_root *root)
 {
 	struct rb_node *rebalance;
-	rebalance = __rb_erase_augmented(node, root, &dummy_callbacks);
+	rebalance = __rb_erase_augmented(node, root,
+					 NULL, &dummy_callbacks);
 	if (rebalance)
 		____rb_erase_color(rebalance, root, dummy_rotate);
 }
 
+void rb_insert_color_cached(struct rb_node *node,
+			    struct rb_root_cached *root, bool leftmost)
+{
+	__rb_insert(node, &root->rb_root, leftmost,
+		    &root->rb_leftmost, dummy_rotate);
+}
+
+void rb_erase_cached(struct rb_node *node, struct rb_root_cached *root)
+{
+	struct rb_node *rebalance;
+	rebalance = __rb_erase_augmented(node, &root->rb_root,
+					 &root->rb_leftmost, &dummy_callbacks);
+	if (rebalance)
+		____rb_erase_color(rebalance, &root->rb_root, dummy_rotate);
+}
+
 /*
  * Augmented rbtree manipulation functions.
  *
@@ -402,9 +484,10 @@ void rb_erase(struct rb_node *node, struct rb_root *root)
  */
 
 void __rb_insert_augmented(struct rb_node *node, struct rb_root *root,
+			   bool newleft, struct rb_node **leftmost,
 	void (*augment_rotate)(struct rb_node *old, struct rb_node *new))
 {
-	__rb_insert(node, root, augment_rotate);
+	__rb_insert(node, root, newleft, leftmost, augment_rotate);
 }
 
 /*
@@ -498,15 +581,24 @@ void rb_replace_node(struct rb_node *victim, struct rb_node *new,
 {
 	struct rb_node *parent = rb_parent(victim);
 
+	/* Copy the pointers/colour from the victim to the replacement */
+	*new = *victim;
+
 	/* Set the surrounding nodes to point to the replacement */
-	__rb_change_child(victim, new, parent, root);
 	if (victim->rb_left)
 		rb_set_parent(victim->rb_left, new);
 	if (victim->rb_right)
 		rb_set_parent(victim->rb_right, new);
+	__rb_change_child(victim, new, parent, root);
+}
 
-	/* Copy the pointers/colour from the victim to the replacement */
-	*new = *victim;
+void rb_replace_node_cached(struct rb_node *victim, struct rb_node *new,
+			    struct rb_root_cached *root)
+{
+	rb_replace_node(victim, new, &root->rb_root);
+
+	if (root->rb_leftmost == victim)
+		root->rb_leftmost = new;
 }
 
 static struct rb_node *rb_left_deepest_node(const struct rb_node *node)
-- 
2.16.4


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH 2/7] perf machine: Use cached rbtrees
  2018-12-06 19:18 [PATCH v2 -tip 0/7] tools/perf: Update rbtree implementation and optimize users Davidlohr Bueso
  2018-12-06 19:18 ` [PATCH 1/7] tools/perf: Update rbtree implementation Davidlohr Bueso
@ 2018-12-06 19:18 ` Davidlohr Bueso
  2019-01-26 10:01   ` [tip:perf/core] " tip-bot for Davidlohr Bueso
  2018-12-06 19:18 ` [PATCH 3/7] perf callchain: " Davidlohr Bueso
                   ` (4 subsequent siblings)
  6 siblings, 1 reply; 18+ messages in thread
From: Davidlohr Bueso @ 2018-12-06 19:18 UTC (permalink / raw)
  To: acme; +Cc: mingo, linux-kernel, dave, Davidlohr Bueso

At the cost of an extra pointer, we can avoid the O(logN) cost
of finding the first element in the tree (smallest node), which
is something required for nearly every operation dealing with
machine->guests and threads->entries.

The conversion is straightforward, however, it's worth noticing
that the rb_erase_init() calls have been replaced by rb_erase_cached()
which has no _init() flavor, however, the node is explicitly
cleared next anyway, which was redundant until now.

Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
---
 tools/perf/builtin-report.c |  3 ++-
 tools/perf/util/build-id.c  | 12 ++++++----
 tools/perf/util/machine.c   | 53 ++++++++++++++++++++++++++-------------------
 tools/perf/util/machine.h   | 12 +++++-----
 tools/perf/util/rb_resort.h |  6 ++---
 5 files changed, 50 insertions(+), 36 deletions(-)

diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c
index 257c9c18cb7e..b22fce3d9c95 100644
--- a/tools/perf/builtin-report.c
+++ b/tools/perf/builtin-report.c
@@ -752,7 +752,8 @@ static int tasks_print(struct report *rep, FILE *fp)
 	for (i = 0; i < THREADS__TABLE_SIZE; i++) {
 		struct threads *threads = &machine->threads[i];
 
-		for (nd = rb_first(&threads->entries); nd; nd = rb_next(nd)) {
+		for (nd = rb_first_cached(&threads->entries); nd;
+		     nd = rb_next(nd)) {
 			task = tasks + itask++;
 
 			task->thread = rb_entry(nd, struct thread, rb_node);
diff --git a/tools/perf/util/build-id.c b/tools/perf/util/build-id.c
index 04b1d53e4bf9..a3796455898e 100644
--- a/tools/perf/util/build-id.c
+++ b/tools/perf/util/build-id.c
@@ -363,7 +363,8 @@ int perf_session__write_buildid_table(struct perf_session *session,
 	if (err)
 		return err;
 
-	for (nd = rb_first(&session->machines.guests); nd; nd = rb_next(nd)) {
+	for (nd = rb_first_cached(&session->machines.guests); nd;
+	     nd = rb_next(nd)) {
 		struct machine *pos = rb_entry(nd, struct machine, rb_node);
 		err = machine__write_buildid_table(pos, fd);
 		if (err)
@@ -396,7 +397,8 @@ int dsos__hit_all(struct perf_session *session)
 	if (err)
 		return err;
 
-	for (nd = rb_first(&session->machines.guests); nd; nd = rb_next(nd)) {
+	for (nd = rb_first_cached(&session->machines.guests); nd;
+	     nd = rb_next(nd)) {
 		struct machine *pos = rb_entry(nd, struct machine, rb_node);
 
 		err = machine__hit_all_dsos(pos);
@@ -849,7 +851,8 @@ int perf_session__cache_build_ids(struct perf_session *session)
 
 	ret = machine__cache_build_ids(&session->machines.host);
 
-	for (nd = rb_first(&session->machines.guests); nd; nd = rb_next(nd)) {
+	for (nd = rb_first_cached(&session->machines.guests); nd;
+	     nd = rb_next(nd)) {
 		struct machine *pos = rb_entry(nd, struct machine, rb_node);
 		ret |= machine__cache_build_ids(pos);
 	}
@@ -866,7 +869,8 @@ bool perf_session__read_build_ids(struct perf_session *session, bool with_hits)
 	struct rb_node *nd;
 	bool ret = machine__read_build_ids(&session->machines.host, with_hits);
 
-	for (nd = rb_first(&session->machines.guests); nd; nd = rb_next(nd)) {
+	for (nd = rb_first_cached(&session->machines.guests); nd;
+	     nd = rb_next(nd)) {
 		struct machine *pos = rb_entry(nd, struct machine, rb_node);
 		ret |= machine__read_build_ids(pos, with_hits);
 	}
diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
index 8f36ce813bc5..1fbdc745a57e 100644
--- a/tools/perf/util/machine.c
+++ b/tools/perf/util/machine.c
@@ -41,7 +41,7 @@ static void machine__threads_init(struct machine *machine)
 
 	for (i = 0; i < THREADS__TABLE_SIZE; i++) {
 		struct threads *threads = &machine->threads[i];
-		threads->entries = RB_ROOT;
+		threads->entries = RB_ROOT_CACHED;
 		init_rwsem(&threads->lock);
 		threads->nr = 0;
 		INIT_LIST_HEAD(&threads->dead);
@@ -179,7 +179,7 @@ void machine__delete_threads(struct machine *machine)
 	for (i = 0; i < THREADS__TABLE_SIZE; i++) {
 		struct threads *threads = &machine->threads[i];
 		down_write(&threads->lock);
-		nd = rb_first(&threads->entries);
+		nd = rb_first_cached(&threads->entries);
 		while (nd) {
 			struct thread *t = rb_entry(nd, struct thread, rb_node);
 
@@ -222,7 +222,7 @@ void machine__delete(struct machine *machine)
 void machines__init(struct machines *machines)
 {
 	machine__init(&machines->host, "", HOST_KERNEL_ID);
-	machines->guests = RB_ROOT;
+	machines->guests = RB_ROOT_CACHED;
 }
 
 void machines__exit(struct machines *machines)
@@ -234,9 +234,10 @@ void machines__exit(struct machines *machines)
 struct machine *machines__add(struct machines *machines, pid_t pid,
 			      const char *root_dir)
 {
-	struct rb_node **p = &machines->guests.rb_node;
+	struct rb_node **p = &machines->guests.rb_root.rb_node;
 	struct rb_node *parent = NULL;
 	struct machine *pos, *machine = malloc(sizeof(*machine));
+	bool leftmost = true;
 
 	if (machine == NULL)
 		return NULL;
@@ -251,12 +252,14 @@ struct machine *machines__add(struct machines *machines, pid_t pid,
 		pos = rb_entry(parent, struct machine, rb_node);
 		if (pid < pos->pid)
 			p = &(*p)->rb_left;
-		else
+		else {
 			p = &(*p)->rb_right;
+			leftmost = false;
+		}
 	}
 
 	rb_link_node(&machine->rb_node, parent, p);
-	rb_insert_color(&machine->rb_node, &machines->guests);
+	rb_insert_color_cached(&machine->rb_node, &machines->guests, leftmost);
 
 	return machine;
 }
@@ -267,7 +270,7 @@ void machines__set_comm_exec(struct machines *machines, bool comm_exec)
 
 	machines->host.comm_exec = comm_exec;
 
-	for (nd = rb_first(&machines->guests); nd; nd = rb_next(nd)) {
+	for (nd = rb_first_cached(&machines->guests); nd; nd = rb_next(nd)) {
 		struct machine *machine = rb_entry(nd, struct machine, rb_node);
 
 		machine->comm_exec = comm_exec;
@@ -276,7 +279,7 @@ void machines__set_comm_exec(struct machines *machines, bool comm_exec)
 
 struct machine *machines__find(struct machines *machines, pid_t pid)
 {
-	struct rb_node **p = &machines->guests.rb_node;
+	struct rb_node **p = &machines->guests.rb_root.rb_node;
 	struct rb_node *parent = NULL;
 	struct machine *machine;
 	struct machine *default_machine = NULL;
@@ -339,7 +342,7 @@ void machines__process_guests(struct machines *machines,
 {
 	struct rb_node *nd;
 
-	for (nd = rb_first(&machines->guests); nd; nd = rb_next(nd)) {
+	for (nd = rb_first_cached(&machines->guests); nd; nd = rb_next(nd)) {
 		struct machine *pos = rb_entry(nd, struct machine, rb_node);
 		process(pos, data);
 	}
@@ -352,7 +355,8 @@ void machines__set_id_hdr_size(struct machines *machines, u16 id_hdr_size)
 
 	machines->host.id_hdr_size = id_hdr_size;
 
-	for (node = rb_first(&machines->guests); node; node = rb_next(node)) {
+	for (node = rb_first_cached(&machines->guests); node;
+	     node = rb_next(node)) {
 		machine = rb_entry(node, struct machine, rb_node);
 		machine->id_hdr_size = id_hdr_size;
 	}
@@ -465,9 +469,10 @@ static struct thread *____machine__findnew_thread(struct machine *machine,
 						  pid_t pid, pid_t tid,
 						  bool create)
 {
-	struct rb_node **p = &threads->entries.rb_node;
+	struct rb_node **p = &threads->entries.rb_root.rb_node;
 	struct rb_node *parent = NULL;
 	struct thread *th;
+	bool leftmost = true;
 
 	th = threads__get_last_match(threads, machine, pid, tid);
 	if (th)
@@ -485,8 +490,10 @@ static struct thread *____machine__findnew_thread(struct machine *machine,
 
 		if (tid < th->tid)
 			p = &(*p)->rb_left;
-		else
+		else {
 			p = &(*p)->rb_right;
+			leftmost = false;
+		}
 	}
 
 	if (!create)
@@ -495,7 +502,7 @@ static struct thread *____machine__findnew_thread(struct machine *machine,
 	th = thread__new(pid, tid);
 	if (th != NULL) {
 		rb_link_node(&th->rb_node, parent, p);
-		rb_insert_color(&th->rb_node, &threads->entries);
+		rb_insert_color_cached(&th->rb_node, &threads->entries, leftmost);
 
 		/*
 		 * We have to initialize map_groups separately
@@ -506,7 +513,7 @@ static struct thread *____machine__findnew_thread(struct machine *machine,
 		 * leader and that would screwed the rb tree.
 		 */
 		if (thread__init_map_groups(th, machine)) {
-			rb_erase_init(&th->rb_node, &threads->entries);
+			rb_erase_cached(&th->rb_node, &threads->entries);
 			RB_CLEAR_NODE(&th->rb_node);
 			thread__put(th);
 			return NULL;
@@ -744,7 +751,7 @@ size_t machines__fprintf_dsos(struct machines *machines, FILE *fp)
 	struct rb_node *nd;
 	size_t ret = __dsos__fprintf(&machines->host.dsos.head, fp);
 
-	for (nd = rb_first(&machines->guests); nd; nd = rb_next(nd)) {
+	for (nd = rb_first_cached(&machines->guests); nd; nd = rb_next(nd)) {
 		struct machine *pos = rb_entry(nd, struct machine, rb_node);
 		ret += __dsos__fprintf(&pos->dsos.head, fp);
 	}
@@ -764,7 +771,7 @@ size_t machines__fprintf_dsos_buildid(struct machines *machines, FILE *fp,
 	struct rb_node *nd;
 	size_t ret = machine__fprintf_dsos_buildid(&machines->host, fp, skip, parm);
 
-	for (nd = rb_first(&machines->guests); nd; nd = rb_next(nd)) {
+	for (nd = rb_first_cached(&machines->guests); nd; nd = rb_next(nd)) {
 		struct machine *pos = rb_entry(nd, struct machine, rb_node);
 		ret += machine__fprintf_dsos_buildid(pos, fp, skip, parm);
 	}
@@ -804,7 +811,8 @@ size_t machine__fprintf(struct machine *machine, FILE *fp)
 
 		ret = fprintf(fp, "Threads: %u\n", threads->nr);
 
-		for (nd = rb_first(&threads->entries); nd; nd = rb_next(nd)) {
+		for (nd = rb_first_cached(&threads->entries); nd;
+		     nd = rb_next(nd)) {
 			struct thread *pos = rb_entry(nd, struct thread, rb_node);
 
 			ret += thread__fprintf(pos, fp);
@@ -1107,7 +1115,7 @@ int machines__create_guest_kernel_maps(struct machines *machines)
 
 void machines__destroy_kernel_maps(struct machines *machines)
 {
-	struct rb_node *next = rb_first(&machines->guests);
+	struct rb_node *next = rb_first_cached(&machines->guests);
 
 	machine__destroy_kernel_maps(&machines->host);
 
@@ -1115,7 +1123,7 @@ void machines__destroy_kernel_maps(struct machines *machines)
 		struct machine *pos = rb_entry(next, struct machine, rb_node);
 
 		next = rb_next(&pos->rb_node);
-		rb_erase(&pos->rb_node, &machines->guests);
+		rb_erase_cached(&pos->rb_node, &machines->guests);
 		machine__delete(pos);
 	}
 }
@@ -1680,7 +1688,7 @@ static void __machine__remove_thread(struct machine *machine, struct thread *th,
 	BUG_ON(refcount_read(&th->refcnt) == 0);
 	if (lock)
 		down_write(&threads->lock);
-	rb_erase_init(&th->rb_node, &threads->entries);
+	rb_erase_cached(&th->rb_node, &threads->entries);
 	RB_CLEAR_NODE(&th->rb_node);
 	--threads->nr;
 	/*
@@ -2453,7 +2461,8 @@ int machine__for_each_thread(struct machine *machine,
 
 	for (i = 0; i < THREADS__TABLE_SIZE; i++) {
 		threads = &machine->threads[i];
-		for (nd = rb_first(&threads->entries); nd; nd = rb_next(nd)) {
+		for (nd = rb_first_cached(&threads->entries); nd;
+		     nd = rb_next(nd)) {
 			thread = rb_entry(nd, struct thread, rb_node);
 			rc = fn(thread, priv);
 			if (rc != 0)
@@ -2480,7 +2489,7 @@ int machines__for_each_thread(struct machines *machines,
 	if (rc != 0)
 		return rc;
 
-	for (nd = rb_first(&machines->guests); nd; nd = rb_next(nd)) {
+	for (nd = rb_first_cached(&machines->guests); nd; nd = rb_next(nd)) {
 		struct machine *machine = rb_entry(nd, struct machine, rb_node);
 
 		rc = machine__for_each_thread(machine, fn, priv);
diff --git a/tools/perf/util/machine.h b/tools/perf/util/machine.h
index d856b85862e2..a4cc5a1cb1f8 100644
--- a/tools/perf/util/machine.h
+++ b/tools/perf/util/machine.h
@@ -29,11 +29,11 @@ struct vdso_info;
 #define THREADS__TABLE_SIZE	(1 << THREADS__TABLE_BITS)
 
 struct threads {
-	struct rb_root	  entries;
-	struct rw_semaphore lock;
-	unsigned int	  nr;
-	struct list_head  dead;
-	struct thread	  *last_match;
+	struct rb_root_cached  entries;
+	struct rw_semaphore    lock;
+	unsigned int	       nr;
+	struct list_head       dead;
+	struct thread	       *last_match;
 };
 
 struct machine {
@@ -134,7 +134,7 @@ typedef void (*machine__process_t)(struct machine *machine, void *data);
 
 struct machines {
 	struct machine host;
-	struct rb_root guests;
+	struct rb_root_cached guests;
 };
 
 void machines__init(struct machines *machines);
diff --git a/tools/perf/util/rb_resort.h b/tools/perf/util/rb_resort.h
index a920f702a74d..f272e181d3d6 100644
--- a/tools/perf/util/rb_resort.h
+++ b/tools/perf/util/rb_resort.h
@@ -144,8 +144,8 @@ struct __name##_sorted *__name = __name##_sorted__new
 				  __ilist->rblist.nr_entries)
 
 /* For 'struct machine->threads' */
-#define DECLARE_RESORT_RB_MACHINE_THREADS(__name, __machine, hash_bucket)	\
-	DECLARE_RESORT_RB(__name)(&__machine->threads[hash_bucket].entries,	\
-				  __machine->threads[hash_bucket].nr)
+#define DECLARE_RESORT_RB_MACHINE_THREADS(__name, __machine, hash_bucket)    \
+ DECLARE_RESORT_RB(__name)(&__machine->threads[hash_bucket].entries.rb_root, \
+			   __machine->threads[hash_bucket].nr)
 
 #endif /* _PERF_RESORT_RB_H_ */
-- 
2.16.4


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH 3/7] perf callchain: Use cached rbtrees
  2018-12-06 19:18 [PATCH v2 -tip 0/7] tools/perf: Update rbtree implementation and optimize users Davidlohr Bueso
  2018-12-06 19:18 ` [PATCH 1/7] tools/perf: Update rbtree implementation Davidlohr Bueso
  2018-12-06 19:18 ` [PATCH 2/7] perf machine: Use cached rbtrees Davidlohr Bueso
@ 2018-12-06 19:18 ` Davidlohr Bueso
  2019-01-26 10:02   ` [tip:perf/core] " tip-bot for Davidlohr Bueso
  2018-12-06 19:18 ` [PATCH 4/7] perf util: Use cached rbtree for rblists Davidlohr Bueso
                   ` (3 subsequent siblings)
  6 siblings, 1 reply; 18+ messages in thread
From: Davidlohr Bueso @ 2018-12-06 19:18 UTC (permalink / raw)
  To: acme; +Cc: mingo, linux-kernel, dave, Davidlohr Bueso

At the cost of an extra pointer, we can avoid the O(logN) cost
of finding the first element in the tree (smallest node), which
is something required for nearly every in/srcline callchain node
deletion (in/srcline__tree_delete()).

Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
---
 tools/perf/util/dso.c     |  4 ++--
 tools/perf/util/dso.h     |  4 ++--
 tools/perf/util/srcline.c | 43 +++++++++++++++++++++++++------------------
 tools/perf/util/srcline.h | 13 +++++++------
 4 files changed, 36 insertions(+), 28 deletions(-)

diff --git a/tools/perf/util/dso.c b/tools/perf/util/dso.c
index bbed90e5d9bb..31edcc1d8661 100644
--- a/tools/perf/util/dso.c
+++ b/tools/perf/util/dso.c
@@ -1197,8 +1197,8 @@ struct dso *dso__new(const char *name)
 		dso__set_short_name(dso, dso->name, false);
 		dso->symbols = dso->symbol_names = RB_ROOT;
 		dso->data.cache = RB_ROOT;
-		dso->inlined_nodes = RB_ROOT;
-		dso->srclines = RB_ROOT;
+		dso->inlined_nodes = RB_ROOT_CACHED;
+		dso->srclines = RB_ROOT_CACHED;
 		dso->data.fd = -1;
 		dso->data.status = DSO_DATA_STATUS_UNKNOWN;
 		dso->symtab_type = DSO_BINARY_TYPE__NOT_FOUND;
diff --git a/tools/perf/util/dso.h b/tools/perf/util/dso.h
index c5380500bed4..935fe99b38a8 100644
--- a/tools/perf/util/dso.h
+++ b/tools/perf/util/dso.h
@@ -142,8 +142,8 @@ struct dso {
 	struct rb_root	 *root;		/* root of rbtree that rb_node is in */
 	struct rb_root	 symbols;
 	struct rb_root	 symbol_names;
-	struct rb_root	 inlined_nodes;
-	struct rb_root	 srclines;
+	struct rb_root_cached inlined_nodes;
+	struct rb_root_cached srclines;
 	struct {
 		u64		addr;
 		struct symbol	*symbol;
diff --git a/tools/perf/util/srcline.c b/tools/perf/util/srcline.c
index e767c4a9d4d2..41ced7f65c03 100644
--- a/tools/perf/util/srcline.c
+++ b/tools/perf/util/srcline.c
@@ -566,11 +566,12 @@ struct srcline_node {
 	struct rb_node		rb_node;
 };
 
-void srcline__tree_insert(struct rb_root *tree, u64 addr, char *srcline)
+void srcline__tree_insert(struct rb_root_cached *tree, u64 addr, char *srcline)
 {
-	struct rb_node **p = &tree->rb_node;
+	struct rb_node **p = &tree->rb_root.rb_node;
 	struct rb_node *parent = NULL;
 	struct srcline_node *i, *node;
+	bool leftmost = true;
 
 	node = zalloc(sizeof(struct srcline_node));
 	if (!node) {
@@ -586,16 +587,18 @@ void srcline__tree_insert(struct rb_root *tree, u64 addr, char *srcline)
 		i = rb_entry(parent, struct srcline_node, rb_node);
 		if (addr < i->addr)
 			p = &(*p)->rb_left;
-		else
+		else {
 			p = &(*p)->rb_right;
+			leftmost = false;
+		}
 	}
 	rb_link_node(&node->rb_node, parent, p);
-	rb_insert_color(&node->rb_node, tree);
+	rb_insert_color_cached(&node->rb_node, tree, leftmost);
 }
 
-char *srcline__tree_find(struct rb_root *tree, u64 addr)
+char *srcline__tree_find(struct rb_root_cached *tree, u64 addr)
 {
-	struct rb_node *n = tree->rb_node;
+	struct rb_node *n = tree->rb_root.rb_node;
 
 	while (n) {
 		struct srcline_node *i = rb_entry(n, struct srcline_node,
@@ -612,15 +615,15 @@ char *srcline__tree_find(struct rb_root *tree, u64 addr)
 	return NULL;
 }
 
-void srcline__tree_delete(struct rb_root *tree)
+void srcline__tree_delete(struct rb_root_cached *tree)
 {
 	struct srcline_node *pos;
-	struct rb_node *next = rb_first(tree);
+	struct rb_node *next = rb_first_cached(tree);
 
 	while (next) {
 		pos = rb_entry(next, struct srcline_node, rb_node);
 		next = rb_next(&pos->rb_node);
-		rb_erase(&pos->rb_node, tree);
+		rb_erase_cached(&pos->rb_node, tree);
 		free_srcline(pos->srcline);
 		zfree(&pos);
 	}
@@ -654,28 +657,32 @@ void inline_node__delete(struct inline_node *node)
 	free(node);
 }
 
-void inlines__tree_insert(struct rb_root *tree, struct inline_node *inlines)
+void inlines__tree_insert(struct rb_root_cached *tree,
+			  struct inline_node *inlines)
 {
-	struct rb_node **p = &tree->rb_node;
+	struct rb_node **p = &tree->rb_root.rb_node;
 	struct rb_node *parent = NULL;
 	const u64 addr = inlines->addr;
 	struct inline_node *i;
+	bool leftmost = true;
 
 	while (*p != NULL) {
 		parent = *p;
 		i = rb_entry(parent, struct inline_node, rb_node);
 		if (addr < i->addr)
 			p = &(*p)->rb_left;
-		else
+		else {
 			p = &(*p)->rb_right;
+			leftmost = false;
+		}
 	}
 	rb_link_node(&inlines->rb_node, parent, p);
-	rb_insert_color(&inlines->rb_node, tree);
+	rb_insert_color_cached(&inlines->rb_node, tree, leftmost);
 }
 
-struct inline_node *inlines__tree_find(struct rb_root *tree, u64 addr)
+struct inline_node *inlines__tree_find(struct rb_root_cached *tree, u64 addr)
 {
-	struct rb_node *n = tree->rb_node;
+	struct rb_node *n = tree->rb_root.rb_node;
 
 	while (n) {
 		struct inline_node *i = rb_entry(n, struct inline_node,
@@ -692,15 +699,15 @@ struct inline_node *inlines__tree_find(struct rb_root *tree, u64 addr)
 	return NULL;
 }
 
-void inlines__tree_delete(struct rb_root *tree)
+void inlines__tree_delete(struct rb_root_cached *tree)
 {
 	struct inline_node *pos;
-	struct rb_node *next = rb_first(tree);
+	struct rb_node *next = rb_first_cached(tree);
 
 	while (next) {
 		pos = rb_entry(next, struct inline_node, rb_node);
 		next = rb_next(&pos->rb_node);
-		rb_erase(&pos->rb_node, tree);
+		rb_erase_cached(&pos->rb_node, tree);
 		inline_node__delete(pos);
 	}
 }
diff --git a/tools/perf/util/srcline.h b/tools/perf/util/srcline.h
index b2bb5502fd62..c77646f057bf 100644
--- a/tools/perf/util/srcline.h
+++ b/tools/perf/util/srcline.h
@@ -18,11 +18,11 @@ char *__get_srcline(struct dso *dso, u64 addr, struct symbol *sym,
 void free_srcline(char *srcline);
 
 /* insert the srcline into the DSO, which will take ownership */
-void srcline__tree_insert(struct rb_root *tree, u64 addr, char *srcline);
+void srcline__tree_insert(struct rb_root_cached *tree, u64 addr, char *srcline);
 /* find previously inserted srcline */
-char *srcline__tree_find(struct rb_root *tree, u64 addr);
+char *srcline__tree_find(struct rb_root_cached *tree, u64 addr);
 /* delete all srclines within the tree */
-void srcline__tree_delete(struct rb_root *tree);
+void srcline__tree_delete(struct rb_root_cached *tree);
 
 #define SRCLINE_UNKNOWN  ((char *) "??:0")
 
@@ -45,10 +45,11 @@ struct inline_node *dso__parse_addr_inlines(struct dso *dso, u64 addr,
 void inline_node__delete(struct inline_node *node);
 
 /* insert the inline node list into the DSO, which will take ownership */
-void inlines__tree_insert(struct rb_root *tree, struct inline_node *inlines);
+void inlines__tree_insert(struct rb_root_cached *tree,
+			  struct inline_node *inlines);
 /* find previously inserted inline node list */
-struct inline_node *inlines__tree_find(struct rb_root *tree, u64 addr);
+struct inline_node *inlines__tree_find(struct rb_root_cached *tree, u64 addr);
 /* delete all nodes within the tree of inline_node s */
-void inlines__tree_delete(struct rb_root *tree);
+void inlines__tree_delete(struct rb_root_cached *tree);
 
 #endif /* PERF_SRCLINE_H */
-- 
2.16.4


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH 4/7] perf util: Use cached rbtree for rblists
  2018-12-06 19:18 [PATCH v2 -tip 0/7] tools/perf: Update rbtree implementation and optimize users Davidlohr Bueso
                   ` (2 preceding siblings ...)
  2018-12-06 19:18 ` [PATCH 3/7] perf callchain: " Davidlohr Bueso
@ 2018-12-06 19:18 ` Davidlohr Bueso
  2019-01-26 10:03   ` [tip:perf/core] " tip-bot for Davidlohr Bueso
  2018-12-06 19:18 ` [PATCH 5/7] perf symbols: Use cached rbtrees Davidlohr Bueso
                   ` (2 subsequent siblings)
  6 siblings, 1 reply; 18+ messages in thread
From: Davidlohr Bueso @ 2018-12-06 19:18 UTC (permalink / raw)
  To: acme; +Cc: mingo, linux-kernel, dave, Davidlohr Bueso

At the cost of an extra pointer, we can avoid the O(logN) cost
of finding the first element in the tree (smallest node), which
is something required for any of the strlist or intlist traversals
(XXX_for_each_entry()). There are a number of users in perf of
these (particularly strlists), including probes, and buildid.

Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
---
 tools/perf/util/intlist.h     |  2 +-
 tools/perf/util/metricgroup.c |  2 +-
 tools/perf/util/rb_resort.h   |  2 +-
 tools/perf/util/rblist.c      | 28 ++++++++++++++++++----------
 tools/perf/util/rblist.h      |  2 +-
 tools/perf/util/stat-shadow.c |  2 +-
 tools/perf/util/strlist.h     |  2 +-
 7 files changed, 24 insertions(+), 16 deletions(-)

diff --git a/tools/perf/util/intlist.h b/tools/perf/util/intlist.h
index 85bab8735fa9..5c19ee001299 100644
--- a/tools/perf/util/intlist.h
+++ b/tools/perf/util/intlist.h
@@ -45,7 +45,7 @@ static inline unsigned int intlist__nr_entries(const struct intlist *ilist)
 /* For intlist iteration */
 static inline struct int_node *intlist__first(struct intlist *ilist)
 {
-	struct rb_node *rn = rb_first(&ilist->rblist.entries);
+	struct rb_node *rn = rb_first_cached(&ilist->rblist.entries);
 	return rn ? rb_entry(rn, struct int_node, rb_node) : NULL;
 }
 static inline struct int_node *intlist__next(struct int_node *in)
diff --git a/tools/perf/util/metricgroup.c b/tools/perf/util/metricgroup.c
index a28f9b5cc4ff..8529cbd3955b 100644
--- a/tools/perf/util/metricgroup.c
+++ b/tools/perf/util/metricgroup.c
@@ -352,7 +352,7 @@ void metricgroup__print(bool metrics, bool metricgroups, char *filter,
 	else if (metrics && !raw)
 		printf("\nMetrics:\n\n");
 
-	for (node = rb_first(&groups.entries); node; node = next) {
+	for (node = rb_first_cached(&groups.entries); node; node = next) {
 		struct mep *me = container_of(node, struct mep, nd);
 
 		if (metricgroups)
diff --git a/tools/perf/util/rb_resort.h b/tools/perf/util/rb_resort.h
index f272e181d3d6..376e86cb4c3c 100644
--- a/tools/perf/util/rb_resort.h
+++ b/tools/perf/util/rb_resort.h
@@ -140,7 +140,7 @@ struct __name##_sorted *__name = __name##_sorted__new
 
 /* For 'struct intlist' */
 #define DECLARE_RESORT_RB_INTLIST(__name, __ilist)				\
-	DECLARE_RESORT_RB(__name)(&__ilist->rblist.entries,			\
+	DECLARE_RESORT_RB(__name)(&__ilist->rblist.entries.rb_root,		\
 				  __ilist->rblist.nr_entries)
 
 /* For 'struct machine->threads' */
diff --git a/tools/perf/util/rblist.c b/tools/perf/util/rblist.c
index 0efc3258c648..11e07fab20dc 100644
--- a/tools/perf/util/rblist.c
+++ b/tools/perf/util/rblist.c
@@ -13,8 +13,9 @@
 
 int rblist__add_node(struct rblist *rblist, const void *new_entry)
 {
-	struct rb_node **p = &rblist->entries.rb_node;
+	struct rb_node **p = &rblist->entries.rb_root.rb_node;
 	struct rb_node *parent = NULL, *new_node;
+	bool leftmost = true;
 
 	while (*p != NULL) {
 		int rc;
@@ -24,8 +25,10 @@ int rblist__add_node(struct rblist *rblist, const void *new_entry)
 		rc = rblist->node_cmp(parent, new_entry);
 		if (rc > 0)
 			p = &(*p)->rb_left;
-		else if (rc < 0)
+		else if (rc < 0) {
 			p = &(*p)->rb_right;
+			leftmost = false;
+		}
 		else
 			return -EEXIST;
 	}
@@ -35,7 +38,7 @@ int rblist__add_node(struct rblist *rblist, const void *new_entry)
 		return -ENOMEM;
 
 	rb_link_node(new_node, parent, p);
-	rb_insert_color(new_node, &rblist->entries);
+	rb_insert_color_cached(new_node, &rblist->entries, leftmost);
 	++rblist->nr_entries;
 
 	return 0;
@@ -43,7 +46,7 @@ int rblist__add_node(struct rblist *rblist, const void *new_entry)
 
 void rblist__remove_node(struct rblist *rblist, struct rb_node *rb_node)
 {
-	rb_erase(rb_node, &rblist->entries);
+	rb_erase_cached(rb_node, &rblist->entries);
 	--rblist->nr_entries;
 	rblist->node_delete(rblist, rb_node);
 }
@@ -52,8 +55,9 @@ static struct rb_node *__rblist__findnew(struct rblist *rblist,
 					 const void *entry,
 					 bool create)
 {
-	struct rb_node **p = &rblist->entries.rb_node;
+	struct rb_node **p = &rblist->entries.rb_root.rb_node;
 	struct rb_node *parent = NULL, *new_node = NULL;
+	bool leftmost = true;
 
 	while (*p != NULL) {
 		int rc;
@@ -63,8 +67,10 @@ static struct rb_node *__rblist__findnew(struct rblist *rblist,
 		rc = rblist->node_cmp(parent, entry);
 		if (rc > 0)
 			p = &(*p)->rb_left;
-		else if (rc < 0)
+		else if (rc < 0) {
 			p = &(*p)->rb_right;
+			leftmost = false;
+		}
 		else
 			return parent;
 	}
@@ -73,7 +79,8 @@ static struct rb_node *__rblist__findnew(struct rblist *rblist,
 		new_node = rblist->node_new(rblist, entry);
 		if (new_node) {
 			rb_link_node(new_node, parent, p);
-			rb_insert_color(new_node, &rblist->entries);
+			rb_insert_color_cached(new_node,
+					       &rblist->entries, leftmost);
 			++rblist->nr_entries;
 		}
 	}
@@ -94,7 +101,7 @@ struct rb_node *rblist__findnew(struct rblist *rblist, const void *entry)
 void rblist__init(struct rblist *rblist)
 {
 	if (rblist != NULL) {
-		rblist->entries	 = RB_ROOT;
+		rblist->entries	 = RB_ROOT_CACHED;
 		rblist->nr_entries = 0;
 	}
 
@@ -103,7 +110,7 @@ void rblist__init(struct rblist *rblist)
 
 void rblist__exit(struct rblist *rblist)
 {
-	struct rb_node *pos, *next = rb_first(&rblist->entries);
+	struct rb_node *pos, *next = rb_first_cached(&rblist->entries);
 
 	while (next) {
 		pos = next;
@@ -124,7 +131,8 @@ struct rb_node *rblist__entry(const struct rblist *rblist, unsigned int idx)
 {
 	struct rb_node *node;
 
-	for (node = rb_first(&rblist->entries); node; node = rb_next(node)) {
+	for (node = rb_first_cached(&rblist->entries); node;
+	     node = rb_next(node)) {
 		if (!idx--)
 			return node;
 	}
diff --git a/tools/perf/util/rblist.h b/tools/perf/util/rblist.h
index 76df15c27f5f..14b232a4d0b6 100644
--- a/tools/perf/util/rblist.h
+++ b/tools/perf/util/rblist.h
@@ -20,7 +20,7 @@
  */
 
 struct rblist {
-	struct rb_root entries;
+	struct rb_root_cached entries;
 	unsigned int   nr_entries;
 
 	int (*node_cmp)(struct rb_node *rbn, const void *entry);
diff --git a/tools/perf/util/stat-shadow.c b/tools/perf/util/stat-shadow.c
index f0a8cec55c47..463160a5494c 100644
--- a/tools/perf/util/stat-shadow.c
+++ b/tools/perf/util/stat-shadow.c
@@ -168,7 +168,7 @@ static void reset_stat(struct runtime_stat *st)
 	struct rb_node *pos, *next;
 
 	rblist = &st->value_list;
-	next = rb_first(&rblist->entries);
+	next = rb_first_cached(&rblist->entries);
 	while (next) {
 		pos = next;
 		next = rb_next(pos);
diff --git a/tools/perf/util/strlist.h b/tools/perf/util/strlist.h
index d58f1e08b170..7e82c71dcc42 100644
--- a/tools/perf/util/strlist.h
+++ b/tools/perf/util/strlist.h
@@ -57,7 +57,7 @@ static inline unsigned int strlist__nr_entries(const struct strlist *slist)
 /* For strlist iteration */
 static inline struct str_node *strlist__first(struct strlist *slist)
 {
-	struct rb_node *rn = rb_first(&slist->rblist.entries);
+	struct rb_node *rn = rb_first_cached(&slist->rblist.entries);
 	return rn ? rb_entry(rn, struct str_node, rb_node) : NULL;
 }
 static inline struct str_node *strlist__next(struct str_node *sn)
-- 
2.16.4


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH 5/7] perf symbols: Use cached rbtrees
  2018-12-06 19:18 [PATCH v2 -tip 0/7] tools/perf: Update rbtree implementation and optimize users Davidlohr Bueso
                   ` (3 preceding siblings ...)
  2018-12-06 19:18 ` [PATCH 4/7] perf util: Use cached rbtree for rblists Davidlohr Bueso
@ 2018-12-06 19:18 ` Davidlohr Bueso
  2019-01-26 10:03   ` [tip:perf/core] " tip-bot for Davidlohr Bueso
  2018-12-06 19:18 ` [PATCH 6/7] perf hist: " Davidlohr Bueso
  2018-12-06 19:18 ` [PATCH 7/7] perf sched: " Davidlohr Bueso
  6 siblings, 1 reply; 18+ messages in thread
From: Davidlohr Bueso @ 2018-12-06 19:18 UTC (permalink / raw)
  To: acme; +Cc: mingo, linux-kernel, dave, Davidlohr Bueso

At the cost of an extra pointer, we can avoid the O(logN) cost
of finding the first element in the tree (smallest node).

Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
---
 tools/perf/builtin-annotate.c    |  2 +-
 tools/perf/util/dso.c            |  4 +-
 tools/perf/util/dso.h            |  6 +--
 tools/perf/util/map.c            |  8 ++--
 tools/perf/util/probe-event.c    |  3 +-
 tools/perf/util/symbol.c         | 87 ++++++++++++++++++++++------------------
 tools/perf/util/symbol.h         | 13 +++---
 tools/perf/util/symbol_fprintf.c |  2 +-
 8 files changed, 67 insertions(+), 58 deletions(-)

diff --git a/tools/perf/builtin-annotate.c b/tools/perf/builtin-annotate.c
index 93d679eaf1f4..cc3da5564300 100644
--- a/tools/perf/builtin-annotate.c
+++ b/tools/perf/builtin-annotate.c
@@ -227,7 +227,7 @@ static int perf_evsel__add_sample(struct perf_evsel *evsel,
 		 * the DSO?
 		 */
 		if (al->sym != NULL) {
-			rb_erase(&al->sym->rb_node,
+			rb_erase_cached(&al->sym->rb_node,
 				 &al->map->dso->symbols);
 			symbol__delete(al->sym);
 			dso__reset_find_symbol_cache(al->map->dso);
diff --git a/tools/perf/util/dso.c b/tools/perf/util/dso.c
index 31edcc1d8661..6f9ddd6b6e23 100644
--- a/tools/perf/util/dso.c
+++ b/tools/perf/util/dso.c
@@ -1195,7 +1195,7 @@ struct dso *dso__new(const char *name)
 		strcpy(dso->name, name);
 		dso__set_long_name(dso, dso->name, false);
 		dso__set_short_name(dso, dso->name, false);
-		dso->symbols = dso->symbol_names = RB_ROOT;
+		dso->symbols = dso->symbol_names = RB_ROOT_CACHED;
 		dso->data.cache = RB_ROOT;
 		dso->inlined_nodes = RB_ROOT_CACHED;
 		dso->srclines = RB_ROOT_CACHED;
@@ -1467,7 +1467,7 @@ size_t dso__fprintf(struct dso *dso, FILE *fp)
 	ret += fprintf(fp, "%sloaded, ", dso__loaded(dso) ? "" : "NOT ");
 	ret += dso__fprintf_buildid(dso, fp);
 	ret += fprintf(fp, ")\n");
-	for (nd = rb_first(&dso->symbols); nd; nd = rb_next(nd)) {
+	for (nd = rb_first_cached(&dso->symbols); nd; nd = rb_next(nd)) {
 		struct symbol *pos = rb_entry(nd, struct symbol, rb_node);
 		ret += symbol__fprintf(pos, fp);
 	}
diff --git a/tools/perf/util/dso.h b/tools/perf/util/dso.h
index 935fe99b38a8..53130b78cd37 100644
--- a/tools/perf/util/dso.h
+++ b/tools/perf/util/dso.h
@@ -140,8 +140,8 @@ struct dso {
 	struct list_head node;
 	struct rb_node	 rb_node;	/* rbtree node sorted by long name */
 	struct rb_root	 *root;		/* root of rbtree that rb_node is in */
-	struct rb_root	 symbols;
-	struct rb_root	 symbol_names;
+	struct rb_root_cached symbols;
+	struct rb_root_cached symbol_names;
 	struct rb_root_cached inlined_nodes;
 	struct rb_root_cached srclines;
 	struct {
@@ -235,7 +235,7 @@ bool dso__loaded(const struct dso *dso);
 
 static inline bool dso__has_symbols(const struct dso *dso)
 {
-	return !RB_EMPTY_ROOT(&dso->symbols);
+	return !RB_EMPTY_ROOT(&dso->symbols.rb_root);
 }
 
 bool dso__sorted_by_name(const struct dso *dso);
diff --git a/tools/perf/util/map.c b/tools/perf/util/map.c
index 781eed8e3265..e2d34c1ae07d 100644
--- a/tools/perf/util/map.c
+++ b/tools/perf/util/map.c
@@ -285,8 +285,8 @@ void map__put(struct map *map)
 
 void map__fixup_start(struct map *map)
 {
-	struct rb_root *symbols = &map->dso->symbols;
-	struct rb_node *nd = rb_first(symbols);
+	struct rb_root_cached *symbols = &map->dso->symbols;
+	struct rb_node *nd = rb_first_cached(symbols);
 	if (nd != NULL) {
 		struct symbol *sym = rb_entry(nd, struct symbol, rb_node);
 		map->start = sym->start;
@@ -295,8 +295,8 @@ void map__fixup_start(struct map *map)
 
 void map__fixup_end(struct map *map)
 {
-	struct rb_root *symbols = &map->dso->symbols;
-	struct rb_node *nd = rb_last(symbols);
+	struct rb_root_cached *symbols = &map->dso->symbols;
+	struct rb_node *nd = rb_last(&symbols->rb_root);
 	if (nd != NULL) {
 		struct symbol *sym = rb_entry(nd, struct symbol, rb_node);
 		map->end = sym->end;
diff --git a/tools/perf/util/probe-event.c b/tools/perf/util/probe-event.c
index e86f8be89157..eaf91648b713 100644
--- a/tools/perf/util/probe-event.c
+++ b/tools/perf/util/probe-event.c
@@ -3528,7 +3528,8 @@ int show_available_funcs(const char *target, struct nsinfo *nsi,
 	/* Show all (filtered) symbols */
 	setup_pager();
 
-	for (nd = rb_first(&map->dso->symbol_names); nd; nd = rb_next(nd)) {
+	for (nd = rb_first_cached(&map->dso->symbol_names); nd;
+	     nd = rb_next(nd)) {
 		struct symbol_name_rb_node *pos = rb_entry(nd, struct symbol_name_rb_node, rb_node);
 
 		if (strfilter__compare(_filter, pos->sym.name))
diff --git a/tools/perf/util/symbol.c b/tools/perf/util/symbol.c
index dcce74bae6de..81dd498453ca 100644
--- a/tools/perf/util/symbol.c
+++ b/tools/perf/util/symbol.c
@@ -163,7 +163,7 @@ static int choose_best_symbol(struct symbol *syma, struct symbol *symb)
 	return arch__choose_best_symbol(syma, symb);
 }
 
-void symbols__fixup_duplicate(struct rb_root *symbols)
+void symbols__fixup_duplicate(struct rb_root_cached *symbols)
 {
 	struct rb_node *nd;
 	struct symbol *curr, *next;
@@ -171,7 +171,7 @@ void symbols__fixup_duplicate(struct rb_root *symbols)
 	if (symbol_conf.allow_aliases)
 		return;
 
-	nd = rb_first(symbols);
+	nd = rb_first_cached(symbols);
 
 	while (nd) {
 		curr = rb_entry(nd, struct symbol, rb_node);
@@ -186,20 +186,20 @@ void symbols__fixup_duplicate(struct rb_root *symbols)
 			continue;
 
 		if (choose_best_symbol(curr, next) == SYMBOL_A) {
-			rb_erase(&next->rb_node, symbols);
+			rb_erase_cached(&next->rb_node, symbols);
 			symbol__delete(next);
 			goto again;
 		} else {
 			nd = rb_next(&curr->rb_node);
-			rb_erase(&curr->rb_node, symbols);
+			rb_erase_cached(&curr->rb_node, symbols);
 			symbol__delete(curr);
 		}
 	}
 }
 
-void symbols__fixup_end(struct rb_root *symbols)
+void symbols__fixup_end(struct rb_root_cached *symbols)
 {
-	struct rb_node *nd, *prevnd = rb_first(symbols);
+	struct rb_node *nd, *prevnd = rb_first_cached(symbols);
 	struct symbol *curr, *prev;
 
 	if (prevnd == NULL)
@@ -282,25 +282,27 @@ void symbol__delete(struct symbol *sym)
 	free(((void *)sym) - symbol_conf.priv_size);
 }
 
-void symbols__delete(struct rb_root *symbols)
+void symbols__delete(struct rb_root_cached *symbols)
 {
 	struct symbol *pos;
-	struct rb_node *next = rb_first(symbols);
+	struct rb_node *next = rb_first_cached(symbols);
 
 	while (next) {
 		pos = rb_entry(next, struct symbol, rb_node);
 		next = rb_next(&pos->rb_node);
-		rb_erase(&pos->rb_node, symbols);
+		rb_erase_cached(&pos->rb_node, symbols);
 		symbol__delete(pos);
 	}
 }
 
-void __symbols__insert(struct rb_root *symbols, struct symbol *sym, bool kernel)
+void __symbols__insert(struct rb_root_cached *symbols,
+		       struct symbol *sym, bool kernel)
 {
-	struct rb_node **p = &symbols->rb_node;
+	struct rb_node **p = &symbols->rb_root.rb_node;
 	struct rb_node *parent = NULL;
 	const u64 ip = sym->start;
 	struct symbol *s;
+	bool leftmost = true;
 
 	if (kernel) {
 		const char *name = sym->name;
@@ -318,26 +320,28 @@ void __symbols__insert(struct rb_root *symbols, struct symbol *sym, bool kernel)
 		s = rb_entry(parent, struct symbol, rb_node);
 		if (ip < s->start)
 			p = &(*p)->rb_left;
-		else
+		else {
 			p = &(*p)->rb_right;
+			leftmost = false;
+		}
 	}
 	rb_link_node(&sym->rb_node, parent, p);
-	rb_insert_color(&sym->rb_node, symbols);
+	rb_insert_color_cached(&sym->rb_node, symbols, leftmost);
 }
 
-void symbols__insert(struct rb_root *symbols, struct symbol *sym)
+void symbols__insert(struct rb_root_cached *symbols, struct symbol *sym)
 {
 	__symbols__insert(symbols, sym, false);
 }
 
-static struct symbol *symbols__find(struct rb_root *symbols, u64 ip)
+static struct symbol *symbols__find(struct rb_root_cached *symbols, u64 ip)
 {
 	struct rb_node *n;
 
 	if (symbols == NULL)
 		return NULL;
 
-	n = symbols->rb_node;
+	n = symbols->rb_root.rb_node;
 
 	while (n) {
 		struct symbol *s = rb_entry(n, struct symbol, rb_node);
@@ -353,9 +357,9 @@ static struct symbol *symbols__find(struct rb_root *symbols, u64 ip)
 	return NULL;
 }
 
-static struct symbol *symbols__first(struct rb_root *symbols)
+static struct symbol *symbols__first(struct rb_root_cached *symbols)
 {
-	struct rb_node *n = rb_first(symbols);
+	struct rb_node *n = rb_first_cached(symbols);
 
 	if (n)
 		return rb_entry(n, struct symbol, rb_node);
@@ -363,9 +367,9 @@ static struct symbol *symbols__first(struct rb_root *symbols)
 	return NULL;
 }
 
-static struct symbol *symbols__last(struct rb_root *symbols)
+static struct symbol *symbols__last(struct rb_root_cached *symbols)
 {
-	struct rb_node *n = rb_last(symbols);
+	struct rb_node *n = rb_last(&symbols->rb_root);
 
 	if (n)
 		return rb_entry(n, struct symbol, rb_node);
@@ -383,11 +387,12 @@ static struct symbol *symbols__next(struct symbol *sym)
 	return NULL;
 }
 
-static void symbols__insert_by_name(struct rb_root *symbols, struct symbol *sym)
+static void symbols__insert_by_name(struct rb_root_cached *symbols, struct symbol *sym)
 {
-	struct rb_node **p = &symbols->rb_node;
+	struct rb_node **p = &symbols->rb_root.rb_node;
 	struct rb_node *parent = NULL;
 	struct symbol_name_rb_node *symn, *s;
+	bool leftmost = true;
 
 	symn = container_of(sym, struct symbol_name_rb_node, sym);
 
@@ -396,19 +401,21 @@ static void symbols__insert_by_name(struct rb_root *symbols, struct symbol *sym)
 		s = rb_entry(parent, struct symbol_name_rb_node, rb_node);
 		if (strcmp(sym->name, s->sym.name) < 0)
 			p = &(*p)->rb_left;
-		else
+		else {
 			p = &(*p)->rb_right;
+			leftmost = false;
+		}
 	}
 	rb_link_node(&symn->rb_node, parent, p);
-	rb_insert_color(&symn->rb_node, symbols);
+	rb_insert_color_cached(&symn->rb_node, symbols, leftmost);
 }
 
-static void symbols__sort_by_name(struct rb_root *symbols,
-				  struct rb_root *source)
+static void symbols__sort_by_name(struct rb_root_cached *symbols,
+				  struct rb_root_cached *source)
 {
 	struct rb_node *nd;
 
-	for (nd = rb_first(source); nd; nd = rb_next(nd)) {
+	for (nd = rb_first_cached(source); nd; nd = rb_next(nd)) {
 		struct symbol *pos = rb_entry(nd, struct symbol, rb_node);
 		symbols__insert_by_name(symbols, pos);
 	}
@@ -431,7 +438,7 @@ int symbol__match_symbol_name(const char *name, const char *str,
 		return arch__compare_symbol_names(name, str);
 }
 
-static struct symbol *symbols__find_by_name(struct rb_root *symbols,
+static struct symbol *symbols__find_by_name(struct rb_root_cached *symbols,
 					    const char *name,
 					    enum symbol_tag_include includes)
 {
@@ -441,7 +448,7 @@ static struct symbol *symbols__find_by_name(struct rb_root *symbols,
 	if (symbols == NULL)
 		return NULL;
 
-	n = symbols->rb_node;
+	n = symbols->rb_root.rb_node;
 
 	while (n) {
 		int cmp;
@@ -643,7 +650,7 @@ static int map__process_kallsym_symbol(void *arg, const char *name,
 {
 	struct symbol *sym;
 	struct dso *dso = arg;
-	struct rb_root *root = &dso->symbols;
+	struct rb_root_cached *root = &dso->symbols;
 
 	if (!symbol_type__filter(type))
 		return 0;
@@ -680,14 +687,14 @@ static int map_groups__split_kallsyms_for_kcore(struct map_groups *kmaps, struct
 	struct map *curr_map;
 	struct symbol *pos;
 	int count = 0;
-	struct rb_root old_root = dso->symbols;
-	struct rb_root *root = &dso->symbols;
-	struct rb_node *next = rb_first(root);
+	struct rb_root_cached old_root = dso->symbols;
+	struct rb_root_cached *root = &dso->symbols;
+	struct rb_node *next = rb_first_cached(root);
 
 	if (!kmaps)
 		return -1;
 
-	*root = RB_ROOT;
+	*root = RB_ROOT_CACHED;
 
 	while (next) {
 		char *module;
@@ -695,8 +702,8 @@ static int map_groups__split_kallsyms_for_kcore(struct map_groups *kmaps, struct
 		pos = rb_entry(next, struct symbol, rb_node);
 		next = rb_next(&pos->rb_node);
 
-		rb_erase_init(&pos->rb_node, &old_root);
-
+		rb_erase_cached(&pos->rb_node, &old_root);
+		RB_CLEAR_NODE(&pos->rb_node);
 		module = strchr(pos->name, '\t');
 		if (module)
 			*module = '\0';
@@ -733,8 +740,8 @@ static int map_groups__split_kallsyms(struct map_groups *kmaps, struct dso *dso,
 	struct map *curr_map = initial_map;
 	struct symbol *pos;
 	int count = 0, moved = 0;
-	struct rb_root *root = &dso->symbols;
-	struct rb_node *next = rb_first(root);
+	struct rb_root_cached *root = &dso->symbols;
+	struct rb_node *next = rb_first_cached(root);
 	int kernel_range = 0;
 	bool x86_64;
 
@@ -848,7 +855,7 @@ static int map_groups__split_kallsyms(struct map_groups *kmaps, struct dso *dso,
 		}
 add_symbol:
 		if (curr_map != initial_map) {
-			rb_erase(&pos->rb_node, root);
+			rb_erase_cached(&pos->rb_node, root);
 			symbols__insert(&curr_map->dso->symbols, pos);
 			++moved;
 		} else
@@ -856,7 +863,7 @@ static int map_groups__split_kallsyms(struct map_groups *kmaps, struct dso *dso,
 
 		continue;
 discard_symbol:
-		rb_erase(&pos->rb_node, root);
+		rb_erase_cached(&pos->rb_node, root);
 		symbol__delete(pos);
 	}
 
diff --git a/tools/perf/util/symbol.h b/tools/perf/util/symbol.h
index d026d215bdc6..e21ba7299632 100644
--- a/tools/perf/util/symbol.h
+++ b/tools/perf/util/symbol.h
@@ -67,7 +67,7 @@ struct symbol {
 };
 
 void symbol__delete(struct symbol *sym);
-void symbols__delete(struct rb_root *symbols);
+void symbols__delete(struct rb_root_cached *symbols);
 
 /* symbols__for_each_entry - iterate over symbols (rb_root)
  *
@@ -76,7 +76,7 @@ void symbols__delete(struct rb_root *symbols);
  * @nd: the 'struct rb_node *' to use as a temporary storage
  */
 #define symbols__for_each_entry(symbols, pos, nd)			\
-	for (nd = rb_first(symbols);					\
+	for (nd = rb_first_cached(symbols);					\
 	     nd && (pos = rb_entry(nd, struct symbol, rb_node));	\
 	     nd = rb_next(nd))
 
@@ -309,10 +309,11 @@ int dso__synthesize_plt_symbols(struct dso *dso, struct symsrc *ss);
 
 char *dso__demangle_sym(struct dso *dso, int kmodule, const char *elf_name);
 
-void __symbols__insert(struct rb_root *symbols, struct symbol *sym, bool kernel);
-void symbols__insert(struct rb_root *symbols, struct symbol *sym);
-void symbols__fixup_duplicate(struct rb_root *symbols);
-void symbols__fixup_end(struct rb_root *symbols);
+void __symbols__insert(struct rb_root_cached *symbols, struct symbol *sym,
+		       bool kernel);
+void symbols__insert(struct rb_root_cached *symbols, struct symbol *sym);
+void symbols__fixup_duplicate(struct rb_root_cached *symbols);
+void symbols__fixup_end(struct rb_root_cached *symbols);
 void map_groups__fixup_end(struct map_groups *mg);
 
 typedef int (*mapfn_t)(u64 start, u64 len, u64 pgoff, void *data);
diff --git a/tools/perf/util/symbol_fprintf.c b/tools/perf/util/symbol_fprintf.c
index ed0205cc7942..a86f89f357d9 100644
--- a/tools/perf/util/symbol_fprintf.c
+++ b/tools/perf/util/symbol_fprintf.c
@@ -64,7 +64,7 @@ size_t dso__fprintf_symbols_by_name(struct dso *dso,
 	struct rb_node *nd;
 	struct symbol_name_rb_node *pos;
 
-	for (nd = rb_first(&dso->symbol_names); nd; nd = rb_next(nd)) {
+	for (nd = rb_first_cached(&dso->symbol_names); nd; nd = rb_next(nd)) {
 		pos = rb_entry(nd, struct symbol_name_rb_node, rb_node);
 		fprintf(fp, "%s\n", pos->sym.name);
 	}
-- 
2.16.4


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH 6/7] perf hist: Use cached rbtrees
  2018-12-06 19:18 [PATCH v2 -tip 0/7] tools/perf: Update rbtree implementation and optimize users Davidlohr Bueso
                   ` (4 preceding siblings ...)
  2018-12-06 19:18 ` [PATCH 5/7] perf symbols: Use cached rbtrees Davidlohr Bueso
@ 2018-12-06 19:18 ` Davidlohr Bueso
  2019-01-22 13:59   ` Arnaldo Carvalho de Melo
  2019-01-26 10:04   ` [tip:perf/core] " tip-bot for Davidlohr Bueso
  2018-12-06 19:18 ` [PATCH 7/7] perf sched: " Davidlohr Bueso
  6 siblings, 2 replies; 18+ messages in thread
From: Davidlohr Bueso @ 2018-12-06 19:18 UTC (permalink / raw)
  To: acme; +Cc: mingo, linux-kernel, dave, Davidlohr Bueso

At the cost of an extra pointer, we can avoid the O(logN) cost
of finding the first element in the tree (smallest node), which
is something heavily required for histograms. Specifically, the
following are converted to rb_root_cached, and users accordingly:

hist::entries_in_array
hist::entries_in
hist::entries
hist::entries_collapsed
hist_entry::hroot_in
hist_entry::hroot_out

Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
---
 tools/perf/builtin-annotate.c     |   2 +-
 tools/perf/builtin-c2c.c          |   6 +-
 tools/perf/builtin-diff.c         |  10 +-
 tools/perf/builtin-top.c          |   2 +-
 tools/perf/tests/hists_common.c   |   8 +-
 tools/perf/tests/hists_cumulate.c |  14 +--
 tools/perf/tests/hists_link.c     |   8 +-
 tools/perf/tests/hists_output.c   |  32 +++---
 tools/perf/ui/browsers/hists.c    |   6 +-
 tools/perf/ui/gtk/hists.c         |   4 +-
 tools/perf/ui/stdio/hist.c        |   3 +-
 tools/perf/util/hist.c            | 199 +++++++++++++++++++++-----------------
 tools/perf/util/hist.h            |  10 +-
 tools/perf/util/sort.h            |   4 +-
 14 files changed, 167 insertions(+), 141 deletions(-)

diff --git a/tools/perf/builtin-annotate.c b/tools/perf/builtin-annotate.c
index cc3da5564300..20db635347e5 100644
--- a/tools/perf/builtin-annotate.c
+++ b/tools/perf/builtin-annotate.c
@@ -305,7 +305,7 @@ static void hists__find_annotations(struct hists *hists,
 				    struct perf_evsel *evsel,
 				    struct perf_annotate *ann)
 {
-	struct rb_node *nd = rb_first(&hists->entries), *next;
+	struct rb_node *nd = rb_first_cached(&hists->entries), *next;
 	int key = K_RIGHT;
 
 	while (nd) {
diff --git a/tools/perf/builtin-c2c.c b/tools/perf/builtin-c2c.c
index f3aa9d02a5ab..8274de8812ea 100644
--- a/tools/perf/builtin-c2c.c
+++ b/tools/perf/builtin-c2c.c
@@ -2088,7 +2088,7 @@ static int resort_hitm_cb(struct hist_entry *he)
 
 static int hists__iterate_cb(struct hists *hists, hists__resort_cb_t cb)
 {
-	struct rb_node *next = rb_first(&hists->entries);
+	struct rb_node *next = rb_first_cached(&hists->entries);
 	int ret = 0;
 
 	while (next) {
@@ -2215,7 +2215,7 @@ static void print_pareto(FILE *out)
 	if (WARN_ONCE(ret, "failed to setup sort entries\n"))
 		return;
 
-	nd = rb_first(&c2c.hists.hists.entries);
+	nd = rb_first_cached(&c2c.hists.hists.entries);
 
 	for (; nd; nd = rb_next(nd)) {
 		struct hist_entry *he = rb_entry(nd, struct hist_entry, rb_node);
@@ -2283,7 +2283,7 @@ static void perf_c2c__hists_fprintf(FILE *out, struct perf_session *session)
 static void c2c_browser__update_nr_entries(struct hist_browser *hb)
 {
 	u64 nr_entries = 0;
-	struct rb_node *nd = rb_first(&hb->hists->entries);
+	struct rb_node *nd = rb_first_cached(&hb->hists->entries);
 
 	while (nd) {
 		struct hist_entry *he = rb_entry(nd, struct hist_entry, rb_node);
diff --git a/tools/perf/builtin-diff.c b/tools/perf/builtin-diff.c
index 39db2ee32d48..751e1971456b 100644
--- a/tools/perf/builtin-diff.c
+++ b/tools/perf/builtin-diff.c
@@ -429,7 +429,7 @@ get_pair_fmt(struct hist_entry *he, struct diff_hpp_fmt *dfmt)
 
 static void hists__baseline_only(struct hists *hists)
 {
-	struct rb_root *root;
+	struct rb_root_cached *root;
 	struct rb_node *next;
 
 	if (hists__has(hists, need_collapse))
@@ -437,13 +437,13 @@ static void hists__baseline_only(struct hists *hists)
 	else
 		root = hists->entries_in;
 
-	next = rb_first(root);
+	next = rb_first_cached(root);
 	while (next != NULL) {
 		struct hist_entry *he = rb_entry(next, struct hist_entry, rb_node_in);
 
 		next = rb_next(&he->rb_node_in);
 		if (!hist_entry__next_pair(he)) {
-			rb_erase(&he->rb_node_in, root);
+			rb_erase_cached(&he->rb_node_in, root);
 			hist_entry__delete(he);
 		}
 	}
@@ -451,7 +451,7 @@ static void hists__baseline_only(struct hists *hists)
 
 static void hists__precompute(struct hists *hists)
 {
-	struct rb_root *root;
+	struct rb_root_cached *root;
 	struct rb_node *next;
 
 	if (hists__has(hists, need_collapse))
@@ -459,7 +459,7 @@ static void hists__precompute(struct hists *hists)
 	else
 		root = hists->entries_in;
 
-	next = rb_first(root);
+	next = rb_first_cached(root);
 	while (next != NULL) {
 		struct hist_entry *he, *pair;
 		struct data__file *d;
diff --git a/tools/perf/builtin-top.c b/tools/perf/builtin-top.c
index aa0c73e57924..92575df265a9 100644
--- a/tools/perf/builtin-top.c
+++ b/tools/perf/builtin-top.c
@@ -367,7 +367,7 @@ static void perf_top__prompt_symbol(struct perf_top *top, const char *msg)
 	if (p)
 		*p = 0;
 
-	next = rb_first(&hists->entries);
+	next = rb_first_cached(&hists->entries);
 	while (next) {
 		n = rb_entry(next, struct hist_entry, rb_node);
 		if (n->ms.sym && !strcmp(buf, n->ms.sym->name)) {
diff --git a/tools/perf/tests/hists_common.c b/tools/perf/tests/hists_common.c
index b889a28fd80b..6b0649c8b33e 100644
--- a/tools/perf/tests/hists_common.c
+++ b/tools/perf/tests/hists_common.c
@@ -161,7 +161,7 @@ struct machine *setup_fake_machine(struct machines *machines)
 void print_hists_in(struct hists *hists)
 {
 	int i = 0;
-	struct rb_root *root;
+	struct rb_root_cached *root;
 	struct rb_node *node;
 
 	if (hists__has(hists, need_collapse))
@@ -170,7 +170,7 @@ void print_hists_in(struct hists *hists)
 		root = hists->entries_in;
 
 	pr_info("----- %s --------\n", __func__);
-	node = rb_first(root);
+	node = rb_first_cached(root);
 	while (node) {
 		struct hist_entry *he;
 
@@ -191,13 +191,13 @@ void print_hists_in(struct hists *hists)
 void print_hists_out(struct hists *hists)
 {
 	int i = 0;
-	struct rb_root *root;
+	struct rb_root_cached *root;
 	struct rb_node *node;
 
 	root = &hists->entries;
 
 	pr_info("----- %s --------\n", __func__);
-	node = rb_first(root);
+	node = rb_first_cached(root);
 	while (node) {
 		struct hist_entry *he;
 
diff --git a/tools/perf/tests/hists_cumulate.c b/tools/perf/tests/hists_cumulate.c
index 65fe02bebbee..4ca417d1a06b 100644
--- a/tools/perf/tests/hists_cumulate.c
+++ b/tools/perf/tests/hists_cumulate.c
@@ -125,8 +125,8 @@ static int add_hist_entries(struct hists *hists, struct machine *machine)
 static void del_hist_entries(struct hists *hists)
 {
 	struct hist_entry *he;
-	struct rb_root *root_in;
-	struct rb_root *root_out;
+	struct rb_root_cached *root_in;
+	struct rb_root_cached *root_out;
 	struct rb_node *node;
 
 	if (hists__has(hists, need_collapse))
@@ -136,12 +136,12 @@ static void del_hist_entries(struct hists *hists)
 
 	root_out = &hists->entries;
 
-	while (!RB_EMPTY_ROOT(root_out)) {
-		node = rb_first(root_out);
+	while (!RB_EMPTY_ROOT(&root_out->rb_root)) {
+		node = rb_first_cached(root_out);
 
 		he = rb_entry(node, struct hist_entry, rb_node);
-		rb_erase(node, root_out);
-		rb_erase(&he->rb_node_in, root_in);
+		rb_erase_cached(node, root_out);
+		rb_erase_cached(&he->rb_node_in, root_in);
 		hist_entry__delete(he);
 	}
 }
@@ -198,7 +198,7 @@ static int do_test(struct hists *hists, struct result *expected, size_t nr_expec
 		print_hists_out(hists);
 	}
 
-	root = &hists->entries;
+	root = &hists->entries.rb_root;
 	for (node = rb_first(root), i = 0;
 	     node && (he = rb_entry(node, struct hist_entry, rb_node));
 	     node = rb_next(node), i++) {
diff --git a/tools/perf/tests/hists_link.c b/tools/perf/tests/hists_link.c
index 9a9d06cb0222..af633db63f4d 100644
--- a/tools/perf/tests/hists_link.c
+++ b/tools/perf/tests/hists_link.c
@@ -142,7 +142,7 @@ static int find_sample(struct sample *samples, size_t nr_samples,
 static int __validate_match(struct hists *hists)
 {
 	size_t count = 0;
-	struct rb_root *root;
+	struct rb_root_cached *root;
 	struct rb_node *node;
 
 	/*
@@ -153,7 +153,7 @@ static int __validate_match(struct hists *hists)
 	else
 		root = hists->entries_in;
 
-	node = rb_first(root);
+	node = rb_first_cached(root);
 	while (node) {
 		struct hist_entry *he;
 
@@ -192,7 +192,7 @@ static int __validate_link(struct hists *hists, int idx)
 	size_t count = 0;
 	size_t count_pair = 0;
 	size_t count_dummy = 0;
-	struct rb_root *root;
+	struct rb_root_cached *root;
 	struct rb_node *node;
 
 	/*
@@ -205,7 +205,7 @@ static int __validate_link(struct hists *hists, int idx)
 	else
 		root = hists->entries_in;
 
-	node = rb_first(root);
+	node = rb_first_cached(root);
 	while (node) {
 		struct hist_entry *he;
 
diff --git a/tools/perf/tests/hists_output.c b/tools/perf/tests/hists_output.c
index faacb4f41460..6c4bc68a77ee 100644
--- a/tools/perf/tests/hists_output.c
+++ b/tools/perf/tests/hists_output.c
@@ -91,8 +91,8 @@ static int add_hist_entries(struct hists *hists, struct machine *machine)
 static void del_hist_entries(struct hists *hists)
 {
 	struct hist_entry *he;
-	struct rb_root *root_in;
-	struct rb_root *root_out;
+	struct rb_root_cached *root_in;
+	struct rb_root_cached *root_out;
 	struct rb_node *node;
 
 	if (hists__has(hists, need_collapse))
@@ -102,12 +102,12 @@ static void del_hist_entries(struct hists *hists)
 
 	root_out = &hists->entries;
 
-	while (!RB_EMPTY_ROOT(root_out)) {
-		node = rb_first(root_out);
+	while (!RB_EMPTY_ROOT(&root_out->rb_root)) {
+		node = rb_first_cached(root_out);
 
 		he = rb_entry(node, struct hist_entry, rb_node);
-		rb_erase(node, root_out);
-		rb_erase(&he->rb_node_in, root_in);
+		rb_erase_cached(node, root_out);
+		rb_erase_cached(&he->rb_node_in, root_in);
 		hist_entry__delete(he);
 	}
 }
@@ -126,7 +126,7 @@ static int test1(struct perf_evsel *evsel, struct machine *machine)
 	int err;
 	struct hists *hists = evsel__hists(evsel);
 	struct hist_entry *he;
-	struct rb_root *root;
+	struct rb_root_cached *root;
 	struct rb_node *node;
 
 	field_order = NULL;
@@ -162,7 +162,7 @@ static int test1(struct perf_evsel *evsel, struct machine *machine)
 	}
 
 	root = &hists->entries;
-	node = rb_first(root);
+	node = rb_first_cached(root);
 	he = rb_entry(node, struct hist_entry, rb_node);
 	TEST_ASSERT_VAL("Invalid hist entry",
 			!strcmp(COMM(he), "perf") && !strcmp(DSO(he), "perf") &&
@@ -228,7 +228,7 @@ static int test2(struct perf_evsel *evsel, struct machine *machine)
 	int err;
 	struct hists *hists = evsel__hists(evsel);
 	struct hist_entry *he;
-	struct rb_root *root;
+	struct rb_root_cached *root;
 	struct rb_node *node;
 
 	field_order = "overhead,cpu";
@@ -262,7 +262,7 @@ static int test2(struct perf_evsel *evsel, struct machine *machine)
 	}
 
 	root = &hists->entries;
-	node = rb_first(root);
+	node = rb_first_cached(root);
 	he = rb_entry(node, struct hist_entry, rb_node);
 	TEST_ASSERT_VAL("Invalid hist entry",
 			CPU(he) == 1 && PID(he) == 100 && he->stat.period == 300);
@@ -284,7 +284,7 @@ static int test3(struct perf_evsel *evsel, struct machine *machine)
 	int err;
 	struct hists *hists = evsel__hists(evsel);
 	struct hist_entry *he;
-	struct rb_root *root;
+	struct rb_root_cached *root;
 	struct rb_node *node;
 
 	field_order = "comm,overhead,dso";
@@ -316,7 +316,7 @@ static int test3(struct perf_evsel *evsel, struct machine *machine)
 	}
 
 	root = &hists->entries;
-	node = rb_first(root);
+	node = rb_first_cached(root);
 	he = rb_entry(node, struct hist_entry, rb_node);
 	TEST_ASSERT_VAL("Invalid hist entry",
 			!strcmp(COMM(he), "bash") && !strcmp(DSO(he), "bash") &&
@@ -358,7 +358,7 @@ static int test4(struct perf_evsel *evsel, struct machine *machine)
 	int err;
 	struct hists *hists = evsel__hists(evsel);
 	struct hist_entry *he;
-	struct rb_root *root;
+	struct rb_root_cached *root;
 	struct rb_node *node;
 
 	field_order = "dso,sym,comm,overhead,dso";
@@ -394,7 +394,7 @@ static int test4(struct perf_evsel *evsel, struct machine *machine)
 	}
 
 	root = &hists->entries;
-	node = rb_first(root);
+	node = rb_first_cached(root);
 	he = rb_entry(node, struct hist_entry, rb_node);
 	TEST_ASSERT_VAL("Invalid hist entry",
 			!strcmp(DSO(he), "perf") && !strcmp(SYM(he), "cmd_record") &&
@@ -460,7 +460,7 @@ static int test5(struct perf_evsel *evsel, struct machine *machine)
 	int err;
 	struct hists *hists = evsel__hists(evsel);
 	struct hist_entry *he;
-	struct rb_root *root;
+	struct rb_root_cached *root;
 	struct rb_node *node;
 
 	field_order = "cpu,pid,comm,dso,sym";
@@ -497,7 +497,7 @@ static int test5(struct perf_evsel *evsel, struct machine *machine)
 	}
 
 	root = &hists->entries;
-	node = rb_first(root);
+	node = rb_first_cached(root);
 	he = rb_entry(node, struct hist_entry, rb_node);
 
 	TEST_ASSERT_VAL("Invalid hist entry",
diff --git a/tools/perf/ui/browsers/hists.c b/tools/perf/ui/browsers/hists.c
index a96f62ca984a..3809029bb088 100644
--- a/tools/perf/ui/browsers/hists.c
+++ b/tools/perf/ui/browsers/hists.c
@@ -49,7 +49,7 @@ static int hist_browser__get_folding(struct hist_browser *browser)
 	struct hists *hists = browser->hists;
 	int unfolded_rows = 0;
 
-	for (nd = rb_first(&hists->entries);
+	for (nd = rb_first_cached(&hists->entries);
 	     (nd = hists__filter_entries(nd, browser->min_pcnt)) != NULL;
 	     nd = rb_hierarchy_next(nd)) {
 		struct hist_entry *he =
@@ -267,7 +267,7 @@ static int hierarchy_count_rows(struct hist_browser *hb, struct hist_entry *he,
 	if (he->has_no_entry)
 		return 1;
 
-	node = rb_first(&he->hroot_out);
+	node = rb_first_cached(&he->hroot_out);
 	while (node) {
 		float percent;
 
@@ -372,7 +372,7 @@ static void hist_entry__init_have_children(struct hist_entry *he)
 		he->has_children = !RB_EMPTY_ROOT(&he->sorted_chain);
 		callchain__init_have_children(&he->sorted_chain);
 	} else {
-		he->has_children = !RB_EMPTY_ROOT(&he->hroot_out);
+		he->has_children = !RB_EMPTY_ROOT(&he->hroot_out.rb_root);
 	}
 
 	he->init_have_children = true;
diff --git a/tools/perf/ui/gtk/hists.c b/tools/perf/ui/gtk/hists.c
index 4ab663ec3e5e..dfab3deef028 100644
--- a/tools/perf/ui/gtk/hists.c
+++ b/tools/perf/ui/gtk/hists.c
@@ -353,7 +353,7 @@ static void perf_gtk__show_hists(GtkWidget *window, struct hists *hists,
 
 	g_object_unref(GTK_TREE_MODEL(store));
 
-	for (nd = rb_first(&hists->entries); nd; nd = rb_next(nd)) {
+	for (nd = rb_first_cached(&hists->entries); nd; nd = rb_next(nd)) {
 		struct hist_entry *h = rb_entry(nd, struct hist_entry, rb_node);
 		GtkTreeIter iter;
 		u64 total = hists__total_period(h->hists);
@@ -578,7 +578,7 @@ static void perf_gtk__show_hierarchy(GtkWidget *window, struct hists *hists,
 	gtk_tree_view_set_model(GTK_TREE_VIEW(view), GTK_TREE_MODEL(store));
 	g_object_unref(GTK_TREE_MODEL(store));
 
-	perf_gtk__add_hierarchy_entries(hists, &hists->entries, store,
+	perf_gtk__add_hierarchy_entries(hists, &hists->entries.rb_root, store,
 					NULL, &hpp, min_pcnt);
 
 	gtk_tree_view_set_rules_hint(GTK_TREE_VIEW(view), TRUE);
diff --git a/tools/perf/ui/stdio/hist.c b/tools/perf/ui/stdio/hist.c
index 74c4ae1f0a05..2a9fa4dcbc2a 100644
--- a/tools/perf/ui/stdio/hist.c
+++ b/tools/perf/ui/stdio/hist.c
@@ -788,7 +788,8 @@ size_t hists__fprintf(struct hists *hists, bool show_header, int max_rows,
 
 	indent = hists__overhead_width(hists) + 4;
 
-	for (nd = rb_first(&hists->entries); nd; nd = __rb_hierarchy_next(nd, HMD_FORCE_CHILD)) {
+	for (nd = rb_first_cached(&hists->entries); nd;
+	     nd = __rb_hierarchy_next(nd, HMD_FORCE_CHILD)) {
 		struct hist_entry *h = rb_entry(nd, struct hist_entry, rb_node);
 		float percent;
 
diff --git a/tools/perf/util/hist.c b/tools/perf/util/hist.c
index 828cb9794c76..1cc3f2958fee 100644
--- a/tools/perf/util/hist.c
+++ b/tools/perf/util/hist.c
@@ -209,7 +209,7 @@ void hists__calc_col_len(struct hists *hists, struct hist_entry *h)
 
 void hists__output_recalc_col_len(struct hists *hists, int max_rows)
 {
-	struct rb_node *next = rb_first(&hists->entries);
+	struct rb_node *next = rb_first_cached(&hists->entries);
 	struct hist_entry *n;
 	int row = 0;
 
@@ -296,7 +296,7 @@ static bool hists__decay_entry(struct hists *hists, struct hist_entry *he)
 
 	if (!he->leaf) {
 		struct hist_entry *child;
-		struct rb_node *node = rb_first(&he->hroot_out);
+		struct rb_node *node = rb_first_cached(&he->hroot_out);
 		while (node) {
 			child = rb_entry(node, struct hist_entry, rb_node);
 			node = rb_next(node);
@@ -311,8 +311,8 @@ static bool hists__decay_entry(struct hists *hists, struct hist_entry *he)
 
 static void hists__delete_entry(struct hists *hists, struct hist_entry *he)
 {
-	struct rb_root *root_in;
-	struct rb_root *root_out;
+	struct rb_root_cached *root_in;
+	struct rb_root_cached *root_out;
 
 	if (he->parent_he) {
 		root_in  = &he->parent_he->hroot_in;
@@ -325,8 +325,8 @@ static void hists__delete_entry(struct hists *hists, struct hist_entry *he)
 		root_out = &hists->entries;
 	}
 
-	rb_erase(&he->rb_node_in, root_in);
-	rb_erase(&he->rb_node, root_out);
+	rb_erase_cached(&he->rb_node_in, root_in);
+	rb_erase_cached(&he->rb_node, root_out);
 
 	--hists->nr_entries;
 	if (!he->filtered)
@@ -337,7 +337,7 @@ static void hists__delete_entry(struct hists *hists, struct hist_entry *he)
 
 void hists__decay_entries(struct hists *hists, bool zap_user, bool zap_kernel)
 {
-	struct rb_node *next = rb_first(&hists->entries);
+	struct rb_node *next = rb_first_cached(&hists->entries);
 	struct hist_entry *n;
 
 	while (next) {
@@ -353,7 +353,7 @@ void hists__decay_entries(struct hists *hists, bool zap_user, bool zap_kernel)
 
 void hists__delete_entries(struct hists *hists)
 {
-	struct rb_node *next = rb_first(&hists->entries);
+	struct rb_node *next = rb_first_cached(&hists->entries);
 	struct hist_entry *n;
 
 	while (next) {
@@ -435,8 +435,8 @@ static int hist_entry__init(struct hist_entry *he,
 	}
 	INIT_LIST_HEAD(&he->pairs.node);
 	thread__get(he->thread);
-	he->hroot_in  = RB_ROOT;
-	he->hroot_out = RB_ROOT;
+	he->hroot_in  = RB_ROOT_CACHED;
+	he->hroot_out = RB_ROOT_CACHED;
 
 	if (!symbol_conf.report_hierarchy)
 		he->leaf = true;
@@ -513,8 +513,9 @@ static struct hist_entry *hists__findnew_entry(struct hists *hists,
 	int64_t cmp;
 	u64 period = entry->stat.period;
 	u64 weight = entry->stat.weight;
+	bool leftmost = true;
 
-	p = &hists->entries_in->rb_node;
+	p = &hists->entries_in->rb_root.rb_node;
 
 	while (*p != NULL) {
 		parent = *p;
@@ -557,8 +558,10 @@ static struct hist_entry *hists__findnew_entry(struct hists *hists,
 
 		if (cmp < 0)
 			p = &(*p)->rb_left;
-		else
+		else {
 			p = &(*p)->rb_right;
+			leftmost = false;
+		}
 	}
 
 	he = hist_entry__new(entry, sample_self);
@@ -570,7 +573,7 @@ static struct hist_entry *hists__findnew_entry(struct hists *hists,
 	hists->nr_entries++;
 
 	rb_link_node(&he->rb_node_in, parent, p);
-	rb_insert_color(&he->rb_node_in, hists->entries_in);
+	rb_insert_color_cached(&he->rb_node_in, hists->entries_in, leftmost);
 out:
 	if (sample_self)
 		he_stat__add_cpumode_period(&he->stat, al->cpumode, period);
@@ -1279,16 +1282,17 @@ static void hist_entry__apply_hierarchy_filters(struct hist_entry *he)
 }
 
 static struct hist_entry *hierarchy_insert_entry(struct hists *hists,
-						 struct rb_root *root,
+						 struct rb_root_cached *root,
 						 struct hist_entry *he,
 						 struct hist_entry *parent_he,
 						 struct perf_hpp_list *hpp_list)
 {
-	struct rb_node **p = &root->rb_node;
+	struct rb_node **p = &root->rb_root.rb_node;
 	struct rb_node *parent = NULL;
 	struct hist_entry *iter, *new;
 	struct perf_hpp_fmt *fmt;
 	int64_t cmp;
+	bool leftmost = true;
 
 	while (*p != NULL) {
 		parent = *p;
@@ -1308,8 +1312,10 @@ static struct hist_entry *hierarchy_insert_entry(struct hists *hists,
 
 		if (cmp < 0)
 			p = &parent->rb_left;
-		else
+		else {
 			p = &parent->rb_right;
+			leftmost = false;
+		}
 	}
 
 	new = hist_entry__new(he, true);
@@ -1343,12 +1349,12 @@ static struct hist_entry *hierarchy_insert_entry(struct hists *hists,
 	}
 
 	rb_link_node(&new->rb_node_in, parent, p);
-	rb_insert_color(&new->rb_node_in, root);
+	rb_insert_color_cached(&new->rb_node_in, root, leftmost);
 	return new;
 }
 
 static int hists__hierarchy_insert_entry(struct hists *hists,
-					 struct rb_root *root,
+					 struct rb_root_cached *root,
 					 struct hist_entry *he)
 {
 	struct perf_hpp_list_node *node;
@@ -1395,13 +1401,14 @@ static int hists__hierarchy_insert_entry(struct hists *hists,
 }
 
 static int hists__collapse_insert_entry(struct hists *hists,
-					struct rb_root *root,
+					struct rb_root_cached *root,
 					struct hist_entry *he)
 {
-	struct rb_node **p = &root->rb_node;
+	struct rb_node **p = &root->rb_root.rb_node;
 	struct rb_node *parent = NULL;
 	struct hist_entry *iter;
 	int64_t cmp;
+	bool leftmost = true;
 
 	if (symbol_conf.report_hierarchy)
 		return hists__hierarchy_insert_entry(hists, root, he);
@@ -1432,19 +1439,21 @@ static int hists__collapse_insert_entry(struct hists *hists,
 
 		if (cmp < 0)
 			p = &(*p)->rb_left;
-		else
+		else {
 			p = &(*p)->rb_right;
+			leftmost = false;
+		}
 	}
 	hists->nr_entries++;
 
 	rb_link_node(&he->rb_node_in, parent, p);
-	rb_insert_color(&he->rb_node_in, root);
+	rb_insert_color_cached(&he->rb_node_in, root, leftmost);
 	return 1;
 }
 
-struct rb_root *hists__get_rotate_entries_in(struct hists *hists)
+struct rb_root_cached *hists__get_rotate_entries_in(struct hists *hists)
 {
-	struct rb_root *root;
+	struct rb_root_cached *root;
 
 	pthread_mutex_lock(&hists->lock);
 
@@ -1467,7 +1476,7 @@ static void hists__apply_filters(struct hists *hists, struct hist_entry *he)
 
 int hists__collapse_resort(struct hists *hists, struct ui_progress *prog)
 {
-	struct rb_root *root;
+	struct rb_root_cached *root;
 	struct rb_node *next;
 	struct hist_entry *n;
 	int ret;
@@ -1479,7 +1488,7 @@ int hists__collapse_resort(struct hists *hists, struct ui_progress *prog)
 
 	root = hists__get_rotate_entries_in(hists);
 
-	next = rb_first(root);
+	next = rb_first_cached(root);
 
 	while (next) {
 		if (session_done())
@@ -1487,7 +1496,7 @@ int hists__collapse_resort(struct hists *hists, struct ui_progress *prog)
 		n = rb_entry(next, struct hist_entry, rb_node_in);
 		next = rb_next(&n->rb_node_in);
 
-		rb_erase(&n->rb_node_in, root);
+		rb_erase_cached(&n->rb_node_in, root);
 		ret = hists__collapse_insert_entry(hists, &hists->entries_collapsed, n);
 		if (ret < 0)
 			return -1;
@@ -1558,7 +1567,7 @@ static void hierarchy_recalc_total_periods(struct hists *hists)
 	struct rb_node *node;
 	struct hist_entry *he;
 
-	node = rb_first(&hists->entries);
+	node = rb_first_cached(&hists->entries);
 
 	hists->stats.total_period = 0;
 	hists->stats.total_non_filtered_period = 0;
@@ -1578,13 +1587,14 @@ static void hierarchy_recalc_total_periods(struct hists *hists)
 	}
 }
 
-static void hierarchy_insert_output_entry(struct rb_root *root,
+static void hierarchy_insert_output_entry(struct rb_root_cached *root,
 					  struct hist_entry *he)
 {
-	struct rb_node **p = &root->rb_node;
+	struct rb_node **p = &root->rb_root.rb_node;
 	struct rb_node *parent = NULL;
 	struct hist_entry *iter;
 	struct perf_hpp_fmt *fmt;
+	bool leftmost = true;
 
 	while (*p != NULL) {
 		parent = *p;
@@ -1592,12 +1602,14 @@ static void hierarchy_insert_output_entry(struct rb_root *root,
 
 		if (hist_entry__sort(he, iter) > 0)
 			p = &parent->rb_left;
-		else
+		else {
 			p = &parent->rb_right;
+			leftmost = false;
+		}
 	}
 
 	rb_link_node(&he->rb_node, parent, p);
-	rb_insert_color(&he->rb_node, root);
+	rb_insert_color_cached(&he->rb_node, root, leftmost);
 
 	/* update column width of dynamic entry */
 	perf_hpp_list__for_each_sort_list(he->hpp_list, fmt) {
@@ -1608,16 +1620,16 @@ static void hierarchy_insert_output_entry(struct rb_root *root,
 
 static void hists__hierarchy_output_resort(struct hists *hists,
 					   struct ui_progress *prog,
-					   struct rb_root *root_in,
-					   struct rb_root *root_out,
+					   struct rb_root_cached *root_in,
+					   struct rb_root_cached *root_out,
 					   u64 min_callchain_hits,
 					   bool use_callchain)
 {
 	struct rb_node *node;
 	struct hist_entry *he;
 
-	*root_out = RB_ROOT;
-	node = rb_first(root_in);
+	*root_out = RB_ROOT_CACHED;
+	node = rb_first_cached(root_in);
 
 	while (node) {
 		he = rb_entry(node, struct hist_entry, rb_node_in);
@@ -1660,15 +1672,16 @@ static void hists__hierarchy_output_resort(struct hists *hists,
 	}
 }
 
-static void __hists__insert_output_entry(struct rb_root *entries,
+static void __hists__insert_output_entry(struct rb_root_cached *entries,
 					 struct hist_entry *he,
 					 u64 min_callchain_hits,
 					 bool use_callchain)
 {
-	struct rb_node **p = &entries->rb_node;
+	struct rb_node **p = &entries->rb_root.rb_node;
 	struct rb_node *parent = NULL;
 	struct hist_entry *iter;
 	struct perf_hpp_fmt *fmt;
+	bool leftmost = true;
 
 	if (use_callchain) {
 		if (callchain_param.mode == CHAIN_GRAPH_REL) {
@@ -1689,12 +1702,14 @@ static void __hists__insert_output_entry(struct rb_root *entries,
 
 		if (hist_entry__sort(he, iter) > 0)
 			p = &(*p)->rb_left;
-		else
+		else {
 			p = &(*p)->rb_right;
+			leftmost = false;
+		}
 	}
 
 	rb_link_node(&he->rb_node, parent, p);
-	rb_insert_color(&he->rb_node, entries);
+	rb_insert_color_cached(&he->rb_node, entries, leftmost);
 
 	perf_hpp_list__for_each_sort_list(&perf_hpp_list, fmt) {
 		if (perf_hpp__is_dynamic_entry(fmt) &&
@@ -1706,7 +1721,7 @@ static void __hists__insert_output_entry(struct rb_root *entries,
 static void output_resort(struct hists *hists, struct ui_progress *prog,
 			  bool use_callchain, hists__resort_cb_t cb)
 {
-	struct rb_root *root;
+	struct rb_root_cached *root;
 	struct rb_node *next;
 	struct hist_entry *n;
 	u64 callchain_total;
@@ -1736,8 +1751,8 @@ static void output_resort(struct hists *hists, struct ui_progress *prog,
 	else
 		root = hists->entries_in;
 
-	next = rb_first(root);
-	hists->entries = RB_ROOT;
+	next = rb_first_cached(root);
+	hists->entries = RB_ROOT_CACHED;
 
 	while (next) {
 		n = rb_entry(next, struct hist_entry, rb_node_in);
@@ -1798,7 +1813,7 @@ struct rb_node *rb_hierarchy_last(struct rb_node *node)
 	struct hist_entry *he = rb_entry(node, struct hist_entry, rb_node);
 
 	while (can_goto_child(he, HMD_NORMAL)) {
-		node = rb_last(&he->hroot_out);
+		node = rb_last(&he->hroot_out.rb_root);
 		he = rb_entry(node, struct hist_entry, rb_node);
 	}
 	return node;
@@ -1809,7 +1824,7 @@ struct rb_node *__rb_hierarchy_next(struct rb_node *node, enum hierarchy_move_di
 	struct hist_entry *he = rb_entry(node, struct hist_entry, rb_node);
 
 	if (can_goto_child(he, hmd))
-		node = rb_first(&he->hroot_out);
+		node = rb_first_cached(&he->hroot_out);
 	else
 		node = rb_next(node);
 
@@ -1847,7 +1862,7 @@ bool hist_entry__has_hierarchy_children(struct hist_entry *he, float limit)
 	if (he->leaf)
 		return false;
 
-	node = rb_first(&he->hroot_out);
+	node = rb_first_cached(&he->hroot_out);
 	child = rb_entry(node, struct hist_entry, rb_node);
 
 	while (node && child->filtered) {
@@ -1965,7 +1980,7 @@ static void hists__filter_by_type(struct hists *hists, int type, filter_fn_t fil
 	hists__reset_filter_stats(hists);
 	hists__reset_col_len(hists);
 
-	for (nd = rb_first(&hists->entries); nd; nd = rb_next(nd)) {
+	for (nd = rb_first_cached(&hists->entries); nd; nd = rb_next(nd)) {
 		struct hist_entry *h = rb_entry(nd, struct hist_entry, rb_node);
 
 		if (filter(hists, h))
@@ -1975,13 +1990,15 @@ static void hists__filter_by_type(struct hists *hists, int type, filter_fn_t fil
 	}
 }
 
-static void resort_filtered_entry(struct rb_root *root, struct hist_entry *he)
+static void resort_filtered_entry(struct rb_root_cached *root,
+				  struct hist_entry *he)
 {
-	struct rb_node **p = &root->rb_node;
+	struct rb_node **p = &root->rb_root.rb_node;
 	struct rb_node *parent = NULL;
 	struct hist_entry *iter;
-	struct rb_root new_root = RB_ROOT;
+	struct rb_root_cached new_root = RB_ROOT_CACHED;
 	struct rb_node *nd;
+	bool leftmost = true;
 
 	while (*p != NULL) {
 		parent = *p;
@@ -1989,22 +2006,24 @@ static void resort_filtered_entry(struct rb_root *root, struct hist_entry *he)
 
 		if (hist_entry__sort(he, iter) > 0)
 			p = &(*p)->rb_left;
-		else
+		else {
 			p = &(*p)->rb_right;
+			leftmost = false;
+		}
 	}
 
 	rb_link_node(&he->rb_node, parent, p);
-	rb_insert_color(&he->rb_node, root);
+	rb_insert_color_cached(&he->rb_node, root, leftmost);
 
 	if (he->leaf || he->filtered)
 		return;
 
-	nd = rb_first(&he->hroot_out);
+	nd = rb_first_cached(&he->hroot_out);
 	while (nd) {
 		struct hist_entry *h = rb_entry(nd, struct hist_entry, rb_node);
 
 		nd = rb_next(nd);
-		rb_erase(&h->rb_node, &he->hroot_out);
+		rb_erase_cached(&h->rb_node, &he->hroot_out);
 
 		resort_filtered_entry(&new_root, h);
 	}
@@ -2015,14 +2034,14 @@ static void resort_filtered_entry(struct rb_root *root, struct hist_entry *he)
 static void hists__filter_hierarchy(struct hists *hists, int type, const void *arg)
 {
 	struct rb_node *nd;
-	struct rb_root new_root = RB_ROOT;
+	struct rb_root_cached new_root = RB_ROOT_CACHED;
 
 	hists->stats.nr_non_filtered_samples = 0;
 
 	hists__reset_filter_stats(hists);
 	hists__reset_col_len(hists);
 
-	nd = rb_first(&hists->entries);
+	nd = rb_first_cached(&hists->entries);
 	while (nd) {
 		struct hist_entry *h = rb_entry(nd, struct hist_entry, rb_node);
 		int ret;
@@ -2066,12 +2085,12 @@ static void hists__filter_hierarchy(struct hists *hists, int type, const void *a
 	 * resort output after applying a new filter since filter in a lower
 	 * hierarchy can change periods in a upper hierarchy.
 	 */
-	nd = rb_first(&hists->entries);
+	nd = rb_first_cached(&hists->entries);
 	while (nd) {
 		struct hist_entry *h = rb_entry(nd, struct hist_entry, rb_node);
 
 		nd = rb_next(nd);
-		rb_erase(&h->rb_node, &hists->entries);
+		rb_erase_cached(&h->rb_node, &hists->entries);
 
 		resort_filtered_entry(&new_root, h);
 	}
@@ -2140,18 +2159,19 @@ void hists__inc_nr_samples(struct hists *hists, bool filtered)
 static struct hist_entry *hists__add_dummy_entry(struct hists *hists,
 						 struct hist_entry *pair)
 {
-	struct rb_root *root;
+	struct rb_root_cached *root;
 	struct rb_node **p;
 	struct rb_node *parent = NULL;
 	struct hist_entry *he;
 	int64_t cmp;
+	bool leftmost = true;
 
 	if (hists__has(hists, need_collapse))
 		root = &hists->entries_collapsed;
 	else
 		root = hists->entries_in;
 
-	p = &root->rb_node;
+	p = &root->rb_root.rb_node;
 
 	while (*p != NULL) {
 		parent = *p;
@@ -2164,8 +2184,10 @@ static struct hist_entry *hists__add_dummy_entry(struct hists *hists,
 
 		if (cmp < 0)
 			p = &(*p)->rb_left;
-		else
+		else {
 			p = &(*p)->rb_right;
+			leftmost = false;
+		}
 	}
 
 	he = hist_entry__new(pair, true);
@@ -2175,7 +2197,7 @@ static struct hist_entry *hists__add_dummy_entry(struct hists *hists,
 		if (symbol_conf.cumulate_callchain)
 			memset(he->stat_acc, 0, sizeof(he->stat));
 		rb_link_node(&he->rb_node_in, parent, p);
-		rb_insert_color(&he->rb_node_in, root);
+		rb_insert_color_cached(&he->rb_node_in, root, leftmost);
 		hists__inc_stats(hists, he);
 		he->dummy = true;
 	}
@@ -2184,15 +2206,16 @@ static struct hist_entry *hists__add_dummy_entry(struct hists *hists,
 }
 
 static struct hist_entry *add_dummy_hierarchy_entry(struct hists *hists,
-						    struct rb_root *root,
+						    struct rb_root_cached *root,
 						    struct hist_entry *pair)
 {
 	struct rb_node **p;
 	struct rb_node *parent = NULL;
 	struct hist_entry *he;
 	struct perf_hpp_fmt *fmt;
+	bool leftmost = true;
 
-	p = &root->rb_node;
+	p = &root->rb_root.rb_node;
 	while (*p != NULL) {
 		int64_t cmp = 0;
 
@@ -2209,14 +2232,16 @@ static struct hist_entry *add_dummy_hierarchy_entry(struct hists *hists,
 
 		if (cmp < 0)
 			p = &parent->rb_left;
-		else
+		else {
 			p = &parent->rb_right;
+			leftmost = false;
+		}
 	}
 
 	he = hist_entry__new(pair, true);
 	if (he) {
 		rb_link_node(&he->rb_node_in, parent, p);
-		rb_insert_color(&he->rb_node_in, root);
+		rb_insert_color_cached(&he->rb_node_in, root, leftmost);
 
 		he->dummy = true;
 		he->hists = hists;
@@ -2233,9 +2258,9 @@ static struct hist_entry *hists__find_entry(struct hists *hists,
 	struct rb_node *n;
 
 	if (hists__has(hists, need_collapse))
-		n = hists->entries_collapsed.rb_node;
+		n = hists->entries_collapsed.rb_root.rb_node;
 	else
-		n = hists->entries_in->rb_node;
+		n = hists->entries_in->rb_root.rb_node;
 
 	while (n) {
 		struct hist_entry *iter = rb_entry(n, struct hist_entry, rb_node_in);
@@ -2252,10 +2277,10 @@ static struct hist_entry *hists__find_entry(struct hists *hists,
 	return NULL;
 }
 
-static struct hist_entry *hists__find_hierarchy_entry(struct rb_root *root,
+static struct hist_entry *hists__find_hierarchy_entry(struct rb_root_cached *root,
 						      struct hist_entry *he)
 {
-	struct rb_node *n = root->rb_node;
+	struct rb_node *n = root->rb_root.rb_node;
 
 	while (n) {
 		struct hist_entry *iter;
@@ -2280,13 +2305,13 @@ static struct hist_entry *hists__find_hierarchy_entry(struct rb_root *root,
 	return NULL;
 }
 
-static void hists__match_hierarchy(struct rb_root *leader_root,
-				   struct rb_root *other_root)
+static void hists__match_hierarchy(struct rb_root_cached *leader_root,
+				   struct rb_root_cached *other_root)
 {
 	struct rb_node *nd;
 	struct hist_entry *pos, *pair;
 
-	for (nd = rb_first(leader_root); nd; nd = rb_next(nd)) {
+	for (nd = rb_first_cached(leader_root); nd; nd = rb_next(nd)) {
 		pos  = rb_entry(nd, struct hist_entry, rb_node_in);
 		pair = hists__find_hierarchy_entry(other_root, pos);
 
@@ -2302,7 +2327,7 @@ static void hists__match_hierarchy(struct rb_root *leader_root,
  */
 void hists__match(struct hists *leader, struct hists *other)
 {
-	struct rb_root *root;
+	struct rb_root_cached *root;
 	struct rb_node *nd;
 	struct hist_entry *pos, *pair;
 
@@ -2317,7 +2342,7 @@ void hists__match(struct hists *leader, struct hists *other)
 	else
 		root = leader->entries_in;
 
-	for (nd = rb_first(root); nd; nd = rb_next(nd)) {
+	for (nd = rb_first_cached(root); nd; nd = rb_next(nd)) {
 		pos  = rb_entry(nd, struct hist_entry, rb_node_in);
 		pair = hists__find_entry(other, pos);
 
@@ -2328,13 +2353,13 @@ void hists__match(struct hists *leader, struct hists *other)
 
 static int hists__link_hierarchy(struct hists *leader_hists,
 				 struct hist_entry *parent,
-				 struct rb_root *leader_root,
-				 struct rb_root *other_root)
+				 struct rb_root_cached *leader_root,
+				 struct rb_root_cached *other_root)
 {
 	struct rb_node *nd;
 	struct hist_entry *pos, *leader;
 
-	for (nd = rb_first(other_root); nd; nd = rb_next(nd)) {
+	for (nd = rb_first_cached(other_root); nd; nd = rb_next(nd)) {
 		pos = rb_entry(nd, struct hist_entry, rb_node_in);
 
 		if (hist_entry__has_pairs(pos)) {
@@ -2377,7 +2402,7 @@ static int hists__link_hierarchy(struct hists *leader_hists,
  */
 int hists__link(struct hists *leader, struct hists *other)
 {
-	struct rb_root *root;
+	struct rb_root_cached *root;
 	struct rb_node *nd;
 	struct hist_entry *pos, *pair;
 
@@ -2393,7 +2418,7 @@ int hists__link(struct hists *leader, struct hists *other)
 	else
 		root = other->entries_in;
 
-	for (nd = rb_first(root); nd; nd = rb_next(nd)) {
+	for (nd = rb_first_cached(root); nd; nd = rb_next(nd)) {
 		pos = rb_entry(nd, struct hist_entry, rb_node_in);
 
 		if (!hist_entry__has_pairs(pos)) {
@@ -2566,10 +2591,10 @@ int perf_hist_config(const char *var, const char *value)
 int __hists__init(struct hists *hists, struct perf_hpp_list *hpp_list)
 {
 	memset(hists, 0, sizeof(*hists));
-	hists->entries_in_array[0] = hists->entries_in_array[1] = RB_ROOT;
+	hists->entries_in_array[0] = hists->entries_in_array[1] = RB_ROOT_CACHED;
 	hists->entries_in = &hists->entries_in_array[0];
-	hists->entries_collapsed = RB_ROOT;
-	hists->entries = RB_ROOT;
+	hists->entries_collapsed = RB_ROOT_CACHED;
+	hists->entries = RB_ROOT_CACHED;
 	pthread_mutex_init(&hists->lock, NULL);
 	hists->socket_filter = -1;
 	hists->hpp_list = hpp_list;
@@ -2577,14 +2602,14 @@ int __hists__init(struct hists *hists, struct perf_hpp_list *hpp_list)
 	return 0;
 }
 
-static void hists__delete_remaining_entries(struct rb_root *root)
+static void hists__delete_remaining_entries(struct rb_root_cached *root)
 {
 	struct rb_node *node;
 	struct hist_entry *he;
 
-	while (!RB_EMPTY_ROOT(root)) {
-		node = rb_first(root);
-		rb_erase(node, root);
+	while (!RB_EMPTY_ROOT(&root->rb_root)) {
+		node = rb_first_cached(root);
+		rb_erase_cached(node, root);
 
 		he = rb_entry(node, struct hist_entry, rb_node_in);
 		hist_entry__delete(he);
diff --git a/tools/perf/util/hist.h b/tools/perf/util/hist.h
index 3badd7f1e1b8..69d240e07c3f 100644
--- a/tools/perf/util/hist.h
+++ b/tools/perf/util/hist.h
@@ -69,10 +69,10 @@ struct thread;
 struct dso;
 
 struct hists {
-	struct rb_root		entries_in_array[2];
-	struct rb_root		*entries_in;
-	struct rb_root		entries;
-	struct rb_root		entries_collapsed;
+	struct rb_root_cached	entries_in_array[2];
+	struct rb_root_cached	*entries_in;
+	struct rb_root_cached	entries;
+	struct rb_root_cached	entries_collapsed;
 	u64			nr_entries;
 	u64			nr_non_filtered_entries;
 	u64			callchain_period;
@@ -229,7 +229,7 @@ static __pure inline bool hists__has_callchains(struct hists *hists)
 int hists__init(void);
 int __hists__init(struct hists *hists, struct perf_hpp_list *hpp_list);
 
-struct rb_root *hists__get_rotate_entries_in(struct hists *hists);
+struct rb_root_cached *hists__get_rotate_entries_in(struct hists *hists);
 
 struct perf_hpp {
 	char *buf;
diff --git a/tools/perf/util/sort.h b/tools/perf/util/sort.h
index a97cf8e6be86..062ee8daab77 100644
--- a/tools/perf/util/sort.h
+++ b/tools/perf/util/sort.h
@@ -145,8 +145,8 @@ struct hist_entry {
 	union {
 		/* this is for hierarchical entry structure */
 		struct {
-			struct rb_root	hroot_in;
-			struct rb_root  hroot_out;
+			struct rb_root_cached	hroot_in;
+			struct rb_root_cached   hroot_out;
 		};				/* non-leaf entries */
 		struct rb_root	sorted_chain;	/* leaf entry has callchains */
 	};
-- 
2.16.4


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH 7/7] perf sched: Use cached rbtrees
  2018-12-06 19:18 [PATCH v2 -tip 0/7] tools/perf: Update rbtree implementation and optimize users Davidlohr Bueso
                   ` (5 preceding siblings ...)
  2018-12-06 19:18 ` [PATCH 6/7] perf hist: " Davidlohr Bueso
@ 2018-12-06 19:18 ` Davidlohr Bueso
  2019-01-26 10:05   ` [tip:perf/core] " tip-bot for Davidlohr Bueso
  6 siblings, 1 reply; 18+ messages in thread
From: Davidlohr Bueso @ 2018-12-06 19:18 UTC (permalink / raw)
  To: acme; +Cc: mingo, linux-kernel, dave, Davidlohr Bueso

At the cost of an extra pointer, we can avoid the O(logN) cost
of finding the first element in the tree (smallest node), which
is something heavily required for perf-sched.

Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
---
 tools/perf/builtin-sched.c | 45 +++++++++++++++++++++++++--------------------
 1 file changed, 25 insertions(+), 20 deletions(-)

diff --git a/tools/perf/builtin-sched.c b/tools/perf/builtin-sched.c
index cbf39dab19c1..d687d8fbc1cd 100644
--- a/tools/perf/builtin-sched.c
+++ b/tools/perf/builtin-sched.c
@@ -213,7 +213,7 @@ struct perf_sched {
 	u64		 all_runtime;
 	u64		 all_count;
 	u64		 cpu_last_switched[MAX_CPUS];
-	struct rb_root	 atom_root, sorted_atom_root, merged_atom_root;
+	struct rb_root_cached atom_root, sorted_atom_root, merged_atom_root;
 	struct list_head sort_list, cmp_pid;
 	bool force;
 	bool skip_merge;
@@ -271,7 +271,7 @@ struct evsel_runtime {
 struct idle_thread_runtime {
 	struct thread_runtime	tr;
 	struct thread		*last_thread;
-	struct rb_root		sorted_root;
+	struct rb_root_cached	sorted_root;
 	struct callchain_root	callchain;
 	struct callchain_cursor	cursor;
 };
@@ -950,10 +950,10 @@ thread_lat_cmp(struct list_head *list, struct work_atoms *l, struct work_atoms *
 }
 
 static struct work_atoms *
-thread_atoms_search(struct rb_root *root, struct thread *thread,
+thread_atoms_search(struct rb_root_cached *root, struct thread *thread,
 			 struct list_head *sort_list)
 {
-	struct rb_node *node = root->rb_node;
+	struct rb_node *node = root->rb_root.rb_node;
 	struct work_atoms key = { .thread = thread };
 
 	while (node) {
@@ -976,10 +976,11 @@ thread_atoms_search(struct rb_root *root, struct thread *thread,
 }
 
 static void
-__thread_latency_insert(struct rb_root *root, struct work_atoms *data,
+__thread_latency_insert(struct rb_root_cached *root, struct work_atoms *data,
 			 struct list_head *sort_list)
 {
-	struct rb_node **new = &(root->rb_node), *parent = NULL;
+	struct rb_node **new = &(root->rb_root.rb_node), *parent = NULL;
+	bool leftmost = true;
 
 	while (*new) {
 		struct work_atoms *this;
@@ -992,12 +993,14 @@ __thread_latency_insert(struct rb_root *root, struct work_atoms *data,
 
 		if (cmp > 0)
 			new = &((*new)->rb_left);
-		else
+		else {
 			new = &((*new)->rb_right);
+			leftmost = false;
+		}
 	}
 
 	rb_link_node(&data->node, parent, new);
-	rb_insert_color(&data->node, root);
+	rb_insert_color_cached(&data->node, root, leftmost);
 }
 
 static int thread_atoms_insert(struct perf_sched *sched, struct thread *thread)
@@ -1447,15 +1450,15 @@ static int sort_dimension__add(const char *tok, struct list_head *list)
 static void perf_sched__sort_lat(struct perf_sched *sched)
 {
 	struct rb_node *node;
-	struct rb_root *root = &sched->atom_root;
+	struct rb_root_cached *root = &sched->atom_root;
 again:
 	for (;;) {
 		struct work_atoms *data;
-		node = rb_first(root);
+		node = rb_first_cached(root);
 		if (!node)
 			break;
 
-		rb_erase(node, root);
+		rb_erase_cached(node, root);
 		data = rb_entry(node, struct work_atoms, node);
 		__thread_latency_insert(&sched->sorted_atom_root, data, &sched->sort_list);
 	}
@@ -2762,12 +2765,12 @@ static size_t callchain__fprintf_folded(FILE *fp, struct callchain_node *node)
 	return ret;
 }
 
-static size_t timehist_print_idlehist_callchain(struct rb_root *root)
+static size_t timehist_print_idlehist_callchain(struct rb_root_cached *root)
 {
 	size_t ret = 0;
 	FILE *fp = stdout;
 	struct callchain_node *chain;
-	struct rb_node *rb_node = rb_first(root);
+	struct rb_node *rb_node = rb_first_cached(root);
 
 	printf("  %16s  %8s  %s\n", "Idle time (msec)", "Count", "Callchains");
 	printf("  %.16s  %.8s  %.50s\n", graph_dotted_line, graph_dotted_line,
@@ -2868,7 +2871,7 @@ static void timehist_print_summary(struct perf_sched *sched,
 			if (itr == NULL)
 				continue;
 
-			callchain_param.sort(&itr->sorted_root, &itr->callchain,
+			callchain_param.sort(&itr->sorted_root.rb_root, &itr->callchain,
 					     0, &callchain_param);
 
 			printf("  CPU %2d:", i);
@@ -3074,11 +3077,12 @@ static void print_bad_events(struct perf_sched *sched)
 	}
 }
 
-static void __merge_work_atoms(struct rb_root *root, struct work_atoms *data)
+static void __merge_work_atoms(struct rb_root_cached *root, struct work_atoms *data)
 {
-	struct rb_node **new = &(root->rb_node), *parent = NULL;
+	struct rb_node **new = &(root->rb_root.rb_node), *parent = NULL;
 	struct work_atoms *this;
 	const char *comm = thread__comm_str(data->thread), *this_comm;
+	bool leftmost = true;
 
 	while (*new) {
 		int cmp;
@@ -3092,6 +3096,7 @@ static void __merge_work_atoms(struct rb_root *root, struct work_atoms *data)
 			new = &((*new)->rb_left);
 		} else if (cmp < 0) {
 			new = &((*new)->rb_right);
+			leftmost = false;
 		} else {
 			this->num_merged++;
 			this->total_runtime += data->total_runtime;
@@ -3109,7 +3114,7 @@ static void __merge_work_atoms(struct rb_root *root, struct work_atoms *data)
 
 	data->num_merged++;
 	rb_link_node(&data->node, parent, new);
-	rb_insert_color(&data->node, root);
+	rb_insert_color_cached(&data->node, root, leftmost);
 }
 
 static void perf_sched__merge_lat(struct perf_sched *sched)
@@ -3120,8 +3125,8 @@ static void perf_sched__merge_lat(struct perf_sched *sched)
 	if (sched->skip_merge)
 		return;
 
-	while ((node = rb_first(&sched->atom_root))) {
-		rb_erase(node, &sched->atom_root);
+	while ((node = rb_first_cached(&sched->atom_root))) {
+		rb_erase_cached(node, &sched->atom_root);
 		data = rb_entry(node, struct work_atoms, node);
 		__merge_work_atoms(&sched->merged_atom_root, data);
 	}
@@ -3143,7 +3148,7 @@ static int perf_sched__lat(struct perf_sched *sched)
 	printf("  Task                  |   Runtime ms  | Switches | Average delay ms | Maximum delay ms | Maximum delay at       |\n");
 	printf(" -----------------------------------------------------------------------------------------------------------------\n");
 
-	next = rb_first(&sched->sorted_atom_root);
+	next = rb_first_cached(&sched->sorted_atom_root);
 
 	while (next) {
 		struct work_atoms *work_list;
-- 
2.16.4


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [PATCH 6/7] perf hist: Use cached rbtrees
  2018-12-06 19:18 ` [PATCH 6/7] perf hist: " Davidlohr Bueso
@ 2019-01-22 13:59   ` Arnaldo Carvalho de Melo
  2019-01-22 15:22     ` Davidlohr Bueso
  2019-01-26 10:04   ` [tip:perf/core] " tip-bot for Davidlohr Bueso
  1 sibling, 1 reply; 18+ messages in thread
From: Arnaldo Carvalho de Melo @ 2019-01-22 13:59 UTC (permalink / raw)
  To: Davidlohr Bueso; +Cc: mingo, linux-kernel, Davidlohr Bueso

Em Thu, Dec 06, 2018 at 11:18:18AM -0800, Davidlohr Bueso escreveu:
> At the cost of an extra pointer, we can avoid the O(logN) cost
> of finding the first element in the tree (smallest node), which
> is something heavily required for histograms. Specifically, the
> following are converted to rb_root_cached, and users accordingly:
> 
> hist::entries_in_array
> hist::entries_in
> hist::entries
> hist::entries_collapsed
> hist_entry::hroot_in
> hist_entry::hroot_out

  CC       /tmp/build/perf/util/hist.o
ui/browsers/hists.c: In function ‘hierarchy_set_folding’:
ui/browsers/hists.c:511:21: error: passing argument 1 of ‘rb_first’ from incompatible pointer type [-Werror=incompatible-pointer-types]
  for (nd = rb_first(&he->hroot_out); nd; nd = rb_next(nd)) {
                     ^~~~~~~~~~~~~~
In file included from ui/browsers/hists.c:8:
/home/acme/git/perf/tools/include/linux/rbtree.h:83:24: note: expected ‘const struct rb_root *’ but argument is of type ‘struct rb_root_cached *’
 extern struct rb_node *rb_first(const struct rb_root *);
                        ^~~~~~~~
ui/browsers/hists.c: In function ‘__hist_browser__set_folding’:
ui/browsers/hists.c:569:16: error: passing argument 1 of ‘rb_first’ from incompatible pointer type [-Werror=incompatible-pointer-types]
  nd = rb_first(&browser->hists->entries);
                ^~~~~~~~~~~~~~~~~~~~~~~~

So I added this on top, please check:


diff --git a/tools/perf/ui/browsers/hists.c b/tools/perf/ui/browsers/hists.c
index f8913723a99a..85790a4d1842 100644
--- a/tools/perf/ui/browsers/hists.c
+++ b/tools/perf/ui/browsers/hists.c
@@ -508,7 +508,7 @@ static int hierarchy_set_folding(struct hist_browser *hb, struct hist_entry *he,
 	struct hist_entry *child;
 	int n = 0;
 
-	for (nd = rb_first(&he->hroot_out); nd; nd = rb_next(nd)) {
+	for (nd = rb_first_cached(&he->hroot_out); nd; nd = rb_next(nd)) {
 		child = rb_entry(nd, struct hist_entry, rb_node);
 		percent = hist_entry__get_percent_limit(child);
 		if (!child->filtered && percent >= hb->min_pcnt)
@@ -566,7 +566,7 @@ __hist_browser__set_folding(struct hist_browser *browser, bool unfold)
 	struct rb_node *nd;
 	struct hist_entry *he;
 
-	nd = rb_first(&browser->hists->entries);
+	nd = rb_first_cached(&browser->hists->entries);
 	while (nd) {
 		he = rb_entry(nd, struct hist_entry, rb_node);
 
@@ -1738,7 +1738,7 @@ static void ui_browser__hists_init_top(struct ui_browser *browser)
 		struct hist_browser *hb;
 
 		hb = container_of(browser, struct hist_browser, b);
-		browser->top = rb_first(&hb->hists->entries);
+		browser->top = rb_first_cached(&hb->hists->entries);
 	}
 }
 
@@ -2649,7 +2649,7 @@ add_socket_opt(struct hist_browser *browser, struct popup_action *act,
 static void hist_browser__update_nr_entries(struct hist_browser *hb)
 {
 	u64 nr_entries = 0;
-	struct rb_node *nd = rb_first(&hb->hists->entries);
+	struct rb_node *nd = rb_first_cached(&hb->hists->entries);
 
 	if (hb->min_pcnt == 0 && !symbol_conf.report_hierarchy) {
 		hb->nr_non_filtered_entries = hb->hists->nr_non_filtered_entries;
@@ -2669,7 +2669,7 @@ static void hist_browser__update_percent_limit(struct hist_browser *hb,
 					       double percent)
 {
 	struct hist_entry *he;
-	struct rb_node *nd = rb_first(&hb->hists->entries);
+	struct rb_node *nd = rb_first_cached(&hb->hists->entries);
 	u64 total = hists__total_period(hb->hists);
 	u64 min_callchain_hits = total * (percent / 100);
 
diff --git a/tools/perf/ui/gtk/hists.c b/tools/perf/ui/gtk/hists.c
index dfab3deef028..74414ccd8c84 100644
--- a/tools/perf/ui/gtk/hists.c
+++ b/tools/perf/ui/gtk/hists.c
@@ -401,7 +401,7 @@ static void perf_gtk__show_hists(GtkWidget *window, struct hists *hists,
 }
 
 static void perf_gtk__add_hierarchy_entries(struct hists *hists,
-					    struct rb_root *root,
+					    struct rb_root_cached *root,
 					    GtkTreeStore *store,
 					    GtkTreeIter *parent,
 					    struct perf_hpp *hpp,
@@ -415,7 +415,7 @@ static void perf_gtk__add_hierarchy_entries(struct hists *hists,
 	u64 total = hists__total_period(hists);
 	int size;
 
-	for (node = rb_first(root); node; node = rb_next(node)) {
+	for (node = rb_first_cached(root); node; node = rb_next(node)) {
 		GtkTreeIter iter;
 		float percent;
 		char *bf;
@@ -578,7 +578,7 @@ static void perf_gtk__show_hierarchy(GtkWidget *window, struct hists *hists,
 	gtk_tree_view_set_model(GTK_TREE_VIEW(view), GTK_TREE_MODEL(store));
 	g_object_unref(GTK_TREE_MODEL(store));
 
-	perf_gtk__add_hierarchy_entries(hists, &hists->entries.rb_root, store,
+	perf_gtk__add_hierarchy_entries(hists, &hists->entries, store,
 					NULL, &hpp, min_pcnt);
 
 	gtk_tree_view_set_rules_hint(GTK_TREE_VIEW(view), TRUE);

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [PATCH 6/7] perf hist: Use cached rbtrees
  2019-01-22 13:59   ` Arnaldo Carvalho de Melo
@ 2019-01-22 15:22     ` Davidlohr Bueso
  0 siblings, 0 replies; 18+ messages in thread
From: Davidlohr Bueso @ 2019-01-22 15:22 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo; +Cc: mingo, linux-kernel, Davidlohr Bueso

On Tue, 22 Jan 2019, Arnaldo Carvalho de Melo wrote:

>Em Thu, Dec 06, 2018 at 11:18:18AM -0800, Davidlohr Bueso escreveu:
>> At the cost of an extra pointer, we can avoid the O(logN) cost
>> of finding the first element in the tree (smallest node), which
>> is something heavily required for histograms. Specifically, the
>> following are converted to rb_root_cached, and users accordingly:
>>
>> hist::entries_in_array
>> hist::entries_in
>> hist::entries
>> hist::entries_collapsed
>> hist_entry::hroot_in
>> hist_entry::hroot_out
>
>  CC       /tmp/build/perf/util/hist.o
>ui/browsers/hists.c: In function ???hierarchy_set_folding???:
>ui/browsers/hists.c:511:21: error: passing argument 1 of ???rb_first??? from incompatible pointer type [-Werror=incompatible-pointer-types]
>  for (nd = rb_first(&he->hroot_out); nd; nd = rb_next(nd)) {
>                     ^~~~~~~~~~~~~~
>In file included from ui/browsers/hists.c:8:
>/home/acme/git/perf/tools/include/linux/rbtree.h:83:24: note: expected ???const struct rb_root *??? but argument is of type ???struct rb_root_cached *???
> extern struct rb_node *rb_first(const struct rb_root *);
>                        ^~~~~~~~
>ui/browsers/hists.c: In function ???__hist_browser__set_folding???:
>ui/browsers/hists.c:569:16: error: passing argument 1 of ???rb_first??? from incompatible pointer type [-Werror=incompatible-pointer-types]
>  nd = rb_first(&browser->hists->entries);
>                ^~~~~~~~~~~~~~~~~~~~~~~~
>
>So I added this on top, please check:

Looks good thanks (sorry I thought I had tested all this -- even ran perf
several times :/)

Thanks,
Davidlohr

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [tip:perf/core] tools: Update rbtree implementation
  2018-12-06 19:18 ` [PATCH 1/7] tools/perf: Update rbtree implementation Davidlohr Bueso
@ 2019-01-26 10:01   ` tip-bot for Davidlohr Bueso
  0 siblings, 0 replies; 18+ messages in thread
From: tip-bot for Davidlohr Bueso @ 2019-01-26 10:01 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: linux-kernel, acme, mingo, dbueso, namhyung, hpa, tglx, dave, jolsa

Commit-ID:  3aef2cad5d51ee66d2a614dd2f70cb34c74caf77
Gitweb:     https://git.kernel.org/tip/3aef2cad5d51ee66d2a614dd2f70cb34c74caf77
Author:     Davidlohr Bueso <dave@stgolabs.net>
AuthorDate: Thu, 6 Dec 2018 11:18:13 -0800
Committer:  Arnaldo Carvalho de Melo <acme@redhat.com>
CommitDate: Fri, 25 Jan 2019 15:12:09 +0100

tools: Update rbtree implementation

There have been a number of changes in the kernel's rbrtee
implementation, including loose lockless searching guarantees and
rb_root_cached, which later patches will use as an optimization.

Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: http://lkml.kernel.org/r/20181206191819.30182-2-dave@stgolabs.net
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/include/linux/rbtree.h           |  52 ++++++++--
 tools/include/linux/rbtree_augmented.h |  60 ++++++++---
 tools/lib/rbtree.c                     | 178 +++++++++++++++++++++++++--------
 3 files changed, 229 insertions(+), 61 deletions(-)

diff --git a/tools/include/linux/rbtree.h b/tools/include/linux/rbtree.h
index 112582253dd0..8e9ed4786269 100644
--- a/tools/include/linux/rbtree.h
+++ b/tools/include/linux/rbtree.h
@@ -43,13 +43,28 @@ struct rb_root {
 	struct rb_node *rb_node;
 };
 
+/*
+ * Leftmost-cached rbtrees.
+ *
+ * We do not cache the rightmost node based on footprint
+ * size vs number of potential users that could benefit
+ * from O(1) rb_last(). Just not worth it, users that want
+ * this feature can always implement the logic explicitly.
+ * Furthermore, users that want to cache both pointers may
+ * find it a bit asymmetric, but that's ok.
+ */
+struct rb_root_cached {
+	struct rb_root rb_root;
+	struct rb_node *rb_leftmost;
+};
 
 #define rb_parent(r)   ((struct rb_node *)((r)->__rb_parent_color & ~3))
 
 #define RB_ROOT	(struct rb_root) { NULL, }
+#define RB_ROOT_CACHED (struct rb_root_cached) { {NULL, }, NULL }
 #define	rb_entry(ptr, type, member) container_of(ptr, type, member)
 
-#define RB_EMPTY_ROOT(root)  ((root)->rb_node == NULL)
+#define RB_EMPTY_ROOT(root)  (READ_ONCE((root)->rb_node) == NULL)
 
 /* 'empty' nodes are nodes that are known not to be inserted in an rbtree */
 #define RB_EMPTY_NODE(node)  \
@@ -68,6 +83,12 @@ extern struct rb_node *rb_prev(const struct rb_node *);
 extern struct rb_node *rb_first(const struct rb_root *);
 extern struct rb_node *rb_last(const struct rb_root *);
 
+extern void rb_insert_color_cached(struct rb_node *,
+				   struct rb_root_cached *, bool);
+extern void rb_erase_cached(struct rb_node *node, struct rb_root_cached *);
+/* Same as rb_first(), but O(1) */
+#define rb_first_cached(root) (root)->rb_leftmost
+
 /* Postorder iteration - always visit the parent after its children */
 extern struct rb_node *rb_first_postorder(const struct rb_root *);
 extern struct rb_node *rb_next_postorder(const struct rb_node *);
@@ -75,6 +96,8 @@ extern struct rb_node *rb_next_postorder(const struct rb_node *);
 /* Fast replacement of a single node without remove/rebalance/add/rebalance */
 extern void rb_replace_node(struct rb_node *victim, struct rb_node *new,
 			    struct rb_root *root);
+extern void rb_replace_node_cached(struct rb_node *victim, struct rb_node *new,
+				   struct rb_root_cached *root);
 
 static inline void rb_link_node(struct rb_node *node, struct rb_node *parent,
 				struct rb_node **rb_link)
@@ -90,12 +113,29 @@ static inline void rb_link_node(struct rb_node *node, struct rb_node *parent,
 	   ____ptr ? rb_entry(____ptr, type, member) : NULL; \
 	})
 
-
-/*
- * Handy for checking that we are not deleting an entry that is
- * already in a list, found in block/{blk-throttle,cfq-iosched}.c,
- * probably should be moved to lib/rbtree.c...
+/**
+ * rbtree_postorder_for_each_entry_safe - iterate in post-order over rb_root of
+ * given type allowing the backing memory of @pos to be invalidated
+ *
+ * @pos:	the 'type *' to use as a loop cursor.
+ * @n:		another 'type *' to use as temporary storage
+ * @root:	'rb_root *' of the rbtree.
+ * @field:	the name of the rb_node field within 'type'.
+ *
+ * rbtree_postorder_for_each_entry_safe() provides a similar guarantee as
+ * list_for_each_entry_safe() and allows the iteration to continue independent
+ * of changes to @pos by the body of the loop.
+ *
+ * Note, however, that it cannot handle other modifications that re-order the
+ * rbtree it is iterating over. This includes calling rb_erase() on @pos, as
+ * rb_erase() may rebalance the tree, causing us to miss some nodes.
  */
+#define rbtree_postorder_for_each_entry_safe(pos, n, root, field) \
+	for (pos = rb_entry_safe(rb_first_postorder(root), typeof(*pos), field); \
+	     pos && ({ n = rb_entry_safe(rb_next_postorder(&pos->field), \
+			typeof(*pos), field); 1; }); \
+	     pos = n)
+
 static inline void rb_erase_init(struct rb_node *n, struct rb_root *root)
 {
 	rb_erase(n, root);
diff --git a/tools/include/linux/rbtree_augmented.h b/tools/include/linux/rbtree_augmented.h
index 43be941db695..d008e1404580 100644
--- a/tools/include/linux/rbtree_augmented.h
+++ b/tools/include/linux/rbtree_augmented.h
@@ -44,7 +44,9 @@ struct rb_augment_callbacks {
 	void (*rotate)(struct rb_node *old, struct rb_node *new);
 };
 
-extern void __rb_insert_augmented(struct rb_node *node, struct rb_root *root,
+extern void __rb_insert_augmented(struct rb_node *node,
+				  struct rb_root *root,
+				  bool newleft, struct rb_node **leftmost,
 	void (*augment_rotate)(struct rb_node *old, struct rb_node *new));
 /*
  * Fixup the rbtree and update the augmented information when rebalancing.
@@ -60,7 +62,16 @@ static inline void
 rb_insert_augmented(struct rb_node *node, struct rb_root *root,
 		    const struct rb_augment_callbacks *augment)
 {
-	__rb_insert_augmented(node, root, augment->rotate);
+	__rb_insert_augmented(node, root, false, NULL, augment->rotate);
+}
+
+static inline void
+rb_insert_augmented_cached(struct rb_node *node,
+			   struct rb_root_cached *root, bool newleft,
+			   const struct rb_augment_callbacks *augment)
+{
+	__rb_insert_augmented(node, &root->rb_root,
+			      newleft, &root->rb_leftmost, augment->rotate);
 }
 
 #define RB_DECLARE_CALLBACKS(rbstatic, rbname, rbstruct, rbfield,	\
@@ -93,7 +104,9 @@ rbname ## _rotate(struct rb_node *rb_old, struct rb_node *rb_new)	\
 	old->rbaugmented = rbcompute(old);				\
 }									\
 rbstatic const struct rb_augment_callbacks rbname = {			\
-	rbname ## _propagate, rbname ## _copy, rbname ## _rotate	\
+	.propagate = rbname ## _propagate,				\
+	.copy = rbname ## _copy,					\
+	.rotate = rbname ## _rotate					\
 };
 
 
@@ -126,11 +139,11 @@ __rb_change_child(struct rb_node *old, struct rb_node *new,
 {
 	if (parent) {
 		if (parent->rb_left == old)
-			parent->rb_left = new;
+			WRITE_ONCE(parent->rb_left, new);
 		else
-			parent->rb_right = new;
+			WRITE_ONCE(parent->rb_right, new);
 	} else
-		root->rb_node = new;
+		WRITE_ONCE(root->rb_node, new);
 }
 
 extern void __rb_erase_color(struct rb_node *parent, struct rb_root *root,
@@ -138,12 +151,17 @@ extern void __rb_erase_color(struct rb_node *parent, struct rb_root *root,
 
 static __always_inline struct rb_node *
 __rb_erase_augmented(struct rb_node *node, struct rb_root *root,
+		     struct rb_node **leftmost,
 		     const struct rb_augment_callbacks *augment)
 {
-	struct rb_node *child = node->rb_right, *tmp = node->rb_left;
+	struct rb_node *child = node->rb_right;
+	struct rb_node *tmp = node->rb_left;
 	struct rb_node *parent, *rebalance;
 	unsigned long pc;
 
+	if (leftmost && node == *leftmost)
+		*leftmost = rb_next(node);
+
 	if (!tmp) {
 		/*
 		 * Case 1: node to erase has no more than 1 child (easy!)
@@ -170,6 +188,7 @@ __rb_erase_augmented(struct rb_node *node, struct rb_root *root,
 		tmp = parent;
 	} else {
 		struct rb_node *successor = child, *child2;
+
 		tmp = child->rb_left;
 		if (!tmp) {
 			/*
@@ -183,6 +202,7 @@ __rb_erase_augmented(struct rb_node *node, struct rb_root *root,
 			 */
 			parent = successor;
 			child2 = successor->rb_right;
+
 			augment->copy(node, successor);
 		} else {
 			/*
@@ -204,19 +224,23 @@ __rb_erase_augmented(struct rb_node *node, struct rb_root *root,
 				successor = tmp;
 				tmp = tmp->rb_left;
 			} while (tmp);
-			parent->rb_left = child2 = successor->rb_right;
-			successor->rb_right = child;
+			child2 = successor->rb_right;
+			WRITE_ONCE(parent->rb_left, child2);
+			WRITE_ONCE(successor->rb_right, child);
 			rb_set_parent(child, successor);
+
 			augment->copy(node, successor);
 			augment->propagate(parent, successor);
 		}
 
-		successor->rb_left = tmp = node->rb_left;
+		tmp = node->rb_left;
+		WRITE_ONCE(successor->rb_left, tmp);
 		rb_set_parent(tmp, successor);
 
 		pc = node->__rb_parent_color;
 		tmp = __rb_parent(pc);
 		__rb_change_child(node, successor, tmp, root);
+
 		if (child2) {
 			successor->__rb_parent_color = pc;
 			rb_set_parent_color(child2, parent, RB_BLACK);
@@ -237,9 +261,21 @@ static __always_inline void
 rb_erase_augmented(struct rb_node *node, struct rb_root *root,
 		   const struct rb_augment_callbacks *augment)
 {
-	struct rb_node *rebalance = __rb_erase_augmented(node, root, augment);
+	struct rb_node *rebalance = __rb_erase_augmented(node, root,
+							 NULL, augment);
 	if (rebalance)
 		__rb_erase_color(rebalance, root, augment->rotate);
 }
 
-#endif	/* _TOOLS_LINUX_RBTREE_AUGMENTED_H */
+static __always_inline void
+rb_erase_augmented_cached(struct rb_node *node, struct rb_root_cached *root,
+			  const struct rb_augment_callbacks *augment)
+{
+	struct rb_node *rebalance = __rb_erase_augmented(node, &root->rb_root,
+							 &root->rb_leftmost,
+							 augment);
+	if (rebalance)
+		__rb_erase_color(rebalance, &root->rb_root, augment->rotate);
+}
+
+#endif /* _TOOLS_LINUX_RBTREE_AUGMENTED_H */
diff --git a/tools/lib/rbtree.c b/tools/lib/rbtree.c
index 17c2b596f043..904adb70a4f0 100644
--- a/tools/lib/rbtree.c
+++ b/tools/lib/rbtree.c
@@ -22,6 +22,7 @@
 */
 
 #include <linux/rbtree_augmented.h>
+#include <linux/export.h>
 
 /*
  * red-black trees properties:  http://en.wikipedia.org/wiki/Rbtree
@@ -43,6 +44,30 @@
  *  parentheses and have some accompanying text comment.
  */
 
+/*
+ * Notes on lockless lookups:
+ *
+ * All stores to the tree structure (rb_left and rb_right) must be done using
+ * WRITE_ONCE(). And we must not inadvertently cause (temporary) loops in the
+ * tree structure as seen in program order.
+ *
+ * These two requirements will allow lockless iteration of the tree -- not
+ * correct iteration mind you, tree rotations are not atomic so a lookup might
+ * miss entire subtrees.
+ *
+ * But they do guarantee that any such traversal will only see valid elements
+ * and that it will indeed complete -- does not get stuck in a loop.
+ *
+ * It also guarantees that if the lookup returns an element it is the 'correct'
+ * one. But not returning an element does _NOT_ mean it's not present.
+ *
+ * NOTE:
+ *
+ * Stores to __rb_parent_color are not important for simple lookups so those
+ * are left undone as of now. Nor did I check for loops involving parent
+ * pointers.
+ */
+
 static inline void rb_set_black(struct rb_node *rb)
 {
 	rb->__rb_parent_color |= RB_BLACK;
@@ -70,22 +95,35 @@ __rb_rotate_set_parents(struct rb_node *old, struct rb_node *new,
 
 static __always_inline void
 __rb_insert(struct rb_node *node, struct rb_root *root,
+	    bool newleft, struct rb_node **leftmost,
 	    void (*augment_rotate)(struct rb_node *old, struct rb_node *new))
 {
 	struct rb_node *parent = rb_red_parent(node), *gparent, *tmp;
 
+	if (newleft)
+		*leftmost = node;
+
 	while (true) {
 		/*
-		 * Loop invariant: node is red
-		 *
-		 * If there is a black parent, we are done.
-		 * Otherwise, take some corrective action as we don't
-		 * want a red root or two consecutive red nodes.
+		 * Loop invariant: node is red.
 		 */
-		if (!parent) {
+		if (unlikely(!parent)) {
+			/*
+			 * The inserted node is root. Either this is the
+			 * first node, or we recursed at Case 1 below and
+			 * are no longer violating 4).
+			 */
 			rb_set_parent_color(node, NULL, RB_BLACK);
 			break;
-		} else if (rb_is_black(parent))
+		}
+
+		/*
+		 * If there is a black parent, we are done.
+		 * Otherwise, take some corrective action as,
+		 * per 4), we don't want a red root or two
+		 * consecutive red nodes.
+		 */
+		if(rb_is_black(parent))
 			break;
 
 		gparent = rb_red_parent(parent);
@@ -94,7 +132,7 @@ __rb_insert(struct rb_node *node, struct rb_root *root,
 		if (parent != tmp) {	/* parent == gparent->rb_left */
 			if (tmp && rb_is_red(tmp)) {
 				/*
-				 * Case 1 - color flips
+				 * Case 1 - node's uncle is red (color flips).
 				 *
 				 *       G            g
 				 *      / \          / \
@@ -117,7 +155,8 @@ __rb_insert(struct rb_node *node, struct rb_root *root,
 			tmp = parent->rb_right;
 			if (node == tmp) {
 				/*
-				 * Case 2 - left rotate at parent
+				 * Case 2 - node's uncle is black and node is
+				 * the parent's right child (left rotate at parent).
 				 *
 				 *      G             G
 				 *     / \           / \
@@ -128,8 +167,9 @@ __rb_insert(struct rb_node *node, struct rb_root *root,
 				 * This still leaves us in violation of 4), the
 				 * continuation into Case 3 will fix that.
 				 */
-				parent->rb_right = tmp = node->rb_left;
-				node->rb_left = parent;
+				tmp = node->rb_left;
+				WRITE_ONCE(parent->rb_right, tmp);
+				WRITE_ONCE(node->rb_left, parent);
 				if (tmp)
 					rb_set_parent_color(tmp, parent,
 							    RB_BLACK);
@@ -140,7 +180,8 @@ __rb_insert(struct rb_node *node, struct rb_root *root,
 			}
 
 			/*
-			 * Case 3 - right rotate at gparent
+			 * Case 3 - node's uncle is black and node is
+			 * the parent's left child (right rotate at gparent).
 			 *
 			 *        G           P
 			 *       / \         / \
@@ -148,8 +189,8 @@ __rb_insert(struct rb_node *node, struct rb_root *root,
 			 *     /                 \
 			 *    n                   U
 			 */
-			gparent->rb_left = tmp;  /* == parent->rb_right */
-			parent->rb_right = gparent;
+			WRITE_ONCE(gparent->rb_left, tmp); /* == parent->rb_right */
+			WRITE_ONCE(parent->rb_right, gparent);
 			if (tmp)
 				rb_set_parent_color(tmp, gparent, RB_BLACK);
 			__rb_rotate_set_parents(gparent, parent, root, RB_RED);
@@ -170,8 +211,9 @@ __rb_insert(struct rb_node *node, struct rb_root *root,
 			tmp = parent->rb_left;
 			if (node == tmp) {
 				/* Case 2 - right rotate at parent */
-				parent->rb_left = tmp = node->rb_right;
-				node->rb_right = parent;
+				tmp = node->rb_right;
+				WRITE_ONCE(parent->rb_left, tmp);
+				WRITE_ONCE(node->rb_right, parent);
 				if (tmp)
 					rb_set_parent_color(tmp, parent,
 							    RB_BLACK);
@@ -182,8 +224,8 @@ __rb_insert(struct rb_node *node, struct rb_root *root,
 			}
 
 			/* Case 3 - left rotate at gparent */
-			gparent->rb_right = tmp;  /* == parent->rb_left */
-			parent->rb_left = gparent;
+			WRITE_ONCE(gparent->rb_right, tmp); /* == parent->rb_left */
+			WRITE_ONCE(parent->rb_left, gparent);
 			if (tmp)
 				rb_set_parent_color(tmp, gparent, RB_BLACK);
 			__rb_rotate_set_parents(gparent, parent, root, RB_RED);
@@ -223,8 +265,9 @@ ____rb_erase_color(struct rb_node *parent, struct rb_root *root,
 				 *      / \         / \
 				 *     Sl  Sr      N   Sl
 				 */
-				parent->rb_right = tmp1 = sibling->rb_left;
-				sibling->rb_left = parent;
+				tmp1 = sibling->rb_left;
+				WRITE_ONCE(parent->rb_right, tmp1);
+				WRITE_ONCE(sibling->rb_left, parent);
 				rb_set_parent_color(tmp1, parent, RB_BLACK);
 				__rb_rotate_set_parents(parent, sibling, root,
 							RB_RED);
@@ -268,15 +311,31 @@ ____rb_erase_color(struct rb_node *parent, struct rb_root *root,
 				 *
 				 *   (p)           (p)
 				 *   / \           / \
-				 *  N   S    -->  N   Sl
+				 *  N   S    -->  N   sl
 				 *     / \             \
-				 *    sl  Sr            s
+				 *    sl  Sr            S
 				 *                       \
 				 *                        Sr
+				 *
+				 * Note: p might be red, and then both
+				 * p and sl are red after rotation(which
+				 * breaks property 4). This is fixed in
+				 * Case 4 (in __rb_rotate_set_parents()
+				 *         which set sl the color of p
+				 *         and set p RB_BLACK)
+				 *
+				 *   (p)            (sl)
+				 *   / \            /  \
+				 *  N   sl   -->   P    S
+				 *       \        /      \
+				 *        S      N        Sr
+				 *         \
+				 *          Sr
 				 */
-				sibling->rb_left = tmp1 = tmp2->rb_right;
-				tmp2->rb_right = sibling;
-				parent->rb_right = tmp2;
+				tmp1 = tmp2->rb_right;
+				WRITE_ONCE(sibling->rb_left, tmp1);
+				WRITE_ONCE(tmp2->rb_right, sibling);
+				WRITE_ONCE(parent->rb_right, tmp2);
 				if (tmp1)
 					rb_set_parent_color(tmp1, sibling,
 							    RB_BLACK);
@@ -296,8 +355,9 @@ ____rb_erase_color(struct rb_node *parent, struct rb_root *root,
 			 *        / \         / \
 			 *      (sl) sr      N  (sl)
 			 */
-			parent->rb_right = tmp2 = sibling->rb_left;
-			sibling->rb_left = parent;
+			tmp2 = sibling->rb_left;
+			WRITE_ONCE(parent->rb_right, tmp2);
+			WRITE_ONCE(sibling->rb_left, parent);
 			rb_set_parent_color(tmp1, sibling, RB_BLACK);
 			if (tmp2)
 				rb_set_parent(tmp2, parent);
@@ -309,8 +369,9 @@ ____rb_erase_color(struct rb_node *parent, struct rb_root *root,
 			sibling = parent->rb_left;
 			if (rb_is_red(sibling)) {
 				/* Case 1 - right rotate at parent */
-				parent->rb_left = tmp1 = sibling->rb_right;
-				sibling->rb_right = parent;
+				tmp1 = sibling->rb_right;
+				WRITE_ONCE(parent->rb_left, tmp1);
+				WRITE_ONCE(sibling->rb_right, parent);
 				rb_set_parent_color(tmp1, parent, RB_BLACK);
 				__rb_rotate_set_parents(parent, sibling, root,
 							RB_RED);
@@ -334,10 +395,11 @@ ____rb_erase_color(struct rb_node *parent, struct rb_root *root,
 					}
 					break;
 				}
-				/* Case 3 - right rotate at sibling */
-				sibling->rb_right = tmp1 = tmp2->rb_left;
-				tmp2->rb_left = sibling;
-				parent->rb_left = tmp2;
+				/* Case 3 - left rotate at sibling */
+				tmp1 = tmp2->rb_left;
+				WRITE_ONCE(sibling->rb_right, tmp1);
+				WRITE_ONCE(tmp2->rb_left, sibling);
+				WRITE_ONCE(parent->rb_left, tmp2);
 				if (tmp1)
 					rb_set_parent_color(tmp1, sibling,
 							    RB_BLACK);
@@ -345,9 +407,10 @@ ____rb_erase_color(struct rb_node *parent, struct rb_root *root,
 				tmp1 = sibling;
 				sibling = tmp2;
 			}
-			/* Case 4 - left rotate at parent + color flips */
-			parent->rb_left = tmp2 = sibling->rb_right;
-			sibling->rb_right = parent;
+			/* Case 4 - right rotate at parent + color flips */
+			tmp2 = sibling->rb_right;
+			WRITE_ONCE(parent->rb_left, tmp2);
+			WRITE_ONCE(sibling->rb_right, parent);
 			rb_set_parent_color(tmp1, sibling, RB_BLACK);
 			if (tmp2)
 				rb_set_parent(tmp2, parent);
@@ -378,22 +441,41 @@ static inline void dummy_copy(struct rb_node *old, struct rb_node *new) {}
 static inline void dummy_rotate(struct rb_node *old, struct rb_node *new) {}
 
 static const struct rb_augment_callbacks dummy_callbacks = {
-	dummy_propagate, dummy_copy, dummy_rotate
+	.propagate = dummy_propagate,
+	.copy = dummy_copy,
+	.rotate = dummy_rotate
 };
 
 void rb_insert_color(struct rb_node *node, struct rb_root *root)
 {
-	__rb_insert(node, root, dummy_rotate);
+	__rb_insert(node, root, false, NULL, dummy_rotate);
 }
 
 void rb_erase(struct rb_node *node, struct rb_root *root)
 {
 	struct rb_node *rebalance;
-	rebalance = __rb_erase_augmented(node, root, &dummy_callbacks);
+	rebalance = __rb_erase_augmented(node, root,
+					 NULL, &dummy_callbacks);
 	if (rebalance)
 		____rb_erase_color(rebalance, root, dummy_rotate);
 }
 
+void rb_insert_color_cached(struct rb_node *node,
+			    struct rb_root_cached *root, bool leftmost)
+{
+	__rb_insert(node, &root->rb_root, leftmost,
+		    &root->rb_leftmost, dummy_rotate);
+}
+
+void rb_erase_cached(struct rb_node *node, struct rb_root_cached *root)
+{
+	struct rb_node *rebalance;
+	rebalance = __rb_erase_augmented(node, &root->rb_root,
+					 &root->rb_leftmost, &dummy_callbacks);
+	if (rebalance)
+		____rb_erase_color(rebalance, &root->rb_root, dummy_rotate);
+}
+
 /*
  * Augmented rbtree manipulation functions.
  *
@@ -402,9 +484,10 @@ void rb_erase(struct rb_node *node, struct rb_root *root)
  */
 
 void __rb_insert_augmented(struct rb_node *node, struct rb_root *root,
+			   bool newleft, struct rb_node **leftmost,
 	void (*augment_rotate)(struct rb_node *old, struct rb_node *new))
 {
-	__rb_insert(node, root, augment_rotate);
+	__rb_insert(node, root, newleft, leftmost, augment_rotate);
 }
 
 /*
@@ -498,15 +581,24 @@ void rb_replace_node(struct rb_node *victim, struct rb_node *new,
 {
 	struct rb_node *parent = rb_parent(victim);
 
+	/* Copy the pointers/colour from the victim to the replacement */
+	*new = *victim;
+
 	/* Set the surrounding nodes to point to the replacement */
-	__rb_change_child(victim, new, parent, root);
 	if (victim->rb_left)
 		rb_set_parent(victim->rb_left, new);
 	if (victim->rb_right)
 		rb_set_parent(victim->rb_right, new);
+	__rb_change_child(victim, new, parent, root);
+}
 
-	/* Copy the pointers/colour from the victim to the replacement */
-	*new = *victim;
+void rb_replace_node_cached(struct rb_node *victim, struct rb_node *new,
+			    struct rb_root_cached *root)
+{
+	rb_replace_node(victim, new, &root->rb_root);
+
+	if (root->rb_leftmost == victim)
+		root->rb_leftmost = new;
 }
 
 static struct rb_node *rb_left_deepest_node(const struct rb_node *node)

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [tip:perf/core] perf machine: Use cached rbtrees
  2018-12-06 19:18 ` [PATCH 2/7] perf machine: Use cached rbtrees Davidlohr Bueso
@ 2019-01-26 10:01   ` tip-bot for Davidlohr Bueso
  0 siblings, 0 replies; 18+ messages in thread
From: tip-bot for Davidlohr Bueso @ 2019-01-26 10:01 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: acme, dbueso, namhyung, dave, jolsa, mingo, linux-kernel, hpa, tglx

Commit-ID:  f3acb3a8a2081344801974ac5ec8e1b0d6f0ef36
Gitweb:     https://git.kernel.org/tip/f3acb3a8a2081344801974ac5ec8e1b0d6f0ef36
Author:     Davidlohr Bueso <dave@stgolabs.net>
AuthorDate: Thu, 6 Dec 2018 11:18:14 -0800
Committer:  Arnaldo Carvalho de Melo <acme@redhat.com>
CommitDate: Fri, 25 Jan 2019 15:12:09 +0100

perf machine: Use cached rbtrees

At the cost of an extra pointer, we can avoid the O(logN) cost of
finding the first element in the tree (smallest node), which is
something required for nearly every operation dealing with
machine->guests and threads->entries.

The conversion is straightforward, however, it's worth noticing that the
rb_erase_init() calls have been replaced by rb_erase_cached() which has
no _init() flavor, however, the node is explicitly cleared next anyway,
which was redundant until now.

Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: http://lkml.kernel.org/r/20181206191819.30182-3-dave@stgolabs.net
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/perf/builtin-report.c |  3 ++-
 tools/perf/util/build-id.c  | 12 ++++++----
 tools/perf/util/machine.c   | 53 ++++++++++++++++++++++++++-------------------
 tools/perf/util/machine.h   | 12 +++++-----
 tools/perf/util/rb_resort.h |  6 ++---
 5 files changed, 50 insertions(+), 36 deletions(-)

diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c
index 794a0de5ac1d..c9ceaf88759c 100644
--- a/tools/perf/builtin-report.c
+++ b/tools/perf/builtin-report.c
@@ -753,7 +753,8 @@ static int tasks_print(struct report *rep, FILE *fp)
 	for (i = 0; i < THREADS__TABLE_SIZE; i++) {
 		struct threads *threads = &machine->threads[i];
 
-		for (nd = rb_first(&threads->entries); nd; nd = rb_next(nd)) {
+		for (nd = rb_first_cached(&threads->entries); nd;
+		     nd = rb_next(nd)) {
 			task = tasks + itask++;
 
 			task->thread = rb_entry(nd, struct thread, rb_node);
diff --git a/tools/perf/util/build-id.c b/tools/perf/util/build-id.c
index 07ef7fad689c..2ff802068f06 100644
--- a/tools/perf/util/build-id.c
+++ b/tools/perf/util/build-id.c
@@ -364,7 +364,8 @@ int perf_session__write_buildid_table(struct perf_session *session,
 	if (err)
 		return err;
 
-	for (nd = rb_first(&session->machines.guests); nd; nd = rb_next(nd)) {
+	for (nd = rb_first_cached(&session->machines.guests); nd;
+	     nd = rb_next(nd)) {
 		struct machine *pos = rb_entry(nd, struct machine, rb_node);
 		err = machine__write_buildid_table(pos, fd);
 		if (err)
@@ -397,7 +398,8 @@ int dsos__hit_all(struct perf_session *session)
 	if (err)
 		return err;
 
-	for (nd = rb_first(&session->machines.guests); nd; nd = rb_next(nd)) {
+	for (nd = rb_first_cached(&session->machines.guests); nd;
+	     nd = rb_next(nd)) {
 		struct machine *pos = rb_entry(nd, struct machine, rb_node);
 
 		err = machine__hit_all_dsos(pos);
@@ -850,7 +852,8 @@ int perf_session__cache_build_ids(struct perf_session *session)
 
 	ret = machine__cache_build_ids(&session->machines.host);
 
-	for (nd = rb_first(&session->machines.guests); nd; nd = rb_next(nd)) {
+	for (nd = rb_first_cached(&session->machines.guests); nd;
+	     nd = rb_next(nd)) {
 		struct machine *pos = rb_entry(nd, struct machine, rb_node);
 		ret |= machine__cache_build_ids(pos);
 	}
@@ -867,7 +870,8 @@ bool perf_session__read_build_ids(struct perf_session *session, bool with_hits)
 	struct rb_node *nd;
 	bool ret = machine__read_build_ids(&session->machines.host, with_hits);
 
-	for (nd = rb_first(&session->machines.guests); nd; nd = rb_next(nd)) {
+	for (nd = rb_first_cached(&session->machines.guests); nd;
+	     nd = rb_next(nd)) {
 		struct machine *pos = rb_entry(nd, struct machine, rb_node);
 		ret |= machine__read_build_ids(pos, with_hits);
 	}
diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
index ae85106bb5bf..66f019fdc510 100644
--- a/tools/perf/util/machine.c
+++ b/tools/perf/util/machine.c
@@ -42,7 +42,7 @@ static void machine__threads_init(struct machine *machine)
 
 	for (i = 0; i < THREADS__TABLE_SIZE; i++) {
 		struct threads *threads = &machine->threads[i];
-		threads->entries = RB_ROOT;
+		threads->entries = RB_ROOT_CACHED;
 		init_rwsem(&threads->lock);
 		threads->nr = 0;
 		INIT_LIST_HEAD(&threads->dead);
@@ -180,7 +180,7 @@ void machine__delete_threads(struct machine *machine)
 	for (i = 0; i < THREADS__TABLE_SIZE; i++) {
 		struct threads *threads = &machine->threads[i];
 		down_write(&threads->lock);
-		nd = rb_first(&threads->entries);
+		nd = rb_first_cached(&threads->entries);
 		while (nd) {
 			struct thread *t = rb_entry(nd, struct thread, rb_node);
 
@@ -223,7 +223,7 @@ void machine__delete(struct machine *machine)
 void machines__init(struct machines *machines)
 {
 	machine__init(&machines->host, "", HOST_KERNEL_ID);
-	machines->guests = RB_ROOT;
+	machines->guests = RB_ROOT_CACHED;
 }
 
 void machines__exit(struct machines *machines)
@@ -235,9 +235,10 @@ void machines__exit(struct machines *machines)
 struct machine *machines__add(struct machines *machines, pid_t pid,
 			      const char *root_dir)
 {
-	struct rb_node **p = &machines->guests.rb_node;
+	struct rb_node **p = &machines->guests.rb_root.rb_node;
 	struct rb_node *parent = NULL;
 	struct machine *pos, *machine = malloc(sizeof(*machine));
+	bool leftmost = true;
 
 	if (machine == NULL)
 		return NULL;
@@ -252,12 +253,14 @@ struct machine *machines__add(struct machines *machines, pid_t pid,
 		pos = rb_entry(parent, struct machine, rb_node);
 		if (pid < pos->pid)
 			p = &(*p)->rb_left;
-		else
+		else {
 			p = &(*p)->rb_right;
+			leftmost = false;
+		}
 	}
 
 	rb_link_node(&machine->rb_node, parent, p);
-	rb_insert_color(&machine->rb_node, &machines->guests);
+	rb_insert_color_cached(&machine->rb_node, &machines->guests, leftmost);
 
 	return machine;
 }
@@ -268,7 +271,7 @@ void machines__set_comm_exec(struct machines *machines, bool comm_exec)
 
 	machines->host.comm_exec = comm_exec;
 
-	for (nd = rb_first(&machines->guests); nd; nd = rb_next(nd)) {
+	for (nd = rb_first_cached(&machines->guests); nd; nd = rb_next(nd)) {
 		struct machine *machine = rb_entry(nd, struct machine, rb_node);
 
 		machine->comm_exec = comm_exec;
@@ -277,7 +280,7 @@ void machines__set_comm_exec(struct machines *machines, bool comm_exec)
 
 struct machine *machines__find(struct machines *machines, pid_t pid)
 {
-	struct rb_node **p = &machines->guests.rb_node;
+	struct rb_node **p = &machines->guests.rb_root.rb_node;
 	struct rb_node *parent = NULL;
 	struct machine *machine;
 	struct machine *default_machine = NULL;
@@ -340,7 +343,7 @@ void machines__process_guests(struct machines *machines,
 {
 	struct rb_node *nd;
 
-	for (nd = rb_first(&machines->guests); nd; nd = rb_next(nd)) {
+	for (nd = rb_first_cached(&machines->guests); nd; nd = rb_next(nd)) {
 		struct machine *pos = rb_entry(nd, struct machine, rb_node);
 		process(pos, data);
 	}
@@ -353,7 +356,8 @@ void machines__set_id_hdr_size(struct machines *machines, u16 id_hdr_size)
 
 	machines->host.id_hdr_size = id_hdr_size;
 
-	for (node = rb_first(&machines->guests); node; node = rb_next(node)) {
+	for (node = rb_first_cached(&machines->guests); node;
+	     node = rb_next(node)) {
 		machine = rb_entry(node, struct machine, rb_node);
 		machine->id_hdr_size = id_hdr_size;
 	}
@@ -466,9 +470,10 @@ static struct thread *____machine__findnew_thread(struct machine *machine,
 						  pid_t pid, pid_t tid,
 						  bool create)
 {
-	struct rb_node **p = &threads->entries.rb_node;
+	struct rb_node **p = &threads->entries.rb_root.rb_node;
 	struct rb_node *parent = NULL;
 	struct thread *th;
+	bool leftmost = true;
 
 	th = threads__get_last_match(threads, machine, pid, tid);
 	if (th)
@@ -486,8 +491,10 @@ static struct thread *____machine__findnew_thread(struct machine *machine,
 
 		if (tid < th->tid)
 			p = &(*p)->rb_left;
-		else
+		else {
 			p = &(*p)->rb_right;
+			leftmost = false;
+		}
 	}
 
 	if (!create)
@@ -496,7 +503,7 @@ static struct thread *____machine__findnew_thread(struct machine *machine,
 	th = thread__new(pid, tid);
 	if (th != NULL) {
 		rb_link_node(&th->rb_node, parent, p);
-		rb_insert_color(&th->rb_node, &threads->entries);
+		rb_insert_color_cached(&th->rb_node, &threads->entries, leftmost);
 
 		/*
 		 * We have to initialize map_groups separately
@@ -507,7 +514,7 @@ static struct thread *____machine__findnew_thread(struct machine *machine,
 		 * leader and that would screwed the rb tree.
 		 */
 		if (thread__init_map_groups(th, machine)) {
-			rb_erase_init(&th->rb_node, &threads->entries);
+			rb_erase_cached(&th->rb_node, &threads->entries);
 			RB_CLEAR_NODE(&th->rb_node);
 			thread__put(th);
 			return NULL;
@@ -798,7 +805,7 @@ size_t machines__fprintf_dsos(struct machines *machines, FILE *fp)
 	struct rb_node *nd;
 	size_t ret = __dsos__fprintf(&machines->host.dsos.head, fp);
 
-	for (nd = rb_first(&machines->guests); nd; nd = rb_next(nd)) {
+	for (nd = rb_first_cached(&machines->guests); nd; nd = rb_next(nd)) {
 		struct machine *pos = rb_entry(nd, struct machine, rb_node);
 		ret += __dsos__fprintf(&pos->dsos.head, fp);
 	}
@@ -818,7 +825,7 @@ size_t machines__fprintf_dsos_buildid(struct machines *machines, FILE *fp,
 	struct rb_node *nd;
 	size_t ret = machine__fprintf_dsos_buildid(&machines->host, fp, skip, parm);
 
-	for (nd = rb_first(&machines->guests); nd; nd = rb_next(nd)) {
+	for (nd = rb_first_cached(&machines->guests); nd; nd = rb_next(nd)) {
 		struct machine *pos = rb_entry(nd, struct machine, rb_node);
 		ret += machine__fprintf_dsos_buildid(pos, fp, skip, parm);
 	}
@@ -858,7 +865,8 @@ size_t machine__fprintf(struct machine *machine, FILE *fp)
 
 		ret = fprintf(fp, "Threads: %u\n", threads->nr);
 
-		for (nd = rb_first(&threads->entries); nd; nd = rb_next(nd)) {
+		for (nd = rb_first_cached(&threads->entries); nd;
+		     nd = rb_next(nd)) {
 			struct thread *pos = rb_entry(nd, struct thread, rb_node);
 
 			ret += thread__fprintf(pos, fp);
@@ -1161,7 +1169,7 @@ failure:
 
 void machines__destroy_kernel_maps(struct machines *machines)
 {
-	struct rb_node *next = rb_first(&machines->guests);
+	struct rb_node *next = rb_first_cached(&machines->guests);
 
 	machine__destroy_kernel_maps(&machines->host);
 
@@ -1169,7 +1177,7 @@ void machines__destroy_kernel_maps(struct machines *machines)
 		struct machine *pos = rb_entry(next, struct machine, rb_node);
 
 		next = rb_next(&pos->rb_node);
-		rb_erase(&pos->rb_node, &machines->guests);
+		rb_erase_cached(&pos->rb_node, &machines->guests);
 		machine__delete(pos);
 	}
 }
@@ -1734,7 +1742,7 @@ static void __machine__remove_thread(struct machine *machine, struct thread *th,
 	BUG_ON(refcount_read(&th->refcnt) == 0);
 	if (lock)
 		down_write(&threads->lock);
-	rb_erase_init(&th->rb_node, &threads->entries);
+	rb_erase_cached(&th->rb_node, &threads->entries);
 	RB_CLEAR_NODE(&th->rb_node);
 	--threads->nr;
 	/*
@@ -2511,7 +2519,8 @@ int machine__for_each_thread(struct machine *machine,
 
 	for (i = 0; i < THREADS__TABLE_SIZE; i++) {
 		threads = &machine->threads[i];
-		for (nd = rb_first(&threads->entries); nd; nd = rb_next(nd)) {
+		for (nd = rb_first_cached(&threads->entries); nd;
+		     nd = rb_next(nd)) {
 			thread = rb_entry(nd, struct thread, rb_node);
 			rc = fn(thread, priv);
 			if (rc != 0)
@@ -2538,7 +2547,7 @@ int machines__for_each_thread(struct machines *machines,
 	if (rc != 0)
 		return rc;
 
-	for (nd = rb_first(&machines->guests); nd; nd = rb_next(nd)) {
+	for (nd = rb_first_cached(&machines->guests); nd; nd = rb_next(nd)) {
 		struct machine *machine = rb_entry(nd, struct machine, rb_node);
 
 		rc = machine__for_each_thread(machine, fn, priv);
diff --git a/tools/perf/util/machine.h b/tools/perf/util/machine.h
index 4ecd380ce1b4..e2f3df45410a 100644
--- a/tools/perf/util/machine.h
+++ b/tools/perf/util/machine.h
@@ -29,11 +29,11 @@ struct vdso_info;
 #define THREADS__TABLE_SIZE	(1 << THREADS__TABLE_BITS)
 
 struct threads {
-	struct rb_root	  entries;
-	struct rw_semaphore lock;
-	unsigned int	  nr;
-	struct list_head  dead;
-	struct thread	  *last_match;
+	struct rb_root_cached  entries;
+	struct rw_semaphore    lock;
+	unsigned int	       nr;
+	struct list_head       dead;
+	struct thread	       *last_match;
 };
 
 struct machine {
@@ -140,7 +140,7 @@ typedef void (*machine__process_t)(struct machine *machine, void *data);
 
 struct machines {
 	struct machine host;
-	struct rb_root guests;
+	struct rb_root_cached guests;
 };
 
 void machines__init(struct machines *machines);
diff --git a/tools/perf/util/rb_resort.h b/tools/perf/util/rb_resort.h
index a920f702a74d..f272e181d3d6 100644
--- a/tools/perf/util/rb_resort.h
+++ b/tools/perf/util/rb_resort.h
@@ -144,8 +144,8 @@ struct __name##_sorted *__name = __name##_sorted__new
 				  __ilist->rblist.nr_entries)
 
 /* For 'struct machine->threads' */
-#define DECLARE_RESORT_RB_MACHINE_THREADS(__name, __machine, hash_bucket)	\
-	DECLARE_RESORT_RB(__name)(&__machine->threads[hash_bucket].entries,	\
-				  __machine->threads[hash_bucket].nr)
+#define DECLARE_RESORT_RB_MACHINE_THREADS(__name, __machine, hash_bucket)    \
+ DECLARE_RESORT_RB(__name)(&__machine->threads[hash_bucket].entries.rb_root, \
+			   __machine->threads[hash_bucket].nr)
 
 #endif /* _PERF_RESORT_RB_H_ */

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [tip:perf/core] perf callchain: Use cached rbtrees
  2018-12-06 19:18 ` [PATCH 3/7] perf callchain: " Davidlohr Bueso
@ 2019-01-26 10:02   ` tip-bot for Davidlohr Bueso
  0 siblings, 0 replies; 18+ messages in thread
From: tip-bot for Davidlohr Bueso @ 2019-01-26 10:02 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: linux-kernel, dave, acme, mingo, dbueso, jolsa, namhyung, tglx, hpa

Commit-ID:  55ecd6310f9fe48cf7e435be408862da1e0e6baa
Gitweb:     https://git.kernel.org/tip/55ecd6310f9fe48cf7e435be408862da1e0e6baa
Author:     Davidlohr Bueso <dave@stgolabs.net>
AuthorDate: Thu, 6 Dec 2018 11:18:15 -0800
Committer:  Arnaldo Carvalho de Melo <acme@redhat.com>
CommitDate: Fri, 25 Jan 2019 15:12:09 +0100

perf callchain: Use cached rbtrees

At the cost of an extra pointer, we can avoid the O(logN) cost of
finding the first element in the tree (smallest node), which is
something required for nearly every in/srcline callchain node deletion
(in/srcline__tree_delete()).

Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: http://lkml.kernel.org/r/20181206191819.30182-4-dave@stgolabs.net
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/perf/util/dso.c     |  4 ++--
 tools/perf/util/dso.h     |  4 ++--
 tools/perf/util/srcline.c | 43 +++++++++++++++++++++++++------------------
 tools/perf/util/srcline.h | 13 +++++++------
 4 files changed, 36 insertions(+), 28 deletions(-)

diff --git a/tools/perf/util/dso.c b/tools/perf/util/dso.c
index cbe6e7dc6af3..9ec4b2e6d4ac 100644
--- a/tools/perf/util/dso.c
+++ b/tools/perf/util/dso.c
@@ -1199,8 +1199,8 @@ struct dso *dso__new(const char *name)
 		dso__set_short_name(dso, dso->name, false);
 		dso->symbols = dso->symbol_names = RB_ROOT;
 		dso->data.cache = RB_ROOT;
-		dso->inlined_nodes = RB_ROOT;
-		dso->srclines = RB_ROOT;
+		dso->inlined_nodes = RB_ROOT_CACHED;
+		dso->srclines = RB_ROOT_CACHED;
 		dso->data.fd = -1;
 		dso->data.status = DSO_DATA_STATUS_UNKNOWN;
 		dso->symtab_type = DSO_BINARY_TYPE__NOT_FOUND;
diff --git a/tools/perf/util/dso.h b/tools/perf/util/dso.h
index 2de54c0b26b2..40edfb375a8b 100644
--- a/tools/perf/util/dso.h
+++ b/tools/perf/util/dso.h
@@ -143,8 +143,8 @@ struct dso {
 	struct rb_root	 *root;		/* root of rbtree that rb_node is in */
 	struct rb_root	 symbols;
 	struct rb_root	 symbol_names;
-	struct rb_root	 inlined_nodes;
-	struct rb_root	 srclines;
+	struct rb_root_cached inlined_nodes;
+	struct rb_root_cached srclines;
 	struct {
 		u64		addr;
 		struct symbol	*symbol;
diff --git a/tools/perf/util/srcline.c b/tools/perf/util/srcline.c
index dc86597d0cc4..00f215580b5a 100644
--- a/tools/perf/util/srcline.c
+++ b/tools/perf/util/srcline.c
@@ -594,11 +594,12 @@ struct srcline_node {
 	struct rb_node		rb_node;
 };
 
-void srcline__tree_insert(struct rb_root *tree, u64 addr, char *srcline)
+void srcline__tree_insert(struct rb_root_cached *tree, u64 addr, char *srcline)
 {
-	struct rb_node **p = &tree->rb_node;
+	struct rb_node **p = &tree->rb_root.rb_node;
 	struct rb_node *parent = NULL;
 	struct srcline_node *i, *node;
+	bool leftmost = true;
 
 	node = zalloc(sizeof(struct srcline_node));
 	if (!node) {
@@ -614,16 +615,18 @@ void srcline__tree_insert(struct rb_root *tree, u64 addr, char *srcline)
 		i = rb_entry(parent, struct srcline_node, rb_node);
 		if (addr < i->addr)
 			p = &(*p)->rb_left;
-		else
+		else {
 			p = &(*p)->rb_right;
+			leftmost = false;
+		}
 	}
 	rb_link_node(&node->rb_node, parent, p);
-	rb_insert_color(&node->rb_node, tree);
+	rb_insert_color_cached(&node->rb_node, tree, leftmost);
 }
 
-char *srcline__tree_find(struct rb_root *tree, u64 addr)
+char *srcline__tree_find(struct rb_root_cached *tree, u64 addr)
 {
-	struct rb_node *n = tree->rb_node;
+	struct rb_node *n = tree->rb_root.rb_node;
 
 	while (n) {
 		struct srcline_node *i = rb_entry(n, struct srcline_node,
@@ -640,15 +643,15 @@ char *srcline__tree_find(struct rb_root *tree, u64 addr)
 	return NULL;
 }
 
-void srcline__tree_delete(struct rb_root *tree)
+void srcline__tree_delete(struct rb_root_cached *tree)
 {
 	struct srcline_node *pos;
-	struct rb_node *next = rb_first(tree);
+	struct rb_node *next = rb_first_cached(tree);
 
 	while (next) {
 		pos = rb_entry(next, struct srcline_node, rb_node);
 		next = rb_next(&pos->rb_node);
-		rb_erase(&pos->rb_node, tree);
+		rb_erase_cached(&pos->rb_node, tree);
 		free_srcline(pos->srcline);
 		zfree(&pos);
 	}
@@ -682,28 +685,32 @@ void inline_node__delete(struct inline_node *node)
 	free(node);
 }
 
-void inlines__tree_insert(struct rb_root *tree, struct inline_node *inlines)
+void inlines__tree_insert(struct rb_root_cached *tree,
+			  struct inline_node *inlines)
 {
-	struct rb_node **p = &tree->rb_node;
+	struct rb_node **p = &tree->rb_root.rb_node;
 	struct rb_node *parent = NULL;
 	const u64 addr = inlines->addr;
 	struct inline_node *i;
+	bool leftmost = true;
 
 	while (*p != NULL) {
 		parent = *p;
 		i = rb_entry(parent, struct inline_node, rb_node);
 		if (addr < i->addr)
 			p = &(*p)->rb_left;
-		else
+		else {
 			p = &(*p)->rb_right;
+			leftmost = false;
+		}
 	}
 	rb_link_node(&inlines->rb_node, parent, p);
-	rb_insert_color(&inlines->rb_node, tree);
+	rb_insert_color_cached(&inlines->rb_node, tree, leftmost);
 }
 
-struct inline_node *inlines__tree_find(struct rb_root *tree, u64 addr)
+struct inline_node *inlines__tree_find(struct rb_root_cached *tree, u64 addr)
 {
-	struct rb_node *n = tree->rb_node;
+	struct rb_node *n = tree->rb_root.rb_node;
 
 	while (n) {
 		struct inline_node *i = rb_entry(n, struct inline_node,
@@ -720,15 +727,15 @@ struct inline_node *inlines__tree_find(struct rb_root *tree, u64 addr)
 	return NULL;
 }
 
-void inlines__tree_delete(struct rb_root *tree)
+void inlines__tree_delete(struct rb_root_cached *tree)
 {
 	struct inline_node *pos;
-	struct rb_node *next = rb_first(tree);
+	struct rb_node *next = rb_first_cached(tree);
 
 	while (next) {
 		pos = rb_entry(next, struct inline_node, rb_node);
 		next = rb_next(&pos->rb_node);
-		rb_erase(&pos->rb_node, tree);
+		rb_erase_cached(&pos->rb_node, tree);
 		inline_node__delete(pos);
 	}
 }
diff --git a/tools/perf/util/srcline.h b/tools/perf/util/srcline.h
index 5762212dc342..b11a0aaaa676 100644
--- a/tools/perf/util/srcline.h
+++ b/tools/perf/util/srcline.h
@@ -19,11 +19,11 @@ void free_srcline(char *srcline);
 char *get_srcline_split(struct dso *dso, u64 addr, unsigned *line);
 
 /* insert the srcline into the DSO, which will take ownership */
-void srcline__tree_insert(struct rb_root *tree, u64 addr, char *srcline);
+void srcline__tree_insert(struct rb_root_cached *tree, u64 addr, char *srcline);
 /* find previously inserted srcline */
-char *srcline__tree_find(struct rb_root *tree, u64 addr);
+char *srcline__tree_find(struct rb_root_cached *tree, u64 addr);
 /* delete all srclines within the tree */
-void srcline__tree_delete(struct rb_root *tree);
+void srcline__tree_delete(struct rb_root_cached *tree);
 
 #define SRCLINE_UNKNOWN  ((char *) "??:0")
 
@@ -46,10 +46,11 @@ struct inline_node *dso__parse_addr_inlines(struct dso *dso, u64 addr,
 void inline_node__delete(struct inline_node *node);
 
 /* insert the inline node list into the DSO, which will take ownership */
-void inlines__tree_insert(struct rb_root *tree, struct inline_node *inlines);
+void inlines__tree_insert(struct rb_root_cached *tree,
+			  struct inline_node *inlines);
 /* find previously inserted inline node list */
-struct inline_node *inlines__tree_find(struct rb_root *tree, u64 addr);
+struct inline_node *inlines__tree_find(struct rb_root_cached *tree, u64 addr);
 /* delete all nodes within the tree of inline_node s */
-void inlines__tree_delete(struct rb_root *tree);
+void inlines__tree_delete(struct rb_root_cached *tree);
 
 #endif /* PERF_SRCLINE_H */

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [tip:perf/core] perf util: Use cached rbtree for rblists
  2018-12-06 19:18 ` [PATCH 4/7] perf util: Use cached rbtree for rblists Davidlohr Bueso
@ 2019-01-26 10:03   ` tip-bot for Davidlohr Bueso
  0 siblings, 0 replies; 18+ messages in thread
From: tip-bot for Davidlohr Bueso @ 2019-01-26 10:03 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: jolsa, hpa, namhyung, dbueso, mingo, linux-kernel, tglx, dave, acme

Commit-ID:  ca2270292e6c3415102242bf9dc3d05f622b7b28
Gitweb:     https://git.kernel.org/tip/ca2270292e6c3415102242bf9dc3d05f622b7b28
Author:     Davidlohr Bueso <dave@stgolabs.net>
AuthorDate: Thu, 6 Dec 2018 11:18:16 -0800
Committer:  Arnaldo Carvalho de Melo <acme@redhat.com>
CommitDate: Fri, 25 Jan 2019 15:12:10 +0100

perf util: Use cached rbtree for rblists

At the cost of an extra pointer, we can avoid the O(logN) cost of
finding the first element in the tree (smallest node), which is
something required for any of the strlist or intlist traversals
(XXX_for_each_entry()). There are a number of users in perf of these
(particularly strlists), including probes, and buildid.

Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: http://lkml.kernel.org/r/20181206191819.30182-5-dave@stgolabs.net
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/perf/util/intlist.h     |  2 +-
 tools/perf/util/metricgroup.c |  2 +-
 tools/perf/util/rb_resort.h   |  2 +-
 tools/perf/util/rblist.c      | 28 ++++++++++++++++++----------
 tools/perf/util/rblist.h      |  2 +-
 tools/perf/util/stat-shadow.c |  2 +-
 tools/perf/util/strlist.h     |  2 +-
 7 files changed, 24 insertions(+), 16 deletions(-)

diff --git a/tools/perf/util/intlist.h b/tools/perf/util/intlist.h
index 85bab8735fa9..5c19ee001299 100644
--- a/tools/perf/util/intlist.h
+++ b/tools/perf/util/intlist.h
@@ -45,7 +45,7 @@ static inline unsigned int intlist__nr_entries(const struct intlist *ilist)
 /* For intlist iteration */
 static inline struct int_node *intlist__first(struct intlist *ilist)
 {
-	struct rb_node *rn = rb_first(&ilist->rblist.entries);
+	struct rb_node *rn = rb_first_cached(&ilist->rblist.entries);
 	return rn ? rb_entry(rn, struct int_node, rb_node) : NULL;
 }
 static inline struct int_node *intlist__next(struct int_node *in)
diff --git a/tools/perf/util/metricgroup.c b/tools/perf/util/metricgroup.c
index a28f9b5cc4ff..8529cbd3955b 100644
--- a/tools/perf/util/metricgroup.c
+++ b/tools/perf/util/metricgroup.c
@@ -352,7 +352,7 @@ void metricgroup__print(bool metrics, bool metricgroups, char *filter,
 	else if (metrics && !raw)
 		printf("\nMetrics:\n\n");
 
-	for (node = rb_first(&groups.entries); node; node = next) {
+	for (node = rb_first_cached(&groups.entries); node; node = next) {
 		struct mep *me = container_of(node, struct mep, nd);
 
 		if (metricgroups)
diff --git a/tools/perf/util/rb_resort.h b/tools/perf/util/rb_resort.h
index f272e181d3d6..376e86cb4c3c 100644
--- a/tools/perf/util/rb_resort.h
+++ b/tools/perf/util/rb_resort.h
@@ -140,7 +140,7 @@ struct __name##_sorted *__name = __name##_sorted__new
 
 /* For 'struct intlist' */
 #define DECLARE_RESORT_RB_INTLIST(__name, __ilist)				\
-	DECLARE_RESORT_RB(__name)(&__ilist->rblist.entries,			\
+	DECLARE_RESORT_RB(__name)(&__ilist->rblist.entries.rb_root,		\
 				  __ilist->rblist.nr_entries)
 
 /* For 'struct machine->threads' */
diff --git a/tools/perf/util/rblist.c b/tools/perf/util/rblist.c
index 0efc3258c648..11e07fab20dc 100644
--- a/tools/perf/util/rblist.c
+++ b/tools/perf/util/rblist.c
@@ -13,8 +13,9 @@
 
 int rblist__add_node(struct rblist *rblist, const void *new_entry)
 {
-	struct rb_node **p = &rblist->entries.rb_node;
+	struct rb_node **p = &rblist->entries.rb_root.rb_node;
 	struct rb_node *parent = NULL, *new_node;
+	bool leftmost = true;
 
 	while (*p != NULL) {
 		int rc;
@@ -24,8 +25,10 @@ int rblist__add_node(struct rblist *rblist, const void *new_entry)
 		rc = rblist->node_cmp(parent, new_entry);
 		if (rc > 0)
 			p = &(*p)->rb_left;
-		else if (rc < 0)
+		else if (rc < 0) {
 			p = &(*p)->rb_right;
+			leftmost = false;
+		}
 		else
 			return -EEXIST;
 	}
@@ -35,7 +38,7 @@ int rblist__add_node(struct rblist *rblist, const void *new_entry)
 		return -ENOMEM;
 
 	rb_link_node(new_node, parent, p);
-	rb_insert_color(new_node, &rblist->entries);
+	rb_insert_color_cached(new_node, &rblist->entries, leftmost);
 	++rblist->nr_entries;
 
 	return 0;
@@ -43,7 +46,7 @@ int rblist__add_node(struct rblist *rblist, const void *new_entry)
 
 void rblist__remove_node(struct rblist *rblist, struct rb_node *rb_node)
 {
-	rb_erase(rb_node, &rblist->entries);
+	rb_erase_cached(rb_node, &rblist->entries);
 	--rblist->nr_entries;
 	rblist->node_delete(rblist, rb_node);
 }
@@ -52,8 +55,9 @@ static struct rb_node *__rblist__findnew(struct rblist *rblist,
 					 const void *entry,
 					 bool create)
 {
-	struct rb_node **p = &rblist->entries.rb_node;
+	struct rb_node **p = &rblist->entries.rb_root.rb_node;
 	struct rb_node *parent = NULL, *new_node = NULL;
+	bool leftmost = true;
 
 	while (*p != NULL) {
 		int rc;
@@ -63,8 +67,10 @@ static struct rb_node *__rblist__findnew(struct rblist *rblist,
 		rc = rblist->node_cmp(parent, entry);
 		if (rc > 0)
 			p = &(*p)->rb_left;
-		else if (rc < 0)
+		else if (rc < 0) {
 			p = &(*p)->rb_right;
+			leftmost = false;
+		}
 		else
 			return parent;
 	}
@@ -73,7 +79,8 @@ static struct rb_node *__rblist__findnew(struct rblist *rblist,
 		new_node = rblist->node_new(rblist, entry);
 		if (new_node) {
 			rb_link_node(new_node, parent, p);
-			rb_insert_color(new_node, &rblist->entries);
+			rb_insert_color_cached(new_node,
+					       &rblist->entries, leftmost);
 			++rblist->nr_entries;
 		}
 	}
@@ -94,7 +101,7 @@ struct rb_node *rblist__findnew(struct rblist *rblist, const void *entry)
 void rblist__init(struct rblist *rblist)
 {
 	if (rblist != NULL) {
-		rblist->entries	 = RB_ROOT;
+		rblist->entries	 = RB_ROOT_CACHED;
 		rblist->nr_entries = 0;
 	}
 
@@ -103,7 +110,7 @@ void rblist__init(struct rblist *rblist)
 
 void rblist__exit(struct rblist *rblist)
 {
-	struct rb_node *pos, *next = rb_first(&rblist->entries);
+	struct rb_node *pos, *next = rb_first_cached(&rblist->entries);
 
 	while (next) {
 		pos = next;
@@ -124,7 +131,8 @@ struct rb_node *rblist__entry(const struct rblist *rblist, unsigned int idx)
 {
 	struct rb_node *node;
 
-	for (node = rb_first(&rblist->entries); node; node = rb_next(node)) {
+	for (node = rb_first_cached(&rblist->entries); node;
+	     node = rb_next(node)) {
 		if (!idx--)
 			return node;
 	}
diff --git a/tools/perf/util/rblist.h b/tools/perf/util/rblist.h
index 76df15c27f5f..14b232a4d0b6 100644
--- a/tools/perf/util/rblist.h
+++ b/tools/perf/util/rblist.h
@@ -20,7 +20,7 @@
  */
 
 struct rblist {
-	struct rb_root entries;
+	struct rb_root_cached entries;
 	unsigned int   nr_entries;
 
 	int (*node_cmp)(struct rb_node *rbn, const void *entry);
diff --git a/tools/perf/util/stat-shadow.c b/tools/perf/util/stat-shadow.c
index 3c22c58b3e90..83d8094be4fe 100644
--- a/tools/perf/util/stat-shadow.c
+++ b/tools/perf/util/stat-shadow.c
@@ -168,7 +168,7 @@ static void reset_stat(struct runtime_stat *st)
 	struct rb_node *pos, *next;
 
 	rblist = &st->value_list;
-	next = rb_first(&rblist->entries);
+	next = rb_first_cached(&rblist->entries);
 	while (next) {
 		pos = next;
 		next = rb_next(pos);
diff --git a/tools/perf/util/strlist.h b/tools/perf/util/strlist.h
index d58f1e08b170..7e82c71dcc42 100644
--- a/tools/perf/util/strlist.h
+++ b/tools/perf/util/strlist.h
@@ -57,7 +57,7 @@ static inline unsigned int strlist__nr_entries(const struct strlist *slist)
 /* For strlist iteration */
 static inline struct str_node *strlist__first(struct strlist *slist)
 {
-	struct rb_node *rn = rb_first(&slist->rblist.entries);
+	struct rb_node *rn = rb_first_cached(&slist->rblist.entries);
 	return rn ? rb_entry(rn, struct str_node, rb_node) : NULL;
 }
 static inline struct str_node *strlist__next(struct str_node *sn)

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [tip:perf/core] perf symbols: Use cached rbtrees
  2018-12-06 19:18 ` [PATCH 5/7] perf symbols: Use cached rbtrees Davidlohr Bueso
@ 2019-01-26 10:03   ` tip-bot for Davidlohr Bueso
  0 siblings, 0 replies; 18+ messages in thread
From: tip-bot for Davidlohr Bueso @ 2019-01-26 10:03 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: mingo, jolsa, acme, tglx, dave, namhyung, linux-kernel, dbueso, hpa

Commit-ID:  7137ff50b68a48bc28270c91b1c313259ab0c1c4
Gitweb:     https://git.kernel.org/tip/7137ff50b68a48bc28270c91b1c313259ab0c1c4
Author:     Davidlohr Bueso <dave@stgolabs.net>
AuthorDate: Thu, 6 Dec 2018 11:18:17 -0800
Committer:  Arnaldo Carvalho de Melo <acme@redhat.com>
CommitDate: Fri, 25 Jan 2019 15:12:10 +0100

perf symbols: Use cached rbtrees

At the cost of an extra pointer, we can avoid the O(logN) cost of
finding the first element in the tree (smallest node).

Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: http://lkml.kernel.org/r/20181206191819.30182-6-dave@stgolabs.net
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/perf/builtin-annotate.c    |  2 +-
 tools/perf/util/dso.c            |  4 +-
 tools/perf/util/dso.h            |  6 +--
 tools/perf/util/map.c            |  8 ++--
 tools/perf/util/probe-event.c    |  3 +-
 tools/perf/util/symbol.c         | 87 ++++++++++++++++++++++------------------
 tools/perf/util/symbol.h         | 13 +++---
 tools/perf/util/symbol_fprintf.c |  2 +-
 8 files changed, 67 insertions(+), 58 deletions(-)

diff --git a/tools/perf/builtin-annotate.c b/tools/perf/builtin-annotate.c
index 93d679eaf1f4..cc3da5564300 100644
--- a/tools/perf/builtin-annotate.c
+++ b/tools/perf/builtin-annotate.c
@@ -227,7 +227,7 @@ static int perf_evsel__add_sample(struct perf_evsel *evsel,
 		 * the DSO?
 		 */
 		if (al->sym != NULL) {
-			rb_erase(&al->sym->rb_node,
+			rb_erase_cached(&al->sym->rb_node,
 				 &al->map->dso->symbols);
 			symbol__delete(al->sym);
 			dso__reset_find_symbol_cache(al->map->dso);
diff --git a/tools/perf/util/dso.c b/tools/perf/util/dso.c
index 9ec4b2e6d4ac..a8a54115b115 100644
--- a/tools/perf/util/dso.c
+++ b/tools/perf/util/dso.c
@@ -1197,7 +1197,7 @@ struct dso *dso__new(const char *name)
 		strcpy(dso->name, name);
 		dso__set_long_name(dso, dso->name, false);
 		dso__set_short_name(dso, dso->name, false);
-		dso->symbols = dso->symbol_names = RB_ROOT;
+		dso->symbols = dso->symbol_names = RB_ROOT_CACHED;
 		dso->data.cache = RB_ROOT;
 		dso->inlined_nodes = RB_ROOT_CACHED;
 		dso->srclines = RB_ROOT_CACHED;
@@ -1469,7 +1469,7 @@ size_t dso__fprintf(struct dso *dso, FILE *fp)
 	ret += fprintf(fp, "%sloaded, ", dso__loaded(dso) ? "" : "NOT ");
 	ret += dso__fprintf_buildid(dso, fp);
 	ret += fprintf(fp, ")\n");
-	for (nd = rb_first(&dso->symbols); nd; nd = rb_next(nd)) {
+	for (nd = rb_first_cached(&dso->symbols); nd; nd = rb_next(nd)) {
 		struct symbol *pos = rb_entry(nd, struct symbol, rb_node);
 		ret += symbol__fprintf(pos, fp);
 	}
diff --git a/tools/perf/util/dso.h b/tools/perf/util/dso.h
index 40edfb375a8b..bb417c54c25a 100644
--- a/tools/perf/util/dso.h
+++ b/tools/perf/util/dso.h
@@ -141,8 +141,8 @@ struct dso {
 	struct list_head node;
 	struct rb_node	 rb_node;	/* rbtree node sorted by long name */
 	struct rb_root	 *root;		/* root of rbtree that rb_node is in */
-	struct rb_root	 symbols;
-	struct rb_root	 symbol_names;
+	struct rb_root_cached symbols;
+	struct rb_root_cached symbol_names;
 	struct rb_root_cached inlined_nodes;
 	struct rb_root_cached srclines;
 	struct {
@@ -236,7 +236,7 @@ bool dso__loaded(const struct dso *dso);
 
 static inline bool dso__has_symbols(const struct dso *dso)
 {
-	return !RB_EMPTY_ROOT(&dso->symbols);
+	return !RB_EMPTY_ROOT(&dso->symbols.rb_root);
 }
 
 bool dso__sorted_by_name(const struct dso *dso);
diff --git a/tools/perf/util/map.c b/tools/perf/util/map.c
index 6751301a755c..cf407cc9d915 100644
--- a/tools/perf/util/map.c
+++ b/tools/perf/util/map.c
@@ -286,8 +286,8 @@ void map__put(struct map *map)
 
 void map__fixup_start(struct map *map)
 {
-	struct rb_root *symbols = &map->dso->symbols;
-	struct rb_node *nd = rb_first(symbols);
+	struct rb_root_cached *symbols = &map->dso->symbols;
+	struct rb_node *nd = rb_first_cached(symbols);
 	if (nd != NULL) {
 		struct symbol *sym = rb_entry(nd, struct symbol, rb_node);
 		map->start = sym->start;
@@ -296,8 +296,8 @@ void map__fixup_start(struct map *map)
 
 void map__fixup_end(struct map *map)
 {
-	struct rb_root *symbols = &map->dso->symbols;
-	struct rb_node *nd = rb_last(symbols);
+	struct rb_root_cached *symbols = &map->dso->symbols;
+	struct rb_node *nd = rb_last(&symbols->rb_root);
 	if (nd != NULL) {
 		struct symbol *sym = rb_entry(nd, struct symbol, rb_node);
 		map->end = sym->end;
diff --git a/tools/perf/util/probe-event.c b/tools/perf/util/probe-event.c
index 736787a18b9e..4008f0cc36e5 100644
--- a/tools/perf/util/probe-event.c
+++ b/tools/perf/util/probe-event.c
@@ -3529,7 +3529,8 @@ int show_available_funcs(const char *target, struct nsinfo *nsi,
 	/* Show all (filtered) symbols */
 	setup_pager();
 
-	for (nd = rb_first(&map->dso->symbol_names); nd; nd = rb_next(nd)) {
+	for (nd = rb_first_cached(&map->dso->symbol_names); nd;
+	     nd = rb_next(nd)) {
 		struct symbol_name_rb_node *pos = rb_entry(nd, struct symbol_name_rb_node, rb_node);
 
 		if (strfilter__compare(_filter, pos->sym.name))
diff --git a/tools/perf/util/symbol.c b/tools/perf/util/symbol.c
index 48efad6d0f90..bcbcbd610460 100644
--- a/tools/perf/util/symbol.c
+++ b/tools/perf/util/symbol.c
@@ -163,7 +163,7 @@ static int choose_best_symbol(struct symbol *syma, struct symbol *symb)
 	return arch__choose_best_symbol(syma, symb);
 }
 
-void symbols__fixup_duplicate(struct rb_root *symbols)
+void symbols__fixup_duplicate(struct rb_root_cached *symbols)
 {
 	struct rb_node *nd;
 	struct symbol *curr, *next;
@@ -171,7 +171,7 @@ void symbols__fixup_duplicate(struct rb_root *symbols)
 	if (symbol_conf.allow_aliases)
 		return;
 
-	nd = rb_first(symbols);
+	nd = rb_first_cached(symbols);
 
 	while (nd) {
 		curr = rb_entry(nd, struct symbol, rb_node);
@@ -186,20 +186,20 @@ again:
 			continue;
 
 		if (choose_best_symbol(curr, next) == SYMBOL_A) {
-			rb_erase(&next->rb_node, symbols);
+			rb_erase_cached(&next->rb_node, symbols);
 			symbol__delete(next);
 			goto again;
 		} else {
 			nd = rb_next(&curr->rb_node);
-			rb_erase(&curr->rb_node, symbols);
+			rb_erase_cached(&curr->rb_node, symbols);
 			symbol__delete(curr);
 		}
 	}
 }
 
-void symbols__fixup_end(struct rb_root *symbols)
+void symbols__fixup_end(struct rb_root_cached *symbols)
 {
-	struct rb_node *nd, *prevnd = rb_first(symbols);
+	struct rb_node *nd, *prevnd = rb_first_cached(symbols);
 	struct symbol *curr, *prev;
 
 	if (prevnd == NULL)
@@ -282,25 +282,27 @@ void symbol__delete(struct symbol *sym)
 	free(((void *)sym) - symbol_conf.priv_size);
 }
 
-void symbols__delete(struct rb_root *symbols)
+void symbols__delete(struct rb_root_cached *symbols)
 {
 	struct symbol *pos;
-	struct rb_node *next = rb_first(symbols);
+	struct rb_node *next = rb_first_cached(symbols);
 
 	while (next) {
 		pos = rb_entry(next, struct symbol, rb_node);
 		next = rb_next(&pos->rb_node);
-		rb_erase(&pos->rb_node, symbols);
+		rb_erase_cached(&pos->rb_node, symbols);
 		symbol__delete(pos);
 	}
 }
 
-void __symbols__insert(struct rb_root *symbols, struct symbol *sym, bool kernel)
+void __symbols__insert(struct rb_root_cached *symbols,
+		       struct symbol *sym, bool kernel)
 {
-	struct rb_node **p = &symbols->rb_node;
+	struct rb_node **p = &symbols->rb_root.rb_node;
 	struct rb_node *parent = NULL;
 	const u64 ip = sym->start;
 	struct symbol *s;
+	bool leftmost = true;
 
 	if (kernel) {
 		const char *name = sym->name;
@@ -318,26 +320,28 @@ void __symbols__insert(struct rb_root *symbols, struct symbol *sym, bool kernel)
 		s = rb_entry(parent, struct symbol, rb_node);
 		if (ip < s->start)
 			p = &(*p)->rb_left;
-		else
+		else {
 			p = &(*p)->rb_right;
+			leftmost = false;
+		}
 	}
 	rb_link_node(&sym->rb_node, parent, p);
-	rb_insert_color(&sym->rb_node, symbols);
+	rb_insert_color_cached(&sym->rb_node, symbols, leftmost);
 }
 
-void symbols__insert(struct rb_root *symbols, struct symbol *sym)
+void symbols__insert(struct rb_root_cached *symbols, struct symbol *sym)
 {
 	__symbols__insert(symbols, sym, false);
 }
 
-static struct symbol *symbols__find(struct rb_root *symbols, u64 ip)
+static struct symbol *symbols__find(struct rb_root_cached *symbols, u64 ip)
 {
 	struct rb_node *n;
 
 	if (symbols == NULL)
 		return NULL;
 
-	n = symbols->rb_node;
+	n = symbols->rb_root.rb_node;
 
 	while (n) {
 		struct symbol *s = rb_entry(n, struct symbol, rb_node);
@@ -353,9 +357,9 @@ static struct symbol *symbols__find(struct rb_root *symbols, u64 ip)
 	return NULL;
 }
 
-static struct symbol *symbols__first(struct rb_root *symbols)
+static struct symbol *symbols__first(struct rb_root_cached *symbols)
 {
-	struct rb_node *n = rb_first(symbols);
+	struct rb_node *n = rb_first_cached(symbols);
 
 	if (n)
 		return rb_entry(n, struct symbol, rb_node);
@@ -363,9 +367,9 @@ static struct symbol *symbols__first(struct rb_root *symbols)
 	return NULL;
 }
 
-static struct symbol *symbols__last(struct rb_root *symbols)
+static struct symbol *symbols__last(struct rb_root_cached *symbols)
 {
-	struct rb_node *n = rb_last(symbols);
+	struct rb_node *n = rb_last(&symbols->rb_root);
 
 	if (n)
 		return rb_entry(n, struct symbol, rb_node);
@@ -383,11 +387,12 @@ static struct symbol *symbols__next(struct symbol *sym)
 	return NULL;
 }
 
-static void symbols__insert_by_name(struct rb_root *symbols, struct symbol *sym)
+static void symbols__insert_by_name(struct rb_root_cached *symbols, struct symbol *sym)
 {
-	struct rb_node **p = &symbols->rb_node;
+	struct rb_node **p = &symbols->rb_root.rb_node;
 	struct rb_node *parent = NULL;
 	struct symbol_name_rb_node *symn, *s;
+	bool leftmost = true;
 
 	symn = container_of(sym, struct symbol_name_rb_node, sym);
 
@@ -396,19 +401,21 @@ static void symbols__insert_by_name(struct rb_root *symbols, struct symbol *sym)
 		s = rb_entry(parent, struct symbol_name_rb_node, rb_node);
 		if (strcmp(sym->name, s->sym.name) < 0)
 			p = &(*p)->rb_left;
-		else
+		else {
 			p = &(*p)->rb_right;
+			leftmost = false;
+		}
 	}
 	rb_link_node(&symn->rb_node, parent, p);
-	rb_insert_color(&symn->rb_node, symbols);
+	rb_insert_color_cached(&symn->rb_node, symbols, leftmost);
 }
 
-static void symbols__sort_by_name(struct rb_root *symbols,
-				  struct rb_root *source)
+static void symbols__sort_by_name(struct rb_root_cached *symbols,
+				  struct rb_root_cached *source)
 {
 	struct rb_node *nd;
 
-	for (nd = rb_first(source); nd; nd = rb_next(nd)) {
+	for (nd = rb_first_cached(source); nd; nd = rb_next(nd)) {
 		struct symbol *pos = rb_entry(nd, struct symbol, rb_node);
 		symbols__insert_by_name(symbols, pos);
 	}
@@ -431,7 +438,7 @@ int symbol__match_symbol_name(const char *name, const char *str,
 		return arch__compare_symbol_names(name, str);
 }
 
-static struct symbol *symbols__find_by_name(struct rb_root *symbols,
+static struct symbol *symbols__find_by_name(struct rb_root_cached *symbols,
 					    const char *name,
 					    enum symbol_tag_include includes)
 {
@@ -441,7 +448,7 @@ static struct symbol *symbols__find_by_name(struct rb_root *symbols,
 	if (symbols == NULL)
 		return NULL;
 
-	n = symbols->rb_node;
+	n = symbols->rb_root.rb_node;
 
 	while (n) {
 		int cmp;
@@ -644,7 +651,7 @@ static int map__process_kallsym_symbol(void *arg, const char *name,
 {
 	struct symbol *sym;
 	struct dso *dso = arg;
-	struct rb_root *root = &dso->symbols;
+	struct rb_root_cached *root = &dso->symbols;
 
 	if (!symbol_type__filter(type))
 		return 0;
@@ -681,14 +688,14 @@ static int map_groups__split_kallsyms_for_kcore(struct map_groups *kmaps, struct
 	struct map *curr_map;
 	struct symbol *pos;
 	int count = 0;
-	struct rb_root old_root = dso->symbols;
-	struct rb_root *root = &dso->symbols;
-	struct rb_node *next = rb_first(root);
+	struct rb_root_cached old_root = dso->symbols;
+	struct rb_root_cached *root = &dso->symbols;
+	struct rb_node *next = rb_first_cached(root);
 
 	if (!kmaps)
 		return -1;
 
-	*root = RB_ROOT;
+	*root = RB_ROOT_CACHED;
 
 	while (next) {
 		char *module;
@@ -696,8 +703,8 @@ static int map_groups__split_kallsyms_for_kcore(struct map_groups *kmaps, struct
 		pos = rb_entry(next, struct symbol, rb_node);
 		next = rb_next(&pos->rb_node);
 
-		rb_erase_init(&pos->rb_node, &old_root);
-
+		rb_erase_cached(&pos->rb_node, &old_root);
+		RB_CLEAR_NODE(&pos->rb_node);
 		module = strchr(pos->name, '\t');
 		if (module)
 			*module = '\0';
@@ -734,8 +741,8 @@ static int map_groups__split_kallsyms(struct map_groups *kmaps, struct dso *dso,
 	struct map *curr_map = initial_map;
 	struct symbol *pos;
 	int count = 0, moved = 0;
-	struct rb_root *root = &dso->symbols;
-	struct rb_node *next = rb_first(root);
+	struct rb_root_cached *root = &dso->symbols;
+	struct rb_node *next = rb_first_cached(root);
 	int kernel_range = 0;
 	bool x86_64;
 
@@ -849,7 +856,7 @@ static int map_groups__split_kallsyms(struct map_groups *kmaps, struct dso *dso,
 		}
 add_symbol:
 		if (curr_map != initial_map) {
-			rb_erase(&pos->rb_node, root);
+			rb_erase_cached(&pos->rb_node, root);
 			symbols__insert(&curr_map->dso->symbols, pos);
 			++moved;
 		} else
@@ -857,7 +864,7 @@ add_symbol:
 
 		continue;
 discard_symbol:
-		rb_erase(&pos->rb_node, root);
+		rb_erase_cached(&pos->rb_node, root);
 		symbol__delete(pos);
 	}
 
diff --git a/tools/perf/util/symbol.h b/tools/perf/util/symbol.h
index 2162a5d9b3ac..56e2bcb907cc 100644
--- a/tools/perf/util/symbol.h
+++ b/tools/perf/util/symbol.h
@@ -68,7 +68,7 @@ struct symbol {
 };
 
 void symbol__delete(struct symbol *sym);
-void symbols__delete(struct rb_root *symbols);
+void symbols__delete(struct rb_root_cached *symbols);
 
 /* symbols__for_each_entry - iterate over symbols (rb_root)
  *
@@ -77,7 +77,7 @@ void symbols__delete(struct rb_root *symbols);
  * @nd: the 'struct rb_node *' to use as a temporary storage
  */
 #define symbols__for_each_entry(symbols, pos, nd)			\
-	for (nd = rb_first(symbols);					\
+	for (nd = rb_first_cached(symbols);					\
 	     nd && (pos = rb_entry(nd, struct symbol, rb_node));	\
 	     nd = rb_next(nd))
 
@@ -247,10 +247,11 @@ int dso__synthesize_plt_symbols(struct dso *dso, struct symsrc *ss);
 
 char *dso__demangle_sym(struct dso *dso, int kmodule, const char *elf_name);
 
-void __symbols__insert(struct rb_root *symbols, struct symbol *sym, bool kernel);
-void symbols__insert(struct rb_root *symbols, struct symbol *sym);
-void symbols__fixup_duplicate(struct rb_root *symbols);
-void symbols__fixup_end(struct rb_root *symbols);
+void __symbols__insert(struct rb_root_cached *symbols, struct symbol *sym,
+		       bool kernel);
+void symbols__insert(struct rb_root_cached *symbols, struct symbol *sym);
+void symbols__fixup_duplicate(struct rb_root_cached *symbols);
+void symbols__fixup_end(struct rb_root_cached *symbols);
 void map_groups__fixup_end(struct map_groups *mg);
 
 typedef int (*mapfn_t)(u64 start, u64 len, u64 pgoff, void *data);
diff --git a/tools/perf/util/symbol_fprintf.c b/tools/perf/util/symbol_fprintf.c
index 50472c75da59..02e89b02c2ce 100644
--- a/tools/perf/util/symbol_fprintf.c
+++ b/tools/perf/util/symbol_fprintf.c
@@ -65,7 +65,7 @@ size_t dso__fprintf_symbols_by_name(struct dso *dso,
 	struct rb_node *nd;
 	struct symbol_name_rb_node *pos;
 
-	for (nd = rb_first(&dso->symbol_names); nd; nd = rb_next(nd)) {
+	for (nd = rb_first_cached(&dso->symbol_names); nd; nd = rb_next(nd)) {
 		pos = rb_entry(nd, struct symbol_name_rb_node, rb_node);
 		fprintf(fp, "%s\n", pos->sym.name);
 	}

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [tip:perf/core] perf hist: Use cached rbtrees
  2018-12-06 19:18 ` [PATCH 6/7] perf hist: " Davidlohr Bueso
  2019-01-22 13:59   ` Arnaldo Carvalho de Melo
@ 2019-01-26 10:04   ` tip-bot for Davidlohr Bueso
  1 sibling, 0 replies; 18+ messages in thread
From: tip-bot for Davidlohr Bueso @ 2019-01-26 10:04 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: dave, dbueso, hpa, tglx, linux-kernel, namhyung, jolsa, mingo, acme

Commit-ID:  2eb3d6894ae3b9cc8a94c91458a041c45773f23d
Gitweb:     https://git.kernel.org/tip/2eb3d6894ae3b9cc8a94c91458a041c45773f23d
Author:     Davidlohr Bueso <dave@stgolabs.net>
AuthorDate: Thu, 6 Dec 2018 11:18:18 -0800
Committer:  Arnaldo Carvalho de Melo <acme@redhat.com>
CommitDate: Fri, 25 Jan 2019 15:12:10 +0100

perf hist: Use cached rbtrees

At the cost of an extra pointer, we can avoid the O(logN) cost of
finding the first element in the tree (smallest node), which is
something heavily required for histograms. Specifically, the following
are converted to rb_root_cached, and users accordingly:

hist::entries_in_array
hist::entries_in
hist::entries
hist::entries_collapsed
hist_entry::hroot_in
hist_entry::hroot_out

Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: http://lkml.kernel.org/r/20181206191819.30182-7-dave@stgolabs.net
[ Added some missing conversions to rb_first_cached() ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/perf/builtin-annotate.c     |   2 +-
 tools/perf/builtin-c2c.c          |   6 +-
 tools/perf/builtin-diff.c         |  10 +-
 tools/perf/builtin-top.c          |   2 +-
 tools/perf/tests/hists_common.c   |   8 +-
 tools/perf/tests/hists_cumulate.c |  14 +--
 tools/perf/tests/hists_link.c     |   8 +-
 tools/perf/tests/hists_output.c   |  32 +++---
 tools/perf/ui/browsers/hists.c    |  16 +--
 tools/perf/ui/gtk/hists.c         |   6 +-
 tools/perf/ui/stdio/hist.c        |   3 +-
 tools/perf/util/hist.c            | 199 +++++++++++++++++++++-----------------
 tools/perf/util/hist.h            |  10 +-
 tools/perf/util/sort.h            |   4 +-
 14 files changed, 173 insertions(+), 147 deletions(-)

diff --git a/tools/perf/builtin-annotate.c b/tools/perf/builtin-annotate.c
index cc3da5564300..20db635347e5 100644
--- a/tools/perf/builtin-annotate.c
+++ b/tools/perf/builtin-annotate.c
@@ -305,7 +305,7 @@ static void hists__find_annotations(struct hists *hists,
 				    struct perf_evsel *evsel,
 				    struct perf_annotate *ann)
 {
-	struct rb_node *nd = rb_first(&hists->entries), *next;
+	struct rb_node *nd = rb_first_cached(&hists->entries), *next;
 	int key = K_RIGHT;
 
 	while (nd) {
diff --git a/tools/perf/builtin-c2c.c b/tools/perf/builtin-c2c.c
index f2863496dede..72ec0ae7d8ad 100644
--- a/tools/perf/builtin-c2c.c
+++ b/tools/perf/builtin-c2c.c
@@ -2088,7 +2088,7 @@ static int resort_hitm_cb(struct hist_entry *he)
 
 static int hists__iterate_cb(struct hists *hists, hists__resort_cb_t cb)
 {
-	struct rb_node *next = rb_first(&hists->entries);
+	struct rb_node *next = rb_first_cached(&hists->entries);
 	int ret = 0;
 
 	while (next) {
@@ -2215,7 +2215,7 @@ static void print_pareto(FILE *out)
 	if (WARN_ONCE(ret, "failed to setup sort entries\n"))
 		return;
 
-	nd = rb_first(&c2c.hists.hists.entries);
+	nd = rb_first_cached(&c2c.hists.hists.entries);
 
 	for (; nd; nd = rb_next(nd)) {
 		struct hist_entry *he = rb_entry(nd, struct hist_entry, rb_node);
@@ -2283,7 +2283,7 @@ static void perf_c2c__hists_fprintf(FILE *out, struct perf_session *session)
 static void c2c_browser__update_nr_entries(struct hist_browser *hb)
 {
 	u64 nr_entries = 0;
-	struct rb_node *nd = rb_first(&hb->hists->entries);
+	struct rb_node *nd = rb_first_cached(&hb->hists->entries);
 
 	while (nd) {
 		struct hist_entry *he = rb_entry(nd, struct hist_entry, rb_node);
diff --git a/tools/perf/builtin-diff.c b/tools/perf/builtin-diff.c
index 39db2ee32d48..751e1971456b 100644
--- a/tools/perf/builtin-diff.c
+++ b/tools/perf/builtin-diff.c
@@ -429,7 +429,7 @@ get_pair_fmt(struct hist_entry *he, struct diff_hpp_fmt *dfmt)
 
 static void hists__baseline_only(struct hists *hists)
 {
-	struct rb_root *root;
+	struct rb_root_cached *root;
 	struct rb_node *next;
 
 	if (hists__has(hists, need_collapse))
@@ -437,13 +437,13 @@ static void hists__baseline_only(struct hists *hists)
 	else
 		root = hists->entries_in;
 
-	next = rb_first(root);
+	next = rb_first_cached(root);
 	while (next != NULL) {
 		struct hist_entry *he = rb_entry(next, struct hist_entry, rb_node_in);
 
 		next = rb_next(&he->rb_node_in);
 		if (!hist_entry__next_pair(he)) {
-			rb_erase(&he->rb_node_in, root);
+			rb_erase_cached(&he->rb_node_in, root);
 			hist_entry__delete(he);
 		}
 	}
@@ -451,7 +451,7 @@ static void hists__baseline_only(struct hists *hists)
 
 static void hists__precompute(struct hists *hists)
 {
-	struct rb_root *root;
+	struct rb_root_cached *root;
 	struct rb_node *next;
 
 	if (hists__has(hists, need_collapse))
@@ -459,7 +459,7 @@ static void hists__precompute(struct hists *hists)
 	else
 		root = hists->entries_in;
 
-	next = rb_first(root);
+	next = rb_first_cached(root);
 	while (next != NULL) {
 		struct hist_entry *he, *pair;
 		struct data__file *d;
diff --git a/tools/perf/builtin-top.c b/tools/perf/builtin-top.c
index 5a486d4de56e..57b1d7495d02 100644
--- a/tools/perf/builtin-top.c
+++ b/tools/perf/builtin-top.c
@@ -367,7 +367,7 @@ static void perf_top__prompt_symbol(struct perf_top *top, const char *msg)
 	if (p)
 		*p = 0;
 
-	next = rb_first(&hists->entries);
+	next = rb_first_cached(&hists->entries);
 	while (next) {
 		n = rb_entry(next, struct hist_entry, rb_node);
 		if (n->ms.sym && !strcmp(buf, n->ms.sym->name)) {
diff --git a/tools/perf/tests/hists_common.c b/tools/perf/tests/hists_common.c
index b889a28fd80b..6b0649c8b33e 100644
--- a/tools/perf/tests/hists_common.c
+++ b/tools/perf/tests/hists_common.c
@@ -161,7 +161,7 @@ out:
 void print_hists_in(struct hists *hists)
 {
 	int i = 0;
-	struct rb_root *root;
+	struct rb_root_cached *root;
 	struct rb_node *node;
 
 	if (hists__has(hists, need_collapse))
@@ -170,7 +170,7 @@ void print_hists_in(struct hists *hists)
 		root = hists->entries_in;
 
 	pr_info("----- %s --------\n", __func__);
-	node = rb_first(root);
+	node = rb_first_cached(root);
 	while (node) {
 		struct hist_entry *he;
 
@@ -191,13 +191,13 @@ void print_hists_in(struct hists *hists)
 void print_hists_out(struct hists *hists)
 {
 	int i = 0;
-	struct rb_root *root;
+	struct rb_root_cached *root;
 	struct rb_node *node;
 
 	root = &hists->entries;
 
 	pr_info("----- %s --------\n", __func__);
-	node = rb_first(root);
+	node = rb_first_cached(root);
 	while (node) {
 		struct hist_entry *he;
 
diff --git a/tools/perf/tests/hists_cumulate.c b/tools/perf/tests/hists_cumulate.c
index 65fe02bebbee..4ca417d1a06b 100644
--- a/tools/perf/tests/hists_cumulate.c
+++ b/tools/perf/tests/hists_cumulate.c
@@ -125,8 +125,8 @@ out:
 static void del_hist_entries(struct hists *hists)
 {
 	struct hist_entry *he;
-	struct rb_root *root_in;
-	struct rb_root *root_out;
+	struct rb_root_cached *root_in;
+	struct rb_root_cached *root_out;
 	struct rb_node *node;
 
 	if (hists__has(hists, need_collapse))
@@ -136,12 +136,12 @@ static void del_hist_entries(struct hists *hists)
 
 	root_out = &hists->entries;
 
-	while (!RB_EMPTY_ROOT(root_out)) {
-		node = rb_first(root_out);
+	while (!RB_EMPTY_ROOT(&root_out->rb_root)) {
+		node = rb_first_cached(root_out);
 
 		he = rb_entry(node, struct hist_entry, rb_node);
-		rb_erase(node, root_out);
-		rb_erase(&he->rb_node_in, root_in);
+		rb_erase_cached(node, root_out);
+		rb_erase_cached(&he->rb_node_in, root_in);
 		hist_entry__delete(he);
 	}
 }
@@ -198,7 +198,7 @@ static int do_test(struct hists *hists, struct result *expected, size_t nr_expec
 		print_hists_out(hists);
 	}
 
-	root = &hists->entries;
+	root = &hists->entries.rb_root;
 	for (node = rb_first(root), i = 0;
 	     node && (he = rb_entry(node, struct hist_entry, rb_node));
 	     node = rb_next(node), i++) {
diff --git a/tools/perf/tests/hists_link.c b/tools/perf/tests/hists_link.c
index 9a9d06cb0222..af633db63f4d 100644
--- a/tools/perf/tests/hists_link.c
+++ b/tools/perf/tests/hists_link.c
@@ -142,7 +142,7 @@ static int find_sample(struct sample *samples, size_t nr_samples,
 static int __validate_match(struct hists *hists)
 {
 	size_t count = 0;
-	struct rb_root *root;
+	struct rb_root_cached *root;
 	struct rb_node *node;
 
 	/*
@@ -153,7 +153,7 @@ static int __validate_match(struct hists *hists)
 	else
 		root = hists->entries_in;
 
-	node = rb_first(root);
+	node = rb_first_cached(root);
 	while (node) {
 		struct hist_entry *he;
 
@@ -192,7 +192,7 @@ static int __validate_link(struct hists *hists, int idx)
 	size_t count = 0;
 	size_t count_pair = 0;
 	size_t count_dummy = 0;
-	struct rb_root *root;
+	struct rb_root_cached *root;
 	struct rb_node *node;
 
 	/*
@@ -205,7 +205,7 @@ static int __validate_link(struct hists *hists, int idx)
 	else
 		root = hists->entries_in;
 
-	node = rb_first(root);
+	node = rb_first_cached(root);
 	while (node) {
 		struct hist_entry *he;
 
diff --git a/tools/perf/tests/hists_output.c b/tools/perf/tests/hists_output.c
index faacb4f41460..6c4bc68a77ee 100644
--- a/tools/perf/tests/hists_output.c
+++ b/tools/perf/tests/hists_output.c
@@ -91,8 +91,8 @@ out:
 static void del_hist_entries(struct hists *hists)
 {
 	struct hist_entry *he;
-	struct rb_root *root_in;
-	struct rb_root *root_out;
+	struct rb_root_cached *root_in;
+	struct rb_root_cached *root_out;
 	struct rb_node *node;
 
 	if (hists__has(hists, need_collapse))
@@ -102,12 +102,12 @@ static void del_hist_entries(struct hists *hists)
 
 	root_out = &hists->entries;
 
-	while (!RB_EMPTY_ROOT(root_out)) {
-		node = rb_first(root_out);
+	while (!RB_EMPTY_ROOT(&root_out->rb_root)) {
+		node = rb_first_cached(root_out);
 
 		he = rb_entry(node, struct hist_entry, rb_node);
-		rb_erase(node, root_out);
-		rb_erase(&he->rb_node_in, root_in);
+		rb_erase_cached(node, root_out);
+		rb_erase_cached(&he->rb_node_in, root_in);
 		hist_entry__delete(he);
 	}
 }
@@ -126,7 +126,7 @@ static int test1(struct perf_evsel *evsel, struct machine *machine)
 	int err;
 	struct hists *hists = evsel__hists(evsel);
 	struct hist_entry *he;
-	struct rb_root *root;
+	struct rb_root_cached *root;
 	struct rb_node *node;
 
 	field_order = NULL;
@@ -162,7 +162,7 @@ static int test1(struct perf_evsel *evsel, struct machine *machine)
 	}
 
 	root = &hists->entries;
-	node = rb_first(root);
+	node = rb_first_cached(root);
 	he = rb_entry(node, struct hist_entry, rb_node);
 	TEST_ASSERT_VAL("Invalid hist entry",
 			!strcmp(COMM(he), "perf") && !strcmp(DSO(he), "perf") &&
@@ -228,7 +228,7 @@ static int test2(struct perf_evsel *evsel, struct machine *machine)
 	int err;
 	struct hists *hists = evsel__hists(evsel);
 	struct hist_entry *he;
-	struct rb_root *root;
+	struct rb_root_cached *root;
 	struct rb_node *node;
 
 	field_order = "overhead,cpu";
@@ -262,7 +262,7 @@ static int test2(struct perf_evsel *evsel, struct machine *machine)
 	}
 
 	root = &hists->entries;
-	node = rb_first(root);
+	node = rb_first_cached(root);
 	he = rb_entry(node, struct hist_entry, rb_node);
 	TEST_ASSERT_VAL("Invalid hist entry",
 			CPU(he) == 1 && PID(he) == 100 && he->stat.period == 300);
@@ -284,7 +284,7 @@ static int test3(struct perf_evsel *evsel, struct machine *machine)
 	int err;
 	struct hists *hists = evsel__hists(evsel);
 	struct hist_entry *he;
-	struct rb_root *root;
+	struct rb_root_cached *root;
 	struct rb_node *node;
 
 	field_order = "comm,overhead,dso";
@@ -316,7 +316,7 @@ static int test3(struct perf_evsel *evsel, struct machine *machine)
 	}
 
 	root = &hists->entries;
-	node = rb_first(root);
+	node = rb_first_cached(root);
 	he = rb_entry(node, struct hist_entry, rb_node);
 	TEST_ASSERT_VAL("Invalid hist entry",
 			!strcmp(COMM(he), "bash") && !strcmp(DSO(he), "bash") &&
@@ -358,7 +358,7 @@ static int test4(struct perf_evsel *evsel, struct machine *machine)
 	int err;
 	struct hists *hists = evsel__hists(evsel);
 	struct hist_entry *he;
-	struct rb_root *root;
+	struct rb_root_cached *root;
 	struct rb_node *node;
 
 	field_order = "dso,sym,comm,overhead,dso";
@@ -394,7 +394,7 @@ static int test4(struct perf_evsel *evsel, struct machine *machine)
 	}
 
 	root = &hists->entries;
-	node = rb_first(root);
+	node = rb_first_cached(root);
 	he = rb_entry(node, struct hist_entry, rb_node);
 	TEST_ASSERT_VAL("Invalid hist entry",
 			!strcmp(DSO(he), "perf") && !strcmp(SYM(he), "cmd_record") &&
@@ -460,7 +460,7 @@ static int test5(struct perf_evsel *evsel, struct machine *machine)
 	int err;
 	struct hists *hists = evsel__hists(evsel);
 	struct hist_entry *he;
-	struct rb_root *root;
+	struct rb_root_cached *root;
 	struct rb_node *node;
 
 	field_order = "cpu,pid,comm,dso,sym";
@@ -497,7 +497,7 @@ static int test5(struct perf_evsel *evsel, struct machine *machine)
 	}
 
 	root = &hists->entries;
-	node = rb_first(root);
+	node = rb_first_cached(root);
 	he = rb_entry(node, struct hist_entry, rb_node);
 
 	TEST_ASSERT_VAL("Invalid hist entry",
diff --git a/tools/perf/ui/browsers/hists.c b/tools/perf/ui/browsers/hists.c
index b32f505a7608..85790a4d1842 100644
--- a/tools/perf/ui/browsers/hists.c
+++ b/tools/perf/ui/browsers/hists.c
@@ -49,7 +49,7 @@ static int hist_browser__get_folding(struct hist_browser *browser)
 	struct hists *hists = browser->hists;
 	int unfolded_rows = 0;
 
-	for (nd = rb_first(&hists->entries);
+	for (nd = rb_first_cached(&hists->entries);
 	     (nd = hists__filter_entries(nd, browser->min_pcnt)) != NULL;
 	     nd = rb_hierarchy_next(nd)) {
 		struct hist_entry *he =
@@ -267,7 +267,7 @@ static int hierarchy_count_rows(struct hist_browser *hb, struct hist_entry *he,
 	if (he->has_no_entry)
 		return 1;
 
-	node = rb_first(&he->hroot_out);
+	node = rb_first_cached(&he->hroot_out);
 	while (node) {
 		float percent;
 
@@ -372,7 +372,7 @@ static void hist_entry__init_have_children(struct hist_entry *he)
 		he->has_children = !RB_EMPTY_ROOT(&he->sorted_chain);
 		callchain__init_have_children(&he->sorted_chain);
 	} else {
-		he->has_children = !RB_EMPTY_ROOT(&he->hroot_out);
+		he->has_children = !RB_EMPTY_ROOT(&he->hroot_out.rb_root);
 	}
 
 	he->init_have_children = true;
@@ -508,7 +508,7 @@ static int hierarchy_set_folding(struct hist_browser *hb, struct hist_entry *he,
 	struct hist_entry *child;
 	int n = 0;
 
-	for (nd = rb_first(&he->hroot_out); nd; nd = rb_next(nd)) {
+	for (nd = rb_first_cached(&he->hroot_out); nd; nd = rb_next(nd)) {
 		child = rb_entry(nd, struct hist_entry, rb_node);
 		percent = hist_entry__get_percent_limit(child);
 		if (!child->filtered && percent >= hb->min_pcnt)
@@ -566,7 +566,7 @@ __hist_browser__set_folding(struct hist_browser *browser, bool unfold)
 	struct rb_node *nd;
 	struct hist_entry *he;
 
-	nd = rb_first(&browser->hists->entries);
+	nd = rb_first_cached(&browser->hists->entries);
 	while (nd) {
 		he = rb_entry(nd, struct hist_entry, rb_node);
 
@@ -1738,7 +1738,7 @@ static void ui_browser__hists_init_top(struct ui_browser *browser)
 		struct hist_browser *hb;
 
 		hb = container_of(browser, struct hist_browser, b);
-		browser->top = rb_first(&hb->hists->entries);
+		browser->top = rb_first_cached(&hb->hists->entries);
 	}
 }
 
@@ -2649,7 +2649,7 @@ add_socket_opt(struct hist_browser *browser, struct popup_action *act,
 static void hist_browser__update_nr_entries(struct hist_browser *hb)
 {
 	u64 nr_entries = 0;
-	struct rb_node *nd = rb_first(&hb->hists->entries);
+	struct rb_node *nd = rb_first_cached(&hb->hists->entries);
 
 	if (hb->min_pcnt == 0 && !symbol_conf.report_hierarchy) {
 		hb->nr_non_filtered_entries = hb->hists->nr_non_filtered_entries;
@@ -2669,7 +2669,7 @@ static void hist_browser__update_percent_limit(struct hist_browser *hb,
 					       double percent)
 {
 	struct hist_entry *he;
-	struct rb_node *nd = rb_first(&hb->hists->entries);
+	struct rb_node *nd = rb_first_cached(&hb->hists->entries);
 	u64 total = hists__total_period(hb->hists);
 	u64 min_callchain_hits = total * (percent / 100);
 
diff --git a/tools/perf/ui/gtk/hists.c b/tools/perf/ui/gtk/hists.c
index 4ab663ec3e5e..74414ccd8c84 100644
--- a/tools/perf/ui/gtk/hists.c
+++ b/tools/perf/ui/gtk/hists.c
@@ -353,7 +353,7 @@ static void perf_gtk__show_hists(GtkWidget *window, struct hists *hists,
 
 	g_object_unref(GTK_TREE_MODEL(store));
 
-	for (nd = rb_first(&hists->entries); nd; nd = rb_next(nd)) {
+	for (nd = rb_first_cached(&hists->entries); nd; nd = rb_next(nd)) {
 		struct hist_entry *h = rb_entry(nd, struct hist_entry, rb_node);
 		GtkTreeIter iter;
 		u64 total = hists__total_period(h->hists);
@@ -401,7 +401,7 @@ static void perf_gtk__show_hists(GtkWidget *window, struct hists *hists,
 }
 
 static void perf_gtk__add_hierarchy_entries(struct hists *hists,
-					    struct rb_root *root,
+					    struct rb_root_cached *root,
 					    GtkTreeStore *store,
 					    GtkTreeIter *parent,
 					    struct perf_hpp *hpp,
@@ -415,7 +415,7 @@ static void perf_gtk__add_hierarchy_entries(struct hists *hists,
 	u64 total = hists__total_period(hists);
 	int size;
 
-	for (node = rb_first(root); node; node = rb_next(node)) {
+	for (node = rb_first_cached(root); node; node = rb_next(node)) {
 		GtkTreeIter iter;
 		float percent;
 		char *bf;
diff --git a/tools/perf/ui/stdio/hist.c b/tools/perf/ui/stdio/hist.c
index 74c4ae1f0a05..2a9fa4dcbc2a 100644
--- a/tools/perf/ui/stdio/hist.c
+++ b/tools/perf/ui/stdio/hist.c
@@ -788,7 +788,8 @@ size_t hists__fprintf(struct hists *hists, bool show_header, int max_rows,
 
 	indent = hists__overhead_width(hists) + 4;
 
-	for (nd = rb_first(&hists->entries); nd; nd = __rb_hierarchy_next(nd, HMD_FORCE_CHILD)) {
+	for (nd = rb_first_cached(&hists->entries); nd;
+	     nd = __rb_hierarchy_next(nd, HMD_FORCE_CHILD)) {
 		struct hist_entry *h = rb_entry(nd, struct hist_entry, rb_node);
 		float percent;
 
diff --git a/tools/perf/util/hist.c b/tools/perf/util/hist.c
index 8aad8330e392..9e7a8e044a0a 100644
--- a/tools/perf/util/hist.c
+++ b/tools/perf/util/hist.c
@@ -209,7 +209,7 @@ void hists__calc_col_len(struct hists *hists, struct hist_entry *h)
 
 void hists__output_recalc_col_len(struct hists *hists, int max_rows)
 {
-	struct rb_node *next = rb_first(&hists->entries);
+	struct rb_node *next = rb_first_cached(&hists->entries);
 	struct hist_entry *n;
 	int row = 0;
 
@@ -296,7 +296,7 @@ static bool hists__decay_entry(struct hists *hists, struct hist_entry *he)
 
 	if (!he->leaf) {
 		struct hist_entry *child;
-		struct rb_node *node = rb_first(&he->hroot_out);
+		struct rb_node *node = rb_first_cached(&he->hroot_out);
 		while (node) {
 			child = rb_entry(node, struct hist_entry, rb_node);
 			node = rb_next(node);
@@ -311,8 +311,8 @@ static bool hists__decay_entry(struct hists *hists, struct hist_entry *he)
 
 static void hists__delete_entry(struct hists *hists, struct hist_entry *he)
 {
-	struct rb_root *root_in;
-	struct rb_root *root_out;
+	struct rb_root_cached *root_in;
+	struct rb_root_cached *root_out;
 
 	if (he->parent_he) {
 		root_in  = &he->parent_he->hroot_in;
@@ -325,8 +325,8 @@ static void hists__delete_entry(struct hists *hists, struct hist_entry *he)
 		root_out = &hists->entries;
 	}
 
-	rb_erase(&he->rb_node_in, root_in);
-	rb_erase(&he->rb_node, root_out);
+	rb_erase_cached(&he->rb_node_in, root_in);
+	rb_erase_cached(&he->rb_node, root_out);
 
 	--hists->nr_entries;
 	if (!he->filtered)
@@ -337,7 +337,7 @@ static void hists__delete_entry(struct hists *hists, struct hist_entry *he)
 
 void hists__decay_entries(struct hists *hists, bool zap_user, bool zap_kernel)
 {
-	struct rb_node *next = rb_first(&hists->entries);
+	struct rb_node *next = rb_first_cached(&hists->entries);
 	struct hist_entry *n;
 
 	while (next) {
@@ -353,7 +353,7 @@ void hists__decay_entries(struct hists *hists, bool zap_user, bool zap_kernel)
 
 void hists__delete_entries(struct hists *hists)
 {
-	struct rb_node *next = rb_first(&hists->entries);
+	struct rb_node *next = rb_first_cached(&hists->entries);
 	struct hist_entry *n;
 
 	while (next) {
@@ -435,8 +435,8 @@ static int hist_entry__init(struct hist_entry *he,
 	}
 	INIT_LIST_HEAD(&he->pairs.node);
 	thread__get(he->thread);
-	he->hroot_in  = RB_ROOT;
-	he->hroot_out = RB_ROOT;
+	he->hroot_in  = RB_ROOT_CACHED;
+	he->hroot_out = RB_ROOT_CACHED;
 
 	if (!symbol_conf.report_hierarchy)
 		he->leaf = true;
@@ -513,8 +513,9 @@ static struct hist_entry *hists__findnew_entry(struct hists *hists,
 	int64_t cmp;
 	u64 period = entry->stat.period;
 	u64 weight = entry->stat.weight;
+	bool leftmost = true;
 
-	p = &hists->entries_in->rb_node;
+	p = &hists->entries_in->rb_root.rb_node;
 
 	while (*p != NULL) {
 		parent = *p;
@@ -557,8 +558,10 @@ static struct hist_entry *hists__findnew_entry(struct hists *hists,
 
 		if (cmp < 0)
 			p = &(*p)->rb_left;
-		else
+		else {
 			p = &(*p)->rb_right;
+			leftmost = false;
+		}
 	}
 
 	he = hist_entry__new(entry, sample_self);
@@ -570,7 +573,7 @@ static struct hist_entry *hists__findnew_entry(struct hists *hists,
 	hists->nr_entries++;
 
 	rb_link_node(&he->rb_node_in, parent, p);
-	rb_insert_color(&he->rb_node_in, hists->entries_in);
+	rb_insert_color_cached(&he->rb_node_in, hists->entries_in, leftmost);
 out:
 	if (sample_self)
 		he_stat__add_cpumode_period(&he->stat, al->cpumode, period);
@@ -1279,16 +1282,17 @@ static void hist_entry__apply_hierarchy_filters(struct hist_entry *he)
 }
 
 static struct hist_entry *hierarchy_insert_entry(struct hists *hists,
-						 struct rb_root *root,
+						 struct rb_root_cached *root,
 						 struct hist_entry *he,
 						 struct hist_entry *parent_he,
 						 struct perf_hpp_list *hpp_list)
 {
-	struct rb_node **p = &root->rb_node;
+	struct rb_node **p = &root->rb_root.rb_node;
 	struct rb_node *parent = NULL;
 	struct hist_entry *iter, *new;
 	struct perf_hpp_fmt *fmt;
 	int64_t cmp;
+	bool leftmost = true;
 
 	while (*p != NULL) {
 		parent = *p;
@@ -1308,8 +1312,10 @@ static struct hist_entry *hierarchy_insert_entry(struct hists *hists,
 
 		if (cmp < 0)
 			p = &parent->rb_left;
-		else
+		else {
 			p = &parent->rb_right;
+			leftmost = false;
+		}
 	}
 
 	new = hist_entry__new(he, true);
@@ -1343,12 +1349,12 @@ static struct hist_entry *hierarchy_insert_entry(struct hists *hists,
 	}
 
 	rb_link_node(&new->rb_node_in, parent, p);
-	rb_insert_color(&new->rb_node_in, root);
+	rb_insert_color_cached(&new->rb_node_in, root, leftmost);
 	return new;
 }
 
 static int hists__hierarchy_insert_entry(struct hists *hists,
-					 struct rb_root *root,
+					 struct rb_root_cached *root,
 					 struct hist_entry *he)
 {
 	struct perf_hpp_list_node *node;
@@ -1395,13 +1401,14 @@ static int hists__hierarchy_insert_entry(struct hists *hists,
 }
 
 static int hists__collapse_insert_entry(struct hists *hists,
-					struct rb_root *root,
+					struct rb_root_cached *root,
 					struct hist_entry *he)
 {
-	struct rb_node **p = &root->rb_node;
+	struct rb_node **p = &root->rb_root.rb_node;
 	struct rb_node *parent = NULL;
 	struct hist_entry *iter;
 	int64_t cmp;
+	bool leftmost = true;
 
 	if (symbol_conf.report_hierarchy)
 		return hists__hierarchy_insert_entry(hists, root, he);
@@ -1432,19 +1439,21 @@ static int hists__collapse_insert_entry(struct hists *hists,
 
 		if (cmp < 0)
 			p = &(*p)->rb_left;
-		else
+		else {
 			p = &(*p)->rb_right;
+			leftmost = false;
+		}
 	}
 	hists->nr_entries++;
 
 	rb_link_node(&he->rb_node_in, parent, p);
-	rb_insert_color(&he->rb_node_in, root);
+	rb_insert_color_cached(&he->rb_node_in, root, leftmost);
 	return 1;
 }
 
-struct rb_root *hists__get_rotate_entries_in(struct hists *hists)
+struct rb_root_cached *hists__get_rotate_entries_in(struct hists *hists)
 {
-	struct rb_root *root;
+	struct rb_root_cached *root;
 
 	pthread_mutex_lock(&hists->lock);
 
@@ -1467,7 +1476,7 @@ static void hists__apply_filters(struct hists *hists, struct hist_entry *he)
 
 int hists__collapse_resort(struct hists *hists, struct ui_progress *prog)
 {
-	struct rb_root *root;
+	struct rb_root_cached *root;
 	struct rb_node *next;
 	struct hist_entry *n;
 	int ret;
@@ -1479,7 +1488,7 @@ int hists__collapse_resort(struct hists *hists, struct ui_progress *prog)
 
 	root = hists__get_rotate_entries_in(hists);
 
-	next = rb_first(root);
+	next = rb_first_cached(root);
 
 	while (next) {
 		if (session_done())
@@ -1487,7 +1496,7 @@ int hists__collapse_resort(struct hists *hists, struct ui_progress *prog)
 		n = rb_entry(next, struct hist_entry, rb_node_in);
 		next = rb_next(&n->rb_node_in);
 
-		rb_erase(&n->rb_node_in, root);
+		rb_erase_cached(&n->rb_node_in, root);
 		ret = hists__collapse_insert_entry(hists, &hists->entries_collapsed, n);
 		if (ret < 0)
 			return -1;
@@ -1558,7 +1567,7 @@ static void hierarchy_recalc_total_periods(struct hists *hists)
 	struct rb_node *node;
 	struct hist_entry *he;
 
-	node = rb_first(&hists->entries);
+	node = rb_first_cached(&hists->entries);
 
 	hists->stats.total_period = 0;
 	hists->stats.total_non_filtered_period = 0;
@@ -1578,13 +1587,14 @@ static void hierarchy_recalc_total_periods(struct hists *hists)
 	}
 }
 
-static void hierarchy_insert_output_entry(struct rb_root *root,
+static void hierarchy_insert_output_entry(struct rb_root_cached *root,
 					  struct hist_entry *he)
 {
-	struct rb_node **p = &root->rb_node;
+	struct rb_node **p = &root->rb_root.rb_node;
 	struct rb_node *parent = NULL;
 	struct hist_entry *iter;
 	struct perf_hpp_fmt *fmt;
+	bool leftmost = true;
 
 	while (*p != NULL) {
 		parent = *p;
@@ -1592,12 +1602,14 @@ static void hierarchy_insert_output_entry(struct rb_root *root,
 
 		if (hist_entry__sort(he, iter) > 0)
 			p = &parent->rb_left;
-		else
+		else {
 			p = &parent->rb_right;
+			leftmost = false;
+		}
 	}
 
 	rb_link_node(&he->rb_node, parent, p);
-	rb_insert_color(&he->rb_node, root);
+	rb_insert_color_cached(&he->rb_node, root, leftmost);
 
 	/* update column width of dynamic entry */
 	perf_hpp_list__for_each_sort_list(he->hpp_list, fmt) {
@@ -1608,16 +1620,16 @@ static void hierarchy_insert_output_entry(struct rb_root *root,
 
 static void hists__hierarchy_output_resort(struct hists *hists,
 					   struct ui_progress *prog,
-					   struct rb_root *root_in,
-					   struct rb_root *root_out,
+					   struct rb_root_cached *root_in,
+					   struct rb_root_cached *root_out,
 					   u64 min_callchain_hits,
 					   bool use_callchain)
 {
 	struct rb_node *node;
 	struct hist_entry *he;
 
-	*root_out = RB_ROOT;
-	node = rb_first(root_in);
+	*root_out = RB_ROOT_CACHED;
+	node = rb_first_cached(root_in);
 
 	while (node) {
 		he = rb_entry(node, struct hist_entry, rb_node_in);
@@ -1660,15 +1672,16 @@ static void hists__hierarchy_output_resort(struct hists *hists,
 	}
 }
 
-static void __hists__insert_output_entry(struct rb_root *entries,
+static void __hists__insert_output_entry(struct rb_root_cached *entries,
 					 struct hist_entry *he,
 					 u64 min_callchain_hits,
 					 bool use_callchain)
 {
-	struct rb_node **p = &entries->rb_node;
+	struct rb_node **p = &entries->rb_root.rb_node;
 	struct rb_node *parent = NULL;
 	struct hist_entry *iter;
 	struct perf_hpp_fmt *fmt;
+	bool leftmost = true;
 
 	if (use_callchain) {
 		if (callchain_param.mode == CHAIN_GRAPH_REL) {
@@ -1689,12 +1702,14 @@ static void __hists__insert_output_entry(struct rb_root *entries,
 
 		if (hist_entry__sort(he, iter) > 0)
 			p = &(*p)->rb_left;
-		else
+		else {
 			p = &(*p)->rb_right;
+			leftmost = false;
+		}
 	}
 
 	rb_link_node(&he->rb_node, parent, p);
-	rb_insert_color(&he->rb_node, entries);
+	rb_insert_color_cached(&he->rb_node, entries, leftmost);
 
 	perf_hpp_list__for_each_sort_list(&perf_hpp_list, fmt) {
 		if (perf_hpp__is_dynamic_entry(fmt) &&
@@ -1706,7 +1721,7 @@ static void __hists__insert_output_entry(struct rb_root *entries,
 static void output_resort(struct hists *hists, struct ui_progress *prog,
 			  bool use_callchain, hists__resort_cb_t cb)
 {
-	struct rb_root *root;
+	struct rb_root_cached *root;
 	struct rb_node *next;
 	struct hist_entry *n;
 	u64 callchain_total;
@@ -1736,8 +1751,8 @@ static void output_resort(struct hists *hists, struct ui_progress *prog,
 	else
 		root = hists->entries_in;
 
-	next = rb_first(root);
-	hists->entries = RB_ROOT;
+	next = rb_first_cached(root);
+	hists->entries = RB_ROOT_CACHED;
 
 	while (next) {
 		n = rb_entry(next, struct hist_entry, rb_node_in);
@@ -1798,7 +1813,7 @@ struct rb_node *rb_hierarchy_last(struct rb_node *node)
 	struct hist_entry *he = rb_entry(node, struct hist_entry, rb_node);
 
 	while (can_goto_child(he, HMD_NORMAL)) {
-		node = rb_last(&he->hroot_out);
+		node = rb_last(&he->hroot_out.rb_root);
 		he = rb_entry(node, struct hist_entry, rb_node);
 	}
 	return node;
@@ -1809,7 +1824,7 @@ struct rb_node *__rb_hierarchy_next(struct rb_node *node, enum hierarchy_move_di
 	struct hist_entry *he = rb_entry(node, struct hist_entry, rb_node);
 
 	if (can_goto_child(he, hmd))
-		node = rb_first(&he->hroot_out);
+		node = rb_first_cached(&he->hroot_out);
 	else
 		node = rb_next(node);
 
@@ -1847,7 +1862,7 @@ bool hist_entry__has_hierarchy_children(struct hist_entry *he, float limit)
 	if (he->leaf)
 		return false;
 
-	node = rb_first(&he->hroot_out);
+	node = rb_first_cached(&he->hroot_out);
 	child = rb_entry(node, struct hist_entry, rb_node);
 
 	while (node && child->filtered) {
@@ -1965,7 +1980,7 @@ static void hists__filter_by_type(struct hists *hists, int type, filter_fn_t fil
 	hists__reset_filter_stats(hists);
 	hists__reset_col_len(hists);
 
-	for (nd = rb_first(&hists->entries); nd; nd = rb_next(nd)) {
+	for (nd = rb_first_cached(&hists->entries); nd; nd = rb_next(nd)) {
 		struct hist_entry *h = rb_entry(nd, struct hist_entry, rb_node);
 
 		if (filter(hists, h))
@@ -1975,13 +1990,15 @@ static void hists__filter_by_type(struct hists *hists, int type, filter_fn_t fil
 	}
 }
 
-static void resort_filtered_entry(struct rb_root *root, struct hist_entry *he)
+static void resort_filtered_entry(struct rb_root_cached *root,
+				  struct hist_entry *he)
 {
-	struct rb_node **p = &root->rb_node;
+	struct rb_node **p = &root->rb_root.rb_node;
 	struct rb_node *parent = NULL;
 	struct hist_entry *iter;
-	struct rb_root new_root = RB_ROOT;
+	struct rb_root_cached new_root = RB_ROOT_CACHED;
 	struct rb_node *nd;
+	bool leftmost = true;
 
 	while (*p != NULL) {
 		parent = *p;
@@ -1989,22 +2006,24 @@ static void resort_filtered_entry(struct rb_root *root, struct hist_entry *he)
 
 		if (hist_entry__sort(he, iter) > 0)
 			p = &(*p)->rb_left;
-		else
+		else {
 			p = &(*p)->rb_right;
+			leftmost = false;
+		}
 	}
 
 	rb_link_node(&he->rb_node, parent, p);
-	rb_insert_color(&he->rb_node, root);
+	rb_insert_color_cached(&he->rb_node, root, leftmost);
 
 	if (he->leaf || he->filtered)
 		return;
 
-	nd = rb_first(&he->hroot_out);
+	nd = rb_first_cached(&he->hroot_out);
 	while (nd) {
 		struct hist_entry *h = rb_entry(nd, struct hist_entry, rb_node);
 
 		nd = rb_next(nd);
-		rb_erase(&h->rb_node, &he->hroot_out);
+		rb_erase_cached(&h->rb_node, &he->hroot_out);
 
 		resort_filtered_entry(&new_root, h);
 	}
@@ -2015,14 +2034,14 @@ static void resort_filtered_entry(struct rb_root *root, struct hist_entry *he)
 static void hists__filter_hierarchy(struct hists *hists, int type, const void *arg)
 {
 	struct rb_node *nd;
-	struct rb_root new_root = RB_ROOT;
+	struct rb_root_cached new_root = RB_ROOT_CACHED;
 
 	hists->stats.nr_non_filtered_samples = 0;
 
 	hists__reset_filter_stats(hists);
 	hists__reset_col_len(hists);
 
-	nd = rb_first(&hists->entries);
+	nd = rb_first_cached(&hists->entries);
 	while (nd) {
 		struct hist_entry *h = rb_entry(nd, struct hist_entry, rb_node);
 		int ret;
@@ -2066,12 +2085,12 @@ static void hists__filter_hierarchy(struct hists *hists, int type, const void *a
 	 * resort output after applying a new filter since filter in a lower
 	 * hierarchy can change periods in a upper hierarchy.
 	 */
-	nd = rb_first(&hists->entries);
+	nd = rb_first_cached(&hists->entries);
 	while (nd) {
 		struct hist_entry *h = rb_entry(nd, struct hist_entry, rb_node);
 
 		nd = rb_next(nd);
-		rb_erase(&h->rb_node, &hists->entries);
+		rb_erase_cached(&h->rb_node, &hists->entries);
 
 		resort_filtered_entry(&new_root, h);
 	}
@@ -2140,18 +2159,19 @@ void hists__inc_nr_samples(struct hists *hists, bool filtered)
 static struct hist_entry *hists__add_dummy_entry(struct hists *hists,
 						 struct hist_entry *pair)
 {
-	struct rb_root *root;
+	struct rb_root_cached *root;
 	struct rb_node **p;
 	struct rb_node *parent = NULL;
 	struct hist_entry *he;
 	int64_t cmp;
+	bool leftmost = true;
 
 	if (hists__has(hists, need_collapse))
 		root = &hists->entries_collapsed;
 	else
 		root = hists->entries_in;
 
-	p = &root->rb_node;
+	p = &root->rb_root.rb_node;
 
 	while (*p != NULL) {
 		parent = *p;
@@ -2164,8 +2184,10 @@ static struct hist_entry *hists__add_dummy_entry(struct hists *hists,
 
 		if (cmp < 0)
 			p = &(*p)->rb_left;
-		else
+		else {
 			p = &(*p)->rb_right;
+			leftmost = false;
+		}
 	}
 
 	he = hist_entry__new(pair, true);
@@ -2175,7 +2197,7 @@ static struct hist_entry *hists__add_dummy_entry(struct hists *hists,
 		if (symbol_conf.cumulate_callchain)
 			memset(he->stat_acc, 0, sizeof(he->stat));
 		rb_link_node(&he->rb_node_in, parent, p);
-		rb_insert_color(&he->rb_node_in, root);
+		rb_insert_color_cached(&he->rb_node_in, root, leftmost);
 		hists__inc_stats(hists, he);
 		he->dummy = true;
 	}
@@ -2184,15 +2206,16 @@ out:
 }
 
 static struct hist_entry *add_dummy_hierarchy_entry(struct hists *hists,
-						    struct rb_root *root,
+						    struct rb_root_cached *root,
 						    struct hist_entry *pair)
 {
 	struct rb_node **p;
 	struct rb_node *parent = NULL;
 	struct hist_entry *he;
 	struct perf_hpp_fmt *fmt;
+	bool leftmost = true;
 
-	p = &root->rb_node;
+	p = &root->rb_root.rb_node;
 	while (*p != NULL) {
 		int64_t cmp = 0;
 
@@ -2209,14 +2232,16 @@ static struct hist_entry *add_dummy_hierarchy_entry(struct hists *hists,
 
 		if (cmp < 0)
 			p = &parent->rb_left;
-		else
+		else {
 			p = &parent->rb_right;
+			leftmost = false;
+		}
 	}
 
 	he = hist_entry__new(pair, true);
 	if (he) {
 		rb_link_node(&he->rb_node_in, parent, p);
-		rb_insert_color(&he->rb_node_in, root);
+		rb_insert_color_cached(&he->rb_node_in, root, leftmost);
 
 		he->dummy = true;
 		he->hists = hists;
@@ -2233,9 +2258,9 @@ static struct hist_entry *hists__find_entry(struct hists *hists,
 	struct rb_node *n;
 
 	if (hists__has(hists, need_collapse))
-		n = hists->entries_collapsed.rb_node;
+		n = hists->entries_collapsed.rb_root.rb_node;
 	else
-		n = hists->entries_in->rb_node;
+		n = hists->entries_in->rb_root.rb_node;
 
 	while (n) {
 		struct hist_entry *iter = rb_entry(n, struct hist_entry, rb_node_in);
@@ -2252,10 +2277,10 @@ static struct hist_entry *hists__find_entry(struct hists *hists,
 	return NULL;
 }
 
-static struct hist_entry *hists__find_hierarchy_entry(struct rb_root *root,
+static struct hist_entry *hists__find_hierarchy_entry(struct rb_root_cached *root,
 						      struct hist_entry *he)
 {
-	struct rb_node *n = root->rb_node;
+	struct rb_node *n = root->rb_root.rb_node;
 
 	while (n) {
 		struct hist_entry *iter;
@@ -2280,13 +2305,13 @@ static struct hist_entry *hists__find_hierarchy_entry(struct rb_root *root,
 	return NULL;
 }
 
-static void hists__match_hierarchy(struct rb_root *leader_root,
-				   struct rb_root *other_root)
+static void hists__match_hierarchy(struct rb_root_cached *leader_root,
+				   struct rb_root_cached *other_root)
 {
 	struct rb_node *nd;
 	struct hist_entry *pos, *pair;
 
-	for (nd = rb_first(leader_root); nd; nd = rb_next(nd)) {
+	for (nd = rb_first_cached(leader_root); nd; nd = rb_next(nd)) {
 		pos  = rb_entry(nd, struct hist_entry, rb_node_in);
 		pair = hists__find_hierarchy_entry(other_root, pos);
 
@@ -2302,7 +2327,7 @@ static void hists__match_hierarchy(struct rb_root *leader_root,
  */
 void hists__match(struct hists *leader, struct hists *other)
 {
-	struct rb_root *root;
+	struct rb_root_cached *root;
 	struct rb_node *nd;
 	struct hist_entry *pos, *pair;
 
@@ -2317,7 +2342,7 @@ void hists__match(struct hists *leader, struct hists *other)
 	else
 		root = leader->entries_in;
 
-	for (nd = rb_first(root); nd; nd = rb_next(nd)) {
+	for (nd = rb_first_cached(root); nd; nd = rb_next(nd)) {
 		pos  = rb_entry(nd, struct hist_entry, rb_node_in);
 		pair = hists__find_entry(other, pos);
 
@@ -2328,13 +2353,13 @@ void hists__match(struct hists *leader, struct hists *other)
 
 static int hists__link_hierarchy(struct hists *leader_hists,
 				 struct hist_entry *parent,
-				 struct rb_root *leader_root,
-				 struct rb_root *other_root)
+				 struct rb_root_cached *leader_root,
+				 struct rb_root_cached *other_root)
 {
 	struct rb_node *nd;
 	struct hist_entry *pos, *leader;
 
-	for (nd = rb_first(other_root); nd; nd = rb_next(nd)) {
+	for (nd = rb_first_cached(other_root); nd; nd = rb_next(nd)) {
 		pos = rb_entry(nd, struct hist_entry, rb_node_in);
 
 		if (hist_entry__has_pairs(pos)) {
@@ -2377,7 +2402,7 @@ static int hists__link_hierarchy(struct hists *leader_hists,
  */
 int hists__link(struct hists *leader, struct hists *other)
 {
-	struct rb_root *root;
+	struct rb_root_cached *root;
 	struct rb_node *nd;
 	struct hist_entry *pos, *pair;
 
@@ -2393,7 +2418,7 @@ int hists__link(struct hists *leader, struct hists *other)
 	else
 		root = other->entries_in;
 
-	for (nd = rb_first(root); nd; nd = rb_next(nd)) {
+	for (nd = rb_first_cached(root); nd; nd = rb_next(nd)) {
 		pos = rb_entry(nd, struct hist_entry, rb_node_in);
 
 		if (!hist_entry__has_pairs(pos)) {
@@ -2566,10 +2591,10 @@ int perf_hist_config(const char *var, const char *value)
 int __hists__init(struct hists *hists, struct perf_hpp_list *hpp_list)
 {
 	memset(hists, 0, sizeof(*hists));
-	hists->entries_in_array[0] = hists->entries_in_array[1] = RB_ROOT;
+	hists->entries_in_array[0] = hists->entries_in_array[1] = RB_ROOT_CACHED;
 	hists->entries_in = &hists->entries_in_array[0];
-	hists->entries_collapsed = RB_ROOT;
-	hists->entries = RB_ROOT;
+	hists->entries_collapsed = RB_ROOT_CACHED;
+	hists->entries = RB_ROOT_CACHED;
 	pthread_mutex_init(&hists->lock, NULL);
 	hists->socket_filter = -1;
 	hists->hpp_list = hpp_list;
@@ -2577,14 +2602,14 @@ int __hists__init(struct hists *hists, struct perf_hpp_list *hpp_list)
 	return 0;
 }
 
-static void hists__delete_remaining_entries(struct rb_root *root)
+static void hists__delete_remaining_entries(struct rb_root_cached *root)
 {
 	struct rb_node *node;
 	struct hist_entry *he;
 
-	while (!RB_EMPTY_ROOT(root)) {
-		node = rb_first(root);
-		rb_erase(node, root);
+	while (!RB_EMPTY_ROOT(&root->rb_root)) {
+		node = rb_first_cached(root);
+		rb_erase_cached(node, root);
 
 		he = rb_entry(node, struct hist_entry, rb_node_in);
 		hist_entry__delete(he);
diff --git a/tools/perf/util/hist.h b/tools/perf/util/hist.h
index 664b5eda8d51..08267af7439c 100644
--- a/tools/perf/util/hist.h
+++ b/tools/perf/util/hist.h
@@ -70,10 +70,10 @@ struct thread;
 struct dso;
 
 struct hists {
-	struct rb_root		entries_in_array[2];
-	struct rb_root		*entries_in;
-	struct rb_root		entries;
-	struct rb_root		entries_collapsed;
+	struct rb_root_cached	entries_in_array[2];
+	struct rb_root_cached	*entries_in;
+	struct rb_root_cached	entries;
+	struct rb_root_cached	entries_collapsed;
 	u64			nr_entries;
 	u64			nr_non_filtered_entries;
 	u64			callchain_period;
@@ -230,7 +230,7 @@ static __pure inline bool hists__has_callchains(struct hists *hists)
 int hists__init(void);
 int __hists__init(struct hists *hists, struct perf_hpp_list *hpp_list);
 
-struct rb_root *hists__get_rotate_entries_in(struct hists *hists);
+struct rb_root_cached *hists__get_rotate_entries_in(struct hists *hists);
 
 struct perf_hpp {
 	char *buf;
diff --git a/tools/perf/util/sort.h b/tools/perf/util/sort.h
index 130fe37fe2df..dd6312876492 100644
--- a/tools/perf/util/sort.h
+++ b/tools/perf/util/sort.h
@@ -145,8 +145,8 @@ struct hist_entry {
 	union {
 		/* this is for hierarchical entry structure */
 		struct {
-			struct rb_root	hroot_in;
-			struct rb_root  hroot_out;
+			struct rb_root_cached	hroot_in;
+			struct rb_root_cached   hroot_out;
 		};				/* non-leaf entries */
 		struct rb_root	sorted_chain;	/* leaf entry has callchains */
 	};

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [tip:perf/core] perf sched: Use cached rbtrees
  2018-12-06 19:18 ` [PATCH 7/7] perf sched: " Davidlohr Bueso
@ 2019-01-26 10:05   ` tip-bot for Davidlohr Bueso
  0 siblings, 0 replies; 18+ messages in thread
From: tip-bot for Davidlohr Bueso @ 2019-01-26 10:05 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: dave, acme, namhyung, tglx, jolsa, linux-kernel, hpa, mingo, dbueso

Commit-ID:  cb4c13a5137766c3666ae106e1a5549316992379
Gitweb:     https://git.kernel.org/tip/cb4c13a5137766c3666ae106e1a5549316992379
Author:     Davidlohr Bueso <dave@stgolabs.net>
AuthorDate: Thu, 6 Dec 2018 11:18:19 -0800
Committer:  Arnaldo Carvalho de Melo <acme@redhat.com>
CommitDate: Fri, 25 Jan 2019 15:12:10 +0100

perf sched: Use cached rbtrees

At the cost of an extra pointer, we can avoid the O(logN) cost of
finding the first element in the tree (smallest node), which is
something heavily required for perf-sched.

Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: http://lkml.kernel.org/r/20181206191819.30182-8-dave@stgolabs.net
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/perf/builtin-sched.c | 45 +++++++++++++++++++++++++--------------------
 1 file changed, 25 insertions(+), 20 deletions(-)

diff --git a/tools/perf/builtin-sched.c b/tools/perf/builtin-sched.c
index 2e0f0c65964a..640558e9352e 100644
--- a/tools/perf/builtin-sched.c
+++ b/tools/perf/builtin-sched.c
@@ -213,7 +213,7 @@ struct perf_sched {
 	u64		 all_runtime;
 	u64		 all_count;
 	u64		 cpu_last_switched[MAX_CPUS];
-	struct rb_root	 atom_root, sorted_atom_root, merged_atom_root;
+	struct rb_root_cached atom_root, sorted_atom_root, merged_atom_root;
 	struct list_head sort_list, cmp_pid;
 	bool force;
 	bool skip_merge;
@@ -271,7 +271,7 @@ struct evsel_runtime {
 struct idle_thread_runtime {
 	struct thread_runtime	tr;
 	struct thread		*last_thread;
-	struct rb_root		sorted_root;
+	struct rb_root_cached	sorted_root;
 	struct callchain_root	callchain;
 	struct callchain_cursor	cursor;
 };
@@ -950,10 +950,10 @@ thread_lat_cmp(struct list_head *list, struct work_atoms *l, struct work_atoms *
 }
 
 static struct work_atoms *
-thread_atoms_search(struct rb_root *root, struct thread *thread,
+thread_atoms_search(struct rb_root_cached *root, struct thread *thread,
 			 struct list_head *sort_list)
 {
-	struct rb_node *node = root->rb_node;
+	struct rb_node *node = root->rb_root.rb_node;
 	struct work_atoms key = { .thread = thread };
 
 	while (node) {
@@ -976,10 +976,11 @@ thread_atoms_search(struct rb_root *root, struct thread *thread,
 }
 
 static void
-__thread_latency_insert(struct rb_root *root, struct work_atoms *data,
+__thread_latency_insert(struct rb_root_cached *root, struct work_atoms *data,
 			 struct list_head *sort_list)
 {
-	struct rb_node **new = &(root->rb_node), *parent = NULL;
+	struct rb_node **new = &(root->rb_root.rb_node), *parent = NULL;
+	bool leftmost = true;
 
 	while (*new) {
 		struct work_atoms *this;
@@ -992,12 +993,14 @@ __thread_latency_insert(struct rb_root *root, struct work_atoms *data,
 
 		if (cmp > 0)
 			new = &((*new)->rb_left);
-		else
+		else {
 			new = &((*new)->rb_right);
+			leftmost = false;
+		}
 	}
 
 	rb_link_node(&data->node, parent, new);
-	rb_insert_color(&data->node, root);
+	rb_insert_color_cached(&data->node, root, leftmost);
 }
 
 static int thread_atoms_insert(struct perf_sched *sched, struct thread *thread)
@@ -1447,15 +1450,15 @@ static int sort_dimension__add(const char *tok, struct list_head *list)
 static void perf_sched__sort_lat(struct perf_sched *sched)
 {
 	struct rb_node *node;
-	struct rb_root *root = &sched->atom_root;
+	struct rb_root_cached *root = &sched->atom_root;
 again:
 	for (;;) {
 		struct work_atoms *data;
-		node = rb_first(root);
+		node = rb_first_cached(root);
 		if (!node)
 			break;
 
-		rb_erase(node, root);
+		rb_erase_cached(node, root);
 		data = rb_entry(node, struct work_atoms, node);
 		__thread_latency_insert(&sched->sorted_atom_root, data, &sched->sort_list);
 	}
@@ -2762,12 +2765,12 @@ static size_t callchain__fprintf_folded(FILE *fp, struct callchain_node *node)
 	return ret;
 }
 
-static size_t timehist_print_idlehist_callchain(struct rb_root *root)
+static size_t timehist_print_idlehist_callchain(struct rb_root_cached *root)
 {
 	size_t ret = 0;
 	FILE *fp = stdout;
 	struct callchain_node *chain;
-	struct rb_node *rb_node = rb_first(root);
+	struct rb_node *rb_node = rb_first_cached(root);
 
 	printf("  %16s  %8s  %s\n", "Idle time (msec)", "Count", "Callchains");
 	printf("  %.16s  %.8s  %.50s\n", graph_dotted_line, graph_dotted_line,
@@ -2868,7 +2871,7 @@ static void timehist_print_summary(struct perf_sched *sched,
 			if (itr == NULL)
 				continue;
 
-			callchain_param.sort(&itr->sorted_root, &itr->callchain,
+			callchain_param.sort(&itr->sorted_root.rb_root, &itr->callchain,
 					     0, &callchain_param);
 
 			printf("  CPU %2d:", i);
@@ -3074,11 +3077,12 @@ static void print_bad_events(struct perf_sched *sched)
 	}
 }
 
-static void __merge_work_atoms(struct rb_root *root, struct work_atoms *data)
+static void __merge_work_atoms(struct rb_root_cached *root, struct work_atoms *data)
 {
-	struct rb_node **new = &(root->rb_node), *parent = NULL;
+	struct rb_node **new = &(root->rb_root.rb_node), *parent = NULL;
 	struct work_atoms *this;
 	const char *comm = thread__comm_str(data->thread), *this_comm;
+	bool leftmost = true;
 
 	while (*new) {
 		int cmp;
@@ -3092,6 +3096,7 @@ static void __merge_work_atoms(struct rb_root *root, struct work_atoms *data)
 			new = &((*new)->rb_left);
 		} else if (cmp < 0) {
 			new = &((*new)->rb_right);
+			leftmost = false;
 		} else {
 			this->num_merged++;
 			this->total_runtime += data->total_runtime;
@@ -3109,7 +3114,7 @@ static void __merge_work_atoms(struct rb_root *root, struct work_atoms *data)
 
 	data->num_merged++;
 	rb_link_node(&data->node, parent, new);
-	rb_insert_color(&data->node, root);
+	rb_insert_color_cached(&data->node, root, leftmost);
 }
 
 static void perf_sched__merge_lat(struct perf_sched *sched)
@@ -3120,8 +3125,8 @@ static void perf_sched__merge_lat(struct perf_sched *sched)
 	if (sched->skip_merge)
 		return;
 
-	while ((node = rb_first(&sched->atom_root))) {
-		rb_erase(node, &sched->atom_root);
+	while ((node = rb_first_cached(&sched->atom_root))) {
+		rb_erase_cached(node, &sched->atom_root);
 		data = rb_entry(node, struct work_atoms, node);
 		__merge_work_atoms(&sched->merged_atom_root, data);
 	}
@@ -3143,7 +3148,7 @@ static int perf_sched__lat(struct perf_sched *sched)
 	printf("  Task                  |   Runtime ms  | Switches | Average delay ms | Maximum delay ms | Maximum delay at       |\n");
 	printf(" -----------------------------------------------------------------------------------------------------------------\n");
 
-	next = rb_first(&sched->sorted_atom_root);
+	next = rb_first_cached(&sched->sorted_atom_root);
 
 	while (next) {
 		struct work_atoms *work_list;

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH 6/7] perf hist: Use cached rbtrees
  2017-11-27  2:30 [PATCH -tip 0/7] tools/perf: Update rbtree implementation and optimize users Davidlohr Bueso
@ 2017-11-27  2:30 ` Davidlohr Bueso
  0 siblings, 0 replies; 18+ messages in thread
From: Davidlohr Bueso @ 2017-11-27  2:30 UTC (permalink / raw)
  To: acme; +Cc: jolsa, ak, mingo, dave, linux-kernel, Davidlohr Bueso

At the cost of an extra pointer, we can avoid the O(logN) cost
of finding the first element in the tree (smallest node), which
is something heavily required for histograms. Specifically, the
following are converted to rb_root_cached, and users accordingly:

hist::entries_in_array
hist::entries_in
hist::entries
hist::entries_collapsed
hist_entry::hroot_in
hist_entry::hroot_out

Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
---
 tools/perf/builtin-annotate.c     |   2 +-
 tools/perf/builtin-c2c.c          |   6 +-
 tools/perf/builtin-diff.c         |  10 +-
 tools/perf/builtin-top.c          |   2 +-
 tools/perf/tests/hists_common.c   |   8 +-
 tools/perf/tests/hists_cumulate.c |  19 ++--
 tools/perf/tests/hists_link.c     |   8 +-
 tools/perf/tests/hists_output.c   |  32 +++---
 tools/perf/ui/browsers/hists.c    |  16 +--
 tools/perf/ui/gtk/hists.c         |   6 +-
 tools/perf/ui/stdio/hist.c        |   3 +-
 tools/perf/util/hist.c            | 198 +++++++++++++++++++++-----------------
 tools/perf/util/hist.h            |  10 +-
 tools/perf/util/sort.h            |   4 +-
 14 files changed, 175 insertions(+), 149 deletions(-)

diff --git a/tools/perf/builtin-annotate.c b/tools/perf/builtin-annotate.c
index 2c2c265a0eb7..98e66781839e 100644
--- a/tools/perf/builtin-annotate.c
+++ b/tools/perf/builtin-annotate.c
@@ -228,7 +228,7 @@ static void hists__find_annotations(struct hists *hists,
 				    struct perf_evsel *evsel,
 				    struct perf_annotate *ann)
 {
-	struct rb_node *nd = rb_first(&hists->entries), *next;
+	struct rb_node *nd = rb_first_cached(&hists->entries), *next;
 	int key = K_RIGHT;
 
 	while (nd) {
diff --git a/tools/perf/builtin-c2c.c b/tools/perf/builtin-c2c.c
index f1da9b0833c0..c4da89d25942 100644
--- a/tools/perf/builtin-c2c.c
+++ b/tools/perf/builtin-c2c.c
@@ -1968,7 +1968,7 @@ static int resort_hitm_cb(struct hist_entry *he)
 
 static int hists__iterate_cb(struct hists *hists, hists__resort_cb_t cb)
 {
-	struct rb_node *next = rb_first(&hists->entries);
+	struct rb_node *next = rb_first_cached(&hists->entries);
 	int ret = 0;
 
 	while (next) {
@@ -2095,7 +2095,7 @@ static void print_pareto(FILE *out)
 	if (WARN_ONCE(ret, "failed to setup sort entries\n"))
 		return;
 
-	nd = rb_first(&c2c.hists.hists.entries);
+	nd = rb_first_cached(&c2c.hists.hists.entries);
 
 	for (; nd; nd = rb_next(nd)) {
 		struct hist_entry *he = rb_entry(nd, struct hist_entry, rb_node);
@@ -2163,7 +2163,7 @@ static void perf_c2c__hists_fprintf(FILE *out, struct perf_session *session)
 static void c2c_browser__update_nr_entries(struct hist_browser *hb)
 {
 	u64 nr_entries = 0;
-	struct rb_node *nd = rb_first(&hb->hists->entries);
+	struct rb_node *nd = rb_first_cached(&hb->hists->entries);
 
 	while (nd) {
 		struct hist_entry *he = rb_entry(nd, struct hist_entry, rb_node);
diff --git a/tools/perf/builtin-diff.c b/tools/perf/builtin-diff.c
index d660cb7b222b..5486d6a3cd7c 100644
--- a/tools/perf/builtin-diff.c
+++ b/tools/perf/builtin-diff.c
@@ -429,7 +429,7 @@ get_pair_fmt(struct hist_entry *he, struct diff_hpp_fmt *dfmt)
 
 static void hists__baseline_only(struct hists *hists)
 {
-	struct rb_root *root;
+	struct rb_root_cached *root;
 	struct rb_node *next;
 
 	if (hists__has(hists, need_collapse))
@@ -437,13 +437,13 @@ static void hists__baseline_only(struct hists *hists)
 	else
 		root = hists->entries_in;
 
-	next = rb_first(root);
+	next = rb_first_cached(root);
 	while (next != NULL) {
 		struct hist_entry *he = rb_entry(next, struct hist_entry, rb_node_in);
 
 		next = rb_next(&he->rb_node_in);
 		if (!hist_entry__next_pair(he)) {
-			rb_erase(&he->rb_node_in, root);
+			rb_erase_cached(&he->rb_node_in, root);
 			hist_entry__delete(he);
 		}
 	}
@@ -451,7 +451,7 @@ static void hists__baseline_only(struct hists *hists)
 
 static void hists__precompute(struct hists *hists)
 {
-	struct rb_root *root;
+	struct rb_root_cached *root;
 	struct rb_node *next;
 
 	if (hists__has(hists, need_collapse))
@@ -459,7 +459,7 @@ static void hists__precompute(struct hists *hists)
 	else
 		root = hists->entries_in;
 
-	next = rb_first(root);
+	next = rb_first_cached(root);
 	while (next != NULL) {
 		struct hist_entry *he, *pair;
 		struct data__file *d;
diff --git a/tools/perf/builtin-top.c b/tools/perf/builtin-top.c
index c3030a05ed77..d268907941ea 100644
--- a/tools/perf/builtin-top.c
+++ b/tools/perf/builtin-top.c
@@ -373,7 +373,7 @@ static void perf_top__prompt_symbol(struct perf_top *top, const char *msg)
 	if (p)
 		*p = 0;
 
-	next = rb_first(&hists->entries);
+	next = rb_first_cached(&hists->entries);
 	while (next) {
 		n = rb_entry(next, struct hist_entry, rb_node);
 		if (n->ms.sym && !strcmp(buf, n->ms.sym->name)) {
diff --git a/tools/perf/tests/hists_common.c b/tools/perf/tests/hists_common.c
index f7c5b613d667..7824fd5e9870 100644
--- a/tools/perf/tests/hists_common.c
+++ b/tools/perf/tests/hists_common.c
@@ -161,7 +161,7 @@ struct machine *setup_fake_machine(struct machines *machines)
 void print_hists_in(struct hists *hists)
 {
 	int i = 0;
-	struct rb_root *root;
+	struct rb_root_cached *root;
 	struct rb_node *node;
 
 	if (hists__has(hists, need_collapse))
@@ -170,7 +170,7 @@ void print_hists_in(struct hists *hists)
 		root = hists->entries_in;
 
 	pr_info("----- %s --------\n", __func__);
-	node = rb_first(root);
+	node = rb_first_cached(root);
 	while (node) {
 		struct hist_entry *he;
 
@@ -191,13 +191,13 @@ void print_hists_in(struct hists *hists)
 void print_hists_out(struct hists *hists)
 {
 	int i = 0;
-	struct rb_root *root;
+	struct rb_root_cached *root;
 	struct rb_node *node;
 
 	root = &hists->entries;
 
 	pr_info("----- %s --------\n", __func__);
-	node = rb_first(root);
+	node = rb_first_cached(root);
 	while (node) {
 		struct hist_entry *he;
 
diff --git a/tools/perf/tests/hists_cumulate.c b/tools/perf/tests/hists_cumulate.c
index 65fe02bebbee..2e4759d5e913 100644
--- a/tools/perf/tests/hists_cumulate.c
+++ b/tools/perf/tests/hists_cumulate.c
@@ -125,8 +125,8 @@ static int add_hist_entries(struct hists *hists, struct machine *machine)
 static void del_hist_entries(struct hists *hists)
 {
 	struct hist_entry *he;
-	struct rb_root *root_in;
-	struct rb_root *root_out;
+	struct rb_root_cached *root_in;
+	struct rb_root_cached *root_out;
 	struct rb_node *node;
 
 	if (hists__has(hists, need_collapse))
@@ -136,12 +136,12 @@ static void del_hist_entries(struct hists *hists)
 
 	root_out = &hists->entries;
 
-	while (!RB_EMPTY_ROOT(root_out)) {
-		node = rb_first(root_out);
+	while (!RB_EMPTY_ROOT(&root_out->rb_root)) {
+		node = rb_first_cached(root_out);
 
 		he = rb_entry(node, struct hist_entry, rb_node);
-		rb_erase(node, root_out);
-		rb_erase(&he->rb_node_in, root_in);
+		rb_erase_cached(node, root_out);
+		rb_erase_cached(&he->rb_node_in, root_in);
 		hist_entry__delete(he);
 	}
 }
@@ -179,7 +179,8 @@ static int do_test(struct hists *hists, struct result *expected, size_t nr_expec
 	char buf[32];
 	size_t i, c;
 	struct hist_entry *he;
-	struct rb_root *root;
+	struct rb_root *root; /* for callchain entry test */
+	struct rb_root_cached *root_cached;
 	struct rb_node *node;
 	struct callchain_node *cnode;
 	struct callchain_list *clist;
@@ -198,8 +199,8 @@ static int do_test(struct hists *hists, struct result *expected, size_t nr_expec
 		print_hists_out(hists);
 	}
 
-	root = &hists->entries;
-	for (node = rb_first(root), i = 0;
+	root_cached = &hists->entries;
+	for (node = rb_first_cached(root_cached), i = 0;
 	     node && (he = rb_entry(node, struct hist_entry, rb_node));
 	     node = rb_next(node), i++) {
 		scnprintf(buf, sizeof(buf), "Invalid hist entry #%zd", i);
diff --git a/tools/perf/tests/hists_link.c b/tools/perf/tests/hists_link.c
index 9a9d06cb0222..af633db63f4d 100644
--- a/tools/perf/tests/hists_link.c
+++ b/tools/perf/tests/hists_link.c
@@ -142,7 +142,7 @@ static int find_sample(struct sample *samples, size_t nr_samples,
 static int __validate_match(struct hists *hists)
 {
 	size_t count = 0;
-	struct rb_root *root;
+	struct rb_root_cached *root;
 	struct rb_node *node;
 
 	/*
@@ -153,7 +153,7 @@ static int __validate_match(struct hists *hists)
 	else
 		root = hists->entries_in;
 
-	node = rb_first(root);
+	node = rb_first_cached(root);
 	while (node) {
 		struct hist_entry *he;
 
@@ -192,7 +192,7 @@ static int __validate_link(struct hists *hists, int idx)
 	size_t count = 0;
 	size_t count_pair = 0;
 	size_t count_dummy = 0;
-	struct rb_root *root;
+	struct rb_root_cached *root;
 	struct rb_node *node;
 
 	/*
@@ -205,7 +205,7 @@ static int __validate_link(struct hists *hists, int idx)
 	else
 		root = hists->entries_in;
 
-	node = rb_first(root);
+	node = rb_first_cached(root);
 	while (node) {
 		struct hist_entry *he;
 
diff --git a/tools/perf/tests/hists_output.c b/tools/perf/tests/hists_output.c
index faacb4f41460..6c4bc68a77ee 100644
--- a/tools/perf/tests/hists_output.c
+++ b/tools/perf/tests/hists_output.c
@@ -91,8 +91,8 @@ static int add_hist_entries(struct hists *hists, struct machine *machine)
 static void del_hist_entries(struct hists *hists)
 {
 	struct hist_entry *he;
-	struct rb_root *root_in;
-	struct rb_root *root_out;
+	struct rb_root_cached *root_in;
+	struct rb_root_cached *root_out;
 	struct rb_node *node;
 
 	if (hists__has(hists, need_collapse))
@@ -102,12 +102,12 @@ static void del_hist_entries(struct hists *hists)
 
 	root_out = &hists->entries;
 
-	while (!RB_EMPTY_ROOT(root_out)) {
-		node = rb_first(root_out);
+	while (!RB_EMPTY_ROOT(&root_out->rb_root)) {
+		node = rb_first_cached(root_out);
 
 		he = rb_entry(node, struct hist_entry, rb_node);
-		rb_erase(node, root_out);
-		rb_erase(&he->rb_node_in, root_in);
+		rb_erase_cached(node, root_out);
+		rb_erase_cached(&he->rb_node_in, root_in);
 		hist_entry__delete(he);
 	}
 }
@@ -126,7 +126,7 @@ static int test1(struct perf_evsel *evsel, struct machine *machine)
 	int err;
 	struct hists *hists = evsel__hists(evsel);
 	struct hist_entry *he;
-	struct rb_root *root;
+	struct rb_root_cached *root;
 	struct rb_node *node;
 
 	field_order = NULL;
@@ -162,7 +162,7 @@ static int test1(struct perf_evsel *evsel, struct machine *machine)
 	}
 
 	root = &hists->entries;
-	node = rb_first(root);
+	node = rb_first_cached(root);
 	he = rb_entry(node, struct hist_entry, rb_node);
 	TEST_ASSERT_VAL("Invalid hist entry",
 			!strcmp(COMM(he), "perf") && !strcmp(DSO(he), "perf") &&
@@ -228,7 +228,7 @@ static int test2(struct perf_evsel *evsel, struct machine *machine)
 	int err;
 	struct hists *hists = evsel__hists(evsel);
 	struct hist_entry *he;
-	struct rb_root *root;
+	struct rb_root_cached *root;
 	struct rb_node *node;
 
 	field_order = "overhead,cpu";
@@ -262,7 +262,7 @@ static int test2(struct perf_evsel *evsel, struct machine *machine)
 	}
 
 	root = &hists->entries;
-	node = rb_first(root);
+	node = rb_first_cached(root);
 	he = rb_entry(node, struct hist_entry, rb_node);
 	TEST_ASSERT_VAL("Invalid hist entry",
 			CPU(he) == 1 && PID(he) == 100 && he->stat.period == 300);
@@ -284,7 +284,7 @@ static int test3(struct perf_evsel *evsel, struct machine *machine)
 	int err;
 	struct hists *hists = evsel__hists(evsel);
 	struct hist_entry *he;
-	struct rb_root *root;
+	struct rb_root_cached *root;
 	struct rb_node *node;
 
 	field_order = "comm,overhead,dso";
@@ -316,7 +316,7 @@ static int test3(struct perf_evsel *evsel, struct machine *machine)
 	}
 
 	root = &hists->entries;
-	node = rb_first(root);
+	node = rb_first_cached(root);
 	he = rb_entry(node, struct hist_entry, rb_node);
 	TEST_ASSERT_VAL("Invalid hist entry",
 			!strcmp(COMM(he), "bash") && !strcmp(DSO(he), "bash") &&
@@ -358,7 +358,7 @@ static int test4(struct perf_evsel *evsel, struct machine *machine)
 	int err;
 	struct hists *hists = evsel__hists(evsel);
 	struct hist_entry *he;
-	struct rb_root *root;
+	struct rb_root_cached *root;
 	struct rb_node *node;
 
 	field_order = "dso,sym,comm,overhead,dso";
@@ -394,7 +394,7 @@ static int test4(struct perf_evsel *evsel, struct machine *machine)
 	}
 
 	root = &hists->entries;
-	node = rb_first(root);
+	node = rb_first_cached(root);
 	he = rb_entry(node, struct hist_entry, rb_node);
 	TEST_ASSERT_VAL("Invalid hist entry",
 			!strcmp(DSO(he), "perf") && !strcmp(SYM(he), "cmd_record") &&
@@ -460,7 +460,7 @@ static int test5(struct perf_evsel *evsel, struct machine *machine)
 	int err;
 	struct hists *hists = evsel__hists(evsel);
 	struct hist_entry *he;
-	struct rb_root *root;
+	struct rb_root_cached *root;
 	struct rb_node *node;
 
 	field_order = "cpu,pid,comm,dso,sym";
@@ -497,7 +497,7 @@ static int test5(struct perf_evsel *evsel, struct machine *machine)
 	}
 
 	root = &hists->entries;
-	node = rb_first(root);
+	node = rb_first_cached(root);
 	he = rb_entry(node, struct hist_entry, rb_node);
 
 	TEST_ASSERT_VAL("Invalid hist entry",
diff --git a/tools/perf/ui/browsers/hists.c b/tools/perf/ui/browsers/hists.c
index 68146f4620a5..f552958475d2 100644
--- a/tools/perf/ui/browsers/hists.c
+++ b/tools/perf/ui/browsers/hists.c
@@ -50,7 +50,7 @@ static int hist_browser__get_folding(struct hist_browser *browser)
 	struct hists *hists = browser->hists;
 	int unfolded_rows = 0;
 
-	for (nd = rb_first(&hists->entries);
+	for (nd = rb_first_cached(&hists->entries);
 	     (nd = hists__filter_entries(nd, browser->min_pcnt)) != NULL;
 	     nd = rb_hierarchy_next(nd)) {
 		struct hist_entry *he =
@@ -264,7 +264,7 @@ static int hierarchy_count_rows(struct hist_browser *hb, struct hist_entry *he,
 	if (he->has_no_entry)
 		return 1;
 
-	node = rb_first(&he->hroot_out);
+	node = rb_first_cached(&he->hroot_out);
 	while (node) {
 		float percent;
 
@@ -369,7 +369,7 @@ static void hist_entry__init_have_children(struct hist_entry *he)
 		he->has_children = !RB_EMPTY_ROOT(&he->sorted_chain);
 		callchain__init_have_children(&he->sorted_chain);
 	} else {
-		he->has_children = !RB_EMPTY_ROOT(&he->hroot_out);
+		he->has_children = !RB_EMPTY_ROOT(&he->hroot_out.rb_root);
 	}
 
 	he->init_have_children = true;
@@ -505,7 +505,7 @@ static int hierarchy_set_folding(struct hist_browser *hb, struct hist_entry *he,
 	struct hist_entry *child;
 	int n = 0;
 
-	for (nd = rb_first(&he->hroot_out); nd; nd = rb_next(nd)) {
+	for (nd = rb_first_cached(&he->hroot_out); nd; nd = rb_next(nd)) {
 		child = rb_entry(nd, struct hist_entry, rb_node);
 		percent = hist_entry__get_percent_limit(child);
 		if (!child->filtered && percent >= hb->min_pcnt)
@@ -563,7 +563,7 @@ __hist_browser__set_folding(struct hist_browser *browser, bool unfold)
 	struct rb_node *nd;
 	struct hist_entry *he;
 
-	nd = rb_first(&browser->hists->entries);
+	nd = rb_first_cached(&browser->hists->entries);
 	while (nd) {
 		he = rb_entry(nd, struct hist_entry, rb_node);
 
@@ -1731,7 +1731,7 @@ static void ui_browser__hists_init_top(struct ui_browser *browser)
 		struct hist_browser *hb;
 
 		hb = container_of(browser, struct hist_browser, b);
-		browser->top = rb_first(&hb->hists->entries);
+		browser->top = rb_first_cached(&hb->hists->entries);
 	}
 }
 
@@ -2700,7 +2700,7 @@ add_socket_opt(struct hist_browser *browser, struct popup_action *act,
 static void hist_browser__update_nr_entries(struct hist_browser *hb)
 {
 	u64 nr_entries = 0;
-	struct rb_node *nd = rb_first(&hb->hists->entries);
+	struct rb_node *nd = rb_first_cached(&hb->hists->entries);
 
 	if (hb->min_pcnt == 0 && !symbol_conf.report_hierarchy) {
 		hb->nr_non_filtered_entries = hb->hists->nr_non_filtered_entries;
@@ -2720,7 +2720,7 @@ static void hist_browser__update_percent_limit(struct hist_browser *hb,
 					       double percent)
 {
 	struct hist_entry *he;
-	struct rb_node *nd = rb_first(&hb->hists->entries);
+	struct rb_node *nd = rb_first_cached(&hb->hists->entries);
 	u64 total = hists__total_period(hb->hists);
 	u64 min_callchain_hits = total * (percent / 100);
 
diff --git a/tools/perf/ui/gtk/hists.c b/tools/perf/ui/gtk/hists.c
index 24e1ec201ffd..10ba738adbff 100644
--- a/tools/perf/ui/gtk/hists.c
+++ b/tools/perf/ui/gtk/hists.c
@@ -353,7 +353,7 @@ static void perf_gtk__show_hists(GtkWidget *window, struct hists *hists,
 
 	g_object_unref(GTK_TREE_MODEL(store));
 
-	for (nd = rb_first(&hists->entries); nd; nd = rb_next(nd)) {
+	for (nd = rb_first_cached(&hists->entries); nd; nd = rb_next(nd)) {
 		struct hist_entry *h = rb_entry(nd, struct hist_entry, rb_node);
 		GtkTreeIter iter;
 		u64 total = hists__total_period(h->hists);
@@ -400,7 +400,7 @@ static void perf_gtk__show_hists(GtkWidget *window, struct hists *hists,
 }
 
 static void perf_gtk__add_hierarchy_entries(struct hists *hists,
-					    struct rb_root *root,
+					    struct rb_root_cached *root,
 					    GtkTreeStore *store,
 					    GtkTreeIter *parent,
 					    struct perf_hpp *hpp,
@@ -414,7 +414,7 @@ static void perf_gtk__add_hierarchy_entries(struct hists *hists,
 	u64 total = hists__total_period(hists);
 	int size;
 
-	for (node = rb_first(root); node; node = rb_next(node)) {
+	for (node = rb_first_cached(root); node; node = rb_next(node)) {
 		GtkTreeIter iter;
 		float percent;
 		char *bf;
diff --git a/tools/perf/ui/stdio/hist.c b/tools/perf/ui/stdio/hist.c
index 25dd1e0ecc58..8ab79b44a524 100644
--- a/tools/perf/ui/stdio/hist.c
+++ b/tools/perf/ui/stdio/hist.c
@@ -788,7 +788,8 @@ size_t hists__fprintf(struct hists *hists, bool show_header, int max_rows,
 
 	indent = hists__overhead_width(hists) + 4;
 
-	for (nd = rb_first(&hists->entries); nd; nd = __rb_hierarchy_next(nd, HMD_FORCE_CHILD)) {
+	for (nd = rb_first_cached(&hists->entries);
+	     nd; nd = __rb_hierarchy_next(nd, HMD_FORCE_CHILD)) {
 		struct hist_entry *h = rb_entry(nd, struct hist_entry, rb_node);
 		float percent;
 
diff --git a/tools/perf/util/hist.c b/tools/perf/util/hist.c
index b6140950301e..7d01c6f87ab4 100644
--- a/tools/perf/util/hist.c
+++ b/tools/perf/util/hist.c
@@ -207,7 +207,7 @@ void hists__calc_col_len(struct hists *hists, struct hist_entry *h)
 
 void hists__output_recalc_col_len(struct hists *hists, int max_rows)
 {
-	struct rb_node *next = rb_first(&hists->entries);
+	struct rb_node *next = rb_first_cached(&hists->entries);
 	struct hist_entry *n;
 	int row = 0;
 
@@ -294,7 +294,7 @@ static bool hists__decay_entry(struct hists *hists, struct hist_entry *he)
 
 	if (!he->leaf) {
 		struct hist_entry *child;
-		struct rb_node *node = rb_first(&he->hroot_out);
+		struct rb_node *node = rb_first_cached(&he->hroot_out);
 		while (node) {
 			child = rb_entry(node, struct hist_entry, rb_node);
 			node = rb_next(node);
@@ -309,8 +309,8 @@ static bool hists__decay_entry(struct hists *hists, struct hist_entry *he)
 
 static void hists__delete_entry(struct hists *hists, struct hist_entry *he)
 {
-	struct rb_root *root_in;
-	struct rb_root *root_out;
+	struct rb_root_cached *root_in;
+	struct rb_root_cached *root_out;
 
 	if (he->parent_he) {
 		root_in  = &he->parent_he->hroot_in;
@@ -323,8 +323,8 @@ static void hists__delete_entry(struct hists *hists, struct hist_entry *he)
 		root_out = &hists->entries;
 	}
 
-	rb_erase(&he->rb_node_in, root_in);
-	rb_erase(&he->rb_node, root_out);
+	rb_erase_cached(&he->rb_node_in, root_in);
+	rb_erase_cached(&he->rb_node, root_out);
 
 	--hists->nr_entries;
 	if (!he->filtered)
@@ -335,7 +335,7 @@ static void hists__delete_entry(struct hists *hists, struct hist_entry *he)
 
 void hists__decay_entries(struct hists *hists, bool zap_user, bool zap_kernel)
 {
-	struct rb_node *next = rb_first(&hists->entries);
+	struct rb_node *next = rb_first_cached(&hists->entries);
 	struct hist_entry *n;
 
 	while (next) {
@@ -351,7 +351,7 @@ void hists__decay_entries(struct hists *hists, bool zap_user, bool zap_kernel)
 
 void hists__delete_entries(struct hists *hists)
 {
-	struct rb_node *next = rb_first(&hists->entries);
+	struct rb_node *next = rb_first_cached(&hists->entries);
 	struct hist_entry *n;
 
 	while (next) {
@@ -431,8 +431,8 @@ static int hist_entry__init(struct hist_entry *he,
 	}
 	INIT_LIST_HEAD(&he->pairs.node);
 	thread__get(he->thread);
-	he->hroot_in  = RB_ROOT;
-	he->hroot_out = RB_ROOT;
+	he->hroot_in  = RB_ROOT_CACHED;
+	he->hroot_out = RB_ROOT_CACHED;
 
 	if (!symbol_conf.report_hierarchy)
 		he->leaf = true;
@@ -509,8 +509,9 @@ static struct hist_entry *hists__findnew_entry(struct hists *hists,
 	int64_t cmp;
 	u64 period = entry->stat.period;
 	u64 weight = entry->stat.weight;
+	bool leftmost = true;
 
-	p = &hists->entries_in->rb_node;
+	p = &hists->entries_in->rb_root.rb_node;
 
 	while (*p != NULL) {
 		parent = *p;
@@ -553,8 +554,10 @@ static struct hist_entry *hists__findnew_entry(struct hists *hists,
 
 		if (cmp < 0)
 			p = &(*p)->rb_left;
-		else
+		else {
 			p = &(*p)->rb_right;
+			leftmost = false;
+		}
 	}
 
 	he = hist_entry__new(entry, sample_self);
@@ -566,7 +569,7 @@ static struct hist_entry *hists__findnew_entry(struct hists *hists,
 	hists->nr_entries++;
 
 	rb_link_node(&he->rb_node_in, parent, p);
-	rb_insert_color(&he->rb_node_in, hists->entries_in);
+	rb_insert_color_cached(&he->rb_node_in, hists->entries_in, leftmost);
 out:
 	if (sample_self)
 		he_stat__add_cpumode_period(&he->stat, al->cpumode, period);
@@ -1275,16 +1278,17 @@ static void hist_entry__apply_hierarchy_filters(struct hist_entry *he)
 }
 
 static struct hist_entry *hierarchy_insert_entry(struct hists *hists,
-						 struct rb_root *root,
+						 struct rb_root_cached *root,
 						 struct hist_entry *he,
 						 struct hist_entry *parent_he,
 						 struct perf_hpp_list *hpp_list)
 {
-	struct rb_node **p = &root->rb_node;
+	struct rb_node **p = &root->rb_root.rb_node;
 	struct rb_node *parent = NULL;
 	struct hist_entry *iter, *new;
 	struct perf_hpp_fmt *fmt;
 	int64_t cmp;
+	bool leftmost = true;
 
 	while (*p != NULL) {
 		parent = *p;
@@ -1304,8 +1308,10 @@ static struct hist_entry *hierarchy_insert_entry(struct hists *hists,
 
 		if (cmp < 0)
 			p = &parent->rb_left;
-		else
+		else {
 			p = &parent->rb_right;
+			leftmost = false;
+		}
 	}
 
 	new = hist_entry__new(he, true);
@@ -1339,12 +1345,12 @@ static struct hist_entry *hierarchy_insert_entry(struct hists *hists,
 	}
 
 	rb_link_node(&new->rb_node_in, parent, p);
-	rb_insert_color(&new->rb_node_in, root);
+	rb_insert_color_cached(&new->rb_node_in, root, leftmost);
 	return new;
 }
 
 static int hists__hierarchy_insert_entry(struct hists *hists,
-					 struct rb_root *root,
+					 struct rb_root_cached *root,
 					 struct hist_entry *he)
 {
 	struct perf_hpp_list_node *node;
@@ -1390,13 +1396,14 @@ static int hists__hierarchy_insert_entry(struct hists *hists,
 }
 
 static int hists__collapse_insert_entry(struct hists *hists,
-					struct rb_root *root,
+					struct rb_root_cached *root,
 					struct hist_entry *he)
 {
-	struct rb_node **p = &root->rb_node;
+	struct rb_node **p = &root->rb_root.rb_node;
 	struct rb_node *parent = NULL;
 	struct hist_entry *iter;
 	int64_t cmp;
+	bool leftmost = true;
 
 	if (symbol_conf.report_hierarchy)
 		return hists__hierarchy_insert_entry(hists, root, he);
@@ -1427,19 +1434,21 @@ static int hists__collapse_insert_entry(struct hists *hists,
 
 		if (cmp < 0)
 			p = &(*p)->rb_left;
-		else
+		else {
 			p = &(*p)->rb_right;
+			leftmost = false;
+		}
 	}
 	hists->nr_entries++;
 
 	rb_link_node(&he->rb_node_in, parent, p);
-	rb_insert_color(&he->rb_node_in, root);
+	rb_insert_color_cached(&he->rb_node_in, root, leftmost);
 	return 1;
 }
 
-struct rb_root *hists__get_rotate_entries_in(struct hists *hists)
+struct rb_root_cached *hists__get_rotate_entries_in(struct hists *hists)
 {
-	struct rb_root *root;
+	struct rb_root_cached *root;
 
 	pthread_mutex_lock(&hists->lock);
 
@@ -1462,7 +1471,7 @@ static void hists__apply_filters(struct hists *hists, struct hist_entry *he)
 
 int hists__collapse_resort(struct hists *hists, struct ui_progress *prog)
 {
-	struct rb_root *root;
+	struct rb_root_cached *root;
 	struct rb_node *next;
 	struct hist_entry *n;
 	int ret;
@@ -1474,7 +1483,7 @@ int hists__collapse_resort(struct hists *hists, struct ui_progress *prog)
 
 	root = hists__get_rotate_entries_in(hists);
 
-	next = rb_first(root);
+	next = rb_first_cached(root);
 
 	while (next) {
 		if (session_done())
@@ -1482,7 +1491,7 @@ int hists__collapse_resort(struct hists *hists, struct ui_progress *prog)
 		n = rb_entry(next, struct hist_entry, rb_node_in);
 		next = rb_next(&n->rb_node_in);
 
-		rb_erase(&n->rb_node_in, root);
+		rb_erase_cached(&n->rb_node_in, root);
 		ret = hists__collapse_insert_entry(hists, &hists->entries_collapsed, n);
 		if (ret < 0)
 			return -1;
@@ -1553,7 +1562,7 @@ static void hierarchy_recalc_total_periods(struct hists *hists)
 	struct rb_node *node;
 	struct hist_entry *he;
 
-	node = rb_first(&hists->entries);
+	node = rb_first_cached(&hists->entries);
 
 	hists->stats.total_period = 0;
 	hists->stats.total_non_filtered_period = 0;
@@ -1573,13 +1582,14 @@ static void hierarchy_recalc_total_periods(struct hists *hists)
 	}
 }
 
-static void hierarchy_insert_output_entry(struct rb_root *root,
+static void hierarchy_insert_output_entry(struct rb_root_cached *root,
 					  struct hist_entry *he)
 {
-	struct rb_node **p = &root->rb_node;
+	struct rb_node **p = &root->rb_root.rb_node;
 	struct rb_node *parent = NULL;
 	struct hist_entry *iter;
 	struct perf_hpp_fmt *fmt;
+	bool leftmost = true;
 
 	while (*p != NULL) {
 		parent = *p;
@@ -1587,12 +1597,14 @@ static void hierarchy_insert_output_entry(struct rb_root *root,
 
 		if (hist_entry__sort(he, iter) > 0)
 			p = &parent->rb_left;
-		else
+		else {
 			p = &parent->rb_right;
+			leftmost = false;
+		}
 	}
 
 	rb_link_node(&he->rb_node, parent, p);
-	rb_insert_color(&he->rb_node, root);
+	rb_insert_color_cached(&he->rb_node, root, leftmost);
 
 	/* update column width of dynamic entry */
 	perf_hpp_list__for_each_sort_list(he->hpp_list, fmt) {
@@ -1603,16 +1615,16 @@ static void hierarchy_insert_output_entry(struct rb_root *root,
 
 static void hists__hierarchy_output_resort(struct hists *hists,
 					   struct ui_progress *prog,
-					   struct rb_root *root_in,
-					   struct rb_root *root_out,
+					   struct rb_root_cached *root_in,
+					   struct rb_root_cached *root_out,
 					   u64 min_callchain_hits,
 					   bool use_callchain)
 {
 	struct rb_node *node;
 	struct hist_entry *he;
 
-	*root_out = RB_ROOT;
-	node = rb_first(root_in);
+	*root_out = RB_ROOT_CACHED;
+	node = rb_first_cached(root_in);
 
 	while (node) {
 		he = rb_entry(node, struct hist_entry, rb_node_in);
@@ -1655,15 +1667,16 @@ static void hists__hierarchy_output_resort(struct hists *hists,
 	}
 }
 
-static void __hists__insert_output_entry(struct rb_root *entries,
+static void __hists__insert_output_entry(struct rb_root_cached *entries,
 					 struct hist_entry *he,
 					 u64 min_callchain_hits,
 					 bool use_callchain)
 {
-	struct rb_node **p = &entries->rb_node;
+	struct rb_node **p = &entries->rb_root.rb_node;
 	struct rb_node *parent = NULL;
 	struct hist_entry *iter;
 	struct perf_hpp_fmt *fmt;
+	bool leftmost = true;
 
 	if (use_callchain) {
 		if (callchain_param.mode == CHAIN_GRAPH_REL) {
@@ -1684,12 +1697,14 @@ static void __hists__insert_output_entry(struct rb_root *entries,
 
 		if (hist_entry__sort(he, iter) > 0)
 			p = &(*p)->rb_left;
-		else
+		else {
 			p = &(*p)->rb_right;
+			leftmost = false;
+		}
 	}
 
 	rb_link_node(&he->rb_node, parent, p);
-	rb_insert_color(&he->rb_node, entries);
+	rb_insert_color_cached(&he->rb_node, entries, leftmost);
 
 	perf_hpp_list__for_each_sort_list(&perf_hpp_list, fmt) {
 		if (perf_hpp__is_dynamic_entry(fmt) &&
@@ -1701,7 +1716,7 @@ static void __hists__insert_output_entry(struct rb_root *entries,
 static void output_resort(struct hists *hists, struct ui_progress *prog,
 			  bool use_callchain, hists__resort_cb_t cb)
 {
-	struct rb_root *root;
+	struct rb_root_cached *root;
 	struct rb_node *next;
 	struct hist_entry *n;
 	u64 callchain_total;
@@ -1731,8 +1746,8 @@ static void output_resort(struct hists *hists, struct ui_progress *prog,
 	else
 		root = hists->entries_in;
 
-	next = rb_first(root);
-	hists->entries = RB_ROOT;
+	next = rb_first_cached(root);
+	hists->entries = RB_ROOT_CACHED;
 
 	while (next) {
 		n = rb_entry(next, struct hist_entry, rb_node_in);
@@ -1793,7 +1808,7 @@ struct rb_node *rb_hierarchy_last(struct rb_node *node)
 	struct hist_entry *he = rb_entry(node, struct hist_entry, rb_node);
 
 	while (can_goto_child(he, HMD_NORMAL)) {
-		node = rb_last(&he->hroot_out);
+		node = rb_last(&he->hroot_out.rb_root);
 		he = rb_entry(node, struct hist_entry, rb_node);
 	}
 	return node;
@@ -1804,7 +1819,7 @@ struct rb_node *__rb_hierarchy_next(struct rb_node *node, enum hierarchy_move_di
 	struct hist_entry *he = rb_entry(node, struct hist_entry, rb_node);
 
 	if (can_goto_child(he, hmd))
-		node = rb_first(&he->hroot_out);
+		node = rb_first_cached(&he->hroot_out);
 	else
 		node = rb_next(node);
 
@@ -1842,7 +1857,7 @@ bool hist_entry__has_hierarchy_children(struct hist_entry *he, float limit)
 	if (he->leaf)
 		return false;
 
-	node = rb_first(&he->hroot_out);
+	node = rb_first_cached(&he->hroot_out);
 	child = rb_entry(node, struct hist_entry, rb_node);
 
 	while (node && child->filtered) {
@@ -1960,7 +1975,7 @@ static void hists__filter_by_type(struct hists *hists, int type, filter_fn_t fil
 	hists__reset_filter_stats(hists);
 	hists__reset_col_len(hists);
 
-	for (nd = rb_first(&hists->entries); nd; nd = rb_next(nd)) {
+	for (nd = rb_first_cached(&hists->entries); nd; nd = rb_next(nd)) {
 		struct hist_entry *h = rb_entry(nd, struct hist_entry, rb_node);
 
 		if (filter(hists, h))
@@ -1970,13 +1985,14 @@ static void hists__filter_by_type(struct hists *hists, int type, filter_fn_t fil
 	}
 }
 
-static void resort_filtered_entry(struct rb_root *root, struct hist_entry *he)
+static void resort_filtered_entry(struct rb_root_cached *root, struct hist_entry *he)
 {
-	struct rb_node **p = &root->rb_node;
+	struct rb_node **p = &root->rb_root.rb_node;
 	struct rb_node *parent = NULL;
 	struct hist_entry *iter;
-	struct rb_root new_root = RB_ROOT;
+	struct rb_root_cached new_root = RB_ROOT_CACHED;
 	struct rb_node *nd;
+	bool leftmost = true;
 
 	while (*p != NULL) {
 		parent = *p;
@@ -1984,22 +2000,24 @@ static void resort_filtered_entry(struct rb_root *root, struct hist_entry *he)
 
 		if (hist_entry__sort(he, iter) > 0)
 			p = &(*p)->rb_left;
-		else
+		else {
 			p = &(*p)->rb_right;
+			leftmost = false;
+		}
 	}
 
 	rb_link_node(&he->rb_node, parent, p);
-	rb_insert_color(&he->rb_node, root);
+	rb_insert_color_cached(&he->rb_node, root, leftmost);
 
 	if (he->leaf || he->filtered)
 		return;
 
-	nd = rb_first(&he->hroot_out);
+	nd = rb_first_cached(&he->hroot_out);
 	while (nd) {
 		struct hist_entry *h = rb_entry(nd, struct hist_entry, rb_node);
 
 		nd = rb_next(nd);
-		rb_erase(&h->rb_node, &he->hroot_out);
+		rb_erase_cached(&h->rb_node, &he->hroot_out);
 
 		resort_filtered_entry(&new_root, h);
 	}
@@ -2010,14 +2028,14 @@ static void resort_filtered_entry(struct rb_root *root, struct hist_entry *he)
 static void hists__filter_hierarchy(struct hists *hists, int type, const void *arg)
 {
 	struct rb_node *nd;
-	struct rb_root new_root = RB_ROOT;
+	struct rb_root_cached new_root = RB_ROOT_CACHED;
 
 	hists->stats.nr_non_filtered_samples = 0;
 
 	hists__reset_filter_stats(hists);
 	hists__reset_col_len(hists);
 
-	nd = rb_first(&hists->entries);
+	nd = rb_first_cached(&hists->entries);
 	while (nd) {
 		struct hist_entry *h = rb_entry(nd, struct hist_entry, rb_node);
 		int ret;
@@ -2061,12 +2079,12 @@ static void hists__filter_hierarchy(struct hists *hists, int type, const void *a
 	 * resort output after applying a new filter since filter in a lower
 	 * hierarchy can change periods in a upper hierarchy.
 	 */
-	nd = rb_first(&hists->entries);
+	nd = rb_first_cached(&hists->entries);
 	while (nd) {
 		struct hist_entry *h = rb_entry(nd, struct hist_entry, rb_node);
 
 		nd = rb_next(nd);
-		rb_erase(&h->rb_node, &hists->entries);
+		rb_erase_cached(&h->rb_node, &hists->entries);
 
 		resort_filtered_entry(&new_root, h);
 	}
@@ -2135,18 +2153,19 @@ void hists__inc_nr_samples(struct hists *hists, bool filtered)
 static struct hist_entry *hists__add_dummy_entry(struct hists *hists,
 						 struct hist_entry *pair)
 {
-	struct rb_root *root;
+	struct rb_root_cached *root;
 	struct rb_node **p;
 	struct rb_node *parent = NULL;
 	struct hist_entry *he;
 	int64_t cmp;
+	bool leftmost = true;
 
 	if (hists__has(hists, need_collapse))
 		root = &hists->entries_collapsed;
 	else
 		root = hists->entries_in;
 
-	p = &root->rb_node;
+	p = &root->rb_root.rb_node;
 
 	while (*p != NULL) {
 		parent = *p;
@@ -2159,8 +2178,10 @@ static struct hist_entry *hists__add_dummy_entry(struct hists *hists,
 
 		if (cmp < 0)
 			p = &(*p)->rb_left;
-		else
+		else {
 			p = &(*p)->rb_right;
+			leftmost = false;
+		}
 	}
 
 	he = hist_entry__new(pair, true);
@@ -2170,7 +2191,7 @@ static struct hist_entry *hists__add_dummy_entry(struct hists *hists,
 		if (symbol_conf.cumulate_callchain)
 			memset(he->stat_acc, 0, sizeof(he->stat));
 		rb_link_node(&he->rb_node_in, parent, p);
-		rb_insert_color(&he->rb_node_in, root);
+		rb_insert_color_cached(&he->rb_node_in, root, leftmost);
 		hists__inc_stats(hists, he);
 		he->dummy = true;
 	}
@@ -2179,15 +2200,16 @@ static struct hist_entry *hists__add_dummy_entry(struct hists *hists,
 }
 
 static struct hist_entry *add_dummy_hierarchy_entry(struct hists *hists,
-						    struct rb_root *root,
+						    struct rb_root_cached *root,
 						    struct hist_entry *pair)
 {
 	struct rb_node **p;
 	struct rb_node *parent = NULL;
 	struct hist_entry *he;
 	struct perf_hpp_fmt *fmt;
+	bool leftmost = true;
 
-	p = &root->rb_node;
+	p = &root->rb_root.rb_node;
 	while (*p != NULL) {
 		int64_t cmp = 0;
 
@@ -2204,14 +2226,16 @@ static struct hist_entry *add_dummy_hierarchy_entry(struct hists *hists,
 
 		if (cmp < 0)
 			p = &parent->rb_left;
-		else
+		else {
 			p = &parent->rb_right;
+			leftmost = false;
+		}
 	}
 
 	he = hist_entry__new(pair, true);
 	if (he) {
 		rb_link_node(&he->rb_node_in, parent, p);
-		rb_insert_color(&he->rb_node_in, root);
+		rb_insert_color_cached(&he->rb_node_in, root, leftmost);
 
 		he->dummy = true;
 		he->hists = hists;
@@ -2228,9 +2252,9 @@ static struct hist_entry *hists__find_entry(struct hists *hists,
 	struct rb_node *n;
 
 	if (hists__has(hists, need_collapse))
-		n = hists->entries_collapsed.rb_node;
+		n = hists->entries_collapsed.rb_root.rb_node;
 	else
-		n = hists->entries_in->rb_node;
+		n = hists->entries_in->rb_root.rb_node;
 
 	while (n) {
 		struct hist_entry *iter = rb_entry(n, struct hist_entry, rb_node_in);
@@ -2247,10 +2271,10 @@ static struct hist_entry *hists__find_entry(struct hists *hists,
 	return NULL;
 }
 
-static struct hist_entry *hists__find_hierarchy_entry(struct rb_root *root,
+static struct hist_entry *hists__find_hierarchy_entry(struct rb_root_cached *root,
 						      struct hist_entry *he)
 {
-	struct rb_node *n = root->rb_node;
+	struct rb_node *n = root->rb_root.rb_node;
 
 	while (n) {
 		struct hist_entry *iter;
@@ -2275,13 +2299,13 @@ static struct hist_entry *hists__find_hierarchy_entry(struct rb_root *root,
 	return NULL;
 }
 
-static void hists__match_hierarchy(struct rb_root *leader_root,
-				   struct rb_root *other_root)
+static void hists__match_hierarchy(struct rb_root_cached *leader_root,
+				   struct rb_root_cached *other_root)
 {
 	struct rb_node *nd;
 	struct hist_entry *pos, *pair;
 
-	for (nd = rb_first(leader_root); nd; nd = rb_next(nd)) {
+	for (nd = rb_first_cached(leader_root); nd; nd = rb_next(nd)) {
 		pos  = rb_entry(nd, struct hist_entry, rb_node_in);
 		pair = hists__find_hierarchy_entry(other_root, pos);
 
@@ -2297,7 +2321,7 @@ static void hists__match_hierarchy(struct rb_root *leader_root,
  */
 void hists__match(struct hists *leader, struct hists *other)
 {
-	struct rb_root *root;
+	struct rb_root_cached *root;
 	struct rb_node *nd;
 	struct hist_entry *pos, *pair;
 
@@ -2312,7 +2336,7 @@ void hists__match(struct hists *leader, struct hists *other)
 	else
 		root = leader->entries_in;
 
-	for (nd = rb_first(root); nd; nd = rb_next(nd)) {
+	for (nd = rb_first_cached(root); nd; nd = rb_next(nd)) {
 		pos  = rb_entry(nd, struct hist_entry, rb_node_in);
 		pair = hists__find_entry(other, pos);
 
@@ -2323,13 +2347,13 @@ void hists__match(struct hists *leader, struct hists *other)
 
 static int hists__link_hierarchy(struct hists *leader_hists,
 				 struct hist_entry *parent,
-				 struct rb_root *leader_root,
-				 struct rb_root *other_root)
+				 struct rb_root_cached *leader_root,
+				 struct rb_root_cached *other_root)
 {
 	struct rb_node *nd;
 	struct hist_entry *pos, *leader;
 
-	for (nd = rb_first(other_root); nd; nd = rb_next(nd)) {
+	for (nd = rb_first_cached(other_root); nd; nd = rb_next(nd)) {
 		pos = rb_entry(nd, struct hist_entry, rb_node_in);
 
 		if (hist_entry__has_pairs(pos)) {
@@ -2372,7 +2396,7 @@ static int hists__link_hierarchy(struct hists *leader_hists,
  */
 int hists__link(struct hists *leader, struct hists *other)
 {
-	struct rb_root *root;
+	struct rb_root_cached *root;
 	struct rb_node *nd;
 	struct hist_entry *pos, *pair;
 
@@ -2388,7 +2412,7 @@ int hists__link(struct hists *leader, struct hists *other)
 	else
 		root = other->entries_in;
 
-	for (nd = rb_first(root); nd; nd = rb_next(nd)) {
+	for (nd = rb_first_cached(root); nd; nd = rb_next(nd)) {
 		pos = rb_entry(nd, struct hist_entry, rb_node_in);
 
 		if (!hist_entry__has_pairs(pos)) {
@@ -2482,10 +2506,10 @@ int perf_hist_config(const char *var, const char *value)
 int __hists__init(struct hists *hists, struct perf_hpp_list *hpp_list)
 {
 	memset(hists, 0, sizeof(*hists));
-	hists->entries_in_array[0] = hists->entries_in_array[1] = RB_ROOT;
+	hists->entries_in_array[0] = hists->entries_in_array[1] = RB_ROOT_CACHED;
 	hists->entries_in = &hists->entries_in_array[0];
-	hists->entries_collapsed = RB_ROOT;
-	hists->entries = RB_ROOT;
+	hists->entries_collapsed = RB_ROOT_CACHED;
+	hists->entries = RB_ROOT_CACHED;
 	pthread_mutex_init(&hists->lock, NULL);
 	hists->socket_filter = -1;
 	hists->hpp_list = hpp_list;
@@ -2493,14 +2517,14 @@ int __hists__init(struct hists *hists, struct perf_hpp_list *hpp_list)
 	return 0;
 }
 
-static void hists__delete_remaining_entries(struct rb_root *root)
+static void hists__delete_remaining_entries(struct rb_root_cached *root)
 {
 	struct rb_node *node;
 	struct hist_entry *he;
 
-	while (!RB_EMPTY_ROOT(root)) {
-		node = rb_first(root);
-		rb_erase(node, root);
+	while (!RB_EMPTY_ROOT(&root->rb_root)) {
+		node = rb_first_cached(root);
+		rb_erase_cached(node, root);
 
 		he = rb_entry(node, struct hist_entry, rb_node_in);
 		hist_entry__delete(he);
diff --git a/tools/perf/util/hist.h b/tools/perf/util/hist.h
index f6630cb95eff..afe2d16bcea3 100644
--- a/tools/perf/util/hist.h
+++ b/tools/perf/util/hist.h
@@ -68,10 +68,10 @@ struct thread;
 struct dso;
 
 struct hists {
-	struct rb_root		entries_in_array[2];
-	struct rb_root		*entries_in;
-	struct rb_root		entries;
-	struct rb_root		entries_collapsed;
+	struct rb_root_cached	entries_in_array[2];
+	struct rb_root_cached	*entries_in;
+	struct rb_root_cached	entries;
+	struct rb_root_cached	entries_collapsed;
 	u64			nr_entries;
 	u64			nr_non_filtered_entries;
 	u64			callchain_period;
@@ -223,7 +223,7 @@ static inline struct hists *evsel__hists(struct perf_evsel *evsel)
 int hists__init(void);
 int __hists__init(struct hists *hists, struct perf_hpp_list *hpp_list);
 
-struct rb_root *hists__get_rotate_entries_in(struct hists *hists);
+struct rb_root_cached *hists__get_rotate_entries_in(struct hists *hists);
 
 struct perf_hpp {
 	char *buf;
diff --git a/tools/perf/util/sort.h b/tools/perf/util/sort.h
index f5901c10a563..b063690238c3 100644
--- a/tools/perf/util/sort.h
+++ b/tools/perf/util/sort.h
@@ -143,8 +143,8 @@ struct hist_entry {
 	union {
 		/* this is for hierarchical entry structure */
 		struct {
-			struct rb_root	hroot_in;
-			struct rb_root  hroot_out;
+			struct rb_root_cached  hroot_in;
+			struct rb_root_cached  hroot_out;
 		};				/* non-leaf entries */
 		struct rb_root	sorted_chain;	/* leaf entry has callchains */
 	};
-- 
2.13.6

^ permalink raw reply related	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2019-01-26 10:05 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-12-06 19:18 [PATCH v2 -tip 0/7] tools/perf: Update rbtree implementation and optimize users Davidlohr Bueso
2018-12-06 19:18 ` [PATCH 1/7] tools/perf: Update rbtree implementation Davidlohr Bueso
2019-01-26 10:01   ` [tip:perf/core] tools: " tip-bot for Davidlohr Bueso
2018-12-06 19:18 ` [PATCH 2/7] perf machine: Use cached rbtrees Davidlohr Bueso
2019-01-26 10:01   ` [tip:perf/core] " tip-bot for Davidlohr Bueso
2018-12-06 19:18 ` [PATCH 3/7] perf callchain: " Davidlohr Bueso
2019-01-26 10:02   ` [tip:perf/core] " tip-bot for Davidlohr Bueso
2018-12-06 19:18 ` [PATCH 4/7] perf util: Use cached rbtree for rblists Davidlohr Bueso
2019-01-26 10:03   ` [tip:perf/core] " tip-bot for Davidlohr Bueso
2018-12-06 19:18 ` [PATCH 5/7] perf symbols: Use cached rbtrees Davidlohr Bueso
2019-01-26 10:03   ` [tip:perf/core] " tip-bot for Davidlohr Bueso
2018-12-06 19:18 ` [PATCH 6/7] perf hist: " Davidlohr Bueso
2019-01-22 13:59   ` Arnaldo Carvalho de Melo
2019-01-22 15:22     ` Davidlohr Bueso
2019-01-26 10:04   ` [tip:perf/core] " tip-bot for Davidlohr Bueso
2018-12-06 19:18 ` [PATCH 7/7] perf sched: " Davidlohr Bueso
2019-01-26 10:05   ` [tip:perf/core] " tip-bot for Davidlohr Bueso
  -- strict thread matches above, loose matches on Subject: below --
2017-11-27  2:30 [PATCH -tip 0/7] tools/perf: Update rbtree implementation and optimize users Davidlohr Bueso
2017-11-27  2:30 ` [PATCH 6/7] perf hist: Use cached rbtrees Davidlohr Bueso

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.