* [PATCH -next v4 00/17] rbtree: cache leftmost node internally
@ 2017-07-19  1:45 Davidlohr Bueso
  2017-07-19  1:45 ` [PATCH 01/17] " Davidlohr Bueso
                   ` (16 more replies)
  0 siblings, 17 replies; 29+ messages in thread
From: Davidlohr Bueso @ 2017-07-19  1:45 UTC (permalink / raw)
  To: akpm
  Cc: mingo, peterz, jack, torvalds, kirill.shutemov, hch, ldufour,
	mhocko, mgorman, dave, linux-kernel

Changes from v3 (https://lwn.net/Articles/726809/):
- Updated Documentation/rbtree.txt in patch 1.
- Explicitly mentioned the asymmetry wrt rightmost node in patch 1 (hch).
- Most of the changes in the set are new patches: patches 2,3,4,5,6,15,16,17.
- Added Peter's ack.

Changes from v2 (https://lkml.org/lkml/2017/6/8/857):
- Fixed the 0day-reported crash in the drm_mm selftest program. We were
not correctly using the cached version of the rbtree with the allocated
nodes.
- Added cfq patch to use internal rbtree caching.
- Added Christian's and Jan's reviews.

Changes from v1 (https://marc.info/?l=linux-kernel&m=149611025616685):
- No longer rfc.
- Removed bogus semicolon in rb_first_cached()
- Updated missing interval tree user drivers/infiniband/hw/hfi1/
- Removed redundant @cached arg when erasing a node.
- Added more patches that make use of rb_first_cached(), which I
  thought might be worth it: procfs and epoll.
- Cc'ed more people for patch 5, which touches drivers such as infiniband
  and gpu. The rest of the changes are well covered by the currently Cc'ed
  maintainers and mm folks.

Hi,

Here's a proposal for extending rbtrees to internally cache the leftmost
node such that we can have a fast overlap-check optimization for all interval
tree users[1]. The benefits of this series are that it:

(i)   Unifies users that do internal leftmost node caching.
(ii)  Optimizes all interval tree users.
(iii) Converts at least two new users (epoll and procfs) to the new interface.
(iv)  Enables users to cache the rightmost node explicitly, similar to how
      the leftmost case was handled before this patchset.

Patch 1: Layout the rb machinery.

Patches 2,3: Add an unlikely() branch-prediction hint to the root-node check
on insertion, as well as some minor comment enhancements.

Patches 4-6: Improve and extend the rbtree_test.c program.

Patches 7-10: Make use of the internal leftmost node in the scheduler,
rt-mutexes and cfq.

Patch 11: Implements fast overlap checks for interval trees.

Patch 12: rocket science.

Patches 13-15: Convert users to the O(1) rb_first_cached().

Patches 16,17: Apply the rightmost node optimization explicitly in two users.

The series has survived booting, kernel builds and pistress workloads.

Applies on top of today's -next.

Thanks!

Davidlohr Bueso (17):
  rbtree: cache leftmost node internally
  rbtree: optimize root-check during rebalancing loop
  rbtree: add some additional comments for rebalancing cases
  lib/rbtree_test.c: make input module parameters
  lib/rbtree_test.c: add (inorder) traversal test
  lib/rbtree_test.c: support rb_root_cached
  sched/fair: replace cfs_rq->rb_leftmost
  sched/deadline: replace earliest dl and rq leftmost caching
  locking/rtmutex: replace top-waiter and pi_waiters leftmost caching
  block/cfq: replace cfq_rb_root leftmost caching
  lib/interval_tree: fast overlap detection
  lib/interval-tree: correct comment wrt generic flavor
  procfs: use faster rb_first_cached()
  fs/epoll: use faster rb_first_cached()
  fs/ext4: use cached rbtrees
  mem/memcg: cache rightmost node
  block/cfq: cache rightmost rb_node

 Documentation/rbtree.txt                           |  33 +++
 block/cfq-iosched.c                                |  81 +++-----
 drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c             |   8 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c             |   7 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h             |   2 +-
 drivers/gpu/drm/drm_mm.c                           |  19 +-
 drivers/gpu/drm/drm_vma_manager.c                  |   2 +-
 drivers/gpu/drm/i915/i915_gem_userptr.c            |   6 +-
 drivers/gpu/drm/radeon/radeon.h                    |   2 +-
 drivers/gpu/drm/radeon/radeon_mn.c                 |   8 +-
 drivers/gpu/drm/radeon/radeon_vm.c                 |   7 +-
 drivers/infiniband/core/umem_rbtree.c              |   4 +-
 drivers/infiniband/core/uverbs_cmd.c               |   2 +-
 drivers/infiniband/hw/hfi1/mmu_rb.c                |  10 +-
 drivers/infiniband/hw/usnic/usnic_uiom.c           |   6 +-
 drivers/infiniband/hw/usnic/usnic_uiom.h           |   2 +-
 .../infiniband/hw/usnic/usnic_uiom_interval_tree.c |  15 +-
 .../infiniband/hw/usnic/usnic_uiom_interval_tree.h |  12 +-
 drivers/vhost/vhost.c                              |   2 +-
 drivers/vhost/vhost.h                              |   2 +-
 fs/eventpoll.c                                     |  30 +--
 fs/ext4/dir.c                                      |  24 ++-
 fs/ext4/ext4.h                                     |   4 +-
 fs/ext4/mballoc.c                                  |  22 +-
 fs/hugetlbfs/inode.c                               |   6 +-
 fs/inode.c                                         |   2 +-
 fs/proc/generic.c                                  |  26 +--
 fs/proc/internal.h                                 |   2 +-
 fs/proc/proc_net.c                                 |   2 +-
 fs/proc/root.c                                     |   2 +-
 include/drm/drm_mm.h                               |   2 +-
 include/linux/fs.h                                 |   4 +-
 include/linux/init_task.h                          |   5 +-
 include/linux/interval_tree.h                      |   8 +-
 include/linux/interval_tree_generic.h              |  48 ++++-
 include/linux/mm.h                                 |  17 +-
 include/linux/rbtree.h                             |  21 ++
 include/linux/rbtree_augmented.h                   |  33 ++-
 include/linux/rmap.h                               |   4 +-
 include/linux/rtmutex.h                            |  11 +-
 include/linux/sched.h                              |   3 +-
 include/rdma/ib_umem_odp.h                         |  11 +-
 include/rdma/ib_verbs.h                            |   2 +-
 kernel/fork.c                                      |   3 +-
 kernel/locking/rtmutex-debug.c                     |   2 +-
 kernel/locking/rtmutex.c                           |  35 +---
 kernel/locking/rtmutex_common.h                    |  12 +-
 kernel/sched/deadline.c                            |  50 ++---
 kernel/sched/debug.c                               |   2 +-
 kernel/sched/fair.c                                |  39 ++--
 kernel/sched/sched.h                               |   9 +-
 lib/interval_tree_test.c                           |   4 +-
 lib/rbtree.c                                       |  65 ++++--
 lib/rbtree_test.c                                  | 230 +++++++++++++++++----
 mm/interval_tree.c                                 |  10 +-
 mm/memcontrol.c                                    |  23 ++-
 mm/memory.c                                        |   4 +-
 mm/mmap.c                                          |  10 +-
 mm/rmap.c                                          |   4 +-
 59 files changed, 643 insertions(+), 378 deletions(-)

-- 
2.12.0


* [PATCH 01/17] rbtree: cache leftmost node internally
  2017-07-19  1:45 [PATCH -next v4 00/17] rbtree: cache leftmost node internally Davidlohr Bueso
@ 2017-07-19  1:45 ` Davidlohr Bueso
  2017-07-19  1:45 ` [PATCH 02/17] rbtree: optimize root-check during rebalancing loop Davidlohr Bueso
                   ` (15 subsequent siblings)
  16 siblings, 0 replies; 29+ messages in thread
From: Davidlohr Bueso @ 2017-07-19  1:45 UTC (permalink / raw)
  To: akpm
  Cc: mingo, peterz, jack, torvalds, kirill.shutemov, hch, ldufour,
	mhocko, mgorman, dave, linux-kernel, Davidlohr Bueso

Red-black tree semantics imply that nodes with smaller or
greater (or equal, for duplicates) keys are always to the
left and right, respectively. For the kernel this is
particularly evident in our rb_first() semantics.
Enabling O(1) lookups of the smallest node in the tree
can save a good chunk of cycles by not having to walk down
the tree each time. To this end there are a few core users
that explicitly do this, such as the scheduler and rtmutexes.
There is also the desire for interval trees to have this
optimization, allowing faster overlap checking.

This patch introduces a new 'struct rb_root_cached' which
is just the root with a cached pointer to the leftmost node.
The reason why the regular rb_root was not extended instead
of adding a new structure was that this allows the user to
have the choice between memory footprint and actual tree
performance. The new wrappers on top of the regular rb_root
calls are:

- rb_first_cached(cached_root) -- which is a fast replacement
     for rb_first.

- rb_insert_color_cached(node, cached_root, new)

- rb_erase_cached(node, cached_root)

In addition, augmented cached interfaces are also added for
basic insertion and deletion operations; which becomes
important for the interval tree changes.

With the exception of the inserts, which add a bool for
updating the new leftmost, the interfaces are kept the same.
To this end, porting rb users to the cached version becomes
trivial, and keeping current rbtree semantics for users that
don't care about the optimization incurs zero overhead.
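
As a rough illustration only (not part of this patch; 'struct mynode' and
its key are made up), porting a user's insert/erase paths ends up looking
something like:

  #include <linux/rbtree.h>

  struct mynode {
	struct rb_node node;
	u32 key;
  };

  static struct rb_root_cached mytree = RB_ROOT_CACHED;

  static void mynode_insert(struct mynode *new)
  {
	struct rb_node **link = &mytree.rb_root.rb_node, *parent = NULL;
	bool leftmost = true;

	while (*link) {
		parent = *link;
		if (new->key < rb_entry(parent, struct mynode, node)->key)
			link = &parent->rb_left;
		else {
			link = &parent->rb_right;
			leftmost = false;	/* went right at least once */
		}
	}

	rb_link_node(&new->node, parent, link);
	/* the leftmost hint lets the insert update the cache */
	rb_insert_color_cached(&new->node, &mytree, leftmost);
  }

  static void mynode_erase(struct mynode *old)
  {
	/* the cache is fixed up internally (via rb_next()) if needed */
	rb_erase_cached(&old->node, &mytree);
  }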

Reviewed-by: Jan Kara <jack@suse.cz>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
---
 Documentation/rbtree.txt         | 33 +++++++++++++++++++++++++++++++++
 include/linux/rbtree.h           | 21 +++++++++++++++++++++
 include/linux/rbtree_augmented.h | 33 ++++++++++++++++++++++++++++++---
 lib/rbtree.c                     | 34 +++++++++++++++++++++++++++++-----
 4 files changed, 113 insertions(+), 8 deletions(-)

diff --git a/Documentation/rbtree.txt b/Documentation/rbtree.txt
index b8a8c70b0188..c42a21b99046 100644
--- a/Documentation/rbtree.txt
+++ b/Documentation/rbtree.txt
@@ -193,6 +193,39 @@ Example::
   for (node = rb_first(&mytree); node; node = rb_next(node))
 	printk("key=%s\n", rb_entry(node, struct mytype, node)->keystring);
 
+Cached rbtrees
+--------------
+
+Computing the leftmost (smallest) node is quite a common task for binary
+search trees, such as for traversals or users relying on the particular
+order for their own logic. To this end, users can use 'struct rb_root_cached'
+to optimize O(logN) rb_first() calls to a simple pointer fetch, avoiding
+potentially expensive tree iterations. This is done at negligible runtime
+overhead for maintenance, albeit at a larger memory footprint.
+
+Similar to the rb_root structure, cached rbtrees are initialized to be
+empty via:
+
+  struct rb_root_cached mytree = RB_ROOT_CACHED;
+
+A cached rbtree is simply a regular rb_root with an extra pointer to cache the
+leftmost node. This allows rb_root_cached to exist wherever rb_root does,
+which permits augmented trees to be supported with only a few extra
+interfaces:
+
+  struct rb_node *rb_first_cached(struct rb_root_cached *tree);
+  void rb_insert_color_cached(struct rb_node *, struct rb_root_cached *, bool);
+  void rb_erase_cached(struct rb_node *node, struct rb_root_cached *);
+
+Both insert and erase calls have their respective counterpart of augmented
+trees:
+
+  void rb_insert_augmented_cached(struct rb_node *node, struct rb_root_cached *,
+				  bool, struct rb_augment_callbacks *);
+  void rb_erase_augmented_cached(struct rb_node *, struct rb_root_cached *,
+				 struct rb_augment_callbacks *);
+
+
 Support for Augmented rbtrees
 -----------------------------
 
diff --git a/include/linux/rbtree.h b/include/linux/rbtree.h
index e585018498d5..d574361943ea 100644
--- a/include/linux/rbtree.h
+++ b/include/linux/rbtree.h
@@ -44,10 +44,25 @@ struct rb_root {
 	struct rb_node *rb_node;
 };
 
+/*
+ * Leftmost-cached rbtrees.
+ *
+ * We do not cache the rightmost node based on footprint
+ * size vs number of potential users that could benefit
+ * from O(1) rb_last(). Just not worth it; users that want
+ * this feature can always implement the logic explicitly.
+ * Furthermore, users that want to cache both pointers may
+ * find it a bit asymmetric, but that's ok.
+ */
+struct rb_root_cached {
+	struct rb_root rb_root;
+	struct rb_node *rb_leftmost;
+};
 
 #define rb_parent(r)   ((struct rb_node *)((r)->__rb_parent_color & ~3))
 
 #define RB_ROOT	(struct rb_root) { NULL, }
+#define RB_ROOT_CACHED (struct rb_root_cached) { {NULL, }, NULL }
 #define	rb_entry(ptr, type, member) container_of(ptr, type, member)
 
 #define RB_EMPTY_ROOT(root)  (READ_ONCE((root)->rb_node) == NULL)
@@ -69,6 +84,12 @@ extern struct rb_node *rb_prev(const struct rb_node *);
 extern struct rb_node *rb_first(const struct rb_root *);
 extern struct rb_node *rb_last(const struct rb_root *);
 
+extern void rb_insert_color_cached(struct rb_node *,
+				   struct rb_root_cached *, bool);
+extern void rb_erase_cached(struct rb_node *node, struct rb_root_cached *);
+/* Same as rb_first(), but O(1) */
+#define rb_first_cached(root) (root)->rb_leftmost
+
 /* Postorder iteration - always visit the parent after its children */
 extern struct rb_node *rb_first_postorder(const struct rb_root *);
 extern struct rb_node *rb_next_postorder(const struct rb_node *);
diff --git a/include/linux/rbtree_augmented.h b/include/linux/rbtree_augmented.h
index 9702b6e183bc..6bfd2b581f75 100644
--- a/include/linux/rbtree_augmented.h
+++ b/include/linux/rbtree_augmented.h
@@ -41,7 +41,9 @@ struct rb_augment_callbacks {
 	void (*rotate)(struct rb_node *old, struct rb_node *new);
 };
 
-extern void __rb_insert_augmented(struct rb_node *node, struct rb_root *root,
+extern void __rb_insert_augmented(struct rb_node *node,
+				  struct rb_root *root,
+				  bool newleft, struct rb_node **leftmost,
 	void (*augment_rotate)(struct rb_node *old, struct rb_node *new));
 /*
  * Fixup the rbtree and update the augmented information when rebalancing.
@@ -57,7 +59,16 @@ static inline void
 rb_insert_augmented(struct rb_node *node, struct rb_root *root,
 		    const struct rb_augment_callbacks *augment)
 {
-	__rb_insert_augmented(node, root, augment->rotate);
+	__rb_insert_augmented(node, root, false, NULL, augment->rotate);
+}
+
+static inline void
+rb_insert_augmented_cached(struct rb_node *node,
+			   struct rb_root_cached *root, bool newleft,
+			   const struct rb_augment_callbacks *augment)
+{
+	__rb_insert_augmented(node, &root->rb_root,
+			      newleft, &root->rb_leftmost, augment->rotate);
 }
 
 #define RB_DECLARE_CALLBACKS(rbstatic, rbname, rbstruct, rbfield,	\
@@ -150,6 +161,7 @@ extern void __rb_erase_color(struct rb_node *parent, struct rb_root *root,
 
 static __always_inline struct rb_node *
 __rb_erase_augmented(struct rb_node *node, struct rb_root *root,
+		     struct rb_node **leftmost,
 		     const struct rb_augment_callbacks *augment)
 {
 	struct rb_node *child = node->rb_right;
@@ -157,6 +169,9 @@ __rb_erase_augmented(struct rb_node *node, struct rb_root *root,
 	struct rb_node *parent, *rebalance;
 	unsigned long pc;
 
+	if (leftmost && node == *leftmost)
+		*leftmost = rb_next(node);
+
 	if (!tmp) {
 		/*
 		 * Case 1: node to erase has no more than 1 child (easy!)
@@ -256,9 +271,21 @@ static __always_inline void
 rb_erase_augmented(struct rb_node *node, struct rb_root *root,
 		   const struct rb_augment_callbacks *augment)
 {
-	struct rb_node *rebalance = __rb_erase_augmented(node, root, augment);
+	struct rb_node *rebalance = __rb_erase_augmented(node, root,
+							 NULL, augment);
 	if (rebalance)
 		__rb_erase_color(rebalance, root, augment->rotate);
 }
 
+static __always_inline void
+rb_erase_augmented_cached(struct rb_node *node, struct rb_root_cached *root,
+			  const struct rb_augment_callbacks *augment)
+{
+	struct rb_node *rebalance = __rb_erase_augmented(node, &root->rb_root,
+							 &root->rb_leftmost,
+							 augment);
+	if (rebalance)
+		__rb_erase_color(rebalance, &root->rb_root, augment->rotate);
+}
+
 #endif	/* _LINUX_RBTREE_AUGMENTED_H */
diff --git a/lib/rbtree.c b/lib/rbtree.c
index 4ba2828a67c0..d102d9d2ffaa 100644
--- a/lib/rbtree.c
+++ b/lib/rbtree.c
@@ -95,10 +95,14 @@ __rb_rotate_set_parents(struct rb_node *old, struct rb_node *new,
 
 static __always_inline void
 __rb_insert(struct rb_node *node, struct rb_root *root,
+	    bool newleft, struct rb_node **leftmost,
 	    void (*augment_rotate)(struct rb_node *old, struct rb_node *new))
 {
 	struct rb_node *parent = rb_red_parent(node), *gparent, *tmp;
 
+	if (newleft)
+		*leftmost = node;
+
 	while (true) {
 		/*
 		 * Loop invariant: node is red
@@ -434,19 +438,38 @@ static const struct rb_augment_callbacks dummy_callbacks = {
 
 void rb_insert_color(struct rb_node *node, struct rb_root *root)
 {
-	__rb_insert(node, root, dummy_rotate);
+	__rb_insert(node, root, false, NULL, dummy_rotate);
 }
 EXPORT_SYMBOL(rb_insert_color);
 
 void rb_erase(struct rb_node *node, struct rb_root *root)
 {
 	struct rb_node *rebalance;
-	rebalance = __rb_erase_augmented(node, root, &dummy_callbacks);
+	rebalance = __rb_erase_augmented(node, root,
+					 NULL, &dummy_callbacks);
 	if (rebalance)
 		____rb_erase_color(rebalance, root, dummy_rotate);
 }
 EXPORT_SYMBOL(rb_erase);
 
+void rb_insert_color_cached(struct rb_node *node,
+			    struct rb_root_cached *root, bool leftmost)
+{
+	__rb_insert(node, &root->rb_root, leftmost,
+		    &root->rb_leftmost, dummy_rotate);
+}
+EXPORT_SYMBOL(rb_insert_color_cached);
+
+void rb_erase_cached(struct rb_node *node, struct rb_root_cached *root)
+{
+	struct rb_node *rebalance;
+	rebalance = __rb_erase_augmented(node, &root->rb_root,
+					 &root->rb_leftmost, &dummy_callbacks);
+	if (rebalance)
+		____rb_erase_color(rebalance, &root->rb_root, dummy_rotate);
+}
+EXPORT_SYMBOL(rb_erase_cached);
+
 /*
  * Augmented rbtree manipulation functions.
  *
@@ -455,9 +478,10 @@ EXPORT_SYMBOL(rb_erase);
  */
 
 void __rb_insert_augmented(struct rb_node *node, struct rb_root *root,
+			   bool newleft, struct rb_node **leftmost,
 	void (*augment_rotate)(struct rb_node *old, struct rb_node *new))
 {
-	__rb_insert(node, root, augment_rotate);
+	__rb_insert(node, root, newleft, leftmost, augment_rotate);
 }
 EXPORT_SYMBOL(__rb_insert_augmented);
 
@@ -502,7 +526,7 @@ struct rb_node *rb_next(const struct rb_node *node)
 	 * as we can.
 	 */
 	if (node->rb_right) {
-		node = node->rb_right; 
+		node = node->rb_right;
 		while (node->rb_left)
 			node=node->rb_left;
 		return (struct rb_node *)node;
@@ -534,7 +558,7 @@ struct rb_node *rb_prev(const struct rb_node *node)
 	 * as we can.
 	 */
 	if (node->rb_left) {
-		node = node->rb_left; 
+		node = node->rb_left;
 		while (node->rb_right)
 			node=node->rb_right;
 		return (struct rb_node *)node;
-- 
2.12.0


* [PATCH 02/17] rbtree: optimize root-check during rebalancing loop
  2017-07-19  1:45 [PATCH -next v4 00/17] rbtree: cache leftmost node internally Davidlohr Bueso
  2017-07-19  1:45 ` [PATCH 01/17] " Davidlohr Bueso
@ 2017-07-19  1:45 ` Davidlohr Bueso
  2017-07-19  1:45 ` [PATCH 03/17] rbtree: add some additional comments for rebalancing cases Davidlohr Bueso
                   ` (14 subsequent siblings)
  16 siblings, 0 replies; 29+ messages in thread
From: Davidlohr Bueso @ 2017-07-19  1:45 UTC (permalink / raw)
  To: akpm
  Cc: mingo, peterz, jack, torvalds, kirill.shutemov, hch, ldufour,
	mhocko, mgorman, dave, linux-kernel, Davidlohr Bueso

The only times the nil-parent (root node) condition is true are
when the node is the first in the tree, or after fixing rbtree
rule #4 when the case 1 rebalancing has made the node the root.
Such conditions do not apply most of the time:

(i) The common case in an rbtree is to have more than a single
node, so this is only true for the first rb_insert().

(ii) While there is a chance that only one (first) rotation is needed,
cases where the node's uncle is black (cases 2,3) are more
common, as we can have the following scenarios during the
rotation loop:

    case1 only, case1+1, case2+3, case1+2+3, case3 only, etc.

This patch, therefore, adds an unlikely() hint to this
conditional. When profiling with CONFIG_PROFILE_ANNOTATED_BRANCHES,
a kernel build shows that the incorrect rate is less than 15%,
and workloads that involve insert-mostly trees tend, over time,
to have an incorrect rate of less than 2%.

Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
---
 lib/rbtree.c | 23 ++++++++++++++++-------
 1 file changed, 16 insertions(+), 7 deletions(-)

diff --git a/lib/rbtree.c b/lib/rbtree.c
index d102d9d2ffaa..e7cce12f404f 100644
--- a/lib/rbtree.c
+++ b/lib/rbtree.c
@@ -105,16 +105,25 @@ __rb_insert(struct rb_node *node, struct rb_root *root,
 
 	while (true) {
 		/*
-		 * Loop invariant: node is red
-		 *
-		 * If there is a black parent, we are done.
-		 * Otherwise, take some corrective action as we don't
-		 * want a red root or two consecutive red nodes.
+		 * Loop invariant: node is red.
 		 */
-		if (!parent) {
+		if (unlikely(!parent)) {
+			/*
+			 * The inserted node is root. Either this is the
+			 * first node, or we recursed at Case 1 below and
+			 * are no longer violating 4).
+			 */
 			rb_set_parent_color(node, NULL, RB_BLACK);
 			break;
-		} else if (rb_is_black(parent))
+		}
+
+		/*
+		 * If there is a black parent, we are done.
+		 * Otherwise, take some corrective action as,
+		 * per 4), we don't want a red root or two
+		 * consecutive red nodes.
+		 */
+		if (rb_is_black(parent))
 			break;
 
 		gparent = rb_red_parent(parent);
-- 
2.12.0


* [PATCH 03/17] rbtree: add some additional comments for rebalancing cases
  2017-07-19  1:45 [PATCH -next v4 00/17] rbtree: cache leftmost node internally Davidlohr Bueso
  2017-07-19  1:45 ` [PATCH 01/17] " Davidlohr Bueso
  2017-07-19  1:45 ` [PATCH 02/17] rbtree: optimize root-check during rebalancing loop Davidlohr Bueso
@ 2017-07-19  1:45 ` Davidlohr Bueso
  2017-07-19  1:45 ` [PATCH 04/17] lib/rbtree_test.c: make input module parameters Davidlohr Bueso
                   ` (13 subsequent siblings)
  16 siblings, 0 replies; 29+ messages in thread
From: Davidlohr Bueso @ 2017-07-19  1:45 UTC (permalink / raw)
  To: akpm
  Cc: mingo, peterz, jack, torvalds, kirill.shutemov, hch, ldufour,
	mhocko, mgorman, dave, linux-kernel, Davidlohr Bueso

While overall the code is very nicely commented, it might not be
immediately obvious from the diagrams what is going on. Add a
very brief summary of each case. Opposite cases where the node
is the left child are left untouched.

Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
---
 lib/rbtree.c | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/lib/rbtree.c b/lib/rbtree.c
index e7cce12f404f..ba4a9d165f1b 100644
--- a/lib/rbtree.c
+++ b/lib/rbtree.c
@@ -132,7 +132,7 @@ __rb_insert(struct rb_node *node, struct rb_root *root,
 		if (parent != tmp) {	/* parent == gparent->rb_left */
 			if (tmp && rb_is_red(tmp)) {
 				/*
-				 * Case 1 - color flips
+				 * Case 1 - node's uncle is red (color flips).
 				 *
 				 *       G            g
 				 *      / \          / \
@@ -155,7 +155,8 @@ __rb_insert(struct rb_node *node, struct rb_root *root,
 			tmp = parent->rb_right;
 			if (node == tmp) {
 				/*
-				 * Case 2 - left rotate at parent
+				 * Case 2 - node's uncle is black and node is
+				 * the parent's right child (left rotate at parent).
 				 *
 				 *      G             G
 				 *     / \           / \
@@ -179,7 +180,8 @@ __rb_insert(struct rb_node *node, struct rb_root *root,
 			}
 
 			/*
-			 * Case 3 - right rotate at gparent
+			 * Case 3 - node's uncle is black and node is
+			 * the parent's left child (right rotate at gparent).
 			 *
 			 *        G           P
 			 *       / \         / \
-- 
2.12.0


* [PATCH 04/17] lib/rbtree_test.c: make input module parameters
  2017-07-19  1:45 [PATCH -next v4 00/17] rbtree: cache leftmost node internally Davidlohr Bueso
                   ` (2 preceding siblings ...)
  2017-07-19  1:45 ` [PATCH 03/17] rbtree: add some additional comments for rebalancing cases Davidlohr Bueso
@ 2017-07-19  1:45 ` Davidlohr Bueso
  2017-07-19  1:45 ` [PATCH 05/17] lib/rbtree_test.c: add (inorder) traversal test Davidlohr Bueso
                   ` (12 subsequent siblings)
  16 siblings, 0 replies; 29+ messages in thread
From: Davidlohr Bueso @ 2017-07-19  1:45 UTC (permalink / raw)
  To: akpm
  Cc: mingo, peterz, jack, torvalds, kirill.shutemov, hch, ldufour,
	mhocko, mgorman, dave, linux-kernel, Davidlohr Bueso

Allows for more flexible debugging.
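
As a usage example (the values below are arbitrary), the test can now be
loaded with, e.g.:

  # insmod lib/rbtree_test.ko nnodes=5000 perf_loops=10000 check_loops=10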

Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
---
 lib/rbtree_test.c | 55 ++++++++++++++++++++++++++++++++++---------------------
 1 file changed, 34 insertions(+), 21 deletions(-)

diff --git a/lib/rbtree_test.c b/lib/rbtree_test.c
index 8b3c9dc88262..e83331aa1b7f 100644
--- a/lib/rbtree_test.c
+++ b/lib/rbtree_test.c
@@ -1,11 +1,18 @@
 #include <linux/module.h>
+#include <linux/moduleparam.h>
 #include <linux/rbtree_augmented.h>
 #include <linux/random.h>
+#include <linux/slab.h>
 #include <asm/timex.h>
 
-#define NODES       100
-#define PERF_LOOPS  100000
-#define CHECK_LOOPS 100
+#define __param(type, name, init, msg)		\
+	static type name = init;		\
+	module_param(name, type, 0444);		\
+	MODULE_PARM_DESC(name, msg);
+
+__param(int, nnodes, 100, "Number of nodes in the rb-tree");
+__param(int, perf_loops, 100000, "Number of iterations modifying the rb-tree");
+__param(int, check_loops, 100, "Number of iterations modifying and verifying the rb-tree");
 
 struct test_node {
 	u32 key;
@@ -17,7 +24,7 @@ struct test_node {
 };
 
 static struct rb_root root = RB_ROOT;
-static struct test_node nodes[NODES];
+static struct test_node *nodes = NULL;
 
 static struct rnd_state rnd;
 
@@ -95,7 +102,7 @@ static void erase_augmented(struct test_node *node, struct rb_root *root)
 static void init(void)
 {
 	int i;
-	for (i = 0; i < NODES; i++) {
+	for (i = 0; i < nnodes; i++) {
 		nodes[i].key = prandom_u32_state(&rnd);
 		nodes[i].val = prandom_u32_state(&rnd);
 	}
@@ -177,6 +184,10 @@ static int __init rbtree_test_init(void)
 	int i, j;
 	cycles_t time1, time2, time;
 
+	nodes = kmalloc(nnodes * sizeof(*nodes), GFP_KERNEL);
+	if (!nodes)
+		return -ENOMEM;
+
 	printk(KERN_ALERT "rbtree testing");
 
 	prandom_seed_state(&rnd, 3141592653589793238ULL);
@@ -184,27 +195,27 @@ static int __init rbtree_test_init(void)
 
 	time1 = get_cycles();
 
-	for (i = 0; i < PERF_LOOPS; i++) {
-		for (j = 0; j < NODES; j++)
+	for (i = 0; i < perf_loops; i++) {
+		for (j = 0; j < nnodes; j++)
 			insert(nodes + j, &root);
-		for (j = 0; j < NODES; j++)
+		for (j = 0; j < nnodes; j++)
 			erase(nodes + j, &root);
 	}
 
 	time2 = get_cycles();
 	time = time2 - time1;
 
-	time = div_u64(time, PERF_LOOPS);
+	time = div_u64(time, perf_loops);
 	printk(" -> %llu cycles\n", (unsigned long long)time);
 
-	for (i = 0; i < CHECK_LOOPS; i++) {
+	for (i = 0; i < check_loops; i++) {
 		init();
-		for (j = 0; j < NODES; j++) {
+		for (j = 0; j < nnodes; j++) {
 			check(j);
 			insert(nodes + j, &root);
 		}
-		for (j = 0; j < NODES; j++) {
-			check(NODES - j);
+		for (j = 0; j < nnodes; j++) {
+			check(nnodes - j);
 			erase(nodes + j, &root);
 		}
 		check(0);
@@ -216,32 +227,34 @@ static int __init rbtree_test_init(void)
 
 	time1 = get_cycles();
 
-	for (i = 0; i < PERF_LOOPS; i++) {
-		for (j = 0; j < NODES; j++)
+	for (i = 0; i < perf_loops; i++) {
+		for (j = 0; j < nnodes; j++)
 			insert_augmented(nodes + j, &root);
-		for (j = 0; j < NODES; j++)
+		for (j = 0; j < nnodes; j++)
 			erase_augmented(nodes + j, &root);
 	}
 
 	time2 = get_cycles();
 	time = time2 - time1;
 
-	time = div_u64(time, PERF_LOOPS);
+	time = div_u64(time, perf_loops);
 	printk(" -> %llu cycles\n", (unsigned long long)time);
 
-	for (i = 0; i < CHECK_LOOPS; i++) {
+	for (i = 0; i < check_loops; i++) {
 		init();
-		for (j = 0; j < NODES; j++) {
+		for (j = 0; j < nnodes; j++) {
 			check_augmented(j);
 			insert_augmented(nodes + j, &root);
 		}
-		for (j = 0; j < NODES; j++) {
-			check_augmented(NODES - j);
+		for (j = 0; j < nnodes; j++) {
+			check_augmented(nnodes - j);
 			erase_augmented(nodes + j, &root);
 		}
 		check_augmented(0);
 	}
 
+	kfree(nodes);
+
 	return -EAGAIN; /* Fail will directly unload the module */
 }
 
-- 
2.12.0


* [PATCH 05/17] lib/rbtree_test.c: add (inorder) traversal test
  2017-07-19  1:45 [PATCH -next v4 00/17] rbtree: cache leftmost node internally Davidlohr Bueso
                   ` (3 preceding siblings ...)
  2017-07-19  1:45 ` [PATCH 04/17] lib/rbtree_test.c: make input module parameters Davidlohr Bueso
@ 2017-07-19  1:45 ` Davidlohr Bueso
  2017-07-19  1:45 ` [PATCH 06/17] lib/rbtree_test.c: support rb_root_cached Davidlohr Bueso
                   ` (11 subsequent siblings)
  16 siblings, 0 replies; 29+ messages in thread
From: Davidlohr Bueso @ 2017-07-19  1:45 UTC (permalink / raw)
  To: akpm
  Cc: mingo, peterz, jack, torvalds, kirill.shutemov, hch, ldufour,
	mhocko, mgorman, dave, linux-kernel, Davidlohr Bueso

This adds a second test, an inorder traversal, to the regular
rb-tree testing only, as there is no need to repeat it for the
augmented flavor.

Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
---
 lib/rbtree_test.c | 25 +++++++++++++++++++++++--
 1 file changed, 23 insertions(+), 2 deletions(-)

diff --git a/lib/rbtree_test.c b/lib/rbtree_test.c
index e83331aa1b7f..967999ff46db 100644
--- a/lib/rbtree_test.c
+++ b/lib/rbtree_test.c
@@ -183,6 +183,7 @@ static int __init rbtree_test_init(void)
 {
 	int i, j;
 	cycles_t time1, time2, time;
+	struct rb_node *node;
 
 	nodes = kmalloc(nnodes * sizeof(*nodes), GFP_KERNEL);
 	if (!nodes)
@@ -206,8 +207,28 @@ static int __init rbtree_test_init(void)
 	time = time2 - time1;
 
 	time = div_u64(time, perf_loops);
-	printk(" -> %llu cycles\n", (unsigned long long)time);
+	printk(" -> test 1 (latency of nnodes insert+delete): %llu cycles\n", (unsigned long long)time);
 
+	for (i = 0; i < nnodes; i++)
+		insert(nodes + i, &root);
+
+	time1 = get_cycles();
+
+	for (i = 0; i < perf_loops; i++) {
+		for (node = rb_first(&root); node; node = rb_next(node))
+			;
+	}
+
+	time2 = get_cycles();
+	time = time2 - time1;
+
+	time = div_u64(time, perf_loops);
+	printk(" -> test 2 (latency of inorder traversal): %llu cycles\n", (unsigned long long)time);
+
+	for (i = 0; i < nnodes; i++)
+		erase(nodes + i, &root);
+
+	/* run checks */
 	for (i = 0; i < check_loops; i++) {
 		init();
 		for (j = 0; j < nnodes; j++) {
@@ -238,7 +259,7 @@ static int __init rbtree_test_init(void)
 	time = time2 - time1;
 
 	time = div_u64(time, perf_loops);
-	printk(" -> %llu cycles\n", (unsigned long long)time);
+	printk(" -> test 1 (latency of nnodes insert+delete): %llu cycles\n", (unsigned long long)time);
 
 	for (i = 0; i < check_loops; i++) {
 		init();
-- 
2.12.0


* [PATCH 06/17] lib/rbtree_test.c: support rb_root_cached
  2017-07-19  1:45 [PATCH -next v4 00/17] rbtree: cache leftmost node internally Davidlohr Bueso
                   ` (4 preceding siblings ...)
  2017-07-19  1:45 ` [PATCH 05/17] lib/rbtree_test.c: add (inorder) traversal test Davidlohr Bueso
@ 2017-07-19  1:45 ` Davidlohr Bueso
  2017-07-19  1:45 ` [PATCH 07/17] sched/fair: replace cfs_rq->rb_leftmost Davidlohr Bueso
                   ` (10 subsequent siblings)
  16 siblings, 0 replies; 29+ messages in thread
From: Davidlohr Bueso @ 2017-07-19  1:45 UTC (permalink / raw)
  To: akpm
  Cc: mingo, peterz, jack, torvalds, kirill.shutemov, hch, ldufour,
	mhocko, mgorman, dave, linux-kernel, Davidlohr Bueso

We can work with a single rb_root_cached root to test
both cached and non-cached rbtrees. In addition, also
add a test to measure the latency difference between
rb_first() and its faster cached counterpart.

Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
---
 lib/rbtree_test.c | 156 +++++++++++++++++++++++++++++++++++++++++++++++-------
 1 file changed, 137 insertions(+), 19 deletions(-)

diff --git a/lib/rbtree_test.c b/lib/rbtree_test.c
index 967999ff46db..191a238e5a9d 100644
--- a/lib/rbtree_test.c
+++ b/lib/rbtree_test.c
@@ -23,14 +23,14 @@ struct test_node {
 	u32 augmented;
 };
 
-static struct rb_root root = RB_ROOT;
+static struct rb_root_cached root = RB_ROOT_CACHED;
 static struct test_node *nodes = NULL;
 
 static struct rnd_state rnd;
 
-static void insert(struct test_node *node, struct rb_root *root)
+static void insert(struct test_node *node, struct rb_root_cached *root)
 {
-	struct rb_node **new = &root->rb_node, *parent = NULL;
+	struct rb_node **new = &root->rb_root.rb_node, *parent = NULL;
 	u32 key = node->key;
 
 	while (*new) {
@@ -42,14 +42,40 @@ static void insert(struct test_node *node, struct rb_root *root)
 	}
 
 	rb_link_node(&node->rb, parent, new);
-	rb_insert_color(&node->rb, root);
+	rb_insert_color(&node->rb, &root->rb_root);
 }
 
-static inline void erase(struct test_node *node, struct rb_root *root)
+static void insert_cached(struct test_node *node, struct rb_root_cached *root)
 {
-	rb_erase(&node->rb, root);
+	struct rb_node **new = &root->rb_root.rb_node, *parent = NULL;
+	u32 key = node->key;
+	bool leftmost = true;
+
+	while (*new) {
+		parent = *new;
+		if (key < rb_entry(parent, struct test_node, rb)->key)
+			new = &parent->rb_left;
+		else {
+			new = &parent->rb_right;
+			leftmost = false;
+		}
+	}
+
+	rb_link_node(&node->rb, parent, new);
+	rb_insert_color_cached(&node->rb, root, leftmost);
 }
 
+static inline void erase(struct test_node *node, struct rb_root_cached *root)
+{
+	rb_erase(&node->rb, &root->rb_root);
+}
+
+static inline void erase_cached(struct test_node *node, struct rb_root_cached *root)
+{
+	rb_erase_cached(&node->rb, root);
+}
+
+
 static inline u32 augment_recompute(struct test_node *node)
 {
 	u32 max = node->val, child_augmented;
@@ -71,9 +97,10 @@ static inline u32 augment_recompute(struct test_node *node)
 RB_DECLARE_CALLBACKS(static, augment_callbacks, struct test_node, rb,
 		     u32, augmented, augment_recompute)
 
-static void insert_augmented(struct test_node *node, struct rb_root *root)
+static void insert_augmented(struct test_node *node,
+			     struct rb_root_cached *root)
 {
-	struct rb_node **new = &root->rb_node, *rb_parent = NULL;
+	struct rb_node **new = &root->rb_root.rb_node, *rb_parent = NULL;
 	u32 key = node->key;
 	u32 val = node->val;
 	struct test_node *parent;
@@ -91,12 +118,47 @@ static void insert_augmented(struct test_node *node, struct rb_root *root)
 
 	node->augmented = val;
 	rb_link_node(&node->rb, rb_parent, new);
-	rb_insert_augmented(&node->rb, root, &augment_callbacks);
+	rb_insert_augmented(&node->rb, &root->rb_root, &augment_callbacks);
+}
+
+static void insert_augmented_cached(struct test_node *node,
+				    struct rb_root_cached *root)
+{
+	struct rb_node **new = &root->rb_root.rb_node, *rb_parent = NULL;
+	u32 key = node->key;
+	u32 val = node->val;
+	struct test_node *parent;
+	bool leftmost = true;
+
+	while (*new) {
+		rb_parent = *new;
+		parent = rb_entry(rb_parent, struct test_node, rb);
+		if (parent->augmented < val)
+			parent->augmented = val;
+		if (key < parent->key)
+			new = &parent->rb.rb_left;
+		else {
+			new = &parent->rb.rb_right;
+			leftmost = false;
+		}
+	}
+
+	node->augmented = val;
+	rb_link_node(&node->rb, rb_parent, new);
+	rb_insert_augmented_cached(&node->rb, root,
+				   leftmost, &augment_callbacks);
+}
+
+
+static void erase_augmented(struct test_node *node, struct rb_root_cached *root)
+{
+	rb_erase_augmented(&node->rb, &root->rb_root, &augment_callbacks);
 }
 
-static void erase_augmented(struct test_node *node, struct rb_root *root)
+static void erase_augmented_cached(struct test_node *node,
+				   struct rb_root_cached *root)
 {
-	rb_erase_augmented(&node->rb, root, &augment_callbacks);
+	rb_erase_augmented_cached(&node->rb, root, &augment_callbacks);
 }
 
 static void init(void)
@@ -125,7 +187,7 @@ static void check_postorder_foreach(int nr_nodes)
 {
 	struct test_node *cur, *n;
 	int count = 0;
-	rbtree_postorder_for_each_entry_safe(cur, n, &root, rb)
+	rbtree_postorder_for_each_entry_safe(cur, n, &root.rb_root, rb)
 		count++;
 
 	WARN_ON_ONCE(count != nr_nodes);
@@ -135,7 +197,7 @@ static void check_postorder(int nr_nodes)
 {
 	struct rb_node *rb;
 	int count = 0;
-	for (rb = rb_first_postorder(&root); rb; rb = rb_next_postorder(rb))
+	for (rb = rb_first_postorder(&root.rb_root); rb; rb = rb_next_postorder(rb))
 		count++;
 
 	WARN_ON_ONCE(count != nr_nodes);
@@ -147,7 +209,7 @@ static void check(int nr_nodes)
 	int count = 0, blacks = 0;
 	u32 prev_key = 0;
 
-	for (rb = rb_first(&root); rb; rb = rb_next(rb)) {
+	for (rb = rb_first(&root.rb_root); rb; rb = rb_next(rb)) {
 		struct test_node *node = rb_entry(rb, struct test_node, rb);
 		WARN_ON_ONCE(node->key < prev_key);
 		WARN_ON_ONCE(is_red(rb) &&
@@ -162,7 +224,7 @@ static void check(int nr_nodes)
 	}
 
 	WARN_ON_ONCE(count != nr_nodes);
-	WARN_ON_ONCE(count < (1 << black_path_count(rb_last(&root))) - 1);
+	WARN_ON_ONCE(count < (1 << black_path_count(rb_last(&root.rb_root))) - 1);
 
 	check_postorder(nr_nodes);
 	check_postorder_foreach(nr_nodes);
@@ -173,7 +235,7 @@ static void check_augmented(int nr_nodes)
 	struct rb_node *rb;
 
 	check(nr_nodes);
-	for (rb = rb_first(&root); rb; rb = rb_next(rb)) {
+	for (rb = rb_first(&root.rb_root); rb; rb = rb_next(rb)) {
 		struct test_node *node = rb_entry(rb, struct test_node, rb);
 		WARN_ON_ONCE(node->augmented != augment_recompute(node));
 	}
@@ -207,7 +269,24 @@ static int __init rbtree_test_init(void)
 	time = time2 - time1;
 
 	time = div_u64(time, perf_loops);
-	printk(" -> test 1 (latency of nnodes insert+delete): %llu cycles\n", (unsigned long long)time);
+	printk(" -> test 1 (latency of nnodes insert+delete): %llu cycles\n",
+	       (unsigned long long)time);
+
+	time1 = get_cycles();
+
+	for (i = 0; i < perf_loops; i++) {
+		for (j = 0; j < nnodes; j++)
+			insert_cached(nodes + j, &root);
+		for (j = 0; j < nnodes; j++)
+			erase_cached(nodes + j, &root);
+	}
+
+	time2 = get_cycles();
+	time = time2 - time1;
+
+	time = div_u64(time, perf_loops);
+	printk(" -> test 2 (latency of nnodes cached insert+delete): %llu cycles\n",
+	       (unsigned long long)time);
 
 	for (i = 0; i < nnodes; i++)
 		insert(nodes + i, &root);
@@ -215,7 +294,7 @@ static int __init rbtree_test_init(void)
 	time1 = get_cycles();
 
 	for (i = 0; i < perf_loops; i++) {
-		for (node = rb_first(&root); node; node = rb_next(node))
+		for (node = rb_first(&root.rb_root); node; node = rb_next(node))
 			;
 	}
 
@@ -223,7 +302,31 @@ static int __init rbtree_test_init(void)
 	time = time2 - time1;
 
 	time = div_u64(time, perf_loops);
-	printk(" -> test 2 (latency of inorder traversal): %llu cycles\n", (unsigned long long)time);
+	printk(" -> test 3 (latency of inorder traversal): %llu cycles\n",
+	       (unsigned long long)time);
+
+	time1 = get_cycles();
+
+	for (i = 0; i < perf_loops; i++)
+		node = rb_first(&root.rb_root);
+
+	time2 = get_cycles();
+	time = time2 - time1;
+
+	time = div_u64(time, perf_loops);
+	printk(" -> test 4 (latency to fetch first node)\n");
+	printk("        non-cached: %llu cycles\n", (unsigned long long)time);
+
+	time1 = get_cycles();
+
+	for (i = 0; i < perf_loops; i++)
+		node = rb_first_cached(&root);
+
+	time2 = get_cycles();
+	time = time2 - time1;
+
+	time = div_u64(time, perf_loops);
+	printk("        cached: %llu cycles\n", (unsigned long long)time);
 
 	for (i = 0; i < nnodes; i++)
 		erase(nodes + i, &root);
@@ -261,6 +364,21 @@ static int __init rbtree_test_init(void)
 	time = div_u64(time, perf_loops);
 	printk(" -> test 1 (latency of nnodes insert+delete): %llu cycles\n", (unsigned long long)time);
 
+	time1 = get_cycles();
+
+	for (i = 0; i < perf_loops; i++) {
+		for (j = 0; j < nnodes; j++)
+			insert_augmented_cached(nodes + j, &root);
+		for (j = 0; j < nnodes; j++)
+			erase_augmented_cached(nodes + j, &root);
+	}
+
+	time2 = get_cycles();
+	time = time2 - time1;
+
+	time = div_u64(time, perf_loops);
+	printk(" -> test 2 (latency of nnodes cached insert+delete): %llu cycles\n", (unsigned long long)time);
+
 	for (i = 0; i < check_loops; i++) {
 		init();
 		for (j = 0; j < nnodes; j++) {
-- 
2.12.0


* [PATCH 07/17] sched/fair: replace cfs_rq->rb_leftmost
  2017-07-19  1:45 [PATCH -next v4 00/17] rbtree: cache leftmost node internally Davidlohr Bueso
                   ` (5 preceding siblings ...)
  2017-07-19  1:45 ` [PATCH 06/17] lib/rbtree_test.c: support rb_root_cached Davidlohr Bueso
@ 2017-07-19  1:45 ` Davidlohr Bueso
  2017-07-19  1:45 ` [PATCH 08/17] sched/deadline: replace earliest dl and rq leftmost caching Davidlohr Bueso
                   ` (9 subsequent siblings)
  16 siblings, 0 replies; 29+ messages in thread
From: Davidlohr Bueso @ 2017-07-19  1:45 UTC (permalink / raw)
  To: akpm
  Cc: mingo, peterz, jack, torvalds, kirill.shutemov, hch, ldufour,
	mhocko, mgorman, dave, linux-kernel, Davidlohr Bueso

... with the generic rbtree flavor instead. No changes
in semantics whatsoever.

Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
---
 kernel/sched/debug.c |  2 +-
 kernel/sched/fair.c  | 39 +++++++++++++--------------------------
 kernel/sched/sched.h |  3 +--
 3 files changed, 15 insertions(+), 29 deletions(-)

diff --git a/kernel/sched/debug.c b/kernel/sched/debug.c
index 4fa66de52bd6..10ab7294eb6b 100644
--- a/kernel/sched/debug.c
+++ b/kernel/sched/debug.c
@@ -488,7 +488,7 @@ void print_cfs_rq(struct seq_file *m, int cpu, struct cfs_rq *cfs_rq)
 			SPLIT_NS(cfs_rq->exec_clock));
 
 	raw_spin_lock_irqsave(&rq->lock, flags);
-	if (cfs_rq->rb_leftmost)
+	if (rb_first_cached(&cfs_rq->tasks_timeline))
 		MIN_vruntime = (__pick_first_entity(cfs_rq))->vruntime;
 	last = __pick_last_entity(cfs_rq);
 	if (last)
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index c95880e216f6..74c36d4b1cc5 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -513,6 +513,7 @@ static inline int entity_before(struct sched_entity *a,
 static void update_min_vruntime(struct cfs_rq *cfs_rq)
 {
 	struct sched_entity *curr = cfs_rq->curr;
+	struct rb_node *leftmost = rb_first_cached(&cfs_rq->tasks_timeline);
 
 	u64 vruntime = cfs_rq->min_vruntime;
 
@@ -523,10 +524,9 @@ static void update_min_vruntime(struct cfs_rq *cfs_rq)
 			curr = NULL;
 	}
 
-	if (cfs_rq->rb_leftmost) {
-		struct sched_entity *se = rb_entry(cfs_rq->rb_leftmost,
-						   struct sched_entity,
-						   run_node);
+	if (leftmost) { /* non-empty tree */
+		struct sched_entity *se;
+		se = rb_entry(leftmost, struct sched_entity, run_node);
 
 		if (!curr)
 			vruntime = se->vruntime;
@@ -547,10 +547,10 @@ static void update_min_vruntime(struct cfs_rq *cfs_rq)
  */
 static void __enqueue_entity(struct cfs_rq *cfs_rq, struct sched_entity *se)
 {
-	struct rb_node **link = &cfs_rq->tasks_timeline.rb_node;
+	struct rb_node **link = &cfs_rq->tasks_timeline.rb_root.rb_node;
 	struct rb_node *parent = NULL;
 	struct sched_entity *entry;
-	int leftmost = 1;
+	bool leftmost = true;
 
 	/*
 	 * Find the right place in the rbtree:
@@ -566,36 +566,23 @@ static void __enqueue_entity(struct cfs_rq *cfs_rq, struct sched_entity *se)
 			link = &parent->rb_left;
 		} else {
 			link = &parent->rb_right;
-			leftmost = 0;
+			leftmost = false;
 		}
 	}
 
-	/*
-	 * Maintain a cache of leftmost tree entries (it is frequently
-	 * used):
-	 */
-	if (leftmost)
-		cfs_rq->rb_leftmost = &se->run_node;
-
 	rb_link_node(&se->run_node, parent, link);
-	rb_insert_color(&se->run_node, &cfs_rq->tasks_timeline);
+	rb_insert_color_cached(&se->run_node,
+			       &cfs_rq->tasks_timeline, leftmost);
 }
 
 static void __dequeue_entity(struct cfs_rq *cfs_rq, struct sched_entity *se)
 {
-	if (cfs_rq->rb_leftmost == &se->run_node) {
-		struct rb_node *next_node;
-
-		next_node = rb_next(&se->run_node);
-		cfs_rq->rb_leftmost = next_node;
-	}
-
-	rb_erase(&se->run_node, &cfs_rq->tasks_timeline);
+	rb_erase_cached(&se->run_node, &cfs_rq->tasks_timeline);
 }
 
 struct sched_entity *__pick_first_entity(struct cfs_rq *cfs_rq)
 {
-	struct rb_node *left = cfs_rq->rb_leftmost;
+	struct rb_node *left = rb_first_cached(&cfs_rq->tasks_timeline);
 
 	if (!left)
 		return NULL;
@@ -616,7 +603,7 @@ static struct sched_entity *__pick_next_entity(struct sched_entity *se)
 #ifdef CONFIG_SCHED_DEBUG
 struct sched_entity *__pick_last_entity(struct cfs_rq *cfs_rq)
 {
-	struct rb_node *last = rb_last(&cfs_rq->tasks_timeline);
+	struct rb_node *last = rb_last(&cfs_rq->tasks_timeline.rb_root);
 
 	if (!last)
 		return NULL;
@@ -9171,7 +9158,7 @@ static void set_curr_task_fair(struct rq *rq)
 
 void init_cfs_rq(struct cfs_rq *cfs_rq)
 {
-	cfs_rq->tasks_timeline = RB_ROOT;
+	cfs_rq->tasks_timeline = RB_ROOT_CACHED;
 	cfs_rq->min_vruntime = (u64)(-(1LL << 20));
 #ifndef CONFIG_64BIT
 	cfs_rq->min_vruntime_copy = cfs_rq->min_vruntime;
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index eeef1a3086d1..bf4d3f7d29c7 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -426,8 +426,7 @@ struct cfs_rq {
 	u64 min_vruntime_copy;
 #endif
 
-	struct rb_root tasks_timeline;
-	struct rb_node *rb_leftmost;
+	struct rb_root_cached tasks_timeline;
 
 	/*
 	 * 'curr' points to currently running entity on this cfs_rq.
-- 
2.12.0


* [PATCH 08/17] sched/deadline: replace earliest dl and rq leftmost caching
  2017-07-19  1:45 [PATCH -next v4 00/17] rbtree: cache leftmost node internally Davidlohr Bueso
                   ` (6 preceding siblings ...)
  2017-07-19  1:45 ` [PATCH 07/17] sched/fair: replace cfs_rq->rb_leftmost Davidlohr Bueso
@ 2017-07-19  1:45 ` Davidlohr Bueso
  2017-07-19  1:45 ` [PATCH 09/17] locking/rtmutex: replace top-waiter and pi_waiters " Davidlohr Bueso
                   ` (8 subsequent siblings)
  16 siblings, 0 replies; 29+ messages in thread
From: Davidlohr Bueso @ 2017-07-19  1:45 UTC (permalink / raw)
  To: akpm
  Cc: mingo, peterz, jack, torvalds, kirill.shutemov, hch, ldufour,
	mhocko, mgorman, dave, linux-kernel, Davidlohr Bueso

... with the generic rbtree flavor instead. No changes
in semantics whatsoever.

Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
---
 kernel/sched/deadline.c | 50 +++++++++++++++++++------------------------------
 kernel/sched/sched.h    |  6 ++----
 2 files changed, 21 insertions(+), 35 deletions(-)

diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
index 755bd3f1a1a9..6e0464b5cd19 100644
--- a/kernel/sched/deadline.c
+++ b/kernel/sched/deadline.c
@@ -296,7 +296,7 @@ static inline int is_leftmost(struct task_struct *p, struct dl_rq *dl_rq)
 {
 	struct sched_dl_entity *dl_se = &p->dl;
 
-	return dl_rq->rb_leftmost == &dl_se->rb_node;
+	return dl_rq->root.rb_leftmost == &dl_se->rb_node;
 }
 
 void init_dl_bandwidth(struct dl_bandwidth *dl_b, u64 period, u64 runtime)
@@ -320,7 +320,7 @@ void init_dl_bw(struct dl_bw *dl_b)
 
 void init_dl_rq(struct dl_rq *dl_rq)
 {
-	dl_rq->rb_root = RB_ROOT;
+	dl_rq->root = RB_ROOT_CACHED;
 
 #ifdef CONFIG_SMP
 	/* zero means no -deadline tasks */
@@ -328,7 +328,7 @@ void init_dl_rq(struct dl_rq *dl_rq)
 
 	dl_rq->dl_nr_migratory = 0;
 	dl_rq->overloaded = 0;
-	dl_rq->pushable_dl_tasks_root = RB_ROOT;
+	dl_rq->pushable_dl_tasks_root = RB_ROOT_CACHED;
 #else
 	init_dl_bw(&dl_rq->dl_bw);
 #endif
@@ -410,10 +410,10 @@ static void dec_dl_migration(struct sched_dl_entity *dl_se, struct dl_rq *dl_rq)
 static void enqueue_pushable_dl_task(struct rq *rq, struct task_struct *p)
 {
 	struct dl_rq *dl_rq = &rq->dl;
-	struct rb_node **link = &dl_rq->pushable_dl_tasks_root.rb_node;
+	struct rb_node **link = &dl_rq->pushable_dl_tasks_root.rb_root.rb_node;
 	struct rb_node *parent = NULL;
 	struct task_struct *entry;
-	int leftmost = 1;
+	bool leftmost = true;
 
 	BUG_ON(!RB_EMPTY_NODE(&p->pushable_dl_tasks));
 
@@ -425,17 +425,16 @@ static void enqueue_pushable_dl_task(struct rq *rq, struct task_struct *p)
 			link = &parent->rb_left;
 		else {
 			link = &parent->rb_right;
-			leftmost = 0;
+			leftmost = false;
 		}
 	}
 
-	if (leftmost) {
-		dl_rq->pushable_dl_tasks_leftmost = &p->pushable_dl_tasks;
+	if (leftmost)
 		dl_rq->earliest_dl.next = p->dl.deadline;
-	}
 
 	rb_link_node(&p->pushable_dl_tasks, parent, link);
-	rb_insert_color(&p->pushable_dl_tasks, &dl_rq->pushable_dl_tasks_root);
+	rb_insert_color_cached(&p->pushable_dl_tasks,
+			       &dl_rq->pushable_dl_tasks_root, leftmost);
 }
 
 static void dequeue_pushable_dl_task(struct rq *rq, struct task_struct *p)
@@ -445,24 +444,23 @@ static void dequeue_pushable_dl_task(struct rq *rq, struct task_struct *p)
 	if (RB_EMPTY_NODE(&p->pushable_dl_tasks))
 		return;
 
-	if (dl_rq->pushable_dl_tasks_leftmost == &p->pushable_dl_tasks) {
+	if (dl_rq->pushable_dl_tasks_root.rb_leftmost == &p->pushable_dl_tasks) {
 		struct rb_node *next_node;
 
 		next_node = rb_next(&p->pushable_dl_tasks);
-		dl_rq->pushable_dl_tasks_leftmost = next_node;
 		if (next_node) {
 			dl_rq->earliest_dl.next = rb_entry(next_node,
 				struct task_struct, pushable_dl_tasks)->dl.deadline;
 		}
 	}
 
-	rb_erase(&p->pushable_dl_tasks, &dl_rq->pushable_dl_tasks_root);
+	rb_erase_cached(&p->pushable_dl_tasks, &dl_rq->pushable_dl_tasks_root);
 	RB_CLEAR_NODE(&p->pushable_dl_tasks);
 }
 
 static inline int has_pushable_dl_tasks(struct rq *rq)
 {
-	return !RB_EMPTY_ROOT(&rq->dl.pushable_dl_tasks_root);
+	return !RB_EMPTY_ROOT(&rq->dl.pushable_dl_tasks_root.rb_root);
 }
 
 static int push_dl_task(struct rq *rq);
@@ -1266,7 +1264,7 @@ static void dec_dl_deadline(struct dl_rq *dl_rq, u64 deadline)
 		dl_rq->earliest_dl.next = 0;
 		cpudl_clear(&rq->rd->cpudl, rq->cpu);
 	} else {
-		struct rb_node *leftmost = dl_rq->rb_leftmost;
+		struct rb_node *leftmost = dl_rq->root.rb_leftmost;
 		struct sched_dl_entity *entry;
 
 		entry = rb_entry(leftmost, struct sched_dl_entity, rb_node);
@@ -1313,7 +1311,7 @@ void dec_dl_tasks(struct sched_dl_entity *dl_se, struct dl_rq *dl_rq)
 static void __enqueue_dl_entity(struct sched_dl_entity *dl_se)
 {
 	struct dl_rq *dl_rq = dl_rq_of_se(dl_se);
-	struct rb_node **link = &dl_rq->rb_root.rb_node;
+	struct rb_node **link = &dl_rq->root.rb_root.rb_node;
 	struct rb_node *parent = NULL;
 	struct sched_dl_entity *entry;
 	int leftmost = 1;
@@ -1331,11 +1329,8 @@ static void __enqueue_dl_entity(struct sched_dl_entity *dl_se)
 		}
 	}
 
-	if (leftmost)
-		dl_rq->rb_leftmost = &dl_se->rb_node;
-
 	rb_link_node(&dl_se->rb_node, parent, link);
-	rb_insert_color(&dl_se->rb_node, &dl_rq->rb_root);
+	rb_insert_color_cached(&dl_se->rb_node, &dl_rq->root, leftmost);
 
 	inc_dl_tasks(dl_se, dl_rq);
 }
@@ -1347,14 +1342,7 @@ static void __dequeue_dl_entity(struct sched_dl_entity *dl_se)
 	if (RB_EMPTY_NODE(&dl_se->rb_node))
 		return;
 
-	if (dl_rq->rb_leftmost == &dl_se->rb_node) {
-		struct rb_node *next_node;
-
-		next_node = rb_next(&dl_se->rb_node);
-		dl_rq->rb_leftmost = next_node;
-	}
-
-	rb_erase(&dl_se->rb_node, &dl_rq->rb_root);
+	rb_erase_cached(&dl_se->rb_node, &dl_rq->root);
 	RB_CLEAR_NODE(&dl_se->rb_node);
 
 	dec_dl_tasks(dl_se, dl_rq);
@@ -1647,7 +1635,7 @@ static void start_hrtick_dl(struct rq *rq, struct task_struct *p)
 static struct sched_dl_entity *pick_next_dl_entity(struct rq *rq,
 						   struct dl_rq *dl_rq)
 {
-	struct rb_node *left = dl_rq->rb_leftmost;
+	struct rb_node *left = rb_first_cached(&dl_rq->root);
 
 	if (!left)
 		return NULL;
@@ -1771,7 +1759,7 @@ static int pick_dl_task(struct rq *rq, struct task_struct *p, int cpu)
  */
 static struct task_struct *pick_earliest_pushable_dl_task(struct rq *rq, int cpu)
 {
-	struct rb_node *next_node = rq->dl.pushable_dl_tasks_leftmost;
+	struct rb_node *next_node = rq->dl.pushable_dl_tasks_root.rb_leftmost;
 	struct task_struct *p = NULL;
 
 	if (!has_pushable_dl_tasks(rq))
@@ -1944,7 +1932,7 @@ static struct task_struct *pick_next_pushable_dl_task(struct rq *rq)
 	if (!has_pushable_dl_tasks(rq))
 		return NULL;
 
-	p = rb_entry(rq->dl.pushable_dl_tasks_leftmost,
+	p = rb_entry(rq->dl.pushable_dl_tasks_root.rb_leftmost,
 		     struct task_struct, pushable_dl_tasks);
 
 	BUG_ON(rq->cpu != task_cpu(p));
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index bf4d3f7d29c7..d34d1a0dd563 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -549,8 +549,7 @@ struct rt_rq {
 /* Deadline class' related fields in a runqueue */
 struct dl_rq {
 	/* runqueue is an rbtree, ordered by deadline */
-	struct rb_root rb_root;
-	struct rb_node *rb_leftmost;
+	struct rb_root_cached root;
 
 	unsigned long dl_nr_running;
 
@@ -574,8 +573,7 @@ struct dl_rq {
 	 * an rb-tree, ordered by tasks' deadlines, with caching
 	 * of the leftmost (earliest deadline) element.
 	 */
-	struct rb_root pushable_dl_tasks_root;
-	struct rb_node *pushable_dl_tasks_leftmost;
+	struct rb_root_cached pushable_dl_tasks_root;
 #else
 	struct dl_bw dl_bw;
 #endif
-- 
2.12.0


* [PATCH 09/17] locking/rtmutex: replace top-waiter and pi_waiters leftmost caching
  2017-07-19  1:45 [PATCH -next v4 00/17] rbtree: cache leftmost node internally Davidlohr Bueso
                   ` (7 preceding siblings ...)
  2017-07-19  1:45 ` [PATCH 08/17] sched/deadline: replace earliest dl and rq leftmost caching Davidlohr Bueso
@ 2017-07-19  1:45 ` Davidlohr Bueso
  2017-07-19  1:45 ` [PATCH 10/17] block/cfq: replace cfq_rb_root " Davidlohr Bueso
                   ` (7 subsequent siblings)
  16 siblings, 0 replies; 29+ messages in thread
From: Davidlohr Bueso @ 2017-07-19  1:45 UTC (permalink / raw)
  To: akpm
  Cc: mingo, peterz, jack, torvalds, kirill.shutemov, hch, ldufour,
	mhocko, mgorman, dave, linux-kernel, Davidlohr Bueso

... with the generic rbtree flavor instead. No changes
in semantics whatsoever.

Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
---
 include/linux/init_task.h       |  5 ++---
 include/linux/rtmutex.h         | 11 +++++------
 include/linux/sched.h           |  3 +--
 kernel/fork.c                   |  3 +--
 kernel/locking/rtmutex-debug.c  |  2 +-
 kernel/locking/rtmutex.c        | 35 +++++++++++------------------------
 kernel/locking/rtmutex_common.h | 12 ++++++------
 7 files changed, 27 insertions(+), 44 deletions(-)

diff --git a/include/linux/init_task.h b/include/linux/init_task.h
index a2f6707e9fc0..2008de121705 100644
--- a/include/linux/init_task.h
+++ b/include/linux/init_task.h
@@ -181,9 +181,8 @@ extern struct cred init_cred;
 
 #ifdef CONFIG_RT_MUTEXES
 # define INIT_RT_MUTEXES(tsk)						\
-	.pi_waiters = RB_ROOT,						\
-	.pi_top_task = NULL,						\
-	.pi_waiters_leftmost = NULL,
+	.pi_waiters = RB_ROOT_CACHED,					\
+	.pi_top_task = NULL,
 #else
 # define INIT_RT_MUTEXES(tsk)
 #endif
diff --git a/include/linux/rtmutex.h b/include/linux/rtmutex.h
index 44fd002f7cd5..53fcbe9de7fd 100644
--- a/include/linux/rtmutex.h
+++ b/include/linux/rtmutex.h
@@ -22,18 +22,17 @@ extern int max_lock_depth; /* for sysctl */
  * The rt_mutex structure
  *
  * @wait_lock:	spinlock to protect the structure
- * @waiters:	rbtree root to enqueue waiters in priority order
- * @waiters_leftmost: top waiter
+ * @waiters:	rbtree root to enqueue waiters in priority order;
+ *              caches top-waiter (leftmost node).
  * @owner:	the mutex owner
  */
 struct rt_mutex {
 	raw_spinlock_t		wait_lock;
-	struct rb_root          waiters;
-	struct rb_node          *waiters_leftmost;
+	struct rb_root_cached   waiters;
 	struct task_struct	*owner;
 #ifdef CONFIG_DEBUG_RT_MUTEXES
 	int			save_state;
-	const char 		*name, *file;
+	const char		*name, *file;
 	int			line;
 	void			*magic;
 #endif
@@ -84,7 +83,7 @@ do { \
 
 #define __RT_MUTEX_INITIALIZER(mutexname) \
 	{ .wait_lock = __RAW_SPIN_LOCK_UNLOCKED(mutexname.wait_lock) \
-	, .waiters = RB_ROOT \
+	, .waiters = RB_ROOT_CACHED \
 	, .owner = NULL \
 	__DEBUG_RT_MUTEX_INITIALIZER(mutexname) \
 	__DEP_MAP_RT_MUTEX_INITIALIZER(mutexname)}
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 8337e2db0bb2..99753c3e470d 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -811,8 +811,7 @@ struct task_struct {
 
 #ifdef CONFIG_RT_MUTEXES
 	/* PI waiters blocked on a rt_mutex held by this task: */
-	struct rb_root			pi_waiters;
-	struct rb_node			*pi_waiters_leftmost;
+	struct rb_root_cached		pi_waiters;
 	/* Updated under owner's pi_lock and rq lock */
 	struct task_struct		*pi_top_task;
 	/* Deadlock detection and priority inheritance handling: */
diff --git a/kernel/fork.c b/kernel/fork.c
index 17921b0390b4..4235905fe9d1 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -1458,8 +1458,7 @@ static void rt_mutex_init_task(struct task_struct *p)
 {
 	raw_spin_lock_init(&p->pi_lock);
 #ifdef CONFIG_RT_MUTEXES
-	p->pi_waiters = RB_ROOT;
-	p->pi_waiters_leftmost = NULL;
+	p->pi_waiters = RB_ROOT_CACHED;
 	p->pi_top_task = NULL;
 	p->pi_blocked_on = NULL;
 #endif
diff --git a/kernel/locking/rtmutex-debug.c b/kernel/locking/rtmutex-debug.c
index ac35e648b0e5..f4a74e78d467 100644
--- a/kernel/locking/rtmutex-debug.c
+++ b/kernel/locking/rtmutex-debug.c
@@ -58,7 +58,7 @@ static void printk_lock(struct rt_mutex *lock, int print_owner)
 
 void rt_mutex_debug_task_free(struct task_struct *task)
 {
-	DEBUG_LOCKS_WARN_ON(!RB_EMPTY_ROOT(&task->pi_waiters));
+	DEBUG_LOCKS_WARN_ON(!RB_EMPTY_ROOT(&task->pi_waiters.rb_root));
 	DEBUG_LOCKS_WARN_ON(task->pi_blocked_on);
 }
 
diff --git a/kernel/locking/rtmutex.c b/kernel/locking/rtmutex.c
index 649dc9d3951a..6f3dba6e4e9e 100644
--- a/kernel/locking/rtmutex.c
+++ b/kernel/locking/rtmutex.c
@@ -271,10 +271,10 @@ rt_mutex_waiter_equal(struct rt_mutex_waiter *left,
 static void
 rt_mutex_enqueue(struct rt_mutex *lock, struct rt_mutex_waiter *waiter)
 {
-	struct rb_node **link = &lock->waiters.rb_node;
+	struct rb_node **link = &lock->waiters.rb_root.rb_node;
 	struct rb_node *parent = NULL;
 	struct rt_mutex_waiter *entry;
-	int leftmost = 1;
+	bool leftmost = true;
 
 	while (*link) {
 		parent = *link;
@@ -283,15 +283,12 @@ rt_mutex_enqueue(struct rt_mutex *lock, struct rt_mutex_waiter *waiter)
 			link = &parent->rb_left;
 		} else {
 			link = &parent->rb_right;
-			leftmost = 0;
+			leftmost = false;
 		}
 	}
 
-	if (leftmost)
-		lock->waiters_leftmost = &waiter->tree_entry;
-
 	rb_link_node(&waiter->tree_entry, parent, link);
-	rb_insert_color(&waiter->tree_entry, &lock->waiters);
+	rb_insert_color_cached(&waiter->tree_entry, &lock->waiters, leftmost);
 }
 
 static void
@@ -300,20 +297,17 @@ rt_mutex_dequeue(struct rt_mutex *lock, struct rt_mutex_waiter *waiter)
 	if (RB_EMPTY_NODE(&waiter->tree_entry))
 		return;
 
-	if (lock->waiters_leftmost == &waiter->tree_entry)
-		lock->waiters_leftmost = rb_next(&waiter->tree_entry);
-
-	rb_erase(&waiter->tree_entry, &lock->waiters);
+	rb_erase_cached(&waiter->tree_entry, &lock->waiters);
 	RB_CLEAR_NODE(&waiter->tree_entry);
 }
 
 static void
 rt_mutex_enqueue_pi(struct task_struct *task, struct rt_mutex_waiter *waiter)
 {
-	struct rb_node **link = &task->pi_waiters.rb_node;
+	struct rb_node **link = &task->pi_waiters.rb_root.rb_node;
 	struct rb_node *parent = NULL;
 	struct rt_mutex_waiter *entry;
-	int leftmost = 1;
+	bool leftmost = true;
 
 	while (*link) {
 		parent = *link;
@@ -322,15 +316,12 @@ rt_mutex_enqueue_pi(struct task_struct *task, struct rt_mutex_waiter *waiter)
 			link = &parent->rb_left;
 		} else {
 			link = &parent->rb_right;
-			leftmost = 0;
+			leftmost = false;
 		}
 	}
 
-	if (leftmost)
-		task->pi_waiters_leftmost = &waiter->pi_tree_entry;
-
 	rb_link_node(&waiter->pi_tree_entry, parent, link);
-	rb_insert_color(&waiter->pi_tree_entry, &task->pi_waiters);
+	rb_insert_color_cached(&waiter->pi_tree_entry, &task->pi_waiters, leftmost);
 }
 
 static void
@@ -339,10 +330,7 @@ rt_mutex_dequeue_pi(struct task_struct *task, struct rt_mutex_waiter *waiter)
 	if (RB_EMPTY_NODE(&waiter->pi_tree_entry))
 		return;
 
-	if (task->pi_waiters_leftmost == &waiter->pi_tree_entry)
-		task->pi_waiters_leftmost = rb_next(&waiter->pi_tree_entry);
-
-	rb_erase(&waiter->pi_tree_entry, &task->pi_waiters);
+	rb_erase_cached(&waiter->pi_tree_entry, &task->pi_waiters);
 	RB_CLEAR_NODE(&waiter->pi_tree_entry);
 }
 
@@ -1657,8 +1645,7 @@ void __rt_mutex_init(struct rt_mutex *lock, const char *name,
 {
 	lock->owner = NULL;
 	raw_spin_lock_init(&lock->wait_lock);
-	lock->waiters = RB_ROOT;
-	lock->waiters_leftmost = NULL;
+	lock->waiters = RB_ROOT_CACHED;
 
 	if (name && key)
 		debug_rt_mutex_init(lock, name, key);
diff --git a/kernel/locking/rtmutex_common.h b/kernel/locking/rtmutex_common.h
index 72ad45a9a794..524beeee24b0 100644
--- a/kernel/locking/rtmutex_common.h
+++ b/kernel/locking/rtmutex_common.h
@@ -42,7 +42,7 @@ struct rt_mutex_waiter {
  */
 static inline int rt_mutex_has_waiters(struct rt_mutex *lock)
 {
-	return !RB_EMPTY_ROOT(&lock->waiters);
+	return !RB_EMPTY_ROOT(&lock->waiters.rb_root);
 }
 
 static inline struct rt_mutex_waiter *
@@ -50,8 +50,8 @@ rt_mutex_top_waiter(struct rt_mutex *lock)
 {
 	struct rt_mutex_waiter *w;
 
-	w = rb_entry(lock->waiters_leftmost, struct rt_mutex_waiter,
-		     tree_entry);
+	w = rb_entry(lock->waiters.rb_leftmost,
+		     struct rt_mutex_waiter, tree_entry);
 	BUG_ON(w->lock != lock);
 
 	return w;
@@ -59,14 +59,14 @@ rt_mutex_top_waiter(struct rt_mutex *lock)
 
 static inline int task_has_pi_waiters(struct task_struct *p)
 {
-	return !RB_EMPTY_ROOT(&p->pi_waiters);
+	return !RB_EMPTY_ROOT(&p->pi_waiters.rb_root);
 }
 
 static inline struct rt_mutex_waiter *
 task_top_pi_waiter(struct task_struct *p)
 {
-	return rb_entry(p->pi_waiters_leftmost, struct rt_mutex_waiter,
-			pi_tree_entry);
+	return rb_entry(p->pi_waiters.rb_leftmost,
+			struct rt_mutex_waiter, pi_tree_entry);
 }
 
 /*
-- 
2.12.0

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH 10/17] block/cfq: replace cfq_rb_root leftmost caching
  2017-07-19  1:45 [PATCH -next v4 00/17] rbtree: cache leftmost node internally Davidlohr Bueso
                   ` (8 preceding siblings ...)
  2017-07-19  1:45 ` [PATCH 09/17] locking/rtmutex: replace top-waiter and pi_waiters " Davidlohr Bueso
@ 2017-07-19  1:45 ` Davidlohr Bueso
  2017-07-19  7:46   ` Jan Kara
  2017-07-19  1:45 ` [PATCH 11/17] lib/interval_tree: fast overlap detection Davidlohr Bueso
                   ` (6 subsequent siblings)
  16 siblings, 1 reply; 29+ messages in thread
From: Davidlohr Bueso @ 2017-07-19  1:45 UTC (permalink / raw)
  To: akpm
  Cc: mingo, peterz, jack, torvalds, kirill.shutemov, hch, ldufour,
	mhocko, mgorman, dave, linux-kernel, axboe, linux-block,
	Davidlohr Bueso

... with the generic rbtree flavor instead. No changes
in semantics whatsoever.

Cc: axboe@fb.com
Cc: linux-block@vger.kernel.org
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
---
 block/cfq-iosched.c | 70 +++++++++++++++--------------------------------------
 1 file changed, 20 insertions(+), 50 deletions(-)

diff --git a/block/cfq-iosched.c b/block/cfq-iosched.c
index 3d5c28945719..92c31683a2bb 100644
--- a/block/cfq-iosched.c
+++ b/block/cfq-iosched.c
@@ -93,13 +93,12 @@ struct cfq_ttime {
  * move this into the elevator for the rq sorting as well.
  */
 struct cfq_rb_root {
-	struct rb_root rb;
-	struct rb_node *left;
+	struct rb_root_cached rb;
 	unsigned count;
 	u64 min_vdisktime;
 	struct cfq_ttime ttime;
 };
-#define CFQ_RB_ROOT	(struct cfq_rb_root) { .rb = RB_ROOT, \
+#define CFQ_RB_ROOT	(struct cfq_rb_root) { .rb = RB_ROOT_CACHED, \
 			.ttime = {.last_end_request = ktime_get_ns(),},}
 
 /*
@@ -984,10 +983,9 @@ static inline u64 max_vdisktime(u64 min_vdisktime, u64 vdisktime)
 
 static void update_min_vdisktime(struct cfq_rb_root *st)
 {
-	struct cfq_group *cfqg;
+	if (!RB_EMPTY_ROOT(&st->rb.rb_root)) {
+		struct cfq_group *cfqg = rb_entry_cfqg(st->rb.rb_leftmost);
 
-	if (st->left) {
-		cfqg = rb_entry_cfqg(st->left);
 		st->min_vdisktime = max_vdisktime(st->min_vdisktime,
 						  cfqg->vdisktime);
 	}
@@ -1169,46 +1167,25 @@ cfq_choose_req(struct cfq_data *cfqd, struct request *rq1, struct request *rq2,
 	}
 }
 
-/*
- * The below is leftmost cache rbtree addon
- */
 static struct cfq_queue *cfq_rb_first(struct cfq_rb_root *root)
 {
 	/* Service tree is empty */
 	if (!root->count)
 		return NULL;
 
-	if (!root->left)
-		root->left = rb_first(&root->rb);
-
-	if (root->left)
-		return rb_entry(root->left, struct cfq_queue, rb_node);
-
-	return NULL;
+	return rb_entry(rb_first_cached(&root->rb), struct cfq_queue, rb_node);
 }
 
 static struct cfq_group *cfq_rb_first_group(struct cfq_rb_root *root)
 {
-	if (!root->left)
-		root->left = rb_first(&root->rb);
-
-	if (root->left)
-		return rb_entry_cfqg(root->left);
-
-	return NULL;
+	return rb_entry_cfqg(rb_first_cached(&root->rb));
 }
 
-static void rb_erase_init(struct rb_node *n, struct rb_root *root)
+static void cfq_rb_erase(struct rb_node *n, struct cfq_rb_root *root)
 {
-	rb_erase(n, root);
+	rb_erase_cached(n, &root->rb);
 	RB_CLEAR_NODE(n);
-}
 
-static void cfq_rb_erase(struct rb_node *n, struct cfq_rb_root *root)
-{
-	if (root->left == n)
-		root->left = NULL;
-	rb_erase_init(n, &root->rb);
 	--root->count;
 }
 
@@ -1258,11 +1235,11 @@ cfqg_key(struct cfq_rb_root *st, struct cfq_group *cfqg)
 static void
 __cfq_group_service_tree_add(struct cfq_rb_root *st, struct cfq_group *cfqg)
 {
-	struct rb_node **node = &st->rb.rb_node;
+	struct rb_node **node = &st->rb.rb_root.rb_node;
 	struct rb_node *parent = NULL;
 	struct cfq_group *__cfqg;
 	s64 key = cfqg_key(st, cfqg);
-	int left = 1;
+	bool leftmost = true;
 
 	while (*node != NULL) {
 		parent = *node;
@@ -1272,15 +1249,12 @@ __cfq_group_service_tree_add(struct cfq_rb_root *st, struct cfq_group *cfqg)
 			node = &parent->rb_left;
 		else {
 			node = &parent->rb_right;
-			left = 0;
+			leftmost = false;
 		}
 	}
 
-	if (left)
-		st->left = &cfqg->rb_node;
-
 	rb_link_node(&cfqg->rb_node, parent, node);
-	rb_insert_color(&cfqg->rb_node, &st->rb);
+	rb_insert_color_cached(&cfqg->rb_node, &st->rb, leftmost);
 }
 
 /*
@@ -1381,7 +1355,7 @@ cfq_group_notify_queue_add(struct cfq_data *cfqd, struct cfq_group *cfqg)
 	 * so that groups get lesser vtime based on their weights, so that
 	 * if group does not loose all if it was not continuously backlogged.
 	 */
-	n = rb_last(&st->rb);
+	n = rb_last(&st->rb.rb_root);
 	if (n) {
 		__cfqg = rb_entry_cfqg(n);
 		cfqg->vdisktime = __cfqg->vdisktime +
@@ -2223,14 +2197,14 @@ static void cfq_service_tree_add(struct cfq_data *cfqd, struct cfq_queue *cfqq,
 	struct cfq_queue *__cfqq;
 	u64 rb_key;
 	struct cfq_rb_root *st;
-	int left;
+	bool leftmost = true;
 	int new_cfqq = 1;
 	u64 now = ktime_get_ns();
 
 	st = st_for(cfqq->cfqg, cfqq_class(cfqq), cfqq_type(cfqq));
 	if (cfq_class_idle(cfqq)) {
 		rb_key = CFQ_IDLE_DELAY;
-		parent = rb_last(&st->rb);
+		parent = rb_last(&st->rb.rb_root);
 		if (parent && parent != &cfqq->rb_node) {
 			__cfqq = rb_entry(parent, struct cfq_queue, rb_node);
 			rb_key += __cfqq->rb_key;
@@ -2264,10 +2238,9 @@ static void cfq_service_tree_add(struct cfq_data *cfqd, struct cfq_queue *cfqq,
 		cfqq->service_tree = NULL;
 	}
 
-	left = 1;
 	parent = NULL;
 	cfqq->service_tree = st;
-	p = &st->rb.rb_node;
+	p = &st->rb.rb_root.rb_node;
 	while (*p) {
 		parent = *p;
 		__cfqq = rb_entry(parent, struct cfq_queue, rb_node);
@@ -2279,16 +2252,13 @@ static void cfq_service_tree_add(struct cfq_data *cfqd, struct cfq_queue *cfqq,
 			p = &parent->rb_left;
 		else {
 			p = &parent->rb_right;
-			left = 0;
+			leftmost = false;
 		}
 	}
 
-	if (left)
-		st->left = &cfqq->rb_node;
-
 	cfqq->rb_key = rb_key;
 	rb_link_node(&cfqq->rb_node, parent, p);
-	rb_insert_color(&cfqq->rb_node, &st->rb);
+	rb_insert_color_cached(&cfqq->rb_node, &st->rb, leftmost);
 	st->count++;
 	if (add_front || !new_cfqq)
 		return;
@@ -2735,7 +2705,7 @@ static struct cfq_queue *cfq_get_next_queue(struct cfq_data *cfqd)
 	/* There is nothing to dispatch */
 	if (!st)
 		return NULL;
-	if (RB_EMPTY_ROOT(&st->rb))
+	if (RB_EMPTY_ROOT(&st->rb.rb_root))
 		return NULL;
 	return cfq_rb_first(st);
 }
@@ -3221,7 +3191,7 @@ static struct cfq_group *cfq_get_next_cfqg(struct cfq_data *cfqd)
 	struct cfq_rb_root *st = &cfqd->grp_service_tree;
 	struct cfq_group *cfqg;
 
-	if (RB_EMPTY_ROOT(&st->rb))
+	if (RB_EMPTY_ROOT(&st->rb.rb_root))
 		return NULL;
 	cfqg = cfq_rb_first_group(st);
 	update_min_vdisktime(st);
-- 
2.12.0

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH 11/17] lib/interval_tree: fast overlap detection
  2017-07-19  1:45 [PATCH -next v4 00/17] rbtree: cache leftmost node internally Davidlohr Bueso
                   ` (9 preceding siblings ...)
  2017-07-19  1:45 ` [PATCH 10/17] block/cfq: replace cfq_rb_root " Davidlohr Bueso
@ 2017-07-19  1:45 ` Davidlohr Bueso
       [not found]   ` <20170719014603.19029-12-dave-h16yJtLeMjHk1uMJSBkQmQ@public.gmane.org>
  2017-08-01 17:16     ` Michael S. Tsirkin
  2017-07-19  1:45 ` [PATCH 12/17] lib/interval-tree: correct comment wrt generic flavor Davidlohr Bueso
                   ` (5 subsequent siblings)
  16 siblings, 2 replies; 29+ messages in thread
From: Davidlohr Bueso @ 2017-07-19  1:45 UTC (permalink / raw)
  To: akpm
  Cc: mingo, peterz, jack, torvalds, kirill.shutemov, hch, ldufour,
	mhocko, mgorman, dave, linux-kernel, David Airlie, dri-devel,
	Michael S. Tsirkin, Jason Wang, Doug Ledford,
	Christian Benvenuti, linux-rdma, Davidlohr Bueso

Allow interval trees to quickly check for overlaps to avoid
unnecessary tree lookups in interval_tree_iter_first().

As of this patch, all interval tree flavors will require
using a 'rb_root_cached' such that we can have the leftmost
node easily available. While most users will make use of this
feature, those with special functions (in addition to the generic
insert, delete, search calls) will avoid using the cached
option as they can do funky things with insertions -- for example,
vma_interval_tree_insert_after().
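
To make the fastpath concrete (a sketch against the generic
interval_tree.h API, not the patched code itself; the values are
made up): a query A = [start, last] is rejected before any tree
walk whenever it cannot intersect B = [smallest start in the tree,
largest last in the tree]. The former is the cached leftmost node,
the latter the root node's augmented last-in-subtree value.

#include <linux/interval_tree.h>

static void demo_fastpath(void)
{
	struct rb_root_cached root = RB_ROOT_CACHED;
	struct interval_tree_node n = { .start = 100, .last = 199 };
	struct interval_tree_node *hit;

	interval_tree_insert(&n, &root);

	/* Disjoint query: smallest start (100) > 50, rejected in O(1). */
	hit = interval_tree_iter_first(&root, 0, 50);	/* NULL */

	/* Overlapping query: falls through to the usual subtree search. */
	hit = interval_tree_iter_first(&root, 150, 300);	/* &n */
}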

Cc: David Airlie <airlied@linux.ie>
Cc: dri-devel@lists.freedesktop.org
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: Doug Ledford <dledford@redhat.com>
Cc: Christian Benvenuti <benve@cisco.com>
Cc: linux-rdma@vger.kernel.org
Acked-by: Christian König <christian.koenig@amd.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c             |  8 ++--
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c             |  7 ++--
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h             |  2 +-
 drivers/gpu/drm/drm_mm.c                           | 19 +++++----
 drivers/gpu/drm/drm_vma_manager.c                  |  2 +-
 drivers/gpu/drm/i915/i915_gem_userptr.c            |  6 +--
 drivers/gpu/drm/radeon/radeon.h                    |  2 +-
 drivers/gpu/drm/radeon/radeon_mn.c                 |  8 ++--
 drivers/gpu/drm/radeon/radeon_vm.c                 |  7 ++--
 drivers/infiniband/core/umem_rbtree.c              |  4 +-
 drivers/infiniband/core/uverbs_cmd.c               |  2 +-
 drivers/infiniband/hw/hfi1/mmu_rb.c                | 10 ++---
 drivers/infiniband/hw/usnic/usnic_uiom.c           |  6 +--
 drivers/infiniband/hw/usnic/usnic_uiom.h           |  2 +-
 .../infiniband/hw/usnic/usnic_uiom_interval_tree.c | 15 +++----
 .../infiniband/hw/usnic/usnic_uiom_interval_tree.h | 12 +++---
 drivers/vhost/vhost.c                              |  2 +-
 drivers/vhost/vhost.h                              |  2 +-
 fs/hugetlbfs/inode.c                               |  6 +--
 fs/inode.c                                         |  2 +-
 include/drm/drm_mm.h                               |  2 +-
 include/linux/fs.h                                 |  4 +-
 include/linux/interval_tree.h                      |  8 ++--
 include/linux/interval_tree_generic.h              | 46 +++++++++++++++++-----
 include/linux/mm.h                                 | 17 ++++----
 include/linux/rmap.h                               |  4 +-
 include/rdma/ib_umem_odp.h                         | 11 ++++--
 include/rdma/ib_verbs.h                            |  2 +-
 lib/interval_tree_test.c                           |  4 +-
 mm/interval_tree.c                                 | 10 ++---
 mm/memory.c                                        |  4 +-
 mm/mmap.c                                          | 10 ++---
 mm/rmap.c                                          |  4 +-
 33 files changed, 145 insertions(+), 105 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c
index 38f739fb727b..3f8aef21b9a6 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c
@@ -51,7 +51,7 @@ struct amdgpu_mn {
 
 	/* objects protected by lock */
 	struct mutex		lock;
-	struct rb_root		objects;
+	struct rb_root_cached	objects;
 };
 
 struct amdgpu_mn_node {
@@ -76,8 +76,8 @@ static void amdgpu_mn_destroy(struct work_struct *work)
 	mutex_lock(&adev->mn_lock);
 	mutex_lock(&rmn->lock);
 	hash_del(&rmn->node);
-	rbtree_postorder_for_each_entry_safe(node, next_node, &rmn->objects,
-					     it.rb) {
+	rbtree_postorder_for_each_entry_safe(node, next_node,
+					     &rmn->objects.rb_root, it.rb) {
 		list_for_each_entry_safe(bo, next_bo, &node->bos, mn_list) {
 			bo->mn = NULL;
 			list_del_init(&bo->mn_list);
@@ -252,7 +252,7 @@ static struct amdgpu_mn *amdgpu_mn_get(struct amdgpu_device *adev)
 	rmn->mm = mm;
 	rmn->mn.ops = &amdgpu_mn_ops;
 	mutex_init(&rmn->lock);
-	rmn->objects = RB_ROOT;
+	rmn->objects = RB_ROOT_CACHED;
 
 	r = __mmu_notifier_register(&rmn->mn, mm);
 	if (r)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
index 5795f81369f0..f872e2179bbd 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
@@ -2405,7 +2405,7 @@ int amdgpu_vm_init(struct amdgpu_device *adev, struct amdgpu_vm *vm,
 	int r, i;
 	u64 flags;
 
-	vm->va = RB_ROOT;
+	vm->va = RB_ROOT_CACHED;
 	vm->client_id = atomic64_inc_return(&adev->vm_manager.client_counter);
 	for (i = 0; i < AMDGPU_MAX_VMHUBS; i++)
 		vm->reserved_vmid[i] = NULL;
@@ -2512,10 +2512,11 @@ void amdgpu_vm_fini(struct amdgpu_device *adev, struct amdgpu_vm *vm)
 
 	amd_sched_entity_fini(vm->entity.sched, &vm->entity);
 
-	if (!RB_EMPTY_ROOT(&vm->va)) {
+	if (!RB_EMPTY_ROOT(&vm->va.rb_root)) {
 		dev_err(adev->dev, "still active bo inside vm\n");
 	}
-	rbtree_postorder_for_each_entry_safe(mapping, tmp, &vm->va, rb) {
+	rbtree_postorder_for_each_entry_safe(mapping, tmp,
+					     &vm->va.rb_root, rb) {
 		list_del(&mapping->list);
 		amdgpu_vm_it_remove(mapping, &vm->va);
 		kfree(mapping);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
index 936f158bc5ec..ebffc1253f85 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
@@ -106,7 +106,7 @@ struct amdgpu_vm_pt {
 
 struct amdgpu_vm {
 	/* tree of virtual addresses mapped */
-	struct rb_root		va;
+	struct rb_root_cached	va;
 
 	/* protecting invalidated */
 	spinlock_t		status_lock;
diff --git a/drivers/gpu/drm/drm_mm.c b/drivers/gpu/drm/drm_mm.c
index f794089d30ac..61a1c8ea74bc 100644
--- a/drivers/gpu/drm/drm_mm.c
+++ b/drivers/gpu/drm/drm_mm.c
@@ -169,7 +169,7 @@ INTERVAL_TREE_DEFINE(struct drm_mm_node, rb,
 struct drm_mm_node *
 __drm_mm_interval_first(const struct drm_mm *mm, u64 start, u64 last)
 {
-	return drm_mm_interval_tree_iter_first((struct rb_root *)&mm->interval_tree,
+	return drm_mm_interval_tree_iter_first((struct rb_root_cached *)&mm->interval_tree,
 					       start, last) ?: (struct drm_mm_node *)&mm->head_node;
 }
 EXPORT_SYMBOL(__drm_mm_interval_first);
@@ -180,6 +180,7 @@ static void drm_mm_interval_tree_add_node(struct drm_mm_node *hole_node,
 	struct drm_mm *mm = hole_node->mm;
 	struct rb_node **link, *rb;
 	struct drm_mm_node *parent;
+	bool leftmost = true;
 
 	node->__subtree_last = LAST(node);
 
@@ -196,9 +197,10 @@ static void drm_mm_interval_tree_add_node(struct drm_mm_node *hole_node,
 
 		rb = &hole_node->rb;
 		link = &hole_node->rb.rb_right;
+		leftmost = false;
 	} else {
 		rb = NULL;
-		link = &mm->interval_tree.rb_node;
+		link = &mm->interval_tree.rb_root.rb_node;
 	}
 
 	while (*link) {
@@ -208,14 +210,15 @@ static void drm_mm_interval_tree_add_node(struct drm_mm_node *hole_node,
 			parent->__subtree_last = node->__subtree_last;
 		if (node->start < parent->start)
 			link = &parent->rb.rb_left;
-		else
+		else {
 			link = &parent->rb.rb_right;
+			leftmost = false;
+		}
 	}
 
 	rb_link_node(&node->rb, rb, link);
-	rb_insert_augmented(&node->rb,
-			    &mm->interval_tree,
-			    &drm_mm_interval_tree_augment);
+	rb_insert_augmented_cached(&node->rb, &mm->interval_tree, leftmost,
+				   &drm_mm_interval_tree_augment);
 }
 
 #define RB_INSERT(root, member, expr) do { \
@@ -577,7 +580,7 @@ void drm_mm_replace_node(struct drm_mm_node *old, struct drm_mm_node *new)
 	*new = *old;
 
 	list_replace(&old->node_list, &new->node_list);
-	rb_replace_node(&old->rb, &new->rb, &old->mm->interval_tree);
+	rb_replace_node(&old->rb, &new->rb, &old->mm->interval_tree.rb_root);
 
 	if (drm_mm_hole_follows(old)) {
 		list_replace(&old->hole_stack, &new->hole_stack);
@@ -863,7 +866,7 @@ void drm_mm_init(struct drm_mm *mm, u64 start, u64 size)
 	mm->color_adjust = NULL;
 
 	INIT_LIST_HEAD(&mm->hole_stack);
-	mm->interval_tree = RB_ROOT;
+	mm->interval_tree = RB_ROOT_CACHED;
 	mm->holes_size = RB_ROOT;
 	mm->holes_addr = RB_ROOT;
 
diff --git a/drivers/gpu/drm/drm_vma_manager.c b/drivers/gpu/drm/drm_vma_manager.c
index d9100b565198..28f1226576f8 100644
--- a/drivers/gpu/drm/drm_vma_manager.c
+++ b/drivers/gpu/drm/drm_vma_manager.c
@@ -147,7 +147,7 @@ struct drm_vma_offset_node *drm_vma_offset_lookup_locked(struct drm_vma_offset_m
 	struct rb_node *iter;
 	unsigned long offset;
 
-	iter = mgr->vm_addr_space_mm.interval_tree.rb_node;
+	iter = mgr->vm_addr_space_mm.interval_tree.rb_root.rb_node;
 	best = NULL;
 
 	while (likely(iter)) {
diff --git a/drivers/gpu/drm/i915/i915_gem_userptr.c b/drivers/gpu/drm/i915/i915_gem_userptr.c
index ccd09e8419f5..71dddf66baaa 100644
--- a/drivers/gpu/drm/i915/i915_gem_userptr.c
+++ b/drivers/gpu/drm/i915/i915_gem_userptr.c
@@ -49,7 +49,7 @@ struct i915_mmu_notifier {
 	spinlock_t lock;
 	struct hlist_node node;
 	struct mmu_notifier mn;
-	struct rb_root objects;
+	struct rb_root_cached objects;
 	struct workqueue_struct *wq;
 };
 
@@ -123,7 +123,7 @@ static void i915_gem_userptr_mn_invalidate_range_start(struct mmu_notifier *_mn,
 	struct interval_tree_node *it;
 	LIST_HEAD(cancelled);
 
-	if (RB_EMPTY_ROOT(&mn->objects))
+	if (RB_EMPTY_ROOT(&mn->objects.rb_root))
 		return;
 
 	/* interval ranges are inclusive, but invalidate range is exclusive */
@@ -172,7 +172,7 @@ i915_mmu_notifier_create(struct mm_struct *mm)
 
 	spin_lock_init(&mn->lock);
 	mn->mn.ops = &i915_gem_userptr_notifier;
-	mn->objects = RB_ROOT;
+	mn->objects = RB_ROOT_CACHED;
 	mn->wq = alloc_workqueue("i915-userptr-release", WQ_UNBOUND, 0);
 	if (mn->wq == NULL) {
 		kfree(mn);
diff --git a/drivers/gpu/drm/radeon/radeon.h b/drivers/gpu/drm/radeon/radeon.h
index 5008f3d4cccc..10d0dd146808 100644
--- a/drivers/gpu/drm/radeon/radeon.h
+++ b/drivers/gpu/drm/radeon/radeon.h
@@ -924,7 +924,7 @@ struct radeon_vm_id {
 struct radeon_vm {
 	struct mutex		mutex;
 
-	struct rb_root		va;
+	struct rb_root_cached	va;
 
 	/* protecting invalidated and freed */
 	spinlock_t		status_lock;
diff --git a/drivers/gpu/drm/radeon/radeon_mn.c b/drivers/gpu/drm/radeon/radeon_mn.c
index 896f2cf51e4e..1d62288b7ee3 100644
--- a/drivers/gpu/drm/radeon/radeon_mn.c
+++ b/drivers/gpu/drm/radeon/radeon_mn.c
@@ -50,7 +50,7 @@ struct radeon_mn {
 
 	/* objects protected by lock */
 	struct mutex		lock;
-	struct rb_root		objects;
+	struct rb_root_cached	objects;
 };
 
 struct radeon_mn_node {
@@ -75,8 +75,8 @@ static void radeon_mn_destroy(struct work_struct *work)
 	mutex_lock(&rdev->mn_lock);
 	mutex_lock(&rmn->lock);
 	hash_del(&rmn->node);
-	rbtree_postorder_for_each_entry_safe(node, next_node, &rmn->objects,
-					     it.rb) {
+	rbtree_postorder_for_each_entry_safe(node, next_node,
+					     &rmn->objects.rb_root, it.rb) {
 
 		interval_tree_remove(&node->it, &rmn->objects);
 		list_for_each_entry_safe(bo, next_bo, &node->bos, mn_list) {
@@ -205,7 +205,7 @@ static struct radeon_mn *radeon_mn_get(struct radeon_device *rdev)
 	rmn->mm = mm;
 	rmn->mn.ops = &radeon_mn_ops;
 	mutex_init(&rmn->lock);
-	rmn->objects = RB_ROOT;
+	rmn->objects = RB_ROOT_CACHED;
 	
 	r = __mmu_notifier_register(&rmn->mn, mm);
 	if (r)
diff --git a/drivers/gpu/drm/radeon/radeon_vm.c b/drivers/gpu/drm/radeon/radeon_vm.c
index 5f68245579a3..f44777a6c2e8 100644
--- a/drivers/gpu/drm/radeon/radeon_vm.c
+++ b/drivers/gpu/drm/radeon/radeon_vm.c
@@ -1185,7 +1185,7 @@ int radeon_vm_init(struct radeon_device *rdev, struct radeon_vm *vm)
 		vm->ids[i].last_id_use = NULL;
 	}
 	mutex_init(&vm->mutex);
-	vm->va = RB_ROOT;
+	vm->va = RB_ROOT_CACHED;
 	spin_lock_init(&vm->status_lock);
 	INIT_LIST_HEAD(&vm->invalidated);
 	INIT_LIST_HEAD(&vm->freed);
@@ -1232,10 +1232,11 @@ void radeon_vm_fini(struct radeon_device *rdev, struct radeon_vm *vm)
 	struct radeon_bo_va *bo_va, *tmp;
 	int i, r;
 
-	if (!RB_EMPTY_ROOT(&vm->va)) {
+	if (!RB_EMPTY_ROOT(&vm->va.rb_root)) {
 		dev_err(rdev->dev, "still active bo inside vm\n");
 	}
-	rbtree_postorder_for_each_entry_safe(bo_va, tmp, &vm->va, it.rb) {
+	rbtree_postorder_for_each_entry_safe(bo_va, tmp,
+					     &vm->va.rb_root, it.rb) {
 		interval_tree_remove(&bo_va->it, &vm->va);
 		r = radeon_bo_reserve(bo_va->bo, false);
 		if (!r) {
diff --git a/drivers/infiniband/core/umem_rbtree.c b/drivers/infiniband/core/umem_rbtree.c
index d176597b4d78..fc801920e341 100644
--- a/drivers/infiniband/core/umem_rbtree.c
+++ b/drivers/infiniband/core/umem_rbtree.c
@@ -72,7 +72,7 @@ INTERVAL_TREE_DEFINE(struct umem_odp_node, rb, u64, __subtree_last,
 /* @last is not a part of the interval. See comment for function
  * node_last.
  */
-int rbt_ib_umem_for_each_in_range(struct rb_root *root,
+int rbt_ib_umem_for_each_in_range(struct rb_root_cached *root,
 				  u64 start, u64 last,
 				  umem_call_back cb,
 				  void *cookie)
@@ -95,7 +95,7 @@ int rbt_ib_umem_for_each_in_range(struct rb_root *root,
 }
 EXPORT_SYMBOL(rbt_ib_umem_for_each_in_range);
 
-struct ib_umem_odp *rbt_ib_umem_lookup(struct rb_root *root,
+struct ib_umem_odp *rbt_ib_umem_lookup(struct rb_root_cached *root,
 				       u64 addr, u64 length)
 {
 	struct umem_odp_node *node;
diff --git a/drivers/infiniband/core/uverbs_cmd.c b/drivers/infiniband/core/uverbs_cmd.c
index 3f55d18a3791..5321c6ae2c0c 100644
--- a/drivers/infiniband/core/uverbs_cmd.c
+++ b/drivers/infiniband/core/uverbs_cmd.c
@@ -117,7 +117,7 @@ ssize_t ib_uverbs_get_context(struct ib_uverbs_file *file,
 	ucontext->closing = 0;
 
 #ifdef CONFIG_INFINIBAND_ON_DEMAND_PAGING
-	ucontext->umem_tree = RB_ROOT;
+	ucontext->umem_tree = RB_ROOT_CACHED;
 	init_rwsem(&ucontext->umem_rwsem);
 	ucontext->odp_mrs_count = 0;
 	INIT_LIST_HEAD(&ucontext->no_private_counters);
diff --git a/drivers/infiniband/hw/hfi1/mmu_rb.c b/drivers/infiniband/hw/hfi1/mmu_rb.c
index d41fd87a39f2..5b6626f8feb6 100644
--- a/drivers/infiniband/hw/hfi1/mmu_rb.c
+++ b/drivers/infiniband/hw/hfi1/mmu_rb.c
@@ -54,7 +54,7 @@
 
 struct mmu_rb_handler {
 	struct mmu_notifier mn;
-	struct rb_root root;
+	struct rb_root_cached root;
 	void *ops_arg;
 	spinlock_t lock;        /* protect the RB tree */
 	struct mmu_rb_ops *ops;
@@ -111,7 +111,7 @@ int hfi1_mmu_rb_register(void *ops_arg, struct mm_struct *mm,
 	if (!handlr)
 		return -ENOMEM;
 
-	handlr->root = RB_ROOT;
+	handlr->root = RB_ROOT_CACHED;
 	handlr->ops = ops;
 	handlr->ops_arg = ops_arg;
 	INIT_HLIST_NODE(&handlr->mn.hlist);
@@ -152,9 +152,9 @@ void hfi1_mmu_rb_unregister(struct mmu_rb_handler *handler)
 	INIT_LIST_HEAD(&del_list);
 
 	spin_lock_irqsave(&handler->lock, flags);
-	while ((node = rb_first(&handler->root))) {
+	while ((node = rb_first_cached(&handler->root))) {
 		rbnode = rb_entry(node, struct mmu_rb_node, node);
-		rb_erase(node, &handler->root);
+		rb_erase_cached(node, &handler->root);
 		/* move from LRU list to delete list */
 		list_move(&rbnode->list, &del_list);
 	}
@@ -311,7 +311,7 @@ static void mmu_notifier_mem_invalidate(struct mmu_notifier *mn,
 {
 	struct mmu_rb_handler *handler =
 		container_of(mn, struct mmu_rb_handler, mn);
-	struct rb_root *root = &handler->root;
+	struct rb_root_cached *root = &handler->root;
 	struct mmu_rb_node *node, *ptr = NULL;
 	unsigned long flags;
 	bool added = false;
diff --git a/drivers/infiniband/hw/usnic/usnic_uiom.c b/drivers/infiniband/hw/usnic/usnic_uiom.c
index c49db7c33979..4381c0a9a873 100644
--- a/drivers/infiniband/hw/usnic/usnic_uiom.c
+++ b/drivers/infiniband/hw/usnic/usnic_uiom.c
@@ -227,7 +227,7 @@ static void __usnic_uiom_reg_release(struct usnic_uiom_pd *pd,
 	vpn_last = vpn_start + npages - 1;
 
 	spin_lock(&pd->lock);
-	usnic_uiom_remove_interval(&pd->rb_root, vpn_start,
+	usnic_uiom_remove_interval(&pd->root, vpn_start,
 					vpn_last, &rm_intervals);
 	usnic_uiom_unmap_sorted_intervals(&rm_intervals, pd);
 
@@ -379,7 +379,7 @@ struct usnic_uiom_reg *usnic_uiom_reg_get(struct usnic_uiom_pd *pd,
 	err = usnic_uiom_get_intervals_diff(vpn_start, vpn_last,
 						(writable) ? IOMMU_WRITE : 0,
 						IOMMU_WRITE,
-						&pd->rb_root,
+						&pd->root,
 						&sorted_diff_intervals);
 	if (err) {
 		usnic_err("Failed disjoint interval vpn [0x%lx,0x%lx] err %d\n",
@@ -395,7 +395,7 @@ struct usnic_uiom_reg *usnic_uiom_reg_get(struct usnic_uiom_pd *pd,
 
 	}
 
-	err = usnic_uiom_insert_interval(&pd->rb_root, vpn_start, vpn_last,
+	err = usnic_uiom_insert_interval(&pd->root, vpn_start, vpn_last,
 					(writable) ? IOMMU_WRITE : 0);
 	if (err) {
 		usnic_err("Failed insert interval vpn [0x%lx,0x%lx] err %d\n",
diff --git a/drivers/infiniband/hw/usnic/usnic_uiom.h b/drivers/infiniband/hw/usnic/usnic_uiom.h
index 45ca7c1613a7..431efe4143f4 100644
--- a/drivers/infiniband/hw/usnic/usnic_uiom.h
+++ b/drivers/infiniband/hw/usnic/usnic_uiom.h
@@ -55,7 +55,7 @@ struct usnic_uiom_dev {
 struct usnic_uiom_pd {
 	struct iommu_domain		*domain;
 	spinlock_t			lock;
-	struct rb_root			rb_root;
+	struct rb_root_cached		root;
 	struct list_head		devs;
 	int				dev_cnt;
 };
diff --git a/drivers/infiniband/hw/usnic/usnic_uiom_interval_tree.c b/drivers/infiniband/hw/usnic/usnic_uiom_interval_tree.c
index 42b4b4c4e452..d399523206c7 100644
--- a/drivers/infiniband/hw/usnic/usnic_uiom_interval_tree.c
+++ b/drivers/infiniband/hw/usnic/usnic_uiom_interval_tree.c
@@ -100,9 +100,9 @@ static int interval_cmp(void *priv, struct list_head *a, struct list_head *b)
 }
 
 static void
-find_intervals_intersection_sorted(struct rb_root *root, unsigned long start,
-					unsigned long last,
-					struct list_head *list)
+find_intervals_intersection_sorted(struct rb_root_cached *root,
+				   unsigned long start, unsigned long last,
+				   struct list_head *list)
 {
 	struct usnic_uiom_interval_node *node;
 
@@ -118,7 +118,7 @@ find_intervals_intersection_sorted(struct rb_root *root, unsigned long start,
 
 int usnic_uiom_get_intervals_diff(unsigned long start, unsigned long last,
 					int flags, int flag_mask,
-					struct rb_root *root,
+					struct rb_root_cached *root,
 					struct list_head *diff_set)
 {
 	struct usnic_uiom_interval_node *interval, *tmp;
@@ -175,7 +175,7 @@ void usnic_uiom_put_interval_set(struct list_head *intervals)
 		kfree(interval);
 }
 
-int usnic_uiom_insert_interval(struct rb_root *root, unsigned long start,
+int usnic_uiom_insert_interval(struct rb_root_cached *root, unsigned long start,
 				unsigned long last, int flags)
 {
 	struct usnic_uiom_interval_node *interval, *tmp;
@@ -246,8 +246,9 @@ int usnic_uiom_insert_interval(struct rb_root *root, unsigned long start,
 	return err;
 }
 
-void usnic_uiom_remove_interval(struct rb_root *root, unsigned long start,
-				unsigned long last, struct list_head *removed)
+void usnic_uiom_remove_interval(struct rb_root_cached *root,
+				unsigned long start, unsigned long last,
+				struct list_head *removed)
 {
 	struct usnic_uiom_interval_node *interval;
 
diff --git a/drivers/infiniband/hw/usnic/usnic_uiom_interval_tree.h b/drivers/infiniband/hw/usnic/usnic_uiom_interval_tree.h
index c0b0b876ab90..1d7fc3226bca 100644
--- a/drivers/infiniband/hw/usnic/usnic_uiom_interval_tree.h
+++ b/drivers/infiniband/hw/usnic/usnic_uiom_interval_tree.h
@@ -48,12 +48,12 @@ struct usnic_uiom_interval_node {
 
 extern void
 usnic_uiom_interval_tree_insert(struct usnic_uiom_interval_node *node,
-					struct rb_root *root);
+					struct rb_root_cached *root);
 extern void
 usnic_uiom_interval_tree_remove(struct usnic_uiom_interval_node *node,
-					struct rb_root *root);
+					struct rb_root_cached *root);
 extern struct usnic_uiom_interval_node *
-usnic_uiom_interval_tree_iter_first(struct rb_root *root,
+usnic_uiom_interval_tree_iter_first(struct rb_root_cached *root,
 					unsigned long start,
 					unsigned long last);
 extern struct usnic_uiom_interval_node *
@@ -63,7 +63,7 @@ usnic_uiom_interval_tree_iter_next(struct usnic_uiom_interval_node *node,
  * Inserts {start...last} into {root}.  If there are overlaps,
  * nodes will be broken up and merged
  */
-int usnic_uiom_insert_interval(struct rb_root *root,
+int usnic_uiom_insert_interval(struct rb_root_cached *root,
 				unsigned long start, unsigned long last,
 				int flags);
 /*
@@ -71,7 +71,7 @@ int usnic_uiom_insert_interval(struct rb_root *root,
  * 'removed.' The caller is responsibile for freeing memory of nodes in
  * 'removed.'
  */
-void usnic_uiom_remove_interval(struct rb_root *root,
+void usnic_uiom_remove_interval(struct rb_root_cached *root,
 				unsigned long start, unsigned long last,
 				struct list_head *removed);
 /*
@@ -81,7 +81,7 @@ void usnic_uiom_remove_interval(struct rb_root *root,
 int usnic_uiom_get_intervals_diff(unsigned long start,
 					unsigned long last, int flags,
 					int flag_mask,
-					struct rb_root *root,
+					struct rb_root_cached *root,
 					struct list_head *diff_set);
 /* Call this to free diff_set returned by usnic_uiom_get_intervals_diff */
 void usnic_uiom_put_interval_set(struct list_head *intervals);
diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
index e4613a3c362d..88dc214de068 100644
--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -1272,7 +1272,7 @@ static struct vhost_umem *vhost_umem_alloc(void)
 	if (!umem)
 		return NULL;
 
-	umem->umem_tree = RB_ROOT;
+	umem->umem_tree = RB_ROOT_CACHED;
 	umem->numem = 0;
 	INIT_LIST_HEAD(&umem->umem_list);
 
diff --git a/drivers/vhost/vhost.h b/drivers/vhost/vhost.h
index f72095868b93..a0278ba6a8b4 100644
--- a/drivers/vhost/vhost.h
+++ b/drivers/vhost/vhost.h
@@ -71,7 +71,7 @@ struct vhost_umem_node {
 };
 
 struct vhost_umem {
-	struct rb_root umem_tree;
+	struct rb_root_cached umem_tree;
 	struct list_head umem_list;
 	int numem;
 };
diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
index 28d2753be094..e61b91e6adf4 100644
--- a/fs/hugetlbfs/inode.c
+++ b/fs/hugetlbfs/inode.c
@@ -334,7 +334,7 @@ static void remove_huge_page(struct page *page)
 }
 
 static void
-hugetlb_vmdelete_list(struct rb_root *root, pgoff_t start, pgoff_t end)
+hugetlb_vmdelete_list(struct rb_root_cached *root, pgoff_t start, pgoff_t end)
 {
 	struct vm_area_struct *vma;
 
@@ -514,7 +514,7 @@ static int hugetlb_vmtruncate(struct inode *inode, loff_t offset)
 
 	i_size_write(inode, offset);
 	i_mmap_lock_write(mapping);
-	if (!RB_EMPTY_ROOT(&mapping->i_mmap))
+	if (!RB_EMPTY_ROOT(&mapping->i_mmap.rb_root))
 		hugetlb_vmdelete_list(&mapping->i_mmap, pgoff, 0);
 	i_mmap_unlock_write(mapping);
 	remove_inode_hugepages(inode, offset, LLONG_MAX);
@@ -539,7 +539,7 @@ static long hugetlbfs_punch_hole(struct inode *inode, loff_t offset, loff_t len)
 
 		inode_lock(inode);
 		i_mmap_lock_write(mapping);
-		if (!RB_EMPTY_ROOT(&mapping->i_mmap))
+		if (!RB_EMPTY_ROOT(&mapping->i_mmap.rb_root))
 			hugetlb_vmdelete_list(&mapping->i_mmap,
 						hole_start >> PAGE_SHIFT,
 						hole_end  >> PAGE_SHIFT);
diff --git a/fs/inode.c b/fs/inode.c
index 50370599e371..5ddb808ea934 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -353,7 +353,7 @@ void address_space_init_once(struct address_space *mapping)
 	init_rwsem(&mapping->i_mmap_rwsem);
 	INIT_LIST_HEAD(&mapping->private_list);
 	spin_lock_init(&mapping->private_lock);
-	mapping->i_mmap = RB_ROOT;
+	mapping->i_mmap = RB_ROOT_CACHED;
 }
 EXPORT_SYMBOL(address_space_init_once);
 
diff --git a/include/drm/drm_mm.h b/include/drm/drm_mm.h
index 49b292e98fec..8d10fc97801c 100644
--- a/include/drm/drm_mm.h
+++ b/include/drm/drm_mm.h
@@ -172,7 +172,7 @@ struct drm_mm {
 	 * according to the (increasing) start address of the memory node. */
 	struct drm_mm_node head_node;
 	/* Keep an interval_tree for fast lookup of drm_mm_nodes by address. */
-	struct rb_root interval_tree;
+	struct rb_root_cached interval_tree;
 	struct rb_root holes_size;
 	struct rb_root holes_addr;
 
diff --git a/include/linux/fs.h b/include/linux/fs.h
index ed8798255872..d1d521c5025b 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -392,7 +392,7 @@ struct address_space {
 	struct radix_tree_root	page_tree;	/* radix tree of all pages */
 	spinlock_t		tree_lock;	/* and lock protecting it */
 	atomic_t		i_mmap_writable;/* count VM_SHARED mappings */
-	struct rb_root		i_mmap;		/* tree of private and shared mappings */
+	struct rb_root_cached	i_mmap;		/* tree of private and shared mappings */
 	struct rw_semaphore	i_mmap_rwsem;	/* protect tree, count, list */
 	/* Protected by tree_lock together with the radix tree */
 	unsigned long		nrpages;	/* number of total pages */
@@ -486,7 +486,7 @@ static inline void i_mmap_unlock_read(struct address_space *mapping)
  */
 static inline int mapping_mapped(struct address_space *mapping)
 {
-	return	!RB_EMPTY_ROOT(&mapping->i_mmap);
+	return	!RB_EMPTY_ROOT(&mapping->i_mmap.rb_root);
 }
 
 /*
diff --git a/include/linux/interval_tree.h b/include/linux/interval_tree.h
index 724556aa3c95..202ee1283f4b 100644
--- a/include/linux/interval_tree.h
+++ b/include/linux/interval_tree.h
@@ -11,13 +11,15 @@ struct interval_tree_node {
 };
 
 extern void
-interval_tree_insert(struct interval_tree_node *node, struct rb_root *root);
+interval_tree_insert(struct interval_tree_node *node,
+		     struct rb_root_cached *root);
 
 extern void
-interval_tree_remove(struct interval_tree_node *node, struct rb_root *root);
+interval_tree_remove(struct interval_tree_node *node,
+		     struct rb_root_cached *root);
 
 extern struct interval_tree_node *
-interval_tree_iter_first(struct rb_root *root,
+interval_tree_iter_first(struct rb_root_cached *root,
 			 unsigned long start, unsigned long last);
 
 extern struct interval_tree_node *
diff --git a/include/linux/interval_tree_generic.h b/include/linux/interval_tree_generic.h
index 58370e1862ad..f096423c8cbd 100644
--- a/include/linux/interval_tree_generic.h
+++ b/include/linux/interval_tree_generic.h
@@ -65,11 +65,13 @@ RB_DECLARE_CALLBACKS(static, ITPREFIX ## _augment, ITSTRUCT, ITRB,	      \
 									      \
 /* Insert / remove interval nodes from the tree */			      \
 									      \
-ITSTATIC void ITPREFIX ## _insert(ITSTRUCT *node, struct rb_root *root)	      \
+ITSTATIC void ITPREFIX ## _insert(ITSTRUCT *node,			      \
+				  struct rb_root_cached *root)	 	      \
 {									      \
-	struct rb_node **link = &root->rb_node, *rb_parent = NULL;	      \
+	struct rb_node **link = &root->rb_root.rb_node, *rb_parent = NULL;    \
 	ITTYPE start = ITSTART(node), last = ITLAST(node);		      \
 	ITSTRUCT *parent;						      \
+	bool leftmost = true;						      \
 									      \
 	while (*link) {							      \
 		rb_parent = *link;					      \
@@ -78,18 +80,22 @@ ITSTATIC void ITPREFIX ## _insert(ITSTRUCT *node, struct rb_root *root)	      \
 			parent->ITSUBTREE = last;			      \
 		if (start < ITSTART(parent))				      \
 			link = &parent->ITRB.rb_left;			      \
-		else							      \
+		else {							      \
 			link = &parent->ITRB.rb_right;			      \
+			leftmost = false;				      \
+		}							      \
 	}								      \
 									      \
 	node->ITSUBTREE = last;						      \
 	rb_link_node(&node->ITRB, rb_parent, link);			      \
-	rb_insert_augmented(&node->ITRB, root, &ITPREFIX ## _augment);	      \
+	rb_insert_augmented_cached(&node->ITRB, root,			      \
+				   leftmost, &ITPREFIX ## _augment);	      \
 }									      \
 									      \
-ITSTATIC void ITPREFIX ## _remove(ITSTRUCT *node, struct rb_root *root)	      \
+ITSTATIC void ITPREFIX ## _remove(ITSTRUCT *node,			      \
+				  struct rb_root_cached *root)		      \
 {									      \
-	rb_erase_augmented(&node->ITRB, root, &ITPREFIX ## _augment);	      \
+	rb_erase_augmented_cached(&node->ITRB, root, &ITPREFIX ## _augment);  \
 }									      \
 									      \
 /*									      \
@@ -140,15 +146,35 @@ ITPREFIX ## _subtree_search(ITSTRUCT *node, ITTYPE start, ITTYPE last)	      \
 }									      \
 									      \
 ITSTATIC ITSTRUCT *							      \
-ITPREFIX ## _iter_first(struct rb_root *root, ITTYPE start, ITTYPE last)      \
+ITPREFIX ## _iter_first(struct rb_root_cached *root,			      \
+			ITTYPE start, ITTYPE last)			      \
 {									      \
-	ITSTRUCT *node;							      \
+	ITSTRUCT *node, *leftmost;					      \
 									      \
-	if (!root->rb_node)						      \
+	if (!root->rb_root.rb_node)					      \
 		return NULL;						      \
-	node = rb_entry(root->rb_node, ITSTRUCT, ITRB);			      \
+									      \
+	/*								      \
+	 * Fastpath range intersection/overlap between A: [a0, a1] and	      \
+	 * B: [b0, b1] is given by:					      \
+	 *								      \
+	 *         a0 <= b1 && b0 <= a1					      \
+	 *								      \
+	 *  ... where A holds the query range and B holds the smallest	      \
+	 * 'start' and largest 'last' in the tree. For the latter, we	      \
+	 * rely on the root node, which by the augmented interval tree	      \
+	 * property holds the largest value in its last-in-subtree.	      \
+	 * This allows mitigating some of the tree walk overhead for	      \
+	 * non-intersecting ranges, maintained and consulted in O(1).	      \
+	 */								      \
+	node = rb_entry(root->rb_root.rb_node, ITSTRUCT, ITRB);		      \
 	if (node->ITSUBTREE < start)					      \
 		return NULL;						      \
+									      \
+	leftmost = rb_entry(root->rb_leftmost, ITSTRUCT, ITRB);		      \
+	if (ITSTART(leftmost) > last)					      \
+		return NULL;						      \
+									      \
 	return ITPREFIX ## _subtree_search(node, start, last);		      \
 }									      \
 									      \
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 46b9ac5e8569..3a2652efbbfb 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1992,13 +1992,13 @@ extern int nommu_shrink_inode_mappings(struct inode *, size_t, size_t);
 
 /* interval_tree.c */
 void vma_interval_tree_insert(struct vm_area_struct *node,
-			      struct rb_root *root);
+			      struct rb_root_cached *root);
 void vma_interval_tree_insert_after(struct vm_area_struct *node,
 				    struct vm_area_struct *prev,
-				    struct rb_root *root);
+				    struct rb_root_cached *root);
 void vma_interval_tree_remove(struct vm_area_struct *node,
-			      struct rb_root *root);
-struct vm_area_struct *vma_interval_tree_iter_first(struct rb_root *root,
+			      struct rb_root_cached *root);
+struct vm_area_struct *vma_interval_tree_iter_first(struct rb_root_cached *root,
 				unsigned long start, unsigned long last);
 struct vm_area_struct *vma_interval_tree_iter_next(struct vm_area_struct *node,
 				unsigned long start, unsigned long last);
@@ -2008,11 +2008,12 @@ struct vm_area_struct *vma_interval_tree_iter_next(struct vm_area_struct *node,
 	     vma; vma = vma_interval_tree_iter_next(vma, start, last))
 
 void anon_vma_interval_tree_insert(struct anon_vma_chain *node,
-				   struct rb_root *root);
+				   struct rb_root_cached *root);
 void anon_vma_interval_tree_remove(struct anon_vma_chain *node,
-				   struct rb_root *root);
-struct anon_vma_chain *anon_vma_interval_tree_iter_first(
-	struct rb_root *root, unsigned long start, unsigned long last);
+				   struct rb_root_cached *root);
+struct anon_vma_chain *
+anon_vma_interval_tree_iter_first(struct rb_root_cached *root,
+				  unsigned long start, unsigned long last);
 struct anon_vma_chain *anon_vma_interval_tree_iter_next(
 	struct anon_vma_chain *node, unsigned long start, unsigned long last);
 #ifdef CONFIG_DEBUG_VM_RB
diff --git a/include/linux/rmap.h b/include/linux/rmap.h
index 43ef2c30cb0f..22c298c6cc26 100644
--- a/include/linux/rmap.h
+++ b/include/linux/rmap.h
@@ -55,7 +55,9 @@ struct anon_vma {
 	 * is serialized by a system wide lock only visible to
 	 * mm_take_all_locks() (mm_all_locks_mutex).
 	 */
-	struct rb_root rb_root;	/* Interval tree of private "related" vmas */
+
+	/* Interval tree of private "related" vmas */
+	struct rb_root_cached rb_root;
 };
 
 /*
diff --git a/include/rdma/ib_umem_odp.h b/include/rdma/ib_umem_odp.h
index fb67554aabd6..5eb7f5bc8248 100644
--- a/include/rdma/ib_umem_odp.h
+++ b/include/rdma/ib_umem_odp.h
@@ -111,22 +111,25 @@ int ib_umem_odp_map_dma_pages(struct ib_umem *umem, u64 start_offset, u64 bcnt,
 void ib_umem_odp_unmap_dma_pages(struct ib_umem *umem, u64 start_offset,
 				 u64 bound);
 
-void rbt_ib_umem_insert(struct umem_odp_node *node, struct rb_root *root);
-void rbt_ib_umem_remove(struct umem_odp_node *node, struct rb_root *root);
+void rbt_ib_umem_insert(struct umem_odp_node *node,
+			struct rb_root_cached *root);
+void rbt_ib_umem_remove(struct umem_odp_node *node,
+			struct rb_root_cached *root);
 typedef int (*umem_call_back)(struct ib_umem *item, u64 start, u64 end,
 			      void *cookie);
 /*
  * Call the callback on each ib_umem in the range. Returns the logical or of
  * the return values of the functions called.
  */
-int rbt_ib_umem_for_each_in_range(struct rb_root *root, u64 start, u64 end,
+int rbt_ib_umem_for_each_in_range(struct rb_root_cached *root,
+				  u64 start, u64 end,
 				  umem_call_back cb, void *cookie);
 
 /*
  * Find first region intersecting with address range.
  * Return NULL if not found
  */
-struct ib_umem_odp *rbt_ib_umem_lookup(struct rb_root *root,
+struct ib_umem_odp *rbt_ib_umem_lookup(struct rb_root_cached *root,
 				       u64 addr, u64 length);
 
 static inline int ib_umem_mmu_notifier_retry(struct ib_umem *item,
diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
index 593ad2640d2f..81665480a4ab 100644
--- a/include/rdma/ib_verbs.h
+++ b/include/rdma/ib_verbs.h
@@ -1420,7 +1420,7 @@ struct ib_ucontext {
 
 	struct pid             *tgid;
 #ifdef CONFIG_INFINIBAND_ON_DEMAND_PAGING
-	struct rb_root      umem_tree;
+	struct rb_root_cached   umem_tree;
 	/*
 	 * Protects .umem_rbroot and tree, as well as odp_mrs_count and
 	 * mmu notifiers registration.
diff --git a/lib/interval_tree_test.c b/lib/interval_tree_test.c
index df495fe81421..0e343fd29570 100644
--- a/lib/interval_tree_test.c
+++ b/lib/interval_tree_test.c
@@ -19,14 +19,14 @@ __param(bool, search_all, false, "Searches will iterate all nodes in the tree");
 
 __param(uint, max_endpoint, ~0, "Largest value for the interval's endpoint");
 
-static struct rb_root root = RB_ROOT;
+static struct rb_root_cached root = RB_ROOT_CACHED;
 static struct interval_tree_node *nodes = NULL;
 static u32 *queries = NULL;
 
 static struct rnd_state rnd;
 
 static inline unsigned long
-search(struct rb_root *root, unsigned long start, unsigned long last)
+search(struct rb_root_cached *root, unsigned long start, unsigned long last)
 {
 	struct interval_tree_node *node;
 	unsigned long results = 0;
diff --git a/mm/interval_tree.c b/mm/interval_tree.c
index f2c2492681bf..b47664358796 100644
--- a/mm/interval_tree.c
+++ b/mm/interval_tree.c
@@ -28,7 +28,7 @@ INTERVAL_TREE_DEFINE(struct vm_area_struct, shared.rb,
 /* Insert node immediately after prev in the interval tree */
 void vma_interval_tree_insert_after(struct vm_area_struct *node,
 				    struct vm_area_struct *prev,
-				    struct rb_root *root)
+				    struct rb_root_cached *root)
 {
 	struct rb_node **link;
 	struct vm_area_struct *parent;
@@ -55,7 +55,7 @@ void vma_interval_tree_insert_after(struct vm_area_struct *node,
 
 	node->shared.rb_subtree_last = last;
 	rb_link_node(&node->shared.rb, &parent->shared.rb, link);
-	rb_insert_augmented(&node->shared.rb, root,
+	rb_insert_augmented(&node->shared.rb, &root->rb_root,
 			    &vma_interval_tree_augment);
 }
 
@@ -74,7 +74,7 @@ INTERVAL_TREE_DEFINE(struct anon_vma_chain, rb, unsigned long, rb_subtree_last,
 		     static inline, __anon_vma_interval_tree)
 
 void anon_vma_interval_tree_insert(struct anon_vma_chain *node,
-				   struct rb_root *root)
+				   struct rb_root_cached *root)
 {
 #ifdef CONFIG_DEBUG_VM_RB
 	node->cached_vma_start = avc_start_pgoff(node);
@@ -84,13 +84,13 @@ void anon_vma_interval_tree_insert(struct anon_vma_chain *node,
 }
 
 void anon_vma_interval_tree_remove(struct anon_vma_chain *node,
-				   struct rb_root *root)
+				   struct rb_root_cached *root)
 {
 	__anon_vma_interval_tree_remove(node, root);
 }
 
 struct anon_vma_chain *
-anon_vma_interval_tree_iter_first(struct rb_root *root,
+anon_vma_interval_tree_iter_first(struct rb_root_cached *root,
 				  unsigned long first, unsigned long last)
 {
 	return __anon_vma_interval_tree_iter_first(root, first, last);
diff --git a/mm/memory.c b/mm/memory.c
index 0e517be91a89..33cb79f73394 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -2593,7 +2593,7 @@ static void unmap_mapping_range_vma(struct vm_area_struct *vma,
 	zap_page_range_single(vma, start_addr, end_addr - start_addr, details);
 }
 
-static inline void unmap_mapping_range_tree(struct rb_root *root,
+static inline void unmap_mapping_range_tree(struct rb_root_cached *root,
 					    struct zap_details *details)
 {
 	struct vm_area_struct *vma;
@@ -2657,7 +2657,7 @@ void unmap_mapping_range(struct address_space *mapping,
 		details.last_index = ULONG_MAX;
 
 	i_mmap_lock_write(mapping);
-	if (unlikely(!RB_EMPTY_ROOT(&mapping->i_mmap)))
+	if (unlikely(!RB_EMPTY_ROOT(&mapping->i_mmap.rb_root)))
 		unmap_mapping_range_tree(&mapping->i_mmap, &details);
 	i_mmap_unlock_write(mapping);
 }
diff --git a/mm/mmap.c b/mm/mmap.c
index f19efcf75418..8121c70df96f 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -684,7 +684,7 @@ int __vma_adjust(struct vm_area_struct *vma, unsigned long start,
 	struct mm_struct *mm = vma->vm_mm;
 	struct vm_area_struct *next = vma->vm_next, *orig_vma = vma;
 	struct address_space *mapping = NULL;
-	struct rb_root *root = NULL;
+	struct rb_root_cached *root = NULL;
 	struct anon_vma *anon_vma = NULL;
 	struct file *file = vma->vm_file;
 	bool start_changed = false, end_changed = false;
@@ -3314,7 +3314,7 @@ static DEFINE_MUTEX(mm_all_locks_mutex);
 
 static void vm_lock_anon_vma(struct mm_struct *mm, struct anon_vma *anon_vma)
 {
-	if (!test_bit(0, (unsigned long *) &anon_vma->root->rb_root.rb_node)) {
+	if (!test_bit(0, (unsigned long *) &anon_vma->rb_root.rb_root.rb_node)) {
 		/*
 		 * The LSB of head.next can't change from under us
 		 * because we hold the mm_all_locks_mutex.
@@ -3330,7 +3330,7 @@ static void vm_lock_anon_vma(struct mm_struct *mm, struct anon_vma *anon_vma)
 		 * anon_vma->root->rwsem.
 		 */
 		if (__test_and_set_bit(0, (unsigned long *)
-				       &anon_vma->root->rb_root.rb_node))
+				       &anon_vma->root->rb_root.rb_root.rb_node))
 			BUG();
 	}
 }
@@ -3432,7 +3432,7 @@ int mm_take_all_locks(struct mm_struct *mm)
 
 static void vm_unlock_anon_vma(struct anon_vma *anon_vma)
 {
-	if (test_bit(0, (unsigned long *) &anon_vma->root->rb_root.rb_node)) {
+	if (test_bit(0, (unsigned long *) &anon_vma->root->rb_root.rb_root.rb_node)) {
 		/*
 		 * The LSB of head.next can't change to 0 from under
 		 * us because we hold the mm_all_locks_mutex.
@@ -3446,7 +3446,7 @@ static void vm_unlock_anon_vma(struct anon_vma *anon_vma)
 		 * anon_vma->root->rwsem.
 		 */
 		if (!__test_and_clear_bit(0, (unsigned long *)
-					  &anon_vma->root->rb_root.rb_node))
+					  &anon_vma->root->rb_root.rb_root.rb_node))
 			BUG();
 		anon_vma_unlock_write(anon_vma);
 	}
diff --git a/mm/rmap.c b/mm/rmap.c
index ced14f1af6dc..ad479e5e081d 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -390,7 +390,7 @@ void unlink_anon_vmas(struct vm_area_struct *vma)
 		 * Leave empty anon_vmas on the list - we'll need
 		 * to free them outside the lock.
 		 */
-		if (RB_EMPTY_ROOT(&anon_vma->rb_root)) {
+		if (RB_EMPTY_ROOT(&anon_vma->rb_root.rb_root)) {
 			anon_vma->parent->degree--;
 			continue;
 		}
@@ -424,7 +424,7 @@ static void anon_vma_ctor(void *data)
 
 	init_rwsem(&anon_vma->rwsem);
 	atomic_set(&anon_vma->refcount, 0);
-	anon_vma->rb_root = RB_ROOT;
+	anon_vma->rb_root = RB_ROOT_CACHED;
 }
 
 void __init anon_vma_init(void)
-- 
2.12.0

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH 12/17] lib/interval-tree: correct comment wrt generic flavor
  2017-07-19  1:45 [PATCH -next v4 00/17] rbtree: cache leftmost node internally Davidlohr Bueso
                   ` (10 preceding siblings ...)
  2017-07-19  1:45 ` [PATCH 11/17] lib/interval_tree: fast overlap detection Davidlohr Bueso
@ 2017-07-19  1:45 ` Davidlohr Bueso
  2017-07-19  1:45 ` [PATCH 13/17] procfs: use faster rb_first_cached() Davidlohr Bueso
                   ` (4 subsequent siblings)
  16 siblings, 0 replies; 29+ messages in thread
From: Davidlohr Bueso @ 2017-07-19  1:45 UTC (permalink / raw)
  To: akpm
  Cc: mingo, peterz, jack, torvalds, kirill.shutemov, hch, ldufour,
	mhocko, mgorman, dave, linux-kernel, Davidlohr Bueso

interval_tree.h _is_ the generic flavor.

Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
---
 include/linux/interval_tree_generic.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/linux/interval_tree_generic.h b/include/linux/interval_tree_generic.h
index f096423c8cbd..1f97ce26cccc 100644
--- a/include/linux/interval_tree_generic.h
+++ b/include/linux/interval_tree_generic.h
@@ -33,7 +33,7 @@
  * ITSTATIC:   'static' or empty
  * ITPREFIX:   prefix to use for the inline tree definitions
  *
- * Note - before using this, please consider if non-generic version
+ * Note - before using this, please consider if the generic version
  * (interval_tree.h) would work for you...
  */
 
-- 
2.12.0

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH 13/17] procfs: use faster rb_first_cached()
  2017-07-19  1:45 [PATCH -next v4 00/17] rbtree: cache leftmost node internally Davidlohr Bueso
                   ` (11 preceding siblings ...)
  2017-07-19  1:45 ` [PATCH 12/17] lib/interval-tree: correct comment wrt generic flavor Davidlohr Bueso
@ 2017-07-19  1:45 ` Davidlohr Bueso
  2017-07-19  1:46 ` [PATCH 14/17] fs/epoll: " Davidlohr Bueso
                   ` (3 subsequent siblings)
  16 siblings, 0 replies; 29+ messages in thread
From: Davidlohr Bueso @ 2017-07-19  1:45 UTC (permalink / raw)
  To: akpm
  Cc: mingo, peterz, jack, torvalds, kirill.shutemov, hch, ldufour,
	mhocko, mgorman, dave, linux-kernel, Davidlohr Bueso

... such that we can avoid the tree walks to get the
node with the smallest key. Semantically the same
as the previously used rb_first(), but O(1). The
main overhead is the extra footprint for the cached
rb_node pointer, which should not matter for procfs.
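
Roughly, and not part of the patch, the conversion trades a walk down
the left spine for a single pointer read; 'root' below stands for any
rb_root_cached kept up to date via the *_cached insert/erase helpers:

#include <linux/rbtree.h>

static struct rb_node *demo_smallest(struct rb_root_cached *root)
{
	/* Old way: rb_first() walks left from the root, O(log n). */
	/* return rb_first(&root->rb_root); */

	/* New way: read the cached leftmost pointer, O(1). */
	return rb_first_cached(root);
}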

Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
---
 fs/proc/generic.c  | 26 ++++++++++++++------------
 fs/proc/internal.h |  2 +-
 fs/proc/proc_net.c |  2 +-
 fs/proc/root.c     |  2 +-
 4 files changed, 17 insertions(+), 15 deletions(-)

diff --git a/fs/proc/generic.c b/fs/proc/generic.c
index e3cda0b5968f..ab6496356dc2 100644
--- a/fs/proc/generic.c
+++ b/fs/proc/generic.c
@@ -40,8 +40,8 @@ static int proc_match(unsigned int len, const char *name, struct proc_dir_entry
 
 static struct proc_dir_entry *pde_subdir_first(struct proc_dir_entry *dir)
 {
-	return rb_entry_safe(rb_first(&dir->subdir), struct proc_dir_entry,
-			     subdir_node);
+	return rb_entry_safe(rb_first_cached(&dir->subdir),
+			     struct proc_dir_entry, subdir_node);
 }
 
 static struct proc_dir_entry *pde_subdir_next(struct proc_dir_entry *dir)
@@ -54,7 +54,7 @@ static struct proc_dir_entry *pde_subdir_find(struct proc_dir_entry *dir,
 					      const char *name,
 					      unsigned int len)
 {
-	struct rb_node *node = dir->subdir.rb_node;
+	struct rb_node *node = dir->subdir.rb_root.rb_node;
 
 	while (node) {
 		struct proc_dir_entry *de = rb_entry(node,
@@ -75,8 +75,9 @@ static struct proc_dir_entry *pde_subdir_find(struct proc_dir_entry *dir,
 static bool pde_subdir_insert(struct proc_dir_entry *dir,
 			      struct proc_dir_entry *de)
 {
-	struct rb_root *root = &dir->subdir;
-	struct rb_node **new = &root->rb_node, *parent = NULL;
+	struct rb_root_cached *root = &dir->subdir;
+	struct rb_node **new = &root->rb_root.rb_node, *parent = NULL;
+	bool leftmost = true;
 
 	/* Figure out where to put new node */
 	while (*new) {
@@ -88,15 +89,16 @@ static bool pde_subdir_insert(struct proc_dir_entry *dir,
 		parent = *new;
 		if (result < 0)
 			new = &(*new)->rb_left;
-		else if (result > 0)
+		else if (result > 0) {
 			new = &(*new)->rb_right;
-		else
+			leftmost = false;
+		} else
 			return false;
 	}
 
 	/* Add new node and rebalance tree. */
 	rb_link_node(&de->subdir_node, parent, new);
-	rb_insert_color(&de->subdir_node, root);
+	rb_insert_color_cached(&de->subdir_node, root, leftmost);
 	return true;
 }
 
@@ -369,7 +371,7 @@ static struct proc_dir_entry *__proc_create(struct proc_dir_entry **parent,
 	ent->namelen = qstr.len;
 	ent->mode = mode;
 	ent->nlink = nlink;
-	ent->subdir = RB_ROOT;
+	ent->subdir = RB_ROOT_CACHED;
 	atomic_set(&ent->count, 1);
 	spin_lock_init(&ent->pde_unload_lock);
 	INIT_LIST_HEAD(&ent->pde_openers);
@@ -545,7 +547,7 @@ void remove_proc_entry(const char *name, struct proc_dir_entry *parent)
 
 	de = pde_subdir_find(parent, fn, len);
 	if (de)
-		rb_erase(&de->subdir_node, &parent->subdir);
+		rb_erase_cached(&de->subdir_node, &parent->subdir);
 	write_unlock(&proc_subdir_lock);
 	if (!de) {
 		WARN(1, "name '%s'\n", name);
@@ -582,13 +584,13 @@ int remove_proc_subtree(const char *name, struct proc_dir_entry *parent)
 		write_unlock(&proc_subdir_lock);
 		return -ENOENT;
 	}
-	rb_erase(&root->subdir_node, &parent->subdir);
+	rb_erase_cached(&root->subdir_node, &parent->subdir);
 
 	de = root;
 	while (1) {
 		next = pde_subdir_first(de);
 		if (next) {
-			rb_erase(&next->subdir_node, &de->subdir);
+			rb_erase_cached(&next->subdir_node, &de->subdir);
 			de = next;
 			continue;
 		}
diff --git a/fs/proc/internal.h b/fs/proc/internal.h
index aa2b89071630..8382d08385a9 100644
--- a/fs/proc/internal.h
+++ b/fs/proc/internal.h
@@ -40,7 +40,7 @@ struct proc_dir_entry {
 	const struct inode_operations *proc_iops;
 	const struct file_operations *proc_fops;
 	struct proc_dir_entry *parent;
-	struct rb_root subdir;
+	struct rb_root_cached subdir;
 	struct rb_node subdir_node;
 	void *data;
 	atomic_t count;		/* use count */
diff --git a/fs/proc/proc_net.c b/fs/proc/proc_net.c
index d72fc40241d9..a2bf369c923d 100644
--- a/fs/proc/proc_net.c
+++ b/fs/proc/proc_net.c
@@ -196,7 +196,7 @@ static __net_init int proc_net_ns_init(struct net *net)
 	if (!netd)
 		goto out;
 
-	netd->subdir = RB_ROOT;
+	netd->subdir = RB_ROOT_CACHED;
 	netd->data = net;
 	netd->nlink = 2;
 	netd->namelen = 3;
diff --git a/fs/proc/root.c b/fs/proc/root.c
index deecb397daa3..926fb27f4ca2 100644
--- a/fs/proc/root.c
+++ b/fs/proc/root.c
@@ -210,7 +210,7 @@ struct proc_dir_entry proc_root = {
 	.proc_iops	= &proc_root_inode_operations, 
 	.proc_fops	= &proc_root_operations,
 	.parent		= &proc_root,
-	.subdir		= RB_ROOT,
+	.subdir		= RB_ROOT_CACHED,
 	.name		= "/proc",
 };
 
-- 
2.12.0

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH 14/17] fs/epoll: use faster rb_first_cached()
  2017-07-19  1:45 [PATCH -next v4 00/17] rbtree: cache leftmost node internally Davidlohr Bueso
                   ` (12 preceding siblings ...)
  2017-07-19  1:45 ` [PATCH 13/17] procfs: use faster rb_first_cached() Davidlohr Bueso
@ 2017-07-19  1:46 ` Davidlohr Bueso
  2017-07-19  1:46 ` [PATCH 15/17] fs/ext4: use cached rbtrees Davidlohr Bueso
                   ` (2 subsequent siblings)
  16 siblings, 0 replies; 29+ messages in thread
From: Davidlohr Bueso @ 2017-07-19  1:46 UTC (permalink / raw)
  To: akpm
  Cc: mingo, peterz, jack, torvalds, kirill.shutemov, hch, ldufour,
	mhocko, mgorman, dave, linux-kernel, Davidlohr Bueso

... such that we can avoid the tree walks to get the
node with the smallest key. Semantically the same as
the previously used rb_first(), but O(1). The
main overhead is the extra footprint for the cached
rb_node pointer, which should not matter for epoll.

Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
---
 fs/eventpoll.c | 30 ++++++++++++++++--------------
 1 file changed, 16 insertions(+), 14 deletions(-)

diff --git a/fs/eventpoll.c b/fs/eventpoll.c
index e767e4389cb1..c56e842bc9cc 100644
--- a/fs/eventpoll.c
+++ b/fs/eventpoll.c
@@ -205,7 +205,7 @@ struct eventpoll {
 	struct list_head rdllist;
 
 	/* RB tree root used to store monitored fd structs */
-	struct rb_root rbr;
+	struct rb_root_cached rbr;
 
 	/*
 	 * This is a single linked list that chains all the "struct epitem" that
@@ -791,7 +791,7 @@ static int ep_remove(struct eventpoll *ep, struct epitem *epi)
 	list_del_rcu(&epi->fllink);
 	spin_unlock(&file->f_lock);
 
-	rb_erase(&epi->rbn, &ep->rbr);
+	rb_erase_cached(&epi->rbn, &ep->rbr);
 
 	spin_lock_irqsave(&ep->lock, flags);
 	if (ep_is_linked(&epi->rdllink))
@@ -835,7 +835,7 @@ static void ep_free(struct eventpoll *ep)
 	/*
 	 * Walks through the whole tree by unregistering poll callbacks.
 	 */
-	for (rbp = rb_first(&ep->rbr); rbp; rbp = rb_next(rbp)) {
+	for (rbp = rb_first_cached(&ep->rbr); rbp; rbp = rb_next(rbp)) {
 		epi = rb_entry(rbp, struct epitem, rbn);
 
 		ep_unregister_pollwait(ep, epi);
@@ -851,7 +851,7 @@ static void ep_free(struct eventpoll *ep)
 	 * a lockdep warning.
 	 */
 	mutex_lock(&ep->mtx);
-	while ((rbp = rb_first(&ep->rbr)) != NULL) {
+	while ((rbp = rb_first_cached(&ep->rbr)) != NULL) {
 		epi = rb_entry(rbp, struct epitem, rbn);
 		ep_remove(ep, epi);
 		cond_resched();
@@ -958,7 +958,7 @@ static void ep_show_fdinfo(struct seq_file *m, struct file *f)
 	struct rb_node *rbp;
 
 	mutex_lock(&ep->mtx);
-	for (rbp = rb_first(&ep->rbr); rbp; rbp = rb_next(rbp)) {
+	for (rbp = rb_first_cached(&ep->rbr); rbp; rbp = rb_next(rbp)) {
 		struct epitem *epi = rb_entry(rbp, struct epitem, rbn);
 		struct inode *inode = file_inode(epi->ffd.file);
 
@@ -1035,7 +1035,7 @@ static int ep_alloc(struct eventpoll **pep)
 	init_waitqueue_head(&ep->wq);
 	init_waitqueue_head(&ep->poll_wait);
 	INIT_LIST_HEAD(&ep->rdllist);
-	ep->rbr = RB_ROOT;
+	ep->rbr = RB_ROOT_CACHED;
 	ep->ovflist = EP_UNACTIVE_PTR;
 	ep->user = user;
 
@@ -1061,7 +1061,7 @@ static struct epitem *ep_find(struct eventpoll *ep, struct file *file, int fd)
 	struct epoll_filefd ffd;
 
 	ep_set_ffd(&ffd, file, fd);
-	for (rbp = ep->rbr.rb_node; rbp; ) {
+	for (rbp = ep->rbr.rb_root.rb_node; rbp; ) {
 		epi = rb_entry(rbp, struct epitem, rbn);
 		kcmp = ep_cmp_ffd(&ffd, &epi->ffd);
 		if (kcmp > 0)
@@ -1083,7 +1083,7 @@ static struct epitem *ep_find_tfd(struct eventpoll *ep, int tfd, unsigned long t
 	struct rb_node *rbp;
 	struct epitem *epi;
 
-	for (rbp = rb_first(&ep->rbr); rbp; rbp = rb_next(rbp)) {
+	for (rbp = rb_first_cached(&ep->rbr); rbp; rbp = rb_next(rbp)) {
 		epi = rb_entry(rbp, struct epitem, rbn);
 		if (epi->ffd.fd == tfd) {
 			if (toff == 0)
@@ -1263,20 +1263,22 @@ static void ep_ptable_queue_proc(struct file *file, wait_queue_head_t *whead,
 static void ep_rbtree_insert(struct eventpoll *ep, struct epitem *epi)
 {
 	int kcmp;
-	struct rb_node **p = &ep->rbr.rb_node, *parent = NULL;
+	struct rb_node **p = &ep->rbr.rb_root.rb_node, *parent = NULL;
 	struct epitem *epic;
+	bool leftmost = true;
 
 	while (*p) {
 		parent = *p;
 		epic = rb_entry(parent, struct epitem, rbn);
 		kcmp = ep_cmp_ffd(&epi->ffd, &epic->ffd);
-		if (kcmp > 0)
+		if (kcmp > 0) {
 			p = &parent->rb_right;
-		else
+			leftmost = false;
+		} else
 			p = &parent->rb_left;
 	}
 	rb_link_node(&epi->rbn, parent, p);
-	rb_insert_color(&epi->rbn, &ep->rbr);
+	rb_insert_color_cached(&epi->rbn, &ep->rbr, leftmost);
 }
 
 
@@ -1520,7 +1522,7 @@ static int ep_insert(struct eventpoll *ep, struct epoll_event *event,
 	list_del_rcu(&epi->fllink);
 	spin_unlock(&tfile->f_lock);
 
-	rb_erase(&epi->rbn, &ep->rbr);
+	rb_erase_cached(&epi->rbn, &ep->rbr);
 
 error_unregister:
 	ep_unregister_pollwait(ep, epi);
@@ -1868,7 +1870,7 @@ static int ep_loop_check_proc(void *priv, void *cookie, int call_nests)
 	mutex_lock_nested(&ep->mtx, call_nests + 1);
 	ep->visited = 1;
 	list_add(&ep->visited_list_link, &visited_list);
-	for (rbp = rb_first(&ep->rbr); rbp; rbp = rb_next(rbp)) {
+	for (rbp = rb_first_cached(&ep->rbr); rbp; rbp = rb_next(rbp)) {
 		epi = rb_entry(rbp, struct epitem, rbn);
 		if (unlikely(is_file_epoll(epi->ffd.file))) {
 			ep_tovisit = epi->ffd.file->private_data;
-- 
2.12.0

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH 15/17] fs/ext4: use cached rbtrees
  2017-07-19  1:45 [PATCH -next v4 00/17] rbtree: cache leftmost node internally Davidlohr Bueso
                   ` (13 preceding siblings ...)
  2017-07-19  1:46 ` [PATCH 14/17] fs/epoll: " Davidlohr Bueso
@ 2017-07-19  1:46 ` Davidlohr Bueso
  2017-07-19  7:40   ` Jan Kara
  2017-07-19  1:46 ` [PATCH 16/17] mem/memcg: cache rightmost node Davidlohr Bueso
  2017-07-19  1:46 ` [PATCH 17/17] block/cfq: cache rightmost rb_node Davidlohr Bueso
  16 siblings, 1 reply; 29+ messages in thread
From: Davidlohr Bueso @ 2017-07-19  1:46 UTC (permalink / raw)
  To: akpm
  Cc: mingo, peterz, jack, torvalds, kirill.shutemov, hch, ldufour,
	mhocko, mgorman, dave, linux-kernel, tytso, Davidlohr Bueso

This patch replaces two uses of rbtrees with the cached version for
better rb_first() performance during:

* Tree traversals to mark blocks as used during ext4_mb_init_cache()
looping. (struct ext4_group_info)

* Storing ->curr_node for dx_readdir(). (struct dir_private_info)

Caching the leftmost node comes at negligible runtime cost, thus the
only downside is that the aforementioned structures are enlarged for
the extra pointer. No changes in semantics whatsoever.
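
For reference, the caller-side recipe these conversions follow, as a
sketch only ('struct item' and its 'key' field are hypothetical and not
part of this patch): the insertion loop records whether the new node
ever takes a right turn and hands that to rb_insert_color_cached() so
the leftmost cache stays correct.

	#include <linux/rbtree.h>
	#include <linux/types.h>

	struct item {
		struct rb_node node;
		unsigned long key;
	};

	static void item_insert(struct rb_root_cached *root, struct item *new)
	{
		struct rb_node **p = &root->rb_root.rb_node, *parent = NULL;
		bool leftmost = true;

		while (*p) {
			struct item *cur = rb_entry(*p, struct item, node);

			parent = *p;
			if (new->key < cur->key) {
				p = &(*p)->rb_left;
			} else {
				p = &(*p)->rb_right;
				leftmost = false;	/* new node cannot be leftmost */
			}
		}

		rb_link_node(&new->node, parent, p);
		rb_insert_color_cached(&new->node, root, leftmost);
	}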

Cc: tytso@mit.edu
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
---
This is part of the rbtree internal caching series:
https://lwn.net/Articles/726809/

 fs/ext4/dir.c     | 24 ++++++++++++++----------
 fs/ext4/ext4.h    |  4 ++--
 fs/ext4/mballoc.c | 22 ++++++++++++----------
 3 files changed, 28 insertions(+), 22 deletions(-)

diff --git a/fs/ext4/dir.c b/fs/ext4/dir.c
index e8b365000d73..011df6bba13c 100644
--- a/fs/ext4/dir.c
+++ b/fs/ext4/dir.c
@@ -391,18 +391,18 @@ struct fname {
  * This functoin implements a non-recursive way of freeing all of the
  * nodes in the red-black tree.
  */
-static void free_rb_tree_fname(struct rb_root *root)
+static void free_rb_tree_fname(struct rb_root_cached *root)
 {
 	struct fname *fname, *next;
 
-	rbtree_postorder_for_each_entry_safe(fname, next, root, rb_hash)
+	rbtree_postorder_for_each_entry_safe(fname, next, &root->rb_root, rb_hash)
 		while (fname) {
 			struct fname *old = fname;
 			fname = fname->next;
 			kfree(old);
 		}
 
-	*root = RB_ROOT;
+	*root = RB_ROOT_CACHED;
 }
 
 
@@ -441,9 +441,10 @@ int ext4_htree_store_dirent(struct file *dir_file, __u32 hash,
 	struct fname *fname, *new_fn;
 	struct dir_private_info *info;
 	int len;
+	bool leftmost = true;
 
 	info = dir_file->private_data;
-	p = &info->root.rb_node;
+	p = &info->root.rb_root.rb_node;
 
 	/* Create and allocate the fname structure */
 	len = sizeof(struct fname) + ent_name->len + 1;
@@ -475,16 +476,19 @@ int ext4_htree_store_dirent(struct file *dir_file, __u32 hash,
 
 		if (new_fn->hash < fname->hash)
 			p = &(*p)->rb_left;
-		else if (new_fn->hash > fname->hash)
+		else if (new_fn->hash > fname->hash) {
 			p = &(*p)->rb_right;
-		else if (new_fn->minor_hash < fname->minor_hash)
+			leftmost = false;
+		} else if (new_fn->minor_hash < fname->minor_hash)
 			p = &(*p)->rb_left;
-		else /* if (new_fn->minor_hash > fname->minor_hash) */
+		else { /* if (new_fn->minor_hash > fname->minor_hash) */
 			p = &(*p)->rb_right;
+			leftmost = false;
+		}
 	}
 
 	rb_link_node(&new_fn->rb_hash, parent, p);
-	rb_insert_color(&new_fn->rb_hash, &info->root);
+	rb_insert_color_cached(&new_fn->rb_hash, &info->root, leftmost);
 	return 0;
 }
 
@@ -558,7 +562,7 @@ static int ext4_dx_readdir(struct file *file, struct dir_context *ctx)
 		info->extra_fname = NULL;
 		goto next_node;
 	} else if (!info->curr_node)
-		info->curr_node = rb_first(&info->root);
+		info->curr_node = rb_first_cached(&info->root);
 
 	while (1) {
 		/*
@@ -580,7 +584,7 @@ static int ext4_dx_readdir(struct file *file, struct dir_context *ctx)
 				ctx->pos = ext4_get_htree_eof(file);
 				break;
 			}
-			info->curr_node = rb_first(&info->root);
+			info->curr_node = rb_first_cached(&info->root);
 		}
 
 		fname = rb_entry(info->curr_node, struct fname, rb_hash);
diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index 9ebde0cd632e..0242c6801600 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -2116,7 +2116,7 @@ static inline bool ext4_is_quota_file(struct inode *inode)
  * readdir operations in hash tree order.
  */
 struct dir_private_info {
-	struct rb_root	root;
+	struct          rb_root_cached	root;
 	struct rb_node	*curr_node;
 	struct fname	*extra_fname;
 	loff_t		last_pos;
@@ -2883,7 +2883,7 @@ int ext4_update_disksize_before_punch(struct inode *inode, loff_t offset,
 
 struct ext4_group_info {
 	unsigned long   bb_state;
-	struct rb_root  bb_free_root;
+	struct          rb_root_cached  bb_free_root;
 	ext4_grpblk_t	bb_first_free;	/* first free block */
 	ext4_grpblk_t	bb_free;	/* total free blocks */
 	ext4_grpblk_t	bb_fragments;	/* nr of freespace fragments */
diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index 581e357e8406..df36f9123a99 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -2462,7 +2462,7 @@ int ext4_mb_add_groupinfo(struct super_block *sb, ext4_group_t group,
 
 	INIT_LIST_HEAD(&meta_group_info[i]->bb_prealloc_list);
 	init_rwsem(&meta_group_info[i]->alloc_sem);
-	meta_group_info[i]->bb_free_root = RB_ROOT;
+	meta_group_info[i]->bb_free_root = RB_ROOT_CACHED;
 	meta_group_info[i]->bb_largest_free_order = -1;  /* uninit */
 
 #ifdef DOUBLE_CHECK
@@ -2824,7 +2824,7 @@ static void ext4_free_data_in_buddy(struct super_block *sb,
 	count2++;
 	ext4_lock_group(sb, entry->efd_group);
 	/* Take it out of per group rb tree */
-	rb_erase(&entry->efd_node, &(db->bb_free_root));
+	rb_erase_cached(&entry->efd_node, &(db->bb_free_root));
 	mb_free_blocks(NULL, &e4b, entry->efd_start_cluster, entry->efd_count);
 
 	/*
@@ -2836,7 +2836,7 @@ static void ext4_free_data_in_buddy(struct super_block *sb,
 	if (!test_opt(sb, DISCARD))
 		EXT4_MB_GRP_CLEAR_TRIMMED(db);
 
-	if (!db->bb_free_root.rb_node) {
+	if (!db->bb_free_root.rb_root.rb_node) {
 		/* No more items in the per group rb tree
 		 * balance refcounts from ext4_mb_free_metadata()
 		 */
@@ -3519,7 +3519,7 @@ static void ext4_mb_generate_from_freelist(struct super_block *sb, void *bitmap,
 	struct ext4_free_data *entry;
 
 	grp = ext4_get_group_info(sb, group);
-	n = rb_first(&(grp->bb_free_root));
+	n = rb_first_cached(&(grp->bb_free_root));
 
 	while (n) {
 		entry = rb_entry(n, struct ext4_free_data, efd_node);
@@ -4624,7 +4624,7 @@ ext4_fsblk_t ext4_mb_new_blocks(handle_t *handle,
 static void ext4_try_merge_freed_extent(struct ext4_sb_info *sbi,
 					struct ext4_free_data *entry,
 					struct ext4_free_data *new_entry,
-					struct rb_root *entry_rb_root)
+					struct rb_root_cached *entry_rb_root)
 {
 	if ((entry->efd_tid != new_entry->efd_tid) ||
 	    (entry->efd_group != new_entry->efd_group))
@@ -4641,7 +4641,7 @@ static void ext4_try_merge_freed_extent(struct ext4_sb_info *sbi,
 	spin_lock(&sbi->s_md_lock);
 	list_del(&entry->efd_list);
 	spin_unlock(&sbi->s_md_lock);
-	rb_erase(&entry->efd_node, entry_rb_root);
+	rb_erase_cached(&entry->efd_node, entry_rb_root);
 	kmem_cache_free(ext4_free_data_cachep, entry);
 }
 
@@ -4656,8 +4656,9 @@ ext4_mb_free_metadata(handle_t *handle, struct ext4_buddy *e4b,
 	struct ext4_group_info *db = e4b->bd_info;
 	struct super_block *sb = e4b->bd_sb;
 	struct ext4_sb_info *sbi = EXT4_SB(sb);
-	struct rb_node **n = &db->bb_free_root.rb_node, *node;
+	struct rb_node **n = &db->bb_free_root.rb_root.rb_node, *node;
 	struct rb_node *parent = NULL, *new_node;
+	bool leftmost = true;
 
 	BUG_ON(!ext4_handle_valid(handle));
 	BUG_ON(e4b->bd_bitmap_page == NULL);
@@ -4680,9 +4681,10 @@ ext4_mb_free_metadata(handle_t *handle, struct ext4_buddy *e4b,
 		entry = rb_entry(parent, struct ext4_free_data, efd_node);
 		if (cluster < entry->efd_start_cluster)
 			n = &(*n)->rb_left;
-		else if (cluster >= (entry->efd_start_cluster + entry->efd_count))
+		else if (cluster >= (entry->efd_start_cluster + entry->efd_count)) {
 			n = &(*n)->rb_right;
-		else {
+			leftmost = false;
+		} else {
 			ext4_grp_locked_error(sb, group, 0,
 				ext4_group_first_block_no(sb, group) +
 				EXT4_C2B(sbi, cluster),
@@ -4692,7 +4694,7 @@ ext4_mb_free_metadata(handle_t *handle, struct ext4_buddy *e4b,
 	}
 
 	rb_link_node(new_node, parent, n);
-	rb_insert_color(new_node, &db->bb_free_root);
+	rb_insert_color_cached(new_node, &db->bb_free_root, leftmost);
 
 	/* Now try to see the extent can be merged to left and right */
 	node = rb_prev(new_node);
-- 
2.12.0

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH 16/17] mem/memcg: cache rightmost node
  2017-07-19  1:45 [PATCH -next v4 00/17] rbtree: cache leftmost node internally Davidlohr Bueso
                   ` (14 preceding siblings ...)
  2017-07-19  1:46 ` [PATCH 15/17] fs/ext4: use cached rbtrees Davidlohr Bueso
@ 2017-07-19  1:46 ` Davidlohr Bueso
  2017-07-19  7:50   ` Michal Hocko
  2017-07-19  1:46 ` [PATCH 17/17] block/cfq: cache rightmost rb_node Davidlohr Bueso
  16 siblings, 1 reply; 29+ messages in thread
From: Davidlohr Bueso @ 2017-07-19  1:46 UTC (permalink / raw)
  To: akpm
  Cc: mingo, peterz, jack, torvalds, kirill.shutemov, hch, ldufour,
	mhocko, mgorman, dave, linux-kernel, Davidlohr Bueso

Such that we can optimize __mem_cgroup_largest_soft_limit_node().
The only overhead is the extra footprint for the cached pointer,
but this should not be an issue for mem_cgroup_tree_per_node.
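
A condensed sketch of the pattern (illustrative only, hypothetical
names): rb_root_cached only tracks the leftmost node, so the rightmost
one is cached by hand next to the root and maintained on insert and
erase.

	#include <linux/rbtree.h>
	#include <linux/types.h>

	struct rm_root {
		struct rb_root rb_root;
		struct rb_node *rb_rightmost;
	};

	static void rm_insert(struct rm_root *root, struct rb_node *node,
			      struct rb_node *parent, struct rb_node **link,
			      bool rightmost)
	{
		if (rightmost)	/* the descent never took a left turn */
			root->rb_rightmost = node;

		rb_link_node(node, parent, link);
		rb_insert_color(node, &root->rb_root);
	}

	static void rm_erase(struct rm_root *root, struct rb_node *node)
	{
		if (root->rb_rightmost == node)
			root->rb_rightmost = rb_prev(node);	/* new rightmost, if any */

		rb_erase(node, &root->rb_root);
	}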

Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
---
 mm/memcontrol.c | 23 ++++++++++++++++++-----
 1 file changed, 18 insertions(+), 5 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 3df3c04d73ab..2ef9328ace2e 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -119,6 +119,7 @@ static const char *const mem_cgroup_lru_names[] = {
 
 struct mem_cgroup_tree_per_node {
 	struct rb_root rb_root;
+	struct rb_node *rb_rightmost;
 	spinlock_t lock;
 };
 
@@ -386,6 +387,7 @@ static void __mem_cgroup_insert_exceeded(struct mem_cgroup_per_node *mz,
 	struct rb_node **p = &mctz->rb_root.rb_node;
 	struct rb_node *parent = NULL;
 	struct mem_cgroup_per_node *mz_node;
+	bool rightmost = true;
 
 	if (mz->on_tree)
 		return;
@@ -397,8 +399,11 @@ static void __mem_cgroup_insert_exceeded(struct mem_cgroup_per_node *mz,
 		parent = *p;
 		mz_node = rb_entry(parent, struct mem_cgroup_per_node,
 					tree_node);
-		if (mz->usage_in_excess < mz_node->usage_in_excess)
+		if (mz->usage_in_excess < mz_node->usage_in_excess) {
 			p = &(*p)->rb_left;
+			rightmost = false;
+		}
+
 		/*
 		 * We can't avoid mem cgroups that are over their soft
 		 * limit by the same amount
@@ -406,6 +411,10 @@ static void __mem_cgroup_insert_exceeded(struct mem_cgroup_per_node *mz,
 		else if (mz->usage_in_excess >= mz_node->usage_in_excess)
 			p = &(*p)->rb_right;
 	}
+
+	if (rightmost)
+		mctz->rb_rightmost = &mz->tree_node;
+
 	rb_link_node(&mz->tree_node, parent, p);
 	rb_insert_color(&mz->tree_node, &mctz->rb_root);
 	mz->on_tree = true;
@@ -416,6 +425,10 @@ static void __mem_cgroup_remove_exceeded(struct mem_cgroup_per_node *mz,
 {
 	if (!mz->on_tree)
 		return;
+
+	if (&mz->tree_node == mctz->rb_rightmost)
+		mctz->rb_rightmost = rb_prev(&mz->tree_node);
+
 	rb_erase(&mz->tree_node, &mctz->rb_root);
 	mz->on_tree = false;
 }
@@ -496,16 +509,15 @@ static void mem_cgroup_remove_from_trees(struct mem_cgroup *memcg)
 static struct mem_cgroup_per_node *
 __mem_cgroup_largest_soft_limit_node(struct mem_cgroup_tree_per_node *mctz)
 {
-	struct rb_node *rightmost = NULL;
 	struct mem_cgroup_per_node *mz;
 
 retry:
 	mz = NULL;
-	rightmost = rb_last(&mctz->rb_root);
-	if (!rightmost)
+	if (!mctz->rb_rightmost)
 		goto done;		/* Nothing to reclaim from */
 
-	mz = rb_entry(rightmost, struct mem_cgroup_per_node, tree_node);
+	mz = rb_entry(mctz->rb_rightmost,
+		      struct mem_cgroup_per_node, tree_node);
 	/*
 	 * Remove the node now but someone else can add it back,
 	 * we will to add it back at the end of reclaim to its correct
@@ -5850,6 +5862,7 @@ static int __init mem_cgroup_init(void)
 				    node_online(node) ? node : NUMA_NO_NODE);
 
 		rtpn->rb_root = RB_ROOT;
+		rtpn->rb_rightmost = NULL;
 		spin_lock_init(&rtpn->lock);
 		soft_limit_tree.rb_tree_per_node[node] = rtpn;
 	}
-- 
2.12.0

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH 17/17] block/cfq: cache rightmost rb_node
  2017-07-19  1:45 [PATCH -next v4 00/17] rbtree: cache leftmost node internally Davidlohr Bueso
                   ` (15 preceding siblings ...)
  2017-07-19  1:46 ` [PATCH 16/17] mem/memcg: cache rightmost node Davidlohr Bueso
@ 2017-07-19  1:46 ` Davidlohr Bueso
  2017-07-19  7:59   ` Jan Kara
  16 siblings, 1 reply; 29+ messages in thread
From: Davidlohr Bueso @ 2017-07-19  1:46 UTC (permalink / raw)
  To: akpm
  Cc: mingo, peterz, jack, torvalds, kirill.shutemov, hch, ldufour,
	mhocko, mgorman, dave, linux-kernel, axboe, linux-block,
	Davidlohr Bueso

For the same reasons we already cache the leftmost pointer,
apply the same optimization for rb_last() calls. Users must
explicitly do this as rb_root_cached only deals with the
smallest node.
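
For context, a sketch (not part of this patch) of what the cached
pointer saves: rb_last() has to descend the right spine of the tree,
O(log n), while reading st->rb_rightmost is a single O(1) load. The
helper name below is made up for illustration.

	#include <linux/rbtree.h>

	/* Roughly what rb_last() does internally. */
	static inline struct rb_node *last_uncached(struct rb_root *root)
	{
		struct rb_node *n = root->rb_node;

		if (!n)
			return NULL;
		while (n->rb_right)
			n = n->rb_right;
		return n;
	}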

Cc: axboe@fb.com
Cc: linux-block@vger.kernel.org
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
---
This is part of the rbtree internal caching series:
https://lwn.net/Articles/726809/

 block/cfq-iosched.c | 19 ++++++++++++++-----
 1 file changed, 14 insertions(+), 5 deletions(-)

diff --git a/block/cfq-iosched.c b/block/cfq-iosched.c
index 92c31683a2bb..57ec45fd4590 100644
--- a/block/cfq-iosched.c
+++ b/block/cfq-iosched.c
@@ -94,11 +94,13 @@ struct cfq_ttime {
  */
 struct cfq_rb_root {
 	struct rb_root_cached rb;
+	struct rb_node *rb_rightmost;
 	unsigned count;
 	u64 min_vdisktime;
 	struct cfq_ttime ttime;
 };
 #define CFQ_RB_ROOT	(struct cfq_rb_root) { .rb = RB_ROOT_CACHED, \
+			.rb_rightmost = NULL,			     \
 			.ttime = {.last_end_request = ktime_get_ns(),},}
 
 /*
@@ -1183,6 +1185,9 @@ static struct cfq_group *cfq_rb_first_group(struct cfq_rb_root *root)
 
 static void cfq_rb_erase(struct rb_node *n, struct cfq_rb_root *root)
 {
+	if (root->rb_rightmost == n)
+		root->rb_rightmost = rb_prev(n);
+
 	rb_erase_cached(n, &root->rb);
 	RB_CLEAR_NODE(n);
 
@@ -1239,20 +1244,24 @@ __cfq_group_service_tree_add(struct cfq_rb_root *st, struct cfq_group *cfqg)
 	struct rb_node *parent = NULL;
 	struct cfq_group *__cfqg;
 	s64 key = cfqg_key(st, cfqg);
-	bool leftmost = true;
+	bool leftmost = true, rightmost = true;
 
 	while (*node != NULL) {
 		parent = *node;
 		__cfqg = rb_entry_cfqg(parent);
 
-		if (key < cfqg_key(st, __cfqg))
+		if (key < cfqg_key(st, __cfqg)) {
 			node = &parent->rb_left;
-		else {
+			rightmost = false;
+		} else {
 			node = &parent->rb_right;
 			leftmost = false;
 		}
 	}
 
+	if (rightmost)
+		st->rb_rightmost = &cfqg->rb_node;
+
 	rb_link_node(&cfqg->rb_node, parent, node);
 	rb_insert_color_cached(&cfqg->rb_node, &st->rb, leftmost);
 }
@@ -1355,7 +1364,7 @@ cfq_group_notify_queue_add(struct cfq_data *cfqd, struct cfq_group *cfqg)
 	 * so that groups get lesser vtime based on their weights, so that
 	 * if group does not loose all if it was not continuously backlogged.
 	 */
-	n = rb_last(&st->rb.rb_root);
+	n = st->rb_rightmost;
 	if (n) {
 		__cfqg = rb_entry_cfqg(n);
 		cfqg->vdisktime = __cfqg->vdisktime +
@@ -2204,7 +2213,7 @@ static void cfq_service_tree_add(struct cfq_data *cfqd, struct cfq_queue *cfqq,
 	st = st_for(cfqq->cfqg, cfqq_class(cfqq), cfqq_type(cfqq));
 	if (cfq_class_idle(cfqq)) {
 		rb_key = CFQ_IDLE_DELAY;
-		parent = rb_last(&st->rb.rb_root);
+		parent = st->rb_rightmost;
 		if (parent && parent != &cfqq->rb_node) {
 			__cfqq = rb_entry(parent, struct cfq_queue, rb_node);
 			rb_key += __cfqq->rb_key;
-- 
2.12.0

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* Re: [PATCH 15/17] fs/ext4: use cached rbtrees
  2017-07-19  1:46 ` [PATCH 15/17] fs/ext4: use cached rbtrees Davidlohr Bueso
@ 2017-07-19  7:40   ` Jan Kara
  2017-07-19 22:50     ` Davidlohr Bueso
  0 siblings, 1 reply; 29+ messages in thread
From: Jan Kara @ 2017-07-19  7:40 UTC (permalink / raw)
  To: Davidlohr Bueso
  Cc: akpm, mingo, peterz, jack, torvalds, kirill.shutemov, hch,
	ldufour, mhocko, mgorman, linux-kernel, tytso, Davidlohr Bueso

On Tue 18-07-17 18:46:01, Davidlohr Bueso wrote:
> This patch replaces two uses of rbtrees with the cached version for
> better rb_first() performance during:
> 
> * Tree traversals to mark blocks as used during ext4_mb_init_cache()
> looping. (struct ext4_group_info)
> 
> * Storing ->curr_node for dx_readdir(). (struct dir_private_info)
> 
> Caching the leftmost node comes at negligible runtime cost, thus the
> only downside is that the aforementioned structures are enlarged for
> the extra pointer. No changes in semantics whatsoever.
> 
> Cc: tytso@mit.edu
> Signed-off-by: Davidlohr Bueso <dbueso@suse.de>

Hum, so I'm somewhat undecided whether this is worth the churn. For free
blocks rb_tree we use rb_first() only in ext4_mb_generate_from_freelist()
which gets called only when generating new buddy bitmap from on-disk bitmap
and we traverse the whole tree after that - thus the extra cost of
rb_first() is a) well hidden in the total cost of iteration, b) rather rare
anyway.

Similarly for the H-tree directory code, we call rb_first() in
ext4_dx_readdir() only to start an iteration over the whole B-tree and in
such a case I don't think optimizing rb_first() makes a big difference
(maintaining the cached value is going to have about the same cost as we save
by using it).

								Honza

> ---
> This is part of the rbtree internal caching series:
> https://lwn.net/Articles/726809/
> 
>  fs/ext4/dir.c     | 24 ++++++++++++++----------
>  fs/ext4/ext4.h    |  4 ++--
>  fs/ext4/mballoc.c | 22 ++++++++++++----------
>  3 files changed, 28 insertions(+), 22 deletions(-)
> 
> diff --git a/fs/ext4/dir.c b/fs/ext4/dir.c
> index e8b365000d73..011df6bba13c 100644
> --- a/fs/ext4/dir.c
> +++ b/fs/ext4/dir.c
> @@ -391,18 +391,18 @@ struct fname {
>   * This functoin implements a non-recursive way of freeing all of the
>   * nodes in the red-black tree.
>   */
> -static void free_rb_tree_fname(struct rb_root *root)
> +static void free_rb_tree_fname(struct rb_root_cached *root)
>  {
>  	struct fname *fname, *next;
>  
> -	rbtree_postorder_for_each_entry_safe(fname, next, root, rb_hash)
> +	rbtree_postorder_for_each_entry_safe(fname, next, &root->rb_root, rb_hash)
>  		while (fname) {
>  			struct fname *old = fname;
>  			fname = fname->next;
>  			kfree(old);
>  		}
>  
> -	*root = RB_ROOT;
> +	*root = RB_ROOT_CACHED;
>  }
>  
>  
> @@ -441,9 +441,10 @@ int ext4_htree_store_dirent(struct file *dir_file, __u32 hash,
>  	struct fname *fname, *new_fn;
>  	struct dir_private_info *info;
>  	int len;
> +	bool leftmost = true;
>  
>  	info = dir_file->private_data;
> -	p = &info->root.rb_node;
> +	p = &info->root.rb_root.rb_node;
>  
>  	/* Create and allocate the fname structure */
>  	len = sizeof(struct fname) + ent_name->len + 1;
> @@ -475,16 +476,19 @@ int ext4_htree_store_dirent(struct file *dir_file, __u32 hash,
>  
>  		if (new_fn->hash < fname->hash)
>  			p = &(*p)->rb_left;
> -		else if (new_fn->hash > fname->hash)
> +		else if (new_fn->hash > fname->hash) {
>  			p = &(*p)->rb_right;
> -		else if (new_fn->minor_hash < fname->minor_hash)
> +			leftmost = false;
> +		} else if (new_fn->minor_hash < fname->minor_hash)
>  			p = &(*p)->rb_left;
> -		else /* if (new_fn->minor_hash > fname->minor_hash) */
> +		else { /* if (new_fn->minor_hash > fname->minor_hash) */
>  			p = &(*p)->rb_right;
> +			leftmost = false;
> +		}
>  	}
>  
>  	rb_link_node(&new_fn->rb_hash, parent, p);
> -	rb_insert_color(&new_fn->rb_hash, &info->root);
> +	rb_insert_color_cached(&new_fn->rb_hash, &info->root, leftmost);
>  	return 0;
>  }
>  
> @@ -558,7 +562,7 @@ static int ext4_dx_readdir(struct file *file, struct dir_context *ctx)
>  		info->extra_fname = NULL;
>  		goto next_node;
>  	} else if (!info->curr_node)
> -		info->curr_node = rb_first(&info->root);
> +		info->curr_node = rb_first_cached(&info->root);
>  
>  	while (1) {
>  		/*
> @@ -580,7 +584,7 @@ static int ext4_dx_readdir(struct file *file, struct dir_context *ctx)
>  				ctx->pos = ext4_get_htree_eof(file);
>  				break;
>  			}
> -			info->curr_node = rb_first(&info->root);
> +			info->curr_node = rb_first_cached(&info->root);
>  		}
>  
>  		fname = rb_entry(info->curr_node, struct fname, rb_hash);
> diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
> index 9ebde0cd632e..0242c6801600 100644
> --- a/fs/ext4/ext4.h
> +++ b/fs/ext4/ext4.h
> @@ -2116,7 +2116,7 @@ static inline bool ext4_is_quota_file(struct inode *inode)
>   * readdir operations in hash tree order.
>   */
>  struct dir_private_info {
> -	struct rb_root	root;
> +	struct          rb_root_cached	root;
>  	struct rb_node	*curr_node;
>  	struct fname	*extra_fname;
>  	loff_t		last_pos;
> @@ -2883,7 +2883,7 @@ int ext4_update_disksize_before_punch(struct inode *inode, loff_t offset,
>  
>  struct ext4_group_info {
>  	unsigned long   bb_state;
> -	struct rb_root  bb_free_root;
> +	struct          rb_root_cached  bb_free_root;
>  	ext4_grpblk_t	bb_first_free;	/* first free block */
>  	ext4_grpblk_t	bb_free;	/* total free blocks */
>  	ext4_grpblk_t	bb_fragments;	/* nr of freespace fragments */
> diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
> index 581e357e8406..df36f9123a99 100644
> --- a/fs/ext4/mballoc.c
> +++ b/fs/ext4/mballoc.c
> @@ -2462,7 +2462,7 @@ int ext4_mb_add_groupinfo(struct super_block *sb, ext4_group_t group,
>  
>  	INIT_LIST_HEAD(&meta_group_info[i]->bb_prealloc_list);
>  	init_rwsem(&meta_group_info[i]->alloc_sem);
> -	meta_group_info[i]->bb_free_root = RB_ROOT;
> +	meta_group_info[i]->bb_free_root = RB_ROOT_CACHED;
>  	meta_group_info[i]->bb_largest_free_order = -1;  /* uninit */
>  
>  #ifdef DOUBLE_CHECK
> @@ -2824,7 +2824,7 @@ static void ext4_free_data_in_buddy(struct super_block *sb,
>  	count2++;
>  	ext4_lock_group(sb, entry->efd_group);
>  	/* Take it out of per group rb tree */
> -	rb_erase(&entry->efd_node, &(db->bb_free_root));
> +	rb_erase_cached(&entry->efd_node, &(db->bb_free_root));
>  	mb_free_blocks(NULL, &e4b, entry->efd_start_cluster, entry->efd_count);
>  
>  	/*
> @@ -2836,7 +2836,7 @@ static void ext4_free_data_in_buddy(struct super_block *sb,
>  	if (!test_opt(sb, DISCARD))
>  		EXT4_MB_GRP_CLEAR_TRIMMED(db);
>  
> -	if (!db->bb_free_root.rb_node) {
> +	if (!db->bb_free_root.rb_root.rb_node) {
>  		/* No more items in the per group rb tree
>  		 * balance refcounts from ext4_mb_free_metadata()
>  		 */
> @@ -3519,7 +3519,7 @@ static void ext4_mb_generate_from_freelist(struct super_block *sb, void *bitmap,
>  	struct ext4_free_data *entry;
>  
>  	grp = ext4_get_group_info(sb, group);
> -	n = rb_first(&(grp->bb_free_root));
> +	n = rb_first_cached(&(grp->bb_free_root));
>  
>  	while (n) {
>  		entry = rb_entry(n, struct ext4_free_data, efd_node);
> @@ -4624,7 +4624,7 @@ ext4_fsblk_t ext4_mb_new_blocks(handle_t *handle,
>  static void ext4_try_merge_freed_extent(struct ext4_sb_info *sbi,
>  					struct ext4_free_data *entry,
>  					struct ext4_free_data *new_entry,
> -					struct rb_root *entry_rb_root)
> +					struct rb_root_cached *entry_rb_root)
>  {
>  	if ((entry->efd_tid != new_entry->efd_tid) ||
>  	    (entry->efd_group != new_entry->efd_group))
> @@ -4641,7 +4641,7 @@ static void ext4_try_merge_freed_extent(struct ext4_sb_info *sbi,
>  	spin_lock(&sbi->s_md_lock);
>  	list_del(&entry->efd_list);
>  	spin_unlock(&sbi->s_md_lock);
> -	rb_erase(&entry->efd_node, entry_rb_root);
> +	rb_erase_cached(&entry->efd_node, entry_rb_root);
>  	kmem_cache_free(ext4_free_data_cachep, entry);
>  }
>  
> @@ -4656,8 +4656,9 @@ ext4_mb_free_metadata(handle_t *handle, struct ext4_buddy *e4b,
>  	struct ext4_group_info *db = e4b->bd_info;
>  	struct super_block *sb = e4b->bd_sb;
>  	struct ext4_sb_info *sbi = EXT4_SB(sb);
> -	struct rb_node **n = &db->bb_free_root.rb_node, *node;
> +	struct rb_node **n = &db->bb_free_root.rb_root.rb_node, *node;
>  	struct rb_node *parent = NULL, *new_node;
> +	bool leftmost = true;
>  
>  	BUG_ON(!ext4_handle_valid(handle));
>  	BUG_ON(e4b->bd_bitmap_page == NULL);
> @@ -4680,9 +4681,10 @@ ext4_mb_free_metadata(handle_t *handle, struct ext4_buddy *e4b,
>  		entry = rb_entry(parent, struct ext4_free_data, efd_node);
>  		if (cluster < entry->efd_start_cluster)
>  			n = &(*n)->rb_left;
> -		else if (cluster >= (entry->efd_start_cluster + entry->efd_count))
> +		else if (cluster >= (entry->efd_start_cluster + entry->efd_count)) {
>  			n = &(*n)->rb_right;
> -		else {
> +			leftmost = false;
> +		} else {
>  			ext4_grp_locked_error(sb, group, 0,
>  				ext4_group_first_block_no(sb, group) +
>  				EXT4_C2B(sbi, cluster),
> @@ -4692,7 +4694,7 @@ ext4_mb_free_metadata(handle_t *handle, struct ext4_buddy *e4b,
>  	}
>  
>  	rb_link_node(new_node, parent, n);
> -	rb_insert_color(new_node, &db->bb_free_root);
> +	rb_insert_color_cached(new_node, &db->bb_free_root, leftmost);
>  
>  	/* Now try to see the extent can be merged to left and right */
>  	node = rb_prev(new_node);
> -- 
> 2.12.0
> 
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 10/17] block/cfq: replace cfq_rb_root leftmost caching
  2017-07-19  1:45 ` [PATCH 10/17] block/cfq: replace cfq_rb_root " Davidlohr Bueso
@ 2017-07-19  7:46   ` Jan Kara
  0 siblings, 0 replies; 29+ messages in thread
From: Jan Kara @ 2017-07-19  7:46 UTC (permalink / raw)
  To: Davidlohr Bueso
  Cc: akpm, mingo, peterz, jack, torvalds, kirill.shutemov, hch,
	ldufour, mhocko, mgorman, linux-kernel, axboe, linux-block,
	Davidlohr Bueso

On Tue 18-07-17 18:45:56, Davidlohr Bueso wrote:
> ... with the generic rbtree flavor instead. No changes
> in semantics whatsoever.
> 
> Cc: axboe@fb.com
> Cc: linux-block@vger.kernel.org
> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> Signed-off-by: Davidlohr Bueso <dbueso@suse.de>

Looks good to me. You can add:

Reviewed-by: Jan Kara <jack@suse.cz>

								Honza

> ---
>  block/cfq-iosched.c | 70 +++++++++++++++--------------------------------------
>  1 file changed, 20 insertions(+), 50 deletions(-)
> 
> diff --git a/block/cfq-iosched.c b/block/cfq-iosched.c
> index 3d5c28945719..92c31683a2bb 100644
> --- a/block/cfq-iosched.c
> +++ b/block/cfq-iosched.c
> @@ -93,13 +93,12 @@ struct cfq_ttime {
>   * move this into the elevator for the rq sorting as well.
>   */
>  struct cfq_rb_root {
> -	struct rb_root rb;
> -	struct rb_node *left;
> +	struct rb_root_cached rb;
>  	unsigned count;
>  	u64 min_vdisktime;
>  	struct cfq_ttime ttime;
>  };
> -#define CFQ_RB_ROOT	(struct cfq_rb_root) { .rb = RB_ROOT, \
> +#define CFQ_RB_ROOT	(struct cfq_rb_root) { .rb = RB_ROOT_CACHED, \
>  			.ttime = {.last_end_request = ktime_get_ns(),},}
>  
>  /*
> @@ -984,10 +983,9 @@ static inline u64 max_vdisktime(u64 min_vdisktime, u64 vdisktime)
>  
>  static void update_min_vdisktime(struct cfq_rb_root *st)
>  {
> -	struct cfq_group *cfqg;
> +	if (!RB_EMPTY_ROOT(&st->rb.rb_root)) {
> +		struct cfq_group *cfqg = rb_entry_cfqg(st->rb.rb_leftmost);
>  
> -	if (st->left) {
> -		cfqg = rb_entry_cfqg(st->left);
>  		st->min_vdisktime = max_vdisktime(st->min_vdisktime,
>  						  cfqg->vdisktime);
>  	}
> @@ -1169,46 +1167,25 @@ cfq_choose_req(struct cfq_data *cfqd, struct request *rq1, struct request *rq2,
>  	}
>  }
>  
> -/*
> - * The below is leftmost cache rbtree addon
> - */
>  static struct cfq_queue *cfq_rb_first(struct cfq_rb_root *root)
>  {
>  	/* Service tree is empty */
>  	if (!root->count)
>  		return NULL;
>  
> -	if (!root->left)
> -		root->left = rb_first(&root->rb);
> -
> -	if (root->left)
> -		return rb_entry(root->left, struct cfq_queue, rb_node);
> -
> -	return NULL;
> +	return rb_entry(rb_first_cached(&root->rb), struct cfq_queue, rb_node);
>  }
>  
>  static struct cfq_group *cfq_rb_first_group(struct cfq_rb_root *root)
>  {
> -	if (!root->left)
> -		root->left = rb_first(&root->rb);
> -
> -	if (root->left)
> -		return rb_entry_cfqg(root->left);
> -
> -	return NULL;
> +	return rb_entry_cfqg(rb_first_cached(&root->rb));
>  }
>  
> -static void rb_erase_init(struct rb_node *n, struct rb_root *root)
> +static void cfq_rb_erase(struct rb_node *n, struct cfq_rb_root *root)
>  {
> -	rb_erase(n, root);
> +	rb_erase_cached(n, &root->rb);
>  	RB_CLEAR_NODE(n);
> -}
>  
> -static void cfq_rb_erase(struct rb_node *n, struct cfq_rb_root *root)
> -{
> -	if (root->left == n)
> -		root->left = NULL;
> -	rb_erase_init(n, &root->rb);
>  	--root->count;
>  }
>  
> @@ -1258,11 +1235,11 @@ cfqg_key(struct cfq_rb_root *st, struct cfq_group *cfqg)
>  static void
>  __cfq_group_service_tree_add(struct cfq_rb_root *st, struct cfq_group *cfqg)
>  {
> -	struct rb_node **node = &st->rb.rb_node;
> +	struct rb_node **node = &st->rb.rb_root.rb_node;
>  	struct rb_node *parent = NULL;
>  	struct cfq_group *__cfqg;
>  	s64 key = cfqg_key(st, cfqg);
> -	int left = 1;
> +	bool leftmost = true;
>  
>  	while (*node != NULL) {
>  		parent = *node;
> @@ -1272,15 +1249,12 @@ __cfq_group_service_tree_add(struct cfq_rb_root *st, struct cfq_group *cfqg)
>  			node = &parent->rb_left;
>  		else {
>  			node = &parent->rb_right;
> -			left = 0;
> +			leftmost = false;
>  		}
>  	}
>  
> -	if (left)
> -		st->left = &cfqg->rb_node;
> -
>  	rb_link_node(&cfqg->rb_node, parent, node);
> -	rb_insert_color(&cfqg->rb_node, &st->rb);
> +	rb_insert_color_cached(&cfqg->rb_node, &st->rb, leftmost);
>  }
>  
>  /*
> @@ -1381,7 +1355,7 @@ cfq_group_notify_queue_add(struct cfq_data *cfqd, struct cfq_group *cfqg)
>  	 * so that groups get lesser vtime based on their weights, so that
>  	 * if group does not loose all if it was not continuously backlogged.
>  	 */
> -	n = rb_last(&st->rb);
> +	n = rb_last(&st->rb.rb_root);
>  	if (n) {
>  		__cfqg = rb_entry_cfqg(n);
>  		cfqg->vdisktime = __cfqg->vdisktime +
> @@ -2223,14 +2197,14 @@ static void cfq_service_tree_add(struct cfq_data *cfqd, struct cfq_queue *cfqq,
>  	struct cfq_queue *__cfqq;
>  	u64 rb_key;
>  	struct cfq_rb_root *st;
> -	int left;
> +	bool leftmost = true;
>  	int new_cfqq = 1;
>  	u64 now = ktime_get_ns();
>  
>  	st = st_for(cfqq->cfqg, cfqq_class(cfqq), cfqq_type(cfqq));
>  	if (cfq_class_idle(cfqq)) {
>  		rb_key = CFQ_IDLE_DELAY;
> -		parent = rb_last(&st->rb);
> +		parent = rb_last(&st->rb.rb_root);
>  		if (parent && parent != &cfqq->rb_node) {
>  			__cfqq = rb_entry(parent, struct cfq_queue, rb_node);
>  			rb_key += __cfqq->rb_key;
> @@ -2264,10 +2238,9 @@ static void cfq_service_tree_add(struct cfq_data *cfqd, struct cfq_queue *cfqq,
>  		cfqq->service_tree = NULL;
>  	}
>  
> -	left = 1;
>  	parent = NULL;
>  	cfqq->service_tree = st;
> -	p = &st->rb.rb_node;
> +	p = &st->rb.rb_root.rb_node;
>  	while (*p) {
>  		parent = *p;
>  		__cfqq = rb_entry(parent, struct cfq_queue, rb_node);
> @@ -2279,16 +2252,13 @@ static void cfq_service_tree_add(struct cfq_data *cfqd, struct cfq_queue *cfqq,
>  			p = &parent->rb_left;
>  		else {
>  			p = &parent->rb_right;
> -			left = 0;
> +			leftmost = false;
>  		}
>  	}
>  
> -	if (left)
> -		st->left = &cfqq->rb_node;
> -
>  	cfqq->rb_key = rb_key;
>  	rb_link_node(&cfqq->rb_node, parent, p);
> -	rb_insert_color(&cfqq->rb_node, &st->rb);
> +	rb_insert_color_cached(&cfqq->rb_node, &st->rb, leftmost);
>  	st->count++;
>  	if (add_front || !new_cfqq)
>  		return;
> @@ -2735,7 +2705,7 @@ static struct cfq_queue *cfq_get_next_queue(struct cfq_data *cfqd)
>  	/* There is nothing to dispatch */
>  	if (!st)
>  		return NULL;
> -	if (RB_EMPTY_ROOT(&st->rb))
> +	if (RB_EMPTY_ROOT(&st->rb.rb_root))
>  		return NULL;
>  	return cfq_rb_first(st);
>  }
> @@ -3221,7 +3191,7 @@ static struct cfq_group *cfq_get_next_cfqg(struct cfq_data *cfqd)
>  	struct cfq_rb_root *st = &cfqd->grp_service_tree;
>  	struct cfq_group *cfqg;
>  
> -	if (RB_EMPTY_ROOT(&st->rb))
> +	if (RB_EMPTY_ROOT(&st->rb.rb_root))
>  		return NULL;
>  	cfqg = cfq_rb_first_group(st);
>  	update_min_vdisktime(st);
> -- 
> 2.12.0
> 
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 16/17] mem/memcg: cache rightmost node
  2017-07-19  1:46 ` [PATCH 16/17] mem/memcg: cache rightmost node Davidlohr Bueso
@ 2017-07-19  7:50   ` Michal Hocko
  2017-07-26 21:09     ` Andrew Morton
  0 siblings, 1 reply; 29+ messages in thread
From: Michal Hocko @ 2017-07-19  7:50 UTC (permalink / raw)
  To: Davidlohr Bueso
  Cc: akpm, mingo, peterz, jack, torvalds, kirill.shutemov, hch,
	ldufour, mgorman, linux-kernel, Davidlohr Bueso, Johannes Weiner,
	Vladimir Davydov

[CC Johannes and Vladimir - the whole series is
http://lkml.kernel.org/r/20170719014603.19029-1-dave@stgolabs.net]

On Tue 18-07-17 18:46:02, Davidlohr Bueso wrote:
> Such that we can optimize __mem_cgroup_largest_soft_limit_node().
> The only overhead is the extra footprint for the cached pointer,
> but this should not be an issue for mem_cgroup_tree_per_node.

The soft limit reclaim and the associated tree manipulation is not worth
touching/optimizing IMHO. We strongly discourage anybody configuring
soft limit because of the way it is implemented and how disruptive it is.

> Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
> ---
>  mm/memcontrol.c | 23 ++++++++++++++++++-----
>  1 file changed, 18 insertions(+), 5 deletions(-)
> 
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 3df3c04d73ab..2ef9328ace2e 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -119,6 +119,7 @@ static const char *const mem_cgroup_lru_names[] = {
>  
>  struct mem_cgroup_tree_per_node {
>  	struct rb_root rb_root;
> +	struct rb_node *rb_rightmost;
>  	spinlock_t lock;
>  };
>  
> @@ -386,6 +387,7 @@ static void __mem_cgroup_insert_exceeded(struct mem_cgroup_per_node *mz,
>  	struct rb_node **p = &mctz->rb_root.rb_node;
>  	struct rb_node *parent = NULL;
>  	struct mem_cgroup_per_node *mz_node;
> +	bool rightmost = true;
>  
>  	if (mz->on_tree)
>  		return;
> @@ -397,8 +399,11 @@ static void __mem_cgroup_insert_exceeded(struct mem_cgroup_per_node *mz,
>  		parent = *p;
>  		mz_node = rb_entry(parent, struct mem_cgroup_per_node,
>  					tree_node);
> -		if (mz->usage_in_excess < mz_node->usage_in_excess)
> +		if (mz->usage_in_excess < mz_node->usage_in_excess) {
>  			p = &(*p)->rb_left;
> +			rightmost = false;
> +		}
> +
>  		/*
>  		 * We can't avoid mem cgroups that are over their soft
>  		 * limit by the same amount
> @@ -406,6 +411,10 @@ static void __mem_cgroup_insert_exceeded(struct mem_cgroup_per_node *mz,
>  		else if (mz->usage_in_excess >= mz_node->usage_in_excess)
>  			p = &(*p)->rb_right;
>  	}
> +
> +	if (rightmost)
> +		mctz->rb_rightmost = &mz->tree_node;
> +
>  	rb_link_node(&mz->tree_node, parent, p);
>  	rb_insert_color(&mz->tree_node, &mctz->rb_root);
>  	mz->on_tree = true;
> @@ -416,6 +425,10 @@ static void __mem_cgroup_remove_exceeded(struct mem_cgroup_per_node *mz,
>  {
>  	if (!mz->on_tree)
>  		return;
> +
> +	if (&mz->tree_node == mctz->rb_rightmost)
> +		mctz->rb_rightmost = rb_prev(&mz->tree_node);
> +
>  	rb_erase(&mz->tree_node, &mctz->rb_root);
>  	mz->on_tree = false;
>  }
> @@ -496,16 +509,15 @@ static void mem_cgroup_remove_from_trees(struct mem_cgroup *memcg)
>  static struct mem_cgroup_per_node *
>  __mem_cgroup_largest_soft_limit_node(struct mem_cgroup_tree_per_node *mctz)
>  {
> -	struct rb_node *rightmost = NULL;
>  	struct mem_cgroup_per_node *mz;
>  
>  retry:
>  	mz = NULL;
> -	rightmost = rb_last(&mctz->rb_root);
> -	if (!rightmost)
> +	if (!mctz->rb_rightmost)
>  		goto done;		/* Nothing to reclaim from */
>  
> -	mz = rb_entry(rightmost, struct mem_cgroup_per_node, tree_node);
> +	mz = rb_entry(mctz->rb_rightmost,
> +		      struct mem_cgroup_per_node, tree_node);
>  	/*
>  	 * Remove the node now but someone else can add it back,
>  	 * we will to add it back at the end of reclaim to its correct
> @@ -5850,6 +5862,7 @@ static int __init mem_cgroup_init(void)
>  				    node_online(node) ? node : NUMA_NO_NODE);
>  
>  		rtpn->rb_root = RB_ROOT;
> +		rtpn->rb_rightmost = NULL;
>  		spin_lock_init(&rtpn->lock);
>  		soft_limit_tree.rb_tree_per_node[node] = rtpn;
>  	}
> -- 
> 2.12.0

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 17/17] block/cfq: cache rightmost rb_node
  2017-07-19  1:46 ` [PATCH 17/17] block/cfq: cache rightmost rb_node Davidlohr Bueso
@ 2017-07-19  7:59   ` Jan Kara
  0 siblings, 0 replies; 29+ messages in thread
From: Jan Kara @ 2017-07-19  7:59 UTC (permalink / raw)
  To: Davidlohr Bueso
  Cc: akpm, mingo, peterz, jack, torvalds, kirill.shutemov, hch,
	ldufour, mhocko, mgorman, linux-kernel, axboe, linux-block,
	Davidlohr Bueso

On Tue 18-07-17 18:46:03, Davidlohr Bueso wrote:
> For the same reasons we already cache the leftmost pointer,
> apply the same optimization for rb_last() calls. Users must
> explicitly do this as rb_root_cached only deals with the
> smallest node.
> 
> Cc: axboe@fb.com
> Cc: linux-block@vger.kernel.org
> Signed-off-by: Davidlohr Bueso <dbueso@suse.de>

Hum, as I'm reading the code, here we have about a 1:1 ratio of cached
lookups and tree inserts / deletes where we have to maintain the cached
value. Is the optimization worth it in such a case?

								Honza

> ---
> This is part of the rbtree internal caching series:
> https://lwn.net/Articles/726809/
> 
>  block/cfq-iosched.c | 19 ++++++++++++++-----
>  1 file changed, 14 insertions(+), 5 deletions(-)
> 
> diff --git a/block/cfq-iosched.c b/block/cfq-iosched.c
> index 92c31683a2bb..57ec45fd4590 100644
> --- a/block/cfq-iosched.c
> +++ b/block/cfq-iosched.c
> @@ -94,11 +94,13 @@ struct cfq_ttime {
>   */
>  struct cfq_rb_root {
>  	struct rb_root_cached rb;
> +	struct rb_node *rb_rightmost;
>  	unsigned count;
>  	u64 min_vdisktime;
>  	struct cfq_ttime ttime;
>  };
>  #define CFQ_RB_ROOT	(struct cfq_rb_root) { .rb = RB_ROOT_CACHED, \
> +			.rb_rightmost = NULL,			     \
>  			.ttime = {.last_end_request = ktime_get_ns(),},}
>  
>  /*
> @@ -1183,6 +1185,9 @@ static struct cfq_group *cfq_rb_first_group(struct cfq_rb_root *root)
>  
>  static void cfq_rb_erase(struct rb_node *n, struct cfq_rb_root *root)
>  {
> +	if (root->rb_rightmost == n)
> +		root->rb_rightmost = rb_prev(n);
> +
>  	rb_erase_cached(n, &root->rb);
>  	RB_CLEAR_NODE(n);
>  
> @@ -1239,20 +1244,24 @@ __cfq_group_service_tree_add(struct cfq_rb_root *st, struct cfq_group *cfqg)
>  	struct rb_node *parent = NULL;
>  	struct cfq_group *__cfqg;
>  	s64 key = cfqg_key(st, cfqg);
> -	bool leftmost = true;
> +	bool leftmost = true, rightmost = true;
>  
>  	while (*node != NULL) {
>  		parent = *node;
>  		__cfqg = rb_entry_cfqg(parent);
>  
> -		if (key < cfqg_key(st, __cfqg))
> +		if (key < cfqg_key(st, __cfqg)) {
>  			node = &parent->rb_left;
> -		else {
> +			rightmost = false;
> +		} else {
>  			node = &parent->rb_right;
>  			leftmost = false;
>  		}
>  	}
>  
> +	if (rightmost)
> +		st->rb_rightmost = &cfqg->rb_node;
> +
>  	rb_link_node(&cfqg->rb_node, parent, node);
>  	rb_insert_color_cached(&cfqg->rb_node, &st->rb, leftmost);
>  }
> @@ -1355,7 +1364,7 @@ cfq_group_notify_queue_add(struct cfq_data *cfqd, struct cfq_group *cfqg)
>  	 * so that groups get lesser vtime based on their weights, so that
>  	 * if group does not loose all if it was not continuously backlogged.
>  	 */
> -	n = rb_last(&st->rb.rb_root);
> +	n = st->rb_rightmost;
>  	if (n) {
>  		__cfqg = rb_entry_cfqg(n);
>  		cfqg->vdisktime = __cfqg->vdisktime +
> @@ -2204,7 +2213,7 @@ static void cfq_service_tree_add(struct cfq_data *cfqd, struct cfq_queue *cfqq,
>  	st = st_for(cfqq->cfqg, cfqq_class(cfqq), cfqq_type(cfqq));
>  	if (cfq_class_idle(cfqq)) {
>  		rb_key = CFQ_IDLE_DELAY;
> -		parent = rb_last(&st->rb.rb_root);
> +		parent = st->rb_rightmost;
>  		if (parent && parent != &cfqq->rb_node) {
>  			__cfqq = rb_entry(parent, struct cfq_queue, rb_node);
>  			rb_key += __cfqq->rb_key;
> -- 
> 2.12.0
> 
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 15/17] fs/ext4: use cached rbtrees
  2017-07-19  7:40   ` Jan Kara
@ 2017-07-19 22:50     ` Davidlohr Bueso
  0 siblings, 0 replies; 29+ messages in thread
From: Davidlohr Bueso @ 2017-07-19 22:50 UTC (permalink / raw)
  To: Jan Kara
  Cc: akpm, mingo, peterz, torvalds, kirill.shutemov, hch, ldufour,
	mhocko, mgorman, linux-kernel, tytso, Davidlohr Bueso

On Wed, 19 Jul 2017, Jan Kara wrote:

>Hum, so I'm somewhat undecided whether this is worth the churn. For free
>blocks rb_tree we use rb_first() only in ext4_mb_generate_from_freelist()
>which gets called only when generating new buddy bitmap from on-disk bitmap
>and we traverse the whole tree after that - thus the extra cost of
>rb_first() is a) well hidden in the total cost of iteration, b) rather rare
>anyway.

Yes, during some of the testing I did for this patch, I could not find trees
with more than just a handful of nodes, ie: while removing file(s). 

>
>Similarly for the H-tree directory code, we call rb_first() in
>ext4_dx_readdir() only to start an iteration over the whole B-tree and in
>such a case I don't think optimizing rb_first() makes a big difference
>(maintaining the cached value is going to have about the same cost as we save
>by using it).

Your reaction is rather expected, actually. Let's just leave this patch out.

Thanks,
Davidlohr

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 11/17] lib/interval_tree: fast overlap detection
  2017-07-19  1:45 ` [PATCH 11/17] lib/interval_tree: fast overlap detection Davidlohr Bueso
@ 2017-07-22 17:52       ` Doug Ledford
  2017-08-01 17:16     ` Michael S. Tsirkin
  1 sibling, 0 replies; 29+ messages in thread
From: Doug Ledford @ 2017-07-22 17:52 UTC (permalink / raw)
  To: Davidlohr Bueso, akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b
  Cc: mingo-DgEjT+Ai2ygdnm+yROfE0A, peterz-wEGCiKHe2LqWVfeAwA7xHQ,
	jack-AlSwsSmVLrQ, torvalds-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b,
	kirill.shutemov-VuQAYsv1563Yd54FQh9/CA,
	hch-wEGCiKHe2LqWVfeAwA7xHQ,
	ldufour-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8, mhocko-IBi9RG/b67k,
	mgorman-3eNAlZScCAx27rWaFMvyedHuzzzSOjJt,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA, David Airlie,
	dri-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW, Michael S. Tsirkin,
	Jason Wang, Christian Benvenuti,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA, Davidlohr Bueso


[-- Attachment #1.1: Type: text/plain, Size: 1596 bytes --]

On 7/18/2017 9:45 PM, Davidlohr Bueso wrote:
> Allow interval trees to quickly check for overlaps to avoid
> unnecessary tree lookups in interval_tree_iter_first().
> 
> As of this patch, all interval tree flavors will require
> using a 'rb_root_cached' such that we can have the leftmost
> node easily available. While most users will make use of this
> feature, those with special functions (in addition to the generic
> insert, delete, search calls) will avoid using the cached
> option as they can do funky things with insertions -- for example,
> vma_interval_tree_insert_after().
> 
> Cc: David Airlie <airlied-cv59FeDIM0c@public.gmane.org>
> Cc: dri-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW@public.gmane.org
> Cc: "Michael S. Tsirkin" <mst-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
> Cc: Jason Wang <jasowang-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
> Cc: Doug Ledford <dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
> Cc: Christian Benvenuti <benve-FYB4Gu1CFyUAvxtiuMwx3w@public.gmane.org>
> Cc: linux-rdma-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> Acked-by: Christian König <christian.koenig-5C7GfCeVMHo@public.gmane.org>
> Acked-by: Peter Zijlstra (Intel) <peterz-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>
> Signed-off-by: Davidlohr Bueso <dbueso-l3A5Bk7waGM@public.gmane.org>

Ack for the RDMA parts.

Acked-by: Doug Ledford <dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>


-- 
Doug Ledford <dledford-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
    GPG Key ID: B826A3330E572FDD
    Key fingerprint = AE6B 1BDA 122B 23B4 265B  1274 B826 A333 0E57 2FDD


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 884 bytes --]

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 11/17] lib/interval_tree: fast overlap detection
@ 2017-07-22 17:52       ` Doug Ledford
  0 siblings, 0 replies; 29+ messages in thread
From: Doug Ledford @ 2017-07-22 17:52 UTC (permalink / raw)
  To: Davidlohr Bueso, akpm
  Cc: mingo, peterz, jack, torvalds, kirill.shutemov, hch, ldufour,
	mhocko, mgorman, linux-kernel, David Airlie, dri-devel,
	Michael S. Tsirkin, Jason Wang, Christian Benvenuti, linux-rdma,
	Davidlohr Bueso


[-- Attachment #1.1: Type: text/plain, Size: 1281 bytes --]

On 7/18/2017 9:45 PM, Davidlohr Bueso wrote:
> Allow interval trees to quickly check for overlaps to avoid
> unnecessary tree lookups in interval_tree_iter_first().
> 
> As of this patch, all interval tree flavors will require
> using a 'rb_root_cached' such that we can have the leftmost
> node easily available. While most users will make use of this
> feature, those with special functions (in addition to the generic
> insert, delete, search calls) will avoid using the cached
> option as they can do funky things with insertions -- for example,
> vma_interval_tree_insert_after().
> 
> Cc: David Airlie <airlied@linux.ie>
> Cc: dri-devel@lists.freedesktop.org
> Cc: "Michael S. Tsirkin" <mst@redhat.com>
> Cc: Jason Wang <jasowang@redhat.com>
> Cc: Doug Ledford <dledford@redhat.com>
> Cc: Christian Benvenuti <benve@cisco.com>
> Cc: linux-rdma@vger.kernel.org
> Acked-by: Christian König <christian.koenig@amd.com>
> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> Signed-off-by: Davidlohr Bueso <dbueso@suse.de>

Ack for the RDMA parts.

Acked-by: Doug Ledford <dledford@redhat.com>


-- 
Doug Ledford <dledford@redhat.com>
    GPG Key ID: B826A3330E572FDD
    Key fingerprint = AE6B 1BDA 122B 23B4 265B  1274 B826 A333 0E57 2FDD


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 884 bytes --]

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 16/17] mem/memcg: cache rightmost node
  2017-07-19  7:50   ` Michal Hocko
@ 2017-07-26 21:09     ` Andrew Morton
  2017-07-27  7:06       ` Michal Hocko
  0 siblings, 1 reply; 29+ messages in thread
From: Andrew Morton @ 2017-07-26 21:09 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Davidlohr Bueso, mingo, peterz, jack, torvalds, kirill.shutemov,
	hch, ldufour, mgorman, linux-kernel, Davidlohr Bueso,
	Johannes Weiner, Vladimir Davydov

On Wed, 19 Jul 2017 09:50:36 +0200 Michal Hocko <mhocko@kernel.org> wrote:

> [CC Johannes and Vladimir - the whole series is
> http://lkml.kernel.org/r/20170719014603.19029-1-dave@stgolabs.net]
> 
> On Tue 18-07-17 18:46:02, Davidlohr Bueso wrote:
> > Such that we can optimize __mem_cgroup_largest_soft_limit_node().
> > The only overhead is the extra footprint for the cached pointer,
> > but this should not be an issue for mem_cgroup_tree_per_node.
> 
> The soft limit reclaim and the associated tree manipulation is not worth
> touching/optimizing IMHO. We strongly discourage anybody configuring
> soft limit because of the way it is implemented and how disruptive it is.

I'm inclined to merge this.  Unless we plan to actually remove the code
"soon", I think it's best to continue to improve it.  Improving
performance may never matter to anyone, but there is benefit in keeping
up to date with the current interfaces and best practices.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 16/17] mem/memcg: cache rightmost node
  2017-07-26 21:09     ` Andrew Morton
@ 2017-07-27  7:06       ` Michal Hocko
  0 siblings, 0 replies; 29+ messages in thread
From: Michal Hocko @ 2017-07-27  7:06 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Davidlohr Bueso, mingo, peterz, jack, torvalds, kirill.shutemov,
	hch, ldufour, mgorman, linux-kernel, Davidlohr Bueso,
	Johannes Weiner, Vladimir Davydov

On Wed 26-07-17 14:09:27, Andrew Morton wrote:
> On Wed, 19 Jul 2017 09:50:36 +0200 Michal Hocko <mhocko@kernel.org> wrote:
> 
> > [CC Johannes and Vladimir - the whole series is
> > http://lkml.kernel.org/r/20170719014603.19029-1-dave@stgolabs.net]
> > 
> > On Tue 18-07-17 18:46:02, Davidlohr Bueso wrote:
> > > Such that we can optimize __mem_cgroup_largest_soft_limit_node().
> > > The only overhead is the extra footprint for the cached pointer,
> > > but this should not be an issue for mem_cgroup_tree_per_node.
> > 
> > The soft limit reclaim and the associated tree manipulation are not worth
> > touching/optimizing IMHO. We strongly discourage anybody from configuring
> > the soft limit because of the way it is implemented and how disruptive it is.
> 
> I'm inclined to merge this.  Unless we plan to actually remove the code
> "soon",

This is not going to happen. It is a user-visible interface, so we will
have to maintain it as long as the cgroup v1 interface is available.

> I think it's best to continue to improve it.  Improving
> performance may never matter to anyone, but there is benefit in keeping
> up to date with the current interfaces and best practices.
 
Well, I am not opposing the change; I just think it is not worth
bothering. Soft limit reclaim tends to be so expensive (direct reclaim
down to the soft limit) that a tiny optimization has a hard time helping.

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 11/17] lib/interval_tree: fast overlap detection
  2017-07-19  1:45 ` [PATCH 11/17] lib/interval_tree: fast overlap detection Davidlohr Bueso
@ 2017-08-01 17:16     ` Michael S. Tsirkin
  2017-08-01 17:16     ` Michael S. Tsirkin
  1 sibling, 0 replies; 29+ messages in thread
From: Michael S. Tsirkin @ 2017-08-01 17:16 UTC (permalink / raw)
  To: Davidlohr Bueso
  Cc: mhocko, jack, Davidlohr Bueso, peterz, mgorman, Jason Wang,
	linux-kernel, dri-devel, hch, Doug Ledford, linux-rdma, akpm,
	ldufour, torvalds, mingo, kirill.shutemov, Christian Benvenuti

On Tue, Jul 18, 2017 at 06:45:57PM -0700, Davidlohr Bueso wrote:
> Allow interval trees to quickly check for overlaps to avoid
> unnecessary tree lookups in interval_tree_iter_first().
> 
> As of this patch, all interval tree flavors will require
> using a 'rb_root_cached' such that we can have the leftmost
> node easily available. While most users will make use of this
> feature, those with special functions (in addition to the generic
> insert, delete, search calls) will avoid using the cached
> option as they can do funky things with insertions -- for example,
> vma_interval_tree_insert_after().
> 
> Cc: David Airlie <airlied@linux.ie>
> Cc: dri-devel@lists.freedesktop.org
> Cc: "Michael S. Tsirkin" <mst@redhat.com>
> Cc: Jason Wang <jasowang@redhat.com>
> Cc: Doug Ledford <dledford@redhat.com>
> Cc: Christian Benvenuti <benve@cisco.com>
> Cc: linux-rdma@vger.kernel.org
> Acked-by: Christian König <christian.koenig@amd.com>
> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> Signed-off-by: Davidlohr Bueso <dbueso@suse.de>

For vhost bits:

Acked-by: Michael S. Tsirkin <mst@redhat.com>


> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c             |  8 ++--
>  drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c             |  7 ++--
>  drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h             |  2 +-
>  drivers/gpu/drm/drm_mm.c                           | 19 +++++----
>  drivers/gpu/drm/drm_vma_manager.c                  |  2 +-
>  drivers/gpu/drm/i915/i915_gem_userptr.c            |  6 +--
>  drivers/gpu/drm/radeon/radeon.h                    |  2 +-
>  drivers/gpu/drm/radeon/radeon_mn.c                 |  8 ++--
>  drivers/gpu/drm/radeon/radeon_vm.c                 |  7 ++--
>  drivers/infiniband/core/umem_rbtree.c              |  4 +-
>  drivers/infiniband/core/uverbs_cmd.c               |  2 +-
>  drivers/infiniband/hw/hfi1/mmu_rb.c                | 10 ++---
>  drivers/infiniband/hw/usnic/usnic_uiom.c           |  6 +--
>  drivers/infiniband/hw/usnic/usnic_uiom.h           |  2 +-
>  .../infiniband/hw/usnic/usnic_uiom_interval_tree.c | 15 +++----
>  .../infiniband/hw/usnic/usnic_uiom_interval_tree.h | 12 +++---
>  drivers/vhost/vhost.c                              |  2 +-
>  drivers/vhost/vhost.h                              |  2 +-
>  fs/hugetlbfs/inode.c                               |  6 +--
>  fs/inode.c                                         |  2 +-
>  include/drm/drm_mm.h                               |  2 +-
>  include/linux/fs.h                                 |  4 +-
>  include/linux/interval_tree.h                      |  8 ++--
>  include/linux/interval_tree_generic.h              | 46 +++++++++++++++++-----
>  include/linux/mm.h                                 | 17 ++++----
>  include/linux/rmap.h                               |  4 +-
>  include/rdma/ib_umem_odp.h                         | 11 ++++--
>  include/rdma/ib_verbs.h                            |  2 +-
>  lib/interval_tree_test.c                           |  4 +-
>  mm/interval_tree.c                                 | 10 ++---
>  mm/memory.c                                        |  4 +-
>  mm/mmap.c                                          | 10 ++---
>  mm/rmap.c                                          |  4 +-
>  33 files changed, 145 insertions(+), 105 deletions(-)
> 
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c
> index 38f739fb727b..3f8aef21b9a6 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c
> @@ -51,7 +51,7 @@ struct amdgpu_mn {
>  
>  	/* objects protected by lock */
>  	struct mutex		lock;
> -	struct rb_root		objects;
> +	struct rb_root_cached	objects;
>  };
>  
>  struct amdgpu_mn_node {
> @@ -76,8 +76,8 @@ static void amdgpu_mn_destroy(struct work_struct *work)
>  	mutex_lock(&adev->mn_lock);
>  	mutex_lock(&rmn->lock);
>  	hash_del(&rmn->node);
> -	rbtree_postorder_for_each_entry_safe(node, next_node, &rmn->objects,
> -					     it.rb) {
> +	rbtree_postorder_for_each_entry_safe(node, next_node,
> +					     &rmn->objects.rb_root, it.rb) {
>  		list_for_each_entry_safe(bo, next_bo, &node->bos, mn_list) {
>  			bo->mn = NULL;
>  			list_del_init(&bo->mn_list);
> @@ -252,7 +252,7 @@ static struct amdgpu_mn *amdgpu_mn_get(struct amdgpu_device *adev)
>  	rmn->mm = mm;
>  	rmn->mn.ops = &amdgpu_mn_ops;
>  	mutex_init(&rmn->lock);
> -	rmn->objects = RB_ROOT;
> +	rmn->objects = RB_ROOT_CACHED;
>  
>  	r = __mmu_notifier_register(&rmn->mn, mm);
>  	if (r)
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
> index 5795f81369f0..f872e2179bbd 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
> @@ -2405,7 +2405,7 @@ int amdgpu_vm_init(struct amdgpu_device *adev, struct amdgpu_vm *vm,
>  	int r, i;
>  	u64 flags;
>  
> -	vm->va = RB_ROOT;
> +	vm->va = RB_ROOT_CACHED;
>  	vm->client_id = atomic64_inc_return(&adev->vm_manager.client_counter);
>  	for (i = 0; i < AMDGPU_MAX_VMHUBS; i++)
>  		vm->reserved_vmid[i] = NULL;
> @@ -2512,10 +2512,11 @@ void amdgpu_vm_fini(struct amdgpu_device *adev, struct amdgpu_vm *vm)
>  
>  	amd_sched_entity_fini(vm->entity.sched, &vm->entity);
>  
> -	if (!RB_EMPTY_ROOT(&vm->va)) {
> +	if (!RB_EMPTY_ROOT(&vm->va.rb_root)) {
>  		dev_err(adev->dev, "still active bo inside vm\n");
>  	}
> -	rbtree_postorder_for_each_entry_safe(mapping, tmp, &vm->va, rb) {
> +	rbtree_postorder_for_each_entry_safe(mapping, tmp,
> +					     &vm->va.rb_root, rb) {
>  		list_del(&mapping->list);
>  		amdgpu_vm_it_remove(mapping, &vm->va);
>  		kfree(mapping);
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
> index 936f158bc5ec..ebffc1253f85 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
> @@ -106,7 +106,7 @@ struct amdgpu_vm_pt {
>  
>  struct amdgpu_vm {
>  	/* tree of virtual addresses mapped */
> -	struct rb_root		va;
> +	struct rb_root_cached	va;
>  
>  	/* protecting invalidated */
>  	spinlock_t		status_lock;
> diff --git a/drivers/gpu/drm/drm_mm.c b/drivers/gpu/drm/drm_mm.c
> index f794089d30ac..61a1c8ea74bc 100644
> --- a/drivers/gpu/drm/drm_mm.c
> +++ b/drivers/gpu/drm/drm_mm.c
> @@ -169,7 +169,7 @@ INTERVAL_TREE_DEFINE(struct drm_mm_node, rb,
>  struct drm_mm_node *
>  __drm_mm_interval_first(const struct drm_mm *mm, u64 start, u64 last)
>  {
> -	return drm_mm_interval_tree_iter_first((struct rb_root *)&mm->interval_tree,
> +	return drm_mm_interval_tree_iter_first((struct rb_root_cached *)&mm->interval_tree,
>  					       start, last) ?: (struct drm_mm_node *)&mm->head_node;
>  }
>  EXPORT_SYMBOL(__drm_mm_interval_first);
> @@ -180,6 +180,7 @@ static void drm_mm_interval_tree_add_node(struct drm_mm_node *hole_node,
>  	struct drm_mm *mm = hole_node->mm;
>  	struct rb_node **link, *rb;
>  	struct drm_mm_node *parent;
> +	bool leftmost = true;
>  
>  	node->__subtree_last = LAST(node);
>  
> @@ -196,9 +197,10 @@ static void drm_mm_interval_tree_add_node(struct drm_mm_node *hole_node,
>  
>  		rb = &hole_node->rb;
>  		link = &hole_node->rb.rb_right;
> +		leftmost = false;
>  	} else {
>  		rb = NULL;
> -		link = &mm->interval_tree.rb_node;
> +		link = &mm->interval_tree.rb_root.rb_node;
>  	}
>  
>  	while (*link) {
> @@ -208,14 +210,15 @@ static void drm_mm_interval_tree_add_node(struct drm_mm_node *hole_node,
>  			parent->__subtree_last = node->__subtree_last;
>  		if (node->start < parent->start)
>  			link = &parent->rb.rb_left;
> -		else
> +		else {
>  			link = &parent->rb.rb_right;
> +			leftmost = false;
> +		}
>  	}
>  
>  	rb_link_node(&node->rb, rb, link);
> -	rb_insert_augmented(&node->rb,
> -			    &mm->interval_tree,
> -			    &drm_mm_interval_tree_augment);
> +	rb_insert_augmented_cached(&node->rb, &mm->interval_tree, leftmost,
> +				   &drm_mm_interval_tree_augment);
>  }
>  
>  #define RB_INSERT(root, member, expr) do { \
> @@ -577,7 +580,7 @@ void drm_mm_replace_node(struct drm_mm_node *old, struct drm_mm_node *new)
>  	*new = *old;
>  
>  	list_replace(&old->node_list, &new->node_list);
> -	rb_replace_node(&old->rb, &new->rb, &old->mm->interval_tree);
> +	rb_replace_node(&old->rb, &new->rb, &old->mm->interval_tree.rb_root);
>  
>  	if (drm_mm_hole_follows(old)) {
>  		list_replace(&old->hole_stack, &new->hole_stack);
> @@ -863,7 +866,7 @@ void drm_mm_init(struct drm_mm *mm, u64 start, u64 size)
>  	mm->color_adjust = NULL;
>  
>  	INIT_LIST_HEAD(&mm->hole_stack);
> -	mm->interval_tree = RB_ROOT;
> +	mm->interval_tree = RB_ROOT_CACHED;
>  	mm->holes_size = RB_ROOT;
>  	mm->holes_addr = RB_ROOT;
>  
> diff --git a/drivers/gpu/drm/drm_vma_manager.c b/drivers/gpu/drm/drm_vma_manager.c
> index d9100b565198..28f1226576f8 100644
> --- a/drivers/gpu/drm/drm_vma_manager.c
> +++ b/drivers/gpu/drm/drm_vma_manager.c
> @@ -147,7 +147,7 @@ struct drm_vma_offset_node *drm_vma_offset_lookup_locked(struct drm_vma_offset_m
>  	struct rb_node *iter;
>  	unsigned long offset;
>  
> -	iter = mgr->vm_addr_space_mm.interval_tree.rb_node;
> +	iter = mgr->vm_addr_space_mm.interval_tree.rb_root.rb_node;
>  	best = NULL;
>  
>  	while (likely(iter)) {
> diff --git a/drivers/gpu/drm/i915/i915_gem_userptr.c b/drivers/gpu/drm/i915/i915_gem_userptr.c
> index ccd09e8419f5..71dddf66baaa 100644
> --- a/drivers/gpu/drm/i915/i915_gem_userptr.c
> +++ b/drivers/gpu/drm/i915/i915_gem_userptr.c
> @@ -49,7 +49,7 @@ struct i915_mmu_notifier {
>  	spinlock_t lock;
>  	struct hlist_node node;
>  	struct mmu_notifier mn;
> -	struct rb_root objects;
> +	struct rb_root_cached objects;
>  	struct workqueue_struct *wq;
>  };
>  
> @@ -123,7 +123,7 @@ static void i915_gem_userptr_mn_invalidate_range_start(struct mmu_notifier *_mn,
>  	struct interval_tree_node *it;
>  	LIST_HEAD(cancelled);
>  
> -	if (RB_EMPTY_ROOT(&mn->objects))
> +	if (RB_EMPTY_ROOT(&mn->objects.rb_root))
>  		return;
>  
>  	/* interval ranges are inclusive, but invalidate range is exclusive */
> @@ -172,7 +172,7 @@ i915_mmu_notifier_create(struct mm_struct *mm)
>  
>  	spin_lock_init(&mn->lock);
>  	mn->mn.ops = &i915_gem_userptr_notifier;
> -	mn->objects = RB_ROOT;
> +	mn->objects = RB_ROOT_CACHED;
>  	mn->wq = alloc_workqueue("i915-userptr-release", WQ_UNBOUND, 0);
>  	if (mn->wq == NULL) {
>  		kfree(mn);
> diff --git a/drivers/gpu/drm/radeon/radeon.h b/drivers/gpu/drm/radeon/radeon.h
> index 5008f3d4cccc..10d0dd146808 100644
> --- a/drivers/gpu/drm/radeon/radeon.h
> +++ b/drivers/gpu/drm/radeon/radeon.h
> @@ -924,7 +924,7 @@ struct radeon_vm_id {
>  struct radeon_vm {
>  	struct mutex		mutex;
>  
> -	struct rb_root		va;
> +	struct rb_root_cached	va;
>  
>  	/* protecting invalidated and freed */
>  	spinlock_t		status_lock;
> diff --git a/drivers/gpu/drm/radeon/radeon_mn.c b/drivers/gpu/drm/radeon/radeon_mn.c
> index 896f2cf51e4e..1d62288b7ee3 100644
> --- a/drivers/gpu/drm/radeon/radeon_mn.c
> +++ b/drivers/gpu/drm/radeon/radeon_mn.c
> @@ -50,7 +50,7 @@ struct radeon_mn {
>  
>  	/* objects protected by lock */
>  	struct mutex		lock;
> -	struct rb_root		objects;
> +	struct rb_root_cached	objects;
>  };
>  
>  struct radeon_mn_node {
> @@ -75,8 +75,8 @@ static void radeon_mn_destroy(struct work_struct *work)
>  	mutex_lock(&rdev->mn_lock);
>  	mutex_lock(&rmn->lock);
>  	hash_del(&rmn->node);
> -	rbtree_postorder_for_each_entry_safe(node, next_node, &rmn->objects,
> -					     it.rb) {
> +	rbtree_postorder_for_each_entry_safe(node, next_node,
> +					     &rmn->objects.rb_root, it.rb) {
>  
>  		interval_tree_remove(&node->it, &rmn->objects);
>  		list_for_each_entry_safe(bo, next_bo, &node->bos, mn_list) {
> @@ -205,7 +205,7 @@ static struct radeon_mn *radeon_mn_get(struct radeon_device *rdev)
>  	rmn->mm = mm;
>  	rmn->mn.ops = &radeon_mn_ops;
>  	mutex_init(&rmn->lock);
> -	rmn->objects = RB_ROOT;
> +	rmn->objects = RB_ROOT_CACHED;
>  	
>  	r = __mmu_notifier_register(&rmn->mn, mm);
>  	if (r)
> diff --git a/drivers/gpu/drm/radeon/radeon_vm.c b/drivers/gpu/drm/radeon/radeon_vm.c
> index 5f68245579a3..f44777a6c2e8 100644
> --- a/drivers/gpu/drm/radeon/radeon_vm.c
> +++ b/drivers/gpu/drm/radeon/radeon_vm.c
> @@ -1185,7 +1185,7 @@ int radeon_vm_init(struct radeon_device *rdev, struct radeon_vm *vm)
>  		vm->ids[i].last_id_use = NULL;
>  	}
>  	mutex_init(&vm->mutex);
> -	vm->va = RB_ROOT;
> +	vm->va = RB_ROOT_CACHED;
>  	spin_lock_init(&vm->status_lock);
>  	INIT_LIST_HEAD(&vm->invalidated);
>  	INIT_LIST_HEAD(&vm->freed);
> @@ -1232,10 +1232,11 @@ void radeon_vm_fini(struct radeon_device *rdev, struct radeon_vm *vm)
>  	struct radeon_bo_va *bo_va, *tmp;
>  	int i, r;
>  
> -	if (!RB_EMPTY_ROOT(&vm->va)) {
> +	if (!RB_EMPTY_ROOT(&vm->va.rb_root)) {
>  		dev_err(rdev->dev, "still active bo inside vm\n");
>  	}
> -	rbtree_postorder_for_each_entry_safe(bo_va, tmp, &vm->va, it.rb) {
> +	rbtree_postorder_for_each_entry_safe(bo_va, tmp,
> +					     &vm->va.rb_root, it.rb) {
>  		interval_tree_remove(&bo_va->it, &vm->va);
>  		r = radeon_bo_reserve(bo_va->bo, false);
>  		if (!r) {
> diff --git a/drivers/infiniband/core/umem_rbtree.c b/drivers/infiniband/core/umem_rbtree.c
> index d176597b4d78..fc801920e341 100644
> --- a/drivers/infiniband/core/umem_rbtree.c
> +++ b/drivers/infiniband/core/umem_rbtree.c
> @@ -72,7 +72,7 @@ INTERVAL_TREE_DEFINE(struct umem_odp_node, rb, u64, __subtree_last,
>  /* @last is not a part of the interval. See comment for function
>   * node_last.
>   */
> -int rbt_ib_umem_for_each_in_range(struct rb_root *root,
> +int rbt_ib_umem_for_each_in_range(struct rb_root_cached *root,
>  				  u64 start, u64 last,
>  				  umem_call_back cb,
>  				  void *cookie)
> @@ -95,7 +95,7 @@ int rbt_ib_umem_for_each_in_range(struct rb_root *root,
>  }
>  EXPORT_SYMBOL(rbt_ib_umem_for_each_in_range);
>  
> -struct ib_umem_odp *rbt_ib_umem_lookup(struct rb_root *root,
> +struct ib_umem_odp *rbt_ib_umem_lookup(struct rb_root_cached *root,
>  				       u64 addr, u64 length)
>  {
>  	struct umem_odp_node *node;
> diff --git a/drivers/infiniband/core/uverbs_cmd.c b/drivers/infiniband/core/uverbs_cmd.c
> index 3f55d18a3791..5321c6ae2c0c 100644
> --- a/drivers/infiniband/core/uverbs_cmd.c
> +++ b/drivers/infiniband/core/uverbs_cmd.c
> @@ -117,7 +117,7 @@ ssize_t ib_uverbs_get_context(struct ib_uverbs_file *file,
>  	ucontext->closing = 0;
>  
>  #ifdef CONFIG_INFINIBAND_ON_DEMAND_PAGING
> -	ucontext->umem_tree = RB_ROOT;
> +	ucontext->umem_tree = RB_ROOT_CACHED;
>  	init_rwsem(&ucontext->umem_rwsem);
>  	ucontext->odp_mrs_count = 0;
>  	INIT_LIST_HEAD(&ucontext->no_private_counters);
> diff --git a/drivers/infiniband/hw/hfi1/mmu_rb.c b/drivers/infiniband/hw/hfi1/mmu_rb.c
> index d41fd87a39f2..5b6626f8feb6 100644
> --- a/drivers/infiniband/hw/hfi1/mmu_rb.c
> +++ b/drivers/infiniband/hw/hfi1/mmu_rb.c
> @@ -54,7 +54,7 @@
>  
>  struct mmu_rb_handler {
>  	struct mmu_notifier mn;
> -	struct rb_root root;
> +	struct rb_root_cached root;
>  	void *ops_arg;
>  	spinlock_t lock;        /* protect the RB tree */
>  	struct mmu_rb_ops *ops;
> @@ -111,7 +111,7 @@ int hfi1_mmu_rb_register(void *ops_arg, struct mm_struct *mm,
>  	if (!handlr)
>  		return -ENOMEM;
>  
> -	handlr->root = RB_ROOT;
> +	handlr->root = RB_ROOT_CACHED;
>  	handlr->ops = ops;
>  	handlr->ops_arg = ops_arg;
>  	INIT_HLIST_NODE(&handlr->mn.hlist);
> @@ -152,9 +152,9 @@ void hfi1_mmu_rb_unregister(struct mmu_rb_handler *handler)
>  	INIT_LIST_HEAD(&del_list);
>  
>  	spin_lock_irqsave(&handler->lock, flags);
> -	while ((node = rb_first(&handler->root))) {
> +	while ((node = rb_first_cached(&handler->root))) {
>  		rbnode = rb_entry(node, struct mmu_rb_node, node);
> -		rb_erase(node, &handler->root);
> +		rb_erase_cached(node, &handler->root);
>  		/* move from LRU list to delete list */
>  		list_move(&rbnode->list, &del_list);
>  	}
> @@ -311,7 +311,7 @@ static void mmu_notifier_mem_invalidate(struct mmu_notifier *mn,
>  {
>  	struct mmu_rb_handler *handler =
>  		container_of(mn, struct mmu_rb_handler, mn);
> -	struct rb_root *root = &handler->root;
> +	struct rb_root_cached *root = &handler->root;
>  	struct mmu_rb_node *node, *ptr = NULL;
>  	unsigned long flags;
>  	bool added = false;
> diff --git a/drivers/infiniband/hw/usnic/usnic_uiom.c b/drivers/infiniband/hw/usnic/usnic_uiom.c
> index c49db7c33979..4381c0a9a873 100644
> --- a/drivers/infiniband/hw/usnic/usnic_uiom.c
> +++ b/drivers/infiniband/hw/usnic/usnic_uiom.c
> @@ -227,7 +227,7 @@ static void __usnic_uiom_reg_release(struct usnic_uiom_pd *pd,
>  	vpn_last = vpn_start + npages - 1;
>  
>  	spin_lock(&pd->lock);
> -	usnic_uiom_remove_interval(&pd->rb_root, vpn_start,
> +	usnic_uiom_remove_interval(&pd->root, vpn_start,
>  					vpn_last, &rm_intervals);
>  	usnic_uiom_unmap_sorted_intervals(&rm_intervals, pd);
>  
> @@ -379,7 +379,7 @@ struct usnic_uiom_reg *usnic_uiom_reg_get(struct usnic_uiom_pd *pd,
>  	err = usnic_uiom_get_intervals_diff(vpn_start, vpn_last,
>  						(writable) ? IOMMU_WRITE : 0,
>  						IOMMU_WRITE,
> -						&pd->rb_root,
> +						&pd->root,
>  						&sorted_diff_intervals);
>  	if (err) {
>  		usnic_err("Failed disjoint interval vpn [0x%lx,0x%lx] err %d\n",
> @@ -395,7 +395,7 @@ struct usnic_uiom_reg *usnic_uiom_reg_get(struct usnic_uiom_pd *pd,
>  
>  	}
>  
> -	err = usnic_uiom_insert_interval(&pd->rb_root, vpn_start, vpn_last,
> +	err = usnic_uiom_insert_interval(&pd->root, vpn_start, vpn_last,
>  					(writable) ? IOMMU_WRITE : 0);
>  	if (err) {
>  		usnic_err("Failed insert interval vpn [0x%lx,0x%lx] err %d\n",
> diff --git a/drivers/infiniband/hw/usnic/usnic_uiom.h b/drivers/infiniband/hw/usnic/usnic_uiom.h
> index 45ca7c1613a7..431efe4143f4 100644
> --- a/drivers/infiniband/hw/usnic/usnic_uiom.h
> +++ b/drivers/infiniband/hw/usnic/usnic_uiom.h
> @@ -55,7 +55,7 @@ struct usnic_uiom_dev {
>  struct usnic_uiom_pd {
>  	struct iommu_domain		*domain;
>  	spinlock_t			lock;
> -	struct rb_root			rb_root;
> +	struct rb_root_cached		root;
>  	struct list_head		devs;
>  	int				dev_cnt;
>  };
> diff --git a/drivers/infiniband/hw/usnic/usnic_uiom_interval_tree.c b/drivers/infiniband/hw/usnic/usnic_uiom_interval_tree.c
> index 42b4b4c4e452..d399523206c7 100644
> --- a/drivers/infiniband/hw/usnic/usnic_uiom_interval_tree.c
> +++ b/drivers/infiniband/hw/usnic/usnic_uiom_interval_tree.c
> @@ -100,9 +100,9 @@ static int interval_cmp(void *priv, struct list_head *a, struct list_head *b)
>  }
>  
>  static void
> -find_intervals_intersection_sorted(struct rb_root *root, unsigned long start,
> -					unsigned long last,
> -					struct list_head *list)
> +find_intervals_intersection_sorted(struct rb_root_cached *root,
> +				   unsigned long start, unsigned long last,
> +				   struct list_head *list)
>  {
>  	struct usnic_uiom_interval_node *node;
>  
> @@ -118,7 +118,7 @@ find_intervals_intersection_sorted(struct rb_root *root, unsigned long start,
>  
>  int usnic_uiom_get_intervals_diff(unsigned long start, unsigned long last,
>  					int flags, int flag_mask,
> -					struct rb_root *root,
> +					struct rb_root_cached *root,
>  					struct list_head *diff_set)
>  {
>  	struct usnic_uiom_interval_node *interval, *tmp;
> @@ -175,7 +175,7 @@ void usnic_uiom_put_interval_set(struct list_head *intervals)
>  		kfree(interval);
>  }
>  
> -int usnic_uiom_insert_interval(struct rb_root *root, unsigned long start,
> +int usnic_uiom_insert_interval(struct rb_root_cached *root, unsigned long start,
>  				unsigned long last, int flags)
>  {
>  	struct usnic_uiom_interval_node *interval, *tmp;
> @@ -246,8 +246,9 @@ int usnic_uiom_insert_interval(struct rb_root *root, unsigned long start,
>  	return err;
>  }
>  
> -void usnic_uiom_remove_interval(struct rb_root *root, unsigned long start,
> -				unsigned long last, struct list_head *removed)
> +void usnic_uiom_remove_interval(struct rb_root_cached *root,
> +				unsigned long start, unsigned long last,
> +				struct list_head *removed)
>  {
>  	struct usnic_uiom_interval_node *interval;
>  
> diff --git a/drivers/infiniband/hw/usnic/usnic_uiom_interval_tree.h b/drivers/infiniband/hw/usnic/usnic_uiom_interval_tree.h
> index c0b0b876ab90..1d7fc3226bca 100644
> --- a/drivers/infiniband/hw/usnic/usnic_uiom_interval_tree.h
> +++ b/drivers/infiniband/hw/usnic/usnic_uiom_interval_tree.h
> @@ -48,12 +48,12 @@ struct usnic_uiom_interval_node {
>  
>  extern void
>  usnic_uiom_interval_tree_insert(struct usnic_uiom_interval_node *node,
> -					struct rb_root *root);
> +					struct rb_root_cached *root);
>  extern void
>  usnic_uiom_interval_tree_remove(struct usnic_uiom_interval_node *node,
> -					struct rb_root *root);
> +					struct rb_root_cached *root);
>  extern struct usnic_uiom_interval_node *
> -usnic_uiom_interval_tree_iter_first(struct rb_root *root,
> +usnic_uiom_interval_tree_iter_first(struct rb_root_cached *root,
>  					unsigned long start,
>  					unsigned long last);
>  extern struct usnic_uiom_interval_node *
> @@ -63,7 +63,7 @@ usnic_uiom_interval_tree_iter_next(struct usnic_uiom_interval_node *node,
>   * Inserts {start...last} into {root}.  If there are overlaps,
>   * nodes will be broken up and merged
>   */
> -int usnic_uiom_insert_interval(struct rb_root *root,
> +int usnic_uiom_insert_interval(struct rb_root_cached *root,
>  				unsigned long start, unsigned long last,
>  				int flags);
>  /*
> @@ -71,7 +71,7 @@ int usnic_uiom_insert_interval(struct rb_root *root,
>   * 'removed.' The caller is responsibile for freeing memory of nodes in
>   * 'removed.'
>   */
> -void usnic_uiom_remove_interval(struct rb_root *root,
> +void usnic_uiom_remove_interval(struct rb_root_cached *root,
>  				unsigned long start, unsigned long last,
>  				struct list_head *removed);
>  /*
> @@ -81,7 +81,7 @@ void usnic_uiom_remove_interval(struct rb_root *root,
>  int usnic_uiom_get_intervals_diff(unsigned long start,
>  					unsigned long last, int flags,
>  					int flag_mask,
> -					struct rb_root *root,
> +					struct rb_root_cached *root,
>  					struct list_head *diff_set);
>  /* Call this to free diff_set returned by usnic_uiom_get_intervals_diff */
>  void usnic_uiom_put_interval_set(struct list_head *intervals);
> diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
> index e4613a3c362d..88dc214de068 100644
> --- a/drivers/vhost/vhost.c
> +++ b/drivers/vhost/vhost.c
> @@ -1272,7 +1272,7 @@ static struct vhost_umem *vhost_umem_alloc(void)
>  	if (!umem)
>  		return NULL;
>  
> -	umem->umem_tree = RB_ROOT;
> +	umem->umem_tree = RB_ROOT_CACHED;
>  	umem->numem = 0;
>  	INIT_LIST_HEAD(&umem->umem_list);
>  
> diff --git a/drivers/vhost/vhost.h b/drivers/vhost/vhost.h
> index f72095868b93..a0278ba6a8b4 100644
> --- a/drivers/vhost/vhost.h
> +++ b/drivers/vhost/vhost.h
> @@ -71,7 +71,7 @@ struct vhost_umem_node {
>  };
>  
>  struct vhost_umem {
> -	struct rb_root umem_tree;
> +	struct rb_root_cached umem_tree;
>  	struct list_head umem_list;
>  	int numem;
>  };
> diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
> index 28d2753be094..e61b91e6adf4 100644
> --- a/fs/hugetlbfs/inode.c
> +++ b/fs/hugetlbfs/inode.c
> @@ -334,7 +334,7 @@ static void remove_huge_page(struct page *page)
>  }
>  
>  static void
> -hugetlb_vmdelete_list(struct rb_root *root, pgoff_t start, pgoff_t end)
> +hugetlb_vmdelete_list(struct rb_root_cached *root, pgoff_t start, pgoff_t end)
>  {
>  	struct vm_area_struct *vma;
>  
> @@ -514,7 +514,7 @@ static int hugetlb_vmtruncate(struct inode *inode, loff_t offset)
>  
>  	i_size_write(inode, offset);
>  	i_mmap_lock_write(mapping);
> -	if (!RB_EMPTY_ROOT(&mapping->i_mmap))
> +	if (!RB_EMPTY_ROOT(&mapping->i_mmap.rb_root))
>  		hugetlb_vmdelete_list(&mapping->i_mmap, pgoff, 0);
>  	i_mmap_unlock_write(mapping);
>  	remove_inode_hugepages(inode, offset, LLONG_MAX);
> @@ -539,7 +539,7 @@ static long hugetlbfs_punch_hole(struct inode *inode, loff_t offset, loff_t len)
>  
>  		inode_lock(inode);
>  		i_mmap_lock_write(mapping);
> -		if (!RB_EMPTY_ROOT(&mapping->i_mmap))
> +		if (!RB_EMPTY_ROOT(&mapping->i_mmap.rb_root))
>  			hugetlb_vmdelete_list(&mapping->i_mmap,
>  						hole_start >> PAGE_SHIFT,
>  						hole_end  >> PAGE_SHIFT);
> diff --git a/fs/inode.c b/fs/inode.c
> index 50370599e371..5ddb808ea934 100644
> --- a/fs/inode.c
> +++ b/fs/inode.c
> @@ -353,7 +353,7 @@ void address_space_init_once(struct address_space *mapping)
>  	init_rwsem(&mapping->i_mmap_rwsem);
>  	INIT_LIST_HEAD(&mapping->private_list);
>  	spin_lock_init(&mapping->private_lock);
> -	mapping->i_mmap = RB_ROOT;
> +	mapping->i_mmap = RB_ROOT_CACHED;
>  }
>  EXPORT_SYMBOL(address_space_init_once);
>  
> diff --git a/include/drm/drm_mm.h b/include/drm/drm_mm.h
> index 49b292e98fec..8d10fc97801c 100644
> --- a/include/drm/drm_mm.h
> +++ b/include/drm/drm_mm.h
> @@ -172,7 +172,7 @@ struct drm_mm {
>  	 * according to the (increasing) start address of the memory node. */
>  	struct drm_mm_node head_node;
>  	/* Keep an interval_tree for fast lookup of drm_mm_nodes by address. */
> -	struct rb_root interval_tree;
> +	struct rb_root_cached interval_tree;
>  	struct rb_root holes_size;
>  	struct rb_root holes_addr;
>  
> diff --git a/include/linux/fs.h b/include/linux/fs.h
> index ed8798255872..d1d521c5025b 100644
> --- a/include/linux/fs.h
> +++ b/include/linux/fs.h
> @@ -392,7 +392,7 @@ struct address_space {
>  	struct radix_tree_root	page_tree;	/* radix tree of all pages */
>  	spinlock_t		tree_lock;	/* and lock protecting it */
>  	atomic_t		i_mmap_writable;/* count VM_SHARED mappings */
> -	struct rb_root		i_mmap;		/* tree of private and shared mappings */
> +	struct rb_root_cached	i_mmap;		/* tree of private and shared mappings */
>  	struct rw_semaphore	i_mmap_rwsem;	/* protect tree, count, list */
>  	/* Protected by tree_lock together with the radix tree */
>  	unsigned long		nrpages;	/* number of total pages */
> @@ -486,7 +486,7 @@ static inline void i_mmap_unlock_read(struct address_space *mapping)
>   */
>  static inline int mapping_mapped(struct address_space *mapping)
>  {
> -	return	!RB_EMPTY_ROOT(&mapping->i_mmap);
> +	return	!RB_EMPTY_ROOT(&mapping->i_mmap.rb_root);
>  }
>  
>  /*
> diff --git a/include/linux/interval_tree.h b/include/linux/interval_tree.h
> index 724556aa3c95..202ee1283f4b 100644
> --- a/include/linux/interval_tree.h
> +++ b/include/linux/interval_tree.h
> @@ -11,13 +11,15 @@ struct interval_tree_node {
>  };
>  
>  extern void
> -interval_tree_insert(struct interval_tree_node *node, struct rb_root *root);
> +interval_tree_insert(struct interval_tree_node *node,
> +		     struct rb_root_cached *root);
>  
>  extern void
> -interval_tree_remove(struct interval_tree_node *node, struct rb_root *root);
> +interval_tree_remove(struct interval_tree_node *node,
> +		     struct rb_root_cached *root);
>  
>  extern struct interval_tree_node *
> -interval_tree_iter_first(struct rb_root *root,
> +interval_tree_iter_first(struct rb_root_cached *root,
>  			 unsigned long start, unsigned long last);
>  
>  extern struct interval_tree_node *
> diff --git a/include/linux/interval_tree_generic.h b/include/linux/interval_tree_generic.h
> index 58370e1862ad..f096423c8cbd 100644
> --- a/include/linux/interval_tree_generic.h
> +++ b/include/linux/interval_tree_generic.h
> @@ -65,11 +65,13 @@ RB_DECLARE_CALLBACKS(static, ITPREFIX ## _augment, ITSTRUCT, ITRB,	      \
>  									      \
>  /* Insert / remove interval nodes from the tree */			      \
>  									      \
> -ITSTATIC void ITPREFIX ## _insert(ITSTRUCT *node, struct rb_root *root)	      \
> +ITSTATIC void ITPREFIX ## _insert(ITSTRUCT *node,			      \
> +				  struct rb_root_cached *root)	 	      \
>  {									      \
> -	struct rb_node **link = &root->rb_node, *rb_parent = NULL;	      \
> +	struct rb_node **link = &root->rb_root.rb_node, *rb_parent = NULL;    \
>  	ITTYPE start = ITSTART(node), last = ITLAST(node);		      \
>  	ITSTRUCT *parent;						      \
> +	bool leftmost = true;						      \
>  									      \
>  	while (*link) {							      \
>  		rb_parent = *link;					      \
> @@ -78,18 +80,22 @@ ITSTATIC void ITPREFIX ## _insert(ITSTRUCT *node, struct rb_root *root)	      \
>  			parent->ITSUBTREE = last;			      \
>  		if (start < ITSTART(parent))				      \
>  			link = &parent->ITRB.rb_left;			      \
> -		else							      \
> +		else {							      \
>  			link = &parent->ITRB.rb_right;			      \
> +			leftmost = false;				      \
> +		}							      \
>  	}								      \
>  									      \
>  	node->ITSUBTREE = last;						      \
>  	rb_link_node(&node->ITRB, rb_parent, link);			      \
> -	rb_insert_augmented(&node->ITRB, root, &ITPREFIX ## _augment);	      \
> +	rb_insert_augmented_cached(&node->ITRB, root,			      \
> +				   leftmost, &ITPREFIX ## _augment);	      \
>  }									      \
>  									      \
> -ITSTATIC void ITPREFIX ## _remove(ITSTRUCT *node, struct rb_root *root)	      \
> +ITSTATIC void ITPREFIX ## _remove(ITSTRUCT *node,			      \
> +				  struct rb_root_cached *root)		      \
>  {									      \
> -	rb_erase_augmented(&node->ITRB, root, &ITPREFIX ## _augment);	      \
> +	rb_erase_augmented_cached(&node->ITRB, root, &ITPREFIX ## _augment);  \
>  }									      \
>  									      \
>  /*									      \
> @@ -140,15 +146,35 @@ ITPREFIX ## _subtree_search(ITSTRUCT *node, ITTYPE start, ITTYPE last)	      \
>  }									      \
>  									      \
>  ITSTATIC ITSTRUCT *							      \
> -ITPREFIX ## _iter_first(struct rb_root *root, ITTYPE start, ITTYPE last)      \
> +ITPREFIX ## _iter_first(struct rb_root_cached *root,			      \
> +			ITTYPE start, ITTYPE last)			      \
>  {									      \
> -	ITSTRUCT *node;							      \
> +	ITSTRUCT *node, *leftmost;					      \
>  									      \
> -	if (!root->rb_node)						      \
> +	if (!root->rb_root.rb_node)					      \
>  		return NULL;						      \
> -	node = rb_entry(root->rb_node, ITSTRUCT, ITRB);			      \
> +									      \
> +	/*								      \
> +	 * Fastpath range intersection/overlap between A: [a0, a1] and	      \
> +	 * B: [b0, b1] is given by:					      \
> +	 *								      \
> +	 *         a0 <= b1 && b0 <= a1					      \
> +	 *								      \
> +	 *  ... where A holds the query range and B holds the smallest	      \
> +	 * 'start' and largest 'last' in the tree. For the latter, we	      \
> +	 * rely on the root node, which, by the augmented interval tree      \
> +	 * property, holds the largest value in its last-in-subtree.	      \
> +	 * This allows mitigating some of the tree walk overhead for	      \
> +	 * non-intersecting ranges, maintained and consulted in O(1).	      \
> +	 */								      \
> +	node = rb_entry(root->rb_root.rb_node, ITSTRUCT, ITRB);		      \
>  	if (node->ITSUBTREE < start)					      \
>  		return NULL;						      \
> +									      \
> +	leftmost = rb_entry(root->rb_leftmost, ITSTRUCT, ITRB);		      \
> +	if (ITSTART(leftmost) > last)					      \
> +		return NULL;						      \
> +									      \
>  	return ITPREFIX ## _subtree_search(node, start, last);		      \
>  }									      \
>  									      \
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index 46b9ac5e8569..3a2652efbbfb 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -1992,13 +1992,13 @@ extern int nommu_shrink_inode_mappings(struct inode *, size_t, size_t);
>  
>  /* interval_tree.c */
>  void vma_interval_tree_insert(struct vm_area_struct *node,
> -			      struct rb_root *root);
> +			      struct rb_root_cached *root);
>  void vma_interval_tree_insert_after(struct vm_area_struct *node,
>  				    struct vm_area_struct *prev,
> -				    struct rb_root *root);
> +				    struct rb_root_cached *root);
>  void vma_interval_tree_remove(struct vm_area_struct *node,
> -			      struct rb_root *root);
> -struct vm_area_struct *vma_interval_tree_iter_first(struct rb_root *root,
> +			      struct rb_root_cached *root);
> +struct vm_area_struct *vma_interval_tree_iter_first(struct rb_root_cached *root,
>  				unsigned long start, unsigned long last);
>  struct vm_area_struct *vma_interval_tree_iter_next(struct vm_area_struct *node,
>  				unsigned long start, unsigned long last);
> @@ -2008,11 +2008,12 @@ struct vm_area_struct *vma_interval_tree_iter_next(struct vm_area_struct *node,
>  	     vma; vma = vma_interval_tree_iter_next(vma, start, last))
>  
>  void anon_vma_interval_tree_insert(struct anon_vma_chain *node,
> -				   struct rb_root *root);
> +				   struct rb_root_cached *root);
>  void anon_vma_interval_tree_remove(struct anon_vma_chain *node,
> -				   struct rb_root *root);
> -struct anon_vma_chain *anon_vma_interval_tree_iter_first(
> -	struct rb_root *root, unsigned long start, unsigned long last);
> +				   struct rb_root_cached *root);
> +struct anon_vma_chain *
> +anon_vma_interval_tree_iter_first(struct rb_root_cached *root,
> +				  unsigned long start, unsigned long last);
>  struct anon_vma_chain *anon_vma_interval_tree_iter_next(
>  	struct anon_vma_chain *node, unsigned long start, unsigned long last);
>  #ifdef CONFIG_DEBUG_VM_RB
> diff --git a/include/linux/rmap.h b/include/linux/rmap.h
> index 43ef2c30cb0f..22c298c6cc26 100644
> --- a/include/linux/rmap.h
> +++ b/include/linux/rmap.h
> @@ -55,7 +55,9 @@ struct anon_vma {
>  	 * is serialized by a system wide lock only visible to
>  	 * mm_take_all_locks() (mm_all_locks_mutex).
>  	 */
> -	struct rb_root rb_root;	/* Interval tree of private "related" vmas */
> +
> +	/* Interval tree of private "related" vmas */
> +	struct rb_root_cached rb_root;
>  };
>  
>  /*
> diff --git a/include/rdma/ib_umem_odp.h b/include/rdma/ib_umem_odp.h
> index fb67554aabd6..5eb7f5bc8248 100644
> --- a/include/rdma/ib_umem_odp.h
> +++ b/include/rdma/ib_umem_odp.h
> @@ -111,22 +111,25 @@ int ib_umem_odp_map_dma_pages(struct ib_umem *umem, u64 start_offset, u64 bcnt,
>  void ib_umem_odp_unmap_dma_pages(struct ib_umem *umem, u64 start_offset,
>  				 u64 bound);
>  
> -void rbt_ib_umem_insert(struct umem_odp_node *node, struct rb_root *root);
> -void rbt_ib_umem_remove(struct umem_odp_node *node, struct rb_root *root);
> +void rbt_ib_umem_insert(struct umem_odp_node *node,
> +			struct rb_root_cached *root);
> +void rbt_ib_umem_remove(struct umem_odp_node *node,
> +			struct rb_root_cached *root);
>  typedef int (*umem_call_back)(struct ib_umem *item, u64 start, u64 end,
>  			      void *cookie);
>  /*
>   * Call the callback on each ib_umem in the range. Returns the logical or of
>   * the return values of the functions called.
>   */
> -int rbt_ib_umem_for_each_in_range(struct rb_root *root, u64 start, u64 end,
> +int rbt_ib_umem_for_each_in_range(struct rb_root_cached *root,
> +				  u64 start, u64 end,
>  				  umem_call_back cb, void *cookie);
>  
>  /*
>   * Find first region intersecting with address range.
>   * Return NULL if not found
>   */
> -struct ib_umem_odp *rbt_ib_umem_lookup(struct rb_root *root,
> +struct ib_umem_odp *rbt_ib_umem_lookup(struct rb_root_cached *root,
>  				       u64 addr, u64 length);
>  
>  static inline int ib_umem_mmu_notifier_retry(struct ib_umem *item,
> diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
> index 593ad2640d2f..81665480a4ab 100644
> --- a/include/rdma/ib_verbs.h
> +++ b/include/rdma/ib_verbs.h
> @@ -1420,7 +1420,7 @@ struct ib_ucontext {
>  
>  	struct pid             *tgid;
>  #ifdef CONFIG_INFINIBAND_ON_DEMAND_PAGING
> -	struct rb_root      umem_tree;
> +	struct rb_root_cached   umem_tree;
>  	/*
>  	 * Protects .umem_rbroot and tree, as well as odp_mrs_count and
>  	 * mmu notifiers registration.
> diff --git a/lib/interval_tree_test.c b/lib/interval_tree_test.c
> index df495fe81421..0e343fd29570 100644
> --- a/lib/interval_tree_test.c
> +++ b/lib/interval_tree_test.c
> @@ -19,14 +19,14 @@ __param(bool, search_all, false, "Searches will iterate all nodes in the tree");
>  
>  __param(uint, max_endpoint, ~0, "Largest value for the interval's endpoint");
>  
> -static struct rb_root root = RB_ROOT;
> +static struct rb_root_cached root = RB_ROOT_CACHED;
>  static struct interval_tree_node *nodes = NULL;
>  static u32 *queries = NULL;
>  
>  static struct rnd_state rnd;
>  
>  static inline unsigned long
> -search(struct rb_root *root, unsigned long start, unsigned long last)
> +search(struct rb_root_cached *root, unsigned long start, unsigned long last)
>  {
>  	struct interval_tree_node *node;
>  	unsigned long results = 0;
> diff --git a/mm/interval_tree.c b/mm/interval_tree.c
> index f2c2492681bf..b47664358796 100644
> --- a/mm/interval_tree.c
> +++ b/mm/interval_tree.c
> @@ -28,7 +28,7 @@ INTERVAL_TREE_DEFINE(struct vm_area_struct, shared.rb,
>  /* Insert node immediately after prev in the interval tree */
>  void vma_interval_tree_insert_after(struct vm_area_struct *node,
>  				    struct vm_area_struct *prev,
> -				    struct rb_root *root)
> +				    struct rb_root_cached *root)
>  {
>  	struct rb_node **link;
>  	struct vm_area_struct *parent;
> @@ -55,7 +55,7 @@ void vma_interval_tree_insert_after(struct vm_area_struct *node,
>  
>  	node->shared.rb_subtree_last = last;
>  	rb_link_node(&node->shared.rb, &parent->shared.rb, link);
> -	rb_insert_augmented(&node->shared.rb, root,
> +	rb_insert_augmented(&node->shared.rb, &root->rb_root,
>  			    &vma_interval_tree_augment);
>  }
>  
> @@ -74,7 +74,7 @@ INTERVAL_TREE_DEFINE(struct anon_vma_chain, rb, unsigned long, rb_subtree_last,
>  		     static inline, __anon_vma_interval_tree)
>  
>  void anon_vma_interval_tree_insert(struct anon_vma_chain *node,
> -				   struct rb_root *root)
> +				   struct rb_root_cached *root)
>  {
>  #ifdef CONFIG_DEBUG_VM_RB
>  	node->cached_vma_start = avc_start_pgoff(node);
> @@ -84,13 +84,13 @@ void anon_vma_interval_tree_insert(struct anon_vma_chain *node,
>  }
>  
>  void anon_vma_interval_tree_remove(struct anon_vma_chain *node,
> -				   struct rb_root *root)
> +				   struct rb_root_cached *root)
>  {
>  	__anon_vma_interval_tree_remove(node, root);
>  }
>  
>  struct anon_vma_chain *
> -anon_vma_interval_tree_iter_first(struct rb_root *root,
> +anon_vma_interval_tree_iter_first(struct rb_root_cached *root,
>  				  unsigned long first, unsigned long last)
>  {
>  	return __anon_vma_interval_tree_iter_first(root, first, last);
> diff --git a/mm/memory.c b/mm/memory.c
> index 0e517be91a89..33cb79f73394 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -2593,7 +2593,7 @@ static void unmap_mapping_range_vma(struct vm_area_struct *vma,
>  	zap_page_range_single(vma, start_addr, end_addr - start_addr, details);
>  }
>  
> -static inline void unmap_mapping_range_tree(struct rb_root *root,
> +static inline void unmap_mapping_range_tree(struct rb_root_cached *root,
>  					    struct zap_details *details)
>  {
>  	struct vm_area_struct *vma;
> @@ -2657,7 +2657,7 @@ void unmap_mapping_range(struct address_space *mapping,
>  		details.last_index = ULONG_MAX;
>  
>  	i_mmap_lock_write(mapping);
> -	if (unlikely(!RB_EMPTY_ROOT(&mapping->i_mmap)))
> +	if (unlikely(!RB_EMPTY_ROOT(&mapping->i_mmap.rb_root)))
>  		unmap_mapping_range_tree(&mapping->i_mmap, &details);
>  	i_mmap_unlock_write(mapping);
>  }
> diff --git a/mm/mmap.c b/mm/mmap.c
> index f19efcf75418..8121c70df96f 100644
> --- a/mm/mmap.c
> +++ b/mm/mmap.c
> @@ -684,7 +684,7 @@ int __vma_adjust(struct vm_area_struct *vma, unsigned long start,
>  	struct mm_struct *mm = vma->vm_mm;
>  	struct vm_area_struct *next = vma->vm_next, *orig_vma = vma;
>  	struct address_space *mapping = NULL;
> -	struct rb_root *root = NULL;
> +	struct rb_root_cached *root = NULL;
>  	struct anon_vma *anon_vma = NULL;
>  	struct file *file = vma->vm_file;
>  	bool start_changed = false, end_changed = false;
> @@ -3314,7 +3314,7 @@ static DEFINE_MUTEX(mm_all_locks_mutex);
>  
>  static void vm_lock_anon_vma(struct mm_struct *mm, struct anon_vma *anon_vma)
>  {
> -	if (!test_bit(0, (unsigned long *) &anon_vma->root->rb_root.rb_node)) {
> +	if (!test_bit(0, (unsigned long *) &anon_vma->root->rb_root.rb_root.rb_node)) {
>  		/*
>  		 * The LSB of head.next can't change from under us
>  		 * because we hold the mm_all_locks_mutex.
> @@ -3330,7 +3330,7 @@ static void vm_lock_anon_vma(struct mm_struct *mm, struct anon_vma *anon_vma)
>  		 * anon_vma->root->rwsem.
>  		 */
>  		if (__test_and_set_bit(0, (unsigned long *)
> -				       &anon_vma->root->rb_root.rb_node))
> +				       &anon_vma->root->rb_root.rb_root.rb_node))
>  			BUG();
>  	}
>  }
> @@ -3432,7 +3432,7 @@ int mm_take_all_locks(struct mm_struct *mm)
>  
>  static void vm_unlock_anon_vma(struct anon_vma *anon_vma)
>  {
> -	if (test_bit(0, (unsigned long *) &anon_vma->root->rb_root.rb_node)) {
> +	if (test_bit(0, (unsigned long *) &anon_vma->root->rb_root.rb_root.rb_node)) {
>  		/*
>  		 * The LSB of head.next can't change to 0 from under
>  		 * us because we hold the mm_all_locks_mutex.
> @@ -3446,7 +3446,7 @@ static void vm_unlock_anon_vma(struct anon_vma *anon_vma)
>  		 * anon_vma->root->rwsem.
>  		 */
>  		if (!__test_and_clear_bit(0, (unsigned long *)
> -					  &anon_vma->root->rb_root.rb_node))
> +					  &anon_vma->root->rb_root.rb_root.rb_node))
>  			BUG();
>  		anon_vma_unlock_write(anon_vma);
>  	}
> diff --git a/mm/rmap.c b/mm/rmap.c
> index ced14f1af6dc..ad479e5e081d 100644
> --- a/mm/rmap.c
> +++ b/mm/rmap.c
> @@ -390,7 +390,7 @@ void unlink_anon_vmas(struct vm_area_struct *vma)
>  		 * Leave empty anon_vmas on the list - we'll need
>  		 * to free them outside the lock.
>  		 */
> -		if (RB_EMPTY_ROOT(&anon_vma->rb_root)) {
> +		if (RB_EMPTY_ROOT(&anon_vma->rb_root.rb_root)) {
>  			anon_vma->parent->degree--;
>  			continue;
>  		}
> @@ -424,7 +424,7 @@ static void anon_vma_ctor(void *data)
>  
>  	init_rwsem(&anon_vma->rwsem);
>  	atomic_set(&anon_vma->refcount, 0);
> -	anon_vma->rb_root = RB_ROOT;
> +	anon_vma->rb_root = RB_ROOT_CACHED;
>  }
>  
>  void __init anon_vma_init(void)
> -- 
> 2.12.0
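
Not part of the patch itself, but for anyone skimming the conversion, a
minimal sketch of what a caller of the generic flavor looks like after this
change (made-up interval values, error handling omitted). The point is that
the early returns in _iter_first() reject clearly disjoint queries using the
cached leftmost node and the root's subtree-last, without descending the tree:

#include <linux/interval_tree.h>
#include <linux/printk.h>

static struct rb_root_cached itree = RB_ROOT_CACHED;

static void itree_example(void)
{
	static struct interval_tree_node a = { .start = 10, .last = 20 };
	static struct interval_tree_node b = { .start = 40, .last = 50 };
	struct interval_tree_node *it;

	interval_tree_insert(&a, &itree);	/* roots are now rb_root_cached */
	interval_tree_insert(&b, &itree);

	/* [60, 70]: root's subtree-last is 50 < 60, NULL without a walk */
	it = interval_tree_iter_first(&itree, 60, 70);

	/* [0, 5]: cached leftmost starts at 10 > 5, NULL without a walk */
	it = interval_tree_iter_first(&itree, 0, 5);

	/* [15, 45]: overlaps both nodes, iteration works as before */
	for (it = interval_tree_iter_first(&itree, 15, 45); it;
	     it = interval_tree_iter_next(it, 15, 45))
		pr_info("overlap [%lu, %lu]\n", it->start, it->last);

	interval_tree_remove(&a, &itree);
	interval_tree_remove(&b, &itree);
}
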
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 11/17] lib/interval_tree: fast overlap detection
@ 2017-08-01 17:16     ` Michael S. Tsirkin
  0 siblings, 0 replies; 29+ messages in thread
From: Michael S. Tsirkin @ 2017-08-01 17:16 UTC (permalink / raw)
  To: Davidlohr Bueso
  Cc: akpm, mingo, peterz, jack, torvalds, kirill.shutemov, hch,
	ldufour, mhocko, mgorman, linux-kernel, David Airlie, dri-devel,
	Jason Wang, Doug Ledford, Christian Benvenuti, linux-rdma,
	Davidlohr Bueso

On Tue, Jul 18, 2017 at 06:45:57PM -0700, Davidlohr Bueso wrote:
> Allow interval trees to quickly check for overlaps to avoid
> unnecessary tree lookups in interval_tree_iter_first().
> 
> As of this patch, all interval tree flavors will require
> using a 'rb_root_cached' such that we can have the leftmost
> node easily available. While most users will make use of this
> feature, those with special functions (in addition to the generic
> insert, delete, search calls) will avoid using the cached
> option as they can do funky things with insertions -- for example,
> vma_interval_tree_insert_after().
> 
> Cc: David Airlie <airlied@linux.ie>
> Cc: dri-devel@lists.freedesktop.org
> Cc: "Michael S. Tsirkin" <mst@redhat.com>
> Cc: Jason Wang <jasowang@redhat.com>
> Cc: Doug Ledford <dledford@redhat.com>
> Cc: Christian Benvenuti <benve@cisco.com>
> Cc: linux-rdma@vger.kernel.org
> Acked-by: Christian König <christian.koenig@amd.com>
> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> Signed-off-by: Davidlohr Bueso <dbueso@suse.de>

For vhost bits:

Acked-by: Michael S. Tsirkin <mst@redhat.com>


> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c             |  8 ++--
>  drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c             |  7 ++--
>  drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h             |  2 +-
>  drivers/gpu/drm/drm_mm.c                           | 19 +++++----
>  drivers/gpu/drm/drm_vma_manager.c                  |  2 +-
>  drivers/gpu/drm/i915/i915_gem_userptr.c            |  6 +--
>  drivers/gpu/drm/radeon/radeon.h                    |  2 +-
>  drivers/gpu/drm/radeon/radeon_mn.c                 |  8 ++--
>  drivers/gpu/drm/radeon/radeon_vm.c                 |  7 ++--
>  drivers/infiniband/core/umem_rbtree.c              |  4 +-
>  drivers/infiniband/core/uverbs_cmd.c               |  2 +-
>  drivers/infiniband/hw/hfi1/mmu_rb.c                | 10 ++---
>  drivers/infiniband/hw/usnic/usnic_uiom.c           |  6 +--
>  drivers/infiniband/hw/usnic/usnic_uiom.h           |  2 +-
>  .../infiniband/hw/usnic/usnic_uiom_interval_tree.c | 15 +++----
>  .../infiniband/hw/usnic/usnic_uiom_interval_tree.h | 12 +++---
>  drivers/vhost/vhost.c                              |  2 +-
>  drivers/vhost/vhost.h                              |  2 +-
>  fs/hugetlbfs/inode.c                               |  6 +--
>  fs/inode.c                                         |  2 +-
>  include/drm/drm_mm.h                               |  2 +-
>  include/linux/fs.h                                 |  4 +-
>  include/linux/interval_tree.h                      |  8 ++--
>  include/linux/interval_tree_generic.h              | 46 +++++++++++++++++-----
>  include/linux/mm.h                                 | 17 ++++----
>  include/linux/rmap.h                               |  4 +-
>  include/rdma/ib_umem_odp.h                         | 11 ++++--
>  include/rdma/ib_verbs.h                            |  2 +-
>  lib/interval_tree_test.c                           |  4 +-
>  mm/interval_tree.c                                 | 10 ++---
>  mm/memory.c                                        |  4 +-
>  mm/mmap.c                                          | 10 ++---
>  mm/rmap.c                                          |  4 +-
>  33 files changed, 145 insertions(+), 105 deletions(-)
> 
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c
> index 38f739fb727b..3f8aef21b9a6 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c
> @@ -51,7 +51,7 @@ struct amdgpu_mn {
>  
>  	/* objects protected by lock */
>  	struct mutex		lock;
> -	struct rb_root		objects;
> +	struct rb_root_cached	objects;
>  };
>  
>  struct amdgpu_mn_node {
> @@ -76,8 +76,8 @@ static void amdgpu_mn_destroy(struct work_struct *work)
>  	mutex_lock(&adev->mn_lock);
>  	mutex_lock(&rmn->lock);
>  	hash_del(&rmn->node);
> -	rbtree_postorder_for_each_entry_safe(node, next_node, &rmn->objects,
> -					     it.rb) {
> +	rbtree_postorder_for_each_entry_safe(node, next_node,
> +					     &rmn->objects.rb_root, it.rb) {
>  		list_for_each_entry_safe(bo, next_bo, &node->bos, mn_list) {
>  			bo->mn = NULL;
>  			list_del_init(&bo->mn_list);
> @@ -252,7 +252,7 @@ static struct amdgpu_mn *amdgpu_mn_get(struct amdgpu_device *adev)
>  	rmn->mm = mm;
>  	rmn->mn.ops = &amdgpu_mn_ops;
>  	mutex_init(&rmn->lock);
> -	rmn->objects = RB_ROOT;
> +	rmn->objects = RB_ROOT_CACHED;
>  
>  	r = __mmu_notifier_register(&rmn->mn, mm);
>  	if (r)
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
> index 5795f81369f0..f872e2179bbd 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
> @@ -2405,7 +2405,7 @@ int amdgpu_vm_init(struct amdgpu_device *adev, struct amdgpu_vm *vm,
>  	int r, i;
>  	u64 flags;
>  
> -	vm->va = RB_ROOT;
> +	vm->va = RB_ROOT_CACHED;
>  	vm->client_id = atomic64_inc_return(&adev->vm_manager.client_counter);
>  	for (i = 0; i < AMDGPU_MAX_VMHUBS; i++)
>  		vm->reserved_vmid[i] = NULL;
> @@ -2512,10 +2512,11 @@ void amdgpu_vm_fini(struct amdgpu_device *adev, struct amdgpu_vm *vm)
>  
>  	amd_sched_entity_fini(vm->entity.sched, &vm->entity);
>  
> -	if (!RB_EMPTY_ROOT(&vm->va)) {
> +	if (!RB_EMPTY_ROOT(&vm->va.rb_root)) {
>  		dev_err(adev->dev, "still active bo inside vm\n");
>  	}
> -	rbtree_postorder_for_each_entry_safe(mapping, tmp, &vm->va, rb) {
> +	rbtree_postorder_for_each_entry_safe(mapping, tmp,
> +					     &vm->va.rb_root, rb) {
>  		list_del(&mapping->list);
>  		amdgpu_vm_it_remove(mapping, &vm->va);
>  		kfree(mapping);
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
> index 936f158bc5ec..ebffc1253f85 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
> @@ -106,7 +106,7 @@ struct amdgpu_vm_pt {
>  
>  struct amdgpu_vm {
>  	/* tree of virtual addresses mapped */
> -	struct rb_root		va;
> +	struct rb_root_cached	va;
>  
>  	/* protecting invalidated */
>  	spinlock_t		status_lock;
> diff --git a/drivers/gpu/drm/drm_mm.c b/drivers/gpu/drm/drm_mm.c
> index f794089d30ac..61a1c8ea74bc 100644
> --- a/drivers/gpu/drm/drm_mm.c
> +++ b/drivers/gpu/drm/drm_mm.c
> @@ -169,7 +169,7 @@ INTERVAL_TREE_DEFINE(struct drm_mm_node, rb,
>  struct drm_mm_node *
>  __drm_mm_interval_first(const struct drm_mm *mm, u64 start, u64 last)
>  {
> -	return drm_mm_interval_tree_iter_first((struct rb_root *)&mm->interval_tree,
> +	return drm_mm_interval_tree_iter_first((struct rb_root_cached *)&mm->interval_tree,
>  					       start, last) ?: (struct drm_mm_node *)&mm->head_node;
>  }
>  EXPORT_SYMBOL(__drm_mm_interval_first);
> @@ -180,6 +180,7 @@ static void drm_mm_interval_tree_add_node(struct drm_mm_node *hole_node,
>  	struct drm_mm *mm = hole_node->mm;
>  	struct rb_node **link, *rb;
>  	struct drm_mm_node *parent;
> +	bool leftmost = true;
>  
>  	node->__subtree_last = LAST(node);
>  
> @@ -196,9 +197,10 @@ static void drm_mm_interval_tree_add_node(struct drm_mm_node *hole_node,
>  
>  		rb = &hole_node->rb;
>  		link = &hole_node->rb.rb_right;
> +		leftmost = false;
>  	} else {
>  		rb = NULL;
> -		link = &mm->interval_tree.rb_node;
> +		link = &mm->interval_tree.rb_root.rb_node;
>  	}
>  
>  	while (*link) {
> @@ -208,14 +210,15 @@ static void drm_mm_interval_tree_add_node(struct drm_mm_node *hole_node,
>  			parent->__subtree_last = node->__subtree_last;
>  		if (node->start < parent->start)
>  			link = &parent->rb.rb_left;
> -		else
> +		else {
>  			link = &parent->rb.rb_right;
> +			leftmost = false;
> +		}
>  	}
>  
>  	rb_link_node(&node->rb, rb, link);
> -	rb_insert_augmented(&node->rb,
> -			    &mm->interval_tree,
> -			    &drm_mm_interval_tree_augment);
> +	rb_insert_augmented_cached(&node->rb, &mm->interval_tree, leftmost,
> +				   &drm_mm_interval_tree_augment);
>  }
>  
>  #define RB_INSERT(root, member, expr) do { \
> @@ -577,7 +580,7 @@ void drm_mm_replace_node(struct drm_mm_node *old, struct drm_mm_node *new)
>  	*new = *old;
>  
>  	list_replace(&old->node_list, &new->node_list);
> -	rb_replace_node(&old->rb, &new->rb, &old->mm->interval_tree);
> +	rb_replace_node(&old->rb, &new->rb, &old->mm->interval_tree.rb_root);
>  
>  	if (drm_mm_hole_follows(old)) {
>  		list_replace(&old->hole_stack, &new->hole_stack);
> @@ -863,7 +866,7 @@ void drm_mm_init(struct drm_mm *mm, u64 start, u64 size)
>  	mm->color_adjust = NULL;
>  
>  	INIT_LIST_HEAD(&mm->hole_stack);
> -	mm->interval_tree = RB_ROOT;
> +	mm->interval_tree = RB_ROOT_CACHED;
>  	mm->holes_size = RB_ROOT;
>  	mm->holes_addr = RB_ROOT;
>  
> diff --git a/drivers/gpu/drm/drm_vma_manager.c b/drivers/gpu/drm/drm_vma_manager.c
> index d9100b565198..28f1226576f8 100644
> --- a/drivers/gpu/drm/drm_vma_manager.c
> +++ b/drivers/gpu/drm/drm_vma_manager.c
> @@ -147,7 +147,7 @@ struct drm_vma_offset_node *drm_vma_offset_lookup_locked(struct drm_vma_offset_m
>  	struct rb_node *iter;
>  	unsigned long offset;
>  
> -	iter = mgr->vm_addr_space_mm.interval_tree.rb_node;
> +	iter = mgr->vm_addr_space_mm.interval_tree.rb_root.rb_node;
>  	best = NULL;
>  
>  	while (likely(iter)) {
> diff --git a/drivers/gpu/drm/i915/i915_gem_userptr.c b/drivers/gpu/drm/i915/i915_gem_userptr.c
> index ccd09e8419f5..71dddf66baaa 100644
> --- a/drivers/gpu/drm/i915/i915_gem_userptr.c
> +++ b/drivers/gpu/drm/i915/i915_gem_userptr.c
> @@ -49,7 +49,7 @@ struct i915_mmu_notifier {
>  	spinlock_t lock;
>  	struct hlist_node node;
>  	struct mmu_notifier mn;
> -	struct rb_root objects;
> +	struct rb_root_cached objects;
>  	struct workqueue_struct *wq;
>  };
>  
> @@ -123,7 +123,7 @@ static void i915_gem_userptr_mn_invalidate_range_start(struct mmu_notifier *_mn,
>  	struct interval_tree_node *it;
>  	LIST_HEAD(cancelled);
>  
> -	if (RB_EMPTY_ROOT(&mn->objects))
> +	if (RB_EMPTY_ROOT(&mn->objects.rb_root))
>  		return;
>  
>  	/* interval ranges are inclusive, but invalidate range is exclusive */
> @@ -172,7 +172,7 @@ i915_mmu_notifier_create(struct mm_struct *mm)
>  
>  	spin_lock_init(&mn->lock);
>  	mn->mn.ops = &i915_gem_userptr_notifier;
> -	mn->objects = RB_ROOT;
> +	mn->objects = RB_ROOT_CACHED;
>  	mn->wq = alloc_workqueue("i915-userptr-release", WQ_UNBOUND, 0);
>  	if (mn->wq == NULL) {
>  		kfree(mn);
> diff --git a/drivers/gpu/drm/radeon/radeon.h b/drivers/gpu/drm/radeon/radeon.h
> index 5008f3d4cccc..10d0dd146808 100644
> --- a/drivers/gpu/drm/radeon/radeon.h
> +++ b/drivers/gpu/drm/radeon/radeon.h
> @@ -924,7 +924,7 @@ struct radeon_vm_id {
>  struct radeon_vm {
>  	struct mutex		mutex;
>  
> -	struct rb_root		va;
> +	struct rb_root_cached	va;
>  
>  	/* protecting invalidated and freed */
>  	spinlock_t		status_lock;
> diff --git a/drivers/gpu/drm/radeon/radeon_mn.c b/drivers/gpu/drm/radeon/radeon_mn.c
> index 896f2cf51e4e..1d62288b7ee3 100644
> --- a/drivers/gpu/drm/radeon/radeon_mn.c
> +++ b/drivers/gpu/drm/radeon/radeon_mn.c
> @@ -50,7 +50,7 @@ struct radeon_mn {
>  
>  	/* objects protected by lock */
>  	struct mutex		lock;
> -	struct rb_root		objects;
> +	struct rb_root_cached	objects;
>  };
>  
>  struct radeon_mn_node {
> @@ -75,8 +75,8 @@ static void radeon_mn_destroy(struct work_struct *work)
>  	mutex_lock(&rdev->mn_lock);
>  	mutex_lock(&rmn->lock);
>  	hash_del(&rmn->node);
> -	rbtree_postorder_for_each_entry_safe(node, next_node, &rmn->objects,
> -					     it.rb) {
> +	rbtree_postorder_for_each_entry_safe(node, next_node,
> +					     &rmn->objects.rb_root, it.rb) {
>  
>  		interval_tree_remove(&node->it, &rmn->objects);
>  		list_for_each_entry_safe(bo, next_bo, &node->bos, mn_list) {
> @@ -205,7 +205,7 @@ static struct radeon_mn *radeon_mn_get(struct radeon_device *rdev)
>  	rmn->mm = mm;
>  	rmn->mn.ops = &radeon_mn_ops;
>  	mutex_init(&rmn->lock);
> -	rmn->objects = RB_ROOT;
> +	rmn->objects = RB_ROOT_CACHED;
>  	
>  	r = __mmu_notifier_register(&rmn->mn, mm);
>  	if (r)
> diff --git a/drivers/gpu/drm/radeon/radeon_vm.c b/drivers/gpu/drm/radeon/radeon_vm.c
> index 5f68245579a3..f44777a6c2e8 100644
> --- a/drivers/gpu/drm/radeon/radeon_vm.c
> +++ b/drivers/gpu/drm/radeon/radeon_vm.c
> @@ -1185,7 +1185,7 @@ int radeon_vm_init(struct radeon_device *rdev, struct radeon_vm *vm)
>  		vm->ids[i].last_id_use = NULL;
>  	}
>  	mutex_init(&vm->mutex);
> -	vm->va = RB_ROOT;
> +	vm->va = RB_ROOT_CACHED;
>  	spin_lock_init(&vm->status_lock);
>  	INIT_LIST_HEAD(&vm->invalidated);
>  	INIT_LIST_HEAD(&vm->freed);
> @@ -1232,10 +1232,11 @@ void radeon_vm_fini(struct radeon_device *rdev, struct radeon_vm *vm)
>  	struct radeon_bo_va *bo_va, *tmp;
>  	int i, r;
>  
> -	if (!RB_EMPTY_ROOT(&vm->va)) {
> +	if (!RB_EMPTY_ROOT(&vm->va.rb_root)) {
>  		dev_err(rdev->dev, "still active bo inside vm\n");
>  	}
> -	rbtree_postorder_for_each_entry_safe(bo_va, tmp, &vm->va, it.rb) {
> +	rbtree_postorder_for_each_entry_safe(bo_va, tmp,
> +					     &vm->va.rb_root, it.rb) {
>  		interval_tree_remove(&bo_va->it, &vm->va);
>  		r = radeon_bo_reserve(bo_va->bo, false);
>  		if (!r) {
> diff --git a/drivers/infiniband/core/umem_rbtree.c b/drivers/infiniband/core/umem_rbtree.c
> index d176597b4d78..fc801920e341 100644
> --- a/drivers/infiniband/core/umem_rbtree.c
> +++ b/drivers/infiniband/core/umem_rbtree.c
> @@ -72,7 +72,7 @@ INTERVAL_TREE_DEFINE(struct umem_odp_node, rb, u64, __subtree_last,
>  /* @last is not a part of the interval. See comment for function
>   * node_last.
>   */
> -int rbt_ib_umem_for_each_in_range(struct rb_root *root,
> +int rbt_ib_umem_for_each_in_range(struct rb_root_cached *root,
>  				  u64 start, u64 last,
>  				  umem_call_back cb,
>  				  void *cookie)
> @@ -95,7 +95,7 @@ int rbt_ib_umem_for_each_in_range(struct rb_root *root,
>  }
>  EXPORT_SYMBOL(rbt_ib_umem_for_each_in_range);
>  
> -struct ib_umem_odp *rbt_ib_umem_lookup(struct rb_root *root,
> +struct ib_umem_odp *rbt_ib_umem_lookup(struct rb_root_cached *root,
>  				       u64 addr, u64 length)
>  {
>  	struct umem_odp_node *node;
> diff --git a/drivers/infiniband/core/uverbs_cmd.c b/drivers/infiniband/core/uverbs_cmd.c
> index 3f55d18a3791..5321c6ae2c0c 100644
> --- a/drivers/infiniband/core/uverbs_cmd.c
> +++ b/drivers/infiniband/core/uverbs_cmd.c
> @@ -117,7 +117,7 @@ ssize_t ib_uverbs_get_context(struct ib_uverbs_file *file,
>  	ucontext->closing = 0;
>  
>  #ifdef CONFIG_INFINIBAND_ON_DEMAND_PAGING
> -	ucontext->umem_tree = RB_ROOT;
> +	ucontext->umem_tree = RB_ROOT_CACHED;
>  	init_rwsem(&ucontext->umem_rwsem);
>  	ucontext->odp_mrs_count = 0;
>  	INIT_LIST_HEAD(&ucontext->no_private_counters);
> diff --git a/drivers/infiniband/hw/hfi1/mmu_rb.c b/drivers/infiniband/hw/hfi1/mmu_rb.c
> index d41fd87a39f2..5b6626f8feb6 100644
> --- a/drivers/infiniband/hw/hfi1/mmu_rb.c
> +++ b/drivers/infiniband/hw/hfi1/mmu_rb.c
> @@ -54,7 +54,7 @@
>  
>  struct mmu_rb_handler {
>  	struct mmu_notifier mn;
> -	struct rb_root root;
> +	struct rb_root_cached root;
>  	void *ops_arg;
>  	spinlock_t lock;        /* protect the RB tree */
>  	struct mmu_rb_ops *ops;
> @@ -111,7 +111,7 @@ int hfi1_mmu_rb_register(void *ops_arg, struct mm_struct *mm,
>  	if (!handlr)
>  		return -ENOMEM;
>  
> -	handlr->root = RB_ROOT;
> +	handlr->root = RB_ROOT_CACHED;
>  	handlr->ops = ops;
>  	handlr->ops_arg = ops_arg;
>  	INIT_HLIST_NODE(&handlr->mn.hlist);
> @@ -152,9 +152,9 @@ void hfi1_mmu_rb_unregister(struct mmu_rb_handler *handler)
>  	INIT_LIST_HEAD(&del_list);
>  
>  	spin_lock_irqsave(&handler->lock, flags);
> -	while ((node = rb_first(&handler->root))) {
> +	while ((node = rb_first_cached(&handler->root))) {
>  		rbnode = rb_entry(node, struct mmu_rb_node, node);
> -		rb_erase(node, &handler->root);
> +		rb_erase_cached(node, &handler->root);
>  		/* move from LRU list to delete list */
>  		list_move(&rbnode->list, &del_list);
>  	}
> @@ -311,7 +311,7 @@ static void mmu_notifier_mem_invalidate(struct mmu_notifier *mn,
>  {
>  	struct mmu_rb_handler *handler =
>  		container_of(mn, struct mmu_rb_handler, mn);
> -	struct rb_root *root = &handler->root;
> +	struct rb_root_cached *root = &handler->root;
>  	struct mmu_rb_node *node, *ptr = NULL;
>  	unsigned long flags;
>  	bool added = false;
> diff --git a/drivers/infiniband/hw/usnic/usnic_uiom.c b/drivers/infiniband/hw/usnic/usnic_uiom.c
> index c49db7c33979..4381c0a9a873 100644
> --- a/drivers/infiniband/hw/usnic/usnic_uiom.c
> +++ b/drivers/infiniband/hw/usnic/usnic_uiom.c
> @@ -227,7 +227,7 @@ static void __usnic_uiom_reg_release(struct usnic_uiom_pd *pd,
>  	vpn_last = vpn_start + npages - 1;
>  
>  	spin_lock(&pd->lock);
> -	usnic_uiom_remove_interval(&pd->rb_root, vpn_start,
> +	usnic_uiom_remove_interval(&pd->root, vpn_start,
>  					vpn_last, &rm_intervals);
>  	usnic_uiom_unmap_sorted_intervals(&rm_intervals, pd);
>  
> @@ -379,7 +379,7 @@ struct usnic_uiom_reg *usnic_uiom_reg_get(struct usnic_uiom_pd *pd,
>  	err = usnic_uiom_get_intervals_diff(vpn_start, vpn_last,
>  						(writable) ? IOMMU_WRITE : 0,
>  						IOMMU_WRITE,
> -						&pd->rb_root,
> +						&pd->root,
>  						&sorted_diff_intervals);
>  	if (err) {
>  		usnic_err("Failed disjoint interval vpn [0x%lx,0x%lx] err %d\n",
> @@ -395,7 +395,7 @@ struct usnic_uiom_reg *usnic_uiom_reg_get(struct usnic_uiom_pd *pd,
>  
>  	}
>  
> -	err = usnic_uiom_insert_interval(&pd->rb_root, vpn_start, vpn_last,
> +	err = usnic_uiom_insert_interval(&pd->root, vpn_start, vpn_last,
>  					(writable) ? IOMMU_WRITE : 0);
>  	if (err) {
>  		usnic_err("Failed insert interval vpn [0x%lx,0x%lx] err %d\n",
> diff --git a/drivers/infiniband/hw/usnic/usnic_uiom.h b/drivers/infiniband/hw/usnic/usnic_uiom.h
> index 45ca7c1613a7..431efe4143f4 100644
> --- a/drivers/infiniband/hw/usnic/usnic_uiom.h
> +++ b/drivers/infiniband/hw/usnic/usnic_uiom.h
> @@ -55,7 +55,7 @@ struct usnic_uiom_dev {
>  struct usnic_uiom_pd {
>  	struct iommu_domain		*domain;
>  	spinlock_t			lock;
> -	struct rb_root			rb_root;
> +	struct rb_root_cached		root;
>  	struct list_head		devs;
>  	int				dev_cnt;
>  };
> diff --git a/drivers/infiniband/hw/usnic/usnic_uiom_interval_tree.c b/drivers/infiniband/hw/usnic/usnic_uiom_interval_tree.c
> index 42b4b4c4e452..d399523206c7 100644
> --- a/drivers/infiniband/hw/usnic/usnic_uiom_interval_tree.c
> +++ b/drivers/infiniband/hw/usnic/usnic_uiom_interval_tree.c
> @@ -100,9 +100,9 @@ static int interval_cmp(void *priv, struct list_head *a, struct list_head *b)
>  }
>  
>  static void
> -find_intervals_intersection_sorted(struct rb_root *root, unsigned long start,
> -					unsigned long last,
> -					struct list_head *list)
> +find_intervals_intersection_sorted(struct rb_root_cached *root,
> +				   unsigned long start, unsigned long last,
> +				   struct list_head *list)
>  {
>  	struct usnic_uiom_interval_node *node;
>  
> @@ -118,7 +118,7 @@ find_intervals_intersection_sorted(struct rb_root *root, unsigned long start,
>  
>  int usnic_uiom_get_intervals_diff(unsigned long start, unsigned long last,
>  					int flags, int flag_mask,
> -					struct rb_root *root,
> +					struct rb_root_cached *root,
>  					struct list_head *diff_set)
>  {
>  	struct usnic_uiom_interval_node *interval, *tmp;
> @@ -175,7 +175,7 @@ void usnic_uiom_put_interval_set(struct list_head *intervals)
>  		kfree(interval);
>  }
>  
> -int usnic_uiom_insert_interval(struct rb_root *root, unsigned long start,
> +int usnic_uiom_insert_interval(struct rb_root_cached *root, unsigned long start,
>  				unsigned long last, int flags)
>  {
>  	struct usnic_uiom_interval_node *interval, *tmp;
> @@ -246,8 +246,9 @@ int usnic_uiom_insert_interval(struct rb_root *root, unsigned long start,
>  	return err;
>  }
>  
> -void usnic_uiom_remove_interval(struct rb_root *root, unsigned long start,
> -				unsigned long last, struct list_head *removed)
> +void usnic_uiom_remove_interval(struct rb_root_cached *root,
> +				unsigned long start, unsigned long last,
> +				struct list_head *removed)
>  {
>  	struct usnic_uiom_interval_node *interval;
>  
> diff --git a/drivers/infiniband/hw/usnic/usnic_uiom_interval_tree.h b/drivers/infiniband/hw/usnic/usnic_uiom_interval_tree.h
> index c0b0b876ab90..1d7fc3226bca 100644
> --- a/drivers/infiniband/hw/usnic/usnic_uiom_interval_tree.h
> +++ b/drivers/infiniband/hw/usnic/usnic_uiom_interval_tree.h
> @@ -48,12 +48,12 @@ struct usnic_uiom_interval_node {
>  
>  extern void
>  usnic_uiom_interval_tree_insert(struct usnic_uiom_interval_node *node,
> -					struct rb_root *root);
> +					struct rb_root_cached *root);
>  extern void
>  usnic_uiom_interval_tree_remove(struct usnic_uiom_interval_node *node,
> -					struct rb_root *root);
> +					struct rb_root_cached *root);
>  extern struct usnic_uiom_interval_node *
> -usnic_uiom_interval_tree_iter_first(struct rb_root *root,
> +usnic_uiom_interval_tree_iter_first(struct rb_root_cached *root,
>  					unsigned long start,
>  					unsigned long last);
>  extern struct usnic_uiom_interval_node *
> @@ -63,7 +63,7 @@ usnic_uiom_interval_tree_iter_next(struct usnic_uiom_interval_node *node,
>   * Inserts {start...last} into {root}.  If there are overlaps,
>   * nodes will be broken up and merged
>   */
> -int usnic_uiom_insert_interval(struct rb_root *root,
> +int usnic_uiom_insert_interval(struct rb_root_cached *root,
>  				unsigned long start, unsigned long last,
>  				int flags);
>  /*
> @@ -71,7 +71,7 @@ int usnic_uiom_insert_interval(struct rb_root *root,
>   * 'removed.' The caller is responsible for freeing memory of nodes in
>   * 'removed.'
>   */
> -void usnic_uiom_remove_interval(struct rb_root *root,
> +void usnic_uiom_remove_interval(struct rb_root_cached *root,
>  				unsigned long start, unsigned long last,
>  				struct list_head *removed);
>  /*
> @@ -81,7 +81,7 @@ void usnic_uiom_remove_interval(struct rb_root *root,
>  int usnic_uiom_get_intervals_diff(unsigned long start,
>  					unsigned long last, int flags,
>  					int flag_mask,
> -					struct rb_root *root,
> +					struct rb_root_cached *root,
>  					struct list_head *diff_set);
>  /* Call this to free diff_set returned by usnic_uiom_get_intervals_diff */
>  void usnic_uiom_put_interval_set(struct list_head *intervals);
> diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
> index e4613a3c362d..88dc214de068 100644
> --- a/drivers/vhost/vhost.c
> +++ b/drivers/vhost/vhost.c
> @@ -1272,7 +1272,7 @@ static struct vhost_umem *vhost_umem_alloc(void)
>  	if (!umem)
>  		return NULL;
>  
> -	umem->umem_tree = RB_ROOT;
> +	umem->umem_tree = RB_ROOT_CACHED;
>  	umem->numem = 0;
>  	INIT_LIST_HEAD(&umem->umem_list);
>  
> diff --git a/drivers/vhost/vhost.h b/drivers/vhost/vhost.h
> index f72095868b93..a0278ba6a8b4 100644
> --- a/drivers/vhost/vhost.h
> +++ b/drivers/vhost/vhost.h
> @@ -71,7 +71,7 @@ struct vhost_umem_node {
>  };
>  
>  struct vhost_umem {
> -	struct rb_root umem_tree;
> +	struct rb_root_cached umem_tree;
>  	struct list_head umem_list;
>  	int numem;
>  };
> diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
> index 28d2753be094..e61b91e6adf4 100644
> --- a/fs/hugetlbfs/inode.c
> +++ b/fs/hugetlbfs/inode.c
> @@ -334,7 +334,7 @@ static void remove_huge_page(struct page *page)
>  }
>  
>  static void
> -hugetlb_vmdelete_list(struct rb_root *root, pgoff_t start, pgoff_t end)
> +hugetlb_vmdelete_list(struct rb_root_cached *root, pgoff_t start, pgoff_t end)
>  {
>  	struct vm_area_struct *vma;
>  
> @@ -514,7 +514,7 @@ static int hugetlb_vmtruncate(struct inode *inode, loff_t offset)
>  
>  	i_size_write(inode, offset);
>  	i_mmap_lock_write(mapping);
> -	if (!RB_EMPTY_ROOT(&mapping->i_mmap))
> +	if (!RB_EMPTY_ROOT(&mapping->i_mmap.rb_root))
>  		hugetlb_vmdelete_list(&mapping->i_mmap, pgoff, 0);
>  	i_mmap_unlock_write(mapping);
>  	remove_inode_hugepages(inode, offset, LLONG_MAX);
> @@ -539,7 +539,7 @@ static long hugetlbfs_punch_hole(struct inode *inode, loff_t offset, loff_t len)
>  
>  		inode_lock(inode);
>  		i_mmap_lock_write(mapping);
> -		if (!RB_EMPTY_ROOT(&mapping->i_mmap))
> +		if (!RB_EMPTY_ROOT(&mapping->i_mmap.rb_root))
>  			hugetlb_vmdelete_list(&mapping->i_mmap,
>  						hole_start >> PAGE_SHIFT,
>  						hole_end  >> PAGE_SHIFT);
> diff --git a/fs/inode.c b/fs/inode.c
> index 50370599e371..5ddb808ea934 100644
> --- a/fs/inode.c
> +++ b/fs/inode.c
> @@ -353,7 +353,7 @@ void address_space_init_once(struct address_space *mapping)
>  	init_rwsem(&mapping->i_mmap_rwsem);
>  	INIT_LIST_HEAD(&mapping->private_list);
>  	spin_lock_init(&mapping->private_lock);
> -	mapping->i_mmap = RB_ROOT;
> +	mapping->i_mmap = RB_ROOT_CACHED;
>  }
>  EXPORT_SYMBOL(address_space_init_once);
>  
> diff --git a/include/drm/drm_mm.h b/include/drm/drm_mm.h
> index 49b292e98fec..8d10fc97801c 100644
> --- a/include/drm/drm_mm.h
> +++ b/include/drm/drm_mm.h
> @@ -172,7 +172,7 @@ struct drm_mm {
>  	 * according to the (increasing) start address of the memory node. */
>  	struct drm_mm_node head_node;
>  	/* Keep an interval_tree for fast lookup of drm_mm_nodes by address. */
> -	struct rb_root interval_tree;
> +	struct rb_root_cached interval_tree;
>  	struct rb_root holes_size;
>  	struct rb_root holes_addr;
>  
> diff --git a/include/linux/fs.h b/include/linux/fs.h
> index ed8798255872..d1d521c5025b 100644
> --- a/include/linux/fs.h
> +++ b/include/linux/fs.h
> @@ -392,7 +392,7 @@ struct address_space {
>  	struct radix_tree_root	page_tree;	/* radix tree of all pages */
>  	spinlock_t		tree_lock;	/* and lock protecting it */
>  	atomic_t		i_mmap_writable;/* count VM_SHARED mappings */
> -	struct rb_root		i_mmap;		/* tree of private and shared mappings */
> +	struct rb_root_cached	i_mmap;		/* tree of private and shared mappings */
>  	struct rw_semaphore	i_mmap_rwsem;	/* protect tree, count, list */
>  	/* Protected by tree_lock together with the radix tree */
>  	unsigned long		nrpages;	/* number of total pages */
> @@ -486,7 +486,7 @@ static inline void i_mmap_unlock_read(struct address_space *mapping)
>   */
>  static inline int mapping_mapped(struct address_space *mapping)
>  {
> -	return	!RB_EMPTY_ROOT(&mapping->i_mmap);
> +	return	!RB_EMPTY_ROOT(&mapping->i_mmap.rb_root);
>  }
>  
>  /*
> diff --git a/include/linux/interval_tree.h b/include/linux/interval_tree.h
> index 724556aa3c95..202ee1283f4b 100644
> --- a/include/linux/interval_tree.h
> +++ b/include/linux/interval_tree.h
> @@ -11,13 +11,15 @@ struct interval_tree_node {
>  };
>  
>  extern void
> -interval_tree_insert(struct interval_tree_node *node, struct rb_root *root);
> +interval_tree_insert(struct interval_tree_node *node,
> +		     struct rb_root_cached *root);
>  
>  extern void
> -interval_tree_remove(struct interval_tree_node *node, struct rb_root *root);
> +interval_tree_remove(struct interval_tree_node *node,
> +		     struct rb_root_cached *root);
>  
>  extern struct interval_tree_node *
> -interval_tree_iter_first(struct rb_root *root,
> +interval_tree_iter_first(struct rb_root_cached *root,
>  			 unsigned long start, unsigned long last);
>  
>  extern struct interval_tree_node *
> diff --git a/include/linux/interval_tree_generic.h b/include/linux/interval_tree_generic.h
> index 58370e1862ad..f096423c8cbd 100644
> --- a/include/linux/interval_tree_generic.h
> +++ b/include/linux/interval_tree_generic.h
> @@ -65,11 +65,13 @@ RB_DECLARE_CALLBACKS(static, ITPREFIX ## _augment, ITSTRUCT, ITRB,	      \
>  									      \
>  /* Insert / remove interval nodes from the tree */			      \
>  									      \
> -ITSTATIC void ITPREFIX ## _insert(ITSTRUCT *node, struct rb_root *root)	      \
> +ITSTATIC void ITPREFIX ## _insert(ITSTRUCT *node,			      \
> +				  struct rb_root_cached *root)	 	      \
>  {									      \
> -	struct rb_node **link = &root->rb_node, *rb_parent = NULL;	      \
> +	struct rb_node **link = &root->rb_root.rb_node, *rb_parent = NULL;    \
>  	ITTYPE start = ITSTART(node), last = ITLAST(node);		      \
>  	ITSTRUCT *parent;						      \
> +	bool leftmost = true;						      \
>  									      \
>  	while (*link) {							      \
>  		rb_parent = *link;					      \
> @@ -78,18 +80,22 @@ ITSTATIC void ITPREFIX ## _insert(ITSTRUCT *node, struct rb_root *root)	      \
>  			parent->ITSUBTREE = last;			      \
>  		if (start < ITSTART(parent))				      \
>  			link = &parent->ITRB.rb_left;			      \
> -		else							      \
> +		else {							      \
>  			link = &parent->ITRB.rb_right;			      \
> +			leftmost = false;				      \
> +		}							      \
>  	}								      \
>  									      \
>  	node->ITSUBTREE = last;						      \
>  	rb_link_node(&node->ITRB, rb_parent, link);			      \
> -	rb_insert_augmented(&node->ITRB, root, &ITPREFIX ## _augment);	      \
> +	rb_insert_augmented_cached(&node->ITRB, root,			      \
> +				   leftmost, &ITPREFIX ## _augment);	      \
>  }									      \
>  									      \
> -ITSTATIC void ITPREFIX ## _remove(ITSTRUCT *node, struct rb_root *root)	      \
> +ITSTATIC void ITPREFIX ## _remove(ITSTRUCT *node,			      \
> +				  struct rb_root_cached *root)		      \
>  {									      \
> -	rb_erase_augmented(&node->ITRB, root, &ITPREFIX ## _augment);	      \
> +	rb_erase_augmented_cached(&node->ITRB, root, &ITPREFIX ## _augment);  \
>  }									      \
>  									      \
>  /*									      \
> @@ -140,15 +146,35 @@ ITPREFIX ## _subtree_search(ITSTRUCT *node, ITTYPE start, ITTYPE last)	      \
>  }									      \
>  									      \
>  ITSTATIC ITSTRUCT *							      \
> -ITPREFIX ## _iter_first(struct rb_root *root, ITTYPE start, ITTYPE last)      \
> +ITPREFIX ## _iter_first(struct rb_root_cached *root,			      \
> +			ITTYPE start, ITTYPE last)			      \
>  {									      \
> -	ITSTRUCT *node;							      \
> +	ITSTRUCT *node, *leftmost;					      \
>  									      \
> -	if (!root->rb_node)						      \
> +	if (!root->rb_root.rb_node)					      \
>  		return NULL;						      \
> -	node = rb_entry(root->rb_node, ITSTRUCT, ITRB);			      \
> +									      \
> +	/*								      \
> +	 * Fastpath range intersection/overlap between A: [a0, a1] and	      \
> +	 * B: [b0, b1] is given by:					      \
> +	 *								      \
> +	 *         a0 <= b1 && b0 <= a1					      \
> +	 *								      \
> +	 *  ... where A holds the lookup range and B holds the smallest      \
> +	 * 'start' and largest 'last' in the tree. For the latter, we	      \
> +	 * rely on the root node, which by augmented interval tree	      \
> +	 * property, holds the largest value in its last-in-subtree.	      \
> +	 * This allows mitigating some of the tree walk overhead for	      \
> +	 * non-intersecting ranges, maintained and consulted in O(1).         \
> +	 */								      \
> +	node = rb_entry(root->rb_root.rb_node, ITSTRUCT, ITRB);		      \
>  	if (node->ITSUBTREE < start)					      \
>  		return NULL;						      \
> +									      \
> +	leftmost = rb_entry(root->rb_leftmost, ITSTRUCT, ITRB);		      \
> +	if (ITSTART(leftmost) > last)					      \
> +		return NULL;						      \
> +									      \
>  	return ITPREFIX ## _subtree_search(node, start, last);		      \
>  }									      \
>  									      \
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index 46b9ac5e8569..3a2652efbbfb 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -1992,13 +1992,13 @@ extern int nommu_shrink_inode_mappings(struct inode *, size_t, size_t);
>  
>  /* interval_tree.c */
>  void vma_interval_tree_insert(struct vm_area_struct *node,
> -			      struct rb_root *root);
> +			      struct rb_root_cached *root);
>  void vma_interval_tree_insert_after(struct vm_area_struct *node,
>  				    struct vm_area_struct *prev,
> -				    struct rb_root *root);
> +				    struct rb_root_cached *root);
>  void vma_interval_tree_remove(struct vm_area_struct *node,
> -			      struct rb_root *root);
> -struct vm_area_struct *vma_interval_tree_iter_first(struct rb_root *root,
> +			      struct rb_root_cached *root);
> +struct vm_area_struct *vma_interval_tree_iter_first(struct rb_root_cached *root,
>  				unsigned long start, unsigned long last);
>  struct vm_area_struct *vma_interval_tree_iter_next(struct vm_area_struct *node,
>  				unsigned long start, unsigned long last);
> @@ -2008,11 +2008,12 @@ struct vm_area_struct *vma_interval_tree_iter_next(struct vm_area_struct *node,
>  	     vma; vma = vma_interval_tree_iter_next(vma, start, last))
>  
>  void anon_vma_interval_tree_insert(struct anon_vma_chain *node,
> -				   struct rb_root *root);
> +				   struct rb_root_cached *root);
>  void anon_vma_interval_tree_remove(struct anon_vma_chain *node,
> -				   struct rb_root *root);
> -struct anon_vma_chain *anon_vma_interval_tree_iter_first(
> -	struct rb_root *root, unsigned long start, unsigned long last);
> +				   struct rb_root_cached *root);
> +struct anon_vma_chain *
> +anon_vma_interval_tree_iter_first(struct rb_root_cached *root,
> +				  unsigned long start, unsigned long last);
>  struct anon_vma_chain *anon_vma_interval_tree_iter_next(
>  	struct anon_vma_chain *node, unsigned long start, unsigned long last);
>  #ifdef CONFIG_DEBUG_VM_RB
> diff --git a/include/linux/rmap.h b/include/linux/rmap.h
> index 43ef2c30cb0f..22c298c6cc26 100644
> --- a/include/linux/rmap.h
> +++ b/include/linux/rmap.h
> @@ -55,7 +55,9 @@ struct anon_vma {
>  	 * is serialized by a system wide lock only visible to
>  	 * mm_take_all_locks() (mm_all_locks_mutex).
>  	 */
> -	struct rb_root rb_root;	/* Interval tree of private "related" vmas */
> +
> +	/* Interval tree of private "related" vmas */
> +	struct rb_root_cached rb_root;
>  };
>  
>  /*
> diff --git a/include/rdma/ib_umem_odp.h b/include/rdma/ib_umem_odp.h
> index fb67554aabd6..5eb7f5bc8248 100644
> --- a/include/rdma/ib_umem_odp.h
> +++ b/include/rdma/ib_umem_odp.h
> @@ -111,22 +111,25 @@ int ib_umem_odp_map_dma_pages(struct ib_umem *umem, u64 start_offset, u64 bcnt,
>  void ib_umem_odp_unmap_dma_pages(struct ib_umem *umem, u64 start_offset,
>  				 u64 bound);
>  
> -void rbt_ib_umem_insert(struct umem_odp_node *node, struct rb_root *root);
> -void rbt_ib_umem_remove(struct umem_odp_node *node, struct rb_root *root);
> +void rbt_ib_umem_insert(struct umem_odp_node *node,
> +			struct rb_root_cached *root);
> +void rbt_ib_umem_remove(struct umem_odp_node *node,
> +			struct rb_root_cached *root);
>  typedef int (*umem_call_back)(struct ib_umem *item, u64 start, u64 end,
>  			      void *cookie);
>  /*
>   * Call the callback on each ib_umem in the range. Returns the logical or of
>   * the return values of the functions called.
>   */
> -int rbt_ib_umem_for_each_in_range(struct rb_root *root, u64 start, u64 end,
> +int rbt_ib_umem_for_each_in_range(struct rb_root_cached *root,
> +				  u64 start, u64 end,
>  				  umem_call_back cb, void *cookie);
>  
>  /*
>   * Find first region intersecting with address range.
>   * Return NULL if not found
>   */
> -struct ib_umem_odp *rbt_ib_umem_lookup(struct rb_root *root,
> +struct ib_umem_odp *rbt_ib_umem_lookup(struct rb_root_cached *root,
>  				       u64 addr, u64 length);
>  
>  static inline int ib_umem_mmu_notifier_retry(struct ib_umem *item,
> diff --git a/include/rdma/ib_verbs.h b/include/rdma/ib_verbs.h
> index 593ad2640d2f..81665480a4ab 100644
> --- a/include/rdma/ib_verbs.h
> +++ b/include/rdma/ib_verbs.h
> @@ -1420,7 +1420,7 @@ struct ib_ucontext {
>  
>  	struct pid             *tgid;
>  #ifdef CONFIG_INFINIBAND_ON_DEMAND_PAGING
> -	struct rb_root      umem_tree;
> +	struct rb_root_cached   umem_tree;
>  	/*
>  	 * Protects .umem_rbroot and tree, as well as odp_mrs_count and
>  	 * mmu notifiers registration.
> diff --git a/lib/interval_tree_test.c b/lib/interval_tree_test.c
> index df495fe81421..0e343fd29570 100644
> --- a/lib/interval_tree_test.c
> +++ b/lib/interval_tree_test.c
> @@ -19,14 +19,14 @@ __param(bool, search_all, false, "Searches will iterate all nodes in the tree");
>  
>  __param(uint, max_endpoint, ~0, "Largest value for the interval's endpoint");
>  
> -static struct rb_root root = RB_ROOT;
> +static struct rb_root_cached root = RB_ROOT_CACHED;
>  static struct interval_tree_node *nodes = NULL;
>  static u32 *queries = NULL;
>  
>  static struct rnd_state rnd;
>  
>  static inline unsigned long
> -search(struct rb_root *root, unsigned long start, unsigned long last)
> +search(struct rb_root_cached *root, unsigned long start, unsigned long last)
>  {
>  	struct interval_tree_node *node;
>  	unsigned long results = 0;
> diff --git a/mm/interval_tree.c b/mm/interval_tree.c
> index f2c2492681bf..b47664358796 100644
> --- a/mm/interval_tree.c
> +++ b/mm/interval_tree.c
> @@ -28,7 +28,7 @@ INTERVAL_TREE_DEFINE(struct vm_area_struct, shared.rb,
>  /* Insert node immediately after prev in the interval tree */
>  void vma_interval_tree_insert_after(struct vm_area_struct *node,
>  				    struct vm_area_struct *prev,
> -				    struct rb_root *root)
> +				    struct rb_root_cached *root)
>  {
>  	struct rb_node **link;
>  	struct vm_area_struct *parent;
> @@ -55,7 +55,7 @@ void vma_interval_tree_insert_after(struct vm_area_struct *node,
>  
>  	node->shared.rb_subtree_last = last;
>  	rb_link_node(&node->shared.rb, &parent->shared.rb, link);
> -	rb_insert_augmented(&node->shared.rb, root,
> +	rb_insert_augmented(&node->shared.rb, &root->rb_root,
>  			    &vma_interval_tree_augment);
>  }
>  
> @@ -74,7 +74,7 @@ INTERVAL_TREE_DEFINE(struct anon_vma_chain, rb, unsigned long, rb_subtree_last,
>  		     static inline, __anon_vma_interval_tree)
>  
>  void anon_vma_interval_tree_insert(struct anon_vma_chain *node,
> -				   struct rb_root *root)
> +				   struct rb_root_cached *root)
>  {
>  #ifdef CONFIG_DEBUG_VM_RB
>  	node->cached_vma_start = avc_start_pgoff(node);
> @@ -84,13 +84,13 @@ void anon_vma_interval_tree_insert(struct anon_vma_chain *node,
>  }
>  
>  void anon_vma_interval_tree_remove(struct anon_vma_chain *node,
> -				   struct rb_root *root)
> +				   struct rb_root_cached *root)
>  {
>  	__anon_vma_interval_tree_remove(node, root);
>  }
>  
>  struct anon_vma_chain *
> -anon_vma_interval_tree_iter_first(struct rb_root *root,
> +anon_vma_interval_tree_iter_first(struct rb_root_cached *root,
>  				  unsigned long first, unsigned long last)
>  {
>  	return __anon_vma_interval_tree_iter_first(root, first, last);
> diff --git a/mm/memory.c b/mm/memory.c
> index 0e517be91a89..33cb79f73394 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -2593,7 +2593,7 @@ static void unmap_mapping_range_vma(struct vm_area_struct *vma,
>  	zap_page_range_single(vma, start_addr, end_addr - start_addr, details);
>  }
>  
> -static inline void unmap_mapping_range_tree(struct rb_root *root,
> +static inline void unmap_mapping_range_tree(struct rb_root_cached *root,
>  					    struct zap_details *details)
>  {
>  	struct vm_area_struct *vma;
> @@ -2657,7 +2657,7 @@ void unmap_mapping_range(struct address_space *mapping,
>  		details.last_index = ULONG_MAX;
>  
>  	i_mmap_lock_write(mapping);
> -	if (unlikely(!RB_EMPTY_ROOT(&mapping->i_mmap)))
> +	if (unlikely(!RB_EMPTY_ROOT(&mapping->i_mmap.rb_root)))
>  		unmap_mapping_range_tree(&mapping->i_mmap, &details);
>  	i_mmap_unlock_write(mapping);
>  }
> diff --git a/mm/mmap.c b/mm/mmap.c
> index f19efcf75418..8121c70df96f 100644
> --- a/mm/mmap.c
> +++ b/mm/mmap.c
> @@ -684,7 +684,7 @@ int __vma_adjust(struct vm_area_struct *vma, unsigned long start,
>  	struct mm_struct *mm = vma->vm_mm;
>  	struct vm_area_struct *next = vma->vm_next, *orig_vma = vma;
>  	struct address_space *mapping = NULL;
> -	struct rb_root *root = NULL;
> +	struct rb_root_cached *root = NULL;
>  	struct anon_vma *anon_vma = NULL;
>  	struct file *file = vma->vm_file;
>  	bool start_changed = false, end_changed = false;
> @@ -3314,7 +3314,7 @@ static DEFINE_MUTEX(mm_all_locks_mutex);
>  
>  static void vm_lock_anon_vma(struct mm_struct *mm, struct anon_vma *anon_vma)
>  {
> -	if (!test_bit(0, (unsigned long *) &anon_vma->root->rb_root.rb_node)) {
> +	if (!test_bit(0, (unsigned long *) &anon_vma->rb_root.rb_root.rb_node)) {
>  		/*
>  		 * The LSB of head.next can't change from under us
>  		 * because we hold the mm_all_locks_mutex.
> @@ -3330,7 +3330,7 @@ static void vm_lock_anon_vma(struct mm_struct *mm, struct anon_vma *anon_vma)
>  		 * anon_vma->root->rwsem.
>  		 */
>  		if (__test_and_set_bit(0, (unsigned long *)
> -				       &anon_vma->root->rb_root.rb_node))
> +				       &anon_vma->root->rb_root.rb_root.rb_node))
>  			BUG();
>  	}
>  }
> @@ -3432,7 +3432,7 @@ int mm_take_all_locks(struct mm_struct *mm)
>  
>  static void vm_unlock_anon_vma(struct anon_vma *anon_vma)
>  {
> -	if (test_bit(0, (unsigned long *) &anon_vma->root->rb_root.rb_node)) {
> +	if (test_bit(0, (unsigned long *) &anon_vma->root->rb_root.rb_root.rb_node)) {
>  		/*
>  		 * The LSB of head.next can't change to 0 from under
>  		 * us because we hold the mm_all_locks_mutex.
> @@ -3446,7 +3446,7 @@ static void vm_unlock_anon_vma(struct anon_vma *anon_vma)
>  		 * anon_vma->root->rwsem.
>  		 */
>  		if (!__test_and_clear_bit(0, (unsigned long *)
> -					  &anon_vma->root->rb_root.rb_node))
> +					  &anon_vma->root->rb_root.rb_root.rb_node))
>  			BUG();
>  		anon_vma_unlock_write(anon_vma);
>  	}
> diff --git a/mm/rmap.c b/mm/rmap.c
> index ced14f1af6dc..ad479e5e081d 100644
> --- a/mm/rmap.c
> +++ b/mm/rmap.c
> @@ -390,7 +390,7 @@ void unlink_anon_vmas(struct vm_area_struct *vma)
>  		 * Leave empty anon_vmas on the list - we'll need
>  		 * to free them outside the lock.
>  		 */
> -		if (RB_EMPTY_ROOT(&anon_vma->rb_root)) {
> +		if (RB_EMPTY_ROOT(&anon_vma->rb_root.rb_root)) {
>  			anon_vma->parent->degree--;
>  			continue;
>  		}
> @@ -424,7 +424,7 @@ static void anon_vma_ctor(void *data)
>  
>  	init_rwsem(&anon_vma->rwsem);
>  	atomic_set(&anon_vma->refcount, 0);
> -	anon_vma->rb_root = RB_ROOT;
> +	anon_vma->rb_root = RB_ROOT_CACHED;
>  }
>  
>  void __init anon_vma_init(void)
> -- 
> 2.12.0
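
A standalone sketch of the O(1) rejection test that the converted
ITPREFIX ## _iter_first() fastpath above performs, assuming closed
intervals [start, last]. This is illustrative userspace C, not kernel
code; the names (itree_summary, may_overlap) are invented for the
example, and the two cached extremes stand in for root->ITSUBTREE and
the leftmost node's start:

#include <stdbool.h>
#include <stdio.h>

/* Summary of the two values the cached tree keeps available in O(1). */
struct itree_summary {
	unsigned long leftmost_start;	 /* smallest ITSTART in the tree   */
	unsigned long root_subtree_last; /* root's ITSUBTREE, largest last */
};

/*
 * A query [start, last] can only overlap some interval if it does not
 * end before the smallest start and does not begin after the largest
 * last, i.e. a0 <= b1 && b0 <= a1 evaluated against the extremes.
 * Returning true only means the tree still has to be walked; it does
 * not by itself guarantee an overlap.
 */
static bool may_overlap(const struct itree_summary *t,
			unsigned long start, unsigned long last)
{
	return t->root_subtree_last >= start && t->leftmost_start <= last;
}

int main(void)
{
	/* Tree holding [10,15] and [40,45]: leftmost start 10, max last 45. */
	struct itree_summary t = {
		.leftmost_start = 10,
		.root_subtree_last = 45,
	};

	printf("[0,5]   -> %d\n", may_overlap(&t, 0, 5));   /* 0: ends before the tree  */
	printf("[50,60] -> %d\n", may_overlap(&t, 50, 60)); /* 0: starts after the tree */
	printf("[12,42] -> %d\n", may_overlap(&t, 12, 42)); /* 1: walk the tree         */
	return 0;
}

The two rejected queries never touch a single tree node, which is where
caching rb_leftmost pays off for non-overlapping lookups.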



Thread overview: 29+ messages
2017-07-19  1:45 [PATCH -next v4 00/17] rbtree: cache leftmost node internally Davidlohr Bueso
2017-07-19  1:45 ` [PATCH 01/17] " Davidlohr Bueso
2017-07-19  1:45 ` [PATCH 02/17] rbtree: optimize root-check during rebalancing loop Davidlohr Bueso
2017-07-19  1:45 ` [PATCH 03/17] rbtree: add some additional comments for rebalancing cases Davidlohr Bueso
2017-07-19  1:45 ` [PATCH 04/17] lib/rbtree_test.c: make input module parameters Davidlohr Bueso
2017-07-19  1:45 ` [PATCH 05/17] lib/rbtree_test.c: add (inorder) traversal test Davidlohr Bueso
2017-07-19  1:45 ` [PATCH 06/17] lib/rbtree_test.c: support rb_root_cached Davidlohr Bueso
2017-07-19  1:45 ` [PATCH 07/17] sched/fair: replace cfs_rq->rb_leftmost Davidlohr Bueso
2017-07-19  1:45 ` [PATCH 08/17] sched/deadline: replace earliest dl and rq leftmost caching Davidlohr Bueso
2017-07-19  1:45 ` [PATCH 09/17] locking/rtmutex: replace top-waiter and pi_waiters " Davidlohr Bueso
2017-07-19  1:45 ` [PATCH 10/17] block/cfq: replace cfq_rb_root " Davidlohr Bueso
2017-07-19  7:46   ` Jan Kara
2017-07-19  1:45 ` [PATCH 11/17] lib/interval_tree: fast overlap detection Davidlohr Bueso
     [not found]   ` <20170719014603.19029-12-dave-h16yJtLeMjHk1uMJSBkQmQ@public.gmane.org>
2017-07-22 17:52     ` Doug Ledford
2017-07-22 17:52       ` Doug Ledford
2017-08-01 17:16   ` Michael S. Tsirkin
2017-08-01 17:16     ` Michael S. Tsirkin
2017-07-19  1:45 ` [PATCH 12/17] lib/interval-tree: correct comment wrt generic flavor Davidlohr Bueso
2017-07-19  1:45 ` [PATCH 13/17] procfs: use faster rb_first_cached() Davidlohr Bueso
2017-07-19  1:46 ` [PATCH 14/17] fs/epoll: " Davidlohr Bueso
2017-07-19  1:46 ` [PATCH 15/17] fs/ext4: use cached rbtrees Davidlohr Bueso
2017-07-19  7:40   ` Jan Kara
2017-07-19 22:50     ` Davidlohr Bueso
2017-07-19  1:46 ` [PATCH 16/17] mem/memcg: cache rightmost node Davidlohr Bueso
2017-07-19  7:50   ` Michal Hocko
2017-07-26 21:09     ` Andrew Morton
2017-07-27  7:06       ` Michal Hocko
2017-07-19  1:46 ` [PATCH 17/17] block/cfq: cache rightmost rb_node Davidlohr Bueso
2017-07-19  7:59   ` Jan Kara
