linux-kernel.vger.kernel.org archive mirror
* [PATCH -next 0/5] rbtree: optimize frequent tree walks
@ 2020-02-07 18:03 Davidlohr Bueso
  2020-02-07 18:03 ` [PATCH 1/5] lib/rbtree: introduce linked-list rbtree interface Davidlohr Bueso
                   ` (5 more replies)
  0 siblings, 6 replies; 18+ messages in thread
From: Davidlohr Bueso @ 2020-02-07 18:03 UTC (permalink / raw)
  To: akpm; +Cc: dave, linux-kernel, mcgrof, broonie, alex.williamson

This series introduces the ll_rbtree interface to optimize rbtree users
that incur frequent in-order tree iterations (even full traversals).
This can be seen as an abstraction of what is already done for dealing
with VMAs (albeit for non-augmented trees).

Mainly, it trades a higher per-node memory footprint (two extra pointers)
and minimal runtime overhead for branchless O(1) next and prev node
pointers. See patch 1 for full details, but, for example, the following
tables show the benefits vs the costs of using this new interface. The
first compares the cycles needed for a full in-order traversal:

   +--------+--------------+-----------+
   | #nodes | plain rbtree | ll-rbtree |
   +--------+--------------+-----------+
   |     10 |          138 |        24 |
   |    100 |        7,200 |       425 |
   |   1000 |       17,000 |     8,000 |
   |  10000 |      501,090 |   222,500 |
   +--------+--------------+-----------+

There are other node representations that optimize getting such pointers
without bloating the nodes as much, such as keeping a parent pointer, or
threaded trees where otherwise-nil child pointers are recycled to point to
the in-order neighbors; both, however, incur a higher runtime penalty for
common modification operations as well as for rotations. The overhead of
tree modifications can be seen in the following table, comparing cycles
to insert+delete:

   +--------+--------------+------------------+-----------+
   | #nodes | plain rbtree | augmented rbtree | ll-rbtree |
   +--------+--------------+------------------+-----------+
   |     10 |          530 |              840 |       600 |
   |    100 |        7,100 |           14,200 |     7,800 |
   |   1000 |      265,000 |          370,000 |   205,200 |
   |  10000 |    4,400,000 |        5,500,000 | 4,200,000 |
   +--------+--------------+------------------+-----------+
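
For reference, a full in-order walk with the new interface boils down to
the following sketch (struct mytype, its 'key' and 'node' members, and
walk_all() are made up here for illustration; the llrb_* helpers are the
ones introduced in patch 1):

  #include <linux/ll_rbtree.h>
  #include <linux/printk.h>

  struct mytype {
	struct llrb_node node;	/* embeds rb_node plus prev/next pointers */
	int key;
  };

  static struct llrb_root mytree = LLRB_ROOT;

  static void walk_all(void)
  {
	struct llrb_node *n;

	/* each step is a single pointer load (n->next), no walking up the tree */
	for (n = llrb_first(&mytree); n; n = llrb_next(n)) {
		struct mytype *entry = rb_entry(n, struct mytype, node);

		pr_info("key=%d\n", entry->key);
	}
  }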


Patch 1 introduces the ll_rbtree machinery.

Patches 2-5 convert users that might benefit from the new interface.

Full details and justifications for the conversion are in each
individual patch.

I am Cc'ing some of the maintainers of the users I have converted on the
whole series so that the numbers and interface details can be easily seen.

Please consider for v5.7.

Thanks!

Davidlohr Bueso (5):
  lib/rbtree: introduce linked-list rbtree interface
  proc/sysctl: optimize proc_sys_readdir
  regmap: optimize sync() and drop() regcache callbacks
  vfio/type1: optimize dma_list tree iterations
  uprobes: optimize build_probe_list()

 Documentation/rbtree.txt              |  74 ++++++++++++++++++++++++
 drivers/base/regmap/regcache-rbtree.c |  62 +++++++++++---------
 drivers/vfio/vfio_iommu_type1.c       |  50 +++++++++--------
 fs/proc/proc_sysctl.c                 |  37 ++++++------
 include/linux/ll_rbtree.h             | 103 ++++++++++++++++++++++++++++++++++
 include/linux/sysctl.h                |   6 +-
 kernel/events/uprobes.c               |  47 ++++++++--------
 7 files changed, 286 insertions(+), 93 deletions(-)
 create mode 100644 include/linux/ll_rbtree.h

-- 
2.16.4


^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH 1/5] lib/rbtree: introduce linked-list rbtree interface
  2020-02-07 18:03 [PATCH -next 0/5] rbtree: optimize frequent tree walks Davidlohr Bueso
@ 2020-02-07 18:03 ` Davidlohr Bueso
  2020-02-10 20:07   ` Luis Chamberlain
  2020-02-07 18:03 ` [PATCH 2/5] proc/sysctl: optimize proc_sys_readdir Davidlohr Bueso
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 18+ messages in thread
From: Davidlohr Bueso @ 2020-02-07 18:03 UTC (permalink / raw)
  To: akpm
  Cc: dave, linux-kernel, mcgrof, broonie, alex.williamson, Davidlohr Bueso

When doing in-order tree traversals, the rb_next() and rb_prev() helpers can
be sub-optimal as plain node representations incur extra pointer chasing up
the tree to compute the corresponding node. This can negatively impact
performance when traversals are common. This, for example, is what the vm
already does to efficiently manage VMAs.

This interface provides branchless O(1) access to the first node as well as
to each node's in-order successor and predecessor. This is done at the cost
of a higher memory footprint: mainly additional prev and next pointers for
each node. The benefits can be seen in this table showing the number of
cycles it takes to do a full tree traversal:

   +--------+--------------+-----------+
   | #nodes | plain rbtree | ll-rbtree |
   +--------+--------------+-----------+
   |     10 |          138 |        24 |
   |    100 |        7,200 |       425 |
   |   1000 |       17,000 |     8,000 |
   |  10000 |      501,090 |   222,500 |
   +--------+--------------+-----------+

There are other node representations that optimize getting such pointers
without bloating the nodes as much, such as keeping a parent pointer, or
threaded trees where otherwise-nil child pointers are recycled to point to
the in-order neighbors; both, however, incur a higher runtime penalty for
common modification operations as well as for rotations. The overhead of
tree modifications can be seen in the following table, comparing cycles
to insert+delete:

   +--------+--------------+------------------+-----------+
   | #nodes | plain rbtree | augmented rbtree | ll-rbtree |
   +--------+--------------+------------------+-----------+
   |     10 |          530 |              840 |       600 |
   |    100 |        7,100 |           14,200 |     7,800 |
   |   1000 |      265,000 |          370,000 |   205,200 |
   |  10000 |    4,400,000 |        5,500,000 | 4,200,000 |
   +--------+--------------+------------------+-----------+

This shows that we gain fast iterators at pretty much negligible runtime
overhead: regular and llrb trees perform rather similarly, while there is,
unsurprisingly, a noticeable cost when augmenting the rbtree (which is not
relevant to this series per se).

Note that all data shown here is based on randomly populated rbtrees (albeit
with the same seed), running a hacked version of lib/rbtree_test.c on an
x86-64 2.6 GHz IvyBridge system.

Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
---
 Documentation/rbtree.txt  |  74 +++++++++++++++++++++++++++++++++
 include/linux/ll_rbtree.h | 103 ++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 177 insertions(+)
 create mode 100644 include/linux/ll_rbtree.h

diff --git a/Documentation/rbtree.txt b/Documentation/rbtree.txt
index 523d54b60087..fe38fb257931 100644
--- a/Documentation/rbtree.txt
+++ b/Documentation/rbtree.txt
@@ -5,6 +5,7 @@ Red-black Trees (rbtree) in Linux
 
 :Date: January 18, 2007
 :Author: Rob Landley <rob@landley.net>
+         Davidlohr Bueso <dave@stgolabs.net>
 
 What are red-black trees, and what are they for?
 ------------------------------------------------
@@ -226,6 +227,79 @@ trees::
 				 struct rb_augment_callbacks *);
 
 
+Linked-list rbtrees
+-------------------
+
+When doing in-order tree traversals, the rb_next() and rb_prev() helpers can
+be sub-optimal as plain node representations incur extra pointer chasing up
+the tree to compute the corresponding node. This can have a negative impact
+on latency in scenarios where tree traversals are not rare. This interface
+provides branchless O(1) access to the first node as well as to each node's
+in-order successor and predecessor. This is done at the cost of a higher
+memory footprint: mainly additional prev and next pointers for each node,
+essentially duplicating the tree data structure. There are other node
+representations that optimize getting such pointers without bloating the
+nodes as much, such as keeping a parent pointer or threaded trees where the
+otherwise-nil child pointers are recycled; both, however, incur a higher
+runtime penalty for common modification operations as well as for rotations.
+
+As with regular rb_root structure, linked-list rbtrees are initialized to be
+empty via::
+
+  struct llrb_root mytree = LLRB_ROOT;
+
+Insertion and deletion are defined by::
+
+  void llrb_insert(struct llrb_root *, struct llrb_node *, struct llrb_node *);
+  void llrb_erase(struct llrb_node *, struct llrb_root *);
+
+The corresponding helpers needed to iterate through a tree are the following,
+equivalent to optimized versions of their regular rbtree counterparts::
+
+  struct llrb_node *llrb_first(struct llrb_root *tree);
+  struct llrb_node *llrb_next(struct llrb_node *node);
+  struct llrb_node *llrb_prev(struct llrb_node *node);
+
+
+Inserting data into a Linked-list rbtree
+----------------------------------------
+
+Because llrb trees can exist anywhere regular rbtrees can, the steps are
+similar. The search for insertion differs from a regular search in two ways.
+First, the caller must keep track of the previous (predecessor) node, which
+is later passed to llrb_insert(). Secondly, the caller iterates down the tree
+using regular rb_node pointers and converts to llrb_node where needed.
+
+Example::
+
+  int my_insert(struct llrb_root *root, struct mytype *data)
+  {
+	struct rb_node **new = &(root->rb_root.rb_node), *parent = NULL;
+	struct llrb_node *prev = NULL;
+
+	/* Figure out where to put new node */
+	while (*new) {
+		struct mytype *this = llrb_entry(*new, struct mytype, node);
+		int result = strcmp(data->keystring, this->keystring);
+
+		parent = *new;
+		if (result < 0)
+			new = &((*new)->rb_left);
+		else if (result > 0) {
+			prev = llrb_from_rb(parent);
+			new = &((*new)->rb_right);
+		} else
+			return FALSE;
+	}
+
+	/* Add new node and rebalance tree. */
+	rb_link_node(&data->node.rb_node, parent, new);
+	llrb_insert(root, &data->node, prev);
+
+	return TRUE;
+  }
+
+
 Support for Augmented rbtrees
 -----------------------------
 
diff --git a/include/linux/ll_rbtree.h b/include/linux/ll_rbtree.h
new file mode 100644
index 000000000000..34ffbb935008
--- /dev/null
+++ b/include/linux/ll_rbtree.h
@@ -0,0 +1,103 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Linked-list RB-trees
+ *
+ * Copyright (C) 2020 Davidlohr Bueso.
+ *
+ * Use only when rbtree traversals are common and the extra per-node
+ * memory footprint is not an issue.
+ */
+
+#ifndef LLRB_TREE_H
+#define LLRB_TREE_H
+
+#include <linux/rbtree.h>
+
+struct llrb_node {
+	struct llrb_node *prev, *next;
+	struct rb_node rb_node;
+};
+
+struct llrb_root {
+	struct rb_root rb_root;
+	struct llrb_node *first;
+};
+
+#define LLRB_ROOT (struct llrb_root) { { NULL }, NULL }
+
+/**
+ * llrb_from_rb - get an llrb_node from an rb_node
+ * @node: the node in question.
+ *
+ * This is useful when iterating down the tree, for example
+ * during node insertion.
+ */
+static __always_inline struct llrb_node *llrb_from_rb(struct rb_node *node)
+{
+	return container_of(node, struct llrb_node, rb_node);
+}
+
+/**
+ * llrb_entry - get the struct for this entry
+ * @ptr:	the &struct rb_node pointer.
+ * @type:	the type of the struct this is embedded in.
+ * @member:	the name of the llrb_node within the struct.
+ */
+#define	llrb_entry(ptr, type, member) \
+	container_of(llrb_from_rb(ptr), type, member)
+
+static __always_inline struct llrb_node *llrb_first(struct llrb_root *root)
+{
+	return root->first;
+}
+
+static __always_inline struct llrb_node *llrb_next(struct llrb_node *node)
+{
+	return node->next;
+}
+
+static __always_inline struct llrb_node *llrb_prev(struct llrb_node *node)
+{
+	return node->prev;
+}
+
+static __always_inline
+void llrb_insert(struct llrb_root *root, struct llrb_node *node,
+		 struct llrb_node *prev)
+{
+	struct llrb_node *next;
+
+	rb_insert_color(&node->rb_node, &root->rb_root);
+
+	node->prev = prev;
+	if (prev) {
+		next = prev->next;
+		prev->next = node;
+	} else {
+		next = root->first;
+		root->first = node;
+	}
+	node->next = next;
+	if (next)
+		next->prev = node;
+}
+
+static __always_inline
+void llrb_erase(struct llrb_node *node, struct llrb_root *root)
+{
+	struct llrb_node *prev, *next;
+
+	rb_erase(&node->rb_node, &root->rb_root);
+
+	next = node->next;
+	prev = node->prev;
+
+	if (prev)
+		prev->next = next;
+	else
+		root->first = next;
+	if (next)
+		next->prev = prev;
+}
+
+#endif
-- 
2.16.4


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH 2/5] proc/sysctl: optimize proc_sys_readdir
  2020-02-07 18:03 [PATCH -next 0/5] rbtree: optimize frequent tree walks Davidlohr Bueso
  2020-02-07 18:03 ` [PATCH 1/5] lib/rbtree: introduce linked-list rbtree interface Davidlohr Bueso
@ 2020-02-07 18:03 ` Davidlohr Bueso
  2020-02-10 20:08   ` Luis Chamberlain
  2020-02-07 18:03 ` [PATCH 3/5] regmap: optimize sync() and drop() regcache callbacks Davidlohr Bueso
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 18+ messages in thread
From: Davidlohr Bueso @ 2020-02-07 18:03 UTC (permalink / raw)
  To: akpm
  Cc: dave, linux-kernel, mcgrof, broonie, alex.williamson, keescook,
	yzaikin, linux-fsdevel, Davidlohr Bueso

This patch converts struct ctl_dir to use an llrbtree
instead of a regular rbtree, such that computing the next
node for potentially usable entries becomes a branchless
O(1) operation, thereby optimizing first_usable_entry().
The cost is mainly three additional pointers: one for the
root and two (next/prev) for each struct ctl_node, enlarging
the latter from 32 to 48 bytes on x86-64.

Cc: mcgrof@kernel.org
Cc: keescook@chromium.org
Cc: yzaikin@google.com
Cc: linux-fsdevel@vger.kernel.org
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
---
 fs/proc/proc_sysctl.c  | 37 +++++++++++++++++++------------------
 include/linux/sysctl.h |  6 +++---
 2 files changed, 22 insertions(+), 21 deletions(-)

diff --git a/fs/proc/proc_sysctl.c b/fs/proc/proc_sysctl.c
index d80989b6c344..5a1b3b8be16b 100644
--- a/fs/proc/proc_sysctl.c
+++ b/fs/proc/proc_sysctl.c
@@ -111,15 +111,14 @@ static struct ctl_table *find_entry(struct ctl_table_header **phead,
 {
 	struct ctl_table_header *head;
 	struct ctl_table *entry;
-	struct rb_node *node = dir->root.rb_node;
+	struct rb_node *node = dir->root.rb_root.rb_node;
 
-	while (node)
-	{
+	while (node) {
 		struct ctl_node *ctl_node;
 		const char *procname;
 		int cmp;
 
-		ctl_node = rb_entry(node, struct ctl_node, node);
+		ctl_node = llrb_entry(node, struct ctl_node, node);
 		head = ctl_node->header;
 		entry = &head->ctl_table[ctl_node - head->node];
 		procname = entry->procname;
@@ -139,9 +138,10 @@ static struct ctl_table *find_entry(struct ctl_table_header **phead,
 
 static int insert_entry(struct ctl_table_header *head, struct ctl_table *entry)
 {
-	struct rb_node *node = &head->node[entry - head->ctl_table].node;
-	struct rb_node **p = &head->parent->root.rb_node;
+	struct rb_node *node = &head->node[entry - head->ctl_table].node.rb_node;
+	struct rb_node **p = &head->parent->root.rb_root.rb_node;
 	struct rb_node *parent = NULL;
+	struct llrb_node *prev = NULL;
 	const char *name = entry->procname;
 	int namelen = strlen(name);
 
@@ -153,7 +153,7 @@ static int insert_entry(struct ctl_table_header *head, struct ctl_table *entry)
 		int cmp;
 
 		parent = *p;
-		parent_node = rb_entry(parent, struct ctl_node, node);
+		parent_node = llrb_entry(parent, struct ctl_node, node);
 		parent_head = parent_node->header;
 		parent_entry = &parent_head->ctl_table[parent_node - parent_head->node];
 		parent_name = parent_entry->procname;
@@ -161,9 +161,10 @@ static int insert_entry(struct ctl_table_header *head, struct ctl_table *entry)
 		cmp = namecmp(name, namelen, parent_name, strlen(parent_name));
 		if (cmp < 0)
 			p = &(*p)->rb_left;
-		else if (cmp > 0)
+		else if (cmp > 0) {
+			prev = llrb_from_rb(parent);
 			p = &(*p)->rb_right;
-		else {
+		} else {
 			pr_err("sysctl duplicate entry: ");
 			sysctl_print_dir(head->parent);
 			pr_cont("/%s\n", entry->procname);
@@ -172,15 +173,15 @@ static int insert_entry(struct ctl_table_header *head, struct ctl_table *entry)
 	}
 
 	rb_link_node(node, parent, p);
-	rb_insert_color(node, &head->parent->root);
+	llrb_insert(&head->parent->root, llrb_from_rb(node), prev);
 	return 0;
 }
 
 static void erase_entry(struct ctl_table_header *head, struct ctl_table *entry)
 {
-	struct rb_node *node = &head->node[entry - head->ctl_table].node;
+	struct llrb_node *node = &head->node[entry - head->ctl_table].node;
 
-	rb_erase(node, &head->parent->root);
+	llrb_erase(node, &head->parent->root);
 }
 
 static void init_header(struct ctl_table_header *head,
@@ -223,7 +224,7 @@ static int insert_header(struct ctl_dir *dir, struct ctl_table_header *header)
 
 	/* Am I creating a permanently empty directory? */
 	if (header->ctl_table == sysctl_mount_point) {
-		if (!RB_EMPTY_ROOT(&dir->root))
+		if (!RB_EMPTY_ROOT(&dir->root.rb_root))
 			return -EINVAL;
 		set_empty_dir(dir);
 	}
@@ -381,11 +382,11 @@ static struct ctl_table *lookup_entry(struct ctl_table_header **phead,
 	return entry;
 }
 
-static struct ctl_node *first_usable_entry(struct rb_node *node)
+static struct ctl_node *first_usable_entry(struct llrb_node *node)
 {
 	struct ctl_node *ctl_node;
 
-	for (;node; node = rb_next(node)) {
+	for (; node; node = llrb_next(node)) {
 		ctl_node = rb_entry(node, struct ctl_node, node);
 		if (use_table(ctl_node->header))
 			return ctl_node;
@@ -401,7 +402,7 @@ static void first_entry(struct ctl_dir *dir,
 	struct ctl_node *ctl_node;
 
 	spin_lock(&sysctl_lock);
-	ctl_node = first_usable_entry(rb_first(&dir->root));
+	ctl_node = first_usable_entry(llrb_first(&dir->root));
 	spin_unlock(&sysctl_lock);
 	if (ctl_node) {
 		head = ctl_node->header;
@@ -420,7 +421,7 @@ static void next_entry(struct ctl_table_header **phead, struct ctl_table **pentr
 	spin_lock(&sysctl_lock);
 	unuse_table(head);
 
-	ctl_node = first_usable_entry(rb_next(&ctl_node->node));
+	ctl_node = first_usable_entry(llrb_next(&ctl_node->node));
 	spin_unlock(&sysctl_lock);
 	head = NULL;
 	if (ctl_node) {
@@ -1711,7 +1712,7 @@ void setup_sysctl_set(struct ctl_table_set *set,
 
 void retire_sysctl_set(struct ctl_table_set *set)
 {
-	WARN_ON(!RB_EMPTY_ROOT(&set->dir.root));
+	WARN_ON(!RB_EMPTY_ROOT(&set->dir.root.rb_root));
 }
 
 int __init proc_sys_init(void)
diff --git a/include/linux/sysctl.h b/include/linux/sysctl.h
index 02fa84493f23..afb35fa61b91 100644
--- a/include/linux/sysctl.h
+++ b/include/linux/sysctl.h
@@ -25,7 +25,7 @@
 #include <linux/list.h>
 #include <linux/rcupdate.h>
 #include <linux/wait.h>
-#include <linux/rbtree.h>
+#include <linux/ll_rbtree.h>
 #include <linux/uidgid.h>
 #include <uapi/linux/sysctl.h>
 
@@ -133,7 +133,7 @@ struct ctl_table {
 } __randomize_layout;
 
 struct ctl_node {
-	struct rb_node node;
+	struct llrb_node node;
 	struct ctl_table_header *header;
 };
 
@@ -161,7 +161,7 @@ struct ctl_table_header {
 struct ctl_dir {
 	/* Header must be at the start of ctl_dir */
 	struct ctl_table_header header;
-	struct rb_root root;
+	struct llrb_root root;
 };
 
 struct ctl_table_set {
-- 
2.16.4


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH 3/5] regmap: optimize sync() and drop() regcache callbacks
  2020-02-07 18:03 [PATCH -next 0/5] rbtree: optimize frequent tree walks Davidlohr Bueso
  2020-02-07 18:03 ` [PATCH 1/5] lib/rbtree: introduce linked-list rbtree interface Davidlohr Bueso
  2020-02-07 18:03 ` [PATCH 2/5] proc/sysctl: optimize proc_sys_readdir Davidlohr Bueso
@ 2020-02-07 18:03 ` Davidlohr Bueso
  2020-02-10 13:11   ` Mark Brown
  2020-02-07 18:03 ` [PATCH 4/5] vfio/type1: optimize dma_list tree iterations Davidlohr Bueso
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 18+ messages in thread
From: Davidlohr Bueso @ 2020-02-07 18:03 UTC (permalink / raw)
  To: akpm
  Cc: dave, linux-kernel, mcgrof, broonie, alex.williamson, Davidlohr Bueso

At a higher memory footprint (two additional pointers per node) we
can get branchless O(1) tree iterators which can optimize in-order
tree walks/traversals. For example, on my laptop looking at the rbtree
debugfs entries:

before:
    60 nodes, 165 registers, average 2 registers, used 4036 bytes
after:
    60 nodes, 165 registers, average 2 registers, used 5004 bytes

Cc: broonie@kernel.org
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
---
 drivers/base/regmap/regcache-rbtree.c | 62 +++++++++++++++++++----------------
 1 file changed, 34 insertions(+), 28 deletions(-)

diff --git a/drivers/base/regmap/regcache-rbtree.c b/drivers/base/regmap/regcache-rbtree.c
index cfa29dc89bbf..d55f6b6c87b4 100644
--- a/drivers/base/regmap/regcache-rbtree.c
+++ b/drivers/base/regmap/regcache-rbtree.c
@@ -8,7 +8,7 @@
 
 #include <linux/debugfs.h>
 #include <linux/device.h>
-#include <linux/rbtree.h>
+#include <linux/ll_rbtree.h>
 #include <linux/seq_file.h>
 #include <linux/slab.h>
 
@@ -28,11 +28,11 @@ struct regcache_rbtree_node {
 	/* number of registers available in the block */
 	unsigned int blklen;
 	/* the actual rbtree node holding this block */
-	struct rb_node node;
+	struct llrb_node node;
 };
 
 struct regcache_rbtree_ctx {
-	struct rb_root root;
+	struct llrb_root root;
 	struct regcache_rbtree_node *cached_rbnode;
 };
 
@@ -75,9 +75,9 @@ static struct regcache_rbtree_node *regcache_rbtree_lookup(struct regmap *map,
 			return rbnode;
 	}
 
-	node = rbtree_ctx->root.rb_node;
+	node = rbtree_ctx->root.rb_root.rb_node;
 	while (node) {
-		rbnode = rb_entry(node, struct regcache_rbtree_node, node);
+		rbnode = llrb_entry(node, struct regcache_rbtree_node, node);
 		regcache_rbtree_get_base_top_reg(map, rbnode, &base_reg,
 						 &top_reg);
 		if (reg >= base_reg && reg <= top_reg) {
@@ -93,18 +93,20 @@ static struct regcache_rbtree_node *regcache_rbtree_lookup(struct regmap *map,
 	return NULL;
 }
 
-static int regcache_rbtree_insert(struct regmap *map, struct rb_root *root,
+static int regcache_rbtree_insert(struct regmap *map, struct llrb_root *root,
 				  struct regcache_rbtree_node *rbnode)
 {
 	struct rb_node **new, *parent;
+	struct llrb_node *prev = NULL;
 	struct regcache_rbtree_node *rbnode_tmp;
 	unsigned int base_reg_tmp, top_reg_tmp;
 	unsigned int base_reg;
 
 	parent = NULL;
-	new = &root->rb_node;
+	new = &root->rb_root.rb_node;
 	while (*new) {
-		rbnode_tmp = rb_entry(*new, struct regcache_rbtree_node, node);
+		rbnode_tmp = llrb_entry(*new,
+					struct regcache_rbtree_node, node);
 		/* base and top registers of the current rbnode */
 		regcache_rbtree_get_base_top_reg(map, rbnode_tmp, &base_reg_tmp,
 						 &top_reg_tmp);
@@ -115,15 +117,16 @@ static int regcache_rbtree_insert(struct regmap *map, struct rb_root *root,
 		if (base_reg >= base_reg_tmp &&
 		    base_reg <= top_reg_tmp)
 			return 0;
-		else if (base_reg > top_reg_tmp)
+		else if (base_reg > top_reg_tmp) {
+			prev = llrb_from_rb(parent);
 			new = &((*new)->rb_right);
-		else if (base_reg < base_reg_tmp)
+		} else if (base_reg < base_reg_tmp)
 			new = &((*new)->rb_left);
 	}
 
 	/* insert the node into the rbtree */
-	rb_link_node(&rbnode->node, parent, new);
-	rb_insert_color(&rbnode->node, root);
+	rb_link_node(&rbnode->node.rb_node, parent, new);
+	llrb_insert(root, &rbnode->node, prev);
 
 	return 1;
 }
@@ -134,7 +137,7 @@ static int rbtree_show(struct seq_file *s, void *ignored)
 	struct regmap *map = s->private;
 	struct regcache_rbtree_ctx *rbtree_ctx = map->cache;
 	struct regcache_rbtree_node *n;
-	struct rb_node *node;
+	struct llrb_node *node;
 	unsigned int base, top;
 	size_t mem_size;
 	int nodes = 0;
@@ -145,8 +148,8 @@ static int rbtree_show(struct seq_file *s, void *ignored)
 
 	mem_size = sizeof(*rbtree_ctx);
 
-	for (node = rb_first(&rbtree_ctx->root); node != NULL;
-	     node = rb_next(node)) {
+	for (node = llrb_first(&rbtree_ctx->root); node != NULL;
+	     node = llrb_next(node)) {
 		n = rb_entry(node, struct regcache_rbtree_node, node);
 		mem_size += sizeof(*n);
 		mem_size += (n->blklen * map->cache_word_size);
@@ -192,7 +195,7 @@ static int regcache_rbtree_init(struct regmap *map)
 		return -ENOMEM;
 
 	rbtree_ctx = map->cache;
-	rbtree_ctx->root = RB_ROOT;
+	rbtree_ctx->root = LLRB_ROOT;
 	rbtree_ctx->cached_rbnode = NULL;
 
 	for (i = 0; i < map->num_reg_defaults; i++) {
@@ -212,7 +215,7 @@ static int regcache_rbtree_init(struct regmap *map)
 
 static int regcache_rbtree_exit(struct regmap *map)
 {
-	struct rb_node *next;
+	struct llrb_node *next;
 	struct regcache_rbtree_ctx *rbtree_ctx;
 	struct regcache_rbtree_node *rbtree_node;
 
@@ -222,11 +225,11 @@ static int regcache_rbtree_exit(struct regmap *map)
 		return 0;
 
 	/* free up the rbtree */
-	next = rb_first(&rbtree_ctx->root);
+	next = llrb_first(&rbtree_ctx->root);
 	while (next) {
 		rbtree_node = rb_entry(next, struct regcache_rbtree_node, node);
-		next = rb_next(&rbtree_node->node);
-		rb_erase(&rbtree_node->node, &rbtree_ctx->root);
+		next = llrb_next(&rbtree_node->node);
+		llrb_erase(&rbtree_node->node, &rbtree_ctx->root);
 		kfree(rbtree_node->cache_present);
 		kfree(rbtree_node->block);
 		kfree(rbtree_node);
@@ -400,10 +403,10 @@ static int regcache_rbtree_write(struct regmap *map, unsigned int reg,
 		max = reg + max_dist;
 
 		/* look for an adjacent register to the one we are about to add */
-		node = rbtree_ctx->root.rb_node;
+		node = rbtree_ctx->root.rb_root.rb_node;
 		while (node) {
-			rbnode_tmp = rb_entry(node, struct regcache_rbtree_node,
-					      node);
+			rbnode_tmp = llrb_entry(node,
+					      struct regcache_rbtree_node, node);
 
 			regcache_rbtree_get_base_top_reg(map, rbnode_tmp,
 				&base_reg, &top_reg);
@@ -466,15 +469,17 @@ static int regcache_rbtree_sync(struct regmap *map, unsigned int min,
 				unsigned int max)
 {
 	struct regcache_rbtree_ctx *rbtree_ctx;
-	struct rb_node *node;
+	struct llrb_node *node;
 	struct regcache_rbtree_node *rbnode;
 	unsigned int base_reg, top_reg;
 	unsigned int start, end;
 	int ret;
 
 	rbtree_ctx = map->cache;
-	for (node = rb_first(&rbtree_ctx->root); node; node = rb_next(node)) {
-		rbnode = rb_entry(node, struct regcache_rbtree_node, node);
+	for (node = llrb_first(&rbtree_ctx->root); node;
+	     node = llrb_next(node)) {
+		rbnode = rb_entry(node,
+				  struct regcache_rbtree_node, node);
 
 		regcache_rbtree_get_base_top_reg(map, rbnode, &base_reg,
 			&top_reg);
@@ -508,12 +513,13 @@ static int regcache_rbtree_drop(struct regmap *map, unsigned int min,
 {
 	struct regcache_rbtree_ctx *rbtree_ctx;
 	struct regcache_rbtree_node *rbnode;
-	struct rb_node *node;
+	struct llrb_node *node;
 	unsigned int base_reg, top_reg;
 	unsigned int start, end;
 
 	rbtree_ctx = map->cache;
-	for (node = rb_first(&rbtree_ctx->root); node; node = rb_next(node)) {
+	for (node = llrb_first(&rbtree_ctx->root); node;
+	     node = llrb_next(node)) {
 		rbnode = rb_entry(node, struct regcache_rbtree_node, node);
 
 		regcache_rbtree_get_base_top_reg(map, rbnode, &base_reg,
-- 
2.16.4


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH 4/5] vfio/type1: optimize dma_list tree iterations
  2020-02-07 18:03 [PATCH -next 0/5] rbtree: optimize frequent tree walks Davidlohr Bueso
                   ` (2 preceding siblings ...)
  2020-02-07 18:03 ` [PATCH 3/5] regmap: optimize sync() and drop() regcache callbacks Davidlohr Bueso
@ 2020-02-07 18:03 ` Davidlohr Bueso
  2020-02-07 18:03 ` [PATCH 5/5] uprobes: optimize build_probe_list() Davidlohr Bueso
  2020-02-10  1:46 ` [PATCH -next 0/5] rbtree: optimize frequent tree walks Andrew Morton
  5 siblings, 0 replies; 18+ messages in thread
From: Davidlohr Bueso @ 2020-02-07 18:03 UTC (permalink / raw)
  To: akpm
  Cc: dave, linux-kernel, mcgrof, broonie, alex.williamson, cohuck,
	kvm, Davidlohr Bueso

Both the ->release() and ->attach_group() vfio_iommu driver
callbacks can incur full in-order iommu->dma_list traversals,
which can be suboptimal with regular rbtree iterators. This
patch proposes using the optimized llrbtree interface such that
we always have a branchless O(1) next node available. The cost
is a higher memory footprint, mainly from enlarging struct
vfio_dma by two pointers, while keeping the runtime overhead
of tree modifications minimal.

Cc: alex.williamson@redhat.com
Cc: cohuck@redhat.com
Cc: kvm@vger.kernel.org
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
---
 drivers/vfio/vfio_iommu_type1.c | 50 +++++++++++++++++++++++------------------
 1 file changed, 28 insertions(+), 22 deletions(-)

diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index 2ada8e6cdb88..875170fc8515 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -28,6 +28,7 @@
 #include <linux/module.h>
 #include <linux/mm.h>
 #include <linux/rbtree.h>
+#include <linux/ll_rbtree.h>
 #include <linux/sched/signal.h>
 #include <linux/sched/mm.h>
 #include <linux/slab.h>
@@ -65,7 +66,7 @@ struct vfio_iommu {
 	struct list_head	iova_list;
 	struct vfio_domain	*external_domain; /* domain for external user */
 	struct mutex		lock;
-	struct rb_root		dma_list;
+	struct llrb_root	dma_list;
 	struct blocking_notifier_head notifier;
 	unsigned int		dma_avail;
 	bool			v2;
@@ -81,7 +82,7 @@ struct vfio_domain {
 };
 
 struct vfio_dma {
-	struct rb_node		node;
+	struct llrb_node	node;
 	dma_addr_t		iova;		/* Device address */
 	unsigned long		vaddr;		/* Process virtual addr */
 	size_t			size;		/* Map size (bytes) */
@@ -134,10 +135,10 @@ static int put_pfn(unsigned long pfn, int prot);
 static struct vfio_dma *vfio_find_dma(struct vfio_iommu *iommu,
 				      dma_addr_t start, size_t size)
 {
-	struct rb_node *node = iommu->dma_list.rb_node;
+	struct rb_node *node = iommu->dma_list.rb_root.rb_node;
 
 	while (node) {
-		struct vfio_dma *dma = rb_entry(node, struct vfio_dma, node);
+		struct vfio_dma *dma = llrb_entry(node, struct vfio_dma, node);
 
 		if (start + size <= dma->iova)
 			node = node->rb_left;
@@ -152,26 +153,30 @@ static struct vfio_dma *vfio_find_dma(struct vfio_iommu *iommu,
 
 static void vfio_link_dma(struct vfio_iommu *iommu, struct vfio_dma *new)
 {
-	struct rb_node **link = &iommu->dma_list.rb_node, *parent = NULL;
+	struct rb_node **link = &iommu->dma_list.rb_root.rb_node;
+	struct rb_node *parent = NULL;
+	struct llrb_node *prev = NULL;
 	struct vfio_dma *dma;
 
 	while (*link) {
 		parent = *link;
-		dma = rb_entry(parent, struct vfio_dma, node);
+		dma = llrb_entry(parent, struct vfio_dma, node);
 
 		if (new->iova + new->size <= dma->iova)
 			link = &(*link)->rb_left;
-		else
+		else {
 			link = &(*link)->rb_right;
+			prev = llrb_from_rb(parent);
+		}
 	}
 
-	rb_link_node(&new->node, parent, link);
-	rb_insert_color(&new->node, &iommu->dma_list);
+	rb_link_node(&new->node.rb_node, parent, link);
+	llrb_insert(&iommu->dma_list, &new->node, prev);
 }
 
 static void vfio_unlink_dma(struct vfio_iommu *iommu, struct vfio_dma *old)
 {
-	rb_erase(&old->node, &iommu->dma_list);
+	llrb_erase(&old->node, &iommu->dma_list);
 }
 
 /*
@@ -1170,15 +1175,15 @@ static int vfio_iommu_replay(struct vfio_iommu *iommu,
 			     struct vfio_domain *domain)
 {
 	struct vfio_domain *d;
-	struct rb_node *n;
+	struct llrb_node *n;
 	unsigned long limit = rlimit(RLIMIT_MEMLOCK) >> PAGE_SHIFT;
 	int ret;
 
 	/* Arbitrarily pick the first domain in the list for lookups */
 	d = list_first_entry(&iommu->domain_list, struct vfio_domain, next);
-	n = rb_first(&iommu->dma_list);
+	n = llrb_first(&iommu->dma_list);
 
-	for (; n; n = rb_next(n)) {
+	for (; n; n = llrb_next(n)) {
 		struct vfio_dma *dma;
 		dma_addr_t iova;
 
@@ -1835,18 +1840,19 @@ static int vfio_iommu_type1_attach_group(void *iommu_data,
 
 static void vfio_iommu_unmap_unpin_all(struct vfio_iommu *iommu)
 {
-	struct rb_node *node;
+	struct llrb_node *node;
 
-	while ((node = rb_first(&iommu->dma_list)))
+	while ((node = llrb_first(&iommu->dma_list)))
 		vfio_remove_dma(iommu, rb_entry(node, struct vfio_dma, node));
 }
 
 static void vfio_iommu_unmap_unpin_reaccount(struct vfio_iommu *iommu)
 {
-	struct rb_node *n, *p;
+	struct llrb_node *n;
+	struct rb_node *p;
 
-	n = rb_first(&iommu->dma_list);
-	for (; n; n = rb_next(n)) {
+	n = llrb_first(&iommu->dma_list);
+	for (; n; n = llrb_next(n)) {
 		struct vfio_dma *dma;
 		long locked = 0, unlocked = 0;
 
@@ -1866,10 +1872,10 @@ static void vfio_iommu_unmap_unpin_reaccount(struct vfio_iommu *iommu)
 
 static void vfio_sanity_check_pfn_list(struct vfio_iommu *iommu)
 {
-	struct rb_node *n;
+	struct llrb_node *n;
 
-	n = rb_first(&iommu->dma_list);
-	for (; n; n = rb_next(n)) {
+	n = llrb_first(&iommu->dma_list);
+	for (; n; n = llrb_next(n)) {
 		struct vfio_dma *dma;
 
 		dma = rb_entry(n, struct vfio_dma, node);
@@ -2060,7 +2066,7 @@ static void *vfio_iommu_type1_open(unsigned long arg)
 
 	INIT_LIST_HEAD(&iommu->domain_list);
 	INIT_LIST_HEAD(&iommu->iova_list);
-	iommu->dma_list = RB_ROOT;
+	iommu->dma_list = LLRB_ROOT;
 	iommu->dma_avail = dma_entry_limit;
 	mutex_init(&iommu->lock);
 	BLOCKING_INIT_NOTIFIER_HEAD(&iommu->notifier);
-- 
2.16.4


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH 5/5] uprobes: optimize build_probe_list()
  2020-02-07 18:03 [PATCH -next 0/5] rbtree: optimize frequent tree walks Davidlohr Bueso
                   ` (3 preceding siblings ...)
  2020-02-07 18:03 ` [PATCH 4/5] vfio/type1: optimize dma_list tree iterations Davidlohr Bueso
@ 2020-02-07 18:03 ` Davidlohr Bueso
  2020-02-10  1:46 ` [PATCH -next 0/5] rbtree: optimize frequent tree walks Andrew Morton
  5 siblings, 0 replies; 18+ messages in thread
From: Davidlohr Bueso @ 2020-02-07 18:03 UTC (permalink / raw)
  To: akpm
  Cc: dave, linux-kernel, mcgrof, broonie, alex.williamson, peterz,
	mingo, Davidlohr Bueso

We can optimize the uprobes_tree iterators when calling
build_probe_list() by using llrbtrees, making the previous
and next nodes of the given range available in branchless
O(1) time. The runtime overhead of maintaining this data
across tree modifications is minimal; the cost comes mainly
from enlarging struct uprobe by two pointers, sacrificing
memory footprint.

Cc: peterz@infradead.org
Cc: mingo@redhat.com
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
---
 kernel/events/uprobes.c | 47 +++++++++++++++++++++++++----------------------
 1 file changed, 25 insertions(+), 22 deletions(-)

diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index ece7e13f6e4a..0a7b593578d1 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -27,18 +27,19 @@
 #include <linux/task_work.h>
 #include <linux/shmem_fs.h>
 #include <linux/khugepaged.h>
+#include <linux/ll_rbtree.h>
 
 #include <linux/uprobes.h>
 
 #define UINSNS_PER_PAGE			(PAGE_SIZE/UPROBE_XOL_SLOT_BYTES)
 #define MAX_UPROBE_XOL_SLOTS		UINSNS_PER_PAGE
 
-static struct rb_root uprobes_tree = RB_ROOT;
+static struct llrb_root uprobes_tree = LLRB_ROOT;
 /*
  * allows us to skip the uprobe_mmap if there are no uprobe events active
  * at this time.  Probably a fine grained per inode count is better?
  */
-#define no_uprobe_events()	RB_EMPTY_ROOT(&uprobes_tree)
+#define no_uprobe_events()	RB_EMPTY_ROOT(&uprobes_tree.rb_root)
 
 static DEFINE_SPINLOCK(uprobes_treelock);	/* serialize rbtree access */
 
@@ -53,7 +54,7 @@ DEFINE_STATIC_PERCPU_RWSEM(dup_mmap_sem);
 #define UPROBE_COPY_INSN	0
 
 struct uprobe {
-	struct rb_node		rb_node;	/* node in the rb tree */
+	struct llrb_node	rb_node;	/* node in the llrb tree */
 	refcount_t		ref;
 	struct rw_semaphore	register_rwsem;
 	struct rw_semaphore	consumer_rwsem;
@@ -639,12 +640,12 @@ static int match_uprobe(struct uprobe *l, struct uprobe *r)
 static struct uprobe *__find_uprobe(struct inode *inode, loff_t offset)
 {
 	struct uprobe u = { .inode = inode, .offset = offset };
-	struct rb_node *n = uprobes_tree.rb_node;
+	struct rb_node *n = uprobes_tree.rb_root.rb_node;
 	struct uprobe *uprobe;
 	int match;
 
 	while (n) {
-		uprobe = rb_entry(n, struct uprobe, rb_node);
+		uprobe = llrb_entry(n, struct uprobe, rb_node);
 		match = match_uprobe(&u, uprobe);
 		if (!match)
 			return get_uprobe(uprobe);
@@ -674,28 +675,30 @@ static struct uprobe *find_uprobe(struct inode *inode, loff_t offset)
 
 static struct uprobe *__insert_uprobe(struct uprobe *uprobe)
 {
-	struct rb_node **p = &uprobes_tree.rb_node;
+	struct rb_node **p = &uprobes_tree.rb_root.rb_node;
 	struct rb_node *parent = NULL;
+	struct llrb_node *prev = NULL;
 	struct uprobe *u;
 	int match;
 
 	while (*p) {
 		parent = *p;
-		u = rb_entry(parent, struct uprobe, rb_node);
+		u = llrb_entry(parent, struct uprobe, rb_node);
 		match = match_uprobe(uprobe, u);
 		if (!match)
 			return get_uprobe(u);
 
 		if (match < 0)
 			p = &parent->rb_left;
-		else
+		else {
 			p = &parent->rb_right;
-
+			prev = llrb_from_rb(parent);
+		}
 	}
 
 	u = NULL;
-	rb_link_node(&uprobe->rb_node, parent, p);
-	rb_insert_color(&uprobe->rb_node, &uprobes_tree);
+	rb_link_node(&uprobe->rb_node.rb_node, parent, p);
+	llrb_insert(&uprobes_tree, &uprobe->rb_node, prev);
 	/* get access + creation ref */
 	refcount_set(&uprobe->ref, 2);
 
@@ -940,7 +943,7 @@ remove_breakpoint(struct uprobe *uprobe, struct mm_struct *mm, unsigned long vad
 
 static inline bool uprobe_is_active(struct uprobe *uprobe)
 {
-	return !RB_EMPTY_NODE(&uprobe->rb_node);
+	return !RB_EMPTY_NODE(&uprobe->rb_node.rb_node);
 }
 /*
  * There could be threads that have already hit the breakpoint. They
@@ -953,9 +956,9 @@ static void delete_uprobe(struct uprobe *uprobe)
 		return;
 
 	spin_lock(&uprobes_treelock);
-	rb_erase(&uprobe->rb_node, &uprobes_tree);
+	llrb_erase(&uprobe->rb_node, &uprobes_tree);
 	spin_unlock(&uprobes_treelock);
-	RB_CLEAR_NODE(&uprobe->rb_node); /* for uprobe_is_active() */
+	RB_CLEAR_NODE(&uprobe->rb_node.rb_node); /* for uprobe_is_active() */
 	put_uprobe(uprobe);
 }
 
@@ -1263,13 +1266,13 @@ static int unapply_uprobe(struct uprobe *uprobe, struct mm_struct *mm)
 	return err;
 }
 
-static struct rb_node *
+static struct llrb_node *
 find_node_in_range(struct inode *inode, loff_t min, loff_t max)
 {
-	struct rb_node *n = uprobes_tree.rb_node;
+	struct rb_node *n = uprobes_tree.rb_root.rb_node;
 
 	while (n) {
-		struct uprobe *u = rb_entry(n, struct uprobe, rb_node);
+		struct uprobe *u = llrb_entry(n, struct uprobe, rb_node);
 
 		if (inode < u->inode) {
 			n = n->rb_left;
@@ -1285,7 +1288,7 @@ find_node_in_range(struct inode *inode, loff_t min, loff_t max)
 		}
 	}
 
-	return n;
+	return llrb_from_rb(n);
 }
 
 /*
@@ -1297,7 +1300,7 @@ static void build_probe_list(struct inode *inode,
 				struct list_head *head)
 {
 	loff_t min, max;
-	struct rb_node *n, *t;
+	struct llrb_node *n, *t;
 	struct uprobe *u;
 
 	INIT_LIST_HEAD(head);
@@ -1307,14 +1310,14 @@ static void build_probe_list(struct inode *inode,
 	spin_lock(&uprobes_treelock);
 	n = find_node_in_range(inode, min, max);
 	if (n) {
-		for (t = n; t; t = rb_prev(t)) {
+		for (t = n; t; t = llrb_prev(t)) {
 			u = rb_entry(t, struct uprobe, rb_node);
 			if (u->inode != inode || u->offset < min)
 				break;
 			list_add(&u->pending_list, head);
 			get_uprobe(u);
 		}
-		for (t = n; (t = rb_next(t)); ) {
+		for (t = n; (t = llrb_next(t)); ) {
 			u = rb_entry(t, struct uprobe, rb_node);
 			if (u->inode != inode || u->offset > max)
 				break;
@@ -1406,7 +1409,7 @@ vma_has_uprobes(struct vm_area_struct *vma, unsigned long start, unsigned long e
 {
 	loff_t min, max;
 	struct inode *inode;
-	struct rb_node *n;
+	struct llrb_node *n;
 
 	inode = file_inode(vma->vm_file);
 
-- 
2.16.4


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [PATCH -next 0/5] rbtree: optimize frequent tree walks
  2020-02-07 18:03 [PATCH -next 0/5] rbtree: optimize frequent tree walks Davidlohr Bueso
                   ` (4 preceding siblings ...)
  2020-02-07 18:03 ` [PATCH 5/5] uprobes: optimize build_probe_list() Davidlohr Bueso
@ 2020-02-10  1:46 ` Andrew Morton
  2020-02-10 15:56   ` Davidlohr Bueso
  5 siblings, 1 reply; 18+ messages in thread
From: Andrew Morton @ 2020-02-10  1:46 UTC (permalink / raw)
  To: Davidlohr Bueso; +Cc: linux-kernel, mcgrof, broonie, alex.williamson

On Fri,  7 Feb 2020 10:03:00 -0800 Davidlohr Bueso <dave@stgolabs.net> wrote:

> This series introduces the ll_rbtree interface to optimize rbtree users
> that incur in frequent in-order tree iterations (even full traversals).
> This can be seen as an abstraction to what is already done for dealing
> with VMAs (albeit non-augmented trees).
> 
> Mainly, it trades off higher per-node memory footprint (two extra pointers)
> for minimal runtime overhead to gain O(1) brachless next and prev node
> pointers. See patch 1 for full details, but, for example, the following
> tables show the benefits vs the costs of using this new interface.
> 
>    +--------+--------------+-----------+
>    | #nodes | plain rbtree | ll-rbtree |
>    +--------+--------------+-----------+
>    |     10 |          138 |        24 |
>    |    100 |        7,200 |       425 |
>    |   1000 |       17,000 |     8,000 |
>    |  10000 |      501,090 |   222,500 |
>    +--------+--------------+-----------+
> 
> While there are other node representations that optimize getting such pointers
> without bloating the nodes as much, such as keeping a parent pointer or threaded
> trees where the nil prev/next pointers are recycled; both incurring in higher
> runtime penalization for common modification operations as well as any rotations.
> The overhead of tree modifications can be seen in the following table, comparing
> cycles to insert+delete:
> 
>    +--------+--------------+------------------+-----------+
>    | #nodes | plain rbtree | augmented rbtree | ll-rbtree |
>    +--------+--------------+------------------+-----------+
>    |     10 |          530 |              840 |       600 |
>    |    100 |        7,100 |           14,200 |     7,800 |
>    |   1000 |      265,000 |          370,000 |   205,200 |
>    |  10000 |    4,400,000 |        5,500,000 | 4,200,000 |
>    +--------+--------------+------------------+-----------+
> 
> 
> Patch 1 introduces the ll_rbtree machinery.
> 
> Patches 2-5 convert users that might benefit from the new interface.
> 
> Full details and justifications for the conversion are in each
> individual patch.
> 
> I am Cc'ing some of the maintainers of the users I have converted to the whole
> series such that it can the numbers and interface details can be easily seen.
> 
> Please consider for v5.7.

Seems that all the caller sites you've converted use a fairly small
number of rbnodes, so the additional storage shouldn't be a big
problem.  Are there any other sites you're eyeing?  If so, do you expect
any of those will use a significant amount of memory for the nodes?

And...  are these patches really worth merging?  Complexity is added,
but what end-user benefit can we expect?

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 3/5] regmap: optimize sync() and drop() regcache callbacks
  2020-02-07 18:03 ` [PATCH 3/5] regmap: optimize sync() and drop() regcache callbacks Davidlohr Bueso
@ 2020-02-10 13:11   ` Mark Brown
  0 siblings, 0 replies; 18+ messages in thread
From: Mark Brown @ 2020-02-10 13:11 UTC (permalink / raw)
  To: Davidlohr Bueso
  Cc: akpm, linux-kernel, mcgrof, alex.williamson, Davidlohr Bueso

[-- Attachment #1: Type: text/plain, Size: 793 bytes --]

On Fri, Feb 07, 2020 at 10:03:03AM -0800, Davidlohr Bueso wrote:
> At a higher memory footprint (two additional pointers per node) we
> can get branchless O(1) tree iterators which can optimize in-order
> tree walks/traversals. For example, on my laptop looking at the rbtree
> debugfs entries:

It's not clear that this is a good optimization - we're increasing the
memory footprint for a bit of performance, but since our performance is all
relative to the I2C or SPI I/O we're most likely all in the noise here,
whereas for larger maps the size cost looks like it could be quite bad
(but equally your case looks to have a lot of 2 register blocks, which is
almost the worst case for overhead, so...).

That said the code is fine so from that point of view:

Acked-by: Mark Brown <broonie@kernel.org>

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH -next 0/5] rbtree: optimize frequent tree walks
  2020-02-10  1:46 ` [PATCH -next 0/5] rbtree: optimize frequent tree walks Andrew Morton
@ 2020-02-10 15:56   ` Davidlohr Bueso
  2020-02-10 22:35     ` Andrew Morton
  2020-02-11 11:20     ` Mark Brown
  0 siblings, 2 replies; 18+ messages in thread
From: Davidlohr Bueso @ 2020-02-10 15:56 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-kernel, mcgrof, broonie, alex.williamson

On Sun, 09 Feb 2020, Andrew Morton wrote:
>Seems that all the caller sites you've converted use a fairly small
>number of rbnodes, so the additional storage shouldn't be a big
>problem.  Are there any other sites you're eyeing?  If so, do you expect
>any of those will use a significant amount of memory for the nodes?

I also thought about converting the deadline scheduler to use these,
mainly benefiting pull_dl_task(), but didn't get to it, and I don't expect
the extra footprint to be prohibitive.

>
>And...  are these patches really worth merging?  Complexity is added,
>but what end-user benefit can we expect?

Yes they are worth merging, imo (which of course is biased :)

I don't think there is too much added complexity overall, particularly
considering that the user conversions are rather trivial. And even for
small trees (ie 100 nodes) we still benefit in a measurable way from
these optimizations.

Thanks,
Davidlohr

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 1/5] lib/rbtree: introduce linked-list rbtree interface
  2020-02-07 18:03 ` [PATCH 1/5] lib/rbtree: introduce linked-list rbtree interface Davidlohr Bueso
@ 2020-02-10 20:07   ` Luis Chamberlain
  2020-02-10 21:28     ` Davidlohr Bueso
  0 siblings, 1 reply; 18+ messages in thread
From: Luis Chamberlain @ 2020-02-10 20:07 UTC (permalink / raw)
  To: Davidlohr Bueso
  Cc: akpm, linux-kernel, broonie, alex.williamson, Davidlohr Bueso

On Fri, Feb 07, 2020 at 10:03:01AM -0800, Davidlohr Bueso wrote:
> When doing in-order tree traversals, the rb_next() and rb_prev() helpers can
> be sub-optimal as plain node representations incur in extra pointer chasing
> up the tree to compute the corresponding node. This can impact negatively
> in performance when traversals are a common thing. This, for example, is what
> we do in our vm already to efficiently manage VMAs.
> 
> This interface provides branchless O(1)

I think including the word "branchless" does an injustice to the
optimization; just O(1) sells it to me more, given how I read the code.
Why is the "branchless" prefix needed here?

> access to the first node as well as
> both its in-order successor and predecessor. This is done at the cost of higher
> memory footprint: mainly additional prev and next pointers for each node. Such
> benefits can be seen in this table showing the amount of cycles it takes to
> do a full tree traversal:
> 
>    +--------+--------------+-----------+
>    | #nodes | plain rbtree | ll-rbtree |
>    +--------+--------------+-----------+
>    |     10 |          138 |        24 |
>    |    100 |        7,200 |       425 |
>    |   1000 |       17,000 |     8,000 |
>    |  10000 |      501,090 |   222,500 |
>    +--------+--------------+-----------+

Sold, however I wonder if we can have *one new API* where, based on just one
Kconfig, you either get the two pointers or not; the performance gain would
then only be observed if this new kconfig entry is enabled. The benefit of
this is that we don't shove the performance benefit down all users' throats
but rather this can be decided by distributions and system integrators.

> diff --git a/Documentation/rbtree.txt b/Documentation/rbtree.txt
> index 523d54b60087..fe38fb257931 100644
> --- a/Documentation/rbtree.txt
> +++ b/Documentation/rbtree.txt
> @@ -5,6 +5,7 @@ Red-black Trees (rbtree) in Linux
>  
>  :Date: January 18, 2007
>  :Author: Rob Landley <rob@landley.net>
> +         Davidlohr Bueso <dave@stgolabs.net>
>  
>  What are red-black trees, and what are they for?
>  ------------------------------------------------
> @@ -226,6 +227,79 @@ trees::
>  				 struct rb_augment_callbacks *);
>  
>  
> +Linked-list rbtrees
> +-------------------
> +
> +When doing in-order tree traversals, the rb_next() and rb_prev() helpers can
> +be sub-optimal as plain node representations incur in extra pointer chasing
> +up the tree to compute the corresponding node. This can have a negative impact
> +in latencies in scenarios where tree traversals are not rare. This interface
> +provides branchless O(1) access to the first node as well as both its in-order
> +successor and predecessor. This is done at the cost of higher memory footprint:
> +mainly additional prev and next pointers for each node, essentially duplicating
> +the tree data structure. While there are other node representations that optimize
> +getting such pointers without bloating the nodes as much, such as keeping a
> +parent pointer or threaded trees where the nil prev/next pointers are recycled;
> +both incurring in higher runtime penalization for common modification operations
> +as well as any rotations.
> +
> +As with regular rb_root structure, linked-list rbtrees are initialized to be
> +empty via::
> +
> +  struct llrb_root mytree = LLRB_ROOT;
> +
> +Insertion and deletion are defined by:
> +
> +  void llrb_insert(struct llrb_root *, struct llrb_node *, struct llrb_node *);
> +  void llrb_erase(struct llrb_node *, struct llrb_root *);
> +
> +The corresponding helpers needed to iterate through a tree are the following,
> +equivalent to an optimized version of the regular rbtree version:
> +
> +  struct llrb_node *llrb_first(struct llrb_root *tree);
> +  struct llrb_node *llrb_next(struct rb_node *node);
> +  struct llrb_node *llrb_prev(struct rb_node *node);
> +
> +
> +Inserting data into a Linked-list rbtree
> +----------------------------------------
> +
> +Because llrb trees can exist anywhere regular rbtrees, the steps are similar.
> +The search for insertion differs from the regular search in two ways. First
> +the caller must keep track of the previous node,

can you explain here why, even though it's clear in the code: it's because
we need to pass it as a parameter when the new node is inserted into the
rb tree.

Also, what about a selftest for this?

Reviewed-by: Luis Chamberlain <mcgrof@kernel.org>   

  Luis

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 2/5] proc/sysctl: optimize proc_sys_readdir
  2020-02-07 18:03 ` [PATCH 2/5] proc/sysctl: optimize proc_sys_readdir Davidlohr Bueso
@ 2020-02-10 20:08   ` Luis Chamberlain
  0 siblings, 0 replies; 18+ messages in thread
From: Luis Chamberlain @ 2020-02-10 20:08 UTC (permalink / raw)
  To: Davidlohr Bueso
  Cc: akpm, linux-kernel, broonie, alex.williamson, keescook, yzaikin,
	linux-fsdevel, Davidlohr Bueso

On Fri, Feb 07, 2020 at 10:03:02AM -0800, Davidlohr Bueso wrote:
> This patch coverts struct ctl_dir to use an llrbtree
> instead of a regular rbtree such that computing nodes for
> potential usable entries becomes a branchless O(1) operation,
> therefore optimizing first_usable_entry(). The cost are
> mainly three additional pointers: one for the root and two
> for each struct ctl_node next/prev nodes, enlarging it from
> 32 to 48 bytes on x86-64.

Acked-by: Luis Chamberlain <mcgrof@kernel.org>

  Luis

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 1/5] lib/rbtree: introduce linked-list rbtree interface
  2020-02-10 20:07   ` Luis Chamberlain
@ 2020-02-10 21:28     ` Davidlohr Bueso
  2020-02-10 21:44       ` Luis Chamberlain
  0 siblings, 1 reply; 18+ messages in thread
From: Davidlohr Bueso @ 2020-02-10 21:28 UTC (permalink / raw)
  To: Luis Chamberlain
  Cc: akpm, linux-kernel, broonie, alex.williamson, Davidlohr Bueso

On Mon, 10 Feb 2020, Luis Chamberlain wrote:

>I think including the word "branchless" does injustice to the
>optimization, just O(1) sells it to me more to how I read the code.
>Why is the "branchless" prefix needed here?

Yes, compared to regular rb_next(), 'branchless' would be unnecessary.

However, other node representations that are easier on the memory footprint
(parent pointers or threaded trees) also give O(1) next/prev but still have
branches - and, most importantly, those approaches incur higher overhead for
modifications to the tree.
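
Roughly, the per-step difference looks like this (a simplified sketch of the
successor computation, not the exact lib/rbtree.c code):

  #include <linux/rbtree.h>
  #include <linux/ll_rbtree.h>

  /* plain rbtree: chase pointers down/up the tree, with branches */
  struct rb_node *next_plain(struct rb_node *node)
  {
	struct rb_node *parent;

	if (node->rb_right) {
		/* leftmost node of the right subtree */
		node = node->rb_right;
		while (node->rb_left)
			node = node->rb_left;
		return node;
	}
	/* otherwise walk up until we arrive from a left child */
	while ((parent = rb_parent(node)) && node == parent->rb_right)
		node = parent;
	return parent;
  }

  /* ll-rbtree: the successor is maintained in the node itself */
  struct llrb_node *next_ll(struct llrb_node *node)
  {
	return node->next;
  }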

>
>> access to the first node as well as
>> both its in-order successor and predecessor. This is done at the cost of higher
>> memory footprint: mainly additional prev and next pointers for each node. Such
>> benefits can be seen in this table showing the amount of cycles it takes to
>> do a full tree traversal:
>>
>>    +--------+--------------+-----------+
>>    | #nodes | plain rbtree | ll-rbtree |
>>    +--------+--------------+-----------+
>>    |     10 |          138 |        24 |
>>    |    100 |        7,200 |       425 |
>>    |   1000 |       17,000 |     8,000 |
>>    |  10000 |      501,090 |   222,500 |
>>    +--------+--------------+-----------+
>
>Sold, however I wonder if we can have *one new API* where based on just one
>Kconfig you either get the two pointers or not, the performance gain
>then would only be observed if this new kconfig entry is enabled. The
>benefit of this is that we don't shove the performance benefit down
>all user's throughts but rather this can be decided by distributions
>and system integrators.

I don't think we want an all or nothing approach as different users in the
kernel have different needs and some users are simply unable to deal with
enlarging data structures, while others have no problem.

>...

>> +Inserting data into a Linked-list rbtree
>> +----------------------------------------
>> +
>> +Because llrb trees can exist anywhere regular rbtrees, the steps are similar.
>> +The search for insertion differs from the regular search in two ways. First
>> +the caller must keep track of the previous node,
>
>can you explain here why, even though its clear in the code: its because
>we need to pass it as a parameter when the new node is inserted into the
>rb tree.

Right. We piggyback on the node info we already have available, i.e. when
the user iterates down the tree to find the point of insertion.
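
For instance, something along these lines (a minimal sketch mirroring the
documentation example; struct mytype and its 'key' field are made up for
illustration):

  #include <linux/ll_rbtree.h>

  struct mytype {
	struct llrb_node node;
	unsigned long key;
  };

  int my_insert(struct llrb_root *root, struct mytype *data)
  {
	struct rb_node **new = &root->rb_root.rb_node, *parent = NULL;
	struct llrb_node *prev = NULL;

	while (*new) {
		struct mytype *this = llrb_entry(*new, struct mytype, node);

		parent = *new;
		if (data->key < this->key)
			new = &(*new)->rb_left;
		else {
			/* descending right: 'parent' precedes the new node in sort order */
			prev = llrb_from_rb(parent);
			new = &(*new)->rb_right;
		}
	}

	/* link into the rbtree and splice into the prev/next list in one go */
	rb_link_node(&data->node.rb_node, parent, new);
	llrb_insert(root, &data->node, prev);

	return 0;
  }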

>
>Also, what about a selftest for this?

So we have lib/rbtree_test.c which does functional+latency testing - which I
am planning on updating if this series is merged. I first have some patches
that improve the overall module but are unrelated to this series, and
therefore did not send them.

>
>Reviewed-by: Luis Chamberlain <mcgrof@kernel.org>

Thanks,
Davidlohr

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH 1/5] lib/rbtree: introduce linked-list rbtree interface
  2020-02-10 21:28     ` Davidlohr Bueso
@ 2020-02-10 21:44       ` Luis Chamberlain
  0 siblings, 0 replies; 18+ messages in thread
From: Luis Chamberlain @ 2020-02-10 21:44 UTC (permalink / raw)
  To: Davidlohr Bueso
  Cc: akpm, linux-kernel, broonie, alex.williamson, Davidlohr Bueso

On Mon, Feb 10, 2020 at 01:28:32PM -0800, Davidlohr Bueso wrote:
> On Mon, 10 Feb 2020, Luis Chamberlain wrote:
> > > access to the first node as well as
> > > both its in-order successor and predecessor. This is done at the cost of higher
> > > memory footprint: mainly additional prev and next pointers for each node. Such
> > > benefits can be seen in this table showing the amount of cycles it takes to
> > > do a full tree traversal:
> > > 
> > >    +--------+--------------+-----------+
> > >    | #nodes | plain rbtree | ll-rbtree |
> > >    +--------+--------------+-----------+
> > >    |     10 |          138 |        24 |
> > >    |    100 |        7,200 |       425 |
> > >    |   1000 |       17,000 |     8,000 |
> > >    |  10000 |      501,090 |   222,500 |
> > >    +--------+--------------+-----------+
> > 
> > Sold, however I wonder if we can have *one new API* where based on just one
> > Kconfig you either get the two pointers or not, the performance gain
> > then would only be observed if this new kconfig entry is enabled. The
> > benefit of this is that we don't shove the performance benefit down
> > all user's throughts but rather this can be decided by distributions
> > and system integrators.
> 
> I don't think we want an all or nothing approach

I'm not suggesting all or nothing for all rbtree users; I'm suggesting an
all-or-nothing approach for users of a *new* API. That is, users would
still need to be converted over, and *iff* the new kconfig entry is
enabled would the two pointers be added and used; otherwise we'd fall back
to the current, boring, default rbtree.
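To sketch what I mean (CONFIG_LLRB_TREE, llrb_node and llrb_next() are
names I'm making up here, not anything from this series):

struct llrb_node {
	struct rb_node rb;
#ifdef CONFIG_LLRB_TREE
	struct llrb_node *prev, *next;	/* only paid for when enabled */
#endif
};

static inline struct llrb_node *llrb_next(struct llrb_node *node)
{
#ifdef CONFIG_LLRB_TREE
	return node->next;	/* O(1) branchless next */
#else
	/* fall back to a plain in-order tree walk */
	struct rb_node *rb = rb_next(&node->rb);

	return rb ? container_of(rb, struct llrb_node, rb) : NULL;
#endif
}

Converted users would always go through the new helpers; whether those
helpers carry the two extra pointers or degrade to rb_next() would then be
a distro / system integrator decision.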

> as different users in the
> kernel have different needs and some users are simply unable to deal with
> enlarging data structures, while others have no problem.

The approach would allow distros which are not yet sure whether the two
pointers are worth it to continue to chug on as they used to, while also
allowing them to explore this prospect in a flexible way.

  Luis

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH -next 0/5] rbtree: optimize frequent tree walks
  2020-02-10 15:56   ` Davidlohr Bueso
@ 2020-02-10 22:35     ` Andrew Morton
  2020-02-13 15:50       ` Davidlohr Bueso
  2020-02-11 11:20     ` Mark Brown
  1 sibling, 1 reply; 18+ messages in thread
From: Andrew Morton @ 2020-02-10 22:35 UTC (permalink / raw)
  To: Davidlohr Bueso; +Cc: linux-kernel, mcgrof, broonie, alex.williamson

On Mon, 10 Feb 2020 07:56:11 -0800 Davidlohr Bueso <dave@stgolabs.net> wrote:

> On Sun, 09 Feb 2020, Andrew Morton wrote:
> >Seems that all the caller sites you've converted use a fairly small
> >number of rbnodes, so the additional storage shouldn't be a big
> >problem.  Are there any other sites you're eyeing?  If so, do you expect
> >any of those will use a significant amount of memory for the nodes?
> 
> I also thought about converting the deadline scheduler to use these,
> mainly benefiting pull_dl_task() but didn't get to it and I don't expect
> the extra footprint to be prohibitive.
> 
> >
> >And...  are these patches really worth merging?  Complexity is added,
> >but what end-user benefit can we expect?
> 
> Yes they are worth merging, imo (which of course is biased :)
> 
> I don't think there is too much added complexity overall, particularly
> considering that the user conversions are rather trivial. And even for
> small trees (ie 100 nodes) we still benefit in a measurable way from
> these optimizations.
> 

Measurable for microbenchmarks, I think?  But what benefit will a user
see when running a workload that is actually cared about?

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH -next 0/5] rbtree: optimize frequent tree walks
  2020-02-10 15:56   ` Davidlohr Bueso
  2020-02-10 22:35     ` Andrew Morton
@ 2020-02-11 11:20     ` Mark Brown
  2020-02-13 15:55       ` Davidlohr Bueso
  1 sibling, 1 reply; 18+ messages in thread
From: Mark Brown @ 2020-02-11 11:20 UTC (permalink / raw)
  To: Davidlohr Bueso; +Cc: Andrew Morton, linux-kernel, mcgrof, alex.williamson

On Mon, Feb 10, 2020 at 07:56:11AM -0800, Davidlohr Bueso wrote:
> On Sun, 09 Feb 2020, Andrew Morton wrote:

> > And...  are these patches really worth merging?  Complexity is added,
> > but what end-user benefit can we expect?

> Yes they are worth merging, imo (which of course is biased :)

> I don't think there is too much added complexity overall, particularly
> considering that the user conversions are rather trivial. And even for
> small trees (ie 100 nodes) we still benefit in a measurable way from
> these optimizations.

As I said in reply to the regmap patch I'm really unconvinced that any
benefit will outweigh the increased memory costs for that usage.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH -next 0/5] rbtree: optimize frequent tree walks
  2020-02-10 22:35     ` Andrew Morton
@ 2020-02-13 15:50       ` Davidlohr Bueso
  0 siblings, 0 replies; 18+ messages in thread
From: Davidlohr Bueso @ 2020-02-13 15:50 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-kernel, mcgrof, broonie, alex.williamson

On Mon, 10 Feb 2020, Andrew Morton wrote:
>Measurable for microbenchmarks, I think?  But what benefit will a user
>see when running a workload that is actually cared about?

Well, avoiding pointer chasing on large SMP systems has been shown in the
past to be something we care about. And yes, my performance data comes
from an ad-hoc test scenario which only measures iteration latencies, but
that doesn't mean there isn't value here.
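For reference, the measurement is essentially of this shape, where
llrb_first()/llrb_next() stand in for the actual iterators and the node
counter doubles as a sanity check that both walks visit the same nodes
(get_cycles()/cycles_t come from <asm/timex.h>):

static cycles_t time_plain_walk(struct rb_root *root, unsigned long *nr)
{
	struct rb_node *node;
	cycles_t t = get_cycles();

	for (node = rb_first(root); node; node = rb_next(node))
		(*nr)++;	/* pointer chasing up and down the tree */

	return get_cycles() - t;
}

static cycles_t time_llrb_walk(struct llrb_root *root, unsigned long *nr)
{
	struct llrb_node *node;
	cycles_t t = get_cycles();

	for (node = llrb_first(root); node; node = llrb_next(node))
		(*nr)++;	/* one predictable load per node */

	return get_cycles() - t;
}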

Thanks,
Davidlohr

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH -next 0/5] rbtree: optimize frequent tree walks
  2020-02-11 11:20     ` Mark Brown
@ 2020-02-13 15:55       ` Davidlohr Bueso
  2020-02-13 16:56         ` Mark Brown
  0 siblings, 1 reply; 18+ messages in thread
From: Davidlohr Bueso @ 2020-02-13 15:55 UTC (permalink / raw)
  To: Mark Brown; +Cc: Andrew Morton, linux-kernel, mcgrof, alex.williamson

On Tue, 11 Feb 2020, Mark Brown wrote:

>As I said in reply to the regmap patch I'm really unconvinced that any
>benefit will outweigh the increased memory costs for that usage.

I'm not arguing the regmap case; if it's not worth it, it's not worth it.
With that conversion I was merely hoping that sync() was used enough for
it to be considered.

Thanks,
Davidlohr

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH -next 0/5] rbtree: optimize frequent tree walks
  2020-02-13 15:55       ` Davidlohr Bueso
@ 2020-02-13 16:56         ` Mark Brown
  0 siblings, 0 replies; 18+ messages in thread
From: Mark Brown @ 2020-02-13 16:56 UTC (permalink / raw)
  To: Davidlohr Bueso; +Cc: Andrew Morton, linux-kernel, mcgrof, alex.williamson

On Thu, Feb 13, 2020 at 07:55:20AM -0800, Davidlohr Bueso wrote:
> On Tue, 11 Feb 2020, Mark Brown wrote:

> > As I said in reply to the regmap patch I'm really unconvinced that any
> > benefit will outweigh the increased memory costs for that usage.

> I'm not arguing the regmap case; if it's not worth it, it's not worth it.
> With that conversion I was merely hoping that sync() was used enough for
> it to be considered.

Well, it's used a fair amount, but like I say it's mainly used in the
context of working out what I2C or SPI I/O to do, so the amount of CPU
time consumed is neither here nor there really; we'd have to be doing
spectacularly badly for it to register.

^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2020-02-13 16:56 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-02-07 18:03 [PATCH -next 0/5] rbtree: optimize frequent tree walks Davidlohr Bueso
2020-02-07 18:03 ` [PATCH 1/5] lib/rbtree: introduce linked-list rbtree interface Davidlohr Bueso
2020-02-10 20:07   ` Luis Chamberlain
2020-02-10 21:28     ` Davidlohr Bueso
2020-02-10 21:44       ` Luis Chamberlain
2020-02-07 18:03 ` [PATCH 2/5] proc/sysctl: optimize proc_sys_readdir Davidlohr Bueso
2020-02-10 20:08   ` Luis Chamberlain
2020-02-07 18:03 ` [PATCH 3/5] regmap: optimize sync() and drop() regcache callbacks Davidlohr Bueso
2020-02-10 13:11   ` Mark Brown
2020-02-07 18:03 ` [PATCH 4/5] vfio/type1: optimize dma_list tree iterations Davidlohr Bueso
2020-02-07 18:03 ` [PATCH 5/5] uprobes: optimize build_probe_list() Davidlohr Bueso
2020-02-10  1:46 ` [PATCH -next 0/5] rbtree: optimize frequent tree walks Andrew Morton
2020-02-10 15:56   ` Davidlohr Bueso
2020-02-10 22:35     ` Andrew Morton
2020-02-13 15:50       ` Davidlohr Bueso
2020-02-11 11:20     ` Mark Brown
2020-02-13 15:55       ` Davidlohr Bueso
2020-02-13 16:56         ` Mark Brown
