* [patch 0/3] x86: Use interval tree to keep track of PAT reserve/free
@ 2010-02-10 19:57 venkatesh.pallipadi
  2010-02-10 19:57 ` [patch 1/3] rbtree: Add support for augmented rbtrees venkatesh.pallipadi
                   ` (2 more replies)
  0 siblings, 3 replies; 27+ messages in thread
From: venkatesh.pallipadi @ 2010-02-10 19:57 UTC (permalink / raw)
  To: Ingo Molnar, H Peter Anvin, Thomas Gleixner, Wolfram Strepp
  Cc: Venkatesh Pallipadi, Suresh Siddha, linux-kernel

x86 PAT has to keep track of reserved and freed ranges of the IO region, as
the same region should not be mapped with conflicting types by multiple
users. These reserve and free requests can be of varying sizes and can
overlap in various ways, as in
uncached-minus @ 0xfbf00000-0xfbf04000
uncached-minus @ 0xfbf02000-0xfbf03000
or
uncached-minus @ 0xfbf00000-0xfbf04000
uncached-minus @ 0xfbf03000-0xfbf05000
etc., depending on the driver usage model.

The PAT code needs efficient conflict lookup (while adding a new region),
exact region lookup (to free a currently reserved region) and type lookup
(to find the memtype of any particular address).

Currently this is done with a hybrid of a linked list and an rbtree, where
the linked list is sorted by increasing start address of these ranges.

A better way to deal with this is an interval tree (augmented rbtree). This
patchset adds support for augmented rbtrees to the generic rbtree code and
uses that in x86 PAT, thereby cleaning up and simplifying the current PAT
reserve/free backend.
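
For illustration only (not part of the patches): the extra data each node
carries is the maximum end address found anywhere in its subtree, which is
what lets a lookup skip whole subtrees that cannot overlap a query range.
A rough sketch of that bookkeeping, using plain child pointers instead of
struct rb_node, with names that mirror what patch 3/3 keeps in struct
memtype:

	struct range_node {
		u64 start;
		u64 end;		/* reservation covers [start, end) */
		u64 subtree_max_end;	/* max 'end' in this node's subtree */
		struct range_node *left, *right;
	};

	/* Re-derive the cached maximum from the node and its two children */
	static void update_max_end(struct range_node *n)
	{
		u64 max = n->end;

		if (n->left && n->left->subtree_max_end > max)
			max = n->left->subtree_max_end;
		if (n->right && n->right->subtree_max_end > max)
			max = n->right->subtree_max_end;

		n->subtree_max_end = max;
	}

The augment callback added in patch 1/3 is what keeps this cached value
correct while the rbtree rotates and rebalances.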

Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>

-- 



* [patch 1/3] rbtree: Add support for augmented rbtrees
  2010-02-10 19:57 [patch 0/3] x86: Use interval tree to keep track of PAT reserve/free venkatesh.pallipadi
@ 2010-02-10 19:57 ` venkatesh.pallipadi
  2010-02-10 23:23   ` [patch 1/3] rbtree: Add support for augmented rbtrees (ver 2) Pallipadi, Venkatesh
  2010-02-10 19:57 ` [patch 2/3] x86: Preparatory changes in pat.c for bigger rbtree change venkatesh.pallipadi
  2010-02-10 19:57 ` [patch 3/3] x86: Migrate to rbtree only backend for pat memtype management venkatesh.pallipadi
  2 siblings, 1 reply; 27+ messages in thread
From: venkatesh.pallipadi @ 2010-02-10 19:57 UTC (permalink / raw)
  To: Ingo Molnar, H Peter Anvin, Thomas Gleixner, Wolfram Strepp
  Cc: Venkatesh Pallipadi, Suresh Siddha, linux-kernel

[-- Attachment #1: 0001-rbtree-Add-support-for-augmented-rbtrees.patch --]
[-- Type: text/plain, Size: 5243 bytes --]

Add support for augmented rbtrees in core rbtree code.

This will be used in subsequent patches, in x86 PAT code, which needs
interval trees to efficiently keep track of PAT ranges.
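
A minimal usage sketch (illustrative only, with a placeholder callback; the
real per-node bookkeeping comes with the PAT patches later in this series):

	static void my_augment_cb(struct rb_node *node)
	{
		/*
		 * Recompute this node's cached value (e.g. a subtree
		 * maximum) from the node and its new children. If the
		 * parent's cached value is affected, the callback also
		 * has to walk up and update the ancestors.
		 */
	}

	static struct rb_root my_root = RB_AUGMENT_ROOT(&my_augment_cb);

Insertion then follows the usual rbtree pattern: link the new node with
rb_link_node() and call rb_insert_color(), which invokes the callback for
the new node and for every node whose children change during rebalancing.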

Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
---
 Documentation/rbtree.txt |   19 ++++++++++++++++++
 include/linux/rbtree.h   |    5 +++-
 lib/rbtree.c             |   48 ++++++++++++++++++++++++++++++++++++++++++---
 3 files changed, 67 insertions(+), 5 deletions(-)

diff --git a/Documentation/rbtree.txt b/Documentation/rbtree.txt
index aae8355..b2bae0a 100644
--- a/Documentation/rbtree.txt
+++ b/Documentation/rbtree.txt
@@ -190,3 +190,22 @@ Example:
   for (node = rb_first(&mytree); node; node = rb_next(node))
 	printk("key=%s\n", rb_entry(node, struct mytype, node)->keystring);
 
+Support for Augmented rbtrees
+-----------------------------
+
+Augmented rbtree is an rbtree with "some" additional data stored in each node.
+This data can be used to augment some new functionality to rbtree.
+Augmented rbtree is an optional feature built on top of basic rbtree
+infrastructure. rbtree user who wants this feature will have an augment
+callback function in rb_root initialized.
+
+Interval tree is an example of augmented rb tree. Reference -
+"Introduction to Algorithms" by Cormen, Leiserson, Rivest and Stein.
+
+This callback function will be called from rbtree core routines whenever
+a node has a change in one or both of its children. It is the responsibility
+of the callback function to recalculate the additional data that is in the
+rb node using new children information. Note that if this new additional
+data affects the parent node's additional data, then callback function has
+to handle it and do the recursive updates.
+
diff --git a/include/linux/rbtree.h b/include/linux/rbtree.h
index 9c29541..8e33a25 100644
--- a/include/linux/rbtree.h
+++ b/include/linux/rbtree.h
@@ -110,6 +110,7 @@ struct rb_node
 struct rb_root
 {
 	struct rb_node *rb_node;
+	void (*augment_cb)(struct rb_node *node);
 };
 
 
@@ -129,7 +130,9 @@ static inline void rb_set_color(struct rb_node *rb, int color)
 	rb->rb_parent_color = (rb->rb_parent_color & ~1) | color;
 }
 
-#define RB_ROOT	(struct rb_root) { NULL, }
+#define RB_ROOT	(struct rb_root) { NULL, NULL, }
+#define RB_AUGMENT_ROOT(x)	(struct rb_root) { NULL, x}
+
 #define	rb_entry(ptr, type, member) container_of(ptr, type, member)
 
 #define RB_EMPTY_ROOT(root)	((root)->rb_node == NULL)
diff --git a/lib/rbtree.c b/lib/rbtree.c
index e2aa3be..15e10b1 100644
--- a/lib/rbtree.c
+++ b/lib/rbtree.c
@@ -44,6 +44,11 @@ static void __rb_rotate_left(struct rb_node *node, struct rb_root *root)
 	else
 		root->rb_node = right;
 	rb_set_parent(node, right);
+
+	if (root->augment_cb) {
+		root->augment_cb(node);
+		root->augment_cb(right);
+	}
 }
 
 static void __rb_rotate_right(struct rb_node *node, struct rb_root *root)
@@ -67,12 +72,20 @@ static void __rb_rotate_right(struct rb_node *node, struct rb_root *root)
 	else
 		root->rb_node = left;
 	rb_set_parent(node, left);
+
+	if (root->augment_cb) {
+		root->augment_cb(node);
+		root->augment_cb(left);
+	}
 }
 
 void rb_insert_color(struct rb_node *node, struct rb_root *root)
 {
 	struct rb_node *parent, *gparent;
 
+	if (root->augment_cb)
+		root->augment_cb(node);
+
 	while ((parent = rb_parent(node)) && rb_is_red(parent))
 	{
 		gparent = rb_parent(parent);
@@ -227,12 +240,15 @@ void rb_erase(struct rb_node *node, struct rb_root *root)
 	else
 	{
 		struct rb_node *old = node, *left;
+		int old_parent_cb = 0;
+		int successor_parent_cb = 0;
 
 		node = node->rb_right;
 		while ((left = node->rb_left) != NULL)
 			node = left;
 
 		if (rb_parent(old)) {
+			old_parent_cb = 1;
 			if (rb_parent(old)->rb_left == old)
 				rb_parent(old)->rb_left = node;
 			else
@@ -247,8 +263,10 @@ void rb_erase(struct rb_node *node, struct rb_root *root)
 		if (parent == old) {
 			parent = node;
 		} else {
+			successor_parent_cb = 1;
 			if (child)
 				rb_set_parent(child, parent);
+
 			parent->rb_left = child;
 
 			node->rb_right = old->rb_right;
@@ -259,6 +277,24 @@ void rb_erase(struct rb_node *node, struct rb_root *root)
 		node->rb_left = old->rb_left;
 		rb_set_parent(old->rb_left, node);
 
+		if (root->augment_cb) {
+			/*
+			 * Here, three different nodes can have new children.
+			 * The parent of the successor node that was selected
+			 * to replace the node to be erased.
+			 * The node that is getting erased and is now replaced
+			 * by its successor.
+			 * The parent of the node getting erased-replaced.
+			 */
+			if (successor_parent_cb)
+				root->augment_cb(parent);
+
+			root->augment_cb(node);
+
+			if (old_parent_cb)
+				root->augment_cb(rb_parent(old));
+		}
+
 		goto color;
 	}
 
@@ -267,15 +303,19 @@ void rb_erase(struct rb_node *node, struct rb_root *root)
 
 	if (child)
 		rb_set_parent(child, parent);
-	if (parent)
-	{
+
+	if (parent) {
 		if (parent->rb_left == node)
 			parent->rb_left = child;
 		else
 			parent->rb_right = child;
-	}
-	else
+
+		if (root->augment_cb)
+			root->augment_cb(parent);
+
+	} else {
 		root->rb_node = child;
+	}
 
  color:
 	if (color == RB_BLACK)
-- 
1.6.0.6

-- 



* [patch 2/3] x86: Preparatory changes in pat.c for bigger rbtree change
  2010-02-10 19:57 [patch 0/3] x86: Use interval tree to keep track of PAT reserve/free venkatesh.pallipadi
  2010-02-10 19:57 ` [patch 1/3] rbtree: Add support for augmented rbtrees venkatesh.pallipadi
@ 2010-02-10 19:57 ` venkatesh.pallipadi
  2010-02-19  0:22   ` [tip:x86/pat] x86, pat: " tip-bot for venkatesh.pallipadi@intel.com
  2010-02-10 19:57 ` [patch 3/3] x86: Migrate to rbtree only backend for pat memtype management venkatesh.pallipadi
  2 siblings, 1 reply; 27+ messages in thread
From: venkatesh.pallipadi @ 2010-02-10 19:57 UTC (permalink / raw)
  To: Ingo Molnar, H Peter Anvin, Thomas Gleixner
  Cc: Venkatesh Pallipadi, Suresh Siddha, linux-kernel

[-- Attachment #1: 0002-x86-Preparatory-changes-in-pat.c-for-bigger-rbtree.patch --]
[-- Type: text/plain, Size: 7705 bytes --]

Minor changes in pat.c to clean up the code and make it smoother to
introduce the bigger rbtree-only change in the following patch. The changes
are cleanup only and should not have any functional impact.

Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
---
 arch/x86/mm/pat.c          |  170 +++++++++++++++++++++++---------------------
 arch/x86/mm/pat_internal.h |   28 +++++++
 2 files changed, 116 insertions(+), 82 deletions(-)
 create mode 100644 arch/x86/mm/pat_internal.h

diff --git a/arch/x86/mm/pat.c b/arch/x86/mm/pat.c
index ae9648e..628e507 100644
--- a/arch/x86/mm/pat.c
+++ b/arch/x86/mm/pat.c
@@ -30,6 +30,8 @@
 #include <asm/pat.h>
 #include <asm/io.h>
 
+#include "pat_internal.h"
+
 #ifdef CONFIG_X86_PAT
 int __read_mostly pat_enabled = 1;
 
@@ -53,19 +55,15 @@ static inline void pat_disable(const char *reason)
 #endif
 
 
-static int debug_enable;
+int pat_debug_enable;
 
 static int __init pat_debug_setup(char *str)
 {
-	debug_enable = 1;
+	pat_debug_enable = 1;
 	return 0;
 }
 __setup("debugpat", pat_debug_setup);
 
-#define dprintk(fmt, arg...) \
-	do { if (debug_enable) printk(KERN_INFO fmt, ##arg); } while (0)
-
-
 static u64 __read_mostly boot_pat_state;
 
 enum {
@@ -132,17 +130,6 @@ void pat_init(void)
 
 #undef PAT
 
-static char *cattr_name(unsigned long flags)
-{
-	switch (flags & _PAGE_CACHE_MASK) {
-	case _PAGE_CACHE_UC:		return "uncached";
-	case _PAGE_CACHE_UC_MINUS:	return "uncached-minus";
-	case _PAGE_CACHE_WB:		return "write-back";
-	case _PAGE_CACHE_WC:		return "write-combining";
-	default:			return "broken";
-	}
-}
-
 /*
  * The global memtype list keeps track of memory type for specific
  * physical memory areas. Conflicting memory types in different
@@ -159,14 +146,6 @@ static char *cattr_name(unsigned long flags)
  * memtype_lock protects both the linear list and rbtree.
  */
 
-struct memtype {
-	u64			start;
-	u64			end;
-	unsigned long		type;
-	struct list_head	nd;
-	struct rb_node		rb;
-};
-
 static struct rb_root memtype_rbroot = RB_ROOT;
 static LIST_HEAD(memtype_list);
 static DEFINE_SPINLOCK(memtype_lock);	/* protects memtype list */
@@ -349,6 +328,64 @@ static int free_ram_pages_type(u64 start, u64 end)
 	return 0;
 }
 
+static int memtype_check_insert(struct memtype *new, unsigned long *new_type)
+{
+	struct memtype *entry;
+	u64 start, end;
+	unsigned long actual_type;
+	struct list_head *where;
+	int err = 0;
+
+	start = new->start;
+	end = new->end;
+	actual_type = new->type;
+
+	/* Search for existing mapping that overlaps the current range */
+	where = NULL;
+	list_for_each_entry(entry, &memtype_list, nd) {
+		if (end <= entry->start) {
+			where = entry->nd.prev;
+			break;
+		} else if (start <= entry->start) { /* end > entry->start */
+			err = chk_conflict(new, entry, new_type);
+			if (!err) {
+				dprintk("Overlap at 0x%Lx-0x%Lx\n",
+					entry->start, entry->end);
+				where = entry->nd.prev;
+			}
+			break;
+		} else if (start < entry->end) { /* start > entry->start */
+			err = chk_conflict(new, entry, new_type);
+			if (!err) {
+				dprintk("Overlap at 0x%Lx-0x%Lx\n",
+					entry->start, entry->end);
+
+				/*
+				 * Move to right position in the linked
+				 * list to add this new entry
+				 */
+				list_for_each_entry_continue(entry,
+							&memtype_list, nd) {
+					if (start <= entry->start) {
+						where = entry->nd.prev;
+						break;
+					}
+				}
+			}
+			break;
+		}
+	}
+	if (!err) {
+		if (where)
+			list_add(&new->nd, where);
+		else
+			list_add_tail(&new->nd, &memtype_list);
+
+		memtype_rb_insert(&memtype_rbroot, new);
+	}
+	return err;
+}
+
 /*
  * req_type typically has one of the:
  * - _PAGE_CACHE_WB
@@ -364,9 +401,8 @@ static int free_ram_pages_type(u64 start, u64 end)
 int reserve_memtype(u64 start, u64 end, unsigned long req_type,
 		    unsigned long *new_type)
 {
-	struct memtype *new, *entry;
+	struct memtype *new;
 	unsigned long actual_type;
-	struct list_head *where;
 	int is_range_ram;
 	int err = 0;
 
@@ -423,42 +459,7 @@ int reserve_memtype(u64 start, u64 end, unsigned long req_type,
 
 	spin_lock(&memtype_lock);
 
-	/* Search for existing mapping that overlaps the current range */
-	where = NULL;
-	list_for_each_entry(entry, &memtype_list, nd) {
-		if (end <= entry->start) {
-			where = entry->nd.prev;
-			break;
-		} else if (start <= entry->start) { /* end > entry->start */
-			err = chk_conflict(new, entry, new_type);
-			if (!err) {
-				dprintk("Overlap at 0x%Lx-0x%Lx\n",
-					entry->start, entry->end);
-				where = entry->nd.prev;
-			}
-			break;
-		} else if (start < entry->end) { /* start > entry->start */
-			err = chk_conflict(new, entry, new_type);
-			if (!err) {
-				dprintk("Overlap at 0x%Lx-0x%Lx\n",
-					entry->start, entry->end);
-
-				/*
-				 * Move to right position in the linked
-				 * list to add this new entry
-				 */
-				list_for_each_entry_continue(entry,
-							&memtype_list, nd) {
-					if (start <= entry->start) {
-						where = entry->nd.prev;
-						break;
-					}
-				}
-			}
-			break;
-		}
-	}
-
+	err = memtype_check_insert(new, new_type);
 	if (err) {
 		printk(KERN_INFO "reserve_memtype failed 0x%Lx-0x%Lx, "
 		       "track %s, req %s\n",
@@ -469,13 +470,6 @@ int reserve_memtype(u64 start, u64 end, unsigned long req_type,
 		return err;
 	}
 
-	if (where)
-		list_add(&new->nd, where);
-	else
-		list_add_tail(&new->nd, &memtype_list);
-
-	memtype_rb_insert(&memtype_rbroot, new);
-
 	spin_unlock(&memtype_lock);
 
 	dprintk("reserve_memtype added 0x%Lx-0x%Lx, track %s, req %s, ret %s\n",
@@ -937,28 +931,40 @@ EXPORT_SYMBOL_GPL(pgprot_writecombine);
 #if defined(CONFIG_DEBUG_FS) && defined(CONFIG_X86_PAT)
 
 /* get Nth element of the linked list */
-static struct memtype *memtype_get_idx(loff_t pos)
+static int copy_memtype_nth_element(struct memtype *out, loff_t pos)
 {
-	struct memtype *list_node, *print_entry;
+	struct memtype *list_node;
 	int i = 1;
 
-	print_entry  = kmalloc(sizeof(struct memtype), GFP_KERNEL);
-	if (!print_entry)
-		return NULL;
-
-	spin_lock(&memtype_lock);
 	list_for_each_entry(list_node, &memtype_list, nd) {
 		if (pos == i) {
-			*print_entry = *list_node;
-			spin_unlock(&memtype_lock);
-			return print_entry;
+			*out = *list_node;
+			return 0;
 		}
 		++i;
 	}
+	return 1;
+}
+
+static struct memtype *memtype_get_idx(loff_t pos)
+{
+	struct memtype *print_entry;
+	int ret;
+
+	print_entry  = kzalloc(sizeof(struct memtype), GFP_KERNEL);
+	if (!print_entry)
+		return NULL;
+
+	spin_lock(&memtype_lock);
+	ret = copy_memtype_nth_element(print_entry, pos);
 	spin_unlock(&memtype_lock);
-	kfree(print_entry);
 
-	return NULL;
+	if (!ret) {
+		return print_entry;
+	} else {
+		kfree(print_entry);
+		return NULL;
+	}
 }
 
 static void *memtype_seq_start(struct seq_file *seq, loff_t *pos)
diff --git a/arch/x86/mm/pat_internal.h b/arch/x86/mm/pat_internal.h
new file mode 100644
index 0000000..6c98780
--- /dev/null
+++ b/arch/x86/mm/pat_internal.h
@@ -0,0 +1,28 @@
+#ifndef __PAT_INTERNAL_H_
+#define __PAT_INTERNAL_H_
+
+extern int pat_debug_enable;
+
+#define dprintk(fmt, arg...) \
+	do { if (pat_debug_enable) printk(KERN_INFO fmt, ##arg); } while (0)
+
+struct memtype {
+	u64			start;
+	u64			end;
+	unsigned long		type;
+	struct list_head	nd;
+	struct rb_node		rb;
+};
+
+static inline char *cattr_name(unsigned long flags)
+{
+	switch (flags & _PAGE_CACHE_MASK) {
+	case _PAGE_CACHE_UC:		return "uncached";
+	case _PAGE_CACHE_UC_MINUS:	return "uncached-minus";
+	case _PAGE_CACHE_WB:		return "write-back";
+	case _PAGE_CACHE_WC:		return "write-combining";
+	default:			return "broken";
+	}
+}
+
+#endif /* __PAT_INTERNAL_H_ */
-- 
1.6.0.6

-- 



* [patch 3/3] x86: Migrate to rbtree only backend for pat memtype management
  2010-02-10 19:57 [patch 0/3] x86: Use interval tree to keep track of PAT reserve/free venkatesh.pallipadi
  2010-02-10 19:57 ` [patch 1/3] rbtree: Add support for augmented rbtrees venkatesh.pallipadi
  2010-02-10 19:57 ` [patch 2/3] x86: Preparatory changes in pat.c for bigger rbtree change venkatesh.pallipadi
@ 2010-02-10 19:57 ` venkatesh.pallipadi
  2010-02-10 23:26   ` [patch 3/3] x86: Migrate to rbtree only backend for pat memtype management (ver 2) Pallipadi, Venkatesh
  2 siblings, 1 reply; 27+ messages in thread
From: venkatesh.pallipadi @ 2010-02-10 19:57 UTC (permalink / raw)
  To: Ingo Molnar, H Peter Anvin, Thomas Gleixner
  Cc: Venkatesh Pallipadi, Suresh Siddha, linux-kernel

[-- Attachment #1: 0003-x86-Migrate-to-rbtree-only-backend-for-pat-memtype.patch --]
[-- Type: text/plain, Size: 16515 bytes --]

Move the PAT backend from the existing rbtree and linked list hybrid to a
fully rbtree based implementation.

The new rbtree based solution uses interval trees (augmented rbtrees) to
store the PAT ranges. The new code separates out the PAT backend into
pat_rbtree.c, making it cleaner. The change also makes the PAT lookup,
reserve and free operations more efficient, as we no longer have to
traverse a linear linked list of a few tens of entries in the common case.

Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
---
 arch/x86/mm/Makefile       |    1 +
 arch/x86/mm/pat.c          |  209 +---------------------------------
 arch/x86/mm/pat_internal.h |   20 +++-
 arch/x86/mm/pat_rbtree.c   |  269 ++++++++++++++++++++++++++++++++++++++++++++
 4 files changed, 294 insertions(+), 205 deletions(-)
 create mode 100644 arch/x86/mm/pat_rbtree.c

diff --git a/arch/x86/mm/Makefile b/arch/x86/mm/Makefile
index 06630d2..a4c7683 100644
--- a/arch/x86/mm/Makefile
+++ b/arch/x86/mm/Makefile
@@ -6,6 +6,7 @@ nostackp := $(call cc-option, -fno-stack-protector)
 CFLAGS_physaddr.o		:= $(nostackp)
 CFLAGS_setup_nx.o		:= $(nostackp)
 
+obj-$(CONFIG_X86_PAT)		+= pat_rbtree.o
 obj-$(CONFIG_SMP)		+= tlb.o
 
 obj-$(CONFIG_X86_32)		+= pgtable_32.o iomap_32.o
diff --git a/arch/x86/mm/pat.c b/arch/x86/mm/pat.c
index 628e507..9510111 100644
--- a/arch/x86/mm/pat.c
+++ b/arch/x86/mm/pat.c
@@ -130,65 +130,7 @@ void pat_init(void)
 
 #undef PAT
 
-/*
- * The global memtype list keeps track of memory type for specific
- * physical memory areas. Conflicting memory types in different
- * mappings can cause CPU cache corruption. To avoid this we keep track.
- *
- * The list is sorted based on starting address and can contain multiple
- * entries for each address (this allows reference counting for overlapping
- * areas). All the aliases have the same cache attributes of course.
- * Zero attributes are represented as holes.
- *
- * The data structure is a list that is also organized as an rbtree
- * sorted on the start address of memtype range.
- *
- * memtype_lock protects both the linear list and rbtree.
- */
-
-static struct rb_root memtype_rbroot = RB_ROOT;
-static LIST_HEAD(memtype_list);
-static DEFINE_SPINLOCK(memtype_lock);	/* protects memtype list */
-
-static struct memtype *memtype_rb_search(struct rb_root *root, u64 start)
-{
-	struct rb_node *node = root->rb_node;
-	struct memtype *last_lower = NULL;
-
-	while (node) {
-		struct memtype *data = container_of(node, struct memtype, rb);
-
-		if (data->start < start) {
-			last_lower = data;
-			node = node->rb_right;
-		} else if (data->start > start) {
-			node = node->rb_left;
-		} else
-			return data;
-	}
-
-	/* Will return NULL if there is no entry with its start <= start */
-	return last_lower;
-}
-
-static void memtype_rb_insert(struct rb_root *root, struct memtype *data)
-{
-	struct rb_node **new = &(root->rb_node);
-	struct rb_node *parent = NULL;
-
-	while (*new) {
-		struct memtype *this = container_of(*new, struct memtype, rb);
-
-		parent = *new;
-		if (data->start <= this->start)
-			new = &((*new)->rb_left);
-		else if (data->start > this->start)
-			new = &((*new)->rb_right);
-	}
-
-	rb_link_node(&data->rb, parent, new);
-	rb_insert_color(&data->rb, root);
-}
+static DEFINE_SPINLOCK(memtype_lock);	/* protects memtype accesses */
 
 /*
  * Does intersection of PAT memory type and MTRR memory type and returns
@@ -216,33 +158,6 @@ static unsigned long pat_x_mtrr_type(u64 start, u64 end, unsigned long req_type)
 	return req_type;
 }
 
-static int
-chk_conflict(struct memtype *new, struct memtype *entry, unsigned long *type)
-{
-	if (new->type != entry->type) {
-		if (type) {
-			new->type = entry->type;
-			*type = entry->type;
-		} else
-			goto conflict;
-	}
-
-	 /* check overlaps with more than one entry in the list */
-	list_for_each_entry_continue(entry, &memtype_list, nd) {
-		if (new->end <= entry->start)
-			break;
-		else if (new->type != entry->type)
-			goto conflict;
-	}
-	return 0;
-
- conflict:
-	printk(KERN_INFO "%s:%d conflicting memory types "
-	       "%Lx-%Lx %s<->%s\n", current->comm, current->pid, new->start,
-	       new->end, cattr_name(new->type), cattr_name(entry->type));
-	return -EBUSY;
-}
-
 static int pat_pagerange_is_ram(unsigned long start, unsigned long end)
 {
 	int ram_page = 0, not_rampage = 0;
@@ -328,64 +243,6 @@ static int free_ram_pages_type(u64 start, u64 end)
 	return 0;
 }
 
-static int memtype_check_insert(struct memtype *new, unsigned long *new_type)
-{
-	struct memtype *entry;
-	u64 start, end;
-	unsigned long actual_type;
-	struct list_head *where;
-	int err = 0;
-
-	start = new->start;
-	end = new->end;
-	actual_type = new->type;
-
-	/* Search for existing mapping that overlaps the current range */
-	where = NULL;
-	list_for_each_entry(entry, &memtype_list, nd) {
-		if (end <= entry->start) {
-			where = entry->nd.prev;
-			break;
-		} else if (start <= entry->start) { /* end > entry->start */
-			err = chk_conflict(new, entry, new_type);
-			if (!err) {
-				dprintk("Overlap at 0x%Lx-0x%Lx\n",
-					entry->start, entry->end);
-				where = entry->nd.prev;
-			}
-			break;
-		} else if (start < entry->end) { /* start > entry->start */
-			err = chk_conflict(new, entry, new_type);
-			if (!err) {
-				dprintk("Overlap at 0x%Lx-0x%Lx\n",
-					entry->start, entry->end);
-
-				/*
-				 * Move to right position in the linked
-				 * list to add this new entry
-				 */
-				list_for_each_entry_continue(entry,
-							&memtype_list, nd) {
-					if (start <= entry->start) {
-						where = entry->nd.prev;
-						break;
-					}
-				}
-			}
-			break;
-		}
-	}
-	if (!err) {
-		if (where)
-			list_add(&new->nd, where);
-		else
-			list_add_tail(&new->nd, &memtype_list);
-
-		memtype_rb_insert(&memtype_rbroot, new);
-	}
-	return err;
-}
-
 /*
  * req_type typically has one of the:
  * - _PAGE_CACHE_WB
@@ -459,7 +316,7 @@ int reserve_memtype(u64 start, u64 end, unsigned long req_type,
 
 	spin_lock(&memtype_lock);
 
-	err = memtype_check_insert(new, new_type);
+	err = rbt_memtype_check_insert(new, new_type);
 	if (err) {
 		printk(KERN_INFO "reserve_memtype failed 0x%Lx-0x%Lx, "
 		       "track %s, req %s\n",
@@ -481,7 +338,6 @@ int reserve_memtype(u64 start, u64 end, unsigned long req_type,
 
 int free_memtype(u64 start, u64 end)
 {
-	struct memtype *entry, *saved_entry;
 	int err = -EINVAL;
 	int is_range_ram;
 
@@ -505,46 +361,7 @@ int free_memtype(u64 start, u64 end)
 	}
 
 	spin_lock(&memtype_lock);
-
-	entry = memtype_rb_search(&memtype_rbroot, start);
-	if (unlikely(entry == NULL))
-		goto unlock_ret;
-
-	/*
-	 * Saved entry points to an entry with start same or less than what
-	 * we searched for. Now go through the list in both directions to look
-	 * for the entry that matches with both start and end, with list stored
-	 * in sorted start address
-	 */
-	saved_entry = entry;
-	list_for_each_entry_from(entry, &memtype_list, nd) {
-		if (entry->start == start && entry->end == end) {
-			rb_erase(&entry->rb, &memtype_rbroot);
-			list_del(&entry->nd);
-			kfree(entry);
-			err = 0;
-			break;
-		} else if (entry->start > start) {
-			break;
-		}
-	}
-
-	if (!err)
-		goto unlock_ret;
-
-	entry = saved_entry;
-	list_for_each_entry_reverse(entry, &memtype_list, nd) {
-		if (entry->start == start && entry->end == end) {
-			rb_erase(&entry->rb, &memtype_rbroot);
-			list_del(&entry->nd);
-			kfree(entry);
-			err = 0;
-			break;
-		} else if (entry->start < start) {
-			break;
-		}
-	}
-unlock_ret:
+	err = rbt_memtype_erase(start, end);
 	spin_unlock(&memtype_lock);
 
 	if (err) {
@@ -593,7 +410,7 @@ static unsigned long lookup_memtype(u64 paddr)
 
 	spin_lock(&memtype_lock);
 
-	entry = memtype_rb_search(&memtype_rbroot, paddr);
+	entry = rbt_memtype_lookup(paddr);
 	if (entry != NULL)
 		rettype = entry->type;
 	else
@@ -930,22 +747,6 @@ EXPORT_SYMBOL_GPL(pgprot_writecombine);
 
 #if defined(CONFIG_DEBUG_FS) && defined(CONFIG_X86_PAT)
 
-/* get Nth element of the linked list */
-static int copy_memtype_nth_element(struct memtype *out, loff_t pos)
-{
-	struct memtype *list_node;
-	int i = 1;
-
-	list_for_each_entry(list_node, &memtype_list, nd) {
-		if (pos == i) {
-			*out = *list_node;
-			return 0;
-		}
-		++i;
-	}
-	return 1;
-}
-
 static struct memtype *memtype_get_idx(loff_t pos)
 {
 	struct memtype *print_entry;
@@ -956,7 +757,7 @@ static struct memtype *memtype_get_idx(loff_t pos)
 		return NULL;
 
 	spin_lock(&memtype_lock);
-	ret = copy_memtype_nth_element(print_entry, pos);
+	ret = rbt_memtype_copy_nth_element(print_entry, pos);
 	spin_unlock(&memtype_lock);
 
 	if (!ret) {
diff --git a/arch/x86/mm/pat_internal.h b/arch/x86/mm/pat_internal.h
index 6c98780..4f39eef 100644
--- a/arch/x86/mm/pat_internal.h
+++ b/arch/x86/mm/pat_internal.h
@@ -9,8 +9,8 @@ extern int pat_debug_enable;
 struct memtype {
 	u64			start;
 	u64			end;
+	u64			subtree_max_end;
 	unsigned long		type;
-	struct list_head	nd;
 	struct rb_node		rb;
 };
 
@@ -25,4 +25,22 @@ static inline char *cattr_name(unsigned long flags)
 	}
 }
 
+#ifdef CONFIG_X86_PAT
+extern int rbt_memtype_check_insert(struct memtype *new,
+					unsigned long *new_type);
+extern int rbt_memtype_erase(u64 start, u64 end);
+extern struct memtype *rbt_memtype_lookup(u64 addr);
+extern int rbt_memtype_copy_nth_element(struct memtype *out, loff_t pos);
+#else
+static inline int rbt_memtype_check_insert(struct memtype *new,
+					unsigned long *new_type)
+{ return 0; }
+static inline int rbt_memtype_erase(u64 start, u64 end)
+{ return 0; }
+static inline struct memtype *rbt_memtype_lookup(u64 addr)
+{ return NULL; }
+static inline int rbt_memtype_copy_nth_element(struct memtype *out, loff_t pos)
+{ return 0; }
+#endif
+
 #endif /* __PAT_INTERNAL_H_ */
diff --git a/arch/x86/mm/pat_rbtree.c b/arch/x86/mm/pat_rbtree.c
new file mode 100644
index 0000000..55ca66b
--- /dev/null
+++ b/arch/x86/mm/pat_rbtree.c
@@ -0,0 +1,269 @@
+/*
+ * Handle caching attributes in page tables (PAT)
+ *
+ * Authors: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
+ *          Suresh B Siddha <suresh.b.siddha@intel.com>
+ *
+ * Interval tree (augmented rbtree) used to store the PAT memory type
+ * reservations.
+ */
+
+#include <linux/seq_file.h>
+#include <linux/debugfs.h>
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/rbtree.h>
+#include <linux/sched.h>
+#include <linux/gfp.h>
+
+#include <asm/pgtable.h>
+#include <asm/pat.h>
+
+#include "pat_internal.h"
+
+/*
+ * The memtype tree keeps track of memory type for specific
+ * physical memory areas. Without proper tracking, conflicting memory
+ * types in different mappings can cause CPU cache corruption.
+ *
+ * The tree is an interval tree (augmented rbtree) with tree ordered
+ * on starting address. Tree can contain multiple entries for
+ * different regions which overlap. All the aliases have the same
+ * cache attributes of course.
+ *
+ * memtype_lock protects the rbtree.
+ */
+
+static void memtype_rb_augment_cb(struct rb_node *node);
+static struct rb_root memtype_rbroot = RB_AUGMENT_ROOT(&memtype_rb_augment_cb);
+
+static int is_node_overlap(struct memtype *node, u64 start, u64 end)
+{
+	if (node->start >= end || node->end <= start)
+		return 0;
+
+	return 1;
+}
+
+static u64 get_subtree_max_end(struct rb_node *node)
+{
+	u64 ret = 0;
+	if (node) {
+		struct memtype *data = container_of(node, struct memtype, rb);
+		ret = data->subtree_max_end;
+	}
+	return ret;
+}
+
+/* Update 'subtree_max_end' for a node, based on node and its children */
+static void update_node_max_end(struct rb_node *node)
+{
+	struct memtype *data;
+	u64 max_end, child_max_end;
+
+	if (!node)
+		return;
+
+	data = container_of(node, struct memtype, rb);
+	max_end = data->end;
+
+	child_max_end = get_subtree_max_end(node->rb_right);
+	if (child_max_end > max_end)
+		max_end = child_max_end;
+
+	child_max_end = get_subtree_max_end(node->rb_left);
+	if (child_max_end > max_end)
+		max_end = child_max_end;
+
+	data->subtree_max_end = max_end;
+}
+
+/* Update 'subtree_max_end' for a node and all its ancestors */
+static void update_path_max_end(struct rb_node *node)
+{
+	u64 old_max_end, new_max_end;
+
+	while (node) {
+		struct memtype *data = container_of(node, struct memtype, rb);
+
+		old_max_end = data->subtree_max_end;
+		update_node_max_end(node);
+		new_max_end = data->subtree_max_end;
+
+		if (new_max_end == old_max_end)
+			break;
+
+		node = rb_parent(node);
+	}
+}
+
+/* Find the first (lowest start addr) overlapping range from rb tree */
+static struct memtype *memtype_rb_lowest_match(struct rb_root *root,
+				u64 start, u64 end)
+{
+	struct rb_node *node = root->rb_node;
+	struct memtype *last_lower = NULL;
+
+	while (node) {
+		struct memtype *data = container_of(node, struct memtype, rb);
+
+		if (is_node_overlap(data, start, end))
+			last_lower = data;
+
+		if (get_subtree_max_end(node->rb_left) > start)
+			node = node->rb_left;
+		else if (last_lower == NULL && start >= data->start)
+			node = node->rb_right;
+		else
+			break;
+	}
+	return last_lower; /* Returns NULL if there is no overlap */
+}
+
+static struct memtype *memtype_rb_exact_match(struct rb_root *root,
+				u64 start, u64 end)
+{
+	struct memtype *match;
+
+	match = memtype_rb_lowest_match(root, start, end);
+	while (match != NULL && match->start < end) {
+		struct rb_node *node;
+
+		if (match->start == start && match->end == end)
+			return match;
+
+		node = rb_next(&match->rb);
+		if (node)
+			match = container_of(node, struct memtype, rb);
+		else
+			match = NULL;
+	}
+
+	return NULL; /* Returns NULL if there is no exact match */
+}
+
+static int memtype_rb_check_conflict(struct rb_root *root,
+				u64 start, u64 end,
+				unsigned long reqtype, unsigned long *newtype)
+{
+	struct rb_node *node;
+	struct memtype *match;
+	int found_type = reqtype;
+
+	match = memtype_rb_lowest_match(&memtype_rbroot, start, end);
+	if (match == NULL)
+		goto success;
+
+	if (match->type != found_type && newtype == NULL)
+		goto failure;
+
+	dprintk("Overlap at 0x%Lx-0x%Lx\n", match->start, match->end);
+	found_type = match->type;
+
+	node = rb_next(&match->rb);
+	while (node) {
+		match = container_of(node, struct memtype, rb);
+
+		if (match->start >= end) /* Checked all possible matches */
+			goto success;
+
+		if (is_node_overlap(match, start, end) &&
+		    match->type != found_type) {
+			goto failure;
+		}
+
+		node = rb_next(&match->rb);
+	}
+success:
+	if (newtype)
+		*newtype = found_type;
+
+	return 0;
+
+failure:
+	printk(KERN_INFO "%s:%d conflicting memory types "
+		"%Lx-%Lx %s<->%s\n", current->comm, current->pid, start,
+		end, cattr_name(found_type), cattr_name(match->type));
+	return -EBUSY;
+}
+
+static void memtype_rb_augment_cb(struct rb_node *node)
+{
+	if (node)
+		update_path_max_end(node);
+}
+
+static void memtype_rb_insert(struct rb_root *root, struct memtype *newdata)
+{
+	struct rb_node **node = &(root->rb_node);
+	struct rb_node *parent = NULL;
+
+	while (*node) {
+		struct memtype *data = container_of(*node, struct memtype, rb);
+
+		parent = *node;
+		if (newdata->start <= data->start)
+			node = &((*node)->rb_left);
+		else if (newdata->start > data->start)
+			node = &((*node)->rb_right);
+	}
+
+	rb_link_node(&newdata->rb, parent, node);
+	rb_insert_color(&newdata->rb, root);
+}
+
+int rbt_memtype_check_insert(struct memtype *new, unsigned long *ret_type)
+{
+	int err = 0;
+
+	err = memtype_rb_check_conflict(&memtype_rbroot, new->start, new->end,
+						new->type, ret_type);
+
+	if (!err) {
+		new->type = *ret_type;
+		memtype_rb_insert(&memtype_rbroot, new);
+	}
+	return err;
+}
+
+int rbt_memtype_erase(u64 start, u64 end)
+{
+	struct memtype *data;
+
+	data = memtype_rb_exact_match(&memtype_rbroot, start, end);
+	if (!data)
+		return -EINVAL;
+
+	rb_erase(&data->rb, &memtype_rbroot);
+	return 0;
+}
+
+struct memtype *rbt_memtype_lookup(u64 addr)
+{
+	struct memtype *data;
+	data = memtype_rb_lowest_match(&memtype_rbroot, addr, addr + PAGE_SIZE);
+	return data;
+}
+
+#if defined(CONFIG_DEBUG_FS)
+int rbt_memtype_copy_nth_element(struct memtype *out, loff_t pos)
+{
+	struct rb_node *node;
+	int i = 1;
+
+	node = rb_first(&memtype_rbroot);
+	while (node && pos != i) {
+		node = rb_next(node);
+		i++;
+	}
+
+	if (node) { /* pos == i */
+		struct memtype *this = container_of(node, struct memtype, rb);
+		*out = *this;
+		return 0;
+	} else {
+		return 1;
+	}
+}
+#endif
+
-- 
1.6.0.6

-- 



* Re: [patch 1/3] rbtree: Add support for augmented rbtrees (ver 2)
  2010-02-10 19:57 ` [patch 1/3] rbtree: Add support for augmented rbtrees venkatesh.pallipadi
@ 2010-02-10 23:23   ` Pallipadi, Venkatesh
  2010-02-19  0:21     ` [tip:x86/pat] rbtree: Add support for augmented rbtrees tip-bot for Pallipadi, Venkatesh
  0 siblings, 1 reply; 27+ messages in thread
From: Pallipadi, Venkatesh @ 2010-02-10 23:23 UTC (permalink / raw)
  To: Pallipadi, Venkatesh
  Cc: Ingo Molnar, H Peter Anvin, Thomas Gleixner, Wolfram Strepp,
	Siddha, Suresh B, linux-kernel


Add support for augmented rbtrees in core rbtree code.

This will be used in subsequent patches, in x86 PAT code, which needs
interval trees to efficiently keep track of PAT ranges.

Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
---

Updated with more info on interval trees.

 Documentation/rbtree.txt |   58 ++++++++++++++++++++++++++++++++++++++++++++++
 include/linux/rbtree.h   |    5 +++-
 lib/rbtree.c             |   48 ++++++++++++++++++++++++++++++++++---
 3 files changed, 106 insertions(+), 5 deletions(-)

diff --git a/Documentation/rbtree.txt b/Documentation/rbtree.txt
index aae8355..4e19db8 100644
--- a/Documentation/rbtree.txt
+++ b/Documentation/rbtree.txt
@@ -190,3 +190,61 @@ Example:
   for (node = rb_first(&mytree); node; node = rb_next(node))
 	printk("key=%s\n", rb_entry(node, struct mytype, node)->keystring);
 
+Support for Augmented rbtrees
+-----------------------------
+
+Augmented rbtree is an rbtree with "some" additional data stored in each node.
+This data can be used to augment some new functionality to rbtree.
+Augmented rbtree is an optional feature built on top of basic rbtree
+infrastructure. rbtree user who wants this feature will have an augment
+callback function in rb_root initialized.
+
+This callback function will be called from rbtree core routines whenever
+a node has a change in one or both of its children. It is the responsibility
+of the callback function to recalculate the additional data that is in the
+rb node using new children information. Note that if this new additional
+data affects the parent node's additional data, then callback function has
+to handle it and do the recursive updates.
+
+
+Interval tree is an example of augmented rb tree. Reference -
+"Introduction to Algorithms" by Cormen, Leiserson, Rivest and Stein.
+More details about interval trees:
+
+Classical rbtree has a single key and it cannot be directly used to store
+interval ranges like [lo:hi] and do a quick lookup for any overlap with a new
+lo:hi or to find whether there is an exact match for a new lo:hi.
+
+However, rbtree can be augmented to store such interval ranges in a structured
+way making it possible to do efficient lookup and exact match.
+
+This "extra information" stored in each node is the maximum hi (max_hi) value
+among all the nodes that are its descendants. This information can be
+maintained at each node just by looking at the node and its immediate
+children. And this will be used in O(log n) lookup for lowest match (lowest start
+address among all possible matches) with something like:
+
+find_lowest_match(lo, hi, node)
+{
+	lowest_match = NULL;
+	while (node) {
+		if (max_hi(node->left) > lo) {
+			// Lowest overlap if any must be on left side
+			node = node->left;
+		} else if (overlap(lo, hi, node)) {
+			lowest_match = node;
+			break;
+		} else if (lo > node->lo) {
+			// Lowest overlap if any must be on right side
+			node = node->right;
+		} else {
+			break;
+		}
+	}
+	return lowest_match;
+}
+
+Finding exact match will be to first find lowest match and then to follow
+successor nodes looking for exact match, until the start of a node is beyond
+the hi value we are looking for.
+
diff --git a/include/linux/rbtree.h b/include/linux/rbtree.h
index 9c29541..8e33a25 100644
--- a/include/linux/rbtree.h
+++ b/include/linux/rbtree.h
@@ -110,6 +110,7 @@ struct rb_node
 struct rb_root
 {
 	struct rb_node *rb_node;
+	void (*augment_cb)(struct rb_node *node);
 };
 
 
@@ -129,7 +130,9 @@ static inline void rb_set_color(struct rb_node *rb, int color)
 	rb->rb_parent_color = (rb->rb_parent_color & ~1) | color;
 }
 
-#define RB_ROOT	(struct rb_root) { NULL, }
+#define RB_ROOT	(struct rb_root) { NULL, NULL, }
+#define RB_AUGMENT_ROOT(x)	(struct rb_root) { NULL, x}
+
 #define	rb_entry(ptr, type, member) container_of(ptr, type, member)
 
 #define RB_EMPTY_ROOT(root)	((root)->rb_node == NULL)
diff --git a/lib/rbtree.c b/lib/rbtree.c
index e2aa3be..15e10b1 100644
--- a/lib/rbtree.c
+++ b/lib/rbtree.c
@@ -44,6 +44,11 @@ static void __rb_rotate_left(struct rb_node *node, struct rb_root *root)
 	else
 		root->rb_node = right;
 	rb_set_parent(node, right);
+
+	if (root->augment_cb) {
+		root->augment_cb(node);
+		root->augment_cb(right);
+	}
 }
 
 static void __rb_rotate_right(struct rb_node *node, struct rb_root *root)
@@ -67,12 +72,20 @@ static void __rb_rotate_right(struct rb_node *node, struct rb_root *root)
 	else
 		root->rb_node = left;
 	rb_set_parent(node, left);
+
+	if (root->augment_cb) {
+		root->augment_cb(node);
+		root->augment_cb(left);
+	}
 }
 
 void rb_insert_color(struct rb_node *node, struct rb_root *root)
 {
 	struct rb_node *parent, *gparent;
 
+	if (root->augment_cb)
+		root->augment_cb(node);
+
 	while ((parent = rb_parent(node)) && rb_is_red(parent))
 	{
 		gparent = rb_parent(parent);
@@ -227,12 +240,15 @@ void rb_erase(struct rb_node *node, struct rb_root *root)
 	else
 	{
 		struct rb_node *old = node, *left;
+		int old_parent_cb = 0;
+		int successor_parent_cb = 0;
 
 		node = node->rb_right;
 		while ((left = node->rb_left) != NULL)
 			node = left;
 
 		if (rb_parent(old)) {
+			old_parent_cb = 1;
 			if (rb_parent(old)->rb_left == old)
 				rb_parent(old)->rb_left = node;
 			else
@@ -247,8 +263,10 @@ void rb_erase(struct rb_node *node, struct rb_root *root)
 		if (parent == old) {
 			parent = node;
 		} else {
+			successor_parent_cb = 1;
 			if (child)
 				rb_set_parent(child, parent);
+
 			parent->rb_left = child;
 
 			node->rb_right = old->rb_right;
@@ -259,6 +277,24 @@ void rb_erase(struct rb_node *node, struct rb_root *root)
 		node->rb_left = old->rb_left;
 		rb_set_parent(old->rb_left, node);
 
+		if (root->augment_cb) {
+			/*
+			 * Here, three different nodes can have new children.
+			 * The parent of the successor node that was selected
+			 * to replace the node to be erased.
+			 * The node that is getting erased and is now replaced
+			 * by its successor.
+			 * The parent of the node getting erased-replaced.
+			 */
+			if (successor_parent_cb)
+				root->augment_cb(parent);
+
+			root->augment_cb(node);
+
+			if (old_parent_cb)
+				root->augment_cb(rb_parent(old));
+		}
+
 		goto color;
 	}
 
@@ -267,15 +303,19 @@ void rb_erase(struct rb_node *node, struct rb_root *root)
 
 	if (child)
 		rb_set_parent(child, parent);
-	if (parent)
-	{
+
+	if (parent) {
 		if (parent->rb_left == node)
 			parent->rb_left = child;
 		else
 			parent->rb_right = child;
-	}
-	else
+
+		if (root->augment_cb)
+			root->augment_cb(parent);
+
+	} else {
 		root->rb_node = child;
+	}
 
  color:
 	if (color == RB_BLACK)
-- 
1.6.0.6



* Re: [patch 3/3] x86: Migrate to rbtree only backend for pat memtype management (ver 2)
  2010-02-10 19:57 ` [patch 3/3] x86: Migrate to rbtree only backend for pat memtype management venkatesh.pallipadi
@ 2010-02-10 23:26   ` Pallipadi, Venkatesh
  2010-02-19  0:22     ` [tip:x86/pat] x86, pat: Migrate to rbtree only backend for pat memtype management tip-bot for Pallipadi, Venkatesh
  0 siblings, 1 reply; 27+ messages in thread
From: Pallipadi, Venkatesh @ 2010-02-10 23:26 UTC (permalink / raw)
  To: Pallipadi, Venkatesh
  Cc: Ingo Molnar, H Peter Anvin, Thomas Gleixner, Siddha, Suresh B,
	linux-kernel


Move the PAT backend from the existing rbtree and linked list hybrid to a
fully rbtree based implementation.

The new rbtree based solution uses interval trees (augmented rbtrees) to
store the PAT ranges. The new code separates out the PAT backend into
pat_rbtree.c, making it cleaner. The change also makes the PAT lookup,
reserve and free operations more efficient, as we no longer have to
traverse a linear linked list of a few tens of entries in the common case.

Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
---
Updated with a bug fix in memtype_rb_lowest_match
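
For reference, the fixed lowest-match walk, translated into plain child
pointers (only a sketch of what memtype_rb_lowest_match() below does, using
the illustrative range_node layout from the cover letter rather than the
real struct memtype / struct rb_node):

	/* Lowest-starting node overlapping [start, end), or NULL if none */
	static struct range_node *lowest_overlap(struct range_node *node,
						 u64 start, u64 end)
	{
		while (node) {
			u64 left_max = node->left ?
					node->left->subtree_max_end : 0;

			if (left_max > start) {
				/* lowest overlap, if any, is on the left */
				node = node->left;
			} else if (node->start < end && node->end > start) {
				return node;	/* this node overlaps */
			} else if (start >= node->start) {
				/* overlap, if any, is on the right */
				node = node->right;
			} else {
				break;
			}
		}
		return NULL;
	}

Because subtree_max_end lets whole subtrees be skipped, the walk is bounded
by the height of the tree rather than by the number of reserved ranges.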

 arch/x86/mm/Makefile       |    1 +
 arch/x86/mm/pat.c          |  209 +---------------------------------
 arch/x86/mm/pat_internal.h |   20 +++-
 arch/x86/mm/pat_rbtree.c   |  272 ++++++++++++++++++++++++++++++++++++++++++++
 4 files changed, 297 insertions(+), 205 deletions(-)
 create mode 100644 arch/x86/mm/pat_rbtree.c

diff --git a/arch/x86/mm/Makefile b/arch/x86/mm/Makefile
index 06630d2..a4c7683 100644
--- a/arch/x86/mm/Makefile
+++ b/arch/x86/mm/Makefile
@@ -6,6 +6,7 @@ nostackp := $(call cc-option, -fno-stack-protector)
 CFLAGS_physaddr.o		:= $(nostackp)
 CFLAGS_setup_nx.o		:= $(nostackp)
 
+obj-$(CONFIG_X86_PAT)		+= pat_rbtree.o
 obj-$(CONFIG_SMP)		+= tlb.o
 
 obj-$(CONFIG_X86_32)		+= pgtable_32.o iomap_32.o
diff --git a/arch/x86/mm/pat.c b/arch/x86/mm/pat.c
index 628e507..9510111 100644
--- a/arch/x86/mm/pat.c
+++ b/arch/x86/mm/pat.c
@@ -130,65 +130,7 @@ void pat_init(void)
 
 #undef PAT
 
-/*
- * The global memtype list keeps track of memory type for specific
- * physical memory areas. Conflicting memory types in different
- * mappings can cause CPU cache corruption. To avoid this we keep track.
- *
- * The list is sorted based on starting address and can contain multiple
- * entries for each address (this allows reference counting for overlapping
- * areas). All the aliases have the same cache attributes of course.
- * Zero attributes are represented as holes.
- *
- * The data structure is a list that is also organized as an rbtree
- * sorted on the start address of memtype range.
- *
- * memtype_lock protects both the linear list and rbtree.
- */
-
-static struct rb_root memtype_rbroot = RB_ROOT;
-static LIST_HEAD(memtype_list);
-static DEFINE_SPINLOCK(memtype_lock);	/* protects memtype list */
-
-static struct memtype *memtype_rb_search(struct rb_root *root, u64 start)
-{
-	struct rb_node *node = root->rb_node;
-	struct memtype *last_lower = NULL;
-
-	while (node) {
-		struct memtype *data = container_of(node, struct memtype, rb);
-
-		if (data->start < start) {
-			last_lower = data;
-			node = node->rb_right;
-		} else if (data->start > start) {
-			node = node->rb_left;
-		} else
-			return data;
-	}
-
-	/* Will return NULL if there is no entry with its start <= start */
-	return last_lower;
-}
-
-static void memtype_rb_insert(struct rb_root *root, struct memtype *data)
-{
-	struct rb_node **new = &(root->rb_node);
-	struct rb_node *parent = NULL;
-
-	while (*new) {
-		struct memtype *this = container_of(*new, struct memtype, rb);
-
-		parent = *new;
-		if (data->start <= this->start)
-			new = &((*new)->rb_left);
-		else if (data->start > this->start)
-			new = &((*new)->rb_right);
-	}
-
-	rb_link_node(&data->rb, parent, new);
-	rb_insert_color(&data->rb, root);
-}
+static DEFINE_SPINLOCK(memtype_lock);	/* protects memtype accesses */
 
 /*
  * Does intersection of PAT memory type and MTRR memory type and returns
@@ -216,33 +158,6 @@ static unsigned long pat_x_mtrr_type(u64 start, u64 end, unsigned long req_type)
 	return req_type;
 }
 
-static int
-chk_conflict(struct memtype *new, struct memtype *entry, unsigned long *type)
-{
-	if (new->type != entry->type) {
-		if (type) {
-			new->type = entry->type;
-			*type = entry->type;
-		} else
-			goto conflict;
-	}
-
-	 /* check overlaps with more than one entry in the list */
-	list_for_each_entry_continue(entry, &memtype_list, nd) {
-		if (new->end <= entry->start)
-			break;
-		else if (new->type != entry->type)
-			goto conflict;
-	}
-	return 0;
-
- conflict:
-	printk(KERN_INFO "%s:%d conflicting memory types "
-	       "%Lx-%Lx %s<->%s\n", current->comm, current->pid, new->start,
-	       new->end, cattr_name(new->type), cattr_name(entry->type));
-	return -EBUSY;
-}
-
 static int pat_pagerange_is_ram(unsigned long start, unsigned long end)
 {
 	int ram_page = 0, not_rampage = 0;
@@ -328,64 +243,6 @@ static int free_ram_pages_type(u64 start, u64 end)
 	return 0;
 }
 
-static int memtype_check_insert(struct memtype *new, unsigned long *new_type)
-{
-	struct memtype *entry;
-	u64 start, end;
-	unsigned long actual_type;
-	struct list_head *where;
-	int err = 0;
-
-	start = new->start;
-	end = new->end;
-	actual_type = new->type;
-
-	/* Search for existing mapping that overlaps the current range */
-	where = NULL;
-	list_for_each_entry(entry, &memtype_list, nd) {
-		if (end <= entry->start) {
-			where = entry->nd.prev;
-			break;
-		} else if (start <= entry->start) { /* end > entry->start */
-			err = chk_conflict(new, entry, new_type);
-			if (!err) {
-				dprintk("Overlap at 0x%Lx-0x%Lx\n",
-					entry->start, entry->end);
-				where = entry->nd.prev;
-			}
-			break;
-		} else if (start < entry->end) { /* start > entry->start */
-			err = chk_conflict(new, entry, new_type);
-			if (!err) {
-				dprintk("Overlap at 0x%Lx-0x%Lx\n",
-					entry->start, entry->end);
-
-				/*
-				 * Move to right position in the linked
-				 * list to add this new entry
-				 */
-				list_for_each_entry_continue(entry,
-							&memtype_list, nd) {
-					if (start <= entry->start) {
-						where = entry->nd.prev;
-						break;
-					}
-				}
-			}
-			break;
-		}
-	}
-	if (!err) {
-		if (where)
-			list_add(&new->nd, where);
-		else
-			list_add_tail(&new->nd, &memtype_list);
-
-		memtype_rb_insert(&memtype_rbroot, new);
-	}
-	return err;
-}
-
 /*
  * req_type typically has one of the:
  * - _PAGE_CACHE_WB
@@ -459,7 +316,7 @@ int reserve_memtype(u64 start, u64 end, unsigned long req_type,
 
 	spin_lock(&memtype_lock);
 
-	err = memtype_check_insert(new, new_type);
+	err = rbt_memtype_check_insert(new, new_type);
 	if (err) {
 		printk(KERN_INFO "reserve_memtype failed 0x%Lx-0x%Lx, "
 		       "track %s, req %s\n",
@@ -481,7 +338,6 @@ int reserve_memtype(u64 start, u64 end, unsigned long req_type,
 
 int free_memtype(u64 start, u64 end)
 {
-	struct memtype *entry, *saved_entry;
 	int err = -EINVAL;
 	int is_range_ram;
 
@@ -505,46 +361,7 @@ int free_memtype(u64 start, u64 end)
 	}
 
 	spin_lock(&memtype_lock);
-
-	entry = memtype_rb_search(&memtype_rbroot, start);
-	if (unlikely(entry == NULL))
-		goto unlock_ret;
-
-	/*
-	 * Saved entry points to an entry with start same or less than what
-	 * we searched for. Now go through the list in both directions to look
-	 * for the entry that matches with both start and end, with list stored
-	 * in sorted start address
-	 */
-	saved_entry = entry;
-	list_for_each_entry_from(entry, &memtype_list, nd) {
-		if (entry->start == start && entry->end == end) {
-			rb_erase(&entry->rb, &memtype_rbroot);
-			list_del(&entry->nd);
-			kfree(entry);
-			err = 0;
-			break;
-		} else if (entry->start > start) {
-			break;
-		}
-	}
-
-	if (!err)
-		goto unlock_ret;
-
-	entry = saved_entry;
-	list_for_each_entry_reverse(entry, &memtype_list, nd) {
-		if (entry->start == start && entry->end == end) {
-			rb_erase(&entry->rb, &memtype_rbroot);
-			list_del(&entry->nd);
-			kfree(entry);
-			err = 0;
-			break;
-		} else if (entry->start < start) {
-			break;
-		}
-	}
-unlock_ret:
+	err = rbt_memtype_erase(start, end);
 	spin_unlock(&memtype_lock);
 
 	if (err) {
@@ -593,7 +410,7 @@ static unsigned long lookup_memtype(u64 paddr)
 
 	spin_lock(&memtype_lock);
 
-	entry = memtype_rb_search(&memtype_rbroot, paddr);
+	entry = rbt_memtype_lookup(paddr);
 	if (entry != NULL)
 		rettype = entry->type;
 	else
@@ -930,22 +747,6 @@ EXPORT_SYMBOL_GPL(pgprot_writecombine);
 
 #if defined(CONFIG_DEBUG_FS) && defined(CONFIG_X86_PAT)
 
-/* get Nth element of the linked list */
-static int copy_memtype_nth_element(struct memtype *out, loff_t pos)
-{
-	struct memtype *list_node;
-	int i = 1;
-
-	list_for_each_entry(list_node, &memtype_list, nd) {
-		if (pos == i) {
-			*out = *list_node;
-			return 0;
-		}
-		++i;
-	}
-	return 1;
-}
-
 static struct memtype *memtype_get_idx(loff_t pos)
 {
 	struct memtype *print_entry;
@@ -956,7 +757,7 @@ static struct memtype *memtype_get_idx(loff_t pos)
 		return NULL;
 
 	spin_lock(&memtype_lock);
-	ret = copy_memtype_nth_element(print_entry, pos);
+	ret = rbt_memtype_copy_nth_element(print_entry, pos);
 	spin_unlock(&memtype_lock);
 
 	if (!ret) {
diff --git a/arch/x86/mm/pat_internal.h b/arch/x86/mm/pat_internal.h
index 6c98780..4f39eef 100644
--- a/arch/x86/mm/pat_internal.h
+++ b/arch/x86/mm/pat_internal.h
@@ -9,8 +9,8 @@ extern int pat_debug_enable;
 struct memtype {
 	u64			start;
 	u64			end;
+	u64			subtree_max_end;
 	unsigned long		type;
-	struct list_head	nd;
 	struct rb_node		rb;
 };
 
@@ -25,4 +25,22 @@ static inline char *cattr_name(unsigned long flags)
 	}
 }
 
+#ifdef CONFIG_X86_PAT
+extern int rbt_memtype_check_insert(struct memtype *new,
+					unsigned long *new_type);
+extern int rbt_memtype_erase(u64 start, u64 end);
+extern struct memtype *rbt_memtype_lookup(u64 addr);
+extern int rbt_memtype_copy_nth_element(struct memtype *out, loff_t pos);
+#else
+static inline int rbt_memtype_check_insert(struct memtype *new,
+					unsigned long *new_type)
+{ return 0; }
+static inline int rbt_memtype_erase(u64 start, u64 end)
+{ return 0; }
+static inline struct memtype *rbt_memtype_lookup(u64 addr)
+{ return NULL; }
+static inline int rbt_memtype_copy_nth_element(struct memtype *out, loff_t pos)
+{ return 0; }
+#endif
+
 #endif /* __PAT_INTERNAL_H_ */
diff --git a/arch/x86/mm/pat_rbtree.c b/arch/x86/mm/pat_rbtree.c
new file mode 100644
index 0000000..e4cd229
--- /dev/null
+++ b/arch/x86/mm/pat_rbtree.c
@@ -0,0 +1,272 @@
+/*
+ * Handle caching attributes in page tables (PAT)
+ *
+ * Authors: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
+ *          Suresh B Siddha <suresh.b.siddha@intel.com>
+ *
+ * Interval tree (augmented rbtree) used to store the PAT memory type
+ * reservations.
+ */
+
+#include <linux/seq_file.h>
+#include <linux/debugfs.h>
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/rbtree.h>
+#include <linux/sched.h>
+#include <linux/gfp.h>
+
+#include <asm/pgtable.h>
+#include <asm/pat.h>
+
+#include "pat_internal.h"
+
+/*
+ * The memtype tree keeps track of memory type for specific
+ * physical memory areas. Without proper tracking, conflicting memory
+ * types in different mappings can cause CPU cache corruption.
+ *
+ * The tree is an interval tree (augmented rbtree) with tree ordered
+ * on starting address. Tree can contain multiple entries for
+ * different regions which overlap. All the aliases have the same
+ * cache attributes of course.
+ *
+ * memtype_lock protects the rbtree.
+ */
+
+static void memtype_rb_augment_cb(struct rb_node *node);
+static struct rb_root memtype_rbroot = RB_AUGMENT_ROOT(&memtype_rb_augment_cb);
+
+static int is_node_overlap(struct memtype *node, u64 start, u64 end)
+{
+	if (node->start >= end || node->end <= start)
+		return 0;
+
+	return 1;
+}
+
+static u64 get_subtree_max_end(struct rb_node *node)
+{
+	u64 ret = 0;
+	if (node) {
+		struct memtype *data = container_of(node, struct memtype, rb);
+		ret = data->subtree_max_end;
+	}
+	return ret;
+}
+
+/* Update 'subtree_max_end' for a node, based on node and its children */
+static void update_node_max_end(struct rb_node *node)
+{
+	struct memtype *data;
+	u64 max_end, child_max_end;
+
+	if (!node)
+		return;
+
+	data = container_of(node, struct memtype, rb);
+	max_end = data->end;
+
+	child_max_end = get_subtree_max_end(node->rb_right);
+	if (child_max_end > max_end)
+		max_end = child_max_end;
+
+	child_max_end = get_subtree_max_end(node->rb_left);
+	if (child_max_end > max_end)
+		max_end = child_max_end;
+
+	data->subtree_max_end = max_end;
+}
+
+/* Update 'subtree_max_end' for a node and all its ancestors */
+static void update_path_max_end(struct rb_node *node)
+{
+	u64 old_max_end, new_max_end;
+
+	while (node) {
+		struct memtype *data = container_of(node, struct memtype, rb);
+
+		old_max_end = data->subtree_max_end;
+		update_node_max_end(node);
+		new_max_end = data->subtree_max_end;
+
+		if (new_max_end == old_max_end)
+			break;
+
+		node = rb_parent(node);
+	}
+}
+
+/* Find the first (lowest start addr) overlapping range from rb tree */
+static struct memtype *memtype_rb_lowest_match(struct rb_root *root,
+				u64 start, u64 end)
+{
+	struct rb_node *node = root->rb_node;
+	struct memtype *last_lower = NULL;
+
+	while (node) {
+		struct memtype *data = container_of(node, struct memtype, rb);
+
+		if (get_subtree_max_end(node->rb_left) > start) {
+			/* Lowest overlap if any must be on left side */
+			node = node->rb_left;
+		} else if (is_node_overlap(data, start, end)) {
+			last_lower = data;
+			break;
+		} else if (start >= data->start) {
+			/* Lowest overlap if any must be on right side */
+			node = node->rb_right;
+		} else {
+			break;
+		}
+	}
+	return last_lower; /* Returns NULL if there is no overlap */
+}
+
+static struct memtype *memtype_rb_exact_match(struct rb_root *root,
+				u64 start, u64 end)
+{
+	struct memtype *match;
+
+	match = memtype_rb_lowest_match(root, start, end);
+	while (match != NULL && match->start < end) {
+		struct rb_node *node;
+
+		if (match->start == start && match->end == end)
+			return match;
+
+		node = rb_next(&match->rb);
+		if (node)
+			match = container_of(node, struct memtype, rb);
+		else
+			match = NULL;
+	}
+
+	return NULL; /* Returns NULL if there is no exact match */
+}
+
+static int memtype_rb_check_conflict(struct rb_root *root,
+				u64 start, u64 end,
+				unsigned long reqtype, unsigned long *newtype)
+{
+	struct rb_node *node;
+	struct memtype *match;
+	int found_type = reqtype;
+
+	match = memtype_rb_lowest_match(&memtype_rbroot, start, end);
+	if (match == NULL)
+		goto success;
+
+	if (match->type != found_type && newtype == NULL)
+		goto failure;
+
+	dprintk("Overlap at 0x%Lx-0x%Lx\n", match->start, match->end);
+	found_type = match->type;
+
+	node = rb_next(&match->rb);
+	while (node) {
+		match = container_of(node, struct memtype, rb);
+
+		if (match->start >= end) /* Checked all possible matches */
+			goto success;
+
+		if (is_node_overlap(match, start, end) &&
+		    match->type != found_type) {
+			goto failure;
+		}
+
+		node = rb_next(&match->rb);
+	}
+success:
+	if (newtype)
+		*newtype = found_type;
+
+	return 0;
+
+failure:
+	printk(KERN_INFO "%s:%d conflicting memory types "
+		"%Lx-%Lx %s<->%s\n", current->comm, current->pid, start,
+		end, cattr_name(found_type), cattr_name(match->type));
+	return -EBUSY;
+}
+
+static void memtype_rb_augment_cb(struct rb_node *node)
+{
+	if (node)
+		update_path_max_end(node);
+}
+
+static void memtype_rb_insert(struct rb_root *root, struct memtype *newdata)
+{
+	struct rb_node **node = &(root->rb_node);
+	struct rb_node *parent = NULL;
+
+	while (*node) {
+		struct memtype *data = container_of(*node, struct memtype, rb);
+
+		parent = *node;
+		if (newdata->start <= data->start)
+			node = &((*node)->rb_left);
+		else if (newdata->start > data->start)
+			node = &((*node)->rb_right);
+	}
+
+	rb_link_node(&newdata->rb, parent, node);
+	rb_insert_color(&newdata->rb, root);
+}
+
+int rbt_memtype_check_insert(struct memtype *new, unsigned long *ret_type)
+{
+	int err = 0;
+
+	err = memtype_rb_check_conflict(&memtype_rbroot, new->start, new->end,
+						new->type, ret_type);
+
+	if (!err) {
+		new->type = *ret_type;
+		memtype_rb_insert(&memtype_rbroot, new);
+	}
+	return err;
+}
+
+int rbt_memtype_erase(u64 start, u64 end)
+{
+	struct memtype *data;
+
+	data = memtype_rb_exact_match(&memtype_rbroot, start, end);
+	if (!data)
+		return -EINVAL;
+
+	rb_erase(&data->rb, &memtype_rbroot);
+	return 0;
+}
+
+struct memtype *rbt_memtype_lookup(u64 addr)
+{
+	struct memtype *data;
+	data = memtype_rb_lowest_match(&memtype_rbroot, addr, addr + PAGE_SIZE);
+	return data;
+}
+
+#if defined(CONFIG_DEBUG_FS)
+int rbt_memtype_copy_nth_element(struct memtype *out, loff_t pos)
+{
+	struct rb_node *node;
+	int i = 1;
+
+	node = rb_first(&memtype_rbroot);
+	while (node && pos != i) {
+		node = rb_next(node);
+		i++;
+	}
+
+	if (node) { /* pos == i */
+		struct memtype *this = container_of(node, struct memtype, rb);
+		*out = *this;
+		return 0;
+	} else {
+		return 1;
+	}
+}
+#endif
+
-- 
1.6.0.6


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [tip:x86/pat] rbtree: Add support for augmented rbtrees
  2010-02-10 23:23   ` [patch 1/3] rbtree: Add support for augmented rbtrees (ver 2) Pallipadi, Venkatesh
@ 2010-02-19  0:21     ` tip-bot for Pallipadi, Venkatesh
  2010-05-27 15:29       ` Peter Zijlstra
  2010-05-29 12:29       ` [RFC][PATCH] rbtree: undo augmented damage Peter Zijlstra
  0 siblings, 2 replies; 27+ messages in thread
From: tip-bot for Pallipadi, Venkatesh @ 2010-02-19  0:21 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: linux-kernel, hpa, mingo, venkatesh.pallipadi, suresh.b.siddha, tglx

Commit-ID:  17d9ddc72fb8bba0d4f67868c9c612e472a594a9
Gitweb:     http://git.kernel.org/tip/17d9ddc72fb8bba0d4f67868c9c612e472a594a9
Author:     Pallipadi, Venkatesh <venkatesh.pallipadi@intel.com>
AuthorDate: Wed, 10 Feb 2010 15:23:44 -0800
Committer:  H. Peter Anvin <hpa@zytor.com>
CommitDate: Thu, 18 Feb 2010 15:40:56 -0800

rbtree: Add support for augmented rbtrees

Add support for augmented rbtrees in core rbtree code.

This will be used in subsequent patches, in x86 PAT code, which needs
interval trees to efficiently keep track of PAT ranges.

Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
LKML-Reference: <20100210232343.GA11465@linux-os.sc.intel.com>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
---
 Documentation/rbtree.txt |   58 ++++++++++++++++++++++++++++++++++++++++++++++
 include/linux/rbtree.h   |    5 +++-
 lib/rbtree.c             |   48 ++++++++++++++++++++++++++++++++++---
 3 files changed, 106 insertions(+), 5 deletions(-)

diff --git a/Documentation/rbtree.txt b/Documentation/rbtree.txt
index aae8355..221f38b 100644
--- a/Documentation/rbtree.txt
+++ b/Documentation/rbtree.txt
@@ -190,3 +190,61 @@ Example:
   for (node = rb_first(&mytree); node; node = rb_next(node))
 	printk("key=%s\n", rb_entry(node, struct mytype, node)->keystring);
 
+Support for Augmented rbtrees
+-----------------------------
+
+Augmented rbtree is an rbtree with "some" additional data stored in each node.
+This data can be used to augment some new functionality to rbtree.
+Augmented rbtree is an optional feature built on top of basic rbtree
+infrastructure. rbtree user who wants this feature will have an augment
+callback function in rb_root initialized.
+
+This callback function will be called from rbtree core routines whenever
+a node has a change in one or both of its children. It is the responsibility
+of the callback function to recalculate the additional data that is in the
+rb node using new children information. Note that if this new additional
+data affects the parent node's additional data, then callback function has
+to handle it and do the recursive updates.
+
+
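+For example, a minimal sketch of such an initialization, using the
+RB_AUGMENT_ROOT() helper added below (my_augment_cb and my_root are
+hypothetical names, not part of the kernel API):
+
+	void my_augment_cb(struct rb_node *node);
+	struct rb_root my_root = RB_AUGMENT_ROOT(&my_augment_cb);
+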
+Interval tree is an example of augmented rb tree. Reference -
+"Introduction to Algorithms" by Cormen, Leiserson, Rivest and Stein.
+More details about interval trees:
+
+Classical rbtree has a single key and it cannot be directly used to store
+interval ranges like [lo:hi] and do a quick lookup for any overlap with a new
+lo:hi or to find whether there is an exact match for a new lo:hi.
+
+However, rbtree can be augmented to store such interval ranges in a structured
+way making it possible to do efficient lookup and exact match.
+
+This "extra information" stored in each node is the maximum hi
+(max_hi) value among all the nodes that are its descendants. This
+information can be maintained at each node just by looking at the node
+and its immediate children. And this will be used in O(log n) lookup
+for lowest match (lowest start address among all possible matches)
+with something like:
+
+find_lowest_match(lo, hi, node)
+{
+	lowest_match = NULL;
+	while (node) {
+		if (max_hi(node->left) > lo) {
+			// Lowest overlap if any must be on left side
+			node = node->left;
+		} else if (overlap(lo, hi, node)) {
+			lowest_match = node;
+			break;
+		} else if (lo > node->lo) {
+			// Lowest overlap if any must be on right side
+			node = node->right;
+		} else {
+			break;
+		}
+	}
+	return lowest_match;
+}
+
+Finding exact match will be to first find lowest match and then to follow
+successor nodes looking for exact match, until the start of a node is beyond
+the hi value we are looking for.
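+
+A minimal sketch of that walk, in the same pseudocode style (successor()
+stands for an in-order successor step such as rb_next()):
+
+find_exact_match(lo, hi, node)
+{
+	match = find_lowest_match(lo, hi, node);
+	while (match && match->lo < hi) {
+		if (match->lo == lo && match->hi == hi)
+			return match;
+		match = successor(match);
+	}
+	return NULL;
+}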
diff --git a/include/linux/rbtree.h b/include/linux/rbtree.h
index 9c29541..8e33a25 100644
--- a/include/linux/rbtree.h
+++ b/include/linux/rbtree.h
@@ -110,6 +110,7 @@ struct rb_node
 struct rb_root
 {
 	struct rb_node *rb_node;
+	void (*augment_cb)(struct rb_node *node);
 };
 
 
@@ -129,7 +130,9 @@ static inline void rb_set_color(struct rb_node *rb, int color)
 	rb->rb_parent_color = (rb->rb_parent_color & ~1) | color;
 }
 
-#define RB_ROOT	(struct rb_root) { NULL, }
+#define RB_ROOT	(struct rb_root) { NULL, NULL, }
+#define RB_AUGMENT_ROOT(x)	(struct rb_root) { NULL, x}
+
 #define	rb_entry(ptr, type, member) container_of(ptr, type, member)
 
 #define RB_EMPTY_ROOT(root)	((root)->rb_node == NULL)
diff --git a/lib/rbtree.c b/lib/rbtree.c
index e2aa3be..15e10b1 100644
--- a/lib/rbtree.c
+++ b/lib/rbtree.c
@@ -44,6 +44,11 @@ static void __rb_rotate_left(struct rb_node *node, struct rb_root *root)
 	else
 		root->rb_node = right;
 	rb_set_parent(node, right);
+
+	if (root->augment_cb) {
+		root->augment_cb(node);
+		root->augment_cb(right);
+	}
 }
 
 static void __rb_rotate_right(struct rb_node *node, struct rb_root *root)
@@ -67,12 +72,20 @@ static void __rb_rotate_right(struct rb_node *node, struct rb_root *root)
 	else
 		root->rb_node = left;
 	rb_set_parent(node, left);
+
+	if (root->augment_cb) {
+		root->augment_cb(node);
+		root->augment_cb(left);
+	}
 }
 
 void rb_insert_color(struct rb_node *node, struct rb_root *root)
 {
 	struct rb_node *parent, *gparent;
 
+	if (root->augment_cb)
+		root->augment_cb(node);
+
 	while ((parent = rb_parent(node)) && rb_is_red(parent))
 	{
 		gparent = rb_parent(parent);
@@ -227,12 +240,15 @@ void rb_erase(struct rb_node *node, struct rb_root *root)
 	else
 	{
 		struct rb_node *old = node, *left;
+		int old_parent_cb = 0;
+		int successor_parent_cb = 0;
 
 		node = node->rb_right;
 		while ((left = node->rb_left) != NULL)
 			node = left;
 
 		if (rb_parent(old)) {
+			old_parent_cb = 1;
 			if (rb_parent(old)->rb_left == old)
 				rb_parent(old)->rb_left = node;
 			else
@@ -247,8 +263,10 @@ void rb_erase(struct rb_node *node, struct rb_root *root)
 		if (parent == old) {
 			parent = node;
 		} else {
+			successor_parent_cb = 1;
 			if (child)
 				rb_set_parent(child, parent);
+
 			parent->rb_left = child;
 
 			node->rb_right = old->rb_right;
@@ -259,6 +277,24 @@ void rb_erase(struct rb_node *node, struct rb_root *root)
 		node->rb_left = old->rb_left;
 		rb_set_parent(old->rb_left, node);
 
+		if (root->augment_cb) {
+			/*
+			 * Here, three different nodes can have new children.
+			 * The parent of the successor node that was selected
+			 * to replace the node to be erased.
+			 * The node that is getting erased and is now replaced
+			 * by its successor.
+			 * The parent of the node getting erased-replaced.
+			 */
+			if (successor_parent_cb)
+				root->augment_cb(parent);
+
+			root->augment_cb(node);
+
+			if (old_parent_cb)
+				root->augment_cb(rb_parent(old));
+		}
+
 		goto color;
 	}
 
@@ -267,15 +303,19 @@ void rb_erase(struct rb_node *node, struct rb_root *root)
 
 	if (child)
 		rb_set_parent(child, parent);
-	if (parent)
-	{
+
+	if (parent) {
 		if (parent->rb_left == node)
 			parent->rb_left = child;
 		else
 			parent->rb_right = child;
-	}
-	else
+
+		if (root->augment_cb)
+			root->augment_cb(parent);
+
+	} else {
 		root->rb_node = child;
+	}
 
  color:
 	if (color == RB_BLACK)

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [tip:x86/pat] x86, pat: Preparatory changes in pat.c for bigger rbtree change
  2010-02-10 19:57 ` [patch 2/3] x86: Preparatory changes in pat.c for bigger rbtree change venkatesh.pallipadi
@ 2010-02-19  0:22   ` tip-bot for venkatesh.pallipadi@intel.com
  0 siblings, 0 replies; 27+ messages in thread
From: tip-bot for venkatesh.pallipadi@intel.com @ 2010-02-19  0:22 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: linux-kernel, hpa, mingo, venkatesh.pallipadi, suresh.b.siddha, tglx

Commit-ID:  be5a0c126ad1dea2128dc5aef12c87083518d1ab
Gitweb:     http://git.kernel.org/tip/be5a0c126ad1dea2128dc5aef12c87083518d1ab
Author:     venkatesh.pallipadi@intel.com <venkatesh.pallipadi@intel.com>
AuthorDate: Wed, 10 Feb 2010 11:57:06 -0800
Committer:  H. Peter Anvin <hpa@zytor.com>
CommitDate: Thu, 18 Feb 2010 15:41:21 -0800

x86, pat: Preparatory changes in pat.c for bigger rbtree change

Minor changes in pat.c to clean up code and make it smoother to introduce
the bigger rbtree-only change in the following patch. The changes are cleanup
only and should not have any functional impact.

Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
LKML-Reference: <20100210195909.792781000@intel.com>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
---
 arch/x86/mm/pat.c          |  170 +++++++++++++++++++++++---------------------
 arch/x86/mm/pat_internal.h |   28 +++++++
 2 files changed, 116 insertions(+), 82 deletions(-)

diff --git a/arch/x86/mm/pat.c b/arch/x86/mm/pat.c
index ae9648e..628e507 100644
--- a/arch/x86/mm/pat.c
+++ b/arch/x86/mm/pat.c
@@ -30,6 +30,8 @@
 #include <asm/pat.h>
 #include <asm/io.h>
 
+#include "pat_internal.h"
+
 #ifdef CONFIG_X86_PAT
 int __read_mostly pat_enabled = 1;
 
@@ -53,19 +55,15 @@ static inline void pat_disable(const char *reason)
 #endif
 
 
-static int debug_enable;
+int pat_debug_enable;
 
 static int __init pat_debug_setup(char *str)
 {
-	debug_enable = 1;
+	pat_debug_enable = 1;
 	return 0;
 }
 __setup("debugpat", pat_debug_setup);
 
-#define dprintk(fmt, arg...) \
-	do { if (debug_enable) printk(KERN_INFO fmt, ##arg); } while (0)
-
-
 static u64 __read_mostly boot_pat_state;
 
 enum {
@@ -132,17 +130,6 @@ void pat_init(void)
 
 #undef PAT
 
-static char *cattr_name(unsigned long flags)
-{
-	switch (flags & _PAGE_CACHE_MASK) {
-	case _PAGE_CACHE_UC:		return "uncached";
-	case _PAGE_CACHE_UC_MINUS:	return "uncached-minus";
-	case _PAGE_CACHE_WB:		return "write-back";
-	case _PAGE_CACHE_WC:		return "write-combining";
-	default:			return "broken";
-	}
-}
-
 /*
  * The global memtype list keeps track of memory type for specific
  * physical memory areas. Conflicting memory types in different
@@ -159,14 +146,6 @@ static char *cattr_name(unsigned long flags)
  * memtype_lock protects both the linear list and rbtree.
  */
 
-struct memtype {
-	u64			start;
-	u64			end;
-	unsigned long		type;
-	struct list_head	nd;
-	struct rb_node		rb;
-};
-
 static struct rb_root memtype_rbroot = RB_ROOT;
 static LIST_HEAD(memtype_list);
 static DEFINE_SPINLOCK(memtype_lock);	/* protects memtype list */
@@ -349,6 +328,64 @@ static int free_ram_pages_type(u64 start, u64 end)
 	return 0;
 }
 
+static int memtype_check_insert(struct memtype *new, unsigned long *new_type)
+{
+	struct memtype *entry;
+	u64 start, end;
+	unsigned long actual_type;
+	struct list_head *where;
+	int err = 0;
+
+	start = new->start;
+	end = new->end;
+	actual_type = new->type;
+
+	/* Search for existing mapping that overlaps the current range */
+	where = NULL;
+	list_for_each_entry(entry, &memtype_list, nd) {
+		if (end <= entry->start) {
+			where = entry->nd.prev;
+			break;
+		} else if (start <= entry->start) { /* end > entry->start */
+			err = chk_conflict(new, entry, new_type);
+			if (!err) {
+				dprintk("Overlap at 0x%Lx-0x%Lx\n",
+					entry->start, entry->end);
+				where = entry->nd.prev;
+			}
+			break;
+		} else if (start < entry->end) { /* start > entry->start */
+			err = chk_conflict(new, entry, new_type);
+			if (!err) {
+				dprintk("Overlap at 0x%Lx-0x%Lx\n",
+					entry->start, entry->end);
+
+				/*
+				 * Move to right position in the linked
+				 * list to add this new entry
+				 */
+				list_for_each_entry_continue(entry,
+							&memtype_list, nd) {
+					if (start <= entry->start) {
+						where = entry->nd.prev;
+						break;
+					}
+				}
+			}
+			break;
+		}
+	}
+	if (!err) {
+		if (where)
+			list_add(&new->nd, where);
+		else
+			list_add_tail(&new->nd, &memtype_list);
+
+		memtype_rb_insert(&memtype_rbroot, new);
+	}
+	return err;
+}
+
 /*
  * req_type typically has one of the:
  * - _PAGE_CACHE_WB
@@ -364,9 +401,8 @@ static int free_ram_pages_type(u64 start, u64 end)
 int reserve_memtype(u64 start, u64 end, unsigned long req_type,
 		    unsigned long *new_type)
 {
-	struct memtype *new, *entry;
+	struct memtype *new;
 	unsigned long actual_type;
-	struct list_head *where;
 	int is_range_ram;
 	int err = 0;
 
@@ -423,42 +459,7 @@ int reserve_memtype(u64 start, u64 end, unsigned long req_type,
 
 	spin_lock(&memtype_lock);
 
-	/* Search for existing mapping that overlaps the current range */
-	where = NULL;
-	list_for_each_entry(entry, &memtype_list, nd) {
-		if (end <= entry->start) {
-			where = entry->nd.prev;
-			break;
-		} else if (start <= entry->start) { /* end > entry->start */
-			err = chk_conflict(new, entry, new_type);
-			if (!err) {
-				dprintk("Overlap at 0x%Lx-0x%Lx\n",
-					entry->start, entry->end);
-				where = entry->nd.prev;
-			}
-			break;
-		} else if (start < entry->end) { /* start > entry->start */
-			err = chk_conflict(new, entry, new_type);
-			if (!err) {
-				dprintk("Overlap at 0x%Lx-0x%Lx\n",
-					entry->start, entry->end);
-
-				/*
-				 * Move to right position in the linked
-				 * list to add this new entry
-				 */
-				list_for_each_entry_continue(entry,
-							&memtype_list, nd) {
-					if (start <= entry->start) {
-						where = entry->nd.prev;
-						break;
-					}
-				}
-			}
-			break;
-		}
-	}
-
+	err = memtype_check_insert(new, new_type);
 	if (err) {
 		printk(KERN_INFO "reserve_memtype failed 0x%Lx-0x%Lx, "
 		       "track %s, req %s\n",
@@ -469,13 +470,6 @@ int reserve_memtype(u64 start, u64 end, unsigned long req_type,
 		return err;
 	}
 
-	if (where)
-		list_add(&new->nd, where);
-	else
-		list_add_tail(&new->nd, &memtype_list);
-
-	memtype_rb_insert(&memtype_rbroot, new);
-
 	spin_unlock(&memtype_lock);
 
 	dprintk("reserve_memtype added 0x%Lx-0x%Lx, track %s, req %s, ret %s\n",
@@ -937,28 +931,40 @@ EXPORT_SYMBOL_GPL(pgprot_writecombine);
 #if defined(CONFIG_DEBUG_FS) && defined(CONFIG_X86_PAT)
 
 /* get Nth element of the linked list */
-static struct memtype *memtype_get_idx(loff_t pos)
+static int copy_memtype_nth_element(struct memtype *out, loff_t pos)
 {
-	struct memtype *list_node, *print_entry;
+	struct memtype *list_node;
 	int i = 1;
 
-	print_entry  = kmalloc(sizeof(struct memtype), GFP_KERNEL);
-	if (!print_entry)
-		return NULL;
-
-	spin_lock(&memtype_lock);
 	list_for_each_entry(list_node, &memtype_list, nd) {
 		if (pos == i) {
-			*print_entry = *list_node;
-			spin_unlock(&memtype_lock);
-			return print_entry;
+			*out = *list_node;
+			return 0;
 		}
 		++i;
 	}
+	return 1;
+}
+
+static struct memtype *memtype_get_idx(loff_t pos)
+{
+	struct memtype *print_entry;
+	int ret;
+
+	print_entry  = kzalloc(sizeof(struct memtype), GFP_KERNEL);
+	if (!print_entry)
+		return NULL;
+
+	spin_lock(&memtype_lock);
+	ret = copy_memtype_nth_element(print_entry, pos);
 	spin_unlock(&memtype_lock);
-	kfree(print_entry);
 
-	return NULL;
+	if (!ret) {
+		return print_entry;
+	} else {
+		kfree(print_entry);
+		return NULL;
+	}
 }
 
 static void *memtype_seq_start(struct seq_file *seq, loff_t *pos)
diff --git a/arch/x86/mm/pat_internal.h b/arch/x86/mm/pat_internal.h
new file mode 100644
index 0000000..6c98780
--- /dev/null
+++ b/arch/x86/mm/pat_internal.h
@@ -0,0 +1,28 @@
+#ifndef __PAT_INTERNAL_H_
+#define __PAT_INTERNAL_H_
+
+extern int pat_debug_enable;
+
+#define dprintk(fmt, arg...) \
+	do { if (pat_debug_enable) printk(KERN_INFO fmt, ##arg); } while (0)
+
+struct memtype {
+	u64			start;
+	u64			end;
+	unsigned long		type;
+	struct list_head	nd;
+	struct rb_node		rb;
+};
+
+static inline char *cattr_name(unsigned long flags)
+{
+	switch (flags & _PAGE_CACHE_MASK) {
+	case _PAGE_CACHE_UC:		return "uncached";
+	case _PAGE_CACHE_UC_MINUS:	return "uncached-minus";
+	case _PAGE_CACHE_WB:		return "write-back";
+	case _PAGE_CACHE_WC:		return "write-combining";
+	default:			return "broken";
+	}
+}
+
+#endif /* __PAT_INTERNAL_H_ */

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [tip:x86/pat] x86, pat: Migrate to rbtree only backend for pat memtype management
  2010-02-10 23:26   ` [patch 3/3] x86: Migrate to rbtree only backend for pat memtype management (ver 2) Pallipadi, Venkatesh
@ 2010-02-19  0:22     ` tip-bot for Pallipadi, Venkatesh
  0 siblings, 0 replies; 27+ messages in thread
From: tip-bot for Pallipadi, Venkatesh @ 2010-02-19  0:22 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: linux-kernel, hpa, mingo, venkatesh.pallipadi, suresh.b.siddha, tglx

Commit-ID:  9e41a49aab88a5a6c8f4875bf10a5543bc321f2d
Gitweb:     http://git.kernel.org/tip/9e41a49aab88a5a6c8f4875bf10a5543bc321f2d
Author:     Pallipadi, Venkatesh <venkatesh.pallipadi@intel.com>
AuthorDate: Wed, 10 Feb 2010 15:26:07 -0800
Committer:  H. Peter Anvin <hpa@zytor.com>
CommitDate: Thu, 18 Feb 2010 15:41:36 -0800

x86, pat: Migrate to rbtree only backend for pat memtype management

Move the PAT backend to a fully rbtree-based implementation from the existing
rbtree and linked list hybrid.

The new rbtree-based solution uses interval trees (augmented rbtrees) in
order to store the PAT ranges. The new code separates out the PAT backend
into the pat_rbtree.c file, making it cleaner. The change also makes the PAT
lookup, reserve and free operations more optimal, as we no longer have to
traverse a linear linked list of a few tens of entries in the normal case.

Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
LKML-Reference: <20100210232607.GB11465@linux-os.sc.intel.com>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
---
 arch/x86/mm/Makefile       |    1 +
 arch/x86/mm/pat.c          |  209 +---------------------------------
 arch/x86/mm/pat_internal.h |   20 +++-
 arch/x86/mm/pat_rbtree.c   |  271 ++++++++++++++++++++++++++++++++++++++++++++
 4 files changed, 296 insertions(+), 205 deletions(-)

diff --git a/arch/x86/mm/Makefile b/arch/x86/mm/Makefile
index 06630d2..a4c7683 100644
--- a/arch/x86/mm/Makefile
+++ b/arch/x86/mm/Makefile
@@ -6,6 +6,7 @@ nostackp := $(call cc-option, -fno-stack-protector)
 CFLAGS_physaddr.o		:= $(nostackp)
 CFLAGS_setup_nx.o		:= $(nostackp)
 
+obj-$(CONFIG_X86_PAT)		+= pat_rbtree.o
 obj-$(CONFIG_SMP)		+= tlb.o
 
 obj-$(CONFIG_X86_32)		+= pgtable_32.o iomap_32.o
diff --git a/arch/x86/mm/pat.c b/arch/x86/mm/pat.c
index 628e507..9510111 100644
--- a/arch/x86/mm/pat.c
+++ b/arch/x86/mm/pat.c
@@ -130,65 +130,7 @@ void pat_init(void)
 
 #undef PAT
 
-/*
- * The global memtype list keeps track of memory type for specific
- * physical memory areas. Conflicting memory types in different
- * mappings can cause CPU cache corruption. To avoid this we keep track.
- *
- * The list is sorted based on starting address and can contain multiple
- * entries for each address (this allows reference counting for overlapping
- * areas). All the aliases have the same cache attributes of course.
- * Zero attributes are represented as holes.
- *
- * The data structure is a list that is also organized as an rbtree
- * sorted on the start address of memtype range.
- *
- * memtype_lock protects both the linear list and rbtree.
- */
-
-static struct rb_root memtype_rbroot = RB_ROOT;
-static LIST_HEAD(memtype_list);
-static DEFINE_SPINLOCK(memtype_lock);	/* protects memtype list */
-
-static struct memtype *memtype_rb_search(struct rb_root *root, u64 start)
-{
-	struct rb_node *node = root->rb_node;
-	struct memtype *last_lower = NULL;
-
-	while (node) {
-		struct memtype *data = container_of(node, struct memtype, rb);
-
-		if (data->start < start) {
-			last_lower = data;
-			node = node->rb_right;
-		} else if (data->start > start) {
-			node = node->rb_left;
-		} else
-			return data;
-	}
-
-	/* Will return NULL if there is no entry with its start <= start */
-	return last_lower;
-}
-
-static void memtype_rb_insert(struct rb_root *root, struct memtype *data)
-{
-	struct rb_node **new = &(root->rb_node);
-	struct rb_node *parent = NULL;
-
-	while (*new) {
-		struct memtype *this = container_of(*new, struct memtype, rb);
-
-		parent = *new;
-		if (data->start <= this->start)
-			new = &((*new)->rb_left);
-		else if (data->start > this->start)
-			new = &((*new)->rb_right);
-	}
-
-	rb_link_node(&data->rb, parent, new);
-	rb_insert_color(&data->rb, root);
-}
+static DEFINE_SPINLOCK(memtype_lock);	/* protects memtype accesses */
 
 /*
  * Does intersection of PAT memory type and MTRR memory type and returns
@@ -216,33 +158,6 @@ static unsigned long pat_x_mtrr_type(u64 start, u64 end, unsigned long req_type)
 	return req_type;
 }
 
-static int
-chk_conflict(struct memtype *new, struct memtype *entry, unsigned long *type)
-{
-	if (new->type != entry->type) {
-		if (type) {
-			new->type = entry->type;
-			*type = entry->type;
-		} else
-			goto conflict;
-	}
-
-	 /* check overlaps with more than one entry in the list */
-	list_for_each_entry_continue(entry, &memtype_list, nd) {
-		if (new->end <= entry->start)
-			break;
-		else if (new->type != entry->type)
-			goto conflict;
-	}
-	return 0;
-
- conflict:
-	printk(KERN_INFO "%s:%d conflicting memory types "
-	       "%Lx-%Lx %s<->%s\n", current->comm, current->pid, new->start,
-	       new->end, cattr_name(new->type), cattr_name(entry->type));
-	return -EBUSY;
-}
-
 static int pat_pagerange_is_ram(unsigned long start, unsigned long end)
 {
 	int ram_page = 0, not_rampage = 0;
@@ -328,64 +243,6 @@ static int free_ram_pages_type(u64 start, u64 end)
 	return 0;
 }
 
-static int memtype_check_insert(struct memtype *new, unsigned long *new_type)
-{
-	struct memtype *entry;
-	u64 start, end;
-	unsigned long actual_type;
-	struct list_head *where;
-	int err = 0;
-
-	start = new->start;
-	end = new->end;
-	actual_type = new->type;
-
-	/* Search for existing mapping that overlaps the current range */
-	where = NULL;
-	list_for_each_entry(entry, &memtype_list, nd) {
-		if (end <= entry->start) {
-			where = entry->nd.prev;
-			break;
-		} else if (start <= entry->start) { /* end > entry->start */
-			err = chk_conflict(new, entry, new_type);
-			if (!err) {
-				dprintk("Overlap at 0x%Lx-0x%Lx\n",
-					entry->start, entry->end);
-				where = entry->nd.prev;
-			}
-			break;
-		} else if (start < entry->end) { /* start > entry->start */
-			err = chk_conflict(new, entry, new_type);
-			if (!err) {
-				dprintk("Overlap at 0x%Lx-0x%Lx\n",
-					entry->start, entry->end);
-
-				/*
-				 * Move to right position in the linked
-				 * list to add this new entry
-				 */
-				list_for_each_entry_continue(entry,
-							&memtype_list, nd) {
-					if (start <= entry->start) {
-						where = entry->nd.prev;
-						break;
-					}
-				}
-			}
-			break;
-		}
-	}
-	if (!err) {
-		if (where)
-			list_add(&new->nd, where);
-		else
-			list_add_tail(&new->nd, &memtype_list);
-
-		memtype_rb_insert(&memtype_rbroot, new);
-	}
-	return err;
-}
-
 /*
  * req_type typically has one of the:
  * - _PAGE_CACHE_WB
@@ -459,7 +316,7 @@ int reserve_memtype(u64 start, u64 end, unsigned long req_type,
 
 	spin_lock(&memtype_lock);
 
-	err = memtype_check_insert(new, new_type);
+	err = rbt_memtype_check_insert(new, new_type);
 	if (err) {
 		printk(KERN_INFO "reserve_memtype failed 0x%Lx-0x%Lx, "
 		       "track %s, req %s\n",
@@ -481,7 +338,6 @@ int reserve_memtype(u64 start, u64 end, unsigned long req_type,
 
 int free_memtype(u64 start, u64 end)
 {
-	struct memtype *entry, *saved_entry;
 	int err = -EINVAL;
 	int is_range_ram;
 
@@ -505,46 +361,7 @@ int free_memtype(u64 start, u64 end)
 	}
 
 	spin_lock(&memtype_lock);
-
-	entry = memtype_rb_search(&memtype_rbroot, start);
-	if (unlikely(entry == NULL))
-		goto unlock_ret;
-
-	/*
-	 * Saved entry points to an entry with start same or less than what
-	 * we searched for. Now go through the list in both directions to look
-	 * for the entry that matches with both start and end, with list stored
-	 * in sorted start address
-	 */
-	saved_entry = entry;
-	list_for_each_entry_from(entry, &memtype_list, nd) {
-		if (entry->start == start && entry->end == end) {
-			rb_erase(&entry->rb, &memtype_rbroot);
-			list_del(&entry->nd);
-			kfree(entry);
-			err = 0;
-			break;
-		} else if (entry->start > start) {
-			break;
-		}
-	}
-
-	if (!err)
-		goto unlock_ret;
-
-	entry = saved_entry;
-	list_for_each_entry_reverse(entry, &memtype_list, nd) {
-		if (entry->start == start && entry->end == end) {
-			rb_erase(&entry->rb, &memtype_rbroot);
-			list_del(&entry->nd);
-			kfree(entry);
-			err = 0;
-			break;
-		} else if (entry->start < start) {
-			break;
-		}
-	}
-unlock_ret:
+	err = rbt_memtype_erase(start, end);
 	spin_unlock(&memtype_lock);
 
 	if (err) {
@@ -593,7 +410,7 @@ static unsigned long lookup_memtype(u64 paddr)
 
 	spin_lock(&memtype_lock);
 
-	entry = memtype_rb_search(&memtype_rbroot, paddr);
+	entry = rbt_memtype_lookup(paddr);
 	if (entry != NULL)
 		rettype = entry->type;
 	else
@@ -930,22 +747,6 @@ EXPORT_SYMBOL_GPL(pgprot_writecombine);
 
 #if defined(CONFIG_DEBUG_FS) && defined(CONFIG_X86_PAT)
 
-/* get Nth element of the linked list */
-static int copy_memtype_nth_element(struct memtype *out, loff_t pos)
-{
-	struct memtype *list_node;
-	int i = 1;
-
-	list_for_each_entry(list_node, &memtype_list, nd) {
-		if (pos == i) {
-			*out = *list_node;
-			return 0;
-		}
-		++i;
-	}
-	return 1;
-}
-
 static struct memtype *memtype_get_idx(loff_t pos)
 {
 	struct memtype *print_entry;
@@ -956,7 +757,7 @@ static struct memtype *memtype_get_idx(loff_t pos)
 		return NULL;
 
 	spin_lock(&memtype_lock);
-	ret = copy_memtype_nth_element(print_entry, pos);
+	ret = rbt_memtype_copy_nth_element(print_entry, pos);
 	spin_unlock(&memtype_lock);
 
 	if (!ret) {
diff --git a/arch/x86/mm/pat_internal.h b/arch/x86/mm/pat_internal.h
index 6c98780..4f39eef 100644
--- a/arch/x86/mm/pat_internal.h
+++ b/arch/x86/mm/pat_internal.h
@@ -9,8 +9,8 @@ extern int pat_debug_enable;
 struct memtype {
 	u64			start;
 	u64			end;
+	u64			subtree_max_end;
 	unsigned long		type;
-	struct list_head	nd;
 	struct rb_node		rb;
 };
 
@@ -25,4 +25,22 @@ static inline char *cattr_name(unsigned long flags)
 	}
 }
 
+#ifdef CONFIG_X86_PAT
+extern int rbt_memtype_check_insert(struct memtype *new,
+					unsigned long *new_type);
+extern int rbt_memtype_erase(u64 start, u64 end);
+extern struct memtype *rbt_memtype_lookup(u64 addr);
+extern int rbt_memtype_copy_nth_element(struct memtype *out, loff_t pos);
+#else
+static inline int rbt_memtype_check_insert(struct memtype *new,
+					unsigned long *new_type)
+{ return 0; }
+static inline int rbt_memtype_erase(u64 start, u64 end)
+{ return 0; }
+static inline struct memtype *rbt_memtype_lookup(u64 addr)
+{ return NULL; }
+static inline int rbt_memtype_copy_nth_element(struct memtype *out, loff_t pos)
+{ return 0; }
+#endif
+
 #endif /* __PAT_INTERNAL_H_ */
diff --git a/arch/x86/mm/pat_rbtree.c b/arch/x86/mm/pat_rbtree.c
new file mode 100644
index 0000000..9063f40
--- /dev/null
+++ b/arch/x86/mm/pat_rbtree.c
@@ -0,0 +1,271 @@
+/*
+ * Handle caching attributes in page tables (PAT)
+ *
+ * Authors: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
+ *          Suresh B Siddha <suresh.b.siddha@intel.com>
+ *
+ * Interval tree (augmented rbtree) used to store the PAT memory type
+ * reservations.
+ */
+
+#include <linux/seq_file.h>
+#include <linux/debugfs.h>
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/rbtree.h>
+#include <linux/sched.h>
+#include <linux/gfp.h>
+
+#include <asm/pgtable.h>
+#include <asm/pat.h>
+
+#include "pat_internal.h"
+
+/*
+ * The memtype tree keeps track of memory type for specific
+ * physical memory areas. Without proper tracking, conflicting memory
+ * types in different mappings can cause CPU cache corruption.
+ *
+ * The tree is an interval tree (augmented rbtree) with tree ordered
+ * on starting address. Tree can contain multiple entries for
+ * different regions which overlap. All the aliases have the same
+ * cache attributes of course.
+ *
+ * memtype_lock protects the rbtree.
+ */
+
+static void memtype_rb_augment_cb(struct rb_node *node);
+static struct rb_root memtype_rbroot = RB_AUGMENT_ROOT(&memtype_rb_augment_cb);
+
+static int is_node_overlap(struct memtype *node, u64 start, u64 end)
+{
+	if (node->start >= end || node->end <= start)
+		return 0;
+
+	return 1;
+}
+
+static u64 get_subtree_max_end(struct rb_node *node)
+{
+	u64 ret = 0;
+	if (node) {
+		struct memtype *data = container_of(node, struct memtype, rb);
+		ret = data->subtree_max_end;
+	}
+	return ret;
+}
+
+/* Update 'subtree_max_end' for a node, based on node and its children */
+static void update_node_max_end(struct rb_node *node)
+{
+	struct memtype *data;
+	u64 max_end, child_max_end;
+
+	if (!node)
+		return;
+
+	data = container_of(node, struct memtype, rb);
+	max_end = data->end;
+
+	child_max_end = get_subtree_max_end(node->rb_right);
+	if (child_max_end > max_end)
+		max_end = child_max_end;
+
+	child_max_end = get_subtree_max_end(node->rb_left);
+	if (child_max_end > max_end)
+		max_end = child_max_end;
+
+	data->subtree_max_end = max_end;
+}
+
+/* Update 'subtree_max_end' for a node and all its ancestors */
+static void update_path_max_end(struct rb_node *node)
+{
+	u64 old_max_end, new_max_end;
+
+	while (node) {
+		struct memtype *data = container_of(node, struct memtype, rb);
+
+		old_max_end = data->subtree_max_end;
+		update_node_max_end(node);
+		new_max_end = data->subtree_max_end;
+
+		if (new_max_end == old_max_end)
+			break;
+
+		node = rb_parent(node);
+	}
+}
+
+/* Find the first (lowest start addr) overlapping range from rb tree */
+static struct memtype *memtype_rb_lowest_match(struct rb_root *root,
+				u64 start, u64 end)
+{
+	struct rb_node *node = root->rb_node;
+	struct memtype *last_lower = NULL;
+
+	while (node) {
+		struct memtype *data = container_of(node, struct memtype, rb);
+
+		if (get_subtree_max_end(node->rb_left) > start) {
+			/* Lowest overlap if any must be on left side */
+			node = node->rb_left;
+		} else if (is_node_overlap(data, start, end)) {
+			last_lower = data;
+			break;
+		} else if (start >= data->start) {
+			/* Lowest overlap if any must be on right side */
+			node = node->rb_right;
+		} else {
+			break;
+		}
+	}
+	return last_lower; /* Returns NULL if there is no overlap */
+}
+
+static struct memtype *memtype_rb_exact_match(struct rb_root *root,
+				u64 start, u64 end)
+{
+	struct memtype *match;
+
+	match = memtype_rb_lowest_match(root, start, end);
+	while (match != NULL && match->start < end) {
+		struct rb_node *node;
+
+		if (match->start == start && match->end == end)
+			return match;
+
+		node = rb_next(&match->rb);
+		if (node)
+			match = container_of(node, struct memtype, rb);
+		else
+			match = NULL;
+	}
+
+	return NULL; /* Returns NULL if there is no exact match */
+}
+
+static int memtype_rb_check_conflict(struct rb_root *root,
+				u64 start, u64 end,
+				unsigned long reqtype, unsigned long *newtype)
+{
+	struct rb_node *node;
+	struct memtype *match;
+	int found_type = reqtype;
+
+	match = memtype_rb_lowest_match(&memtype_rbroot, start, end);
+	if (match == NULL)
+		goto success;
+
+	if (match->type != found_type && newtype == NULL)
+		goto failure;
+
+	dprintk("Overlap at 0x%Lx-0x%Lx\n", match->start, match->end);
+	found_type = match->type;
+
+	node = rb_next(&match->rb);
+	while (node) {
+		match = container_of(node, struct memtype, rb);
+
+		if (match->start >= end) /* Checked all possible matches */
+			goto success;
+
+		if (is_node_overlap(match, start, end) &&
+		    match->type != found_type) {
+			goto failure;
+		}
+
+		node = rb_next(&match->rb);
+	}
+success:
+	if (newtype)
+		*newtype = found_type;
+
+	return 0;
+
+failure:
+	printk(KERN_INFO "%s:%d conflicting memory types "
+		"%Lx-%Lx %s<->%s\n", current->comm, current->pid, start,
+		end, cattr_name(found_type), cattr_name(match->type));
+	return -EBUSY;
+}
+
+static void memtype_rb_augment_cb(struct rb_node *node)
+{
+	if (node)
+		update_path_max_end(node);
+}
+
+static void memtype_rb_insert(struct rb_root *root, struct memtype *newdata)
+{
+	struct rb_node **node = &(root->rb_node);
+	struct rb_node *parent = NULL;
+
+	while (*node) {
+		struct memtype *data = container_of(*node, struct memtype, rb);
+
+		parent = *node;
+		if (newdata->start <= data->start)
+			node = &((*node)->rb_left);
+		else if (newdata->start > data->start)
+			node = &((*node)->rb_right);
+	}
+
+	rb_link_node(&newdata->rb, parent, node);
+	rb_insert_color(&newdata->rb, root);
+}
+
+int rbt_memtype_check_insert(struct memtype *new, unsigned long *ret_type)
+{
+	int err = 0;
+
+	err = memtype_rb_check_conflict(&memtype_rbroot, new->start, new->end,
+						new->type, ret_type);
+
+	if (!err) {
+		new->type = *ret_type;
+		memtype_rb_insert(&memtype_rbroot, new);
+	}
+	return err;
+}
+
+int rbt_memtype_erase(u64 start, u64 end)
+{
+	struct memtype *data;
+
+	data = memtype_rb_exact_match(&memtype_rbroot, start, end);
+	if (!data)
+		return -EINVAL;
+
+	rb_erase(&data->rb, &memtype_rbroot);
+	return 0;
+}
+
+struct memtype *rbt_memtype_lookup(u64 addr)
+{
+	struct memtype *data;
+	data = memtype_rb_lowest_match(&memtype_rbroot, addr, addr + PAGE_SIZE);
+	return data;
+}
+
+#if defined(CONFIG_DEBUG_FS)
+int rbt_memtype_copy_nth_element(struct memtype *out, loff_t pos)
+{
+	struct rb_node *node;
+	int i = 1;
+
+	node = rb_first(&memtype_rbroot);
+	while (node && pos != i) {
+		node = rb_next(node);
+		i++;
+	}
+
+	if (node) { /* pos == i */
+		struct memtype *this = container_of(node, struct memtype, rb);
+		*out = *this;
+		return 0;
+	} else {
+		return 1;
+	}
+}
+#endif

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* Re: [tip:x86/pat] rbtree: Add support for augmented rbtrees
  2010-02-19  0:21     ` [tip:x86/pat] rbtree: Add support for augmented rbtrees tip-bot for Pallipadi, Venkatesh
@ 2010-05-27 15:29       ` Peter Zijlstra
  2010-05-29 12:29       ` [RFC][PATCH] rbtree: undo augmented damage Peter Zijlstra
  1 sibling, 0 replies; 27+ messages in thread
From: Peter Zijlstra @ 2010-05-27 15:29 UTC (permalink / raw)
  To: mingo, hpa, linux-kernel, venkatesh.pallipadi, suresh.b.siddha, tglx
  Cc: linux-tip-commits

On Fri, 2010-02-19 at 00:21 +0000, tip-bot for Pallipadi, Venkatesh
wrote:
> Commit-ID:  17d9ddc72fb8bba0d4f67868c9c612e472a594a9
> Gitweb:     http://git.kernel.org/tip/17d9ddc72fb8bba0d4f67868c9c612e472a594a9
> Author:     Pallipadi, Venkatesh <venkatesh.pallipadi@intel.com>
> AuthorDate: Wed, 10 Feb 2010 15:23:44 -0800
> Committer:  H. Peter Anvin <hpa@zytor.com>
> CommitDate: Thu, 18 Feb 2010 15:40:56 -0800
> 
> rbtree: Add support for augmented rbtrees
> 
> Add support for augmented rbtrees in core rbtree code.
> 
> This will be used in subsequent patches, in x86 PAT code, which needs
> interval trees to efficiently keep track of PAT ranges.
> 
> Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
> LKML-Reference: <20100210232343.GA11465@linux-os.sc.intel.com>
> Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
> Signed-off-by: H. Peter Anvin <hpa@zytor.com>

> --- a/lib/rbtree.c
> +++ b/lib/rbtree.c
> @@ -44,6 +44,11 @@ static void __rb_rotate_left(struct rb_node *node, struct rb_root *root)
>  	else
>  		root->rb_node = right;
>  	rb_set_parent(node, right);
> +
> +	if (root->augment_cb) {
> +		root->augment_cb(node);
> +		root->augment_cb(right);
> +	}
>  }
>  
>  static void __rb_rotate_right(struct rb_node *node, struct rb_root *root)
> @@ -67,12 +72,20 @@ static void __rb_rotate_right(struct rb_node *node, struct rb_root *root)
>  	else
>  		root->rb_node = left;
>  	rb_set_parent(node, left);
> +
> +	if (root->augment_cb) {
> +		root->augment_cb(node);
> +		root->augment_cb(left);
> +	}
>  }
>  

Did someone measure what these extra conditionals do to the scheduler?

Also, I don't think you actually need this callback; you can augment
externally like this (the idea is 'borrowed' from the BFQ code):

full patch at:
http://programming.kicks-ass.net/kernel-patches/sched_eevdf.patch

+static inline struct sched_entity *se_of(struct rb_node *node)
+{
+	return rb_entry(node, struct sched_entity, run_node);
+}
+
+static inline s64 deadline_key(struct cfs_rq *cfs_rq, u64 deadline)
+{
+	return (s64)(deadline - cfs_rq->min_vruntime);
+}
+
+#define deadline_gt(cfs_rq, field, lse, rse)			\
+({ deadline_key(cfs_rq, lse->field) >				\
+   deadline_key(cfs_rq, rse->field); })
+
+static void update_min_deadline(struct cfs_rq *cfs_rq,
+				struct sched_entity *se, struct rb_node *node)
+{
+	if (node && deadline_gt(cfs_rq, min_deadline, se, se_of(node)))
+		se->min_deadline = se_of(node)->min_deadline;
+}
+
+/*
+ * se->min_deadline = min(se->deadline, left->min_deadline, right->min_deadline)
+ */
+static void update_node(struct cfs_rq *cfs_rq, struct rb_node *node)
+{
+	struct sched_entity *se = rb_entry(node, struct sched_entity, run_node);
+
+	se->min_deadline = se->deadline;
+	update_min_deadline(cfs_rq, se, node->rb_right);
+	update_min_deadline(cfs_rq, se, node->rb_left);
+}
+
+/*
+ * update min_deadline for all nodes that could have been affected by
+ * a rebalance pass up from @node
+ */
+static void update_tree(struct cfs_rq *cfs_rq, struct rb_node *node)
+{
+	struct rb_node *parent;
+
+up:
+	update_node(cfs_rq, node);
+	parent = rb_parent(node);
+	if (!parent)
+		return;
+
+	if (node == parent->rb_left && parent->rb_right)
+		update_node(cfs_rq, parent->rb_right);
+	else if (parent->rb_left)
+		update_node(cfs_rq, parent->rb_left);
+
+	node = parent;
+	goto up;
+}
+
+/*
+ * after inserting @se into the tree, update min_deadline to account for
+ * both the new deadline and any damage done by rebalance
+ */
+static void update_tree_enqueue(struct cfs_rq *cfs_rq, struct sched_entity *se)
+{
+	struct rb_node *node = &se->run_node;
+
+	if (node->rb_left)
+		node = node->rb_left;
+	else if (node->rb_right)
+		node = node->rb_right;
+
+	update_tree(cfs_rq, node);
+}
+
+/*
+ * before removing the node, find the deepest node on the rebalance path
+ * that will still be there after @se gets removed
+ */
+static struct rb_node *update_tree_dequeue_begin(struct sched_entity *se)
+{
+	struct rb_node *deepest;
+	struct rb_node *node = &se->run_node;
+
+	if (!node->rb_right && !node->rb_left)
+		deepest = rb_parent(node);
+	else if (!node->rb_right)
+		deepest = node->rb_left;
+	else if (!node->rb_left)
+		deepest = node->rb_right;
+	else {
+		deepest = rb_next(node);
+		if (deepest->rb_right)
+			deepest = deepest->rb_right;
+		else if (rb_parent(deepest) != node)
+			deepest = rb_parent(deepest);
+	}
+
+	return deepest;
+}
+
+/*
+ * now that the entity got removed, update min_deadline to undo the missing
+ * deadine and any rebalance damage
+ */
+static void update_tree_dequeue_end(struct cfs_rq *cfs_rq, struct rb_node *node)
+{
+	if (node)
+		update_tree(cfs_rq, node);
 }
 
 /*
@@ -360,10 +578,14 @@ static void __enqueue_entity(struct cfs_
 
 	rb_link_node(&se->run_node, parent, link);
 	rb_insert_color(&se->run_node, &cfs_rq->tasks_timeline);
+
+	update_tree_enqueue(cfs_rq, se);
 }
 
 static void __dequeue_entity(struct cfs_rq *cfs_rq, struct sched_entity *se)
 {
+	struct rb_node *node = update_tree_dequeue_begin(se);
+
 	if (cfs_rq->rb_leftmost == &se->run_node) {
 		struct rb_node *next_node;
 
@@ -372,16 +594,145 @@ static void __dequeue_entity(struct cfs_
 	}
 
 	rb_erase(&se->run_node, &cfs_rq->tasks_timeline);
+	update_tree_dequeue_end(cfs_rq, node);
+	avg_vruntime_sub(cfs_rq, se);
 }

^ permalink raw reply	[flat|nested] 27+ messages in thread

* [RFC][PATCH] rbtree: undo augmented damage
  2010-02-19  0:21     ` [tip:x86/pat] rbtree: Add support for augmented rbtrees tip-bot for Pallipadi, Venkatesh
  2010-05-27 15:29       ` Peter Zijlstra
@ 2010-05-29 12:29       ` Peter Zijlstra
  2010-05-29 13:31         ` [PATCH] rbtree: undo augmented damage -v2 Peter Zijlstra
  1 sibling, 1 reply; 27+ messages in thread
From: Peter Zijlstra @ 2010-05-29 12:29 UTC (permalink / raw)
  To: mingo, hpa, linux-kernel, venkatesh.pallipadi, suresh.b.siddha, tglx
  Cc: linux-tip-commits, Fabio Checconi

On Fri, 2010-02-19 at 00:21 +0000, tip-bot for Pallipadi, Venkatesh wrote:
> Commit-ID:  17d9ddc72fb8bba0d4f67868c9c612e472a594a9
> Gitweb:     http://git.kernel.org/tip/17d9ddc72fb8bba0d4f67868c9c612e472a594a9
> Author:     Pallipadi, Venkatesh <venkatesh.pallipadi@intel.com>
> AuthorDate: Wed, 10 Feb 2010 15:23:44 -0800
> Committer:  H. Peter Anvin <hpa@zytor.com>
> CommitDate: Thu, 18 Feb 2010 15:40:56 -0800
> 
> rbtree: Add support for augmented rbtrees
> 
> Add support for augmented rbtrees in core rbtree code.
> 
> This will be used in subsequent patches, in x86 PAT code, which needs
> interval trees to efficiently keep track of PAT ranges.
> 
> Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
> LKML-Reference: <20100210232343.GA11465@linux-os.sc.intel.com>
> Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
> Signed-off-by: H. Peter Anvin <hpa@zytor.com> 

How about the below, which undoes all the damage to the rb_tree code?

It's likely this method is faster than the previously implemented one,
simply because it only does the whole update pass a single time, not on
every rotate.

It certainly removes all the extra conditionals in the RB-tree code,
which live on the scheduler hot-path.

(Compile tested only)

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
---
 Documentation/rbtree.txt |   58 ---------------------------
 arch/x86/mm/pat_rbtree.c |    9 +++-
 include/linux/rbtree.h   |   13 ++++--
 lib/rbtree.c             |   97 +++++++++++++++++++++++++---------------------
 4 files changed, 68 insertions(+), 109 deletions(-)

diff --git a/Documentation/rbtree.txt b/Documentation/rbtree.txt
index 221f38b..aae8355 100644
--- a/Documentation/rbtree.txt
+++ b/Documentation/rbtree.txt
@@ -190,61 +190,3 @@ Example:
   for (node = rb_first(&mytree); node; node = rb_next(node))
 	printk("key=%s\n", rb_entry(node, struct mytype, node)->keystring);
 
-Support for Augmented rbtrees
------------------------------
-
-Augmented rbtree is an rbtree with "some" additional data stored in each node.
-This data can be used to augment some new functionality to rbtree.
-Augmented rbtree is an optional feature built on top of basic rbtree
-infrastructure. rbtree user who wants this feature will have an augment
-callback function in rb_root initialized.
-
-This callback function will be called from rbtree core routines whenever
-a node has a change in one or both of its children. It is the responsibility
-of the callback function to recalculate the additional data that is in the
-rb node using new children information. Note that if this new additional
-data affects the parent node's additional data, then callback function has
-to handle it and do the recursive updates.
-
-
-Interval tree is an example of augmented rb tree. Reference -
-"Introduction to Algorithms" by Cormen, Leiserson, Rivest and Stein.
-More details about interval trees:
-
-Classical rbtree has a single key and it cannot be directly used to store
-interval ranges like [lo:hi] and do a quick lookup for any overlap with a new
-lo:hi or to find whether there is an exact match for a new lo:hi.
-
-However, rbtree can be augmented to store such interval ranges in a structured
-way making it possible to do efficient lookup and exact match.
-
-This "extra information" stored in each node is the maximum hi
-(max_hi) value among all the nodes that are its descendants. This
-information can be maintained at each node just by looking at the node
-and its immediate children. And this will be used in O(log n) lookup
-for lowest match (lowest start address among all possible matches)
-with something like:
-
-find_lowest_match(lo, hi, node)
-{
-	lowest_match = NULL;
-	while (node) {
-		if (max_hi(node->left) > lo) {
-			// Lowest overlap if any must be on left side
-			node = node->left;
-		} else if (overlap(lo, hi, node)) {
-			lowest_match = node;
-			break;
-		} else if (lo > node->lo) {
-			// Lowest overlap if any must be on right side
-			node = node->right;
-		} else {
-			break;
-		}
-	}
-	return lowest_match;
-}
-
-Finding exact match will be to first find lowest match and then to follow
-successor nodes looking for exact match, until the start of a node is beyond
-the hi value we are looking for.
diff --git a/arch/x86/mm/pat_rbtree.c b/arch/x86/mm/pat_rbtree.c
index f537087..ece5a67 100644
--- a/arch/x86/mm/pat_rbtree.c
+++ b/arch/x86/mm/pat_rbtree.c
@@ -34,8 +34,7 @@
  * memtype_lock protects the rbtree.
  */
 
-static void memtype_rb_augment_cb(struct rb_node *node);
-static struct rb_root memtype_rbroot = RB_AUGMENT_ROOT(&memtype_rb_augment_cb);
+static struct rb_root memtype_rbroot = RB_ROOT;
 
 static int is_node_overlap(struct memtype *node, u64 start, u64 end)
 {
@@ -190,7 +189,7 @@ failure:
 	return -EBUSY;
 }
 
-static void memtype_rb_augment_cb(struct rb_node *node)
+static void memtype_rb_augment_cb(struct rb_node *node, void *data)
 {
 	if (node)
 		update_path_max_end(node);
@@ -213,6 +212,7 @@ static void memtype_rb_insert(struct rb_root *root, struct memtype *newdata)
 
 	rb_link_node(&newdata->rb, parent, node);
 	rb_insert_color(&newdata->rb, root);
+	rb_augment_insert(&newdata->rb, memtype_rb_augment_cb, NULL);
 }
 
 int rbt_memtype_check_insert(struct memtype *new, unsigned long *ret_type)
@@ -233,13 +233,16 @@ int rbt_memtype_check_insert(struct memtype *new, unsigned long *ret_type)
 
 struct memtype *rbt_memtype_erase(u64 start, u64 end)
 {
+	struct rb_node *deepest;
 	struct memtype *data;
 
 	data = memtype_rb_exact_match(&memtype_rbroot, start, end);
 	if (!data)
 		goto out;
 
+	deepest = rb_augment_erase_begin(&data->rb);
 	rb_erase(&data->rb, &memtype_rbroot);
+	rb_augment_erase_end(deepest, memtype_rb_augment_cb, NULL);
 out:
 	return data;
 }
diff --git a/include/linux/rbtree.h b/include/linux/rbtree.h
index fe1872e..7066acb 100644
--- a/include/linux/rbtree.h
+++ b/include/linux/rbtree.h
@@ -110,7 +110,6 @@ struct rb_node
 struct rb_root
 {
 	struct rb_node *rb_node;
-	void (*augment_cb)(struct rb_node *node);
 };
 
 
@@ -130,9 +129,7 @@ static inline void rb_set_color(struct rb_node *rb, int color)
 	rb->rb_parent_color = (rb->rb_parent_color & ~1) | color;
 }
 
-#define RB_ROOT	(struct rb_root) { NULL, NULL, }
-#define RB_AUGMENT_ROOT(x)	(struct rb_root) { NULL, x}
-
+#define RB_ROOT	(struct rb_root) { NULL, }
 #define	rb_entry(ptr, type, member) container_of(ptr, type, member)
 
 #define RB_EMPTY_ROOT(root)	((root)->rb_node == NULL)
@@ -142,6 +139,14 @@ static inline void rb_set_color(struct rb_node *rb, int color)
 extern void rb_insert_color(struct rb_node *, struct rb_root *);
 extern void rb_erase(struct rb_node *, struct rb_root *);
 
+typedef void (*rb_augment_f)(struct rb_node *node, void *data);
+
+extern void rb_augment_insert(struct rb_node *node,
+			      rb_augment_f func, void *data);
+extern struct rb_node *rb_augment_erase_begin(struct rb_node *node);
+extern void rb_augment_erase_end(struct rb_node *node,
+				 rb_augment_f func, void *data);
+
 /* Find logical next and previous nodes in a tree */
 extern struct rb_node *rb_next(const struct rb_node *);
 extern struct rb_node *rb_prev(const struct rb_node *);
diff --git a/lib/rbtree.c b/lib/rbtree.c
index 15e10b1..945bff4 100644
--- a/lib/rbtree.c
+++ b/lib/rbtree.c
@@ -44,11 +44,6 @@ static void __rb_rotate_left(struct rb_node *node, struct rb_root *root)
 	else
 		root->rb_node = right;
 	rb_set_parent(node, right);
-
-	if (root->augment_cb) {
-		root->augment_cb(node);
-		root->augment_cb(right);
-	}
 }
 
 static void __rb_rotate_right(struct rb_node *node, struct rb_root *root)
@@ -72,20 +67,12 @@ static void __rb_rotate_right(struct rb_node *node, struct rb_root *root)
 	else
 		root->rb_node = left;
 	rb_set_parent(node, left);
-
-	if (root->augment_cb) {
-		root->augment_cb(node);
-		root->augment_cb(left);
-	}
 }
 
 void rb_insert_color(struct rb_node *node, struct rb_root *root)
 {
 	struct rb_node *parent, *gparent;
 
-	if (root->augment_cb)
-		root->augment_cb(node);
-
 	while ((parent = rb_parent(node)) && rb_is_red(parent))
 	{
 		gparent = rb_parent(parent);
@@ -240,15 +227,12 @@ void rb_erase(struct rb_node *node, struct rb_root *root)
 	else
 	{
 		struct rb_node *old = node, *left;
-		int old_parent_cb = 0;
-		int successor_parent_cb = 0;
 
 		node = node->rb_right;
 		while ((left = node->rb_left) != NULL)
 			node = left;
 
 		if (rb_parent(old)) {
-			old_parent_cb = 1;
 			if (rb_parent(old)->rb_left == old)
 				rb_parent(old)->rb_left = node;
 			else
@@ -263,10 +247,8 @@ void rb_erase(struct rb_node *node, struct rb_root *root)
 		if (parent == old) {
 			parent = node;
 		} else {
-			successor_parent_cb = 1;
 			if (child)
 				rb_set_parent(child, parent);
-
 			parent->rb_left = child;
 
 			node->rb_right = old->rb_right;
@@ -277,24 +259,6 @@ void rb_erase(struct rb_node *node, struct rb_root *root)
 		node->rb_left = old->rb_left;
 		rb_set_parent(old->rb_left, node);
 
-		if (root->augment_cb) {
-			/*
-			 * Here, three different nodes can have new children.
-			 * The parent of the successor node that was selected
-			 * to replace the node to be erased.
-			 * The node that is getting erased and is now replaced
-			 * by its successor.
-			 * The parent of the node getting erased-replaced.
-			 */
-			if (successor_parent_cb)
-				root->augment_cb(parent);
-
-			root->augment_cb(node);
-
-			if (old_parent_cb)
-				root->augment_cb(rb_parent(old));
-		}
-
 		goto color;
 	}
 
@@ -303,19 +267,15 @@ void rb_erase(struct rb_node *node, struct rb_root *root)
 
 	if (child)
 		rb_set_parent(child, parent);
-
-	if (parent) {
+	if (parent)
+	{
 		if (parent->rb_left == node)
 			parent->rb_left = child;
 		else
 			parent->rb_right = child;
-
-		if (root->augment_cb)
-			root->augment_cb(parent);
-
-	} else {
-		root->rb_node = child;
 	}
+	else
+		root->rb_node = child;
 
  color:
 	if (color == RB_BLACK)
@@ -324,6 +284,55 @@ void rb_erase(struct rb_node *node, struct rb_root *root)
 EXPORT_SYMBOL(rb_erase);
 
 /*
+ * after inserting @node into the tree, update the tree to account for
+ * both the new entry and any damage done by rebalance
+ */
+void rb_augment_insert(struct rb_node *node, rb_augment_f func, void *data)
+{
+	if (node->rb_left)
+		node = node->rb_left;
+	else if (node->rb_right)
+		node = node->rb_right;
+
+	func(node, data);
+}
+
+/*
+ * before removing the node, find the deepest node on the rebalance path
+ * that will still be there after @se gets removed
+ */
+struct rb_node *rb_augment_erase_begin(struct rb_node *node)
+{
+	struct rb_node *deepest;
+
+	if (!node->rb_right && !node->rb_left)
+		deepest = rb_parent(node);
+	else if (!node->rb_right)
+		deepest = node->rb_left;
+	else if (!node->rb_left)
+		deepest = node->rb_right;
+	else {
+		deepest = rb_next(node);
+		if (deepest->rb_right)
+			deepest = deepest->rb_right;
+		else if (rb_parent(deepest) != node)
+			deepest = rb_parent(deepest);
+	}
+
+	return deepest;
+}
+
+/*
+ * after removal, update the tree to account for the removed entry
+ * and any rebalance damage.
+ */
+void rb_augment_erase_end(struct rb_node *node, rb_augment_f func, void *data)
+{
+	if (node)
+		func(node, data);
+}
+
+/*
  * This function returns the first node (in sort order) of the tree.
  */
 struct rb_node *rb_first(const struct rb_root *root)


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH] rbtree: undo augmented damage -v2
  2010-05-29 12:29       ` [RFC][PATCH] rbtree: undo augmented damage Peter Zijlstra
@ 2010-05-29 13:31         ` Peter Zijlstra
  2010-06-01 17:00           ` Venkatesh Pallipadi
  0 siblings, 1 reply; 27+ messages in thread
From: Peter Zijlstra @ 2010-05-29 13:31 UTC (permalink / raw)
  To: mingo
  Cc: hpa, linux-kernel, Venkatesh Pallipadi, suresh.b.siddha, tglx,
	linux-tip-commits, Fabio Checconi

A better changelog, and I guess we can keep the explanation of
the augmented rb-tree since it doesn't actually mention the
interface, merely the idea of adding a heap to the binary tree.

The patch simply reverts the rbtree.c hunk, and keeps the non-conforming
style it has.

---
Subject: rbtree: Undo augmented damage
From: Peter Zijlstra <a.p.zijlstra@chello.nl>
Date: Sat May 29 15:14:02 CEST 2010

Reimplement augmented RB-trees without sprinkling extra branches
all over the RB-tree code (which lives in the scheduler hot path).

This approach is 'borrowed' from Fabio's BFQ implementation and
relies on traversing the rebalance path after the RB-tree-op to
correct the heap property for insertion/removal and make up for
the damage done by the tree rotations.

For insertion, the rebalance path is trivially the one from the new
node upwards to the root; for removal, it is the one from the deepest
node on the path from the to-be-removed node that will still
be around after the removal.
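
For a tree user the resulting call pattern is then roughly the following
(a sketch only; my_augment_cb stands for the user's per-node update
callback):

	rb_link_node(node, parent, link);
	rb_insert_color(node, root);
	rb_augment_insert(node, my_augment_cb, NULL);

	deepest = rb_augment_erase_begin(node);
	rb_erase(node, root);
	rb_augment_erase_end(deepest, my_augment_cb, NULL);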

Cc: Fabio Checconi <fabio@gandalf.sssup.it>
Cc: suresh.b.siddha@intel.com
Cc: venki@google.com
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
---
 arch/x86/mm/pat_rbtree.c |    9 ++--
 include/linux/rbtree.h   |   13 ++++--
 lib/rbtree.c             |   97 +++++++++++++++++++++++++----------------------
 3 files changed, 68 insertions(+), 51 deletions(-)

Index: linux-2.6/arch/x86/mm/pat_rbtree.c
===================================================================
--- linux-2.6.orig/arch/x86/mm/pat_rbtree.c
+++ linux-2.6/arch/x86/mm/pat_rbtree.c
@@ -34,8 +34,7 @@
  * memtype_lock protects the rbtree.
  */
 
-static void memtype_rb_augment_cb(struct rb_node *node);
-static struct rb_root memtype_rbroot = RB_AUGMENT_ROOT(&memtype_rb_augment_cb);
+static struct rb_root memtype_rbroot = RB_ROOT;
 
 static int is_node_overlap(struct memtype *node, u64 start, u64 end)
 {
@@ -190,7 +189,7 @@ failure:
 	return -EBUSY;
 }
 
-static void memtype_rb_augment_cb(struct rb_node *node)
+static void memtype_rb_augment_cb(struct rb_node *node, void *data)
 {
 	if (node)
 		update_path_max_end(node);
@@ -213,6 +212,7 @@ static void memtype_rb_insert(struct rb_
 
 	rb_link_node(&newdata->rb, parent, node);
 	rb_insert_color(&newdata->rb, root);
+	rb_augment_insert(&newdata->rb, memtype_rb_augment_cb, NULL);
 }
 
 int rbt_memtype_check_insert(struct memtype *new, unsigned long *ret_type)
@@ -233,13 +233,16 @@ int rbt_memtype_check_insert(struct memt
 
 struct memtype *rbt_memtype_erase(u64 start, u64 end)
 {
+	struct rb_node *deepest;
 	struct memtype *data;
 
 	data = memtype_rb_exact_match(&memtype_rbroot, start, end);
 	if (!data)
 		goto out;
 
+	deepest = rb_augment_erase_begin(&data->rb);
 	rb_erase(&data->rb, &memtype_rbroot);
+	rb_augment_erase_end(deepest, memtype_rb_augment_cb, NULL);
 out:
 	return data;
 }
Index: linux-2.6/include/linux/rbtree.h
===================================================================
--- linux-2.6.orig/include/linux/rbtree.h
+++ linux-2.6/include/linux/rbtree.h
@@ -110,7 +110,6 @@ struct rb_node
 struct rb_root
 {
 	struct rb_node *rb_node;
-	void (*augment_cb)(struct rb_node *node);
 };
 
 
@@ -130,9 +129,7 @@ static inline void rb_set_color(struct r
 	rb->rb_parent_color = (rb->rb_parent_color & ~1) | color;
 }
 
-#define RB_ROOT	(struct rb_root) { NULL, NULL, }
-#define RB_AUGMENT_ROOT(x)	(struct rb_root) { NULL, x}
-
+#define RB_ROOT	(struct rb_root) { NULL, }
 #define	rb_entry(ptr, type, member) container_of(ptr, type, member)
 
 #define RB_EMPTY_ROOT(root)	((root)->rb_node == NULL)
@@ -142,6 +139,14 @@ static inline void rb_set_color(struct r
 extern void rb_insert_color(struct rb_node *, struct rb_root *);
 extern void rb_erase(struct rb_node *, struct rb_root *);
 
+typedef void (*rb_augment_f)(struct rb_node *node, void *data);
+
+extern void rb_augment_insert(struct rb_node *node,
+			      rb_augment_f func, void *data);
+extern struct rb_node *rb_augment_erase_begin(struct rb_node *node);
+extern void rb_augment_erase_end(struct rb_node *node,
+				 rb_augment_f func, void *data);
+
 /* Find logical next and previous nodes in a tree */
 extern struct rb_node *rb_next(const struct rb_node *);
 extern struct rb_node *rb_prev(const struct rb_node *);
Index: linux-2.6/lib/rbtree.c
===================================================================
--- linux-2.6.orig/lib/rbtree.c
+++ linux-2.6/lib/rbtree.c
@@ -44,11 +44,6 @@ static void __rb_rotate_left(struct rb_n
 	else
 		root->rb_node = right;
 	rb_set_parent(node, right);
-
-	if (root->augment_cb) {
-		root->augment_cb(node);
-		root->augment_cb(right);
-	}
 }
 
 static void __rb_rotate_right(struct rb_node *node, struct rb_root *root)
@@ -72,20 +67,12 @@ static void __rb_rotate_right(struct rb_
 	else
 		root->rb_node = left;
 	rb_set_parent(node, left);
-
-	if (root->augment_cb) {
-		root->augment_cb(node);
-		root->augment_cb(left);
-	}
 }
 
 void rb_insert_color(struct rb_node *node, struct rb_root *root)
 {
 	struct rb_node *parent, *gparent;
 
-	if (root->augment_cb)
-		root->augment_cb(node);
-
 	while ((parent = rb_parent(node)) && rb_is_red(parent))
 	{
 		gparent = rb_parent(parent);
@@ -240,15 +227,12 @@ void rb_erase(struct rb_node *node, stru
 	else
 	{
 		struct rb_node *old = node, *left;
-		int old_parent_cb = 0;
-		int successor_parent_cb = 0;
 
 		node = node->rb_right;
 		while ((left = node->rb_left) != NULL)
 			node = left;
 
 		if (rb_parent(old)) {
-			old_parent_cb = 1;
 			if (rb_parent(old)->rb_left == old)
 				rb_parent(old)->rb_left = node;
 			else
@@ -263,10 +247,8 @@ void rb_erase(struct rb_node *node, stru
 		if (parent == old) {
 			parent = node;
 		} else {
-			successor_parent_cb = 1;
 			if (child)
 				rb_set_parent(child, parent);
-
 			parent->rb_left = child;
 
 			node->rb_right = old->rb_right;
@@ -277,24 +259,6 @@ void rb_erase(struct rb_node *node, stru
 		node->rb_left = old->rb_left;
 		rb_set_parent(old->rb_left, node);
 
-		if (root->augment_cb) {
-			/*
-			 * Here, three different nodes can have new children.
-			 * The parent of the successor node that was selected
-			 * to replace the node to be erased.
-			 * The node that is getting erased and is now replaced
-			 * by its successor.
-			 * The parent of the node getting erased-replaced.
-			 */
-			if (successor_parent_cb)
-				root->augment_cb(parent);
-
-			root->augment_cb(node);
-
-			if (old_parent_cb)
-				root->augment_cb(rb_parent(old));
-		}
-
 		goto color;
 	}
 
@@ -303,19 +267,15 @@ void rb_erase(struct rb_node *node, stru
 
 	if (child)
 		rb_set_parent(child, parent);
-
-	if (parent) {
+	if (parent)
+	{
 		if (parent->rb_left == node)
 			parent->rb_left = child;
 		else
 			parent->rb_right = child;
-
-		if (root->augment_cb)
-			root->augment_cb(parent);
-
-	} else {
-		root->rb_node = child;
 	}
+	else
+		root->rb_node = child;
 
  color:
 	if (color == RB_BLACK)
@@ -324,6 +284,55 @@ void rb_erase(struct rb_node *node, stru
 EXPORT_SYMBOL(rb_erase);
 
 /*
+ * after inserting @node into the tree, update the tree to account for
+ * both the new entry and any damage done by rebalance
+ */
+void rb_augment_insert(struct rb_node *node, rb_augment_f func, void *data)
+{
+	if (node->rb_left)
+		node = node->rb_left;
+	else if (node->rb_right)
+		node = node->rb_right;
+
+	func(node, data);
+}
+
+/*
+ * before removing the node, find the deepest node on the rebalance path
+ * that will still be there after @node gets removed
+ */
+struct rb_node *rb_augment_erase_begin(struct rb_node *node)
+{
+	struct rb_node *deepest;
+
+	if (!node->rb_right && !node->rb_left)
+		deepest = rb_parent(node);
+	else if (!node->rb_right)
+		deepest = node->rb_left;
+	else if (!node->rb_left)
+		deepest = node->rb_right;
+	else {
+		deepest = rb_next(node);
+		if (deepest->rb_right)
+			deepest = deepest->rb_right;
+		else if (rb_parent(deepest) != node)
+			deepest = rb_parent(deepest);
+	}
+
+	return deepest;
+}
+
+/*
+ * after removal, update the tree to account for the removed entry
+ * and any rebalance damage.
+ */
+void rb_augment_erase_end(struct rb_node *node, rb_augment_f func, void *data)
+{
+	if (node)
+		func(node, data);
+}
+
+/*
  * This function returns the first node (in sort order) of the tree.
  */
 struct rb_node *rb_first(const struct rb_root *root)


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH] rbtree: undo augmented damage -v2
  2010-05-29 13:31         ` [PATCH] rbtree: undo augmented damage -v2 Peter Zijlstra
@ 2010-06-01 17:00           ` Venkatesh Pallipadi
  2010-06-01 17:19             ` Peter Zijlstra
  0 siblings, 1 reply; 27+ messages in thread
From: Venkatesh Pallipadi @ 2010-06-01 17:00 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: mingo, hpa, linux-kernel, suresh.b.siddha, tglx,
	linux-tip-commits, Fabio Checconi

On Sat, May 29, 2010 at 6:31 AM, Peter Zijlstra <peterz@infradead.org> wrote:
> A better changelog, and I guess we can keep the explanation of
> the augmented rb-tree since it doesn't actually mentions the
> interface, merely the idea of adding a heap to the binary tree.
>
> The patch simply reverts the rbtree.c hunk, and keeps the non-conforming
> style it has..
>
> ---
> Subject: rbtree: Undo augmented damage
> From: Peter Zijlstra <a.p.zijlstra@chello.nl>
> Date: Sat May 29 15:14:02 CEST 2010
>
> Reimplement augmented RB-trees without sprinkling extra branches
> all over the RB-tree code (which lives in the scheduler hot path).
>
> This approach is 'borrowed' from Fabio's BFQ implementation and
> relies on traversing the rebalance path after the RB-tree-op to
> correct the heap property for insertion/removal and make up for
> the damage done by the tree rotations.
>
> For insertion the rebalance path is trivially that from the new
> node upwards to the root, for removal it is that from the deepest
> node in the path from the to be removed node that will still
> be around after the removal.

Yes. I like avoiding the sprinkled if's. But I don't think
rb_augment_erase_begin() and rb_augment_insert() are covering all the
callbacks needed to maintain the augmented tree correctly. More
comments below.

>
> Cc: Fabio Checconi <fabio@gandalf.sssup.it>
> Cc: suresh.b.siddha@intel.com
> Cc: venki@google.com
> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
> ---
>  arch/x86/mm/pat_rbtree.c |    9 ++--
>  include/linux/rbtree.h   |   13 ++++--
>  lib/rbtree.c             |   97 +++++++++++++++++++++++++----------------------
>  3 files changed, 68 insertions(+), 51 deletions(-)
>
> Index: linux-2.6/arch/x86/mm/pat_rbtree.c
> ===================================================================
> --- linux-2.6.orig/arch/x86/mm/pat_rbtree.c
> +++ linux-2.6/arch/x86/mm/pat_rbtree.c
> @@ -34,8 +34,7 @@
>  * memtype_lock protects the rbtree.
>  */
>
> -static void memtype_rb_augment_cb(struct rb_node *node);
> -static struct rb_root memtype_rbroot = RB_AUGMENT_ROOT(&memtype_rb_augment_cb);
> +static struct rb_root memtype_rbroot = RB_ROOT;
>
>  static int is_node_overlap(struct memtype *node, u64 start, u64 end)
>  {
> @@ -190,7 +189,7 @@ failure:
>        return -EBUSY;
>  }
>
> -static void memtype_rb_augment_cb(struct rb_node *node)
> +static void memtype_rb_augment_cb(struct rb_node *node, void *data)
>  {
>        if (node)
>                update_path_max_end(node);

There is an optimization in memtype_rb_augment_cb, wherein it does not
do the fixup all the way to the root when a node's max_end does not
change. That has to be removed for this change to work.
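
For reference, the loop in question as it currently stands in
pat_rbtree.c (copied here as a sketch, with a comment added to mark the
early exit I mean; the early break assumes a node's subtree_max_end can
only change when one of its children changed it, which no longer holds
once the callback runs only after the rebalance):

static void update_path_max_end(struct rb_node *node)
{
	u64 old_max_end, new_max_end;

	while (node) {
		struct memtype *data = container_of(node, struct memtype, rb);

		old_max_end = data->subtree_max_end;
		update_node_max_end(node);
		new_max_end = data->subtree_max_end;

		/* early exit: stop as soon as nothing changed at this level */
		if (new_max_end == old_max_end)
			break;

		node = rb_parent(node);
	}
}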


> @@ -213,6 +212,7 @@ static void memtype_rb_insert(struct rb_
>
>        rb_link_node(&newdata->rb, parent, node);
>        rb_insert_color(&newdata->rb, root);
> +       rb_augment_insert(&newdata->rb, memtype_rb_augment_cb, NULL);
>  }
>

<snip>

> @@ -324,6 +284,55 @@ void rb_erase(struct rb_node *node, stru
>  EXPORT_SYMBOL(rb_erase);
>
>  /*
> + * after inserting @node into the tree, update the tree to account for
> + * both the new entry and any damage done by rebalance
> + */
> +void rb_augment_insert(struct rb_node *node, rb_augment_f func, void *data)
> +{
> +       if (node->rb_left)
> +               node = node->rb_left;
> +       else if (node->rb_right)
> +               node = node->rb_right;
> +
> +       func(node, data);
> +}

I am not seeing how this can cover all the callbacks we will need to
maintain the augmented tree property. Maybe I am missing something.

For example, during insertion there can be a rotate at the
grandparent. So, (bad ascii art alert)

    G
   / \
  P   U
 /
N

    P
   / \
  N   G
       \
        U

Where N is the new node. Here G has new children and has to
recalculate the node's augmented data. I don't see that happening with
rb_augment_insert().
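
To spell it out against the code quoted above (just a trace, nothing
new): after rb_insert_color() has produced the rotated tree, N is a
leaf, so

	rb_augment_insert(N, memtype_rb_augment_cb, NULL)
		N->rb_left == NULL, N->rb_right == NULL
			-> func(N) only
		memtype_rb_augment_cb(N) -> update_path_max_end(N)
			-> walks rb_parent(): N, P, then P's ancestors

G is now P's right child, no longer an ancestor of N, so its
subtree_max_end is never recomputed.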


> +
> +/*
> + * before removing the node, find the deepest node on the rebalance path
> + * that will still be there after @node gets removed
> + */
> +struct rb_node *rb_augment_erase_begin(struct rb_node *node)
> +{
> +       struct rb_node *deepest;
> +
> +       if (!node->rb_right && !node->rb_left)
> +               deepest = rb_parent(node);
> +       else if (!node->rb_right)
> +               deepest = node->rb_left;
> +       else if (!node->rb_left)
> +               deepest = node->rb_right;
> +       else {
> +               deepest = rb_next(node);
> +               if (deepest->rb_right)
> +                       deepest = deepest->rb_right;
> +               else if (rb_parent(deepest) != node)
> +                       deepest = rb_parent(deepest);
> +       }
> +
> +       return deepest;
> +}
> +

I have a similar concern with rb_augment_erase_begin() returning the
deepest node. There is a case in __rb_erase_color where we do a
rotate_left of the 'other' (sibling) node. That changes the children of
some nodes that are not on the deepest-to-root path. No?

Thanks,
Venki

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH] rbtree: undo augmented damage -v2
  2010-06-01 17:00           ` Venkatesh Pallipadi
@ 2010-06-01 17:19             ` Peter Zijlstra
  2010-06-01 17:34               ` Peter Zijlstra
  2010-06-01 17:34               ` Venkatesh Pallipadi
  0 siblings, 2 replies; 27+ messages in thread
From: Peter Zijlstra @ 2010-06-01 17:19 UTC (permalink / raw)
  To: Venkatesh Pallipadi
  Cc: mingo, hpa, linux-kernel, suresh.b.siddha, tglx,
	linux-tip-commits, Fabio Checconi

On Tue, 2010-06-01 at 10:00 -0700, Venkatesh Pallipadi wrote:
> I am not seeing how this can cover all the callbacks we will need to
> maintain the augmented tree property. May be I am missing something.

Yes, indeed, how about the below delta, which my eevdf code did do but I
overlooked on the conversion to the PAT code.

That removes the break as you said, but also adds code to update the
child nodes when walking up the path.

So in your rotation case:

    G             P
   / \    -->    / \
  P   U         N   G
 /        <--        \
N                     U

Say we take the right rotation, then the traversal up from N to P will
find that since N was the left child of P and it has a right child (G)
it will also update G.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
---
 arch/x86/mm/pat_rbtree.c |   23 ++++++++++++-----------
 1 files changed, 12 insertions(+), 11 deletions(-)

diff --git a/arch/x86/mm/pat_rbtree.c b/arch/x86/mm/pat_rbtree.c
index f537087..ca19aae 100644
--- a/arch/x86/mm/pat_rbtree.c
+++ b/arch/x86/mm/pat_rbtree.c
@@ -81,20 +81,21 @@ static void update_node_max_end(struct rb_node *node)
 /* Update 'subtree_max_end' for a node and all its ancestors */
 static void update_path_max_end(struct rb_node *node)
 {
-	u64 old_max_end, new_max_end;
+	struct rb_node *parent;
 
-	while (node) {
-		struct memtype *data = container_of(node, struct memtype, rb);
-
-		old_max_end = data->subtree_max_end;
-		update_node_max_end(node);
-		new_max_end = data->subtree_max_end;
+up:
+	update_node_max_end(node);
+	parent = rb_parent(node);
+	if (!parent)
+		return;
 
-		if (new_max_end == old_max_end)
-			break;
+	if (node == parent->rb_left && parent->rb_right)
+		update_node_max_end(parent->rb_right);
+	else if (parent->rb_left)
+		update_node_max_end(parent->rb_left);
 
-		node = rb_parent(node);
-	}
+	node = parent;
+	goto up;
 }
 
 /* Find the first (lowest start addr) overlapping range from rb tree */


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* Re: [PATCH] rbtree: undo augmented damage -v2
  2010-06-01 17:19             ` Peter Zijlstra
@ 2010-06-01 17:34               ` Peter Zijlstra
  2010-06-01 17:34               ` Venkatesh Pallipadi
  1 sibling, 0 replies; 27+ messages in thread
From: Peter Zijlstra @ 2010-06-01 17:34 UTC (permalink / raw)
  To: Venkatesh Pallipadi
  Cc: mingo, hpa, linux-kernel, suresh.b.siddha, tglx,
	linux-tip-commits, Fabio Checconi

On Tue, 2010-06-01 at 19:19 +0200, Peter Zijlstra wrote:
> On Tue, 2010-06-01 at 10:00 -0700, Venkatesh Pallipadi wrote:
> > I am not seeing how this can cover all the callbacks we will need to
> > maintain the augmented tree property. May be I am missing something.
> 
> Yes, indeed, how about the below delta, which my eevdf code did do but I
> overlooked on the conversion to the PAT code.
> 
> That removes the break as you said, but also adds code to update the
> child nodes when walking up the path.
> 
> So in your rotation case:
> 
>     G             P
>    / \    -->    / \
>   P   U         N   G
>  /        <--        \
> N                     U
> 
> Say we take the right rotation, then the traversal up from N to P will
> find that since N was the left child of P and it has a right child (G)
> it will also update G.

Hmm, that looks like it could well be folded into the generic code...
let me spin a new patch.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH] rbtree: undo augmented damage -v2
  2010-06-01 17:19             ` Peter Zijlstra
  2010-06-01 17:34               ` Peter Zijlstra
@ 2010-06-01 17:34               ` Venkatesh Pallipadi
  2010-06-01 17:42                 ` Peter Zijlstra
  1 sibling, 1 reply; 27+ messages in thread
From: Venkatesh Pallipadi @ 2010-06-01 17:34 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: mingo, hpa, linux-kernel, suresh.b.siddha, tglx,
	linux-tip-commits, Fabio Checconi

On Tue, Jun 1, 2010 at 10:19 AM, Peter Zijlstra <peterz@infradead.org> wrote:
> On Tue, 2010-06-01 at 10:00 -0700, Venkatesh Pallipadi wrote:
>> I am not seeing how this can cover all the callbacks we will need to
>> maintain the augmented tree property. May be I am missing something.
>
> Yes, indeed, how about the below delta, which my eevdf code did do but I
> overlooked on the conversion to the PAT code.
>
> That removes the break as you said, but also adds code to update the
> child nodes when walking up the path.
>
> So in your rotation case:
>
>    G             P
>   / \    -->    / \
>  P   U         N   G
>  /        <--        \
> N                     U
>
> Say we take the right rotation, then the traversal up from N to P will
> find that since N was the left child of P and it has a right child (G)
> it will also update G.

Yes. This will cover all the cases on insert. But on erase, there is
still a case where a rotate of sibling node is done during the
re-coloration process. There we have a child change on sibling's
child. I am not able to think of any easy way to handle that case.

Thanks,
Venki

>
> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
> ---
>  arch/x86/mm/pat_rbtree.c |   23 ++++++++++++-----------
>  1 files changed, 12 insertions(+), 11 deletions(-)
>
> diff --git a/arch/x86/mm/pat_rbtree.c b/arch/x86/mm/pat_rbtree.c
> index f537087..ca19aae 100644
> --- a/arch/x86/mm/pat_rbtree.c
> +++ b/arch/x86/mm/pat_rbtree.c
> @@ -81,20 +81,21 @@ static void update_node_max_end(struct rb_node *node)
>  /* Update 'subtree_max_end' for a node and all its ancestors */
>  static void update_path_max_end(struct rb_node *node)
>  {
> -       u64 old_max_end, new_max_end;
> +       struct rb_node *parent;
>
> -       while (node) {
> -               struct memtype *data = container_of(node, struct memtype, rb);
> -
> -               old_max_end = data->subtree_max_end;
> -               update_node_max_end(node);
> -               new_max_end = data->subtree_max_end;
> +up:
> +       update_node_max_end(node);
> +       parent = rb_parent(node);
> +       if (!parent)
> +               return;
>
> -               if (new_max_end == old_max_end)
> -                       break;
> +       if (node == parent->rb_left && parent->rb_right)
> +               update_node_max_end(parent->rb_right);
> +       else if (parent->rb_left)
> +               update_node_max_end(parent->rb_left);
>
> -               node = rb_parent(node);
> -       }
> +       node = parent;
> +       goto up;
>  }
>
>  /* Find the first (lowest start addr) overlapping range from rb tree */
>
>

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH] rbtree: undo augmented damage -v2
  2010-06-01 17:34               ` Venkatesh Pallipadi
@ 2010-06-01 17:42                 ` Peter Zijlstra
  2010-06-01 18:09                   ` Venkatesh Pallipadi
                                     ` (2 more replies)
  0 siblings, 3 replies; 27+ messages in thread
From: Peter Zijlstra @ 2010-06-01 17:42 UTC (permalink / raw)
  To: Venkatesh Pallipadi
  Cc: mingo, hpa, linux-kernel, suresh.b.siddha, tglx,
	linux-tip-commits, Fabio Checconi

On Tue, 2010-06-01 at 10:34 -0700, Venkatesh Pallipadi wrote:
> 
> Yes. This will cover all the cases on insert. But on erase, there is
> still a case where a rotate of sibling node is done during the
> re-coloration process. There we have a child change on sibling's
> child. I am not able to think of any easy way to handle that case.

Let me go draw some figures with pen and paper to match up the erase
path with the rb_augment_erase_begin() code, because I can't quite spot
the case we're missing.

If you have it handy, ascii art might help..

Below a new patch with the missing bit folded into the generic code.


---
Subject: rbtree: Undo augmented damage 
From: Peter Zijlstra <peterz@infradead.org>
Date: Sat, 29 May 2010 15:31:43 +0200

Reimplement augmented RB-trees without sprinkling extra branches
all over the RB-tree code (which lives in the scheduler hot path).

This approach is 'borrowed' from Fabio's BFQ implementation and
relies on traversing the rebalance path after the RB-tree-op to
correct the heap property for insertion/removal and make up for
the damage done by the tree rotations.

For insertion the rebalance path is trivially that from the new
node upwards to the root, for removal it is that from the deepest
node in the path from the to be removed node that will still
be around after the removal.

Cc: Fabio Checconi <fabio@gandalf.sssup.it>
Cc: suresh.b.siddha@intel.com
Cc: venki@google.com
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
---
 arch/x86/mm/pat_rbtree.c |   34 ++-----------
 include/linux/rbtree.h   |   13 +++--
 lib/rbtree.c             |  116 +++++++++++++++++++++++++++++------------------
 3 files changed, 87 insertions(+), 76 deletions(-)

Index: linux-2.6/arch/x86/mm/pat_rbtree.c
===================================================================
--- linux-2.6.orig/arch/x86/mm/pat_rbtree.c
+++ linux-2.6/arch/x86/mm/pat_rbtree.c
@@ -34,8 +34,7 @@
  * memtype_lock protects the rbtree.
  */
 
-static void memtype_rb_augment_cb(struct rb_node *node);
-static struct rb_root memtype_rbroot = RB_AUGMENT_ROOT(&memtype_rb_augment_cb);
+static struct rb_root memtype_rbroot = RB_ROOT;
 
 static int is_node_overlap(struct memtype *node, u64 start, u64 end)
 {
@@ -56,7 +55,7 @@ static u64 get_subtree_max_end(struct rb
 }
 
 /* Update 'subtree_max_end' for a node, based on node and its children */
-static void update_node_max_end(struct rb_node *node)
+static void memtype_rb_augment_cb(struct rb_node *node, void *__unused)
 {
 	struct memtype *data;
 	u64 max_end, child_max_end;
@@ -78,25 +77,6 @@ static void update_node_max_end(struct r
 	data->subtree_max_end = max_end;
 }
 
-/* Update 'subtree_max_end' for a node and all its ancestors */
-static void update_path_max_end(struct rb_node *node)
-{
-	u64 old_max_end, new_max_end;
-
-	while (node) {
-		struct memtype *data = container_of(node, struct memtype, rb);
-
-		old_max_end = data->subtree_max_end;
-		update_node_max_end(node);
-		new_max_end = data->subtree_max_end;
-
-		if (new_max_end == old_max_end)
-			break;
-
-		node = rb_parent(node);
-	}
-}
-
 /* Find the first (lowest start addr) overlapping range from rb tree */
 static struct memtype *memtype_rb_lowest_match(struct rb_root *root,
 				u64 start, u64 end)
@@ -190,12 +170,6 @@ failure:
 	return -EBUSY;
 }
 
-static void memtype_rb_augment_cb(struct rb_node *node)
-{
-	if (node)
-		update_path_max_end(node);
-}
-
 static void memtype_rb_insert(struct rb_root *root, struct memtype *newdata)
 {
 	struct rb_node **node = &(root->rb_node);
@@ -213,6 +187,7 @@ static void memtype_rb_insert(struct rb_
 
 	rb_link_node(&newdata->rb, parent, node);
 	rb_insert_color(&newdata->rb, root);
+	rb_augment_insert(&newdata->rb, memtype_rb_augment_cb, NULL);
 }
 
 int rbt_memtype_check_insert(struct memtype *new, unsigned long *ret_type)
@@ -233,13 +208,16 @@ int rbt_memtype_check_insert(struct memt
 
 struct memtype *rbt_memtype_erase(u64 start, u64 end)
 {
+	struct rb_node *deepest;
 	struct memtype *data;
 
 	data = memtype_rb_exact_match(&memtype_rbroot, start, end);
 	if (!data)
 		goto out;
 
+	deepest = rb_augment_erase_begin(&data->rb);
 	rb_erase(&data->rb, &memtype_rbroot);
+	rb_augment_erase_end(deepest, memtype_rb_augment_cb, NULL);
 out:
 	return data;
 }
Index: linux-2.6/include/linux/rbtree.h
===================================================================
--- linux-2.6.orig/include/linux/rbtree.h
+++ linux-2.6/include/linux/rbtree.h
@@ -110,7 +110,6 @@ struct rb_node
 struct rb_root
 {
 	struct rb_node *rb_node;
-	void (*augment_cb)(struct rb_node *node);
 };
 
 
@@ -130,9 +129,7 @@ static inline void rb_set_color(struct r
 	rb->rb_parent_color = (rb->rb_parent_color & ~1) | color;
 }
 
-#define RB_ROOT	(struct rb_root) { NULL, NULL, }
-#define RB_AUGMENT_ROOT(x)	(struct rb_root) { NULL, x}
-
+#define RB_ROOT	(struct rb_root) { NULL, }
 #define	rb_entry(ptr, type, member) container_of(ptr, type, member)
 
 #define RB_EMPTY_ROOT(root)	((root)->rb_node == NULL)
@@ -142,6 +139,14 @@ static inline void rb_set_color(struct r
 extern void rb_insert_color(struct rb_node *, struct rb_root *);
 extern void rb_erase(struct rb_node *, struct rb_root *);
 
+typedef void (*rb_augment_f)(struct rb_node *node, void *data);
+
+extern void rb_augment_insert(struct rb_node *node,
+			      rb_augment_f func, void *data);
+extern struct rb_node *rb_augment_erase_begin(struct rb_node *node);
+extern void rb_augment_erase_end(struct rb_node *node,
+				 rb_augment_f func, void *data);
+
 /* Find logical next and previous nodes in a tree */
 extern struct rb_node *rb_next(const struct rb_node *);
 extern struct rb_node *rb_prev(const struct rb_node *);
Index: linux-2.6/lib/rbtree.c
===================================================================
--- linux-2.6.orig/lib/rbtree.c
+++ linux-2.6/lib/rbtree.c
@@ -44,11 +44,6 @@ static void __rb_rotate_left(struct rb_n
 	else
 		root->rb_node = right;
 	rb_set_parent(node, right);
-
-	if (root->augment_cb) {
-		root->augment_cb(node);
-		root->augment_cb(right);
-	}
 }
 
 static void __rb_rotate_right(struct rb_node *node, struct rb_root *root)
@@ -72,20 +67,12 @@ static void __rb_rotate_right(struct rb_
 	else
 		root->rb_node = left;
 	rb_set_parent(node, left);
-
-	if (root->augment_cb) {
-		root->augment_cb(node);
-		root->augment_cb(left);
-	}
 }
 
 void rb_insert_color(struct rb_node *node, struct rb_root *root)
 {
 	struct rb_node *parent, *gparent;
 
-	if (root->augment_cb)
-		root->augment_cb(node);
-
 	while ((parent = rb_parent(node)) && rb_is_red(parent))
 	{
 		gparent = rb_parent(parent);
@@ -240,15 +227,12 @@ void rb_erase(struct rb_node *node, stru
 	else
 	{
 		struct rb_node *old = node, *left;
-		int old_parent_cb = 0;
-		int successor_parent_cb = 0;
 
 		node = node->rb_right;
 		while ((left = node->rb_left) != NULL)
 			node = left;
 
 		if (rb_parent(old)) {
-			old_parent_cb = 1;
 			if (rb_parent(old)->rb_left == old)
 				rb_parent(old)->rb_left = node;
 			else
@@ -263,10 +247,8 @@ void rb_erase(struct rb_node *node, stru
 		if (parent == old) {
 			parent = node;
 		} else {
-			successor_parent_cb = 1;
 			if (child)
 				rb_set_parent(child, parent);
-
 			parent->rb_left = child;
 
 			node->rb_right = old->rb_right;
@@ -277,24 +259,6 @@ void rb_erase(struct rb_node *node, stru
 		node->rb_left = old->rb_left;
 		rb_set_parent(old->rb_left, node);
 
-		if (root->augment_cb) {
-			/*
-			 * Here, three different nodes can have new children.
-			 * The parent of the successor node that was selected
-			 * to replace the node to be erased.
-			 * The node that is getting erased and is now replaced
-			 * by its successor.
-			 * The parent of the node getting erased-replaced.
-			 */
-			if (successor_parent_cb)
-				root->augment_cb(parent);
-
-			root->augment_cb(node);
-
-			if (old_parent_cb)
-				root->augment_cb(rb_parent(old));
-		}
-
 		goto color;
 	}
 
@@ -303,19 +267,15 @@ void rb_erase(struct rb_node *node, stru
 
 	if (child)
 		rb_set_parent(child, parent);
-
-	if (parent) {
+	if (parent)
+	{
 		if (parent->rb_left == node)
 			parent->rb_left = child;
 		else
 			parent->rb_right = child;
-
-		if (root->augment_cb)
-			root->augment_cb(parent);
-
-	} else {
-		root->rb_node = child;
 	}
+	else
+		root->rb_node = child;
 
  color:
 	if (color == RB_BLACK)
@@ -323,6 +283,74 @@ void rb_erase(struct rb_node *node, stru
 }
 EXPORT_SYMBOL(rb_erase);
 
+static void rb_augment_path(struct rb_node *node, rb_augment_f func, void *data)
+{
+	struct rb_node *parent;
+
+up:
+	func(node, data);
+	parent = rb_parent(node);
+	if (!parent)
+		return;
+
+	if (node == parent->rb_left && parent->rb_right)
+		func(parent->rb_right, data);
+	else if (parent->rb_left)
+		func(parent->rb_left, data);
+
+	node = parent;
+	goto up;
+}
+
+/*
+ * after inserting @node into the tree, update the tree to account for
+ * both the new entry and any damage done by rebalance
+ */
+void rb_augment_insert(struct rb_node *node, rb_augment_f func, void *data)
+{
+	if (node->rb_left)
+		node = node->rb_left;
+	else if (node->rb_right)
+		node = node->rb_right;
+
+	rb_augment_path(node, func, data);
+}
+
+/*
+ * before removing the node, find the deepest node on the rebalance path
+ * that will still be there after @node gets removed
+ */
+struct rb_node *rb_augment_erase_begin(struct rb_node *node)
+{
+	struct rb_node *deepest;
+
+	if (!node->rb_right && !node->rb_left)
+		deepest = rb_parent(node);
+	else if (!node->rb_right)
+		deepest = node->rb_left;
+	else if (!node->rb_left)
+		deepest = node->rb_right;
+	else {
+		deepest = rb_next(node);
+		if (deepest->rb_right)
+			deepest = deepest->rb_right;
+		else if (rb_parent(deepest) != node)
+			deepest = rb_parent(deepest);
+	}
+
+	return deepest;
+}
+
+/*
+ * after removal, update the tree to account for the removed entry
+ * and any rebalance damage.
+ */
+void rb_augment_erase_end(struct rb_node *node, rb_augment_f func, void *data)
+{
+	if (node)
+		rb_augment_path(node, func, data);
+}
+
 /*
  * This function returns the first node (in sort order) of the tree.
  */


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH] rbtree: undo augmented damage -v2
  2010-06-01 17:42                 ` Peter Zijlstra
@ 2010-06-01 18:09                   ` Venkatesh Pallipadi
  2010-06-01 18:42                     ` Peter Zijlstra
  2010-06-09 10:12                   ` [tip:x86/pat] rbtree: Undo augmented trees performance damage tip-bot for Peter Zijlstra
  2010-07-05 12:48                   ` [tip:x86/urgent] rbtree: Undo augmented trees performance damage and regression tip-bot for Peter Zijlstra
  2 siblings, 1 reply; 27+ messages in thread
From: Venkatesh Pallipadi @ 2010-06-01 18:09 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: mingo, hpa, linux-kernel, suresh.b.siddha, tglx,
	linux-tip-commits, Fabio Checconi

On Tue, Jun 1, 2010 at 10:42 AM, Peter Zijlstra <peterz@infradead.org> wrote:
> On Tue, 2010-06-01 at 10:34 -0700, Venkatesh Pallipadi wrote:
>>
>> Yes. This will cover all the cases on insert. But on erase, there is
>> still a case where a rotate of sibling node is done during the
>> re-coloration process. There we have a child change on sibling's
>> child. I am not able to think of any easy way to handle that case.
>
> Let me go draw some figures with pen and paper to match up the erase
> path with the rb_augment_erase_begin() code, because I can't quite spot
> the case we're missing.
>
> If you have it handy, ascii art might help..

It is this case

    P
   / \
  N   S
     / \
    SL SR

changing to

    P
   / \
  N  SL
      \
       S
        \
        SR

This can happen in __rb_erase_color, where 'other' points to the sibling node.

Thanks,
Venki

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH] rbtree: undo augmented damage -v2
  2010-06-01 18:09                   ` Venkatesh Pallipadi
@ 2010-06-01 18:42                     ` Peter Zijlstra
  2010-06-01 20:31                       ` Venkatesh Pallipadi
  2010-06-02 21:11                       ` Suresh Siddha
  0 siblings, 2 replies; 27+ messages in thread
From: Peter Zijlstra @ 2010-06-01 18:42 UTC (permalink / raw)
  To: Venkatesh Pallipadi
  Cc: mingo, hpa, linux-kernel, suresh.b.siddha, tglx,
	linux-tip-commits, Fabio Checconi

On Tue, 2010-06-01 at 11:09 -0700, Venkatesh Pallipadi wrote:
> On Tue, Jun 1, 2010 at 10:42 AM, Peter Zijlstra <peterz@infradead.org> wrote:
> > On Tue, 2010-06-01 at 10:34 -0700, Venkatesh Pallipadi wrote:
> >>
> >> Yes. This will cover all the cases on insert. But on erase, there is
> >> still a case where a rotate of sibling node is done during the
> >> re-coloration process. There we have a child change on sibling's
> >> child. I am not able to think of any easy way to handle that case.
> >
> > Let me go draw some figures with pen and paper to match up the erase
> > path with the rb_augment_erase_begin() code, because I can't quite spot
> > the case we're missing.
> >
> > If you have it handy, ascii art might help..
> 
> It is this case
> 
>     P
>    / \
>   N   S
>      / \
>     SL SR
> 
> changing to
> 
>     P
>    / \
>   N  SL
>       \
>        S
>         \
>         SR

Right, but see: http://en.wikipedia.org/wiki/Red-black_tree
That is delete_case5, however then we fall into delete_case6 and perform
a left rotation.

So suppose we start with the tree:

    P                  P               P                SL
  /   \               / \             / \               / \
 D     S      -->    N  S      -->   N   SL     -->    P   S
  \   / \              / \                \           /     \
   N SL  SR          SL* SR                S*        N      SR
                                            \
                                             SR

and then remove D, delete case 5 and finally delete case 6, * marks red.

rb_augment_erase_begin(D) will return N, and then rb_augment_path(N)
will re-augment: N, P, SL and S.
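
Spelled out against the code above (the same steps, as a trace):

	rb_augment_erase_begin(D):
		D->rb_left == NULL, D->rb_right == N
			-> deepest = D->rb_right = N

	rb_erase(D);			/* rebalances into the last tree */

	rb_augment_erase_end(N, ...) -> rb_augment_path(N):
		func(N);  parent = P, P->rb_right == NULL
			  -> the sibling branch degenerates to func(N) again
		func(P);  parent = SL, P == SL->rb_left, SL->rb_right == S
			  -> func(S)
		func(SL); parent == NULL -> done

so N, P, S and SL all get re-augmented.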



^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH] rbtree: undo augmented damage -v2
  2010-06-01 18:42                     ` Peter Zijlstra
@ 2010-06-01 20:31                       ` Venkatesh Pallipadi
  2010-06-02 21:11                       ` Suresh Siddha
  1 sibling, 0 replies; 27+ messages in thread
From: Venkatesh Pallipadi @ 2010-06-01 20:31 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: mingo, hpa, linux-kernel, suresh.b.siddha, tglx,
	linux-tip-commits, Fabio Checconi

On Tue, Jun 1, 2010 at 11:42 AM, Peter Zijlstra <peterz@infradead.org> wrote:
> On Tue, 2010-06-01 at 11:09 -0700, Venkatesh Pallipadi wrote:
>> On Tue, Jun 1, 2010 at 10:42 AM, Peter Zijlstra <peterz@infradead.org> wrote:
>> > On Tue, 2010-06-01 at 10:34 -0700, Venkatesh Pallipadi wrote:
>> >>
>> >> Yes. This will cover all the cases on insert. But on erase, there is
>> >> still a case where a rotate of sibling node is done during the
>> >> re-coloration process. There we have a child change on sibling's
>> >> child. I am not able to think of any easy way to handle that case.
>> >
>> > Let me go draw some figures with pen and paper to match up the erase
>> > path with the rb_augment_erase_begin() code, because I can't quite spot
>> > the case we're missing.
>> >
>> > If you have it handy, ascii art might help..
>>
>> It is this case
>>
>>     P
>>    / \
>>   N   S
>>      / \
>>     SL SR
>>
>> changing to
>>
>>     P
>>    / \
>>   N  SL
>>       \
>>        S
>>         \
>>         SR
>
> Right, but see: http://en.wikipedia.org/wiki/Red-black_tree
> That is delete_case5, however then we fall into delete_case6 and perform
> a left rotation.
>
> So suppose we start with the tree:
>
>    P                  P               P                SL
>  /   \               / \             / \               / \
>  D     S      -->    N  S      -->   N   SL     -->    P   S
>  \   / \              / \                \           /     \
>   N SL  SR          SL* SR                S*        N      SR
>                                            \
>                                             SR
>
> and then remove D, delete case 5 and finally delete case 6, * marks red.
>
> rb_augment_erase_begin(D) will return N, and then rb_augment_path(N)
> will re-augment: N, P, SL and S.
>

Yes. I had missed the rotate_right of the parent following the
rotate_left of the sibling (or vice versa). Thanks for fixing this. The
latest patch looks good.

Acked-by: Venkatesh Pallipadi <venki@google.com>

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH] rbtree: undo augmented damage -v2
  2010-06-01 18:42                     ` Peter Zijlstra
  2010-06-01 20:31                       ` Venkatesh Pallipadi
@ 2010-06-02 21:11                       ` Suresh Siddha
  2010-06-03  1:15                         ` Venkatesh Pallipadi
  2010-06-03  7:13                         ` Peter Zijlstra
  1 sibling, 2 replies; 27+ messages in thread
From: Suresh Siddha @ 2010-06-02 21:11 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Venkatesh Pallipadi, mingo, hpa, linux-kernel, tglx,
	linux-tip-commits, Fabio Checconi

On Tue, 2010-06-01 at 11:42 -0700, Peter Zijlstra wrote:
> On Tue, 2010-06-01 at 11:09 -0700, Venkatesh Pallipadi wrote:
> > On Tue, Jun 1, 2010 at 10:42 AM, Peter Zijlstra <peterz@infradead.org> wrote:
> > > On Tue, 2010-06-01 at 10:34 -0700, Venkatesh Pallipadi wrote:
> > >>
> > >> Yes. This will cover all the cases on insert. But on erase, there is
> > >> still a case where a rotate of sibling node is done during the
> > >> re-coloration process. There we have a child change on sibling's
> > >> child. I am not able to think of any easy way to handle that case.
> > >
> > > Let me go draw some figures with pen and paper to match up the erase
> > > path with the rb_augment_erase_begin() code, because I can't quite spot
> > > the case we're missing.
> > >
> > > If you have it handy, ascii art might help..
> > 
> > It is this case
> > 
> >     P
> >    / \
> >   N   S
> >      / \
> >     SL SR
> > 
> > changing to
> > 
> >     P
> >    / \
> >   N  SL
> >       \
> >        S
> >         \
> >         SR
> 
> Right, but see: http://en.wikipedia.org/wiki/Red-black_tree
> That is delete_case5, however then we fall into delete_case6 and perform
> a left rotation.
> 
> So suppose we start with the tree:
> 
>     P                  P               P                SL
>   /   \               / \             / \               / \
>  D     S      -->    N  S      -->   N   SL     -->    P   S
>   \   / \              / \                \           /     \
>    N SL  SR          SL* SR                S*        N      SR
>                                             \
>                                              SR
> 
> and then remove D, delete case 5 and finally delete case 6, * marks red.
> 
> rb_augment_erase_begin(D) will return N, and then rb_augment_path(N)
> will re-augment: N, P, SL and S.


      P                       SL
    /  \                     /  \
   N    S       --->        N    S
  /   /  \                 /      \
 C   SL  SR               C        SR

If P needs to be removed, we need to re-augment S also in this case,
right? It looks like we are not handling this case.

thanks,
suresh




^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH] rbtree: undo augmented damage -v2
  2010-06-02 21:11                       ` Suresh Siddha
@ 2010-06-03  1:15                         ` Venkatesh Pallipadi
  2010-06-03  7:13                         ` Peter Zijlstra
  1 sibling, 0 replies; 27+ messages in thread
From: Venkatesh Pallipadi @ 2010-06-03  1:15 UTC (permalink / raw)
  To: Suresh Siddha
  Cc: Peter Zijlstra, mingo, hpa, linux-kernel, tglx,
	linux-tip-commits, Fabio Checconi

On Wed, Jun 2, 2010 at 2:11 PM, Suresh Siddha <suresh.b.siddha@intel.com> wrote:
> On Tue, 2010-06-01 at 11:42 -0700, Peter Zijlstra wrote:
>> On Tue, 2010-06-01 at 11:09 -0700, Venkatesh Pallipadi wrote:
>> > On Tue, Jun 1, 2010 at 10:42 AM, Peter Zijlstra <peterz@infradead.org> wrote:
>> > > On Tue, 2010-06-01 at 10:34 -0700, Venkatesh Pallipadi wrote:
>> > >>
>> > >> Yes. This will cover all the cases on insert. But on erase, there is
>> > >> still a case where a rotate of sibling node is done during the
>> > >> re-coloration process. There we have a child change on sibling's
>> > >> child. I am not able to think of any easy way to handle that case.
>> > >
>> > > Let me go draw some figures with pen and paper to match up the erase
>> > > path with the rb_augment_erase_begin() code, because I can't quite spot
>> > > the case we're missing.
>> > >
>> > > If you have it handy, ascii art might help..
>> >
>> > It is this case
>> >
>> >     P
>> >    / \
>> >   N   S
>> >      / \
>> >     SL SR
>> >
>> > changing to
>> >
>> >     P
>> >    / \
>> >   N  SL
>> >       \
>> >        S
>> >         \
>> >         SR
>>
>> Right, but see: http://en.wikipedia.org/wiki/Red-black_tree
>> That is delete_case5, however then we fall into delete_case6 and perform
>> a left rotation.
>>
>> So suppose we start with the tree:
>>
>>     P                  P               P                SL
>>   /   \               / \             / \               / \
>>  D     S      -->    N  S      -->   N   SL     -->    P   S
>>   \   / \              / \                \           /     \
>>    N SL  SR          SL* SR                S*        N      SR
>>                                             \
>>                                              SR
>>
>> and then remove D, delete case 5 and finally delete case 6, * marks red.
>>
>> rb_augment_erase_begin(D) will return N, and then rb_augment_path(N)
>> will re-augment: N, P, SL and S.
>
>
>      P                       SL
>    /  \                     /  \
>   N    S       --->        N    S
>  /   /  \                 /      \
>  C   SL  SR               C        SR
>
> If P needs to be removed, we need to re-augment S also in this case,
> right? It looks like we are not handling this case.
>

rb_augment_erase_begin() should take care of that. In this case, it
will return S as the deepest node and we start the walk-back-to-root
from there.

Thanks,
Venki

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH] rbtree: undo augmented damage -v2
  2010-06-02 21:11                       ` Suresh Siddha
  2010-06-03  1:15                         ` Venkatesh Pallipadi
@ 2010-06-03  7:13                         ` Peter Zijlstra
  2010-06-03  7:48                           ` Peter Zijlstra
  1 sibling, 1 reply; 27+ messages in thread
From: Peter Zijlstra @ 2010-06-03  7:13 UTC (permalink / raw)
  To: Suresh Siddha
  Cc: Venkatesh Pallipadi, mingo, hpa, linux-kernel, tglx,
	linux-tip-commits, Fabio Checconi

On Wed, 2010-06-02 at 14:11 -0700, Suresh Siddha wrote:
>       P                       SL
>     /  \                     /  \
>    N    S       --->        N    S
>   /   /  \                 /      \
>  C   SL  SR               C        SR
> 
> If P needs to be removed, we need to re-augment S also in this case,
> right? It looks like we are not handling this case.


rb_augment_erase_begin(P) will return S if I read the code right, so
rb_augment_path(S) will then re-augment S, SL, N.



^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH] rbtree: undo augmented damage -v2
  2010-06-03  7:13                         ` Peter Zijlstra
@ 2010-06-03  7:48                           ` Peter Zijlstra
  2010-06-03 16:04                             ` Suresh Siddha
  0 siblings, 1 reply; 27+ messages in thread
From: Peter Zijlstra @ 2010-06-03  7:48 UTC (permalink / raw)
  To: Suresh Siddha
  Cc: Venkatesh Pallipadi, mingo, hpa, linux-kernel, tglx,
	linux-tip-commits, Fabio Checconi

On Thu, 2010-06-03 at 09:13 +0200, Peter Zijlstra wrote:
> On Wed, 2010-06-02 at 14:11 -0700, Suresh Siddha wrote:
> >       P                       SL
> >     /  \                     /  \
> >    N    S       --->        N    S
> >   /   /  \                 /      \
> >  C   SL  SR               C        SR
> > 
> > If P needs to be removed, we need to re-augment S also in this case,
> > right? It looks like we are not handling this case.
> 
> 
> rb_augment_erase_begin(P) will return S if I read the code right, so
> rb_augment_path(S) will the re-augment S, SL, N.

To be more explicit: P has two children, so we select the next entry,
SL, as its substitute. Since SL doesn't have a right child we select its
parent, S.

If SL were to represent a subtree with a right child, then we'd select
that, let's call it SLR. SLR would then end up being a direct descendant
of S, and hence rb_augment_path(SLR) would still pass S on its way up.
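
Traced against the code, for completeness (a sketch of the same steps):

	rb_augment_erase_begin(P):
		P has both children -> deepest = rb_next(P) = SL
		SL->rb_right == NULL, rb_parent(SL) == S != P
			-> deepest = rb_parent(SL) = S

	rb_erase(P);

	rb_augment_erase_end(S, ...) -> rb_augment_path(S):
		func(S);  parent = SL, S is the right child, SL->rb_left == N
			  -> func(N)
		func(SL); parent == NULL -> done

so S, N and SL are all re-augmented after P is removed.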



^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH] rbtree: undo augmented damage -v2
  2010-06-03  7:48                           ` Peter Zijlstra
@ 2010-06-03 16:04                             ` Suresh Siddha
  0 siblings, 0 replies; 27+ messages in thread
From: Suresh Siddha @ 2010-06-03 16:04 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Venkatesh Pallipadi, mingo, hpa, linux-kernel, tglx,
	linux-tip-commits, Fabio Checconi

On Thu, 2010-06-03 at 00:48 -0700, Peter Zijlstra wrote:
> On Thu, 2010-06-03 at 09:13 +0200, Peter Zijlstra wrote:
> > On Wed, 2010-06-02 at 14:11 -0700, Suresh Siddha wrote:
> > >       P                       SL
> > >     /  \                     /  \
> > >    N    S       --->        N    S
> > >   /   /  \                 /      \
> > >  C   SL  SR               C        SR
> > > 
> > > If P needs to be removed, we need to re-augment S also in this case,
> > > right? It looks like we are not handling this case.
> > 
> > 
> > rb_augment_erase_begin(P) will return S if I read the code right, so
> > rb_augment_path(S) will the re-augment S, SL, N.
> 
> To be more explicit: P has two children, so we select the next entry,
> SL, as its substitute. Since SL doesn't have a right child we select its
> parent, S.
> 
> If SL were to represent a subtree with a right child, then we'd select
> that, lets call it SLR. SLR would then end up being a direct descendant
> of S, and hence rb_augment_path(SLR) would still pass S on its way up.

Yep. I missed the rb_parent() part in the rb_augment_erase_begin().

Thanks.

Acked-by: Suresh Siddha <suresh.b.siddha@intel.com>


^ permalink raw reply	[flat|nested] 27+ messages in thread

* [tip:x86/pat] rbtree: Undo augmented trees performance damage
  2010-06-01 17:42                 ` Peter Zijlstra
  2010-06-01 18:09                   ` Venkatesh Pallipadi
@ 2010-06-09 10:12                   ` tip-bot for Peter Zijlstra
  2010-07-05 12:48                   ` [tip:x86/urgent] rbtree: Undo augmented trees performance damage and regression tip-bot for Peter Zijlstra
  2 siblings, 0 replies; 27+ messages in thread
From: tip-bot for Peter Zijlstra @ 2010-06-09 10:12 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: linux-kernel, hpa, mingo, torvalds, a.p.zijlstra, peterz, akpm,
	fabio, suresh.b.siddha, tglx, mingo, venki

Commit-ID:  2463eb8b3093995e09a0d41b3d78ee0cf5fb4249
Gitweb:     http://git.kernel.org/tip/2463eb8b3093995e09a0d41b3d78ee0cf5fb4249
Author:     Peter Zijlstra <peterz@infradead.org>
AuthorDate: Sat, 29 May 2010 15:31:43 +0200
Committer:  Ingo Molnar <mingo@elte.hu>
CommitDate: Wed, 9 Jun 2010 10:32:49 +0200

rbtree: Undo augmented trees performance damage

Reimplement augmented RB-trees without sprinkling extra branches
all over the RB-tree code (which lives in the scheduler hot path).

This approach is 'borrowed' from Fabio's BFQ implementation and
relies on traversing the rebalance path after the RB-tree-op to
correct the heap property for insertion/removal and make up for
the damage done by the tree rotations.

For insertion the rebalance path is trivially that from the new
node upwards to the root, for removal it is that from the deepest
node in the path from the to be removed node that will still
be around after the removal.

Acked-by: Suresh Siddha <suresh.b.siddha@intel.com>
Acked-by: Venkatesh Pallipadi <venki@google.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Fabio Checconi <fabio@gandalf.sssup.it>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
LKML-Reference: <1275414172.27810.27961.camel@twins>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
 arch/x86/mm/pat_rbtree.c |   34 ++-----------
 include/linux/rbtree.h   |   13 ++++--
 lib/rbtree.c             |  116 ++++++++++++++++++++++++++++-----------------
 3 files changed, 87 insertions(+), 76 deletions(-)

diff --git a/arch/x86/mm/pat_rbtree.c b/arch/x86/mm/pat_rbtree.c
index f537087..a2903e6 100644
--- a/arch/x86/mm/pat_rbtree.c
+++ b/arch/x86/mm/pat_rbtree.c
@@ -34,8 +34,7 @@
  * memtype_lock protects the rbtree.
  */
 
-static void memtype_rb_augment_cb(struct rb_node *node);
-static struct rb_root memtype_rbroot = RB_AUGMENT_ROOT(&memtype_rb_augment_cb);
+static struct rb_root memtype_rbroot = RB_ROOT;
 
 static int is_node_overlap(struct memtype *node, u64 start, u64 end)
 {
@@ -56,7 +55,7 @@ static u64 get_subtree_max_end(struct rb_node *node)
 }
 
 /* Update 'subtree_max_end' for a node, based on node and its children */
-static void update_node_max_end(struct rb_node *node)
+static void memtype_rb_augment_cb(struct rb_node *node, void *__unused)
 {
 	struct memtype *data;
 	u64 max_end, child_max_end;
@@ -78,25 +77,6 @@ static void update_node_max_end(struct rb_node *node)
 	data->subtree_max_end = max_end;
 }
 
-/* Update 'subtree_max_end' for a node and all its ancestors */
-static void update_path_max_end(struct rb_node *node)
-{
-	u64 old_max_end, new_max_end;
-
-	while (node) {
-		struct memtype *data = container_of(node, struct memtype, rb);
-
-		old_max_end = data->subtree_max_end;
-		update_node_max_end(node);
-		new_max_end = data->subtree_max_end;
-
-		if (new_max_end == old_max_end)
-			break;
-
-		node = rb_parent(node);
-	}
-}
-
 /* Find the first (lowest start addr) overlapping range from rb tree */
 static struct memtype *memtype_rb_lowest_match(struct rb_root *root,
 				u64 start, u64 end)
@@ -190,12 +170,6 @@ failure:
 	return -EBUSY;
 }
 
-static void memtype_rb_augment_cb(struct rb_node *node)
-{
-	if (node)
-		update_path_max_end(node);
-}
-
 static void memtype_rb_insert(struct rb_root *root, struct memtype *newdata)
 {
 	struct rb_node **node = &(root->rb_node);
@@ -213,6 +187,7 @@ static void memtype_rb_insert(struct rb_root *root, struct memtype *newdata)
 
 	rb_link_node(&newdata->rb, parent, node);
 	rb_insert_color(&newdata->rb, root);
+	rb_augment_insert(&newdata->rb, memtype_rb_augment_cb, NULL);
 }
 
 int rbt_memtype_check_insert(struct memtype *new, unsigned long *ret_type)
@@ -233,13 +208,16 @@ int rbt_memtype_check_insert(struct memtype *new, unsigned long *ret_type)
 
 struct memtype *rbt_memtype_erase(u64 start, u64 end)
 {
+	struct rb_node *deepest;
 	struct memtype *data;
 
 	data = memtype_rb_exact_match(&memtype_rbroot, start, end);
 	if (!data)
 		goto out;
 
+	deepest = rb_augment_erase_begin(&data->rb);
 	rb_erase(&data->rb, &memtype_rbroot);
+	rb_augment_erase_end(deepest, memtype_rb_augment_cb, NULL);
 out:
 	return data;
 }
diff --git a/include/linux/rbtree.h b/include/linux/rbtree.h
index fe1872e..7066acb 100644
--- a/include/linux/rbtree.h
+++ b/include/linux/rbtree.h
@@ -110,7 +110,6 @@ struct rb_node
 struct rb_root
 {
 	struct rb_node *rb_node;
-	void (*augment_cb)(struct rb_node *node);
 };
 
 
@@ -130,9 +129,7 @@ static inline void rb_set_color(struct rb_node *rb, int color)
 	rb->rb_parent_color = (rb->rb_parent_color & ~1) | color;
 }
 
-#define RB_ROOT	(struct rb_root) { NULL, NULL, }
-#define RB_AUGMENT_ROOT(x)	(struct rb_root) { NULL, x}
-
+#define RB_ROOT	(struct rb_root) { NULL, }
 #define	rb_entry(ptr, type, member) container_of(ptr, type, member)
 
 #define RB_EMPTY_ROOT(root)	((root)->rb_node == NULL)
@@ -142,6 +139,14 @@ static inline void rb_set_color(struct rb_node *rb, int color)
 extern void rb_insert_color(struct rb_node *, struct rb_root *);
 extern void rb_erase(struct rb_node *, struct rb_root *);
 
+typedef void (*rb_augment_f)(struct rb_node *node, void *data);
+
+extern void rb_augment_insert(struct rb_node *node,
+			      rb_augment_f func, void *data);
+extern struct rb_node *rb_augment_erase_begin(struct rb_node *node);
+extern void rb_augment_erase_end(struct rb_node *node,
+				 rb_augment_f func, void *data);
+
 /* Find logical next and previous nodes in a tree */
 extern struct rb_node *rb_next(const struct rb_node *);
 extern struct rb_node *rb_prev(const struct rb_node *);
diff --git a/lib/rbtree.c b/lib/rbtree.c
index 15e10b1..4693f79 100644
--- a/lib/rbtree.c
+++ b/lib/rbtree.c
@@ -44,11 +44,6 @@ static void __rb_rotate_left(struct rb_node *node, struct rb_root *root)
 	else
 		root->rb_node = right;
 	rb_set_parent(node, right);
-
-	if (root->augment_cb) {
-		root->augment_cb(node);
-		root->augment_cb(right);
-	}
 }
 
 static void __rb_rotate_right(struct rb_node *node, struct rb_root *root)
@@ -72,20 +67,12 @@ static void __rb_rotate_right(struct rb_node *node, struct rb_root *root)
 	else
 		root->rb_node = left;
 	rb_set_parent(node, left);
-
-	if (root->augment_cb) {
-		root->augment_cb(node);
-		root->augment_cb(left);
-	}
 }
 
 void rb_insert_color(struct rb_node *node, struct rb_root *root)
 {
 	struct rb_node *parent, *gparent;
 
-	if (root->augment_cb)
-		root->augment_cb(node);
-
 	while ((parent = rb_parent(node)) && rb_is_red(parent))
 	{
 		gparent = rb_parent(parent);
@@ -240,15 +227,12 @@ void rb_erase(struct rb_node *node, struct rb_root *root)
 	else
 	{
 		struct rb_node *old = node, *left;
-		int old_parent_cb = 0;
-		int successor_parent_cb = 0;
 
 		node = node->rb_right;
 		while ((left = node->rb_left) != NULL)
 			node = left;
 
 		if (rb_parent(old)) {
-			old_parent_cb = 1;
 			if (rb_parent(old)->rb_left == old)
 				rb_parent(old)->rb_left = node;
 			else
@@ -263,10 +247,8 @@ void rb_erase(struct rb_node *node, struct rb_root *root)
 		if (parent == old) {
 			parent = node;
 		} else {
-			successor_parent_cb = 1;
 			if (child)
 				rb_set_parent(child, parent);
-
 			parent->rb_left = child;
 
 			node->rb_right = old->rb_right;
@@ -277,24 +259,6 @@ void rb_erase(struct rb_node *node, struct rb_root *root)
 		node->rb_left = old->rb_left;
 		rb_set_parent(old->rb_left, node);
 
-		if (root->augment_cb) {
-			/*
-			 * Here, three different nodes can have new children.
-			 * The parent of the successor node that was selected
-			 * to replace the node to be erased.
-			 * The node that is getting erased and is now replaced
-			 * by its successor.
-			 * The parent of the node getting erased-replaced.
-			 */
-			if (successor_parent_cb)
-				root->augment_cb(parent);
-
-			root->augment_cb(node);
-
-			if (old_parent_cb)
-				root->augment_cb(rb_parent(old));
-		}
-
 		goto color;
 	}
 
@@ -303,19 +267,15 @@ void rb_erase(struct rb_node *node, struct rb_root *root)
 
 	if (child)
 		rb_set_parent(child, parent);
-
-	if (parent) {
+	if (parent)
+	{
 		if (parent->rb_left == node)
 			parent->rb_left = child;
 		else
 			parent->rb_right = child;
-
-		if (root->augment_cb)
-			root->augment_cb(parent);
-
-	} else {
-		root->rb_node = child;
 	}
+	else
+		root->rb_node = child;
 
  color:
 	if (color == RB_BLACK)
@@ -323,6 +283,74 @@ void rb_erase(struct rb_node *node, struct rb_root *root)
 }
 EXPORT_SYMBOL(rb_erase);
 
+static void rb_augment_path(struct rb_node *node, rb_augment_f func, void *data)
+{
+	struct rb_node *parent;
+
+up:
+	func(node, data);
+	parent = rb_parent(node);
+	if (!parent)
+		return;
+
+	if (node == parent->rb_left && parent->rb_right)
+		func(parent->rb_right, data);
+	else if (parent->rb_left)
+		func(parent->rb_left, data);
+
+	node = parent;
+	goto up;
+}
+
+/*
+ * after inserting @node into the tree, update the tree to account for
+ * both the new entry and any damage done by rebalance
+ */
+void rb_augment_insert(struct rb_node *node, rb_augment_f func, void *data)
+{
+	if (node->rb_left)
+		node = node->rb_left;
+	else if (node->rb_right)
+		node = node->rb_right;
+
+	rb_augment_path(node, func, data);
+}
+
+/*
+ * before removing the node, find the deepest node on the rebalance path
+ * that will still be there after @node gets removed
+ */
+struct rb_node *rb_augment_erase_begin(struct rb_node *node)
+{
+	struct rb_node *deepest;
+
+	if (!node->rb_right && !node->rb_left)
+		deepest = rb_parent(node);
+	else if (!node->rb_right)
+		deepest = node->rb_left;
+	else if (!node->rb_left)
+		deepest = node->rb_right;
+	else {
+		deepest = rb_next(node);
+		if (deepest->rb_right)
+			deepest = deepest->rb_right;
+		else if (rb_parent(deepest) != node)
+			deepest = rb_parent(deepest);
+	}
+
+	return deepest;
+}
+
+/*
+ * after removal, update the tree to account for the removed entry
+ * and any rebalance damage.
+ */
+void rb_augment_erase_end(struct rb_node *node, rb_augment_f func, void *data)
+{
+	if (node)
+		rb_augment_path(node, func, data);
+}
+
 /*
  * This function returns the first node (in sort order) of the tree.
  */

^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [tip:x86/urgent] rbtree: Undo augmented trees performance damage and regression
  2010-06-01 17:42                 ` Peter Zijlstra
  2010-06-01 18:09                   ` Venkatesh Pallipadi
  2010-06-09 10:12                   ` [tip:x86/pat] rbtree: Undo augmented trees performance damage tip-bot for Peter Zijlstra
@ 2010-07-05 12:48                   ` tip-bot for Peter Zijlstra
  2 siblings, 0 replies; 27+ messages in thread
From: tip-bot for Peter Zijlstra @ 2010-07-05 12:48 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: linux-kernel, hpa, mingo, torvalds, a.p.zijlstra, peterz, ali,
	akpm, fabio, suresh.b.siddha, tglx, mingo, venki

Commit-ID:  b945d6b2554d550fe95caadc61e521c0ad71fb9c
Gitweb:     http://git.kernel.org/tip/b945d6b2554d550fe95caadc61e521c0ad71fb9c
Author:     Peter Zijlstra <peterz@infradead.org>
AuthorDate: Sat, 29 May 2010 15:31:43 +0200
Committer:  Ingo Molnar <mingo@elte.hu>
CommitDate: Mon, 5 Jul 2010 14:43:50 +0200

rbtree: Undo augmented trees performance damage and regression

Reimplement augmented RB-trees without sprinkling extra branches
all over the RB-tree code (which lives in the scheduler hot path).

This approach is 'borrowed' from Fabio's BFQ implementation and
relies on traversing the rebalance path after the RB-tree-op to
correct the heap property for insertion/removal and make up for
the damage done by the tree rotations.

For insertion the rebalance path is trivially the path from the new
node up to the root; for removal it is the path starting from the
deepest node on the to-be-removed node's path that will still be
around after the removal.

[ This patch also fixes a video driver regression reported by
  Ali Gholami Rudi - the memtype->subtree_max_end was updated
  incorrectly. ]
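
To make the intended calling convention concrete, a minimal sketch of
a user of the new interface could look as follows. (Illustrative only:
the 'struct interval' type and the interval_* helpers are made-up
names; only rb_link_node(), rb_insert_color(), rb_erase() and the
rb_augment_* calls with their signatures come from rbtree.h and this
patch.)

#include <linux/rbtree.h>
#include <linux/types.h>

struct interval {
	struct rb_node	rb;
	u64		start;
	u64		end;
	u64		subtree_max_end;	/* max 'end' over this subtree */
};

/* Recompute the augmented value of one node from the node and its children */
static void interval_augment_cb(struct rb_node *node, void *__unused)
{
	struct interval *iv = rb_entry(node, struct interval, rb);
	u64 max_end = iv->end;

	if (node->rb_left) {
		u64 m = rb_entry(node->rb_left, struct interval, rb)->subtree_max_end;
		if (m > max_end)
			max_end = m;
	}
	if (node->rb_right) {
		u64 m = rb_entry(node->rb_right, struct interval, rb)->subtree_max_end;
		if (m > max_end)
			max_end = m;
	}
	iv->subtree_max_end = max_end;
}

static void interval_insert(struct rb_root *root, struct interval *new)
{
	struct rb_node **link = &root->rb_node, *parent = NULL;

	while (*link) {
		struct interval *iv = rb_entry(*link, struct interval, rb);

		parent = *link;
		if (new->start <= iv->start)
			link = &(*link)->rb_left;
		else
			link = &(*link)->rb_right;
	}

	rb_link_node(&new->rb, parent, link);
	rb_insert_color(&new->rb, root);
	/* walk the rebalance path and recompute subtree_max_end on it */
	rb_augment_insert(&new->rb, interval_augment_cb, NULL);
}

static void interval_erase(struct rb_root *root, struct interval *iv)
{
	/* deepest node on the rebalance path that survives the erase ... */
	struct rb_node *deepest = rb_augment_erase_begin(&iv->rb);

	rb_erase(&iv->rb, root);
	/* ... and propagate the recomputed values up from there */
	rb_augment_erase_end(deepest, interval_augment_cb, NULL);
}

The pat_rbtree.c conversion below follows exactly this pattern, with
struct memtype and memtype_rb_augment_cb maintaining subtree_max_end.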

Acked-by: Suresh Siddha <suresh.b.siddha@intel.com>
Acked-by: Venkatesh Pallipadi <venki@google.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Tested-by: Ali Gholami Rudi <ali@rudi.ir>
Cc: Fabio Checconi <fabio@gandalf.sssup.it>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
LKML-Reference: <1275414172.27810.27961.camel@twins>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
 arch/x86/mm/pat_rbtree.c |   34 ++-----------
 include/linux/rbtree.h   |   13 ++++--
 lib/rbtree.c             |  116 ++++++++++++++++++++++++++++-----------------
 3 files changed, 87 insertions(+), 76 deletions(-)

diff --git a/arch/x86/mm/pat_rbtree.c b/arch/x86/mm/pat_rbtree.c
index f20eeec..8acaddd 100644
--- a/arch/x86/mm/pat_rbtree.c
+++ b/arch/x86/mm/pat_rbtree.c
@@ -34,8 +34,7 @@
  * memtype_lock protects the rbtree.
  */
 
-static void memtype_rb_augment_cb(struct rb_node *node);
-static struct rb_root memtype_rbroot = RB_AUGMENT_ROOT(&memtype_rb_augment_cb);
+static struct rb_root memtype_rbroot = RB_ROOT;
 
 static int is_node_overlap(struct memtype *node, u64 start, u64 end)
 {
@@ -56,7 +55,7 @@ static u64 get_subtree_max_end(struct rb_node *node)
 }
 
 /* Update 'subtree_max_end' for a node, based on node and its children */
-static void update_node_max_end(struct rb_node *node)
+static void memtype_rb_augment_cb(struct rb_node *node, void *__unused)
 {
 	struct memtype *data;
 	u64 max_end, child_max_end;
@@ -78,25 +77,6 @@ static void update_node_max_end(struct rb_node *node)
 	data->subtree_max_end = max_end;
 }
 
-/* Update 'subtree_max_end' for a node and all its ancestors */
-static void update_path_max_end(struct rb_node *node)
-{
-	u64 old_max_end, new_max_end;
-
-	while (node) {
-		struct memtype *data = container_of(node, struct memtype, rb);
-
-		old_max_end = data->subtree_max_end;
-		update_node_max_end(node);
-		new_max_end = data->subtree_max_end;
-
-		if (new_max_end == old_max_end)
-			break;
-
-		node = rb_parent(node);
-	}
-}
-
 /* Find the first (lowest start addr) overlapping range from rb tree */
 static struct memtype *memtype_rb_lowest_match(struct rb_root *root,
 				u64 start, u64 end)
@@ -190,12 +170,6 @@ failure:
 	return -EBUSY;
 }
 
-static void memtype_rb_augment_cb(struct rb_node *node)
-{
-	if (node)
-		update_path_max_end(node);
-}
-
 static void memtype_rb_insert(struct rb_root *root, struct memtype *newdata)
 {
 	struct rb_node **node = &(root->rb_node);
@@ -213,6 +187,7 @@ static void memtype_rb_insert(struct rb_root *root, struct memtype *newdata)
 
 	rb_link_node(&newdata->rb, parent, node);
 	rb_insert_color(&newdata->rb, root);
+	rb_augment_insert(&newdata->rb, memtype_rb_augment_cb, NULL);
 }
 
 int rbt_memtype_check_insert(struct memtype *new, unsigned long *ret_type)
@@ -234,13 +209,16 @@ int rbt_memtype_check_insert(struct memtype *new, unsigned long *ret_type)
 
 struct memtype *rbt_memtype_erase(u64 start, u64 end)
 {
+	struct rb_node *deepest;
 	struct memtype *data;
 
 	data = memtype_rb_exact_match(&memtype_rbroot, start, end);
 	if (!data)
 		goto out;
 
+	deepest = rb_augment_erase_begin(&data->rb);
 	rb_erase(&data->rb, &memtype_rbroot);
+	rb_augment_erase_end(deepest, memtype_rb_augment_cb, NULL);
 out:
 	return data;
 }
diff --git a/include/linux/rbtree.h b/include/linux/rbtree.h
index fe1872e..7066acb 100644
--- a/include/linux/rbtree.h
+++ b/include/linux/rbtree.h
@@ -110,7 +110,6 @@ struct rb_node
 struct rb_root
 {
 	struct rb_node *rb_node;
-	void (*augment_cb)(struct rb_node *node);
 };
 
 
@@ -130,9 +129,7 @@ static inline void rb_set_color(struct rb_node *rb, int color)
 	rb->rb_parent_color = (rb->rb_parent_color & ~1) | color;
 }
 
-#define RB_ROOT	(struct rb_root) { NULL, NULL, }
-#define RB_AUGMENT_ROOT(x)	(struct rb_root) { NULL, x}
-
+#define RB_ROOT	(struct rb_root) { NULL, }
 #define	rb_entry(ptr, type, member) container_of(ptr, type, member)
 
 #define RB_EMPTY_ROOT(root)	((root)->rb_node == NULL)
@@ -142,6 +139,14 @@ static inline void rb_set_color(struct rb_node *rb, int color)
 extern void rb_insert_color(struct rb_node *, struct rb_root *);
 extern void rb_erase(struct rb_node *, struct rb_root *);
 
+typedef void (*rb_augment_f)(struct rb_node *node, void *data);
+
+extern void rb_augment_insert(struct rb_node *node,
+			      rb_augment_f func, void *data);
+extern struct rb_node *rb_augment_erase_begin(struct rb_node *node);
+extern void rb_augment_erase_end(struct rb_node *node,
+				 rb_augment_f func, void *data);
+
 /* Find logical next and previous nodes in a tree */
 extern struct rb_node *rb_next(const struct rb_node *);
 extern struct rb_node *rb_prev(const struct rb_node *);
diff --git a/lib/rbtree.c b/lib/rbtree.c
index 15e10b1..4693f79 100644
--- a/lib/rbtree.c
+++ b/lib/rbtree.c
@@ -44,11 +44,6 @@ static void __rb_rotate_left(struct rb_node *node, struct rb_root *root)
 	else
 		root->rb_node = right;
 	rb_set_parent(node, right);
-
-	if (root->augment_cb) {
-		root->augment_cb(node);
-		root->augment_cb(right);
-	}
 }
 
 static void __rb_rotate_right(struct rb_node *node, struct rb_root *root)
@@ -72,20 +67,12 @@ static void __rb_rotate_right(struct rb_node *node, struct rb_root *root)
 	else
 		root->rb_node = left;
 	rb_set_parent(node, left);
-
-	if (root->augment_cb) {
-		root->augment_cb(node);
-		root->augment_cb(left);
-	}
 }
 
 void rb_insert_color(struct rb_node *node, struct rb_root *root)
 {
 	struct rb_node *parent, *gparent;
 
-	if (root->augment_cb)
-		root->augment_cb(node);
-
 	while ((parent = rb_parent(node)) && rb_is_red(parent))
 	{
 		gparent = rb_parent(parent);
@@ -240,15 +227,12 @@ void rb_erase(struct rb_node *node, struct rb_root *root)
 	else
 	{
 		struct rb_node *old = node, *left;
-		int old_parent_cb = 0;
-		int successor_parent_cb = 0;
 
 		node = node->rb_right;
 		while ((left = node->rb_left) != NULL)
 			node = left;
 
 		if (rb_parent(old)) {
-			old_parent_cb = 1;
 			if (rb_parent(old)->rb_left == old)
 				rb_parent(old)->rb_left = node;
 			else
@@ -263,10 +247,8 @@ void rb_erase(struct rb_node *node, struct rb_root *root)
 		if (parent == old) {
 			parent = node;
 		} else {
-			successor_parent_cb = 1;
 			if (child)
 				rb_set_parent(child, parent);
-
 			parent->rb_left = child;
 
 			node->rb_right = old->rb_right;
@@ -277,24 +259,6 @@ void rb_erase(struct rb_node *node, struct rb_root *root)
 		node->rb_left = old->rb_left;
 		rb_set_parent(old->rb_left, node);
 
-		if (root->augment_cb) {
-			/*
-			 * Here, three different nodes can have new children.
-			 * The parent of the successor node that was selected
-			 * to replace the node to be erased.
-			 * The node that is getting erased and is now replaced
-			 * by its successor.
-			 * The parent of the node getting erased-replaced.
-			 */
-			if (successor_parent_cb)
-				root->augment_cb(parent);
-
-			root->augment_cb(node);
-
-			if (old_parent_cb)
-				root->augment_cb(rb_parent(old));
-		}
-
 		goto color;
 	}
 
@@ -303,19 +267,15 @@ void rb_erase(struct rb_node *node, struct rb_root *root)
 
 	if (child)
 		rb_set_parent(child, parent);
-
-	if (parent) {
+	if (parent)
+	{
 		if (parent->rb_left == node)
 			parent->rb_left = child;
 		else
 			parent->rb_right = child;
-
-		if (root->augment_cb)
-			root->augment_cb(parent);
-
-	} else {
-		root->rb_node = child;
 	}
+	else
+		root->rb_node = child;
 
  color:
 	if (color == RB_BLACK)
@@ -323,6 +283,74 @@ void rb_erase(struct rb_node *node, struct rb_root *root)
 }
 EXPORT_SYMBOL(rb_erase);
 
+static void rb_augment_path(struct rb_node *node, rb_augment_f func, void *data)
+{
+	struct rb_node *parent;
+
+up:
+	func(node, data);
+	parent = rb_parent(node);
+	if (!parent)
+		return;
+
+	if (node == parent->rb_left && parent->rb_right)
+		func(parent->rb_right, data);
+	else if (parent->rb_left)
+		func(parent->rb_left, data);
+
+	node = parent;
+	goto up;
+}
+
+/*
+ * after inserting @node into the tree, update the tree to account for
+ * both the new entry and any damage done by rebalance
+ */
+void rb_augment_insert(struct rb_node *node, rb_augment_f func, void *data)
+{
+	if (node->rb_left)
+		node = node->rb_left;
+	else if (node->rb_right)
+		node = node->rb_right;
+
+	rb_augment_path(node, func, data);
+}
+
+/*
+ * before removing the node, find the deepest node on the rebalance path
+ * that will still be there after @node gets removed
+ */
+struct rb_node *rb_augment_erase_begin(struct rb_node *node)
+{
+	struct rb_node *deepest;
+
+	if (!node->rb_right && !node->rb_left)
+		deepest = rb_parent(node);
+	else if (!node->rb_right)
+		deepest = node->rb_left;
+	else if (!node->rb_left)
+		deepest = node->rb_right;
+	else {
+		deepest = rb_next(node);
+		if (deepest->rb_right)
+			deepest = deepest->rb_right;
+		else if (rb_parent(deepest) != node)
+			deepest = rb_parent(deepest);
+	}
+
+	return deepest;
+}
+
+/*
+ * after removal, update the tree to account for the removed entry
+ * and any rebalance damage.
+ */
+void rb_augment_erase_end(struct rb_node *node, rb_augment_f func, void *data)
+{
+	if (node)
+		rb_augment_path(node, func, data);
+}
+
 /*
  * This function returns the first node (in sort order) of the tree.
  */
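
The payoff of the subtree_max_end augmentation is on the lookup side:
a range query can skip any subtree whose subtree_max_end does not reach
past the query start. Reusing the illustrative 'struct interval' from
the sketch in the changelog above (this is not the pat_rbtree.c code,
just the same idea in miniature):

/*
 * Return the lowest-starting interval overlapping [start, end),
 * or NULL if there is none.
 */
static struct interval *interval_lowest_match(struct rb_root *root,
					      u64 start, u64 end)
{
	struct rb_node *node = root->rb_node;

	while (node) {
		struct interval *iv = rb_entry(node, struct interval, rb);
		u64 left_max = node->rb_left ?
			rb_entry(node->rb_left, struct interval, rb)->subtree_max_end : 0;

		if (node->rb_left && left_max > start) {
			/* lowest overlap, if any exists, is in the left subtree */
			node = node->rb_left;
		} else if (iv->start < end && iv->end > start) {
			return iv;	/* overlaps, and nothing lower can */
		} else if (iv->start >= end) {
			return NULL;	/* everything to the right starts later still */
		} else {
			node = node->rb_right;
		}
	}
	return NULL;
}

This is the kind of pruning that memtype_rb_lowest_match() relies on
for the PAT ranges.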

^ permalink raw reply related	[flat|nested] 27+ messages in thread

end of thread, other threads:[~2010-07-05 12:53 UTC | newest]

Thread overview: 27+ messages
2010-02-10 19:57 [patch 0/3] x86: Use interval tree to keep track of PAT reserve/free venkatesh.pallipadi
2010-02-10 19:57 ` [patch 1/3] rbtree: Add support for augmented rbtrees venkatesh.pallipadi
2010-02-10 23:23   ` [patch 1/3] rbtree: Add support for augmented rbtrees (ver 2) Pallipadi, Venkatesh
2010-02-19  0:21     ` [tip:x86/pat] rbtree: Add support for augmented rbtrees tip-bot for Pallipadi, Venkatesh
2010-05-27 15:29       ` Peter Zijlstra
2010-05-29 12:29       ` [RFC][PATCH] rbtree: undo augmented damage Peter Zijlstra
2010-05-29 13:31         ` [PATCH] rbtree: undo augmented damage -v2 Peter Zijlstra
2010-06-01 17:00           ` Venkatesh Pallipadi
2010-06-01 17:19             ` Peter Zijlstra
2010-06-01 17:34               ` Peter Zijlstra
2010-06-01 17:34               ` Venkatesh Pallipadi
2010-06-01 17:42                 ` Peter Zijlstra
2010-06-01 18:09                   ` Venkatesh Pallipadi
2010-06-01 18:42                     ` Peter Zijlstra
2010-06-01 20:31                       ` Venkatesh Pallipadi
2010-06-02 21:11                       ` Suresh Siddha
2010-06-03  1:15                         ` Venkatesh Pallipadi
2010-06-03  7:13                         ` Peter Zijlstra
2010-06-03  7:48                           ` Peter Zijlstra
2010-06-03 16:04                             ` Suresh Siddha
2010-06-09 10:12                   ` [tip:x86/pat] rbtree: Undo augmented trees performance damage tip-bot for Peter Zijlstra
2010-07-05 12:48                   ` [tip:x86/urgent] rbtree: Undo augmented trees performance damage and regression tip-bot for Peter Zijlstra
2010-02-10 19:57 ` [patch 2/3] x86: Preparatory changes in pat.c for bigger rbtree change venkatesh.pallipadi
2010-02-19  0:22   ` [tip:x86/pat] x86, pat: " tip-bot for venkatesh.pallipadi@intel.com
2010-02-10 19:57 ` [patch 3/3] x86: Migrate to rbtree only backend for pat memtype management venkatesh.pallipadi
2010-02-10 23:26   ` [patch 3/3] x86: Migrate to rbtree only backend for pat memtype management (ver 2) Pallipadi, Venkatesh
2010-02-19  0:22     ` [tip:x86/pat] x86, pat: Migrate to rbtree only backend for pat memtype management tip-bot for Pallipadi, Venkatesh
