linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
* [PATCH RFC 1/4] lib/radix: add universal radix_tree_fill_range
@ 2016-08-27 14:14 Konstantin Khlebnikov
  2016-08-27 14:16 ` [PATCH RFC 2/4] lib/radix-tree: remove sibling entries Konstantin Khlebnikov
                   ` (3 more replies)
  0 siblings, 4 replies; 5+ messages in thread
From: Konstantin Khlebnikov @ 2016-08-27 14:14 UTC (permalink / raw)
  To: linux-mm; +Cc: Andrew Morton, linux-kernel, Kirill A. Shutemov, Ross Zwisler

This patch adds function for filling and truncating ranges of slots:

radix_tree_node *radix_tree_fill_range(root, start, end, item, flags)

It fills slots in range "begin".."end" with "item" and returns pointer
to the last filled node. Filling with NULL truncates range.

This is intended for managing transparent huge pages in page cache where
all entries are aligned but this function can handle arbitrary unaligned
ranges. Might be useful for PAT or VMA-like extent trees.

By default filling range constructs shallow tree: entries are assigned
directly inner slots if possible. In worst case any range requires only
2 * RADIX_TREE_MAX_PATH nodes. If length is power of two and start index
is aligned then all slots are always in single node and requires at most
RADIX_TREE_MAX_PATH nodes.

Function accepts several flags:

RADIX_TREE_FILL_LEAVES  - build deep tree, insert entry into leaves.

RADIX_TREE_FILL_OVERWRITE - overwrite instead of failing with -EEXIST.

RADIX_TREE_FILL_ATOMIC - play well with concurrent RCU-protected lookup:
fill new nodes with RADIX_TREE_RETRY before inserting them into the tree.
At following iterations these slots are filled with @item or sub-nodes.

RADIX_TREE_FILL_CLEAR_TAGS - also clears all tags.

radix_tree_fill_range() returns pointer to the node which holds the last
slot in range, NULL if this is root slot, or ERR_PTR in case of error.

Thus, radix_tree_fill_range() can handle all operations required for THP:

* Insert
Fill range with pointer to head page.

radix_tree_fill_range(root, index, index + nr_pages - 1, head_page,
		      RADIX_TREE_FILL_ATOMIC)

* Remove
Fill range with NULL or shadow entry, returned value will be used for
linking completely shadow nodes into slab shrinker.

radix_tree_fill_range(root, index, index + nr_pages - 1, NULL,
		      RADIX_TREE_FILL_OVERWRITE)

* Merge
Fill range with overwrite to replace 0-order pages with THP.

radix_tree_fill_range(root, index, index + nr_pages - 1, head_page,
		      RADIX_TREE_FILL_OVERWRITE | RADIX_TREE_FILL_ATOMIC)

* Split
Two passes: first fill leaves with head_page entry and then replace each
slot with pointer to individual tail page. This could be done in single
pass but makes radix_tree_fill_range much more complicated.

radix_tree_fill_range(root, index, index + nr_pages - 1, head_page,
		      RADIX_TREE_FILL_LEAVES | RADIX_TREE_FILL_OVERWRITE |
		      RADIX_TREE_FILL_ATOMIC);
radix_tree_for_each_slot(...)
	radix_tree_replace_slot(slot, head + iter.index - head->index);


Page lookup and iterator will return pointer to head page for any index.


Code inside iterator loop could detect huge entry, handle all sub-pages
and jump to next index using new helper function radix_tree_iter_jump():

slot = radix_tree_iter_jump(&iter, page->index + hpage_nr_pages(page));

This helper has builtin protection against overflows: jump to index = 0
stops iterator. This uses existing logic in radix_tree_next_chunk():
if iter.next_index is zero then iter.index must be zero too.


Tags should be set only for last index of THP range: this way iterator
will find them regardless of starting index.

radix_tree_preload_range() pre-allocates nodes for filling range.

Signed-off-by: Konstantin Khlebnikov <koct9i@gmail.com>
---
 include/linux/radix-tree.h |   46 ++++++++
 lib/radix-tree.c           |  245 ++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 291 insertions(+)

diff --git a/include/linux/radix-tree.h b/include/linux/radix-tree.h
index 4613bf35c311..af33e8d93ec3 100644
--- a/include/linux/radix-tree.h
+++ b/include/linux/radix-tree.h
@@ -319,6 +319,35 @@ static inline void radix_tree_preload_end(void)
 	preempt_enable();
 }
 
+#define RADIX_TREE_FILL_LEAVES		1 /* build full depth tree */
+#define RADIX_TREE_FILL_OVERWRITE	2 /* overwrite non-empty slots */
+#define RADIX_TREE_FILL_CLEAR_TAGS	4 /* clear all tags */
+#define RADIX_TREE_FILL_ATOMIC		8 /* play well with rcu lookup */
+
+struct radix_tree_node *
+radix_tree_fill_range(struct radix_tree_root *root, unsigned long start,
+		      unsigned long end, void *item, unsigned int flags);
+
+int radix_tree_preload_range(gfp_t gfp_mask, unsigned long start,
+			     unsigned long end, unsigned int flags);
+
+/**
+ * radix_tree_truncate_range  - remove everything in range
+ * @root:	radix tree root
+ * @start:	first index
+ * @end:	last index
+ *
+ * This function removes all items and tags within given range.
+ */
+static inline void
+radix_tree_truncate_range(struct radix_tree_root *root,
+			  unsigned long start, unsigned long end)
+{
+	radix_tree_fill_range(root, start, end, NULL,
+			      RADIX_TREE_FILL_OVERWRITE |
+			      RADIX_TREE_FILL_CLEAR_TAGS);
+}
+
 /**
  * struct radix_tree_iter - radix tree iterator state
  *
@@ -435,6 +464,23 @@ void **radix_tree_iter_next(struct radix_tree_iter *iter)
 }
 
 /**
+ * radix_tree_iter_jump - restart iterating from given index if it non-zero
+ * @iter:	iterator state
+ * @index:	next index
+ *
+ * If index is zero when iterator will stop. This protects from endless loop
+ * when index overflows after visiting last entry.
+ */
+static inline __must_check
+void **radix_tree_iter_jump(struct radix_tree_iter *iter, unsigned long index)
+{
+	iter->index = index - 1;
+	iter->next_index = index;
+	iter->tags = 0;
+	return NULL;
+}
+
+/**
  * radix_tree_chunk_size - get current chunk size
  *
  * @iter:	pointer to radix tree iterator
diff --git a/lib/radix-tree.c b/lib/radix-tree.c
index 1b7bf7314141..c46a60065a77 100644
--- a/lib/radix-tree.c
+++ b/lib/radix-tree.c
@@ -36,6 +36,7 @@
 #include <linux/bitops.h>
 #include <linux/rcupdate.h>
 #include <linux/preempt.h>		/* in_interrupt() */
+#include <linux/err.h>
 
 
 /* Number of nodes in fully populated tree of given height */
@@ -1014,6 +1015,250 @@ void **radix_tree_next_chunk(struct radix_tree_root *root,
 EXPORT_SYMBOL(radix_tree_next_chunk);
 
 /**
+ * radix_tree_preload_range  - preload nodes for filling range.
+ * @gfp_mask:
+ * @start:	first index
+ * @end:	last index
+ * @flags:	RADIX_TREE_FILL_*
+ */
+int radix_tree_preload_range(gfp_t gfp_mask, unsigned long start,
+			     unsigned long end, unsigned int flags)
+{
+	unsigned long length = end - start + 1;
+	int nr_nodes, shift;
+
+	/* Preloading doesn't help anything with this gfp mask, skip it */
+	if (!gfpflags_allow_blocking(gfp_mask)) {
+		preempt_disable();
+		return 0;
+	}
+
+	/*
+	 * For filling leaves tree must cover all indexes in range at all
+	 * levels plus RADIX_TREE_MAX_PATH required for growing tree depth
+	 * and only root node is shared for sure.
+	 *
+	 * If for aligned range we need RADIX_TREE_MAX_PATH for growing depth
+	 * and RADIX_TREE_MAX_PATH for path where all slots will be.
+	 *
+	 * For arbitrary range we need again RADIX_TREE_MAX_PATH for growing
+	 * depth and two RADIX_TREE_MAX_PATH chains for constructing arc of
+	 * slots from leaf to root and back. Only root node is shared.
+	 */
+	if (flags & RADIX_TREE_FILL_LEAVES) {
+		if (start > end)
+			return -EINVAL;
+		shift = 0;
+		nr_nodes = RADIX_TREE_MAX_PATH - 1;
+		do {
+			shift += RADIX_TREE_MAP_SHIFT;
+			nr_nodes += (end >> shift) - (start >> shift) + 1;
+		} while (shift < RADIX_TREE_INDEX_BITS);
+	} else if (is_power_of_2(length) && IS_ALIGNED(start, length))
+		nr_nodes = RADIX_TREE_MAX_PATH * 2 - 1;
+	else
+		nr_nodes = RADIX_TREE_MAX_PATH * 3 - 2;
+	return __radix_tree_preload(gfp_mask, nr_nodes);
+}
+EXPORT_SYMBOL(radix_tree_preload_range);
+
+/**
+ * radix_tree_fill_range - fill range of slots
+ * @root:	radix tree root
+ * @start:	first index
+ * @end:	last index
+ * @item:	value for filling, NULL for removing
+ * @flags:	RADIX_TREE_FILL_* flags
+ * Returns:	pointer last node or NULL, ERR_PTR for errors
+ *
+ * By default builds shallow tree: assign entry to inner slots if possible.
+ * In wost case range requires up to 2 * RADIX_TREE_MAX_PATH nodes plus
+ * RADIX_TREE_MAX_PATH for extending tree depth.
+ *
+ * If length is 2^n and start aligned to it then all slots are in one node.
+ *
+ * This function cannot fill or cut part of bugger range if this require
+ * spltting inner slots and insering new nodes: fails with -ERANGE.
+ *
+ * With flag RADIX_TREE_FILL_LEAVES builds deep tree and insert @item into
+ * leaf slots. This requires much more nodes.
+ *
+ * With flag RADIX_TREE_FILL_OVERWRITE removes everything in range and cut
+ * sub-tree if @item is NULL. Without that flag function undo all chandges
+ * and fails with code -EEXIST if finds any populated slot.
+ *
+ * With flag RADIX_TREE_FILL_ATOMIC function plays well with rcu-protected
+ * lookups: it fills new nodes with RADIX_TREE_RETRY before inserting them
+ * into the tree: lookup will see either old entry, @item or retry entry.
+ * At following iterations these slots are filled with @item or sub-nodes.
+ *
+ * With flag RADIX_TREE_FILL_CLEAR_TAGS also clears all tags.
+ *
+ * Function returns pointer to node which holds the last slot in range,
+ * NULL if that was root slot, or ERR_PTR: -ENOMEM, -EEXIST, -ERANGE.
+ */
+struct radix_tree_node *
+radix_tree_fill_range(struct radix_tree_root *root, unsigned long start,
+		      unsigned long end, void *item, unsigned int flags)
+{
+	unsigned long index = start, maxindex;
+	struct radix_tree_node *node, *child;
+	int error, root_shift, shift, tag, offset;
+	void *entry;
+
+	/* Sanity check */
+	if (start > end)
+		return ERR_PTR(-EINVAL);
+
+	/* Make sure the tree is high enough.  */
+	root_shift = radix_tree_load_root(root, &node, &maxindex);
+	if (end > maxindex) {
+		error = radix_tree_extend(root, end, root_shift);
+		if (error < 0)
+			return ERR_PTR(error);
+		root_shift = error;
+	}
+
+	/* Special case: single slot tree */
+	if (!root_shift) {
+		if (node && (!(flags & RADIX_TREE_FILL_OVERWRITE)))
+			return ERR_PTR(-EEXIST);
+		if (flags & RADIX_TREE_FILL_CLEAR_TAGS)
+			root_tag_clear_all(root);
+		rcu_assign_pointer(root->rnode, item);
+		return NULL;
+	}
+
+next_node:
+	node = NULL;
+	offset = 0;
+	entry = rcu_dereference_raw(root->rnode);
+	shift = root_shift;
+
+	/* Descend to the index. Do at least one step. */
+	do {
+		child = entry_to_node(entry);
+		shift -= RADIX_TREE_MAP_SHIFT;
+		if (!child || !radix_tree_is_internal_node(entry)) {
+			/* Entry wider than range */
+			if (child) {
+				error = -ERANGE;
+				goto undo;
+			}
+			/* Hole wider tnan truncated range */
+			if (!item)
+				goto skip_node;
+			child = radix_tree_node_alloc(root);
+			if (!child) {
+				error = -ENOMEM;
+				goto undo;
+			}
+			child->shift = shift;
+			child->offset = offset;
+			child->parent = node;
+			/* Populate range with retry entries. */
+			if (flags & RADIX_TREE_FILL_ATOMIC) {
+				int idx = (index >> shift) &
+					   RADIX_TREE_MAP_MASK;
+				int last = RADIX_TREE_MAP_SIZE;
+
+				if (end < (index | shift_maxindex(shift)))
+					last = (end >> shift) &
+						RADIX_TREE_MAP_MASK;
+				for (; idx <= last; idx++)
+					child->slots[idx] = RADIX_TREE_RETRY;
+			}
+			entry = node_to_entry(child);
+			if (node) {
+				rcu_assign_pointer(node->slots[offset], entry);
+				node->count++;
+			} else
+				rcu_assign_pointer(root->rnode, entry);
+		}
+		node = child;
+		offset = (index >> shift) & RADIX_TREE_MAP_MASK;
+		entry = rcu_dereference_raw(node->slots[offset]);
+
+		/* Stop if find leaf or slot inside range */
+	} while ((flags & RADIX_TREE_FILL_LEAVES) ? shift :
+			((index & ((1ul << shift) - 1)) ||
+			 (index | ((1ul << shift) - 1)) > end));
+
+next_slot:
+	/* NULL or retry entry */
+	if (entry <= RADIX_TREE_RETRY)
+		goto fill;
+
+	if (!(flags & RADIX_TREE_FILL_OVERWRITE)) {
+		error = -EEXIST;
+		goto undo;
+	}
+
+	/* Cut sub-tree */
+	if (unlikely(radix_tree_is_internal_node(entry))) {
+		rcu_assign_pointer(node->slots[offset], item);
+		child = entry_to_node(entry);
+		offset = 0;
+		do {
+			entry = rcu_dereference_raw(child->slots[offset]);
+			if (entry)
+				child->count--;
+			if (radix_tree_is_internal_node(entry)) {
+				child = entry_to_node(entry);
+				offset = 0;
+			} else if (++offset == RADIX_TREE_MAP_SIZE) {
+				offset = child->offset;
+				entry = child->parent;
+				WARN_ON_ONCE(child->count);
+				radix_tree_node_free(child);
+				child = entry;
+			}
+		} while (child != node);
+	}
+
+	if (flags & RADIX_TREE_FILL_CLEAR_TAGS) {
+		for (tag = 0; tag < RADIX_TREE_MAX_TAGS; tag++)
+			node_tag_clear(root, node, tag, offset);
+	}
+
+	/* Skip the rest if we're cleared class slot in node */
+	if (!--node->count && !item && __radix_tree_delete_node(root, node))
+		goto skip_node;
+
+
+fill:
+	rcu_assign_pointer(node->slots[offset], item);
+	if (item)
+		node->count++;
+
+	index += 1ul << shift;
+	if (index - 1 == end)
+		return node;
+
+	/* Next slot in this node and still in range */
+	if (index + (1ul << shift) - 1 <= end &&
+			++offset < RADIX_TREE_MAP_SIZE) {
+		entry = rcu_dereference_raw(node->slots[offset]);
+		goto next_slot;
+	}
+
+	goto next_node;
+
+skip_node:
+	index |= shift_maxindex(shift);
+	if (index++ >= end)
+		return node;
+	goto next_node;
+
+undo:
+	if (index > start)
+		radix_tree_fill_range(root, start, index - 1, NULL,
+				      RADIX_TREE_FILL_OVERWRITE);
+	return ERR_PTR(error);
+}
+EXPORT_SYMBOL(radix_tree_fill_range);
+
+/**
  * radix_tree_range_tag_if_tagged - for each item in given range set given
  *				   tag if item has another tag set
  * @root:		radix tree root

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 5+ messages in thread

* [PATCH RFC 2/4] lib/radix-tree: remove sibling entries
  2016-08-27 14:14 [PATCH RFC 1/4] lib/radix: add universal radix_tree_fill_range Konstantin Khlebnikov
@ 2016-08-27 14:16 ` Konstantin Khlebnikov
  2016-08-27 14:16 ` [PATCH RFC 3/4] testing/radix-tree: replace multi-order with range operations Konstantin Khlebnikov
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 5+ messages in thread
From: Konstantin Khlebnikov @ 2016-08-27 14:16 UTC (permalink / raw)
  To: linux-mm; +Cc: Andrew Morton, linux-kernel, Kirill A. Shutemov, Ross Zwisler

Current implementation stores huge entry as "canonical" head entry with
tail of "sibling" entries which points backward to the head. Iterator
jumps back and forward when sees them. This complication is required for
THP in page-cache because struct page can tell it start index and size.

This patch removes support of sibling entries but keeps part that allows
store data pointers in non-leaf slots. Huge pages will be stored as range
of slots which points to the head page.

This allows to simplify fast-path in radix_tree_next_chunk(): huge entry
is reported as single-slot chunk for any index within range.

Signed-off-by: Konstantin Khlebnikov <koct9i@gmail.com>
---
 include/linux/radix-tree.h |   75 ++++-----------------
 lib/radix-tree.c           |  158 +++++++++++++-------------------------------
 2 files changed, 60 insertions(+), 173 deletions(-)

diff --git a/include/linux/radix-tree.h b/include/linux/radix-tree.h
index af33e8d93ec3..1721ddbf981d 100644
--- a/include/linux/radix-tree.h
+++ b/include/linux/radix-tree.h
@@ -37,8 +37,8 @@
  * 10 - exceptional entry
  * 11 - this bit combination is currently unused/reserved
  *
- * The internal entry may be a pointer to the next level in the tree, a
- * sibling entry, or an indicator that the entry in this slot has been moved
+ * The internal entry may be a pointer to the next level in the tree
+ * or an indicator that the entry in this slot has been moved
  * to another location in the tree and the lookup should be restarted.  While
  * NULL fits the 'data pointer' pattern, it means that there is no entry in
  * the tree for this index (no matter what level of the tree it is found at).
@@ -265,13 +265,7 @@ static inline void radix_tree_replace_slot(void **pslot, void *item)
 int __radix_tree_create(struct radix_tree_root *root, unsigned long index,
 			unsigned order, struct radix_tree_node **nodep,
 			void ***slotp);
-int __radix_tree_insert(struct radix_tree_root *, unsigned long index,
-			unsigned order, void *);
-static inline int radix_tree_insert(struct radix_tree_root *root,
-			unsigned long index, void *entry)
-{
-	return __radix_tree_insert(root, index, 0, entry);
-}
+int radix_tree_insert(struct radix_tree_root *, unsigned long index, void *);
 void *__radix_tree_lookup(struct radix_tree_root *root, unsigned long index,
 			  struct radix_tree_node **nodep, void ***slotp);
 void *radix_tree_lookup(struct radix_tree_root *, unsigned long);
@@ -354,7 +348,6 @@ radix_tree_truncate_range(struct radix_tree_root *root,
  * @index:	index of current slot
  * @next_index:	one beyond the last index for this chunk
  * @tags:	bit-mask for tag-iterating
- * @shift:	shift for the node that holds our slots
  *
  * This radix tree iterator works in terms of "chunks" of slots.  A chunk is a
  * subinterval of slots contained within one radix tree leaf node.  It is
@@ -367,20 +360,8 @@ struct radix_tree_iter {
 	unsigned long	index;
 	unsigned long	next_index;
 	unsigned long	tags;
-#ifdef CONFIG_RADIX_TREE_MULTIORDER
-	unsigned int	shift;
-#endif
 };
 
-static inline unsigned int iter_shift(struct radix_tree_iter *iter)
-{
-#ifdef CONFIG_RADIX_TREE_MULTIORDER
-	return iter->shift;
-#else
-	return 0;
-#endif
-}
-
 #define RADIX_TREE_ITER_TAG_MASK	0x00FF	/* tag index in lower byte */
 #define RADIX_TREE_ITER_TAGGED		0x0100	/* lookup tagged slots */
 #define RADIX_TREE_ITER_CONTIG		0x0200	/* stop at first hole */
@@ -441,12 +422,6 @@ void **radix_tree_iter_retry(struct radix_tree_iter *iter)
 	return NULL;
 }
 
-static inline unsigned long
-__radix_tree_iter_add(struct radix_tree_iter *iter, unsigned long slots)
-{
-	return iter->index + (slots << iter_shift(iter));
-}
-
 /**
  * radix_tree_iter_next - resume iterating when the chunk may be invalid
  * @iter:	iterator state
@@ -458,7 +433,7 @@ __radix_tree_iter_add(struct radix_tree_iter *iter, unsigned long slots)
 static inline __must_check
 void **radix_tree_iter_next(struct radix_tree_iter *iter)
 {
-	iter->next_index = __radix_tree_iter_add(iter, 1);
+	iter->next_index = iter->index + 1;
 	iter->tags = 0;
 	return NULL;
 }
@@ -489,7 +464,7 @@ void **radix_tree_iter_jump(struct radix_tree_iter *iter, unsigned long index)
 static __always_inline long
 radix_tree_chunk_size(struct radix_tree_iter *iter)
 {
-	return (iter->next_index - iter->index) >> iter_shift(iter);
+	return iter->next_index - iter->index;
 }
 
 static inline struct radix_tree_node *entry_to_node(void *ptr)
@@ -508,6 +483,9 @@ static inline struct radix_tree_node *entry_to_node(void *ptr)
  * This function updates @iter->index in the case of a successful lookup.
  * For tagged lookup it also eats @iter->tags.
  *
+ * Please keep this fast-path as small as possible. Complicated logic is hidden
+ * inside radix_tree_next_chunk() which prepares chunks for this funciton.
+ *
  * There are several cases where 'slot' can be passed in as NULL to this
  * function.  These cases result from the use of radix_tree_iter_next() or
  * radix_tree_iter_retry().  In these cases we don't end up dereferencing
@@ -520,49 +498,24 @@ static __always_inline void **
 radix_tree_next_slot(void **slot, struct radix_tree_iter *iter, unsigned flags)
 {
 	if (flags & RADIX_TREE_ITER_TAGGED) {
-		void *canon = slot;
-
 		iter->tags >>= 1;
-		if (unlikely(!iter->tags))
-			return NULL;
-		while (IS_ENABLED(CONFIG_RADIX_TREE_MULTIORDER) &&
-					radix_tree_is_internal_node(slot[1])) {
-			if (entry_to_node(slot[1]) == canon) {
-				iter->tags >>= 1;
-				iter->index = __radix_tree_iter_add(iter, 1);
-				slot++;
-				continue;
-			}
-			iter->next_index = __radix_tree_iter_add(iter, 1);
-			return NULL;
-		}
 		if (likely(iter->tags & 1ul)) {
-			iter->index = __radix_tree_iter_add(iter, 1);
+			iter->index++;
 			return slot + 1;
 		}
-		if (!(flags & RADIX_TREE_ITER_CONTIG)) {
+		if (!(flags & RADIX_TREE_ITER_CONTIG) && likely(iter->tags)) {
 			unsigned offset = __ffs(iter->tags);
 
 			iter->tags >>= offset;
-			iter->index = __radix_tree_iter_add(iter, offset + 1);
+			iter->index += offset + 1;
 			return slot + offset + 1;
 		}
 	} else {
-		long count = radix_tree_chunk_size(iter);
-		void *canon = slot;
+		long size = radix_tree_chunk_size(iter);
 
-		while (--count > 0) {
+		while (--size > 0) {
 			slot++;
-			iter->index = __radix_tree_iter_add(iter, 1);
-
-			if (IS_ENABLED(CONFIG_RADIX_TREE_MULTIORDER) &&
-			    radix_tree_is_internal_node(*slot)) {
-				if (entry_to_node(*slot) == canon)
-					continue;
-				iter->next_index = iter->index;
-				break;
-			}
-
+			iter->index++;
 			if (likely(*slot))
 				return slot;
 			if (flags & RADIX_TREE_ITER_CONTIG) {
diff --git a/lib/radix-tree.c b/lib/radix-tree.c
index c46a60065a77..234f1ddbd7a9 100644
--- a/lib/radix-tree.c
+++ b/lib/radix-tree.c
@@ -77,21 +77,6 @@ static inline void *node_to_entry(void *ptr)
 
 #define RADIX_TREE_RETRY	node_to_entry(NULL)
 
-#ifdef CONFIG_RADIX_TREE_MULTIORDER
-/* Sibling slots point directly to another slot in the same node */
-static inline bool is_sibling_entry(struct radix_tree_node *parent, void *node)
-{
-	void **ptr = node;
-	return (parent->slots <= ptr) &&
-			(ptr < parent->slots + RADIX_TREE_MAP_SIZE);
-}
-#else
-static inline bool is_sibling_entry(struct radix_tree_node *parent, void *node)
-{
-	return false;
-}
-#endif
-
 static inline unsigned long get_slot_offset(struct radix_tree_node *parent,
 						 void **slot)
 {
@@ -104,16 +89,6 @@ static unsigned int radix_tree_descend(struct radix_tree_node *parent,
 	unsigned int offset = (index >> parent->shift) & RADIX_TREE_MAP_MASK;
 	void **entry = rcu_dereference_raw(parent->slots[offset]);
 
-#ifdef CONFIG_RADIX_TREE_MULTIORDER
-	if (radix_tree_is_internal_node(entry)) {
-		unsigned long siboff = get_slot_offset(parent, entry);
-		if (siboff < RADIX_TREE_MAP_SIZE) {
-			offset = siboff;
-			entry = rcu_dereference_raw(parent->slots[offset]);
-		}
-	}
-#endif
-
 	*nodep = (void *)entry;
 	return offset;
 }
@@ -232,12 +207,7 @@ static void dump_node(struct radix_tree_node *node, unsigned long index)
 		void *entry = node->slots[i];
 		if (!entry)
 			continue;
-		if (is_sibling_entry(node, entry)) {
-			pr_debug("radix sblng %p offset %ld val %p indices %ld-%ld\n",
-					entry, i,
-					*(void **)entry_to_node(entry),
-					first, last);
-		} else if (!radix_tree_is_internal_node(entry)) {
+		if (!radix_tree_is_internal_node(entry)) {
 			pr_debug("radix entry %p offset %ld indices %ld-%ld\n",
 					entry, i, first, last);
 		} else {
@@ -596,25 +566,6 @@ int __radix_tree_create(struct radix_tree_root *root, unsigned long index,
 		slot = &node->slots[offset];
 	}
 
-#ifdef CONFIG_RADIX_TREE_MULTIORDER
-	/* Insert pointers to the canonical entry */
-	if (order > shift) {
-		unsigned i, n = 1 << (order - shift);
-		offset = offset & ~(n - 1);
-		slot = &node->slots[offset];
-		child = node_to_entry(slot);
-		for (i = 0; i < n; i++) {
-			if (slot[i])
-				return -EEXIST;
-		}
-
-		for (i = 1; i < n; i++) {
-			rcu_assign_pointer(slot[i], child);
-			node->count++;
-		}
-	}
-#endif
-
 	if (nodep)
 		*nodep = node;
 	if (slotp)
@@ -623,16 +574,15 @@ int __radix_tree_create(struct radix_tree_root *root, unsigned long index,
 }
 
 /**
- *	__radix_tree_insert    -    insert into a radix tree
+ *	radix_tree_insert    -    insert into a radix tree
  *	@root:		radix tree root
  *	@index:		index key
- *	@order:		key covers the 2^order indices around index
  *	@item:		item to insert
  *
  *	Insert an item into the radix tree at position @index.
  */
-int __radix_tree_insert(struct radix_tree_root *root, unsigned long index,
-			unsigned order, void *item)
+int radix_tree_insert(struct radix_tree_root *root, unsigned long index,
+		      void *item)
 {
 	struct radix_tree_node *node;
 	void **slot;
@@ -640,7 +590,7 @@ int __radix_tree_insert(struct radix_tree_root *root, unsigned long index,
 
 	BUG_ON(radix_tree_is_internal_node(item));
 
-	error = __radix_tree_create(root, index, order, &node, &slot);
+	error = __radix_tree_create(root, index, 0, &node, &slot);
 	if (error)
 		return error;
 	if (*slot != NULL)
@@ -659,7 +609,7 @@ int __radix_tree_insert(struct radix_tree_root *root, unsigned long index,
 
 	return 0;
 }
-EXPORT_SYMBOL(__radix_tree_insert);
+EXPORT_SYMBOL(radix_tree_insert);
 
 /**
  *	__radix_tree_lookup	-	lookup an item in a radix tree
@@ -895,14 +845,6 @@ int radix_tree_tag_get(struct radix_tree_root *root,
 }
 EXPORT_SYMBOL(radix_tree_tag_get);
 
-static inline void __set_iter_shift(struct radix_tree_iter *iter,
-					unsigned int shift)
-{
-#ifdef CONFIG_RADIX_TREE_MULTIORDER
-	iter->shift = shift;
-#endif
-}
-
 /**
  * radix_tree_next_chunk - find next chunk of slots for iteration
  *
@@ -914,9 +856,10 @@ static inline void __set_iter_shift(struct radix_tree_iter *iter,
 void **radix_tree_next_chunk(struct radix_tree_root *root,
 			     struct radix_tree_iter *iter, unsigned flags)
 {
-	unsigned tag = flags & RADIX_TREE_ITER_TAG_MASK;
-	struct radix_tree_node *node, *child;
+	unsigned int shift, tag = flags & RADIX_TREE_ITER_TAG_MASK;
 	unsigned long index, offset, maxindex;
+	struct radix_tree_node *node;
+	void *entry;
 
 	if ((flags & RADIX_TREE_ITER_TAGGED) && !root_tag_get(root, tag))
 		return NULL;
@@ -934,28 +877,27 @@ void **radix_tree_next_chunk(struct radix_tree_root *root,
 	if (!index && iter->index)
 		return NULL;
 
- restart:
-	radix_tree_load_root(root, &child, &maxindex);
+restart:
+	shift = radix_tree_load_root(root, &node, &maxindex);
 	if (index > maxindex)
 		return NULL;
-	if (!child)
-		return NULL;
 
-	if (!radix_tree_is_internal_node(child)) {
+	if (!maxindex) {
 		/* Single-slot tree */
-		iter->index = index;
-		iter->next_index = maxindex + 1;
-		iter->tags = 1;
-		__set_iter_shift(iter, 0);
+		iter->index = 0;
+		iter->next_index = 1;
+		iter->tags = 0;
 		return (void **)&root->rnode;
 	}
 
-	do {
-		node = entry_to_node(child);
-		offset = radix_tree_descend(node, &child, index);
+	node = entry_to_node(node);
+	while (1) {
+		shift -= RADIX_TREE_MAP_SHIFT;
+		offset = (index >> shift) & RADIX_TREE_MAP_MASK;
+		entry = rcu_dereference_raw(node->slots[offset]);
 
 		if ((flags & RADIX_TREE_ITER_TAGGED) ?
-				!tag_get(node, tag, offset) : !child) {
+				!tag_get(node, tag, offset) : !entry) {
 			/* Hole detected */
 			if (flags & RADIX_TREE_ITER_CONTIG)
 				return NULL;
@@ -967,30 +909,42 @@ void **radix_tree_next_chunk(struct radix_tree_root *root,
 						offset + 1);
 			else
 				while (++offset	< RADIX_TREE_MAP_SIZE) {
-					void *slot = node->slots[offset];
-					if (is_sibling_entry(node, slot))
-						continue;
-					if (slot)
+					if (node->slots[offset])
 						break;
 				}
-			index &= ~node_maxindex(node);
-			index += offset << node->shift;
+
+			index &= ~shift_maxindex(shift);
+			index += offset << shift;
 			/* Overflow after ~0UL */
 			if (!index)
 				return NULL;
 			if (offset == RADIX_TREE_MAP_SIZE)
 				goto restart;
-			child = rcu_dereference_raw(node->slots[offset]);
+			entry = rcu_dereference_raw(node->slots[offset]);
 		}
 
-		if ((child == NULL) || (child == RADIX_TREE_RETRY))
+		/* This is leaf-node */
+		if (!shift)
+			break;
+
+		/* Non-leaf data entry */
+		if (!radix_tree_is_internal_node(entry)) {
+			/* Report as a single slot chunk */
+			iter->index = index;
+			iter->next_index = index + 1;
+			iter->tags = 0;
+			return node->slots + offset;
+		}
+
+		node = entry_to_node(entry);
+		/* RADIX_TREE_RETRY */
+		if (!node)
 			goto restart;
-	} while (radix_tree_is_internal_node(child));
+	}
 
 	/* Update the iterator state */
-	iter->index = (index &~ node_maxindex(node)) | (offset << node->shift);
-	iter->next_index = (index | node_maxindex(node)) + 1;
-	__set_iter_shift(iter, node->shift);
+	iter->index = index;
+	iter->next_index = (index | RADIX_TREE_MAP_MASK) + 1;
 
 	/* Construct iter->tags bit-mask from node->tags[tag] array */
 	if (flags & RADIX_TREE_ITER_TAGGED) {
@@ -1319,7 +1273,6 @@ unsigned long radix_tree_range_tag_if_tagged(struct radix_tree_root *root,
 			goto next;
 		if (!tag_get(node, iftag, offset))
 			goto next;
-		/* Sibling slots never have tags set on them */
 		if (radix_tree_is_internal_node(child)) {
 			node = entry_to_node(child);
 			continue;
@@ -1341,7 +1294,7 @@ unsigned long radix_tree_range_tag_if_tagged(struct radix_tree_root *root,
 				break;
 			tag_set(parent, settag, offset);
 		}
- next:
+next:
 		/* Go to next entry in node */
 		index = ((index >> node->shift) + 1) << node->shift;
 		/* Overflow can happen when last_index is ~0UL... */
@@ -1357,8 +1310,6 @@ unsigned long radix_tree_range_tag_if_tagged(struct radix_tree_root *root,
 			node = node->parent;
 			offset = (index >> node->shift) & RADIX_TREE_MAP_MASK;
 		}
-		if (is_sibling_entry(node, node->slots[offset]))
-			goto next;
 		if (tagged >= nr_to_tag)
 			break;
 	}
@@ -1574,8 +1525,6 @@ static unsigned long __locate(struct radix_tree_node *slot, void *item,
 				continue;
 			}
 			node = entry_to_node(node);
-			if (is_sibling_entry(slot, node))
-				continue;
 			slot = node;
 			break;
 		}
@@ -1750,20 +1699,6 @@ bool __radix_tree_delete_node(struct radix_tree_root *root,
 	return deleted;
 }
 
-static inline void delete_sibling_entries(struct radix_tree_node *node,
-					void *ptr, unsigned offset)
-{
-#ifdef CONFIG_RADIX_TREE_MULTIORDER
-	int i;
-	for (i = 1; offset + i < RADIX_TREE_MAP_SIZE; i++) {
-		if (node->slots[offset + i] != ptr)
-			break;
-		node->slots[offset + i] = NULL;
-		node->count--;
-	}
-#endif
-}
-
 /**
  *	radix_tree_delete_item    -    delete an item from a radix tree
  *	@root:		radix tree root
@@ -1803,7 +1738,6 @@ void *radix_tree_delete_item(struct radix_tree_root *root,
 	for (tag = 0; tag < RADIX_TREE_MAX_TAGS; tag++)
 		node_tag_clear(root, node, tag, offset);
 
-	delete_sibling_entries(node, node_to_entry(slot), offset);
 	node->slots[offset] = NULL;
 	node->count--;
 

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 5+ messages in thread

* [PATCH RFC 3/4] testing/radix-tree: replace multi-order with range operations
  2016-08-27 14:14 [PATCH RFC 1/4] lib/radix: add universal radix_tree_fill_range Konstantin Khlebnikov
  2016-08-27 14:16 ` [PATCH RFC 2/4] lib/radix-tree: remove sibling entries Konstantin Khlebnikov
@ 2016-08-27 14:16 ` Konstantin Khlebnikov
  2016-08-27 14:16 ` [PATCH RFC 4/4] testing/radix-tree: benchmark for iterator Konstantin Khlebnikov
  2016-08-30 22:39 ` [PATCH RFC 1/4] lib/radix: add universal radix_tree_fill_range Ross Zwisler
  3 siblings, 0 replies; 5+ messages in thread
From: Konstantin Khlebnikov @ 2016-08-27 14:16 UTC (permalink / raw)
  To: linux-mm; +Cc: Andrew Morton, linux-kernel, Kirill A. Shutemov, Ross Zwisler

This patch updates test for multi-order operations according to changes
in radix-tree: huge entry now must remember its range.

Signed-off-by: Konstantin Khlebnikov <koct9i@gmail.com>
---
 tools/testing/radix-tree/linux/bug.h      |    2 -
 tools/testing/radix-tree/linux/err.h      |   31 ++++++++
 tools/testing/radix-tree/linux/kernel.h   |    3 +
 tools/testing/radix-tree/linux/rcupdate.h |    6 ++
 tools/testing/radix-tree/multiorder.c     |  113 ++++++++++++++++-------------
 tools/testing/radix-tree/test.c           |   49 ++++++++++---
 tools/testing/radix-tree/test.h           |   15 ++--
 7 files changed, 151 insertions(+), 68 deletions(-)
 create mode 100644 tools/testing/radix-tree/linux/err.h

diff --git a/tools/testing/radix-tree/linux/bug.h b/tools/testing/radix-tree/linux/bug.h
index ccbe444977df..7a77fa971e91 100644
--- a/tools/testing/radix-tree/linux/bug.h
+++ b/tools/testing/radix-tree/linux/bug.h
@@ -1 +1 @@
-#define WARN_ON_ONCE(x)		assert(x)
+#define WARN_ON_ONCE(x)		assert(!(x))
diff --git a/tools/testing/radix-tree/linux/err.h b/tools/testing/radix-tree/linux/err.h
new file mode 100644
index 000000000000..6fd3e608d4d7
--- /dev/null
+++ b/tools/testing/radix-tree/linux/err.h
@@ -0,0 +1,31 @@
+#ifndef _LINUX_ERR_H
+#define _LINUX_ERR_H
+
+#define MAX_ERRNO       4095
+
+#define IS_ERR_VALUE(x) unlikely((unsigned long)(void *)(x) >= (unsigned long)-MAX_ERRNO)
+
+static inline void *ERR_PTR(long error)
+{
+	return (void *) error;
+}
+
+static inline long PTR_ERR(const void *ptr)
+{
+	return (long) ptr;
+}
+
+static inline bool IS_ERR(const void *ptr)
+{
+	return IS_ERR_VALUE((unsigned long)ptr);
+}
+
+static inline int PTR_ERR_OR_ZERO(const void *ptr)
+{
+	if (IS_ERR(ptr))
+		return PTR_ERR(ptr);
+	else
+		return 0;
+}
+
+#endif /* _LINUX_ERR_H */
diff --git a/tools/testing/radix-tree/linux/kernel.h b/tools/testing/radix-tree/linux/kernel.h
index be98a47b4e1b..52714e86991b 100644
--- a/tools/testing/radix-tree/linux/kernel.h
+++ b/tools/testing/radix-tree/linux/kernel.h
@@ -32,6 +32,9 @@
 
 #define ARRAY_SIZE(arr) (sizeof(arr) / sizeof((arr)[0]))
 
+#define IS_ALIGNED(x, a)	(((x) & ((typeof(x))(a) - 1)) == 0)
+#define is_power_of_2(n)	((n) != 0 && (((n) & ((n) - 1)) == 0))
+
 #define container_of(ptr, type, member) ({                      \
 	const typeof( ((type *)0)->member ) *__mptr = (ptr);    \
 	(type *)( (char *)__mptr - offsetof(type, member) );})
diff --git a/tools/testing/radix-tree/linux/rcupdate.h b/tools/testing/radix-tree/linux/rcupdate.h
index f7129ea2a899..8c4ae8173778 100644
--- a/tools/testing/radix-tree/linux/rcupdate.h
+++ b/tools/testing/radix-tree/linux/rcupdate.h
@@ -3,6 +3,12 @@
 
 #include <urcu.h>
 
+/* urcu.h includes errno.h which undefines ERANGE */
+#ifndef ERANGE
+#define ERANGE 34
+#endif
+
+#define RCU_INIT_POINTER(p, v) rcu_assign_pointer(p, v)
 #define rcu_dereference_raw(p) rcu_dereference(p)
 #define rcu_dereference_protected(p, cond) rcu_dereference(p)
 
diff --git a/tools/testing/radix-tree/multiorder.c b/tools/testing/radix-tree/multiorder.c
index 39d9b9568fe2..f73a3bfa83d8 100644
--- a/tools/testing/radix-tree/multiorder.c
+++ b/tools/testing/radix-tree/multiorder.c
@@ -29,7 +29,7 @@ static void __multiorder_tag_test(int index, int order)
 	unsigned long first = 0;
 
 	/* our canonical entry */
-	base = index & ~((1 << order) - 1);
+	base = index;
 
 	printf("Multiorder tag test with index %d, canonical entry %d\n",
 			index, base);
@@ -43,7 +43,7 @@ static void __multiorder_tag_test(int index, int order)
 	 * item_insert_order().
 	 */
 	for_each_index(i, base, order) {
-		err = __radix_tree_insert(&tree, i, order,
+		err = radix_tree_insert(&tree, i,
 				(void *)(0xA0 | RADIX_TREE_EXCEPTIONAL_ENTRY));
 		assert(err == -EEXIST);
 	}
@@ -55,18 +55,18 @@ static void __multiorder_tag_test(int index, int order)
 
 	assert(radix_tree_tag_set(&tree, index, 0));
 
-	for_each_index(i, base, order) {
-		assert(radix_tree_tag_get(&tree, i, 0));
+	assert(radix_tree_tag_get(&tree, index, 0));
+
+	for_each_index(i, base, order)
 		assert(!radix_tree_tag_get(&tree, i, 1));
-	}
 
 	assert(radix_tree_range_tag_if_tagged(&tree, &first, ~0UL, 10, 0, 1) == 1);
 	assert(radix_tree_tag_clear(&tree, index, 0));
 
-	for_each_index(i, base, order) {
+	assert(radix_tree_tag_get(&tree, index, 1));
+
+	for_each_index(i, base, order)
 		assert(!radix_tree_tag_get(&tree, i, 0));
-		assert(radix_tree_tag_get(&tree, i, 1));
-	}
 
 	assert(radix_tree_tag_clear(&tree, index, 1));
 
@@ -122,7 +122,7 @@ static void multiorder_tag_tests(void)
 static void multiorder_check(unsigned long index, int order)
 {
 	unsigned long i;
-	unsigned long min = index & ~((1UL << order) - 1);
+	unsigned long min = index;
 	unsigned long max = min + (1UL << order);
 	RADIX_TREE(tree, GFP_KERNEL);
 
@@ -145,7 +145,7 @@ static void multiorder_check(unsigned long index, int order)
 		assert(radix_tree_insert(&tree, i, entry) == -EEXIST);
 	}
 
-	assert(item_delete(&tree, index) != 0);
+	assert(item_delete_order(&tree, index, order) != 0);
 
 	for (i = 0; i < 2*max; i++)
 		item_check_absent(&tree, i);
@@ -178,7 +178,7 @@ static void multiorder_shrink(unsigned long index, int order)
 	for (i = max; i < 2*max; i++)
 		item_check_absent(&tree, i);
 
-	if (!item_delete(&tree, 0)) {
+	if (!item_delete_order(&tree, 0, order)) {
 		printf("failed to delete index %ld (order %d)\n", index, order);		abort();
 	}
 
@@ -221,14 +221,12 @@ void multiorder_iteration(void)
 				break;
 
 		radix_tree_for_each_slot(slot, &tree, &iter, j) {
-			int height = order[i] / RADIX_TREE_MAP_SHIFT;
-			int shift = height * RADIX_TREE_MAP_SHIFT;
 			int mask = (1 << order[i]) - 1;
 
 			assert(iter.index >= (index[i] &~ mask));
 			assert(iter.index <= (index[i] | mask));
-			assert(iter.shift == shift);
-			i++;
+			if (iter.index == (index[i] | mask))
+				i++;
 		}
 	}
 
@@ -248,36 +246,48 @@ void multiorder_tagged_iteration(void)
 #define MT_NUM_ENTRIES 9
 	int index[MT_NUM_ENTRIES] = {0, 2, 4, 16, 32, 40, 64, 72, 128};
 	int order[MT_NUM_ENTRIES] = {1, 0, 2, 4,  3,  1,  3,  0,   7};
-
-#define TAG_ENTRIES 7
-	int tag_index[TAG_ENTRIES] = {0, 4, 16, 40, 64, 72, 128};
+	int tag[MT_NUM_ENTRIES]   = {1, 0, 1, 1,  0,  1,  1,  1,   1};
 
 	for (i = 0; i < MT_NUM_ENTRIES; i++)
 		assert(!item_insert_order(&tree, index[i], order[i]));
 
 	assert(!radix_tree_tagged(&tree, 1));
 
-	for (i = 0; i < TAG_ENTRIES; i++)
-		assert(radix_tree_tag_set(&tree, tag_index[i], 1));
+	for (i = 0; i < MT_NUM_ENTRIES; i++) {
+		unsigned long end = item_order_end(index[i], order[i]);
+
+		assert(!radix_tree_tag_get(&tree, index[i], 1));
+		assert(!radix_tree_tag_get(&tree, end, 1));
+		if (tag[i])
+			assert(radix_tree_tag_set(&tree, end, 1));
+	}
+
+	for (i = 0; i < MT_NUM_ENTRIES; i++) {
+		unsigned long end = item_order_end(index[i], order[i]);
+
+		if (tag[i])
+			assert(radix_tree_tag_get(&tree, end, 1));
+		else
+			assert(!radix_tree_tag_get(&tree, end, 1));
+	}
 
 	for (j = 0; j < 256; j++) {
-		int mask, k;
+		int k;
 
-		for (i = 0; i < TAG_ENTRIES; i++) {
-			for (k = i; index[k] < tag_index[i]; k++)
-				;
-			if (j <= (index[k] | ((1 << order[k]) - 1)))
+		for (k = 0; k < MT_NUM_ENTRIES; k++)
+			if (tag[k] && j <= item_order_end(index[k], order[k]))
 				break;
-		}
 
 		radix_tree_for_each_tagged(slot, &tree, &iter, j, 1) {
-			for (k = i; index[k] < tag_index[i]; k++)
-				;
-			mask = (1 << order[k]) - 1;
-
-			assert(iter.index >= (tag_index[i] &~ mask));
-			assert(iter.index <= (tag_index[i] | mask));
-			i++;
+			unsigned long end = item_order_end(index[k], order[k]);
+
+			assert(k < MT_NUM_ENTRIES);
+			assert(radix_tree_tag_get(&tree, iter.index, 1));
+			assert(iter.index >= index[k]);
+			assert(iter.index <= end);
+			if (iter.index == end)
+				while (++k < MT_NUM_ENTRIES && !tag[k])
+					;
 		}
 	}
 
@@ -285,33 +295,36 @@ void multiorder_tagged_iteration(void)
 					MT_NUM_ENTRIES, 1, 2);
 
 	for (j = 0; j < 256; j++) {
-		int mask, k;
+		int k;
 
-		for (i = 0; i < TAG_ENTRIES; i++) {
-			for (k = i; index[k] < tag_index[i]; k++)
-				;
-			if (j <= (index[k] | ((1 << order[k]) - 1)))
+		for (k = 0; k < MT_NUM_ENTRIES; k++)
+			if (tag[k] && j <= item_order_end(index[k], order[k]))
 				break;
-		}
 
 		radix_tree_for_each_tagged(slot, &tree, &iter, j, 2) {
-			for (k = i; index[k] < tag_index[i]; k++)
-				;
-			mask = (1 << order[k]) - 1;
-
-			assert(iter.index >= (tag_index[i] &~ mask));
-			assert(iter.index <= (tag_index[i] | mask));
-			i++;
+			unsigned long end = item_order_end(index[k], order[k]);
+
+			assert(k < MT_NUM_ENTRIES);
+			assert(radix_tree_tag_get(&tree, iter.index, 2));
+			assert(iter.index >= index[k]);
+			assert(iter.index <= end);
+			if (iter.index == end)
+				while (++k < MT_NUM_ENTRIES && !tag[k])
+					;
 		}
 	}
 
-	first = 1;
+	assert(!radix_tree_tagged(&tree, 0));
+
+	first = index[1];
 	radix_tree_range_tag_if_tagged(&tree, &first, ~0UL,
 					MT_NUM_ENTRIES, 1, 0);
-	i = 0;
+
 	radix_tree_for_each_tagged(slot, &tree, &iter, 0, 0) {
-		assert(iter.index == tag_index[i]);
-		i++;
+		assert(radix_tree_tag_get(&tree, iter.index, 0));
+		assert(iter.index >= index[2]);
+		assert(iter.index <= item_order_end(index[2], order[2]));
+		break;
 	}
 
 	item_kill_tree(&tree);
diff --git a/tools/testing/radix-tree/test.c b/tools/testing/radix-tree/test.c
index a6e8099eaf4f..d6761d1a157a 100644
--- a/tools/testing/radix-tree/test.c
+++ b/tools/testing/radix-tree/test.c
@@ -4,6 +4,7 @@
 #include <linux/types.h>
 #include <linux/kernel.h>
 #include <linux/bitops.h>
+#include <linux/err.h>
 
 #include "test.h"
 
@@ -24,21 +25,23 @@ int item_tag_get(struct radix_tree_root *root, unsigned long index, int tag)
 	return radix_tree_tag_get(root, index, tag);
 }
 
-int __item_insert(struct radix_tree_root *root, struct item *item,
-			unsigned order)
+int item_insert(struct radix_tree_root *root, unsigned long index)
 {
-	return __radix_tree_insert(root, item->index, order, item);
+	return radix_tree_insert(root, index, item_create(index, index));
 }
 
-int item_insert(struct radix_tree_root *root, unsigned long index)
+unsigned long item_order_end(unsigned long index, unsigned int order)
 {
-	return __item_insert(root, item_create(index), 0);
+	return index + (1ul << order) - 1;
 }
 
 int item_insert_order(struct radix_tree_root *root, unsigned long index,
-			unsigned order)
+			unsigned int order)
 {
-	return __item_insert(root, item_create(index), order);
+	unsigned long end = item_order_end(index, order);
+
+	return PTR_ERR_OR_ZERO(radix_tree_fill_range(root, index, end,
+				item_create(index, end), 0));
 }
 
 int item_delete(struct radix_tree_root *root, unsigned long index)
@@ -47,17 +50,34 @@ int item_delete(struct radix_tree_root *root, unsigned long index)
 
 	if (item) {
 		assert(item->index == index);
+		assert(item->end == index);
 		free(item);
 		return 1;
 	}
 	return 0;
 }
 
-struct item *item_create(unsigned long index)
+int item_delete_order(struct radix_tree_root *root, unsigned long index,
+			unsigned int order)
+{
+	struct item *item = radix_tree_lookup(root, index);
+	unsigned long end = item_order_end(index, order);
+
+	if (item) {
+		assert(item->index == index);
+		assert(item->end == end);
+	}
+	radix_tree_truncate_range(root, index, end);
+	free(item);
+	return !!item;
+}
+
+struct item *item_create(unsigned long index, unsigned long end)
 {
 	struct item *ret = malloc(sizeof(*ret));
 
 	ret->index = index;
+	ret->end = end;
 	return ret;
 }
 
@@ -207,10 +227,17 @@ void item_kill_tree(struct radix_tree_root *root)
 		int i;
 
 		for (i = 0; i < nfound; i++) {
-			void *ret;
+			void *item;
+
+			if (items[i]->index != items[i]->end) {
+				radix_tree_truncate_range(root, items[i]->index,
+								items[i]->end);
+				free(items[i]);
+				break;
+			}
 
-			ret = radix_tree_delete(root, items[i]->index);
-			assert(ret == items[i]);
+			item = radix_tree_delete(root, items[i]->index);
+			assert(item == items[i]);
 			free(items[i]);
 		}
 	}
diff --git a/tools/testing/radix-tree/test.h b/tools/testing/radix-tree/test.h
index 217fb2403f09..93a6ce5e5a59 100644
--- a/tools/testing/radix-tree/test.h
+++ b/tools/testing/radix-tree/test.h
@@ -4,16 +4,19 @@
 #include <linux/rcupdate.h>
 
 struct item {
-	unsigned long index;
+	unsigned long index, end;
 };
 
-struct item *item_create(unsigned long index);
-int __item_insert(struct radix_tree_root *root, struct item *item,
-			unsigned order);
+struct item *item_create(unsigned long index, unsigned long end);
 int item_insert(struct radix_tree_root *root, unsigned long index);
-int item_insert_order(struct radix_tree_root *root, unsigned long index,
-			unsigned order);
 int item_delete(struct radix_tree_root *root, unsigned long index);
+
+unsigned long item_order_end(unsigned long index, unsigned int order);
+int item_insert_order(struct radix_tree_root *root, unsigned long index,
+			unsigned int order);
+int item_delete_order(struct radix_tree_root *root, unsigned long index,
+			unsigned int order);
+
 struct item *item_lookup(struct radix_tree_root *root, unsigned long index);
 
 void item_check_present(struct radix_tree_root *root, unsigned long index);

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 5+ messages in thread

* [PATCH RFC 4/4] testing/radix-tree: benchmark for iterator
  2016-08-27 14:14 [PATCH RFC 1/4] lib/radix: add universal radix_tree_fill_range Konstantin Khlebnikov
  2016-08-27 14:16 ` [PATCH RFC 2/4] lib/radix-tree: remove sibling entries Konstantin Khlebnikov
  2016-08-27 14:16 ` [PATCH RFC 3/4] testing/radix-tree: replace multi-order with range operations Konstantin Khlebnikov
@ 2016-08-27 14:16 ` Konstantin Khlebnikov
  2016-08-30 22:39 ` [PATCH RFC 1/4] lib/radix: add universal radix_tree_fill_range Ross Zwisler
  3 siblings, 0 replies; 5+ messages in thread
From: Konstantin Khlebnikov @ 2016-08-27 14:16 UTC (permalink / raw)
  To: linux-mm; +Cc: Andrew Morton, linux-kernel, Kirill A. Shutemov, Ross Zwisler

This adds simple benchmark for iterator similar to one I've used for
commit 78c1d78488a3 ("radix-tree: introduce bit-optimized iterator")

Building with make BENCHMARK=1 set radix tree order to 6 and adds -O2,
this allows to get performance comparable to in kernel performance.

Signed-off-by: Konstantin Khlebnikov <koct9i@gmail.com>
---
 tools/testing/radix-tree/Makefile       |    6 ++
 tools/testing/radix-tree/benchmark.c    |  101 +++++++++++++++++++++++++++++++
 tools/testing/radix-tree/linux/kernel.h |    4 +
 tools/testing/radix-tree/main.c         |    2 +
 tools/testing/radix-tree/test.h         |    1 
 5 files changed, 113 insertions(+), 1 deletion(-)
 create mode 100644 tools/testing/radix-tree/benchmark.c

diff --git a/tools/testing/radix-tree/Makefile b/tools/testing/radix-tree/Makefile
index 6079ec142685..1594335d1ed6 100644
--- a/tools/testing/radix-tree/Makefile
+++ b/tools/testing/radix-tree/Makefile
@@ -4,7 +4,11 @@ LDFLAGS += -lpthread -lurcu
 TARGETS = main
 OFILES = main.o radix-tree.o linux.o test.o tag_check.o find_next_bit.o \
 	 regression1.o regression2.o regression3.o multiorder.o \
-	 iteration_check.o
+	 iteration_check.o benchmark.o
+
+ifdef BENCHMARK
+	CFLAGS += -DBENCHMARK=1 -O2
+endif
 
 targets: $(TARGETS)
 
diff --git a/tools/testing/radix-tree/benchmark.c b/tools/testing/radix-tree/benchmark.c
new file mode 100644
index 000000000000..05d46071bf37
--- /dev/null
+++ b/tools/testing/radix-tree/benchmark.c
@@ -0,0 +1,101 @@
+/*
+ * benchmark.c:
+ * Author: Konstantin Khlebnikov <koct9i@gmail.com>
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ */
+#include <linux/radix-tree.h>
+#include <linux/slab.h>
+#include <linux/errno.h>
+#include <time.h>
+#include "test.h"
+
+#define NSEC_PER_SEC	1000000000L
+
+static long long benchmark_iter(struct radix_tree_root *root, bool tagged)
+{
+	volatile unsigned long sink = 0;
+	struct radix_tree_iter iter;
+	struct timespec start, finish;
+	long long nsec;
+	int l, loops = 1;
+	void **slot;
+
+#ifdef BENCHMARK
+again:
+#endif
+	clock_gettime(CLOCK_MONOTONIC, &start);
+	for (l = 0; l < loops; l++) {
+		if (tagged) {
+			radix_tree_for_each_tagged(slot, root, &iter, 0, 0)
+				sink ^= (unsigned long)slot;
+		} else {
+			radix_tree_for_each_slot(slot, root, &iter, 0)
+				sink ^= (unsigned long)slot;
+		}
+	}
+	clock_gettime(CLOCK_MONOTONIC, &finish);
+
+	nsec = (finish.tv_sec - start.tv_sec) * NSEC_PER_SEC +
+	       (finish.tv_nsec - start.tv_nsec);
+
+#ifdef BENCHMARK
+	if (loops == 1 && nsec * 5 < NSEC_PER_SEC) {
+		loops = NSEC_PER_SEC / nsec / 4 + 1;
+		goto again;
+	}
+#endif
+
+	nsec /= loops;
+	return nsec;
+}
+
+static void benchmark_size(unsigned long size, unsigned long step, int order)
+{
+	RADIX_TREE(tree, GFP_KERNEL);
+	long long normal, tagged;
+	unsigned long index;
+
+	for (index = 0 ; index < size ; index += step) {
+		item_insert_order(&tree, index, order);
+		radix_tree_tag_set(&tree, item_order_end(index, order), 0);
+	}
+
+	tagged = benchmark_iter(&tree, true);
+	normal = benchmark_iter(&tree, false);
+
+	printf("Size %ld, step %6ld, order %d tagged %10lld ns, normal %10lld ns\n",
+		size, step, order, tagged, normal);
+
+	item_kill_tree(&tree);
+}
+
+void benchmark(void)
+{
+	unsigned long size[] = {1 << 10, 1 << 20,
+#ifdef BENCHMARK
+		1 << 27,
+#endif
+		0};
+	unsigned long step[] = {1, 2, 7, 15, 63, 64, 65,
+				128, 256, 512, 12345, 0};
+	int c, s;
+
+	printf("starting benchmarks\n");
+	printf("RADIX_TREE_MAP_SHIFT = %d\n", RADIX_TREE_MAP_SHIFT);
+
+	for (c = 0; size[c]; c++)
+		for (s = 0; step[s]; s++)
+			benchmark_size(size[c], step[s], 0);
+
+	for (c = 0; size[c]; c++)
+		for (s = 0; step[s]; s++)
+			benchmark_size(size[c], step[s] << 9, 9);
+}
diff --git a/tools/testing/radix-tree/linux/kernel.h b/tools/testing/radix-tree/linux/kernel.h
index 52714e86991b..ddabc495423f 100644
--- a/tools/testing/radix-tree/linux/kernel.h
+++ b/tools/testing/radix-tree/linux/kernel.h
@@ -10,7 +10,11 @@
 #include "../../include/linux/compiler.h"
 #include "../../../include/linux/kconfig.h"
 
+#ifdef BENCHMARK
+#define RADIX_TREE_MAP_SHIFT	6
+#else
 #define RADIX_TREE_MAP_SHIFT	3
+#endif
 
 #ifndef NULL
 #define NULL	0
diff --git a/tools/testing/radix-tree/main.c b/tools/testing/radix-tree/main.c
index daa9010693e8..6e9a02cb7b97 100644
--- a/tools/testing/radix-tree/main.c
+++ b/tools/testing/radix-tree/main.c
@@ -335,6 +335,8 @@ int main(int argc, char **argv)
 	iteration_test();
 	single_thread_tests(long_run);
 
+	benchmark();
+
 	sleep(1);
 	printf("after sleep(1): %d allocated\n", nr_allocated);
 	rcu_unregister_thread();
diff --git a/tools/testing/radix-tree/test.h b/tools/testing/radix-tree/test.h
index 93a6ce5e5a59..2529a622f1f2 100644
--- a/tools/testing/radix-tree/test.h
+++ b/tools/testing/radix-tree/test.h
@@ -31,6 +31,7 @@ void item_kill_tree(struct radix_tree_root *root);
 void tag_check(void);
 void multiorder_checks(void);
 void iteration_test(void);
+void benchmark(void);
 
 struct item *
 item_tag_set(struct radix_tree_root *root, unsigned long index, int tag);

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [PATCH RFC 1/4] lib/radix: add universal radix_tree_fill_range
  2016-08-27 14:14 [PATCH RFC 1/4] lib/radix: add universal radix_tree_fill_range Konstantin Khlebnikov
                   ` (2 preceding siblings ...)
  2016-08-27 14:16 ` [PATCH RFC 4/4] testing/radix-tree: benchmark for iterator Konstantin Khlebnikov
@ 2016-08-30 22:39 ` Ross Zwisler
  3 siblings, 0 replies; 5+ messages in thread
From: Ross Zwisler @ 2016-08-30 22:39 UTC (permalink / raw)
  To: Konstantin Khlebnikov
  Cc: linux-mm, Andrew Morton, linux-kernel, Kirill A. Shutemov, Ross Zwisler

On Sat, Aug 27, 2016 at 05:14:34PM +0300, Konstantin Khlebnikov wrote:
> Tags should be set only for last index of THP range: this way iterator
> will find them regardless of starting index.

I don't think this works well for DAX.  We really want to to have the tags be
consistent for all indices within a multi-order range.  Meaning, if I fault in
an order-9 fault, and then I get a PTE write fault to anywhere within that
range, I want to be able to do a lookup, find the one canonical entry that has
my dirty tags, flush, and eventually I want to be able to clear that one tag.

I agree that it's *possible* to do all of this with your code, but it puts a
lot of onus on the user.  I now have to have two paths, one for order-0
entries, and one for multi-order entries where I know to use a specific entry
as my canonical entry where I can count on the log bit, on tags, etc.

This was actually the way that it was done with the old PMD code.   We used
the first aligned index for the PMD to be the one source of truth.  On every
fault I would first check to see if there was a PMD aligned entry, and then if
not I would treat it like a normal 4k fault.  The multi-order radix tree with
sibling entries was a huge step forward.

I guess my question is the same as Matthew's: what is the problem you need to
solve with this code, and why can't the current code be made to solve it?

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2016-08-30 22:39 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-08-27 14:14 [PATCH RFC 1/4] lib/radix: add universal radix_tree_fill_range Konstantin Khlebnikov
2016-08-27 14:16 ` [PATCH RFC 2/4] lib/radix-tree: remove sibling entries Konstantin Khlebnikov
2016-08-27 14:16 ` [PATCH RFC 3/4] testing/radix-tree: replace multi-order with range operations Konstantin Khlebnikov
2016-08-27 14:16 ` [PATCH RFC 4/4] testing/radix-tree: benchmark for iterator Konstantin Khlebnikov
2016-08-30 22:39 ` [PATCH RFC 1/4] lib/radix: add universal radix_tree_fill_range Ross Zwisler

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).