* IDA/IDR rewrite, percpu ida
@ 2013-08-07 17:34 Kent Overstreet
  2013-08-07 17:34 ` [PATCH 03/10] idr: Rewrite ida Kent Overstreet
                   ` (7 more replies)
  0 siblings, 8 replies; 46+ messages in thread
From: Kent Overstreet @ 2013-08-07 17:34 UTC (permalink / raw)
  To: akpm, linux-kernel

Andrew - this should be pretty much identical to the patch series I
mailed out during last merge window, except rebased onto 3.11-rc4 and
retested.

I think the series should be more or less ready to go, and it'd be
really nice to get at least the percpu bits in - think you can have a
look and pick it up if it meets your standards?

Patch series is also available in git -
git://evilpiepirate.org/~kent/linux-bcache.git idr



* [PATCH 03/10] idr: Rewrite ida
  2013-08-07 17:34 IDA/IDR rewrite, percpu ida Kent Overstreet
@ 2013-08-07 17:34 ` Kent Overstreet
  2013-08-07 20:22   ` Tejun Heo
  2013-08-07 17:34 ` [PATCH 04/10] idr: Percpu ida Kent Overstreet
                   ` (6 subsequent siblings)
  7 siblings, 1 reply; 46+ messages in thread
From: Kent Overstreet @ 2013-08-07 17:34 UTC (permalink / raw)
  To: akpm, linux-kernel
  Cc: Kent Overstreet, Tejun Heo, Stephen Rothwell, Fengguang Wu

This is a new, from-scratch implementation of ida that should be
simpler, faster and more space efficient.

Two primary reasons for the rewrite:
 * A future patch will reimplement idr on top of this ida implementation +
   radix trees. Once that's done, the end result will be ~1k fewer lines
   of code, much simpler and easier to understand, and it should be quite
   a bit faster.

 * The performance improvements and addition of ganged allocation should
   make ida more suitable for use by a percpu id/tag allocator, which
   would then act as a frontend to this allocator.

The old ida implementation was done with the idr data structures - this
was IMO backwards. I'll soon be reimplementing idr on top of this new
ida implementation and radix trees - using a separate dedicated data
structure for the free ID bitmap should actually make idr faster, and
the end result is _significantly_ less code.

This implementation conceptually isn't that different from the old one -
it's a tree of bitmaps, where one bit in a given node indicates whether
or not there are free bits in a child node.

The main difference (and advantage) over the old version is that the
tree isn't implemented with pointers - it's implemented in an array,
like how heaps are implemented - which gives both better space
efficiency and better performance, since there's no pointer chasing.
(It's not one giant contiguous array - it's an array of arrays - but the
algorithm treats it as one big array.)

Time to allocate 1 << 24 ids:		0m0.663s
Time to allocate 1 << 24 ids, old code:	0m28.604s

Time to allocate INT_MAX ids:		1m41.371s
Time to allocate INT_MAX ids, old code:	Got bored of waiting for it to finish.
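
For reference, basic usage of the new interface looks like this (a
sketch based on the declarations added in this patch; error handling
abbreviated):

	struct ida ida;
	int id;

	ida_init(&ida);

	id = ida_alloc(&ida, GFP_KERNEL);	/* smallest free id, or -errno */
	if (id < 0)
		return id;			/* -ENOMEM or -ENOSPC */

	/* ... use id ... */

	ida_remove(&ida, id);
	ida_destroy(&ida);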

Signed-off-by: Kent Overstreet <kmo@daterainc.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Fengguang Wu <fengguang.wu@intel.com>
---
 include/linux/idr.h | 122 ++++---
 lib/idr.c           | 897 +++++++++++++++++++++++++++++++++++-----------------
 2 files changed, 691 insertions(+), 328 deletions(-)

diff --git a/include/linux/idr.h b/include/linux/idr.h
index c0e0c54..a310bb0 100644
--- a/include/linux/idr.h
+++ b/include/linux/idr.h
@@ -16,6 +16,92 @@
 #include <linux/bitops.h>
 #include <linux/init.h>
 #include <linux/rcupdate.h>
+#include <linux/spinlock_types.h>
+#include <linux/wait.h>
+
+/* IDA */
+
+struct ida {
+	spinlock_t		lock;
+
+	/*
+	 * cur_id and allocated_ids are for ida_alloc_cyclic. For cyclic
+	 * allocations we search for new ids to allocate starting from the last
+	 * id allocated - cur_id is the next id to try allocating.
+	 *
+	 * But we also don't want the allocated ids to be arbitrarily sparse -
+	 * the memory usage for the bitmap could be arbitrarily bad, and if
+	 * they're used as keys in a radix tree the memory overhead of the radix
+	 * tree could be quite bad as well. So we use allocated_ids to decide
+	 * when to restart cur_id from 0, and bound how sparse the bitmap can
+	 * be.
+	 */
+	unsigned		cur_id;
+	unsigned		allocated_ids;
+
+	/* size of ida->tree */
+	unsigned		nodes;
+
+	/*
+	 * Index of first leaf node in ida->tree; equal to the number of
+	 * non-leaf nodes, so ida->nodes - ida->first_leaf == number of leaves
+	 */
+	unsigned		first_leaf;
+	unsigned		sections;
+
+	unsigned long		**tree;
+	unsigned long		*inline_section;
+	unsigned long		inline_node;
+};
+
+#define IDA_INIT(name)						\
+{								\
+	.lock		= __SPIN_LOCK_UNLOCKED(name.lock),	\
+	.nodes		= 1,					\
+	.first_leaf	= 0,					\
+	.sections	= 1,					\
+	.tree		= &name.inline_section,			\
+	.inline_section	= &name.inline_node,			\
+}
+#define DEFINE_IDA(name)	struct ida name = IDA_INIT(name)
+
+void ida_remove(struct ida *ida, unsigned id);
+int ida_alloc_range(struct ida *ida, unsigned int start,
+		  unsigned int end, gfp_t gfp);
+int ida_alloc_cyclic(struct ida *ida, unsigned start, unsigned end, gfp_t gfp);
+void ida_destroy(struct ida *ida);
+int ida_init_prealloc(struct ida *ida, unsigned prealloc);
+
+/**
+ * ida_alloc - allocate a new id.
+ * @ida: the (initialized) ida.
+ * @gfp_mask: memory allocation flags
+ *
+ * Allocates an id in the range [0, INT_MAX]. Returns -ENOSPC if no ids are
+ * available, or -ENOMEM on memory allocation failure.
+ *
+ * Returns the smallest available id.
+ *
+ * Use ida_remove() to get rid of an id.
+ */
+static inline int ida_alloc(struct ida *ida, gfp_t gfp_mask)
+{
+	return ida_alloc_range(ida, 0, 0, gfp_mask);
+}
+
+/**
+ * ida_init - initialize ida handle
+ * @ida:	ida handle
+ *
+ * This function is used to set up the handle (@ida) that you will pass
+ * to the rest of the functions.
+ */
+static inline void ida_init(struct ida *ida)
+{
+	ida_init_prealloc(ida, 0);
+}
+
+/* IDR */
 
 /*
  * We want shallower trees and thus more bits covered at each layer.  8
@@ -195,42 +281,6 @@ static inline void __deprecated idr_remove_all(struct idr *idp)
 	__idr_remove_all(idp);
 }
 
-/*
- * IDA - IDR based id allocator, use when translation from id to
- * pointer isn't necessary.
- *
- * IDA_BITMAP_LONGS is calculated to be one less to accommodate
- * ida_bitmap->nr_busy so that the whole struct fits in 128 bytes.
- */
-#define IDA_CHUNK_SIZE		128	/* 128 bytes per chunk */
-#define IDA_BITMAP_LONGS	(IDA_CHUNK_SIZE / sizeof(long) - 1)
-#define IDA_BITMAP_BITS 	(IDA_BITMAP_LONGS * sizeof(long) * 8)
-
-struct ida_bitmap {
-	long			nr_busy;
-	unsigned long		bitmap[IDA_BITMAP_LONGS];
-};
-
-struct ida {
-	struct idr		idr;
-	struct ida_bitmap	*free_bitmap;
-};
-
-#define IDA_INIT(name)		{ .idr = IDR_INIT((name).idr), .free_bitmap = NULL, }
-#define DEFINE_IDA(name)	struct ida name = IDA_INIT(name)
-
-void ida_destroy(struct ida *ida);
-void ida_init(struct ida *ida);
-
-int ida_alloc_range(struct ida *ida, unsigned int start, unsigned int end,
-		   gfp_t gfp_mask);
-void ida_remove(struct ida *ida, unsigned int id);
-
-static inline int ida_alloc(struct ida *ida, gfp_t gfp_mask)
-{
-	return ida_alloc_range(ida, 0, 0, gfp_mask);
-}
-
 void __init idr_init_cache(void);
 
 #endif /* __IDR_H__ */
diff --git a/lib/idr.c b/lib/idr.c
index 9ac1174..e5afab5 100644
--- a/lib/idr.c
+++ b/lib/idr.c
@@ -8,6 +8,8 @@
  *
  * Modified by Nadia Derbey to make it RCU safe.
  *
+ * IDA completely rewritten by Kent Overstreet <koverstreet@google.com>
+ *
  * Small id to pointer translation service.
  *
  * It uses a radix tree like structure as a sparse array indexed
@@ -26,17 +28,612 @@
  * with the slab allocator.
  */
 
-#ifndef TEST                        // to test in user space...
-#include <linux/slab.h>
-#include <linux/init.h>
-#include <linux/export.h>
-#endif
+#include <linux/bitmap.h>
+#include <linux/bitops.h>
+#include <linux/bug.h>
 #include <linux/err.h>
-#include <linux/string.h>
+#include <linux/export.h>
+#include <linux/hardirq.h>
 #include <linux/idr.h>
-#include <linux/spinlock.h>
+#include <linux/init.h>
+#include <linux/kernel.h>
 #include <linux/percpu.h>
-#include <linux/hardirq.h>
+#include <linux/sched.h>
+#include <linux/slab.h>
+#include <linux/string.h>
+#include <linux/spinlock.h>
+
+static void kgfree(void *ptr, size_t size)
+{
+	if (size < PAGE_SIZE)
+		kfree(ptr);
+	else
+		free_pages((unsigned long) ptr, get_order(size));
+}
+
+static void *kgalloc(size_t size, gfp_t gfp)
+{
+	return size < PAGE_SIZE
+		? kmalloc(size, gfp)
+		: (void *) __get_free_pages(gfp, get_order(size));
+}
+
+/**
+ * DOC: IDA description
+ * IDA - ID (small integer) allocator
+ *
+ * This works much like using a simple bitmap to allocate indices - ida_alloc()
+ * is equivalent to find_first_zero_bit() then __set_bit(), and ida_remove() is
+ * equivalent to __clear_bit(). But it's much more efficient than a large
+ * bitmap, and resizes itself as needed.
+ *
+ * It's implemented as a tree of bitmaps: a node in the tree is a single
+ * unsigned long. The leaf nodes of the tree are segments of the entire bitmap -
+ * a cleared bit indicates a free id, and a set bit indicates an allocated one.
+ * Bits in the parent nodes indicate whether or not there are free bits in the
+ * corresponding child node - when all the bits in a parent node are set, none
+ * of its children have bits free.
+ *
+ * The splay factor of the tree (IDA_TREE_ARY) == BITS_PER_LONG - parent nodes
+ * have 32 or 64 children.
+ *
+ * The tree itself is implemented with an array instead of pointers - exactly
+ * like the textbook implementation of D-ary heaps. The root of the bitmap tree
+ * is at ida->tree[0]. The children of node i are at i * IDA_TREE_ARY + 1 + j,
+ * where j is in the range [0, IDA_TREE_ARY - 1], and the parent of node i
+ * is at (i - 1) / IDA_TREE_ARY.
+ *
+ * This conveniently means that our leaf nodes are all contiguous in memory -
+ * the bit for id i is bit id % BITS_PER_LONG in ida->tree[ida->first_leaf + i /
+ * BITS_PER_LONG].
+ *
+ * Note that the number of ids we can allocate is limited by the amount of
+ * memory we can contiguously allocate. The amount of memory used for the bitmap
+ * tree is only slightly more than a flat bitmap would use - about 1 / TREE_ARY
+ * * (sizeof flat bitmap).
+ *
+ * So for 1 MB of memory (and allocating more than that should be fine with
+ * CONFIG_COMPACTION) you get slightly under 8 million IDs.
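+ *
+ * An illustrative example (assuming 64-bit longs): ida->tree[0] is the
+ * root and nodes 1..64 are its children. In a tree with ida->first_leaf
+ * == 1, the bit for id 200 lives in node 1 + 200 / 64 = 4, at bit
+ * 200 % 64 = 8.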
+ */
+
+#define IDA_TREE_ARY		BITS_PER_LONG
+#define IDA_ALLOC_ORDER_MAX	4
+#define IDA_SECTION_SIZE	(PAGE_SIZE << IDA_ALLOC_ORDER_MAX)
+#define IDA_NODES_PER_SECTION	(IDA_SECTION_SIZE / sizeof(unsigned long))
+
+static inline unsigned long *ida_index_to_node(struct ida *ida, unsigned node)
+{
+	return ida->tree[node / IDA_NODES_PER_SECTION] +
+		node % IDA_NODES_PER_SECTION;
+}
+
+/*
+ * For a given number of nodes, calculate how many are going to be parent nodes
+ * (equal to ida->first_leaf) and, by extension, how many will be leaves.
+ */
+static unsigned first_leaf_from_nodes(unsigned nodes)
+{
+	unsigned ret = 0;
+
+	while (ret * IDA_TREE_ARY + 1 < nodes)
+		ret = ret * IDA_TREE_ARY + 1;
+
+	return ret;
+}
+
+static void __ida_remove(struct ida *ida, unsigned int id)
+{
+	unsigned i = ida->first_leaf + id / BITS_PER_LONG;
+	unsigned bit = id % BITS_PER_LONG;
+
+	if (WARN(i >= ida->nodes,
+		 "Tried to free an id outside the range of allocated ids\n"))
+		return;
+
+	--ida->allocated_ids;
+
+	while (1) {
+		unsigned long *node = ida_index_to_node(ida, i), old = *node;
+
+		WARN(!test_bit(bit, node),
+		     "Tried to free an id that was already free\n");
+		__clear_bit(bit, node);
+
+		if (~old || !i)
+			break;
+
+		/*
+		 * If this node's bits were all 1s before we cleared this bit,
+		 * we need to clear this node's bit in the parent node - and so
+		 * on up to the root.
+		 */
+
+		bit = (i - 1) % IDA_TREE_ARY;
+		i = (i - 1) / IDA_TREE_ARY;
+	}
+}
+
+/**
+ * ida_remove - remove an allocated id.
+ * @ida: the (initialized) ida.
+ * @id: the id returned by ida_alloc_range.
+ */
+void ida_remove(struct ida *ida, unsigned int id)
+{
+	unsigned long flags;
+	spin_lock_irqsave(&ida->lock, flags);
+	__ida_remove(ida, id);
+	spin_unlock_irqrestore(&ida->lock, flags);
+}
+EXPORT_SYMBOL(ida_remove);
+
+static void ida_increase_depth(struct ida *ida, unsigned new_nodes,
+			       unsigned new_first_leaf)
+{
+	unsigned old_leaves = ida->nodes - ida->first_leaf;
+	unsigned src = ida->nodes;
+	unsigned dst = new_first_leaf + old_leaves;
+	unsigned n, i, bit;
+	unsigned long *node;
+
+	/* Shift leaves up to new position */
+	while (src != ida->first_leaf) {
+		i = min((src - 1) % IDA_NODES_PER_SECTION + 1,
+			(dst - 1) % IDA_NODES_PER_SECTION + 1);
+
+		i = min(i, src - ida->first_leaf);
+
+		src -= i;
+		dst -= i;
+
+		memmove(ida_index_to_node(ida, dst),
+			ida_index_to_node(ida, src),
+			i * sizeof(unsigned long));
+	}
+
+	/* Zero out parent nodes */
+	for (n = 0; n < new_first_leaf; n += i) {
+		i = min_t(unsigned, new_first_leaf - n,
+			  IDA_NODES_PER_SECTION);
+
+		memset(ida_index_to_node(ida, n),
+		       0, i * sizeof(unsigned long));
+	}
+
+	/* Reconstruct parent nodes */
+	for (n = new_first_leaf; n < new_first_leaf + old_leaves; n++) {
+		i = n;
+		node = ida_index_to_node(ida, i);
+
+		while (!~*node && i) {
+			bit = (i - 1) % IDA_TREE_ARY;
+			i = (i - 1) / IDA_TREE_ARY;
+
+			node = ida_index_to_node(ida, i);
+			__set_bit(bit, node);
+		}
+	}
+}
+
+/*
+ * Attempt to double the size of the tree. We have to drop ida->lock to allocate
+ * memory, so we might race with another allocation that also tries to resize.
+ * So if the tree's not the size it originally was when we retake ida->lock,
+ * just return 0 - but the caller needs to recheck for the tree being full in
+ * case we _really_ raced badly.
+ */
+static int __ida_resize(struct ida *ida, gfp_t gfp, unsigned long *flags)
+	__releases(&ida->lock)
+	__acquires(&ida->lock)
+{
+	unsigned long *tree, **sections;
+	unsigned cur_nodes, new_nodes, new_first_leaf, cur_sections;
+again:
+	cur_nodes = ida->nodes;
+
+	new_nodes = roundup_pow_of_two(ida->nodes + 1) <= IDA_NODES_PER_SECTION
+		? roundup_pow_of_two(ida->nodes + 1)
+		: ida->nodes + IDA_NODES_PER_SECTION;
+
+	new_first_leaf = first_leaf_from_nodes(new_nodes);
+
+	sections = NULL;
+	cur_sections = ida->sections;
+
+	BUG_ON(ida->nodes > IDA_NODES_PER_SECTION &&
+	       ida->nodes % IDA_NODES_PER_SECTION);
+
+	spin_unlock_irqrestore(&ida->lock, *flags);
+
+	if (ida->nodes >= IDA_NODES_PER_SECTION &&
+	    is_power_of_2(cur_sections)) {
+		sections = kgalloc(cur_sections * 2 * sizeof(unsigned long *),
+				   __GFP_ZERO|gfp);
+		if (!sections)
+			goto err;
+	}
+
+	tree = kgalloc(min_t(size_t, new_nodes * sizeof(unsigned long),
+			     IDA_SECTION_SIZE), __GFP_ZERO|gfp);
+	if (!tree)
+		goto err;
+
+	spin_lock_irqsave(&ida->lock, *flags);
+
+	if (cur_nodes != ida->nodes || cur_sections != ida->sections) {
+		kgfree(sections, cur_sections * 2 * sizeof(unsigned long *));
+		kgfree(tree, min_t(size_t, new_nodes * sizeof(unsigned long),
+				   IDA_SECTION_SIZE));
+		return 0;
+	}
+
+	if (sections) {
+		memcpy(sections, ida->tree,
+		       ida->sections  * sizeof(unsigned long *));
+
+		if (ida->tree != &ida->inline_section)
+			kgfree(ida->tree,
+			       ida->sections * sizeof(unsigned long *));
+
+		ida->tree = sections;
+	}
+
+	if (ida->nodes < IDA_NODES_PER_SECTION) {
+		memcpy(tree, ida_index_to_node(ida, 0),
+		       ida->nodes * sizeof(unsigned long));
+
+		if (ida->tree[0] != &ida->inline_node)
+			kgfree(ida->tree[0],
+			       ida->nodes * sizeof(unsigned long));
+
+		ida->tree[0] = tree;
+	} else {
+		ida->tree[ida->sections++] = tree;
+
+		new_nodes = ida->sections * IDA_NODES_PER_SECTION;
+		new_first_leaf = first_leaf_from_nodes(new_nodes);
+
+		if (new_nodes - new_first_leaf < ida->nodes - ida->first_leaf)
+			goto again;
+	}
+
+	if (new_first_leaf != ida->first_leaf)
+		ida_increase_depth(ida, new_nodes, new_first_leaf);
+
+	ida->nodes	= new_nodes;
+	ida->first_leaf	= new_first_leaf;
+
+	return 0;
+err:
+	kgfree(sections, cur_sections * 2 * sizeof(unsigned long *));
+	spin_lock_irqsave(&ida->lock, *flags);
+	return -ENOMEM;
+}
+
+/*
+ * Ganged allocation - amortize locking and tree traversal for when we've got
+ * another allocator (i.e. a percpu version) acting as a frontend to this code
+ */
+static int __ida_alloc_range_multiple(struct ida *ida, unsigned *ids,
+				      unsigned nr_ids, unsigned min_id,
+				      unsigned max_id, gfp_t gfp,
+				      unsigned long *flags)
+	__releases(&ida->lock)
+	__acquires(&ida->lock)
+{
+	unsigned i = 0, bit, bit_offset, id, ids_found = 0;
+	unsigned long *node = ida_index_to_node(ida, i);
+	int err = 0;
+
+	if (!max_id || max_id > (unsigned) INT_MAX + 1)
+		max_id = (unsigned) INT_MAX + 1;
+
+	if (min_id >= max_id)
+		return -ENOSPC;
+
+	while (ids_found < nr_ids) {
+		/*
+		 * If all bits are set in the root, no bits free and we need to
+		 * resize.
+		 */
+		while (!~*node) {
+resize:
+			if (ida->nodes - ida->first_leaf >=
+			    BITS_TO_LONGS(max_id)) {
+				err = -ENOSPC;
+				goto err;
+			}
+
+			err = __ida_resize(ida, gfp, flags);
+			if (err)
+				goto err;
+
+			i = 0;
+			node = ida_index_to_node(ida, i);
+		}
+
+		if (min_id) {
+			/*
+			 * If we're starting from a specific index, skip to that
+			 * leaf node and start looking there:
+			 */
+			bit_offset = min_id % BITS_PER_LONG;
+			i = ida->first_leaf + min_id / BITS_PER_LONG;
+
+			if (i >= ida->nodes)
+				goto resize;
+
+			while (1) {
+				node = ida_index_to_node(ida, i);
+				bit = ffz(*node >> bit_offset) + bit_offset;
+
+				/*
+				 * We might have had to go back up the tree
+				 * before we found a free bit - so skip down to
+				 * where we recurse down the tree.
+				 */
+				if (~*node && bit < BITS_PER_LONG)
+					goto found;
+
+				if (!i)
+					goto resize;
+
+				/*
+				 * Ok, no bits available in this node - go up a
+				 * level. But we have to update bit_offset so we
+				 * start searching in the parent _after_ the
+				 * node we're currently at
+				 */
+				bit_offset = (i - 1) % IDA_TREE_ARY + 1;
+				i = (i - 1) / IDA_TREE_ARY;
+			}
+		}
+
+		/*
+		 * Recurse down the tree looking for a free bit. We already
+		 * checked to make sure there _were_ free bits, but we might end
+		 * up at a leaf node we haven't allocated yet.
+		 */
+		while (1) {
+			bit = ffz(*node);
+found:
+			/*
+			 * Found a bit - if we're at a leaf node, great! We're
+			 * done:
+			 */
+			if (i >= ida->first_leaf)
+				break;
+
+			i = i * IDA_TREE_ARY + 1 + bit;
+			node = ida_index_to_node(ida, i);
+
+			/*
+			 * Recurse. But if we'd recurse to a node that hasn't
+			 * been allocated yet, resize:
+			 */
+
+			if (i >= ida->nodes)
+				goto resize;
+
+			BUG_ON(!~*node);
+		}
+
+		/*
+		 * Our leaves are contiguous, so we can calculate the id we
+		 * allocated from the node we're at and the bit we found within
+		 * that node:
+		 */
+		id = (i - ida->first_leaf) * BITS_PER_LONG + bit;
+		BUG_ON(id < min_id);
+
+		if (id >= max_id) {
+			err = -ENOSPC;
+			goto err;
+		}
+
+		ids[ids_found++] = id;
+		ida->allocated_ids++;
+
+		/*
+		 * Mark the id as allocated. If all the bits are now set in this
+		 * node, set this node's bit in the parent node - and so on up
+		 * to the root:
+		 */
+		while (1) {
+			__set_bit(bit, node);
+
+			if (~*node || !i)
+				break;
+
+			bit = (i - 1) % IDA_TREE_ARY;
+			i = (i - 1) / IDA_TREE_ARY;
+
+			node = ida_index_to_node(ida, i);
+		}
+	}
+err:
+	return ids_found ? ids_found : err;
+}
+
+/**
+ * ida_alloc_range - allocate a new id.
+ * @ida: the (initialized) ida.
+ * @start: the minimum id (inclusive, <= INT_MAX)
+ * @end: the maximum id (exclusive, <= INT_MAX + 1 or 0 for unlimited)
+ * @gfp: memory allocation flags
+ *
+ * Allocates an id in the range [start, end). Returns -ENOSPC if no ids are
+ * available, or -ENOMEM on memory allocation failure.
+ *
+ * Returns the smallest free id >= start.
+ *
+ * Use ida_remove() to get rid of an id.
+ */
+int ida_alloc_range(struct ida *ida, unsigned int start,
+		    unsigned int end, gfp_t gfp)
+{
+	int ret;
+	unsigned id;
+	unsigned long flags;
+
+	spin_lock_irqsave(&ida->lock, flags);
+	ret = __ida_alloc_range_multiple(ida, &id, 1, start, end, gfp, &flags);
+	spin_unlock_irqrestore(&ida->lock, flags);
+
+	return ret == 1 ? id : ret;
+}
+EXPORT_SYMBOL(ida_alloc_range);
+
+static int __ida_alloc_cyclic(struct ida *ida, unsigned start, unsigned end,
+			      gfp_t gfp, unsigned long *flags)
+	__releases(&ida->lock)
+	__acquires(&ida->lock)
+{
+	int ret;
+	unsigned id;
+
+	ret = __ida_alloc_range_multiple(ida, &id, 1,
+					 max(start, ida->cur_id),
+					 end, gfp, flags);
+
+	if (ret < 0)
+		ret = __ida_alloc_range_multiple(ida, &id, 1, start,
+						 end, gfp, flags);
+	if (ret == 1) {
+		ida->cur_id = id + 1;
+		if ((ida->cur_id - start) / 2 > max(1024U, ida->allocated_ids))
+			ida->cur_id = 0;
+
+		return id;
+	}
+
+	return ret;
+}
+
+/**
+ * ida_alloc_cyclic - allocate new ids cyclically
+ * @ida: the (initialized) ida.
+ * @start: the minimum id (inclusive, <= INT_MAX)
+ * @end: the maximum id (exclusive, <= INT_MAX + 1 or 0 for unlimited)
+ * @gfp: memory allocation flags
+ *
+ * Allocates an id in the range start <= id < end, or returns -ENOSPC.
+ * On memory allocation failure, returns -ENOMEM.
+ *
+ * Instead of returning the smallest free id, start searching from the position
+ * where the last id was allocated - i.e. it won't reuse freed ids right away.
+ *
+ * To avoid the allocated id space (and internal bitmap) becoming arbitrarily
+ * sparse, it can wrap before reaching the maximum id - if less than half of our
+ * current id space is allocated, it resets cur_id to 0.
+ *
+ * But we don't want to wrap when the id space is small, so we use the maximum
+ * of (1024, allocated_ids) - see __ida_alloc_cyclic().
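+ * (For example, with start == 0 and 2000 ids currently allocated, cur_id
+ * wraps back to 0 once it passes 4000.)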
+ *
+ * Use ida_remove() to get rid of an id.
+ */
+int ida_alloc_cyclic(struct ida *ida, unsigned start, unsigned end, gfp_t gfp)
+{
+	int ret;
+	unsigned long flags;
+
+	spin_lock_irqsave(&ida->lock, flags);
+	ret = __ida_alloc_cyclic(ida, start, end, gfp, &flags);
+	spin_unlock_irqrestore(&ida->lock, flags);
+
+	return ret;
+}
+EXPORT_SYMBOL(ida_alloc_cyclic);
+
+/**
+ * ida_destroy - release all cached layers within an ida tree
+ * @ida:		ida handle
+ */
+void ida_destroy(struct ida *ida)
+{
+	unsigned i;
+
+	if (ida->tree[0] &&
+	    ida->tree[0] != &ida->inline_node)
+		kgfree(ida->tree[0], min(ida->nodes * sizeof(unsigned long),
+					 IDA_SECTION_SIZE));
+
+	for (i = 1; i < ida->sections; i++)
+		kgfree(ida->tree[i], IDA_SECTION_SIZE);
+
+	if (ida->tree &&
+	    ida->tree != &ida->inline_section)
+		kgfree(ida->tree, roundup_pow_of_two(ida->sections) *
+		       sizeof(unsigned long *));
+}
+EXPORT_SYMBOL(ida_destroy);
+
+/**
+ * ida_init_prealloc - initialize ida handle
+ * @ida:	ida handle
+ * @prealloc:	number of ids to preallocate memory for
+ *
+ * Initialize an ida, and preallocate enough memory that ida_alloc_range()
+ * will never return -ENOMEM if passed end <= prealloc.
+ */
+int ida_init_prealloc(struct ida *ida, unsigned prealloc)
+{
+	unsigned leaves = BITS_TO_LONGS(prealloc);
+
+	memset(ida, 0, sizeof(*ida));
+
+	spin_lock_init(&ida->lock);
+
+	ida->nodes		= 1;
+	ida->first_leaf		= 0;
+	ida->sections		= 1;
+	ida->inline_section	= &ida->inline_node;
+	ida->tree		= &ida->inline_section;
+
+	if (leaves > ida->nodes - ida->first_leaf) {
+		unsigned i;
+
+		while (leaves > ida->nodes - ida->first_leaf) {
+			if (ida->nodes < IDA_NODES_PER_SECTION)
+				ida->nodes *= 2;
+			else
+				ida->nodes += IDA_NODES_PER_SECTION;
+
+			ida->first_leaf = first_leaf_from_nodes(ida->nodes);
+		}
+
+		if (ida->nodes > IDA_NODES_PER_SECTION) {
+			ida->sections = ida->nodes / IDA_NODES_PER_SECTION;
+			ida->tree = kgalloc(roundup_pow_of_two(ida->sections) *
+					    sizeof(unsigned long *),
+					    __GFP_ZERO|GFP_KERNEL);
+			if (!ida->tree)
+				return -ENOMEM;
+
+			for (i = 0; i < ida->sections; i++) {
+				ida->tree[i] = kgalloc(IDA_SECTION_SIZE,
+						       __GFP_ZERO|GFP_KERNEL);
+				if (!ida->tree[i])
+					goto err;
+			}
+		} else {
+			ida->tree[0] =
+				kgalloc(ida->nodes * sizeof(unsigned long),
+					__GFP_ZERO|GFP_KERNEL);
+			if (!ida->tree[0])
+				return -ENOMEM;
+		}
+	}
+
+	return 0;
+err:
+	ida_destroy(ida);
+	return -ENOMEM;
+}
+EXPORT_SYMBOL(ida_init_prealloc);
+
+/* IDR */
 
 #define MAX_IDR_SHIFT		(sizeof(int) * 8 - 1)
 #define MAX_IDR_BIT		(1U << MAX_IDR_SHIFT)
@@ -50,7 +647,6 @@
 static struct kmem_cache *idr_layer_cache;
 static DEFINE_PER_CPU(struct idr_layer *, idr_preload_head);
 static DEFINE_PER_CPU(int, idr_preload_cnt);
-static DEFINE_SPINLOCK(simple_ida_lock);
 
 /* the maximum ID which can be allocated given idr->layers */
 static int idr_max(int layers)
@@ -868,286 +1464,3 @@ void idr_init(struct idr *idp)
 	spin_lock_init(&idp->lock);
 }
 EXPORT_SYMBOL(idr_init);
-
-
-/**
- * DOC: IDA description
- * IDA - IDR based ID allocator
- *
- * This is id allocator without id -> pointer translation.  Memory
- * usage is much lower than full blown idr because each id only
- * occupies a bit.  ida uses a custom leaf node which contains
- * IDA_BITMAP_BITS slots.
- *
- * 2007-04-25  written by Tejun Heo <htejun@gmail.com>
- */
-
-static void free_bitmap(struct ida *ida, struct ida_bitmap *bitmap)
-{
-	unsigned long flags;
-
-	if (!ida->free_bitmap) {
-		spin_lock_irqsave(&ida->idr.lock, flags);
-		if (!ida->free_bitmap) {
-			ida->free_bitmap = bitmap;
-			bitmap = NULL;
-		}
-		spin_unlock_irqrestore(&ida->idr.lock, flags);
-	}
-
-	kfree(bitmap);
-}
-
-/**
- * ida_pre_get - reserve resources for ida allocation
- * @ida:	ida handle
- * @gfp_mask:	memory allocation flag
- *
- * This function should be called prior to locking and calling the
- * following function.  It preallocates enough memory to satisfy the
- * worst possible allocation.
- *
- * If the system is REALLY out of memory this function returns %0,
- * otherwise %1.
- */
-static int ida_pre_get(struct ida *ida, gfp_t gfp_mask)
-{
-	/* allocate idr_layers */
-	if (!__idr_pre_get(&ida->idr, gfp_mask))
-		return 0;
-
-	/* allocate free_bitmap */
-	if (!ida->free_bitmap) {
-		struct ida_bitmap *bitmap;
-
-		bitmap = kmalloc(sizeof(struct ida_bitmap), gfp_mask);
-		if (!bitmap)
-			return 0;
-
-		free_bitmap(ida, bitmap);
-	}
-
-	return 1;
-}
-
-/**
- * ida_get_new_above - allocate new ID above or equal to a start id
- * @ida:	ida handle
- * @starting_id: id to start search at
- * @p_id:	pointer to the allocated handle
- *
- * Allocate new ID above or equal to @starting_id.  It should be called
- * with any required locks.
- *
- * If memory is required, it will return %-EAGAIN, you should unlock
- * and go back to the ida_pre_get() call.  If the ida is full, it will
- * return %-ENOSPC.
- *
- * @p_id returns a value in the range @starting_id ... %0x7fffffff.
- */
-static int ida_get_new_above(struct ida *ida, int starting_id, int *p_id)
-{
-	struct idr_layer *pa[MAX_IDR_LEVEL + 1];
-	struct ida_bitmap *bitmap;
-	unsigned long flags;
-	int idr_id = starting_id / IDA_BITMAP_BITS;
-	int offset = starting_id % IDA_BITMAP_BITS;
-	int t, id;
-
- restart:
-	/* get vacant slot */
-	t = idr_get_empty_slot(&ida->idr, idr_id, pa, 0, &ida->idr);
-	if (t < 0)
-		return t == -ENOMEM ? -EAGAIN : t;
-
-	if (t * IDA_BITMAP_BITS >= MAX_IDR_BIT)
-		return -ENOSPC;
-
-	if (t != idr_id)
-		offset = 0;
-	idr_id = t;
-
-	/* if bitmap isn't there, create a new one */
-	bitmap = (void *)pa[0]->ary[idr_id & IDR_MASK];
-	if (!bitmap) {
-		spin_lock_irqsave(&ida->idr.lock, flags);
-		bitmap = ida->free_bitmap;
-		ida->free_bitmap = NULL;
-		spin_unlock_irqrestore(&ida->idr.lock, flags);
-
-		if (!bitmap)
-			return -EAGAIN;
-
-		memset(bitmap, 0, sizeof(struct ida_bitmap));
-		rcu_assign_pointer(pa[0]->ary[idr_id & IDR_MASK],
-				(void *)bitmap);
-		pa[0]->count++;
-	}
-
-	/* lookup for empty slot */
-	t = find_next_zero_bit(bitmap->bitmap, IDA_BITMAP_BITS, offset);
-	if (t == IDA_BITMAP_BITS) {
-		/* no empty slot after offset, continue to the next chunk */
-		idr_id++;
-		offset = 0;
-		goto restart;
-	}
-
-	id = idr_id * IDA_BITMAP_BITS + t;
-	if (id >= MAX_IDR_BIT)
-		return -ENOSPC;
-
-	__set_bit(t, bitmap->bitmap);
-	if (++bitmap->nr_busy == IDA_BITMAP_BITS)
-		idr_mark_full(pa, idr_id);
-
-	*p_id = id;
-
-	/* Each leaf node can handle nearly a thousand slots and the
-	 * whole idea of ida is to have small memory foot print.
-	 * Throw away extra resources one by one after each successful
-	 * allocation.
-	 */
-	if (ida->idr.id_free_cnt || ida->free_bitmap) {
-		struct idr_layer *p = get_from_free_list(&ida->idr);
-		if (p)
-			kmem_cache_free(idr_layer_cache, p);
-	}
-
-	return 0;
-}
-
-static void __ida_remove(struct ida *ida, int id)
-{
-	struct idr_layer *p = ida->idr.top;
-	int shift = (ida->idr.layers - 1) * IDR_BITS;
-	int idr_id = id / IDA_BITMAP_BITS;
-	int offset = id % IDA_BITMAP_BITS;
-	int n;
-	struct ida_bitmap *bitmap;
-
-	/* clear full bits while looking up the leaf idr_layer */
-	while ((shift > 0) && p) {
-		n = (idr_id >> shift) & IDR_MASK;
-		__clear_bit(n, p->bitmap);
-		p = p->ary[n];
-		shift -= IDR_BITS;
-	}
-
-	if (p == NULL)
-		goto err;
-
-	n = idr_id & IDR_MASK;
-	__clear_bit(n, p->bitmap);
-
-	bitmap = (void *)p->ary[n];
-	if (!test_bit(offset, bitmap->bitmap))
-		goto err;
-
-	/* update bitmap and remove it if empty */
-	__clear_bit(offset, bitmap->bitmap);
-	if (--bitmap->nr_busy == 0) {
-		__set_bit(n, p->bitmap);	/* to please idr_remove() */
-		idr_remove(&ida->idr, idr_id);
-		free_bitmap(ida, bitmap);
-	}
-
-	return;
-
- err:
-	WARN(1, "ida_remove called for id=%d which is not allocated.\n", id);
-}
-
-/**
- * ida_destroy - release all cached layers within an ida tree
- * @ida:		ida handle
- */
-void ida_destroy(struct ida *ida)
-{
-	idr_destroy(&ida->idr);
-	kfree(ida->free_bitmap);
-}
-EXPORT_SYMBOL(ida_destroy);
-
-/**
- * ida_alloc_range - get a new id.
- * @ida: the (initialized) ida.
- * @start: the minimum id (inclusive, < 0x8000000)
- * @end: the maximum id (exclusive, < 0x8000000 or 0)
- * @gfp_mask: memory allocation flags
- *
- * Allocates an id in the range start <= id < end, or returns -ENOSPC.
- * On memory allocation failure, returns -ENOMEM.
- *
- * Use ida_remove() to get rid of an id.
- */
-int ida_alloc_range(struct ida *ida, unsigned int start, unsigned int end,
-		   gfp_t gfp_mask)
-{
-	int ret, id;
-	unsigned int max;
-	unsigned long flags;
-
-	BUG_ON((int)start < 0);
-	BUG_ON((int)end < 0);
-
-	if (end == 0)
-		max = 0x80000000;
-	else {
-		BUG_ON(end < start);
-		max = end - 1;
-	}
-
-again:
-	if (!ida_pre_get(ida, gfp_mask))
-		return -ENOMEM;
-
-	spin_lock_irqsave(&simple_ida_lock, flags);
-	ret = ida_get_new_above(ida, start, &id);
-	if (!ret) {
-		if (id > max) {
-			__ida_remove(ida, id);
-			ret = -ENOSPC;
-		} else {
-			ret = id;
-		}
-	}
-	spin_unlock_irqrestore(&simple_ida_lock, flags);
-
-	if (unlikely(ret == -EAGAIN))
-		goto again;
-
-	return ret;
-}
-EXPORT_SYMBOL(ida_alloc_range);
-
-/**
- * ida_remove - remove an allocated id.
- * @ida: the (initialized) ida.
- * @id: the id returned by ida_alloc_range.
- */
-void ida_remove(struct ida *ida, unsigned int id)
-{
-	unsigned long flags;
-
-	BUG_ON((int)id < 0);
-	spin_lock_irqsave(&simple_ida_lock, flags);
-	__ida_remove(ida, id);
-	spin_unlock_irqrestore(&simple_ida_lock, flags);
-}
-EXPORT_SYMBOL(ida_remove);
-
-/**
- * ida_init - initialize ida handle
- * @ida:	ida handle
- *
- * This function is use to set up the handle (@ida) that you will pass
- * to the rest of the functions.
- */
-void ida_init(struct ida *ida)
-{
-	memset(ida, 0, sizeof(struct ida));
-	idr_init(&ida->idr);
-
-}
-EXPORT_SYMBOL(ida_init);
-- 
1.8.4.rc1



* [PATCH 04/10] idr: Percpu ida
  2013-08-07 17:34 IDA/IDR rewrite, percpu ida Kent Overstreet
  2013-08-07 17:34 ` [PATCH 03/10] idr: Rewrite ida Kent Overstreet
@ 2013-08-07 17:34 ` Kent Overstreet
  2013-08-07 17:56   ` Christoph Lameter
  2013-08-07 17:46 ` [PATCH 05/10] idr: Kill old deprecated idr interfaces Kent Overstreet
                   ` (5 subsequent siblings)
  7 siblings, 1 reply; 46+ messages in thread
From: Kent Overstreet @ 2013-08-07 17:34 UTC (permalink / raw)
  To: akpm, linux-kernel
  Cc: Kent Overstreet, Tejun Heo, Oleg Nesterov, Christoph Lameter,
	Ingo Molnar, Andi Kleen, Jens Axboe, Nicholas A. Bellinger

Percpu frontend for allocating ids. With percpu allocation (that works),
it's impossible to guarantee it will always be possible to allocate all
nr_tags - typically, some will be stuck on a remote percpu freelist
where the current job can't get to them.

We do guarantee that it will always be possible to allocate at least
(nr_tags / 2) tags - this is done by keeping track of which and how many
cpus have tags on their percpu freelists. On allocation failure, if
enough cpus have tags that there could potentially be (nr_tags / 2) tags
stuck on remote percpu freelists, we pick a remote cpu at random to
steal from.

Note that there's no cpu hotplug notifier - we don't care, because
steal_tags() will eventually get the down cpu's tags. We _could_ satisfy
more allocations if we had a notifier - but we'll still meet our
guarantees and it's absolutely not a correctness issue, so I don't think
it's worth the extra code.
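
For reference, a minimal usage sketch of the interface added here (error
handling trimmed; the 128-tag pool size is arbitrary):

	struct percpu_ida pool;
	int tag;

	if (percpu_ida_init(&pool, 128))	/* tags 0..127 */
		return -ENOMEM;

	/* with __GFP_WAIT this sleeps until a tag is available */
	tag = percpu_ida_alloc(&pool, GFP_KERNEL);

	/* ... use tag to index a preallocated array of request structs ... */

	percpu_ida_free(&pool, tag);
	percpu_ida_destroy(&pool);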

Signed-off-by: Kent Overstreet <kmo@daterainc.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Christoph Lameter <cl@linux-foundation.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: "Nicholas A. Bellinger" <nab@linux-iscsi.org>
---
 include/linux/idr.h |  46 +++++++++
 lib/idr.c           | 282 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 328 insertions(+)

diff --git a/include/linux/idr.h b/include/linux/idr.h
index a310bb0..f5b889b 100644
--- a/include/linux/idr.h
+++ b/include/linux/idr.h
@@ -101,6 +101,52 @@ static inline void ida_init(struct ida *ida)
 	ida_init_prealloc(ida, 0);
 }
 
+/* Percpu IDA/tag allocator */
+
+struct percpu_ida_cpu;
+
+struct percpu_ida {
+	/*
+	 * number of tags available to be allocated, as passed to
+	 * percpu_ida_init()
+	 */
+	unsigned			nr_tags;
+
+	struct percpu_ida_cpu __percpu	*tag_cpu;
+
+	/*
+	 * Bitmap of cpus that (may) have tags on their percpu freelists:
+	 * steal_tags() uses this to decide when to steal tags, and which cpus
+	 * to try stealing from.
+	 *
+	 * It's ok for a freelist to be empty when its bit is set - steal_tags()
+	 * will just keep looking - but the bitmap _must_ be set whenever a
+	 * percpu freelist does have tags.
+	 */
+	unsigned long			*cpus_have_tags;
+
+	struct {
+		/*
+		 * When we go to steal tags from another cpu (see steal_tags()),
+		 * we want to pick a cpu at random. Cycling through them every
+		 * time we steal is a bit easier and more or less equivalent:
+		 */
+		unsigned		cpu_last_stolen;
+
+		/* For sleeping on allocation failure */
+		wait_queue_head_t	wait;
+
+		/* Global freelist */
+		struct ida		ida;
+	} ____cacheline_aligned_in_smp;
+};
+
+int percpu_ida_alloc(struct percpu_ida *pool, gfp_t gfp);
+void percpu_ida_free(struct percpu_ida *pool, unsigned tag);
+
+void percpu_ida_destroy(struct percpu_ida *pool);
+int percpu_ida_init(struct percpu_ida *pool, unsigned long nr_tags);
+
 /* IDR */
 
 /*
diff --git a/lib/idr.c b/lib/idr.c
index e5afab5..c94e29e 100644
--- a/lib/idr.c
+++ b/lib/idr.c
@@ -633,6 +633,288 @@ err:
 }
 EXPORT_SYMBOL(ida_init_prealloc);
 
+/* Percpu IDA */
+
+/*
+ * Number of tags we move between the percpu freelist and the global freelist at
+ * a time
+ */
+#define IDA_PCPU_BATCH_MOVE	32U
+
+/* Max size of percpu freelist */
+#define IDA_PCPU_SIZE		((IDA_PCPU_BATCH_MOVE * 3) / 2)
+
+struct percpu_ida_cpu {
+	spinlock_t			lock;
+	unsigned			nr_free;
+	unsigned			freelist[];
+};
+
+/*
+ * Try to steal tags from a remote cpu's percpu freelist.
+ *
+ * We first check how many percpu freelists have tags - we don't steal tags
+ * unless enough percpu freelists have tags on them that it's possible more than
+ * half the total tags could be stuck on remote percpu freelists.
+ *
+ * Then we iterate through the cpus until we find some tags - we don't attempt
+ * to find the "best" cpu to steal from, to keep cacheline bouncing to a
+ * minimum.
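+ *
+ * E.g. with nr_tags = 128 and IDA_PCPU_SIZE = 48, stealing proceeds as
+ * long as at least two cpus may still have tags on their freelists,
+ * since 2 * 48 > 128 / 2.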
+ */
+static inline void steal_tags(struct percpu_ida *pool,
+			      struct percpu_ida_cpu *tags)
+{
+	unsigned cpus_have_tags, cpu = pool->cpu_last_stolen;
+	struct percpu_ida_cpu *remote;
+
+	for (cpus_have_tags = bitmap_weight(pool->cpus_have_tags, nr_cpu_ids);
+	     cpus_have_tags * IDA_PCPU_SIZE > pool->nr_tags / 2;
+	     cpus_have_tags--) {
+		cpu = find_next_bit(pool->cpus_have_tags, nr_cpu_ids, cpu);
+
+		if (cpu == nr_cpu_ids)
+			cpu = find_first_bit(pool->cpus_have_tags, nr_cpu_ids);
+
+		if (cpu == nr_cpu_ids)
+			BUG();
+
+		pool->cpu_last_stolen = cpu;
+		remote = per_cpu_ptr(pool->tag_cpu, cpu);
+
+		clear_bit(cpu, pool->cpus_have_tags);
+
+		if (remote == tags)
+			continue;
+
+		spin_lock(&remote->lock);
+
+		if (remote->nr_free) {
+			memcpy(tags->freelist,
+			       remote->freelist,
+			       sizeof(unsigned) * remote->nr_free);
+
+			tags->nr_free = remote->nr_free;
+			remote->nr_free = 0;
+		}
+
+		spin_unlock(&remote->lock);
+
+		if (tags->nr_free)
+			break;
+	}
+}
+
+static inline void alloc_global_tags(struct percpu_ida *pool,
+				     struct percpu_ida_cpu *tags)
+{
+	int nr_free = __ida_alloc_range_multiple(&pool->ida, tags->freelist,
+						 IDA_PCPU_BATCH_MOVE, 0,
+						 pool->nr_tags, GFP_NOWAIT,
+						 NULL);
+	if (nr_free > 0)
+		tags->nr_free = nr_free;
+}
+
+static inline unsigned alloc_local_tag(struct percpu_ida *pool,
+				       struct percpu_ida_cpu *tags)
+{
+	int tag = -ENOSPC;
+
+	spin_lock(&tags->lock);
+	if (tags->nr_free)
+		tag = tags->freelist[--tags->nr_free];
+	spin_unlock(&tags->lock);
+
+	return tag;
+}
+
+/**
+ * percpu_ida_alloc - allocate a tag
+ * @pool: pool to allocate from
+ * @gfp: gfp flags
+ *
+ * Returns a tag - an integer in the range [0..nr_tags) (as passed to
+ * percpu_ida_init()), or -ENOSPC on allocation failure.
+ *
+ * Safe to be called from interrupt context (assuming it isn't passed
+ * __GFP_WAIT, of course).
+ *
+ * Will not fail if passed __GFP_WAIT.
+ */
+int percpu_ida_alloc(struct percpu_ida *pool, gfp_t gfp)
+{
+	DEFINE_WAIT(wait);
+	struct percpu_ida_cpu *tags;
+	unsigned long flags;
+	unsigned this_cpu;
+	int tag;
+
+	local_irq_save(flags);
+	this_cpu = smp_processor_id();
+	tags = per_cpu_ptr(pool->tag_cpu, this_cpu);
+
+	/* Fastpath */
+	tag = alloc_local_tag(pool, tags);
+	if (likely(tag >= 0)) {
+		local_irq_restore(flags);
+		return tag;
+	}
+
+	while (1) {
+		spin_lock(&pool->ida.lock);
+
+		/*
+		 * prepare_to_wait() must come before steal_tags(), in case
+		 * percpu_ida_free() on another cpu flips a bit in
+		 * cpus_have_tags
+		 *
+		 * global lock held and irqs disabled, don't need percpu lock
+		 */
+		prepare_to_wait(&pool->wait, &wait, TASK_UNINTERRUPTIBLE);
+
+		if (!tags->nr_free)
+			alloc_global_tags(pool, tags);
+		if (!tags->nr_free)
+			steal_tags(pool, tags);
+
+		if (tags->nr_free) {
+			tag = tags->freelist[--tags->nr_free];
+			if (tags->nr_free)
+				set_bit(this_cpu, pool->cpus_have_tags);
+		}
+
+		spin_unlock(&pool->ida.lock);
+		local_irq_restore(flags);
+
+		if (tag >= 0 || !(gfp & __GFP_WAIT))
+			break;
+
+		schedule();
+
+		local_irq_save(flags);
+		this_cpu = smp_processor_id();
+		tags = per_cpu_ptr(pool->tag_cpu, this_cpu);
+	}
+
+	finish_wait(&pool->wait, &wait);
+	return tag;
+}
+EXPORT_SYMBOL_GPL(percpu_ida_alloc);
+
+/**
+ * percpu_ida_free - free a tag
+ * @pool: pool @tag was allocated from
+ * @tag: a tag previously allocated with percpu_ida_alloc()
+ *
+ * Safe to be called from interrupt context.
+ */
+void percpu_ida_free(struct percpu_ida *pool, unsigned tag)
+{
+	struct percpu_ida_cpu *tags;
+	unsigned long flags;
+	unsigned nr_free, this_cpu;
+
+	BUG_ON(tag >= pool->nr_tags);
+
+	local_irq_save(flags);
+	this_cpu = smp_processor_id();
+	tags = per_cpu_ptr(pool->tag_cpu, this_cpu);
+
+	spin_lock(&tags->lock);
+	tags->freelist[tags->nr_free++] = tag;
+
+	nr_free = tags->nr_free;
+	spin_unlock(&tags->lock);
+
+	if (nr_free == 1) {
+		set_bit(this_cpu, pool->cpus_have_tags);
+		wake_up(&pool->wait);
+	}
+
+	if (nr_free == IDA_PCPU_SIZE) {
+		spin_lock(&pool->ida.lock);
+
+		/*
+		 * Global lock held and irqs disabled, don't need percpu
+		 * lock
+		 */
+		while (tags->nr_free > IDA_PCPU_SIZE - IDA_PCPU_BATCH_MOVE)
+			__ida_remove(&pool->ida,
+				     tags->freelist[--tags->nr_free]);
+
+		wake_up(&pool->wait);
+		spin_unlock(&pool->ida.lock);
+	}
+
+	local_irq_restore(flags);
+}
+EXPORT_SYMBOL_GPL(percpu_ida_free);
+
+/**
+ * percpu_ida_destroy - release a tag pool's resources
+ * @pool: pool to free
+ *
+ * Frees the resources allocated by percpu_ida_init().
+ */
+void percpu_ida_destroy(struct percpu_ida *pool)
+{
+	free_percpu(pool->tag_cpu);
+	kfree(pool->cpus_have_tags);
+	ida_destroy(&pool->ida);
+}
+EXPORT_SYMBOL_GPL(percpu_ida_destroy);
+
+/**
+ * percpu_ida_init - initialize a percpu tag pool
+ * @pool: pool to initialize
+ * @nr_tags: number of tags that will be available for allocation
+ *
+ * Initializes @pool so that it can be used to allocate tags - integers in the
+ * range [0, nr_tags). Typically, they'll be used by driver code to refer to a
+ * preallocated array of tag structures.
+ *
+ * Allocation is percpu, but sharding is limited by nr_tags - for best
+ * performance, the workload should not span more cpus than nr_tags / 128.
+ */
+int percpu_ida_init(struct percpu_ida *pool, unsigned long nr_tags)
+{
+	unsigned cpu;
+
+	memset(pool, 0, sizeof(*pool));
+
+	init_waitqueue_head(&pool->wait);
+	pool->nr_tags = nr_tags;
+
+	/* Guard against overflow */
+	if (nr_tags > (unsigned) INT_MAX + 1) {
+		pr_err("percpu_ida: nr_tags too large\n");
+		return -EINVAL;
+	}
+
+	if (ida_init_prealloc(&pool->ida, nr_tags))
+		return -ENOMEM;
+
+	pool->cpus_have_tags = kzalloc(BITS_TO_LONGS(nr_cpu_ids) *
+				       sizeof(unsigned long), GFP_KERNEL);
+	if (!pool->cpus_have_tags)
+		goto err;
+
+	pool->tag_cpu = __alloc_percpu(sizeof(struct percpu_ida_cpu) +
+				       IDA_PCPU_SIZE * sizeof(unsigned),
+				       sizeof(unsigned));
+	if (!pool->tag_cpu)
+		goto err;
+
+	for_each_possible_cpu(cpu)
+		spin_lock_init(&per_cpu_ptr(pool->tag_cpu, cpu)->lock);
+
+	return 0;
+err:
+	percpu_ida_destroy(pool);
+	return -ENOMEM;
+}
+EXPORT_SYMBOL_GPL(percpu_ida_init);
+
 /* IDR */
 
 #define MAX_IDR_SHIFT		(sizeof(int) * 8 - 1)
-- 
1.8.4.rc1



* [PATCH 05/10] idr: Kill old deprecated idr interfaces
  2013-08-07 17:34 IDA/IDR rewrite, percpu ida Kent Overstreet
  2013-08-07 17:34 ` [PATCH 03/10] idr: Rewrite ida Kent Overstreet
  2013-08-07 17:34 ` [PATCH 04/10] idr: Percpu ida Kent Overstreet
@ 2013-08-07 17:46 ` Kent Overstreet
  2013-08-07 17:46 ` [PATCH 06/10] idr: Rename idr_get_next() -> idr_find_next() Kent Overstreet
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 46+ messages in thread
From: Kent Overstreet @ 2013-08-07 17:46 UTC (permalink / raw)
  To: akpm, linux-kernel; +Cc: Kent Overstreet, Tejun Heo

The deprecated idr interfaces don't have any in-kernel users, so let's
delete them as prep work for the idr rewrite.

Signed-off-by: Kent Overstreet <kmo@daterainc.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Tejun Heo <tj@kernel.org>
---
 include/linux/idr.h | 63 -----------------------------------------------------
 lib/idr.c           | 36 +++---------------------------
 2 files changed, 3 insertions(+), 96 deletions(-)

diff --git a/include/linux/idr.h b/include/linux/idr.h
index f5b889b..b26f8b1 100644
--- a/include/linux/idr.h
+++ b/include/linux/idr.h
@@ -264,69 +264,6 @@ static inline void *idr_find(struct idr *idr, int id)
 #define idr_for_each_entry(idp, entry, id)			\
 	for (id = 0; ((entry) = idr_get_next(idp, &(id))) != NULL; ++id)
 
-/*
- * Don't use the following functions.  These exist only to suppress
- * deprecated warnings on EXPORT_SYMBOL()s.
- */
-int __idr_pre_get(struct idr *idp, gfp_t gfp_mask);
-int __idr_get_new_above(struct idr *idp, void *ptr, int starting_id, int *id);
-void __idr_remove_all(struct idr *idp);
-
-/**
- * idr_pre_get - reserve resources for idr allocation
- * @idp:	idr handle
- * @gfp_mask:	memory allocation flags
- *
- * Part of old alloc interface.  This is going away.  Use
- * idr_preload[_end]() and idr_alloc() instead.
- */
-static inline int __deprecated idr_pre_get(struct idr *idp, gfp_t gfp_mask)
-{
-	return __idr_pre_get(idp, gfp_mask);
-}
-
-/**
- * idr_get_new_above - allocate new idr entry above or equal to a start id
- * @idp: idr handle
- * @ptr: pointer you want associated with the id
- * @starting_id: id to start search at
- * @id: pointer to the allocated handle
- *
- * Part of old alloc interface.  This is going away.  Use
- * idr_preload[_end]() and idr_alloc() instead.
- */
-static inline int __deprecated idr_get_new_above(struct idr *idp, void *ptr,
-						 int starting_id, int *id)
-{
-	return __idr_get_new_above(idp, ptr, starting_id, id);
-}
-
-/**
- * idr_get_new - allocate new idr entry
- * @idp: idr handle
- * @ptr: pointer you want associated with the id
- * @id: pointer to the allocated handle
- *
- * Part of old alloc interface.  This is going away.  Use
- * idr_preload[_end]() and idr_alloc() instead.
- */
-static inline int __deprecated idr_get_new(struct idr *idp, void *ptr, int *id)
-{
-	return __idr_get_new_above(idp, ptr, 0, id);
-}
-
-/**
- * idr_remove_all - remove all ids from the given idr tree
- * @idp: idr handle
- *
- * If you're trying to destroy @idp, calling idr_destroy() is enough.
- * This is going away.  Don't use.
- */
-static inline void __deprecated idr_remove_all(struct idr *idp)
-{
-	__idr_remove_all(idp);
-}
-
 void __init idr_init_cache(void);
 
 #endif /* __IDR_H__ */
diff --git a/lib/idr.c b/lib/idr.c
index c94e29e..3f68665 100644
--- a/lib/idr.c
+++ b/lib/idr.c
@@ -1074,19 +1074,6 @@ static void idr_mark_full(struct idr_layer **pa, int id)
 	}
 }
 
-int __idr_pre_get(struct idr *idp, gfp_t gfp_mask)
-{
-	while (idp->id_free_cnt < MAX_IDR_FREE) {
-		struct idr_layer *new;
-		new = kmem_cache_zalloc(idr_layer_cache, gfp_mask);
-		if (new == NULL)
-			return (0);
-		move_to_free_list(idp, new);
-	}
-	return 1;
-}
-EXPORT_SYMBOL(__idr_pre_get);
-
 /**
  * sub_alloc - try to allocate an id without growing the tree depth
  * @idp: idr handle
@@ -1252,21 +1239,6 @@ static void idr_fill_slot(struct idr *idr, void *ptr, int id,
 	idr_mark_full(pa, id);
 }
 
-int __idr_get_new_above(struct idr *idp, void *ptr, int starting_id, int *id)
-{
-	struct idr_layer *pa[MAX_IDR_LEVEL + 1];
-	int rv;
-
-	rv = idr_get_empty_slot(idp, starting_id, pa, 0, idp);
-	if (rv < 0)
-		return rv == -ENOMEM ? -EAGAIN : rv;
-
-	idr_fill_slot(idp, ptr, rv, pa);
-	*id = rv;
-	return 0;
-}
-EXPORT_SYMBOL(__idr_get_new_above);
-
 /**
  * idr_preload - preload for idr_alloc()
  * @gfp_mask: allocation mask to use for preloading
@@ -1485,7 +1457,7 @@ void idr_remove(struct idr *idp, int id)
 }
 EXPORT_SYMBOL(idr_remove);
 
-void __idr_remove_all(struct idr *idp)
+static void __idr_remove_all(struct idr *idp)
 {
 	int n, id, max;
 	int bt_mask;
@@ -1518,7 +1490,6 @@ void __idr_remove_all(struct idr *idp)
 	}
 	idp->layers = 0;
 }
-EXPORT_SYMBOL(__idr_remove_all);
 
 /**
  * idr_destroy - release all cached layers within an idr tree
@@ -1580,13 +1551,12 @@ EXPORT_SYMBOL(idr_find_slowpath);
  * callback function will be called for each pointer currently
  * registered, passing the id, the pointer and the data pointer passed
  * to this function.  It is not safe to modify the idr tree while in
- * the callback, so functions such as idr_get_new and idr_remove are
- * not allowed.
+ * the callback, so functions such as idr_remove are not allowed.
  *
  * We check the return of @fn each time. If it returns anything other
  * than %0, we break out and return that value.
  *
- * The caller must serialize idr_for_each() vs idr_get_new() and idr_remove().
+ * The caller must serialize idr_for_each() vs idr_remove().
  */
 int idr_for_each(struct idr *idp,
 		 int (*fn)(int id, void *p, void *data), void *data)
-- 
1.8.4.rc1



* [PATCH 06/10] idr: Rename idr_get_next() -> idr_find_next()
  2013-08-07 17:34 IDA/IDR rewrite, percpu ida Kent Overstreet
                   ` (2 preceding siblings ...)
  2013-08-07 17:46 ` [PATCH 05/10] idr: Kill old deprecated idr interfaces Kent Overstreet
@ 2013-08-07 17:46 ` Kent Overstreet
  2013-08-07 17:46 ` [PATCH 07/10] idr: Rename idr_alloc() -> idr_alloc_range() Kent Overstreet
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 46+ messages in thread
From: Kent Overstreet @ 2013-08-07 17:46 UTC (permalink / raw)
  To: akpm, linux-kernel; +Cc: Kent Overstreet, Tejun Heo

get() implies taking a ref or sometimes an allocation, which this
function definitely does not do - rename it to something more sensible.

Signed-off-by: Kent Overstreet <kmo@daterainc.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Tejun Heo <tj@kernel.org>
---
 drivers/block/drbd/drbd_main.c | 2 +-
 drivers/block/drbd/drbd_nl.c   | 2 +-
 drivers/mtd/mtdcore.c          | 2 +-
 include/linux/idr.h            | 4 ++--
 kernel/cgroup.c                | 2 +-
 lib/idr.c                      | 6 +++---
 6 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/drivers/block/drbd/drbd_main.c b/drivers/block/drbd/drbd_main.c
index 55635ed..73f4765 100644
--- a/drivers/block/drbd/drbd_main.c
+++ b/drivers/block/drbd/drbd_main.c
@@ -500,7 +500,7 @@ int conn_lowest_minor(struct drbd_tconn *tconn)
 	int vnr = 0, m;
 
 	rcu_read_lock();
-	mdev = idr_get_next(&tconn->volumes, &vnr);
+	mdev = idr_find_next(&tconn->volumes, &vnr);
 	m = mdev ? mdev_to_minor(mdev) : -1;
 	rcu_read_unlock();
 
diff --git a/drivers/block/drbd/drbd_nl.c b/drivers/block/drbd/drbd_nl.c
index 8cc1e64..936da36 100644
--- a/drivers/block/drbd/drbd_nl.c
+++ b/drivers/block/drbd/drbd_nl.c
@@ -2938,7 +2938,7 @@ int get_one_status(struct sk_buff *skb, struct netlink_callback *cb)
 	}
 	if (tconn) {
 next_tconn:
-		mdev = idr_get_next(&tconn->volumes, &volume);
+		mdev = idr_find_next(&tconn->volumes, &volume);
 		if (!mdev) {
 			/* No more volumes to dump on this tconn.
 			 * Advance tconn iterator. */
diff --git a/drivers/mtd/mtdcore.c b/drivers/mtd/mtdcore.c
index 048c823..8d64363 100644
--- a/drivers/mtd/mtdcore.c
+++ b/drivers/mtd/mtdcore.c
@@ -91,7 +91,7 @@ EXPORT_SYMBOL_GPL(mtd_table_mutex);
 
 struct mtd_info *__mtd_next_device(int i)
 {
-	return idr_get_next(&mtd_idr, &i);
+	return idr_find_next(&mtd_idr, &i);
 }
 EXPORT_SYMBOL_GPL(__mtd_next_device);
 
diff --git a/include/linux/idr.h b/include/linux/idr.h
index b26f8b1..6395da1 100644
--- a/include/linux/idr.h
+++ b/include/linux/idr.h
@@ -211,7 +211,7 @@ int idr_alloc(struct idr *idp, void *ptr, int start, int end, gfp_t gfp_mask);
 int idr_alloc_cyclic(struct idr *idr, void *ptr, int start, int end, gfp_t gfp_mask);
 int idr_for_each(struct idr *idp,
 		 int (*fn)(int id, void *p, void *data), void *data);
-void *idr_get_next(struct idr *idp, int *nextid);
+void *idr_find_next(struct idr *idp, int *nextid);
 void *idr_replace(struct idr *idp, void *ptr, int id);
 void idr_remove(struct idr *idp, int id);
 void idr_free(struct idr *idp, int id);
@@ -262,7 +262,7 @@ static inline void *idr_find(struct idr *idr, int id)
  * is convenient for a "not found" value.
  */
 #define idr_for_each_entry(idp, entry, id)			\
-	for (id = 0; ((entry) = idr_get_next(idp, &(id))) != NULL; ++id)
+	for (id = 0; ((entry) = idr_find_next(idp, &(id))) != NULL; ++id)
 
 void __init idr_init_cache(void);
 
diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index bac5312..7b397f8 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -1868,7 +1868,7 @@ int task_cgroup_path(struct task_struct *task, char *buf, size_t buflen)
 
 	mutex_lock(&cgroup_mutex);
 
-	root = idr_get_next(&cgroup_hierarchy_idr, &hierarchy_id);
+	root = idr_find_next(&cgroup_hierarchy_idr, &hierarchy_id);
 
 	if (root) {
 		cgrp = task_cgroup_from_root(task, root);
diff --git a/lib/idr.c b/lib/idr.c
index 3f68665..1d67cdd 100644
--- a/lib/idr.c
+++ b/lib/idr.c
@@ -1596,7 +1596,7 @@ int idr_for_each(struct idr *idp,
 EXPORT_SYMBOL(idr_for_each);
 
 /**
- * idr_get_next - lookup next object of id to given id.
+ * idr_find_next - look up the next object starting from the given id.
  * @idp: idr handle
  * @nextidp:  pointer to lookup key
  *
@@ -1607,7 +1607,7 @@ EXPORT_SYMBOL(idr_for_each);
  * This function can be called under rcu_read_lock(), given that the leaf
  * pointers lifetimes are correctly managed.
  */
-void *idr_get_next(struct idr *idp, int *nextidp)
+void *idr_find_next(struct idr *idp, int *nextidp)
 {
 	struct idr_layer *p, *pa[MAX_IDR_LEVEL + 1];
 	struct idr_layer **paa = &pa[0];
@@ -1648,7 +1648,7 @@ void *idr_get_next(struct idr *idp, int *nextidp)
 	}
 	return NULL;
 }
-EXPORT_SYMBOL(idr_get_next);
+EXPORT_SYMBOL(idr_find_next);
 
 
 /**
-- 
1.8.4.rc1



* [PATCH 07/10] idr: Rename idr_alloc() -> idr_alloc_range()
  2013-08-07 17:34 IDA/IDR rewrite, percpu ida Kent Overstreet
                   ` (3 preceding siblings ...)
  2013-08-07 17:46 ` [PATCH 06/10] idr: Rename idr_get_next() -> idr_find_next() Kent Overstreet
@ 2013-08-07 17:46 ` Kent Overstreet
  2013-08-07 19:04   ` Wolfram Sang
  2013-08-07 17:46 ` [PATCH 08/10] idr: Reimplement idr on top of ida/radix trees Kent Overstreet
                   ` (2 subsequent siblings)
  7 siblings, 1 reply; 46+ messages in thread
From: Kent Overstreet @ 2013-08-07 17:46 UTC (permalink / raw)
  To: akpm, linux-kernel
  Cc: Wolfram Sang, Tatyana Nikolova, Linus Walleij, Paul Mackerras,
	Christoph Hellwig, Alon Levy, Davidlohr Bueso, Ohad Ben-Cohen,
	John W. Linville, Inki Dae, Kent Overstreet, David Teigland,
	Joel Becker, Tim Gardner, Greg Kroah-Hartman, Seung-Woo Kim,
	Li Zefan, Tejun Heo, Guo Chao, Andy Grover, Dmitry Torokhov

Then also add an idr_alloc() wrapper, without the start/end arguments -
for consistency with the ida interfaces.
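
The wrapper itself isn't visible in the diffstat excerpt below, but it
presumably amounts to something like this (a sketch mirroring the
ida_alloc() wrapper from patch 03, not the verbatim change):

	static inline int idr_alloc(struct idr *idr, void *ptr, gfp_t gfp)
	{
		/* start = 0, end = 0: no bound on the allocated id */
		return idr_alloc_range(idr, ptr, 0, 0, gfp);
	}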

Signed-off-by: Kent Overstreet <kmo@daterainc.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Chas Williams <chas@cmf.nrl.navy.mil>
Cc: Lars Ellenberg <drbd-dev@lists.linbit.com>
Cc: Dan Williams <djbw@fb.com>
Cc: Vinod Koul <vinod.koul@intel.com>
Cc: Stefan Richter <stefanr@s5r6.in-berlin.de>
Cc: Grant Likely <grant.likely@linaro.org>
Cc: Linus Walleij <linus.walleij@linaro.org>
Cc: David Airlie <airlied@linux.ie>
Cc: Inki Dae <inki.dae@samsung.com>
Cc: Joonyoung Shim <jy0922.shim@samsung.com>
Cc: Seung-Woo Kim <sw0312.kim@samsung.com>
Cc: Kyungmin Park <kyungmin.park@samsung.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Wolfram Sang <wsa@the-dreams.de>
Cc: Roland Dreier <roland@kernel.org>
Cc: Sean Hefty <sean.hefty@intel.com>
Cc: Hal Rosenstock <hal.rosenstock@gmail.com>
Cc: Steve Wise <swise@chelsio.com>
Cc: Hoang-Nam Nguyen <hnguyen@de.ibm.com>
Cc: Christoph Raisch <raisch@de.ibm.com>
Cc: Mike Marciniszyn <infinipath@intel.com>
Cc: Alasdair Kergon <agk@redhat.com>
Cc: dm-devel@redhat.com
Cc: Samuel Ortiz <sameo@linux.intel.com>
Cc: Alex Dubov <oakad@yahoo.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Chris Ball <cjb@laptop.org>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Anton Vorontsov <cbou@mail.ru>
Cc: Rodolfo Giometti <giometti@enneenne.com>
Cc: Ohad Ben-Cohen <ohad@wizery.com>
Cc: Anil Gurumurthy <agurumur@brocade.com>
Cc: Vijaya Mohan Guvva <vmohan@brocade.com>
Cc: "James E.J. Bottomley" <JBottomley@parallels.com>
Cc: James Smart <james.smart@emulex.com>
Cc: Doug Gilbert <dgilbert@interlog.com>
Cc: Willem Riede <osst@riede.org>
Cc: "Kai Mäkisara" <Kai.Makisara@kolumbus.fi>
Cc: Omar Ramirez Luna <omar.ramirez@copitl.com>
Cc: "Nicholas A. Bellinger" <nab@linux-iscsi.org>
Cc: Zhang Rui <rui.zhang@intel.com>
Cc: Eduardo Valentin <eduardo.valentin@ti.com>
Cc: "Hans J. Koch" <hjk@hansjkoch.de>
Cc: Alex Williamson <alex.williamson@redhat.com>
Cc: Christine Caulfield <ccaulfie@redhat.com>
Cc: David Teigland <teigland@redhat.com>
Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Li Zefan <lizefan@huawei.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Johannes Berg <johannes@sipsolutions.net>
Cc: "John W. Linville" <linville@tuxdriver.com>
Cc: Guo Chao <yan@linux.vnet.ibm.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Dave Airlie <airlied@redhat.com>
Cc: Alon Levy <alevy@redhat.com>
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Thomas Hellstrom <thellstrom@vmware.com>
Cc: Brian Paul <brianp@vmware.com>
Cc: Maarten Lankhorst <maarten.lankhorst@canonical.com>
Cc: Dmitry Torokhov <dtor@vmware.com>
Cc: Sasha Levin <sasha.levin@oracle.com>
Cc: Erez Shitrit <erezsh@mellanox.co.il>
Cc: Tatyana Nikolova <Tatyana.E.Nikolova@intel.com>
Cc: Haggai Eran <haggaie@mellanox.com>
Cc: Guennadi Liakhovetski <g.liakhovetski@gmx.de>
Cc: Jason Wang <jasowang@redhat.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Dan Magenheimer <dan.magenheimer@oracle.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Andy Grover <agrover@redhat.com>
Cc: Masanari Iida <standby24x7@gmail.com>
Cc: Tim Gardner <tim.gardner@canonical.com>
Cc: Davidlohr Bueso <davidlohr.bueso@hp.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Michel Lespinasse <walken@google.com>
Acked-by: Alex Williamson <alex.williamson@redhat.com>
Acked-by: Davidlohr Bueso <davidlohr.bueso@hp.com>
Acked-by: Vinod Koul <vinod.koul@intel.com>
---
 block/bsg.c                                |  2 +-
 block/genhd.c                              |  2 +-
 drivers/atm/nicstar.c                      |  4 ++--
 drivers/block/drbd/drbd_main.c             |  6 +++--
 drivers/block/loop.c                       |  5 ++--
 drivers/dca/dca-sysfs.c                    |  2 +-
 drivers/dma/dmaengine.c                    |  2 +-
 drivers/firewire/core-cdev.c               |  3 +--
 drivers/firewire/core-device.c             |  2 +-
 drivers/gpio/gpiolib.c                     |  2 +-
 drivers/gpu/drm/drm_context.c              |  2 +-
 drivers/gpu/drm/drm_crtc.c                 |  3 ++-
 drivers/gpu/drm/drm_gem.c                  |  5 ++--
 drivers/gpu/drm/drm_stub.c                 |  2 +-
 drivers/gpu/drm/exynos/exynos_drm_ipp.c    |  2 +-
 drivers/gpu/drm/i915/i915_gem_context.c    |  4 ++--
 drivers/gpu/drm/qxl/qxl_cmd.c              |  2 +-
 drivers/gpu/drm/qxl/qxl_release.c          |  3 ++-
 drivers/gpu/drm/sis/sis_mm.c               |  2 +-
 drivers/gpu/drm/via/via_mm.c               |  2 +-
 drivers/gpu/drm/vmwgfx/vmwgfx_resource.c   |  2 +-
 drivers/i2c/i2c-core.c                     |  4 ++--
 drivers/infiniband/core/cm.c               |  3 ++-
 drivers/infiniband/core/cma.c              |  2 +-
 drivers/infiniband/core/sa_query.c         |  2 +-
 drivers/infiniband/core/ucm.c              |  2 +-
 drivers/infiniband/core/ucma.c             |  4 ++--
 drivers/infiniband/core/uverbs_cmd.c       |  2 +-
 drivers/infiniband/hw/cxgb3/iwch.h         |  2 +-
 drivers/infiniband/hw/cxgb4/iw_cxgb4.h     |  2 +-
 drivers/infiniband/hw/ehca/ehca_cq.c       |  3 ++-
 drivers/infiniband/hw/ehca/ehca_qp.c       |  2 +-
 drivers/infiniband/hw/ipath/ipath_driver.c |  2 +-
 drivers/infiniband/hw/ocrdma/ocrdma_main.c |  2 +-
 drivers/infiniband/hw/qib/qib_init.c       |  2 +-
 drivers/md/dm.c                            |  6 +++--
 drivers/memstick/core/memstick.c           |  2 +-
 drivers/memstick/core/mspro_block.c        |  3 ++-
 drivers/mfd/rtsx_pcr.c                     |  2 +-
 drivers/misc/c2port/core.c                 |  2 +-
 drivers/misc/tifm_core.c                   |  2 +-
 drivers/mmc/core/host.c                    |  2 +-
 drivers/mtd/mtdcore.c                      |  2 +-
 drivers/net/macvtap.c                      |  3 ++-
 drivers/net/ppp/ppp_generic.c              |  4 ++--
 drivers/power/bq2415x_charger.c            |  2 +-
 drivers/power/bq27x00_battery.c            |  2 +-
 drivers/power/ds2782_battery.c             |  2 +-
 drivers/pps/kapi.c                         |  2 +-
 drivers/pps/pps.c                          |  4 ++--
 drivers/remoteproc/remoteproc_core.c       |  2 +-
 drivers/rpmsg/virtio_rpmsg_bus.c           |  4 ++--
 drivers/scsi/bfa/bfad_im.c                 |  2 +-
 drivers/scsi/ch.c                          |  3 ++-
 drivers/scsi/lpfc/lpfc_init.c              |  2 +-
 drivers/scsi/sg.c                          |  2 +-
 drivers/scsi/st.c                          |  3 ++-
 drivers/staging/tidspbridge/rmgr/drv.c     |  4 ++--
 drivers/staging/zcache/ramster/tcp.c       |  2 +-
 drivers/target/iscsi/iscsi_target.c        |  2 +-
 drivers/target/iscsi/iscsi_target_login.c  |  2 +-
 drivers/thermal/cpu_cooling.c              |  2 +-
 drivers/thermal/thermal_core.c             |  2 +-
 drivers/uio/uio.c                          |  3 ++-
 drivers/usb/serial/usb-serial.c            |  2 +-
 drivers/vfio/vfio.c                        |  3 ++-
 fs/dlm/lock.c                              |  2 +-
 fs/dlm/recover.c                           |  2 +-
 fs/nfs/nfs4client.c                        |  2 +-
 fs/ocfs2/cluster/tcp.c                     |  2 +-
 include/linux/idr.h                        | 11 +++++++--
 ipc/util.c                                 |  2 +-
 kernel/cgroup.c                            |  2 +-
 kernel/events/core.c                       |  3 ++-
 kernel/workqueue.c                         |  4 ++--
 lib/idr.c                                  | 38 ++++++++++++++++--------------
 net/9p/util.c                              |  2 +-
 net/mac80211/tx.c                          |  4 ++--
 net/tipc/server.c                          |  2 +-
 79 files changed, 137 insertions(+), 112 deletions(-)

diff --git a/block/bsg.c b/block/bsg.c
index 420a5a9..fc13d24 100644
--- a/block/bsg.c
+++ b/block/bsg.c
@@ -1016,7 +1016,7 @@ int bsg_register_queue(struct request_queue *q, struct device *parent,
 
 	mutex_lock(&bsg_mutex);
 
-	ret = idr_alloc(&bsg_minor_idr, bcd, 0, BSG_MAX_DEVS, GFP_KERNEL);
+	ret = idr_alloc_range(&bsg_minor_idr, bcd, 0, BSG_MAX_DEVS, GFP_KERNEL);
 	if (ret < 0) {
 		if (ret == -ENOSPC) {
 			printk(KERN_ERR "bsg: too many bsg devices\n");
diff --git a/block/genhd.c b/block/genhd.c
index dadf42b..14de2dc 100644
--- a/block/genhd.c
+++ b/block/genhd.c
@@ -421,7 +421,7 @@ int blk_alloc_devt(struct hd_struct *part, dev_t *devt)
 
 	/* allocate ext devt */
 	mutex_lock(&ext_devt_mutex);
-	idx = idr_alloc(&ext_devt_idr, part, 0, NR_EXT_DEVT, GFP_KERNEL);
+	idx = idr_alloc_range(&ext_devt_idr, part, 0, NR_EXT_DEVT, GFP_KERNEL);
 	mutex_unlock(&ext_devt_mutex);
 	if (idx < 0)
 		return idx == -ENOSPC ? -EBUSY : idx;
diff --git a/drivers/atm/nicstar.c b/drivers/atm/nicstar.c
index 6587dc2..0a735d6 100644
--- a/drivers/atm/nicstar.c
+++ b/drivers/atm/nicstar.c
@@ -1025,11 +1025,11 @@ static void push_rxbufs(ns_dev * card, struct sk_buff *skb)
 				card->lbfqc += 2;
 		}
 
-		id1 = idr_alloc(&card->idr, handle1, 0, 0, GFP_ATOMIC);
+		id1 = idr_alloc(&card->idr, handle1, GFP_ATOMIC);
 		if (id1 < 0)
 			goto out;
 
-		id2 = idr_alloc(&card->idr, handle2, 0, 0, GFP_ATOMIC);
+		id2 = idr_alloc(&card->idr, handle2, GFP_ATOMIC);
 		if (id2 < 0)
 			goto out;
 
diff --git a/drivers/block/drbd/drbd_main.c b/drivers/block/drbd/drbd_main.c
index 73f4765..45dd6fc 100644
--- a/drivers/block/drbd/drbd_main.c
+++ b/drivers/block/drbd/drbd_main.c
@@ -2675,7 +2675,8 @@ enum drbd_ret_code conn_new_minor(struct drbd_tconn *tconn, unsigned int minor,
 	mdev->read_requests = RB_ROOT;
 	mdev->write_requests = RB_ROOT;
 
-	minor_got = idr_alloc(&minors, mdev, minor, minor + 1, GFP_KERNEL);
+	minor_got = idr_alloc_range(&minors, mdev, minor,
+				    minor + 1, GFP_KERNEL);
 	if (minor_got < 0) {
 		if (minor_got == -ENOSPC) {
 			err = ERR_MINOR_EXISTS;
@@ -2684,7 +2685,8 @@ enum drbd_ret_code conn_new_minor(struct drbd_tconn *tconn, unsigned int minor,
 		goto out_no_minor_idr;
 	}
 
-	vnr_got = idr_alloc(&tconn->volumes, mdev, vnr, vnr + 1, GFP_KERNEL);
+	vnr_got = idr_alloc_range(&tconn->volumes, mdev,
+				  vnr, vnr + 1, GFP_KERNEL);
 	if (vnr_got < 0) {
 		if (vnr_got == -ENOSPC) {
 			err = ERR_INVALID_REQUEST;
diff --git a/drivers/block/loop.c b/drivers/block/loop.c
index 40e7155..68319ae 100644
--- a/drivers/block/loop.c
+++ b/drivers/block/loop.c
@@ -1620,11 +1620,12 @@ static int loop_add(struct loop_device **l, int i)
 
 	/* allocate id, if @id >= 0, we're requesting that specific id */
 	if (i >= 0) {
-		err = idr_alloc(&loop_index_idr, lo, i, i + 1, GFP_KERNEL);
+		err = idr_alloc_range(&loop_index_idr, lo,
+				      i, i + 1, GFP_KERNEL);
 		if (err == -ENOSPC)
 			err = -EEXIST;
 	} else {
-		err = idr_alloc(&loop_index_idr, lo, 0, 0, GFP_KERNEL);
+		err = idr_alloc(&loop_index_idr, lo, GFP_KERNEL);
 	}
 	if (err < 0)
 		goto out_free_dev;
diff --git a/drivers/dca/dca-sysfs.c b/drivers/dca/dca-sysfs.c
index 126cf29..effda66 100644
--- a/drivers/dca/dca-sysfs.c
+++ b/drivers/dca/dca-sysfs.c
@@ -58,7 +58,7 @@ int dca_sysfs_add_provider(struct dca_provider *dca, struct device *dev)
 	idr_preload(GFP_KERNEL);
 	spin_lock(&dca_idr_lock);
 
-	ret = idr_alloc(&dca_idr, dca, 0, 0, GFP_NOWAIT);
+	ret = idr_alloc(&dca_idr, dca, GFP_NOWAIT);
 	if (ret >= 0)
 		dca->id = ret;
 
diff --git a/drivers/dma/dmaengine.c b/drivers/dma/dmaengine.c
index 9e56745..dd6ecb1 100644
--- a/drivers/dma/dmaengine.c
+++ b/drivers/dma/dmaengine.c
@@ -692,7 +692,7 @@ static int get_dma_id(struct dma_device *device)
 
 	mutex_lock(&dma_list_mutex);
 
-	rc = idr_alloc(&dma_idr, NULL, 0, 0, GFP_KERNEL);
+	rc = idr_alloc(&dma_idr, NULL, GFP_KERNEL);
 	if (rc >= 0)
 		device->dev_id = rc;
 
diff --git a/drivers/firewire/core-cdev.c b/drivers/firewire/core-cdev.c
index ac1b43a..ba78d08 100644
--- a/drivers/firewire/core-cdev.c
+++ b/drivers/firewire/core-cdev.c
@@ -497,8 +497,7 @@ static int add_client_resource(struct client *client,
 	if (client->in_shutdown)
 		ret = -ECANCELED;
 	else
-		ret = idr_alloc(&client->resource_idr, resource, 0, 0,
-				GFP_NOWAIT);
+		ret = idr_alloc(&client->resource_idr, resource, GFP_NOWAIT);
 	if (ret >= 0) {
 		resource->handle = ret;
 		client_get(client);
diff --git a/drivers/firewire/core-device.c b/drivers/firewire/core-device.c
index de4aa40..cd1ac73 100644
--- a/drivers/firewire/core-device.c
+++ b/drivers/firewire/core-device.c
@@ -1036,7 +1036,7 @@ static void fw_device_init(struct work_struct *work)
 
 	fw_device_get(device);
 	down_write(&fw_device_rwsem);
-	minor = idr_alloc(&fw_device_idr, device, 0, 1 << MINORBITS,
+	minor = idr_alloc_range(&fw_device_idr, device, 0, 1 << MINORBITS,
 			GFP_KERNEL);
 	up_write(&fw_device_rwsem);
 
diff --git a/drivers/gpio/gpiolib.c b/drivers/gpio/gpiolib.c
index ff0fd65..36a6280 100644
--- a/drivers/gpio/gpiolib.c
+++ b/drivers/gpio/gpiolib.c
@@ -414,7 +414,7 @@ static int gpio_setup_irq(struct gpio_desc *desc, struct device *dev,
 			goto err_out;
 		}
 
-		ret = idr_alloc(&dirent_idr, value_sd, 1, 0, GFP_KERNEL);
+		ret = idr_alloc_range(&dirent_idr, value_sd, 1, 0, GFP_KERNEL);
 		if (ret < 0)
 			goto free_sd;
 		id = ret;
diff --git a/drivers/gpu/drm/drm_context.c b/drivers/gpu/drm/drm_context.c
index 725968d..9a5c0c8 100644
--- a/drivers/gpu/drm/drm_context.c
+++ b/drivers/gpu/drm/drm_context.c
@@ -77,7 +77,7 @@ static int drm_ctxbitmap_next(struct drm_device * dev)
 	int ret;
 
 	mutex_lock(&dev->struct_mutex);
-	ret = idr_alloc(&dev->ctx_idr, NULL, DRM_RESERVED_CONTEXTS, 0,
+	ret = idr_alloc_range(&dev->ctx_idr, NULL, DRM_RESERVED_CONTEXTS, 0,
 			GFP_KERNEL);
 	mutex_unlock(&dev->struct_mutex);
 	return ret;
diff --git a/drivers/gpu/drm/drm_crtc.c b/drivers/gpu/drm/drm_crtc.c
index fc83bb9..df26985 100644
--- a/drivers/gpu/drm/drm_crtc.c
+++ b/drivers/gpu/drm/drm_crtc.c
@@ -294,7 +294,8 @@ static int drm_mode_object_get(struct drm_device *dev,
 	int ret;
 
 	mutex_lock(&dev->mode_config.idr_mutex);
-	ret = idr_alloc(&dev->mode_config.crtc_idr, obj, 1, 0, GFP_KERNEL);
+	ret = idr_alloc_range(&dev->mode_config.crtc_idr,
+			      obj, 1, 0, GFP_KERNEL);
 	if (ret >= 0) {
 		/*
 		 * Set up the object linking under the protection of the idr
diff --git a/drivers/gpu/drm/drm_gem.c b/drivers/gpu/drm/drm_gem.c
index 603f256..d12ea60 100644
--- a/drivers/gpu/drm/drm_gem.c
+++ b/drivers/gpu/drm/drm_gem.c
@@ -272,7 +272,7 @@ drm_gem_handle_create(struct drm_file *file_priv,
 	idr_preload(GFP_KERNEL);
 	spin_lock(&file_priv->table_lock);
 
-	ret = idr_alloc(&file_priv->object_idr, obj, 1, 0, GFP_NOWAIT);
+	ret = idr_alloc_range(&file_priv->object_idr, obj, 1, 0, GFP_NOWAIT);
 
 	spin_unlock(&file_priv->table_lock);
 	idr_preload_end();
@@ -448,7 +448,8 @@ drm_gem_flink_ioctl(struct drm_device *dev, void *data,
 	idr_preload(GFP_KERNEL);
 	spin_lock(&dev->object_name_lock);
 	if (!obj->name) {
-		ret = idr_alloc(&dev->object_name_idr, obj, 1, 0, GFP_NOWAIT);
+		ret = idr_alloc_range(&dev->object_name_idr,
+				      obj, 1, 0, GFP_NOWAIT);
 		if (ret < 0)
 			goto err;
 
diff --git a/drivers/gpu/drm/drm_stub.c b/drivers/gpu/drm/drm_stub.c
index 327ca19..41675a9 100644
--- a/drivers/gpu/drm/drm_stub.c
+++ b/drivers/gpu/drm/drm_stub.c
@@ -121,7 +121,7 @@ static int drm_minor_get_id(struct drm_device *dev, int type)
         }
 
 	mutex_lock(&dev->struct_mutex);
-	ret = idr_alloc(&drm_minors_idr, NULL, base, limit, GFP_KERNEL);
+	ret = idr_alloc_range(&drm_minors_idr, NULL, base, limit, GFP_KERNEL);
 	mutex_unlock(&dev->struct_mutex);
 
 	return ret == -ENOSPC ? -EINVAL : ret;
diff --git a/drivers/gpu/drm/exynos/exynos_drm_ipp.c b/drivers/gpu/drm/exynos/exynos_drm_ipp.c
index d2b6ab4..16db049 100644
--- a/drivers/gpu/drm/exynos/exynos_drm_ipp.c
+++ b/drivers/gpu/drm/exynos/exynos_drm_ipp.c
@@ -159,7 +159,7 @@ static int ipp_create_id(struct idr *id_idr, struct mutex *lock, void *obj,
 
 	/* do the allocation under our mutexlock */
 	mutex_lock(lock);
-	ret = idr_alloc(id_idr, obj, 1, 0, GFP_KERNEL);
+	ret = idr_alloc_range(id_idr, obj, 1, 0, GFP_KERNEL);
 	mutex_unlock(lock);
 	if (ret < 0)
 		return ret;
diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
index 51b7a21..39d3883 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/i915_gem_context.c
@@ -171,8 +171,8 @@ create_hw_context(struct drm_device *dev,
 	if (file_priv == NULL)
 		return ctx;
 
-	ret = idr_alloc(&file_priv->context_idr, ctx, DEFAULT_CONTEXT_ID + 1, 0,
-			GFP_KERNEL);
+	ret = idr_alloc_range(&file_priv->context_idr, ctx,
+			      DEFAULT_CONTEXT_ID + 1, 0, GFP_KERNEL);
 	if (ret < 0)
 		goto err_out;
 
diff --git a/drivers/gpu/drm/qxl/qxl_cmd.c b/drivers/gpu/drm/qxl/qxl_cmd.c
index eb89653..555a698 100644
--- a/drivers/gpu/drm/qxl/qxl_cmd.c
+++ b/drivers/gpu/drm/qxl/qxl_cmd.c
@@ -451,7 +451,7 @@ int qxl_surface_id_alloc(struct qxl_device *qdev,
 again:
 	idr_preload(GFP_ATOMIC);
 	spin_lock(&qdev->surf_id_idr_lock);
-	idr_ret = idr_alloc(&qdev->surf_id_idr, NULL, 1, 0, GFP_NOWAIT);
+	idr_ret = idr_alloc_range(&qdev->surf_id_idr, NULL, 1, 0, GFP_NOWAIT);
 	spin_unlock(&qdev->surf_id_idr_lock);
 	idr_preload_end();
 	if (idr_ret < 0)
diff --git a/drivers/gpu/drm/qxl/qxl_release.c b/drivers/gpu/drm/qxl/qxl_release.c
index b61449e..7cbe62a 100644
--- a/drivers/gpu/drm/qxl/qxl_release.c
+++ b/drivers/gpu/drm/qxl/qxl_release.c
@@ -60,7 +60,8 @@ qxl_release_alloc(struct qxl_device *qdev, int type,
 
 	idr_preload(GFP_KERNEL);
 	spin_lock(&qdev->release_idr_lock);
-	idr_ret = idr_alloc(&qdev->release_idr, release, 1, 0, GFP_NOWAIT);
+	idr_ret = idr_alloc_range(&qdev->release_idr,
+				  release, 1, 0, GFP_NOWAIT);
 	spin_unlock(&qdev->release_idr_lock);
 	idr_preload_end();
 	handle = idr_ret;
diff --git a/drivers/gpu/drm/sis/sis_mm.c b/drivers/gpu/drm/sis/sis_mm.c
index 9a43d98..ec9fd4a 100644
--- a/drivers/gpu/drm/sis/sis_mm.c
+++ b/drivers/gpu/drm/sis/sis_mm.c
@@ -128,7 +128,7 @@ static int sis_drm_alloc(struct drm_device *dev, struct drm_file *file,
 	if (retval)
 		goto fail_alloc;
 
-	retval = idr_alloc(&dev_priv->object_idr, item, 1, 0, GFP_KERNEL);
+	retval = idr_alloc_range(&dev_priv->object_idr, item, 1, 0, GFP_KERNEL);
 	if (retval < 0)
 		goto fail_idr;
 	user_key = retval;
diff --git a/drivers/gpu/drm/via/via_mm.c b/drivers/gpu/drm/via/via_mm.c
index 0ab93ff..8dc9731 100644
--- a/drivers/gpu/drm/via/via_mm.c
+++ b/drivers/gpu/drm/via/via_mm.c
@@ -148,7 +148,7 @@ int via_mem_alloc(struct drm_device *dev, void *data,
 	if (retval)
 		goto fail_alloc;
 
-	retval = idr_alloc(&dev_priv->object_idr, item, 1, 0, GFP_KERNEL);
+	retval = idr_alloc_range(&dev_priv->object_idr, item, 1, 0, GFP_KERNEL);
 	if (retval < 0)
 		goto fail_idr;
 	user_key = retval;
diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_resource.c b/drivers/gpu/drm/vmwgfx/vmwgfx_resource.c
index 7953d1f..4838238 100644
--- a/drivers/gpu/drm/vmwgfx/vmwgfx_resource.c
+++ b/drivers/gpu/drm/vmwgfx/vmwgfx_resource.c
@@ -180,7 +180,7 @@ int vmw_resource_alloc_id(struct vmw_resource *res)
 	idr_preload(GFP_KERNEL);
 	write_lock(&dev_priv->resource_lock);
 
-	ret = idr_alloc(idr, res, 1, 0, GFP_NOWAIT);
+	ret = idr_alloc_range(idr, res, 1, 0, GFP_NOWAIT);
 	if (ret >= 0)
 		res->id = ret;
 
diff --git a/drivers/i2c/i2c-core.c b/drivers/i2c/i2c-core.c
index f32ca29..7a8e5ca 100644
--- a/drivers/i2c/i2c-core.c
+++ b/drivers/i2c/i2c-core.c
@@ -1087,7 +1087,7 @@ static int __i2c_add_numbered_adapter(struct i2c_adapter *adap)
 	int	id;
 
 	mutex_lock(&core_lock);
-	id = idr_alloc(&i2c_adapter_idr, adap, adap->nr, adap->nr + 1,
+	id = idr_alloc_range(&i2c_adapter_idr, adap, adap->nr, adap->nr + 1,
 		       GFP_KERNEL);
 	mutex_unlock(&core_lock);
 	if (id < 0)
@@ -1124,7 +1124,7 @@ int i2c_add_adapter(struct i2c_adapter *adapter)
 	}
 
 	mutex_lock(&core_lock);
-	id = idr_alloc(&i2c_adapter_idr, adapter,
+	id = idr_alloc_range(&i2c_adapter_idr, adapter,
 		       __i2c_first_dynamic_bus_num, 0, GFP_KERNEL);
 	mutex_unlock(&core_lock);
 	if (id < 0)
diff --git a/drivers/infiniband/core/cm.c b/drivers/infiniband/core/cm.c
index 784b97c..c686690 100644
--- a/drivers/infiniband/core/cm.c
+++ b/drivers/infiniband/core/cm.c
@@ -388,7 +388,8 @@ static int cm_alloc_id(struct cm_id_private *cm_id_priv)
 	idr_preload(GFP_KERNEL);
 	spin_lock_irqsave(&cm.lock, flags);
 
-	id = idr_alloc(&cm.local_id_table, cm_id_priv, next_id, 0, GFP_NOWAIT);
+	id = idr_alloc_range(&cm.local_id_table, cm_id_priv,
+			     next_id, 0, GFP_NOWAIT);
 	if (id >= 0)
 		next_id = max(id + 1, 0);
 
diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c
index 7c0f953..7e5ac1e 100644
--- a/drivers/infiniband/core/cma.c
+++ b/drivers/infiniband/core/cma.c
@@ -2283,7 +2283,7 @@ static int cma_alloc_port(struct idr *ps, struct rdma_id_private *id_priv,
 	if (!bind_list)
 		return -ENOMEM;
 
-	ret = idr_alloc(ps, bind_list, snum, snum + 1, GFP_KERNEL);
+	ret = idr_alloc_range(ps, bind_list, snum, snum + 1, GFP_KERNEL);
 	if (ret < 0)
 		goto err;
 
diff --git a/drivers/infiniband/core/sa_query.c b/drivers/infiniband/core/sa_query.c
index 9838ca4..ce757fd 100644
--- a/drivers/infiniband/core/sa_query.c
+++ b/drivers/infiniband/core/sa_query.c
@@ -619,7 +619,7 @@ static int send_mad(struct ib_sa_query *query, int timeout_ms, gfp_t gfp_mask)
 		idr_preload(gfp_mask);
 	spin_lock_irqsave(&idr_lock, flags);
 
-	id = idr_alloc(&query_idr, query, 0, 0, GFP_NOWAIT);
+	id = idr_alloc(&query_idr, query, GFP_NOWAIT);
 
 	spin_unlock_irqrestore(&idr_lock, flags);
 	if (preload)
diff --git a/drivers/infiniband/core/ucm.c b/drivers/infiniband/core/ucm.c
index f2f6393..f66dc51 100644
--- a/drivers/infiniband/core/ucm.c
+++ b/drivers/infiniband/core/ucm.c
@@ -187,7 +187,7 @@ static struct ib_ucm_context *ib_ucm_ctx_alloc(struct ib_ucm_file *file)
 	INIT_LIST_HEAD(&ctx->events);
 
 	mutex_lock(&ctx_id_mutex);
-	ctx->id = idr_alloc(&ctx_id_table, ctx, 0, 0, GFP_KERNEL);
+	ctx->id = idr_alloc(&ctx_id_table, ctx, GFP_KERNEL);
 	mutex_unlock(&ctx_id_mutex);
 	if (ctx->id < 0)
 		goto error;
diff --git a/drivers/infiniband/core/ucma.c b/drivers/infiniband/core/ucma.c
index b0f189b..1e9bedf 100644
--- a/drivers/infiniband/core/ucma.c
+++ b/drivers/infiniband/core/ucma.c
@@ -158,7 +158,7 @@ static struct ucma_context *ucma_alloc_ctx(struct ucma_file *file)
 	ctx->file = file;
 
 	mutex_lock(&mut);
-	ctx->id = idr_alloc(&ctx_idr, ctx, 0, 0, GFP_KERNEL);
+	ctx->id = idr_alloc(&ctx_idr, ctx, GFP_KERNEL);
 	mutex_unlock(&mut);
 	if (ctx->id < 0)
 		goto error;
@@ -180,7 +180,7 @@ static struct ucma_multicast* ucma_alloc_multicast(struct ucma_context *ctx)
 		return NULL;
 
 	mutex_lock(&mut);
-	mc->id = idr_alloc(&multicast_idr, mc, 0, 0, GFP_KERNEL);
+	mc->id = idr_alloc(&multicast_idr, mc, GFP_KERNEL);
 	mutex_unlock(&mut);
 	if (mc->id < 0)
 		goto error;
diff --git a/drivers/infiniband/core/uverbs_cmd.c b/drivers/infiniband/core/uverbs_cmd.c
index b3c07b0..9ddc2e0 100644
--- a/drivers/infiniband/core/uverbs_cmd.c
+++ b/drivers/infiniband/core/uverbs_cmd.c
@@ -128,7 +128,7 @@ static int idr_add_uobj(struct idr *idr, struct ib_uobject *uobj)
 	idr_preload(GFP_KERNEL);
 	spin_lock(&ib_uverbs_idr_lock);
 
-	ret = idr_alloc(idr, uobj, 0, 0, GFP_NOWAIT);
+	ret = idr_alloc(idr, uobj, GFP_NOWAIT);
 	if (ret >= 0)
 		uobj->id = ret;
 
diff --git a/drivers/infiniband/hw/cxgb3/iwch.h b/drivers/infiniband/hw/cxgb3/iwch.h
index 8378622..f28c585 100644
--- a/drivers/infiniband/hw/cxgb3/iwch.h
+++ b/drivers/infiniband/hw/cxgb3/iwch.h
@@ -157,7 +157,7 @@ static inline int insert_handle(struct iwch_dev *rhp, struct idr *idr,
 	idr_preload(GFP_KERNEL);
 	spin_lock_irq(&rhp->lock);
 
-	ret = idr_alloc(idr, handle, id, id + 1, GFP_NOWAIT);
+	ret = idr_alloc_range(idr, handle, id, id + 1, GFP_NOWAIT);
 
 	spin_unlock_irq(&rhp->lock);
 	idr_preload_end();
diff --git a/drivers/infiniband/hw/cxgb4/iw_cxgb4.h b/drivers/infiniband/hw/cxgb4/iw_cxgb4.h
index 485183a..50e5a3f 100644
--- a/drivers/infiniband/hw/cxgb4/iw_cxgb4.h
+++ b/drivers/infiniband/hw/cxgb4/iw_cxgb4.h
@@ -266,7 +266,7 @@ static inline int _insert_handle(struct c4iw_dev *rhp, struct idr *idr,
 		spin_lock_irq(&rhp->lock);
 	}
 
-	ret = idr_alloc(idr, handle, id, id + 1, GFP_ATOMIC);
+	ret = idr_alloc_range(idr, handle, id, id + 1, GFP_ATOMIC);
 
 	if (lock) {
 		spin_unlock_irq(&rhp->lock);
diff --git a/drivers/infiniband/hw/ehca/ehca_cq.c b/drivers/infiniband/hw/ehca/ehca_cq.c
index 212150c..a3632ee 100644
--- a/drivers/infiniband/hw/ehca/ehca_cq.c
+++ b/drivers/infiniband/hw/ehca/ehca_cq.c
@@ -165,7 +165,8 @@ struct ib_cq *ehca_create_cq(struct ib_device *device, int cqe, int comp_vector,
 
 	idr_preload(GFP_KERNEL);
 	write_lock_irqsave(&ehca_cq_idr_lock, flags);
-	my_cq->token = idr_alloc(&ehca_cq_idr, my_cq, 0, 0x2000000, GFP_NOWAIT);
+	my_cq->token = idr_alloc_range(&ehca_cq_idr, my_cq, 0,
+				       0x2000000, GFP_NOWAIT);
 	write_unlock_irqrestore(&ehca_cq_idr_lock, flags);
 	idr_preload_end();
 
diff --git a/drivers/infiniband/hw/ehca/ehca_qp.c b/drivers/infiniband/hw/ehca/ehca_qp.c
index 00d6861..758a265 100644
--- a/drivers/infiniband/hw/ehca/ehca_qp.c
+++ b/drivers/infiniband/hw/ehca/ehca_qp.c
@@ -639,7 +639,7 @@ static struct ehca_qp *internal_create_qp(
 	idr_preload(GFP_KERNEL);
 	write_lock_irqsave(&ehca_qp_idr_lock, flags);
 
-	ret = idr_alloc(&ehca_qp_idr, my_qp, 0, 0x2000000, GFP_NOWAIT);
+	ret = idr_alloc_range(&ehca_qp_idr, my_qp, 0, 0x2000000, GFP_NOWAIT);
 	if (ret >= 0)
 		my_qp->token = ret;
 
diff --git a/drivers/infiniband/hw/ipath/ipath_driver.c b/drivers/infiniband/hw/ipath/ipath_driver.c
index bd0caed..83a40a5 100644
--- a/drivers/infiniband/hw/ipath/ipath_driver.c
+++ b/drivers/infiniband/hw/ipath/ipath_driver.c
@@ -204,7 +204,7 @@ static struct ipath_devdata *ipath_alloc_devdata(struct pci_dev *pdev)
 	idr_preload(GFP_KERNEL);
 	spin_lock_irqsave(&ipath_devs_lock, flags);
 
-	ret = idr_alloc(&unit_table, dd, 0, 0, GFP_NOWAIT);
+	ret = idr_alloc(&unit_table, dd, GFP_NOWAIT);
 	if (ret < 0) {
 		printk(KERN_ERR IPATH_DRV_NAME
 		       ": Could not allocate unit ID: error %d\n", -ret);
diff --git a/drivers/infiniband/hw/ocrdma/ocrdma_main.c b/drivers/infiniband/hw/ocrdma/ocrdma_main.c
index ded416f..fac1d91 100644
--- a/drivers/infiniband/hw/ocrdma/ocrdma_main.c
+++ b/drivers/infiniband/hw/ocrdma/ocrdma_main.c
@@ -404,7 +404,7 @@ static struct ocrdma_dev *ocrdma_add(struct be_dev_info *dev_info)
 		goto idr_err;
 
 	memcpy(&dev->nic_info, dev_info, sizeof(*dev_info));
-	dev->id = idr_alloc(&ocrdma_dev_id, NULL, 0, 0, GFP_KERNEL);
+	dev->id = idr_alloc(&ocrdma_dev_id, NULL, GFP_KERNEL);
 	if (dev->id < 0)
 		goto idr_err;
 
diff --git a/drivers/infiniband/hw/qib/qib_init.c b/drivers/infiniband/hw/qib/qib_init.c
index 36e048e..17adbd10c 100644
--- a/drivers/infiniband/hw/qib/qib_init.c
+++ b/drivers/infiniband/hw/qib/qib_init.c
@@ -1109,7 +1109,7 @@ struct qib_devdata *qib_alloc_devdata(struct pci_dev *pdev, size_t extra)
 	idr_preload(GFP_KERNEL);
 	spin_lock_irqsave(&qib_devs_lock, flags);
 
-	ret = idr_alloc(&qib_unit_table, dd, 0, 0, GFP_NOWAIT);
+	ret = idr_alloc(&qib_unit_table, dd, GFP_NOWAIT);
 	if (ret >= 0) {
 		dd->unit = ret;
 		list_add(&dd->list, &qib_dev_list);
diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index 9e39d2b..c63d35a 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -1830,7 +1830,8 @@ static int specific_minor(int minor)
 	idr_preload(GFP_KERNEL);
 	spin_lock(&_minor_lock);
 
-	r = idr_alloc(&_minor_idr, MINOR_ALLOCED, minor, minor + 1, GFP_NOWAIT);
+	r = idr_alloc_range(&_minor_idr, MINOR_ALLOCED,
+			    minor, minor + 1, GFP_NOWAIT);
 
 	spin_unlock(&_minor_lock);
 	idr_preload_end();
@@ -1846,7 +1847,8 @@ static int next_free_minor(int *minor)
 	idr_preload(GFP_KERNEL);
 	spin_lock(&_minor_lock);
 
-	r = idr_alloc(&_minor_idr, MINOR_ALLOCED, 0, 1 << MINORBITS, GFP_NOWAIT);
+	r = idr_alloc_range(&_minor_idr, MINOR_ALLOCED, 0,
+			    1 << MINORBITS, GFP_NOWAIT);
 
 	spin_unlock(&_minor_lock);
 	idr_preload_end();
diff --git a/drivers/memstick/core/memstick.c b/drivers/memstick/core/memstick.c
index ffcb10a..70fb07c 100644
--- a/drivers/memstick/core/memstick.c
+++ b/drivers/memstick/core/memstick.c
@@ -515,7 +515,7 @@ int memstick_add_host(struct memstick_host *host)
 	idr_preload(GFP_KERNEL);
 	spin_lock(&memstick_host_lock);
 
-	rc = idr_alloc(&memstick_host_idr, host, 0, 0, GFP_NOWAIT);
+	rc = idr_alloc(&memstick_host_idr, host, GFP_NOWAIT);
 	if (rc >= 0)
 		host->id = rc;
 
diff --git a/drivers/memstick/core/mspro_block.c b/drivers/memstick/core/mspro_block.c
index f4176ca..34ef64b 100644
--- a/drivers/memstick/core/mspro_block.c
+++ b/drivers/memstick/core/mspro_block.c
@@ -1211,7 +1211,8 @@ static int mspro_block_init_disk(struct memstick_dev *card)
 	msb->page_size = be16_to_cpu(sys_info->unit_size);
 
 	mutex_lock(&mspro_block_disk_lock);
-	disk_id = idr_alloc(&mspro_block_disk_idr, card, 0, 256, GFP_KERNEL);
+	disk_id = idr_alloc_range(&mspro_block_disk_idr,
+				  card, 0, 256, GFP_KERNEL);
 	mutex_unlock(&mspro_block_disk_lock);
 	if (disk_id < 0)
 		return disk_id;
diff --git a/drivers/mfd/rtsx_pcr.c b/drivers/mfd/rtsx_pcr.c
index dd186c4..862deed 100644
--- a/drivers/mfd/rtsx_pcr.c
+++ b/drivers/mfd/rtsx_pcr.c
@@ -1103,7 +1103,7 @@ static int rtsx_pci_probe(struct pci_dev *pcidev,
 
 	idr_preload(GFP_KERNEL);
 	spin_lock(&rtsx_pci_lock);
-	ret = idr_alloc(&rtsx_pci_idr, pcr, 0, 0, GFP_NOWAIT);
+	ret = idr_alloc(&rtsx_pci_idr, pcr, GFP_NOWAIT);
 	if (ret >= 0)
 		pcr->id = ret;
 	spin_unlock(&rtsx_pci_lock);
diff --git a/drivers/misc/c2port/core.c b/drivers/misc/c2port/core.c
index f32550a..8a1ab10 100644
--- a/drivers/misc/c2port/core.c
+++ b/drivers/misc/c2port/core.c
@@ -899,7 +899,7 @@ struct c2port_device *c2port_device_register(char *name,
 
 	idr_preload(GFP_KERNEL);
 	spin_lock_irq(&c2port_idr_lock);
-	ret = idr_alloc(&c2port_idr, c2dev, 0, 0, GFP_NOWAIT);
+	ret = idr_alloc(&c2port_idr, c2dev, GFP_NOWAIT);
 	spin_unlock_irq(&c2port_idr_lock);
 	idr_preload_end();
 
diff --git a/drivers/misc/tifm_core.c b/drivers/misc/tifm_core.c
index 0ab7c92..c828c27 100644
--- a/drivers/misc/tifm_core.c
+++ b/drivers/misc/tifm_core.c
@@ -198,7 +198,7 @@ int tifm_add_adapter(struct tifm_adapter *fm)
 
 	idr_preload(GFP_KERNEL);
 	spin_lock(&tifm_adapter_lock);
-	rc = idr_alloc(&tifm_adapter_idr, fm, 0, 0, GFP_NOWAIT);
+	rc = idr_alloc(&tifm_adapter_idr, fm, GFP_NOWAIT);
 	if (rc >= 0)
 		fm->id = rc;
 	spin_unlock(&tifm_adapter_lock);
diff --git a/drivers/mmc/core/host.c b/drivers/mmc/core/host.c
index 6fb6f77..dd223fa 100644
--- a/drivers/mmc/core/host.c
+++ b/drivers/mmc/core/host.c
@@ -459,7 +459,7 @@ struct mmc_host *mmc_alloc_host(int extra, struct device *dev)
 	host->rescan_disable = 1;
 	idr_preload(GFP_KERNEL);
 	spin_lock(&mmc_host_lock);
-	err = idr_alloc(&mmc_host_idr, host, 0, 0, GFP_NOWAIT);
+	err = idr_alloc(&mmc_host_idr, host, GFP_NOWAIT);
 	if (err >= 0)
 		host->index = err;
 	spin_unlock(&mmc_host_lock);
diff --git a/drivers/mtd/mtdcore.c b/drivers/mtd/mtdcore.c
index 8d64363..c78b95c 100644
--- a/drivers/mtd/mtdcore.c
+++ b/drivers/mtd/mtdcore.c
@@ -347,7 +347,7 @@ int add_mtd_device(struct mtd_info *mtd)
 	BUG_ON(mtd->writesize == 0);
 	mutex_lock(&mtd_table_mutex);
 
-	i = idr_alloc(&mtd_idr, mtd, 0, 0, GFP_KERNEL);
+	i = idr_alloc(&mtd_idr, mtd, GFP_KERNEL);
 	if (i < 0)
 		goto fail_locked;
 
diff --git a/drivers/net/macvtap.c b/drivers/net/macvtap.c
index a98fb0e..b31074f 100644
--- a/drivers/net/macvtap.c
+++ b/drivers/net/macvtap.c
@@ -338,7 +338,8 @@ static int macvtap_get_minor(struct macvlan_dev *vlan)
 	int retval = -ENOMEM;
 
 	mutex_lock(&minor_lock);
-	retval = idr_alloc(&minor_idr, vlan, 1, MACVTAP_NUM_DEVS, GFP_KERNEL);
+	retval = idr_alloc_range(&minor_idr, vlan, 1,
+				 MACVTAP_NUM_DEVS, GFP_KERNEL);
 	if (retval >= 0) {
 		vlan->minor = retval;
 	} else if (retval == -ENOSPC) {
diff --git a/drivers/net/ppp/ppp_generic.c b/drivers/net/ppp/ppp_generic.c
index 72ff14b..0818f4f 100644
--- a/drivers/net/ppp/ppp_generic.c
+++ b/drivers/net/ppp/ppp_generic.c
@@ -2958,7 +2958,7 @@ static int unit_set(struct idr *p, void *ptr, int n)
 {
 	int unit;
 
-	unit = idr_alloc(p, ptr, n, n + 1, GFP_KERNEL);
+	unit = idr_alloc_range(p, ptr, n, n + 1, GFP_KERNEL);
 	if (unit == -ENOSPC)
 		unit = -EINVAL;
 	return unit;
@@ -2967,7 +2967,7 @@ static int unit_set(struct idr *p, void *ptr, int n)
 /* get new free unit number and associate pointer with it */
 static int unit_get(struct idr *p, void *ptr)
 {
-	return idr_alloc(p, ptr, 0, 0, GFP_KERNEL);
+	return idr_alloc(p, ptr, GFP_KERNEL);
 }
 
 /* put unit number back to a pool */
diff --git a/drivers/power/bq2415x_charger.c b/drivers/power/bq2415x_charger.c
index 0727f92..65c6a2e 100644
--- a/drivers/power/bq2415x_charger.c
+++ b/drivers/power/bq2415x_charger.c
@@ -1516,7 +1516,7 @@ static int bq2415x_probe(struct i2c_client *client,
 
 	/* Get new ID for the new device */
 	mutex_lock(&bq2415x_id_mutex);
-	num = idr_alloc(&bq2415x_id, client, 0, 0, GFP_KERNEL);
+	num = idr_alloc(&bq2415x_id, client, GFP_KERNEL);
 	mutex_unlock(&bq2415x_id_mutex);
 	if (num < 0)
 		return num;
diff --git a/drivers/power/bq27x00_battery.c b/drivers/power/bq27x00_battery.c
index b309713..08d0062 100644
--- a/drivers/power/bq27x00_battery.c
+++ b/drivers/power/bq27x00_battery.c
@@ -792,7 +792,7 @@ static int bq27x00_battery_probe(struct i2c_client *client,
 
 	/* Get new ID for the new battery device */
 	mutex_lock(&battery_mutex);
-	num = idr_alloc(&battery_id, client, 0, 0, GFP_KERNEL);
+	num = idr_alloc(&battery_id, client, GFP_KERNEL);
 	mutex_unlock(&battery_mutex);
 	if (num < 0)
 		return num;
diff --git a/drivers/power/ds2782_battery.c b/drivers/power/ds2782_battery.c
index 5631748..a945e2b7 100644
--- a/drivers/power/ds2782_battery.c
+++ b/drivers/power/ds2782_battery.c
@@ -396,7 +396,7 @@ static int ds278x_battery_probe(struct i2c_client *client,
 
 	/* Get an ID for this battery */
 	mutex_lock(&battery_lock);
-	ret = idr_alloc(&battery_id, client, 0, 0, GFP_KERNEL);
+	ret = idr_alloc(&battery_id, client, GFP_KERNEL);
 	mutex_unlock(&battery_lock);
 	if (ret < 0)
 		goto fail_id;
diff --git a/drivers/pps/kapi.c b/drivers/pps/kapi.c
index cdad4d9..613622d 100644
--- a/drivers/pps/kapi.c
+++ b/drivers/pps/kapi.c
@@ -102,7 +102,7 @@ struct pps_device *pps_register_source(struct pps_source_info *info,
 		goto pps_register_source_exit;
 	}
 
-	/* These initializations must be done before calling idr_alloc()
+	/* These initializations must be done before calling idr_alloc_range()
 	 * in order to avoid races into pps_event().
 	 */
 	pps->params.api_version = PPS_API_VERS;
diff --git a/drivers/pps/pps.c b/drivers/pps/pps.c
index 7173e3a..7bd62a1 100644
--- a/drivers/pps/pps.c
+++ b/drivers/pps/pps.c
@@ -296,10 +296,10 @@ int pps_register_cdev(struct pps_device *pps)
 
 	mutex_lock(&pps_idr_lock);
 	/*
-	 * Get new ID for the new PPS source.  After idr_alloc() calling
+	 * Get new ID for the new PPS source.  After idr_alloc_range() calling
 	 * the new source will be freely available into the kernel.
 	 */
-	err = idr_alloc(&pps_idr, pps, 0, PPS_MAX_SOURCES, GFP_KERNEL);
+	err = idr_alloc_range(&pps_idr, pps, 0, PPS_MAX_SOURCES, GFP_KERNEL);
 	if (err < 0) {
 		if (err == -ENOSPC) {
 			pr_err("%s: too many PPS sources in the system\n",
diff --git a/drivers/remoteproc/remoteproc_core.c b/drivers/remoteproc/remoteproc_core.c
index d99809f..912e31f 100644
--- a/drivers/remoteproc/remoteproc_core.c
+++ b/drivers/remoteproc/remoteproc_core.c
@@ -217,7 +217,7 @@ int rproc_alloc_vring(struct rproc_vdev *rvdev, int i)
 	 * TODO: assign a notifyid for rvdev updates as well
 	 * TODO: support predefined notifyids (via resource table)
 	 */
-	ret = idr_alloc(&rproc->notifyids, rvring, 0, 0, GFP_KERNEL);
+	ret = idr_alloc(&rproc->notifyids, rvring, GFP_KERNEL);
 	if (ret < 0) {
 		dev_err(dev, "idr_alloc failed: %d\n", ret);
 		dma_free_coherent(dev->parent, size, va, dma);
diff --git a/drivers/rpmsg/virtio_rpmsg_bus.c b/drivers/rpmsg/virtio_rpmsg_bus.c
index b6135d4..aaee678 100644
--- a/drivers/rpmsg/virtio_rpmsg_bus.c
+++ b/drivers/rpmsg/virtio_rpmsg_bus.c
@@ -242,9 +242,9 @@ static struct rpmsg_endpoint *__rpmsg_create_ept(struct virtproc_info *vrp,
 	mutex_lock(&vrp->endpoints_lock);
 
 	/* bind the endpoint to an rpmsg address (and allocate one if needed) */
-	id = idr_alloc(&vrp->endpoints, ept, id_min, id_max, GFP_KERNEL);
+	id = idr_alloc_range(&vrp->endpoints, ept, id_min, id_max, GFP_KERNEL);
 	if (id < 0) {
-		dev_err(dev, "idr_alloc failed: %d\n", id);
+		dev_err(dev, "idr_alloc_range failed: %d\n", id);
 		goto free_ept;
 	}
 	ept->addr = id;
diff --git a/drivers/scsi/bfa/bfad_im.c b/drivers/scsi/bfa/bfad_im.c
index 9796284..ae1b822 100644
--- a/drivers/scsi/bfa/bfad_im.c
+++ b/drivers/scsi/bfa/bfad_im.c
@@ -523,7 +523,7 @@ bfad_im_scsi_host_alloc(struct bfad_s *bfad, struct bfad_im_port_s *im_port,
 	int error = 1;
 
 	mutex_lock(&bfad_mutex);
-	error = idr_alloc(&bfad_im_port_index, im_port, 0, 0, GFP_KERNEL);
+	error = idr_alloc(&bfad_im_port_index, im_port, GFP_KERNEL);
 	if (error < 0) {
 		mutex_unlock(&bfad_mutex);
 		printk(KERN_WARNING "idr_alloc failure\n");
diff --git a/drivers/scsi/ch.c b/drivers/scsi/ch.c
index 2a32374..4e0ffdc 100644
--- a/drivers/scsi/ch.c
+++ b/drivers/scsi/ch.c
@@ -907,7 +907,8 @@ static int ch_probe(struct device *dev)
 
 	idr_preload(GFP_KERNEL);
 	spin_lock(&ch_index_lock);
-	ret = idr_alloc(&ch_index_idr, ch, 0, CH_MAX_DEVS + 1, GFP_NOWAIT);
+	ret = idr_alloc_range(&ch_index_idr, ch, 0,
+			      CH_MAX_DEVS + 1, GFP_NOWAIT);
 	spin_unlock(&ch_index_lock);
 	idr_preload_end();
 
diff --git a/drivers/scsi/lpfc/lpfc_init.c b/drivers/scsi/lpfc/lpfc_init.c
index e0b20fa..a859e33 100644
--- a/drivers/scsi/lpfc/lpfc_init.c
+++ b/drivers/scsi/lpfc/lpfc_init.c
@@ -3212,7 +3212,7 @@ lpfc_get_instance(void)
 {
 	int ret;
 
-	ret = idr_alloc(&lpfc_hba_index, NULL, 0, 0, GFP_KERNEL);
+	ret = idr_alloc(&lpfc_hba_index, NULL, GFP_KERNEL);
 	return ret < 0 ? -1 : ret;
 }
 
diff --git a/drivers/scsi/sg.c b/drivers/scsi/sg.c
index df5e961..23856c8 100644
--- a/drivers/scsi/sg.c
+++ b/drivers/scsi/sg.c
@@ -1395,7 +1395,7 @@ static Sg_device *sg_alloc(struct gendisk *disk, struct scsi_device *scsidp)
 	idr_preload(GFP_KERNEL);
 	write_lock_irqsave(&sg_index_lock, iflags);
 
-	error = idr_alloc(&sg_index_idr, sdp, 0, SG_MAX_DEVS, GFP_NOWAIT);
+	error = idr_alloc_range(&sg_index_idr, sdp, 0, SG_MAX_DEVS, GFP_NOWAIT);
 	if (error < 0) {
 		if (error == -ENOSPC) {
 			sdev_printk(KERN_WARNING, scsidp,
diff --git a/drivers/scsi/st.c b/drivers/scsi/st.c
index 2a32036..62ac019 100644
--- a/drivers/scsi/st.c
+++ b/drivers/scsi/st.c
@@ -4184,7 +4184,8 @@ static int st_probe(struct device *dev)
 
 	idr_preload(GFP_KERNEL);
 	spin_lock(&st_index_lock);
-	error = idr_alloc(&st_index_idr, tpnt, 0, ST_MAX_TAPES + 1, GFP_NOWAIT);
+	error = idr_alloc_range(&st_index_idr, tpnt, 0,
+				ST_MAX_TAPES + 1, GFP_NOWAIT);
 	spin_unlock(&st_index_lock);
 	idr_preload_end();
 	if (error < 0) {
diff --git a/drivers/staging/tidspbridge/rmgr/drv.c b/drivers/staging/tidspbridge/rmgr/drv.c
index be26917..fb59732 100644
--- a/drivers/staging/tidspbridge/rmgr/drv.c
+++ b/drivers/staging/tidspbridge/rmgr/drv.c
@@ -83,7 +83,7 @@ int drv_insert_node_res_element(void *hnode, void *node_resource,
 		return -ENOMEM;
 
 	(*node_res_obj)->node = hnode;
-	retval = idr_alloc(ctxt->node_id, *node_res_obj, 0, 0, GFP_KERNEL);
+	retval = idr_alloc(ctxt->node_id, *node_res_obj, GFP_KERNEL);
 	if (retval >= 0) {
 		(*node_res_obj)->id = retval;
 		return 0;
@@ -199,7 +199,7 @@ int drv_proc_insert_strm_res_element(void *stream_obj,
 		return -EFAULT;
 
 	(*pstrm_res)->stream = stream_obj;
-	retval = idr_alloc(ctxt->stream_id, *pstrm_res, 0, 0, GFP_KERNEL);
+	retval = idr_alloc(ctxt->stream_id, *pstrm_res, GFP_KERNEL);
 	if (retval >= 0) {
 		(*pstrm_res)->id = retval;
 		return 0;
diff --git a/drivers/staging/zcache/ramster/tcp.c b/drivers/staging/zcache/ramster/tcp.c
index f6e1e52..2dc645e 100644
--- a/drivers/staging/zcache/ramster/tcp.c
+++ b/drivers/staging/zcache/ramster/tcp.c
@@ -303,7 +303,7 @@ static int r2net_prep_nsw(struct r2net_node *nn, struct r2net_status_wait *nsw)
 	int ret;
 
 	spin_lock(&nn->nn_lock);
-	ret = idr_alloc(&nn->nn_status_idr, nsw, 0, 0, GFP_ATOMIC);
+	ret = idr_alloc(&nn->nn_status_idr, nsw, GFP_ATOMIC);
 	if (ret >= 0) {
 		nsw->ns_id = ret;
 		list_add_tail(&nsw->ns_node_item, &nn->nn_status_list);
diff --git a/drivers/target/iscsi/iscsi_target.c b/drivers/target/iscsi/iscsi_target.c
index f73da43..1e0539c 100644
--- a/drivers/target/iscsi/iscsi_target.c
+++ b/drivers/target/iscsi/iscsi_target.c
@@ -150,7 +150,7 @@ struct iscsi_tiqn *iscsit_add_tiqn(unsigned char *buf)
 	idr_preload(GFP_KERNEL);
 	spin_lock(&tiqn_lock);
 
-	ret = idr_alloc(&tiqn_idr, NULL, 0, 0, GFP_NOWAIT);
+	ret = idr_alloc(&tiqn_idr, NULL, GFP_NOWAIT);
 	if (ret < 0) {
 		pr_err("idr_alloc() failed for tiqn->tiqn_index\n");
 		spin_unlock(&tiqn_lock);
diff --git a/drivers/target/iscsi/iscsi_target_login.c b/drivers/target/iscsi/iscsi_target_login.c
index 3402241..4da42b0 100644
--- a/drivers/target/iscsi/iscsi_target_login.c
+++ b/drivers/target/iscsi/iscsi_target_login.c
@@ -291,7 +291,7 @@ static int iscsi_login_zero_tsih_s1(
 
 	idr_preload(GFP_KERNEL);
 	spin_lock_bh(&sess_idr_lock);
-	ret = idr_alloc(&sess_idr, NULL, 0, 0, GFP_NOWAIT);
+	ret = idr_alloc(&sess_idr, NULL, GFP_NOWAIT);
 	if (ret >= 0)
 		sess->session_index = ret;
 	spin_unlock_bh(&sess_idr_lock);
diff --git a/drivers/thermal/cpu_cooling.c b/drivers/thermal/cpu_cooling.c
index 82e15db..24cb4ae 100644
--- a/drivers/thermal/cpu_cooling.c
+++ b/drivers/thermal/cpu_cooling.c
@@ -75,7 +75,7 @@ static int get_idr(struct idr *idr, int *id)
 	int ret;
 
 	mutex_lock(&cooling_cpufreq_lock);
-	ret = idr_alloc(idr, NULL, 0, 0, GFP_KERNEL);
+	ret = idr_alloc(idr, NULL, GFP_KERNEL);
 	mutex_unlock(&cooling_cpufreq_lock);
 	if (unlikely(ret < 0))
 		return ret;
diff --git a/drivers/thermal/thermal_core.c b/drivers/thermal/thermal_core.c
index 1f02e8e..9163fe8 100644
--- a/drivers/thermal/thermal_core.c
+++ b/drivers/thermal/thermal_core.c
@@ -134,7 +134,7 @@ static int get_idr(struct idr *idr, struct mutex *lock, int *id)
 
 	if (lock)
 		mutex_lock(lock);
-	ret = idr_alloc(idr, NULL, 0, 0, GFP_KERNEL);
+	ret = idr_alloc(idr, NULL, GFP_KERNEL);
 	if (lock)
 		mutex_unlock(lock);
 	if (unlikely(ret < 0))
diff --git a/drivers/uio/uio.c b/drivers/uio/uio.c
index 3b96f18..dc22d59 100644
--- a/drivers/uio/uio.c
+++ b/drivers/uio/uio.c
@@ -371,7 +371,8 @@ static int uio_get_minor(struct uio_device *idev)
 	int retval = -ENOMEM;
 
 	mutex_lock(&minor_lock);
-	retval = idr_alloc(&uio_idr, idev, 0, UIO_MAX_DEVICES, GFP_KERNEL);
+	retval = idr_alloc_range(&uio_idr, idev, 0,
+				 UIO_MAX_DEVICES, GFP_KERNEL);
 	if (retval >= 0) {
 		idev->minor = retval;
 		retval = 0;
diff --git a/drivers/usb/serial/usb-serial.c b/drivers/usb/serial/usb-serial.c
index cb27fcb..a7985a6 100644
--- a/drivers/usb/serial/usb-serial.c
+++ b/drivers/usb/serial/usb-serial.c
@@ -96,7 +96,7 @@ static int allocate_minors(struct usb_serial *serial, int num_ports)
 	mutex_lock(&table_lock);
 	for (i = 0; i < num_ports; ++i) {
 		port = serial->port[i];
-		minor = idr_alloc(&serial_minors, port, 0, 0, GFP_KERNEL);
+		minor = idr_alloc(&serial_minors, port, GFP_KERNEL);
 		if (minor < 0)
 			goto error;
 		port->minor = minor;
diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
index 842f450..e60dcd0 100644
--- a/drivers/vfio/vfio.c
+++ b/drivers/vfio/vfio.c
@@ -143,7 +143,8 @@ EXPORT_SYMBOL_GPL(vfio_unregister_iommu_driver);
 static int vfio_alloc_group_minor(struct vfio_group *group)
 {
 	/* index 0 is used by /dev/vfio/vfio */
-	return idr_alloc(&vfio.group_idr, group, 1, MINORMASK + 1, GFP_KERNEL);
+	return idr_alloc_range(&vfio.group_idr, group, 1,
+			       MINORMASK + 1, GFP_KERNEL);
 }
 
 static void vfio_free_group_minor(int minor)
diff --git a/fs/dlm/lock.c b/fs/dlm/lock.c
index e223a91..75f0421 100644
--- a/fs/dlm/lock.c
+++ b/fs/dlm/lock.c
@@ -1201,7 +1201,7 @@ static int create_lkb(struct dlm_ls *ls, struct dlm_lkb **lkb_ret)
 
 	idr_preload(GFP_NOFS);
 	spin_lock(&ls->ls_lkbidr_spin);
-	rv = idr_alloc(&ls->ls_lkbidr, lkb, 1, 0, GFP_NOWAIT);
+	rv = idr_alloc_range(&ls->ls_lkbidr, lkb, 1, 0, GFP_NOWAIT);
 	if (rv >= 0)
 		lkb->lkb_id = rv;
 	spin_unlock(&ls->ls_lkbidr_spin);
diff --git a/fs/dlm/recover.c b/fs/dlm/recover.c
index a6bc63f..2babe5e 100644
--- a/fs/dlm/recover.c
+++ b/fs/dlm/recover.c
@@ -313,7 +313,7 @@ static int recover_idr_add(struct dlm_rsb *r)
 		rv = -1;
 		goto out_unlock;
 	}
-	rv = idr_alloc(&ls->ls_recover_idr, r, 1, 0, GFP_NOWAIT);
+	rv = idr_alloc_range(&ls->ls_recover_idr, r, 1, 0, GFP_NOWAIT);
 	if (rv < 0)
 		goto out_unlock;
 
diff --git a/fs/nfs/nfs4client.c b/fs/nfs/nfs4client.c
index 90dce91..dd8451d 100644
--- a/fs/nfs/nfs4client.c
+++ b/fs/nfs/nfs4client.c
@@ -32,7 +32,7 @@ static int nfs_get_cb_ident_idr(struct nfs_client *clp, int minorversion)
 		return ret;
 	idr_preload(GFP_KERNEL);
 	spin_lock(&nn->nfs_client_lock);
-	ret = idr_alloc(&nn->cb_ident_idr, clp, 0, 0, GFP_NOWAIT);
+	ret = idr_alloc(&nn->cb_ident_idr, clp, GFP_NOWAIT);
 	if (ret >= 0)
 		clp->cl_cb_ident = ret;
 	spin_unlock(&nn->nfs_client_lock);
diff --git a/fs/ocfs2/cluster/tcp.c b/fs/ocfs2/cluster/tcp.c
index d644dc6..c230f68 100644
--- a/fs/ocfs2/cluster/tcp.c
+++ b/fs/ocfs2/cluster/tcp.c
@@ -307,7 +307,7 @@ static int o2net_prep_nsw(struct o2net_node *nn, struct o2net_status_wait *nsw)
 	int ret;
 
 	spin_lock(&nn->nn_lock);
-	ret = idr_alloc(&nn->nn_status_idr, nsw, 0, 0, GFP_ATOMIC);
+	ret = idr_alloc(&nn->nn_status_idr, nsw, GFP_ATOMIC);
 	if (ret >= 0) {
 		nsw->ns_id = ret;
 		list_add_tail(&nsw->ns_node_item, &nn->nn_status_list);
diff --git a/include/linux/idr.h b/include/linux/idr.h
index 6395da1..4926f36 100644
--- a/include/linux/idr.h
+++ b/include/linux/idr.h
@@ -207,8 +207,10 @@ struct idr {
 
 void *idr_find_slowpath(struct idr *idp, int id);
 void idr_preload(gfp_t gfp_mask);
-int idr_alloc(struct idr *idp, void *ptr, int start, int end, gfp_t gfp_mask);
-int idr_alloc_cyclic(struct idr *idr, void *ptr, int start, int end, gfp_t gfp_mask);
+int idr_alloc_range(struct idr *idp, void *ptr, int start,
+		    int end, gfp_t gfp_mask);
+int idr_alloc_cyclic(struct idr *idr, void *ptr, int start,
+		     int end, gfp_t gfp_mask);
 int idr_for_each(struct idr *idp,
 		 int (*fn)(int id, void *p, void *data), void *data);
 void *idr_find_next(struct idr *idp, int *nextid);
@@ -218,6 +220,11 @@ void idr_free(struct idr *idp, int id);
 void idr_destroy(struct idr *idp);
 void idr_init(struct idr *idp);
 
+static inline int idr_alloc(struct idr *idr, void *ptr, gfp_t gfp)
+{
+	return idr_alloc_range(idr, ptr, 0, 0, gfp);
+}
+
 /**
  * idr_preload_end - end preload section started with idr_preload()
  *
diff --git a/ipc/util.c b/ipc/util.c
index 4704223..e31ecb8 100644
--- a/ipc/util.c
+++ b/ipc/util.c
@@ -268,7 +268,7 @@ int ipc_addid(struct ipc_ids* ids, struct kern_ipc_perm* new, int size)
 	rcu_read_lock();
 	spin_lock(&new->lock);
 
-	id = idr_alloc(&ids->ipcs_idr, new,
+	id = idr_alloc_range(&ids->ipcs_idr, new,
 		       (next_id < 0) ? 0 : ipcid_to_idx(next_id), 0,
 		       GFP_NOWAIT);
 	idr_preload_end();
diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index 7b397f8..5bf4f97 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -5420,7 +5420,7 @@ static struct css_id *get_new_cssid(struct cgroup_subsys *ss, int depth)
 	idr_preload(GFP_KERNEL);
 	spin_lock(&ss->id_lock);
 	/* Don't use 0. allocates an ID of 1-65535 */
-	ret = idr_alloc(&ss->idr, newid, 1, CSS_ID_MAX + 1, GFP_NOWAIT);
+	ret = idr_alloc_range(&ss->idr, newid, 1, CSS_ID_MAX + 1, GFP_NOWAIT);
 	spin_unlock(&ss->id_lock);
 	idr_preload_end();
 
diff --git a/kernel/events/core.c b/kernel/events/core.c
index f86599e..70c87e1 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -6299,7 +6299,8 @@ int perf_pmu_register(struct pmu *pmu, const char *name, int type)
 	pmu->name = name;
 
 	if (type < 0) {
-		type = idr_alloc(&pmu_idr, pmu, PERF_TYPE_MAX, 0, GFP_KERNEL);
+		type = idr_alloc_range(&pmu_idr, pmu, PERF_TYPE_MAX,
+				       0, GFP_KERNEL);
 		if (type < 0) {
 			ret = type;
 			goto free_pdc;
diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 0b72e81..aeb41bf 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -524,7 +524,7 @@ static int worker_pool_assign_id(struct worker_pool *pool)
 
 	lockdep_assert_held(&wq_pool_mutex);
 
-	ret = idr_alloc(&worker_pool_idr, pool, 0, 0, GFP_KERNEL);
+	ret = idr_alloc(&worker_pool_idr, pool, GFP_KERNEL);
 	if (ret >= 0) {
 		pool->id = ret;
 		return 0;
@@ -1706,7 +1706,7 @@ static struct worker *create_worker(struct worker_pool *pool)
 	idr_preload(GFP_KERNEL);
 	spin_lock_irq(&pool->lock);
 
-	id = idr_alloc(&pool->worker_idr, NULL, 0, 0, GFP_NOWAIT);
+	id = idr_alloc(&pool->worker_idr, NULL, GFP_NOWAIT);
 
 	spin_unlock_irq(&pool->lock);
 	idr_preload_end();
diff --git a/lib/idr.c b/lib/idr.c
index 1d67cdd..5393aa1 100644
--- a/lib/idr.c
+++ b/lib/idr.c
@@ -986,7 +986,7 @@ static struct idr_layer *idr_layer_alloc(gfp_t gfp_mask, struct idr *layer_idr)
 
 	/*
 	 * Try to allocate directly from kmem_cache.  We want to try this
-	 * before preload buffer; otherwise, non-preloading idr_alloc()
+	 * before preload buffer; otherwise, non-preloading idr_alloc_range()
 	 * users will end up taking advantage of preloading ones.  As the
 	 * following is allowed to fail for preloaded cases, suppress
 	 * warning this time.
@@ -1240,24 +1240,24 @@ static void idr_fill_slot(struct idr *idr, void *ptr, int id,
 }
 
 /**
- * idr_preload - preload for idr_alloc()
+ * idr_preload - preload for idr_alloc_range()
  * @gfp_mask: allocation mask to use for preloading
  *
- * Preload per-cpu layer buffer for idr_alloc().  Can only be used from
+ * Preload per-cpu layer buffer for idr_alloc_range().  Can only be used from
  * process context and each idr_preload() invocation should be matched with
  * idr_preload_end().  Note that preemption is disabled while preloaded.
  *
- * The first idr_alloc() in the preloaded section can be treated as if it
+ * The first idr_alloc_range() in the preloaded section can be treated as if it
  * were invoked with @gfp_mask used for preloading.  This allows using more
  * permissive allocation masks for idrs protected by spinlocks.
  *
- * For example, if idr_alloc() below fails, the failure can be treated as
- * if idr_alloc() were called with GFP_KERNEL rather than GFP_NOWAIT.
+ * For example, if idr_alloc_range() below fails, the failure can be treated as
+ * if idr_alloc_range() were called with GFP_KERNEL rather than GFP_NOWAIT.
  *
  *	idr_preload(GFP_KERNEL);
  *	spin_lock(lock);
  *
- *	id = idr_alloc(idr, ptr, start, end, GFP_NOWAIT);
+ *	id = idr_alloc_range(idr, ptr, start, end, GFP_NOWAIT);
  *
  *	spin_unlock(lock);
  *	idr_preload_end();
@@ -1276,10 +1276,10 @@ void idr_preload(gfp_t gfp_mask)
 	preempt_disable();
 
 	/*
-	 * idr_alloc() is likely to succeed w/o full idr_layer buffer and
-	 * return value from idr_alloc() needs to be checked for failure
+	 * idr_alloc_range() is likely to succeed w/o full idr_layer buffer and
+	 * return value from idr_alloc_range() needs to be checked for failure
 	 * anyway.  Silently give up if allocation fails.  The caller can
-	 * treat failures from idr_alloc() as if idr_alloc() were called
+	 * treat failures from idr_alloc_range() as if idr_alloc_range() were called
 	 * with @gfp_mask which should be enough.
 	 */
 	while (__this_cpu_read(idr_preload_cnt) < MAX_IDR_FREE) {
@@ -1300,7 +1300,7 @@ void idr_preload(gfp_t gfp_mask)
 EXPORT_SYMBOL(idr_preload);
 
 /**
- * idr_alloc - allocate new idr entry
+ * idr_alloc_range - allocate new idr entry
  * @idr: the (initialized) idr
  * @ptr: pointer to be associated with the new id
  * @start: the minimum id (inclusive)
@@ -1319,7 +1319,8 @@ EXPORT_SYMBOL(idr_preload);
  * or iteration can be performed under RCU read lock provided the user
  * destroys @ptr in RCU-safe way after removal from idr.
  */
-int idr_alloc(struct idr *idr, void *ptr, int start, int end, gfp_t gfp_mask)
+int idr_alloc_range(struct idr *idr, void *ptr, int start,
+		    int end, gfp_t gfp_mask)
 {
 	int max = end > 0 ? end - 1 : INT_MAX;	/* inclusive upper limit */
 	struct idr_layer *pa[MAX_IDR_LEVEL + 1];
@@ -1343,7 +1344,7 @@ int idr_alloc(struct idr *idr, void *ptr, int start, int end, gfp_t gfp_mask)
 	idr_fill_slot(idr, ptr, id, pa);
 	return id;
 }
-EXPORT_SYMBOL_GPL(idr_alloc);
+EXPORT_SYMBOL_GPL(idr_alloc_range);
 
 /**
  * idr_alloc_cyclic - allocate new idr entry in a cyclical fashion
@@ -1353,18 +1354,19 @@ EXPORT_SYMBOL_GPL(idr_alloc);
  * @end: the maximum id (exclusive, <= 0 for max)
  * @gfp_mask: memory allocation flags
  *
- * Essentially the same as idr_alloc, but prefers to allocate progressively
- * higher ids if it can. If the "cur" counter wraps, then it will start again
- * at the "start" end of the range and allocate one that has already been used.
+ * Essentially the same as idr_alloc_range, but prefers to allocate
+ * progressively higher ids if it can. If the "cur" counter wraps, then it will
+ * start again at the "start" end of the range and allocate one that has already
+ * been used.
  */
 int idr_alloc_cyclic(struct idr *idr, void *ptr, int start, int end,
 			gfp_t gfp_mask)
 {
 	int id;
 
-	id = idr_alloc(idr, ptr, max(start, idr->cur), end, gfp_mask);
+	id = idr_alloc_range(idr, ptr, max(start, idr->cur), end, gfp_mask);
 	if (id == -ENOSPC)
-		id = idr_alloc(idr, ptr, start, end, gfp_mask);
+		id = idr_alloc_range(idr, ptr, start, end, gfp_mask);
 
 	if (likely(id >= 0))
 		idr->cur = id + 1;
diff --git a/net/9p/util.c b/net/9p/util.c
index 59f278e..cc31c62 100644
--- a/net/9p/util.c
+++ b/net/9p/util.c
@@ -94,7 +94,7 @@ int p9_idpool_get(struct p9_idpool *p)
 	spin_lock_irqsave(&p->lock, flags);
 
 	/* no need to store exactly p, we just need something non-null */
-	i = idr_alloc(&p->pool, p, 0, 0, GFP_NOWAIT);
+	i = idr_alloc(&p->pool, p, GFP_NOWAIT);
 
 	spin_unlock_irqrestore(&p->lock, flags);
 	idr_preload_end();
diff --git a/net/mac80211/tx.c b/net/mac80211/tx.c
index 4105d0c..03a32c2 100644
--- a/net/mac80211/tx.c
+++ b/net/mac80211/tx.c
@@ -1982,8 +1982,8 @@ netdev_tx_t ieee80211_subif_start_xmit(struct sk_buff *skb,
 			int id;
 
 			spin_lock_irqsave(&local->ack_status_lock, flags);
-			id = idr_alloc(&local->ack_status_frames, orig_skb,
-				       1, 0x10000, GFP_ATOMIC);
+			id = idr_alloc_range(&local->ack_status_frames,
+					     orig_skb, 1, 0x10000, GFP_ATOMIC);
 			spin_unlock_irqrestore(&local->ack_status_lock, flags);
 
 			if (id >= 0) {
diff --git a/net/tipc/server.c b/net/tipc/server.c
index fd3fa57..9d789e1 100644
--- a/net/tipc/server.c
+++ b/net/tipc/server.c
@@ -216,7 +216,7 @@ static struct tipc_conn *tipc_alloc_conn(struct tipc_server *s)
 	INIT_WORK(&con->rwork, tipc_recv_work);
 
 	spin_lock_bh(&s->idr_lock);
-	ret = idr_alloc(&s->conn_idr, con, 0, 0, GFP_ATOMIC);
+	ret = idr_alloc(&s->conn_idr, con, GFP_ATOMIC);
 	if (ret < 0) {
 		kfree(con);
 		spin_unlock_bh(&s->idr_lock);
-- 
1.8.4.rc1


^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH 08/10] idr: Reimplement idr on top of ida/radix trees
  2013-08-07 17:34 IDA/IDR rewrite, percpu ida Kent Overstreet
                   ` (4 preceding siblings ...)
  2013-08-07 17:46 ` [PATCH 07/10] idr: Rename idr_alloc() -> idr_alloc_range() Kent Overstreet
@ 2013-08-07 17:46 ` Kent Overstreet
       [not found] ` <1375896905-6074-1-git-send-email-kmo-PEzghdH756F8UrSeD/g0lQ@public.gmane.org>
  2013-08-07 17:46   ` [Cluster-devel] " Kent Overstreet
  7 siblings, 0 replies; 46+ messages in thread
From: Kent Overstreet @ 2013-08-07 17:46 UTC (permalink / raw)
  To: akpm, linux-kernel; +Cc: Kent Overstreet, Tejun Heo

The old idr code was really a second radix tree implementation - we
already have one in lib/radix-tree.c.

This patch reimplements idr on top of our existing radix trees, using
our shiny new ida implementation for allocating/freeing the ids. The old
idr code was noticeably slower than lib/radix-tree.c in at least some
benchmarks, so in addition to being ~500 fewer lines of code this patch
should improve performance too.
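
A minimal sketch of the new scheme (simplified - locking and error
handling are omitted, and the ida call is illustrative rather than the
exact interface from this series):

	struct idr {
		struct ida		ida;	/* which ids are allocated */
		struct radix_tree_root	ptrs;	/* maps id -> pointer */
	};

	/* conceptually, idr_alloc_range(idr, ptr, start, end, gfp) is: */
	id = ida_alloc_range(&idr->ida, start, end, gfp);
	if (id >= 0)
		radix_tree_insert(&idr->ptrs, id, ptr);
	return id;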

There's one thing left unfinished in this patch - the existing
idr_preload() interface won't work for ida. Another patch on top of this
will fix idr_preload() and update existing users to the new interface.
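
For reference, this is the preload pattern in question (taken from the
idr_preload() documentation): a GFP_NOWAIT allocation under a spinlock,
backed by a process-context preload done just before taking the lock:

	idr_preload(GFP_KERNEL);
	spin_lock(lock);

	id = idr_alloc_range(idr, ptr, start, end, GFP_NOWAIT);

	spin_unlock(lock);
	idr_preload_end();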

Signed-off-by: Kent Overstreet <kmo@daterainc.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Tejun Heo <tj@kernel.org>
---
 include/linux/idr.h | 159 +++++-----
 init/main.c         |   1 -
 lib/idr.c           | 890 ++++++++++------------------------------------------
 3 files changed, 246 insertions(+), 804 deletions(-)

diff --git a/include/linux/idr.h b/include/linux/idr.h
index 4926f36..85355d7 100644
--- a/include/linux/idr.h
+++ b/include/linux/idr.h
@@ -1,6 +1,6 @@
 /*
  * include/linux/idr.h
- * 
+ *
  * 2002-10-18  written by Jim Houston jim.houston@ccur.com
  *	Copyright (C) 2002 by Concurrent Computer Corporation
  *	Distributed under the GNU GPL license version 2.
@@ -12,10 +12,8 @@
 #ifndef __IDR_H__
 #define __IDR_H__
 
-#include <linux/types.h>
-#include <linux/bitops.h>
-#include <linux/init.h>
-#include <linux/rcupdate.h>
+#include <linux/gfp.h>
+#include <linux/radix-tree.h>
 #include <linux/spinlock_types.h>
 #include <linux/wait.h>
 
@@ -149,76 +147,42 @@ int percpu_ida_init(struct percpu_ida *pool, unsigned long nr_tags);
 
 /* IDR */
 
-/*
- * We want shallower trees and thus more bits covered at each layer.  8
- * bits gives us large enough first layer for most use cases and maximum
- * tree depth of 4.  Each idr_layer is slightly larger than 2k on 64bit and
- * 1k on 32bit.
+/**
+ * DOC: idr sync
+ * idr synchronization (stolen from radix-tree.h)
+ *
+ * idr_alloc() and idr_remove() do their own locking internally - the user need
+ * not be concerned with synchronization unless there are other operations that
+ * need to be done atomically.
+ *
+ * idr_find() does no locking - it can be called locklessly using RCU, if the
+ * caller ensures calls to this function are made within rcu_read_lock()
+ * regions and does all the other appropriate RCU stuff.
  */
-#define IDR_BITS 8
-#define IDR_SIZE (1 << IDR_BITS)
-#define IDR_MASK ((1 << IDR_BITS)-1)
-
-struct idr_layer {
-	int			prefix;	/* the ID prefix of this idr_layer */
-	DECLARE_BITMAP(bitmap, IDR_SIZE); /* A zero bit means "space here" */
-	struct idr_layer __rcu	*ary[1<<IDR_BITS];
-	int			count;	/* When zero, we can release it */
-	int			layer;	/* distance from leaf */
-	struct rcu_head		rcu_head;
-};
 
 struct idr {
-	struct idr_layer __rcu	*hint;	/* the last layer allocated from */
-	struct idr_layer __rcu	*top;
-	struct idr_layer	*id_free;
-	int			layers;	/* only valid w/o concurrent changes */
-	int			id_free_cnt;
-	int			cur;	/* current pos for cyclic allocation */
-	spinlock_t		lock;
+	struct ida		ida;
+	struct radix_tree_root	ptrs;
 };
 
 #define IDR_INIT(name)							\
 {									\
-	.lock			= __SPIN_LOCK_UNLOCKED(name.lock),	\
+	.ida			= IDA_INIT(name.ida),			\
+	.ptrs			= RADIX_TREE_INIT(GFP_NOWAIT),		\
 }
 #define DEFINE_IDR(name)	struct idr name = IDR_INIT(name)
 
-/**
- * DOC: idr sync
- * idr synchronization (stolen from radix-tree.h)
- *
- * idr_find() is able to be called locklessly, using RCU. The caller must
- * ensure calls to this function are made within rcu_read_lock() regions.
- * Other readers (lock-free or otherwise) and modifications may be running
- * concurrently.
- *
- * It is still required that the caller manage the synchronization and
- * lifetimes of the items. So if RCU lock-free lookups are used, typically
- * this would mean that the items have their own locks, or are amenable to
- * lock-free access; and that the items are freed by RCU (or only freed after
- * having been deleted from the idr tree *and* a synchronize_rcu() grace
- * period).
- */
-
-/*
- * This is what we export.
- */
-
-void *idr_find_slowpath(struct idr *idp, int id);
-void idr_preload(gfp_t gfp_mask);
-int idr_alloc_range(struct idr *idp, void *ptr, int start,
-		    int end, gfp_t gfp_mask);
-int idr_alloc_cyclic(struct idr *idr, void *ptr, int start,
-		     int end, gfp_t gfp_mask);
-int idr_for_each(struct idr *idp,
+void *idr_find_next(struct idr *idr, int *nextid);
+int idr_for_each(struct idr *idr,
 		 int (*fn)(int id, void *p, void *data), void *data);
-void *idr_find_next(struct idr *idp, int *nextid);
-void *idr_replace(struct idr *idp, void *ptr, int id);
-void idr_remove(struct idr *idp, int id);
-void idr_free(struct idr *idp, int id);
-void idr_destroy(struct idr *idp);
-void idr_init(struct idr *idp);
+void *idr_replace(struct idr *idr, void *ptr, unsigned id);
+void idr_remove(struct idr *idr, unsigned id);
+int idr_alloc_range(struct idr *idr, void *ptr, unsigned start,
+		    unsigned end, gfp_t gfp);
+int idr_alloc_cyclic(struct idr *idr, void *ptr, unsigned start,
+		     unsigned end, gfp_t gfp_mask);
+void idr_destroy(struct idr *idr);
+void idr_init(struct idr *idr);
 
 static inline int idr_alloc(struct idr *idr, void *ptr, gfp_t gfp)
 {
@@ -233,7 +197,53 @@ static inline int idr_alloc(struct idr *idr, void *ptr, gfp_t gfp)
  */
 static inline void idr_preload_end(void)
 {
-	preempt_enable();
+	radix_tree_preload_end();
+}
+
+/**
+ * idr_preload - preload for idr_alloc_range()
+ * @gfp: allocation mask to use for preloading
+ *
+ * Preload per-cpu layer buffer for idr_alloc_range().  Can only be used from
+ * process context and each idr_preload() invocation should be matched with
+ * idr_preload_end().  Note that preemption is disabled while preloaded.
+ *
+ * The first idr_alloc_range() in the preloaded section can be treated as if it
+ * were invoked with @gfp_mask used for preloading.  This allows using more
+ * permissive allocation masks for idrs protected by spinlocks.
+ *
+ * For example, if idr_alloc_range() below fails, the failure can be treated as
+ * if idr_alloc_range() were called with GFP_KERNEL rather than GFP_NOWAIT.
+ *
+ *	idr_preload(GFP_KERNEL);
+ *	spin_lock(lock);
+ *
+ *	id = idr_alloc_range(idr, ptr, start, end, GFP_NOWAIT);
+ *
+ *	spin_unlock(lock);
+ *	idr_preload_end();
+ *	if (id < 0)
+ *		error;
+ */
+static inline void idr_preload(gfp_t gfp)
+{
+	might_sleep_if(gfp & __GFP_WAIT);
+
+	/* Well this is horrible, but idr_preload doesn't return errors */
+	if (radix_tree_preload(gfp))
+		preempt_disable();
+}
+
+/* radix tree can't store NULL pointers, so we have to translate...  */
+static inline void *__radix_idr_ptr(void *ptr)
+{
+	return ptr != (void *) (~0UL & ~RADIX_TREE_INDIRECT_PTR)
+		? ptr : NULL;
+}
+
+static inline void *__idr_radix_ptr(void *ptr)
+{
+	return ptr ?: (void *) (~0UL & ~RADIX_TREE_INDIRECT_PTR);
 }
 
 /**
@@ -243,24 +253,19 @@ static inline void idr_preload_end(void)
  *
  * Return the pointer given the id it has been registered with.  A %NULL
  * return indicates that @id is not valid or you passed %NULL in
- * idr_get_new().
+ * idr_alloc().
  *
  * This function can be called under rcu_read_lock(), given that the leaf
  * pointers lifetimes are correctly managed.
  */
-static inline void *idr_find(struct idr *idr, int id)
+static inline void *idr_find(struct idr *idr, unsigned id)
 {
-	struct idr_layer *hint = rcu_dereference_raw(idr->hint);
-
-	if (hint && (id & ~IDR_MASK) == hint->prefix)
-		return rcu_dereference_raw(hint->ary[id & IDR_MASK]);
-
-	return idr_find_slowpath(idr, id);
+	return __radix_idr_ptr(radix_tree_lookup(&idr->ptrs, id));
 }
 
 /**
  * idr_for_each_entry - iterate over an idr's elements of a given type
- * @idp:     idr handle
+ * @idr:     idr handle
  * @entry:   the type * to use as cursor
  * @id:      id entry's key
  *
@@ -268,9 +273,7 @@ static inline void *idr_find(struct idr *idr, int id)
 * after normal termination @entry is left with the value NULL.  This
  * is convenient for a "not found" value.
  */
-#define idr_for_each_entry(idp, entry, id)			\
-	for (id = 0; ((entry) = idr_find_next(idp, &(id))) != NULL; ++id)
-
-void __init idr_init_cache(void);
+#define idr_for_each_entry(idr, entry, id)			\
+	for (id = 0; ((entry) = idr_find_next(idr, &(id))) != NULL; ++id)
 
 #endif /* __IDR_H__ */
diff --git a/init/main.c b/init/main.c
index d03d2ec..6b44887 100644
--- a/init/main.c
+++ b/init/main.c
@@ -542,7 +542,6 @@ asmlinkage void __init start_kernel(void)
 	preempt_disable();
 	if (WARN(!irqs_disabled(), "Interrupts were enabled *very* early, fixing it\n"))
 		local_irq_disable();
-	idr_init_cache();
 	rcu_init();
 	tick_nohz_init();
 	radix_tree_init();
diff --git a/lib/idr.c b/lib/idr.c
index 5393aa1..89ec59f 100644
--- a/lib/idr.c
+++ b/lib/idr.c
@@ -8,24 +8,10 @@
  *
  * Modified by Nadia Derbey to make it RCU safe.
  *
- * IDA completely rewritten by Kent Overstreet <koverstreet@google.com>
+ * Completely rewritten by Kent Overstreet <koverstreet@google.com>.
  *
- * Small id to pointer translation service.
- *
- * It uses a radix tree like structure as a sparse array indexed
- * by the id to obtain the pointer.  The bitmap makes allocating
- * a new id quick.
- *
- * You call it to allocate an id (an int) an associate with that id a
- * pointer or what ever, we treat it as a (void *).  You can pass this
- * id to a user for him to pass back at a later time.  You then pass
- * that id to this code and it returns your pointer.
-
- * You can release ids at any time. When all ids are released, most of
- * the memory is returned (we keep MAX_IDR_FREE) in a local pool so we
- * don't need to go to the memory "store" during an id allocate, just
- * so you don't need to be too concerned about locking and conflicts
- * with the slab allocator.
+ * id allocator (scalable/resizable bitmap, essentially), and also idr which
+ * combines ida with a radix tree to map pointers to small integers for you.
  */
 
 #include <linux/bitmap.h>
@@ -33,11 +19,10 @@
 #include <linux/bug.h>
 #include <linux/err.h>
 #include <linux/export.h>
-#include <linux/hardirq.h>
 #include <linux/idr.h>
-#include <linux/init.h>
 #include <linux/kernel.h>
 #include <linux/percpu.h>
+#include <linux/rcupdate.h>
 #include <linux/sched.h>
 #include <linux/slab.h>
 #include <linux/string.h>
@@ -915,389 +900,158 @@ err:
 }
 EXPORT_SYMBOL_GPL(percpu_ida_init);
 
-/* IDR */
-
-#define MAX_IDR_SHIFT		(sizeof(int) * 8 - 1)
-#define MAX_IDR_BIT		(1U << MAX_IDR_SHIFT)
-
-/* Leave the possibility of an incomplete final layer */
-#define MAX_IDR_LEVEL ((MAX_IDR_SHIFT + IDR_BITS - 1) / IDR_BITS)
-
-/* Number of id_layer structs to leave in free list */
-#define MAX_IDR_FREE (MAX_IDR_LEVEL * 2)
-
-static struct kmem_cache *idr_layer_cache;
-static DEFINE_PER_CPU(struct idr_layer *, idr_preload_head);
-static DEFINE_PER_CPU(int, idr_preload_cnt);
-
-/* the maximum ID which can be allocated given idr->layers */
-static int idr_max(int layers)
-{
-	int bits = min_t(int, layers * IDR_BITS, MAX_IDR_SHIFT);
-
-	return (1 << bits) - 1;
-}
-
-/*
- * Prefix mask for an idr_layer at @layer.  For layer 0, the prefix mask is
- * all bits except for the lower IDR_BITS.  For layer 1, 2 * IDR_BITS, and
- * so on.
+/**
+ * DOC: IDR description
+ * IDR: Maps ids (small integers) to pointers.
+ *
+ * This merely combines ida (id allocation) with a radix tree; idr_alloc()
+ * stores a pointer, and returns you a small integer by which you can refer to
+ * it.
+ *
+ * It'll give you the smallest available integer (within a specified range if
+ * you use idr_alloc_range()) - there's also idr_alloc_cyclic() if you don't
+ * want ids to be reused right away.
+ *
+ * id -> pointer mappings can be deleted with idr_remove().
  */
-static int idr_layer_prefix_mask(int layer)
-{
-	return ~idr_max(layer + 1);
-}
-
-static struct idr_layer *get_from_free_list(struct idr *idp)
-{
-	struct idr_layer *p;
-	unsigned long flags;
-
-	spin_lock_irqsave(&idp->lock, flags);
-	if ((p = idp->id_free)) {
-		idp->id_free = p->ary[0];
-		idp->id_free_cnt--;
-		p->ary[0] = NULL;
-	}
-	spin_unlock_irqrestore(&idp->lock, flags);
-	return(p);
-}
 
 /**
- * idr_layer_alloc - allocate a new idr_layer
- * @gfp_mask: allocation mask
- * @layer_idr: optional idr to allocate from
+ * idr_find_next - look up the next object at or after the given id.
+ * @idr: idr handle
+ * @nextidp:  pointer to lookup key
  *
- * If @layer_idr is %NULL, directly allocate one using @gfp_mask or fetch
- * one from the per-cpu preload buffer.  If @layer_idr is not %NULL, fetch
- * an idr_layer from @idr->id_free.
+ * Returns a pointer to the registered object with the lowest id at or after
+ * the given *@nextidp. When an object is found, *@nextidp is updated to its
+ * id for the next iteration.
  *
- * @layer_idr is to maintain backward compatibility with the old alloc
- * interface - idr_pre_get() and idr_get_new*() - and will be removed
- * together with per-pool preload buffer.
+ * This function can be called under rcu_read_lock(), given that the leaf
+ * pointers lifetimes are correctly managed.
  */
-static struct idr_layer *idr_layer_alloc(gfp_t gfp_mask, struct idr *layer_idr)
-{
-	struct idr_layer *new;
-
-	/* this is the old path, bypass to get_from_free_list() */
-	if (layer_idr)
-		return get_from_free_list(layer_idr);
-
-	/*
-	 * Try to allocate directly from kmem_cache.  We want to try this
-	 * before preload buffer; otherwise, non-preloading idr_alloc_range()
-	 * users will end up taking advantage of preloading ones.  As the
-	 * following is allowed to fail for preloaded cases, suppress
-	 * warning this time.
-	 */
-	new = kmem_cache_zalloc(idr_layer_cache, gfp_mask | __GFP_NOWARN);
-	if (new)
-		return new;
-
-	/*
-	 * Try to fetch one from the per-cpu preload buffer if in process
-	 * context.  See idr_preload() for details.
-	 */
-	if (!in_interrupt()) {
-		preempt_disable();
-		new = __this_cpu_read(idr_preload_head);
-		if (new) {
-			__this_cpu_write(idr_preload_head, new->ary[0]);
-			__this_cpu_dec(idr_preload_cnt);
-			new->ary[0] = NULL;
-		}
-		preempt_enable();
-		if (new)
-			return new;
-	}
-
-	/*
-	 * Both failed.  Try kmem_cache again w/o adding __GFP_NOWARN so
-	 * that memory allocation failure warning is printed as intended.
-	 */
-	return kmem_cache_zalloc(idr_layer_cache, gfp_mask);
-}
-
-static void idr_layer_rcu_free(struct rcu_head *head)
+void *idr_find_next(struct idr *idr, int *nextidp)
 {
-	struct idr_layer *layer;
+	void **slot;
+	struct radix_tree_iter iter;
+	void *ret = NULL;
 
-	layer = container_of(head, struct idr_layer, rcu_head);
-	kmem_cache_free(idr_layer_cache, layer);
-}
+	rcu_read_lock();
 
-static inline void free_layer(struct idr *idr, struct idr_layer *p)
-{
-	if (idr->hint && idr->hint == p)
-		RCU_INIT_POINTER(idr->hint, NULL);
-	call_rcu(&p->rcu_head, idr_layer_rcu_free);
-}
-
-/* only called when idp->lock is held */
-static void __move_to_free_list(struct idr *idp, struct idr_layer *p)
-{
-	p->ary[0] = idp->id_free;
-	idp->id_free = p;
-	idp->id_free_cnt++;
-}
+	radix_tree_for_each_slot(slot, &idr->ptrs, &iter, *nextidp) {
+		*nextidp = iter.index;
+		ret = radix_tree_deref_slot(slot);
+		break;
+	}
 
-static void move_to_free_list(struct idr *idp, struct idr_layer *p)
-{
-	unsigned long flags;
+	rcu_read_unlock();
 
-	/*
-	 * Depends on the return element being zeroed.
-	 */
-	spin_lock_irqsave(&idp->lock, flags);
-	__move_to_free_list(idp, p);
-	spin_unlock_irqrestore(&idp->lock, flags);
-}
-
-static void idr_mark_full(struct idr_layer **pa, int id)
-{
-	struct idr_layer *p = pa[0];
-	int l = 0;
-
-	__set_bit(id & IDR_MASK, p->bitmap);
-	/*
-	 * If this layer is full mark the bit in the layer above to
-	 * show that this part of the radix tree is full.  This may
-	 * complete the layer above and require walking up the radix
-	 * tree.
-	 */
-	while (bitmap_full(p->bitmap, IDR_SIZE)) {
-		if (!(p = pa[++l]))
-			break;
-		id = id >> IDR_BITS;
-		__set_bit((id & IDR_MASK), p->bitmap);
-	}
+	return __radix_idr_ptr(ret);
 }
+EXPORT_SYMBOL(idr_find_next);
 
 /**
- * sub_alloc - try to allocate an id without growing the tree depth
- * @idp: idr handle
- * @starting_id: id to start search at
- * @pa: idr_layer[MAX_IDR_LEVEL] used as backtrack buffer
- * @gfp_mask: allocation mask for idr_layer_alloc()
- * @layer_idr: optional idr passed to idr_layer_alloc()
+ * idr_for_each - iterate through all stored pointers
+ * @idr: idr handle
+ * @fn: function to be called for each pointer
+ * @data: data passed back to callback function
+ *
+ * Iterate over the pointers registered with the given idr.  The
+ * callback function will be called for each pointer currently
+ * registered, passing the id, the pointer and the data pointer passed
+ * to this function.  It is not safe to modify the idr tree while in
+ * the callback, so functions such as idr_remove are not allowed.
  *
- * Allocate an id in range [@starting_id, INT_MAX] from @idp without
- * growing its depth.  Returns
+ * We check the return of @fn each time. If it returns anything other
+ * than %0, we break out and return that value.
  *
- *  the allocated id >= 0 if successful,
- *  -EAGAIN if the tree needs to grow for allocation to succeed,
- *  -ENOSPC if the id space is exhausted,
- *  -ENOMEM if more idr_layers need to be allocated.
+ * The caller must serialize idr_for_each() vs idr_remove().
  */
-static int sub_alloc(struct idr *idp, int *starting_id, struct idr_layer **pa,
-		     gfp_t gfp_mask, struct idr *layer_idr)
+int idr_for_each(struct idr *idr,
+		 int (*fn)(int id, void *p, void *data), void *data)
 {
-	int n, m, sh;
-	struct idr_layer *p, *new;
-	int l, id, oid;
-
-	id = *starting_id;
- restart:
-	p = idp->top;
-	l = idp->layers;
-	pa[l--] = NULL;
-	while (1) {
-		/*
-		 * We run around this while until we reach the leaf node...
-		 */
-		n = (id >> (IDR_BITS*l)) & IDR_MASK;
-		m = find_next_zero_bit(p->bitmap, IDR_SIZE, n);
-		if (m == IDR_SIZE) {
-			/* no space available go back to previous layer. */
-			l++;
-			oid = id;
-			id = (id | ((1 << (IDR_BITS * l)) - 1)) + 1;
-
-			/* if already at the top layer, we need to grow */
-			if (id >= 1 << (idp->layers * IDR_BITS)) {
-				*starting_id = id;
-				return -EAGAIN;
-			}
-			p = pa[l];
-			BUG_ON(!p);
+	void *p;
+	unsigned id;
+	int error = 0;
 
-			/* If we need to go up one layer, continue the
-			 * loop; otherwise, restart from the top.
-			 */
-			sh = IDR_BITS * (l + 1);
-			if (oid >> sh == id >> sh)
-				continue;
-			else
-				goto restart;
-		}
-		if (m != n) {
-			sh = IDR_BITS*l;
-			id = ((id >> sh) ^ n ^ m) << sh;
-		}
-		if ((id >= MAX_IDR_BIT) || (id < 0))
-			return -ENOSPC;
-		if (l == 0)
+	idr_for_each_entry(idr, p, id) {
+		error = fn(id, p, data);
+		if (error)
 			break;
-		/*
-		 * Create the layer below if it is missing.
-		 */
-		if (!p->ary[m]) {
-			new = idr_layer_alloc(gfp_mask, layer_idr);
-			if (!new)
-				return -ENOMEM;
-			new->layer = l-1;
-			new->prefix = id & idr_layer_prefix_mask(new->layer);
-			rcu_assign_pointer(p->ary[m], new);
-			p->count++;
-		}
-		pa[l--] = p;
-		p = p->ary[m];
 	}
 
-	pa[l] = p;
-	return id;
+	return error;
 }
+EXPORT_SYMBOL(idr_for_each);
 
-static int idr_get_empty_slot(struct idr *idp, int starting_id,
-			      struct idr_layer **pa, gfp_t gfp_mask,
-			      struct idr *layer_idr)
+/**
+ * idr_replace - replace pointer for given id
+ * @idr: idr handle
+ * @ptr: pointer you want associated with the id
+ * @id: lookup key
+ *
+ * Replace the pointer registered with an id and return the old value.
+ * A %-ENOENT return indicates that @id was not found.
+ * A %-EINVAL return indicates that @id was not within valid constraints.
+ */
+void *idr_replace(struct idr *idr, void *ptr, unsigned id)
 {
-	struct idr_layer *p, *new;
-	int layers, v, id;
+	void **slot, *old = ERR_PTR(-ENOENT);
 	unsigned long flags;
 
-	id = starting_id;
-build_up:
-	p = idp->top;
-	layers = idp->layers;
-	if (unlikely(!p)) {
-		if (!(p = idr_layer_alloc(gfp_mask, layer_idr)))
-			return -ENOMEM;
-		p->layer = 0;
-		layers = 1;
-	}
-	/*
-	 * Add a new layer to the top of the tree if the requested
-	 * id is larger than the currently allocated space.
-	 */
-	while (id > idr_max(layers)) {
-		layers++;
-		if (!p->count) {
-			/* special case: if the tree is currently empty,
-			 * then we grow the tree by moving the top node
-			 * upwards.
-			 */
-			p->layer++;
-			WARN_ON_ONCE(p->prefix);
-			continue;
-		}
-		if (!(new = idr_layer_alloc(gfp_mask, layer_idr))) {
-			/*
-			 * The allocation failed.  If we built part of
-			 * the structure tear it down.
-			 */
-			spin_lock_irqsave(&idp->lock, flags);
-			for (new = p; p && p != idp->top; new = p) {
-				p = p->ary[0];
-				new->ary[0] = NULL;
-				new->count = 0;
-				bitmap_clear(new->bitmap, 0, IDR_SIZE);
-				__move_to_free_list(idp, new);
-			}
-			spin_unlock_irqrestore(&idp->lock, flags);
-			return -ENOMEM;
-		}
-		new->ary[0] = p;
-		new->count = 1;
-		new->layer = layers-1;
-		new->prefix = id & idr_layer_prefix_mask(new->layer);
-		if (bitmap_full(p->bitmap, IDR_SIZE))
-			__set_bit(0, new->bitmap);
-		p = new;
+	rcu_read_lock();
+	spin_lock_irqsave(&idr->ida.lock, flags);
+
+	slot = radix_tree_lookup_slot(&idr->ptrs, id);
+
+	if (slot) {
+		old = radix_tree_deref_slot(slot);
+		if (old)
+			radix_tree_replace_slot(slot, __idr_radix_ptr(ptr));
 	}
-	rcu_assign_pointer(idp->top, p);
-	idp->layers = layers;
-	v = sub_alloc(idp, &id, pa, gfp_mask, layer_idr);
-	if (v == -EAGAIN)
-		goto build_up;
-	return(v);
-}
 
-/*
- * @id and @pa are from a successful allocation from idr_get_empty_slot().
- * Install the user pointer @ptr and mark the slot full.
- */
-static void idr_fill_slot(struct idr *idr, void *ptr, int id,
-			  struct idr_layer **pa)
-{
-	/* update hint used for lookup, cleared from free_layer() */
-	rcu_assign_pointer(idr->hint, pa[0]);
+	spin_unlock_irqrestore(&idr->ida.lock, flags);
+	rcu_read_unlock();
 
-	rcu_assign_pointer(pa[0]->ary[id & IDR_MASK], (struct idr_layer *)ptr);
-	pa[0]->count++;
-	idr_mark_full(pa, id);
+	return __radix_idr_ptr(old);
 }
+EXPORT_SYMBOL(idr_replace);
 
 /**
- * idr_preload - preload for idr_alloc_range()
- * @gfp_mask: allocation mask to use for preloading
- *
- * Preload per-cpu layer buffer for idr_alloc_range().  Can only be used from
- * process context and each idr_preload() invocation should be matched with
- * idr_preload_end().  Note that preemption is disabled while preloaded.
- *
- * The first idr_alloc_range() in the preloaded section can be treated as if it
- * were invoked with @gfp_mask used for preloading.  This allows using more
- * permissive allocation masks for idrs protected by spinlocks.
- *
- * For example, if idr_alloc_range() below fails, the failure can be treated as
- * if idr_alloc_range() were called with GFP_KERNEL rather than GFP_NOWAIT.
- *
- *	idr_preload(GFP_KERNEL);
- *	spin_lock(lock);
- *
- *	id = idr_alloc_range(idr, ptr, start, end, GFP_NOWAIT);
- *
- *	spin_unlock(lock);
- *	idr_preload_end();
- *	if (id < 0)
- *		error;
+ * idr_remove - remove the given id and free its slot
+ * @idr: idr handle
+ * @id: unique key
  */
-void idr_preload(gfp_t gfp_mask)
+void idr_remove(struct idr *idr, unsigned id)
 {
-	/*
-	 * Consuming preload buffer from non-process context breaks preload
-	 * allocation guarantee.  Disallow usage from those contexts.
-	 */
-	WARN_ON_ONCE(in_interrupt());
-	might_sleep_if(gfp_mask & __GFP_WAIT);
-
-	preempt_disable();
-
-	/*
-	 * idr_alloc_range() is likely to succeed w/o full idr_layer buffer and
-	 * return value from idr_alloc_range() needs to be checked for failure
-	 * anyway.  Silently give up if allocation fails.  The caller can
-	 * treat failures from idr_alloc_range() as if idr_alloc() were called
-	 * with @gfp_mask which should be enough.
-	 */
-	while (__this_cpu_read(idr_preload_cnt) < MAX_IDR_FREE) {
-		struct idr_layer *new;
-
-		preempt_enable();
-		new = kmem_cache_zalloc(idr_layer_cache, gfp_mask);
-		preempt_disable();
-		if (!new)
-			break;
+	unsigned long flags;
+
+	spin_lock_irqsave(&idr->ida.lock, flags);
 
-		/* link the new one to per-cpu preload list */
-		new->ary[0] = __this_cpu_read(idr_preload_head);
-		__this_cpu_write(idr_preload_head, new);
-		__this_cpu_inc(idr_preload_cnt);
+	radix_tree_delete(&idr->ptrs, id);
+	__ida_remove(&idr->ida, id);
+
+	spin_unlock_irqrestore(&idr->ida.lock, flags);
+}
+EXPORT_SYMBOL(idr_remove);
+
+static int idr_insert(struct idr *idr, void *ptr, unsigned id,
+		      gfp_t gfp, unsigned long *flags)
+{
+	int ret = radix_tree_preload(GFP_NOWAIT);
+	if (ret) {
+		spin_unlock_irqrestore(&idr->ida.lock, *flags);
+		ret = radix_tree_preload(gfp);
+		spin_lock_irqsave(&idr->ida.lock, *flags);
+
+		if (ret) {
+			__ida_remove(&idr->ida, id);
+			return ret;
+		}
 	}
+
+	ret = radix_tree_insert(&idr->ptrs, id, __idr_radix_ptr(ptr));
+	BUG_ON(ret);
+	radix_tree_preload_end();
+	return id;
 }
-EXPORT_SYMBOL(idr_preload);
 
 /**
  * idr_alloc_range - allocate new idr entry
@@ -1305,44 +1059,34 @@ EXPORT_SYMBOL(idr_preload);
  * @ptr: pointer to be associated with the new id
  * @start: the minimum id (inclusive)
  * @end: the maximum id (exclusive, <= 0 for max)
- * @gfp_mask: memory allocation flags
+ * @gfp: memory allocation flags
  *
  * Allocate an id in [start, end) and associate it with @ptr.  If no ID is
  * available in the specified range, returns -ENOSPC.  On memory allocation
  * failure, returns -ENOMEM.
  *
- * Note that @end is treated as max when <= 0.  This is to always allow
- * using @start + N as @end as long as N is inside integer range.
- *
- * The user is responsible for exclusively synchronizing all operations
- * which may modify @idr.  However, read-only accesses such as idr_find()
- * or iteration can be performed under RCU read lock provided the user
- * destroys @ptr in RCU-safe way after removal from idr.
+ * Note that @end is treated as max when <= 0.  This is to always allow using
+ * @start + N as @end as long as N is inside integer range.
  */
-int idr_alloc_range(struct idr *idr, void *ptr, int start,
-		    int end, gfp_t gfp_mask)
+int idr_alloc_range(struct idr *idr, void *ptr, unsigned start,
+		    unsigned end, gfp_t gfp)
 {
-	int max = end > 0 ? end - 1 : INT_MAX;	/* inclusive upper limit */
-	struct idr_layer *pa[MAX_IDR_LEVEL + 1];
-	int id;
+	int ret;
+	unsigned id;
+	unsigned long flags;
 
-	might_sleep_if(gfp_mask & __GFP_WAIT);
+	might_sleep_if(gfp & __GFP_WAIT);
 
-	/* sanity checks */
-	if (WARN_ON_ONCE(start < 0))
-		return -EINVAL;
-	if (unlikely(max < start))
-		return -ENOSPC;
+	spin_lock_irqsave(&idr->ida.lock, flags);
 
-	/* allocate id */
-	id = idr_get_empty_slot(idr, start, pa, gfp_mask, NULL);
-	if (unlikely(id < 0))
-		return id;
-	if (unlikely(id > max))
-		return -ENOSPC;
+	ret = __ida_alloc_range_multiple(&idr->ida, &id, 1, start,
+					 end, gfp, &flags);
+	if (ret == 1)
+		ret = idr_insert(idr, ptr, id, gfp, &flags);
 
-	idr_fill_slot(idr, ptr, id, pa);
-	return id;
+	spin_unlock_irqrestore(&idr->ida.lock, flags);
+
+	return ret;
 }
 EXPORT_SYMBOL_GPL(idr_alloc_range);
 
@@ -1352,369 +1096,65 @@ EXPORT_SYMBOL_GPL(idr_alloc_range);
  * @ptr: pointer to be associated with the new id
  * @start: the minimum id (inclusive)
  * @end: the maximum id (exclusive, <= 0 for max)
- * @gfp_mask: memory allocation flags
+ * @gfp: memory allocation flags
  *
  * Essentially the same as idr_alloc_range, but prefers to allocate
  * progressively higher ids if it can. If the "cur" counter wraps, then it will
  * start again at the "start" end of the range and allocate one that has already
  * been used.
  */
-int idr_alloc_cyclic(struct idr *idr, void *ptr, int start, int end,
-			gfp_t gfp_mask)
+int idr_alloc_cyclic(struct idr *idr, void *ptr, unsigned start,
+		     unsigned end, gfp_t gfp)
 {
-	int id;
-
-	id = idr_alloc_range(idr, ptr, max(start, idr->cur), end, gfp_mask);
-	if (id == -ENOSPC)
-		id = idr_alloc_range(idr, ptr, start, end, gfp_mask);
-
-	if (likely(id >= 0))
-		idr->cur = id + 1;
-	return id;
-}
-EXPORT_SYMBOL(idr_alloc_cyclic);
-
-static void idr_remove_warning(int id)
-{
-	WARN(1, "idr_remove called for id=%d which is not allocated.\n", id);
-}
-
-static void sub_remove(struct idr *idp, int shift, int id)
-{
-	struct idr_layer *p = idp->top;
-	struct idr_layer **pa[MAX_IDR_LEVEL + 1];
-	struct idr_layer ***paa = &pa[0];
-	struct idr_layer *to_free;
-	int n;
-
-	*paa = NULL;
-	*++paa = &idp->top;
-
-	while ((shift > 0) && p) {
-		n = (id >> shift) & IDR_MASK;
-		__clear_bit(n, p->bitmap);
-		*++paa = &p->ary[n];
-		p = p->ary[n];
-		shift -= IDR_BITS;
-	}
-	n = id & IDR_MASK;
-	if (likely(p != NULL && test_bit(n, p->bitmap))) {
-		__clear_bit(n, p->bitmap);
-		rcu_assign_pointer(p->ary[n], NULL);
-		to_free = NULL;
-		while(*paa && ! --((**paa)->count)){
-			if (to_free)
-				free_layer(idp, to_free);
-			to_free = **paa;
-			**paa-- = NULL;
-		}
-		if (!*paa)
-			idp->layers = 0;
-		if (to_free)
-			free_layer(idp, to_free);
-	} else
-		idr_remove_warning(id);
-}
+	int ret;
+	unsigned long flags;
 
-/**
- * idr_remove - remove the given id and free its slot
- * @idp: idr handle
- * @id: unique key
- */
-void idr_remove(struct idr *idp, int id)
-{
-	struct idr_layer *p;
-	struct idr_layer *to_free;
+	might_sleep_if(gfp & __GFP_WAIT);
 
-	if (id < 0)
-		return;
+	spin_lock_irqsave(&idr->ida.lock, flags);
 
-	sub_remove(idp, (idp->layers - 1) * IDR_BITS, id);
-	if (idp->top && idp->top->count == 1 && (idp->layers > 1) &&
-	    idp->top->ary[0]) {
-		/*
-		 * Single child at leftmost slot: we can shrink the tree.
-		 * This level is not needed anymore since when layers are
-		 * inserted, they are inserted at the top of the existing
-		 * tree.
-		 */
-		to_free = idp->top;
-		p = idp->top->ary[0];
-		rcu_assign_pointer(idp->top, p);
-		--idp->layers;
-		to_free->count = 0;
-		bitmap_clear(to_free->bitmap, 0, IDR_SIZE);
-		free_layer(idp, to_free);
-	}
-	while (idp->id_free_cnt >= MAX_IDR_FREE) {
-		p = get_from_free_list(idp);
-		/*
-		 * Note: we don't call the rcu callback here, since the only
-		 * layers that fall into the freelist are those that have been
-		 * preallocated.
-		 */
-		kmem_cache_free(idr_layer_cache, p);
-	}
-	return;
-}
-EXPORT_SYMBOL(idr_remove);
+	ret = __ida_alloc_cyclic(&idr->ida, start, end, gfp, &flags);
+	if (ret >= 0)
+		ret = idr_insert(idr, ptr, ret, gfp, &flags);
 
-static void __idr_remove_all(struct idr *idp)
-{
-	int n, id, max;
-	int bt_mask;
-	struct idr_layer *p;
-	struct idr_layer *pa[MAX_IDR_LEVEL + 1];
-	struct idr_layer **paa = &pa[0];
-
-	n = idp->layers * IDR_BITS;
-	p = idp->top;
-	rcu_assign_pointer(idp->top, NULL);
-	max = idr_max(idp->layers);
-
-	id = 0;
-	while (id >= 0 && id <= max) {
-		while (n > IDR_BITS && p) {
-			n -= IDR_BITS;
-			*paa++ = p;
-			p = p->ary[(id >> n) & IDR_MASK];
-		}
+	spin_unlock_irqrestore(&idr->ida.lock, flags);
 
-		bt_mask = id;
-		id += 1 << n;
-		/* Get the highest bit that the above add changed from 0->1. */
-		while (n < fls(id ^ bt_mask)) {
-			if (p)
-				free_layer(idp, p);
-			n += IDR_BITS;
-			p = *--paa;
-		}
-	}
-	idp->layers = 0;
+	return ret;
 }
+EXPORT_SYMBOL(idr_alloc_cyclic);
 
 /**
- * idr_destroy - release all cached layers within an idr tree
- * @idp: idr handle
+ * idr_destroy - free all memory owned by @idr
+ * @idr: idr handle
  *
- * Free all id mappings and all idp_layers.  After this function, @idp is
- * completely unused and can be freed / recycled.  The caller is
- * responsible for ensuring that no one else accesses @idp during or after
- * idr_destroy().
+ * After this function, @idr is completely unused and can be freed / recycled.
  *
  * A typical clean-up sequence for objects stored in an idr tree will use
 * idr_for_each() to free all objects, if necessary, then idr_destroy() to
- * free up the id mappings and cached idr_layers.
+ * free the embedded ida and radix tree.
  */
-void idr_destroy(struct idr *idp)
+void idr_destroy(struct idr *idr)
 {
-	__idr_remove_all(idp);
-
-	while (idp->id_free_cnt) {
-		struct idr_layer *p = get_from_free_list(idp);
-		kmem_cache_free(idr_layer_cache, p);
-	}
-}
-EXPORT_SYMBOL(idr_destroy);
-
-void *idr_find_slowpath(struct idr *idp, int id)
-{
-	int n;
-	struct idr_layer *p;
-
-	if (id < 0)
-		return NULL;
-
-	p = rcu_dereference_raw(idp->top);
-	if (!p)
-		return NULL;
-	n = (p->layer+1) * IDR_BITS;
-
-	if (id > idr_max(p->layer + 1))
-		return NULL;
-	BUG_ON(n == 0);
-
-	while (n > 0 && p) {
-		n -= IDR_BITS;
-		BUG_ON(n != p->layer*IDR_BITS);
-		p = rcu_dereference_raw(p->ary[(id >> n) & IDR_MASK]);
-	}
-	return((void *)p);
-}
-EXPORT_SYMBOL(idr_find_slowpath);
-
-/**
- * idr_for_each - iterate through all stored pointers
- * @idp: idr handle
- * @fn: function to be called for each pointer
- * @data: data passed back to callback function
- *
- * Iterate over the pointers registered with the given idr.  The
- * callback function will be called for each pointer currently
- * registered, passing the id, the pointer and the data pointer passed
- * to this function.  It is not safe to modify the idr tree while in
- * the callback, so functions such as idr_remove are not allowed.
- *
- * We check the return of @fn each time. If it returns anything other
- * than %0, we break out and return that value.
- *
- * The caller must serialize idr_for_each() vs idr_remove().
- */
-int idr_for_each(struct idr *idp,
-		 int (*fn)(int id, void *p, void *data), void *data)
-{
-	int n, id, max, error = 0;
-	struct idr_layer *p;
-	struct idr_layer *pa[MAX_IDR_LEVEL + 1];
-	struct idr_layer **paa = &pa[0];
-
-	n = idp->layers * IDR_BITS;
-	p = rcu_dereference_raw(idp->top);
-	max = idr_max(idp->layers);
-
-	id = 0;
-	while (id >= 0 && id <= max) {
-		while (n > 0 && p) {
-			n -= IDR_BITS;
-			*paa++ = p;
-			p = rcu_dereference_raw(p->ary[(id >> n) & IDR_MASK]);
-		}
-
-		if (p) {
-			error = fn(id, (void *)p, data);
-			if (error)
-				break;
-		}
-
-		id += 1 << n;
-		while (n < fls(id)) {
-			n += IDR_BITS;
-			p = *--paa;
-		}
-	}
-
-	return error;
-}
-EXPORT_SYMBOL(idr_for_each);
-
-/**
- * idr_find_next - lookup next object of id to given id.
- * @idp: idr handle
- * @nextidp:  pointer to lookup key
- *
- * Returns pointer to registered object with id, which is next number to
- * given id. After being looked up, *@nextidp will be updated for the next
- * iteration.
- *
- * This function can be called under rcu_read_lock(), given that the leaf
- * pointers lifetimes are correctly managed.
- */
-void *idr_find_next(struct idr *idp, int *nextidp)
-{
-	struct idr_layer *p, *pa[MAX_IDR_LEVEL + 1];
-	struct idr_layer **paa = &pa[0];
-	int id = *nextidp;
-	int n, max;
-
-	/* find first ent */
-	p = rcu_dereference_raw(idp->top);
-	if (!p)
-		return NULL;
-	n = (p->layer + 1) * IDR_BITS;
-	max = idr_max(p->layer + 1);
-
-	while (id >= 0 && id <= max) {
-		while (n > 0 && p) {
-			n -= IDR_BITS;
-			*paa++ = p;
-			p = rcu_dereference_raw(p->ary[(id >> n) & IDR_MASK]);
-		}
-
-		if (p) {
-			*nextidp = id;
-			return p;
-		}
-
-		/*
-		 * Proceed to the next layer at the current level.  Unlike
-		 * idr_for_each(), @id isn't guaranteed to be aligned to
-		 * layer boundary at this point and adding 1 << n may
-		 * incorrectly skip IDs.  Make sure we jump to the
-		 * beginning of the next layer using round_up().
-		 */
-		id = round_up(id + 1, 1 << n);
-		while (n < fls(id)) {
-			n += IDR_BITS;
-			p = *--paa;
-		}
-	}
-	return NULL;
-}
-EXPORT_SYMBOL(idr_find_next);
-
-
-/**
- * idr_replace - replace pointer for given id
- * @idp: idr handle
- * @ptr: pointer you want associated with the id
- * @id: lookup key
- *
- * Replace the pointer registered with an id and return the old value.
- * A %-ENOENT return indicates that @id was not found.
- * A %-EINVAL return indicates that @id was not within valid constraints.
- *
- * The caller must serialize with writers.
- */
-void *idr_replace(struct idr *idp, void *ptr, int id)
-{
-	int n;
-	struct idr_layer *p, *old_p;
-
-	if (id < 0)
-		return ERR_PTR(-EINVAL);
-
-	p = idp->top;
-	if (!p)
-		return ERR_PTR(-EINVAL);
-
-	n = (p->layer+1) * IDR_BITS;
-
-	if (id >= (1 << n))
-		return ERR_PTR(-EINVAL);
-
-	n -= IDR_BITS;
-	while ((n > 0) && p) {
-		p = p->ary[(id >> n) & IDR_MASK];
-		n -= IDR_BITS;
-	}
-
-	n = id & IDR_MASK;
-	if (unlikely(p == NULL || !test_bit(n, p->bitmap)))
-		return ERR_PTR(-ENOENT);
-
-	old_p = p->ary[n];
-	rcu_assign_pointer(p->ary[n], ptr);
+	void *p;
+	unsigned id;
 
-	return old_p;
-}
-EXPORT_SYMBOL(idr_replace);
+	idr_for_each_entry(idr, p, id)
+		idr_remove(idr, id);
 
-void __init idr_init_cache(void)
-{
-	idr_layer_cache = kmem_cache_create("idr_layer_cache",
-				sizeof(struct idr_layer), 0, SLAB_PANIC, NULL);
+	ida_destroy(&idr->ida);
 }
+EXPORT_SYMBOL(idr_destroy);
 
 /**
- * idr_init - initialize idr handle
- * @idp:	idr handle
+ * idr_init - initialize sparse idr handle
+ * @idr:	idr handle
  *
- * This function is use to set up the handle (@idp) that you will pass
+ * This function is used to set up the handle (@idr) that you will pass
  * to the rest of the functions.
  */
-void idr_init(struct idr *idp)
+void idr_init(struct idr *idr)
 {
-	memset(idp, 0, sizeof(struct idr));
-	spin_lock_init(&idp->lock);
+	ida_init(&idr->ida);
+	INIT_RADIX_TREE(&idr->ptrs, GFP_NOWAIT);
 }
 EXPORT_SYMBOL(idr_init);
-- 
1.8.4.rc1


^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH 09/10] idr: Remove unneeded idr locking, idr_preload() usage
       [not found] ` <1375896905-6074-1-git-send-email-kmo-PEzghdH756F8UrSeD/g0lQ@public.gmane.org>
@ 2013-08-07 17:46     ` Kent Overstreet
  0 siblings, 0 replies; 46+ messages in thread
From: Kent Overstreet @ 2013-08-07 17:46 UTC (permalink / raw)
  To: akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA
  Cc: David Airlie, Steve Wise, Tom Tucker, Kai Mäkisara,
	dm-devel-H+wXaHxf7aLQT0dZR+AlfA,
	target-devel-u79uwXL29TY76Z2rM5mHXA,
	containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	Sean Hefty, Christoph Hellwig, Alasdair Kergon, Roland Dreier,
	Samuel Ortiz, linux-scsi-u79uwXL29TY76Z2rM5mHXA,
	linux-rdma-u79uwXL29TY76Z2rM5mHXA, Alon Levy, Dave Airlie,
	osst-users-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f, Chris Ball,
	Hal Rosenstock, Alex Dubov, Kent Overstreet,
	Nicholas A. Bellinger, cgroups-u79uwXL29TY76Z2rM5mHXA,
	dri-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW, Neil Horman

Our new idr implementation does its own locking, instead of forcing it
onto the callers the way the old implementation did.

Many of the existing idr users need locking for more than just
idr_alloc()/idr_remove()/idr_find() - they're taking refcounts and such
under their locks and we can't touch those.

But a significant number of users had locks that protected nothing more
than the idr data structure itself - those we can get rid of.

Note that we have to be careful when removing locks; in some places, a
lock appears to protect only the idr_alloc()/idr_remove() calls, but
it's also taken by other code that relies on it to ensure the idr isn't
modified while that code is doing something else - so ideally we delete
a lock only when it protected nothing but the idr, and otherwise we have
to carefully audit all the other places it's used.
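
For instance (a hypothetical sketch, not taken from any of the diffs
below), a lock like this can't simply be deleted even though the
allocation site looks idr-only:

	spin_lock(&obj_lock);
	id = idr_alloc(&obj_idr, obj, GFP_NOWAIT);
	spin_unlock(&obj_lock);

	/* ... because elsewhere the same lock keeps the idr stable: */
	spin_lock(&obj_lock);
	idr_for_each(&obj_idr, my_fn, data); /* vs. idr_remove() */
	spin_unlock(&obj_lock);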

There are also a fair number of places where things were being done
under the idr lock unnecessarily; drivers/dca/dca-sysfs.c is a good
example. dca->id is set to the id idr_alloc() returns under the lock -
but there are no idr_find() calls under the lock, and dca->id isn't
touched in the idr_remove() paths. So the lock can be safely deleted.

The really nice thing about deleting this locking is that it lets us
trivially delete a lot of now-unnecessary idr_preload() calls - with idr
doing its own locking, we can pass GFP_KERNEL to idr_alloc() just fine.
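
The conversion then has the same shape everywhere (hypothetical names,
mirroring the dca-sysfs change below):

	/* before: preload + spinlock + atomic allocation */
	idr_preload(GFP_KERNEL);
	spin_lock(&obj_lock);
	id = idr_alloc(&obj_idr, obj, GFP_NOWAIT);
	spin_unlock(&obj_lock);
	idr_preload_end();

	/* after: idr locks internally, so a sleeping allocation is fine */
	id = idr_alloc(&obj_idr, obj, GFP_KERNEL);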

Signed-off-by: Kent Overstreet <kmo@daterainc.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: David Airlie <airlied@linux.ie>
Cc: Tom Tucker <tom@opengridcomputing.com>
Cc: Steve Wise <swise@opengridcomputing.com>
Cc: Roland Dreier <roland@kernel.org>
Cc: Sean Hefty <sean.hefty@intel.com>
Cc: Hal Rosenstock <hal.rosenstock@gmail.com>
Cc: Alasdair Kergon <agk@redhat.com>
Cc: dm-devel@redhat.com
Cc: Samuel Ortiz <sameo@linux.intel.com>
Cc: Alex Dubov <oakad@yahoo.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Chris Ball <cjb@laptop.org>
Cc: "James E.J. Bottomley" <JBottomley@parallels.com>
Cc: Willem Riede <osst@riede.org>
Cc: "Kai Mäkisara" <Kai.Makisara@kolumbus.fi>
Cc: "Nicholas A. Bellinger" <nab@linux-iscsi.org>
Cc: Li Zefan <lizefan@huawei.com>
Cc: Vlad Yasevich <vyasevich@gmail.com>
Cc: Neil Horman <nhorman@tuxdriver.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Dave Airlie <airlied@redhat.com>
Cc: Alon Levy <alevy@redhat.com>
Cc: Guennadi Liakhovetski <g.liakhovetski@gmx.de>
Cc: Christoph Hellwig <hch@lst.de>
Cc: linux-kernel@vger.kernel.org
Cc: dri-devel@lists.freedesktop.org
Cc: linux-rdma@vger.kernel.org
Cc: linux-mmc@vger.kernel.org
Cc: linux-scsi@vger.kernel.org
Cc: osst-users@lists.sourceforge.net
Cc: target-devel@vger.kernel.org
Cc: containers@lists.linux-foundation.org
Cc: cgroups@vger.kernel.org
Cc: linux-sctp@vger.kernel.org
Cc: netdev@vger.kernel.org
---
 drivers/dca/dca-sysfs.c                   | 18 +++---------------
 drivers/gpu/drm/qxl/qxl_cmd.c             |  4 +---
 drivers/gpu/drm/qxl/qxl_drv.h             |  1 -
 drivers/gpu/drm/qxl/qxl_kms.c             |  1 -
 drivers/gpu/drm/qxl/qxl_release.c         | 20 +++++---------------
 drivers/infiniband/hw/amso1100/c2.h       |  1 -
 drivers/infiniband/hw/amso1100/c2_qp.c    | 20 ++------------------
 drivers/md/dm.c                           | 24 ++++--------------------
 drivers/memstick/core/memstick.c          | 17 +++--------------
 drivers/mfd/rtsx_pcr.c                    | 13 +++----------
 drivers/misc/c2port/core.c                | 11 +----------
 drivers/misc/tifm_core.c                  | 15 +++------------
 drivers/mmc/core/host.c                   | 13 ++-----------
 drivers/scsi/ch.c                         | 13 +------------
 drivers/scsi/st.c                         | 13 +------------
 drivers/target/iscsi/iscsi_target.c       | 17 ++++-------------
 drivers/target/iscsi/iscsi_target.h       |  1 -
 drivers/target/iscsi/iscsi_target_login.c | 12 ++----------
 include/linux/cgroup.h                    |  1 -
 include/net/sctp/sctp.h                   |  1 -
 kernel/cgroup.c                           |  9 +--------
 kernel/workqueue.c                        | 15 ++-------------
 net/9p/util.c                             | 15 +--------------
 net/sctp/associola.c                      | 14 ++------------
 net/sctp/protocol.c                       |  1 -
 net/sctp/socket.c                         |  2 --
 26 files changed, 41 insertions(+), 231 deletions(-)

diff --git a/drivers/dca/dca-sysfs.c b/drivers/dca/dca-sysfs.c
index effda66..6be5fbd 100644
--- a/drivers/dca/dca-sysfs.c
+++ b/drivers/dca/dca-sysfs.c
@@ -31,7 +31,6 @@
 
 static struct class *dca_class;
 static struct idr dca_idr;
-static spinlock_t dca_idr_lock;
 
 int dca_sysfs_add_req(struct dca_provider *dca, struct device *dev, int slot)
 {
@@ -55,23 +54,15 @@ int dca_sysfs_add_provider(struct dca_provider *dca, struct device *dev)
 	struct device *cd;
 	int ret;
 
-	idr_preload(GFP_KERNEL);
-	spin_lock(&dca_idr_lock);
-
-	ret = idr_alloc(&dca_idr, dca, GFP_NOWAIT);
-	if (ret >= 0)
-		dca->id = ret;
-
-	spin_unlock(&dca_idr_lock);
-	idr_preload_end();
+	ret = idr_alloc(&dca_idr, dca, GFP_KERNEL);
 	if (ret < 0)
 		return ret;
 
+	dca->id = ret;
+
 	cd = device_create(dca_class, dev, MKDEV(0, 0), NULL, "dca%d", dca->id);
 	if (IS_ERR(cd)) {
-		spin_lock(&dca_idr_lock);
 		idr_remove(&dca_idr, dca->id);
-		spin_unlock(&dca_idr_lock);
 		return PTR_ERR(cd);
 	}
 	dca->cd = cd;
@@ -82,15 +73,12 @@ void dca_sysfs_remove_provider(struct dca_provider *dca)
 {
 	device_unregister(dca->cd);
 	dca->cd = NULL;
-	spin_lock(&dca_idr_lock);
 	idr_remove(&dca_idr, dca->id);
-	spin_unlock(&dca_idr_lock);
 }
 
 int __init dca_sysfs_init(void)
 {
 	idr_init(&dca_idr);
-	spin_lock_init(&dca_idr_lock);
 
 	dca_class = class_create(THIS_MODULE, "dca");
 	if (IS_ERR(dca_class)) {
diff --git a/drivers/gpu/drm/qxl/qxl_cmd.c b/drivers/gpu/drm/qxl/qxl_cmd.c
index 555a698..e01f7d5 100644
--- a/drivers/gpu/drm/qxl/qxl_cmd.c
+++ b/drivers/gpu/drm/qxl/qxl_cmd.c
@@ -449,11 +449,9 @@ int qxl_surface_id_alloc(struct qxl_device *qdev,
 	int idr_ret;
 	int count = 0;
 again:
-	idr_preload(GFP_ATOMIC);
 	spin_lock(&qdev->surf_id_idr_lock);
-	idr_ret = idr_alloc_range(&qdev->surf_id_idr, NULL, 1, 0, GFP_NOWAIT);
+	idr_ret = idr_alloc_range(&qdev->surf_id_idr, NULL, 1, 0, GFP_ATOMIC);
 	spin_unlock(&qdev->surf_id_idr_lock);
-	idr_preload_end();
 	if (idr_ret < 0)
 		return idr_ret;
 	handle = idr_ret;
diff --git a/drivers/gpu/drm/qxl/qxl_drv.h b/drivers/gpu/drm/qxl/qxl_drv.h
index 7e96f4f..df25af5 100644
--- a/drivers/gpu/drm/qxl/qxl_drv.h
+++ b/drivers/gpu/drm/qxl/qxl_drv.h
@@ -285,7 +285,6 @@ struct qxl_device {
 	uint64_t	va_slot_mask;
 
 	struct idr	release_idr;
-	spinlock_t release_idr_lock;
 	struct mutex	async_io_mutex;
 	unsigned int last_sent_io_cmd;
 
diff --git a/drivers/gpu/drm/qxl/qxl_kms.c b/drivers/gpu/drm/qxl/qxl_kms.c
index 9e8da9e..396c1eb 100644
--- a/drivers/gpu/drm/qxl/qxl_kms.c
+++ b/drivers/gpu/drm/qxl/qxl_kms.c
@@ -204,7 +204,6 @@ int qxl_device_init(struct qxl_device *qdev,
 			GFP_KERNEL);
 
 	idr_init(&qdev->release_idr);
-	spin_lock_init(&qdev->release_idr_lock);
 
 	idr_init(&qdev->surf_id_idr);
 	spin_lock_init(&qdev->surf_id_idr_lock);
diff --git a/drivers/gpu/drm/qxl/qxl_release.c b/drivers/gpu/drm/qxl/qxl_release.c
index 7cbe62a..6cb2c83 100644
--- a/drivers/gpu/drm/qxl/qxl_release.c
+++ b/drivers/gpu/drm/qxl/qxl_release.c
@@ -44,7 +44,6 @@ qxl_release_alloc(struct qxl_device *qdev, int type,
 		  struct qxl_release **ret)
 {
 	struct qxl_release *release;
-	int handle;
 	size_t size = sizeof(*release);
 	int idr_ret;
 
@@ -58,21 +57,16 @@ qxl_release_alloc(struct qxl_device *qdev, int type,
 	release->surface_release_id = 0;
 	INIT_LIST_HEAD(&release->bos);
 
-	idr_preload(GFP_KERNEL);
-	spin_lock(&qdev->release_idr_lock);
-	idr_ret = idr_alloc_range(&qdev->release_idr,
-				  release, 1, 0, GFP_NOWAIT);
-	spin_unlock(&qdev->release_idr_lock);
-	idr_preload_end();
-	handle = idr_ret;
+	idr_ret = idr_alloc_range(&qdev->release_idr, release,
+				  1, 0, GFP_KERNEL);
 	if (idr_ret < 0)
 		goto release_fail;
 	*ret = release;
-	QXL_INFO(qdev, "allocated release %lld\n", handle);
-	release->id = handle;
+	QXL_INFO(qdev, "allocated release %lld\n", idr_ret);
+	release->id = idr_ret;
 release_fail:
 
-	return handle;
+	return idr_ret;
 }
 
 void
@@ -94,9 +88,7 @@ qxl_release_free(struct qxl_device *qdev,
 		qxl_fence_remove_release(&bo->fence, release->id);
 		qxl_bo_unref(&bo);
 	}
-	spin_lock(&qdev->release_idr_lock);
 	idr_remove(&qdev->release_idr, release->id);
-	spin_unlock(&qdev->release_idr_lock);
 	kfree(release);
 }
 
@@ -279,9 +271,7 @@ struct qxl_release *qxl_release_from_id_locked(struct qxl_device *qdev,
 {
 	struct qxl_release *release;
 
-	spin_lock(&qdev->release_idr_lock);
 	release = idr_find(&qdev->release_idr, id);
-	spin_unlock(&qdev->release_idr_lock);
 	if (!release) {
 		DRM_ERROR("failed to find id in release_idr\n");
 		return NULL;
diff --git a/drivers/infiniband/hw/amso1100/c2.h b/drivers/infiniband/hw/amso1100/c2.h
index d619d73..73f2e96 100644
--- a/drivers/infiniband/hw/amso1100/c2.h
+++ b/drivers/infiniband/hw/amso1100/c2.h
@@ -264,7 +264,6 @@ struct c2_pd_table {
 
 struct c2_qp_table {
 	struct idr idr;
-	spinlock_t lock;
 };
 
 struct c2_element {
diff --git a/drivers/infiniband/hw/amso1100/c2_qp.c b/drivers/infiniband/hw/amso1100/c2_qp.c
index 86708de..cb80aa9 100644
--- a/drivers/infiniband/hw/amso1100/c2_qp.c
+++ b/drivers/infiniband/hw/amso1100/c2_qp.c
@@ -380,36 +380,21 @@ static int destroy_qp(struct c2_dev *c2dev, struct c2_qp *qp)
 
 static int c2_alloc_qpn(struct c2_dev *c2dev, struct c2_qp *qp)
 {
-	int ret;
-
-	idr_preload(GFP_KERNEL);
-	spin_lock_irq(&c2dev->qp_table.lock);
-
-	ret = idr_alloc_cyclic(&c2dev->qp_table.idr, qp, 0, 0, GFP_NOWAIT);
+	int ret = idr_alloc_cyclic(&c2dev->qp_table.idr, qp, 0, 0, GFP_KERNEL);
 	if (ret >= 0)
 		qp->qpn = ret;
 
-	spin_unlock_irq(&c2dev->qp_table.lock);
-	idr_preload_end();
 	return ret < 0 ? ret : 0;
 }
 
 static void c2_free_qpn(struct c2_dev *c2dev, int qpn)
 {
-	spin_lock_irq(&c2dev->qp_table.lock);
 	idr_remove(&c2dev->qp_table.idr, qpn);
-	spin_unlock_irq(&c2dev->qp_table.lock);
 }
 
 struct c2_qp *c2_find_qpn(struct c2_dev *c2dev, int qpn)
 {
-	unsigned long flags;
-	struct c2_qp *qp;
-
-	spin_lock_irqsave(&c2dev->qp_table.lock, flags);
-	qp = idr_find(&c2dev->qp_table.idr, qpn);
-	spin_unlock_irqrestore(&c2dev->qp_table.lock, flags);
-	return qp;
+	return idr_find(&c2dev->qp_table.idr, qpn);
 }
 
 int c2_alloc_qp(struct c2_dev *c2dev,
@@ -1014,7 +999,6 @@ out:
 
 void c2_init_qp_table(struct c2_dev *c2dev)
 {
-	spin_lock_init(&c2dev->qp_table.lock);
 	idr_init(&c2dev->qp_table.idr);
 }
 
diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index c63d35a..321e891 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -1812,9 +1812,7 @@ static int dm_any_congested(void *congested_data, int bdi_bits)
  *---------------------------------------------------------------*/
 static void free_minor(int minor)
 {
-	spin_lock(&_minor_lock);
 	idr_remove(&_minor_idr, minor);
-	spin_unlock(&_minor_lock);
 }
 
 /*
@@ -1827,14 +1825,8 @@ static int specific_minor(int minor)
 	if (minor >= (1 << MINORBITS))
 		return -EINVAL;
 
-	idr_preload(GFP_KERNEL);
-	spin_lock(&_minor_lock);
-
-	r = idr_alloc_range(&minor_idr, MINOR_ALLOCED,
-			    minor, minor + 1, GFP_NOWAIT);
-
-	spin_unlock(&_minor_lock);
-	idr_preload_end();
+	r = idr_alloc_range(&_minor_idr, MINOR_ALLOCED, minor,
+			    minor + 1, GFP_KERNEL);
 	if (r < 0)
 		return r == -ENOSPC ? -EBUSY : r;
 	return 0;
@@ -1844,14 +1836,8 @@ static int next_free_minor(int *minor)
 {
 	int r;
 
-	idr_preload(GFP_KERNEL);
-	spin_lock(&_minor_lock);
-
-	r = idr_alloc_range(&minor_idr, MINOR_ALLOCED, 0,
-			    1 << MINORBITS, GFP_NOWAIT);
-
-	spin_unlock(&_minor_lock);
-	idr_preload_end();
+	r = idr_alloc_range(&_minor_idr, MINOR_ALLOCED,
+			    0, 1 << MINORBITS, GFP_KERNEL);
 	if (r < 0)
 		return r;
 	*minor = r;
@@ -1962,9 +1948,7 @@ static struct mapped_device *alloc_dev(int minor)
 	md->flush_bio.bi_rw = WRITE_FLUSH;
 
 	/* Populate the mapping, nobody knows we exist yet */
-	spin_lock(&_minor_lock);
 	old_md = idr_replace(&_minor_idr, md, minor);
-	spin_unlock(&_minor_lock);
 
 	BUG_ON(old_md != MINOR_ALLOCED);
 
diff --git a/drivers/memstick/core/memstick.c b/drivers/memstick/core/memstick.c
index 70fb07c..b7544a3 100644
--- a/drivers/memstick/core/memstick.c
+++ b/drivers/memstick/core/memstick.c
@@ -26,7 +26,6 @@ module_param(cmd_retries, uint, 0644);
 
 static struct workqueue_struct *workqueue;
 static DEFINE_IDR(memstick_host_idr);
-static DEFINE_SPINLOCK(memstick_host_lock);
 
 static int memstick_dev_match(struct memstick_dev *card,
 			      struct memstick_device_id *id)
@@ -512,25 +511,17 @@ int memstick_add_host(struct memstick_host *host)
 {
 	int rc;
 
-	idr_preload(GFP_KERNEL);
-	spin_lock(&memstick_host_lock);
-
-	rc = idr_alloc(&memstick_host_idr, host, GFP_NOWAIT);
-	if (rc >= 0)
-		host->id = rc;
-
-	spin_unlock(&memstick_host_lock);
-	idr_preload_end();
+	rc = idr_alloc(&memstick_host_idr, host, GFP_KERNEL);
 	if (rc < 0)
 		return rc;
 
+	host->id = rc;
+
 	dev_set_name(&host->dev, "memstick%u", host->id);
 
 	rc = device_add(&host->dev);
 	if (rc) {
-		spin_lock(&memstick_host_lock);
 		idr_remove(&memstick_host_idr, host->id);
-		spin_unlock(&memstick_host_lock);
 		return rc;
 	}
 
@@ -554,9 +545,7 @@ void memstick_remove_host(struct memstick_host *host)
 	host->set_param(host, MEMSTICK_POWER, MEMSTICK_POWER_OFF);
 	mutex_unlock(&host->lock);
 
-	spin_lock(&memstick_host_lock);
 	idr_remove(&memstick_host_idr, host->id);
-	spin_unlock(&memstick_host_lock);
 	device_del(&host->dev);
 }
 EXPORT_SYMBOL(memstick_remove_host);
diff --git a/drivers/mfd/rtsx_pcr.c b/drivers/mfd/rtsx_pcr.c
index 862deed..be5913a 100644
--- a/drivers/mfd/rtsx_pcr.c
+++ b/drivers/mfd/rtsx_pcr.c
@@ -40,7 +40,6 @@ module_param(msi_en, bool, S_IRUGO | S_IWUSR);
 MODULE_PARM_DESC(msi_en, "Enable MSI");
 
 static DEFINE_IDR(rtsx_pci_idr);
-static DEFINE_SPINLOCK(rtsx_pci_lock);
 
 static struct mfd_cell rtsx_pcr_cells[] = {
 	[RTSX_SD_CARD] = {
@@ -1101,16 +1100,12 @@ static int rtsx_pci_probe(struct pci_dev *pcidev,
 	}
 	handle->pcr = pcr;
 
-	idr_preload(GFP_KERNEL);
-	spin_lock(&rtsx_pci_lock);
-	ret = idr_alloc(&rtsx_pci_idr, pcr, GFP_NOWAIT);
-	if (ret >= 0)
-		pcr->id = ret;
-	spin_unlock(&rtsx_pci_lock);
-	idr_preload_end();
+	ret = idr_alloc(&rtsx_pci_idr, pcr, GFP_KERNEL);
 	if (ret < 0)
 		goto free_handle;
 
+	pcr->id = ret;
+
 	pcr->pci = pcidev;
 	dev_set_drvdata(&pcidev->dev, handle);
 
@@ -1216,9 +1211,7 @@ static void rtsx_pci_remove(struct pci_dev *pcidev)
 	pci_release_regions(pcidev);
 	pci_disable_device(pcidev);
 
-	spin_lock(&rtsx_pci_lock);
 	idr_remove(&rtsx_pci_idr, pcr->id);
-	spin_unlock(&rtsx_pci_lock);
 
 	kfree(pcr->slots);
 	kfree(pcr);
diff --git a/drivers/misc/c2port/core.c b/drivers/misc/c2port/core.c
index 8a1ab10..5cd8c99 100644
--- a/drivers/misc/c2port/core.c
+++ b/drivers/misc/c2port/core.c
@@ -27,7 +27,6 @@
 #define DRIVER_NAME             "c2port"
 #define DRIVER_VERSION          "0.51.0"
 
-static DEFINE_SPINLOCK(c2port_idr_lock);
 static DEFINE_IDR(c2port_idr);
 
 /*
@@ -897,11 +896,7 @@ struct c2port_device *c2port_device_register(char *name,
 	if (unlikely(!c2dev))
 		return ERR_PTR(-ENOMEM);
 
-	idr_preload(GFP_KERNEL);
-	spin_lock_irq(&c2port_idr_lock);
-	ret = idr_alloc(&c2port_idr, c2dev, GFP_NOWAIT);
-	spin_unlock_irq(&c2port_idr_lock);
-	idr_preload_end();
+	ret = idr_alloc(&c2port_idr, c2dev, GFP_KERNEL);
 
 	if (ret < 0)
 		goto error_idr_alloc;
@@ -941,9 +936,7 @@ error_device_create_bin_file:
 	device_destroy(c2port_class, 0);
 
 error_device_create:
-	spin_lock_irq(&c2port_idr_lock);
 	idr_remove(&c2port_idr, c2dev->id);
-	spin_unlock_irq(&c2port_idr_lock);
 
 error_idr_alloc:
 	kfree(c2dev);
@@ -960,9 +953,7 @@ void c2port_device_unregister(struct c2port_device *c2dev)
 	dev_info(c2dev->dev, "C2 port %s removed\n", c2dev->name);
 
 	device_remove_bin_file(c2dev->dev, &c2port_bin_attrs);
-	spin_lock_irq(&c2port_idr_lock);
 	idr_remove(&c2port_idr, c2dev->id);
-	spin_unlock_irq(&c2port_idr_lock);
 
 	device_destroy(c2port_class, c2dev->id);
 
diff --git a/drivers/misc/tifm_core.c b/drivers/misc/tifm_core.c
index c828c27..ee7b181 100644
--- a/drivers/misc/tifm_core.c
+++ b/drivers/misc/tifm_core.c
@@ -20,7 +20,6 @@
 
 static struct workqueue_struct *workqueue;
 static DEFINE_IDR(tifm_adapter_idr);
-static DEFINE_SPINLOCK(tifm_adapter_lock);
 
 static const char *tifm_media_type_name(unsigned char type, unsigned char nt)
 {
@@ -196,22 +195,16 @@ int tifm_add_adapter(struct tifm_adapter *fm)
 {
 	int rc;
 
-	idr_preload(GFP_KERNEL);
-	spin_lock(&tifm_adapter_lock);
-	rc = idr_alloc(&tifm_adapter_idr, fm, GFP_NOWAIT);
-	if (rc >= 0)
-		fm->id = rc;
-	spin_unlock(&tifm_adapter_lock);
-	idr_preload_end();
+	rc = idr_alloc(&tifm_adapter_idr, fm, GFP_KERNEL);
 	if (rc < 0)
 		return rc;
 
+	fm->id = rc;
+
 	dev_set_name(&fm->dev, "tifm%u", fm->id);
 	rc = device_add(&fm->dev);
 	if (rc) {
-		spin_lock(&tifm_adapter_lock);
 		idr_remove(&tifm_adapter_idr, fm->id);
-		spin_unlock(&tifm_adapter_lock);
 	}
 
 	return rc;
@@ -228,9 +221,7 @@ void tifm_remove_adapter(struct tifm_adapter *fm)
 			device_unregister(&fm->sockets[cnt]->dev);
 	}
 
-	spin_lock(&tifm_adapter_lock);
 	idr_remove(&tifm_adapter_idr, fm->id);
-	spin_unlock(&tifm_adapter_lock);
 	device_del(&fm->dev);
 }
 EXPORT_SYMBOL(tifm_remove_adapter);
diff --git a/drivers/mmc/core/host.c b/drivers/mmc/core/host.c
index dd223fa..d184bd4 100644
--- a/drivers/mmc/core/host.c
+++ b/drivers/mmc/core/host.c
@@ -55,7 +55,6 @@ void mmc_unregister_host_class(void)
 }
 
 static DEFINE_IDR(mmc_host_idr);
-static DEFINE_SPINLOCK(mmc_host_lock);
 
 #ifdef CONFIG_MMC_CLKGATE
 static ssize_t clkgate_delay_show(struct device *dev,
@@ -457,16 +456,11 @@ struct mmc_host *mmc_alloc_host(int extra, struct device *dev)
 
 	/* scanning will be enabled when we're ready */
 	host->rescan_disable = 1;
-	idr_preload(GFP_KERNEL);
-	spin_lock(&mmc_host_lock);
-	err = idr_alloc(&mmc_host_idr, host, GFP_NOWAIT);
-	if (err >= 0)
-		host->index = err;
-	spin_unlock(&mmc_host_lock);
-	idr_preload_end();
+	err = idr_alloc(&mmc_host_idr, host, GFP_KERNEL);
 	if (err < 0)
 		goto free;
 
+	host->index = err;
 	dev_set_name(&host->class_dev, "mmc%d", host->index);
 
 	host->parent = dev;
@@ -574,10 +568,7 @@ EXPORT_SYMBOL(mmc_remove_host);
  */
 void mmc_free_host(struct mmc_host *host)
 {
-	spin_lock(&mmc_host_lock);
 	idr_remove(&mmc_host_idr, host->index);
-	spin_unlock(&mmc_host_lock);
-
 	put_device(&host->class_dev);
 }
 
diff --git a/drivers/scsi/ch.c b/drivers/scsi/ch.c
index 4e0ffdc..0693879 100644
--- a/drivers/scsi/ch.c
+++ b/drivers/scsi/ch.c
@@ -115,7 +115,6 @@ typedef struct {
 } scsi_changer;
 
 static DEFINE_IDR(ch_index_idr);
-static DEFINE_SPINLOCK(ch_index_lock);
 
 static const struct {
 	unsigned char  sense;
@@ -582,15 +581,12 @@ ch_open(struct inode *inode, struct file *file)
 	int minor = iminor(inode);
 
 	mutex_lock(&ch_mutex);
-	spin_lock(&ch_index_lock);
 	ch = idr_find(&ch_index_idr, minor);
 
 	if (NULL == ch || scsi_device_get(ch->device)) {
-		spin_unlock(&ch_index_lock);
 		mutex_unlock(&ch_mutex);
 		return -ENXIO;
 	}
-	spin_unlock(&ch_index_lock);
 
 	file->private_data = ch;
 	mutex_unlock(&ch_mutex);
@@ -905,13 +901,8 @@ static int ch_probe(struct device *dev)
 	if (NULL == ch)
 		return -ENOMEM;
 
-	idr_preload(GFP_KERNEL);
-	spin_lock(&ch_index_lock);
 	ret = idr_alloc_range(&ch_index_idr, ch, 0,
-			      CH_MAX_DEVS + 1, GFP_NOWAIT);
-	spin_unlock(&ch_index_lock);
-	idr_preload_end();
-
+			      CH_MAX_DEVS + 1, GFP_KERNEL);
 	if (ret < 0) {
 		if (ret == -ENOSPC)
 			ret = -ENODEV;
@@ -952,9 +943,7 @@ static int ch_remove(struct device *dev)
 {
 	scsi_changer *ch = dev_get_drvdata(dev);
 
-	spin_lock(&ch_index_lock);
 	idr_remove(&ch_index_idr, ch->minor);
-	spin_unlock(&ch_index_lock);
 
 	device_destroy(ch_sysfs_class, MKDEV(SCSI_CHANGER_MAJOR,ch->minor));
 	kfree(ch->dt);
diff --git a/drivers/scsi/st.c b/drivers/scsi/st.c
index 62ac019..d256410 100644
--- a/drivers/scsi/st.c
+++ b/drivers/scsi/st.c
@@ -214,7 +214,6 @@ static void scsi_tape_release(struct kref *);
 #define to_scsi_tape(obj) container_of(obj, struct scsi_tape, kref)
 
 static DEFINE_MUTEX(st_ref_mutex);
-static DEFINE_SPINLOCK(st_index_lock);
 static DEFINE_SPINLOCK(st_use_lock);
 static DEFINE_IDR(st_index_idr);
 
@@ -235,7 +234,6 @@ static struct scsi_tape *scsi_tape_get(int dev)
 	struct scsi_tape *STp = NULL;
 
 	mutex_lock(&st_ref_mutex);
-	spin_lock(&st_index_lock);
 
 	STp = idr_find(&st_index_idr, dev);
 	if (!STp) goto out;
@@ -254,7 +252,6 @@ out_put:
 	kref_put(&STp->kref, scsi_tape_release);
 	STp = NULL;
 out:
-	spin_unlock(&st_index_lock);
 	mutex_unlock(&st_ref_mutex);
 	return STp;
 }
@@ -4182,12 +4179,8 @@ static int st_probe(struct device *dev)
 	    tpnt->blksize_changed = 0;
 	mutex_init(&tpnt->lock);
 
-	idr_preload(GFP_KERNEL);
-	spin_lock(&st_index_lock);
 	error = idr_alloc_range(&st_index_idr, tpnt, 0,
-				ST_MAX_TAPES + 1, GFP_NOWAIT);
-	spin_unlock(&st_index_lock);
-	idr_preload_end();
+				ST_MAX_TAPES + 1, GFP_KERNEL);
 	if (error < 0) {
 		pr_warn("st: idr allocation failed: %d\n", error);
 		goto out_put_queue;
@@ -4213,9 +4206,7 @@ static int st_probe(struct device *dev)
 
 out_remove_devs:
 	remove_cdevs(tpnt);
-	spin_lock(&st_index_lock);
 	idr_remove(&st_index_idr, tpnt->index);
-	spin_unlock(&st_index_lock);
 out_put_queue:
 	blk_put_queue(disk->queue);
 out_put_disk:
@@ -4239,9 +4230,7 @@ static int st_remove(struct device *dev)
 	mutex_lock(&st_ref_mutex);
 	kref_put(&tpnt->kref, scsi_tape_release);
 	mutex_unlock(&st_ref_mutex);
-	spin_lock(&st_index_lock);
 	idr_remove(&st_index_idr, index);
-	spin_unlock(&st_index_lock);
 	return 0;
 }
 
diff --git a/drivers/target/iscsi/iscsi_target.c b/drivers/target/iscsi/iscsi_target.c
index 1e0539c..af3a9d5 100644
--- a/drivers/target/iscsi/iscsi_target.c
+++ b/drivers/target/iscsi/iscsi_target.c
@@ -59,7 +59,6 @@ static DEFINE_SPINLOCK(np_lock);
 static struct idr tiqn_idr;
 struct idr sess_idr;
 struct mutex auth_id_lock;
-spinlock_t sess_idr_lock;
 
 struct iscsit_global *iscsit_global;
 
@@ -147,22 +146,17 @@ struct iscsi_tiqn *iscsit_add_tiqn(unsigned char *buf)
 
 	tiqn->tiqn_state = TIQN_STATE_ACTIVE;
 
-	idr_preload(GFP_KERNEL);
-	spin_lock(&tiqn_lock);
-
-	ret = idr_alloc(&tiqn_idr, NULL, GFP_NOWAIT);
+	ret = idr_alloc(&tiqn_idr, NULL, GFP_KERNEL);
 	if (ret < 0) {
 		pr_err("idr_alloc() failed for tiqn->tiqn_index\n");
-		spin_unlock(&tiqn_lock);
-		idr_preload_end();
 		kfree(tiqn);
 		return ERR_PTR(ret);
 	}
 	tiqn->tiqn_index = ret;
-	list_add_tail(&tiqn->tiqn_list, &g_tiqn_list);
 
+	spin_lock(&tiqn_lock);
+	list_add_tail(&tiqn->tiqn_list, &g_tiqn_list);
 	spin_unlock(&tiqn_lock);
-	idr_preload_end();
 
 	pr_debug("CORE[0] - Added iSCSI Target IQN: %s\n", tiqn->tiqn);
 
@@ -201,8 +195,8 @@ void iscsit_del_tiqn(struct iscsi_tiqn *tiqn)
 
 	spin_lock(&tiqn_lock);
 	list_del(&tiqn->tiqn_list);
-	idr_remove(&tiqn_idr, tiqn->tiqn_index);
 	spin_unlock(&tiqn_lock);
+	idr_remove(&tiqn_idr, tiqn->tiqn_index);
 
 	pr_debug("CORE[0] - Deleted iSCSI Target IQN: %s\n",
 			tiqn->tiqn);
@@ -519,7 +513,6 @@ static int __init iscsi_target_init_module(void)
 		return -1;
 	}
 	mutex_init(&auth_id_lock);
-	spin_lock_init(&sess_idr_lock);
 	idr_init(&tiqn_idr);
 	idr_init(&sess_idr);
 
@@ -4449,9 +4442,7 @@ int iscsit_close_session(struct iscsi_session *sess)
 	pr_debug("Decremented number of active iSCSI Sessions on"
 		" iSCSI TPG: %hu to %u\n", tpg->tpgt, tpg->nsessions);
 
-	spin_lock(&sess_idr_lock);
 	idr_remove(&sess_idr, sess->session_index);
-	spin_unlock(&sess_idr_lock);
 
 	kfree(sess->sess_ops);
 	sess->sess_ops = NULL;
diff --git a/drivers/target/iscsi/iscsi_target.h b/drivers/target/iscsi/iscsi_target.h
index 2c437cb..db52ddd 100644
--- a/drivers/target/iscsi/iscsi_target.h
+++ b/drivers/target/iscsi/iscsi_target.h
@@ -43,7 +43,6 @@ extern struct kmem_cache *lio_r2t_cache;
 
 extern struct idr sess_idr;
 extern struct mutex auth_id_lock;
-extern spinlock_t sess_idr_lock;
 
 
 #endif   /*** ISCSI_TARGET_H ***/
diff --git a/drivers/target/iscsi/iscsi_target_login.c b/drivers/target/iscsi/iscsi_target_login.c
index 4da42b0..176cd56 100644
--- a/drivers/target/iscsi/iscsi_target_login.c
+++ b/drivers/target/iscsi/iscsi_target_login.c
@@ -289,14 +289,7 @@ static int iscsi_login_zero_tsih_s1(
 	spin_lock_init(&sess->session_usage_lock);
 	spin_lock_init(&sess->ttt_lock);
 
-	idr_preload(GFP_KERNEL);
-	spin_lock_bh(&sess_idr_lock);
-	ret = idr_alloc(&sess_idr, NULL, GFP_NOWAIT);
-	if (ret >= 0)
-		sess->session_index = ret;
-	spin_unlock_bh(&sess_idr_lock);
-	idr_preload_end();
-
+	ret = idr_alloc(&sess_idr, NULL, GFP_KERNEL);
 	if (ret < 0) {
 		pr_err("idr_alloc() for sess_idr failed\n");
 		iscsit_tx_login_rsp(conn, ISCSI_STATUS_CLS_TARGET_ERR,
@@ -305,6 +298,7 @@ static int iscsi_login_zero_tsih_s1(
 		return -ENOMEM;
 	}
 
+	sess->session_index = ret;
 	sess->creation_time = get_jiffies_64();
 	spin_lock_init(&sess->session_stats_lock);
 	/*
@@ -1312,9 +1306,7 @@ new_sess_out:
 	if (conn->sess->se_sess)
 		transport_free_session(conn->sess->se_sess);
 	if (conn->sess->session_index != 0) {
-		spin_lock_bh(&sess_idr_lock);
 		idr_remove(&sess_idr, conn->sess->session_index);
-		spin_unlock_bh(&sess_idr_lock);
 	}
 	kfree(conn->sess->sess_ops);
 	kfree(conn->sess);
diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h
index e9ac882..964779a 100644
--- a/include/linux/cgroup.h
+++ b/include/linux/cgroup.h
@@ -622,7 +622,6 @@ struct cgroup_subsys {
 	struct list_head sibling;
 	/* used when use_id == true */
 	struct idr idr;
-	spinlock_t id_lock;
 
 	/* list of cftype_sets */
 	struct list_head cftsets;
diff --git a/include/net/sctp/sctp.h b/include/net/sctp/sctp.h
index d8e37ec..5a1ad3d 100644
--- a/include/net/sctp/sctp.h
+++ b/include/net/sctp/sctp.h
@@ -502,7 +502,6 @@ extern struct proto sctpv6_prot;
 void sctp_put_port(struct sock *sk);
 
 extern struct idr sctp_assocs_id;
-extern spinlock_t sctp_assocs_id_lock;
 
 /* Static inline functions. */
 
diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index 5bf4f97..b7d0be1 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -5393,9 +5393,7 @@ void free_css_id(struct cgroup_subsys *ss, struct cgroup_subsys_state *css)
 
 	rcu_assign_pointer(id->css, NULL);
 	rcu_assign_pointer(css->id, NULL);
-	spin_lock(&ss->id_lock);
 	idr_remove(&ss->idr, id->id);
-	spin_unlock(&ss->id_lock);
 	kfree_rcu(id, rcu_head);
 }
 EXPORT_SYMBOL_GPL(free_css_id);
@@ -5417,12 +5415,8 @@ static struct css_id *get_new_cssid(struct cgroup_subsys *ss, int depth)
 	if (!newid)
 		return ERR_PTR(-ENOMEM);
 
-	idr_preload(GFP_KERNEL);
-	spin_lock(&ss->id_lock);
 	/* Don't use 0. allocates an ID of 1-65535 */
-	ret = idr_alloc_range(&ss->idr, newid, 1, CSS_ID_MAX + 1, GFP_NOWAIT);
-	spin_unlock(&ss->id_lock);
-	idr_preload_end();
+	ret = idr_alloc_range(&ss->idr, newid, 1, CSS_ID_MAX + 1, GFP_KERNEL);
 
 	/* Returns error when there are no free spaces for new ID.*/
 	if (ret < 0)
@@ -5442,7 +5436,6 @@ static int __init_or_module cgroup_init_idr(struct cgroup_subsys *ss,
 {
 	struct css_id *newid;
 
-	spin_lock_init(&ss->id_lock);
 	idr_init(&ss->idr);
 
 	newid = get_new_cssid(ss, 0);
diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index aeb41bf..b97303a 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -1703,13 +1703,7 @@ static struct worker *create_worker(struct worker_pool *pool)
 	 * ID is needed to determine kthread name.  Allocate ID first
 	 * without installing the pointer.
 	 */
-	idr_preload(GFP_KERNEL);
-	spin_lock_irq(&pool->lock);
-
-	id = idr_alloc(&pool->worker_idr, NULL, GFP_NOWAIT);
-
-	spin_unlock_irq(&pool->lock);
-	idr_preload_end();
+	id = idr_alloc(&pool->worker_idr, NULL, GFP_KERNEL);
 	if (id < 0)
 		goto fail;
 
@@ -1750,18 +1744,13 @@ static struct worker *create_worker(struct worker_pool *pool)
 		worker->flags |= WORKER_UNBOUND;
 
 	/* successful, commit the pointer to idr */
-	spin_lock_irq(&pool->lock);
 	idr_replace(&pool->worker_idr, worker, worker->id);
-	spin_unlock_irq(&pool->lock);
 
 	return worker;
 
 fail:
-	if (id >= 0) {
-		spin_lock_irq(&pool->lock);
+	if (id >= 0)
 		idr_remove(&pool->worker_idr, id);
-		spin_unlock_irq(&pool->lock);
-	}
 	kfree(worker);
 	return NULL;
 }
diff --git a/net/9p/util.c b/net/9p/util.c
index cc31c62..23693d6 100644
--- a/net/9p/util.c
+++ b/net/9p/util.c
@@ -41,7 +41,6 @@
  */
 
 struct p9_idpool {
-	spinlock_t lock;
 	struct idr pool;
 };
 
@@ -58,7 +57,6 @@ struct p9_idpool *p9_idpool_create(void)
 	if (!p)
 		return ERR_PTR(-ENOMEM);
 
-	spin_lock_init(&p->lock);
 	idr_init(&p->pool);
 
 	return p;
@@ -88,16 +86,9 @@ EXPORT_SYMBOL(p9_idpool_destroy);
 int p9_idpool_get(struct p9_idpool *p)
 {
 	int i;
-	unsigned long flags;
-
-	idr_preload(GFP_NOFS);
-	spin_lock_irqsave(&p->lock, flags);
 
 	/* no need to store exactly p, we just need something non-null */
-	i = idr_alloc(&p->pool, p, GFP_NOWAIT);
-
-	spin_unlock_irqrestore(&p->lock, flags);
-	idr_preload_end();
+	i = idr_alloc(&p->pool, p, GFP_NOFS);
 	if (i < 0)
 		return -1;
 
@@ -117,13 +108,9 @@ EXPORT_SYMBOL(p9_idpool_get);
 
 void p9_idpool_put(int id, struct p9_idpool *p)
 {
-	unsigned long flags;
-
 	p9_debug(P9_DEBUG_MUX, " id %d pool %p\n", id, p);
 
-	spin_lock_irqsave(&p->lock, flags);
 	idr_remove(&p->pool, id);
-	spin_unlock_irqrestore(&p->lock, flags);
 }
 EXPORT_SYMBOL(p9_idpool_put);
 
diff --git a/net/sctp/associola.c b/net/sctp/associola.c
index bce5b79..d88fc6d 100644
--- a/net/sctp/associola.c
+++ b/net/sctp/associola.c
@@ -464,11 +464,8 @@ static void sctp_association_destroy(struct sctp_association *asoc)
 	sctp_endpoint_put(asoc->ep);
 	sock_put(asoc->base.sk);
 
-	if (asoc->assoc_id != 0) {
-		spin_lock_bh(&sctp_assocs_id_lock);
+	if (asoc->assoc_id != 0)
 		idr_remove(&sctp_assocs_id, asoc->assoc_id);
-		spin_unlock_bh(&sctp_assocs_id_lock);
-	}
 
 	WARN_ON(atomic_read(&asoc->rmem_alloc));
 
@@ -1572,21 +1569,14 @@ int sctp_assoc_lookup_laddr(struct sctp_association *asoc,
 /* Set an association id for a given association */
 int sctp_assoc_set_id(struct sctp_association *asoc, gfp_t gfp)
 {
-	bool preload = gfp & __GFP_WAIT;
 	int ret;
 
 	/* If the id is already assigned, keep it. */
 	if (asoc->assoc_id)
 		return 0;
 
-	if (preload)
-		idr_preload(gfp);
-	spin_lock_bh(&sctp_assocs_id_lock);
 	/* 0 is not a valid assoc_id, must be >= 1 */
-	ret = idr_alloc_cyclic(&sctp_assocs_id, asoc, 1, 0, GFP_NOWAIT);
-	spin_unlock_bh(&sctp_assocs_id_lock);
-	if (preload)
-		idr_preload_end();
+	ret = idr_alloc_cyclic(&sctp_assocs_id, asoc, 1, 0, gfp);
 	if (ret < 0)
 		return ret;
 
diff --git a/net/sctp/protocol.c b/net/sctp/protocol.c
index 4a17494d..7dc468a 100644
--- a/net/sctp/protocol.c
+++ b/net/sctp/protocol.c
@@ -71,7 +71,6 @@
 struct sctp_globals sctp_globals __read_mostly;
 
 struct idr sctp_assocs_id;
-DEFINE_SPINLOCK(sctp_assocs_id_lock);
 
 static struct sctp_pf *sctp_pf_inet6_specific;
 static struct sctp_pf *sctp_pf_inet_specific;
diff --git a/net/sctp/socket.c b/net/sctp/socket.c
index c6670d2..eb68192 100644
--- a/net/sctp/socket.c
+++ b/net/sctp/socket.c
@@ -226,9 +226,7 @@ struct sctp_association *sctp_id2assoc(struct sock *sk, sctp_assoc_t id)
 	if (!id || (id == (sctp_assoc_t)-1))
 		return NULL;
 
-	spin_lock_bh(&sctp_assocs_id_lock);
 	asoc = (struct sctp_association *)idr_find(&sctp_assocs_id, (int)id);
-	spin_unlock_bh(&sctp_assocs_id_lock);
 
 	if (!asoc || (asoc->base.sk != sk) || asoc->base.dead)
 		return NULL;
-- 
1.8.4.rc1

^ permalink raw reply related	[flat|nested] 46+ messages in thread

* [PATCH 10/10] idr: Rework idr_preload()
  2013-08-07 17:34 IDA/IDR rewrite, percpu ida Kent Overstreet
@ 2013-08-07 17:46   ` Kent Overstreet
  2013-08-07 17:34 ` [PATCH 04/10] idr: Percpu ida Kent Overstreet
                     ` (6 subsequent siblings)
  7 siblings, 0 replies; 46+ messages in thread
From: Kent Overstreet @ 2013-08-07 17:46 UTC (permalink / raw)
  To: akpm, linux-kernel
  Cc: Dmitry Torokhov, David Airlie, Davidlohr Bueso, Trond Myklebust,
	dri-devel, Sean Hefty, Michel Lespinasse, John McCutchan,
	Roland Dreier, Thomas Hellstrom, linux1394-devel, linux-scsi,
	Robert Love, linux-rdma, cluster-devel, Brian Paul, Doug Gilbert,
	Dave Airlie, Hal Rosenstock, Rik van Riel, Erez Shitrit,
	Steve Wise, Wo

The old idr_preload() used percpu buffers - since the
bitmap/radix/whatever tree only grew by fixed-size nodes, it only had
to ensure there was a node available in the percpu buffer and disable
preemption. This conveniently meant that you didn't have to pass the idr
you were going to allocate from to it.

With the new ida implementation, that doesn't work anymore - the new ida
code grows its bitmap tree by reallocating the entire thing in power of
two increments. Doh.

So we need a slightly different trick. Note that if all allocations from
an idr start by calling idr_preload() (which disables preemption), at
most num_possible_cpus() allocations can happen before someone calls
idr_preload() again.

So, we just change idr_preload() to resize the ida bitmap tree if
there are fewer than num_possible_cpus() ids available - conveniently, we
already track the number of allocated ids, and the total number of ids
we can have allocated is just nr_leaf_nodes * BITS_PER_LONG. Easy.
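
Roughly, the resize check looks like this - a sketch only, with
placeholder names (allocated_ids, ida_tree_capacity(), ida_grow()) for
things the real lib/idr.c code tracks differently, and assuming the idr
embeds the ida it allocates from:

	int idr_preload(struct idr *idr, unsigned start, gfp_t gfp)
	{
		struct ida *ida = &idr->ida;

		/* grow until num_possible_cpus() more ids are guaranteed
		 * to fit; capacity is nr_leaf_nodes * BITS_PER_LONG
		 * (handling of @start omitted here) */
		while (ida->allocated_ids + num_possible_cpus() >
		       ida_tree_capacity(ida))
			if (ida_grow(ida, gfp))
				return -ENOMEM;

		preempt_disable();
		return 0;
	}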

This does require a fairly trivial interface change - we now have to
pass the idr we're going to allocate from (and the starting id we're
going to pass to idr_alloc_range()) to idr_preload(), so this patch
updates all the callers.
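
For callers, the conversion generally looks like the following
before/after sketch (my_idr, my_lock and the start value of 1 are
placeholders, not taken from any particular driver):

	/* before: preload fills a per-cpu node buffer */
	idr_preload(GFP_KERNEL);
	spin_lock(&my_lock);
	id = idr_alloc_range(&my_idr, ptr, 1, 0, GFP_NOWAIT);
	spin_unlock(&my_lock);
	idr_preload_end();

	/* after: preload resizes my_idr itself for ids >= 1 */
	idr_preload(&my_idr, 1, GFP_KERNEL);
	spin_lock(&my_lock);
	id = idr_alloc_range(&my_idr, ptr, 1, 0, GFP_NOWAIT);
	spin_unlock(&my_lock);
	idr_preload_end();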

Signed-off-by: Kent Overstreet <kmo@daterainc.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: Stefan Richter <stefanr@s5r6.in-berlin.de>
Cc: David Airlie <airlied@linux.ie>
Cc: Roland Dreier <roland@kernel.org>
Cc: Sean Hefty <sean.hefty@intel.com>
Cc: Hal Rosenstock <hal.rosenstock@gmail.com>
Cc: Steve Wise <swise@chelsio.com>
Cc: Hoang-Nam Nguyen <hnguyen@de.ibm.com>
Cc: Christoph Raisch <raisch@de.ibm.com>
Cc: Mike Marciniszyn <infinipath@intel.com>
Cc: Doug Gilbert <dgilbert@interlog.com>
Cc: "James E.J. Bottomley" <JBottomley@parallels.com>
Cc: Christine Caulfield <ccaulfie@redhat.com>
Cc: David Teigland <teigland@redhat.com>
Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: John McCutchan <john@johnmccutchan.com>
Cc: Robert Love <rlove@rlove.org>
Cc: Eric Paris <eparis@parisplace.org>
Cc: Dave Airlie <airlied@redhat.com>
Cc: Thomas Hellstrom <thellstrom@vmware.com>
Cc: Brian Paul <brianp@vmware.com>
Cc: Maarten Lankhorst <maarten.lankhorst@canonical.com>
Cc: Dmitry Torokhov <dtor@vmware.com>
Cc: Erez Shitrit <erezsh@mellanox.co.il>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Haggai Eran <haggaie@mellanox.com>
Cc: Jack Morgenstein <jackm@dev.mellanox.co.il>
Cc: Wolfram Sang <wolfram@the-dreams.de>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Davidlohr Bueso <davidlohr.bueso@hp.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Michel Lespinasse <walken@google.com>
Cc: linux1394-devel@lists.sourceforge.net
Cc: linux-kernel@vger.kernel.org
Cc: dri-devel@lists.freedesktop.org
Cc: linux-rdma@vger.kernel.org
Cc: linux-scsi@vger.kernel.org
Cc: cluster-devel@redhat.com
Cc: linux-nfs@vger.kernel.org
---
 drivers/firewire/core-cdev.c               |  2 +-
 drivers/gpu/drm/drm_gem.c                  |  4 +-
 drivers/gpu/drm/vmwgfx/vmwgfx_resource.c   |  2 +-
 drivers/infiniband/core/cm.c               |  8 +---
 drivers/infiniband/core/sa_query.c         |  2 +-
 drivers/infiniband/core/uverbs_cmd.c       |  2 +-
 drivers/infiniband/hw/cxgb3/iwch.h         |  2 +-
 drivers/infiniband/hw/cxgb4/iw_cxgb4.h     |  2 +-
 drivers/infiniband/hw/ehca/ehca_cq.c       |  2 +-
 drivers/infiniband/hw/ehca/ehca_qp.c       |  2 +-
 drivers/infiniband/hw/ipath/ipath_driver.c |  2 +-
 drivers/infiniband/hw/mlx4/cm.c            |  2 +-
 drivers/infiniband/hw/qib/qib_init.c       |  2 +-
 drivers/scsi/sg.c                          |  2 +-
 fs/dlm/lock.c                              |  2 +-
 fs/dlm/recover.c                           |  2 +-
 fs/nfs/nfs4client.c                        |  2 +-
 fs/notify/inotify/inotify_user.c           |  2 +-
 include/linux/idr.h                        | 37 +----------------
 ipc/util.c                                 |  4 +-
 lib/idr.c                                  | 66 ++++++++++++++++++++++++++++++
 21 files changed, 91 insertions(+), 60 deletions(-)

diff --git a/drivers/firewire/core-cdev.c b/drivers/firewire/core-cdev.c
index ba78d08..08d31da 100644
--- a/drivers/firewire/core-cdev.c
+++ b/drivers/firewire/core-cdev.c
@@ -491,7 +491,7 @@ static int add_client_resource(struct client *client,
 	int ret;
 
 	if (preload)
-		idr_preload(gfp_mask);
+		idr_preload(&client->resource_idr, 0, gfp_mask);
 	spin_lock_irqsave(&client->lock, flags);
 
 	if (client->in_shutdown)
diff --git a/drivers/gpu/drm/drm_gem.c b/drivers/gpu/drm/drm_gem.c
index d12ea60..c8ed531 100644
--- a/drivers/gpu/drm/drm_gem.c
+++ b/drivers/gpu/drm/drm_gem.c
@@ -269,7 +269,7 @@ drm_gem_handle_create(struct drm_file *file_priv,
 	 * Get the user-visible handle using idr.  Preload and perform
 	 * allocation under our spinlock.
 	 */
-	idr_preload(GFP_KERNEL);
+	idr_preload(&file_priv->object_idr, 1, GFP_KERNEL);
 	spin_lock(&file_priv->table_lock);
 
 	ret = idr_alloc_range(&file_priv->object_idr, obj, 1, 0, GFP_NOWAIT);
@@ -445,7 +445,7 @@ drm_gem_flink_ioctl(struct drm_device *dev, void *data,
 	if (obj == NULL)
 		return -ENOENT;
 
-	idr_preload(GFP_KERNEL);
+	idr_preload(&dev->object_name_idr, 1, GFP_KERNEL);
 	spin_lock(&dev->object_name_lock);
 	if (!obj->name) {
 		ret = idr_alloc_range(&dev->object_name_idr,
diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_resource.c b/drivers/gpu/drm/vmwgfx/vmwgfx_resource.c
index 4838238..1078b51 100644
--- a/drivers/gpu/drm/vmwgfx/vmwgfx_resource.c
+++ b/drivers/gpu/drm/vmwgfx/vmwgfx_resource.c
@@ -177,7 +177,7 @@ int vmw_resource_alloc_id(struct vmw_resource *res)
 
 	BUG_ON(res->id != -1);
 
-	idr_preload(GFP_KERNEL);
+	idr_preload(idr, 1, GFP_KERNEL);
 	write_lock(&dev_priv->resource_lock);
 
 	ret = idr_alloc_range(idr, res, 1, 0, GFP_NOWAIT);
diff --git a/drivers/infiniband/core/cm.c b/drivers/infiniband/core/cm.c
index c686690..a11bb5e 100644
--- a/drivers/infiniband/core/cm.c
+++ b/drivers/infiniband/core/cm.c
@@ -383,15 +383,11 @@ static int cm_alloc_id(struct cm_id_private *cm_id_priv)
 {
 	unsigned long flags;
 	int id;
-	static int next_id;
 
-	idr_preload(GFP_KERNEL);
+	idr_preload(&cm.local_id_table, 0, GFP_KERNEL);
 	spin_lock_irqsave(&cm.lock, flags);
 
-	id = idr_alloc_range(&cm.local_id_table, cm_id_priv,
-			     next_id, 0, GFP_NOWAIT);
-	if (id >= 0)
-		next_id = max(id + 1, 0);
+	id = idr_alloc_cyclic(&cm.local_id_table, cm_id_priv, 0, 0, GFP_NOWAIT);
 
 	spin_unlock_irqrestore(&cm.lock, flags);
 	idr_preload_end();
diff --git a/drivers/infiniband/core/sa_query.c b/drivers/infiniband/core/sa_query.c
index ce757fd..b1ed7fd 100644
--- a/drivers/infiniband/core/sa_query.c
+++ b/drivers/infiniband/core/sa_query.c
@@ -616,7 +616,7 @@ static int send_mad(struct ib_sa_query *query, int timeout_ms, gfp_t gfp_mask)
 	int ret, id;
 
 	if (preload)
-		idr_preload(gfp_mask);
+		idr_preload(&query_idr, 0, gfp_mask);
 	spin_lock_irqsave(&idr_lock, flags);
 
 	id = idr_alloc(&query_idr, query, GFP_NOWAIT);
diff --git a/drivers/infiniband/core/uverbs_cmd.c b/drivers/infiniband/core/uverbs_cmd.c
index 9ddc2e0..0f20a27 100644
--- a/drivers/infiniband/core/uverbs_cmd.c
+++ b/drivers/infiniband/core/uverbs_cmd.c
@@ -125,7 +125,7 @@ static int idr_add_uobj(struct idr *idr, struct ib_uobject *uobj)
 {
 	int ret;
 
-	idr_preload(GFP_KERNEL);
+	idr_preload(idr, 0, GFP_KERNEL);
 	spin_lock(&ib_uverbs_idr_lock);
 
 	ret = idr_alloc(idr, uobj, GFP_NOWAIT);
diff --git a/drivers/infiniband/hw/cxgb3/iwch.h b/drivers/infiniband/hw/cxgb3/iwch.h
index f28c585..12e5f29 100644
--- a/drivers/infiniband/hw/cxgb3/iwch.h
+++ b/drivers/infiniband/hw/cxgb3/iwch.h
@@ -154,7 +154,7 @@ static inline int insert_handle(struct iwch_dev *rhp, struct idr *idr,
 {
 	int ret;
 
-	idr_preload(GFP_KERNEL);
+	idr_preload(idr, id, GFP_KERNEL);
 	spin_lock_irq(&rhp->lock);
 
 	ret = idr_alloc_range(idr, handle, id, id + 1, GFP_NOWAIT);
diff --git a/drivers/infiniband/hw/cxgb4/iw_cxgb4.h b/drivers/infiniband/hw/cxgb4/iw_cxgb4.h
index 50e5a3f..e6a5fc3 100644
--- a/drivers/infiniband/hw/cxgb4/iw_cxgb4.h
+++ b/drivers/infiniband/hw/cxgb4/iw_cxgb4.h
@@ -262,7 +262,7 @@ static inline int _insert_handle(struct c4iw_dev *rhp, struct idr *idr,
 	int ret;
 
 	if (lock) {
-		idr_preload(GFP_KERNEL);
+		idr_preload(idr, id, GFP_KERNEL);
 		spin_lock_irq(&rhp->lock);
 	}
 
diff --git a/drivers/infiniband/hw/ehca/ehca_cq.c b/drivers/infiniband/hw/ehca/ehca_cq.c
index a3632ee..3886f43 100644
--- a/drivers/infiniband/hw/ehca/ehca_cq.c
+++ b/drivers/infiniband/hw/ehca/ehca_cq.c
@@ -163,7 +163,7 @@ struct ib_cq *ehca_create_cq(struct ib_device *device, int cqe, int comp_vector,
 	adapter_handle = shca->ipz_hca_handle;
 	param.eq_handle = shca->eq.ipz_eq_handle;
 
-	idr_preload(GFP_KERNEL);
+	idr_preload(&ehca_cq_idr, 0, GFP_KERNEL);
 	write_lock_irqsave(&ehca_cq_idr_lock, flags);
 	my_cq->token = idr_alloc_range(&ehca_cq_idr, my_cq, 0,
 				       0x2000000, GFP_NOWAIT);
diff --git a/drivers/infiniband/hw/ehca/ehca_qp.c b/drivers/infiniband/hw/ehca/ehca_qp.c
index 758a265..4184133 100644
--- a/drivers/infiniband/hw/ehca/ehca_qp.c
+++ b/drivers/infiniband/hw/ehca/ehca_qp.c
@@ -636,7 +636,7 @@ static struct ehca_qp *internal_create_qp(
 		my_qp->send_cq =
 			container_of(init_attr->send_cq, struct ehca_cq, ib_cq);
 
-	idr_preload(GFP_KERNEL);
+	idr_preload(&ehca_qp_idr, 0, GFP_KERNEL);
 	write_lock_irqsave(&ehca_qp_idr_lock, flags);
 
 	ret = idr_alloc_range(&ehca_qp_idr, my_qp, 0, 0x2000000, GFP_NOWAIT);
diff --git a/drivers/infiniband/hw/ipath/ipath_driver.c b/drivers/infiniband/hw/ipath/ipath_driver.c
index 83a40a5..b241f42 100644
--- a/drivers/infiniband/hw/ipath/ipath_driver.c
+++ b/drivers/infiniband/hw/ipath/ipath_driver.c
@@ -201,7 +201,7 @@ static struct ipath_devdata *ipath_alloc_devdata(struct pci_dev *pdev)
 	}
 	dd->ipath_unit = -1;
 
-	idr_preload(GFP_KERNEL);
+	idr_preload(&unit_table, 0, GFP_KERNEL);
 	spin_lock_irqsave(&ipath_devs_lock, flags);
 
 	ret = idr_alloc(&unit_table, dd, GFP_NOWAIT);
diff --git a/drivers/infiniband/hw/mlx4/cm.c b/drivers/infiniband/hw/mlx4/cm.c
index d1f5f1d..ac089e6 100644
--- a/drivers/infiniband/hw/mlx4/cm.c
+++ b/drivers/infiniband/hw/mlx4/cm.c
@@ -219,7 +219,7 @@ id_map_alloc(struct ib_device *ibdev, int slave_id, u32 sl_cm_id)
 	ent->dev = to_mdev(ibdev);
 	INIT_DELAYED_WORK(&ent->timeout, id_map_ent_timeout);
 
-	idr_preload(GFP_KERNEL);
+	idr_preload(&sriov->pv_id_table, 0, GFP_KERNEL);
 	spin_lock(&to_mdev(ibdev)->sriov.id_map_lock);
 
 	ret = idr_alloc_cyclic(&sriov->pv_id_table, ent, 0, 0, GFP_NOWAIT);
diff --git a/drivers/infiniband/hw/qib/qib_init.c b/drivers/infiniband/hw/qib/qib_init.c
index 17adbd10c..e7101b2 100644
--- a/drivers/infiniband/hw/qib/qib_init.c
+++ b/drivers/infiniband/hw/qib/qib_init.c
@@ -1106,7 +1106,7 @@ struct qib_devdata *qib_alloc_devdata(struct pci_dev *pdev, size_t extra)
 	qib_dbg_ibdev_init(&dd->verbs_dev);
 #endif
 
-	idr_preload(GFP_KERNEL);
+	idr_preload(&qib_unit_table, 0, GFP_KERNEL);
 	spin_lock_irqsave(&qib_devs_lock, flags);
 
 	ret = idr_alloc(&qib_unit_table, dd, GFP_NOWAIT);
diff --git a/drivers/scsi/sg.c b/drivers/scsi/sg.c
index 23856c8..d226a64 100644
--- a/drivers/scsi/sg.c
+++ b/drivers/scsi/sg.c
@@ -1392,7 +1392,7 @@ static Sg_device *sg_alloc(struct gendisk *disk, struct scsi_device *scsidp)
 		return ERR_PTR(-ENOMEM);
 	}
 
-	idr_preload(GFP_KERNEL);
+	idr_preload(&sg_index_idr, 0, GFP_KERNEL);
 	write_lock_irqsave(&sg_index_lock, iflags);
 
 	error = idr_alloc_range(&sg_index_idr, sdp, 0, SG_MAX_DEVS, GFP_NOWAIT);
diff --git a/fs/dlm/lock.c b/fs/dlm/lock.c
index 75f0421..47edc23 100644
--- a/fs/dlm/lock.c
+++ b/fs/dlm/lock.c
@@ -1199,7 +1199,7 @@ static int create_lkb(struct dlm_ls *ls, struct dlm_lkb **lkb_ret)
 	mutex_init(&lkb->lkb_cb_mutex);
 	INIT_WORK(&lkb->lkb_cb_work, dlm_callback_work);
 
-	idr_preload(GFP_NOFS);
+	idr_preload(&ls->ls_lkbidr, 1, GFP_NOFS);
 	spin_lock(&ls->ls_lkbidr_spin);
 	rv = idr_alloc_range(&ls->ls_lkbidr, lkb, 1, 0, GFP_NOWAIT);
 	if (rv >= 0)
diff --git a/fs/dlm/recover.c b/fs/dlm/recover.c
index 2babe5e..757b7a6 100644
--- a/fs/dlm/recover.c
+++ b/fs/dlm/recover.c
@@ -307,7 +307,7 @@ static int recover_idr_add(struct dlm_rsb *r)
 	struct dlm_ls *ls = r->res_ls;
 	int rv;
 
-	idr_preload(GFP_NOFS);
+	idr_preload(&ls->ls_recover_idr, 1, GFP_NOFS);
 	spin_lock(&ls->ls_recover_idr_lock);
 	if (r->res_id) {
 		rv = -1;
diff --git a/fs/nfs/nfs4client.c b/fs/nfs/nfs4client.c
index dd8451d..14ab2da 100644
--- a/fs/nfs/nfs4client.c
+++ b/fs/nfs/nfs4client.c
@@ -30,7 +30,7 @@ static int nfs_get_cb_ident_idr(struct nfs_client *clp, int minorversion)
 
 	if (clp->rpc_ops->version != 4 || minorversion != 0)
 		return ret;
-	idr_preload(GFP_KERNEL);
+	idr_preload(&nn->cb_ident_idr, 0, GFP_KERNEL);
 	spin_lock(&nn->nfs_client_lock);
 	ret = idr_alloc(&nn->cb_ident_idr, clp, GFP_NOWAIT);
 	if (ret >= 0)
diff --git a/fs/notify/inotify/inotify_user.c b/fs/notify/inotify/inotify_user.c
index 60f954a..c6bcf73 100644
--- a/fs/notify/inotify/inotify_user.c
+++ b/fs/notify/inotify/inotify_user.c
@@ -360,7 +360,7 @@ static int inotify_add_to_idr(struct idr *idr, spinlock_t *idr_lock,
 {
 	int ret;
 
-	idr_preload(GFP_KERNEL);
+	idr_preload(idr, 1, GFP_KERNEL);
 	spin_lock(idr_lock);
 
 	ret = idr_alloc_cyclic(idr, i_mark, 1, 0, GFP_NOWAIT);
diff --git a/include/linux/idr.h b/include/linux/idr.h
index 85355d7..418d87c 100644
--- a/include/linux/idr.h
+++ b/include/linux/idr.h
@@ -177,6 +177,7 @@ int idr_for_each(struct idr *idr,
 		 int (*fn)(int id, void *p, void *data), void *data);
 void *idr_replace(struct idr *idr, void *ptr, unsigned id);
 void idr_remove(struct idr *idr, unsigned id);
+int idr_preload(struct idr *idr, unsigned start, gfp_t gfp);
 int idr_alloc_range(struct idr *idr, void *ptr, unsigned start,
 		    unsigned end, gfp_t gfp);
 int idr_alloc_cyclic(struct idr *idr, void *ptr, unsigned start,
@@ -197,41 +198,7 @@ static inline int idr_alloc(struct idr *idr, void *ptr, gfp_t gfp)
  */
 static inline void idr_preload_end(void)
 {
-	radix_tree_preload_end();
-}
-
-/**
- * idr_preload - preload for idr_alloc_range()
- * @gfp: allocation mask to use for preloading
- *
- * Preload per-cpu layer buffer for idr_alloc_range().  Can only be used from
- * process context and each idr_preload() invocation should be matched with
- * idr_preload_end().  Note that preemption is disabled while preloaded.
- *
- * The first idr_alloc_range() in the preloaded section can be treated as if it
- * were invoked with @gfp_mask used for preloading.  This allows using more
- * permissive allocation masks for idrs protected by spinlocks.
- *
- * For example, if idr_alloc_range() below fails, the failure can be treated as
- * if idr_alloc_range() were called with GFP_KERNEL rather than GFP_NOWAIT.
- *
- *	idr_preload(GFP_KERNEL);
- *	spin_lock(lock);
- *
- *	id = idr_alloc_range(idr, ptr, start, end, GFP_NOWAIT);
- *
- *	spin_unlock(lock);
- *	idr_preload_end();
- *	if (id < 0)
- *		error;
- */
-static inline void idr_preload(gfp_t gfp)
-{
-	might_sleep_if(gfp & __GFP_WAIT);
-
-	/* Well this is horrible, but idr_preload doesn't return errors */
-	if (radix_tree_preload(gfp))
-		preempt_disable();
+	preempt_enable();
 }
 
 /* radix tree can't store NULL pointers, so we have to translate...  */
diff --git a/ipc/util.c b/ipc/util.c
index e31ecb8..d6453c1 100644
--- a/ipc/util.c
+++ b/ipc/util.c
@@ -261,7 +261,9 @@ int ipc_addid(struct ipc_ids* ids, struct kern_ipc_perm* new, int size)
 	if (ids->in_use >= size)
 		return -ENOSPC;
 
-	idr_preload(GFP_KERNEL);
+	idr_preload(&ids->ipcs_idr,
+		    (next_id < 0) ? 0 : ipcid_to_idx(next_id),
+		    GFP_KERNEL);
 
 	spin_lock_init(&new->lock);
 	new->deleted = 0;
diff --git a/lib/idr.c b/lib/idr.c
index 89ec59f..fb374c3 100644
--- a/lib/idr.c
+++ b/lib/idr.c
@@ -295,6 +295,23 @@ err:
 	return -ENOMEM;
 }
 
+static int ida_preload(struct ida *ida, unsigned start, gfp_t gfp)
+{
+	int ret = 0;
+	unsigned long flags;
+
+	spin_lock_irqsave(&ida->lock, flags);
+
+	while (!ret &&
+	       (ida->nodes - ida->first_leaf * BITS_PER_LONG <
+		start + ida->allocated_ids + num_possible_cpus()))
+		ret = __ida_resize(ida, gfp, &flags);
+
+	spin_unlock_irqrestore(&ida->lock, flags);
+
+	return ret;
+}
+
 /*
  * Ganged allocation - amortize locking and tree traversal for when we've got
  * another allocator (i.e. a percpu version) acting as a frontend to this code
@@ -1032,6 +1049,55 @@ void idr_remove(struct idr *idr, unsigned id)
 }
 EXPORT_SYMBOL(idr_remove);
 
+/**
+ * idr_preload - preload for idr_alloc_range()
+ * @idr: idr to ensure has room to allocate an id
+ * @start: value that will be passed to ida_alloc_range()
+ * @gfp: allocation mask to use for preloading
+ *
+ * On success, guarantees that one call of idr_alloc()/idr_alloc_range() won't
+ * fail. Returns with preemption disabled; use idr_preload_end() when
+ * finished.
+ *
+ * It's not required to check for failure if you're still checking for
+ * idr_alloc() failure.
+ *
+ * In order to guarantee idr_alloc() won't fail, all allocations from @idr must
+ * make use of idr_preload().
+ */
+int idr_preload(struct idr *idr, unsigned start, gfp_t gfp)
+{
+	int radix_ret, ida_ret = 0;
+
+	might_sleep_if(gfp & __GFP_WAIT);
+
+	while (1) {
+		radix_ret = radix_tree_preload(gfp);
+
+		/*
+		 * Well this is horrible, but radix_tree_preload() doesn't
+		 * disable preemption if it fails, and idr_preload() users don't
+		 * check for errors
+		 */
+		if (radix_ret)
+			preempt_disable();
+
+	/* if ida_preload with __GFP_WAIT failed, don't retry */
+		if (ida_ret)
+			break;
+
+		if (!ida_preload(&idr->ida, start, GFP_NOWAIT) ||
+		    !(gfp & __GFP_WAIT))
+			break;
+
+		radix_tree_preload_end();
+		ida_ret = ida_preload(&idr->ida, start, gfp);
+	}
+
+	return radix_ret ?: ida_ret;
+}
+EXPORT_SYMBOL(idr_preload);
+
 static int idr_insert(struct idr *idr, void *ptr, unsigned id,
 		      gfp_t gfp, unsigned long *flags)
 {
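
For reference, the calling convention is unchanged from the example in
the old idr_preload() comment being deleted above, apart from passing
the idr and start explicitly - roughly (my_idr, my_lock and ptr being
placeholders):

	idr_preload(&my_idr, 0, GFP_KERNEL);
	spin_lock(&my_lock);

	id = idr_alloc(&my_idr, ptr, GFP_NOWAIT);

	spin_unlock(&my_lock);
	idr_preload_end();
	if (id < 0)
		error;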
-- 
1.8.4.rc1

^ permalink raw reply related	[flat|nested] 46+ messages in thread

* Re: [PATCH 04/10] idr: Percpu ida
  2013-08-07 17:34 ` [PATCH 04/10] idr: Percpu ida Kent Overstreet
@ 2013-08-07 17:56   ` Christoph Lameter
  2013-08-07 18:33     ` Kent Overstreet
  0 siblings, 1 reply; 46+ messages in thread
From: Christoph Lameter @ 2013-08-07 17:56 UTC (permalink / raw)
  To: Kent Overstreet
  Cc: akpm, linux-kernel, Tejun Heo, Oleg Nesterov, Ingo Molnar,
	Andi Kleen, Jens Axboe, Nicholas A. Bellinger

On Wed, 7 Aug 2013, Kent Overstreet wrote:

> +{
> +	DEFINE_WAIT(wait);
> +	struct percpu_ida_cpu *tags;
> +	unsigned long flags;
> +	unsigned this_cpu;
> +	int tag;
> +
> +	local_irq_save(flags);

> +	this_cpu = smp_processor_id();
> +	tags = per_cpu_ptr(pool->tag_cpu, this_cpu);

tags = this_cpu_ptr(pool->tag_cpu);

> +		schedule();
> +
> +		local_irq_save(flags);
> +		this_cpu = smp_processor_id();
> +		tags = per_cpu_ptr(pool->tag_cpu, this_cpu);

And the same here.


> +void percpu_ida_free(struct percpu_ida *pool, unsigned tag)
> +{
> +	struct percpu_ida_cpu *tags;
> +	unsigned long flags;
> +	unsigned nr_free, this_cpu;
> +
> +	BUG_ON(tag >= pool->nr_tags);
> +
> +	local_irq_save(flags);
> +	this_cpu = smp_processor_id();
> +	tags = per_cpu_ptr(pool->tag_cpu, this_cpu);

And again



^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 04/10] idr: Percpu ida
  2013-08-07 17:56   ` Christoph Lameter
@ 2013-08-07 18:33     ` Kent Overstreet
  2013-08-07 19:40       ` Christoph Lameter
  0 siblings, 1 reply; 46+ messages in thread
From: Kent Overstreet @ 2013-08-07 18:33 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: akpm, linux-kernel, Tejun Heo, Oleg Nesterov, Ingo Molnar,
	Andi Kleen, Jens Axboe, Nicholas A. Bellinger

On Wed, Aug 07, 2013 at 05:56:34PM +0000, Christoph Lameter wrote:
> On Wed, 7 Aug 2013, Kent Overstreet wrote:
> 
> > +{
> > +	DEFINE_WAIT(wait);
> > +	struct percpu_ida_cpu *tags;
> > +	unsigned long flags;
> > +	unsigned this_cpu;
> > +	int tag;
> > +
> > +	local_irq_save(flags);
> 
> > +	this_cpu = smp_processor_id();
> > +	tags = per_cpu_ptr(pool->tag_cpu, this_cpu);
> 
> tags = this_cpu_ptr(pool->tag_cpu);

I was breaking it apart because I was using this_cpu elsewhere too - for
the bitmap of which cpus have non-empty freelists.

Or is this_cpu_ptr() doing something smarter than per_cpu_ptr(ptr,
smp_processor_id())? There are so many variants I'm not 100% sure they're
the same.

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 07/10] idr: Rename idr_alloc() -> idr_alloc_range()
  2013-08-07 17:46 ` [PATCH 07/10] idr: Rename idr_alloc() -> idr_alloc_range() Kent Overstreet
@ 2013-08-07 19:04   ` Wolfram Sang
  0 siblings, 0 replies; 46+ messages in thread
From: Wolfram Sang @ 2013-08-07 19:04 UTC (permalink / raw)
  To: Kent Overstreet
  Cc: Tatyana Nikolova, Linus Walleij, Paul Mackerras,
	Christoph Hellwig, Alon Levy, Davidlohr Bueso, Ohad Ben-Cohen,
	John W. Linville, Inki Dae, Alex Williamson, David Teigland,
	Joel Becker, Tim Gardner, Greg Kroah-Hartman, Seung-Woo Kim,
	linux-kernel, Li Zefan, Tejun Heo, Guo Chao, akpm, Andy Grover,
	Dmitry


On Wed, Aug 07, 2013 at 10:46:16AM -0700, Kent Overstreet wrote:
> Then also add an idr_alloc() wrapper, without the start/end arguments -
> for consistency with the ida interfaces.
> 
> Signed-off-by: Kent Overstreet <kmo@daterainc.com>

For my taste, the motivation of the change could be described in a bit
more detail.


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH 04/10] idr: Percpu ida
  2013-08-07 18:33     ` Kent Overstreet
@ 2013-08-07 19:40       ` Christoph Lameter
  2013-08-07 19:57         ` [PATCH] idr: Use this_cpu_ptr() for percpu_ida Kent Overstreet
  0 siblings, 1 reply; 46+ messages in thread
From: Christoph Lameter @ 2013-08-07 19:40 UTC (permalink / raw)
  To: Kent Overstreet
  Cc: akpm, linux-kernel, Tejun Heo, Oleg Nesterov, Ingo Molnar,
	Andi Kleen, Jens Axboe, Nicholas A. Bellinger

On Wed, 7 Aug 2013, Kent Overstreet wrote:

> I was breaking it apart because I was using this_cpu elsewhere too - for
> the bitmap of which cpus have non-empty freelists.

this_cpu can be retrieved with smp_processor_id().

> Or is this_cpu_ptr() doing something smarter than per_cpu_ptr(ptr,
> smp_processor_id())? There are so many variants I'm not 100% sure they're
> the same.

Yes it is. It uses a special register that contains the offset of this
cpu's per-cpu area instead of going through the table of all processor
offsets. It's less code.
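
Roughly, the two forms compute the same address; this_cpu_ptr() just
gets there via the percpu base register instead of indexing the offset
table - a sketch, reusing pool->tag_cpu from the patch:

	struct percpu_ida_cpu *tags;

	/* generic form: look up this cpu's offset in the table */
	tags = per_cpu_ptr(pool->tag_cpu, smp_processor_id());

	/* equivalent here, resolved through the percpu base register */
	tags = this_cpu_ptr(pool->tag_cpu);

Either way the caller needs preemption (or irqs, as in the patch)
disabled so the cpu can't change out from under it.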


^ permalink raw reply	[flat|nested] 46+ messages in thread

* [PATCH] idr: Use this_cpu_ptr() for percpu_ida
  2013-08-07 19:40       ` Christoph Lameter
@ 2013-08-07 19:57         ` Kent Overstreet
  2013-08-08 14:32           ` Christoph Lameter
  0 siblings, 1 reply; 46+ messages in thread
From: Kent Overstreet @ 2013-08-07 19:57 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: akpm, linux-kernel, Tejun Heo, Oleg Nesterov, Ingo Molnar,
	Andi Kleen, Jens Axboe, Nicholas A. Bellinger

On Wed, Aug 07, 2013 at 07:40:15PM +0000, Christoph Lameter wrote:
> On Wed, 7 Aug 2013, Kent Overstreet wrote:
> 
> > I was breaking it apart because I was using this_cpu elsewhere too - for
> > the bitmap of which cpus have non-empty freelists.
> 
> this_cpu can be retrieved with smp_processor_id().
> 
> > Or is this_cpu_ptr() doing something smarter than per_cpu_ptr(ptr,
> > smp_processor_id())? There are so many variants I'm not 100% sure they're
> > the same.
> 
> Yes it is. It uses a special register that contains the offset of this
> cpu's per-cpu area instead of going through the table of all processor
> offsets. It's less code.

Alright, well here's a fixup patch - untested for the moment though.

One thing that was bugging me - I was never able to figure out for sure
if smp_processor_id() returns a number in the range [0, nr_cpu_ids), at
least I couldn't find where it was documented - could you tell me if
that's true?

>From e2b8016de49c28c0ccbe7849d7254f005c7e2e77 Mon Sep 17 00:00:00 2001
From: Kent Overstreet <kmo@daterainc.com>
Date: Wed, 7 Aug 2013 12:52:58 -0700
Subject: [PATCH] idr: Use this_cpu_ptr() for percpu_ida


diff --git a/lib/idr.c b/lib/idr.c
index fb374c3..320ffea 100644
--- a/lib/idr.c
+++ b/lib/idr.c
@@ -748,12 +748,10 @@ int percpu_ida_alloc(struct percpu_ida *pool, gfp_t gfp)
 	DEFINE_WAIT(wait);
 	struct percpu_ida_cpu *tags;
 	unsigned long flags;
-	unsigned this_cpu;
 	int tag;
 
 	local_irq_save(flags);
-	this_cpu = smp_processor_id();
-	tags = per_cpu_ptr(pool->tag_cpu, this_cpu);
+	tags = this_cpu_ptr(pool->tag_cpu);
 
 	/* Fastpath */
 	tag = alloc_local_tag(pool, tags);
@@ -782,7 +780,8 @@ int percpu_ida_alloc(struct percpu_ida *pool, gfp_t gfp)
 		if (tags->nr_free) {
 			tag = tags->freelist[--tags->nr_free];
 			if (tags->nr_free)
-				set_bit(this_cpu, pool->cpus_have_tags);
+				set_bit(smp_processor_id(),
+					pool->cpus_have_tags);
 		}
 
 		spin_unlock(&pool->ida.lock);
@@ -794,8 +793,7 @@ int percpu_ida_alloc(struct percpu_ida *pool, gfp_t gfp)
 		schedule();
 
 		local_irq_save(flags);
-		this_cpu = smp_processor_id();
-		tags = per_cpu_ptr(pool->tag_cpu, this_cpu);
+		tags = this_cpu_ptr(pool->tag_cpu);
 	}
 
 	finish_wait(&pool->wait, &wait);
@@ -814,13 +812,12 @@ void percpu_ida_free(struct percpu_ida *pool, unsigned tag)
 {
 	struct percpu_ida_cpu *tags;
 	unsigned long flags;
-	unsigned nr_free, this_cpu;
+	unsigned nr_free;
 
 	BUG_ON(tag >= pool->nr_tags);
 
 	local_irq_save(flags);
-	this_cpu = smp_processor_id();
-	tags = per_cpu_ptr(pool->tag_cpu, this_cpu);
+	tags = this_cpu_ptr(pool->tag_cpu);
 
 	spin_lock(&tags->lock);
 	tags->freelist[tags->nr_free++] = tag;
@@ -829,7 +826,8 @@ void percpu_ida_free(struct percpu_ida *pool, unsigned tag)
 	spin_unlock(&tags->lock);
 
 	if (nr_free == 1) {
-		set_bit(this_cpu, pool->cpus_have_tags);
+		set_bit(smp_processor_id(),
+			pool->cpus_have_tags);
 		wake_up(&pool->wait);
 	}
 

^ permalink raw reply related	[flat|nested] 46+ messages in thread

* Re: [PATCH 03/10] idr: Rewrite ida
  2013-08-07 17:34 ` [PATCH 03/10] idr: Rewrite ida Kent Overstreet
@ 2013-08-07 20:22   ` Tejun Heo
  2013-08-07 20:51     ` [PATCH] idr: Document ida tree sections Kent Overstreet
  0 siblings, 1 reply; 46+ messages in thread
From: Tejun Heo @ 2013-08-07 20:22 UTC (permalink / raw)
  To: Kent Overstreet; +Cc: akpm, linux-kernel, Stephen Rothwell, Fengguang Wu

Hello, Kent.

On Wed, Aug 07, 2013 at 10:34:58AM -0700, Kent Overstreet wrote:
> + * So for 1 mb of memory (and allocating more than that should be fine with
> + * CONFIG_COMPACTION) you get slightly under 8 million IDs.

Nothing seems to explain the section thing.  This is broken up now,
right?  Where's the documentation?

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 46+ messages in thread

* [PATCH] idr: Document ida tree sections
  2013-08-07 20:22   ` Tejun Heo
@ 2013-08-07 20:51     ` Kent Overstreet
  2013-08-09 14:57       ` Tejun Heo
  0 siblings, 1 reply; 46+ messages in thread
From: Kent Overstreet @ 2013-08-07 20:51 UTC (permalink / raw)
  To: Tejun Heo; +Cc: akpm, linux-kernel, Stephen Rothwell, Fengguang Wu

On Wed, Aug 07, 2013 at 04:22:01PM -0400, Tejun Heo wrote:
> Hello, Kent.
> 
> On Wed, Aug 07, 2013 at 10:34:58AM -0700, Kent Overstreet wrote:
> > + * So for 1 mb of memory (and allocating more than that should be fine with
> > + * CONFIG_COMPACTION) you get slightly under 8 million IDs.
> 
> Nothing seems to explain the section thing.  This is broken up now,
> right?  Where's the documentation?

Whoops, yes. As usual with the documentation...

Here's a fixup patch for that:

>From c24de588c5f31fa77fb8fcbf4c457b32062fee0c Mon Sep 17 00:00:00 2001
From: Kent Overstreet <kmo@daterainc.com>
Date: Wed, 7 Aug 2013 13:50:42 -0700
Subject: [PATCH] idr: Document ida tree sections


diff --git a/lib/idr.c b/lib/idr.c
index 320ffea..02a221c 100644
--- a/lib/idr.c
+++ b/lib/idr.c
@@ -72,18 +72,37 @@ static void *kgalloc(size_t size, gfp_t gfp)
  * the bit for id i is bit id % BITS_PER_LONG in ida->tree[ida->first_leaf + i /
  * BITS_PER_LONG].
  *
- * Note that the number of ids we can allocate is limited by the amount of
- * memory we can contiguously allocate. The amount of memory used for the bitmap
- * tree is only slightly more than a flat bitmap would use - about 1 / TREE_ARY
- * * (sizeof flat bitmap).
+ * That last line of code is a lie - logically, the data structure is one flat
+ * array - but to avoid giant contiguous allocations we use an array of arrays -
+ * ida_index_to_node() replaces the array lookup in the above example.
  *
- * So for 1 mb of memory (and allocating more than that should be fine with
- * CONFIG_COMPACTION) you get slightly under 8 million IDs.
+ * So ida->tree is an array of pointers to sections, where the sections are
+ * different segments of the array the bitmap tree lives in.
+ *
+ * If there's a single section, it's only as big as we need it to be, and we
+ * grow the bitmap tree by doubling the size of the allocation.
+ *
+ * Once the tree is big enough that we start using multiple sections, the
+ * sections are always the same size - the max section size - and we grow the
+ * tree by appending new sections.
+ *
+ * The maximum size of the bitmap tree is when we've allocated all the way up to
+ * INT_MAX ids; we need (INT_MAX / 8) bytes of memory for the leaves, plus a
+ * couple percent for the parent nodes (since TREE_ARY == BITS_PER_LONG the
+ * parent nodes only add around 2%).
+ *
+ * So that's ~256 mb of memory max; we pick the max section size such that the
+ * max size of the array of pointers to sections isn't any bigger than the max
+ * section size.
+ *
+ * So if the max section size is 64k, that's ~4096 sections, with 8 byte
+ * pointers that's a little over 32k for the pointers to sections.
+ *
+ * That means max size sections are order 4 page allocations.
  */
 
 #define IDA_TREE_ARY		BITS_PER_LONG
-#define IDA_ALLOC_ORDER_MAX	4
-#define IDA_SECTION_SIZE	(PAGE_SIZE << IDA_ALLOC_ORDER_MAX)
+#define IDA_SECTION_SIZE	(64UL << 10)
 #define IDA_NODES_PER_SECTION	(IDA_SECTION_SIZE / sizeof(unsigned long))
 
 static inline unsigned long *ida_index_to_node(struct ida *ida, unsigned node)
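
For reference, given the layout described above, ida_index_to_node()
comes down to a two-level array lookup along these lines (a sketch
reconstructed from the description, not quoted from the patch):

	static inline unsigned long *ida_index_to_node(struct ida *ida,
						       unsigned node)
	{
		return ida->tree[node / IDA_NODES_PER_SECTION] +
			node % IDA_NODES_PER_SECTION;
	}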

^ permalink raw reply related	[flat|nested] 46+ messages in thread

* Re: [PATCH] idr: Use this_cpu_ptr() for percpu_ida
  2013-08-07 19:57         ` [PATCH] idr: Use this_cpu_ptr() for percpu_ida Kent Overstreet
@ 2013-08-08 14:32           ` Christoph Lameter
  2013-08-20 21:19             ` Nicholas A. Bellinger
  0 siblings, 1 reply; 46+ messages in thread
From: Christoph Lameter @ 2013-08-08 14:32 UTC (permalink / raw)
  To: Kent Overstreet
  Cc: akpm, linux-kernel, Tejun Heo, Oleg Nesterov, Ingo Molnar,
	Andi Kleen, Jens Axboe, Nicholas A. Bellinger

On Wed, 7 Aug 2013, Kent Overstreet wrote:

> One thing that was bugging me - I was never able to figure out for sure
> if smp_processor_id() returns a number in the range [0, nr_cpu_ids), at
> least I couldn't find where it was documented - could you tell me if
> that's true?

I always assumed that it was in the range 0 ... nr_cpu_ids - 1 and that is
the assumption under which the kernel code was written. Things would break
horribly if smp_processor_id() returned nr_cpu_ids or higher.
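
That's also the assumption the percpu ida code leans on when it sizes
and indexes its bitmap of cpus - a minimal sketch, with names borrowed
from the patch:

	unsigned long *cpus_have_tags;

	/* one bit per possible cpu, ids in [0, nr_cpu_ids) */
	cpus_have_tags = kzalloc(BITS_TO_LONGS(nr_cpu_ids) *
				 sizeof(unsigned long), GFP_KERNEL);
	if (cpus_have_tags)
		/* safe only because smp_processor_id() < nr_cpu_ids */
		set_bit(smp_processor_id(), cpus_have_tags);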


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH] idr: Document ida tree sections
  2013-08-07 20:51     ` [PATCH] idr: Document ida tree sections Kent Overstreet
@ 2013-08-09 14:57       ` Tejun Heo
  2013-08-13 22:13         ` Kent Overstreet
  2013-08-13 22:33         ` Kent Overstreet
  0 siblings, 2 replies; 46+ messages in thread
From: Tejun Heo @ 2013-08-09 14:57 UTC (permalink / raw)
  To: Kent Overstreet; +Cc: akpm, linux-kernel, Stephen Rothwell, Fengguang Wu

Hello,

On Wed, Aug 07, 2013 at 01:51:17PM -0700, Kent Overstreet wrote:
> + * So if the max section size is 64k, that's ~4096 sections, with 8 byte
> + * pointers that's a little over 32k for the pointers to sections.
> + *
> + * That means max size sections are order 4 page allocations.

Order 4 allocations for a common data structure don't really sound
like a good idea to me.  It's gonna work fine on relatively large
machines but suck on mobile / small embedded devices, many of which
are still struggling with 32bit address space and compaction may not
be enabled.  It just doesn't make sense to me to impose 64k
allocations from a low level library function like ida.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH] idr: Document ida tree sections
  2013-08-09 14:57       ` Tejun Heo
@ 2013-08-13 22:13         ` Kent Overstreet
  2013-08-13 22:19           ` Tejun Heo
  2013-08-13 22:33         ` Kent Overstreet
  1 sibling, 1 reply; 46+ messages in thread
From: Kent Overstreet @ 2013-08-13 22:13 UTC (permalink / raw)
  To: Tejun Heo; +Cc: akpm, linux-kernel, Stephen Rothwell, Fengguang Wu

On Fri, Aug 09, 2013 at 10:57:56AM -0400, Tejun Heo wrote:
> Hello,
> 
> On Wed, Aug 07, 2013 at 01:51:17PM -0700, Kent Overstreet wrote:
> > + * So if the max section size is 64k, that's ~4096 sections, with 8 byte
> > + * pointers that's a little over 32k for the pointers to sections.
> > + *
> > + * That means max size sections are order 4 page allocations.
> 
> Order 4 allocations for a common data structure don't really sound
> like a good idea to me.  It's gonna work fine on relatively large
> machines but suck on mobile / small embedded devices, many of which
> are still struggling with 32bit address space and compaction may not
> be enabled.  It just doesn't make sense to me to impose 64k
> allocations from a low level library function like ida.

I have a hard time seeing how it could really be an issue in practice -
keep in mind, for every bit in the ida tree we're going to have some
struct allocated somewhere else that the id corresponds to.

So for a 4k ida, that's... bare minimum around 128k memory that has to
be allocated somewhere else, and in a single subsystem, assuming 16 byte
structs - and typically it'll be many times that. 32k memory in the ida
-> 2 mb in the subsystem, absolute minimum.

If you're convinced this is a real issue though - how about
IDA_SECTION_SIZE conditional on CONFIG_COMPACTION, so we use order 2 or
3 allocations if CONFIG_COMPACTION=n?

Then the max size toplevel array of pointers to segments would be
bigger, but that's only an issue when we're allocating up to near
INT_MAX ids, so it's difficult to see how _that_ would be an issue on a
small/embedded system... and we could even use vmalloc for that
allocation when the size of that array is > IDA_SECTION_SIZE.

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH] idr: Document ida tree sections
  2013-08-13 22:13         ` Kent Overstreet
@ 2013-08-13 22:19           ` Tejun Heo
  2013-08-13 22:27             ` Kent Overstreet
  0 siblings, 1 reply; 46+ messages in thread
From: Tejun Heo @ 2013-08-13 22:19 UTC (permalink / raw)
  To: Kent Overstreet; +Cc: akpm, linux-kernel, Stephen Rothwell, Fengguang Wu

Hello,

On Tue, Aug 13, 2013 at 03:13:08PM -0700, Kent Overstreet wrote:
> If you're convinced this is a real issue though - how about

It is a real issue.  Large order allocation is fine for optimization
but shouldn't be depended upon.  It does fail easily without
compaction, and compaction is a heavy-ass operation which will blow up
any minute performance advantage you might get from avoiding a proper
radix tree implementation.

> IDA_SECTION_SIZE conditional on CONFIG_COMPACTION, so we use order 2 or
> 3 allocations if CONFIG_COMPACTION=n?
> 
> Then the max size toplevel array of pointers to segments would be
> bigger, but that's only an issue when we're allocating up to near
> INT_MAX ids, so it's difficult to see how _that_ would be an issue on a
> small/embedded system... and we could even use vmalloc for that
> allocation when the size of that array is > IDA_SECTION_SIZE.

What about cyclic allocations then?  This is naturally a radix tree
problem.  I don't know why you're resisting a radix tree so much here.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH] idr: Document ida tree sections
  2013-08-13 22:19           ` Tejun Heo
@ 2013-08-13 22:27             ` Kent Overstreet
  2013-08-13 22:44               ` Tejun Heo
  0 siblings, 1 reply; 46+ messages in thread
From: Kent Overstreet @ 2013-08-13 22:27 UTC (permalink / raw)
  To: Tejun Heo; +Cc: akpm, linux-kernel, Stephen Rothwell, Fengguang Wu

On Tue, Aug 13, 2013 at 06:19:28PM -0400, Tejun Heo wrote:
> Hello,
> 
> On Tue, Aug 13, 2013 at 03:13:08PM -0700, Kent Overstreet wrote:
> > If you're convinced this is a real issue though - how about
> 
> It is a real issue.  Large order allocation is fine for optimization
> but shouldn't be depended upon.  It does fail easily without
> compaction, and compaction is a heavy-ass operation which will blow up
> any minute performance advantage you might get from avoiding a proper
> radix tree implementation.
> 
> > IDA_SECTION_SIZE conditional on CONFIG_COMPACTION, so we use order 2 or
> > 3 allocations if CONFIG_COMPACTION=n?
> > 
> > Then the max size toplevel array of pointers to segments would be
> > bigger, but that's only an issue when we're allocating up to near
> > INT_MAX ids, so it's difficult to see how _that_ would be an issue on a
> > small/embedded system... and we could even use vmalloc for that
> > allocation when the size of that array is > IDA_SECTION_SIZE.
> 
> What about cyclic allocations then?  This is naturally a radix tree
> problem.  I don't know why you're resisting a radix tree so much here.

It's only naturally a radix tree problem _if_ you require sparseness.
Otherwise, radix trees require pointer chasing, which we can avoid -
which saves us both the cost of chasing pointers (which is significant)
and the overhead of storing them.

The patch handles cyclic allocation by limiting sparseness - we talked
about this and I thought you were ok with this solution, though it was
awhile ago and I could be misremembering your comments.

To recap, here's the code that implements that sparseness limiting; it's
documented in ida_alloc_cyclic()'s docs:

static int __ida_alloc_cyclic(struct ida *ida, unsigned start, unsigned end,
			      gfp_t gfp, unsigned long *flags)
	__releases(&ida->lock)
	__acquires(&ida->lock)
{
	int ret;
	unsigned id;

	ret = __ida_alloc_range_multiple(ida, &id, 1,
					 max(start, ida->cur_id),
					 end, gfp, flags);

	if (ret < 0)
		ret = __ida_alloc_range_multiple(ida, &id, 1, start,
						 end, gfp, flags);
	if (ret == 1) {
		ida->cur_id = id + 1;
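		/*
		 * Sparseness limit: wrap the cursor back to 0 once it's
		 * advanced more than twice max(1024, allocated_ids) past
		 * start, so a long-lived cyclic ida can't keep growing
		 * the tree to cover mostly-free id space.
		 */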
		if ((ida->cur_id - start) / 2 > max(1024U, ida->allocated_ids))
			ida->cur_id = 0;

		return id;
	}

	return ret;
}

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH] idr: Document ida tree sections
  2013-08-09 14:57       ` Tejun Heo
  2013-08-13 22:13         ` Kent Overstreet
@ 2013-08-13 22:33         ` Kent Overstreet
  1 sibling, 0 replies; 46+ messages in thread
From: Kent Overstreet @ 2013-08-13 22:33 UTC (permalink / raw)
  To: Tejun Heo; +Cc: akpm, linux-kernel, Stephen Rothwell, Fengguang Wu

On Fri, Aug 09, 2013 at 10:57:56AM -0400, Tejun Heo wrote:
> Hello,
> 
> On Wed, Aug 07, 2013 at 01:51:17PM -0700, Kent Overstreet wrote:
> > + * So if the max section size is 64k, that's ~4096 sections, with 8 byte
> > + * pointers that's a little over 32k for the pointers to sections.
> > + *
> > + * That means max size sections are order 4 page allocations.
> 
> Order 4 allocations for a common data structure don't really sound
> like a good idea to me.  It's gonna work fine on relatively large
> machines but suck on mobile / small embedded devices, many of which
> are still struggling with 32bit address space and compaction may not
> be enabled.  It just doesn't make sense to me to impose 64k
> allocations from a low level library function like ida.

Would this be an acceptable solution?

>From 483cfa0c809b7dc3b0abad93407468f273416578 Mon Sep 17 00:00:00 2001
From: Kent Overstreet <kmo@daterainc.com>
Date: Tue, 13 Aug 2013 15:31:20 -0700
Subject: [PATCH] ida: Use order 2 allocations when COMPACTION=n

And fall back to vmalloc for the array of pointers to sections so we can
still allocate up to INT_MAX ids.

diff --git a/lib/idr.c b/lib/idr.c
index 02a221c..3bffb52 100644
--- a/lib/idr.c
+++ b/lib/idr.c
@@ -21,16 +21,20 @@
 #include <linux/export.h>
 #include <linux/idr.h>
 #include <linux/kernel.h>
+#include <linux/mm.h>
 #include <linux/percpu.h>
 #include <linux/rcupdate.h>
 #include <linux/sched.h>
 #include <linux/slab.h>
 #include <linux/string.h>
 #include <linux/spinlock.h>
+#include <linux/vmalloc.h>
 
 static void kgfree(void *ptr, size_t size)
 {
-	if (size < PAGE_SIZE)
+	if (is_vmalloc_addr(ptr))
+		vfree(ptr);
+	else if (size < PAGE_SIZE)
 		kfree(ptr);
 	else
 		free_pages((unsigned long) ptr, get_order(size));
@@ -102,7 +106,14 @@ static void *kgalloc(size_t size, gfp_t gfp)
  */
 
 #define IDA_TREE_ARY		BITS_PER_LONG
-#define IDA_SECTION_SIZE	(64UL << 10)
+
+/* Max section size, in bytes */
+#ifdef CONFIG_COMPACTION
+#define IDA_SECTION_SIZE	(64UL << 10) /* order 4 page allocation */
+#else
+#define IDA_SECTION_SIZE	(16UL << 10) /* order 2 */
+#endif
+
 #define IDA_NODES_PER_SECTION	(IDA_SECTION_SIZE / sizeof(unsigned long))
 
 static inline unsigned long *ida_index_to_node(struct ida *ida, unsigned node)
@@ -251,8 +262,15 @@ again:
 
 	if (ida->nodes >= IDA_NODES_PER_SECTION &&
 	    is_power_of_2(cur_sections)) {
-		sections = kgalloc(cur_sections * 2 * sizeof(unsigned long *),
-				   __GFP_ZERO|gfp);
+		size_t bytes = cur_sections * 2 * sizeof(unsigned long *);
+
+		if (bytes <= IDA_SECTION_SIZE)
+			sections = kgalloc(bytes, __GFP_ZERO|gfp);
+		else if (gfp & GFP_KERNEL)
+			sections = vzalloc(bytes);
+		else
+			sections = NULL;
+
 		if (!sections)
 			goto err;
 	}

^ permalink raw reply related	[flat|nested] 46+ messages in thread

* Re: [PATCH] idr: Document ida tree sections
  2013-08-13 22:27             ` Kent Overstreet
@ 2013-08-13 22:44               ` Tejun Heo
  2013-08-13 22:59                 ` Kent Overstreet
  0 siblings, 1 reply; 46+ messages in thread
From: Tejun Heo @ 2013-08-13 22:44 UTC (permalink / raw)
  To: Kent Overstreet; +Cc: akpm, linux-kernel, Stephen Rothwell, Fengguang Wu

Hello, Kent.

On Tue, Aug 13, 2013 at 03:27:59PM -0700, Kent Overstreet wrote:
> It's only naturally a radix tree problem _if_ you require sparseness.

Well, it's not necessarily about requiring it but more about surviving
it with some grace when things don't go as expected, which is an
important characteristic for common library stuff.

> Otherwise, radix trees require pointer chasing, which we can avoid -
> which saves us both the cost of chasing pointers (which is significant)
> and the overhead of storing them.

Vast majority of which can be avoided with simple caching, right?

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH] idr: Document ida tree sections
  2013-08-13 22:44               ` Tejun Heo
@ 2013-08-13 22:59                 ` Kent Overstreet
  2013-08-13 23:22                   ` Tejun Heo
  0 siblings, 1 reply; 46+ messages in thread
From: Kent Overstreet @ 2013-08-13 22:59 UTC (permalink / raw)
  To: Tejun Heo; +Cc: akpm, linux-kernel, Stephen Rothwell, Fengguang Wu

On Tue, Aug 13, 2013 at 06:44:28PM -0400, Tejun Heo wrote:
> Hello, Kent.
> 
> On Tue, Aug 13, 2013 at 03:27:59PM -0700, Kent Overstreet wrote:
> > It's only naturally a radix tree problem _if_ you require sparseness.
> 
> Well, it's not necessarily about requiring it but more about surviving
> it with some grace when things don't go as expected, which is an
> important characteristic for common library stuff.

The patch I posted should solve the high order allocations stuff, and
sparseness from cyclic allocations was already solved.

> > Otherwise, radix trees require pointer chasing, which we can avoid -
> > which saves us both the cost of chasing pointers (which is significant)
> > and the overhead of storing them.
> 
> Vast majority of which can be avoided with simple caching, right?

Whatever caching optimizations you do with a radix tree version I could
apply to this bitmap tree version, and my bitmap tree code is simpler
and _considerably_ faster than the existing code.

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH] idr: Document ida tree sections
  2013-08-13 22:59                 ` Kent Overstreet
@ 2013-08-13 23:22                   ` Tejun Heo
  2013-08-13 23:51                     ` Kent Overstreet
  0 siblings, 1 reply; 46+ messages in thread
From: Tejun Heo @ 2013-08-13 23:22 UTC (permalink / raw)
  To: Kent Overstreet; +Cc: akpm, linux-kernel, Stephen Rothwell, Fengguang Wu

Hello,

On Tue, Aug 13, 2013 at 03:59:27PM -0700, Kent Overstreet wrote:
> > Well, it's not necessarily about requiring it but more about surviving
> > it with some grace when things don't go as expected, which is an
> > important characteristic for common library stuff.
> 
> The patch I posted should solve the high order allocations stuff, and
> sparseness from cyclic allocations was already solved.

I don't know.  Yeah, using vmalloc would be able to work around the
issue for most cases, I suppose.  It's iffy to consume vmalloc space
from ida, which functionally is such a basic algorithmic construct.
It probably won't worsen things noticeably but vmalloc area can be a
very precious resource on 32bit configs.

> Whatever caching optimizations you do with a radix tree version I could
> apply to this bitmap tree version, and my bitmap tree code is simpler
> and _considerably_ faster than the existing code.

But the difference won't really matter.  Cached performance would be
the same and that's likely to cover most cases, right?  It's not like
radix tree is orders of magnitude slower.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH] idr: Document ida tree sections
  2013-08-13 23:22                   ` Tejun Heo
@ 2013-08-13 23:51                     ` Kent Overstreet
  2013-08-13 23:59                       ` Tejun Heo
  0 siblings, 1 reply; 46+ messages in thread
From: Kent Overstreet @ 2013-08-13 23:51 UTC (permalink / raw)
  To: Tejun Heo; +Cc: akpm, linux-kernel, Stephen Rothwell, Fengguang Wu

On Tue, Aug 13, 2013 at 07:22:11PM -0400, Tejun Heo wrote:
> Hello,
> 
> On Tue, Aug 13, 2013 at 03:59:27PM -0700, Kent Overstreet wrote:
> > > Well, it's not necessarily about requiring it but more about surviving
> > > it with some grace when things don't go as expected, which is an
> > > important characteristic for common library stuff.
> > 
> > The patch I posted should solve the high order allocations stuff, and
> > sparseness from cyclic allocations was already solved.
> 
> I don't know.  Yeah, using vmalloc would be able to work around the
> issue for most cases, I suppose.  It's iffy to consume vmalloc space
> from ida, which functionally is such a basic algorithmic construct.
> It probably won't worsen things noticeably but vmalloc area can be a
> very precious resource on 32bit configs.

This is only using it for the array of pointers to sections though, not
the bitmap itself - and only when that allocation is > 16k. For INT_MAX
allocated ids (absolute worst case) we'd be using 256k of vmalloc memory
on 64 bit, half that on 32 bit.
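
Roughly where those numbers come from, assuming the 16k sections from
the COMPACTION=n patch above:

	INT_MAX ids / 8 bits per byte  ~= 256 mb of leaf bitmap
	256 mb / 16k per section        = 16384 section pointers
	power-of-2 array growth        -> up to 32768 pointers
	32768 * 8 byte pointers         = 256k (64 bit), half that
					  with 4 byte pointers (32 bit)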

> 
> > Whatever caching optimizations you do with a radix tree version I could
> > apply to this bitmap tree version, and my bitmap tree code is simpler
> > and _considerably_ faster than the existing code.
> 
> But the difference won't really matter.  Cached performance would be
> the same and that's likely to cover most cases, right?  It's not like
> radix tree is orders of magnitude slower.

Should probably be almost as good, yeah... in theory, but the space
efficiency still isn't going to be as good, and it'll probably be more
code... and at this point I really just don't want to futz with it more.
At this point unless there's something really wrong with this code I
just want to move on to something else :P

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH] idr: Document ida tree sections
  2013-08-13 23:51                     ` Kent Overstreet
@ 2013-08-13 23:59                       ` Tejun Heo
  2013-08-15  0:04                         ` Kent Overstreet
  0 siblings, 1 reply; 46+ messages in thread
From: Tejun Heo @ 2013-08-13 23:59 UTC (permalink / raw)
  To: Kent Overstreet; +Cc: akpm, linux-kernel, Stephen Rothwell, Fengguang Wu

Hey, Kent.

On Tue, Aug 13, 2013 at 04:51:33PM -0700, Kent Overstreet wrote:
> Should probably be almost as good, yeah... in theory, but the space
> efficiency still isn't going to be as good, and it'll probably be more
> code... and at this point I really just don't want to futz with it more.
> At this point unless there's something really wrong with this code I
> just want to move onto something else :P

I think it probably would be okay in most cases but don't feel
confident about acking as it's making trade-offs which are unnecessary
and unusual.  So, ummm, I really don't know.  Maybe it's better enough
than what we have now but at the same time if you want to reimplement
the whole thing you should be persistent / reliable enough to see it
through this time around too, right?

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH] idr: Document ida tree sections
  2013-08-13 23:59                       ` Tejun Heo
@ 2013-08-15  0:04                         ` Kent Overstreet
  2013-08-15  0:22                           ` Tejun Heo
  0 siblings, 1 reply; 46+ messages in thread
From: Kent Overstreet @ 2013-08-15  0:04 UTC (permalink / raw)
  To: Tejun Heo; +Cc: akpm, linux-kernel, Stephen Rothwell, Fengguang Wu

On Tue, Aug 13, 2013 at 07:59:47PM -0400, Tejun Heo wrote:
> Hey, Kent.
> 
> On Tue, Aug 13, 2013 at 04:51:33PM -0700, Kent Overstreet wrote:
> > Should probably be almost as good, yeah... in theory, but the space
> > efficiency still isn't going to be as good, and it'll probably be more
> > code... and at this point I really just don't want to futz with it more.
> > At this point unless there's something really wrong with this code I
> > just want to move onto something else :P
> 
> I think it probably would be okay in most cases, but I don't feel
> confident about acking it, as it's making trade-offs which are
> unnecessary and unusual.  So, ummm, I really don't know.  Maybe it's
> enough of an improvement over what we have now, but at the same time,
> if you want to reimplement the whole thing you should be persistent /
> reliable enough to see it through this time around too, right?

I was just telling you how I felt :) Regardless of that, IMO what I've
got now is superior to any radix tree based approach for what ida/idr
are supposed to do. I could of course be wrong, but I'm not convinced...

Re: caching the last allocation with a radix tree based implementation.
I thought about that more last night, and I don't think that would be
viable for using ida underneath the percpu ida allocator.

Reason being percpu ida has to heavily optimize for the case where
almost all of the id space is allocated, and after a while the id space
is going to be fragmented - so caching the last allocated id is going to
be useless.

This is also why I implemented the ganged allocation bit, to amortize
the bitmap tree traversal. So we'd lose out on that going back to a
radix tree, or have to reimplement it (and it'll be slower due to
pointer chasing).
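
Roughly, and very much simplified - a hypothetical sketch just to show
the shape of it:

#include <linux/bitops.h>

/*
 * Ganged allocation sketch (names made up): one tree traversal finds a
 * leaf word with free bits, then we grab every free bit in it at once,
 * amortizing the traversal over up to BITS_PER_LONG ids handed to the
 * percpu freelist.
 */
static unsigned gang_alloc(unsigned long *leaf, unsigned base,
                           unsigned *ids, unsigned nr)
{
        unsigned long free = ~*leaf;    /* set bit == free id */
        unsigned n = 0;

        while (free && n < nr) {
                unsigned bit = __ffs(free);

                *leaf |= 1UL << bit;    /* mark it allocated */
                ids[n++] = base + bit;
                free &= free - 1;       /* clear lowest set bit */
        }

        return n;
}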

Which is all not the end of the world, but it means that if we punt on
the ida/idr rewrites for now or change our minds about them - we _do_
have quite a bit of stuff waiting on the percpu ida allocator, so for
that to go in separately I'll have to change it back to using a stack
of integers for the global freelist - which does use significantly more
memory (32 bits per free id instead of one bit).

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH] idr: Document ida tree sections
  2013-08-15  0:04                         ` Kent Overstreet
@ 2013-08-15  0:22                           ` Tejun Heo
  0 siblings, 0 replies; 46+ messages in thread
From: Tejun Heo @ 2013-08-15  0:22 UTC (permalink / raw)
  To: Kent Overstreet; +Cc: akpm, linux-kernel, Stephen Rothwell, Fengguang Wu

Hello, Kent.

On Wed, Aug 14, 2013 at 05:04:27PM -0700, Kent Overstreet wrote:
> I was just telling you how I felt :) Regardless of that, IMO what I've
> got now is superior to any radix tree based approach for what ida/idr
> are supposed to do. I could of course be wrong, but I'm not convinced...

It's just very difficult to tell either way.  You say it's better but
the benefit to weirdity ratio doesn't seem too apparent.  The only
thing the proposed solution saves is a few pointer dereferences in
extreme corner cases, at the cost of making a low level library use
high order or vmalloc allocations.

Weirdity aside, the unusualness even makes evaluating the overhead
muddier.  e.g. vmalloc space is expensive not only in terms of address
space real estate but also in terms of runtime performance, because
each vmalloc page is additional TLB pressure in most configurations
where the kernel linear address space is mapped with gigantic
mappings.  The net effect could be small and won't easily show up in
microbenchmarks, as they don't tend to push TLB pressure - but then
again the performance benefit of the proposed implementation is likely
to be extremely minor too.

For a piece of code to be unusual, it needs clear accompanying
benefits, which doesn't seem to be the case here.  It's different, and
maybe better in some extreme benchmarks specifically designed for it,
but that seems to be about it.

> Re: caching the last allocation with a radix tree based implementation.
> I thought about that more last night, and I don't think that would be
> viable for using ida underneath the percpu ida allocator.
>
> Reason being percpu ida has to heavily optimize for the case where
> almost all of the id space is allocated, and after a while the id space
> is going to be fragmented - so caching the last allocated id is going to
> be useless.

A 4k page has 32k bits.  It can serve up quite a few IDs even with
internal indexing.  Most cases will be fine with a single page, and a
single layer would cover most of what's left.  How is that gonna be
very different from the proposed implementation?  If you worry about a
huge ID space being distributed across a lot of CPUs, you can use
per-cpu hints, which will be faster than the proposed solution anyway.
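
By per-cpu hints I mean something as simple as this sketch - the lookup
helper is made up:

#include <linux/idr.h>
#include <linux/percpu.h>

static int ida_find_free_from(struct ida *ida, unsigned start); /* made up */

/*
 * Sketch: each CPU remembers where it last allocated and starts its
 * search there, so CPUs spread out across the id space instead of
 * piling onto the same leaf.
 */
static DEFINE_PER_CPU(unsigned, ida_hint);

static int ida_alloc_hinted(struct ida *ida)
{
        unsigned start = this_cpu_read(ida_hint);
        int id = ida_find_free_from(ida, start);

        if (id >= 0)
                this_cpu_write(ida_hint, id + 1);

        return id;
}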

> This is also why I implemented the ganged allocation bit, to amortize
> the bitmap tree traversal. So we'd lose out on that going back to a
> radix tree, or have to reimplement it (and it'll be slower due to
> pointer chasing).
> 
> Which is all not the end of the world, but it means that if we punt on
> the ida/idr rewrites for now or change our minds about them - we _do_
> have quite a bit of stuff waiting on the percpu ida allocator, so for
> that to go in separately I'll have to change it back to using a stack
> of integers for the global freelist - which does use significantly more
> memory (32 bits per free id instead of one bit).

Yes, I'd like to see better, percpu-aware ida too and there are things
which can benefit from it, but we still need to get ida right and I
don't think it's a very difficult thing to do at this point.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH] idr: Use this_cpu_ptr() for percpu_ida
  2013-08-08 14:32           ` Christoph Lameter
@ 2013-08-20 21:19             ` Nicholas A. Bellinger
  2013-08-20 21:29               ` Andrew Morton
  2013-08-21 14:32               ` Christoph Lameter
  0 siblings, 2 replies; 46+ messages in thread
From: Nicholas A. Bellinger @ 2013-08-20 21:19 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Kent Overstreet, akpm, linux-kernel, Tejun Heo, Oleg Nesterov,
	Ingo Molnar, Andi Kleen, Jens Axboe

On Thu, 2013-08-08 at 14:32 +0000, Christoph Lameter wrote:
> On Wed, 7 Aug 2013, Kent Overstreet wrote:
> 
> > One thing that was bugging me - I was never able to figure out for sure
> > if smp_processor_id() returns a number in the range [0, nr_cpu_ids), at
> > least I couldn't find where it was documented - could you tell me if
> > that's true?
> 
> I always assumed that it was in the range 0 ... nr_cpu_ids - 1 and that is
> the assumption under which the kernel code was written. Things would break
> horribly if smp_processor_id() returned nr_cpu_ids or higher.
> 

Hi guys,

Just a heads up that I've put Kent's standalone percpu-ida patch (with
Christoph's recommended changes) into target-pending/for-next here:

https://git.kernel.org/cgit/linux/kernel/git/nab/target-pending.git/commit/?h=for-next&id=47bd524a5b3eb6429b058b8b562b45329ab2c9e7

I've got a number of target patches that depend on this code for v3.12,
and a delay on this particular piece would be painful to endure..

Sooo, please yell loudly if there is an objection to percpu-ida merge as
a completely standalone item that does not affect any existing ida
code.

--nab


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH] idr: Use this_cpu_ptr() for percpu_ida
  2013-08-20 21:19             ` Nicholas A. Bellinger
@ 2013-08-20 21:29               ` Andrew Morton
  2013-08-21  2:01                 ` Kent Overstreet
  2013-08-21 14:32               ` Christoph Lameter
  1 sibling, 1 reply; 46+ messages in thread
From: Andrew Morton @ 2013-08-20 21:29 UTC (permalink / raw)
  To: Nicholas A. Bellinger
  Cc: Christoph Lameter, Kent Overstreet, linux-kernel, Tejun Heo,
	Oleg Nesterov, Ingo Molnar, Andi Kleen, Jens Axboe

On Tue, 20 Aug 2013 14:19:06 -0700 "Nicholas A. Bellinger" <nab@linux-iscsi.org> wrote:

> On Thu, 2013-08-08 at 14:32 +0000, Christoph Lameter wrote:
> > On Wed, 7 Aug 2013, Kent Overstreet wrote:
> > 
> > > One thing that was bugging me - I was never able to figure out for sure
> > > if smp_processor_id() returns a number in the range [0, nr_cpu_ids), at
> > > least I couldn't find where it was documented - could you tell me if
> > > that's true?
> > 
> > I always assumed that it was in the range 0 ... nr_cpu_ids - 1 and that is
> > the assumption under which the kernel code was written. Things would break
> > horribly if smp_processor_id() returned nr_cpu_ids or higher.
> > 
> 
> Hi guys,
> 
> Just a heads up that I've put Kent's standalone percpu-ida patch (with
> Christoph's recommended changes) into target-pending/for-next here:
> 
> https://git.kernel.org/cgit/linux/kernel/git/nab/target-pending.git/commit/?h=for-next&id=47bd524a5b3eb6429b058b8b562b45329ab2c9e7
> 
> I've got a number of target patches that depend on this code for v3.12,
> and a delay on this particular piece would be painful to endure..
> 
> Sooo, please yell loudly if there is an objection to percpu-ida merge as
> a completely standalone item that does not affect any existing ida
> code.

Was hoping that Tejun had time.  I'll take a look...

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH] idr: Use this_cpu_ptr() for percpu_ida
  2013-08-20 21:29               ` Andrew Morton
@ 2013-08-21  2:01                 ` Kent Overstreet
  2013-08-21  2:07                   ` Tejun Heo
  0 siblings, 1 reply; 46+ messages in thread
From: Kent Overstreet @ 2013-08-21  2:01 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Nicholas A. Bellinger, Christoph Lameter, linux-kernel,
	Tejun Heo, Oleg Nesterov, Ingo Molnar, Andi Kleen, Jens Axboe

On Tue, Aug 20, 2013 at 02:29:56PM -0700, Andrew Morton wrote:
> On Tue, 20 Aug 2013 14:19:06 -0700 "Nicholas A. Bellinger" <nab@linux-iscsi.org> wrote:
> 
> > On Thu, 2013-08-08 at 14:32 +0000, Christoph Lameter wrote:
> > > On Wed, 7 Aug 2013, Kent Overstreet wrote:
> > > 
> > > > One thing that was bugging me - I was never able to figure out for sure
> > > > if smp_processor_id() returns a number in the range [0, nr_cpu_ids), at
> > > > least I couldn't find where it was documented - could you tell me if
> > > > that's true?
> > > 
> > > I always assumed that it was in the range 0 ... nr_cpu_ids - 1 and that is
> > > the assumption under which the kernel code was written. Things would break
> > > horribly if smp_processor_id() returned nr_cpu_ids or higher.
> > > 
> > 
> > Hi guys,
> > 
> > Just a heads up that I've put Kent's standalone percpu-ida patch (with
> > Christoph's recommended changes) into target-pending/for-next here:
> > 
> > https://git.kernel.org/cgit/linux/kernel/git/nab/target-pending.git/commit/?h=for-next&id=47bd524a5b3eb6429b058b8b562b45329ab2c9e7
> > 
> > I've got a number of target patches that depend on this code for v3.12,
> > and a delay on this particular piece would be painful to endure..
> > 
> > Sooo, please yell loudly if there is an objection to percpu-ida merge as
> > a completely standalone item that does not affect any existing ida
> > code.
> 
> Was hoping that Tejun had time.  I'll take a look...

I think Tejun and I might be at a bit of an impasse with the ida rewrite
itself, but I don't think there were any outstanding objections to the
percpu ida code itself - and this is a standalone version.

I was meaning to ask you, Andrew, if you could take a look at the ida
discussion and lend your opinion - I don't think there are any _specific_
technical objections left to my ida code, and it's now at a more
philosophical "complexity vs. ..." level.

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH] idr: Use this_cpu_ptr() for percpu_ida
  2013-08-21  2:01                 ` Kent Overstreet
@ 2013-08-21  2:07                   ` Tejun Heo
  2013-08-21  2:31                     ` Kent Overstreet
  0 siblings, 1 reply; 46+ messages in thread
From: Tejun Heo @ 2013-08-21  2:07 UTC (permalink / raw)
  To: Kent Overstreet
  Cc: Andrew Morton, Nicholas A. Bellinger, Christoph Lameter,
	linux-kernel, Oleg Nesterov, Ingo Molnar, Andi Kleen, Jens Axboe

Hello, Kent.

On Tue, Aug 20, 2013 at 07:01:32PM -0700, Kent Overstreet wrote:
> I think Tejun and I might be at a bit of an impasse with the ida rewrite
> itself, but I don't think there were any outstanding objections to the
> percpu ida code itself - and this is a standalone version.

The percpu ida code can be applied separately from the ida rewrite?

> I was meaning to ask you, Andrew, if you could take a look at the ida
> discussion and lend your opinion - I don't think there are any _specific_
> technical objections left to my ida code, and it's now at a more
> philosophical "complexity vs. ..." level.

Hmmm... the objection was pretty specific - don't depend on high-order
or vmalloc allocations when it can be easily avoided by using proper
radix tree.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH] idr: Use this_cpu_ptr() for percpu_ida
  2013-08-21  2:07                   ` Tejun Heo
@ 2013-08-21  2:31                     ` Kent Overstreet
  2013-08-21 11:59                       ` Tejun Heo
  0 siblings, 1 reply; 46+ messages in thread
From: Kent Overstreet @ 2013-08-21  2:31 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Andrew Morton, Nicholas A. Bellinger, Christoph Lameter,
	linux-kernel, Oleg Nesterov, Ingo Molnar, Andi Kleen, Jens Axboe

On Tue, Aug 20, 2013 at 10:07:42PM -0400, Tejun Heo wrote:
> Hello, Kent.
> 
> On Tue, Aug 20, 2013 at 07:01:32PM -0700, Kent Overstreet wrote:
> > I think Tejun and I might be at a bit of an impasse with the ida rewrite
> > itself, but I don't think there were any outstanding objections to the
> > percpu ida code itself - and this is a standalone version.
> 
> The percpu ida code can be applied separately from the ida rewrite?

Yes - at the cost of using significantly more memory for the global
freelist.

> > I was meaning to ask you Andrew, if you could take a look at the ida
> > discussion and lend your opinion - I don't think there's any _specific_
> > technical objections left to my ida code, and it's now on a more
> > philisophical "complexity vs. ..." level.
> 
> Hmmm... the objection was pretty specific - don't depend on high-order
> or vmalloc allocations when it can be easily avoided by using proper
> radix tree.

We only do vmalloc allocations for CONFIG_COMPACTION=n, and then only
when we need to allocate more than almost 1 _billion_ ids from a single
ida (twice that on 32 bit, so effectively never, since that gets us just
about to INT_MAX) - and even then it's only 32k of vmalloc memory for
the entire ida.

This is with a max allocation order of 4 for COMPACTION=y, 2 for
COMPACTION=n.
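
i.e. roughly this, with a hypothetical constant name:

#include <linux/kconfig.h>

/* Sketch: cap the order of bitmap section allocations based on whether
 * the page allocator can compact memory.
 */
#define IDA_MAX_SECTION_ORDER   (IS_ENABLED(CONFIG_COMPACTION) ? 4 : 2)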

All this for a performance improvement of 10x to 50x (or more), for the
ida sizes I measured.

So I could see your point if we were allocating gobs of vmalloc memory,
or high order allocations big enough to realistically be problematic (I
honestly don't think these will be) - but to me, this seems like a
pretty reasonable tradeoff for those performance gains.

(And the performance gains do substantially come from using more
contiguous memory and treating the whole data structure as an array, and
doing less pointer chasing/looping)

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH] idr: Use this_cpu_ptr() for percpu_ida
  2013-08-21  2:31                     ` Kent Overstreet
@ 2013-08-21 11:59                       ` Tejun Heo
  2013-08-21 21:09                         ` Kent Overstreet
  0 siblings, 1 reply; 46+ messages in thread
From: Tejun Heo @ 2013-08-21 11:59 UTC (permalink / raw)
  To: Kent Overstreet
  Cc: Andrew Morton, Nicholas A. Bellinger, Christoph Lameter,
	linux-kernel, Oleg Nesterov, Ingo Molnar, Andi Kleen, Jens Axboe

Hello, Kent.

On Tue, Aug 20, 2013 at 07:31:51PM -0700, Kent Overstreet wrote:
> All this for a performance improvement of 10x to 50x (or more), for the
> ida sizes I measured.

That's misleading, isn't it?  We should see large performance
improvements even without the large pages.  What matters more is the
leaf node performance for the vast majority of cases, and an extra
radix tree layer on top would cover most of whatever is left.  Whether
to use high order pages or not only affects the extreme niche use
cases, and I don't think going for high order pages to micro optimize
those extreme use cases is the right trade off.

> So I could see your point if we were allocating gobs of vmalloc memory,
> or high order allocations big enough to realistically be problematic (I
> honestly don't think these will be) - but to me, this seems like a
> pretty reasonable tradeoff for those performance gains.

The trade off is a made-up one, as the bulk of the performance benefit
can be gained without resorting to high order allocations.

> (And the performance gains do substantially come from using more
> contiguous memory and treating the whole data structure as an array, and
> doing less pointer chasing/looping)

I really have a hard time buying that.  Let's say you go with a
single-page leaf node and an extra single-page layer on top.  How many
IDs are we talking about?  For the cases which are most performance
sensitive, this doesn't even matter a bit, as the percpu caching layer
would be on top anyway.  I really don't think the micro optimization
is called for at the cost of high order allocations from a low level
utility library.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH] idr: Use this_cpu_ptr() for percpu_ida
  2013-08-20 21:19             ` Nicholas A. Bellinger
  2013-08-20 21:29               ` Andrew Morton
@ 2013-08-21 14:32               ` Christoph Lameter
  2013-08-21 17:49                 ` Nicholas A. Bellinger
  2013-08-21 20:49                 ` Andrew Morton
  1 sibling, 2 replies; 46+ messages in thread
From: Christoph Lameter @ 2013-08-21 14:32 UTC (permalink / raw)
  To: Nicholas A. Bellinger
  Cc: Kent Overstreet, akpm, linux-kernel, Tejun Heo, Oleg Nesterov,
	Ingo Molnar, Andi Kleen, Jens Axboe

On Tue, 20 Aug 2013, Nicholas A. Bellinger wrote:

> On Thu, 2013-08-08 at 14:32 +0000, Christoph Lameter wrote:
> > On Wed, 7 Aug 2013, Kent Overstreet wrote:
> >
> > > One thing that was bugging me - I was never able to figure out for sure
> > > if smp_processor_id() returns a number in the range [0, nr_cpu_ids), at
> > > least I couldn't find where it was documented - could you tell me if
> > > that's true?
> >
> > I always assumed that it was in the range 0 ... nr_cpu_ids - 1 and that is
> > the assumption under which the kernel code was written. Things would break
> > horribly if smp_processor_id() returned nr_cpu_ids or higher.
> >
>
> Hi guys,
>
> Just a heads up that I've put Kent's standalone percpu-ida patch (with
> Christoph's recommended changes) into target-pending/for-next here:
>
> https://git.kernel.org/cgit/linux/kernel/git/nab/target-pending.git/commit/?h=for-next&id=47bd524a5b3eb6429b058b8b562b45329ab2c9e7
>
> I've got a number of target patches that depend on this code for v3.12,
> and a delay on this particular piece would be painful to endure..
>
> Sooo, please yell loudly if there is an objection to percpu-ida merge as
> a completely standalone item that does not affect any existing ida
> code.

Well the performance is still going to be limited due to the spinlock in
the percpu handling. You do not need the spinlock. Once preempt is off you
should have exclusive access to the per cpu data. This is already
exploited by idr_layer_alloc before the patch. Doing so is going to
reduce the code size of the patch significantly.
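
Something along these lines for the fast path (a sketch; struct and
field names are made up):

#include <linux/percpu.h>

struct my_percpu_ida_cpu {
        unsigned nr_free;
        unsigned freelist[];
};

struct my_percpu_ida {
        struct my_percpu_ida_cpu __percpu *cpu_tags;
};

/*
 * With preemption disabled this CPU's cache is ours exclusively, so
 * the common path needs no spinlock.
 */
static int alloc_local_tag(struct my_percpu_ida *pool)
{
        struct my_percpu_ida_cpu *tags;
        int id = -1;

        tags = get_cpu_ptr(pool->cpu_tags);     /* disables preemption */
        if (tags->nr_free)
                id = tags->freelist[--tags->nr_free];
        put_cpu_ptr(pool->cpu_tags);            /* re-enables preemption */

        return id;
}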

Please post the patch inline so that it's easy to comment on it.


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH] idr: Use this_cpu_ptr() for percpu_ida
  2013-08-21 14:32               ` Christoph Lameter
@ 2013-08-21 17:49                 ` Nicholas A. Bellinger
  2013-08-21 20:49                 ` Andrew Morton
  1 sibling, 0 replies; 46+ messages in thread
From: Nicholas A. Bellinger @ 2013-08-21 17:49 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Kent Overstreet, akpm, linux-kernel, Tejun Heo, Oleg Nesterov,
	Ingo Molnar, Andi Kleen, Jens Axboe

On Wed, 2013-08-21 at 14:32 +0000, Christoph Lameter wrote:
> On Tue, 20 Aug 2013, Nicholas A. Bellinger wrote:
> 
> > On Thu, 2013-08-08 at 14:32 +0000, Christoph Lameter wrote:
> > > On Wed, 7 Aug 2013, Kent Overstreet wrote:
> > >
> > > > One thing that was bugging me - I was never able to figure out for sure
> > > > if smp_processor_id() returns a number in the range [0, nr_cpu_ids), at
> > > > least I couldn't find where it was documented - could you tell me if
> > > > that's true?
> > >
> > > I always assumed that it was in the range 0 ... nr_cpu_ids - 1 and that is
> > > the assumption under which the kernel code was written. Things would break
> > > horribly if smp_processor_id() returned nr_cpu_ids or higher.
> > >
> >
> > Hi guys,
> >
> > Just a heads up that I've put Kent's standalone percpu-ida patch (with
> > Christoph's recommended changes) into target-pending/for-next here:
> >
> > https://git.kernel.org/cgit/linux/kernel/git/nab/target-pending.git/commit/?h=for-next&id=47bd524a5b3eb6429b058b8b562b45329ab2c9e7
> >
> > I've got a number of target patches that depend on this code for v3.12,
> > and a delay on this particular piece would be painful to endure..
> >
> > Sooo, please yell loudly if there is an objection to percpu-ida merge as
> > a completely standalone item that does not affect any existing ida
> > code.
> 
> Well the performance is still going to be limited due to the spinlock in
> the percpu handling. You do not need the spinlock. Once preempt is off you
> should have exclusive access to the per cpu data. This is already
> exploited by idr_layer_alloc before the patch. Doing so is going to
> reduce the code size of the patch significantly.
> 
> Please post the patch inline so that it's easy to comment on it.
> 

Hi Christoph,

The latest version from Kent was posted last week here:

http://marc.info/?l=linux-kernel&m=137669878117020&w=2

--nab


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH] idr: Use this_cpu_ptr() for percpu_ida
  2013-08-21 14:32               ` Christoph Lameter
  2013-08-21 17:49                 ` Nicholas A. Bellinger
@ 2013-08-21 20:49                 ` Andrew Morton
  2013-08-22 16:44                   ` Christoph Lameter
  1 sibling, 1 reply; 46+ messages in thread
From: Andrew Morton @ 2013-08-21 20:49 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Nicholas A. Bellinger, Kent Overstreet, linux-kernel, Tejun Heo,
	Oleg Nesterov, Ingo Molnar, Andi Kleen, Jens Axboe

On Wed, 21 Aug 2013 14:32:55 +0000 Christoph Lameter <cl@linux.com> wrote:

> On Tue, 20 Aug 2013, Nicholas A. Bellinger wrote:
> 
> > On Thu, 2013-08-08 at 14:32 +0000, Christoph Lameter wrote:
> > > On Wed, 7 Aug 2013, Kent Overstreet wrote:
> > >
> > > > One thing that was bugging me - I was never able to figure out for sure
> > > > if smp_processor_id() returns a number in the range [0, nr_cpu_ids), at
> > > > least I couldn't find where it was documented - could you tell me if
> > > > that's true?
> > >
> > > I always assumed that it was in the range 0 ... nr_cpu_ids - 1 and that is
> > > the assumption under which the kernel code was written. Things would break
> > > horribly if smp_processor_id() returned nr_cpu_ids or higher.
> > >
> >
> > Hi guys,
> >
> > Just a heads up that I've put Kent's standalone percpu-ida patch (with
> > Christoph's recommend changes) into target-pending/for-next here:
> >
> > https://git.kernel.org/cgit/linux/kernel/git/nab/target-pending.git/commit/?h=for-next&id=47bd524a5b3eb6429b058b8b562b45329ab2c9e7
> >
> > I've got a number of target patches that depend on this code for v3.12,
> > and a delay on this particular piece would be painful to endure..
> >
> > Sooo, please yell loudly if there is an objection to percpu-ida merge as
> > a completely standalone item that does not affect any existing ida
> > code.
> 
> Well the performance is still going to be limited due to the spinlock in
> the percpu handling. You do not need the spinlock. Once preempt is off you
> should have exclusive access to the per cpu data.

The lock is needed so that one cpu can steal tags from another cpu's cache.
See (the needlessly inlined!) steal_tags().
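
Simplified sketch of the stealing path (names made up) showing why both
sides need the lock:

#include <linux/percpu.h>
#include <linux/spinlock.h>

struct my_percpu_ida_cpu {
        spinlock_t lock;
        unsigned nr_free;
        unsigned freelist[];
};

struct my_percpu_ida {
        struct my_percpu_ida_cpu __percpu *cpu_tags;
};

/*
 * Stealing reaches into *another* CPU's freelist, so that freelist has
 * to be protected by a lock which the local fast path takes as well.
 */
static int steal_tag(struct my_percpu_ida *pool, int victim)
{
        struct my_percpu_ida_cpu *remote = per_cpu_ptr(pool->cpu_tags, victim);
        int id = -1;

        spin_lock(&remote->lock);
        if (remote->nr_free)
                id = remote->freelist[--remote->nr_free];
        spin_unlock(&remote->lock);

        return id;
}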

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH] idr: Use this_cpu_ptr() for percpu_ida
  2013-08-21 11:59                       ` Tejun Heo
@ 2013-08-21 21:09                         ` Kent Overstreet
  2013-08-21 21:16                           ` Tejun Heo
  0 siblings, 1 reply; 46+ messages in thread
From: Kent Overstreet @ 2013-08-21 21:09 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Andrew Morton, Nicholas A. Bellinger, Christoph Lameter,
	linux-kernel, Oleg Nesterov, Ingo Molnar, Andi Kleen, Jens Axboe

On Wed, Aug 21, 2013 at 07:59:41AM -0400, Tejun Heo wrote:
> Hello, Kent.
> 
> On Tue, Aug 20, 2013 at 07:31:51PM -0700, Kent Overstreet wrote:
> > All this for a performance improvement of 10x to 50x (or more), for the
> > ida sizes I measured.
> 
> That's misleading, isn't it? 

It's comparing against the version that actually exists, instead of
against a hypothetical approach that doesn't exist yet. I don't see how
that's misleading.

> We should see large performance
> improvements even without the large pages.  What matters more is the
> leaf node performance for the vast majority of cases, and an extra
> radix tree layer on top would cover most of whatever is left.  Whether
> to use high order pages or not only affects the extreme niche use
> cases, and I don't think going for high order pages to micro optimize
> those extreme use cases is the right trade off.
> 
> > So I could see your point if we were allocating gobs of vmalloc memory,
> > or high order allocations big enough to realistically be problematic (I
> > honestly don't think these will be) - but to me, this seems like a
> > pretty reasonable tradeoff for those performance gains.
> 
> The trade off is a made-up one, as the bulk of the performance benefit
> can be gained without resorting to high order allocations.

I'm more and more skeptical that that's true, and it's a given that it
wouldn't be as fast.

> > (And the performance gains do substantially come from using more
> > contiguous memory and treating the whole data structure as an array, and
> > doing less pointer chasing/looping)
> 
> I really have a hard time buying that.  Let's say you go with a
> single-page leaf node and an extra single-page layer on top.  How many
> IDs are we talking about?  For the cases which are most performance
> sensitive, this doesn't even matter a bit, as the percpu caching layer
> would be on top anyway.  I really don't think the micro optimization
> is called for at the cost of high order allocations from a low level
> utility library.

These "micro optimizations" mean either less pointer chasing or less
branching in the _common_ case; you'd trade common case performance for
avoiding ever doing higher order allocations (and 2 with COMPACTION=n
and 4 with COMPACTION=y is not particularly high order!).

I don't buy that that's a good tradeoff. If you're convinced radix trees
are the way to go and it can be done without much performance cost, why
not code it up and show us?

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH] idr: Use this_cpu_ptr() for percpu_ida
  2013-08-21 21:09                         ` Kent Overstreet
@ 2013-08-21 21:16                           ` Tejun Heo
  2013-08-21 21:24                             ` Kent Overstreet
  0 siblings, 1 reply; 46+ messages in thread
From: Tejun Heo @ 2013-08-21 21:16 UTC (permalink / raw)
  To: Kent Overstreet
  Cc: Andrew Morton, Nicholas A. Bellinger, Christoph Lameter,
	linux-kernel, Oleg Nesterov, Ingo Molnar, Andi Kleen, Jens Axboe

Hello, Kent.

On Wed, Aug 21, 2013 at 02:09:01PM -0700, Kent Overstreet wrote:
> These "micro optimizations" mean either less pointer chasing or less
> branching in the _common_ case; you'd trade common case performance for
> avoiding ever doing higher order allocations (and 2 with COMPACTION=n
> and 4 with COMPACTION=y is not particularly high order!).

Order 4 allocations probably aren't as bad as before, but they're still
a lot nastier than single page allocations.  You say doing it the other
way would harm the common case performance, but you didn't answer my
question about the number of IDs being served per page.  How many can
be served from a single page?  And how many from a two-layer single-page
configuration?  How are you defining the "common" case?

> I don't buy that that's a good tradeoff. If you're convinced radix trees
> are the way to go and it can be done without much performance cost, why
> not code it up and show us?

Well, I'm not the one trying to rewrite ida, so the onus to justify
the proposed code is primarily on you.  Another thing is that the
proposed code is *not* using the existing radix tree and instead
implementing its own simplified radix tree, which *can* be fine but
the bar to clear is fairly high.  You have to be able to show
*clearly* that using the existing radix tree is not an option.  So
far, the only thing I've gathered is that the simplified thing is gonna
be faster in some extreme cases while having a clear disadvantage in
terms of memory allocation.  Not very convincing.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH] idr: Use this_cpu_ptr() for percpu_ida
  2013-08-21 21:16                           ` Tejun Heo
@ 2013-08-21 21:24                             ` Kent Overstreet
  2013-08-21 21:31                               ` Tejun Heo
  0 siblings, 1 reply; 46+ messages in thread
From: Kent Overstreet @ 2013-08-21 21:24 UTC (permalink / raw)
  To: Tejun Heo
  Cc: Andrew Morton, Nicholas A. Bellinger, Christoph Lameter,
	linux-kernel, Oleg Nesterov, Ingo Molnar, Andi Kleen, Jens Axboe

On Wed, Aug 21, 2013 at 05:16:50PM -0400, Tejun Heo wrote:
> Hello, Kent.
> 
> On Wed, Aug 21, 2013 at 02:09:01PM -0700, Kent Overstreet wrote:
> > These "micro optimizations" mean either less pointer chasing or less
> > branching in the _common_ case; you'd trade common case performance for
> > avoiding ever doing higher order allocations (and 2 with COMPACTION=n
> > and 4 with COMPACTION=y is not particularly high order!).
> 
> Order 4 allocations probably aren't as bad as before, but they're still
> a lot nastier than single page allocations.  You say doing it the other
> way would harm the common case performance, but you didn't answer my
> question about the number of IDs being served per page.  How many can
> be served from a single page?  And how many from a two-layer single-page
> configuration?  How are you defining the "common" case?

With single page allocations:

1 << 15 bits per page

1 << 9 pointers per page

So two layers of pointers do get us to 1 << 33 bits, which is what we
need.

But now, since we need two layers of pointers instead of one, we need
either another pointer deref for a node lookup - _always_, even when
we've got 8 bytes of bits - or we need to branch on the depth of the
tree, which is something we don't have now.

This is extra overhead _no matter the size of the ida_, over my current
approach.
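
To make the comparison concrete - a grossly simplified sketch, not the
actual code:

#include <linux/bitops.h>

/*
 * With the bitmap tree flattened into one array, heap-style, walking
 * to a child is pure index arithmetic - no pointer load, no depth
 * branch:
 */
static inline size_t child_index(size_t parent, unsigned bit)
{
        return parent * BITS_PER_LONG + bit + 1;
}

/*
 * A pointer-based radix tree instead loads node->slots[bit] at every
 * level, or branches on the tree depth to skip the upper levels.
 */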

I'm assuming the common case is < one page of bits; based on the usage
I've seen throughout the kernel, that's probably way conservative.

In that case, your approach is going to be slower than mine, and there's
no difference in the size of the allocations.

> > I don't buy that that's a good tradeoff. If you're convinced radix trees
> > are the way to go and it can be done without much performance cost, why
> > not code it up and show us?
> 
> Well, I'm not the one trying to rewrite ida, so the onus to justify
> the proposed code is primarily on you.  Another thing is that the
> proposed code is *not* using the existing radix tree and instead
> implementing its own simplified radix tree, which *can* be fine but
> the bar to clear is fairly high.  You have to be able to show
> *clearly* that using the existing radix tree is not an option.  So
> far, the only thing I've gathered is that the simplified thing is gonna
> be faster in some extreme cases while having a clear disadvantage in
> terms of memory allocation.  Not very convincing.

I've already shown massive performance gains over the existing radix
tree approach, you're the one claiming a different approach would be
better.

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH] idr: Use this_cpu_ptr() for percpu_ida
  2013-08-21 21:24                             ` Kent Overstreet
@ 2013-08-21 21:31                               ` Tejun Heo
  0 siblings, 0 replies; 46+ messages in thread
From: Tejun Heo @ 2013-08-21 21:31 UTC (permalink / raw)
  To: Kent Overstreet
  Cc: Andrew Morton, Nicholas A. Bellinger, Christoph Lameter,
	linux-kernel, Oleg Nesterov, Ingo Molnar, Andi Kleen, Jens Axboe

Hello, Kent.

On Wed, Aug 21, 2013 at 02:24:42PM -0700, Kent Overstreet wrote:
> With single page allocations:
> 
> 1 << 15 bits per page
> 
> 1 << 9 pointers per page
> 
> So two layers of pointers do get us to 1 << 33 bits, which is what we
> need.

And a single layer - 1 << 15 ids - would cover most of the use cases,
right?  With 1 << (9 + 15) probably covering everyone else but the
cyclic ones doing the full circle.

> But now, since we need two layers of pointers instead of one, we need
> either another pointer deref for a node lookup - _always_, even when
> we've got 8 bytes of bits - or we need to branch on the depth of the
> tree, which is something we don't have now.

A likely() branch which is almost always hit is *extremely* cheap.
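
i.e. something like this sketch, with made-up field and helper names:

#include <linux/compiler.h>

struct my_ida {
        unsigned depth;
        unsigned long *leaf;
};

static int first_free_bit(unsigned long *leaf);         /* made up */
static int walk_upper_layers(struct my_ida *ida);       /* made up */

/*
 * The depth check for the single-page common case; with likely() the
 * branch predictor makes this almost free.
 */
static int ida_find_free(struct my_ida *ida)
{
        if (likely(ida->depth == 1))            /* almost always taken */
                return first_free_bit(ida->leaf);

        return walk_upper_layers(ida);          /* rare: huge id spaces */
}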

> This is extra overhead _no matter the size of the ida_, over my current
> approach.
> I'm assuming the common case is < one page of bits; based on the usage
> I've seen throughout the kernel, that's probably way conservative.
> 
> In that case, your approach is going to be slower than mine, and there's
> no difference in the size of the allocations.

By a single likely() branch.  I'm not even sure that'd be measurable in
most cases.  I'd take that over a custom radix tree implementation which
needs high order allocations.

> I've already shown massive performance gains over the existing radix
> tree approach, you're the one claiming a different approach would be
> better.

So?  What difference does that make?  You should be able to justify
your custom thing.  If you do something unusual, of course someone is
gonna ask you to justify it and justifying that is *your*
responsibility.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH] idr: Use this_cpu_ptr() for percpu_ida
  2013-08-21 20:49                 ` Andrew Morton
@ 2013-08-22 16:44                   ` Christoph Lameter
  2013-08-22 16:56                     ` Jens Axboe
  0 siblings, 1 reply; 46+ messages in thread
From: Christoph Lameter @ 2013-08-22 16:44 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Nicholas A. Bellinger, Kent Overstreet, linux-kernel, Tejun Heo,
	Oleg Nesterov, Ingo Molnar, Andi Kleen, Jens Axboe

On Wed, 21 Aug 2013, Andrew Morton wrote:

> The lock is needed so that one cpu can steal tags from another cpu's cache.
> See (the needlessly inlined!) steal_tags().

Stealing tags could also be done via IPIs or some other mechanism.
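
e.g. with smp_call_function_single() - a sketch, with the request
struct and tag helper made up:

#include <linux/smp.h>

static unsigned take_local_tags(unsigned *ids, unsigned want); /* made up */

/*
 * Instead of locking the victim's freelist, send it an IPI and let it
 * donate tags from its own context, where it owns its freelist and
 * needs no lock.
 */
struct steal_req {
        unsigned *ids;
        unsigned want;
        unsigned got;
};

static void donate_tags(void *info)
{
        struct steal_req *req = info;

        /* Runs on the victim CPU */
        req->got = take_local_tags(req->ids, req->want);
}

static int steal_via_ipi(int victim, struct steal_req *req)
{
        /* wait == 1: spin until the victim has run donate_tags() */
        return smp_call_function_single(victim, donate_tags, req, 1);
}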


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [PATCH] idr: Use this_cpu_ptr() for percpu_ida
  2013-08-22 16:44                   ` Christoph Lameter
@ 2013-08-22 16:56                     ` Jens Axboe
  0 siblings, 0 replies; 46+ messages in thread
From: Jens Axboe @ 2013-08-22 16:56 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Andrew Morton, Nicholas A. Bellinger, Kent Overstreet,
	linux-kernel, Tejun Heo, Oleg Nesterov, Ingo Molnar, Andi Kleen

On 08/22/2013 10:44 AM, Christoph Lameter wrote:
> On Wed, 21 Aug 2013, Andrew Morton wrote:
> 
>> The lock is needed so that one cpu can steal tags from another cpu's cache.
>> See (the needlessly inlined!) steal_tags().
> 
> Stealing tags could also be done via IPIs or some other mechanism.

That is actually the approach I took for blk-mq tagging. But that isn't
free either. I think it pretty much boils down to how much stealing you
expect. If the tag space is sufficiently large, you would not expect a
lot of stealing. And then it doesn't really matter if you use a lock or
an IPI, since it's the rare case - the lock cacheline isn't going to be
shared across processors anyway. The ticket locks aren't exactly free on
their own, however, so that cost is still paid. But I'd be surprised if
it was much of an issue.

If the tag space isn't large enough (common case), then yes, it matters
a lot what kind of mechanism is used. It's on my TODO to compare the two
under realistic scenarios and see how they fare.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 46+ messages in thread

end of thread, other threads:[~2013-08-22 16:56 UTC | newest]

Thread overview: 46+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-08-07 17:34 IDA/IDR rewrite, percpu ida Kent Overstreet
2013-08-07 17:34 ` [PATCH 03/10] idr: Rewrite ida Kent Overstreet
2013-08-07 20:22   ` Tejun Heo
2013-08-07 20:51     ` [PATCH] idr: Document ida tree sections Kent Overstreet
2013-08-09 14:57       ` Tejun Heo
2013-08-13 22:13         ` Kent Overstreet
2013-08-13 22:19           ` Tejun Heo
2013-08-13 22:27             ` Kent Overstreet
2013-08-13 22:44               ` Tejun Heo
2013-08-13 22:59                 ` Kent Overstreet
2013-08-13 23:22                   ` Tejun Heo
2013-08-13 23:51                     ` Kent Overstreet
2013-08-13 23:59                       ` Tejun Heo
2013-08-15  0:04                         ` Kent Overstreet
2013-08-15  0:22                           ` Tejun Heo
2013-08-13 22:33         ` Kent Overstreet
2013-08-07 17:34 ` [PATCH 04/10] idr: Percpu ida Kent Overstreet
2013-08-07 17:56   ` Christoph Lameter
2013-08-07 18:33     ` Kent Overstreet
2013-08-07 19:40       ` Christoph Lameter
2013-08-07 19:57         ` [PATCH] idr: Use this_cpu_ptr() for percpu_ida Kent Overstreet
2013-08-08 14:32           ` Christoph Lameter
2013-08-20 21:19             ` Nicholas A. Bellinger
2013-08-20 21:29               ` Andrew Morton
2013-08-21  2:01                 ` Kent Overstreet
2013-08-21  2:07                   ` Tejun Heo
2013-08-21  2:31                     ` Kent Overstreet
2013-08-21 11:59                       ` Tejun Heo
2013-08-21 21:09                         ` Kent Overstreet
2013-08-21 21:16                           ` Tejun Heo
2013-08-21 21:24                             ` Kent Overstreet
2013-08-21 21:31                               ` Tejun Heo
2013-08-21 14:32               ` Christoph Lameter
2013-08-21 17:49                 ` Nicholas A. Bellinger
2013-08-21 20:49                 ` Andrew Morton
2013-08-22 16:44                   ` Christoph Lameter
2013-08-22 16:56                     ` Jens Axboe
2013-08-07 17:46 ` [PATCH 05/10] idr: Kill old deprecated idr interfaces Kent Overstreet
2013-08-07 17:46 ` [PATCH 06/10] idr: Rename idr_get_next() -> idr_find_next() Kent Overstreet
2013-08-07 17:46 ` [PATCH 07/10] idr: Rename idr_alloc() -> idr_alloc_range() Kent Overstreet
2013-08-07 19:04   ` Wolfram Sang
2013-08-07 17:46 ` [PATCH 08/10] idr: Reimplement idr on top of ida/radix trees Kent Overstreet
     [not found] ` <1375896905-6074-1-git-send-email-kmo-PEzghdH756F8UrSeD/g0lQ@public.gmane.org>
2013-08-07 17:46   ` [PATCH 09/10] idr: Remove unneeded idr locking, idr_preload() usage Kent Overstreet
2013-08-07 17:46     ` Kent Overstreet
2013-08-07 17:46 ` [PATCH 10/10] idr: Rework idr_preload() Kent Overstreet
2013-08-07 17:46   ` [Cluster-devel] " Kent Overstreet
