linux-kernel.vger.kernel.org archive mirror
* [PATCH 0/9] Scalability requirements for sysv ipc - v3
@ 2008-05-07 11:35 Nadia.Derbey
  2008-05-07 11:35 ` [PATCH 1/9] Change the idr structure Nadia.Derbey
                   ` (11 more replies)
  0 siblings, 12 replies; 39+ messages in thread
From: Nadia.Derbey @ 2008-05-07 11:35 UTC (permalink / raw)
  To: manfred, paulmck, lnxninja; +Cc: linux-kernel, efault, akpm


After scalability problems were detected when using the SysV IPCs, I
proposed using an RCU-based implementation of the IDR API instead (see
threads http://lkml.org/lkml/2008/4/11/212 and
http://lkml.org/lkml/2008/4/29/295).

This resulted in many people asking to convert the idr API itself and make
it RCU-safe, because most of the code was otherwise duplicated and thus
unmaintainable and unreviewable.

So here is a first attempt.

The important change with respect to the idr API itself concerns idr
removal: idr layers are now freed after a grace period, instead of being
moved to the free list.

The important change with respect to the ipcs is that idr_find() can now be
called locklessly, inside an RCU read-side critical section.

Here are the results I've got for the pmsg test sent by Manfred: 

   2.6.25-rc3-mm1   2.6.25-rc3-mm1+   2.6.25-mm1   Patched 2.6.25-mm1
1         1168441           1064021       876000               947488
2         1094264            921059      1549592              1730685
3         2082520           1738165      1694370              2324880
4         2079929           1695521       404553              2400408
5         2898758            406566       391283              3246580
6         2921417            261275       263249              3752148
7         3308761            126056       191742              4243142
8         3329456            100129       141722              4275780

1st column: stock 2.6.25-rc3-mm1
2nd column: 2.6.25-rc3-mm1 + ipc patches (store ipcs into idrs)
3rd column: stock 2.6.25-mm1
4th column: 2.6.25-mm1 + this patch series.

I'll send a chart as an answer to this mail: don't know how to do that
with quilt :-(


Reviewers are more than ever welcome!

Patches should be applied on linux-2.6.25-mm1, in the following order:

[ PATCH 01/09 ] : idr_add_rcu_head.patch
[ PATCH 02/09 ] : idr_rename_routines.patch
[ PATCH 03/09 ] : idr_fix_printk.patch
[ PATCH 04/09 ] : idr_rc_to_errno.patch
[ PATCH 05/09 ] : idr_get_new_rcu_safe.patch
[ PATCH 06/09 ] : idr_find_rcu_safe.patch
[ PATCH 07/09 ] : idr_remove_rcu_safe.patch
[ PATCH 08/09 ] : ipc_fix_ipc_lock.patch
[ PATCH 09/09 ] : remove_ipc_lock_down.patch

Patches 2, 3 and 4 do not introduce any functional changes.

I won't be available before next Tuesday, so, please, don't be mad at me if
I'm not answering fast enough.

Regards,
Nadia

--

^ permalink raw reply	[flat|nested] 39+ messages in thread

* [PATCH 1/9] Change the idr structure
  2008-05-07 11:35 [PATCH 0/9] Scalability requirements for sysv ipc - v3 Nadia.Derbey
@ 2008-05-07 11:35 ` Nadia.Derbey
  2008-05-08 17:12   ` Rik van Riel
  2008-05-30  8:22   ` Paul E. McKenney
  2008-05-07 11:35 ` [PATCH 2/9] Rename some of the idr APIs internal routines Nadia.Derbey
                   ` (10 subsequent siblings)
  11 siblings, 2 replies; 39+ messages in thread
From: Nadia.Derbey @ 2008-05-07 11:35 UTC (permalink / raw)
  To: manfred, paulmck, lnxninja; +Cc: linux-kernel, efault, akpm, Nadia Derbey

[-- Attachment #1: idr_add_rcu_head.patch --]
[-- Type: text/plain, Size: 922 bytes --]

[PATCH 01/09]

This patch adds an rcu_head to the idr_layer structure in order to free it
after a grace period.

Signed-off-by: Nadia Derbey <Nadia.Derbey@bull.net>

---
 include/linux/idr.h |    2 ++
 1 file changed, 2 insertions(+)

Index: linux-2.6.25-mm1/include/linux/idr.h
===================================================================
--- linux-2.6.25-mm1.orig/include/linux/idr.h	2008-05-06 17:14:24.000000000 +0200
+++ linux-2.6.25-mm1/include/linux/idr.h	2008-05-06 17:20:58.000000000 +0200
@@ -15,6 +15,7 @@
 #include <linux/types.h>
 #include <linux/bitops.h>
 #include <linux/init.h>
+#include <linux/rcupdate.h>
 
 #if BITS_PER_LONG == 32
 # define IDR_BITS 5
@@ -51,6 +52,7 @@ struct idr_layer {
 	unsigned long		 bitmap; /* A zero bit means "space here" */
 	struct idr_layer	*ary[1<<IDR_BITS];
 	int			 count;	 /* When zero, we can release it */
+	struct rcu_head		 rcu_head;
 };
 
 struct idr {

--


* [PATCH 2/9] Rename some of the idr APIs internal routines
  2008-05-07 11:35 [PATCH 0/9] Scalability requirements for sysv ipc - v3 Nadia.Derbey
  2008-05-07 11:35 ` [PATCH 1/9] Change the idr structure Nadia.Derbey
@ 2008-05-07 11:35 ` Nadia.Derbey
  2008-05-08 17:15   ` Rik van Riel
  2008-05-30  8:23   ` Paul E. McKenney
  2008-05-07 11:35 ` [PATCH 3/9] Fix a printk call Nadia.Derbey
                   ` (9 subsequent siblings)
  11 siblings, 2 replies; 39+ messages in thread
From: Nadia.Derbey @ 2008-05-07 11:35 UTC (permalink / raw)
  To: manfred, paulmck, lnxninja; +Cc: linux-kernel, efault, akpm, Nadia Derbey

[-- Attachment #1: idr_rename_routines.patch --]
[-- Type: text/plain, Size: 4270 bytes --]

[PATCH 02/09]

This is a trivial patch that renames:
   . alloc_layer to get_from_free_list, since it is idr_pre_get() that
     actually allocates memory.
   . free_layer to move_to_free_list, since memory is not actually freed
     there.

This makes things clearer for the next patches.

Signed-off-by: Nadia Derbey <Nadia.Derbey@bull.net>

---
 lib/idr.c |   31 ++++++++++++++++---------------
 1 file changed, 16 insertions(+), 15 deletions(-)

Index: linux-2.6.25-mm1/lib/idr.c
===================================================================
--- linux-2.6.25-mm1.orig/lib/idr.c	2008-05-06 17:14:30.000000000 +0200
+++ linux-2.6.25-mm1/lib/idr.c	2008-05-06 17:36:36.000000000 +0200
@@ -35,7 +35,7 @@
 
 static struct kmem_cache *idr_layer_cache;
 
-static struct idr_layer *alloc_layer(struct idr *idp)
+static struct idr_layer *get_from_free_list(struct idr *idp)
 {
 	struct idr_layer *p;
 	unsigned long flags;
@@ -51,14 +51,14 @@ static struct idr_layer *alloc_layer(str
 }
 
 /* only called when idp->lock is held */
-static void __free_layer(struct idr *idp, struct idr_layer *p)
+static void __move_to_free_list(struct idr *idp, struct idr_layer *p)
 {
 	p->ary[0] = idp->id_free;
 	idp->id_free = p;
 	idp->id_free_cnt++;
 }
 
-static void free_layer(struct idr *idp, struct idr_layer *p)
+static void move_to_free_list(struct idr *idp, struct idr_layer *p)
 {
 	unsigned long flags;
 
@@ -66,7 +66,7 @@ static void free_layer(struct idr *idp, 
 	 * Depends on the return element being zeroed.
 	 */
 	spin_lock_irqsave(&idp->lock, flags);
-	__free_layer(idp, p);
+	__move_to_free_list(idp, p);
 	spin_unlock_irqrestore(&idp->lock, flags);
 }
 
@@ -109,7 +109,7 @@ int idr_pre_get(struct idr *idp, gfp_t g
 		new = kmem_cache_alloc(idr_layer_cache, gfp_mask);
 		if (new == NULL)
 			return (0);
-		free_layer(idp, new);
+		move_to_free_list(idp, new);
 	}
 	return 1;
 }
@@ -167,7 +167,8 @@ static int sub_alloc(struct idr *idp, in
 		 * Create the layer below if it is missing.
 		 */
 		if (!p->ary[m]) {
-			if (!(new = alloc_layer(idp)))
+			new = get_from_free_list(idp);
+			if (!new)
 				return -1;
 			p->ary[m] = new;
 			p->count++;
@@ -192,7 +193,7 @@ build_up:
 	p = idp->top;
 	layers = idp->layers;
 	if (unlikely(!p)) {
-		if (!(p = alloc_layer(idp)))
+		if (!(p = get_from_free_list(idp)))
 			return -1;
 		layers = 1;
 	}
@@ -204,7 +205,7 @@ build_up:
 		layers++;
 		if (!p->count)
 			continue;
-		if (!(new = alloc_layer(idp))) {
+		if (!(new = get_from_free_list(idp))) {
 			/*
 			 * The allocation failed.  If we built part of
 			 * the structure tear it down.
@@ -214,7 +215,7 @@ build_up:
 				p = p->ary[0];
 				new->ary[0] = NULL;
 				new->bitmap = new->count = 0;
-				__free_layer(idp, new);
+				__move_to_free_list(idp, new);
 			}
 			spin_unlock_irqrestore(&idp->lock, flags);
 			return -1;
@@ -351,7 +352,7 @@ static void sub_remove(struct idr *idp, 
 		__clear_bit(n, &p->bitmap);
 		p->ary[n] = NULL;
 		while(*paa && ! --((**paa)->count)){
-			free_layer(idp, **paa);
+			move_to_free_list(idp, **paa);
 			**paa-- = NULL;
 		}
 		if (!*paa)
@@ -378,12 +379,12 @@ void idr_remove(struct idr *idp, int id)
 
 		p = idp->top->ary[0];
 		idp->top->bitmap = idp->top->count = 0;
-		free_layer(idp, idp->top);
+		move_to_free_list(idp, idp->top);
 		idp->top = p;
 		--idp->layers;
 	}
 	while (idp->id_free_cnt >= IDR_FREE_MAX) {
-		p = alloc_layer(idp);
+		p = get_from_free_list(idp);
 		kmem_cache_free(idr_layer_cache, p);
 		return;
 	}
@@ -426,7 +427,7 @@ void idr_remove_all(struct idr *idp)
 		while (n < fls(id)) {
 			if (p) {
 				memset(p, 0, sizeof *p);
-				free_layer(idp, p);
+				move_to_free_list(idp, p);
 			}
 			n += IDR_BITS;
 			p = *--paa;
@@ -444,7 +445,7 @@ EXPORT_SYMBOL(idr_remove_all);
 void idr_destroy(struct idr *idp)
 {
 	while (idp->id_free_cnt) {
-		struct idr_layer *p = alloc_layer(idp);
+		struct idr_layer *p = get_from_free_list(idp);
 		kmem_cache_free(idr_layer_cache, p);
 	}
 }
@@ -749,7 +750,7 @@ int ida_get_new_above(struct ida *ida, i
 	 * allocation.
 	 */
 	if (ida->idr.id_free_cnt || ida->free_bitmap) {
-		struct idr_layer *p = alloc_layer(&ida->idr);
+		struct idr_layer *p = get_from_free_list(&ida->idr);
 		if (p)
 			kmem_cache_free(idr_layer_cache, p);
 	}

--


* [PATCH 3/9] Fix a printk call
  2008-05-07 11:35 [PATCH 0/9] Scalability requirements for sysv ipc - v3 Nadia.Derbey
  2008-05-07 11:35 ` [PATCH 1/9] Change the idr structure Nadia.Derbey
  2008-05-07 11:35 ` [PATCH 2/9] Rename some of the idr APIs internal routines Nadia.Derbey
@ 2008-05-07 11:35 ` Nadia.Derbey
  2008-05-08 17:43   ` Rik van Riel
  2008-05-30  8:23   ` Paul E. McKenney
  2008-05-07 11:35 ` [PATCH 4/9] Error checking factorization Nadia.Derbey
                   ` (8 subsequent siblings)
  11 siblings, 2 replies; 39+ messages in thread
From: Nadia.Derbey @ 2008-05-07 11:35 UTC (permalink / raw)
  To: manfred, paulmck, lnxninja; +Cc: linux-kernel, efault, akpm, Nadia Derbey

[-- Attachment #1: idr_fix_printk.patch --]
[-- Type: text/plain, Size: 724 bytes --]

[PATCH 03/09]

This is a trivial patch that fixes an incomplete printk() call by adding the
missing log level.

Signed-off-by: Nadia Derbey <Nadia.Derbey@bull.net>

---
 lib/idr.c |    3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

Index: linux-2.6.25-mm1/lib/idr.c
===================================================================
--- linux-2.6.25-mm1.orig/lib/idr.c	2008-05-06 17:24:15.000000000 +0200
+++ linux-2.6.25-mm1/lib/idr.c	2008-05-06 17:26:41.000000000 +0200
@@ -325,7 +325,8 @@ EXPORT_SYMBOL(idr_get_new);
 
 static void idr_remove_warning(int id)
 {
-	printk("idr_remove called for id=%d which is not allocated.\n", id);
+	printk(KERN_WARNING
+		"idr_remove called for id=%d which is not allocated.\n", id);
 	dump_stack();
 }
 

--


* [PATCH 4/9] Error checking factorization
  2008-05-07 11:35 [PATCH 0/9] Scalability requirements for sysv ipc - v3 Nadia.Derbey
                   ` (2 preceding siblings ...)
  2008-05-07 11:35 ` [PATCH 3/9] Fix a printk call Nadia.Derbey
@ 2008-05-07 11:35 ` Nadia.Derbey
  2008-05-08 17:45   ` Rik van Riel
  2008-05-07 11:35 ` [PATCH 5/9] Make idr_get_new* rcu-safe Nadia.Derbey
                   ` (7 subsequent siblings)
  11 siblings, 1 reply; 39+ messages in thread
From: Nadia.Derbey @ 2008-05-07 11:35 UTC (permalink / raw)
  To: manfred, paulmck, lnxninja; +Cc: linux-kernel, efault, akpm, Nadia Derbey

[-- Attachment #1: idr_rc_to_errno.patch --]
[-- Type: text/plain, Size: 2832 bytes --]

[PATCH 04/09]

This is a trivial patch that factorizes some code in the return-code
analysis.

Signed-off-by: Nadia Derbey <Nadia.Derbey@bull.net>

---
 include/linux/idr.h |    6 ++++++
 lib/idr.c           |   30 +++++++++---------------------
 2 files changed, 15 insertions(+), 21 deletions(-)

Index: linux-2.6.25-mm1/include/linux/idr.h
===================================================================
--- linux-2.6.25-mm1.orig/include/linux/idr.h	2008-05-06 17:35:42.000000000 +0200
+++ linux-2.6.25-mm1/include/linux/idr.h	2008-05-06 17:38:42.000000000 +0200
@@ -73,6 +73,12 @@ struct idr {
 }
 #define DEFINE_IDR(name)	struct idr name = IDR_INIT(name)
 
+/* Actions to be taken after a call to _idr_sub_alloc */
+#define IDR_NEED_TO_GROW -2
+#define IDR_NOMORE_SPACE -3
+
+#define _idr_rc_to_errno(rc) ((rc) == -1 ? -EAGAIN : -ENOSPC)
+
 /*
  * This is what we export.
  */
Index: linux-2.6.25-mm1/lib/idr.c
===================================================================
--- linux-2.6.25-mm1.orig/lib/idr.c	2008-05-06 17:37:40.000000000 +0200
+++ linux-2.6.25-mm1/lib/idr.c	2008-05-06 17:38:42.000000000 +0200
@@ -143,7 +143,7 @@ static int sub_alloc(struct idr *idp, in
 			/* if already at the top layer, we need to grow */
 			if (!(p = pa[l])) {
 				*starting_id = id;
-				return -2;
+				return IDR_NEED_TO_GROW;
 			}
 
 			/* If we need to go up one layer, continue the
@@ -160,7 +160,7 @@ static int sub_alloc(struct idr *idp, in
 			id = ((id >> sh) ^ n ^ m) << sh;
 		}
 		if ((id >= MAX_ID_BIT) || (id < 0))
-			return -3;
+			return IDR_NOMORE_SPACE;
 		if (l == 0)
 			break;
 		/*
@@ -229,7 +229,7 @@ build_up:
 	idp->top = p;
 	idp->layers = layers;
 	v = sub_alloc(idp, &id, pa);
-	if (v == -2)
+	if (v == IDR_NEED_TO_GROW)
 		goto build_up;
 	return(v);
 }
@@ -278,12 +278,8 @@ int idr_get_new_above(struct idr *idp, v
 	 * This is a cheap hack until the IDR code can be fixed to
 	 * return proper error values.
 	 */
-	if (rv < 0) {
-		if (rv == -1)
-			return -EAGAIN;
-		else /* Will be -3 */
-			return -ENOSPC;
-	}
+	if (rv < 0)
+		return _idr_rc_to_errno(rv);
 	*id = rv;
 	return 0;
 }
@@ -313,12 +309,8 @@ int idr_get_new(struct idr *idp, void *p
 	 * This is a cheap hack until the IDR code can be fixed to
 	 * return proper error values.
 	 */
-	if (rv < 0) {
-		if (rv == -1)
-			return -EAGAIN;
-		else /* Will be -3 */
-			return -ENOSPC;
-	}
+	if (rv < 0)
+		return _idr_rc_to_errno(rv);
 	*id = rv;
 	return 0;
 }
@@ -696,12 +688,8 @@ int ida_get_new_above(struct ida *ida, i
  restart:
 	/* get vacant slot */
 	t = idr_get_empty_slot(&ida->idr, idr_id, pa);
-	if (t < 0) {
-		if (t == -1)
-			return -EAGAIN;
-		else /* will be -3 */
-			return -ENOSPC;
-	}
+	if (t < 0)
+		return _idr_rc_to_errno(t);
 
 	if (t * IDA_BITMAP_BITS >= MAX_ID_BIT)
 		return -ENOSPC;

--


* [PATCH 5/9] Make idr_get_new* rcu-safe
  2008-05-07 11:35 [PATCH 0/9] Scalability requirements for sysv ipc - v3 Nadia.Derbey
                   ` (3 preceding siblings ...)
  2008-05-07 11:35 ` [PATCH 4/9] Error checking factorization Nadia.Derbey
@ 2008-05-07 11:35 ` Nadia.Derbey
  2008-05-08 17:55   ` Rik van Riel
  2008-05-30  8:23   ` Paul E. McKenney
  2008-05-07 11:35 ` [PATCH 6/9] Make idr_find rcu-safe Nadia.Derbey
                   ` (6 subsequent siblings)
  11 siblings, 2 replies; 39+ messages in thread
From: Nadia.Derbey @ 2008-05-07 11:35 UTC (permalink / raw)
  To: manfred, paulmck, lnxninja; +Cc: linux-kernel, efault, akpm, Nadia Derbey

[-- Attachment #1: idr_get_new_rcu_safe.patch --]
[-- Type: text/plain, Size: 2567 bytes --]

[PATCH 05/09]

This patch makes the idr_get_new* routines RCU-safe.

Signed-off-by: Nadia Derbey <Nadia.Derbey@bull.net>

---
 lib/idr.c |   17 ++++++++++++-----
 1 file changed, 12 insertions(+), 5 deletions(-)

Index: linux-2.6.25-mm1/lib/idr.c
===================================================================
--- linux-2.6.25-mm1.orig/lib/idr.c	2008-05-06 17:38:42.000000000 +0200
+++ linux-2.6.25-mm1/lib/idr.c	2008-05-06 18:02:07.000000000 +0200
@@ -6,6 +6,8 @@
  * Modified by George Anzinger to reuse immediately and to use
  * find bit instructions.  Also removed _irq on spinlocks.
  *
+ * Modified by Nadia Derbey to make it RCU safe.
+ *
  * Small id to pointer translation service.
  *
  * It uses a radix tree like structure as a sparse array indexed
@@ -96,7 +98,7 @@ static void idr_mark_full(struct idr_lay
  * @gfp_mask:	memory allocation flags
  *
  * This function should be called prior to locking and calling the
- * following function.  It preallocates enough memory to satisfy
+ * idr_get_new* functions. It preallocates enough memory to satisfy
  * the worst possible allocation.
  *
  * If the system is REALLY out of memory this function returns 0,
@@ -170,7 +172,8 @@ static int sub_alloc(struct idr *idp, in
 			new = get_from_free_list(idp);
 			if (!new)
 				return -1;
-			p->ary[m] = new;
+			INIT_RCU_HEAD(&new->rcu_head);
+			rcu_assign_pointer(p->ary[m], new);
 			p->count++;
 		}
 		pa[l--] = p;
@@ -195,6 +198,7 @@ build_up:
 	if (unlikely(!p)) {
 		if (!(p = get_from_free_list(idp)))
 			return -1;
+		INIT_RCU_HEAD(&p->rcu_head);
 		layers = 1;
 	}
 	/*
@@ -222,11 +226,12 @@ build_up:
 		}
 		new->ary[0] = p;
 		new->count = 1;
+		INIT_RCU_HEAD(&new->rcu_head);
 		if (p->bitmap == IDR_FULL)
 			__set_bit(0, &new->bitmap);
 		p = new;
 	}
-	idp->top = p;
+	rcu_assign_pointer(idp->top, p);
 	idp->layers = layers;
 	v = sub_alloc(idp, &id, pa);
 	if (v == IDR_NEED_TO_GROW)
@@ -245,7 +250,8 @@ static int idr_get_new_above_int(struct 
 		 * Successfully found an empty slot.  Install the user
 		 * pointer and mark the slot full.
 		 */
-		pa[0]->ary[id & IDR_MASK] = (struct idr_layer *)ptr;
+		rcu_assign_pointer(pa[0]->ary[id & IDR_MASK],
+				(struct idr_layer *)ptr);
 		pa[0]->count++;
 		idr_mark_full(pa, id);
 	}
@@ -710,7 +716,8 @@ int ida_get_new_above(struct ida *ida, i
 			return -EAGAIN;
 
 		memset(bitmap, 0, sizeof(struct ida_bitmap));
-		pa[0]->ary[idr_id & IDR_MASK] = (void *)bitmap;
+		rcu_assign_pointer(pa[0]->ary[idr_id & IDR_MASK],
+				(void *)bitmap);
 		pa[0]->count++;
 	}
 

--


* [PATCH 6/9] Make idr_find rcu-safe
  2008-05-07 11:35 [PATCH 0/9] Scalability requirements for sysv ipc - v3 Nadia.Derbey
                   ` (4 preceding siblings ...)
  2008-05-07 11:35 ` [PATCH 5/9] Make idr_get_new* rcu-safe Nadia.Derbey
@ 2008-05-07 11:35 ` Nadia.Derbey
  2008-05-08 17:58   ` Rik van Riel
  2008-05-30  8:24   ` Paul E. McKenney
  2008-05-07 11:36 ` [PATCH 7/9] Make idr_remove rcu-safe Nadia.Derbey
                   ` (5 subsequent siblings)
  11 siblings, 2 replies; 39+ messages in thread
From: Nadia.Derbey @ 2008-05-07 11:35 UTC (permalink / raw)
  To: manfred, paulmck, lnxninja; +Cc: linux-kernel, efault, akpm, Nadia Derbey

[-- Attachment #1: idr_find_rcu_safe.patch --]
[-- Type: text/plain, Size: 2943 bytes --]

[PATCH 06/09]

This patch makes idr_find() RCU-safe: it can now be called inside an RCU
read-side critical section.

Signed-off-by: Nadia Derbey <Nadia.Derbey@bull.net>

---
 include/linux/idr.h |   16 ++++++++++++++++
 lib/idr.c           |   11 ++++++-----
 2 files changed, 22 insertions(+), 5 deletions(-)

Index: linux-2.6.25-mm1/lib/idr.c
===================================================================
--- linux-2.6.25-mm1.orig/lib/idr.c	2008-05-06 18:06:58.000000000 +0200
+++ linux-2.6.25-mm1/lib/idr.c	2008-05-07 10:38:15.000000000 +0200
@@ -459,7 +459,8 @@ EXPORT_SYMBOL(idr_destroy);
  * return indicates that @id is not valid or you passed %NULL in
  * idr_get_new().
  *
- * The caller must serialize idr_find() vs idr_get_new() and idr_remove().
+ * This function can be called under rcu_read_lock(), given that the leaf
+ * pointers lifetimes are correctly managed.
  */
 void *idr_find(struct idr *idp, int id)
 {
@@ -467,7 +468,7 @@ void *idr_find(struct idr *idp, int id)
 	struct idr_layer *p;
 
 	n = idp->layers * IDR_BITS;
-	p = idp->top;
+	p = rcu_dereference(idp->top);
 
 	/* Mask off upper bits we don't use for the search. */
 	id &= MAX_ID_MASK;
@@ -477,7 +478,7 @@ void *idr_find(struct idr *idp, int id)
 
 	while (n > 0 && p) {
 		n -= IDR_BITS;
-		p = p->ary[(id >> n) & IDR_MASK];
+		p = rcu_dereference(p->ary[(id >> n) & IDR_MASK]);
 	}
 	return((void *)p);
 }
@@ -510,7 +511,7 @@ int idr_for_each(struct idr *idp,
 	struct idr_layer **paa = &pa[0];
 
 	n = idp->layers * IDR_BITS;
-	p = idp->top;
+	p = rcu_dereference(idp->top);
 	max = 1 << n;
 
 	id = 0;
@@ -518,7 +519,7 @@ int idr_for_each(struct idr *idp,
 		while (n > 0 && p) {
 			n -= IDR_BITS;
 			*paa++ = p;
-			p = p->ary[(id >> n) & IDR_MASK];
+			p = rcu_dereference(p->ary[(id >> n) & IDR_MASK]);
 		}
 
 		if (p) {
Index: linux-2.6.25-mm1/include/linux/idr.h
===================================================================
--- linux-2.6.25-mm1.orig/include/linux/idr.h	2008-05-06 17:38:42.000000000 +0200
+++ linux-2.6.25-mm1/include/linux/idr.h	2008-05-07 10:41:22.000000000 +0200
@@ -79,6 +79,22 @@ struct idr {
 
 #define _idr_rc_to_errno(rc) ((rc) == -1 ? -EAGAIN : -ENOSPC)
 
+/**
+ * idr synchronization (stolen from radix-tree.h)
+ *
+ * idr_find() is able to be called locklessly, using RCU. The caller must
+ * ensure calls to this function are made within rcu_read_lock() regions.
+ * Other readers (lock-free or otherwise) and modifications may be running
+ * concurrently.
+ *
+ * It is still required that the caller manage the synchronization and
+ * lifetimes of the items. So if RCU lock-free lookups are used, typically
+ * this would mean that the items have their own locks, or are amenable to
+ * lock-free access; and that the items are freed by RCU (or only freed after
+ * having been deleted from the idr tree *and* a synchronize_rcu() grace
+ * period).
+ */
+
 /*
  * This is what we export.
  */

--


* [PATCH 7/9] Make idr_remove rcu-safe
  2008-05-07 11:35 [PATCH 0/9] Scalability requirements for sysv ipc - v3 Nadia.Derbey
                   ` (5 preceding siblings ...)
  2008-05-07 11:35 ` [PATCH 6/9] Make idr_find rcu-safe Nadia.Derbey
@ 2008-05-07 11:36 ` Nadia.Derbey
  2008-05-08 18:02   ` Rik van Riel
                     ` (2 more replies)
  2008-05-07 11:36 ` [PATCH 8/9] Call idr_find() without locking in ipc_lock() Nadia.Derbey
                   ` (4 subsequent siblings)
  11 siblings, 3 replies; 39+ messages in thread
From: Nadia.Derbey @ 2008-05-07 11:36 UTC (permalink / raw)
  To: manfred, paulmck, lnxninja; +Cc: linux-kernel, efault, akpm, Nadia Derbey

[-- Attachment #1: idr_remove_rcu_safe.patch --]
[-- Type: text/plain, Size: 3990 bytes --]

[PATCH 07/09]

This patch introduces the free_layer() routine: it is the one that actually
frees memory after a grace period has elapsed.

Signed-off-by: Nadia Derbey <Nadia.Derbey@bull.net>

---
 lib/idr.c |   59 ++++++++++++++++++++++++++++++++++++++++++++---------------
 1 file changed, 44 insertions(+), 15 deletions(-)

Index: linux-2.6.25-mm1/lib/idr.c
===================================================================
--- linux-2.6.25-mm1.orig/lib/idr.c	2008-05-06 18:06:43.000000000 +0200
+++ linux-2.6.25-mm1/lib/idr.c	2008-05-07 09:07:31.000000000 +0200
@@ -52,6 +52,19 @@ static struct idr_layer *get_from_free_l
 	return(p);
 }
 
+static void idr_layer_rcu_free(struct rcu_head *head)
+{
+	struct idr_layer *layer;
+
+	layer = container_of(head, struct idr_layer, rcu_head);
+	kmem_cache_free(idr_layer_cache, layer);
+}
+
+static inline void free_layer(struct idr_layer *p)
+{
+	call_rcu(&p->rcu_head, idr_layer_rcu_free);
+}
+
 /* only called when idp->lock is held */
 static void __move_to_free_list(struct idr *idp, struct idr_layer *p)
 {
@@ -334,6 +347,7 @@ static void sub_remove(struct idr *idp, 
 	struct idr_layer *p = idp->top;
 	struct idr_layer **pa[MAX_LEVEL];
 	struct idr_layer ***paa = &pa[0];
+	struct idr_layer *to_free;
 	int n;
 
 	*paa = NULL;
@@ -349,13 +363,18 @@ static void sub_remove(struct idr *idp, 
 	n = id & IDR_MASK;
 	if (likely(p != NULL && test_bit(n, &p->bitmap))){
 		__clear_bit(n, &p->bitmap);
-		p->ary[n] = NULL;
+		rcu_assign_pointer(p->ary[n], NULL);
+		to_free = NULL;
 		while(*paa && ! --((**paa)->count)){
-			move_to_free_list(idp, **paa);
+			if (to_free)
+				free_layer(to_free);
+			to_free = **paa;
 			**paa-- = NULL;
 		}
 		if (!*paa)
 			idp->layers = 0;
+		if (to_free)
+			free_layer(to_free);
 	} else
 		idr_remove_warning(id);
 }
@@ -368,25 +387,37 @@ static void sub_remove(struct idr *idp, 
 void idr_remove(struct idr *idp, int id)
 {
 	struct idr_layer *p;
+	struct idr_layer *to_free;
 
 	/* Mask off upper bits we don't use for the search. */
 	id &= MAX_ID_MASK;
 
 	sub_remove(idp, (idp->layers - 1) * IDR_BITS, id);
 	if (idp->top && idp->top->count == 1 && (idp->layers > 1) &&
-	    idp->top->ary[0]) {  // We can drop a layer
-
+	    idp->top->ary[0]) {
+		/*
+		 * Single child at leftmost slot: we can shrink the tree.
+		 * This level is not needed anymore since when layers are
+		 * inserted, they are inserted at the top of the existing
+		 * tree.
+		 */
+		to_free = idp->top;
 		p = idp->top->ary[0];
-		idp->top->bitmap = idp->top->count = 0;
-		move_to_free_list(idp, idp->top);
-		idp->top = p;
+		rcu_assign_pointer(idp->top, p);
 		--idp->layers;
+		to_free->bitmap = to_free->count = 0;
+		free_layer(to_free);
 	}
 	while (idp->id_free_cnt >= IDR_FREE_MAX) {
 		p = get_from_free_list(idp);
+		/*
+		 * Note: we don't call the rcu callback here, since the only
+		 * layers that fall into the freelist are those that have been
+		 * preallocated.
+		 */
 		kmem_cache_free(idr_layer_cache, p);
-		return;
 	}
+	return;
 }
 EXPORT_SYMBOL(idr_remove);
 
@@ -424,15 +455,13 @@ void idr_remove_all(struct idr *idp)
 
 		id += 1 << n;
 		while (n < fls(id)) {
-			if (p) {
-				memset(p, 0, sizeof *p);
-				move_to_free_list(idp, p);
-			}
+			if (p)
+				free_layer(p);
 			n += IDR_BITS;
 			p = *--paa;
 		}
 	}
-	idp->top = NULL;
+	rcu_assign_pointer(idp->top, NULL);
 	idp->layers = 0;
 }
 EXPORT_SYMBOL(idr_remove_all);
@@ -549,7 +578,7 @@ EXPORT_SYMBOL(idr_for_each);
  * A -ENOENT return indicates that @id was not found.
  * A -EINVAL return indicates that @id was not within valid constraints.
  *
- * The caller must serialize vs idr_find(), idr_get_new(), and idr_remove().
+ * The caller must serialize with writers.
  */
 void *idr_replace(struct idr *idp, void *ptr, int id)
 {
@@ -575,7 +604,7 @@ void *idr_replace(struct idr *idp, void 
 		return ERR_PTR(-ENOENT);
 
 	old_p = p->ary[n];
-	p->ary[n] = ptr;
+	rcu_assign_pointer(p->ary[n], ptr);
 
 	return old_p;
 }

--


* [PATCH 8/9] Call idr_find() without locking in ipc_lock()
  2008-05-07 11:35 [PATCH 0/9] Scalability requirements for sysv ipc - v3 Nadia.Derbey
                   ` (6 preceding siblings ...)
  2008-05-07 11:36 ` [PATCH 7/9] Make idr_remove rcu-safe Nadia.Derbey
@ 2008-05-07 11:36 ` Nadia.Derbey
  2008-05-08 18:11   ` Rik van Riel
  2008-05-30  8:27   ` Paul E. McKenney
  2008-05-07 11:36 ` [PATCH 9/9] Get rid of ipc_lock_down() Nadia.Derbey
                   ` (3 subsequent siblings)
  11 siblings, 2 replies; 39+ messages in thread
From: Nadia.Derbey @ 2008-05-07 11:36 UTC (permalink / raw)
  To: manfred, paulmck, lnxninja; +Cc: linux-kernel, efault, akpm, Nadia Derbey

[-- Attachment #1: ipc_fix_ipc_lock.patch --]
[-- Type: text/plain, Size: 1354 bytes --]

[PATCH 08/09]

This patch makes ipc_lock() call idr_find() locklessly, since the idr tree
is now RCU-protected.

Signed-off-by: Nadia Derbey <Nadia.Derbey@bull.net>

---
 ipc/util.c |    9 ---------
 1 file changed, 9 deletions(-)

Index: linux-2.6.25-mm1/ipc/util.c
===================================================================
--- linux-2.6.25-mm1.orig/ipc/util.c	2008-05-06 17:15:10.000000000 +0200
+++ linux-2.6.25-mm1/ipc/util.c	2008-05-07 09:56:20.000000000 +0200
@@ -688,10 +688,6 @@ void ipc64_perm_to_ipc_perm (struct ipc6
  * Look for an id in the ipc ids idr and lock the associated ipc object.
  *
  * The ipc object is locked on exit.
- *
- * This is the routine that should be called when the rw_mutex is not already
- * held, i.e. idr tree not protected: it protects the idr tree in read mode
- * during the idr_find().
  */
 
 struct kern_ipc_perm *ipc_lock(struct ipc_ids *ids, int id)
@@ -699,18 +695,13 @@ struct kern_ipc_perm *ipc_lock(struct ip
 	struct kern_ipc_perm *out;
 	int lid = ipcid_to_idx(id);
 
-	down_read(&ids->rw_mutex);
-
 	rcu_read_lock();
 	out = idr_find(&ids->ipcs_idr, lid);
 	if (out == NULL) {
 		rcu_read_unlock();
-		up_read(&ids->rw_mutex);
 		return ERR_PTR(-EINVAL);
 	}
 
-	up_read(&ids->rw_mutex);
-
 	spin_lock(&out->lock);
 	
 	/* ipc_rmid() may have already freed the ID while ipc_lock

--


* [PATCH 9/9] Get rid of ipc_lock_down()
  2008-05-07 11:35 [PATCH 0/9] Scalability requirements for sysv ipc - v3 Nadia.Derbey
                   ` (7 preceding siblings ...)
  2008-05-07 11:36 ` [PATCH 8/9] Call idr_find() without locking in ipc_lock() Nadia.Derbey
@ 2008-05-07 11:36 ` Nadia.Derbey
  2008-05-08 18:13   ` Rik van Riel
  2008-05-30  8:29   ` Paul E. McKenney
  2008-05-07 11:41 ` [PATCH 0/9] Scalability requirements for sysv ipc - v3 Nadia Derbey
                   ` (2 subsequent siblings)
  11 siblings, 2 replies; 39+ messages in thread
From: Nadia.Derbey @ 2008-05-07 11:36 UTC (permalink / raw)
  To: manfred, paulmck, lnxninja; +Cc: linux-kernel, efault, akpm, Nadia Derbey

[-- Attachment #1: remove_ipc_lock_down.patch --]
[-- Type: text/plain, Size: 4618 bytes --]

[PATCH 09/09]

This patch removes the ipc_lock_down() routines: they used to call
idr_find() locklessly (because the ipc ids rw_mutex was already held), so
they are no longer needed now that ipc_lock() itself is lockless.

Signed-off-by: Nadia Derbey <Nadia.Derbey@bull.net>

---
 ipc/shm.c  |   21 +++------------------
 ipc/util.c |   52 +---------------------------------------------------
 ipc/util.h |    6 ------
 3 files changed, 4 insertions(+), 75 deletions(-)

Index: linux-2.6.25-mm1/ipc/util.c
===================================================================
--- linux-2.6.25-mm1.orig/ipc/util.c	2008-05-07 09:56:20.000000000 +0200
+++ linux-2.6.25-mm1/ipc/util.c	2008-05-07 09:58:32.000000000 +0200
@@ -716,56 +716,6 @@ struct kern_ipc_perm *ipc_lock(struct ip
 	return out;
 }
 
-/**
- * ipc_lock_down - Lock an ipc structure with rw_sem held
- * @ids: IPC identifier set
- * @id: ipc id to look for
- *
- * Look for an id in the ipc ids idr and lock the associated ipc object.
- *
- * The ipc object is locked on exit.
- *
- * This is the routine that should be called when the rw_mutex is already
- * held, i.e. idr tree protected.
- */
-
-struct kern_ipc_perm *ipc_lock_down(struct ipc_ids *ids, int id)
-{
-	struct kern_ipc_perm *out;
-	int lid = ipcid_to_idx(id);
-
-	rcu_read_lock();
-	out = idr_find(&ids->ipcs_idr, lid);
-	if (out == NULL) {
-		rcu_read_unlock();
-		return ERR_PTR(-EINVAL);
-	}
-
-	spin_lock(&out->lock);
-
-	/*
-	 * No need to verify that the structure is still valid since the
-	 * rw_mutex is held.
-	 */
-	return out;
-}
-
-struct kern_ipc_perm *ipc_lock_check_down(struct ipc_ids *ids, int id)
-{
-	struct kern_ipc_perm *out;
-
-	out = ipc_lock_down(ids, id);
-	if (IS_ERR(out))
-		return out;
-
-	if (ipc_checkid(out, id)) {
-		ipc_unlock(out);
-		return ERR_PTR(-EIDRM);
-	}
-
-	return out;
-}
-
 struct kern_ipc_perm *ipc_lock_check(struct ipc_ids *ids, int id)
 {
 	struct kern_ipc_perm *out;
@@ -837,7 +787,7 @@ struct kern_ipc_perm *ipcctl_pre_down(st
 	int err;
 
 	down_write(&ids->rw_mutex);
-	ipcp = ipc_lock_check_down(ids, id);
+	ipcp = ipc_lock_check(ids, id);
 	if (IS_ERR(ipcp)) {
 		err = PTR_ERR(ipcp);
 		goto out_up;
Index: linux-2.6.25-mm1/ipc/util.h
===================================================================
--- linux-2.6.25-mm1.orig/ipc/util.h	2008-05-06 17:15:10.000000000 +0200
+++ linux-2.6.25-mm1/ipc/util.h	2008-05-07 09:59:31.000000000 +0200
@@ -102,11 +102,6 @@ void* ipc_rcu_alloc(int size);
 void ipc_rcu_getref(void *ptr);
 void ipc_rcu_putref(void *ptr);
 
-/*
- * ipc_lock_down: called with rw_mutex held
- * ipc_lock: called without that lock held
- */
-struct kern_ipc_perm *ipc_lock_down(struct ipc_ids *, int);
 struct kern_ipc_perm *ipc_lock(struct ipc_ids *, int);
 
 void kernel_to_ipc64_perm(struct kern_ipc_perm *in, struct ipc64_perm *out);
@@ -155,7 +150,6 @@ static inline void ipc_unlock(struct ker
 	rcu_read_unlock();
 }
 
-struct kern_ipc_perm *ipc_lock_check_down(struct ipc_ids *ids, int id);
 struct kern_ipc_perm *ipc_lock_check(struct ipc_ids *ids, int id);
 int ipcget(struct ipc_namespace *ns, struct ipc_ids *ids,
 			struct ipc_ops *ops, struct ipc_params *params);
Index: linux-2.6.25-mm1/ipc/shm.c
===================================================================
--- linux-2.6.25-mm1.orig/ipc/shm.c	2008-05-06 17:15:10.000000000 +0200
+++ linux-2.6.25-mm1/ipc/shm.c	2008-05-07 10:01:05.000000000 +0200
@@ -112,23 +112,8 @@ void __init shm_init (void)
 }
 
 /*
- * shm_lock_(check_)down routines are called in the paths where the rw_mutex
- * is held to protect access to the idr tree.
- */
-static inline struct shmid_kernel *shm_lock_down(struct ipc_namespace *ns,
-						int id)
-{
-	struct kern_ipc_perm *ipcp = ipc_lock_down(&shm_ids(ns), id);
-
-	if (IS_ERR(ipcp))
-		return (struct shmid_kernel *)ipcp;
-
-	return container_of(ipcp, struct shmid_kernel, shm_perm);
-}
-
-/*
  * shm_lock_(check_) routines are called in the paths where the rw_mutex
- * is not held.
+ * is not necessarily held.
  */
 static inline struct shmid_kernel *shm_lock(struct ipc_namespace *ns, int id)
 {
@@ -211,7 +196,7 @@ static void shm_close(struct vm_area_str
 
 	down_write(&shm_ids(ns).rw_mutex);
 	/* remove from the list of attaches of the shm segment */
-	shp = shm_lock_down(ns, sfd->id);
+	shp = shm_lock(ns, sfd->id);
 	BUG_ON(IS_ERR(shp));
 	shp->shm_lprid = task_tgid_vnr(current);
 	shp->shm_dtim = get_seconds();
@@ -933,7 +918,7 @@ invalid:
 
 out_nattch:
 	down_write(&shm_ids(ns).rw_mutex);
-	shp = shm_lock_down(ns, shmid);
+	shp = shm_lock(ns, shmid);
 	BUG_ON(IS_ERR(shp));
 	shp->shm_nattch--;
 	if(shp->shm_nattch == 0 &&

--

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 0/9] Scalability requirements for sysv ipc - v3
  2008-05-07 11:35 [PATCH 0/9] Scalability requirements for sysv ipc - v3 Nadia.Derbey
                   ` (8 preceding siblings ...)
  2008-05-07 11:36 ` [PATCH 9/9] Get rid of ipc_lock_down() Nadia.Derbey
@ 2008-05-07 11:41 ` Nadia Derbey
  2008-05-07 13:19 ` KOSAKI Motohiro
  2008-05-30  8:22 ` Paul E. McKenney
  11 siblings, 0 replies; 39+ messages in thread
From: Nadia Derbey @ 2008-05-07 11:41 UTC (permalink / raw)
  To: Nadia.Derbey; +Cc: manfred, paulmck, lnxninja, linux-kernel, efault, akpm

[-- Attachment #1: Type: text/plain, Size: 1795 bytes --]

Nadia.Derbey@bull.net wrote:
> After scalability problems have been detected when using the sysV ipcs, I
> have proposed to use an RCU based implementation of the IDR api instead (see
> threads http://lkml.org/lkml/2008/4/11/212 and
> http://lkml.org/lkml/2008/4/29/295).
> 
> This resulted in many people asking to convert the idr API and make it
> rcu safe (because most of the code was duplicated and thus unmaintainable
> and unreviewable).
> 
> So here is a first attempt.
> 
> The important change wrt to the idr API itself is during idr removes:
> idr layers are freed after a grace period, instead of being moved to the
> free list.
> 
> The important change wrt to ipcs, is that idr_find() can now be called
> locklessly inside a rcu read critical section.
> 
> Here are the results I've got for the pmsg test sent by Manfred: 
> 
>    2.6.25-rc3-mm1   2.6.25-rc3-mm1+   2.6.25-mm1   Patched 2.6.25-mm1
> 1         1168441           1064021       876000               947488
> 2         1094264            921059      1549592              1730685
> 3         2082520           1738165      1694370              2324880
> 4         2079929           1695521       404553              2400408
> 5         2898758            406566       391283              3246580
> 6         2921417            261275       263249              3752148
> 7         3308761            126056       191742              4243142
> 8         3329456            100129       141722              4275780
> 
> 1st column: stock 2.6.25-rc3-mm1
> 2nd column: 2.6.25-rc3-mm1 + ipc patches (store ipcs into idrs)
> 3rd column: stock 2.6.25-mm1
> 4th column: 2.6.25-mm1 + this patch series.
> 


You'll find in attachment the corresponding chart, and also the pmsg 
code (originally sent by Manfred).

Regards,
Nadia


[-- Attachment #2: results.pdf --]
[-- Type: application/force-download, Size: 13292 bytes --]

[-- Attachment #3: pmsg.cpp --]
[-- Type: text/x-c++, Size: 4653 bytes --]

/*
 * pmsg.cpp, parallel sysv msg pingpong
 *
 * Copyright (C) 1999, 2001, 2005, 2008 by Manfred Spraul.
 *	All rights reserved except the rights granted by the GPL.
 *
 * Redistribution of this file is permitted under the terms of the GNU 
 * General Public License (GPL) version 2 or later.
 * $Header$
 */

#define _GNU_SOURCE	/* for cpu_set_t and pthread_{set,get}affinity_np() */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <getopt.h>
#include <errno.h>
#include <sched.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/msg.h>
#include <pthread.h>

//////////////////////////////////////////////////////////////////////////////

static enum {
	WAITING,
	RUNNING,
	STOPPED,
} volatile g_state = WAITING;

unsigned long long *g_results;
int *g_svmsg_ids;
pthread_t *g_threads;

struct taskinfo {
	int svmsg_id;
	int threadid;
	int cpuid;
	int sender;
};

#define DATASIZE	8

void* worker_thread(void *arg)
{
	struct taskinfo *ti = (struct taskinfo*)arg;
	unsigned long long rounds;
	int ret;
	struct {
		long mtype;
		char buffer[DATASIZE];
	} mbuf;

	{
		cpu_set_t cpus;
		CPU_ZERO(&cpus);
		CPU_SET(ti->cpuid, &cpus);

		ret = pthread_setaffinity_np(g_threads[ti->threadid], sizeof(cpus), &cpus);
		if (ret != 0) {
			/* pthread_*_np() return the error code directly;
			 * they do not set errno */
			printf("pthread_setaffinity_np failed for thread %d with error %d.\n",
					ti->threadid, ret);
		}

		ret = pthread_getaffinity_np(g_threads[ti->threadid], sizeof(cpus), &cpus);
		if (ret != 0) {
			printf("pthread_getaffinity_np() failed for thread %d with error %d.\n",
					ti->threadid, ret);
			fflush(stdout);
		} else {
			printf("thread %d: sysvmsg %8d type %d bound to %04lxh\n", ti->threadid,
					ti->svmsg_id, ti->sender, cpus.__bits[0]);
		}
		fflush(stdout);
	}

	rounds = 0;
	while(g_state == WAITING) {
#ifdef __i386__
		__asm__ __volatile__("pause": : :"memory");
#endif
	}

	if (ti->sender) {
		mbuf.mtype = ti->sender+1;
		ret = msgsnd(ti->svmsg_id, &mbuf, DATASIZE, 0);
		if (ret != 0) {
			printf("Initial send failed, errno %d.\n", errno);
			exit(1);
		}
	}
	while(g_state == RUNNING) {
		int target = 1+!ti->sender;

		ret = msgrcv(ti->svmsg_id, &mbuf, DATASIZE, target, 0);
		if (ret != DATASIZE) {
			if (errno == EIDRM)
				break;
			printf("Error on msgrcv, got %d, errno %d.\n", ret, errno);
			exit(1);
		}
		mbuf.mtype = ti->sender+1;
		ret = msgsnd(ti->svmsg_id, &mbuf, DATASIZE, 0);
		if (ret != 0) {
			if (errno == EIDRM)
				break;
			printf("send failed, errno %d.\n", errno);
			exit(1);
		}
		rounds++;
	}
	/* store result */
	g_results[ti->threadid] = rounds;

	pthread_exit(0);
	return NULL;
}

void init_threads(int cpu, int cpus)
{
	int ret;
	struct taskinfo *ti1, *ti2;

	ti1 = new (struct taskinfo);
	ti2 = new (struct taskinfo);
	if (!ti1 || !ti2) {
		printf("Could not allocate task info\n");
		exit(1);
	}

	g_svmsg_ids[cpu] = msgget(IPC_PRIVATE,0777|IPC_CREAT);
	if(g_svmsg_ids[cpu] == -1) {
		printf(" message queue create failed.\n");
		exit(1);
	}

	g_results[cpu] = 0;
	g_results[cpu+cpus] = 0;

	ti1->svmsg_id = g_svmsg_ids[cpu];
	ti1->threadid = cpu;
	ti1->cpuid = cpu;
	ti1->sender = 1;
	ti2->svmsg_id = g_svmsg_ids[cpu];
	ti2->threadid = cpu+cpus;
	ti2->cpuid = cpu;
	ti2->sender = 0;

	ret = pthread_create(&g_threads[ti1->threadid], NULL, worker_thread, ti1);
	if (ret) {
		printf(" pthread_create failed with error code %d\n", ret);
		exit(1);
	}
	ret = pthread_create(&g_threads[ti2->threadid], NULL, worker_thread, ti2);
	if (ret) {
		printf(" pthread_create failed with error code %d\n", ret);
		exit(1);
	}
}

//////////////////////////////////////////////////////////////////////////////

int main(int argc, char **argv)
{
	int queues, timeout;
	unsigned long long totals;
	int i;

	printf("pmsg [nr queues] [timeout]\n");
	if (argc != 3) {
		printf(" Invalid parameters.\n");
		return 0;
	}
	queues = atoi(argv[1]);
	timeout = atoi(argv[2]);
	printf("Using %d queues/cpus (%d threads) for %d seconds.\n",
			queues, 2*queues, timeout);

	g_results = new unsigned long long[2*queues];
	g_svmsg_ids = new int[queues];
	g_threads = new pthread_t[2*queues];
	for (i=0;i<queues;i++) {
		init_threads(i, queues);
	}

	sleep(1);
	g_state = RUNNING;
	sleep(timeout);
	g_state = STOPPED;
	sleep(1);
	for (i=0;i<queues;i++) {
		int res;
		res = msgctl(g_svmsg_ids[i],IPC_RMID,NULL);
		if (res < 0) {
			printf("msgctl(IPC_RMID) failed for %d, errno %d.\n",
				g_svmsg_ids[i], errno);
		}
	}
	for (i=0;i<2*queues;i++)
		pthread_join(g_threads[i], NULL);

	printf("Result matrix:\n");
	totals = 0;
	for (i=0;i<queues;i++) {
		printf("  Thread %3d: %8lld     %3d: %8lld\n",
				i, g_results[i], i+queues, g_results[i+queues]);
		totals += g_results[i] + g_results[i+queues];
	}
	printf("Total: %lld\n", totals);
	return 0;
}

* Re: [PATCH 0/9] Scalability requirements for sysv ipc - v3
  2008-05-07 11:35 [PATCH 0/9] Scalability requirements for sysv ipc - v3 Nadia.Derbey
                   ` (9 preceding siblings ...)
  2008-05-07 11:41 ` [PATCH 0/9] Scalability requirements for sysv ipc - v3 Nadia Derbey
@ 2008-05-07 13:19 ` KOSAKI Motohiro
  2008-05-13 14:10   ` Nadia Derbey
  2008-05-30  8:22 ` Paul E. McKenney
  11 siblings, 1 reply; 39+ messages in thread
From: KOSAKI Motohiro @ 2008-05-07 13:19 UTC (permalink / raw)
  To: Nadia.Derbey; +Cc: manfred, paulmck, lnxninja, linux-kernel, efault, akpm

Hi

nice improvement.
but...

>    2.6.25-rc3-mm1   2.6.25-rc3-mm1+   2.6.25-mm1   Patched 2.6.25-mm1
>  1         1168441           1064021       876000               947488
>  2         1094264            921059      1549592              1730685
>  3         2082520           1738165      1694370              2324880
>  4         2079929           1695521       404553              2400408
>  5         2898758            406566       391283              3246580
>  6         2921417            261275       263249              3752148
>  7         3308761            126056       191742              4243142
>  8         3329456            100129       141722              4275780
>
>  1st column: stock 2.6.25-rc3-mm1
>  2nd column: 2.6.25-rc3-mm1 + ipc patches (store ipcs into idrs)
>  3rd column: stock 2.6.25-mm1
>  4th column: 2.6.25-mm1 + this patch series.

this result looks slightly odd.

similar: 2.6.25-rc3-mm1 and patched 2.6.25-mm1
similar: 2.6.25-mm1 and 2.6.25-rc3-mm1

but

not similar: 2.6.25-rc3-mm1 and patched 2.6.25-rc3-mm1
not similar: 2.6.25-mm1 and patched 2.6.25-mm1

Are patched 2.6.25-rc3-mm1 and patched 2.6.25-mm1 using the same patch set?
Or do I misunderstand how to read your chart?

* Re: [PATCH 1/9] Change the idr structure
  2008-05-07 11:35 ` [PATCH 1/9] Change the idr structure Nadia.Derbey
@ 2008-05-08 17:12   ` Rik van Riel
  2008-05-30  8:22   ` Paul E. McKenney
  1 sibling, 0 replies; 39+ messages in thread
From: Rik van Riel @ 2008-05-08 17:12 UTC (permalink / raw)
  To: Nadia.Derbey
  Cc: manfred, paulmck, lnxninja, linux-kernel, efault, akpm, Nadia Derbey

On Wed, 07 May 2008 13:35:54 +0200
Nadia.Derbey@bull.net wrote:

> [PATCH 01/09]
> 
> This patch adds an rcu_head to the idr_layer structure in order to free it
> after a grace period.
> 
> Signed-off-by: Nadia Derbey <Nadia.Derbey@bull.net>

Acked-by: Rik van Riel <riel@redhat.com>

-- 
All rights reversed.

* Re: [PATCH 2/9] Rename some of the idr APIs internal routines
  2008-05-07 11:35 ` [PATCH 2/9] Rename some of the idr APIs internal routines Nadia.Derbey
@ 2008-05-08 17:15   ` Rik van Riel
  2008-05-30  8:23   ` Paul E. McKenney
  1 sibling, 0 replies; 39+ messages in thread
From: Rik van Riel @ 2008-05-08 17:15 UTC (permalink / raw)
  To: Nadia.Derbey
  Cc: manfred, paulmck, lnxninja, linux-kernel, efault, akpm, Nadia Derbey

On Wed, 07 May 2008 13:35:55 +0200
Nadia.Derbey@bull.net wrote:

> This makes things more clear for the next patches.
> 
> Signed-off-by: Nadia Derbey <Nadia.Derbey@bull.net>

Sure, when RCU (and delayed freeing) is involved some extra code
readability never hurts.

Acked-by: Rik van Riel <riel@redhat.com>

-- 
All rights reversed.

* Re: [PATCH 3/9] Fix a printk call
  2008-05-07 11:35 ` [PATCH 3/9] Fix a printk call Nadia.Derbey
@ 2008-05-08 17:43   ` Rik van Riel
  2008-05-30  8:23   ` Paul E. McKenney
  1 sibling, 0 replies; 39+ messages in thread
From: Rik van Riel @ 2008-05-08 17:43 UTC (permalink / raw)
  To: Nadia.Derbey
  Cc: manfred, paulmck, lnxninja, linux-kernel, efault, akpm, Nadia Derbey

On Wed, 07 May 2008 13:35:56 +0200
Nadia.Derbey@bull.net wrote:

> [PATCH 03/09]
> 
> This is a trivial patch that fixes the printk incomplete call.
> 
> Signed-off-by: Nadia Derbey <Nadia.Derbey@bull.net>

Acked-by: Rik van Riel <riel@redhat.com>

-- 
All rights reversed.

* Re: [PATCH 4/9] Error checking factorization
  2008-05-07 11:35 ` [PATCH 4/9] Error checking factorization Nadia.Derbey
@ 2008-05-08 17:45   ` Rik van Riel
  0 siblings, 0 replies; 39+ messages in thread
From: Rik van Riel @ 2008-05-08 17:45 UTC (permalink / raw)
  To: Nadia.Derbey
  Cc: manfred, paulmck, lnxninja, linux-kernel, efault, akpm, Nadia Derbey

On Wed, 07 May 2008 13:35:57 +0200
Nadia.Derbey@bull.net wrote:

> [PATCH 04/09]
> 
> This is a trivial patch that does some codes factorization in the return code
> analysis.
> 
> Signed-off-by: Nadia Derbey <Nadia.Derbey@bull.net>

Acked-by: Rik van Riel <riel@redhat.com>

-- 
All rights reversed.

* Re: [PATCH 5/9] Make idr_get_new* rcu-safe
  2008-05-07 11:35 ` [PATCH 5/9] Make idr_get_new* rcu-safe Nadia.Derbey
@ 2008-05-08 17:55   ` Rik van Riel
  2008-05-30  8:23   ` Paul E. McKenney
  1 sibling, 0 replies; 39+ messages in thread
From: Rik van Riel @ 2008-05-08 17:55 UTC (permalink / raw)
  To: Nadia.Derbey
  Cc: manfred, paulmck, lnxninja, linux-kernel, efault, akpm, Nadia Derbey

On Wed, 07 May 2008 13:35:58 +0200
Nadia.Derbey@bull.net wrote:

> [PATCH 05/09]
> 
> This is a patch that make the idr_get_new* routines rcu-safe.
> 
> Signed-off-by: Nadia Derbey <Nadia.Derbey@bull.net>

Acked-by: Rik van Riel <riel@redhat.com>

-- 
All rights reversed.

* Re: [PATCH 6/9] Make idr_find rcu-safe
  2008-05-07 11:35 ` [PATCH 6/9] Make idr_find rcu-safe Nadia.Derbey
@ 2008-05-08 17:58   ` Rik van Riel
  2008-05-30  8:24   ` Paul E. McKenney
  1 sibling, 0 replies; 39+ messages in thread
From: Rik van Riel @ 2008-05-08 17:58 UTC (permalink / raw)
  To: Nadia.Derbey
  Cc: manfred, paulmck, lnxninja, linux-kernel, efault, akpm, Nadia Derbey

On Wed, 07 May 2008 13:35:59 +0200
Nadia.Derbey@bull.net wrote:

> [PATCH 06/09]
> 
> This is a patch that makes idr_find rcu-safe: it can now be called inside an
> rcu_read critical section.
> 
> Signed-off-by: Nadia Derbey <Nadia.Derbey@bull.net>

Acked-by: Rik van Riel <riel@redhat.com>

-- 
All rights reversed.

* Re: [PATCH 7/9] Make idr_remove rcu-safe
  2008-05-07 11:36 ` [PATCH 7/9] Make idr_remove rcu-safe Nadia.Derbey
@ 2008-05-08 18:02   ` Rik van Riel
  2008-05-14 19:59   ` Tim Pepper
  2008-05-30  8:24   ` Paul E. McKenney
  2 siblings, 0 replies; 39+ messages in thread
From: Rik van Riel @ 2008-05-08 18:02 UTC (permalink / raw)
  To: Nadia.Derbey
  Cc: manfred, paulmck, lnxninja, linux-kernel, efault, akpm, Nadia Derbey

On Wed, 07 May 2008 13:36:00 +0200
Nadia.Derbey@bull.net wrote:

> [PATCH 07/09]
> 
> This patch introduces the free_layer() routine: it is the one that actually
> frees memory after a grace period has elapsed.
> 
> Signed-off-by: Nadia Derbey <Nadia.Derbey@bull.net>

Acked-by: Rik van Riel <riel@redhat.com>

-- 
All rights reversed.

* Re: [PATCH 8/9] Call idr_find() without locking in ipc_lock()
  2008-05-07 11:36 ` [PATCH 8/9] Call idr_find() without locking in ipc_lock() Nadia.Derbey
@ 2008-05-08 18:11   ` Rik van Riel
  2008-05-30  8:27   ` Paul E. McKenney
  1 sibling, 0 replies; 39+ messages in thread
From: Rik van Riel @ 2008-05-08 18:11 UTC (permalink / raw)
  To: Nadia.Derbey
  Cc: manfred, paulmck, lnxninja, linux-kernel, efault, akpm, Nadia Derbey

On Wed, 07 May 2008 13:36:01 +0200
Nadia.Derbey@bull.net wrote:

> [PATCH 08/09]
> 
> This patch makes idr_find() called locklessly in ipc_lock(), since the idr
> tree is now RCU protected.
> 
> Signed-off-by: Nadia Derbey <Nadia.Derbey@bull.net>

Acked-by: Rik van Riel <riel@redhat.com>

-- 
All rights reversed.

* Re: [PATCH 9/9] Get rid of ipc_lock_down()
  2008-05-07 11:36 ` [PATCH 9/9] Get rid of ipc_lock_down() Nadia.Derbey
@ 2008-05-08 18:13   ` Rik van Riel
  2008-05-30  8:29   ` Paul E. McKenney
  1 sibling, 0 replies; 39+ messages in thread
From: Rik van Riel @ 2008-05-08 18:13 UTC (permalink / raw)
  To: Nadia.Derbey
  Cc: manfred, paulmck, lnxninja, linux-kernel, efault, akpm, Nadia Derbey

On Wed, 07 May 2008 13:36:02 +0200
Nadia.Derbey@bull.net wrote:

> [PATCH 09/09]
> 
> This patch removes the ipc_lock_down() routines: they used to call idr_find()
> locklessly (given that the ipc ids lock was already held), so they are not
> needed anymore.
> 
> Signed-off-by: Nadia Derbey <Nadia.Derbey@bull.net>

Acked-by: Rik van Riel <riel@redhat.com>

-- 
All rights reversed.

* Re: [PATCH 0/9] Scalability requirements for sysv ipc - v3
  2008-05-07 13:19 ` KOSAKI Motohiro
@ 2008-05-13 14:10   ` Nadia Derbey
  2008-05-14  4:22     ` KOSAKI Motohiro
  0 siblings, 1 reply; 39+ messages in thread
From: Nadia Derbey @ 2008-05-13 14:10 UTC (permalink / raw)
  To: KOSAKI Motohiro; +Cc: manfred, paulmck, lnxninja, linux-kernel, efault, akpm

KOSAKI Motohiro wrote:
> Hi
> 
> nice improvement.
> but...
> 
> 
>>   2.6.25-rc3-mm1   2.6.25-rc3-mm1+   2.6.25-mm1   Patched 2.6.25-mm1
>> 1         1168441           1064021       876000               947488
>> 2         1094264            921059      1549592              1730685
>> 3         2082520           1738165      1694370              2324880
>> 4         2079929           1695521       404553              2400408
>> 5         2898758            406566       391283              3246580
>> 6         2921417            261275       263249              3752148
>> 7         3308761            126056       191742              4243142
>> 8         3329456            100129       141722              4275780
>>
>> 1st column: stock 2.6.25-rc3-mm1
>> 2nd column: 2.6.25-rc3-mm1 + ipc patches (store ipcs into idrs)
>> 3rd column: stock 2.6.25-mm1
>> 4th column: 2.6.25-mm1 + this patch series.
> 
> 
> this result is slightly odd.
> 
> similar to 2.6.25-rc3-mm1 and patched-2.6.25-mm1
> similar to 2.6.25-mm1 and 2.6.25-rc3-mm1
> 
> but
> 
> not similar to 2.6.25-rc3-mm1 and patched-2.6.25-rc3-mm1
> not similar to 2.6.25-mm1 and patched-2.6.25-mm1
> 
> Are patched 2.6.25-rc3-mm1 and patched 2.6.25-mm1 using the same patch set?

No

> Or do I misunderstand how to read your chart?
> 
> 

Well, looks like the description was not clear, sorry for that!
1st column: 2.6.25-rc3-mm1 with original ipc implementation
                            (i.e. ipcs stored in an array)
2nd column: 2.6.25-rc3-mm1, with ipcs stored in an idr tree
             actually, that's when the performance regression
             was introduced.
3rd column: 2.6.25-mm1, still with the same implementation
             i.e. still with degraded performance.
4th column: 2.6.25-mm1 + rcu-based implementation of the idr,
             which is the current patch series.

Hope this makes things clearer.

Regards,
Nadia

* Re: [PATCH 0/9] Scalability requirements for sysv ipc - v3
  2008-05-13 14:10   ` Nadia Derbey
@ 2008-05-14  4:22     ` KOSAKI Motohiro
  0 siblings, 0 replies; 39+ messages in thread
From: KOSAKI Motohiro @ 2008-05-14  4:22 UTC (permalink / raw)
  To: Nadia Derbey
  Cc: kosaki.motohiro, KOSAKI Motohiro, manfred, paulmck, lnxninja,
	linux-kernel, efault, akpm

> Well, looks like the description was not clear, sorry for that!
> 1st column: 2.6.25-rc3-mm1 with original ipc implementation
>                             (i.e. ipcs stored in an array)
> 2nd column: 2.6.25-rc3-mm1, with ipcs stored in an idr tree
>              actually, that's when the performance regression
>              had been introduced.
> 3rd column: 2.6.25-mm1, still with the same implementation
>              i.e. still with degraded performances.
> 4th column: 2.6.25-mm1 + rcu-based implementation of the idr,
>              which is the current patch series.
> 
> Hope this makes things more clear.

Oh, I see: your patch set gives a very nice performance result :)
Thanks!



* Re: [PATCH 7/9] Make idr_remove rcu-safe
  2008-05-07 11:36 ` [PATCH 7/9] Make idr_remove rcu-safe Nadia.Derbey
  2008-05-08 18:02   ` Rik van Riel
@ 2008-05-14 19:59   ` Tim Pepper
  2008-05-15  7:40     ` Nadia Derbey
  2008-05-30  8:24   ` Paul E. McKenney
  2 siblings, 1 reply; 39+ messages in thread
From: Tim Pepper @ 2008-05-14 19:59 UTC (permalink / raw)
  To: Nadia.Derbey; +Cc: manfred, paulmck, lnxninja, linux-kernel, efault, akpm

On Wed 07 May at 13:36:00 +0200 Nadia.Derbey@bull.net said:
> [PATCH 07/09]
> 
> This patch introduces the free_layer() routine: it is the one that actually
> frees memory after a grace period has elapsed.
> 
> Signed-off-by: Nadia Derbey <Nadia.Derbey@bull.net>
> 
> ---
>  lib/idr.c |   59 ++++++++++++++++++++++++++++++++++++++++++++---------------
>  1 file changed, 44 insertions(+), 15 deletions(-)
> 
> Index: linux-2.6.25-mm1/lib/idr.c
> ===================================================================
> --- linux-2.6.25-mm1.orig/lib/idr.c	2008-05-06 18:06:43.000000000 +0200
> +++ linux-2.6.25-mm1/lib/idr.c	2008-05-07 09:07:31.000000000 +0200
> @@ -424,15 +455,13 @@ void idr_remove_all(struct idr *idp)
> 
>  		id += 1 << n;
>  		while (n < fls(id)) {
> -			if (p) {
> -				memset(p, 0, sizeof *p);
> -				move_to_free_list(idp, p);
> -			}
> +			if (p)
> +				free_layer(p);
>  			n += IDR_BITS;
>  			p = *--paa;
>  		}
>  	}
> -	idp->top = NULL;
> +	rcu_assign_pointer(idp->top, NULL);
>  	idp->layers = 0;
>  }
>  EXPORT_SYMBOL(idr_remove_all);

Does idr_remove_all() need an rcu_dereference() in the loop preceding the
above, where it does:

                while (n > IDR_BITS && p) {
                        n -= IDR_BITS;
                        *paa++ = p;
     ---->              p = p->ary[(id >> n) & IDR_MASK];
                }

idr_replace() also has that loop without rcu_dereference(), but I _think_
I see why that one should be ok.  At least there the comment is clear
that locking at a higher level should be happening.  And then
idr_remove_all() is almost unused, and it looks like it is only called
from serialized places.

Otherwise, thanks for redoing... This patch set was much easier to digest
and looks reasonable to me.

I've been having some machine issues, but I still hope to give this patch
set a run on a 128-way machine, against mainline, to provide some
additional datapoints.

-- 
Tim Pepper  <lnxninja@linux.vnet.ibm.com>
IBM Linux Technology Center

* Re: [PATCH 7/9] Make idr_remove rcu-safe
  2008-05-14 19:59   ` Tim Pepper
@ 2008-05-15  7:40     ` Nadia Derbey
  2008-05-20  5:29       ` Tim Pepper
  0 siblings, 1 reply; 39+ messages in thread
From: Nadia Derbey @ 2008-05-15  7:40 UTC (permalink / raw)
  To: Tim Pepper; +Cc: manfred, paulmck, linux-kernel, efault, akpm

Tim Pepper wrote:
> On Wed 07 May at 13:36:00 +0200 Nadia.Derbey@bull.net said:
> 
>>[PATCH 07/09]
>>
>>This patch introduces the free_layer() routine: it is the one that actually
>>frees memory after a grace period has elapsed.
>>
>>Signed-off-by: Nadia Derbey <Nadia.Derbey@bull.net>
>>
>>---
>> lib/idr.c |   59 ++++++++++++++++++++++++++++++++++++++++++++---------------
>> 1 file changed, 44 insertions(+), 15 deletions(-)
>>
>>Index: linux-2.6.25-mm1/lib/idr.c
>>===================================================================
>>--- linux-2.6.25-mm1.orig/lib/idr.c	2008-05-06 18:06:43.000000000 +0200
>>+++ linux-2.6.25-mm1/lib/idr.c	2008-05-07 09:07:31.000000000 +0200
>>@@ -424,15 +455,13 @@ void idr_remove_all(struct idr *idp)
>>
>> 		id += 1 << n;
>> 		while (n < fls(id)) {
>>-			if (p) {
>>-				memset(p, 0, sizeof *p);
>>-				move_to_free_list(idp, p);
>>-			}
>>+			if (p)
>>+				free_layer(p);
>> 			n += IDR_BITS;
>> 			p = *--paa;
>> 		}
>> 	}
>>-	idp->top = NULL;
>>+	rcu_assign_pointer(idp->top, NULL);
>> 	idp->layers = 0;
>> }
>> EXPORT_SYMBOL(idr_remove_all);
> 
> 
> Does idr_remove_all() need an rcu_dereference() in the loop preceding the
> above, where it does:
> 
>                 while (n > IDR_BITS && p) {
>                         n -= IDR_BITS;
>                         *paa++ = p;
>      ---->              p = p->ary[(id >> n) & IDR_MASK];
>                 }

I assumed here that idr_remove_all() was called in the "typical cleanup 
sequence" mentioned in the comment describing the routine.
And actually, that's how it is called in drivers/char/drm.

> 
> idr_replace() also has that loop without rcu_dereference(), but I _think_
> I see why that one should be ok.  At least there the comment is clear
> that locking at a higher level should be happening.

I didn't use rcu_dereference here since the caller is expected to 
serialize with other writers anyway. So the tree remains unchanged 
during the replace operation.

>  And then
> idr_remove_all() is almost unused, and it looks like it is only called
> from serialized places.
> 
> Otherwise, thanks for redoing...This patch set was much easier to digest
> and looks reasonable to me.
> 
> I've been having some machine issues, but hope to give this patch set a run
> still on a 128way machine and mainline to provide some additional
> datapoints.
> 

That would be kind, indeed (hope I didn't break anything).

Regards,
Nadia


* Re: [PATCH 7/9] Make idr_remove rcu-safe
  2008-05-15  7:40     ` Nadia Derbey
@ 2008-05-20  5:29       ` Tim Pepper
  2008-05-20  5:35         ` Tim Pepper
  2008-05-20  7:03         ` Nadia Derbey
  0 siblings, 2 replies; 39+ messages in thread
From: Tim Pepper @ 2008-05-20  5:29 UTC (permalink / raw)
  To: Nadia Derbey; +Cc: Tim Pepper, manfred, paulmck, linux-kernel, efault, akpm

[-- Attachment #1: Type: text/plain, Size: 4544 bytes --]

On Thu 15 May at 09:40:13 +0200 Nadia.Derbey@bull.net said:
> Tim Pepper wrote:
>> On Wed 07 May at 13:36:00 +0200 Nadia.Derbey@bull.net said:
>>
>>> [PATCH 07/09]
>>>
>>> This patch introduces the free_layer() routine: it is the one that actually
>>> frees memory after a grace period has elapsed.
>>>
>>> Index: linux-2.6.25-mm1/lib/idr.c
>>> ===================================================================
>>> --- linux-2.6.25-mm1.orig/lib/idr.c	2008-05-06 18:06:43.000000000 +0200
>>> +++ linux-2.6.25-mm1/lib/idr.c	2008-05-07 09:07:31.000000000 +0200
>>> @@ -424,15 +455,13 @@ void idr_remove_all(struct idr *idp)
>>>
>>> 		id += 1 << n;
>>> 		while (n < fls(id)) {
>>> -			if (p) {
>>> -				memset(p, 0, sizeof *p);
>>> -				move_to_free_list(idp, p);
>>> -			}
>>> +			if (p)
>>> +				free_layer(p);
>>> 			n += IDR_BITS;
>>> 			p = *--paa;
>>> 		}
>>> 	}
>>> -	idp->top = NULL;
>>> +	rcu_assign_pointer(idp->top, NULL);
>>> 	idp->layers = 0;
>>> }
>>> EXPORT_SYMBOL(idr_remove_all);
>>
>>
>> Does idr_remove_all() need an rcu_dereference() in the loop preceding the
>> above, where it does:
>>
>>                 while (n > IDR_BITS && p) {
>>                         n -= IDR_BITS;
>>                         *paa++ = p;
>>      ---->              p = p->ary[(id >> n) & IDR_MASK];
>>                 }
>
> I assumed here that idr_remove_all() was called in the "typical cleanup 
> sequence" mentioned in the comment describing the routine.
> And actually, that's how it is called in drivers/char/drm.

Sure.  I was mostly thinking out loud: it's somewhat implicit that things
must be serial at that point, and I wasn't sure whether that was meant to
be required or enforced, i.e. whether it should be clarified with comments
or by explicitly adding the rcu calls in these couple of additional places.

>> I've been having some machine issues, but hope to give this patch set a run
>> still on a 128way machine and mainline to provide some additional
>> datapoints.
>>
>
> That would be kind, indeed (hope I didn't break anything).

I've had a bunch of machine issues, but I did manage to do some testing.

I'd started looking at the regression after Anton Blanchard mentioned
seeing it via this simple bit of code:
    http://ozlabs.org/~anton/junkcode/lock_comparison.c
It essentially spawns one thread per cpu, each thread looping 10000 times,
where each loop does some trivial semop() calls.  Each thread semop()'s on
its own semaphore, so there is no contention.

To get a little more detail I hacked Anton's code minimally to record
results for thread counts 1..n and also to optionally have just a single
semaphore on which all of these threads are contending.  I ran this on
a 64cpu machine (128 given SMT), but didn't make any effort to force
clean thread/cpu affinity.

The contended numbers don't look quite as consistent as they could at
the high end, but I think that is more down to the quick/dirty test than
to the patch.  Nevertheless the patched kernel clearly outperforms
mainline and, despite the noise, actually looks to perform more
consistently than mainline (see graphs).

Summary numbers from one run (with a bit more detail at the high-thread
end, since the numbers vary more there, and because showing only the
power-of-two counts for this run would incorrectly imply a knee... I can
do more runs with averages/stats if people need more convincing):

threads          uncontended                     contended
                 semop loops                    semop loops

        2.6.26-rc2      +patchset        2.6.26-rc2      +patchset

1	  2243.94	  4436.25	   2063.18	   4386.78
2	  2954.11	  5445.12	  67885.16	   5906.72
4	  4367.45	  8125.67	  72820.32	  44263.86
8	  7440.00	  9842.66	  60184.17	  95677.58
16	 12959.44	 20323.97	 136482.42	 248143.80
32	 35252.71	 56334.28	 363884.09	 599773.31
48	 62811.15	102684.67	 515886.12	1714530.12
...
57	 81064.99	141434.33	 564874.69	2518078.75
58	 79486.08	145685.84	 693038.06	1868813.12
59	 83634.19	153087.80	1237576.25	2828301.25
60	 91581.07	158207.08	 797796.94	2970977.25
61	 89209.40	160529.38	1202135.38	2538114.50
62	 89008.45	167843.78	1305666.75	2274845.00
63	 97753.17	177470.12	 733957.31	 363952.62
64	102556.05	175923.56	1356988.88	 199527.83

(detail plots from this same run attached...)

Nadia, you're welcome to add either or both of these to the series if
you'd like:

Reviewed-by: Tim Pepper <lnxninja@linux.vnet.ibm.com>
Tested-by: Tim Pepper <lnxninja@linux.vnet.ibm.com>

-- 
Tim Pepper  <lnxninja@linux.vnet.ibm.com>
IBM Linux Technology Center

[-- Attachment #2: 2.6.26-rc2.contended.128_semops.png --]
[-- Type: image/png, Size: 13381 bytes --]

[-- Attachment #3: 2.6.26-rc2-patched.contended_semops.png --]
[-- Type: image/png, Size: 11563 bytes --]

[-- Attachment #4: 2.6.26-rc2.uncontended_semops.png --]
[-- Type: image/png, Size: 9257 bytes --]

[-- Attachment #5: 2.6.26-rc2-patched.uncontended_semops.png --]
[-- Type: image/png, Size: 11056 bytes --]

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 7/9] Make idr_remove rcu-safe
  2008-05-20  5:29       ` Tim Pepper
@ 2008-05-20  5:35         ` Tim Pepper
  2008-05-20  7:03         ` Nadia Derbey
  1 sibling, 0 replies; 39+ messages in thread
From: Tim Pepper @ 2008-05-20  5:35 UTC (permalink / raw)
  To: Tim Pepper; +Cc: Nadia Derbey, manfred, paulmck, linux-kernel, efault, akpm

On Mon 19 May at 22:29:05 -0700 lnxninja@linux.vnet.ibm.com said:
> 
> (detail plots from this same run attached...)
> 

Errr...clumsy fingers.  I attached a plot that had up to 128 threads, which
is really just noise over 60-ish.

At any rate, if anybody would like to see more runs/details let me know.

-- 
Tim Pepper  <lnxninja@linux.vnet.ibm.com>
IBM Linux Technology Center

* Re: [PATCH 7/9] Make idr_remove rcu-safe
  2008-05-20  5:29       ` Tim Pepper
  2008-05-20  5:35         ` Tim Pepper
@ 2008-05-20  7:03         ` Nadia Derbey
  2008-05-20 16:26           ` Tim Pepper
  1 sibling, 1 reply; 39+ messages in thread
From: Nadia Derbey @ 2008-05-20  7:03 UTC (permalink / raw)
  To: Tim Pepper; +Cc: manfred, paulmck, linux-kernel, efault, akpm

Tim Pepper wrote:
> On Thu 15 May at 09:40:13 +0200 Nadia.Derbey@bull.net said:
> 
>>Tim Pepper wrote:
>>
>>>On Wed 07 May at 13:36:00 +0200 Nadia.Derbey@bull.net said:
>>>
>>>
>>>>[PATCH 07/09]
>>>>
>>>>This patch introduces the free_layer() routine: it is the one that actually
>>>>frees memory after a grace period has elapsed.
>>>>
>>>>Index: linux-2.6.25-mm1/lib/idr.c
>>>>===================================================================
>>>>--- linux-2.6.25-mm1.orig/lib/idr.c	2008-05-06 18:06:43.000000000 +0200
>>>>+++ linux-2.6.25-mm1/lib/idr.c	2008-05-07 09:07:31.000000000 +0200
>>>>@@ -424,15 +455,13 @@ void idr_remove_all(struct idr *idp)
>>>>
>>>>		id += 1 << n;
>>>>		while (n < fls(id)) {
>>>>-			if (p) {
>>>>-				memset(p, 0, sizeof *p);
>>>>-				move_to_free_list(idp, p);
>>>>-			}
>>>>+			if (p)
>>>>+				free_layer(p);
>>>>			n += IDR_BITS;
>>>>			p = *--paa;
>>>>		}
>>>>	}
>>>>-	idp->top = NULL;
>>>>+	rcu_assign_pointer(idp->top, NULL);
>>>>	idp->layers = 0;
>>>>}
>>>>EXPORT_SYMBOL(idr_remove_all);
>>>
>>>
>>>Does idr_remove_all() need an rcu_dereference() in the loop preceding the
>>>above, where it does:
>>>
>>>                while (n > IDR_BITS && p) {
>>>                        n -= IDR_BITS;
>>>                        *paa++ = p;
>>>     ---->              p = p->ary[(id >> n) & IDR_MASK];
>>>                }
>>
>>I assumed here that idr_remove_all() was called in the "typical cleanup 
>>sequence" mentioned in the comment describing the routine.
>>And actually, that's how it is called in drivers/char/drm.
> 
> 
> Sure.  I guess I was thinking out loud: it's maybe somewhat implicit
> that things must be serial at that point, and I wasn't sure if that was
> meant to be required or enforced, or whether it should be clarified with
> comments or by explicitly adding the rcu calls in these couple of
> additional places.

Ok, I'll add a comment to clarify things.

> 
> 
>>>I've been having some machine issues, but hope to give this patch set a run
>>>still on a 128way machine and mainline to provide some additional
>>>datapoints.
>>>
>>
>>That would be kind, indeed (hope I didn't break anything).
> 
> 
> I've had a bunch of machine issues, but I did manage to do some testing.
> 
> I'd started looking at the regression after Anton Blanchard mentioned
> seeing it via this simple bit of code:
>     http://ozlabs.org/~anton/junkcode/lock_comparison.c
> It essentially spawns a number of threads to match the cpu count, each
> thread looping 10000 times, where each loop does some trivial semop()'s.
> Each thread has its own semaphore it is semop()'ing so there's no
> contention.
> 
> To get a little more detail I hacked Anton's code minimally to record
> results for thread counts 1..n and also to optionally have just a single
> semaphore on which all of these threads are contending.  I ran this on
> a 64cpu machine (128 given SMT), but didn't make any effort to force
> clean thread/cpu affinity.
> 
> The contended numbers don't look quite as consistent as they could at
> the high end, but I think that is down to the quick/dirty test more
> than the patch.  Nevertheless the patch clearly outperforms mainline
> and, despite the noise, actually looks to perform more consistently
> than mainline (see graphs).
> 
> Summary numbers from a run (with a bit more detail at the high thread
> counts, as the numbers had more variation there; showing just the
> power-of-two counts for this run would incorrectly imply a knee...I can
> do more runs with averages/stats if people need more convincing):
> 
> threads          uncontended                     contended
>                  semop loops                    semop loops
> 
>         2.6.26-rc2      +patchset        2.6.26-rc2      +patchset
> 
> 1	  2243.94	  4436.25	   2063.18	   4386.78
> 2	  2954.11	  5445.12	  67885.16	   5906.72
> 4	  4367.45	  8125.67	  72820.32	  44263.86
> 8	  7440.00	  9842.66	  60184.17	  95677.58
> 16	 12959.44	 20323.97	 136482.42	 248143.80
> 32	 35252.71	 56334.28	 363884.09	 599773.31
> 48	 62811.15	102684.67	 515886.12	1714530.12
> ...
> 57	 81064.99	141434.33	 564874.69	2518078.75
> 58	 79486.08	145685.84	 693038.06	1868813.12
> 59	 83634.19	153087.80	1237576.25	2828301.25
> 60	 91581.07	158207.08	 797796.94	2970977.25
> 61	 89209.40	160529.38	1202135.38	2538114.50
> 62	 89008.45	167843.78	1305666.75	2274845.00
> 63	 97753.17	177470.12	 733957.31	 363952.62
> 64	102556.05	175923.56	1356988.88	 199527.83
> 
> (detail plots from this same run attached...)
> 
> Nadia, you're welcome to add either or both of these to the series if
> you'd like:
> 
> Reviewed-by: Tim Pepper <lnxninja@linux.vnet.ibm.com>
> Tested-by: Tim Pepper <lnxninja@linux.vnet.ibm.com>
> 
> 

Indeed, thanks a lot for taking some of your time to run the tests!
Actually there are 2 numbers that bother me: those for the contended 
loops on the patched kernel (63 and 64 threads) - the last 2 numbers in 
the rightmost column.
Did you have the opportunity to run that same test with 128 threads? 
This is just for me to check whether 64 is the number of threads we 
start to slow down from.

Thanks,
Nadia

* Re: [PATCH 7/9] Make idr_remove rcu-safe
  2008-05-20  7:03         ` Nadia Derbey
@ 2008-05-20 16:26           ` Tim Pepper
  0 siblings, 0 replies; 39+ messages in thread
From: Tim Pepper @ 2008-05-20 16:26 UTC (permalink / raw)
  To: Nadia Derbey; +Cc: Tim Pepper, manfred, paulmck, linux-kernel, efault, akpm

On Tue 20 May at 09:03:26 +0200 Nadia.Derbey@bull.net said:
>> 60	 91581.07	158207.08	 797796.94	2970977.25
>> 61	 89209.40	160529.38	1202135.38	2538114.50
>> 62	 89008.45	167843.78	1305666.75	2274845.00
>> 63	 97753.17	177470.12	 733957.31	 363952.62
>> 64	102556.05	175923.56	1356988.88	 199527.83
>>
> Actually there are 2 numbers that bother me: those for the contended loops 
> on the patched kernel (63 and 64 threads) - the last 2 numbers in the 
> rightmost column.

> Did you have the opportunity to run that same test with 128 threads? This 
> is just for me to check whether 64 is the number of threads we start to 
> slow down from.

I don't have results handy from a run with over 64 threads on the
patched kernel for the contended case.  But from what I've seen, the
closer the number of test threads is to the number of actual cores
(not SMT threads), the noisier the test gets.  I think this is
reasonable/expected and that there's nothing magic about 63 or 64
threads.

We've been having network issues in the lab where this big box is.
If/when I can get access long enough, I'll do a series of runs,
including past 64 threads, and give averaged results and deviations.

-- 
Tim Pepper  <lnxninja@linux.vnet.ibm.com>
IBM Linux Technology Center

* Re: [PATCH 0/9] Scalability requirements for sysv ipc - v3
  2008-05-07 11:35 [PATCH 0/9] Scalability requirements for sysv ipc - v3 Nadia.Derbey
                   ` (10 preceding siblings ...)
  2008-05-07 13:19 ` KOSAKI Motohiro
@ 2008-05-30  8:22 ` Paul E. McKenney
  2008-06-02  5:53   ` Nadia Derbey
  11 siblings, 1 reply; 39+ messages in thread
From: Paul E. McKenney @ 2008-05-30  8:22 UTC (permalink / raw)
  To: Nadia.Derbey; +Cc: manfred, lnxninja, linux-kernel, efault, akpm

On Wed, May 07, 2008 at 01:35:53PM +0200, Nadia.Derbey@bull.net wrote:
> 
> After scalability problems were detected when using the SysV ipcs, I
> proposed using an RCU-based implementation of the IDR api instead (see
> threads http://lkml.org/lkml/2008/4/11/212 and
> http://lkml.org/lkml/2008/4/29/295).
> 
> This resulted in many people asking to convert the idr API and make it
> rcu safe (because most of the code was duplicated and thus unmaintainable
> and unreviewable).
> 
> So here is a first attempt.
> 
> The important change wrt the idr API itself is during idr removes:
> idr layers are freed after a grace period, instead of being moved to the
> free list.
> 
> The important change wrt the ipcs is that idr_find() can now be called
> locklessly inside an rcu read critical section.
> 
> Here are the results I've got for the pmsg test sent by Manfred: 
> 
>    2.6.25-rc3-mm1   2.6.25-rc3-mm1+   2.6.25-mm1   Patched 2.6.25-mm1
> 1         1168441           1064021       876000               947488
> 2         1094264            921059      1549592              1730685
> 3         2082520           1738165      1694370              2324880
> 4         2079929           1695521       404553              2400408
> 5         2898758            406566       391283              3246580
> 6         2921417            261275       263249              3752148
> 7         3308761            126056       191742              4243142
> 8         3329456            100129       141722              4275780
> 
> 1st column: stock 2.6.25-rc3-mm1
> 2nd column: 2.6.25-rc3-mm1 + ipc patches (store ipcs into idrs)
> 3rd column: stock 2.6.25-mm1
> 4th column: 2.6.25-mm1 + this patch series.
> 
> I'll send a chart as an answer to this mail: don't know how to do that
> with quilt :-(
> 
> 
> Reviewers are more than ever welcome!
> 
> Patches should be applied on linux-2.6.25-mm1, in the following order:
> 
> [ PATCH 01/09 ] : idr_add_rcu_head.patch
> [ PATCH 02/09 ] : idr_rename_routines.patch
> [ PATCH 03/09 ] : idr_fix_printk.patch
> [ PATCH 04/09 ] : idr_rc_to_errno.patch
> [ PATCH 05/09 ] : idr_get_new_rcu_safe.patch
> [ PATCH 06/09 ] : idr_find_rcu_safe.patch
> [ PATCH 07/09 ] : idr_remove_rcu_safe.patch
> [ PATCH 08/09 ] : ipc_fix_ipc_lock.patch
> [ PATCH 09/09 ] : remove_ipc_lock_down.patch
> 
> Patches 2, 3 and 4 do not introduce actual changes.
> 
> I won't be available before next Tuesday, so, please, don't be mad at me if
> I'm not answering fast enough.

I guess in my case, next Tuesday was not an issue.  :-/

Anyway, the idr.c changes look good to me.  Not sure why you are using
INIT_RCU_HEAD() given that call_rcu() completely initializes the fields.
Using INIT_RCU_HEAD() doesn't cause any problems, but does add needless
code.

Commentary below, looks good from an RCU viewpoint.

							Thanx, Paul

> /*
>  * 2002-10-18  written by Jim Houston jim.houston@ccur.com
>  *	Copyright (C) 2002 by Concurrent Computer Corporation
>  *	Distributed under the GNU GPL license version 2.
>  *
>  * Modified by George Anzinger to reuse immediately and to use
>  * find bit instructions.  Also removed _irq on spinlocks.
>  *
>  * Modified by Nadia Derbey to make it RCU safe.
>  *
>  * Small id to pointer translation service.
>  *
>  * It uses a radix tree like structure as a sparse array indexed
>  * by the id to obtain the pointer.  The bitmap makes allocating
>  * a new id quick.
>  *
>  * You call it to allocate an id (an int) and associate with that id a
>  * pointer or whatever; we treat it as a (void *).  You can pass this
>  * id to a user for him to pass back at a later time.  You then pass
>  * that id to this code and it returns your pointer.
> 
>  * You can release ids at any time. When all ids are released, most of
>  * the memory is returned (we keep IDR_FREE_MAX) in a local pool so we
>  * don't need to go to the memory "store" during an id allocate, just
>  * so you don't need to be too concerned about locking and conflicts
>  * with the slab allocator.
>  */
> 
> #ifndef TEST                        // to test in user space...
> #include <linux/slab.h>
> #include <linux/init.h>
> #include <linux/module.h>
> #endif
> #include <linux/err.h>
> #include <linux/string.h>
> #include <linux/idr.h>
> 
> static struct kmem_cache *idr_layer_cache;
> 
> static struct idr_layer *get_from_free_list(struct idr *idp)
> {
> 	struct idr_layer *p;
> 	unsigned long flags;
> 
> 	spin_lock_irqsave(&idp->lock, flags);
> 	if ((p = idp->id_free)) {
> 		idp->id_free = p->ary[0];
> 		idp->id_free_cnt--;
> 		p->ary[0] = NULL;

OK, this is the freelist which is inaccessible to readers.

> 	}
> 	spin_unlock_irqrestore(&idp->lock, flags);
> 	return(p);
> }
> 
> static void idr_layer_rcu_free(struct rcu_head *head)
> {
> 	struct idr_layer *layer;
> 
> 	layer = container_of(head, struct idr_layer, rcu_head);
> 	kmem_cache_free(idr_layer_cache, layer);
> }
> 
> static inline void free_layer(struct idr_layer *p)
> {
> 	call_rcu(&p->rcu_head, idr_layer_rcu_free);
> }
> 
> /* only called when idp->lock is held */
> static void __move_to_free_list(struct idr *idp, struct idr_layer *p)
> {
> 	p->ary[0] = idp->id_free;

OK, this is the freelist which is inaccessible to readers.

> 	idp->id_free = p;
> 	idp->id_free_cnt++;
> }
> 
> static void move_to_free_list(struct idr *idp, struct idr_layer *p)
> {
> 	unsigned long flags;
> 
> 	/*
> 	 * Depends on the return element being zeroed.
> 	 */
> 	spin_lock_irqsave(&idp->lock, flags);
> 	__move_to_free_list(idp, p);
> 	spin_unlock_irqrestore(&idp->lock, flags);
> }
> 
> static void idr_mark_full(struct idr_layer **pa, int id)
> {
> 	struct idr_layer *p = pa[0];
> 	int l = 0;
> 
> 	__set_bit(id & IDR_MASK, &p->bitmap);
> 	/*
> 	 * If this layer is full mark the bit in the layer above to
> 	 * show that this part of the radix tree is full.  This may
> 	 * complete the layer above and require walking up the radix
> 	 * tree.
> 	 */
> 	while (p->bitmap == IDR_FULL) {
> 		if (!(p = pa[++l]))
> 			break;
> 		id = id >> IDR_BITS;
> 		__set_bit((id & IDR_MASK), &p->bitmap);
> 	}
> }
> 
> /**
>  * idr_pre_get - reserve resources for idr allocation
>  * @idp:	idr handle
>  * @gfp_mask:	memory allocation flags
>  *
>  * This function should be called prior to locking and calling the
>  * idr_get_new* functions. It preallocates enough memory to satisfy
>  * the worst possible allocation.
>  *
>  * If the system is REALLY out of memory this function returns 0,
>  * otherwise 1.
>  */
> int idr_pre_get(struct idr *idp, gfp_t gfp_mask)
> {
> 	while (idp->id_free_cnt < IDR_FREE_MAX) {
> 		struct idr_layer *new;
> 		new = kmem_cache_alloc(idr_layer_cache, gfp_mask);
> 		if (new == NULL)
> 			return (0);
> 		move_to_free_list(idp, new);
> 	}
> 	return 1;
> }
> EXPORT_SYMBOL(idr_pre_get);
> 
> static int sub_alloc(struct idr *idp, int *starting_id, struct idr_layer **pa)
> {
> 	int n, m, sh;
> 	struct idr_layer *p, *new;
> 	int l, id, oid;
> 	unsigned long bm;
> 
> 	id = *starting_id;
>  restart:
> 	p = idp->top;

OK, the caller presumably holds an update-side lock.

> 	l = idp->layers;
> 	pa[l--] = NULL;
> 	while (1) {
> 		/*
> 		 * We run around this while until we reach the leaf node...
> 		 */
> 		n = (id >> (IDR_BITS*l)) & IDR_MASK;
> 		bm = ~p->bitmap;
> 		m = find_next_bit(&bm, IDR_SIZE, n);
> 		if (m == IDR_SIZE) {
> 			/* no space available go back to previous layer. */
> 			l++;
> 			oid = id;
> 			id = (id | ((1 << (IDR_BITS * l)) - 1)) + 1;
> 
> 			/* if already at the top layer, we need to grow */
> 			if (!(p = pa[l])) {
> 				*starting_id = id;
> 				return IDR_NEED_TO_GROW;
> 			}
> 
> 			/* If we need to go up one layer, continue the
> 			 * loop; otherwise, restart from the top.
> 			 */
> 			sh = IDR_BITS * (l + 1);
> 			if (oid >> sh == id >> sh)
> 				continue;
> 			else
> 				goto restart;
> 		}
> 		if (m != n) {
> 			sh = IDR_BITS*l;
> 			id = ((id >> sh) ^ n ^ m) << sh;
> 		}
> 		if ((id >= MAX_ID_BIT) || (id < 0))
> 			return IDR_NOMORE_SPACE;
> 		if (l == 0)
> 			break;
> 		/*
> 		 * Create the layer below if it is missing.
> 		 */
> 		if (!p->ary[m]) {

OK, we aren't dereferencing.  Besides, we should hold the update-side
lock at this point.

> 			new = get_from_free_list(idp);
> 			if (!new)
> 				return -1;
> 			INIT_RCU_HEAD(&new->rcu_head);

Not needed, unless you want this zeroed for debug purposes.

> 			rcu_assign_pointer(p->ary[m], new);
> 			p->count++;
> 		}
> 		pa[l--] = p;
> 		p = p->ary[m];

Holding update-side lock.

> 	}
> 
> 	pa[l] = p;
> 	return id;
> }
> 
> static int idr_get_empty_slot(struct idr *idp, int starting_id,
> 			      struct idr_layer **pa)
> {
> 	struct idr_layer *p, *new;
> 	int layers, v, id;
> 	unsigned long flags;
> 
> 	id = starting_id;
> build_up:
> 	p = idp->top;

OK, the caller presumably holds an update-side lock.

> 	layers = idp->layers;
> 	if (unlikely(!p)) {
> 		if (!(p = get_from_free_list(idp)))
> 			return -1;
> 		INIT_RCU_HEAD(&p->rcu_head);

Not needed, unless you want this zeroed for debug purposes.

> 		layers = 1;
> 	}
> 	/*
> 	 * Add a new layer to the top of the tree if the requested
> 	 * id is larger than the currently allocated space.
> 	 */
> 	while ((layers < (MAX_LEVEL - 1)) && (id >= (1 << (layers*IDR_BITS)))) {
> 		layers++;
> 		if (!p->count)
> 			continue;
> 		if (!(new = get_from_free_list(idp))) {
> 			/*
> 			 * The allocation failed.  If we built part of
> 			 * the structure tear it down.
> 			 */
> 			spin_lock_irqsave(&idp->lock, flags);
> 			for (new = p; p && p != idp->top; new = p) {
> 				p = p->ary[0];
> 				new->ary[0] = NULL;

OK, this presumably has not yet been made accessible to readers.

> 				new->bitmap = new->count = 0;
> 				__move_to_free_list(idp, new);
> 			}
> 			spin_unlock_irqrestore(&idp->lock, flags);
> 			return -1;
> 		}
> 		new->ary[0] = p;

OK, this presumably has not yet been made accessible to readers.

> 		new->count = 1;
> 		INIT_RCU_HEAD(&new->rcu_head);

Not needed, unless you want this zeroed for debug purposes.

> 		if (p->bitmap == IDR_FULL)
> 			__set_bit(0, &new->bitmap);
> 		p = new;
> 	}
> 	rcu_assign_pointer(idp->top, p);
> 	idp->layers = layers;
> 	v = sub_alloc(idp, &id, pa);
> 	if (v == IDR_NEED_TO_GROW)
> 		goto build_up;
> 	return(v);
> }
> 
> static int idr_get_new_above_int(struct idr *idp, void *ptr, int starting_id)
> {
> 	struct idr_layer *pa[MAX_LEVEL];
> 	int id;
> 
> 	id = idr_get_empty_slot(idp, starting_id, pa);
> 	if (id >= 0) {
> 		/*
> 		 * Successfully found an empty slot.  Install the user
> 		 * pointer and mark the slot full.
> 		 */
> 		rcu_assign_pointer(pa[0]->ary[id & IDR_MASK],
> 				(struct idr_layer *)ptr);
> 		pa[0]->count++;
> 		idr_mark_full(pa, id);
> 	}
> 
> 	return id;
> }
> 
> /**
>  * idr_get_new_above - allocate new idr entry above or equal to a start id
>  * @idp: idr handle
>  * @ptr: pointer you want associated with the id
>  * @starting_id: id to start search at
>  * @id: pointer to the allocated handle
>  *
>  * This is the allocate id function.  It should be called with any
>  * required locks.
>  *
>  * If memory is required, it will return -EAGAIN, you should unlock
>  * and go back to the idr_pre_get() call.  If the idr is full, it will
>  * return -ENOSPC.
>  *
>  * @id returns a value in the range 0 ... 0x7fffffff
>  */
> int idr_get_new_above(struct idr *idp, void *ptr, int starting_id, int *id)
> {
> 	int rv;
> 
> 	rv = idr_get_new_above_int(idp, ptr, starting_id);
> 	/*
> 	 * This is a cheap hack until the IDR code can be fixed to
> 	 * return proper error values.
> 	 */
> 	if (rv < 0)
> 		return _idr_rc_to_errno(rv);
> 	*id = rv;
> 	return 0;
> }
> EXPORT_SYMBOL(idr_get_new_above);
> 
> /**
>  * idr_get_new - allocate new idr entry
>  * @idp: idr handle
>  * @ptr: pointer you want associated with the id
>  * @id: pointer to the allocated handle
>  *
>  * This is the allocate id function.  It should be called with any
>  * required locks.
>  *
>  * If memory is required, it will return -EAGAIN, you should unlock
>  * and go back to the idr_pre_get() call.  If the idr is full, it will
>  * return -ENOSPC.
>  *
>  * @id returns a value in the range 0 ... 0x7fffffff
>  */
> int idr_get_new(struct idr *idp, void *ptr, int *id)
> {
> 	int rv;
> 
> 	rv = idr_get_new_above_int(idp, ptr, 0);
> 	/*
> 	 * This is a cheap hack until the IDR code can be fixed to
> 	 * return proper error values.
> 	 */
> 	if (rv < 0)
> 		return _idr_rc_to_errno(rv);
> 	*id = rv;
> 	return 0;
> }
> EXPORT_SYMBOL(idr_get_new);
> 
> static void idr_remove_warning(int id)
> {
> 	printk(KERN_WARNING
> 		"idr_remove called for id=%d which is not allocated.\n", id);
> 	dump_stack();
> }
> 
> static void sub_remove(struct idr *idp, int shift, int id)
> {
> 	struct idr_layer *p = idp->top;

OK, the caller presumably holds an update-side lock.

> 	struct idr_layer **pa[MAX_LEVEL];
> 	struct idr_layer ***paa = &pa[0];
> 	struct idr_layer *to_free;
> 	int n;
> 
> 	*paa = NULL;
> 	*++paa = &idp->top;
> 
> 	while ((shift > 0) && p) {
> 		n = (id >> shift) & IDR_MASK;
> 		__clear_bit(n, &p->bitmap);
> 		*++paa = &p->ary[n];

OK, the caller presumably holds an update-side lock.

> 		p = p->ary[n];
> 		shift -= IDR_BITS;
> 	}
> 	n = id & IDR_MASK;
> 	if (likely(p != NULL && test_bit(n, &p->bitmap))){
> 		__clear_bit(n, &p->bitmap);
> 		rcu_assign_pointer(p->ary[n], NULL);
> 		to_free = NULL;
> 		while(*paa && ! --((**paa)->count)){
> 			if (to_free)
> 				free_layer(to_free);
> 			to_free = **paa;
> 			**paa-- = NULL;
> 		}
> 		if (!*paa)
> 			idp->layers = 0;
> 		if (to_free)
> 			free_layer(to_free);
> 	} else
> 		idr_remove_warning(id);
> }
> 
> /**
>  * idr_remove - remove the given id and free its slot
>  * @idp: idr handle
>  * @id: unique key
>  */
> void idr_remove(struct idr *idp, int id)
> {
> 	struct idr_layer *p;
> 	struct idr_layer *to_free;
> 
> 	/* Mask off upper bits we don't use for the search. */
> 	id &= MAX_ID_MASK;
> 
> 	sub_remove(idp, (idp->layers - 1) * IDR_BITS, id);
> 	if (idp->top && idp->top->count == 1 && (idp->layers > 1) &&
> 	    idp->top->ary[0]) {

OK, the caller presumably holds the update-side lock.

> 		/*
> 		 * Single child at leftmost slot: we can shrink the tree.
> 		 * This level is not needed anymore since when layers are
> 		 * inserted, they are inserted at the top of the existing
> 		 * tree.
> 		 */
> 		to_free = idp->top;
> 		p = idp->top->ary[0];

OK, the caller presumably holds the update-side lock.

> 		rcu_assign_pointer(idp->top, p);
> 		--idp->layers;
> 		to_free->bitmap = to_free->count = 0;
> 		free_layer(to_free);
> 	}
> 	while (idp->id_free_cnt >= IDR_FREE_MAX) {
> 		p = get_from_free_list(idp);
> 		/*
> 		 * Note: we don't call the rcu callback here, since the only
> 		 * layers that fall into the freelist are those that have been
> 		 * preallocated.
> 		 */
> 		kmem_cache_free(idr_layer_cache, p);
> 	}
> 	return;
> }
> EXPORT_SYMBOL(idr_remove);
> 
> /**
>  * idr_remove_all - remove all ids from the given idr tree
>  * @idp: idr handle
>  *
>  * idr_destroy() only frees up unused, cached idp_layers, but this
>  * function will remove all id mappings and leave all idp_layers
>  * unused.
>  *
>  * A typical clean-up sequence for objects stored in an idr tree, will
>  * use idr_for_each() to free all objects, if necessary, then
>  * idr_remove_all() to remove all ids, and idr_destroy() to free
>  * up the cached idr_layers.
>  */
> void idr_remove_all(struct idr *idp)
> {
> 	int n, id, max;
> 	struct idr_layer *p;
> 	struct idr_layer *pa[MAX_LEVEL];
> 	struct idr_layer **paa = &pa[0];
> 
> 	n = idp->layers * IDR_BITS;
> 	p = idp->top;

OK, the caller presumably holds an update-side lock.

> 	max = 1 << n;
> 
> 	id = 0;
> 	while (id < max) {
> 		while (n > IDR_BITS && p) {
> 			n -= IDR_BITS;
> 			*paa++ = p;
> 			p = p->ary[(id >> n) & IDR_MASK];

OK, the caller presumably holds the update-side lock.

> 		}
> 
> 		id += 1 << n;
> 		while (n < fls(id)) {
> 			if (p)
> 				free_layer(p);
> 			n += IDR_BITS;
> 			p = *--paa;
> 		}
> 	}
> 	rcu_assign_pointer(idp->top, NULL);
> 	idp->layers = 0;
> }
> EXPORT_SYMBOL(idr_remove_all);
> 
> /**
>  * idr_destroy - release all cached layers within an idr tree
>  * @idp: idr handle
>  */
> void idr_destroy(struct idr *idp)
> {
> 	while (idp->id_free_cnt) {
> 		struct idr_layer *p = get_from_free_list(idp);
> 		kmem_cache_free(idr_layer_cache, p);
> 	}
> }
> EXPORT_SYMBOL(idr_destroy);
> 
> /**
>  * idr_find - return pointer for given id
>  * @idp: idr handle
>  * @id: lookup key
>  *
>  * Return the pointer given the id it has been registered with.  A %NULL
>  * return indicates that @id is not valid or you passed %NULL in
>  * idr_get_new().
>  *
>  * This function can be called under rcu_read_lock(), given that the leaf
>  * pointers' lifetimes are correctly managed.
>  */
> void *idr_find(struct idr *idp, int id)
> {
> 	int n;
> 	struct idr_layer *p;
> 
> 	n = idp->layers * IDR_BITS;
> 	p = rcu_dereference(idp->top);
> 
> 	/* Mask off upper bits we don't use for the search. */
> 	id &= MAX_ID_MASK;
> 
> 	if (id >= (1 << n))
> 		return NULL;
> 
> 	while (n > 0 && p) {
> 		n -= IDR_BITS;
> 		p = rcu_dereference(p->ary[(id >> n) & IDR_MASK]);
> 	}
> 	return((void *)p);
> }
> EXPORT_SYMBOL(idr_find);
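
For reference, the reader-side usage this enables on the ipc side looks
roughly like the following (a kernel-side sketch only, not compilable
standalone; the idr handle and object names are made up):

```c
/*
 * Lockless lookup sketch: idr_find() walks the tree via
 * rcu_dereference(), and free_layer() defers layer frees past a
 * grace period, so the idr layers stay valid for the duration of
 * the read-side critical section.  The lifetime of the leaf object
 * itself is still the caller's responsibility.
 */
rcu_read_lock();
obj = idr_find(&my_idr, id);		/* no idr lock taken */
if (obj)
	do_something_with(obj);		/* leaf still protected here */
rcu_read_unlock();
```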
> 
> /**
>  * idr_for_each - iterate through all stored pointers
>  * @idp: idr handle
>  * @fn: function to be called for each pointer
>  * @data: data passed back to callback function
>  *
>  * Iterate over the pointers registered with the given idr.  The
>  * callback function will be called for each pointer currently
>  * registered, passing the id, the pointer and the data pointer passed
>  * to this function.  It is not safe to modify the idr tree while in
>  * the callback, so functions such as idr_get_new and idr_remove are
>  * not allowed.
>  *
>  * We check the return of @fn each time. If it returns anything other
>  * than 0, we break out and return that value.
>  *
>  * The caller must serialize idr_for_each() vs idr_get_new() and idr_remove().
>  */
> int idr_for_each(struct idr *idp,
> 		 int (*fn)(int id, void *p, void *data), void *data)
> {
> 	int n, id, max, error = 0;
> 	struct idr_layer *p;
> 	struct idr_layer *pa[MAX_LEVEL];
> 	struct idr_layer **paa = &pa[0];
> 
> 	n = idp->layers * IDR_BITS;
> 	p = rcu_dereference(idp->top);
> 	max = 1 << n;
> 
> 	id = 0;
> 	while (id < max) {
> 		while (n > 0 && p) {
> 			n -= IDR_BITS;
> 			*paa++ = p;
> 			p = rcu_dereference(p->ary[(id >> n) & IDR_MASK]);
> 		}
> 
> 		if (p) {
> 			error = fn(id, (void *)p, data);
> 			if (error)
> 				break;
> 		}
> 
> 		id += 1 << n;
> 		while (n < fls(id)) {
> 			n += IDR_BITS;
> 			p = *--paa;
> 		}
> 	}
> 
> 	return error;
> }
> EXPORT_SYMBOL(idr_for_each);
> 
> /**
>  * idr_replace - replace pointer for given id
>  * @idp: idr handle
>  * @ptr: pointer you want associated with the id
>  * @id: lookup key
>  *
>  * Replace the pointer registered with an id and return the old value.
>  * A -ENOENT return indicates that @id was not found.
>  * A -EINVAL return indicates that @id was not within valid constraints.
>  *
>  * The caller must serialize with writers.
>  */
> void *idr_replace(struct idr *idp, void *ptr, int id)
> {
> 	int n;
> 	struct idr_layer *p, *old_p;
> 
> 	n = idp->layers * IDR_BITS;
> 	p = idp->top;

OK, the caller presumably holds an update-side lock.

> 
> 	id &= MAX_ID_MASK;
> 
> 	if (id >= (1 << n))
> 		return ERR_PTR(-EINVAL);
> 
> 	n -= IDR_BITS;
> 	while ((n > 0) && p) {
> 		p = p->ary[(id >> n) & IDR_MASK];

OK, the caller presumably holds the update-side lock.

> 		n -= IDR_BITS;
> 	}
> 
> 	n = id & IDR_MASK;
> 	if (unlikely(p == NULL || !test_bit(n, &p->bitmap)))
> 		return ERR_PTR(-ENOENT);
> 
> 	old_p = p->ary[n];

OK, the caller presumably holds the update-side lock.

> 	rcu_assign_pointer(p->ary[n], ptr);
> 
> 	return old_p;
> }
> EXPORT_SYMBOL(idr_replace);
> 
> static void idr_cache_ctor(struct kmem_cache *idr_layer_cache, void *idr_layer)
> {
> 	memset(idr_layer, 0, sizeof(struct idr_layer));
> }
> 
> void __init idr_init_cache(void)
> {
> 	idr_layer_cache = kmem_cache_create("idr_layer_cache",
> 				sizeof(struct idr_layer), 0, SLAB_PANIC,
> 				idr_cache_ctor);
> }
> 
> /**
>  * idr_init - initialize idr handle
>  * @idp:	idr handle
>  *
>  * This function is use to set up the handle (@idp) that you will pass
>  * to the rest of the functions.
>  */
> void idr_init(struct idr *idp)
> {
> 	memset(idp, 0, sizeof(struct idr));
> 	spin_lock_init(&idp->lock);
> }
> EXPORT_SYMBOL(idr_init);
> 
> 
> /*
>  * IDA - IDR based ID allocator
>  *
>  * this is id allocator without id -> pointer translation.  Memory
>  * usage is much lower than full blown idr because each id only
>  * occupies a bit.  ida uses a custom leaf node which contains
>  * IDA_BITMAP_BITS slots.
>  *
>  * 2007-04-25  written by Tejun Heo <htejun@gmail.com>
>  */
> 
> static void free_bitmap(struct ida *ida, struct ida_bitmap *bitmap)
> {
> 	unsigned long flags;
> 
> 	if (!ida->free_bitmap) {
> 		spin_lock_irqsave(&ida->idr.lock, flags);
> 		if (!ida->free_bitmap) {
> 			ida->free_bitmap = bitmap;
> 			bitmap = NULL;
> 		}
> 		spin_unlock_irqrestore(&ida->idr.lock, flags);
> 	}
> 
> 	kfree(bitmap);
> }
> 
> /**
>  * ida_pre_get - reserve resources for ida allocation
>  * @ida:	ida handle
>  * @gfp_mask:	memory allocation flag
>  *
>  * This function should be called prior to locking and calling the
>  * following function.  It preallocates enough memory to satisfy the
>  * worst possible allocation.
>  *
>  * If the system is REALLY out of memory this function returns 0,
>  * otherwise 1.
>  */
> int ida_pre_get(struct ida *ida, gfp_t gfp_mask)
> {
> 	/* allocate idr_layers */
> 	if (!idr_pre_get(&ida->idr, gfp_mask))
> 		return 0;
> 
> 	/* allocate free_bitmap */
> 	if (!ida->free_bitmap) {
> 		struct ida_bitmap *bitmap;
> 
> 		bitmap = kmalloc(sizeof(struct ida_bitmap), gfp_mask);
> 		if (!bitmap)
> 			return 0;
> 
> 		free_bitmap(ida, bitmap);
> 	}
> 
> 	return 1;
> }
> EXPORT_SYMBOL(ida_pre_get);
> 
> /**
>  * ida_get_new_above - allocate new ID above or equal to a start id
>  * @ida:	ida handle
>  * @starting_id:	id to start search at
>  * @p_id:	pointer to the allocated handle
>  *
>  * Allocate new ID above or equal to @starting_id.  It should be called with
>  * any required locks.
>  *
>  * If memory is required, it will return -EAGAIN, you should unlock
>  * and go back to the ida_pre_get() call.  If the ida is full, it will
>  * return -ENOSPC.
>  *
>  * @p_id returns a value in the range 0 ... 0x7fffffff.
>  */
> int ida_get_new_above(struct ida *ida, int starting_id, int *p_id)
> {
> 	struct idr_layer *pa[MAX_LEVEL];
> 	struct ida_bitmap *bitmap;
> 	unsigned long flags;
> 	int idr_id = starting_id / IDA_BITMAP_BITS;
> 	int offset = starting_id % IDA_BITMAP_BITS;
> 	int t, id;
> 
>  restart:
> 	/* get vacant slot */
> 	t = idr_get_empty_slot(&ida->idr, idr_id, pa);
> 	if (t < 0)
> 		return _idr_rc_to_errno(t);
> 
> 	if (t * IDA_BITMAP_BITS >= MAX_ID_BIT)
> 		return -ENOSPC;
> 
> 	if (t != idr_id)
> 		offset = 0;
> 	idr_id = t;
> 
> 	/* if bitmap isn't there, create a new one */
> 	bitmap = (void *)pa[0]->ary[idr_id & IDR_MASK];

OK, the caller presumably holds the update-side lock.

> 	if (!bitmap) {
> 		spin_lock_irqsave(&ida->idr.lock, flags);
> 		bitmap = ida->free_bitmap;
> 		ida->free_bitmap = NULL;
> 		spin_unlock_irqrestore(&ida->idr.lock, flags);
> 
> 		if (!bitmap)
> 			return -EAGAIN;
> 
> 		memset(bitmap, 0, sizeof(struct ida_bitmap));
> 		rcu_assign_pointer(pa[0]->ary[idr_id & IDR_MASK],
> 				(void *)bitmap);
> 		pa[0]->count++;
> 	}
> 
> 	/* lookup for empty slot */
> 	t = find_next_zero_bit(bitmap->bitmap, IDA_BITMAP_BITS, offset);
> 	if (t == IDA_BITMAP_BITS) {
> 		/* no empty slot after offset, continue to the next chunk */
> 		idr_id++;
> 		offset = 0;
> 		goto restart;
> 	}
> 
> 	id = idr_id * IDA_BITMAP_BITS + t;
> 	if (id >= MAX_ID_BIT)
> 		return -ENOSPC;
> 
> 	__set_bit(t, bitmap->bitmap);
> 	if (++bitmap->nr_busy == IDA_BITMAP_BITS)
> 		idr_mark_full(pa, idr_id);
> 
> 	*p_id = id;
> 
> 	/* Each leaf node can handle nearly a thousand slots and the
> 	 * whole idea of ida is to have a small memory footprint.
> 	 * Throw away extra resources one by one after each successful
> 	 * allocation.
> 	 */
> 	if (ida->idr.id_free_cnt || ida->free_bitmap) {
> 		struct idr_layer *p = get_from_free_list(&ida->idr);
> 		if (p)
> 			kmem_cache_free(idr_layer_cache, p);
> 	}
> 
> 	return 0;
> }
> EXPORT_SYMBOL(ida_get_new_above);
> 
> /**
>  * ida_get_new - allocate new ID
>  * @ida:	ida handle
>  * @p_id:	pointer to the allocated handle
>  *
>  * Allocate new ID.  It should be called with any required locks.
>  *
>  * If memory is required, it will return -EAGAIN, you should unlock
>  * and go back to the ida_pre_get() call.  If the ida is full, it will
>  * return -ENOSPC.
>  *
>  * @p_id returns a value in the range 0 ... 0x7fffffff.
>  */
> int ida_get_new(struct ida *ida, int *p_id)
> {
> 	return ida_get_new_above(ida, 0, p_id);
> }
> EXPORT_SYMBOL(ida_get_new);
> 
> /**
>  * ida_remove - remove the given ID
>  * @ida:	ida handle
>  * @id:		ID to free
>  */
> void ida_remove(struct ida *ida, int id)
> {
> 	struct idr_layer *p = ida->idr.top;
> 	int shift = (ida->idr.layers - 1) * IDR_BITS;
> 	int idr_id = id / IDA_BITMAP_BITS;
> 	int offset = id % IDA_BITMAP_BITS;
> 	int n;
> 	struct ida_bitmap *bitmap;
> 
> 	/* clear full bits while looking up the leaf idr_layer */
> 	while ((shift > 0) && p) {
> 		n = (idr_id >> shift) & IDR_MASK;
> 		__clear_bit(n, &p->bitmap);
> 		p = p->ary[n];

OK, the caller presumably holds the update-side lock.

> 		shift -= IDR_BITS;
> 	}
> 
> 	if (p == NULL)
> 		goto err;
> 
> 	n = idr_id & IDR_MASK;
> 	__clear_bit(n, &p->bitmap);
> 
> 	bitmap = (void *)p->ary[n];

OK, the caller presumably holds the update-side lock.

> 	if (!test_bit(offset, bitmap->bitmap))
> 		goto err;
> 
> 	/* update bitmap and remove it if empty */
> 	__clear_bit(offset, bitmap->bitmap);
> 	if (--bitmap->nr_busy == 0) {
> 		__set_bit(n, &p->bitmap);	/* to please idr_remove() */
> 		idr_remove(&ida->idr, idr_id);
> 		free_bitmap(ida, bitmap);
> 	}
> 
> 	return;
> 
>  err:
> 	printk(KERN_WARNING
> 	       "ida_remove called for id=%d which is not allocated.\n", id);
> }
> EXPORT_SYMBOL(ida_remove);
> 
> /**
>  * ida_destroy - release all cached layers within an ida tree
>  * @ida:	ida handle
>  */
> void ida_destroy(struct ida *ida)
> {
> 	idr_destroy(&ida->idr);
> 	kfree(ida->free_bitmap);
> }
> EXPORT_SYMBOL(ida_destroy);
> 
> /**
>  * ida_init - initialize ida handle
>  * @ida:	ida handle
>  *
>  * This function is used to set up the handle (@ida) that you will pass
>  * to the rest of the functions.
>  */
> void ida_init(struct ida *ida)
> {
> 	memset(ida, 0, sizeof(struct ida));
> 	idr_init(&ida->idr);
> }
> EXPORT_SYMBOL(ida_init);

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 1/9] Change the idr structure
  2008-05-07 11:35 ` [PATCH 1/9] Change the idr structure Nadia.Derbey
  2008-05-08 17:12   ` Rik van Riel
@ 2008-05-30  8:22   ` Paul E. McKenney
  1 sibling, 0 replies; 39+ messages in thread
From: Paul E. McKenney @ 2008-05-30  8:22 UTC (permalink / raw)
  To: Nadia.Derbey; +Cc: manfred, lnxninja, linux-kernel, efault, akpm

On Wed, May 07, 2008 at 01:35:54PM +0200, Nadia.Derbey@bull.net wrote:
> [PATCH 01/09]
> 
> This patch adds an rcu_head to the idr_layer structure in order to free it
> after a grace period.

Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>

> Signed-off-by: Nadia Derbey <Nadia.Derbey@bull.net>
> 
> ---
>  include/linux/idr.h |    2 ++
>  1 file changed, 2 insertions(+)
> 
> Index: linux-2.6.25-mm1/include/linux/idr.h
> ===================================================================
> --- linux-2.6.25-mm1.orig/include/linux/idr.h	2008-05-06 17:14:24.000000000 +0200
> +++ linux-2.6.25-mm1/include/linux/idr.h	2008-05-06 17:20:58.000000000 +0200
> @@ -15,6 +15,7 @@
>  #include <linux/types.h>
>  #include <linux/bitops.h>
>  #include <linux/init.h>
> +#include <linux/rcupdate.h>
> 
>  #if BITS_PER_LONG == 32
>  # define IDR_BITS 5
> @@ -51,6 +52,7 @@ struct idr_layer {
>  	unsigned long		 bitmap; /* A zero bit means "space here" */
>  	struct idr_layer	*ary[1<<IDR_BITS];
>  	int			 count;	 /* When zero, we can release it */
> +	struct rcu_head		 rcu_head;
>  };
> 
>  struct idr {
> 
> --

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 2/9] Rename some of the idr APIs internal routines
  2008-05-07 11:35 ` [PATCH 2/9] Rename some of the idr APIs internal routines Nadia.Derbey
  2008-05-08 17:15   ` Rik van Riel
@ 2008-05-30  8:23   ` Paul E. McKenney
  1 sibling, 0 replies; 39+ messages in thread
From: Paul E. McKenney @ 2008-05-30  8:23 UTC (permalink / raw)
  To: Nadia.Derbey; +Cc: manfred, lnxninja, linux-kernel, efault, akpm

On Wed, May 07, 2008 at 01:35:55PM +0200, Nadia.Derbey@bull.net wrote:
> [PATCH 02/09]
> 
> This is a trivial patch that renames:
>    . alloc_layer to get_from_free_list, since it is idr_pre_get() that
>      actually allocates memory.
>    . free_layer to move_to_free_list since memory is not actually freed there.
> 
> This makes things more clear for the next patches.

Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>

> Signed-off-by: Nadia Derbey <Nadia.Derbey@bull.net>
> 
> ---
>  lib/idr.c |   31 ++++++++++++++++---------------
>  1 file changed, 16 insertions(+), 15 deletions(-)
> 
> Index: linux-2.6.25-mm1/lib/idr.c
> ===================================================================
> --- linux-2.6.25-mm1.orig/lib/idr.c	2008-05-06 17:14:30.000000000 +0200
> +++ linux-2.6.25-mm1/lib/idr.c	2008-05-06 17:36:36.000000000 +0200
> @@ -35,7 +35,7 @@
> 
>  static struct kmem_cache *idr_layer_cache;
> 
> -static struct idr_layer *alloc_layer(struct idr *idp)
> +static struct idr_layer *get_from_free_list(struct idr *idp)
>  {
>  	struct idr_layer *p;
>  	unsigned long flags;
> @@ -51,14 +51,14 @@ static struct idr_layer *alloc_layer(str
>  }
> 
>  /* only called when idp->lock is held */
> -static void __free_layer(struct idr *idp, struct idr_layer *p)
> +static void __move_to_free_list(struct idr *idp, struct idr_layer *p)
>  {
>  	p->ary[0] = idp->id_free;
>  	idp->id_free = p;
>  	idp->id_free_cnt++;
>  }
> 
> -static void free_layer(struct idr *idp, struct idr_layer *p)
> +static void move_to_free_list(struct idr *idp, struct idr_layer *p)
>  {
>  	unsigned long flags;
> 
> @@ -66,7 +66,7 @@ static void free_layer(struct idr *idp, 
>  	 * Depends on the return element being zeroed.
>  	 */
>  	spin_lock_irqsave(&idp->lock, flags);
> -	__free_layer(idp, p);
> +	__move_to_free_list(idp, p);
>  	spin_unlock_irqrestore(&idp->lock, flags);
>  }
> 
> @@ -109,7 +109,7 @@ int idr_pre_get(struct idr *idp, gfp_t g
>  		new = kmem_cache_alloc(idr_layer_cache, gfp_mask);
>  		if (new == NULL)
>  			return (0);
> -		free_layer(idp, new);
> +		move_to_free_list(idp, new);
>  	}
>  	return 1;
>  }
> @@ -167,7 +167,8 @@ static int sub_alloc(struct idr *idp, in
>  		 * Create the layer below if it is missing.
>  		 */
>  		if (!p->ary[m]) {
> -			if (!(new = alloc_layer(idp)))
> +			new = get_from_free_list(idp);
> +			if (!new)
>  				return -1;
>  			p->ary[m] = new;
>  			p->count++;
> @@ -192,7 +193,7 @@ build_up:
>  	p = idp->top;
>  	layers = idp->layers;
>  	if (unlikely(!p)) {
> -		if (!(p = alloc_layer(idp)))
> +		if (!(p = get_from_free_list(idp)))
>  			return -1;
>  		layers = 1;
>  	}
> @@ -204,7 +205,7 @@ build_up:
>  		layers++;
>  		if (!p->count)
>  			continue;
> -		if (!(new = alloc_layer(idp))) {
> +		if (!(new = get_from_free_list(idp))) {
>  			/*
>  			 * The allocation failed.  If we built part of
>  			 * the structure tear it down.
> @@ -214,7 +215,7 @@ build_up:
>  				p = p->ary[0];
>  				new->ary[0] = NULL;
>  				new->bitmap = new->count = 0;
> -				__free_layer(idp, new);
> +				__move_to_free_list(idp, new);
>  			}
>  			spin_unlock_irqrestore(&idp->lock, flags);
>  			return -1;
> @@ -351,7 +352,7 @@ static void sub_remove(struct idr *idp, 
>  		__clear_bit(n, &p->bitmap);
>  		p->ary[n] = NULL;
>  		while(*paa && ! --((**paa)->count)){
> -			free_layer(idp, **paa);
> +			move_to_free_list(idp, **paa);
>  			**paa-- = NULL;
>  		}
>  		if (!*paa)
> @@ -378,12 +379,12 @@ void idr_remove(struct idr *idp, int id)
> 
>  		p = idp->top->ary[0];
>  		idp->top->bitmap = idp->top->count = 0;
> -		free_layer(idp, idp->top);
> +		move_to_free_list(idp, idp->top);
>  		idp->top = p;
>  		--idp->layers;
>  	}
>  	while (idp->id_free_cnt >= IDR_FREE_MAX) {
> -		p = alloc_layer(idp);
> +		p = get_from_free_list(idp);
>  		kmem_cache_free(idr_layer_cache, p);
>  		return;
>  	}
> @@ -426,7 +427,7 @@ void idr_remove_all(struct idr *idp)
>  		while (n < fls(id)) {
>  			if (p) {
>  				memset(p, 0, sizeof *p);
> -				free_layer(idp, p);
> +				move_to_free_list(idp, p);
>  			}
>  			n += IDR_BITS;
>  			p = *--paa;
> @@ -444,7 +445,7 @@ EXPORT_SYMBOL(idr_remove_all);
>  void idr_destroy(struct idr *idp)
>  {
>  	while (idp->id_free_cnt) {
> -		struct idr_layer *p = alloc_layer(idp);
> +		struct idr_layer *p = get_from_free_list(idp);
>  		kmem_cache_free(idr_layer_cache, p);
>  	}
>  }
> @@ -749,7 +750,7 @@ int ida_get_new_above(struct ida *ida, i
>  	 * allocation.
>  	 */
>  	if (ida->idr.id_free_cnt || ida->free_bitmap) {
> -		struct idr_layer *p = alloc_layer(&ida->idr);
> +		struct idr_layer *p = get_from_free_list(&ida->idr);
>  		if (p)
>  			kmem_cache_free(idr_layer_cache, p);
>  	}
> 
> --

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 3/9] Fix a printk call
  2008-05-07 11:35 ` [PATCH 3/9] Fix a printk call Nadia.Derbey
  2008-05-08 17:43   ` Rik van Riel
@ 2008-05-30  8:23   ` Paul E. McKenney
  1 sibling, 0 replies; 39+ messages in thread
From: Paul E. McKenney @ 2008-05-30  8:23 UTC (permalink / raw)
  To: Nadia.Derbey; +Cc: manfred, lnxninja, linux-kernel, efault, akpm

On Wed, May 07, 2008 at 01:35:56PM +0200, Nadia.Derbey@bull.net wrote:
> [PATCH 03/09]
> 
> This is a trivial patch that fixes an incomplete printk call.

Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>

> Signed-off-by: Nadia Derbey <Nadia.Derbey@bull.net>
> 
> ---
>  lib/idr.c |    3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> Index: linux-2.6.25-mm1/lib/idr.c
> ===================================================================
> --- linux-2.6.25-mm1.orig/lib/idr.c	2008-05-06 17:24:15.000000000 +0200
> +++ linux-2.6.25-mm1/lib/idr.c	2008-05-06 17:26:41.000000000 +0200
> @@ -325,7 +325,8 @@ EXPORT_SYMBOL(idr_get_new);
> 
>  static void idr_remove_warning(int id)
>  {
> -	printk("idr_remove called for id=%d which is not allocated.\n", id);
> +	printk(KERN_WARNING
> +		"idr_remove called for id=%d which is not allocated.\n", id);
>  	dump_stack();
>  }
> 
> 
> --

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 5/9] Make idr_get_new* rcu-safe
  2008-05-07 11:35 ` [PATCH 5/9] Make idr_get_new* rcu-safe Nadia.Derbey
  2008-05-08 17:55   ` Rik van Riel
@ 2008-05-30  8:23   ` Paul E. McKenney
  1 sibling, 0 replies; 39+ messages in thread
From: Paul E. McKenney @ 2008-05-30  8:23 UTC (permalink / raw)
  To: Nadia.Derbey; +Cc: manfred, lnxninja, linux-kernel, efault, akpm

On Wed, May 07, 2008 at 01:35:58PM +0200, Nadia.Derbey@bull.net wrote:
> [PATCH 05/09]
> 
> This is a patch that makes the idr_get_new* routines rcu-safe.

Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>

> Signed-off-by: Nadia Derbey <Nadia.Derbey@bull.net>
> 
> ---
>  lib/idr.c |   17 ++++++++++++-----
>  1 file changed, 12 insertions(+), 5 deletions(-)
> 
> Index: linux-2.6.25-mm1/lib/idr.c
> ===================================================================
> --- linux-2.6.25-mm1.orig/lib/idr.c	2008-05-06 17:38:42.000000000 +0200
> +++ linux-2.6.25-mm1/lib/idr.c	2008-05-06 18:02:07.000000000 +0200
> @@ -6,6 +6,8 @@
>   * Modified by George Anzinger to reuse immediately and to use
>   * find bit instructions.  Also removed _irq on spinlocks.
>   *
> + * Modified by Nadia Derbey to make it RCU safe.
> + *
>   * Small id to pointer translation service.
>   *
>   * It uses a radix tree like structure as a sparse array indexed
> @@ -96,7 +98,7 @@ static void idr_mark_full(struct idr_lay
>   * @gfp_mask:	memory allocation flags
>   *
>   * This function should be called prior to locking and calling the
> - * following function.  It preallocates enough memory to satisfy
> + * idr_get_new* functions. It preallocates enough memory to satisfy
>   * the worst possible allocation.
>   *
>   * If the system is REALLY out of memory this function returns 0,
> @@ -170,7 +172,8 @@ static int sub_alloc(struct idr *idp, in
>  			new = get_from_free_list(idp);
>  			if (!new)
>  				return -1;
> -			p->ary[m] = new;
> +			INIT_RCU_HEAD(&new->rcu_head);
> +			rcu_assign_pointer(p->ary[m], new);
>  			p->count++;
>  		}
>  		pa[l--] = p;
> @@ -195,6 +198,7 @@ build_up:
>  	if (unlikely(!p)) {
>  		if (!(p = get_from_free_list(idp)))
>  			return -1;
> +		INIT_RCU_HEAD(&p->rcu_head);
>  		layers = 1;
>  	}
>  	/*
> @@ -222,11 +226,12 @@ build_up:
>  		}
>  		new->ary[0] = p;
>  		new->count = 1;
> +		INIT_RCU_HEAD(&new->rcu_head);
>  		if (p->bitmap == IDR_FULL)
>  			__set_bit(0, &new->bitmap);
>  		p = new;
>  	}
> -	idp->top = p;
> +	rcu_assign_pointer(idp->top, p);
>  	idp->layers = layers;
>  	v = sub_alloc(idp, &id, pa);
>  	if (v == IDR_NEED_TO_GROW)
> @@ -245,7 +250,8 @@ static int idr_get_new_above_int(struct 
>  		 * Successfully found an empty slot.  Install the user
>  		 * pointer and mark the slot full.
>  		 */
> -		pa[0]->ary[id & IDR_MASK] = (struct idr_layer *)ptr;
> +		rcu_assign_pointer(pa[0]->ary[id & IDR_MASK],
> +				(struct idr_layer *)ptr);
>  		pa[0]->count++;
>  		idr_mark_full(pa, id);
>  	}
> @@ -710,7 +716,8 @@ int ida_get_new_above(struct ida *ida, i
>  			return -EAGAIN;
> 
>  		memset(bitmap, 0, sizeof(struct ida_bitmap));
> -		pa[0]->ary[idr_id & IDR_MASK] = (void *)bitmap;
> +		rcu_assign_pointer(pa[0]->ary[idr_id & IDR_MASK],
> +				(void *)bitmap);
>  		pa[0]->count++;
>  	}
> 
> 
> --

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 6/9] Make idr_find rcu-safe
  2008-05-07 11:35 ` [PATCH 6/9] Make idr_find rcu-safe Nadia.Derbey
  2008-05-08 17:58   ` Rik van Riel
@ 2008-05-30  8:24   ` Paul E. McKenney
  1 sibling, 0 replies; 39+ messages in thread
From: Paul E. McKenney @ 2008-05-30  8:24 UTC (permalink / raw)
  To: Nadia.Derbey; +Cc: manfred, lnxninja, linux-kernel, efault, akpm

On Wed, May 07, 2008 at 01:35:59PM +0200, Nadia.Derbey@bull.net wrote:
> [PATCH 06/09]
> 
> This is a patch that makes idr_find rcu-safe: it can now be called inside
> an RCU read-side critical section.

Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>

> Signed-off-by: Nadia Derbey <Nadia.Derbey@bull.net>
> 
> ---
>  include/linux/idr.h |   16 ++++++++++++++++
>  lib/idr.c           |   11 ++++++-----
>  2 files changed, 22 insertions(+), 5 deletions(-)
> 
> Index: linux-2.6.25-mm1/lib/idr.c
> ===================================================================
> --- linux-2.6.25-mm1.orig/lib/idr.c	2008-05-06 18:06:58.000000000 +0200
> +++ linux-2.6.25-mm1/lib/idr.c	2008-05-07 10:38:15.000000000 +0200
> @@ -459,7 +459,8 @@ EXPORT_SYMBOL(idr_destroy);
>   * return indicates that @id is not valid or you passed %NULL in
>   * idr_get_new().
>   *
> - * The caller must serialize idr_find() vs idr_get_new() and idr_remove().
> + * This function can be called under rcu_read_lock(), given that the leaf
> + * pointers' lifetimes are correctly managed.
>   */
>  void *idr_find(struct idr *idp, int id)
>  {
> @@ -467,7 +468,7 @@ void *idr_find(struct idr *idp, int id)
>  	struct idr_layer *p;
> 
>  	n = idp->layers * IDR_BITS;
> -	p = idp->top;
> +	p = rcu_dereference(idp->top);
> 
>  	/* Mask off upper bits we don't use for the search. */
>  	id &= MAX_ID_MASK;
> @@ -477,7 +478,7 @@ void *idr_find(struct idr *idp, int id)
> 
>  	while (n > 0 && p) {
>  		n -= IDR_BITS;
> -		p = p->ary[(id >> n) & IDR_MASK];
> +		p = rcu_dereference(p->ary[(id >> n) & IDR_MASK]);
>  	}
>  	return((void *)p);
>  }
> @@ -510,7 +511,7 @@ int idr_for_each(struct idr *idp,
>  	struct idr_layer **paa = &pa[0];
> 
>  	n = idp->layers * IDR_BITS;
> -	p = idp->top;
> +	p = rcu_dereference(idp->top);
>  	max = 1 << n;
> 
>  	id = 0;
> @@ -518,7 +519,7 @@ int idr_for_each(struct idr *idp,
>  		while (n > 0 && p) {
>  			n -= IDR_BITS;
>  			*paa++ = p;
> -			p = p->ary[(id >> n) & IDR_MASK];
> +			p = rcu_dereference(p->ary[(id >> n) & IDR_MASK]);
>  		}
> 
>  		if (p) {
> Index: linux-2.6.25-mm1/include/linux/idr.h
> ===================================================================
> --- linux-2.6.25-mm1.orig/include/linux/idr.h	2008-05-06 17:38:42.000000000 +0200
> +++ linux-2.6.25-mm1/include/linux/idr.h	2008-05-07 10:41:22.000000000 +0200
> @@ -79,6 +79,22 @@ struct idr {
> 
>  #define _idr_rc_to_errno(rc) ((rc) == -1 ? -EAGAIN : -ENOSPC)
> 
> +/**
> + * idr synchronization (stolen from radix-tree.h)
> + *
> + * idr_find() is able to be called locklessly, using RCU. The caller must
> + * ensure calls to this function are made within rcu_read_lock() regions.
> + * Other readers (lock-free or otherwise) and modifications may be running
> + * concurrently.
> + *
> + * It is still required that the caller manage the synchronization and
> + * lifetimes of the items. So if RCU lock-free lookups are used, typically
> + * this would mean that the items have their own locks, or are amenable to
> + * lock-free access; and that the items are freed by RCU (or only freed after
> + * having been deleted from the idr tree *and* a synchronize_rcu() grace
> + * period).
> + */
> +
>  /*
>   * This is what we export.
>   */
> 
> --

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 7/9] Make idr_remove rcu-safe
  2008-05-07 11:36 ` [PATCH 7/9] Make idr_remove rcu-safe Nadia.Derbey
  2008-05-08 18:02   ` Rik van Riel
  2008-05-14 19:59   ` Tim Pepper
@ 2008-05-30  8:24   ` Paul E. McKenney
  2 siblings, 0 replies; 39+ messages in thread
From: Paul E. McKenney @ 2008-05-30  8:24 UTC (permalink / raw)
  To: Nadia.Derbey; +Cc: manfred, lnxninja, linux-kernel, efault, akpm

On Wed, May 07, 2008 at 01:36:00PM +0200, Nadia.Derbey@bull.net wrote:
> [PATCH 07/09]
> 
> This patch introduces the free_layer() routine: it is the one that actually
> frees memory after a grace period has elapsed.

Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>

> Signed-off-by: Nadia Derbey <Nadia.Derbey@bull.net>
> 
> ---
>  lib/idr.c |   59 ++++++++++++++++++++++++++++++++++++++++++++---------------
>  1 file changed, 44 insertions(+), 15 deletions(-)
> 
> Index: linux-2.6.25-mm1/lib/idr.c
> ===================================================================
> --- linux-2.6.25-mm1.orig/lib/idr.c	2008-05-06 18:06:43.000000000 +0200
> +++ linux-2.6.25-mm1/lib/idr.c	2008-05-07 09:07:31.000000000 +0200
> @@ -52,6 +52,19 @@ static struct idr_layer *get_from_free_l
>  	return(p);
>  }
> 
> +static void idr_layer_rcu_free(struct rcu_head *head)
> +{
> +	struct idr_layer *layer;
> +
> +	layer = container_of(head, struct idr_layer, rcu_head);
> +	kmem_cache_free(idr_layer_cache, layer);
> +}
> +
> +static inline void free_layer(struct idr_layer *p)
> +{
> +	call_rcu(&p->rcu_head, idr_layer_rcu_free);
> +}
> +
>  /* only called when idp->lock is held */
>  static void __move_to_free_list(struct idr *idp, struct idr_layer *p)
>  {
> @@ -334,6 +347,7 @@ static void sub_remove(struct idr *idp, 
>  	struct idr_layer *p = idp->top;
>  	struct idr_layer **pa[MAX_LEVEL];
>  	struct idr_layer ***paa = &pa[0];
> +	struct idr_layer *to_free;
>  	int n;
> 
>  	*paa = NULL;
> @@ -349,13 +363,18 @@ static void sub_remove(struct idr *idp, 
>  	n = id & IDR_MASK;
>  	if (likely(p != NULL && test_bit(n, &p->bitmap))){
>  		__clear_bit(n, &p->bitmap);
> -		p->ary[n] = NULL;
> +		rcu_assign_pointer(p->ary[n], NULL);
> +		to_free = NULL;
>  		while(*paa && ! --((**paa)->count)){
> -			move_to_free_list(idp, **paa);
> +			if (to_free)
> +				free_layer(to_free);
> +			to_free = **paa;
>  			**paa-- = NULL;
>  		}
>  		if (!*paa)
>  			idp->layers = 0;
> +		if (to_free)
> +			free_layer(to_free);
>  	} else
>  		idr_remove_warning(id);
>  }
> @@ -368,25 +387,37 @@ static void sub_remove(struct idr *idp, 
>  void idr_remove(struct idr *idp, int id)
>  {
>  	struct idr_layer *p;
> +	struct idr_layer *to_free;
> 
>  	/* Mask off upper bits we don't use for the search. */
>  	id &= MAX_ID_MASK;
> 
>  	sub_remove(idp, (idp->layers - 1) * IDR_BITS, id);
>  	if (idp->top && idp->top->count == 1 && (idp->layers > 1) &&
> -	    idp->top->ary[0]) {  // We can drop a layer
> -
> +	    idp->top->ary[0]) {
> +		/*
> +		 * Single child at leftmost slot: we can shrink the tree.
> +		 * This level is not needed anymore since when layers are
> +		 * inserted, they are inserted at the top of the existing
> +		 * tree.
> +		 */
> +		to_free = idp->top;
>  		p = idp->top->ary[0];
> -		idp->top->bitmap = idp->top->count = 0;
> -		move_to_free_list(idp, idp->top);
> -		idp->top = p;
> +		rcu_assign_pointer(idp->top, p);
>  		--idp->layers;
> +		to_free->bitmap = to_free->count = 0;
> +		free_layer(to_free);
>  	}
>  	while (idp->id_free_cnt >= IDR_FREE_MAX) {
>  		p = get_from_free_list(idp);
> +		/*
> +		 * Note: we don't call the rcu callback here, since the only
> +		 * layers that fall into the freelist are those that have been
> +		 * preallocated.
> +		 */
>  		kmem_cache_free(idr_layer_cache, p);
> -		return;
>  	}
> +	return;
>  }
>  EXPORT_SYMBOL(idr_remove);
> 
> @@ -424,15 +455,13 @@ void idr_remove_all(struct idr *idp)
> 
>  		id += 1 << n;
>  		while (n < fls(id)) {
> -			if (p) {
> -				memset(p, 0, sizeof *p);
> -				move_to_free_list(idp, p);
> -			}
> +			if (p)
> +				free_layer(p);
>  			n += IDR_BITS;
>  			p = *--paa;
>  		}
>  	}
> -	idp->top = NULL;
> +	rcu_assign_pointer(idp->top, NULL);
>  	idp->layers = 0;
>  }
>  EXPORT_SYMBOL(idr_remove_all);
> @@ -549,7 +578,7 @@ EXPORT_SYMBOL(idr_for_each);
>   * A -ENOENT return indicates that @id was not found.
>   * A -EINVAL return indicates that @id was not within valid constraints.
>   *
> - * The caller must serialize vs idr_find(), idr_get_new(), and idr_remove().
> + * The caller must serialize with writers.
>   */
>  void *idr_replace(struct idr *idp, void *ptr, int id)
>  {
> @@ -575,7 +604,7 @@ void *idr_replace(struct idr *idp, void 
>  		return ERR_PTR(-ENOENT);
> 
>  	old_p = p->ary[n];
> -	p->ary[n] = ptr;
> +	rcu_assign_pointer(p->ary[n], ptr);
> 
>  	return old_p;
>  }
> 
> --

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 8/9] Call idr_find() without locking in ipc_lock()
  2008-05-07 11:36 ` [PATCH 8/9] Call idr_find() without locking in ipc_lock() Nadia.Derbey
  2008-05-08 18:11   ` Rik van Riel
@ 2008-05-30  8:27   ` Paul E. McKenney
  1 sibling, 0 replies; 39+ messages in thread
From: Paul E. McKenney @ 2008-05-30  8:27 UTC (permalink / raw)
  To: Nadia.Derbey; +Cc: manfred, lnxninja, linux-kernel, efault, akpm

On Wed, May 07, 2008 at 01:36:01PM +0200, Nadia.Derbey@bull.net wrote:
> [PATCH 08/09]
> 
> This patch calls idr_find() locklessly in ipc_lock(), since the idr
> tree is now RCU protected.

Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>

> Signed-off-by: Nadia Derbey <Nadia.Derbey@bull.net>
> 
> ---
>  ipc/util.c |    9 ---------
>  1 file changed, 9 deletions(-)
> 
> Index: linux-2.6.25-mm1/ipc/util.c
> ===================================================================
> --- linux-2.6.25-mm1.orig/ipc/util.c	2008-05-06 17:15:10.000000000 +0200
> +++ linux-2.6.25-mm1/ipc/util.c	2008-05-07 09:56:20.000000000 +0200
> @@ -688,10 +688,6 @@ void ipc64_perm_to_ipc_perm (struct ipc6
>   * Look for an id in the ipc ids idr and lock the associated ipc object.
>   *
>   * The ipc object is locked on exit.
> - *
> - * This is the routine that should be called when the rw_mutex is not already
> - * held, i.e. idr tree not protected: it protects the idr tree in read mode
> - * during the idr_find().
>   */
> 
>  struct kern_ipc_perm *ipc_lock(struct ipc_ids *ids, int id)
> @@ -699,18 +695,13 @@ struct kern_ipc_perm *ipc_lock(struct ip
>  	struct kern_ipc_perm *out;
>  	int lid = ipcid_to_idx(id);
> 
> -	down_read(&ids->rw_mutex);
> -
>  	rcu_read_lock();
>  	out = idr_find(&ids->ipcs_idr, lid);
>  	if (out == NULL) {
>  		rcu_read_unlock();
> -		up_read(&ids->rw_mutex);
>  		return ERR_PTR(-EINVAL);
>  	}
> 
> -	up_read(&ids->rw_mutex);
> -
>  	spin_lock(&out->lock);
>  	
>  	/* ipc_rmid() may have already freed the ID while ipc_lock
> 
> --

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 9/9] Get rid of ipc_lock_down()
  2008-05-07 11:36 ` [PATCH 9/9] Get rid of ipc_lock_down() Nadia.Derbey
  2008-05-08 18:13   ` Rik van Riel
@ 2008-05-30  8:29   ` Paul E. McKenney
  1 sibling, 0 replies; 39+ messages in thread
From: Paul E. McKenney @ 2008-05-30  8:29 UTC (permalink / raw)
  To: Nadia.Derbey; +Cc: manfred, lnxninja, linux-kernel, efault, akpm

On Wed, May 07, 2008 at 01:36:02PM +0200, Nadia.Derbey@bull.net wrote:
> [PATCH 09/09]
> 
> This patch removes the ipc_lock_down() routines: they used to call idr_find()
> locklessly (given that the ipc ids lock was already held), so they are not
> needed anymore.

Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>

> Signed-off-by: Nadia Derbey <Nadia.Derbey@bull.net>
> 
> ---
>  ipc/shm.c  |   21 +++------------------
>  ipc/util.c |   52 +---------------------------------------------------
>  ipc/util.h |    6 ------
>  3 files changed, 4 insertions(+), 75 deletions(-)
> 
> Index: linux-2.6.25-mm1/ipc/util.c
> ===================================================================
> --- linux-2.6.25-mm1.orig/ipc/util.c	2008-05-07 09:56:20.000000000 +0200
> +++ linux-2.6.25-mm1/ipc/util.c	2008-05-07 09:58:32.000000000 +0200
> @@ -716,56 +716,6 @@ struct kern_ipc_perm *ipc_lock(struct ip
>  	return out;
>  }
> 
> -/**
> - * ipc_lock_down - Lock an ipc structure with rw_sem held
> - * @ids: IPC identifier set
> - * @id: ipc id to look for
> - *
> - * Look for an id in the ipc ids idr and lock the associated ipc object.
> - *
> - * The ipc object is locked on exit.
> - *
> - * This is the routine that should be called when the rw_mutex is already
> - * held, i.e. idr tree protected.
> - */
> -
> -struct kern_ipc_perm *ipc_lock_down(struct ipc_ids *ids, int id)
> -{
> -	struct kern_ipc_perm *out;
> -	int lid = ipcid_to_idx(id);
> -
> -	rcu_read_lock();
> -	out = idr_find(&ids->ipcs_idr, lid);
> -	if (out == NULL) {
> -		rcu_read_unlock();
> -		return ERR_PTR(-EINVAL);
> -	}
> -
> -	spin_lock(&out->lock);
> -
> -	/*
> -	 * No need to verify that the structure is still valid since the
> -	 * rw_mutex is held.
> -	 */
> -	return out;
> -}
> -
> -struct kern_ipc_perm *ipc_lock_check_down(struct ipc_ids *ids, int id)
> -{
> -	struct kern_ipc_perm *out;
> -
> -	out = ipc_lock_down(ids, id);
> -	if (IS_ERR(out))
> -		return out;
> -
> -	if (ipc_checkid(out, id)) {
> -		ipc_unlock(out);
> -		return ERR_PTR(-EIDRM);
> -	}
> -
> -	return out;
> -}
> -
>  struct kern_ipc_perm *ipc_lock_check(struct ipc_ids *ids, int id)
>  {
>  	struct kern_ipc_perm *out;
> @@ -837,7 +787,7 @@ struct kern_ipc_perm *ipcctl_pre_down(st
>  	int err;
> 
>  	down_write(&ids->rw_mutex);
> -	ipcp = ipc_lock_check_down(ids, id);
> +	ipcp = ipc_lock_check(ids, id);
>  	if (IS_ERR(ipcp)) {
>  		err = PTR_ERR(ipcp);
>  		goto out_up;
> Index: linux-2.6.25-mm1/ipc/util.h
> ===================================================================
> --- linux-2.6.25-mm1.orig/ipc/util.h	2008-05-06 17:15:10.000000000 +0200
> +++ linux-2.6.25-mm1/ipc/util.h	2008-05-07 09:59:31.000000000 +0200
> @@ -102,11 +102,6 @@ void* ipc_rcu_alloc(int size);
>  void ipc_rcu_getref(void *ptr);
>  void ipc_rcu_putref(void *ptr);
> 
> -/*
> - * ipc_lock_down: called with rw_mutex held
> - * ipc_lock: called without that lock held
> - */
> -struct kern_ipc_perm *ipc_lock_down(struct ipc_ids *, int);
>  struct kern_ipc_perm *ipc_lock(struct ipc_ids *, int);
> 
>  void kernel_to_ipc64_perm(struct kern_ipc_perm *in, struct ipc64_perm *out);
> @@ -155,7 +150,6 @@ static inline void ipc_unlock(struct ker
>  	rcu_read_unlock();
>  }
> 
> -struct kern_ipc_perm *ipc_lock_check_down(struct ipc_ids *ids, int id);
>  struct kern_ipc_perm *ipc_lock_check(struct ipc_ids *ids, int id);
>  int ipcget(struct ipc_namespace *ns, struct ipc_ids *ids,
>  			struct ipc_ops *ops, struct ipc_params *params);
> Index: linux-2.6.25-mm1/ipc/shm.c
> ===================================================================
> --- linux-2.6.25-mm1.orig/ipc/shm.c	2008-05-06 17:15:10.000000000 +0200
> +++ linux-2.6.25-mm1/ipc/shm.c	2008-05-07 10:01:05.000000000 +0200
> @@ -112,23 +112,8 @@ void __init shm_init (void)
>  }
> 
>  /*
> - * shm_lock_(check_)down routines are called in the paths where the rw_mutex
> - * is held to protect access to the idr tree.
> - */
> -static inline struct shmid_kernel *shm_lock_down(struct ipc_namespace *ns,
> -						int id)
> -{
> -	struct kern_ipc_perm *ipcp = ipc_lock_down(&shm_ids(ns), id);
> -
> -	if (IS_ERR(ipcp))
> -		return (struct shmid_kernel *)ipcp;
> -
> -	return container_of(ipcp, struct shmid_kernel, shm_perm);
> -}
> -
> -/*
>   * shm_lock_(check_) routines are called in the paths where the rw_mutex
> - * is not held.
> + * is not necessarily held.
>   */
>  static inline struct shmid_kernel *shm_lock(struct ipc_namespace *ns, int id)
>  {
> @@ -211,7 +196,7 @@ static void shm_close(struct vm_area_str
> 
>  	down_write(&shm_ids(ns).rw_mutex);
>  	/* remove from the list of attaches of the shm segment */
> -	shp = shm_lock_down(ns, sfd->id);
> +	shp = shm_lock(ns, sfd->id);
>  	BUG_ON(IS_ERR(shp));
>  	shp->shm_lprid = task_tgid_vnr(current);
>  	shp->shm_dtim = get_seconds();
> @@ -933,7 +918,7 @@ invalid:
> 
>  out_nattch:
>  	down_write(&shm_ids(ns).rw_mutex);
> -	shp = shm_lock_down(ns, shmid);
> +	shp = shm_lock(ns, shmid);
>  	BUG_ON(IS_ERR(shp));
>  	shp->shm_nattch--;
>  	if(shp->shm_nattch == 0 &&
> 
> --

^ permalink raw reply	[flat|nested] 39+ messages in thread

* Re: [PATCH 0/9] Scalability requirements for sysv ipc - v3
  2008-05-30  8:22 ` Paul E. McKenney
@ 2008-06-02  5:53   ` Nadia Derbey
  0 siblings, 0 replies; 39+ messages in thread
From: Nadia Derbey @ 2008-06-02  5:53 UTC (permalink / raw)
  To: paulmck; +Cc: manfred, lnxninja, linux-kernel, efault, akpm

Paul E. McKenney wrote:
> On Wed, May 07, 2008 at 01:35:53PM +0200, Nadia.Derbey@bull.net wrote:
> 
>>After scalability problems were detected when using the sysV ipcs, I
>>proposed using an RCU based implementation of the IDR api instead (see
>>threads http://lkml.org/lkml/2008/4/11/212 and
>>http://lkml.org/lkml/2008/4/29/295).
>>
>>This resulted in many people asking me to convert the idr API itself and
>>make it RCU-safe (because most of the code was duplicated, and thus
>>unmaintainable and unreviewable).
>>
>>So here is a first attempt.
>>
>>The important change wrt the idr API itself is during idr removes:
>>idr layers are freed after a grace period, instead of being moved to the
>>free list.
>>
>>The important change wrt ipcs is that idr_find() can now be called
>>locklessly, inside an RCU read-side critical section.
>>
>>Here are the results I've got for the pmsg test sent by Manfred: 
>>
>>   2.6.25-rc3-mm1   2.6.25-rc3-mm1+   2.6.25-mm1   Patched 2.6.25-mm1
>>1         1168441           1064021       876000               947488
>>2         1094264            921059      1549592              1730685
>>3         2082520           1738165      1694370              2324880
>>4         2079929           1695521       404553              2400408
>>5         2898758            406566       391283              3246580
>>6         2921417            261275       263249              3752148
>>7         3308761            126056       191742              4243142
>>8         3329456            100129       141722              4275780
>>
>>1st column: stock 2.6.25-rc3-mm1
>>2nd column: 2.6.25-rc3-mm1 + ipc patches (store ipcs into idrs)
>>3rd column: stock 2.6.25-mm1
>>4th column: 2.6.25-mm1 + this patch series.
>>
>>I'll send a chart as an answer to this mail: don't know how to do that
>>with quilt :-(
>>
>>
>>Reviewers are more than ever welcome!
>>
>>Patches should be applied on linux-2.6.25-mm1, in the following order:
>>
>>[ PATCH 01/09 ] : idr_add_rcu_head.patch
>>[ PATCH 02/09 ] : idr_rename_routines.patch
>>[ PATCH 03/09 ] : idr_fix_printk.patch
>>[ PATCH 04/09 ] : idr_rc_to_errno.patch
>>[ PATCH 05/09 ] : idr_get_new_rcu_safe.patch
>>[ PATCH 06/09 ] : idr_find_rcu_safe.patch
>>[ PATCH 07/09 ] : idr_remove_rcu_safe.patch
>>[ PATCH 08/09 ] : ipc_fix_ipc_lock.patch
>>[ PATCH 09/09 ] : remove_ipc_lock_down.patch
>>
>>Patches 2, 3 and 4 do not introduce actual changes.
>>
>>I won't be available before next Tuesday, so, please, don't be mad at me if
>>I'm not answering fast enough.
> 
> 
> I guess in my case, next Tuesday was not an issue.  :-/

Anyway, thanks for reviewing the code!

> 
> Anyway, the idr.c changes look good to me.  Not sure why you are using
> INIT_RCU_HEAD() given that call_rcu() completely initializes the fields.
> Using INIT_RCU_HEAD() doesn't cause any problems, but does add needless
> code.

Well, the INIT_RCU_HEAD() calls are there because I first thought of moving 
the freed idr layers to the free list instead of freeing them directly.
INIT_RCU_HEAD() was a way for me to get a "blank" structure when 
getting a layer from the free list, i.e. when reusing an old layer.
But I finally gave up on that idea... and left the INIT_RCU_HEAD() calls in.
Will fix this.
But I have a "process" question: should I resend the complete patch 
series, or would a patch on top of the already submitted ones be enough?

Regards,
Nadia

> 
> Commentary below, looks good from an RCU viewpoint.
> 
> 							Thanx, Paul
> 
> 
>>/*
>> * 2002-10-18  written by Jim Houston jim.houston@ccur.com
>> *	Copyright (C) 2002 by Concurrent Computer Corporation
>> *	Distributed under the GNU GPL license version 2.
>> *
>> * Modified by George Anzinger to reuse immediately and to use
>> * find bit instructions.  Also removed _irq on spinlocks.
>> *
>> * Modified by Nadia Derbey to make it RCU safe.
>> *
>> * Small id to pointer translation service.
>> *
>> * It uses a radix tree like structure as a sparse array indexed
>> * by the id to obtain the pointer.  The bitmap makes allocating
>> * a new id quick.
>> *
>> * You call it to allocate an id (an int) and associate that id with a
>> * pointer, or whatever; we treat it as a (void *).  You can pass this
>> * id to a user for him to pass back at a later time.  You then pass
>> * that id to this code and it returns your pointer.
>> *
>> * You can release ids at any time.  When all ids are released, most of
>> * the memory is returned (we keep IDR_FREE_MAX layers in a local pool),
>> * so we don't need to go to the memory "store" during an id allocate and
>> * you don't need to be too concerned about locking and conflicts
>> * with the slab allocator.
>> */
>>
>>#ifndef TEST                        // to test in user space...
>>#include <linux/slab.h>
>>#include <linux/init.h>
>>#include <linux/module.h>
>>#endif
>>#include <linux/err.h>
>>#include <linux/string.h>
>>#include <linux/idr.h>
>>
>>static struct kmem_cache *idr_layer_cache;
>>
>>static struct idr_layer *get_from_free_list(struct idr *idp)
>>{
>>	struct idr_layer *p;
>>	unsigned long flags;
>>
>>	spin_lock_irqsave(&idp->lock, flags);
>>	if ((p = idp->id_free)) {
>>		idp->id_free = p->ary[0];
>>		idp->id_free_cnt--;
>>		p->ary[0] = NULL;
> 
> 
> OK, this is the freelist which is inaccessible to readers.
> 
> 
>>	}
>>	spin_unlock_irqrestore(&idp->lock, flags);
>>	return(p);
>>}
>>
>>static void idr_layer_rcu_free(struct rcu_head *head)
>>{
>>	struct idr_layer *layer;
>>
>>	layer = container_of(head, struct idr_layer, rcu_head);
>>	kmem_cache_free(idr_layer_cache, layer);
>>}
>>
>>static inline void free_layer(struct idr_layer *p)
>>{
>>	call_rcu(&p->rcu_head, idr_layer_rcu_free);
>>}
>>
>>/* only called when idp->lock is held */
>>static void __move_to_free_list(struct idr *idp, struct idr_layer *p)
>>{
>>	p->ary[0] = idp->id_free;
> 
> 
> OK, this is the freelist which is inaccessible to readers.
> 
> 
>>	idp->id_free = p;
>>	idp->id_free_cnt++;
>>}
>>
>>static void move_to_free_list(struct idr *idp, struct idr_layer *p)
>>{
>>	unsigned long flags;
>>
>>	/*
>>	 * Depends on the return element being zeroed.
>>	 */
>>	spin_lock_irqsave(&idp->lock, flags);
>>	__move_to_free_list(idp, p);
>>	spin_unlock_irqrestore(&idp->lock, flags);
>>}
>>
>>static void idr_mark_full(struct idr_layer **pa, int id)
>>{
>>	struct idr_layer *p = pa[0];
>>	int l = 0;
>>
>>	__set_bit(id & IDR_MASK, &p->bitmap);
>>	/*
>>	 * If this layer is full mark the bit in the layer above to
>>	 * show that this part of the radix tree is full.  This may
>>	 * complete the layer above and require walking up the radix
>>	 * tree.
>>	 */
>>	while (p->bitmap == IDR_FULL) {
>>		if (!(p = pa[++l]))
>>			break;
>>		id = id >> IDR_BITS;
>>		__set_bit((id & IDR_MASK), &p->bitmap);
>>	}
>>}
>>
>>/**
>> * idr_pre_get - reserve resources for idr allocation
>> * @idp:	idr handle
>> * @gfp_mask:	memory allocation flags
>> *
>> * This function should be called prior to locking and calling the
>> * idr_get_new* functions. It preallocates enough memory to satisfy
>> * the worst possible allocation.
>> *
>> * If the system is REALLY out of memory this function returns 0,
>> * otherwise 1.
>> */
>>int idr_pre_get(struct idr *idp, gfp_t gfp_mask)
>>{
>>	while (idp->id_free_cnt < IDR_FREE_MAX) {
>>		struct idr_layer *new;
>>		new = kmem_cache_alloc(idr_layer_cache, gfp_mask);
>>		if (new == NULL)
>>			return (0);
>>		move_to_free_list(idp, new);
>>	}
>>	return 1;
>>}
>>EXPORT_SYMBOL(idr_pre_get);
>>
>>static int sub_alloc(struct idr *idp, int *starting_id, struct idr_layer **pa)
>>{
>>	int n, m, sh;
>>	struct idr_layer *p, *new;
>>	int l, id, oid;
>>	unsigned long bm;
>>
>>	id = *starting_id;
>> restart:
>>	p = idp->top;
> 
> 
> OK, the caller presumably holds an update-side lock.
> 
> 
>>	l = idp->layers;
>>	pa[l--] = NULL;
>>	while (1) {
>>		/*
>>		 * We run around this while until we reach the leaf node...
>>		 */
>>		n = (id >> (IDR_BITS*l)) & IDR_MASK;
>>		bm = ~p->bitmap;
>>		m = find_next_bit(&bm, IDR_SIZE, n);
>>		if (m == IDR_SIZE) {
>>			/* no space available go back to previous layer. */
>>			l++;
>>			oid = id;
>>			id = (id | ((1 << (IDR_BITS * l)) - 1)) + 1;
>>
>>			/* if already at the top layer, we need to grow */
>>			if (!(p = pa[l])) {
>>				*starting_id = id;
>>				return IDR_NEED_TO_GROW;
>>			}
>>
>>			/* If we need to go up one layer, continue the
>>			 * loop; otherwise, restart from the top.
>>			 */
>>			sh = IDR_BITS * (l + 1);
>>			if (oid >> sh == id >> sh)
>>				continue;
>>			else
>>				goto restart;
>>		}
>>		if (m != n) {
>>			sh = IDR_BITS*l;
>>			id = ((id >> sh) ^ n ^ m) << sh;
>>		}
>>		if ((id >= MAX_ID_BIT) || (id < 0))
>>			return IDR_NOMORE_SPACE;
>>		if (l == 0)
>>			break;
>>		/*
>>		 * Create the layer below if it is missing.
>>		 */
>>		if (!p->ary[m]) {
> 
> 
> OK, we aren't dereferencing.  Besides, we should hold the update-side
> lock at this point.
> 
> 
>>			new = get_from_free_list(idp);
>>			if (!new)
>>				return -1;
>>			INIT_RCU_HEAD(&new->rcu_head);
> 
> 
> Not needed, unless you want this zeroed for debug purposes.
> 
> 
>>			rcu_assign_pointer(p->ary[m], new);
>>			p->count++;
>>		}
>>		pa[l--] = p;
>>		p = p->ary[m];
> 
> 
> Holding update-side lock.
> 
> 
>>	}
>>
>>	pa[l] = p;
>>	return id;
>>}
>>
>>static int idr_get_empty_slot(struct idr *idp, int starting_id,
>>			      struct idr_layer **pa)
>>{
>>	struct idr_layer *p, *new;
>>	int layers, v, id;
>>	unsigned long flags;
>>
>>	id = starting_id;
>>build_up:
>>	p = idp->top;
> 
> 
> OK, the caller presumably holds an update-side lock.
> 
> 
>>	layers = idp->layers;
>>	if (unlikely(!p)) {
>>		if (!(p = get_from_free_list(idp)))
>>			return -1;
>>		INIT_RCU_HEAD(&p->rcu_head);
> 
> 
> Not needed, unless you want this zeroed for debug purposes.
> 
> 
>>		layers = 1;
>>	}
>>	/*
>>	 * Add a new layer to the top of the tree if the requested
>>	 * id is larger than the currently allocated space.
>>	 */
>>	while ((layers < (MAX_LEVEL - 1)) && (id >= (1 << (layers*IDR_BITS)))) {
>>		layers++;
>>		if (!p->count)
>>			continue;
>>		if (!(new = get_from_free_list(idp))) {
>>			/*
>>			 * The allocation failed.  If we built part of
>>			 * the structure tear it down.
>>			 */
>>			spin_lock_irqsave(&idp->lock, flags);
>>			for (new = p; p && p != idp->top; new = p) {
>>				p = p->ary[0];
>>				new->ary[0] = NULL;
> 
> 
> OK, this presumably has not yet been made accessible to readers.
> 
> 
>>				new->bitmap = new->count = 0;
>>				__move_to_free_list(idp, new);
>>			}
>>			spin_unlock_irqrestore(&idp->lock, flags);
>>			return -1;
>>		}
>>		new->ary[0] = p;
> 
> 
> OK, this presumably has not yet been made accessible to readers.
> 
> 
>>		new->count = 1;
>>		INIT_RCU_HEAD(&new->rcu_head);
> 
> 
> Not needed, unless you want this zeroed for debug purposes.
> 
> 
>>		if (p->bitmap == IDR_FULL)
>>			__set_bit(0, &new->bitmap);
>>		p = new;
>>	}
>>	rcu_assign_pointer(idp->top, p);
>>	idp->layers = layers;
>>	v = sub_alloc(idp, &id, pa);
>>	if (v == IDR_NEED_TO_GROW)
>>		goto build_up;
>>	return(v);
>>}
>>
>>static int idr_get_new_above_int(struct idr *idp, void *ptr, int starting_id)
>>{
>>	struct idr_layer *pa[MAX_LEVEL];
>>	int id;
>>
>>	id = idr_get_empty_slot(idp, starting_id, pa);
>>	if (id >= 0) {
>>		/*
>>		 * Successfully found an empty slot.  Install the user
>>		 * pointer and mark the slot full.
>>		 */
>>		rcu_assign_pointer(pa[0]->ary[id & IDR_MASK],
>>				(struct idr_layer *)ptr);
>>		pa[0]->count++;
>>		idr_mark_full(pa, id);
>>	}
>>
>>	return id;
>>}
>>
>>/**
>> * idr_get_new_above - allocate new idr entry above or equal to a start id
>> * @idp: idr handle
>> * @ptr: pointer you want associated with the id
>> * @starting_id: id to start search at
>> * @id: pointer to the allocated handle
>> *
>> * This is the allocate id function.  It should be called with any
>> * required locks.
>> *
>> * If memory is required, it will return -EAGAIN, you should unlock
>> * and go back to the idr_pre_get() call.  If the idr is full, it will
>> * return -ENOSPC.
>> *
>> * @id returns a value in the range 0 ... 0x7fffffff
>> */
>>int idr_get_new_above(struct idr *idp, void *ptr, int starting_id, int *id)
>>{
>>	int rv;
>>
>>	rv = idr_get_new_above_int(idp, ptr, starting_id);
>>	/*
>>	 * This is a cheap hack until the IDR code can be fixed to
>>	 * return proper error values.
>>	 */
>>	if (rv < 0)
>>		return _idr_rc_to_errno(rv);
>>	*id = rv;
>>	return 0;
>>}
>>EXPORT_SYMBOL(idr_get_new_above);
>>
>>/**
>> * idr_get_new - allocate new idr entry
>> * @idp: idr handle
>> * @ptr: pointer you want associated with the id
>> * @id: pointer to the allocated handle
>> *
>> * This is the allocate id function.  It should be called with any
>> * required locks.
>> *
>> * If memory is required, it will return -EAGAIN, you should unlock
>> * and go back to the idr_pre_get() call.  If the idr is full, it will
>> * return -ENOSPC.
>> *
>> * @id returns a value in the range 0 ... 0x7fffffff
>> */
>>int idr_get_new(struct idr *idp, void *ptr, int *id)
>>{
>>	int rv;
>>
>>	rv = idr_get_new_above_int(idp, ptr, 0);
>>	/*
>>	 * This is a cheap hack until the IDR code can be fixed to
>>	 * return proper error values.
>>	 */
>>	if (rv < 0)
>>		return _idr_rc_to_errno(rv);
>>	*id = rv;
>>	return 0;
>>}
>>EXPORT_SYMBOL(idr_get_new);
>>
>>static void idr_remove_warning(int id)
>>{
>>	printk(KERN_WARNING
>>		"idr_remove called for id=%d which is not allocated.\n", id);
>>	dump_stack();
>>}
>>
>>static void sub_remove(struct idr *idp, int shift, int id)
>>{
>>	struct idr_layer *p = idp->top;
> 
> 
> OK, the caller presumably holds an update-side lock.
> 
> 
>>	struct idr_layer **pa[MAX_LEVEL];
>>	struct idr_layer ***paa = &pa[0];
>>	struct idr_layer *to_free;
>>	int n;
>>
>>	*paa = NULL;
>>	*++paa = &idp->top;
>>
>>	while ((shift > 0) && p) {
>>		n = (id >> shift) & IDR_MASK;
>>		__clear_bit(n, &p->bitmap);
>>		*++paa = &p->ary[n];
> 
> 
> OK, the caller presumably holds an update-side lock.
> 
> 
>>		p = p->ary[n];
>>		shift -= IDR_BITS;
>>	}
>>	n = id & IDR_MASK;
>>	if (likely(p != NULL && test_bit(n, &p->bitmap))){
>>		__clear_bit(n, &p->bitmap);
>>		rcu_assign_pointer(p->ary[n], NULL);
>>		to_free = NULL;
>>		while(*paa && ! --((**paa)->count)){
>>			if (to_free)
>>				free_layer(to_free);
>>			to_free = **paa;
>>			**paa-- = NULL;
>>		}
>>		if (!*paa)
>>			idp->layers = 0;
>>		if (to_free)
>>			free_layer(to_free);
>>	} else
>>		idr_remove_warning(id);
>>}
>>
>>/**
>> * idr_remove - remove the given id and free its slot
>> * @idp: idr handle
>> * @id: unique key
>> */
>>void idr_remove(struct idr *idp, int id)
>>{
>>	struct idr_layer *p;
>>	struct idr_layer *to_free;
>>
>>	/* Mask off upper bits we don't use for the search. */
>>	id &= MAX_ID_MASK;
>>
>>	sub_remove(idp, (idp->layers - 1) * IDR_BITS, id);
>>	if (idp->top && idp->top->count == 1 && (idp->layers > 1) &&
>>	    idp->top->ary[0]) {
> 
> 
> OK, the caller presumably holds the update-side lock.
> 
> 
>>		/*
>>		 * Single child at leftmost slot: we can shrink the tree.
>>		 * This level is not needed anymore since when layers are
>>		 * inserted, they are inserted at the top of the existing
>>		 * tree.
>>		 */
>>		to_free = idp->top;
>>		p = idp->top->ary[0];
> 
> 
> OK, the caller presumably holds the update-side lock.
> 
> 
>>		rcu_assign_pointer(idp->top, p);
>>		--idp->layers;
>>		to_free->bitmap = to_free->count = 0;
>>		free_layer(to_free);
>>	}
>>	while (idp->id_free_cnt >= IDR_FREE_MAX) {
>>		p = get_from_free_list(idp);
>>		/*
>>		 * Note: we don't call the rcu callback here, since the only
>>		 * layers that fall into the freelist are those that have been
>>		 * preallocated.
>>		 */
>>		kmem_cache_free(idr_layer_cache, p);
>>	}
>>	return;
>>}
>>EXPORT_SYMBOL(idr_remove);
>>
>>/**
>> * idr_remove_all - remove all ids from the given idr tree
>> * @idp: idr handle
>> *
>> * idr_destroy() only frees up unused, cached idp_layers, but this
>> * function will remove all id mappings and leave all idp_layers
>> * unused.
>> *
>> * A typical clean-up sequence for objects stored in an idr tree, will
>> * use idr_for_each() to free all objects, if necessary, then
>> * idr_remove_all() to remove all ids, and idr_destroy() to free
>> * up the cached idr_layers.
>> */
>>void idr_remove_all(struct idr *idp)
>>{
>>	int n, id, max;
>>	struct idr_layer *p;
>>	struct idr_layer *pa[MAX_LEVEL];
>>	struct idr_layer **paa = &pa[0];
>>
>>	n = idp->layers * IDR_BITS;
>>	p = idp->top;
> 
> 
> OK, the caller presumably holds an update-side lock.
> 
> 
>>	max = 1 << n;
>>
>>	id = 0;
>>	while (id < max) {
>>		while (n > IDR_BITS && p) {
>>			n -= IDR_BITS;
>>			*paa++ = p;
>>			p = p->ary[(id >> n) & IDR_MASK];
> 
> 
> OK, the caller presumably holds the update-side lock.
> 
> 
>>		}
>>
>>		id += 1 << n;
>>		while (n < fls(id)) {
>>			if (p)
>>				free_layer(p);
>>			n += IDR_BITS;
>>			p = *--paa;
>>		}
>>	}
>>	rcu_assign_pointer(idp->top, NULL);
>>	idp->layers = 0;
>>}
>>EXPORT_SYMBOL(idr_remove_all);
>>
>>/**
>> * idr_destroy - release all cached layers within an idr tree
>> * @idp: idr handle
>> */
>>void idr_destroy(struct idr *idp)
>>{
>>	while (idp->id_free_cnt) {
>>		struct idr_layer *p = get_from_free_list(idp);
>>		kmem_cache_free(idr_layer_cache, p);
>>	}
>>}
>>EXPORT_SYMBOL(idr_destroy);
>>
>>/**
>> * idr_find - return pointer for given id
>> * @idp: idr handle
>> * @id: lookup key
>> *
>> * Return the pointer given the id it has been registered with.  A %NULL
>> * return indicates that @id is not valid or you passed %NULL in
>> * idr_get_new().
>> *
>> * This function can be called under rcu_read_lock(), given that the leaf
>> * pointers' lifetimes are correctly managed.
>> */
>>void *idr_find(struct idr *idp, int id)
>>{
>>	int n;
>>	struct idr_layer *p;
>>
>>	n = idp->layers * IDR_BITS;
>>	p = rcu_dereference(idp->top);
>>
>>	/* Mask off upper bits we don't use for the search. */
>>	id &= MAX_ID_MASK;
>>
>>	if (id >= (1 << n))
>>		return NULL;
>>
>>	while (n > 0 && p) {
>>		n -= IDR_BITS;
>>		p = rcu_dereference(p->ary[(id >> n) & IDR_MASK]);
>>	}
>>	return((void *)p);
>>}
>>EXPORT_SYMBOL(idr_find);
>>
>>/**
>> * idr_for_each - iterate through all stored pointers
>> * @idp: idr handle
>> * @fn: function to be called for each pointer
>> * @data: data passed back to callback function
>> *
>> * Iterate over the pointers registered with the given idr.  The
>> * callback function will be called for each pointer currently
>> * registered, passing the id, the pointer and the data pointer passed
>> * to this function.  It is not safe to modify the idr tree while in
>> * the callback, so functions such as idr_get_new and idr_remove are
>> * not allowed.
>> *
>> * We check the return of @fn each time. If it returns anything other
>> * than 0, we break out and return that value.
>> *
>> * The caller must serialize idr_for_each() vs idr_get_new() and idr_remove().
>> */
>>int idr_for_each(struct idr *idp,
>>		 int (*fn)(int id, void *p, void *data), void *data)
>>{
>>	int n, id, max, error = 0;
>>	struct idr_layer *p;
>>	struct idr_layer *pa[MAX_LEVEL];
>>	struct idr_layer **paa = &pa[0];
>>
>>	n = idp->layers * IDR_BITS;
>>	p = rcu_dereference(idp->top);
>>	max = 1 << n;
>>
>>	id = 0;
>>	while (id < max) {
>>		while (n > 0 && p) {
>>			n -= IDR_BITS;
>>			*paa++ = p;
>>			p = rcu_dereference(p->ary[(id >> n) & IDR_MASK]);
>>		}
>>
>>		if (p) {
>>			error = fn(id, (void *)p, data);
>>			if (error)
>>				break;
>>		}
>>
>>		id += 1 << n;
>>		while (n < fls(id)) {
>>			n += IDR_BITS;
>>			p = *--paa;
>>		}
>>	}
>>
>>	return error;
>>}
>>EXPORT_SYMBOL(idr_for_each);
>>
>>/**
>> * idr_replace - replace pointer for given id
>> * @idp: idr handle
>> * @ptr: pointer you want associated with the id
>> * @id: lookup key
>> *
>> * Replace the pointer registered with an id and return the old value.
>> * A -ENOENT return indicates that @id was not found.
>> * A -EINVAL return indicates that @id was not within valid constraints.
>> *
>> * The caller must serialize with writers.
>> */
>>void *idr_replace(struct idr *idp, void *ptr, int id)
>>{
>>	int n;
>>	struct idr_layer *p, *old_p;
>>
>>	n = idp->layers * IDR_BITS;
>>	p = idp->top;
> 
> 
> OK, the caller presumably holds an update-side lock.
> 
> 
>>	id &= MAX_ID_MASK;
>>
>>	if (id >= (1 << n))
>>		return ERR_PTR(-EINVAL);
>>
>>	n -= IDR_BITS;
>>	while ((n > 0) && p) {
>>		p = p->ary[(id >> n) & IDR_MASK];
> 
> 
> OK, the caller presumably holds the update-side lock.
> 
> 
>>		n -= IDR_BITS;
>>	}
>>
>>	n = id & IDR_MASK;
>>	if (unlikely(p == NULL || !test_bit(n, &p->bitmap)))
>>		return ERR_PTR(-ENOENT);
>>
>>	old_p = p->ary[n];
> 
> 
> OK, the caller presumably holds the update-side lock.
> 
> 
>>	rcu_assign_pointer(p->ary[n], ptr);
>>
>>	return old_p;
>>}
>>EXPORT_SYMBOL(idr_replace);
>>
>>static void idr_cache_ctor(struct kmem_cache *idr_layer_cache, void *idr_layer)
>>{
>>	memset(idr_layer, 0, sizeof(struct idr_layer));
>>}
>>
>>void __init idr_init_cache(void)
>>{
>>	idr_layer_cache = kmem_cache_create("idr_layer_cache",
>>				sizeof(struct idr_layer), 0, SLAB_PANIC,
>>				idr_cache_ctor);
>>}
>>
>>/**
>> * idr_init - initialize idr handle
>> * @idp:	idr handle
>> *
>> * This function is used to set up the handle (@idp) that you will pass
>> * to the rest of the functions.
>> */
>>void idr_init(struct idr *idp)
>>{
>>	memset(idp, 0, sizeof(struct idr));
>>	spin_lock_init(&idp->lock);
>>}
>>EXPORT_SYMBOL(idr_init);
>>
>>
>>/*
>> * IDA - IDR based ID allocator
>> *
>> * This is an id allocator without id -> pointer translation.  Memory
>> * usage is much lower than full blown idr because each id only
>> * occupies a bit.  ida uses a custom leaf node which contains
>> * IDA_BITMAP_BITS slots.
>> *
>> * 2007-04-25  written by Tejun Heo <htejun@gmail.com>
>> */
>>
>>static void free_bitmap(struct ida *ida, struct ida_bitmap *bitmap)
>>{
>>	unsigned long flags;
>>
>>	if (!ida->free_bitmap) {
>>		spin_lock_irqsave(&ida->idr.lock, flags);
>>		if (!ida->free_bitmap) {
>>			ida->free_bitmap = bitmap;
>>			bitmap = NULL;
>>		}
>>		spin_unlock_irqrestore(&ida->idr.lock, flags);
>>	}
>>
>>	kfree(bitmap);
>>}
>>
>>/**
>> * ida_pre_get - reserve resources for ida allocation
>> * @ida:	ida handle
>> * @gfp_mask:	memory allocation flag
>> *
>> * This function should be called prior to locking and calling the
>> * following function.  It preallocates enough memory to satisfy the
>> * worst possible allocation.
>> *
>> * If the system is REALLY out of memory this function returns 0,
>> * otherwise 1.
>> */
>>int ida_pre_get(struct ida *ida, gfp_t gfp_mask)
>>{
>>	/* allocate idr_layers */
>>	if (!idr_pre_get(&ida->idr, gfp_mask))
>>		return 0;
>>
>>	/* allocate free_bitmap */
>>	if (!ida->free_bitmap) {
>>		struct ida_bitmap *bitmap;
>>
>>		bitmap = kmalloc(sizeof(struct ida_bitmap), gfp_mask);
>>		if (!bitmap)
>>			return 0;
>>
>>		free_bitmap(ida, bitmap);
>>	}
>>
>>	return 1;
>>}
>>EXPORT_SYMBOL(ida_pre_get);
>>
>>/**
>> * ida_get_new_above - allocate new ID above or equal to a start id
>> * @ida:	ida handle
>> * @starting_id:	id to start search at
>> * @p_id:	pointer to the allocated handle
>> *
>> * Allocate new ID above or equal to @starting_id.  It should be called with
>> * any required locks.
>> *
>> * If memory is required, it will return -EAGAIN, you should unlock
>> * and go back to the ida_pre_get() call.  If the ida is full, it will
>> * return -ENOSPC.
>> *
>> * @p_id returns a value in the range 0 ... 0x7fffffff.
>> */
>>int ida_get_new_above(struct ida *ida, int starting_id, int *p_id)
>>{
>>	struct idr_layer *pa[MAX_LEVEL];
>>	struct ida_bitmap *bitmap;
>>	unsigned long flags;
>>	int idr_id = starting_id / IDA_BITMAP_BITS;
>>	int offset = starting_id % IDA_BITMAP_BITS;
>>	int t, id;
>>
>> restart:
>>	/* get vacant slot */
>>	t = idr_get_empty_slot(&ida->idr, idr_id, pa);
>>	if (t < 0)
>>		return _idr_rc_to_errno(t);
>>
>>	if (t * IDA_BITMAP_BITS >= MAX_ID_BIT)
>>		return -ENOSPC;
>>
>>	if (t != idr_id)
>>		offset = 0;
>>	idr_id = t;
>>
>>	/* if bitmap isn't there, create a new one */
>>	bitmap = (void *)pa[0]->ary[idr_id & IDR_MASK];
> 
> 
> OK, the caller presumably holds the update-side lock.
> 
> 
>>	if (!bitmap) {
>>		spin_lock_irqsave(&ida->idr.lock, flags);
>>		bitmap = ida->free_bitmap;
>>		ida->free_bitmap = NULL;
>>		spin_unlock_irqrestore(&ida->idr.lock, flags);
>>
>>		if (!bitmap)
>>			return -EAGAIN;
>>
>>		memset(bitmap, 0, sizeof(struct ida_bitmap));
>>		rcu_assign_pointer(pa[0]->ary[idr_id & IDR_MASK],
>>				(void *)bitmap);
>>		pa[0]->count++;
>>	}
>>
>>	/* lookup for empty slot */
>>	t = find_next_zero_bit(bitmap->bitmap, IDA_BITMAP_BITS, offset);
>>	if (t == IDA_BITMAP_BITS) {
>>		/* no empty slot after offset, continue to the next chunk */
>>		idr_id++;
>>		offset = 0;
>>		goto restart;
>>	}
>>
>>	id = idr_id * IDA_BITMAP_BITS + t;
>>	if (id >= MAX_ID_BIT)
>>		return -ENOSPC;
>>
>>	__set_bit(t, bitmap->bitmap);
>>	if (++bitmap->nr_busy == IDA_BITMAP_BITS)
>>		idr_mark_full(pa, idr_id);
>>
>>	*p_id = id;
>>
>>	/* Each leaf node can handle nearly a thousand slots and the
>>	 * whole idea of ida is to have a small memory footprint.
>>	 * Throw away extra resources one by one after each successful
>>	 * allocation.
>>	 */
>>	if (ida->idr.id_free_cnt || ida->free_bitmap) {
>>		struct idr_layer *p = get_from_free_list(&ida->idr);
>>		if (p)
>>			kmem_cache_free(idr_layer_cache, p);
>>	}
>>
>>	return 0;
>>}
>>EXPORT_SYMBOL(ida_get_new_above);
>>
>>/**
>> * ida_get_new - allocate new ID
>> * @ida:	idr handle
>> * @p_id:	pointer to the allocated handle
>> *
>> * Allocate new ID.  It should be called with any required locks.
>> *
>> * If memory is required, it will return -EAGAIN, you should unlock
>> * and go back to the idr_pre_get() call.  If the idr is full, it will
>> * return -ENOSPC.
>> *
>> * @p_id returns a value in the range 0 ... 0x7fffffff.
>> */
>>int ida_get_new(struct ida *ida, int *p_id)
>>{
>>	return ida_get_new_above(ida, 0, p_id);
>>}
>>EXPORT_SYMBOL(ida_get_new);
>>
>>/**
>> * ida_remove - remove the given ID
>> * @ida:	ida handle
>> * @id:		ID to free
>> */
>>void ida_remove(struct ida *ida, int id)
>>{
>>	struct idr_layer *p = ida->idr.top;
>>	int shift = (ida->idr.layers - 1) * IDR_BITS;
>>	int idr_id = id / IDA_BITMAP_BITS;
>>	int offset = id % IDA_BITMAP_BITS;
>>	int n;
>>	struct ida_bitmap *bitmap;
>>
>>	/* clear full bits while looking up the leaf idr_layer */
>>	while ((shift > 0) && p) {
>>		n = (idr_id >> shift) & IDR_MASK;
>>		__clear_bit(n, &p->bitmap);
>>		p = p->ary[n];
> 
> 
> OK, the caller presumably holds the update-side lock.
> 
> 
>>		shift -= IDR_BITS;
>>	}
>>
>>	if (p == NULL)
>>		goto err;
>>
>>	n = idr_id & IDR_MASK;
>>	__clear_bit(n, &p->bitmap);
>>
>>	bitmap = (void *)p->ary[n];
> 
> 
> OK, the caller presumably holds the update-side lock.
> 
> 
>>	if (!test_bit(offset, bitmap->bitmap))
>>		goto err;
>>
>>	/* update bitmap and remove it if empty */
>>	__clear_bit(offset, bitmap->bitmap);
>>	if (--bitmap->nr_busy == 0) {
>>		__set_bit(n, &p->bitmap);	/* to please idr_remove() */
>>		idr_remove(&ida->idr, idr_id);
>>		free_bitmap(ida, bitmap);
>>	}
>>
>>	return;
>>
>> err:
>>	printk(KERN_WARNING
>>	       "ida_remove called for id=%d which is not allocated.\n", id);
>>}
>>EXPORT_SYMBOL(ida_remove);
>>
>>/**
>> * ida_destroy - release all cached layers within an ida tree
>> * @ida:		ida handle
>> */
>>void ida_destroy(struct ida *ida)
>>{
>>	idr_destroy(&ida->idr);
>>	kfree(ida->free_bitmap);
>>}
>>EXPORT_SYMBOL(ida_destroy);
>>
>>/**
>> * ida_init - initialize ida handle
>> * @ida:	ida handle
>> *
>> * This function is used to set up the handle (@ida) that you will pass
>> * to the rest of the functions.
>> */
>>void ida_init(struct ida *ida)
>>{
>>	memset(ida, 0, sizeof(struct ida));
>>	idr_init(&ida->idr);
>>
>>}
>>EXPORT_SYMBOL(ida_init);
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
> 
> 




end of thread, other threads:[~2008-06-02  5:52 UTC | newest]

Thread overview: 39+ messages
2008-05-07 11:35 [PATCH 0/9] Scalability requirements for sysv ipc - v3 Nadia.Derbey
2008-05-07 11:35 ` [PATCH 1/9] Change the idr structure Nadia.Derbey
2008-05-08 17:12   ` Rik van Riel
2008-05-30  8:22   ` Paul E. McKenney
2008-05-07 11:35 ` [PATCH 2/9] Rename some of the idr APIs internal routines Nadia.Derbey
2008-05-08 17:15   ` Rik van Riel
2008-05-30  8:23   ` Paul E. McKenney
2008-05-07 11:35 ` [PATCH 3/9] Fix a printk call Nadia.Derbey
2008-05-08 17:43   ` Rik van Riel
2008-05-30  8:23   ` Paul E. McKenney
2008-05-07 11:35 ` [PATCH 4/9] Error checking factorization Nadia.Derbey
2008-05-08 17:45   ` Rik van Riel
2008-05-07 11:35 ` [PATCH 5/9] Make idr_get_new* rcu-safe Nadia.Derbey
2008-05-08 17:55   ` Rik van Riel
2008-05-30  8:23   ` Paul E. McKenney
2008-05-07 11:35 ` [PATCH 6/9] Make idr_find rcu-safe Nadia.Derbey
2008-05-08 17:58   ` Rik van Riel
2008-05-30  8:24   ` Paul E. McKenney
2008-05-07 11:36 ` [PATCH 7/9] Make idr_remove rcu-safe Nadia.Derbey
2008-05-08 18:02   ` Rik van Riel
2008-05-14 19:59   ` Tim Pepper
2008-05-15  7:40     ` Nadia Derbey
2008-05-20  5:29       ` Tim Pepper
2008-05-20  5:35         ` Tim Pepper
2008-05-20  7:03         ` Nadia Derbey
2008-05-20 16:26           ` Tim Pepper
2008-05-30  8:24   ` Paul E. McKenney
2008-05-07 11:36 ` [PATCH 8/9] Call idr_find() without locking in ipc_lock() Nadia.Derbey
2008-05-08 18:11   ` Rik van Riel
2008-05-30  8:27   ` Paul E. McKenney
2008-05-07 11:36 ` [PATCH 9/9] Get rid of ipc_lock_down() Nadia.Derbey
2008-05-08 18:13   ` Rik van Riel
2008-05-30  8:29   ` Paul E. McKenney
2008-05-07 11:41 ` [PATCH 0/9] Scalability requirements for sysv ipc - v3 Nadia Derbey
2008-05-07 13:19 ` KOSAKI Motohiro
2008-05-13 14:10   ` Nadia Derbey
2008-05-14  4:22     ` KOSAKI Motohiro
2008-05-30  8:22 ` Paul E. McKenney
2008-06-02  5:53   ` Nadia Derbey
