* [PATCH 1/8] urcu-qsbr: improve 2-phase wait scheme
From: Mathieu Desnoyers @ 2012-11-17 16:16 UTC (permalink / raw)
  To: lttng-dev, rp, Paul E. McKenney, Lai Jiangshan, Alan Stern
  Cc: Mathieu Desnoyers

In the single-bit, 2-phase grace period scheme, all we need to do is
observe each reader going through a quiescent state while we are in the
grace period.

We therefore only need to perform one global counter update, surrounded
by two iterations over the readers to observe the change in their
snapshot.

This lets us remove the first counter update (prior to the first
iteration over the readers): it was useless and only slowed down the
grace period.

Suggested-by: Alan Stern <stern@rowland.harvard.edu>
CC: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
CC: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
---
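A condensed view of the control flow this patch establishes (sketch
only: locking, the empty-registry check and the 64-bit variant are
elided; names as in urcu-qsbr.c):

	/*
	 * Phase 1: each reader must be observed either quiescent
	 * (offline) or holding a snapshot of the current rcu_gp_ctr.
	 */
	wait_for_readers();

	cmm_barrier();	/* order phase-1 loads before the flip */
	cmm_smp_mb();

	/* Single counter update for the whole grace period. */
	CMM_STORE_SHARED(rcu_gp_ctr, rcu_gp_ctr ^ RCU_GP_CTR);

	cmm_barrier();	/* order the flip before phase-2 loads */
	cmm_smp_mb();

	/*
	 * Phase 2: each reader must be observed either quiescent or
	 * holding a snapshot of the new rcu_gp_ctr value.
	 */
	wait_for_readers();
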
 urcu-qsbr.c |   92 ++++++++++++++++++++++++++++++++++++-----------------------
 1 file changed, 56 insertions(+), 36 deletions(-)

diff --git a/urcu-qsbr.c b/urcu-qsbr.c
index c6a1b18..eb167d1 100644
--- a/urcu-qsbr.c
+++ b/urcu-qsbr.c
@@ -116,38 +116,16 @@ static void wait_gp(void)
 		      NULL, NULL, 0);
 }
 
-static void update_counter_and_wait(void)
+static void wait_for_readers(void)
 {
 	CDS_LIST_HEAD(qsreaders);
 	int wait_loops = 0;
 	struct rcu_reader *index, *tmp;
 
-#if (CAA_BITS_PER_LONG < 64)
-	/* Switch parity: 0 -> 1, 1 -> 0 */
-	CMM_STORE_SHARED(rcu_gp_ctr, rcu_gp_ctr ^ RCU_GP_CTR);
-#else	/* !(CAA_BITS_PER_LONG < 64) */
-	/* Increment current G.P. */
-	CMM_STORE_SHARED(rcu_gp_ctr, rcu_gp_ctr + RCU_GP_CTR);
-#endif	/* !(CAA_BITS_PER_LONG < 64) */
-
-	/*
-	 * Must commit rcu_gp_ctr update to memory before waiting for
-	 * quiescent state. Failure to do so could result in the writer
-	 * waiting forever while new readers are always accessing data
-	 * (no progress). Enforce compiler-order of store to rcu_gp_ctr
-	 * before load URCU_TLS(rcu_reader).ctr.
-	 */
-	cmm_barrier();
-
-	/*
-	 * Adding a cmm_smp_mb() which is _not_ formally required, but makes the
-	 * model easier to understand. It does not have a big performance impact
-	 * anyway, given this is the write-side.
-	 */
-	cmm_smp_mb();
-
 	/*
-	 * Wait for each thread rcu_reader_qs_gp count to become 0.
+	 * Wait for each thread URCU_TLS(rcu_reader).ctr to either
+	 * indicate quiescence (offline), or for them to observe the
+	 * current rcu_gp_ctr value.
 	 */
 	for (;;) {
 		wait_loops++;
@@ -223,17 +201,17 @@ void synchronize_rcu(void)
 		goto out;
 
 	/*
-	 * Wait for previous parity to be empty of readers.
+	 * Wait for readers to observe original parity or be quiescent.
 	 */
-	update_counter_and_wait();	/* 0 -> 1, wait readers in parity 0 */
+	wait_for_readers();
 
 	/*
-	 * Must finish waiting for quiescent state for parity 0 before
-	 * committing next rcu_gp_ctr update to memory. Failure to
-	 * do so could result in the writer waiting forever while new
+	 * Must finish waiting for quiescent state for original parity
+	 * before committing next rcu_gp_ctr update to memory. Failure
+	 * to do so could result in the writer waiting forever while new
 	 * readers are always accessing data (no progress).  Enforce
-	 * compiler-order of load URCU_TLS(rcu_reader).ctr before store to
-	 * rcu_gp_ctr.
+	 * compiler-order of load URCU_TLS(rcu_reader).ctr before store
+	 * to rcu_gp_ctr.
 	 */
 	cmm_barrier();
 
@@ -244,10 +222,29 @@ void synchronize_rcu(void)
 	 */
 	cmm_smp_mb();
 
+	/* Switch parity: 0 -> 1, 1 -> 0 */
+	CMM_STORE_SHARED(rcu_gp_ctr, rcu_gp_ctr ^ RCU_GP_CTR);
+
 	/*
-	 * Wait for previous parity to be empty of readers.
+	 * Must commit rcu_gp_ctr update to memory before waiting for
+	 * quiescent state. Failure to do so could result in the writer
+	 * waiting forever while new readers are always accessing data
+	 * (no progress). Enforce compiler-order of store to rcu_gp_ctr
+	 * before load URCU_TLS(rcu_reader).ctr.
 	 */
-	update_counter_and_wait();	/* 1 -> 0, wait readers in parity 1 */
+	cmm_barrier();
+
+	/*
+	 * Adding a cmm_smp_mb() which is _not_ formally required, but makes the
+	 * model easier to understand. It does not have a big performance impact
+	 * anyway, given this is the write-side.
+	 */
+	cmm_smp_mb();
+
+	/*
+	 * Wait for readers to observe new parity or be quiescent.
+	 */
+	wait_for_readers();
 out:
 	mutex_unlock(&rcu_gp_lock);
 
@@ -280,7 +277,30 @@ void synchronize_rcu(void)
 	mutex_lock(&rcu_gp_lock);
 	if (cds_list_empty(&registry))
 		goto out;
-	update_counter_and_wait();
+
+	/* Increment current G.P. */
+	CMM_STORE_SHARED(rcu_gp_ctr, rcu_gp_ctr + RCU_GP_CTR);
+
+	/*
+	 * Must commit rcu_gp_ctr update to memory before waiting for
+	 * quiescent state. Failure to do so could result in the writer
+	 * waiting forever while new readers are always accessing data
+	 * (no progress). Enforce compiler-order of store to rcu_gp_ctr
+	 * before load URCU_TLS(rcu_reader).ctr.
+	 */
+	cmm_barrier();
+
+	/*
+	 * Adding a cmm_smp_mb() which is _not_ formally required, but makes the
+	 * model easier to understand. It does not have a big performance impact
+	 * anyway, given this is the write-side.
+	 */
+	cmm_smp_mb();
+
+	/*
+	 * Wait for readers to observe new count or be quiescent.
+	 */
+	wait_for_readers();
 out:
 	mutex_unlock(&rcu_gp_lock);
 
-- 
1.7.10.4

* [PATCH 2/8] urcu-mb/signal/membarrier: improve 2-phase wait scheme
From: Mathieu Desnoyers @ 2012-11-17 16:16 UTC (permalink / raw)
  To: lttng-dev, rp, Paul E. McKenney, Lai Jiangshan, Alan Stern
  Cc: Mathieu Desnoyers

In the single-bit, 2-phase grace period scheme, all we need to do is
observe each reader going through a quiescent state while we are in the
grace period.

We therefore only need to perform one global counter update, surrounded
by two iterations over the readers to observe the change in their
snapshot.

This lets us remove the first counter update (prior to the first
iteration over the readers): it was useless and only slowed down the
grace period.

CC: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
CC: Lai Jiangshan <laijs@cn.fujitsu.com>
CC: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
---
 urcu.c |   54 ++++++++++++++++++++++++++++--------------------------
 1 file changed, 28 insertions(+), 26 deletions(-)

diff --git a/urcu.c b/urcu.c
index c421846..2d5c510 100644
--- a/urcu.c
+++ b/urcu.c
@@ -215,33 +215,16 @@ static void wait_gp(void)
 		      NULL, NULL, 0);
 }
 
-static void update_counter_and_wait(void)
+static void wait_for_readers(void)
 {
 	CDS_LIST_HEAD(qsreaders);
 	int wait_loops = 0;
 	struct rcu_reader *index, *tmp;
 
-	/* Switch parity: 0 -> 1, 1 -> 0 */
-	CMM_STORE_SHARED(rcu_gp_ctr, rcu_gp_ctr ^ RCU_GP_CTR_PHASE);
-
-	/*
-	 * Must commit rcu_gp_ctr update to memory before waiting for quiescent
-	 * state. Failure to do so could result in the writer waiting forever
-	 * while new readers are always accessing data (no progress). Enforce
-	 * compiler-order of store to rcu_gp_ctr before load rcu_reader ctr.
-	 */
-	cmm_barrier();
-
-	/*
-	 *
-	 * Adding a cmm_smp_mb() which is _not_ formally required, but makes the
-	 * model easier to understand. It does not have a big performance impact
-	 * anyway, given this is the write-side.
-	 */
-	cmm_smp_mb();
-
 	/*
-	 * Wait for each thread URCU_TLS(rcu_reader).ctr count to become 0.
+	 * Wait for each thread URCU_TLS(rcu_reader).ctr to either
+	 * indicate quiescence (not nested), or observe the current
+	 * rcu_gp_ctr value.
 	 */
 	for (;;) {
 		wait_loops++;
@@ -316,12 +299,12 @@ void synchronize_rcu(void)
 	smp_mb_master(RCU_MB_GROUP);
 
 	/*
-	 * Wait for previous parity to be empty of readers.
+	 * Wait for readers to observe original parity or be quiescent.
 	 */
-	update_counter_and_wait();	/* 0 -> 1, wait readers in parity 0 */
+	wait_for_readers();
 
 	/*
-	 * Must finish waiting for quiescent state for parity 0 before
+	 * Must finish waiting for quiescent state for original parity before
 	 * committing next rcu_gp_ctr update to memory. Failure to do so could
 	 * result in the writer waiting forever while new readers are always
 	 * accessing data (no progress).  Enforce compiler-order of load
@@ -336,10 +319,29 @@ void synchronize_rcu(void)
 	 */
 	cmm_smp_mb();
 
+	/* Switch parity: 0 -> 1, 1 -> 0 */
+	CMM_STORE_SHARED(rcu_gp_ctr, rcu_gp_ctr ^ RCU_GP_CTR_PHASE);
+
+	/*
+	 * Must commit rcu_gp_ctr update to memory before waiting for quiescent
+	 * state. Failure to do so could result in the writer waiting forever
+	 * while new readers are always accessing data (no progress). Enforce
+	 * compiler-order of store to rcu_gp_ctr before load rcu_reader ctr.
+	 */
+	cmm_barrier();
+
+	/*
+	 *
+	 * Adding a cmm_smp_mb() which is _not_ formally required, but makes the
+	 * model easier to understand. It does not have a big performance impact
+	 * anyway, given this is the write-side.
+	 */
+	cmm_smp_mb();
+
 	/*
-	 * Wait for previous parity to be empty of readers.
+	 * Wait for readers to observe new parity or be quiescent.
 	 */
-	update_counter_and_wait();	/* 1 -> 0, wait readers in parity 1 */
+	wait_for_readers();
 
 	/* Finish waiting for reader threads before letting the old ptr being
 	 * freed. Must be done within rcu_gp_lock because it iterates on reader
-- 
1.7.10.4

* [PATCH 3/8] urcu-bp: improve 2-phase wait scheme
From: Mathieu Desnoyers @ 2012-11-17 16:16 UTC (permalink / raw)
  To: lttng-dev, rp, Paul E. McKenney, Lai Jiangshan, Alan Stern
  Cc: Mathieu Desnoyers

In the single-bit, 2-phase grace period scheme, all we need to do is
observe each reader going through a quiescent state while we are in the
grace period.

We therefore only need to perform one global counter update, surrounded
by two iterations over the readers to observe the change in their
snapshot.

This lets us remove the first counter update (prior to the first
iteration over the readers): it was useless and only slowed down the
grace period.

CC: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
CC: Lai Jiangshan <laijs@cn.fujitsu.com>
CC: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
---
 urcu-bp.c |   48 +++++++++++++++++++++++++-----------------------
 1 file changed, 25 insertions(+), 23 deletions(-)

diff --git a/urcu-bp.c b/urcu-bp.c
index b07a1bb..4b3cf01 100644
--- a/urcu-bp.c
+++ b/urcu-bp.c
@@ -164,31 +164,16 @@ static void mutex_unlock(pthread_mutex_t *mutex)
 		urcu_die(ret);
 }
 
-static void update_counter_and_wait(void)
+static void wait_for_readers(void)
 {
 	CDS_LIST_HEAD(qsreaders);
 	int wait_loops = 0;
 	struct rcu_reader *index, *tmp;
 
-	/* Switch parity: 0 -> 1, 1 -> 0 */
-	CMM_STORE_SHARED(rcu_gp_ctr, rcu_gp_ctr ^ RCU_GP_CTR_PHASE);
-
-	/*
-	 * Must commit qparity update to memory before waiting for other parity
-	 * quiescent state. Failure to do so could result in the writer waiting
-	 * forever while new readers are always accessing data (no progress).
-	 * Ensured by CMM_STORE_SHARED and CMM_LOAD_SHARED.
-	 */
-
-	/*
-	 * Adding a cmm_smp_mb() which is _not_ formally required, but makes the
-	 * model easier to understand. It does not have a big performance impact
-	 * anyway, given this is the write-side.
-	 */
-	cmm_smp_mb();
-
 	/*
-	 * Wait for each thread rcu_reader.ctr count to become 0.
+	 * Wait for each thread URCU_TLS(rcu_reader).ctr to either
+	 * indicate quiescence (not nested), or observe the current
+	 * rcu_gp_ctr value.
 	 */
 	for (;;) {
 		wait_loops++;
@@ -234,9 +219,26 @@ void synchronize_rcu(void)
 	rcu_gc_registry();
 
 	/*
-	 * Wait for previous parity to be empty of readers.
+	 * Wait for readers to observe original parity or be quiescent.
+	 */
+	wait_for_readers();
+
+	/*
+	 * Adding a cmm_smp_mb() which is _not_ formally required, but makes the
+	 * model easier to understand. It does not have a big performance impact
+	 * anyway, given this is the write-side.
+	 */
+	cmm_smp_mb();
+
+	/* Switch parity: 0 -> 1, 1 -> 0 */
+	CMM_STORE_SHARED(rcu_gp_ctr, rcu_gp_ctr ^ RCU_GP_CTR_PHASE);
+
+	/*
+	 * Must commit qparity update to memory before waiting for other parity
+	 * quiescent state. Failure to do so could result in the writer waiting
+	 * forever while new readers are always accessing data (no progress).
+	 * Ensured by CMM_STORE_SHARED and CMM_LOAD_SHARED.
 	 */
-	update_counter_and_wait();	/* 0 -> 1, wait readers in parity 0 */
 
 	/*
 	 * Adding a cmm_smp_mb() which is _not_ formally required, but makes the
@@ -246,9 +248,9 @@ void synchronize_rcu(void)
 	cmm_smp_mb();
 
 	/*
-	 * Wait for previous parity to be empty of readers.
+	 * Wait for readers to observe new parity or be quiescent.
 	 */
-	update_counter_and_wait();	/* 1 -> 0, wait readers in parity 1 */
+	wait_for_readers();
 
 	/*
 	 * Finish waiting for reader threads before letting the old ptr being
-- 
1.7.10.4

* [PATCH 4/8] urcu-qsbr: move offline threads to separate list
From: Mathieu Desnoyers @ 2012-11-17 16:16 UTC (permalink / raw)
  To: lttng-dev, rp, Paul E. McKenney, Lai Jiangshan, Alan Stern
  Cc: Mathieu Desnoyers

Accelerate the 2-phase grace period by not having to iterate twice over
offline threads.

CC: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
CC: Lai Jiangshan <laijs@cn.fujitsu.com>
CC: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
---
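How the three lists flow through a grace period (sketch: barriers,
locking and futex wake-ups elided; names as in the patch):

	CDS_LIST_HEAD(cur_snap_readers);
	CDS_LIST_HEAD(qsreaders);

	/*
	 * Phase 1: offline readers go straight to qsreaders and are
	 * never revisited; readers already on the current snapshot are
	 * parked in cur_snap_readers; readers still on the old
	 * snapshot stay in the input list and get polled again.
	 */
	wait_for_readers(&registry, &cur_snap_readers, &qsreaders);

	/* Switch parity: 0 -> 1, 1 -> 0 */
	CMM_STORE_SHARED(rcu_gp_ctr, rcu_gp_ctr ^ RCU_GP_CTR);

	/*
	 * Phase 2: only the parked readers still need to observe the
	 * new parity or become quiescent.
	 */
	wait_for_readers(&cur_snap_readers, NULL, &qsreaders);

	/* Every reader now sits in qsreaders; rebuild the registry. */
	cds_list_splice(&qsreaders, &registry);
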
 urcu-qsbr.c             |   54 ++++++++++++++++++++++++++++++++++++-----------
 urcu/static/urcu-qsbr.h |   14 ++++++++++--
 2 files changed, 54 insertions(+), 14 deletions(-)

diff --git a/urcu-qsbr.c b/urcu-qsbr.c
index eb167d1..5b341b5 100644
--- a/urcu-qsbr.c
+++ b/urcu-qsbr.c
@@ -116,9 +116,10 @@ static void wait_gp(void)
 		      NULL, NULL, 0);
 }
 
-static void wait_for_readers(void)
+static void wait_for_readers(struct cds_list_head *input_readers,
+			struct cds_list_head *cur_snap_readers,
+			struct cds_list_head *qsreaders)
 {
-	CDS_LIST_HEAD(qsreaders);
 	int wait_loops = 0;
 	struct rcu_reader *index, *tmp;
 
@@ -136,18 +137,36 @@ static void wait_for_readers(void)
 			 * reads them in the opposite order).
 			 */
 			cmm_smp_wmb();
-			cds_list_for_each_entry(index, &registry, node) {
+			cds_list_for_each_entry(index, input_readers, node) {
 				_CMM_STORE_SHARED(index->waiting, 1);
 			}
 			/* Write futex before read reader_gp */
 			cmm_smp_mb();
 		}
-		cds_list_for_each_entry_safe(index, tmp, &registry, node) {
-			if (!rcu_gp_ongoing(&index->ctr))
-				cds_list_move(&index->node, &qsreaders);
+		cds_list_for_each_entry_safe(index, tmp, input_readers, node) {
+			switch (rcu_reader_state(&index->ctr)) {
+			case RCU_READER_ACTIVE_CURRENT:
+				if (cur_snap_readers) {
+					cds_list_move(&index->node,
+						cur_snap_readers);
+					break;
+				}
+				/* Fall-through */
+			case RCU_READER_INACTIVE:
+				cds_list_move(&index->node, qsreaders);
+				break;
+			case RCU_READER_ACTIVE_OLD:
+				/*
+				 * Old snapshot. Leaving node in
+				 * input_readers will make us busy-loop
+				 * until the snapshot becomes current or
+				 * the reader becomes inactive.
+				 */
+				break;
+			}
 		}
 
-		if (cds_list_empty(&registry)) {
+		if (cds_list_empty(input_readers)) {
 			if (wait_loops >= RCU_QS_ACTIVE_ATTEMPTS) {
 				/* Read reader_gp before write futex */
 				cmm_smp_mb();
@@ -166,8 +185,6 @@ static void wait_for_readers(void)
 			}
 		}
 	}
-	/* put back the reader list in the registry */
-	cds_list_splice(&qsreaders, &registry);
 }
 
 /*
@@ -178,6 +195,8 @@ static void wait_for_readers(void)
 #if (CAA_BITS_PER_LONG < 64)
 void synchronize_rcu(void)
 {
+	CDS_LIST_HEAD(cur_snap_readers);
+	CDS_LIST_HEAD(qsreaders);
 	unsigned long was_online;
 
 	was_online = URCU_TLS(rcu_reader).ctr;
@@ -203,7 +222,7 @@ void synchronize_rcu(void)
 	/*
 	 * Wait for readers to observe original parity or be quiescent.
 	 */
-	wait_for_readers();
+	wait_for_readers(&registry, &cur_snap_readers, &qsreaders);
 
 	/*
 	 * Must finish waiting for quiescent state for original parity
@@ -244,7 +263,12 @@ void synchronize_rcu(void)
 	/*
 	 * Wait for readers to observe new parity or be quiescent.
 	 */
-	wait_for_readers();
+	wait_for_readers(&cur_snap_readers, NULL, &qsreaders);
+
+	/*
+	 * Put quiescent reader list back into registry.
+	 */
+	cds_list_splice(&qsreaders, &registry);
 out:
 	mutex_unlock(&rcu_gp_lock);
 
@@ -260,6 +284,7 @@ out:
 #else /* !(CAA_BITS_PER_LONG < 64) */
 void synchronize_rcu(void)
 {
+	CDS_LIST_HEAD(qsreaders);
 	unsigned long was_online;
 
 	was_online = URCU_TLS(rcu_reader).ctr;
@@ -300,7 +325,12 @@ void synchronize_rcu(void)
 	/*
 	 * Wait for readers to observe new count or be quiescent.
 	 */
-	wait_for_readers();
+	wait_for_readers(&registry, NULL, &qsreaders);
+
+	/*
+	 * Put quiescent reader list back into registry.
+	 */
+	cds_list_splice(&qsreaders, &registry);
 out:
 	mutex_unlock(&rcu_gp_lock);
 
diff --git a/urcu/static/urcu-qsbr.h b/urcu/static/urcu-qsbr.h
index f314956..f6e5580 100644
--- a/urcu/static/urcu-qsbr.h
+++ b/urcu/static/urcu-qsbr.h
@@ -62,6 +62,12 @@ extern "C" {
 #define rcu_assert(args...)
 #endif
 
+enum rcu_state {
+	RCU_READER_ACTIVE_CURRENT,
+	RCU_READER_ACTIVE_OLD,
+	RCU_READER_INACTIVE,
+};
+
 #ifdef DEBUG_YIELD
 #include <sched.h>
 #include <time.h>
@@ -149,12 +155,16 @@ static inline void wake_up_gp(void)
 	}
 }
 
-static inline int rcu_gp_ongoing(unsigned long *ctr)
+static inline enum rcu_state rcu_reader_state(unsigned long *ctr)
 {
 	unsigned long v;
 
 	v = CMM_LOAD_SHARED(*ctr);
-	return v && (v != rcu_gp_ctr);
+	if (!v)
+		return RCU_READER_INACTIVE;
+	if (v == rcu_gp_ctr)
+		return RCU_READER_ACTIVE_CURRENT;
+	return RCU_READER_ACTIVE_OLD;
 }
 
 /*
-- 
1.7.10.4

* [PATCH 5/8] urcu-mb/signal/membarrier: move quiescent threads to separate list
From: Mathieu Desnoyers @ 2012-11-17 16:16 UTC (permalink / raw)
  To: lttng-dev, rp, Paul E. McKenney, Lai Jiangshan, Alan Stern
  Cc: Mathieu Desnoyers

Accelerate the 2-phase grace period by not having to iterate twice over
threads not nested within an RCU read-side lock.

CC: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
CC: Lai Jiangshan <laijs@cn.fujitsu.com>
CC: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
---
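The classification can be pictured with a self-contained toy model
(illustration only: the mask values below are made up, and the real
implementation in urcu/static/urcu.h loads the reader counter with
CMM_LOAD_SHARED):

	#define GP_CTR_PHASE		(1UL << 16)		/* made-up value */
	#define GP_CTR_NEST_MASK	((1UL << 16) - 1)	/* made-up value */

	enum rcu_state {
		RCU_READER_ACTIVE_CURRENT,
		RCU_READER_ACTIVE_OLD,
		RCU_READER_INACTIVE,
	};

	static enum rcu_state classify(unsigned long reader_ctr,
			unsigned long gp_ctr)
	{
		if (!(reader_ctr & GP_CTR_NEST_MASK))
			return RCU_READER_INACTIVE;	/* not in a read-side c.s. */
		if (!((reader_ctr ^ gp_ctr) & GP_CTR_PHASE))
			return RCU_READER_ACTIVE_CURRENT; /* sees current parity */
		return RCU_READER_ACTIVE_OLD;	/* must still be waited for */
	}
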
 urcu.c             |   47 ++++++++++++++++++++++++++++++++++++-----------
 urcu/static/urcu.h |   15 ++++++++++++---
 2 files changed, 48 insertions(+), 14 deletions(-)

diff --git a/urcu.c b/urcu.c
index 2d5c510..c692db9 100644
--- a/urcu.c
+++ b/urcu.c
@@ -215,9 +215,10 @@ static void wait_gp(void)
 		      NULL, NULL, 0);
 }
 
-static void wait_for_readers(void)
+static void wait_for_readers(struct cds_list_head *input_readers,
+			struct cds_list_head *cur_snap_readers,
+			struct cds_list_head *qsreaders)
 {
-	CDS_LIST_HEAD(qsreaders);
 	int wait_loops = 0;
 	struct rcu_reader *index, *tmp;
 
@@ -234,13 +235,31 @@ static void wait_for_readers(void)
 			smp_mb_master(RCU_MB_GROUP);
 		}
 
-		cds_list_for_each_entry_safe(index, tmp, &registry, node) {
-			if (!rcu_gp_ongoing(&index->ctr))
-				cds_list_move(&index->node, &qsreaders);
+		cds_list_for_each_entry_safe(index, tmp, input_readers, node) {
+			switch (rcu_reader_state(&index->ctr)) {
+			case RCU_READER_ACTIVE_CURRENT:
+				if (cur_snap_readers) {
+					cds_list_move(&index->node,
+						cur_snap_readers);
+					break;
+				}
+				/* Fall-through */
+			case RCU_READER_INACTIVE:
+				cds_list_move(&index->node, qsreaders);
+				break;
+			case RCU_READER_ACTIVE_OLD:
+				/*
+				 * Old snapshot. Leaving node in
+				 * input_readers will make us busy-loop
+				 * until the snapshot becomes current or
+				 * the reader becomes inactive.
+				 */
+				break;
+			}
 		}
 
 #ifndef HAS_INCOHERENT_CACHES
-		if (cds_list_empty(&registry)) {
+		if (cds_list_empty(input_readers)) {
 			if (wait_loops == RCU_QS_ACTIVE_ATTEMPTS) {
 				/* Read reader_gp before write futex */
 				smp_mb_master(RCU_MB_GROUP);
@@ -259,7 +278,7 @@ static void wait_for_readers(void)
 		 * URCU_TLS(rcu_reader).ctr update to memory if we wait
 		 * for too long.
 		 */
-		if (cds_list_empty(&registry)) {
+		if (cds_list_empty(input_readers)) {
 			if (wait_loops == RCU_QS_ACTIVE_ATTEMPTS) {
 				/* Read reader_gp before write futex */
 				smp_mb_master(RCU_MB_GROUP);
@@ -281,12 +300,13 @@ static void wait_for_readers(void)
 		}
 #endif /* #else #ifndef HAS_INCOHERENT_CACHES */
 	}
-	/* put back the reader list in the registry */
-	cds_list_splice(&qsreaders, &registry);
 }
 
 void synchronize_rcu(void)
 {
+	CDS_LIST_HEAD(cur_snap_readers);
+	CDS_LIST_HEAD(qsreaders);
+
 	mutex_lock(&rcu_gp_lock);
 
 	if (cds_list_empty(&registry))
@@ -301,7 +321,7 @@ void synchronize_rcu(void)
 	/*
 	 * Wait for readers to observe original parity or be quiescent.
 	 */
-	wait_for_readers();
+	wait_for_readers(&registry, &cur_snap_readers, &qsreaders);
 
 	/*
 	 * Must finish waiting for quiescent state for original parity before
@@ -341,7 +361,12 @@ void synchronize_rcu(void)
 	/*
 	 * Wait for readers to observe new parity or be quiescent.
 	 */
-	wait_for_readers();
+	wait_for_readers(&registry, NULL, &qsreaders);
+
+	/*
+	 * Put quiescent reader list back into registry.
+	 */
+	cds_list_splice(&qsreaders, &registry);
 
 	/* Finish waiting for reader threads before letting the old ptr being
 	 * freed. Must be done within rcu_gp_lock because it iterates on reader
diff --git a/urcu/static/urcu.h b/urcu/static/urcu.h
index f32f300..973826a 100644
--- a/urcu/static/urcu.h
+++ b/urcu/static/urcu.h
@@ -96,6 +96,12 @@ extern "C" {
 #define SIGRCU SIGUSR1
 #endif
 
+enum rcu_state {
+	RCU_READER_ACTIVE_CURRENT,
+	RCU_READER_ACTIVE_OLD,
+	RCU_READER_INACTIVE,
+};
+
 #ifdef DEBUG_RCU
 #define rcu_assert(args...)	assert(args)
 #else
@@ -239,7 +245,7 @@ static inline void wake_up_gp(void)
 	}
 }
 
-static inline int rcu_gp_ongoing(unsigned long *ctr)
+static inline enum rcu_state rcu_reader_state(unsigned long *ctr)
 {
 	unsigned long v;
 
@@ -248,8 +254,11 @@ static inline int rcu_gp_ongoing(unsigned long *ctr)
 	 * to insure consistency.
 	 */
 	v = CMM_LOAD_SHARED(*ctr);
-	return (v & RCU_GP_CTR_NEST_MASK) &&
-		 ((v ^ rcu_gp_ctr) & RCU_GP_CTR_PHASE);
+	if (!(v & RCU_GP_CTR_NEST_MASK))
+		return RCU_READER_INACTIVE;
+	if (!((v ^ rcu_gp_ctr) & RCU_GP_CTR_PHASE))
+		return RCU_READER_ACTIVE_CURRENT;
+	return RCU_READER_ACTIVE_OLD;
 }
 
 /*
-- 
1.7.10.4

* [PATCH 6/8] urcu-bp: move quiescent threads to separate list
From: Mathieu Desnoyers @ 2012-11-17 16:16 UTC (permalink / raw)
  To: lttng-dev, rp, Paul E. McKenney, Lai Jiangshan, Alan Stern
  Cc: Mathieu Desnoyers

Accelerate the 2-phase grace period by not having to iterate twice over
threads not within an RCU read-side critical section.

CC: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
CC: Lai Jiangshan <laijs@cn.fujitsu.com>
CC: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
---
 urcu-bp.c             |   46 +++++++++++++++++++++++++++++++++++-----------
 urcu.c                |    2 +-
 urcu/static/urcu-bp.h |   29 +++++++++++++++++++----------
 3 files changed, 55 insertions(+), 22 deletions(-)

diff --git a/urcu-bp.c b/urcu-bp.c
index 4b3cf01..f99c0e5 100644
--- a/urcu-bp.c
+++ b/urcu-bp.c
@@ -115,7 +115,7 @@ DEFINE_URCU_TLS(unsigned int, rcu_rand_yield);
  * Also has a RCU_GP_COUNT of 1, to accelerate the reader fast path.
  * Written to only by writer with mutex taken. Read by both writer and readers.
  */
-long rcu_gp_ctr = RCU_GP_COUNT;
+unsigned long rcu_gp_ctr = RCU_GP_COUNT;
 
 /*
  * Pointer to registry elements. Written to only by each individual reader. Read
@@ -164,9 +164,10 @@ static void mutex_unlock(pthread_mutex_t *mutex)
 		urcu_die(ret);
 }
 
-static void wait_for_readers(void)
+static void wait_for_readers(struct cds_list_head *input_readers,
+			struct cds_list_head *cur_snap_readers,
+			struct cds_list_head *qsreaders)
 {
-	CDS_LIST_HEAD(qsreaders);
 	int wait_loops = 0;
 	struct rcu_reader *index, *tmp;
 
@@ -177,12 +178,30 @@ static void wait_for_readers(void)
 	 */
 	for (;;) {
 		wait_loops++;
-		cds_list_for_each_entry_safe(index, tmp, &registry, node) {
-			if (!rcu_old_gp_ongoing(&index->ctr))
-				cds_list_move(&index->node, &qsreaders);
+		cds_list_for_each_entry_safe(index, tmp, input_readers, node) {
+			switch (rcu_reader_state(&index->ctr)) {
+			case RCU_READER_ACTIVE_CURRENT:
+				if (cur_snap_readers) {
+					cds_list_move(&index->node,
+						cur_snap_readers);
+					break;
+				}
+				/* Fall-through */
+			case RCU_READER_INACTIVE:
+				cds_list_move(&index->node, qsreaders);
+				break;
+			case RCU_READER_ACTIVE_OLD:
+				/*
+				 * Old snapshot. Leaving node in
+				 * input_readers will make us busy-loop
+				 * until the snapshot becomes current or
+				 * the reader becomes inactive.
+				 */
+				break;
+			}
 		}
 
-		if (cds_list_empty(&registry)) {
+		if (cds_list_empty(input_readers)) {
 			break;
 		} else {
 			if (wait_loops == RCU_QS_ACTIVE_ATTEMPTS)
@@ -191,12 +210,12 @@ static void wait_for_readers(void)
 				caa_cpu_relax();
 		}
 	}
-	/* put back the reader list in the registry */
-	cds_list_splice(&qsreaders, &registry);
 }
 
 void synchronize_rcu(void)
 {
+	CDS_LIST_HEAD(cur_snap_readers);
+	CDS_LIST_HEAD(qsreaders);
 	sigset_t newmask, oldmask;
 	int ret;
 
@@ -221,7 +240,7 @@ void synchronize_rcu(void)
 	/*
 	 * Wait for readers to observe original parity or be quiescent.
 	 */
-	wait_for_readers();
+	wait_for_readers(&registry, &cur_snap_readers, &qsreaders);
 
 	/*
 	 * Adding a cmm_smp_mb() which is _not_ formally required, but makes the
@@ -250,7 +269,12 @@ void synchronize_rcu(void)
 	/*
 	 * Wait for readers to observe new parity or be quiescent.
 	 */
-	wait_for_readers();
+	wait_for_readers(&cur_snap_readers, NULL, &qsreaders);
+
+	/*
+	 * Put quiescent reader list back into registry.
+	 */
+	cds_list_splice(&qsreaders, &registry);
 
 	/*
 	 * Finish waiting for reader threads before letting the old ptr being
diff --git a/urcu.c b/urcu.c
index c692db9..e6ff0f3 100644
--- a/urcu.c
+++ b/urcu.c
@@ -361,7 +361,7 @@ void synchronize_rcu(void)
 	/*
 	 * Wait for readers to observe new parity or be quiescent.
 	 */
-	wait_for_readers(&registry, NULL, &qsreaders);
+	wait_for_readers(&cur_snap_readers, NULL, &qsreaders);
 
 	/*
 	 * Put quiescent reader list back into registry.
diff --git a/urcu/static/urcu-bp.h b/urcu/static/urcu-bp.h
index c52a688..c7f5326 100644
--- a/urcu/static/urcu-bp.h
+++ b/urcu/static/urcu-bp.h
@@ -58,6 +58,12 @@ extern "C" {
 #define rcu_assert(args...)
 #endif
 
+enum rcu_state {
+	RCU_READER_ACTIVE_CURRENT,
+	RCU_READER_ACTIVE_OLD,
+	RCU_READER_INACTIVE,
+};
+
 #ifdef DEBUG_YIELD
 #include <sched.h>
 #include <time.h>
@@ -129,11 +135,11 @@ extern void rcu_bp_register(void);
  * Using an int rather than a char to eliminate false register dependencies
  * causing stalls on some architectures.
  */
-extern long rcu_gp_ctr;
+extern unsigned long rcu_gp_ctr;
 
 struct rcu_reader {
 	/* Data used by both reader and synchronize_rcu() */
-	long ctr;
+	unsigned long ctr;
 	/* Data used for registry */
 	struct cds_list_head node __attribute__((aligned(CAA_CACHE_LINE_SIZE)));
 	pthread_t tid;
@@ -147,19 +153,22 @@ struct rcu_reader {
  */
 extern DECLARE_URCU_TLS(struct rcu_reader *, rcu_reader);
 
-static inline int rcu_old_gp_ongoing(long *value)
+static inline enum rcu_state rcu_reader_state(unsigned long *ctr)
 {
-	long v;
+	unsigned long v;
 
-	if (value == NULL)
-		return 0;
+	if (ctr == NULL)
+		return RCU_READER_INACTIVE;
 	/*
 	 * Make sure both tests below are done on the same version of *value
 	 * to insure consistency.
 	 */
-	v = CMM_LOAD_SHARED(*value);
-	return (v & RCU_GP_CTR_NEST_MASK) &&
-		 ((v ^ rcu_gp_ctr) & RCU_GP_CTR_PHASE);
+	v = CMM_LOAD_SHARED(*ctr);
+	if (!(v & RCU_GP_CTR_NEST_MASK))
+		return RCU_READER_INACTIVE;
+	if (!((v ^ rcu_gp_ctr) & RCU_GP_CTR_PHASE))
+		return RCU_READER_ACTIVE_CURRENT;
+	return RCU_READER_ACTIVE_OLD;
 }
 
 /*
@@ -190,7 +199,7 @@ static inline void _rcu_read_lock_update(unsigned long tmp)
  */
 static inline void _rcu_read_lock(void)
 {
-	long tmp;
+	unsigned long tmp;
 
 	if (caa_unlikely(!URCU_TLS(rcu_reader)))
 		rcu_bp_register(); /* If not yet registered. */
-- 
1.7.10.4

* [PATCH 7/8] tests: use standard malloc/free for synchronize_rcu()
From: Mathieu Desnoyers @ 2012-11-17 16:16 UTC (permalink / raw)
  To: lttng-dev, rp, Paul E. McKenney, Lai Jiangshan, Alan Stern
  Cc: Mathieu Desnoyers

Allows removing the mutex from the tests, which allows testing the
scalability of concurrent synchronize_rcu() executions.

CC: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
CC: Lai Jiangshan <laijs@cn.fujitsu.com>
CC: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
---
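The update-side pattern the tests now exercise directly (sketch of one
writer iteration; identifiers as in the patch):

	int *new, *old;

	new = malloc(sizeof(*new));
	assert(new);
	*new = 8;
	old = rcu_xchg_pointer(&test_rcu_pointer, new);
	synchronize_rcu();	/* no reader can still hold "old" past here */
	if (old)
		*old = 0;
	free(old);	/* plain free() is safe: no poisoning, no mutex */
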
 tests/test_urcu.c      |   60 ++++++++---------------------------------------
 tests/test_urcu_bp.c   |   59 +++++++---------------------------------------
 tests/test_urcu_qsbr.c |   61 ++++++++----------------------------------------
 3 files changed, 29 insertions(+), 151 deletions(-)

diff --git a/tests/test_urcu.c b/tests/test_urcu.c
index 6463d67..87972aa 100644
--- a/tests/test_urcu.c
+++ b/tests/test_urcu.c
@@ -66,15 +66,11 @@ static inline pid_t gettid(void)
 #endif
 #include <urcu.h>
 
-struct test_array {
-	int a;
-};
-
 static volatile int test_go, test_stop;
 
 static unsigned long wdelay;
 
-static struct test_array *test_rcu_pointer;
+static int *test_rcu_pointer;
 
 static unsigned long duration;
 
@@ -185,43 +181,10 @@ void rcu_copy_mutex_unlock(void)
 	}
 }
 
-/*
- * malloc/free are reusing memory areas too quickly, which does not let us
- * test races appropriately. Use a large circular array for allocations.
- * ARRAY_SIZE is larger than nr_writers, and we keep the mutex across
- * both alloc and free, which insures we never run over our tail.
- */
-#define ARRAY_SIZE (1048576 * nr_writers)
-#define ARRAY_POISON 0xDEADBEEF
-static int array_index;
-static struct test_array *test_array;
-
-static struct test_array *test_array_alloc(void)
-{
-	struct test_array *ret;
-	int index;
-
-	index = array_index % ARRAY_SIZE;
-	assert(test_array[index].a == ARRAY_POISON ||
-		test_array[index].a == 0);
-	ret = &test_array[index];
-	array_index++;
-	if (array_index == ARRAY_SIZE)
-		array_index = 0;
-	return ret;
-}
-
-static void test_array_free(struct test_array *ptr)
-{
-	if (!ptr)
-		return;
-	ptr->a = ARRAY_POISON;
-}
-
 void *thr_reader(void *_count)
 {
 	unsigned long long *count = _count;
-	struct test_array *local_ptr;
+	int *local_ptr;
 
 	printf_verbose("thread_begin %s, thread id : %lx, tid %lu\n",
 			"reader", (unsigned long) pthread_self(),
@@ -241,7 +204,7 @@ void *thr_reader(void *_count)
 		local_ptr = rcu_dereference(test_rcu_pointer);
 		rcu_debug_yield_read();
 		if (local_ptr)
-			assert(local_ptr->a == 8);
+			assert(*local_ptr == 8);
 		if (caa_unlikely(rduration))
 			loop_sleep(rduration);
 		rcu_read_unlock();
@@ -267,7 +230,7 @@ void *thr_reader(void *_count)
 void *thr_writer(void *_count)
 {
 	unsigned long long *count = _count;
-	struct test_array *new, *old;
+	int *new, *old;
 
 	printf_verbose("thread_begin %s, thread id : %lx, tid %lu\n",
 			"writer", (unsigned long) pthread_self(),
@@ -281,17 +244,16 @@ void *thr_writer(void *_count)
 	cmm_smp_mb();
 
 	for (;;) {
-		rcu_copy_mutex_lock();
-		new = test_array_alloc();
-		new->a = 8;
+		new = malloc(sizeof(int));
+		assert(new);
+		*new = 8;
 		old = rcu_xchg_pointer(&test_rcu_pointer, new);
 		if (caa_unlikely(wduration))
 			loop_sleep(wduration);
 		synchronize_rcu();
 		if (old)
-			old->a = 0;
-		test_array_free(old);
-		rcu_copy_mutex_unlock();
+			*old = 0;
+		free(old);
 		URCU_TLS(nr_writes)++;
 		if (caa_unlikely(!test_duration_write()))
 			break;
@@ -409,7 +371,6 @@ int main(int argc, char **argv)
 			"main", (unsigned long) pthread_self(),
 			(unsigned long)gettid());
 
-	test_array = calloc(1, sizeof(*test_array) * ARRAY_SIZE);
 	tid_reader = malloc(sizeof(*tid_reader) * nr_readers);
 	tid_writer = malloc(sizeof(*tid_writer) * nr_writers);
 	count_reader = malloc(sizeof(*count_reader) * nr_readers);
@@ -459,8 +420,7 @@ int main(int argc, char **argv)
 		argv[0], duration, nr_readers, rduration, wduration,
 		nr_writers, wdelay, tot_reads, tot_writes,
 		tot_reads + tot_writes);
-	test_array_free(test_rcu_pointer);
-	free(test_array);
+	free(test_rcu_pointer);
 	free(tid_reader);
 	free(tid_writer);
 	free(count_reader);
diff --git a/tests/test_urcu_bp.c b/tests/test_urcu_bp.c
index 7902388..58afe90 100644
--- a/tests/test_urcu_bp.c
+++ b/tests/test_urcu_bp.c
@@ -66,15 +66,11 @@ static inline pid_t gettid(void)
 #endif
 #include <urcu-bp.h>
 
-struct test_array {
-	int a;
-};
-
 static volatile int test_go, test_stop;
 
 static unsigned long wdelay;
 
-static struct test_array *test_rcu_pointer;
+static int *test_rcu_pointer;
 
 static unsigned long duration;
 
@@ -185,43 +181,10 @@ void rcu_copy_mutex_unlock(void)
 	}
 }
 
-/*
- * malloc/free are reusing memory areas too quickly, which does not let us
- * test races appropriately. Use a large circular array for allocations.
- * ARRAY_SIZE is larger than nr_writers, and we keep the mutex across
- * both alloc and free, which insures we never run over our tail.
- */
-#define ARRAY_SIZE (1048576 * nr_writers)
-#define ARRAY_POISON 0xDEADBEEF
-static int array_index;
-static struct test_array *test_array;
-
-static struct test_array *test_array_alloc(void)
-{
-	struct test_array *ret;
-	int index;
-
-	index = array_index % ARRAY_SIZE;
-	assert(test_array[index].a == ARRAY_POISON ||
-		test_array[index].a == 0);
-	ret = &test_array[index];
-	array_index++;
-	if (array_index == ARRAY_SIZE)
-		array_index = 0;
-	return ret;
-}
-
-static void test_array_free(struct test_array *ptr)
-{
-	if (!ptr)
-		return;
-	ptr->a = ARRAY_POISON;
-}
-
 void *thr_reader(void *_count)
 {
 	unsigned long long *count = _count;
-	struct test_array *local_ptr;
+	int *local_ptr;
 
 	printf_verbose("thread_begin %s, thread id : %lx, tid %lu\n",
 			"reader", (unsigned long) pthread_self(),
@@ -241,7 +204,7 @@ void *thr_reader(void *_count)
 		local_ptr = rcu_dereference(test_rcu_pointer);
 		rcu_debug_yield_read();
 		if (local_ptr)
-			assert(local_ptr->a == 8);
+			assert(*local_ptr == 8);
 		if (caa_unlikely(rduration))
 			loop_sleep(rduration);
 		rcu_read_unlock();
@@ -263,7 +226,7 @@ void *thr_reader(void *_count)
 void *thr_writer(void *_count)
 {
 	unsigned long long *count = _count;
-	struct test_array *new, *old;
+	int *new, *old;
 
 	printf_verbose("thread_begin %s, thread id : %lx, tid %lu\n",
 			"writer", (unsigned long) pthread_self(),
@@ -277,17 +240,15 @@ void *thr_writer(void *_count)
 	cmm_smp_mb();
 
 	for (;;) {
-		rcu_copy_mutex_lock();
-		new = test_array_alloc();
-		new->a = 8;
+		new = malloc(sizeof(int));
+		*new = 8;
 		old = rcu_xchg_pointer(&test_rcu_pointer, new);
 		if (caa_unlikely(wduration))
 			loop_sleep(wduration);
 		synchronize_rcu();
 		if (old)
-			old->a = 0;
-		test_array_free(old);
-		rcu_copy_mutex_unlock();
+			*old = 0;
+		free(old);
 		URCU_TLS(nr_writes)++;
 		if (caa_unlikely(!test_duration_write()))
 			break;
@@ -405,7 +366,6 @@ int main(int argc, char **argv)
 			"main", (unsigned long) pthread_self(),
 			(unsigned long) gettid());
 
-	test_array = calloc(1, sizeof(*test_array) * ARRAY_SIZE);
 	tid_reader = malloc(sizeof(*tid_reader) * nr_readers);
 	tid_writer = malloc(sizeof(*tid_writer) * nr_writers);
 	count_reader = malloc(sizeof(*count_reader) * nr_readers);
@@ -455,8 +415,7 @@ int main(int argc, char **argv)
 		argv[0], duration, nr_readers, rduration, wduration,
 		nr_writers, wdelay, tot_reads, tot_writes,
 		tot_reads + tot_writes);
-	test_array_free(test_rcu_pointer);
-	free(test_array);
+	free(test_rcu_pointer);
 	free(tid_reader);
 	free(tid_writer);
 	free(count_reader);
diff --git a/tests/test_urcu_qsbr.c b/tests/test_urcu_qsbr.c
index da26b77..d5d893c 100644
--- a/tests/test_urcu_qsbr.c
+++ b/tests/test_urcu_qsbr.c
@@ -66,15 +66,11 @@ static inline pid_t gettid(void)
 #endif
 #include "urcu-qsbr.h"
 
-struct test_array {
-	int a;
-};
-
 static volatile int test_go, test_stop;
 
 static unsigned long wdelay;
 
-static struct test_array *test_rcu_pointer;
+static int *test_rcu_pointer;
 
 static unsigned long duration;
 
@@ -184,43 +180,10 @@ void rcu_copy_mutex_unlock(void)
 	}
 }
 
-/*
- * malloc/free are reusing memory areas too quickly, which does not let us
- * test races appropriately. Use a large circular array for allocations.
- * ARRAY_SIZE is larger than nr_writers, and we keep the mutex across
- * both alloc and free, which insures we never run over our tail.
- */
-#define ARRAY_SIZE (1048576 * nr_writers)
-#define ARRAY_POISON 0xDEADBEEF
-static int array_index;
-static struct test_array *test_array;
-
-static struct test_array *test_array_alloc(void)
-{
-	struct test_array *ret;
-	int index;
-
-	index = array_index % ARRAY_SIZE;
-	assert(test_array[index].a == ARRAY_POISON ||
-		test_array[index].a == 0);
-	ret = &test_array[index];
-	array_index++;
-	if (array_index == ARRAY_SIZE)
-		array_index = 0;
-	return ret;
-}
-
-static void test_array_free(struct test_array *ptr)
-{
-	if (!ptr)
-		return;
-	ptr->a = ARRAY_POISON;
-}
-
 void *thr_reader(void *_count)
 {
 	unsigned long long *count = _count;
-	struct test_array *local_ptr;
+	int *local_ptr;
 
 	printf_verbose("thread_begin %s, thread id : %lx, tid %lu\n",
 			"reader", (unsigned long) pthread_self(),
@@ -240,7 +203,7 @@ void *thr_reader(void *_count)
 		local_ptr = rcu_dereference(test_rcu_pointer);
 		rcu_debug_yield_read();
 		if (local_ptr)
-			assert(local_ptr->a == 8);
+			assert(*local_ptr == 8);
 		if (caa_unlikely(rduration))
 			loop_sleep(rduration);
 		rcu_read_unlock();
@@ -269,7 +232,7 @@ void *thr_reader(void *_count)
 void *thr_writer(void *_count)
 {
 	unsigned long long *count = _count;
-	struct test_array *new, *old;
+	int *new, *old;
 
 	printf_verbose("thread_begin %s, thread id : %lx, tid %lu\n",
 			"writer", (unsigned long) pthread_self(),
@@ -283,18 +246,16 @@ void *thr_writer(void *_count)
 	cmm_smp_mb();
 
 	for (;;) {
-		rcu_copy_mutex_lock();
-		new = test_array_alloc();
-		new->a = 8;
+		new = malloc(sizeof(int));
+		assert(new);
+		*new = 8;
 		old = rcu_xchg_pointer(&test_rcu_pointer, new);
 		if (caa_unlikely(wduration))
 			loop_sleep(wduration);
 		synchronize_rcu();
-		/* can be done after unlock */
 		if (old)
-			old->a = 0;
-		test_array_free(old);
-		rcu_copy_mutex_unlock();
+			*old = 0;
+		free(old);
 		URCU_TLS(nr_writes)++;
 		if (caa_unlikely(!test_duration_write()))
 			break;
@@ -412,7 +373,6 @@ int main(int argc, char **argv)
 			"main", (unsigned long) pthread_self(),
 			(unsigned long) gettid());
 
-	test_array = calloc(1, sizeof(*test_array) * ARRAY_SIZE);
 	tid_reader = malloc(sizeof(*tid_reader) * nr_readers);
 	tid_writer = malloc(sizeof(*tid_writer) * nr_writers);
 	count_reader = malloc(sizeof(*count_reader) * nr_readers);
@@ -462,8 +422,7 @@ int main(int argc, char **argv)
 		argv[0], duration, nr_readers, rduration, wduration,
 		nr_writers, wdelay, tot_reads, tot_writes,
 		tot_reads + tot_writes);
-	test_array_free(test_rcu_pointer);
-	free(test_array);
+	free(test_rcu_pointer);
 	free(tid_reader);
 	free(tid_writer);
 	free(count_reader);
-- 
1.7.10.4

* [PATCH 8/8] urcu-qsbr: batch concurrent synchronize_rcu()
From: Mathieu Desnoyers @ 2012-11-17 16:16 UTC (permalink / raw)
  To: lttng-dev, rp, Paul E. McKenney, Lai Jiangshan, Alan Stern
  Cc: Mathieu Desnoyers

CC: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
CC: Lai Jiangshan <laijs@cn.fujitsu.com>
CC: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
---
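The batching protocol, reduced to its core (sketch: do_grace_period()
and wake_other_waiters() are placeholder names for the locked
grace-period code and the wake-up loop in the patch below):

	struct gp_waiters_thread self;

	cds_wfs_node_init(&self.node);
	self.wait_futex = AWAKE_WAITING;

	if (cds_wfs_push(&gp_waiters, &self.node) != 0) {
		/*
		 * Stack was non-empty: an earlier thread owns the
		 * grace period and covers us; busy-wait, then futex
		 * wait, until it wakes us up.
		 */
		urcu_adaptative_busy_wait(&self.wait_futex);
		return;
	}

	/*
	 * First in stack: pop all waiters, run a single grace period
	 * on behalf of every thread pushed meanwhile, then wake them.
	 */
	do_grace_period();	/* placeholder: the locked GP code */
	wake_other_waiters();	/* placeholder: the wake-up loop */
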
 urcu-qsbr.c |  151 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 151 insertions(+)

diff --git a/urcu-qsbr.c b/urcu-qsbr.c
index 5b341b5..7f747ed 100644
--- a/urcu-qsbr.c
+++ b/urcu-qsbr.c
@@ -36,6 +36,7 @@
 #include <poll.h>
 
 #include "urcu/wfcqueue.h"
+#include "urcu/wfstack.h"
 #include "urcu/map/urcu-qsbr.h"
 #define BUILD_QSBR_LIB
 #include "urcu/static/urcu-qsbr.h"
@@ -78,6 +79,35 @@ DEFINE_URCU_TLS(unsigned int, rcu_rand_yield);
 
 static CDS_LIST_HEAD(registry);
 
+/*
+ * Number of busy-loop attempts before waiting on futex for grace period
+ * batching.
+ */
+#define RCU_AWAKE_ATTEMPTS 1000
+
+enum adapt_wakeup_state {
+	/* AWAKE_WAITING is compared directly (futex compares it). */
+	AWAKE_WAITING =		0,
+	/* non-zero are used as masks. */
+	AWAKE_WAKEUP =		(1 << 0),
+	AWAKE_AWAKENED =	(1 << 1),
+	AWAKE_TEARDOWN =	(1 << 2),
+};
+
+struct gp_waiters_thread {
+	struct cds_wfs_node node;
+	int32_t wait_futex;
+};
+
+/*
+ * Stack keeping threads awaiting to wait for a grace period. Contains
+ * struct gp_waiters_thread objects.
+ */
+static struct cds_wfs_stack gp_waiters = {
+	.head = CDS_WFS_END,
+	.lock = PTHREAD_MUTEX_INITIALIZER,
+};
+
 static void mutex_lock(pthread_mutex_t *mutex)
 {
 	int ret;
@@ -116,6 +146,58 @@ static void wait_gp(void)
 		      NULL, NULL, 0);
 }
 
+/*
+ * Note: urcu_adaptative_wake_up needs "value" to stay allocated
+ * throughout its execution. In this scheme, the waiter owns the futex
+ * memory, and we only allow it to free this memory when it receives the
+ * AWAKE_TEARDOWN flag.
+ */
+static void urcu_adaptative_wake_up(int32_t *value)
+{
+	cmm_smp_mb();
+	assert(uatomic_read(value) == AWAKE_WAITING);
+	uatomic_set(value, AWAKE_WAKEUP);
+	if (!(uatomic_read(value) & AWAKE_AWAKENED))
+		futex_noasync(value, FUTEX_WAKE, 1, NULL, NULL, 0);
+	/* Allow teardown of "value" memory. */
+	uatomic_or(value, AWAKE_TEARDOWN);
+}
+
+/*
+ * Caller must initialize "value" to AWAKE_WAITING before passing its
+ * memory to waker thread.
+ */
+static void urcu_adaptative_busy_wait(int32_t *value)
+{
+	unsigned int i;
+
+	/* Load and test condition before read futex */
+	cmm_smp_rmb();
+	for (i = 0; i < RCU_AWAKE_ATTEMPTS; i++) {
+		if (uatomic_read(value) != AWAKE_WAITING)
+			goto skip_futex_wait;
+		caa_cpu_relax();
+	}
+	futex_noasync(value, FUTEX_WAIT, AWAKE_WAITING, NULL, NULL, 0);
+skip_futex_wait:
+
+	/* Tell waker thread that we are awakened. */
+	uatomic_or(value, AWAKE_AWAKENED);
+
+	/*
+	 * Wait until waker thread lets us know it's ok to tear down
+	 * memory allocated for value.
+	 */
+	for (i = 0; i < RCU_AWAKE_ATTEMPTS; i++) {
+		if (uatomic_read(value) & AWAKE_TEARDOWN)
+			break;
+		caa_cpu_relax();
+	}
+	while (!(uatomic_read(value) & AWAKE_TEARDOWN))
+		poll(NULL, 0, 10);
+	assert(uatomic_read(value) & AWAKE_TEARDOWN);
+}
+
 static void wait_for_readers(struct cds_list_head *input_readers,
 			struct cds_list_head *cur_snap_readers,
 			struct cds_list_head *qsreaders)
@@ -198,6 +280,9 @@ void synchronize_rcu(void)
 	CDS_LIST_HEAD(cur_snap_readers);
 	CDS_LIST_HEAD(qsreaders);
 	unsigned long was_online;
+	struct gp_waiters_thread gp_waiters_thread;
+	struct cds_wfs_head *gp_waiters_head;
+	struct cds_wfs_node *waiters_iter, *waiters_iter_n;
 
 	was_online = URCU_TLS(rcu_reader).ctr;
 
@@ -214,8 +299,26 @@ void synchronize_rcu(void)
 	else
 		cmm_smp_mb();
 
+	/*
+	 * Add ourself to gp_waiters stack of threads awaiting to wait
+	 * for a grace period. Proceed to perform the grace period only
+	 * if we are the first thread added into the stack.
+	 */
+	cds_wfs_node_init(&gp_waiters_thread.node);
+	gp_waiters_thread.wait_futex = AWAKE_WAITING;
+	if (cds_wfs_push(&gp_waiters, &gp_waiters_thread.node) != 0) {
+		/* Not first in stack: will be awakened by another thread. */
+		urcu_adaptative_busy_wait(&gp_waiters_thread.wait_futex);
+		goto gp_end;
+	}
+
 	mutex_lock(&rcu_gp_lock);
 
+	/*
+	 * Pop all waiters into our local stack head.
+	 */
+	gp_waiters_head = __cds_wfs_pop_all(&gp_waiters);
+
 	if (cds_list_empty(&registry))
 		goto out;
 
@@ -272,6 +375,19 @@ void synchronize_rcu(void)
 out:
 	mutex_unlock(&rcu_gp_lock);
 
+	/* Wake all waiters in our stack head, excluding ourself. */
+	cds_wfs_for_each_blocking_safe(gp_waiters_head, waiters_iter,
+				waiters_iter_n) {
+		struct gp_waiters_thread *wt;
+
+		wt = caa_container_of(waiters_iter,
+				struct gp_waiters_thread, node);
+		if (wt == &gp_waiters_thread)
+			continue;
+		urcu_adaptative_wake_up(&wt->wait_futex);
+	}
+
+gp_end:
 	/*
 	 * Finish waiting for reader threads before letting the old ptr being
 	 * freed.
@@ -286,6 +402,9 @@ void synchronize_rcu(void)
 {
 	CDS_LIST_HEAD(qsreaders);
 	unsigned long was_online;
+	struct gp_waiters_thread gp_waiters_thread;
+	struct cds_wfs_head *gp_waiters_head;
+	struct cds_wfs_node *waiters_iter, *waiters_iter_n;
 
 	was_online = URCU_TLS(rcu_reader).ctr;
 
@@ -299,7 +418,26 @@ void synchronize_rcu(void)
 	else
 		cmm_smp_mb();
 
+	/*
+	 * Add ourself to gp_waiters stack of threads awaiting to wait
+	 * for a grace period. Proceed to perform the grace period only
+	 * if we are the first thread added into the stack.
+	 */
+	cds_wfs_node_init(&gp_waiters_thread.node);
+	gp_waiters_thread.wait_futex = AWAKE_WAITING;
+	if (cds_wfs_push(&gp_waiters, &gp_waiters_thread.node) != 0) {
+		/* Not first in stack: will be awakened by another thread. */
+		urcu_adaptative_busy_wait(&gp_waiters_thread.wait_futex);
+		goto gp_end;
+	}
+
 	mutex_lock(&rcu_gp_lock);
+
+	/*
+	 * Pop all waiters into our local stack head.
+	 */
+	gp_waiters_head = __cds_wfs_pop_all(&gp_waiters);
+
 	if (cds_list_empty(&registry))
 		goto out;
 
@@ -334,6 +472,19 @@ void synchronize_rcu(void)
 out:
 	mutex_unlock(&rcu_gp_lock);
 
+	/* Wake all waiters in our stack head, excluding ourself. */
+	cds_wfs_for_each_blocking_safe(gp_waiters_head, waiters_iter,
+				waiters_iter_n) {
+		struct gp_waiters_thread *wt;
+
+		wt = caa_container_of(waiters_iter,
+				struct gp_waiters_thread, node);
+		if (wt == &gp_waiters_thread)
+			continue;
+		urcu_adaptative_wake_up(&wt->wait_futex);
+	}
+
+gp_end:
 	if (was_online)
 		rcu_thread_online();
 	else
-- 
1.7.10.4

* Re: [PATCH 5/8] urcu-mb/signal/membarrier: move quiescent threads to separate list
From: Lai Jiangshan @ 2012-11-19  7:16 UTC (permalink / raw)
  To: Mathieu Desnoyers; +Cc: Paul E. McKenney, lttng-dev, rp, Alan Stern

On 11/18/2012 12:16 AM, Mathieu Desnoyers wrote:
> Accelerate the 2-phase grace period by not having to iterate twice over
> threads not nested within an RCU read-side lock.

You gain almost nothing.
The first wait_for_readers() very likely has nothing to wait for.

> @@ -341,7 +361,12 @@ void synchronize_rcu(void)
>  	/*
>  	 * Wait for readers to observe new parity or be quiescent.
>  	 */
> -	wait_for_readers();
> +	wait_for_readers(&registry, NULL, &qsreaders);

wait_for_readers(&cur_snap_readers, NULL, &qsreaders);

* Re: userspace rcu flavor improvements
From: Lai Jiangshan @ 2012-11-19  7:52 UTC (permalink / raw)
  To: Mathieu Desnoyers; +Cc: Paul E. McKenney, lttng-dev, rp, Alan Stern

On 11/18/2012 12:16 AM, Mathieu Desnoyers wrote:
> Here are a couple of improvements for all userspace RCU flavors. Many
> thanks to Alan Stern for his suggestions.


It makes urcu like SRCU. (sync_rcu = check zero + flip + check zero)
If I have time, I may port more SRCU code to urcu.

> 
> Patch 8/8 is only done for qsbr so far, and proposed as RFC. I'd like to
> try and benchmark other approaches to concurrent grace periods too.
> 
> Feedback is welcome,
> 
> Thanks,
> 
> Mathieu
> 
> 

* Re: [PATCH 5/8] urcu-mb/signal/membarrier: move quiescent threads to separate list
From: Mathieu Desnoyers @ 2012-11-19 15:16 UTC (permalink / raw)
  To: Lai Jiangshan; +Cc: Paul E. McKenney, lttng-dev, rp, Alan Stern

* Lai Jiangshan (laijs@cn.fujitsu.com) wrote:
> On 11/18/2012 12:16 AM, Mathieu Desnoyers wrote:
> > Accelerate the 2-phase grace period by not having to iterate twice over
> > threads not nested within an RCU read-side lock.
> 
> You gain almost nothing.
> The first wait_for_readers() very likely has nothing to wait for.

I agree with you that the speedup will probably be small, but as we
start having more and more readers, most of which are outside of RCU
read-side critical sections, iterating only once, rather than twice, on
the list of readers seems to be a saving worth doing.

Actually, when I implemented grace periods based on list move/splicing,
it was my original intent to only touch threads as many times as we
needed to observe them become quiescent. The "move quiescent threads to
separate list" patches are completing that modification.

> 
> > @@ -341,7 +361,12 @@ void synchronize_rcu(void)
> >  	/*
> >  	 * Wait for readers to observe new parity or be quiescent.
> >  	 */
> > -	wait_for_readers();
> > +	wait_for_readers(&registry, NULL, &qsreaders);
> 
> wait_for_readers(&cur_snap_readers, NULL, &qsreaders);

Good catch, fixed in my dev branch.

Thanks!

Mathieu


-- 
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com

* Re: userspace rcu flavor improvements
From: Mathieu Desnoyers @ 2012-11-19 15:18 UTC (permalink / raw)
  To: Lai Jiangshan; +Cc: Paul E. McKenney, lttng-dev, rp, Alan Stern

* Lai Jiangshan (laijs@cn.fujitsu.com) wrote:
> On 11/18/2012 12:16 AM, Mathieu Desnoyers wrote:
> > Here are a couple of improvements for all userspace RCU flavors. Many
> > thanks to Alan Stern for his suggestions.
> 
> 
> It makes urcu like SRCU. (sync_rcu = check zero + flip + check zero)

Good to know :)

> If I have time, I may port more SRCU code to urcu.

That will certainly be interesting.

Thanks!

Mathieu

> 
> > 
> > Patch 8/8 is only done for qsbr so far, and proposed as RFC. I'd like to
> > try and benchmark other approaches to concurrent grace periods too.
> > 
> > Feedback is welcome,
> > 
> > Thanks,
> > 
> > Mathieu
> > 
> > 
> 

-- 
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com

* Re: userspace rcu flavor improvements
From: Paul E. McKenney @ 2012-11-19 16:23 UTC (permalink / raw)
  To: Lai Jiangshan; +Cc: Alan Stern, lttng-dev, rp, Mathieu Desnoyers

On Mon, Nov 19, 2012 at 03:52:18PM +0800, Lai Jiangshan wrote:
> On 11/18/2012 12:16 AM, Mathieu Desnoyers wrote:
> > Here are a couple of improvements for all userspace RCU flavors. Many
> > thanks to Alan Stern for his suggestions.
> 
> It makes urcu like SRCU. (sync_rcu = check zero + flip + check zero)
> If I have time, I may port more SRCU code to urcu.

I am sure that this is obvious to everyone, but I cannot help restating
it.  There is one important difference between user code and kernel code,
though.  In the kernel, we track by CPU, so one of SRCU's big jobs is
to track multiple tasks using the same CPU.  This opens the possibility
of preemption, which is one of the things that complicates SRCU's design.

In contrast, user-mode RCU tracks tasks without multiplexing.  This
allows simplifications that are similar to those that could be achieved
in the kernel if we were willing to disable preemption across the entire
SRCU read-side critical section.

So although I am all for user-mode RCU taking advantage of any technology
we have at hand, we do need to be careful to avoid needless complexity.

> > Patch 8/8 is only done for qsbr so far, and proposed as RFC. I'd like to
> > try and benchmark other approaches to concurrent grace periods too.

The concurrent grace periods are the big win, in my opinion.  ;-)

							Thanx, Paul

> > Feedback is welcome,
> > 
> > Thanks,
> > 
> > Mathieu
> > 
> > 
> 

* Re: userspace rcu flavor improvements
From: Mathieu Desnoyers @ 2012-11-19 17:05 UTC (permalink / raw)
  To: Paul E. McKenney; +Cc: lttng-dev, rp, Alan Stern

* Paul E. McKenney (paulmck@linux.vnet.ibm.com) wrote:
> On Mon, Nov 19, 2012 at 03:52:18PM +0800, Lai Jiangshan wrote:
> > On 11/18/2012 12:16 AM, Mathieu Desnoyers wrote:
> > > Here are a couple of improvements for all userspace RCU flavors. Many
> > > thanks to Alan Stern for his suggestions.
> > 
> > It makes urcu like SRCU. (sync_rcu = check zero + flip + check zero)
> > If I have time, I may port more SRCU code to urcu.
> 
> I am sure that this is obvious to everyone, but I cannot help restating
> it.  There is one important difference between user code and kernel code,
> though.  In the kernel, we track by CPU, so one of SRCU's big jobs is
> to track multiple tasks using the same CPU.  This opens the possibility
> of preemption, which is one of the things that complicates SRCU's design.
> 
> In contrast, user-mode RCU tracks tasks without multiplexing.  This
> allows simplifications that are similar to those that could be achieved
> in the kernel if we were willing to disable preemption across the entire
> SRCU read-side critical section.
> 
> So although I am all for user-mode RCU taking advantage of any technology
> we have at hand, we do need to be careful to avoid needless complexity.

Very good point! Indeed, when considering modifications to URCU, I will
be considering all of those elements:

- Added complexity (verification cost),
+ Speedup,
+ Lower latency,
+ Better scalability,
+ Lower power consumption,

So yes, I'm all for improving URCU synchronisation, but I might be
reluctant to pull modifications that increase complexity significantly
without very significant benefits.

> 
> > > Patch 8/8 is only done for qsbr so far, and proposed as RFC. I'd like to
> > > try and benchmark other approaches to concurrent grace periods too.
> 
> The concurrent grace periods are the big win, in my opinion.  ;-)

I've done some basic benchmarking on the approach taken by patch 8/8,
and it leads to very interesting scalability improvements and speedups,
e.g., on a 24-core AMD, with a write-heavy scenario (4 reader threads,
20 updater threads, each updater using synchronize_rcu()):

* Serialized grace periods :
./test_urcu_qsbr 4 20 20
SUMMARY ./test_urcu_qsbr          testdur   20 nr_readers   4 rdur      0 wdur      0 nr_writers  20 wdelay      0 nr_reads  20251412728 nr_writes      1826331 nr_ops  20253239059

* Batched grace periods :
./test_urcu_qsbr 4 20 20
SUMMARY ./test_urcu_qsbr          testdur   20 nr_readers   4 rdur      0 wdur      0 nr_writers  20 wdelay      0 nr_reads  15141994746 nr_writes      9382515 nr_ops  15151377261

For a 9382515/1826331 = 5.13 speedup

Of course, we can see that readers have slowed down, probably due to
increased update traffic, given there is no change to the read-side code
whatsoever.

Now let's see the penalty of managing the stack for a single updater.
With 4 readers, single updater:

* Serialized grace periods :
./test_urcu_qsbr 4 1 20
SUMMARY ./test_urcu_qsbr          testdur   20 nr_readers   4 rdur      0 wdur      0 nr_writers   1 wdelay      0 nr_reads  19240784755 nr_writes      2130839 nr_ops  19242915594

* Batched grace periods :
./test_urcu_qsbr 4 1 20
SUMMARY ./test_urcu_qsbr          testdur   20 nr_readers   4 rdur      0 wdur      0 nr_writers   1 wdelay      0 nr_reads  19160162768 nr_writes      2253068 nr_ops  19162415836

2253068 vs 2137036 -> a couple of runs show that this difference is lost
in the noise for a single updater.

So given that implementing a real "concurrent" approach for grace
periods would take a while and add a lot of complexity, I am tempted to
merge the batching approach, since it does not add complexity to the
synchronization algorithm and already shows an interesting speedup.
Moreover, we can easily remove batching if it appears not to be needed
in the future.

Thoughts ?

Thanks,

Mathieu


> 
> 							Thanx, Paul
> 
> > > Feedback is welcome,
> > > 
> > > Thanks,
> > > 
> > > Mathieu
> > > 
> > > 
> > 
> 

-- 
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com
