All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/2] CodeSample: rwdeq
@ 2015-08-12  7:55 Dominik Dingel
  2015-08-12  7:55 ` [PATCH 1/2] CodeSamples: Fix compiler warnings Dominik Dingel
                   ` (2 more replies)
  0 siblings, 3 replies; 9+ messages in thread
From: Dominik Dingel @ 2015-08-12  7:55 UTC (permalink / raw)
  To: perfbook; +Cc: Dominik Dingel

From: Dominik Dingel <dingel@linux.vnet.ibm.com>

Hi all,

here is a somehow naive implementation of a reader/writer lock based approch
for the parallel deq problem.

The next step would be to use it as an introduction example in the text, as due
its way of implementation (taking always a read lock, working always with an
atomic counter) it is a less efficient way of doing things, compared with the
tandem approch. Then the benefits of partitioning might be even more obvious.

Thanks
	Dominik

Dominik Dingel (2):
  CodeSamples: Fix compiler warnings
  CodeSamples: add read/write solution to the deq example

 CodeSamples/SMPdesign/Makefile       |   5 +-
 CodeSamples/SMPdesign/deqtorture.h   |   2 +-
 CodeSamples/SMPdesign/lockrwdeq.c    | 154 +++++++++++++++++++++++++++++++++++
 CodeSamples/SMPdesign/locktdeq.c     |   4 -
 CodeSamples/SMPdesign/matmul.c       |   3 +-
 CodeSamples/SMPdesign/matmul_block.c |   3 +-
 CodeSamples/SMPdesign/smpalloc.c     |   1 -
 7 files changed, 163 insertions(+), 9 deletions(-)
 create mode 100644 CodeSamples/SMPdesign/lockrwdeq.c

-- 
1.9.1


^ permalink raw reply	[flat|nested] 9+ messages in thread

* [PATCH 1/2] CodeSamples: Fix compiler warnings
  2015-08-12  7:55 [PATCH 0/2] CodeSample: rwdeq Dominik Dingel
@ 2015-08-12  7:55 ` Dominik Dingel
  2015-08-12  7:55 ` [PATCH 2/2] CodeSamples: add read/write solution to the deq example Dominik Dingel
  2015-08-13 16:43 ` [PATCH 0/2] CodeSample: rwdeq Paul E. McKenney
  2 siblings, 0 replies; 9+ messages in thread
From: Dominik Dingel @ 2015-08-12  7:55 UTC (permalink / raw)
  To: perfbook; +Cc: Dominik Dingel

From: Dominik Dingel <dingel@linux.vnet.ibm.com>

Remove unreferenced variables, add necessary returns, make compiler
happy.

Signed-off-by: Dominik Dingel <dingel@linux.vnet.ibm.com>
---
 CodeSamples/SMPdesign/deqtorture.h   | 2 +-
 CodeSamples/SMPdesign/locktdeq.c     | 4 ----
 CodeSamples/SMPdesign/matmul.c       | 3 ++-
 CodeSamples/SMPdesign/matmul_block.c | 3 ++-
 CodeSamples/SMPdesign/smpalloc.c     | 1 -
 5 files changed, 5 insertions(+), 8 deletions(-)

diff --git a/CodeSamples/SMPdesign/deqtorture.h b/CodeSamples/SMPdesign/deqtorture.h
index 4def9d7..61ea70a 100644
--- a/CodeSamples/SMPdesign/deqtorture.h
+++ b/CodeSamples/SMPdesign/deqtorture.h
@@ -482,7 +482,6 @@ int main(int argc, char *argv[])
 	int d1, d2, d3, d4;
 	struct deq_elem e1, e2, e3;
 	int i;
-	struct cds_list_head *p;
 	struct pdeq deq;

 	init_pdeq(&deq);
@@ -546,4 +545,5 @@ int main(int argc, char *argv[])
 	for (i = 0; i < 10; i++) {
 		pdeq_perf();
 	}
+	return 0;
 }
diff --git a/CodeSamples/SMPdesign/locktdeq.c b/CodeSamples/SMPdesign/locktdeq.c
index 20bdb85..0375902 100644
--- a/CodeSamples/SMPdesign/locktdeq.c
+++ b/CodeSamples/SMPdesign/locktdeq.c
@@ -140,8 +140,6 @@ struct cds_list_head *pdeq_pop_r(struct pdeq *d)

 void pdeq_push_l(struct cds_list_head *e, struct pdeq *d)
 {
-	int i;
-
 	spin_lock(&d->llock);
 	deq_push_l(e, &d->ldeq);
 	spin_unlock(&d->llock);
@@ -149,8 +147,6 @@ void pdeq_push_l(struct cds_list_head *e, struct pdeq *d)

 void pdeq_push_r(struct cds_list_head *e, struct pdeq *d)
 {
-	int i;
-
 	spin_lock(&d->rlock);
 	deq_push_r(e, &d->rdeq);
 	spin_unlock(&d->rlock);
diff --git a/CodeSamples/SMPdesign/matmul.c b/CodeSamples/SMPdesign/matmul.c
index 13a7d46..96ed9ef 100644
--- a/CodeSamples/SMPdesign/matmul.c
+++ b/CodeSamples/SMPdesign/matmul.c
@@ -53,11 +53,12 @@ void *matmul_thread(void *me_in)
 		}

 	atomic_inc(&ndone);
+	return NULL;
 }

 int main(int argc, char *argv[])
 {
-	int i, j, k;
+	int i, j;
 	long long startcreatetime;
 	long long starttime;
 	long long endtime;
diff --git a/CodeSamples/SMPdesign/matmul_block.c b/CodeSamples/SMPdesign/matmul_block.c
index 7ed0d52..2ed38ca 100644
--- a/CodeSamples/SMPdesign/matmul_block.c
+++ b/CodeSamples/SMPdesign/matmul_block.c
@@ -58,11 +58,12 @@ void *matmul_thread(void *band_in)
 		}

 	atomic_inc(&ndone);
+	return NULL;
 }

 int main(int argc, char *argv[])
 {
-	int i, j, k;
+	int i, j;
 	struct band *bands;
 	int bandsize;
 	long long startcreatetime;
diff --git a/CodeSamples/SMPdesign/smpalloc.c b/CodeSamples/SMPdesign/smpalloc.c
index f6250ef..5b1f5b6 100644
--- a/CodeSamples/SMPdesign/smpalloc.c
+++ b/CodeSamples/SMPdesign/smpalloc.c
@@ -156,7 +156,6 @@ void usage(char *progname)
 int main(int argc, char *argv[])
 {
 	int i;
-	struct memblock *p[2 * TARGET_POOL_SIZE] = { 0 };
 	long long nc;
 	long long nf;
 	int nkids = 1;
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH 2/2] CodeSamples: add read/write solution to the deq example
  2015-08-12  7:55 [PATCH 0/2] CodeSample: rwdeq Dominik Dingel
  2015-08-12  7:55 ` [PATCH 1/2] CodeSamples: Fix compiler warnings Dominik Dingel
@ 2015-08-12  7:55 ` Dominik Dingel
  2015-08-13 16:43 ` [PATCH 0/2] CodeSample: rwdeq Paul E. McKenney
  2 siblings, 0 replies; 9+ messages in thread
From: Dominik Dingel @ 2015-08-12  7:55 UTC (permalink / raw)
  To: perfbook; +Cc: Dominik Dingel

From: Dominik Dingel <dingel@linux.vnet.ibm.com>

In addition to the split deque and the hashed deque there is now a
solution working with an atomic counter and a read writer lock.

Signed-off-by: Dominik Dingel <dingel@linux.vnet.ibm.com>
---
 CodeSamples/SMPdesign/Makefile    |   5 +-
 CodeSamples/SMPdesign/lockrwdeq.c | 154 ++++++++++++++++++++++++++++++++++++++
 2 files changed, 158 insertions(+), 1 deletion(-)
 create mode 100644 CodeSamples/SMPdesign/lockrwdeq.c

diff --git a/CodeSamples/SMPdesign/Makefile b/CodeSamples/SMPdesign/Makefile
index fd042b9..80da412 100644
--- a/CodeSamples/SMPdesign/Makefile
+++ b/CodeSamples/SMPdesign/Makefile
@@ -14,7 +14,7 @@
 #
 # Copyright (c) 2008 Paul E. McKenney, IBM Corporation.

-PROGS = smpalloc lockdeq lockhdeq locktdeq matmul matmul_block
+PROGS = smpalloc lockdeq lockhdeq locktdeq lockrwdeq matmul matmul_block

 all: $(PROGS)

@@ -37,6 +37,9 @@ lockhdeq: lockhdeq.c deqtorture.h ../api.h
 locktdeq: locktdeq.c deqtorture.h ../api.h
 	cc $(GCC_ARGS) -g -o locktdeq -DTEST locktdeq.c -lpthread

+lockrwdeq: lockrwdeq.c deqtorture.h ../api.h
+	cc $(GCC_ARGS) -g -o lockrwdeq -DTEST lockrwdeq.c -lpthread
+
 matmul: matmul.c ../api.h
 	cc $(GCC_ARGS) -g -o matmul -DTEST matmul.c -lpthread

diff --git a/CodeSamples/SMPdesign/lockrwdeq.c b/CodeSamples/SMPdesign/lockrwdeq.c
new file mode 100644
index 0000000..65f3189
--- /dev/null
+++ b/CodeSamples/SMPdesign/lockrwdeq.c
@@ -0,0 +1,154 @@
+/*
+ * lockrwdeq.c: simple rw lock-based parallel deq implementation.
+ * 	This is similar to lockdeq.c, but expresses the parallel
+ * 	implementation in terms of a rw lock with an atomic counter.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
+ *
+ * Copyright (c) 2015 Dominik Dingel, IBM Corporation.
+ */
+
+#include "../api.h"
+
+/* First do the underlying deq implementation. */
+
+struct deq {
+	struct cds_list_head chain;
+} ____cacheline_internodealigned_in_smp;
+
+void init_deq(struct deq *p)
+{
+	CDS_INIT_LIST_HEAD(&p->chain);
+}
+
+struct cds_list_head *deq_pop_l(struct deq *p)
+{
+	struct cds_list_head *e;
+
+	if (cds_list_empty(&p->chain))
+		e = NULL;
+	else {
+		e = p->chain.prev;
+		cds_list_del(e);
+		CDS_INIT_LIST_HEAD(e);
+	}
+	return e;
+}
+
+void deq_push_l(struct cds_list_head *e, struct deq *p)
+{
+	cds_list_add_tail(e, &p->chain);
+}
+
+struct cds_list_head *deq_pop_r(struct deq *p)
+{
+	struct cds_list_head *e;
+
+	if (cds_list_empty(&p->chain))
+		e = NULL;
+	else {
+		e = p->chain.next;
+		cds_list_del(e);
+		CDS_INIT_LIST_HEAD(e);
+	}
+	return e;
+}
+
+void deq_push_r(struct cds_list_head *e, struct deq *p)
+{
+	cds_list_add(e, &p->chain);
+}
+
+/*
+ * And now the concurrent implementation, which uses a counter and a rw lock.
+ * The counter holds the current available number of items to modify.
+ * Before doing a modification we aquire a read lock and reserve our item to modify
+ * (link/unlink).
+ * If there are not enough items left for true parallel access we will upgrade
+ * the read lock, to a write lock, making the access truly sequential.
+ */
+
+struct pdeq {
+	pthread_rwlock_t lock;
+	atomic_t counter;
+	struct deq deq_cnt;
+};
+
+void init_pdeq(struct pdeq *d)
+{
+	pthread_rwlock_init(&d->lock, NULL);
+	atomic_set(&d->counter, 0);
+	init_deq(&d->deq_cnt);
+}
+
+struct cds_list_head *pdeq_pop_l(struct pdeq *d)
+{
+	struct cds_list_head *e;
+	pthread_rwlock_rdlock(&d->lock);
+	if (atomic_sub_return(2, &d->counter) < 1) {
+		pthread_rwlock_unlock(&d->lock);
+		pthread_rwlock_wrlock(&d->lock);
+	}
+	e = deq_pop_l(&d->deq_cnt);
+	if (e == NULL)
+		atomic_inc(&d->counter);
+	atomic_inc(&d->counter);
+	pthread_rwlock_unlock(&d->lock);
+	return e;
+}
+
+struct cds_list_head *pdeq_pop_r(struct pdeq *d)
+{
+	struct cds_list_head *e;
+	pthread_rwlock_rdlock(&d->lock);
+	if (atomic_sub_return(2, &d->counter) < 1) {
+		pthread_rwlock_unlock(&d->lock);
+		pthread_rwlock_wrlock(&d->lock);
+	}
+	e = deq_pop_r(&d->deq_cnt);
+	if (e == NULL)
+		atomic_inc(&d->counter);
+	atomic_inc(&d->counter);
+	pthread_rwlock_unlock(&d->lock);
+	return e;
+}
+
+void pdeq_push_l(struct cds_list_head *e, struct pdeq *d)
+{
+	pthread_rwlock_rdlock(&d->lock);
+	if (atomic_sub_return(1, &d->counter) < 1) {
+		pthread_rwlock_unlock(&d->lock);
+		pthread_rwlock_wrlock(&d->lock);
+	}
+	deq_push_l(e, &d->deq_cnt);
+	atomic_add(2, &d->counter);
+	pthread_rwlock_unlock(&d->lock);
+}
+
+void pdeq_push_r(struct cds_list_head *e, struct pdeq *d)
+{
+	pthread_rwlock_rdlock(&d->lock);
+	if (atomic_sub_return(1, &d->counter) < 1) {
+		pthread_rwlock_unlock(&d->lock);
+		pthread_rwlock_wrlock(&d->lock);
+	}
+	deq_push_r(e, &d->deq_cnt);
+	atomic_add(2, &d->counter);
+	pthread_rwlock_unlock(&d->lock);
+}
+
+#ifdef TEST
+#include "deqtorture.h"
+#endif /* #ifdef TEST */
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH 0/2] CodeSample: rwdeq
  2015-08-12  7:55 [PATCH 0/2] CodeSample: rwdeq Dominik Dingel
  2015-08-12  7:55 ` [PATCH 1/2] CodeSamples: Fix compiler warnings Dominik Dingel
  2015-08-12  7:55 ` [PATCH 2/2] CodeSamples: add read/write solution to the deq example Dominik Dingel
@ 2015-08-13 16:43 ` Paul E. McKenney
  2015-08-17  6:22   ` Paul E. McKenney
  2 siblings, 1 reply; 9+ messages in thread
From: Paul E. McKenney @ 2015-08-13 16:43 UTC (permalink / raw)
  To: Dominik Dingel; +Cc: perfbook, Dominik Dingel

On Wed, Aug 12, 2015 at 09:55:22AM +0200, Dominik Dingel wrote:
> From: Dominik Dingel <dingel@linux.vnet.ibm.com>
> 
> Hi all,
> 
> here is a somehow naive implementation of a reader/writer lock based approch
> for the parallel deq problem.
> 
> The next step would be to use it as an introduction example in the text, as due
> its way of implementation (taking always a read lock, working always with an
> atomic counter) it is a less efficient way of doing things, compared with the
> tandem approch. Then the benefits of partitioning might be even more obvious.
> 
> Thanks
> 	Dominik

Queued, thank you, Dominik!

A quick unscientific run leads me to believe that this is slightly faster
than the hashed implementation (not bad!), but about twice as slow as
the tandem version.  Of course, considerable variation is to be expected
based on workload.

							Thanx, Paul

> Dominik Dingel (2):
>   CodeSamples: Fix compiler warnings
>   CodeSamples: add read/write solution to the deq example
> 
>  CodeSamples/SMPdesign/Makefile       |   5 +-
>  CodeSamples/SMPdesign/deqtorture.h   |   2 +-
>  CodeSamples/SMPdesign/lockrwdeq.c    | 154 +++++++++++++++++++++++++++++++++++
>  CodeSamples/SMPdesign/locktdeq.c     |   4 -
>  CodeSamples/SMPdesign/matmul.c       |   3 +-
>  CodeSamples/SMPdesign/matmul_block.c |   3 +-
>  CodeSamples/SMPdesign/smpalloc.c     |   1 -
>  7 files changed, 163 insertions(+), 9 deletions(-)
>  create mode 100644 CodeSamples/SMPdesign/lockrwdeq.c
> 
> -- 
> 1.9.1
> 
> --
> To unsubscribe from this list: send the line "unsubscribe perfbook" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 0/2] CodeSample: rwdeq
  2015-08-13 16:43 ` [PATCH 0/2] CodeSample: rwdeq Paul E. McKenney
@ 2015-08-17  6:22   ` Paul E. McKenney
  2015-08-19 19:06     ` Dominik Dingel
  0 siblings, 1 reply; 9+ messages in thread
From: Paul E. McKenney @ 2015-08-17  6:22 UTC (permalink / raw)
  To: Dominik Dingel; +Cc: perfbook, Dominik Dingel

On Thu, Aug 13, 2015 at 09:43:01AM -0700, Paul E. McKenney wrote:
> On Wed, Aug 12, 2015 at 09:55:22AM +0200, Dominik Dingel wrote:
> > From: Dominik Dingel <dingel@linux.vnet.ibm.com>
> > 
> > Hi all,
> > 
> > here is a somehow naive implementation of a reader/writer lock based approch
> > for the parallel deq problem.
> > 
> > The next step would be to use it as an introduction example in the text, as due
> > its way of implementation (taking always a read lock, working always with an
> > atomic counter) it is a less efficient way of doing things, compared with the
> > tandem approch. Then the benefits of partitioning might be even more obvious.
> > 
> > Thanks
> > 	Dominik
> 
> Queued, thank you, Dominik!
> 
> A quick unscientific run leads me to believe that this is slightly faster
> than the hashed implementation (not bad!), but about twice as slow as
> the tandem version.  Of course, considerable variation is to be expected
> based on workload.

And, as discussed earlier, here the commit adding a Quick Quiz referring
to your algorithm.  Thoughts?

							Thanx, Paul

------------------------------------------------------------------------

    Add Quick Quiz for Dominik Dingel's solution
    
    Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>

diff --git a/SMPdesign/partexercises.tex b/SMPdesign/partexercises.tex
index a44c285ce134..5d46e70028aa 100644
--- a/SMPdesign/partexercises.tex
+++ b/SMPdesign/partexercises.tex
@@ -683,6 +683,15 @@ Michael~\cite{DBLP:conf/europar/Michael03}.\footnote{
 	Instead, the common compare-and-swap (e.g., x86 cmpxchg)
 	suffices.}

+\QuickQuiz{}
+	Why are there not one but two solutions to the double-ended queue
+	problem?
+\QuickQuizAnswer{
+	There are actually at least three.
+	The third, by Dominik Dingel, makes interesting use of
+	reader-writer locking, and may be found in \co{lockrwdeq.c}.
+} \QuickQuizEnd
+
 In fact, as noted by Dice et al.~\cite{DavidDice:2010:SCA:HTM:deque},
 an unsynchronized single-threaded double-ended queue significantly
 outperforms any of the parallel implementations they studied.
diff --git a/qqz.tex b/qqz.tex
index 57cdc43f63d1..cfd0356854b7 100644
--- a/qqz.tex
+++ b/qqz.tex
@@ -2561,6 +2561,14 @@ ret = ~ret & tmp;
 	it is worthwhile) is left as an exercise for the reader.

 \QuickQ{}
+	Why are there not one but two solutions to the double-ended queue
+	problem?
+\QuickA{}
+	There are actually at least three.
+	The third, by Dominik Dingel, makes interesting use of
+	reader-writer locking, and may be found in \co{lockrwdeq.c}.
+
+\QuickQ{}
 	The tandem double-ended queue runs about twice as fast as
 	the hashed double-ended queue, even when I increase the
 	size of the hash table to an insanely large number.


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH 0/2] CodeSample: rwdeq
  2015-08-17  6:22   ` Paul E. McKenney
@ 2015-08-19 19:06     ` Dominik Dingel
  2015-08-19 21:22       ` Paul E. McKenney
  0 siblings, 1 reply; 9+ messages in thread
From: Dominik Dingel @ 2015-08-19 19:06 UTC (permalink / raw)
  To: paulmck; +Cc: perfbook, Dominik Dingel

On Sun, 16 Aug 2015 23:22:19 -0700
"Paul E. McKenney" <paulmck@linux.vnet.ibm.com> wrote:

> On Thu, Aug 13, 2015 at 09:43:01AM -0700, Paul E. McKenney wrote:
> > On Wed, Aug 12, 2015 at 09:55:22AM +0200, Dominik Dingel wrote:
> > > From: Dominik Dingel <dingel@linux.vnet.ibm.com>
> > > 
> > > Hi all,
> > > 
> > > here is a somehow naive implementation of a reader/writer lock based approch
> > > for the parallel deq problem.
> > > 
> > > The next step would be to use it as an introduction example in the text, as due
> > > its way of implementation (taking always a read lock, working always with an
> > > atomic counter) it is a less efficient way of doing things, compared with the
> > > tandem approch. Then the benefits of partitioning might be even more obvious.
> > > 
> > > Thanks
> > > 	Dominik
> > 
> > Queued, thank you, Dominik!
> > 
> > A quick unscientific run leads me to believe that this is slightly faster
> > than the hashed implementation (not bad!), but about twice as slow as
> > the tandem version.  Of course, considerable variation is to be expected
> > based on workload.

Sounds good, during sniff tests I saw even slightly less performance than your hashed implementation. 

> And, as discussed earlier, here the commit adding a Quick Quiz referring
> to your algorithm.  Thoughts?

Looks good! Thanks for putting it in.
My plan, as soon as I get to:
- Rework the section to start with my solution
- Show the disadvantages of my solution
  -> Atomic and read/writer lock (resulting in bouncing cache lines)
- Show your tandem solution which will avoid that overhead and will only
in the worst case really do synchronization of the complete data structure.

What do you think?

Thanks
	Dominik

> 							Thanx, Paul
> 
> ------------------------------------------------------------------------
> 
>     Add Quick Quiz for Dominik Dingel's solution
>     
>     Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> 
> diff --git a/SMPdesign/partexercises.tex b/SMPdesign/partexercises.tex
> index a44c285ce134..5d46e70028aa 100644
> --- a/SMPdesign/partexercises.tex
> +++ b/SMPdesign/partexercises.tex
> @@ -683,6 +683,15 @@ Michael~\cite{DBLP:conf/europar/Michael03}.\footnote{
>  	Instead, the common compare-and-swap (e.g., x86 cmpxchg)
>  	suffices.}
>  
> +\QuickQuiz{}
> +	Why are there not one but two solutions to the double-ended queue
> +	problem?
> +\QuickQuizAnswer{
> +	There are actually at least three.
> +	The third, by Dominik Dingel, makes interesting use of
> +	reader-writer locking, and may be found in \co{lockrwdeq.c}.
> +} \QuickQuizEnd
> +
>  In fact, as noted by Dice et al.~\cite{DavidDice:2010:SCA:HTM:deque},
>  an unsynchronized single-threaded double-ended queue significantly
>  outperforms any of the parallel implementations they studied.
> diff --git a/qqz.tex b/qqz.tex
> index 57cdc43f63d1..cfd0356854b7 100644
> --- a/qqz.tex
> +++ b/qqz.tex
> @@ -2561,6 +2561,14 @@ ret = ~ret & tmp;
>  	it is worthwhile) is left as an exercise for the reader.
>  
>  \QuickQ{}
> +	Why are there not one but two solutions to the double-ended queue
> +	problem?
> +\QuickA{}
> +	There are actually at least three.
> +	The third, by Dominik Dingel, makes interesting use of
> +	reader-writer locking, and may be found in \co{lockrwdeq.c}.
> +
> +\QuickQ{}
>  	The tandem double-ended queue runs about twice as fast as
>  	the hashed double-ended queue, even when I increase the
>  	size of the hash table to an insanely large number.
> 


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 0/2] CodeSample: rwdeq
  2015-08-19 19:06     ` Dominik Dingel
@ 2015-08-19 21:22       ` Paul E. McKenney
  2015-08-20 17:36         ` Dominik Dingel
  0 siblings, 1 reply; 9+ messages in thread
From: Paul E. McKenney @ 2015-08-19 21:22 UTC (permalink / raw)
  To: Dominik Dingel; +Cc: perfbook, Dominik Dingel

On Wed, Aug 19, 2015 at 09:06:04PM +0200, Dominik Dingel wrote:
> On Sun, 16 Aug 2015 23:22:19 -0700
> "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> wrote:
> 
> > On Thu, Aug 13, 2015 at 09:43:01AM -0700, Paul E. McKenney wrote:
> > > On Wed, Aug 12, 2015 at 09:55:22AM +0200, Dominik Dingel wrote:
> > > > From: Dominik Dingel <dingel@linux.vnet.ibm.com>
> > > > 
> > > > Hi all,
> > > > 
> > > > here is a somehow naive implementation of a reader/writer lock based approch
> > > > for the parallel deq problem.
> > > > 
> > > > The next step would be to use it as an introduction example in the text, as due
> > > > its way of implementation (taking always a read lock, working always with an
> > > > atomic counter) it is a less efficient way of doing things, compared with the
> > > > tandem approch. Then the benefits of partitioning might be even more obvious.
> > > > 
> > > > Thanks
> > > > 	Dominik
> > > 
> > > Queued, thank you, Dominik!
> > > 
> > > A quick unscientific run leads me to believe that this is slightly faster
> > > than the hashed implementation (not bad!), but about twice as slow as
> > > the tandem version.  Of course, considerable variation is to be expected
> > > based on workload.
> 
> Sounds good, during sniff tests I saw even slightly less performance than your hashed implementation. 

It is probably possible to construct a workload that makes either look
good or bad.  ;-)

> > And, as discussed earlier, here the commit adding a Quick Quiz referring
> > to your algorithm.  Thoughts?
> 
> Looks good! Thanks for putting it in.
> My plan, as soon as I get to:
> - Rework the section to start with my solution
> - Show the disadvantages of my solution
>   -> Atomic and read/writer lock (resulting in bouncing cache lines)
> - Show your tandem solution which will avoid that overhead and will only
> in the worst case really do synchronization of the complete data structure.
> 
> What do you think?

I was actually planning to push the hashed version to either a quick
quiz or an appendix in order to make room for other sections to
expand.  One approach would be to put both your implementation and
the hashed implementation in the "Important Questions" section for
the second edition.

Later on, I am thinking in terms of a supplement of some sort, so that
the main book stays at roughly 500 pages, but so that people can
go look at additional material if they want.  Thoughts?

							Thanx, Paul

> Thanks
> 	Dominik
> 
> > 							Thanx, Paul
> > 
> > ------------------------------------------------------------------------
> > 
> >     Add Quick Quiz for Dominik Dingel's solution
> >     
> >     Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> > 
> > diff --git a/SMPdesign/partexercises.tex b/SMPdesign/partexercises.tex
> > index a44c285ce134..5d46e70028aa 100644
> > --- a/SMPdesign/partexercises.tex
> > +++ b/SMPdesign/partexercises.tex
> > @@ -683,6 +683,15 @@ Michael~\cite{DBLP:conf/europar/Michael03}.\footnote{
> >  	Instead, the common compare-and-swap (e.g., x86 cmpxchg)
> >  	suffices.}
> >  
> > +\QuickQuiz{}
> > +	Why are there not one but two solutions to the double-ended queue
> > +	problem?
> > +\QuickQuizAnswer{
> > +	There are actually at least three.
> > +	The third, by Dominik Dingel, makes interesting use of
> > +	reader-writer locking, and may be found in \co{lockrwdeq.c}.
> > +} \QuickQuizEnd
> > +
> >  In fact, as noted by Dice et al.~\cite{DavidDice:2010:SCA:HTM:deque},
> >  an unsynchronized single-threaded double-ended queue significantly
> >  outperforms any of the parallel implementations they studied.
> > diff --git a/qqz.tex b/qqz.tex
> > index 57cdc43f63d1..cfd0356854b7 100644
> > --- a/qqz.tex
> > +++ b/qqz.tex
> > @@ -2561,6 +2561,14 @@ ret = ~ret & tmp;
> >  	it is worthwhile) is left as an exercise for the reader.
> >  
> >  \QuickQ{}
> > +	Why are there not one but two solutions to the double-ended queue
> > +	problem?
> > +\QuickA{}
> > +	There are actually at least three.
> > +	The third, by Dominik Dingel, makes interesting use of
> > +	reader-writer locking, and may be found in \co{lockrwdeq.c}.
> > +
> > +\QuickQ{}
> >  	The tandem double-ended queue runs about twice as fast as
> >  	the hashed double-ended queue, even when I increase the
> >  	size of the hash table to an insanely large number.
> > 
> 


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 0/2] CodeSample: rwdeq
  2015-08-19 21:22       ` Paul E. McKenney
@ 2015-08-20 17:36         ` Dominik Dingel
  2015-08-21 16:21           ` Paul E. McKenney
  0 siblings, 1 reply; 9+ messages in thread
From: Dominik Dingel @ 2015-08-20 17:36 UTC (permalink / raw)
  To: paulmck; +Cc: perfbook, Dominik Dingel

On Wed, 19 Aug 2015 14:22:03 -0700
"Paul E. McKenney" <paulmck@linux.vnet.ibm.com> wrote:

> On Wed, Aug 19, 2015 at 09:06:04PM +0200, Dominik Dingel wrote:
> > On Sun, 16 Aug 2015 23:22:19 -0700
> > "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> wrote:
> > 
> > > On Thu, Aug 13, 2015 at 09:43:01AM -0700, Paul E. McKenney wrote:
> > > > On Wed, Aug 12, 2015 at 09:55:22AM +0200, Dominik Dingel wrote:
> > > > > From: Dominik Dingel <dingel@linux.vnet.ibm.com>
> > > > > 
> > > > > Hi all,
> > > > > 
> > > > > here is a somehow naive implementation of a reader/writer lock based approch
> > > > > for the parallel deq problem.
> > > > > 
> > > > > The next step would be to use it as an introduction example in the text, as due
> > > > > its way of implementation (taking always a read lock, working always with an
> > > > > atomic counter) it is a less efficient way of doing things, compared with the
> > > > > tandem approch. Then the benefits of partitioning might be even more obvious.
> > > > > 
> > > > > Thanks
> > > > > 	Dominik
> > > > 
> > > > Queued, thank you, Dominik!
> > > > 
> > > > A quick unscientific run leads me to believe that this is slightly faster
> > > > than the hashed implementation (not bad!), but about twice as slow as
> > > > the tandem version.  Of course, considerable variation is to be expected
> > > > based on workload.
> > 
> > Sounds good, during sniff tests I saw even slightly less performance than your hashed implementation. 
> 
> It is probably possible to construct a workload that makes either look
> good or bad.  ;-)
> 
> > > And, as discussed earlier, here the commit adding a Quick Quiz referring
> > > to your algorithm.  Thoughts?
> > 
> > Looks good! Thanks for putting it in.
> > My plan, as soon as I get to:
> > - Rework the section to start with my solution
> > - Show the disadvantages of my solution
> >   -> Atomic and read/writer lock (resulting in bouncing cache lines)
> > - Show your tandem solution which will avoid that overhead and will only
> > in the worst case really do synchronization of the complete data structure.
> > 
> > What do you think?
> 
> I was actually planning to push the hashed version to either a quick
> quiz or an appendix in order to make room for other sections to
> expand.  One approach would be to put both your implementation and
> the hashed implementation in the "Important Questions" section for
> the second edition.


Sounds reasonable. You have any time for releasing in mind?


> Later on, I am thinking in terms of a supplement of some sort, so that
> the main book stays at roughly 500 pages, but so that people can
> go look at additional material if they want.  Thoughts?
> 
> 							Thanx, Paul

I encourage to go that way, so people really want to dive deep can do so,
while at the same time you do not scare potential readers, right?

Thanks
	Dominik

> > Thanks
> > 	Dominik
> > 
> > > 							Thanx, Paul
> > > 
> > > ------------------------------------------------------------------------
> > > 
> > >     Add Quick Quiz for Dominik Dingel's solution
> > >     
> > >     Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> > > 
> > > diff --git a/SMPdesign/partexercises.tex b/SMPdesign/partexercises.tex
> > > index a44c285ce134..5d46e70028aa 100644
> > > --- a/SMPdesign/partexercises.tex
> > > +++ b/SMPdesign/partexercises.tex
> > > @@ -683,6 +683,15 @@ Michael~\cite{DBLP:conf/europar/Michael03}.\footnote{
> > >  	Instead, the common compare-and-swap (e.g., x86 cmpxchg)
> > >  	suffices.}
> > >  
> > > +\QuickQuiz{}
> > > +	Why are there not one but two solutions to the double-ended queue
> > > +	problem?
> > > +\QuickQuizAnswer{
> > > +	There are actually at least three.
> > > +	The third, by Dominik Dingel, makes interesting use of
> > > +	reader-writer locking, and may be found in \co{lockrwdeq.c}.
> > > +} \QuickQuizEnd
> > > +
> > >  In fact, as noted by Dice et al.~\cite{DavidDice:2010:SCA:HTM:deque},
> > >  an unsynchronized single-threaded double-ended queue significantly
> > >  outperforms any of the parallel implementations they studied.
> > > diff --git a/qqz.tex b/qqz.tex
> > > index 57cdc43f63d1..cfd0356854b7 100644
> > > --- a/qqz.tex
> > > +++ b/qqz.tex
> > > @@ -2561,6 +2561,14 @@ ret = ~ret & tmp;
> > >  	it is worthwhile) is left as an exercise for the reader.
> > >  
> > >  \QuickQ{}
> > > +	Why are there not one but two solutions to the double-ended queue
> > > +	problem?
> > > +\QuickA{}
> > > +	There are actually at least three.
> > > +	The third, by Dominik Dingel, makes interesting use of
> > > +	reader-writer locking, and may be found in \co{lockrwdeq.c}.
> > > +
> > > +\QuickQ{}
> > >  	The tandem double-ended queue runs about twice as fast as
> > >  	the hashed double-ended queue, even when I increase the
> > >  	size of the hash table to an insanely large number.
> > > 
> > 
> 


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 0/2] CodeSample: rwdeq
  2015-08-20 17:36         ` Dominik Dingel
@ 2015-08-21 16:21           ` Paul E. McKenney
  0 siblings, 0 replies; 9+ messages in thread
From: Paul E. McKenney @ 2015-08-21 16:21 UTC (permalink / raw)
  To: Dominik Dingel; +Cc: perfbook, Dominik Dingel

On Thu, Aug 20, 2015 at 07:36:14PM +0200, Dominik Dingel wrote:
> On Wed, 19 Aug 2015 14:22:03 -0700
> "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> wrote:
> 
> > On Wed, Aug 19, 2015 at 09:06:04PM +0200, Dominik Dingel wrote:
> > > On Sun, 16 Aug 2015 23:22:19 -0700
> > > "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> wrote:
> > > 
> > > > On Thu, Aug 13, 2015 at 09:43:01AM -0700, Paul E. McKenney wrote:
> > > > > On Wed, Aug 12, 2015 at 09:55:22AM +0200, Dominik Dingel wrote:
> > > > > > From: Dominik Dingel <dingel@linux.vnet.ibm.com>
> > > > > > 
> > > > > > Hi all,
> > > > > > 
> > > > > > here is a somehow naive implementation of a reader/writer lock based approch
> > > > > > for the parallel deq problem.
> > > > > > 
> > > > > > The next step would be to use it as an introduction example in the text, as due
> > > > > > its way of implementation (taking always a read lock, working always with an
> > > > > > atomic counter) it is a less efficient way of doing things, compared with the
> > > > > > tandem approch. Then the benefits of partitioning might be even more obvious.
> > > > > > 
> > > > > > Thanks
> > > > > > 	Dominik
> > > > > 
> > > > > Queued, thank you, Dominik!
> > > > > 
> > > > > A quick unscientific run leads me to believe that this is slightly faster
> > > > > than the hashed implementation (not bad!), but about twice as slow as
> > > > > the tandem version.  Of course, considerable variation is to be expected
> > > > > based on workload.
> > > 
> > > Sounds good, during sniff tests I saw even slightly less performance than your hashed implementation. 
> > 
> > It is probably possible to construct a workload that makes either look
> > good or bad.  ;-)
> > 
> > > > And, as discussed earlier, here the commit adding a Quick Quiz referring
> > > > to your algorithm.  Thoughts?
> > > 
> > > Looks good! Thanks for putting it in.
> > > My plan, as soon as I get to:
> > > - Rework the section to start with my solution
> > > - Show the disadvantages of my solution
> > >   -> Atomic and read/writer lock (resulting in bouncing cache lines)
> > > - Show your tandem solution which will avoid that overhead and will only
> > > in the worst case really do synchronization of the complete data structure.
> > > 
> > > What do you think?
> > 
> > I was actually planning to push the hashed version to either a quick
> > quiz or an appendix in order to make room for other sections to
> > expand.  One approach would be to put both your implementation and
> > the hashed implementation in the "Important Questions" section for
> > the second edition.
> 
> Sounds reasonable. You have any time for releasing in mind?

Well, there is a public git tree, so as soon as I get to make the
changes.  I don't yet have a firm date for the second edition, but
probably a small number of years rather than months, depending on
how some work leading up to it goes.

> > Later on, I am thinking in terms of a supplement of some sort, so that
> > the main book stays at roughly 500 pages, but so that people can
> > go look at additional material if they want.  Thoughts?
> > 
> I encourage to go that way, so people really want to dive deep can do so,
> while at the same time you do not scare potential readers, right?

Yep!  Also, some people like a printed copy, and keeping the book under
500 pages keeps the printing costs reasonable.

							Thanx, Paul

> Thanks
> 	Dominik
> 
> > > Thanks
> > > 	Dominik
> > > 
> > > > 							Thanx, Paul
> > > > 
> > > > ------------------------------------------------------------------------
> > > > 
> > > >     Add Quick Quiz for Dominik Dingel's solution
> > > >     
> > > >     Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> > > > 
> > > > diff --git a/SMPdesign/partexercises.tex b/SMPdesign/partexercises.tex
> > > > index a44c285ce134..5d46e70028aa 100644
> > > > --- a/SMPdesign/partexercises.tex
> > > > +++ b/SMPdesign/partexercises.tex
> > > > @@ -683,6 +683,15 @@ Michael~\cite{DBLP:conf/europar/Michael03}.\footnote{
> > > >  	Instead, the common compare-and-swap (e.g., x86 cmpxchg)
> > > >  	suffices.}
> > > >  
> > > > +\QuickQuiz{}
> > > > +	Why are there not one but two solutions to the double-ended queue
> > > > +	problem?
> > > > +\QuickQuizAnswer{
> > > > +	There are actually at least three.
> > > > +	The third, by Dominik Dingel, makes interesting use of
> > > > +	reader-writer locking, and may be found in \co{lockrwdeq.c}.
> > > > +} \QuickQuizEnd
> > > > +
> > > >  In fact, as noted by Dice et al.~\cite{DavidDice:2010:SCA:HTM:deque},
> > > >  an unsynchronized single-threaded double-ended queue significantly
> > > >  outperforms any of the parallel implementations they studied.
> > > > diff --git a/qqz.tex b/qqz.tex
> > > > index 57cdc43f63d1..cfd0356854b7 100644
> > > > --- a/qqz.tex
> > > > +++ b/qqz.tex
> > > > @@ -2561,6 +2561,14 @@ ret = ~ret & tmp;
> > > >  	it is worthwhile) is left as an exercise for the reader.
> > > >  
> > > >  \QuickQ{}
> > > > +	Why are there not one but two solutions to the double-ended queue
> > > > +	problem?
> > > > +\QuickA{}
> > > > +	There are actually at least three.
> > > > +	The third, by Dominik Dingel, makes interesting use of
> > > > +	reader-writer locking, and may be found in \co{lockrwdeq.c}.
> > > > +
> > > > +\QuickQ{}
> > > >  	The tandem double-ended queue runs about twice as fast as
> > > >  	the hashed double-ended queue, even when I increase the
> > > >  	size of the hash table to an insanely large number.
> > > > 
> > > 
> > 
> 


^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2015-08-21 16:21 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-08-12  7:55 [PATCH 0/2] CodeSample: rwdeq Dominik Dingel
2015-08-12  7:55 ` [PATCH 1/2] CodeSamples: Fix compiler warnings Dominik Dingel
2015-08-12  7:55 ` [PATCH 2/2] CodeSamples: add read/write solution to the deq example Dominik Dingel
2015-08-13 16:43 ` [PATCH 0/2] CodeSample: rwdeq Paul E. McKenney
2015-08-17  6:22   ` Paul E. McKenney
2015-08-19 19:06     ` Dominik Dingel
2015-08-19 21:22       ` Paul E. McKenney
2015-08-20 17:36         ` Dominik Dingel
2015-08-21 16:21           ` Paul E. McKenney

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.