* [PATCH -mm -v2 0/5] irq_work, Use llist in irq_work
@ 2011-09-08  6:00 Huang Ying
  2011-09-08  6:00 ` [PATCH -mm -v2 1/5] llist, Make all llist functions inline Huang Ying
                   ` (8 more replies)
  0 siblings, 9 replies; 23+ messages in thread
From: Huang Ying @ 2011-09-08  6:00 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-kernel, Andi Kleen, ying.huang, Peter Zijlstra, Mathieu Desnoyers

Changes:

v2:

- Return whether list is empty before adding in llist_add and use that in
  __irq_work_queue

[PATCH -mm -v2 1/5] llist, Make all llist functions inline
[PATCH -mm -v2 2/5] llist, Define macro to check NMI safe cmpxchg
[PATCH -mm -v2 3/5] llist, Move cpu_relax after cmpxchg
[PATCH -mm -v2 4/5] llist, Return whether list is empty before adding in llist_add
[PATCH -mm -v2 5/5] irq_work, Use llist in irq_work

^ permalink raw reply	[flat|nested] 23+ messages in thread

* [PATCH -mm -v2 1/5] llist, Make all llist functions inline
  2011-09-08  6:00 [PATCH -mm -v2 0/5] irq_work, Use llist in irq_work Huang Ying
@ 2011-09-08  6:00 ` Huang Ying
  2011-09-08  6:00 ` [PATCH -mm -v2 2/5] llist, Define macro to check NMI safe cmpxchg Huang Ying
                   ` (7 subsequent siblings)
  8 siblings, 0 replies; 23+ messages in thread
From: Huang Ying @ 2011-09-08  6:00 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-kernel, Andi Kleen, ying.huang, Peter Zijlstra, Mathieu Desnoyers

Because llist may be used in performance-critical code paths, make all
llist functions inline to avoid function call overhead.
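A minimal usage sketch of the API (illustration only, not part of the
patch itself; it assumes the llist_entry() container_of() helper that
include/linux/llist.h already provides):

	struct my_event {
		int data;
		struct llist_node llnode;
	};

	static LLIST_HEAD(event_list);

	/* producer: lock-less push, newest entry becomes the list head */
	static void post_event(struct my_event *e)
	{
		llist_add(&e->llnode, &event_list);
	}

	/* consumer: detach the whole list at once and walk it */
	static void consume_events(void)
	{
		struct llist_node *n = llist_del_all(&event_list);

		while (n) {
			struct my_event *e = llist_entry(n, struct my_event, llnode);

			n = n->next;
			/* ... process e ... */
		}
	}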

Signed-off-by: Huang Ying <ying.huang@intel.com>
Acked-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Peter Zijlstra <peterz@infradead.org>
---
 drivers/acpi/apei/Kconfig |    1 
 include/linux/llist.h     |  122 +++++++++++++++++++++++++++++++++++++++++--
 lib/Kconfig               |    3 -
 lib/Makefile              |    2 
 lib/llist.c               |  129 ----------------------------------------------
 5 files changed, 116 insertions(+), 141 deletions(-)
 delete mode 100644 lib/llist.c

--- a/drivers/acpi/apei/Kconfig
+++ b/drivers/acpi/apei/Kconfig
@@ -14,7 +14,6 @@ config ACPI_APEI_GHES
 	depends on ACPI_APEI && X86
 	select ACPI_HED
 	select IRQ_WORK
-	select LLIST
 	select GENERIC_ALLOCATOR
 	help
 	  Generic Hardware Error Source provides a way to report
--- a/include/linux/llist.h
+++ b/include/linux/llist.h
@@ -37,8 +37,28 @@
  * architectures that don't have NMI-safe cmpxchg implementation, the
  * list can NOT be used in NMI handler.  So code uses the list in NMI
  * handler should depend on CONFIG_ARCH_HAVE_NMI_SAFE_CMPXCHG.
+ *
+ * Copyright 2010,2011 Intel Corp.
+ *   Author: Huang Ying <ying.huang@intel.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License version
+ * 2 as published by the Free Software Foundation;
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA
  */
 
+#include <linux/kernel.h>
+#include <asm/system.h>
+#include <asm/processor.h>
+
 struct llist_head {
 	struct llist_node *first;
 };
@@ -113,14 +133,104 @@ static inline void init_llist_head(struc
  * test whether the list is empty without deleting something from the
  * list.
  */
-static inline int llist_empty(const struct llist_head *head)
+static inline bool llist_empty(const struct llist_head *head)
 {
 	return ACCESS_ONCE(head->first) == NULL;
 }
 
-void llist_add(struct llist_node *new, struct llist_head *head);
-void llist_add_batch(struct llist_node *new_first, struct llist_node *new_last,
-		     struct llist_head *head);
-struct llist_node *llist_del_first(struct llist_head *head);
-struct llist_node *llist_del_all(struct llist_head *head);
+/**
+ * llist_add - add a new entry
+ * @new:	new entry to be added
+ * @head:	the head for your lock-less list
+ */
+static inline void llist_add(struct llist_node *new, struct llist_head *head)
+{
+	struct llist_node *entry, *old_entry;
+
+#ifndef CONFIG_ARCH_HAVE_NMI_SAFE_CMPXCHG
+	BUG_ON(in_nmi());
+#endif
+
+	entry = head->first;
+	do {
+		old_entry = entry;
+		new->next = entry;
+		cpu_relax();
+	} while ((entry = cmpxchg(&head->first, old_entry, new)) != old_entry);
+}
+
+/**
+ * llist_add_batch - add several linked entries in batch
+ * @new_first:	first entry in batch to be added
+ * @new_last:	last entry in batch to be added
+ * @head:	the head for your lock-less list
+ */
+static inline void llist_add_batch(struct llist_node *new_first,
+				   struct llist_node *new_last,
+				   struct llist_head *head)
+{
+	struct llist_node *entry, *old_entry;
+
+#ifndef CONFIG_ARCH_HAVE_NMI_SAFE_CMPXCHG
+	BUG_ON(in_nmi());
+#endif
+
+	entry = head->first;
+	do {
+		old_entry = entry;
+		new_last->next = entry;
+		cpu_relax();
+	} while ((entry = cmpxchg(&head->first, old_entry, new_first)) != old_entry);
+}
+
+/**
+ * llist_del_first - delete the first entry of lock-less list
+ * @head:	the head for your lock-less list
+ *
+ * If list is empty, return NULL, otherwise, return the first entry
+ * deleted, this is the newest added one.
+ *
+ * Only one llist_del_first user can be used simultaneously with
+ * multiple llist_add users without lock.  Because otherwise
+ * llist_del_first, llist_add, llist_add (or llist_del_all, llist_add,
+ * llist_add) sequence in another user may change @head->first->next,
+ * but keep @head->first.  If multiple consumers are needed, please
+ * use llist_del_all or use lock between consumers.
+ */
+static inline struct llist_node *llist_del_first(struct llist_head *head)
+{
+	struct llist_node *entry, *old_entry, *next;
+
+#ifndef CONFIG_ARCH_HAVE_NMI_SAFE_CMPXCHG
+	BUG_ON(in_nmi());
+#endif
+
+	entry = head->first;
+	do {
+		if (entry == NULL)
+			return NULL;
+		old_entry = entry;
+		next = entry->next;
+		cpu_relax();
+	} while ((entry = cmpxchg(&head->first, old_entry, next)) != old_entry);
+
+	return entry;
+}
+
+/**
+ * llist_del_all - delete all entries from lock-less list
+ * @head:	the head of lock-less list to delete all entries
+ *
+ * If list is empty, return NULL, otherwise, delete all entries and
+ * return the pointer to the first entry.  The order of entries
+ * deleted is from the newest to the oldest added one.
+ */
+static inline struct llist_node *llist_del_all(struct llist_head *head)
+{
+#ifndef CONFIG_ARCH_HAVE_NMI_SAFE_CMPXCHG
+	BUG_ON(in_nmi());
+#endif
+
+	return xchg(&head->first, NULL);
+}
 #endif /* LLIST_H */
--- a/lib/Kconfig
+++ b/lib/Kconfig
@@ -276,7 +276,4 @@ config CORDIC
 	  so its calculations are in fixed point. Modules can select this
 	  when they require this function. Module will be called cordic.
 
-config LLIST
-	bool
-
 endmenu
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -116,8 +116,6 @@ obj-$(CONFIG_CPU_RMAP) += cpu_rmap.o
 
 obj-$(CONFIG_CORDIC) += cordic.o
 
-obj-$(CONFIG_LLIST) += llist.o
-
 hostprogs-y	:= gen_crc32table
 clean-files	:= crc32table.h
 
--- a/lib/llist.c
+++ /dev/null
@@ -1,129 +0,0 @@
-/*
- * Lock-less NULL terminated single linked list
- *
- * The basic atomic operation of this list is cmpxchg on long.  On
- * architectures that don't have NMI-safe cmpxchg implementation, the
- * list can NOT be used in NMI handler.  So code uses the list in NMI
- * handler should depend on CONFIG_ARCH_HAVE_NMI_SAFE_CMPXCHG.
- *
- * Copyright 2010,2011 Intel Corp.
- *   Author: Huang Ying <ying.huang@intel.com>
- *
- * This program is free software; you can redistribute it and/or
- * modify it under the terms of the GNU General Public License version
- * 2 as published by the Free Software Foundation;
- *
- * This program is distributed in the hope that it will be useful,
- * but WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
- * GNU General Public License for more details.
- *
- * You should have received a copy of the GNU General Public License
- * along with this program; if not, write to the Free Software
- * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA
- */
-#include <linux/kernel.h>
-#include <linux/module.h>
-#include <linux/interrupt.h>
-#include <linux/llist.h>
-
-#include <asm/system.h>
-
-/**
- * llist_add - add a new entry
- * @new:	new entry to be added
- * @head:	the head for your lock-less list
- */
-void llist_add(struct llist_node *new, struct llist_head *head)
-{
-	struct llist_node *entry, *old_entry;
-
-#ifndef CONFIG_ARCH_HAVE_NMI_SAFE_CMPXCHG
-	BUG_ON(in_nmi());
-#endif
-
-	entry = head->first;
-	do {
-		old_entry = entry;
-		new->next = entry;
-		cpu_relax();
-	} while ((entry = cmpxchg(&head->first, old_entry, new)) != old_entry);
-}
-EXPORT_SYMBOL_GPL(llist_add);
-
-/**
- * llist_add_batch - add several linked entries in batch
- * @new_first:	first entry in batch to be added
- * @new_last:	last entry in batch to be added
- * @head:	the head for your lock-less list
- */
-void llist_add_batch(struct llist_node *new_first, struct llist_node *new_last,
-		     struct llist_head *head)
-{
-	struct llist_node *entry, *old_entry;
-
-#ifndef CONFIG_ARCH_HAVE_NMI_SAFE_CMPXCHG
-	BUG_ON(in_nmi());
-#endif
-
-	entry = head->first;
-	do {
-		old_entry = entry;
-		new_last->next = entry;
-		cpu_relax();
-	} while ((entry = cmpxchg(&head->first, old_entry, new_first)) != old_entry);
-}
-EXPORT_SYMBOL_GPL(llist_add_batch);
-
-/**
- * llist_del_first - delete the first entry of lock-less list
- * @head:	the head for your lock-less list
- *
- * If list is empty, return NULL, otherwise, return the first entry
- * deleted, this is the newest added one.
- *
- * Only one llist_del_first user can be used simultaneously with
- * multiple llist_add users without lock.  Because otherwise
- * llist_del_first, llist_add, llist_add (or llist_del_all, llist_add,
- * llist_add) sequence in another user may change @head->first->next,
- * but keep @head->first.  If multiple consumers are needed, please
- * use llist_del_all or use lock between consumers.
- */
-struct llist_node *llist_del_first(struct llist_head *head)
-{
-	struct llist_node *entry, *old_entry, *next;
-
-#ifndef CONFIG_ARCH_HAVE_NMI_SAFE_CMPXCHG
-	BUG_ON(in_nmi());
-#endif
-
-	entry = head->first;
-	do {
-		if (entry == NULL)
-			return NULL;
-		old_entry = entry;
-		next = entry->next;
-		cpu_relax();
-	} while ((entry = cmpxchg(&head->first, old_entry, next)) != old_entry);
-
-	return entry;
-}
-EXPORT_SYMBOL_GPL(llist_del_first);
-
-/**
- * llist_del_all - delete all entries from lock-less list
- * @head:	the head of lock-less list to delete all entries
- *
- * If list is empty, return NULL, otherwise, delete all entries and
- * return the pointer to the first entry.  The order of entries
- * deleted is from the newest to the oldest added one.
- */
-struct llist_node *llist_del_all(struct llist_head *head)
-{
-#ifndef CONFIG_ARCH_HAVE_NMI_SAFE_CMPXCHG
-	BUG_ON(in_nmi());
-#endif
-
-	return xchg(&head->first, NULL);
-}
-EXPORT_SYMBOL_GPL(llist_del_all);

^ permalink raw reply	[flat|nested] 23+ messages in thread

* [PATCH -mm -v2 2/5] llist, Define macro to check NMI safe cmpxchg
  2011-09-08  6:00 [PATCH -mm -v2 0/5] irq_work, Use llist in irq_work Huang Ying
  2011-09-08  6:00 ` [PATCH -mm -v2 1/5] llist, Make all llist functions inline Huang Ying
@ 2011-09-08  6:00 ` Huang Ying
  2011-09-08  6:00 ` [PATCH -mm -v2 3/5] llist, Move cpu_relax after cmpxchg Huang Ying
                   ` (6 subsequent siblings)
  8 siblings, 0 replies; 23+ messages in thread
From: Huang Ying @ 2011-09-08  6:00 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-kernel, Andi Kleen, ying.huang, Peter Zijlstra, Mathieu Desnoyers

Define a macro to check for an NMI-safe cmpxchg, to make the code
cleaner and reduce duplication.  Thanks to Peter Zijlstra for the
reminder.

Signed-off-by: Huang Ying <ying.huang@intel.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Peter Zijlstra <peterz@infradead.org>
---
 include/linux/llist.h |   22 ++++++++++------------
 1 file changed, 10 insertions(+), 12 deletions(-)

--- a/include/linux/llist.h
+++ b/include/linux/llist.h
@@ -70,6 +70,12 @@ struct llist_node {
 #define LLIST_HEAD_INIT(name)	{ NULL }
 #define LLIST_HEAD(name)	struct llist_head name = LLIST_HEAD_INIT(name)
 
+#ifdef CONFIG_ARCH_HAVE_NMI_SAFE_CMPXCHG
+#define CHECK_NMI_SAFE_CMPXCHG()
+#else
+#define CHECK_NMI_SAFE_CMPXCHG()	BUG_ON(in_nmi())
+#endif
+
 /**
  * init_llist_head - initialize lock-less list head
  * @head:	the head for your lock-less list
@@ -147,9 +153,7 @@ static inline void llist_add(struct llis
 {
 	struct llist_node *entry, *old_entry;
 
-#ifndef CONFIG_ARCH_HAVE_NMI_SAFE_CMPXCHG
-	BUG_ON(in_nmi());
-#endif
+	CHECK_NMI_SAFE_CMPXCHG();
 
 	entry = head->first;
 	do {
@@ -171,9 +175,7 @@ static inline void llist_add_batch(struc
 {
 	struct llist_node *entry, *old_entry;
 
-#ifndef CONFIG_ARCH_HAVE_NMI_SAFE_CMPXCHG
-	BUG_ON(in_nmi());
-#endif
+	CHECK_NMI_SAFE_CMPXCHG();
 
 	entry = head->first;
 	do {
@@ -201,9 +203,7 @@ static inline struct llist_node *llist_d
 {
 	struct llist_node *entry, *old_entry, *next;
 
-#ifndef CONFIG_ARCH_HAVE_NMI_SAFE_CMPXCHG
-	BUG_ON(in_nmi());
-#endif
+	CHECK_NMI_SAFE_CMPXCHG();
 
 	entry = head->first;
 	do {
@@ -227,9 +227,7 @@ static inline struct llist_node *llist_d
  */
 static inline struct llist_node *llist_del_all(struct llist_head *head)
 {
-#ifndef CONFIG_ARCH_HAVE_NMI_SAFE_CMPXCHG
-	BUG_ON(in_nmi());
-#endif
+	CHECK_NMI_SAFE_CMPXCHG();
 
 	return xchg(&head->first, NULL);
 }

^ permalink raw reply	[flat|nested] 23+ messages in thread

* [PATCH -mm -v2 3/5] llist, Move cpu_relax after cmpxchg
  2011-09-08  6:00 [PATCH -mm -v2 0/5] irq_work, Use llist in irq_work Huang Ying
  2011-09-08  6:00 ` [PATCH -mm -v2 1/5] llist, Make all llist functions inline Huang Ying
  2011-09-08  6:00 ` [PATCH -mm -v2 2/5] llist, Define macro to check NMI safe cmpxchg Huang Ying
@ 2011-09-08  6:00 ` Huang Ying
  2011-09-08  6:00 ` [PATCH -mm -v2 4/5] llist, Return whether list is empty before adding in llist_add Huang Ying
                   ` (5 subsequent siblings)
  8 siblings, 0 replies; 23+ messages in thread
From: Huang Ying @ 2011-09-08  6:00 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-kernel, Andi Kleen, ying.huang, Peter Zijlstra, Mathieu Desnoyers

If the first cmpxchg call succeeds, there is no need to call cpu_relax
at all.  So in a busy loop built around cmpxchg, cpu_relax should go
after the cmpxchg instead of before it.  This patch fixes this in
llist.

Signed-off-by: Huang Ying <ying.huang@intel.com>
Acked-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Peter Zijlstra <peterz@infradead.org>
---
 include/linux/llist.h |   21 +++++++++++++++------
 1 file changed, 15 insertions(+), 6 deletions(-)

--- a/include/linux/llist.h
+++ b/include/linux/llist.h
@@ -156,11 +156,14 @@ static inline void llist_add(struct llis
 	CHECK_NMI_SAFE_CMPXCHG();
 
 	entry = head->first;
-	do {
+	for (;;) {
 		old_entry = entry;
 		new->next = entry;
+		entry = cmpxchg(&head->first, old_entry, new);
+		if (entry == old_entry)
+			break;
 		cpu_relax();
-	} while ((entry = cmpxchg(&head->first, old_entry, new)) != old_entry);
+	}
 }
 
 /**
@@ -178,11 +181,14 @@ static inline void llist_add_batch(struc
 	CHECK_NMI_SAFE_CMPXCHG();
 
 	entry = head->first;
-	do {
+	for (;;) {
 		old_entry = entry;
 		new_last->next = entry;
+		entry = cmpxchg(&head->first, old_entry, new_first);
+		if (entry == old_entry)
+			break;
 		cpu_relax();
-	} while ((entry = cmpxchg(&head->first, old_entry, new_first)) != old_entry);
+	}
 }
 
 /**
@@ -206,13 +212,16 @@ static inline struct llist_node *llist_d
 	CHECK_NMI_SAFE_CMPXCHG();
 
 	entry = head->first;
-	do {
+	for (;;) {
 		if (entry == NULL)
 			return NULL;
 		old_entry = entry;
 		next = entry->next;
+		entry = cmpxchg(&head->first, old_entry, next);
+		if (entry == old_entry)
+			break;
 		cpu_relax();
-	} while ((entry = cmpxchg(&head->first, old_entry, next)) != old_entry);
+	}
 
 	return entry;
 }

^ permalink raw reply	[flat|nested] 23+ messages in thread

* [PATCH -mm -v2 4/5] llist, Return whether list is empty before adding in llist_add
  2011-09-08  6:00 [PATCH -mm -v2 0/5] irq_work, Use llist in irq_work Huang Ying
                   ` (2 preceding siblings ...)
  2011-09-08  6:00 ` [PATCH -mm -v2 3/5] llist, Move cpu_relax after cmpxchg Huang Ying
@ 2011-09-08  6:00 ` Huang Ying
  2011-09-08  6:00 ` [PATCH -mm -v2 5/5] irq_work, Use llist in irq_work Huang Ying
                   ` (4 subsequent siblings)
  8 siblings, 0 replies; 23+ messages in thread
From: Huang Ying @ 2011-09-08  6:00 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-kernel, Andi Kleen, ying.huang, Peter Zijlstra, Mathieu Desnoyers

This is needed by irq_work.  And because the llist_add_xxx functions are
inline, the return value can be optimized out if not needed by callers.
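As a concrete example of the intended use, this is the pattern patch
5/5 below applies in __irq_work_queue():

	empty = llist_add(&work->llnode, &__get_cpu_var(irq_work_list));
	/* the list was empty before the add: raise the self-interrupt */
	if (empty)
		arch_irq_work_raise();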

Signed-off-by: Huang Ying <ying.huang@intel.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Peter Zijlstra <peterz@infradead.org>
---
 include/linux/llist.h |   12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)

--- a/include/linux/llist.h
+++ b/include/linux/llist.h
@@ -148,8 +148,10 @@ static inline bool llist_empty(const str
  * llist_add - add a new entry
  * @new:	new entry to be added
  * @head:	the head for your lock-less list
+ *
+ * Return whether list is empty before adding.
  */
-static inline void llist_add(struct llist_node *new, struct llist_head *head)
+static inline bool llist_add(struct llist_node *new, struct llist_head *head)
 {
 	struct llist_node *entry, *old_entry;
 
@@ -164,6 +166,8 @@ static inline void llist_add(struct llis
 			break;
 		cpu_relax();
 	}
+
+	return old_entry == NULL;
 }
 
 /**
@@ -171,8 +175,10 @@ static inline void llist_add(struct llis
  * @new_first:	first entry in batch to be added
  * @new_last:	last entry in batch to be added
  * @head:	the head for your lock-less list
+ *
+ * Return whether list is empty before adding.
  */
-static inline void llist_add_batch(struct llist_node *new_first,
+static inline bool llist_add_batch(struct llist_node *new_first,
 				   struct llist_node *new_last,
 				   struct llist_head *head)
 {
@@ -189,6 +195,8 @@ static inline void llist_add_batch(struc
 			break;
 		cpu_relax();
 	}
+
+	return old_entry == NULL;
 }
 
 /**

^ permalink raw reply	[flat|nested] 23+ messages in thread

* [PATCH -mm -v2 5/5] irq_work, Use llist in irq_work
  2011-09-08  6:00 [PATCH -mm -v2 0/5] irq_work, Use llist in irq_work Huang Ying
                   ` (3 preceding siblings ...)
  2011-09-08  6:00 ` [PATCH -mm -v2 4/5] llist, Return whether list is empty before adding in llist_add Huang Ying
@ 2011-09-08  6:00 ` Huang Ying
  2011-09-12 14:05 ` [PATCH 6/5] llist: Add llist_next() Peter Zijlstra
                   ` (3 subsequent siblings)
  8 siblings, 0 replies; 23+ messages in thread
From: Huang Ying @ 2011-09-08  6:00 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-kernel, Andi Kleen, ying.huang, Peter Zijlstra, Mathieu Desnoyers

Use llist in irq_work instead of irq_work's own lock-less linked list
implementation, to avoid code duplication.

Signed-off-by: Huang Ying <ying.huang@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
---
 include/linux/irq_work.h |   15 ++++---
 kernel/irq_work.c        |   92 ++++++++++++++++-------------------------------
 2 files changed, 42 insertions(+), 65 deletions(-)

--- a/include/linux/irq_work.h
+++ b/include/linux/irq_work.h
@@ -1,20 +1,23 @@
 #ifndef _LINUX_IRQ_WORK_H
 #define _LINUX_IRQ_WORK_H
 
+#include <linux/llist.h>
+
 struct irq_work {
-	struct irq_work *next;
+	unsigned long flags;
+	struct llist_node llnode;
 	void (*func)(struct irq_work *);
 };
 
 static inline
-void init_irq_work(struct irq_work *entry, void (*func)(struct irq_work *))
+void init_irq_work(struct irq_work *work, void (*func)(struct irq_work *))
 {
-	entry->next = NULL;
-	entry->func = func;
+	work->flags = 0;
+	work->func = func;
 }
 
-bool irq_work_queue(struct irq_work *entry);
+bool irq_work_queue(struct irq_work *work);
 void irq_work_run(void);
-void irq_work_sync(struct irq_work *entry);
+void irq_work_sync(struct irq_work *work);
 
 #endif /* _LINUX_IRQ_WORK_H */
--- a/kernel/irq_work.c
+++ b/kernel/irq_work.c
@@ -10,7 +10,6 @@
 #include <linux/irq_work.h>
 #include <linux/percpu.h>
 #include <linux/hardirq.h>
-#include <asm/processor.h>
 
 /*
  * An entry can be in one of four states:
@@ -19,54 +18,34 @@
  * claimed   NULL, 3 -> {pending}       : claimed to be enqueued
  * pending   next, 3 -> {busy}          : queued, pending callback
  * busy      NULL, 2 -> {free, claimed} : callback in progress, can be claimed
- *
- * We use the lower two bits of the next pointer to keep PENDING and BUSY
- * flags.
  */
 
 #define IRQ_WORK_PENDING	1UL
 #define IRQ_WORK_BUSY		2UL
 #define IRQ_WORK_FLAGS		3UL
 
-static inline bool irq_work_is_set(struct irq_work *entry, int flags)
-{
-	return (unsigned long)entry->next & flags;
-}
-
-static inline struct irq_work *irq_work_next(struct irq_work *entry)
-{
-	unsigned long next = (unsigned long)entry->next;
-	next &= ~IRQ_WORK_FLAGS;
-	return (struct irq_work *)next;
-}
-
-static inline struct irq_work *next_flags(struct irq_work *entry, int flags)
-{
-	unsigned long next = (unsigned long)entry;
-	next |= flags;
-	return (struct irq_work *)next;
-}
-
-static DEFINE_PER_CPU(struct irq_work *, irq_work_list);
+static DEFINE_PER_CPU(struct llist_head, irq_work_list);
 
 /*
  * Claim the entry so that no one else will poke at it.
  */
-static bool irq_work_claim(struct irq_work *entry)
+static bool irq_work_claim(struct irq_work *work)
 {
-	struct irq_work *next, *nflags;
+	unsigned long flags, nflags;
 
-	do {
-		next = entry->next;
-		if ((unsigned long)next & IRQ_WORK_PENDING)
+	for (;;) {
+		flags = work->flags;
+		if (flags & IRQ_WORK_PENDING)
 			return false;
-		nflags = next_flags(next, IRQ_WORK_FLAGS);
-	} while (cmpxchg(&entry->next, next, nflags) != next);
+		nflags = flags | IRQ_WORK_FLAGS;
+		if (cmpxchg(&work->flags, flags, nflags) == flags)
+			break;
+		cpu_relax();
+	}
 
 	return true;
 }
 
-
 void __weak arch_irq_work_raise(void)
 {
 	/*
@@ -77,20 +56,15 @@ void __weak arch_irq_work_raise(void)
 /*
  * Queue the entry and raise the IPI if needed.
  */
-static void __irq_work_queue(struct irq_work *entry)
+static void __irq_work_queue(struct irq_work *work)
 {
-	struct irq_work *next;
+	bool empty;
 
 	preempt_disable();
 
-	do {
-		next = __this_cpu_read(irq_work_list);
-		/* Can assign non-atomic because we keep the flags set. */
-		entry->next = next_flags(next, IRQ_WORK_FLAGS);
-	} while (this_cpu_cmpxchg(irq_work_list, next, entry) != next);
-
+	empty = llist_add(&work->llnode, &__get_cpu_var(irq_work_list));
 	/* The list was empty, raise self-interrupt to start processing. */
-	if (!irq_work_next(entry))
+	if (empty)
 		arch_irq_work_raise();
 
 	preempt_enable();
@@ -102,16 +76,16 @@ static void __irq_work_queue(struct irq_
  *
  * Can be re-enqueued while the callback is still in progress.
  */
-bool irq_work_queue(struct irq_work *entry)
+bool irq_work_queue(struct irq_work *work)
 {
-	if (!irq_work_claim(entry)) {
+	if (!irq_work_claim(work)) {
 		/*
 		 * Already enqueued, can't do!
 		 */
 		return false;
 	}
 
-	__irq_work_queue(entry);
+	__irq_work_queue(work);
 	return true;
 }
 EXPORT_SYMBOL_GPL(irq_work_queue);
@@ -122,34 +96,34 @@ EXPORT_SYMBOL_GPL(irq_work_queue);
  */
 void irq_work_run(void)
 {
-	struct irq_work *list;
+	struct irq_work *work;
+	struct llist_head *this_list;
+	struct llist_node *llnode;
 
-	if (this_cpu_read(irq_work_list) == NULL)
+	this_list = &__get_cpu_var(irq_work_list);
+	if (llist_empty(this_list))
 		return;
 
 	BUG_ON(!in_irq());
 	BUG_ON(!irqs_disabled());
 
-	list = this_cpu_xchg(irq_work_list, NULL);
-
-	while (list != NULL) {
-		struct irq_work *entry = list;
+	llnode = llist_del_all(this_list);
+	while (llnode != NULL) {
+		work = llist_entry(llnode, struct irq_work, llnode);
 
-		list = irq_work_next(list);
+		llnode = llnode->next;
 
 		/*
-		 * Clear the PENDING bit, after this point the @entry
+		 * Clear the PENDING bit, after this point the @work
 		 * can be re-used.
 		 */
-		entry->next = next_flags(NULL, IRQ_WORK_BUSY);
-		entry->func(entry);
+		work->flags = IRQ_WORK_BUSY;
+		work->func(work);
 		/*
 		 * Clear the BUSY bit and return to the free state if
 		 * no-one else claimed it meanwhile.
 		 */
-		(void)cmpxchg(&entry->next,
-			      next_flags(NULL, IRQ_WORK_BUSY),
-			      NULL);
+		(void)cmpxchg(&work->flags, IRQ_WORK_BUSY, 0);
 	}
 }
 EXPORT_SYMBOL_GPL(irq_work_run);
@@ -158,11 +132,11 @@ EXPORT_SYMBOL_GPL(irq_work_run);
  * Synchronize against the irq_work @entry, ensures the entry is not
  * currently in use.
  */
-void irq_work_sync(struct irq_work *entry)
+void irq_work_sync(struct irq_work *work)
 {
 	WARN_ON_ONCE(irqs_disabled());
 
-	while (irq_work_is_set(entry, IRQ_WORK_BUSY))
+	while (work->flags & IRQ_WORK_BUSY)
 		cpu_relax();
 }
 EXPORT_SYMBOL_GPL(irq_work_sync);

^ permalink raw reply	[flat|nested] 23+ messages in thread

* [PATCH 6/5] llist: Add llist_next()
  2011-09-08  6:00 [PATCH -mm -v2 0/5] irq_work, Use llist in irq_work Huang Ying
                   ` (4 preceding siblings ...)
  2011-09-08  6:00 ` [PATCH -mm -v2 5/5] irq_work, Use llist in irq_work Huang Ying
@ 2011-09-12 14:05 ` Peter Zijlstra
  2011-09-12 14:05 ` [PATCH 7/5] sched: Convert to use llist Peter Zijlstra
                   ` (2 subsequent siblings)
  8 siblings, 0 replies; 23+ messages in thread
From: Peter Zijlstra @ 2011-09-12 14:05 UTC (permalink / raw)
  To: Huang Ying; +Cc: Andrew Morton, linux-kernel, Andi Kleen, Mathieu Desnoyers

Subject: llist: Add llist_next()
From: Peter Zijlstra <a.p.zijlstra@chello.nl>
Date: Mon Sep 12 13:12:28 CEST 2011

So we don't have to expose the struct llist_node members.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
---
 include/linux/llist.h |    5 +++++
 kernel/irq_work.c     |    2 +-
 2 files changed, 6 insertions(+), 1 deletion(-)
Index: linux-2.6/include/linux/llist.h
===================================================================
--- linux-2.6.orig/include/linux/llist.h
+++ linux-2.6/include/linux/llist.h
@@ -144,6 +144,11 @@ static inline bool llist_empty(const str
 	return ACCESS_ONCE(head->first) == NULL;
 }
 
+static inline struct llist_node *llist_next(struct llist_node *node)
+{
+	return node->next;
+}
+
 /**
  * llist_add - add a new entry
  * @new:	new entry to be added
Index: linux-2.6/kernel/irq_work.c
===================================================================
--- linux-2.6.orig/kernel/irq_work.c
+++ linux-2.6/kernel/irq_work.c
@@ -110,7 +110,7 @@ void irq_work_run(void)
 	while (llnode != NULL) {
 		work = llist_entry(llnode, struct irq_work, llnode);
 
-		llnode = llnode->next;
+		llnode = llist_next(llnode);
 
 		/*
 		 * Clear the PENDING bit, after this point the @work


^ permalink raw reply	[flat|nested] 23+ messages in thread

* [PATCH 7/5] sched: Convert to use llist
  2011-09-08  6:00 [PATCH -mm -v2 0/5] irq_work, Use llist in irq_work Huang Ying
                   ` (5 preceding siblings ...)
  2011-09-12 14:05 ` [PATCH 6/5] llist: Add llist_next() Peter Zijlstra
@ 2011-09-12 14:05 ` Peter Zijlstra
  2011-09-12 14:05 ` [PATCH 8/5] llist: Remove cpu_relax() usage in cmpxchg loops Peter Zijlstra
  2011-09-12 14:06 ` [PATCH -mm -v2 0/5] irq_work, Use llist in irq_work Peter Zijlstra
  8 siblings, 0 replies; 23+ messages in thread
From: Peter Zijlstra @ 2011-09-12 14:05 UTC (permalink / raw)
  To: Huang Ying; +Cc: Andrew Morton, linux-kernel, Andi Kleen, Mathieu Desnoyers

Subject: sched: Convert to use llist
From: Peter Zijlstra <a.p.zijlstra@chello.nl>
Date: Mon Sep 12 13:06:17 CEST 2011

Use the generic llist primitives.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
---
 include/linux/sched.h |    3 ++-
 kernel/sched.c        |   48 ++++++++++--------------------------------------
 2 files changed, 12 insertions(+), 39 deletions(-)
Index: linux-2.6/include/linux/sched.h
===================================================================
--- linux-2.6.orig/include/linux/sched.h
+++ linux-2.6/include/linux/sched.h
@@ -90,6 +90,7 @@ struct sched_param {
 #include <linux/task_io_accounting.h>
 #include <linux/latencytop.h>
 #include <linux/cred.h>
+#include <linux/llist.h>
 
 #include <asm/processor.h>
 
@@ -1225,7 +1226,7 @@ struct task_struct {
 	unsigned int ptrace;
 
 #ifdef CONFIG_SMP
-	struct task_struct *wake_entry;
+	struct llist_node wake_entry;
 	int on_cpu;
 #endif
 	int on_rq;
Index: linux-2.6/kernel/sched.c
===================================================================
--- linux-2.6.orig/kernel/sched.c
+++ linux-2.6/kernel/sched.c
@@ -702,7 +702,7 @@ struct rq {
 #endif
 
 #ifdef CONFIG_SMP
-	struct task_struct *wake_list;
+	struct llist_head wake_list;
 #endif
 };
 
@@ -2698,42 +2698,26 @@ static int ttwu_remote(struct task_struc
 }
 
 #ifdef CONFIG_SMP
-static void sched_ttwu_do_pending(struct task_struct *list)
+static void sched_ttwu_pending(void)
 {
 	struct rq *rq = this_rq();
+	struct llist_node *llist = llist_del_all(&rq->wake_list);
+	struct task_struct *p;
 
 	raw_spin_lock(&rq->lock);
 
-	while (list) {
-		struct task_struct *p = list;
-		list = list->wake_entry;
+	while (llist) {
+		p = llist_entry(llist, struct task_struct, wake_entry);
+		llist = llist_next(llist);
 		ttwu_do_activate(rq, p, 0);
 	}
 
 	raw_spin_unlock(&rq->lock);
 }
 
-#ifdef CONFIG_HOTPLUG_CPU
-
-static void sched_ttwu_pending(void)
-{
-	struct rq *rq = this_rq();
-	struct task_struct *list = xchg(&rq->wake_list, NULL);
-
-	if (!list)
-		return;
-
-	sched_ttwu_do_pending(list);
-}
-
-#endif /* CONFIG_HOTPLUG_CPU */
-
 void scheduler_ipi(void)
 {
-	struct rq *rq = this_rq();
-	struct task_struct *list = xchg(&rq->wake_list, NULL);
-
-	if (!list)
+	if (llist_empty(&this_rq()->wake_list))
 		return;
 
 	/*
@@ -2750,25 +2734,13 @@ void scheduler_ipi(void)
 	 * somewhat pessimize the simple resched case.
 	 */
 	irq_enter();
-	sched_ttwu_do_pending(list);
+	sched_ttwu_pending();
 	irq_exit();
 }
 
 static void ttwu_queue_remote(struct task_struct *p, int cpu)
 {
-	struct rq *rq = cpu_rq(cpu);
-	struct task_struct *next = rq->wake_list;
-
-	for (;;) {
-		struct task_struct *old = next;
-
-		p->wake_entry = next;
-		next = cmpxchg(&rq->wake_list, old, p);
-		if (next == old)
-			break;
-	}
-
-	if (!next)
+	if (llist_add(&p->wake_entry, &cpu_rq(cpu)->wake_list))
 		smp_send_reschedule(cpu);
 }
 


^ permalink raw reply	[flat|nested] 23+ messages in thread

* [PATCH 8/5] llist: Remove cpu_relax() usage in cmpxchg loops
  2011-09-08  6:00 [PATCH -mm -v2 0/5] irq_work, Use llist in irq_work Huang Ying
                   ` (6 preceding siblings ...)
  2011-09-12 14:05 ` [PATCH 7/5] sched: Convert to use llist Peter Zijlstra
@ 2011-09-12 14:05 ` Peter Zijlstra
  2011-09-12 14:23   ` Andi Kleen
  2011-09-12 14:26   ` Avi Kivity
  2011-09-12 14:06 ` [PATCH -mm -v2 0/5] irq_work, Use llist in irq_work Peter Zijlstra
  8 siblings, 2 replies; 23+ messages in thread
From: Peter Zijlstra @ 2011-09-12 14:05 UTC (permalink / raw)
  To: Huang Ying; +Cc: Andrew Morton, linux-kernel, Andi Kleen, Mathieu Desnoyers

Subject: llist: Remove cpu_relax() usage in cmpxchg loops
From: Peter Zijlstra <a.p.zijlstra@chello.nl>
Date: Mon Sep 12 15:50:49 CEST 2011

Initial benchmarks show they're a net loss (2 socket wsm):

 $ for i in /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor ; do echo performance > $i; done
 $ echo 4096 32000 64 128 > /proc/sys/kernel/sem
 $ ./sembench -t 2048 -w 1900 -o 0

Pre:

run time 30 seconds 778936 worker burns per second
run time 30 seconds 912190 worker burns per second
run time 30 seconds 817506 worker burns per second
run time 30 seconds 830870 worker burns per second
run time 30 seconds 845056 worker burns per second

Post:

run time 30 seconds 905920 worker burns per second
run time 30 seconds 849046 worker burns per second
run time 30 seconds 886286 worker burns per second
run time 30 seconds 822320 worker burns per second
run time 30 seconds 900283 worker burns per second

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
---
 include/linux/llist.h |    3 ---
 1 file changed, 3 deletions(-)

Index: linux-2.6/include/linux/llist.h
===================================================================
--- linux-2.6.orig/include/linux/llist.h
+++ linux-2.6/include/linux/llist.h
@@ -169,7 +169,6 @@ static inline bool llist_add(struct llis
 		entry = cmpxchg(&head->first, old_entry, new);
 		if (entry == old_entry)
 			break;
-		cpu_relax();
 	}
 
 	return old_entry == NULL;
@@ -198,7 +197,6 @@ static inline bool llist_add_batch(struc
 		entry = cmpxchg(&head->first, old_entry, new_first);
 		if (entry == old_entry)
 			break;
-		cpu_relax();
 	}
 
 	return old_entry == NULL;
@@ -233,7 +231,6 @@ static inline struct llist_node *llist_d
 		entry = cmpxchg(&head->first, old_entry, next);
 		if (entry == old_entry)
 			break;
-		cpu_relax();
 	}
 
 	return entry;


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH -mm -v2 0/5] irq_work, Use llist in irq_work
  2011-09-08  6:00 [PATCH -mm -v2 0/5] irq_work, Use llist in irq_work Huang Ying
                   ` (7 preceding siblings ...)
  2011-09-12 14:05 ` [PATCH 8/5] llist: Remove cpu_relax() usage in cmpxchg loops Peter Zijlstra
@ 2011-09-12 14:06 ` Peter Zijlstra
  8 siblings, 0 replies; 23+ messages in thread
From: Peter Zijlstra @ 2011-09-12 14:06 UTC (permalink / raw)
  To: Huang Ying; +Cc: Andrew Morton, linux-kernel, Andi Kleen, Mathieu Desnoyers


Andrew, shall I take all 8 patches through the scheduler tree?

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 8/5] llist: Remove cpu_relax() usage in cmpxchg loops
  2011-09-12 14:05 ` [PATCH 8/5] llist: Remove cpu_relax() usage in cmpxchg loops Peter Zijlstra
@ 2011-09-12 14:23   ` Andi Kleen
  2011-09-12 14:23     ` Peter Zijlstra
  2011-09-12 14:47     ` Mathieu Desnoyers
  2011-09-12 14:26   ` Avi Kivity
  1 sibling, 2 replies; 23+ messages in thread
From: Andi Kleen @ 2011-09-12 14:23 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Huang Ying, Andrew Morton, linux-kernel, Andi Kleen, Mathieu Desnoyers

On Mon, Sep 12, 2011 at 04:05:58PM +0200, Peter Zijlstra wrote:
> Subject: llist: Remove cpu_relax() usage in cmpxchg loops
> From: Peter Zijlstra <a.p.zijlstra@chello.nl>
> Date: Mon Sep 12 15:50:49 CEST 2011
> 
> Initial benchmarks show they're a net loss (2 socket wsm):
> 

May still save power. However:

>  		entry = cmpxchg(&head->first, old_entry, new);
>  		if (entry == old_entry)
>  			break;
> -		cpu_relax();

You need at least a barrier() then, otherwise the compiler could
change the memory order in the loop.

-Andi

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 8/5] llist: Remove cpu_relax() usage in cmpxchg loops
  2011-09-12 14:23   ` Andi Kleen
@ 2011-09-12 14:23     ` Peter Zijlstra
  2011-09-12 14:47     ` Mathieu Desnoyers
  1 sibling, 0 replies; 23+ messages in thread
From: Peter Zijlstra @ 2011-09-12 14:23 UTC (permalink / raw)
  To: Andi Kleen; +Cc: Huang Ying, Andrew Morton, linux-kernel, Mathieu Desnoyers

On Mon, 2011-09-12 at 16:23 +0200, Andi Kleen wrote:
> On Mon, Sep 12, 2011 at 04:05:58PM +0200, Peter Zijlstra wrote:
> > Subject: llist: Remove cpu_relax() usage in cmpxchg loops
> > From: Peter Zijlstra <a.p.zijlstra@chello.nl>
> > Date: Mon Sep 12 15:50:49 CEST 2011
> > 
> > Initial benchmarks show they're a net loss (2 socket wsm):
> > 
> 
> May still save power. However:
> 
> >  		entry = cmpxchg(&head->first, old_entry, new);
> >  		if (entry == old_entry)
> >  			break;
> > -		cpu_relax();
> 
> You need at least a barrier() then, otherwise the compiler could
> change the memory order in the loop.

the cmpxchg is a full memory barrier and compiler barrier.
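For reference, a sketch of llist_add() as it looks with patches 3/5,
4/5 and 8/5 applied (the NMI-safety check from 2/5 omitted for
brevity): since cmpxchg() is a full memory barrier and a compiler
barrier, head->first is re-read on every iteration and nothing can be
hoisted out of the loop, so no explicit barrier() is needed.

	static inline bool llist_add(struct llist_node *new, struct llist_head *head)
	{
		struct llist_node *entry, *old_entry;

		entry = head->first;
		for (;;) {
			old_entry = entry;
			new->next = entry;
			/* full memory barrier and compiler barrier */
			entry = cmpxchg(&head->first, old_entry, new);
			if (entry == old_entry)
				break;
			/* no cpu_relax(): removed by this patch */
		}

		return old_entry == NULL;
	}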

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 8/5] llist: Remove cpu_relax() usage in cmpxchg loops
  2011-09-12 14:05 ` [PATCH 8/5] llist: Remove cpu_relax() usage in cmpxchg loops Peter Zijlstra
  2011-09-12 14:23   ` Andi Kleen
@ 2011-09-12 14:26   ` Avi Kivity
  2011-09-12 14:32     ` Peter Zijlstra
  1 sibling, 1 reply; 23+ messages in thread
From: Avi Kivity @ 2011-09-12 14:26 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Huang Ying, Andrew Morton, linux-kernel, Andi Kleen, Mathieu Desnoyers

On 09/12/2011 05:05 PM, Peter Zijlstra wrote:
> Subject: llist: Remove cpu_relax() usage in cmpxchg loops
> From: Peter Zijlstra<a.p.zijlstra@chello.nl>
> Date: Mon Sep 12 15:50:49 CEST 2011
>
> Initial benchmarks show they're a net loss (2 socket wsm):
>
>   $ for i in /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor ; do echo performance>  $i; done
>   $ echo 4096 32000 64 128>  /proc/sys/kernel/sem
>   $ ./sembench -t 2048 -w 1900 -o 0
>

Was hyperthreading enabled, and were all threads loaded?  cpu_relax 
allows the other thread to make more progress while the spinner relaxes.

-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 8/5] llist: Remove cpu_relax() usage in cmpxchg loops
  2011-09-12 14:26   ` Avi Kivity
@ 2011-09-12 14:32     ` Peter Zijlstra
  2011-09-13 11:43       ` Avi Kivity
  0 siblings, 1 reply; 23+ messages in thread
From: Peter Zijlstra @ 2011-09-12 14:32 UTC (permalink / raw)
  To: Avi Kivity
  Cc: Huang Ying, Andrew Morton, linux-kernel, Andi Kleen, Mathieu Desnoyers

On Mon, 2011-09-12 at 17:26 +0300, Avi Kivity wrote:
> On 09/12/2011 05:05 PM, Peter Zijlstra wrote:
> > Subject: llist: Remove cpu_relax() usage in cmpxchg loops
> > From: Peter Zijlstra<a.p.zijlstra@chello.nl>
> > Date: Mon Sep 12 15:50:49 CEST 2011
> >
> > Initial benchmarks show they're a net loss (2 socket wsm):
> >
> >   $ for i in /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor ; do echo performance>  $i; done
> >   $ echo 4096 32000 64 128>  /proc/sys/kernel/sem
> >   $ ./sembench -t 2048 -w 1900 -o 0
> >
> 
> Was hyperthreading enabled, and were all threads loaded?  cpu_relax 
> allows the other thread to make more progress while the spinner relaxes.

Yeah, with HT enabled, the benchmark runs 2048 tasks and does 1900 task
bulk wakeups or so. Forgot the details, but it basically stresses the
sleep+wakeup paths like nobody's business.



^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 8/5] llist: Remove cpu_relax() usage in cmpxchg loops
  2011-09-12 14:23   ` Andi Kleen
  2011-09-12 14:23     ` Peter Zijlstra
@ 2011-09-12 14:47     ` Mathieu Desnoyers
  2011-09-12 15:09       ` Peter Zijlstra
                         ` (2 more replies)
  1 sibling, 3 replies; 23+ messages in thread
From: Mathieu Desnoyers @ 2011-09-12 14:47 UTC (permalink / raw)
  To: Andi Kleen; +Cc: Peter Zijlstra, Huang Ying, Andrew Morton, linux-kernel

* Andi Kleen (andi@firstfloor.org) wrote:
> On Mon, Sep 12, 2011 at 04:05:58PM +0200, Peter Zijlstra wrote:
> > Subject: llist: Remove cpu_relax() usage in cmpxchg loops
> > From: Peter Zijlstra <a.p.zijlstra@chello.nl>
> > Date: Mon Sep 12 15:50:49 CEST 2011
> > 
> > Initial benchmarks show they're a net loss (2 socket wsm):
> > 
> 
> May still save power.

Looking at kernel/spinlock.c:

void __lockfunc __raw_##op##_lock(locktype##_t *lock) \
{
[...]
                while (!raw_##op##_can_lock(lock) && (lock)->break_lock)\
                        arch_##op##_relax(&lock->raw_lock); \

so basically, in typical locking primitives (spinlock), it looks like
lower power consumption is preferred over getting the raw maximal
performance in fully contended scenarios.

So what is the rationale for making the lock-less list retry scheme
different from spinlocks here?

Thanks,

Mathieu

-- 
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 8/5] llist: Remove cpu_relax() usage in cmpxchg loops
  2011-09-12 14:47     ` Mathieu Desnoyers
@ 2011-09-12 15:09       ` Peter Zijlstra
  2011-09-12 15:24       ` Peter Zijlstra
  2011-09-12 16:38       ` Andi Kleen
  2 siblings, 0 replies; 23+ messages in thread
From: Peter Zijlstra @ 2011-09-12 15:09 UTC (permalink / raw)
  To: Mathieu Desnoyers; +Cc: Andi Kleen, Huang Ying, Andrew Morton, linux-kernel

On Mon, 2011-09-12 at 10:47 -0400, Mathieu Desnoyers wrote:
> * Andi Kleen (andi@firstfloor.org) wrote:
> > On Mon, Sep 12, 2011 at 04:05:58PM +0200, Peter Zijlstra wrote:
> > > Subject: llist: Remove cpu_relax() usage in cmpxchg loops
> > > From: Peter Zijlstra <a.p.zijlstra@chello.nl>
> > > Date: Mon Sep 12 15:50:49 CEST 2011
> > > 
> > > Initial benchmarks show they're a net loss (2 socket wsm):
> > > 
> > 
> > May still save power.
> 
> Looking at kernel/spinlock.c:
> 
> void __lockfunc __raw_##op##_lock(locktype##_t *lock) \
> {
> [...]
>                 while (!raw_##op##_can_lock(lock) && (lock)->break_lock)\
>                         arch_##op##_relax(&lock->raw_lock); \
> 
> so basically, in typical locking primitives (spinlock), it looks like
> lower power consumption is preferred over getting the raw maximal
> performance in fully contended scenarios.

Who says it's about power consumption? That was a baseless claim made by
Andi which you propagate as truth. Typically PAUSE is too short to
really save any power, and the only gain of having it in loops is to
provide a window where another core in the cache domain can have a go.

If power consumption were the prime concern, Intel should fix their
damn mwait implementation so we can use that for locks. Local spinners +
mwait would give a very power-efficient 'spin'-lock.

> So what is the rationale for making the lock-less list retry scheme
> different from spinlocks here?

PAUSE should help in the medium-contention case (it would be pointless
under heavy contention), but hurt the lightly contended case. We have
all kinds of contended spinlocks in the kernel, but we don't as of yet
have very contended llist users in the kernel.

Furthermore, I would argue we should avoid growing them, significantly
contended atomic ops are bad, use a different scheme.


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 8/5] llist: Remove cpu_relax() usage in cmpxchg loops
  2011-09-12 14:47     ` Mathieu Desnoyers
  2011-09-12 15:09       ` Peter Zijlstra
@ 2011-09-12 15:24       ` Peter Zijlstra
  2011-09-12 16:38       ` Andi Kleen
  2 siblings, 0 replies; 23+ messages in thread
From: Peter Zijlstra @ 2011-09-12 15:24 UTC (permalink / raw)
  To: Mathieu Desnoyers; +Cc: Andi Kleen, Huang Ying, Andrew Morton, linux-kernel

On Mon, 2011-09-12 at 17:09 +0200, Peter Zijlstra wrote:

> Furthermore, I would argue we should avoid growing them, significantly
> contended atomic ops are bad, use a different scheme.

To clarify, plain cmpxchg loops are unfair and unbounded in completion
time, and we should therefore avoid any sort of medium/high contention.
If you get into that situation you're doing it wrong.



^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 8/5] llist: Remove cpu_relax() usage in cmpxchg loops
  2011-09-12 14:47     ` Mathieu Desnoyers
  2011-09-12 15:09       ` Peter Zijlstra
  2011-09-12 15:24       ` Peter Zijlstra
@ 2011-09-12 16:38       ` Andi Kleen
  2011-09-12 18:53         ` Peter Zijlstra
  2 siblings, 1 reply; 23+ messages in thread
From: Andi Kleen @ 2011-09-12 16:38 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: Andi Kleen, Peter Zijlstra, Huang Ying, Andrew Morton, linux-kernel

> so basically, in typical locking primitives (spinlock), it looks like
> lower power consumption is preferred over getting the raw maximal

It's not only power, it's:
- Allow the other siblings to make more progress on SMT
- Do some backoff to stress the interconnect less (this is important on >2S):
  a tight loop which constantly writes is an extremely stressful pattern.
- Save some power by allowing the CPU to do more clock gating

-Andi

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 8/5] llist: Remove cpu_relax() usage in cmpxchg loops
  2011-09-12 16:38       ` Andi Kleen
@ 2011-09-12 18:53         ` Peter Zijlstra
  0 siblings, 0 replies; 23+ messages in thread
From: Peter Zijlstra @ 2011-09-12 18:53 UTC (permalink / raw)
  To: Andi Kleen; +Cc: Mathieu Desnoyers, Huang Ying, Andrew Morton, linux-kernel

On Mon, 2011-09-12 at 18:38 +0200, Andi Kleen wrote:
> > so basically, in typical locking primitives (spinlock), it looks like
> > lower power consumption is preferred over getting the raw maximal
> 
> It's not only power, its:
> - Allow the other siblings make more progress on SMT
> - Do some backoff to stress the interconnect less (this is important on >2S):
> A tight loop which constantly writes is a extremly stressfull pattern.
> - Save some power by allowing the CPU to do more clock gating

If you're hitting a cmpxchg hard enough for any of those to make a
difference you're doing it wrong.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 8/5] llist: Remove cpu_relax() usage in cmpxchg loops
  2011-09-12 14:32     ` Peter Zijlstra
@ 2011-09-13 11:43       ` Avi Kivity
  2011-09-13 14:22         ` Peter Zijlstra
  0 siblings, 1 reply; 23+ messages in thread
From: Avi Kivity @ 2011-09-13 11:43 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Huang Ying, Andrew Morton, linux-kernel, Andi Kleen, Mathieu Desnoyers

On 09/12/2011 05:32 PM, Peter Zijlstra wrote:
> On Mon, 2011-09-12 at 17:26 +0300, Avi Kivity wrote:
> >  On 09/12/2011 05:05 PM, Peter Zijlstra wrote:
> >  >  Subject: llist: Remove cpu_relax() usage in cmpxchg loops
> >  >  From: Peter Zijlstra<a.p.zijlstra@chello.nl>
> >  >  Date: Mon Sep 12 15:50:49 CEST 2011
> >  >
> >  >  Initial benchmarks show they're a net loss (2 socket wsm):
> >  >
> >  >    $ for i in /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor ; do echo performance>   $i; done
> >  >    $ echo 4096 32000 64 128>   /proc/sys/kernel/sem
> >  >    $ ./sembench -t 2048 -w 1900 -o 0
> >  >
> >
> >  Was hyperthreading enabled, and were all threads loaded?  cpu_relax
> >  allows the other thread to make more progress while the spinner relaxes.
>
> Yeah, with HT enabled, the benchmark runs 2048 tasks and does 1900 task
> bulk wakeups or so. Forgot the details, but it basically stresses the
> sleep+wakeup paths like nobody's business.

Ok.

Another issue is that hypervisors use PAUSE to detect a spinning guest 
and issue a directed yield to another vcpu.  But for cmpxchg loops, the 
"spinner" would just commit on the next loop, no?  So I think there's no 
objection from that front.

-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 8/5] llist: Remove cpu_relax() usage in cmpxchg loops
  2011-09-13 11:43       ` Avi Kivity
@ 2011-09-13 14:22         ` Peter Zijlstra
  2011-09-13 14:51           ` Avi Kivity
  0 siblings, 1 reply; 23+ messages in thread
From: Peter Zijlstra @ 2011-09-13 14:22 UTC (permalink / raw)
  To: Avi Kivity
  Cc: Huang Ying, Andrew Morton, linux-kernel, Andi Kleen, Mathieu Desnoyers

On Tue, 2011-09-13 at 14:43 +0300, Avi Kivity wrote:

> Another issue is that hypervisors use PAUSE to detect a spinning guest 
> and issue a directed yield to another vcpu.  But for cmpxchg loops, the 
> "spinner" would just commit on the next loop, no?  So I think there's no 
> objection from that front.

Right, we shouldn't ever spend a significant amount of time spinning on
a cmpxchg. If we do, we need to fix that instead.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 8/5] llist: Remove cpu_relax() usage in cmpxchg loops
  2011-09-13 14:22         ` Peter Zijlstra
@ 2011-09-13 14:51           ` Avi Kivity
  2011-09-13 14:53             ` Peter Zijlstra
  0 siblings, 1 reply; 23+ messages in thread
From: Avi Kivity @ 2011-09-13 14:51 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Huang Ying, Andrew Morton, linux-kernel, Andi Kleen, Mathieu Desnoyers

On 09/13/2011 05:22 PM, Peter Zijlstra wrote:
> On Tue, 2011-09-13 at 14:43 +0300, Avi Kivity wrote:
>
> >  Another issue is that hypervisors use PAUSE to detect a spinning guest
> >  and issue a directed yield to another vcpu.  But for cmpxchg loops, the
> >  "spinner" would just commit on the next loop, no?  So I think there's no
> >  objection from that front.
>
> Right, we shouldn't ever spend a significant amount of time spinning on
> a cmpxchg. If we do, we need to fix that instead.

I hate arguing while agreeing, but the issue isn't that we don't spend
significant time spinning, but that there is no owner.  Should the other 
cpu go away, we just pick up a new copy of oldval and complete the 
transaction.

With spinlocks, even if you hold it for just a single guest cycle, the 
situation is different.  If the vcpu that holds the spinlock is 
preempted, the spinner is forced to spin until the owner is rescheduled.
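A minimal sketch of the contrast, with hypothetical helpers just to
illustrate the point:

	/* Lock-free push: no owner.  If the competing vcpu is preempted
	 * right after its cmpxchg succeeded, we simply pick up the new
	 * head value on the next iteration and complete our own update. */
	static void push(struct llist_head *head, struct llist_node *node)
	{
		struct llist_node *old;

		do {
			old = ACCESS_ONCE(head->first);
			node->next = old;
		} while (cmpxchg(&head->first, old, node) != old);
	}

	/* Spinlock acquire: there is an owner.  If the vcpu holding the
	 * lock is preempted, this loop makes no progress until that vcpu
	 * runs again -- the case PAUSE-loop detection is meant to help. */
	static void acquire(arch_spinlock_t *lock)
	{
		while (!arch_spin_trylock(lock))
			cpu_relax();
	}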

-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 8/5] llist: Remove cpu_relax() usage in cmpxchg loops
  2011-09-13 14:51           ` Avi Kivity
@ 2011-09-13 14:53             ` Peter Zijlstra
  0 siblings, 0 replies; 23+ messages in thread
From: Peter Zijlstra @ 2011-09-13 14:53 UTC (permalink / raw)
  To: Avi Kivity
  Cc: Huang Ying, Andrew Morton, linux-kernel, Andi Kleen, Mathieu Desnoyers

On Tue, 2011-09-13 at 17:51 +0300, Avi Kivity wrote:
> 
> I hate arguing while agreeing, but the issue isn't that we don't spend a 
> significant time spinning, but that there is no owner.  Should the other 
> cpu go away, we just pick up a new copy of oldval and complete the 
> transaction. 

Ah, ok. Yes that too ;-)

^ permalink raw reply	[flat|nested] 23+ messages in thread

end of thread, other threads:[~2011-09-13 14:53 UTC | newest]

Thread overview: 23+ messages
2011-09-08  6:00 [PATCH -mm -v2 0/5] irq_work, Use llist in irq_work Huang Ying
2011-09-08  6:00 ` [PATCH -mm -v2 1/5] llist, Make all llist functions inline Huang Ying
2011-09-08  6:00 ` [PATCH -mm -v2 2/5] llist, Define macro to check NMI safe cmpxchg Huang Ying
2011-09-08  6:00 ` [PATCH -mm -v2 3/5] llist, Move cpu_relax after cmpxchg Huang Ying
2011-09-08  6:00 ` [PATCH -mm -v2 4/5] llist, Return whether list is empty before adding in llist_add Huang Ying
2011-09-08  6:00 ` [PATCH -mm -v2 5/5] irq_work, Use llist in irq_work Huang Ying
2011-09-12 14:05 ` [PATCH 6/5] llist: Add llist_next() Peter Zijlstra
2011-09-12 14:05 ` [PATCH 7/5] sched: Convert to use llist Peter Zijlstra
2011-09-12 14:05 ` [PATCH 8/5] llist: Remove cpu_relax() usage in cmpxchg loops Peter Zijlstra
2011-09-12 14:23   ` Andi Kleen
2011-09-12 14:23     ` Peter Zijlstra
2011-09-12 14:47     ` Mathieu Desnoyers
2011-09-12 15:09       ` Peter Zijlstra
2011-09-12 15:24       ` Peter Zijlstra
2011-09-12 16:38       ` Andi Kleen
2011-09-12 18:53         ` Peter Zijlstra
2011-09-12 14:26   ` Avi Kivity
2011-09-12 14:32     ` Peter Zijlstra
2011-09-13 11:43       ` Avi Kivity
2011-09-13 14:22         ` Peter Zijlstra
2011-09-13 14:51           ` Avi Kivity
2011-09-13 14:53             ` Peter Zijlstra
2011-09-12 14:06 ` [PATCH -mm -v2 0/5] irq_work, Use llist in irq_work Peter Zijlstra
