[PATCH -v10 0/4] Lock-less list

* [PATCH -v10 0/4] Lock-less list
@ 2011-01-17  6:16 Huang Ying
  2011-01-17  6:16 ` [PATCH -v10 1/4] Add Kconfig option ARCH_HAVE_NMI_SAFE_CMPXCHG Huang Ying
                   ` (5 more replies)
  0 siblings, 6 replies; 25+ messages in thread
From: Huang Ying @ 2011-01-17  6:16 UTC
  To: Andrew Morton
  Cc: linux-kernel, Andi Kleen, ying.huang, Peter Zijlstra,
	Linus Torvalds, Ingo Molnar, Chris Mason

v10:

- Rebased on latest Linus' tree.
- Revise ARCH_HAVE_NMI_SAFE_CMPXCHG definition for SPARC32 per arch
  maintainer's comments.

v9:

- Split out lock-less allocator, will repost allocator with its user
  in another patchset.
- Use lock-less list in irq_work and replace net/rds/xlist.h

v8:

- Rebased on mmotm 2010-12-02

v7:

- Revise ARCH_HAVE_NMI_SAFE_CMPXCHG definition for some architectures
  according to architecture maintainers' comments.
- Remove spin_trylock_irqsave based fallback for lockless memory allocator,
  because it does not work for !CONFIG_SMP and is not likely to be used.
- Make lockless memory allocator and list not depend on
  ARCH_HAVE_NMI_SAFE_CMPXCHG.  Instead, require the user to depend on it
  when needed.  BUG_ON(in_nmi()) is added where necessary to prevent
  silent races.

v6:

- Revise ARCH_HAVE_NMI_SAFE_CMPXCHG definition for some architectures
  according to architecture maintainers' comments.

v5:

- Add ARCH_HAVE_NMI_SAFE_CMPXCHG
- Add spin_trylock_irqsave based fallback in lockless memory allocator
  if ARCH_HAVE_NMI_SAFE_CMPXCHG=n
- Make lockless list depend on ARCH_HAVE_NMI_SAFE_CMPXCHG

v4:

- Split from APEI patchset
- Update patch description and comments according to ML comments

v3:

- Rework lockless memory allocator and list according to ML comments


[PATCH -v10 1/4] Add Kconfig option ARCH_HAVE_NMI_SAFE_CMPXCHG
[PATCH -v10 2/4] lib, Add lock-less NULL terminated single list
[PATCH -v10 3/4] irq_work, Use llist in irq_work
[PATCH -v10 4/4] net, rds, Replace xlist in net/rds/xlist.h with llist

* [PATCH -v10 1/4] Add Kconfig option ARCH_HAVE_NMI_SAFE_CMPXCHG
  2011-01-17  6:16 [PATCH -v10 0/4] Lock-less list Huang Ying
@ 2011-01-17  6:16 ` Huang Ying
  2011-01-17  6:16 ` [PATCH -v10 2/4] lib, Add lock-less NULL terminated single list Huang Ying
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 25+ messages in thread
From: Huang Ying @ 2011-01-17  6:16 UTC
  To: Andrew Morton
  Cc: linux-kernel, Andi Kleen, ying.huang, Peter Zijlstra,
	Linus Torvalds, Ingo Molnar, Chris Mason, Richard Henderson,
	Russell King, Mikael Starvik, David Howells, Yoshinori Sato,
	Tony Luck, Hirokazu Takata, Geert Uytterhoeven, Michal Simek,
	Ralf Baechle, Kyle McMartin, Martin Schwidefsky, Chen Liqin,
	David S. Miller, Ingo Molnar, Chris Zankel

cmpxchg() is widely used by lockless code, including NMI-safe lockless
code.  But on some architectures, the cmpxchg() implementation is not
NMI-safe; on these architectures the lockless code may need to fall
back to a spin_trylock_irqsave() based implementation.

This patch adds a Kconfig option, ARCH_HAVE_NMI_SAFE_CMPXCHG, so that
NMI-safe lockless code can depend on it or provide a different
implementation according to it.

On many architectures, cmpxchg is only NMI-safe for several specific
operand sizes.  So ARCH_HAVE_NMI_SAFE_CMPXCHG, as defined in this
patch, only guarantees that cmpxchg is NMI-safe for
sizeof(unsigned long).
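
As a rough illustration of a caller that adapts its behaviour to the
option, here is a user-space sketch.  It is not part of this patch and
makes several assumptions: C11 atomics stand in for the kernel's
cmpxchg(), the #define stands in for the Kconfig symbol, and
fake_in_nmi, event_mask and set_event_bit() are hypothetical names.
The fallback shown simply refuses to run in NMI context; it is not the
spin_trylock_irqsave() based fallback mentioned above.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Stand-in for the Kconfig symbol; remove it to exercise the fallback. */
#define CONFIG_ARCH_HAVE_NMI_SAFE_CMPXCHG 1

#ifdef CONFIG_ARCH_HAVE_NMI_SAFE_CMPXCHG
#define NMI_SAFE_CMPXCHG 1
#else
#define NMI_SAFE_CMPXCHG 0
#endif

/* Hypothetical stand-in for the kernel's in_nmi(). */
static bool fake_in_nmi;

/* The operand is an unsigned long, the only size the option vouches for. */
static _Atomic unsigned long event_mask;

/* Set one bit with a compare-and-swap retry loop; returns false if refused. */
static bool set_event_bit(unsigned int bit)
{
	unsigned long old, new;

	/* Without an NMI-safe cmpxchg, refuse to run in NMI context. */
	if (!NMI_SAFE_CMPXCHG && fake_in_nmi)
		return false;

	old = atomic_load(&event_mask);
	do {
		new = old | (1UL << bit);
		/* On failure, old is refreshed with the current value. */
	} while (!atomic_compare_exchange_weak(&event_mask, &old, new));

	return true;
}
```

With the #define removed, set_event_bit() returns false whenever
fake_in_nmi is set, which is roughly the position a caller is in on an
architecture that does not select the option.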

Signed-off-by: Huang Ying <ying.huang@intel.com>
Acked-by: Mike Frysinger <vapier@gentoo.org>
Acked-by: Paul Mundt <lethal@linux-sh.org>
Acked-by: Hans-Christian Egtvedt <hans-christian.egtvedt@atmel.com>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: Chris Metcalf <cmetcalf@tilera.com>
CC: Richard Henderson <rth@twiddle.net>
CC: Russell King <linux@arm.linux.org.uk>
CC: Mikael Starvik <starvik@axis.com>
CC: David Howells <dhowells@redhat.com>
CC: Yoshinori Sato <ysato@users.sourceforge.jp>
CC: Tony Luck <tony.luck@intel.com>
CC: Hirokazu Takata <takata@linux-m32r.org>
CC: Geert Uytterhoeven <geert@linux-m68k.org>
CC: Michal Simek <monstr@monstr.eu>
CC: Ralf Baechle <ralf@linux-mips.org>
CC: Kyle McMartin <kyle@mcmartin.ca>
CC: Martin Schwidefsky <schwidefsky@de.ibm.com>
CC: Chen Liqin <liqin.chen@sunplusct.com>
CC: "David S. Miller" <davem@davemloft.net>
CC: Ingo Molnar <mingo@redhat.com>
CC: Chris Zankel <chris@zankel.net>
---
 arch/Kconfig         |    3 +++
 arch/alpha/Kconfig   |    1 +
 arch/avr32/Kconfig   |    1 +
 arch/frv/Kconfig     |    1 +
 arch/ia64/Kconfig    |    1 +
 arch/m68k/Kconfig    |    1 +
 arch/parisc/Kconfig  |    1 +
 arch/powerpc/Kconfig |    1 +
 arch/s390/Kconfig    |    1 +
 arch/sh/Kconfig      |    1 +
 arch/sparc/Kconfig   |    1 +
 arch/tile/Kconfig    |    1 +
 arch/x86/Kconfig     |    1 +
 13 files changed, 15 insertions(+)

--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -178,4 +178,7 @@ config HAVE_ARCH_JUMP_LABEL
 config HAVE_ARCH_MUTEX_CPU_RELAX
 	bool
 
+config ARCH_HAVE_NMI_SAFE_CMPXCHG
+	bool
+
 source "kernel/gcov/Kconfig"
--- a/arch/alpha/Kconfig
+++ b/arch/alpha/Kconfig
@@ -7,6 +7,7 @@ config ALPHA
 	select HAVE_SYSCALL_WRAPPERS
 	select HAVE_IRQ_WORK
 	select HAVE_PERF_EVENTS
+	select ARCH_HAVE_NMI_SAFE_CMPXCHG
 	select HAVE_DMA_ATTRS
 	help
 	  The Alpha is a 64-bit general-purpose processor designed and
--- a/arch/avr32/Kconfig
+++ b/arch/avr32/Kconfig
@@ -6,6 +6,7 @@ config AVR32
 	select HAVE_CLK
 	select HAVE_OPROFILE
 	select HAVE_KPROBES
+	select ARCH_HAVE_NMI_SAFE_CMPXCHG
 	help
 	  AVR32 is a high-performance 32-bit RISC microprocessor core,
 	  designed for cost-sensitive embedded applications, with particular
--- a/arch/frv/Kconfig
+++ b/arch/frv/Kconfig
@@ -5,6 +5,7 @@ config FRV
 	select HAVE_ARCH_TRACEHOOK
 	select HAVE_IRQ_WORK
 	select HAVE_PERF_EVENTS
+	select ARCH_HAVE_NMI_SAFE_CMPXCHG
 
 config ZONE_DMA
 	bool
--- a/arch/ia64/Kconfig
+++ b/arch/ia64/Kconfig
@@ -22,6 +22,7 @@ config IA64
 	select HAVE_KVM
 	select HAVE_ARCH_TRACEHOOK
 	select HAVE_DMA_API_DEBUG
+	select ARCH_HAVE_NMI_SAFE_CMPXCHG
 	default y
 	help
 	  The Itanium Processor Family is Intel's 64-bit successor to
--- a/arch/m68k/Kconfig
+++ b/arch/m68k/Kconfig
@@ -4,6 +4,7 @@ config M68K
 	select HAVE_AOUT
 	select HAVE_IDE
 	select GENERIC_ATOMIC64
+	select ARCH_HAVE_NMI_SAFE_CMPXCHG if RMW_INSNS
 
 config MMU
 	bool
--- a/arch/parisc/Kconfig
+++ b/arch/parisc/Kconfig
@@ -11,6 +11,7 @@ config PARISC
 	select BUG
 	select HAVE_IRQ_WORK
 	select HAVE_PERF_EVENTS
+	select ARCH_HAVE_NMI_SAFE_CMPXCHG
 	select GENERIC_ATOMIC64 if !64BIT
 	select GENERIC_HARDIRQS_NO__DO_IRQ
 	help
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -141,6 +141,7 @@ config PPC
 	select GENERIC_ATOMIC64 if PPC32
 	select HAVE_IRQ_WORK
 	select HAVE_PERF_EVENTS
+	select ARCH_HAVE_NMI_SAFE_CMPXCHG
 	select HAVE_REGS_AND_STACK_ACCESS_API
 	select HAVE_HW_BREAKPOINT if PERF_EVENTS && PPC_BOOK3S_64
 
--- a/arch/s390/Kconfig
+++ b/arch/s390/Kconfig
@@ -81,6 +81,7 @@ config S390
 	select INIT_ALL_POSSIBLE
 	select HAVE_IRQ_WORK
 	select HAVE_PERF_EVENTS
+	select ARCH_HAVE_NMI_SAFE_CMPXCHG
 	select HAVE_KERNEL_GZIP
 	select HAVE_KERNEL_BZIP2
 	select HAVE_KERNEL_LZMA
--- a/arch/sh/Kconfig
+++ b/arch/sh/Kconfig
@@ -11,6 +11,7 @@ config SUPERH
 	select HAVE_DMA_ATTRS
 	select HAVE_IRQ_WORK
 	select HAVE_PERF_EVENTS
+	select ARCH_HAVE_NMI_SAFE_CMPXCHG if (GUSA_RB || CPU_SH4A)
 	select PERF_USE_VMALLOC
 	select HAVE_KERNEL_GZIP
 	select HAVE_KERNEL_BZIP2
--- a/arch/sparc/Kconfig
+++ b/arch/sparc/Kconfig
@@ -50,6 +50,7 @@ config SPARC64
 	select RTC_DRV_STARFIRE
 	select HAVE_PERF_EVENTS
 	select PERF_USE_VMALLOC
+	select ARCH_HAVE_NMI_SAFE_CMPXCHG
 
 config ARCH_DEFCONFIG
 	string
--- a/arch/tile/Kconfig
+++ b/arch/tile/Kconfig
@@ -104,6 +104,7 @@ config TILE
 	select GENERIC_FIND_NEXT_BIT
 	select USE_GENERIC_SMP_HELPERS
 	select CC_OPTIMIZE_FOR_SIZE
+	select ARCH_HAVE_NMI_SAFE_CMPXCHG
 
 # FIXME: investigate whether we need/want these options.
 #	select HAVE_IOREMAP_PROT
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -23,6 +23,7 @@ config X86
 	select HAVE_OPROFILE
 	select HAVE_PERF_EVENTS
 	select HAVE_IRQ_WORK
+	select ARCH_HAVE_NMI_SAFE_CMPXCHG if !M386
 	select HAVE_IOREMAP_PROT
 	select HAVE_KPROBES
 	select HAVE_MEMBLOCK

* [PATCH -v10 2/4] lib, Add lock-less NULL terminated single list
  2011-01-17  6:16 [PATCH -v10 0/4] Lock-less list Huang Ying
  2011-01-17  6:16 ` [PATCH -v10 1/4] Add Kconfig option ARCH_HAVE_NMI_SAFE_CMPXCHG Huang Ying
@ 2011-01-17  6:16 ` Huang Ying
  2011-01-17  6:16 ` [PATCH -v10 3/4] irq_work, Use llist in irq_work Huang Ying
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 25+ messages in thread
From: Huang Ying @ 2011-01-17  6:16 UTC
  To: Andrew Morton
  Cc: linux-kernel, Andi Kleen, ying.huang, Peter Zijlstra,
	Linus Torvalds, Ingo Molnar, Chris Mason

cmpxchg() is used to implement adding a new entry to the list, deleting
all entries from the list, deleting the first entry of the list, and
some other operations.

Because this is a singly linked list, the tail cannot be accessed in O(1).

If there are multiple producers and multiple consumers, llist_add can
be used in producers and llist_del_all can be used in consumers.  They
can work simultaneously without a lock.  But llist_del_first cannot be
used here, because llist_del_first depends on list->first->next not
changing as long as list->first does not change during its operation,
and a llist_del_first, llist_add, llist_add sequence in another
consumer may violate that.
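
To make the failing interleaving concrete, the following user-space
sketch replays it step by step in a single thread.  It is only an
illustration under assumptions: C11 atomics replace cmpxchg(), and
struct node, struct head, push(), pop() and replay_aba() are made-up
names, not the llist API.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

struct node { struct node *next; };
struct head { _Atomic(struct node *) first; };

/* Same shape as llist_add: link in front of the current first entry. */
static void push(struct node *n, struct head *h)
{
	struct node *old = atomic_load(&h->first);

	do {
		n->next = old;
	} while (!atomic_compare_exchange_weak(&h->first, &old, n));
}

/* Same shape as llist_del_first: swing first to first->next. */
static struct node *pop(struct head *h)
{
	struct node *old = atomic_load(&h->first);

	do {
		if (old == NULL)
			return NULL;
	} while (!atomic_compare_exchange_weak(&h->first, &old, old->next));

	return old;
}

/* Replay the bad interleaving; returns 1 if node c was silently lost. */
static int replay_aba(void)
{
	struct node a, b, c;
	struct head h;
	struct node *seen_first, *seen_next, *expected;

	atomic_init(&h.first, NULL);
	push(&b, &h);
	push(&a, &h);				/* list: a -> b */

	/* Consumer 1 begins a pop and samples first and first->next ... */
	seen_first = atomic_load(&h.first);	/* a */
	seen_next = seen_first->next;		/* b */

	/* ... then consumer 2 runs del_first, add, add. */
	pop(&h);				/* list: b */
	push(&c, &h);				/* list: c -> b */
	push(&a, &h);				/* list: a -> c -> b */

	/* Consumer 1 resumes: first is a again, so its stale CAS succeeds. */
	expected = seen_first;
	atomic_compare_exchange_strong(&h.first, &expected, seen_next);

	/* The list is now just b; c has dropped out of it. */
	return atomic_load(&h.first) == &b && b.next == NULL;
}
```

The compare-and-swap only checks that first still points at a; it
cannot see that a->next changed in the meantime, which is why a second
concurrent llist_del_first user is ruled out.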

If there are multiple producers and one consumer, llist_add can be
used in producers and llist_del_all or llist_del_first can be used in
the consumer.

The list entries deleted via llist_del_all can be traversed with a
traversing function such as llist_for_each.  But the list entries
cannot be traversed safely before being deleted from the list without
proper synchronization with the list consumers.
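
The add, delete-all, then traverse pattern described above can be
sketched in user space as follows.  This is a minimal sketch, assuming
C11 atomics instead of the kernel's cmpxchg()/xchg(); struct lnode,
struct lhead, struct item and the sketch_*/demo_* functions are
illustrative names only.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

struct lnode { struct lnode *next; };
struct lhead { _Atomic(struct lnode *) first; };

/* A payload that embeds a list node, as llist users would. */
struct item {
	int value;
	struct lnode node;
};

/* Producer side: push one node with a compare-and-swap retry loop. */
static void sketch_add(struct lnode *n, struct lhead *h)
{
	struct lnode *old = atomic_load(&h->first);

	do {
		n->next = old;
	} while (!atomic_compare_exchange_weak(&h->first, &old, n));
}

/* Consumer side: detach the whole chain with a single exchange. */
static struct lnode *sketch_del_all(struct lhead *h)
{
	return atomic_exchange(&h->first, (struct lnode *)NULL);
}

/* Walk a detached chain; safe because no one else can reach it now. */
static int sum_detached(struct lnode *n)
{
	int sum = 0;

	for (; n; n = n->next) {
		/* Open-coded container_of(): node pointer back to its item. */
		struct item *it = (struct item *)
			((char *)n - offsetof(struct item, node));
		sum += it->value;
	}
	return sum;
}

static int demo_sum(void)
{
	struct item items[3] = { { .value = 1 }, { .value = 2 }, { .value = 4 } };
	struct lhead h;
	size_t i;

	atomic_init(&h.first, NULL);
	for (i = 0; i < 3; i++)
		sketch_add(&items[i].node, &h);

	return sum_detached(sketch_del_all(&h));
}
```

The traversal starts from the pointer returned by the delete-all step
rather than from the list head, which matches the rule that entries
are only walked once they are off the shared list.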

The basic atomic operation of this list is cmpxchg on long.  On
architectures that don't have an NMI-safe cmpxchg implementation, the
list can NOT be used in an NMI handler.  So code that uses the list in
an NMI handler should depend on CONFIG_ARCH_HAVE_NMI_SAFE_CMPXCHG.

Signed-off-by: Huang Ying <ying.huang@intel.com>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
---
 include/linux/llist.h |   99 +++++++++++++++++++++++++++++++++++++++++
 lib/Kconfig           |    3 +
 lib/Makefile          |    2 
 lib/llist.c           |  119 ++++++++++++++++++++++++++++++++++++++++++++++++++
 4 files changed, 223 insertions(+)
 create mode 100644 include/linux/llist.h
 create mode 100644 lib/llist.c

--- /dev/null
+++ b/include/linux/llist.h
@@ -0,0 +1,99 @@
+#ifndef LLIST_H
+#define LLIST_H
+/*
+ * Lock-less NULL terminated single linked list
+ *
+ * If there are multiple producers and multiple consumers, llist_add
+ * can be used in producers and llist_del_all can be used in
+ * consumers.  They can work simultaneously without lock.  But
+ * llist_del_first can not be used here, because llist_del_first
+ * depends on list->first->next not changing while list->first is
+ * unchanged during its operation, and a llist_del_first, llist_add,
+ * llist_add sequence in another consumer may violate that.
+ *
+ * If there are multiple producers and one consumer, llist_add can be
+ * used in producers and llist_del_all or llist_del_first can be used
+ * in the consumer.
+ *
+ * The list entries deleted via llist_del_all can be traversed with
+ * traversing function such as llist_for_each etc.  But the list
+ * entries can not be traversed safely before deleted from the list
+ * without proper synchronization with the list consumers.
+ *
+ * The basic atomic operation of this list is cmpxchg on long.  On
+ * architectures that don't have NMI-safe cmpxchg implementation, the
+ * list can NOT be used in NMI handler.  Code that uses the list in an
+ * NMI handler should depend on CONFIG_ARCH_HAVE_NMI_SAFE_CMPXCHG.
+ */
+
+struct llist_head {
+	struct llist_node *first;
+};
+
+struct llist_node {
+	struct llist_node *next;
+};
+
+#define LLIST_HEAD_INIT(name)	{ NULL }
+#define LLIST_HEAD(name)	struct llist_head name = LLIST_HEAD_INIT(name)
+
+/**
+ * init_llist_head - initialize lock-less list head
+ * @list:	the head for your lock-less list
+ */
+static inline void init_llist_head(struct llist_head *list)
+{
+	list->first = NULL;
+}
+
+/**
+ * llist_entry - get the struct of this entry
+ * @ptr:	the &struct llist_node pointer.
+ * @type:	the type of the struct this is embedded in.
+ * @member:	the name of the llist_node within the struct.
+ */
+#define llist_entry(ptr, type, member)		\
+	container_of(ptr, type, member)
+
+/**
+ * llist_for_each - iterate over some entries of a lock-less list
+ * @pos:	the &struct llist_node to use as a loop cursor
+ * @node:	the first entry of deleted list entries
+ *
+ * In general, some entries of the lock-less list can be traversed
+ * safely only after being deleted from list, so start with an entry
+ * instead of list head.
+ */
+#define llist_for_each(pos, node)			\
+	for (pos = (node); pos; pos = pos->next)
+
+/**
+ * llist_for_each_entry - iterate over some entries of lock-less list of given type
+ * @pos:	the type * to use as a loop cursor.
+ * @node:	the first entry of deleted list entries.
+ * @member:	the name of the llist_node within the struct.
+ *
+ * In general, some entries of the lock-less list can be traversed
+ * safely only after being removed from list, so start with an entry
+ * instead of list head.
+ */
+#define llist_for_each_entry(pos, node, member)				\
+	for (pos = llist_entry((node), typeof(*pos), member);		\
+	     &pos->member != NULL;					\
+	     pos = llist_entry(pos->member.next, typeof(*pos), member))
+
+/**
+ * llist_empty - tests whether a lock-less list is empty
+ * @head:	the list to test
+ */
+static inline int llist_empty(const struct llist_head *head)
+{
+	return head->first == NULL;
+}
+
+void llist_add(struct llist_node *new, struct llist_head *head);
+void llist_add_batch(struct llist_node *new_first, struct llist_node *new_last,
+		     struct llist_head *head);
+struct llist_node *llist_del_first(struct llist_head *head);
+struct llist_node *llist_del_all(struct llist_head *head);
+#endif /* LLIST_H */
--- a/lib/Kconfig
+++ b/lib/Kconfig
@@ -219,4 +219,7 @@ config LRU_CACHE
 config AVERAGE
 	bool
 
+config LLIST
+	bool
+
 endmenu
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -110,6 +110,8 @@ obj-$(CONFIG_ATOMIC64_SELFTEST) += atomi
 
 obj-$(CONFIG_AVERAGE) += average.o
 
+obj-$(CONFIG_LLIST) += llist.o
+
 hostprogs-y	:= gen_crc32table
 clean-files	:= crc32table.h
 
--- /dev/null
+++ b/lib/llist.c
@@ -0,0 +1,119 @@
+/*
+ * Lock-less NULL terminated single linked list
+ *
+ * The basic atomic operation of this list is cmpxchg on long.  On
+ * architectures that don't have NMI-safe cmpxchg implementation, the
+ * list can NOT be used in NMI handler.  Code that uses the list in an
+ * NMI handler should depend on CONFIG_ARCH_HAVE_NMI_SAFE_CMPXCHG.
+ *
+ * Copyright 2010 Intel Corp.
+ *   Author: Huang Ying <ying.huang@intel.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License version
+ * 2 as published by the Free Software Foundation;
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA
+ */
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/interrupt.h>
+#include <linux/llist.h>
+
+#include <asm/system.h>
+
+/**
+ * llist_add - add a new entry
+ * @new:	new entry to be added
+ * @head:	the head for your lock-less list
+ */
+void llist_add(struct llist_node *new, struct llist_head *head)
+{
+	struct llist_node *entry;
+
+#ifndef CONFIG_ARCH_HAVE_NMI_SAFE_CMPXCHG
+	BUG_ON(in_nmi());
+#endif
+
+	do {
+		entry = head->first;
+		new->next = entry;
+	} while (cmpxchg(&head->first, entry, new) != entry);
+}
+EXPORT_SYMBOL_GPL(llist_add);
+
+/**
+ * llist_add_batch - add several linked entries in batch
+ * @new_first:	first entry in batch to be added
+ * @new_last:	last entry in batch to be added
+ * @head:	the head for your lock-less list
+ */
+void llist_add_batch(struct llist_node *new_first, struct llist_node *new_last,
+		     struct llist_head *head)
+{
+	struct llist_node *entry;
+
+#ifndef CONFIG_ARCH_HAVE_NMI_SAFE_CMPXCHG
+	BUG_ON(in_nmi());
+#endif
+
+	do {
+		entry = head->first;
+		new_last->next = entry;
+	} while (cmpxchg(&head->first, entry, new_first) != entry);
+}
+EXPORT_SYMBOL_GPL(llist_add_batch);
+
+/**
+ * llist_del_first - delete the first entry of lock-less list
+ * @head:	the head for your lock-less list
+ *
+ * If list is empty, return NULL, otherwise, return the first entry deleted.
+ *
+ * Only one llist_del_first user can be used simultaneously with
+ * multiple llist_add users without lock. Because otherwise
+ * llist_del_first, llist_add, llist_add sequence in another user may
+ * change @head->first->next, but keep @head->first. If multiple
+ * consumers are needed, please use llist_del_all.
+ */
+struct llist_node *llist_del_first(struct llist_head *head)
+{
+	struct llist_node *entry;
+
+#ifndef CONFIG_ARCH_HAVE_NMI_SAFE_CMPXCHG
+	BUG_ON(in_nmi());
+#endif
+
+	do {
+		entry = head->first;
+		if (entry == NULL)
+			return NULL;
+	} while (cmpxchg(&head->first, entry, entry->next) != entry);
+
+	return entry;
+}
+EXPORT_SYMBOL_GPL(llist_del_first);
+
+/**
+ * llist_del_all - delete all entries from lock-less list
+ * @head:	the head of lock-less list to delete all entries
+ *
+ * If list is empty, return NULL, otherwise, delete all entries and
+ * return the pointer to the first entry.
+ */
+struct llist_node *llist_del_all(struct llist_head *head)
+{
+#ifndef CONFIG_ARCH_HAVE_NMI_SAFE_CMPXCHG
+	BUG_ON(in_nmi());
+#endif
+
+	return xchg(&head->first, NULL);
+}
+EXPORT_SYMBOL_GPL(llist_del_all);

* [PATCH -v10 3/4] irq_work, Use llist in irq_work
  2011-01-17  6:16 [PATCH -v10 0/4] Lock-less list Huang Ying
  2011-01-17  6:16 ` [PATCH -v10 1/4] Add Kconfig option ARCH_HAVE_NMI_SAFE_CMPXCHG Huang Ying
  2011-01-17  6:16 ` [PATCH -v10 2/4] lib, Add lock-less NULL terminated single list Huang Ying
@ 2011-01-17  6:16 ` Huang Ying
  2011-01-17  6:16 ` [PATCH -v10 4/4] net, rds, Replace xlist in net/rds/xlist.h with llist Huang Ying
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 25+ messages in thread
From: Huang Ying @ 2011-01-17  6:16 UTC
  To: Andrew Morton
  Cc: linux-kernel, Andi Kleen, ying.huang, Peter Zijlstra,
	Linus Torvalds, Ingo Molnar, Chris Mason

Use llist in irq_work in place of irq_work's own lock-less linked list
implementation, to avoid code duplication.
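
For readers unfamiliar with the flag protocol that irq_work keeps
across this change (PENDING and BUSY move from the low bits of the
next pointer into a separate flags word), here is a user-space sketch
of the claim and run steps.  It assumes C11 atomics in place of
cmpxchg(); struct work, work_claim() and work_run() are hypothetical
names, and the runs counter stands in for calling work->func(work).

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define WORK_PENDING	1UL
#define WORK_BUSY	2UL
#define WORK_FLAGS	3UL

struct work {
	_Atomic unsigned long flags;
	int runs;		/* stand-in for the callback's effect */
};

/* free -> claimed: fails if the work is already pending. */
static bool work_claim(struct work *w)
{
	unsigned long flags = atomic_load(&w->flags);

	do {
		if (flags & WORK_PENDING)
			return false;
		/* On CAS failure, flags is refreshed and re-checked. */
	} while (!atomic_compare_exchange_weak(&w->flags, &flags,
					       flags | WORK_FLAGS));
	return true;
}

/* pending -> busy -> free, as the run loop does for each entry. */
static void work_run(struct work *w)
{
	unsigned long busy = WORK_BUSY;

	/* Drop PENDING but keep BUSY: the work may be claimed again now. */
	atomic_store(&w->flags, WORK_BUSY);
	w->runs++;
	/* Return to free unless someone re-claimed it meanwhile. */
	atomic_compare_exchange_strong(&w->flags, &busy, 0UL);
}
```

A second work_claim() on an already pending work returns false, which
corresponds to irq_work_queue() reporting "already enqueued".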

Signed-off-by: Huang Ying <ying.huang@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
---
 include/linux/irq_work.h |   15 ++++---
 init/Kconfig             |    1 
 kernel/irq_work.c        |   92 ++++++++++++++++++-----------------------------
 3 files changed, 47 insertions(+), 61 deletions(-)

--- a/include/linux/irq_work.h
+++ b/include/linux/irq_work.h
@@ -1,20 +1,23 @@
 #ifndef _LINUX_IRQ_WORK_H
 #define _LINUX_IRQ_WORK_H
 
+#include <linux/llist.h>
+
 struct irq_work {
-	struct irq_work *next;
+	unsigned long flags;
+	struct llist_node llnode;
 	void (*func)(struct irq_work *);
 };
 
 static inline
-void init_irq_work(struct irq_work *entry, void (*func)(struct irq_work *))
+void init_irq_work(struct irq_work *work, void (*func)(struct irq_work *))
 {
-	entry->next = NULL;
-	entry->func = func;
+	work->flags = 0;
+	work->func = func;
 }
 
-bool irq_work_queue(struct irq_work *entry);
+bool irq_work_queue(struct irq_work *work);
 void irq_work_run(void);
-void irq_work_sync(struct irq_work *entry);
+void irq_work_sync(struct irq_work *work);
 
 #endif /* _LINUX_IRQ_WORK_H */
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -27,6 +27,7 @@ config HAVE_IRQ_WORK
 config IRQ_WORK
 	bool
 	depends on HAVE_IRQ_WORK
+	select LLIST
 
 menu "General setup"
 
--- a/kernel/irq_work.c
+++ b/kernel/irq_work.c
@@ -17,49 +17,34 @@
  * claimed   NULL, 3 -> {pending}       : claimed to be enqueued
  * pending   next, 3 -> {busy}          : queued, pending callback
  * busy      NULL, 2 -> {free, claimed} : callback in progress, can be claimed
- *
- * We use the lower two bits of the next pointer to keep PENDING and BUSY
- * flags.
  */
 
 #define IRQ_WORK_PENDING	1UL
 #define IRQ_WORK_BUSY		2UL
 #define IRQ_WORK_FLAGS		3UL
 
-static inline bool irq_work_is_set(struct irq_work *entry, int flags)
-{
-	return (unsigned long)entry->next & flags;
-}
-
-static inline struct irq_work *irq_work_next(struct irq_work *entry)
-{
-	unsigned long next = (unsigned long)entry->next;
-	next &= ~IRQ_WORK_FLAGS;
-	return (struct irq_work *)next;
-}
+#define LIST_NONEMPTY_BIT	0
 
-static inline struct irq_work *next_flags(struct irq_work *entry, int flags)
-{
-	unsigned long next = (unsigned long)entry;
-	next |= flags;
-	return (struct irq_work *)next;
-}
+struct irq_work_list {
+	unsigned long flags;
+	struct llist_head llist;
+};
 
-static DEFINE_PER_CPU(struct irq_work *, irq_work_list);
+static DEFINE_PER_CPU(struct irq_work_list, irq_work_lists);
 
 /*
  * Claim the entry so that no one else will poke at it.
  */
-static bool irq_work_claim(struct irq_work *entry)
+static bool irq_work_claim(struct irq_work *work)
 {
-	struct irq_work *next, *nflags;
+	unsigned long flags, nflags;
 
 	do {
-		next = entry->next;
-		if ((unsigned long)next & IRQ_WORK_PENDING)
+		flags = work->flags;
+		if (flags & IRQ_WORK_PENDING)
 			return false;
-		nflags = next_flags(next, IRQ_WORK_FLAGS);
-	} while (cmpxchg(&entry->next, next, nflags) != next);
+		nflags = flags | IRQ_WORK_FLAGS;
+	} while (cmpxchg(&work->flags, flags, nflags) != flags);
 
 	return true;
 }
@@ -75,23 +60,19 @@ void __weak arch_irq_work_raise(void)
 /*
  * Queue the entry and raise the IPI if needed.
  */
-static void __irq_work_queue(struct irq_work *entry)
+static void __irq_work_queue(struct irq_work *work)
 {
-	struct irq_work *next;
+	struct irq_work_list *irq_work_list;
 
-	preempt_disable();
+	irq_work_list = &get_cpu_var(irq_work_lists);
 
-	do {
-		next = __this_cpu_read(irq_work_list);
-		/* Can assign non-atomic because we keep the flags set. */
-		entry->next = next_flags(next, IRQ_WORK_FLAGS);
-	} while (this_cpu_cmpxchg(irq_work_list, next, entry) != next);
+	llist_add(&work->llnode, &irq_work_list->llist);
 
 	/* The list was empty, raise self-interrupt to start processing. */
-	if (!irq_work_next(entry))
+	if (!test_and_set_bit(LIST_NONEMPTY_BIT, &irq_work_list->flags))
 		arch_irq_work_raise();
 
-	preempt_enable();
+	put_cpu_var(irq_work_list);
 }
 
 /*
@@ -100,16 +81,16 @@ static void __irq_work_queue(struct irq_
  *
  * Can be re-enqueued while the callback is still in progress.
  */
-bool irq_work_queue(struct irq_work *entry)
+bool irq_work_queue(struct irq_work *work)
 {
-	if (!irq_work_claim(entry)) {
+	if (!irq_work_claim(work)) {
 		/*
 		 * Already enqueued, can't do!
 		 */
 		return false;
 	}
 
-	__irq_work_queue(entry);
+	__irq_work_queue(work);
 	return true;
 }
 EXPORT_SYMBOL_GPL(irq_work_queue);
@@ -120,34 +101,35 @@ EXPORT_SYMBOL_GPL(irq_work_queue);
  */
 void irq_work_run(void)
 {
-	struct irq_work *list;
+	struct irq_work *work;
+	struct irq_work_list *irq_work_list;
+	struct llist_node *llnode;
 
-	if (this_cpu_read(irq_work_list) == NULL)
+	irq_work_list = &__get_cpu_var(irq_work_lists);
+	if (llist_empty(&irq_work_list->llist))
 		return;
 
 	BUG_ON(!in_irq());
 	BUG_ON(!irqs_disabled());
 
-	list = this_cpu_xchg(irq_work_list, NULL);
-
-	while (list != NULL) {
-		struct irq_work *entry = list;
+	clear_bit(LIST_NONEMPTY_BIT, &irq_work_list->flags);
+	llnode = llist_del_all(&irq_work_list->llist);
+	while (llnode != NULL) {
+		work = llist_entry(llnode, struct irq_work, llnode);
 
-		list = irq_work_next(list);
+		llnode = llnode->next;
 
 		/*
-		 * Clear the PENDING bit, after this point the @entry
+		 * Clear the PENDING bit, after this point the @work
 		 * can be re-used.
 		 */
-		entry->next = next_flags(NULL, IRQ_WORK_BUSY);
-		entry->func(entry);
+		work->flags = IRQ_WORK_BUSY;
+		work->func(work);
 		/*
 		 * Clear the BUSY bit and return to the free state if
 		 * no-one else claimed it meanwhile.
 		 */
-		(void)cmpxchg(&entry->next,
-			      next_flags(NULL, IRQ_WORK_BUSY),
-			      NULL);
+		(void)cmpxchg(&work->flags, IRQ_WORK_BUSY, 0);
 	}
 }
 EXPORT_SYMBOL_GPL(irq_work_run);
@@ -156,11 +138,11 @@ EXPORT_SYMBOL_GPL(irq_work_run);
  * Synchronize against the irq_work @entry, ensures the entry is not
  * currently in use.
  */
-void irq_work_sync(struct irq_work *entry)
+void irq_work_sync(struct irq_work *work)
 {
 	WARN_ON_ONCE(irqs_disabled());
 
-	while (irq_work_is_set(entry, IRQ_WORK_BUSY))
+	while (work->flags & IRQ_WORK_BUSY)
 		cpu_relax();
 }
 EXPORT_SYMBOL_GPL(irq_work_sync);

* [PATCH -v10 4/4] net, rds, Replace xlist in net/rds/xlist.h with llist
  2011-01-17  6:16 [PATCH -v10 0/4] Lock-less list Huang Ying
                   ` (2 preceding siblings ...)
  2011-01-17  6:16 ` [PATCH -v10 3/4] irq_work, Use llist in irq_work Huang Ying
@ 2011-01-17  6:16 ` Huang Ying
  2011-01-19 21:55 ` [PATCH -v10 0/4] Lock-less list Andrew Morton
  2011-01-20  5:55 ` Mathieu Desnoyers
  5 siblings, 0 replies; 25+ messages in thread
From: Huang Ying @ 2011-01-17  6:16 UTC
  To: Andrew Morton
  Cc: linux-kernel, Andi Kleen, ying.huang, Peter Zijlstra,
	Linus Torvalds, Ingo Molnar, Chris Mason

The functionality of xlist and llist is almost the same.  This patch
replaces xlist with llist to avoid code duplication.
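
One step of the conversion is turning a set of entries into a chain of
linked nodes that is then put on the shared list in one batch
operation.  A user-space sketch of that idea follows; it is an
illustration under assumptions, with C11 atomics in place of
cmpxchg(), an array in place of the list_head walk, and struct bnode,
struct bhead, chain_nodes(), add_batch() and count_after_batch() as
made-up names.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

struct bnode { struct bnode *next; };
struct bhead { _Atomic(struct bnode *) first; };

/* Link n array elements into a NULL-terminated chain; report both ends. */
static void chain_nodes(struct bnode *nodes, size_t n,
			struct bnode **first, struct bnode **last)
{
	struct bnode *cur = NULL;
	struct bnode **next = first;
	size_t i;

	for (i = 0; i < n; i++) {
		cur = &nodes[i];
		*next = cur;
		next = &cur->next;
	}
	*next = NULL;
	*last = cur;
}

/* Splice a prepared chain in front of the list with one CAS loop. */
static void add_batch(struct bnode *new_first, struct bnode *new_last,
		      struct bhead *h)
{
	struct bnode *old = atomic_load(&h->first);

	do {
		new_last->next = old;
	} while (!atomic_compare_exchange_weak(&h->first, &old, new_first));
}

/* Add a three-node batch to a one-node list and count the result. */
static size_t count_after_batch(void)
{
	struct bnode existing = { NULL };
	struct bnode batch[3];
	struct bnode *first, *last, *n;
	struct bhead h;
	size_t count = 0;

	atomic_init(&h.first, &existing);
	chain_nodes(batch, 3, &first, &last);
	add_batch(first, last, &h);

	for (n = atomic_load(&h.first); n; n = n->next)
		count++;
	return count;
}
```

Only the tail's next pointer and the list head are written during the
batch add, so the cost of publishing the chain does not depend on its
length.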

Known issues: don't know how to test this, need special hardware?

Signed-off-by: Huang Ying <ying.huang@intel.com>
Cc: Chris Mason <chris.mason@oracle.com>
---
 net/rds/Kconfig   |    1 
 net/rds/ib_rdma.c |  112 ++++++++++++++++++++++++------------------------------
 net/rds/xlist.h   |   80 --------------------------------------
 3 files changed, 52 insertions(+), 141 deletions(-)
 delete mode 100644 net/rds/xlist.h

--- a/net/rds/Kconfig
+++ b/net/rds/Kconfig
@@ -9,6 +9,7 @@ config RDS
 
 config RDS_RDMA
 	tristate "RDS over Infiniband and iWARP"
+	select LLIST
 	depends on RDS && INFINIBAND && INFINIBAND_ADDR_TRANS
 	---help---
 	  Allow RDS to use Infiniband and iWARP as a transport.
--- a/net/rds/ib_rdma.c
+++ b/net/rds/ib_rdma.c
@@ -33,10 +33,10 @@
 #include <linux/kernel.h>
 #include <linux/slab.h>
 #include <linux/rculist.h>
+#include <linux/llist.h>
 
 #include "rds.h"
 #include "ib.h"
-#include "xlist.h"
 
 static struct workqueue_struct *rds_ib_fmr_wq;
 
@@ -51,7 +51,7 @@ struct rds_ib_mr {
 	struct rds_ib_mr_pool	*pool;
 	struct ib_fmr		*fmr;
 
-	struct xlist_head	xlist;
+	struct llist_node	llnode;
 
 	/* unmap_list is for freeing */
 	struct list_head	unmap_list;
@@ -73,9 +73,9 @@ struct rds_ib_mr_pool {
 	atomic_t		item_count;		/* total # of MRs */
 	atomic_t		dirty_count;		/* # dirty of MRs */
 
-	struct xlist_head	drop_list;		/* MRs that have reached their max_maps limit */
-	struct xlist_head	free_list;		/* unused MRs */
-	struct xlist_head	clean_list;		/* global unused & unamapped MRs */
+	struct llist_head	drop_list;		/* MRs that have reached their max_maps limit */
+	struct llist_head	free_list;		/* unused MRs */
+	struct llist_head	clean_list;		/* global unused & unamapped MRs */
 	wait_queue_head_t	flush_wait;
 
 	atomic_t		free_pinned;		/* memory pinned by free MRs */
@@ -222,9 +222,9 @@ struct rds_ib_mr_pool *rds_ib_create_mr_
 	if (!pool)
 		return ERR_PTR(-ENOMEM);
 
-	INIT_XLIST_HEAD(&pool->free_list);
-	INIT_XLIST_HEAD(&pool->drop_list);
-	INIT_XLIST_HEAD(&pool->clean_list);
+	init_llist_head(&pool->free_list);
+	init_llist_head(&pool->drop_list);
+	init_llist_head(&pool->clean_list);
 	mutex_init(&pool->flush_lock);
 	init_waitqueue_head(&pool->flush_wait);
 	INIT_DELAYED_WORK(&pool->flush_worker, rds_ib_mr_pool_flush_worker);
@@ -262,26 +262,18 @@ void rds_ib_destroy_mr_pool(struct rds_i
 	kfree(pool);
 }
 
-static void refill_local(struct rds_ib_mr_pool *pool, struct xlist_head *xl,
-			 struct rds_ib_mr **ibmr_ret)
-{
-	struct xlist_head *ibmr_xl;
-	ibmr_xl = xlist_del_head_fast(xl);
-	*ibmr_ret = list_entry(ibmr_xl, struct rds_ib_mr, xlist);
-}
-
 static inline struct rds_ib_mr *rds_ib_reuse_fmr(struct rds_ib_mr_pool *pool)
 {
 	struct rds_ib_mr *ibmr = NULL;
-	struct xlist_head *ret;
+	struct llist_node *ret;
 	unsigned long *flag;
 
 	preempt_disable();
 	flag = &__get_cpu_var(clean_list_grace);
 	set_bit(CLEAN_LIST_BUSY_BIT, flag);
-	ret = xlist_del_head(&pool->clean_list);
+	ret = llist_del_first(&pool->clean_list);
 	if (ret)
-		ibmr = list_entry(ret, struct rds_ib_mr, xlist);
+		ibmr = llist_entry(ret, struct rds_ib_mr, llnode);
 
 	clear_bit(CLEAN_LIST_BUSY_BIT, flag);
 	preempt_enable();
@@ -531,46 +523,44 @@ static inline unsigned int rds_ib_flush_
 }
 
 /*
- * given an xlist of mrs, put them all into the list_head for more processing
+ * given an llist of mrs, put them all into the list_head for more processing
  */
-static void xlist_append_to_list(struct xlist_head *xlist, struct list_head *list)
+static void llist_append_to_list(struct llist_head *llist, struct list_head *list)
 {
 	struct rds_ib_mr *ibmr;
-	struct xlist_head splice;
-	struct xlist_head *cur;
-	struct xlist_head *next;
-
-	splice.next = NULL;
-	xlist_splice(xlist, &splice);
-	cur = splice.next;
-	while (cur) {
-		next = cur->next;
-		ibmr = list_entry(cur, struct rds_ib_mr, xlist);
+	struct llist_node *node;
+	struct llist_node *next;
+
+	node = llist_del_all(llist);
+	while (node) {
+		next = node->next;
+		ibmr = llist_entry(node, struct rds_ib_mr, llnode);
 		list_add_tail(&ibmr->unmap_list, list);
-		cur = next;
+		node = next;
 	}
 }
 
 /*
- * this takes a list head of mrs and turns it into an xlist of clusters.
- * each cluster has an xlist of MR_CLUSTER_SIZE mrs that are ready for
- * reuse.
+ * this takes a list head of mrs and turns it into linked llist nodes
+ * of clusters.  Each cluster has linked llist nodes of
+ * MR_CLUSTER_SIZE mrs that are ready for reuse.
  */
-static void list_append_to_xlist(struct rds_ib_mr_pool *pool,
-				struct list_head *list, struct xlist_head *xlist,
-				struct xlist_head **tail_ret)
+static void list_to_llist_nodes(struct rds_ib_mr_pool *pool,
+				struct list_head *list,
+				struct llist_node **nodes_head,
+				struct llist_node **nodes_tail)
 {
 	struct rds_ib_mr *ibmr;
-	struct xlist_head *cur_mr = xlist;
-	struct xlist_head *tail_mr = NULL;
+	struct llist_node *cur = NULL;
+	struct llist_node **next = nodes_head;
 
 	list_for_each_entry(ibmr, list, unmap_list) {
-		tail_mr = &ibmr->xlist;
-		tail_mr->next = NULL;
-		cur_mr->next = tail_mr;
-		cur_mr = tail_mr;
+		cur = &ibmr->llnode;
+		*next = cur;
+		next = &cur->next;
 	}
-	*tail_ret = tail_mr;
+	*next = NULL;
+	*nodes_tail = cur;
 }
 
 /*
@@ -583,8 +573,8 @@ static int rds_ib_flush_mr_pool(struct r
 			        int free_all, struct rds_ib_mr **ibmr_ret)
 {
 	struct rds_ib_mr *ibmr, *next;
-	struct xlist_head clean_xlist;
-	struct xlist_head *clean_tail;
+	struct llist_node *clean_nodes;
+	struct llist_node *clean_tail;
 	LIST_HEAD(unmap_list);
 	LIST_HEAD(fmr_list);
 	unsigned long unpinned = 0;
@@ -605,7 +595,7 @@ static int rds_ib_flush_mr_pool(struct r
 
 			prepare_to_wait(&pool->flush_wait, &wait,
 					TASK_UNINTERRUPTIBLE);
-			if (xlist_empty(&pool->clean_list))
+			if (llist_empty(&pool->clean_list))
 				schedule();
 
 			ibmr = rds_ib_reuse_fmr(pool);
@@ -630,10 +620,10 @@ static int rds_ib_flush_mr_pool(struct r
 	/* Get the list of all MRs to be dropped. Ordering matters -
 	 * we want to put drop_list ahead of free_list.
 	 */
-	xlist_append_to_list(&pool->drop_list, &unmap_list);
-	xlist_append_to_list(&pool->free_list, &unmap_list);
+	llist_append_to_list(&pool->drop_list, &unmap_list);
+	llist_append_to_list(&pool->free_list, &unmap_list);
 	if (free_all)
-		xlist_append_to_list(&pool->clean_list, &unmap_list);
+		llist_append_to_list(&pool->clean_list, &unmap_list);
 
 	free_goal = rds_ib_flush_goal(pool, free_all);
 
@@ -665,22 +655,22 @@ static int rds_ib_flush_mr_pool(struct r
 	if (!list_empty(&unmap_list)) {
 		/* we have to make sure that none of the things we're about
 		 * to put on the clean list would race with other cpus trying
-		 * to pull items off.  The xlist would explode if we managed to
+		 * to pull items off.  The llist would explode if we managed to
 		 * remove something from the clean list and then add it back again
-		 * while another CPU was spinning on that same item in xlist_del_head.
+		 * while another CPU was spinning on that same item in llist_del_first.
 		 *
-		 * This is pretty unlikely, but just in case  wait for an xlist grace period
+		 * This is pretty unlikely, but just in case  wait for an llist grace period
 		 * here before adding anything back into the clean list.
 		 */
 		wait_clean_list_grace();
 
-		list_append_to_xlist(pool, &unmap_list, &clean_xlist, &clean_tail);
+		list_to_llist_nodes(pool, &unmap_list, &clean_nodes, &clean_tail);
 		if (ibmr_ret)
-			refill_local(pool, &clean_xlist, ibmr_ret);
+			*ibmr_ret = llist_entry(clean_nodes, struct rds_ib_mr, llnode);
 
-		/* refill_local may have emptied our list */
-		if (!xlist_empty(&clean_xlist))
-			xlist_add(clean_xlist.next, clean_tail, &pool->clean_list);
+		/* more than one entry in llist nodes */
+		if (clean_nodes->next)
+			llist_add_batch(clean_nodes->next, clean_tail, &pool->clean_list);
 
 	}
 
@@ -731,9 +721,9 @@ void rds_ib_free_mr(void *trans_private,
 
 	/* Return it to the pool's free list */
 	if (ibmr->remap_count >= pool->fmr_attr.max_maps)
-		xlist_add(&ibmr->xlist, &ibmr->xlist, &pool->drop_list);
+		llist_add(&ibmr->llnode, &pool->drop_list);
 	else
-		xlist_add(&ibmr->xlist, &ibmr->xlist, &pool->free_list);
+		llist_add(&ibmr->llnode, &pool->free_list);
 
 	atomic_add(ibmr->sg_len, &pool->free_pinned);
 	atomic_inc(&pool->dirty_count);
--- a/net/rds/xlist.h
+++ /dev/null
@@ -1,80 +0,0 @@
-#ifndef _LINUX_XLIST_H
-#define _LINUX_XLIST_H
-
-#include <linux/stddef.h>
-#include <linux/poison.h>
-#include <linux/prefetch.h>
-#include <asm/system.h>
-
-struct xlist_head {
-	struct xlist_head *next;
-};
-
-static inline void INIT_XLIST_HEAD(struct xlist_head *list)
-{
-	list->next = NULL;
-}
-
-static inline int xlist_empty(struct xlist_head *head)
-{
-	return head->next == NULL;
-}
-
-static inline void xlist_add(struct xlist_head *new, struct xlist_head *tail,
-			     struct xlist_head *head)
-{
-	struct xlist_head *cur;
-	struct xlist_head *check;
-
-	while (1) {
-		cur = head->next;
-		tail->next = cur;
-		check = cmpxchg(&head->next, cur, new);
-		if (check == cur)
-			break;
-	}
-}
-
-static inline struct xlist_head *xlist_del_head(struct xlist_head *head)
-{
-	struct xlist_head *cur;
-	struct xlist_head *check;
-	struct xlist_head *next;
-
-	while (1) {
-		cur = head->next;
-		if (!cur)
-			goto out;
-
-		next = cur->next;
-		check = cmpxchg(&head->next, cur, next);
-		if (check == cur)
-			goto out;
-	}
-out:
-	return cur;
-}
-
-static inline struct xlist_head *xlist_del_head_fast(struct xlist_head *head)
-{
-	struct xlist_head *cur;
-
-	cur = head->next;
-	if (!cur)
-		return NULL;
-
-	head->next = cur->next;
-	return cur;
-}
-
-static inline void xlist_splice(struct xlist_head *list,
-				struct xlist_head *head)
-{
-	struct xlist_head *cur;
-
-	WARN_ON(head->next);
-	cur = xchg(&list->next, NULL);
-	head->next = cur;
-}
-
-#endif

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH -v10 0/4] Lock-less list
  2011-01-17  6:16 [PATCH -v10 0/4] Lock-less list Huang Ying
                   ` (3 preceding siblings ...)
  2011-01-17  6:16 ` [PATCH -v10 4/4] net, rds, Replace xlist in net/rds/xlist.h with llist Huang Ying
@ 2011-01-19 21:55 ` Andrew Morton
  2011-01-20  0:45   ` Huang Ying
  2011-01-20  5:55 ` Mathieu Desnoyers
  5 siblings, 1 reply; 25+ messages in thread
From: Andrew Morton @ 2011-01-19 21:55 UTC (permalink / raw)
  To: Huang Ying
  Cc: linux-kernel, Andi Kleen, Peter Zijlstra, Linus Torvalds,
	Ingo Molnar, Chris Mason

I'm trying to remember why we're talking about this.

You had an ACPI-based "hardware error reporting" thing.  And that
required an nmi-context memory allocator.  And that required a
"lockless" list implementation.

Yes?

If so, what happened to all of that?  I assume that the facilities
which this patch series adds will be used for "hardware error reporting"?

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH -v10 0/4] Lock-less list
  2011-01-19 21:55 ` [PATCH -v10 0/4] Lock-less list Andrew Morton
@ 2011-01-20  0:45   ` Huang Ying
  2011-01-20  0:52     ` Andrew Morton
  0 siblings, 1 reply; 25+ messages in thread
From: Huang Ying @ 2011-01-20  0:45 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-kernel, Andi Kleen, Peter Zijlstra, Linus Torvalds,
	Ingo Molnar, Chris Mason

On Thu, 2011-01-20 at 05:55 +0800, Andrew Morton wrote:
> I'm trying to remember why we're talking about this.
> 
> You had an ACPI-based "hardware error reporting" thing.  And that
> required an nmi-context memory allocator.  And that required a
> "lockless" list implementation.
> 
> Yes?

Yes.  But the "lockless" list implementation is general; it can be used
by other parts of the kernel too, such as irq_work and the xlist in
net/rds/xlist.h, as done in this patchset.

Best Regards,
Huang Ying



^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH -v10 0/4] Lock-less list
  2011-01-20  0:45   ` Huang Ying
@ 2011-01-20  0:52     ` Andrew Morton
  2011-01-20  1:09       ` Huang Ying
  2011-01-20 10:44       ` Peter Zijlstra
  0 siblings, 2 replies; 25+ messages in thread
From: Andrew Morton @ 2011-01-20  0:52 UTC (permalink / raw)
  To: Huang Ying
  Cc: linux-kernel, Andi Kleen, Peter Zijlstra, Linus Torvalds,
	Ingo Molnar, Chris Mason

On Thu, 20 Jan 2011 08:45:58 +0800
Huang Ying <ying.huang@intel.com> wrote:

> On Thu, 2011-01-20 at 05:55 +0800, Andrew Morton wrote:
> > I'm trying to remember why we're talking about this.
> > 
> > You had an ACPI-based "hardware error reporting" thing.  And that
> > required an nmi-context memory allocator.  And that required a
> > "lockless" list implementation.
> > 
> > Yes?
> 
> Yes.  But the "lockless" list implementation is general, it can be used
> by other part of kernel too,  such as irq_work and xlist in
> net/rds/xlist.h in the patchset.

Well.  Lots of things are general but that doesn't mean we toss them
into the kernel when we already have plenty of infrastructure to handle
that sort of thing.

otoh, hoisting xlist.h out of net/rds and making it generally available
is a good thing.

otooh, net/rds/ probably didn't need xlist at all and could have used
existing general code.

So...  I'd say that unless and until the NMI-context allocator is
merged, the case for merging the lockless list code is a bit marginal? 
Or have you identified other code sites which could use llist and which
would gain some benefit from migrating?

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH -v10 0/4] Lock-less list
  2011-01-20  0:52     ` Andrew Morton
@ 2011-01-20  1:09       ` Huang Ying
  2011-01-20 10:44       ` Peter Zijlstra
  1 sibling, 0 replies; 25+ messages in thread
From: Huang Ying @ 2011-01-20  1:09 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-kernel, Andi Kleen, Peter Zijlstra, Linus Torvalds,
	Ingo Molnar, Chris Mason

On Thu, 2011-01-20 at 08:52 +0800, Andrew Morton wrote:
> On Thu, 20 Jan 2011 08:45:58 +0800
> Huang Ying <ying.huang@intel.com> wrote:
> 
> > On Thu, 2011-01-20 at 05:55 +0800, Andrew Morton wrote:
> > > I'm trying to remember why we're talking about this.
> > > 
> > > You had an ACPI-based "hardware error reporting" thing.  And that
> > > required an nmi-context memory allocator.  And that required a
> > > "lockless" list implementation.
> > > 
> > > Yes?
> > 
> > Yes.  But the "lockless" list implementation is general, it can be used
> > by other part of kernel too,  such as irq_work and xlist in
> > net/rds/xlist.h in the patchset.
> 
> Well.  Lots of things are general but that doesn't mean we toss them
> into the kernel when we already have plenty of infrastructure to handle
> that sort of thing.
> 
> otoh, hoisting xlist.h out of net/rds and making it generally available
> is a good thing.
> 
> otooh, net/rds/ probably didn't need xlist at all and could have used
> existing general code.

From the commit description of xlist, it seems that xlist was created to
address a performance issue.

commit 6fa70da6081bbcf948801fd5ee0be4d222298a43
Author: Chris Mason <chris.mason@oracle.com>
Date:   Fri Jun 11 11:17:59 2010 -0700

    rds: recycle FMRs through lockless lists
    
    FRM allocation and recycling is performance critical and fairly lock
    intensive.  The current code has a per connection lock that all
    processes bang on and it becomes a major bottleneck on large systems.
    
    This changes things to use a number of cmpxchg based lists instead,
    allowing us to go through the whole FMR lifecycle without locking inside
    RDS.

    [snip]

So a general list may not be good enough for them.

> So...  I'd say that unless and until the NMI-context allocator is
> merged, the case for merging the lockless list code is a bit marginal? 

In fact, the lockless allocator does not really depend on llist; it only
depends on the ARCH_HAVE_NMI_SAFE_CMPXCHG patch in this patchset.

> Or have you identified other code sites which could use llist and which
> would gain some benefit from migrating?

The llist will be used by APEI (ACPI Platform Error Interface, that is,
an ACPI-based "hardware error reporting" thing).  It may be used by
other hardware error reporting mechanisms too, namely those that involve
NMI or NMI-like notification methods.

Best Regards,
Huang Ying



^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH -v10 0/4] Lock-less list
  2011-01-17  6:16 [PATCH -v10 0/4] Lock-less list Huang Ying
                   ` (4 preceding siblings ...)
  2011-01-19 21:55 ` [PATCH -v10 0/4] Lock-less list Andrew Morton
@ 2011-01-20  5:55 ` Mathieu Desnoyers
  2011-01-20  8:57   ` huang ying
  5 siblings, 1 reply; 25+ messages in thread
From: Mathieu Desnoyers @ 2011-01-20  5:55 UTC (permalink / raw)
  To: Huang Ying
  Cc: linux-kernel, Andi Kleen, Peter Zijlstra, Linus Torvalds,
	Ingo Molnar, Chris Mason, Paul E. McKenney

Hi Huang,

I just found out about your lockless linked-list implementation; it looks
interesting. Paul McKenney and I have created a similar implementation for
userspace within the Userspace RCU project, you might want to have a look (it's
LGPLv2.1).

One point about semantics: a singly-linked list for which you can either delete
the first element or all the elements should probably be called a stack? I'm
therefore going to refer to "push/pop" in this email rather than "add/delete".
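
As a rough illustration of the stack semantics described above, here is a
minimal sketch using C11 atomics.  All names (snode, sstack, sstack_push and so
on) are hypothetical; this is neither the llist code from the patchset nor the
Userspace RCU code.  Push retries a compare-and-swap, pop-all is a single
exchange, and the single-element pop is only safe if something else (one
consumer at a time, or deferred reuse of nodes) rules out the pointer re-use
problem.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical types for illustration only. */
struct snode {
	struct snode *next;
};

struct sstack {
	_Atomic(struct snode *) first;
};

/* Push: link the node in front of the current head, retry if the head moved. */
static void sstack_push(struct sstack *s, struct snode *n)
{
	struct snode *old;

	assert(n != NULL);
	old = atomic_load(&s->first);
	do {
		n->next = old;	/* on CAS failure, old holds the fresh head */
	} while (!atomic_compare_exchange_weak(&s->first, &old, n));
}

/* Pop all: one atomic exchange detaches the whole chain. */
static struct snode *sstack_pop_all(struct sstack *s)
{
	return atomic_exchange(&s->first, NULL);
}

/*
 * Pop one: assumes the caller prevents node re-use while this runs
 * (for example, only one popper at a time); otherwise the CAS can
 * succeed against a recycled pointer.
 */
static struct snode *sstack_pop(struct sstack *s)
{
	struct snode *old = atomic_load(&s->first);

	while (old &&
	       !atomic_compare_exchange_weak(&s->first, &old, old->next))
		;
	return old;
}
```

Elements come back newest first, which is why "stack" describes the structure
better than "list".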

We've got two linked-list stack implementations in the userspace RCU tree:

One has wait-free push, and blocking pop. Useful if you can afford to block when
you pop from the list. There is no restriction on the number of concurrent
push/pop to/from the list. The code is at:

http://git.lttng.org/?p=userspace-rcu.git;a=blob;f=urcu/wfstack-static.h

The other has lock-free push and pop, no restriction on the number of concurrent
push/pop, but uses a clever trick that ensures that the cmpxchg loop will never
see a re-used pointer by using a RCU read-side to protect from memory reclaim.
It therefore requires that the memory used for the list must only be freed after
a RCU grace period. The code is at:

http://git.lttng.org/?p=userspace-rcu.git;a=blob;f=urcu/rculfstack-static.h

We could probably extend this to allow popping all stack entries in one go, but
I haven't given it much thought. 

We also have queue implementations (enqueue at head, dequeue at tail). The first
has wait-free enqueue/blocking dequeue and the second has lock-free
enqueue/dequeue. We use the wait-free enqueue/blocking dequeue queue to push
RCU callbacks when call_rcu() is executed from real-time threads, and we then
dequeue the callbacks to execute with a blocking thread. The lock-free
enqueue/dequeue queue also needs a reference count on the "dummy" node it keeps
in addition to using RCU to delay memory reclamation (see comments in the code).

wait-free enqueue/blocking dequeue:

http://git.lttng.org/?p=userspace-rcu.git;a=blob;f=urcu/wfqueue-static.h

lock-free enqueue/dequeue (beware, the refcounting makes the API less elegant
than the other implementations):

http://git.lttng.org/?p=userspace-rcu.git;a=blob;f=urcu/rculfqueue-static.h

Hoping this might be helpful to you,

Mathieu

-- 
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH -v10 0/4] Lock-less list
  2011-01-20  5:55 ` Mathieu Desnoyers
@ 2011-01-20  8:57   ` huang ying
  0 siblings, 0 replies; 25+ messages in thread
From: huang ying @ 2011-01-20  8:57 UTC (permalink / raw)
  To: Mathieu Desnoyers
  Cc: Huang Ying, linux-kernel, Andi Kleen, Peter Zijlstra,
	Linus Torvalds, Ingo Molnar, Chris Mason, Paul E. McKenney

Hi, Mathieu,

On Thu, Jan 20, 2011 at 1:55 PM, Mathieu Desnoyers
<mathieu.desnoyers@efficios.com> wrote:
> Hi Huang,
>
> I just found out about your lockless linked-list implementation, it looks
> interesting. Paul McKenney and myself have created a similar implementation for
> userspace within the Userspace RCU project, you might want to have a look (it's
> LGPLv2.1).
>
> One point about semantic: a singly-linked list for which you can either delete
> the first element or all the elements should probably be called a stack ? I'm
> therefore going to refer to "push/pop" in this email rather than "add/delete".
>
> We've got two linked-list stack implementations in the userspace RCU tree:
>
> One has wait-free push, and blocking pop. Useful if you can afford to block when
> you pop from the list. There is no restriction on the number of concurrent
> push/pop to/from the list. The code is at:
>
> http://git.lttng.org/?p=userspace-rcu.git;a=blob;f=urcu/wfstack-static.h
>
> The other has lock-free push and pop, no restriction on the number of concurrent
> push/pop, but uses a clever trick that ensures that the cmpxchg loop will never
> see a re-used pointer by using a RCU read-side to protect from memory reclaim.
> It therefore requires that the memory used for the list must only be freed after
> a RCU grace period. The code is at:
>
> http://git.lttng.org/?p=userspace-rcu.git;a=blob;f=urcu/rculfstack-static.h
>
> We could probably extend this to allow popping all stack entries in one go, but
> I haven't given it much thought.
>
> We also have queue implementations (enqueue at head, dequeue at tail). The first
> has wait-free enqueue/blocking dequeue and the second has lock-free
> enqueue/dequeue. We use the wait-free enqueue/blocking dequeue queue to push
> RCU callbacks when call_rcu() is executed from real-time threads, and we then
> dequeue the callbacks to execute with a blocking thread. The lock-free
> enqueue/dequeue queue also needs a reference count on the "dummy" node it keeps
> in addition to use RCU to delay memory reclamation (see comments in the code).
>
> wait-free enqueue/blocking dequeue:
>
> http://git.lttng.org/?p=userspace-rcu.git;a=blob;f=urcu/wfqueue-static.h
>
> lock-free enqueue/dequeue (beware, the refcounting makes the API less elegant
> than the other implementations):
>
> http://git.lttng.org/?p=userspace-rcu.git;a=blob;f=urcu/rculfqueue-static.h
>
> Hoping this might be helpful to you,

Thanks!  I will investigate your code!

Best Regards,
Huang Ying

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH -v10 0/4] Lock-less list
  2011-01-20  0:52     ` Andrew Morton
  2011-01-20  1:09       ` Huang Ying
@ 2011-01-20 10:44       ` Peter Zijlstra
  2011-01-20 11:18         ` huang ying
  1 sibling, 1 reply; 25+ messages in thread
From: Peter Zijlstra @ 2011-01-20 10:44 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Huang Ying, linux-kernel, Andi Kleen, Linus Torvalds,
	Ingo Molnar, Chris Mason

On Wed, 2011-01-19 at 16:52 -0800, Andrew Morton wrote:
> On Thu, 20 Jan 2011 08:45:58 +0800
> Huang Ying <ying.huang@intel.com> wrote:
> 
> > On Thu, 2011-01-20 at 05:55 +0800, Andrew Morton wrote:
> > > I'm trying to remember why we're talking about this.
> > > 
> > > You had an ACPI-based "hardware error reporting" thing.  And that
> > > required an nmi-context memory allocator.  And that required a
> > > "lockless" list implementation.
> > > 
> > > Yes?
> > 
> > Yes.  But the "lockless" list implementation is general, it can be used
> > by other part of kernel too,  such as irq_work and xlist in
> > net/rds/xlist.h in the patchset.
> 
> Well.  Lots of things are general but that doesn't mean we toss them
> into the kernel when we already have plenty of infrastructure to handle
> that sort of thing.
> 
> otoh, hoisting xlist.h out of net/rds and making it generally available
> is a good thing.
> 
> otooh, net/rds/ probably didn't need xlist at all and could have used
> existing general code.
> 
> So...  I'd say that unless and until the NMI-context allocator is
> merged, the case for merging the lockless list code is a bit marginal? 
> Or have you identified other code sites which could use llist and which
> would gain some benefit from migrating?

In fact, I have a patch ready and waiting to revert the whole irq_work
stuff, that too seems to be a superfluous generalization.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH -v10 0/4] Lock-less list
  2011-01-20 10:44       ` Peter Zijlstra
@ 2011-01-20 11:18         ` huang ying
  2011-01-20 11:27           ` Peter Zijlstra
  0 siblings, 1 reply; 25+ messages in thread
From: huang ying @ 2011-01-20 11:18 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Andrew Morton, Huang Ying, linux-kernel, Andi Kleen,
	Linus Torvalds, Ingo Molnar, Chris Mason

On Thu, Jan 20, 2011 at 6:44 PM, Peter Zijlstra <peterz@infradead.org> wrote:
> On Wed, 2011-01-19 at 16:52 -0800, Andrew Morton wrote:
>> On Thu, 20 Jan 2011 08:45:58 +0800
>> Huang Ying <ying.huang@intel.com> wrote:
>>
>> > On Thu, 2011-01-20 at 05:55 +0800, Andrew Morton wrote:
>> > > I'm trying to remember why we're talking about this.
>> > >
>> > > You had an ACPI-based "hardware error reporting" thing.  And that
>> > > required an nmi-context memory allocator.  And that required a
>> > > "lockless" list implementation.
>> > >
>> > > Yes?
>> >
>> > Yes.  But the "lockless" list implementation is general, it can be used
>> > by other part of kernel too,  such as irq_work and xlist in
>> > net/rds/xlist.h in the patchset.
>>
>> Well.  Lots of things are general but that doesn't mean we toss them
>> into the kernel when we already have plenty of infrastructure to handle
>> that sort of thing.
>>
>> otoh, hoisting xlist.h out of net/rds and making it generally available
>> is a good thing.
>>
>> otooh, net/rds/ probably didn't need xlist at all and could have used
>> existing general code.
>>
>> So...  I'd say that unless and until the NMI-context allocator is
>> merged, the case for merging the lockless list code is a bit marginal?
>> Or have you identified other code sites which could use llist and which
>> would gain some benefit from migrating?
>
> In fact, I have a patch ready and waiting to revert the whole irq_work
> stuff, that too seems to be a superfluous generalization.

What do you plan to replace irq_work with?  I plan to use it in the APEI NMI
handler and the MCE handler.

Best Regards,
Huang Ying

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH -v10 0/4] Lock-less list
  2011-01-20 11:18         ` huang ying
@ 2011-01-20 11:27           ` Peter Zijlstra
  2011-01-20 11:57             ` huang ying
  0 siblings, 1 reply; 25+ messages in thread
From: Peter Zijlstra @ 2011-01-20 11:27 UTC (permalink / raw)
  To: huang ying
  Cc: Andrew Morton, Huang Ying, linux-kernel, Andi Kleen,
	Linus Torvalds, Ingo Molnar, Chris Mason

On Thu, 2011-01-20 at 19:18 +0800, huang ying wrote:
> >
> > In fact, I have a patch ready and waiting to revert the whole irq_work
> > stuff, that too seems to be a superfluous generalization.
> 
> What do you plan to replace irq_work?  I plan to use it in APEI NMI
> handler and MCE handler. 

But will all that stuff be accepted? Please stop sending infrastructure
bits and focus on your larger RAS picture, once you have consensus on
that from all parties involved, then, and only then, does it make sense
to submit everything, including infrastructure.

As it stands now you're simply submitting infrastructure without any
users, and we all know the RAS thing isn't settled, so those users might
never come..

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH -v10 0/4] Lock-less list
  2011-01-20 11:27           ` Peter Zijlstra
@ 2011-01-20 11:57             ` huang ying
  2011-01-20 12:14               ` Ingo Molnar
  0 siblings, 1 reply; 25+ messages in thread
From: huang ying @ 2011-01-20 11:57 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Andrew Morton, Huang Ying, linux-kernel, Andi Kleen,
	Linus Torvalds, Ingo Molnar, Chris Mason

On Thu, Jan 20, 2011 at 7:27 PM, Peter Zijlstra <peterz@infradead.org> wrote:
> On Thu, 2011-01-20 at 19:18 +0800, huang ying wrote:
>> >
>> > In fact, I have a patch ready and waiting to revert the whole irq_work
>> > stuff, that too seems to be a superfluous generalization.
>>
>> What do you plan to replace irq_work?  I plan to use it in APEI NMI
>> handler and MCE handler.
>
> But will all that stuff be accepted? Please stop sending infrastructure
> bits and focus on your larger RAS picture, once you have consensus on
> that from all parties involved, then, and only then, does it make sense
> to submit everything, including infrastructure.

I am not sending hardware error reporting infrastructure.  As far as I
know, Linus and Andrew suggested using printk for hardware error
reporting.  For now, I am just trying to write an APEI driver that reports
hardware errors with printk.  Is that acceptable?  Do you have some other
idea about hardware error reporting?

> As it stands now you're simply submitting infrastructure without any
> users, and we all know the RAS thing isn't settled, so those users might
> never come..

As for llist, I just want to send some code that is useful to more
than RAS users.  If it has no user other than the APEI driver, I will
move it into the APEI driver code.

Best Regards,
Huang Ying

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH -v10 0/4] Lock-less list
  2011-01-20 11:57             ` huang ying
@ 2011-01-20 12:14               ` Ingo Molnar
  2011-01-20 12:49                 ` huang ying
  0 siblings, 1 reply; 25+ messages in thread
From: Ingo Molnar @ 2011-01-20 12:14 UTC (permalink / raw)
  To: huang ying
  Cc: Peter Zijlstra, Andrew Morton, Huang Ying, linux-kernel,
	Andi Kleen, Linus Torvalds, Chris Mason, Borislav Petkov


* huang ying <huang.ying.caritas@gmail.com> wrote:

> > But will all that stuff be accepted? Please stop sending infrastructure bits and 
> > focus on your larger RAS picture, once you have consensus on that from all 
> > parties involved, then, and only then, does it make sense to submit everything, 
> > including infrastructure.
> 
> I am not sending hardware error reporting infrastructure.  As far as I know, Linus 
> and Andrew suggest to use printk for hardware error reporting.  And now, I just 
> try to write APEI driver and reporting hardware error with printk.  Is it 
> acceptable?  Do you have some other idea about hardware error reporting?

Erm, how could you possibly have missed the perf based RAS daemon work of Boris, 
which we've pointed out about half a dozen times already?

It's somewhat annoying that you simply ignore repeated feedback and feign ignorance. 
Reminds me of someone :-)

	Ingo

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH -v10 0/4] Lock-less list
  2011-01-20 12:14               ` Ingo Molnar
@ 2011-01-20 12:49                 ` huang ying
  2011-01-20 13:06                   ` Ingo Molnar
  0 siblings, 1 reply; 25+ messages in thread
From: huang ying @ 2011-01-20 12:49 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Peter Zijlstra, Andrew Morton, Huang Ying, linux-kernel,
	Andi Kleen, Linus Torvalds, Chris Mason, Borislav Petkov

On Thu, Jan 20, 2011 at 8:14 PM, Ingo Molnar <mingo@elte.hu> wrote:
>
> * huang ying <huang.ying.caritas@gmail.com> wrote:
>
>> > But will all that stuff be accepted? Please stop sending infrastructure bits and
>> > focus on your larger RAS picture, once you have consensus on that from all
>> > parties involved, then, and only then, does it make sense to submit everything,
>> > including infrastructure.
>>
>> I am not sending hardware error reporting infrastructure.  As far as I know, Linus
>> and Andrew suggest to use printk for hardware error reporting.  And now, I just
>> try to write APEI driver and reporting hardware error with printk.  Is it
>> acceptable?  Do you have some other idea about hardware error reporting?
>
> Erm, how could you possible have missed the perf based RAS daemon work of Boris,
> which we've pointed out about half a dozen times already?

Even if there is some other hardware error reporting infrastructure,
such as the perf based one, I think we still need printk too. After all, as
Linus pointed out, printk is the most popular error reporting
mechanism so far. Don't you think so?

Best Regards,
Huang Ying

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH -v10 0/4] Lock-less list
  2011-01-20 12:49                 ` huang ying
@ 2011-01-20 13:06                   ` Ingo Molnar
  2011-01-20 13:24                     ` huang ying
                                       ` (2 more replies)
  0 siblings, 3 replies; 25+ messages in thread
From: Ingo Molnar @ 2011-01-20 13:06 UTC (permalink / raw)
  To: huang ying
  Cc: Peter Zijlstra, Andrew Morton, Huang Ying, linux-kernel,
	Andi Kleen, Linus Torvalds, Chris Mason, Borislav Petkov


* huang ying <huang.ying.caritas@gmail.com> wrote:

> On Thu, Jan 20, 2011 at 8:14 PM, Ingo Molnar <mingo@elte.hu> wrote:
> >
> > * huang ying <huang.ying.caritas@gmail.com> wrote:
> >
> >> > But will all that stuff be accepted? Please stop sending infrastructure bits and
> >> > focus on your larger RAS picture, once you have consensus on that from all
> >> > parties involved, then, and only then, does it make sense to submit everything,
> >> > including infrastructure.
> >>
> >> I am not sending hardware error reporting infrastructure.  As far as I know, Linus
> >> and Andrew suggest to use printk for hardware error reporting.  And now, I just
> >> try to write APEI driver and reporting hardware error with printk.  Is it
> >> acceptable?  Do you have some other idea about hardware error reporting?
> >
> > Erm, how could you possible have missed the perf based RAS daemon work of Boris,
> > which we've pointed out about half a dozen times already?
> 
> Even if there is some other hardware error reporting infrastructure
> such as perf based, I think we still need printk too. After all, as
> Linus pointed out, printk is the most popular error reporting
> mechanism so far. Do you think so?

Of course, that's why the upstream EDAC code uses printk too. In fact it does all 
sorts of in-kernel decoding to make the printk output more useful - the /dev/mcelog 
method of pushing all decoding to user-space is fundamentally flawed.

So yes, printk is the primary output channel and having a readable printk output 
pretty much overrides any other concern.

But that is not what you are doing. I get the impression that you are using printk 
as an _excuse_ to not have to work with the RAS people and run some parallel 
framework so that you do not have to work with them or listen to them. It is rather 
counter-productive. Working together is useful.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH -v10 0/4] Lock-less list
  2011-01-20 13:06                   ` Ingo Molnar
@ 2011-01-20 13:24                     ` huang ying
  2011-01-20 13:36                     ` Borislav Petkov
  2011-01-20 22:53                     ` Mike Waychison
  2 siblings, 0 replies; 25+ messages in thread
From: huang ying @ 2011-01-20 13:24 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Peter Zijlstra, Andrew Morton, Huang Ying, linux-kernel,
	Andi Kleen, Linus Torvalds, Chris Mason, Borislav Petkov

On Thu, Jan 20, 2011 at 9:06 PM, Ingo Molnar <mingo@elte.hu> wrote:
>
> * huang ying <huang.ying.caritas@gmail.com> wrote:
>
>> On Thu, Jan 20, 2011 at 8:14 PM, Ingo Molnar <mingo@elte.hu> wrote:
>> >
>> > * huang ying <huang.ying.caritas@gmail.com> wrote:
>> >
>> >> > But will all that stuff be accepted? Please stop sending infrastructure bits and
>> >> > focus on your larger RAS picture, once you have consensus on that from all
>> >> > parties involved, then, and only then, does it make sense to submit everything,
>> >> > including infrastructure.
>> >>
>> >> I am not sending hardware error reporting infrastructure.  As far as I know, Linus
>> >> and Andrew suggest to use printk for hardware error reporting.  And now, I just
>> >> try to write APEI driver and reporting hardware error with printk.  Is it
>> >> acceptable?  Do you have some other idea about hardware error reporting?
>> >
>> > Erm, how could you possible have missed the perf based RAS daemon work of Boris,
>> > which we've pointed out about half a dozen times already?
>>
>> Even if there is some other hardware error reporting infrastructure
>> such as perf based, I think we still need printk too. After all, as
>> Linus pointed out, printk is the most popular error reporting
>> mechanism so far. Do you think so?
>
> Of course, that's why the upstream EDAC code uses printk too. In fact it does all
> sorts of in-kernel decoding to make the printk output more useful - the /dev/mcelog
> method of pushing all decoding to user-space is fundamentally flawed.
>
> So yes, printk is the primary output channel and having a readable printk output
> pretty much overrides any other concern.
>
> But that is not what you are doing. I get the impression that you are using printk
> as an _excuse_ to not have to work with the RAS people and run some parallel
> framework so that you do not have to work with them or listen to them. It is rather
> counter-productive. Working together is useful.

No.  I am not working on some other hardware error reporting framework
now.  I just want to make printk work better for hardware error
reporting.  For example, printk is not NMI-safe yet, so I need a
lockless list and memory allocator to record the information in the NMI
handler and printk it later.  And maybe in the future, we can make
printk NMI-safe with a similar method.  I think there is no
contradiction between my work and the other RAS people's work.
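
The record-now, print-later pattern described above could look roughly like the
following sketch in C11 atomics.  The names (err_rec, err_record, err_drain) and
the integer payload are made up for illustration, and this is userspace code,
not the APEI driver or the llist API.  The producer only does a lock-free push;
the consumer detaches everything with one exchange and reverses the chain so
that records come out in arrival order.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical record type; "code" stands in for the real error payload. */
struct err_rec {
	struct err_rec *next;
	int code;
};

static _Atomic(struct err_rec *) pending;

/* Producer side (the context that must not take locks): lock-free push. */
static void err_record(struct err_rec *rec)
{
	struct err_rec *old;

	assert(rec != NULL);
	old = atomic_load(&pending);
	do {
		rec->next = old;
	} while (!atomic_compare_exchange_weak(&pending, &old, rec));
}

/*
 * Consumer side (a context where printing is safe): detach all pending
 * records, reverse them into arrival order, and copy up to max codes
 * into out[].  Returns the number copied; records beyond max are simply
 * skipped in this sketch.
 */
static size_t err_drain(int *out, size_t max)
{
	struct err_rec *rec = atomic_exchange(&pending, NULL);
	struct err_rec *ordered = NULL;
	size_t n = 0;

	while (rec) {		/* the chain is newest first; reverse it */
		struct err_rec *next = rec->next;

		rec->next = ordered;
		ordered = rec;
		rec = next;
	}
	for (rec = ordered; rec && n < max; rec = rec->next)
		out[n++] = rec->code;
	return n;
}
```

In a real handler the copy into out[] would be where the deferred printk
happens; the records themselves would have to come from memory that is safe to
allocate in that context, which is the role of the lockless allocator.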

Best Regards,
Huang Ying

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH -v10 0/4] Lock-less list
  2011-01-20 13:06                   ` Ingo Molnar
  2011-01-20 13:24                     ` huang ying
@ 2011-01-20 13:36                     ` Borislav Petkov
  2011-01-20 14:11                       ` Ingo Molnar
  2011-01-20 22:53                     ` Mike Waychison
  2 siblings, 1 reply; 25+ messages in thread
From: Borislav Petkov @ 2011-01-20 13:36 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: huang ying, Peter Zijlstra, Andrew Morton, Huang Ying,
	linux-kernel, Andi Kleen, Linus Torvalds, Chris Mason, Luck,
	Tony

+ Tony.

On Thu, Jan 20, 2011 at 02:06:25PM +0100, Ingo Molnar wrote:
> 
> * huang ying <huang.ying.caritas@gmail.com> wrote:
> 
> > On Thu, Jan 20, 2011 at 8:14 PM, Ingo Molnar <mingo@elte.hu> wrote:
> > >
> > > * huang ying <huang.ying.caritas@gmail.com> wrote:
> > >
> > >> > But will all that stuff be accepted? Please stop sending infrastructure bits and
> > >> > focus on your larger RAS picture, once you have consensus on that from all
> > >> > parties involved, then, and only then, does it make sense to submit everything,
> > >> > including infrastructure.
> > >>
> > >> I am not sending hardware error reporting infrastructure.  As far as I know, Linus
> > >> and Andrew suggest to use printk for hardware error reporting.  And now, I just
> > >> try to write APEI driver and reporting hardware error with printk.  Is it
> > >> acceptable?  Do you have some other idea about hardware error reporting?
> > >
> > > Erm, how could you possible have missed the perf based RAS daemon work of Boris,
> > > which we've pointed out about half a dozen times already?
> > 
> > Even if there is some other hardware error reporting infrastructure
> > such as perf based, I think we still need printk too. After all, as
> > Linus pointed out, printk is the most popular error reporting
> > mechanism so far. Do you think so?
> 
> Of course, that's why the upstream EDAC code uses printk too. In fact it does all 
> sorts of in-kernel decoding to make the printk output more useful - the /dev/mcelog 
> method of pushing all decoding to user-space is fundamentally flawed.

True story. And yet google folk still do that, unfortunately:
https://lkml.org/lkml/2011/1/10/419

I think printk should be used in most cases: the home user runs Linux
on his machine, it freezes, and to catch the MCA info he simply
collects the serial console output or, with a persistent storage device
in place, reboots and then reads out the exact decoded error.

In the big data center, printk might not be that useful anymore and we
might want to have structured log error data - still decoded, mind you,
and properly formatted but sent to userspace over perf and then over
the network or collected by a userspace daemon doing policy decisions
and error trends evaluation. This is, I think, a much saner approach than
collecting hardware info from every machine and then using it to decode
the errors. We still need a bunch of work in that direction though.

> So yes, printk is the primary output channel and having a readable printk output 
> pretty much overrides any other concern.
> 
> But that is not what you are doing. I get the impression that you are using printk 
> as an _excuse_ to not have to work with the RAS people and run some parallel 
> framework so that you do not have to work with them or listen to them. It is rather 
> counter-productive. Working together is useful.

So yeah, let me reiterate what Andrew and Ingo said: I don't want to
discuss the merits of all those cool lockless software thingies that
could replace this and that and would be cool if someone used them. I'm
only interested if they can help in a real-world use case - otherwise
it's just a programming exercise.

Yeah, this is all IMHO, of course :).

-- 
Regards/Gruss,
Boris.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH -v10 0/4] Lock-less list
  2011-01-20 13:36                     ` Borislav Petkov
@ 2011-01-20 14:11                       ` Ingo Molnar
  2011-01-20 17:59                         ` Luck, Tony
  0 siblings, 1 reply; 25+ messages in thread
From: Ingo Molnar @ 2011-01-20 14:11 UTC (permalink / raw)
  To: Borislav Petkov, huang ying, Peter Zijlstra, Andrew Morton,
	Huang Ying, linux-kernel, Andi Kleen, Linus Torvalds,
	Chris Mason, Luck, Tony


* Borislav Petkov <bp@alien8.de> wrote:

> + Tony.
> 
> On Thu, Jan 20, 2011 at 02:06:25PM +0100, Ingo Molnar wrote:
> > 
> > * huang ying <huang.ying.caritas@gmail.com> wrote:
> > 
> > > On Thu, Jan 20, 2011 at 8:14 PM, Ingo Molnar <mingo@elte.hu> wrote:
> > > >
> > > > * huang ying <huang.ying.caritas@gmail.com> wrote:
> > > >
> > > >> > But will all that stuff be accepted? Please stop sending infrastructure bits and
> > > >> > focus on your larger RAS picture, once you have consensus on that from all
> > > >> > parties involved, then, and only then, does it make sense to submit everything,
> > > >> > including infrastructure.
> > > >>
> > > >> I am not sending hardware error reporting infrastructure.  As far as I know, Linus
> > > >> and Andrew suggest to use printk for hardware error reporting.  And now, I just
> > > >> try to write APEI driver and reporting hardware error with printk.  Is it
> > > >> acceptable?  Do you have some other idea about hardware error reporting?
> > > >
> > > > Erm, how could you possible have missed the perf based RAS daemon work of Boris,
> > > > which we've pointed out about half a dozen times already?
> > > 
> > > Even if there is some other hardware error reporting infrastructure
> > > such as perf based, I think we still need printk too. After all, as
> > > Linus pointed out, printk is the most popular error reporting
> > > mechanism so far. Do you think so?
> > 
> > Of course, that's why the upstream EDAC code uses printk too. In fact it does all 
> > sorts of in-kernel decoding to make the printk output more useful - the /dev/mcelog 
> > method of pushing all decoding to user-space is fundamentally flawed.
> 
> True story. And yet google folk still do that, unfortunately:
> https://lkml.org/lkml/2011/1/10/419

I wouldn't worry about that too much - such uses are extremely isolated.

If we provide RAS functionality that covers the limited capabilities of /dev/mcelog
and much more, then the migration path towards the superior solution is clear.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 25+ messages in thread

* RE: [PATCH -v10 0/4] Lock-less list
  2011-01-20 14:11                       ` Ingo Molnar
@ 2011-01-20 17:59                         ` Luck, Tony
  0 siblings, 0 replies; 25+ messages in thread
From: Luck, Tony @ 2011-01-20 17:59 UTC (permalink / raw)
  To: Ingo Molnar, Borislav Petkov, huang ying, Peter Zijlstra,
	Andrew Morton, Huang, Ying, linux-kernel, Andi Kleen,
	Linus Torvalds, Chris Mason, Mike Waychison

> > True story. And yet google folk still do that, unfortunately:
> > https://lkml.org/lkml/2011/1/10/419
>
> I wouldn't worry about that too much - such uses are extremely isolated.

Google is just one user - but they have (so I hear) a large number
of machines.

I would hazard a guess that there are other users that feel the
same way - i.e. users with a large number of machines that move
onto new kernel versions from time to time.

These users want a stable kernel->user ABI - and Google has stated
quite clearly that "printk" doesn't meet their need.  I think that
they would be happy with some other stable solution than mcelog,
but we might as well ask for their input before imposing something
on them.

Mike: would you like to join the conversation on how to report
errors - or nominate someone else to represent the Google point
of view?

-Tony

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH -v10 0/4] Lock-less list
  2011-01-20 13:06                   ` Ingo Molnar
  2011-01-20 13:24                     ` huang ying
  2011-01-20 13:36                     ` Borislav Petkov
@ 2011-01-20 22:53                     ` Mike Waychison
  2011-01-21 17:39                       ` Tim Hockin
  2 siblings, 1 reply; 25+ messages in thread
From: Mike Waychison @ 2011-01-20 22:53 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: huang ying, Peter Zijlstra, Andrew Morton, Huang Ying,
	linux-kernel, Andi Kleen, Linus Torvalds, Chris Mason,
	Borislav Petkov, Robert Lippert

On 01/20/11 05:06, Ingo Molnar wrote:
>
> * huang ying<huang.ying.caritas@gmail.com>  wrote:
>
>> On Thu, Jan 20, 2011 at 8:14 PM, Ingo Molnar<mingo@elte.hu>  wrote:
>>>
>>> * huang ying<huang.ying.caritas@gmail.com>  wrote:
>>>
>>>>> But will all that stuff be accepted? Please stop sending infrastructure bits and
>>>>> focus on your larger RAS picture, once you have consensus on that from all
>>>>> parties involved, then, and only then, does it make sense to submit everything,
>>>>> including infrastructure.
>>>>
>>>> I am not sending hardware error reporting infrastructure.  As far as I know, Linus
>>>> and Andrew suggest to use printk for hardware error reporting.  And now, I just
>>>> try to write APEI driver and reporting hardware error with printk.  Is it
>>>> acceptable?  Do you have some other idea about hardware error reporting?
>>>
>>> Erm, how could you possible have missed the perf based RAS daemon work of Boris,
>>> which we've pointed out about half a dozen times already?
>>
>> Even if there is some other hardware error reporting infrastructure
>> such as perf based, I think we still need printk too. After all, as
>> Linus pointed out, printk is the most popular error reporting
>> mechanism so far. Do you think so?
>
> Of course, that's why the upstream EDAC code uses printk too. In fact it does all
> sorts of in-kernel decoding to make the printk output more useful - the /dev/mcelog
> method of pushing all decoding to user-space is fundamentally flawed.

Geez, I don't know how to approach this proposition in a concise way :(
Processing machine checks in-kernel is just as flawed as relying on
/dev/mcelog alone IMO.  I agree with you that relying on /dev/mcelog to
get all of our error data out is flawed, but so is relying on an
in-kernel "abstraction" of the data exposed from the hardware.

There are many different ways a system can fail such that an MCE isn't 
received and processed by the kernel.  Sometimes the error is just too 
fatal to do anything useful.  Errors like a NB buffer CRC error, a bus 
syncflood, or a cache hierarchy ECC error that was incorrectly 
propagated up through to the L1 (which may only have parity checking) 
can cause the kernel to fall over as the CPU is either cut off from the 
rest of the world or too confused to get anything right.

Getting at this information is still very worthwhile however, and I'm 
guessing that this is what the APEI bits are meant to be doing.  You'll 
be seeing patches for Google firmware drivers that provide 
functionality in the same vein in the coming days (I'm still busy 
whitewashing and documenting them).

It's also very ignorant to assume that the kernel knows everything about 
the system and is capable of decoding errors to the satisfaction of 
userland.  As Duncan Laurie pointed out 
(https://lkml.org/lkml/2011/1/11/390) we care about not only the 
physical address, but which stick and which dimm *chip* on the stick is 
having problems.  In-kernel abstractions break down due to the following:

    * The kernel couldn't possibly know how my i2c buses are set up and 
how the SPD EEPROMs are related to the physical memory abstraction that the 
bios sets up for me.  I don't know of any standard way to have the BIOS 
expose this sort of information to the operating system.  This sort of 
layout changes between motherboard spins quite frequently as well, so 
good luck mapping it yourself in any generic way.

    * The kernel couldn't know how to map SPD JEDEC Manufacturer ID, 
Model part number and revision to anything useful about the chips 
themselves.

    * The kernel also couldn't know how to communicate with the AMBs in 
a meaningful way (if present).

At the end of the day, the only things I really care about are:

    * I don't care if the kernel pre-processes the data it gets from the 
hardware when there is an error.  For most users, burping something out 
to the logs in decoded form is generally useful.  It isn't for us.
    * Don't ever put the kernel in a position where it will spam the 
logs and wedge the system -- even if the hardware is wonky.
    * Don't dummy the data such that I can't do the same calculations 
with better visibility from userland.
    * Don't ever enforce a reactive policy that can't be changed from 
userland.
    * I don't care whether the data comes from netlink, /dev/mcelog, 
whiz-bang-sysfs uevent, or thingamaboo perfevents doohickie: as long as 
I get events that are both atomic+consistent and the ABI is maintained.

I've CCed Robert who owns our userland bits as he may have something to add.

That said, I'd love to have generic NMI-safe data-passing for improved 
debuggability, regardless of this conflated bickering about RAS 
infrastructure :)

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH -v10 0/4] Lock-less list
  2011-01-20 22:53                     ` Mike Waychison
@ 2011-01-21 17:39                       ` Tim Hockin
  2011-01-21 18:01                         ` Borislav Petkov
  0 siblings, 1 reply; 25+ messages in thread
From: Tim Hockin @ 2011-01-21 17:39 UTC (permalink / raw)
  To: Mike Waychison
  Cc: Ingo Molnar, huang ying, Peter Zijlstra, Andrew Morton,
	Huang Ying, linux-kernel, Andi Kleen, Linus Torvalds,
	Chris Mason, Borislav Petkov, Robert Lippert

OOh ohh, can I jump in?  As another of "those Google guys" who has
been dealing with Linux's lack of solutions here for years....

On Thu, Jan 20, 2011 at 2:53 PM, Mike Waychison <mikew@google.com> wrote:
> On 01/20/11 05:06, Ingo Molnar wrote:
>>
>> * huang ying<huang.ying.caritas@gmail.com>  wrote:
>>
>>> On Thu, Jan 20, 2011 at 8:14 PM, Ingo Molnar<mingo@elte.hu>  wrote:
>>>>
>>>> * huang ying<huang.ying.caritas@gmail.com>  wrote:
>>>>
>>>>>> But will all that stuff be accepted? Please stop sending
>>>>>> infrastructure bits and
>>>>>> focus on your larger RAS picture, once you have consensus on that from
>>>>>> all
>>>>>> parties involved, then, and only then, does it make sense to submit
>>>>>> everything,
>>>>>> including infrastructure.
>>>>>
>>>>> I am not sending hardware error reporting infrastructure.  As far as I
>>>>> know, Linus
>>>>> and Andrew suggest to use printk for hardware error reporting.  And
>>>>> now, I just
>>>>> try to write APEI driver and reporting hardware error with printk.  Is
>>>>> it
>>>>> acceptable?  Do you have some other idea about hardware error
>>>>> reporting?
>>>>
>>>> Erm, how could you possible have missed the perf based RAS daemon work
>>>> of Boris,
>>>> which we've pointed out about half a dozen times already?
>>>
>>> Even if there is some other hardware error reporting infrastructure
>>> such as perf based, I think we still need printk too. After all, as
>>> Linus pointed out, printk is the most popular error reporting
>>> mechanism so far. Do you think so?
>>
>> Of course, that's why the upstream EDAC code uses printk too. In fact it
>> does all
>> sorts of in-kernel decoding to make the printk output more useful - the
>> /dev/mcelog
>> method of pushing all decoding to user-space is fundamentally flawed.

EDAC is fundamentally flawed and we don't use it any more.  It strips
off so much information that we can't actually figure out what
happened to the level we want.  We do it in userspace now.

> Geez, I don't know how to approach this preposition in a concise way :(
>  Processing machine checks in-kernel is just as flawed as relying on
> /dev/mcelog alone IMO.  I agree with you that relying on /dev/mcelog to get
> all of our error data out is flawed, but so is relying on an in-kernel
> "abstraction" of the data exposed from the hardware.
>
>
> There are many different ways a system can fail such that an MCE isn't
> received and processed by the kernel.  Sometimes the error is just too fatal
> to do anything useful.  Errors like a NB buffer CRC error, a bus syncflood,
> or a cache hierarchy ECC error that was incorrectly propagated up through to
> the L1 (which may only have parity checking) can cause the kernel to fall
> over as the CPU is either cut off from the rest of the world or too confused
> to get anything right.
>
> Getting at this information is still very worthwhile however, and I'm
> guessing that this is what the APEI bits are meant to be doing.  You'll be
> seeing patches for Google firmware drivers that provide functionality along
> the same vein in the coming days (I'm still busy whitewashing and
> documenting them).
>
> It's also very ignorant to assume that the kernel knows everything about the
> system and is capable of decoding errors to the satisfaction of userland.
>  As Duncan Laurie pointed out (https://lkml.org/lkml/2011/1/11/390) we care
> about not only the physical address, but which stick and which dimm *chip*
> on the stick is having problems.  In-kernel abstractions  break down due to
> the following:

This.  Andi was trying to use DMI tables to decode physical address to
DIMMs, but I'll tell you this: I have yet to see a platform that has
THAT MUCH information in the DMI tables and have it be *correct*.

>
>   * The kernel couldn't possible know how my i2c busses are setup and the
> SPD EEPROMs are related to the physical memory abstraction that the bios
> sets up for me.  I don't know of any standard way to have the BIOS expose
> this sort of information to the operating system.  This sort of layout
> changes between motherboard spins quite frequently as well, so good luck
> mapping it yourself in any generic way.
>
>   * The kernel couldn't know how to map SPD JEDEC Manufacturer ID, Model
> part number and revision to anything useful about the chips themselves.
>
>   * The kernel also couldn't know how to communicate with the AMBs in a
> meaningful way (if present).
>
>
> At the end of the day,   The only things I really care about are:
>
>   * I don't care if the kernel pre-processes the data it gets from the
> hardware when there is an error.  For most users, burping something out to
> the logs in decoded form is generally useful.  It isn't for us.
>   * Don't ever put the kernel in a position where it will spam the logs and
> wedge the system -- even if the hardware is wonky.

I'll add to this - sometimes 100 MCEs/second is acceptable.  The
Kernel needs to not flake out under that.

>   * Don't dummy the data such that I can't do the same calculations with
> better visibility from userland.

This.  We do extensive analysis of data in userland.

>   * Don't ever enforce a reactive policy that can't be changed from
> userland.
>   * I don't care whether the data comes from netlink, /dev/mcelog,
> whiz-bang-sysfs uevent, or thingamaboo perfevents doohickie: as long as I
> get events that are both atomic+consistent and the ABI is maintained.

I've been asking for hardware events forever.  I seem to recall a
proposal from IBM at OLS 2002 or 2003 where this was discussed.  I
wanted it then, and I still want it.  But I don't just want MCEs.  Why
can I not use the same channel to get PCI errors or SATA errors or
EDAC (non-MCE) errors?

I don't care what the channel is, so long as I can rate-limit
(/dev/mcelog is pretty good at that) events and the events I read
contain full details about what happened.
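
One way to picture the rate-limiting requirement, as a hedged sketch
only: a fixed-window limiter in plain C.  The struct and function names
are hypothetical, and this is not how /dev/mcelog or any kernel
interface implements it.  It only shows events beyond a per-window
budget being counted as dropped instead of delivered:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical fixed-window rate limiter, for illustration only. */
struct sketch_ratelimit {
	uint64_t window_ns;       /* length of one accounting window */
	uint32_t burst;           /* events allowed per window */
	uint64_t window_start_ns; /* start time of the current window */
	uint32_t seen;            /* events delivered in this window */
	uint64_t dropped;         /* total events suppressed so far */
};

/* Returns true if an event arriving at 'now_ns' should be delivered,
 * false if it exceeds the budget of the current window. */
static bool sketch_ratelimit_allow(struct sketch_ratelimit *rl,
				   uint64_t now_ns)
{
	/* Open a new window once the current one has expired. */
	if (now_ns - rl->window_start_ns >= rl->window_ns) {
		rl->window_start_ns = now_ns;
		rl->seen = 0;
	}

	if (rl->seen < rl->burst) {
		rl->seen++;
		return true;
	}

	/* Over budget: keep a count so the loss stays visible. */
	rl->dropped++;
	return false;
}
```

Keeping the dropped count around means the reader can still tell how
many events were suppressed, even when the details of each one are not
delivered.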

> I've CCed Robert who owns our userland bits as he may have something to add.
>
> That said, I'd love to have generic NMI-safe data-passing for improved
> debugability, regardless of this conflated bickering about RAS
> infrastructure :)

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH -v10 0/4] Lock-less list
  2011-01-21 17:39                       ` Tim Hockin
@ 2011-01-21 18:01                         ` Borislav Petkov
  0 siblings, 0 replies; 25+ messages in thread
From: Borislav Petkov @ 2011-01-21 18:01 UTC (permalink / raw)
  To: Tim Hockin
  Cc: Mike Waychison, Ingo Molnar, huang ying, Peter Zijlstra,
	Andrew Morton, Huang Ying, linux-kernel, Andi Kleen,
	Linus Torvalds, Chris Mason, Borislav Petkov, Robert Lippert

On Fri, Jan 21, 2011 at 09:39:34AM -0800, Tim Hockin wrote:
> >> Of course, that's why the upstream EDAC code uses printk too. In fact it
> >> does all
> >> sorts of in-kernel decoding to make the printk output more useful - the
> >> /dev/mcelog
> >> method of pushing all decoding to user-space is fundamentally flawed.
> 
> EDAC is fundamentally flawed and we don't use it any more.  It strips
> off so much information that we can't actually figure out what
> happened to the level we want.  We do it in userspace now.

Well, you better make sure to tell me what information you need reported
and I'll try to get it fixed :) Currently, we can decode all MCEs in the
kernel and when the MCE is reporting a DRAM ECC error we can get you the
chip select it resulted from with EDAC.

We can also get you the bank, row and column from which the error
originates (could be added easily to amd64_edac.c).

[..]

> > It's also very ignorant to assume that the kernel knows everything about the
> > system and is capable of decoding errors to the satisfaction of userland.
> >  As Duncan Laurie pointed out (https://lkml.org/lkml/2011/1/11/390) we care
> > about not only the physical address, but which stick and which dimm *chip*
> > on the stick is having problems.  In-kernel abstractions  break down due to
> > the following:
> 
> This.  Andi was trying to use DMI tables to decode physical address to
> DIMMs, but I'll tell you this: I have yet to see a platform that has
> THAT MUCH information in the DMI tables and have it be *correct*.

and yes, there's not a fool-proof and generic way to tell which chip
select on the system points at which DIMM. And excuse me, but I really
really think that reading i2c devices and deciphering SPD ROM info from
them is still not the optimal solution - it should be easier and more
transparent than that. But guess what, this might change...

> >   * The kernel couldn't possible know how my i2c busses are setup and the
> > SPD EEPROMs are related to the physical memory abstraction that the bios
> > sets up for me.  I don't know of any standard way to have the BIOS expose
> > this sort of information to the operating system.  This sort of layout
> > changes between motherboard spins quite frequently as well, so good luck
> > mapping it yourself in any generic way.
> >
> >   * The kernel couldn't know how to map SPD JEDEC Manufacturer ID, Model
> > part number and revision to anything useful about the chips themselves.
> >
> >   * The kernel also couldn't know how to communicate with the AMBs in a
> > meaningful way (if present).
> >
> >
> > At the end of the day,   The only things I really care about are:
> >
> >   * I don't care if the kernel pre-processes the data it gets from the
> > hardware when there is an error.  For most users, burping something out to
> > the logs in decoded form is generally useful.  It isn't for us.
> >   * Don't ever put the kernel in a position where it will spam the logs and
> > wedge the system -- even if the hardware is wonky.
> 
> I'll add to this - sometimes 100 MCEs/second is acceptable.  The
> Kernel needs to not flake out under that.

Yeah, we got that, you want error reporting to be configurable and not
only over printk - we'll fix it.

> >   * Don't dummy the data such that I can't do the same calculations with
> > better visibility from userland.
> 
> This.  We do extensive analysis of data in userland.

Yeah, we want to put the MCE register info along with the decoded info.
We don't want to dummy up the data - we want to make it more useful.

> >   * Don't ever enforce a reactive policy that can't be changed from
> > userland.
> >   * I don't care whether the data comes from netlink, /dev/mcelog,
> > whiz-bang-sysfs uevent, or thingamaboo perfevents doohickie: as long as I
> > get events that are both atomic+consistent and the ABI is maintained.
> 
> I've been asking for hardware events for ever.  I seem to recall a
> proposal from IBM at OLS 2002 or 2003 where this was discussed.  I
> wanted it then, and I still want it.  But I don't just want MCEs.  Why
> can I not use the same channel to get PCI errors or SATA errors or
> EDAC (non-MCE) errors.
> 
> I don't care what the channel is, so long as I can rate-limit
> (/dev/mcelog is pretty good at that) events and the events I read
> contain full details about what happened.

Ok, makes sense.

> > I've CCed Robert who owns our userland bits as he may have something to add.
> >
> > That said, I'd love to have generic NMI-safe data-passing for improved
> > debugability, regardless of this conflated bickering about RAS
> > infrastructure :)

Thanks for the suggestions, much appreciated.

-- 
Regards/Gruss,
Boris.

Advanced Micro Devices GmbH
Einsteinring 24, 85609 Dornach
General Managers: Alberto Bozzo, Andrew Bowd
Registration: Dornach, Gemeinde Aschheim, Landkreis Muenchen
Registergericht Muenchen, HRB Nr. 43632

^ permalink raw reply	[flat|nested] 25+ messages in thread

end of thread, other threads:[~2011-01-21 18:01 UTC | newest]

Thread overview: 25+ messages
2011-01-17  6:16 [PATCH -v10 0/4] Lock-less list Huang Ying
2011-01-17  6:16 ` [PATCH -v10 1/4] Add Kconfig option ARCH_HAVE_NMI_SAFE_CMPXCHG Huang Ying
2011-01-17  6:16 ` [PATCH -v10 2/4] lib, Add lock-less NULL terminated single list Huang Ying
2011-01-17  6:16 ` [PATCH -v10 3/4] irq_work, Use llist in irq_work Huang Ying
2011-01-17  6:16 ` [PATCH -v10 4/4] net, rds, Replace xlist in net/rds/xlist.h with llist Huang Ying
2011-01-19 21:55 ` [PATCH -v10 0/4] Lock-less list Andrew Morton
2011-01-20  0:45   ` Huang Ying
2011-01-20  0:52     ` Andrew Morton
2011-01-20  1:09       ` Huang Ying
2011-01-20 10:44       ` Peter Zijlstra
2011-01-20 11:18         ` huang ying
2011-01-20 11:27           ` Peter Zijlstra
2011-01-20 11:57             ` huang ying
2011-01-20 12:14               ` Ingo Molnar
2011-01-20 12:49                 ` huang ying
2011-01-20 13:06                   ` Ingo Molnar
2011-01-20 13:24                     ` huang ying
2011-01-20 13:36                     ` Borislav Petkov
2011-01-20 14:11                       ` Ingo Molnar
2011-01-20 17:59                         ` Luck, Tony
2011-01-20 22:53                     ` Mike Waychison
2011-01-21 17:39                       ` Tim Hockin
2011-01-21 18:01                         ` Borislav Petkov
2011-01-20  5:55 ` Mathieu Desnoyers
2011-01-20  8:57   ` huang ying
