* [RFC PATCH 0/5] Priority Sifting Reader-Writer Lock v13
@ 2008-09-09  0:34 Mathieu Desnoyers
  2008-09-09  0:34 ` [RFC PATCH 1/5] " Mathieu Desnoyers
                   ` (4 more replies)
  0 siblings, 5 replies; 6+ messages in thread
From: Mathieu Desnoyers @ 2008-09-09  0:34 UTC (permalink / raw)
  To: Linus Torvalds, H. Peter Anvin, Jeremy Fitzhardinge,
	Andrew Morton, Ingo Molnar, Paul E. McKenney, Peter Zijlstra,
	Joe Perches, Wei Weng, linux-kernel

Hi,

Here is the reworked version of what was initially called "Fair rwlock" and
then "writer-biased rwlock". Hopefully the new name better reflects the
innovation in this reader-writer locking scheme.

Thanks to Linus' patient explanations, it uses a single atomic op on a 32-bit
variable in the fast path. The bright side is a very compact fast path and
practically no limitation on the number of readers or writers. The downside is
added memory ordering complexity between the fast and slow path variables in
the slow path.
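
For reference, the uncontended fast paths boil down to the following (condensed
from the psrwlock.h header in patch 1/5, slow-path details elided) :

	/* Reader fast path : a single 32-bit cmpxchg on rwlock->uc. */
	uc = atomic_cmpxchg(&rwlock->uc, 0, UC_READER_OFFSET);
	if (likely(!uc))
		return;				/* got the read lock */
	psread_lock_slow(uc, rwlock);		/* contended */

	/* Writer fast path : same word, single UC_WRITER bit. */
	write_context_disable(wctx, rctx);	/* e.g. disable irqs */
	uc = atomic_cmpxchg(&rwlock->uc, 0, UC_WRITER);
	if (likely(!uc))
		return;				/* got the write lock */
	pswrite_lock_slow(uc, rwlock);		/* contended */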

Mathieu

-- 
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68


* [RFC PATCH 1/5] Priority Sifting Reader-Writer Lock v13
  2008-09-09  0:34 [RFC PATCH 0/5] Priority Sifting Reader-Writer Lock v13 Mathieu Desnoyers
@ 2008-09-09  0:34 ` Mathieu Desnoyers
  2008-09-09  0:34 ` [RFC PATCH 2/5] Priority Sifting Reader-Writer Lock Documentation Mathieu Desnoyers
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 6+ messages in thread
From: Mathieu Desnoyers @ 2008-09-09  0:34 UTC (permalink / raw)
  To: Linus Torvalds, H. Peter Anvin, Jeremy Fitzhardinge,
	Andrew Morton, Ingo Molnar, Paul E. McKenney, Peter Zijlstra,
	Joe Perches, Wei Weng, linux-kernel
  Cc: Mathieu Desnoyers

[-- Attachment #1: psrwlock.patch --]
[-- Type: text/plain, Size: 43532 bytes --]

Priority Sifting Reader-Writer Lock (psrwlock) excludes reader execution
contexts one at a time, thus increasing the writer priority in stages. It
favors writers over readers, but lets higher priority readers access the lock
even when subscribed writers are waiting for the lock at a lower priority.
Very frequent writers could starve reader threads.
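
Typical usage, taken from the example in the psrwlock.h header below, for a
lock read from interrupt and preemptable thread context and written from
preemptable thread context :

	static DEFINE_PSRWLOCK(lock, PSRW_PRIO_P, PSR_IRQ | PSR_PTHREAD);
	CHECK_PSRWLOCK_MAP(lock, PSRW_PRIO_P, PSR_IRQ | PSR_PTHREAD);

	/* Writer (preemptable thread context) */
	pswrite_lock(&lock, PSRW_PRIO_P, PSR_IRQ | PSR_PTHREAD);
	/* ... modify the protected data ... */
	pswrite_unlock(&lock, PSRW_PRIO_P, PSR_IRQ | PSR_PTHREAD);

	/* Reader (interrupt context) */
	psread_lock_irq(&lock);
	/* ... read the protected data ... */
	psread_unlock(&lock);

	/* Reader (preemptable thread context) */
	psread_lock(&lock);
	/* ... read the protected data ... */
	psread_unlock(&lock);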


I used LTTng traces and eventually made a small patch to lockdep to detect
whenever a spinlock or an rwlock is used both with interrupts enabled and
disabled. Those sites are likely to produce very high latencies and should IMHO
be considered bogus. The basic bogus scenario is a spinlock held on CPU A with
interrupts enabled being interrupted, after which a softirq runs. On CPU B, the
same lock is acquired with interrupts off. We therefore disable interrupts on
CPU B for the duration of the softirq currently running on CPU A, which does
nothing to help keep latencies short. My preliminary results show that there
are a lot of inconsistent spinlock/rwlock irq on/off uses in the kernel.

This kind of scenario is pretty easy to fix for spinlocks (either move the
interrupt disabling within the spinlock section if the spinlock is never used
by an interrupt handler, or make sure that every user has interrupts
disabled).

The problem comes with rwlocks : it is correct to have readers both with and
without irqs disabled, even when interrupt handlers use the read lock.
However, the write lock has to disable interrupts in that case, and we suffer
from the high latency I pointed out. The tasklist_lock is the perfect example
of this. In the following patch, I try to address this issue.
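
To illustrate, with the standard rwlock API (lock name is hypothetical, the
tasklist_lock follows this pattern) :

	DEFINE_RWLOCK(mylock);

	/* Slow reader, thread context, interrupts stay enabled. */
	read_lock(&mylock);
	/* ... long list iteration ... */
	read_unlock(&mylock);

	/* Fast reader, interrupt handler (interrupts already off). */
	read_lock(&mylock);
	/* ... short read ... */
	read_unlock(&mylock);

	/*
	 * Rare writer : interrupts must be disabled because interrupt
	 * handlers take the read lock. While the writer spins waiting for
	 * the slow thread reader to finish, interrupts stay disabled on
	 * this CPU for the whole duration of that slow read.
	 */
	write_lock_irq(&mylock);
	/* ... update ... */
	write_unlock_irq(&mylock);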

TODO :
- Add writer-writer fairness using "tickets" instead of single-bit mutexes

- Add lockdep support
- Create a compatibility layer to make port of current rwlock easier
- Use a priority barrel (shifting the reader priority bits in a loop to
  generalize the number of reader priorities up to a maximum of 64 with a u64).
- For -rt : support priority inheritance

Other name ideas (RFC) :
  - Priority Sifting Reader-Writer Lock
  - Staircase Reader-Writer Lock
  - Staged Priority Elevation Reader-Writer Lock

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
CC: Linus Torvalds <torvalds@linux-foundation.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
CC: Jeremy Fitzhardinge <jeremy@goop.org>
CC: Andrew Morton <akpm@linux-foundation.org>
CC: Ingo Molnar <mingo@elte.hu>
CC: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
CC: Peter Zijlstra <peterz@infradead.org>
CC: Joe Perches <joe@perches.com>
CC: Wei Weng <wweng@acedsl.com>
---
 include/linux/psrwlock-types.h |   92 ++++
 include/linux/psrwlock.h       |  384 ++++++++++++++++++
 lib/Kconfig.debug              |    3 
 lib/Makefile                   |    3 
 lib/psrwlock.c                 |  839 +++++++++++++++++++++++++++++++++++++++++
 5 files changed, 1321 insertions(+)

Index: linux-2.6-lttng/include/linux/psrwlock.h
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6-lttng/include/linux/psrwlock.h	2008-09-08 20:29:11.000000000 -0400
@@ -0,0 +1,384 @@
+#ifndef _LINUX_PSRWLOCK_H
+#define _LINUX_PSRWLOCK_H
+
+/*
+ * Priority Sifting Reader-Writer Lock
+ *
+ * Priority Sifting Reader-Writer Lock (psrwlock) excludes reader execution
+ * contexts one at a time, thus increasing the writer priority in stages. It
+ * favors writers against reader threads, but lets higher priority readers in
+ * even when there are subscribed writers waiting for the lock at a given lower
+ * priority. Very frequent writers could starve reader threads.
+ *
+ * See psrwlock-types.h for types definitions.
+ * See psrwlock.c for algorithmic details.
+ *
+ * Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
+ * August 2008
+ */
+
+#include <linux/hardirq.h>
+#include <linux/wait.h>
+#include <linux/psrwlock-types.h>
+
+#include <asm/atomic.h>
+
+#define NR_PREEMPT_BUSY_LOOPS	100
+
+/*
+ * Uncontended word bits (32 bits)
+ *
+ * Because we deal with overflow by busy-looping waiting for the counter to
+ * decrement, make sure the maximum allowed for lower-priority execution
+ * contexts is lower than the maximum for higher priority execution contexts.
+ * Therefore, all contexts use the same counter bits, but they reach their
+ * overflow capacity one bit apart from each other (only used in the slow path).
+ *
+ * 3 bits for status
+ * 29 bits for reader count
+ *   reserve 1 high bit for irqs
+ *   reserve 1 high bit for bh
+ *   reserve 1 high bit for non-preemptable threads
+ *   26 bits left for preemptable readers count
+ */
+#define UC_READER_MAX		(1U << 29)
+#define UC_HARDIRQ_READER_MAX	UC_READER_MAX
+#define UC_SOFTIRQ_READER_MAX	(UC_HARDIRQ_READER_MAX >> 1)
+#define UC_NPTHREAD_READER_MAX	(UC_SOFTIRQ_READER_MAX >> 1)
+#define UC_PTHREAD_READER_MAX	(UC_NPTHREAD_READER_MAX >> 1)
+
+#define UC_WRITER		(1U << 0)
+#define UC_SLOW_WRITER		(1U << 1)
+#define UC_WQ_ACTIVE		(1U << 2)
+#define UC_READER_OFFSET	(1U << 3)
+#define UC_HARDIRQ_READER_MASK	((UC_HARDIRQ_READER_MAX - 1) * UC_READER_OFFSET)
+#define UC_SOFTIRQ_READER_MASK	((UC_SOFTIRQ_READER_MAX - 1) * UC_READER_OFFSET)
+#define UC_NPTHREAD_READER_MASK		\
+	((UC_NPTHREAD_READER_MAX - 1) * UC_READER_OFFSET)
+#define UC_PTHREAD_READER_MASK	((UC_PTHREAD_READER_MAX - 1) * UC_READER_OFFSET)
+#define UC_READER_MASK		UC_HARDIRQ_READER_MASK
+
+
+/*
+ * Writers in slow path count and mutexes (32 bits)
+ *
+ * 1 bit for WS_WQ_MUTEX (wait queue mutex, always taken with irqs off)
+ * 1 bit for WS_COUNT_MUTEX (protects writer count and UC_SLOW_WRITER updates,
+ *                           taken in initial writer context).
+ * 1 bit for WS_LOCK_MUTEX (single writer in critical section)
+ * 29 bits for writer count.
+ */
+#define WS_WQ_MUTEX		(1U << 0)
+#define WS_COUNT_MUTEX		(1U << 1)
+#define WS_LOCK_MUTEX		(1U << 2)
+
+#define WS_MAX			(1U << 29)
+#define WS_OFFSET		(1U << 3)
+#define WS_MASK			((WS_MAX - 1) * WS_OFFSET)
+
+
+/*
+ * Per-context slow path reader and writer count maximum, offset and mask.
+ * unsigned long type. Used to atomically detect that there is no contention in
+ * a given slow path context and subscribe a writer or let a reader take the
+ * slow path context lock.
+ */
+#define CTX_WOFFSET		(1UL << 0)
+#define CTX_WMAX		(1UL << (BITS_PER_LONG/2))
+#define CTX_WMASK		((CTX_WMAX - 1) * CTX_WOFFSET)
+
+#define CTX_ROFFSET		CTX_WMAX
+#define CTX_RMAX		(1UL << (BITS_PER_LONG/2))
+#define CTX_RMASK		((CTX_RMAX - 1) * CTX_ROFFSET)
+
+
+/*
+ * Internal slow paths.
+ */
+extern asmregparm
+void _psread_lock_slow_irq(unsigned int uc, psrwlock_t *rwlock);
+extern asmregparm
+int _psread_trylock_slow_irq(unsigned int uc, psrwlock_t *rwlock);
+extern asmregparm
+void _psread_lock_slow_bh(unsigned int uc, psrwlock_t *rwlock);
+extern asmregparm
+int _psread_trylock_slow_bh(unsigned int uc, psrwlock_t *rwlock);
+extern asmregparm
+void _psread_lock_slow_inatomic(unsigned int uc, psrwlock_t *rwlock);
+extern asmregparm
+int _psread_trylock_slow_inatomic(unsigned int uc, psrwlock_t *rwlock);
+extern asmregparm
+void _psread_lock_slow(unsigned int uc, psrwlock_t *rwlock);
+extern asmregparm
+int _psread_trylock_slow(unsigned int uc, psrwlock_t *rwlock);
+
+extern asmregparm
+void _pswrite_lock_slow(unsigned int uc, psrwlock_t *rwlock);
+extern asmregparm
+int _pswrite_trylock_slow(unsigned int uc, psrwlock_t *rwlock);
+extern asmregparm
+void _pswrite_unlock_slow(unsigned int uc, psrwlock_t *rwlock);
+extern asmregparm
+void _psrwlock_wakeup(unsigned int uc, psrwlock_t *rwlock);
+
+#ifdef CONFIG_HAVE_PSRWLOCK_ASM_CALL
+#include <asm/call_64.h>
+#else
+#define psread_lock_slow_irq		_psread_lock_slow_irq
+#define psread_trylock_slow_irq		_psread_trylock_slow_irq
+#define psread_lock_slow_bh		_psread_lock_slow_bh
+#define psread_trylock_slow_bh		_psread_trylock_slow_bh
+#define psread_lock_slow_inatomic	_psread_lock_slow_inatomic
+#define psread_trylock_slow_inatomic	_psread_trylock_slow_inatomic
+#define psread_lock_slow		_psread_lock_slow
+#define psread_trylock_slow		_psread_trylock_slow
+
+#define pswrite_lock_slow		_pswrite_lock_slow
+#define pswrite_trylock_slow		_pswrite_trylock_slow
+#define pswrite_unlock_slow		_pswrite_unlock_slow
+#define psrwlock_wakeup			_psrwlock_wakeup
+#endif
+
+/*
+ * psrwlock-specific latency tracing, maps to standard macros by default.
+ */
+#ifdef CONFIG_PSRWLOCK_LATENCY_TEST
+#include <linux/psrwlock-latency-trace.h>
+#else
+static inline void psrwlock_profile_latency_reset(void)
+{ }
+static inline void psrwlock_profile_latency_print(void)
+{ }
+
+#define psrwlock_irq_save(flags)		local_irq_save(flags)
+#define psrwlock_irq_restore(flags)		local_irq_restore(flags)
+#define psrwlock_irq_disable()			local_irq_disable()
+#define psrwlock_irq_enable()			local_irq_enable()
+#define psrwlock_bh_disable()			local_bh_disable()
+#define psrwlock_bh_enable()			local_bh_enable()
+#define psrwlock_bh_enable_ip(ip)		local_bh_enable_ip(ip)
+#define psrwlock_preempt_disable()		preempt_disable()
+#define psrwlock_preempt_enable()		preempt_enable()
+#define psrwlock_preempt_enable_no_resched()	preempt_enable_no_resched()
+#endif
+
+/*
+ * Internal preemption/softirq/irq disabling helpers. Optimized into simple use
+ * of standard local_irq_disable, local_bh_disable, preempt_disable by the
+ * compiler since wctx and rctx are constant.
+ */
+
+static inline void write_context_disable(enum psrw_prio wctx, u32 rctx)
+{
+	if (wctx != PSRW_PRIO_IRQ && (rctx & PSR_IRQ))
+		psrwlock_irq_disable();
+	else if (wctx != PSRW_PRIO_BH && (rctx & PSR_BH))
+		psrwlock_bh_disable();
+	else if (wctx != PSRW_PRIO_NP && (rctx & PSR_NPTHREAD))
+		psrwlock_preempt_disable();
+}
+
+static inline void write_context_enable(enum psrw_prio wctx, u32 rctx)
+{
+	if (wctx != PSRW_PRIO_IRQ && (rctx & PSR_IRQ))
+		psrwlock_irq_enable();
+	else if (wctx != PSRW_PRIO_BH && (rctx & PSR_BH))
+		psrwlock_bh_enable();
+	else if (wctx != PSRW_PRIO_NP && (rctx & PSR_NPTHREAD))
+		psrwlock_preempt_enable();
+}
+
+/*
+ * psrwlock_preempt_check must have a uc parameter read with a memory
+ * barrier making sure the slow path variable writes and the UC_WQ_ACTIVE flag
+ * read are done in this order (either a smp_mb() or a atomic_sub_return()).
+ */
+static inline void psrwlock_preempt_check(unsigned int uc,
+		psrwlock_t *rwlock)
+{
+	if (unlikely(uc & UC_WQ_ACTIVE))
+		psrwlock_wakeup(uc, rwlock);
+}
+
+
+/*
+ * API
+ */
+
+/* Reader lock */
+
+/*
+ * many readers, from irq/softirq/non preemptable and preemptable thread
+ * context. Protects against writers.
+ *
+ * Read lock fastpath :
+ *
+ * A cmpxchg is used here and _not_ a simple add because a lower-priority reader
+ * could block the writer while it is waiting for readers to clear the
+ * uncontended path. This would happen if, for instance, the reader gets
+ * interrupted between the add and the moment it gets to the slow path.
+ */
+
+/*
+ * Called from any context.
+ */
+static inline void psread_unlock(psrwlock_t *rwlock)
+{
+	unsigned int uc = atomic_sub_return(UC_READER_OFFSET, &rwlock->uc);
+	psrwlock_preempt_check(uc, rwlock);
+}
+
+/*
+ * Called from interrupt disabled or interrupt context.
+ */
+static inline void psread_lock_irq(psrwlock_t *rwlock)
+{
+	unsigned int uc = atomic_cmpxchg(&rwlock->uc, 0, UC_READER_OFFSET);
+	if (likely(!uc))
+		return;
+	psread_lock_slow_irq(uc, rwlock);
+}
+
+static inline int psread_trylock_irq(psrwlock_t *rwlock)
+{
+	unsigned int uc = atomic_cmpxchg(&rwlock->uc, 0, UC_READER_OFFSET);
+	if (likely(!uc))
+		return 1;
+	return psread_trylock_slow_irq(uc, rwlock);
+}
+
+/*
+ * Called from softirq context.
+ */
+
+static inline void psread_lock_bh(psrwlock_t *rwlock)
+{
+	unsigned int uc = atomic_cmpxchg(&rwlock->uc, 0, UC_READER_OFFSET);
+	if (likely(!uc))
+		return;
+	psread_lock_slow_bh(uc, rwlock);
+}
+
+static inline int psread_trylock_bh(psrwlock_t *rwlock)
+{
+	unsigned int uc = atomic_cmpxchg(&rwlock->uc, 0, UC_READER_OFFSET);
+	if (likely(!uc))
+		return 1;
+	return psread_trylock_slow_bh(uc, rwlock);
+}
+
+
+/*
+ * Called from non-preemptable thread context.
+ */
+
+static inline void psread_lock_inatomic(psrwlock_t *rwlock)
+{
+	unsigned int uc = atomic_cmpxchg(&rwlock->uc, 0, UC_READER_OFFSET);
+	if (likely(!uc))
+		return;
+	psread_lock_slow_inatomic(uc, rwlock);
+}
+
+static inline int psread_trylock_inatomic(psrwlock_t *rwlock)
+{
+	unsigned int uc = atomic_cmpxchg(&rwlock->uc, 0, UC_READER_OFFSET);
+	if (likely(!uc))
+		return 1;
+	return psread_trylock_slow_inatomic(uc, rwlock);
+}
+
+
+/*
+ * Called from preemptable thread context.
+ */
+
+static inline void psread_lock(psrwlock_t *rwlock)
+{
+	unsigned int uc = atomic_cmpxchg(&rwlock->uc, 0, UC_READER_OFFSET);
+	if (likely(!uc))
+		return;
+	psread_lock_slow(uc, rwlock);
+}
+
+static inline int psread_trylock(psrwlock_t *rwlock)
+{
+	unsigned int uc = atomic_cmpxchg(&rwlock->uc, 0, UC_READER_OFFSET);
+	if (likely(!uc))
+		return 1;
+	return psread_trylock_slow(uc, rwlock);
+}
+
+
+/* Writer Lock */
+
+/*
+ * ctx is the context map showing which contexts can take the read lock and
+ * which context is using the write lock.
+ *
+ * Write lock use example, where the lock is used by readers in interrupt,
+ * preemptable context and non-preemptable context. The writer lock is taken in
+ * preemptable context.
+ *
+ * static DEFINE_PSRWLOCK(lock, PSRW_PRIO_P, PSR_IRQ | PSR_PTHREAD);
+ * CHECK_PSRWLOCK_MAP(lock, PSRW_PRIO_P, PSR_IRQ | PSR_PTHREAD);
+ *
+ *  pswrite_lock(&lock, PSRW_PRIO_P, PSR_IRQ | PSR_PTHREAD);
+ *  ...
+ *  pswrite_unlock(&lock, PSRW_PRIO_P, PSR_IRQ | PSR_PTHREAD);
+ */
+static inline
+void pswrite_lock(psrwlock_t *rwlock, enum psrw_prio wctx, u32 rctx)
+{
+	unsigned int uc;
+
+	write_context_disable(wctx, rctx);
+	/* no other reader nor writer present, try to take the lock */
+	uc = atomic_cmpxchg(&rwlock->uc, 0, UC_WRITER);
+	if (likely(!uc))
+		return;
+	else
+		pswrite_lock_slow(uc, rwlock);
+}
+
+static inline
+int pswrite_trylock(psrwlock_t *rwlock, enum psrw_prio wctx, u32 rctx)
+{
+	unsigned int uc;
+
+	write_context_disable(wctx, rctx);
+	/* no other reader nor writer present, try to take the lock */
+	uc = atomic_cmpxchg(&rwlock->uc, 0, UC_WRITER);
+	if (likely(!uc))
+		return 1;
+	else
+		return pswrite_trylock_slow(uc, rwlock);
+}
+
+static inline
+void pswrite_unlock(psrwlock_t *rwlock, enum psrw_prio wctx, u32 rctx)
+{
+	unsigned int uc;
+
+	/*
+	 * atomic_cmpxchg makes sure we commit the data before reenabling
+	 * the lock. Will take the slow path if there are active readers, if
+	 * UC_SLOW_WRITER is set or if there are threads in the wait queue.
+	 */
+	uc = atomic_cmpxchg(&rwlock->uc, UC_WRITER, 0);
+	if (likely(uc == UC_WRITER)) {
+		write_context_enable(wctx, rctx);
+		/*
+		 * no need to check preempt because all wait queue masks
+		 * were 0. An active wait queue would trigger the slow path.
+		 */
+		return;
+	}
+	/*
+	 * Go through the slow unlock path to check if we must clear the
+	 * UC_SLOW_WRITER bit.
+	 */
+	pswrite_unlock_slow(uc, rwlock);
+}
+
+#endif /* _LINUX_PSRWLOCK_H */
Index: linux-2.6-lttng/lib/Makefile
===================================================================
--- linux-2.6-lttng.orig/lib/Makefile	2008-09-08 20:28:14.000000000 -0400
+++ linux-2.6-lttng/lib/Makefile	2008-09-08 20:29:11.000000000 -0400
@@ -43,6 +43,9 @@ obj-$(CONFIG_DEBUG_PREEMPT) += smp_proce
 obj-$(CONFIG_DEBUG_LIST) += list_debug.o
 obj-$(CONFIG_DEBUG_OBJECTS) += debugobjects.o
 
+obj-y += psrwlock.o
+obj-$(CONFIG_PSRWLOCK_LATENCY_TEST) += psrwlock-latency-trace.o
+
 ifneq ($(CONFIG_HAVE_DEC_LOCK),y)
   lib-y += dec_and_lock.o
 endif
Index: linux-2.6-lttng/lib/psrwlock.c
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6-lttng/lib/psrwlock.c	2008-09-08 20:29:11.000000000 -0400
@@ -0,0 +1,839 @@
+/*
+ * Priority Sifting Reader-Writer Lock
+ *
+ * Priority Sifting Reader-Writer Lock (psrwlock) excludes reader execution
+ * contexts one at a time, thus increasing the writer priority in stages. It
+ * favors writers against reader threads, but lets higher priority readers in
+ * even when there are subscribed writers waiting for the lock at a given lower
+ * priority. Very frequent writers could starve reader threads.
+ *
+ * Copyright 2008 Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
+ */
+
+#include <linux/psrwlock.h>
+#include <linux/wait.h>
+#include <linux/freezer.h>
+#include <linux/module.h>
+
+#include <asm/processor.h>
+
+#ifdef WBIAS_RWLOCK_DEBUG
+#define printk_dbg printk
+#else
+#define printk_dbg(fmt, args...)
+#endif
+
+enum preempt_type {
+	PSRW_PREEMPT,		/* preemptable */
+	PSRW_NON_PREEMPT,	/* non-preemptable */
+};
+
+enum lock_type {
+	PSRW_READ,
+	PSRW_WRITE,
+};
+
+enum v_type {
+	V_INT,
+	V_LONG,
+};
+
+static void rwlock_wait(void *vptr, psrwlock_t *rwlock,
+		unsigned long mask, unsigned long test_mask,
+		unsigned long full_mask, int check_full_mask,
+		enum v_type vtype, enum lock_type ltype);
+
+/*
+ * Lock out a specific uncontended execution context from the read lock. Wait
+ * for the rmask (readers in previous context count) and for the writer count in
+ * the new context not to be full before proceeding to subscribe to the new
+ * write context.
+ */
+static int _pswrite_lock_ctx_wait_sub(void *v_inout,
+		void *vptr, psrwlock_t *rwlock,
+		unsigned long wait_mask, unsigned long test_mask,
+		unsigned long full_mask, long offset,
+		enum v_type vtype, enum lock_type ltype,
+		enum preempt_type ptype, int trylock)
+{
+	long try = NR_PREEMPT_BUSY_LOOPS;
+	unsigned long newv;
+	unsigned long v;
+
+	if (vtype == V_LONG)
+		v = *(unsigned long *)v_inout;
+	else
+		v = *(unsigned int *)v_inout;
+
+	printk_dbg("wait sub start v %lX, new %lX, wait_mask %lX, "
+		"test_mask %lX, full_mask %lX, offset %lX\n",
+		v, v + offset, wait_mask, test_mask, full_mask, offset);
+
+	for (;;) {
+		if (v & wait_mask || (v & test_mask) >= full_mask) {
+			if (trylock)
+				return 0;
+			if (ptype == PSRW_PREEMPT && unlikely(!(--try))) {
+				rwlock_wait(vptr, rwlock, wait_mask,
+					test_mask, full_mask, 1,
+					vtype, ltype);
+				try = NR_PREEMPT_BUSY_LOOPS;
+			} else
+				cpu_relax();	/* Order v reads */
+			if (vtype == V_LONG)
+				v = atomic_long_read((atomic_long_t *)vptr);
+			else
+				v = atomic_read((atomic_t *)vptr);
+			continue;
+		}
+		if (vtype == V_LONG)
+			newv = atomic_long_cmpxchg((atomic_long_t *)vptr,
+				v, v + offset);
+		else
+			newv = atomic_cmpxchg((atomic_t *)vptr,
+				(int)v, (int)v + (int)offset);
+		if (likely(newv == v))
+			break;
+		else {
+			if (trylock)
+				return 0;
+			v = newv;
+		}
+	}
+	printk_dbg("wait sub end v %lX, new %lX, wait_mask %lX, "
+		"test_mask %lX, full_mask %lX, offset %lX\n",
+		v, v + offset, wait_mask, test_mask, full_mask, offset);
+	/* cmpxchg orders memory reads and writes */
+	v += offset;
+	if (vtype == V_LONG)
+		*(unsigned long *)v_inout = v;
+	else
+		*(unsigned int *)v_inout = v;
+	return 1;
+}
+
+static int _pswrite_lock_ctx_wait(unsigned long v_in, void *vptr,
+		psrwlock_t *rwlock, unsigned long wait_mask,
+		enum v_type vtype, enum lock_type ltype,
+		enum preempt_type ptype, int trylock)
+{
+	int try = NR_PREEMPT_BUSY_LOOPS;
+	unsigned long v = v_in;
+
+	printk_dbg("wait start v %lX, wait_mask %lX\n", v, wait_mask);
+	/* order all read and write memory operations. */
+	smp_mb();
+	while (v & wait_mask) {
+		if (ptype == PSRW_PREEMPT && unlikely(!(--try))) {
+			if (trylock)
+				return 0;
+			rwlock_wait(vptr, rwlock, wait_mask, 0, 0, 0, vtype,
+				ltype);
+			try = NR_PREEMPT_BUSY_LOOPS;
+		} else
+			cpu_relax();	/* Order v reads */
+		if (vtype == V_LONG)
+			v = atomic_long_read((atomic_long_t *)vptr);
+		else
+			v = atomic_read((atomic_t *)vptr);
+	}
+	/* order all read and write memory operations. */
+	smp_mb();
+	printk_dbg("wait end v %lX, wait_mask %lX\n", v, wait_mask);
+
+	return 1;
+}
+
+/*
+ * Go into a wait queue.
+ *
+ * mask, v & full_mask == full_mask are the conditions for which we wait.
+ */
+static void rwlock_wait(void *vptr, psrwlock_t *rwlock,
+		unsigned long mask, unsigned long test_mask,
+		unsigned long full_mask, int check_full_mask,
+		enum v_type vtype, enum lock_type ltype)
+{
+	DECLARE_WAITQUEUE(psrwlock_wq, current);
+	unsigned long v;
+	int wq_active, ws;
+
+	/*
+	 * Busy-loop waiting for the waitqueue mutex.
+	 */
+	psrwlock_irq_disable();
+	ws = atomic_read(&rwlock->ws);
+	_pswrite_lock_ctx_wait_sub(&ws, &rwlock->ws, rwlock,
+		0, WS_WQ_MUTEX, WS_WQ_MUTEX, WS_WQ_MUTEX,
+		V_INT, ltype, PSRW_NON_PREEMPT, 0);
+	/*
+	 * Got the waitqueue mutex, get into the wait queue.
+	 */
+	wq_active = waitqueue_active(&rwlock->wq_read)
+			|| waitqueue_active(&rwlock->wq_write);
+	if (!wq_active)
+		atomic_add(UC_WQ_ACTIVE, &rwlock->uc);
+	/* Set the UC_WQ_ACTIVE flag before testing the condition. */
+	smp_mb();
+	/*
+	 * Before we go to sleep, check that the lock we were expecting
+	 * did not free between the moment we last checked for the lock and the
+	 * moment we raised the UC_WQ_ACTIVE flag.
+	 */
+	if (vtype == V_LONG)
+		v = atomic_long_read((atomic_long_t *)vptr);
+	else
+		v = atomic_read((atomic_t *)vptr);
+	if (unlikely(!(v & mask || (check_full_mask
+			&& (v & test_mask) >= full_mask))))
+		goto skip_sleep;
+	/*
+	 * Only one thread will be woken up at a time.
+	 */
+	if (ltype == PSRW_WRITE)
+		add_wait_queue_exclusive_locked(&rwlock->wq_write,
+			&psrwlock_wq);
+	else
+		__add_wait_queue(&rwlock->wq_read, &psrwlock_wq);
+	__set_current_state(TASK_UNINTERRUPTIBLE);
+	smp_mb();	/* Ensure memory ordering when clearing the mutex. */
+	atomic_sub(WS_WQ_MUTEX, &rwlock->ws);
+	psrwlock_irq_enable();
+
+	try_to_freeze();
+	schedule();
+
+	/*
+	 * Woken up; Busy-loop waiting for the waitqueue mutex.
+	 */
+	psrwlock_irq_disable();
+	ws = atomic_read(&rwlock->ws);
+	_pswrite_lock_ctx_wait_sub(&ws, &rwlock->ws, rwlock,
+		0, WS_WQ_MUTEX, WS_WQ_MUTEX, WS_WQ_MUTEX,
+		V_INT, ltype, PSRW_NON_PREEMPT, 0);
+	__set_current_state(TASK_RUNNING);
+	if (ltype == PSRW_WRITE)
+		remove_wait_queue_locked(&rwlock->wq_write, &psrwlock_wq);
+	else
+		remove_wait_queue_locked(&rwlock->wq_read, &psrwlock_wq);
+skip_sleep:
+	wq_active = waitqueue_active(&rwlock->wq_read)
+			|| waitqueue_active(&rwlock->wq_write);
+	if (!wq_active)
+		atomic_sub(UC_WQ_ACTIVE, &rwlock->uc);
+	smp_mb();	/* Ensure memory ordering when clearing the mutex. */
+	atomic_sub(WS_WQ_MUTEX, &rwlock->ws);
+	psrwlock_irq_enable();
+}
+
+/*
+ * Reader lock
+ */
+
+/*
+ * _psread_lock_fast_check
+ *
+ * Second cmpxchg taken in case of many active readers.
+ * Will busy-loop if cmpxchg fails even in trylock mode.
+ *
+ * First try to get the uncontended lock. If it is non-zero (can be common,
+ * since we allow multiple readers), pass the returned cmpxchg v to the loop
+ * to try to get the reader lock.
+ *
+ * trylock will fail if a writer is subscribed or holds the lock, but will
+ * spin if there is concurrency to win the cmpxchg. That can happen if, for
+ * instance, other concurrent readers need to update the roffset or if a
+ * writer updates lock bits which do not contend with us. Since many
+ * concurrent readers is a common case, it makes sense not to fail if that
+ * happens.
+ *
+ * The non-trylock case will spin in both situations.
+ *
+ * Busy-loop if the reader count is full.
+ */
+static int _psread_lock_fast_check(unsigned int uc, psrwlock_t *rwlock,
+	unsigned int uc_rmask)
+{
+	unsigned int newuc;
+
+	/*
+	 * This is the second cmpxchg taken in case of many active readers.
+	 */
+	while (likely(!(uc & (UC_SLOW_WRITER | UC_WRITER))
+			&& (uc & UC_READER_MASK) < uc_rmask)) {
+		newuc = atomic_cmpxchg(&rwlock->uc, uc, uc + UC_READER_OFFSET);
+		if (likely(newuc == uc))
+			return 1;
+		else
+			uc = newuc;
+	}
+	return 0;
+}
+
+int __psread_lock_slow(psrwlock_t *rwlock,
+		unsigned int uc_rmask, atomic_long_t *vptr,
+		int trylock, enum preempt_type ptype)
+{
+	u32 rctx = rwlock->rctx_bitmap;
+	unsigned long v;
+	unsigned int uc;
+	int ret;
+
+	if (unlikely(in_irq() || irqs_disabled()))
+		WARN_ON_ONCE(!(rctx & PSR_IRQ) || ptype != PSRW_NON_PREEMPT);
+	else if (in_softirq())
+		WARN_ON_ONCE(!(rctx & PSR_BH) || ptype != PSRW_NON_PREEMPT);
+#ifdef CONFIG_PREEMPT
+	else if (in_atomic())
+		WARN_ON_ONCE(!(rctx & PSR_NPTHREAD)
+			|| ptype != PSRW_NON_PREEMPT);
+	else
+		WARN_ON_ONCE(!(rctx & PSR_PTHREAD) || ptype != PSRW_PREEMPT);
+#else
+	else
+		WARN_ON_ONCE((!(rctx & PSR_NPTHREAD)
+				|| ptype != PSRW_NON_PREEMPT)
+				&& (!(rctx & PSR_PTHREAD)
+				|| ptype != PSRW_PREEMPT));
+#endif
+
+	/*
+	 * A cmpxchg read uc, which implies strict ordering.
+	 */
+	v = atomic_long_read(vptr);
+	ret = _pswrite_lock_ctx_wait_sub(&v, vptr, rwlock,
+		CTX_WMASK, CTX_RMASK, CTX_RMASK, CTX_ROFFSET,
+		V_LONG, PSRW_READ, ptype, trylock);
+	if (unlikely(!ret))
+		goto fail;
+
+	/*
+	 * We are in! Well, we just have to busy-loop waiting for any
+	 * uncontended writer to release its lock.
+	 *
+	 * In this exact order :
+	 * - increment the uncontended readers count.
+	 * - decrement the current context reader count we just previously got.
+	 *
+	 * This makes sure we always count in either the slow path per context
+	 * count or the uncontended reader count starting from the moment we got
+	 * the slow path count to the moment we will release the uncontended
+	 * reader count at the unlock.
+	 *
+	 * This implies a strict read/write ordering of these two variables.
+	 * Reading first "uc" and then "v" is strictly required. The current
+	 * reader count can be summed twice in the worst case, but we are only
+	 * interested in knowing whether there is _any_ reader left.
+	 */
+	uc = atomic_read(&rwlock->uc);
+	ret = _pswrite_lock_ctx_wait_sub(&uc, &rwlock->uc, rwlock,
+		UC_WRITER, UC_READER_MASK, uc_rmask, UC_READER_OFFSET,
+		V_INT, PSRW_READ, ptype, trylock);
+	/*
+	 * _pswrite_lock_ctx_wait_sub has a memory barrier
+	 */
+	atomic_long_sub(CTX_ROFFSET, vptr);
+	/*
+	 * don't care about v ordering wrt memory operations inside the
+	 * read lock. It's uc which holds our read count.
+	 */
+	if (unlikely(!ret))
+		goto fail_preempt;
+
+	/* Success */
+	return 1;
+
+	/* Failure */
+fail_preempt:
+	/* write v before reading uc */
+	smp_mb();
+	uc = atomic_read(&rwlock->uc);
+	psrwlock_preempt_check(uc, rwlock);
+fail:
+	cpu_relax();
+	return 0;
+
+}
+
+/*
+ * _psread_lock_slow : read lock slow path.
+ *
+ * Non-preemptable :
+ * Busy-wait for the specific context lock.
+ * Preemptable :
+ * Busy-wait for the specific context lock NR_PREEMPT_BUSY_LOOPS loops, and then
+ * go to the wait queue.
+ *
+ * _psread_trylock_slow : read trylock slow path.
+ *
+ * Try to get the read lock. Returns 1 if succeeds, else returns 0.
+ */
+
+asmregparm
+void _psread_lock_slow_irq(unsigned int uc, psrwlock_t *rwlock)
+{
+	int ret;
+
+	ret = _psread_lock_fast_check(uc, rwlock, UC_HARDIRQ_READER_MASK);
+	if (ret)
+		return;
+	__psread_lock_slow(rwlock, UC_HARDIRQ_READER_MASK,
+			&rwlock->prio[PSRW_PRIO_IRQ],
+			0, PSRW_NON_PREEMPT);
+}
+EXPORT_SYMBOL(_psread_lock_slow_irq);
+
+asmregparm
+void _psread_lock_slow_bh(unsigned int uc, psrwlock_t *rwlock)
+{
+	int ret;
+
+	ret = _psread_lock_fast_check(uc, rwlock, UC_SOFTIRQ_READER_MASK);
+	if (ret)
+		return;
+	__psread_lock_slow(rwlock, UC_SOFTIRQ_READER_MASK,
+			&rwlock->prio[PSRW_PRIO_BH],
+			0, PSRW_NON_PREEMPT);
+}
+EXPORT_SYMBOL(_psread_lock_slow_bh);
+
+asmregparm
+void _psread_lock_slow_inatomic(unsigned int uc, psrwlock_t *rwlock)
+{
+	int ret;
+
+	ret = _psread_lock_fast_check(uc, rwlock, UC_NPTHREAD_READER_MASK);
+	if (ret)
+		return;
+	__psread_lock_slow(rwlock, UC_NPTHREAD_READER_MASK,
+			&rwlock->prio[PSRW_PRIO_NP],
+			0, PSRW_NON_PREEMPT);
+}
+EXPORT_SYMBOL(_psread_lock_slow_inatomic);
+
+asmregparm
+void _psread_lock_slow(unsigned int uc, psrwlock_t *rwlock)
+{
+	int ret;
+
+	ret = _psread_lock_fast_check(uc, rwlock, UC_PTHREAD_READER_MASK);
+	if (ret)
+		return;
+	__psread_lock_slow(rwlock, UC_PTHREAD_READER_MASK,
+			&rwlock->prio[PSRW_PRIO_P],
+			0, PSRW_PREEMPT);
+}
+EXPORT_SYMBOL(_psread_lock_slow);
+
+asmregparm
+int _psread_trylock_slow_irq(unsigned int uc, psrwlock_t *rwlock)
+{
+	int ret;
+
+	ret = _psread_lock_fast_check(uc, rwlock, UC_HARDIRQ_READER_MASK);
+	if (ret)
+		return 1;
+	return __psread_lock_slow(rwlock, UC_HARDIRQ_READER_MASK,
+			&rwlock->prio[PSRW_PRIO_IRQ],
+			1, PSRW_NON_PREEMPT);
+}
+EXPORT_SYMBOL(_psread_trylock_slow_irq);
+
+asmregparm
+int _psread_trylock_slow_bh(unsigned int uc, psrwlock_t *rwlock)
+{
+	int ret;
+
+	ret = _psread_lock_fast_check(uc, rwlock, UC_SOFTIRQ_READER_MASK);
+	if (ret)
+		return 1;
+	return __psread_lock_slow(rwlock, UC_SOFTIRQ_READER_MASK,
+			&rwlock->prio[PSRW_PRIO_BH],
+			1, PSRW_NON_PREEMPT);
+}
+EXPORT_SYMBOL(_psread_trylock_slow_bh);
+
+asmregparm
+int _psread_trylock_slow_inatomic(unsigned int uc, psrwlock_t *rwlock)
+{
+	int ret;
+
+	ret = _psread_lock_fast_check(uc, rwlock, UC_NPTHREAD_READER_MASK);
+	if (ret)
+		return 1;
+	return __psread_lock_slow(rwlock, UC_NPTHREAD_READER_MASK,
+			&rwlock->prio[PSRW_PRIO_NP],
+			1, PSRW_NON_PREEMPT);
+}
+EXPORT_SYMBOL(_psread_trylock_slow_inatomic);
+
+asmregparm
+int _psread_trylock_slow(unsigned int uc, psrwlock_t *rwlock)
+{
+	int ret;
+
+	ret = _psread_lock_fast_check(uc, rwlock, UC_PTHREAD_READER_MASK);
+	if (ret)
+		return 1;
+	return __psread_lock_slow(rwlock, UC_PTHREAD_READER_MASK,
+			&rwlock->prio[PSRW_PRIO_P],
+			1, PSRW_PREEMPT);
+}
+EXPORT_SYMBOL(_psread_trylock_slow);
+
+
+/* Writer lock */
+
+static int _pswrite_lock_out_context(unsigned int *uc_inout,
+	atomic_long_t *vptr, psrwlock_t *rwlock,
+	enum preempt_type ptype, int trylock)
+{
+	int ret;
+	unsigned long v;
+
+	/* lock out read slow paths */
+	v = atomic_long_read(vptr);
+	ret = _pswrite_lock_ctx_wait_sub(&v, vptr, rwlock,
+		0, CTX_WMASK, CTX_WMASK, CTX_WOFFSET,
+		V_LONG, PSRW_WRITE, ptype, trylock);
+	if (unlikely(!ret))
+		return 0;
+	/*
+	 * Continue when there are no reader threads left, but keep the
+	 * subscription; it will be removed by the next subscription.
+	 */
+	ret = _pswrite_lock_ctx_wait(v, vptr, rwlock,
+		CTX_RMASK, V_LONG, PSRW_WRITE, ptype, trylock);
+	if (unlikely(!ret))
+		goto fail_clean_slow;
+	/* Wait for uncontended readers and writers to unlock */
+	*uc_inout = atomic_read(&rwlock->uc);
+	ret = _pswrite_lock_ctx_wait(*uc_inout, &rwlock->uc, rwlock,
+		UC_WRITER | UC_READER_MASK,
+		V_INT, PSRW_WRITE, ptype, trylock);
+	if (!ret)
+		goto fail_clean_slow;
+	return 1;
+
+fail_clean_slow:
+	atomic_long_sub(CTX_WOFFSET, vptr);
+	return 0;
+}
+
+static void writer_count_inc(unsigned int *uc, psrwlock_t *rwlock,
+		enum preempt_type ptype)
+{
+	unsigned int ws;
+
+	ws = atomic_read(&rwlock->ws);
+	/*
+	 * Take the mutex and increment the writer count at once.
+	 * Never fail.
+	 */
+	_pswrite_lock_ctx_wait_sub(&ws, &rwlock->ws, rwlock,
+		WS_COUNT_MUTEX, WS_MASK, WS_MASK,
+		WS_COUNT_MUTEX + WS_OFFSET,
+		V_INT, PSRW_WRITE, ptype, 0);
+	/* First writer in slow path ? */
+	if ((ws & WS_MASK) == WS_OFFSET) {
+		atomic_add(UC_SLOW_WRITER, &rwlock->uc);
+		*uc += UC_SLOW_WRITER;
+	}
+	smp_mb();	/* serialize memory operations with mutex */
+	atomic_sub(WS_COUNT_MUTEX, &rwlock->ws);
+}
+
+static void writer_count_dec(unsigned int *uc, psrwlock_t *rwlock,
+		enum preempt_type ptype)
+{
+	unsigned int ws;
+
+	ws = atomic_read(&rwlock->ws);
+	/*
+	 * Take the mutex and decrement the writer count at once.
+	 * Never fail.
+	 */
+	_pswrite_lock_ctx_wait_sub(&ws, &rwlock->ws, rwlock,
+		WS_COUNT_MUTEX, WS_COUNT_MUTEX, WS_COUNT_MUTEX,
+		WS_COUNT_MUTEX - WS_OFFSET,
+		V_INT, PSRW_WRITE, ptype, 0);
+	/* Last writer in slow path ? */
+	if (!(ws & WS_MASK)) {
+		atomic_sub(UC_SLOW_WRITER, &rwlock->uc);
+		*uc -= UC_SLOW_WRITER;
+	}
+	smp_mb();	/* serialize memory operations with mutex */
+	atomic_sub(WS_COUNT_MUTEX, &rwlock->ws);
+}
+
+static int __pswrite_lock_slow(unsigned int uc, psrwlock_t *rwlock,
+		int trylock)
+{
+	enum psrw_prio wctx = rwlock->wctx;
+	u32 rctx = rwlock->rctx_bitmap;
+	enum preempt_type ptype;
+	unsigned int ws;
+	int ret;
+
+	write_context_enable(wctx, rctx);
+
+	if (wctx == PSRW_PRIO_IRQ)
+		WARN_ON_ONCE(!in_irq() && !irqs_disabled());
+	else if (wctx == PSRW_PRIO_BH)
+		WARN_ON_ONCE(!in_softirq());
+#ifdef CONFIG_PREEMPT
+	else if (wctx == PSRW_PRIO_NP)
+		WARN_ON_ONCE(!in_atomic());
+#endif
+
+	/*
+	 * We got here because the MAY_CONTEND bit is set in the uc bitmask. We
+	 * are therefore contending with fast-path or other slow-path writers.
+	 * A cmpxchg reads uc, which implies strict ordering.
+	 */
+	if (wctx == PSRW_PRIO_P)
+		ptype = PSRW_PREEMPT;
+	else
+		ptype = PSRW_NON_PREEMPT;
+
+	/* Increment the slow path writer count */
+	writer_count_inc(&uc, rwlock, ptype);
+
+	if (rctx & PSR_PTHREAD) {
+		ptype = PSRW_PREEMPT;
+		ret = _pswrite_lock_out_context(&uc,
+			&rwlock->prio[PSRW_PRIO_P], rwlock, ptype, trylock);
+		if (unlikely(!ret))
+			goto fail_dec_count;
+	}
+
+	/*
+	 * lock out non-preemptable threads.
+	 */
+	if (rctx & PSR_NPTHREAD) {
+		if (wctx != PSRW_PRIO_NP)
+			psrwlock_preempt_disable();
+		ptype = PSRW_NON_PREEMPT;
+		ret = _pswrite_lock_out_context(&uc,
+			&rwlock->prio[PSRW_PRIO_NP], rwlock, ptype, trylock);
+		if (unlikely(!ret))
+			goto fail_unsub_pthread;
+	}
+
+	/* lock out softirqs */
+	if (rctx & PSR_BH) {
+		if (wctx != PSRW_PRIO_BH)
+			psrwlock_bh_disable();
+		ptype = PSRW_NON_PREEMPT;
+		ret = _pswrite_lock_out_context(&uc,
+			&rwlock->prio[PSRW_PRIO_BH], rwlock,
+			ptype, trylock);
+		if (unlikely(!ret))
+			goto fail_unsub_npthread;
+	}
+
+	/* lock out hardirqs */
+	if (rctx & PSR_IRQ) {
+		if (wctx != PSRW_PRIO_IRQ)
+			psrwlock_irq_disable();
+		ptype = PSRW_NON_PREEMPT;
+		ret = _pswrite_lock_out_context(&uc,
+			&rwlock->prio[PSRW_PRIO_IRQ], rwlock,
+			ptype, trylock);
+		if (unlikely(!ret))
+			goto fail_unsub_bh;
+	}
+
+	/*
+	 * Finally, take the mutex.
+	 */
+	if (rctx & (PSR_NPTHREAD | PSR_BH | PSR_IRQ))
+		ptype = PSRW_NON_PREEMPT;
+	else
+		ptype = PSRW_PREEMPT;
+	ws = atomic_read(&rwlock->ws);
+	ret = _pswrite_lock_ctx_wait_sub(&ws, &rwlock->ws, rwlock,
+		0, WS_LOCK_MUTEX, WS_LOCK_MUTEX, WS_LOCK_MUTEX,
+		V_INT, PSRW_WRITE, ptype, trylock);
+	if (unlikely(!ret))
+		goto fail_unsub_irq;
+	/* atomic_cmpxchg orders writes */
+
+	return 1;	/* success */
+
+	/* Failure paths */
+fail_unsub_irq:
+	if (rctx & PSR_IRQ)
+		atomic_long_sub(CTX_WOFFSET, &rwlock->prio[PSRW_PRIO_IRQ]);
+fail_unsub_bh:
+	if ((rctx & PSR_IRQ) && wctx != PSRW_PRIO_IRQ)
+		psrwlock_irq_enable();
+	if (rctx & PSR_BH)
+		atomic_long_sub(CTX_WOFFSET, &rwlock->prio[PSRW_PRIO_BH]);
+fail_unsub_npthread:
+	if ((rctx & PSR_BH) && wctx != PSRW_PRIO_BH)
+		psrwlock_bh_enable();
+	if (rctx & PSR_NPTHREAD)
+		atomic_long_sub(CTX_WOFFSET, &rwlock->prio[PSRW_PRIO_NP]);
+fail_unsub_pthread:
+	if ((rctx & PSR_NPTHREAD) && wctx != PSRW_PRIO_NP)
+		psrwlock_preempt_enable();
+	if (rctx & PSR_PTHREAD)
+		atomic_long_sub(CTX_WOFFSET, &rwlock->prio[PSRW_PRIO_P]);
+fail_dec_count:
+	if (wctx == PSRW_PRIO_P)
+		ptype = PSRW_PREEMPT;
+	else
+		ptype = PSRW_NON_PREEMPT;
+	writer_count_dec(&uc, rwlock, ptype);
+	psrwlock_preempt_check(uc, rwlock);
+	cpu_relax();
+	return 0;
+}
+
+/*
+ * _pswrite_lock_slow : Writer-biased rwlock write lock slow path.
+ *
+ * Locks out execution contexts one by one.
+ */
+asmregparm void _pswrite_lock_slow(unsigned int uc, psrwlock_t *rwlock)
+{
+	__pswrite_lock_slow(uc, rwlock, 0);
+}
+EXPORT_SYMBOL_GPL(_pswrite_lock_slow);
+
+/*
+ * _pswrite_trylock_slow : Try to take a write lock.
+ */
+asmregparm
+int _pswrite_trylock_slow(unsigned int uc, psrwlock_t *rwlock)
+{
+	return __pswrite_lock_slow(uc, rwlock, 1);
+}
+EXPORT_SYMBOL_GPL(_pswrite_trylock_slow);
+
+asmregparm
+void _pswrite_unlock_slow(unsigned int uc, psrwlock_t *rwlock)
+{
+	enum psrw_prio wctx = rwlock->wctx;
+	u32 rctx = rwlock->rctx_bitmap;
+	enum preempt_type ptype;
+
+	/*
+	 * We get here either :
+	 * - From the fast-path unlock, but a slow-path writer has set the
+	 *   UC_SLOW_WRITER bit.
+	 * - Still holding the slow path locks.
+	 *
+	 * We have to know if we must decrement the WS_OFFSET count.
+	 *
+	 * uc, received as parameter, was read by an atomic cmpxchg, which
+	 * implies strict memory ordering. It orders memory accesses done within
+	 * the critical section with the lock.
+	 */
+	if (uc & UC_WRITER) {
+		uc = atomic_sub_return(UC_WRITER, &rwlock->uc);
+		write_context_enable(wctx, rctx);
+		psrwlock_preempt_check(uc, rwlock);
+	} else {
+		/*
+		 * Release the slow path lock.
+		 */
+		smp_mb();	/* ensure memory order with lock mutex */
+		atomic_sub(WS_LOCK_MUTEX, &rwlock->ws);
+		if (rctx & PSR_IRQ) {
+			atomic_long_sub(CTX_WOFFSET,
+				&rwlock->prio[PSRW_PRIO_IRQ]);
+			if (wctx != PSRW_PRIO_IRQ)
+				psrwlock_irq_enable();
+		}
+		if (rctx & PSR_BH) {
+			atomic_long_sub(CTX_WOFFSET,
+				&rwlock->prio[PSRW_PRIO_BH]);
+			if (wctx != PSRW_PRIO_BH)
+				psrwlock_bh_enable();
+		}
+		if (rctx & PSR_NPTHREAD) {
+			atomic_long_sub(CTX_WOFFSET,
+				&rwlock->prio[PSRW_PRIO_NP]);
+			if (wctx != PSRW_PRIO_NP)
+				psrwlock_preempt_enable();
+		}
+		if (rctx & PSR_PTHREAD)
+			atomic_long_sub(CTX_WOFFSET,
+				&rwlock->prio[PSRW_PRIO_P]);
+
+		if (wctx == PSRW_PRIO_P)
+			ptype = PSRW_PREEMPT;
+		else
+			ptype = PSRW_NON_PREEMPT;
+		writer_count_dec(&uc, rwlock, ptype);
+		psrwlock_preempt_check(uc, rwlock);
+	}
+}
+EXPORT_SYMBOL_GPL(_pswrite_unlock_slow);
+
+/*
+ * _psrwlock_wakeup : Wake up tasks waiting for a write or read lock.
+ *
+ * Called from any context (irq/softirq/preempt/non-preempt). Contains a
+ * busy-loop; must therefore disable interrupts, but only for a short time.
+ */
+asmregparm void _psrwlock_wakeup(unsigned int uc, psrwlock_t *rwlock)
+{
+	unsigned long flags;
+	unsigned int ws;
+
+	/*
+	 * Busy-loop waiting for the waitqueue mutex.
+	 */
+	psrwlock_irq_save(flags);
+	/*
+	 * Pass PSRW_READ since unused in PSRW_NON_PREEMPT.
+	 */
+	ws = atomic_read(&rwlock->ws);
+	_pswrite_lock_ctx_wait_sub(&ws, &rwlock->ws, rwlock,
+		0, WS_WQ_MUTEX, WS_WQ_MUTEX, WS_WQ_MUTEX,
+		V_INT, PSRW_READ, PSRW_NON_PREEMPT, 0);
+	/*
+	 * If there is at least one non-preemptable writer subscribed or holding
+	 * higher priority write masks, let it handle the wakeup when it exits
+	 * its critical section which excludes any preemptable context anyway.
+	 * The same applies to preemptable readers, which are the only ones
+	 * which can cause a preemptable writer to sleep.
+	 *
+	 * The conditions here are all the states in which we are sure to reach
+	 * a preempt check without blocking on the lock.
+	 */
+	uc = atomic_read(&rwlock->uc);
+	if (!(uc & UC_WQ_ACTIVE) || uc & UC_READER_MASK
+			|| (atomic_long_read(&rwlock->prio[PSRW_PRIO_IRQ])
+				& CTX_WMASK)
+			|| (atomic_long_read(&rwlock->prio[PSRW_PRIO_BH])
+				& CTX_WMASK)
+			|| (atomic_long_read(&rwlock->prio[PSRW_PRIO_NP])
+				& CTX_WMASK)) {
+		smp_mb();	/*
+				 * Ensure memory ordering when clearing the
+				 * mutex.
+				 */
+		atomic_sub(WS_WQ_MUTEX, &rwlock->ws);
+		psrwlock_irq_restore(flags);
+		return;
+	}
+
+	/*
+	 * First do an exclusive wake-up of the first writer if there is one
+	 * waiting, else wake-up the readers.
+	 */
+	if (waitqueue_active(&rwlock->wq_write))
+		wake_up_locked(&rwlock->wq_write);
+	else
+		wake_up_locked(&rwlock->wq_read);
+	smp_mb();	/*
+			 * Ensure global memory order when clearing the mutex.
+			 */
+	atomic_sub(WS_WQ_MUTEX, &rwlock->ws);
+	psrwlock_irq_restore(flags);
+}
+EXPORT_SYMBOL_GPL(_psrwlock_wakeup);
Index: linux-2.6-lttng/lib/Kconfig.debug
===================================================================
--- linux-2.6-lttng.orig/lib/Kconfig.debug	2008-09-08 20:28:14.000000000 -0400
+++ linux-2.6-lttng/lib/Kconfig.debug	2008-09-08 20:29:11.000000000 -0400
@@ -680,6 +680,9 @@ config FAULT_INJECTION_STACKTRACE_FILTER
 	help
 	  Provide stacktrace filter for fault-injection capabilities
 
+config HAVE_PSRWLOCK_ASM_CALL
+	def_bool n
+
 config LATENCYTOP
 	bool "Latency measuring infrastructure"
 	select FRAME_POINTER if !MIPS
Index: linux-2.6-lttng/include/linux/psrwlock-types.h
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6-lttng/include/linux/psrwlock-types.h	2008-09-08 20:29:11.000000000 -0400
@@ -0,0 +1,92 @@
+#ifndef _LINUX_PSRWLOCK_TYPES_H
+#define _LINUX_PSRWLOCK_TYPES_H
+
+/*
+ * Priority Sifting Reader-Writer Lock types definition
+ *
+ * Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
+ * August 2008
+ */
+
+#include <linux/wait.h>
+#include <asm/atomic.h>
+
+/*
+ * This table represents the lowest read priority context that can be used
+ * given the highest read priority context and the context in which the write
+ * lock is taken.
+ *
+ * e.g. given the highest priority context from which we take the read lock is
+ * interrupt context (IRQ) and the context where the write lock is taken is
+ * non-preemptable (NP), we should never have a reader in context lower than
+ * NP.
+ *
+ * X means : don't !
+ *
+ * X axis : Priority of writer
+ * Y axis : Max priority of reader
+ * Maps to :  Minimum priority of a reader.
+ *
+ * Highest Read Prio / Write Prio    | P     NP    BH    IRQ
+ * ------------------------------------------------------------------------
+ * P                                 | P     X     X     X
+ * NP                                | P     NP    X     X
+ * BH                                | P     NP    BH    X
+ * IRQ                               | P     NP    BH    IRQ
+ *
+ * This table is verified by the CHECK_PSRWLOCK_MAP macro.
+ */
+
+enum psrw_prio {
+	PSRW_PRIO_P,
+	PSRW_PRIO_NP,
+	PSRW_PRIO_BH,
+	PSRW_PRIO_IRQ,
+	PSRW_NR_PRIO,
+};
+
+/*
+ * Possible execution contexts for readers.
+ */
+#define PSR_PTHREAD	(1U << PSRW_PRIO_P)
+#define PSR_NPTHREAD	(1U << PSRW_PRIO_NP)
+#define PSR_BH		(1U << PSRW_PRIO_BH)
+#define PSR_IRQ		(1U << PSRW_PRIO_IRQ)
+#define PSR_NR		PSRW_NR_PRIO
+#define PSR_MASK	(PSR_PTHREAD | PSR_NPTHREAD | PSR_BH | PSR_IRQ)
+
+typedef struct psrwlock {
+	atomic_t uc;			/* Uncontended word	*/
+	atomic_t ws;			/* Writers in the slow path count */
+	atomic_long_t prio[PSRW_NR_PRIO]; /* Per priority slow path counts */
+	u32 rctx_bitmap;		/* Allowed read execution ctx */
+	enum psrw_prio wctx;		/* Allowed write execution ctx */
+	wait_queue_head_t wq_read;	/* Preemptable readers wait queue */
+	wait_queue_head_t wq_write;	/* Preemptable writers wait queue */
+} psrwlock_t;
+
+#define __PSRWLOCK_UNLOCKED(x, _wctx, _rctx)				\
+	{								\
+		.uc = { 0 },						\
+		.ws = { 0 },						\
+		.prio[0 ... (PSRW_NR_PRIO - 1)] = { 0 },		\
+		.rctx_bitmap = (_rctx),					\
+		.wctx = (_wctx),					\
+		.wq_read = __WAIT_QUEUE_HEAD_INITIALIZER((x).wq_read),	\
+		.wq_write = __WAIT_QUEUE_HEAD_INITIALIZER((x).wq_write),\
+	}
+
+#define DEFINE_PSRWLOCK(x, wctx, rctx)					\
+	psrwlock_t x = __PSRWLOCK_UNLOCKED(x, wctx, rctx)
+
+/*
+ * Statically check that no reader with priority lower than the writer is
+ * possible.
+ */
+#define CHECK_PSRWLOCK_MAP(x, wctx, rctx)				\
+	static inline void __psrwlock_bad_context_map_##x(void)		\
+	{								\
+		BUILD_BUG_ON((~(~0UL << (wctx))) & (rctx));		\
+	}
+
+#endif /* _LINUX_PSRWLOCK_TYPES_H */

-- 
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68


* [RFC PATCH 2/5] Priority Sifting Reader-Writer Lock Documentation
  2008-09-09  0:34 [RFC PATCH 0/5] Priority Sifting Reader-Writer Lock v13 Mathieu Desnoyers
  2008-09-09  0:34 ` [RFC PATCH 1/5] " Mathieu Desnoyers
@ 2008-09-09  0:34 ` Mathieu Desnoyers
  2008-09-09  0:34 ` [RFC PATCH 3/5] Priority Sifting Reader-Writer Lock x86_64 Optimised Call Mathieu Desnoyers
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 6+ messages in thread
From: Mathieu Desnoyers @ 2008-09-09  0:34 UTC (permalink / raw)
  To: Linus Torvalds, H. Peter Anvin, Jeremy Fitzhardinge,
	Andrew Morton, Ingo Molnar, Paul E. McKenney, Peter Zijlstra,
	Joe Perches, Wei Weng, linux-kernel
  Cc: Mathieu Desnoyers

[-- Attachment #1: psrwlock-documentation.patch --]
[-- Type: text/plain, Size: 21286 bytes --]

Design goal, algorithmic description and performance tests.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
CC: Linus Torvalds <torvalds@linux-foundation.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
CC: Jeremy Fitzhardinge <jeremy@goop.org>
CC: Andrew Morton <akpm@linux-foundation.org>
CC: Ingo Molnar <mingo@elte.hu>
CC: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
CC: Peter Zijlstra <peterz@infradead.org>
CC: Joe Perches <joe@perches.com>
CC: Wei Weng <wweng@acedsl.com>
---
 Documentation/psrwlock.txt |  440 +++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 440 insertions(+)

Index: linux-2.6-lttng/Documentation/psrwlock.txt
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6-lttng/Documentation/psrwlock.txt	2008-09-08 20:29:13.000000000 -0400
@@ -0,0 +1,440 @@
+                 Priority Sifting Reader-Writer Locks
+                     Design and Performance tests
+                       Mathieu Desnoyers, 2008
+
+
+****** Design Goal ******
+
+The main design goal is to lessen the rwlock impact on the irq and softirq
+latency of the system.
+
+A typical case leading to long interrupt latencies :
+
+- rwlock shared between
+  - Rare update in thread context
+  - Frequent slow read in thread context (task list iteration)
+  - Fast interrupt handler read
+
+The writer must therefore disable interrupts around the write lock, which adds
+to the global interrupt latency; the worst case is the duration of the slow
+read.
+
+
+****** Description of the psrwlock algorithm ******
+
+The writer fast path uses a single bit to indicate that a fast path writer is
+active (UC_WRITER). The uncontended case is handled by upgrading the priority
+to the priority of the highest priority reader (e.g. by disabling interrupts)
+and by doing a cmpxchg which sets the UC_WRITER bit atomically, only if no
+other writer or reader is in its critical section. If there is contention
+caused by either a reader or a writer, the writer falls into the slow path.
+
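+In code, the writer fast path is essentially (condensed from
+include/linux/psrwlock.h) :
+
+	write_context_disable(wctx, rctx);	/* e.g. disable irqs */
+	uc = atomic_cmpxchg(&rwlock->uc, 0, UC_WRITER);
+	if (uc != 0)
+		pswrite_lock_slow(uc, rwlock);
+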
+The writer slow path first sets the UC_SLOW_WRITER bit and increments the WS
+(writers in slow path) counter (the two operations are made atomic by using the
+WS_COUNT_MUTEX bit as a mutex) and then subscribes to the preemptable lock,
+which locks out the preemptable reader threads. It waits for all the preemptable
+reader threads in the slow path to exit their critical section. It then waits
+for all the "in-flight" fast path readers to exit their critical section. Then,
+it upgrades its priority by disabling preemption and does the same
+(subscription, wait for slow path readers, wait for fast path readers) for
+non-preemptable reader threads. The same is then done for bottom halves and
+interrupt contexts. Once all the reader contexts have been excluded, the writer
+takes the slow path mutex and accesses the data structure.
+
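+Schematically, the writer slow path does the following (simplified, see
+lib/psrwlock.c for the real code) :
+- increment the writer count (the first slow path writer sets UC_SLOW_WRITER).
+- for each reader context allowed by the lock, from lowest to highest priority
+  (P, NP, BH, IRQ) :
+  - raise own priority to that context if needed.
+  - subscribe to the per-context word (add CTX_WOFFSET).
+  - wait for that context's slow path readers to exit (CTX_RMASK clear).
+  - wait for the fast path readers and writer (uc reader count and UC_WRITER
+    clear).
+- take WS_LOCK_MUTEX (single writer in its critical section).
+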
+In its unlock, the writer detects whether it was a fast or slow path writer by
+checking the UC_WRITER bit. A fast path writer has to clear the UC_WRITER bit
+and bring its priority back to its original state (e.g. reenabling interrupts).
+The slow path writer must unlock the mutex and lower its priority stage by
+stage (reenabling interrupts, bh, preemption). At each stage, it unsubscribes
+from the specific context. Then, it checks whether it is the last writer in the
+slow path by decrementing and testing the WS (writers in slow path) counter. If
+it is the last writer, it clears the UC_SLOW_WRITER bit (count and bit are made
+atomic by the WS_COUNT_MUTEX bit used as a mutex).
+
+The reader does an atomic cmpxchg to check if there is any contention on the
+lock (other readers or writers in their critical section). If not, it increments
+the reader count. If there are other active readers, the first cmpxchg will
+fail, but a second cmpxchg will be attempted at the beginning of the slow path
+if no writer is contending for the lock. A cmpxchg is used to take the reader
+lock instead of a simple addition because we cannot afford to take the read
+fast-path lock, even for a short period of time, while we are in a lower
+priority execution context and a writer in a higher priority execution context
+is waiting for the lock. Doing so would result in priority inversion.
+
+The reader slow path waits for the slow path writer subscription count in its
+particular context to become 0. When it does, the reader atomically increments
+the reader count for this context. Then, it waits for all the fast path writers
+to exit their critical section and increments the fast path reader count. Before
+returning from the reader lock primitive, it decrements the slow path reader
+count for the context it subscribed to, behaving exactly as a fast path reader.
+
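+Schematically, the reader slow path does the following (simplified, see
+lib/psrwlock.c for the real code) :
+- wait until no writer is subscribed to this context (CTX_WMASK clear), then
+  atomically add CTX_ROFFSET to the per-context word.
+- wait until no fast path writer holds the lock (UC_WRITER clear), then
+  atomically add UC_READER_OFFSET to uc.
+- subtract CTX_ROFFSET from the per-context word; the read count is now held
+  in uc only.
+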
+The unlock primitive is then exactly the same for the fast and slow path
+readers: They only have to decrement the fastpath reader count.
+
+WS_WQ_MUTEX protects the waitqueue and UC_WQ_ACTIVE changes. WS_WQ_MUTEX must be
+taken with interrupts off. The ordering of operations dealing with preemptable
+threads requires that any code path which can cause a thread to be added to the
+wait queue must end with a check of the UC_WQ_ACTIVE flag, which leads to a
+wake-up if there are pending threads in the wait queue. Also, any point where a
+thread can be woken up from the wait queue must be followed by a UC_WQ_ACTIVE
+check. Given that the UC_WQ_ACTIVE flag is tested without taking the
+WS_WQ_MUTEX, we must make sure that threads added to the waitqueue first set
+the UC_WQ_ACTIVE flag and then re-test the condition which led them to be put
+to sleep.
+
+Upon unlock, the following sequence is done :
+- atomically unlock and return the lock value, which contains the UC_WQ_ACTIVE
+  bit status at the moment the unlock is done.
+- if UC_WQ_ACTIVE is set :
+  - take WS_WQ_MUTEX
+  - wake a thread
+  - release WS_WQ_MUTEX
+
+When a thread is ready to be added to the wait queue :
+- the last busy-looping iteration fails.
+- take the WS_WQ_MUTEX
+- set the UC_WQ_ACTIVE bit if the list is about to pass from inactive to active.
+- check again for the failed condition, since its status may have changed
+  since the busy-loop failed. If the condition now succeeds, return to
+  busy-looping after putting the UC_WQ_ACTIVE bit back to its original state and
+  releasing the WS_WQ_MUTEX.
+- add the current thread to the wait queue, change state.
+- release WS_WQ_MUTEX.
+
+Upon wakeup :
+- take WS_WQ_MUTEX
+- set state to running, remove from the wait queue.
+- clear UC_WQ_ACTIVE if the list has become inactive.
+- release WS_WQ_MUTEX.
+
+A sequence similar to the unlock must be done when a trylock fails.
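+
+Taking the uncontended reader unlock as an example, the unlock sequence above
+reduces to (condensed from the actual code) :
+
+	uc = atomic_sub_return(UC_READER_OFFSET, &rwlock->uc);
+	if (uc & UC_WQ_ACTIVE)
+		psrwlock_wakeup(uc, rwlock);	/* takes WS_WQ_MUTEX, wakes waiters */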
+
+
+****** Performance tests ******
+
+The test module is available at :
+
+http://ltt.polymtl.ca/svn/trunk/tests/kernel/test-psrwlock.c
+
+Dual quad-core Xeon 2.0GHz E5405
+
+
+**** Latency ****
+
+This section presents the detailed breakdown of preemption, softirq and
+interrupt latency generated by the psrwlock. The "High contention" section
+compares the "irqoff latency tracer" results between standard Linux kernel
+rwlocks and the psrwlocks (tests done on wbias-rwlock v8).
+
+get_cycles takes [min,avg,max] 72,75,78 cycles, results calibrated on avg
+
+** Single writer test, no contention **
+SINGLE_WRITER_TEST_DURATION 10s
+
+IRQ latency for cpu 6 disabled 99490 times, [min,avg,max] 471,485,1527 cycles
+SoftIRQ latency for cpu 6 disabled 99490 times, [min,avg,max] 693,704,3969 cycles
+Preemption latency for cpu 6 disabled 99490 times, [min,avg,max] 909,917,4593 cycles
+
+
+** Single trylock writer test, no contention **
+SINGLE_WRITER_TEST_DURATION 10s
+
+IRQ latency for cpu 2 disabled 10036 times, [min,avg,max] 393,396,849 cycles
+SoftIRQ latency for cpu 2 disabled 10036 times, [min,avg,max] 609,614,1317 cycles
+Preemption latency for cpu 2 disabled 10036 times, [min,avg,max] 825,826,1971 cycles
+
+
+** Single reader test, no contention **
+SINGLE_READER_TEST_DURATION 10s
+
+Preemption latency for cpu 2 disabled 31596702 times, [min,avg,max] 502,508,54256 cycles
+
+
+** Multiple readers test, no contention (4 readers, busy-loop) **
+MULTIPLE_READERS_TEST_DURATION 10s
+NR_READERS 4
+
+Preemption latency for cpu 1 disabled 9302974 times, [min,avg,max] 502,2039,88060 cycles
+Preemption latency for cpu 3 disabled 9270742 times, [min,avg,max] 508,2045,61342 cycles
+Preemption latency for cpu 6 disabled 13331943 times, [min,avg,max] 508,1387,309088 cycles
+Preemption latency for cpu 7 disabled 4781453 times, [min,avg,max] 508,4092,230752 cycles
+
+
+** High contention test **
+TEST_DURATION 60s
+NR_WRITERS 2
+NR_TRYLOCK_WRITERS 1
+NR_READERS 4
+NR_TRYLOCK_READERS 1
+WRITER_DELAY 100us
+TRYLOCK_WRITER_DELAY 1000us
+TRYLOCK_WRITERS_FAIL_ITER 100
+THREAD_READER_DELAY 0   /* busy loop */
+INTERRUPT_READER_DELAY 100ms
+
+Standard Linux rwlock
+
+irqsoff latency trace v1.1.5 on 2.6.27-rc3-trace
+--------------------------------------------------------------------
+ latency: 2902 us, #3/3, CPU#5 | (M:preempt VP:0, KP:0, SP:0 HP:0 #P:8)
+    -----------------
+    | task: wbiasrwlock_wri-4984 (uid:0 nice:-5 policy:0 rt_prio:0)
+    -----------------
+ => started at: _write_lock_irq
+ => ended at:   _write_unlock_irq
+
+#                _------=> CPU#
+#               / _-----=> irqs-off
+#              | / _----=> need-resched
+#              || / _---=> hardirq/softirq
+#              ||| / _--=> preempt-depth
+#              |||| /
+#              |||||     delay
+#  cmd     pid ||||| time  |   caller
+#     \   /    |||||   \   |   /
+wbiasrwl-4984  5d..1    0us!: _write_lock_irq (0)
+wbiasrwl-4984  5d..2 2902us : _write_unlock_irq (0)
+wbiasrwl-4984  5d..3 2903us : trace_hardirqs_on (_write_unlock_irq)
+
+
+Writer-biased rwlock, same test routine
+
+irqsoff latency trace v1.1.5 on 2.6.27-rc3-trace
+--------------------------------------------------------------------
+ latency: 33 us, #3/3, CPU#7 | (M:preempt VP:0, KP:0, SP:0 HP:0 #P:8)
+    -----------------
+    | task: events/7-27 (uid:0 nice:-5 policy:0 rt_prio:0)
+    -----------------
+ => started at: _spin_lock_irqsave
+ => ended at:   _spin_unlock_irqrestore
+
+#                _------=> CPU#
+#               / _-----=> irqs-off
+#              | / _----=> need-resched
+#              || / _---=> hardirq/softirq
+#              ||| / _--=> preempt-depth
+#              |||| /
+#              |||||     delay
+#  cmd     pid ||||| time  |   caller
+#     \   /    |||||   \   |   /
+events/7-27    7d...    0us+: _spin_lock_irqsave (0)
+events/7-27    7d..1   33us : _spin_unlock_irqrestore (0)
+events/7-27    7d..2   33us : trace_hardirqs_on (_spin_unlock_irqrestore)
+
+(latency unrelated to the tests, therefore irq latency <= 33us)
+
+wbias rwlock instrumentation (below) shows that the worst interrupt latency was
+14176 cycles, i.e. roughly 7us at 2.0GHz.
+
+Detailed psrwlock latency breakdown :
+
+IRQ latency for cpu 0 disabled 1086419 times, [min,avg,max] 316,2833,14176 cycles
+IRQ latency for cpu 1 disabled 1099517 times, [min,avg,max] 316,1820,8254 cycles
+IRQ latency for cpu 3 disabled 159088 times, [min,avg,max] 316,1409,5632 cycles
+IRQ latency for cpu 4 disabled 161 times, [min,avg,max] 340,1882,5206 cycles
+SoftIRQ latency for cpu 0 disabled 1086419 times, [min,avg,max] 2212,5350,166402 cycles
+SoftIRQ latency for cpu 1 disabled 1099517 times, [min,avg,max] 2230,4265,138988 cycles
+SoftIRQ latency for cpu 3 disabled 159088 times, [min,avg,max] 2212,3319,14992 cycles
+SoftIRQ latency for cpu 4 disabled 161 times, [min,avg,max] 2266,3802,7138 cycles
+Preemption latency for cpu 3 disabled 59855 times, [min,avg,max] 5266,15706,53494 cycles
+Preemption latency for cpu 4 disabled 72 times, [min,avg,max] 5728,14132,28042 cycles
+Preemption latency for cpu 5 disabled 55586612 times, [min,avg,max] 196,2080,126526 cycles
+
+Note : preemptable critical sections have been implemented after the previous
+latency tests. The worst latency obtained with the wbias rwlock comes from the
+busy-loop for the wait queue protection mutex (100us) :
+
+IRQ latency for cpu 3 disabled 2822178 times, [min,avg,max] 256,8892,209926 cycles
+disable : [<ffffffff803acff5>] rwlock_wait+0x265/0x2c0
+
+
+
+**** Lock contention delays ****
+
+The number of cycles required to take the lock is benchmarked for each context.
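+
+For reference, here is a minimal sketch of the kind of timed section these
+numbers assume; the lock variable and the surrounding min/avg/max bookkeeping
+are illustrative, not taken from the actual test module :
+
+	cycles_t t1, t2;
+	unsigned long delay;
+
+	rdtsc_barrier();
+	t1 = get_cycles();
+	rdtsc_barrier();
+
+	psread_lock(&lock);		/* operation under test */
+
+	rdtsc_barrier();
+	t2 = get_cycles();
+	rdtsc_barrier();
+	psread_unlock(&lock);
+
+	delay = t2 - t1;	/* lock delay in cycles, before calibration */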
+
+
+** get_cycles calibration **
+get_cycles takes [min,avg,max] 72,75,78 cycles, results calibrated on avg
+
+
+** Single writer test, no contention **
+
+* Writer-biased rwlocks v13
+
+writer_thread/0 iterations : 100274, lock delay [min,avg,max] 27,33,249 cycles
+writer_thread/0 iterations : 100274, unlock delay [min,avg,max] 27,30,10407 cycles
+
+* Standard rwlock, kernel 2.6.27-rc3
+
+writer_thread/0 iterations : 100322, lock delay [min,avg,max] 37,40,4537 cycles
+writer_thread/0 iterations : 100322, unlock delay [min,avg,max] 37,40,25435 cycles
+
+
+** Single preemptable reader test, no contention **
+
+The writer-biased rwlock uncontended lock and unlock fast paths are twice as
+fast. Note that the wbias rwlock supports preemptable readers, while standard
+rwlocks disable preemption around the read side.
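+
+To make that contrast concrete, here is a reader-side sketch; the lock
+variables (std_rwlock, ps_rwlock) are illustrative and not part of the test
+code :
+
+	/* Standard rwlock : the read side runs with preemption disabled. */
+	read_lock(&std_rwlock);
+	/* read the data structure, non-preemptable here */
+	read_unlock(&std_rwlock);
+
+	/* psrwlock declared for preemptable readers : may be preempted here. */
+	psread_lock(&ps_rwlock);
+	/* read the data structure, preemptable here */
+	psread_unlock(&ps_rwlock);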
+
+* Writer-biased rwlocks v13
+
+preader_thread/0 iterations : 33856510, lock delay [min,avg,max] 27,29,34035 cycles
+preader_thread/0 iterations : 33856510, unlock delay [min,avg,max] 15,20,34701 cycles
+
+* Standard rwlock, kernel 2.6.27-rc3
+
+N/A : preemption must be disabled with standard rwlocks.
+
+
+** Single non-preemptable reader test, no contention **
+The wbias rwlock read path is still twice as fast as the standard rwlock, even
+for reads done in non-preemptable context.
+
+* Writer-biased rwlocks v13
+
+npreader_thread/0 iterations : 33461225, lock delay [min,avg,max] 27,30,16329 cycles
+npreader_thread/0 iterations : 33461225, unlock delay [min,avg,max] 15,19,21657 cycles
+
+* Standard rwlock, kernel 2.6.27-rc3
+
+npreader_thread/0 iterations : 31639225, lock delay [min,avg,max] 37,39,127111 cycles
+npreader_thread/0 iterations : 31639225, unlock delay [min,avg,max] 37,42,215587 cycles
+
+
+** Multiple p(reemptable)/n(on-)p(reemptable) readers test, no contention **
+This case, where multiple readers access the data structure in a loop without
+any writer, shows that the standard rwlock average is slightly better than the
+wbias rwlock. This can be explained by the fact that the wbias rwlock cmpxchg
+operation, used to keep the count of active readers, may fail under contention
+and must therefore be retried. The fast path expects the number of readers to
+be 0, which is not the case here.
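+
+As an illustration of that retry behaviour, here is a minimal sketch of a
+cmpxchg-based reader fast path; the atomic variable and the reader offset
+constant are assumptions made for the example, not the exact psrwlock layout :
+
+#define EXAMPLE_READER_OFFSET	1	/* illustrative */
+
+static inline int example_read_fastpath(atomic_t *count)
+{
+	/* The fast path expects neither readers nor writers to be present. */
+	if (likely(atomic_cmpxchg(count, 0, EXAMPLE_READER_OFFSET) == 0))
+		return 1;	/* read lock taken on the fast path */
+	/*
+	 * Another reader (or a writer) changed the count between our read
+	 * and the cmpxchg : fall back to the slow path, which retries.
+	 */
+	return 0;
+}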
+
+* Writer-biased rwlocks v13
+
+npreader_thread/0 iterations : 16885001, lock delay [min,avg,max] 27,425,40239 cycles
+npreader_thread/0 iterations : 16885001, unlock delay [min,avg,max] 15,220,18153 cycles
+npreader_thread/1 iterations : 16832690, lock delay [min,avg,max] 33,433,26841 cycles
+npreader_thread/1 iterations : 16832690, unlock delay [min,avg,max] 15,219,22329 cycles
+preader_thread/0 iterations : 17185174, lock delay [min,avg,max] 27,438,31437 cycles
+preader_thread/0 iterations : 17185174, unlock delay [min,avg,max] 15,211,30465 cycles
+preader_thread/1 iterations : 17293655, lock delay [min,avg,max] 27,435,53301 cycles
+preader_thread/1 iterations : 17293655, unlock delay [min,avg,max] 15,209,63921 cycles
+
+* Standard rwlock, kernel 2.6.27-rc3
+
+npreader_thread/0 iterations : 19248438, lock delay [min,avg,max] 37,273,364459 cycles
+npreader_thread/0 iterations : 19248438, unlock delay [min,avg,max] 43,216,272539 cycles
+npreader_thread/1 iterations : 19251717, lock delay [min,avg,max] 37,242,365719 cycles
+npreader_thread/1 iterations : 19251717, unlock delay [min,avg,max] 43,249,162847 cycles
+preader_thread/0 iterations : 19557931, lock delay [min,avg,max] 37,250,334921 cycles
+preader_thread/0 iterations : 19557931, unlock delay [min,avg,max] 37,245,266377 cycles
+preader_thread/1 iterations : 19671318, lock delay [min,avg,max] 37,258,390913 cycles
+preader_thread/1 iterations : 19671318, unlock delay [min,avg,max] 37,234,604507 cycles
+
+
+
+** High contention test **
+
+The high contention test uses the following parameters :
+
+TEST_DURATION 60s
+NR_WRITERS 2
+NR_TRYLOCK_WRITERS 1
+NR_PREEMPTABLE_READERS 2
+NR_NON_PREEMPTABLE_READERS 2
+NR_TRYLOCK_READERS 1
+WRITER_DELAY 100us
+TRYLOCK_WRITER_DELAY 1000us
+TRYLOCK_WRITERS_FAIL_ITER 100
+THREAD_READER_DELAY 0   /* busy loop */
+INTERRUPT_READER_DELAY 100ms
+
+
+* Preemptable writers
+
+* Writer-biased rwlocks v13
+
+writer_thread/0 iterations : 537678, lock delay [min,avg,max] 123,14021,8580813 cycles
+writer_thread/0 iterations : 537678, unlock delay [min,avg,max] 387,9070,1450053 cycles
+writer_thread/1 iterations : 536944, lock delay [min,avg,max] 123,13179,8687331 cycles
+writer_thread/1 iterations : 536944, unlock delay [min,avg,max] 363,10430,1400835 cycles
+
+* Standard rwlock, kernel 2.6.27-rc3
+
+writer_thread/0 iterations : 222797, lock delay [min,avg,max] 127,336611,4710367 cycles
+writer_thread/0 iterations : 222797, unlock delay [min,avg,max] 151,2009,714115 cycles
+writer_thread/1 iterations : 6845, lock delay [min,avg,max] 139,17271138,352848961 cycles
+writer_thread/1 iterations : 6845, unlock delay [min,avg,max] 217,93935,1991509 cycles
+
+
+* Non-preemptable readers
+
+* Writer-biased rwlocks v13
+
+npreader_thread/0 iterations : 64652609, lock delay [min,avg,max] 27,828,67497 cycles
+npreader_thread/0 iterations : 64652609, unlock delay [min,avg,max] 15,485,202773 cycles
+npreader_thread/1 iterations : 65143310, lock delay [min,avg,max] 27,817,64569 cycles
+npreader_thread/1 iterations : 65143310, unlock delay [min,avg,max] 15,484,133611 cycles
+
+* Standard rwlock, kernel 2.6.27-rc3
+
+npreader_thread/0 iterations : 68298472, lock delay [min,avg,max] 37,640,733423 cycles
+npreader_thread/0 iterations : 68298472, unlock delay [min,avg,max] 37,565,672241 cycles
+npreader_thread/1 iterations : 70331311, lock delay [min,avg,max] 37,603,393925 cycles
+npreader_thread/1 iterations : 70331311, unlock delay [min,avg,max] 37,558,373477 cycles
+
+
+* Preemptable readers
+
+* Writer-biased rwlocks v13
+
+preader_thread/0 iterations : 38484022, lock delay [min,avg,max] 27,2207,89363619 cycles
+preader_thread/0 iterations : 38484022, unlock delay [min,avg,max] 15,392,1965315 cycles
+preader_thread/1 iterations : 44661191, lock delay [min,avg,max] 27,1672,8708253 cycles
+preader_thread/1 iterations : 44661191, unlock delay [min,avg,max] 15,495,142119 cycles
+
+* Standard rwlock, kernel 2.6.27-rc3
+
+N/A : preemption must be disabled with standard rwlocks.
+
+
+* Interrupt context readers
+
+* Writer-biased rwlocks v13 (note : the highest unlock delays (~32us) are
+  caused by the wait queue wakeup done at the exit of the critical section
+  when the wait queue is active)
+
+interrupt readers on CPU 0, lock delay [min,avg,max] 135,1603,28119 cycles
+interrupt readers on CPU 0, unlock delay [min,avg,max] 9,1712,35355 cycles
+interrupt readers on CPU 1, lock delay [min,avg,max] 39,1756,18285 cycles
+interrupt readers on CPU 1, unlock delay [min,avg,max] 9,2628,58257 cycles
+interrupt readers on CPU 2, lock delay [min,avg,max] 129,1450,16533 cycles
+interrupt readers on CPU 2, unlock delay [min,avg,max] 27,2354,49647 cycles
+interrupt readers on CPU 3, lock delay [min,avg,max] 75,1758,27051 cycles
+interrupt readers on CPU 3, unlock delay [min,avg,max] 9,2446,63603 cycles
+interrupt readers on CPU 4, lock delay [min,avg,max] 159,1707,27903 cycles
+interrupt readers on CPU 4, unlock delay [min,avg,max] 9,1822,39957 cycles
+interrupt readers on CPU 6, lock delay [min,avg,max] 105,1635,24489 cycles
+interrupt readers on CPU 6, unlock delay [min,avg,max] 9,2390,36771 cycles
+interrupt readers on CPU 7, lock delay [min,avg,max] 135,1614,22995 cycles
+interrupt readers on CPU 7, unlock delay [min,avg,max] 9,2052,43479 cycles
+
+* Standard rwlock, kernel 2.6.27-rc3 (note : these numbers seem good, but
+  they do not take interrupt latency into account. See the interrupt latency
+  tests above for a discussion of this issue)
+
+interrupt readers on CPU 0, lock delay [min,avg,max] 55,573,4417 cycles
+interrupt readers on CPU 0, unlock delay [min,avg,max] 43,529,1591 cycles
+interrupt readers on CPU 1, lock delay [min,avg,max] 139,591,5731 cycles
+interrupt readers on CPU 1, unlock delay [min,avg,max] 31,534,2395 cycles
+interrupt readers on CPU 2, lock delay [min,avg,max] 127,671,6043 cycles
+interrupt readers on CPU 2, unlock delay [min,avg,max] 37,401,1567 cycles
+interrupt readers on CPU 3, lock delay [min,avg,max] 151,676,5569 cycles
+interrupt readers on CPU 3, unlock delay [min,avg,max] 127,536,2797 cycles
+interrupt readers on CPU 5, lock delay [min,avg,max] 127,531,15397 cycles
+interrupt readers on CPU 5, unlock delay [min,avg,max] 31,323,1747 cycles
+interrupt readers on CPU 6, lock delay [min,avg,max] 121,548,29125 cycles
+interrupt readers on CPU 6, unlock delay [min,avg,max] 31,435,2089 cycles
+interrupt readers on CPU 7, lock delay [min,avg,max] 37,613,5485 cycles
+interrupt readers on CPU 7, unlock delay [min,avg,max] 49,541,1645 cycles

-- 
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68

^ permalink raw reply	[flat|nested] 6+ messages in thread

* [RFC PATCH 3/5] Priority Sifting Reader-Writer Lock x86_64 Optimised Call
  2008-09-09  0:34 [RFC PATCH 0/5] Priority Sifting Reader-Writer Lock v13 Mathieu Desnoyers
  2008-09-09  0:34 ` [RFC PATCH 1/5] " Mathieu Desnoyers
  2008-09-09  0:34 ` [RFC PATCH 2/5] Priority Sifting Reader-Writer Lock Documentation Mathieu Desnoyers
@ 2008-09-09  0:34 ` Mathieu Desnoyers
  2008-09-09  0:34 ` [RFC PATCH 4/5] Priority Sifting Reader-Writer Lock Sample Mathieu Desnoyers
  2008-09-09  0:34 ` [RFC PATCH 5/5] Priority Sifting Reader-Writer Lock Latency Trace Mathieu Desnoyers
  4 siblings, 0 replies; 6+ messages in thread
From: Mathieu Desnoyers @ 2008-09-09  0:34 UTC (permalink / raw)
  To: Linus Torvalds, H. Peter Anvin, Jeremy Fitzhardinge,
	Andrew Morton, Ingo Molnar, Paul E. McKenney, Peter Zijlstra,
	Joe Perches, Wei Weng, linux-kernel
  Cc: Mathieu Desnoyers

[-- Attachment #1: psrwlock-x86_64-optimised-call.patch --]
[-- Type: text/plain, Size: 7593 bytes --]

Create a specialized calling convention for x86_64 where the first argument is
passed in rax. A trampoline moves it to the rdi register before jumping to the
standard call. This is useful to reuse the return value of a cmpxchg as the
first argument of the slow path without an extra in-line register move.
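
As an example of the intended use, here is a sketch of a read-lock fast path.
The psrwlock_t type name, the 'uc' field and the UC_READER_OFFSET constant are
hypothetical placeholders; only the psread_lock_slow()/call_rax_rsi() macros
below come from this patch :

	static inline void example_psread_lock(psrwlock_t *rwlock)
	{
		unsigned int v;

		/* Hypothetical fast path : expect an uncontended lock word. */
		v = atomic_cmpxchg(&rwlock->uc, 0, UC_READER_OFFSET);
		if (likely(!v))
			return;
		/*
		 * Slow path : with CONFIG_HAVE_PSRWLOCK_ASM_CALL, this expands
		 * to call_rax_rsi(psread_lock_slow, v, rwlock). The compiler
		 * can keep 'v' in rax, where cmpxchg returned it, and the asm
		 * trampoline moves it to rdi before the real call.
		 */
		psread_lock_slow(v, rwlock);
	}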

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
CC: Linus Torvalds <torvalds@linux-foundation.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
CC: Jeremy Fitzhardinge <jeremy@goop.org>
CC: Andrew Morton <akpm@linux-foundation.org>
CC: Ingo Molnar <mingo@elte.hu>
CC: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
CC: Peter Zijlstra <peterz@infradead.org>
CC: Joe Perches <joe@perches.com>
CC: Wei Weng <wweng@acedsl.com>
---
 arch/x86/Kconfig                 |    1 
 arch/x86/kernel/Makefile         |    3 +
 arch/x86/kernel/call_64.S        |   45 +++++++++++++++++++++++++
 arch/x86/kernel/call_export_64.c |   36 ++++++++++++++++++++
 include/asm-x86/call_64.h        |   68 +++++++++++++++++++++++++++++++++++++++
 5 files changed, 153 insertions(+)

Index: linux-2.6-lttng/arch/x86/kernel/Makefile
===================================================================
--- linux-2.6-lttng.orig/arch/x86/kernel/Makefile	2008-09-08 11:49:37.000000000 -0400
+++ linux-2.6-lttng/arch/x86/kernel/Makefile	2008-09-08 11:50:46.000000000 -0400
@@ -99,6 +99,9 @@ scx200-y			+= scx200_32.o
 
 obj-$(CONFIG_OLPC)		+= olpc.o
 
+obj-y				+= call_64.o
+obj-y				+= call_export_64.o
+
 ###
 # 64 bit specific files
 ifeq ($(CONFIG_X86_64),y)
Index: linux-2.6-lttng/arch/x86/kernel/call_64.S
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6-lttng/arch/x86/kernel/call_64.S	2008-09-08 11:53:47.000000000 -0400
@@ -0,0 +1,45 @@
+/*
+ * linux/arch/x86/kernel/call_64.S -- special 64-bit calling conventions
+ *
+ * Copyright (C) 2008 Mathieu Desnoyers
+ */
+
+#include <linux/linkage.h>
+
+/*
+ * Called by call_rax_rsi().
+ *
+ * Move rax to rdi and proceed to the standard call.
+ */
+.macro TRAMPOLINE_RAX_RSI symbol
+ENTRY(asm_\symbol)
+	movq	%rax, %rdi
+	jmp	_\symbol
+END(asm_\symbol)
+.endm
+
+/*
+ * Called by call_rbx_rsi().
+ *
+ * Move rbx to rdi and proceed to the standard call.
+ */
+.macro TRAMPOLINE_RBX_RSI symbol
+ENTRY(asm_\symbol)
+	movq	%rbx, %rdi
+	jmp	_\symbol
+END(asm_\symbol)
+.endm
+
+TRAMPOLINE_RAX_RSI psread_lock_slow_irq
+TRAMPOLINE_RAX_RSI psread_trylock_slow_irq
+TRAMPOLINE_RAX_RSI psread_lock_slow_bh
+TRAMPOLINE_RAX_RSI psread_trylock_slow_bh
+TRAMPOLINE_RAX_RSI psread_lock_slow_inatomic
+TRAMPOLINE_RAX_RSI psread_trylock_slow_inatomic
+TRAMPOLINE_RAX_RSI psread_lock_slow
+TRAMPOLINE_RAX_RSI psread_trylock_slow
+
+TRAMPOLINE_RAX_RSI pswrite_lock_slow
+TRAMPOLINE_RAX_RSI pswrite_trylock_slow
+TRAMPOLINE_RAX_RSI pswrite_unlock_slow
+TRAMPOLINE_RBX_RSI psrwlock_wakeup
Index: linux-2.6-lttng/arch/x86/kernel/call_export_64.c
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6-lttng/arch/x86/kernel/call_export_64.c	2008-09-08 11:50:46.000000000 -0400
@@ -0,0 +1,36 @@
+/*
+ * linux/arch/x86/kernel/call_export_64.c -- special 64-bit calling conventions
+ *
+ * Export function symbols of special calling convention functions.
+ *
+ * Copyright (C) 2008 Mathieu Desnoyers
+ */
+
+#include <linux/module.h>
+#include <asm/call_64.h>
+
+void asm_psread_lock_slow_irq(void);
+EXPORT_SYMBOL_GPL(asm_psread_lock_slow_irq);
+void asm_psread_trylock_slow_irq(void);
+EXPORT_SYMBOL_GPL(asm_psread_trylock_slow_irq);
+void asm_psread_lock_slow_bh(void);
+EXPORT_SYMBOL_GPL(asm_psread_lock_slow_bh);
+void asm_psread_trylock_slow_bh(void);
+EXPORT_SYMBOL_GPL(asm_psread_trylock_slow_bh);
+void asm_psread_lock_slow_inatomic(void);
+EXPORT_SYMBOL_GPL(asm_psread_lock_slow_inatomic);
+void asm_psread_trylock_slow_inatomic(void);
+EXPORT_SYMBOL_GPL(asm_psread_trylock_slow_inatomic);
+void asm_psread_lock_slow(void);
+EXPORT_SYMBOL_GPL(asm_psread_lock_slow);
+void asm_psread_trylock_slow(void);
+EXPORT_SYMBOL_GPL(asm_psread_trylock_slow);
+
+void asm_pswrite_lock_slow(void);
+EXPORT_SYMBOL_GPL(asm_pswrite_lock_slow);
+void asm_pswrite_trylock_slow(void);
+EXPORT_SYMBOL_GPL(asm_pswrite_trylock_slow);
+void asm_pswrite_unlock_slow(void);
+EXPORT_SYMBOL_GPL(asm_pswrite_unlock_slow);
+void asm_psrwlock_wakeup(void);
+EXPORT_SYMBOL_GPL(asm_psrwlock_wakeup);
Index: linux-2.6-lttng/include/asm-x86/call_64.h
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6-lttng/include/asm-x86/call_64.h	2008-09-08 11:52:07.000000000 -0400
@@ -0,0 +1,68 @@
+#ifndef __ASM_X86_CALL_64_H
+#define __ASM_X86_CALL_64_H
+
+/*
+ * asm-x86/call_64.h
+ *
+ * Use rax as first argument for the call. Useful when already returned by the
+ * previous instruction, such as cmpxchg.
+ * Leave rdi free to mov rax to rdi in the trampoline.
+ * Return value in rax.
+ *
+ * The clobbered registers are saved in the original caller because we cannot
+ * restore them in the trampoline. Save the same set as "SAVE_ARGS".
+ *
+ * Copyright (C) 2008 Mathieu Desnoyers
+ */
+
+#define call_rax_rsi(symbol, rax, rsi)				\
+	({							\
+		unsigned long ret, modrsi;			\
+		asm volatile("callq asm_" #symbol "\n\t"	\
+			     : "=a" (ret), "=S" (modrsi)	\
+			     : "a" (rax), "S" (rsi)		\
+			     : "rdi", "rcx", "rdx",		\
+			       "%r8", "%r9", "%r10", "%r11",	\
+			       "cc", "memory");			\
+		ret;						\
+	})
+
+#define call_rbx_rsi(symbol, rbx, rsi)				\
+	({							\
+		unsigned long ret, modrsi;			\
+		asm volatile("callq asm_" #symbol "\n\t"	\
+			     : "=a" (ret), "=S" (modrsi)	\
+			     : "b" (rbx), "S" (rsi)		\
+			     : "rdi", "rcx", "rdx",		\
+			       "%r8", "%r9", "%r10", "%r11",	\
+			       "cc", "memory");			\
+		ret;						\
+	})
+
+#define psread_lock_slow_irq(v, rwlock)				\
+	call_rax_rsi(psread_lock_slow_irq, v, rwlock)
+#define psread_trylock_slow_irq(v, rwlock)			\
+	call_rax_rsi(psread_trylock_slow_irq, v, rwlock)
+#define psread_lock_slow_bh(v, rwlock)				\
+	call_rax_rsi(psread_lock_slow_bh, v, rwlock)
+#define psread_trylock_slow_bh(v, rwlock)			\
+	call_rax_rsi(psread_trylock_slow_bh, v, rwlock)
+#define psread_lock_slow_inatomic(v, rwlock)			\
+	call_rax_rsi(psread_lock_slow_inatomic, v, rwlock)
+#define psread_trylock_slow_inatomic(v, rwlock)			\
+	call_rax_rsi(psread_trylock_slow_inatomic, v, rwlock)
+#define psread_lock_slow(v, rwlock)				\
+	call_rax_rsi(psread_lock_slow, v, rwlock)
+#define psread_trylock_slow(v, rwlock)				\
+	call_rax_rsi(psread_trylock_slow, v, rwlock)
+
+#define pswrite_lock_slow(v, rwlock)				\
+	call_rax_rsi(pswrite_lock_slow, v, rwlock)
+#define pswrite_trylock_slow(v, rwlock)				\
+	call_rax_rsi(pswrite_trylock_slow, v, rwlock)
+#define pswrite_unlock_slow(v, rwlock)				\
+	call_rax_rsi(pswrite_unlock_slow, v, rwlock)
+#define psrwlock_wakeup(v, rwlock)				\
+	call_rbx_rsi(psrwlock_wakeup, v, rwlock)
+
+#endif
Index: linux-2.6-lttng/arch/x86/Kconfig
===================================================================
--- linux-2.6-lttng.orig/arch/x86/Kconfig	2008-09-08 11:49:37.000000000 -0400
+++ linux-2.6-lttng/arch/x86/Kconfig	2008-09-08 11:50:46.000000000 -0400
@@ -31,6 +31,7 @@ config X86
 	select HAVE_ARCH_KGDB if !X86_VOYAGER
 	select HAVE_GENERIC_DMA_COHERENT if X86_32
 	select HAVE_EFFICIENT_UNALIGNED_ACCESS
+	select HAVE_PSRWLOCK_ASM_CALL
 
 config ARCH_DEFCONFIG
 	string

-- 
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68

^ permalink raw reply	[flat|nested] 6+ messages in thread

* [RFC PATCH 4/5] Priority Sifting Reader-Writer Lock Sample
  2008-09-09  0:34 [RFC PATCH 0/5] Priority Sifting Reader-Writer Lock v13 Mathieu Desnoyers
                   ` (2 preceding siblings ...)
  2008-09-09  0:34 ` [RFC PATCH 3/5] Priority Sifting Reader-Writer Lock x86_64 Optimised Call Mathieu Desnoyers
@ 2008-09-09  0:34 ` Mathieu Desnoyers
  2008-09-09  0:34 ` [RFC PATCH 5/5] Priority Sifting Reader-Writer Lock Latency Trace Mathieu Desnoyers
  4 siblings, 0 replies; 6+ messages in thread
From: Mathieu Desnoyers @ 2008-09-09  0:34 UTC (permalink / raw)
  To: Linus Torvalds, H. Peter Anvin, Jeremy Fitzhardinge,
	Andrew Morton, Ingo Molnar, Paul E. McKenney, Peter Zijlstra,
	Joe Perches, Wei Weng, linux-kernel
  Cc: Mathieu Desnoyers

[-- Attachment #1: psrwlock-sample.patch --]
[-- Type: text/plain, Size: 6363 bytes --]

Sample module to show how to use psrwlock.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
CC: Linus Torvalds <torvalds@linux-foundation.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
CC: Jeremy Fitzhardinge <jeremy@goop.org>
CC: Andrew Morton <akpm@linux-foundation.org>
CC: Ingo Molnar <mingo@elte.hu>
CC: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
CC: Peter Zijlstra <peterz@infradead.org>
CC: Joe Perches <joe@perches.com>
CC: Wei Weng <wweng@acedsl.com>
---
 samples/Kconfig                     |    5 +
 samples/Makefile                    |    2 
 samples/psrwlock/Makefile           |    4 
 samples/psrwlock/psrwlock_example.c |  173 ++++++++++++++++++++++++++++++++++++
 4 files changed, 183 insertions(+), 1 deletion(-)

Index: linux-2.6-lttng/samples/Makefile
===================================================================
--- linux-2.6-lttng.orig/samples/Makefile	2008-09-06 14:05:34.000000000 -0400
+++ linux-2.6-lttng/samples/Makefile	2008-09-06 14:12:42.000000000 -0400
@@ -1,3 +1,3 @@
 # Makefile for Linux samples code
 
-obj-$(CONFIG_SAMPLES)	+= markers/ kobject/ kprobes/
+obj-$(CONFIG_SAMPLES)	+= markers/ kobject/ kprobes/ psrwlock/
Index: linux-2.6-lttng/samples/psrwlock/Makefile
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6-lttng/samples/psrwlock/Makefile	2008-09-06 14:12:42.000000000 -0400
@@ -0,0 +1,4 @@
+# builds the psrwlock example kernel module;
+# then to use it (as root):  insmod psrwlock_example.ko
+
+obj-$(CONFIG_SAMPLE_PSRWLOCK) += psrwlock_example.o
Index: linux-2.6-lttng/samples/psrwlock/psrwlock_example.c
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6-lttng/samples/psrwlock/psrwlock_example.c	2008-09-06 14:21:22.000000000 -0400
@@ -0,0 +1,173 @@
+/*
+ * Priority Sifting Reader-Writer Lock Example
+ *
+ * Copyright 2008 Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
+ */
+
+#include <linux/module.h>
+#include <linux/psrwlock.h>
+
+/*
+ * Define which execution contexts can access the lock in read or write mode.
+ * See psrwlock.h and psrwlock-types.h for details.
+ *
+ * In this example, the writer is in preemptable context and the readers either
+ * in IRQ context, softirq context, non-preemptable context or preemptable
+ * context.
+ */
+#define SAMPLE_ALL_WCTX		PSRW_PRIO_P
+#define SAMPLE_ALL_RCTX		(PSR_IRQ | PSR_BH | PSR_NPTHREAD | PSR_PTHREAD)
+
+static DEFINE_PSRWLOCK(sample_rwlock, SAMPLE_ALL_WCTX, SAMPLE_ALL_RCTX);
+CHECK_PSRWLOCK_MAP(sample_rwlock, SAMPLE_ALL_WCTX, SAMPLE_ALL_RCTX);
+
+/*
+ * Reader in IRQ context.
+ */
+static void executed_in_irq(void)
+{
+	psread_lock_irq(&sample_rwlock);
+	/* read structure */
+	psread_unlock(&sample_rwlock);
+}
+
+/*
+ * Reader in Softirq context.
+ */
+static void executed_in_bh(void)
+{
+	psread_lock_bh(&sample_rwlock);
+	/* read structure */
+	psread_unlock(&sample_rwlock);
+}
+
+/*
+ * Reader in non-preemptable context.
+ */
+static void executed_inatomic(void)
+{
+	psread_lock_inatomic(&sample_rwlock);
+	/* read structure */
+	psread_unlock(&sample_rwlock);
+}
+
+/*
+ * Reader in preemptable context.
+ */
+static void reader_executed_preemptable(void)
+{
+	psread_lock(&sample_rwlock);
+	/* read structure */
+	psread_unlock(&sample_rwlock);
+}
+
+/*
+ * Writer in preemptable context.
+ */
+static void writer_executed_preemptable(void)
+{
+	pswrite_lock(&sample_rwlock, SAMPLE_ALL_WCTX, SAMPLE_ALL_RCTX);
+	/* modify structure */
+	pswrite_unlock(&sample_rwlock, SAMPLE_ALL_WCTX, SAMPLE_ALL_RCTX);
+}
+
+/*
+ * Execute readers in all contexts, then the writer.
+ */
+static void sample_all_context(void)
+{
+	local_irq_disable();
+	executed_in_irq();
+	local_irq_enable();
+
+	local_bh_disable();
+	executed_in_bh();
+	local_bh_enable();
+
+	preempt_disable();
+	executed_inatomic();
+	preempt_enable();
+
+	reader_executed_preemptable();
+
+	writer_executed_preemptable();
+}
+
+
+/*
+ * In this second example, the writer is in non-preemptable context and the
+ * readers are in IRQ or softirq context only.
+ */
+static DEFINE_PSRWLOCK(sample_wnp_rbh_rirq_rwlock,
+	PSRW_PRIO_P, PSR_IRQ | PSR_BH);
+CHECK_PSRWLOCK_MAP(sample_wnp_rbh_rirq_rwlock,
+	PSRW_PRIO_P, PSR_IRQ | PSR_BH);
+
+/*
+ * Reader in IRQ context.
+ */
+static void wnp_rbh_rirq_executed_in_irq(void)
+{
+	psread_lock_irq(&sample_wnp_rbh_rirq_rwlock);
+	/* read structure */
+	psread_unlock(&sample_wnp_rbh_rirq_rwlock);
+}
+
+/*
+ * Reader in Softirq context.
+ */
+static void wnp_rbh_rirq_executed_in_bh(void)
+{
+	psread_lock_bh(&sample_wnp_rbh_rirq_rwlock);
+	/* read structure */
+	psread_unlock(&sample_wnp_rbh_rirq_rwlock);
+}
+
+/*
+ * Writer in non-preemptable context.
+ */
+static void wnp_rbh_rirq_writer_executed_non_preemptable(void)
+{
+	pswrite_lock(&sample_wnp_rbh_rirq_rwlock,
+				PSRW_PRIO_P, PSR_IRQ | PSR_BH);
+	/* modify structure */
+	pswrite_unlock(&sample_wnp_rbh_rirq_rwlock,
+				PSRW_PRIO_P, PSR_IRQ | PSR_BH);
+}
+
+/*
+ * Execute the readers in IRQ and softirq context, then the writer.
+ */
+static void sample_wnp_rbh_rirq_context(void)
+{
+	local_irq_disable();
+	wnp_rbh_rirq_executed_in_irq();
+	local_irq_enable();
+
+	local_bh_disable();
+	wnp_rbh_rirq_executed_in_bh();
+	local_bh_enable();
+
+	preempt_disable();
+	wnp_rbh_rirq_writer_executed_non_preemptable();
+	preempt_enable();
+}
+
+static int __init init_example(void)
+{
+	sample_all_context();
+	sample_wnp_rbh_rirq_context();
+
+	return 0;
+}
+
+static void __exit exit_example(void)
+{
+}
+
+module_init(init_example)
+module_exit(exit_example)
+
+MODULE_LICENSE("GPL");
+MODULE_AUTHOR("Mathieu Desnoyers");
+MODULE_DESCRIPTION("psrwlock example");
Index: linux-2.6-lttng/samples/Kconfig
===================================================================
--- linux-2.6-lttng.orig/samples/Kconfig	2008-09-06 14:05:34.000000000 -0400
+++ linux-2.6-lttng/samples/Kconfig	2008-09-06 14:12:42.000000000 -0400
@@ -33,5 +33,10 @@ config SAMPLE_KRETPROBES
 	default m
 	depends on SAMPLE_KPROBES && KRETPROBES
 
+config SAMPLE_PSRWLOCK
+	tristate "Build psrwlock example -- loadable modules only"
+	default m
+	depends on m
+
 endif # SAMPLES
 

-- 
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68

^ permalink raw reply	[flat|nested] 6+ messages in thread

* [RFC PATCH 5/5] Priority Sifting Reader-Writer Lock Latency Trace
  2008-09-09  0:34 [RFC PATCH 0/5] Priority Sifting Reader-Writer Lock v13 Mathieu Desnoyers
                   ` (3 preceding siblings ...)
  2008-09-09  0:34 ` [RFC PATCH 4/5] Priority Sifting Reader-Writer Lock Sample Mathieu Desnoyers
@ 2008-09-09  0:34 ` Mathieu Desnoyers
  4 siblings, 0 replies; 6+ messages in thread
From: Mathieu Desnoyers @ 2008-09-09  0:34 UTC (permalink / raw)
  To: Linus Torvalds, H. Peter Anvin, Jeremy Fitzhardinge,
	Andrew Morton, Ingo Molnar, Paul E. McKenney, Peter Zijlstra,
	Joe Perches, Wei Weng, linux-kernel
  Cc: Mathieu Desnoyers

[-- Attachment #1: psrwlock-latency-trace.patch --]
[-- Type: text/plain, Size: 12982 bytes --]

Trace preemption-off, softirq-off and irq-off latency within the psrwlock. Can
be used to perform precise measurements of the impact of these primitives
without being lost in the kernel noise.
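
A minimal usage sketch, calling the API added below from a hypothetical test
module (the two exported functions and their signatures are taken from this
patch; CONFIG_PSRWLOCK_LATENCY_TEST=y is assumed) :

	#include <linux/psrwlock-latency-trace.h>

	static void run_psrwlock_latency_bench(void)
	{
		/* Clear the per-cpu statistics and redo the calibration. */
		psrwlock_profile_latency_reset();

		/* ... exercise the psrwlock from the various contexts ... */

		/* Print per-cpu IRQ/SoftIRQ/Preemption min/avg/max latency. */
		psrwlock_profile_latency_print();
	}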

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
CC: Linus Torvalds <torvalds@linux-foundation.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
CC: Jeremy Fitzhardinge <jeremy@goop.org>
CC: Andrew Morton <akpm@linux-foundation.org>
CC: Ingo Molnar <mingo@elte.hu>
CC: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
---
 include/linux/psrwlock-latency-trace.h |  104 +++++++++++
 lib/Kconfig.debug                      |    4 
 lib/psrwlock-latency-trace.c           |  288 +++++++++++++++++++++++++++++++++
 3 files changed, 396 insertions(+)

Index: linux-2.6-lttng/lib/Kconfig.debug
===================================================================
--- linux-2.6-lttng.orig/lib/Kconfig.debug	2008-09-06 18:02:21.000000000 -0400
+++ linux-2.6-lttng/lib/Kconfig.debug	2008-09-06 18:06:20.000000000 -0400
@@ -683,6 +683,10 @@ config FAULT_INJECTION_STACKTRACE_FILTER
 config HAVE_PSRWLOCK_ASM_CALL
 	def_bool n
 
+config PSRWLOCK_LATENCY_TEST
+	boolean "Testing API for psrwlock latency test"
+	help
+	  Build the API used to measure psrwlock-induced irq, softirq and
+	  preemption latency.
 config LATENCYTOP
 	bool "Latency measuring infrastructure"
 	select FRAME_POINTER if !MIPS
Index: linux-2.6-lttng/include/linux/psrwlock-latency-trace.h
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6-lttng/include/linux/psrwlock-latency-trace.h	2008-09-06 18:06:20.000000000 -0400
@@ -0,0 +1,104 @@
+#ifndef _LINUX_PSRWLOCK_LATENCY_TRACE_H
+#define _LINUX_PSRWLOCK_LATENCY_TRACE_H
+
+/*
+ * Priority Sifting Reader-Writer Lock Latency Tracer
+ *
+ * Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
+ * August 2008
+ */
+
+#include <linux/hardirq.h>
+
+#ifdef CONFIG_PSRWLOCK_LATENCY_TEST
+
+extern void psrwlock_profile_latency_reset(void);
+extern void psrwlock_profile_latency_print(void);
+
+extern void psrwlock_profile_irq_disable(void);
+extern void psrwlock_profile_irq_enable(void);
+extern void psrwlock_profile_bh_disable(void);
+extern void psrwlock_profile_bh_enable(void);
+
+#define psrwlock_irq_save(flags)				\
+do {								\
+	local_irq_save(flags);					\
+	if (!irqs_disabled_flags(flags))			\
+		psrwlock_profile_irq_disable();		\
+} while (0)
+
+#define psrwlock_irq_restore(flags)				\
+do {								\
+	if (irqs_disabled() && !irqs_disabled_flags(flags))	\
+		psrwlock_profile_irq_enable();		\
+	local_irq_restore(flags);				\
+} while (0)
+
+static inline void psrwlock_irq_disable(void)
+{
+	unsigned long flags;
+
+	local_save_flags(flags);
+	local_irq_disable();
+	if (!irqs_disabled_flags(flags))
+		psrwlock_profile_irq_disable();
+}
+static inline void psrwlock_irq_enable(void)
+{
+	if (irqs_disabled())
+		psrwlock_profile_irq_enable();
+	local_irq_enable();
+}
+static inline void psrwlock_bh_disable(void)
+{
+	local_bh_disable();
+	if (softirq_count() == SOFTIRQ_OFFSET)
+		psrwlock_profile_bh_disable();
+}
+static inline void psrwlock_bh_enable(void)
+{
+	if (softirq_count() == SOFTIRQ_OFFSET)
+		psrwlock_profile_bh_enable();
+	local_bh_enable();
+}
+static inline void psrwlock_bh_enable_ip(unsigned long ip)
+{
+	if (softirq_count() == SOFTIRQ_OFFSET)
+		psrwlock_profile_bh_enable();
+	local_bh_enable_ip(ip);
+}
+
+#ifdef CONFIG_PREEMPT
+extern void psrwlock_profile_preempt_disable(void);
+extern void psrwlock_profile_preempt_enable(void);
+
+static inline void psrwlock_preempt_disable(void)
+{
+	preempt_disable();
+	if (preempt_count() == PREEMPT_OFFSET)
+		psrwlock_profile_preempt_disable();
+}
+static inline void psrwlock_preempt_enable(void)
+{
+	if (preempt_count() == PREEMPT_OFFSET)
+		psrwlock_profile_preempt_enable();
+	preempt_enable();
+}
+static inline void psrwlock_preempt_enable_no_resched(void)
+{
+	/*
+	 * Not exactly true, since we really re-preempt at the next preempt
+	 * check, but gives a good idea (lower-bound).
+	 */
+	if (preempt_count() == PREEMPT_OFFSET)
+		psrwlock_profile_preempt_enable();
+	preempt_enable_no_resched();
+}
+#else
+#define psrwlock_preempt_disable()		preempt_disable()
+#define psrwlock_preempt_enable()		preempt_enable()
+#define psrwlock_preempt_enable_no_resched()	preempt_enable_no_resched()
+#endif
+
+#endif	/* CONFIG_PSRWLOCK_LATENCY_TEST */
+#endif	/* _LINUX_PSRWLOCK_LATENCY_TRACE_H */
Index: linux-2.6-lttng/lib/psrwlock-latency-trace.c
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6-lttng/lib/psrwlock-latency-trace.c	2008-09-06 18:07:26.000000000 -0400
@@ -0,0 +1,288 @@
+/*
+ * Priority Sifting Reader-Writer Lock Latency Tracer
+ *
+ * Copyright 2008 Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
+ */
+
+#include <linux/psrwlock.h>
+#include <linux/module.h>
+#include <linux/stop_machine.h>
+#include <linux/percpu.h>
+#include <linux/init.h>
+#include <linux/kallsyms.h>
+
+/*
+ * Use unsigned long, which is enough to represent a cycle count difference,
+ * even on 32-bit architectures.
+ */
+
+struct psrwlock_latency {
+	unsigned long last_disable_cycles, max_latency, min_latency, nr_enable;
+	cycles_t total_latency;
+	unsigned long max_latency_ip_disable,
+		max_latency_ip_enable,
+		last_ip_disable;
+};
+
+static DEFINE_PER_CPU(struct psrwlock_latency, irq_latency_info);
+static DEFINE_PER_CPU(struct psrwlock_latency, softirq_latency_info);
+static DEFINE_PER_CPU(struct psrwlock_latency, preempt_latency_info);
+
+static DEFINE_MUTEX(calibration_mutex);
+static unsigned long cycles_calibration_min,
+		cycles_calibration_avg,
+		cycles_calibration_max;
+
+/*
+ * Since the timestamps are taken within the instrumented section,
+ * add the number of cycles needed for two consecutive cycle
+ * counter reads to the total.
+ * Returns an unsigned long long for %llu print format.
+ */
+static unsigned long long calibrate_cycles(cycles_t cycles)
+{
+	return cycles + cycles_calibration_avg;
+}
+
+static void calibrate_get_cycles(void)
+{
+	int i;
+	cycles_t time1, time2;
+	unsigned long delay;
+
+	printk(KERN_INFO "** get_cycles calibration **\n");
+	cycles_calibration_min = ULLONG_MAX;
+	cycles_calibration_avg = 0;
+	cycles_calibration_max = 0;
+
+	local_irq_disable();
+	for (i = 0; i < 10; i++) {
+		rdtsc_barrier();
+		time1 = get_cycles();
+		rdtsc_barrier();
+		rdtsc_barrier();
+		time2 = get_cycles();
+		rdtsc_barrier();
+		delay = time2 - time1;
+		cycles_calibration_min = min(cycles_calibration_min, delay);
+		cycles_calibration_avg += delay;
+		cycles_calibration_max = max(cycles_calibration_max, delay);
+	}
+	cycles_calibration_avg /= 10;
+	local_irq_enable();
+
+	printk(KERN_INFO "get_cycles takes [min,avg,max] %lu,%lu,%lu "
+		"cycles, results calibrated on avg\n",
+		cycles_calibration_min,
+		cycles_calibration_avg,
+		cycles_calibration_max);
+	printk("\n");
+}
+
+static void reset_latency(struct psrwlock_latency *irql)
+{
+	irql->last_disable_cycles = 0;
+	irql->max_latency = 0;
+	irql->min_latency = ULONG_MAX;
+	irql->total_latency = 0;
+	irql->nr_enable = 0;
+	irql->max_latency_ip_disable = 0;
+	irql->max_latency_ip_enable = 0;
+	irql->last_ip_disable = 0;
+}
+
+/* can't be in irq disabled section in stop_machine */
+static int _psrwlock_profile_latency_reset(void *data)
+{
+	int cpu = smp_processor_id();
+
+	reset_latency(&per_cpu(irq_latency_info, cpu));
+	reset_latency(&per_cpu(softirq_latency_info, cpu));
+	reset_latency(&per_cpu(preempt_latency_info, cpu));
+	return 0;
+}
+
+
+void psrwlock_profile_latency_reset(void)
+{
+	mutex_lock(&calibration_mutex);
+	printk(KERN_INFO "Writer-biased rwlock latency profiling reset\n");
+	calibrate_get_cycles();
+	stop_machine(_psrwlock_profile_latency_reset,
+			NULL, &cpu_possible_map);
+	mutex_unlock(&calibration_mutex);
+}
+EXPORT_SYMBOL_GPL(psrwlock_profile_latency_reset);
+
+enum irq_latency_type {
+	IRQ_LATENCY,
+	SOFTIRQ_LATENCY,
+	PREEMPT_LATENCY,
+};
+
+/*
+ * total_latency and nr_enable reads are racy, but they only feed an
+ * average. Being off by one is not a big deal.
+ */
+static void print_latency(const char *typename, enum irq_latency_type type)
+{
+	struct psrwlock_latency *irql;
+	cycles_t avg;
+	unsigned long nr_enable;
+	int i;
+
+	for_each_online_cpu(i) {
+		if (type == IRQ_LATENCY)
+			irql = &per_cpu(irq_latency_info, i);
+		else if (type == SOFTIRQ_LATENCY)
+			irql = &per_cpu(softirq_latency_info, i);
+		else
+			irql = &per_cpu(preempt_latency_info, i);
+		nr_enable = irql->nr_enable;
+		if (!nr_enable)
+			continue;
+		avg = irql->total_latency / (cycles_t)nr_enable;
+		printk(KERN_INFO "%s latency for cpu %d "
+			"disabled %lu times, "
+			"[min,avg,max] %llu,%llu,%llu cycles\n",
+			typename, i, nr_enable,
+			calibrate_cycles(irql->min_latency),
+			calibrate_cycles(avg),
+			calibrate_cycles(irql->max_latency));
+		printk(KERN_INFO "Max %s latency caused by :\n", typename);
+		printk(KERN_INFO "disable : ");
+		print_ip_sym(irql->max_latency_ip_disable);
+		printk(KERN_INFO "enable : ");
+		print_ip_sym(irql->max_latency_ip_enable);
+	}
+}
+
+void psrwlock_profile_latency_print(void)
+{
+	mutex_lock(&calibration_mutex);
+	printk(KERN_INFO "Writer-biased rwlock latency profiling results\n");
+	printk(KERN_INFO "\n");
+	print_latency("IRQ", IRQ_LATENCY);
+	print_latency("SoftIRQ", SOFTIRQ_LATENCY);
+	print_latency("Preemption", PREEMPT_LATENCY);
+	printk(KERN_INFO "\n");
+	mutex_unlock(&calibration_mutex);
+}
+EXPORT_SYMBOL_GPL(psrwlock_profile_latency_print);
+
+void psrwlock_profile_irq_disable(void)
+{
+	struct psrwlock_latency *irql =
+		&per_cpu(irq_latency_info, smp_processor_id());
+
+	WARN_ON_ONCE(!irqs_disabled());
+	irql->last_ip_disable = _RET_IP_;
+	rdtsc_barrier();
+	irql->last_disable_cycles = get_cycles();
+	rdtsc_barrier();
+}
+EXPORT_SYMBOL_GPL(psrwlock_profile_irq_disable);
+
+void psrwlock_profile_irq_enable(void)
+{
+	struct psrwlock_latency *irql;
+	unsigned long cur_cycles, diff_cycles;
+
+	rdtsc_barrier();
+	cur_cycles = get_cycles();
+	rdtsc_barrier();
+	irql = &per_cpu(irq_latency_info, smp_processor_id());
+	WARN_ON_ONCE(!irqs_disabled());
+	if (!irql->last_disable_cycles)
+		return;
+	diff_cycles = cur_cycles - irql->last_disable_cycles;
+	if (diff_cycles > irql->max_latency) {
+		irql->max_latency = diff_cycles;
+		irql->max_latency_ip_enable = _RET_IP_;
+		irql->max_latency_ip_disable = irql->last_ip_disable;
+	}
+	irql->min_latency = min(irql->min_latency, diff_cycles);
+	irql->total_latency += diff_cycles;
+	irql->nr_enable++;
+}
+EXPORT_SYMBOL_GPL(psrwlock_profile_irq_enable);
+
+void psrwlock_profile_bh_disable(void)
+{
+	struct psrwlock_latency *irql =
+		&per_cpu(softirq_latency_info, smp_processor_id());
+
+	WARN_ON_ONCE(!in_softirq());
+	irql->last_ip_disable = _RET_IP_;
+	rdtsc_barrier();
+	irql->last_disable_cycles = get_cycles();
+	rdtsc_barrier();
+}
+EXPORT_SYMBOL_GPL(psrwlock_profile_bh_disable);
+
+void psrwlock_profile_bh_enable(void)
+{
+	struct psrwlock_latency *irql;
+	unsigned long cur_cycles, diff_cycles;
+
+	rdtsc_barrier();
+	cur_cycles = get_cycles();
+	rdtsc_barrier();
+	irql = &per_cpu(softirq_latency_info, smp_processor_id());
+	WARN_ON_ONCE(!in_softirq());
+	diff_cycles = cur_cycles - irql->last_disable_cycles;
+	if (diff_cycles > irql->max_latency) {
+		irql->max_latency = diff_cycles;
+		irql->max_latency_ip_enable = _RET_IP_;
+		irql->max_latency_ip_disable = irql->last_ip_disable;
+	}
+	irql->min_latency = min(irql->min_latency, diff_cycles);
+	irql->total_latency += diff_cycles;
+	irql->nr_enable++;
+}
+EXPORT_SYMBOL_GPL(psrwlock_profile_bh_enable);
+
+#ifdef CONFIG_PREEMPT
+void psrwlock_profile_preempt_disable(void)
+{
+	struct psrwlock_latency *irql =
+		&per_cpu(preempt_latency_info, smp_processor_id());
+
+	WARN_ON_ONCE(preemptible());
+	irql->last_ip_disable = _RET_IP_;
+	rdtsc_barrier();
+	irql->last_disable_cycles = get_cycles();
+	rdtsc_barrier();
+}
+EXPORT_SYMBOL_GPL(psrwlock_profile_preempt_disable);
+
+void psrwlock_profile_preempt_enable(void)
+{
+	struct psrwlock_latency *irql;
+	unsigned long cur_cycles, diff_cycles;
+
+	rdtsc_barrier();
+	cur_cycles = get_cycles();
+	rdtsc_barrier();
+	irql = &per_cpu(preempt_latency_info, smp_processor_id());
+	WARN_ON_ONCE(preemptible());
+	diff_cycles = cur_cycles - irql->last_disable_cycles;
+	if (diff_cycles > irql->max_latency) {
+		irql->max_latency = diff_cycles;
+		irql->max_latency_ip_enable = _RET_IP_;
+		irql->max_latency_ip_disable = irql->last_ip_disable;
+	}
+	irql->min_latency = min(irql->min_latency, diff_cycles);
+	irql->total_latency += diff_cycles;
+	irql->nr_enable++;
+}
+EXPORT_SYMBOL_GPL(psrwlock_profile_preempt_enable);
+#endif
+
+__init int psrwlock_init(void)
+{
+	printk(KERN_INFO "psrwlock latency profiling init\n");
+	calibrate_get_cycles();
+	return 0;
+}
+device_initcall(psrwlock_init);

-- 
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2008-09-09  1:21 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2008-09-09  0:34 [RFC PATCH 0/5] Priority Sifting Reader-Writer Lock v13 Mathieu Desnoyers
2008-09-09  0:34 ` [RFC PATCH 1/5] " Mathieu Desnoyers
2008-09-09  0:34 ` [RFC PATCH 2/5] Priority Sifting Reader-Writer Lock Documentation Mathieu Desnoyers
2008-09-09  0:34 ` [RFC PATCH 3/5] Priority Sifting Reader-Writer Lock x86_64 Optimised Call Mathieu Desnoyers
2008-09-09  0:34 ` [RFC PATCH 4/5] Priority Sifting Reader-Writer Lock Sample Mathieu Desnoyers
2008-09-09  0:34 ` [RFC PATCH 5/5] Priority Sifting Reader-Writer Lock Latency Trace Mathieu Desnoyers
