linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* Re: [Linux-ia64] reader-writer livelock problem
       [not found] <3FAD1088D4556046AEC48D80B47B478C0101F4E7@usslc-exch-4.slc.unisys.com>
@ 2002-11-08  3:51 ` William Lee Irwin III
  2002-11-08 17:13   ` Jeremy Fitzhardinge
  0 siblings, 1 reply; 14+ messages in thread
From: William Lee Irwin III @ 2002-11-08  3:51 UTC (permalink / raw)
  To: Van Maren, Kevin
  Cc: linux-ia64, linux-kernel, rusty, dhowells, mingo, torvalds

On Thu, Nov 07, 2002 at 09:23:21PM -0600, Van Maren, Kevin wrote:
> This is a follow-up to the email thread I started on July 29th.
> See http://www.cs.helsinki.fi/linux/linux-kernel/2002-30/0446.html
> and the following discussion on LKML.
> I'll summarize the problem again to refresh the issue.
> Again, this is a correctness issue, not a performance one.
> I am seeing a problem on medium-sized SMPs where user programs are
> able to livelock the Linux kernel to such an extent that the
> system appears dead.  With the help of some hardware debugging
> tools, I was able to determine that the problem is caused by
> the reader-preference reader/writer locks in the Linux kernel.

This is a very serious problem which I have also encountered. My
strategy was to make the readers on the tasklist_lock more well-behaved,
and with Ingo's help and co-authorship those changes were cleaned up,
tuned to provide performance benefits for smaller systems, bugfixed,
and incorporated in the kernel. They have at least provided 16x systems
in my lab with much more stability. The issues are still triggerable on
32x systems in my lab, to which I do not have regular access.

Rusty, Dave, Ingo, and Linus cc:'d for additional commentary/help.

Bill

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [Linux-ia64] reader-writer livelock problem
  2002-11-08  3:51 ` [Linux-ia64] reader-writer livelock problem William Lee Irwin III
@ 2002-11-08 17:13   ` Jeremy Fitzhardinge
  2002-11-08 17:25     ` Linus Torvalds
  2002-11-08 17:34     ` [Linux-ia64] reader-writer livelock problem David Howells
  0 siblings, 2 replies; 14+ messages in thread
From: Jeremy Fitzhardinge @ 2002-11-08 17:13 UTC (permalink / raw)
  To: William Lee Irwin III
  Cc: Van Maren, Kevin, linux-ia64, Linux Kernel List, rusty, dhowells,
	mingo, Linus Torvalds

On Thu, 2002-11-07 at 19:51, William Lee Irwin III wrote:
> On Thu, Nov 07, 2002 at 09:23:21PM -0600, Van Maren, Kevin wrote:
> > This is a follow-up to the email thread I started on July 29th.
> > See http://www.cs.helsinki.fi/linux/linux-kernel/2002-30/0446.html
> > and the following discussion on LKML.
> > I'll summarize the problem again to refresh the issue.
> > Again, this is a correctness issue, not a performance one.
> > I am seeing a problem on medium-sized SMPs where user programs are
> > able to livelock the Linux kernel to such an extent that the
> > system appears dead.  With the help of some hardware debugging
> > tools, I was able to determine that the problem is caused by
> > the reader-preference reader/writer locks in the Linux kernel.
> 
> This is a very serious problem which I have also encountered. My
> strategy was to make the readers on the tasklist_lock more well-behaved,
> and with Ingo's help and co-authorship those changes were cleaned up,
> tuned to provide performance benefits for smaller systems, bugfixed,
> and incorporated in the kernel. They have at least provided 16x systems
> in my lab with much more stability. The issues are still triggerable on
> 32x systems in my lab, to which I do not have regular access.

The normal way of solving this fairness problem is to make pending write
locks block read lock attempts, so that the reader count is guaranteed
to drop to zero as read locks are released.  I haven't looked at the
Linux implementation of rwlocks, so I don't know how hard this is to
do.  Or perhaps there's some other reason for not implementing it this
way?

	J


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [Linux-ia64] reader-writer livelock problem
  2002-11-08 17:13   ` Jeremy Fitzhardinge
@ 2002-11-08 17:25     ` Linus Torvalds
  2002-11-08 17:28       ` Linus Torvalds
  2002-11-08 17:38       ` Jeremy Fitzhardinge
  2002-11-08 17:34     ` [Linux-ia64] reader-writer livelock problem David Howells
  1 sibling, 2 replies; 14+ messages in thread
From: Linus Torvalds @ 2002-11-08 17:25 UTC (permalink / raw)
  To: Jeremy Fitzhardinge
  Cc: William Lee Irwin III, Van Maren, Kevin, linux-ia64,
	Linux Kernel List, rusty, dhowells, mingo


On 8 Nov 2002, Jeremy Fitzhardinge wrote:
> 
> The normal way of solving this fairness problem is to make pending write
> locks block read lock attempts, so that the reader count is guaranteed
> to drop to zero as read locks are released.  I haven't looked at the
> Linux implementation of rwlocks, so I don't know how hard this is to
> do.  Or perhaps there's some other reason for not implementing it this
> way?

There's another reason for not doing it that way: allowing readers to keep 
interrupts on even in the presense of interrupt uses of readers.

If you do the "pending writes stop readers" approach, you get

		cpu1			cpu2

		read_lock() - get

					write_lock_irq() - pending

		irq happens
		 - read_lock() - deadlock

and that means that you need to make readers protect against interrupts 
even if the interrupts only read themselves.

NOTE! I'm not saying the existing practice is necessarily a good tradeoff,
and maybe we should just make sure to find all such cases and turn the
read_lock() calls into read_lock_irqsave() and then make the rw-locks
block readers on pending writers. But it's certainly more work and cause
for subtler problems than just naively changing the rw implementation.

		Linus


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [Linux-ia64] reader-writer livelock problem
  2002-11-08 17:25     ` Linus Torvalds
@ 2002-11-08 17:28       ` Linus Torvalds
  2002-11-08 17:38       ` Jeremy Fitzhardinge
  1 sibling, 0 replies; 14+ messages in thread
From: Linus Torvalds @ 2002-11-08 17:28 UTC (permalink / raw)
  To: Jeremy Fitzhardinge
  Cc: William Lee Irwin III, Van Maren, Kevin, linux-ia64,
	Linux Kernel List, rusty, dhowells, mingo


On Fri, 8 Nov 2002, Linus Torvalds wrote:
> 
> NOTE! I'm not saying the existing practice is necessarily a good tradeoff,
> and maybe we should just make sure to find all such cases and turn the
> read_lock() calls into read_lock_irqsave() and then make the rw-locks
> block readers on pending writers. But it's certainly more work and cause
> for subtler problems than just naively changing the rw implementation.

Actually, giving this som emore thought, I really suspect that the
simplest solution is to alloc a separate "fair_read_lock()", and paths
that need to care about fairness (and know they don't have the irq issue)  
can use that, slowly porting users over one by one...

		Linus


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [Linux-ia64] reader-writer livelock problem
  2002-11-08 17:13   ` Jeremy Fitzhardinge
  2002-11-08 17:25     ` Linus Torvalds
@ 2002-11-08 17:34     ` David Howells
  2002-11-08 17:54       ` David Howells
  2002-11-08 17:55       ` Stephen Hemminger
  1 sibling, 2 replies; 14+ messages in thread
From: David Howells @ 2002-11-08 17:34 UTC (permalink / raw)
  To: Jeremy Fitzhardinge
  Cc: William Lee Irwin III, Van Maren, Kevin, linux-ia64,
	Linux Kernel List, rusty, dhowells, mingo, Linus Torvalds

[-- Attachment #1: Type: text/plain, Size: 708 bytes --]


> The normal way of solving this fairness problem is to make pending write
> locks block read lock attempts, so that the reader count is guaranteed
> to drop to zero as read locks are released.  I haven't looked at the
> Linux implementation of rwlocks, so I don't know how hard this is to
> do.  Or perhaps there's some other reason for not implementing it this
> way?

Actually implementing a fair spinlocks and fair rwlocks on the x86 arch are
very easy (at least, if you have XADD it is). Any arch which has CMPXCHG can
also do it, just not so easily.

I've attached an implementation of a fair spinlock and an implementation of a
fair rwlock (which can be compiled and played with in u-space).

David


[-- Attachment #2: Type: text/plain, Size: 1806 bytes --]


typedef struct fairlock {
	u8	curr;
	u8	ticket;
} fairlock_t;

static inline init_fairlock(fairlock_t *lock)
{
	memset(lock,0,sizeof(*lock));
}

/*
 * spin lock fairly
 */
static inline void fair_lock(fairlock_t *lock)
{
	u8 number = 1;

	__asm__ __volatile__(
		"# beginning fair_lock\n\t"
LOCK_PREFIX	"  xaddb     %b2,%1\n\t"	/* number = lock->ticket; lock->ticket++; */
		"1:\n\t"
		"  cmpb	     %b2,%0\n\t"
		"  jne       2f\n\t"		/* jump if number!=lock->curr */
		LOCK_SECTION_START("")
		"2:\n\t"
		"  rep; nop\n\t"
		"  jmp       1b\n"
		LOCK_SECTION_END
		"# ending fair_lock\n\t"
		: "=m"(lock->curr), "=m"(lock->ticket), "=r"(number)
		: "r"(lock), "m"(lock->curr), "m"(lock->ticket), "2"(number)
		: "memory", "cc");
}

/*
 * spin trylock fairly
 */
static inline void fair_trylock(fairlock_t *lock)
{
	u32 tmp;

	__asm__ __volatile__(
		"# beginning fair_trylock\n\t"
		"  movw      (%3),%%ax\n\t"	/* AL=curr, AH=ticket */
		"1:\n\t"
		"  cmpb      %%al,%%ah\n\t"
		"  je        3f\n\t"		/* jump if maybe we can get it */
		"2:\n\t"
		LOCK_SECTION_START("")
		"3:\n\t"
		"  leaw      1(%ax),%w2\n\t"	/* [propose] ticket=ticket+1 */
LOCK_PREFIX	"  cmpxchgw  %w2,(%3)\n\t"
		"  je        3b\n\t"		/* jump if worked */
		"  rep; nop\n\t"		/* didn't work; AL & AH have been updated */
		"  jmp       1b\n"
		LOCK_SECTION_END
		"# ending fair_trylock\n\t"
		: "=m"(lock->curr), "=m"(lock->ticket), "=$r"(tmp)
		: "r"(lock), "m"(lock->curr), "m"(lock->ticket)
		: "memory", "cc", "eax");
}

/*
 * spin unlock fairly
 */
static inline void fair_unlock(fairlock_t *lock)
{
	u8 number;

	__asm__ __volatile__(
		"# beginning fair_unlock\n\t"
LOCK_PREFIX	"  incb      %1\n\t"		/* lock->curr++; */
		"# ending fair_unlock\n\t"
		: "=m"(lock->curr)
		: "r"(lock), "m"(lock->curr)
		: "memory", "cc");
}


[-- Attachment #3: Type: text/plain, Size: 4065 bytes --]


/* rwlock.c: fair read/write spinlocks
 *
 * Copyright (c) 2002   David Howells (dhowells@redhat.com).
 */

#include <linux/types.h>

typedef unsigned char u8;
typedef unsigned int u32;

#define LOCK_PREFIX "lock;"

typedef struct rwlock {

	/* these member variables must be at the beginning and in this order */
	u8	rd_curr;	/* next reader ticket allowed to become active */
	u8	curr;		/* currently active ticket */
	u8	ticket;		/* next ticket to be issued */
	u8	__pad;

} __attribute__((aligned(4))) rwlock_t;

#define RWLOCK_UNLOCKED (rwlock_t) { 0, 0, 0, 0 }

#define rwlock_debug(LOCK,WHERE)							\
do {											\
	printf(WHERE"{%02x,%02x,%02x}\n",LOCK->rd_curr,LOCK->curr,LOCK->ticket);	\
} while(0)

/*
 * obtain a write lock
 * - pull the next ticket from lock->ticket (which is subsequently incremented)
 * - spin until lock->curr catches up to the value that lock->ticket had before the XADD
 * - lock->rd_curr is left equal to the lock->curr (and thus my ticket number) to prevent reads
 *   getting a lock
 */
static inline void write_lock(rwlock_t *lock)
{
	u32 eax;

	rwlock_debug(lock,"-->write_lock");

	asm volatile(
		"	# begin write_lock	\n"
LOCK_PREFIX     "	xaddw	%%ax,1(%3)	\n"	/* my ticket in AH */
		"0:	cmpb	%%al,%%ah	\n"	/* lock->curr in AL */
		"	jne	2f		\n"
		"	.section .text.lock,\"ax\"\n"
		"2:				\n"
		"	rep; nop		\n"
		"	movb	1(%3),%%al	\n"	/* reload AL from lock->curr */
		"	jmp	0b		\n"
		"	.previous		\n"
		"	# end write_lock	\n"
		: "=m"(lock), "=a"(eax)
		: "m"(lock), "r"(lock), "a"(0x0100)
		: "memory", "cc"
		);

	rwlock_debug(lock,"<--write_lock");
}

/*
 * release a write lock
 * - advance both lock->rd_curr and lock->curr by one to enable the next lock to be granted
 */
static inline void write_unlock(rwlock_t *lock)
{
	u32 eax;

	rwlock_debug(lock,"-->write_unlock");

	asm volatile(
		"	# begin write_unlock	\n"
		"	movw	0(%3),%%ax	\n"
		"	incb	%%al		\n"	/* lock->rd_curr++ */
		"	incb	%%ah		\n"	/* lock->curr++ */
		"	movw	%%ax,0(%3)	\n"
		"	# end write_unlock	\n"
		: "=m"(lock), "=&a"(eax)
		: "m"(lock), "r"(lock)
		: "cc"
		);

	rwlock_debug(lock,"<--write_unlock");
}

/*
 * obtain a read lock
 * - pull the next ticket from lock->ticket (which is subsequently incremented)
 * - spin until lock->rd_curr catches up to the value that lock->ticket had before the XADD
 * - lock->rd_curr is then advanced by one to allow the next read lock to be granted
 * - lock->curr is irrelevant
 */
static inline void read_lock(rwlock_t *lock)
{
	u32 eax;

	rwlock_debug(lock,"-->read_lock");

	asm volatile(
		"	# begin read_lock	\n"
LOCK_PREFIX     "	xaddb	%%ah,2(%3)	\n"	/* my ticket in AH */
		"0:	movb	0(%3),%%al	\n"	/* AL = lock->rd_curr */
		"	cmpb	%%al,%%ah	\n"	/* if (ticket!=lock->rd_curr)  */
		"	jne	2f		\n"	/* then jump */
		"	incb	0(%3)		\n"	/* lock->rd_curr */
		"	.section .text.lock,\"ax\"\n"
		"2:				\n"
		"	rep; nop		\n"
		"	jmp	0b		\n"
		"	.previous		\n"
		"	# end read_lock	\n"
		: "=m"(lock), "=a"(eax)
		: "m"(lock), "r"(lock), "a"(0x0100)
		: "memory", "cc"
		);

	rwlock_debug(lock,"<--read_lock");
}

/*
 * release a read lock
 * - just advance the lock->curr count so that any spinning write lock can happen
 */
static inline void read_unlock(rwlock_t *lock)
{
	rwlock_debug(lock,"-->read_unlock");

	asm volatile(
		"	# begin read_unlock	\n"
LOCK_PREFIX	"	incb	%0		\n"	/* lock->curr++ */
		"	# end read_unlock	\n"
		: "=m"(lock->curr)
		: "m"(lock->curr)
		: "cc"
		);

	rwlock_debug(lock,"<--read_unlock");
}

/*****************************************************************************/
/*
 * testing stuff
 */ 
rwlock_t mylock = RWLOCK_UNLOCKED;

#define wibble() asm("nop")

void get_read_lock(void)
{
	wibble();
	read_lock(&mylock);
	read_lock(&mylock);
	read_unlock(&mylock);
	wibble();
	read_lock(&mylock);
	read_unlock(&mylock);
	read_unlock(&mylock);
	wibble();
}

void get_write_lock(void)
{
	wibble();
	write_lock(&mylock);
	wibble();
	write_unlock(&mylock);
	wibble();
}

int main()
{
	get_read_lock();
	get_write_lock();
	get_write_lock();
	get_read_lock();
	return 0;
}

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [Linux-ia64] reader-writer livelock problem
  2002-11-08 17:25     ` Linus Torvalds
  2002-11-08 17:28       ` Linus Torvalds
@ 2002-11-08 17:38       ` Jeremy Fitzhardinge
  2002-11-08 17:43         ` David Howells
                           ` (2 more replies)
  1 sibling, 3 replies; 14+ messages in thread
From: Jeremy Fitzhardinge @ 2002-11-08 17:38 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: William Lee Irwin III, Van Maren, Kevin, linux-ia64,
	Linux Kernel List, rusty, dhowells, mingo

On Fri, 2002-11-08 at 09:25, Linus Torvalds wrote:
> There's another reason for not doing it that way: allowing readers to keep 
> interrupts on even in the presense of interrupt uses of readers.
> 
> If you do the "pending writes stop readers" approach, you get
> 
> 		cpu1			cpu2
> 
> 		read_lock() - get
> 
> 					write_lock_irq() - pending
> 
> 		irq happens
> 		 - read_lock() - deadlock
> 
> and that means that you need to make readers protect against interrupts 
> even if the interrupts only read themselves.

Even without interrupts that would be a bug.  It isn't ever safe to
attempt to retake a read lock if you already hold it, because you may
deadlock with a pending writer.  Fair multi-reader locks aren't
recursive locks.

> NOTE! I'm not saying the existing practice is necessarily a good tradeoff,
> and maybe we should just make sure to find all such cases and turn the
> read_lock() calls into read_lock_irqsave() and then make the rw-locks
> block readers on pending writers. But it's certainly more work and cause
> for subtler problems than just naively changing the rw implementation.

Yes, I'd agree.  It would definitely be a behavioural change with
respect to the legality of retaking a lock for reading, which would
probably be quite irritating to find (since they'd only cause a problem
if they actually coincide with an attempted write lock).

> Actually, giving this som emore thought, I really suspect that the
> simplest solution is to alloc a separate "fair_read_lock()", and paths
> that need to care about fairness (and know they don't have the irq
> issue)  
> can use that, slowly porting users over one by one...

Do you mean have a separate lock type, or have two different read_lock
operations on the current type?

	J


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [Linux-ia64] reader-writer livelock problem
  2002-11-08 17:38       ` Jeremy Fitzhardinge
@ 2002-11-08 17:43         ` David Howells
  2002-11-08 17:57         ` Linus Torvalds
  2002-11-09  2:48         ` Rusty Russell
  2 siblings, 0 replies; 14+ messages in thread
From: David Howells @ 2002-11-08 17:43 UTC (permalink / raw)
  To: Jeremy Fitzhardinge
  Cc: Linus Torvalds, William Lee Irwin III, Van Maren, Kevin,
	linux-ia64, Linux Kernel List, rusty, dhowells, mingo


> Do you mean have a separate lock type, or have two different read_lock
> operations on the current type?

You'd probably be better off with separate types... that way you can avoid
problems with conditionals and also with different tracking fields being
required by each grade of lock.

David

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [Linux-ia64] reader-writer livelock problem
  2002-11-08 17:34     ` [Linux-ia64] reader-writer livelock problem David Howells
@ 2002-11-08 17:54       ` David Howells
  2002-11-08 17:55       ` Stephen Hemminger
  1 sibling, 0 replies; 14+ messages in thread
From: David Howells @ 2002-11-08 17:54 UTC (permalink / raw)
  To: David Howells
  Cc: Jeremy Fitzhardinge, William Lee Irwin III, Van Maren, Kevin,
	linux-ia64, Linux Kernel List, rusty, dhowells, mingo,
	Linus Torvalds


> I've attached an implementation of a fair spinlock and an implementation of a
> fair rwlock (which can be compiled and played with in u-space).

For those that prefer patches (or at least something that'll compile without
warnings)...

David


diff -uNr linux-2.5.46/include/asm-i386/fairlock.h linux-fair-2546/include/asm-i386/fairlock.h
--- linux-2.5.46/include/asm-i386/fairlock.h	1970-01-01 01:00:00.000000000 +0100
+++ linux-fair-2546/include/asm-i386/fairlock.h	2002-11-08 17:51:30.000000000 +0000
@@ -0,0 +1,210 @@
+/* fairlock.h: fair spinlocks
+ *
+ * Copyright (C) 2002 Red Hat, Inc. All Rights Reserved.
+ * Written by David Howells (dhowells@redhat.com)
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */
+
+#ifndef _ASM_FAIRLOCK_H
+#define _ASM_FAIRLOCK_H
+
+#include <linux/config.h>
+#include <linux/string.h>
+#include <asm/atomic.h>
+
+typedef struct fairlock {
+	u8	curr;
+	u8	ticket;
+} fairlock_t;
+
+static inline void init_fairlock(fairlock_t *lock)
+{
+	memset(lock,0,sizeof(*lock));
+}
+
+typedef struct rwfairlock {
+
+	/* these member variables must be at the beginning and in this order */
+	u8	rd_curr;	/* next reader ticket allowed to become active */
+	u8	curr;		/* currently active ticket */
+	u8	ticket;		/* next ticket to be issued */
+	u8	__pad;
+
+} __attribute__((aligned(4))) rwfairlock_t;
+
+#define RWFAIRLOCK_UNLOCKED (rwfairlock_t) { 0, 0, 0, 0 }
+
+/*****************************************************************************/
+/*
+ * spin lock fairly
+ */
+static inline void fair_lock(fairlock_t *lock)
+{
+	u8 number = 1;
+
+	__asm__ __volatile__(
+		"# beginning fair_lock\n\t"
+LOCK_PREFIX	"  xaddb     %b2,%1\n\t"	/* number = lock->ticket; lock->ticket++; */
+		"1:\n\t"
+		"  cmpb	     %b2,%0\n\t"
+		"  jne       2f\n\t"		/* jump if number!=lock->curr */
+		LOCK_SECTION_START("")
+		"2:\n\t"
+		"  rep; nop\n\t"
+		"  jmp       1b\n"
+		LOCK_SECTION_END
+		"# ending fair_lock\n\t"
+		: "=m"(lock->curr), "=m"(lock->ticket), "=r"(number)
+		: "r"(lock), "m"(lock->curr), "m"(lock->ticket), "2"(number)
+		: "memory", "cc");
+}
+
+/*****************************************************************************/
+/*
+ * spin trylock fairly
+ */
+static inline void fair_trylock(fairlock_t *lock)
+{
+	u32 tmp;
+
+	__asm__ __volatile__(
+		"# beginning fair_trylock\n\t"
+		"  movw      (%3),%%ax\n\t"	/* AL=curr, AH=ticket */
+		"1:\n\t"
+		"  cmpb      %%al,%%ah\n\t"
+		"  je        3f\n\t"		/* jump if maybe we can get it */
+		"2:\n\t"
+		LOCK_SECTION_START("")
+		"3:\n\t"
+		"  leaw      1(%ax),%w2\n\t"	/* [propose] ticket=ticket+1 */
+LOCK_PREFIX	"  cmpxchgw  %w2,(%3)\n\t"
+		"  je        3b\n\t"		/* jump if worked */
+		"  rep; nop\n\t"		/* didn't work; AL & AH have been updated */
+		"  jmp       1b\n"
+		LOCK_SECTION_END
+		"# ending fair_trylock\n\t"
+		: "=m"(lock->curr), "=m"(lock->ticket), "=$r"(tmp)
+		: "r"(lock), "m"(lock->curr), "m"(lock->ticket)
+		: "memory", "cc", "eax");
+}
+
+/*****************************************************************************/
+/*
+ * spin unlock fairly
+ */
+static inline void fair_unlock(fairlock_t *lock)
+{
+	__asm__ __volatile__(
+		"# beginning fair_unlock\n\t"
+LOCK_PREFIX	"  incb      %1\n\t"		/* lock->curr++; */
+		"# ending fair_unlock\n\t"
+		: "=m"(lock->curr)
+		: "r"(lock), "m"(lock->curr)
+		: "memory", "cc");
+}
+
+/*****************************************************************************/
+/*
+ * obtain a write lock
+ * - pull the next ticket from lock->ticket (which is subsequently incremented)
+ * - spin until lock->curr catches up to the value that lock->ticket had before the XADD
+ * - lock->rd_curr is left equal to the lock->curr (and thus my ticket number) to prevent reads
+ *   getting a lock
+ */
+static inline void fair_write_lock(rwfairlock_t *lock)
+{
+	u32 eax;
+
+	asm volatile(
+		"	# begin write_lock	\n"
+LOCK_PREFIX     "	xaddw	%%ax,1(%3)	\n"	/* my ticket in AH */
+		"0:	cmpb	%%al,%%ah	\n"	/* lock->curr in AL */
+		"	jne	2f		\n"
+		"	.section .text.lock,\"ax\"\n"
+		"2:				\n"
+		"	rep; nop		\n"
+		"	movb	1(%3),%%al	\n"	/* reload AL from lock->curr */
+		"	jmp	0b		\n"
+		"	.previous		\n"
+		"	# end write_lock	\n"
+		: "=m"(lock), "=a"(eax)
+		: "m"(lock), "r"(lock), "a"(0x0100)
+		: "memory", "cc"
+		);
+}
+
+/*****************************************************************************/
+/*
+ * release a write lock
+ * - advance both lock->rd_curr and lock->curr by one to enable the next lock to be granted
+ */
+static inline void fair_write_unlock(rwfairlock_t *lock)
+{
+	u32 eax;
+
+	asm volatile(
+		"	# begin write_unlock	\n"
+		"	movw	0(%3),%%ax	\n"
+		"	incb	%%al		\n"	/* lock->rd_curr++ */
+		"	incb	%%ah		\n"	/* lock->curr++ */
+		"	movw	%%ax,0(%3)	\n"
+		"	# end write_unlock	\n"
+		: "=m"(lock), "=&a"(eax)
+		: "m"(lock), "r"(lock)
+		: "cc"
+		);
+}
+
+/*****************************************************************************/
+/*
+ * obtain a read lock
+ * - pull the next ticket from lock->ticket (which is subsequently incremented)
+ * - spin until lock->rd_curr catches up to the value that lock->ticket had before the XADD
+ * - lock->rd_curr is then advanced by one to allow the next read lock to be granted
+ * - lock->curr is irrelevant
+ */
+static inline void fair_read_lock(rwfairlock_t *lock)
+{
+	u32 eax;
+
+	asm volatile(
+		"	# begin read_lock	\n"
+LOCK_PREFIX     "	xaddb	%%ah,2(%3)	\n"	/* my ticket in AH */
+		"0:	movb	0(%3),%%al	\n"	/* AL = lock->rd_curr */
+		"	cmpb	%%al,%%ah	\n"	/* if (ticket!=lock->rd_curr)  */
+		"	jne	2f		\n"	/* then jump */
+		"	incb	0(%3)		\n"	/* lock->rd_curr */
+		"	.section .text.lock,\"ax\"\n"
+		"2:				\n"
+		"	rep; nop		\n"
+		"	jmp	0b		\n"
+		"	.previous		\n"
+		"	# end read_lock	\n"
+		: "=m"(lock), "=a"(eax)
+		: "m"(lock), "r"(lock), "a"(0x0100)
+		: "memory", "cc"
+		);
+}
+
+/*****************************************************************************/
+/*
+ * release a read lock
+ * - just advance the lock->curr count so that any spinning write lock can happen
+ */
+static inline void read_unlock(rwfairlock_t *lock)
+{
+	asm volatile(
+		"	# begin read_unlock	\n"
+LOCK_PREFIX	"	incb	%0		\n"	/* lock->curr++ */
+		"	# end read_unlock	\n"
+		: "=m"(lock->curr)
+		: "m"(lock->curr)
+		: "cc"
+		);
+}
+
+#endif /* _ASM_FAIRLOCK_H */
diff -uNr linux-2.5.46/include/asm-i386/spinlock.h linux-fair-2546/include/asm-i386/spinlock.h
--- linux-2.5.46/include/asm-i386/spinlock.h	2002-10-15 10:12:35.000000000 +0100
+++ linux-fair-2546/include/asm-i386/spinlock.h	2002-11-08 17:50:39.000000000 +0000
@@ -4,6 +4,7 @@
 #include <asm/atomic.h>
 #include <asm/rwlock.h>
 #include <asm/page.h>
+#include <asm/fairlock.h>
 #include <linux/config.h>
 
 /*

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [Linux-ia64] reader-writer livelock problem
  2002-11-08 17:34     ` [Linux-ia64] reader-writer livelock problem David Howells
  2002-11-08 17:54       ` David Howells
@ 2002-11-08 17:55       ` Stephen Hemminger
  1 sibling, 0 replies; 14+ messages in thread
From: Stephen Hemminger @ 2002-11-08 17:55 UTC (permalink / raw)
  To: David Howells
  Cc: Jeremy Fitzhardinge, William Lee Irwin III, Van Maren, Kevin,
	linux-ia64, Linux Kernel List, rusty, dhowells, mingo,
	Linus Torvalds

On Fri, 2002-11-08 at 09:34, David Howells wrote:
> 
> > The normal way of solving this fairness problem is to make pending write
> > locks block read lock attempts, so that the reader count is guaranteed
> > to drop to zero as read locks are released.  I haven't looked at the
> > Linux implementation of rwlocks, so I don't know how hard this is to
> > do.  Or perhaps there's some other reason for not implementing it this
> > way?
> 
> Actually implementing a fair spinlocks and fair rwlocks on the x86 arch are
> very easy (at least, if you have XADD it is). Any arch which has CMPXCHG can
> also do it, just not so easily.
> 
> I've attached an implementation of a fair spinlock and an implementation of a
> fair rwlock (which can be compiled and played with in u-space).
> 
> David

There are a selection of similar algorithms here:
	http://www.cs.rochester.edu/u/scott/synchronization/pseudocode/rw.html#s_f

How does yours compare?




^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [Linux-ia64] reader-writer livelock problem
  2002-11-08 17:38       ` Jeremy Fitzhardinge
  2002-11-08 17:43         ` David Howells
@ 2002-11-08 17:57         ` Linus Torvalds
  2002-11-09  2:48         ` Rusty Russell
  2 siblings, 0 replies; 14+ messages in thread
From: Linus Torvalds @ 2002-11-08 17:57 UTC (permalink / raw)
  To: linux-kernel

In article <1036777105.13021.13.camel@ixodes.goop.org>,
Jeremy Fitzhardinge  <jeremy@goop.org> wrote:
>
>Even without interrupts that would be a bug.  It isn't ever safe to
>attempt to retake a read lock if you already hold it, because you may
>deadlock with a pending writer.  Fair multi-reader locks aren't
>recursive locks.

.. but I don't think we have any real users who use them for recursion,
so the only "recursion" right now is through interrupts that use this
feature.

(At least that was true a long time time ago, maybe we've added truly
recursive users since)

>> Actually, giving this som emore thought, I really suspect that the
>> simplest solution is to alloc a separate "fair_read_lock()", and paths
>> that need to care about fairness (and know they don't have the irq
>> issue)  
>> can use that, slowly porting users over one by one...
>
>Do you mean have a separate lock type, or have two different read_lock
>operations on the current type?

That depends on whether it is even sanely implementable to share the
same lock. It may not be. 

>From a migration standpoint it would be easiest (by far) to be able to
share the lock type and to mix operations (ie an interrupt - or
recursive user - could just use the non-fair version, while others could
use the fair version on the same lock).  However, I have this nagging
suspicion that it might be a total nightmare to implement efficiently in
practice. 

I've not looked at it. Any ideas?

		Linus

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [Linux-ia64] reader-writer livelock problem
  2002-11-08 17:38       ` Jeremy Fitzhardinge
  2002-11-08 17:43         ` David Howells
  2002-11-08 17:57         ` Linus Torvalds
@ 2002-11-09  2:48         ` Rusty Russell
  2002-11-09  4:36           ` William Lee Irwin III
       [not found]           ` <3DCFDAE9.6D359448@email.mot.com>
  2 siblings, 2 replies; 14+ messages in thread
From: Rusty Russell @ 2002-11-09  2:48 UTC (permalink / raw)
  To: Jeremy Fitzhardinge
  Cc: William Lee Irwin III, Van Maren, Kevin, linux-ia64,
	Linux Kernel List, rusty, dhowells, mingo, torvalds

In message <1036777105.13021.13.camel@ixodes.goop.org> you write:
> On Fri, 2002-11-08 at 09:25, Linus Torvalds wrote:
> > There's another reason for not doing it that way: allowing readers to keep 
> > interrupts on even in the presense of interrupt uses of readers.
> > 
> > If you do the "pending writes stop readers" approach, you get
> > 
> > 		cpu1			cpu2
> > 
> > 		read_lock() - get
> > 
> > 					write_lock_irq() - pending
> > 
> > 		irq happens
> > 		 - read_lock() - deadlock
> > 
> > and that means that you need to make readers protect against interrupts 
> > even if the interrupts only read themselves.
> 
> Even without interrupts that would be a bug.  It isn't ever safe to
> attempt to retake a read lock if you already hold it, because you may
> deadlock with a pending writer.  Fair multi-reader locks aren't
> recursive locks.

That's the point.  This is explicitly guaranteed with the current
locks, and you are allowed to recurse on them.  The netfilter code
explicitly uses this to retake the net brlock, since it gets called
from different paths.

Implement "read_lock_yield" or "wrlock_t" but don't break existing
semantics until 2.7 *please*!

Rusty.
--
  Anyone who quotes me in their sig is an idiot. -- Rusty Russell.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [Linux-ia64] reader-writer livelock problem
  2002-11-09  2:48         ` Rusty Russell
@ 2002-11-09  4:36           ` William Lee Irwin III
       [not found]           ` <3DCFDAE9.6D359448@email.mot.com>
  1 sibling, 0 replies; 14+ messages in thread
From: William Lee Irwin III @ 2002-11-09  4:36 UTC (permalink / raw)
  To: Rusty Russell
  Cc: Jeremy Fitzhardinge, Van Maren, Kevin, linux-ia64,
	Linux Kernel List, dhowells, mingo, torvalds

In message <1036777105.13021.13.camel@ixodes.goop.org> you write:
>> Even without interrupts that would be a bug.  It isn't ever safe to
>> attempt to retake a read lock if you already hold it, because you may
>> deadlock with a pending writer.  Fair multi-reader locks aren't
>> recursive locks.

On Sat, Nov 09, 2002 at 01:48:17PM +1100, Rusty Russell wrote:
> That's the point.  This is explicitly guaranteed with the current
> locks, and you are allowed to recurse on them.  The netfilter code
> explicitly uses this to retake the net brlock, since it gets called
> from different paths.
> Implement "read_lock_yield" or "wrlock_t" but don't break existing
> semantics until 2.7 *please*!

My only interest is doing this specifically for problematic cases,
e.g. the tasklist_lock. Other usages probably shouldn't be altered
during this release cycle unless they too prove problematic, in which
case their usages should be fixed by their maintainers as bugfixes.

Of course, I'm not happy to hear about this explicit recursion.


Bill

^ permalink raw reply	[flat|nested] 14+ messages in thread

* (no subject)
       [not found]           ` <3DCFDAE9.6D359448@email.mot.com>
@ 2002-11-11 19:22             ` David Mosberger
  2002-11-12  1:39               ` your mail Rik van Riel
  0 siblings, 1 reply; 14+ messages in thread
From: David Mosberger @ 2002-11-11 19:22 UTC (permalink / raw)
  To: Mario Smarduch; +Cc: linux-ia64, linux-kernel

>>>>> On Mon, 11 Nov 2002 10:29:29 -0600, Mario Smarduch <cms063@email.mot.com> said:

  Mario> I know that on some commercial Unix systems there are ways to
  Mario> cap the CPU utilization by user/group ids are there such
  Mario> features/patches available on Linux?

There are probably other patches floating around, but Process Resource
Management (PRM) for Linux is/was one approach to do just that:

	http://resourcemanagement.unixsolutions.hp.com/WaRM/prm_linux/

The kernel patches available from this URL are pretty old (up to
2.4.6, as far as I could see), and I'm not sure what the future plans
for PRM on Linux are.  Perhaps someone else can provide more details.

	--david

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: your mail
  2002-11-11 19:22             ` David Mosberger
@ 2002-11-12  1:39               ` Rik van Riel
  0 siblings, 0 replies; 14+ messages in thread
From: Rik van Riel @ 2002-11-12  1:39 UTC (permalink / raw)
  To: davidm; +Cc: Mario Smarduch, linux-ia64, linux-kernel

On Mon, 11 Nov 2002, David Mosberger wrote:
> >>>>> On Mon, 11 Nov 2002 10:29:29 -0600, Mario Smarduch <cms063@email.mot.com> said:
>
>   Mario> I know that on some commercial Unix systems there are ways to
>   Mario> cap the CPU utilization by user/group ids are there such
>   Mario> features/patches available on Linux?

> The kernel patches available from this URL are pretty old (up to
> 2.4.6, as far as I could see), and I'm not sure what the future plans
> for PRM on Linux are.  Perhaps someone else can provide more details.

I'm (slowly) working on a per-user fair scheduler on top of Ingo's
O(1) scheduler.  Slowly because it's a fairly complex thing.

Once that is done it should be possible to change the accounting
to other resource containers and generally have fun assigning
priorities, though that is beyond the scope of what I'm trying to
achieve.

cheers,

Rik
-- 
Bravely reimplemented by the knights who say "NIH".
http://www.surriel.com/		http://distro.conectiva.com/
Current spamtrap:  <a href=mailto:"october@surriel.com">october@surriel.com</a>


^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2002-11-12  1:32 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <3FAD1088D4556046AEC48D80B47B478C0101F4E7@usslc-exch-4.slc.unisys.com>
2002-11-08  3:51 ` [Linux-ia64] reader-writer livelock problem William Lee Irwin III
2002-11-08 17:13   ` Jeremy Fitzhardinge
2002-11-08 17:25     ` Linus Torvalds
2002-11-08 17:28       ` Linus Torvalds
2002-11-08 17:38       ` Jeremy Fitzhardinge
2002-11-08 17:43         ` David Howells
2002-11-08 17:57         ` Linus Torvalds
2002-11-09  2:48         ` Rusty Russell
2002-11-09  4:36           ` William Lee Irwin III
     [not found]           ` <3DCFDAE9.6D359448@email.mot.com>
2002-11-11 19:22             ` David Mosberger
2002-11-12  1:39               ` your mail Rik van Riel
2002-11-08 17:34     ` [Linux-ia64] reader-writer livelock problem David Howells
2002-11-08 17:54       ` David Howells
2002-11-08 17:55       ` Stephen Hemminger

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).