* Re: [Linux-ia64] reader-writer livelock problem
       [not found] <3FAD1088D4556046AEC48D80B47B478C0101F4E7@usslc-exch-4.slc.unisys.com>
@ 2002-11-08  3:51 ` William Lee Irwin III
  2002-11-08 17:13   ` Jeremy Fitzhardinge
  0 siblings, 1 reply; 22+ messages in thread
From: William Lee Irwin III @ 2002-11-08  3:51 UTC (permalink / raw)
  To: Van Maren, Kevin
  Cc: linux-ia64, linux-kernel, rusty, dhowells, mingo, torvalds

On Thu, Nov 07, 2002 at 09:23:21PM -0600, Van Maren, Kevin wrote:
> This is a follow-up to the email thread I started on July 29th.
> See http://www.cs.helsinki.fi/linux/linux-kernel/2002-30/0446.html
> and the following discussion on LKML.
> I'll summarize the problem again to refresh the issue.
> Again, this is a correctness issue, not a performance one.
> I am seeing a problem on medium-sized SMPs where user programs are
> able to livelock the Linux kernel to such an extent that the
> system appears dead.  With the help of some hardware debugging
> tools, I was able to determine that the problem is caused by
> the reader-preference reader/writer locks in the Linux kernel.

This is a very serious problem which I have also encountered. My
strategy was to make the readers on the tasklist_lock better behaved,
and with Ingo's help and co-authorship those changes were cleaned up,
tuned to provide performance benefits for smaller systems, bugfixed,
and incorporated in the kernel. They have at least provided 16x systems
in my lab with much more stability. The issues are still triggerable on
32x systems in my lab, to which I do not have regular access.

Rusty, Dave, Ingo, and Linus cc:'d for additional commentary/help.

Bill


* Re: [Linux-ia64] reader-writer livelock problem
  2002-11-08  3:51 ` [Linux-ia64] reader-writer livelock problem William Lee Irwin III
@ 2002-11-08 17:13   ` Jeremy Fitzhardinge
  2002-11-08 17:25     ` Linus Torvalds
  2002-11-08 17:34     ` [Linux-ia64] reader-writer livelock problem David Howells
  0 siblings, 2 replies; 22+ messages in thread
From: Jeremy Fitzhardinge @ 2002-11-08 17:13 UTC (permalink / raw)
  To: William Lee Irwin III
  Cc: Van Maren, Kevin, linux-ia64, Linux Kernel List, rusty, dhowells,
	mingo, Linus Torvalds

On Thu, 2002-11-07 at 19:51, William Lee Irwin III wrote:
> On Thu, Nov 07, 2002 at 09:23:21PM -0600, Van Maren, Kevin wrote:
> > This is a follow-up to the email thread I started on July 29th.
> > See http://www.cs.helsinki.fi/linux/linux-kernel/2002-30/0446.html
> > and the following discussion on LKML.
> > I'll summarize the problem again to refresh the issue.
> > Again, this is a correctness issue, not a performance one.
> > I am seeing a problem on medium-sized SMPs where user programs are
> > able to livelock the Linux kernel to such an extent that the
> > system appears dead.  With the help of some hardware debugging
> > tools, I was able to determine that the problem is caused by
> > the reader-preference reader/writer locks in the Linux kernel.
> 
> This is a very serious problem which I have also encountered. My
> strategy was to make the readers on the tasklist_lock better behaved,
> and with Ingo's help and co-authorship those changes were cleaned up,
> tuned to provide performance benefits for smaller systems, bugfixed,
> and incorporated in the kernel. They have at least provided 16x systems
> in my lab with much more stability. The issues are still triggerable on
> 32x systems in my lab, to which I do not have regular access.

The normal way of solving this fairness problem is to make pending write
locks block read lock attempts, so that the reader count is guaranteed
to drop to zero as read locks are released.  I haven't looked at the
Linux implementation of rwlocks, so I don't know how hard this is to
do.  Or perhaps there's some other reason for not implementing it this
way?
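
Schematically, something like the following (a minimal sketch with
GCC-style atomic builtins, not a drop-in for the kernel's rwlock_t;
cpu_relax() stands in for rep;nop):

#define cpu_relax() asm volatile("rep; nop" ::: "memory")

typedef struct {
	volatile int readers;	/* number of active readers */
	volatile int writer;	/* a writer is pending or active */
} wprwlock_t;

static inline void wp_read_lock(wprwlock_t *l)
{
	for (;;) {
		while (l->writer)	/* pending writer blocks new readers */
			cpu_relax();
		__sync_fetch_and_add(&l->readers, 1);
		if (!l->writer)		/* recheck: no writer snuck in */
			return;
		__sync_fetch_and_sub(&l->readers, 1);	/* back off and retry */
	}
}

static inline void wp_read_unlock(wprwlock_t *l)
{
	__sync_fetch_and_sub(&l->readers, 1);
}

static inline void wp_write_lock(wprwlock_t *l)
{
	while (__sync_lock_test_and_set(&l->writer, 1))	/* one writer at a time */
		cpu_relax();
	while (l->readers)	/* reader count is guaranteed to drain */
		cpu_relax();
}

static inline void wp_write_unlock(wprwlock_t *l)
{
	__sync_lock_release(&l->writer);
}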

	J



* Re: [Linux-ia64] reader-writer livelock problem
  2002-11-08 17:13   ` Jeremy Fitzhardinge
@ 2002-11-08 17:25     ` Linus Torvalds
  2002-11-08 17:28       ` Linus Torvalds
  2002-11-08 17:38       ` Jeremy Fitzhardinge
  2002-11-08 17:34     ` [Linux-ia64] reader-writer livelock problem David Howells
  1 sibling, 2 replies; 22+ messages in thread
From: Linus Torvalds @ 2002-11-08 17:25 UTC (permalink / raw)
  To: Jeremy Fitzhardinge
  Cc: William Lee Irwin III, Van Maren, Kevin, linux-ia64,
	Linux Kernel List, rusty, dhowells, mingo


On 8 Nov 2002, Jeremy Fitzhardinge wrote:
> 
> The normal way of solving this fairness problem is to make pending write
> locks block read lock attempts, so that the reader count is guaranteed
> to drop to zero as read locks are released.  I haven't looked at the
> Linux implementation of rwlocks, so I don't know how hard this is to
> do.  Or perhaps there's some other reason for not implementing it this
> way?

There's another reason for not doing it that way: allowing readers to keep 
interrupts on even in the presence of interrupt uses of readers.

If you do the "pending writes stop readers" approach, you get

		cpu1			cpu2

		read_lock() - get

					write_lock_irq() - pending

		irq happens
		 - read_lock() - deadlock

and that means that you need to make readers protect against interrupts 
even if the interrupts only read themselves.

NOTE! I'm not saying the existing practice is necessarily a good tradeoff,
and maybe we should just make sure to find all such cases and turn the
read_lock() calls into read_lock_irqsave() and then make the rw-locks
block readers on pending writers. But it's certainly more work and cause
for subtler problems than just naively changing the rw implementation.
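
A conversion of that sort would look roughly like this (sketch only;
tasklist_lock is just one example of an affected lock):

	/* before: relies on reader preference, because an interrupt
	 * handler may also take the lock for reading */
	read_lock(&tasklist_lock);
	...
	read_unlock(&tasklist_lock);

	/* after: safe even when pending writers block new readers */
	unsigned long flags;

	read_lock_irqsave(&tasklist_lock, flags);
	...
	read_unlock_irqrestore(&tasklist_lock, flags);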

		Linus



* Re: [Linux-ia64] reader-writer livelock problem
  2002-11-08 17:25     ` Linus Torvalds
@ 2002-11-08 17:28       ` Linus Torvalds
  2002-11-08 17:38       ` Jeremy Fitzhardinge
  1 sibling, 0 replies; 22+ messages in thread
From: Linus Torvalds @ 2002-11-08 17:28 UTC (permalink / raw)
  To: Jeremy Fitzhardinge
  Cc: William Lee Irwin III, Van Maren, Kevin, linux-ia64,
	Linux Kernel List, rusty, dhowells, mingo


On Fri, 8 Nov 2002, Linus Torvalds wrote:
> 
> NOTE! I'm not saying the existing practice is necessarily a good tradeoff,
> and maybe we should just make sure to find all such cases and turn the
> read_lock() calls into read_lock_irqsave() and then make the rw-locks
> block readers on pending writers. But it's certainly more work and cause
> for subtler problems than just naively changing the rw implementation.

Actually, giving this some more thought, I really suspect that the
simplest solution is to add a separate "fair_read_lock()", and paths
that need to care about fairness (and know they don't have the irq issue)  
can use that, slowly porting users over one by one...

		Linus



* Re: [Linux-ia64] reader-writer livelock problem
  2002-11-08 17:13   ` Jeremy Fitzhardinge
  2002-11-08 17:25     ` Linus Torvalds
@ 2002-11-08 17:34     ` David Howells
  2002-11-08 17:54       ` David Howells
  2002-11-08 17:55       ` Stephen Hemminger
  1 sibling, 2 replies; 22+ messages in thread
From: David Howells @ 2002-11-08 17:34 UTC (permalink / raw)
  To: Jeremy Fitzhardinge
  Cc: William Lee Irwin III, Van Maren, Kevin, linux-ia64,
	Linux Kernel List, rusty, dhowells, mingo, Linus Torvalds

[-- Attachment #1: Type: text/plain, Size: 708 bytes --]


> The normal way of solving this fairness problem is to make pending write
> locks block read lock attempts, so that the reader count is guaranteed
> to drop to zero as read locks are released.  I haven't looked at the
> Linux implementation of rwlocks, so I don't know how hard this is to
> do.  Or perhaps there's some other reason for not implementing it this
> way?

Actually, implementing fair spinlocks and fair rwlocks on the x86 arch is
very easy (at least, it is if you have XADD). Any arch which has CMPXCHG can
also do it, just not so easily.

I've attached an implementation of a fair spinlock and an implementation of a
fair rwlock (which can be compiled and played with in u-space).
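
With only CMPXCHG, the XADD ticket-grab degenerates into a
compare-and-swap loop, roughly like this (a sketch using GCC builtins
rather than hand-written asm; fairlock_t as in the attachment):

static inline u8 take_ticket_cmpxchg(fairlock_t *lock)
{
	u8 t;

	do {
		t = lock->ticket;	/* snapshot the current ticket */
	} while (!__sync_bool_compare_and_swap(&lock->ticket, t, (u8)(t + 1)));

	return t;			/* our place in the queue */
}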

David


[-- Attachment #2: Type: text/plain, Size: 1806 bytes --]


typedef struct fairlock {
	u8	curr;
	u8	ticket;
} fairlock_t;

static inline void init_fairlock(fairlock_t *lock)
{
	memset(lock,0,sizeof(*lock));
}

/*
 * spin lock fairly
 */
static inline void fair_lock(fairlock_t *lock)
{
	u8 number = 1;

	__asm__ __volatile__(
		"# beginning fair_lock\n\t"
LOCK_PREFIX	"  xaddb     %b2,%1\n\t"	/* number = lock->ticket; lock->ticket++; */
		"1:\n\t"
		"  cmpb	     %b2,%0\n\t"
		"  jne       2f\n\t"		/* jump if number!=lock->curr */
		LOCK_SECTION_START("")
		"2:\n\t"
		"  rep; nop\n\t"
		"  jmp       1b\n"
		LOCK_SECTION_END
		"# ending fair_lock\n\t"
		: "=m"(lock->curr), "=m"(lock->ticket), "=r"(number)
		: "r"(lock), "m"(lock->curr), "m"(lock->ticket), "2"(number)
		: "memory", "cc");
}
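
/*
 * A plain-C sketch of the same ticket algorithm, assuming GCC's
 * __sync atomic builtins (illustrative only; the inline asm above is
 * the actual implementation):
 */
static inline void fair_lock_c(fairlock_t *lock)
{
	u8 number = __sync_fetch_and_add(&lock->ticket, 1);	/* take a ticket */

	while (lock->curr != number)	/* spin until our number comes up */
		asm volatile("rep; nop" ::: "memory");
}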

/*
 * spin trylock fairly
 */
static inline void fair_trylock(fairlock_t *lock)
{
	u32 tmp;

	__asm__ __volatile__(
		"# beginning fair_trylock\n\t"
		"  movw      (%3),%%ax\n\t"	/* AL=curr, AH=ticket */
		"1:\n\t"
		"  cmpb      %%al,%%ah\n\t"
		"  je        3f\n\t"		/* jump if maybe we can get it */
		"2:\n\t"
		LOCK_SECTION_START("")
		"3:\n\t"
		"  leaw      0x100(%%eax),%w2\n\t" /* [propose] ticket=ticket+1 */
LOCK_PREFIX	"  cmpxchgw  %w2,(%3)\n\t"
		"  je        2b\n\t"		/* jump out if it worked */
		"  rep; nop\n\t"		/* didn't work; AL & AH have been updated */
		"  jmp       1b\n"
		LOCK_SECTION_END
		"# ending fair_trylock\n\t"
		: "=m"(lock->curr), "=m"(lock->ticket), "=r"(tmp)
		: "r"(lock), "m"(lock->curr), "m"(lock->ticket)
		: "memory", "cc", "eax");
}

/*
 * spin unlock fairly
 */
static inline void fair_unlock(fairlock_t *lock)
{
	__asm__ __volatile__(
		"# beginning fair_unlock\n\t"
LOCK_PREFIX	"  incb      %1\n\t"		/* lock->curr++; */
		"# ending fair_unlock\n\t"
		: "=m"(lock->curr)
		: "r"(lock), "m"(lock->curr)
		: "memory", "cc");
}


[-- Attachment #3: Type: text/plain, Size: 4065 bytes --]


/* rwlock.c: fair read/write spinlocks
 *
 * Copyright (c) 2002   David Howells (dhowells@redhat.com).
 */

#include <stdio.h>	/* rwlock_debug() uses printf() */

typedef unsigned char u8;
typedef unsigned int u32;

#define LOCK_PREFIX "lock;"

typedef struct rwlock {

	/* these member variables must be at the beginning and in this order */
	u8	rd_curr;	/* next reader ticket allowed to become active */
	u8	curr;		/* currently active ticket */
	u8	ticket;		/* next ticket to be issued */
	u8	__pad;

} __attribute__((aligned(4))) rwlock_t;

#define RWLOCK_UNLOCKED (rwlock_t) { 0, 0, 0, 0 }

#define rwlock_debug(LOCK,WHERE)							\
do {											\
	printf(WHERE"{%02x,%02x,%02x}\n",LOCK->rd_curr,LOCK->curr,LOCK->ticket);	\
} while(0)

/*
 * obtain a write lock
 * - pull the next ticket from lock->ticket (which is subsequently incremented)
 * - spin until lock->curr catches up to the value that lock->ticket had before the XADD
 * - lock->rd_curr is left equal to the lock->curr (and thus my ticket number) to prevent reads
 *   getting a lock
 */
static inline void write_lock(rwlock_t *lock)
{
	u32 eax;

	rwlock_debug(lock,"-->write_lock");

	asm volatile(
		"	# begin write_lock	\n"
LOCK_PREFIX     "	xaddw	%%ax,1(%3)	\n"	/* my ticket in AH */
		"0:	cmpb	%%al,%%ah	\n"	/* lock->curr in AL */
		"	jne	2f		\n"
		"	.section .text.lock,\"ax\"\n"
		"2:				\n"
		"	rep; nop		\n"
		"	movb	1(%3),%%al	\n"	/* reload AL from lock->curr */
		"	jmp	0b		\n"
		"	.previous		\n"
		"	# end write_lock	\n"
		: "=m"(lock), "=a"(eax)
		: "m"(lock), "r"(lock), "a"(0x0100)
		: "memory", "cc"
		);

	rwlock_debug(lock,"<--write_lock");
}

/*
 * release a write lock
 * - advance both lock->rd_curr and lock->curr by one to enable the next lock to be granted
 */
static inline void write_unlock(rwlock_t *lock)
{
	u32 eax;

	rwlock_debug(lock,"-->write_unlock");

	asm volatile(
		"	# begin write_unlock	\n"
		"	movw	0(%3),%%ax	\n"
		"	incb	%%al		\n"	/* lock->rd_curr++ */
		"	incb	%%ah		\n"	/* lock->curr++ */
		"	movw	%%ax,0(%3)	\n"
		"	# end write_unlock	\n"
		: "=m"(lock), "=&a"(eax)
		: "m"(lock), "r"(lock)
		: "cc"
		);

	rwlock_debug(lock,"<--write_unlock");
}

/*
 * obtain a read lock
 * - pull the next ticket from lock->ticket (which is subsequently incremented)
 * - spin until lock->rd_curr catches up to the value that lock->ticket had before the XADD
 * - lock->rd_curr is then advanced by one to allow the next read lock to be granted
 * - lock->curr is irrelevant
 */
static inline void read_lock(rwlock_t *lock)
{
	u32 eax;

	rwlock_debug(lock,"-->read_lock");

	asm volatile(
		"	# begin read_lock	\n"
LOCK_PREFIX     "	xaddb	%%ah,2(%3)	\n"	/* my ticket in AH */
		"0:	movb	0(%3),%%al	\n"	/* AL = lock->rd_curr */
		"	cmpb	%%al,%%ah	\n"	/* if (ticket!=lock->rd_curr)  */
		"	jne	2f		\n"	/* then jump */
		"	incb	0(%3)		\n"	/* lock->rd_curr */
		"	.section .text.lock,\"ax\"\n"
		"2:				\n"
		"	rep; nop		\n"
		"	jmp	0b		\n"
		"	.previous		\n"
		"	# end read_lock	\n"
		: "=m"(lock), "=a"(eax)
		: "m"(lock), "r"(lock), "a"(0x0100)
		: "memory", "cc"
		);

	rwlock_debug(lock,"<--read_lock");
}

/*
 * release a read lock
 * - just advance the lock->curr count so that any spinning write lock can happen
 */
static inline void read_unlock(rwlock_t *lock)
{
	rwlock_debug(lock,"-->read_unlock");

	asm volatile(
		"	# begin read_unlock	\n"
LOCK_PREFIX	"	incb	%0		\n"	/* lock->curr++ */
		"	# end read_unlock	\n"
		: "=m"(lock->curr)
		: "m"(lock->curr)
		: "cc"
		);

	rwlock_debug(lock,"<--read_unlock");
}
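
/*
 * The four operations restated in plain C (a sketch assuming GCC's
 * __sync builtins; illustrative only, the asm versions above are the
 * real thing):
 */
static inline void write_lock_c(rwlock_t *lock)
{
	u8 t = __sync_fetch_and_add(&lock->ticket, 1);	/* take a ticket */

	while (lock->curr != t)		/* wait for full exclusivity */
		asm volatile("rep; nop" ::: "memory");
}

static inline void write_unlock_c(rwlock_t *lock)
{
	lock->rd_curr++;		/* readmit readers... */
	lock->curr++;			/* ...and writers */
}

static inline void read_lock_c(rwlock_t *lock)
{
	u8 t = __sync_fetch_and_add(&lock->ticket, 1);	/* take a ticket */

	while (lock->rd_curr != t)	/* wait for rd_curr only */
		asm volatile("rep; nop" ::: "memory");
	lock->rd_curr++;		/* admit the next reader at once */
}

static inline void read_unlock_c(rwlock_t *lock)
{
	__sync_fetch_and_add(&lock->curr, 1);	/* let a waiting writer go */
}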

/*****************************************************************************/
/*
 * testing stuff
 */ 
rwlock_t mylock = RWLOCK_UNLOCKED;

#define wibble() asm("nop")

void get_read_lock(void)
{
	wibble();
	read_lock(&mylock);
	read_lock(&mylock);
	read_unlock(&mylock);
	wibble();
	read_lock(&mylock);
	read_unlock(&mylock);
	read_unlock(&mylock);
	wibble();
}

void get_write_lock(void)
{
	wibble();
	write_lock(&mylock);
	wibble();
	write_unlock(&mylock);
	wibble();
}

int main()
{
	get_read_lock();
	get_write_lock();
	get_write_lock();
	get_read_lock();
	return 0;
}


* Re: [Linux-ia64] reader-writer livelock problem
  2002-11-08 17:25     ` Linus Torvalds
  2002-11-08 17:28       ` Linus Torvalds
@ 2002-11-08 17:38       ` Jeremy Fitzhardinge
  2002-11-08 17:43         ` David Howells
                           ` (2 more replies)
  1 sibling, 3 replies; 22+ messages in thread
From: Jeremy Fitzhardinge @ 2002-11-08 17:38 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: William Lee Irwin III, Van Maren, Kevin, linux-ia64,
	Linux Kernel List, rusty, dhowells, mingo

On Fri, 2002-11-08 at 09:25, Linus Torvalds wrote:
> There's another reason for not doing it that way: allowing readers to keep 
> interrupts on even in the presence of interrupt uses of readers.
> 
> If you do the "pending writes stop readers" approach, you get
> 
> 		cpu1			cpu2
> 
> 		read_lock() - get
> 
> 					write_lock_irq() - pending
> 
> 		irq happens
> 		 - read_lock() - deadlock
> 
> and that means that you need to make readers protect against interrupts 
> even if the interrupts only read themselves.

Even without interrupts that would be a bug.  It isn't ever safe to
attempt to retake a read lock if you already hold it, because you may
deadlock with a pending writer.  Fair multi-reader locks aren't
recursive locks.
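
Concretely, no interrupt is needed to hit it:

		cpu1			cpu2

		read_lock() - get

					write_lock() - pending

		read_lock() again
		 - queued behind the pending
		   writer: deadlock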

> NOTE! I'm not saying the existing practice is necessarily a good tradeoff,
> and maybe we should just make sure to find all such cases and turn the
> read_lock() calls into read_lock_irqsave() and then make the rw-locks
> block readers on pending writers. But it's certainly more work and cause
> for subtler problems than just naively changing the rw implementation.

Yes, I'd agree.  It would definitely be a behavioural change with
respect to the legality of retaking a lock for reading, which would
probably be quite irritating to find (since they'd only cause a problem
if they actually coincide with an attempted write lock).

> Actually, giving this some more thought, I really suspect that the
> simplest solution is to add a separate "fair_read_lock()", and paths
> that need to care about fairness (and know they don't have the irq
> issue)  
> can use that, slowly porting users over one by one...

Do you mean have a separate lock type, or have two different read_lock
operations on the current type?

	J



* Re: [Linux-ia64] reader-writer livelock problem
  2002-11-08 17:38       ` Jeremy Fitzhardinge
@ 2002-11-08 17:43         ` David Howells
  2002-11-08 17:57         ` Linus Torvalds
  2002-11-09  2:48         ` Rusty Russell
  2 siblings, 0 replies; 22+ messages in thread
From: David Howells @ 2002-11-08 17:43 UTC (permalink / raw)
  To: Jeremy Fitzhardinge
  Cc: Linus Torvalds, William Lee Irwin III, Van Maren, Kevin,
	linux-ia64, Linux Kernel List, rusty, dhowells, mingo


> Do you mean have a separate lock type, or have two different read_lock
> operations on the current type?

You'd probably be better off with separate types... that way you can avoid
problems with conditionals and also with different tracking fields being
required by each grade of lock.

David


* Re: [Linux-ia64] reader-writer livelock problem
  2002-11-08 17:34     ` [Linux-ia64] reader-writer livelock problem David Howells
@ 2002-11-08 17:54       ` David Howells
  2002-11-08 17:55       ` Stephen Hemminger
  1 sibling, 0 replies; 22+ messages in thread
From: David Howells @ 2002-11-08 17:54 UTC (permalink / raw)
  To: David Howells
  Cc: Jeremy Fitzhardinge, William Lee Irwin III, Van Maren, Kevin,
	linux-ia64, Linux Kernel List, rusty, dhowells, mingo,
	Linus Torvalds


> I've attached an implementation of a fair spinlock and an implementation of a
> fair rwlock (which can be compiled and played with in u-space).

For those that prefer patches (or at least something that'll compile without
warnings)...

David


diff -uNr linux-2.5.46/include/asm-i386/fairlock.h linux-fair-2546/include/asm-i386/fairlock.h
--- linux-2.5.46/include/asm-i386/fairlock.h	1970-01-01 01:00:00.000000000 +0100
+++ linux-fair-2546/include/asm-i386/fairlock.h	2002-11-08 17:51:30.000000000 +0000
@@ -0,0 +1,210 @@
+/* fairlock.h: fair spinlocks
+ *
+ * Copyright (C) 2002 Red Hat, Inc. All Rights Reserved.
+ * Written by David Howells (dhowells@redhat.com)
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */
+
+#ifndef _ASM_FAIRLOCK_H
+#define _ASM_FAIRLOCK_H
+
+#include <linux/config.h>
+#include <linux/string.h>
+#include <asm/atomic.h>
+
+typedef struct fairlock {
+	u8	curr;
+	u8	ticket;
+} fairlock_t;
+
+static inline void init_fairlock(fairlock_t *lock)
+{
+	memset(lock,0,sizeof(*lock));
+}
+
+typedef struct rwfairlock {
+
+	/* these member variables must be at the beginning and in this order */
+	u8	rd_curr;	/* next reader ticket allowed to become active */
+	u8	curr;		/* currently active ticket */
+	u8	ticket;		/* next ticket to be issued */
+	u8	__pad;
+
+} __attribute__((aligned(4))) rwfairlock_t;
+
+#define RWFAIRLOCK_UNLOCKED (rwfairlock_t) { 0, 0, 0, 0 }
+
+/*****************************************************************************/
+/*
+ * spin lock fairly
+ */
+static inline void fair_lock(fairlock_t *lock)
+{
+	u8 number = 1;
+
+	__asm__ __volatile__(
+		"# beginning fair_lock\n\t"
+LOCK_PREFIX	"  xaddb     %b2,%1\n\t"	/* number = lock->ticket; lock->ticket++; */
+		"1:\n\t"
+		"  cmpb	     %b2,%0\n\t"
+		"  jne       2f\n\t"		/* jump if number!=lock->curr */
+		LOCK_SECTION_START("")
+		"2:\n\t"
+		"  rep; nop\n\t"
+		"  jmp       1b\n"
+		LOCK_SECTION_END
+		"# ending fair_lock\n\t"
+		: "=m"(lock->curr), "=m"(lock->ticket), "=r"(number)
+		: "r"(lock), "m"(lock->curr), "m"(lock->ticket), "2"(number)
+		: "memory", "cc");
+}
+
+/*****************************************************************************/
+/*
+ * spin trylock fairly
+ */
+static inline void fair_trylock(fairlock_t *lock)
+{
+	u32 tmp;
+
+	__asm__ __volatile__(
+		"# beginning fair_trylock\n\t"
+		"  movw      (%3),%%ax\n\t"	/* AL=curr, AH=ticket */
+		"1:\n\t"
+		"  cmpb      %%al,%%ah\n\t"
+		"  je        3f\n\t"		/* jump if maybe we can get it */
+		"2:\n\t"
+		LOCK_SECTION_START("")
+		"3:\n\t"
+		"  leaw      0x100(%%eax),%w2\n\t" /* [propose] ticket=ticket+1 */
+LOCK_PREFIX	"  cmpxchgw  %w2,(%3)\n\t"
+		"  je        2b\n\t"		/* jump out if it worked */
+		"  rep; nop\n\t"		/* didn't work; AL & AH have been updated */
+		"  jmp       1b\n"
+		LOCK_SECTION_END
+		"# ending fair_trylock\n\t"
+		: "=m"(lock->curr), "=m"(lock->ticket), "=r"(tmp)
+		: "r"(lock), "m"(lock->curr), "m"(lock->ticket)
+		: "memory", "cc", "eax");
+}
+
+/*****************************************************************************/
+/*
+ * spin unlock fairly
+ */
+static inline void fair_unlock(fairlock_t *lock)
+{
+	__asm__ __volatile__(
+		"# beginning fair_unlock\n\t"
+LOCK_PREFIX	"  incb      %1\n\t"		/* lock->curr++; */
+		"# ending fair_unlock\n\t"
+		: "=m"(lock->curr)
+		: "r"(lock), "m"(lock->curr)
+		: "memory", "cc");
+}
+
+/*****************************************************************************/
+/*
+ * obtain a write lock
+ * - pull the next ticket from lock->ticket (which is subsequently incremented)
+ * - spin until lock->curr catches up to the value that lock->ticket had before the XADD
+ * - lock->rd_curr is left equal to the lock->curr (and thus my ticket number) to prevent reads
+ *   getting a lock
+ */
+static inline void fair_write_lock(rwfairlock_t *lock)
+{
+	u32 eax;
+
+	asm volatile(
+		"	# begin write_lock	\n"
+LOCK_PREFIX     "	xaddw	%%ax,1(%3)	\n"	/* my ticket in AH */
+		"0:	cmpb	%%al,%%ah	\n"	/* lock->curr in AL */
+		"	jne	2f		\n"
+		"	.section .text.lock,\"ax\"\n"
+		"2:				\n"
+		"	rep; nop		\n"
+		"	movb	1(%3),%%al	\n"	/* reload AL from lock->curr */
+		"	jmp	0b		\n"
+		"	.previous		\n"
+		"	# end write_lock	\n"
+		: "=m"(lock), "=a"(eax)
+		: "m"(lock), "r"(lock), "a"(0x0100)
+		: "memory", "cc"
+		);
+}
+
+/*****************************************************************************/
+/*
+ * release a write lock
+ * - advance both lock->rd_curr and lock->curr by one to enable the next lock to be granted
+ */
+static inline void fair_write_unlock(rwfairlock_t *lock)
+{
+	u32 eax;
+
+	asm volatile(
+		"	# begin write_unlock	\n"
+		"	movw	0(%3),%%ax	\n"
+		"	incb	%%al		\n"	/* lock->rd_curr++ */
+		"	incb	%%ah		\n"	/* lock->curr++ */
+		"	movw	%%ax,0(%3)	\n"
+		"	# end write_unlock	\n"
+		: "=m"(lock), "=&a"(eax)
+		: "m"(lock), "r"(lock)
+		: "cc"
+		);
+}
+
+/*****************************************************************************/
+/*
+ * obtain a read lock
+ * - pull the next ticket from lock->ticket (which is subsequently incremented)
+ * - spin until lock->rd_curr catches up to the value that lock->ticket had before the XADD
+ * - lock->rd_curr is then advanced by one to allow the next read lock to be granted
+ * - lock->curr is irrelevant
+ */
+static inline void fair_read_lock(rwfairlock_t *lock)
+{
+	u32 eax;
+
+	asm volatile(
+		"	# begin read_lock	\n"
+LOCK_PREFIX     "	xaddb	%%ah,2(%3)	\n"	/* my ticket in AH */
+		"0:	movb	0(%3),%%al	\n"	/* AL = lock->rd_curr */
+		"	cmpb	%%al,%%ah	\n"	/* if (ticket!=lock->rd_curr)  */
+		"	jne	2f		\n"	/* then jump */
+		"	incb	0(%3)		\n"	/* lock->rd_curr */
+		"	.section .text.lock,\"ax\"\n"
+		"2:				\n"
+		"	rep; nop		\n"
+		"	jmp	0b		\n"
+		"	.previous		\n"
+		"	# end read_lock	\n"
+		: "=m"(lock), "=a"(eax)
+		: "m"(lock), "r"(lock), "a"(0x0100)
+		: "memory", "cc"
+		);
+}
+
+/*****************************************************************************/
+/*
+ * release a read lock
+ * - just advance the lock->curr count so that any spinning write lock can happen
+ */
+static inline void fair_read_unlock(rwfairlock_t *lock)
+{
+	asm volatile(
+		"	# begin read_unlock	\n"
+LOCK_PREFIX	"	incb	%0		\n"	/* lock->curr++ */
+		"	# end read_unlock	\n"
+		: "=m"(lock->curr)
+		: "m"(lock->curr)
+		: "cc"
+		);
+}
+
+#endif /* _ASM_FAIRLOCK_H */
diff -uNr linux-2.5.46/include/asm-i386/spinlock.h linux-fair-2546/include/asm-i386/spinlock.h
--- linux-2.5.46/include/asm-i386/spinlock.h	2002-10-15 10:12:35.000000000 +0100
+++ linux-fair-2546/include/asm-i386/spinlock.h	2002-11-08 17:50:39.000000000 +0000
@@ -4,6 +4,7 @@
 #include <asm/atomic.h>
 #include <asm/rwlock.h>
 #include <asm/page.h>
+#include <asm/fairlock.h>
 #include <linux/config.h>
 
 /*


* Re: [Linux-ia64] reader-writer livelock problem
  2002-11-08 17:34     ` [Linux-ia64] reader-writer livelock problem David Howells
  2002-11-08 17:54       ` David Howells
@ 2002-11-08 17:55       ` Stephen Hemminger
  1 sibling, 0 replies; 22+ messages in thread
From: Stephen Hemminger @ 2002-11-08 17:55 UTC (permalink / raw)
  To: David Howells
  Cc: Jeremy Fitzhardinge, William Lee Irwin III, Van Maren, Kevin,
	linux-ia64, Linux Kernel List, rusty, dhowells, mingo,
	Linus Torvalds

On Fri, 2002-11-08 at 09:34, David Howells wrote:
> 
> > The normal way of solving this fairness problem is to make pending write
> > locks block read lock attempts, so that the reader count is guaranteed
> > to drop to zero as read locks are released.  I haven't looked at the
> > Linux implementation of rwlocks, so I don't know how hard this is to
> > do.  Or perhaps there's some other reason for not implementing it this
> > way?
> 
> Actually, implementing fair spinlocks and fair rwlocks on the x86 arch is
> very easy (at least, it is if you have XADD). Any arch which has CMPXCHG can
> also do it, just not so easily.
> 
> I've attached an implementation of a fair spinlock and an implementation of a
> fair rwlock (which can be compiled and played with in u-space).
> 
> David

There are a selection of similar algorithms here:
	http://www.cs.rochester.edu/u/scott/synchronization/pseudocode/rw.html#s_f

How does yours compare?





* Re: [Linux-ia64] reader-writer livelock problem
  2002-11-08 17:38       ` Jeremy Fitzhardinge
  2002-11-08 17:43         ` David Howells
@ 2002-11-08 17:57         ` Linus Torvalds
  2002-11-09  2:48         ` Rusty Russell
  2 siblings, 0 replies; 22+ messages in thread
From: Linus Torvalds @ 2002-11-08 17:57 UTC (permalink / raw)
  To: linux-kernel

In article <1036777105.13021.13.camel@ixodes.goop.org>,
Jeremy Fitzhardinge  <jeremy@goop.org> wrote:
>
>Even without interrupts that would be a bug.  It isn't ever safe to
>attempt to retake a read lock if you already hold it, because you may
>deadlock with a pending writer.  Fair multi-reader locks aren't
>recursive locks.

.. but I don't think we have any real users who use them for recursion,
so the only "recursion" right now is through interrupts that use this
feature.

(At least that was true a long time ago, maybe we've added truly
recursive users since)

>> Actually, giving this some more thought, I really suspect that the
>> simplest solution is to add a separate "fair_read_lock()", and paths
>> that need to care about fairness (and know they don't have the irq
>> issue)  
>> can use that, slowly porting users over one by one...
>
>Do you mean have a separate lock type, or have two different read_lock
>operations on the current type?

That depends on whether it is even sanely implementable to share the
same lock. It may not be. 

From a migration standpoint it would be easiest (by far) to be able to
share the lock type and to mix operations (ie an interrupt - or
recursive user - could just use the non-fair version, while others could
use the fair version on the same lock).  However, I have this nagging
suspicion that it might be a total nightmare to implement efficiently in
practice. 

I've not looked at it. Any ideas?

		Linus


* Re: [Linux-ia64] reader-writer livelock problem
  2002-11-08 17:38       ` Jeremy Fitzhardinge
  2002-11-08 17:43         ` David Howells
  2002-11-08 17:57         ` Linus Torvalds
@ 2002-11-09  2:48         ` Rusty Russell
  2002-11-09  4:36           ` William Lee Irwin III
       [not found]           ` <3DCFDAE9.6D359448@email.mot.com>
  2 siblings, 2 replies; 22+ messages in thread
From: Rusty Russell @ 2002-11-09  2:48 UTC (permalink / raw)
  To: Jeremy Fitzhardinge
  Cc: William Lee Irwin III, Van Maren, Kevin, linux-ia64,
	Linux Kernel List, rusty, dhowells, mingo, torvalds

In message <1036777105.13021.13.camel@ixodes.goop.org> you write:
> On Fri, 2002-11-08 at 09:25, Linus Torvalds wrote:
> > There's another reason for not doing it that way: allowing readers to keep 
> > interrupts on even in the presence of interrupt uses of readers.
> > 
> > If you do the "pending writes stop readers" approach, you get
> > 
> > 		cpu1			cpu2
> > 
> > 		read_lock() - get
> > 
> > 					write_lock_irq() - pending
> > 
> > 		irq happens
> > 		 - read_lock() - deadlock
> > 
> > and that means that you need to make readers protect against interrupts 
> > even if the interrupts only read themselves.
> 
> Even without interrupts that would be a bug.  It isn't ever safe to
> attempt to retake a read lock if you already hold it, because you may
> deadlock with a pending writer.  Fair multi-reader locks aren't
> recursive locks.

That's the point.  This is explicitly guaranteed with the current
locks, and you are allowed to recurse on them.  The netfilter code
explicitly uses this to retake the net brlock, since it gets called
from different paths.

Implement "read_lock_yield" or "wrlock_t" but don't break existing
semantics until 2.7 *please*!
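
The pattern being relied upon is, schematically (call chain abridged):

	br_read_lock(BR_NETPROTO_LOCK);		/* outer entry path */
	...
		/* re-entered on the same cpu via a different path: */
		br_read_lock(BR_NETPROTO_LOCK);	/* safe only because
						 * readers are preferred */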

Rusty.
--
  Anyone who quotes me in their sig is an idiot. -- Rusty Russell.


* Re: [Linux-ia64] reader-writer livelock problem
  2002-11-09  2:48         ` Rusty Russell
@ 2002-11-09  4:36           ` William Lee Irwin III
       [not found]           ` <3DCFDAE9.6D359448@email.mot.com>
  1 sibling, 0 replies; 22+ messages in thread
From: William Lee Irwin III @ 2002-11-09  4:36 UTC (permalink / raw)
  To: Rusty Russell
  Cc: Jeremy Fitzhardinge, Van Maren, Kevin, linux-ia64,
	Linux Kernel List, dhowells, mingo, torvalds

In message <1036777105.13021.13.camel@ixodes.goop.org> you write:
>> Even without interrupts that would be a bug.  It isn't ever safe to
>> attempt to retake a read lock if you already hold it, because you may
>> deadlock with a pending writer.  Fair multi-reader locks aren't
>> recursive locks.

On Sat, Nov 09, 2002 at 01:48:17PM +1100, Rusty Russell wrote:
> That's the point.  This is explicitly guaranteed with the current
> locks, and you are allowed to recurse on them.  The netfilter code
> explicitly uses this to retake the net brlock, since it gets called
> from different paths.
> Implement "read_lock_yield" or "wrlock_t" but don't break existing
> semantics until 2.7 *please*!

My only interest is doing this specifically for problematic cases,
e.g. the tasklist_lock. Other usages probably shouldn't be altered
during this release cycle unless they too prove problematic, in which
case their usages should be fixed by their maintainers as bugfixes.

Of course, I'm not happy to hear about this explicit recursion.


Bill


* (no subject)
       [not found]           ` <3DCFDAE9.6D359448@email.mot.com>
@ 2002-11-11 19:22             ` David Mosberger
  2002-11-12  1:39               ` your mail Rik van Riel
  0 siblings, 1 reply; 22+ messages in thread
From: David Mosberger @ 2002-11-11 19:22 UTC (permalink / raw)
  To: Mario Smarduch; +Cc: linux-ia64, linux-kernel

>>>>> On Mon, 11 Nov 2002 10:29:29 -0600, Mario Smarduch <cms063@email.mot.com> said:

  Mario> I know that on some commercial Unix systems there are ways to
  Mario> cap the CPU utilization by user/group ids.  Are there such
  Mario> features/patches available on Linux?

There are probably other patches floating around, but Process Resource
Management (PRM) for Linux is/was one approach to do just that:

	http://resourcemanagement.unixsolutions.hp.com/WaRM/prm_linux/

The kernel patches available from this URL are pretty old (up to
2.4.6, as far as I could see), and I'm not sure what the future plans
for PRM on Linux are.  Perhaps someone else can provide more details.

	--david


* Re: your mail
  2002-11-11 19:22             ` David Mosberger
@ 2002-11-12  1:39               ` Rik van Riel
  0 siblings, 0 replies; 22+ messages in thread
From: Rik van Riel @ 2002-11-12  1:39 UTC (permalink / raw)
  To: davidm; +Cc: Mario Smarduch, linux-ia64, linux-kernel

On Mon, 11 Nov 2002, David Mosberger wrote:
> >>>>> On Mon, 11 Nov 2002 10:29:29 -0600, Mario Smarduch <cms063@email.mot.com> said:
>
>   Mario> I know that on some commercial Unix systems there are ways to
>   Mario> cap the CPU utilization by user/group ids.  Are there such
>   Mario> features/patches available on Linux?

> The kernel patches available from this URL are pretty old (up to
> 2.4.6, as far as I could see), and I'm not sure what the future plans
> for PRM on Linux are.  Perhaps someone else can provide more details.

I'm (slowly) working on a per-user fair scheduler on top of Ingo's
O(1) scheduler.  Slowly because it's a fairly complex thing.

Once that is done it should be possible to change the accounting
to other resource containers and generally have fun assigning
priorities, though that is beyond the scope of what I'm trying to
achieve.

cheers,

Rik
-- 
Bravely reimplemented by the knights who say "NIH".
http://www.surriel.com/		http://distro.conectiva.com/
Current spamtrap:  <a href=mailto:"october@surriel.com">october@surriel.com</a>



* Re: [Linux-ia64] reader-writer livelock problem
  2002-11-08 20:17 Van Maren, Kevin
@ 2002-11-08 20:39 ` Matthew Wilcox
  0 siblings, 0 replies; 22+ messages in thread
From: Matthew Wilcox @ 2002-11-08 20:39 UTC (permalink / raw)
  To: Van Maren, Kevin
  Cc: 'Matthew Wilcox', ''Linus Torvalds ' ',
	''Jeremy Fitzhardinge ' ',
	''William Lee Irwin III ' ',
	''linux-ia64@linuxia64.org ' ',
	''Linux Kernel List ' ',
	''rusty@rustcorp.com.au ' ',
	''dhowells@redhat.com ' ',
	''mingo@elte.hu ' '

On Fri, Nov 08, 2002 at 02:17:21PM -0600, Van Maren, Kevin wrote:
> > all that cacheline bouncing can't do your numa boxes any good.
> 
> It happens even on our non-NUMA boxes.  But that was the reason
> behind developing MCS locks: they are designed to minimize the
> cacheline bouncing due to lock contention, and become a win with
> a very small number of processors contending the same spinlock.

that's not my point... a resource occupies a number of cachelines;
bouncing those cachelines between processors is expensive.  if there's
a real workload that all the processors are contending for the same
resource, it's time to split up that resource.

> I was using gettimeofday() as ONE example of the problem.
> Fixing gettimeofday(), such as with frlocks (see, for example,
> http://lwn.net/Articles/7388) fixes ONE occurrence of the
> problem.
> 
> Every reader/writer lock that an application can force
> the kernel to acquire can have this problem.  If there
> is enough time between acquires, it may take 32 or 64
> processors to hang the system, but livelock WILL occur.

not true.  every reader/writer lock which guards a global resource can
have this problem.  if the lock only guards your own task's resources,
there can be no problem. 

> There are MANY other cases where an application can force the
> kernel to acquire a lock needed by other things.

and i agree they should be fixed.

> Spinlocks are a slightly different story.  While there isn't
> the starvation issue, livelock can still occur if the kernel
> needs to acquire the spinlock more often than the time it takes to
> acquire and release it.  This is why replacing the xtime_lock with a spinlock
> fixes the reader/writer livelock, but not the problem: while
> the writer can now get the spinlock, it can take an entire
> clock tick to acquire/release it.  So the net behavior is the
> same: with a 1KHz timer and with 1us cache-cache latency, 32
> processors spinning on gettimeofday() using a spinlock would
> have a similar result.

right.  so spinlocking this resource is also not good enough.  did
you see the "Voyager subarchitecture for 2.5.46" thread earlier this
week which discussed making it per-cpu?
http://www.uwsg.iu.edu/hypermail/linux/kernel/0211.0/1799.html
this seems like the right way to go to me.

-- 
Revolutions do not require corporate support.


* RE: [Linux-ia64] reader-writer livelock problem
@ 2002-11-08 20:24 Van Maren, Kevin
  0 siblings, 0 replies; 22+ messages in thread
From: Van Maren, Kevin @ 2002-11-08 20:24 UTC (permalink / raw)
  To: linux-kernel

Here is the original post sent to the IA64 list.
Sorry if you have gotten this already:


This is a follow-up to the email thread I started on July 29th.
See http://www.cs.helsinki.fi/linux/linux-kernel/2002-30/0446.html
and the following discussion on LKML.

I'll summarize the problem again to refresh the issue.
Again, this is a correctness issue, not a performance one.

I am seeing a problem on medium-sized SMPs where user programs are
able to livelock the Linux kernel to such an extent that the
system appears dead.  With the help of some hardware debugging
tools, I was able to determine that the problem is caused by
the reader-preference reader/writer locks in the Linux kernel.
Under "heavy" reader contention, it is possible for the writer to
_never_ be able to acquire the lock since the read count never
drops to zero (since some reader always loses the cacheline after
acquiring the lock and before it can release it, by which time
another reader has taken the lock).

I have seen this occur with at least two different reader locks,
without _trying_ to cause the problem, and I believe every reader
lock that is acquired by a system call is susceptible to this problem,
if the writer lock needs to be acquired by a processor.

A simple test-case is to run a process on every cpu in a gettimeofday()
loop.  With 16 IA64 processors doing this, the timer code takes longer
to acquire the writer lock than the 1ms between the 1KHz clock ticks,
which causes all the other interrupts directed to the BSP to get queued
up (since the timer is higher priority).  When this occurs, time
"stands still" -- calling gettimeofday() takes 0 time.

My initial suggestion was to make reader/writer locks "fair" by
setting a writer-pending bit.  However, that proposal was not
satisfactory for Linux, which *requires* reader-preference reader
locks: reader locks can be acquired recursively,
either through an explicit call-graph or an interrupt handler,
since interrupts are not disabled while holding a reader lock if
the interrupt handler only acquires a read lock.

There are quite a few options; I would like to hash these issues
out and determine the "best" solution.  The solutions, as I see
them, are:

1) Implement "fair" locks while providing the required semantics.
This would require per-processor storage for the reference count,
so that a processor can acquire the reader lock if it already has
it (to prevent deadlock), and otherwise wait for a pending writer.
Costs: per-processor storage.  Writer has to wait for all the
reader reference counts to hit zero before proceeding (currently
only has to wait for the one "global" reference count).

2) Implement "scalable reader-writer locks".  See Wilson Hsieh's
paper at http://www.psg.lcs.mit.edu/papers/RWlocks.ps
This approach also requires per-processor storage, but solves the
problem in a slightly different (probably better) way.

Note that 1 and 2, in essence, convert every reader/writer lock
into a "big" reader/writer lock (see lib/brlock).

The downside here, of course, is that each lock is allocated
based on the compile-time constant for the maximum number of
processors in the system.  If a 128-byte cacheline is allocated
for every processor, that results in 4KB for each reader/writer
lock with just 32 processors.  While the memory consumption
does not disturb me, it does bother me that a driver compiled
for NR_CPUS=16 would not work with a kernel compiled with
NR_CPUS=32 (if there are > 16 CPUs).

The upside is that for read-mostly locks (> 99%) there is a
performance improvement.  Downside is that writers become more
expensive in the uncontested/lightly contested case.


3) Change the kernel to not allow reader locks to be acquired
recursively.  This change would bring Linux into line with most
other Unix-like operating systems, but requires a change in 
the locking mentality and could be difficult to get right at
this stage.  Long term, it is the most appealing to me,
even though it makes reader locks more expensive because
interrupts have to be disabled more frequently.

4) Change the reader/writer locks to be a recursive spinlock:
have a reference count and an "owner".  Processor can only
acquire the lock if a) reference count is 0, or b) reference
count is > 0 but it is the owner.

This change would only allow a single processor to be in the
critical section, but would allow it to recursively reenter.
This approach would not increase the storage requirements, and
would eliminate parallelism, but would guarantee fairness (and
slow is better than never).  A sketch of this option follows the
list of options below.

5) Combination of 3&4: have "compatible" reader/writer locks
that act as #4, and new reader-writer locks that are fair (#3),
but cannot be acquired recursively.

6) Some heuristic "hack" involving exponential backoff if a
writer is pending.  Not "guaranteed" to work.  If a writer is
pending, wait before trying to acquire the lock.  Problem is
that a cpu that already holds the reader lock will also delay
before acquiring the lock, even though it is the reason the
writer cannot acquire the lock.

The delays to use are going to be highly platform-specific
and also depend on the size of the critical section.


Other heuristics, such as using a per-processor reader-lock
count, fall apart when the contended lock is not the first
one acquired by the processor.

7) ???

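To make option 4 concrete, here is a sketch of such a recursive
spinlock (illustrative only, assuming GCC's compare-and-swap builtin
and a cpu_relax() that expands to rep;nop):

typedef struct {
	volatile int owner;	/* cpu id holding the lock, or -1 */
	int count;		/* recursion depth, owner-private */
} reclock_t;

static inline void rec_lock(reclock_t *l, int cpu)
{
	if (l->owner == cpu) {		/* already ours: just recurse */
		l->count++;
		return;
	}
	while (!__sync_bool_compare_and_swap(&l->owner, -1, cpu))
		cpu_relax();		/* spin until the lock frees up */
	l->count = 1;
}

static inline void rec_unlock(reclock_t *l)
{
	if (--l->count == 0)
		l->owner = -1;	/* needs a write barrier in real life */
}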

I did implement #2 (without the gate MCS lock) on 2.4.x for IA64;
combined with the NMCS patch from IBM, I was unable to "hang"
the system.  For my prototype, I converted the big reader locks
into "normal" reader locks, since the regular locks now provided
the big-reader functionality.  There is still a lot of low-hanging
fruit in the prototype, but I just wanted to get something working.

The performance impact was negligible for a single processor, where
gettimeofday() ran about as fast, while stat() was around 0.2us
(10%) slower.  However, with 16 processors, gettimeofday did
not kill the clock, and stat() performance increased more than 5X,
but the big win was that the execution time was bounded: no
enormous outliers.  Note that with one processor, gettimeofday()
was called around 1200 times between clock ticks, and about the
same on a 16X system (each call took longer on average) with the new
locking code.

Kevin Van Maren


* RE: [Linux-ia64] reader-writer livelock problem
@ 2002-11-08 20:17 Van Maren, Kevin
  2002-11-08 20:39 ` Matthew Wilcox
  0 siblings, 1 reply; 22+ messages in thread
From: Van Maren, Kevin @ 2002-11-08 20:17 UTC (permalink / raw)
  To: 'Matthew Wilcox'
  Cc: ''Linus Torvalds ' ',
	''Jeremy Fitzhardinge ' ',
	''William Lee Irwin III ' ',
	''linux-ia64@linuxia64.org ' ',
	''Linux Kernel List ' ',
	''rusty@rustcorp.com.au ' ',
	''dhowells@redhat.com ' ',
	''mingo@elte.hu ' '

> all that cacheline bouncing can't do your numa boxes any good.

It happens even on our non-NUMA boxes.  But that was the reason
behind developing MCS locks: they are designed to minimize the
cacheline bouncing due to lock contention, and become a win with
a very small number of processors contending the same spinlock.


> i hear x86-64 has a lockless gettimeofday.  maybe that's the solution.

I was using gettimeofday() as ONE example of the problem.
Fixing gettimeofday(), such as with frlocks (see, for example,
http://lwn.net/Articles/7388) fixes ONE occurrence of the
problem.

Every reader/writer lock that an application can force
the kernel to acquire can have this problem.  If there
is enough time between acquires, it may take 32 or 64
processors to hang the system, but livelock WILL occur.

> it's really
> not the kernel's fault that your app is badly written.

There are MANY other cases where an application can force the
kernel to acquire a lock needed by other things.
The point is not whether the *application* is badly written,
but point is whether a bad application can mess up the kernel
by causing a livelock.


Spinlocks are a slightly different story.  While there isn't
the starvation issue, livelock can still occur if the kernel
needs to acquire the spinlock more often than the time it takes to
acquire and release it.  This is why replacing the xtime_lock with a spinlock
fixes the reader/writer livelock, but not the problem: while
the writer can now get the spinlock, it can take an entire
clock tick to acquire/release it.  So the net behavior is the
same: with a 1KHz timer and with 1us cache-cache latency, 32
processors spinning on gettimeofday() using a spinlock would
have a similar result.

Kevin


* Re: [Linux-ia64] reader-writer livelock problem
  2002-11-08 19:19 ` Matthew Wilcox
@ 2002-11-08 19:26   ` David Mosberger
  0 siblings, 0 replies; 22+ messages in thread
From: David Mosberger @ 2002-11-08 19:26 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Van Maren, Kevin, ''Linus Torvalds ' ',
	''Jeremy Fitzhardinge ' ',
	''William Lee Irwin III ' ',
	''linux-ia64@linuxia64.org ' ',
	''Linux Kernel List ' ',
	''rusty@rustcorp.com.au ' ',
	''dhowells@redhat.com ' ',
	''mingo@elte.hu ' '

>>>>> On Fri, 8 Nov 2002 19:19:07 +0000, Matthew Wilcox <willy@debian.org> said:

  Matthew> On Fri, Nov 08, 2002 at 12:05:30PM -0600, Van Maren, Kevin
  Matthew> wrote:
  >> Absolutely you should minimize the locking contention.  However,
  >> that isn't always possible, such as when you have 64 processors
  >> contending on the same resource.

  Matthew> if you've got 64 processors contending on the same
  Matthew> resource, maybe you need to split that resource up so they
  Matthew> can have a copy each.  all that cacheline bouncing can't do
  Matthew> your numa boxes any good.

Matthew, please understand that this is NOT a performance problem.
It's a correctness problem.  If livelock can result from read-write locks,
it's a huge security problem.  Period.

	--david


* Re: [Linux-ia64] reader-writer livelock problem
  2002-11-08 18:05 Van Maren, Kevin
@ 2002-11-08 19:19 ` Matthew Wilcox
  2002-11-08 19:26   ` David Mosberger
  0 siblings, 1 reply; 22+ messages in thread
From: Matthew Wilcox @ 2002-11-08 19:19 UTC (permalink / raw)
  To: Van Maren, Kevin
  Cc: 'Matthew Wilcox ', ''Linus Torvalds ' ',
	''Jeremy Fitzhardinge ' ',
	''William Lee Irwin III ' ',
	''linux-ia64@linuxia64.org ' ',
	''Linux Kernel List ' ',
	''rusty@rustcorp.com.au ' ',
	''dhowells@redhat.com ' ',
	''mingo@elte.hu ' '

On Fri, Nov 08, 2002 at 12:05:30PM -0600, Van Maren, Kevin wrote:
> Absolutely you should minimize the locking contention.
> However, that isn't always possible, such as when you
> have 64 processors contending on the same resource.

if you've got 64 processors contending on the same resource, maybe you
need to split that resource up so they can have a copy each.  all that
cacheline bouncing can't do your numa boxes any good.

> With the current kernel, the trivial example with reader/
> writer locks was having them all call gettimeofday().

i hear x86-64 has a lockless gettimeofday.  maybe that's the solution.

> But try having 64 processors fstat() the same file,
> which I have also seen happen (application looping,
> waiting for another process to finish setting up the
> file so they can all mmap it).

umm.. the call trace:

sys_fstat
|-> vfs_fstat
|   |-> fget
|	|-> read_lock(&files->file_lock)
|   |-> vfs_getattr
|	|-> inode->i_op->getattr
|	|-> generic_fillattr
|-> cp_new_stat64
    |-> memset
    |-> copy_to_user

so you're talking about contention on files->file_lock, right?  it's really
not the kernel's fault that your app is badly written.  that lock's private
to process & children, so it's not like another application can hurt you.

> What MCS locks do is they reduce the number of times
> the cacheline has to be flung around the system in
> order to get work done: they "scale" much better with
> the number of processors: O(N) instead of O(N^2).

yes, but how slow are they in the uncontended case?

-- 
Revolutions do not require corporate support.


* RE: [Linux-ia64] reader-writer livelock problem
@ 2002-11-08 18:05 Van Maren, Kevin
  2002-11-08 19:19 ` Matthew Wilcox
  0 siblings, 1 reply; 22+ messages in thread
From: Van Maren, Kevin @ 2002-11-08 18:05 UTC (permalink / raw)
  To: 'Matthew Wilcox ', Van Maren, Kevin
  Cc: ''Linus Torvalds ' ',
	''Jeremy Fitzhardinge ' ',
	''William Lee Irwin III ' ',
	''linux-ia64@linuxia64.org ' ',
	''Linux Kernel List ' ',
	''rusty@rustcorp.com.au ' ',
	''dhowells@redhat.com ' ',
	''mingo@elte.hu ' '

Absolutely you should minimize the locking contention.
However, that isn't always possible, such as when you
have 64 processors contending on the same resource.
With the current kernel, the trivial example with reader/
writer locks was having them all call gettimeofday().
But try having 64 processors fstat() the same file,
which I have also seen happen (application looping,
waiting for another process to finish setting up the
file so they can all mmap it).

What MCS locks do is they reduce the number of times
the cacheline has to be flung around the system in
order to get work done: they "scale" much better with
the number of processors: O(N) instead of O(N^2).
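
The core of the MCS idea, as a sketch (GCC builtins for brevity,
cpu_relax() for rep;nop; each waiter spins on its own node, so a
hand-off costs O(1) cacheline transfers):

struct mcs_node {
	struct mcs_node *next;
	volatile int locked;
};

static void mcs_lock(struct mcs_node **tail, struct mcs_node *me)
{
	struct mcs_node *prev;

	me->next = NULL;
	me->locked = 1;
	prev = __sync_lock_test_and_set(tail, me);	/* atomic xchg on tail */
	if (prev) {			/* someone is ahead of us */
		prev->next = me;
		while (me->locked)	/* spin on our own cacheline only */
			cpu_relax();
	}
}

static void mcs_unlock(struct mcs_node **tail, struct mcs_node *me)
{
	if (!me->next) {		/* no known successor */
		if (__sync_bool_compare_and_swap(tail, me, NULL))
			return;		/* queue really was empty */
		while (!me->next)	/* successor is mid-enqueue */
			cpu_relax();
	}
	me->next->locked = 0;		/* hand the lock straight over */
}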

Kevin

-----Original Message-----
From: Matthew Wilcox
To: Van Maren, Kevin
Cc: 'Linus Torvalds '; 'Jeremy Fitzhardinge '; 'William Lee Irwin III ';
'linux-ia64@linuxia64.org '; 'Linux Kernel List '; 'rusty@rustcorp.com.au ';
'dhowells@redhat.com '; 'mingo@elte.hu '
Sent: 11/8/02 12:52 PM
Subject: Re: [Linux-ia64] reader-writer livelock problem

On Fri, Nov 08, 2002 at 11:41:57AM -0600, Van Maren, Kevin wrote:
> processor to acquire/release the lock once.  So with 32 processors
> contending for the lock, at 1us per cache-cache transfer (picked

if you have 32 processors contending for the same spinlock, you have
bigger problems.

-- 
Revolutions do not require corporate support.


* Re: [Linux-ia64] reader-writer livelock problem
  2002-11-08 17:41 Van Maren, Kevin
@ 2002-11-08 17:52 ` Matthew Wilcox
  0 siblings, 0 replies; 22+ messages in thread
From: Matthew Wilcox @ 2002-11-08 17:52 UTC (permalink / raw)
  To: Van Maren, Kevin
  Cc: 'Linus Torvalds ', 'Jeremy Fitzhardinge ',
	'William Lee Irwin III ',
	'linux-ia64@linuxia64.org ', 'Linux Kernel List ',
	'rusty@rustcorp.com.au ', 'dhowells@redhat.com ',
	'mingo@elte.hu '

On Fri, Nov 08, 2002 at 11:41:57AM -0600, Van Maren, Kevin wrote:
> processor to acquire/release the lock once.  So with 32 processors
> contending for the lock, at 1us per cache-cache transfer (picked

if you have 32 processors contending for the same spinlock, you have
bigger problems.

-- 
Revolutions do not require corporate support.


* RE: [Linux-ia64] reader-writer livelock problem
@ 2002-11-08 17:41 Van Maren, Kevin
  2002-11-08 17:52 ` Matthew Wilcox
  0 siblings, 1 reply; 22+ messages in thread
From: Van Maren, Kevin @ 2002-11-08 17:41 UTC (permalink / raw)
  To: 'Linus Torvalds ', 'Jeremy Fitzhardinge '
  Cc: 'William Lee Irwin III ',
	Van Maren, Kevin, 'linux-ia64@linuxia64.org ',
	'Linux Kernel List ', 'rusty@rustcorp.com.au ',
	'dhowells@redhat.com ', 'mingo@elte.hu '

Yes, that was one of the options I suggested in the original post
to the IA64 list.  I'll bounce it to LKML for reference, now that
the discussion has moved there.

In my proposal, however, I proposed making (the current) reader
locks recursive spinlocks (to eliminate the starvation problem,
at the expense of eliminating reader parallelism), which would
have the added benefit of "encouraging" people to move to the
fair locks.  But your proposal is probably at least as good.

Of course, normal spinlocks do not scale either, since they
currently require N cache-cache transfers for a processor to
drop the lock, which results in N^2 cache transfers for each
processor to acquire/release the lock once.  So with 32 processors
contending for the lock, at 1us per cache-cache transfer (picked
for easy math, but that is reasonable for large systems), it
takes 1ms for each processor to acquire the spinlock and hold it
for 10 cpu cycles.
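
(Spelling the arithmetic out: each hand-off bounces the cacheline
through up to 32 caches, so one full rotation of 32 waiters is on the
order of 32 * 32 = 1024 transfers * 1us ~= 1ms, even though each
holder keeps the lock for only ~10 cycles.)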

So I'd _also_ like to see an MCS lock implementation replace the current
spinlock code (IBM "NMCS" lock patch can be trivially used to replace
all spinlocks).

Kevin

-----Original Message-----
From: Linus Torvalds
To: Jeremy Fitzhardinge
Cc: William Lee Irwin III; Van Maren, Kevin; linux-ia64@linuxia64.org; Linux
Kernel List; rusty@rustcorp.com.au; dhowells@redhat.com; mingo@elte.hu
Sent: 11/8/02 12:28 PM
Subject: Re: [Linux-ia64] reader-writer livelock problem


On Fri, 8 Nov 2002, Linus Torvalds wrote:
> 
> NOTE! I'm not saying the existing practice is necessarily a good
> tradeoff, and maybe we should just make sure to find all such cases
> and turn the read_lock() calls into read_lock_irqsave() and then make
> the rw-locks block readers on pending writers. But it's certainly more
> work and cause for subtler problems than just naively changing the rw
> implementation.

Actually, giving this some more thought, I really suspect that the
simplest solution is to add a separate "fair_read_lock()", and paths
that need to care about fairness (and know they don't have the irq
issue) can use that, slowly porting users over one by one...

  Linus

