* RE: [Linux-ia64] reader-writer livelock problem
From: Van Maren, Kevin @ 2002-11-08 17:41 UTC
  To: Linus Torvalds, Jeremy Fitzhardinge
  Cc: William Lee Irwin III, Van Maren, Kevin,
	linux-ia64@linuxia64.org, Linux Kernel List,
	rusty@rustcorp.com.au, dhowells@redhat.com, mingo@elte.hu

Yes, that was one of the options I suggested in the original post
to the IA64 list.  I'll bounce it to LKML for reference, now that
the discussion has moved there.

My proposal, however, was to make the (current) reader locks
recursive spinlocks (eliminating the starvation problem at the
expense of reader parallelism), which would have the added
benefit of "encouraging" people to move to the fair locks.  But
your proposal is probably at least as good.

Of course, normal spinlocks do not scale either: each release
currently triggers O(N) cache-to-cache transfers among the
spinning processors, which adds up to O(N^2) transfers for every
processor to acquire and release the lock once.  So with 32
processors contending for the lock, at 1us per cache-to-cache
transfer (picked for easy math, but reasonable for large
systems), it takes on the order of 1ms for all 32 processors to
get through the spinlock, even though each holds it for only
about 10 CPU cycles.
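
To make the math concrete, here is a rough sketch of a naive
test-and-set spinlock (GCC __sync builtins standing in for the
kernel's asm; not the actual Linux code).  Every waiter spins on
the SAME word, so each handoff drags the cacheline through every
spinning CPU:

/*
 * Sketch of a naive test-and-set spinlock.  All N waiters poll
 * the same cacheline, so each handoff costs O(N) cache-to-cache
 * transfers.
 */
typedef struct { volatile int locked; } naive_spinlock_t;

static void naive_spin_lock(naive_spinlock_t *lock)
{
	/* Each atomic attempt drags the line over in exclusive state. */
	while (__sync_lock_test_and_set(&lock->locked, 1))
		while (lock->locked)
			;	/* spin read-only until it looks free */
}

static void naive_spin_unlock(naive_spinlock_t *lock)
{
	__sync_lock_release(&lock->locked);
}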

So I'd _also_ like to see an MCS lock implementation replace the
current spinlock code (IBM's "NMCS" lock patch can trivially
replace all spinlocks).

Kevin

-----Original Message-----
From: Linus Torvalds
To: Jeremy Fitzhardinge
Cc: William Lee Irwin III; Van Maren, Kevin; linux-ia64@linuxia64.org; Linux
Kernel List; rusty@rustcorp.com.au; dhowells@redhat.com; mingo@elte.hu
Sent: 11/8/02 12:28 PM
Subject: Re: [Linux-ia64] reader-writer livelock problem


On Fri, 8 Nov 2002, Linus Torvalds wrote:
> 
> NOTE! I'm not saying the existing practice is necessarily a good
> tradeoff, and maybe we should just make sure to find all such cases
> and turn the read_lock() calls into read_lock_irqsave() and then make
> the rw-locks block readers on pending writers.  But it's certainly
> more work and cause for subtler problems than just naively changing
> the rw implementation.

Actually, giving this some more thought, I really suspect that
the simplest solution is to add a separate "fair_read_lock()",
and paths that need to care about fairness (and know they don't
have the irq issue) can use that, slowly porting users over one
by one...

  Linus

* RE: [Linux-ia64] reader-writer livelock problem
From: Van Maren, Kevin @ 2002-11-08 18:05 UTC
  To: Matthew Wilcox, Van Maren, Kevin
  Cc: Linus Torvalds, Jeremy Fitzhardinge, William Lee Irwin III,
	linux-ia64@linuxia64.org, Linux Kernel List,
	rusty@rustcorp.com.au, dhowells@redhat.com, mingo@elte.hu

Absolutely, you should minimize locking contention.  However,
that isn't always possible, such as when you have 64 processors
contending for the same resource.  With the current kernel, the
trivial example with reader/writer locks was having them all
call gettimeofday().  But try having 64 processors fstat() the
same file, which I have also seen happen (an application looping,
waiting for another process to finish setting up the file so
they can all mmap it).

What MCS locks do is reduce the number of times the cacheline
has to be flung around the system in order to get work done:
they "scale" much better with the number of processors, O(N)
instead of O(N^2).
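
Roughly, the core of an MCS queue lock looks like this (a sketch
with GCC __sync builtins, not the IBM patch itself).  Each CPU
spins on its own queue node, so a release touches only the next
waiter's cacheline:

struct mcs_node {
	struct mcs_node *next;
	volatile int locked;
};

struct mcs_lock {
	struct mcs_node *tail;		/* NULL when the lock is free */
};

static void mcs_acquire(struct mcs_lock *lock, struct mcs_node *me)
{
	struct mcs_node *prev;

	me->next = NULL;
	me->locked = 1;
	/* Atomically append ourselves to the queue of waiters. */
	prev = __sync_lock_test_and_set(&lock->tail, me);
	if (prev) {
		prev->next = me;
		while (me->locked)
			;	/* spin on OUR OWN node, not the lock */
	}
}

static void mcs_release(struct mcs_lock *lock, struct mcs_node *me)
{
	if (!me->next) {
		/* No visible successor: try to mark the lock free. */
		if (__sync_bool_compare_and_swap(&lock->tail, me, NULL))
			return;
		while (!me->next)
			;	/* a successor is mid-enqueue: wait */
	}
	me->next->locked = 0;	/* hand off: ONE cacheline transfer */
}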

Kevin

-----Original Message-----
From: Matthew Wilcox
To: Van Maren, Kevin
Cc: Linus Torvalds; Jeremy Fitzhardinge; William Lee Irwin III;
linux-ia64@linuxia64.org; Linux Kernel List; rusty@rustcorp.com.au;
dhowells@redhat.com; mingo@elte.hu
Sent: 11/8/02 12:52 PM
Subject: Re: [Linux-ia64] reader-writer livelock problem

On Fri, Nov 08, 2002 at 11:41:57AM -0600, Van Maren, Kevin wrote:
> processor to acquire/release the lock once.  So with 32 processors
> contending for the lock, at 1us per cache-cache transfer (picked

if you have 32 processors contending for the same spinlock, you have
bigger problems.

-- 
Revolutions do not require corporate support.

* RE: [Linux-ia64] reader-writer livelock problem
From: Van Maren, Kevin @ 2002-11-08 20:17 UTC
  To: Matthew Wilcox
  Cc: Linus Torvalds, Jeremy Fitzhardinge, William Lee Irwin III,
	linux-ia64@linuxia64.org, Linux Kernel List,
	rusty@rustcorp.com.au, dhowells@redhat.com, mingo@elte.hu

> all that cacheline bouncing can't do your numa boxes any good.

It happens even on our non-NUMA boxes.  But that was the reason
behind developing MCS locks: they are designed to minimize the
cacheline bouncing due to lock contention, and they become a win
with only a small number of processors contending for the same
spinlock.


> i hear x86-64 has a lockless gettimeofday.  maybe that's the solution.

I was using gettimeofday() as ONE example of the problem.
Fixing gettimeofday(), such as with frlocks (see, for example,
http://lwn.net/Articles/7388), fixes ONE occurrence of the
problem.
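
For reference, the frlock idea is roughly the following sketch
(assuming a single writer, such as the timer tick, with
__sync_synchronize() standing in for the real memory barriers).
Readers never write the shared state, so they cannot starve the
writer:

#include <sys/time.h>

struct fr_time {
	volatile unsigned seq;	/* odd while a write is in progress */
	struct timeval tv;	/* the protected data */
};

static void fr_write(struct fr_time *t, const struct timeval *new)
{
	t->seq++;			/* now odd: update in progress */
	__sync_synchronize();
	t->tv = *new;
	__sync_synchronize();
	t->seq++;			/* even again: update complete */
}

static void fr_read(struct fr_time *t, struct timeval *out)
{
	unsigned seq;

	do {
		while ((seq = t->seq) & 1)
			;		/* writer active: wait it out */
		__sync_synchronize();
		*out = t->tv;
		__sync_synchronize();
	} while (t->seq != seq);	/* raced with a writer: retry */
}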

Every reader/writer lock that an application can force
the kernel to acquire can have this problem.  If there
is enough time between acquires, it may take 32 or 64
processors to hang the system, but livelock WILL occur.

> it's really
> not the kernel's fault that your app is badly written.

There are MANY other cases where an application can force the
kernel to acquire a lock needed by other things.  The point is
not whether the *application* is badly written, but whether a
bad application can mess up the kernel by causing a livelock.


Spinlocks are a slightly different story.  While there isn't
the starvation issue, livelock can still occur if the kernel
needs to acquire the spinlock more often than the time it takes
to acquire and release it.  This is why replacing the xtime_lock
with a spinlock fixes the reader/writer livelock, but not the
underlying problem: while the writer can now get the spinlock,
it can still take an entire clock tick to acquire and release
it.  So the net behavior is the same: with a 1KHz timer and 1us
cache-to-cache latency, 32 processors spinning on gettimeofday()
using a spinlock would produce a similar result.

Kevin

* RE: [Linux-ia64] reader-writer livelock problem
From: Van Maren, Kevin @ 2002-11-08 20:24 UTC
  To: linux-kernel

Here is the original post sent to the IA64 list.
Sorry if you have gotten this already:


This is a follow-up to the email thread I started on July 29th.
See http://www.cs.helsinki.fi/linux/linux-kernel/2002-30/0446.html
and the following discussion on LKML.

I'll summarize the problem again to refresh the issue.
Again, this is a correctness issue, not a performance one.

I am seeing a problem on medium-sized SMPs where user programs are
able to livelock the Linux kernel to such an extent that the
system appears dead.  With the help of some hardware debugging
tools, I was able to determine that the problem is caused by
the reader-preference reader/writer locks in the Linux kernel.
Under "heavy" reader contention, it is possible for the writer to
_never_ be able to acquire the lock since the read count never
drops to zero (since some reader always loses the cacheline after
acquiring the lock, and before it can release the lock, by which
time another reader has the lock).
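
In rough pseudo-C (a sketch, not the actual Linux
implementation), a reader-preference lock behaves like this; note
that nothing prevents a new reader from entering while a writer
spins waiting for the count to drain:

typedef struct { volatile int count; } rw_sketch_t;
	/* count > 0: readers hold it; count == -1: writer holds it */

static void read_lock_sketch(rw_sketch_t *rw)
{
	for (;;) {
		int c = rw->count;
		/* Readers only back off for an ACTIVE writer, never
		 * a waiting one -- that is the reader preference. */
		if (c >= 0 &&
		    __sync_bool_compare_and_swap(&rw->count, c, c + 1))
			return;
	}
}

static void read_unlock_sketch(rw_sketch_t *rw)
{
	__sync_fetch_and_sub(&rw->count, 1);
}

static void write_lock_sketch(rw_sketch_t *rw)
{
	/* Only succeeds in the instant the count is exactly 0; with
	 * enough CPUs cycling read_lock/read_unlock, that instant
	 * never comes -- the livelock described above. */
	while (!__sync_bool_compare_and_swap(&rw->count, 0, -1))
		;
}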

I have seen this occur with at least two different reader locks,
without _trying_ to cause the problem, and I believe every reader
lock that is acquired by a system call is susceptible to this problem,
if the writer lock needs to be acquired by a processor.

A simple test-case is to run a process on every cpu in a gettimeofday()
loop.  With 16 IA64 processors doing this, the timer code takes longer
to acquire the writer lock than the 1ms between the 1KHz clock ticks,
which causes all the other interrupts directed to the BSP to get queued
up (since the timer is higher priority).  When this occurs, time
"stands still" -- calling gettimeofday() takes 0 time.

My initial suggestion was to make reader/writer locks "fair" by
setting a writer-pending bit.  However, that proposal was not
satisfactory for Linux, which *requires* reader-preference locks:
read locks can be acquired recursively, either through an
explicit call graph or from an interrupt handler, since
interrupts are not disabled while holding a read lock when the
interrupt handler only acquires the read side.
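
To illustrate the constraint (a hypothetical interleaving, not an
actual trace):

	CPU 0: read_lock(&rw);		/* outer read lock, succeeds */
	CPU 1: write_lock(&rw);		/* sets writer-pending, spins */
	CPU 0: <interrupt arrives>
	CPU 0:   read_lock(&rw);	/* a naively "fair" lock would
					   block behind the pending
					   writer, while the writer
					   waits for CPU 0's outer read
					   lock to drain: deadlock */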

There are quite a few options; I would like to hash these issues
out and determine the "best" solution.  The solutions, as I see
them, are:

1) Implement "fair" locks while providing the required semantics.
This would require per-processor storage for the reference count,
so that a processor can acquire the reader lock if it already
holds it (to prevent deadlock), and otherwise wait for a pending
writer.  Costs: per-processor storage, and the writer has to wait
for all the per-processor reference counts to hit zero before
proceeding (currently it only has to wait for the one "global"
reference count).
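
A rough sketch of what this might look like (hypothetical names;
NR_CPUS, the explicit cpu argument, and GCC builtins are
stand-ins, and interrupt handling is ignored):

#define NR_CPUS 32	/* illustrative compile-time maximum */

struct fair_rwlock {
	/* One counter per CPU, padded to its own 128-byte cacheline. */
	struct { volatile int count; char pad[124]; } reader[NR_CPUS];
	volatile int writer_pending;
};

static void fair_read_lock(struct fair_rwlock *rw, int cpu)
{
	/* Recursive acquire: this CPU already holds the read lock,
	 * so just bump its private count (the deadlock-avoidance case). */
	if (rw->reader[cpu].count > 0) {
		rw->reader[cpu].count++;
		return;
	}
	for (;;) {
		while (rw->writer_pending)
			;			/* defer to the writer */
		rw->reader[cpu].count = 1;
		__sync_synchronize();
		if (!rw->writer_pending)
			return;			/* no writer raced us */
		rw->reader[cpu].count = 0;	/* lost the race: retry */
	}
}

static void fair_read_unlock(struct fair_rwlock *rw, int cpu)
{
	rw->reader[cpu].count--;
}

static void fair_write_lock(struct fair_rwlock *rw)
{
	int i;

	while (__sync_lock_test_and_set(&rw->writer_pending, 1))
		;				/* serialize writers */
	/* The added writer cost: wait for EVERY per-CPU count to drain. */
	for (i = 0; i < NR_CPUS; i++)
		while (rw->reader[i].count)
			;
}

static void fair_write_unlock(struct fair_rwlock *rw)
{
	__sync_lock_release(&rw->writer_pending);
}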

2) Implement "scalable reader-writer locks".  See Wilson Hsieh's
paper at http://www.psg.lcs.mit.edu/papers/RWlocks.ps
This approach also requires per-processor storage, but solves the
problem in a slightly different (probably better) way.

Note that 1 and 2, in essence, convert every reader/writer lock
into a "big" reader/writer lock (see lib/brlock).

The downside here, of course, is that each lock is allocated
based on the compile-time constant for the maximum number of
processors in the system.  If a 128-byte cacheline is allocated
for every processor, that results in 4KB for each reader/writer
lock with just 32 processors.  While the memory consumption
does not disturb me, it does bother me that a driver compiled
for NR_CPUS=16 would not work with a kernel compiled with
NR_CPUS=32 (if there are > 16 CPUs).

The upside is that for read-mostly locks (> 99%) there is a
performance improvement.  Downside is that writers become more
expensive in the uncontested/lightly contested case.


3) Change the kernel to not allow reader locks to be acquired
recursively.  This change would bring Linux into line with most
other Unix-like operating systems, but requires a change in 
the locking mentality and could be difficult to get right at
this stage.  Long term, it is the most appealing to me,
even though it makes reader locks more expensive because
interrupts have to be disabled more frequently.

4) Change each reader/writer lock into a recursive spinlock:
have a reference count and an "owner".  A processor can only
acquire the lock if a) the reference count is 0, or b) the
reference count is > 0 and it is the owner.

This change would only allow a single processor to be in the
critical section, but would allow it to recursively reenter.
This approach would not increase the storage requirements, and
would eliminate parallelism, but would guarantee fairness (and
slow is better than never).
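
A minimal sketch (same caveats as above, with the CPU number
passed explicitly):

struct rspinlock {
	volatile int owner;	/* CPU number holding the lock, -1 if free */
	int depth;		/* recursion depth, owner's private state */
};

static void rspin_lock(struct rspinlock *l, int cpu)
{
	if (l->owner == cpu) {		/* recursive re-entry: just count */
		l->depth++;
		return;
	}
	while (!__sync_bool_compare_and_swap(&l->owner, -1, cpu))
		;			/* spin until the lock is free */
	l->depth = 1;
}

static void rspin_unlock(struct rspinlock *l)
{
	if (--l->depth == 0) {
		__sync_synchronize();
		l->owner = -1;		/* fully released */
	}
}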

5) Combination of 3 & 4: have "compatible" reader/writer locks
that act as #4, and new reader/writer locks that are fair (#3)
but cannot be acquired recursively.

6) Some heuristic "hack" involving exponential backoff when a
writer is pending.  Not "guaranteed" to work.  If a writer is
pending, wait before trying to acquire the lock.  The problem is
that a CPU that already holds the reader lock will also delay
before re-acquiring it, even though it is the reason the writer
cannot make progress.

The delays to use are going to be highly platform-specific
and also depend on the size of the critical section.
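
A sketch of the backoff idea (extending the rw_sketch_t above
with a writer-pending flag; the delay constants are placeholders,
which is exactly the tuning problem):

typedef struct {
	volatile int count;		/* as in the sketch above */
	volatile int writer_pending;	/* set by a waiting writer */
} rw_backoff_t;

static void read_lock_backoff(rw_backoff_t *rw)
{
	unsigned int delay = 1;
	int c;

	for (;;) {
		/* Note: a CPU that already holds the read lock delays
		 * here too -- the flaw described above. */
		while (rw->writer_pending) {
			unsigned int i;

			for (i = 0; i < delay; i++)
				;		/* platform-specific pause */
			if (delay < 4096)	/* arbitrary cap */
				delay <<= 1;	/* exponential backoff */
		}
		c = rw->count;
		if (c >= 0 &&
		    __sync_bool_compare_and_swap(&rw->count, c, c + 1))
			return;
	}
}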


Other heuristics, such as using a per-processor reader-lock
count, fall apart when the contended lock is not the first
one acquired by the processor.

7) ???


I did implement #2 (without the gate MCS lock) on 2.4.x for IA64;
combined with the NMCS patch from IBM, I was unable to "hang"
the system.  For my prototype, I converted the big reader locks
into "normal" reader locks, since the regular locks now provided
the big-reader functionality.  There is still a lot of low-hanging
fruit in the prototype, but I just wanted to get something working.

The performance impact was negligible for a single processor:
gettimeofday() ran about as fast, while stat() was around 0.2us
(10%) slower.  However, with 16 processors, gettimeofday() did
not kill the clock, and stat() performance increased more than
5X; the big win was that execution time was bounded, with no
enormous outliers.  Note that with one processor, gettimeofday()
was called around 1200 times between clock ticks, and about the
same in aggregate with 16 processors (each call took longer on
average) with the new locking code.

Kevin Van Maren

