* [PATCH 0/4][RT] futex: fix tasks blocking on two rt_mutex locks
@ 2010-07-09 22:32 Darren Hart
  2010-07-09 22:32 ` [PATCH 1/4] rtmutex: avoid null derefence in WARN_ON Darren Hart
  ` (3 more replies)
  0 siblings, 4 replies; 28+ messages in thread
From: Darren Hart @ 2010-07-09 22:32 UTC (permalink / raw)
To: linux-kernel
Cc: Thomas Gleixner, Peter Zijlstra, Ingo Molnar, Eric Dumazet,
    John Kacur, Steven Rostedt, Mike Galbraith, linux-rt-users

This patch-set addresses the following problem reported by Mike Galbraith:

> Stress testing, looking to trigger RCU stalls, I've managed to find a
> way to repeatably create fireworks. (got RCU stall, see attached)
>
> 1. download ltp-full-20100630. Needs to be this version because of
>    testcase bustage in earlier versions, and must be built with gcc 4.3,
>    else testcases will segfault due to a gcc bug.
>
> 2. apply patchlet so you can run testcases/realtime/perf/latency/run.sh
>    at all.
>
> --- pthread_cond_many.c.org	2010-07-05 09:05:59.000000000 +0200
> +++ pthread_cond_many.c	2010-07-04 12:12:25.000000000 +0200
> @@ -259,7 +259,7 @@ void usage(void)
>
>  int parse_args(int c, char *v)
>  {
> -	int handled;
> +	int handled = 1;
>  	switch (c) {
>  	case 'h':
>  		usage();
>
> 3. add --realtime for no particular reason.
>
> --- run.sh.org	2010-07-06 15:54:58.000000000 +0200
> +++ run.sh	2010-07-06 16:37:34.000000000 +0200
> @@ -22,7 +22,7 @@ make
>  # process to run realtime. The remainder of the processes (if any)
>  # will run non-realtime in any case.
>
> -nthread=5000
> +nthread=500
>  iter=400
>  nproc=5
>
> @@ -39,7 +39,7 @@ i=0
>  i=1
>  while test $i -lt $nproc
>  do
> -	./pthread_cond_many --broadcast -i $iter -n $nthread > $nthread.$iter.$nproc.$i.out &
> +	./pthread_cond_many --realtime --broadcast -i $iter -n $nthread > $nthread.$iter.$nproc.$i.out &
>  	i=`expr $i + 1`
>  done
>  wait

The LTP wreckage will be addressed separately <ugh>

> 4. run it.
>
> What happens here is we hit WARN_ON(pendowner->pi_blocked_on != waiter),
> this does not make it to consoles (poking sysrq-foo doesn't either).
> Next comes WARN_ON(!pendowner->pi_blocked_on), followed by the NULL
> explosion, which does make it to consoles.
>
> With explosion avoidance, I also see pendowner->pi_blocked_on->task ==
> NULL at times, but that, as !pendowner->pi_blocked_on, seems to be
> fallout. The start of bad juju is always pi_blocked_on != waiter.
>
> [  141.609268] BUG: unable to handle kernel NULL pointer dereference at 0000000000000058
> [  141.609268] IP: [<ffffffff8106856d>] wakeup_next_waiter+0x12c/0x177
> [  141.609268] PGD 20e174067 PUD 20e253067 PMD 0
> [  141.609268] Oops: 0000 [#1] PREEMPT SMP
> [  141.609268] last sysfs file: /sys/devices/system/cpu/cpu3/cache/index2/shared_cpu_map
> [  141.609268] CPU 0
> [  141.609268] Pid: 8154, comm: pthread_cond_ma Tainted: G        W  2.6.33.6-rt23 #12 MS-7502/MS-7502
> [  141.609268] RIP: 0010:[<ffffffff8106856d>]  [<ffffffff8106856d>] wakeup_next_waiter+0x12c/0x177
> [  141.609268] RSP: 0018:ffff88020e3cdd78  EFLAGS: 00010097
> [  141.609268] RAX: 0000000000000000 RBX: ffff8801e8eba5c0 RCX: 0000000000000000
> [  141.609268] RDX: ffff880028200000 RSI: 0000000000000046 RDI: 0000000000000009
> [  141.609268] RBP: ffff88020e3cdda8 R08: 0000000000000002 R09: 0000000000000000
> [  141.609268] R10: 0000000000000005 R11: 0000000000000000 R12: ffffffff81659068
> [  141.609268] R13: ffff8801e8ebdb58 R14: 0000000000000000 R15: ffff8801e8ebac08
> [  141.609268] FS:  00007f664d539700(0000) GS:ffff880028200000(0000) knlGS:0000000000000000
> [  141.609268] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [  141.609268] CR2: 0000000000000058 CR3: 0000000214266000 CR4: 00000000000006f0
> [  141.609268] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [  141.609268] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> [  141.609268] Process pthread_cond_ma (pid: 8154, threadinfo ffff88020e3cc000, task ffff88020e2a4700)
> [  141.609268] Stack:
> [  141.609268]  0000000000000000 ffffffff81659068 0000000000000202 0000000000000000
> [  141.609268] <0> 0000000000000000 0000000080001fda ffff88020e3cddc8 ffffffff812fec48
> [  141.609268] <0> ffffffff81659068 0000000000606300 ffff88020e3cddd8 ffffffff812ff1b9
> [  141.609268] Call Trace:
> [  141.609268]  [<ffffffff812fec48>] rt_spin_lock_slowunlock+0x43/0x61
> [  141.609268]  [<ffffffff812ff1b9>] rt_spin_unlock+0x46/0x48
> [  141.609268]  [<ffffffff81067d7f>] do_futex+0x83c/0x935
> [  141.609268]  [<ffffffff810c26ce>] ? handle_mm_fault+0x6de/0x6f1
> [  141.609268]  [<ffffffff81067e36>] ? do_futex+0x8f3/0x935
> [  141.609268]  [<ffffffff81067fba>] sys_futex+0x142/0x154
> [  141.609268]  [<ffffffff81020eb0>] ? do_page_fault+0x23e/0x28e
> [  141.609268]  [<ffffffff81004aa7>] ? math_state_restore+0x3d/0x3f
> [  141.609268]  [<ffffffff81004b08>] ? do_device_not_available+0xe/0x12
> [  141.609268]  [<ffffffff81002c5b>] system_call_fastpath+0x16/0x1b
> [  141.609268] Code: c7 09 6d 41 81 e8 ac 34 fd ff 4c 39 ab 70 06 00 00 74 11 be 47 02 00 00 48 c7 c7 09 6d 41 81 e8 92 34 fd ff 48 8b 83 70 06 00 00 <4c> 39 60 58 74 11 be 48 02 00 00 48 c7 c7 09 6d 41 81 e8 74 34
> [  141.609268] RIP  [<ffffffff8106856d>] wakeup_next_waiter+0x12c/0x177
> [  141.609268]  RSP <ffff88020e3cdd78>
> [  141.609268] CR2: 0000000000000058
> [  141.609268] ---[ end trace 58805b944e6f93ce ]---
> [  141.609268] note: pthread_cond_ma[8154] exited with preempt_count 2

^ permalink raw reply	[flat|nested] 28+ messages in thread
* [PATCH 1/4] rtmutex: avoid null derefence in WARN_ON
  2010-07-09 22:32 [PATCH 0/4][RT] futex: fix tasks blocking on two rt_mutex locks Darren Hart
@ 2010-07-09 22:32 ` Darren Hart
  2010-07-10  0:29   ` Steven Rostedt
  2010-07-09 22:32 ` [PATCH 2/4] rtmutex: add BUG_ON if a task attempts to block on two locks Darren Hart
  ` (2 subsequent siblings)
  3 siblings, 1 reply; 28+ messages in thread
From: Darren Hart @ 2010-07-09 22:32 UTC (permalink / raw)
To: linux-kernel
Cc: Thomas Gleixner, Peter Zijlstra, Ingo Molnar, Eric Dumazet,
    John Kacur, Steven Rostedt, Mike Galbraith, linux-rt-users,
    Darren Hart

If the pi_blocked_on variable is NULL, the subsequent WARN_ONs
will cause an OOPS. Only perform the subsequent checks if
pi_blocked_on is valid.

Signed-off-by: Darren Hart <dvhltc@us.ibm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: John Kacur <jkacur@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Mike Galbraith <efault@gmx.de>
---
 kernel/rtmutex.c |    7 ++++---
 1 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/kernel/rtmutex.c b/kernel/rtmutex.c
index 23dd443..baac7d9 100644
--- a/kernel/rtmutex.c
+++ b/kernel/rtmutex.c
@@ -579,9 +579,10 @@ static void wakeup_next_waiter(struct rt_mutex *lock, int savestate)
 
 	raw_spin_lock(&pendowner->pi_lock);
 
-	WARN_ON(!pendowner->pi_blocked_on);
-	WARN_ON(pendowner->pi_blocked_on != waiter);
-	WARN_ON(pendowner->pi_blocked_on->lock != lock);
+	if (!WARN_ON(!pendowner->pi_blocked_on)) {
+		WARN_ON(pendowner->pi_blocked_on != waiter);
+		WARN_ON(pendowner->pi_blocked_on->lock != lock);
+	}
 
 	pendowner->pi_blocked_on = NULL;
-- 
1.7.0.4

^ permalink raw reply related	[flat|nested] 28+ messages in thread
* Re: [PATCH 1/4] rtmutex: avoid null derefence in WARN_ON 2010-07-09 22:32 ` [PATCH 1/4] rtmutex: avoid null derefence in WARN_ON Darren Hart @ 2010-07-10 0:29 ` Steven Rostedt 2010-07-10 14:42 ` Darren Hart 0 siblings, 1 reply; 28+ messages in thread From: Steven Rostedt @ 2010-07-10 0:29 UTC (permalink / raw) To: Darren Hart Cc: linux-kernel, Thomas Gleixner, Peter Zijlstra, Ingo Molnar, Eric Dumazet, John Kacur, Mike Galbraith, linux-rt-users On Fri, 2010-07-09 at 15:32 -0700, Darren Hart wrote: > If the pi_blocked_on variable is NULL, the subsequent WARN_ON's > will cause an OOPS. Only perform the susequent checks if > pi_blocked_on is valid. > > Signed-off-by: Darren Hart <dvhltc@us.ibm.com> > Cc: Thomas Gleixner <tglx@linutronix.de> > Cc: Peter Zijlstra <peterz@infradead.org> > Cc: Ingo Molnar <mingo@elte.hu> > Cc: Eric Dumazet <eric.dumazet@gmail.com> > Cc: John Kacur <jkacur@redhat.com> > Cc: Steven Rostedt <rostedt@goodmis.org> > Cc: Mike Galbraith <efault@gmx.de> > --- > kernel/rtmutex.c | 7 ++++--- > 1 files changed, 4 insertions(+), 3 deletions(-) > > diff --git a/kernel/rtmutex.c b/kernel/rtmutex.c > index 23dd443..baac7d9 100644 > --- a/kernel/rtmutex.c > +++ b/kernel/rtmutex.c > @@ -579,9 +579,10 @@ static void wakeup_next_waiter(struct rt_mutex *lock, int savestate) > > raw_spin_lock(&pendowner->pi_lock); > > - WARN_ON(!pendowner->pi_blocked_on); > - WARN_ON(pendowner->pi_blocked_on != waiter); > - WARN_ON(pendowner->pi_blocked_on->lock != lock); > + if (!WARN_ON(!pendowner->pi_blocked_on)) { > + WARN_ON(pendowner->pi_blocked_on != waiter); The above actually has no issue if the pi_blocked_on is NULL. The below, well yeah. -- Steve > + WARN_ON(pendowner->pi_blocked_on->lock != lock); > + } > > pendowner->pi_blocked_on = NULL; > ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH 1/4] rtmutex: avoid null derefence in WARN_ON 2010-07-10 0:29 ` Steven Rostedt @ 2010-07-10 14:42 ` Darren Hart 0 siblings, 0 replies; 28+ messages in thread From: Darren Hart @ 2010-07-10 14:42 UTC (permalink / raw) To: rostedt Cc: linux-kernel, Thomas Gleixner, Peter Zijlstra, Ingo Molnar, Eric Dumazet, John Kacur, Mike Galbraith, linux-rt-users On 07/09/2010 05:29 PM, Steven Rostedt wrote: > On Fri, 2010-07-09 at 15:32 -0700, Darren Hart wrote: >> If the pi_blocked_on variable is NULL, the subsequent WARN_ON's >> will cause an OOPS. Only perform the susequent checks if >> pi_blocked_on is valid. >> >> Signed-off-by: Darren Hart<dvhltc@us.ibm.com> >> Cc: Thomas Gleixner<tglx@linutronix.de> >> Cc: Peter Zijlstra<peterz@infradead.org> >> Cc: Ingo Molnar<mingo@elte.hu> >> Cc: Eric Dumazet<eric.dumazet@gmail.com> >> Cc: John Kacur<jkacur@redhat.com> >> Cc: Steven Rostedt<rostedt@goodmis.org> >> Cc: Mike Galbraith<efault@gmx.de> >> --- >> kernel/rtmutex.c | 7 ++++--- >> 1 files changed, 4 insertions(+), 3 deletions(-) >> >> diff --git a/kernel/rtmutex.c b/kernel/rtmutex.c >> index 23dd443..baac7d9 100644 >> --- a/kernel/rtmutex.c >> +++ b/kernel/rtmutex.c >> @@ -579,9 +579,10 @@ static void wakeup_next_waiter(struct rt_mutex *lock, int savestate) >> >> raw_spin_lock(&pendowner->pi_lock); >> >> - WARN_ON(!pendowner->pi_blocked_on); >> - WARN_ON(pendowner->pi_blocked_on != waiter); >> - WARN_ON(pendowner->pi_blocked_on->lock != lock); >> + if (!WARN_ON(!pendowner->pi_blocked_on)) { >> + WARN_ON(pendowner->pi_blocked_on != waiter); > > The above actually has no issue if the pi_blocked_on is NULL. It doesn't, but it's also redundant and makes the console noisier for no reason. Seemed worth while to drop it under the if in the same go. -- Darren > The below, well yeah. 
> > -- Steve > >> + WARN_ON(pendowner->pi_blocked_on->lock != lock); >> + } >> >> pendowner->pi_blocked_on = NULL; >> > > -- Darren Hart IBM Linux Technology Center Real-Time Linux Team ^ permalink raw reply [flat|nested] 28+ messages in thread
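The nesting in the patch works because the kernel's WARN_ON(cond) both emits a warning and evaluates to the truth of cond, so `if (!WARN_ON(!p))` warns on a NULL pointer and simultaneously guards the dereferencing checks. A minimal userspace sketch of that semantics — the WARN_ON macro here is a stand-in using a GCC statement expression, not the kernel's, and the struct layouts are invented for illustration:

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Userspace stand-in for the kernel's WARN_ON(): print a warning when
 * cond is true, and evaluate to cond so it can guard an if statement. */
#define WARN_ON(cond) ({                                \
	int __warned = !!(cond);                        \
	if (__warned)                                   \
		fprintf(stderr, "WARNING at %s:%d\n",   \
			__FILE__, __LINE__);            \
	__warned;                                       \
})

struct waiter { void *lock; };
struct task   { struct waiter *pi_blocked_on; };

/* Mirrors the shape of the patched wakeup_next_waiter() checks.
 * Returns 1 if the dereferencing checks ran, 0 if pi_blocked_on was
 * NULL (warned, but never dereferenced). */
static int check_pendowner(struct task *pendowner, struct waiter *w, void *lock)
{
	if (!WARN_ON(!pendowner->pi_blocked_on)) {
		/* Safe: pi_blocked_on is known non-NULL in this branch. */
		WARN_ON(pendowner->pi_blocked_on != w);
		WARN_ON(pendowner->pi_blocked_on->lock != lock);
		return 1;
	}
	return 0;
}
```

With the unpatched code, the third check would dereference a NULL pi_blocked_on; with the nesting, the NULL case only produces the first warning.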
* [PATCH 2/4] rtmutex: add BUG_ON if a task attempts to block on two locks 2010-07-09 22:32 [PATCH 0/4][RT] futex: fix tasks blocking on two rt_mutex locks Darren Hart 2010-07-09 22:32 ` [PATCH 1/4] rtmutex: avoid null derefence in WARN_ON Darren Hart @ 2010-07-09 22:32 ` Darren Hart 2010-07-10 0:30 ` Steven Rostedt 2010-07-09 22:32 ` [PATCH 3/4] futex: free_pi_state outside of hb->lock sections Darren Hart 2010-07-09 22:33 ` [PATCH 4/4] futex: convert hash_bucket locks to raw_spinlock_t Darren Hart 3 siblings, 1 reply; 28+ messages in thread From: Darren Hart @ 2010-07-09 22:32 UTC (permalink / raw) To: linux-kernel Cc: Thomas Gleixner, Peter Zijlstra, Ingo Molnar, Eric Dumazet, John Kacur, Steven Rostedt, Mike Galbraith, linux-rt-users, Darren Hart rtmutex proxy locking complicates the logic a bit and opens up the possibility for a task to wake and attempt to take another sleeping lock without knowing it has been enqueued on another lock already. Add a BUG_ON to catch this scenario early. Signed-off-by: Darren Hart <dvhltc@us.ibm.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ingo Molnar <mingo@elte.hu> Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: John Kacur <jkacur@redhat.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Mike Galbraith <efault@gmx.de> --- kernel/rtmutex.c | 3 +++ 1 files changed, 3 insertions(+), 0 deletions(-) diff --git a/kernel/rtmutex.c b/kernel/rtmutex.c index baac7d9..22f9d18 100644 --- a/kernel/rtmutex.c +++ b/kernel/rtmutex.c @@ -459,6 +459,9 @@ static int task_blocks_on_rt_mutex(struct rt_mutex *lock, top_waiter = rt_mutex_top_waiter(lock); plist_add(&waiter->list_entry, &lock->wait_list); + /* Tasks can only block on one lock at a time. */ + BUG_ON(task->pi_blocked_on != NULL); + task->pi_blocked_on = waiter; raw_spin_unlock(&task->pi_lock); -- 1.7.0.4 ^ permalink raw reply related [flat|nested] 28+ messages in thread
* Re: [PATCH 2/4] rtmutex: add BUG_ON if a task attempts to block on two locks 2010-07-09 22:32 ` [PATCH 2/4] rtmutex: add BUG_ON if a task attempts to block on two locks Darren Hart @ 2010-07-10 0:30 ` Steven Rostedt 2010-07-10 17:30 ` [PATCH 2/4 V2] " Darren Hart 0 siblings, 1 reply; 28+ messages in thread From: Steven Rostedt @ 2010-07-10 0:30 UTC (permalink / raw) To: Darren Hart Cc: linux-kernel, Thomas Gleixner, Peter Zijlstra, Ingo Molnar, Eric Dumazet, John Kacur, Mike Galbraith, linux-rt-users On Fri, 2010-07-09 at 15:32 -0700, Darren Hart wrote: > rtmutex proxy locking complicates the logic a bit and opens up > the possibility for a task to wake and attempt to take another > sleeping lock without knowing it has been enqueued on another > lock already. Add a BUG_ON to catch this scenario early. > > Signed-off-by: Darren Hart <dvhltc@us.ibm.com> > Cc: Thomas Gleixner <tglx@linutronix.de> > Cc: Peter Zijlstra <peterz@infradead.org> > Cc: Ingo Molnar <mingo@elte.hu> > Cc: Eric Dumazet <eric.dumazet@gmail.com> > Cc: John Kacur <jkacur@redhat.com> > Cc: Steven Rostedt <rostedt@goodmis.org> > Cc: Mike Galbraith <efault@gmx.de> > --- > kernel/rtmutex.c | 3 +++ > 1 files changed, 3 insertions(+), 0 deletions(-) > > diff --git a/kernel/rtmutex.c b/kernel/rtmutex.c > index baac7d9..22f9d18 100644 > --- a/kernel/rtmutex.c > +++ b/kernel/rtmutex.c > @@ -459,6 +459,9 @@ static int task_blocks_on_rt_mutex(struct rt_mutex *lock, > top_waiter = rt_mutex_top_waiter(lock); > plist_add(&waiter->list_entry, &lock->wait_list); > > + /* Tasks can only block on one lock at a time. */ > + BUG_ON(task->pi_blocked_on != NULL); WARN_ON may be better. Since it may not cause a system crash or other huge bug if it is not true. -- Steve > + > task->pi_blocked_on = waiter; > > raw_spin_unlock(&task->pi_lock); ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH 2/4 V2] rtmutex: add BUG_ON if a task attempts to block on two locks 2010-07-10 0:30 ` Steven Rostedt @ 2010-07-10 17:30 ` Darren Hart 0 siblings, 0 replies; 28+ messages in thread From: Darren Hart @ 2010-07-10 17:30 UTC (permalink / raw) To: rostedt Cc: linux-kernel, Thomas Gleixner, Peter Zijlstra, Ingo Molnar, Eric Dumazet, John Kacur, Mike Galbraith, linux-rt-users On 07/09/2010 05:30 PM, Steven Rostedt wrote: > On Fri, 2010-07-09 at 15:32 -0700, Darren Hart wrote: >> rtmutex proxy locking complicates the logic a bit and opens up >> the possibility for a task to wake and attempt to take another >> sleeping lock without knowing it has been enqueued on another >> lock already. Add a BUG_ON to catch this scenario early. >> >> Signed-off-by: Darren Hart<dvhltc@us.ibm.com> >> Cc: Thomas Gleixner<tglx@linutronix.de> >> Cc: Peter Zijlstra<peterz@infradead.org> >> Cc: Ingo Molnar<mingo@elte.hu> >> Cc: Eric Dumazet<eric.dumazet@gmail.com> >> Cc: John Kacur<jkacur@redhat.com> >> Cc: Steven Rostedt<rostedt@goodmis.org> >> Cc: Mike Galbraith<efault@gmx.de> >> --- >> kernel/rtmutex.c | 3 +++ >> 1 files changed, 3 insertions(+), 0 deletions(-) >> >> diff --git a/kernel/rtmutex.c b/kernel/rtmutex.c >> index baac7d9..22f9d18 100644 >> --- a/kernel/rtmutex.c >> +++ b/kernel/rtmutex.c >> @@ -459,6 +459,9 @@ static int task_blocks_on_rt_mutex(struct rt_mutex *lock, >> top_waiter = rt_mutex_top_waiter(lock); >> plist_add(&waiter->list_entry,&lock->wait_list); >> >> + /* Tasks can only block on one lock at a time. */ >> + BUG_ON(task->pi_blocked_on != NULL); > > WARN_ON may be better. Since it may not cause a system crash or other > huge bug if it is not true. > No objection. 
>From 31c7b6c5657bcc897ddc79d7f9bb1942eb4c854a Mon Sep 17 00:00:00 2001 From: Darren Hart <dvhltc@us.ibm.com> Date: Fri, 9 Jul 2010 17:46:05 -0400 Subject: [PATCH 2/4] rtmutex: add WARN_ON if a task attempts to block on two locks rtmutex proxy locking complicates the logic a bit and opens up the possibility for a task to wake and attempt to take another sleeping lock without knowing it has been enqueued on another lock already. Add a WARN_ON to catch this scenario early. V2: use a WARN_ON instead of a BUG_ON per Steven's request. Signed-off-by: Darren Hart <dvhltc@us.ibm.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ingo Molnar <mingo@elte.hu> Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: John Kacur <jkacur@redhat.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Mike Galbraith <efault@gmx.de> --- kernel/rtmutex.c | 3 +++ 1 files changed, 3 insertions(+), 0 deletions(-) diff --git a/kernel/rtmutex.c b/kernel/rtmutex.c index baac7d9..5262d0f 100644 --- a/kernel/rtmutex.c +++ b/kernel/rtmutex.c @@ -459,6 +459,9 @@ static int task_blocks_on_rt_mutex(struct rt_mutex *lock, top_waiter = rt_mutex_top_waiter(lock); plist_add(&waiter->list_entry, &lock->wait_list); + /* Tasks can only block on one lock at a time. */ + WARN_ON(task->pi_blocked_on != NULL); + task->pi_blocked_on = waiter; raw_spin_unlock(&task->pi_lock); -- 1.7.0.4 ^ permalink raw reply related [flat|nested] 28+ messages in thread
* [PATCH 3/4] futex: free_pi_state outside of hb->lock sections 2010-07-09 22:32 [PATCH 0/4][RT] futex: fix tasks blocking on two rt_mutex locks Darren Hart 2010-07-09 22:32 ` [PATCH 1/4] rtmutex: avoid null derefence in WARN_ON Darren Hart 2010-07-09 22:32 ` [PATCH 2/4] rtmutex: add BUG_ON if a task attempts to block on two locks Darren Hart @ 2010-07-09 22:32 ` Darren Hart 2010-07-09 22:55 ` Darren Hart 2010-07-12 10:35 ` [PATCH 3/4] " Thomas Gleixner 2010-07-09 22:33 ` [PATCH 4/4] futex: convert hash_bucket locks to raw_spinlock_t Darren Hart 3 siblings, 2 replies; 28+ messages in thread From: Darren Hart @ 2010-07-09 22:32 UTC (permalink / raw) To: linux-kernel Cc: Thomas Gleixner, Peter Zijlstra, Ingo Molnar, Eric Dumazet, John Kacur, Steven Rostedt, Mike Galbraith, linux-rt-users, Darren Hart free_pi_state() calls kfree() and might sleep. To prepare for raw hb->locks, get the calls to free_pi_state() out of the hb->lock() sections. Signed-off-by: Darren Hart <dvhltc@us.ibm.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ingo Molnar <mingo@elte.hu> Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: John Kacur <jkacur@redhat.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Mike Galbraith <efault@gmx.de> --- kernel/futex.c | 14 ++++++++------ 1 files changed, 8 insertions(+), 6 deletions(-) diff --git a/kernel/futex.c b/kernel/futex.c index a6cec32..b217972 100644 --- a/kernel/futex.c +++ b/kernel/futex.c @@ -114,7 +114,7 @@ struct futex_q { struct plist_node list; struct task_struct *task; - spinlock_t *lock_ptr; + raw_spinlock_t *lock_ptr; union futex_key key; struct futex_pi_state *pi_state; struct rt_mutex_waiter *rt_waiter; @@ -487,7 +487,7 @@ void exit_pi_state_list(struct task_struct *curr) * task still owns the PI-state: */ if (head->next != next) { - spin_unlock(&hb->lock); + raw_spin_unlock(&hb->lock); continue; } @@ -1339,7 +1339,6 @@ retry_private: } else if (ret) { /* -EDEADLK */ this->pi_state = NULL; - 
free_pi_state(pi_state); goto out_unlock; } } @@ -1437,7 +1436,7 @@ static inline void queue_me(struct futex_q *q, struct futex_hash_bucket *hb) */ static int unqueue_me(struct futex_q *q) { - spinlock_t *lock_ptr; + raw_spinlock_t *lock_ptr; int ret = 0; /* In the common case we don't take the spinlock, which is nice. */ @@ -1483,16 +1482,19 @@ retry: */ static void unqueue_me_pi(struct futex_q *q) { + struct futex_pi_state *pi_state = NULL; + WARN_ON(plist_node_empty(&q->list)); plist_del(&q->list, &q->list.plist); BUG_ON(!q->pi_state); - free_pi_state(q->pi_state); + pi_state = q->pi_state; q->pi_state = NULL; spin_unlock(q->lock_ptr); - drop_futex_key_refs(&q->key); + + free_pi_state(pi_state); } /* -- 1.7.0.4 ^ permalink raw reply related [flat|nested] 28+ messages in thread
* Re: [PATCH 3/4 V2] futex: free_pi_state outside of hb->lock sections 2010-07-09 22:32 ` [PATCH 3/4] futex: free_pi_state outside of hb->lock sections Darren Hart @ 2010-07-09 22:55 ` Darren Hart 2010-07-12 10:35 ` [PATCH 3/4] " Thomas Gleixner 1 sibling, 0 replies; 28+ messages in thread From: Darren Hart @ 2010-07-09 22:55 UTC (permalink / raw) To: Darren Hart Cc: linux-kernel, Thomas Gleixner, Peter Zijlstra, Ingo Molnar, Eric Dumazet, John Kacur, Steven Rostedt, Mike Galbraith, linux-rt-users Apologies, mangled a rebase and this patch had bits of 4/4 in it. Corrected below: >From f40b8e684df6ed2b1ba9167cad3922ab43f4a717 Mon Sep 17 00:00:00 2001 From: Darren Hart <dvhltc@us.ibm.com> Date: Fri, 9 Jul 2010 17:07:10 -0400 Subject: [PATCH 3/4 V2] futex: free_pi_state outside of hb->lock sections free_pi_state() calls kfree() and might sleep. To prepare for raw hb->locks, get the calls to free_pi_state() out of the hb->lock() sections. Signed-off-by: Darren Hart <dvhltc@us.ibm.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ingo Molnar <mingo@elte.hu> Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: John Kacur <jkacur@redhat.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Mike Galbraith <efault@gmx.de> --- kernel/futex.c | 8 +++++--- 1 files changed, 5 insertions(+), 3 deletions(-) diff --git a/kernel/futex.c b/kernel/futex.c index a6cec32..2cd58a2 100644 --- a/kernel/futex.c +++ b/kernel/futex.c @@ -1339,7 +1339,6 @@ retry_private: } else if (ret) { /* -EDEADLK */ this->pi_state = NULL; - free_pi_state(pi_state); goto out_unlock; } } @@ -1483,16 +1482,19 @@ retry: */ static void unqueue_me_pi(struct futex_q *q) { + struct futex_pi_state *pi_state = NULL; + WARN_ON(plist_node_empty(&q->list)); plist_del(&q->list, &q->list.plist); BUG_ON(!q->pi_state); - free_pi_state(q->pi_state); + pi_state = q->pi_state; q->pi_state = NULL; spin_unlock(q->lock_ptr); - drop_futex_key_refs(&q->key); + + free_pi_state(pi_state); } /* -- 
1.7.0.4 ^ permalink raw reply related [flat|nested] 28+ messages in thread
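The core move in this patch — stash the pointer and clear the field while holding the lock, then free only after the unlock — is the standard way to keep a potentially sleeping kfree() out of an atomic (raw-lock) section. A userspace sketch of the pattern, with a pthread mutex standing in for hb->lock and malloc/free for the pi_state allocation (struct layout invented for illustration):

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

struct pi_state { int refcount; };

struct futex_q {
	struct pi_state *pi_state;
	pthread_mutex_t *lock_ptr;
};

/* kfree() may sleep, so it must not run while a raw (non-sleeping)
 * lock is held. Detach the object under the lock, return it, and let
 * the caller free it after the critical section ends. */
static struct pi_state *detach_pi_state(struct futex_q *q)
{
	struct pi_state *pi_state;

	pthread_mutex_lock(q->lock_ptr);
	pi_state = q->pi_state;   /* stash the pointer ...            */
	q->pi_state = NULL;       /* ... and detach it from the queue */
	pthread_mutex_unlock(q->lock_ptr);

	return pi_state;          /* safe to free outside the lock */
}
```

Once the field is NULLed under the lock, no other thread can reach the object through the queue, so the deferred free races with nothing.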
* Re: [PATCH 3/4 V2] futex: free_pi_state outside of hb->lock sections 2010-07-09 22:55 ` Darren Hart (?) @ 2010-07-10 0:32 ` Steven Rostedt 2010-07-10 14:41 ` Darren Hart -1 siblings, 1 reply; 28+ messages in thread From: Steven Rostedt @ 2010-07-10 0:32 UTC (permalink / raw) To: Darren Hart Cc: linux-kernel, Thomas Gleixner, Peter Zijlstra, Ingo Molnar, Eric Dumazet, John Kacur, Mike Galbraith, linux-rt-users On Fri, 2010-07-09 at 15:55 -0700, Darren Hart wrote: > Apologies, mangled a rebase and this patch had bits of 4/4 in it. > Corrected below: > > > >From f40b8e684df6ed2b1ba9167cad3922ab43f4a717 Mon Sep 17 00:00:00 2001 > From: Darren Hart <dvhltc@us.ibm.com> > Date: Fri, 9 Jul 2010 17:07:10 -0400 > Subject: [PATCH 3/4 V2] futex: free_pi_state outside of hb->lock sections > > free_pi_state() calls kfree() and might sleep. To prepare for raw hb->locks, > get the calls to free_pi_state() out of the hb->lock() sections. > > Signed-off-by: Darren Hart <dvhltc@us.ibm.com> > Cc: Thomas Gleixner <tglx@linutronix.de> > Cc: Peter Zijlstra <peterz@infradead.org> > Cc: Ingo Molnar <mingo@elte.hu> > Cc: Eric Dumazet <eric.dumazet@gmail.com> > Cc: John Kacur <jkacur@redhat.com> > Cc: Steven Rostedt <rostedt@goodmis.org> > Cc: Mike Galbraith <efault@gmx.de> > --- > kernel/futex.c | 8 +++++--- > 1 files changed, 5 insertions(+), 3 deletions(-) > > diff --git a/kernel/futex.c b/kernel/futex.c > index a6cec32..2cd58a2 100644 > --- a/kernel/futex.c > +++ b/kernel/futex.c > @@ -1339,7 +1339,6 @@ retry_private: This is why I always add a space before labels. I have no idea what function this was in. > } else if (ret) { > /* -EDEADLK */ > this->pi_state = NULL; > - free_pi_state(pi_state); Where do we free the pi state here? 
-- Steve > goto out_unlock; > } > } > @@ -1483,16 +1482,19 @@ retry: > */ > static void unqueue_me_pi(struct futex_q *q) > { > + struct futex_pi_state *pi_state = NULL; > + > WARN_ON(plist_node_empty(&q->list)); > plist_del(&q->list, &q->list.plist); > > BUG_ON(!q->pi_state); > - free_pi_state(q->pi_state); > + pi_state = q->pi_state; > q->pi_state = NULL; > > spin_unlock(q->lock_ptr); > - > drop_futex_key_refs(&q->key); > + > + free_pi_state(pi_state); > } > > /* ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH 3/4 V2] futex: free_pi_state outside of hb->lock sections 2010-07-10 0:32 ` Steven Rostedt @ 2010-07-10 14:41 ` Darren Hart 0 siblings, 0 replies; 28+ messages in thread From: Darren Hart @ 2010-07-10 14:41 UTC (permalink / raw) To: rostedt Cc: linux-kernel, Thomas Gleixner, Peter Zijlstra, Ingo Molnar, Eric Dumazet, John Kacur, Mike Galbraith, linux-rt-users On 07/09/2010 05:32 PM, Steven Rostedt wrote: > On Fri, 2010-07-09 at 15:55 -0700, Darren Hart wrote: >> Apologies, mangled a rebase and this patch had bits of 4/4 in it. >> Corrected below: >> >> >> > From f40b8e684df6ed2b1ba9167cad3922ab43f4a717 Mon Sep 17 00:00:00 2001 >> From: Darren Hart<dvhltc@us.ibm.com> >> Date: Fri, 9 Jul 2010 17:07:10 -0400 >> Subject: [PATCH 3/4 V2] futex: free_pi_state outside of hb->lock sections >> >> free_pi_state() calls kfree() and might sleep. To prepare for raw hb->locks, >> get the calls to free_pi_state() out of the hb->lock() sections. >> >> Signed-off-by: Darren Hart<dvhltc@us.ibm.com> >> Cc: Thomas Gleixner<tglx@linutronix.de> >> Cc: Peter Zijlstra<peterz@infradead.org> >> Cc: Ingo Molnar<mingo@elte.hu> >> Cc: Eric Dumazet<eric.dumazet@gmail.com> >> Cc: John Kacur<jkacur@redhat.com> >> Cc: Steven Rostedt<rostedt@goodmis.org> >> Cc: Mike Galbraith<efault@gmx.de> >> --- >> kernel/futex.c | 8 +++++--- >> 1 files changed, 5 insertions(+), 3 deletions(-) >> >> diff --git a/kernel/futex.c b/kernel/futex.c >> index a6cec32..2cd58a2 100644 >> --- a/kernel/futex.c >> +++ b/kernel/futex.c >> @@ -1339,7 +1339,6 @@ retry_private: > > This is why I always add a space before labels. I have no idea what > function this was in. > >> } else if (ret) { >> /* -EDEADLK */ >> this->pi_state = NULL; >> - free_pi_state(pi_state); > > Where do we free the pi state here? Near the end after goto out_unlock which falls through to the out: label. In this case I didn't move it, it just wasn't necessary. 
out: if (pi_state != NULL) free_pi_state(pi_state); -- Darren Hart IBM Linux Technology Center Real-Time Linux Team ^ permalink raw reply [flat|nested] 28+ messages in thread
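The `out:` label Darren points to is the kernel's single-exit cleanup idiom: every failure path jumps to one label, where the resource is released exactly once whether or not an error occurred. A self-contained sketch of that shape — the function and its failure flags are hypothetical, standing in for futex_requeue()'s -EDEADLK path:

```c
#include <assert.h>
#include <stdlib.h>

struct pi_state { int refcount; };

/* Mimics the goto-out error handling in futex_requeue(): any failure
 * jumps to the out label, where pi_state is freed exactly once if it
 * was allocated. *frees counts releases so a test can verify "once".
 * Returns 0 on success, -1 on error. */
static int do_requeue(int fail_early, int fail_late, int *frees)
{
	struct pi_state *pi_state = NULL;
	int ret = 0;

	pi_state = malloc(sizeof(*pi_state));
	if (!pi_state)
		return -1;

	if (fail_early) {      /* e.g. -EDEADLK from the rtmutex code */
		ret = -1;
		goto out;      /* no inline free needed here */
	}

	if (fail_late) {
		ret = -1;
		goto out;
	}

out:
	if (pi_state != NULL) {
		free(pi_state);
		(*frees)++;
	}
	return ret;
}
```

This is why dropping the inline free_pi_state() call in the -EDEADLK branch was safe: the jump to the label frees it anyway, and keeping the inline call would have freed it twice.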
* Re: [PATCH 3/4] futex: free_pi_state outside of hb->lock sections 2010-07-09 22:32 ` [PATCH 3/4] futex: free_pi_state outside of hb->lock sections Darren Hart 2010-07-09 22:55 ` Darren Hart @ 2010-07-12 10:35 ` Thomas Gleixner 2010-07-12 10:46 ` Steven Rostedt 1 sibling, 1 reply; 28+ messages in thread From: Thomas Gleixner @ 2010-07-12 10:35 UTC (permalink / raw) To: Darren Hart Cc: linux-kernel, Peter Zijlstra, Ingo Molnar, Eric Dumazet, John Kacur, Steven Rostedt, Mike Galbraith, linux-rt-users On Fri, 9 Jul 2010, Darren Hart wrote: > free_pi_state() calls kfree() and might sleep. To prepare for raw hb->locks, > get the calls to free_pi_state() out of the hb->lock() sections. > > Signed-off-by: Darren Hart <dvhltc@us.ibm.com> > Cc: Thomas Gleixner <tglx@linutronix.de> > Cc: Peter Zijlstra <peterz@infradead.org> > Cc: Ingo Molnar <mingo@elte.hu> > Cc: Eric Dumazet <eric.dumazet@gmail.com> > Cc: John Kacur <jkacur@redhat.com> > Cc: Steven Rostedt <rostedt@goodmis.org> > Cc: Mike Galbraith <efault@gmx.de> > --- > kernel/futex.c | 14 ++++++++------ > 1 files changed, 8 insertions(+), 6 deletions(-) > > diff --git a/kernel/futex.c b/kernel/futex.c > index a6cec32..b217972 100644 > --- a/kernel/futex.c > +++ b/kernel/futex.c > @@ -114,7 +114,7 @@ struct futex_q { > struct plist_node list; > > struct task_struct *task; > - spinlock_t *lock_ptr; > + raw_spinlock_t *lock_ptr; How is this related to the changelog ? Thanks, tglx ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH 3/4] futex: free_pi_state outside of hb->lock sections 2010-07-12 10:35 ` [PATCH 3/4] " Thomas Gleixner @ 2010-07-12 10:46 ` Steven Rostedt 0 siblings, 0 replies; 28+ messages in thread From: Steven Rostedt @ 2010-07-12 10:46 UTC (permalink / raw) To: Thomas Gleixner Cc: Darren Hart, linux-kernel, Peter Zijlstra, Ingo Molnar, Eric Dumazet, John Kacur, Mike Galbraith, linux-rt-users On Mon, 2010-07-12 at 12:35 +0200, Thomas Gleixner wrote: > On Fri, 9 Jul 2010, Darren Hart wrote: > > > free_pi_state() calls kfree() and might sleep. To prepare for raw hb->locks, > > get the calls to free_pi_state() out of the hb->lock() sections. > > > > Signed-off-by: Darren Hart <dvhltc@us.ibm.com> > > struct task_struct *task; > > - spinlock_t *lock_ptr; > > + raw_spinlock_t *lock_ptr; > > How is this related to the changelog ? Read the reply to himself (V2) ;-) -- Steve ^ permalink raw reply [flat|nested] 28+ messages in thread
* [PATCH 4/4] futex: convert hash_bucket locks to raw_spinlock_t
  2010-07-09 22:32 [PATCH 0/4][RT] futex: fix tasks blocking on two rt_mutex locks Darren Hart
  ` (2 preceding siblings ...)
  2010-07-09 22:32 ` [PATCH 3/4] futex: free_pi_state outside of hb->lock sections Darren Hart
@ 2010-07-09 22:33 ` Darren Hart
  2010-07-09 22:57   ` [PATCH 4/4 V2] " Darren Hart
  ` (2 more replies)
  3 siblings, 3 replies; 28+ messages in thread
From: Darren Hart @ 2010-07-09 22:33 UTC (permalink / raw)
To: linux-kernel
Cc: Thomas Gleixner, Peter Zijlstra, Ingo Molnar, Eric Dumazet,
    John Kacur, Steven Rostedt, Mike Galbraith, linux-rt-users,
    Darren Hart

The requeue_pi mechanism introduced proxy locking of the rtmutex. This
creates a scenario where a task can wake up, not knowing it has been
enqueued on an rtmutex. In order to detect this, the task would have to
be able to take either task->pi_blocked_on->lock->wait_lock and/or the
hb->lock. Unfortunately, without already holding one of these, the
pi_blocked_on variable can change from NULL to valid or from valid to
NULL. Therefore, the task cannot be allowed to take a sleeping lock
after wakeup or it could end up trying to block on two locks, the
second overwriting a valid pi_blocked_on value. This obviously breaks
the pi mechanism.

This patch increases latency. While running the ltp pthread_cond_many
test with which Mike reported the bug, I see double-digit hrtimer
latencies (typically only on the first run after boot):

	kernel: hrtimer: interrupt took 75911 ns

This might be addressed by changing the various loops in the futex code
to be incremental, probably at an additional throughput hit. The private
hash_bucket lists discussed in the past could reduce hb->lock contention
in some scenarios. It should be noted that pthread_cond_many is a rather
pathological case.

This also introduces problems for plists which want a spinlock_t rather
than a raw_spinlock_t. Any thoughts on how to address this?

Signed-off-by: Darren Hart <dvhltc@us.ibm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: John Kacur <jkacur@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Mike Galbraith <efault@gmx.de>
---
 kernel/futex.c |   67 ++++++++++++++++++++++++++++--------------------------
 1 files changed, 36 insertions(+), 31 deletions(-)

diff --git a/kernel/futex.c b/kernel/futex.c
index b217972..0ad5a85 100644
--- a/kernel/futex.c
+++ b/kernel/futex.c
@@ -128,7 +128,7 @@ struct futex_q {
  * waiting on a futex.
  */
 struct futex_hash_bucket {
-	spinlock_t lock;
+	raw_spinlock_t lock;
 	struct plist_head chain;
 };
 
@@ -479,7 +479,7 @@ void exit_pi_state_list(struct task_struct *curr)
 		hb = hash_futex(&key);
 		raw_spin_unlock_irq(&curr->pi_lock);
 
-		spin_lock(&hb->lock);
+		raw_spin_lock(&hb->lock);
 
 		raw_spin_lock_irq(&curr->pi_lock);
 		/*
@@ -499,7 +499,7 @@ void exit_pi_state_list(struct task_struct *curr)
 
 		rt_mutex_unlock(&pi_state->pi_mutex);
 
-		spin_unlock(&hb->lock);
+		raw_spin_unlock(&hb->lock);
 
 		raw_spin_lock_irq(&curr->pi_lock);
 	}
@@ -860,21 +860,21 @@ static inline void double_lock_hb(struct futex_hash_bucket *hb1,
 				  struct futex_hash_bucket *hb2)
 {
 	if (hb1 <= hb2) {
-		spin_lock(&hb1->lock);
+		raw_spin_lock(&hb1->lock);
 		if (hb1 < hb2)
-			spin_lock_nested(&hb2->lock, SINGLE_DEPTH_NESTING);
+			raw_spin_lock_nested(&hb2->lock, SINGLE_DEPTH_NESTING);
 	} else { /* hb1 > hb2 */
-		spin_lock(&hb2->lock);
-		spin_lock_nested(&hb1->lock, SINGLE_DEPTH_NESTING);
+		raw_spin_lock(&hb2->lock);
+		raw_spin_lock_nested(&hb1->lock, SINGLE_DEPTH_NESTING);
 	}
 }
 
 static inline void
 double_unlock_hb(struct futex_hash_bucket *hb1, struct futex_hash_bucket *hb2)
 {
-	spin_unlock(&hb1->lock);
+	raw_spin_unlock(&hb1->lock);
 	if (hb1 != hb2)
-		spin_unlock(&hb2->lock);
+		raw_spin_unlock(&hb2->lock);
 }
 
 /*
@@ -896,7 +896,7 @@ static int futex_wake(u32 __user *uaddr, int fshared, int nr_wake, u32 bitset)
 	goto
out; hb = hash_futex(&key); - spin_lock(&hb->lock); + raw_spin_lock(&hb->lock); head = &hb->chain; plist_for_each_entry_safe(this, next, head, list) { @@ -916,7 +916,7 @@ static int futex_wake(u32 __user *uaddr, int fshared, int nr_wake, u32 bitset) } } - spin_unlock(&hb->lock); + raw_spin_unlock(&hb->lock); put_futex_key(fshared, &key); out: return ret; @@ -1070,6 +1070,7 @@ void requeue_pi_wake_futex(struct futex_q *q, union futex_key *key, q->lock_ptr = &hb->lock; #ifdef CONFIG_DEBUG_PI_LIST + /* FIXME: we're converting this to a raw lock... */ q->list.plist.spinlock = &hb->lock; #endif @@ -1377,14 +1378,14 @@ static inline struct futex_hash_bucket *queue_lock(struct futex_q *q) hb = hash_futex(&q->key); q->lock_ptr = &hb->lock; - spin_lock(&hb->lock); + raw_spin_lock(&hb->lock); return hb; } static inline void queue_unlock(struct futex_q *q, struct futex_hash_bucket *hb) { - spin_unlock(&hb->lock); + raw_spin_unlock(&hb->lock); drop_futex_key_refs(&q->key); } @@ -1416,11 +1417,12 @@ static inline void queue_me(struct futex_q *q, struct futex_hash_bucket *hb) plist_node_init(&q->list, prio); #ifdef CONFIG_DEBUG_PI_LIST + /* FIXME: we're converting this to a raw_spinlock */ q->list.plist.spinlock = &hb->lock; #endif plist_add(&q->list, &hb->chain); q->task = current; - spin_unlock(&hb->lock); + raw_spin_unlock(&hb->lock); } /** @@ -1444,7 +1446,7 @@ retry: lock_ptr = q->lock_ptr; barrier(); if (lock_ptr != NULL) { - spin_lock(lock_ptr); + raw_spin_lock(lock_ptr); /* * q->lock_ptr can change between reading it and * spin_lock(), causing us to take the wrong lock. This @@ -1459,7 +1461,7 @@ retry: * we can detect whether we acquired the correct lock. 
*/ if (unlikely(lock_ptr != q->lock_ptr)) { - spin_unlock(lock_ptr); + raw_spin_unlock(lock_ptr); goto retry; } WARN_ON(plist_node_empty(&q->list)); @@ -1467,7 +1469,7 @@ retry: BUG_ON(q->pi_state); - spin_unlock(lock_ptr); + raw_spin_unlock(lock_ptr); ret = 1; } @@ -1491,7 +1493,7 @@ static void unqueue_me_pi(struct futex_q *q) pi_state = q->pi_state; q->pi_state = NULL; - spin_unlock(q->lock_ptr); + raw_spin_unlock(q->lock_ptr); drop_futex_key_refs(&q->key); free_pi_state(pi_state); @@ -1579,11 +1581,11 @@ retry: * simply return. */ handle_fault: - spin_unlock(q->lock_ptr); + raw_spin_unlock(q->lock_ptr); ret = fault_in_user_writeable(uaddr); - spin_lock(q->lock_ptr); + raw_spin_lock(q->lock_ptr); /* * Check if someone else fixed it for us: @@ -1976,7 +1978,7 @@ retry_private: ret = ret ? 0 : -EWOULDBLOCK; } - spin_lock(q.lock_ptr); + raw_spin_lock(q.lock_ptr); /* * Fixup the pi_state owner and possibly acquire the lock if we * haven't already. @@ -2053,7 +2055,7 @@ retry: goto out; hb = hash_futex(&key); - spin_lock(&hb->lock); + raw_spin_lock(&hb->lock); /* * To avoid races, try to do the TID -> 0 atomic transition @@ -2102,14 +2104,14 @@ retry: } out_unlock: - spin_unlock(&hb->lock); + raw_spin_unlock(&hb->lock); put_futex_key(fshared, &key); out: return ret; pi_faulted: - spin_unlock(&hb->lock); + raw_spin_unlock(&hb->lock); put_futex_key(fshared, &key); ret = fault_in_user_writeable(uaddr); @@ -2257,9 +2259,9 @@ static int futex_wait_requeue_pi(u32 __user *uaddr, int fshared, /* Queue the futex_q, drop the hb lock, wait for wakeup. */ futex_wait_queue_me(hb, &q, to); - spin_lock(&hb->lock); + raw_spin_lock(&hb->lock); ret = handle_early_requeue_pi_wakeup(hb, &q, &key2, to); - spin_unlock(&hb->lock); + raw_spin_unlock(&hb->lock); if (ret) goto out_put_keys; @@ -2277,10 +2279,10 @@ static int futex_wait_requeue_pi(u32 __user *uaddr, int fshared, * did a lock-steal - fix up the PI-state in that case. 
*/ if (q.pi_state && (q.pi_state->owner != current)) { - spin_lock(q.lock_ptr); + raw_spin_lock(q.lock_ptr); ret = fixup_pi_state_owner(uaddr2, &q, current, fshared); - spin_unlock(q.lock_ptr); + raw_spin_unlock(q.lock_ptr); } } else { /* @@ -2293,7 +2295,7 @@ static int futex_wait_requeue_pi(u32 __user *uaddr, int fshared, ret = rt_mutex_finish_proxy_lock(pi_mutex, to, &rt_waiter, 1); debug_rt_mutex_free_waiter(&rt_waiter); - spin_lock(q.lock_ptr); + raw_spin_lock(q.lock_ptr); /* * Fixup the pi_state owner and possibly acquire the lock if we * haven't already. @@ -2668,8 +2670,11 @@ static int __init futex_init(void) futex_cmpxchg_enabled = 1; for (i = 0; i < ARRAY_SIZE(futex_queues); i++) { - plist_head_init(&futex_queues[i].chain, &futex_queues[i].lock); - spin_lock_init(&futex_queues[i].lock); + /* + * FIXME: plist wants a spinlock, but the hb->lock is a raw_spinlock_t + */ + plist_head_init(&futex_queues[i].chain, NULL /*&futex_queues[i].lock*/); + raw_spin_lock_init(&futex_queues[i].lock); } return 0; -- 1.7.0.4 ^ permalink raw reply related [flat|nested] 28+ messages in thread
* Re: [PATCH 4/4 V2] futex: convert hash_bucket locks to raw_spinlock_t 2010-07-09 22:33 ` [PATCH 4/4] futex: convert hash_bucket locks to raw_spinlock_t Darren Hart @ 2010-07-09 22:57 ` Darren Hart 2010-07-10 0:34 ` Steven Rostedt 2010-07-10 19:41 ` [PATCH 4/4] " Mike Galbraith 2010-07-12 13:05 ` Thomas Gleixner 2 siblings, 1 reply; 28+ messages in thread From: Darren Hart @ 2010-07-09 22:57 UTC (permalink / raw) To: Darren Hart Cc: linux-kernel, Thomas Gleixner, Peter Zijlstra, Ingo Molnar, Eric Dumazet, John Kacur, Steven Rostedt, Mike Galbraith, linux-rt-users This version pulls in the bits mistakenly left in 3/4. >From 9f8b4faac79518f98131464c2d21a1c64fb841d2 Mon Sep 17 00:00:00 2001 From: Darren Hart <dvhltc@us.ibm.com> Date: Fri, 9 Jul 2010 16:44:47 -0400 Subject: [PATCH 4/4 V2] futex: convert hash_bucket locks to raw_spinlock_t The requeue_pi mechanism introduced proxy locking of the rtmutex. This creates a scenario where a task can wake up, not knowing it has been enqueued on an rtmutex. In order to detect this, the task would have to be able to take either task->pi_blocked_on->lock->wait_lock and/or the hb->lock. Unfortunately, without already holding one of these, the pi_blocked_on variable can change from NULL to valid or from valid to NULL. Therefore, the task cannot be allowed to take a sleeping lock after wakeup, or it could end up trying to block on two locks, the second overwriting a valid pi_blocked_on value. This obviously breaks the pi mechanism. This patch increases latency. While running the ltp pthread_cond_many test with which Mike reported the bug, I see double-digit hrtimer latencies (typically only on the first run after boot): kernel: hrtimer: interrupt took 75911 ns This might be addressed by changing the various loops in the futex code to be incremental, probably at an additional throughput hit. The private hash_bucket lists discussed in the past could reduce hb->lock contention in some scenarios. 
It should be noted that pthread_cond_many is a rather pathological case. This also introduces problems for plists which want a spinlock_t rather than a raw_spinlock_t. Any thoughts on how to address this? Signed-off-by: Darren Hart <dvhltc@us.ibm.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ingo Molnar <mingo@elte.hu> Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: John Kacur <jkacur@redhat.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Mike Galbraith <efault@gmx.de> --- kernel/futex.c | 73 ++++++++++++++++++++++++++++++-------------------------- 1 files changed, 39 insertions(+), 34 deletions(-) diff --git a/kernel/futex.c b/kernel/futex.c index 2cd58a2..0ad5a85 100644 --- a/kernel/futex.c +++ b/kernel/futex.c @@ -114,7 +114,7 @@ struct futex_q { struct plist_node list; struct task_struct *task; - spinlock_t *lock_ptr; + raw_spinlock_t *lock_ptr; union futex_key key; struct futex_pi_state *pi_state; struct rt_mutex_waiter *rt_waiter; @@ -128,7 +128,7 @@ struct futex_q { * waiting on a futex. 
*/ struct futex_hash_bucket { - spinlock_t lock; + raw_spinlock_t lock; struct plist_head chain; }; @@ -479,7 +479,7 @@ void exit_pi_state_list(struct task_struct *curr) hb = hash_futex(&key); raw_spin_unlock_irq(&curr->pi_lock); - spin_lock(&hb->lock); + raw_spin_lock(&hb->lock); raw_spin_lock_irq(&curr->pi_lock); /* @@ -487,7 +487,7 @@ void exit_pi_state_list(struct task_struct *curr) * task still owns the PI-state: */ if (head->next != next) { - spin_unlock(&hb->lock); + raw_spin_unlock(&hb->lock); continue; } @@ -499,7 +499,7 @@ void exit_pi_state_list(struct task_struct *curr) rt_mutex_unlock(&pi_state->pi_mutex); - spin_unlock(&hb->lock); + raw_spin_unlock(&hb->lock); raw_spin_lock_irq(&curr->pi_lock); } @@ -860,21 +860,21 @@ static inline void double_lock_hb(struct futex_hash_bucket *hb1, struct futex_hash_bucket *hb2) { if (hb1 <= hb2) { - spin_lock(&hb1->lock); + raw_spin_lock(&hb1->lock); if (hb1 < hb2) - spin_lock_nested(&hb2->lock, SINGLE_DEPTH_NESTING); + raw_spin_lock_nested(&hb2->lock, SINGLE_DEPTH_NESTING); } else { /* hb1 > hb2 */ - spin_lock(&hb2->lock); - spin_lock_nested(&hb1->lock, SINGLE_DEPTH_NESTING); + raw_spin_lock(&hb2->lock); + raw_spin_lock_nested(&hb1->lock, SINGLE_DEPTH_NESTING); } } static inline void double_unlock_hb(struct futex_hash_bucket *hb1, struct futex_hash_bucket *hb2) { - spin_unlock(&hb1->lock); + raw_spin_unlock(&hb1->lock); if (hb1 != hb2) - spin_unlock(&hb2->lock); + raw_spin_unlock(&hb2->lock); } /* @@ -896,7 +896,7 @@ static int futex_wake(u32 __user *uaddr, int fshared, int nr_wake, u32 bitset) goto out; hb = hash_futex(&key); - spin_lock(&hb->lock); + raw_spin_lock(&hb->lock); head = &hb->chain; plist_for_each_entry_safe(this, next, head, list) { @@ -916,7 +916,7 @@ static int futex_wake(u32 __user *uaddr, int fshared, int nr_wake, u32 bitset) } } - spin_unlock(&hb->lock); + raw_spin_unlock(&hb->lock); put_futex_key(fshared, &key); out: return ret; @@ -1070,6 +1070,7 @@ void requeue_pi_wake_futex(struct futex_q *q, 
union futex_key *key, q->lock_ptr = &hb->lock; #ifdef CONFIG_DEBUG_PI_LIST + /* FIXME: we're converting this to a raw lock... */ q->list.plist.spinlock = &hb->lock; #endif @@ -1377,14 +1378,14 @@ static inline struct futex_hash_bucket *queue_lock(struct futex_q *q) hb = hash_futex(&q->key); q->lock_ptr = &hb->lock; - spin_lock(&hb->lock); + raw_spin_lock(&hb->lock); return hb; } static inline void queue_unlock(struct futex_q *q, struct futex_hash_bucket *hb) { - spin_unlock(&hb->lock); + raw_spin_unlock(&hb->lock); drop_futex_key_refs(&q->key); } @@ -1416,11 +1417,12 @@ static inline void queue_me(struct futex_q *q, struct futex_hash_bucket *hb) plist_node_init(&q->list, prio); #ifdef CONFIG_DEBUG_PI_LIST + /* FIXME: we're converting this to a raw_spinlock */ q->list.plist.spinlock = &hb->lock; #endif plist_add(&q->list, &hb->chain); q->task = current; - spin_unlock(&hb->lock); + raw_spin_unlock(&hb->lock); } /** @@ -1436,7 +1438,7 @@ static inline void queue_me(struct futex_q *q, struct futex_hash_bucket *hb) */ static int unqueue_me(struct futex_q *q) { - spinlock_t *lock_ptr; + raw_spinlock_t *lock_ptr; int ret = 0; /* In the common case we don't take the spinlock, which is nice. */ @@ -1444,7 +1446,7 @@ retry: lock_ptr = q->lock_ptr; barrier(); if (lock_ptr != NULL) { - spin_lock(lock_ptr); + raw_spin_lock(lock_ptr); /* * q->lock_ptr can change between reading it and * spin_lock(), causing us to take the wrong lock. This @@ -1459,7 +1461,7 @@ retry: * we can detect whether we acquired the correct lock. 
*/ if (unlikely(lock_ptr != q->lock_ptr)) { - spin_unlock(lock_ptr); + raw_spin_unlock(lock_ptr); goto retry; } WARN_ON(plist_node_empty(&q->list)); @@ -1467,7 +1469,7 @@ retry: BUG_ON(q->pi_state); - spin_unlock(lock_ptr); + raw_spin_unlock(lock_ptr); ret = 1; } @@ -1491,7 +1493,7 @@ static void unqueue_me_pi(struct futex_q *q) pi_state = q->pi_state; q->pi_state = NULL; - spin_unlock(q->lock_ptr); + raw_spin_unlock(q->lock_ptr); drop_futex_key_refs(&q->key); free_pi_state(pi_state); @@ -1579,11 +1581,11 @@ retry: * simply return. */ handle_fault: - spin_unlock(q->lock_ptr); + raw_spin_unlock(q->lock_ptr); ret = fault_in_user_writeable(uaddr); - spin_lock(q->lock_ptr); + raw_spin_lock(q->lock_ptr); /* * Check if someone else fixed it for us: @@ -1976,7 +1978,7 @@ retry_private: ret = ret ? 0 : -EWOULDBLOCK; } - spin_lock(q.lock_ptr); + raw_spin_lock(q.lock_ptr); /* * Fixup the pi_state owner and possibly acquire the lock if we * haven't already. @@ -2053,7 +2055,7 @@ retry: goto out; hb = hash_futex(&key); - spin_lock(&hb->lock); + raw_spin_lock(&hb->lock); /* * To avoid races, try to do the TID -> 0 atomic transition @@ -2102,14 +2104,14 @@ retry: } out_unlock: - spin_unlock(&hb->lock); + raw_spin_unlock(&hb->lock); put_futex_key(fshared, &key); out: return ret; pi_faulted: - spin_unlock(&hb->lock); + raw_spin_unlock(&hb->lock); put_futex_key(fshared, &key); ret = fault_in_user_writeable(uaddr); @@ -2257,9 +2259,9 @@ static int futex_wait_requeue_pi(u32 __user *uaddr, int fshared, /* Queue the futex_q, drop the hb lock, wait for wakeup. */ futex_wait_queue_me(hb, &q, to); - spin_lock(&hb->lock); + raw_spin_lock(&hb->lock); ret = handle_early_requeue_pi_wakeup(hb, &q, &key2, to); - spin_unlock(&hb->lock); + raw_spin_unlock(&hb->lock); if (ret) goto out_put_keys; @@ -2277,10 +2279,10 @@ static int futex_wait_requeue_pi(u32 __user *uaddr, int fshared, * did a lock-steal - fix up the PI-state in that case. 
*/ if (q.pi_state && (q.pi_state->owner != current)) { - spin_lock(q.lock_ptr); + raw_spin_lock(q.lock_ptr); ret = fixup_pi_state_owner(uaddr2, &q, current, fshared); - spin_unlock(q.lock_ptr); + raw_spin_unlock(q.lock_ptr); } } else { /* @@ -2293,7 +2295,7 @@ static int futex_wait_requeue_pi(u32 __user *uaddr, int fshared, ret = rt_mutex_finish_proxy_lock(pi_mutex, to, &rt_waiter, 1); debug_rt_mutex_free_waiter(&rt_waiter); - spin_lock(q.lock_ptr); + raw_spin_lock(q.lock_ptr); /* * Fixup the pi_state owner and possibly acquire the lock if we * haven't already. @@ -2668,8 +2670,11 @@ static int __init futex_init(void) futex_cmpxchg_enabled = 1; for (i = 0; i < ARRAY_SIZE(futex_queues); i++) { - plist_head_init(&futex_queues[i].chain, &futex_queues[i].lock); - spin_lock_init(&futex_queues[i].lock); + /* + * FIXME: plist wants a spinlock, but the hb->lock is a raw_spinlock_t + */ + plist_head_init(&futex_queues[i].chain, NULL /*&futex_queues[i].lock*/); + raw_spin_lock_init(&futex_queues[i].lock); } return 0; -- 1.7.0.4 ^ permalink raw reply related [flat|nested] 28+ messages in thread
* Re: [PATCH 4/4 V2] futex: convert hash_bucket locks to raw_spinlock_t 2010-07-09 22:57 ` [PATCH 4/4 V2] " Darren Hart @ 2010-07-10 0:34 ` Steven Rostedt 0 siblings, 0 replies; 28+ messages in thread From: Steven Rostedt @ 2010-07-10 0:34 UTC (permalink / raw) To: Darren Hart Cc: linux-kernel, Thomas Gleixner, Peter Zijlstra, Ingo Molnar, Eric Dumazet, John Kacur, Mike Galbraith, linux-rt-users On Fri, 2010-07-09 at 15:57 -0700, Darren Hart wrote: > This version pulls in the bits mistakenly left in 3/4. > > > >From 9f8b4faac79518f98131464c2d21a1c64fb841d2 Mon Sep 17 00:00:00 2001 > From: Darren Hart <dvhltc@us.ibm.com> > Date: Fri, 9 Jul 2010 16:44:47 -0400 > Subject: [PATCH 4/4 V2] futex: convert hash_bucket locks to raw_spinlock_t > > The requeue_pi mechanism introduced proxy locking of the rtmutex. This creates > a scenario where a task can wake-up, not knowing it has been enqueued on an > rtmutex. In order to detect this, the task would have to be able to take either > task->pi_blocked_on->lock->wait_lock and/or the hb->lock. Unfortunately, > without already holding one of these, the pi_blocked_on variable can change > from NULL to valid or from valid to NULL. Therefor, the task cannot be allowed > to take a sleeping lock after wakeup or it could end up trying to block on two > locks, the second overwriting a valid pi_blocked_on value. This obviously > breaks the pi mechanism. > > This patch increases latency, while running the ltp pthread_cond_many test > which Michal reported the bug with, I see double digit hrtimer latencies > (typically only on the first run after boo): > > kernel: hrtimer: interrupt took 75911 ns > > This might be addressed by changing the various loops in the futex code to be > incremental, probably at an additional throughput hit. The private hash_bucket > lists discussed in the past could reduce hb->lock contention in some scenarios. > It should be noted that pthread_cond_many is a rather pathological case. 
> > This also introduces problems for plists which want a spinlock_t rather > than a raw_spinlock_t. Any thoughts on how to address this? > > Signed-off-by: Darren Hart <dvhltc@us.ibm.com> > Cc: Thomas Gleixner <tglx@linutronix.de> > Cc: Peter Zijlstra <peterz@infradead.org> > Cc: Ingo Molnar <mingo@elte.hu> > Cc: Eric Dumazet <eric.dumazet@gmail.com> > Cc: John Kacur <jkacur@redhat.com> Acked-by: Steven Rostedt <rostedt@goodmis.org> -- Steve > Cc: Mike Galbraith <efault@gmx.de> > --- > kernel/futex.c | 73 ++++++++++++++++++++++++++++++-------------------------- > 1 files changed, 39 insertions(+), 34 deletions(-) ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH 4/4] futex: convert hash_bucket locks to raw_spinlock_t 2010-07-09 22:33 ` [PATCH 4/4] futex: convert hash_bucket locks to raw_spinlock_t Darren Hart 2010-07-09 22:57 ` [PATCH 4/4 V2] " Darren Hart @ 2010-07-10 19:41 ` Mike Galbraith 2010-07-11 13:33 ` Mike Galbraith 2010-07-12 19:10 ` Darren Hart 2010-07-12 13:05 ` Thomas Gleixner 2 siblings, 2 replies; 28+ messages in thread From: Mike Galbraith @ 2010-07-10 19:41 UTC (permalink / raw) To: Darren Hart Cc: linux-kernel, Thomas Gleixner, Peter Zijlstra, Ingo Molnar, Eric Dumazet, John Kacur, Steven Rostedt, linux-rt-users On Fri, 2010-07-09 at 15:33 -0700, Darren Hart wrote: > The requeue_pi mechanism introduced proxy locking of the rtmutex. This creates > a scenario where a task can wake-up, not knowing it has been enqueued on an > rtmutex. In order to detect this, the task would have to be able to take either > task->pi_blocked_on->lock->wait_lock and/or the hb->lock. Unfortunately, > without already holding one of these, the pi_blocked_on variable can change > from NULL to valid or from valid to NULL. Therefor, the task cannot be allowed > to take a sleeping lock after wakeup or it could end up trying to block on two > locks, the second overwriting a valid pi_blocked_on value. This obviously > breaks the pi mechanism. copy/paste offline query/reply at Darren's request.. On Sat, 2010-07-10 at 10:26 -0700, Darren Hart wrote: On 07/09/2010 09:32 PM, Mike Galbraith wrote: > > On Fri, 2010-07-09 at 13:05 -0700, Darren Hart wrote: > > > >> The core of the problem is that the proxy_lock blocks a task on a lock > >> the task knows nothing about. So when it wakes up inside of > >> futex_wait_requeue_pi, it immediately tries to block on hb->lock to > >> check why it woke up. This has the potential to block the task on two > >> locks (thus overwriting the pi_blocked_on). Any attempt preventing this > >> involves a lock, and ultimiately the hb->lock. 
The only solution I see > >> is to make the hb->locks raw locks (thanks to Steven Rostedt for > >> original idea and batting this around with me in IRC). > > > > Hm, so wakee _was_ munging his own state after all. > > > > Out of curiosity, what's wrong with holding his pi_lock across the > > wakeup? He can _try_ to block, but can't until pi state is stable. > > > > I presume there's a big fat gotcha that's just not obvious to a futex > > locking newbie :) > > It'll take me more time than I have right now to be positive, but: > > > rt_mutex_set_owner(lock, pendowner, RT_MUTEX_OWNER_PENDING); > > raw_spin_unlock(&current->pi_lock); > > Your patch moved the unlock before the set_owner. I _believe_ this can > break the pi boosting logic - current is the owner until it calls > set_owner to be pendowner. I haven't traced this entire path yet, but > that's my gut feel. I _think_ it should be fine to do that. Setting an owner seems to only require holding the wait_lock. I could easily be missing subtleties though. Looking around, I didn't see any reason not to unlock the owner's pi_lock after twiddling pi_waiters (and still don't, but...). > However, your idea has merit as we have to take our ->pi_lock before > we can block on the hb->lock (inside task_blocks_on_rt_mutex()). > > If we can't move the unlock above before set_owner, then we may need a: > > retry: > cur->lock() > top_waiter = get_top_waiter() > cur->unlock() > > double_lock(cur, topwaiter) > if top_waiter != get_top_waiter() > double_unlock(cur, topwaiter) > goto retry > > Not ideal, but I think I prefer that to making all the hb locks raw. > > You dropped the CC list for some reason, probably a good idea to send > this back out in response to my raw lock patch (4/4) - your question and > my reply. This is crazy stuff, no harm in putting the question out there. > > I'll take a closer look at this when I can, if not tonight, Monday morning. -Mike ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH 4/4] futex: convert hash_bucket locks to raw_spinlock_t 2010-07-10 19:41 ` [PATCH 4/4] " Mike Galbraith @ 2010-07-11 13:33 ` Mike Galbraith 2010-07-11 15:10 ` Darren Hart 2010-07-12 11:45 ` Steven Rostedt 1 sibling, 2 replies; 28+ messages in thread From: Mike Galbraith @ 2010-07-11 13:33 UTC (permalink / raw) To: Darren Hart Cc: linux-kernel, Thomas Gleixner, Peter Zijlstra, Ingo Molnar, Eric Dumazet, John Kacur, Steven Rostedt, linux-rt-users On Sat, 2010-07-10 at 21:41 +0200, Mike Galbraith wrote: > On Fri, 2010-07-09 at 15:33 -0700, Darren Hart wrote: > > If we can't move the unlock above before set_owner, then we may need a: > > > > retry: > > cur->lock() > > top_waiter = get_top_waiter() > > cur->unlock() > > > > double_lock(cur, topwaiter) > > if top_waiter != get_top_waiter() > > double_unlock(cur, topwaiter) > > goto retry > > > > Not ideal, but I think I prefer that to making all the hb locks raw. Another option: only scratch the itchy spot. futex: non-blocking synchronization point for futex_wait_requeue_pi() and futex_requeue(). Problem analysis by Darren Hart: The requeue_pi mechanism introduced proxy locking of the rtmutex. This creates a scenario where a task can wake up, not knowing it has been enqueued on an rtmutex. In order to detect this, the task would have to be able to take either task->pi_blocked_on->lock->wait_lock and/or the hb->lock. Unfortunately, without already holding one of these, the pi_blocked_on variable can change from NULL to valid or from valid to NULL. Therefore, the task cannot be allowed to take a sleeping lock after wakeup, or it could end up trying to block on two locks, the second overwriting a valid pi_blocked_on value. This obviously breaks the pi mechanism. Rather than convert the hb->lock to a raw spinlock, do so only in the spot where blocking cannot be allowed, i.e. before we know that lock handoff has completed. 
Signed-off-by: Mike Galbraith <efault@gmx.de> Cc: Darren Hart <dvhltc@us.ibm.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ingo Molnar <mingo@elte.hu> Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: John Kacur <jkacur@redhat.com> Cc: Steven Rostedt <rostedt@goodmis.org> diff --git a/kernel/futex.c b/kernel/futex.c index a6cec32..ef489f3 100644 --- a/kernel/futex.c +++ b/kernel/futex.c @@ -2255,7 +2255,14 @@ static int futex_wait_requeue_pi(u32 __user *uaddr, int fshared, /* Queue the futex_q, drop the hb lock, wait for wakeup. */ futex_wait_queue_me(hb, &q, to); - spin_lock(&hb->lock); + /* + * Non-blocking synchronization point with futex_requeue(). + * + * We dare not block here because this will alter PI state, possibly + * before our waker finishes modifying same in wakeup_next_waiter(). + */ + while(!spin_trylock(&hb->lock)) + cpu_relax(); ret = handle_early_requeue_pi_wakeup(hb, &q, &key2, to); spin_unlock(&hb->lock); if (ret) ^ permalink raw reply related [flat|nested] 28+ messages in thread
* Re: [PATCH 4/4] futex: convert hash_bucket locks to raw_spinlock_t 2010-07-11 13:33 ` Mike Galbraith @ 2010-07-11 15:10 ` Darren Hart 2010-07-12 11:45 ` Steven Rostedt 1 sibling, 0 replies; 28+ messages in thread From: Darren Hart @ 2010-07-11 15:10 UTC (permalink / raw) To: Mike Galbraith Cc: linux-kernel, Thomas Gleixner, Peter Zijlstra, Ingo Molnar, Eric Dumazet, John Kacur, Steven Rostedt, linux-rt-users On 07/11/2010 06:33 AM, Mike Galbraith wrote: > On Sat, 2010-07-10 at 21:41 +0200, Mike Galbraith wrote: >> On Fri, 2010-07-09 at 15:33 -0700, Darren Hart wrote: > >>> If we can't move the unlock above before set_owner, then we may need a: >>> >>> retry: >>> cur->lock() >>> top_waiter = get_top_waiter() >>> cur->unlock() >>> >>> double_lock(cur, topwaiter) >>> if top_waiter != get_top_waiter() >>> double_unlock(cur, topwaiter) >>> goto retry >>> >>> Not ideal, but I think I prefer that to making all the hb locks raw. > > Another option: only scratch the itchy spot. > > futex: non-blocking synchronization point for futex_wait_requeue_pi() and futex_requeue(). > > Problem analysis by Darren Hart; > The requeue_pi mechanism introduced proxy locking of the rtmutex. This creates > a scenario where a task can wake-up, not knowing it has been enqueued on an > rtmutex. In order to detect this, the task would have to be able to take either > task->pi_blocked_on->lock->wait_lock and/or the hb->lock. Unfortunately, > without already holding one of these, the pi_blocked_on variable can change > from NULL to valid or from valid to NULL. Therefor, the task cannot be allowed > to take a sleeping lock after wakeup or it could end up trying to block on two > locks, the second overwriting a valid pi_blocked_on value. This obviously > breaks the pi mechanism. > > Rather than convert the bh-lock to a raw spinlock, do so only in the spot where > blocking cannot be allowed, ie before we know that lock handoff has completed. I like it. 
I especially like the change is only evident if you are using the code path that introduced the problem in the first place. If you're doing a lot of requeue_pi operations, then the waking waiters have an advantage over new pending waiters or other tasks with futex keyed on the same hash-bucket... but that seems acceptable to me. I'd like to confirm that holding the pendowner->pi-lock across the wakeup in wakeup_next_waiter() isn't feasible first. If it can work, I think the impact would be lower. I'll have a look tomorrow. Nice work Mike. -- Darrem > Signed-off-by: Mike Galbraith<efault@gmx.de> > Cc: Darren Hart<dvhltc@us.ibm.com> > Cc: Thomas Gleixner<tglx@linutronix.de> > Cc: Peter Zijlstra<peterz@infradead.org> > Cc: Ingo Molnar<mingo@elte.hu> > Cc: Eric Dumazet<eric.dumazet@gmail.com> > Cc: John Kacur<jkacur@redhat.com> > Cc: Steven Rostedt<rostedt@goodmis.org> > > diff --git a/kernel/futex.c b/kernel/futex.c > index a6cec32..ef489f3 100644 > --- a/kernel/futex.c > +++ b/kernel/futex.c > @@ -2255,7 +2255,14 @@ static int futex_wait_requeue_pi(u32 __user *uaddr, int fshared, > /* Queue the futex_q, drop the hb lock, wait for wakeup. */ > futex_wait_queue_me(hb,&q, to); > > - spin_lock(&hb->lock); > + /* > + * Non-blocking synchronization point with futex_requeue(). > + * > + * We dare not block here because this will alter PI state, possibly > + * before our waker finishes modifying same in wakeup_next_waiter(). > + */ > + while(!spin_trylock(&hb->lock)) > + cpu_relax(); > ret = handle_early_requeue_pi_wakeup(hb,&q,&key2, to); > spin_unlock(&hb->lock); > if (ret) > > -- Darren Hart IBM Linux Technology Center Real-Time Linux Team ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH 4/4] futex: convert hash_bucket locks to raw_spinlock_t 2010-07-11 13:33 ` Mike Galbraith 2010-07-11 15:10 ` Darren Hart @ 2010-07-12 11:45 ` Steven Rostedt 2010-07-12 12:12 ` Mike Galbraith 1 sibling, 1 reply; 28+ messages in thread From: Steven Rostedt @ 2010-07-12 11:45 UTC (permalink / raw) To: Mike Galbraith Cc: Darren Hart, linux-kernel, Thomas Gleixner, Peter Zijlstra, Ingo Molnar, Eric Dumazet, John Kacur, linux-rt-users On Sun, 2010-07-11 at 15:33 +0200, Mike Galbraith wrote: > On Sat, 2010-07-10 at 21:41 +0200, Mike Galbraith wrote: > diff --git a/kernel/futex.c b/kernel/futex.c > index a6cec32..ef489f3 100644 > --- a/kernel/futex.c > +++ b/kernel/futex.c > @@ -2255,7 +2255,14 @@ static int futex_wait_requeue_pi(u32 __user *uaddr, int fshared, > /* Queue the futex_q, drop the hb lock, wait for wakeup. */ > futex_wait_queue_me(hb, &q, to); > > - spin_lock(&hb->lock); > + /* > + * Non-blocking synchronization point with futex_requeue(). > + * > + * We dare not block here because this will alter PI state, possibly > + * before our waker finishes modifying same in wakeup_next_waiter(). > + */ > + while(!spin_trylock(&hb->lock)) > + cpu_relax(); I agree that this would work. But I wonder if this should have an: #ifdef PREEMPT_RT [...] #else spin_lock(&hb->lock); #endif around it. Or encapsulate this lock in a macro that does the same thing (just to keep the actual code cleaner) -- Steve > ret = handle_early_requeue_pi_wakeup(hb, &q, &key2, to); > spin_unlock(&hb->lock); > if (ret) > > ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH 4/4] futex: convert hash_bucket locks to raw_spinlock_t 2010-07-12 11:45 ` Steven Rostedt @ 2010-07-12 12:12 ` Mike Galbraith 0 siblings, 0 replies; 28+ messages in thread From: Mike Galbraith @ 2010-07-12 12:12 UTC (permalink / raw) To: rostedt Cc: Darren Hart, linux-kernel, Thomas Gleixner, Peter Zijlstra, Ingo Molnar, Eric Dumazet, John Kacur, linux-rt-users On Mon, 2010-07-12 at 07:45 -0400, Steven Rostedt wrote: > On Sun, 2010-07-11 at 15:33 +0200, Mike Galbraith wrote: > > On Sat, 2010-07-10 at 21:41 +0200, Mike Galbraith wrote: > > > diff --git a/kernel/futex.c b/kernel/futex.c > > index a6cec32..ef489f3 100644 > > --- a/kernel/futex.c > > +++ b/kernel/futex.c > > @@ -2255,7 +2255,14 @@ static int futex_wait_requeue_pi(u32 __user *uaddr, int fshared, > > /* Queue the futex_q, drop the hb lock, wait for wakeup. */ > > futex_wait_queue_me(hb, &q, to); > > > > - spin_lock(&hb->lock); > > + /* > > + * Non-blocking synchronization point with futex_requeue(). > > + * > > + * We dare not block here because this will alter PI state, possibly > > + * before our waker finishes modifying same in wakeup_next_waiter(). > > + */ > > + while(!spin_trylock(&hb->lock)) > > + cpu_relax(); > > I agree that this would work. But I wonder if this should have an: > > #ifdef PREEMPT_RT > [...] > #else > spin_lock(&hb->lock); > #endif > > around it. Or encapsulate this lock in a macro that does the same thing > (just to keep the actual code cleaner) Yeah, it should. I'll wait to see what Darren/others say about holding the wakee's pi_lock across wakeup to plug it. If he submits something along that line, I can bin this. -Mike ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH 4/4] futex: convert hash_bucket locks to raw_spinlock_t 2010-07-10 19:41 ` [PATCH 4/4] " Mike Galbraith 2010-07-11 13:33 ` Mike Galbraith @ 2010-07-12 19:10 ` Darren Hart 2010-07-12 20:40 ` Thomas Gleixner 1 sibling, 1 reply; 28+ messages in thread From: Darren Hart @ 2010-07-12 19:10 UTC (permalink / raw) To: Mike Galbraith Cc: linux-kernel, Thomas Gleixner, Peter Zijlstra, Ingo Molnar, Eric Dumazet, John Kacur, Steven Rostedt, linux-rt-users

On 07/10/2010 12:41 PM, Mike Galbraith wrote:
> On Fri, 2010-07-09 at 15:33 -0700, Darren Hart wrote:
>> The requeue_pi mechanism introduced proxy locking of the rtmutex. This creates
>> a scenario where a task can wake up, not knowing it has been enqueued on an
>> rtmutex. In order to detect this, the task would have to be able to take either
>> task->pi_blocked_on->lock->wait_lock and/or the hb->lock. Unfortunately,
>> without already holding one of these, the pi_blocked_on variable can change
>> from NULL to valid or from valid to NULL. Therefore, the task cannot be allowed
>> to take a sleeping lock after wakeup or it could end up trying to block on two
>> locks, the second overwriting a valid pi_blocked_on value. This obviously
>> breaks the pi mechanism.
>
> copy/paste offline query/reply at Darren's request..
>
> On Sat, 2010-07-10 at 10:26 -0700, Darren Hart wrote:
> On 07/09/2010 09:32 PM, Mike Galbraith wrote:
>>> On Fri, 2010-07-09 at 13:05 -0700, Darren Hart wrote:
>>>
>>>> The core of the problem is that the proxy_lock blocks a task on a lock
>>>> the task knows nothing about. So when it wakes up inside of
>>>> futex_wait_requeue_pi, it immediately tries to block on hb->lock to
>>>> check why it woke up. This has the potential to block the task on two
>>>> locks (thus overwriting the pi_blocked_on). Any attempt at preventing this
>>>> involves a lock, and ultimately the hb->lock.
The only solution I see
>>>> is to make the hb->locks raw locks (thanks to Steven Rostedt for
>>>> original idea and batting this around with me in IRC).
>>>
>>> Hm, so wakee _was_ munging his own state after all.
>>>
>>> Out of curiosity, what's wrong with holding his pi_lock across the
>>> wakeup? He can _try_ to block, but can't until pi state is stable.
>>>
>>> I presume there's a big fat gotcha that's just not obvious to a futex
>>> locking newbie :)

Nor to some of us that have been engrossed in futexes for the last couple of
years! I discussed the pi_lock across the wakeup issue with Thomas. While this
fixes the problem for this particular failure case, it doesn't protect against:

<tglx> assume the following:
<tglx> t1 is on the condvar
<tglx> t2 does the requeue dance and t1 is now blocked on the outer futex
<tglx> t3 takes hb->lock for a futex in the same bucket
<tglx> t2 wakes due to signal/timeout
<tglx> t2 blocks on hb->lock

You likely did not hit the above scenario because you only had one condvar, so
the hash_buckets were not heavily shared and you weren't likely to hit:

<tglx> t3 takes hb->lock for a futex in the same bucket

I'm going to roll up a patchset with your (Mike) spin_trylock patch and run it
through some tests. I'd still prefer a way to detect early wakeup without
having to grab the hb->lock, but I haven't found it yet.

+	while (!spin_trylock(&hb->lock))
+		cpu_relax();
 	ret = handle_early_requeue_pi_wakeup(hb, &q, &key2, to);
 	spin_unlock(&hb->lock);

Thanks,

-- 
Darren Hart
IBM Linux Technology Center
Real-Time Linux Team

^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH 4/4] futex: convert hash_bucket locks to raw_spinlock_t 2010-07-12 19:10 ` Darren Hart @ 2010-07-12 20:40 ` Thomas Gleixner 2010-07-12 20:43 ` Thomas Gleixner 2010-07-13 3:09 ` Mike Galbraith 0 siblings, 2 replies; 28+ messages in thread From: Thomas Gleixner @ 2010-07-12 20:40 UTC (permalink / raw) To: Darren Hart Cc: Mike Galbraith, linux-kernel, Peter Zijlstra, Ingo Molnar, Eric Dumazet, John Kacur, Steven Rostedt, linux-rt-users On Mon, 12 Jul 2010, Darren Hart wrote: > On 07/10/2010 12:41 PM, Mike Galbraith wrote: > > On Fri, 2010-07-09 at 15:33 -0700, Darren Hart wrote: > > > > Out of curiosity, what's wrong with holding his pi_lock across the > > > > wakeup? He can _try_ to block, but can't until pi state is stable. > > > > > > > > I presume there's a big fat gotcha that's just not obvious to futex > > > > locking newbie :) > > Nor to some of us that have been engrossed in futexes for the last couple > years! I discussed the pi_lock across the wakeup issue with Thomas. While this > fixes the problem for this particular failure case, it doesn't protect > against: > > <tglx> assume the following: > <tglx> t1 is on the condvar > <tglx> t2 does the requeue dance and t1 is now blocked on the outer futex > <tglx> t3 takes hb->lock for a futex in the same bucket > <tglx> t2 wakes due to signal/timeout > <tglx> t2 blocks on hb->lock > > You are likely to have not hit the above scenario because you only had one > condvar, so the hash_buckets were not heavily shared and you weren't likely to > hit: > > <tglx> t3 takes hb->lock for a futex in the same bucket > > > I'm going to roll up a patchset with your (Mike) spin_trylock patch and run it > through some tests. I'd still prefer a way to detect early wakeup without > having to grab the hb->lock(), but I haven't found it yet. 
> >
> > +	while (!spin_trylock(&hb->lock))
> > +		cpu_relax();
> 	ret = handle_early_requeue_pi_wakeup(hb, &q, &key2, to);
> 	spin_unlock(&hb->lock);

And this is nasty as it will create unbound priority inversion :(

We discussed another solution on IRC in the meantime:

in futex_wait_requeue_pi()

	futex_wait_queue_me(hb, &q, to);

	raw_spin_lock(current->pi_lock);
	if (current->pi_blocked_on) {
		/*
		 * We know that we can only be blocked on the outer futex
		 * so we can skip the early wakeup check
		 */
		raw_spin_unlock(current->pi_lock);
		ret = 0;
	} else {
		current->pi_blocked_on = PI_WAKEUP_INPROGRESS;
		raw_spin_unlock(current->pi_lock);

		spin_lock(&hb->lock);
		ret = handle_early_requeue_pi_wakeup();
		....
		spin_unlock(&hb->lock);
	}

Now in the rtmutex magic we need in task_blocks_on_rt_mutex():

	raw_spin_lock(task->pi_lock);

	/*
	 * Add big fat comment why this is only relevant to futex
	 * requeue_pi
	 */
	if (task != current && task->pi_blocked_on == PI_WAKEUP_INPROGRESS) {
		raw_spin_unlock(task->pi_lock);

		/*
		 * Returning 0 here is fine. The requeue code is just going to
		 * move the futex_q to the other bucket, but that'll be fixed
		 * up in handle_early_requeue_pi_wakeup()
		 */
		return 0;
	}

Thanks,

	tglx

^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH 4/4] futex: convert hash_bucket locks to raw_spinlock_t 2010-07-12 20:40 ` Thomas Gleixner @ 2010-07-12 20:43 ` Thomas Gleixner 2010-07-13 3:09 ` Mike Galbraith 1 sibling, 0 replies; 28+ messages in thread From: Thomas Gleixner @ 2010-07-12 20:43 UTC (permalink / raw) To: Darren Hart Cc: Mike Galbraith, linux-kernel, Peter Zijlstra, Ingo Molnar, Eric Dumazet, John Kacur, Steven Rostedt, linux-rt-users On Mon, 12 Jul 2010, Thomas Gleixner wrote: > On Mon, 12 Jul 2010, Darren Hart wrote: > > On 07/10/2010 12:41 PM, Mike Galbraith wrote: > > > On Fri, 2010-07-09 at 15:33 -0700, Darren Hart wrote: > > > > > Out of curiosity, what's wrong with holding his pi_lock across the > > > > > wakeup? He can _try_ to block, but can't until pi state is stable. > > > > > > > > > > I presume there's a big fat gotcha that's just not obvious to futex > > > > > locking newbie :) > > > > Nor to some of us that have been engrossed in futexes for the last couple > > years! I discussed the pi_lock across the wakeup issue with Thomas. While this > > fixes the problem for this particular failure case, it doesn't protect > > against: > > > > <tglx> assume the following: > > <tglx> t1 is on the condvar > > <tglx> t2 does the requeue dance and t1 is now blocked on the outer futex > > <tglx> t3 takes hb->lock for a futex in the same bucket > > <tglx> t2 wakes due to signal/timeout > > <tglx> t2 blocks on hb->lock > > > > You are likely to have not hit the above scenario because you only had one > > condvar, so the hash_buckets were not heavily shared and you weren't likely to > > hit: > > > > <tglx> t3 takes hb->lock for a futex in the same bucket > > > > > > I'm going to roll up a patchset with your (Mike) spin_trylock patch and run it > > through some tests. I'd still prefer a way to detect early wakeup without > > having to grab the hb->lock(), but I haven't found it yet. 
> > +	while (!spin_trylock(&hb->lock))
> > +		cpu_relax();
> > 	ret = handle_early_requeue_pi_wakeup(hb, &q, &key2, to);
> > 	spin_unlock(&hb->lock);
>
> And this is nasty as it will create unbound priority inversion :(
>
> We discussed another solution on IRC in the meantime:
>
> in futex_wait_requeue_pi()
>
> 	futex_wait_queue_me(hb, &q, to);
>
> 	raw_spin_lock(current->pi_lock);
> 	if (current->pi_blocked_on) {
> 		/*
> 		 * We know that we can only be blocked on the outer futex
> 		 * so we can skip the early wakeup check
> 		 */
> 		raw_spin_unlock(current->pi_lock);
> 		ret = 0;
> 	} else {
> 		current->pi_blocked_on = PI_WAKEUP_INPROGRESS;
> 		raw_spin_unlock(current->pi_lock);
>
> 		spin_lock(&hb->lock);
> 		ret = handle_early_requeue_pi_wakeup();
> 		....
> 		spin_unlock(&hb->lock);
> 	}
>
> Now in the rtmutex magic we need in task_blocks_on_rt_mutex():
>
> 	raw_spin_lock(task->pi_lock);
>
> 	/*
> 	 * Add big fat comment why this is only relevant to futex
> 	 * requeue_pi
> 	 */
> 	if (task != current && task->pi_blocked_on == PI_WAKEUP_INPROGRESS) {
> 		raw_spin_unlock(task->pi_lock);
>
> 		/*
> 		 * Returning 0 here is fine. The requeue code is just going to
> 		 * move the futex_q to the other bucket, but that'll be fixed
> 		 * up in handle_early_requeue_pi_wakeup()
> 		 */
> 		return 0;

We might also return a sensible error code here and just remove the waiter
from all queues, which then needs to be handled in
handle_early_requeue_pi_wakeup() after acquiring hb->lock.

Thanks,

	tglx

^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH 4/4] futex: convert hash_bucket locks to raw_spinlock_t 2010-07-12 20:40 ` Thomas Gleixner 2010-07-12 20:43 ` Thomas Gleixner @ 2010-07-13 3:09 ` Mike Galbraith 2010-07-13 7:12 ` Darren Hart 1 sibling, 1 reply; 28+ messages in thread From: Mike Galbraith @ 2010-07-13 3:09 UTC (permalink / raw) To: Thomas Gleixner Cc: Darren Hart, linux-kernel, Peter Zijlstra, Ingo Molnar, Eric Dumazet, John Kacur, Steven Rostedt, linux-rt-users On Mon, 2010-07-12 at 22:40 +0200, Thomas Gleixner wrote: > On Mon, 12 Jul 2010, Darren Hart wrote: > > On 07/10/2010 12:41 PM, Mike Galbraith wrote: > > > On Fri, 2010-07-09 at 15:33 -0700, Darren Hart wrote: > > > > > Out of curiosity, what's wrong with holding his pi_lock across the > > > > > wakeup? He can _try_ to block, but can't until pi state is stable. > > > > > > > > > > I presume there's a big fat gotcha that's just not obvious to futex > > > > > locking newbie :) > > > > Nor to some of us that have been engrossed in futexes for the last couple > > years! I discussed the pi_lock across the wakeup issue with Thomas. While this > > fixes the problem for this particular failure case, it doesn't protect > > against: > > > > <tglx> assume the following: > > <tglx> t1 is on the condvar > > <tglx> t2 does the requeue dance and t1 is now blocked on the outer futex > > <tglx> t3 takes hb->lock for a futex in the same bucket > > <tglx> t2 wakes due to signal/timeout > > <tglx> t2 blocks on hb->lock > > > > You are likely to have not hit the above scenario because you only had one > > condvar, so the hash_buckets were not heavily shared and you weren't likely to > > hit: > > > > <tglx> t3 takes hb->lock for a futex in the same bucket > > > > > > I'm going to roll up a patchset with your (Mike) spin_trylock patch and run it > > through some tests. I'd still prefer a way to detect early wakeup without > > having to grab the hb->lock(), but I haven't found it yet. 
> > +	while (!spin_trylock(&hb->lock))
> > +		cpu_relax();
> > 	ret = handle_early_requeue_pi_wakeup(hb, &q, &key2, to);
> > 	spin_unlock(&hb->lock);
>
> And this is nasty as it will create unbound priority inversion :(

Oh ma gawd, _it's a train_ :>

> We discussed another solution on IRC in the meantime:
>
> in futex_wait_requeue_pi()
>
> 	futex_wait_queue_me(hb, &q, to);
>
> 	raw_spin_lock(current->pi_lock);
> 	if (current->pi_blocked_on) {
> 		/*
> 		 * We know that we can only be blocked on the outer futex
> 		 * so we can skip the early wakeup check
> 		 */
> 		raw_spin_unlock(current->pi_lock);
> 		ret = 0;
> 	} else {
> 		current->pi_blocked_on = PI_WAKEUP_INPROGRESS;
> 		raw_spin_unlock(current->pi_lock);
>
> 		spin_lock(&hb->lock);
> 		ret = handle_early_requeue_pi_wakeup();
> 		....
> 		spin_unlock(&hb->lock);
> 	}
>
> Now in the rtmutex magic we need in task_blocks_on_rt_mutex():
>
> 	raw_spin_lock(task->pi_lock);
>
> 	/*
> 	 * Add big fat comment why this is only relevant to futex
> 	 * requeue_pi
> 	 */
> 	if (task != current && task->pi_blocked_on == PI_WAKEUP_INPROGRESS) {
> 		raw_spin_unlock(task->pi_lock);
>
> 		/*
> 		 * Returning 0 here is fine. The requeue code is just going to
> 		 * move the futex_q to the other bucket, but that'll be fixed
> 		 * up in handle_early_requeue_pi_wakeup()
> 		 */
> 		return 0;
> 	}
>
> Thanks,
>
> 	tglx

^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH 4/4] futex: convert hash_bucket locks to raw_spinlock_t 2010-07-13 3:09 ` Mike Galbraith @ 2010-07-13 7:12 ` Darren Hart 0 siblings, 0 replies; 28+ messages in thread From: Darren Hart @ 2010-07-13 7:12 UTC (permalink / raw) To: Mike Galbraith Cc: Thomas Gleixner, linux-kernel, Peter Zijlstra, Ingo Molnar, Eric Dumazet, John Kacur, Steven Rostedt, linux-rt-users On 07/12/2010 08:09 PM, Mike Galbraith wrote: > On Mon, 2010-07-12 at 22:40 +0200, Thomas Gleixner wrote: >> On Mon, 12 Jul 2010, Darren Hart wrote: >>> On 07/10/2010 12:41 PM, Mike Galbraith wrote: >>>> On Fri, 2010-07-09 at 15:33 -0700, Darren Hart wrote: >>>>>> Out of curiosity, what's wrong with holding his pi_lock across the >>>>>> wakeup? He can _try_ to block, but can't until pi state is stable. >>>>>> >>>>>> I presume there's a big fat gotcha that's just not obvious to futex >>>>>> locking newbie :) >>> >>> Nor to some of us that have been engrossed in futexes for the last couple >>> years! I discussed the pi_lock across the wakeup issue with Thomas. While this >>> fixes the problem for this particular failure case, it doesn't protect >>> against: >>> >>> <tglx> assume the following: >>> <tglx> t1 is on the condvar >>> <tglx> t2 does the requeue dance and t1 is now blocked on the outer futex >>> <tglx> t3 takes hb->lock for a futex in the same bucket >>> <tglx> t2 wakes due to signal/timeout >>> <tglx> t2 blocks on hb->lock >>> >>> You are likely to have not hit the above scenario because you only had one >>> condvar, so the hash_buckets were not heavily shared and you weren't likely to >>> hit: >>> >>> <tglx> t3 takes hb->lock for a futex in the same bucket >>> >>> >>> I'm going to roll up a patchset with your (Mike) spin_trylock patch and run it >>> through some tests. I'd still prefer a way to detect early wakeup without >>> having to grab the hb->lock(), but I haven't found it yet. 
>>> >>> + while(!spin_trylock(&hb->lock)) >>> + cpu_relax(); >>> ret = handle_early_requeue_pi_wakeup(hb,&q,&key2, to); >>> spin_unlock(&hb->lock); >> >> And this is nasty as it will create unbound priority inversion :( > > Oh ma gawd, _it's a train_ :> Seriously. I have a fix. Cleaning it up as we speak, still hope to send out tonight. -- Darren Hart IBM Linux Technology Center Real-Time Linux Team ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH 4/4] futex: convert hash_bucket locks to raw_spinlock_t 2010-07-09 22:33 ` [PATCH 4/4] futex: convert hash_bucket locks to raw_spinlock_t Darren Hart 2010-07-09 22:57 ` [PATCH 4/4 V2] " Darren Hart 2010-07-10 19:41 ` [PATCH 4/4] " Mike Galbraith @ 2010-07-12 13:05 ` Thomas Gleixner 2 siblings, 0 replies; 28+ messages in thread From: Thomas Gleixner @ 2010-07-12 13:05 UTC (permalink / raw) To: Darren Hart Cc: linux-kernel, Peter Zijlstra, Ingo Molnar, Eric Dumazet, John Kacur, Steven Rostedt, Mike Galbraith, linux-rt-users

On Fri, 9 Jul 2010, Darren Hart wrote:
> The requeue_pi mechanism introduced proxy locking of the rtmutex. This creates
> a scenario where a task can wake up, not knowing it has been enqueued on an
> rtmutex. In order to detect this, the task would have to be able to take either
> task->pi_blocked_on->lock->wait_lock and/or the hb->lock. Unfortunately,
> without already holding one of these, the pi_blocked_on variable can change
> from NULL to valid or from valid to NULL. Therefore, the task cannot be allowed
> to take a sleeping lock after wakeup or it could end up trying to block on two
> locks, the second overwriting a valid pi_blocked_on value. This obviously
> breaks the pi mechanism.
>
> This patch increases latency. While running the ltp pthread_cond_many test
> with which Mike reported the bug, I see double-digit hrtimer latencies
> (typically only on the first run after boot):
>
> kernel: hrtimer: interrupt took 75911 ns

Eewwww. There must be some more intelligent and less intrusive way to detect this.

Thanks,

	tglx

^ permalink raw reply [flat|nested] 28+ messages in thread
end of thread, other threads:[~2010-07-13 7:12 UTC | newest]

Thread overview: 28+ messages
2010-07-09 22:32 [PATCH 0/4][RT] futex: fix tasks blocking on two rt_mutex locks Darren Hart
2010-07-09 22:32 ` [PATCH 1/4] rtmutex: avoid null derefence in WARN_ON Darren Hart
2010-07-10  0:29   ` Steven Rostedt
2010-07-10 14:42     ` Darren Hart
2010-07-09 22:32 ` [PATCH 2/4] rtmutex: add BUG_ON if a task attempts to block on two locks Darren Hart
2010-07-10  0:30   ` Steven Rostedt
2010-07-10 17:30     ` [PATCH 2/4 V2] " Darren Hart
2010-07-09 22:32 ` [PATCH 3/4] futex: free_pi_state outside of hb->lock sections Darren Hart
2010-07-09 22:55   ` [PATCH 3/4 V2] " Darren Hart
2010-07-09 22:55     ` Darren Hart
2010-07-10  0:32     ` Steven Rostedt
2010-07-10 14:41       ` Darren Hart
2010-07-12 10:35   ` [PATCH 3/4] " Thomas Gleixner
2010-07-12 10:46     ` Steven Rostedt
2010-07-09 22:33 ` [PATCH 4/4] futex: convert hash_bucket locks to raw_spinlock_t Darren Hart
2010-07-09 22:57   ` [PATCH 4/4 V2] " Darren Hart
2010-07-10  0:34     ` Steven Rostedt
2010-07-10 19:41   ` [PATCH 4/4] " Mike Galbraith
2010-07-11 13:33     ` Mike Galbraith
2010-07-11 15:10       ` Darren Hart
2010-07-12 11:45       ` Steven Rostedt
2010-07-12 12:12         ` Mike Galbraith
2010-07-12 19:10     ` Darren Hart
2010-07-12 20:40       ` Thomas Gleixner
2010-07-12 20:43         ` Thomas Gleixner
2010-07-13  3:09         ` Mike Galbraith
2010-07-13  7:12           ` Darren Hart
2010-07-12 13:05   ` Thomas Gleixner