* [PATCH -tip 0/4] rtmutex: Spin on owner @ 2015-05-19 17:24 Davidlohr Bueso 2015-05-19 17:24 ` [PATCH 1/4] locking/rtmutex: Implement lockless top-waiter wakeup Davidlohr Bueso ` (4 more replies) 0 siblings, 5 replies; 38+ messages in thread From: Davidlohr Bueso @ 2015-05-19 17:24 UTC (permalink / raw) To: Thomas Gleixner, Peter Zijlstra, Ingo Molnar Cc: Steven Rostedt, Mike Galbraith, Paul E. McKenney, Sebastian Andrzej Siewior, Davidlohr Bueso, linux-kernel Hello, The first three patches are straightforward and were found while going through the code. Patch 4 is the actual meat of the set, but similar to what we have already in regular mutexes. While having optimistic spinning in rtmutexes is a desired feature, I've marked it RFC as I could very well have missed something inherent in the rt flavor. Details in the individual patches. Passes the pi tests from Darren's futextest suite, as well as a weekend of running pi_stress from rt-tests on a 60-core box. Thanks! Davidlohr Bueso (4): locking/rtmutex: Implement lockless top-waiter wakeup locking/rtmutex: Use cmp-cmpxchg locking/rtmutex: Update stale plist comments locking/rtmutex: Support spin on owner (osq) include/linux/rtmutex.h | 4 ++ kernel/Kconfig.locks | 4 ++ kernel/locking/rtmutex.c | 175 +++++++++++++++++++++++++++++++++++++++++------ 3 files changed, 162 insertions(+), 21 deletions(-) -- 2.1.4 ^ permalink raw reply [flat|nested] 38+ messages in thread
* [PATCH 1/4] locking/rtmutex: Implement lockless top-waiter wakeup 2015-05-19 17:24 [PATCH -tip 0/4] rtmutex: Spin on owner Davidlohr Bueso @ 2015-05-19 17:24 ` Davidlohr Bueso 2015-06-05 12:35 ` Thomas Gleixner ` (2 more replies) 2015-05-19 17:24 ` [PATCH 2/4] locking/rtmutex: Use cmp-cmpxchg Davidlohr Bueso ` (3 subsequent siblings) 4 siblings, 3 replies; 38+ messages in thread From: Davidlohr Bueso @ 2015-05-19 17:24 UTC (permalink / raw) To: Thomas Gleixner, Peter Zijlstra, Ingo Molnar Cc: Steven Rostedt, Mike Galbraith, Paul E. McKenney, Sebastian Andrzej Siewior, Davidlohr Bueso, linux-kernel, Davidlohr Bueso Mark the task for later wakeup after the wait_lock has been released. This way, once the next task is awoken, it will have a better chance of finding the wait_lock free when it continues executing in __rt_mutex_slowlock() and tries to acquire the rtmutex by calling try_to_take_rt_mutex(). In contended scenarios, other tasks attempting to take the lock may acquire it first, right after the wait_lock is released, but (a) this can also occur with the current code, as it relies on spinlock fairness, and (b) we are dealing with the top waiter anyway, so it will always take the lock next. Signed-off-by: Davidlohr Bueso <dbueso@suse.de> --- kernel/locking/rtmutex.c | 21 ++++++++++----------- 1 file changed, 10 insertions(+), 11 deletions(-) diff --git a/kernel/locking/rtmutex.c b/kernel/locking/rtmutex.c index 36573e9..74188d8 100644 --- a/kernel/locking/rtmutex.c +++ b/kernel/locking/rtmutex.c @@ -955,14 +955,13 @@ static int task_blocks_on_rt_mutex(struct rt_mutex *lock, } /* - * Wake up the next waiter on the lock. - * * Remove the top waiter from the current tasks pi waiter list and - * wake it up. + * queue it up. * * Called with lock->wait_lock held. 
*/ -static void wakeup_next_waiter(struct rt_mutex *lock) +static void mark_wakeup_next_waiter(struct wake_q_head *wake_q, + struct rt_mutex *lock) { struct rt_mutex_waiter *waiter; unsigned long flags; @@ -991,12 +990,7 @@ static void wakeup_next_waiter(struct rt_mutex *lock) raw_spin_unlock_irqrestore(¤t->pi_lock, flags); - /* - * It's safe to dereference waiter as it cannot go away as - * long as we hold lock->wait_lock. The waiter task needs to - * acquire it in order to dequeue the waiter. - */ - wake_up_process(waiter->task); + wake_q_add(wake_q, waiter->task); } /* @@ -1255,6 +1249,8 @@ static inline int rt_mutex_slowtrylock(struct rt_mutex *lock) static void __sched rt_mutex_slowunlock(struct rt_mutex *lock) { + WAKE_Q(wake_q); + raw_spin_lock(&lock->wait_lock); debug_rt_mutex_unlock(lock); @@ -1303,10 +1299,13 @@ rt_mutex_slowunlock(struct rt_mutex *lock) /* * The wakeup next waiter path does not suffer from the above * race. See the comments there. + * + * Queue the next waiter for wakeup once we release the wait_lock. */ - wakeup_next_waiter(lock); + mark_wakeup_next_waiter(&wake_q, lock); raw_spin_unlock(&lock->wait_lock); + wake_up_q(&wake_q); /* Undo pi boosting if necessary: */ rt_mutex_adjust_prio(current); -- 2.1.4 ^ permalink raw reply related [flat|nested] 38+ messages in thread
* Re: [PATCH 1/4] locking/rtmutex: Implement lockless top-waiter wakeup 2015-05-19 17:24 ` [PATCH 1/4] locking/rtmutex: Implement lockless top-waiter wakeup Davidlohr Bueso @ 2015-06-05 12:35 ` Thomas Gleixner 2015-06-16 19:29 ` [PATCH] futex: lower the lock contention on the HB lock during wake up Sebastian Andrzej Siewior 2015-06-18 20:30 ` [tip:sched/core] locking/rtmutex: Implement lockless top-waiter wakeup tip-bot for Davidlohr Bueso 2 siblings, 0 replies; 38+ messages in thread From: Thomas Gleixner @ 2015-06-05 12:35 UTC (permalink / raw) To: Davidlohr Bueso Cc: Peter Zijlstra, Ingo Molnar, Steven Rostedt, Mike Galbraith, Paul E. McKenney, Sebastian Andrzej Siewior, linux-kernel, Davidlohr Bueso On Tue, 19 May 2015, Davidlohr Bueso wrote: > Mark the task for later wakeup after the wait_lock has been released. > This way, once the next task is awoken, it will have a better chance > to of finding the wait_lock free when continuing executing in > __rt_mutex_slowlock() when trying to acquire the rtmutex, calling > try_to_take_rt_mutex(). Upon contended scenarios, other tasks attempting > take the lock may acquire it first, right after the wait_lock is released, > but (a) this can also occur with the current code, as it relies on the > spinlock fairness, and (b) we are dealing with the top-waiter anyway, > so it will always take the lock next. > > Signed-off-by: Davidlohr Bueso <dbueso@suse.de> Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
* [PATCH] futex: lower the lock contention on the HB lock during wake up 2015-05-19 17:24 ` [PATCH 1/4] locking/rtmutex: Implement lockless top-waiter wakeup Davidlohr Bueso 2015-06-05 12:35 ` Thomas Gleixner @ 2015-06-16 19:29 ` Sebastian Andrzej Siewior 2015-06-16 19:50 ` Davidlohr Bueso 2015-06-18 20:30 ` [tip:sched/core] locking/rtmutex: Implement lockless top-waiter wakeup tip-bot for Davidlohr Bueso 2 siblings, 1 reply; 38+ messages in thread From: Sebastian Andrzej Siewior @ 2015-06-16 19:29 UTC (permalink / raw) To: Davidlohr Bueso Cc: Thomas Gleixner, Peter Zijlstra, Ingo Molnar, Steven Rostedt, Mike Galbraith, Paul E. McKenney, linux-kernel, Davidlohr Bueso wake_futex_pi() wakes the task before releasing the hash bucket lock (HB). The first thing the woken-up task usually does is acquire the lock, which requires the HB lock. On SMP systems this leads to blocking on the HB lock, which is released by the owner shortly after. This patch rearranges the unlock path by first releasing the HB lock and then waking up the task. [bigeasy: redo on top of lockless wake-queues] Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> --- Davidlohr, would it work for you to replace that patch #1 from your series with this one? 
kernel/futex.c | 32 ++++++++++++++++++++--- kernel/locking/rtmutex.c | 56 +++++++++++++++++++++++++++++------------ kernel/locking/rtmutex_common.h | 3 +++ 3 files changed, 72 insertions(+), 19 deletions(-) diff --git a/kernel/futex.c b/kernel/futex.c index ea6ca0bca525..026594f02bd2 100644 --- a/kernel/futex.c +++ b/kernel/futex.c @@ -1117,12 +1117,15 @@ static void mark_wake_futex(struct wake_q_head *wake_q, struct futex_q *q) q->lock_ptr = NULL; } -static int wake_futex_pi(u32 __user *uaddr, u32 uval, struct futex_q *this) +static int wake_futex_pi(u32 __user *uaddr, u32 uval, struct futex_q *this, + struct futex_hash_bucket *hb) { struct task_struct *new_owner; struct futex_pi_state *pi_state = this->pi_state; u32 uninitialized_var(curval), newval; int ret = 0; + WAKE_Q(wake_q); + bool deboost; if (!pi_state) return -EINVAL; @@ -1173,7 +1176,19 @@ static int wake_futex_pi(u32 __user *uaddr, u32 uval, struct futex_q *this) raw_spin_unlock_irq(&new_owner->pi_lock); raw_spin_unlock(&pi_state->pi_mutex.wait_lock); - rt_mutex_unlock(&pi_state->pi_mutex); + + deboost = rt_mutex_futex_unlock(&pi_state->pi_mutex, &wake_q); + + /* + * First unlock HB so the waiter does not spin on it once he got woken + * up. Second wake up the waiter before the priority is adjusted. If we + * deboost first (and lose our higher priority), then the task might get + * scheduled away before the wake up can take place. + */ + spin_unlock(&hb->lock); + wake_up_q(&wake_q); + if (deboost) + rt_mutex_adjust_prio(current); return 0; } @@ -2410,13 +2425,23 @@ static int futex_unlock_pi(u32 __user *uaddr, unsigned int flags) */ match = futex_top_waiter(hb, &key); if (match) { - ret = wake_futex_pi(uaddr, uval, match); + ret = wake_futex_pi(uaddr, uval, match, hb); + /* + * In case of success wake_futex_pi dropped the hash + * bucket lock. 
+ */ + if (!ret) + goto out_putkey; /* * The atomic access to the futex value generated a * pagefault, so retry the user-access and the wakeup: */ if (ret == -EFAULT) goto pi_faulted; + /* + * wake_futex_pi has detected invalid state. Tell user + * space. + */ goto out_unlock; } @@ -2437,6 +2462,7 @@ static int futex_unlock_pi(u32 __user *uaddr, unsigned int flags) out_unlock: spin_unlock(&hb->lock); +out_putkey: put_futex_key(&key); return ret; diff --git a/kernel/locking/rtmutex.c b/kernel/locking/rtmutex.c index 36573e96a477..0c8f43baead6 100644 --- a/kernel/locking/rtmutex.c +++ b/kernel/locking/rtmutex.c @@ -300,7 +300,7 @@ static void __rt_mutex_adjust_prio(struct task_struct *task) * of task. We do not use the spin_xx_mutex() variants here as we are * outside of the debug path.) */ -static void rt_mutex_adjust_prio(struct task_struct *task) +void rt_mutex_adjust_prio(struct task_struct *task) { unsigned long flags; @@ -957,12 +957,12 @@ static int task_blocks_on_rt_mutex(struct rt_mutex *lock, /* * Wake up the next waiter on the lock. * - * Remove the top waiter from the current tasks pi waiter list and - * wake it up. + * Remove the top waiter from the current tasks pi waiter list, put him on the + * wake head (for later wake up). * * Called with lock->wait_lock held. */ -static void wakeup_next_waiter(struct rt_mutex *lock) +static void wakeup_next_waiter(struct rt_mutex *lock, struct wake_q_head *wqh) { struct rt_mutex_waiter *waiter; unsigned long flags; @@ -996,7 +996,7 @@ static void wakeup_next_waiter(struct rt_mutex *lock) * long as we hold lock->wait_lock. The waiter task needs to * acquire it in order to dequeue the waiter. */ - wake_up_process(waiter->task); + wake_q_add(wqh, waiter->task); } /* @@ -1250,10 +1250,11 @@ static inline int rt_mutex_slowtrylock(struct rt_mutex *lock) } /* - * Slow path to release a rt-mutex: + * Slow path to release a rt-mutex. + * Return whether the current task needs to undo a potential priority boosting. 
*/ -static void __sched -rt_mutex_slowunlock(struct rt_mutex *lock) +static bool __sched rt_mutex_slowunlock(struct rt_mutex *lock, + struct wake_q_head *wqh) { raw_spin_lock(&lock->wait_lock); @@ -1295,7 +1296,7 @@ rt_mutex_slowunlock(struct rt_mutex *lock) while (!rt_mutex_has_waiters(lock)) { /* Drops lock->wait_lock ! */ if (unlock_rt_mutex_safe(lock) == true) - return; + return false; /* Relock the rtmutex and try again */ raw_spin_lock(&lock->wait_lock); } @@ -1304,12 +1305,12 @@ rt_mutex_slowunlock(struct rt_mutex *lock) * The wakeup next waiter path does not suffer from the above * race. See the comments there. */ - wakeup_next_waiter(lock); + wakeup_next_waiter(lock, wqh); raw_spin_unlock(&lock->wait_lock); - /* Undo pi boosting if necessary: */ - rt_mutex_adjust_prio(current); + /* check PI boosting */ + return true; } /* @@ -1360,12 +1361,18 @@ rt_mutex_fasttrylock(struct rt_mutex *lock, static inline void rt_mutex_fastunlock(struct rt_mutex *lock, - void (*slowfn)(struct rt_mutex *lock)) + bool (*slowfn)(struct rt_mutex *lock, + struct wake_q_head *wqh)) { - if (likely(rt_mutex_cmpxchg(lock, current, NULL))) + WAKE_Q(wake_q); + + if (likely(rt_mutex_cmpxchg(lock, current, NULL))) { rt_mutex_deadlock_account_unlock(current); - else - slowfn(lock); + + } else if (slowfn(lock, &wake_q)) { + /* Undo pi boosting if necessary: */ + rt_mutex_adjust_prio(current); + } } /** @@ -1467,6 +1474,23 @@ void __sched rt_mutex_unlock(struct rt_mutex *lock) EXPORT_SYMBOL_GPL(rt_mutex_unlock); /** + * rt_mutex_futex_unlock - Futex variant of rt_mutex_unlock + * @lock: the rt_mutex to be unlocked + * + * Returns: true/false indicating whether priority adjustment is + * required or not. 
+ */ +bool __sched rt_mutex_futex_unlock(struct rt_mutex *lock, + struct wake_q_head *wqh) +{ + if (likely(rt_mutex_cmpxchg(lock, current, NULL))) { + rt_mutex_deadlock_account_unlock(current); + return false; + } + return rt_mutex_slowunlock(lock, wqh); +} + +/** * rt_mutex_destroy - mark a mutex unusable * @lock: the mutex to be destroyed * diff --git a/kernel/locking/rtmutex_common.h b/kernel/locking/rtmutex_common.h index 855212501407..7844f8f0e639 100644 --- a/kernel/locking/rtmutex_common.h +++ b/kernel/locking/rtmutex_common.h @@ -131,6 +131,9 @@ extern int rt_mutex_finish_proxy_lock(struct rt_mutex *lock, struct hrtimer_sleeper *to, struct rt_mutex_waiter *waiter); extern int rt_mutex_timed_futex_lock(struct rt_mutex *l, struct hrtimer_sleeper *to); +extern bool rt_mutex_futex_unlock(struct rt_mutex *lock, + struct wake_q_head *wqh); +extern void rt_mutex_adjust_prio(struct task_struct *task); #ifdef CONFIG_DEBUG_RT_MUTEXES # include "rtmutex-debug.h" -- 2.1.4 ^ permalink raw reply related [flat|nested] 38+ messages in thread
* Re: [PATCH] futex: lower the lock contention on the HB lock during wake up 2015-06-16 19:29 ` [PATCH] futex: lower the lock contention on the HB lock during wake up Sebastian Andrzej Siewior @ 2015-06-16 19:50 ` Davidlohr Bueso 2015-06-17 8:33 ` [PATCH v2] " Sebastian Andrzej Siewior 0 siblings, 1 reply; 38+ messages in thread From: Davidlohr Bueso @ 2015-06-16 19:50 UTC (permalink / raw) To: Sebastian Andrzej Siewior Cc: Thomas Gleixner, Peter Zijlstra, Ingo Molnar, Steven Rostedt, Mike Galbraith, Paul E. McKenney, linux-kernel On Tue, 2015-06-16 at 21:29 +0200, Sebastian Andrzej Siewior wrote: > Davidlohr, would it work for you to replace that patch #1 from your > series with this one? I prefer having two separate patches, thus keeping their own changelog for the change justification. Thanks, Davidlohr
* [PATCH v2] futex: lower the lock contention on the HB lock during wake up 2015-06-16 19:50 ` Davidlohr Bueso @ 2015-06-17 8:33 ` Sebastian Andrzej Siewior 2015-06-17 14:17 ` Mike Galbraith ` (3 more replies) 0 siblings, 4 replies; 38+ messages in thread From: Sebastian Andrzej Siewior @ 2015-06-17 8:33 UTC (permalink / raw) To: Davidlohr Bueso Cc: Thomas Gleixner, Peter Zijlstra, Ingo Molnar, Steven Rostedt, Mike Galbraith, Paul E. McKenney, linux-kernel wake_futex_pi() wakes the task before releasing the hash bucket lock (HB). The first thing the woken-up task usually does is acquire the lock, which requires the HB lock. On SMP systems this leads to blocking on the HB lock, which is released by the owner shortly after. This patch rearranges the unlock path by first releasing the HB lock and then waking up the task. [bigeasy: redo on top of lockless wake-queues] Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> --- * Davidlohr Bueso | 2015-06-16 12:50:26 [-0700]: >I prefer having two separate patches, thus keeping their own changelog >for the change justification. okay, here it is on top of #1. 
kernel/futex.c | 32 +++++++++++++++++++++++--- kernel/locking/rtmutex.c | 51 +++++++++++++++++++++++++++++------------ kernel/locking/rtmutex_common.h | 3 +++ 3 files changed, 68 insertions(+), 18 deletions(-) diff --git a/kernel/futex.c b/kernel/futex.c index ea6ca0bca525..026594f02bd2 100644 --- a/kernel/futex.c +++ b/kernel/futex.c @@ -1117,12 +1117,15 @@ static void mark_wake_futex(struct wake_q_head *wake_q, struct futex_q *q) q->lock_ptr = NULL; } -static int wake_futex_pi(u32 __user *uaddr, u32 uval, struct futex_q *this) +static int wake_futex_pi(u32 __user *uaddr, u32 uval, struct futex_q *this, + struct futex_hash_bucket *hb) { struct task_struct *new_owner; struct futex_pi_state *pi_state = this->pi_state; u32 uninitialized_var(curval), newval; int ret = 0; + WAKE_Q(wake_q); + bool deboost; if (!pi_state) return -EINVAL; @@ -1173,7 +1176,19 @@ static int wake_futex_pi(u32 __user *uaddr, u32 uval, struct futex_q *this) raw_spin_unlock_irq(&new_owner->pi_lock); raw_spin_unlock(&pi_state->pi_mutex.wait_lock); - rt_mutex_unlock(&pi_state->pi_mutex); + + deboost = rt_mutex_futex_unlock(&pi_state->pi_mutex, &wake_q); + + /* + * First unlock HB so the waiter does not spin on it once he got woken + * up. Second wake up the waiter before the priority is adjusted. If we + * deboost first (and lose our higher priority), then the task might get + * scheduled away before the wake up can take place. + */ + spin_unlock(&hb->lock); + wake_up_q(&wake_q); + if (deboost) + rt_mutex_adjust_prio(current); return 0; } @@ -2410,13 +2425,23 @@ static int futex_unlock_pi(u32 __user *uaddr, unsigned int flags) */ match = futex_top_waiter(hb, &key); if (match) { - ret = wake_futex_pi(uaddr, uval, match); + ret = wake_futex_pi(uaddr, uval, match, hb); + /* + * In case of success wake_futex_pi dropped the hash + * bucket lock. 
+ */ + if (!ret) + goto out_putkey; /* * The atomic access to the futex value generated a * pagefault, so retry the user-access and the wakeup: */ if (ret == -EFAULT) goto pi_faulted; + /* + * wake_futex_pi has detected invalid state. Tell user + * space. + */ goto out_unlock; } @@ -2437,6 +2462,7 @@ static int futex_unlock_pi(u32 __user *uaddr, unsigned int flags) out_unlock: spin_unlock(&hb->lock); +out_putkey: put_futex_key(&key); return ret; diff --git a/kernel/locking/rtmutex.c b/kernel/locking/rtmutex.c index 74188d83b065..53fab686a1c2 100644 --- a/kernel/locking/rtmutex.c +++ b/kernel/locking/rtmutex.c @@ -300,7 +300,7 @@ static void __rt_mutex_adjust_prio(struct task_struct *task) * of task. We do not use the spin_xx_mutex() variants here as we are * outside of the debug path.) */ -static void rt_mutex_adjust_prio(struct task_struct *task) +void rt_mutex_adjust_prio(struct task_struct *task) { unsigned long flags; @@ -1244,13 +1244,12 @@ static inline int rt_mutex_slowtrylock(struct rt_mutex *lock) } /* - * Slow path to release a rt-mutex: + * Slow path to release a rt-mutex. + * Return whether the current task needs to undo a potential priority boosting. */ -static void __sched -rt_mutex_slowunlock(struct rt_mutex *lock) +static bool __sched rt_mutex_slowunlock(struct rt_mutex *lock, + struct wake_q_head *wake_q) { - WAKE_Q(wake_q); - raw_spin_lock(&lock->wait_lock); debug_rt_mutex_unlock(lock); @@ -1291,7 +1290,7 @@ rt_mutex_slowunlock(struct rt_mutex *lock) while (!rt_mutex_has_waiters(lock)) { /* Drops lock->wait_lock ! */ if (unlock_rt_mutex_safe(lock) == true) - return; + return false; /* Relock the rtmutex and try again */ raw_spin_lock(&lock->wait_lock); } @@ -1302,13 +1301,12 @@ rt_mutex_slowunlock(struct rt_mutex *lock) * * Queue the next waiter for wakeup once we release the wait_lock. 
*/ - mark_wakeup_next_waiter(&wake_q, lock); + mark_wakeup_next_waiter(wake_q, lock); raw_spin_unlock(&lock->wait_lock); - wake_up_q(&wake_q); - /* Undo pi boosting if necessary: */ - rt_mutex_adjust_prio(current); + /* check PI boosting */ + return true; } /* @@ -1359,12 +1357,18 @@ rt_mutex_fasttrylock(struct rt_mutex *lock, static inline void rt_mutex_fastunlock(struct rt_mutex *lock, - void (*slowfn)(struct rt_mutex *lock)) + bool (*slowfn)(struct rt_mutex *lock, + struct wake_q_head *wqh)) { - if (likely(rt_mutex_cmpxchg(lock, current, NULL))) + WAKE_Q(wake_q); + + if (likely(rt_mutex_cmpxchg(lock, current, NULL))) { rt_mutex_deadlock_account_unlock(current); - else - slowfn(lock); + + } else if (slowfn(lock, &wake_q)) { + /* Undo pi boosting if necessary: */ + rt_mutex_adjust_prio(current); + } } /** @@ -1466,6 +1470,23 @@ void __sched rt_mutex_unlock(struct rt_mutex *lock) EXPORT_SYMBOL_GPL(rt_mutex_unlock); /** + * rt_mutex_futex_unlock - Futex variant of rt_mutex_unlock + * @lock: the rt_mutex to be unlocked + * + * Returns: true/false indicating whether priority adjustment is + * required or not. 
+ */ +bool __sched rt_mutex_futex_unlock(struct rt_mutex *lock, + struct wake_q_head *wqh) +{ + if (likely(rt_mutex_cmpxchg(lock, current, NULL))) { + rt_mutex_deadlock_account_unlock(current); + return false; + } + return rt_mutex_slowunlock(lock, wqh); +} + +/** * rt_mutex_destroy - mark a mutex unusable * @lock: the mutex to be destroyed * diff --git a/kernel/locking/rtmutex_common.h b/kernel/locking/rtmutex_common.h index 855212501407..7844f8f0e639 100644 --- a/kernel/locking/rtmutex_common.h +++ b/kernel/locking/rtmutex_common.h @@ -131,6 +131,9 @@ extern int rt_mutex_finish_proxy_lock(struct rt_mutex *lock, struct hrtimer_sleeper *to, struct rt_mutex_waiter *waiter); extern int rt_mutex_timed_futex_lock(struct rt_mutex *l, struct hrtimer_sleeper *to); +extern bool rt_mutex_futex_unlock(struct rt_mutex *lock, + struct wake_q_head *wqh); +extern void rt_mutex_adjust_prio(struct task_struct *task); #ifdef CONFIG_DEBUG_RT_MUTEXES # include "rtmutex-debug.h" -- 2.1.4 ^ permalink raw reply related [flat|nested] 38+ messages in thread
* Re: [PATCH v2] futex: lower the lock contention on the HB lock during wake up 2015-06-17 8:33 ` [PATCH v2] " Sebastian Andrzej Siewior @ 2015-06-17 14:17 ` Mike Galbraith 2015-06-17 14:28 ` Sebastian Andrzej Siewior 2015-06-18 20:30 ` [tip:sched/core] futex: Lower " tip-bot for Sebastian Andrzej Siewior ` (2 subsequent siblings) 3 siblings, 1 reply; 38+ messages in thread From: Mike Galbraith @ 2015-06-17 14:17 UTC (permalink / raw) To: Sebastian Andrzej Siewior Cc: Davidlohr Bueso, Thomas Gleixner, Peter Zijlstra, Ingo Molnar, Steven Rostedt, Paul E. McKenney, linux-kernel On Wed, 2015-06-17 at 10:33 +0200, Sebastian Andrzej Siewior wrote: > wake_futex_pi() wakes the task before releasing the hash bucket lock > (HB). The first thing the woken up task usually does is to acquire the > lock which requires the HB lock. On SMP Systems this leads to blocking > on the HB lock which is released by the owner shortly after. > This patch rearranges the unlock path by first releasing the HB lock and > then waking up the task. > > [bigeasy: redo ontop of lockless wake-queues] > Signed-off-by: Thomas Gleixner <tglx@linutronix.de> > Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> 4.1-rc8-rt4 contains this via 4.0-rt4, and seems fine on my 64 core DL980. I ran a few iterations of futextests and stockfish, then mixed two loops of futextest at different rt prios, with stockfish also rt, and ltplight tossed in as... crack filler. Box is still doing that, is way too busy, but not griping about it. -Mike
* Re: [PATCH v2] futex: lower the lock contention on the HB lock during wake up 2015-06-17 14:17 ` Mike Galbraith @ 2015-06-17 14:28 ` Sebastian Andrzej Siewior 2015-06-17 14:31 ` Mike Galbraith 2015-06-21 4:35 ` Mike Galbraith 0 siblings, 2 replies; 38+ messages in thread From: Sebastian Andrzej Siewior @ 2015-06-17 14:28 UTC (permalink / raw) To: Mike Galbraith Cc: Davidlohr Bueso, Thomas Gleixner, Peter Zijlstra, Ingo Molnar, Steven Rostedt, Paul E. McKenney, linux-kernel On 06/17/2015 04:17 PM, Mike Galbraith wrote: > On Wed, 2015-06-17 at 10:33 +0200, Sebastian Andrzej Siewior wrote: >> wake_futex_pi() wakes the task before releasing the hash bucket lock >> (HB). The first thing the woken up task usually does is to acquire the >> lock which requires the HB lock. On SMP Systems this leads to blocking >> on the HB lock which is released by the owner shortly after. >> This patch rearranges the unlock path by first releasing the HB lock and >> then waking up the task. >> >> [bigeasy: redo ontop of lockless wake-queues] >> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> >> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> > > 4.1-rc8-rt4 contains this via 4.0-rt4, and seems fine on my 64 core > DL980. I ran a few iterations of futextests and stockfish, then mixed > two loops of futextest at different rt prios, with stockfish also rt, > and ltplight as tossed in as... crack filler. Box is still doing that, > is way too busy, but not griping about it. There are two patches mostly doing the same thing. The patch posted here is a redo ontop of "lockless wake-queues". It does hb-unlock, wakeup, de-boost. The patch merged into -RT is the original approach not using "lockless wake-queues" and performing wakeup, hb-unlock, de-boost. I plan to get into -RT the final solution once it hits upstream. > > -Mike > Sebastian ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCH v2] futex: lower the lock contention on the HB lock during wake up 2015-06-17 14:28 ` Sebastian Andrzej Siewior @ 2015-06-17 14:31 ` Mike Galbraith 2015-06-21 4:35 ` Mike Galbraith 1 sibling, 0 replies; 38+ messages in thread From: Mike Galbraith @ 2015-06-17 14:31 UTC (permalink / raw) To: Sebastian Andrzej Siewior Cc: Davidlohr Bueso, Thomas Gleixner, Peter Zijlstra, Ingo Molnar, Steven Rostedt, Paul E. McKenney, linux-kernel On Wed, 2015-06-17 at 16:28 +0200, Sebastian Andrzej Siewior wrote: > On 06/17/2015 04:17 PM, Mike Galbraith wrote: > > On Wed, 2015-06-17 at 10:33 +0200, Sebastian Andrzej Siewior wrote: > >> wake_futex_pi() wakes the task before releasing the hash bucket lock > >> (HB). The first thing the woken up task usually does is to acquire the > >> lock which requires the HB lock. On SMP Systems this leads to blocking > >> on the HB lock which is released by the owner shortly after. > >> This patch rearranges the unlock path by first releasing the HB lock and > >> then waking up the task. > >> > >> [bigeasy: redo ontop of lockless wake-queues] > >> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> > >> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> > > > > 4.1-rc8-rt4 contains this via 4.0-rt4, and seems fine on my 64 core > > DL980. I ran a few iterations of futextests and stockfish, then mixed > > two loops of futextest at different rt prios, with stockfish also rt, > > and ltplight as tossed in as... crack filler. Box is still doing that, > > is way too busy, but not griping about it. > > There are two patches mostly doing the same thing. The patch posted > here is a redo ontop of "lockless wake-queues". It does hb-unlock, > wakeup, de-boost. The patch merged into -RT is the original approach > not using "lockless wake-queues" and performing wakeup, hb-unlock, > de-boost. > > I plan to get into -RT the final solution once it hits upstream. OK, a glance wasn't enough. Guess I can let tired ole box rest. 
-Mike
* Re: [PATCH v2] futex: lower the lock contention on the HB lock during wake up 2015-06-17 14:28 ` Sebastian Andrzej Siewior 2015-06-17 14:31 ` Mike Galbraith @ 2015-06-21 4:35 ` Mike Galbraith 1 sibling, 0 replies; 38+ messages in thread From: Mike Galbraith @ 2015-06-21 4:35 UTC (permalink / raw) To: Sebastian Andrzej Siewior Cc: Davidlohr Bueso, Thomas Gleixner, Peter Zijlstra, Ingo Molnar, Steven Rostedt, Paul E. McKenney, linux-kernel On Wed, 2015-06-17 at 16:28 +0200, Sebastian Andrzej Siewior wrote: > On 06/17/2015 04:17 PM, Mike Galbraith wrote: > > On Wed, 2015-06-17 at 10:33 +0200, Sebastian Andrzej Siewior wrote: > >> wake_futex_pi() wakes the task before releasing the hash bucket lock > >> (HB). The first thing the woken up task usually does is to acquire the > >> lock which requires the HB lock. On SMP Systems this leads to blocking > >> on the HB lock which is released by the owner shortly after. > >> This patch rearranges the unlock path by first releasing the HB lock and > >> then waking up the task. > >> > >> [bigeasy: redo ontop of lockless wake-queues] > >> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> > >> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> > > > > 4.1-rc8-rt4 contains this via 4.0-rt4, and seems fine on my 64 core > > DL980. I ran a few iterations of futextests and stockfish, then mixed > > two loops of futextest at different rt prios, with stockfish also rt, > > and ltplight as tossed in as... crack filler. Box is still doing that, > > is way too busy, but not griping about it. > > There are two patches mostly doing the same thing. The patch posted > here is a redo ontop of "lockless wake-queues". It does hb-unlock, > wakeup, de-boost. The patch merged into -RT is the original approach > not using "lockless wake-queues" and performing wakeup, hb-unlock, > de-boost. > > I plan to get into -RT the final solution once it hits upstream. I plugged patch1 and tip version into rt and beat it, seems solid. 
Converting the rest of rtmutex.c to use wake queues with ->save_state to select wake function went less well. Kernel does a good impersonation of a working kernel until I beat it up, then it loses wakeups. Hohum, so much for yet another early morning tinker session. -Mike
* [tip:sched/core] futex: Lower the lock contention on the HB lock during wake up 2015-06-17 8:33 ` [PATCH v2] " Sebastian Andrzej Siewior 2015-06-17 14:17 ` Mike Galbraith @ 2015-06-18 20:30 ` tip-bot for Sebastian Andrzej Siewior 2015-06-19 17:51 ` [PATCH v2] futex: lower " Kevin Hilman 2015-06-19 19:33 ` [tip:sched/locking] futex: Lower " tip-bot for Sebastian Andrzej Siewior 3 siblings, 0 replies; 38+ messages in thread From: tip-bot for Sebastian Andrzej Siewior @ 2015-06-18 20:30 UTC (permalink / raw) To: linux-tip-commits Cc: rostedt, tglx, mingo, hpa, bigeasy, dave, peterz, linux-kernel, umgwanakikbuti, paulmck Commit-ID: 881bd58d6e9eba4240b9dbc49fdc03a3374d7508 Gitweb: http://git.kernel.org/tip/881bd58d6e9eba4240b9dbc49fdc03a3374d7508 Author: Sebastian Andrzej Siewior <bigeasy@linutronix.de> AuthorDate: Wed, 17 Jun 2015 10:33:50 +0200 Committer: Thomas Gleixner <tglx@linutronix.de> CommitDate: Thu, 18 Jun 2015 22:27:46 +0200 futex: Lower the lock contention on the HB lock during wake up wake_futex_pi() wakes the task before releasing the hash bucket lock (HB). The first thing the woken up task usually does is to acquire the lock which requires the HB lock. On SMP Systems this leads to blocking on the HB lock which is released by the owner shortly after. This patch rearranges the unlock path by first releasing the HB lock and then waking up the task. Originally-from: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Mike Galbraith <umgwanakikbuti@gmail.com> Cc: Paul E. 
McKenney <paulmck@linux.vnet.ibm.com> Cc: Davidlohr Bueso <dave@stgolabs.net> Link: http://lkml.kernel.org/r/20150617083350.GA2433@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de> --- kernel/futex.c | 32 +++++++++++++++++++++++--- kernel/locking/rtmutex.c | 51 +++++++++++++++++++++++++++++------------ kernel/locking/rtmutex_common.h | 3 +++ 3 files changed, 68 insertions(+), 18 deletions(-) diff --git a/kernel/futex.c b/kernel/futex.c index f9984c3..a0cf6fa 100644 --- a/kernel/futex.c +++ b/kernel/futex.c @@ -1117,11 +1117,14 @@ static void mark_wake_futex(struct wake_q_head *wake_q, struct futex_q *q) q->lock_ptr = NULL; } -static int wake_futex_pi(u32 __user *uaddr, u32 uval, struct futex_q *this) +static int wake_futex_pi(u32 __user *uaddr, u32 uval, struct futex_q *this, + struct futex_hash_bucket *hb) { struct task_struct *new_owner; struct futex_pi_state *pi_state = this->pi_state; u32 uninitialized_var(curval), newval; + WAKE_Q(wake_q); + bool deboost; int ret = 0; if (!pi_state) @@ -1173,7 +1176,19 @@ static int wake_futex_pi(u32 __user *uaddr, u32 uval, struct futex_q *this) raw_spin_unlock_irq(&new_owner->pi_lock); raw_spin_unlock(&pi_state->pi_mutex.wait_lock); - rt_mutex_unlock(&pi_state->pi_mutex); + + deboost = rt_mutex_futex_unlock(&pi_state->pi_mutex, &wake_q); + + /* + * First unlock HB so the waiter does not spin on it once he got woken + * up. Second wake up the waiter before the priority is adjusted. If we + * deboost first (and lose our higher priority), then the task might get + * scheduled away before the wake up can take place. + */ + spin_unlock(&hb->lock); + wake_up_q(&wake_q); + if (deboost) + rt_mutex_adjust_prio(current); return 0; } @@ -2413,13 +2428,23 @@ retry: */ match = futex_top_waiter(hb, &key); if (match) { - ret = wake_futex_pi(uaddr, uval, match); + ret = wake_futex_pi(uaddr, uval, match, hb); + /* + * In case of success wake_futex_pi dropped the hash + * bucket lock. 
+ */ + if (!ret) + goto out_putkey; /* * The atomic access to the futex value generated a * pagefault, so retry the user-access and the wakeup: */ if (ret == -EFAULT) goto pi_faulted; + /* + * wake_futex_pi has detected invalid state. Tell user + * space. + */ goto out_unlock; } @@ -2440,6 +2465,7 @@ retry: out_unlock: spin_unlock(&hb->lock); +out_putkey: put_futex_key(&key); return ret; diff --git a/kernel/locking/rtmutex.c b/kernel/locking/rtmutex.c index 44ee8f8..1130130 100644 --- a/kernel/locking/rtmutex.c +++ b/kernel/locking/rtmutex.c @@ -300,7 +300,7 @@ static void __rt_mutex_adjust_prio(struct task_struct *task) * of task. We do not use the spin_xx_mutex() variants here as we are * outside of the debug path.) */ -static void rt_mutex_adjust_prio(struct task_struct *task) +void rt_mutex_adjust_prio(struct task_struct *task) { unsigned long flags; @@ -1247,13 +1247,12 @@ static inline int rt_mutex_slowtrylock(struct rt_mutex *lock) } /* - * Slow path to release a rt-mutex: + * Slow path to release a rt-mutex. + * Return whether the current task needs to undo a potential priority boosting. */ -static void __sched -rt_mutex_slowunlock(struct rt_mutex *lock) +static bool __sched rt_mutex_slowunlock(struct rt_mutex *lock, + struct wake_q_head *wake_q) { - WAKE_Q(wake_q); - raw_spin_lock(&lock->wait_lock); debug_rt_mutex_unlock(lock); @@ -1294,7 +1293,7 @@ rt_mutex_slowunlock(struct rt_mutex *lock) while (!rt_mutex_has_waiters(lock)) { /* Drops lock->wait_lock ! */ if (unlock_rt_mutex_safe(lock) == true) - return; + return false; /* Relock the rtmutex and try again */ raw_spin_lock(&lock->wait_lock); } @@ -1305,13 +1304,12 @@ rt_mutex_slowunlock(struct rt_mutex *lock) * * Queue the next waiter for wakeup once we release the wait_lock. 
*/ - mark_wakeup_next_waiter(&wake_q, lock); + mark_wakeup_next_waiter(wake_q, lock); raw_spin_unlock(&lock->wait_lock); - wake_up_q(&wake_q); - /* Undo pi boosting if necessary: */ - rt_mutex_adjust_prio(current); + /* check PI boosting */ + return true; } /* @@ -1362,12 +1360,18 @@ rt_mutex_fasttrylock(struct rt_mutex *lock, static inline void rt_mutex_fastunlock(struct rt_mutex *lock, - void (*slowfn)(struct rt_mutex *lock)) + bool (*slowfn)(struct rt_mutex *lock, + struct wake_q_head *wqh)) { - if (likely(rt_mutex_cmpxchg(lock, current, NULL))) + WAKE_Q(wake_q); + + if (likely(rt_mutex_cmpxchg(lock, current, NULL))) { rt_mutex_deadlock_account_unlock(current); - else - slowfn(lock); + + } else if (slowfn(lock, &wake_q)) { + /* Undo pi boosting if necessary: */ + rt_mutex_adjust_prio(current); + } } /** @@ -1462,6 +1466,23 @@ void __sched rt_mutex_unlock(struct rt_mutex *lock) EXPORT_SYMBOL_GPL(rt_mutex_unlock); /** + * rt_mutex_futex_unlock - Futex variant of rt_mutex_unlock + * @lock: the rt_mutex to be unlocked + * + * Returns: true/false indicating whether priority adjustment is + * required or not. 
+ */ +bool __sched rt_mutex_futex_unlock(struct rt_mutex *lock, + struct wake_q_head *wqh) +{ + if (likely(rt_mutex_cmpxchg(lock, current, NULL))) { + rt_mutex_deadlock_account_unlock(current); + return false; + } + return rt_mutex_slowunlock(lock, wqh); +} + +/** * rt_mutex_destroy - mark a mutex unusable * @lock: the mutex to be destroyed * diff --git a/kernel/locking/rtmutex_common.h b/kernel/locking/rtmutex_common.h index 8552125..7844f8f 100644 --- a/kernel/locking/rtmutex_common.h +++ b/kernel/locking/rtmutex_common.h @@ -131,6 +131,9 @@ extern int rt_mutex_finish_proxy_lock(struct rt_mutex *lock, struct hrtimer_sleeper *to, struct rt_mutex_waiter *waiter); extern int rt_mutex_timed_futex_lock(struct rt_mutex *l, struct hrtimer_sleeper *to); +extern bool rt_mutex_futex_unlock(struct rt_mutex *lock, + struct wake_q_head *wqh); +extern void rt_mutex_adjust_prio(struct task_struct *task); #ifdef CONFIG_DEBUG_RT_MUTEXES # include "rtmutex-debug.h" ^ permalink raw reply related [flat|nested] 38+ messages in thread
* Re: [PATCH v2] futex: lower the lock contention on the HB lock during wake up 2015-06-17 8:33 ` [PATCH v2] " Sebastian Andrzej Siewior 2015-06-17 14:17 ` Mike Galbraith 2015-06-18 20:30 ` [tip:sched/core] futex: Lower " tip-bot for Sebastian Andrzej Siewior @ 2015-06-19 17:51 ` Kevin Hilman 2015-06-19 18:54 ` Thomas Gleixner 2015-06-19 19:33 ` [tip:sched/locking] futex: Lower " tip-bot for Sebastian Andrzej Siewior 3 siblings, 1 reply; 38+ messages in thread From: Kevin Hilman @ 2015-06-19 17:51 UTC (permalink / raw) To: Sebastian Andrzej Siewior Cc: Davidlohr Bueso, Thomas Gleixner, Peter Zijlstra, Ingo Molnar, Steven Rostedt, Mike Galbraith, Paul E. McKenney, lkml, Tyler Baker, Olof Johansson, Tony Lindgren, linux-omap, Santosh Shilimkar, Felipe Balbi, Nishanth Menon On Wed, Jun 17, 2015 at 1:33 AM, Sebastian Andrzej Siewior <bigeasy@linutronix.de> wrote: > wake_futex_pi() wakes the task before releasing the hash bucket lock > (HB). The first thing the woken up task usually does is to acquire the > lock which requires the HB lock. On SMP Systems this leads to blocking > on the HB lock which is released by the owner shortly after. > This patch rearranges the unlock path by first releasing the HB lock and > then waking up the task. > > [bigeasy: redo ontop of lockless wake-queues] > Signed-off-by: Thomas Gleixner <tglx@linutronix.de> > Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> > --- > * Davidlohr Bueso | 2015-06-16 12:50:26 [-0700]: > >>I prefer having two separate patches, thus keeping their own changelog >>for the change justification. > > okay, here it is on top of #1. A handful of boot test failures on ARM/OMAP were found by kernelci.org in next-20150619[1] and were bisected down to this patch, which hit next-20150619 in the form of commit 881bd58d6e9e (futex: Lower the lock contention on the HB lock during wake up). I confirmed that reverting that patch on top of next-20150619 gets things booting again for the affected platforms. 
I haven't debugged this any further, but full boot logs are available for the boot failures[2][3] and the linux-omap list and maintainer are Cc'd here to help investigate further if needed. Kevin [1] http://kernelci.org/boot/all/job/next/kernel/next-20150619/ [2] http://storage.kernelci.org/next/next-20150619/arm-multi_v7_defconfig/lab-khilman/boot-omap5-uevm.html [3] http://storage.kernelci.org/next/next-20150619/arm-omap2plus_defconfig/lab-tbaker/boot-omap3-beagle-xm.html -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/ ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCH v2] futex: lower the lock contention on the HB lock during wake up 2015-06-19 17:51 ` [PATCH v2] futex: lower " Kevin Hilman @ 2015-06-19 18:54 ` Thomas Gleixner 2015-06-19 19:32 ` Kevin Hilman 0 siblings, 1 reply; 38+ messages in thread From: Thomas Gleixner @ 2015-06-19 18:54 UTC (permalink / raw) To: Kevin Hilman Cc: Sebastian Andrzej Siewior, Davidlohr Bueso, Peter Zijlstra, Ingo Molnar, Steven Rostedt, Mike Galbraith, Paul E. McKenney, lkml, Tyler Baker, Olof Johansson, Tony Lindgren, linux-omap, Santosh Shilimkar, Felipe Balbi, Nishanth Menon On Fri, 19 Jun 2015, Kevin Hilman wrote: > On Wed, Jun 17, 2015 at 1:33 AM, Sebastian Andrzej Siewior > A handful of boot test failures on ARM/OMAP were found by kernelci.org > in next-20150619[1] and were bisected down to this patch, which hit > next-20150619 in the form of commit 881bd58d6e9e (futex: Lower the > lock contention on the HB lock during wake up). I confirmed that > reverting that patch on top of next-20150619 gets things booting again > for the affected platforms. > > I haven't debugged this any further, but full boot logs are available > for the boot failures[2][3] and the linux-omap list and maintainer are > Cc'd here to help investigate further if needed. Found it. Dunno, how I missed that one. Fix below. 
Thanks, tglx --- diff --git a/kernel/locking/rtmutex.c b/kernel/locking/rtmutex.c index 10dbeb6fe96f..5674b073473c 100644 --- a/kernel/locking/rtmutex.c +++ b/kernel/locking/rtmutex.c @@ -1365,9 +1365,14 @@ rt_mutex_fastunlock(struct rt_mutex *lock, if (likely(rt_mutex_cmpxchg(lock, current, NULL))) { rt_mutex_deadlock_account_unlock(current); - } else if (slowfn(lock, &wake_q)) { + } else { + bool deboost = slowfn(lock, &wake_q); + + wake_up_q(&wake_q); + /* Undo pi boosting if necessary: */ - rt_mutex_adjust_prio(current); + if (deboost) + rt_mutex_adjust_prio(current); } } ^ permalink raw reply related [flat|nested] 38+ messages in thread
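The hang Thomas's fix addresses can be shown in miniature (an illustrative C sketch with invented names — `slow_unlock`, `fastunlock_*` and the structs here are not the kernel API): once the slow unlock path merely *queues* the next waiter instead of waking it, every caller is obliged to drain the queue afterwards; the pre-fix `rt_mutex_fastunlock()` forgot that step, so the waiter slept forever.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model: one pending waiter instead of a real wake queue. */
struct task { int woken; };
struct wake_q { struct task *pending; };

/* Slow path: queue the waiter under wait_lock, report "deboost needed".
 * Crucially, it no longer wakes anyone itself. */
static int slow_unlock(struct wake_q *q, struct task *waiter)
{
    q->pending = waiter;
    return 1;
}

/* Pre-fix caller: never drains the queue -- the waiter hangs. */
static void fastunlock_buggy(struct wake_q *q, struct task *waiter)
{
    slow_unlock(q, waiter);
    /* missing: wake the queued task */
}

/* Post-fix caller: drain the wake queue before any deboost. */
static void fastunlock_fixed(struct wake_q *q, struct task *waiter)
{
    if (slow_unlock(q, waiter))
        q->pending->woken = 1;   /* stands in for wake_up_q() */
}
```

The asymmetry is the whole bug: queueing without draining leaves the task asleep, which on a real system shows up as the ARM/OMAP boot hangs reported above.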
* Re: [PATCH v2] futex: lower the lock contention on the HB lock during wake up 2015-06-19 18:54 ` Thomas Gleixner @ 2015-06-19 19:32 ` Kevin Hilman 0 siblings, 0 replies; 38+ messages in thread From: Kevin Hilman @ 2015-06-19 19:32 UTC (permalink / raw) To: Thomas Gleixner Cc: Sebastian Andrzej Siewior, Davidlohr Bueso, Peter Zijlstra, Ingo Molnar, Steven Rostedt, Mike Galbraith, Paul E. McKenney, lkml, Tyler Baker, Olof Johansson, Tony Lindgren, linux-omap, Santosh Shilimkar, Felipe Balbi, Nishanth Menon Thomas Gleixner <tglx@linutronix.de> writes: > On Fri, 19 Jun 2015, Kevin Hilman wrote: >> On Wed, Jun 17, 2015 at 1:33 AM, Sebastian Andrzej Siewior >> A handful of boot test failures on ARM/OMAP were found by kernelci.org >> in next-20150619[1] and were bisected down to this patch, which hit >> next-20150619 in the form of commit 881bd58d6e9e (futex: Lower the >> lock contention on the HB lock during wake up). I confirmed that >> reverting that patch on top of next-20150619 gets things booting again >> for the affected platforms. >> >> I haven't debugged this any further, but full boot logs are available >> for the boot failures[2][3] and the linux-omap list and maintainer are >> Cc'd here to help investigate further if needed. > > Found it. Dunno, how I missed that one. Fix below. > Yup, that fix on top of next-20150619 gets the two OMAP platforms booting again. Tested-by: Kevin Hilman <khilman@linaro.org> Kevin ^ permalink raw reply [flat|nested] 38+ messages in thread
* [tip:sched/locking] futex: Lower the lock contention on the HB lock during wake up 2015-06-17 8:33 ` [PATCH v2] " Sebastian Andrzej Siewior ` (2 preceding siblings ...) 2015-06-19 17:51 ` [PATCH v2] futex: lower " Kevin Hilman @ 2015-06-19 19:33 ` tip-bot for Sebastian Andrzej Siewior 3 siblings, 0 replies; 38+ messages in thread From: tip-bot for Sebastian Andrzej Siewior @ 2015-06-19 19:33 UTC (permalink / raw) To: linux-tip-commits Cc: umgwanakikbuti, tglx, linux-kernel, paulmck, bigeasy, rostedt, peterz, mingo, hpa, dave Commit-ID: 802ab58da74bb49ab348d2872190ef26ddc1a3e0 Gitweb: http://git.kernel.org/tip/802ab58da74bb49ab348d2872190ef26ddc1a3e0 Author: Sebastian Andrzej Siewior <bigeasy@linutronix.de> AuthorDate: Wed, 17 Jun 2015 10:33:50 +0200 Committer: Thomas Gleixner <tglx@linutronix.de> CommitDate: Fri, 19 Jun 2015 21:26:38 +0200 futex: Lower the lock contention on the HB lock during wake up wake_futex_pi() wakes the task before releasing the hash bucket lock (HB). The first thing the woken up task usually does is to acquire the lock which requires the HB lock. On SMP Systems this leads to blocking on the HB lock which is released by the owner shortly after. This patch rearranges the unlock path by first releasing the HB lock and then waking up the task. [ tglx: Fixed up the rtmutex unlock path ] Originally-from: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Mike Galbraith <umgwanakikbuti@gmail.com> Cc: Paul E. 
McKenney <paulmck@linux.vnet.ibm.com> Cc: Davidlohr Bueso <dave@stgolabs.net> Link: http://lkml.kernel.org/r/20150617083350.GA2433@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de> --- kernel/futex.c | 32 ++++++++++++++++++++--- kernel/locking/rtmutex.c | 56 ++++++++++++++++++++++++++++++----------- kernel/locking/rtmutex_common.h | 3 +++ 3 files changed, 73 insertions(+), 18 deletions(-) diff --git a/kernel/futex.c b/kernel/futex.c index f9984c3..a0cf6fa 100644 --- a/kernel/futex.c +++ b/kernel/futex.c @@ -1117,11 +1117,14 @@ static void mark_wake_futex(struct wake_q_head *wake_q, struct futex_q *q) q->lock_ptr = NULL; } -static int wake_futex_pi(u32 __user *uaddr, u32 uval, struct futex_q *this) +static int wake_futex_pi(u32 __user *uaddr, u32 uval, struct futex_q *this, + struct futex_hash_bucket *hb) { struct task_struct *new_owner; struct futex_pi_state *pi_state = this->pi_state; u32 uninitialized_var(curval), newval; + WAKE_Q(wake_q); + bool deboost; int ret = 0; if (!pi_state) @@ -1173,7 +1176,19 @@ static int wake_futex_pi(u32 __user *uaddr, u32 uval, struct futex_q *this) raw_spin_unlock_irq(&new_owner->pi_lock); raw_spin_unlock(&pi_state->pi_mutex.wait_lock); - rt_mutex_unlock(&pi_state->pi_mutex); + + deboost = rt_mutex_futex_unlock(&pi_state->pi_mutex, &wake_q); + + /* + * First unlock HB so the waiter does not spin on it once he got woken + * up. Second wake up the waiter before the priority is adjusted. If we + * deboost first (and lose our higher priority), then the task might get + * scheduled away before the wake up can take place. + */ + spin_unlock(&hb->lock); + wake_up_q(&wake_q); + if (deboost) + rt_mutex_adjust_prio(current); return 0; } @@ -2413,13 +2428,23 @@ retry: */ match = futex_top_waiter(hb, &key); if (match) { - ret = wake_futex_pi(uaddr, uval, match); + ret = wake_futex_pi(uaddr, uval, match, hb); + /* + * In case of success wake_futex_pi dropped the hash + * bucket lock. 
+ */ + if (!ret) + goto out_putkey; /* * The atomic access to the futex value generated a * pagefault, so retry the user-access and the wakeup: */ if (ret == -EFAULT) goto pi_faulted; + /* + * wake_futex_pi has detected invalid state. Tell user + * space. + */ goto out_unlock; } @@ -2440,6 +2465,7 @@ retry: out_unlock: spin_unlock(&hb->lock); +out_putkey: put_futex_key(&key); return ret; diff --git a/kernel/locking/rtmutex.c b/kernel/locking/rtmutex.c index 44ee8f8..0add724 100644 --- a/kernel/locking/rtmutex.c +++ b/kernel/locking/rtmutex.c @@ -300,7 +300,7 @@ static void __rt_mutex_adjust_prio(struct task_struct *task) * of task. We do not use the spin_xx_mutex() variants here as we are * outside of the debug path.) */ -static void rt_mutex_adjust_prio(struct task_struct *task) +void rt_mutex_adjust_prio(struct task_struct *task) { unsigned long flags; @@ -1247,13 +1247,12 @@ static inline int rt_mutex_slowtrylock(struct rt_mutex *lock) } /* - * Slow path to release a rt-mutex: + * Slow path to release a rt-mutex. + * Return whether the current task needs to undo a potential priority boosting. */ -static void __sched -rt_mutex_slowunlock(struct rt_mutex *lock) +static bool __sched rt_mutex_slowunlock(struct rt_mutex *lock, + struct wake_q_head *wake_q) { - WAKE_Q(wake_q); - raw_spin_lock(&lock->wait_lock); debug_rt_mutex_unlock(lock); @@ -1294,7 +1293,7 @@ rt_mutex_slowunlock(struct rt_mutex *lock) while (!rt_mutex_has_waiters(lock)) { /* Drops lock->wait_lock ! */ if (unlock_rt_mutex_safe(lock) == true) - return; + return false; /* Relock the rtmutex and try again */ raw_spin_lock(&lock->wait_lock); } @@ -1305,13 +1304,12 @@ rt_mutex_slowunlock(struct rt_mutex *lock) * * Queue the next waiter for wakeup once we release the wait_lock. 
*/ - mark_wakeup_next_waiter(&wake_q, lock); + mark_wakeup_next_waiter(wake_q, lock); raw_spin_unlock(&lock->wait_lock); - wake_up_q(&wake_q); - /* Undo pi boosting if necessary: */ - rt_mutex_adjust_prio(current); + /* check PI boosting */ + return true; } /* @@ -1362,12 +1360,23 @@ rt_mutex_fasttrylock(struct rt_mutex *lock, static inline void rt_mutex_fastunlock(struct rt_mutex *lock, - void (*slowfn)(struct rt_mutex *lock)) + bool (*slowfn)(struct rt_mutex *lock, + struct wake_q_head *wqh)) { - if (likely(rt_mutex_cmpxchg(lock, current, NULL))) + WAKE_Q(wake_q); + + if (likely(rt_mutex_cmpxchg(lock, current, NULL))) { rt_mutex_deadlock_account_unlock(current); - else - slowfn(lock); + + } else { + bool deboost = slowfn(lock, &wake_q); + + wake_up_q(&wake_q); + + /* Undo pi boosting if necessary: */ + if (deboost) + rt_mutex_adjust_prio(current); + } } /** @@ -1462,6 +1471,23 @@ void __sched rt_mutex_unlock(struct rt_mutex *lock) EXPORT_SYMBOL_GPL(rt_mutex_unlock); /** + * rt_mutex_futex_unlock - Futex variant of rt_mutex_unlock + * @lock: the rt_mutex to be unlocked + * + * Returns: true/false indicating whether priority adjustment is + * required or not. 
+ */ +bool __sched rt_mutex_futex_unlock(struct rt_mutex *lock, + struct wake_q_head *wqh) +{ + if (likely(rt_mutex_cmpxchg(lock, current, NULL))) { + rt_mutex_deadlock_account_unlock(current); + return false; + } + return rt_mutex_slowunlock(lock, wqh); +} + +/** * rt_mutex_destroy - mark a mutex unusable * @lock: the mutex to be destroyed * diff --git a/kernel/locking/rtmutex_common.h b/kernel/locking/rtmutex_common.h index 8552125..7844f8f 100644 --- a/kernel/locking/rtmutex_common.h +++ b/kernel/locking/rtmutex_common.h @@ -131,6 +131,9 @@ extern int rt_mutex_finish_proxy_lock(struct rt_mutex *lock, struct hrtimer_sleeper *to, struct rt_mutex_waiter *waiter); extern int rt_mutex_timed_futex_lock(struct rt_mutex *l, struct hrtimer_sleeper *to); +extern bool rt_mutex_futex_unlock(struct rt_mutex *lock, + struct wake_q_head *wqh); +extern void rt_mutex_adjust_prio(struct task_struct *task); #ifdef CONFIG_DEBUG_RT_MUTEXES # include "rtmutex-debug.h" ^ permalink raw reply related [flat|nested] 38+ messages in thread
* [tip:sched/core] locking/rtmutex: Implement lockless top-waiter wakeup 2015-05-19 17:24 ` [PATCH 1/4] locking/rtmutex: Implement lockless top-waiter wakeup Davidlohr Bueso 2015-06-05 12:35 ` Thomas Gleixner 2015-06-16 19:29 ` [PATCH] futex: lower the lock contention on the HB lock during wake up Sebastian Andrzej Siewior @ 2015-06-18 20:30 ` tip-bot for Davidlohr Bueso 2 siblings, 0 replies; 38+ messages in thread From: tip-bot for Davidlohr Bueso @ 2015-06-18 20:30 UTC (permalink / raw) To: linux-tip-commits Cc: umgwanakikbuti, rostedt, hpa, tglx, bigeasy, linux-kernel, peterz, paulmck, dave, dbueso, mingo Commit-ID: 45ab4effc3bee6f8a5cb05652b7bb895ec5b6a7a Gitweb: http://git.kernel.org/tip/45ab4effc3bee6f8a5cb05652b7bb895ec5b6a7a Author: Davidlohr Bueso <dave@stgolabs.net> AuthorDate: Tue, 19 May 2015 10:24:55 -0700 Committer: Thomas Gleixner <tglx@linutronix.de> CommitDate: Thu, 18 Jun 2015 22:27:46 +0200 locking/rtmutex: Implement lockless top-waiter wakeup Mark the task for later wakeup after the wait_lock has been released. This way, once the next task is awoken, it will have a better chance of finding the wait_lock free when it continues executing in __rt_mutex_slowlock() when trying to acquire the rtmutex, calling try_to_take_rt_mutex(). In contended scenarios, other tasks attempting to take the lock may acquire it first, right after the wait_lock is released, but (a) this can also occur with the current code, as it relies on the spinlock fairness, and (b) we are dealing with the top-waiter anyway, so it will always take the lock next. Signed-off-by: Davidlohr Bueso <dbueso@suse.de> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Mike Galbraith <umgwanakikbuti@gmail.com> Cc: Paul E. 
McKenney <paulmck@linux.vnet.ibm.com> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1432056298-18738-2-git-send-email-dave@stgolabs.net Signed-off-by: Thomas Gleixner <tglx@linutronix.de> --- kernel/locking/rtmutex.c | 21 ++++++++++----------- 1 file changed, 10 insertions(+), 11 deletions(-) diff --git a/kernel/locking/rtmutex.c b/kernel/locking/rtmutex.c index b025295..44ee8f8 100644 --- a/kernel/locking/rtmutex.c +++ b/kernel/locking/rtmutex.c @@ -955,14 +955,13 @@ static int task_blocks_on_rt_mutex(struct rt_mutex *lock, } /* - * Wake up the next waiter on the lock. - * * Remove the top waiter from the current tasks pi waiter list and - * wake it up. + * queue it up. * * Called with lock->wait_lock held. */ -static void wakeup_next_waiter(struct rt_mutex *lock) +static void mark_wakeup_next_waiter(struct wake_q_head *wake_q, + struct rt_mutex *lock) { struct rt_mutex_waiter *waiter; unsigned long flags; @@ -991,12 +990,7 @@ static void wakeup_next_waiter(struct rt_mutex *lock) raw_spin_unlock_irqrestore(¤t->pi_lock, flags); - /* - * It's safe to dereference waiter as it cannot go away as - * long as we hold lock->wait_lock. The waiter task needs to - * acquire it in order to dequeue the waiter. - */ - wake_up_process(waiter->task); + wake_q_add(wake_q, waiter->task); } /* @@ -1258,6 +1252,8 @@ static inline int rt_mutex_slowtrylock(struct rt_mutex *lock) static void __sched rt_mutex_slowunlock(struct rt_mutex *lock) { + WAKE_Q(wake_q); + raw_spin_lock(&lock->wait_lock); debug_rt_mutex_unlock(lock); @@ -1306,10 +1302,13 @@ rt_mutex_slowunlock(struct rt_mutex *lock) /* * The wakeup next waiter path does not suffer from the above * race. See the comments there. + * + * Queue the next waiter for wakeup once we release the wait_lock. 
*/ - wakeup_next_waiter(lock); + mark_wakeup_next_waiter(&wake_q, lock); raw_spin_unlock(&lock->wait_lock); + wake_up_q(&wake_q); /* Undo pi boosting if necessary: */ rt_mutex_adjust_prio(current); ^ permalink raw reply related [flat|nested] 38+ messages in thread
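The deferred-wakeup technique this patch introduces can be modelled outside the kernel. The sketch below is a plain-C toy, not the kernel's `wake_q_head` implementation — the struct and function names only mirror it: tasks are queued on a local list while the wait_lock is held, and the actual wakeups happen only after the lock is dropped, so a woken task does not immediately collide with the waker on the wait_lock.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model of a deferred wake queue: a singly linked list built up
 * while the "wait_lock" is held and drained only after it is dropped. */
struct task { int woken; struct task *next; };
struct wake_q { struct task *first, **lastp; };

static void wake_q_init(struct wake_q *q)
{
    q->first = NULL;
    q->lastp = &q->first;
}

/* Called with the wait_lock held: record the task, wake nothing yet. */
static void wake_q_add(struct wake_q *q, struct task *t)
{
    t->next = NULL;
    *q->lastp = t;
    q->lastp = &t->next;
}

/* Called after the wait_lock is released: perform the real wakeups
 * in queue order. */
static void wake_up_q(struct wake_q *q)
{
    for (struct task *t = q->first; t; t = t->next)
        t->woken = 1;           /* stands in for wake_up_process() */
    wake_q_init(q);
}
```

The point of the split is visible in the interface itself: `wake_q_add()` is cheap and safe under a spinlock, while the potentially expensive scheduling work in `wake_up_q()` runs lock-free.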
* [PATCH 2/4] locking/rtmutex: Use cmp-cmpxchg 2015-05-19 17:24 [PATCH -tip 0/4] rtmutex: Spin on owner Davidlohr Bueso 2015-05-19 17:24 ` [PATCH 1/4] locking/rtmutex: Implement lockless top-waiter wakeup Davidlohr Bueso @ 2015-05-19 17:24 ` Davidlohr Bueso 2015-06-05 12:38 ` Thomas Gleixner 2015-05-19 17:24 ` [PATCH 3/4] locking/rtmutex: Update stale plist comments Davidlohr Bueso ` (2 subsequent siblings) 4 siblings, 1 reply; 38+ messages in thread From: Davidlohr Bueso @ 2015-05-19 17:24 UTC (permalink / raw) To: Thomas Gleixner, Peter Zijlstra, Ingo Molnar Cc: Steven Rostedt, Mike Galbraith, Paul E. McKenney, Sebastian Andrzej Siewior, Davidlohr Bueso, linux-kernel, Davidlohr Bueso Avoid unnecessary cmpxchg calls, all of our other locks use it as well. Signed-off-by: Davidlohr Bueso <dbueso@suse.de> --- kernel/locking/rtmutex.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/kernel/locking/rtmutex.c b/kernel/locking/rtmutex.c index 74188d8..1d5cc9d 100644 --- a/kernel/locking/rtmutex.c +++ b/kernel/locking/rtmutex.c @@ -74,7 +74,9 @@ static void fixup_rt_mutex_waiters(struct rt_mutex *lock) * set up. */ #ifndef CONFIG_DEBUG_RT_MUTEXES -# define rt_mutex_cmpxchg(l,c,n) (cmpxchg(&l->owner, c, n) == c) +# define rt_mutex_cmpxchg(l,c,n) \ + (l->owner == c && cmpxchg(&l->owner, c, n) == c) + static inline void mark_rt_mutex_waiters(struct rt_mutex *lock) { unsigned long owner, *p = (unsigned long *) &lock->owner; -- 2.1.4 ^ permalink raw reply related [flat|nested] 38+ messages in thread
* Re: [PATCH 2/4] locking/rtmutex: Use cmp-cmpxchg 2015-05-19 17:24 ` [PATCH 2/4] locking/rtmutex: Use cmp-cmpxchg Davidlohr Bueso @ 2015-06-05 12:38 ` Thomas Gleixner 2015-06-06 15:27 ` Davidlohr Bueso 0 siblings, 1 reply; 38+ messages in thread From: Thomas Gleixner @ 2015-06-05 12:38 UTC (permalink / raw) To: Davidlohr Bueso Cc: Peter Zijlstra, Ingo Molnar, Steven Rostedt, Mike Galbraith, Paul E. McKenney, Sebastian Andrzej Siewior, linux-kernel, Davidlohr Bueso On Tue, 19 May 2015, Davidlohr Bueso wrote: > Avoid unnecessary cmpxchg calls, all of our other locks > use it as well. > > Signed-off-by: Davidlohr Bueso <dbueso@suse.de> > --- > kernel/locking/rtmutex.c | 4 +++- > 1 file changed, 3 insertions(+), 1 deletion(-) > > diff --git a/kernel/locking/rtmutex.c b/kernel/locking/rtmutex.c > index 74188d8..1d5cc9d 100644 > --- a/kernel/locking/rtmutex.c > +++ b/kernel/locking/rtmutex.c > @@ -74,7 +74,9 @@ static void fixup_rt_mutex_waiters(struct rt_mutex *lock) > * set up. > */ > #ifndef CONFIG_DEBUG_RT_MUTEXES > -# define rt_mutex_cmpxchg(l,c,n) (cmpxchg(&l->owner, c, n) == c) > +# define rt_mutex_cmpxchg(l,c,n) \ > + (l->owner == c && cmpxchg(&l->owner, c, n) == c) Errm. How does that improve stuff like rt_mutex_lock() ? Thanks, tglx ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCH 2/4] locking/rtmutex: Use cmp-cmpxchg 2015-06-05 12:38 ` Thomas Gleixner @ 2015-06-06 15:27 ` Davidlohr Bueso 2015-06-15 18:34 ` Jason Low 0 siblings, 1 reply; 38+ messages in thread From: Davidlohr Bueso @ 2015-06-06 15:27 UTC (permalink / raw) To: Thomas Gleixner Cc: Peter Zijlstra, Ingo Molnar, Steven Rostedt, Mike Galbraith, Paul E. McKenney, Sebastian Andrzej Siewior, linux-kernel On Fri, 2015-06-05 at 14:38 +0200, Thomas Gleixner wrote: > On Tue, 19 May 2015, Davidlohr Bueso wrote: > > > Avoid unnecessary cmpxchg calls, all of our other locks > > use it as well. > > > > Signed-off-by: Davidlohr Bueso <dbueso@suse.de> > > --- > > kernel/locking/rtmutex.c | 4 +++- > > 1 file changed, 3 insertions(+), 1 deletion(-) > > > > diff --git a/kernel/locking/rtmutex.c b/kernel/locking/rtmutex.c > > index 74188d8..1d5cc9d 100644 > > --- a/kernel/locking/rtmutex.c > > +++ b/kernel/locking/rtmutex.c > > @@ -74,7 +74,9 @@ static void fixup_rt_mutex_waiters(struct rt_mutex *lock) > > * set up. > > */ > > #ifndef CONFIG_DEBUG_RT_MUTEXES > > -# define rt_mutex_cmpxchg(l,c,n) (cmpxchg(&l->owner, c, n) == c) > > +# define rt_mutex_cmpxchg(l,c,n) \ > > + (l->owner == c && cmpxchg(&l->owner, c, n) == c) > > Errm. How does that improve stuff like rt_mutex_lock() ? It just avoids a cmpxchg in the fastpath when locked, at the cost of an extra test when unlocked. CCAS techniques have been proven to boost some workloads for both rwsems and mutexes. That's all. Thanks, Davidlohr ^ permalink raw reply [flat|nested] 38+ messages in thread
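The trade-off Davidlohr describes is easiest to see in a standalone sketch (illustrative C11, not the kernel code; `owner` and the `trylock_*` names are invented): CCAS fronts the atomic read-modify-write with a cheap plain load, so an already-held lock costs only a read instead of a cache-line-exclusive cmpxchg, at the price of one extra load when the lock is free.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Lock word: NULL == unlocked, otherwise points at the owning "task". */
static _Atomic(void *) owner;

/* Plain cmpxchg trylock: always issues the atomic RMW. */
static int trylock_cas(void *me)
{
    void *expected = NULL;
    return atomic_compare_exchange_strong(&owner, &expected, me);
}

/* CCAS trylock: bail out on a cheap relaxed load when the lock is
 * visibly held, skipping the RMW (and its cache-line bouncing);
 * the cost is one extra load on the uncontended path. */
static int trylock_ccas(void *me)
{
    if (atomic_load_explicit(&owner, memory_order_relaxed) != NULL)
        return 0;
    void *expected = NULL;
    return atomic_compare_exchange_strong(&owner, &expected, me);
}
```

Correctness still rests entirely on the cmpxchg — the leading load is only a filter, which is why the pattern helps most where failure is the common case.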
* Re: [PATCH 2/4] locking/rtmutex: Use cmp-cmpxchg 2015-06-06 15:27 ` Davidlohr Bueso @ 2015-06-15 18:34 ` Jason Low 2015-06-15 19:37 ` Davidlohr Bueso 0 siblings, 1 reply; 38+ messages in thread From: Jason Low @ 2015-06-15 18:34 UTC (permalink / raw) To: Davidlohr Bueso, Jason Low Cc: Thomas Gleixner, Peter Zijlstra, Ingo Molnar, Steven Rostedt, Mike Galbraith, Paul E. McKenney, Sebastian Andrzej Siewior, Linux Kernel Mailing List Hi David, On Sat, Jun 6, 2015 at 8:27 AM, Davidlohr Bueso <dave@stgolabs.net> wrote: > On Fri, 2015-06-05 at 14:38 +0200, Thomas Gleixner wrote: >> On Tue, 19 May 2015, Davidlohr Bueso wrote: >> >> > Avoid unnecessary cmpxchg calls, all of our other locks >> > use it as well. >> > >> > Signed-off-by: Davidlohr Bueso <dbueso@suse.de> >> > --- >> > kernel/locking/rtmutex.c | 4 +++- >> > 1 file changed, 3 insertions(+), 1 deletion(-) >> > >> > diff --git a/kernel/locking/rtmutex.c b/kernel/locking/rtmutex.c >> > index 74188d8..1d5cc9d 100644 >> > --- a/kernel/locking/rtmutex.c >> > +++ b/kernel/locking/rtmutex.c >> > @@ -74,7 +74,9 @@ static void fixup_rt_mutex_waiters(struct rt_mutex *lock) >> > * set up. >> > */ >> > #ifndef CONFIG_DEBUG_RT_MUTEXES >> > -# define rt_mutex_cmpxchg(l,c,n) (cmpxchg(&l->owner, c, n) == c) >> > +# define rt_mutex_cmpxchg(l,c,n) \ >> > + (l->owner == c && cmpxchg(&l->owner, c, n) == c) >> >> Errm. How does that improve stuff like rt_mutex_lock() ? > > It just avoids a cmpxchg in the fastpath when locked, at the cost of an > extra test when unlocked. CCAS techniques have been proven to boost some > workloads for both rwsems and mutexes. That's all. The CCAS technique was typically used in the slow paths for those other locks, where the chance of the operation returning false is higher. The rt_mutex_cmpxchg is used in places such as rt_mutex fastlocks. We might not want to add extra costs to the fast paths, particularly when the rt_mutex_cmpxchg calls are marked "likely" in those cases. 
^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCH 2/4] locking/rtmutex: Use cmp-cmpxchg 2015-06-15 18:34 ` Jason Low @ 2015-06-15 19:37 ` Davidlohr Bueso 2015-06-16 1:00 ` Jason Low 0 siblings, 1 reply; 38+ messages in thread From: Davidlohr Bueso @ 2015-06-15 19:37 UTC (permalink / raw) To: Jason Low Cc: Thomas Gleixner, Peter Zijlstra, Ingo Molnar, Steven Rostedt, Mike Galbraith, Paul E. McKenney, Sebastian Andrzej Siewior, Linux Kernel Mailing List On Mon, 2015-06-15 at 11:34 -0700, Jason Low wrote: > The CCAS technique was typically used in the slow paths for those > other locks, where the chance of the operation returning false is > higher. That is true. Although I really want to use it in patch 4, I guess I could move the check in there, and thus avoid having it in the fastpath. Thanks, Davidlohr ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCH 2/4] locking/rtmutex: Use cmp-cmpxchg 2015-06-15 19:37 ` Davidlohr Bueso @ 2015-06-16 1:00 ` Jason Low 0 siblings, 0 replies; 38+ messages in thread From: Jason Low @ 2015-06-16 1:00 UTC (permalink / raw) To: Davidlohr Bueso Cc: Jason Low, Thomas Gleixner, Peter Zijlstra, Ingo Molnar, Steven Rostedt, Mike Galbraith, Paul E. McKenney, Sebastian Andrzej Siewior, Linux Kernel Mailing List On Mon, Jun 15, 2015 at 12:37 PM, Davidlohr Bueso <dave@stgolabs.net> wrote: > On Mon, 2015-06-15 at 11:34 -0700, Jason Low wrote: >> The CCAS technique was typically used in the slow paths for those >> other locks, where the chance of the operation returning false is >> higher. > > That is true. Although I really want to use it in patch 4, I guess I > could move the check in there, and thus avoid having it in the fastpath. I agree, that way, we only have the extra check in cases where it is beneficial, like in the optimistic spin loop. Jason ^ permalink raw reply [flat|nested] 38+ messages in thread
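Jason's suggestion — keep the extra check only where it pays off — might look like the following in an optimistic spin loop (an illustrative C11 sketch with made-up names, not the kernel's osq/rtmutex code): the loop body is a cheap relaxed read while the lock is held, and the cmpxchg is attempted only once the owner field appears clear.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

static _Atomic(void *) owner;   /* NULL == unlocked */

/* Optimistic spin with the CCAS pattern folded in: while the owner
 * field is non-NULL we only read it (no cache-line ping-pong between
 * spinners); the atomic RMW is issued solely when the lock looks free. */
static int spin_trylock(void *me, int max_spins)
{
    for (int i = 0; i < max_spins; i++) {
        if (atomic_load_explicit(&owner, memory_order_relaxed) != NULL)
            continue;                     /* still held: just spin */
        void *expected = NULL;
        if (atomic_compare_exchange_strong(&owner, &expected, me))
            return 1;                     /* acquired */
    }
    return 0;                             /* give up; caller would block */
}
```

Here the leading load is the common case by construction, so moving the check out of the uncontended fastpath and into the spin loop keeps the benefit without taxing `rt_mutex_lock()`.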
* [PATCH 3/4] locking/rtmutex: Update stale plist comments 2015-05-19 17:24 [PATCH -tip 0/4] rtmutex: Spin on owner Davidlohr Bueso 2015-05-19 17:24 ` [PATCH 1/4] locking/rtmutex: Implement lockless top-waiter wakeup Davidlohr Bueso 2015-05-19 17:24 ` [PATCH 2/4] locking/rtmutex: Use cmp-cmpxchg Davidlohr Bueso @ 2015-05-19 17:24 ` Davidlohr Bueso 2015-06-05 12:39 ` Thomas Gleixner ` (2 more replies) 2015-05-19 17:24 ` [PATCH -rfc 4/4] locking/rtmutex: Support spin on owner (osq) Davidlohr Bueso 2015-05-25 20:35 ` [PATCH -tip 0/4] rtmutex: Spin on owner Davidlohr Bueso 4 siblings, 3 replies; 38+ messages in thread From: Davidlohr Bueso @ 2015-05-19 17:24 UTC (permalink / raw) To: Thomas Gleixner, Peter Zijlstra, Ingo Molnar Cc: Steven Rostedt, Mike Galbraith, Paul E. McKenney, Sebastian Andrzej Siewior, Davidlohr Bueso, linux-kernel, Davidlohr Bueso ... as of fb00aca4744 (rtmutex: Turn the plist into an rb-tree) we no longer use plists for queuing any waiters. Update stale comments. Signed-off-by: Davidlohr Bueso <dbueso@suse.de> --- kernel/locking/rtmutex.c | 18 +++++++++--------- 1 file changed, 9 insertions(+), 9 deletions(-) diff --git a/kernel/locking/rtmutex.c b/kernel/locking/rtmutex.c index 1d5cc9d..81aad32 100644 --- a/kernel/locking/rtmutex.c +++ b/kernel/locking/rtmutex.c @@ -626,7 +626,7 @@ static int rt_mutex_adjust_prio_chain(struct task_struct *task, */ prerequeue_top_waiter = rt_mutex_top_waiter(lock); - /* [7] Requeue the waiter in the lock waiter list. */ + /* [7] Requeue the waiter in the lock waiter tree. */ rt_mutex_dequeue(lock, waiter); waiter->prio = task->prio; rt_mutex_enqueue(lock, waiter); @@ -664,7 +664,7 @@ static int rt_mutex_adjust_prio_chain(struct task_struct *task, /* * The waiter became the new top (highest priority) * waiter on the lock. Replace the previous top waiter - * in the owner tasks pi waiters list with this waiter + * in the owner tasks pi waiters tree with this waiter * and adjust the priority of the owner. 
*/ rt_mutex_dequeue_pi(task, prerequeue_top_waiter); @@ -675,7 +675,7 @@ static int rt_mutex_adjust_prio_chain(struct task_struct *task, /* * The waiter was the top waiter on the lock, but is * no longer the top prority waiter. Replace waiter in - * the owner tasks pi waiters list with the new top + * the owner tasks pi waiters tree with the new top * (highest priority) waiter and adjust the priority * of the owner. * The new top waiter is stored in @waiter so that @@ -749,7 +749,7 @@ static int rt_mutex_adjust_prio_chain(struct task_struct *task, * * @lock: The lock to be acquired. * @task: The task which wants to acquire the lock - * @waiter: The waiter that is queued to the lock's wait list if the + * @waiter: The waiter that is queued to the lock's wait tree if the * callsite called task_blocked_on_lock(), otherwise NULL */ static int try_to_take_rt_mutex(struct rt_mutex *lock, struct task_struct *task, @@ -784,7 +784,7 @@ static int try_to_take_rt_mutex(struct rt_mutex *lock, struct task_struct *task, /* * If @waiter != NULL, @task has already enqueued the waiter - * into @lock waiter list. If @waiter == NULL then this is a + * into @lock waiter tree. If @waiter == NULL then this is a * trylock attempt. */ if (waiter) { @@ -797,7 +797,7 @@ static int try_to_take_rt_mutex(struct rt_mutex *lock, struct task_struct *task, /* * We can acquire the lock. Remove the waiter from the - * lock waiters list. + * lock waiters tree. */ rt_mutex_dequeue(lock, waiter); @@ -829,7 +829,7 @@ static int try_to_take_rt_mutex(struct rt_mutex *lock, struct task_struct *task, * No waiters. Take the lock without the * pi_lock dance.@task->pi_blocked_on is NULL * and we have no waiters to enqueue in @task - * pi waiters list. + * pi waiters tree. */ goto takeit; } @@ -846,7 +846,7 @@ static int try_to_take_rt_mutex(struct rt_mutex *lock, struct task_struct *task, /* * Finish the lock acquisition. @task is the new owner. 
If * other waiters exist we have to insert the highest priority - * waiter into @task->pi_waiters list. + * waiter into @task->pi_waiters tree. */ if (rt_mutex_has_waiters(lock)) rt_mutex_enqueue_pi(task, rt_mutex_top_waiter(lock)); @@ -957,7 +957,7 @@ static int task_blocks_on_rt_mutex(struct rt_mutex *lock, } /* - * Remove the top waiter from the current tasks pi waiter list and + * Remove the top waiter from the current tasks pi waiter tree and * queue it up. * * Called with lock->wait_lock held. -- 2.1.4 ^ permalink raw reply related [flat|nested] 38+ messages in thread
* Re: [PATCH 3/4] locking/rtmutex: Update stale plist comments 2015-05-19 17:24 ` [PATCH 3/4] locking/rtmutex: Update stale plist comments Davidlohr Bueso @ 2015-06-05 12:39 ` Thomas Gleixner 2015-06-18 20:57 ` [tip:sched/core] " tip-bot for Davidlohr Bueso 2015-06-19 19:33 ` [tip:sched/locking] " tip-bot for Davidlohr Bueso 2 siblings, 0 replies; 38+ messages in thread From: Thomas Gleixner @ 2015-06-05 12:39 UTC (permalink / raw) To: Davidlohr Bueso Cc: Peter Zijlstra, Ingo Molnar, Steven Rostedt, Mike Galbraith, Paul E. McKenney, Sebastian Andrzej Siewior, linux-kernel, Davidlohr Bueso On Tue, 19 May 2015, Davidlohr Bueso wrote: > ... as of fb00aca4744 (rtmutex: Turn the plist into an rb-tree) we > no longer use plists for queuing any waiters. Update stale comments. > > Signed-off-by: Davidlohr Bueso <dbueso@suse.de> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> ^ permalink raw reply [flat|nested] 38+ messages in thread
* [tip:sched/core] locking/rtmutex: Update stale plist comments 2015-05-19 17:24 ` [PATCH 3/4] locking/rtmutex: Update stale plist comments Davidlohr Bueso 2015-06-05 12:39 ` Thomas Gleixner @ 2015-06-18 20:57 ` tip-bot for Davidlohr Bueso 2015-06-19 19:33 ` [tip:sched/locking] " tip-bot for Davidlohr Bueso 2 siblings, 0 replies; 38+ messages in thread From: tip-bot for Davidlohr Bueso @ 2015-06-18 20:57 UTC (permalink / raw) To: linux-tip-commits Cc: paulmck, umgwanakikbuti, rostedt, linux-kernel, dave, bigeasy, dbueso, peterz, mingo, hpa, tglx Commit-ID: f5de9f848cdfba072a0466c24891167c0c8b3be3 Gitweb: http://git.kernel.org/tip/f5de9f848cdfba072a0466c24891167c0c8b3be3 Author: Davidlohr Bueso <dave@stgolabs.net> AuthorDate: Tue, 19 May 2015 10:24:57 -0700 Committer: Thomas Gleixner <tglx@linutronix.de> CommitDate: Thu, 18 Jun 2015 22:53:36 +0200 locking/rtmutex: Update stale plist comments ... as of fb00aca4744 (rtmutex: Turn the plist into an rb-tree) we no longer use plists for queuing any waiters. Update stale comments. Signed-off-by: Davidlohr Bueso <dbueso@suse.de> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Mike Galbraith <umgwanakikbuti@gmail.com> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1432056298-18738-4-git-send-email-dave@stgolabs.net Signed-off-by: Thomas Gleixner <tglx@linutronix.de> --- kernel/locking/rtmutex.c | 18 +++++++++--------- 1 file changed, 9 insertions(+), 9 deletions(-) diff --git a/kernel/locking/rtmutex.c b/kernel/locking/rtmutex.c index 1130130..5537ebc 100644 --- a/kernel/locking/rtmutex.c +++ b/kernel/locking/rtmutex.c @@ -624,7 +624,7 @@ static int rt_mutex_adjust_prio_chain(struct task_struct *task, */ prerequeue_top_waiter = rt_mutex_top_waiter(lock); - /* [7] Requeue the waiter in the lock waiter list. 
*/ + /* [7] Requeue the waiter in the lock waiter tree. */ rt_mutex_dequeue(lock, waiter); waiter->prio = task->prio; rt_mutex_enqueue(lock, waiter); @@ -662,7 +662,7 @@ static int rt_mutex_adjust_prio_chain(struct task_struct *task, /* * The waiter became the new top (highest priority) * waiter on the lock. Replace the previous top waiter - * in the owner tasks pi waiters list with this waiter + * in the owner tasks pi waiters tree with this waiter * and adjust the priority of the owner. */ rt_mutex_dequeue_pi(task, prerequeue_top_waiter); @@ -673,7 +673,7 @@ static int rt_mutex_adjust_prio_chain(struct task_struct *task, /* * The waiter was the top waiter on the lock, but is * no longer the top prority waiter. Replace waiter in - * the owner tasks pi waiters list with the new top + * the owner tasks pi waiters tree with the new top * (highest priority) waiter and adjust the priority * of the owner. * The new top waiter is stored in @waiter so that @@ -747,7 +747,7 @@ static int rt_mutex_adjust_prio_chain(struct task_struct *task, * * @lock: The lock to be acquired. * @task: The task which wants to acquire the lock - * @waiter: The waiter that is queued to the lock's wait list if the + * @waiter: The waiter that is queued to the lock's wait tree if the * callsite called task_blocked_on_lock(), otherwise NULL */ static int try_to_take_rt_mutex(struct rt_mutex *lock, struct task_struct *task, @@ -782,7 +782,7 @@ static int try_to_take_rt_mutex(struct rt_mutex *lock, struct task_struct *task, /* * If @waiter != NULL, @task has already enqueued the waiter - * into @lock waiter list. If @waiter == NULL then this is a + * into @lock waiter tree. If @waiter == NULL then this is a * trylock attempt. */ if (waiter) { @@ -795,7 +795,7 @@ static int try_to_take_rt_mutex(struct rt_mutex *lock, struct task_struct *task, /* * We can acquire the lock. Remove the waiter from the - * lock waiters list. + * lock waiters tree. 
*/ rt_mutex_dequeue(lock, waiter); @@ -827,7 +827,7 @@ static int try_to_take_rt_mutex(struct rt_mutex *lock, struct task_struct *task, * No waiters. Take the lock without the * pi_lock dance.@task->pi_blocked_on is NULL * and we have no waiters to enqueue in @task - * pi waiters list. + * pi waiters tree. */ goto takeit; } @@ -844,7 +844,7 @@ static int try_to_take_rt_mutex(struct rt_mutex *lock, struct task_struct *task, /* * Finish the lock acquisition. @task is the new owner. If * other waiters exist we have to insert the highest priority - * waiter into @task->pi_waiters list. + * waiter into @task->pi_waiters tree. */ if (rt_mutex_has_waiters(lock)) rt_mutex_enqueue_pi(task, rt_mutex_top_waiter(lock)); @@ -955,7 +955,7 @@ static int task_blocks_on_rt_mutex(struct rt_mutex *lock, } /* - * Remove the top waiter from the current tasks pi waiter list and + * Remove the top waiter from the current tasks pi waiter tree and * queue it up. * * Called with lock->wait_lock held. ^ permalink raw reply related [flat|nested] 38+ messages in thread
* [tip:sched/locking] locking/rtmutex: Update stale plist comments 2015-05-19 17:24 ` [PATCH 3/4] locking/rtmutex: Update stale plist comments Davidlohr Bueso 2015-06-05 12:39 ` Thomas Gleixner 2015-06-18 20:57 ` [tip:sched/core] " tip-bot for Davidlohr Bueso @ 2015-06-19 19:33 ` tip-bot for Davidlohr Bueso 2 siblings, 0 replies; 38+ messages in thread From: tip-bot for Davidlohr Bueso @ 2015-06-19 19:33 UTC (permalink / raw) To: linux-tip-commits Cc: dbueso, bigeasy, mingo, dave, umgwanakikbuti, tglx, paulmck, rostedt, linux-kernel, hpa, peterz Commit-ID: 9f40a51a35a0e1445cc4873251c3df2631eda294 Gitweb: http://git.kernel.org/tip/9f40a51a35a0e1445cc4873251c3df2631eda294 Author: Davidlohr Bueso <dave@stgolabs.net> AuthorDate: Tue, 19 May 2015 10:24:57 -0700 Committer: Thomas Gleixner <tglx@linutronix.de> CommitDate: Fri, 19 Jun 2015 21:27:21 +0200 locking/rtmutex: Update stale plist comments ... as of fb00aca4744 (rtmutex: Turn the plist into an rb-tree) we no longer use plists for queuing any waiters. Update stale comments. Signed-off-by: Davidlohr Bueso <dbueso@suse.de> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Mike Galbraith <umgwanakikbuti@gmail.com> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1432056298-18738-4-git-send-email-dave@stgolabs.net Signed-off-by: Thomas Gleixner <tglx@linutronix.de> --- kernel/locking/rtmutex.c | 18 +++++++++--------- 1 file changed, 9 insertions(+), 9 deletions(-) diff --git a/kernel/locking/rtmutex.c b/kernel/locking/rtmutex.c index 0add724..86d4853 100644 --- a/kernel/locking/rtmutex.c +++ b/kernel/locking/rtmutex.c @@ -624,7 +624,7 @@ static int rt_mutex_adjust_prio_chain(struct task_struct *task, */ prerequeue_top_waiter = rt_mutex_top_waiter(lock); - /* [7] Requeue the waiter in the lock waiter list. 
*/ + /* [7] Requeue the waiter in the lock waiter tree. */ rt_mutex_dequeue(lock, waiter); waiter->prio = task->prio; rt_mutex_enqueue(lock, waiter); @@ -662,7 +662,7 @@ static int rt_mutex_adjust_prio_chain(struct task_struct *task, /* * The waiter became the new top (highest priority) * waiter on the lock. Replace the previous top waiter - * in the owner tasks pi waiters list with this waiter + * in the owner tasks pi waiters tree with this waiter * and adjust the priority of the owner. */ rt_mutex_dequeue_pi(task, prerequeue_top_waiter); @@ -673,7 +673,7 @@ static int rt_mutex_adjust_prio_chain(struct task_struct *task, /* * The waiter was the top waiter on the lock, but is * no longer the top prority waiter. Replace waiter in - * the owner tasks pi waiters list with the new top + * the owner tasks pi waiters tree with the new top * (highest priority) waiter and adjust the priority * of the owner. * The new top waiter is stored in @waiter so that @@ -747,7 +747,7 @@ static int rt_mutex_adjust_prio_chain(struct task_struct *task, * * @lock: The lock to be acquired. * @task: The task which wants to acquire the lock - * @waiter: The waiter that is queued to the lock's wait list if the + * @waiter: The waiter that is queued to the lock's wait tree if the * callsite called task_blocked_on_lock(), otherwise NULL */ static int try_to_take_rt_mutex(struct rt_mutex *lock, struct task_struct *task, @@ -782,7 +782,7 @@ static int try_to_take_rt_mutex(struct rt_mutex *lock, struct task_struct *task, /* * If @waiter != NULL, @task has already enqueued the waiter - * into @lock waiter list. If @waiter == NULL then this is a + * into @lock waiter tree. If @waiter == NULL then this is a * trylock attempt. */ if (waiter) { @@ -795,7 +795,7 @@ static int try_to_take_rt_mutex(struct rt_mutex *lock, struct task_struct *task, /* * We can acquire the lock. Remove the waiter from the - * lock waiters list. + * lock waiters tree. 
*/ rt_mutex_dequeue(lock, waiter); @@ -827,7 +827,7 @@ static int try_to_take_rt_mutex(struct rt_mutex *lock, struct task_struct *task, * No waiters. Take the lock without the * pi_lock dance.@task->pi_blocked_on is NULL * and we have no waiters to enqueue in @task - * pi waiters list. + * pi waiters tree. */ goto takeit; } @@ -844,7 +844,7 @@ static int try_to_take_rt_mutex(struct rt_mutex *lock, struct task_struct *task, /* * Finish the lock acquisition. @task is the new owner. If * other waiters exist we have to insert the highest priority - * waiter into @task->pi_waiters list. + * waiter into @task->pi_waiters tree. */ if (rt_mutex_has_waiters(lock)) rt_mutex_enqueue_pi(task, rt_mutex_top_waiter(lock)); @@ -955,7 +955,7 @@ static int task_blocks_on_rt_mutex(struct rt_mutex *lock, } /* - * Remove the top waiter from the current tasks pi waiter list and + * Remove the top waiter from the current tasks pi waiter tree and * queue it up. * * Called with lock->wait_lock held. ^ permalink raw reply related [flat|nested] 38+ messages in thread
* [PATCH -rfc 4/4] locking/rtmutex: Support spin on owner (osq) 2015-05-19 17:24 [PATCH -tip 0/4] rtmutex: Spin on owner Davidlohr Bueso ` (2 preceding siblings ...) 2015-05-19 17:24 ` [PATCH 3/4] locking/rtmutex: Update stale plist comments Davidlohr Bueso @ 2015-05-19 17:24 ` Davidlohr Bueso 2015-05-20 7:11 ` Paul Bolle ` (2 more replies) 2015-05-25 20:35 ` [PATCH -tip 0/4] rtmutex: Spin on owner Davidlohr Bueso 4 siblings, 3 replies; 38+ messages in thread From: Davidlohr Bueso @ 2015-05-19 17:24 UTC (permalink / raw) To: Thomas Gleixner, Peter Zijlstra, Ingo Molnar Cc: Steven Rostedt, Mike Galbraith, Paul E. McKenney, Sebastian Andrzej Siewior, Davidlohr Bueso, linux-kernel, Davidlohr Bueso Similar to what we have in other locks, particularly regular mutexes, the idea is that as long as the owner is running, there is a fair chance it will release the lock soon, and thus a task trying to acquire the rtmutex is better off spinning instead of blocking immediately after the fastpath. The conditions to stop spinning and enter the slowpath are simple: (1) Upon need_resched() (2) The current lock owner blocks Spinning tasks' performance is further enhanced by queuing in a cancelable MCS lock (osq). Because rtmutexes track the lock owner atomically, we can extend the fastpath to continue polling on the lock owner via cmpxchg(lock->owner, NULL, current). This is a conservative approach: upon entering the slowpath, we force all lock spinners (including future ones) to serialize on the wait_lock, as mark_rt_mutex_waiters() is called unconditionally. CPU0 CPU1 optimistic_spin() (failed) try_to_take_rt_mutex() mark_rt_mutex_waiters() optimistic_spin() (failed cmpxchg) spin_lock(&lock->wait_lock) As such we check for RT_MUTEX_HAS_WAITERS in bit 0 (rt_mutex_has_waiters_fast()). This allows priority boosting to take precedence over spinning, as otherwise we could starve a higher priority queued-up task (ie: top waiter) if spinners constantly steal the lock. 
Another alternative would be to stop spinning when we should really wakeup a higher priority waiter, but of course we do not hold the wait_lock, so that is racy. Signed-off-by: Davidlohr Bueso <dbueso@suse.de> --- include/linux/rtmutex.h | 4 ++ kernel/Kconfig.locks | 4 ++ kernel/locking/rtmutex.c | 132 +++++++++++++++++++++++++++++++++++++++++++++++ 3 files changed, 140 insertions(+) diff --git a/include/linux/rtmutex.h b/include/linux/rtmutex.h index 1abba5c..b5e85ca 100644 --- a/include/linux/rtmutex.h +++ b/include/linux/rtmutex.h @@ -15,6 +15,7 @@ #include <linux/linkage.h> #include <linux/rbtree.h> #include <linux/spinlock_types.h> +#include <linux/osq_lock.h> extern int max_lock_depth; /* for sysctl */ @@ -31,6 +32,9 @@ struct rt_mutex { struct rb_root waiters; struct rb_node *waiters_leftmost; struct task_struct *owner; +#ifdef CONFIG_RT_MUTEX_SPIN_ON_OWNER + struct optimistic_spin_queue osq; /* Spinner MCS lock */ +#endif #ifdef CONFIG_DEBUG_RT_MUTEXES int save_state; const char *name, *file; diff --git a/kernel/Kconfig.locks b/kernel/Kconfig.locks index ebdb004..628915d 100644 --- a/kernel/Kconfig.locks +++ b/kernel/Kconfig.locks @@ -227,6 +227,10 @@ config MUTEX_SPIN_ON_OWNER def_bool y depends on SMP && !DEBUG_MUTEXES && ARCH_SUPPORTS_ATOMIC_RMW +config RT_MUTEX_SPIN_ON_OWNER + def_bool y + depends on SMP && RT_MUTEXES && !CONFIG_DEBUG_RT_MUTEXES && ARCH_SUPPORTS_ATOMIC_RMW + config RWSEM_SPIN_ON_OWNER def_bool y depends on SMP && RWSEM_XCHGADD_ALGORITHM && ARCH_SUPPORTS_ATOMIC_RMW diff --git a/kernel/locking/rtmutex.c b/kernel/locking/rtmutex.c index 81aad32..6524c7c 100644 --- a/kernel/locking/rtmutex.c +++ b/kernel/locking/rtmutex.c @@ -63,6 +63,20 @@ static inline void clear_rt_mutex_waiters(struct rt_mutex *lock) ((unsigned long)lock->owner & ~RT_MUTEX_HAS_WAITERS); } +/* + * Lockless alternative to rt_mutex_has_waiters() as we do not need the + * wait_lock to check if we are in, for instance, a transitional state + * after calling 
mark_rt_mutex_waiters(). + */ +static inline bool rt_mutex_has_waiters_fast(struct rt_mutex *lock) +{ + unsigned long val = (unsigned long)lock->owner; + + if (!val) + return false; + return val & RT_MUTEX_HAS_WAITERS; +} + static void fixup_rt_mutex_waiters(struct rt_mutex *lock) { if (!rt_mutex_has_waiters(lock)) @@ -1152,6 +1166,121 @@ static void rt_mutex_handle_deadlock(int res, int detect_deadlock, } } +#ifdef CONFIG_RT_MUTEX_SPIN_ON_OWNER +static noinline +bool rt_mutex_spin_on_owner(struct rt_mutex *lock, struct task_struct *owner) +{ + bool ret = true; + + rcu_read_lock(); + while (rt_mutex_owner(lock) == owner) { + /* + * Ensure we emit the owner->on_cpu, dereference _after_ + * checking lock->owner still matches owner. If that fails, + * owner might point to freed memory. If it still matches, + * the rcu_read_lock() ensures the memory stays valid. + */ + barrier(); + + if (!owner->on_cpu || need_resched()) { + ret = false; + break; + } + + cpu_relax_lowlatency(); + } + rcu_read_unlock(); + + return ret; +} + +/* + * Initial check for entering the mutex spinning loop + */ +static inline bool rt_mutex_can_spin_on_owner(struct rt_mutex *lock) +{ + struct task_struct *owner; + /* default return to spin: if no owner, the lock is free */ + int ret = true; + + if (need_resched()) + return 0; + + rcu_read_lock(); + owner = rt_mutex_owner(lock); + if (owner) + ret = owner->on_cpu; + rcu_read_unlock(); + + return ret; +} + +static bool rt_mutex_optimistic_spin(struct rt_mutex *lock) +{ + bool taken = false; + + preempt_disable(); + + if (!rt_mutex_can_spin_on_owner(lock)) + goto done; + /* + * In order to avoid a stampede of mutex spinners trying to + * acquire the mutex all at once, the spinners need to take a + * MCS (queued) lock first before spinning on the owner field. 
+ */ + if (!osq_lock(&lock->osq)) + goto done; + + while (true) { + struct task_struct *owner; + + /* + * When another task is competing for the lock in the + * slowpath (either transitional or not), avoid the + * cmpxchg and immediately block. If the situation is + * later fixed by clearing the waiters bit, future + * tasks that atempt to take the rtmutex can spin. + */ + if (rt_mutex_has_waiters_fast(lock)) + goto done_unlock; + + /* + * If there's an owner, wait for it to either + * release the lock or go to sleep. + */ + owner = rt_mutex_owner(lock); + if (owner && !rt_mutex_spin_on_owner(lock, owner)) + break; + + /* Try to acquire the lock, if it is unlocked. */ + if (rt_mutex_cmpxchg(lock, NULL, current)) { + taken = true; + goto done_unlock; + } + + /* + * The cpu_relax() call is a compiler barrier which forces + * everything in this loop to be re-loaded. We don't need + * memory barriers as we'll eventually observe the right + * values at the cost of a few extra spins. + */ + cpu_relax_lowlatency(); + } + +done_unlock: + osq_unlock(&lock->osq); +done: + preempt_enable(); + return taken; +} + +#else +static bool rt_mutex_optimistic_spin(struct rt_mutex *lock) +{ + return false; +} +#endif + /* * Slow path lock function: */ @@ -1163,6 +1292,9 @@ rt_mutex_slowlock(struct rt_mutex *lock, int state, struct rt_mutex_waiter waiter; int ret = 0; + if (rt_mutex_optimistic_spin(lock)) + return ret; + debug_rt_mutex_init_waiter(&waiter); RB_CLEAR_NODE(&waiter.pi_tree_entry); RB_CLEAR_NODE(&waiter.tree_entry); -- 2.1.4 ^ permalink raw reply related [flat|nested] 38+ messages in thread
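The key to the lockless check added above is that the rtmutex owner field is a task pointer whose low bit doubles as RT_MUTEX_HAS_WAITERS, so a single plain load tells a spinner whether someone has already entered the slowpath. A minimal userspace sketch of that encoding (the names and uintptr_t packing below are illustrative, not the kernel's exact definitions):

```c
#include <stdbool.h>
#include <stdint.h>

#define HAS_WAITERS 1UL	/* bit 0 of the owner word, as in the kernel */

/* Owner word: task pointer | waiters bit. 0 == lock is free. */
static inline bool has_waiters_fast(uintptr_t owner_word)
{
	if (!owner_word)
		return false;			/* free: nobody queued */
	/* transitional or blocked waiters present */
	return owner_word & HAS_WAITERS;
}

/* Strip the flag bit to recover the actual owner pointer. */
static inline void *owner_task(uintptr_t owner_word)
{
	return (void *)(owner_word & ~HAS_WAITERS);
}
```

Because task pointers are at least word aligned, bit 0 is always free to carry the flag; the tradeoff, as the changelog notes, is that a spinner may see a transitional waiters bit and bail to the slowpath conservatively.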
* Re: [PATCH -rfc 4/4] locking/rtmutex: Support spin on owner (osq) 2015-05-19 17:24 ` [PATCH -rfc 4/4] locking/rtmutex: Support spin on owner (osq) Davidlohr Bueso @ 2015-05-20 7:11 ` Paul Bolle 2015-05-25 20:35 ` Davidlohr Bueso 2015-05-29 15:19 ` Davidlohr Bueso 2015-06-05 13:59 ` Thomas Gleixner 2 siblings, 1 reply; 38+ messages in thread From: Paul Bolle @ 2015-05-20 7:11 UTC (permalink / raw) To: Davidlohr Bueso Cc: Thomas Gleixner, Peter Zijlstra, Ingo Molnar, Steven Rostedt, Mike Galbraith, Paul E. McKenney, Sebastian Andrzej Siewior, linux-kernel, Davidlohr Bueso (This is posted as an RFC, so you probably don't want feedback going into detail yet, but I couldn't help spotting this.) On Tue, 2015-05-19 at 10:24 -0700, Davidlohr Bueso wrote: > --- a/kernel/Kconfig.locks > +++ b/kernel/Kconfig.locks > +config RT_MUTEX_SPIN_ON_OWNER > + def_bool y > + depends on SMP && RT_MUTEXES && !CONFIG_DEBUG_RT_MUTEXES && ARCH_SUPPORTS_ATOMIC_RMW s/CONFIG_DEBUG_RT_MUTEXES/DEBUG_RT_MUTEXES/? Or perhaps even drop "&& !CONFIG_DEBUG_RT_MUTEXES" entirely, because that test currently always evaluates to true, so it might not be needed. Running scripts/checkkconfigsymbols.py --diff $sha1..$sha2 helps catch typos like this. Paul Bolle ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCH -rfc 4/4] locking/rtmutex: Support spin on owner (osq) 2015-05-20 7:11 ` Paul Bolle @ 2015-05-25 20:35 ` Davidlohr Bueso 0 siblings, 0 replies; 38+ messages in thread From: Davidlohr Bueso @ 2015-05-25 20:35 UTC (permalink / raw) To: Paul Bolle Cc: Thomas Gleixner, Peter Zijlstra, Ingo Molnar, Steven Rostedt, Mike Galbraith, Paul E. McKenney, Sebastian Andrzej Siewior, linux-kernel On Wed, 2015-05-20 at 09:11 +0200, Paul Bolle wrote: > (This is pasted as an RFC, so you probably don't want feedback going > into detail yet, but I couldn't help spotting this.) > > On Tue, 2015-05-19 at 10:24 -0700, Davidlohr Bueso wrote: > > --- a/kernel/Kconfig.locks > > +++ b/kernel/Kconfig.locks > > > +config RT_MUTEX_SPIN_ON_OWNER > > + def_bool y > > + depends on SMP && RT_MUTEXES && !CONFIG_DEBUG_RT_MUTEXES && ARCH_SUPPORTS_ATOMIC_RMW > > s/CONFIG_DEBUG_RT_MUTEXES/DEBUG_RT_MUTEXES/? Ah, yes thanks. ^ permalink raw reply [flat|nested] 38+ messages in thread
* Re: [PATCH -rfc 4/4] locking/rtmutex: Support spin on owner (osq) 2015-05-19 17:24 ` [PATCH -rfc 4/4] locking/rtmutex: Support spin on owner (osq) Davidlohr Bueso 2015-05-20 7:11 ` Paul Bolle @ 2015-05-29 15:19 ` Davidlohr Bueso 2015-05-29 18:01 ` Davidlohr Bueso 2015-06-05 13:59 ` Thomas Gleixner 2 siblings, 1 reply; 38+ messages in thread From: Davidlohr Bueso @ 2015-05-29 15:19 UTC (permalink / raw) To: Thomas Gleixner Cc: Peter Zijlstra, Ingo Molnar, Steven Rostedt, Mike Galbraith, Paul E. McKenney, Sebastian Andrzej Siewior, linux-kernel Btw, I just realized I had sent a stale patch where the osq was not being initialized, fixed below. Thanks! From: Davidlohr Bueso <dave@stgolabs.net> Subject: [PATCH v2] locking/rtmutex: Support spin on owner (osq) Similar to what we have in other locks, particularly regular mutexes, the idea is that as long as the owner is running, there is a fair chance it will release the lock soon, and thus a task trying to acquire the rtmutex is better off spinning instead of blocking immediately after the fastpath. The conditions to stop spinning and enter the slowpath are simple: (1) Upon need_resched() (2) The current lock owner blocks Spinning tasks' performance is further enhanced by queuing in a cancelable MCS lock (osq). Because rtmutexes track the lock owner atomically, we can extend the fastpath to continue polling on the lock owner via cmpxchg(lock->owner, NULL, current). This is a conservative approach: upon entering the slowpath, we force all lock spinners (including future ones) to serialize on the wait_lock, as mark_rt_mutex_waiters() is called unconditionally. CPU0 CPU1 optimistic_spin() (failed) try_to_take_rt_mutex() mark_rt_mutex_waiters() optimistic_spin() (failed cmpxchg) spin_lock(&lock->wait_lock) As such we check for RT_MUTEX_HAS_WAITERS in bit 0 (rt_mutex_has_waiters_fast()). 
This allows priority boosting to take precedence over spinning, as otherwise we could starve a higher priority queued-up task (ie: top waiter) if spinners constantly steal the lock. Another alternative would be to stop spinning when we should really wakeup a higher priority waiter, but of course we do not hold the wait_lock, so that is racy. Signed-off-by: Davidlohr Bueso <dbueso@suse.de> --- Changes from v1: - Fix Kconfig debug - Init lock->osq in rt mutex initializer include/linux/percpu-rwsem.h | 14 +++++ include/linux/rtmutex.h | 4 ++ kernel/Kconfig.locks | 4 ++ kernel/locking/rtmutex.c | 136 ++++++++++++++++++++++++++++++++++++++++++- mm/vmscan.c | 14 ++--- 5 files changed, 164 insertions(+), 8 deletions(-) diff --git a/include/linux/percpu-rwsem.h b/include/linux/percpu-rwsem.h index 266110d..05b4737 100644 --- a/include/linux/percpu-rwsem.h +++ b/include/linux/percpu-rwsem.h @@ -45,6 +45,20 @@ static inline void percpu_down_read(struct percpu_rw_semaphore *sem) */ } +static inline bool percpu_down_read_trylock(struct percpu_rw_semaphore *sem) +{ + bool ret = false; + + preempt_disable(); + if (likely(rcu_sync_is_idle(&sem->rss))) { + __this_cpu_inc(*sem->refcount); + ret = true; + rwsem_acquire_read(&sem->rw_sem.dep_map, 0, 0, _RET_IP_); + } + preempt_enable(); + return ret; +} + static inline void percpu_up_read(struct percpu_rw_semaphore *sem) { /* diff --git a/include/linux/rtmutex.h b/include/linux/rtmutex.h index 1abba5c..b5e85ca 100644 --- a/include/linux/rtmutex.h +++ b/include/linux/rtmutex.h @@ -15,6 +15,7 @@ #include <linux/linkage.h> #include <linux/rbtree.h> #include <linux/spinlock_types.h> +#include <linux/osq_lock.h> extern int max_lock_depth; /* for sysctl */ @@ -31,6 +32,9 @@ struct rt_mutex { struct rb_root waiters; struct rb_node *waiters_leftmost; struct task_struct *owner; +#ifdef CONFIG_RT_MUTEX_SPIN_ON_OWNER + struct optimistic_spin_queue osq; /* Spinner MCS lock */ +#endif #ifdef CONFIG_DEBUG_RT_MUTEXES int save_state; const char 
*name, *file; diff --git a/kernel/Kconfig.locks b/kernel/Kconfig.locks index ebdb004..8602871 100644 --- a/kernel/Kconfig.locks +++ b/kernel/Kconfig.locks @@ -227,6 +227,10 @@ config MUTEX_SPIN_ON_OWNER def_bool y depends on SMP && !DEBUG_MUTEXES && ARCH_SUPPORTS_ATOMIC_RMW +config RT_MUTEX_SPIN_ON_OWNER + def_bool y + depends on SMP && RT_MUTEXES && !DEBUG_RT_MUTEXES && ARCH_SUPPORTS_ATOMIC_RMW + config RWSEM_SPIN_ON_OWNER def_bool y depends on SMP && RWSEM_XCHGADD_ALGORITHM && ARCH_SUPPORTS_ATOMIC_RMW diff --git a/kernel/locking/rtmutex.c b/kernel/locking/rtmutex.c index 81aad32..c868bd4 100644 --- a/kernel/locking/rtmutex.c +++ b/kernel/locking/rtmutex.c @@ -63,6 +63,20 @@ static inline void clear_rt_mutex_waiters(struct rt_mutex *lock) ((unsigned long)lock->owner & ~RT_MUTEX_HAS_WAITERS); } +/* + * Lockless alternative to rt_mutex_has_waiters() as we do not need the + * wait_lock to check if we are in, for instance, a transitional state + * after calling mark_rt_mutex_waiters(). + */ +static inline bool rt_mutex_has_waiters_fast(struct rt_mutex *lock) +{ + unsigned long val = (unsigned long)lock->owner; + + if (!val) + return false; + return val & RT_MUTEX_HAS_WAITERS; +} + static void fixup_rt_mutex_waiters(struct rt_mutex *lock) { if (!rt_mutex_has_waiters(lock)) @@ -1152,6 +1166,121 @@ static void rt_mutex_handle_deadlock(int res, int detect_deadlock, } } +#ifdef CONFIG_RT_MUTEX_SPIN_ON_OWNER +static noinline +bool rt_mutex_spin_on_owner(struct rt_mutex *lock, struct task_struct *owner) +{ + bool ret = true; + + rcu_read_lock(); + while (rt_mutex_owner(lock) == owner) { + /* + * Ensure we emit the owner->on_cpu, dereference _after_ + * checking lock->owner still matches owner. If that fails, + * owner might point to freed memory. If it still matches, + * the rcu_read_lock() ensures the memory stays valid. 
+ */ + barrier(); + + if (!owner->on_cpu || need_resched()) { + ret = false; + break; + } + + cpu_relax_lowlatency(); + } + rcu_read_unlock(); + + return ret; +} + +/* + * Initial check for entering the mutex spinning loop + */ +static inline bool rt_mutex_can_spin_on_owner(struct rt_mutex *lock) +{ + struct task_struct *owner; + /* default return to spin: if no owner, the lock is free */ + int ret = true; + + if (need_resched()) + return 0; + + rcu_read_lock(); + owner = rt_mutex_owner(lock); + if (owner) + ret = owner->on_cpu; + rcu_read_unlock(); + + return ret; +} + +static bool rt_mutex_optimistic_spin(struct rt_mutex *lock) +{ + bool taken = false; + + preempt_disable(); + + if (!rt_mutex_can_spin_on_owner(lock)) + goto done; + /* + * In order to avoid a stampede of mutex spinners trying to + * acquire the mutex all at once, the spinners need to take a + * MCS (queued) lock first before spinning on the owner field. + */ + if (!osq_lock(&lock->osq)) + goto done; + + while (true) { + struct task_struct *owner; + + /* + * When another task is competing for the lock in the + * slowpath (either transitional or not), avoid the + * cmpxchg and immediately block. If the situation is + * later fixed by clearing the waiters bit, future + * tasks that atempt to take the rtmutex can spin. + */ + if (rt_mutex_has_waiters_fast(lock)) + goto done_unlock; + + /* + * If there's an owner, wait for it to either + * release the lock or go to sleep. + */ + owner = rt_mutex_owner(lock); + if (owner && !rt_mutex_spin_on_owner(lock, owner)) + break; + + /* Try to acquire the lock, if it is unlocked. */ + if (rt_mutex_cmpxchg(lock, NULL, current)) { + taken = true; + goto done_unlock; + } + + /* + * The cpu_relax() call is a compiler barrier which forces + * everything in this loop to be re-loaded. We don't need + * memory barriers as we'll eventually observe the right + * values at the cost of a few extra spins. 
+ */ + cpu_relax_lowlatency(); + } + +done_unlock: + osq_unlock(&lock->osq); +done: + preempt_enable(); + return taken; +} + +#else +static bool rt_mutex_optimistic_spin(struct rt_mutex *lock) +{ + return false; +} +#endif + /* * Slow path lock function: */ @@ -1163,6 +1292,9 @@ rt_mutex_slowlock(struct rt_mutex *lock, int state, struct rt_mutex_waiter waiter; int ret = 0; + if (rt_mutex_optimistic_spin(lock)) + return ret; + debug_rt_mutex_init_waiter(&waiter); RB_CLEAR_NODE(&waiter.pi_tree_entry); RB_CLEAR_NODE(&waiter.tree_entry); @@ -1500,7 +1632,9 @@ void __rt_mutex_init(struct rt_mutex *lock, const char *name) raw_spin_lock_init(&lock->wait_lock); lock->waiters = RB_ROOT; lock->waiters_leftmost = NULL; - +#ifdef CONFIG_RT_MUTEX_SPIN_ON_OWNER + osq_lock_init(&lock->osq); +#endif debug_rt_mutex_init(lock, name); } EXPORT_SYMBOL_GPL(__rt_mutex_init); diff --git a/mm/vmscan.c b/mm/vmscan.c index 5e8eadd..f2da14d 100644 --- a/mm/vmscan.c +++ b/mm/vmscan.c @@ -36,7 +36,7 @@ #include <linux/cpuset.h> #include <linux/compaction.h> #include <linux/notifier.h> -#include <linux/rwsem.h> +#include <linux/percpu-rwsem.h> #include <linux/delay.h> #include <linux/kthread.h> #include <linux/freezer.h> @@ -211,9 +211,9 @@ int register_shrinker(struct shrinker *shrinker) if (!shrinker->nr_deferred) return -ENOMEM; - down_write(&shrinker_rwsem); + percpu_down_write(&shrinker_rwsem); list_add_tail(&shrinker->list, &shrinker_list); - up_write(&shrinker_rwsem); + percpu_up_write(&shrinker_rwsem); return 0; } EXPORT_SYMBOL(register_shrinker); @@ -223,9 +223,9 @@ EXPORT_SYMBOL(register_shrinker); */ void unregister_shrinker(struct shrinker *shrinker) { - down_write(&shrinker_rwsem); + percpu_down_write(&shrinker_rwsem); list_del(&shrinker->list); - up_write(&shrinker_rwsem); + percpu_up_write(&shrinker_rwsem); kfree(shrinker->nr_deferred); } EXPORT_SYMBOL(unregister_shrinker); @@ -386,7 +386,7 @@ static unsigned long shrink_slab(gfp_t gfp_mask, int nid, if (nr_scanned == 0) 
nr_scanned = SWAP_CLUSTER_MAX; - if (!down_read_trylock(&shrinker_rwsem)) { + if (!percpu_down_read_trylock(&shrinker_rwsem)) { /* * If we would return 0, our callers would understand that we * have nothing else to shrink and give up trying. By returning @@ -413,7 +413,7 @@ static unsigned long shrink_slab(gfp_t gfp_mask, int nid, freed += do_shrink_slab(&sc, shrinker, nr_scanned, nr_eligible); } - up_read(&shrinker_rwsem); + percpu_up_read(&shrinker_rwsem); out: cond_resched(); return freed; -- 2.1.4 ^ permalink raw reply related [flat|nested] 38+ messages in thread
* Re: [PATCH -rfc 4/4] locking/rtmutex: Support spin on owner (osq)
  2015-05-29 15:19   ` Davidlohr Bueso
@ 2015-05-29 18:01     ` Davidlohr Bueso
  0 siblings, 0 replies; 38+ messages in thread
From: Davidlohr Bueso @ 2015-05-29 18:01 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Peter Zijlstra, Ingo Molnar, Steven Rostedt, Mike Galbraith,
	Paul E. McKenney, Sebastian Andrzej Siewior, linux-kernel

On Fri, 2015-05-29 at 08:19 -0700, Davidlohr Bueso wrote:
> Btw, I just realized I had sent a stale patch where the osq was not
> being initialized, fixed below. Thanks!

I am an idiot, the patch contained some bogus changes to percpu rwsems... *sigh*
Here's v3, sorry for the noise.

8<---------------------------------------------------------------------
From: Davidlohr Bueso <dave@stgolabs.net>
Subject: [PATCH v3 4/4] locking/rtmutex: Support spin on owner (osq)

Similar to what we have in other locks, particularly regular mutexes, the
idea is that as long as the owner is running, there is a fair chance it'll
release the lock soon, and thus a task trying to acquire the rtmutex will
be better off spinning instead of blocking immediately after the fastpath.

Conditions to stop spinning and enter the slowpath are simple:

(1) Upon need_resched()
(2) Current lock owner blocks

Spinning tasks' performance is further enhanced by queuing in cancelable
MCS (osq). Because rtmutexes track the lock owner atomically, we can
extend the fastpath to continue polling on the lock owner via
cmpxchg(lock->owner, NULL, current).

This is a conservative approach, such that upon entering the slowpath,
we force all lock spinners (including future ones) to serialize on the
wait_lock as mark_rt_mutex_waiters() is called, unconditionally.

   CPU0				CPU1
   optimistic_spin() (failed)
				try_to_take_rt_mutex()
				mark_rt_mutex_waiters()
   optimistic_spin() (failed cmpxchg)
   spin_lock(&lock->wait_lock)

As such we check for RT_MUTEX_HAS_WAITERS bit0
(rt_mutex_has_waiters_fast()).
This allows priority boosting to take precedence over spinning, as
otherwise we could starve a higher-priority queued-up task (i.e. the top
waiter) if spinners constantly steal the lock. Another alternative would
be to stop spinning when we should really wake up a higher-priority
waiter, but of course we do not hold the wait_lock, so that is racy.

Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
---
 include/linux/rtmutex.h  |   4 ++
 kernel/Kconfig.locks     |   4 ++
 kernel/locking/rtmutex.c | 136 ++++++++++++++++++++++++++++++++++++++++++++++-
 3 files changed, 143 insertions(+), 1 deletion(-)

diff --git a/include/linux/rtmutex.h b/include/linux/rtmutex.h
index 1abba5c..b5e85ca 100644
--- a/include/linux/rtmutex.h
+++ b/include/linux/rtmutex.h
@@ -15,6 +15,7 @@
 #include <linux/linkage.h>
 #include <linux/rbtree.h>
 #include <linux/spinlock_types.h>
+#include <linux/osq_lock.h>

 extern int max_lock_depth; /* for sysctl */

@@ -31,6 +32,9 @@ struct rt_mutex {
 	struct rb_root		waiters;
 	struct rb_node		*waiters_leftmost;
 	struct task_struct	*owner;
+#ifdef CONFIG_RT_MUTEX_SPIN_ON_OWNER
+	struct optimistic_spin_queue osq; /* Spinner MCS lock */
+#endif
 #ifdef CONFIG_DEBUG_RT_MUTEXES
 	int			save_state;
 	const char		*name, *file;
diff --git a/kernel/Kconfig.locks b/kernel/Kconfig.locks
index ebdb004..8602871 100644
--- a/kernel/Kconfig.locks
+++ b/kernel/Kconfig.locks
@@ -227,6 +227,10 @@ config MUTEX_SPIN_ON_OWNER
 	def_bool y
 	depends on SMP && !DEBUG_MUTEXES && ARCH_SUPPORTS_ATOMIC_RMW

+config RT_MUTEX_SPIN_ON_OWNER
+	def_bool y
+	depends on SMP && RT_MUTEXES && !DEBUG_RT_MUTEXES && ARCH_SUPPORTS_ATOMIC_RMW
+
 config RWSEM_SPIN_ON_OWNER
 	def_bool y
 	depends on SMP && RWSEM_XCHGADD_ALGORITHM && ARCH_SUPPORTS_ATOMIC_RMW
diff --git a/kernel/locking/rtmutex.c b/kernel/locking/rtmutex.c
index 81aad32..c868bd4 100644
--- a/kernel/locking/rtmutex.c
+++ b/kernel/locking/rtmutex.c
@@ -63,6 +63,20 @@ static inline void clear_rt_mutex_waiters(struct rt_mutex *lock)
 			((unsigned long)lock->owner & ~RT_MUTEX_HAS_WAITERS);
 }

+/*
+ * Lockless alternative to rt_mutex_has_waiters() as we do not need the
+ * wait_lock to check if we are in, for instance, a transitional state
+ * after calling mark_rt_mutex_waiters().
+ */
+static inline bool rt_mutex_has_waiters_fast(struct rt_mutex *lock)
+{
+	unsigned long val = (unsigned long)lock->owner;
+
+	if (!val)
+		return false;
+	return val & RT_MUTEX_HAS_WAITERS;
+}
+
 static void fixup_rt_mutex_waiters(struct rt_mutex *lock)
 {
 	if (!rt_mutex_has_waiters(lock))
@@ -1152,6 +1166,121 @@ static void rt_mutex_handle_deadlock(int res, int detect_deadlock,
 	}
 }

+#ifdef CONFIG_RT_MUTEX_SPIN_ON_OWNER
+static noinline
+bool rt_mutex_spin_on_owner(struct rt_mutex *lock, struct task_struct *owner)
+{
+	bool ret = true;
+
+	rcu_read_lock();
+	while (rt_mutex_owner(lock) == owner) {
+		/*
+		 * Ensure we emit the owner->on_cpu, dereference _after_
+		 * checking lock->owner still matches owner. If that fails,
+		 * owner might point to freed memory. If it still matches,
+		 * the rcu_read_lock() ensures the memory stays valid.
+		 */
+		barrier();
+
+		if (!owner->on_cpu || need_resched()) {
+			ret = false;
+			break;
+		}
+
+		cpu_relax_lowlatency();
+	}
+	rcu_read_unlock();
+
+	return ret;
+}
+
+/*
+ * Initial check for entering the mutex spinning loop
+ */
+static inline bool rt_mutex_can_spin_on_owner(struct rt_mutex *lock)
+{
+	struct task_struct *owner;
+	/* default return to spin: if no owner, the lock is free */
+	int ret = true;
+
+	if (need_resched())
+		return 0;
+
+	rcu_read_lock();
+	owner = rt_mutex_owner(lock);
+	if (owner)
+		ret = owner->on_cpu;
+	rcu_read_unlock();
+
+	return ret;
+}
+
+static bool rt_mutex_optimistic_spin(struct rt_mutex *lock)
+{
+	bool taken = false;
+
+	preempt_disable();
+
+	if (!rt_mutex_can_spin_on_owner(lock))
+		goto done;
+	/*
+	 * In order to avoid a stampede of mutex spinners trying to
+	 * acquire the mutex all at once, the spinners need to take a
+	 * MCS (queued) lock first before spinning on the owner field.
+	 */
+	if (!osq_lock(&lock->osq))
+		goto done;
+
+	while (true) {
+		struct task_struct *owner;
+
+		/*
+		 * When another task is competing for the lock in the
+		 * slowpath (either transitional or not), avoid the
+		 * cmpxchg and immediately block. If the situation is
+		 * later fixed by clearing the waiters bit, future
+		 * tasks that attempt to take the rtmutex can spin.
+		 */
+		if (rt_mutex_has_waiters_fast(lock))
+			goto done_unlock;
+
+		/*
+		 * If there's an owner, wait for it to either
+		 * release the lock or go to sleep.
+		 */
+		owner = rt_mutex_owner(lock);
+		if (owner && !rt_mutex_spin_on_owner(lock, owner))
+			break;
+
+		/* Try to acquire the lock, if it is unlocked. */
+		if (rt_mutex_cmpxchg(lock, NULL, current)) {
+			taken = true;
+			goto done_unlock;
+		}
+
+		/*
+		 * The cpu_relax() call is a compiler barrier which forces
+		 * everything in this loop to be re-loaded. We don't need
+		 * memory barriers as we'll eventually observe the right
+		 * values at the cost of a few extra spins.
+		 */
+		cpu_relax_lowlatency();
+	}
+
+done_unlock:
+	osq_unlock(&lock->osq);
+done:
+	preempt_enable();
+	return taken;
+}
+
+#else
+static bool rt_mutex_optimistic_spin(struct rt_mutex *lock)
+{
+	return false;
+}
+#endif
+
 /*
  * Slow path lock function:
  */
@@ -1163,6 +1292,9 @@ rt_mutex_slowlock(struct rt_mutex *lock, int state,
 	struct rt_mutex_waiter waiter;
 	int ret = 0;

+	if (rt_mutex_optimistic_spin(lock))
+		return ret;
+
 	debug_rt_mutex_init_waiter(&waiter);
 	RB_CLEAR_NODE(&waiter.pi_tree_entry);
 	RB_CLEAR_NODE(&waiter.tree_entry);
@@ -1500,7 +1632,9 @@ void __rt_mutex_init(struct rt_mutex *lock, const char *name)
 	raw_spin_lock_init(&lock->wait_lock);
 	lock->waiters = RB_ROOT;
 	lock->waiters_leftmost = NULL;
-
+#ifdef CONFIG_RT_MUTEX_SPIN_ON_OWNER
+	osq_lock_init(&lock->osq);
+#endif
 	debug_rt_mutex_init(lock, name);
 }
 EXPORT_SYMBOL_GPL(__rt_mutex_init);
--
2.1.4

^ permalink raw reply related	[flat|nested] 38+ messages in thread
* Re: [PATCH -rfc 4/4] locking/rtmutex: Support spin on owner (osq)
  2015-05-19 17:24 ` [PATCH -rfc 4/4] locking/rtmutex: Support spin on owner (osq) Davidlohr Bueso
  2015-05-20  7:11   ` Paul Bolle
  2015-05-29 15:19   ` Davidlohr Bueso
@ 2015-06-05 13:59   ` Thomas Gleixner
  2015-06-09  4:41     ` Davidlohr Bueso
  2 siblings, 1 reply; 38+ messages in thread
From: Thomas Gleixner @ 2015-06-05 13:59 UTC (permalink / raw)
  To: Davidlohr Bueso
  Cc: Peter Zijlstra, Ingo Molnar, Steven Rostedt, Mike Galbraith,
	Paul E. McKenney, Sebastian Andrzej Siewior, linux-kernel,
	Davidlohr Bueso

On Tue, 19 May 2015, Davidlohr Bueso wrote:
>
> +/*
> + * Lockless alternative to rt_mutex_has_waiters() as we do not need the
> + * wait_lock to check if we are in, for instance, a transitional state
> + * after calling mark_rt_mutex_waiters().

Before I get into a state of brain melt, could you please explain that
in an understandable way?

rt_mutex_has_waiters() looks at the root pointer of the rbtree head
whether that's empty. You can do a lockless check of that as well,
right? So what's the FAST part of that function and how is that
related to a point after we called mark_rt_mutex_waiters()?

> + */
> +static inline bool rt_mutex_has_waiters_fast(struct rt_mutex *lock)
> +{
> +	unsigned long val = (unsigned long)lock->owner;
> +
> +	if (!val)
> +		return false;
> +	return val & RT_MUTEX_HAS_WAITERS;
> +}
> +
> +/*
> + * Initial check for entering the mutex spinning loop
> + */
> +static inline bool rt_mutex_can_spin_on_owner(struct rt_mutex *lock)
> +{
> +	struct task_struct *owner;
> +	/* default return to spin: if no owner, the lock is free */

Rather than having a comment in the middle of the variable declaration
section, I'd prefer a comment explaining the whole logic of this
function.

> +	int ret = true;

> +static bool rt_mutex_optimistic_spin(struct rt_mutex *lock)
> +{
> +	bool taken = false;
> +
> +	preempt_disable();
> +
> +	if (!rt_mutex_can_spin_on_owner(lock))
> +		goto done;
> +	/*
> +	 * In order to avoid a stampede of mutex spinners trying to
> +	 * acquire the mutex all at once, the spinners need to take a
> +	 * MCS (queued) lock first before spinning on the owner field.
> +	 */
> +	if (!osq_lock(&lock->osq))
> +		goto done;

Hmm. The queue lock is serializing potential spinners, right?

So that's going to lead to a potential priority ordering problem
because if a lower prio task wins the race to the osq_lock queue, then
the higher prio waiter will be queued behind and blocked from taking
the lock first.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 38+ messages in thread
* Re: [PATCH -rfc 4/4] locking/rtmutex: Support spin on owner (osq)
  2015-06-05 13:59   ` Thomas Gleixner
@ 2015-06-09  4:41     ` Davidlohr Bueso
  2015-06-09  9:29       ` Thomas Gleixner
  0 siblings, 1 reply; 38+ messages in thread
From: Davidlohr Bueso @ 2015-06-09 4:41 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Peter Zijlstra, Ingo Molnar, Steven Rostedt, Mike Galbraith,
	Paul E. McKenney, Sebastian Andrzej Siewior, linux-kernel

On Fri, 2015-06-05 at 15:59 +0200, Thomas Gleixner wrote:
> On Tue, 19 May 2015, Davidlohr Bueso wrote:
> >
> > +/*
> > + * Lockless alternative to rt_mutex_has_waiters() as we do not need the
> > + * wait_lock to check if we are in, for instance, a transitional state
> > + * after calling mark_rt_mutex_waiters().
>
> Before I get into a state of brain melt, could you please explain that
> in an understandable way?

With that I meant that we could check the owner to see if the
RT_MUTEX_HAS_WAITERS bit was set without taking the wait_lock, even
when there is no owner.

> rt_mutex_has_waiters() looks at the root pointer of the rbtree head
> whether that's empty. You can do a lockless check of that as well,
> right? So what's the FAST part of that function and how is that
> related to a point after we called mark_rt_mutex_waiters()?

You're right, we could use rt_mutex_has_waiters(). When I thought of
this originally, I was considering something like:

if (rt_mutex_has_waiters(lock)) {
	if (current->prio >= rt_mutex_top_waiter(lock)->prio)
		...

Which obviously requires the wait_lock, but I did not consider just
using the tree. However, the consequence I see in doing this is that we
would miss scenarios where mark_rt_mutex_waiters() is called (under nil
owner, for example), so we would force tasks to block only when there
are truly waiters.

> > + */
> > +static inline bool rt_mutex_has_waiters_fast(struct rt_mutex *lock)
> > +{
> > +	unsigned long val = (unsigned long)lock->owner;
> > +
> > +	if (!val)
> > +		return false;
> > +	return val & RT_MUTEX_HAS_WAITERS;
> > +}
> > +
> > +/*
> > + * Initial check for entering the mutex spinning loop
> > + */
> > +static inline bool rt_mutex_can_spin_on_owner(struct rt_mutex *lock)
> > +{
> > +	struct task_struct *owner;
> > +	/* default return to spin: if no owner, the lock is free */
>
> Rather than having a comment in the middle of the variable declaration
> section, I'd prefer a comment explaining the whole logic of this
> function.

Ok.

> > +	int ret = true;

> > +static bool rt_mutex_optimistic_spin(struct rt_mutex *lock)
> > +{
> > +	bool taken = false;
> > +
> > +	preempt_disable();
> > +
> > +	if (!rt_mutex_can_spin_on_owner(lock))
> > +		goto done;
> > +	/*
> > +	 * In order to avoid a stampede of mutex spinners trying to
> > +	 * acquire the mutex all at once, the spinners need to take a
> > +	 * MCS (queued) lock first before spinning on the owner field.
> > +	 */
> > +	if (!osq_lock(&lock->osq))
> > +		goto done;
>
> Hmm. The queue lock is serializing potential spinners, right?

Yes.

> So that's going to lead to a potential priority ordering problem
> because if a lower prio task wins the race to the osq_lock queue,
> then the higher prio waiter will be queued behind and blocked from
> taking the lock first.

Hmm yes, osq is a fair lock. However I believe this is mitigated by (a)
the conservative spinning approach, and (b) by osq_lock's need_resched()
check, so at least a spinner will abort if a higher prio task comes in.
But of course, this only deals with spinners, and we cannot account for
a lower prio owner task.

So if this is not acceptable, I guess we'll have to do without the
MCS-like properties.

Thanks,
Davidlohr

^ permalink raw reply	[flat|nested] 38+ messages in thread
* Re: [PATCH -rfc 4/4] locking/rtmutex: Support spin on owner (osq)
  2015-06-09  4:41     ` Davidlohr Bueso
@ 2015-06-09  9:29       ` Thomas Gleixner
  2015-06-09 11:21         ` Peter Zijlstra
  0 siblings, 1 reply; 38+ messages in thread
From: Thomas Gleixner @ 2015-06-09 9:29 UTC (permalink / raw)
  To: Davidlohr Bueso
  Cc: Peter Zijlstra, Ingo Molnar, Steven Rostedt, Mike Galbraith,
	Paul E. McKenney, Sebastian Andrzej Siewior, linux-kernel

On Mon, 8 Jun 2015, Davidlohr Bueso wrote:
> On Fri, 2015-06-05 at 15:59 +0200, Thomas Gleixner wrote:
> > rt_mutex_has_waiters() looks at the root pointer of the rbtree head
> > whether that's empty. You can do a lockless check of that as well,
> > right? So what's the FAST part of that function and how is that
> > related to a point after we called mark_rt_mutex_waiters()?
>
> You're right, we could use rt_mutex_has_waiters(). When I thought of
> this originally, I was considering something like:
>
> if (rt_mutex_has_waiters(lock)) {
>	if (current->prio >= rt_mutex_top_waiter(lock)->prio)
>		...
>
> Which obviously requires the wait_lock, but I did not consider just
> using the tree. However, the consequence I see in doing this is that we
> would miss scenarios where mark_rt_mutex_waiters() is called (under nil
> owner, for example), so we would force tasks to block only when there
> are truly waiters.

Fair enough. But we really want a proper comment explaining it.

> > > +static bool rt_mutex_optimistic_spin(struct rt_mutex *lock)
> > > +{
> > > +	bool taken = false;
> > > +
> > > +	preempt_disable();
> > > +
> > > +	if (!rt_mutex_can_spin_on_owner(lock))
> > > +		goto done;
> > > +	/*
> > > +	 * In order to avoid a stampede of mutex spinners trying to
> > > +	 * acquire the mutex all at once, the spinners need to take a
> > > +	 * MCS (queued) lock first before spinning on the owner field.
> > > +	 */
> > > +	if (!osq_lock(&lock->osq))
> > > +		goto done;
> >
> > Hmm. The queue lock is serializing potential spinners, right?
>
> Yes.
>
> > So that's going to lead to a potential priority ordering problem
> > because if a lower prio task wins the race to the osq_lock queue,
> > then the higher prio waiter will be queued behind and blocked from
> > taking the lock first.
>
> Hmm yes, osq is a fair lock. However I believe this is mitigated by (a)
> the conservative spinning approach, and (b) by osq_lock's need_resched()
> check, so at least a spinner will abort if a higher prio task comes in.
> But of course, this only deals with spinners, and we cannot account for
> a lower prio owner task.
>
> So if this is not acceptable, I guess we'll have to do without the
> MCS-like properties.

I'd say it counts as priority inversion. If you look at the RT code,
then you'll notice that in the slow lock path we queue the incoming
waiter (including the PI dance) and then spin only if the waiter is
the top waiter on the lock.

Surely it would be nice to avoid the whole PI dance, but OTOH it can
lead to the following issue (and some others):

CPU0				CPU1
T0 prio=10			T1 prio=20
lock(RTM);
				lock(RTM);
				spin()
->preempt()
T2 prio=15
				if (!owner->on_cpu)
					break;
				block_on_rtmutex();
				prio_boost(T0);
->preempt()
T0 prio=20
unlock(RTM);

IIRC, the initial RT attempt was to spin w/o the PI dance but we gave
up on it due to latency and correctness issues.

I wrapped my head around doing the following:

 1) Lightweight mark the owner as boosted, w/o actually boosting it,
    to keep it on the cpu

 2) Have a priority check in the spin to drop out when a higher prio
    waiter comes in.

But #1 is a nightmare in the scheduler to do, so I gave up on it. If
you have a better idea, you're welcome.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 38+ messages in thread
* Re: [PATCH -rfc 4/4] locking/rtmutex: Support spin on owner (osq)
  2015-06-09  9:29       ` Thomas Gleixner
@ 2015-06-09 11:21         ` Peter Zijlstra
  2015-06-09 12:53           ` Thomas Gleixner
  0 siblings, 1 reply; 38+ messages in thread
From: Peter Zijlstra @ 2015-06-09 11:21 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Davidlohr Bueso, Ingo Molnar, Steven Rostedt, Mike Galbraith,
	Paul E. McKenney, Sebastian Andrzej Siewior, linux-kernel

On Tue, Jun 09, 2015 at 11:29:59AM +0200, Thomas Gleixner wrote:
>
> If you look at the RT code, then you'll notice that in the slow lock
> path we queue the incoming waiter (including the PI dance) and then
> spin only if the waiter is the top waiter on the lock.

On a related note; does the RT code still add lock stealing to rtmutex?

We allow higher prio lock stealing, but I seem to remember a longish
debate if we should allow equal prio lock stealing.

^ permalink raw reply	[flat|nested] 38+ messages in thread
* Re: [PATCH -rfc 4/4] locking/rtmutex: Support spin on owner (osq)
  2015-06-09 11:21         ` Peter Zijlstra
@ 2015-06-09 12:53           ` Thomas Gleixner
  0 siblings, 0 replies; 38+ messages in thread
From: Thomas Gleixner @ 2015-06-09 12:53 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Davidlohr Bueso, Ingo Molnar, Steven Rostedt, Mike Galbraith,
	Paul E. McKenney, Sebastian Andrzej Siewior, linux-kernel

On Tue, 9 Jun 2015, Peter Zijlstra wrote:
> On Tue, Jun 09, 2015 at 11:29:59AM +0200, Thomas Gleixner wrote:
> >
> > If you look at the RT code, then you'll notice that in the slow lock
> > path we queue the incoming waiter (including the PI dance) and then
> > spin only if the waiter is the top waiter on the lock.
>
> On a related note; does the RT code still add lock stealing to rtmutex?

No.

> We allow higher prio lock stealing, but I seem to remember a longish
> debate if we should allow equal prio lock stealing.

Right. We never finished that one :)

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 38+ messages in thread
* Re: [PATCH -tip 0/4] rtmutex: Spin on owner
  2015-05-19 17:24 [PATCH -tip 0/4] rtmutex: Spin on owner Davidlohr Bueso
                   ` (3 preceding siblings ...)
  2015-05-19 17:24 ` [PATCH -rfc 4/4] locking/rtmutex: Support spin on owner (osq) Davidlohr Bueso
@ 2015-05-25 20:35 ` Davidlohr Bueso
  2015-05-26 19:05   ` Thomas Gleixner
  4 siblings, 1 reply; 38+ messages in thread
From: Davidlohr Bueso @ 2015-05-25 20:35 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Peter Zijlstra, Ingo Molnar, Steven Rostedt, Mike Galbraith,
	Paul E. McKenney, Sebastian Andrzej Siewior, linux-kernel

ping?

^ permalink raw reply	[flat|nested] 38+ messages in thread
* Re: [PATCH -tip 0/4] rtmutex: Spin on owner
  2015-05-25 20:35 ` [PATCH -tip 0/4] rtmutex: Spin on owner Davidlohr Bueso
@ 2015-05-26 19:05   ` Thomas Gleixner
  0 siblings, 0 replies; 38+ messages in thread
From: Thomas Gleixner @ 2015-05-26 19:05 UTC (permalink / raw)
  To: Davidlohr Bueso
  Cc: Peter Zijlstra, Ingo Molnar, Steven Rostedt, Mike Galbraith,
	Paul E. McKenney, Sebastian Andrzej Siewior, linux-kernel

On Mon, 25 May 2015, Davidlohr Bueso wrote:
> ping?

pong! It's in my review backlog list and it's gonna stay there until I
return from vacation and conferencing end of next week.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 38+ messages in thread
end of thread, other threads:[~2015-06-21  4:35 UTC | newest]

Thread overview: 38+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-05-19 17:24 [PATCH -tip 0/4] rtmutex: Spin on owner Davidlohr Bueso
2015-05-19 17:24 ` [PATCH 1/4] locking/rtmutex: Implement lockless top-waiter wakeup Davidlohr Bueso
2015-06-05 12:35   ` Thomas Gleixner
2015-06-16 19:29     ` [PATCH] futex: lower the lock contention on the HB lock during wake up Sebastian Andrzej Siewior
2015-06-16 19:50       ` Davidlohr Bueso
2015-06-17  8:33         ` [PATCH v2] " Sebastian Andrzej Siewior
2015-06-17 14:17           ` Mike Galbraith
2015-06-17 14:28             ` Sebastian Andrzej Siewior
2015-06-17 14:31               ` Mike Galbraith
2015-06-21  4:35                 ` Mike Galbraith
2015-06-18 20:30           ` [tip:sched/core] futex: Lower " tip-bot for Sebastian Andrzej Siewior
2015-06-19 17:51             ` [PATCH v2] futex: lower " Kevin Hilman
2015-06-19 18:54               ` Thomas Gleixner
2015-06-19 19:32                 ` Kevin Hilman
2015-06-19 19:33             ` [tip:sched/locking] futex: Lower " tip-bot for Sebastian Andrzej Siewior
2015-06-18 20:30   ` [tip:sched/core] locking/rtmutex: Implement lockless top-waiter wakeup tip-bot for Davidlohr Bueso
2015-05-19 17:24 ` [PATCH 2/4] locking/rtmutex: Use cmp-cmpxchg Davidlohr Bueso
2015-06-05 12:38   ` Thomas Gleixner
2015-06-06 15:27     ` Davidlohr Bueso
2015-06-15 18:34       ` Jason Low
2015-06-15 19:37         ` Davidlohr Bueso
2015-06-16  1:00           ` Jason Low
2015-05-19 17:24 ` [PATCH 3/4] locking/rtmutex: Update stale plist comments Davidlohr Bueso
2015-06-05 12:39   ` Thomas Gleixner
2015-06-18 20:57   ` [tip:sched/core] " tip-bot for Davidlohr Bueso
2015-06-19 19:33   ` [tip:sched/locking] " tip-bot for Davidlohr Bueso
2015-05-19 17:24 ` [PATCH -rfc 4/4] locking/rtmutex: Support spin on owner (osq) Davidlohr Bueso
2015-05-20  7:11   ` Paul Bolle
2015-05-25 20:35   ` Davidlohr Bueso
2015-05-29 15:19   ` Davidlohr Bueso
2015-05-29 18:01     ` Davidlohr Bueso
2015-06-05 13:59   ` Thomas Gleixner
2015-06-09  4:41     ` Davidlohr Bueso
2015-06-09  9:29       ` Thomas Gleixner
2015-06-09 11:21         ` Peter Zijlstra
2015-06-09 12:53           ` Thomas Gleixner
2015-05-25 20:35 ` [PATCH -tip 0/4] rtmutex: Spin on owner Davidlohr Bueso
2015-05-26 19:05   ` Thomas Gleixner