* [PATCH RFC] locking/mutexes: don't spin on owner when wait list is not NULL.

From: Ding Tianhong
Date: 2016-01-21 9:29 UTC
To: Peter Zijlstra, Ingo Molnar, linux-kernel

I wrote a script that creates several processes calling an ioctl in a
loop; the ioctl takes the following path in the kernel:

	xx_ioctl {
		...
		rtnl_lock();
		function();
		rtnl_unlock();
		...
	}

The function may sleep for several ms, but does not halt. At the same
time, another user service calls ifconfig to change the state of the
ethernet device, and after several hours the hung-task watchdog reports
this problem:

========================================================================
[149738.039038] INFO: task ifconfig:11890 blocked for more than 120 seconds.
[149738.040597] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[149738.042280] ifconfig        D ffff88061ec13680     0 11890  11573 0x00000080
[149738.042284]  ffff88052449bd40 0000000000000082 ffff88053a33f300 ffff88052449bfd8
[149738.042286]  ffff88052449bfd8 ffff88052449bfd8 ffff88053a33f300 ffffffff819e6240
[149738.042288]  ffffffff819e6244 ffff88053a33f300 00000000ffffffff ffffffff819e6248
[149738.042290] Call Trace:
[149738.042300]  [<ffffffff8160d219>] schedule_preempt_disabled+0x29/0x70
[149738.042303]  [<ffffffff8160af65>] __mutex_lock_slowpath+0xc5/0x1c0
[149738.042305]  [<ffffffff8160a3cf>] mutex_lock+0x1f/0x2f
[149738.042309]  [<ffffffff8150d945>] rtnl_lock+0x15/0x20
[149738.042311]  [<ffffffff81514e3a>] dev_ioctl+0xda/0x590
[149738.042314]  [<ffffffff816121cc>] ? __do_page_fault+0x21c/0x560
[149738.042318]  [<ffffffff814e42c5>] sock_do_ioctl+0x45/0x50
[149738.042320]  [<ffffffff814e49d0>] sock_ioctl+0x1f0/0x2c0
[149738.042324]  [<ffffffff811dc9b5>] do_vfs_ioctl+0x2e5/0x4c0
[149738.042327]  [<ffffffff811e6a00>] ? fget_light+0xa0/0xd0
================================ cut here ================================

I examined the vmcore and found that ifconfig had already been on the
wait_list of the rtnl_lock for 120 seconds, while my processes could
still acquire and release the rtnl_lock normally several times per
second. That means my processes kept jumping the queue and ifconfig
could never get the rtnl_lock at all. Looking at the mutex slow path, I
found that a task may spin on the owner regardless of whether the wait
list is empty, which lets tasks on the wait list be cut in line
indefinitely. So add a test for a non-empty wait list in
mutex_can_spin_on_owner() to avoid this problem.

Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@us.ibm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Jason Low <jason.low2@hp.com>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: Waiman Long <Waiman.Long@hp.com>
---
 kernel/locking/mutex.c | 11 ++++++-----
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/kernel/locking/mutex.c b/kernel/locking/mutex.c
index 0551c21..596b341 100644
--- a/kernel/locking/mutex.c
+++ b/kernel/locking/mutex.c
@@ -256,7 +256,7 @@ static inline int mutex_can_spin_on_owner(struct mutex *lock)
 	struct task_struct *owner;
 	int retval = 1;
 
-	if (need_resched())
+	if (need_resched() || atomic_read(&lock->count) == -1)
 		return 0;
 
 	rcu_read_lock();
@@ -283,10 +283,11 @@ static inline bool mutex_try_to_acquire(struct mutex *lock)
 /*
  * Optimistic spinning.
  *
- * We try to spin for acquisition when we find that the lock owner
- * is currently running on a (different) CPU and while we don't
- * need to reschedule. The rationale is that if the lock owner is
- * running, it is likely to release the lock soon.
+ * We try to spin for acquisition when we find that there are no
+ * pending waiters and the lock owner is currently running on a
+ * (different) CPU and while we don't need to reschedule. The
+ * rationale is that if the lock owner is running, it is likely
+ * to release the lock soon.
  *
  * Since this needs the lock owner, and this mutex implementation
  * doesn't track the owner atomically in the lock field, we need to
-- 
2.5.0
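The reproducer script itself was not posted. As a rough illustration of
the described load, here is a minimal userspace sketch; since the report
exercises a private driver ioctl, the interface name ("eth0") and the
choice of SIOCSIFMTU (one of the dev_ioctl() commands that runs under
rtnl_lock()) are assumptions, not the original test.

	/*
	 * Hedged reproducer sketch, not the original script: hammer an
	 * ioctl that serializes on rtnl_lock() from many processes, then
	 * try `ifconfig eth0 up` from another shell. "eth0" and
	 * SIOCSIFMTU are illustrative assumptions; needs CAP_NET_ADMIN.
	 */
	#include <net/if.h>
	#include <string.h>
	#include <sys/ioctl.h>
	#include <sys/socket.h>
	#include <unistd.h>

	int main(void)
	{
		struct ifreq ifr;
		int i, fd;

		for (i = 0; i < 16; i++)	/* a pack of contending processes */
			if (fork() == 0)
				break;

		fd = socket(AF_INET, SOCK_DGRAM, 0);
		memset(&ifr, 0, sizeof(ifr));
		strncpy(ifr.ifr_name, "eth0", IFNAMSIZ - 1);
		ifr.ifr_mtu = 1500;

		/* each pass takes rtnl_lock() inside dev_ioctl() */
		for (;;)
			ioctl(fd, SIOCSIFMTU, &ifr);
	}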
* Re: [PATCH RFC] locking/mutexes: don't spin on owner when wait list is not NULL.

From: Tim Chen
Date: 2016-01-21 21:23 UTC

On Thu, 2016-01-21 at 17:29 +0800, Ding Tianhong wrote:
>
> diff --git a/kernel/locking/mutex.c b/kernel/locking/mutex.c
> index 0551c21..596b341 100644
> --- a/kernel/locking/mutex.c
> +++ b/kernel/locking/mutex.c
> @@ -256,7 +256,7 @@ static inline int mutex_can_spin_on_owner(struct mutex *lock)
>  	struct task_struct *owner;
>  	int retval = 1;
>  
> -	if (need_resched())
> +	if (need_resched() || atomic_read(&lock->count) == -1)
>  		return 0;
>

One concern I have is that this change will eliminate any optimistic
spinning as long as there is a waiter. Is there a middle ground where we
allow only one spinner if there are waiters?

In other words, we allow spinning when
atomic_read(&lock->count) == -1 but no one holds the osq lock that
queues up the spinners (i.e. no other process is doing optimistic
spinning).

This could allow a bit of spinning without starving out the waiters.

Tim
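Tim's middle ground never appeared as code in this sub-thread; a minimal
sketch of the check he describes could look like the function below,
against the 4.4-era mutex code. osq_is_locked() is the existing
osq_lock.h helper for testing whether any task currently holds the OSQ;
the rest mirrors the existing mutex_can_spin_on_owner().

	/*
	 * Illustrative sketch of Tim's suggestion (not a posted patch):
	 * with waiters pending, spin only if nobody already holds the
	 * OSQ, so at most one opportunistic spinner competes with the
	 * wait list.
	 */
	static inline int mutex_can_spin_on_owner(struct mutex *lock)
	{
		struct task_struct *owner;
		int retval = 1;

		if (need_resched())
			return 0;

		/* Waiters queued and someone already spinning: don't pile on. */
		if (atomic_read(&lock->count) == -1 && osq_is_locked(&lock->osq))
			return 0;

		rcu_read_lock();
		owner = READ_ONCE(lock->owner);
		if (owner)
			retval = owner->on_cpu;
		rcu_read_unlock();

		/*
		 * If lock->owner is not set, the owner may have just
		 * acquired the mutex and not set the field yet, or the
		 * mutex may have been released; allow spinning then,
		 * as before.
		 */
		return retval;
	}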
* Re: [PATCH RFC] locking/mutexes: don't spin on owner when wait list is not NULL.

From: Paul E. McKenney
Date: 2016-01-22 2:41 UTC

On Thu, Jan 21, 2016 at 01:23:09PM -0800, Tim Chen wrote:
> On Thu, 2016-01-21 at 17:29 +0800, Ding Tianhong wrote:
> > [...]
>
> One concern I have is that this change will eliminate any optimistic
> spinning as long as there is a waiter. Is there a middle ground where
> we allow only one spinner if there are waiters?
>
> In other words, we allow spinning when
> atomic_read(&lock->count) == -1 but no one holds the osq lock that
> queues up the spinners (i.e. no other process is doing optimistic
> spinning).
>
> This could allow a bit of spinning without starving out the waiters.

I did some testing, which exposed it to the 0day test robot, which
did note some performance differences. I was hoping that it would
clear up some instability from other patches, but no such luck. ;-)

							Thanx, Paul
* Re: [PATCH RFC] locking/mutexes: don't spin on owner when wait list is not NULL.

From: Davidlohr Bueso
Date: 2016-01-22 2:48 UTC

On Thu, 21 Jan 2016, Paul E. McKenney wrote:
> I did some testing, which exposed it to the 0day test robot, which
> did note some performance differences. I was hoping that it would
> clear up some instability from other patches, but no such luck. ;-)

Oh, that explains why we got a performance regression report :)

Thanks,
Davidlohr
* Re: [PATCH RFC] locking/mutexes: don't spin on owner when wait list is not NULL.

From: Paul E. McKenney
Date: 2016-01-22 3:13 UTC

On Thu, Jan 21, 2016 at 06:48:54PM -0800, Davidlohr Bueso wrote:
> On Thu, 21 Jan 2016, Paul E. McKenney wrote:
>
> > I did some testing, which exposed it to the 0day test robot, which
> > did note some performance differences. I was hoping that it would
> > clear up some instability from other patches, but no such luck. ;-)
>
> Oh, that explains why we got a performance regression report :)

Plus I suspected that you wanted some extra email. ;-)

							Thanx, Paul
* Re: [PATCH RFC] locking/mutexes: don't spin on owner when wait list is not NULL.

From: Waiman Long
Date: 2016-01-21 23:02 UTC

On 01/21/2016 04:29 AM, Ding Tianhong wrote:
> I wrote a script that creates several processes calling an ioctl in a
> loop [...]
>
> I examined the vmcore and found that ifconfig had already been on the
> wait_list of the rtnl_lock for 120 seconds [...] So add a test for a
> non-empty wait list in mutex_can_spin_on_owner() to avoid this
> problem.
> [...]
> -	if (need_resched())
> +	if (need_resched() || atomic_read(&lock->count) == -1)
>  		return 0;
> [...]

This patch will largely defeat the performance benefit of optimistic
spinning. I have an alternative solution to this live-lock problem.
Would you mind trying out the attached patch to see if it can fix your
problem?

Cheers,
Longman

[-- Attachment #2: 0001-locking-mutex-Enable-optimistic-spinning-of-woken-ta.patch --]

From 1bbb5a4434d395f48163abc5435c5c720a15d327 Mon Sep 17 00:00:00 2001
From: Waiman Long <Waiman.Long@hpe.com>
Date: Thu, 21 Jan 2016 17:53:14 -0500
Subject: [PATCH] locking/mutex: Enable optimistic spinning of woken task in wait list

Ding Tianhong reported a live-lock situation where a constant stream
of incoming optimistic spinners blocked a task in the wait list from
getting the mutex.

This patch attempts to fix this live-lock condition by enabling a
woken task in the wait list to enter the optimistic spinning loop
itself, with precedence over the ones in the OSQ. This should prevent
the live-lock condition from happening.

Signed-off-by: Waiman Long <Waiman.Long@hpe.com>
---
 include/linux/mutex.h  |  2 +
 kernel/locking/mutex.c | 95 +++++++++++++++++++++++++++++++++++++++++++++++-
 2 files changed, 95 insertions(+), 2 deletions(-)

diff --git a/include/linux/mutex.h b/include/linux/mutex.h
index 2cb7531..2c55ecd 100644
--- a/include/linux/mutex.h
+++ b/include/linux/mutex.h
@@ -57,6 +57,8 @@ struct mutex {
 #endif
 #ifdef CONFIG_MUTEX_SPIN_ON_OWNER
 	struct optimistic_spin_queue osq; /* Spinner MCS lock */
+	/* Set if wait list head actively spinning */
+	int wlh_spinning;
 #endif
 #ifdef CONFIG_DEBUG_MUTEXES
 	void *magic;
diff --git a/kernel/locking/mutex.c b/kernel/locking/mutex.c
index 0551c21..8b27b03 100644
--- a/kernel/locking/mutex.c
+++ b/kernel/locking/mutex.c
@@ -55,6 +55,7 @@ __mutex_init(struct mutex *lock, const char *name, struct lock_class_key *key)
 	mutex_clear_owner(lock);
 #ifdef CONFIG_MUTEX_SPIN_ON_OWNER
 	osq_lock_init(&lock->osq);
+	lock->wlh_spinning = false;
 #endif
 
 	debug_mutex_init(lock, name, key);
@@ -346,8 +347,12 @@ static bool mutex_optimistic_spin(struct mutex *lock,
 		if (owner && !mutex_spin_on_owner(lock, owner))
 			break;
 
-		/* Try to acquire the mutex if it is unlocked. */
-		if (mutex_try_to_acquire(lock)) {
+		/*
+		 * Try to acquire the mutex if it is unlocked and the wait
+		 * list head isn't spinning on the lock.
+		 */
+		if (!READ_ONCE(lock->wlh_spinning) &&
+		    mutex_try_to_acquire(lock)) {
 			lock_acquired(&lock->dep_map, ip);
 
 			if (use_ww_ctx) {
@@ -398,12 +403,91 @@ done:
 	return false;
 }
 
+/*
+ * Wait list head optimistic spinning
+ *
+ * The wait list head, when woken up, will try to spin on the lock if the
+ * lock owner is active. It will also set the wlh_spinning flag to give
+ * itself a higher chance of getting the lock than the other optimistically
+ * spinning lockers in the OSQ.
+ */
+static bool mutex_wlh_opt_spin(struct mutex *lock,
+			       struct ww_acquire_ctx *ww_ctx, const bool use_ww_ctx)
+{
+	struct task_struct *owner, *task = current;
+	int gotlock = false;
+
+	WRITE_ONCE(lock->wlh_spinning, true);
+	while (true) {
+		if (use_ww_ctx && ww_ctx->acquired > 0) {
+			struct ww_mutex *ww;
+
+			ww = container_of(lock, struct ww_mutex, base);
+			/*
+			 * If ww->ctx is set the contents are undefined, only
+			 * by acquiring wait_lock there is a guarantee that
+			 * they are not invalid when reading.
+			 *
+			 * As such, when deadlock detection needs to be
+			 * performed the optimistic spinning cannot be done.
+			 */
+			if (READ_ONCE(ww->ctx))
+				break;
+		}
+
+		/*
+		 * If there's an owner, wait for it to either
+		 * release the lock or go to sleep.
+		 */
+		owner = READ_ONCE(lock->owner);
+		if (owner && !mutex_spin_on_owner(lock, owner))
+			break;
+
+		/*
+		 * Try to acquire the mutex if it is unlocked. The mutex
+		 * value is set to -1 which will be changed to 0 later on
+		 * if the wait list becomes empty.
+		 */
+		if (!mutex_is_locked(lock) &&
+		    (atomic_cmpxchg_acquire(&lock->count, 1, -1) == 1)) {
+			gotlock = true;
+			break;
+		}
+
+		/*
+		 * When there's no owner, we might have preempted between the
+		 * owner acquiring the lock and setting the owner field. If
+		 * we're an RT task that will live-lock because we won't let
+		 * the owner complete.
+		 */
+		if (!owner && (need_resched() || rt_task(task)))
+			break;
+
+		/*
+		 * The cpu_relax() call is a compiler barrier which forces
+		 * everything in this loop to be re-loaded. We don't need
+		 * memory barriers as we'll eventually observe the right
+		 * values at the cost of a few extra spins.
+		 */
+		cpu_relax_lowlatency();
+	}
+	WRITE_ONCE(lock->wlh_spinning, false);
+	return gotlock;
+}
 #else
 static bool mutex_optimistic_spin(struct mutex *lock,
 				  struct ww_acquire_ctx *ww_ctx, const bool use_ww_ctx)
 {
 	return false;
 }
+
+static bool mutex_wlh_opt_spin(struct mutex *lock,
+			       struct ww_acquire_ctx *ww_ctx, const bool use_ww_ctx)
+{
+	return false;
+}
 #endif
 
 __visible __used noinline
@@ -543,6 +627,8 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
 	lock_contended(&lock->dep_map, ip);
 
 	for (;;) {
+		int gotlock;
+
 		/*
 		 * Lets try to take the lock again - this is needed even if
 		 * we get here for the first time (shortly after failing to
@@ -577,7 +663,12 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
 		/* didn't get the lock, go to sleep: */
 		spin_unlock_mutex(&lock->wait_lock, flags);
 		schedule_preempt_disabled();
+
+		/* optimistically spin on the mutex without the wait lock */
+		gotlock = mutex_wlh_opt_spin(lock, ww_ctx, use_ww_ctx);
 		spin_lock_mutex(&lock->wait_lock, flags);
+		if (gotlock)
+			break;
 	}
 	__set_task_state(task, TASK_RUNNING);
-- 
1.7.1
* Re: [PATCH RFC] locking/mutexes: don't spin on owner when wait list is not NULL.

From: Davidlohr Bueso
Date: 2016-01-22 6:09 UTC

On Thu, 21 Jan 2016, Waiman Long wrote:
> On 01/21/2016 04:29 AM, Ding Tianhong wrote:
> > I examined the vmcore and found that ifconfig had already been on
> > the wait_list of the rtnl_lock for 120 seconds [...] So add a test
> > for a non-empty wait list in mutex_can_spin_on_owner() to avoid
> > this problem.

So this has always been somewhat known, at least in theory, until now.
It's the cost of spinning without going through the wait-queue, unlike
other locks.

> > [...]

> From: Waiman Long <Waiman.Long@hpe.com>
> Date: Thu, 21 Jan 2016 17:53:14 -0500
> Subject: [PATCH] locking/mutex: Enable optimistic spinning of woken task in wait list
>
> Ding Tianhong reported a live-lock situation where a constant stream
> of incoming optimistic spinners blocked a task in the wait list from
> getting the mutex.
>
> This patch attempts to fix this live-lock condition by enabling a
> woken task in the wait list to enter the optimistic spinning loop
> itself, with precedence over the ones in the OSQ. This should prevent
> the live-lock condition from happening.

And one of the reasons why we never bothered 'fixing' things was the
additional branching out in the slowpath (and the lack of a real issue,
although this one is so damn pathological). I fear that your approach
is one of those scenarios where the code ends up being bloated, albeit
most of it is actually duplicated and can be refactored *sigh*. So now
we'd spin, then sleep, then try spinning, then sleep again... phew. Not
to mention the performance implications, ie losing the benefits of osq
over waiter spinning in scenarios that would otherwise have more osq
spinners as opposed to waiter spinners, or in setups where it is
actually best to block instead of spinning.

Thanks,
Davidlohr
* Re: [PATCH RFC] locking/mutexes: don't spin on owner when wait list is not NULL.

From: Waiman Long
Date: 2016-01-22 13:38 UTC

On 01/22/2016 01:09 AM, Davidlohr Bueso wrote:
> [...] I fear that your approach is one of those scenarios where the
> code ends up being bloated, albeit most of it is actually duplicated
> and can be refactored *sigh*. [...]

The patch that I sent out is just a proof of concept to make sure that
it can fix that particular case. I do plan to refactor it if I decide
to go ahead with an official one. Unlike the OSQ, there can be no more
than one waiter spinner, as the wakeup is directed only at the first
task in the wait list and the spinning won't happen until the task is
first woken up. In the worst-case scenario, there are only 2 spinners
spinning on the lock and the owner field, one from the OSQ and one from
the wait list. That shouldn't put too much cacheline contention traffic
on the system.

Cheers,
Longman
* Re: [PATCH RFC] locking/mutexes: don't spin on owner when wait list is not NULL.

From: Davidlohr Bueso
Date: 2016-01-22 16:46 UTC

On Fri, 22 Jan 2016, Waiman Long wrote:
> The patch that I sent out is just a proof of concept [...] In the
> worst-case scenario, there are only 2 spinners spinning on the lock
> and the owner field, one from the OSQ and one from the wait list.
> That shouldn't put too much cacheline contention traffic on the
> system.

Similarly, I guess we should also wake up the next waiter in line after
releasing the wait_lock, via wake_q. This would give the woken waiter a
slightly better chance of finding the wait_lock free when it continues
taking the mutex.

Thanks,
Davidlohr
* [PATCH] locking/mutex: Allow next waiter lockless wakeup

From: Davidlohr Bueso
Date: 2016-01-25 2:23 UTC

Make use of wake_q and enable the wakeup to occur after releasing the
wait_lock. This is similar to what we do with the rtmutex top waiter:
it slightly shortens the critical region and allows other waiters to
acquire the wait_lock sooner. In low-contention cases it can also help
the recently woken waiter find the wait_lock available (fastpath) when
it continues execution.

Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
---
 kernel/locking/mutex.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/kernel/locking/mutex.c b/kernel/locking/mutex.c
index 0551c21..e364b42 100644
--- a/kernel/locking/mutex.c
+++ b/kernel/locking/mutex.c
@@ -716,6 +716,7 @@ static inline void
 __mutex_unlock_common_slowpath(struct mutex *lock, int nested)
 {
 	unsigned long flags;
+	WAKE_Q(wake_q);
 
 	/*
 	 * As a performance measurement, release the lock before doing other
@@ -743,11 +744,11 @@ __mutex_unlock_common_slowpath(struct mutex *lock, int nested)
 				   struct mutex_waiter, list);
 
 		debug_mutex_wake_waiter(lock, waiter);
-
-		wake_up_process(waiter->task);
+		wake_q_add(&wake_q, waiter->task);
 	}
 
 	spin_unlock_mutex(&lock->wait_lock, flags);
+	wake_up_q(&wake_q);
 }
 
 /*
-- 
2.1.4
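For readers unfamiliar with the wake-queue API used here: wake_q_add()
only records the task (taking a reference so it cannot exit underneath
us), and the actual wake_up_process() calls happen later in
wake_up_q(), outside the lock. A minimal sketch of the pattern follows;
"demo_lock" and "next" are hypothetical stand-ins, not names from the
patch.

	/* Sketch of the wake_q pattern (kernel-internal API of this era). */
	static void demo_release(spinlock_t *demo_lock, struct task_struct *next)
	{
		WAKE_Q(wake_q);		/* empty on-stack wake queue */

		spin_lock(demo_lock);
		/* Record the task; grabs a reference, does NOT wake it yet. */
		wake_q_add(&wake_q, next);
		spin_unlock(demo_lock);

		/* Deferred wake_up_process() calls happen here, unlocked. */
		wake_up_q(&wake_q);
	}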
* Re: [PATCH] locking/mutex: Allow next waiter lockless wakeup

From: Waiman Long
Date: 2016-01-25 23:02 UTC

On 01/24/2016 09:23 PM, Davidlohr Bueso wrote:
> Make use of wake_q and enable the wakeup to occur after releasing the
> wait_lock. This is similar to what we do with the rtmutex top waiter:
> it slightly shortens the critical region and allows other waiters to
> acquire the wait_lock sooner. In low-contention cases it can also
> help the recently woken waiter find the wait_lock available
> (fastpath) when it continues execution.
>
> Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
> [...]

This patch looks good to me.

Reviewed-by: Waiman Long <Waiman.Long@hpe.com>
* [tip:locking/core] locking/mutex: Allow next waiter lockless wakeup

From: tip-bot for Davidlohr Bueso
Date: 2016-02-29 11:21 UTC

Commit-ID:  1329ce6fbbe4536592dfcfc8d64d61bfeb598fe6
Gitweb:     http://git.kernel.org/tip/1329ce6fbbe4536592dfcfc8d64d61bfeb598fe6
Author:     Davidlohr Bueso <dave@stgolabs.net>
AuthorDate: Sun, 24 Jan 2016 18:23:43 -0800
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Mon, 29 Feb 2016 10:02:42 +0100

locking/mutex: Allow next waiter lockless wakeup

Make use of wake-queues and enable the wakeup to occur after releasing
the wait_lock. This is similar to what we do with the rtmutex top
waiter: it slightly shortens the critical region and allows other
waiters to acquire the wait_lock sooner. In low-contention cases it can
also help the recently woken waiter find the wait_lock available
(fastpath) when it continues execution.

Reviewed-by: Waiman Long <Waiman.Long@hpe.com>
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Ding Tianhong <dingtianhong@huawei.com>
Cc: Jason Low <jason.low2@hp.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Paul E. McKenney <paulmck@us.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: Waiman Long <waiman.long@hpe.com>
Cc: Will Deacon <Will.Deacon@arm.com>
Link: http://lkml.kernel.org/r/20160125022343.GA3322@linux-uzut.site
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 kernel/locking/mutex.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/kernel/locking/mutex.c b/kernel/locking/mutex.c
index 0551c21..e364b42 100644
--- a/kernel/locking/mutex.c
+++ b/kernel/locking/mutex.c
@@ -716,6 +716,7 @@ static inline void
 __mutex_unlock_common_slowpath(struct mutex *lock, int nested)
 {
 	unsigned long flags;
+	WAKE_Q(wake_q);
 
 	/*
 	 * As a performance measurement, release the lock before doing other
@@ -743,11 +744,11 @@ __mutex_unlock_common_slowpath(struct mutex *lock, int nested)
 				   struct mutex_waiter, list);
 
 		debug_mutex_wake_waiter(lock, waiter);
-
-		wake_up_process(waiter->task);
+		wake_q_add(&wake_q, waiter->task);
 	}
 
 	spin_unlock_mutex(&lock->wait_lock, flags);
+	wake_up_q(&wake_q);
 }
 
 /*
* Re: [PATCH RFC] locking/mutexes: don't spin on owner when wait list is not NULL.

From: Peter Zijlstra
Date: 2016-01-22 8:54 UTC

On Thu, Jan 21, 2016 at 06:02:34PM -0500, Waiman Long wrote:
> This patch attempts to fix this live-lock condition by enabling a
> woken task in the wait list to enter the optimistic spinning loop
> itself, with precedence over the ones in the OSQ. This should prevent
> the live-lock condition from happening.

So I think having the top waiter go back in to contend on the OSQ is an
excellent idea, but I'm not sure the wlh_spinning thing is important.

The OSQ itself is FIFO fair, and the waiters retain their wait_list
position. So having the top wait_list entry contend on the OSQ ensures
we cannot starve (I think).

Also, as Davidlohr said, we cannot copy/paste this much code.
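To make the fairness argument concrete, here is a rough picture of the
queueing discipline Peter describes (an annotation, not code from the
thread):

	/*
	 * Spinners:   T1 -> T2 -> T3     (OSQ, FIFO fair; only the OSQ
	 *                                  head spins on the lock itself)
	 * Wait list:  W1 -> W2 -> W3     (FIFO; W1 is woken at unlock)
	 *
	 * When W1 wakes it joins the OSQ behind the current spinners,
	 * e.g. T2 -> T3 -> W1, while keeping its wait_list position.
	 * Since the OSQ is FIFO, W1 reaches the OSQ head after a bounded
	 * number of handovers, so a constant stream of new spinners can
	 * no longer keep it off the lock indefinitely (modulo fastpath
	 * steals).
	 */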
* Re: [PATCH RFC] locking/mutexes: don't spin on owner when wait list is not NULL.

From: Jason Low
Date: 2016-01-22 10:20 UTC

On Fri, 2016-01-22 at 09:54 +0100, Peter Zijlstra wrote:
> So I think having the top waiter go back in to contend on the OSQ is
> an excellent idea, but I'm not sure the wlh_spinning thing is
> important.
>
> The OSQ itself is FIFO fair, and the waiters retain their wait_list
> position. So having the top wait_list entry contend on the OSQ
> ensures we cannot starve (I think).

Right, and we can also avoid needing to add that extra field to the
mutex structure. Before calling optimistic spinning, we do want to
check whether the lock is available, though, to avoid unnecessary OSQ
overhead. So maybe the following would be sufficient:

---
 kernel/locking/mutex.c |    7 +++++++
 1 files changed, 7 insertions(+), 0 deletions(-)

diff --git a/kernel/locking/mutex.c b/kernel/locking/mutex.c
index 0551c21..ead0bd1 100644
--- a/kernel/locking/mutex.c
+++ b/kernel/locking/mutex.c
@@ -543,6 +543,8 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
 	lock_contended(&lock->dep_map, ip);
 
 	for (;;) {
+		bool acquired = false;
+
 		/*
 		 * Lets try to take the lock again - this is needed even if
 		 * we get here for the first time (shortly after failing to
@@ -577,7 +579,12 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
 		/* didn't get the lock, go to sleep: */
 		spin_unlock_mutex(&lock->wait_lock, flags);
 		schedule_preempt_disabled();
+
+		if (mutex_is_locked(lock))
+			acquired = mutex_optimistic_spin(lock, ww_ctx, use_ww_ctx);
 		spin_lock_mutex(&lock->wait_lock, flags);
+		if (acquired)
+			break;
 	}
 	__set_task_state(task, TASK_RUNNING);
-- 
1.7.2.5
* Re: [PATCH RFC] locking/mutexes: don't spin on owner when wait list is not NULL.

From: Peter Zijlstra
Date: 2016-01-22 10:53 UTC

On Fri, Jan 22, 2016 at 02:20:19AM -0800, Jason Low wrote:
> --- a/kernel/locking/mutex.c
> +++ b/kernel/locking/mutex.c
> @@ -543,6 +543,8 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
>  	lock_contended(&lock->dep_map, ip);
>  
>  	for (;;) {
> +		bool acquired = false;
> +
> [...]
>  		/* didn't get the lock, go to sleep: */
>  		spin_unlock_mutex(&lock->wait_lock, flags);
>  		schedule_preempt_disabled();
> +
> +		if (mutex_is_locked(lock))
> +			acquired = mutex_optimistic_spin(lock, ww_ctx, use_ww_ctx);
>  		spin_lock_mutex(&lock->wait_lock, flags);
> +		if (acquired)
> +			break;
>  	}
>  	__set_task_state(task, TASK_RUNNING);

I think the problem here is that mutex_optimistic_spin() leaves
lock->count == 0, even though we have waiters (us at the very least).

But this should be easily fixed, since if we acquired, we should be the
one releasing, so there's no race. So something like so:

	if (acquired) {
		atomic_set(&lock->count, -1);
		break;
	}

should deal with that -- we'll set it to 0 again a little further down
if the list ends up empty.

There might be other details, but this is the one that stood out.
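For reference, the ->count encoding that Peter's atomic_set() restores
is the one documented above struct mutex in include/linux/mutex.h in
kernels of this era (before the later rewrite that folded owner and
lock state into a single word):

	/*
	 * From the struct mutex comment in include/linux/mutex.h then:
	 *
	 *   1: unlocked
	 *   0: locked, no waiters
	 *   negative: locked, possible waiters
	 *
	 * mutex_optimistic_spin() leaves count at 0 on success, so a
	 * waiter that acquires by spinning must put it back to -1;
	 * otherwise the eventual unlock would take the fastpath and
	 * never wake the remaining waiters.
	 */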
* Re: [PATCH RFC] locking/mutexes: don't spin on owner when wait list is not NULL.

From: Peter Zijlstra
Date: 2016-01-22 10:56 UTC

On Fri, Jan 22, 2016 at 11:53:12AM +0100, Peter Zijlstra wrote:
> There might be other details, but this is the one that stood out.

I think this also does the wrong thing for use_ww_ctx.
* Re: [PATCH RFC] locking/mutexes: don't spin on owner when wait list is not NULL.

From: Peter Zijlstra
Date: 2016-01-22 11:06 UTC

On Fri, Jan 22, 2016 at 11:56:52AM +0100, Peter Zijlstra wrote:
> On Fri, Jan 22, 2016 at 11:53:12AM +0100, Peter Zijlstra wrote:
>
> > There might be other details, but this is the one that stood out.
>
> I think this also does the wrong thing for use_ww_ctx.

Something like so?

---
 kernel/locking/mutex.c | 15 +++++++++++++++
 1 file changed, 15 insertions(+)

diff --git a/kernel/locking/mutex.c b/kernel/locking/mutex.c
index 0551c219c40e..070a0ac34aa7 100644
--- a/kernel/locking/mutex.c
+++ b/kernel/locking/mutex.c
@@ -512,6 +512,7 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
 	struct task_struct *task = current;
 	struct mutex_waiter waiter;
 	unsigned long flags;
+	bool acquired;
 	int ret;
 
 	preempt_disable();
@@ -543,6 +544,7 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
 	lock_contended(&lock->dep_map, ip);
 
 	for (;;) {
+		acquired = false;
 		/*
 		 * Lets try to take the lock again - this is needed even if
 		 * we get here for the first time (shortly after failing to
@@ -577,7 +579,16 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
 		/* didn't get the lock, go to sleep: */
 		spin_unlock_mutex(&lock->wait_lock, flags);
 		schedule_preempt_disabled();
+
+		if (mutex_is_locked(lock))
+			acquired = mutex_optimistic_spin(lock, ww_ctx, use_ww_ctx);
+
 		spin_lock_mutex(&lock->wait_lock, flags);
+
+		if (acquired) {
+			atomic_set(&lock->count, -1);
+			break;
+		}
 	}
 	__set_task_state(task, TASK_RUNNING);
 
@@ -587,6 +598,9 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
 		atomic_set(&lock->count, 0);
 	debug_mutex_free_waiter(&waiter);
 
+	if (acquired)
+		goto unlock;
+
 skip_wait:
 	/* got the lock - cleanup and rejoice! */
 	lock_acquired(&lock->dep_map, ip);
@@ -597,6 +611,7 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
 		ww_mutex_set_context_slowpath(ww, ww_ctx);
 	}
 
+unlock:
 	spin_unlock_mutex(&lock->wait_lock, flags);
 	preempt_enable();
 	return 0;
* Re: [PATCH RFC] locking/mutexes: don't spin on owner when wait list is not NULL.

From: Waiman Long
Date: 2016-01-22 13:59 UTC

On 01/22/2016 06:06 AM, Peter Zijlstra wrote:
> On Fri, Jan 22, 2016 at 11:56:52AM +0100, Peter Zijlstra wrote:
> > I think this also does the wrong thing for use_ww_ctx.
>
> Something like so?
> [...]

I think that should work. My only minor concern is that putting the
waiter spinner at the end of the OSQ means it will take longer to get
the lock, but that shouldn't be a big issue.

Cheers,
Longman
* Re: [PATCH RFC] locking/mutexes: don't spin on owner when wait list is not NULL.

From: Ding Tianhong
Date: 2016-01-24 8:03 UTC

On 2016/1/22 21:59, Waiman Long wrote:
> On 01/22/2016 06:06 AM, Peter Zijlstra wrote:
> > Something like so?
> > [...]
>
> I think that should work. My only minor concern is that putting the
> waiter spinner at the end of the OSQ means it will take longer to get
> the lock, but that shouldn't be a big issue.

Looks good to me. I will try this solution and report the result.
Thanks, everyone.

Ding
* Re: [PATCH RFC] locking/mutexes: don't spin on owner when wait list is not NULL.

From: Peter Zijlstra
Date: 2016-01-29 9:53 UTC

On Sun, Jan 24, 2016 at 04:03:50PM +0800, Ding Tianhong wrote:
> Looks good to me. I will try this solution and report the result.
> Thanks, everyone.

Did you get a chance to run with this?
* Re: [PATCH RFC] locking/mutexes: don't spin on owner when wait list is not NULL.

From: Ding Tianhong
Date: 2016-01-30 1:18 UTC

On 2016/1/29 17:53, Peter Zijlstra wrote:
> On Sun, Jan 24, 2016 at 04:03:50PM +0800, Ding Tianhong wrote:
>
> > Looks good to me. I will try this solution and report the result.
> > Thanks, everyone.
>
> Did you get a chance to run with this?

I backported this patch to the 3.10 LTS kernel without changing any
logic. So far the patch works fine for me and nothing needed to be
changed, so I think this patch is fine. Could you formally submit it to
the latest kernel? :)

Thanks.
Ding
* Re: [PATCH RFC] locking/mutexes: don't spin on owner when wait list is not NULL.

From: huang ying
Date: 2016-02-01 3:29 UTC

On Sat, Jan 30, 2016 at 9:18 AM, Ding Tianhong <dingtianhong@huawei.com> wrote:
> I backported this patch to the 3.10 LTS kernel without changing any
> logic. So far the patch works fine for me and nothing needed to be
> changed, so I think this patch is fine. Could you formally submit it
> to the latest kernel? :)

The original patch from Tianhong triggered a performance regression
because it effectively turned off optimistic spinning. I tested Peter's
patch with the same configuration and it showed no regression, so I
think the patch keeps the optimistic spinning. Test result details will
be in the next email.

Best Regards,
Huang, Ying
* Re: [PATCH RFC] locking/mutexes: don't spin on owner when wait list is not NULL.

From: Huang, Ying
Date: 2016-02-01 3:35 UTC

huang ying <huang.ying.caritas@gmail.com> writes:
> The original patch from Tianhong triggered a performance regression
> because it effectively turned off optimistic spinning. I tested
> Peter's patch with the same configuration and it showed no
> regression, so I think the patch keeps the optimistic spinning. Test
> result details will be in the next email.

Here is the detailed test result:

=========================================================================================
compiler/cpufreq_governor/kconfig/nr_task/rootfs/tbox_group/test/testcase:
  gcc-4.9/performance/x86_64-rhel/100%/debian-x86_64-2015-02-07.cgz/lkp-ivb-d01/fstime/unixbench

commit:
  v4.4
  1db66c17114d5437c0757d6792c0d8923192ecd6

            v4.4  1db66c17114d5437c0757d6792
----------------  --------------------------
     %stddev          %change          %stddev
         \              |                  \
  371269 ± 10%       -93.2%      25080 ±  4%  unixbench.time.voluntary_context_switches
  371269 ± 10%       -93.2%      25080 ±  4%  time.voluntary_context_switches
    6189 ±  8%       -76.4%       1463 ±  6%  vmstat.system.cs
    5706 ±  0%        -1.7%       5608 ±  0%  vmstat.system.in
  113680 ± 12%       -73.5%      30086 ±  1%  cpuidle.C1-IVB.usage
 1515925 ± 20%       +68.3%    2552001 ± 11%  cpuidle.C1E-IVB.time
 1227221 ± 20%       -51.7%     592695 ± 17%  cpuidle.C3-IVB.time
    2697 ± 10%       -77.8%     598.33 ± 10%  cpuidle.C3-IVB.usage
   15173 ±  6%       -23.1%      11663 ±  1%  cpuidle.C6-IVB.usage
   34.38 ± 27%       -35.0%      22.33 ±  2%  cpuidle.POLL.usage
   61.92 ±  9%       +14.3%      70.78 ±  7%  sched_debug.cfs_rq:/.load_avg.min
   40.85 ± 29%       +27.3%      52.00 ± 10%  sched_debug.cfs_rq:/.runnable_load_avg.2
   -1949 ±-37%       -64.6%    -690.19 ±-134% sched_debug.cfs_rq:/.spread0.4
   -1773 ±-29%       -85.6%    -256.00 ±-388% sched_debug.cfs_rq:/.spread0.7
   -2478 ±-26%       -49.5%      -1251 ±-66%  sched_debug.cfs_rq:/.spread0.min
   61.95 ±  9%       +14.8%      71.11 ±  7%  sched_debug.cfs_rq:/.tg_load_avg_contrib.min
  396962 ± 12%       +27.9%     507573 ± 10%  sched_debug.cpu.avg_idle.0
  432973 ± 18%       +45.3%     629147 ± 12%  sched_debug.cpu.avg_idle.6
  448566 ±  3%       +11.5%     499990 ±  0%  sched_debug.cpu.avg_idle.avg
   45.31 ±  5%        -9.5%      41.00 ±  3%  sched_debug.cpu.cpu_load[3].7
   52204 ± 10%       -49.9%      26173 ± 34%  sched_debug.cpu.nr_switches.0
   50383 ± 12%       -57.6%      21353 ± 15%  sched_debug.cpu.nr_switches.1
   45425 ± 16%       -68.5%      14325 ± 28%  sched_debug.cpu.nr_switches.2
   43069 ± 20%       -65.5%      14852 ± 41%  sched_debug.cpu.nr_switches.3
   40285 ± 16%       -70.4%      11905 ± 47%  sched_debug.cpu.nr_switches.4
   40732 ± 13%       -75.8%       9872 ± 39%  sched_debug.cpu.nr_switches.5
   43011 ± 19%       -80.0%       8607 ± 42%  sched_debug.cpu.nr_switches.6
   38076 ± 12%       -75.9%       9167 ± 40%  sched_debug.cpu.nr_switches.7
   44148 ±  7%       -67.1%      14532 ±  6%  sched_debug.cpu.nr_switches.avg
   59877 ±  8%       -51.3%      29146 ± 21%  sched_debug.cpu.nr_switches.max
   33672 ±  9%       -83.0%       5718 ±  4%  sched_debug.cpu.nr_switches.min
   -0.62 ±-1411%   -2212.5%      13.00 ± 16%  sched_debug.cpu.nr_uninterruptible.0
   -1.23 ±-582%    -1318.8%      15.00 ± 35%  sched_debug.cpu.nr_uninterruptible.1
    2.54 ±263%      +267.7%       9.33 ± 91%  sched_debug.cpu.nr_uninterruptible.2
    0.31 ±2966%    -5841.7%     -17.67 ±-19%  sched_debug.cpu.nr_uninterruptible.5
    9.84 ± 19%       +70.4%      16.76 ±  5%  sched_debug.cpu.nr_uninterruptible.stddev
  116287 ±  4%       -20.9%      91972 ±  9%  sched_debug.cpu.sched_count.0
   50411 ± 12%       -57.6%      21382 ± 15%  sched_debug.cpu.sched_count.1
   45453 ± 16%       -68.4%      14356 ± 28%  sched_debug.cpu.sched_count.2
   43098 ± 20%       -65.5%      14888 ± 41%  sched_debug.cpu.sched_count.3
   40314 ± 16%       -70.4%      11934 ± 47%  sched_debug.cpu.sched_count.4
   40761 ± 13%       -75.7%       9896 ± 39%  sched_debug.cpu.sched_count.5
   43041 ± 19%       -79.9%       8636 ± 42%  sched_debug.cpu.sched_count.6
   38105 ± 12%       -75.9%       9193 ± 40%  sched_debug.cpu.sched_count.7
   52184 ±  6%       -56.3%      22782 ±  4%  sched_debug.cpu.sched_count.avg
  116288 ±  4%       -20.9%      91972 ±  9%  sched_debug.cpu.sched_count.max
   33701 ±  9%       -82.9%       5746 ±  4%  sched_debug.cpu.sched_count.min
   22760 ± 10%       -63.0%       8418 ± 40%  sched_debug.cpu.sched_goidle.0
   23319 ± 13%       -60.9%       9114 ± 22%  sched_debug.cpu.sched_goidle.1
   21273 ± 17%       -80.4%       4169 ± 13%  sched_debug.cpu.sched_goidle.2
   19993 ± 19%       -67.8%       6429 ± 45%  sched_debug.cpu.sched_goidle.3
   18614 ± 17%       -85.0%       2788 ± 29%  sched_debug.cpu.sched_goidle.4
   18921 ± 12%       -86.7%       2520 ± 15%  sched_debug.cpu.sched_goidle.5
   20131 ± 17%       -82.1%       3596 ± 52%  sched_debug.cpu.sched_goidle.6
   17861 ± 12%       -86.9%       2334 ± 14%  sched_debug.cpu.sched_goidle.7
   20359 ±  8%       -75.8%       4921 ±  5%  sched_debug.cpu.sched_goidle.avg
   26477 ± 10%       -60.2%      10539 ± 21%  sched_debug.cpu.sched_goidle.max
   15845 ± 10%       -87.2%       2033 ±  4%  sched_debug.cpu.sched_goidle.min
   29043 ± 15%       -58.8%      11958 ± 26%  sched_debug.cpu.ttwu_count.0
   24191 ± 10%       -68.8%       7547 ± 27%  sched_debug.cpu.ttwu_count.1
   21313 ± 11%       -72.7%       5819 ± 24%  sched_debug.cpu.ttwu_count.2
   21487 ± 13%       -61.4%       8296 ± 43%  sched_debug.cpu.ttwu_count.3
   19644 ± 15%       -54.4%       8967 ± 79%  sched_debug.cpu.ttwu_count.4
   20786 ± 15%       -69.2%       6409 ± 58%  sched_debug.cpu.ttwu_count.5
   20435 ± 17%       -79.3%       4231 ± 58%  sched_debug.cpu.ttwu_count.6
   19293 ± 17%       -77.0%       4432 ± 55%  sched_debug.cpu.ttwu_count.7
   22024 ±  7%       -67.3%       7207 ±  6%  sched_debug.cpu.ttwu_count.avg
   31009 ±  9%       -45.2%      17008 ± 17%  sched_debug.cpu.ttwu_count.max
   16791 ± 10%       -85.1%       2494 ±  5%  sched_debug.cpu.ttwu_count.min
    3084 ±  8%       +25.5%       3870 ±  9%  sched_debug.cpu.ttwu_local.avg

The URL of the original regression report email:

  https://lists.01.org/pipermail/lkp/2016-January/003442.html

Best Regards,
Huang, Ying
* [PATCH] locking/mutex: Avoid spinner vs waiter starvation
  2016-01-30  1:18           ` Ding Tianhong
  2016-02-01  3:29             ` huang ying
@ 2016-02-01 10:08             ` Peter Zijlstra
  2016-02-02 21:19               ` Davidlohr Bueso
  2016-02-04  1:35               ` Jason Low
  1 sibling, 2 replies; 34+ messages in thread
From: Peter Zijlstra @ 2016-02-01 10:08 UTC (permalink / raw)
  To: Ding Tianhong
  Cc: Waiman Long, Jason Low, Ingo Molnar, linux-kernel, Davidlohr Bueso,
	Linus Torvalds, Paul E. McKenney, Thomas Gleixner, Will Deacon,
	Tim Chen, Waiman Long

On Sat, Jan 30, 2016 at 09:18:44AM +0800, Ding Tianhong wrote:
> On 2016/1/29 17:53, Peter Zijlstra wrote:
> > On Sun, Jan 24, 2016 at 04:03:50PM +0800, Ding Tianhong wrote:
> >
> >> looks good to me, I will try this solution and report the result, thanks everyone.
> >
> > Did you get a chance to run with this?
> >
>
> I backported this patch to the 3.10 LTS kernel and didn't change any
> logic. So far the patch works fine for me, with no changes needed, so
> I think this patch is fine. Could you formally release this patch to
> the latest kernel? :)

Thanks for testing, I've queued the below patch.

---
Subject: locking/mutex: Avoid spinner vs waiter starvation
From: Peter Zijlstra <peterz@infradead.org>
Date: Fri, 22 Jan 2016 12:06:53 +0100

Ding Tianhong reported that under his load the optimistic spinners
would totally starve a task that ended up on the wait list.

Fix this by ensuring the top waiter also partakes in the optimistic
spin queue.

There are a few subtle differences between the assumed state of
regular optimistic spinners and those already on the wait list, which
result in the @acquired complication of the acquire path.

Most notable are:

 - waiters are on the wait list and need to be taken off
 - mutex_optimistic_spin() sets the lock->count to 0 on acquire
   even though there might be more tasks on the wait list.

Cc: Jason Low <jason.low2@hp.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Waiman Long <waiman.long@hpe.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "Paul E. McKenney" <paulmck@us.ibm.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Will Deacon <Will.Deacon@arm.com>
Reported-by: Ding Tianhong <dingtianhong@huawei.com>
Tested-by: Ding Tianhong <dingtianhong@huawei.com>
Tested-by: "Huang, Ying" <ying.huang@intel.com>
Suggested-by: Waiman Long <Waiman.Long@hp.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20160122110653.GF6375@twins.programming.kicks-ass.net
---
 kernel/locking/mutex.c |   15 +++++++++++++++
 1 file changed, 15 insertions(+)

--- a/kernel/locking/mutex.c
+++ b/kernel/locking/mutex.c
@@ -512,6 +512,7 @@ __mutex_lock_common(struct mutex *lock,
 	struct task_struct *task = current;
 	struct mutex_waiter waiter;
 	unsigned long flags;
+	bool acquired;
 	int ret;
 
 	preempt_disable();
@@ -543,6 +544,7 @@ __mutex_lock_common(struct mutex *lock,
 	lock_contended(&lock->dep_map, ip);
 
 	for (;;) {
+		acquired = false;
 		/*
 		 * Lets try to take the lock again - this is needed even if
 		 * we get here for the first time (shortly after failing to
@@ -577,7 +579,16 @@ __mutex_lock_common(struct mutex *lock,
 
 		/* didn't get the lock, go to sleep: */
 		spin_unlock_mutex(&lock->wait_lock, flags);
 		schedule_preempt_disabled();
+
+		if (mutex_is_locked(lock))
+			acquired = mutex_optimistic_spin(lock, ww_ctx, use_ww_ctx);
+
 		spin_lock_mutex(&lock->wait_lock, flags);
+
+		if (acquired) {
+			atomic_set(&lock->count, -1);
+			break;
+		}
 	}
 	__set_task_state(task, TASK_RUNNING);
 
@@ -587,6 +598,9 @@ __mutex_lock_common(struct mutex *lock,
 		atomic_set(&lock->count, 0);
 	debug_mutex_free_waiter(&waiter);
 
+	if (acquired)
+		goto unlock;
+
 skip_wait:
 	/* got the lock - cleanup and rejoice! */
 	lock_acquired(&lock->dep_map, ip);
@@ -597,6 +611,7 @@ __mutex_lock_common(struct mutex *lock,
 		ww_mutex_set_context_slowpath(ww, ww_ctx);
 	}
 
+unlock:
 	spin_unlock_mutex(&lock->wait_lock, flags);
 	preempt_enable();
 	return 0;

^ permalink raw reply	[flat|nested] 34+ messages in thread
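A condensed view of the control flow this patch produces may help when
reading the follow-ups. The sketch below is hand-simplified from the
4.4-era __mutex_lock_common() with the patch applied; signal/ww-mutex
handling and debug hooks are omitted, so treat it as an illustration
rather than the exact kernel source:

	for (;;) {
		acquired = false;

		/*
		 * Try the lock; writing -1 marks it locked-with-waiters,
		 * so unlockers take the slowpath and issue a wakeup.
		 */
		if (atomic_read(&lock->count) >= 0 &&
		    atomic_xchg_acquire(&lock->count, -1) == 1)
			break;			/* acquired while awake */

		/* didn't get the lock, go to sleep: */
		spin_unlock_mutex(&lock->wait_lock, flags);
		schedule_preempt_disabled();

		/*
		 * New with this patch: on wakeup the top waiter joins
		 * the optimistic spinners instead of immediately
		 * looping back to sleep.
		 */
		if (mutex_is_locked(lock))
			acquired = mutex_optimistic_spin(lock, ww_ctx, use_ww_ctx);

		spin_lock_mutex(&lock->wait_lock, flags);
		if (acquired) {
			/*
			 * The spin acquire set ->count to 0; rewrite it
			 * to -1 since we may not be the only waiter.
			 */
			atomic_set(&lock->count, -1);
			break;
		}
	}

The notable design choice is that the woken waiter spins before retaking
the wait_lock, which is exactly the window the next message examines.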
* Re: [PATCH] locking/mutex: Avoid spinner vs waiter starvation
  2016-02-01 10:08             ` [PATCH] locking/mutex: Avoid spinner vs waiter starvation Peter Zijlstra
@ 2016-02-02 21:19               ` Davidlohr Bueso
  2016-02-03  7:10                 ` Ding Tianhong
  2016-02-03 22:07                 ` Waiman Long
  1 sibling, 2 replies; 34+ messages in thread
From: Davidlohr Bueso @ 2016-02-02 21:19 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ding Tianhong, Waiman Long, Jason Low, Ingo Molnar, linux-kernel,
	Linus Torvalds, Paul E. McKenney, Thomas Gleixner, Will Deacon,
	Tim Chen, Waiman Long

On Mon, 01 Feb 2016, Peter Zijlstra wrote:

>Subject: locking/mutex: Avoid spinner vs waiter starvation
>From: Peter Zijlstra <peterz@infradead.org>
>Date: Fri, 22 Jan 2016 12:06:53 +0100
>
>Ding Tianhong reported that under his load the optimistic spinners
>would totally starve a task that ended up on the wait list.
>
>Fix this by ensuring the top waiter also partakes in the optimistic
>spin queue.
>
>There are a few subtle differences between the assumed state of
>regular optimistic spinners and those already on the wait list, which
>result in the @acquired complication of the acquire path.
>
>Most notable are:
>
> - waiters are on the wait list and need to be taken off
> - mutex_optimistic_spin() sets the lock->count to 0 on acquire
>   even though there might be more tasks on the wait list.

Right, the main impact I see from these complications is that the
window between a waiter taking the lock via spinning and then acquiring
the wait_lock to remove itself from the list will allow an unlocking
thread to mark the lock as available in the fastpath, which could in
turn allow a third thread to steal the lock. With high contention,
this window will become noticeably larger as we contend for the
wait_lock.

 CPU-0                          CPU-1                      CPU-3
 __mutex_lock_common
 mutex_optimistic_spin
 (->count now 0)
                                __mutex_fastpath_unlock
                                (->count now 1)
                                                           __mutex_fastpath_lock
                                                           (stolen)
 spin_lock_mutex(&lock->wait_lock, flags);

But we've always been bad when it comes to counters and waiters.

Thanks,
Davidlohr

^ permalink raw reply	[flat|nested] 34+ messages in thread
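For context on the counter games above: pre-4.7 mutexes encode the lock
state in an atomic counter (1 = unlocked, 0 = locked, -1 = locked with
possible waiters), and the lock/unlock fastpaths operate on that counter
alone, never touching the wait list. A rough sketch of the protocol,
assuming the generic (non arch-optimized) helpers; the names here are
illustrative, not the exact fastpath code:

	/* sketch of the pre-4.7 mutex counter protocol */
	static inline void sketch_mutex_lock(struct mutex *lock)
	{
		/* fastpath: 1 -> 0 takes a free lock */
		if (atomic_dec_return(&lock->count) < 0)
			sketch_lock_slowpath(lock);	/* contended */
	}

	static inline void sketch_mutex_unlock(struct mutex *lock)
	{
		/* fastpath: 0 -> 1; a result <= 0 means waiters may be queued */
		if (atomic_inc_return(&lock->count) <= 0)
			sketch_unlock_slowpath(lock);	/* wake a waiter */
	}

The steal in the timeline is possible precisely because the fastpaths
never look at lock->wait_list: once the unlocking thread has set
->count back to 1, any third thread can claim the lock on the counter
alone.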
* Re: [PATCH] locking/mutex: Avoid spinner vs waiter starvation
  2016-02-02 21:19               ` Davidlohr Bueso
@ 2016-02-03  7:10                 ` Ding Tianhong
  2016-02-03 19:24                   ` Davidlohr Bueso
  2016-02-03 22:07                 ` Waiman Long
  1 sibling, 1 reply; 34+ messages in thread
From: Ding Tianhong @ 2016-02-03  7:10 UTC (permalink / raw)
  To: Davidlohr Bueso, Peter Zijlstra
  Cc: Waiman Long, Jason Low, Ingo Molnar, linux-kernel, Linus Torvalds,
	Paul E. McKenney, Thomas Gleixner, Will Deacon, Tim Chen, Waiman Long

On 2016/2/3 5:19, Davidlohr Bueso wrote:
> On Mon, 01 Feb 2016, Peter Zijlstra wrote:
>
>> Subject: locking/mutex: Avoid spinner vs waiter starvation
>> From: Peter Zijlstra <peterz@infradead.org>
>> Date: Fri, 22 Jan 2016 12:06:53 +0100
>>
>> Ding Tianhong reported that under his load the optimistic spinners
>> would totally starve a task that ended up on the wait list.
>>
>> Fix this by ensuring the top waiter also partakes in the optimistic
>> spin queue.
>>
>> There are a few subtle differences between the assumed state of
>> regular optimistic spinners and those already on the wait list, which
>> result in the @acquired complication of the acquire path.
>>
>> Most notable are:
>>
>>  - waiters are on the wait list and need to be taken off
>>  - mutex_optimistic_spin() sets the lock->count to 0 on acquire
>>    even though there might be more tasks on the wait list.
>
> Right, the main impact I see from these complications is that the
> window between a waiter taking the lock via spinning and then acquiring
> the wait_lock to remove itself from the list will allow an unlocking
> thread to mark the lock as available in the fastpath, which could in
> turn allow a third thread to steal the lock. With high contention,
> this window will become noticeably larger as we contend for the
> wait_lock.
>
>  CPU-0                          CPU-1                      CPU-3
>  __mutex_lock_common
>  mutex_optimistic_spin
>  (->count now 0)
>                                 __mutex_fastpath_unlock
>                                 (->count now 1)
>                                                            __mutex_fastpath_lock
>                                                            (stolen)
>  spin_lock_mutex(&lock->wait_lock, flags);
>
> But we've always been bad when it comes to counters and waiters.
>

Agreed, but this patch is going to help the waiter on the wait list get
the lock, and the scenario you describe looks very unlikely, so I don't
think it is a problem.

Thanks
Ding

> Thanks,
> Davidlohr
>
> .
>

^ permalink raw reply	[flat|nested] 34+ messages in thread
* Re: [PATCH] locking/mutex: Avoid spinner vs waiter starvation
  2016-02-03  7:10                 ` Ding Tianhong
@ 2016-02-03 19:24                   ` Davidlohr Bueso
  2016-02-04  1:20                     ` Ding Tianhong
  0 siblings, 1 reply; 34+ messages in thread
From: Davidlohr Bueso @ 2016-02-03 19:24 UTC (permalink / raw)
  To: Ding Tianhong
  Cc: Peter Zijlstra, Waiman Long, Jason Low, Ingo Molnar, linux-kernel,
	Linus Torvalds, Paul E. McKenney, Thomas Gleixner, Will Deacon,
	Tim Chen, Waiman Long

On Wed, 03 Feb 2016, Ding Tianhong wrote:

>Agreed, but this patch is going to help the waiter on the wait list get
>the lock, and the scenario you describe looks very unlikely, so I don't
>think it is a problem.

Sure, I was in fact implying it's not the end of the world, although it
will be interesting to see the impact on different (non-pathological)
workloads, even if it only affects a single waiter. Also, technically
this issue can also affect rwsems if only writers are used, but that's
obviously pretty idiotic, so I wouldn't worry about it.

Thanks,
Davidlohr

^ permalink raw reply	[flat|nested] 34+ messages in thread
* Re: [PATCH] locking/mutex: Avoid spinner vs waiter starvation
  2016-02-03 19:24                   ` Davidlohr Bueso
@ 2016-02-04  1:20                     ` Ding Tianhong
  2016-02-12 18:33                       ` Waiman Long
  0 siblings, 1 reply; 34+ messages in thread
From: Ding Tianhong @ 2016-02-04  1:20 UTC (permalink / raw)
  To: Davidlohr Bueso
  Cc: Peter Zijlstra, Waiman Long, Jason Low, Ingo Molnar, linux-kernel,
	Linus Torvalds, Paul E. McKenney, Thomas Gleixner, Will Deacon,
	Tim Chen, Waiman Long

On 2016/2/4 3:24, Davidlohr Bueso wrote:
> On Wed, 03 Feb 2016, Ding Tianhong wrote:
>
>> Agreed, but this patch is going to help the waiter on the wait list get
>> the lock, and the scenario you describe looks very unlikely, so I don't
>> think it is a problem.
>
> Sure, I was in fact implying it's not the end of the world, although it
> will be interesting to see the impact on different (non-pathological)
> workloads, even if it only affects a single waiter. Also, technically
> this issue can also affect rwsems if only writers are used, but that's
> obviously pretty idiotic, so I wouldn't worry about it.
>
> Thanks,
> Davidlohr
>

Hi Davidlohr, Peter:

Following Davidlohr's suggestion, I used several VMs to test Peter's
patch, and sadly one VM still hit the hung-task problem, so I think we
still need to think more about this solution.

Thanks.
Ding

> .
>

^ permalink raw reply	[flat|nested] 34+ messages in thread
* Re: [PATCH] locking/mutex: Avoid spinner vs waiter starvation
  2016-02-04  1:20                     ` Ding Tianhong
@ 2016-02-12 18:33                       ` Waiman Long
  0 siblings, 0 replies; 34+ messages in thread
From: Waiman Long @ 2016-02-12 18:33 UTC (permalink / raw)
  To: Ding Tianhong
  Cc: Davidlohr Bueso, Peter Zijlstra, Jason Low, Ingo Molnar,
	linux-kernel, Linus Torvalds, Paul E. McKenney, Thomas Gleixner,
	Will Deacon, Tim Chen, Waiman Long

[-- Attachment #1: Type: text/plain, Size: 1201 bytes --]

On 02/03/2016 08:20 PM, Ding Tianhong wrote:
> On 2016/2/4 3:24, Davidlohr Bueso wrote:
>> On Wed, 03 Feb 2016, Ding Tianhong wrote:
>>
>>> Agreed, but this patch is going to help the waiter on the wait list get
>>> the lock, and the scenario you describe looks very unlikely, so I don't
>>> think it is a problem.
>> Sure, I was in fact implying it's not the end of the world, although it
>> will be interesting to see the impact on different (non-pathological)
>> workloads, even if it only affects a single waiter. Also, technically
>> this issue can also affect rwsems if only writers are used, but that's
>> obviously pretty idiotic, so I wouldn't worry about it.
>>
>> Thanks,
>> Davidlohr
>>
> Hi Davidlohr, Peter:
>
> Following Davidlohr's suggestion, I used several VMs to test Peter's
> patch, and sadly one VM still hit the hung-task problem, so I think we
> still need to think more about this solution.
>
> Thanks.
> Ding

After thinking about it, it may be the case that the hung task missed
the wakeup. If no new task enters the wait queue again, the hung task
will never get woken up. Would you mind trying out my patch to see if
you are still seeing hung tasks or not?

Thanks,
Longman

[-- Attachment #2: longman-mutex-v2-patch.tar.gz --]
[-- Type: application/x-gzip, Size: 5106 bytes --]

^ permalink raw reply	[flat|nested] 34+ messages in thread
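One hypothetical shape for the lost wakeup being suggested here, written
against the patched wait loop; this is an illustration of the general
pattern only, not a verified trace of the 4.4 code:

	/*
	 * wake_up_process() on a task that is already running is a
	 * no-op, so a wakeup that arrives while the woken top waiter
	 * is busy spinning is consumed without lasting effect.
	 */
	for (;;) {
		if (mutex_try_to_acquire(lock))
			break;

		__set_task_state(task, TASK_UNINTERRUPTIBLE);
		spin_unlock_mutex(&lock->wait_lock, flags);
		schedule_preempt_disabled();	/* (1) woken by an unlocker */

		/*
		 * (2) TASK_RUNNING while spinning; any wake_up_process()
		 * aimed at us right now is a no-op. If the spin then
		 * fails (lock stolen) and no later unlocker ever walks
		 * the wait list again ...
		 */
		acquired = mutex_optimistic_spin(lock, ww_ctx, use_ww_ctx);

		spin_lock_mutex(&lock->wait_lock, flags);
		if (acquired)
			break;
		/* (3) ... the task goes back to sleep at (1) for good. */
	}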
* Re: [PATCH] locking/mutex: Avoid spinner vs waiter starvation
  2016-02-02 21:19               ` Davidlohr Bueso
  2016-02-03  7:10                 ` Ding Tianhong
@ 2016-02-03 22:07                 ` Waiman Long
  1 sibling, 0 replies; 34+ messages in thread
From: Waiman Long @ 2016-02-03 22:07 UTC (permalink / raw)
  To: Davidlohr Bueso
  Cc: Peter Zijlstra, Ding Tianhong, Jason Low, Ingo Molnar,
	linux-kernel, Linus Torvalds, Paul E. McKenney, Thomas Gleixner,
	Will Deacon, Tim Chen, Waiman Long

On 02/02/2016 04:19 PM, Davidlohr Bueso wrote:
> On Mon, 01 Feb 2016, Peter Zijlstra wrote:
>
>> Subject: locking/mutex: Avoid spinner vs waiter starvation
>> From: Peter Zijlstra <peterz@infradead.org>
>> Date: Fri, 22 Jan 2016 12:06:53 +0100
>>
>> Ding Tianhong reported that under his load the optimistic spinners
>> would totally starve a task that ended up on the wait list.
>>
>> Fix this by ensuring the top waiter also partakes in the optimistic
>> spin queue.
>>
>> There are a few subtle differences between the assumed state of
>> regular optimistic spinners and those already on the wait list, which
>> result in the @acquired complication of the acquire path.
>>
>> Most notable are:
>>
>>  - waiters are on the wait list and need to be taken off
>>  - mutex_optimistic_spin() sets the lock->count to 0 on acquire
>>    even though there might be more tasks on the wait list.
>
> Right, the main impact I see from these complications is that the
> window between a waiter taking the lock via spinning and then acquiring
> the wait_lock to remove itself from the list will allow an unlocking
> thread to mark the lock as available in the fastpath, which could in
> turn allow a third thread to steal the lock. With high contention,
> this window will become noticeably larger as we contend for the
> wait_lock.
>
>  CPU-0                          CPU-1                      CPU-3
>  __mutex_lock_common
>  mutex_optimistic_spin
>  (->count now 0)
>                                 __mutex_fastpath_unlock
>                                 (->count now 1)
>                                                            __mutex_fastpath_lock
>                                                            (stolen)
>  spin_lock_mutex(&lock->wait_lock, flags);
>
> But we've always been bad when it comes to counters and waiters.
>
> Thanks,
> Davidlohr

I don't quite get how that can happen. CPU0 cannot change the count to
0 unless CPU1, the lock holder, does the unlock first. Once CPU0 sees a
count of 1 and changes it to 0, it is the lock holder, and there can be
no other CPU that can do the unlock.

Cheers,
Longman

^ permalink raw reply	[flat|nested] 34+ messages in thread
* Re: [PATCH] locking/mutex: Avoid spinner vs waiter starvation
  2016-02-01 10:08             ` [PATCH] locking/mutex: Avoid spinner vs waiter starvation Peter Zijlstra
  2016-02-02 21:19               ` Davidlohr Bueso
@ 2016-02-04  1:35               ` Jason Low
  2016-02-04  8:55                 ` huang ying
  1 sibling, 1 reply; 34+ messages in thread
From: Jason Low @ 2016-02-04  1:35 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ding Tianhong, Waiman Long, Ingo Molnar, linux-kernel,
	Davidlohr Bueso, Linus Torvalds, Paul E. McKenney, Thomas Gleixner,
	Will Deacon, Tim Chen, Waiman Long, jason.low2

On Mon, 2016-02-01 at 11:08 +0100, Peter Zijlstra wrote:
> On Sat, Jan 30, 2016 at 09:18:44AM +0800, Ding Tianhong wrote:
> > On 2016/1/29 17:53, Peter Zijlstra wrote:
> > > On Sun, Jan 24, 2016 at 04:03:50PM +0800, Ding Tianhong wrote:
> > >
> > >> looks good to me, I will try this solution and report the result, thanks everyone.
> > >
> > > Did you get a chance to run with this?
> > >
> >
> > I backported this patch to the 3.10 LTS kernel and didn't change any
> > logic. So far the patch works fine for me, with no changes needed, so
> > I think this patch is fine. Could you formally release this patch to
> > the latest kernel? :)
>
> Thanks for testing, I've queued the below patch.
>
> ---
> Subject: locking/mutex: Avoid spinner vs waiter starvation
> From: Peter Zijlstra <peterz@infradead.org>
> Date: Fri, 22 Jan 2016 12:06:53 +0100
>
> Ding Tianhong reported that under his load the optimistic spinners
> would totally starve a task that ended up on the wait list.
>
> Fix this by ensuring the top waiter also partakes in the optimistic
> spin queue.
>
> There are a few subtle differences between the assumed state of
> regular optimistic spinners and those already on the wait list, which
> result in the @acquired complication of the acquire path.
>
> Most notable are:
>
>  - waiters are on the wait list and need to be taken off
>  - mutex_optimistic_spin() sets the lock->count to 0 on acquire
>    even though there might be more tasks on the wait list.
>
> Cc: Jason Low <jason.low2@hp.com>
> Cc: Ingo Molnar <mingo@redhat.com>
> Cc: Tim Chen <tim.c.chen@linux.intel.com>
> Cc: Linus Torvalds <torvalds@linux-foundation.org>
> Cc: Waiman Long <waiman.long@hpe.com>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: "Paul E. McKenney" <paulmck@us.ibm.com>
> Cc: Davidlohr Bueso <dave@stgolabs.net>
> Cc: Will Deacon <Will.Deacon@arm.com>
> Reported-by: Ding Tianhong <dingtianhong@huawei.com>
> Tested-by: Ding Tianhong <dingtianhong@huawei.com>
> Tested-by: "Huang, Ying" <ying.huang@intel.com>
> Suggested-by: Waiman Long <Waiman.Long@hp.com>
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> Link: http://lkml.kernel.org/r/20160122110653.GF6375@twins.programming.kicks-ass.net

I've done some testing with this patch on some of the AIM7 workloads
and found that it reduced throughput by about 10%. The reduction in
throughput is expected, since spinning as a waiter is less efficient.

Another observation I made is that the top-waiter spinners would often
need to reschedule before being able to acquire the lock by spinning
when there was high contention. A waiter can go into the cycle of
spin -> reschedule -> spin -> reschedule. So although the chance of
starvation is reduced, this patch doesn't fully address the issue of
waiter starvation.

Jason

^ permalink raw reply	[flat|nested] 34+ messages in thread
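The reschedule cycle Jason describes falls out of the spin loop's
bail-out conditions. Roughly, from the 4.4-era kernel/locking/mutex.c
(lightly simplified; a sketch rather than a verbatim copy):

	static noinline
	bool mutex_spin_on_owner(struct mutex *lock, struct task_struct *owner)
	{
		bool ret = true;

		rcu_read_lock();
		while (lock->owner == owner) {
			barrier();
			/*
			 * Bail when the owner goes off-CPU or when our own
			 * reschedule is due; a waiter that bails goes back
			 * to sleep and must repeat the whole wake -> spin
			 * cycle next time around.
			 */
			if (!owner->on_cpu || need_resched()) {
				ret = false;
				break;
			}
			cpu_relax_lowlatency();
		}
		rcu_read_unlock();

		return ret;
	}

Under heavy contention the owner changes frequently, so a waiter's spin
can easily outlast its timeslice; need_resched() then fires, producing
the spin -> reschedule -> spin pattern.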
* Re: [PATCH] locking/mutex: Avoid spinner vs waiter starvation
  2016-02-04  1:35               ` Jason Low
@ 2016-02-04  8:55                 ` huang ying
  2016-02-04 22:49                   ` Jason Low
  0 siblings, 1 reply; 34+ messages in thread
From: huang ying @ 2016-02-04  8:55 UTC (permalink / raw)
  To: Jason Low
  Cc: Peter Zijlstra, Ding Tianhong, Waiman Long, Ingo Molnar,
	linux-kernel, Davidlohr Bueso, Linus Torvalds, Paul E. McKenney,
	Thomas Gleixner, Will Deacon, Tim Chen, Waiman Long, jason.low2

Hi, Low,

On Thu, Feb 4, 2016 at 9:35 AM, Jason Low <jason.low2@hp.com> wrote:
> I've done some testing with this patch on some of the AIM7 workloads
> and found that it reduced throughput by about 10%. The reduction in
> throughput is expected, since spinning as a waiter is less efficient.
>
> Another observation I made is that the top-waiter spinners would often
> need to reschedule before being able to acquire the lock by spinning
> when there was high contention. A waiter can go into the cycle of
> spin -> reschedule -> spin -> reschedule. So although the chance of
> starvation is reduced, this patch doesn't fully address the issue of
> waiter starvation.

Could you share your workload? I want to reproduce it in the 0day/LKP
environment.

Best Regards,
Huang, Ying

^ permalink raw reply	[flat|nested] 34+ messages in thread
* Re: [PATCH] locking/mutex: Avoid spinner vs waiter starvation
  2016-02-04  8:55                 ` huang ying
@ 2016-02-04 22:49                   ` Jason Low
  0 siblings, 0 replies; 34+ messages in thread
From: Jason Low @ 2016-02-04 22:49 UTC (permalink / raw)
  To: huang ying
  Cc: Peter Zijlstra, Ding Tianhong, Waiman Long, Ingo Molnar,
	linux-kernel, Davidlohr Bueso, Linus Torvalds, Paul E. McKenney,
	Thomas Gleixner, Will Deacon, Tim Chen, Waiman Long, jason.low2,
	scott.norton

On Thu, 2016-02-04 at 16:55 +0800, huang ying wrote:
> Hi, Low,
>
> On Thu, Feb 4, 2016 at 9:35 AM, Jason Low <jason.low2@hp.com> wrote:
> > I've done some testing with this patch on some of the AIM7 workloads
> > and found that it reduced throughput by about 10%. The reduction in
> > throughput is expected, since spinning as a waiter is less efficient.
> >
> > Another observation I made is that the top-waiter spinners would often
> > need to reschedule before being able to acquire the lock by spinning
> > when there was high contention. A waiter can go into the cycle of
> > spin -> reschedule -> spin -> reschedule. So although the chance of
> > starvation is reduced, this patch doesn't fully address the issue of
> > waiter starvation.
>
> Could you share your workload? I want to reproduce it in the 0day/LKP
> environment.

CC'ing Scott, who wrote the automation scripts.

Jason

^ permalink raw reply	[flat|nested] 34+ messages in thread
* Re: [PATCH RFC] locking/mutexes: don't spin on owner when wait list is not NULL.
  2016-01-22  8:54 ` [PATCH RFC] locking/mutexes: don't spin on owner when wait list is not NULL Peter Zijlstra
  2016-01-22 10:20   ` Jason Low
@ 2016-01-22 13:41   ` Waiman Long
  1 sibling, 0 replies; 34+ messages in thread
From: Waiman Long @ 2016-01-22 13:41 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ding Tianhong, Ingo Molnar, linux-kernel, Davidlohr Bueso,
	Linus Torvalds, Paul E. McKenney, Thomas Gleixner, Will Deacon,
	Jason Low, Tim Chen, Waiman Long

On 01/22/2016 03:54 AM, Peter Zijlstra wrote:
> On Thu, Jan 21, 2016 at 06:02:34PM -0500, Waiman Long wrote:
>> This patch attempts to fix this live-lock condition by enabling a
>> woken task in the wait list to enter the optimistic spinning loop
>> itself, with precedence over the ones in the OSQ. This should prevent
>> the live-lock condition from happening.
>
> So I think having the top waiter going back in to contend on the OSQ is
> an excellent idea, but I'm not sure the wlh_spinning thing is important.

Yes, that is optional. I put it there just to make it more likely for
the waiter spinner to get the lock. Without that, the chance will be
50/50 on average. I can certainly take that out.

> The OSQ itself is FIFO fair, and the waiters retain the wait_list
> position. So having the top wait_list entry contending on the OSQ
> ensures we cannot starve (I think).
>
> Also, as Davidlohr said, we cannot copy/paste this much code.

As I said in the previous mail, I do intend to refactor it before
sending out the official patch.

Cheers,
Longman

^ permalink raw reply	[flat|nested] 34+ messages in thread
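For reference, the OSQ ordering being discussed: every would-be spinner
(and, under these proposals, the woken top waiter) first queues FIFO on
the MCS-style OSQ, and only the queue head spins on the lock owner. A
rough sketch of the 4.4-era mutex_optimistic_spin() shape (simplified
and illustrative only):

	if (!osq_lock(&lock->osq))
		goto done;		/* too contended; go (back) to sleep */

	while (true) {
		struct task_struct *owner = READ_ONCE(lock->owner);

		/* spin while the owner runs; bail on need_resched() */
		if (owner && !mutex_spin_on_owner(lock, owner))
			break;

		if (mutex_try_to_acquire(lock))
			break;		/* lock acquired */

		cpu_relax_lowlatency();
	}
	osq_unlock(&lock->osq);

Because the OSQ is FIFO, letting the top waiter enter it bounds how long
that waiter can be bypassed, which is the fairness property Peter refers
to above.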
end of thread, other threads:[~2016-02-29 11:22 UTC | newest]

Thread overview: 34+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-01-21  9:29 [PATCH RFC] locking/mutexes: don't spin on owner when wait list is not NULL Ding Tianhong
2016-01-21 21:23 ` Tim Chen
2016-01-22  2:41   ` Paul E. McKenney
2016-01-22  2:48     ` Davidlohr Bueso
2016-01-22  3:13       ` Paul E. McKenney
2016-01-21 23:02 ` Waiman Long
2016-01-22  6:09   ` Davidlohr Bueso
2016-01-22 13:38     ` Waiman Long
2016-01-22 16:46       ` Davidlohr Bueso
2016-01-25  2:23         ` [PATCH] locking/mutex: Allow next waiter lockless wakeup Davidlohr Bueso
2016-01-25 23:02           ` Waiman Long
2016-02-29 11:21           ` [tip:locking/core] " tip-bot for Davidlohr Bueso
2016-01-22  8:54 ` [PATCH RFC] locking/mutexes: don't spin on owner when wait list is not NULL Peter Zijlstra
2016-01-22 10:20   ` Jason Low
2016-01-22 10:53     ` Peter Zijlstra
2016-01-22 10:56       ` Peter Zijlstra
2016-01-22 11:06         ` Peter Zijlstra
2016-01-22 13:59     ` Waiman Long
2016-01-24  8:03   ` Ding Tianhong
2016-01-29  9:53     ` Peter Zijlstra
2016-01-30  1:18       ` Ding Tianhong
2016-02-01  3:29         ` huang ying
2016-02-01  3:35           ` Huang, Ying
2016-02-01 10:08         ` [PATCH] locking/mutex: Avoid spinner vs waiter starvation Peter Zijlstra
2016-02-02 21:19           ` Davidlohr Bueso
2016-02-03  7:10             ` Ding Tianhong
2016-02-03 19:24               ` Davidlohr Bueso
2016-02-04  1:20                 ` Ding Tianhong
2016-02-12 18:33                   ` Waiman Long
2016-02-03 22:07             ` Waiman Long
2016-02-04  1:35           ` Jason Low
2016-02-04  8:55             ` huang ying
2016-02-04 22:49               ` Jason Low
2016-01-22 13:41 ` [PATCH RFC] locking/mutexes: don't spin on owner when wait list is not NULL Waiman Long