linux-kernel.vger.kernel.org archive mirror
* [PATCH RFC ] locking/mutexes: don't spin on owner when wait list is not NULL.
@ 2016-01-21  6:53 Ding Tianhong
  2016-01-21  7:29 ` Ingo Molnar
  0 siblings, 1 reply; 24+ messages in thread
From: Ding Tianhong @ 2016-01-21  6:53 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, linux-kernel

I built a script that creates several processes doing an ioctl call in a loop;
the ioctl ends up calling a kernel function shaped like this:
xx_ioctl {
...
rtnl_lock();
function();
rtnl_unlock();
...
}
The function may sleep for several ms but never hangs. At the same time,
another user service may call ifconfig to change the state of the
ethernet device, and after several hours the hung task watchdog reports this problem:

========================================================================
149738.039038] INFO: task ifconfig:11890 blocked for more than 120 seconds.
[149738.040597] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[149738.042280] ifconfig D ffff88061ec13680 0 11890 11573 0x00000080
[149738.042284] ffff88052449bd40 0000000000000082 ffff88053a33f300 ffff88052449bfd8
[149738.042286] ffff88052449bfd8 ffff88052449bfd8 ffff88053a33f300 ffffffff819e6240
[149738.042288] ffffffff819e6244 ffff88053a33f300 00000000ffffffff ffffffff819e6248
[149738.042290] Call Trace:
[149738.042300] [<ffffffff8160d219>] schedule_preempt_disabled+0x29/0x70
[149738.042303] [<ffffffff8160af65>] __mutex_lock_slowpath+0xc5/0x1c0
[149738.042305] [<ffffffff8160a3cf>] mutex_lock+0x1f/0x2f
[149738.042309] [<ffffffff8150d945>] rtnl_lock+0x15/0x20
[149738.042311] [<ffffffff81514e3a>] dev_ioctl+0xda/0x590
[149738.042314] [<ffffffff816121cc>] ? __do_page_fault+0x21c/0x560
[149738.042318] [<ffffffff814e42c5>] sock_do_ioctl+0x45/0x50
[149738.042320] [<ffffffff814e49d0>] sock_ioctl+0x1f0/0x2c0
[149738.042324] [<ffffffff811dc9b5>] do_vfs_ioctl+0x2e5/0x4c0
[149738.042327] [<ffffffff811e6a00>] ? fget_light+0xa0/0xd0

================================ cut here ================================

I got the vmcore and found that ifconfig had already been on the wait_list of the
rtnl_lock for 120 seconds, while my processes could still take and release the
rtnl_lock several times per second. That means my processes kept jumping the
queue and ifconfig could never get the rtnl lock. Checking the mutex lock slow
path, I found that a task may spin on the owner regardless of whether the wait
list is empty (lock->count drops to -1 once the mutex is held and tasks are
queued on the wait list), so a task on the wait list can be cut in line forever.
Add a test for pending waiters to mutex_can_spin_on_owner() to avoid this problem.
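
For concreteness, the looping path exercised by the script boils down to a
handler of roughly this shape (the names below are hypothetical, only the
locking pattern matters):

	/* Hypothetical sketch: user space calls this ioctl in a tight loop,
	 * so a fresh optimistic spinner for rtnl_mutex shows up every few
	 * milliseconds while the previous holder is still doing its work. */
	static long xx_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
	{
		rtnl_lock();		/* mutex_lock(&rtnl_mutex) */
		xx_do_config(file);	/* sleeps for a few ms, never hangs */
		rtnl_unlock();		/* mutex_unlock(&rtnl_mutex) */
		return 0;
	}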

Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
---
 kernel/locking/mutex.c | 11 ++++++-----
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/kernel/locking/mutex.c b/kernel/locking/mutex.c
index 0551c21..596b341 100644
--- a/kernel/locking/mutex.c
+++ b/kernel/locking/mutex.c
@@ -256,7 +256,7 @@ static inline int mutex_can_spin_on_owner(struct mutex *lock)
 	struct task_struct *owner;
 	int retval = 1;
 
-	if (need_resched())
+	if (need_resched() || atomic_read(&lock->count) == -1)
 		return 0;
 
 	rcu_read_lock();
@@ -283,10 +283,11 @@ static inline bool mutex_try_to_acquire(struct mutex *lock)
 /*
  * Optimistic spinning.
  *
- * We try to spin for acquisition when we find that the lock owner
- * is currently running on a (different) CPU and while we don't
- * need to reschedule. The rationale is that if the lock owner is
- * running, it is likely to release the lock soon.
+ * We try to spin for acquisition when we find that there are no
+ * pending waiters and the lock owner is currently running on a
+ * (different) CPU and while we don't need to reschedule. The
+ * rationale is that if the lock owner is running, it is likely
+ * to release the lock soon.
  *
  * Since this needs the lock owner, and this mutex implementation
  * doesn't track the owner atomically in the lock field, we need to
-- 
2.5.0

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* Re: [PATCH RFC ] locking/mutexes: don't spin on owner when wait list is not NULL.
  2016-01-21  6:53 [PATCH RFC ] locking/mutexes: don't spin on owner when wait list is not NULL Ding Tianhong
@ 2016-01-21  7:29 ` Ingo Molnar
  2016-01-21  9:04   ` Ding Tianhong
  0 siblings, 1 reply; 24+ messages in thread
From: Ingo Molnar @ 2016-01-21  7:29 UTC (permalink / raw)
  To: Ding Tianhong
  Cc: Peter Zijlstra, Ingo Molnar, linux-kernel, Davidlohr Bueso,
	Linus Torvalds, Paul E. McKenney, Thomas Gleixner,
	Peter Zijlstra, Will Deacon, Jason Low, Tim Chen, Waiman Long


Cc:-ed other gents who touched the mutex code recently. Mail quoted below.

Thanks,

	Ingo

* Ding Tianhong <dingtianhong@huawei.com> wrote:

> I built a script that creates several processes doing an ioctl call in a loop;
> the ioctl ends up calling a kernel function shaped like this:
> xx_ioctl {
> ...
> rtnl_lock();
> function();
> rtnl_unlock();
> ...
> }
> The function may sleep for several ms but never hangs. At the same time,
> another user service may call ifconfig to change the state of the
> ethernet device, and after several hours the hung task watchdog reports this problem:
> 
> ========================================================================
> 149738.039038] INFO: task ifconfig:11890 blocked for more than 120 seconds.
> [149738.040597] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [149738.042280] ifconfig D ffff88061ec13680 0 11890 11573 0x00000080
> [149738.042284] ffff88052449bd40 0000000000000082 ffff88053a33f300 ffff88052449bfd8
> [149738.042286] ffff88052449bfd8 ffff88052449bfd8 ffff88053a33f300 ffffffff819e6240
> [149738.042288] ffffffff819e6244 ffff88053a33f300 00000000ffffffff ffffffff819e6248
> [149738.042290] Call Trace:
> [149738.042300] [<ffffffff8160d219>] schedule_preempt_disabled+0x29/0x70
> [149738.042303] [<ffffffff8160af65>] __mutex_lock_slowpath+0xc5/0x1c0
> [149738.042305] [<ffffffff8160a3cf>] mutex_lock+0x1f/0x2f
> [149738.042309] [<ffffffff8150d945>] rtnl_lock+0x15/0x20
> [149738.042311] [<ffffffff81514e3a>] dev_ioctl+0xda/0x590
> [149738.042314] [<ffffffff816121cc>] ? __do_page_fault+0x21c/0x560
> [149738.042318] [<ffffffff814e42c5>] sock_do_ioctl+0x45/0x50
> [149738.042320] [<ffffffff814e49d0>] sock_ioctl+0x1f0/0x2c0
> [149738.042324] [<ffffffff811dc9b5>] do_vfs_ioctl+0x2e5/0x4c0
> [149738.042327] [<ffffffff811e6a00>] ? fget_light+0xa0/0xd0
> 
> ================================ cut here ================================
> 
> I got the vmcore and found that ifconfig had already been on the wait_list of the
> rtnl_lock for 120 seconds, while my processes could still take and release the
> rtnl_lock several times per second. That means my processes kept jumping the
> queue and ifconfig could never get the rtnl lock. Checking the mutex lock slow
> path, I found that a task may spin on the owner regardless of whether the wait
> list is empty, so a task on the wait list can be cut in line forever. Add a test
> for pending waiters to mutex_can_spin_on_owner() to avoid this problem.
> 
> Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
> ---
>  kernel/locking/mutex.c | 11 ++++++-----
>  1 file changed, 6 insertions(+), 5 deletions(-)
> 
> diff --git a/kernel/locking/mutex.c b/kernel/locking/mutex.c
> index 0551c21..596b341 100644
> --- a/kernel/locking/mutex.c
> +++ b/kernel/locking/mutex.c
> @@ -256,7 +256,7 @@ static inline int mutex_can_spin_on_owner(struct mutex *lock)
>  	struct task_struct *owner;
>  	int retval = 1;
>  
> -	if (need_resched())
> +	if (need_resched() || atomic_read(&lock->count) == -1)
>  		return 0;
>  
>  	rcu_read_lock();
> @@ -283,10 +283,11 @@ static inline bool mutex_try_to_acquire(struct mutex *lock)
>  /*
>   * Optimistic spinning.
>   *
> - * We try to spin for acquisition when we find that the lock owner
> - * is currently running on a (different) CPU and while we don't
> - * need to reschedule. The rationale is that if the lock owner is
> - * running, it is likely to release the lock soon.
> + * We try to spin for acquisition when we find that there are no
> + * pending waiters and the lock owner is currently running on a
> + * (different) CPU and while we don't need to reschedule. The
> + * rationale is that if the lock owner is running, it is likely
> + * to release the lock soon.
>   *
>   * Since this needs the lock owner, and this mutex implementation
>   * doesn't track the owner atomically in the lock field, we need to
> -- 
> 2.5.0
> 
> 

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH RFC ] locking/mutexes: don't spin on owner when wait list is not NULL.
  2016-01-21  7:29 ` Ingo Molnar
@ 2016-01-21  9:04   ` Ding Tianhong
  0 siblings, 0 replies; 24+ messages in thread
From: Ding Tianhong @ 2016-01-21  9:04 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Peter Zijlstra, Ingo Molnar, linux-kernel, Davidlohr Bueso,
	Linus Torvalds, Paul E. McKenney, Thomas Gleixner,
	Peter Zijlstra, Will Deacon, Jason Low, Tim Chen, Waiman Long

On 2016/1/21 15:29, Ingo Molnar wrote:
> 
> Cc:-ed other gents who touched the mutex code recently. Mail quoted below.
> 

Ok, thanks.

Ding

> Thanks,
> 
> 	Ingo
> 
> * Ding Tianhong <dingtianhong@huawei.com> wrote:
> 
>> I built a script that creates several processes doing an ioctl call in a loop;
>> the ioctl ends up calling a kernel function shaped like this:
>> xx_ioctl {
>> ...
>> rtnl_lock();
>> function();
>> rtnl_unlock();
>> ...
>> }
>> The function may sleep for several ms but never hangs. At the same time,
>> another user service may call ifconfig to change the state of the
>> ethernet device, and after several hours the hung task watchdog reports this problem:
>>
>> ========================================================================
>> 149738.039038] INFO: task ifconfig:11890 blocked for more than 120 seconds.
>> [149738.040597] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>> [149738.042280] ifconfig D ffff88061ec13680 0 11890 11573 0x00000080
>> [149738.042284] ffff88052449bd40 0000000000000082 ffff88053a33f300 ffff88052449bfd8
>> [149738.042286] ffff88052449bfd8 ffff88052449bfd8 ffff88053a33f300 ffffffff819e6240
>> [149738.042288] ffffffff819e6244 ffff88053a33f300 00000000ffffffff ffffffff819e6248
>> [149738.042290] Call Trace:
>> [149738.042300] [<ffffffff8160d219>] schedule_preempt_disabled+0x29/0x70
>> [149738.042303] [<ffffffff8160af65>] __mutex_lock_slowpath+0xc5/0x1c0
>> [149738.042305] [<ffffffff8160a3cf>] mutex_lock+0x1f/0x2f
>> [149738.042309] [<ffffffff8150d945>] rtnl_lock+0x15/0x20
>> [149738.042311] [<ffffffff81514e3a>] dev_ioctl+0xda/0x590
>> [149738.042314] [<ffffffff816121cc>] ? __do_page_fault+0x21c/0x560
>> [149738.042318] [<ffffffff814e42c5>] sock_do_ioctl+0x45/0x50
>> [149738.042320] [<ffffffff814e49d0>] sock_ioctl+0x1f0/0x2c0
>> [149738.042324] [<ffffffff811dc9b5>] do_vfs_ioctl+0x2e5/0x4c0
>> [149738.042327] [<ffffffff811e6a00>] ? fget_light+0xa0/0xd0
>>
>> ================================ cut here ================================
>>
>> I got the vmcore and found that ifconfig had already been on the wait_list of the
>> rtnl_lock for 120 seconds, while my processes could still take and release the
>> rtnl_lock several times per second. That means my processes kept jumping the
>> queue and ifconfig could never get the rtnl lock. Checking the mutex lock slow
>> path, I found that a task may spin on the owner regardless of whether the wait
>> list is empty, so a task on the wait list can be cut in line forever. Add a test
>> for pending waiters to mutex_can_spin_on_owner() to avoid this problem.
>>
>> Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
>> ---
>>  kernel/locking/mutex.c | 11 ++++++-----
>>  1 file changed, 6 insertions(+), 5 deletions(-)
>>
>> diff --git a/kernel/locking/mutex.c b/kernel/locking/mutex.c
>> index 0551c21..596b341 100644
>> --- a/kernel/locking/mutex.c
>> +++ b/kernel/locking/mutex.c
>> @@ -256,7 +256,7 @@ static inline int mutex_can_spin_on_owner(struct mutex *lock)
>>  	struct task_struct *owner;
>>  	int retval = 1;
>>  
>> -	if (need_resched())
>> +	if (need_resched() || atomic_read(&lock->count) == -1)
>>  		return 0;
>>  
>>  	rcu_read_lock();
>> @@ -283,10 +283,11 @@ static inline bool mutex_try_to_acquire(struct mutex *lock)
>>  /*
>>   * Optimistic spinning.
>>   *
>> - * We try to spin for acquisition when we find that the lock owner
>> - * is currently running on a (different) CPU and while we don't
>> - * need to reschedule. The rationale is that if the lock owner is
>> - * running, it is likely to release the lock soon.
>> + * We try to spin for acquisition when we find that there are no
>> + * pending waiters and the lock owner is currently running on a
>> + * (different) CPU and while we don't need to reschedule. The
>> + * rationale is that if the lock owner is running, it is likely
>> + * to release the lock soon.
>>   *
>>   * Since this needs the lock owner, and this mutex implementation
>>   * doesn't track the owner atomically in the lock field, we need to
>> -- 
>> 2.5.0
>>
>>
> 
> .
> 

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH RFC] locking/mutexes: don't spin on owner when wait list is not NULL.
  2016-02-01  3:29                     ` huang ying
@ 2016-02-01  3:35                       ` Huang, Ying
  0 siblings, 0 replies; 24+ messages in thread
From: Huang, Ying @ 2016-02-01  3:35 UTC (permalink / raw)
  To: huang ying
  Cc: Ding Tianhong, Peter Zijlstra, Waiman Long, Jason Low,
	Ingo Molnar, linux-kernel, Davidlohr Bueso, Linus Torvalds,
	Paul E. McKenney, Thomas Gleixner, Will Deacon, Tim Chen,
	Waiman Long, Huang Ying

huang ying <huang.ying.caritas@gmail.com> writes:

> On Sat, Jan 30, 2016 at 9:18 AM, Ding Tianhong <dingtianhong@huawei.com> wrote:
>> On 2016/1/29 17:53, Peter Zijlstra wrote:
>>> On Sun, Jan 24, 2016 at 04:03:50PM +0800, Ding Tianhong wrote:
>>>
>>>> looks good to me, I will try this solution and report the result, thanks everyone.
>>>
>>> Did you get a chance to run with this?
>>>
>>> .
>>>
>>
>> I backported this patch to the 3.10 LTS kernel without changing any
>> logic. So far the patch works fine for me and nothing needed changing,
>> so I think this patch is fine. Could you formally submit it to the latest kernel? :)
>>
>> Thanks.
>> Ding
>>
>>
>
> The original patch from Tianhong triggered a performance regression
> because it turns optimistic spinning off in effect.  I tested
> Peter's patch with the same configuration and it shows no regression,
> so I think that patch keeps optimistic spinning.  Test result
> details will be in the next email.

Here is the detailed test result:

=========================================================================================
compiler/cpufreq_governor/kconfig/nr_task/rootfs/tbox_group/test/testcase:
  gcc-4.9/performance/x86_64-rhel/100%/debian-x86_64-2015-02-07.cgz/lkp-ivb-d01/fstime/unixbench

commit: 
  v4.4
  1db66c17114d5437c0757d6792c0d8923192ecd6

            v4.4 1db66c17114d5437c0757d6792 
---------------- -------------------------- 
         %stddev     %change         %stddev
             \          |                \  
    371269 ± 10%     -93.2%      25080 ±  4%  unixbench.time.voluntary_context_switches
    371269 ± 10%     -93.2%      25080 ±  4%  time.voluntary_context_switches
      6189 ±  8%     -76.4%       1463 ±  6%  vmstat.system.cs
      5706 ±  0%      -1.7%       5608 ±  0%  vmstat.system.in
    113680 ± 12%     -73.5%      30086 ±  1%  cpuidle.C1-IVB.usage
   1515925 ± 20%     +68.3%    2552001 ± 11%  cpuidle.C1E-IVB.time
   1227221 ± 20%     -51.7%     592695 ± 17%  cpuidle.C3-IVB.time
      2697 ± 10%     -77.8%     598.33 ± 10%  cpuidle.C3-IVB.usage
     15173 ±  6%     -23.1%      11663 ±  1%  cpuidle.C6-IVB.usage
     34.38 ± 27%     -35.0%      22.33 ±  2%  cpuidle.POLL.usage
     61.92 ±  9%     +14.3%      70.78 ±  7%  sched_debug.cfs_rq:/.load_avg.min
     40.85 ± 29%     +27.3%      52.00 ± 10%  sched_debug.cfs_rq:/.runnable_load_avg.2
     -1949 ±-37%     -64.6%    -690.19 ±-134%  sched_debug.cfs_rq:/.spread0.4
     -1773 ±-29%     -85.6%    -256.00 ±-388%  sched_debug.cfs_rq:/.spread0.7
     -2478 ±-26%     -49.5%      -1251 ±-66%  sched_debug.cfs_rq:/.spread0.min
     61.95 ±  9%     +14.8%      71.11 ±  7%  sched_debug.cfs_rq:/.tg_load_avg_contrib.min
    396962 ± 12%     +27.9%     507573 ± 10%  sched_debug.cpu.avg_idle.0
    432973 ± 18%     +45.3%     629147 ± 12%  sched_debug.cpu.avg_idle.6
    448566 ±  3%     +11.5%     499990 ±  0%  sched_debug.cpu.avg_idle.avg
     45.31 ±  5%      -9.5%      41.00 ±  3%  sched_debug.cpu.cpu_load[3].7
     52204 ± 10%     -49.9%      26173 ± 34%  sched_debug.cpu.nr_switches.0
     50383 ± 12%     -57.6%      21353 ± 15%  sched_debug.cpu.nr_switches.1
     45425 ± 16%     -68.5%      14325 ± 28%  sched_debug.cpu.nr_switches.2
     43069 ± 20%     -65.5%      14852 ± 41%  sched_debug.cpu.nr_switches.3
     40285 ± 16%     -70.4%      11905 ± 47%  sched_debug.cpu.nr_switches.4
     40732 ± 13%     -75.8%       9872 ± 39%  sched_debug.cpu.nr_switches.5
     43011 ± 19%     -80.0%       8607 ± 42%  sched_debug.cpu.nr_switches.6
     38076 ± 12%     -75.9%       9167 ± 40%  sched_debug.cpu.nr_switches.7
     44148 ±  7%     -67.1%      14532 ±  6%  sched_debug.cpu.nr_switches.avg
     59877 ±  8%     -51.3%      29146 ± 21%  sched_debug.cpu.nr_switches.max
     33672 ±  9%     -83.0%       5718 ±  4%  sched_debug.cpu.nr_switches.min
     -0.62 ±-1411%   -2212.5%      13.00 ± 16%  sched_debug.cpu.nr_uninterruptible.0
     -1.23 ±-582%   -1318.8%      15.00 ± 35%  sched_debug.cpu.nr_uninterruptible.1
      2.54 ±263%    +267.7%       9.33 ± 91%  sched_debug.cpu.nr_uninterruptible.2
      0.31 ±2966%   -5841.7%     -17.67 ±-19%  sched_debug.cpu.nr_uninterruptible.5
      9.84 ± 19%     +70.4%      16.76 ±  5%  sched_debug.cpu.nr_uninterruptible.stddev
    116287 ±  4%     -20.9%      91972 ±  9%  sched_debug.cpu.sched_count.0
     50411 ± 12%     -57.6%      21382 ± 15%  sched_debug.cpu.sched_count.1
     45453 ± 16%     -68.4%      14356 ± 28%  sched_debug.cpu.sched_count.2
     43098 ± 20%     -65.5%      14888 ± 41%  sched_debug.cpu.sched_count.3
     40314 ± 16%     -70.4%      11934 ± 47%  sched_debug.cpu.sched_count.4
     40761 ± 13%     -75.7%       9896 ± 39%  sched_debug.cpu.sched_count.5
     43041 ± 19%     -79.9%       8636 ± 42%  sched_debug.cpu.sched_count.6
     38105 ± 12%     -75.9%       9193 ± 40%  sched_debug.cpu.sched_count.7
     52184 ±  6%     -56.3%      22782 ±  4%  sched_debug.cpu.sched_count.avg
    116288 ±  4%     -20.9%      91972 ±  9%  sched_debug.cpu.sched_count.max
     33701 ±  9%     -82.9%       5746 ±  4%  sched_debug.cpu.sched_count.min
     22760 ± 10%     -63.0%       8418 ± 40%  sched_debug.cpu.sched_goidle.0
     23319 ± 13%     -60.9%       9114 ± 22%  sched_debug.cpu.sched_goidle.1
     21273 ± 17%     -80.4%       4169 ± 13%  sched_debug.cpu.sched_goidle.2
     19993 ± 19%     -67.8%       6429 ± 45%  sched_debug.cpu.sched_goidle.3
     18614 ± 17%     -85.0%       2788 ± 29%  sched_debug.cpu.sched_goidle.4
     18921 ± 12%     -86.7%       2520 ± 15%  sched_debug.cpu.sched_goidle.5
     20131 ± 17%     -82.1%       3596 ± 52%  sched_debug.cpu.sched_goidle.6
     17861 ± 12%     -86.9%       2334 ± 14%  sched_debug.cpu.sched_goidle.7
     20359 ±  8%     -75.8%       4921 ±  5%  sched_debug.cpu.sched_goidle.avg
     26477 ± 10%     -60.2%      10539 ± 21%  sched_debug.cpu.sched_goidle.max
     15845 ± 10%     -87.2%       2033 ±  4%  sched_debug.cpu.sched_goidle.min
     29043 ± 15%     -58.8%      11958 ± 26%  sched_debug.cpu.ttwu_count.0
     24191 ± 10%     -68.8%       7547 ± 27%  sched_debug.cpu.ttwu_count.1
     21313 ± 11%     -72.7%       5819 ± 24%  sched_debug.cpu.ttwu_count.2
     21487 ± 13%     -61.4%       8296 ± 43%  sched_debug.cpu.ttwu_count.3
     19644 ± 15%     -54.4%       8967 ± 79%  sched_debug.cpu.ttwu_count.4
     20786 ± 15%     -69.2%       6409 ± 58%  sched_debug.cpu.ttwu_count.5
     20435 ± 17%     -79.3%       4231 ± 58%  sched_debug.cpu.ttwu_count.6
     19293 ± 17%     -77.0%       4432 ± 55%  sched_debug.cpu.ttwu_count.7
     22024 ±  7%     -67.3%       7207 ±  6%  sched_debug.cpu.ttwu_count.avg
     31009 ±  9%     -45.2%      17008 ± 17%  sched_debug.cpu.ttwu_count.max
     16791 ± 10%     -85.1%       2494 ±  5%  sched_debug.cpu.ttwu_count.min
      3084 ±  8%     +25.5%       3870 ±  9%  sched_debug.cpu.ttwu_local.avg

The URL of the original regression report email:

https://lists.01.org/pipermail/lkp/2016-January/003442.html

Best Regards,
Huang, Ying

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH RFC] locking/mutexes: don't spin on owner when wait list is not NULL.
  2016-01-30  1:18                   ` Ding Tianhong
@ 2016-02-01  3:29                     ` huang ying
  2016-02-01  3:35                       ` Huang, Ying
  0 siblings, 1 reply; 24+ messages in thread
From: huang ying @ 2016-02-01  3:29 UTC (permalink / raw)
  To: Ding Tianhong
  Cc: Peter Zijlstra, Waiman Long, Jason Low, Ingo Molnar,
	linux-kernel, Davidlohr Bueso, Linus Torvalds, Paul E. McKenney,
	Thomas Gleixner, Will Deacon, Tim Chen, Waiman Long, Huang Ying

On Sat, Jan 30, 2016 at 9:18 AM, Ding Tianhong <dingtianhong@huawei.com> wrote:
> On 2016/1/29 17:53, Peter Zijlstra wrote:
>> On Sun, Jan 24, 2016 at 04:03:50PM +0800, Ding Tianhong wrote:
>>
>>> looks good to me, I will try this solution and report the result, thanks everyone.
>>
>> Did you get a chance to run with this?
>>
>> .
>>
>
> I backported this patch to the 3.10 LTS kernel without changing any logic. So far the patch works fine for me and nothing needed changing,
> so I think this patch is fine. Could you formally submit it to the latest kernel? :)
>
> Thanks.
> Ding
>
>

The original patch from Tianhong triggered a performance regression
because it turns optimistic spinning off in effect.  I tested
Peter's patch with the same configuration and it shows no regression,
so I think that patch keeps optimistic spinning.  Test result
details will be in the next email.

Best Regards,
Huang, Ying

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH RFC] locking/mutexes: don't spin on owner when wait list is not NULL.
  2016-01-29  9:53                 ` Peter Zijlstra
@ 2016-01-30  1:18                   ` Ding Tianhong
  2016-02-01  3:29                     ` huang ying
  0 siblings, 1 reply; 24+ messages in thread
From: Ding Tianhong @ 2016-01-30  1:18 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Waiman Long, Jason Low, Ingo Molnar, linux-kernel,
	Davidlohr Bueso, Linus Torvalds, Paul E. McKenney,
	Thomas Gleixner, Will Deacon, Tim Chen, Waiman Long

On 2016/1/29 17:53, Peter Zijlstra wrote:
> On Sun, Jan 24, 2016 at 04:03:50PM +0800, Ding Tianhong wrote:
> 
>> looks good to me, I will try this solution and report the result, thanks everyone.
> 
> Did you get a chance to run with this?
> 
> .
> 

I backported this patch to the 3.10 LTS kernel without changing any logic. So far the patch works fine for me and nothing needed changing,
so I think this patch is fine. Could you formally submit it to the latest kernel? :)

Thanks.
Ding 

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH RFC] locking/mutexes: don't spin on owner when wait list is not NULL.
  2016-01-24  8:03               ` Ding Tianhong
@ 2016-01-29  9:53                 ` Peter Zijlstra
  2016-01-30  1:18                   ` Ding Tianhong
  0 siblings, 1 reply; 24+ messages in thread
From: Peter Zijlstra @ 2016-01-29  9:53 UTC (permalink / raw)
  To: Ding Tianhong
  Cc: Waiman Long, Jason Low, Ingo Molnar, linux-kernel,
	Davidlohr Bueso, Linus Torvalds, Paul E. McKenney,
	Thomas Gleixner, Will Deacon, Tim Chen, Waiman Long

On Sun, Jan 24, 2016 at 04:03:50PM +0800, Ding Tianhong wrote:

> looks good to me, I will try this solution and report the result, thanks everyone.

Did you get a chance to run with this?

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH RFC] locking/mutexes: don't spin on owner when wait list is not NULL.
  2016-01-22 13:59             ` Waiman Long
@ 2016-01-24  8:03               ` Ding Tianhong
  2016-01-29  9:53                 ` Peter Zijlstra
  0 siblings, 1 reply; 24+ messages in thread
From: Ding Tianhong @ 2016-01-24  8:03 UTC (permalink / raw)
  To: Waiman Long, Peter Zijlstra
  Cc: Jason Low, Ingo Molnar, linux-kernel, Davidlohr Bueso,
	Linus Torvalds, Paul E. McKenney, Thomas Gleixner, Will Deacon,
	Tim Chen, Waiman Long

On 2016/1/22 21:59, Waiman Long wrote:
> On 01/22/2016 06:06 AM, Peter Zijlstra wrote:
>> On Fri, Jan 22, 2016 at 11:56:52AM +0100, Peter Zijlstra wrote:
>>> On Fri, Jan 22, 2016 at 11:53:12AM +0100, Peter Zijlstra wrote:
>>>
>>>> There might be other details, but this is the one that stood out.
>>> I think this also does the wrong thing for use_ww_ctx.
>> Something like so?
> 
> I think that should work. My only minor concern is that putting the waiter spinner at the end of the OSQ will take it longer to get the lock, but that shouldn't be a big issue.
> 
>> ---
>>   kernel/locking/mutex.c | 15 +++++++++++++++
>>   1 file changed, 15 insertions(+)
>>
>> diff --git a/kernel/locking/mutex.c b/kernel/locking/mutex.c
>> index 0551c219c40e..070a0ac34aa7 100644
>> --- a/kernel/locking/mutex.c
>> +++ b/kernel/locking/mutex.c
>> @@ -512,6 +512,7 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
>>       struct task_struct *task = current;
>>       struct mutex_waiter waiter;
>>       unsigned long flags;
>> +    bool acquired;
>>       int ret;
>>
>>       preempt_disable();
>> @@ -543,6 +544,7 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
>>       lock_contended(&lock->dep_map, ip);
>>
>>       for (;;) {
>> +        acquired = false;
>>           /*
>>            * Lets try to take the lock again - this is needed even if
>>            * we get here for the first time (shortly after failing to
>> @@ -577,7 +579,16 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
>>           /* didn't get the lock, go to sleep: */
>>           spin_unlock_mutex(&lock->wait_lock, flags);
>>           schedule_preempt_disabled();
>> +
>> +        if (mutex_is_locked(lock))
>> +            acquired = mutex_optimistic_spin(lock, ww_ctx, use_ww_ctx);
>> +
>>           spin_lock_mutex(&lock->wait_lock, flags);
>> +
>> +        if (acquired) {
>> +            atomic_set(&lock->count, -1);
>> +            break;
>> +        }
>>       }
>>       __set_task_state(task, TASK_RUNNING);
>>
>> @@ -587,6 +598,9 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
>>           atomic_set(&lock->count, 0);
>>       debug_mutex_free_waiter(&waiter);
>>
>> +    if (acquired)
>> +        goto unlock;
>> +
>>   skip_wait:
>>       /* got the lock - cleanup and rejoice! */
>>       lock_acquired(&lock->dep_map, ip);
>> @@ -597,6 +611,7 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
>>           ww_mutex_set_context_slowpath(ww, ww_ctx);
>>       }
>>
>> +unlock:
>>       spin_unlock_mutex(&lock->wait_lock, flags);
>>       preempt_enable();
>>       return 0;
> 
> Cheers,
> Longman
> 

looks good to me, I will try this solution and report the result, thanks everyone.

Ding

> .
> 

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH RFC] locking/mutexes: don't spin on owner when wait list is not NULL.
  2016-01-22 13:38     ` Waiman Long
@ 2016-01-22 16:46       ` Davidlohr Bueso
  0 siblings, 0 replies; 24+ messages in thread
From: Davidlohr Bueso @ 2016-01-22 16:46 UTC (permalink / raw)
  To: Waiman Long
  Cc: Ding Tianhong, Peter Zijlstra, Ingo Molnar, linux-kernel,
	Linus Torvalds, Paul E. McKenney, Thomas Gleixner, Will Deacon,
	Jason Low, Tim Chen, Waiman Long

On Fri, 22 Jan 2016, Waiman Long wrote:

>The patch that I sent out is just a proof of concept to make sure
>that it can fix that particular case. I do plan to refactor it if I
>decide to go ahead with an official one. Unlike the OSQ, there can be
>no more than one waiter spinner as the wakeup function is directed to
>only the first task in the wait list and the spinning won't happen
>until the task is first woken up. In the worst case scenario, there
>are only 2 spinners spinning on the lock and the owner field, one
>from OSQ and one from the wait list. That shouldn't put too much
>cacheline contention traffic to the system.

Similarly, I guess we should also wakeup the next waiter in line after
releasing the wait_lock via wake_q. This would allow the woken waiter a
slightly better chance of finding the wait_lock free when continuing to
take the mutex.
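
A rough sketch of that idea against the v4.4-era unlock slowpath (illustrative
only, not a posted or tested patch; wake_q_add()/wake_up_q() are the existing
wake_q helpers, the rest is just the current code rearranged):

	static inline void
	__mutex_unlock_common_slowpath(struct mutex *lock, int nested)
	{
		unsigned long flags;
		WAKE_Q(wake_q);		/* on-stack list of tasks to wake later */

		if (__mutex_slowpath_needs_to_unlock())
			atomic_set(&lock->count, 1);

		spin_lock_mutex(&lock->wait_lock, flags);
		mutex_release(&lock->dep_map, nested, _RET_IP_);
		debug_mutex_unlock(lock);

		if (!list_empty(&lock->wait_list)) {
			/* still only the first waiter gets woken ... */
			struct mutex_waiter *waiter =
				list_first_entry(&lock->wait_list,
						 struct mutex_waiter, list);

			debug_mutex_wake_waiter(lock, waiter);
			/* ... but the wakeup itself is deferred */
			wake_q_add(&wake_q, waiter->task);
		}

		spin_unlock_mutex(&lock->wait_lock, flags);

		/* the waiter only starts running after wait_lock is dropped */
		wake_up_q(&wake_q);
	}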

Thanks,
Davidlohr

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH RFC] locking/mutexes: don't spin on owner when wait list is not NULL.
  2016-01-22 11:06           ` Peter Zijlstra
@ 2016-01-22 13:59             ` Waiman Long
  2016-01-24  8:03               ` Ding Tianhong
  0 siblings, 1 reply; 24+ messages in thread
From: Waiman Long @ 2016-01-22 13:59 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Jason Low, Ding Tianhong, Ingo Molnar, linux-kernel,
	Davidlohr Bueso, Linus Torvalds, Paul E. McKenney,
	Thomas Gleixner, Will Deacon, Tim Chen, Waiman Long

On 01/22/2016 06:06 AM, Peter Zijlstra wrote:
> On Fri, Jan 22, 2016 at 11:56:52AM +0100, Peter Zijlstra wrote:
>> On Fri, Jan 22, 2016 at 11:53:12AM +0100, Peter Zijlstra wrote:
>>
>>> There might be other details, but this is the one that stood out.
>> I think this also does the wrong thing for use_ww_ctx.
> Something like so?

I think that should work. My only minor concern is that putting the 
waiter spinner at the end of the OSQ will take it longer to get the 
lock, but that shouldn't be a big issue.

> ---
>   kernel/locking/mutex.c | 15 +++++++++++++++
>   1 file changed, 15 insertions(+)
>
> diff --git a/kernel/locking/mutex.c b/kernel/locking/mutex.c
> index 0551c219c40e..070a0ac34aa7 100644
> --- a/kernel/locking/mutex.c
> +++ b/kernel/locking/mutex.c
> @@ -512,6 +512,7 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
>   	struct task_struct *task = current;
>   	struct mutex_waiter waiter;
>   	unsigned long flags;
> +	bool acquired;
>   	int ret;
>
>   	preempt_disable();
> @@ -543,6 +544,7 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
>   	lock_contended(&lock->dep_map, ip);
>
>   	for (;;) {
> +		acquired = false;
>   		/*
>   		 * Lets try to take the lock again - this is needed even if
>   		 * we get here for the first time (shortly after failing to
> @@ -577,7 +579,16 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
>   		/* didn't get the lock, go to sleep: */
>   		spin_unlock_mutex(&lock->wait_lock, flags);
>   		schedule_preempt_disabled();
> +
> +		if (mutex_is_locked(lock))
> +			acquired = mutex_optimistic_spin(lock, ww_ctx, use_ww_ctx);
> +
>   		spin_lock_mutex(&lock->wait_lock, flags);
> +
> +		if (acquired) {
> +			atomic_set(&lock->count, -1);
> +			break;
> +		}
>   	}
>   	__set_task_state(task, TASK_RUNNING);
>
> @@ -587,6 +598,9 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
>   		atomic_set(&lock->count, 0);
>   	debug_mutex_free_waiter(&waiter);
>
> +	if (acquired)
> +		goto unlock;
> +
>   skip_wait:
>   	/* got the lock - cleanup and rejoice! */
>   	lock_acquired(&lock->dep_map, ip);
> @@ -597,6 +611,7 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
>   		ww_mutex_set_context_slowpath(ww, ww_ctx);
>   	}
>
> +unlock:
>   	spin_unlock_mutex(&lock->wait_lock, flags);
>   	preempt_enable();
>   	return 0;

Cheers,
Longman

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH RFC] locking/mutexes: don't spin on owner when wait list is not NULL.
  2016-01-22  8:54   ` Peter Zijlstra
  2016-01-22 10:20     ` Jason Low
@ 2016-01-22 13:41     ` Waiman Long
  1 sibling, 0 replies; 24+ messages in thread
From: Waiman Long @ 2016-01-22 13:41 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ding Tianhong, Ingo Molnar, linux-kernel, Davidlohr Bueso,
	Linus Torvalds, Paul E. McKenney, Thomas Gleixner, Will Deacon,
	Jason Low, Tim Chen, Waiman Long

On 01/22/2016 03:54 AM, Peter Zijlstra wrote:
> On Thu, Jan 21, 2016 at 06:02:34PM -0500, Waiman Long wrote:
> This patch attempts to fix this live-lock condition by enabling a woken
> task in the wait list to enter the optimistic spinning loop itself, with
> precedence over the ones in the OSQ. This should prevent the live-lock
> condition from happening.
>
> So I think having the top waiter going back in to contend on the OSQ is
> an excellent idea, but I'm not sure the wlh_spinning thing is important.

Yes, that is optional. I put it there just to make it more likely for 
the waiter spinner to get the lock. Without that, the chance will be 
50/50 on average. I can certainly take that out.

> The OSQ itself is FIFO fair, and the waiters retain the wait_list
> position. So having the top wait_list entry contending on the OSQ
> ensures we cannot starve (I think).
>
> Also, as Davidlohr said, we cannot copy/paste this much code.

As I said in the previous mail, I do intend to refactor it before 
sending out the official patch.

Cheers,
Longman

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH RFC] locking/mutexes: don't spin on owner when wait list is not NULL.
  2016-01-22  6:09   ` Davidlohr Bueso
@ 2016-01-22 13:38     ` Waiman Long
  2016-01-22 16:46       ` Davidlohr Bueso
  0 siblings, 1 reply; 24+ messages in thread
From: Waiman Long @ 2016-01-22 13:38 UTC (permalink / raw)
  To: Davidlohr Bueso
  Cc: Ding Tianhong, Peter Zijlstra, Ingo Molnar, linux-kernel,
	Linus Torvalds, Paul E. McKenney, Thomas Gleixner, Will Deacon,
	Jason Low, Tim Chen, Waiman Long

On 01/22/2016 01:09 AM, Davidlohr Bueso wrote:
> On Thu, 21 Jan 2016, Waiman Long wrote:
>
>> On 01/21/2016 04:29 AM, Ding Tianhong wrote:
>
>>> I got the vmcore and found that ifconfig had already been on the
>>> wait_list of the rtnl_lock for 120 seconds, while my processes could
>>> still take and release the rtnl_lock several times per second. That
>>> means my processes kept jumping the queue and ifconfig could never get
>>> the rtnl lock. Checking the mutex lock slow path, I found that a task
>>> may spin on the owner regardless of whether the wait list is empty, so
>>> a task on the wait list can be cut in line forever. Add a test for
>>> pending waiters to mutex_can_spin_on_owner() to avoid this problem.
>
> So this has been somewhat always known, at least in theory, until now. 
> It's the cost
> of spinning without going through the wait-queue, unlike other locks.
>
>>> [...]
>
>> From: Waiman Long <Waiman.Long@hpe.com>
>> Date: Thu, 21 Jan 2016 17:53:14 -0500
>> Subject: [PATCH] locking/mutex: Enable optimistic spinning of woken 
>> task in wait list
>>
>> Ding Tianhong reported a live-lock situation where a constant stream
>> of incoming optimistic spinners blocked a task in the wait list from
>> getting the mutex.
>>
>> This patch attempts to fix this live-lock condition by enabling a woken
>> task in the wait list to enter the optimistic spinning loop itself, with
>> precedence over the ones in the OSQ. This should prevent the live-lock
>> condition from happening.
>
> And one of the reasons why we never bothered 'fixing' things was the 
> additional
> branching out in the slowpath (and lack of real issue, although this 
> one being so
> damn pathological). I fear that your approach is one of those 
> scenarios where the
> code ends up being bloated, albeit most of it is actually duplicated 
> and can be
> refactored *sigh*. So now we'd spin, then sleep, then try spinning 
> then sleep again...
> phew. Not to mention the performance implications, ie losing the
> benefits of osq
> over waiter spinning in scenarios that would otherwise have more osq 
> spinners as
> opposed to waiter spinners, or in setups where it is actually best to 
> block instead
> of spinning.

The patch that I sent out is just a proof of concept to make sure that 
it can fix that particular case. I do plan to refactor it if I decide to 
go ahead with an official one. Unlike the OSQ, there can be no more than 
one waiter spinner as the wakeup function is directed to only the first 
task in the wait list and the spinning won't happen until the task is 
first woken up. In the worst case scenario, there are only 2 spinners 
spinning on the lock and the owner field, one from OSQ and one from the 
wait list. That shouldn't put too much cacheline contention traffic to 
the system.
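
For reference, the wakeup path referred to here (roughly the v4.4-era
__mutex_unlock_common_slowpath()) only ever wakes the head of the wait list,
which is why at most one waiter can be spinning at a time:

	if (!list_empty(&lock->wait_list)) {
		/* get the first entry from the wait-list: */
		struct mutex_waiter *waiter =
			list_entry(lock->wait_list.next,
				   struct mutex_waiter, list);

		debug_mutex_wake_waiter(lock, waiter);
		wake_up_process(waiter->task);
	}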

Cheers,
Longman

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH RFC] locking/mutexes: don't spin on owner when wait list is not NULL.
  2016-01-22 10:56         ` Peter Zijlstra
@ 2016-01-22 11:06           ` Peter Zijlstra
  2016-01-22 13:59             ` Waiman Long
  0 siblings, 1 reply; 24+ messages in thread
From: Peter Zijlstra @ 2016-01-22 11:06 UTC (permalink / raw)
  To: Jason Low
  Cc: Waiman Long, Ding Tianhong, Ingo Molnar, linux-kernel,
	Davidlohr Bueso, Linus Torvalds, Paul E. McKenney,
	Thomas Gleixner, Will Deacon, Tim Chen, Waiman Long

On Fri, Jan 22, 2016 at 11:56:52AM +0100, Peter Zijlstra wrote:
> On Fri, Jan 22, 2016 at 11:53:12AM +0100, Peter Zijlstra wrote:
> 
> > There might be other details, but this is the one that stood out.
> 
> I think this also does the wrong thing for use_ww_ctx.

Something like so? 

---
 kernel/locking/mutex.c | 15 +++++++++++++++
 1 file changed, 15 insertions(+)

diff --git a/kernel/locking/mutex.c b/kernel/locking/mutex.c
index 0551c219c40e..070a0ac34aa7 100644
--- a/kernel/locking/mutex.c
+++ b/kernel/locking/mutex.c
@@ -512,6 +512,7 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
 	struct task_struct *task = current;
 	struct mutex_waiter waiter;
 	unsigned long flags;
+	bool acquired;
 	int ret;
 
 	preempt_disable();
@@ -543,6 +544,7 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
 	lock_contended(&lock->dep_map, ip);
 
 	for (;;) {
+		acquired = false;
 		/*
 		 * Lets try to take the lock again - this is needed even if
 		 * we get here for the first time (shortly after failing to
@@ -577,7 +579,16 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
 		/* didn't get the lock, go to sleep: */
 		spin_unlock_mutex(&lock->wait_lock, flags);
 		schedule_preempt_disabled();
+
+		if (mutex_is_locked(lock))
+			acquired = mutex_optimistic_spin(lock, ww_ctx, use_ww_ctx);
+
 		spin_lock_mutex(&lock->wait_lock, flags);
+
+		if (acquired) {
+			atomic_set(&lock->count, -1);
+			break;
+		}
 	}
 	__set_task_state(task, TASK_RUNNING);
 
@@ -587,6 +598,9 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
 		atomic_set(&lock->count, 0);
 	debug_mutex_free_waiter(&waiter);
 
+	if (acquired)
+		goto unlock;
+
 skip_wait:
 	/* got the lock - cleanup and rejoice! */
 	lock_acquired(&lock->dep_map, ip);
@@ -597,6 +611,7 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
 		ww_mutex_set_context_slowpath(ww, ww_ctx);
 	}
 
+unlock:
 	spin_unlock_mutex(&lock->wait_lock, flags);
 	preempt_enable();
 	return 0;

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* Re: [PATCH RFC] locking/mutexes: don't spin on owner when wait list is not NULL.
  2016-01-22 10:53       ` Peter Zijlstra
@ 2016-01-22 10:56         ` Peter Zijlstra
  2016-01-22 11:06           ` Peter Zijlstra
  0 siblings, 1 reply; 24+ messages in thread
From: Peter Zijlstra @ 2016-01-22 10:56 UTC (permalink / raw)
  To: Jason Low
  Cc: Waiman Long, Ding Tianhong, Ingo Molnar, linux-kernel,
	Davidlohr Bueso, Linus Torvalds, Paul E. McKenney,
	Thomas Gleixner, Will Deacon, Tim Chen, Waiman Long

On Fri, Jan 22, 2016 at 11:53:12AM +0100, Peter Zijlstra wrote:

> There might be other details, but this is the one that stood out.

I think this also does the wrong thing for use_ww_ctx.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH RFC] locking/mutexes: don't spin on owner when wait list is not NULL.
  2016-01-22 10:20     ` Jason Low
@ 2016-01-22 10:53       ` Peter Zijlstra
  2016-01-22 10:56         ` Peter Zijlstra
  0 siblings, 1 reply; 24+ messages in thread
From: Peter Zijlstra @ 2016-01-22 10:53 UTC (permalink / raw)
  To: Jason Low
  Cc: Waiman Long, Ding Tianhong, Ingo Molnar, linux-kernel,
	Davidlohr Bueso, Linus Torvalds, Paul E. McKenney,
	Thomas Gleixner, Will Deacon, Tim Chen, Waiman Long

On Fri, Jan 22, 2016 at 02:20:19AM -0800, Jason Low wrote:

> --- a/kernel/locking/mutex.c
> +++ b/kernel/locking/mutex.c
> @@ -543,6 +543,8 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
>  	lock_contended(&lock->dep_map, ip);
>  
>  	for (;;) {
> +		bool acquired = false;
> +
>  		/*
>  		 * Lets try to take the lock again - this is needed even if
>  		 * we get here for the first time (shortly after failing to
> @@ -577,7 +579,12 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
>  		/* didn't get the lock, go to sleep: */
>  		spin_unlock_mutex(&lock->wait_lock, flags);
>  		schedule_preempt_disabled();
> +
> +		if (mutex_is_locked(lock))
> +			acquired = mutex_optimistic_spin(lock, ww_ctx, use_ww_ctx);
>  		spin_lock_mutex(&lock->wait_lock, flags);
> +		if (acquired)
> +			break;
>  	}
>  	__set_task_state(task, TASK_RUNNING);

I think the problem here is that mutex_optimistic_spin() leaves the
mutex->count == 0, even though we have waiters (us at the very least).

But this should be easily fixed, since if we acquired, we should be the
one releasing, so there's no race.

So something like so:

		if (acquired) {
			atomic_set(&lock->count, -1);
			break;
		}

Should deal with that -- we'll set it to 0 again a little further down
if the list ends up empty.


There might be other details, but this is the one that stood out.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH RFC] locking/mutexes: don't spin on owner when wait list is not NULL.
  2016-01-22  8:54   ` Peter Zijlstra
@ 2016-01-22 10:20     ` Jason Low
  2016-01-22 10:53       ` Peter Zijlstra
  2016-01-22 13:41     ` Waiman Long
  1 sibling, 1 reply; 24+ messages in thread
From: Jason Low @ 2016-01-22 10:20 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Waiman Long, Ding Tianhong, Ingo Molnar, linux-kernel,
	Davidlohr Bueso, Linus Torvalds, Paul E. McKenney,
	Thomas Gleixner, Will Deacon, Tim Chen, Waiman Long, jason.low2

On Fri, 2016-01-22 at 09:54 +0100, Peter Zijlstra wrote:
> On Thu, Jan 21, 2016 at 06:02:34PM -0500, Waiman Long wrote:
> > This patch attempts to fix this live-lock condition by enabling a woken
> > task in the wait list to enter the optimistic spinning loop itself, with
> > precedence over the ones in the OSQ. This should prevent the live-lock
> > condition from happening.
> 
> 
> So I think having the top waiter going back in to contend on the OSQ is
> an excellent idea, but I'm not sure the wlh_spinning thing is important.
> 
> The OSQ itself is FIFO fair, and the waiters retain the wait_list
> position. So having the top wait_list entry contending on the OSQ
> ensures we cannot starve (I think).

Right, and we can also avoid needing to add that extra field to the
mutex structure. Before calling optimistic spinning, we do want to check
if the lock is available to avoid unnecessary OSQ overhead though.

So maybe the following would be sufficient:

---
 kernel/locking/mutex.c |    7 +++++++
 1 files changed, 7 insertions(+), 0 deletions(-)

diff --git a/kernel/locking/mutex.c b/kernel/locking/mutex.c
index 0551c21..ead0bd1 100644
--- a/kernel/locking/mutex.c
+++ b/kernel/locking/mutex.c
@@ -543,6 +543,8 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
 	lock_contended(&lock->dep_map, ip);
 
 	for (;;) {
+		bool acquired = false;
+
 		/*
 		 * Lets try to take the lock again - this is needed even if
 		 * we get here for the first time (shortly after failing to
@@ -577,7 +579,12 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
 		/* didn't get the lock, go to sleep: */
 		spin_unlock_mutex(&lock->wait_lock, flags);
 		schedule_preempt_disabled();
+
+		if (mutex_is_locked(lock))
+			acquired = mutex_optimistic_spin(lock, ww_ctx, use_ww_ctx);
 		spin_lock_mutex(&lock->wait_lock, flags);
+		if (acquired)
+			break;
 	}
 	__set_task_state(task, TASK_RUNNING);
 
-- 
1.7.2.5

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* Re: [PATCH RFC] locking/mutexes: don't spin on owner when wait list is not NULL.
  2016-01-21 23:02 ` Waiman Long
  2016-01-22  6:09   ` Davidlohr Bueso
@ 2016-01-22  8:54   ` Peter Zijlstra
  2016-01-22 10:20     ` Jason Low
  2016-01-22 13:41     ` Waiman Long
  1 sibling, 2 replies; 24+ messages in thread
From: Peter Zijlstra @ 2016-01-22  8:54 UTC (permalink / raw)
  To: Waiman Long
  Cc: Ding Tianhong, Ingo Molnar, linux-kernel, Davidlohr Bueso,
	Linus Torvalds, Paul E. McKenney, Thomas Gleixner, Will Deacon,
	Jason Low, Tim Chen, Waiman Long

On Thu, Jan 21, 2016 at 06:02:34PM -0500, Waiman Long wrote:
> This patch attempts to fix this live-lock condition by enabling a woken
> task in the wait list to enter the optimistic spinning loop itself, with
> precedence over the ones in the OSQ. This should prevent the live-lock
> condition from happening.


So I think having the top waiter going back in to contend on the OSQ is
an excellent idea, but I'm not sure the wlh_spinning thing is important.

The OSQ itself is FIFO fair, and the waiters retain the wait_list
position. So having the top wait_list entry contending on the OSQ
ensures we cannot starve (I think).

Also, as Davidlohr said, we cannot copy/paste this much code.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH RFC] locking/mutexes: don't spin on owner when wait list is not NULL.
  2016-01-21 23:02 ` Waiman Long
@ 2016-01-22  6:09   ` Davidlohr Bueso
  2016-01-22 13:38     ` Waiman Long
  2016-01-22  8:54   ` Peter Zijlstra
  1 sibling, 1 reply; 24+ messages in thread
From: Davidlohr Bueso @ 2016-01-22  6:09 UTC (permalink / raw)
  To: Waiman Long
  Cc: Ding Tianhong, Peter Zijlstra, Ingo Molnar, linux-kernel,
	Linus Torvalds, Paul E. McKenney, Thomas Gleixner, Will Deacon,
	Jason Low, Tim Chen, Waiman Long

On Thu, 21 Jan 2016, Waiman Long wrote:

>On 01/21/2016 04:29 AM, Ding Tianhong wrote:

>>I got the vmcore and found that ifconfig had already been on the wait_list of the
>>rtnl_lock for 120 seconds, while my processes could still take and release the
>>rtnl_lock several times per second. That means my processes kept jumping the
>>queue and ifconfig could never get the rtnl lock. Checking the mutex lock slow
>>path, I found that a task may spin on the owner regardless of whether the wait
>>list is empty, so a task on the wait list can be cut in line forever. Add a test
>>for pending waiters to mutex_can_spin_on_owner() to avoid this problem.

So this has been somewhat always known, at least in theory, until now. It's the cost
of spinning without going through the wait-queue, unlike other locks.

>> [...]

>From: Waiman Long <Waiman.Long@hpe.com>
>Date: Thu, 21 Jan 2016 17:53:14 -0500
>Subject: [PATCH] locking/mutex: Enable optimistic spinning of woken task in wait list
>
>Ding Tianhong reported a live-lock situation where a constant stream
>of incoming optimistic spinners blocked a task in the wait list from
>getting the mutex.
>
>This patch attempts to fix this live-lock condition by enabling a woken
>task in the wait list to enter the optimistic spinning loop itself, with
>precedence over the ones in the OSQ. This should prevent the live-lock
>condition from happening.

And one of the reasons why we never bothered 'fixing' things was the additional
branching out in the slowpath (and lack of real issue, although this one being so
damn pathological). I fear that your approach is one of those scenarios where the
code ends up being bloated, albeit most of it is actually duplicated and can be
refactored *sigh*. So now we'd spin, then sleep, then try spinning then sleep again...
phew. Not to mention the performance implications, ie losing the benefits of osq
over waiter spinning in scenarios that would otherwise have more osq spinners as
opposed to waiter spinners, or in setups where it is actually best to block instead
of spinning.

Thanks,
Davidlohr

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH RFC] locking/mutexes: don't spin on owner when wait list is not NULL.
  2016-01-22  2:48     ` Davidlohr Bueso
@ 2016-01-22  3:13       ` Paul E. McKenney
  0 siblings, 0 replies; 24+ messages in thread
From: Paul E. McKenney @ 2016-01-22  3:13 UTC (permalink / raw)
  To: Davidlohr Bueso
  Cc: Tim Chen, Ding Tianhong, Peter Zijlstra, Ingo Molnar,
	linux-kernel, Linus Torvalds, Thomas Gleixner, Will Deacon,
	Jason Low, Waiman Long

On Thu, Jan 21, 2016 at 06:48:54PM -0800, Davidlohr Bueso wrote:
> On Thu, 21 Jan 2016, Paul E. McKenney wrote:
> 
> >I did some testing, which exposed it to the 0day test robot, which
> >did note some performance differences.  I was hoping that it would
> >clear up some instability from other patches, but no such luck.  ;-)
> 
> Oh, that explains why we got a performance regression report :)

Plus I suspected that you wanted some extra email.  ;-)

							Thanx, Paul

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH RFC] locking/mutexes: don't spin on owner when wait list is not NULL.
  2016-01-22  2:41   ` Paul E. McKenney
@ 2016-01-22  2:48     ` Davidlohr Bueso
  2016-01-22  3:13       ` Paul E. McKenney
  0 siblings, 1 reply; 24+ messages in thread
From: Davidlohr Bueso @ 2016-01-22  2:48 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Tim Chen, Ding Tianhong, Peter Zijlstra, Ingo Molnar,
	linux-kernel, Linus Torvalds, Thomas Gleixner, Will Deacon,
	Jason Low, Waiman Long

On Thu, 21 Jan 2016, Paul E. McKenney wrote:

>I did some testing, which exposed it to the 0day test robot, which
>did note some performance differences.  I was hoping that it would
>clear up some instability from other patches, but no such luck.  ;-)

Oh, that explains why we got a performance regression report :)

Thanks,
Davidlohr

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH RFC] locking/mutexes: don't spin on owner when wait list is not NULL.
  2016-01-21 21:23 ` Tim Chen
@ 2016-01-22  2:41   ` Paul E. McKenney
  2016-01-22  2:48     ` Davidlohr Bueso
  0 siblings, 1 reply; 24+ messages in thread
From: Paul E. McKenney @ 2016-01-22  2:41 UTC (permalink / raw)
  To: Tim Chen
  Cc: Ding Tianhong, Peter Zijlstra, Ingo Molnar, linux-kernel,
	Davidlohr Bueso, Linus Torvalds, Thomas Gleixner, Will Deacon,
	Jason Low, Waiman Long

On Thu, Jan 21, 2016 at 01:23:09PM -0800, Tim Chen wrote:
> On Thu, 2016-01-21 at 17:29 +0800, Ding Tianhong wrote:
> 
> > 
> > diff --git a/kernel/locking/mutex.c b/kernel/locking/mutex.c
> > index 0551c21..596b341 100644
> > --- a/kernel/locking/mutex.c
> > +++ b/kernel/locking/mutex.c
> > @@ -256,7 +256,7 @@ static inline int mutex_can_spin_on_owner(struct mutex *lock)
> >  	struct task_struct *owner;
> >  	int retval = 1;
> >  
> > -	if (need_resched())
> > +	if (need_resched() || atomic_read(&lock->count) == -1)
> >  		return 0;
> >  
> 
> One concern I have is this change will eliminate any optimistic spinning
> as long as there is a waiter.  Is there a middle ground that we
> can allow only one spinner if there are waiters?  
> 
> In other words, we allow spinning when
> atomic_read(&lock->count) == -1 but there is no one on the
> osq lock that queue up the spinners (i.e. no other process doing
> optimistic spinning).
> 
> This could allow a bit of spinning without starving out the waiters.
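
For illustration, that middle ground might look roughly like this at the top
of mutex_can_spin_on_owner() (osq_is_locked() is the existing OSQ helper;
this is only a sketch of the suggestion, not a posted patch):

	if (need_resched())
		return 0;

	/*
	 * Sketch: only give up spinning when there are queued waiters
	 * (count == -1) AND somebody is already spinning on the OSQ, so a
	 * single optimistic spinner is still allowed while waiters exist.
	 */
	if (atomic_read(&lock->count) == -1 && osq_is_locked(&lock->osq))
		return 0;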

I did some testing, which exposed it to the 0day test robot, which
did note some performance differences.  I was hoping that it would
clear up some instability from other patches, but no such luck.  ;-)

							Thanx, Paul

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH RFC] locking/mutexes: don't spin on owner when wait list is not NULL.
  2016-01-21  9:29 [PATCH RFC] " Ding Tianhong
  2016-01-21 21:23 ` Tim Chen
@ 2016-01-21 23:02 ` Waiman Long
  2016-01-22  6:09   ` Davidlohr Bueso
  2016-01-22  8:54   ` Peter Zijlstra
  1 sibling, 2 replies; 24+ messages in thread
From: Waiman Long @ 2016-01-21 23:02 UTC (permalink / raw)
  To: Ding Tianhong
  Cc: Peter Zijlstra, Ingo Molnar, linux-kernel, Davidlohr Bueso,
	Linus Torvalds, Paul E. McKenney, Thomas Gleixner, Will Deacon,
	Jason Low, Tim Chen, Waiman Long

[-- Attachment #1: Type: text/plain, Size: 4432 bytes --]

On 01/21/2016 04:29 AM, Ding Tianhong wrote:
> I built a script that creates several processes doing an ioctl call in a loop;
> the ioctl ends up calling a kernel function shaped like this:
> xx_ioctl {
> ...
> rtnl_lock();
> function();
> rtnl_unlock();
> ...
> }
> The function may sleep for several ms but never hangs. At the same time,
> another user service may call ifconfig to change the state of the
> ethernet device, and after several hours the hung task watchdog reports this problem:
>
> ========================================================================
> 149738.039038] INFO: task ifconfig:11890 blocked for more than 120 seconds.
> [149738.040597] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [149738.042280] ifconfig D ffff88061ec13680 0 11890 11573 0x00000080
> [149738.042284] ffff88052449bd40 0000000000000082 ffff88053a33f300 ffff88052449bfd8
> [149738.042286] ffff88052449bfd8 ffff88052449bfd8 ffff88053a33f300 ffffffff819e6240
> [149738.042288] ffffffff819e6244 ffff88053a33f300 00000000ffffffff ffffffff819e6248
> [149738.042290] Call Trace:
> [149738.042300] [<ffffffff8160d219>] schedule_preempt_disabled+0x29/0x70
> [149738.042303] [<ffffffff8160af65>] __mutex_lock_slowpath+0xc5/0x1c0
> [149738.042305] [<ffffffff8160a3cf>] mutex_lock+0x1f/0x2f
> [149738.042309] [<ffffffff8150d945>] rtnl_lock+0x15/0x20
> [149738.042311] [<ffffffff81514e3a>] dev_ioctl+0xda/0x590
> [149738.042314] [<ffffffff816121cc>] ? __do_page_fault+0x21c/0x560
> [149738.042318] [<ffffffff814e42c5>] sock_do_ioctl+0x45/0x50
> [149738.042320] [<ffffffff814e49d0>] sock_ioctl+0x1f0/0x2c0
> [149738.042324] [<ffffffff811dc9b5>] do_vfs_ioctl+0x2e5/0x4c0
> [149738.042327] [<ffffffff811e6a00>] ? fget_light+0xa0/0xd0
>
> ================================ cut here ================================
>
> I got the vmcore and found that ifconfig had already been on the wait_list of the
> rtnl_lock for 120 seconds, while my processes could still take and release the
> rtnl_lock several times per second. That means my processes kept jumping the
> queue and ifconfig could never get the rtnl lock. Checking the mutex lock slow
> path, I found that a task may spin on the owner regardless of whether the wait
> list is empty, so a task on the wait list can be cut in line forever. Add a test
> for pending waiters to mutex_can_spin_on_owner() to avoid this problem.
>
> Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
> Cc: Ingo Molnar <mingo@redhat.com>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Davidlohr Bueso <dave@stgolabs.net>
> Cc: Linus Torvalds <torvalds@linux-foundation.org>
> Cc: Paul E. McKenney <paulmck@us.ibm.com>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Will Deacon <will.deacon@arm.com>
> Cc: Jason Low <jason.low2@hp.com>
> Cc: Tim Chen <tim.c.chen@linux.intel.com>
> Cc: Waiman Long <Waiman.Long@hp.com>
> ---
>   kernel/locking/mutex.c | 11 ++++++-----
>   1 file changed, 6 insertions(+), 5 deletions(-)
>
> diff --git a/kernel/locking/mutex.c b/kernel/locking/mutex.c
> index 0551c21..596b341 100644
> --- a/kernel/locking/mutex.c
> +++ b/kernel/locking/mutex.c
> @@ -256,7 +256,7 @@ static inline int mutex_can_spin_on_owner(struct mutex *lock)
>   	struct task_struct *owner;
>   	int retval = 1;
>
> -	if (need_resched())
> +	if (need_resched() || atomic_read(&lock->count) == -1)
>   		return 0;
>
>   	rcu_read_lock();
> @@ -283,10 +283,11 @@ static inline bool mutex_try_to_acquire(struct mutex *lock)
>   /*
>    * Optimistic spinning.
>    *
> - * We try to spin for acquisition when we find that the lock owner
> - * is currently running on a (different) CPU and while we don't
> - * need to reschedule. The rationale is that if the lock owner is
> - * running, it is likely to release the lock soon.
> + * We try to spin for acquisition when we find that there are no
> + * pending waiters and the lock owner is currently running on a
> + * (different) CPU and while we don't need to reschedule. The
> + * rationale is that if the lock owner is running, it is likely
> + * to release the lock soon.
>    *
>    * Since this needs the lock owner, and this mutex implementation
>    * doesn't track the owner atomically in the lock field, we need to

This patch will largely defeat the performance benefit of optimistic 
spinning. I have an alternative solution to this live-lock problem. 
Would you mind trying out the attached patch to see if it can fix your 
problem?

Cheers,
Longman


[-- Attachment #2: 0001-locking-mutex-Enable-optimistic-spinning-of-woken-ta.patch --]
[-- Type: text/plain, Size: 5460 bytes --]

From 1bbb5a4434d395f48163abc5435c5c720a15d327 Mon Sep 17 00:00:00 2001
From: Waiman Long <Waiman.Long@hpe.com>
Date: Thu, 21 Jan 2016 17:53:14 -0500
Subject: [PATCH] locking/mutex: Enable optimistic spinning of woken task in wait list

Ding Tianhong reported a live-lock situation where a constant stream
of incoming optimistic spinners blocked a task in the wait list from
getting the mutex.

This patch attempts to fix this live-lock condition by enabling a woken
task in the wait list to enter the optimistic spinning loop itself, with
precedence over the ones in the OSQ. This should prevent the live-lock
condition from happening.

Signed-off-by: Waiman Long <Waiman.Long@hpe.com>
---
 include/linux/mutex.h  |    2 +
 kernel/locking/mutex.c |   95 +++++++++++++++++++++++++++++++++++++++++++++++-
 2 files changed, 95 insertions(+), 2 deletions(-)

diff --git a/include/linux/mutex.h b/include/linux/mutex.h
index 2cb7531..2c55ecd 100644
--- a/include/linux/mutex.h
+++ b/include/linux/mutex.h
@@ -57,6 +57,8 @@ struct mutex {
 #endif
 #ifdef CONFIG_MUTEX_SPIN_ON_OWNER
 	struct optimistic_spin_queue osq; /* Spinner MCS lock */
+	/* Set if wait list head actively spinning */
+	int			wlh_spinning;
 #endif
 #ifdef CONFIG_DEBUG_MUTEXES
 	void			*magic;
diff --git a/kernel/locking/mutex.c b/kernel/locking/mutex.c
index 0551c21..8b27b03 100644
--- a/kernel/locking/mutex.c
+++ b/kernel/locking/mutex.c
@@ -55,6 +55,7 @@ __mutex_init(struct mutex *lock, const char *name, struct lock_class_key *key)
 	mutex_clear_owner(lock);
 #ifdef CONFIG_MUTEX_SPIN_ON_OWNER
 	osq_lock_init(&lock->osq);
+	lock->wlh_spinning = false;
 #endif
 
 	debug_mutex_init(lock, name, key);
@@ -346,8 +347,12 @@ static bool mutex_optimistic_spin(struct mutex *lock,
 		if (owner && !mutex_spin_on_owner(lock, owner))
 			break;
 
-		/* Try to acquire the mutex if it is unlocked. */
-		if (mutex_try_to_acquire(lock)) {
+		/*
+		 * Try to acquire the mutex if it is unlocked and the wait
+		 * list head isn't spinning on the lock.
+		 */
+		if (!READ_ONCE(lock->wlh_spinning) &&
+		    mutex_try_to_acquire(lock)) {
 			lock_acquired(&lock->dep_map, ip);
 
 			if (use_ww_ctx) {
@@ -398,12 +403,91 @@ done:
 
 	return false;
 }
+
+/*
+ * Wait list head optimistic spinning
+ *
+ * The wait list head, when woken up, will try to spin on the lock if the
+ * lock owner is active. It will also set the wlh_spinning flag to give
+ * itself a higher chance of getting the lock than the other optimistically
+ * spinning lockers in the OSQ.
+ */
+static bool mutex_wlh_opt_spin(struct mutex *lock,
+			       struct ww_acquire_ctx *ww_ctx, const bool use_ww_ctx)
+{
+	struct task_struct *owner, *task = current;
+	int gotlock = false;
+
+	WRITE_ONCE(lock->wlh_spinning, true);
+	while (true) {
+		if (use_ww_ctx && ww_ctx->acquired > 0) {
+			struct ww_mutex *ww;
+
+			ww = container_of(lock, struct ww_mutex, base);
+			/*
+			 * If ww->ctx is set the contents are undefined, only
+			 * by acquiring wait_lock there is a guarantee that
+			 * they are not invalid when reading.
+			 *
+			 * As such, when deadlock detection needs to be
+			 * performed the optimistic spinning cannot be done.
+			 */
+			if (READ_ONCE(ww->ctx))
+				break;
+		}
+
+		/*
+		 * If there's an owner, wait for it to either
+		 * release the lock or go to sleep.
+		 */
+		owner = READ_ONCE(lock->owner);
+		if (owner && !mutex_spin_on_owner(lock, owner))
+			break;
+
+		/*
+		 * Try to acquire the mutex if it is unlocked. The mutex
+		 * value is set to -1 which will be changed to 0 later on
+		 * if the wait list becomes empty.
+		 */
+		if (!mutex_is_locked(lock) &&
+		   (atomic_cmpxchg_acquire(&lock->count, 1, -1) == 1)) {
+			gotlock = true;
+			break;
+		}
+
+		/*
+		 * When there's no owner, we might have preempted between the
+		 * owner acquiring the lock and setting the owner field. If
+		 * we're an RT task that will live-lock because we won't let
+		 * the owner complete.
+		 */
+		if (!owner && (need_resched() || rt_task(task)))
+			break;
+
+		/*
+		 * The cpu_relax() call is a compiler barrier which forces
+		 * everything in this loop to be re-loaded. We don't need
+		 * memory barriers as we'll eventually observe the right
+		 * values at the cost of a few extra spins.
+		 */
+		cpu_relax_lowlatency();
+
+	}
+	WRITE_ONCE(lock->wlh_spinning, false);
+	return gotlock;
+}
 #else
 static bool mutex_optimistic_spin(struct mutex *lock,
 				  struct ww_acquire_ctx *ww_ctx, const bool use_ww_ctx)
 {
 	return false;
 }
+
+static bool mutex_wlh_opt_spin(struct mutex *lock,
+			       struct ww_acquire_ctx *ww_ctx, const bool use_ww_ctx)
+{
+	return false;
+}
 #endif
 
 __visible __used noinline
@@ -543,6 +627,8 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
 	lock_contended(&lock->dep_map, ip);
 
 	for (;;) {
+		int gotlock;
+
 		/*
 		 * Lets try to take the lock again - this is needed even if
 		 * we get here for the first time (shortly after failing to
@@ -577,7 +663,12 @@ __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass,
 		/* didn't get the lock, go to sleep: */
 		spin_unlock_mutex(&lock->wait_lock, flags);
 		schedule_preempt_disabled();
+
+		/* optimistically spinning on the mutex without the wait lock */
+		gotlock = mutex_wlh_opt_spin(lock, ww_ctx, use_ww_ctx);
 		spin_lock_mutex(&lock->wait_lock, flags);
+		if (gotlock)
+			break;
 	}
 	__set_task_state(task, TASK_RUNNING);
 
-- 
1.7.1


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* Re: [PATCH RFC] locking/mutexes: don't spin on owner when wait list is not NULL.
  2016-01-21  9:29 [PATCH RFC] " Ding Tianhong
@ 2016-01-21 21:23 ` Tim Chen
  2016-01-22  2:41   ` Paul E. McKenney
  2016-01-21 23:02 ` Waiman Long
  1 sibling, 1 reply; 24+ messages in thread
From: Tim Chen @ 2016-01-21 21:23 UTC (permalink / raw)
  To: Ding Tianhong
  Cc: Peter Zijlstra, Ingo Molnar, linux-kernel, Davidlohr Bueso,
	Linus Torvalds, Paul E. McKenney, Thomas Gleixner, Will Deacon,
	Jason Low, Waiman Long

On Thu, 2016-01-21 at 17:29 +0800, Ding Tianhong wrote:

> 
> diff --git a/kernel/locking/mutex.c b/kernel/locking/mutex.c
> index 0551c21..596b341 100644
> --- a/kernel/locking/mutex.c
> +++ b/kernel/locking/mutex.c
> @@ -256,7 +256,7 @@ static inline int mutex_can_spin_on_owner(struct mutex *lock)
>  	struct task_struct *owner;
>  	int retval = 1;
>  
> -	if (need_resched())
> +	if (need_resched() || atomic_read(&lock->count) == -1)
>  		return 0;
>  

One concern I have is that this change will eliminate any optimistic
spinning as long as there is a waiter.  Is there a middle ground where we
can allow only one spinner if there are waiters?

In other words, we could allow spinning when
atomic_read(&lock->count) == -1 but no one is queued on the
osq lock (i.e. no other process is doing optimistic spinning).

This would allow a bit of spinning without starving out the waiters.
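
A rough, untested sketch of that idea (this assumes osq_is_locked() from
include/linux/osq_lock.h is usable in mutex_can_spin_on_owner(), and that
lock->count == -1 still means "locked, with waiters"):

/*
 * Sketch only: a single task may still spin even when the mutex has
 * waiters, but we bail out if another spinner already holds the OSQ.
 */
static inline int mutex_can_spin_on_owner(struct mutex *lock)
{
	struct task_struct *owner;
	int retval = 1;

	if (need_resched())
		return 0;

	/*
	 * count == -1: locked with waiters.  If the OSQ is also taken,
	 * another task is already spinning, so queue up behind the
	 * existing waiters instead of spinning as well.
	 */
	if (atomic_read(&lock->count) == -1 && osq_is_locked(&lock->osq))
		return 0;

	rcu_read_lock();
	owner = READ_ONCE(lock->owner);
	if (owner)
		retval = owner->on_cpu;
	rcu_read_unlock();

	/*
	 * If lock->owner is not set, the owner may have just acquired
	 * the mutex and not yet set the owner field, or the mutex may
	 * have been released.
	 */
	return retval;
}

That way the first contender can still spin and avoid the sleep/wakeup
cost, while later arrivals fall through to the slow path and line up
behind the existing waiters.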

Tim

^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCH RFC] locking/mutexes: don't spin on owner when wait list is not NULL.
@ 2016-01-21  9:29 Ding Tianhong
  2016-01-21 21:23 ` Tim Chen
  2016-01-21 23:02 ` Waiman Long
  0 siblings, 2 replies; 24+ messages in thread
From: Ding Tianhong @ 2016-01-21  9:29 UTC (permalink / raw)
  To: Peter Zijlstra, Ingo Molnar, linux-kernel
  Cc: Davidlohr Bueso, Linus Torvalds, Paul E. McKenney,
	Thomas Gleixner, Will Deacon, Jason Low, Tim Chen, Waiman Long

I built a script that creates several processes calling ioctl in a loop,
and each ioctl calls into a kernel function that looks roughly like:
xx_ioctl {
...
rtnl_lock();
function();
rtnl_unlock();
...
}
The function may sleep for several ms but will not halt. At the same time
another user service may call ifconfig to change the state of the ethernet
device, and after several hours the hung task watchdog reported this problem:

========================================================================
[149738.039038] INFO: task ifconfig:11890 blocked for more than 120 seconds.
[149738.040597] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[149738.042280] ifconfig D ffff88061ec13680 0 11890 11573 0x00000080
[149738.042284] ffff88052449bd40 0000000000000082 ffff88053a33f300 ffff88052449bfd8
[149738.042286] ffff88052449bfd8 ffff88052449bfd8 ffff88053a33f300 ffffffff819e6240
[149738.042288] ffffffff819e6244 ffff88053a33f300 00000000ffffffff ffffffff819e6248
[149738.042290] Call Trace:
[149738.042300] [<ffffffff8160d219>] schedule_preempt_disabled+0x29/0x70
[149738.042303] [<ffffffff8160af65>] __mutex_lock_slowpath+0xc5/0x1c0
[149738.042305] [<ffffffff8160a3cf>] mutex_lock+0x1f/0x2f
[149738.042309] [<ffffffff8150d945>] rtnl_lock+0x15/0x20
[149738.042311] [<ffffffff81514e3a>] dev_ioctl+0xda/0x590
[149738.042314] [<ffffffff816121cc>] ? __do_page_fault+0x21c/0x560
[149738.042318] [<ffffffff814e42c5>] sock_do_ioctl+0x45/0x50
[149738.042320] [<ffffffff814e49d0>] sock_ioctl+0x1f0/0x2c0
[149738.042324] [<ffffffff811dc9b5>] do_vfs_ioctl+0x2e5/0x4c0
[149738.042327] [<ffffffff811e6a00>] ? fget_light+0xa0/0xd0

================================ cut here ================================

I got the vmcore and found that ifconfig had been sitting in the wait_list of
the rtnl_lock for 120 seconds, while my processes could acquire and release the
rtnl_lock several times per second. In other words, my processes kept jumping
the queue and ifconfig never got the rtnl_lock. Looking at the mutex slow path,
I found that a task may spin on the owner regardless of whether the wait list
is empty, so tasks already on the wait list can be cut in line indefinitely.
Add a check for pending waiters to mutex_can_spin_on_owner() to avoid this.

Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@us.ibm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Jason Low <jason.low2@hp.com>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Cc: Waiman Long <Waiman.Long@hp.com>
---
 kernel/locking/mutex.c | 11 ++++++-----
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/kernel/locking/mutex.c b/kernel/locking/mutex.c
index 0551c21..596b341 100644
--- a/kernel/locking/mutex.c
+++ b/kernel/locking/mutex.c
@@ -256,7 +256,7 @@ static inline int mutex_can_spin_on_owner(struct mutex *lock)
 	struct task_struct *owner;
 	int retval = 1;
 
-	if (need_resched())
+	if (need_resched() || atomic_read(&lock->count) == -1)
 		return 0;
 
 	rcu_read_lock();
@@ -283,10 +283,11 @@ static inline bool mutex_try_to_acquire(struct mutex *lock)
 /*
  * Optimistic spinning.
  *
- * We try to spin for acquisition when we find that the lock owner
- * is currently running on a (different) CPU and while we don't
- * need to reschedule. The rationale is that if the lock owner is
- * running, it is likely to release the lock soon.
+ * We try to spin for acquisition when we find that there are no
+ * pending waiters and the lock owner is currently running on a
+ * (different) CPU and while we don't need to reschedule. The
+ * rationale is that if the lock owner is running, it is likely
+ * to release the lock soon.
  *
  * Since this needs the lock owner, and this mutex implementation
  * doesn't track the owner atomically in the lock field, we need to
-- 
2.5.0

^ permalink raw reply related	[flat|nested] 24+ messages in thread

end of thread, other threads:[~2016-02-01  3:35 UTC | newest]

Thread overview: 24+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-01-21  6:53 [PATCH RFC ] locking/mutexes: don't spin on owner when wait list is not NULL Ding Tianhong
2016-01-21  7:29 ` Ingo Molnar
2016-01-21  9:04   ` Ding Tianhong
2016-01-21  9:29 [PATCH RFC] " Ding Tianhong
2016-01-21 21:23 ` Tim Chen
2016-01-22  2:41   ` Paul E. McKenney
2016-01-22  2:48     ` Davidlohr Bueso
2016-01-22  3:13       ` Paul E. McKenney
2016-01-21 23:02 ` Waiman Long
2016-01-22  6:09   ` Davidlohr Bueso
2016-01-22 13:38     ` Waiman Long
2016-01-22 16:46       ` Davidlohr Bueso
2016-01-22  8:54   ` Peter Zijlstra
2016-01-22 10:20     ` Jason Low
2016-01-22 10:53       ` Peter Zijlstra
2016-01-22 10:56         ` Peter Zijlstra
2016-01-22 11:06           ` Peter Zijlstra
2016-01-22 13:59             ` Waiman Long
2016-01-24  8:03               ` Ding Tianhong
2016-01-29  9:53                 ` Peter Zijlstra
2016-01-30  1:18                   ` Ding Tianhong
2016-02-01  3:29                     ` huang ying
2016-02-01  3:35                       ` Huang, Ying
2016-01-22 13:41     ` Waiman Long

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).