linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: qiaozhou <qiaozhou@asrmicro.com>
To: Vikram Mulukutla <markivx@codeaurora.org>,
	Will Deacon <will.deacon@arm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>,
	John Stultz <john.stultz@linaro.org>, <sboyd@codeaurora.org>,
	LKML <linux-kernel@vger.kernel.org>,
	Wang Wilbur <wilburwang@asrmicro.com>,
	Marc Zyngier <marc.zyngier@arm.com>,
	Peter Zijlstra <peterz@infradead.org>,
	<linux-kernel-owner@vger.kernel.org>, <sudeep.holla@arm.com>,
	Zhou Qiao <qiaozhou@asrmicro.com>
Subject: Re: [Question]: try to fix contention between expire_timers and try_to_del_timer_sync
Date: Tue, 1 Aug 2017 15:37:01 +0800	[thread overview]
Message-ID: <c2450453-0a42-0c68-db9b-3d097dc62c34@asrmicro.com> (raw)
In-Reply-To: <1c951f01-c450-d21b-13e9-6a32eb509d94@asrmicro.com>



On 2017年07月31日 19:20, qiaozhou wrote:
> 
> 
> On 2017年07月29日 03:09, Vikram Mulukutla wrote:
>> On 2017-07-28 02:28, Will Deacon wrote:
>>> On Thu, Jul 27, 2017 at 06:10:34PM -0700, Vikram Mulukutla wrote:
>>
>> <snip>
>>>
>>> Does bodging cpu_relax to back-off to wfe after a while help? The event
>>> stream will wake it up if nothing else does. Nasty patch below, but 
>>> I'd be
>>> interested to know whether or not it helps.
>>>
>>> Will
>>>
> The patch also helps a lot on my platform. (Though it does cause 
> deadlock(related with udelay) in uart driver in early boot, and not sure 
> it's uart driver issue. Just workaround it firstly)
> 
> Platform: 4 a53(832MHz) + 4 a73(1.8GHz)
> Test condition #1:
>      a. core2: a53, while loop (spinlock, spin_unlock)
>      b. core7: a73, while loop (spinlock, spin_unlock, cpu_relax)
> 
> Test result: recording the lock acquire times(a53, a73), max lock 
> acquired time(a53), in 20 seconds
> 
> Without cpu_relax bodging patch:
> ===============================================================
> |a53 locked times | a73 locked times | a53 max locked time(us)|
> ==================|==================|========================|
>                 182|          38371616|               1,951,954|
>                 202|          38427652|               2,261,319|
>                 210|          38477427|              15,309,597|
>                 207|          38494479|               6,656,453|
>                 220|          38422283|               2,064,155|
> ===============================================================
> 
> With cpu_relax bodging patch:
> ===============================================================
> |a53 locked times | a73 locked times | a53 max locked time(us)|
> ==================|==================|========================|
>             1849898|          37799379|                 131,255|
>             1574172|          38557653|                  38,410|
>             1924777|          37831725|                  42,999|
>             1477665|          38723741|                  52,087|
>             1865793|          38007741|                 783,965|
> ===============================================================
> 
> Also add some workload to the whole system to check the result.
> Test condition #2: based on #1
>      c. core6: a73, 1.8GHz, run "while(1);" loop
> 
> With cpu_relax bodging patch:
> ===============================================================
> |a53 locked times | a73 locked times | a53 max locked time(us)|
> ==================|==================|========================|
>                  20|          42563981|               2,317,070|
>                  10|          42652793|               4,210,944|
>                   9|          42651075|               5,691,834|
>                  28|          42652591|               4,539,555|
>                  10|          42652801|               5,850,639|
> ===============================================================
> 
> Also hotplug out other cores.
> Test condition #3: based on #1
>      d. hotplug out core1/3/4/5/6, keep core0 for scheduling
> 
> With cpu_relax bodging patch:
> ===============================================================
> |a53 locked times | a73 locked times | a53 max locked time(us)|
> ==================|==================|========================|
>                 447|          42652450|                 309,549|
>                 515|          42650382|                 337,661|
>                 415|          42646669|                 628,525|
>                 431|          42651137|                 365,862|
>                 464|          42648916|                 379,934|
> ===============================================================
> 
> The last two tests are the actual cases where the hard-lockup is 
> triggered on my platform. So I gathered some data, and it shows that a53 
> needs much longer time to acquire the lock.
> 
> All tests are done in android, black screen with USB cable attached. The 
> data is not so pretty as Vikram's. It might be related with cpu 
> topology, core numbers, CCI frequency etc. (I'll do another test with 
> both a53 and a73 running at 1.2GHz, to check whether it's the core 
> frequency which leads to the major difference.)
> 
Test the contention with the same frequency between a53 and a73 cores.
Platform: 4 a53(1248MHz) + 4 a73(1248MHz)
Test condition #4:
      a. core2: a53, while loop (spinlock, spin_unlock)
      b. core7: a73, while loop (spinlock, spin_unlock)
===============================================================
|a53 locked times | a73 locked times | a53 max locked time(us)|
==================|==================|========================|
           12945632|          13021576|                      14|
           12934181|          13059230|                      16|
           12987186|          13059016|                      49|
           12958583|          13038884|                      24|
           14637546|          14672522|                      14|
===============================================================

The locked times are almost the same, and the max time of acquiring the 
lock on a53 also drops. On my platform, core frequency seems to be the 
key factor.

>> This does seem to help. Here's some data after 5 runs with and without 
>> the patch.
>>
>> time = max time taken to acquire lock
>> counter = number of times lock acquired
>>
>> cpu0: little cpu @ 300MHz, cpu4: Big cpu @2.0GHz
>> Without the cpu_relax() bodging patch:
>> =====================================================
>> cpu0 time | cpu0 counter | cpu4 time | cpu4 counter |
>> ==========|==============|===========|==============|
>>    117893us|       2349144|        2us|       6748236|
>>    571260us|       2125651|        2us|       7643264|
>>     19780us|       2392770|        2us|       5987203|
>>     19948us|       2395413|        2us|       5977286|
>>     19822us|       2429619|        2us|       5768252|
>>     19888us|       2444940|        2us|       5675657|
>> =====================================================
>>
>> cpu0: little cpu @ 300MHz, cpu4: Big cpu @2.0GHz
>> With the cpu_relax() bodging patch:
>> =====================================================
>> cpu0 time | cpu0 counter | cpu4 time | cpu4 counter |
>> ==========|==============|===========|==============|
>>         3us|       2737438|        2us|       6907147|
>>         2us|       2742478|        2us|       6902241|
>>       132us|       2745636|        2us|       6876485|
>>         3us|       2744554|        2us|       6898048|
>>         3us|       2741391|        2us|       6882901|
>> ==================================================== >
>> The patch also seems to have helped with fairness in general
>> allowing more work to be done if the CPU frequencies are more
>> closely matched (I don't know if this translates to real world
>> performance - probably not). The counter values are higher
>> with the patch.
>>
>> time = max time taken to acquire lock
>> counter = number of times lock acquired
>>
>> cpu0: little cpu @ 1.5GHz, cpu4: Big cpu @2.0GHz
>> Without the cpu_relax() bodging patch:
>> =====================================================
>> cpu0 time | cpu0 counter | cpu4 time | cpu4 counter |
>> ==========|==============|===========|==============|
>>         2us|       5240654|        1us|       5339009|
>>         2us|       5287797|       97us|       5327073|
>>         2us|       5237634|        1us|       5334694|
>>         2us|       5236676|       88us|       5333582|
>>        84us|       5285880|       84us|       5329489|
>> =====================================================
>>
>> cpu0: little cpu @ 1.5GHz, cpu4: Big cpu @2.0GHz
>> With the cpu_relax() bodging patch:
>> =====================================================
>> cpu0 time | cpu0 counter | cpu4 time | cpu4 counter |
>> ==========|==============|===========|==============|
>>       140us|      10449121|        1us|      11154596|
>>         1us|      10757081|        1us|      11479395|
>>        83us|      10237109|        1us|      10902557|
>>         2us|       9871101|        1us|      10514313|
>>         2us|       9758763|        1us|      10391849|
>> =====================================================
>>Also apply Vikram's patch and have a test.

cpu2: a53, 832MHz, cpu7: a73, 1.75Hz
Without cpu_relax bodging patch
=====================================================
cpu2 time | cpu2 counter | cpu7 time | cpu7 counter |
==========|==============|===========|==============|
      16505|          5243|          2|      12487322|
      16494|          5619|          1|      12013291|
      16498|          5276|          2|      11706824|
      16494|          7123|          1|      12532355|
      16470|          7208|          2|      11784617|
=====================================================

cpu2: a53, 832MHz, cpu7: a73, 1.75Hz
With cpu_relax bodging patch:
=====================================================
cpu2 time | cpu2 counter | cpu7 time | cpu7 counter |
==========|==============|===========|==============|
       3991|        140714|          1|      11430528|
       4018|        144371|          1|      11430528|
       4034|        143250|          1|      11427011|
       4330|        147345|          1|      11423583|
       4752|        138273|          1|      11433241|
=====================================================

It has some improvements, but not so good as Vikram's data. The big core 
still has much more chance to acquire lock.
>>
>> Thanks,
>> Vikram
>>

  reply	other threads:[~2017-08-01  7:37 UTC|newest]

Thread overview: 23+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <3d2459c7-defd-a47e-6cea-007c10cecaac@asrmicro.com>
2017-07-26 14:16 ` [Question]: try to fix contention between expire_timers and try_to_del_timer_sync Thomas Gleixner
2017-07-27  1:29   ` qiaozhou
2017-07-27 15:14     ` Will Deacon
2017-07-27 15:19       ` Thomas Gleixner
2017-07-28  1:10     ` Vikram Mulukutla
2017-07-28  9:28       ` Peter Zijlstra
2017-07-28 19:11         ` Vikram Mulukutla
2017-07-28  9:28       ` Will Deacon
2017-07-28 19:09         ` Vikram Mulukutla
2017-07-31 11:20           ` qiaozhou
2017-08-01  7:37             ` qiaozhou [this message]
2017-08-03 23:32               ` Vikram Mulukutla
2017-08-04  3:15                 ` qiaozhou
2017-07-31 13:13           ` Will Deacon
2017-08-03 23:25             ` Vikram Mulukutla
2017-08-15 18:40               ` Will Deacon
2017-08-25 19:48                 ` Vikram Mulukutla
2017-08-25 20:25                   ` Vikram Mulukutla
2017-08-28 23:12                   ` Vikram Mulukutla
2017-09-06 11:19                     ` qiaozhou
2017-09-25 11:02                     ` qiaozhou
2017-10-02 14:14                       ` Will Deacon
2017-10-11  8:33                         ` qiaozhou

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=c2450453-0a42-0c68-db9b-3d097dc62c34@asrmicro.com \
    --to=qiaozhou@asrmicro.com \
    --cc=john.stultz@linaro.org \
    --cc=linux-kernel-owner@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=marc.zyngier@arm.com \
    --cc=markivx@codeaurora.org \
    --cc=peterz@infradead.org \
    --cc=sboyd@codeaurora.org \
    --cc=sudeep.holla@arm.com \
    --cc=tglx@linutronix.de \
    --cc=wilburwang@asrmicro.com \
    --cc=will.deacon@arm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).