Subject: Re: [Question]: try to fix contention between expire_timers and try_to_del_timer_sync
From: qiaozhou
To: Vikram Mulukutla, Will Deacon
CC: Thomas Gleixner, John Stultz, LKML, Wang Wilbur, Marc Zyngier, Peter Zijlstra
Date: Wed, 6 Sep 2017 19:19:53 +0800
Message-ID: <104312dd-3ba0-01af-5d61-db3d7dd29991@asrmicro.com>

On 29 Aug 2017 07:12, Vikram Mulukutla wrote:
>
> Well here's something interesting. I tried a different platform and found that
> the workaround doesn't help much at all, similar to Qiao's observation on his b.L
> chipset. Something to do with the WFE implementation or event-stream?

Hi Vikram,

I did some experiments tuning the DDR controller (and DDR RAM) frequency and the CCI frequency.
The results are below.

cpu2: A53, 832MHz
cpu7: A73, 1.75GHz
cci: 832M
dclk: DDR controller clock (data rate = 4 * dclk)

With the cpu_relax bodging patch:
==============================================================
 dclk  | cpu2 time | cpu2 counter | cpu7 time | cpu7 counter |
=======|===========|==============|===========|==============|
   78M |      8906 |        55438 |        13 |      4015789 |
  156M |      5964 |        75109 |         4 |      8229050 |
  500M |       102 |      5984783 |         1 |      6400885 |
  600M |        16 |      6233601 |         1 |      6504718 |
==============================================================

I suspect that the global exclusive monitor in the DDR controller may play an important part. With a high enough DDR frequency, it seems to handle the exclusive requests efficiently and fairly.

If the CCI frequency is reduced, the little core's result drops a lot again:

cpu2: A53, 832MHz
cpu7: A73, 1.75GHz
cci: 416M
dclk: DDR controller clock (data rate = 4 * dclk)

With the cpu_relax bodging patch:
==============================================================
 dclk  | cpu2 time | cpu2 counter | cpu7 time | cpu7 counter |
=======|===========|==============|===========|==============|
   78M |      8837 |        10596 |        11 |      3873635 |
  156M |     17597 |        10211 |         4 |      6513493 |
  500M |     10888 |        13214 |         2 |      8916396 |
  600M |      8934 |        15842 |         2 |      9394124 |
==============================================================

I guess the result on your different platform might be related to the DDR frequency too.

Best Regards
Qiao
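One way to put numbers on the "efficiently and fairly" observation is to compare the two cores' counters row by row. The sketch below is illustrative only: it copies the counter values from the two tables, applies the stated relation data rate = 4 * dclk, and assumes that the cpu2 and cpu7 counters from the same run are directly comparable. The helper names (`ddr_data_rate_mts`, `big_to_little_ratio`, `summarize`) are hypothetical.

```python
# Counter values copied from the two tables (cpu2 = A53, cpu7 = A73).
# Keys are dclk in MHz; values are (cpu2 counter, cpu7 counter).
# Assumption: counters from the same run are directly comparable.
CCI_832M = {
    78: (55438, 4015789),
    156: (75109, 8229050),
    500: (5984783, 6400885),
    600: (6233601, 6504718),
}
CCI_416M = {
    78: (10596, 3873635),
    156: (10211, 6513493),
    500: (13214, 8916396),
    600: (15842, 9394124),
}


def ddr_data_rate_mts(dclk_mhz):
    """DDR data rate in MT/s, using the stated relation data rate = 4 * dclk."""
    return 4 * dclk_mhz


def big_to_little_ratio(cpu2_count, cpu7_count):
    """cpu7 (big core) counter divided by cpu2 (little core) counter.

    A value near 1.0 means both cores made similar progress; a large
    value means the little core was starved.
    """
    return cpu7_count / cpu2_count


def summarize(table):
    """Return (dclk MHz, data rate MT/s, cpu7/cpu2 ratio rounded to 0.1) per row."""
    rows = []
    for dclk, (cpu2, cpu7) in sorted(table.items()):
        ratio = round(big_to_little_ratio(cpu2, cpu7), 1)
        rows.append((dclk, ddr_data_rate_mts(dclk), ratio))
    return rows


if __name__ == "__main__":
    for name, table in (("cci 832M", CCI_832M), ("cci 416M", CCI_416M)):
        print(name)
        for dclk, rate, ratio in summarize(table):
            print(f"  dclk {dclk:3d}M  data rate {rate:4d} MT/s  cpu7/cpu2 = {ratio}")
```

Run as a script, it prints one line per dclk setting. With cci at 832M the cpu7/cpu2 ratio falls from about 72 (78M) and 110 (156M) to about 1.1 and 1.0 at 500M and 600M, whereas with cci at 416M it stays between roughly 365 and 675 at every dclk, which is consistent with the observation that lowering the CCI frequency brings the imbalance back.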