From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <linux-kernel-owner@vger.kernel.org>
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S1752663AbdG1TLi (ORCPT <rfc822;w@1wt.eu>);
        Fri, 28 Jul 2017 15:11:38 -0400
Received: from smtp.codeaurora.org ([198.145.29.96]:51702 "EHLO
        smtp.codeaurora.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
        with ESMTP id S1752204AbdG1TLg (ORCPT
        <rfc822;linux-kernel@vger.kernel.org>);
        Fri, 28 Jul 2017 15:11:36 -0400
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII;
 format=flowed
Content-Transfer-Encoding: 7bit
Date: Fri, 28 Jul 2017 12:11:35 -0700
From: Vikram Mulukutla <markivx@codeaurora.org>
To: Peter Zijlstra <peterz@infradead.org>
Cc: qiaozhou <qiaozhou@asrmicro.com>, Thomas Gleixner <tglx@linutronix.de>,
        John Stultz <john.stultz@linaro.org>, sboyd@codeaurora.org,
        LKML <linux-kernel@vger.kernel.org>,
        Wang Wilbur <wilburwang@asrmicro.com>,
        Marc Zyngier <marc.zyngier@arm.com>, Will Deacon <will.deacon@arm.com>,
        linux-kernel-owner@vger.kernel.org, sudeep.holla@arm.com
Subject: Re: [Question]: try to fix contention between expire_timers and
 try_to_del_timer_sync
In-Reply-To: <20170728092811.33bhkylg7kk6szxh@hirez.programming.kicks-ass.net>
References: <3d2459c7-defd-a47e-6cea-007c10cecaac@asrmicro.com>
 <alpine.DEB.2.20.1707261548560.2186@nanos>
 <dcb18367-c747-96a8-9927-d8ba6954c496@asrmicro.com>
 <e1cc02c5e7dfd4d6bec937b6dc97bfc7@codeaurora.org>
 <20170728092811.33bhkylg7kk6szxh@hirez.programming.kicks-ass.net>
Message-ID: <22831be0d0e558768007ddc7a1e90fdd@codeaurora.org>
User-Agent: Roundcube Webmail/1.2.5
Sender: linux-kernel-owner@vger.kernel.org
List-ID: <linux-kernel.vger.kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org

On 2017-07-28 02:28, Peter Zijlstra wrote:
> On Thu, Jul 27, 2017 at 06:10:34PM -0700, Vikram Mulukutla wrote:
> 
>> I think we should have this discussion now - I brought this up earlier
>> [1] and I promised a test case that I completely forgot about - but here
>> it is (attached). Essentially a Big CPU in an acquire-check-release loop
>> will have an unfair advantage over a little CPU concurrently attempting
>> to acquire the same lock, in spite of the ticket implementation. If the
>> Big CPU needs the little CPU to make forward progress : livelock.
> 
> This needs to be fixed in hardware. There really isn't anything the
> software can sanely do about it.
> 
> It also doesn't have anything to do with the spinlock implementation.
> Ticket or not, it's a fundamental problem of LL/SC. Any situation where
> we use atomics for fwd progress guarantees this can happen.
> 
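The attached test program is not reproduced here. The following is a rough, hypothetical user-space approximation of the acquire-check-release pattern described above, in C11 with pthreads. The names struct ticket_lock, big_cpu(), little_cpu() and run_demo() are illustrative only, and the lock is a simplified ticket lock, not the kernel's arm64 arch_spinlock_t. It shows where the unfairness can enter: the ticket order is fair only once a ticket is held, but taking the ticket is itself an atomic read-modify-write. On LL/SC hardware that step can keep failing while another CPU keeps writing the same cache line.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Hypothetical user-space ticket lock (NOT the kernel's arch_spinlock_t).
 * Both fields normally share one cache line, so every ticket grab and
 * every unlock bounces that line between CPUs.
 */
struct ticket_lock {
	atomic_uint next;  /* next ticket to hand out */
	atomic_uint owner; /* ticket currently allowed in */
};

static void ticket_lock(struct ticket_lock *l)
{
	/* Without LSE atomics, arm64 compilers typically emit an LDXR/STXR retry loop here. */
	unsigned int me = atomic_fetch_add_explicit(&l->next, 1, memory_order_relaxed);

	while (atomic_load_explicit(&l->owner, memory_order_acquire) != me)
		; /* the kernel would call cpu_relax() here */
}

static void ticket_unlock(struct ticket_lock *l)
{
	unsigned int o = atomic_load_explicit(&l->owner, memory_order_relaxed);

	atomic_store_explicit(&l->owner, o + 1, memory_order_release);
}

static struct ticket_lock lock;
static bool done; /* protected by lock */

/* "Big" CPU: acquire, check, release, repeat until the little CPU acts. */
static void *big_cpu(void *arg)
{
	(void)arg;
	for (;;) {
		ticket_lock(&lock);
		bool d = done;
		ticket_unlock(&lock);
		if (d)
			return NULL;
	}
}

/* "Little" CPU: needs the same lock once so the big CPU can make progress. */
static void *little_cpu(void *arg)
{
	(void)arg;
	ticket_lock(&lock);
	done = true;
	ticket_unlock(&lock);
	return NULL;
}

/* Returns 1 after both threads finish, 0 if a thread could not be created. */
static int run_demo(void)
{
	pthread_t big, little;

	done = false;
	if (pthread_create(&big, NULL, big_cpu, NULL) != 0)
		return 0;
	if (pthread_create(&little, NULL, little_cpu, NULL) != 0) {
		little_cpu(NULL); /* run it inline so big_cpu can exit */
		pthread_join(big, NULL);
		return 0;
	}
	pthread_join(little, NULL);
	pthread_join(big, NULL);
	assert(done);
	return 1;
}
```

On hardware with fair cache-line arbitration, run_demo() completes promptly. The case described in the quoted message is a big.LITTLE LL/SC system, where the little CPU's ticket grab in little_cpu() may be starved for a very long time, and big_cpu() cannot exit until it succeeds.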

Agreed, it seems like trying to build a fair SW protocol over unfair HW.
But if we can minimally change such loop constructs to address this (all
instances I've seen so far use cpu_relax), it would save many hours
spent debugging these problems. There are a lot of big.LITTLE devices out
there :-)

It's also possible that such a workaround may improve performance under
contention, since the big CPU may otherwise have to wait for, say, a tick
before breaking out of that loop (the non-livelock scenario, where the
entire loop isn't in a critical section).
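The following is a minimal sketch of the kind of loop change suggested above. It assumes the change replaces the bare cpu_relax() in an acquire-check-release loop with a growing delay, so that the polling CPU stays off the lock's cache line long enough for another CPU's exclusive store to succeed. struct backoff, relax_backoff(), check_done(), mark_done() and wait_for_done() are hypothetical names, not kernel APIs. The empty volatile loop is only a user-space stand-in for cpu_relax().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical per-loop backoff state; not a kernel API. */
struct backoff {
	unsigned int delay;     /* current spin count, starts at 1 */
	unsigned int max_delay; /* cap so the loop stays responsive */
};

/* Spin locally without touching shared data, then double the next delay. */
static void relax_backoff(struct backoff *b)
{
	for (volatile unsigned int i = 0; i < b->delay; i++)
		; /* user-space stand-in for cpu_relax() */

	if (b->delay < b->max_delay) {
		b->delay *= 2;
		if (b->delay > b->max_delay)
			b->delay = b->max_delay;
	}
}

static atomic_flag lock = ATOMIC_FLAG_INIT;
static bool done; /* protected by lock */

/* One acquire-check-release iteration using a simple test-and-set lock. */
static bool check_done(void)
{
	while (atomic_flag_test_and_set_explicit(&lock, memory_order_acquire))
		;
	bool d = done;
	atomic_flag_clear_explicit(&lock, memory_order_release);
	return d;
}

/* What the other CPU does to let the polling loop exit. */
static void mark_done(void)
{
	while (atomic_flag_test_and_set_explicit(&lock, memory_order_acquire))
		;
	done = true;
	atomic_flag_clear_explicit(&lock, memory_order_release);
}

/* Polling loop: back off after every failed check instead of retrying at once. */
static unsigned int wait_for_done(unsigned int max_delay)
{
	struct backoff b = { .delay = 1, .max_delay = max_delay };
	unsigned int polls = 0;

	assert(max_delay >= 1);
	while (!check_done()) {
		polls++;
		relax_backoff(&b);
	}
	return polls;
}
```

Capping the delay at max_delay bounds the extra latency the polling CPU pays once the condition becomes true. That is the trade-off against the tick-length wait in the non-livelock case.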

> The little core (or really any core) should hold on to the locked
> cacheline for a while and not insta relinquish it. Giving it a chance to
> reach the SC.

Thanks,
Vikram

-- 
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project