Date: Wed, 12 Jun 2019 10:31:53 +0100
From: Will Deacon
To: Jayachandran Chandrasekharan Nair
Cc: Ard Biesheuvel, "catalin.marinas@arm.com", Jan Glauber, Linus Torvalds,
    "linux-kernel@vger.kernel.org", "linux-arm-kernel@lists.infradead.org"
Subject: Re: [RFC] Disable lockref on arm64
Message-ID: <20190612093151.GA11554@brain-police>
In-Reply-To: <20190612040933.GA18848@dc5-eodlnx05.marvell.com>
References: <20190502082741.GE13955@hc>
 <20190502231858.GB13168@dc5-eodlnx05.marvell.com>
 <20190506061100.GA8465@dc5-eodlnx05.marvell.com>
 <20190506181039.GA2875@brain-police>
 <20190518042424.GA28517@dc5-eodlnx05.marvell.com>
 <20190522160417.GF7876@fuggles.cambridge.arm.com>
 <20190612040933.GA18848@dc5-eodlnx05.marvell.com>
User-Agent: Mutt/1.9.4 (2018-02-28)

Hi JC,

On Wed, Jun 12, 2019 at 04:10:20AM +0000, Jayachandran Chandrasekharan Nair wrote:
> On Wed, May 22, 2019 at 05:04:17PM +0100, Will Deacon wrote:
> > On Sat, May 18, 2019 at 12:00:34PM +0200, Ard Biesheuvel wrote:
> > > On Sat, 18 May 2019 at 06:25, Jayachandran Chandrasekharan Nair wrote:
> > > > Looking thru the perf output of this case (open/close of a file from
> > > > multiple CPUs), I see that refcount is a significant factor in most
> > > > kernel configurations - and that too uses cmpxchg (without yield).
> > > > x86 has an optimized inline version of refcount that helps
> > > > significantly. Do you think this is worth looking at for arm64?
> > >
> > > I looked into this a while ago [0], but at the time, we decided to
> > > stick with the generic implementation until we encountered a use case
> > > that benefits from it. Worth a try, I suppose ...
> > >
> > > [0] https://lore.kernel.org/linux-arm-kernel/20170903101622.12093-1-ard.biesheuvel@linaro.org/
> >
> > If JC can show that we benefit from this, it would be interesting to see if
> > we can implement the refcount-full saturating arithmetic using the
> > LDMIN/LDMAX instructions instead of the current cmpxchg() loops.
>
> Now that the lockref change is mainline, I think we need to take another
> look at this patch.

Before we get too involved with this, I really don't want to start a trend
of "let's try to rewrite all code using cmpxchg() in Linux because of TX2".
At some point, the hardware needs to play ball. However...

Ard's refcount patch was about moving the overflow check out-of-line. A
side-effect of this is that we avoid the cmpxchg() operation in many of the
refcount operations (atomic_add_unless() disappears), and it's /this/ which
helps you.
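To make that concrete, the two shapes look roughly like this. This is an
illustration only, not the kernel's refcount_t code: C11 atomics stand in
for the kernel's atomic_t API, and refcount_overflow() plus the constants
are made up for the sketch.

/*
 * Illustration only -- not the kernel's refcount_t implementation.
 * C11 atomics stand in for the kernel's atomic_t API.
 */
#include <limits.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Made-up slow path: pin the counter at a "saturated" value. */
static void refcount_overflow(_Atomic int *refs)
{
	atomic_store_explicit(refs, INT_MIN / 2, memory_order_relaxed);
}

/*
 * REFCOUNT_FULL-style increment: the checks happen before the update is
 * committed, so the whole thing is a compare-and-swap loop that can retry
 * under contention (this is the atomic_add_unless() shape).
 */
static bool refcount_inc_checked(_Atomic int *refs)
{
	int old = atomic_load_explicit(refs, memory_order_relaxed);

	do {
		if (old == 0 || old == INT_MAX)
			return false;	/* freed or about to overflow */
	} while (!atomic_compare_exchange_weak_explicit(refs, &old, old + 1,
							memory_order_relaxed,
							memory_order_relaxed));
	return true;
}

/*
 * Out-of-line-check style: one unconditional atomic add (a single LDADD
 * with LSE atomics on arm64); the overflow test inspects the returned
 * value afterwards, off the hot path.
 */
static void refcount_inc_posthoc(_Atomic int *refs)
{
	int old = atomic_fetch_add_explicit(refs, 1, memory_order_relaxed);

	if (old <= 0 || old == INT_MAX)
		refcount_overflow(refs);
}

The first version can spin retrying the CAS when many CPUs hit the same
counter; the second does a single atomic add on the fast path and only
branches out when the returned value looks suspicious.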
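For comparison, one possible shape of saturating inline, after the atomic
operation, is sketched below; it is the middle ground discussed next.
Again, this is only a sketch with made-up names and a placeholder
saturation value, not kernel code.

#include <limits.h>
#include <stdatomic.h>

#define SATURATION_VALUE	(INT_MIN / 2)	/* placeholder clamp value */

/*
 * Sketch: do the unconditional atomic add first, then clamp the counter
 * if the result strayed into the overflow zone.
 */
static void refcount_inc_saturating(_Atomic int *refs)
{
	int old = atomic_fetch_add_explicit(refs, 1, memory_order_relaxed);

	/*
	 * The clamp is a second, separate store, so the add + saturate pair
	 * is not atomic.  An atomic min/max (LDMIN/LDMAX) could make the
	 * clamp itself a single instruction, but it would still be a
	 * separate operation from the add.
	 */
	if (old < 0 || old == INT_MAX)
		atomic_store_explicit(refs, SATURATION_VALUE,
				      memory_order_relaxed);
}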
So there may well be a middle ground where we avoid the complexity of the
out-of-line {over,under}flow handling but do the saturation post-atomic
inline. I was hoping we could use LDMIN/LDMAX to maintain the semantics of
REFCOUNT_FULL, but now that I think about it I can't see how we could keep
the arithmetic atomic in that case. Hmm.

Whatever we do, I prefer to keep REFCOUNT_FULL the default option for
arm64, so if we can't keep the semantics when we remove the cmpxchg, you'll
need to opt into this at config time.

Will