From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
 aws-us-west-2-korg-lkml-1.web.codeaurora.org
X-Spam-Level:
X-Spam-Status: No, score=-10.8 required=3.0 tests=BAYES_00,DKIM_SIGNED,
 DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,
 MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED autolearn=ham
 autolearn_force=no version=3.4.0
Received: from mail.kernel.org (mail.kernel.org [198.145.29.99])
 by smtp.lore.kernel.org (Postfix) with ESMTP id 3EE3FC432BE
 for ; Wed, 1 Sep 2021 08:54:54 +0000 (UTC)
Received: from vger.kernel.org (vger.kernel.org [23.128.96.18])
 by mail.kernel.org (Postfix) with ESMTP id 1120A61058
 for ; Wed, 1 Sep 2021 08:54:54 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
 id S242916AbhIAIzs (ORCPT ); Wed, 1 Sep 2021 04:55:48 -0400
Received: from lindbergh.monkeyblade.net ([23.128.96.19]:35878 "EHLO
 lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
 with ESMTP id S241376AbhIAIzq (ORCPT ); Wed, 1 Sep 2021 04:55:46 -0400
Received: from casper.infradead.org (casper.infradead.org [IPv6:2001:8b0:10b:1236::1])
 by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 1B91CC061575
 for ; Wed, 1 Sep 2021 01:54:49 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed;
 d=infradead.org; s=casper.20170209; h=In-Reply-To:Content-Type:MIME-Version:
 References:Message-ID:Subject:Cc:To:From:Date:Sender:Reply-To:
 Content-Transfer-Encoding:Content-ID:Content-Description;
 bh=4WVH33cX+rNIZSshZyxMgesf4pVxYNsVlPzUVOR6Rvw=; b=lvjTIAv3jah6lQ33l2ecRpZGog
 uCXV7F4diAduy7500Nz34fdBwPwfbAu5nfNdUm1edeDumLakZyOvms48DqhdEtvQbD0rxesBIBpgq
 i6F498YzeBJR8VkW9bSAnWfXvcr1z5MT91nkjNpzw6wsRlV0rdJ/PFB2sy7j2+YPuUtK+s5C5GdfL
 tVfd0bVnFz3Ail7YpYGOWIA/Ne8niUrqfFTd2Iyln8w7MmGgS3rZ240Z9oN83jeT+PgZTJbpRgIG4
 YCwBXV0UywEWdU7FiX6NdhO5ElBSvtkHR7jCUk0Zh6tw2NDGv5JwULy8tR67CQrccmrcrw9ipt2zu
 tk/d8YnQ==;
Received: from j217100.upc-j.chello.nl ([24.132.217.100] helo=noisy.programming.kicks-ass.net)
 by casper.infradead.org with esmtpsa (Exim 4.94.2 #2 (Red Hat Linux))
 id 1mLLzA-00240m-J8; Wed, 01 Sep 2021 08:53:03 +0000
Received: from hirez.programming.kicks-ass.net (hirez.programming.kicks-ass.net [192.168.1.225])
 (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)
 key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256)
 (Client did not present a certificate)
 by noisy.programming.kicks-ass.net (Postfix) with ESMTPS id CA9823002C1;
 Wed, 1 Sep 2021 10:52:36 +0200 (CEST)
Received: by hirez.programming.kicks-ass.net (Postfix, from userid 1000)
 id B165B2DD42378; Wed, 1 Sep 2021 10:52:36 +0200 (CEST)
Date: Wed, 1 Sep 2021 10:52:36 +0200
From: Peter Zijlstra
To: Christophe Leroy
Cc: Benjamin Herrenschmidt , Paul Mackerras , Michael Ellerman ,
 linux-kernel@vger.kernel.org, linuxppc-dev@lists.ozlabs.org,
 Josh Poimboeuf , Jason Baron , Steven Rostedt , Ard Biesheuvel
Subject: Re: [PATCH v3] powerpc/32: Add support for out-of-line static calls
Message-ID:
References: <6ec2a7865ed6a5ec54ab46d026785bafe1d837ea.1630484892.git.christophe.leroy@csgroup.eu>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <6ec2a7865ed6a5ec54ab46d026785bafe1d837ea.1630484892.git.christophe.leroy@csgroup.eu>
Precedence: bulk
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed, Sep 01, 2021 at 08:30:21AM +0000, Christophe Leroy wrote:
> Add support for out-of-line static calls on PPC32. This change
> improves performance of calls to global function pointers by
> using direct calls instead of indirect calls.
>
> The trampoline is initially populated with a 'blr' or branch to target,
> followed by an unreachable long jump sequence.
>
> In order to cater for parallel execution, the trampoline needs to
> be updated in a way that ensures it remains consistent at all times.
> This means we can't use the traditional lis/addi to load r12 with
> the target address, otherwise there would be a window during which
> the first instruction contains the upper part of the new target
> address while the second instruction still contains the lower part of
> the old target address. To avoid that, the target address is stored
> just after the 'bctr' and loaded from there with a single instruction.
>
> Then, depending on the target distance, arch_static_call_transform()
> will either replace the first instruction by a direct 'bl <target>' or
> 'nop' in order to have the trampoline fall through the long jump
> sequence.
>
> For the special case of __static_call_return0(), to avoid the risk of
> a far branch, a version of it is inlined at the end of the trampoline.

(also, it's in the same cache line, so it avoids another cache miss and
it nicely fills the hole you had in your 32-byte chunk)

> Performance-wise the long jump sequence is probably not better than
> the indirect calls set by GCC when we don't use static calls, but
> such calls are unlikely to be required on powerpc32: With most
> configurations the kernel size is far below 32 Mbytes so only
> modules may happen to be too far. And even modules are likely to
> be close enough as they are allocated below the kernel core and
> as close as possible to the kernel text.
>
> static_call selftest is running successfully with this change.

Nice! I'd ask if you'd tried PREEMPT_DYNAMIC, since that should really
stress the thing, but I see that also requires GENERIC_ENTRY and you
don't have that. Alas.

Acked-by: Peter Zijlstra (Intel)
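
For readers following the thread, here is a minimal C sketch of the two ideas in the quoted description: why a lis/addi pair can be observed half-updated, and how the first trampoline instruction could be chosen from the target distance. It is not the code from the patch. The helper names (branch_in_range, encode_branch, first_insn, ha16, lo16, lis_addi) are hypothetical, and two things are assumptions of the sketch: that a zero target maps to 'blr', and that the direct branch is emitted as a plain 'b' without the link bit (the quoted text writes 'bl'). Only the instruction encodings and the +/-32 Mbyte reach of an I-form branch are architectural facts.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Architectural PowerPC encodings. */
#define PPC_INST_NOP 0x60000000u /* ori r0,r0,0 */
#define PPC_INST_BLR 0x4e800020u /* return to caller */
#define PPC_INST_B   0x48000000u /* I-form branch, AA=0, LK=0 */

/* An I-form branch carries a signed 26-bit, word-aligned displacement,
 * which is where the 32 Mbyte limit in the description comes from. */
static bool branch_in_range(uint32_t from, uint32_t to)
{
	int64_t off = (int64_t)to - (int64_t)from;

	return off >= -0x2000000 && off <= 0x1fffffc && (off & 3) == 0;
}

/* Encode a relative branch from 'from' to 'to' (no link, an assumption). */
static uint32_t encode_branch(uint32_t from, uint32_t to)
{
	assert(branch_in_range(from, to));
	return PPC_INST_B | ((to - from) & 0x03fffffcu);
}

/* Hypothetical helper: the single word to store in the first slot. */
static uint32_t first_insn(uint32_t tramp, uint32_t target)
{
	if (target == 0)
		return PPC_INST_BLR;	/* assumption: no target means "just return" */
	if (branch_in_range(tramp, target))
		return encode_branch(tramp, target);
	return PPC_INST_NOP;		/* fall through to the long jump sequence */
}

/* lis/addi split: addi sign-extends its immediate, so the high half is
 * adjusted upward when bit 15 of the low half is set. */
static uint16_t ha16(uint32_t addr) { return (uint16_t)((addr + 0x8000u) >> 16); }
static uint16_t lo16(uint32_t addr) { return (uint16_t)(addr & 0xffffu); }

/* Value r12 would hold after "lis r12,hi ; addi r12,r12,lo". */
static uint32_t lis_addi(uint16_t hi, uint16_t lo)
{
	uint32_t simm = (lo & 0x8000u) ? (0xffff0000u | lo) : lo;

	return ((uint32_t)hi << 16) + simm;
}
```

With old = 0xc0123456 and new = 0xc0ab8000, lis_addi(ha16(new), lo16(old)) is neither address, which is the window the description warns about. Keeping the address in one data word and changing only the first instruction means each update is a single aligned 32-bit store.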