From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org
X-Spam-Level:
X-Spam-Status: No, score=-5.3 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,MAILING_LIST_MULTI,NICE_REPLY_A,SPF_HELO_NONE, SPF_PASS,USER_AGENT_SANE_1 autolearn=no autolearn_force=no version=3.4.0
Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 238A5C433B4 for ; Wed, 12 May 2021 12:57:22 +0000 (UTC)
Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id D84B8611C9 for ; Wed, 12 May 2021 12:57:21 +0000 (UTC)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231533AbhELM62 (ORCPT ); Wed, 12 May 2021 08:58:28 -0400
Received: from pegase2.c-s.fr ([93.17.235.10]:58869 "EHLO pegase2.c-s.fr" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230334AbhELM6U (ORCPT ); Wed, 12 May 2021 08:58:20 -0400
Received: from localhost (mailhub3.si.c-s.fr [172.26.127.67]) by localhost (Postfix) with ESMTP id 4FgFDj4JDkz9sf2; Wed, 12 May 2021 14:57:09 +0200 (CEST)
X-Virus-Scanned: amavisd-new at c-s.fr
Received: from pegase2.c-s.fr ([172.26.127.65]) by localhost (pegase2.c-s.fr [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id CKX-q6AHXoOp; Wed, 12 May 2021 14:57:09 +0200 (CEST)
Received: from messagerie.si.c-s.fr (messagerie.si.c-s.fr [192.168.25.192]) by pegase2.c-s.fr (Postfix) with ESMTP id 4FgFDj3J5cz9sf1; Wed, 12 May 2021 14:57:09 +0200 (CEST)
Received: from localhost (localhost [127.0.0.1]) by messagerie.si.c-s.fr (Postfix) with ESMTP id 4DE178B7F2; Wed, 12 May 2021 14:57:09 +0200 (CEST)
X-Virus-Scanned: amavisd-new at c-s.fr
Received: from messagerie.si.c-s.fr ([127.0.0.1]) by localhost (messagerie.si.c-s.fr [127.0.0.1]) (amavisd-new, port 10023) with ESMTP id r9l0ajoEPVOw; Wed, 12 May 2021 14:57:09 +0200 (CEST)
Received: from [192.168.4.90] (unknown
 [192.168.4.90]) by messagerie.si.c-s.fr (Postfix) with ESMTP id CFDFF8B7EF; Wed, 12 May 2021 14:57:08 +0200 (CEST)
Subject: Re: [PATCH] powerpc: Force inlining of csum_add()
To: Segher Boessenkool
Cc: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman, linuxppc-dev@lists.ozlabs.org, linux-kernel@vger.kernel.org
References: <20210511105154.GJ10366@gate.crashing.org>
From: Christophe Leroy
Message-ID:
Date: Wed, 12 May 2021 14:56:56 +0200
User-Agent: Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:78.0) Gecko/20100101 Thunderbird/78.10.1
MIME-Version: 1.0
In-Reply-To: <20210511105154.GJ10366@gate.crashing.org>
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Language: fr
Content-Transfer-Encoding: 8bit
Precedence: bulk
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

Hi,

On 11/05/2021 at 12:51, Segher Boessenkool wrote:
> Hi!
>
> On Tue, May 11, 2021 at 06:08:06AM +0000, Christophe Leroy wrote:
>> Commit 328e7e487a46 ("powerpc: force inlining of csum_partial() to
>> avoid multiple csum_partial() with GCC10") inlined csum_partial().
>>
>> Now that csum_partial() is inlined, GCC outlines csum_add() when
>> called by csum_partial().
>
>> c064fb28 :
>> c064fb28: 7c 63 20 14 addc r3,r3,r4
>> c064fb2c: 7c 63 01 94 addze r3,r3
>> c064fb30: 4e 80 00 20 blr
>
> Could you build this with -fdump-tree-einline-all and send me the
> results? Or open a GCC PR yourself :-)

OK, I'll forward it to you in a minute.

>
> Something seems to have decided this asm is more expensive than it is.
> That isn't always avoidable -- the compiler cannot look inside asms --
> but it seems it could be improved here.
>
> Do you have (or can make) a self-contained testcase?

I have not tried, and I fear it might be difficult: in a kernel build with dozens of calls to csum_add(), only ip6_tunnel.o exhibits this issue.

>
>> The sum with 0 is useless, should have been skipped.
>
> That isn't something the compiler can do anything about (not sure if you
> were suggesting that); it has to be done in the user code (and it tries
> to already, see below).

I was not suggesting that, only that when csum_add() is properly inlined the sum with 0 is skipped (because we put the necessary checks in csum_add(), of course).

>
>> And there is even one completely unused instance of csum_add().
>
> That is strange, that should never happen.

It seems that several .o files include unused copies of csum_add(). After the final link, one of them remains in vmlinux (in addition to the one that is used).

>
>> ./arch/powerpc/include/asm/checksum.h: In function '__ip6_tnl_rcv':
>> ./arch/powerpc/include/asm/checksum.h:94:22: warning: inlining failed in call to 'csum_add': call is unlikely and code size would grow [-Winline]
>> 94 | static inline __wsum csum_add(__wsum csum, __wsum addend)
>> | ^~~~~~~~
>> ./arch/powerpc/include/asm/checksum.h:172:31: note: called from here
>> 172 | sum = csum_add(sum, (__force __wsum)*(const u32 *)buff);
>> | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>
> At least we say what happened. Progress! :-)

Lol. I've been seeing this warning for a long time; it's nothing new, I guess.

>
>> In the non-inlined version, the first sum with 0 was performed.
>> Here it is skipped.
>
> That is because of how __builtin_constant_p works, most likely. As we
> discussed elsewhere it is evaluated before all forms of loop unrolling.

But we are not talking about loop unrolling here, are we? It seems that the reason here is that __builtin_constant_p() is evaluated long after GCC has decided not to inline that call to csum_add().

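To make the interaction concrete, here is a minimal user-space sketch of the pattern, not the kernel header itself. The names force_inline, csum_add_sketch() and csum_of_word_sketch() are hypothetical, plain uint32_t stands in for __wsum, and a 64-bit add with a fold stands in for the addc/addze asm. It assumes GCC or Clang for __builtin_constant_p() and the always_inline attribute.

```c
#include <stdint.h>

/* Hypothetical stand-in for the kernel's __always_inline macro. */
#define force_inline inline __attribute__((always_inline))

/*
 * Ones' complement 32-bit add with the constant-zero shortcuts.
 * __builtin_constant_p() can only report "constant" once the call has
 * been inlined into a caller that passes a literal; in an out-of-line
 * copy both arguments are plain registers and the full add is emitted.
 */
static force_inline uint32_t csum_add_sketch(uint32_t csum, uint32_t addend)
{
	if (__builtin_constant_p(csum) && csum == 0)
		return addend;
	if (__builtin_constant_p(addend) && addend == 0)
		return csum;

	/* Portable replacement for the asm: add in 64 bits, fold the carry. */
	uint64_t res = (uint64_t)csum + addend;
	return (uint32_t)res + (uint32_t)(res >> 32);
}

/* Hypothetical caller: starts from a literal 0, as a checksum loop does. */
static uint32_t csum_of_word_sketch(uint32_t word)
{
	/* Once inlined and optimised, this can reduce to "return word". */
	return csum_add_sketch(0, word);
}
```

The result is the same whether or not the shortcut triggers (for example without optimisation, where __builtin_constant_p() typically evaluates to 0); only the generated code differs. Forcing the inline removes the inliner's cost estimate from the decision, so the shortcut is at least visible at every call site.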
Christophe
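For reference, the three-instruction out-of-line copy quoted above (addc, addze, blr) is just an add with end-around carry. A rough C equivalent, written step by step to mirror the two arithmetic instructions, could look like this; add_end_around_carry() is a hypothetical name used only for illustration.

```c
#include <stdint.h>

/* Hypothetical helper mirroring the addc/addze pair one step at a time. */
static uint32_t add_end_around_carry(uint32_t a, uint32_t b)
{
	uint32_t sum = a + b;       /* addc: 32-bit add, carry-out recorded */
	uint32_t carry = sum < a;   /* the add wrapped exactly when sum < a */

	return sum + carry;         /* addze: add the recorded carry back in */
}
```

For instance, add_end_around_carry(0xffffffff, 1) wraps to 0 with the carry set and therefore returns 1, which is the ones' complement behaviour the checksum code relies on.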