From: Nick Desaulniers
Date: Tue, 10 May 2022 15:14:43 -0700
Subject: Re: [PATCH 0/2] x86/asm/bitops: optimize ff{s,z} functions for constant expressions
To: Vincent Mailhol
Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86@kernel.org, "H . Peter Anvin", Nathan Chancellor, Tom Rix, linux-kernel@vger.kernel.org, llvm@lists.linux.dev
In-Reply-To: <20220510142550.1686866-1-mailhol.vincent@wanadoo.fr>
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue, May 10, 2022 at 7:26 AM Vincent Mailhol wrote:
>
> The compilers provide some builtin expressions equivalent to the
> ffs(), __ffs() and ffz() functions of the kernel. The kernel uses
> optimized assembly which produces better code than the builtin
> functions. However, such assembly code cannot be optimized when used
> on constant expressions.
>
> This series relies on __builtin_constant_p to select the optimal solution:
>
>   * use kernel assembly for non-constant expressions
>
>   * use the compiler's __builtin functions for constant expressions.
>
> I also think that fls() and fls64() can be optimized in a similar
> way, using __builtin_clz() and __builtin_clzll(), but it is a bit less
> trivial so I want to focus on this series first. If it gets accepted, I
> will then work on those two additional functions.
>
>
> ** Statistics **
>
> On an allyesconfig build, before applying this series, I get:
>
> | $ objdump -d vmlinux.o | grep bsf | wc -l
> | 1081
>
> After applying this series:
>
> | $ objdump -d vmlinux.o | grep bsf | wc -l
> | 792
>
> So, roughly 26.7% of the calls to either ffs() or __ffs() were using
> constant expressions and can be optimized (I did not produce the
> figures for ffz()).

These stats are interesting; consider putting them in the patch 1/2
commit message as well as the cover letter.

(Sending thoughts on 1/2 next.)

>
> (tests done on linux v5.18-rc5 x86_64 using GCC 11.2.1)

Here's the same measurement for an x86_64 allyesconfig build (with
./scripts/config -d CONFIG_HINIC) at commit
9be9ed2612b5aedb52a2c240edb1630b6b743cb6 with ToT LLVM (~clang-15):

Before:
$ objdump -d vmlinux.o | grep bsf | wc -l
1454

After:
$ objdump -d vmlinux.o | grep bsf | wc -l
1070

-26.4% :)

>
>
> Vincent Mailhol (2):
>   x86/asm/bitops: ffs: use __builtin_ffs to evaluate constant
>     expressions
>   x86/asm/bitops: __ffs,ffz: use __builtin_ctzl to evaluate constant
>     expressions
>
>  arch/x86/include/asm/bitops.h | 65 +++++++++++++++++++++--------------
>  1 file changed, 40 insertions(+), 25 deletions(-)
>
> --
> 2.35.1
>

--
Thanks,
~Nick Desaulniers
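The dispatch described in the cover letter can be sketched in C roughly as follows, assuming GCC or Clang. __builtin_constant_p() sends constant arguments to __builtin_ffs() and __builtin_ctzl(), which the compiler can fold to an immediate, and sends everything else to a runtime helper. The demo_* and variable_* names and demo_check() are hypothetical illustrations, not the macros in the actual patches; the runtime helper uses bsf through inline asm only on x86-64 and falls back to __builtin_ctzl() elsewhere so the sketch stays portable.

```c
#include <assert.h>

/*
 * Sketch of constant-expression dispatch for ffs(), __ffs() and ffz().
 * All demo_* and variable_* names are hypothetical, not the kernel's API.
 */

/* Runtime __ffs(): 0-based index of the lowest set bit; undefined for 0. */
static inline unsigned long variable___ffs(unsigned long word)
{
#if defined(__x86_64__) && defined(__GNUC__)
	/* bsf-based sequence, similar to the kernel's x86 __ffs(). */
	__asm__("rep; bsf %1,%0" : "=r" (word) : "rm" (word));
	return word;
#else
	/* Portable stand-in so the sketch builds on other targets. */
	return (unsigned long)__builtin_ctzl(word);
#endif
}

/* Runtime ffz(): 0-based index of the lowest clear bit; undefined for ~0UL. */
static inline unsigned long variable_ffz(unsigned long word)
{
	return variable___ffs(~word);
}

/* Runtime ffs(): 1-based index of the lowest set bit, or 0 if x == 0. */
static inline int variable_ffs(int x)
{
	return x ? (int)variable___ffs((unsigned int)x) + 1 : 0;
}

/*
 * Constant arguments take the builtin branch, which the compiler can fold
 * at build time; anything else takes the runtime helpers above.
 */
#define demo_ffs(x) \
	(__builtin_constant_p(x) ? __builtin_ffs(x) : variable_ffs(x))
#define demo___ffs(word) \
	(__builtin_constant_p(word) ? \
	 (unsigned long)__builtin_ctzl((unsigned long)(word)) : \
	 variable___ffs(word))
#define demo_ffz(word) \
	(__builtin_constant_p(word) ? \
	 (unsigned long)__builtin_ctzl(~(unsigned long)(word)) : \
	 variable_ffz(word))

/* Volatile variables cannot be folded, so they force the runtime path. */
static void demo_check(void)
{
	volatile unsigned long w = 0x50UL;	/* bits 4 and 6 set */
	volatile unsigned long z = 0x0FUL;	/* bits 0..3 set */
	volatile int i = 0x50;

	assert(demo___ffs(0x50UL) == demo___ffs(w));
	assert(demo_ffs(0x50) == demo_ffs(i));
	assert(demo_ffz(0x0FUL) == demo_ffz(z));
}
```

Both paths must return the same value for every valid input (__ffs() is undefined for 0 and ffz() for ~0UL), which demo_check() exercises with volatile variables the compiler cannot fold. The cast to unsigned long before the ~ in demo_ffz() keeps the constant path consistent with the runtime path when the argument has a narrower type. An fls() counterpart counts from the most significant end, so it would use __builtin_clz() instead: for nonzero x, fls(x) == 32 - __builtin_clz(x).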