linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Fangrui Song <maskray@google.com>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Nick Desaulniers <ndesaulniers@google.com>,
	Bill Wendling <morbo@google.com>, Helge Deller <deller@gmx.de>,
	Nathan Chancellor <nathan@kernel.org>,
	linux-kernel@vger.kernel.org, linux-parisc@vger.kernel.org,
	Andrew Morton <akpm@linux-foundation.org>,
	Chanho Min <chanho.min@lge.com>,
	Geert Uytterhoeven <geert@linux-m68k.org>,
	Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	Kees Cook <keescook@chromium.org>,
	clang-built-linux <llvm@lists.linux.dev>,
	"maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT)"
	<x86@kernel.org>
Subject: Re: [PATCH] lib/clz_ctz.c: Fix __clzdi2() and __ctzdi2() for 32-bit kernels
Date: Fri, 25 Aug 2023 20:17:52 -0700	[thread overview]
Message-ID: <20230826031752.gongmxkr56zyycol@google.com> (raw)
In-Reply-To: <CAHk-=wgJzMzPFTCzejWs1WM4=74z2VENyOzySnucrXG3i=ajrw@mail.gmail.com>

On 2023-08-25, Linus Torvalds wrote:
>On Fri, 25 Aug 2023 at 17:52, Nick Desaulniers <ndesaulniers@google.com> wrote:
>>
>> So 2 concerns where "I'll do it in inline asm" can pessimize codegen:
>> 1. You alluded to this, but what happens when one of these functions
>> is called with a constant?
>
>This is why our headers have a lot of __builtin_constant_p()'s in them..
>
>In this particular case, see the x86 asm/bitops.h code:
>
>    #define ffs(x) (__builtin_constant_p(x) ? __builtin_ffs(x) :
>variable_ffs(x))

For the curious (like me),

__builtin_ffs
https://gcc.gnu.org/onlinedocs/gcc/Other-Builtins.html#index-_005f_005fbuiltin_005fffs says

     Returns the number of leading 0-bits in x, starting at the most significant bit position. If x is 0, the result is undefined.

The hangling of 0 seems the cause that __builtin_ffs codegen is not as
well as inline asm.  Clang implemented the builtin in 2008 and took the
same constraint (penalty).



GCC compiles __builtin_ctzl(x) to xorl    %eax, %eax; tzcntq  %rdi, %rax
on most Intel processors (AMD -march= values are unaffected). The extra
xor is due to a false dependency issue
https://gcc.gnu.org/git/?p=gcc.git&a=commit;h=73543b2286027da1de561902440b53f775a03a86

Inline asm wins here as well since we know the argument 0 is undefined.

In May 2023, https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=cc6eb8b51f9568ae0caf46b80e2a0aff050030ce
"Disable avoid_false_dep_for_bmi for atom and icelake(and later) core processors."
removed the extra xor for icelake (and later) core processors.

>but this is actually quite a common pattern, and it's often not about
>something like __builtin_ffs() at all.
>
>See all the other __builtin_constant_p()'s that we have in that same
>file because we basically just use different code sequences for
>constants.
>
>And that file isn't even unusual. We use it quite a lot when we care
>about code generation for some particular case.
>
>> 2. by providing the definition of a symbol typically provided by libc
>> (and then not building with -ffreestanding) pessimizes libcall
>> optimization.
>
>.. and this is partly why we often avoid libgcc things, and do certain
>things by hand.
>
>The classic rule is "Don't do 64-bit divisions using the C '/' operator".
>
>So in the kernel you have to use do_div() and friends, because the
>library versions are often disgusting and might not know that 64/32 is
>much much cheaper and is what you want.
>
>And quite often we simply use other names - but then we also do *not*
>build with -freestanding, because -freestanding has at least
>traditionally meant that the compiler won't optimize the simple and
>obvious cases (typically things like "memcpy with a constant size").
>
>So we mix and match and pick the best option.
>
>The kernel really doesn't care about architecture portability, because
>honestly, something like "ffs()" is entirely *trivial* to get right,
>compared to the real differences between architectures (eg VM and IO
>differences etc).
>
>             Linus
>

  reply	other threads:[~2023-08-26  3:18 UTC|newest]

Thread overview: 20+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-08-25 19:50 [PATCH] lib/clz_ctz.c: Fix __clzdi2() and __ctzdi2() for 32-bit kernels Helge Deller
2023-08-25 20:25 ` Linus Torvalds
2023-08-25 20:43   ` Linus Torvalds
2023-08-25 21:01     ` Nick Desaulniers
2023-08-25 22:33       ` Bill Wendling
2023-08-25 22:57         ` Bill Wendling
2023-08-25 23:34           ` Linus Torvalds
2023-08-26  0:08             ` Bill Wendling
2023-08-26  0:52             ` Nick Desaulniers
2023-08-26  1:07               ` Linus Torvalds
2023-08-26  3:17                 ` Fangrui Song [this message]
2023-08-28  7:33                 ` Geert Uytterhoeven
2023-08-28 16:24                   ` Linus Torvalds
2023-08-28 20:13                     ` Nick Desaulniers
2023-08-28 20:09                   ` Nick Desaulniers
2023-08-28 20:08                 ` Nick Desaulniers
2023-08-28 10:53     ` David Laight
2023-08-28 16:30       ` Linus Torvalds
2023-08-28 20:14         ` Nick Desaulniers
2023-08-28 20:10       ` Nick Desaulniers

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20230826031752.gongmxkr56zyycol@google.com \
    --to=maskray@google.com \
    --cc=akpm@linux-foundation.org \
    --cc=chanho.min@lge.com \
    --cc=deller@gmx.de \
    --cc=geert@linux-m68k.org \
    --cc=gregkh@linuxfoundation.org \
    --cc=keescook@chromium.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-parisc@vger.kernel.org \
    --cc=llvm@lists.linux.dev \
    --cc=morbo@google.com \
    --cc=nathan@kernel.org \
    --cc=ndesaulniers@google.com \
    --cc=torvalds@linux-foundation.org \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).