All of lore.kernel.org
 help / color / mirror / Atom feed
* ilog2 vs. GCC inlining heuristics
@ 2020-10-21 13:27 Jakub Jelinek
  2020-10-21 13:36 ` Christophe Leroy
                   ` (2 more replies)
  0 siblings, 3 replies; 9+ messages in thread
From: Jakub Jelinek @ 2020-10-21 13:27 UTC (permalink / raw)
  To: linux-toolchains, Christophe Leroy

Hi!

Based on the GCC PR97445 discussions, I'd like to propose following change,
which should significantly decrease the amount of code in inline functions
that use ilog2, but as I'm already two decades out of the Linux kernel
development, I'd appreciate if some kernel developer could try that (all
I have done is check that it gives the same results as before) and if it
works submit it for inclusion into the kernel?

Thanks.


Improve ilog2 for constant arguments

As discussed in https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97445
the const_ilog2 macro generates a lot of code which interferes badly
with GCC inlining heuristics, until it can be proven that the ilog2
argument can or can't be simplified into a constant.

It can be expressed using __builtin_clzll builtin which is supported
by GCC 3.4 and later and when used only in the __builtin_constant_p guarded
code it ought to always fold back to a constant.
Other compilers support the same builtin for many years too.

Other option would be to change the const_ilog2 macro, though as the
description says it is meant to be used also in C constant expressions,
and while GCC will fold it to constant with constant argument even in
those, perhaps it is better to avoid using extensions in that case.

Signed-off-by: Jakub Jelinek <jakub@redhat.com>

diff --git a/include/linux/log2.h b/include/linux/log2.h
index c619ec6eff4a..4307d3477642 100644
--- a/include/linux/log2.h
+++ b/include/linux/log2.h
@@ -156,7 +156,8 @@ unsigned long __rounddown_pow_of_two(unsigned long n)
 #define ilog2(n) \
 ( \
 	__builtin_constant_p(n) ?	\
-	const_ilog2(n) :		\
+	((n) < 2 ? 0 :			\
+	 63 - __builtin_clzll (n)) :	\
 	(sizeof(n) <= 4) ?		\
 	__ilog2_u32(n) :		\
 	__ilog2_u64(n)			\

	Jakub


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: ilog2 vs. GCC inlining heuristics
  2020-10-21 13:27 ilog2 vs. GCC inlining heuristics Jakub Jelinek
@ 2020-10-21 13:36 ` Christophe Leroy
  2020-10-21 13:45   ` Jakub Jelinek
  2020-10-21 15:19 ` Peter Zijlstra
  2020-11-20 12:33 ` [tip: core/core] " tip-bot2 for Jakub Jelinek
  2 siblings, 1 reply; 9+ messages in thread
From: Christophe Leroy @ 2020-10-21 13:36 UTC (permalink / raw)
  To: Jakub Jelinek, linux-toolchains



Le 21/10/2020 à 15:27, Jakub Jelinek a écrit :
> Hi!
> 
> Based on the GCC PR97445 discussions, I'd like to propose following change,
> which should significantly decrease the amount of code in inline functions
> that use ilog2, but as I'm already two decades out of the Linux kernel
> development, I'd appreciate if some kernel developer could try that (all
> I have done is check that it gives the same results as before) and if it
> works submit it for inclusion into the kernel?
> 
> Thanks.
> 
> 
> Improve ilog2 for constant arguments
> 
> As discussed in https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97445
> the const_ilog2 macro generates a lot of code which interferes badly
> with GCC inlining heuristics, until it can be proven that the ilog2
> argument can or can't be simplified into a constant.
> 
> It can be expressed using __builtin_clzll builtin which is supported
> by GCC 3.4 and later and when used only in the __builtin_constant_p guarded
> code it ought to always fold back to a constant.
> Other compilers support the same builtin for many years too.
> 
> Other option would be to change the const_ilog2 macro, though as the
> description says it is meant to be used also in C constant expressions,
> and while GCC will fold it to constant with constant argument even in
> those, perhaps it is better to avoid using extensions in that case.
> 
> Signed-off-by: Jakub Jelinek <jakub@redhat.com>
> 
> diff --git a/include/linux/log2.h b/include/linux/log2.h
> index c619ec6eff4a..4307d3477642 100644
> --- a/include/linux/log2.h
> +++ b/include/linux/log2.h
> @@ -156,7 +156,8 @@ unsigned long __rounddown_pow_of_two(unsigned long n)
>   #define ilog2(n) \
>   ( \
>   	__builtin_constant_p(n) ?	\
> -	const_ilog2(n) :		\
> +	((n) < 2 ? 0 :			\
> +	 63 - __builtin_clzll (n)) :	\

That is likely to work only on 64 bits targets.
On 32 bits targets I guess it needs to be  31 - __builtin_clz (n), doesn't it ?

>   	(sizeof(n) <= 4) ?		\
>   	__ilog2_u32(n) :		\
>   	__ilog2_u64(n)			\
> 
> 	Jakub
> 

Christophe

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: ilog2 vs. GCC inlining heuristics
  2020-10-21 13:36 ` Christophe Leroy
@ 2020-10-21 13:45   ` Jakub Jelinek
  0 siblings, 0 replies; 9+ messages in thread
From: Jakub Jelinek @ 2020-10-21 13:45 UTC (permalink / raw)
  To: Christophe Leroy; +Cc: linux-toolchains

On Wed, Oct 21, 2020 at 03:36:14PM +0200, Christophe Leroy wrote:
> > --- a/include/linux/log2.h
> > +++ b/include/linux/log2.h
> > @@ -156,7 +156,8 @@ unsigned long __rounddown_pow_of_two(unsigned long n)
> >   #define ilog2(n) \
> >   ( \
> >   	__builtin_constant_p(n) ?	\
> > -	const_ilog2(n) :		\
> > +	((n) < 2 ? 0 :			\
> > +	 63 - __builtin_clzll (n)) :	\
> 
> That is likely to work only on 64 bits targets.

No, it should work on all.  __builtin_clzll argument is long long, which
should be 64-bit on all architectures Linux kernel supports (and all
architectures GCC supports - ok, avr with -mint8 doesn't, but it isn't C
requirements compatible in that case).
const_ilog2 also uses long long and will work with any long long or
unsigned long long value (but e.g. not with __uint128_t values on 64-bit
architectures, guess the kernel doesn't need that though).
And similarly, for the non-constant operands the sizeof below ensures
that __ilog2_u64 is used (even on 32-bit arches) and will handle that.

> On 32 bits targets I guess it needs to be  31 - __builtin_clz (n), doesn't it ?
> 
> >   	(sizeof(n) <= 4) ?		\
> >   	__ilog2_u32(n) :		\
> >   	__ilog2_u64(n)			\
> > 
> > 	Jakub
> > 

	Jakub


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: ilog2 vs. GCC inlining heuristics
  2020-10-21 13:27 ilog2 vs. GCC inlining heuristics Jakub Jelinek
  2020-10-21 13:36 ` Christophe Leroy
@ 2020-10-21 15:19 ` Peter Zijlstra
  2020-10-21 18:40   ` Christophe Leroy
  2020-11-20 12:33 ` [tip: core/core] " tip-bot2 for Jakub Jelinek
  2 siblings, 1 reply; 9+ messages in thread
From: Peter Zijlstra @ 2020-10-21 15:19 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: linux-toolchains, Christophe Leroy

On Wed, Oct 21, 2020 at 03:27:18PM +0200, Jakub Jelinek wrote:
> Hi!
> 
> Based on the GCC PR97445 discussions, I'd like to propose following change,
> which should significantly decrease the amount of code in inline functions
> that use ilog2, but as I'm already two decades out of the Linux kernel
> development, I'd appreciate if some kernel developer could try that (all
> I have done is check that it gives the same results as before) and if it
> works submit it for inclusion into the kernel?

I'll stick it in my queue and feed it to the robots.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: ilog2 vs. GCC inlining heuristics
  2020-10-21 15:19 ` Peter Zijlstra
@ 2020-10-21 18:40   ` Christophe Leroy
  2020-10-22  4:01     ` Randy Dunlap
  0 siblings, 1 reply; 9+ messages in thread
From: Christophe Leroy @ 2020-10-21 18:40 UTC (permalink / raw)
  To: Peter Zijlstra, Jakub Jelinek; +Cc: linux-toolchains



Le 21/10/2020 à 17:19, Peter Zijlstra a écrit :
> On Wed, Oct 21, 2020 at 03:27:18PM +0200, Jakub Jelinek wrote:
>> Hi!
>>
>> Based on the GCC PR97445 discussions, I'd like to propose following change,
>> which should significantly decrease the amount of code in inline functions
>> that use ilog2, but as I'm already two decades out of the Linux kernel
>> development, I'd appreciate if some kernel developer could try that (all
>> I have done is check that it gives the same results as before) and if it
>> works submit it for inclusion into the kernel?
> 
> I'll stick it in my queue and feed it to the robots.
> 

I did a mpc885_ads_defconfig build with your patch. That's far better, even better than with gcc 9 
without the patch.

I only have two instances of get_order() in vmlinux, one of it is used twice, the other is never user.

With -Winline, the reason for not inlining is for both because "the call is unlikely and the code 
size would grow"

Christophe

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: ilog2 vs. GCC inlining heuristics
  2020-10-21 18:40   ` Christophe Leroy
@ 2020-10-22  4:01     ` Randy Dunlap
  2020-10-22  7:12       ` Jakub Jelinek
  0 siblings, 1 reply; 9+ messages in thread
From: Randy Dunlap @ 2020-10-22  4:01 UTC (permalink / raw)
  To: Christophe Leroy, Peter Zijlstra, Jakub Jelinek; +Cc: linux-toolchains

[-- Attachment #1: Type: text/plain, Size: 1379 bytes --]

On 10/21/20 11:40 AM, Christophe Leroy wrote:
> 
> 
> Le 21/10/2020 à 17:19, Peter Zijlstra a écrit :
>> On Wed, Oct 21, 2020 at 03:27:18PM +0200, Jakub Jelinek wrote:
>>> Hi!
>>>
>>> Based on the GCC PR97445 discussions, I'd like to propose following change,
>>> which should significantly decrease the amount of code in inline functions
>>> that use ilog2, but as I'm already two decades out of the Linux kernel
>>> development, I'd appreciate if some kernel developer could try that (all
>>> I have done is check that it gives the same results as before) and if it
>>> works submit it for inclusion into the kernel?
>>
>> I'll stick it in my queue and feed it to the robots.
>>
> 
> I did a mpc885_ads_defconfig build with your patch. That's far better, even better than with gcc 9 without the patch.
> 
> I only have two instances of get_order() in vmlinux, one of it is used twice, the other is never user.
> 
> With -Winline, the reason for not inlining is for both because "the call is unlikely and the code size would grow"
> 
> Christophe

Do we want a lib/test_log2.c test?  Although Jakub says that he already
verified that is gives the same results as the previous code.

I ran a (new) lib/test_log2.ko before and after Jakub's patch and got the
same results, although I am not claiming that it has an exhaustive set of
tests in it.  [it is attached]

-- 
~Randy


[-- Attachment #2: lib-test-log2.patch --]
[-- Type: text/x-patch, Size: 4527 bytes --]

From: Randy Dunlap <rdunlap@infradead.org>

// modified from lib/test_bitops.c

Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
---
 lib/Kconfig.debug |   11 +++
 lib/Makefile      |    1 
 lib/test_log2.c   |  139 ++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 151 insertions(+)

--- linux-next-20201021.orig/lib/Kconfig.debug
+++ linux-next-20201021/lib/Kconfig.debug
@@ -2136,6 +2136,17 @@ config TEST_BITOPS
 
 	  If unsure, say N.
 
+config TEST_LOG2
+	tristate "Test module for log2 interfaces"
+	depends on m
+	help
+	  This builds the "test_log2" module that does basic testing of
+	  all <linux/log2.h> interfaces.
+	  It has no dependencies and doesn't run or load unless
+	  explicitly requested by name.  For example: modprobe test_log2.
+
+	  If unsure, say N.
+
 config TEST_VMALLOC
 	tristate "Test module for stress/performance analysis of vmalloc allocator"
 	default n
--- linux-next-20201021.orig/lib/Makefile
+++ linux-next-20201021/lib/Makefile
@@ -62,6 +62,7 @@ obj-$(CONFIG_TEST_BPF) += test_bpf.o
 obj-$(CONFIG_TEST_FIRMWARE) += test_firmware.o
 obj-$(CONFIG_TEST_BITOPS) += test_bitops.o
 CFLAGS_test_bitops.o += -Werror
+obj-$(CONFIG_TEST_LOG2) += test_log2.o
 obj-$(CONFIG_TEST_SYSCTL) += test_sysctl.o
 obj-$(CONFIG_TEST_HASH) += test_hash.o test_siphash.o
 obj-$(CONFIG_TEST_IDA) += test_ida.o
--- /dev/null
+++ linux-next-20201021/lib/test_log2.c
@@ -0,0 +1,139 @@
+// SPDX-License-Identifier: GPL-2.0-only
+
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+
+// currently missing tests of roundup/rounddown macros;
+
+#include <linux/init.h>
+#include <linux/module.h>
+#include <linux/printk.h>
+#include <linux/log2.h>
+#include <asm/page.h>
+
+/*
+ * a tiny module for testing all interfaces in <linux/log2.h>
+ */
+
+static unsigned int test_power_of_2[][2] = {
+	{0, 0},
+	{PAGE_SIZE, 1},
+	{65535, 0},
+	{65536, 1},
+	{(unsigned int)~0, 0},
+	{0x80000000, 1},
+	{0x80008000, 0},
+};
+
+static unsigned int test_ilog2_uint[][2] = {
+	//{0, -1},
+	{1, 0},
+	{2, 1},
+	{4, 2},
+	{(unsigned int)~0, 31},
+	{0x80000000, 31},
+	{0x80008000, 31},
+};
+
+static unsigned long test_ilog2_ulong[][2] = {
+	//{0, -1},
+	{1, 0},
+	{2, 1},
+	{4, 2},
+	{(unsigned long)~0, 63},
+	{0x80000000UL, 31},
+	{0x80008000UL, 31},
+};
+
+static unsigned int order_base2[][2] = {
+	{0x00000003,  2},
+	{0x00000004,  2},
+	{0x00001fff, 13},
+	{0x00002000, 13},
+	{0x50000000, 31},
+	{0x80000000, 31},
+	{0x80003000, 32},
+};
+
+//#ifdef CONFIG_64BIT
+static unsigned long order_base2_long[][2] = {
+	{0x0000000300000000, 34},
+	{0x0000000400000000, 34},
+	{0x00001fff00000000, 45},
+	{0x0000200000000000, 45},
+	{0x5000000000000000, 63},
+	{0x8000000000000000, 63},
+	{0x8000300000000000, 64},
+};
+//#endif
+
+static unsigned int test_bits_per[][2] = {
+	{0, 1},
+	{1, 1},
+	{2, 2},
+	{3, 2},
+	{4, 3},
+	{0x0fff, 12},
+	{0x1000, 13},
+	{0x80003000, 32},
+};
+
+static int __init test_log2_start(void)
+{
+	int i;
+
+	pr_info("Starting log2 tests\n");
+
+	for (i = 0; i < ARRAY_SIZE(test_power_of_2); i++) {
+		if (is_power_of_2(test_power_of_2[i][0]) != test_power_of_2[i][1])
+			pr_warn("is_power_of_2 wrong for %x\n",
+				       test_power_of_2[i][0]);
+	}
+
+	for (i = 0; i < ARRAY_SIZE(test_ilog2_uint); i++) {
+		if (ilog2(test_ilog2_uint[i][0]) != test_ilog2_uint[i][1])
+			pr_warn("ilog2 wrong for uint %x\n",
+				       test_ilog2_uint[i][0]);
+	}
+
+	for (i = 0; i < ARRAY_SIZE(test_ilog2_ulong); i++) {
+		if (ilog2(test_ilog2_ulong[i][0]) != test_ilog2_ulong[i][1])
+			pr_warn("ilog2 wrong for ulong %x\n",
+				       test_ilog2_ulong[i][0]);
+	}
+
+	for (i = 0; i < ARRAY_SIZE(order_base2); i++) {
+		if (order_base_2(order_base2[i][0]) != order_base2[i][1])
+			pr_warn("order_base_2 wrong for %x\n",
+				       order_base2[i][0]);
+	}
+
+//#ifdef CONFIG_64BIT
+	for (i = 0; i < ARRAY_SIZE(order_base2_long); i++) {
+		if (order_base_2(order_base2_long[i][0]) !=
+			       order_base2_long[i][1])
+			pr_warn("order_base_2 wrong for %lx\n",
+				       order_base2_long[i][0]);
+	}
+//#endif
+
+	for (i = 0; i < ARRAY_SIZE(test_bits_per); i++) {
+		if (bits_per(test_bits_per[i][0]) != test_bits_per[i][1])
+			pr_warn("bits_per wrong for %x\n",
+				       test_bits_per[i][0]);
+	}
+
+	pr_info("Completed log2 tests\n");
+
+	return 0;
+}
+
+static void __exit test_log2_fini(void)
+{
+}
+
+module_init(test_log2_start);
+module_exit(test_log2_fini);
+
+MODULE_AUTHOR("Randy Dunlap <rdunlap@infradead.org>");
+MODULE_LICENSE("GPL");
+MODULE_DESCRIPTION("log2 test module");

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: ilog2 vs. GCC inlining heuristics
  2020-10-22  4:01     ` Randy Dunlap
@ 2020-10-22  7:12       ` Jakub Jelinek
  0 siblings, 0 replies; 9+ messages in thread
From: Jakub Jelinek @ 2020-10-22  7:12 UTC (permalink / raw)
  To: Randy Dunlap; +Cc: Christophe Leroy, Peter Zijlstra, linux-toolchains

On Wed, Oct 21, 2020 at 09:01:01PM -0700, Randy Dunlap wrote:
> I ran a (new) lib/test_log2.ko before and after Jakub's patch and got the
> same results, although I am not claiming that it has an exhaustive set of
> tests in it.  [it is attached]

The test most likely tests the non-constant paths (which is something
I haven't changed), unless all the loops would be completely unrolled,
all the arrays determined to be never written to and turned into const and
all the loads from those arrays optimized into constants.

If you want to test both the non-constant and constant paths, it would
be good to make sure to guarantee that.  For non-constant, e.g. use those
arrays but add some optimization barrier so that the compiler can't
optimize them into constants (e.g. make the arrays volatile, or add
__asm__ ("" : : "g" (test_power_of_2) : "memory");
etc. barriers at the start of test function etc.
And for constants, perhaps put the test values into macros, through
which to place it both
#define TEST_ILOG2_UINT \
  TEST (1, 0) TEST (2, 1) TEST (4, 2) TEST (~0U, 31) \
  TEST (0x80000000U, 31)
and once define TEST(x, y) as { x, y }, then #undef TEST
and in another place as if (ilog2(x) != y) ...

Also, you do want to test even behavior of ilog2 at 0, e.g. the
powerpc fls/fls64 implementations are incorrect for that value as could be
seen with -fsanitize=undefined.

And you do want to test also unsigned long long.

	Jakub


^ permalink raw reply	[flat|nested] 9+ messages in thread

* [tip: core/core] ilog2 vs. GCC inlining heuristics
  2020-10-21 13:27 ilog2 vs. GCC inlining heuristics Jakub Jelinek
  2020-10-21 13:36 ` Christophe Leroy
  2020-10-21 15:19 ` Peter Zijlstra
@ 2020-11-20 12:33 ` tip-bot2 for Jakub Jelinek
  2020-11-20 12:41   ` Peter Zijlstra
  2 siblings, 1 reply; 9+ messages in thread
From: tip-bot2 for Jakub Jelinek @ 2020-11-20 12:33 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: Jakub Jelinek, Peter Zijlstra (Intel), x86, linux-kernel

The following commit has been merged into the core/core branch of tip:

Commit-ID:     ecbd43f6728a5cf79c8b50ed326658e9181531b1
Gitweb:        https://git.kernel.org/tip/ecbd43f6728a5cf79c8b50ed326658e9181531b1
Author:        Jakub Jelinek <jakub@redhat.com>
AuthorDate:    Wed, 21 Oct 2020 15:27:18 +02:00
Committer:     Peter Zijlstra <peterz@infradead.org>
CommitterDate: Thu, 19 Nov 2020 11:26:18 +01:00

ilog2 vs. GCC inlining heuristics

Hi!

Based on the GCC PR97445 discussions, I'd like to propose following change,
which should significantly decrease the amount of code in inline functions
that use ilog2, but as I'm already two decades out of the Linux kernel
development, I'd appreciate if some kernel developer could try that (all
I have done is check that it gives the same results as before) and if it
works submit it for inclusion into the kernel?

Thanks.

Improve ilog2 for constant arguments

As discussed in https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97445
the const_ilog2 macro generates a lot of code which interferes badly
with GCC inlining heuristics, until it can be proven that the ilog2
argument can or can't be simplified into a constant.

It can be expressed using __builtin_clzll builtin which is supported
by GCC 3.4 and later and when used only in the __builtin_constant_p guarded
code it ought to always fold back to a constant.
Other compilers support the same builtin for many years too.

Other option would be to change the const_ilog2 macro, though as the
description says it is meant to be used also in C constant expressions,
and while GCC will fold it to constant with constant argument even in
those, perhaps it is better to avoid using extensions in that case.

Signed-off-by: Jakub Jelinek <jakub@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20201021132718.GB2176@tucnak
---
 include/linux/log2.h | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/include/linux/log2.h b/include/linux/log2.h
index c619ec6..4307d34 100644
--- a/include/linux/log2.h
+++ b/include/linux/log2.h
@@ -156,7 +156,8 @@ unsigned long __rounddown_pow_of_two(unsigned long n)
 #define ilog2(n) \
 ( \
 	__builtin_constant_p(n) ?	\
-	const_ilog2(n) :		\
+	((n) < 2 ? 0 :			\
+	 63 - __builtin_clzll (n)) :	\
 	(sizeof(n) <= 4) ?		\
 	__ilog2_u32(n) :		\
 	__ilog2_u64(n)			\

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [tip: core/core] ilog2 vs. GCC inlining heuristics
  2020-11-20 12:33 ` [tip: core/core] " tip-bot2 for Jakub Jelinek
@ 2020-11-20 12:41   ` Peter Zijlstra
  0 siblings, 0 replies; 9+ messages in thread
From: Peter Zijlstra @ 2020-11-20 12:41 UTC (permalink / raw)
  To: linux-kernel; +Cc: linux-tip-commits, Jakub Jelinek, x86


Sorry, I typoed the branch name. I'll make this branch go away.

Anyway, Jacub, your patch seems to not upset the robots, so I'll go post
it properly for you.

On Fri, Nov 20, 2020 at 12:33:58PM -0000, tip-bot2 for Jakub Jelinek wrote:
> The following commit has been merged into the core/core branch of tip:
> 
> Commit-ID:     ecbd43f6728a5cf79c8b50ed326658e9181531b1
> Gitweb:        https://git.kernel.org/tip/ecbd43f6728a5cf79c8b50ed326658e9181531b1
> Author:        Jakub Jelinek <jakub@redhat.com>
> AuthorDate:    Wed, 21 Oct 2020 15:27:18 +02:00
> Committer:     Peter Zijlstra <peterz@infradead.org>
> CommitterDate: Thu, 19 Nov 2020 11:26:18 +01:00
> 
> ilog2 vs. GCC inlining heuristics
> 
> Hi!
> 
> Based on the GCC PR97445 discussions, I'd like to propose following change,
> which should significantly decrease the amount of code in inline functions
> that use ilog2, but as I'm already two decades out of the Linux kernel
> development, I'd appreciate if some kernel developer could try that (all
> I have done is check that it gives the same results as before) and if it
> works submit it for inclusion into the kernel?
> 
> Thanks.
> 
> Improve ilog2 for constant arguments
> 
> As discussed in https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97445
> the const_ilog2 macro generates a lot of code which interferes badly
> with GCC inlining heuristics, until it can be proven that the ilog2
> argument can or can't be simplified into a constant.
> 
> It can be expressed using __builtin_clzll builtin which is supported
> by GCC 3.4 and later and when used only in the __builtin_constant_p guarded
> code it ought to always fold back to a constant.
> Other compilers support the same builtin for many years too.
> 
> Other option would be to change the const_ilog2 macro, though as the
> description says it is meant to be used also in C constant expressions,
> and while GCC will fold it to constant with constant argument even in
> those, perhaps it is better to avoid using extensions in that case.
> 
> Signed-off-by: Jakub Jelinek <jakub@redhat.com>
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> Link: https://lkml.kernel.org/r/20201021132718.GB2176@tucnak
> ---
>  include/linux/log2.h | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/include/linux/log2.h b/include/linux/log2.h
> index c619ec6..4307d34 100644
> --- a/include/linux/log2.h
> +++ b/include/linux/log2.h
> @@ -156,7 +156,8 @@ unsigned long __rounddown_pow_of_two(unsigned long n)
>  #define ilog2(n) \
>  ( \
>  	__builtin_constant_p(n) ?	\
> -	const_ilog2(n) :		\
> +	((n) < 2 ? 0 :			\
> +	 63 - __builtin_clzll (n)) :	\
>  	(sizeof(n) <= 4) ?		\
>  	__ilog2_u32(n) :		\
>  	__ilog2_u64(n)			\

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2020-11-20 12:41 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-10-21 13:27 ilog2 vs. GCC inlining heuristics Jakub Jelinek
2020-10-21 13:36 ` Christophe Leroy
2020-10-21 13:45   ` Jakub Jelinek
2020-10-21 15:19 ` Peter Zijlstra
2020-10-21 18:40   ` Christophe Leroy
2020-10-22  4:01     ` Randy Dunlap
2020-10-22  7:12       ` Jakub Jelinek
2020-11-20 12:33 ` [tip: core/core] " tip-bot2 for Jakub Jelinek
2020-11-20 12:41   ` Peter Zijlstra

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.