All of lore.kernel.org
 help / color / mirror / Atom feed
* [GIT PULL] optimize 64-by-32 ddivision for constant divisors on 32-bit machines
@ 2015-11-17  1:20 Nicolas Pitre
  2015-11-19 16:28 ` Arnd Bergmann
  2015-11-22 22:18 ` Arnd Bergmann
  0 siblings, 2 replies; 15+ messages in thread
From: Nicolas Pitre @ 2015-11-17  1:20 UTC (permalink / raw)
  To: Arnd Bergmann; +Cc: Russell King - ARM Linux, linux-arch, linux-kernel

Arnd,

Please pull the following branch:

	git://git.linaro.org/people/nicolas.pitre/linux div64

This contains those patches I've initially posted here:

	https://lkml.org/lkml/2015/11/2/715

Only changes to those posted patches are cosmetic improvements such as 
the use of ilog2() replacing the custom __div64_ffs(). Exposure in 
linux-next would be a good thing.

I also included fixes for a couple do_div() misuses that an allyesconfig 
build turned up after switching ARM to the generic do_div() code.  
Those patches have been posted separately and addressed to relevant 
maintainers. They are included here until/unless those maintainers 
include those patches in their tree.

Original cover letter:

This is a generalization of the optimization I produced for ARM a decade
ago to turn constant divisors into a multiplication by the divisor
reciprocal. Turns out that after all those years gcc is still not
optimizing things on its own for that case.

This has important performance benefits as discussed in this thread:

	https://lkml.org/lkml/2015/10/28/851

This series brings the formerly ARM-only optimization to all 32-bit
architectures using C code by default.  The possibility for the actual
multiplication to be implemented in assembly is provided in order to get
optimal code.  The ARM version can be used as an example implementation
for other interested architectures to implement.


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [GIT PULL] optimize 64-by-32 ddivision for constant divisors on 32-bit machines
  2015-11-17  1:20 [GIT PULL] optimize 64-by-32 ddivision for constant divisors on 32-bit machines Nicolas Pitre
@ 2015-11-19 16:28 ` Arnd Bergmann
  2015-11-20  0:29   ` Nicolas Pitre
  2015-11-22 22:18 ` Arnd Bergmann
  1 sibling, 1 reply; 15+ messages in thread
From: Arnd Bergmann @ 2015-11-19 16:28 UTC (permalink / raw)
  To: Nicolas Pitre; +Cc: Russell King - ARM Linux, linux-arch, linux-kernel

On Monday 16 November 2015 20:20:38 Nicolas Pitre wrote:
> Arnd,
> 
> Please pull the following branch:
> 
> 	git://git.linaro.org/people/nicolas.pitre/linux div64
> 
> This contains those patches I've initially posted here:
> 
> 	https://lkml.org/lkml/2015/11/2/715
> 
> Only changes to those posted patches are cosmetic improvements such as 
> the use of ilog2() replacing the custom __div64_ffs(). Exposure in 
> linux-next would be a good thing.
> 
> I also included fixes for a couple do_div() misuses that an allyesconfig 
> build turned up after switching ARM to the generic do_div() code.  
> Those patches have been posted separately and addressed to relevant 
> maintainers. They are included here until/unless those maintainers 
> include those patches in their tree.
> 
> Original cover letter:
> 
> This is a generalization of the optimization I produced for ARM a decade
> ago to turn constant divisors into a multiplication by the divisor
> reciprocal. Turns out that after all those years gcc is still not
> optimizing things on its own for that case.
> 
> This has important performance benefits as discussed in this thread:
> 
> 	https://lkml.org/lkml/2015/10/28/851
> 
> This series brings the formerly ARM-only optimization to all 32-bit
> architectures using C code by default.  The possibility for the actual
> multiplication to be implemented in assembly is provided in order to get
> optimal code.  The ARM version can be used as an example implementation
> for other interested architectures to implement.
> 

Pulled into my asm-generic tree now, it should show up in linux-next
tomorrow.

Thanks,

	Arnd

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [GIT PULL] optimize 64-by-32 ddivision for constant divisors on 32-bit machines
  2015-11-19 16:28 ` Arnd Bergmann
@ 2015-11-20  0:29   ` Nicolas Pitre
  2015-11-20 12:27     ` Arnd Bergmann
  0 siblings, 1 reply; 15+ messages in thread
From: Nicolas Pitre @ 2015-11-20  0:29 UTC (permalink / raw)
  To: Arnd Bergmann; +Cc: Russell King - ARM Linux, linux-arch, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 555 bytes --]

On Thu, 19 Nov 2015, Arnd Bergmann wrote:

> On Monday 16 November 2015 20:20:38 Nicolas Pitre wrote:
> > Arnd,
> > 
> > Please pull the following branch:
> > 
> > 	git://git.linaro.org/people/nicolas.pitre/linux div64
> 
> Pulled into my asm-generic tree now, it should show up in linux-next
> tomorrow.

Thanks.

I removed a small optimization from the top commit that was included by 
mistake into a patch that was supposed to produce identical code 
(noticed by Måns). It's not urgent but it would be good if you could 
repull at some point.


Nicolas

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [GIT PULL] optimize 64-by-32 ddivision for constant divisors on 32-bit machines
  2015-11-20  0:29   ` Nicolas Pitre
@ 2015-11-20 12:27     ` Arnd Bergmann
  2015-11-20 13:24       ` Nicolas Pitre
  0 siblings, 1 reply; 15+ messages in thread
From: Arnd Bergmann @ 2015-11-20 12:27 UTC (permalink / raw)
  To: Nicolas Pitre; +Cc: Russell King - ARM Linux, linux-arch, linux-kernel

On Thursday 19 November 2015 19:29:33 Nicolas Pitre wrote:
> On Thu, 19 Nov 2015, Arnd Bergmann wrote:
> 
> > On Monday 16 November 2015 20:20:38 Nicolas Pitre wrote:
> > > Arnd,
> > > 
> > > Please pull the following branch:
> > > 
> > >     git://git.linaro.org/people/nicolas.pitre/linux div64
> > 
> > Pulled into my asm-generic tree now, it should show up in linux-next
> > tomorrow.
> 
> Thanks.
> 
> I removed a small optimization from the top commit that was included by 
> mistake into a patch that was supposed to produce identical code 
> (noticed by Måns). It's not urgent but it would be good if you could 
> repull at some point.

I already applied another patch on top, could you send a relative
patch instead? If that ends up being ugly, I can redo the branch,
but I generally prefer to not rebase things I have on kernel.org that I
plan to send to Linus.

	Arnd

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [GIT PULL] optimize 64-by-32 ddivision for constant divisors on 32-bit machines
  2015-11-20 12:27     ` Arnd Bergmann
@ 2015-11-20 13:24       ` Nicolas Pitre
  0 siblings, 0 replies; 15+ messages in thread
From: Nicolas Pitre @ 2015-11-20 13:24 UTC (permalink / raw)
  To: Arnd Bergmann; +Cc: Russell King - ARM Linux, linux-arch, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 1104 bytes --]

On Fri, 20 Nov 2015, Arnd Bergmann wrote:

> On Thursday 19 November 2015 19:29:33 Nicolas Pitre wrote:
> > On Thu, 19 Nov 2015, Arnd Bergmann wrote:
> > 
> > > On Monday 16 November 2015 20:20:38 Nicolas Pitre wrote:
> > > > Arnd,
> > > > 
> > > > Please pull the following branch:
> > > > 
> > > >     git://git.linaro.org/people/nicolas.pitre/linux div64
> > > 
> > > Pulled into my asm-generic tree now, it should show up in linux-next
> > > tomorrow.
> > 
> > Thanks.
> > 
> > I removed a small optimization from the top commit that was included by 
> > mistake into a patch that was supposed to produce identical code 
> > (noticed by Måns). It's not urgent but it would be good if you could 
> > repull at some point.
> 
> I already applied another patch on top, could you send a relative
> patch instead? If that ends up being ugly, I can redo the branch,
> but I generally prefer to not rebase things I have on kernel.org that I
> plan to send to Linus.

Alternately we can leave things as they are.  There is no bugs here, 
just that the changelog doesn't exactly describe the change.


Nicolas

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [GIT PULL] optimize 64-by-32 ddivision for constant divisors on 32-bit machines
  2015-11-17  1:20 [GIT PULL] optimize 64-by-32 ddivision for constant divisors on 32-bit machines Nicolas Pitre
  2015-11-19 16:28 ` Arnd Bergmann
@ 2015-11-22 22:18 ` Arnd Bergmann
  2015-11-22 22:28   ` Nicolas Pitre
  1 sibling, 1 reply; 15+ messages in thread
From: Arnd Bergmann @ 2015-11-22 22:18 UTC (permalink / raw)
  To: Nicolas Pitre, Russell King - ARM Linux, linux-arch, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 2500 bytes --]

On Monday 16 November 2015 20:20:38 you wrote:
> Arnd,
> 
> Please pull the following branch:
> 
> 	git://git.linaro.org/people/nicolas.pitre/linux div64
> 
> This contains those patches I've initially posted here:
> 
> 	https://lkml.org/lkml/2015/11/2/715
> 
> Only changes to those posted patches are cosmetic improvements such as 
> the use of ilog2() replacing the custom __div64_ffs(). Exposure in 
> linux-next would be a good thing.
> 
> I also included fixes for a couple do_div() misuses that an allyesconfig 
> build turned up after switching ARM to the generic do_div() code.  
> Those patches have been posted separately and addressed to relevant 
> maintainers. They are included here until/unless those maintainers 
> include those patches in their tree.
> 
> Original cover letter:
> 
> This is a generalization of the optimization I produced for ARM a decade
> ago to turn constant divisors into a multiplication by the divisor
> reciprocal. Turns out that after all those years gcc is still not
> optimizing things on its own for that case.
> 
> This has important performance benefits as discussed in this thread:
> 
> 	https://lkml.org/lkml/2015/10/28/851
> 
> This series brings the formerly ARM-only optimization to all 32-bit
> architectures using C code by default.  The possibility for the actual
> multiplication to be implemented in assembly is provided in order to get
> optimal code.  The ARM version can be used as an example implementation
> for other interested architectures to implement.

I'm now getting a build regressing with the attached randconfig configuration,
when compiling drivers/net/wireless/iwlegacy/common.o:

drivers/built-in.o: In function `il_send_rxon_timing':
:(.text+0xbbac80): undefined reference to `__aeabi_uldivmod'
:(.text+0xbbac9c): undefined reference to `__aeabi_uldivmod'
:(.text+0xbbacdc): undefined reference to `__aeabi_uldivmod'
:(.text+0xbbadc8): undefined reference to `__aeabi_uldivmod'
:(.text+0xbbadf8): undefined reference to `__aeabi_uldivmod'
:(.text+0xbbae3c): more undefined references to `__aeabi_uldivmod' follow
drivers/built-in.o: In function `il_send_rxon_timing':
:(.text+0xbbb11c): undefined reference to `____ilog2_NaN'

I've verified that this goes away if I turn off CONFIG_PROFILE_ALL_BRANCHES,
and it only shows up with gcc-5.0 through gcc-5.2.1, but not 4.9.3.

Aside from those, I have not been able to reduce the failure scenario.

Also, I have only tested on ARM32, no idea if this shows up elsewhere.

	Arnd

[-- Attachment #2: 0x2673AB84-config.gz --]
[-- Type: application/gzip, Size: 23757 bytes --]

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [GIT PULL] optimize 64-by-32 ddivision for constant divisors on 32-bit machines
  2015-11-22 22:18 ` Arnd Bergmann
@ 2015-11-22 22:28   ` Nicolas Pitre
  2015-11-23  7:59     ` Arnd Bergmann
  0 siblings, 1 reply; 15+ messages in thread
From: Nicolas Pitre @ 2015-11-22 22:28 UTC (permalink / raw)
  To: Arnd Bergmann; +Cc: Russell King - ARM Linux, linux-arch, linux-kernel

On Sun, 22 Nov 2015, Arnd Bergmann wrote:

> On Monday 16 November 2015 20:20:38 you wrote:
> > Arnd,
> > 
> > Please pull the following branch:
> > 
> > 	git://git.linaro.org/people/nicolas.pitre/linux div64
> > 
> > This contains those patches I've initially posted here:
> > 
> > 	https://lkml.org/lkml/2015/11/2/715
> > 
> > Only changes to those posted patches are cosmetic improvements such as 
> > the use of ilog2() replacing the custom __div64_ffs(). Exposure in 
> > linux-next would be a good thing.
> > 
> > I also included fixes for a couple do_div() misuses that an allyesconfig 
> > build turned up after switching ARM to the generic do_div() code.  
> > Those patches have been posted separately and addressed to relevant 
> > maintainers. They are included here until/unless those maintainers 
> > include those patches in their tree.
> > 
> > Original cover letter:
> > 
> > This is a generalization of the optimization I produced for ARM a decade
> > ago to turn constant divisors into a multiplication by the divisor
> > reciprocal. Turns out that after all those years gcc is still not
> > optimizing things on its own for that case.
> > 
> > This has important performance benefits as discussed in this thread:
> > 
> > 	https://lkml.org/lkml/2015/10/28/851
> > 
> > This series brings the formerly ARM-only optimization to all 32-bit
> > architectures using C code by default.  The possibility for the actual
> > multiplication to be implemented in assembly is provided in order to get
> > optimal code.  The ARM version can be used as an example implementation
> > for other interested architectures to implement.
> 
> I'm now getting a build regressing with the attached randconfig configuration,
> when compiling drivers/net/wireless/iwlegacy/common.o:
> 
> drivers/built-in.o: In function `il_send_rxon_timing':
> :(.text+0xbbac80): undefined reference to `__aeabi_uldivmod'
> :(.text+0xbbac9c): undefined reference to `__aeabi_uldivmod'
> :(.text+0xbbacdc): undefined reference to `__aeabi_uldivmod'
> :(.text+0xbbadc8): undefined reference to `__aeabi_uldivmod'
> :(.text+0xbbadf8): undefined reference to `__aeabi_uldivmod'
> :(.text+0xbbae3c): more undefined references to `__aeabi_uldivmod' follow
> drivers/built-in.o: In function `il_send_rxon_timing':
> :(.text+0xbbb11c): undefined reference to `____ilog2_NaN'

This looks like some gcc bug from a few years ago.

> I've verified that this goes away if I turn off CONFIG_PROFILE_ALL_BRANCHES,
> and it only shows up with gcc-5.0 through gcc-5.2.1, but not 4.9.3.
> 
> Aside from those, I have not been able to reduce the failure scenario.
> 
> Also, I have only tested on ARM32, no idea if this shows up elsewhere.

What if you revert "ARM: asm/div64.h: adjust to generic codde" and 
recompile?


Nicolas

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [GIT PULL] optimize 64-by-32 ddivision for constant divisors on 32-bit machines
  2015-11-22 22:28   ` Nicolas Pitre
@ 2015-11-23  7:59     ` Arnd Bergmann
  2015-11-23 14:59       ` Nicolas Pitre
  0 siblings, 1 reply; 15+ messages in thread
From: Arnd Bergmann @ 2015-11-23  7:59 UTC (permalink / raw)
  To: Nicolas Pitre; +Cc: Russell King - ARM Linux, linux-arch, linux-kernel

On Sunday 22 November 2015 17:28:33 Nicolas Pitre wrote:
> On Sun, 22 Nov 2015, Arnd Bergmann wrote:
> > On Monday 16 November 2015 20:20:38 you wrote:
> > I'm now getting a build regressing with the attached randconfig configuration,
> > when compiling drivers/net/wireless/iwlegacy/common.o:
> > 
> > drivers/built-in.o: In function `il_send_rxon_timing':
> > :(.text+0xbbac80): undefined reference to `__aeabi_uldivmod'
> > :(.text+0xbbac9c): undefined reference to `__aeabi_uldivmod'
> > :(.text+0xbbacdc): undefined reference to `__aeabi_uldivmod'
> > :(.text+0xbbadc8): undefined reference to `__aeabi_uldivmod'
> > :(.text+0xbbadf8): undefined reference to `__aeabi_uldivmod'
> > :(.text+0xbbae3c): more undefined references to `__aeabi_uldivmod' follow
> > drivers/built-in.o: In function `il_send_rxon_timing':
> > :(.text+0xbbb11c): undefined reference to `____ilog2_NaN'
> 
> This looks like some gcc bug from a few years ago.

Yes, I clearly remember debugging something in this area before,
but I don't remember what we did about it.

We already have the workaround for OABI (which you kept), and I have
another workaround to disable the optimized function for ARMv3 as
well, as newer gcc versions also get that wrong here (internal compiler
error IIRC).

We could add yet another such workaround for CONFIG_PROFILE_ALL_BRANCHES,
but I don't have a good feeling about doing this unless we understand
well enough why it breaks. At some point, I did this workaround, which
is still in my randconfig tree:

--- a/include/linux/compiler.h
+++ b/include/linux/compiler.h
@@ -144,7 +144,7 @@ void ftrace_likely_update(struct ftrace_branch_data *f, int val, int expect);
  */
 #define if(cond, ...) __trace_if( (cond , ## __VA_ARGS__) )
 #define __trace_if(cond) \
-       if (__builtin_constant_p((cond)) ? !!(cond) :                   \
+       if (__builtin_constant_p(!!(cond)) ? !!(cond) :                 \
        ({                                                              \
                int ______r;                                            \
                static struct ftrace_branch_data                        \

However, it doesn't seem related to the problem at hand. I think it was
about some false 'maybe uninitialized' warning, but I currently don't see any
difference with or without that patch for either issues.

> > I've verified that this goes away if I turn off CONFIG_PROFILE_ALL_BRANCHES,
> > and it only shows up with gcc-5.0 through gcc-5.2.1, but not 4.9.3.
> > 
> > Aside from those, I have not been able to reduce the failure scenario.
> > 
> > Also, I have only tested on ARM32, no idea if this shows up elsewhere.
> 
> What if you revert "ARM: asm/div64.h: adjust to generic codde" and 
> recompile?

Without that patch, it works again.

	Arnd

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [GIT PULL] optimize 64-by-32 ddivision for constant divisors on 32-bit machines
  2015-11-23  7:59     ` Arnd Bergmann
@ 2015-11-23 14:59       ` Nicolas Pitre
  2015-11-23 16:04         ` Nicolas Pitre
  0 siblings, 1 reply; 15+ messages in thread
From: Nicolas Pitre @ 2015-11-23 14:59 UTC (permalink / raw)
  To: Arnd Bergmann; +Cc: Russell King - ARM Linux, linux-arch, linux-kernel

On Mon, 23 Nov 2015, Arnd Bergmann wrote:

> On Sunday 22 November 2015 17:28:33 Nicolas Pitre wrote:
> > On Sun, 22 Nov 2015, Arnd Bergmann wrote:
> > > On Monday 16 November 2015 20:20:38 you wrote:
> > > I'm now getting a build regressing with the attached randconfig configuration,
> > > when compiling drivers/net/wireless/iwlegacy/common.o:
> > > 
> > > drivers/built-in.o: In function `il_send_rxon_timing':
> > > :(.text+0xbbac80): undefined reference to `__aeabi_uldivmod'
> > > :(.text+0xbbac9c): undefined reference to `__aeabi_uldivmod'
> > > :(.text+0xbbacdc): undefined reference to `__aeabi_uldivmod'
> > > :(.text+0xbbadc8): undefined reference to `__aeabi_uldivmod'
> > > :(.text+0xbbadf8): undefined reference to `__aeabi_uldivmod'
> > > :(.text+0xbbae3c): more undefined references to `__aeabi_uldivmod' follow
> > > drivers/built-in.o: In function `il_send_rxon_timing':
> > > :(.text+0xbbb11c): undefined reference to `____ilog2_NaN'
> > 
> > This looks like some gcc bug from a few years ago.
> 
> Yes, I clearly remember debugging something in this area before,
> but I don't remember what we did about it.
> 
> We already have the workaround for OABI (which you kept), and I have
> another workaround to disable the optimized function for ARMv3 as
> well, as newer gcc versions also get that wrong here (internal compiler
> error IIRC).

Although we no longer support ARMv3 targets, the only reason 
-march=armv3 is used is to prevent gcc from using strh/ldrh instructions 
on RiscPC whose memory bus can't accommodate it.  Still, this would be 
worth reporting a bug for it.

> We could add yet another such workaround for CONFIG_PROFILE_ALL_BRANCHES,
> but I don't have a good feeling about doing this unless we understand
> well enough why it breaks.

I'm able to reproduce it, and I am in the process of reintroducing the 
last patch piece by piece to see what breaks.  The presence of 
____ilog2_NaN is particularly intriguing as this would mean 0 is passed 
to ilog2() somewhere.


Nicolas

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [GIT PULL] optimize 64-by-32 ddivision for constant divisors on 32-bit machines
  2015-11-23 14:59       ` Nicolas Pitre
@ 2015-11-23 16:04         ` Nicolas Pitre
  2015-11-23 16:11           ` Arnd Bergmann
  0 siblings, 1 reply; 15+ messages in thread
From: Nicolas Pitre @ 2015-11-23 16:04 UTC (permalink / raw)
  To: Arnd Bergmann; +Cc: Russell King - ARM Linux, linux-arch, linux-kernel

On Mon, 23 Nov 2015, Nicolas Pitre wrote:

> On Mon, 23 Nov 2015, Arnd Bergmann wrote:
> 
> > On Sunday 22 November 2015 17:28:33 Nicolas Pitre wrote:
> > > On Sun, 22 Nov 2015, Arnd Bergmann wrote:
> > > > On Monday 16 November 2015 20:20:38 you wrote:
> > > > I'm now getting a build regressing with the attached randconfig configuration,
> > > > when compiling drivers/net/wireless/iwlegacy/common.o:
> > > > 
> > > > drivers/built-in.o: In function `il_send_rxon_timing':
> > > > :(.text+0xbbac80): undefined reference to `__aeabi_uldivmod'
> > > > :(.text+0xbbac9c): undefined reference to `__aeabi_uldivmod'
> > > > :(.text+0xbbacdc): undefined reference to `__aeabi_uldivmod'
> > > > :(.text+0xbbadc8): undefined reference to `__aeabi_uldivmod'
> > > > :(.text+0xbbadf8): undefined reference to `__aeabi_uldivmod'
> > > > :(.text+0xbbae3c): more undefined references to `__aeabi_uldivmod' follow
> > > > drivers/built-in.o: In function `il_send_rxon_timing':
> > > > :(.text+0xbbb11c): undefined reference to `____ilog2_NaN'
> > > 
> 
> I'm able to reproduce it, and I am in the process of reintroducing the 
> last patch piece by piece to see what breaks.  The presence of 
> ____ilog2_NaN is particularly intriguing as this would mean 0 is passed 
> to ilog2() somewhere.

OK... I'm able to "fix" the build with:

diff --git a/include/asm-generic/div64.h b/include/asm-generic/div64.h
index 163f77999e..d246c4c801 100644
--- a/include/asm-generic/div64.h
+++ b/include/asm-generic/div64.h
@@ -206,7 +206,7 @@ extern uint32_t __div64_32(uint64_t *dividend, uint32_t divisor);
 	uint32_t __rem;					\
 	(void)(((typeof((n)) *)0) == ((uint64_t *)0));	\
 	if (__builtin_constant_p(__base) &&		\
-	    is_power_of_2(__base)) {			\
+	    is_power_of_2(__base) && __base != 0) {	\
 		__rem = (n) & (__base - 1);		\
 		(n) >>= ilog2(__base);			\
 	} else if (__div64_const32_is_OK &&		\

What doesn't make sense to me is the fact that is_power_of_2() is 
defined as:

static inline __attribute__((const))
bool is_power_of_2(unsigned long n)
{
        return (n != 0 && ((n & (n - 1)) == 0));
}

So the test for zero is already in there.

And adding BUILD_BUG_ON(__builtin_constant_p(__base) && __base == 0) 
before the if doesn't trig either.

Confused.


Nicolas

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [GIT PULL] optimize 64-by-32 ddivision for constant divisors on 32-bit machines
  2015-11-23 16:04         ` Nicolas Pitre
@ 2015-11-23 16:11           ` Arnd Bergmann
  2015-11-23 16:38             ` Nicolas Pitre
  2015-11-24  5:37             ` Nicolas Pitre
  0 siblings, 2 replies; 15+ messages in thread
From: Arnd Bergmann @ 2015-11-23 16:11 UTC (permalink / raw)
  To: Nicolas Pitre; +Cc: Russell King - ARM Linux, linux-arch, linux-kernel

On Monday 23 November 2015 11:04:33 Nicolas Pitre wrote:
> 
> OK... I'm able to "fix" the build with:
> 
> diff --git a/include/asm-generic/div64.h b/include/asm-generic/div64.h
> index 163f77999e..d246c4c801 100644
> --- a/include/asm-generic/div64.h
> +++ b/include/asm-generic/div64.h
> @@ -206,7 +206,7 @@ extern uint32_t __div64_32(uint64_t *dividend, uint32_t divisor);
>         uint32_t __rem;                                 \
>         (void)(((typeof((n)) *)0) == ((uint64_t *)0));  \
>         if (__builtin_constant_p(__base) &&             \
> -           is_power_of_2(__base)) {                    \
> +           is_power_of_2(__base) && __base != 0) {     \
>                 __rem = (n) & (__base - 1);             \
>                 (n) >>= ilog2(__base);                  \
>         } else if (__div64_const32_is_OK &&             \
> 
> What doesn't make sense to me is the fact that is_power_of_2() is 
> defined as:
> 
> static inline __attribute__((const))
> bool is_power_of_2(unsigned long n)
> {
>         return (n != 0 && ((n & (n - 1)) == 0));
> }
> 
> So the test for zero is already in there.
> 
> And adding BUILD_BUG_ON(__builtin_constant_p(__base) && __base == 0) 
> before the if doesn't trig either.

I've seen similarly messed up situations with PROFILE_ALL_BRANCHES
before, I think it's got something to do with how __builtin_constant_p()
is used inside of the __trace_if() macro, and how gcc sometimes falls
back to treating variables as not-really-constant based on context.

To gcc, __builtin_constant_p is just best-effort, and they don't care
about returning false sometimes if they catch most cases in practice.

Note that llvm will always return false for __builtin_constant_p on
non-pointer arguments, which breaks a lot of optimizations.

	Arnd

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [GIT PULL] optimize 64-by-32 ddivision for constant divisors on 32-bit machines
  2015-11-23 16:11           ` Arnd Bergmann
@ 2015-11-23 16:38             ` Nicolas Pitre
  2015-11-24  5:37             ` Nicolas Pitre
  1 sibling, 0 replies; 15+ messages in thread
From: Nicolas Pitre @ 2015-11-23 16:38 UTC (permalink / raw)
  To: Arnd Bergmann; +Cc: Russell King - ARM Linux, linux-arch, linux-kernel

On Mon, 23 Nov 2015, Arnd Bergmann wrote:

> On Monday 23 November 2015 11:04:33 Nicolas Pitre wrote:
> > 
> > OK... I'm able to "fix" the build with:
> > 
> > diff --git a/include/asm-generic/div64.h b/include/asm-generic/div64.h
> > index 163f77999e..d246c4c801 100644
> > --- a/include/asm-generic/div64.h
> > +++ b/include/asm-generic/div64.h
> > @@ -206,7 +206,7 @@ extern uint32_t __div64_32(uint64_t *dividend, uint32_t divisor);
> >         uint32_t __rem;                                 \
> >         (void)(((typeof((n)) *)0) == ((uint64_t *)0));  \
> >         if (__builtin_constant_p(__base) &&             \
> > -           is_power_of_2(__base)) {                    \
> > +           is_power_of_2(__base) && __base != 0) {     \
> >                 __rem = (n) & (__base - 1);             \
> >                 (n) >>= ilog2(__base);                  \
> >         } else if (__div64_const32_is_OK &&             \
> > 
> > What doesn't make sense to me is the fact that is_power_of_2() is 
> > defined as:
> > 
> > static inline __attribute__((const))
> > bool is_power_of_2(unsigned long n)
> > {
> >         return (n != 0 && ((n & (n - 1)) == 0));
> > }
> > 
> > So the test for zero is already in there.
> > 
> > And adding BUILD_BUG_ON(__builtin_constant_p(__base) && __base == 0) 
> > before the if doesn't trig either.
> 
> I've seen similarly messed up situations with PROFILE_ALL_BRANCHES
> before, I think it's got something to do with how __builtin_constant_p()
> is used inside of the __trace_if() macro, and how gcc sometimes falls
> back to treating variables as not-really-constant based on context.
> 
> To gcc, __builtin_constant_p is just best-effort, and they don't care
> about returning false sometimes if they catch most cases in practice.

But here it must have returned true, and is_power_of_2() returned true 
as well (which implies that __base is not zero), ans somehow aving an 
additional __base != 0 test changes the outcome.  There is a correctness 
issue beyond __builtin_constant_p it seems.

> Note that llvm will always return false for __builtin_constant_p on
> non-pointer arguments, which breaks a lot of optimizations.

If llvm is able to optimize this case on its own then we won't need all 
this contraption.


Nicolas

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [GIT PULL] optimize 64-by-32 ddivision for constant divisors on 32-bit machines
  2015-11-23 16:11           ` Arnd Bergmann
  2015-11-23 16:38             ` Nicolas Pitre
@ 2015-11-24  5:37             ` Nicolas Pitre
  2015-11-24 13:28               ` Arnd Bergmann
  1 sibling, 1 reply; 15+ messages in thread
From: Nicolas Pitre @ 2015-11-24  5:37 UTC (permalink / raw)
  To: Arnd Bergmann; +Cc: Russell King - ARM Linux, linux-arch, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 2281 bytes --]

On Mon, 23 Nov 2015, Arnd Bergmann wrote:

> On Monday 23 November 2015 11:04:33 Nicolas Pitre wrote:
> > 
> > OK... I'm able to "fix" the build with:
> > 
> > diff --git a/include/asm-generic/div64.h b/include/asm-generic/div64.h
> > index 163f77999e..d246c4c801 100644
> > --- a/include/asm-generic/div64.h
> > +++ b/include/asm-generic/div64.h
> > @@ -206,7 +206,7 @@ extern uint32_t __div64_32(uint64_t *dividend, uint32_t divisor);
> >         uint32_t __rem;                                 \
> >         (void)(((typeof((n)) *)0) == ((uint64_t *)0));  \
> >         if (__builtin_constant_p(__base) &&             \
> > -           is_power_of_2(__base)) {                    \
> > +           is_power_of_2(__base) && __base != 0) {     \
> >                 __rem = (n) & (__base - 1);             \
> >                 (n) >>= ilog2(__base);                  \
> >         } else if (__div64_const32_is_OK &&             \
> > 
> > What doesn't make sense to me is the fact that is_power_of_2() is 
> > defined as:
> > 
> > static inline __attribute__((const))
> > bool is_power_of_2(unsigned long n)
> > {
> >         return (n != 0 && ((n & (n - 1)) == 0));
> > }
> > 
> > So the test for zero is already in there.
> > 
> > And adding BUILD_BUG_ON(__builtin_constant_p(__base) && __base == 0) 
> > before the if doesn't trig either.
> 
> I've seen similarly messed up situations with PROFILE_ALL_BRANCHES
> before, I think it's got something to do with how __builtin_constant_p()
> is used inside of the __trace_if() macro, and how gcc sometimes falls
> back to treating variables as not-really-constant based on context.
> 
> To gcc, __builtin_constant_p is just best-effort, and they don't care
> about returning false sometimes if they catch most cases in practice.

OK... I produced a minimal test case. I think gcc is screwed. And it is 
not a question of __builtin_constant_p being best effort. The resulting 
code is plain wrong where a variable is suddenly turned into a constant 
of value 0. Any random modification to various part of the code just 
makes it disappear but I didn't check the disassembly to see if it is 
then correct.

And the good news(tm) is that the same bug is triggered with x86 gcc as 
well.

Please try the attached test case.


Nicolas

[-- Attachment #2: Type: application/gzip, Size: 2297 bytes --]

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [GIT PULL] optimize 64-by-32 ddivision for constant divisors on 32-bit machines
  2015-11-24  5:37             ` Nicolas Pitre
@ 2015-11-24 13:28               ` Arnd Bergmann
  2015-11-24 16:44                 ` Nicolas Pitre
  0 siblings, 1 reply; 15+ messages in thread
From: Arnd Bergmann @ 2015-11-24 13:28 UTC (permalink / raw)
  To: Nicolas Pitre; +Cc: Russell King - ARM Linux, linux-arch, linux-kernel

On Tuesday 24 November 2015 00:37:17 Nicolas Pitre wrote:
> On Mon, 23 Nov 2015, Arnd Bergmann wrote:
> 
> > On Monday 23 November 2015 11:04:33 Nicolas Pitre wrote:
> > > 
> > > OK... I'm able to "fix" the build with:
> > > 
> > > diff --git a/include/asm-generic/div64.h b/include/asm-generic/div64.h
> > > index 163f77999e..d246c4c801 100644
> > > --- a/include/asm-generic/div64.h
> > > +++ b/include/asm-generic/div64.h
> > > @@ -206,7 +206,7 @@ extern uint32_t __div64_32(uint64_t *dividend, uint32_t divisor);
> > >         uint32_t __rem;                                 \
> > >         (void)(((typeof((n)) *)0) == ((uint64_t *)0));  \
> > >         if (__builtin_constant_p(__base) &&             \
> > > -           is_power_of_2(__base)) {                    \
> > > +           is_power_of_2(__base) && __base != 0) {     \
> > >                 __rem = (n) & (__base - 1);             \
> > >                 (n) >>= ilog2(__base);                  \
> > >         } else if (__div64_const32_is_OK &&             \
> > > 
> > > What doesn't make sense to me is the fact that is_power_of_2() is 
> > > defined as:
> > > 
> > > static inline __attribute__((const))
> > > bool is_power_of_2(unsigned long n)
> > > {
> > >         return (n != 0 && ((n & (n - 1)) == 0));
> > > }
> > > 
> > > So the test for zero is already in there.
> > > 
> > > And adding BUILD_BUG_ON(__builtin_constant_p(__base) && __base == 0) 
> > > before the if doesn't trig either.
> > 
> > I've seen similarly messed up situations with PROFILE_ALL_BRANCHES
> > before, I think it's got something to do with how __builtin_constant_p()
> > is used inside of the __trace_if() macro, and how gcc sometimes falls
> > back to treating variables as not-really-constant based on context.
> > 
> > To gcc, __builtin_constant_p is just best-effort, and they don't care
> > about returning false sometimes if they catch most cases in practice.
> 
> OK... I produced a minimal test case. I think gcc is screwed. And it is 
> not a question of __builtin_constant_p being best effort. The resulting 
> code is plain wrong where a variable is suddenly turned into a constant 
> of value 0. Any random modification to various part of the code just 
> makes it disappear but I didn't check the disassembly to see if it is 
> then correct.
> 
> And the good news(tm) is that the same bug is triggered with x86 gcc as 
> well.
> 
> Please try the attached test case.

I can confirm the behavior you see with gcc-4.9 and later (I only saw
it on 5.x or later with the kernel code). It seems to have been
introduced with

https://gcc.gnu.org/viewcvs/gcc?view=revision&revision=206941


        PR tree-optimization/59597
        * tree-ssa-threadupdate.c (dump_jump_thread_path): Move to earlier
        in file.  Accept new argument REGISTERING and use it to modify
        dump output appropriately.
        (register_jump_thread): Corresponding changes.
        (mark_threaded_blocks): Reinstate code to cancel unprofitable
        thread paths involving joiner blocks.  Add code to dump cancelled
        jump threading paths.
    
        PR tree-optimization/59597
        * gcc.dg/tree-ssa/pr59597.c: New test.
    
    git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@206941 138bc75d-0d04-0410-961f-82ee72b054a4

 gcc/ChangeLog                           |  11 +++++
 gcc/testsuite/ChangeLog                 |   5 +++
 gcc/testsuite/gcc.dg/tree-ssa/pr59597.c |  55 +++++++++++++++++++++++++
 gcc/tree-ssa-threadupdate.c             | 125 +++++++++++++++++++++++++++++++++++++++------------------
 4 files changed, 157 insertions(+), 39 deletions(-)

however, in that version, the -DHIDE_THE_BUG variant also fails:

compilation that works:
gcc-cross -Wall -I /home/arnd/git/buildall/arm/gcc/gcc/include  -O2 -c -o /home/arnd/git/buildall/arm/gcc_testcase/src.o /home/arnd/git/buildall/arm/gcc_testcase/src.c -DHIDE_THE_BUG
/home/arnd/git/buildall/arm/gcc_testcase/src.c: In function 'il_send_rxon_timing':
/home/arnd/git/buildall/arm/gcc_testcase/src.c:39:30: error: call to '____ilog2_NaN' declared with attribute error: gcc is screwed

compilation that works (v2):
gcc-cross -Wall -I /home/arnd/git/buildall/arm/gcc/gcc/include  -O2 -c -o /home/arnd/git/buildall/arm/gcc_testcase/src.o /home/arnd/git/buildall/arm/gcc_testcase/src.c -DHIDE_THE_BUG_2
as: unrecognized option '-meabi=5'

compilation that fails:
gcc-cross -Wall -I /home/arnd/git/buildall/arm/gcc/gcc/include  -O2 -c -o /home/arnd/git/buildall/arm/gcc_testcase/src.o /home/arnd/git/buildall/arm/gcc_testcase/src.c
/home/arnd/git/buildall/arm/gcc_testcase/src.c: In function 'il_send_rxon_timing':
/home/arnd/git/buildall/arm/gcc_testcase/src.c:39:30: error: call to '____ilog2_NaN' declared with attribute error: gcc is screwed


This changed with

https://gcc.gnu.org/viewcvs/gcc?view=revision&revision=220743

        PR tree-optimization/64823
        * tree-vrp.c (identify_jump_threads): Handle blocks with no real
        statements.
        * tree-ssa-threadedge.c (potentially_threadable_block): Allow
        threading through blocks with PHIs, but no statements.
        (thread_through_normal_block): Distinguish between blocks where
        we did not process all the statements and blocks with no statements.
    
        PR tree-optimization/64823
        * gcc.dg/uninit-20.c: New test.
    
    git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@220743 138bc75d-0d04-0410-961f-82ee72b054a4

 gcc/ChangeLog                    | 10 ++++++++++
 gcc/testsuite/ChangeLog          |  5 +++++
 gcc/testsuite/gcc.dg/uninit-20.c | 18 ++++++++++++++++++
 gcc/tree-ssa-threadedge.c        | 39 ++++++++++++++++++++++++++++++++-------
 gcc/tree-vrp.c                   | 13 ++++++++++---
 5 files changed, 75 insertions(+), 10 deletions(-)


after which only the third run fails but the -DHIDE_THE_BUG one succeeds.

	Arnd

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [GIT PULL] optimize 64-by-32 ddivision for constant divisors on 32-bit machines
  2015-11-24 13:28               ` Arnd Bergmann
@ 2015-11-24 16:44                 ` Nicolas Pitre
  0 siblings, 0 replies; 15+ messages in thread
From: Nicolas Pitre @ 2015-11-24 16:44 UTC (permalink / raw)
  To: Arnd Bergmann; +Cc: Russell King - ARM Linux, linux-arch, linux-kernel

On Tue, 24 Nov 2015, Arnd Bergmann wrote:

> On Tuesday 24 November 2015 00:37:17 Nicolas Pitre wrote:
> > OK... I produced a minimal test case. I think gcc is screwed. And it is 
> > not a question of __builtin_constant_p being best effort. The resulting 
> > code is plain wrong where a variable is suddenly turned into a constant 
> > of value 0. Any random modification to various part of the code just 
> > makes it disappear but I didn't check the disassembly to see if it is 
> > then correct.
> > 
> > And the good news(tm) is that the same bug is triggered with x86 gcc as 
> > well.
> > 
> > Please try the attached test case.
> 
> I can confirm the behavior you see with gcc-4.9 and later (I only saw
> it on 5.x or later with the kernel code). 

It is really sensitive to code modifications. You simplify it somehow 
and the bug is gone.  So there might be some kind of complexity threshold 
that gcc-4.9 didn't reach with the kernel code.

> It seems to have been
> introduced with
> 
> https://gcc.gnu.org/viewcvs/gcc?view=revision&revision=206941
> 
> 
>         PR tree-optimization/59597
>         * tree-ssa-threadupdate.c (dump_jump_thread_path): Move to earlier
>         in file.  Accept new argument REGISTERING and use it to modify
>         dump output appropriately.
>         (register_jump_thread): Corresponding changes.
>         (mark_threaded_blocks): Reinstate code to cancel unprofitable
>         thread paths involving joiner blocks.  Add code to dump cancelled
>         jump threading paths.
>     
>         PR tree-optimization/59597
>         * gcc.dg/tree-ssa/pr59597.c: New test.
>     
>     git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@206941 138bc75d-0d04-0410-961f-82ee72b054a4
> 
>  gcc/ChangeLog                           |  11 +++++
>  gcc/testsuite/ChangeLog                 |   5 +++
>  gcc/testsuite/gcc.dg/tree-ssa/pr59597.c |  55 +++++++++++++++++++++++++
>  gcc/tree-ssa-threadupdate.c             | 125 +++++++++++++++++++++++++++++++++++++++------------------
>  4 files changed, 157 insertions(+), 39 deletions(-)
> 
> however, in that version, the -DHIDE_THE_BUG variant also fails:
> 
> compilation that works:
> gcc-cross -Wall -I /home/arnd/git/buildall/arm/gcc/gcc/include  -O2 -c -o /home/arnd/git/buildall/arm/gcc_testcase/src.o /home/arnd/git/buildall/arm/gcc_testcase/src.c -DHIDE_THE_BUG
> /home/arnd/git/buildall/arm/gcc_testcase/src.c: In function 'il_send_rxon_timing':
> /home/arnd/git/buildall/arm/gcc_testcase/src.c:39:30: error: call to '____ilog2_NaN' declared with attribute error: gcc is screwed

IMHO the fact that the -DHIDE_THE_BUG case is enough to make the bug go 
away sometimes is even more worrisome given the nature of the 
"workaround".

Given you have a good handle with the bad commits already, are you going 
to bring it up with the gcc people?


Nicolas

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2015-11-24 16:44 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-11-17  1:20 [GIT PULL] optimize 64-by-32 ddivision for constant divisors on 32-bit machines Nicolas Pitre
2015-11-19 16:28 ` Arnd Bergmann
2015-11-20  0:29   ` Nicolas Pitre
2015-11-20 12:27     ` Arnd Bergmann
2015-11-20 13:24       ` Nicolas Pitre
2015-11-22 22:18 ` Arnd Bergmann
2015-11-22 22:28   ` Nicolas Pitre
2015-11-23  7:59     ` Arnd Bergmann
2015-11-23 14:59       ` Nicolas Pitre
2015-11-23 16:04         ` Nicolas Pitre
2015-11-23 16:11           ` Arnd Bergmann
2015-11-23 16:38             ` Nicolas Pitre
2015-11-24  5:37             ` Nicolas Pitre
2015-11-24 13:28               ` Arnd Bergmann
2015-11-24 16:44                 ` Nicolas Pitre

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.