All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] powerpc/vdso32: Add missing _restgpr_31_x to fix build failure
@ 2021-03-09  6:19 ` Christophe Leroy
  0 siblings, 0 replies; 17+ messages in thread
From: Christophe Leroy @ 2021-03-09  6:19 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman, segher
  Cc: linux-kernel, linuxppc-dev

With some defconfig including CONFIG_CC_OPTIMIZE_FOR_SIZE,
(for instance mvme5100_defconfig and ps3_defconfig), gcc 5
generates a call to _restgpr_31_x.

Until recently it went unnoticed, but
commit 42ed6d56ade2 ("powerpc/vdso: Block R_PPC_REL24 relocations")
made it rise to the surface.

Provide that function (copied from lib/crtsavres.S) in
gettimeofday.S

Fixes: ab037dd87a2f ("powerpc/vdso: Switch VDSO to generic C implementation.")
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
---
I don't know if there is a way to tell GCC not to emit that call, because at the end we get more instructions than needed.
---
 arch/powerpc/kernel/vdso32/gettimeofday.S | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/arch/powerpc/kernel/vdso32/gettimeofday.S b/arch/powerpc/kernel/vdso32/gettimeofday.S
index a6e29f880e0e..d21d08140a5e 100644
--- a/arch/powerpc/kernel/vdso32/gettimeofday.S
+++ b/arch/powerpc/kernel/vdso32/gettimeofday.S
@@ -65,3 +65,14 @@ V_FUNCTION_END(__kernel_clock_getres)
 V_FUNCTION_BEGIN(__kernel_time)
 	cvdso_call_time __c_kernel_time
 V_FUNCTION_END(__kernel_time)
+
+/* Routines for restoring integer registers, called by the compiler.  */
+/* Called with r11 pointing to the stack header word of the caller of the */
+/* function, just beyond the end of the integer restore area.  */
+_GLOBAL(_restgpr_31_x)
+_GLOBAL(_rest32gpr_31_x)
+	lwz	r0,4(r11)
+	lwz	r31,-4(r11)
+	mtlr	r0
+	mr	r1,r11
+	blr
-- 
2.25.0


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH] powerpc/vdso32: Add missing _restgpr_31_x to fix build failure
@ 2021-03-09  6:19 ` Christophe Leroy
  0 siblings, 0 replies; 17+ messages in thread
From: Christophe Leroy @ 2021-03-09  6:19 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman, segher
  Cc: linuxppc-dev, linux-kernel

With some defconfig including CONFIG_CC_OPTIMIZE_FOR_SIZE,
(for instance mvme5100_defconfig and ps3_defconfig), gcc 5
generates a call to _restgpr_31_x.

Until recently it went unnoticed, but
commit 42ed6d56ade2 ("powerpc/vdso: Block R_PPC_REL24 relocations")
made it rise to the surface.

Provide that function (copied from lib/crtsavres.S) in
gettimeofday.S

Fixes: ab037dd87a2f ("powerpc/vdso: Switch VDSO to generic C implementation.")
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
---
I don't know if there is a way to tell GCC not to emit that call, because at the end we get more instructions than needed.
---
 arch/powerpc/kernel/vdso32/gettimeofday.S | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/arch/powerpc/kernel/vdso32/gettimeofday.S b/arch/powerpc/kernel/vdso32/gettimeofday.S
index a6e29f880e0e..d21d08140a5e 100644
--- a/arch/powerpc/kernel/vdso32/gettimeofday.S
+++ b/arch/powerpc/kernel/vdso32/gettimeofday.S
@@ -65,3 +65,14 @@ V_FUNCTION_END(__kernel_clock_getres)
 V_FUNCTION_BEGIN(__kernel_time)
 	cvdso_call_time __c_kernel_time
 V_FUNCTION_END(__kernel_time)
+
+/* Routines for restoring integer registers, called by the compiler.  */
+/* Called with r11 pointing to the stack header word of the caller of the */
+/* function, just beyond the end of the integer restore area.  */
+_GLOBAL(_restgpr_31_x)
+_GLOBAL(_rest32gpr_31_x)
+	lwz	r0,4(r11)
+	lwz	r31,-4(r11)
+	mtlr	r0
+	mr	r1,r11
+	blr
-- 
2.25.0


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH] powerpc/vdso32: Add missing _restgpr_31_x to fix build failure
  2021-03-09  6:19 ` Christophe Leroy
@ 2021-03-12  2:29   ` Segher Boessenkool
  -1 siblings, 0 replies; 17+ messages in thread
From: Segher Boessenkool @ 2021-03-12  2:29 UTC (permalink / raw)
  To: Christophe Leroy
  Cc: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman,
	linux-kernel, linuxppc-dev

Hi!

On Tue, Mar 09, 2021 at 06:19:30AM +0000, Christophe Leroy wrote:
> With some defconfig including CONFIG_CC_OPTIMIZE_FOR_SIZE,
> (for instance mvme5100_defconfig and ps3_defconfig), gcc 5
> generates a call to _restgpr_31_x.

> I don't know if there is a way to tell GCC not to emit that call, because at the end we get more instructions than needed.

The function is required by the ABI, you need to have it.

You get *fewer* insns statically, and that is what -Os is about: reduce
the size of the binaries.

(The only reason you get such problems is because Linux does not have
all of libgcc.  You can have that *and* have some symbols cause link
errors, it isn't rocket science).


Segher

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] powerpc/vdso32: Add missing _restgpr_31_x to fix build failure
@ 2021-03-12  2:29   ` Segher Boessenkool
  0 siblings, 0 replies; 17+ messages in thread
From: Segher Boessenkool @ 2021-03-12  2:29 UTC (permalink / raw)
  To: Christophe Leroy; +Cc: linuxppc-dev, Paul Mackerras, linux-kernel

Hi!

On Tue, Mar 09, 2021 at 06:19:30AM +0000, Christophe Leroy wrote:
> With some defconfig including CONFIG_CC_OPTIMIZE_FOR_SIZE,
> (for instance mvme5100_defconfig and ps3_defconfig), gcc 5
> generates a call to _restgpr_31_x.

> I don't know if there is a way to tell GCC not to emit that call, because at the end we get more instructions than needed.

The function is required by the ABI, you need to have it.

You get *fewer* insns statically, and that is what -Os is about: reduce
the size of the binaries.

(The only reason you get such problems is because Linux does not have
all of libgcc.  You can have that *and* have some symbols cause link
errors, it isn't rocket science).


Segher

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] powerpc/vdso32: Add missing _restgpr_31_x to fix build failure
  2021-03-09  6:19 ` Christophe Leroy
  (?)
  (?)
@ 2021-03-12 13:09 ` Christophe Leroy
  -1 siblings, 0 replies; 17+ messages in thread
From: Christophe Leroy @ 2021-03-12 13:09 UTC (permalink / raw)
  To: Michael Ellerman; +Cc: linuxppc-dev, linux-kernel



Le 09/03/2021 à 07:19, Christophe Leroy a écrit :
> With some defconfig including CONFIG_CC_OPTIMIZE_FOR_SIZE,
> (for instance mvme5100_defconfig and ps3_defconfig), gcc 5
> generates a call to _restgpr_31_x.
> 
> Until recently it went unnoticed, but
> commit 42ed6d56ade2 ("powerpc/vdso: Block R_PPC_REL24 relocations")
> made it rise to the surface.
> 
> Provide that function (copied from lib/crtsavres.S) in
> gettimeofday.S
> 
> Fixes: ab037dd87a2f ("powerpc/vdso: Switch VDSO to generic C implementation.")
> Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>

Fixes the following builds:

http://kisskb.ellerman.id.au/kisskb/buildresult/14492138/
http://kisskb.ellerman.id.au/kisskb/buildresult/14492041/

Christophe


> ---
> I don't know if there is a way to tell GCC not to emit that call, because at the end we get more instructions than needed.
> ---
>   arch/powerpc/kernel/vdso32/gettimeofday.S | 11 +++++++++++
>   1 file changed, 11 insertions(+)
> 
> diff --git a/arch/powerpc/kernel/vdso32/gettimeofday.S b/arch/powerpc/kernel/vdso32/gettimeofday.S
> index a6e29f880e0e..d21d08140a5e 100644
> --- a/arch/powerpc/kernel/vdso32/gettimeofday.S
> +++ b/arch/powerpc/kernel/vdso32/gettimeofday.S
> @@ -65,3 +65,14 @@ V_FUNCTION_END(__kernel_clock_getres)
>   V_FUNCTION_BEGIN(__kernel_time)
>   	cvdso_call_time __c_kernel_time
>   V_FUNCTION_END(__kernel_time)
> +
> +/* Routines for restoring integer registers, called by the compiler.  */
> +/* Called with r11 pointing to the stack header word of the caller of the */
> +/* function, just beyond the end of the integer restore area.  */
> +_GLOBAL(_restgpr_31_x)
> +_GLOBAL(_rest32gpr_31_x)
> +	lwz	r0,4(r11)
> +	lwz	r31,-4(r11)
> +	mtlr	r0
> +	mr	r1,r11
> +	blr
> 

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] powerpc/vdso32: Add missing _restgpr_31_x to fix build failure
  2021-03-09  6:19 ` Christophe Leroy
@ 2021-03-15 13:31   ` Michael Ellerman
  -1 siblings, 0 replies; 17+ messages in thread
From: Michael Ellerman @ 2021-03-15 13:31 UTC (permalink / raw)
  To: Paul Mackerras, Benjamin Herrenschmidt, Michael Ellerman, segher,
	Christophe Leroy
  Cc: linux-kernel, linuxppc-dev

On Tue, 9 Mar 2021 06:19:30 +0000 (UTC), Christophe Leroy wrote:
> With some defconfig including CONFIG_CC_OPTIMIZE_FOR_SIZE,
> (for instance mvme5100_defconfig and ps3_defconfig), gcc 5
> generates a call to _restgpr_31_x.
> 
> Until recently it went unnoticed, but
> commit 42ed6d56ade2 ("powerpc/vdso: Block R_PPC_REL24 relocations")
> made it rise to the surface.
> 
> [...]

Applied to powerpc/fixes.

[1/1] powerpc/vdso32: Add missing _restgpr_31_x to fix build failure
      https://git.kernel.org/powerpc/c/08c18b63d9656e0389087d1956d2b37fd7019172

cheers

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] powerpc/vdso32: Add missing _restgpr_31_x to fix build failure
@ 2021-03-15 13:31   ` Michael Ellerman
  0 siblings, 0 replies; 17+ messages in thread
From: Michael Ellerman @ 2021-03-15 13:31 UTC (permalink / raw)
  To: Paul Mackerras, Benjamin Herrenschmidt, Michael Ellerman, segher,
	Christophe Leroy
  Cc: linuxppc-dev, linux-kernel

On Tue, 9 Mar 2021 06:19:30 +0000 (UTC), Christophe Leroy wrote:
> With some defconfig including CONFIG_CC_OPTIMIZE_FOR_SIZE,
> (for instance mvme5100_defconfig and ps3_defconfig), gcc 5
> generates a call to _restgpr_31_x.
> 
> Until recently it went unnoticed, but
> commit 42ed6d56ade2 ("powerpc/vdso: Block R_PPC_REL24 relocations")
> made it rise to the surface.
> 
> [...]

Applied to powerpc/fixes.

[1/1] powerpc/vdso32: Add missing _restgpr_31_x to fix build failure
      https://git.kernel.org/powerpc/c/08c18b63d9656e0389087d1956d2b37fd7019172

cheers

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] powerpc/vdso32: Add missing _restgpr_31_x to fix build failure
  2021-03-12  2:29   ` Segher Boessenkool
@ 2021-03-15 16:23     ` Rasmus Villemoes
  -1 siblings, 0 replies; 17+ messages in thread
From: Rasmus Villemoes @ 2021-03-15 16:23 UTC (permalink / raw)
  To: Segher Boessenkool, Christophe Leroy
  Cc: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman,
	linux-kernel, linuxppc-dev

On 12/03/2021 03.29, Segher Boessenkool wrote:
> Hi!
> 
> On Tue, Mar 09, 2021 at 06:19:30AM +0000, Christophe Leroy wrote:
>> With some defconfig including CONFIG_CC_OPTIMIZE_FOR_SIZE,
>> (for instance mvme5100_defconfig and ps3_defconfig), gcc 5
>> generates a call to _restgpr_31_x.
> 
>> I don't know if there is a way to tell GCC not to emit that call, because at the end we get more instructions than needed.
> 
> The function is required by the ABI, you need to have it.
> 
> You get *fewer* insns statically, and that is what -Os is about: reduce
> the size of the binaries.

Is there any reason to not just always build the vdso with -O2? It's one
page/one VMA either way, and the vdso is about making certain system
calls cheaper, so if unconditional -O2 could save a few cycles compared
to -Os, why not? (And if, as it seems, there's only one user within the
DSO of _restgpr_31_x, yes, the overall size of the .text segment
probably increases slightly).

Rasmus

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] powerpc/vdso32: Add missing _restgpr_31_x to fix build failure
@ 2021-03-15 16:23     ` Rasmus Villemoes
  0 siblings, 0 replies; 17+ messages in thread
From: Rasmus Villemoes @ 2021-03-15 16:23 UTC (permalink / raw)
  To: Segher Boessenkool, Christophe Leroy
  Cc: linuxppc-dev, Paul Mackerras, linux-kernel

On 12/03/2021 03.29, Segher Boessenkool wrote:
> Hi!
> 
> On Tue, Mar 09, 2021 at 06:19:30AM +0000, Christophe Leroy wrote:
>> With some defconfig including CONFIG_CC_OPTIMIZE_FOR_SIZE,
>> (for instance mvme5100_defconfig and ps3_defconfig), gcc 5
>> generates a call to _restgpr_31_x.
> 
>> I don't know if there is a way to tell GCC not to emit that call, because at the end we get more instructions than needed.
> 
> The function is required by the ABI, you need to have it.
> 
> You get *fewer* insns statically, and that is what -Os is about: reduce
> the size of the binaries.

Is there any reason to not just always build the vdso with -O2? It's one
page/one VMA either way, and the vdso is about making certain system
calls cheaper, so if unconditional -O2 could save a few cycles compared
to -Os, why not? (And if, as it seems, there's only one user within the
DSO of _restgpr_31_x, yes, the overall size of the .text segment
probably increases slightly).

Rasmus

^ permalink raw reply	[flat|nested] 17+ messages in thread

* RE: [PATCH] powerpc/vdso32: Add missing _restgpr_31_x to fix build failure
  2021-03-15 16:23     ` Rasmus Villemoes
@ 2021-03-15 16:38       ` David Laight
  -1 siblings, 0 replies; 17+ messages in thread
From: David Laight @ 2021-03-15 16:38 UTC (permalink / raw)
  To: 'Rasmus Villemoes', Segher Boessenkool, Christophe Leroy
  Cc: linuxppc-dev, Paul Mackerras, linux-kernel

From: Rasmus Villemoes
> Sent: 15 March 2021 16:24
> 
> On 12/03/2021 03.29, Segher Boessenkool wrote:
> > Hi!
> >
> > On Tue, Mar 09, 2021 at 06:19:30AM +0000, Christophe Leroy wrote:
> >> With some defconfig including CONFIG_CC_OPTIMIZE_FOR_SIZE,
> >> (for instance mvme5100_defconfig and ps3_defconfig), gcc 5
> >> generates a call to _restgpr_31_x.
> >
> >> I don't know if there is a way to tell GCC not to emit that call, because at the end we get more
> instructions than needed.
> >
> > The function is required by the ABI, you need to have it.
> >
> > You get *fewer* insns statically, and that is what -Os is about: reduce
> > the size of the binaries.
> 
> Is there any reason to not just always build the vdso with -O2? It's one
> page/one VMA either way, and the vdso is about making certain system
> calls cheaper, so if unconditional -O2 could save a few cycles compared
> to -Os, why not? (And if, as it seems, there's only one user within the
> DSO of _restgpr_31_x, yes, the overall size of the .text segment
> probably increases slightly).

Sometimes -Os generates such horrid code you really never want to use it.
A classic is on x86 where it replaces 'load register with byte constant'
with 'push byte' 'pop register'.
The code is actually smaller but the execution time is horrid.

There are also cases where -O2 actually generates smaller code.

Although you may need to disable loop unrolling (often dubious at best)
and either force or disable some function inlining.

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)


^ permalink raw reply	[flat|nested] 17+ messages in thread

* RE: [PATCH] powerpc/vdso32: Add missing _restgpr_31_x to fix build failure
@ 2021-03-15 16:38       ` David Laight
  0 siblings, 0 replies; 17+ messages in thread
From: David Laight @ 2021-03-15 16:38 UTC (permalink / raw)
  To: 'Rasmus Villemoes', Segher Boessenkool, Christophe Leroy
  Cc: Paul Mackerras, linuxppc-dev, linux-kernel

From: Rasmus Villemoes
> Sent: 15 March 2021 16:24
> 
> On 12/03/2021 03.29, Segher Boessenkool wrote:
> > Hi!
> >
> > On Tue, Mar 09, 2021 at 06:19:30AM +0000, Christophe Leroy wrote:
> >> With some defconfig including CONFIG_CC_OPTIMIZE_FOR_SIZE,
> >> (for instance mvme5100_defconfig and ps3_defconfig), gcc 5
> >> generates a call to _restgpr_31_x.
> >
> >> I don't know if there is a way to tell GCC not to emit that call, because at the end we get more
> instructions than needed.
> >
> > The function is required by the ABI, you need to have it.
> >
> > You get *fewer* insns statically, and that is what -Os is about: reduce
> > the size of the binaries.
> 
> Is there any reason to not just always build the vdso with -O2? It's one
> page/one VMA either way, and the vdso is about making certain system
> calls cheaper, so if unconditional -O2 could save a few cycles compared
> to -Os, why not? (And if, as it seems, there's only one user within the
> DSO of _restgpr_31_x, yes, the overall size of the .text segment
> probably increases slightly).

Sometimes -Os generates such horrid code you really never want to use it.
A classic is on x86 where it replaces 'load register with byte constant'
with 'push byte' 'pop register'.
The code is actually smaller but the execution time is horrid.

There are also cases where -O2 actually generates smaller code.

Although you may need to disable loop unrolling (often dubious at best)
and either force or disable some function inlining.

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] powerpc/vdso32: Add missing _restgpr_31_x to fix build failure
  2021-03-15 16:23     ` Rasmus Villemoes
@ 2021-03-15 23:47       ` Segher Boessenkool
  -1 siblings, 0 replies; 17+ messages in thread
From: Segher Boessenkool @ 2021-03-15 23:47 UTC (permalink / raw)
  To: Rasmus Villemoes
  Cc: Christophe Leroy, Benjamin Herrenschmidt, Paul Mackerras,
	Michael Ellerman, linux-kernel, linuxppc-dev

On Mon, Mar 15, 2021 at 05:23:44PM +0100, Rasmus Villemoes wrote:
> On 12/03/2021 03.29, Segher Boessenkool wrote:
> > On Tue, Mar 09, 2021 at 06:19:30AM +0000, Christophe Leroy wrote:
> >> With some defconfig including CONFIG_CC_OPTIMIZE_FOR_SIZE,
> >> (for instance mvme5100_defconfig and ps3_defconfig), gcc 5
> >> generates a call to _restgpr_31_x.
> > 
> >> I don't know if there is a way to tell GCC not to emit that call, because at the end we get more instructions than needed.
> > 
> > The function is required by the ABI, you need to have it.
> > 
> > You get *fewer* insns statically, and that is what -Os is about: reduce
> > the size of the binaries.
> 
> Is there any reason to not just always build the vdso with -O2? It's one
> page/one VMA either way, and the vdso is about making certain system
> calls cheaper, so if unconditional -O2 could save a few cycles compared
> to -Os, why not? (And if, as it seems, there's only one user within the
> DSO of _restgpr_31_x, yes, the overall size of the .text segment
> probably increases slightly).

You can use exactly the same reasoning for using -O2 instead of -Os
anywhere else.

-Os doesn't mean "smaller code, but only where that is reasonable".  It
means "smaller code".


Segher

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] powerpc/vdso32: Add missing _restgpr_31_x to fix build failure
@ 2021-03-15 23:47       ` Segher Boessenkool
  0 siblings, 0 replies; 17+ messages in thread
From: Segher Boessenkool @ 2021-03-15 23:47 UTC (permalink / raw)
  To: Rasmus Villemoes; +Cc: linux-kernel, Paul Mackerras, linuxppc-dev

On Mon, Mar 15, 2021 at 05:23:44PM +0100, Rasmus Villemoes wrote:
> On 12/03/2021 03.29, Segher Boessenkool wrote:
> > On Tue, Mar 09, 2021 at 06:19:30AM +0000, Christophe Leroy wrote:
> >> With some defconfig including CONFIG_CC_OPTIMIZE_FOR_SIZE,
> >> (for instance mvme5100_defconfig and ps3_defconfig), gcc 5
> >> generates a call to _restgpr_31_x.
> > 
> >> I don't know if there is a way to tell GCC not to emit that call, because at the end we get more instructions than needed.
> > 
> > The function is required by the ABI, you need to have it.
> > 
> > You get *fewer* insns statically, and that is what -Os is about: reduce
> > the size of the binaries.
> 
> Is there any reason to not just always build the vdso with -O2? It's one
> page/one VMA either way, and the vdso is about making certain system
> calls cheaper, so if unconditional -O2 could save a few cycles compared
> to -Os, why not? (And if, as it seems, there's only one user within the
> DSO of _restgpr_31_x, yes, the overall size of the .text segment
> probably increases slightly).

You can use exactly the same reasoning for using -O2 instead of -Os
anywhere else.

-Os doesn't mean "smaller code, but only where that is reasonable".  It
means "smaller code".


Segher

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] powerpc/vdso32: Add missing _restgpr_31_x to fix build failure
  2021-03-15 16:38       ` David Laight
@ 2021-03-15 23:59         ` Segher Boessenkool
  -1 siblings, 0 replies; 17+ messages in thread
From: Segher Boessenkool @ 2021-03-15 23:59 UTC (permalink / raw)
  To: David Laight
  Cc: 'Rasmus Villemoes',
	Christophe Leroy, linuxppc-dev, Paul Mackerras, linux-kernel

On Mon, Mar 15, 2021 at 04:38:52PM +0000, David Laight wrote:
> From: Rasmus Villemoes
> > Sent: 15 March 2021 16:24
> > On 12/03/2021 03.29, Segher Boessenkool wrote:
> > > On Tue, Mar 09, 2021 at 06:19:30AM +0000, Christophe Leroy wrote:
> > >> With some defconfig including CONFIG_CC_OPTIMIZE_FOR_SIZE,
> > >> (for instance mvme5100_defconfig and ps3_defconfig), gcc 5
> > >> generates a call to _restgpr_31_x.
> > >
> > >> I don't know if there is a way to tell GCC not to emit that call, because at the end we get more
> > instructions than needed.
> > >
> > > The function is required by the ABI, you need to have it.
> > >
> > > You get *fewer* insns statically, and that is what -Os is about: reduce
> > > the size of the binaries.
> > 
> > Is there any reason to not just always build the vdso with -O2? It's one
> > page/one VMA either way, and the vdso is about making certain system
> > calls cheaper, so if unconditional -O2 could save a few cycles compared
> > to -Os, why not? (And if, as it seems, there's only one user within the
> > DSO of _restgpr_31_x, yes, the overall size of the .text segment
> > probably increases slightly).
> 
> Sometimes -Os generates such horrid code you really never want to use it.
> A classic is on x86 where it replaces 'load register with byte constant'
> with 'push byte' 'pop register'.
> The code is actually smaller but the execution time is horrid.
> 
> There are also cases where -O2 actually generates smaller code.

Yes, as with all heuristics it doesn't always work out.  But usually -Os
is smaller.

> Although you may need to disable loop unrolling (often dubious at best)
> and either force or disable some function inlining.

The cases where GCC does loop unrolling at -O2 always help quite a lot.
Or, do you have a counter-example?  We'd love to see one.

And yup, inlining is hard.  GCC's heuristics there are very good
nowadays, but any single decision has big effects.  Doing the important
spots manually (always_inline or noinline) has good payoff.


Segher

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] powerpc/vdso32: Add missing _restgpr_31_x to fix build failure
@ 2021-03-15 23:59         ` Segher Boessenkool
  0 siblings, 0 replies; 17+ messages in thread
From: Segher Boessenkool @ 2021-03-15 23:59 UTC (permalink / raw)
  To: David Laight
  Cc: Paul Mackerras, linuxppc-dev, linux-kernel, 'Rasmus Villemoes'

On Mon, Mar 15, 2021 at 04:38:52PM +0000, David Laight wrote:
> From: Rasmus Villemoes
> > Sent: 15 March 2021 16:24
> > On 12/03/2021 03.29, Segher Boessenkool wrote:
> > > On Tue, Mar 09, 2021 at 06:19:30AM +0000, Christophe Leroy wrote:
> > >> With some defconfig including CONFIG_CC_OPTIMIZE_FOR_SIZE,
> > >> (for instance mvme5100_defconfig and ps3_defconfig), gcc 5
> > >> generates a call to _restgpr_31_x.
> > >
> > >> I don't know if there is a way to tell GCC not to emit that call, because at the end we get more
> > instructions than needed.
> > >
> > > The function is required by the ABI, you need to have it.
> > >
> > > You get *fewer* insns statically, and that is what -Os is about: reduce
> > > the size of the binaries.
> > 
> > Is there any reason to not just always build the vdso with -O2? It's one
> > page/one VMA either way, and the vdso is about making certain system
> > calls cheaper, so if unconditional -O2 could save a few cycles compared
> > to -Os, why not? (And if, as it seems, there's only one user within the
> > DSO of _restgpr_31_x, yes, the overall size of the .text segment
> > probably increases slightly).
> 
> Sometimes -Os generates such horrid code you really never want to use it.
> A classic is on x86 where it replaces 'load register with byte constant'
> with 'push byte' 'pop register'.
> The code is actually smaller but the execution time is horrid.
> 
> There are also cases where -O2 actually generates smaller code.

Yes, as with all heuristics it doesn't always work out.  But usually -Os
is smaller.

> Although you may need to disable loop unrolling (often dubious at best)
> and either force or disable some function inlining.

The cases where GCC does loop unrolling at -O2 always help quite a lot.
Or, do you have a counter-example?  We'd love to see one.

And yup, inlining is hard.  GCC's heuristics there are very good
nowadays, but any single decision has big effects.  Doing the important
spots manually (always_inline or noinline) has good payoff.


Segher

^ permalink raw reply	[flat|nested] 17+ messages in thread

* RE: [PATCH] powerpc/vdso32: Add missing _restgpr_31_x to fix build failure
  2021-03-15 23:59         ` Segher Boessenkool
@ 2021-03-16  9:35           ` David Laight
  -1 siblings, 0 replies; 17+ messages in thread
From: David Laight @ 2021-03-16  9:35 UTC (permalink / raw)
  To: 'Segher Boessenkool'
  Cc: 'Rasmus Villemoes',
	Christophe Leroy, linuxppc-dev, Paul Mackerras, linux-kernel

From: Segher Boessenkool
> Sent: 16 March 2021 00:00
...
> > Although you may need to disable loop unrolling (often dubious at best)
> > and either force or disable some function inlining.
> 
> The cases where GCC does loop unrolling at -O2 always help quite a lot.
> Or, do you have a counter-example?  We'd love to see one.

The real problem with loop unrolling is that quite often a modern
out-of-order superscaler processor actually has 'spare' execution
cycles where the loop control can be done 'for free'.
Sometimes you do need to unroll (or interleave) a couple of
times to get enough spare execution cycles.

But the unrolled loop has to read a lot more code into cache
- so unless the code is 'hot cache' (that is usually arranged
for benchmarking) those delays apply as well.
The larger code footprint also displaces other code.

My real annoyance with gcc is unrolling (and vectorizing)
loops that I know are never executed as many times as even one
copy of the unrolled loop.

As an example intel (ivy bridge onwards) cpu execute the
following code (the middle of the ip checksum) at 8 bytes/clock.
(Limited by the carry flag.)
It just doesn't need any further unrolling.

+               "10:    jecxz 20f\n"
+               "       adc   (%[buff], %[len]), %[sum_0]\n"
+               "       adc   8(%[buff], %[len]), %[sum_1]\n"
+               "       lea   32(%[len]), %[len_tmp]\n"
+               "       adc   16(%[buff], %[len]), %[sum_0]\n"
+               "       adc   24(%[buff], %[len]), %[sum_1]\n"
+               "       mov   %[len_tmp], %[len]\n"
+               "       jmp   10b\n"

Annoyingly that loop is slow on my 8-core atom. 
The existing code only does 4 bytes/clock on intel cpu prior
to either broadwell or haswell (forgotten which) in spite
of much more unroling.


> And yup, inlining is hard.  GCC's heuristics there are very good
> nowadays, but any single decision has big effects.  Doing the important
> spots manually (always_inline or noinline) has good payoff.

Latest inline gripe was a function replicated about 20 times
when the non-inline version was a register load and 'tail call'.
The inlining is just bloat.

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)


^ permalink raw reply	[flat|nested] 17+ messages in thread

* RE: [PATCH] powerpc/vdso32: Add missing _restgpr_31_x to fix build failure
@ 2021-03-16  9:35           ` David Laight
  0 siblings, 0 replies; 17+ messages in thread
From: David Laight @ 2021-03-16  9:35 UTC (permalink / raw)
  To: 'Segher Boessenkool'
  Cc: Paul Mackerras, linuxppc-dev, linux-kernel, 'Rasmus Villemoes'

From: Segher Boessenkool
> Sent: 16 March 2021 00:00
...
> > Although you may need to disable loop unrolling (often dubious at best)
> > and either force or disable some function inlining.
> 
> The cases where GCC does loop unrolling at -O2 always help quite a lot.
> Or, do you have a counter-example?  We'd love to see one.

The real problem with loop unrolling is that quite often a modern
out-of-order superscaler processor actually has 'spare' execution
cycles where the loop control can be done 'for free'.
Sometimes you do need to unroll (or interleave) a couple of
times to get enough spare execution cycles.

But the unrolled loop has to read a lot more code into cache
- so unless the code is 'hot cache' (that is usually arranged
for benchmarking) those delays apply as well.
The larger code footprint also displaces other code.

My real annoyance with gcc is unrolling (and vectorizing)
loops that I know are never executed as many times as even one
copy of the unrolled loop.

As an example intel (ivy bridge onwards) cpu execute the
following code (the middle of the ip checksum) at 8 bytes/clock.
(Limited by the carry flag.)
It just doesn't need any further unrolling.

+               "10:    jecxz 20f\n"
+               "       adc   (%[buff], %[len]), %[sum_0]\n"
+               "       adc   8(%[buff], %[len]), %[sum_1]\n"
+               "       lea   32(%[len]), %[len_tmp]\n"
+               "       adc   16(%[buff], %[len]), %[sum_0]\n"
+               "       adc   24(%[buff], %[len]), %[sum_1]\n"
+               "       mov   %[len_tmp], %[len]\n"
+               "       jmp   10b\n"

Annoyingly that loop is slow on my 8-core atom. 
The existing code only does 4 bytes/clock on intel cpu prior
to either broadwell or haswell (forgotten which) in spite
of much more unroling.


> And yup, inlining is hard.  GCC's heuristics there are very good
> nowadays, but any single decision has big effects.  Doing the important
> spots manually (always_inline or noinline) has good payoff.

Latest inline gripe was a function replicated about 20 times
when the non-inline version was a register load and 'tail call'.
The inlining is just bloat.

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)


^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2021-03-16  9:36 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-03-09  6:19 [PATCH] powerpc/vdso32: Add missing _restgpr_31_x to fix build failure Christophe Leroy
2021-03-09  6:19 ` Christophe Leroy
2021-03-12  2:29 ` Segher Boessenkool
2021-03-12  2:29   ` Segher Boessenkool
2021-03-15 16:23   ` Rasmus Villemoes
2021-03-15 16:23     ` Rasmus Villemoes
2021-03-15 16:38     ` David Laight
2021-03-15 16:38       ` David Laight
2021-03-15 23:59       ` Segher Boessenkool
2021-03-15 23:59         ` Segher Boessenkool
2021-03-16  9:35         ` David Laight
2021-03-16  9:35           ` David Laight
2021-03-15 23:47     ` Segher Boessenkool
2021-03-15 23:47       ` Segher Boessenkool
2021-03-12 13:09 ` Christophe Leroy
2021-03-15 13:31 ` Michael Ellerman
2021-03-15 13:31   ` Michael Ellerman

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.