All of lore.kernel.org
 help / color / mirror / Atom feed
* v4.10: kernel stack frame pointer .. has bad value (null)
@ 2017-02-21 22:14 Pavel Machek
  2017-02-21 23:12 ` Josh Poimboeuf
  0 siblings, 1 reply; 51+ messages in thread
From: Pavel Machek @ 2017-02-21 22:14 UTC (permalink / raw)
  To: kernel list, jpoimboe, mingo
  Cc: luto, bp, brgerst, dvlasenk, hpa, torvalds, peterz, tglx

[-- Attachment #1: Type: text/plain, Size: 4977 bytes --]

Hi!

Thinkpad X220, in 32 bit mode... and I'm getting rather scary messages
from kernel during boot:

Git blame says that message comes from commit

commit 24d86f59093b0bcb3756cdf47f2db10ff4e90dbb
Author: Josh Poimboeuf <jpoimboe@redhat.com>
Date:   Thu Oct 27 08:10:58 2016 -0500

    x86/unwind: Ensure stack grows down

    Add a sanity check to ensure the stack only grows down, and print
    a
        warning if the check fails.

Any ideas?
							Pavel

[    1.047295] [drm] Memory usable by graphics device = 2048M
[    1.047356] [drm] Replacing VGA console driver
[    1.048029] Console: switching to colour dummy device 80x25
[    1.048348] WARNING: kernel stack frame pointer at f50cdf98 in
swapper/2:0 has bad value   (null)
[    1.048349] unwind stack type:0 next_sp:  (null) mask:a graph_idx:0
[    1.048352] f50cdebc: 00000000f50cdec4 (0xf50cdec4)
[    1.048356] f50cdec0: 00000000c40489b7 (irq_exit+0x87/0xa0)
[    1.048357] f50cdec4: 00000000f50cded0 (0xf50cded0)
[    1.048361] f50cdec8: 00000000c402f6d3
(smp_apic_timer_interrupt+0x33/0x40)
[    1.048362] f50cdecc: 000000003e7c28eb (0x3e7c28eb)
[    1.048363] f50cded0: 00000000f50cded9 (0xf50cded9)
[    1.048366] f50cded4: 00000000c4b7ac8e
(apic_timer_interrupt+0x36/0x3c)
[    1.048367] f50cded8: 000000003e7c28eb (0x3e7c28eb)
[    1.048368] f50cdedc: 0000000000000001 (0x1)
[    1.048370] f50cdee0: 00000000f57c41c0 (0xf57c41c0)
[    1.048370] f50cdee4: 0000000000000000 ...
[    1.048371] f50cdee8: 0000000000000002 (0x2)
[    1.048372] f50cdeec: 00000000f50cdf30 (0xf50cdf30)
[    1.048373] f50cdef0: 000000003e7c28eb (0x3e7c28eb)
[    1.048376] f50cdef4: 00000000c506007b (cltrack_prog+0x7fb/0x1000)
[    1.048377] f50cdef8: 000000000000007b (0x7b)
[    1.048378] f50cdefc: 00000000000000d8 (0xd8)
[    1.048379] f50cdf00: 00000000c50600e0 (cltrack_prog+0x860/0x1000)
[    1.048380] f50cdf04: 00000000ffffff10 (0xffffff10)
[    1.048384] f50cdf08: 00000000c48d05b5
(cpuidle_enter_state+0xf5/0x220)
[    1.048384] f50cdf0c: 0000000000000060 (0x60)
[    1.048385] f50cdf10: 0000000000200286 (0x200286)
[    1.048386] f50cdf14: 0000000000000000 ...
[    1.048387] f50cdf18: 00000000ffb8fad0 (0xffb8fad0)
[    1.048388] f50cdf1c: 000000003e6bbd2b (0x3e6bbd2b)
[    1.048389] f50cdf20: 0000000000000000 ...
[    1.048391] f50cdf24: 00000000c506c680 (max_cstate+0x2c/0x2c)
[    1.048394] f50cdf28: 00000000c506c680 (max_cstate+0x2c/0x2c)
[    1.048394] f50cdf2c: 00000000f50a2040 (0xf50a2040)
[    1.048395] f50cdf30: 00000000f50cdf3c (0xf50cdf3c)
[    1.048397] f50cdf34: 00000000c48d06ff (cpuidle_enter+0xf/0x20)
[    1.048398] f50cdf38: 0000000000000000 ...
[    1.048399] f50cdf3c: 00000000f50cdf48 (0xf50cdf48)
[    1.048402] f50cdf40: 00000000c4083113 (call_cpuidle+0x23/0x40)
[    1.048403] f50cdf44: 00000000ffb8fad0 (0xffb8fad0)
[    1.048404] f50cdf48: 00000000f50cdf60 (0xf50cdf60)
[    1.048406] f50cdf4c: 00000000c4083364 (do_idle+0x174/0x1d0)
[    1.048407] f50cdf50: 00000000f50a2040 (0xf50a2040)
[    1.048408] f50cdf54: 000000008c0ed583 (0x8c0ed583)
[    1.048409] f50cdf58: 0000000000000087 (0x87)
[    1.048410] f50cdf5c: 000000009779245f (0x9779245f)
[    1.048411] f50cdf60: 00000000f50cdf78 (0xf50cdf78)
[    1.048413] f50cdf64: 00000000c408361d
(cpu_startup_entry+0x5d/0x60)
[    1.048414] f50cdf68: 0000000033ee4c15 (0x33ee4c15)
[    1.048415] f50cdf6c: 00000000c9d84398 (0xc9d84398)
[    1.048417] f50cdf70: 0000000002100800 (0x2100800)
[    1.048418] f50cdf74: 000000000fdc6fb2 (0xfdc6fb2)
[    1.048418] f50cdf78: 00000000f50cdf98 (0xf50cdf98)
[    1.048421] f50cdf7c: 00000000c402d216
(start_secondary+0x176/0x1c0)
[    1.048422] f50cdf80: 000000000fdc6fb2 (0xfdc6fb2)
[    1.048423] f50cdf84: 00000000d5425adc (0xd5425adc)
[    1.048424] f50cdf88: 0000000000000002 (0x2)
[    1.048425] f50cdf8c: 00000000ffffffff (0xffffffff)
[    1.048425] f50cdf90: 0000000000000000 ...
[    1.048426] f50cdf94: 00000000f50cdfac (0xf50cdfac)
[    1.048427] f50cdf98: 0000000000000000 ...
[    1.048429] f50cdf9c: 00000000c4000237 (startup_32_smp+0x16b/0x16d)
[    1.048429] f50cdfa0: 0000000000200002 (0x200002)
[    1.048430] f50cdfa4: 0000000000000000 ...
[    1.048432] f50cdfa8: 00000000c4000237 (startup_32_smp+0x16b/0x16d)
[    1.048432] f50cdfac: 0000000000000000 ...
[    1.048433] f50cdff4: 0000000000000100 (0x100)
[    1.048434] f50cdff8: 0000000000000200 (0x200)
[    1.048435] f50cdffc: 0000000000000000 ...
[    1.060368] [drm] Supports vblank timestamp caching Rev 2
(21.10.2013).
[    1.060373] [drm] Driver supports precise vblank timestamp query.
[    1.061668] i915 0000:00:02.0: vgaarb: changed VGA decodes:
olddecodes=io+mem,decodes=io+mem:owns=io+mem
[    1.146917] ACPI: Video Device [VID] (multi-head: yes  rom: no
post: no)

-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 181 bytes --]

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: v4.10: kernel stack frame pointer .. has bad value (null)
  2017-02-21 22:14 v4.10: kernel stack frame pointer .. has bad value (null) Pavel Machek
@ 2017-02-21 23:12 ` Josh Poimboeuf
  2017-02-21 23:15   ` H. Peter Anvin
  2017-02-22 21:05   ` Pavel Machek
  0 siblings, 2 replies; 51+ messages in thread
From: Josh Poimboeuf @ 2017-02-21 23:12 UTC (permalink / raw)
  To: Pavel Machek
  Cc: kernel list, mingo, luto, bp, brgerst, dvlasenk, hpa, torvalds,
	peterz, tglx

On Tue, Feb 21, 2017 at 11:14:18PM +0100, Pavel Machek wrote:
> Hi!
> 
> Thinkpad X220, in 32 bit mode... and I'm getting rather scary messages
> from kernel during boot:
> 
> Git blame says that message comes from commit
> 
> commit 24d86f59093b0bcb3756cdf47f2db10ff4e90dbb
> Author: Josh Poimboeuf <jpoimboe@redhat.com>
> Date:   Thu Oct 27 08:10:58 2016 -0500
> 
>     x86/unwind: Ensure stack grows down
> 
>     Add a sanity check to ensure the stack only grows down, and print
>     a
>         warning if the check fails.
> 
> Any ideas?

Hi Pavel,

I don't think I've seen this one.  Any chance this came after resuming
from a hibernation or suspend?


> [    1.047295] [drm] Memory usable by graphics device = 2048M
> [    1.047356] [drm] Replacing VGA console driver
> [    1.048029] Console: switching to colour dummy device 80x25
> [    1.048348] WARNING: kernel stack frame pointer at f50cdf98 in
> swapper/2:0 has bad value   (null)
> [    1.048349] unwind stack type:0 next_sp:  (null) mask:a graph_idx:0
> [    1.048352] f50cdebc: 00000000f50cdec4 (0xf50cdec4)
> [    1.048356] f50cdec0: 00000000c40489b7 (irq_exit+0x87/0xa0)
> [    1.048357] f50cdec4: 00000000f50cded0 (0xf50cded0)
> [    1.048361] f50cdec8: 00000000c402f6d3
> (smp_apic_timer_interrupt+0x33/0x40)
> [    1.048362] f50cdecc: 000000003e7c28eb (0x3e7c28eb)
> [    1.048363] f50cded0: 00000000f50cded9 (0xf50cded9)
> [    1.048366] f50cded4: 00000000c4b7ac8e
> (apic_timer_interrupt+0x36/0x3c)
> [    1.048367] f50cded8: 000000003e7c28eb (0x3e7c28eb)
> [    1.048368] f50cdedc: 0000000000000001 (0x1)
> [    1.048370] f50cdee0: 00000000f57c41c0 (0xf57c41c0)
> [    1.048370] f50cdee4: 0000000000000000 ...
> [    1.048371] f50cdee8: 0000000000000002 (0x2)
> [    1.048372] f50cdeec: 00000000f50cdf30 (0xf50cdf30)
> [    1.048373] f50cdef0: 000000003e7c28eb (0x3e7c28eb)
> [    1.048376] f50cdef4: 00000000c506007b (cltrack_prog+0x7fb/0x1000)
> [    1.048377] f50cdef8: 000000000000007b (0x7b)
> [    1.048378] f50cdefc: 00000000000000d8 (0xd8)
> [    1.048379] f50cdf00: 00000000c50600e0 (cltrack_prog+0x860/0x1000)
> [    1.048380] f50cdf04: 00000000ffffff10 (0xffffff10)
> [    1.048384] f50cdf08: 00000000c48d05b5
> (cpuidle_enter_state+0xf5/0x220)
> [    1.048384] f50cdf0c: 0000000000000060 (0x60)
> [    1.048385] f50cdf10: 0000000000200286 (0x200286)
> [    1.048386] f50cdf14: 0000000000000000 ...
> [    1.048387] f50cdf18: 00000000ffb8fad0 (0xffb8fad0)
> [    1.048388] f50cdf1c: 000000003e6bbd2b (0x3e6bbd2b)
> [    1.048389] f50cdf20: 0000000000000000 ...
> [    1.048391] f50cdf24: 00000000c506c680 (max_cstate+0x2c/0x2c)
> [    1.048394] f50cdf28: 00000000c506c680 (max_cstate+0x2c/0x2c)
> [    1.048394] f50cdf2c: 00000000f50a2040 (0xf50a2040)
> [    1.048395] f50cdf30: 00000000f50cdf3c (0xf50cdf3c)
> [    1.048397] f50cdf34: 00000000c48d06ff (cpuidle_enter+0xf/0x20)
> [    1.048398] f50cdf38: 0000000000000000 ...
> [    1.048399] f50cdf3c: 00000000f50cdf48 (0xf50cdf48)
> [    1.048402] f50cdf40: 00000000c4083113 (call_cpuidle+0x23/0x40)
> [    1.048403] f50cdf44: 00000000ffb8fad0 (0xffb8fad0)
> [    1.048404] f50cdf48: 00000000f50cdf60 (0xf50cdf60)
> [    1.048406] f50cdf4c: 00000000c4083364 (do_idle+0x174/0x1d0)
> [    1.048407] f50cdf50: 00000000f50a2040 (0xf50a2040)
> [    1.048408] f50cdf54: 000000008c0ed583 (0x8c0ed583)
> [    1.048409] f50cdf58: 0000000000000087 (0x87)
> [    1.048410] f50cdf5c: 000000009779245f (0x9779245f)
> [    1.048411] f50cdf60: 00000000f50cdf78 (0xf50cdf78)
> [    1.048413] f50cdf64: 00000000c408361d
> (cpu_startup_entry+0x5d/0x60)
> [    1.048414] f50cdf68: 0000000033ee4c15 (0x33ee4c15)
> [    1.048415] f50cdf6c: 00000000c9d84398 (0xc9d84398)
> [    1.048417] f50cdf70: 0000000002100800 (0x2100800)
> [    1.048418] f50cdf74: 000000000fdc6fb2 (0xfdc6fb2)
> [    1.048418] f50cdf78: 00000000f50cdf98 (0xf50cdf98)
> [    1.048421] f50cdf7c: 00000000c402d216
> (start_secondary+0x176/0x1c0)
> [    1.048422] f50cdf80: 000000000fdc6fb2 (0xfdc6fb2)
> [    1.048423] f50cdf84: 00000000d5425adc (0xd5425adc)
> [    1.048424] f50cdf88: 0000000000000002 (0x2)
> [    1.048425] f50cdf8c: 00000000ffffffff (0xffffffff)
> [    1.048425] f50cdf90: 0000000000000000 ...
> [    1.048426] f50cdf94: 00000000f50cdfac (0xf50cdfac)
> [    1.048427] f50cdf98: 0000000000000000 ...
> [    1.048429] f50cdf9c: 00000000c4000237 (startup_32_smp+0x16b/0x16d)
> [    1.048429] f50cdfa0: 0000000000200002 (0x200002)
> [    1.048430] f50cdfa4: 0000000000000000 ...
> [    1.048432] f50cdfa8: 00000000c4000237 (startup_32_smp+0x16b/0x16d)
> [    1.048432] f50cdfac: 0000000000000000 ...
> [    1.048433] f50cdff4: 0000000000000100 (0x100)
> [    1.048434] f50cdff8: 0000000000000200 (0x200)
> [    1.048435] f50cdffc: 0000000000000000 ...
> [    1.060368] [drm] Supports vblank timestamp caching Rev 2
> (21.10.2013).
> [    1.060373] [drm] Driver supports precise vblank timestamp query.
> [    1.061668] i915 0000:00:02.0: vgaarb: changed VGA decodes:
> olddecodes=io+mem,decodes=io+mem:owns=io+mem
> [    1.146917] ACPI: Video Device [VID] (multi-head: yes  rom: no
> post: no)
> 
> -- 
> (english) http://www.livejournal.com/~pavelmachek
> (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html



-- 
Josh

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: v4.10: kernel stack frame pointer .. has bad value (null)
  2017-02-21 23:12 ` Josh Poimboeuf
@ 2017-02-21 23:15   ` H. Peter Anvin
  2017-02-22 16:45     ` Josh Poimboeuf
  2017-02-22 21:05   ` Pavel Machek
  1 sibling, 1 reply; 51+ messages in thread
From: H. Peter Anvin @ 2017-02-21 23:15 UTC (permalink / raw)
  To: Josh Poimboeuf, Pavel Machek
  Cc: kernel list, mingo, luto, bp, brgerst, dvlasenk, torvalds, peterz, tglx

On 02/21/17 15:12, Josh Poimboeuf wrote:
>>
>> commit 24d86f59093b0bcb3756cdf47f2db10ff4e90dbb
>> Author: Josh Poimboeuf <jpoimboe@redhat.com>
>> Date:   Thu Oct 27 08:10:58 2016 -0500
>>
>>     x86/unwind: Ensure stack grows down
>>
>>     Add a sanity check to ensure the stack only grows down, and print
>>     a
>>         warning if the check fails.
>>
>> Any ideas?
> 
> Hi Pavel,
> 
> I don't think I've seen this one.  Any chance this came after resuming
> from a hibernation or suspend?
> 
> 
>> [    1.047295] [drm] Memory usable by graphics device = 2048M
>> [    1.047356] [drm] Replacing VGA console driver
>> [    1.048029] Console: switching to colour dummy device 80x25
>> [    1.048348] WARNING: kernel stack frame pointer at f50cdf98 in
>> swapper/2:0 has bad value   (null)
>> [    1.048349] unwind stack type:0 next_sp:  (null) mask:a graph_idx:0
>> [    1.048352] f50cdebc: 00000000f50cdec4 (0xf50cdec4)
                            ^^^^^^^^^^^^^^^^

FWIW, it would be really darned nice to not have all those zeroes in a
32-bit stack frame dump.

Is not a zero stack frame pointer value an end of stack token?

	-hpa

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: v4.10: kernel stack frame pointer .. has bad value (null)
  2017-02-21 23:15   ` H. Peter Anvin
@ 2017-02-22 16:45     ` Josh Poimboeuf
  2017-02-22 20:51       ` H. Peter Anvin
  0 siblings, 1 reply; 51+ messages in thread
From: Josh Poimboeuf @ 2017-02-22 16:45 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Pavel Machek, kernel list, mingo, luto, bp, brgerst, dvlasenk,
	torvalds, peterz, tglx

On Tue, Feb 21, 2017 at 03:15:36PM -0800, H. Peter Anvin wrote:
> On 02/21/17 15:12, Josh Poimboeuf wrote:
> >>
> >> commit 24d86f59093b0bcb3756cdf47f2db10ff4e90dbb
> >> Author: Josh Poimboeuf <jpoimboe@redhat.com>
> >> Date:   Thu Oct 27 08:10:58 2016 -0500
> >>
> >>     x86/unwind: Ensure stack grows down
> >>
> >>     Add a sanity check to ensure the stack only grows down, and print
> >>     a
> >>         warning if the check fails.
> >>
> >> Any ideas?
> > 
> > Hi Pavel,
> > 
> > I don't think I've seen this one.  Any chance this came after resuming
> > from a hibernation or suspend?
> > 
> > 
> >> [    1.047295] [drm] Memory usable by graphics device = 2048M
> >> [    1.047356] [drm] Replacing VGA console driver
> >> [    1.048029] Console: switching to colour dummy device 80x25
> >> [    1.048348] WARNING: kernel stack frame pointer at f50cdf98 in
> >> swapper/2:0 has bad value   (null)
> >> [    1.048349] unwind stack type:0 next_sp:  (null) mask:a graph_idx:0
> >> [    1.048352] f50cdebc: 00000000f50cdec4 (0xf50cdec4)
>                             ^^^^^^^^^^^^^^^^
> 
> FWIW, it would be really darned nice to not have all those zeroes in a
> 32-bit stack frame dump.

Yeah, I'll fix that.

> Is not a zero stack frame pointer value an end of stack token?

There's no end of stack "token" per se, though any frame pointer value
outside the bounds of the stack will terminate the stack trace (and that
still happened here).

The warning is because the stack trace didn't make it all the way to the
"end" location of the stack (right before the syscall pt_regs location).
The warning is part of the effort to ensure reliable stacks.

-- 
Josh

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: v4.10: kernel stack frame pointer .. has bad value (null)
  2017-02-22 16:45     ` Josh Poimboeuf
@ 2017-02-22 20:51       ` H. Peter Anvin
  2017-02-22 21:15         ` Josh Poimboeuf
  0 siblings, 1 reply; 51+ messages in thread
From: H. Peter Anvin @ 2017-02-22 20:51 UTC (permalink / raw)
  To: Josh Poimboeuf
  Cc: Pavel Machek, kernel list, mingo, luto, bp, brgerst, dvlasenk,
	torvalds, peterz, tglx

On 02/22/17 08:45, Josh Poimboeuf wrote:
>>
>> FWIW, it would be really darned nice to not have all those zeroes in a
>> 32-bit stack frame dump.
> 
> Yeah, I'll fix that.
> 
>> Is not a zero stack frame pointer value an end of stack token?
> 
> There's no end of stack "token" per se, though any frame pointer value
> outside the bounds of the stack will terminate the stack trace (and that
> still happened here).
> 

Well, my understanding is that at least gdb and perhaps other unwinders
consider a zero stack frame pointer to be an indicator that the stack
has reached its end.  That's why I'm wondering if this is possible in
this case or if it is unlikely because of the value.

> The warning is because the stack trace didn't make it all the way to the
> "end" location of the stack (right before the syscall pt_regs location).
> The warning is part of the effort to ensure reliable stacks.

It would be useful to get an understanding why...

	-hpa

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: v4.10: kernel stack frame pointer .. has bad value (null)
  2017-02-21 23:12 ` Josh Poimboeuf
  2017-02-21 23:15   ` H. Peter Anvin
@ 2017-02-22 21:05   ` Pavel Machek
  2017-02-22 21:21     ` Josh Poimboeuf
  1 sibling, 1 reply; 51+ messages in thread
From: Pavel Machek @ 2017-02-22 21:05 UTC (permalink / raw)
  To: Josh Poimboeuf
  Cc: kernel list, mingo, luto, bp, brgerst, dvlasenk, hpa, torvalds,
	peterz, tglx

[-- Attachment #1: Type: text/plain, Size: 5724 bytes --]

Hi!

> > Thinkpad X220, in 32 bit mode... and I'm getting rather scary messages
> > from kernel during boot:
> > 
> > Git blame says that message comes from commit
> > 
> > commit 24d86f59093b0bcb3756cdf47f2db10ff4e90dbb
> > Author: Josh Poimboeuf <jpoimboe@redhat.com>
> > Date:   Thu Oct 27 08:10:58 2016 -0500
> > 
> >     x86/unwind: Ensure stack grows down
> > 
> >     Add a sanity check to ensure the stack only grows down, and print
> >     a
> >         warning if the check fails.
> > 
> > Any ideas?
> 
> Hi Pavel,
> 
> I don't think I've seen this one.  Any chance this came after resuming
> from a hibernation or suspend?

No, it was during the boot. Notice the timestamps...
								Pavel

> > [    1.047295] [drm] Memory usable by graphics device = 2048M
> > [    1.047356] [drm] Replacing VGA console driver
> > [    1.048029] Console: switching to colour dummy device 80x25
> > [    1.048348] WARNING: kernel stack frame pointer at f50cdf98 in
> > swapper/2:0 has bad value   (null)
> > [    1.048349] unwind stack type:0 next_sp:  (null) mask:a graph_idx:0
> > [    1.048352] f50cdebc: 00000000f50cdec4 (0xf50cdec4)
> > [    1.048356] f50cdec0: 00000000c40489b7 (irq_exit+0x87/0xa0)
> > [    1.048357] f50cdec4: 00000000f50cded0 (0xf50cded0)
> > [    1.048361] f50cdec8: 00000000c402f6d3
> > (smp_apic_timer_interrupt+0x33/0x40)
> > [    1.048362] f50cdecc: 000000003e7c28eb (0x3e7c28eb)
> > [    1.048363] f50cded0: 00000000f50cded9 (0xf50cded9)
> > [    1.048366] f50cded4: 00000000c4b7ac8e
> > (apic_timer_interrupt+0x36/0x3c)
> > [    1.048367] f50cded8: 000000003e7c28eb (0x3e7c28eb)
> > [    1.048368] f50cdedc: 0000000000000001 (0x1)
> > [    1.048370] f50cdee0: 00000000f57c41c0 (0xf57c41c0)
> > [    1.048370] f50cdee4: 0000000000000000 ...
> > [    1.048371] f50cdee8: 0000000000000002 (0x2)
> > [    1.048372] f50cdeec: 00000000f50cdf30 (0xf50cdf30)
> > [    1.048373] f50cdef0: 000000003e7c28eb (0x3e7c28eb)
> > [    1.048376] f50cdef4: 00000000c506007b (cltrack_prog+0x7fb/0x1000)
> > [    1.048377] f50cdef8: 000000000000007b (0x7b)
> > [    1.048378] f50cdefc: 00000000000000d8 (0xd8)
> > [    1.048379] f50cdf00: 00000000c50600e0 (cltrack_prog+0x860/0x1000)
> > [    1.048380] f50cdf04: 00000000ffffff10 (0xffffff10)
> > [    1.048384] f50cdf08: 00000000c48d05b5
> > (cpuidle_enter_state+0xf5/0x220)
> > [    1.048384] f50cdf0c: 0000000000000060 (0x60)
> > [    1.048385] f50cdf10: 0000000000200286 (0x200286)
> > [    1.048386] f50cdf14: 0000000000000000 ...
> > [    1.048387] f50cdf18: 00000000ffb8fad0 (0xffb8fad0)
> > [    1.048388] f50cdf1c: 000000003e6bbd2b (0x3e6bbd2b)
> > [    1.048389] f50cdf20: 0000000000000000 ...
> > [    1.048391] f50cdf24: 00000000c506c680 (max_cstate+0x2c/0x2c)
> > [    1.048394] f50cdf28: 00000000c506c680 (max_cstate+0x2c/0x2c)
> > [    1.048394] f50cdf2c: 00000000f50a2040 (0xf50a2040)
> > [    1.048395] f50cdf30: 00000000f50cdf3c (0xf50cdf3c)
> > [    1.048397] f50cdf34: 00000000c48d06ff (cpuidle_enter+0xf/0x20)
> > [    1.048398] f50cdf38: 0000000000000000 ...
> > [    1.048399] f50cdf3c: 00000000f50cdf48 (0xf50cdf48)
> > [    1.048402] f50cdf40: 00000000c4083113 (call_cpuidle+0x23/0x40)
> > [    1.048403] f50cdf44: 00000000ffb8fad0 (0xffb8fad0)
> > [    1.048404] f50cdf48: 00000000f50cdf60 (0xf50cdf60)
> > [    1.048406] f50cdf4c: 00000000c4083364 (do_idle+0x174/0x1d0)
> > [    1.048407] f50cdf50: 00000000f50a2040 (0xf50a2040)
> > [    1.048408] f50cdf54: 000000008c0ed583 (0x8c0ed583)
> > [    1.048409] f50cdf58: 0000000000000087 (0x87)
> > [    1.048410] f50cdf5c: 000000009779245f (0x9779245f)
> > [    1.048411] f50cdf60: 00000000f50cdf78 (0xf50cdf78)
> > [    1.048413] f50cdf64: 00000000c408361d
> > (cpu_startup_entry+0x5d/0x60)
> > [    1.048414] f50cdf68: 0000000033ee4c15 (0x33ee4c15)
> > [    1.048415] f50cdf6c: 00000000c9d84398 (0xc9d84398)
> > [    1.048417] f50cdf70: 0000000002100800 (0x2100800)
> > [    1.048418] f50cdf74: 000000000fdc6fb2 (0xfdc6fb2)
> > [    1.048418] f50cdf78: 00000000f50cdf98 (0xf50cdf98)
> > [    1.048421] f50cdf7c: 00000000c402d216
> > (start_secondary+0x176/0x1c0)
> > [    1.048422] f50cdf80: 000000000fdc6fb2 (0xfdc6fb2)
> > [    1.048423] f50cdf84: 00000000d5425adc (0xd5425adc)
> > [    1.048424] f50cdf88: 0000000000000002 (0x2)
> > [    1.048425] f50cdf8c: 00000000ffffffff (0xffffffff)
> > [    1.048425] f50cdf90: 0000000000000000 ...
> > [    1.048426] f50cdf94: 00000000f50cdfac (0xf50cdfac)
> > [    1.048427] f50cdf98: 0000000000000000 ...
> > [    1.048429] f50cdf9c: 00000000c4000237 (startup_32_smp+0x16b/0x16d)
> > [    1.048429] f50cdfa0: 0000000000200002 (0x200002)
> > [    1.048430] f50cdfa4: 0000000000000000 ...
> > [    1.048432] f50cdfa8: 00000000c4000237 (startup_32_smp+0x16b/0x16d)
> > [    1.048432] f50cdfac: 0000000000000000 ...
> > [    1.048433] f50cdff4: 0000000000000100 (0x100)
> > [    1.048434] f50cdff8: 0000000000000200 (0x200)
> > [    1.048435] f50cdffc: 0000000000000000 ...
> > [    1.060368] [drm] Supports vblank timestamp caching Rev 2
> > (21.10.2013).
> > [    1.060373] [drm] Driver supports precise vblank timestamp query.
> > [    1.061668] i915 0000:00:02.0: vgaarb: changed VGA decodes:
> > olddecodes=io+mem,decodes=io+mem:owns=io+mem
> > [    1.146917] ACPI: Video Device [VID] (multi-head: yes  rom: no
> > post: no)
> > 
> > -- 
> > (english) http://www.livejournal.com/~pavelmachek
> > (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
> 
> 
> 

-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 181 bytes --]

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: v4.10: kernel stack frame pointer .. has bad value (null)
  2017-02-22 20:51       ` H. Peter Anvin
@ 2017-02-22 21:15         ` Josh Poimboeuf
  0 siblings, 0 replies; 51+ messages in thread
From: Josh Poimboeuf @ 2017-02-22 21:15 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Pavel Machek, kernel list, mingo, luto, bp, brgerst, dvlasenk,
	torvalds, peterz, tglx

On Wed, Feb 22, 2017 at 12:51:11PM -0800, H. Peter Anvin wrote:
> On 02/22/17 08:45, Josh Poimboeuf wrote:
> >>
> >> FWIW, it would be really darned nice to not have all those zeroes in a
> >> 32-bit stack frame dump.
> > 
> > Yeah, I'll fix that.
> > 
> >> Is not a zero stack frame pointer value an end of stack token?
> > 
> > There's no end of stack "token" per se, though any frame pointer value
> > outside the bounds of the stack will terminate the stack trace (and that
> > still happened here).
> > 
> 
> Well, my understanding is that at least gdb and perhaps other unwinders
> consider a zero stack frame pointer to be an indicator that the stack
> has reached its end.  That's why I'm wondering if this is possible in
> this case or if it is unlikely because of the value.

I'm not sure I follow your question.  The frame pointer was zero, and
that did cause the unwinder to stop the stack trace.  The warning was
because it ended in an unexpected place.

> > The warning is because the stack trace didn't make it all the way to the
> > "end" location of the stack (right before the syscall pt_regs location).
> > The warning is part of the effort to ensure reliable stacks.
> 
> It would be useful to get an understanding why...

Agreed...

-- 
Josh

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: v4.10: kernel stack frame pointer .. has bad value (null)
  2017-02-22 21:05   ` Pavel Machek
@ 2017-02-22 21:21     ` Josh Poimboeuf
  2017-02-22 22:47       ` Pavel Machek
  0 siblings, 1 reply; 51+ messages in thread
From: Josh Poimboeuf @ 2017-02-22 21:21 UTC (permalink / raw)
  To: Pavel Machek
  Cc: kernel list, mingo, luto, bp, brgerst, dvlasenk, hpa, torvalds,
	peterz, tglx

On Wed, Feb 22, 2017 at 10:05:48PM +0100, Pavel Machek wrote:
> Hi!
> 
> > > Thinkpad X220, in 32 bit mode... and I'm getting rather scary messages
> > > from kernel during boot:
> > > 
> > > Git blame says that message comes from commit
> > > 
> > > commit 24d86f59093b0bcb3756cdf47f2db10ff4e90dbb
> > > Author: Josh Poimboeuf <jpoimboe@redhat.com>
> > > Date:   Thu Oct 27 08:10:58 2016 -0500
> > > 
> > >     x86/unwind: Ensure stack grows down
> > > 
> > >     Add a sanity check to ensure the stack only grows down, and print
> > >     a
> > >         warning if the check fails.
> > > 
> > > Any ideas?
> > 
> > Hi Pavel,
> > 
> > I don't think I've seen this one.  Any chance this came after resuming
> > from a hibernation or suspend?
> 
> No, it was during the boot. Notice the timestamps...

Right, but doesn't waking from hibernation initially start with a
timestamp of zero?

The reason I asked is because of the following part of the stack dump:

> > > [    1.048429] f50cdf9c: 00000000c4000237 (startup_32_smp+0x16b/0x16d)
> > > [    1.048429] f50cdfa0: 0000000000200002 (0x200002)
> > > [    1.048430] f50cdfa4: 0000000000000000 ...
> > > [    1.048432] f50cdfa8: 00000000c4000237 (startup_32_smp+0x16b/0x16d)
> > > [    1.048432] f50cdfac: 0000000000000000 ...
> > > [    1.048433] f50cdff4: 0000000000000100 (0x100)
> > > [    1.048434] f50cdff8: 0000000000000200 (0x200)
> > > [    1.048435] f50cdffc: 0000000000000000 ...
> > > [    1.060368] [drm] Supports vblank timestamp caching Rev 2

Somehow, startup_32_smp() is on the stack twice.  The stack unwind led
to the startup_32_smp() frame at 0xf50cdf9c rather than the one at
0xf50cdfa8 (which is where it should normally be).  So the question is
how startup_32_smp() got executed the second time, with the wrong stack
offset.

-- 
Josh

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: v4.10: kernel stack frame pointer .. has bad value (null)
  2017-02-22 21:21     ` Josh Poimboeuf
@ 2017-02-22 22:47       ` Pavel Machek
  2017-02-22 22:56         ` Josh Poimboeuf
  0 siblings, 1 reply; 51+ messages in thread
From: Pavel Machek @ 2017-02-22 22:47 UTC (permalink / raw)
  To: Josh Poimboeuf
  Cc: kernel list, mingo, luto, bp, brgerst, dvlasenk, hpa, torvalds,
	peterz, tglx

[-- Attachment #1: Type: text/plain, Size: 2278 bytes --]

Hi!

> > > > Thinkpad X220, in 32 bit mode... and I'm getting rather scary messages
> > > > from kernel during boot:
> > > > 
> > > > Git blame says that message comes from commit
> > > > 
> > > > commit 24d86f59093b0bcb3756cdf47f2db10ff4e90dbb
> > > > Author: Josh Poimboeuf <jpoimboe@redhat.com>
> > > > Date:   Thu Oct 27 08:10:58 2016 -0500
> > > > 
> > > >     x86/unwind: Ensure stack grows down
> > > > 
> > > >     Add a sanity check to ensure the stack only grows down, and print
> > > >     a
> > > >         warning if the check fails.
> > > > 
> > > > Any ideas?
> > > 
> > > I don't think I've seen this one.  Any chance this came after resuming
> > > from a hibernation or suspend?
> > 
> > No, it was during the boot. Notice the timestamps...
> 
> Right, but doesn't waking from hibernation initially start with a
> timestamp of zero?

Aha, ok, I guess so. Anyway... no hibernation was involved.

> The reason I asked is because of the following part of the stack
> dump:

> 
> > > > [    1.048429] f50cdf9c: 00000000c4000237 (startup_32_smp+0x16b/0x16d)
> > > > [    1.048429] f50cdfa0: 0000000000200002 (0x200002)
> > > > [    1.048430] f50cdfa4: 0000000000000000 ...
> > > > [    1.048432] f50cdfa8: 00000000c4000237 (startup_32_smp+0x16b/0x16d)
> > > > [    1.048432] f50cdfac: 0000000000000000 ...
> > > > [    1.048433] f50cdff4: 0000000000000100 (0x100)
> > > > [    1.048434] f50cdff8: 0000000000000200 (0x200)
> > > > [    1.048435] f50cdffc: 0000000000000000 ...
> > > > [    1.060368] [drm] Supports vblank timestamp caching Rev 2
> 
> Somehow, startup_32_smp() is on the stack twice.  The stack unwind led
> to the startup_32_smp() frame at 0xf50cdf9c rather than the one at
> 0xf50cdfa8 (which is where it should normally be).  So the question is
> how startup_32_smp() got executed the second time, with the wrong stack
> offset.

Not much idea... but this is stack dump, right? Just because some
value is on the stack does not mean it is a return address, no?

And .... startup_32_smp is kind of "interesting" function. Take a
look...

									Pavel

-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 181 bytes --]

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: v4.10: kernel stack frame pointer .. has bad value (null)
  2017-02-22 22:47       ` Pavel Machek
@ 2017-02-22 22:56         ` Josh Poimboeuf
  2017-02-22 23:18           ` Josh Poimboeuf
  0 siblings, 1 reply; 51+ messages in thread
From: Josh Poimboeuf @ 2017-02-22 22:56 UTC (permalink / raw)
  To: Pavel Machek
  Cc: kernel list, mingo, luto, bp, brgerst, dvlasenk, hpa, torvalds,
	peterz, tglx

On Wed, Feb 22, 2017 at 11:47:55PM +0100, Pavel Machek wrote:
> Hi!
> 
> > > > > Thinkpad X220, in 32 bit mode... and I'm getting rather scary messages
> > > > > from kernel during boot:
> > > > > 
> > > > > Git blame says that message comes from commit
> > > > > 
> > > > > commit 24d86f59093b0bcb3756cdf47f2db10ff4e90dbb
> > > > > Author: Josh Poimboeuf <jpoimboe@redhat.com>
> > > > > Date:   Thu Oct 27 08:10:58 2016 -0500
> > > > > 
> > > > >     x86/unwind: Ensure stack grows down
> > > > > 
> > > > >     Add a sanity check to ensure the stack only grows down, and print
> > > > >     a
> > > > >         warning if the check fails.
> > > > > 
> > > > > Any ideas?
> > > > 
> > > > I don't think I've seen this one.  Any chance this came after resuming
> > > > from a hibernation or suspend?
> > > 
> > > No, it was during the boot. Notice the timestamps...
> > 
> > Right, but doesn't waking from hibernation initially start with a
> > timestamp of zero?
> 
> Aha, ok, I guess so. Anyway... no hibernation was involved.
> 
> > The reason I asked is because of the following part of the stack
> > dump:
> 
> > 
> > > > > [    1.048429] f50cdf9c: 00000000c4000237 (startup_32_smp+0x16b/0x16d)
> > > > > [    1.048429] f50cdfa0: 0000000000200002 (0x200002)
> > > > > [    1.048430] f50cdfa4: 0000000000000000 ...
> > > > > [    1.048432] f50cdfa8: 00000000c4000237 (startup_32_smp+0x16b/0x16d)
> > > > > [    1.048432] f50cdfac: 0000000000000000 ...
> > > > > [    1.048433] f50cdff4: 0000000000000100 (0x100)
> > > > > [    1.048434] f50cdff8: 0000000000000200 (0x200)
> > > > > [    1.048435] f50cdffc: 0000000000000000 ...
> > > > > [    1.060368] [drm] Supports vblank timestamp caching Rev 2
> > 
> > Somehow, startup_32_smp() is on the stack twice.  The stack unwind led
> > to the startup_32_smp() frame at 0xf50cdf9c rather than the one at
> > 0xf50cdfa8 (which is where it should normally be).  So the question is
> > how startup_32_smp() got executed the second time, with the wrong stack
> > offset.
> 
> Not much idea... but this is stack dump, right? Just because some
> value is on the stack does not mean it is a return address, no?

Right, but the one at 0xf50cdfa8 is where the startup_32_smp() is
*supposed* to be.  If the unwinder had unwinded to that one, it wouldn't
have complained.  So it looks to me like the CPU somehow booted twice:
the first time at the right stack address, and the second time it
somehow ended up with a different stack address.

> And .... startup_32_smp is kind of "interesting" function. Take a
> look...

Yes, it's used in bringing up the CPU.

-- 
Josh

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: v4.10: kernel stack frame pointer .. has bad value (null)
  2017-02-22 22:56         ` Josh Poimboeuf
@ 2017-02-22 23:18           ` Josh Poimboeuf
  2017-02-23 20:10             ` Pavel Machek
  0 siblings, 1 reply; 51+ messages in thread
From: Josh Poimboeuf @ 2017-02-22 23:18 UTC (permalink / raw)
  To: Pavel Machek
  Cc: kernel list, mingo, luto, bp, brgerst, dvlasenk, hpa, torvalds,
	peterz, tglx

On Wed, Feb 22, 2017 at 04:56:14PM -0600, Josh Poimboeuf wrote:
> On Wed, Feb 22, 2017 at 11:47:55PM +0100, Pavel Machek wrote:
> > Hi!
> > 
> > > > > > Thinkpad X220, in 32 bit mode... and I'm getting rather scary messages
> > > > > > from kernel during boot:
> > > > > > 
> > > > > > Git blame says that message comes from commit
> > > > > > 
> > > > > > commit 24d86f59093b0bcb3756cdf47f2db10ff4e90dbb
> > > > > > Author: Josh Poimboeuf <jpoimboe@redhat.com>
> > > > > > Date:   Thu Oct 27 08:10:58 2016 -0500
> > > > > > 
> > > > > >     x86/unwind: Ensure stack grows down
> > > > > > 
> > > > > >     Add a sanity check to ensure the stack only grows down, and print
> > > > > >     a
> > > > > >         warning if the check fails.
> > > > > > 
> > > > > > Any ideas?
> > > > > 
> > > > > I don't think I've seen this one.  Any chance this came after resuming
> > > > > from a hibernation or suspend?
> > > > 
> > > > No, it was during the boot. Notice the timestamps...
> > > 
> > > Right, but doesn't waking from hibernation initially start with a
> > > timestamp of zero?
> > 
> > Aha, ok, I guess so. Anyway... no hibernation was involved.
> > 
> > > The reason I asked is because of the following part of the stack
> > > dump:
> > 
> > > 
> > > > > > [    1.048429] f50cdf9c: 00000000c4000237 (startup_32_smp+0x16b/0x16d)
> > > > > > [    1.048429] f50cdfa0: 0000000000200002 (0x200002)
> > > > > > [    1.048430] f50cdfa4: 0000000000000000 ...
> > > > > > [    1.048432] f50cdfa8: 00000000c4000237 (startup_32_smp+0x16b/0x16d)
> > > > > > [    1.048432] f50cdfac: 0000000000000000 ...
> > > > > > [    1.048433] f50cdff4: 0000000000000100 (0x100)
> > > > > > [    1.048434] f50cdff8: 0000000000000200 (0x200)
> > > > > > [    1.048435] f50cdffc: 0000000000000000 ...
> > > > > > [    1.060368] [drm] Supports vblank timestamp caching Rev 2
> > > 
> > > Somehow, startup_32_smp() is on the stack twice.  The stack unwind led
> > > to the startup_32_smp() frame at 0xf50cdf9c rather than the one at
> > > 0xf50cdfa8 (which is where it should normally be).  So the question is
> > > how startup_32_smp() got executed the second time, with the wrong stack
> > > offset.
> > 
> > Not much idea... but this is stack dump, right? Just because some
> > value is on the stack does not mean it is a return address, no?
> 
> Right, but the one at 0xf50cdfa8 is where the startup_32_smp() is
> *supposed* to be.  If the unwinder had unwinded to that one, it wouldn't
> have complained.  So it looks to me like the CPU somehow booted twice:
> the first time at the right stack address, and the second time it
> somehow ended up with a different stack address.
> 
> > And .... startup_32_smp is kind of "interesting" function. Take a
> > look...
> 
> Yes, it's used in bringing up the CPU.

Can you share your .config?  

-- 
Josh

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: v4.10: kernel stack frame pointer .. has bad value (null)
  2017-02-22 23:18           ` Josh Poimboeuf
@ 2017-02-23 20:10             ` Pavel Machek
  2017-02-25  5:04               ` Josh Poimboeuf
  0 siblings, 1 reply; 51+ messages in thread
From: Pavel Machek @ 2017-02-23 20:10 UTC (permalink / raw)
  To: Josh Poimboeuf
  Cc: kernel list, mingo, luto, bp, brgerst, dvlasenk, hpa, torvalds,
	peterz, tglx


[-- Attachment #1.1: Type: text/plain, Size: 4899 bytes --]

Hi!


> > > > Somehow, startup_32_smp() is on the stack twice.  The stack unwind led
> > > > to the startup_32_smp() frame at 0xf50cdf9c rather than the one at
> > > > 0xf50cdfa8 (which is where it should normally be).  So the question is
> > > > how startup_32_smp() got executed the second time, with the wrong stack
> > > > offset.
> > > 
> > > Not much idea... but this is stack dump, right? Just because some
> > > value is on the stack does not mean it is a return address, no?
> > 
> > Right, but the one at 0xf50cdfa8 is where the startup_32_smp() is
> > *supposed* to be.  If the unwinder had unwinded to that one, it wouldn't
> > have complained.  So it looks to me like the CPU somehow booted twice:
> > the first time at the right stack address, and the second time it
> > somehow ended up with a different stack address.
> > 
> > > And .... startup_32_smp is kind of "interesting" function. Take a
> > > look...
> > 
> > Yes, it's used in bringing up the CPU.
> 
> Can you share your .config?  

Here you go...

Meanwhile, another machine, same kernel:

[    0.359606] RPC: Registered tcp NFSv4.1 backchannel transport
module.
[    0.359681] pci 0000:00:02.0: Video device with shadowed ROM at
[mem 0x000c0000-0x000dffff]
[    0.392020] WARNING: kernel stack frame pointer at f4ca9f98 in
swapper/1:0 has bad value   (null)
[    0.392023] unwind stack type:0 next_sp:  (null) mask:a graph_idx:0
[    0.392026] f4ca9ee0: 00000000f4ca9ee8 (0xf4ca9ee8)
[    0.392031] f4ca9ee4: 00000000c40489b7 (irq_exit+0x87/0xa0)
[    0.392032] f4ca9ee8: 00000000f4ca9ef4 (0xf4ca9ef4)
[    0.392036] f4ca9eec: 00000000c402f6d3
(smp_apic_timer_interrupt+0x33/0x40)
[    0.392037] f4ca9ef0: 0000000000000000 ...
[    0.392038] f4ca9ef4: 00000000f4ca9efd (0xf4ca9efd)
[    0.392042] f4ca9ef8: 00000000c4b7ac8e
(apic_timer_interrupt+0x36/0x3c)
[    0.392042] f4ca9efc: 0000000000000000 ...
[    0.392044] f4ca9f0c: 00000000f4c82000 (0xf4c82000)
[    0.392045] f4ca9f10: 00000000f4ca9f38 (0xf4ca9f38)
[    0.392046] f4ca9f14: 0000000000000000 ...
[    0.392047] f4ca9f18: 0000000016e3007b (0x16e3007b)
[    0.392048] f4ca9f1c: 000000000000007b (0x7b)
[    0.392050] f4ca9f20: 00000000000000d8 (0xd8)
[    0.392051] f4ca9f24: 00000000175d00e0 (0x175d00e0)
[    0.392052] f4ca9f28: 00000000ffffff10 (0xffffff10)
[    0.392054] f4ca9f2c: 00000000c4b79883 (mwait_idle+0x43/0x70)
[    0.392055] f4ca9f30: 0000000000000060 (0x60)
[    0.392057] f4ca9f34: 0000000000200246 (0x200246)
[    0.392058] f4ca9f38: 00000000f4ca9f40 (0xf4ca9f40)
[    0.392061] f4ca9f3c: 00000000c401ed09 (arch_cpu_idle+0x9/0x10)
[    0.392062] f4ca9f40: 00000000f4ca9f48 (0xf4ca9f48)
[    0.392064] f4ca9f44: 00000000c4b799cf
(default_idle_call+0x1f/0x30)
[    0.392065] f4ca9f48: 00000000f4ca9f60 (0xf4ca9f60)
[    0.392069] f4ca9f4c: 00000000c4083345 (do_idle+0x155/0x1d0)
[    0.392071] f4ca9f50: 00000000f4c82000 (0xf4c82000)
[    0.392072] f4ca9f54: 00000000fa696c2c (0xfa696c2c)
[    0.392073] f4ca9f58: 0000000000000087 (0x87)
[    0.392074] f4ca9f5c: 00000000cdcd762c (0xcdcd762c)
[    0.392075] f4ca9f60: 00000000f4ca9f78 (0xf4ca9f78)
[    0.392078] f4ca9f64: 00000000c408361d
(cpu_startup_entry+0x5d/0x60)
[    0.392079] f4ca9f68: 000000001dc1ce68 (0x1dc1ce68)
[    0.392080] f4ca9f6c: 00000000fc816a87 (0xfc816a87)
[    0.392081] f4ca9f70: 0000000001020800 (0x1020800)
[    0.392083] f4ca9f74: 000000003e54bb16 (0x3e54bb16)
[    0.392084] f4ca9f78: 00000000f4ca9f98 (0xf4ca9f98)
[    0.392086] f4ca9f7c: 00000000c402d216
(start_secondary+0x176/0x1c0)
[    0.392088] f4ca9f80: 000000003e54bb16 (0x3e54bb16)
[    0.392089] f4ca9f84: 00000000cfe53ead (0xcfe53ead)
[    0.392090] f4ca9f88: 000000000a810020 (0xa810020)
[    0.392091] f4ca9f8c: 0000000000046210 (0x46210)
[    0.392092] f4ca9f90: 0000000000000000 ...
[    0.392094] f4ca9f94: 00000000f4ca9fac (0xf4ca9fac)
[    0.392095] f4ca9f98: 0000000000000000 ...
[    0.392097] f4ca9f9c: 00000000c4000237 (startup_32_smp+0x16b/0x16d)
[    0.392098] f4ca9fa0: 0000000000200002 (0x200002)
[    0.392099] f4ca9fa4: 0000000000000000 ...
[    0.392101] f4ca9fa8: 00000000c4000237 (startup_32_smp+0x16b/0x16d)
[    0.392102] f4ca9fac: 0000000000000000 ...
[    0.392103] f4ca9ff4: 0000000002008070 (0x2008070)
[    0.392104] f4ca9ff8: 0000000000200000 (0x200000)
[    0.392106] f4ca9ffc: 000000004398000c (0x4398000c)
[    2.368034] pci 0000:00:1d.7: EHCI: BIOS handoff failed (BIOS bug?)
01010001
[    2.368388] PCI: CLS 64 bytes, default 64
[    2.370561] workingset: timestamp_bits=30 max_order=20
bucket_order=0
[    2.371741] Installing knfsd (copyright (C) 1996
okir@monad.swb.de).
[    2.372041] ntfs: driver 2.1.32 [Flags: R/W].



-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

[-- Attachment #1.2: .config.gz --]
[-- Type: application/gzip, Size: 26643 bytes --]

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 181 bytes --]

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: v4.10: kernel stack frame pointer .. has bad value (null)
  2017-02-23 20:10             ` Pavel Machek
@ 2017-02-25  5:04               ` Josh Poimboeuf
  2017-03-02 23:45                 ` Josh Poimboeuf
  0 siblings, 1 reply; 51+ messages in thread
From: Josh Poimboeuf @ 2017-02-25  5:04 UTC (permalink / raw)
  To: Pavel Machek
  Cc: kernel list, mingo, luto, bp, brgerst, dvlasenk, hpa, torvalds,
	peterz, tglx

On Thu, Feb 23, 2017 at 09:10:39PM +0100, Pavel Machek wrote:
> Hi!
> 
> 
> > > > > Somehow, startup_32_smp() is on the stack twice.  The stack unwind led
> > > > > to the startup_32_smp() frame at 0xf50cdf9c rather than the one at
> > > > > 0xf50cdfa8 (which is where it should normally be).  So the question is
> > > > > how startup_32_smp() got executed the second time, with the wrong stack
> > > > > offset.
> > > > 
> > > > Not much idea... but this is stack dump, right? Just because some
> > > > value is on the stack does not mean it is a return address, no?
> > > 
> > > Right, but the one at 0xf50cdfa8 is where the startup_32_smp() is
> > > *supposed* to be.  If the unwinder had unwinded to that one, it wouldn't
> > > have complained.  So it looks to me like the CPU somehow booted twice:
> > > the first time at the right stack address, and the second time it
> > > somehow ended up with a different stack address.
> > > 
> > > > And .... startup_32_smp is kind of "interesting" function. Take a
> > > > look...
> > > 
> > > Yes, it's used in bringing up the CPU.
> > 
> > Can you share your .config?  
> 
> Here you go...

What version of gcc are you using?

Can you post a disassembly of the first 10 instructions of
start_secondary()?

-- 
Josh

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: v4.10: kernel stack frame pointer .. has bad value (null)
  2017-02-25  5:04               ` Josh Poimboeuf
@ 2017-03-02 23:45                 ` Josh Poimboeuf
  2017-03-06 16:38                   ` Pavel Machek
  0 siblings, 1 reply; 51+ messages in thread
From: Josh Poimboeuf @ 2017-03-02 23:45 UTC (permalink / raw)
  To: Pavel Machek
  Cc: kernel list, mingo, luto, bp, brgerst, dvlasenk, hpa, torvalds,
	peterz, tglx

On Fri, Feb 24, 2017 at 11:04:39PM -0600, Josh Poimboeuf wrote:
> On Thu, Feb 23, 2017 at 09:10:39PM +0100, Pavel Machek wrote:
> > Hi!
> > 
> > 
> > > > > > Somehow, startup_32_smp() is on the stack twice.  The stack unwind led
> > > > > > to the startup_32_smp() frame at 0xf50cdf9c rather than the one at
> > > > > > 0xf50cdfa8 (which is where it should normally be).  So the question is
> > > > > > how startup_32_smp() got executed the second time, with the wrong stack
> > > > > > offset.
> > > > > 
> > > > > Not much idea... but this is stack dump, right? Just because some
> > > > > value is on the stack does not mean it is a return address, no?
> > > > 
> > > > Right, but the one at 0xf50cdfa8 is where the startup_32_smp() is
> > > > *supposed* to be.  If the unwinder had unwinded to that one, it wouldn't
> > > > have complained.  So it looks to me like the CPU somehow booted twice:
> > > > the first time at the right stack address, and the second time it
> > > > somehow ended up with a different stack address.
> > > > 
> > > > > And .... startup_32_smp is kind of "interesting" function. Take a
> > > > > look...
> > > > 
> > > > Yes, it's used in bringing up the CPU.
> > > 
> > > Can you share your .config?  
> > 
> > Here you go...
> 
> What version of gcc are you using?
> 
> Can you post a disassembly of the first 10 instructions of
> start_secondary()?

Pavel, ping?  I'd like to try to get to the bottom of this issue soon.

I asked for the gcc version and the disassembly of start_secondary()
because I suspect gcc may have done a funky stack alignment prologue
which copies the return address on the stack a second time after
aligning it.

-- 
Josh

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: v4.10: kernel stack frame pointer .. has bad value (null)
  2017-03-02 23:45                 ` Josh Poimboeuf
@ 2017-03-06 16:38                   ` Pavel Machek
  2017-03-07 17:38                     ` Josh Poimboeuf
  0 siblings, 1 reply; 51+ messages in thread
From: Pavel Machek @ 2017-03-06 16:38 UTC (permalink / raw)
  To: Josh Poimboeuf
  Cc: kernel list, mingo, luto, bp, brgerst, dvlasenk, hpa, torvalds,
	peterz, tglx

[-- Attachment #1: Type: text/plain, Size: 4337 bytes --]

On Thu 2017-03-02 17:45:14, Josh Poimboeuf wrote:
> On Fri, Feb 24, 2017 at 11:04:39PM -0600, Josh Poimboeuf wrote:
> > On Thu, Feb 23, 2017 at 09:10:39PM +0100, Pavel Machek wrote:
> > > Hi!
> > > 
> > > 
> > > > > > > Somehow, startup_32_smp() is on the stack twice.  The stack unwind led
> > > > > > > to the startup_32_smp() frame at 0xf50cdf9c rather than the one at
> > > > > > > 0xf50cdfa8 (which is where it should normally be).  So the question is
> > > > > > > how startup_32_smp() got executed the second time, with the wrong stack
> > > > > > > offset.
> > > > > > 
> > > > > > Not much idea... but this is stack dump, right? Just because some
> > > > > > value is on the stack does not mean it is a return address, no?
> > > > > 
> > > > > Right, but the one at 0xf50cdfa8 is where the startup_32_smp() is
> > > > > *supposed* to be.  If the unwinder had unwinded to that one, it wouldn't
> > > > > have complained.  So it looks to me like the CPU somehow booted twice:
> > > > > the first time at the right stack address, and the second time it
> > > > > somehow ended up with a different stack address.
> > > > > 
> > > > > > And .... startup_32_smp is kind of "interesting" function. Take a
> > > > > > look...
> > > > > 
> > > > > Yes, it's used in bringing up the CPU.
> > > > 
> > > > Can you share your .config?  
> > > 
> > > Here you go...
> > 
> > What version of gcc are you using?
> > 
> > Can you post a disassembly of the first 10 instructions of
> > start_secondary()?
> 
> Pavel, ping?  I'd like to try to get to the bottom of this issue soon.
> 
> I asked for the gcc version and the disassembly of start_secondary()
> because I suspect gcc may have done a funky stack alignment prologue
> which copies the return address on the stack a second time after
> aligning it.

Sorry for the delay. This is on v4.11-rc1, but that should be similar.

pavel@duo:~$ gcc --version
gcc (Debian 4.9.2-10) 4.9.2

And here's the disassemble:

c402d200 <start_secondary>:
c402d200:       57                      push   %edi
c402d201:       8d 7c 24 08             lea    0x8(%esp),%edi
c402d205:       83 e4 f8                and    $0xfffffff8,%esp
c402d208:       ff 77 fc                pushl  -0x4(%edi)
c402d20b:       55                      push   %ebp
c402d20c:       89 e5                   mov    %esp,%ebp
c402d20e:       57                      push   %edi
c402d20f:       56                      push   %esi
c402d210:       83 ec 10                sub    $0x10,%esp
c402d213:       e8 78 78 ff ff          call   c4024a90 <cpu_init>
c402d218:       ff 15 d0 d7 0c c5       call   *0xc50cd7d0
c402d21e:       8b 15 00 53 05 c5       mov    0xc5055300,%edx
c402d224:       8d 75 e8                lea    -0x18(%ebp),%esi
c402d227:       64 a1 f4 c0 1d c5       mov    %fs:0xc51dc0f4,%eax
c402d22d:       89 45 e8                mov    %eax,-0x18(%ebp)
c402d230:       b8 20 00 00 00          mov    $0x20,%eax
c402d235:       ff 52 78                call   *0x78(%edx)
c402d238:       8b 15 00 53 05 c5       mov    0xc5055300,%edx
c402d23e:       ff 52 4c                call   *0x4c(%edx)
c402d241:       e8 ea 2c 00 00          call   c402ff30
<apic_ap_setup>
c402d246:       8b 45 e8                mov    -0x18(%ebp),%eax
c402d249:       e8 42 fb ff ff          call   c402cd90
<smp_store_cpu_info>
c402d24e:       e8 5d 37 fd ff          call   c40009b0
<calibrate_delay>
c402d253:       8b 55 e8                mov    -0x18(%ebp),%edx
c402d256:       b8 00 c0 1d c5          mov    $0xc51dc000,%eax
c402d25b:       8b 0d 88 d6 0b c5       mov    0xc50bd688,%ecx
c402d261:       f6 05 fa fc 13 c5 04    testb  $0x4,0xc513fcfa
c402d268:       8b 14 95 20 52 05 c5    mov
-0x3afaade0(,%edx,4),%edx
c402d26f:       89 8c 10 c4 00 00 00    mov    %ecx,0xc4(%eax,%edx,1)
c402d276:       0f 85 24 01 00 00       jne    c402d3a0
<start_secondary+0x1a0>
c402d27c:       64 a1 f4 c0 1d c5       mov    %fs:0xc51dc0f4,%eax
c402d282:       e8 49 fb ff ff          call   c402cdd0
<set_cpu_sibling_map>

Let me know if I should go back to v4.10 and retry.

Best regards,
									Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 181 bytes --]

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: v4.10: kernel stack frame pointer .. has bad value (null)
  2017-03-06 16:38                   ` Pavel Machek
@ 2017-03-07 17:38                     ` Josh Poimboeuf
  2017-03-07 17:52                       ` Linus Torvalds
  0 siblings, 1 reply; 51+ messages in thread
From: Josh Poimboeuf @ 2017-03-07 17:38 UTC (permalink / raw)
  To: Pavel Machek
  Cc: kernel list, mingo, luto, bp, brgerst, dvlasenk, hpa, torvalds,
	peterz, tglx

On Mon, Mar 06, 2017 at 05:38:07PM +0100, Pavel Machek wrote:
> Sorry for the delay. This is on v4.11-rc1, but that should be similar.
> 
> pavel@duo:~$ gcc --version
> gcc (Debian 4.9.2-10) 4.9.2
> 
> And here's the disassemble:
> 
> c402d200 <start_secondary>:
> c402d200:       57                      push   %edi
> c402d201:       8d 7c 24 08             lea    0x8(%esp),%edi
> c402d205:       83 e4 f8                and    $0xfffffff8,%esp
> c402d208:       ff 77 fc                pushl  -0x4(%edi)
> c402d20b:       55                      push   %ebp
> c402d20c:       89 e5                   mov    %esp,%ebp
> c402d20e:       57                      push   %edi
> c402d20f:       56                      push   %esi
> c402d210:       83 ec 10                sub    $0x10,%esp

Thanks.  This confirms what I was thinking, the function prologue is
wack.  It's realigning the stack, but it's not the "normal" realign
pattern.  Instead it makes a fake frame header, which saves a duplicate
copy of the return address ("pushl -0x4(%edi)") in a place the unwinder
wasn't expecting.

I did some digging in gcc to figure out why this can happen.  gcc uses
something called a Dynamic Realign Argument Pointer (DRAP), which, when
enabled, makes a prologue like the above.  It's almost always enabled
for aligned stacks when -maccumulate-outgoing-args isn't set.

So I'm thinking we should have -maccumulate-outgoing-args always enabled
on x86_32 just like we already do on x86_64.

Can you verify the warning is fixed with the following patch?

-----

diff --git a/arch/x86/Makefile_32.cpu b/arch/x86/Makefile_32.cpu
index 6647ed4..53ec1e4 100644
--- a/arch/x86/Makefile_32.cpu
+++ b/arch/x86/Makefile_32.cpu
@@ -61,7 +61,7 @@ ifeq ($(CONFIG_JUMP_LABEL), y)
 ADD_ACCUMULATE_OUTGOING_ARGS := y
 endif
 
-cflags-$(ADD_ACCUMULATE_OUTGOING_ARGS) += $(call cc-option,-maccumulate-outgoing-args)
+cflags-y += $(call cc-option,-maccumulate-outgoing-args)
 
 # Bug fix for binutils: this option is required in order to keep
 # binutils from generating NOPL instructions against our will.

^ permalink raw reply related	[flat|nested] 51+ messages in thread

* Re: v4.10: kernel stack frame pointer .. has bad value (null)
  2017-03-07 17:38                     ` Josh Poimboeuf
@ 2017-03-07 17:52                       ` Linus Torvalds
  2017-03-07 17:59                         ` Andy Lutomirski
  0 siblings, 1 reply; 51+ messages in thread
From: Linus Torvalds @ 2017-03-07 17:52 UTC (permalink / raw)
  To: Josh Poimboeuf
  Cc: Pavel Machek, kernel list, Ingo Molnar, Andrew Lutomirski,
	Borislav Petkov, Brian Gerst, Denys Vlasenko, Peter Anvin,
	Peter Zijlstra, Thomas Gleixner

On Tue, Mar 7, 2017 at 9:38 AM, Josh Poimboeuf <jpoimboe@redhat.com> wrote:
>
> So I'm thinking we should have -maccumulate-outgoing-args always enabled
> on x86_32 just like we already do on x86_64.

Ugh. I realize we have workarounds for bugs, but I think
-maccumulate-outgoing-args is nasty. It just generates worse code by
avoiding the much nicer push/pop sequences, afaik.

On x86-64 it's not such a big deal, because we pass the first six
arguments in registers anyway, so the arguments on the stack is a
fairly unusual special case.

But on x86-32, we only have three argument registers, so this
braindamage is potentially worse.

I guess we already do this in most situations due to the gcc bugs, but
I do think it's sad that we would do it for our _own_ bugs too.

                   Linus

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: v4.10: kernel stack frame pointer .. has bad value (null)
  2017-03-07 17:52                       ` Linus Torvalds
@ 2017-03-07 17:59                         ` Andy Lutomirski
  2017-03-07 18:28                           ` Josh Poimboeuf
  0 siblings, 1 reply; 51+ messages in thread
From: Andy Lutomirski @ 2017-03-07 17:59 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Josh Poimboeuf, Pavel Machek, kernel list, Ingo Molnar,
	Andrew Lutomirski, Borislav Petkov, Brian Gerst, Denys Vlasenko,
	Peter Anvin, Peter Zijlstra, Thomas Gleixner

On Tue, Mar 7, 2017 at 9:52 AM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
> On Tue, Mar 7, 2017 at 9:38 AM, Josh Poimboeuf <jpoimboe@redhat.com> wrote:
>>
>> So I'm thinking we should have -maccumulate-outgoing-args always enabled
>> on x86_32 just like we already do on x86_64.
>
> Ugh. I realize we have workarounds for bugs, but I think
> -maccumulate-outgoing-args is nasty. It just generates worse code by
> avoiding the much nicer push/pop sequences, afaik.
>
> On x86-64 it's not such a big deal, because we pass the first six
> arguments in registers anyway, so the arguments on the stack is a
> fairly unusual special case.
>
> But on x86-32, we only have three argument registers, so this
> braindamage is potentially worse.
>
> I guess we already do this in most situations due to the gcc bugs, but
> I do think it's sad that we would do it for our _own_ bugs too.
>

Is it our bug or a gcc bug?  I would have thought
-fno-omit-frame-pointer meant that the call-frame-to-return-address
offset should be constant and -fomit-frame-pointer meant "do
whatever".

Also, maybe I'm missing something, but does gcc's code even allow the
function to return sensibly?  It could do it by a nasty calculation
involving backing out the old esp from edi, but that seems quite
overcomplicated.

--Andy

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: v4.10: kernel stack frame pointer .. has bad value (null)
  2017-03-07 17:59                         ` Andy Lutomirski
@ 2017-03-07 18:28                           ` Josh Poimboeuf
  2017-03-07 18:30                             ` Josh Poimboeuf
  2017-03-07 18:40                             ` Linus Torvalds
  0 siblings, 2 replies; 51+ messages in thread
From: Josh Poimboeuf @ 2017-03-07 18:28 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Linus Torvalds, Pavel Machek, kernel list, Ingo Molnar,
	Andrew Lutomirski, Borislav Petkov, Brian Gerst, Denys Vlasenko,
	Peter Anvin, Peter Zijlstra, Thomas Gleixner

On Tue, Mar 07, 2017 at 09:59:44AM -0800, Andy Lutomirski wrote:
> On Tue, Mar 7, 2017 at 9:52 AM, Linus Torvalds
> <torvalds@linux-foundation.org> wrote:
> > On Tue, Mar 7, 2017 at 9:38 AM, Josh Poimboeuf <jpoimboe@redhat.com> wrote:
> >>
> >> So I'm thinking we should have -maccumulate-outgoing-args always enabled
> >> on x86_32 just like we already do on x86_64.
> >
> > Ugh. I realize we have workarounds for bugs, but I think
> > -maccumulate-outgoing-args is nasty. It just generates worse code by
> > avoiding the much nicer push/pop sequences, afaik.

Yes, maybe the pushes/pops around a function call are a little easier to
read than movs.

But the -maccumulate-outgoing-args realignment prologue is a *lot* worse
for readability, IMO.

Also, the gcc documentation says -maccumulate-outgoing-args is
"generally beneficial for performance and size."

Not to mention the fact that -maccumulate-outgoing-args seems to already
be enabled in most cases anyway.  Having it uniformly enabled everywhere
makes it less confusing overall when the rare divergences are
encountered.  From looking at some of the changes related to
ADD_ACCUMULATE_OUTGOING_ARGS in arch/x86/Makefile_32.cpu, I can tell
that several others before me have stumbled into this prologue issue.

> > On x86-64 it's not such a big deal, because we pass the first six
> > arguments in registers anyway, so the arguments on the stack is a
> > fairly unusual special case.
> >
> > But on x86-32, we only have three argument registers, so this
> > braindamage is potentially worse.
> >
> > I guess we already do this in most situations due to the gcc bugs, but
> > I do think it's sad that we would do it for our _own_ bugs too.
> >
> 
> Is it our bug or a gcc bug?  I would have thought
> -fno-omit-frame-pointer meant that the call-frame-to-return-address
> offset should be constant and -fomit-frame-pointer meant "do
> whatever".

I don't think it's a gcc bug because it doesn't seem to violate frame
pointer conventions:

    pushl -0x4(%edi)	# copy return address
    push %ebp

The frame pointer and return address are still stored adjacently.  And
it normally allows unwinds to work fine.

The problem is the kernel unwinder's assumption that the last frame
pointer is at a certain address.  That assumption breaks with the DRAP
prologue.

> Also, maybe I'm missing something, but does gcc's code even allow the
> function to return sensibly?  It could do it by a nasty calculation
> involving backing out the old esp from edi, but that seems quite
> overcomplicated.

That's what it does:

    lea -0x8(%edi),%esp
    pop %edi
    ret

-- 
Josh

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: v4.10: kernel stack frame pointer .. has bad value (null)
  2017-03-07 18:28                           ` Josh Poimboeuf
@ 2017-03-07 18:30                             ` Josh Poimboeuf
  2017-03-07 18:40                             ` Linus Torvalds
  1 sibling, 0 replies; 51+ messages in thread
From: Josh Poimboeuf @ 2017-03-07 18:30 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Linus Torvalds, Pavel Machek, kernel list, Ingo Molnar,
	Andrew Lutomirski, Borislav Petkov, Brian Gerst, Denys Vlasenko,
	Peter Anvin, Peter Zijlstra, Thomas Gleixner

On Tue, Mar 07, 2017 at 12:28:55PM -0600, Josh Poimboeuf wrote:
> On Tue, Mar 07, 2017 at 09:59:44AM -0800, Andy Lutomirski wrote:
> > On Tue, Mar 7, 2017 at 9:52 AM, Linus Torvalds
> > <torvalds@linux-foundation.org> wrote:
> > > On Tue, Mar 7, 2017 at 9:38 AM, Josh Poimboeuf <jpoimboe@redhat.com> wrote:
> > >>
> > >> So I'm thinking we should have -maccumulate-outgoing-args always enabled
> > >> on x86_32 just like we already do on x86_64.
> > >
> > > Ugh. I realize we have workarounds for bugs, but I think
> > > -maccumulate-outgoing-args is nasty. It just generates worse code by
> > > avoiding the much nicer push/pop sequences, afaik.
> 
> Yes, maybe the pushes/pops around a function call are a little easier to
> read than movs.
> 
> But the -maccumulate-outgoing-args realignment prologue is a *lot* worse
> for readability, IMO.

Er, the *NON* -maccumulate-outgoing-args realignment prologue.

> Also, the gcc documentation says -maccumulate-outgoing-args is
> "generally beneficial for performance and size."
> 
> Not to mention the fact that -maccumulate-outgoing-args seems to already
> be enabled in most cases anyway.  Having it uniformly enabled everywhere
> makes it less confusing overall when the rare divergences are
> encountered.  From looking at some of the changes related to
> ADD_ACCUMULATE_OUTGOING_ARGS in arch/x86/Makefile_32.cpu, I can tell
> that several others before me have stumbled into this prologue issue.
> 
> > > On x86-64 it's not such a big deal, because we pass the first six
> > > arguments in registers anyway, so the arguments on the stack is a
> > > fairly unusual special case.
> > >
> > > But on x86-32, we only have three argument registers, so this
> > > braindamage is potentially worse.
> > >
> > > I guess we already do this in most situations due to the gcc bugs, but
> > > I do think it's sad that we would do it for our _own_ bugs too.
> > >
> > 
> > Is it our bug or a gcc bug?  I would have thought
> > -fno-omit-frame-pointer meant that the call-frame-to-return-address
> > offset should be constant and -fomit-frame-pointer meant "do
> > whatever".
> 
> I don't think it's a gcc bug because it doesn't seem to violate frame
> pointer conventions:
> 
>     pushl -0x4(%edi)	# copy return address
>     push %ebp
> 
> The frame pointer and return address are still stored adjacently.  And
> it normally allows unwinds to work fine.
> 
> The problem is the kernel unwinder's assumption that the last frame
> pointer is at a certain address.  That assumption breaks with the DRAP
> prologue.
> 
> > Also, maybe I'm missing something, but does gcc's code even allow the
> > function to return sensibly?  It could do it by a nasty calculation
> > involving backing out the old esp from edi, but that seems quite
> > overcomplicated.
> 
> That's what it does:
> 
>     lea -0x8(%edi),%esp
>     pop %edi
>     ret
> 
> -- 
> Josh

-- 
Josh

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: v4.10: kernel stack frame pointer .. has bad value (null)
  2017-03-07 18:28                           ` Josh Poimboeuf
  2017-03-07 18:30                             ` Josh Poimboeuf
@ 2017-03-07 18:40                             ` Linus Torvalds
  2017-03-08 17:37                               ` Josh Poimboeuf
  1 sibling, 1 reply; 51+ messages in thread
From: Linus Torvalds @ 2017-03-07 18:40 UTC (permalink / raw)
  To: Josh Poimboeuf
  Cc: Andy Lutomirski, Pavel Machek, kernel list, Ingo Molnar,
	Andrew Lutomirski, Borislav Petkov, Brian Gerst, Denys Vlasenko,
	Peter Anvin, Peter Zijlstra, Thomas Gleixner

On Tue, Mar 7, 2017 at 10:28 AM, Josh Poimboeuf <jpoimboe@redhat.com> wrote:
>
> Also, the gcc documentation says -maccumulate-outgoing-args is
> "generally beneficial for performance and size."

Hmm. I wonder how true that is. I'm pretty sure it generates bigger
code, although it's probably less noticeable in the kernel (as opposed
to the traditional x86 "push everything" model) due to having the
three register arguments.

And the "it's faster" is almost certainly garbage. It's true on P4 and
some older AMD cores that couldn't do push/pops quickly.

> Not to mention the fact that -maccumulate-outgoing-args seems to already
> be enabled in most cases anyway.

Yeah, that's the main argument for this patch, I think - just remove
the (unusual) special case.

                 Linus

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: v4.10: kernel stack frame pointer .. has bad value (null)
  2017-03-07 18:40                             ` Linus Torvalds
@ 2017-03-08 17:37                               ` Josh Poimboeuf
  2017-03-08 18:25                                 ` Linus Torvalds
  0 siblings, 1 reply; 51+ messages in thread
From: Josh Poimboeuf @ 2017-03-08 17:37 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Andy Lutomirski, Pavel Machek, kernel list, Ingo Molnar,
	Andrew Lutomirski, Borislav Petkov, Brian Gerst, Denys Vlasenko,
	Peter Anvin, Peter Zijlstra, Thomas Gleixner

On Tue, Mar 07, 2017 at 10:40:14AM -0800, Linus Torvalds wrote:
> On Tue, Mar 7, 2017 at 10:28 AM, Josh Poimboeuf <jpoimboe@redhat.com> wrote:
> >
> > Also, the gcc documentation says -maccumulate-outgoing-args is
> > "generally beneficial for performance and size."
> 
> Hmm. I wonder how true that is. I'm pretty sure it generates bigger
> code, although it's probably less noticeable in the kernel (as opposed
> to the traditional x86 "push everything" model) due to having the
> three register arguments.

It does seem to make it bigger.  With Pavel's config on gcc 6, if I add
-maccumulate-outgoing-args:

   text	   data	    bss	    dec	    hex	filename
12692555	5550652	9146368	27389575	1a1ee87	vmlinux.before
13179531	5546556	9146368	27872455	1a94cc7	vmlinux.after

That's 3.8% more text on x86-32.

(FWIW, on x86-64, the size difference is negligible.)

> And the "it's faster" is almost certainly garbage. It's true on P4 and
> some older AMD cores that couldn't do push/pops quickly.
> 
> > Not to mention the fact that -maccumulate-outgoing-args seems to already
> > be enabled in most cases anyway.
> 
> Yeah, that's the main argument for this patch, I think - just remove
> the (unusual) special case.

As it turns out, when optimizing for size, gcc seems to ignore
-maccumulate-outgoing-args completely.  So I guess we would have to live
with both cases anyway.  Which means I'll need to make the unwinder
smart enough to deal with it.

But that brings up another question.  If -maccumulate-outgoing-args is
ignored with CONFIG_CC_OPTIMIZE_FOR_SIZE=y, wouldn't using that option
break the things which require -maccumulate-outgoing-args?

So, looking deeper at the various reasons this flag is enabled, they
seem to be mostly obsolete.

- CONFIG_FUNCTION_GRAPH_TRACER sets it on x86-32 because of a gcc bug
  where the stack gets aligned before the mcount call.  This issue
  should be mostly obsolete as most modern compilers now have -mfentry.
  We could make it dependent on CC_USING_FENTRY.

- CONFIG_JUMP_LABEL sets it on x86-32 because of a bug in gcc <= 4.5.1
  which has since been fixed with
  https://gcc.gnu.org/bugzilla/show_bug.cgi?id=46226.  We could probably
  make it gcc-version-dependent.

- x86-64 sets it to apparently make the no-longer-in-tree DWARF unwinder
  happy with older versions of gcc.

So it looks like -maccumulate-outgoing-args isn't actually needed in
most cases.

-- 
Josh

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: v4.10: kernel stack frame pointer .. has bad value (null)
  2017-03-08 17:37                               ` Josh Poimboeuf
@ 2017-03-08 18:25                                 ` Linus Torvalds
  2017-03-08 18:54                                   ` Andy Lutomirski
                                                     ` (2 more replies)
  0 siblings, 3 replies; 51+ messages in thread
From: Linus Torvalds @ 2017-03-08 18:25 UTC (permalink / raw)
  To: Josh Poimboeuf
  Cc: Andy Lutomirski, Pavel Machek, kernel list, Ingo Molnar,
	Andrew Lutomirski, Borislav Petkov, Brian Gerst, Denys Vlasenko,
	Peter Anvin, Peter Zijlstra, Thomas Gleixner

On Wed, Mar 8, 2017 at 9:37 AM, Josh Poimboeuf <jpoimboe@redhat.com> wrote:
>
> It does seem to make it bigger.  With Pavel's config on gcc 6, if I add
> -maccumulate-outgoing-args:
>
> That's 3.8% more text on x86-32.

That's even more than I expected.  I would have expected the
-mregparm=3 to catch so much of our stack setup that it wouldn't be
all that noticeable. But apparently we just have a ton of functions
with more than 3 arguments.

> (FWIW, on x86-64, the size difference is negligible.)

Yeah, I seriously hope we've actively tried to avoid the more than six
argument calling conventions. The mm code had some that grew over
time, but most of that got converted to passing a pointer to a
descriptor structure instead (ie "struct vm_fault" etc models).

> As it turns out, when optimizing for size, gcc seems to ignore
> -maccumulate-outgoing-args completely.  So I guess we would have to live
> with both cases anyway.  Which means I'll need to make the unwinder
> smart enough to deal with it.
>
> But that brings up another question.  If -maccumulate-outgoing-args is
> ignored with CONFIG_CC_OPTIMIZE_FOR_SIZE=y, wouldn't using that option
> break the things which require -maccumulate-outgoing-args?
>
> So, looking deeper at the various reasons this flag is enabled, they
> seem to be mostly obsolete.

Good.

> - CONFIG_FUNCTION_GRAPH_TRACER sets it on x86-32 because of a gcc bug
>   where the stack gets aligned before the mcount call.  This issue
>   should be mostly obsolete as most modern compilers now have -mfentry.
>   We could make it dependent on CC_USING_FENTRY.

Yeah. At some point we might even upgrade the compiler requirements to
no longer accept the mcount model.

I think the fentry model is gcc-4.6.0 and up. Currently I guess we
support gcc-3.2+, which is fairly ridiculous considering that 4.6.0 is
from March, 2011. So it's over five years ago already.

gcc-3.2.0 is from 2002, I think. At some point you just have to say
"caring about a 15 year old compiler is ridiculous"

The main reason we have fairly aggressively supported old compilers
tends to be some odder architectures that don't have good support, so
people use various random "this works for me" versions.

We could easily make the gcc version checks much more strict on x86, I suspect.

> - CONFIG_JUMP_LABEL sets it on x86-32 because of a bug in gcc <= 4.5.1
>   which has since been fixed with
>   https://gcc.gnu.org/bugzilla/show_bug.cgi?id=46226.  We could probably
>   make it gcc-version-dependent.

Looks like we could just use the FENTRY test, since that's more recent.

> - x86-64 sets it to apparently make the no-longer-in-tree DWARF unwinder
>   happy with older versions of gcc.

Ok. Since it's not as big of a deal on x86-64 I guess we don't care,
but on the other hand it would probably then be better to aim to
switch away from it entirely and just put that whole sorry thing
behind us.

> So it looks like -maccumulate-outgoing-args isn't actually needed in
> most cases.

That would be lovely indeed.

                  Linus

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: v4.10: kernel stack frame pointer .. has bad value (null)
  2017-03-08 18:25                                 ` Linus Torvalds
@ 2017-03-08 18:54                                   ` Andy Lutomirski
  2017-03-08 21:22                                   ` Pavel Machek
  2017-03-08 21:29                                   ` Josh Poimboeuf
  2 siblings, 0 replies; 51+ messages in thread
From: Andy Lutomirski @ 2017-03-08 18:54 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Josh Poimboeuf, Pavel Machek, kernel list, Ingo Molnar,
	Andrew Lutomirski, Borislav Petkov, Brian Gerst, Denys Vlasenko,
	Peter Anvin, Peter Zijlstra, Thomas Gleixner

On Wed, Mar 8, 2017 at 10:25 AM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
> On Wed, Mar 8, 2017 at 9:37 AM, Josh Poimboeuf <jpoimboe@redhat.com> wrote:
> Yeah. At some point we might even upgrade the compiler requirements to
> no longer accept the mcount model.
>
> I think the fentry model is gcc-4.6.0 and up. Currently I guess we
> support gcc-3.2+, which is fairly ridiculous considering that 4.6.0 is
> from March, 2011. So it's over five years ago already.
>
> gcc-3.2.0 is from 2002, I think. At some point you just have to say
> "caring about a 15 year old compiler is ridiculous"
>
> The main reason we have fairly aggressively supported old compilers
> tends to be some odder architectures that don't have good support, so
> people use various random "this works for me" versions.

I thought it was because akpm still used Fedora Core 6. :)

At some point, it would be nice to skip way forward and require a
compiler without the 16-byte-stack-alignment bug, too.

--Andy

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: v4.10: kernel stack frame pointer .. has bad value (null)
  2017-03-08 18:25                                 ` Linus Torvalds
  2017-03-08 18:54                                   ` Andy Lutomirski
@ 2017-03-08 21:22                                   ` Pavel Machek
  2017-03-09  9:38                                     ` Geert Uytterhoeven
                                                       ` (2 more replies)
  2017-03-08 21:29                                   ` Josh Poimboeuf
  2 siblings, 3 replies; 51+ messages in thread
From: Pavel Machek @ 2017-03-08 21:22 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Josh Poimboeuf, Andy Lutomirski, kernel list, Ingo Molnar,
	Andrew Lutomirski, Borislav Petkov, Brian Gerst, Denys Vlasenko,
	Peter Anvin, Peter Zijlstra, Thomas Gleixner

[-- Attachment #1: Type: text/plain, Size: 1386 bytes --]

Hi!

> > - CONFIG_FUNCTION_GRAPH_TRACER sets it on x86-32 because of a gcc bug
> >   where the stack gets aligned before the mcount call.  This issue
> >   should be mostly obsolete as most modern compilers now have -mfentry.
> >   We could make it dependent on CC_USING_FENTRY.
> 
> Yeah. At some point we might even upgrade the compiler requirements to
> no longer accept the mcount model.
> 
> I think the fentry model is gcc-4.6.0 and up. Currently I guess we
> support gcc-3.2+, which is fairly ridiculous considering that 4.6.0 is
> from March, 2011. So it's over five years ago already.
> 
> gcc-3.2.0 is from 2002, I think. At some point you just have to say
> "caring about a 15 year old compiler is ridiculous"
> 
> The main reason we have fairly aggressively supported old compilers
> tends to be some odder architectures that don't have good support, so
> people use various random "this works for me" versions.
> 
> We could easily make the gcc version checks much more strict on x86,
> I suspect.

Well, I have fast CPUs, but most of the time they just compile
stuff. Especially bisect is compile-heavy. I suspect going back to
gcc-3.2 would bring me bigger advantages than CPU upgrade...

								Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 181 bytes --]

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: v4.10: kernel stack frame pointer .. has bad value (null)
  2017-03-08 18:25                                 ` Linus Torvalds
  2017-03-08 18:54                                   ` Andy Lutomirski
  2017-03-08 21:22                                   ` Pavel Machek
@ 2017-03-08 21:29                                   ` Josh Poimboeuf
  2017-03-09 14:14                                     ` Steven Rostedt
  2017-03-16 15:42                                     ` [PATCH] x86: mostly disable '-maccumulate-outgoing-args' Josh Poimboeuf
  2 siblings, 2 replies; 51+ messages in thread
From: Josh Poimboeuf @ 2017-03-08 21:29 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Andy Lutomirski, Pavel Machek, kernel list, Ingo Molnar,
	Andrew Lutomirski, Borislav Petkov, Brian Gerst, Denys Vlasenko,
	Peter Anvin, Peter Zijlstra, Thomas Gleixner, Steven Rostedt

[adding Steven Rostedt to CC as an FYI]

On Wed, Mar 08, 2017 at 10:25:01AM -0800, Linus Torvalds wrote:
> On Wed, Mar 8, 2017 at 9:37 AM, Josh Poimboeuf <jpoimboe@redhat.com> wrote:
> > - CONFIG_FUNCTION_GRAPH_TRACER sets it on x86-32 because of a gcc bug
> >   where the stack gets aligned before the mcount call.  This issue
> >   should be mostly obsolete as most modern compilers now have -mfentry.
> >   We could make it dependent on CC_USING_FENTRY.
> 
> Yeah. At some point we might even upgrade the compiler requirements to
> no longer accept the mcount model.

The plot slightly thickens...

So I was mistaken about this problem not existing with newer versions of
gcc, because the x86-32 ftrace code doesn't use -mfentry.  It still
relies on mcount.  So CONFIG_FUNCTION_GRAPH_TRACER will still need
-maccumulate-outgoing-args for *all* versions of gcc on x86-32.

(Of course, that situation would improve if ftrace on x86-32 were ported
to use -mfentry.)

Also, since -Os tells gcc to ignore -maccumulate-outgoing-args, this
means that CONFIG_FUNCTION_GRAPH_TRACER with mcount needs a dependency
on CONFIG_CC_OPTIMIZE_FOR_PERFORMANCE.

I suspect these issues also affect x86-64 with gcc 4.4.x and 4.5.x,
which corresponds to the window after the funky DRAP prologue was
introduced but before -mfentry was introduced.

In summary, here are the changes I'm looking at:

- set -maccumulate-outgoing-args if CONFIG_FUNCTION_GRAPH_TRACER &&
  !CC_USING_ENTRY
  (for both 32- and 64-bit)

- somehow make CONFIG_FUNCTION_GRAPH_TRACER depend on either
  CC_USING_FENTRY or CONFIG_CC_OPTIMIZE_FOR_PERFORMANCE
  (for both 32- and 64-bit)

  (not sure how to do that -- maybe just fail the build in the
  graph tracer + mcount + '-Os' case)

- set -maccumulate-outgoing-args if CONFIG_JUMP_LABEL && gcc < 4.5.2
  (for both 32-bit and 64-bit)

-- 
Josh

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: v4.10: kernel stack frame pointer .. has bad value (null)
  2017-03-08 21:22                                   ` Pavel Machek
@ 2017-03-09  9:38                                     ` Geert Uytterhoeven
  2017-03-09 10:56                                       ` Pavel Machek
  2017-03-09 10:49                                     ` Old compiler versions " Pavel Machek
  2017-03-09 15:29                                     ` v4.10: kernel stack frame pointer .. has bad value (null) Peter Zijlstra
  2 siblings, 1 reply; 51+ messages in thread
From: Geert Uytterhoeven @ 2017-03-09  9:38 UTC (permalink / raw)
  To: Pavel Machek
  Cc: Linus Torvalds, Josh Poimboeuf, Andy Lutomirski, kernel list,
	Ingo Molnar, Andrew Lutomirski, Borislav Petkov, Brian Gerst,
	Denys Vlasenko, Peter Anvin, Peter Zijlstra, Thomas Gleixner

Hi Pavel,

On Wed, Mar 8, 2017 at 10:22 PM, Pavel Machek <pavel@ucw.cz> wrote:
> Well, I have fast CPUs, but most of the time they just compile
> stuff. Especially bisect is compile-heavy. I suspect going back to
> gcc-3.2 would bring me bigger advantages than CPU upgrade...

I hope you do use ccache or distcc?

Gr{oetje,eeting}s,

                        Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Old compiler versions (was Re: v4.10: kernel stack frame pointer .. has bad value (null))
  2017-03-08 21:22                                   ` Pavel Machek
  2017-03-09  9:38                                     ` Geert Uytterhoeven
@ 2017-03-09 10:49                                     ` Pavel Machek
  2017-03-09 18:05                                       ` Linus Torvalds
  2017-03-09 15:29                                     ` v4.10: kernel stack frame pointer .. has bad value (null) Peter Zijlstra
  2 siblings, 1 reply; 51+ messages in thread
From: Pavel Machek @ 2017-03-09 10:49 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Josh Poimboeuf, Andy Lutomirski, kernel list, Ingo Molnar,
	Andrew Lutomirski, Borislav Petkov, Brian Gerst, Denys Vlasenko,
	Peter Anvin, Peter Zijlstra, Thomas Gleixner

[-- Attachment #1: Type: text/plain, Size: 4456 bytes --]

Hi!

> > > - CONFIG_FUNCTION_GRAPH_TRACER sets it on x86-32 because of a gcc bug
> > >   where the stack gets aligned before the mcount call.  This issue
> > >   should be mostly obsolete as most modern compilers now have -mfentry.
> > >   We could make it dependent on CC_USING_FENTRY.
> > 
> > Yeah. At some point we might even upgrade the compiler requirements to
> > no longer accept the mcount model.
> > 
> > I think the fentry model is gcc-4.6.0 and up. Currently I guess we
> > support gcc-3.2+, which is fairly ridiculous considering that 4.6.0 is
> > from March, 2011. So it's over five years ago already.
> > 
> > gcc-3.2.0 is from 2002, I think. At some point you just have to say
> > "caring about a 15 year old compiler is ridiculous"
> > 
> > The main reason we have fairly aggressively supported old compilers
> > tends to be some odder architectures that don't have good support, so
> > people use various random "this works for me" versions.
> > 
> > We could easily make the gcc version checks much more strict on x86,
> > I suspect.
> 
> Well, I have fast CPUs, but most of the time they just compile
> stuff. Especially bisect is compile-heavy. I suspect going back to
> gcc-3.2 would bring me bigger advantages than CPU upgrade...

Okay, would not it be nice if we supported gcc-3.3? It compiles about
twice the speed of gcc-4.9, across the board: (If we could compile at
-O1, we'd get 4 times the speed. At -O0, we'd be at cca 9 times the
speed; that would be useful for a bisect!)

Good news is that -Os is quite significantly faster than -O2 (and
already supported), so that should be simple way to optimize bisect performance. 

(On thinkpad X220, compiling bzip2)

| mach |      gcc |     |                |   real |   user |   sys |           $
| x220 | 4.9.2-10 | -O0 | bzip2.c caf036 |  0.644 |   0.54 |  0.03 |           $
|      |          | -O1 |                |  1.501 |        |       |           $
|      |          | -O2 |                |  2.607 |        |       |           $
|      |          | -O3 |                |  3.052 |        |       |           $
|      |          | -Os |                |  1.839 |        |       |           $
|      | 3.3.5-13 | -O0 |                |  0.343 |  0.300 | 0.028 |           $
|      |          | -O1 |                |  0.721 |        |       |           $
|      |          | -O2 |                |  1.238 |        |       |           $
|      |          | -O3 |                |  1.598 |  1.508 | 0.032 |           $


Unfortunately, 4.11-rc1 fails to compile on gcc 3.3.5.

> 1. None (CC_STACKPROTECTOR_NONE) (NEW)

is needed. Easy. But then I get

  AS      arch/x86/entry/entry_32.o
  arch/x86/entry/entry_32.S: Assembler messages:
  arch/x86/entry/entry_32.S:440: Error: invalid character '"' in
  operand 1

from the ALTERNATIVE macro. It seems 3.3 just does not like " in macro
arguments.

arch/x86/boot/bioscall.S: Assembler messages:
arch/x86/boot/bioscall.S:68: Error: `68(%esp)' is not a valid 16 bit
base/index expression

Plus I get about milion of

                 from fs/fs-writeback.c:23:
		 include/linux/irq.h:419: warning: parameter has
		 incomplete type
		 include/linux/irq.h:420: warning: parameter has
		 incomplete type
		 
... and problem with builtin_ffs in drm_blend.c, and others with
function alignment in drm.

lzo1x_compress needs __builtin_ctz. In the end, compilation fails with

mm/built-in.o(.text+0x2b714): In function `do_set_pmd':
: undefined reference to `__compiletime_assert_3034'
mm/built-in.o(.text+0x2c09a): In function `create_huge_pmd':
: undefined reference to `do_huge_pmd_anonymous_page'
mm/built-in.o(.text+0x2c0ca): In function `wp_huge_pmd':
: undefined reference to `do_huge_pmd_wp_page'
drivers/built-in.o(.text+0xe5a2b): In function
`cea_mode_alternate_timings':
: undefined reference to `__compiletime_assert_2638'
drivers/built-in.o(.text+0x3c969f): In function `sg_ioctl':
: undefined reference to `__divdi3'

But that looks fixable. But when I force the compilation, it is
actually _slower_ than recent gcc (23 minutes vs. 13
minutes). Interesting. If someone knows what old gcc versions actually
compile recent kernels, I'd like to know.

Best regards,

									Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 181 bytes --]

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: v4.10: kernel stack frame pointer .. has bad value (null)
  2017-03-09  9:38                                     ` Geert Uytterhoeven
@ 2017-03-09 10:56                                       ` Pavel Machek
  2017-03-09 12:16                                         ` Geert Uytterhoeven
  0 siblings, 1 reply; 51+ messages in thread
From: Pavel Machek @ 2017-03-09 10:56 UTC (permalink / raw)
  To: Geert Uytterhoeven
  Cc: Linus Torvalds, Josh Poimboeuf, Andy Lutomirski, kernel list,
	Ingo Molnar, Andrew Lutomirski, Borislav Petkov, Brian Gerst,
	Denys Vlasenko, Peter Anvin, Peter Zijlstra, Thomas Gleixner

[-- Attachment #1: Type: text/plain, Size: 872 bytes --]

On Thu 2017-03-09 10:38:46, Geert Uytterhoeven wrote:
> Hi Pavel,
> 
> On Wed, Mar 8, 2017 at 10:22 PM, Pavel Machek <pavel@ucw.cz> wrote:
> > Well, I have fast CPUs, but most of the time they just compile
> > stuff. Especially bisect is compile-heavy. I suspect going back to
> > gcc-3.2 would bring me bigger advantages than CPU upgrade...
> 
> I hope you do use ccache or distcc?

I tried to use distcc before, but it was rather hard to maintain. No
ccache here. Hmm. I guess ccache really makes sense for bisect.

On the other hand... it should be possible to compile kernel 10 times
faster than we normally do, without powering up additional machines
and without caching tricks.

Best regards,
									Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 181 bytes --]

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: v4.10: kernel stack frame pointer .. has bad value (null)
  2017-03-09 10:56                                       ` Pavel Machek
@ 2017-03-09 12:16                                         ` Geert Uytterhoeven
  2017-03-10 13:17                                           ` Compiling kernels faster (was Re: v4.10: kernel stack frame pointer .. has bad value (null)) Pavel Machek
  0 siblings, 1 reply; 51+ messages in thread
From: Geert Uytterhoeven @ 2017-03-09 12:16 UTC (permalink / raw)
  To: Pavel Machek
  Cc: Linus Torvalds, Josh Poimboeuf, Andy Lutomirski, kernel list,
	Ingo Molnar, Andrew Lutomirski, Borislav Petkov, Brian Gerst,
	Denys Vlasenko, Peter Anvin, Peter Zijlstra, Thomas Gleixner

Hi Pavel,

On Thu, Mar 9, 2017 at 11:56 AM, Pavel Machek <pavel@ucw.cz> wrote:
> On Thu 2017-03-09 10:38:46, Geert Uytterhoeven wrote:
>> On Wed, Mar 8, 2017 at 10:22 PM, Pavel Machek <pavel@ucw.cz> wrote:
>> > Well, I have fast CPUs, but most of the time they just compile
>> > stuff. Especially bisect is compile-heavy. I suspect going back to
>> > gcc-3.2 would bring me bigger advantages than CPU upgrade...
>>
>> I hope you do use ccache or distcc?
>
> I tried to use distcc before, but it was rather hard to maintain. No
> ccache here. Hmm. I guess ccache really makes sense for bisect.

Yes it does. So if you're not using it yet, do the below, today, not tomorrow.

If your distro supports it, prepend /usr/lib/ccache/ to your $PATH.
Create symlinks from the names of your favorite cross-compilers
to /usr/bin/ccache, and make sure they are early in your $PATH.

That's it! Enjoy!

Gr{oetje,eeting}s,

                        Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: v4.10: kernel stack frame pointer .. has bad value (null)
  2017-03-08 21:29                                   ` Josh Poimboeuf
@ 2017-03-09 14:14                                     ` Steven Rostedt
  2017-03-09 18:31                                       ` Josh Poimboeuf
  2017-03-16 15:42                                     ` [PATCH] x86: mostly disable '-maccumulate-outgoing-args' Josh Poimboeuf
  1 sibling, 1 reply; 51+ messages in thread
From: Steven Rostedt @ 2017-03-09 14:14 UTC (permalink / raw)
  To: Josh Poimboeuf
  Cc: Linus Torvalds, Andy Lutomirski, Pavel Machek, kernel list,
	Ingo Molnar, Andrew Lutomirski, Borislav Petkov, Brian Gerst,
	Denys Vlasenko, Peter Anvin, Peter Zijlstra, Thomas Gleixner

On Wed, 8 Mar 2017 15:29:59 -0600
Josh Poimboeuf <jpoimboe@redhat.com> wrote:

> [adding Steven Rostedt to CC as an FYI]
> 
> On Wed, Mar 08, 2017 at 10:25:01AM -0800, Linus Torvalds wrote:
> > On Wed, Mar 8, 2017 at 9:37 AM, Josh Poimboeuf <jpoimboe@redhat.com> wrote:  
> > > - CONFIG_FUNCTION_GRAPH_TRACER sets it on x86-32 because of a gcc bug
> > >   where the stack gets aligned before the mcount call.  This issue
> > >   should be mostly obsolete as most modern compilers now have -mfentry.
> > >   We could make it dependent on CC_USING_FENTRY.  
> > 
> > Yeah. At some point we might even upgrade the compiler requirements to
> > no longer accept the mcount model.  
> 
> The plot slightly thickens...
> 
> So I was mistaken about this problem not existing with newer versions of
> gcc, because the x86-32 ftrace code doesn't use -mfentry.  It still
> relies on mcount.  So CONFIG_FUNCTION_GRAPH_TRACER will still need
> -maccumulate-outgoing-args for *all* versions of gcc on x86-32.

OK, I admit, I was lazy here. I thought, who cares about x86-32
anymore ;-)

> 
> (Of course, that situation would improve if ftrace on x86-32 were ported
> to use -mfentry.)

That can easily be done.

> 
> Also, since -Os tells gcc to ignore -maccumulate-outgoing-args, this
> means that CONFIG_FUNCTION_GRAPH_TRACER with mcount needs a dependency
> on CONFIG_CC_OPTIMIZE_FOR_PERFORMANCE.
> 
> I suspect these issues also affect x86-64 with gcc 4.4.x and 4.5.x,
> which corresponds to the window after the funky DRAP prologue was
> introduced but before -mfentry was introduced.
> 
> In summary, here are the changes I'm looking at:
> 
> - set -maccumulate-outgoing-args if CONFIG_FUNCTION_GRAPH_TRACER &&
>   !CC_USING_ENTRY
>   (for both 32- and 64-bit)
> 
> - somehow make CONFIG_FUNCTION_GRAPH_TRACER depend on either
>   CC_USING_FENTRY or CONFIG_CC_OPTIMIZE_FOR_PERFORMANCE
>   (for both 32- and 64-bit)
> 
>   (not sure how to do that -- maybe just fail the build in the
>   graph tracer + mcount + '-Os' case)

Could just place something like this in the x86 code:

#if defined(CONFIG_FUNCTION_GRAPH_TRACER) && \
    !defined(CC_USING_FENTRY) && \
    !defined(CONFIG_CC_OPTIMIZE_FOR_PERFORMANCE)
# error Your compiler doesn't support function graph tracing
#endif

-- Steve

> 
> - set -maccumulate-outgoing-args if CONFIG_JUMP_LABEL && gcc < 4.5.2
>   (for both 32-bit and 64-bit)
> 

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: v4.10: kernel stack frame pointer .. has bad value (null)
  2017-03-08 21:22                                   ` Pavel Machek
  2017-03-09  9:38                                     ` Geert Uytterhoeven
  2017-03-09 10:49                                     ` Old compiler versions " Pavel Machek
@ 2017-03-09 15:29                                     ` Peter Zijlstra
  2017-03-09 21:12                                       ` Pavel Machek
  2 siblings, 1 reply; 51+ messages in thread
From: Peter Zijlstra @ 2017-03-09 15:29 UTC (permalink / raw)
  To: Pavel Machek
  Cc: Linus Torvalds, Josh Poimboeuf, Andy Lutomirski, kernel list,
	Ingo Molnar, Andrew Lutomirski, Borislav Petkov, Brian Gerst,
	Denys Vlasenko, Peter Anvin, Thomas Gleixner

On Wed, Mar 08, 2017 at 10:22:53PM +0100, Pavel Machek wrote:
> 
> Well, I have fast CPUs, but most of the time they just compile
> stuff. Especially bisect is compile-heavy. I suspect going back to
> gcc-3.2 would bring me bigger advantages than CPU upgrade...
> 

But note that 3.2 compiles a distinctly different kernel from something
new and shiny. The kernel uses a lot of GCC features optimistically to
generate different code.

So if by some chance your error depends on one of the new features,
bisecting with some ancient compiler will not work.

And I cannot see that getting any better, only worse.

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: Old compiler versions (was Re: v4.10: kernel stack frame pointer .. has bad value (null))
  2017-03-09 10:49                                     ` Old compiler versions " Pavel Machek
@ 2017-03-09 18:05                                       ` Linus Torvalds
  0 siblings, 0 replies; 51+ messages in thread
From: Linus Torvalds @ 2017-03-09 18:05 UTC (permalink / raw)
  To: Pavel Machek
  Cc: Josh Poimboeuf, Andy Lutomirski, kernel list, Ingo Molnar,
	Andrew Lutomirski, Borislav Petkov, Brian Gerst, Denys Vlasenko,
	Peter Anvin, Peter Zijlstra, Thomas Gleixner

On Thu, Mar 9, 2017 at 2:49 AM, Pavel Machek <pavel@ucw.cz> wrote:
>
> (On thinkpad X220, compiling bzip2)

You really shouldn't assume that the zlib build tracks the kernel build.

At least at some point, a noticeable part of the build cost for the
kernel was just parsing the fairly big source code. We have honking
big include files and deep nesting, and there is a lot of preprocessor
and just general parsing overhead for stuff that in most files don't
even generate code.

All those inline functions and type declarations for things that then
aren't actually used in most files means that you spend relatively
more time just parsing files than you spend on generating and
optimizing code.

So the trade-offs between different projects can be very different.
Some projects have huge tables with static initializers that gcc at
some point had serious quadratic-time issues with, and other code has
big functions where the actual optimization phase is the bulk of it.
And some projects have a lot of big and nested include files.

It's not nearly as bad as some C++ projects (where the header file
mess can often _easily_ be the dominant factor by far), but it's still
potentially completely different from something like building zlib.

Oh, and don't even bother looking at -O0 times. That's almost purely
parsing, but more importantly, the kernel has never in its lifetime
built without optimizations.

We basically rely on the compiler not being moronic crap. Always have,
always will.

> Unfortunately, 4.11-rc1 fails to compile on gcc 3.3.5.
>
>> 1. None (CC_STACKPROTECTOR_NONE) (NEW)
>
> is needed. Easy. But then I get
>
>   AS      arch/x86/entry/entry_32.o
>   arch/x86/entry/entry_32.S: Assembler messages:
>   arch/x86/entry/entry_32.S:440: Error: invalid character '"' in
>   operand 1
>
> from the ALTERNATIVE macro. It seems 3.3 just does not like " in macro
> arguments.

Ok. Clearly our checks in <linux/compiler-gcc.h> are outdated, and we
"allow" compilers that don't actually work.

> But that looks fixable. But when I force the compilation, it is
> actually _slower_ than recent gcc (23 minutes vs. 13
> minutes). Interesting.

I forget when gcc got the "integrated preprocessor". It's a long time
ago. But that actually sped things up, because it basically halves (or
more) the overhead of parsing.

With an external preprocessor you obviously first have cpp doing its
parsing, then writing the preprocessed results out, and then you had
cc1 doing parsing again.

So yes, gcc has gotten a lot slower over time, but some things have
actually improved.

                 Linus

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: v4.10: kernel stack frame pointer .. has bad value (null)
  2017-03-09 14:14                                     ` Steven Rostedt
@ 2017-03-09 18:31                                       ` Josh Poimboeuf
  0 siblings, 0 replies; 51+ messages in thread
From: Josh Poimboeuf @ 2017-03-09 18:31 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Linus Torvalds, Andy Lutomirski, Pavel Machek, kernel list,
	Ingo Molnar, Andrew Lutomirski, Borislav Petkov, Brian Gerst,
	Denys Vlasenko, Peter Anvin, Peter Zijlstra, Thomas Gleixner

On Thu, Mar 09, 2017 at 09:14:47AM -0500, Steven Rostedt wrote:
> On Wed, 8 Mar 2017 15:29:59 -0600
> Josh Poimboeuf <jpoimboe@redhat.com> wrote:
> 
> > [adding Steven Rostedt to CC as an FYI]
> > 
> > On Wed, Mar 08, 2017 at 10:25:01AM -0800, Linus Torvalds wrote:
> > > On Wed, Mar 8, 2017 at 9:37 AM, Josh Poimboeuf <jpoimboe@redhat.com> wrote:  
> > > > - CONFIG_FUNCTION_GRAPH_TRACER sets it on x86-32 because of a gcc bug
> > > >   where the stack gets aligned before the mcount call.  This issue
> > > >   should be mostly obsolete as most modern compilers now have -mfentry.
> > > >   We could make it dependent on CC_USING_FENTRY.  
> > > 
> > > Yeah. At some point we might even upgrade the compiler requirements to
> > > no longer accept the mcount model.  
> > 
> > The plot slightly thickens...
> > 
> > So I was mistaken about this problem not existing with newer versions of
> > gcc, because the x86-32 ftrace code doesn't use -mfentry.  It still
> > relies on mcount.  So CONFIG_FUNCTION_GRAPH_TRACER will still need
> > -maccumulate-outgoing-args for *all* versions of gcc on x86-32.
> 
> OK, I admit, I was lazy here. I thought, who cares about x86-32
> anymore ;-)

As we just saw in another thread where somebody ran into this problem
with -Os, apparently some people still do care...

> > (Of course, that situation would improve if ftrace on x86-32 were ported
> > to use -mfentry.)
> 
> That can easily be done.

You weren't on CC earlier, so just to summarize the benefits of doing
fentry on x86-32, thus removing the need for -maccumulate-outgoing-args:

- graph tracer compatibility with -Os
- text size decrease of ~3%
- possible performance improvement
- more uniformity (-maccumulate-outgoing-args disabled everywhere for
  modern gccs)

But either way I'll still work up a patch to make the changes I
suggested.

-- 
Josh

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: v4.10: kernel stack frame pointer .. has bad value (null)
  2017-03-09 15:29                                     ` v4.10: kernel stack frame pointer .. has bad value (null) Peter Zijlstra
@ 2017-03-09 21:12                                       ` Pavel Machek
  0 siblings, 0 replies; 51+ messages in thread
From: Pavel Machek @ 2017-03-09 21:12 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Linus Torvalds, Josh Poimboeuf, Andy Lutomirski, kernel list,
	Ingo Molnar, Andrew Lutomirski, Borislav Petkov, Brian Gerst,
	Denys Vlasenko, Peter Anvin, Thomas Gleixner

[-- Attachment #1: Type: text/plain, Size: 1221 bytes --]

On Thu 2017-03-09 16:29:10, Peter Zijlstra wrote:
> On Wed, Mar 08, 2017 at 10:22:53PM +0100, Pavel Machek wrote:
> > 
> > Well, I have fast CPUs, but most of the time they just compile
> > stuff. Especially bisect is compile-heavy. I suspect going back to
> > gcc-3.2 would bring me bigger advantages than CPU upgrade...
> > 
> 
> But note that 3.2 compiles a distinctly different kernel from something
> new and shiny. The kernel uses a lot of GCC features optimistically to
> generate different code.
> 
> So if by some chance your error depends on one of the new features,
> bisecting with some ancient compiler will not work.

Well, yes, obviously different compilers generate different code.

OTOH for drivers (where most errors are) the difference should not be
significant.

And actually.. if you realize it bug is gcc version dependend, you'll
know where to look for the bug.

(Anyway, it looks like gcc-3.3 is not usable for kernel on x86, and it
is actually slower, too. So -- bad idea. gcc -O1 looks promising.)

Best regards,
									Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 181 bytes --]

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Compiling kernels faster (was Re: v4.10: kernel stack frame pointer .. has bad value (null))
  2017-03-09 12:16                                         ` Geert Uytterhoeven
@ 2017-03-10 13:17                                           ` Pavel Machek
  2017-03-10 13:28                                             ` Geert Uytterhoeven
  2017-03-10 14:15                                             ` Willy Tarreau
  0 siblings, 2 replies; 51+ messages in thread
From: Pavel Machek @ 2017-03-10 13:17 UTC (permalink / raw)
  To: Geert Uytterhoeven
  Cc: Linus Torvalds, Josh Poimboeuf, Andy Lutomirski, kernel list,
	Ingo Molnar, Andrew Lutomirski, Borislav Petkov, Brian Gerst,
	Denys Vlasenko, Peter Anvin, Peter Zijlstra, Thomas Gleixner

[-- Attachment #1: Type: text/plain, Size: 2193 bytes --]

On Thu 2017-03-09 13:16:09, Geert Uytterhoeven wrote:
> Hi Pavel,
> 
> On Thu, Mar 9, 2017 at 11:56 AM, Pavel Machek <pavel@ucw.cz> wrote:
> > On Thu 2017-03-09 10:38:46, Geert Uytterhoeven wrote:
> >> On Wed, Mar 8, 2017 at 10:22 PM, Pavel Machek <pavel@ucw.cz> wrote:
> >> > Well, I have fast CPUs, but most of the time they just compile
> >> > stuff. Especially bisect is compile-heavy. I suspect going back to
> >> > gcc-3.2 would bring me bigger advantages than CPU upgrade...
> >>
> >> I hope you do use ccache or distcc?
> >
> > I tried to use distcc before, but it was rather hard to maintain. No
> > ccache here. Hmm. I guess ccache really makes sense for bisect.
> 
> Yes it does. So if you're not using it yet, do the below, today, not tomorrow.
> 
> If your distro supports it, prepend /usr/lib/ccache/ to your $PATH.
> Create symlinks from the names of your favorite cross-compilers
> to /usr/bin/ccache, and make sure they are early in your $PATH.
> 
> That's it! Enjoy!

Hmm. Installed, and seems to work. OTOH, compilation now seems to
produce 2-3MB/sec writing on spinning rust, and CPUs are no longer
fully loaded. (make -j 7 on 2 core HT machine). Any io load sends the
CPU utilization to cca 50% range... Compilation goes up from 9:13 to
11:40... to 23 minutes depending on situation. I guess it is still
worth it for the bisect, but it looks like ccache really needs an ssd.

On the other hand, switching to -O1 is really easy, and gets 15% or so
improvement.

Hmm. And killing chromium matters a lot for a compile time. I hate
modern web :-(.

Best regards,							Pavel


--- a/Makefile
+++ b/Makefile
@@ -639,9 +639,9 @@ ifdef CONFIG_CC_OPTIMIZE_FOR_SIZE
 KBUILD_CFLAGS  += -Os $(call cc-disable-warning,maybe-uninitialized,)
 else
 ifdef CONFIG_PROFILE_ALL_BRANCHES
-KBUILD_CFLAGS  += -O2 $(call cc-disable-warning,maybe-uninitialized,)
+KBUILD_CFLAGS  += -O1 $(call cc-disable-warning,maybe-uninitialized,)
 else
-KBUILD_CFLAGS   += -O2
+KBUILD_CFLAGS   += -O1
 endif
 endif

-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 181 bytes --]

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: Compiling kernels faster (was Re: v4.10: kernel stack frame pointer .. has bad value (null))
  2017-03-10 13:17                                           ` Compiling kernels faster (was Re: v4.10: kernel stack frame pointer .. has bad value (null)) Pavel Machek
@ 2017-03-10 13:28                                             ` Geert Uytterhoeven
  2017-03-10 14:15                                             ` Willy Tarreau
  1 sibling, 0 replies; 51+ messages in thread
From: Geert Uytterhoeven @ 2017-03-10 13:28 UTC (permalink / raw)
  To: Pavel Machek
  Cc: Linus Torvalds, Josh Poimboeuf, Andy Lutomirski, kernel list,
	Ingo Molnar, Andrew Lutomirski, Borislav Petkov, Brian Gerst,
	Denys Vlasenko, Peter Anvin, Peter Zijlstra, Thomas Gleixner

Hi Pavel,

On Fri, Mar 10, 2017 at 2:17 PM, Pavel Machek <pavel@ucw.cz> wrote:
> On Thu 2017-03-09 13:16:09, Geert Uytterhoeven wrote:
>> On Thu, Mar 9, 2017 at 11:56 AM, Pavel Machek <pavel@ucw.cz> wrote:
>> > On Thu 2017-03-09 10:38:46, Geert Uytterhoeven wrote:
>> >> I hope you do use ccache or distcc?
>> >
>> > I tried to use distcc before, but it was rather hard to maintain. No
>> > ccache here. Hmm. I guess ccache really makes sense for bisect.
>>
>> Yes it does. So if you're not using it yet, do the below, today, not tomorrow.
>>
>> If your distro supports it, prepend /usr/lib/ccache/ to your $PATH.
>> Create symlinks from the names of your favorite cross-compilers
>> to /usr/bin/ccache, and make sure they are early in your $PATH.
>>
>> That's it! Enjoy!
>
> Hmm. Installed, and seems to work. OTOH, compilation now seems to
> produce 2-3MB/sec writing on spinning rust, and CPUs are no longer
> fully loaded. (make -j 7 on 2 core HT machine). Any io load sends the
> CPU utilization to cca 50% range... Compilation goes up from 9:13 to
> 11:40... to 23 minutes depending on situation. I guess it is still

I guess that was the first build, with a clean cache?
Now run "make clean", and try again ;-)

BTW, I tend not to do -j beyond the number of cores/threads (i.e. -j 8
on the i7-4770), unless you just want to compile, and not enjoy other
interactive work ;-)

> worth it for the bisect, but it looks like ccache really needs an ssd.

Adding an SSD never hurts.
Although I have been a happy user of ccache since long before I got an SSD.

> Hmm. And killing chromium matters a lot for a compile time. I hate
> modern web :-(.

Adding (freeing) RAM also never hurts ;-)

Gr{oetje,eeting}s,

                        Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: Compiling kernels faster (was Re: v4.10: kernel stack frame pointer .. has bad value (null))
  2017-03-10 13:17                                           ` Compiling kernels faster (was Re: v4.10: kernel stack frame pointer .. has bad value (null)) Pavel Machek
  2017-03-10 13:28                                             ` Geert Uytterhoeven
@ 2017-03-10 14:15                                             ` Willy Tarreau
  1 sibling, 0 replies; 51+ messages in thread
From: Willy Tarreau @ 2017-03-10 14:15 UTC (permalink / raw)
  To: Pavel Machek
  Cc: Geert Uytterhoeven, Linus Torvalds, Josh Poimboeuf,
	Andy Lutomirski, kernel list, Ingo Molnar, Andrew Lutomirski,
	Borislav Petkov, Brian Gerst, Denys Vlasenko, Peter Anvin,
	Peter Zijlstra, Thomas Gleixner

Hi Pavel,

On Fri, Mar 10, 2017 at 02:17:51PM +0100, Pavel Machek wrote:
> On Thu 2017-03-09 13:16:09, Geert Uytterhoeven wrote:
> > Hi Pavel,
> > 
> > On Thu, Mar 9, 2017 at 11:56 AM, Pavel Machek <pavel@ucw.cz> wrote:
> > > On Thu 2017-03-09 10:38:46, Geert Uytterhoeven wrote:
> > >> On Wed, Mar 8, 2017 at 10:22 PM, Pavel Machek <pavel@ucw.cz> wrote:
> > >> > Well, I have fast CPUs, but most of the time they just compile
> > >> > stuff. Especially bisect is compile-heavy. I suspect going back to
> > >> > gcc-3.2 would bring me bigger advantages than CPU upgrade...
> > >>
> > >> I hope you do use ccache or distcc?
> > >
> > > I tried to use distcc before, but it was rather hard to maintain. No
> > > ccache here. Hmm. I guess ccache really makes sense for bisect.
> > 
> > Yes it does. So if you're not using it yet, do the below, today, not tomorrow.
> > 
> > If your distro supports it, prepend /usr/lib/ccache/ to your $PATH.
> > Create symlinks from the names of your favorite cross-compilers
> > to /usr/bin/ccache, and make sure they are early in your $PATH.
> > 
> > That's it! Enjoy!
> 
> Hmm. Installed, and seems to work. OTOH, compilation now seems to
> produce 2-3MB/sec writing on spinning rust, and CPUs are no longer
> fully loaded. (make -j 7 on 2 core HT machine). Any io load sends the
> CPU utilization to cca 50% range... Compilation goes up from 9:13 to
> 11:40... to 23 minutes depending on situation. I guess it is still
> worth it for the bisect, but it looks like ccache really needs an ssd.

You need to put your cache in /dev/shm or some fast place not moving
heavy metallic heads.

That said, I have great success with distcc, I use it a lot with my build
farms (home and work) on some fanless machines [1]. I had to apply some
small updates recently to distcc because I noticed it would not delegate
files built with certain -Wa arguments as were recently added to the kernel.
I also found that by default it limits itself to 50 jobs which is often not
enough to keep your local machine busy.

The last 3.10.105-rc1 series I sent was build-tested this way, and it
builds allmodconfig in slightly less than 5 mn (4900 modules I believe),
with peaks up to 120 files per second. That's quite fast.

With distcc however you need a fast local machine because cpp is run
there, like a number of other tasks (eg: do not compress modules). You need
some RAM as well because you'll need a high parallel job count to keep your
machines busy (I build at -j60 which is the optimal value in my case). At
work only 4 small fanless ARM boards (cortex A17) cut my build time by 3
(local is a t430s with an i5-3320m) and at home 6 such boards cut the build
time by 2.5 (local is i7-6700K).

You cannot reasonably do that with too slow build nodes because you want
to limit the maximum build time for a single file. Here the cortex A17
at 1.8-2.0 GHz is perfect, it's the fastest fanless machine I found. Some
cheap x86 machines can work well also. In all case you must absolutely use
gigabit network or you'll constantly have some idle time as the traffic is
very bursty. I'm seeing up to 650 Mbps without compressing with LZO, and
between 70-150 Mbps with LZO, and it always builds faster with it.

> On the other hand, switching to -O1 is really easy, and gets 15% or so
> improvement.

I used to do such things in the past but certain level of optimizations
are useful to report warnings. For example I build haproxy daily at -O0,
but sometimes I discover later that I introduced some warnings that are
only detected at -O2.

> Hmm. And killing chromium matters a lot for a compile time. I hate
> modern web :-(.

killall -STOP. That's what I'm doing with firefox when I want some
resources.

Cheers,
Willy

---
[1] https://forum.mqmaker.com/t/miqi-based-build-farm-finally-up-and-running/605/24

^ permalink raw reply	[flat|nested] 51+ messages in thread

* [PATCH] x86: mostly disable '-maccumulate-outgoing-args'
  2017-03-08 21:29                                   ` Josh Poimboeuf
  2017-03-09 14:14                                     ` Steven Rostedt
@ 2017-03-16 15:42                                     ` Josh Poimboeuf
  2017-03-16 17:32                                       ` Steven Rostedt
  2017-03-16 19:31                                       ` [PATCH v2] " Josh Poimboeuf
  1 sibling, 2 replies; 51+ messages in thread
From: Josh Poimboeuf @ 2017-03-16 15:42 UTC (permalink / raw)
  To: x86
  Cc: Andy Lutomirski, Pavel Machek, kernel list, Ingo Molnar,
	Andrew Lutomirski, Borislav Petkov, Brian Gerst, Denys Vlasenko,
	Peter Anvin, Peter Zijlstra, Thomas Gleixner, Steven Rostedt,
	Linus Torvalds

Subject: [PATCH] x86: mostly disable '-maccumulate-outgoing-args'

The gcc '-maccumulate-outgoing-args' flag is enabled for most configs,
mostly because of issues which are no longer relevant.  For most
configs, with most recent versions of gcc, it's no longer needed.

Clarify which cases need it, and only enable it for those cases.  Also
produce a compile-time error for the ftrace graph + mcount + '-Os' case,
which will otherwise cause runtime failures.

The main benefit of '-maccumulate-outgoing-args' is that it prevents an
ugly prologue for functions which have aligned stacks.  But removing the
option also has some benefits: more readable argument saves, smaller
text size, and (presumably) slightly improved performance.

Here are the object size savings for 32-bit and 64-bit defconfig
kernels:

      text	   data	    bss	     dec	    hex	filename
  10006710	3543328	1773568	15323606	 e9d1d6	vmlinux.x86-32.before
   9706358	3547424	1773568	15027350	 e54c96	vmlinux.x86-32.after

      text	   data	    bss	     dec	    hex	filename
  10652105	4537576	 843776	16033457	 f4a6b1	vmlinux.x86-64.before
  10639629	4537576	 843776	16020981	 f475f5	vmlinux.x86-64.after

That comes out to a 3% text size improvement on x86-32 and a 0.1% text
size improvement on x86-64.

Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
---
 arch/x86/Makefile        | 29 +++++++++++++++++++++++++----
 arch/x86/Makefile_32.cpu | 18 ------------------
 arch/x86/kernel/ftrace.c |  6 ++++++
 scripts/Kbuild.include   |  4 ++++
 4 files changed, 35 insertions(+), 22 deletions(-)

diff --git a/arch/x86/Makefile b/arch/x86/Makefile
index 2d44933..fa45989b 100644
--- a/arch/x86/Makefile
+++ b/arch/x86/Makefile
@@ -120,10 +120,6 @@ else
         # -funit-at-a-time shrinks the kernel .text considerably
         # unfortunately it makes reading oopses harder.
         KBUILD_CFLAGS += $(call cc-option,-funit-at-a-time)
-
-        # this works around some issues with generating unwind tables in older gccs
-        # newer gccs do it by default
-        KBUILD_CFLAGS += $(call cc-option,-maccumulate-outgoing-args)
 endif
 
 ifdef CONFIG_X86_X32
@@ -147,6 +143,31 @@ ifeq ($(CONFIG_KMEMCHECK),y)
 	KBUILD_CFLAGS += $(call cc-option,-fno-builtin-memcpy)
 endif
 
+# If the function graph tracer is used with mcount instead of fentry,
+# '-maccumulate-outgoing-args' is needed to prevent gcc bug
+# https://gcc.gnu.org/bugzilla/show_bug.cgi?id=42109
+ifdef CONFIG_FUNCTION_GRAPH_TRACER
+  ifndef CONFIG_HAVE_FENTRY
+	ACCUMULATE_OUTGOING_ARGS := 1
+  else
+    ifeq ($(call cc-option, -mfentry),)
+	ACCUMULATE_OUTGOING_ARGS := 1
+    endif
+  endif
+endif
+
+# Jump labels need '-maccumulate-outgoing-args' for gcc < 4.5.2 to prevent
+# gcc bug https://gcc.gnu.org/bugzilla/show_bug.cgi?id=46226
+ifdef CONFIG_JUMP_LABEL
+  ifneq ($(ACCUMULATE_OUTGOING_ARGS), 1)
+	ACCUMULATE_OUTGOING_ARGS = $(call cc-if-fullversion, -lt, 040502, 1)
+  endif
+endif
+
+ifeq ($(ACCUMULATE_OUTGOING_ARGS), 1)
+	KBUILD_CFLAGS += -maccumulate-outgoing-args
+endif
+
 # Stackpointer is addressed different for 32 bit and 64 bit x86
 sp-$(CONFIG_X86_32) := esp
 sp-$(CONFIG_X86_64) := rsp
diff --git a/arch/x86/Makefile_32.cpu b/arch/x86/Makefile_32.cpu
index 6647ed4..a45eb15 100644
--- a/arch/x86/Makefile_32.cpu
+++ b/arch/x86/Makefile_32.cpu
@@ -45,24 +45,6 @@ cflags-$(CONFIG_MGEODE_LX)	+= $(call cc-option,-march=geode,-march=pentium-mmx)
 # cpu entries
 cflags-$(CONFIG_X86_GENERIC) 	+= $(call tune,generic,$(call tune,i686))
 
-# Work around the pentium-mmx code generator madness of gcc4.4.x which
-# does stack alignment by generating horrible code _before_ the mcount
-# prologue (push %ebp, mov %esp, %ebp) which breaks the function graph
-# tracer assumptions. For i686, generic, core2 this is set by the
-# compiler anyway
-ifeq ($(CONFIG_FUNCTION_GRAPH_TRACER), y)
-ADD_ACCUMULATE_OUTGOING_ARGS := y
-endif
-
-# Work around to a bug with asm goto with first implementations of it
-# in gcc causing gcc to mess up the push and pop of the stack in some
-# uses of asm goto.
-ifeq ($(CONFIG_JUMP_LABEL), y)
-ADD_ACCUMULATE_OUTGOING_ARGS := y
-endif
-
-cflags-$(ADD_ACCUMULATE_OUTGOING_ARGS) += $(call cc-option,-maccumulate-outgoing-args)
-
 # Bug fix for binutils: this option is required in order to keep
 # binutils from generating NOPL instructions against our will.
 ifneq ($(CONFIG_X86_P6_NOP),y)
diff --git a/arch/x86/kernel/ftrace.c b/arch/x86/kernel/ftrace.c
index 8f3d9cf..59f9b46 100644
--- a/arch/x86/kernel/ftrace.c
+++ b/arch/x86/kernel/ftrace.c
@@ -29,6 +29,12 @@
 #include <asm/ftrace.h>
 #include <asm/nops.h>
 
+#if defined(CONFIG_FUNCTION_GRAPH_TRACER) && \
+	!defined(CC_USING_FENTRY) && \
+	!defined(CONFIG_CC_OPTIMIZE_FOR_PERFORMANCE)
+# error Your compiler does not support function graph tracing
+#endif
+
 #ifdef CONFIG_DYNAMIC_FTRACE
 
 int ftrace_arch_code_modify_prepare(void)
diff --git a/scripts/Kbuild.include b/scripts/Kbuild.include
index d6ca649..afe3fd3 100644
--- a/scripts/Kbuild.include
+++ b/scripts/Kbuild.include
@@ -148,6 +148,10 @@ cc-fullversion = $(shell $(CONFIG_SHELL) \
 # Usage:  EXTRA_CFLAGS += $(call cc-ifversion, -lt, 0402, -O1)
 cc-ifversion = $(shell [ $(cc-version) $(1) $(2) ] && echo $(3) || echo $(4))
 
+# cc-if-fullversion
+# Usage:  EXTRA_CFLAGS += $(call cc-if-fullversion, -lt, 040502, -O1)
+cc-if-fullversion = $(shell [ $(cc-fullversion) $(1) $(2) ] && echo $(3) || echo $(4))
+
 # cc-ldoption
 # Usage: ldflags += $(call cc-ldoption, -Wl$(comma)--hash-style=both)
 cc-ldoption = $(call try-run,\
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 51+ messages in thread

* Re: [PATCH] x86: mostly disable '-maccumulate-outgoing-args'
  2017-03-16 15:42                                     ` [PATCH] x86: mostly disable '-maccumulate-outgoing-args' Josh Poimboeuf
@ 2017-03-16 17:32                                       ` Steven Rostedt
  2017-03-16 18:36                                         ` Josh Poimboeuf
  2017-03-16 19:31                                       ` [PATCH v2] " Josh Poimboeuf
  1 sibling, 1 reply; 51+ messages in thread
From: Steven Rostedt @ 2017-03-16 17:32 UTC (permalink / raw)
  To: Josh Poimboeuf
  Cc: x86, Andy Lutomirski, Pavel Machek, kernel list, Ingo Molnar,
	Andrew Lutomirski, Borislav Petkov, Brian Gerst, Denys Vlasenko,
	Peter Anvin, Peter Zijlstra, Thomas Gleixner, Linus Torvalds

On Thu, 16 Mar 2017 10:42:08 -0500
Josh Poimboeuf <jpoimboe@redhat.com> wrote:

> Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
> ---
>  arch/x86/Makefile        | 29 +++++++++++++++++++++++++----
>  arch/x86/Makefile_32.cpu | 18 ------------------
>  arch/x86/kernel/ftrace.c |  6 ++++++
>  scripts/Kbuild.include   |  4 ++++
>  4 files changed, 35 insertions(+), 22 deletions(-)
> 
> diff --git a/arch/x86/Makefile b/arch/x86/Makefile
> index 2d44933..fa45989b 100644
> --- a/arch/x86/Makefile
> +++ b/arch/x86/Makefile
> @@ -120,10 +120,6 @@ else
>          # -funit-at-a-time shrinks the kernel .text considerably
>          # unfortunately it makes reading oopses harder.
>          KBUILD_CFLAGS += $(call cc-option,-funit-at-a-time)
> -
> -        # this works around some issues with generating unwind tables in older gccs
> -        # newer gccs do it by default
> -        KBUILD_CFLAGS += $(call cc-option,-maccumulate-outgoing-args)
>  endif
>  
>  ifdef CONFIG_X86_X32
> @@ -147,6 +143,31 @@ ifeq ($(CONFIG_KMEMCHECK),y)
>  	KBUILD_CFLAGS += $(call cc-option,-fno-builtin-memcpy)
>  endif
>  
> +# If the function graph tracer is used with mcount instead of fentry,
> +# '-maccumulate-outgoing-args' is needed to prevent gcc bug

			"to prevent a gcc bug"

> +# https://gcc.gnu.org/bugzilla/show_bug.cgi?id=42109
> +ifdef CONFIG_FUNCTION_GRAPH_TRACER
> +  ifndef CONFIG_HAVE_FENTRY
> +	ACCUMULATE_OUTGOING_ARGS := 1
> +  else
> +    ifeq ($(call cc-option, -mfentry),)

Hmm, the blank entry makes me nervous. I wonder if it would be better
if we had ifneq ($(call cc-option-yn, -mfentry),y)

Unfortunately, there's one of each in the existing kernel, so there is
really no precedence.

> +	ACCUMULATE_OUTGOING_ARGS := 1
> +    endif
> +  endif
> +endif
> +
> +# Jump labels need '-maccumulate-outgoing-args' for gcc < 4.5.2 to prevent

Can we make a test instead? I hate testing versions, and things get
backported all the time. We usually like to have a test case instead of
relying on versions. Not to mention, a newer gcc may one day break.

-- Steve

> +# gcc bug https://gcc.gnu.org/bugzilla/show_bug.cgi?id=46226
> +ifdef CONFIG_JUMP_LABEL
> +  ifneq ($(ACCUMULATE_OUTGOING_ARGS), 1)
> +	ACCUMULATE_OUTGOING_ARGS = $(call cc-if-fullversion, -lt, 040502, 1)
> +  endif
> +endif
> +
> +ifeq ($(ACCUMULATE_OUTGOING_ARGS), 1)
> +	KBUILD_CFLAGS += -maccumulate-outgoing-args
> +endif
> +
>  # Stackpointer is addressed different for 32 bit and 64 bit x86
>  sp-$(CONFIG_X86_32) := esp
>  sp-$(CONFIG_X86_64) := rsp
> diff --git a/arch/x86/Makefile_32.cpu b/arch/x86/Makefile_32.cpu
> index 6647ed4..a45eb15 100644
> --- a/arch/x86/Makefile_32.cpu
> +++ b/arch/x86/Makefile_32.cpu
> @@ -45,24 +45,6 @@ cflags-$(CONFIG_MGEODE_LX)	+= $(call cc-option,-march=geode,-march=pentium-mmx)
>  # cpu entries
>  cflags-$(CONFIG_X86_GENERIC) 	+= $(call tune,generic,$(call tune,i686))
>  
> -# Work around the pentium-mmx code generator madness of gcc4.4.x which
> -# does stack alignment by generating horrible code _before_ the mcount
> -# prologue (push %ebp, mov %esp, %ebp) which breaks the function graph
> -# tracer assumptions. For i686, generic, core2 this is set by the
> -# compiler anyway
> -ifeq ($(CONFIG_FUNCTION_GRAPH_TRACER), y)
> -ADD_ACCUMULATE_OUTGOING_ARGS := y
> -endif
> -
> -# Work around to a bug with asm goto with first implementations of it
> -# in gcc causing gcc to mess up the push and pop of the stack in some
> -# uses of asm goto.
> -ifeq ($(CONFIG_JUMP_LABEL), y)
> -ADD_ACCUMULATE_OUTGOING_ARGS := y
> -endif
> -
> -cflags-$(ADD_ACCUMULATE_OUTGOING_ARGS) += $(call cc-option,-maccumulate-outgoing-args)
> -
>  # Bug fix for binutils: this option is required in order to keep
>  # binutils from generating NOPL instructions against our will.
>  ifneq ($(CONFIG_X86_P6_NOP),y)
> diff --git a/arch/x86/kernel/ftrace.c b/arch/x86/kernel/ftrace.c
> index 8f3d9cf..59f9b46 100644
> --- a/arch/x86/kernel/ftrace.c
> +++ b/arch/x86/kernel/ftrace.c
> @@ -29,6 +29,12 @@
>  #include <asm/ftrace.h>
>  #include <asm/nops.h>
>  
> +#if defined(CONFIG_FUNCTION_GRAPH_TRACER) && \
> +	!defined(CC_USING_FENTRY) && \
> +	!defined(CONFIG_CC_OPTIMIZE_FOR_PERFORMANCE)
> +# error Your compiler does not support function graph tracing
> +#endif
> +
>  #ifdef CONFIG_DYNAMIC_FTRACE
>  
>  int ftrace_arch_code_modify_prepare(void)
> diff --git a/scripts/Kbuild.include b/scripts/Kbuild.include
> index d6ca649..afe3fd3 100644
> --- a/scripts/Kbuild.include
> +++ b/scripts/Kbuild.include
> @@ -148,6 +148,10 @@ cc-fullversion = $(shell $(CONFIG_SHELL) \
>  # Usage:  EXTRA_CFLAGS += $(call cc-ifversion, -lt, 0402, -O1)
>  cc-ifversion = $(shell [ $(cc-version) $(1) $(2) ] && echo $(3) || echo $(4))
>  
> +# cc-if-fullversion
> +# Usage:  EXTRA_CFLAGS += $(call cc-if-fullversion, -lt, 040502, -O1)
> +cc-if-fullversion = $(shell [ $(cc-fullversion) $(1) $(2) ] && echo $(3) || echo $(4))
> +
>  # cc-ldoption
>  # Usage: ldflags += $(call cc-ldoption, -Wl$(comma)--hash-style=both)
>  cc-ldoption = $(call try-run,\

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH] x86: mostly disable '-maccumulate-outgoing-args'
  2017-03-16 17:32                                       ` Steven Rostedt
@ 2017-03-16 18:36                                         ` Josh Poimboeuf
  2017-03-16 18:53                                           ` Josh Poimboeuf
  2017-03-16 19:06                                           ` Steven Rostedt
  0 siblings, 2 replies; 51+ messages in thread
From: Josh Poimboeuf @ 2017-03-16 18:36 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: x86, Andy Lutomirski, Pavel Machek, kernel list, Ingo Molnar,
	Andrew Lutomirski, Borislav Petkov, Brian Gerst, Denys Vlasenko,
	Peter Anvin, Peter Zijlstra, Thomas Gleixner, Linus Torvalds

On Thu, Mar 16, 2017 at 01:32:01PM -0400, Steven Rostedt wrote:
> On Thu, 16 Mar 2017 10:42:08 -0500
> Josh Poimboeuf <jpoimboe@redhat.com> wrote:
> 
> > Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
> > ---
> >  arch/x86/Makefile        | 29 +++++++++++++++++++++++++----
> >  arch/x86/Makefile_32.cpu | 18 ------------------
> >  arch/x86/kernel/ftrace.c |  6 ++++++
> >  scripts/Kbuild.include   |  4 ++++
> >  4 files changed, 35 insertions(+), 22 deletions(-)
> > 
> > diff --git a/arch/x86/Makefile b/arch/x86/Makefile
> > index 2d44933..fa45989b 100644
> > --- a/arch/x86/Makefile
> > +++ b/arch/x86/Makefile
> > @@ -120,10 +120,6 @@ else
> >          # -funit-at-a-time shrinks the kernel .text considerably
> >          # unfortunately it makes reading oopses harder.
> >          KBUILD_CFLAGS += $(call cc-option,-funit-at-a-time)
> > -
> > -        # this works around some issues with generating unwind tables in older gccs
> > -        # newer gccs do it by default
> > -        KBUILD_CFLAGS += $(call cc-option,-maccumulate-outgoing-args)
> >  endif
> >  
> >  ifdef CONFIG_X86_X32
> > @@ -147,6 +143,31 @@ ifeq ($(CONFIG_KMEMCHECK),y)
> >  	KBUILD_CFLAGS += $(call cc-option,-fno-builtin-memcpy)
> >  endif
> >  
> > +# If the function graph tracer is used with mcount instead of fentry,
> > +# '-maccumulate-outgoing-args' is needed to prevent gcc bug
> 
> 			"to prevent a gcc bug"

It was

  "to prevent gcc bug https://gcc.gnu.org/bugzilla/show_bug.cgi?id=42109"

where "gcc bug" was an adjective and the URL was a noun.  But yeah,
that's kind of confusing, and the line wrap made it more so.  Maybe I'll
change it to

  "to prevent a gcc bug (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=42109)"

and a similar change for the jump label bug comment.

> > +# https://gcc.gnu.org/bugzilla/show_bug.cgi?id=42109
> > +ifdef CONFIG_FUNCTION_GRAPH_TRACER
> > +  ifndef CONFIG_HAVE_FENTRY
> > +	ACCUMULATE_OUTGOING_ARGS := 1
> > +  else
> > +    ifeq ($(call cc-option, -mfentry),)
> 
> Hmm, the blank entry makes me nervous. I wonder if it would be better
> if we had ifneq ($(call cc-option-yn, -mfentry),y)
> 
> Unfortunately, there's one of each in the existing kernel, so there is
> really no precedence.

Either way seems fine.  I'll go with your suggested change.

> > +	ACCUMULATE_OUTGOING_ARGS := 1
> > +    endif
> > +  endif
> > +endif
> > +
> > +# Jump labels need '-maccumulate-outgoing-args' for gcc < 4.5.2 to prevent
> 
> Can we make a test instead? I hate testing versions, and things get
> backported all the time. We usually like to have a test case instead of
> relying on versions. Not to mention, a newer gcc may one day break.

Tests are generally better, but I'm not sure how to test for this
cleanly.  The test is rather big for embedding in a makefile:

  https://gcc.gnu.org/bugzilla/attachment.cgi?id=22199

Any ideas?

-- 
Josh

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH] x86: mostly disable '-maccumulate-outgoing-args'
  2017-03-16 18:36                                         ` Josh Poimboeuf
@ 2017-03-16 18:53                                           ` Josh Poimboeuf
  2017-03-16 19:04                                             ` Josh Poimboeuf
  2017-03-16 19:06                                           ` Steven Rostedt
  1 sibling, 1 reply; 51+ messages in thread
From: Josh Poimboeuf @ 2017-03-16 18:53 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: x86, Andy Lutomirski, Pavel Machek, kernel list, Ingo Molnar,
	Andrew Lutomirski, Borislav Petkov, Brian Gerst, Denys Vlasenko,
	Peter Anvin, Peter Zijlstra, Thomas Gleixner, Linus Torvalds

On Thu, Mar 16, 2017 at 01:36:35PM -0500, Josh Poimboeuf wrote:
> On Thu, Mar 16, 2017 at 01:32:01PM -0400, Steven Rostedt wrote:
> > On Thu, 16 Mar 2017 10:42:08 -0500
> > Josh Poimboeuf <jpoimboe@redhat.com> wrote:
> > > +	ACCUMULATE_OUTGOING_ARGS := 1
> > > +    endif
> > > +  endif
> > > +endif
> > > +
> > > +# Jump labels need '-maccumulate-outgoing-args' for gcc < 4.5.2 to prevent
> > 
> > Can we make a test instead? I hate testing versions, and things get
> > backported all the time. We usually like to have a test case instead of
> > relying on versions. Not to mention, a newer gcc may one day break.
> 
> Tests are generally better, but I'm not sure how to test for this
> cleanly.  The test is rather big for embedding in a makefile:
> 
>   https://gcc.gnu.org/bugzilla/attachment.cgi?id=22199
> 
> Any ideas?

After some snooping I discovered there's some precedent for doing this
in the scripts/gcc-*.sh files.  So maybe I'll add a test there and call
it from the Makefile.

-- 
Josh

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH] x86: mostly disable '-maccumulate-outgoing-args'
  2017-03-16 18:53                                           ` Josh Poimboeuf
@ 2017-03-16 19:04                                             ` Josh Poimboeuf
  2017-03-16 19:07                                               ` Steven Rostedt
  0 siblings, 1 reply; 51+ messages in thread
From: Josh Poimboeuf @ 2017-03-16 19:04 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: x86, Andy Lutomirski, Pavel Machek, kernel list, Ingo Molnar,
	Andrew Lutomirski, Borislav Petkov, Brian Gerst, Denys Vlasenko,
	Peter Anvin, Peter Zijlstra, Thomas Gleixner, Linus Torvalds

On Thu, Mar 16, 2017 at 01:53:05PM -0500, Josh Poimboeuf wrote:
> On Thu, Mar 16, 2017 at 01:36:35PM -0500, Josh Poimboeuf wrote:
> > On Thu, Mar 16, 2017 at 01:32:01PM -0400, Steven Rostedt wrote:
> > > On Thu, 16 Mar 2017 10:42:08 -0500
> > > Josh Poimboeuf <jpoimboe@redhat.com> wrote:
> > > > +	ACCUMULATE_OUTGOING_ARGS := 1
> > > > +    endif
> > > > +  endif
> > > > +endif
> > > > +
> > > > +# Jump labels need '-maccumulate-outgoing-args' for gcc < 4.5.2 to prevent
> > > 
> > > Can we make a test instead? I hate testing versions, and things get
> > > backported all the time. We usually like to have a test case instead of
> > > relying on versions. Not to mention, a newer gcc may one day break.
> > 
> > Tests are generally better, but I'm not sure how to test for this
> > cleanly.  The test is rather big for embedding in a makefile:
> > 
> >   https://gcc.gnu.org/bugzilla/attachment.cgi?id=22199
> > 
> > Any ideas?
> 
> After some snooping I discovered there's some precedent for doing this
> in the scripts/gcc-*.sh files.  So maybe I'll add a test there and call
> it from the Makefile.

But now I realize that those other tests are just build tests, whereas
this one needs to be executed.  That's a no-go for cross compilers.  So
I think we need to do the version check after all.

-- 
Josh

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH] x86: mostly disable '-maccumulate-outgoing-args'
  2017-03-16 18:36                                         ` Josh Poimboeuf
  2017-03-16 18:53                                           ` Josh Poimboeuf
@ 2017-03-16 19:06                                           ` Steven Rostedt
  1 sibling, 0 replies; 51+ messages in thread
From: Steven Rostedt @ 2017-03-16 19:06 UTC (permalink / raw)
  To: Josh Poimboeuf
  Cc: x86, Andy Lutomirski, Pavel Machek, kernel list, Ingo Molnar,
	Andrew Lutomirski, Borislav Petkov, Brian Gerst, Denys Vlasenko,
	Peter Anvin, Peter Zijlstra, Thomas Gleixner, Linus Torvalds

On Thu, 16 Mar 2017 13:36:35 -0500
Josh Poimboeuf <jpoimboe@redhat.com> wrote:

> On Thu, Mar 16, 2017 at 01:32:01PM -0400, Steven Rostedt wrote:
> > On Thu, 16 Mar 2017 10:42:08 -0500
> > Josh Poimboeuf <jpoimboe@redhat.com> wrote:
> >   
> > > Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
> > > ---
> > >  arch/x86/Makefile        | 29 +++++++++++++++++++++++++----
> > >  arch/x86/Makefile_32.cpu | 18 ------------------
> > >  arch/x86/kernel/ftrace.c |  6 ++++++
> > >  scripts/Kbuild.include   |  4 ++++
> > >  4 files changed, 35 insertions(+), 22 deletions(-)
> > > 
> > > diff --git a/arch/x86/Makefile b/arch/x86/Makefile
> > > index 2d44933..fa45989b 100644
> > > --- a/arch/x86/Makefile
> > > +++ b/arch/x86/Makefile
> > > @@ -120,10 +120,6 @@ else
> > >          # -funit-at-a-time shrinks the kernel .text considerably
> > >          # unfortunately it makes reading oopses harder.
> > >          KBUILD_CFLAGS += $(call cc-option,-funit-at-a-time)
> > > -
> > > -        # this works around some issues with generating unwind tables in older gccs
> > > -        # newer gccs do it by default
> > > -        KBUILD_CFLAGS += $(call cc-option,-maccumulate-outgoing-args)
> > >  endif
> > >  
> > >  ifdef CONFIG_X86_X32
> > > @@ -147,6 +143,31 @@ ifeq ($(CONFIG_KMEMCHECK),y)
> > >  	KBUILD_CFLAGS += $(call cc-option,-fno-builtin-memcpy)
> > >  endif
> > >  
> > > +# If the function graph tracer is used with mcount instead of fentry,
> > > +# '-maccumulate-outgoing-args' is needed to prevent gcc bug  
> > 
> > 			"to prevent a gcc bug"  
> 
> It was
> 
>   "to prevent gcc bug https://gcc.gnu.org/bugzilla/show_bug.cgi?id=42109"
> 
> where "gcc bug" was an adjective and the URL was a noun.  But yeah,
> that's kind of confusing, and the line wrap made it more so.  Maybe I'll
> change it to
> 
>   "to prevent a gcc bug (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=42109)"

Hmm, "the" would have made it work too.

> 
> and a similar change for the jump label bug comment.
> 
> > > +# https://gcc.gnu.org/bugzilla/show_bug.cgi?id=42109
> > > +ifdef CONFIG_FUNCTION_GRAPH_TRACER
> > > +  ifndef CONFIG_HAVE_FENTRY
> > > +	ACCUMULATE_OUTGOING_ARGS := 1
> > > +  else
> > > +    ifeq ($(call cc-option, -mfentry),)  
> > 
> > Hmm, the blank entry makes me nervous. I wonder if it would be better
> > if we had ifneq ($(call cc-option-yn, -mfentry),y)
> > 
> > Unfortunately, there's one of each in the existing kernel, so there is
> > really no precedence.  
> 
> Either way seems fine.  I'll go with your suggested change.
> 
> > > +	ACCUMULATE_OUTGOING_ARGS := 1
> > > +    endif
> > > +  endif
> > > +endif
> > > +
> > > +# Jump labels need '-maccumulate-outgoing-args' for gcc < 4.5.2 to prevent  
> > 
> > Can we make a test instead? I hate testing versions, and things get
> > backported all the time. We usually like to have a test case instead of
> > relying on versions. Not to mention, a newer gcc may one day break.  
> 
> Tests are generally better, but I'm not sure how to test for this
> cleanly.  The test is rather big for embedding in a makefile:
> 
>   https://gcc.gnu.org/bugzilla/attachment.cgi?id=22199
> 
> Any ideas?
> 

I'd reply but I see you figured it out yourself.

-- Steve

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH] x86: mostly disable '-maccumulate-outgoing-args'
  2017-03-16 19:04                                             ` Josh Poimboeuf
@ 2017-03-16 19:07                                               ` Steven Rostedt
  0 siblings, 0 replies; 51+ messages in thread
From: Steven Rostedt @ 2017-03-16 19:07 UTC (permalink / raw)
  To: Josh Poimboeuf
  Cc: x86, Andy Lutomirski, Pavel Machek, kernel list, Ingo Molnar,
	Andrew Lutomirski, Borislav Petkov, Brian Gerst, Denys Vlasenko,
	Peter Anvin, Peter Zijlstra, Thomas Gleixner, Linus Torvalds

On Thu, 16 Mar 2017 14:04:01 -0500
Josh Poimboeuf <jpoimboe@redhat.com> wrote:

> > After some snooping I discovered there's some precedent for doing this
> > in the scripts/gcc-*.sh files.  So maybe I'll add a test there and call
> > it from the Makefile.  
> 
> But now I realize that those other tests are just build tests, whereas
> this one needs to be executed.  That's a no-go for cross compilers.  So
> I think we need to do the version check after all.
> 

Fine, but please add a comment saying such.

-- Steve

^ permalink raw reply	[flat|nested] 51+ messages in thread

* [PATCH v2] x86: mostly disable '-maccumulate-outgoing-args'
  2017-03-16 15:42                                     ` [PATCH] x86: mostly disable '-maccumulate-outgoing-args' Josh Poimboeuf
  2017-03-16 17:32                                       ` Steven Rostedt
@ 2017-03-16 19:31                                       ` Josh Poimboeuf
  2017-03-22  7:51                                         ` Ingo Molnar
                                                           ` (2 more replies)
  1 sibling, 3 replies; 51+ messages in thread
From: Josh Poimboeuf @ 2017-03-16 19:31 UTC (permalink / raw)
  To: x86
  Cc: Andy Lutomirski, Pavel Machek, kernel list, Ingo Molnar,
	Andrew Lutomirski, Borislav Petkov, Brian Gerst, Denys Vlasenko,
	Peter Anvin, Peter Zijlstra, Thomas Gleixner, Steven Rostedt,
	Linus Torvalds

The gcc '-maccumulate-outgoing-args' flag is enabled for most configs,
mostly because of issues which are no longer relevant.  For most
configs, and with most recent versions of gcc, it's no longer needed.

Clarify which cases need it, and only enable it for those cases.  Also
produce a compile-time error for the ftrace graph + mcount + '-Os' case,
which will otherwise cause runtime failures.

The main benefit of '-maccumulate-outgoing-args' is that it prevents an
ugly prologue for functions which have aligned stacks.  But removing the
option also has some benefits: more readable argument saves, smaller
text size, and (presumably) slightly improved performance.

Here are the object size savings for 32-bit and 64-bit defconfig
kernels:

      text	   data	    bss	     dec	    hex	filename
  10006710	3543328	1773568	15323606	 e9d1d6	vmlinux.x86-32.before
   9706358	3547424	1773568	15027350	 e54c96	vmlinux.x86-32.after

      text	   data	    bss	     dec	    hex	filename
  10652105	4537576	 843776	16033457	 f4a6b1	vmlinux.x86-64.before
  10639629	4537576	 843776	16020981	 f475f5	vmlinux.x86-64.after

That comes out to a 3% text size improvement on x86-32 and a 0.1% text
size improvement on x86-64.

Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
---
v2:
- improve readability of the comments
- add comment about why gcc version needs to be checked
- use cc-option-yn instead of cc-option

 arch/x86/Makefile        | 31 +++++++++++++++++++++++++++----
 arch/x86/Makefile_32.cpu | 18 ------------------
 arch/x86/kernel/ftrace.c |  6 ++++++
 scripts/Kbuild.include   |  4 ++++
 4 files changed, 37 insertions(+), 22 deletions(-)

diff --git a/arch/x86/Makefile b/arch/x86/Makefile
index 2d44933..04c87be 100644
--- a/arch/x86/Makefile
+++ b/arch/x86/Makefile
@@ -120,10 +120,6 @@ else
         # -funit-at-a-time shrinks the kernel .text considerably
         # unfortunately it makes reading oopses harder.
         KBUILD_CFLAGS += $(call cc-option,-funit-at-a-time)
-
-        # this works around some issues with generating unwind tables in older gccs
-        # newer gccs do it by default
-        KBUILD_CFLAGS += $(call cc-option,-maccumulate-outgoing-args)
 endif
 
 ifdef CONFIG_X86_X32
@@ -147,6 +143,33 @@ ifeq ($(CONFIG_KMEMCHECK),y)
 	KBUILD_CFLAGS += $(call cc-option,-fno-builtin-memcpy)
 endif
 
+# If the function graph tracer is used with mcount instead of fentry,
+# '-maccumulate-outgoing-args' is needed to prevent a gcc bug
+# (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=42109)
+ifdef CONFIG_FUNCTION_GRAPH_TRACER
+  ifndef CONFIG_HAVE_FENTRY
+	ACCUMULATE_OUTGOING_ARGS := 1
+  else
+    ifeq ($(call cc-option-yn, -mfentry), n)
+	ACCUMULATE_OUTGOING_ARGS := 1
+    endif
+  endif
+endif
+
+# Jump labels need '-maccumulate-outgoing-args' for gcc < 4.5.2 to prevent a
+# gcc bug (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=46226).  There's no way
+# to test for this bug at compile-time because the test case needs to execute,
+# which is a no-go for cross compilers.  So check the gcc version instead.
+ifdef CONFIG_JUMP_LABEL
+  ifneq ($(ACCUMULATE_OUTGOING_ARGS), 1)
+	ACCUMULATE_OUTGOING_ARGS = $(call cc-if-fullversion, -lt, 040502, 1)
+  endif
+endif
+
+ifeq ($(ACCUMULATE_OUTGOING_ARGS), 1)
+	KBUILD_CFLAGS += -maccumulate-outgoing-args
+endif
+
 # Stackpointer is addressed different for 32 bit and 64 bit x86
 sp-$(CONFIG_X86_32) := esp
 sp-$(CONFIG_X86_64) := rsp
diff --git a/arch/x86/Makefile_32.cpu b/arch/x86/Makefile_32.cpu
index 6647ed4..a45eb15 100644
--- a/arch/x86/Makefile_32.cpu
+++ b/arch/x86/Makefile_32.cpu
@@ -45,24 +45,6 @@ cflags-$(CONFIG_MGEODE_LX)	+= $(call cc-option,-march=geode,-march=pentium-mmx)
 # cpu entries
 cflags-$(CONFIG_X86_GENERIC) 	+= $(call tune,generic,$(call tune,i686))
 
-# Work around the pentium-mmx code generator madness of gcc4.4.x which
-# does stack alignment by generating horrible code _before_ the mcount
-# prologue (push %ebp, mov %esp, %ebp) which breaks the function graph
-# tracer assumptions. For i686, generic, core2 this is set by the
-# compiler anyway
-ifeq ($(CONFIG_FUNCTION_GRAPH_TRACER), y)
-ADD_ACCUMULATE_OUTGOING_ARGS := y
-endif
-
-# Work around to a bug with asm goto with first implementations of it
-# in gcc causing gcc to mess up the push and pop of the stack in some
-# uses of asm goto.
-ifeq ($(CONFIG_JUMP_LABEL), y)
-ADD_ACCUMULATE_OUTGOING_ARGS := y
-endif
-
-cflags-$(ADD_ACCUMULATE_OUTGOING_ARGS) += $(call cc-option,-maccumulate-outgoing-args)
-
 # Bug fix for binutils: this option is required in order to keep
 # binutils from generating NOPL instructions against our will.
 ifneq ($(CONFIG_X86_P6_NOP),y)
diff --git a/arch/x86/kernel/ftrace.c b/arch/x86/kernel/ftrace.c
index 8f3d9cf..59f9b46 100644
--- a/arch/x86/kernel/ftrace.c
+++ b/arch/x86/kernel/ftrace.c
@@ -29,6 +29,12 @@
 #include <asm/ftrace.h>
 #include <asm/nops.h>
 
+#if defined(CONFIG_FUNCTION_GRAPH_TRACER) && \
+	!defined(CC_USING_FENTRY) && \
+	!defined(CONFIG_CC_OPTIMIZE_FOR_PERFORMANCE)
+# error Your compiler does not support function graph tracing
+#endif
+
 #ifdef CONFIG_DYNAMIC_FTRACE
 
 int ftrace_arch_code_modify_prepare(void)
diff --git a/scripts/Kbuild.include b/scripts/Kbuild.include
index d6ca649..afe3fd3 100644
--- a/scripts/Kbuild.include
+++ b/scripts/Kbuild.include
@@ -148,6 +148,10 @@ cc-fullversion = $(shell $(CONFIG_SHELL) \
 # Usage:  EXTRA_CFLAGS += $(call cc-ifversion, -lt, 0402, -O1)
 cc-ifversion = $(shell [ $(cc-version) $(1) $(2) ] && echo $(3) || echo $(4))
 
+# cc-if-fullversion
+# Usage:  EXTRA_CFLAGS += $(call cc-if-fullversion, -lt, 040502, -O1)
+cc-if-fullversion = $(shell [ $(cc-fullversion) $(1) $(2) ] && echo $(3) || echo $(4))
+
 # cc-ldoption
 # Usage: ldflags += $(call cc-ldoption, -Wl$(comma)--hash-style=both)
 cc-ldoption = $(call try-run,\
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 51+ messages in thread

* Re: [PATCH v2] x86: mostly disable '-maccumulate-outgoing-args'
  2017-03-16 19:31                                       ` [PATCH v2] " Josh Poimboeuf
@ 2017-03-22  7:51                                         ` Ingo Molnar
  2017-03-22 15:48                                           ` Josh Poimboeuf
  2017-03-28  8:13                                         ` [tip:x86/urgent] x86/build: Mostly " tip-bot for Josh Poimboeuf
  2017-03-30  9:58                                         ` tip-bot for Josh Poimboeuf
  2 siblings, 1 reply; 51+ messages in thread
From: Ingo Molnar @ 2017-03-22  7:51 UTC (permalink / raw)
  To: Josh Poimboeuf
  Cc: x86, Andy Lutomirski, Pavel Machek, kernel list,
	Andrew Lutomirski, Borislav Petkov, Brian Gerst, Denys Vlasenko,
	Peter Anvin, Peter Zijlstra, Thomas Gleixner, Steven Rostedt,
	Linus Torvalds


* Josh Poimboeuf <jpoimboe@redhat.com> wrote:

> The gcc '-maccumulate-outgoing-args' flag is enabled for most configs,
> mostly because of issues which are no longer relevant.  For most
> configs, and with most recent versions of gcc, it's no longer needed.
> 
> Clarify which cases need it, and only enable it for those cases.  Also
> produce a compile-time error for the ftrace graph + mcount + '-Os' case,
> which will otherwise cause runtime failures.
> 
> The main benefit of '-maccumulate-outgoing-args' is that it prevents an
> ugly prologue for functions which have aligned stacks.  But removing the
> option also has some benefits: more readable argument saves, smaller
> text size, and (presumably) slightly improved performance.
> 
> Here are the object size savings for 32-bit and 64-bit defconfig
> kernels:
> 
>       text	   data	    bss	     dec	    hex	filename
>   10006710	3543328	1773568	15323606	 e9d1d6	vmlinux.x86-32.before
>    9706358	3547424	1773568	15027350	 e54c96	vmlinux.x86-32.after
> 
>       text	   data	    bss	     dec	    hex	filename
>   10652105	4537576	 843776	16033457	 f4a6b1	vmlinux.x86-64.before
>   10639629	4537576	 843776	16020981	 f475f5	vmlinux.x86-64.after
> 
> That comes out to a 3% text size improvement on x86-32 and a 0.1% text
> size improvement on x86-64.
> 
> Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
> ---
> v2:
> - improve readability of the comments
> - add comment about why gcc version needs to be checked
> - use cc-option-yn instead of cc-option
> 
>  arch/x86/Makefile        | 31 +++++++++++++++++++++++++++----
>  arch/x86/Makefile_32.cpu | 18 ------------------
>  arch/x86/kernel/ftrace.c |  6 ++++++
>  scripts/Kbuild.include   |  4 ++++
>  4 files changed, 37 insertions(+), 22 deletions(-)
> 
> diff --git a/arch/x86/Makefile b/arch/x86/Makefile
> index 2d44933..04c87be 100644
> --- a/arch/x86/Makefile
> +++ b/arch/x86/Makefile
> @@ -120,10 +120,6 @@ else
>          # -funit-at-a-time shrinks the kernel .text considerably
>          # unfortunately it makes reading oopses harder.
>          KBUILD_CFLAGS += $(call cc-option,-funit-at-a-time)
> -
> -        # this works around some issues with generating unwind tables in older gccs
> -        # newer gccs do it by default
> -        KBUILD_CFLAGS += $(call cc-option,-maccumulate-outgoing-args)
>  endif
>  
>  ifdef CONFIG_X86_X32
> @@ -147,6 +143,33 @@ ifeq ($(CONFIG_KMEMCHECK),y)
>  	KBUILD_CFLAGS += $(call cc-option,-fno-builtin-memcpy)
>  endif
>  
> +# If the function graph tracer is used with mcount instead of fentry,
> +# '-maccumulate-outgoing-args' is needed to prevent a gcc bug
> +# (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=42109)
> +ifdef CONFIG_FUNCTION_GRAPH_TRACER
> +  ifndef CONFIG_HAVE_FENTRY
> +	ACCUMULATE_OUTGOING_ARGS := 1
> +  else
> +    ifeq ($(call cc-option-yn, -mfentry), n)
> +	ACCUMULATE_OUTGOING_ARGS := 1
> +    endif
> +  endif
> +endif
> +
> +# Jump labels need '-maccumulate-outgoing-args' for gcc < 4.5.2 to prevent a
> +# gcc bug (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=46226).  There's no way
> +# to test for this bug at compile-time because the test case needs to execute,
> +# which is a no-go for cross compilers.  So check the gcc version instead.

In documentation please refer to GCC the compiler as 'GCC' (upper case), while gcc 
the command as 'gcc'.

Also, even in Kbuild try to follow the kernel style - which for comments would be 
something like:

#
# Jump labels need '-maccumulate-outgoing-args' for gcc < 4.5.2 to prevent a
# GCC bug (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=46226).  There's no way
# to test for this bug at compile-time because the test case needs to execute,
# which is a no-go for cross compilers.  So check the GCC version instead.
#

... even though there's plenty of bad example in the Makefile you are changing.

> +#if defined(CONFIG_FUNCTION_GRAPH_TRACER) && \
> +	!defined(CC_USING_FENTRY) && \
> +	!defined(CONFIG_CC_OPTIMIZE_FOR_PERFORMANCE)
> +# error Your compiler does not support function graph tracing
> +#endif

Might make sense to add the compiler option that is missing, i.e. something like:

  # error Your compiler does not support function graph tracing (-mfentry)

(or whatever compiler feature is missing.)

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 51+ messages in thread

* Re: [PATCH v2] x86: mostly disable '-maccumulate-outgoing-args'
  2017-03-22  7:51                                         ` Ingo Molnar
@ 2017-03-22 15:48                                           ` Josh Poimboeuf
  0 siblings, 0 replies; 51+ messages in thread
From: Josh Poimboeuf @ 2017-03-22 15:48 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: x86, Andy Lutomirski, Pavel Machek, kernel list,
	Andrew Lutomirski, Borislav Petkov, Brian Gerst, Denys Vlasenko,
	Peter Anvin, Peter Zijlstra, Thomas Gleixner, Steven Rostedt,
	Linus Torvalds

On Wed, Mar 22, 2017 at 08:51:53AM +0100, Ingo Molnar wrote:
> > +#if defined(CONFIG_FUNCTION_GRAPH_TRACER) && \
> > +	!defined(CC_USING_FENTRY) && \
> > +	!defined(CONFIG_CC_OPTIMIZE_FOR_PERFORMANCE)
> > +# error Your compiler does not support function graph tracing
> > +#endif
> 
> Might make sense to add the compiler option that is missing, i.e. something like:
> 
>   # error Your compiler does not support function graph tracing (-mfentry)
> 
> (or whatever compiler feature is missing.)

I left it vague because otherwise it would need a paragraph :-)

After Steven's latest patches which port fentry to x86-32, I think the
precise version would be:

  # error The following combination is not supported: ((compiler missing -mfentry) || (CONFIG_X86_32 and !CONFIG_DYNAMIC_FTRACE)) && CONFIG_FUNCTION_GRAPH_TRACER && CONFIG_CC_OPTIMIZE_FOR_SIZE.

-- 
Josh

^ permalink raw reply	[flat|nested] 51+ messages in thread

* [tip:x86/urgent] x86/build: Mostly disable '-maccumulate-outgoing-args'
  2017-03-16 19:31                                       ` [PATCH v2] " Josh Poimboeuf
  2017-03-22  7:51                                         ` Ingo Molnar
@ 2017-03-28  8:13                                         ` tip-bot for Josh Poimboeuf
  2017-03-28 16:17                                           ` Josh Poimboeuf
  2017-03-30  9:58                                         ` tip-bot for Josh Poimboeuf
  2 siblings, 1 reply; 51+ messages in thread
From: tip-bot for Josh Poimboeuf @ 2017-03-28  8:13 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: rostedt, dvlasenk, tglx, luto, linux-kernel, hpa, pavel, luto,
	torvalds, bp, mingo, peterz, brgerst, jpoimboe

Commit-ID:  a14d30bdd58680dd9eb4d83600387f159a4b0f18
Gitweb:     http://git.kernel.org/tip/a14d30bdd58680dd9eb4d83600387f159a4b0f18
Author:     Josh Poimboeuf <jpoimboe@redhat.com>
AuthorDate: Thu, 16 Mar 2017 14:31:33 -0500
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Tue, 28 Mar 2017 09:33:02 +0200

x86/build: Mostly disable '-maccumulate-outgoing-args'

The GCC '-maccumulate-outgoing-args' flag is enabled for most configs,
mostly because of issues which are no longer relevant.  For most
configs, and with most recent versions of GCC, it's no longer needed.

Clarify which cases need it, and only enable it for those cases.  Also
produce a compile-time error for the ftrace graph + mcount + '-Os' case,
which will otherwise cause runtime failures.

The main benefit of '-maccumulate-outgoing-args' is that it prevents an
ugly prologue for functions which have aligned stacks.  But removing the
option also has some benefits: more readable argument saves, smaller
text size, and (presumably) slightly improved performance.

Here are the object size savings for 32-bit and 64-bit defconfig
kernels:

      text	   data	    bss	     dec	    hex	filename
  10006710	3543328	1773568	15323606	 e9d1d6	vmlinux.x86-32.before
   9706358	3547424	1773568	15027350	 e54c96	vmlinux.x86-32.after

      text	   data	    bss	     dec	    hex	filename
  10652105	4537576	 843776	16033457	 f4a6b1	vmlinux.x86-64.before
  10639629	4537576	 843776	16020981	 f475f5	vmlinux.x86-64.after

That comes out to a 3% text size improvement on x86-32 and a 0.1% text
size improvement on x86-64.

Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Andrew Lutomirski <luto@kernel.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Pavel Machek <pavel@ucw.cz>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20170316193133.zrj6gug53766m6nn@treble
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 arch/x86/Makefile        | 31 +++++++++++++++++++++++++++----
 arch/x86/Makefile_32.cpu | 18 ------------------
 arch/x86/kernel/ftrace.c |  6 ++++++
 scripts/Kbuild.include   |  4 ++++
 4 files changed, 37 insertions(+), 22 deletions(-)

diff --git a/arch/x86/Makefile b/arch/x86/Makefile
index 2d44933..07b6746 100644
--- a/arch/x86/Makefile
+++ b/arch/x86/Makefile
@@ -120,10 +120,6 @@ else
         # -funit-at-a-time shrinks the kernel .text considerably
         # unfortunately it makes reading oopses harder.
         KBUILD_CFLAGS += $(call cc-option,-funit-at-a-time)
-
-        # this works around some issues with generating unwind tables in older gccs
-        # newer gccs do it by default
-        KBUILD_CFLAGS += $(call cc-option,-maccumulate-outgoing-args)
 endif
 
 ifdef CONFIG_X86_X32
@@ -147,6 +143,33 @@ ifeq ($(CONFIG_KMEMCHECK),y)
 	KBUILD_CFLAGS += $(call cc-option,-fno-builtin-memcpy)
 endif
 
+# If the function graph tracer is used with mcount instead of fentry,
+# '-maccumulate-outgoing-args' is needed to prevent a GCC bug
+# (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=42109)
+ifdef CONFIG_FUNCTION_GRAPH_TRACER
+  ifndef CONFIG_HAVE_FENTRY
+	ACCUMULATE_OUTGOING_ARGS := 1
+  else
+    ifeq ($(call cc-option-yn, -mfentry), n)
+	ACCUMULATE_OUTGOING_ARGS := 1
+    endif
+  endif
+endif
+
+# Jump labels need '-maccumulate-outgoing-args' for GCC < 4.5.2 to prevent a
+# GCC bug (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=46226).  There's no way
+# to test for this bug at compile-time because the test case needs to execute,
+# which is a no-go for cross compilers.  So check the GCC version instead.
+ifdef CONFIG_JUMP_LABEL
+  ifneq ($(ACCUMULATE_OUTGOING_ARGS), 1)
+	ACCUMULATE_OUTGOING_ARGS = $(call cc-if-fullversion, -lt, 040502, 1)
+  endif
+endif
+
+ifeq ($(ACCUMULATE_OUTGOING_ARGS), 1)
+	KBUILD_CFLAGS += -maccumulate-outgoing-args
+endif
+
 # Stackpointer is addressed different for 32 bit and 64 bit x86
 sp-$(CONFIG_X86_32) := esp
 sp-$(CONFIG_X86_64) := rsp
diff --git a/arch/x86/Makefile_32.cpu b/arch/x86/Makefile_32.cpu
index 6647ed4..a45eb15 100644
--- a/arch/x86/Makefile_32.cpu
+++ b/arch/x86/Makefile_32.cpu
@@ -45,24 +45,6 @@ cflags-$(CONFIG_MGEODE_LX)	+= $(call cc-option,-march=geode,-march=pentium-mmx)
 # cpu entries
 cflags-$(CONFIG_X86_GENERIC) 	+= $(call tune,generic,$(call tune,i686))
 
-# Work around the pentium-mmx code generator madness of gcc4.4.x which
-# does stack alignment by generating horrible code _before_ the mcount
-# prologue (push %ebp, mov %esp, %ebp) which breaks the function graph
-# tracer assumptions. For i686, generic, core2 this is set by the
-# compiler anyway
-ifeq ($(CONFIG_FUNCTION_GRAPH_TRACER), y)
-ADD_ACCUMULATE_OUTGOING_ARGS := y
-endif
-
-# Work around to a bug with asm goto with first implementations of it
-# in gcc causing gcc to mess up the push and pop of the stack in some
-# uses of asm goto.
-ifeq ($(CONFIG_JUMP_LABEL), y)
-ADD_ACCUMULATE_OUTGOING_ARGS := y
-endif
-
-cflags-$(ADD_ACCUMULATE_OUTGOING_ARGS) += $(call cc-option,-maccumulate-outgoing-args)
-
 # Bug fix for binutils: this option is required in order to keep
 # binutils from generating NOPL instructions against our will.
 ifneq ($(CONFIG_X86_P6_NOP),y)
diff --git a/arch/x86/kernel/ftrace.c b/arch/x86/kernel/ftrace.c
index 8f3d9cf..59f9b46 100644
--- a/arch/x86/kernel/ftrace.c
+++ b/arch/x86/kernel/ftrace.c
@@ -29,6 +29,12 @@
 #include <asm/ftrace.h>
 #include <asm/nops.h>
 
+#if defined(CONFIG_FUNCTION_GRAPH_TRACER) && \
+	!defined(CC_USING_FENTRY) && \
+	!defined(CONFIG_CC_OPTIMIZE_FOR_PERFORMANCE)
+# error Your compiler does not support function graph tracing
+#endif
+
 #ifdef CONFIG_DYNAMIC_FTRACE
 
 int ftrace_arch_code_modify_prepare(void)
diff --git a/scripts/Kbuild.include b/scripts/Kbuild.include
index d6ca649..afe3fd3 100644
--- a/scripts/Kbuild.include
+++ b/scripts/Kbuild.include
@@ -148,6 +148,10 @@ cc-fullversion = $(shell $(CONFIG_SHELL) \
 # Usage:  EXTRA_CFLAGS += $(call cc-ifversion, -lt, 0402, -O1)
 cc-ifversion = $(shell [ $(cc-version) $(1) $(2) ] && echo $(3) || echo $(4))
 
+# cc-if-fullversion
+# Usage:  EXTRA_CFLAGS += $(call cc-if-fullversion, -lt, 040502, -O1)
+cc-if-fullversion = $(shell [ $(cc-fullversion) $(1) $(2) ] && echo $(3) || echo $(4))
+
 # cc-ldoption
 # Usage: ldflags += $(call cc-ldoption, -Wl$(comma)--hash-style=both)
 cc-ldoption = $(call try-run,\

^ permalink raw reply related	[flat|nested] 51+ messages in thread

* Re: [tip:x86/urgent] x86/build: Mostly disable '-maccumulate-outgoing-args'
  2017-03-28  8:13                                         ` [tip:x86/urgent] x86/build: Mostly " tip-bot for Josh Poimboeuf
@ 2017-03-28 16:17                                           ` Josh Poimboeuf
  0 siblings, 0 replies; 51+ messages in thread
From: Josh Poimboeuf @ 2017-03-28 16:17 UTC (permalink / raw)
  To: tip-bot for Josh Poimboeuf
  Cc: linux-tip-commits, rostedt, dvlasenk, tglx, luto, linux-kernel,
	hpa, pavel, luto, torvalds, bp, mingo, peterz, brgerst

Dear tip-bot,

If it's not too late, here's a version with Ingo's suggested changes
(improving the makefile comment readability and improving the ftrace
build error message).

----

From: Josh Poimboeuf <jpoimboe@redhat.com>
Subject: [PATCH v3] x86: mostly disable '-maccumulate-outgoing-args'

The gcc '-maccumulate-outgoing-args' flag is enabled for most configs,
mostly because of issues which are no longer relevant.  For most
configs, and with most recent versions of gcc, it's no longer needed.

Clarify which cases need it, and only enable it for those cases.  Also
produce a compile-time error for the ftrace graph + mcount + '-Os' case,
which will otherwise cause runtime failures.

The main benefit of '-maccumulate-outgoing-args' is that it prevents an
ugly prologue for functions which have aligned stacks.  But removing the
option also has some benefits: more readable argument saves, smaller
text size, and (presumably) slightly improved performance.

Here are the object size savings for 32-bit and 64-bit defconfig
kernels:

      text	   data	    bss	     dec	    hex	filename
  10006710	3543328	1773568	15323606	 e9d1d6	vmlinux.x86-32.before
   9706358	3547424	1773568	15027350	 e54c96	vmlinux.x86-32.after

      text	   data	    bss	     dec	    hex	filename
  10652105	4537576	 843776	16033457	 f4a6b1	vmlinux.x86-64.before
  10639629	4537576	 843776	16020981	 f475f5	vmlinux.x86-64.after

That comes out to a 3% text size improvement on x86-32 and a 0.1% text
size improvement on x86-64.

Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
---
v3:
- improve makefile comment readability
- improve precision of ftrace build error message

 arch/x86/Makefile        | 35 +++++++++++++++++++++++++++++++----
 arch/x86/Makefile_32.cpu | 18 ------------------
 arch/x86/kernel/ftrace.c |  6 ++++++
 scripts/Kbuild.include   |  4 ++++
 4 files changed, 41 insertions(+), 22 deletions(-)

diff --git a/arch/x86/Makefile b/arch/x86/Makefile
index 2d44933..a94a4d1 100644
--- a/arch/x86/Makefile
+++ b/arch/x86/Makefile
@@ -120,10 +120,6 @@ else
         # -funit-at-a-time shrinks the kernel .text considerably
         # unfortunately it makes reading oopses harder.
         KBUILD_CFLAGS += $(call cc-option,-funit-at-a-time)
-
-        # this works around some issues with generating unwind tables in older gccs
-        # newer gccs do it by default
-        KBUILD_CFLAGS += $(call cc-option,-maccumulate-outgoing-args)
 endif
 
 ifdef CONFIG_X86_X32
@@ -147,6 +143,37 @@ ifeq ($(CONFIG_KMEMCHECK),y)
 	KBUILD_CFLAGS += $(call cc-option,-fno-builtin-memcpy)
 endif
 
+#
+# If the function graph tracer is used with mcount instead of fentry,
+# '-maccumulate-outgoing-args' is needed to prevent a GCC bug
+# (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=42109)
+#
+ifdef CONFIG_FUNCTION_GRAPH_TRACER
+  ifndef CONFIG_HAVE_FENTRY
+	ACCUMULATE_OUTGOING_ARGS := 1
+  else
+    ifeq ($(call cc-option-yn, -mfentry), n)
+	ACCUMULATE_OUTGOING_ARGS := 1
+    endif
+  endif
+endif
+
+#
+# Jump labels need '-maccumulate-outgoing-args' for gcc < 4.5.2 to prevent a
+# GCC bug (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=46226).  There's no way
+# to test for this bug at compile-time because the test case needs to execute,
+# which is a no-go for cross compilers.  So check the GCC version instead.
+#
+ifdef CONFIG_JUMP_LABEL
+  ifneq ($(ACCUMULATE_OUTGOING_ARGS), 1)
+	ACCUMULATE_OUTGOING_ARGS = $(call cc-if-fullversion, -lt, 040502, 1)
+  endif
+endif
+
+ifeq ($(ACCUMULATE_OUTGOING_ARGS), 1)
+	KBUILD_CFLAGS += -maccumulate-outgoing-args
+endif
+
 # Stackpointer is addressed different for 32 bit and 64 bit x86
 sp-$(CONFIG_X86_32) := esp
 sp-$(CONFIG_X86_64) := rsp
diff --git a/arch/x86/Makefile_32.cpu b/arch/x86/Makefile_32.cpu
index 6647ed4..a45eb15 100644
--- a/arch/x86/Makefile_32.cpu
+++ b/arch/x86/Makefile_32.cpu
@@ -45,24 +45,6 @@ cflags-$(CONFIG_MGEODE_LX)	+= $(call cc-option,-march=geode,-march=pentium-mmx)
 # cpu entries
 cflags-$(CONFIG_X86_GENERIC) 	+= $(call tune,generic,$(call tune,i686))
 
-# Work around the pentium-mmx code generator madness of gcc4.4.x which
-# does stack alignment by generating horrible code _before_ the mcount
-# prologue (push %ebp, mov %esp, %ebp) which breaks the function graph
-# tracer assumptions. For i686, generic, core2 this is set by the
-# compiler anyway
-ifeq ($(CONFIG_FUNCTION_GRAPH_TRACER), y)
-ADD_ACCUMULATE_OUTGOING_ARGS := y
-endif
-
-# Work around to a bug with asm goto with first implementations of it
-# in gcc causing gcc to mess up the push and pop of the stack in some
-# uses of asm goto.
-ifeq ($(CONFIG_JUMP_LABEL), y)
-ADD_ACCUMULATE_OUTGOING_ARGS := y
-endif
-
-cflags-$(ADD_ACCUMULATE_OUTGOING_ARGS) += $(call cc-option,-maccumulate-outgoing-args)
-
 # Bug fix for binutils: this option is required in order to keep
 # binutils from generating NOPL instructions against our will.
 ifneq ($(CONFIG_X86_P6_NOP),y)
diff --git a/arch/x86/kernel/ftrace.c b/arch/x86/kernel/ftrace.c
index 8f3d9cf..cbd73eb 100644
--- a/arch/x86/kernel/ftrace.c
+++ b/arch/x86/kernel/ftrace.c
@@ -29,6 +29,12 @@
 #include <asm/ftrace.h>
 #include <asm/nops.h>
 
+#if defined(CONFIG_FUNCTION_GRAPH_TRACER) && \
+	!defined(CC_USING_FENTRY) && \
+	!defined(CONFIG_CC_OPTIMIZE_FOR_PERFORMANCE)
+# error The following combination is not supported: ((compiler missing -mfentry) || (CONFIG_X86_32 and !CONFIG_DYNAMIC_FTRACE)) && CONFIG_FUNCTION_GRAPH_TRACER && CONFIG_CC_OPTIMIZE_FOR_SIZE
+#endif
+
 #ifdef CONFIG_DYNAMIC_FTRACE
 
 int ftrace_arch_code_modify_prepare(void)
diff --git a/scripts/Kbuild.include b/scripts/Kbuild.include
index d6ca649..afe3fd3 100644
--- a/scripts/Kbuild.include
+++ b/scripts/Kbuild.include
@@ -148,6 +148,10 @@ cc-fullversion = $(shell $(CONFIG_SHELL) \
 # Usage:  EXTRA_CFLAGS += $(call cc-ifversion, -lt, 0402, -O1)
 cc-ifversion = $(shell [ $(cc-version) $(1) $(2) ] && echo $(3) || echo $(4))
 
+# cc-if-fullversion
+# Usage:  EXTRA_CFLAGS += $(call cc-if-fullversion, -lt, 040502, -O1)
+cc-if-fullversion = $(shell [ $(cc-fullversion) $(1) $(2) ] && echo $(3) || echo $(4))
+
 # cc-ldoption
 # Usage: ldflags += $(call cc-ldoption, -Wl$(comma)--hash-style=both)
 cc-ldoption = $(call try-run,\
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 51+ messages in thread

* [tip:x86/urgent] x86/build: Mostly disable '-maccumulate-outgoing-args'
  2017-03-16 19:31                                       ` [PATCH v2] " Josh Poimboeuf
  2017-03-22  7:51                                         ` Ingo Molnar
  2017-03-28  8:13                                         ` [tip:x86/urgent] x86/build: Mostly " tip-bot for Josh Poimboeuf
@ 2017-03-30  9:58                                         ` tip-bot for Josh Poimboeuf
  2 siblings, 0 replies; 51+ messages in thread
From: tip-bot for Josh Poimboeuf @ 2017-03-30  9:58 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: linux-kernel, mingo, luto, hpa, tglx, luto, torvalds, dvlasenk,
	peterz, bp, brgerst, jpoimboe, rostedt, pavel

Commit-ID:  3f135e57a4f76d24ae8d8a490314331f0ced40c5
Gitweb:     http://git.kernel.org/tip/3f135e57a4f76d24ae8d8a490314331f0ced40c5
Author:     Josh Poimboeuf <jpoimboe@redhat.com>
AuthorDate: Thu, 16 Mar 2017 14:31:33 -0500
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Thu, 30 Mar 2017 11:53:04 +0200

x86/build: Mostly disable '-maccumulate-outgoing-args'

The GCC '-maccumulate-outgoing-args' flag is enabled for most configs,
mostly because of issues which are no longer relevant.  For most
configs, and with most recent versions of GCC, it's no longer needed.

Clarify which cases need it, and only enable it for those cases.  Also
produce a compile-time error for the ftrace graph + mcount + '-Os' case,
which will otherwise cause runtime failures.

The main benefit of '-maccumulate-outgoing-args' is that it prevents an
ugly prologue for functions which have aligned stacks.  But removing the
option also has some benefits: more readable argument saves, smaller
text size, and (presumably) slightly improved performance.

Here are the object size savings for 32-bit and 64-bit defconfig
kernels:

      text	   data	    bss	     dec	    hex	filename
  10006710	3543328	1773568	15323606	 e9d1d6	vmlinux.x86-32.before
   9706358	3547424	1773568	15027350	 e54c96	vmlinux.x86-32.after

      text	   data	    bss	     dec	    hex	filename
  10652105	4537576	 843776	16033457	 f4a6b1	vmlinux.x86-64.before
  10639629	4537576	 843776	16020981	 f475f5	vmlinux.x86-64.after

That comes out to a 3% text size improvement on x86-32 and a 0.1% text
size improvement on x86-64.

Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Andrew Lutomirski <luto@kernel.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Pavel Machek <pavel@ucw.cz>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20170316193133.zrj6gug53766m6nn@treble
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 arch/x86/Makefile        | 35 +++++++++++++++++++++++++++++++----
 arch/x86/Makefile_32.cpu | 18 ------------------
 arch/x86/kernel/ftrace.c |  6 ++++++
 scripts/Kbuild.include   |  4 ++++
 4 files changed, 41 insertions(+), 22 deletions(-)

diff --git a/arch/x86/Makefile b/arch/x86/Makefile
index 2d44933..a94a4d1 100644
--- a/arch/x86/Makefile
+++ b/arch/x86/Makefile
@@ -120,10 +120,6 @@ else
         # -funit-at-a-time shrinks the kernel .text considerably
         # unfortunately it makes reading oopses harder.
         KBUILD_CFLAGS += $(call cc-option,-funit-at-a-time)
-
-        # this works around some issues with generating unwind tables in older gccs
-        # newer gccs do it by default
-        KBUILD_CFLAGS += $(call cc-option,-maccumulate-outgoing-args)
 endif
 
 ifdef CONFIG_X86_X32
@@ -147,6 +143,37 @@ ifeq ($(CONFIG_KMEMCHECK),y)
 	KBUILD_CFLAGS += $(call cc-option,-fno-builtin-memcpy)
 endif
 
+#
+# If the function graph tracer is used with mcount instead of fentry,
+# '-maccumulate-outgoing-args' is needed to prevent a GCC bug
+# (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=42109)
+#
+ifdef CONFIG_FUNCTION_GRAPH_TRACER
+  ifndef CONFIG_HAVE_FENTRY
+	ACCUMULATE_OUTGOING_ARGS := 1
+  else
+    ifeq ($(call cc-option-yn, -mfentry), n)
+	ACCUMULATE_OUTGOING_ARGS := 1
+    endif
+  endif
+endif
+
+#
+# Jump labels need '-maccumulate-outgoing-args' for gcc < 4.5.2 to prevent a
+# GCC bug (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=46226).  There's no way
+# to test for this bug at compile-time because the test case needs to execute,
+# which is a no-go for cross compilers.  So check the GCC version instead.
+#
+ifdef CONFIG_JUMP_LABEL
+  ifneq ($(ACCUMULATE_OUTGOING_ARGS), 1)
+	ACCUMULATE_OUTGOING_ARGS = $(call cc-if-fullversion, -lt, 040502, 1)
+  endif
+endif
+
+ifeq ($(ACCUMULATE_OUTGOING_ARGS), 1)
+	KBUILD_CFLAGS += -maccumulate-outgoing-args
+endif
+
 # Stackpointer is addressed different for 32 bit and 64 bit x86
 sp-$(CONFIG_X86_32) := esp
 sp-$(CONFIG_X86_64) := rsp
diff --git a/arch/x86/Makefile_32.cpu b/arch/x86/Makefile_32.cpu
index 6647ed4..a45eb15 100644
--- a/arch/x86/Makefile_32.cpu
+++ b/arch/x86/Makefile_32.cpu
@@ -45,24 +45,6 @@ cflags-$(CONFIG_MGEODE_LX)	+= $(call cc-option,-march=geode,-march=pentium-mmx)
 # cpu entries
 cflags-$(CONFIG_X86_GENERIC) 	+= $(call tune,generic,$(call tune,i686))
 
-# Work around the pentium-mmx code generator madness of gcc4.4.x which
-# does stack alignment by generating horrible code _before_ the mcount
-# prologue (push %ebp, mov %esp, %ebp) which breaks the function graph
-# tracer assumptions. For i686, generic, core2 this is set by the
-# compiler anyway
-ifeq ($(CONFIG_FUNCTION_GRAPH_TRACER), y)
-ADD_ACCUMULATE_OUTGOING_ARGS := y
-endif
-
-# Work around to a bug with asm goto with first implementations of it
-# in gcc causing gcc to mess up the push and pop of the stack in some
-# uses of asm goto.
-ifeq ($(CONFIG_JUMP_LABEL), y)
-ADD_ACCUMULATE_OUTGOING_ARGS := y
-endif
-
-cflags-$(ADD_ACCUMULATE_OUTGOING_ARGS) += $(call cc-option,-maccumulate-outgoing-args)
-
 # Bug fix for binutils: this option is required in order to keep
 # binutils from generating NOPL instructions against our will.
 ifneq ($(CONFIG_X86_P6_NOP),y)
diff --git a/arch/x86/kernel/ftrace.c b/arch/x86/kernel/ftrace.c
index 8f3d9cf..cbd73eb 100644
--- a/arch/x86/kernel/ftrace.c
+++ b/arch/x86/kernel/ftrace.c
@@ -29,6 +29,12 @@
 #include <asm/ftrace.h>
 #include <asm/nops.h>
 
+#if defined(CONFIG_FUNCTION_GRAPH_TRACER) && \
+	!defined(CC_USING_FENTRY) && \
+	!defined(CONFIG_CC_OPTIMIZE_FOR_PERFORMANCE)
+# error The following combination is not supported: ((compiler missing -mfentry) || (CONFIG_X86_32 and !CONFIG_DYNAMIC_FTRACE)) && CONFIG_FUNCTION_GRAPH_TRACER && CONFIG_CC_OPTIMIZE_FOR_SIZE
+#endif
+
 #ifdef CONFIG_DYNAMIC_FTRACE
 
 int ftrace_arch_code_modify_prepare(void)
diff --git a/scripts/Kbuild.include b/scripts/Kbuild.include
index d6ca649..afe3fd3 100644
--- a/scripts/Kbuild.include
+++ b/scripts/Kbuild.include
@@ -148,6 +148,10 @@ cc-fullversion = $(shell $(CONFIG_SHELL) \
 # Usage:  EXTRA_CFLAGS += $(call cc-ifversion, -lt, 0402, -O1)
 cc-ifversion = $(shell [ $(cc-version) $(1) $(2) ] && echo $(3) || echo $(4))
 
+# cc-if-fullversion
+# Usage:  EXTRA_CFLAGS += $(call cc-if-fullversion, -lt, 040502, -O1)
+cc-if-fullversion = $(shell [ $(cc-fullversion) $(1) $(2) ] && echo $(3) || echo $(4))
+
 # cc-ldoption
 # Usage: ldflags += $(call cc-ldoption, -Wl$(comma)--hash-style=both)
 cc-ldoption = $(call try-run,\

^ permalink raw reply related	[flat|nested] 51+ messages in thread

end of thread, other threads:[~2017-03-30 10:06 UTC | newest]

Thread overview: 51+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-02-21 22:14 v4.10: kernel stack frame pointer .. has bad value (null) Pavel Machek
2017-02-21 23:12 ` Josh Poimboeuf
2017-02-21 23:15   ` H. Peter Anvin
2017-02-22 16:45     ` Josh Poimboeuf
2017-02-22 20:51       ` H. Peter Anvin
2017-02-22 21:15         ` Josh Poimboeuf
2017-02-22 21:05   ` Pavel Machek
2017-02-22 21:21     ` Josh Poimboeuf
2017-02-22 22:47       ` Pavel Machek
2017-02-22 22:56         ` Josh Poimboeuf
2017-02-22 23:18           ` Josh Poimboeuf
2017-02-23 20:10             ` Pavel Machek
2017-02-25  5:04               ` Josh Poimboeuf
2017-03-02 23:45                 ` Josh Poimboeuf
2017-03-06 16:38                   ` Pavel Machek
2017-03-07 17:38                     ` Josh Poimboeuf
2017-03-07 17:52                       ` Linus Torvalds
2017-03-07 17:59                         ` Andy Lutomirski
2017-03-07 18:28                           ` Josh Poimboeuf
2017-03-07 18:30                             ` Josh Poimboeuf
2017-03-07 18:40                             ` Linus Torvalds
2017-03-08 17:37                               ` Josh Poimboeuf
2017-03-08 18:25                                 ` Linus Torvalds
2017-03-08 18:54                                   ` Andy Lutomirski
2017-03-08 21:22                                   ` Pavel Machek
2017-03-09  9:38                                     ` Geert Uytterhoeven
2017-03-09 10:56                                       ` Pavel Machek
2017-03-09 12:16                                         ` Geert Uytterhoeven
2017-03-10 13:17                                           ` Compiling kernels faster (was Re: v4.10: kernel stack frame pointer .. has bad value (null)) Pavel Machek
2017-03-10 13:28                                             ` Geert Uytterhoeven
2017-03-10 14:15                                             ` Willy Tarreau
2017-03-09 10:49                                     ` Old compiler versions " Pavel Machek
2017-03-09 18:05                                       ` Linus Torvalds
2017-03-09 15:29                                     ` v4.10: kernel stack frame pointer .. has bad value (null) Peter Zijlstra
2017-03-09 21:12                                       ` Pavel Machek
2017-03-08 21:29                                   ` Josh Poimboeuf
2017-03-09 14:14                                     ` Steven Rostedt
2017-03-09 18:31                                       ` Josh Poimboeuf
2017-03-16 15:42                                     ` [PATCH] x86: mostly disable '-maccumulate-outgoing-args' Josh Poimboeuf
2017-03-16 17:32                                       ` Steven Rostedt
2017-03-16 18:36                                         ` Josh Poimboeuf
2017-03-16 18:53                                           ` Josh Poimboeuf
2017-03-16 19:04                                             ` Josh Poimboeuf
2017-03-16 19:07                                               ` Steven Rostedt
2017-03-16 19:06                                           ` Steven Rostedt
2017-03-16 19:31                                       ` [PATCH v2] " Josh Poimboeuf
2017-03-22  7:51                                         ` Ingo Molnar
2017-03-22 15:48                                           ` Josh Poimboeuf
2017-03-28  8:13                                         ` [tip:x86/urgent] x86/build: Mostly " tip-bot for Josh Poimboeuf
2017-03-28 16:17                                           ` Josh Poimboeuf
2017-03-30  9:58                                         ` tip-bot for Josh Poimboeuf

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.