* Re: BUG: GCC-4.4.x changes the function frame on some functions
@ 2009-11-19 21:14 H. Peter Anvin
2009-11-19 21:25 ` Jeff Law
0 siblings, 1 reply; 68+ messages in thread
From: H. Peter Anvin @ 2009-11-19 21:14 UTC (permalink / raw)
To: Jeff Law
Cc: rostedt, David Daney, Linus Torvalds, Andrew Haley,
Richard Guenther, Thomas Gleixner, Ingo Molnar, LKML,
Andrew Morton, Heiko Carstens, feng.tang, Fr??d??ric Weisbecker,
Peter Zijlstra, jakub, gcc
[-- Attachment #1: Type: text/plain, Size: 728 bytes --]
Hence a new unconstrained option...
"Jeff Law" <law@redhat.com> wrote:
>On 11/19/09 12:50, H. Peter Anvin wrote:
>>
>> Calling the profiler immediately at the entry point is clearly the more
>> sane option. It means the ABI is well-defined, stable, and independent
>> of what the actual function contents are. It means that ABI isn't the
>> normal C ABI (the __fentry__ function would have to preserve all
>> registers), but that's fine...
>>
>Note there are targets (even some old x86 variants) that required the
>profiling calls to occur after the prologue. Unfortunately, nobody
>documented *why* that was the case. Sigh.
>
>Jeff
--
Sent from my mobile phone. Please excuse any lack of formatting.
^ permalink raw reply [flat|nested] 68+ messages in thread
* Re: BUG: GCC-4.4.x changes the function frame on some functions
2009-11-19 21:14 BUG: GCC-4.4.x changes the function frame on some functions H. Peter Anvin
@ 2009-11-19 21:25 ` Jeff Law
2009-11-19 22:43 ` Steven Rostedt
0 siblings, 1 reply; 68+ messages in thread
From: Jeff Law @ 2009-11-19 21:25 UTC (permalink / raw)
To: H. Peter Anvin
Cc: rostedt, David Daney, Linus Torvalds, Andrew Haley,
Richard Guenther, Thomas Gleixner, Ingo Molnar, LKML,
Andrew Morton, Heiko Carstens, feng.tang, Fr??d??ric Weisbecker,
Peter Zijlstra, jakub, gcc
On 11/19/09 14:14, H. Peter Anvin wrote:
> Hence a new unconstrained option...
>
Not arguing against it, just noting there are targets where after the
prologue mcount is mandated. There's certainly hooks in GCC to do it
both ways and if there's no clear need to use after-prologue on
x86-linux, then before-prologue seems reasonable to me.
It's also the case that aligning stacks on the x86 and the poor code
generated when used with profiling is an interaction I doubt anyone has
looked at until now. The result is definitely ugly and inefficient --
and there's something to be said for cleaning that up and at least
marginally reducing the overhead of profiling.
Having said all that, I don't expect to personally be looking at the
problem, given the list of other codegen issues that need to be looked
at (reload in particular), profiling/stack interactions would be around
87 millionth on my list.
jeff
^ permalink raw reply [flat|nested] 68+ messages in thread
* Re: BUG: GCC-4.4.x changes the function frame on some functions
2009-11-19 21:25 ` Jeff Law
@ 2009-11-19 22:43 ` Steven Rostedt
2009-11-19 23:58 ` Jeff Law
0 siblings, 1 reply; 68+ messages in thread
From: Steven Rostedt @ 2009-11-19 22:43 UTC (permalink / raw)
To: Jeff Law
Cc: H. Peter Anvin, David Daney, Linus Torvalds, Andrew Haley,
Richard Guenther, Thomas Gleixner, Ingo Molnar, LKML,
Andrew Morton, Heiko Carstens, feng.tang, Fr??d??ric Weisbecker,
Peter Zijlstra, jakub, gcc
On Thu, 2009-11-19 at 14:25 -0700, Jeff Law wrote:
> Having said all that, I don't expect to personally be looking at the
> problem, given the list of other codegen issues that need to be looked
> at (reload in particular), profiling/stack interactions would be around
> 87 millionth on my list.
Is there someone else that can look at it?
Or at the very least, could you point us to where that code is, and one
of us tracing folks could take a crack at switching hats to be a
compiler writer (with the obvious prerequisite of drinking a lot of beer
first, or is there a better drug to cope with the pain of writing gcc?).
-- Steve
^ permalink raw reply [flat|nested] 68+ messages in thread
* Re: BUG: GCC-4.4.x changes the function frame on some functions
2009-11-19 22:43 ` Steven Rostedt
@ 2009-11-19 23:58 ` Jeff Law
2009-11-20 0:36 ` Thomas Gleixner
0 siblings, 1 reply; 68+ messages in thread
From: Jeff Law @ 2009-11-19 23:58 UTC (permalink / raw)
To: rostedt
Cc: H. Peter Anvin, David Daney, Linus Torvalds, Andrew Haley,
Richard Guenther, Thomas Gleixner, Ingo Molnar, LKML,
Andrew Morton, Heiko Carstens, feng.tang, Fr??d??ric Weisbecker,
Peter Zijlstra, jakub, gcc
On 11/19/09 15:43, Steven Rostedt wrote:
> On Thu, 2009-11-19 at 14:25 -0700, Jeff Law wrote:
>
>
>> Having said all that, I don't expect to personally be looking at the
>> problem, given the list of other codegen issues that need to be looked
>> at (reload in particular), profiling/stack interactions would be around
>> 87 millionth on my list.
>>
> Is there someone else that can look at it?
>
>
Unsure at the moment... Like everyone else, GCC developers are busy and
this probably isn't going to be a high priority item for anyone.
> Or at the very least, could you point us to where that code is, and one
> of us tracing folks could take a crack at switching hats to be a
> compiler writer (with the obvious prerequisite of drinking a lot of beer
> first, or is there a better drug to cope with the pain of writing gcc?).
>
It _might_ be as easy as defining PROFILE_BEFORE_PROLOGUE in
gcc-<someversion>gcc/config/i386/linux.h & rebuilding GCC.
Based on comments elsewhere, the sun386i support may have used
PROFILE_BEFORE_PROLOGUE in the past and thus the x86 backend may not
need further adjustment. That is obviously the ideal case.
If that appears to work for your needs, I'll volunteer to test it more
thoroughly and assuming those tests look good shepherd it into the
source tree.
Jeff
^ permalink raw reply [flat|nested] 68+ messages in thread
* Re: BUG: GCC-4.4.x changes the function frame on some functions
2009-11-19 23:58 ` Jeff Law
@ 2009-11-20 0:36 ` Thomas Gleixner
2009-11-20 0:59 ` Linus Torvalds
2009-11-20 12:04 ` Andrew Haley
0 siblings, 2 replies; 68+ messages in thread
From: Thomas Gleixner @ 2009-11-20 0:36 UTC (permalink / raw)
To: Jeff Law
Cc: rostedt, H. Peter Anvin, David Daney, Linus Torvalds,
Andrew Haley, Richard Guenther, Ingo Molnar, LKML, Andrew Morton,
Heiko Carstens, feng.tang, Fr??d??ric Weisbecker, Peter Zijlstra,
jakub, gcc
On Thu, 19 Nov 2009, Jeff Law wrote:
> On 11/19/09 15:43, Steven Rostedt wrote:
> > On Thu, 2009-11-19 at 14:25 -0700, Jeff Law wrote:
> >
> >
> > > Having said all that, I don't expect to personally be looking at the
> > > problem, given the list of other codegen issues that need to be looked
> > > at (reload in particular), profiling/stack interactions would be around
> > > 87 millionth on my list.
> > >
> > Is there someone else that can look at it?
> >
> >
> Unsure at the moment... Like everyone else, GCC developers are busy and this
> probably isn't going to be a high priority item for anyone.
>
>
> > Or at the very least, could you point us to where that code is, and one
> > of us tracing folks could take a crack at switching hats to be a
> > compiler writer (with the obvious prerequisite of drinking a lot of beer
> > first, or is there a better drug to cope with the pain of writing gcc?).
> >
> It _might_ be as easy as defining PROFILE_BEFORE_PROLOGUE in
> gcc-<someversion>gcc/config/i386/linux.h & rebuilding GCC.
>
> Based on comments elsewhere, the sun386i support may have used
> PROFILE_BEFORE_PROLOGUE in the past and thus the x86 backend may not need
> further adjustment. That is obviously the ideal case.
>
> If that appears to work for your needs, I'll volunteer to test it more
> thoroughly and assuming those tests look good shepherd it into the source
> tree.
We definitely want to see that ASAP.
While testing various kernel configs we found out that the problem
comes and goes. Finally I started to compare the gcc command line
options and after some fiddling it turned out that the following
minimal deltas change the code generator behaviour:
Bad: -march=pentium-mmx -Wa,-mtune=generic32
Good: -march=i686 -mtune=generic -Wa,-mtune=generic32
Good: -march=pentium-mmx -mtune-generic -Wa,-mtune=generic32
I'm not supposed to understand the logic behind that, right ?
Thanks,
tglx
^ permalink raw reply [flat|nested] 68+ messages in thread
* Re: BUG: GCC-4.4.x changes the function frame on some functions
2009-11-20 0:36 ` Thomas Gleixner
@ 2009-11-20 0:59 ` Linus Torvalds
2009-11-20 1:27 ` Thomas Gleixner
` (2 more replies)
2009-11-20 12:04 ` Andrew Haley
1 sibling, 3 replies; 68+ messages in thread
From: Linus Torvalds @ 2009-11-20 0:59 UTC (permalink / raw)
To: Thomas Gleixner
Cc: Jeff Law, rostedt, H. Peter Anvin, David Daney, Andrew Haley,
Richard Guenther, Ingo Molnar, LKML, Andrew Morton,
Heiko Carstens, feng.tang, Fr??d??ric Weisbecker, Peter Zijlstra,
jakub, gcc
On Fri, 20 Nov 2009, Thomas Gleixner wrote:
>
> While testing various kernel configs we found out that the problem
> comes and goes. Finally I started to compare the gcc command line
> options and after some fiddling it turned out that the following
> minimal deltas change the code generator behaviour:
>
> Bad: -march=pentium-mmx -Wa,-mtune=generic32
> Good: -march=i686 -mtune=generic -Wa,-mtune=generic32
> Good: -march=pentium-mmx -mtune-generic -Wa,-mtune=generic32
>
> I'm not supposed to understand the logic behind that, right ?
Are you sure it's just the compiler flags?
There's another configuration portion: the size of the alignment itself.
That's dependent on L1_CACHE_SHIFT, which in turn is taken from the kernel
config CONFIG_X86_L1_CACHE_SHIFT.
Maybe that value matters too - for example maybe gcc will not try to align
the stack if it's big?
[ Btw, looking at that, why are X86_L1_CACHE_BYTES and X86_L1_CACHE_SHIFT
totally unrelated numbers? Very confusing. ]
The compiler flags we use are tied to some of the same choices that choose
the cache shift, so the correlation you found while debugging this would
still hold.
Linus
^ permalink raw reply [flat|nested] 68+ messages in thread
* Re: BUG: GCC-4.4.x changes the function frame on some functions
2009-11-20 0:59 ` Linus Torvalds
@ 2009-11-20 1:27 ` Thomas Gleixner
2009-11-20 2:14 ` Thomas Gleixner
2009-11-20 13:09 ` [tip:x86/urgent] x86: Prevent GCC 4.4.x (pentium-mmx et al) function prologue wreckage tip-bot for Thomas Gleixner
2009-11-20 1:29 ` BUG: GCC-4.4.x changes the function frame on some functions H. Peter Anvin
2009-11-20 5:36 ` Ingo Molnar
2 siblings, 2 replies; 68+ messages in thread
From: Thomas Gleixner @ 2009-11-20 1:27 UTC (permalink / raw)
To: Linus Torvalds
Cc: Jeff Law, rostedt, H. Peter Anvin, David Daney, Andrew Haley,
Richard Guenther, Ingo Molnar, LKML, Andrew Morton,
Heiko Carstens, feng.tang, Fr??d??ric Weisbecker, Peter Zijlstra,
jakub, gcc
On Thu, 19 Nov 2009, Linus Torvalds wrote:
> On Fri, 20 Nov 2009, Thomas Gleixner wrote:
> >
> > While testing various kernel configs we found out that the problem
> > comes and goes. Finally I started to compare the gcc command line
> > options and after some fiddling it turned out that the following
> > minimal deltas change the code generator behaviour:
> >
> > Bad: -march=pentium-mmx -Wa,-mtune=generic32
> > Good: -march=i686 -mtune=generic -Wa,-mtune=generic32
> > Good: -march=pentium-mmx -mtune-generic -Wa,-mtune=generic32
> >
> > I'm not supposed to understand the logic behind that, right ?
>
> Are you sure it's just the compiler flags?
I first captured the command line with V=1 and created a script of
it. Then I changed the -march -mtune options in that script and
compiled just that single file manually w/o changing .config or
invoking the kernel make magic.
The good ones produce:
650: 55 push %ebp
651: 89 e5 mov %esp,%ebp
653: 83 e4 f0 and $0xfffffff0,%esp
The bad one:
000005f0 <timer_stats_update_stats>:
5f0: 57 push %edi
5f1: 8d 7c 24 08 lea 0x8(%esp),%edi
5f5: 83 e4 f0 and $0xfffffff0,%esp
5f8: ff 77 fc pushl -0x4(%edi)
5fb: 55 push %ebp
5fc: 89 e5 mov %esp,%ebp
> There's another configuration portion: the size of the alignment itself.
> That's dependent on L1_CACHE_SHIFT, which in turn is taken from the kernel
> config CONFIG_X86_L1_CACHE_SHIFT.
>
> Maybe that value matters too - for example maybe gcc will not try to align
> the stack if it's big?
That does not change any of the compiler options, but yes it could
have some effect via the various include magics, but all I have seen
so far is linkage.h which should not affect the compiler. And the
manual compile did not change any of this.
> [ Btw, looking at that, why are X86_L1_CACHE_BYTES and X86_L1_CACHE_SHIFT
> totally unrelated numbers? Very confusing. ]
Agreed.
> The compiler flags we use are tied to some of the same choices that choose
> the cache shift, so the correlation you found while debugging this would
> still hold.
Digging further tomorrow when my brain is more awake.
Thanks,
tglx
^ permalink raw reply [flat|nested] 68+ messages in thread
* Re: BUG: GCC-4.4.x changes the function frame on some functions
2009-11-20 0:59 ` Linus Torvalds
2009-11-20 1:27 ` Thomas Gleixner
@ 2009-11-20 1:29 ` H. Peter Anvin
2009-11-20 5:36 ` Ingo Molnar
2 siblings, 0 replies; 68+ messages in thread
From: H. Peter Anvin @ 2009-11-20 1:29 UTC (permalink / raw)
To: Linus Torvalds
Cc: Thomas Gleixner, Jeff Law, rostedt, David Daney, Andrew Haley,
Richard Guenther, Ingo Molnar, LKML, Andrew Morton,
Heiko Carstens, feng.tang, Fr??d??ric Weisbecker, Peter Zijlstra,
jakub, gcc
On 11/19/2009 04:59 PM, Linus Torvalds wrote:
>
> [ Btw, looking at that, why are X86_L1_CACHE_BYTES and X86_L1_CACHE_SHIFT
> totally unrelated numbers? Very confusing. ]
>
Yes, there is another thread to clean up that particular mess; it is
already in -tip:
http://git.kernel.org/tip/350f8f5631922c7848ec4b530c111cb8c2ff7caa
-hpa
^ permalink raw reply [flat|nested] 68+ messages in thread
* Re: BUG: GCC-4.4.x changes the function frame on some functions
2009-11-20 1:27 ` Thomas Gleixner
@ 2009-11-20 2:14 ` Thomas Gleixner
2009-11-20 13:09 ` [tip:x86/urgent] x86: Prevent GCC 4.4.x (pentium-mmx et al) function prologue wreckage tip-bot for Thomas Gleixner
1 sibling, 0 replies; 68+ messages in thread
From: Thomas Gleixner @ 2009-11-20 2:14 UTC (permalink / raw)
To: Linus Torvalds
Cc: Jeff Law, rostedt, H. Peter Anvin, David Daney, Andrew Haley,
Richard Guenther, Ingo Molnar, LKML, Andrew Morton,
Heiko Carstens, feng.tang, Fr??d??ric Weisbecker, Peter Zijlstra,
jakub, gcc
On Fri, 20 Nov 2009, Thomas Gleixner wrote:
> On Thu, 19 Nov 2009, Linus Torvalds wrote:
> > On Fri, 20 Nov 2009, Thomas Gleixner wrote:
> > >
> > > While testing various kernel configs we found out that the problem
> > > comes and goes. Finally I started to compare the gcc command line
> > > options and after some fiddling it turned out that the following
> > > minimal deltas change the code generator behaviour:
> > >
> > > Bad: -march=pentium-mmx -Wa,-mtune=generic32
> > > Good: -march=i686 -mtune=generic -Wa,-mtune=generic32
> > > Good: -march=pentium-mmx -mtune-generic -Wa,-mtune=generic32
Found some more:
Bad: -march=k6 -Wa,-mtune=generic32
Bad: -march=geode -Wa,-mtune=generic32
Bad: -march=c3 -Wa,-mtune=generic32
That seems every thing which has MMX support but no SSE and is somehow
compatible to the pentium-mmx.
Looks like the code generator optimization for those was done after
consuming the secret gcc-shrooms.
Thanks,
tglx
^ permalink raw reply [flat|nested] 68+ messages in thread
* Re: BUG: GCC-4.4.x changes the function frame on some functions
2009-11-20 0:59 ` Linus Torvalds
2009-11-20 1:27 ` Thomas Gleixner
2009-11-20 1:29 ` BUG: GCC-4.4.x changes the function frame on some functions H. Peter Anvin
@ 2009-11-20 5:36 ` Ingo Molnar
2 siblings, 0 replies; 68+ messages in thread
From: Ingo Molnar @ 2009-11-20 5:36 UTC (permalink / raw)
To: Linus Torvalds
Cc: Thomas Gleixner, Jeff Law, rostedt, H. Peter Anvin, David Daney,
Andrew Haley, Richard Guenther, LKML, Andrew Morton,
Heiko Carstens, feng.tang, Fr??d??ric Weisbecker, Peter Zijlstra,
jakub, gcc
* Linus Torvalds <torvalds@linux-foundation.org> wrote:
> [ Btw, looking at that, why are X86_L1_CACHE_BYTES and X86_L1_CACHE_SHIFT
> totally unrelated numbers? Very confusing. ]
incidentally (or maybe not so incidentally) that got fixed yesterday in
-tip - at around the time i triggered that crash:
350f8f5: x86: Eliminate redundant/contradicting cache line size config options
See the full commit below. The config that triggered the crash for me
has:
CONFIG_X86_L1_CACHE_SHIFT=4
so it's 16 bytes - and it's consistent now, which is a new angle. So i
think this explains why it stayed dormant for such a long time - it was
hidden by the cacheline-size config value inconsistencies.
Ingo
----------------->
>From 350f8f5631922c7848ec4b530c111cb8c2ff7caa Mon Sep 17 00:00:00 2001
From: Jan Beulich <JBeulich@novell.com>
Date: Fri, 13 Nov 2009 11:54:40 +0000
Subject: [PATCH] x86: Eliminate redundant/contradicting cache line size config options
Rather than having X86_L1_CACHE_BYTES and X86_L1_CACHE_SHIFT
(with inconsistent defaults), just having the latter suffices as
the former can be easily calculated from it.
To be consistent, also change X86_INTERNODE_CACHE_BYTES to
X86_INTERNODE_CACHE_SHIFT, and set it to 7 (128 bytes) for NUMA
to account for last level cache line size (which here matters
more than L1 cache line size).
Finally, make sure the default value for X86_L1_CACHE_SHIFT,
when X86_GENERIC is selected, is being seen before that for the
individual CPU model options (other than on x86-64, where
GENERIC_CPU is part of the choice construct, X86_GENERIC is a
separate option on ix86).
Signed-off-by: Jan Beulich <jbeulich@novell.com>
Acked-by: Ravikiran Thirumalai <kiran@scalex86.org>
Acked-by: Nick Piggin <npiggin@suse.de>
LKML-Reference: <4AFD5710020000780001F8F0@vpn.id2.novell.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
arch/x86/Kconfig.cpu | 14 +++++---------
arch/x86/boot/compressed/vmlinux.lds.S | 3 ++-
arch/x86/include/asm/cache.h | 7 ++++---
arch/x86/kernel/vmlinux.lds.S | 10 +++++-----
arch/x86/mm/tlb.c | 3 ++-
5 files changed, 18 insertions(+), 19 deletions(-)
diff --git a/arch/x86/Kconfig.cpu b/arch/x86/Kconfig.cpu
index f2824fb..621f2bd 100644
--- a/arch/x86/Kconfig.cpu
+++ b/arch/x86/Kconfig.cpu
@@ -301,15 +301,11 @@ config X86_CPU
#
# Define implied options from the CPU selection here
-config X86_L1_CACHE_BYTES
+config X86_INTERNODE_CACHE_SHIFT
int
- default "128" if MPSC
- default "64" if GENERIC_CPU || MK8 || MCORE2 || MATOM || X86_32
-
-config X86_INTERNODE_CACHE_BYTES
- int
- default "4096" if X86_VSMP
- default X86_L1_CACHE_BYTES if !X86_VSMP
+ default "12" if X86_VSMP
+ default "7" if NUMA
+ default X86_L1_CACHE_SHIFT
config X86_CMPXCHG
def_bool X86_64 || (X86_32 && !M386)
@@ -317,9 +313,9 @@ config X86_CMPXCHG
config X86_L1_CACHE_SHIFT
int
default "7" if MPENTIUM4 || MPSC
+ default "6" if MK7 || MK8 || MPENTIUMM || MCORE2 || MATOM || MVIAC7 || X86_GENERIC || GENERIC_CPU
default "4" if X86_ELAN || M486 || M386 || MGEODEGX1
default "5" if MWINCHIP3D || MWINCHIPC6 || MCRUSOE || MEFFICEON || MCYRIXIII || MK6 || MPENTIUMIII || MPENTIUMII || M686 || M586MMX || M586TSC || M586 || MVIAC3_2 || MGEODE_LX
- default "6" if MK7 || MK8 || MPENTIUMM || MCORE2 || MATOM || MVIAC7 || X86_GENERIC || GENERIC_CPU
config X86_XADD
def_bool y
diff --git a/arch/x86/boot/compressed/vmlinux.lds.S b/arch/x86/boot/compressed/vmlinux.lds.S
index f4193bb..a6f1a59 100644
--- a/arch/x86/boot/compressed/vmlinux.lds.S
+++ b/arch/x86/boot/compressed/vmlinux.lds.S
@@ -4,6 +4,7 @@ OUTPUT_FORMAT(CONFIG_OUTPUT_FORMAT, CONFIG_OUTPUT_FORMAT, CONFIG_OUTPUT_FORMAT)
#undef i386
+#include <asm/cache.h>
#include <asm/page_types.h>
#ifdef CONFIG_X86_64
@@ -46,7 +47,7 @@ SECTIONS
*(.data.*)
_edata = . ;
}
- . = ALIGN(CONFIG_X86_L1_CACHE_BYTES);
+ . = ALIGN(L1_CACHE_BYTES);
.bss : {
_bss = . ;
*(.bss)
diff --git a/arch/x86/include/asm/cache.h b/arch/x86/include/asm/cache.h
index 549860d..2f9047c 100644
--- a/arch/x86/include/asm/cache.h
+++ b/arch/x86/include/asm/cache.h
@@ -9,12 +9,13 @@
#define __read_mostly __attribute__((__section__(".data.read_mostly")))
+#define INTERNODE_CACHE_SHIFT CONFIG_X86_INTERNODE_CACHE_SHIFT
+#define INTERNODE_CACHE_BYTES (1 << INTERNODE_CACHE_SHIFT)
+
#ifdef CONFIG_X86_VSMP
-/* vSMP Internode cacheline shift */
-#define INTERNODE_CACHE_SHIFT (12)
#ifdef CONFIG_SMP
#define __cacheline_aligned_in_smp \
- __attribute__((__aligned__(1 << (INTERNODE_CACHE_SHIFT)))) \
+ __attribute__((__aligned__(INTERNODE_CACHE_BYTES))) \
__page_aligned_data
#endif
#endif
diff --git a/arch/x86/kernel/vmlinux.lds.S b/arch/x86/kernel/vmlinux.lds.S
index fd2dabe..eeb4f5f 100644
--- a/arch/x86/kernel/vmlinux.lds.S
+++ b/arch/x86/kernel/vmlinux.lds.S
@@ -135,13 +135,13 @@ SECTIONS
PAGE_ALIGNED_DATA(PAGE_SIZE)
- CACHELINE_ALIGNED_DATA(CONFIG_X86_L1_CACHE_BYTES)
+ CACHELINE_ALIGNED_DATA(L1_CACHE_BYTES)
DATA_DATA
CONSTRUCTORS
/* rarely changed data like cpu maps */
- READ_MOSTLY_DATA(CONFIG_X86_INTERNODE_CACHE_BYTES)
+ READ_MOSTLY_DATA(INTERNODE_CACHE_BYTES)
/* End of data section */
_edata = .;
@@ -165,12 +165,12 @@ SECTIONS
*(.vsyscall_0)
} :user
- . = ALIGN(CONFIG_X86_L1_CACHE_BYTES);
+ . = ALIGN(L1_CACHE_BYTES);
.vsyscall_fn : AT(VLOAD(.vsyscall_fn)) {
*(.vsyscall_fn)
}
- . = ALIGN(CONFIG_X86_L1_CACHE_BYTES);
+ . = ALIGN(L1_CACHE_BYTES);
.vsyscall_gtod_data : AT(VLOAD(.vsyscall_gtod_data)) {
*(.vsyscall_gtod_data)
}
@@ -194,7 +194,7 @@ SECTIONS
}
vgetcpu_mode = VVIRT(.vgetcpu_mode);
- . = ALIGN(CONFIG_X86_L1_CACHE_BYTES);
+ . = ALIGN(L1_CACHE_BYTES);
.jiffies : AT(VLOAD(.jiffies)) {
*(.jiffies)
}
diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c
index 36fe08e..65b58e4 100644
--- a/arch/x86/mm/tlb.c
+++ b/arch/x86/mm/tlb.c
@@ -8,6 +8,7 @@
#include <asm/tlbflush.h>
#include <asm/mmu_context.h>
+#include <asm/cache.h>
#include <asm/apic.h>
#include <asm/uv/uv.h>
@@ -43,7 +44,7 @@ union smp_flush_state {
spinlock_t tlbstate_lock;
DECLARE_BITMAP(flush_cpumask, NR_CPUS);
};
- char pad[CONFIG_X86_INTERNODE_CACHE_BYTES];
+ char pad[INTERNODE_CACHE_BYTES];
} ____cacheline_internodealigned_in_smp;
/* State is put into the per CPU data section, but padded
^ permalink raw reply related [flat|nested] 68+ messages in thread
* Re: BUG: GCC-4.4.x changes the function frame on some functions
2009-11-20 0:36 ` Thomas Gleixner
2009-11-20 0:59 ` Linus Torvalds
@ 2009-11-20 12:04 ` Andrew Haley
2009-11-20 12:22 ` Andrew Haley
1 sibling, 1 reply; 68+ messages in thread
From: Andrew Haley @ 2009-11-20 12:04 UTC (permalink / raw)
To: Thomas Gleixner
Cc: Jeff Law, rostedt, H. Peter Anvin, David Daney, Linus Torvalds,
Richard Guenther, Ingo Molnar, LKML, Andrew Morton,
Heiko Carstens, feng.tang, Fr??d??ric Weisbecker, Peter Zijlstra,
jakub, gcc
Thomas Gleixner wrote:
> While testing various kernel configs we found out that the problem
> comes and goes. Finally I started to compare the gcc command line
> options and after some fiddling it turned out that the following
> minimal deltas change the code generator behaviour:
>
> Bad: -march=pentium-mmx -Wa,-mtune=generic32
> Good: -march=i686 -mtune=generic -Wa,-mtune=generic32
> Good: -march=pentium-mmx -mtune-generic -Wa,-mtune=generic32
>
> I'm not supposed to understand the logic behind that, right ?
I don't either. I'm seeing:
timer_stats_update_stats: timer_stats_update_stats:
pushl %edi <
leal 8(%esp), %edi <
andl $-16, %esp <
pushl -4(%edi) <
pushl %ebp pushl %ebp
movl %esp, %ebp movl %esp, %ebp
pushl %edi | andl $-16, %esp
pushl %esi | subl $112, %esp
pushl %ebx | movl %ebx, 100(%esp)
subl $108, %esp | movl %esi, 104(%esp)
> movl %edi, 108(%esp)
call mcount call mcount
where the only difference is -mtune=generic. I'm investigating.
Andrew.
^ permalink raw reply [flat|nested] 68+ messages in thread
* Re: BUG: GCC-4.4.x changes the function frame on some functions
2009-11-20 12:04 ` Andrew Haley
@ 2009-11-20 12:22 ` Andrew Haley
0 siblings, 0 replies; 68+ messages in thread
From: Andrew Haley @ 2009-11-20 12:22 UTC (permalink / raw)
To: Thomas Gleixner
Cc: Jeff Law, rostedt, H. Peter Anvin, David Daney, Linus Torvalds,
Richard Guenther, Ingo Molnar, LKML, Andrew Morton,
Heiko Carstens, feng.tang, Fr??d??ric Weisbecker, Peter Zijlstra,
jakub, gcc
Andrew Haley wrote:
> Thomas Gleixner wrote:
>
>> While testing various kernel configs we found out that the problem
>> comes and goes. Finally I started to compare the gcc command line
>> options and after some fiddling it turned out that the following
>> minimal deltas change the code generator behaviour:
>>
>> Bad: -march=pentium-mmx -Wa,-mtune=generic32
>> Good: -march=i686 -mtune=generic -Wa,-mtune=generic32
>> Good: -march=pentium-mmx -mtune-generic -Wa,-mtune=generic32
>>
>> I'm not supposed to understand the logic behind that, right ?
>
> I don't either. I'm seeing:
>
> timer_stats_update_stats: timer_stats_update_stats:
> pushl %edi <
> leal 8(%esp), %edi <
> andl $-16, %esp <
> pushl -4(%edi) <
> pushl %ebp pushl %ebp
> movl %esp, %ebp movl %esp, %ebp
> pushl %edi | andl $-16, %esp
> pushl %esi | subl $112, %esp
> pushl %ebx | movl %ebx, 100(%esp)
> subl $108, %esp | movl %esi, 104(%esp)
> > movl %edi, 108(%esp)
> call mcount call mcount
>
> where the only difference is -mtune=generic. I'm investigating.
Forget that, I see from the gcc-bugs list that hj has tracked it down to
the use of DRAP, and for some reason the mtune options affect that. He's
the best person to fix this.
Andrew.
^ permalink raw reply [flat|nested] 68+ messages in thread
* [tip:x86/urgent] x86: Prevent GCC 4.4.x (pentium-mmx et al) function prologue wreckage
2009-11-20 1:27 ` Thomas Gleixner
2009-11-20 2:14 ` Thomas Gleixner
@ 2009-11-20 13:09 ` tip-bot for Thomas Gleixner
1 sibling, 0 replies; 68+ messages in thread
From: tip-bot for Thomas Gleixner @ 2009-11-20 13:09 UTC (permalink / raw)
To: linux-tip-commits
Cc: linux-kernel, hpa, mingo, aph, torvalds, peterz,
richard.guenther, law, akpm, rostedt, ddaney, tglx, mingo
Commit-ID: 746357d6a526d6da9d89a2ec645b28406e959c2e
Gitweb: http://git.kernel.org/tip/746357d6a526d6da9d89a2ec645b28406e959c2e
Author: Thomas Gleixner <tglx@linutronix.de>
AuthorDate: Fri, 20 Nov 2009 12:01:43 +0100
Committer: Thomas Gleixner <tglx@linutronix.de>
CommitDate: Fri, 20 Nov 2009 14:06:46 +0100
x86: Prevent GCC 4.4.x (pentium-mmx et al) function prologue wreckage
When the kernel is compiled with -pg for tracing GCC 4.4.x inserts
stack alignment of a function _before_ the mcount prologue if the
-march=pentium-mmx is set and -mtune=generic is not set. This breaks
the assumption of the function graph tracer which expects that the
mcount prologue
push %ebp
mov %esp, %ebp
is the first stack operation in a function because it needs to modify
the function return address on the stack to trap into the tracer
before returning to the real caller.
The generated code is:
push %edi
lea 0x8(%esp),%edi
and $0xfffffff0,%esp
pushl -0x4(%edi)
push %ebp
mov %esp,%ebp
so the tracer modifies the copy of the return address which is stored
after the stack alignment and therefor does not trap the return which
in turn breaks the call chain logic of the tracer and leads to a
kernel panic.
Aside of the fact that the generated code is horrible for no good
reason other -march -mtune options generate the expected:
push %ebp
mov %esp,%ebp
and $0xfffffff0,%esp
which does the same and keeps everything intact.
After some experimenting we found out that this problem is restricted
to gcc4.4.x and to the following -march settings:
i586, pentium, pentium-mmx, k6, k6-2, k6-3, winchip-c6, winchip2, c3,
geode
By adding -mtune=generic the code generator produces always the
expected code.
So forcing -mtune=generic when CONFIG_FUNCTION_GRAPH_TRACER=y is not
pretty, but at the moment the only way to prevent that the kernel
trips over gcc-shrooms induced code madness.
Most distro kernels have CONFIG_X86_GENERIC=y anyway which forces
-mtune=generic as well so it will not impact those.
References: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42109
http://lkml.org/lkml/2009/11/19/17
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
LKML-Reference: <alpine.LFD.2.00.0911200206570.24119@localhost.localdomain>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>,
Cc: Jeff Law <law@redhat.com>
Cc: gcc@gcc.gnu.org
Cc: David Daney <ddaney@caviumnetworks.com>
Cc: Andrew Haley <aph@redhat.com>
Cc: Richard Guenther <richard.guenther@gmail.com>
Cc: stable@kernel.org
---
arch/x86/Makefile_32.cpu | 6 ++++++
1 files changed, 6 insertions(+), 0 deletions(-)
diff --git a/arch/x86/Makefile_32.cpu b/arch/x86/Makefile_32.cpu
index 30e9a26..df7fdf8 100644
--- a/arch/x86/Makefile_32.cpu
+++ b/arch/x86/Makefile_32.cpu
@@ -46,6 +46,12 @@ cflags-$(CONFIG_MGEODEGX1) += -march=pentium-mmx
# cpu entries
cflags-$(CONFIG_X86_GENERIC) += $(call tune,generic,$(call tune,i686))
+# Work around the pentium-mmx code generator madness of gcc4.4.x which
+# does stack alignment by generating horrible code _before_ the mcount
+# prologue (push %ebp, mov %esp, %ebp) which breaks the function graph
+# tracer assumptions
+cflags-$(CONFIG_FUNCTION_GRAPH_TRACER) += $(call cc-option,-mtune=generic)
+
# Bug fix for binutils: this option is required in order to keep
# binutils from generating NOPL instructions against our will.
ifneq ($(CONFIG_X86_P6_NOP),y)
^ permalink raw reply related [flat|nested] 68+ messages in thread
* Re: BUG: GCC-4.4.x changes the function frame on some functions
2009-11-23 9:16 ` Jakub Jelinek
@ 2009-11-23 9:51 ` Thomas Gleixner
0 siblings, 0 replies; 68+ messages in thread
From: Thomas Gleixner @ 2009-11-23 9:51 UTC (permalink / raw)
To: Jakub Jelinek
Cc: Linus Torvalds, Andrew Haley, Richard Guenther, rostedt,
Ingo Molnar, H. Peter Anvin, LKML, Andrew Morton, Heiko Carstens,
feng.tang, Fr??d??ric Weisbecker, Peter Zijlstra, gcc
On Mon, 23 Nov 2009, Jakub Jelinek wrote:
> On Thu, Nov 19, 2009 at 08:01:57PM +0100, Thomas Gleixner wrote:
> > Just compiled with -mincoming-stack-boundary=4 and the problem goes
> > away as gcc now thinks that the incoming stack is already 16 byte
> > aligned. But that might break code which actually uses SSE
>
> Please don't do this, lying to the compiler is just going to result in
> wrong-code sooner or later, with the above switch gcc will assume the
> incoming stack is 16-byte aligned (which is not true in the ix86 kernel)
> and could very well e.g. optimize away code that looks at
> alignment of stack variables etc.
Right. I gave up the idea pretty fast. But in the current situation we
are forced to lie to the compiler in some way. Forcing -mtune=generic
when the function graph tracer is enabled seems to be a halfways sane
work around.
Thanks,
tglx
^ permalink raw reply [flat|nested] 68+ messages in thread
* Re: BUG: GCC-4.4.x changes the function frame on some functions
2009-11-19 19:01 ` Thomas Gleixner
@ 2009-11-23 9:16 ` Jakub Jelinek
2009-11-23 9:51 ` Thomas Gleixner
0 siblings, 1 reply; 68+ messages in thread
From: Jakub Jelinek @ 2009-11-23 9:16 UTC (permalink / raw)
To: Thomas Gleixner
Cc: Linus Torvalds, Andrew Haley, Richard Guenther, rostedt,
Ingo Molnar, H. Peter Anvin, LKML, Andrew Morton, Heiko Carstens,
feng.tang, Fr??d??ric Weisbecker, Peter Zijlstra, gcc
On Thu, Nov 19, 2009 at 08:01:57PM +0100, Thomas Gleixner wrote:
> Just compiled with -mincoming-stack-boundary=4 and the problem goes
> away as gcc now thinks that the incoming stack is already 16 byte
> aligned. But that might break code which actually uses SSE
Please don't do this, lying to the compiler is just going to result in
wrong-code sooner or later, with the above switch gcc will assume the
incoming stack is 16-byte aligned (which is not true in the ix86 kernel)
and could very well e.g. optimize away code that looks at
alignment of stack variables etc.
Jakub
^ permalink raw reply [flat|nested] 68+ messages in thread
* Re: BUG: GCC-4.4.x changes the function frame on some functions
2009-11-19 20:06 ` Linus Torvalds
@ 2009-11-19 21:12 ` Jeff Law
0 siblings, 0 replies; 68+ messages in thread
From: Jeff Law @ 2009-11-19 21:12 UTC (permalink / raw)
To: Linus Torvalds
Cc: H. Peter Anvin, rostedt, David Daney, Andrew Haley,
Richard Guenther, Thomas Gleixner, Ingo Molnar, LKML,
Andrew Morton, Heiko Carstens, feng.tang, Fr??d??ric Weisbecker,
Peter Zijlstra, jakub, gcc
On 11/19/09 13:06, Linus Torvalds wrote:
>
> On Thu, 19 Nov 2009, H. Peter Anvin wrote:
>
>> Calling the profiler immediately at the entry point is clearly the more
>> sane option. It means the ABI is well-defined, stable, and independent
>> of what the actual function contents are. It means that ABI isn't the
>> normal C ABI (the __fentry__ function would have to preserve all
>> registers), but that's fine...
>>
> As far as I know, that's true of _mcount already: it's not a normal ABI
> and is rather a highly architecture-specific special case to begin with.
> At least ARM has some (several?) special mcount calling conventions,
> afaik.
>
Correct. _mcount's ABI typically has been defined by the implementation
of the vendor's C library mcount.
GCC has options to emit the profiling code prior to or after the
prologue controllable through the usual variety of target macros &
hooks. I can't imagine anyone would object to a clean, tested patch to
change how x86-linux's profiling code was implemented.
jeff
^ permalink raw reply [flat|nested] 68+ messages in thread
* Re: BUG: GCC-4.4.x changes the function frame on some functions
2009-11-19 19:50 ` H. Peter Anvin
2009-11-19 20:06 ` Linus Torvalds
2009-11-19 20:10 ` Steven Rostedt
@ 2009-11-19 21:05 ` Jeff Law
2 siblings, 0 replies; 68+ messages in thread
From: Jeff Law @ 2009-11-19 21:05 UTC (permalink / raw)
To: H. Peter Anvin
Cc: rostedt, David Daney, Linus Torvalds, Andrew Haley,
Richard Guenther, Thomas Gleixner, Ingo Molnar, LKML,
Andrew Morton, Heiko Carstens, feng.tang, Fr??d??ric Weisbecker,
Peter Zijlstra, jakub, gcc
On 11/19/09 12:50, H. Peter Anvin wrote:
>
> Calling the profiler immediately at the entry point is clearly the more
> sane option. It means the ABI is well-defined, stable, and independent
> of what the actual function contents are. It means that ABI isn't the
> normal C ABI (the __fentry__ function would have to preserve all
> registers), but that's fine...
>
Note there are targets (even some old x86 variants) that required the
profiling calls to occur after the prologue. Unfortunately, nobody
documented *why* that was the case. Sigh.
Jeff
^ permalink raw reply [flat|nested] 68+ messages in thread
* Re: BUG: GCC-4.4.x changes the function frame on some functions
@ 2009-11-19 20:48 H. Peter Anvin
0 siblings, 0 replies; 68+ messages in thread
From: H. Peter Anvin @ 2009-11-19 20:48 UTC (permalink / raw)
To: Frederic Weisbecker, Steven Rostedt
Cc: David Daney, Linus Torvalds, Andrew Haley, Richard Guenther,
Thomas Gleixner, Ingo Molnar, LKML, Andrew Morton,
Heiko Carstens, feng.tang, Peter Zijlstra, jakub, gcc
[-- Attachment #1: Type: text/plain, Size: 2814 bytes --]
On i386, if we call __fentry__ immediately on entry the return address will be in 4(%esp), so I fail to see how you could not reliably have the return address. Other arches would have different constraints, of course.
"Frederic Weisbecker" <fweisbec@gmail.com> wrote:
>On Thu, Nov 19, 2009 at 03:05:41PM -0500, Steven Rostedt wrote:
>> On Thu, 2009-11-19 at 20:46 +0100, Frederic Weisbecker wrote:
>> > On Thu, Nov 19, 2009 at 02:28:06PM -0500, Steven Rostedt wrote:
>>
>> > > <function>:
>> > > call __fentry__
>> > > [...]
>> > >
>> > >
>> > > -- Steve
>> >
>> >
>> > I would really like this. So that we can forget about other possible
>> > further suprises due to sophisticated function prologues beeing before
>> > the mcount call.
>> >
>> > And I guess that would fix it in every archs.
>>
>> Well, other archs use a register to store the return address. But it
>> would also be easy to do (pseudo arch assembly):
>>
>> <function>:
>> mov lr, (%sp)
>> add 8, %sp
>> blr __fentry__
>> sub 8, %sp
>> mov (%sp), lr
>>
>>
>> That way the lr would have the current function, and the parent would
>> still be at 8(%sp)
>>
>
>
>Yeah right, we need at least such very tiny prologue for
>archs that store return addresses in a reg.
>
>
>> >
>> > That said, Linus had a good point about the fact there might other uses
>> > of mcount even more tricky than what does the function graph tracer,
>> > outside the kernel, and those may depend on the strict ABI assumption
>> > that 4(ebp) is always the _real_ return address, and that through all
>> > the previous stack call. This is even a concern that extrapolates the
>> > single mcount case.
>>
>> As I am proposing a new call. This means that mcount stay as is for
>> legacy reasons. Yes I know there exists the -finstrument-functions but
>> that adds way too much bloat to the code. One single call to the
>> profiler is all I want.
>
>
>Sure, the purpose is not to change the existing -mcount thing.
>What I meant is that we could have -mcount and -real-ra-before-fp
>at the same time to guarantee fp + 4 is really what we want while
>using -mcount.
>
>The __fentry__ idea is more neat, but the guarantee of a real pointer
>to the return address is still something that lacks.
>
>
>> >
>> > So I wonder that actually the real problem is the lack of something that
>> > could provide this guarantee. We may need a -real-ra-before-fp (yeah
>> > I suck in naming).
>>
>> Don't worry, so do the C compiler folks, I mean, come on "mcount"?
>
>
>I guess it has been first created for the single purpose of counting
>specific functions but then it has been used for wider, unpredicted uses :)
>
--
Sent from my mobile phone. Please excuse any lack of formatting.
^ permalink raw reply [flat|nested] 68+ messages in thread
* Re: BUG: GCC-4.4.x changes the function frame on some functions
2009-11-19 20:36 ` Linus Torvalds
@ 2009-11-19 20:44 ` Steven Rostedt
0 siblings, 0 replies; 68+ messages in thread
From: Steven Rostedt @ 2009-11-19 20:44 UTC (permalink / raw)
To: Linus Torvalds
Cc: Frederic Weisbecker, David Daney, Andrew Haley, Richard Guenther,
Thomas Gleixner, Ingo Molnar, H. Peter Anvin, LKML,
Andrew Morton, Heiko Carstens, feng.tang, Peter Zijlstra, jakub,
gcc
On Thu, 2009-11-19 at 12:36 -0800, Linus Torvalds wrote:
>
> On Thu, 19 Nov 2009, Frederic Weisbecker wrote:
> >
> > > That way the lr would have the current function, and the parent would
> > > still be at 8(%sp)
> >
> > Yeah right, we need at least such very tiny prologue for
> > archs that store return addresses in a reg.
>
> Well, it will be architecture-dependent.
Totally agree, as mcount today is architecture dependent.
>
> For example, alpha can store the return value in _any_ register if I
> recall correctly, so you can do the "call to __fentry__" by just picking
> another register than the default one as the return address.
>
> And powerpc has two special registers: link and ctr, but iirc you can only
> load 'link' with a branch instruction. Which means that you could do
> something like
>
> mflr 0
> bl __fentry__
>
> in the caller (I forget if R0 is actually a call-trashed register or not),
> and then __fentry__ could do something like
>
> mflr 12 # save _new_ link
> mtlr 0 # restore original link
> mtctr 12 # move __fentry__ link to ctr
> .. do whatever ..
> bctr # return to __fentry__ caller
>
> to return with 'link' restored (but ctr and r0/r12 trashed - I don't
> recall the ppc calling conventions any more, but I think that is ok).
>
> Saving to stack seems unnecessary and pointless.
I was just using an example. But as you pointed out, each arch can find
its best way to handle it. Having the profiler called at the beginning
of the function is what I feel is the best.
We also get access to the function's parameters :-)
-- Steve
^ permalink raw reply [flat|nested] 68+ messages in thread
* Re: BUG: GCC-4.4.x changes the function frame on some functions
2009-11-19 18:38 ` Linus Torvalds
2009-11-19 18:47 ` Ingo Molnar
@ 2009-11-19 20:36 ` Thomas Gleixner
1 sibling, 0 replies; 68+ messages in thread
From: Thomas Gleixner @ 2009-11-19 20:36 UTC (permalink / raw)
To: Linus Torvalds
Cc: Richard Guenther, rostedt, Ingo Molnar, H. Peter Anvin, LKML,
Andrew Morton, Heiko Carstens, feng.tang, Fr??d??ric Weisbecker,
Peter Zijlstra, jakub, gcc
On Thu, 19 Nov 2009, Linus Torvalds wrote:
> On Thu, 19 Nov 2009, Richard Guenther wrote:
> >
> > Note that I only can reproduce the issue with
> > -mincoming-stack-boundary=2, not with -mpreferred-stack-boundary=2.
>
> Since you can reproduce it with -mincoming-stack-boundary=2, I woul
> suggest just fixing mcount handling that way regardless of anything else.
> The current code generated by gcc is just insane - even for the case where
> you _want_ 16-byte stack alignment.
>
> Instead crazy code like
>
> > push %edi
> > lea 0x8(%esp),%edi
> > and $0xfffffff0,%esp
> > pushl -0x4(%edi)
> > push %ebp
> > mov %esp,%ebp
> > ...
> > call mcount
>
> the sane thing to do would be to just do it as
>
> push %ebp
> mov %esp,%ebp
> call mcount
> and $0xfffffff0,%esp
which is what the 64bit compile does except that the mcount call
happens a bit later which is fine.
ffffffff8107cd34 <timer_stats_update_stats>:
ffffffff8107cd34: 55 push %rbp
ffffffff8107cd35: 48 89 e5 mov %rsp,%rbp
ffffffff8107cd38: 48 83 e4 c0 and $0xffffffffffffffc0,%rsp
Thanks,
tglx
^ permalink raw reply [flat|nested] 68+ messages in thread
* Re: BUG: GCC-4.4.x changes the function frame on some functions
2009-11-19 20:25 ` Frederic Weisbecker
@ 2009-11-19 20:36 ` Linus Torvalds
2009-11-19 20:44 ` Steven Rostedt
0 siblings, 1 reply; 68+ messages in thread
From: Linus Torvalds @ 2009-11-19 20:36 UTC (permalink / raw)
To: Frederic Weisbecker
Cc: Steven Rostedt, David Daney, Andrew Haley, Richard Guenther,
Thomas Gleixner, Ingo Molnar, H. Peter Anvin, LKML,
Andrew Morton, Heiko Carstens, feng.tang, Peter Zijlstra, jakub,
gcc
On Thu, 19 Nov 2009, Frederic Weisbecker wrote:
>
> > That way the lr would have the current function, and the parent would
> > still be at 8(%sp)
>
> Yeah right, we need at least such very tiny prologue for
> archs that store return addresses in a reg.
Well, it will be architecture-dependent.
For example, alpha can store the return value in _any_ register if I
recall correctly, so you can do the "call to __fentry__" by just picking
another register than the default one as the return address.
And powerpc has two special registers: link and ctr, but iirc you can only
load 'link' with a branch instruction. Which means that you could do
something like
mflr 0
bl __fentry__
in the caller (I forget if R0 is actually a call-trashed register or not),
and then __fentry__ could do something like
mflr 12 # save _new_ link
mtlr 0 # restore original link
mtctr 12 # move __fentry__ link to ctr
.. do whatever ..
bctr # return to __fentry__ caller
to return with 'link' restored (but ctr and r0/r12 trashed - I don't
recall the ppc calling conventions any more, but I think that is ok).
Saving to stack seems unnecessary and pointless.
Linus
^ permalink raw reply [flat|nested] 68+ messages in thread
* Re: BUG: GCC-4.4.x changes the function frame on some functions
2009-11-19 20:17 ` Steven Rostedt
@ 2009-11-19 20:28 ` Frederic Weisbecker
0 siblings, 0 replies; 68+ messages in thread
From: Frederic Weisbecker @ 2009-11-19 20:28 UTC (permalink / raw)
To: Steven Rostedt
Cc: David Daney, Linus Torvalds, Andrew Haley, Richard Guenther,
Thomas Gleixner, Ingo Molnar, H. Peter Anvin, LKML,
Andrew Morton, Heiko Carstens, feng.tang, Peter Zijlstra, jakub,
gcc
On Thu, Nov 19, 2009 at 03:17:16PM -0500, Steven Rostedt wrote:
> On Thu, 2009-11-19 at 15:05 -0500, Steven Rostedt wrote:
>
> > Well, other archs use a register to store the return address. But it
> > would also be easy to do (pseudo arch assembly):
> >
> > <function>:
> > mov lr, (%sp)
> > add 8, %sp
> > blr __fentry__
>
> Should be bl __fentry__ for "branch and link".
>
> > sub 8, %sp
> > mov (%sp), lr
> >
> >
> > That way the lr would have the current function, and the parent would
> > still be at 8(%sp)
>
> Actually, if we add a new profiler and can make our own specification, I
> would say that the add and sub lines be the responsibility of
> __fentry__. Then we would have:
>
> <function>:
> mov lr, (%sp)
> bl __fentry__
> mov (%sp), lr
>
> If sp points to the current content, then replace (%sp) above with
> -8(%sp). Then the implementation of a nop __fentry__ would simply be:
>
> __fentry__:
> blr
Good point!
> For anything more elaborate, __fentry__ would be responsible for all
> adjustments.
Yep. The more we control it from __fentry__, the less we fall
down into unexpected surprises.
^ permalink raw reply [flat|nested] 68+ messages in thread
* Re: BUG: GCC-4.4.x changes the function frame on some functions
2009-11-19 20:05 ` Steven Rostedt
2009-11-19 20:17 ` Steven Rostedt
@ 2009-11-19 20:25 ` Frederic Weisbecker
2009-11-19 20:36 ` Linus Torvalds
1 sibling, 1 reply; 68+ messages in thread
From: Frederic Weisbecker @ 2009-11-19 20:25 UTC (permalink / raw)
To: Steven Rostedt
Cc: David Daney, Linus Torvalds, Andrew Haley, Richard Guenther,
Thomas Gleixner, Ingo Molnar, H. Peter Anvin, LKML,
Andrew Morton, Heiko Carstens, feng.tang, Peter Zijlstra, jakub,
gcc
On Thu, Nov 19, 2009 at 03:05:41PM -0500, Steven Rostedt wrote:
> On Thu, 2009-11-19 at 20:46 +0100, Frederic Weisbecker wrote:
> > On Thu, Nov 19, 2009 at 02:28:06PM -0500, Steven Rostedt wrote:
>
> > > <function>:
> > > call __fentry__
> > > [...]
> > >
> > >
> > > -- Steve
> >
> >
> > I would really like this. So that we can forget about other possible
> > further suprises due to sophisticated function prologues beeing before
> > the mcount call.
> >
> > And I guess that would fix it in every archs.
>
> Well, other archs use a register to store the return address. But it
> would also be easy to do (pseudo arch assembly):
>
> <function>:
> mov lr, (%sp)
> add 8, %sp
> blr __fentry__
> sub 8, %sp
> mov (%sp), lr
>
>
> That way the lr would have the current function, and the parent would
> still be at 8(%sp)
>
Yeah right, we need at least such very tiny prologue for
archs that store return addresses in a reg.
> >
> > That said, Linus had a good point about the fact there might other uses
> > of mcount even more tricky than what does the function graph tracer,
> > outside the kernel, and those may depend on the strict ABI assumption
> > that 4(ebp) is always the _real_ return address, and that through all
> > the previous stack call. This is even a concern that extrapolates the
> > single mcount case.
>
> As I am proposing a new call. This means that mcount stay as is for
> legacy reasons. Yes I know there exists the -finstrument-functions but
> that adds way too much bloat to the code. One single call to the
> profiler is all I want.
Sure, the purpose is not to change the existing -mcount thing.
What I meant is that we could have -mcount and -real-ra-before-fp
at the same time to guarantee fp + 4 is really what we want while
using -mcount.
The __fentry__ idea is more neat, but the guarantee of a real pointer
to the return address is still something that lacks.
> >
> > So I wonder that actually the real problem is the lack of something that
> > could provide this guarantee. We may need a -real-ra-before-fp (yeah
> > I suck in naming).
>
> Don't worry, so do the C compiler folks, I mean, come on "mcount"?
I guess it has been first created for the single purpose of counting
specific functions but then it has been used for wider, unpredicted uses :)
^ permalink raw reply [flat|nested] 68+ messages in thread
* Re: BUG: GCC-4.4.x changes the function frame on some functions
2009-11-19 20:05 ` Steven Rostedt
@ 2009-11-19 20:17 ` Steven Rostedt
2009-11-19 20:28 ` Frederic Weisbecker
2009-11-19 20:25 ` Frederic Weisbecker
1 sibling, 1 reply; 68+ messages in thread
From: Steven Rostedt @ 2009-11-19 20:17 UTC (permalink / raw)
To: Frederic Weisbecker
Cc: David Daney, Linus Torvalds, Andrew Haley, Richard Guenther,
Thomas Gleixner, Ingo Molnar, H. Peter Anvin, LKML,
Andrew Morton, Heiko Carstens, feng.tang, Peter Zijlstra, jakub,
gcc
On Thu, 2009-11-19 at 15:05 -0500, Steven Rostedt wrote:
> Well, other archs use a register to store the return address. But it
> would also be easy to do (pseudo arch assembly):
>
> <function>:
> mov lr, (%sp)
> add 8, %sp
> blr __fentry__
Should be bl __fentry__ for "branch and link".
> sub 8, %sp
> mov (%sp), lr
>
>
> That way the lr would have the current function, and the parent would
> still be at 8(%sp)
Actually, if we add a new profiler and can make our own specification, I
would say that the add and sub lines be the responsibility of
__fentry__. Then we would have:
<function>:
mov lr, (%sp)
bl __fentry__
mov (%sp), lr
If sp points to the current content, then replace (%sp) above with
-8(%sp). Then the implementation of a nop __fentry__ would simply be:
__fentry__:
blr
For anything more elaborate, __fentry__ would be responsible for all
adjustments.
-- Steve
^ permalink raw reply [flat|nested] 68+ messages in thread
* Re: BUG: GCC-4.4.x changes the function frame on some functions
2009-11-19 19:50 ` H. Peter Anvin
2009-11-19 20:06 ` Linus Torvalds
@ 2009-11-19 20:10 ` Steven Rostedt
2009-11-19 21:05 ` Jeff Law
2 siblings, 0 replies; 68+ messages in thread
From: Steven Rostedt @ 2009-11-19 20:10 UTC (permalink / raw)
To: H. Peter Anvin
Cc: David Daney, Linus Torvalds, Andrew Haley, Richard Guenther,
Thomas Gleixner, Ingo Molnar, LKML, Andrew Morton,
Heiko Carstens, feng.tang, Fr??d??ric Weisbecker, Peter Zijlstra,
jakub, gcc
On Thu, 2009-11-19 at 11:50 -0800, H. Peter Anvin wrote:
> > Perhaps we could create another profiler? Instead of calling mcount,
> > call a new function: __fentry__ or something. Have it activated with
> > another switch. This could make the performance of the function tracer
> > even better without all these exceptions.
> >
> > <function>:
> > call __fentry__
> > [...]
> >
>
> Calling the profiler immediately at the entry point is clearly the more
> sane option. It means the ABI is well-defined, stable, and independent
> of what the actual function contents are. It means that ABI isn't the
> normal C ABI (the __fentry__ function would have to preserve all
> registers), but that's fine...
mcount already has that requirement (saving all/most regs). Anyway, you
are right, we don't care. The tracer should carry the blunt of the load,
not the individual callers.
-- Steve
^ permalink raw reply [flat|nested] 68+ messages in thread
* Re: BUG: GCC-4.4.x changes the function frame on some functions
2009-11-19 19:50 ` H. Peter Anvin
@ 2009-11-19 20:06 ` Linus Torvalds
2009-11-19 21:12 ` Jeff Law
2009-11-19 20:10 ` Steven Rostedt
2009-11-19 21:05 ` Jeff Law
2 siblings, 1 reply; 68+ messages in thread
From: Linus Torvalds @ 2009-11-19 20:06 UTC (permalink / raw)
To: H. Peter Anvin
Cc: rostedt, David Daney, Andrew Haley, Richard Guenther,
Thomas Gleixner, Ingo Molnar, LKML, Andrew Morton,
Heiko Carstens, feng.tang, Fr??d??ric Weisbecker, Peter Zijlstra,
jakub, gcc
On Thu, 19 Nov 2009, H. Peter Anvin wrote:
>
> Calling the profiler immediately at the entry point is clearly the more
> sane option. It means the ABI is well-defined, stable, and independent
> of what the actual function contents are. It means that ABI isn't the
> normal C ABI (the __fentry__ function would have to preserve all
> registers), but that's fine...
As far as I know, that's true of _mcount already: it's not a normal ABI
and is rather a highly architecture-specific special case to begin with.
At least ARM has some (several?) special mcount calling conventions,
afaik.
(And then ARM people use __attribute__((naked)) and insert the code by
inline asm, or something. That seems to be "standard" in the embedded
world, where they often do even stranger things than we do in the
kernel. At least our low-level system call and interrupt handlers are
written as assembly language - the embedded world seems to commonly
write them as C functions with magic attributes and inline asm).
Linus
^ permalink raw reply [flat|nested] 68+ messages in thread
* Re: BUG: GCC-4.4.x changes the function frame on some functions
2009-11-19 19:46 ` Frederic Weisbecker
2009-11-19 19:54 ` Kai Tietz
@ 2009-11-19 20:05 ` Steven Rostedt
2009-11-19 20:17 ` Steven Rostedt
2009-11-19 20:25 ` Frederic Weisbecker
1 sibling, 2 replies; 68+ messages in thread
From: Steven Rostedt @ 2009-11-19 20:05 UTC (permalink / raw)
To: Frederic Weisbecker
Cc: David Daney, Linus Torvalds, Andrew Haley, Richard Guenther,
Thomas Gleixner, Ingo Molnar, H. Peter Anvin, LKML,
Andrew Morton, Heiko Carstens, feng.tang, Peter Zijlstra, jakub,
gcc
On Thu, 2009-11-19 at 20:46 +0100, Frederic Weisbecker wrote:
> On Thu, Nov 19, 2009 at 02:28:06PM -0500, Steven Rostedt wrote:
> > <function>:
> > call __fentry__
> > [...]
> >
> >
> > -- Steve
>
>
> I would really like this. So that we can forget about other possible
> further suprises due to sophisticated function prologues beeing before
> the mcount call.
>
> And I guess that would fix it in every archs.
Well, other archs use a register to store the return address. But it
would also be easy to do (pseudo arch assembly):
<function>:
mov lr, (%sp)
add 8, %sp
blr __fentry__
sub 8, %sp
mov (%sp), lr
That way the lr would have the current function, and the parent would
still be at 8(%sp)
>
> That said, Linus had a good point about the fact there might other uses
> of mcount even more tricky than what does the function graph tracer,
> outside the kernel, and those may depend on the strict ABI assumption
> that 4(ebp) is always the _real_ return address, and that through all
> the previous stack call. This is even a concern that extrapolates the
> single mcount case.
As I am proposing a new call. This means that mcount stay as is for
legacy reasons. Yes I know there exists the -finstrument-functions but
that adds way too much bloat to the code. One single call to the
profiler is all I want.
>
> So I wonder that actually the real problem is the lack of something that
> could provide this guarantee. We may need a -real-ra-before-fp (yeah
> I suck in naming).
Don't worry, so do the C compiler folks, I mean, come on "mcount"?
-- Steve
^ permalink raw reply [flat|nested] 68+ messages in thread
* Re: BUG: GCC-4.4.x changes the function frame on some functions
2009-11-19 19:54 ` Kai Tietz
@ 2009-11-19 20:05 ` Frederic Weisbecker
0 siblings, 0 replies; 68+ messages in thread
From: Frederic Weisbecker @ 2009-11-19 20:05 UTC (permalink / raw)
To: Kai Tietz
Cc: Steven Rostedt, David Daney, Linus Torvalds, Andrew Haley,
Richard Guenther, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
LKML, Andrew Morton, Heiko Carstens, feng.tang, Peter Zijlstra,
jakub, gcc
On Thu, Nov 19, 2009 at 08:54:56PM +0100, Kai Tietz wrote:
> 2009/11/19 Frederic Weisbecker <fweisbec@gmail.com>:
> > I would really like this. So that we can forget about other possible
> > further suprises due to sophisticated function prologues beeing before
> > the mcount call.
> >
> > And I guess that would fix it in every archs.
>
> My 5 cent for this, too.
>
> > That said, Linus had a good point about the fact there might other uses
> > of mcount even more tricky than what does the function graph tracer,
> > outside the kernel, and those may depend on the strict ABI assumption
> > that 4(ebp) is always the _real_ return address, and that through all
> > the previous stack call. This is even a concern that extrapolates the
> > single mcount case.
> >
> > So I wonder that actually the real problem is the lack of something that
> > could provide this guarantee. We may need a -real-ra-before-fp (yeah
> > I suck in naming).
>
> There are, especially in windows world. We noticed that for example
> the Sun's JDK (which is compiled by VC) can be used in gcc compiled
> code only by -fno-omit-frame-pointer, as otherwise it fails badly
> reasoned by wrong ebp accesses.
Yeah but what we need is not only to ensure ebp is used as the frame
pointer but also that ebp + 4 is really the address that will be used
to return to the caller, and not a copy of the return value.
^ permalink raw reply [flat|nested] 68+ messages in thread
* Re: BUG: GCC-4.4.x changes the function frame on some functions
2009-11-19 19:46 ` Frederic Weisbecker
@ 2009-11-19 19:54 ` Kai Tietz
2009-11-19 20:05 ` Frederic Weisbecker
2009-11-19 20:05 ` Steven Rostedt
1 sibling, 1 reply; 68+ messages in thread
From: Kai Tietz @ 2009-11-19 19:54 UTC (permalink / raw)
To: Frederic Weisbecker
Cc: Steven Rostedt, David Daney, Linus Torvalds, Andrew Haley,
Richard Guenther, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
LKML, Andrew Morton, Heiko Carstens, feng.tang, Peter Zijlstra,
jakub, gcc
2009/11/19 Frederic Weisbecker <fweisbec@gmail.com>:
> I would really like this. So that we can forget about other possible
> further suprises due to sophisticated function prologues beeing before
> the mcount call.
>
> And I guess that would fix it in every archs.
My 5 cent for this, too.
> That said, Linus had a good point about the fact there might other uses
> of mcount even more tricky than what does the function graph tracer,
> outside the kernel, and those may depend on the strict ABI assumption
> that 4(ebp) is always the _real_ return address, and that through all
> the previous stack call. This is even a concern that extrapolates the
> single mcount case.
>
> So I wonder that actually the real problem is the lack of something that
> could provide this guarantee. We may need a -real-ra-before-fp (yeah
> I suck in naming).
There are, especially in windows world. We noticed that for example
the Sun's JDK (which is compiled by VC) can be used in gcc compiled
code only by -fno-omit-frame-pointer, as otherwise it fails badly
reasoned by wrong ebp accesses.
Kai
--
| (\_/) This is Bunny. Copy and paste
| (='.'=) Bunny into your signature to help
| (")_(") him gain world domination
^ permalink raw reply [flat|nested] 68+ messages in thread
* Re: BUG: GCC-4.4.x changes the function frame on some functions
2009-11-19 19:06 ` Steven Rostedt
@ 2009-11-19 19:50 ` Ingo Molnar
0 siblings, 0 replies; 68+ messages in thread
From: Ingo Molnar @ 2009-11-19 19:50 UTC (permalink / raw)
To: Steven Rostedt
Cc: Linus Torvalds, Richard Guenther, Thomas Gleixner,
H. Peter Anvin, LKML, Andrew Morton, Heiko Carstens, feng.tang,
Fr??d??ric Weisbecker, Peter Zijlstra, jakub, gcc
* Steven Rostedt <rostedt@goodmis.org> wrote:
> On Thu, 2009-11-19 at 19:47 +0100, Ingo Molnar wrote:
> > * Linus Torvalds <torvalds@linux-foundation.org> wrote:
> >
> > > Admittedly, anybody who compiles with -pg probably doesn't care deeply
> > > about smaller and more efficient code, since the mcount call overhead
> > > tends to make the thing moot anyway, but it really looks like a
> > > win-win situation to just fix the mcount call sequence regardless.
> >
> > Just a sidenote: due to dyn-ftrace, which patches out all mcounts during
> > bootup to be NOPs (and opt-in patches them in again if someone runs the
> > function tracer), the cost is not as large as one would have it with say
> > -pg based user-space profiling.
> >
> > It's not completely zero-cost as the pure NOPs balloon the i$ footprint
> > a bit and GCC generates different code too in some cases. But it's
> > certainly good enough that it's generally pretty hard to prove overhead
> > via micro or macro benchmarks that the patched out mcounts call sites
> > are there.
>
> And frame pointers do add a little overhead as well. Too bad the mcount
> ABI wasn't something like this:
>
>
> <function>:
> call mcount
> [...]
>
> This way, the function address for mcount would have been (%esp) and
> the parent address would be 4(%esp). Mcount would work without frame
> pointers and this whole mess would also become moot.
In that case we could also fix up static callsites to this address as
well (to jump +5 bytes into the function) and avoid the NOP as well in
most cases. (That would in essence merge any slow-path function epilogue
with the mcount cal instruction in terms of I$ footprint - i.e. it would
be an even lower overhead feature.)
If only the kernel had its own compiler.
Ingo
^ permalink raw reply [flat|nested] 68+ messages in thread
* Re: BUG: GCC-4.4.x changes the function frame on some functions
2009-11-19 19:28 ` Steven Rostedt
2009-11-19 19:46 ` Frederic Weisbecker
@ 2009-11-19 19:50 ` H. Peter Anvin
2009-11-19 20:06 ` Linus Torvalds
` (2 more replies)
1 sibling, 3 replies; 68+ messages in thread
From: H. Peter Anvin @ 2009-11-19 19:50 UTC (permalink / raw)
To: rostedt
Cc: David Daney, Linus Torvalds, Andrew Haley, Richard Guenther,
Thomas Gleixner, Ingo Molnar, LKML, Andrew Morton,
Heiko Carstens, feng.tang, Fr??d??ric Weisbecker, Peter Zijlstra,
jakub, gcc
On 11/19/2009 11:28 AM, Steven Rostedt wrote:
>
> Hehe, scratch register on i686 ;-)
>
> i686 has no extra regs. It just has:
>
> %eax, %ebx, %ecx, %edx - as the general purpose regs
> %esp - stack
> %ebp - frame pointer
> %edi, %esi - counter regs
>
> That's just 8 regs, and half of those are special.
>
For a modern ABI it is better described as:
%eax, %edx, %ecx - argument/return/scratch registers
%ebx, %esi, %edi - saved registers
%esp - stack pointer
%ebp - frame pointer (saved)
> Perhaps we could create another profiler? Instead of calling mcount,
> call a new function: __fentry__ or something. Have it activated with
> another switch. This could make the performance of the function tracer
> even better without all these exceptions.
>
> <function>:
> call __fentry__
> [...]
>
Calling the profiler immediately at the entry point is clearly the more
sane option. It means the ABI is well-defined, stable, and independent
of what the actual function contents are. It means that ABI isn't the
normal C ABI (the __fentry__ function would have to preserve all
registers), but that's fine...
-hpa
--
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel. I don't speak on their behalf.
^ permalink raw reply [flat|nested] 68+ messages in thread
* Re: BUG: GCC-4.4.x changes the function frame on some functions
2009-11-19 19:28 ` Steven Rostedt
@ 2009-11-19 19:46 ` Frederic Weisbecker
2009-11-19 19:54 ` Kai Tietz
2009-11-19 20:05 ` Steven Rostedt
2009-11-19 19:50 ` H. Peter Anvin
1 sibling, 2 replies; 68+ messages in thread
From: Frederic Weisbecker @ 2009-11-19 19:46 UTC (permalink / raw)
To: Steven Rostedt
Cc: David Daney, Linus Torvalds, Andrew Haley, Richard Guenther,
Thomas Gleixner, Ingo Molnar, H. Peter Anvin, LKML,
Andrew Morton, Heiko Carstens, feng.tang, Peter Zijlstra, jakub,
gcc
On Thu, Nov 19, 2009 at 02:28:06PM -0500, Steven Rostedt wrote:
> On Thu, 2009-11-19 at 11:10 -0800, David Daney wrote:
> > Linus Torvalds wrote:
>
> > For the MIPS port of GCC and Linux I recently added the
> > -mmcount-ra-address switch. It causes the location of the return
> > address (on the stack) to be passed to mcount in a scratch register.
>
> Hehe, scratch register on i686 ;-)
>
> i686 has no extra regs. It just has:
>
> %eax, %ebx, %ecx, %edx - as the general purpose regs
> %esp - stack
> %ebp - frame pointer
> %edi, %esi - counter regs
>
> That's just 8 regs, and half of those are special.
>
> >
> > Perhaps something similar could be done for x86. It would make this
> > patching of the return location more reliable at the expense of more
> > code at the mcount invocation site.
>
> I rather not put any more code in the call site.
>
> >
> > For the MIPS case the code size doesn't increase, as it is done in the
> > delay slot of the call instruction, which would otherwise be a nop.
>
> I showed in a previous post what the best would be for x86. That is just
> calling mcount at the very beginning of the function. The return address
> is automatically pushed onto the stack.
> Perhaps we could create another profiler? Instead of calling mcount,
> call a new function: __fentry__ or something. Have it activated with
> another switch. This could make the performance of the function tracer
> even better without all these exceptions.
>
> <function>:
> call __fentry__
> [...]
>
>
> -- Steve
I would really like this. So that we can forget about other possible
further suprises due to sophisticated function prologues beeing before
the mcount call.
And I guess that would fix it in every archs.
That said, Linus had a good point about the fact there might other uses
of mcount even more tricky than what does the function graph tracer,
outside the kernel, and those may depend on the strict ABI assumption
that 4(ebp) is always the _real_ return address, and that through all
the previous stack call. This is even a concern that extrapolates the
single mcount case.
So I wonder that actually the real problem is the lack of something that
could provide this guarantee. We may need a -real-ra-before-fp (yeah
I suck in naming).
^ permalink raw reply [flat|nested] 68+ messages in thread
* Re: BUG: GCC-4.4.x changes the function frame on some functions
2009-11-19 19:10 ` David Daney
@ 2009-11-19 19:28 ` Steven Rostedt
2009-11-19 19:46 ` Frederic Weisbecker
2009-11-19 19:50 ` H. Peter Anvin
0 siblings, 2 replies; 68+ messages in thread
From: Steven Rostedt @ 2009-11-19 19:28 UTC (permalink / raw)
To: David Daney
Cc: Linus Torvalds, Andrew Haley, Richard Guenther, Thomas Gleixner,
Ingo Molnar, H. Peter Anvin, LKML, Andrew Morton, Heiko Carstens,
feng.tang, Fr??d??ric Weisbecker, Peter Zijlstra, jakub, gcc
On Thu, 2009-11-19 at 11:10 -0800, David Daney wrote:
> Linus Torvalds wrote:
> For the MIPS port of GCC and Linux I recently added the
> -mmcount-ra-address switch. It causes the location of the return
> address (on the stack) to be passed to mcount in a scratch register.
Hehe, scratch register on i686 ;-)
i686 has no extra regs. It just has:
%eax, %ebx, %ecx, %edx - as the general purpose regs
%esp - stack
%ebp - frame pointer
%edi, %esi - counter regs
That's just 8 regs, and half of those are special.
>
> Perhaps something similar could be done for x86. It would make this
> patching of the return location more reliable at the expense of more
> code at the mcount invocation site.
I rather not put any more code in the call site.
>
> For the MIPS case the code size doesn't increase, as it is done in the
> delay slot of the call instruction, which would otherwise be a nop.
I showed in a previous post what the best would be for x86. That is just
calling mcount at the very beginning of the function. The return address
is automatically pushed onto the stack.
Perhaps we could create another profiler? Instead of calling mcount,
call a new function: __fentry__ or something. Have it activated with
another switch. This could make the performance of the function tracer
even better without all these exceptions.
<function>:
call __fentry__
[...]
-- Steve
^ permalink raw reply [flat|nested] 68+ messages in thread
* Re: BUG: GCC-4.4.x changes the function frame on some functions
2009-11-19 18:54 ` Linus Torvalds
2009-11-19 19:01 ` Thomas Gleixner
@ 2009-11-19 19:10 ` David Daney
2009-11-19 19:28 ` Steven Rostedt
1 sibling, 1 reply; 68+ messages in thread
From: David Daney @ 2009-11-19 19:10 UTC (permalink / raw)
To: Linus Torvalds
Cc: Andrew Haley, Richard Guenther, rostedt, Thomas Gleixner,
Ingo Molnar, H. Peter Anvin, LKML, Andrew Morton, Heiko Carstens,
feng.tang, Fr??d??ric Weisbecker, Peter Zijlstra, jakub, gcc
Linus Torvalds wrote:
>
> On Thu, 19 Nov 2009, Linus Torvalds wrote:
>> I bet other people than just the kernel use the mcount hook for subtler
>> things than just doing profiles. And even if they don't, the quoted code
>> generation is just crazy _crap_.
>
> For the kernel, if the only case is that timer_stat.c thing that Thomas
> pointed at, I guess we can at least work around it with something like the
> appended. The kernel code is certainly ugly too, no question about that.
>
> It's just that we'd like to be able to depend on mcount code generation
> not being insane even in the presense of ugly code..
>
> The alternative would be to have some warning when this happens, so that
> we can at least see it. "mcount won't work reliably"
>
For the MIPS port of GCC and Linux I recently added the
-mmcount-ra-address switch. It causes the location of the return
address (on the stack) to be passed to mcount in a scratch register.
Perhaps something similar could be done for x86. It would make this
patching of the return location more reliable at the expense of more
code at the mcount invocation site.
For the MIPS case the code size doesn't increase, as it is done in the
delay slot of the call instruction, which would otherwise be a nop.
David Daney
^ permalink raw reply [flat|nested] 68+ messages in thread
* Re: BUG: GCC-4.4.x changes the function frame on some functions
2009-11-19 18:47 ` Ingo Molnar
@ 2009-11-19 19:06 ` Steven Rostedt
2009-11-19 19:50 ` Ingo Molnar
0 siblings, 1 reply; 68+ messages in thread
From: Steven Rostedt @ 2009-11-19 19:06 UTC (permalink / raw)
To: Ingo Molnar
Cc: Linus Torvalds, Richard Guenther, Thomas Gleixner,
H. Peter Anvin, LKML, Andrew Morton, Heiko Carstens, feng.tang,
Fr??d??ric Weisbecker, Peter Zijlstra, jakub, gcc
On Thu, 2009-11-19 at 19:47 +0100, Ingo Molnar wrote:
> * Linus Torvalds <torvalds@linux-foundation.org> wrote:
>
> > Admittedly, anybody who compiles with -pg probably doesn't care deeply
> > about smaller and more efficient code, since the mcount call overhead
> > tends to make the thing moot anyway, but it really looks like a
> > win-win situation to just fix the mcount call sequence regardless.
>
> Just a sidenote: due to dyn-ftrace, which patches out all mcounts during
> bootup to be NOPs (and opt-in patches them in again if someone runs the
> function tracer), the cost is not as large as one would have it with say
> -pg based user-space profiling.
>
> It's not completely zero-cost as the pure NOPs balloon the i$ footprint
> a bit and GCC generates different code too in some cases. But it's
> certainly good enough that it's generally pretty hard to prove overhead
> via micro or macro benchmarks that the patched out mcounts call sites
> are there.
And frame pointers do add a little overhead as well. Too bad the mcount
ABI wasn't something like this:
<function>:
call mcount
[...]
This way, the function address for mcount would have been (%esp) and the
parent address would be 4(%esp). Mcount would work without frame
pointers and this whole mess would also become moot.
-- Steve
^ permalink raw reply [flat|nested] 68+ messages in thread
* Re: BUG: GCC-4.4.x changes the function frame on some functions
2009-11-19 18:54 ` Linus Torvalds
@ 2009-11-19 19:01 ` Thomas Gleixner
2009-11-23 9:16 ` Jakub Jelinek
2009-11-19 19:10 ` David Daney
1 sibling, 1 reply; 68+ messages in thread
From: Thomas Gleixner @ 2009-11-19 19:01 UTC (permalink / raw)
To: Linus Torvalds
Cc: Andrew Haley, Richard Guenther, rostedt, Ingo Molnar,
H. Peter Anvin, LKML, Andrew Morton, Heiko Carstens, feng.tang,
Fr??d??ric Weisbecker, Peter Zijlstra, jakub, gcc
On Thu, 19 Nov 2009, Linus Torvalds wrote:
> > I bet other people than just the kernel use the mcount hook for subtler
> > things than just doing profiles. And even if they don't, the quoted code
> > generation is just crazy _crap_.
>
> For the kernel, if the only case is that timer_stat.c thing that Thomas
> pointed at, I guess we can at least work around it with something like the
> appended. The kernel code is certainly ugly too, no question about that.
>
> It's just that we'd like to be able to depend on mcount code generation
> not being insane even in the presense of ugly code..
>
> The alternative would be to have some warning when this happens, so that
> we can at least see it. "mcount won't work reliably"
There are at least 20 other random functions which have the same
problem. Have not looked at the details yet.
Just compiled with -mincoming-stack-boundary=4 and the problem goes
away as gcc now thinks that the incoming stack is already 16 byte
aligned. But that might break code which actually uses SSE
Thanks,
tglx
^ permalink raw reply [flat|nested] 68+ messages in thread
* Re: BUG: GCC-4.4.x changes the function frame on some functions
2009-11-19 18:43 ` Linus Torvalds
@ 2009-11-19 18:54 ` Linus Torvalds
2009-11-19 19:01 ` Thomas Gleixner
2009-11-19 19:10 ` David Daney
0 siblings, 2 replies; 68+ messages in thread
From: Linus Torvalds @ 2009-11-19 18:54 UTC (permalink / raw)
To: Andrew Haley
Cc: Richard Guenther, rostedt, Thomas Gleixner, Ingo Molnar,
H. Peter Anvin, LKML, Andrew Morton, Heiko Carstens, feng.tang,
Fr??d??ric Weisbecker, Peter Zijlstra, jakub, gcc
On Thu, 19 Nov 2009, Linus Torvalds wrote:
>
> I bet other people than just the kernel use the mcount hook for subtler
> things than just doing profiles. And even if they don't, the quoted code
> generation is just crazy _crap_.
For the kernel, if the only case is that timer_stat.c thing that Thomas
pointed at, I guess we can at least work around it with something like the
appended. The kernel code is certainly ugly too, no question about that.
It's just that we'd like to be able to depend on mcount code generation
not being insane even in the presense of ugly code..
The alternative would be to have some warning when this happens, so that
we can at least see it. "mcount won't work reliably"
Linus
---
kernel/time/timer_stats.c | 4 ++--
1 files changed, 2 insertions(+), 2 deletions(-)
diff --git a/kernel/time/timer_stats.c b/kernel/time/timer_stats.c
index ee5681f..488c7b8 100644
--- a/kernel/time/timer_stats.c
+++ b/kernel/time/timer_stats.c
@@ -76,7 +76,7 @@ struct entry {
*/
char comm[TASK_COMM_LEN + 1];
-} ____cacheline_aligned_in_smp;
+};
/*
* Spinlock protecting the tables - not taken during lookup:
@@ -114,7 +114,7 @@ static ktime_t time_start, time_stop;
#define MAX_ENTRIES (1UL << MAX_ENTRIES_BITS)
static unsigned long nr_entries;
-static struct entry entries[MAX_ENTRIES];
+static struct entry entries[MAX_ENTRIES] ____cacheline_aligned_in_smp;
static atomic_t overflow_count;
^ permalink raw reply related [flat|nested] 68+ messages in thread
* Re: BUG: GCC-4.4.x changes the function frame on some functions
2009-11-19 18:38 ` Linus Torvalds
@ 2009-11-19 18:47 ` Ingo Molnar
2009-11-19 19:06 ` Steven Rostedt
2009-11-19 20:36 ` Thomas Gleixner
1 sibling, 1 reply; 68+ messages in thread
From: Ingo Molnar @ 2009-11-19 18:47 UTC (permalink / raw)
To: Linus Torvalds
Cc: Richard Guenther, rostedt, Thomas Gleixner, H. Peter Anvin, LKML,
Andrew Morton, Heiko Carstens, feng.tang, Fr??d??ric Weisbecker,
Peter Zijlstra, jakub, gcc
* Linus Torvalds <torvalds@linux-foundation.org> wrote:
> Admittedly, anybody who compiles with -pg probably doesn't care deeply
> about smaller and more efficient code, since the mcount call overhead
> tends to make the thing moot anyway, but it really looks like a
> win-win situation to just fix the mcount call sequence regardless.
Just a sidenote: due to dyn-ftrace, which patches out all mcounts during
bootup to be NOPs (and opt-in patches them in again if someone runs the
function tracer), the cost is not as large as one would have it with say
-pg based user-space profiling.
It's not completely zero-cost as the pure NOPs balloon the i$ footprint
a bit and GCC generates different code too in some cases. But it's
certainly good enough that it's generally pretty hard to prove overhead
via micro or macro benchmarks that the patched out mcounts call sites
are there.
Ingo
^ permalink raw reply [flat|nested] 68+ messages in thread
* Re: BUG: GCC-4.4.x changes the function frame on some functions
2009-11-19 18:41 ` Linus Torvalds
@ 2009-11-19 18:43 ` Linus Torvalds
2009-11-19 18:54 ` Linus Torvalds
0 siblings, 1 reply; 68+ messages in thread
From: Linus Torvalds @ 2009-11-19 18:43 UTC (permalink / raw)
To: Andrew Haley
Cc: Richard Guenther, rostedt, Thomas Gleixner, Ingo Molnar,
H. Peter Anvin, LKML, Andrew Morton, Heiko Carstens, feng.tang,
Fr??d??ric Weisbecker, Peter Zijlstra, jakub, gcc
On Thu, 19 Nov 2009, Linus Torvalds wrote:
>
> Oh Gods, are we back to gcc people saying "sure, we do stupid things, but
> it's allowed, so we don't consider it a bug because it doesn't matter that
> real people care about real life, we only care about some paper, and real
> life doesn't matter, if it's 'undefined' we can make our idiotic choices
> regardless of what people need, and regardless of whether it actually
> generates better code or not".
Put another way: the stack alignment itself may not be a bug, but gcc
generating God-awful code for the mcount handling that results in problems
in real life sure as hell is *stupid* enough to be called a bug.
I bet other people than just the kernel use the mcount hook for subtler
things than just doing profiles. And even if they don't, the quoted code
generation is just crazy _crap_.
Linus
^ permalink raw reply [flat|nested] 68+ messages in thread
* Re: BUG: GCC-4.4.x changes the function frame on some functions
2009-11-19 18:22 ` Andrew Haley
@ 2009-11-19 18:41 ` Linus Torvalds
2009-11-19 18:43 ` Linus Torvalds
0 siblings, 1 reply; 68+ messages in thread
From: Linus Torvalds @ 2009-11-19 18:41 UTC (permalink / raw)
To: Andrew Haley
Cc: Richard Guenther, rostedt, Thomas Gleixner, Ingo Molnar,
H. Peter Anvin, LKML, Andrew Morton, Heiko Carstens, feng.tang,
Fr??d??ric Weisbecker, Peter Zijlstra, jakub, gcc
On Thu, 19 Nov 2009, Andrew Haley wrote:
>
> I've got all that off-list. I found the cause, and replied in another
> email. It's not a bug.
Oh Gods, are we back to gcc people saying "sure, we do stupid things, but
it's allowed, so we don't consider it a bug because it doesn't matter that
real people care about real life, we only care about some paper, and real
life doesn't matter, if it's 'undefined' we can make our idiotic choices
regardless of what people need, and regardless of whether it actually
generates better code or not".
Linus
^ permalink raw reply [flat|nested] 68+ messages in thread
* Re: BUG: GCC-4.4.x changes the function frame on some functions
2009-11-19 18:20 ` Andrew Haley
2009-11-19 18:33 ` Steven Rostedt
@ 2009-11-19 18:39 ` Thomas Gleixner
1 sibling, 0 replies; 68+ messages in thread
From: Thomas Gleixner @ 2009-11-19 18:39 UTC (permalink / raw)
To: Andrew Haley
Cc: Ingo Molnar, H. Peter Anvin, LKML, Andrew Morton, Heiko Carstens,
feng.tang, Fr??d??ric Weisbecker, Steven Rostedt, Peter Zijlstra,
jakub, gcc
On Thu, 19 Nov 2009, Andrew Haley wrote:
> OK, I found it. There is a struct defined as
>
> struct entry {
> ...
> } __attribute__((__aligned__((1 << (4)))));
>
> and then in timer_stats_update_stats you have a local variable of type
> struct entry:
>
> void timer_stats_update_stats()
> {
> spinlock_t *lock;
> struct entry *entry, input;
>
> So, gcc has to 16-align the stack pointer to satisfy the alignment
> for struct entry.
This does not explain why GCC < 4.4.x actually puts
push %ebp
mov %esp, %ebp
first and why GCC 4.4.x decides to create an extra copy of the return
address instead of just keeping the mcount stack magic right at the
function entry.
Thanks,
tglx
^ permalink raw reply [flat|nested] 68+ messages in thread
* Re: BUG: GCC-4.4.x changes the function frame on some functions
2009-11-19 18:03 ` Richard Guenther
2009-11-19 18:22 ` Andrew Haley
2009-11-19 18:31 ` Thomas Gleixner
@ 2009-11-19 18:38 ` Linus Torvalds
2009-11-19 18:47 ` Ingo Molnar
2009-11-19 20:36 ` Thomas Gleixner
2 siblings, 2 replies; 68+ messages in thread
From: Linus Torvalds @ 2009-11-19 18:38 UTC (permalink / raw)
To: Richard Guenther
Cc: rostedt, Thomas Gleixner, Ingo Molnar, H. Peter Anvin, LKML,
Andrew Morton, Heiko Carstens, feng.tang, Fr??d??ric Weisbecker,
Peter Zijlstra, jakub, gcc
On Thu, 19 Nov 2009, Richard Guenther wrote:
>
> Note that I only can reproduce the issue with
> -mincoming-stack-boundary=2, not with -mpreferred-stack-boundary=2.
Since you can reproduce it with -mincoming-stack-boundary=2, I woul
suggest just fixing mcount handling that way regardless of anything else.
The current code generated by gcc is just insane - even for the case where
you _want_ 16-byte stack alignment.
Instead crazy code like
> push %edi
> lea 0x8(%esp),%edi
> and $0xfffffff0,%esp
> pushl -0x4(%edi)
> push %ebp
> mov %esp,%ebp
> ...
> call mcount
the sane thing to do would be to just do it as
push %ebp
mov %esp,%ebp
call mcount
and $0xfffffff0,%esp
since
- no sane 'mcount' implementation can ever care about 16-byte stack
alignment anyway, so aliging the stack before mcount is crazy.
- mcount is special anyway, and is the only thing that cares about that
whole ebp/return address thing is mcount, and _all_ your games with
%edi are about that mcount thing.
IOW, once you as a compiler person understand that the 'mcount' call is
special, you should have realized that all the work you did for it was
totally pointless and stupid to begin with.
You must already have that special mcount logic (the whole code to save a
register early and push the fake mcount stack frame), so instead of _that_
special logic, change it to a different mcount special logic that
associates the 'mcount' call with theframe pointer pushing.
That will not only make the Linux kernel tracer happy, it will make all
your _other_ users happier too, since you can generate smaller and more
efficient code.
Admittedly, anybody who compiles with -pg probably doesn't care deeply
about smaller and more efficient code, since the mcount call overhead
tends to make the thing moot anyway, but it really looks like a win-win
situation to just fix the mcount call sequence regardless.
> And you didn't provide us with a testcase either ... so please open a
> bugzilla and attach preprocessed source of a file that shows the
> problem, note the function it happens in and provide the command-line
> options you used for building.
>
> Otherwise it's going to be all speculation on our side.
See above - all you need to do is to just fix mcount calling.
Now, there is a separate bug that shows that you seem to over-align the
stack when not asked for, and yes, since we noticed that I hope that
Thomas and friends will fix that, but I think your mcount logic could (and
should) be fixed as an independent sillyness.
Linus
^ permalink raw reply [flat|nested] 68+ messages in thread
* Re: BUG: GCC-4.4.x changes the function frame on some functions
2009-11-19 18:33 ` Steven Rostedt
2009-11-19 18:36 ` Andrew Pinski
2009-11-19 18:36 ` Andrew Haley
@ 2009-11-19 18:37 ` H. Peter Anvin
2 siblings, 0 replies; 68+ messages in thread
From: H. Peter Anvin @ 2009-11-19 18:37 UTC (permalink / raw)
To: rostedt
Cc: Andrew Haley, Thomas Gleixner, Ingo Molnar, LKML, Andrew Morton,
Heiko Carstens, feng.tang, Fr??d??ric Weisbecker, Peter Zijlstra,
jakub, gcc
On 11/19/2009 10:33 AM, Steven Rostedt wrote:
>
> It has to align the entire stack? Why not just the variable within the
> stack?
>
Because if the stack pointer isn't aligned, it won't know where it can
stuff the variable. It has to pad *somewhere*, and since you may have
more than one such variable, the most efficient way -- and by far least
complex -- is for the compiler to align the stack when it sets up the
stack frame.
-hpa
--
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel. I don't speak on their behalf.
^ permalink raw reply [flat|nested] 68+ messages in thread
* Re: BUG: GCC-4.4.x changes the function frame on some functions
2009-11-19 18:33 ` Steven Rostedt
2009-11-19 18:36 ` Andrew Pinski
@ 2009-11-19 18:36 ` Andrew Haley
2009-11-19 18:37 ` H. Peter Anvin
2 siblings, 0 replies; 68+ messages in thread
From: Andrew Haley @ 2009-11-19 18:36 UTC (permalink / raw)
To: rostedt
Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, LKML,
Andrew Morton, Heiko Carstens, feng.tang, Fr??d??ric Weisbecker,
Peter Zijlstra, jakub, gcc
Steven Rostedt wrote:
> On Thu, 2009-11-19 at 18:20 +0000, Andrew Haley wrote:
>
>> OK, I found it. There is a struct defined as
>>
>> struct entry {
>> ...
>> } __attribute__((__aligned__((1 << (4)))));
>>
>> and then in timer_stats_update_stats you have a local variable of type
>> struct entry:
>>
>> void timer_stats_update_stats()
>> {
>> spinlock_t *lock;
>> struct entry *entry, input;
>>
>> So, gcc has to 16-align the stack pointer to satisfy the alignment
>> for struct entry.
>
> It has to align the entire stack? Why not just the variable within the
> stack?
How?. gcc has to know, at compile time, the offset from sp of each variable.
So, it of course makes sure that offset is 16-aligned, but it also has to
16-align the stack pointer.
Andrew.
^ permalink raw reply [flat|nested] 68+ messages in thread
* Re: BUG: GCC-4.4.x changes the function frame on some functions
2009-11-19 18:33 ` Steven Rostedt
@ 2009-11-19 18:36 ` Andrew Pinski
2009-11-19 18:36 ` Andrew Haley
2009-11-19 18:37 ` H. Peter Anvin
2 siblings, 0 replies; 68+ messages in thread
From: Andrew Pinski @ 2009-11-19 18:36 UTC (permalink / raw)
To: rostedt
Cc: Andrew Haley, Thomas Gleixner, Ingo Molnar, H. Peter Anvin, LKML,
Andrew Morton, Heiko Carstens, feng.tang, Fr??d??ric Weisbecker,
Peter Zijlstra, jakub, gcc
On Thu, Nov 19, 2009 at 10:33 AM, Steven Rostedt <rostedt@goodmis.org> wrote:
> It has to align the entire stack? Why not just the variable within the
> stack?
I had proposed a patch which just aligns the variable but that patch
was never really commented on and HJL's patches to realign the whole
stack went in afterwards.
Thanks,
Andrew Pinski
^ permalink raw reply [flat|nested] 68+ messages in thread
* Re: BUG: GCC-4.4.x changes the function frame on some functions
2009-11-19 18:20 ` Andrew Haley
@ 2009-11-19 18:33 ` Steven Rostedt
2009-11-19 18:36 ` Andrew Pinski
` (2 more replies)
2009-11-19 18:39 ` Thomas Gleixner
1 sibling, 3 replies; 68+ messages in thread
From: Steven Rostedt @ 2009-11-19 18:33 UTC (permalink / raw)
To: Andrew Haley
Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, LKML,
Andrew Morton, Heiko Carstens, feng.tang, Fr??d??ric Weisbecker,
Peter Zijlstra, jakub, gcc
On Thu, 2009-11-19 at 18:20 +0000, Andrew Haley wrote:
> OK, I found it. There is a struct defined as
>
> struct entry {
> ...
> } __attribute__((__aligned__((1 << (4)))));
>
> and then in timer_stats_update_stats you have a local variable of type
> struct entry:
>
> void timer_stats_update_stats()
> {
> spinlock_t *lock;
> struct entry *entry, input;
>
> So, gcc has to 16-align the stack pointer to satisfy the alignment
> for struct entry.
It has to align the entire stack? Why not just the variable within the
stack?
-- Steve
^ permalink raw reply [flat|nested] 68+ messages in thread
* Re: BUG: GCC-4.4.x changes the function frame on some functions
2009-11-19 18:03 ` Richard Guenther
2009-11-19 18:22 ` Andrew Haley
@ 2009-11-19 18:31 ` Thomas Gleixner
2009-11-19 18:38 ` Linus Torvalds
2 siblings, 0 replies; 68+ messages in thread
From: Thomas Gleixner @ 2009-11-19 18:31 UTC (permalink / raw)
To: Richard Guenther
Cc: rostedt, Linus Torvalds, Ingo Molnar, H. Peter Anvin, LKML,
Andrew Morton, Heiko Carstens, feng.tang, Fr??d??ric Weisbecker,
Peter Zijlstra, jakub, gcc
On Thu, 19 Nov 2009, Richard Guenther wrote:
> Note that I only can reproduce the issue with
> -mincoming-stack-boundary=2, not with -mpreferred-stack-boundary=2.
> And
> you didn't provide us with a testcase either ... so please open
> a bugzilla and attach preprocessed source of a file that
> shows the problem, note the function it happens in and provide
> the command-line options you used for building.
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42109
Thanks,
tglx
^ permalink raw reply [flat|nested] 68+ messages in thread
* Re: BUG: GCC-4.4.x changes the function frame on some functions
2009-11-19 18:03 ` Richard Guenther
@ 2009-11-19 18:22 ` Andrew Haley
2009-11-19 18:41 ` Linus Torvalds
2009-11-19 18:31 ` Thomas Gleixner
2009-11-19 18:38 ` Linus Torvalds
2 siblings, 1 reply; 68+ messages in thread
From: Andrew Haley @ 2009-11-19 18:22 UTC (permalink / raw)
To: Richard Guenther
Cc: rostedt, Linus Torvalds, Thomas Gleixner, Ingo Molnar,
H. Peter Anvin, LKML, Andrew Morton, Heiko Carstens, feng.tang,
Fr??d??ric Weisbecker, Peter Zijlstra, jakub, gcc
Richard Guenther wrote:
> And
> you didn't provide us with a testcase either ... so please open
> a bugzilla and attach preprocessed source of a file that
> shows the problem, note the function it happens in and provide
> the command-line options you used for building.
I've got all that off-list. I found the cause, and replied in another
email. It's not a bug.
Andrew.
^ permalink raw reply [flat|nested] 68+ messages in thread
* Re: BUG: GCC-4.4.x changes the function frame on some functions
2009-11-19 15:37 ` BUG: GCC-4.4.x changes the function frame on some functions Thomas Gleixner
` (2 preceding siblings ...)
2009-11-19 17:39 ` Linus Torvalds
@ 2009-11-19 18:20 ` Andrew Haley
2009-11-19 18:33 ` Steven Rostedt
2009-11-19 18:39 ` Thomas Gleixner
3 siblings, 2 replies; 68+ messages in thread
From: Andrew Haley @ 2009-11-19 18:20 UTC (permalink / raw)
To: Thomas Gleixner
Cc: Ingo Molnar, H. Peter Anvin, LKML, Andrew Morton, Heiko Carstens,
feng.tang, Fr??d??ric Weisbecker, Steven Rostedt, Peter Zijlstra,
jakub, gcc
Thomas Gleixner wrote:
> On Thu, 19 Nov 2009, Thomas Gleixner wrote:
>
> Can the GCC folks please shed some light on this:
>
> standard function start:
>
> push %ebp
> mov %esp, %ebp
> ....
> call mcount
>
> modified function start on a handful of functions only seen with gcc
> 4.4.x on x86 32 bit:
>
> push %edi
> lea 0x8(%esp),%edi
> and $0xfffffff0,%esp
> pushl -0x4(%edi)
> push %ebp
> mov %esp,%ebp
> ...
> call mcount
>
> This modification leads to a hard to solve problem in the kernel
> function graph tracer which assumes that the stack looks like:
>
> return address
> saved ebp
>
> With the modified function start sequence this is not longer true and
> the manipulation of the return address on the stack fails silently.
>
> Neither gcc 4.3 nor gcc 3.4 are generating such function frames, so it
> looks like a gcc 4.4.x feature.
>
> There is no real obvious reason why the edi magic needs to be done
> _before_
>
> push %ebp
> mov %esp,%ebp
OK, I found it. There is a struct defined as
struct entry {
...
} __attribute__((__aligned__((1 << (4)))));
and then in timer_stats_update_stats you have a local variable of type
struct entry:
void timer_stats_update_stats()
{
spinlock_t *lock;
struct entry *entry, input;
So, gcc has to 16-align the stack pointer to satisfy the alignment
for struct entry.
Andrew.
^ permalink raw reply [flat|nested] 68+ messages in thread
* Re: BUG: GCC-4.4.x changes the function frame on some functions
2009-11-19 17:59 ` Steven Rostedt
@ 2009-11-19 18:03 ` Richard Guenther
2009-11-19 18:22 ` Andrew Haley
` (2 more replies)
0 siblings, 3 replies; 68+ messages in thread
From: Richard Guenther @ 2009-11-19 18:03 UTC (permalink / raw)
To: rostedt
Cc: Linus Torvalds, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
LKML, Andrew Morton, Heiko Carstens, feng.tang,
Fr??d??ric Weisbecker, Peter Zijlstra, jakub, gcc
On Thu, Nov 19, 2009 at 6:59 PM, Steven Rostedt <rostedt@goodmis.org> wrote:
> On Thu, 2009-11-19 at 09:39 -0800, Linus Torvalds wrote:
>
>> > This modification leads to a hard to solve problem in the kernel
>> > function graph tracer which assumes that the stack looks like:
>> >
>> > return address
>> > saved ebp
>>
>> Umm. But it still does, doesn't it? That
>>
>> pushl -0x4(%edi)
>> push %ebp
>>
>> should do it - the "-0x4(%edi)" thing seems to be trying to reload the
>> return address. No?
>
> Yes that is what it is doing. The problem we have is that it is putting
> into the frame pointer a "copy" of the return address, and not the
> actual pointer. Which is fine for the function tracer, but breaks the
> function graph tracer (which is a much more powerful tracer).
>
> Technically, this is all that mcount must have. And yes, we are making
> an assumption that the return address in the frame pointer is the one
> that will be used to leave the function. But the reason for making this
> copy just seems to be all messed up.
>
> I don't know if the ABI says anything about the return address in the
> frame pointer must be the actual return address. But it would be nice if
> the gcc folks would let us guarantee that it is.
Note that I only can reproduce the issue with
-mincoming-stack-boundary=2, not with -mpreferred-stack-boundary=2.
And
you didn't provide us with a testcase either ... so please open
a bugzilla and attach preprocessed source of a file that
shows the problem, note the function it happens in and provide
the command-line options you used for building.
Otherwise it's going to be all speculation on our side.
Thanks,
Richard.
^ permalink raw reply [flat|nested] 68+ messages in thread
* Re: BUG: GCC-4.4.x changes the function frame on some functions
2009-11-19 17:39 ` Linus Torvalds
2009-11-19 17:51 ` Thomas Gleixner
@ 2009-11-19 17:59 ` Steven Rostedt
2009-11-19 18:03 ` Richard Guenther
1 sibling, 1 reply; 68+ messages in thread
From: Steven Rostedt @ 2009-11-19 17:59 UTC (permalink / raw)
To: Linus Torvalds
Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, LKML,
Andrew Morton, Heiko Carstens, feng.tang, Fr??d??ric Weisbecker,
Peter Zijlstra, jakub, gcc
On Thu, 2009-11-19 at 09:39 -0800, Linus Torvalds wrote:
> > This modification leads to a hard to solve problem in the kernel
> > function graph tracer which assumes that the stack looks like:
> >
> > return address
> > saved ebp
>
> Umm. But it still does, doesn't it? That
>
> pushl -0x4(%edi)
> push %ebp
>
> should do it - the "-0x4(%edi)" thing seems to be trying to reload the
> return address. No?
Yes that is what it is doing. The problem we have is that it is putting
into the frame pointer a "copy" of the return address, and not the
actual pointer. Which is fine for the function tracer, but breaks the
function graph tracer (which is a much more powerful tracer).
Technically, this is all that mcount must have. And yes, we are making
an assumption that the return address in the frame pointer is the one
that will be used to leave the function. But the reason for making this
copy just seems to be all messed up.
I don't know if the ABI says anything about the return address in the
frame pointer must be the actual return address. But it would be nice if
the gcc folks would let us guarantee that it is.
-- Steve
^ permalink raw reply [flat|nested] 68+ messages in thread
* Re: BUG: GCC-4.4.x changes the function frame on some functions
2009-11-19 17:39 ` Linus Torvalds
@ 2009-11-19 17:51 ` Thomas Gleixner
2009-11-19 17:59 ` Steven Rostedt
1 sibling, 0 replies; 68+ messages in thread
From: Thomas Gleixner @ 2009-11-19 17:51 UTC (permalink / raw)
To: Linus Torvalds
Cc: Ingo Molnar, H. Peter Anvin, LKML, Andrew Morton, Heiko Carstens,
feng.tang, Fr??d??ric Weisbecker, Steven Rostedt, Peter Zijlstra,
jakub, gcc
On Thu, 19 Nov 2009, Linus Torvalds wrote:
> Umm. But it still does, doesn't it? That
>
> pushl -0x4(%edi)
> push %ebp
>
> should do it - the "-0x4(%edi)" thing seems to be trying to reload the
> return address. No?
>
> Maybe I misread the code - but regardless, it does look like a gcc code
> generation bug if only because we really don't want a 16-byte aligned
> stack anyway, and have asked for it to not be done.
>
> So I agree that gcc shouldn't do that crazy prologue (and certainly _not_
> before calling mcount anyway), but I'm not sure I agree with that detail
> of your analysis or explanation.
Yes, it does store the return address before the pushed ebp, but this
is a copy of the real stack entry which is before the pushed edi.
The function graph tracer needs to redirect the return into the tracer
and it therefor saves the real return address and modifies the stack
so the return ends up in the tracer code which then goes back to the
real return address.
But in this prologue/aligment case we modify the copy and not the real
return address on the stack, so we return without calling into the
tracer which is causing the headache because the state of the tracer
becomes confused.
Thanks,
tglx
^ permalink raw reply [flat|nested] 68+ messages in thread
* Re: BUG: GCC-4.4.x changes the function frame on some functions
2009-11-19 15:37 ` BUG: GCC-4.4.x changes the function frame on some functions Thomas Gleixner
2009-11-19 15:44 ` Andrew Haley
2009-11-19 15:45 ` H. Peter Anvin
@ 2009-11-19 17:39 ` Linus Torvalds
2009-11-19 17:51 ` Thomas Gleixner
2009-11-19 17:59 ` Steven Rostedt
2009-11-19 18:20 ` Andrew Haley
3 siblings, 2 replies; 68+ messages in thread
From: Linus Torvalds @ 2009-11-19 17:39 UTC (permalink / raw)
To: Thomas Gleixner
Cc: Ingo Molnar, H. Peter Anvin, LKML, Andrew Morton, Heiko Carstens,
feng.tang, Fr??d??ric Weisbecker, Steven Rostedt, Peter Zijlstra,
jakub, gcc
On Thu, 19 Nov 2009, Thomas Gleixner wrote:
>
> standard function start:
>
> push %ebp
> mov %esp, %ebp
> ....
> call mcount
>
> modified function start on a handful of functions only seen with gcc
> 4.4.x on x86 32 bit:
>
> push %edi
> lea 0x8(%esp),%edi
> and $0xfffffff0,%esp
> pushl -0x4(%edi)
> push %ebp
> mov %esp,%ebp
> ...
> call mcount
That's some crazy sh*t anyway, since we don't _want_ the stack to be
16-byte aligned in the kernel. We do
KBUILD_CFLAGS += $(call cc-option,-mpreferred-stack-boundary=2)
why is that not working?
So this looks like a gcc bug, plain and simple.
> This modification leads to a hard to solve problem in the kernel
> function graph tracer which assumes that the stack looks like:
>
> return address
> saved ebp
Umm. But it still does, doesn't it? That
pushl -0x4(%edi)
push %ebp
should do it - the "-0x4(%edi)" thing seems to be trying to reload the
return address. No?
Maybe I misread the code - but regardless, it does look like a gcc code
generation bug if only because we really don't want a 16-byte aligned
stack anyway, and have asked for it to not be done.
So I agree that gcc shouldn't do that crazy prologue (and certainly _not_
before calling mcount anyway), but I'm not sure I agree with that detail
of your analysis or explanation.
Linus
^ permalink raw reply [flat|nested] 68+ messages in thread
* Re: BUG: GCC-4.4.x changes the function frame on some functions
2009-11-19 15:49 ` Richard Guenther
2009-11-19 15:52 ` Richard Guenther
@ 2009-11-19 17:37 ` Andi Kleen
1 sibling, 0 replies; 68+ messages in thread
From: Andi Kleen @ 2009-11-19 17:37 UTC (permalink / raw)
To: Richard Guenther
Cc: H. Peter Anvin, Thomas Gleixner, Ingo Molnar, LKML,
Andrew Morton, Heiko Carstens, feng.tang, Fr??d??ric Weisbecker,
Steven Rostedt, Peter Zijlstra, jakub, gcc
Richard Guenther <richard.guenther@gmail.com> writes:
>
> It's likely because you have long long vars on the stack which is
> faster when they are aligned.
It's not faster for 32bit.
-Andi
--
ak@linux.intel.com -- Speaking for myself only.
^ permalink raw reply [flat|nested] 68+ messages in thread
* Re: BUG: GCC-4.4.x changes the function frame on some functions
2009-11-19 16:17 ` Andrew Haley
@ 2009-11-19 16:43 ` Thomas Gleixner
0 siblings, 0 replies; 68+ messages in thread
From: Thomas Gleixner @ 2009-11-19 16:43 UTC (permalink / raw)
To: Andrew Haley
Cc: Ingo Molnar, H. Peter Anvin, LKML, Andrew Morton, Heiko Carstens,
feng.tang, Fr??d??ric Weisbecker, Steven Rostedt, Peter Zijlstra,
jakub, gcc
On Thu, 19 Nov 2009, Andrew Haley wrote:
> Thomas Gleixner wrote:
> > On Thu, 19 Nov 2009, Andrew Haley wrote:
> >> Thomas Gleixner wrote:
> >>> There is no real obvious reason why the edi magic needs to be done
> >>> _before_
> >>>
> >>> push %ebp
> >>> mov %esp,%ebp
> >> Sure there is: unless you do the adjustment first %ebp won't be 16-aligned.
> >
> > And why is this not done in 99% of the functions in the kernel, just
> > in this one and some random others ?
>
> If I could see the function I might be able to tell you. It's either a
> performance enhancement, something to do with SSE, or it's a bug.
kernel/time/timer_stats.c timer_stats_update_stats()
Here is the disassembly:
8107ad50 <timer_stats_update_stats>:
8107ad50: 57 push %edi
8107ad51: 8d 7c 24 08 lea 0x8(%esp),%edi
8107ad55: 83 e4 f0 and $0xfffffff0,%esp
8107ad58: ff 77 fc pushl -0x4(%edi)
8107ad5b: 55 push %ebp
8107ad5c: 89 e5 mov %esp,%ebp
8107ad5e: 57 push %edi
8107ad5f: 56 push %esi
8107ad60: 53 push %ebx
8107ad61: 83 ec 6c sub $0x6c,%esp
8107ad64: e8 47 92 f8 ff call 81003fb0 <mcount>
8107ad69: 8b 77 04 mov 0x4(%edi),%esi
8107ad6c: 89 75 a4 mov %esi,-0x5c(%ebp)
8107ad6f: 65 8b 35 14 00 00 00 mov %gs:0x14,%esi
8107ad76: 89 75 e4 mov %esi,-0x1c(%ebp)
8107ad79: 31 f6 xor %esi,%esi
8107ad7b: 8b 35 60 5a cd 81 mov 0x81cd5a60,%esi
8107ad81: 8b 1f mov (%edi),%ebx
8107ad83: 85 f6 test %esi,%esi
8107ad85: 8b 7f 08 mov 0x8(%edi),%edi
8107ad88: 75 18 jne 8107ada2 <timer_stats_update_stats+0x52>
8107ad8a: 8b 45 e4 mov -0x1c(%ebp),%eax
8107ad8d: 65 33 05 14 00 00 00 xor %gs:0x14,%eax
8107ad94: 75 53 jne 8107ade9 <timer_stats_update_stats+0x99>
8107ad96: 83 c4 6c add $0x6c,%esp
8107ad99: 5b pop %ebx
8107ad9a: 5e pop %esi
8107ad9b: 5f pop %edi
8107ad9c: 5d pop %ebp
8107ad9d: 8d 67 f8 lea -0x8(%edi),%esp
8107ada0: 5f pop %edi
8107ada1: c3 ret
8107ada2: be 00 7a d6 81 mov $0x81d67a00,%esi
8107ada7: 89 45 ac mov %eax,-0x54(%ebp)
8107adaa: 89 75 a0 mov %esi,-0x60(%ebp)
8107adad: 89 5d b4 mov %ebx,-0x4c(%ebp)
8107adb0: 64 8b 35 78 6a d6 81 mov %fs:0x81d66a78,%esi
8107adb7: 8b 34 b5 20 50 cd 81 mov -0x7e32afe0(,%esi,4),%esi
8107adbe: 89 4d b0 mov %ecx,-0x50(%ebp)
8107adc1: 01 75 a0 add %esi,-0x60(%ebp)
8107adc4: 89 55 b8 mov %edx,-0x48(%ebp)
8107adc7: 8b 45 a0 mov -0x60(%ebp),%eax
8107adca: 89 7d c0 mov %edi,-0x40(%ebp)
8107adcd: e8 de f7 76 00 call 817ea5b0 <_spin_lock_irqsave>
8107add2: 83 3d 60 5a cd 81 00 cmpl $0x0,0x81cd5a60
8107add9: 89 c3 mov %eax,%ebx
8107addb: 75 11 jne 8107adee <timer_stats_update_stats+0x9e>
8107addd: 89 da mov %ebx,%edx
8107addf: 8b 45 a0 mov -0x60(%ebp),%eax
8107ade2: e8 79 fc 76 00 call 817eaa60 <_spin_unlock_irqrestore>
8107ade7: eb a1 jmp 8107ad8a <timer_stats_update_stats+0x3a>
8107ade9: e8 52 e4 fc ff call 81049240 <__stack_chk_fail>
8107adee: 8d 45 a8 lea -0x58(%ebp),%eax
8107adf1: 8b 55 a4 mov -0x5c(%ebp),%edx
8107adf4: e8 f7 fd ff ff call 8107abf0 <tstat_lookup>
8107adf9: 85 c0 test %eax,%eax
8107adfb: 74 05 je 8107ae02 <timer_stats_update_stats+0xb2>
8107adfd: ff 40 14 incl 0x14(%eax)
8107ae00: eb db jmp 8107addd <timer_stats_update_stats+0x8d>
8107ae02: f0 ff 05 00 67 fd 81 lock incl 0x81fd6700
8107ae09: eb d2 jmp 8107addd <timer_stats_update_stats+0x8d>
8107ae0b: 90 nop
8107ae0c: 90 nop
8107ae0d: 90 nop
8107ae0e: 90 nop
8107ae0f: 90 nop
There is a dozen more of those.
Thanks,
tglx
^ permalink raw reply [flat|nested] 68+ messages in thread
* Re: BUG: GCC-4.4.x changes the function frame on some functions
2009-11-19 16:02 ` Steven Rostedt
2009-11-19 16:11 ` H. Peter Anvin
@ 2009-11-19 16:19 ` Frederic Weisbecker
1 sibling, 0 replies; 68+ messages in thread
From: Frederic Weisbecker @ 2009-11-19 16:19 UTC (permalink / raw)
To: Steven Rostedt
Cc: Andrew Haley, Thomas Gleixner, Ingo Molnar, H. Peter Anvin, LKML,
Andrew Morton, Heiko Carstens, feng.tang, Peter Zijlstra, jakub,
gcc
On Thu, Nov 19, 2009 at 11:02:32AM -0500, Steven Rostedt wrote:
> On Thu, 2009-11-19 at 15:44 +0000, Andrew Haley wrote:
> > Thomas Gleixner wrote:
>
> > We're aligning the stack properly, as per the ABI requirements. Can't
> > you just fix the tracer?
>
> And how do we do that? The hooks that are in place have no idea of what
> happened before they were called?
>
> -- Steve
Yep, this is really something we can't fix from the tracer....
^ permalink raw reply [flat|nested] 68+ messages in thread
* Re: BUG: GCC-4.4.x changes the function frame on some functions
2009-11-19 16:06 ` Thomas Gleixner
@ 2009-11-19 16:17 ` Andrew Haley
2009-11-19 16:43 ` Thomas Gleixner
0 siblings, 1 reply; 68+ messages in thread
From: Andrew Haley @ 2009-11-19 16:17 UTC (permalink / raw)
To: Thomas Gleixner
Cc: Ingo Molnar, H. Peter Anvin, LKML, Andrew Morton, Heiko Carstens,
feng.tang, Fr??d??ric Weisbecker, Steven Rostedt, Peter Zijlstra,
jakub, gcc
Thomas Gleixner wrote:
> On Thu, 19 Nov 2009, Andrew Haley wrote:
>> Thomas Gleixner wrote:
>>> There is no real obvious reason why the edi magic needs to be done
>>> _before_
>>>
>>> push %ebp
>>> mov %esp,%ebp
>> Sure there is: unless you do the adjustment first %ebp won't be 16-aligned.
>
> And why is this not done in 99% of the functions in the kernel, just
> in this one and some random others ?
If I could see the function I might be able to tell you. It's either a
performance enhancement, something to do with SSE, or it's a bug.
Andrew.
^ permalink raw reply [flat|nested] 68+ messages in thread
* Re: BUG: GCC-4.4.x changes the function frame on some functions
2009-11-19 15:44 ` Andrew Haley
` (2 preceding siblings ...)
2009-11-19 16:06 ` Thomas Gleixner
@ 2009-11-19 16:12 ` Steven Rostedt
3 siblings, 0 replies; 68+ messages in thread
From: Steven Rostedt @ 2009-11-19 16:12 UTC (permalink / raw)
To: Andrew Haley
Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, LKML,
Andrew Morton, Heiko Carstens, feng.tang, Fr??d??ric Weisbecker,
Peter Zijlstra, jakub, gcc
On Thu, 2009-11-19 at 15:44 +0000, Andrew Haley wrote:
> We're aligning the stack properly, as per the ABI requirements. Can't
> you just fix the tracer?
Unfortunately, this is the only fix we have:
diff --git a/kernel/trace/Kconfig b/kernel/trace/Kconfig
index b416512..cd39064 100644
--- a/kernel/trace/Kconfig
+++ b/kernel/trace/Kconfig
@@ -143,7 +143,6 @@ config FUNCTION_GRAPH_TRACER
bool "Kernel Function Graph Tracer"
depends on HAVE_FUNCTION_GRAPH_TRACER
depends on FUNCTION_TRACER
- depends on !X86_32 || !CC_OPTIMIZE_FOR_SIZE
default y
help
Enable the kernel to trace a function at both its return
diff --git a/kernel/trace/trace_functions_graph.c b/kernel/trace/trace_functions_graph.c
index 45e6c01..50c2251 100644
--- a/kernel/trace/trace_functions_graph.c
+++ b/kernel/trace/trace_functions_graph.c
@@ -1070,6 +1070,11 @@ static __init int init_graph_trace(void)
{
max_bytes_for_cpu = snprintf(NULL, 0, "%d", nr_cpu_ids - 1);
+#if defined(CONFIG_X86_32 && __GNUC__ >= 4 && __GNUC_MINOR__ >= 4)
+ pr_info("WARNING: GCC 4.4.X breaks the function graph tracer on i686.\n"
+ " The function graph tracer will be disabled.\n");
+ return -1;
+#endif
return register_tracer(&graph_trace);
}
-- Steve
^ permalink raw reply related [flat|nested] 68+ messages in thread
* Re: BUG: GCC-4.4.x changes the function frame on some functions
2009-11-19 16:02 ` Steven Rostedt
@ 2009-11-19 16:11 ` H. Peter Anvin
2009-11-19 16:19 ` Frederic Weisbecker
1 sibling, 0 replies; 68+ messages in thread
From: H. Peter Anvin @ 2009-11-19 16:11 UTC (permalink / raw)
To: rostedt
Cc: Andrew Haley, Thomas Gleixner, Ingo Molnar, LKML, Andrew Morton,
Heiko Carstens, feng.tang, Fr??d??ric Weisbecker, Peter Zijlstra,
jakub, gcc
On 11/19/2009 08:02 AM, Steven Rostedt wrote:
> On Thu, 2009-11-19 at 15:44 +0000, Andrew Haley wrote:
>> Thomas Gleixner wrote:
>
>> We're aligning the stack properly, as per the ABI requirements. Can't
>> you just fix the tracer?
>
> And how do we do that? The hooks that are in place have no idea of what
> happened before they were called?
>
Furthermore, it is nonsense -- ABI stack alignment on *32 bits* is 4
bytes, not 16.
-hpa
--
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel. I don't speak on their behalf.
^ permalink raw reply [flat|nested] 68+ messages in thread
* Re: BUG: GCC-4.4.x changes the function frame on some functions
2009-11-19 15:44 ` Andrew Haley
2009-11-19 15:54 ` H. Peter Anvin
2009-11-19 16:02 ` Steven Rostedt
@ 2009-11-19 16:06 ` Thomas Gleixner
2009-11-19 16:17 ` Andrew Haley
2009-11-19 16:12 ` Steven Rostedt
3 siblings, 1 reply; 68+ messages in thread
From: Thomas Gleixner @ 2009-11-19 16:06 UTC (permalink / raw)
To: Andrew Haley
Cc: Ingo Molnar, H. Peter Anvin, LKML, Andrew Morton, Heiko Carstens,
feng.tang, Fr??d??ric Weisbecker, Steven Rostedt, Peter Zijlstra,
jakub, gcc
On Thu, 19 Nov 2009, Andrew Haley wrote:
> Thomas Gleixner wrote:
> > There is no real obvious reason why the edi magic needs to be done
> > _before_
> >
> > push %ebp
> > mov %esp,%ebp
>
> Sure there is: unless you do the adjustment first %ebp won't be 16-aligned.
And why is this not done in 99% of the functions in the kernel, just
in this one and some random others ?
> We're aligning the stack properly, as per the ABI requirements. Can't
> you just fix the tracer?
Where is that ABI requirement that
push %ebp
needs to happen on an aligned stack ?
And why is this something GCC did not care about until GCC4.4 ?
Thanks,
tglx
^ permalink raw reply [flat|nested] 68+ messages in thread
* Re: BUG: GCC-4.4.x changes the function frame on some functions
2009-11-19 15:44 ` Andrew Haley
2009-11-19 15:54 ` H. Peter Anvin
@ 2009-11-19 16:02 ` Steven Rostedt
2009-11-19 16:11 ` H. Peter Anvin
2009-11-19 16:19 ` Frederic Weisbecker
2009-11-19 16:06 ` Thomas Gleixner
2009-11-19 16:12 ` Steven Rostedt
3 siblings, 2 replies; 68+ messages in thread
From: Steven Rostedt @ 2009-11-19 16:02 UTC (permalink / raw)
To: Andrew Haley
Cc: Thomas Gleixner, Ingo Molnar, H. Peter Anvin, LKML,
Andrew Morton, Heiko Carstens, feng.tang, Fr??d??ric Weisbecker,
Peter Zijlstra, jakub, gcc
On Thu, 2009-11-19 at 15:44 +0000, Andrew Haley wrote:
> Thomas Gleixner wrote:
> We're aligning the stack properly, as per the ABI requirements. Can't
> you just fix the tracer?
And how do we do that? The hooks that are in place have no idea of what
happened before they were called?
-- Steve
^ permalink raw reply [flat|nested] 68+ messages in thread
* Re: BUG: GCC-4.4.x changes the function frame on some functions
2009-11-19 15:54 ` H. Peter Anvin
@ 2009-11-19 15:57 ` Richard Guenther
0 siblings, 0 replies; 68+ messages in thread
From: Richard Guenther @ 2009-11-19 15:57 UTC (permalink / raw)
To: H. Peter Anvin
Cc: Andrew Haley, Thomas Gleixner, Ingo Molnar, LKML, Andrew Morton,
Heiko Carstens, feng.tang, Fr??d??ric Weisbecker, Steven Rostedt,
Peter Zijlstra, jakub, gcc, H.J. Lu
On Thu, Nov 19, 2009 at 4:54 PM, H. Peter Anvin <hpa@zytor.com> wrote:
> On 11/19/2009 07:44 AM, Andrew Haley wrote:
>>
>> We're aligning the stack properly, as per the ABI requirements. Can't
>> you just fix the tracer?
>>
>
> "Per the ABI requirements?" We're talking 32 bits, here.
Hm, even with
void bar (int *);
void foo (void)
{
int x;
bar (&x);
}
gcc -S -O2 -m32 -mincoming-stack-boundary=2 t.c
we re-align the stack. That looks indeed bogus.
HJ, you invented all this code, what's the reason for the above?
Richard.
^ permalink raw reply [flat|nested] 68+ messages in thread
* Re: BUG: GCC-4.4.x changes the function frame on some functions
2009-11-19 15:44 ` Andrew Haley
@ 2009-11-19 15:54 ` H. Peter Anvin
2009-11-19 15:57 ` Richard Guenther
2009-11-19 16:02 ` Steven Rostedt
` (2 subsequent siblings)
3 siblings, 1 reply; 68+ messages in thread
From: H. Peter Anvin @ 2009-11-19 15:54 UTC (permalink / raw)
To: Andrew Haley
Cc: Thomas Gleixner, Ingo Molnar, LKML, Andrew Morton,
Heiko Carstens, feng.tang, Fr??d??ric Weisbecker, Steven Rostedt,
Peter Zijlstra, jakub, gcc
On 11/19/2009 07:44 AM, Andrew Haley wrote:
>
> We're aligning the stack properly, as per the ABI requirements. Can't
> you just fix the tracer?
>
"Per the ABI requirements?" We're talking 32 bits, here.
-hpa
--
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel. I don't speak on their behalf.
^ permalink raw reply [flat|nested] 68+ messages in thread
* Re: BUG: GCC-4.4.x changes the function frame on some functions
2009-11-19 15:49 ` Richard Guenther
@ 2009-11-19 15:52 ` Richard Guenther
2009-11-19 17:37 ` Andi Kleen
1 sibling, 0 replies; 68+ messages in thread
From: Richard Guenther @ 2009-11-19 15:52 UTC (permalink / raw)
To: H. Peter Anvin
Cc: Thomas Gleixner, Ingo Molnar, LKML, Andrew Morton,
Heiko Carstens, feng.tang, Fr??d??ric Weisbecker, Steven Rostedt,
Peter Zijlstra, jakub, gcc
On Thu, Nov 19, 2009 at 4:49 PM, Richard Guenther
<richard.guenther@gmail.com> wrote:
> On Thu, Nov 19, 2009 at 4:45 PM, H. Peter Anvin <hpa@zytor.com> wrote:
>> On 11/19/2009 07:37 AM, Thomas Gleixner wrote:
>>>
>>> modified function start on a handful of functions only seen with gcc
>>> 4.4.x on x86 32 bit:
>>>
>>> push %edi
>>> lea 0x8(%esp),%edi
>>> and $0xfffffff0,%esp
>>> pushl -0x4(%edi)
>>> push %ebp
>>> mov %esp,%ebp
>>> ...
>>> call mcount
>>>
>>
>> The real questions is why we're aligning the stack in the kernel. It is
>> probably not what we want -- we don't use SSE for anything but a handful
>> of special cases in the kernel, and we don't want the overhead.
>
> It's likely because you have long long vars on the stack which is
> faster when they are aligned. -mno-stackrealign may do what you
> want (or may not, I have not checked). I assume you already
> use -mpreferred-stack-boundary=2.
Just checking it seems you must be using -mincoming-stack-boundary=2
instead but keep the preferred stack boundary at 4.
Richard.
^ permalink raw reply [flat|nested] 68+ messages in thread
* Re: BUG: GCC-4.4.x changes the function frame on some functions
2009-11-19 15:45 ` H. Peter Anvin
@ 2009-11-19 15:49 ` Richard Guenther
2009-11-19 15:52 ` Richard Guenther
2009-11-19 17:37 ` Andi Kleen
0 siblings, 2 replies; 68+ messages in thread
From: Richard Guenther @ 2009-11-19 15:49 UTC (permalink / raw)
To: H. Peter Anvin
Cc: Thomas Gleixner, Ingo Molnar, LKML, Andrew Morton,
Heiko Carstens, feng.tang, Fr??d??ric Weisbecker, Steven Rostedt,
Peter Zijlstra, jakub, gcc
On Thu, Nov 19, 2009 at 4:45 PM, H. Peter Anvin <hpa@zytor.com> wrote:
> On 11/19/2009 07:37 AM, Thomas Gleixner wrote:
>>
>> modified function start on a handful of functions only seen with gcc
>> 4.4.x on x86 32 bit:
>>
>> push %edi
>> lea 0x8(%esp),%edi
>> and $0xfffffff0,%esp
>> pushl -0x4(%edi)
>> push %ebp
>> mov %esp,%ebp
>> ...
>> call mcount
>>
>
> The real questions is why we're aligning the stack in the kernel. It is
> probably not what we want -- we don't use SSE for anything but a handful
> of special cases in the kernel, and we don't want the overhead.
It's likely because you have long long vars on the stack which is
faster when they are aligned. -mno-stackrealign may do what you
want (or may not, I have not checked). I assume you already
use -mpreferred-stack-boundary=2.
Richard.
^ permalink raw reply [flat|nested] 68+ messages in thread
* Re: BUG: GCC-4.4.x changes the function frame on some functions
2009-11-19 15:37 ` BUG: GCC-4.4.x changes the function frame on some functions Thomas Gleixner
2009-11-19 15:44 ` Andrew Haley
@ 2009-11-19 15:45 ` H. Peter Anvin
2009-11-19 15:49 ` Richard Guenther
2009-11-19 17:39 ` Linus Torvalds
2009-11-19 18:20 ` Andrew Haley
3 siblings, 1 reply; 68+ messages in thread
From: H. Peter Anvin @ 2009-11-19 15:45 UTC (permalink / raw)
To: Thomas Gleixner
Cc: Ingo Molnar, LKML, Andrew Morton, Heiko Carstens, feng.tang,
Fr??d??ric Weisbecker, Steven Rostedt, Peter Zijlstra, jakub,
gcc
On 11/19/2009 07:37 AM, Thomas Gleixner wrote:
>
> modified function start on a handful of functions only seen with gcc
> 4.4.x on x86 32 bit:
>
> push %edi
> lea 0x8(%esp),%edi
> and $0xfffffff0,%esp
> pushl -0x4(%edi)
> push %ebp
> mov %esp,%ebp
> ...
> call mcount
>
The real questions is why we're aligning the stack in the kernel. It is
probably not what we want -- we don't use SSE for anything but a handful
of special cases in the kernel, and we don't want the overhead.
-hpa
--
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel. I don't speak on their behalf.
^ permalink raw reply [flat|nested] 68+ messages in thread
* Re: BUG: GCC-4.4.x changes the function frame on some functions
2009-11-19 15:37 ` BUG: GCC-4.4.x changes the function frame on some functions Thomas Gleixner
@ 2009-11-19 15:44 ` Andrew Haley
2009-11-19 15:54 ` H. Peter Anvin
` (3 more replies)
2009-11-19 15:45 ` H. Peter Anvin
` (2 subsequent siblings)
3 siblings, 4 replies; 68+ messages in thread
From: Andrew Haley @ 2009-11-19 15:44 UTC (permalink / raw)
To: Thomas Gleixner
Cc: Ingo Molnar, H. Peter Anvin, LKML, Andrew Morton, Heiko Carstens,
feng.tang, Fr??d??ric Weisbecker, Steven Rostedt, Peter Zijlstra,
jakub, gcc
Thomas Gleixner wrote:
> On Thu, 19 Nov 2009, Thomas Gleixner wrote:
>
> Can the GCC folks please shed some light on this:
>
> standard function start:
>
> push %ebp
> mov %esp, %ebp
> ....
> call mcount
>
> modified function start on a handful of functions only seen with gcc
> 4.4.x on x86 32 bit:
>
> push %edi
> lea 0x8(%esp),%edi
> and $0xfffffff0,%esp
> pushl -0x4(%edi)
> push %ebp
> mov %esp,%ebp
> ...
> call mcount
>
> This modification leads to a hard to solve problem in the kernel
> function graph tracer which assumes that the stack looks like:
>
> return address
> saved ebp
>
> With the modified function start sequence this is not longer true and
> the manipulation of the return address on the stack fails silently.
>
> Neither gcc 4.3 nor gcc 3.4 are generating such function frames, so it
> looks like a gcc 4.4.x feature.
>
> There is no real obvious reason why the edi magic needs to be done
> _before_
>
> push %ebp
> mov %esp,%ebp
Sure there is: unless you do the adjustment first %ebp won't be 16-aligned.
We're aligning the stack properly, as per the ABI requirements. Can't
you just fix the tracer?
Andrew.
^ permalink raw reply [flat|nested] 68+ messages in thread
* Re: BUG: GCC-4.4.x changes the function frame on some functions
2009-11-19 14:30 ` BUG: function graph tracer function frame assumptions Thomas Gleixner
@ 2009-11-19 15:37 ` Thomas Gleixner
2009-11-19 15:44 ` Andrew Haley
` (3 more replies)
0 siblings, 4 replies; 68+ messages in thread
From: Thomas Gleixner @ 2009-11-19 15:37 UTC (permalink / raw)
To: Ingo Molnar
Cc: H. Peter Anvin, LKML, Andrew Morton, Heiko Carstens, feng.tang,
Fr??d??ric Weisbecker, Steven Rostedt, Peter Zijlstra, jakub,
gcc
On Thu, 19 Nov 2009, Thomas Gleixner wrote:
Can the GCC folks please shed some light on this:
standard function start:
push %ebp
mov %esp, %ebp
....
call mcount
modified function start on a handful of functions only seen with gcc
4.4.x on x86 32 bit:
push %edi
lea 0x8(%esp),%edi
and $0xfffffff0,%esp
pushl -0x4(%edi)
push %ebp
mov %esp,%ebp
...
call mcount
This modification leads to a hard to solve problem in the kernel
function graph tracer which assumes that the stack looks like:
return address
saved ebp
With the modified function start sequence this is not longer true and
the manipulation of the return address on the stack fails silently.
Neither gcc 4.3 nor gcc 3.4 are generating such function frames, so it
looks like a gcc 4.4.x feature.
There is no real obvious reason why the edi magic needs to be done
_before_
push %ebp
mov %esp,%ebp
Thanks,
tglx
^ permalink raw reply [flat|nested] 68+ messages in thread
end of thread, other threads:[~2009-11-23 9:54 UTC | newest]
Thread overview: 68+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2009-11-19 21:14 BUG: GCC-4.4.x changes the function frame on some functions H. Peter Anvin
2009-11-19 21:25 ` Jeff Law
2009-11-19 22:43 ` Steven Rostedt
2009-11-19 23:58 ` Jeff Law
2009-11-20 0:36 ` Thomas Gleixner
2009-11-20 0:59 ` Linus Torvalds
2009-11-20 1:27 ` Thomas Gleixner
2009-11-20 2:14 ` Thomas Gleixner
2009-11-20 13:09 ` [tip:x86/urgent] x86: Prevent GCC 4.4.x (pentium-mmx et al) function prologue wreckage tip-bot for Thomas Gleixner
2009-11-20 1:29 ` BUG: GCC-4.4.x changes the function frame on some functions H. Peter Anvin
2009-11-20 5:36 ` Ingo Molnar
2009-11-20 12:04 ` Andrew Haley
2009-11-20 12:22 ` Andrew Haley
-- strict thread matches above, loose matches on Subject: below --
2009-11-19 20:48 H. Peter Anvin
2009-11-18 19:30 [patch for 2.6.32? 1/3] hrtimers: remove the "timer_stats_active" check when setting the start info Thomas Gleixner
2009-11-18 20:24 ` [tip:timers/urgent] hrtimer: Fix /proc/timer_list regression tip-bot for Feng Tang
2009-11-19 7:20 ` Ingo Molnar
2009-11-19 10:05 ` Thomas Gleixner
2009-11-19 14:30 ` BUG: function graph tracer function frame assumptions Thomas Gleixner
2009-11-19 15:37 ` BUG: GCC-4.4.x changes the function frame on some functions Thomas Gleixner
2009-11-19 15:44 ` Andrew Haley
2009-11-19 15:54 ` H. Peter Anvin
2009-11-19 15:57 ` Richard Guenther
2009-11-19 16:02 ` Steven Rostedt
2009-11-19 16:11 ` H. Peter Anvin
2009-11-19 16:19 ` Frederic Weisbecker
2009-11-19 16:06 ` Thomas Gleixner
2009-11-19 16:17 ` Andrew Haley
2009-11-19 16:43 ` Thomas Gleixner
2009-11-19 16:12 ` Steven Rostedt
2009-11-19 15:45 ` H. Peter Anvin
2009-11-19 15:49 ` Richard Guenther
2009-11-19 15:52 ` Richard Guenther
2009-11-19 17:37 ` Andi Kleen
2009-11-19 17:39 ` Linus Torvalds
2009-11-19 17:51 ` Thomas Gleixner
2009-11-19 17:59 ` Steven Rostedt
2009-11-19 18:03 ` Richard Guenther
2009-11-19 18:22 ` Andrew Haley
2009-11-19 18:41 ` Linus Torvalds
2009-11-19 18:43 ` Linus Torvalds
2009-11-19 18:54 ` Linus Torvalds
2009-11-19 19:01 ` Thomas Gleixner
2009-11-23 9:16 ` Jakub Jelinek
2009-11-23 9:51 ` Thomas Gleixner
2009-11-19 19:10 ` David Daney
2009-11-19 19:28 ` Steven Rostedt
2009-11-19 19:46 ` Frederic Weisbecker
2009-11-19 19:54 ` Kai Tietz
2009-11-19 20:05 ` Frederic Weisbecker
2009-11-19 20:05 ` Steven Rostedt
2009-11-19 20:17 ` Steven Rostedt
2009-11-19 20:28 ` Frederic Weisbecker
2009-11-19 20:25 ` Frederic Weisbecker
2009-11-19 20:36 ` Linus Torvalds
2009-11-19 20:44 ` Steven Rostedt
2009-11-19 19:50 ` H. Peter Anvin
2009-11-19 20:06 ` Linus Torvalds
2009-11-19 21:12 ` Jeff Law
2009-11-19 20:10 ` Steven Rostedt
2009-11-19 21:05 ` Jeff Law
2009-11-19 18:31 ` Thomas Gleixner
2009-11-19 18:38 ` Linus Torvalds
2009-11-19 18:47 ` Ingo Molnar
2009-11-19 19:06 ` Steven Rostedt
2009-11-19 19:50 ` Ingo Molnar
2009-11-19 20:36 ` Thomas Gleixner
2009-11-19 18:20 ` Andrew Haley
2009-11-19 18:33 ` Steven Rostedt
2009-11-19 18:36 ` Andrew Pinski
2009-11-19 18:36 ` Andrew Haley
2009-11-19 18:37 ` H. Peter Anvin
2009-11-19 18:39 ` Thomas Gleixner
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.