* [RFC] gcc feature request: Moving blocks into sections (68+ messages in thread)
From: Steven Rostedt @ 2013-08-05 16:55 UTC
To: LKML, gcc
Cc: Linus Torvalds, Ingo Molnar, Mathieu Desnoyers, H. Peter Anvin, Thomas Gleixner, David Daney, Behan Webster, Peter Zijlstra

[ sent to both Linux kernel mailing list and to gcc list ]

I was looking at some of the old code I still have marked in my TODO list that I never pushed to get mainlined. One of them is to move tracepoint logic out of the fast path to get rid of the stress that it imposes on the icache.

Almost a full year ago, Mathieu suggested something like:

	if (unlikely(x)) __attribute__((section(".unlikely"))) {
		...
	} else __attribute__((section(".likely"))) {
		...
	}

https://lkml.org/lkml/2012/8/9/658

Which got me thinking: how hard would it be to put a block in its own section? Like what Mathieu suggested, but it doesn't have to be ".unlikely":

	if (x) __attribute__((section(".foo"))) {
		/* do something */
	}

Then have in the assembly, simply:

		test x
		beq 2f
	1:
		/* continue */
		ret
	2:	jmp foo1
	3:	jmp 1b

Then in section ".foo":

	foo1:
		/* do something */
		jmp 3b

Perhaps we can't use the section attribute, and would need to create a new attribute instead. Perhaps a __jmp_section__ or whatever (I'm horrible with names).

Is this a possibility? If it is, we could get a lot of code out of the fast path: things like stats and tracing, which are mostly off by default. I would imagine that we would get better performance by doing this, especially as tracepoints are being added all over the place.

Thanks,

-- Steve
* Re: [RFC] gcc feature request: Moving blocks into sections
From: H. Peter Anvin @ 2013-08-05 17:02 UTC
To: Steven Rostedt
Cc: LKML, gcc, Linus Torvalds, Ingo Molnar, Mathieu Desnoyers, Thomas Gleixner, David Daney, Behan Webster, Peter Zijlstra

On 08/05/2013 09:55 AM, Steven Rostedt wrote:
>
> Almost a full year ago, Mathieu suggested something like:
>
> if (unlikely(x)) __attribute__((section(".unlikely"))) {
> ...
> } else __attribute__((section(".likely"))) {
> ...
> }
>
> https://lkml.org/lkml/2012/8/9/658
>
> Which got me thinking. How hard would it be to put a block in its own
> section. Like what Mathieu suggested, but it doesn't have to be
> ".unlikely".
>
> if (x) __attribute__((section(".foo"))) {
> /* do something */
> }
>

One concern I have is how this kind of code would work when embedded inside a function which already has a section attribute. This could easily cause really weird bugs when someone "optimizes" an inline or macro and breaks a single call site...

	-hpa
* Re: [RFC] gcc feature request: Moving blocks into sections
From: Steven Rostedt @ 2013-08-05 17:24 UTC
To: H. Peter Anvin
Cc: LKML, gcc, Linus Torvalds, Ingo Molnar, Mathieu Desnoyers, Thomas Gleixner, David Daney, Behan Webster, Peter Zijlstra

On Mon, 2013-08-05 at 10:02 -0700, H. Peter Anvin wrote:
> > if (x) __attribute__((section(".foo"))) {
> > /* do something */
> > }
> >
>
> One concern I have is how this kind of code would work when embedded
> inside a function which already has a section attribute. This could
> easily cause really weird bugs when someone "optimizes" an inline or
> macro and breaks a single call site...

I would say that it overrides the section it is embedded in, basically the way a .pushsection and .popsection pair would work. What bugs do you think would happen? Sure, if this were used in an .init section the code would sit around after boot up, but I'm sure modules could handle it properly. What other uses of the section attribute are there for code? I'm aware of locks and sched using it, but that's more for debugging purposes, and even there the worst thing I see is that a debug report won't say that the code is in the section.

We do a lot of tricks with sections in the Linux kernel, so I too share your concern. But even with that, if we audit all the use cases, we may still be able to do this safely. This is why I'm asking for comments :-)

-- Steve
* Re: [RFC] gcc feature request: Moving blocks into sections
From: Linus Torvalds @ 2013-08-05 17:12 UTC
To: Steven Rostedt
Cc: LKML, gcc, Ingo Molnar, Mathieu Desnoyers, H. Peter Anvin, Thomas Gleixner, David Daney, Behan Webster, Peter Zijlstra

On Mon, Aug 5, 2013 at 9:55 AM, Steven Rostedt <rostedt@goodmis.org> wrote:
>
> Almost a full year ago, Mathieu suggested something like:
>
> if (unlikely(x)) __attribute__((section(".unlikely"))) {
> ...
> } else __attribute__((section(".likely"))) {
> ...
> }

It's almost certainly a horrible idea.

First off, we have very few things that are *so* unlikely that they never get executed. Putting things in a separate section would actually be really bad.

Secondly, you don't want a separate section anyway for any normal kernel code, since you want short jumps if possible (pretty much every single architecture out there has a concept of shorter jumps that are noticeably cheaper than long ones). You want the unlikely code to be out-of-line, but still *close*. Which is largely what gcc already does (except if you use "-Os", which disables all the basic block movement and thus makes "likely/unlikely" pointless to begin with).

There are some situations where you'd want extremely unlikely code to really be elsewhere, but they are rare as hell, and mostly in user code where you might try to avoid demand-loading such code entirely.

So give up on sections. They are a bad idea for anything except the things we already use them for. Sure, you can try to fix the problems with sections with link-time optimization work and a *lot* of small individual sections (the way per-function sections work already), but that's basically just undoing the stupidity of using sections to begin with.

		Linus
* Re: [RFC] gcc feature request: Moving blocks into sections
From: Linus Torvalds @ 2013-08-05 17:15 UTC
To: Steven Rostedt
Cc: LKML, gcc, Ingo Molnar, Mathieu Desnoyers, H. Peter Anvin, Thomas Gleixner, David Daney, Behan Webster, Peter Zijlstra

On Mon, Aug 5, 2013 at 10:12 AM, Linus Torvalds <torvalds@linux-foundation.org> wrote:
>
> Secondly, you don't want a separate section anyway for any normal
> kernel code, since you want short jumps if possible

Just to clarify: the short jump is important regardless of how unlikely the code you're jumping to is, since even if you'd be jumping to very unlikely ("never executed") code, the branch to that code is itself in the hot path.

And the difference between a two-byte short jump to the end of a short function, and a five-byte long jump (to pick the x86 case) is quite noticeable.

Other cases do long jumps by jumping to a thunk, and so the "hot case" is unaffected, but at least one common architecture very much sees the difference in the likely code.

		Linus
* Re: [RFC] gcc feature request: Moving blocks into sections
From: Steven Rostedt @ 2013-08-05 17:55 UTC
To: Linus Torvalds
Cc: LKML, gcc, Ingo Molnar, Mathieu Desnoyers, H. Peter Anvin, Thomas Gleixner, David Daney, Behan Webster, Peter Zijlstra, Herbert Xu

On Mon, 2013-08-05 at 10:12 -0700, Linus Torvalds wrote:
> First off, we have very few things that are *so* unlikely that they
> never get executed. Putting things in a separate section would
> actually be really bad.

My main concern is with tracepoints, which on 90% (or more) of systems running Linux are completely off and basically just dead code, until someone wants to see what's happening and enables them.

> Secondly, you don't want a separate section anyway for any normal
> kernel code, since you want short jumps if possible (pretty much every
> single architecture out there has a concept of shorter jumps that are
> noticeably cheaper than long ones). You want the unlikely code to be
> out-of-line, but still *close*. Which is largely what gcc already does
> (except if you use "-Os", which disables all the basic block movement
> and thus makes "likely/unlikely" pointless to begin with).
>
> There are some situations where you'd want extremely unlikely code to
> really be elsewhere, but they are rare as hell, and mostly in user
> code where you might try to avoid demand-loading such code entirely.

Well, as tracepoints are being added quite a bit in Linux, my concern is with the inlined functions that they bring. With jump labels they are disabled in a very unlikely way (the static_key_false() is a nop to skip the code, and is dynamically enabled to a jump).

I did a "make kernel/sched/core.i" to get what we have in the current sched_switch code:

	static inline __attribute__((no_instrument_function)) void
	trace_sched_switch(struct task_struct *prev, struct task_struct *next)
	{
		if (static_key_false(&__tracepoint_sched_switch.key)) do {
			struct tracepoint_func *it_func_ptr;
			void *it_func;
			void *__data;

			rcu_read_lock_sched_notrace();
			it_func_ptr = ({
				typeof(*((&__tracepoint_sched_switch)->funcs)) *_________p1 =
					(typeof(*((&__tracepoint_sched_switch)->funcs)) *)
					(*(volatile typeof(((&__tracepoint_sched_switch)->funcs)) *)
					 &(((&__tracepoint_sched_switch)->funcs)));
				do {
					static bool __attribute__((__section__(".data.unlikely"))) __warned;
					if (debug_lockdep_rcu_enabled() && !__warned &&
					    !(rcu_read_lock_sched_held() || (0))) {
						__warned = true;
						lockdep_rcu_suspicious( , 153,
							"suspicious rcu_dereference_check()" " usage");
					}
				} while (0);
				((typeof(*((&__tracepoint_sched_switch)->funcs)) *)(_________p1));
			});
			if (it_func_ptr) {
				do {
					it_func = (it_func_ptr)->func;
					__data = (it_func_ptr)->data;
					((void(*)(void *__data, struct task_struct *prev,
						  struct task_struct *next))(it_func))(__data, prev, next);
				} while ((++it_func_ptr)->func);
			}
			rcu_read_unlock_sched_notrace();
		} while (0);
	}

I massaged it to look more readable. This is inlined right at the beginning of prepare_task_switch(). Now, most of this code should be moved to the end of the function by gcc (well, as you stated, -Os may not play nice here). And perhaps it's not that bad of an issue. That is, how much of the icache does this actually take up? Maybe we are lucky and it sits outside the icache of the hot path.

I still need to start running a bunch of benchmarks to see how much overhead these tracepoints cause. Herbert Xu brought up the concern about various latencies in the kernel, including tracing, in his ATTEND request on the kernel-discuss mailing list.

> So give up on sections. They are a bad idea for anything except the
> things we already use them for. Sure, you can try to fix the problems
> with sections with link-time optimization work and a *lot* of small
> individual sections (the way per-function sections work already), but
> that's basically just undoing the stupidity of using sections to begin
> with.

OK, this was just a suggestion. Perhaps my original patch, which moves this code into a real function so that trace_sched_switch() only contains the jump label and a call to another function that does all the work when enabled, is still a better idea. That is, if benchmarks prove that it's worth it. Instead of the above, my patches would make the code into:

	static inline __attribute__((no_instrument_function)) void
	trace_sched_switch(struct task_struct *prev, struct task_struct *next)
	{
		if (static_key_false(&__tracepoint_sched_switch.key))
			__trace_sched_switch(prev, next);
	}

That is, when this tracepoint is enabled, it will call another function that does the tracepoint work. The difference between this and the "section" hack I suggested is that this would use a "call"/"ret" when enabled instead of a "jmp"/"jmp".

-- Steve
* Re: [RFC] gcc feature request: Moving blocks into sections
From: Steven Rostedt @ 2013-08-05 18:11 UTC
To: Linus Torvalds
Cc: LKML, gcc, Ingo Molnar, Mathieu Desnoyers, H. Peter Anvin, Thomas Gleixner, David Daney, Behan Webster, Peter Zijlstra, Herbert Xu

On Mon, 2013-08-05 at 13:55 -0400, Steven Rostedt wrote:
> The difference between this and the
> "section" hack I suggested, is that this would use a "call"/"ret" when
> enabled instead of a "jmp"/"jmp".

I wonder if this is what Kris Kross meant in their song?

/me goes back to work...

-- Steve
* Re: [RFC] gcc feature request: Moving blocks into sections
From: H. Peter Anvin @ 2013-08-05 18:17 UTC
To: Steven Rostedt
Cc: Linus Torvalds, LKML, gcc, Ingo Molnar, Mathieu Desnoyers, Thomas Gleixner, David Daney, Behan Webster, Peter Zijlstra, Herbert Xu

On 08/05/2013 10:55 AM, Steven Rostedt wrote:
>
> Well, as tracepoints are being added quite a bit in Linux, my concern is
> with the inlined functions that they bring. With jump labels they are
> disabled in a very unlikely way (the static_key_false() is a nop to skip
> the code, and is dynamically enabled to a jump).
>

Have you considered using traps for tracepoints? A trapping instruction can be as small as a single byte. The downside, of course, is that it is extremely suppressed -- the trap is always expensive -- and you then have to do a lookup to find the target based on the originating IP.

	-hpa
* Re: [RFC] gcc feature request: Moving blocks into sections
From: Steven Rostedt @ 2013-08-05 18:23 UTC
To: H. Peter Anvin
Cc: Linus Torvalds, LKML, gcc, Ingo Molnar, Mathieu Desnoyers, Thomas Gleixner, David Daney, Behan Webster, Peter Zijlstra, Herbert Xu

On Mon, 2013-08-05 at 11:17 -0700, H. Peter Anvin wrote:
> Have you considered using traps for tracepoints? A trapping instruction
> can be as small as a single byte. The downside, of course, is that it
> is extremely suppressed -- the trap is always expensive -- and you then
> have to do a lookup to find the target based on the originating IP.

No, never considered it, nor would I. Those that use tracepoints do use them extensively, and adding traps like this would probably cause heisenbugs and make tracepoints useless.

Not to mention, how would we add a tracepoint to a trap handler?

-- Steve
* Re: [RFC] gcc feature request: Moving blocks into sections
From: H. Peter Anvin @ 2013-08-05 18:29 UTC
To: Steven Rostedt
Cc: Linus Torvalds, LKML, gcc, Ingo Molnar, Mathieu Desnoyers, Thomas Gleixner, David Daney, Behan Webster, Peter Zijlstra, Herbert Xu

On 08/05/2013 11:23 AM, Steven Rostedt wrote:
> No, never considered it, nor would I. Those that use tracepoints do use
> them extensively, and adding traps like this would probably cause
> heisenbugs and make tracepoints useless.
>
> Not to mention, how would we add a tracepoint to a trap handler?
>

Traps nest, that's why there is a stack. (OK, so you don't want to take the same trap inside the trap handler, but that code should be very limited.) The trap instruction just becomes a very short, but rather slow, call-return.

However, when you consider the cost you have to consider that the tracepoint is doing other work, so it may very well amortize out.

	-hpa
* Re: [RFC] gcc feature request: Moving blocks into sections
From: Steven Rostedt @ 2013-08-05 18:49 UTC
To: H. Peter Anvin
Cc: Linus Torvalds, LKML, gcc, Ingo Molnar, Mathieu Desnoyers, Thomas Gleixner, David Daney, Behan Webster, Peter Zijlstra, Herbert Xu

On Mon, 2013-08-05 at 11:29 -0700, H. Peter Anvin wrote:
> Traps nest, that's why there is a stack. (OK, so you don't want to take
> the same trap inside the trap handler, but that code should be very
> limited.) The trap instruction just becomes a very short, but rather
> slow, call-return.
>
> However, when you consider the cost you have to consider that the
> tracepoint is doing other work, so it may very well amortize out.

Also, how would you pass the parameters? Every tracepoint has its own parameters to pass to it. How would a trap know where to get "prev" and "next"?

-- Steve
* Re: [RFC] gcc feature request: Moving blocks into sections
From: H. Peter Anvin @ 2013-08-05 18:51 UTC
To: Steven Rostedt
Cc: Linus Torvalds, LKML, gcc, Ingo Molnar, Mathieu Desnoyers, Thomas Gleixner, David Daney, Behan Webster, Peter Zijlstra, Herbert Xu

On 08/05/2013 11:49 AM, Steven Rostedt wrote:
> Also, how would you pass the parameters? Every tracepoint has its own
> parameters to pass to it. How would a trap know where to get "prev"
> and "next"?
>

How do you do that now?

You have to do an IP lookup to find out what you are doing.

(Note: I wonder how much the parameter generation costs the tracepoints.)

	-hpa
* Re: [RFC] gcc feature request: Moving blocks into sections
From: Linus Torvalds @ 2013-08-05 19:01 UTC
To: H. Peter Anvin
Cc: Steven Rostedt, LKML, gcc, Ingo Molnar, Mathieu Desnoyers, Thomas Gleixner, David Daney, Behan Webster, Peter Zijlstra, Herbert Xu

On Mon, Aug 5, 2013 at 11:51 AM, H. Peter Anvin <hpa@linux.intel.com> wrote:
>>
>> Also, how would you pass the parameters? Every tracepoint has its own
>> parameters to pass to it. How would a trap know where to get "prev"
>> and "next"?
>
> How do you do that now?
>
> You have to do an IP lookup to find out what you are doing.

No, he just generates the code for the call and then uses a static_key to jump to it. So normally it's all out-of-line, and the only thing in the hot path is that 5-byte nop (which gets turned into a 5-byte jump when the tracing key is enabled).

Works fine, but the normally unused stubs end up mixing in the normal code segment. Which I actually think is fine, but right now we don't get the short-jump advantage from it (and there is likely some I$ disadvantage from just fragmentation of the code).

With two-byte jumps, you'd still get the I$ fragmentation (the argument generation and the call and the branch back would all be in the same code segment as the hot code), but that would be offset by the fact that at least the hot code itself could use a short jump when possible (ie a 2-byte nop rather than a 5-byte one).

Don't know which way it would go performance-wise. But it shouldn't need gcc changes, it just needs the static key branch/nop rewriting to be able to handle both sizes. I couldn't tell why Steven's series to do that was so complex, though - I only glanced through the patches.

		Linus
* Re: [RFC] gcc feature request: Moving blocks into sections
From: Mathieu Desnoyers @ 2013-08-05 19:54 UTC
To: Linus Torvalds
Cc: H. Peter Anvin, Steven Rostedt, LKML, gcc, Ingo Molnar, Thomas Gleixner, David Daney, Behan Webster, Peter Zijlstra, Herbert Xu

* Linus Torvalds (torvalds@linux-foundation.org) wrote:
[...]
> With two-byte jumps, you'd still get the I$ fragmentation (the
> argument generation and the call and the branch back would all be in
> the same code segment as the hot code), but that would be offset by
> the fact that at least the hot code itself could use a short jump when
> possible (ie a 2-byte nop rather than a 5-byte one).

I remember that choosing between 2- and 5-byte nops in the asm goto was tricky: it had something to do with the fact that gcc doesn't know the exact size of each instruction until further down within the compilation phases on architectures with variable instruction size like x86. If we have guarantees that the guessed size of each instruction is an upper bound on the instruction size, this could probably work, though.

Thoughts?

Thanks,

Mathieu

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com
* Re: [RFC] gcc feature request: Moving blocks into sections
From: Linus Torvalds @ 2013-08-05 19:57 UTC
To: Mathieu Desnoyers
Cc: H. Peter Anvin, Steven Rostedt, LKML, gcc, Ingo Molnar, Thomas Gleixner, David Daney, Behan Webster, Peter Zijlstra, Herbert Xu

On Mon, Aug 5, 2013 at 12:54 PM, Mathieu Desnoyers <mathieu.desnoyers@efficios.com> wrote:
>
> I remember that choosing between 2- and 5-byte nops in the asm goto was
> tricky: it had something to do with the fact that gcc doesn't know the
> exact size of each instruction until further down within compilation

Oh, you can't do it in the compiler, no. But you don't need to. The assembler will pick the right version if you just do "jmp target".

		Linus
* Re: [RFC] gcc feature request: Moving blocks into sections
From: Steven Rostedt @ 2013-08-05 20:02 UTC
To: Linus Torvalds
Cc: Mathieu Desnoyers, H. Peter Anvin, LKML, gcc, Ingo Molnar, Thomas Gleixner, David Daney, Behan Webster, Peter Zijlstra, Herbert Xu

On Mon, 2013-08-05 at 12:57 -0700, Linus Torvalds wrote:
> Oh, you can't do it in the compiler, no. But you don't need to. The
> assembler will pick the right version if you just do "jmp target".

Right, and that's exactly what my patches did.

-- Steve
* Re: [RFC] gcc feature request: Moving blocks into sections
From: Mathieu Desnoyers @ 2013-08-05 21:28 UTC
To: Linus Torvalds
Cc: H. Peter Anvin, Steven Rostedt, LKML, gcc, Ingo Molnar, Thomas Gleixner, David Daney, Behan Webster, Peter Zijlstra, Herbert Xu

* Linus Torvalds (torvalds@linux-foundation.org) wrote:
> Oh, you can't do it in the compiler, no. But you don't need to. The
> assembler will pick the right version if you just do "jmp target".

Yep.

Another thing that bothers me with Steven's approach is that decoding jumps generated by the compiler seems fragile IMHO.

x86 decoding proposed by https://lkml.org/lkml/2012/3/8/464:

+static int make_nop_x86(void *map, size_t const offset)
+{
+	unsigned char *op;
+	unsigned char *nop;
+	int size;
+
+	/* Determine which type of jmp this is 2 byte or 5. */
+	op = map + offset;
+	switch (*op) {
+	case 0xeb: /* 2 byte */
+		size = 2;
+		nop = ideal_nop2_x86;
+		break;
+	case 0xe9: /* 5 byte */
+		size = 5;
+		nop = ideal_nop;
+		break;
+	default:
+		die(NULL, "Bad jump label section (bad op %x)\n", *op);
+		__builtin_unreachable();
+	}

My thought is that the code above does not cover all jump encodings that can be generated by past, current and future x86 assemblers.

Another way around this issue might be to keep the instruction size within a non-allocated section:

static __always_inline bool arch_static_branch(struct static_key *key)
{
	asm goto("1:"
		"jmp %l[l_yes]\n\t"
		"2:"
		".pushsection __jump_table, \"aw\" \n\t"
		_ASM_ALIGN "\n\t"
		_ASM_PTR "1b, %l[l_yes], %c0 \n\t"
		".popsection \n\t"
		".pushsection __jump_table_ilen \n\t"
		_ASM_PTR "1b \n\t"	/* Address of the jmp */
		".byte 2b - 1b \n\t"	/* Size of the jmp instruction */
		".popsection \n\t"
		: : "i" (key) : : l_yes);
	return false;
l_yes:
	return true;
}

And use (2b - 1b) to know what size of no-op should be used, rather than relying on instruction decoding.

Thoughts?

Thanks,

Mathieu

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com
* Re: [RFC] gcc feature request: Moving blocks into sections
From: H. Peter Anvin @ 2013-08-05 21:43 UTC
To: Mathieu Desnoyers
Cc: Linus Torvalds, Steven Rostedt, LKML, gcc, Ingo Molnar, Thomas Gleixner, David Daney, Behan Webster, Peter Zijlstra, Herbert Xu

On 08/05/2013 02:28 PM, Mathieu Desnoyers wrote:
> Another thing that bothers me with Steven's approach is that decoding
> jumps generated by the compiler seems fragile IMHO.
>
> x86 decoding proposed by https://lkml.org/lkml/2012/3/8/464:
>
> [...]
>
> My thought is that the code above does not cover all jump encodings that
> can be generated by past, current and future x86 assemblers.
>

For unconditional jmp that should be pretty safe barring any fundamental changes to the instruction set, in which case we can enable it as needed, but for extra robustness it probably should skip prefix bytes.

	-hpa
* Re: [RFC] gcc feature request: Moving blocks into sections
From: Mathieu Desnoyers @ 2013-08-06 4:14 UTC
To: H. Peter Anvin
Cc: Linus Torvalds, Steven Rostedt, LKML, gcc, Ingo Molnar, Thomas Gleixner, David Daney, Behan Webster, Peter Zijlstra, Herbert Xu

* H. Peter Anvin (hpa@linux.intel.com) wrote:
> On 08/05/2013 02:28 PM, Mathieu Desnoyers wrote:
> [...]
>
> For unconditional jmp that should be pretty safe barring any fundamental
> changes to the instruction set, in which case we can enable it as
> needed, but for extra robustness it probably should skip prefix bytes.

On x86-32, some prefixes are actually meaningful. AFAIK, the 0x66 prefix is used for:

	E9 cw	jmp rel16	relative jump, only in 32-bit

Other prefixes can probably be safely skipped.

Another question is whether anything prevents the assembler from generating a jump near (absolute indirect), or a far jump. The code above seems to assume that we have either a short or a near relative jump.

Thoughts?

Thanks,

Mathieu

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com
* Re: [RFC] gcc feature request: Moving blocks into sections 2013-08-06 4:14 ` Mathieu Desnoyers @ 2013-08-06 4:28 ` H. Peter Anvin 0 siblings, 0 replies; 68+ messages in thread From: H. Peter Anvin @ 2013-08-06 4:28 UTC (permalink / raw) To: Mathieu Desnoyers Cc: Linus Torvalds, Steven Rostedt, LKML, gcc, Ingo Molnar, Thomas Gleixner, David Daney, Behan Webster, Peter Zijlstra, Herbert Xu On 08/05/2013 09:14 PM, Mathieu Desnoyers wrote: >> >> For unconditional jmp that should be pretty safe barring any fundamental >> changes to the instruction set, in which case we can enable it as >> needed, but for extra robustness it probably should skip prefix bytes. > > On x86-32, some prefixes are actually meaningful. AFAIK, the 0x66 prefix > is used for: > > E9 cw jmp rel16 relative jump, only in 32-bit > > Other prefixes can probably be safely skipped. > Yes. Some of them are used as hints or for MPX. > Another question is whether anything prevents the assembler from > generating a jump near (absolute indirect), or far jump. The code above > seems to assume that we have either a short or near relative jump. Absolutely something prevents! It would be a very serious error for the assembler to generate such instructions. -hpa ^ permalink raw reply [flat|nested] 68+ messages in thread
* Re: [RFC] gcc feature request: Moving blocks into sections 2013-08-05 21:43 ` H. Peter Anvin 2013-08-06 4:14 ` Mathieu Desnoyers @ 2013-08-06 16:15 ` Steven Rostedt 2013-08-06 16:19 ` H. Peter Anvin 1 sibling, 1 reply; 68+ messages in thread From: Steven Rostedt @ 2013-08-06 16:15 UTC (permalink / raw) To: H. Peter Anvin Cc: Mathieu Desnoyers, Linus Torvalds, LKML, gcc, Ingo Molnar, Thomas Gleixner, David Daney, Behan Webster, Peter Zijlstra, Herbert Xu On Mon, 2013-08-05 at 14:43 -0700, H. Peter Anvin wrote: > For unconditional jmp that should be pretty safe barring any fundamental > changes to the instruction set, in which case we can enable it as > needed, but for extra robustness it probably should skip prefix bytes. Would the assembler add prefix bytes to: jmp 1f 1: ?? -- Steve ^ permalink raw reply [flat|nested] 68+ messages in thread
* Re: [RFC] gcc feature request: Moving blocks into sections 2013-08-06 16:15 ` Steven Rostedt @ 2013-08-06 16:19 ` H. Peter Anvin 2013-08-06 16:26 ` Steven Rostedt 0 siblings, 1 reply; 68+ messages in thread From: H. Peter Anvin @ 2013-08-06 16:19 UTC (permalink / raw) To: Steven Rostedt Cc: Mathieu Desnoyers, Linus Torvalds, LKML, gcc, Ingo Molnar, Thomas Gleixner, David Daney, Behan Webster, Peter Zijlstra, Herbert Xu On 08/06/2013 09:15 AM, Steven Rostedt wrote: > On Mon, 2013-08-05 at 14:43 -0700, H. Peter Anvin wrote: > >> For unconditional jmp that should be pretty safe barring any fundamental >> changes to the instruction set, in which case we can enable it as >> needed, but for extra robustness it probably should skip prefix bytes. > > Would the assembler add prefix bytes to: > > jmp 1f > No, but if we ever end up doing MPX in the kernel, for example, we would have to put an MPX prefix on the jmp. -hpa ^ permalink raw reply [flat|nested] 68+ messages in thread
* Re: [RFC] gcc feature request: Moving blocks into sections 2013-08-06 16:19 ` H. Peter Anvin @ 2013-08-06 16:26 ` Steven Rostedt 2013-08-06 16:29 ` H. Peter Anvin 0 siblings, 1 reply; 68+ messages in thread From: Steven Rostedt @ 2013-08-06 16:26 UTC (permalink / raw) To: H. Peter Anvin Cc: Mathieu Desnoyers, Linus Torvalds, LKML, gcc, Ingo Molnar, Thomas Gleixner, David Daney, Behan Webster, Peter Zijlstra, Herbert Xu On Tue, 2013-08-06 at 09:19 -0700, H. Peter Anvin wrote: > On 08/06/2013 09:15 AM, Steven Rostedt wrote: > > On Mon, 2013-08-05 at 14:43 -0700, H. Peter Anvin wrote: > > > >> For unconditional jmp that should be pretty safe barring any fundamental > >> changes to the instruction set, in which case we can enable it as > >> needed, but for extra robustness it probably should skip prefix bytes. > > > > Would the assembler add prefix bytes to: > > > > jmp 1f > > > > No, but if we ever end up doing MPX in the kernel, for example, we would > have to put an MPX prefix on the jmp. Well then we just have to update the rest of the jump label code :-) -- Steve ^ permalink raw reply [flat|nested] 68+ messages in thread
* Re: [RFC] gcc feature request: Moving blocks into sections 2013-08-06 16:26 ` Steven Rostedt @ 2013-08-06 16:29 ` H. Peter Anvin 0 siblings, 0 replies; 68+ messages in thread From: H. Peter Anvin @ 2013-08-06 16:29 UTC (permalink / raw) To: Steven Rostedt Cc: Mathieu Desnoyers, Linus Torvalds, LKML, gcc, Ingo Molnar, Thomas Gleixner, David Daney, Behan Webster, Peter Zijlstra, Herbert Xu On 08/06/2013 09:26 AM, Steven Rostedt wrote: >> >> No, but if we ever end up doing MPX in the kernel, for example, we would >> have to put an MPX prefix on the jmp. > > Well then we just have to update the rest of the jump label code :-) > For MPX in the kernel, this would be a small part of the work...! -hpa ^ permalink raw reply [flat|nested] 68+ messages in thread
* Re: [RFC] gcc feature request: Moving blocks into sections 2013-08-05 21:28 ` Mathieu Desnoyers 2013-08-05 21:43 ` H. Peter Anvin @ 2013-08-05 21:44 ` Steven Rostedt 2013-08-05 22:08 ` Mathieu Desnoyers 1 sibling, 1 reply; 68+ messages in thread From: Steven Rostedt @ 2013-08-05 21:44 UTC (permalink / raw) To: Mathieu Desnoyers Cc: Linus Torvalds, H. Peter Anvin, LKML, gcc, Ingo Molnar, Thomas Gleixner, David Daney, Behan Webster, Peter Zijlstra, Herbert Xu On Mon, 2013-08-05 at 17:28 -0400, Mathieu Desnoyers wrote: > Another thing that bothers me with Steven's approach is that decoding > jumps generated by the compiler seems fragile IMHO. The encodings won't change. If they do, then old kernels will not run on new hardware. Now if it adds a third option to jmp, then we hit the "die" path and know right away that it won't work anymore. Then we fix it properly. > > x86 decoding proposed by https://lkml.org/lkml/2012/3/8/464 : > > +static int make_nop_x86(void *map, size_t const offset) > +{ > + unsigned char *op; > + unsigned char *nop; > + int size; > + > + /* Determine which type of jmp this is 2 byte or 5. */ > + op = map + offset; > + switch (*op) { > + case 0xeb: /* 2 byte */ > + size = 2; > + nop = ideal_nop2_x86; > + break; > + case 0xe9: /* 5 byte */ > + size = 5; > + nop = ideal_nop; > + break; > + default: > + die(NULL, "Bad jump label section (bad op %x)\n", *op); > + __builtin_unreachable(); > + } > > My though is that the code above does not cover all jump encodings that > can be generated by past, current and future x86 assemblers.
> > Another way around this issue might be to keep the instruction size > within a non-allocated section: > > static __always_inline bool arch_static_branch(struct static_key *key) > { > asm goto("1:" > "jmp %l[l_yes]\n\t" > "2:" > > ".pushsection __jump_table, \"aw\" \n\t" > _ASM_ALIGN "\n\t" > _ASM_PTR "1b, %l[l_yes], %c0 \n\t" > ".popsection \n\t" > > ".pushsection __jump_table_ilen \n\t" > _ASM_PTR "1b \n\t" /* Address of the jmp */ > ".byte 2b - 1b \n\t" /* Size of the jmp instruction */ > ".popsection \n\t" > > : : "i" (key) : : l_yes); > return false; > l_yes: > return true; > } > > And use (2b - 1b) to know what size of no-op should be used rather than > to rely on instruction decoding. > > Thoughts ? > Then we need to add yet another table of information to the kernel that needs to hang around. This goes with another kernel-discuss request talking about kernel data bloat. -- Steve ^ permalink raw reply [flat|nested] 68+ messages in thread
* Re: [RFC] gcc feature request: Moving blocks into sections 2013-08-05 21:44 ` Steven Rostedt @ 2013-08-05 22:08 ` Mathieu Desnoyers 0 siblings, 0 replies; 68+ messages in thread From: Mathieu Desnoyers @ 2013-08-05 22:08 UTC (permalink / raw) To: Steven Rostedt Cc: Linus Torvalds, H. Peter Anvin, LKML, gcc, Ingo Molnar, Thomas Gleixner, David Daney, Behan Webster, Peter Zijlstra, Herbert Xu * Steven Rostedt (rostedt@goodmis.org) wrote: > On Mon, 2013-08-05 at 17:28 -0400, Mathieu Desnoyers wrote: > [...] > > My though is that the code above does not cover all jump encodings that > > can be generated by past, current and future x86 assemblers. > > > > Another way around this issue might be to keep the instruction size > > within a non-allocated section: > > > > static __always_inline bool arch_static_branch(struct static_key *key) > > { > > asm goto("1:" > > "jmp %l[l_yes]\n\t" > > "2:" > > > > ".pushsection __jump_table, \"aw\" \n\t" > > _ASM_ALIGN "\n\t" > > _ASM_PTR "1b, %l[l_yes], %c0 \n\t" > > ".popsection \n\t" > > > > ".pushsection __jump_table_ilen \n\t" > > _ASM_PTR "1b \n\t" /* Address of the jmp */ > > ".byte 2b - 1b \n\t" /* Size of the jmp instruction */ > > ".popsection \n\t" > > > > : : "i" (key) : : l_yes); > > return false; > > l_yes: > > return true; > > } > > > > And use (2b - 1b) to know what size of no-op should be used rather than > > to rely on instruction decoding. > > > > Thoughts ? > > > > Then we need to add yet another table of information to the kernel that > needs to hang around. This goes with another kernel-discuss request > talking about kernel data bloat. Perhaps this section could be simply removed by the post-link stage ? Thanks, Mathieu > > -- Steve > > -- Mathieu Desnoyers EfficiOS Inc. http://www.efficios.com ^ permalink raw reply [flat|nested] 68+ messages in thread
* Re: [RFC] gcc feature request: Moving blocks into sections 2013-08-05 18:51 ` H. Peter Anvin 2013-08-05 19:01 ` Linus Torvalds @ 2013-08-05 19:09 ` Steven Rostedt 1 sibling, 0 replies; 68+ messages in thread From: Steven Rostedt @ 2013-08-05 19:09 UTC (permalink / raw) To: H. Peter Anvin Cc: Linus Torvalds, LKML, gcc, Ingo Molnar, Mathieu Desnoyers, Thomas Gleixner, David Daney, Behan Webster, Peter Zijlstra, Herbert Xu On Mon, 2013-08-05 at 11:51 -0700, H. Peter Anvin wrote: > On 08/05/2013 11:49 AM, Steven Rostedt wrote: > > On Mon, 2013-08-05 at 11:29 -0700, H. Peter Anvin wrote: > > > >> Traps nest, that's why there is a stack. (OK, so you don't want to take > >> the same trap inside the trap handler, but that code should be very > >> limited.) The trap instruction just becomes very short, but rather > >> slow, call-return. > >> > >> However, when you consider the cost you have to consider that the > >> tracepoint is doing other work, so it may very well amortize out. > > > > Also, how would you pass the parameters? Every tracepoint has its own > > parameters to pass to it. How would a trap know what where to get "prev" > > and "next"? > > > > How do you do that now? > > You have to do an IP lookup to find out what you are doing. ?? You mean to do the enabling? Sure, but not after the code is enabled. There's no lookup. It just calls functions directly. > > (Note: I wonder how much the parameter generation costs the tracepoints.) The same as doing a function call. -- Steve ^ permalink raw reply [flat|nested] 68+ messages in thread
* Re: [RFC] gcc feature request: Moving blocks into sections 2013-08-05 17:55 ` Steven Rostedt 2013-08-05 18:11 ` Steven Rostedt 2013-08-05 18:17 ` H. Peter Anvin @ 2013-08-05 18:20 ` Linus Torvalds 2013-08-05 18:24 ` Linus Torvalds ` (2 more replies) 2 siblings, 3 replies; 68+ messages in thread From: Linus Torvalds @ 2013-08-05 18:20 UTC (permalink / raw) To: Steven Rostedt Cc: LKML, gcc, Ingo Molnar, Mathieu Desnoyers, H. Peter Anvin, Thomas Gleixner, David Daney, Behan Webster, Peter Zijlstra, Herbert Xu On Mon, Aug 5, 2013 at 10:55 AM, Steven Rostedt <rostedt@goodmis.org> wrote: > > My main concern is with tracepoints. Which on 90% (or more) of systems > running Linux, is completely off, and basically just dead code, until > someone wants to see what's happening and enables them. The static_key_false() approach with minimal inlining sounds like a much better approach overall. Sure, it might add a call/ret, but it adds it to just the unlikely tracepoint taken path. Of course, it would be good to optimize static_key_false() itself - right now those static key jumps are always five bytes, and while they get nopped out, it would still be nice if there was some way to have just a two-byte nop (turning into a short branch) *if* we can reach another jump that way. For small functions that would be lovely. Oh well. Linus ^ permalink raw reply [flat|nested] 68+ messages in thread
* Re: [RFC] gcc feature request: Moving blocks into sections 2013-08-05 18:20 ` Linus Torvalds @ 2013-08-05 18:24 ` Linus Torvalds 2013-08-05 18:34 ` Linus Torvalds 2013-08-05 18:33 ` H. Peter Anvin 2013-08-05 18:39 ` Steven Rostedt 2 siblings, 1 reply; 68+ messages in thread From: Linus Torvalds @ 2013-08-05 18:24 UTC (permalink / raw) To: Steven Rostedt Cc: LKML, gcc, Ingo Molnar, Mathieu Desnoyers, H. Peter Anvin, Thomas Gleixner, David Daney, Behan Webster, Peter Zijlstra, Herbert Xu On Mon, Aug 5, 2013 at 11:20 AM, Linus Torvalds <torvalds@linux-foundation.org> wrote: > > The static_key_false() approach with minimal inlining sounds like a > much better approach overall. Sorry, I misunderstood your thing. That's actually what you want that section thing for, because right now you cannot generate the argument expansion otherwise. Ugh. I can see the attraction of your section thing for that case, I just get the feeling that we should be able to do better somehow. Linus ^ permalink raw reply [flat|nested] 68+ messages in thread
* Re: [RFC] gcc feature request: Moving blocks into sections 2013-08-05 18:24 ` Linus Torvalds @ 2013-08-05 18:34 ` Linus Torvalds 2013-08-05 18:38 ` H. Peter Anvin ` (2 more replies) 0 siblings, 3 replies; 68+ messages in thread From: Linus Torvalds @ 2013-08-05 18:34 UTC (permalink / raw) To: Steven Rostedt Cc: LKML, gcc, Ingo Molnar, Mathieu Desnoyers, H. Peter Anvin, Thomas Gleixner, David Daney, Behan Webster, Peter Zijlstra, Herbert Xu On Mon, Aug 5, 2013 at 11:24 AM, Linus Torvalds <torvalds@linux-foundation.org> wrote: > > Ugh. I can see the attraction of your section thing for that case, I > just get the feeling that we should be able to do better somehow. Hmm.. Quite frankly, Steven, for your use case I think you actually want the C goto *labels* associated with a section. Which sounds like it might be a cleaner syntax than making it about the basic block anyway. Linus ^ permalink raw reply [flat|nested] 68+ messages in thread
* Re: [RFC] gcc feature request: Moving blocks into sections 2013-08-05 18:34 ` Linus Torvalds @ 2013-08-05 18:38 ` H. Peter Anvin 2013-08-05 19:04 ` Steven Rostedt 2013-08-05 19:40 ` Marek Polacek 2 siblings, 0 replies; 68+ messages in thread From: H. Peter Anvin @ 2013-08-05 18:38 UTC (permalink / raw) To: Linus Torvalds Cc: Steven Rostedt, LKML, gcc, Ingo Molnar, Mathieu Desnoyers, Thomas Gleixner, David Daney, Behan Webster, Peter Zijlstra, Herbert Xu On 08/05/2013 11:34 AM, Linus Torvalds wrote: > On Mon, Aug 5, 2013 at 11:24 AM, Linus Torvalds > <torvalds@linux-foundation.org> wrote: >> >> Ugh. I can see the attraction of your section thing for that case, I >> just get the feeling that we should be able to do better somehow. > > Hmm.. Quite frankly, Steven, for your use case I think you actually > want the C goto *labels* associated with a section. Which sounds like > it might be a cleaner syntax than making it about the basic block > anyway. > A label wouldn't have an endpoint, though... -hpa ^ permalink raw reply [flat|nested] 68+ messages in thread
* Re: [RFC] gcc feature request: Moving blocks into sections 2013-08-05 18:34 ` Linus Torvalds 2013-08-05 18:38 ` H. Peter Anvin @ 2013-08-05 19:04 ` Steven Rostedt 2013-08-05 19:40 ` Marek Polacek 2 siblings, 0 replies; 68+ messages in thread From: Steven Rostedt @ 2013-08-05 19:04 UTC (permalink / raw) To: Linus Torvalds Cc: LKML, gcc, Ingo Molnar, Mathieu Desnoyers, H. Peter Anvin, Thomas Gleixner, David Daney, Behan Webster, Peter Zijlstra, Herbert Xu On Mon, 2013-08-05 at 11:34 -0700, Linus Torvalds wrote: > On Mon, Aug 5, 2013 at 11:24 AM, Linus Torvalds > <torvalds@linux-foundation.org> wrote: > > > > Ugh. I can see the attraction of your section thing for that case, I > > just get the feeling that we should be able to do better somehow. > > Hmm.. Quite frankly, Steven, for your use case I think you actually > want the C goto *labels* associated with a section. Which sounds like > it might be a cleaner syntax than making it about the basic block > anyway. I would love to. But IIRC, the asm_goto() has some strict constraints. We may be able to jump to a different section, but we have no way of coming back. Not to mention, you must tell the asm goto() what label you may be jumping to. I don't know how safe something like this may be: static inline trace_sched_switch(prev, next) { asm goto("jmp foo1\n" : : foo2); foo1: return; asm goto(".pushsection\n" "section \".foo\"\n"); foo2: __trace_sched_switch(prev, next); asm goto("jmp foo1" ".popsection\n" : : foo1); } The above looks too fragile for my taste. I'm afraid gcc will move stuff out of those "asm goto" locations, and make things just fail. But I can play with this, but I don't like it. -- Steve ^ permalink raw reply [flat|nested] 68+ messages in thread
* Re: [RFC] gcc feature request: Moving blocks into sections 2013-08-05 18:34 ` Linus Torvalds 2013-08-05 18:38 ` H. Peter Anvin 2013-08-05 19:04 ` Steven Rostedt @ 2013-08-05 19:40 ` Marek Polacek 2013-08-05 19:56 ` Linus Torvalds 2013-08-05 19:57 ` Jason Baron 2 siblings, 2 replies; 68+ messages in thread From: Marek Polacek @ 2013-08-05 19:40 UTC (permalink / raw) To: Linus Torvalds Cc: Steven Rostedt, LKML, gcc, Ingo Molnar, Mathieu Desnoyers, H. Peter Anvin, Thomas Gleixner, David Daney, Behan Webster, Peter Zijlstra, Herbert Xu On Mon, Aug 05, 2013 at 11:34:55AM -0700, Linus Torvalds wrote: > On Mon, Aug 5, 2013 at 11:24 AM, Linus Torvalds > <torvalds@linux-foundation.org> wrote: > > > > Ugh. I can see the attraction of your section thing for that case, I > > just get the feeling that we should be able to do better somehow. > > Hmm.. Quite frankly, Steven, for your use case I think you actually > want the C goto *labels* associated with a section. Which sounds like > it might be a cleaner syntax than making it about the basic block > anyway. FWIW, we also support hot/cold attributes for labels, thus e.g. if (bar ()) goto A; /* ... */ A: __attribute__((cold)) /* ... */ I don't know whether that might be useful for what you want or not though... Marek ^ permalink raw reply [flat|nested] 68+ messages in thread
* Re: [RFC] gcc feature request: Moving blocks into sections 2013-08-05 19:40 ` Marek Polacek @ 2013-08-05 19:56 ` Linus Torvalds 2013-08-05 19:57 ` Jason Baron 1 sibling, 0 replies; 68+ messages in thread From: Linus Torvalds @ 2013-08-05 19:56 UTC (permalink / raw) To: Marek Polacek Cc: Steven Rostedt, LKML, gcc, Ingo Molnar, Mathieu Desnoyers, H. Peter Anvin, Thomas Gleixner, David Daney, Behan Webster, Peter Zijlstra, Herbert Xu On Mon, Aug 5, 2013 at 12:40 PM, Marek Polacek <polacek@redhat.com> wrote: > > FWIW, we also support hot/cold attributes for labels, thus e.g. > > if (bar ()) > goto A; > /* ... */ > A: __attribute__((cold)) > /* ... */ > > I don't know whether that might be useful for what you want or not though... Steve? That does sound like it might at least re-order the basic blocks better for your cases. Worth checking out, no? That said, I don't know what gcc actually does for that case. It may be that it just ends up trying to transfer that "cold" information to the conditional itself, which wouldn't work for our asm goto use. I hope/assume it doesn't do that, though, since the "cold" attribute would presumably also be useful for things like computed gotos etc - so it really isn't about the _source_ of the branch, but about that specific target, and the basic block re-ordering. Anyway, the exact implementation details may make it more or less useful for our special static key things. But it does sound like the right thing to do for static keys. Linus ^ permalink raw reply [flat|nested] 68+ messages in thread
* Re: [RFC] gcc feature request: Moving blocks into sections 2013-08-05 19:40 ` Marek Polacek 2013-08-05 19:56 ` Linus Torvalds @ 2013-08-05 19:57 ` Jason Baron 2013-08-05 20:35 ` Richard Henderson 1 sibling, 1 reply; 68+ messages in thread From: Jason Baron @ 2013-08-05 19:57 UTC (permalink / raw) To: Marek Polacek Cc: Linus Torvalds, Steven Rostedt, LKML, gcc, Ingo Molnar, Mathieu Desnoyers, H. Peter Anvin, Thomas Gleixner, David Daney, Behan Webster, Peter Zijlstra, Herbert Xu, rth On 08/05/2013 03:40 PM, Marek Polacek wrote: > On Mon, Aug 05, 2013 at 11:34:55AM -0700, Linus Torvalds wrote: >> On Mon, Aug 5, 2013 at 11:24 AM, Linus Torvalds >> <torvalds@linux-foundation.org> wrote: >>> Ugh. I can see the attraction of your section thing for that case, I >>> just get the feeling that we should be able to do better somehow. >> Hmm.. Quite frankly, Steven, for your use case I think you actually >> want the C goto *labels* associated with a section. Which sounds like >> it might be a cleaner syntax than making it about the basic block >> anyway. > FWIW, we also support hot/cold attributes for labels, thus e.g. > > if (bar ()) > goto A; > /* ... */ > A: __attribute__((cold)) > /* ... */ > > I don't know whether that might be useful for what you want or not though... > > Marek > It certainly would be. That was how I wanted the 'static_key' stuff to work, but unfortunately the last time I tried it, it didn't move the text out-of-line any further than it was already doing. Would that be expected? The change for us, if it worked would be quite simple. Something like: --- a/arch/x86/include/asm/jump_label.h +++ b/arch/x86/include/asm/jump_label.h @@ -21,7 +21,7 @@ static __always_inline bool arch_static_branch(struct static_key *key) ".popsection \n\t" : : "i" (key) : : l_yes); return false; -l_yes: +l_yes: __attribute__((cold)) return true; } ^ permalink raw reply [flat|nested] 68+ messages in thread
* Re: [RFC] gcc feature request: Moving blocks into sections 2013-08-05 19:57 ` Jason Baron @ 2013-08-05 20:35 ` Richard Henderson 2013-08-06 2:26 ` Jason Baron 0 siblings, 1 reply; 68+ messages in thread From: Richard Henderson @ 2013-08-05 20:35 UTC (permalink / raw) To: Jason Baron Cc: Marek Polacek, Linus Torvalds, Steven Rostedt, LKML, gcc, Ingo Molnar, Mathieu Desnoyers, H. Peter Anvin, Thomas Gleixner, David Daney, Behan Webster, Peter Zijlstra, Herbert Xu On 08/05/2013 09:57 AM, Jason Baron wrote: > On 08/05/2013 03:40 PM, Marek Polacek wrote: >> On Mon, Aug 05, 2013 at 11:34:55AM -0700, Linus Torvalds wrote: >>> On Mon, Aug 5, 2013 at 11:24 AM, Linus Torvalds >>> <torvalds@linux-foundation.org> wrote: >>>> Ugh. I can see the attraction of your section thing for that case, I >>>> just get the feeling that we should be able to do better somehow. >>> Hmm.. Quite frankly, Steven, for your use case I think you actually >>> want the C goto *labels* associated with a section. Which sounds like >>> it might be a cleaner syntax than making it about the basic block >>> anyway. >> FWIW, we also support hot/cold attributes for labels, thus e.g. >> >> if (bar ()) >> goto A; >> /* ... */ >> A: __attribute__((cold)) >> /* ... */ >> >> I don't know whether that might be useful for what you want or not though... >> >> Marek >> > > It certainly would be. > > That was how I wanted to the 'static_key' stuff to work, but unfortunately the > last time I tried it, it didn't move the text out-of-line any further than it > was already doing. Would that be expected? The change for us, if it worked > would be quite simple. Something like: It is expected. One must use -freorder-blocks-and-partition, and use real profile feedback to get blocks moved completely out-of-line. Whether that's a sensible default or not is debatable. r~ ^ permalink raw reply [flat|nested] 68+ messages in thread
* Re: [RFC] gcc feature request: Moving blocks into sections 2013-08-05 20:35 ` Richard Henderson @ 2013-08-06 2:26 ` Jason Baron 2013-08-06 3:03 ` Steven Rostedt 0 siblings, 1 reply; 68+ messages in thread From: Jason Baron @ 2013-08-06 2:26 UTC (permalink / raw) To: Steven Rostedt Cc: Richard Henderson, Marek Polacek, Linus Torvalds, LKML, gcc, Ingo Molnar, Mathieu Desnoyers, H. Peter Anvin, Thomas Gleixner, David Daney, Behan Webster, Peter Zijlstra, Herbert Xu On 08/05/2013 04:35 PM, Richard Henderson wrote: > On 08/05/2013 09:57 AM, Jason Baron wrote: >> On 08/05/2013 03:40 PM, Marek Polacek wrote: >>> On Mon, Aug 05, 2013 at 11:34:55AM -0700, Linus Torvalds wrote: >>>> On Mon, Aug 5, 2013 at 11:24 AM, Linus Torvalds >>>> <torvalds@linux-foundation.org> wrote: >>>>> Ugh. I can see the attraction of your section thing for that case, I >>>>> just get the feeling that we should be able to do better somehow. >>>> Hmm.. Quite frankly, Steven, for your use case I think you actually >>>> want the C goto *labels* associated with a section. Which sounds like >>>> it might be a cleaner syntax than making it about the basic block >>>> anyway. >>> FWIW, we also support hot/cold attributes for labels, thus e.g. >>> >>> if (bar ()) >>> goto A; >>> /* ... */ >>> A: __attribute__((cold)) >>> /* ... */ >>> >>> I don't know whether that might be useful for what you want or not though... >>> >>> Marek >>> >> It certainly would be. >> >> That was how I wanted to the 'static_key' stuff to work, but unfortunately the >> last time I tried it, it didn't move the text out-of-line any further than it >> was already doing. Would that be expected? The change for us, if it worked >> would be quite simple. Something like: > It is expected. One must use -freorder-blocks-and-partition, and use real > profile feedback to get blocks moved completely out-of-line. > > Whether that's a sensible default or not is debatable. 
> Hi Steve, I think if the 'cold' attribute on the default disabled static_key branch moved the text completely out-of-line, it would satisfy your requirement here? If you like this approach, perhaps we can make something like this work within gcc. As its already supported, but doesn't quite go far enough for our purposes. Also, if we go down this path, it means the 2-byte jump sequence is probably not going to be too useful. Thanks, -Jason ^ permalink raw reply [flat|nested] 68+ messages in thread
* Re: [RFC] gcc feature request: Moving blocks into sections 2013-08-06 2:26 ` Jason Baron @ 2013-08-06 3:03 ` Steven Rostedt 0 siblings, 0 replies; 68+ messages in thread From: Steven Rostedt @ 2013-08-06 3:03 UTC (permalink / raw) To: Jason Baron Cc: Richard Henderson, Marek Polacek, Linus Torvalds, LKML, gcc, Ingo Molnar, Mathieu Desnoyers, H. Peter Anvin, Thomas Gleixner, David Daney, Behan Webster, Peter Zijlstra, Herbert Xu On Mon, 2013-08-05 at 22:26 -0400, Jason Baron wrote: > I think if the 'cold' attribute on the default disabled static_key > branch moved the text completely out-of-line, it would satisfy your > requirement here? > > If you like this approach, perhaps we can make something like this work > within gcc. As its already supported, but doesn't quite go far enough > for our purposes. It may not be too bad to use. > > Also, if we go down this path, it means the 2-byte jump sequence is > probably not going to be too useful. Don't count us out yet :-) static inline bool arch_static_branch(struct static_key *key) { asm goto("1:" [...] : : "i" (key) : : l_yes); return false; l_yes: goto __l_yes; __l_yes: __attribute__((cold)); return true; } Or put that logic in the caller of arch_static_branch(). Basically, we may be able to do a short jump to the place that will do a long jump to the real work. I'll have to play with this and see what gcc does with the output. -- Steve ^ permalink raw reply [flat|nested] 68+ messages in thread
* Re: [RFC] gcc feature request: Moving blocks into sections 2013-08-05 18:20 ` Linus Torvalds 2013-08-05 18:24 ` Linus Torvalds @ 2013-08-05 18:33 ` H. Peter Anvin 2013-08-05 18:39 ` Steven Rostedt 2 siblings, 0 replies; 68+ messages in thread From: H. Peter Anvin @ 2013-08-05 18:33 UTC (permalink / raw) To: Linus Torvalds Cc: Steven Rostedt, LKML, gcc, Ingo Molnar, Mathieu Desnoyers, Thomas Gleixner, David Daney, Behan Webster, Peter Zijlstra, Herbert Xu On 08/05/2013 11:20 AM, Linus Torvalds wrote: > > Of course, it would be good to optimize static_key_false() itself - > right now those static key jumps are always five bytes, and while they > get nopped out, it would still be nice if there was some way to have > just a two-byte nop (turning into a short branch) *if* we can reach > another jump that way..For small functions that would be lovely. Oh > well. > That would definitely require gcc support. It would be useful, but probably requires a lot of machinery. -hpa ^ permalink raw reply [flat|nested] 68+ messages in thread
* Re: [RFC] gcc feature request: Moving blocks into sections 2013-08-05 18:20 ` Linus Torvalds 2013-08-05 18:24 ` Linus Torvalds 2013-08-05 18:33 ` H. Peter Anvin @ 2013-08-05 18:39 ` Steven Rostedt 2013-08-05 18:49 ` Linus Torvalds 2013-08-05 20:06 ` Jason Baron 2 siblings, 2 replies; 68+ messages in thread From: Steven Rostedt @ 2013-08-05 18:39 UTC (permalink / raw) To: Linus Torvalds Cc: LKML, gcc, Ingo Molnar, Mathieu Desnoyers, H. Peter Anvin, Thomas Gleixner, David Daney, Behan Webster, Peter Zijlstra, Herbert Xu On Mon, 2013-08-05 at 11:20 -0700, Linus Torvalds wrote: > Of course, it would be good to optimize static_key_false() itself - > right now those static key jumps are always five bytes, and while they > get nopped out, it would still be nice if there was some way to have > just a two-byte nop (turning into a short branch) *if* we can reach > another jump that way..For small functions that would be lovely. Oh > well. I had patches that did exactly this: https://lkml.org/lkml/2012/3/8/461 But it got dropped for some reason. I don't remember why. Maybe because of the complexity? -- Steve ^ permalink raw reply [flat|nested] 68+ messages in thread
* Re: [RFC] gcc feature request: Moving blocks into sections 2013-08-05 18:39 ` Steven Rostedt @ 2013-08-05 18:49 ` Linus Torvalds 2013-08-05 19:39 ` Steven Rostedt 2013-08-06 14:19 ` Steven Rostedt 2013-08-05 20:06 ` Jason Baron 1 sibling, 2 replies; 68+ messages in thread From: Linus Torvalds @ 2013-08-05 18:49 UTC (permalink / raw) To: Steven Rostedt Cc: LKML, gcc, Ingo Molnar, Mathieu Desnoyers, H. Peter Anvin, Thomas Gleixner, David Daney, Behan Webster, Peter Zijlstra, Herbert Xu On Mon, Aug 5, 2013 at 11:39 AM, Steven Rostedt <rostedt@goodmis.org> wrote: > > I had patches that did exactly this: > > https://lkml.org/lkml/2012/3/8/461 > > But it got dropped for some reason. I don't remember why. Maybe because > of the complexity? Ugh. Why the crazy update_jump_label script stuff? I'd go "Eww" at that too, it looks crazy. The assembler already knows to make short 2-byte "jmp" instructions for near jumps, and you can just look at the opcode itself to determine size, why is all that other stuff required? IOW, 5/7 looks sane, but 4/7 makes me go "there's something wrong with that series". Linus ^ permalink raw reply [flat|nested] 68+ messages in thread
* Re: [RFC] gcc feature request: Moving blocks into sections 2013-08-05 18:49 ` Linus Torvalds @ 2013-08-05 19:39 ` Steven Rostedt 2013-08-06 14:19 ` Steven Rostedt 1 sibling, 0 replies; 68+ messages in thread From: Steven Rostedt @ 2013-08-05 19:39 UTC (permalink / raw) To: Linus Torvalds Cc: LKML, gcc, Ingo Molnar, Mathieu Desnoyers, H. Peter Anvin, Thomas Gleixner, David Daney, Behan Webster, Peter Zijlstra, Herbert Xu On Mon, 2013-08-05 at 11:49 -0700, Linus Torvalds wrote: > On Mon, Aug 5, 2013 at 11:39 AM, Steven Rostedt <rostedt@goodmis.org> wrote: > > > > I had patches that did exactly this: > > > > https://lkml.org/lkml/2012/3/8/461 > > > > But it got dropped for some reason. I don't remember why. Maybe because > > of the complexity? > > Ugh. Why the crazy update_jump_label script stuff? I'd go "Eww" at > that too, it looks crazy. The assembler already knows to make short > 2-byte "jmp" instructions for near jumps, and you can just look at the > opcode itself to determine size, why is all that other stuff required? Hmm, I probably added that "optimization" in there because I was doing a bunch of jump label work and just included it in. It's been over a year since I've worked on this so I don't remember all the details. That update_jump_label program may have just been to do the conversion of nops at compile time and not during boot. It may not be needed. Also, it was based on the record-mcount code that the function tracer uses, which is also done at compile time, to get all the mcount locations. > > IOW, 5/7 looks sane, but 4/7 makes me go "there's something wrong with > that series". I just quickly looked at the changes again. I think I can redo them and send them again for 3.12. What do you think about keeping all but patch 4?

1 - Use a default nop at boot. I had help from hpa on this. Currently, jump labels use a jmp instead of a nop on boot.

2 - On boot, look at each jump label nop (a jump, before patch 1), find the best run time nop for the machine, and convert it. Since it is likely that the current nop is already ideal, skip the conversion in that case. Again, this is just a boot up optimization.

3 - Add a test to see what we are converting. Adds safety checks like those in the function tracer: if it updates a location and does not find what it expects to find, output a nasty bug.

< will skip patch 4 >

5 - Does what you want, with the 2 and 5 byte nops.

6 - When/if a failure does trigger, print out information about what went wrong. Helps debugging splats caused by patch 3.

7 - Needs to go before patch 3, as patch 3 can trigger if the default nop is not the ideal nop for the box that is running. <reported by Ingo>

If I take out patch 4, would that solution look fine for you? I can get this ready for 3.12. -- Steve ^ permalink raw reply [flat|nested] 68+ messages in thread
* Re: [RFC] gcc feature request: Moving blocks into sections 2013-08-05 18:49 ` Linus Torvalds 2013-08-05 19:39 ` Steven Rostedt @ 2013-08-06 14:19 ` Steven Rostedt 2013-08-06 17:48 ` Linus Torvalds 1 sibling, 1 reply; 68+ messages in thread From: Steven Rostedt @ 2013-08-06 14:19 UTC (permalink / raw) To: Linus Torvalds Cc: LKML, gcc, Ingo Molnar, Mathieu Desnoyers, H. Peter Anvin, Thomas Gleixner, David Daney, Behan Webster, Peter Zijlstra, Herbert Xu On Mon, 2013-08-05 at 11:49 -0700, Linus Torvalds wrote: > Ugh. Why the crazy update_jump_label script stuff? After playing with the patches again, I now understand why I did that. It wasn't just for optimization. Currently the way jump labels work is that we use asm goto() and place a 5 byte nop in the assembly, with some labels. The location of the nop is stored in the __jump_table section. In order to use either 2 or 5 byte jumps, I had to put in the actual jump and let the assembler place the correct op code in. This changes the default switch for jump labels. Instead of being default off, it is now default on. To handle this, I had to convert all the jumps back to nops before the kernel runs. This was done at compile time with the update_jump_label script/program. Now, we can just do the update in early boot, but is this the best way? This means that the update must happen before any jump label is used. This may not be an issue, but as jump labels can be used for anything (not just tracing), it may be hard to know when the first instance is actually used. Also, if there is any issue with the op codes as Mathieu has been pointing out, it would only be caught at run time (boot time). The update_jump_label program isn't really that complex. Yes it parses the elf tables, but that's rather standard and that method is used by ftrace with the mcount locations (instead of that nasty daemon). 
It finds the __jump_table section and runs down the list of locations just like the boot up code does, and modifies the jumps to nops. If the compiler does something strange, it would be caught at compile time not boot time. Anyway, if you feel that update_jump_label is too complex, I can go the "update at early boot" route and see how that goes. -- Steve ^ permalink raw reply [flat|nested] 68+ messages in thread
* Re: [RFC] gcc feature request: Moving blocks into sections 2013-08-06 14:19 ` Steven Rostedt @ 2013-08-06 17:48 ` Linus Torvalds 2013-08-06 17:58 ` Steven Rostedt 0 siblings, 1 reply; 68+ messages in thread From: Linus Torvalds @ 2013-08-06 17:48 UTC (permalink / raw) To: Steven Rostedt Cc: LKML, gcc, Ingo Molnar, Mathieu Desnoyers, H. Peter Anvin, Thomas Gleixner, David Daney, Behan Webster, Peter Zijlstra, Herbert Xu On Tue, Aug 6, 2013 at 7:19 AM, Steven Rostedt <rostedt@goodmis.org> wrote: > > After playing with the patches again, I now understand why I did that. > It wasn't just for optimization. [explanation snipped] > Anyway, if you feel that update_jump_label is too complex, I can go the > "update at early boot" route and see how that goes. Ugh. I'd love to see short jumps, but I do dislike binary rewriting, and doing it at early boot seems really quite scary too. So I wonder if this is a "ok, let's not bother, it's not worth the pain" issue. 128 bytes of offset is very small, so there probably aren't all that many cases that would use it. Linus ^ permalink raw reply [flat|nested] 68+ messages in thread
* Re: [RFC] gcc feature request: Moving blocks into sections 2013-08-06 17:48 ` Linus Torvalds @ 2013-08-06 17:58 ` Steven Rostedt 2013-08-06 20:33 ` Mathieu Desnoyers 0 siblings, 1 reply; 68+ messages in thread From: Steven Rostedt @ 2013-08-06 17:58 UTC (permalink / raw) To: Linus Torvalds Cc: LKML, gcc, Ingo Molnar, Mathieu Desnoyers, H. Peter Anvin, Thomas Gleixner, David Daney, Behan Webster, Peter Zijlstra, Herbert Xu On Tue, 2013-08-06 at 10:48 -0700, Linus Torvalds wrote: > So I wonder if this is a "ok, let's not bother, it's not worth the > pain" issue. 128 bytes of offset is very small, so there probably > aren't all that many cases that would use it. OK, I'll forward port the original patches for the hell of it anyway, and post it as an RFC. Let people play with it if they want, and if it seems like it would benefit the kernel perhaps we can reconsider. It shouldn't be too hard to do the forward port, and if we don't ever take it, it would be a fun exercise regardless ;-) Actually, the first three patches should be added as they are clean ups and safety checks. Nothing to do with the actual 2-5 byte jumps. They were lost due to their association with the complex patches. :-/ -- Steve ^ permalink raw reply [flat|nested] 68+ messages in thread
* Re: [RFC] gcc feature request: Moving blocks into sections 2013-08-06 17:58 ` Steven Rostedt @ 2013-08-06 20:33 ` Mathieu Desnoyers 2013-08-06 20:43 ` Steven Rostedt 0 siblings, 1 reply; 68+ messages in thread From: Mathieu Desnoyers @ 2013-08-06 20:33 UTC (permalink / raw) To: Steven Rostedt Cc: Linus Torvalds, LKML, gcc, Ingo Molnar, H. Peter Anvin, Thomas Gleixner, David Daney, Behan Webster, Peter Zijlstra, Herbert Xu * Steven Rostedt (rostedt@goodmis.org) wrote: > On Tue, 2013-08-06 at 10:48 -0700, Linus Torvalds wrote: > > > So I wonder if this is a "ok, let's not bother, it's not worth the > > pain" issue. 128 bytes of offset is very small, so there probably > > aren't all that many cases that would use it. > > OK, I'll forward port the original patches for the hell of it anyway, > and post it as an RFC. Let people play with it if they want, and if it > seems like it would benefit the kernel perhaps we can reconsider. > > It shouldn't be too hard to do the forward port, and if we don't ever > take it, it would be a fun exercise regardless ;-) > > Actually, the first three patches should be added as they are clean ups > and safety checks. Nothing to do with the actual 2-5 byte jumps. They > were lost due to their association with the complex patches. :-/ Steve, perhaps you could add a mode to your binary rewriting program that counts the number of 2-byte vs 5-byte jumps found, and, if possible, gets a breakdown of those per subsystem? It might help us get a clearer picture of how many important sites, insn cache-wise, are being shrunk by this approach. Thanks, Mathieu > > -- Steve > > -- Mathieu Desnoyers EfficiOS Inc. http://www.efficios.com ^ permalink raw reply [flat|nested] 68+ messages in thread
* Re: [RFC] gcc feature request: Moving blocks into sections 2013-08-06 20:33 ` Mathieu Desnoyers @ 2013-08-06 20:43 ` Steven Rostedt 2013-08-07 0:45 ` Steven Rostedt 0 siblings, 1 reply; 68+ messages in thread From: Steven Rostedt @ 2013-08-06 20:43 UTC (permalink / raw) To: Mathieu Desnoyers Cc: Linus Torvalds, LKML, gcc, Ingo Molnar, H. Peter Anvin, Thomas Gleixner, David Daney, Behan Webster, Peter Zijlstra, Herbert Xu On Tue, 2013-08-06 at 16:33 -0400, Mathieu Desnoyers wrote: > Steve, perhaps you could add a mode to your binary rewriting program > that counts the number of 2-byte vs 5-byte jumps found, and if possible > get a breakdown of those per subsystem ? I actually started doing that, as I was curious as to how many were being changed as well. Note, this is low on my priority list, so I work on it as I get time. -- Steve ^ permalink raw reply [flat|nested] 68+ messages in thread
* Re: [RFC] gcc feature request: Moving blocks into sections 2013-08-06 20:43 ` Steven Rostedt @ 2013-08-07 0:45 ` Steven Rostedt 2013-08-07 0:56 ` Steven Rostedt 0 siblings, 1 reply; 68+ messages in thread From: Steven Rostedt @ 2013-08-07 0:45 UTC (permalink / raw) To: Mathieu Desnoyers Cc: Linus Torvalds, LKML, gcc, Ingo Molnar, H. Peter Anvin, Thomas Gleixner, David Daney, Behan Webster, Peter Zijlstra, Herbert Xu On Tue, 2013-08-06 at 16:43 -0400, Steven Rostedt wrote: > On Tue, 2013-08-06 at 16:33 -0400, Mathieu Desnoyers wrote: > > > Steve, perhaps you could add a mode to your binary rewriting program > > that counts the number of 2-byte vs 5-byte jumps found, and if possible > > get a breakdown of those per subsystem ? > > I actually started doing that, as I was curious to how many were being > changed as well. I didn't add it to the update program as that runs on each individual object (needs to handle modules). But I put in the start up code a counter to see what types were converted: [ 3.387362] short jumps: 106 [ 3.390277] long jumps: 330 Thus, approximately 25%. Not bad. -- Steve ^ permalink raw reply [flat|nested] 68+ messages in thread
* Re: [RFC] gcc feature request: Moving blocks into sections 2013-08-07 0:45 ` Steven Rostedt @ 2013-08-07 0:56 ` Steven Rostedt 2013-08-07 5:06 ` Ondřej Bílka 0 siblings, 1 reply; 68+ messages in thread From: Steven Rostedt @ 2013-08-07 0:56 UTC (permalink / raw) To: Mathieu Desnoyers Cc: Linus Torvalds, LKML, gcc, Ingo Molnar, H. Peter Anvin, Thomas Gleixner, David Daney, Behan Webster, Peter Zijlstra, Herbert Xu On Tue, 2013-08-06 at 20:45 -0400, Steven Rostedt wrote: > [ 3.387362] short jumps: 106 > [ 3.390277] long jumps: 330 > > Thus, approximately 25%. Not bad. Also, where these happen to be is probably even more important than how many. If all the short jumps happen in slow paths, it's rather pointless. But they seem to be in some rather hot paths. I had it print out where it placed the short jumps too:

[ 0.000000] short jump at: place_entity+0x53/0x87 ffffffff8106e139
[ 0.000000] short jump at: place_entity+0x17/0x87 ffffffff8106e0fd
[ 0.000000] short jump at: check_preempt_wakeup+0x11c/0x16e ffffffff8106f92b
[ 0.000000] short jump at: can_migrate_task+0xc6/0x15d ffffffff8106e72e
[ 0.000000] short jump at: update_group_power+0x72/0x1df ffffffff81070394
[ 0.000000] short jump at: update_group_power+0xaf/0x1df ffffffff810703d1
[ 0.000000] short jump at: hrtick_enabled+0x4/0x35 ffffffff8106de51
[ 0.000000] short jump at: task_tick_fair+0x5c/0xf9 ffffffff81070102
[ 0.000000] short jump at: source_load+0x27/0x40 ffffffff8106da7c
[ 0.000000] short jump at: target_load+0x27/0x40 ffffffff8106dabc
[ 0.000000] short jump at: try_to_wake_up+0x127/0x1e2 ffffffff8106b1d4
[ 0.000000] short jump at: build_sched_domains+0x219/0x90b ffffffff8106bc24
[ 0.000000] short jump at: smp_trace_call_function_single_interrupt+0x79/0x112 ffffffff8102616f
[ 0.000000] short jump at: smp_trace_call_function_interrupt+0x7a/0x111 ffffffff81026038
[ 0.000000] short jump at: smp_trace_error_interrupt+0x72/0x109 ffffffff81028c9e
[ 0.000000] short jump at: smp_trace_spurious_interrupt+0x71/0x107 ffffffff81028b77
[ 0.000000] short jump at: smp_trace_reschedule_interrupt+0x7a/0x110 ffffffff81025f01
[ 0.000000] short jump at: __raise_softirq_irqoff+0xf/0x90 ffffffff810406e0
[ 0.000000] short jump at: it_real_fn+0x17/0xb2 ffffffff8103ed85
[ 0.000000] short jump at: trace_itimer_state+0x13/0x97 ffffffff8103e9ff
[ 0.000000] short jump at: debug_deactivate+0xa/0x7a ffffffff8106014d
[ 0.000000] short jump at: debug_activate+0x10/0x86 ffffffff810478c7
[ 0.000000] short jump at: __send_signal+0x233/0x268 ffffffff8104a6bb
[ 0.000000] short jump at: send_sigqueue+0x103/0x148 ffffffff8104bbbf
[ 0.000000] short jump at: trace_workqueue_activate_work+0xa/0x7a ffffffff81053deb
[ 0.000000] short jump at: _rcu_barrier_trace+0x31/0xbc ffffffff810b8f81
[ 0.000000] short jump at: trace_rcu_dyntick+0x14/0x8f ffffffff810ba3a2
[ 0.000000] short jump at: rcu_implicit_dynticks_qs+0x95/0xc4 ffffffff810ba35f
[ 0.000000] short jump at: rcu_implicit_dynticks_qs+0x47/0xc4 ffffffff810ba311
[ 0.000000] short jump at: trace_rcu_future_gp.isra.38+0x46/0xe9 ffffffff810b91e8
[ 0.000000] short jump at: trace_rcu_grace_period+0x14/0x8f ffffffff810b90d3
[ 0.000000] short jump at: trace_rcu_utilization+0xa/0x7a ffffffff810b9a6b
[ 0.000000] short jump at: update_curr+0x89/0x14f ffffffff8106f4c9
[ 0.000000] short jump at: update_stats_wait_end+0x5a/0xda ffffffff8106f203
[ 0.000000] short jump at: delayed_put_task_struct+0x1b/0x95 ffffffff8103c798
[ 0.000000] short jump at: trace_module_get+0x10/0x86 ffffffff81096b44
[ 0.000000] short jump at: pm_qos_update_flags+0xc5/0x149 ffffffff81076fa0
[ 0.000000] short jump at: pm_qos_update_request+0x51/0xf3 ffffffff81076b1e
[ 0.000000] short jump at: pm_qos_add_request+0xb7/0x14e ffffffff81076db9
[ 0.000000] short jump at: wakeup_source_report_event+0x7b/0xfc ffffffff81323045
[ 0.000000] short jump at: trace_rpm_return_int+0x14/0x8f ffffffff81323d3d
[ 0.000000] short jump at: __activate_page+0xdd/0x183 ffffffff810f8a1d
[ 0.000000] short jump at: __pagevec_lru_add_fn+0x139/0x1c4 ffffffff810f88b5
[ 0.000000] short jump at: shrink_inactive_list+0x364/0x400 ffffffff810fcee8
[ 0.000000] short jump at: isolate_lru_pages.isra.57+0xb6/0x14a ffffffff810fbafb
[ 0.000000] short jump at: wakeup_kswapd+0xaf/0x14a ffffffff810fbd20
[ 0.000000] short jump at: free_hot_cold_page_list+0x2a/0xca ffffffff810f3d1e
[ 0.000000] short jump at: kmem_cache_free+0x74/0xee ffffffff81129f9a
[ 0.000000] short jump at: kmem_cache_alloc_node+0xe6/0x17b ffffffff8112afb1
[ 0.000000] short jump at: kmem_cache_alloc_node_trace+0xe1/0x176 ffffffff8112b615
[ 0.000000] short jump at: kmem_cache_alloc+0xd8/0x168 ffffffff8112c1fe
[ 0.000000] short jump at: trace_kmalloc+0x21/0xac ffffffff8112aa7e
[ 0.000000] short jump at: wait_iff_congested+0xdc/0x158 ffffffff81105ee3
[ 0.000000] short jump at: congestion_wait+0xa6/0x122 ffffffff81106005
[ 0.000000] short jump at: global_dirty_limits+0xd7/0x151 ffffffff810f5f74
[ 0.000000] short jump at: queue_io+0x165/0x1e6 ffffffff811568ec
[ 0.000000] short jump at: bdi_register+0xe9/0x161 ffffffff81106329
[ 0.000000] short jump at: bdi_start_background_writeback+0xf/0x9c ffffffff8115755d
[ 0.000000] short jump at: trace_writeback_pages_written+0xa/0x7a ffffffff81156717
[ 0.000000] short jump at: trace_ext3_truncate_exit+0xa/0x7a ffffffff8119d57a
[ 0.000000] short jump at: ext3_readpage+0xf/0x8c ffffffff8119d4f3
[ 0.000000] short jump at: ext3_drop_inode+0x2b/0xae ffffffff811a9435
[ 0.000000] short jump at: ext4_es_find_delayed_extent_range+0x143/0x1e9 ffffffff811ea671
[ 0.000000] short jump at: trace_ext4_get_implied_cluster_alloc_exit+0x14/0x8f ffffffff811d7b5a
[ 0.000000] short jump at: __ext4_journal_start_reserved+0xf4/0x11a ffffffff811dd59b
[ 0.000000] short jump at: ext4_truncate+0x25e/0x2fd ffffffff811ba6e3
[ 0.000000] short jump at: ext4_fallocate+0x39f/0x435 ffffffff811dcb20
[ 0.000000] short jump at: trace_ext4_da_reserve_space+0x11/0x87 ffffffff811b707f
[ 0.000000] short jump at: ext4_mb_release_group_pa+0xd9/0x112 ffffffff811e2413
[ 0.000000] short jump at: ext4_mb_release_context+0x424/0x4d3 ffffffff811e2a74
[ 0.000000] short jump at: ext4_alloc_da_blocks+0xf/0xa2 ffffffff811b87c9
[ 0.000000] short jump at: trace_ext4_sync_fs+0x11/0x92 ffffffff811ce6e1
[ 0.000000] short jump at: ext4_drop_inode+0x2b/0xae ffffffff811ce64d
[ 0.000000] short jump at: trace_jbd_do_submit_data+0x11/0x87 ffffffff811f2a57
[ 0.000000] short jump at: __journal_drop_transaction+0xe0/0x160 ffffffff811f4958
[ 0.000000] short jump at: __jbd2_journal_drop_transaction+0xd8/0x152 ffffffff811fdb82
[ 0.000000] short jump at: trace_block_plug+0xa/0x7a ffffffff81243e17
[ 0.000000] short jump at: dec_pending+0x258/0x2e7 ffffffff813ec27c
[ 0.000000] short jump at: elv_abort_queue+0x2a/0xc6 ffffffff8123fff5
[ 0.000000] short jump at: touch_buffer+0xa/0x8c ffffffff8115cfb0
[ 0.000000] short jump at: trace_gpio_value+0x13/0x97 ffffffff81273682
[ 0.000000] short jump at: trace_gpio_direction+0x13/0x97 ffffffff8127380f
[ 0.000000] short jump at: _regulator_disable+0x1e8/0x22b ffffffff812dfce6
[ 0.000000] short jump at: scsi_eh_wakeup+0x21/0xcb ffffffff8133287a
[ 0.000000] short jump at: scsi_done+0xf/0x8a ffffffff8132e568
[ 0.000000] short jump at: __udp_queue_rcv_skb+0xae/0x132 ffffffff814939f4
[ 0.000000] short jump at: consume_skb+0x38/0xbe ffffffff8142b3b5
[ 0.000000] short jump at: kfree_skb+0x3f/0xcb ffffffff8142a01c
[ 0.000000] short jump at: perf_event_task_sched_out+0x16/0x57 ffffffff810677ad
[ 0.000000] short jump at: netif_receive_skb+0x25/0x8a ffffffff81435c6d
[ 0.000000] short jump at: netif_receive_skb+0x11/0x8a ffffffff81435c59
[ 0.000000] short jump at: udp_destroy_sock+0x37/0x5d ffffffff81494e22
[ 0.000000] short jump at: udpv6_destroy_sock+0x26/0x54 ffffffff814d4fa0
[ 0.000000] short jump at: perf_event_task_sched_out+0x14/0x57 ffffffff810677ab
[ 0.000000] short jump at: set_task_cpu+0x137/0x1b7 ffffffff81069933
[ 0.000000] short jump at: tcp_prequeue_process+0x30/0x76 ffffffff81479f88
[ 0.000000] short jump at: __netif_receive_skb+0xc/0x5b ffffffff81435abc
[ 0.000000] short jump at: __netdev_alloc_skb+0x34/0xae ffffffff81429ed7
[ 0.000000] short jump at: sk_backlog_rcv+0x7/0x2b ffffffff81425659
[ 0.000000] short jump at: ac_put_obj.isra.36+0xd/0x41 ffffffff81129cad
[ 0.000000] short jump at: ipmr_queue_xmit.isra.29+0x3b7/0x403 ffffffff814af2bc
[ 0.000000] short jump at: ip_forward+0x200/0x28b ffffffff81470661
[ 0.000000] short jump at: __br_multicast_send_query+0x412/0x45d ffffffff814f818f
[ 0.000000] short jump at: br_send_bpdu+0x10e/0x14b ffffffff814f308a
[ 0.000000] short jump at: NF_HOOK.constprop.29+0xf/0x49 ffffffff814dafe0
[ 0.000000] short jump at: ndisc_send_skb+0x1c6/0x29b ffffffff814d20b8

The kmem_cache_* and the try_to_wake_up* are the hot paths that caught my eye. But still, is this worth it? -- Steve ^ permalink raw reply [flat|nested] 68+ messages in thread
* Re: [RFC] gcc feature request: Moving blocks into sections 2013-08-07 0:56 ` Steven Rostedt @ 2013-08-07 5:06 ` Ondřej Bílka 2013-08-07 15:02 ` Steven Rostedt 0 siblings, 1 reply; 68+ messages in thread From: Ondřej Bílka @ 2013-08-07 5:06 UTC (permalink / raw) To: Steven Rostedt Cc: Mathieu Desnoyers, Linus Torvalds, LKML, gcc, Ingo Molnar, H. Peter Anvin, Thomas Gleixner, David Daney, Behan Webster, Peter Zijlstra, Herbert Xu On Tue, Aug 06, 2013 at 08:56:00PM -0400, Steven Rostedt wrote: > On Tue, 2013-08-06 at 20:45 -0400, Steven Rostedt wrote: > > > [ 3.387362] short jumps: 106 > > [ 3.390277] long jumps: 330 > > > > Thus, approximately 25%. Not bad. > > Also, where these happen to be is probably even more important than how > many. If all the short jumps happen in slow paths, it's rather > pointless. But they seem to be in some rather hot paths. I had it print > out where it placed the short jumps too: > > The kmem_cache_* and the try_to_wake_up* are the hot paths that caught > my eye. > > But still, is this worth it? > Add a short_counter and a long_counter, and increment the appropriate counter before each jump. That way we will know how many short/long jumps were taken. > -- Steve > ^ permalink raw reply [flat|nested] 68+ messages in thread
* Re: [RFC] gcc feature request: Moving blocks into sections 2013-08-07 5:06 ` Ondřej Bílka @ 2013-08-07 15:02 ` Steven Rostedt 2013-08-07 16:03 ` Mathieu Desnoyers 0 siblings, 1 reply; 68+ messages in thread From: Steven Rostedt @ 2013-08-07 15:02 UTC (permalink / raw) To: Ondřej Bílka Cc: Mathieu Desnoyers, Linus Torvalds, LKML, gcc, Ingo Molnar, H. Peter Anvin, Thomas Gleixner, David Daney, Behan Webster, Peter Zijlstra, Herbert Xu On Wed, 2013-08-07 at 07:06 +0200, Ondřej Bílka wrote: > Add short_counter,long_counter and before increment counter before each > jump. That way we will know how many short/long jumps were taken. That's not trivial at all. The jump is a single location (in an asm goto() statement) that happens to be inlined through out the kernel. The assembler decides if it will be a short or long jump. How do you add a counter to count the difference? The output I gave is from the boot up code that converts the jmp back to a nop (or in this case, the default nop to the ideal nop). It knows the size by reading the op code. This is a static analysis, not a running one. It's no trivial task to have a counter for each jump. There is a way though. If we enable all the jumps (all tracepoints, and other users of jumplabel), record the trace and then compare the trace to the output that shows which ones were short jumps, and all others are long jumps. I'll post the patches soon and you can have fun doing the compare :-) Actually, I'm working on the 4 patches of the series that is more about clean ups and safety checks than the jmp conversion. That is not controversial, and I'll be posting them for 3.12 soon. After that, I'll post the updated patches that have the conversion as well as the counter, for RFC and for others to play with. -- Steve ^ permalink raw reply [flat|nested] 68+ messages in thread
* Re: [RFC] gcc feature request: Moving blocks into sections 2013-08-07 15:02 ` Steven Rostedt @ 2013-08-07 16:03 ` Mathieu Desnoyers 2013-08-07 16:11 ` Steven Rostedt 0 siblings, 1 reply; 68+ messages in thread From: Mathieu Desnoyers @ 2013-08-07 16:03 UTC (permalink / raw) To: Steven Rostedt Cc: Ondřej Bílka, Linus Torvalds, LKML, gcc, Ingo Molnar, H. Peter Anvin, Thomas Gleixner, David Daney, Behan Webster, Peter Zijlstra, Herbert Xu * Steven Rostedt (rostedt@goodmis.org) wrote: > On Wed, 2013-08-07 at 07:06 +0200, Ondřej Bílka wrote: > > > Add short_counter,long_counter and before increment counter before each > > jump. That way we will know how many short/long jumps were taken. > > That's not trivial at all. The jump is a single location (in an asm > goto() statement) that happens to be inlined through out the kernel. The > assembler decides if it will be a short or long jump. How do you add a > counter to count the difference? You might want to try creating a global array of counters (accessible both from C for printout and assembly for update). Index the array from assembly using: (2f - 1f) 1: jmp ...; 2: And put an atomic increment of the counter. This increment instruction should be located prior to the jmp for obvious reasons. You'll end up with the sums you're looking for at indexes 2 and 5 of the array. Thanks, Mathieu > > The output I gave is from the boot up code that converts the jmp back to > a nop (or in this case, the default nop to the ideal nop). It knows the > size by reading the op code. This is a static analysis, not a running > one. It's no trivial task to have a counter for each jump. > > There is a way though. If we enable all the jumps (all tracepoints, and > other users of jumplabel), record the trace and then compare the trace > to the output that shows which ones were short jumps, and all others are > long jumps. 
> > I'll post the patches soon and you can have fun doing the compare :-) > > Actually, I'm working on the 4 patches of the series that is more about > clean ups and safety checks than the jmp conversion. That is not > controversial, and I'll be posting them for 3.12 soon. > > After that, I'll post the updated patches that have the conversion as > well as the counter, for RFC and for others to play with. > > -- Steve > > -- Mathieu Desnoyers EfficiOS Inc. http://www.efficios.com ^ permalink raw reply [flat|nested] 68+ messages in thread
* Re: [RFC] gcc feature request: Moving blocks into sections 2013-08-07 16:03 ` Mathieu Desnoyers @ 2013-08-07 16:11 ` Steven Rostedt 2013-08-07 23:22 ` Mathieu Desnoyers 0 siblings, 1 reply; 68+ messages in thread From: Steven Rostedt @ 2013-08-07 16:11 UTC (permalink / raw) To: Mathieu Desnoyers Cc: Ondřej Bílka, Linus Torvalds, LKML, gcc, Ingo Molnar, H. Peter Anvin, Thomas Gleixner, David Daney, Behan Webster, Peter Zijlstra, Herbert Xu On Wed, 2013-08-07 at 12:03 -0400, Mathieu Desnoyers wrote: > You might want to try creating a global array of counters (accessible > both from C for printout and assembly for update). > > Index the array from assembly using: (2f - 1f) > > 1: > jmp ...; > 2: > > And put an atomic increment of the counter. This increment instruction > should be located prior to the jmp for obvious reasons. > > You'll end up with the sums you're looking for at indexes 2 and 5 of the > array. After I post the patches, feel free to knock yourself out. -- Steve ^ permalink raw reply [flat|nested] 68+ messages in thread
* Re: [RFC] gcc feature request: Moving blocks into sections 2013-08-07 16:11 ` Steven Rostedt @ 2013-08-07 23:22 ` Mathieu Desnoyers 0 siblings, 0 replies; 68+ messages in thread From: Mathieu Desnoyers @ 2013-08-07 23:22 UTC (permalink / raw) To: Steven Rostedt Cc: Ondřej Bílka, Linus Torvalds, LKML, gcc, Ingo Molnar, H. Peter Anvin, Thomas Gleixner, David Daney, Behan Webster, Peter Zijlstra, Herbert Xu * Steven Rostedt (rostedt@goodmis.org) wrote: > On Wed, 2013-08-07 at 12:03 -0400, Mathieu Desnoyers wrote: > > > You might want to try creating a global array of counters (accessible > > both from C for printout and assembly for update). > > > > Index the array from assembly using: (2f - 1f) > > > > 1: > > jmp ...; > > 2: > > > > And put an atomic increment of the counter. This increment instruction > > should be located prior to the jmp for obvious reasons. > > > > You'll end up with the sums you're looking for at indexes 2 and 5 of the > > array. > > After I post the patches, feel free to knock yourself out. I just need the calculation, not the entire patchset. 
For this purpose, based on top of 3.10.5:

---
 arch/x86/include/asm/jump_label.h |   15 ++++++++++++++-
 include/linux/jump_label.h        |    3 +++
 kernel/jump_label.c               |   12 ++++++++++++
 3 files changed, 29 insertions(+), 1 deletion(-)

Index: linux/arch/x86/include/asm/jump_label.h
===================================================================
--- linux.orig/arch/x86/include/asm/jump_label.h
+++ linux/arch/x86/include/asm/jump_label.h
@@ -15,9 +15,20 @@ static __always_inline bool arch_static_
 {
 	asm goto("1:"
 		STATIC_KEY_INITIAL_NOP
+#ifdef CONFIG_X86_64
+		"lock; incq 4f \n\t"
+#else
+		"lock; incl 4f \n\t"
+#endif
+		"jmp 3f \n\t"
+		"2:"
+		"jmp %l[l_yes] \n\t"
+		"3:"
 		".pushsection __jump_table,  \"aw\" \n\t"
 		_ASM_ALIGN "\n\t"
-		_ASM_PTR "1b, %l[l_yes], %c0 \n\t"
+		_ASM_PTR "1b, %l[l_yes], %c0, (3b - 2b) \n\t"
+		"4:" /* nr_hit */
+		_ASM_PTR "0 \n\t"
 		".popsection \n\t"
 		: : "i" (key) : : l_yes);
 	return false;
@@ -37,6 +48,8 @@ struct jump_entry {
 	jump_label_t code;
 	jump_label_t target;
 	jump_label_t key;
+	jump_label_t jmp_insn_len;
+	jump_label_t nr_hit;
 };
 
 #endif
Index: linux/include/linux/jump_label.h
===================================================================
--- linux.orig/include/linux/jump_label.h
+++ linux/include/linux/jump_label.h
@@ -208,4 +208,7 @@ static inline bool static_key_enabled(st
 	return (atomic_read(&key->enabled) > 0);
 }
 
+struct jump_entry *get_jump_label_start(void);
+struct jump_entry *get_jump_label_stop(void);
+
 #endif	/* _LINUX_JUMP_LABEL_H */
Index: linux/kernel/jump_label.c
===================================================================
--- linux.orig/kernel/jump_label.c
+++ linux/kernel/jump_label.c
@@ -16,6 +16,18 @@
 
 #ifdef HAVE_JUMP_LABEL
 
+struct jump_entry *get_jump_label_start(void)
+{
+	return __start___jump_table;
+}
+EXPORT_SYMBOL_GPL(get_jump_label_start);
+
+struct jump_entry *get_jump_label_stop(void)
+{
+	return __stop___jump_table;
+}
+EXPORT_SYMBOL_GPL(get_jump_label_stop);
+
 /* mutex to protect coming/going of the the jump_label table */
 static DEFINE_MUTEX(jump_label_mutex);

---------

test.c:

/*
 * Copyright 2013 - Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
 *
 * GPLv2 license.
 */

#include <linux/module.h>
#include <linux/mm.h>
#include <linux/kernel.h>
#include <linux/jump_label.h>
#include <linux/kallsyms.h>

void print_static_jumps(void)
{
	struct jump_entry *iter_start = get_jump_label_start();
	struct jump_entry *iter_stop = get_jump_label_stop();
	struct jump_entry *iter;

	for (iter = iter_start; iter < iter_stop; iter++) {
		char symbol[KSYM_SYMBOL_LEN] = "";

		if (sprint_symbol(symbol, iter->code) == 0) {
			WARN_ON_ONCE(1);
		}
		printk("Jump label: addr: %lx symbol: %s ilen: %lu hits: %lu\n",
			(unsigned long) iter->code,
			symbol,
			(unsigned long) iter->jmp_insn_len,
			(unsigned long) iter->nr_hit);
	}
}

int initfct(void)
{
	print_static_jumps();
	return -EPERM;
}

module_init(initfct);
MODULE_LICENSE("GPL");

----------------------

Results sorted by reverse number of hits, after boot + starting firefox, 200s after boot:

Jump label: addr: ffffffff810d9805 symbol: balance_dirty_pages_ratelimited+0x425/0x9c0 ilen: 5 814700
Jump label: addr: ffffffff81138135 symbol: writeback_sb_inodes+0x195/0x4a0 ilen: 5 752021
Jump label: addr: ffffffff8103695e symbol: call_console_drivers.constprop.13+0xe/0x140 ilen: 5 726153
Jump label: addr: ffffffff8103ce11 symbol: __do_softirq+0xe1/0x310 ilen: 5 724803
Jump label: addr: ffffffff810e07a6 symbol: shrink_inactive_list+0x2e6/0x420 ilen: 2 328701
Jump label: addr: ffffffff810dfdad symbol: shrink_page_list+0x56d/0x8c0 ilen: 5 315157
Jump label: addr: ffffffff810e9510 symbol: congestion_wait+0xb0/0x170 ilen: 2 241653
Jump label: addr: ffffffff810deb0f symbol: shrink_slab+0x1df/0x390 ilen: 5 231215
Jump label: addr: ffffffff810d9bd7 symbol: balance_dirty_pages_ratelimited+0x7f7/0x9c0 ilen: 5 166859
Jump label: addr: ffffffff815a8270 symbol: rcu_cpu_notify+0x14/0x882 ilen: 5 162468
Jump label: addr: ffffffff810a182d symbol: rcu_process_callbacks+0x2d/0x980 ilen: 5 162385
Jump label: addr: ffffffff8116a4e7 symbol: oom_score_adj_write+0x187/0x280 ilen: 2 157640
Jump label: addr: ffffffff810746c7 symbol: cpu_startup_entry+0x107/0x2d0 ilen: 2 140922
Jump label: addr: ffffffff8145e44f symbol: cpuidle_idle_call+0x9f/0x2d0 ilen: 5 140814
Jump label: addr: ffffffff810a04ab symbol: rcu_eqs_exit_common+0x12b/0x250 ilen: 2 133403
Jump label: addr: ffffffff810a07a8 symbol: rcu_eqs_enter_common+0x128/0x250 ilen: 2 133305
Jump label: addr: ffffffff810a2ec5 symbol: rcu_irq_enter+0x55/0x150 ilen: 2 133213
Jump label: addr: ffffffff810a289a symbol: __call_rcu.constprop.45+0x17a/0x3a0 ilen: 5 132999
Jump label: addr: ffffffff8105243e symbol: __queue_work+0xbe/0x380 ilen: 5 113973
Jump label: addr: ffffffff810deeb4 symbol: isolate_lru_pages.isra.56+0x104/0x230 ilen: 2 102282
Jump label: addr: ffffffff8100e537 symbol: syscall_trace_enter+0x207/0x2b0 ilen: 2 92971
Jump label: addr: ffffffff8100e763 symbol: syscall_trace_leave+0x183/0x210 ilen: 2 92890
Jump label: addr: ffffffff81084d04 symbol: __module_get+0x44/0x130 ilen: 2 88347
Jump label: addr: ffffffff8105c8e6 symbol: kthread_stop+0x66/0x1a0 ilen: 5 88347
Jump label: addr: ffffffff815a7af7 symbol: hrtimer_cpu_notify+0x147/0x228 ilen: 5 85377
Jump label: addr: ffffffff8105feb8 symbol: hrtimer_try_to_cancel+0x48/0x140 ilen: 2 85374
Jump label: addr: ffffffff8103b12e symbol: set_cpu_itimer+0x17e/0x270 ilen: 2 85371
Jump label: addr: ffffffff8109ff98 symbol: rcu_report_qs_rnp+0x88/0x200 ilen: 2 82824
Jump label: addr: ffffffff810a06af symbol: rcu_eqs_enter_common+0x2f/0x250 ilen: 2 67220
Jump label: addr: ffffffff810d9286 symbol: __bdi_update_bandwidth+0x2d6/0x430 ilen: 2 58486
Jump label: addr: ffffffff810a1986 symbol: rcu_process_callbacks+0x186/0x980 ilen: 5 42037
Jump label: addr: ffffffff810d0152 symbol: add_to_page_cache_locked+0x92/0x1a0 ilen: 2 41077
Jump label: addr: ffffffff810a1ac4 symbol: rcu_process_callbacks+0x2c4/0x980 ilen: 5 38498
Jump label: addr: ffffffff815a854e symbol: rcu_cpu_notify+0x2f2/0x882 ilen: 5 37915
Jump label: addr: ffffffff8100a193 symbol: default_idle+0x13/0x170 ilen: 2 36821
Jump label: addr: ffffffff8145c41d symbol: __cpufreq_notify_transition+0x19d/0x240 ilen: 5 36820
Jump label: addr: ffffffff81182025 symbol: ext3_truncate+0xc5/0x6f0 ilen: 5 36175
Jump label: addr: ffffffff810a0eab symbol: rcu_accelerate_cbs+0x19b/0x420 ilen: 5 34230
Jump label: addr: ffffffff810a1257 symbol: __rcu_process_gp_end+0x57/0x170 ilen: 2 34212
Jump label: addr: ffffffff810a1a38 symbol: rcu_process_callbacks+0x238/0x980 ilen: 5 33883
Jump label: addr: ffffffff810a1b3b symbol: rcu_process_callbacks+0x33b/0x980 ilen: 5 33882
Jump label: addr: ffffffff81059544 symbol: do_trace_rcu_torture_read+0x34/0x120 ilen: 2 33881
Jump label: addr: ffffffff81286602 symbol: blk_queue_bio+0x1b2/0x420 ilen: 2 32736
Jump label: addr: ffffffff8106906e symbol: rt_mutex_setprio+0x7e/0x290 ilen: 5 31821
Jump label: addr: ffffffff8127fb6b symbol: __elv_add_request+0x2b/0x350 ilen: 5 27443
Jump label: addr: ffffffff815afd2b symbol: __schedule+0x15b/0x8e0 ilen: 5 21613
Jump label: addr: ffffffff8104fd85 symbol: __request_module+0xf5/0x320 ilen: 5 21613
Jump label: addr: ffffffff810d8c39 symbol: global_dirty_limits+0xa9/0x1b0 ilen: 2 20980
Jump label: addr: ffffffff81286720 symbol: blk_queue_bio+0x2d0/0x420 ilen: 2 19492
Jump label: addr: ffffffff810a03be symbol: rcu_eqs_exit_common+0x3e/0x250 ilen: 2 17205
Jump label: addr: ffffffff8117e93f symbol: ext3_new_inode+0x3f/0xb60 ilen: 5 16992
Jump label: addr: ffffffff8117e5da symbol: ext3_free_inode+0x5a/0x380 ilen: 2 16992
Jump label: addr: ffffffff8113932f symbol: bdi_writeback_workfn+0x7f/0x270 ilen: 2 15614
Jump label: addr: ffffffff81138a29 symbol: wb_writeback+0x159/0x400 ilen: 5 15614
Jump label: addr: ffffffff81044f85 symbol: detach_if_pending+0x35/0x160 ilen: 2 14664
Jump label: addr: ffffffff81044e72 symbol: init_timer_key+0x22/0x100 ilen: 2 14403
Jump label: addr: ffffffff81049c52 symbol:
get_signal_to_deliver+0x162/0x670 ilen: 5 14400 Jump label: addr: ffffffff810496a2 symbol: send_sigqueue+0x102/0x230 ilen: 2 14174 Jump label: addr: ffffffff81048118 symbol: __send_signal.constprop.29+0x188/0x3b0 ilen: 2 14174 Jump label: addr: ffffffff810a2d63 symbol: rcu_irq_exit+0x53/0x160 ilen: 2 13479 Jump label: addr: ffffffff81034321 symbol: do_fork+0xc1/0x320 ilen: 5 12516 Jump label: addr: ffffffff81182d8a symbol: ext3_journalled_write_end+0x3a/0x270 ilen: 5 11670 Jump label: addr: ffffffff8142c358 symbol: make_request+0x268/0xb60 ilen: 2 11625 Jump label: addr: ffffffff81329fcf symbol: credit_entropy_bits+0x12f/0x210 ilen: 5 11625 Jump label: addr: ffffffff8132a8a9 symbol: get_random_bytes_arch+0x19/0x130 ilen: 5 11540 Jump label: addr: ffffffff812837bf symbol: bio_attempt_back_merge+0x3f/0x150 ilen: 2 11075 Jump label: addr: ffffffff8128406c symbol: queue_unplugged+0x2c/0x120 ilen: 2 10989 Jump label: addr: ffffffff8144670d symbol: dec_pending+0x19d/0x390 ilen: 2 10904 Jump label: addr: ffffffff8151b640 symbol: udp_recvmsg+0x350/0x440 ilen: 2 10642 Jump label: addr: ffffffff8105fb40 symbol: __run_hrtimer+0x80/0x2a0 ilen: 2 10276 Jump label: addr: ffffffff8105fb1f symbol: __run_hrtimer+0x5f/0x2a0 ilen: 5 10276 Jump label: addr: ffffffff8105f622 symbol: enqueue_hrtimer+0x22/0x100 ilen: 2 10276 Jump label: addr: ffffffff81182fff symbol: ext3_get_blocks_handle+0x3f/0xd80 ilen: 2 9342 Jump label: addr: ffffffff81284fde symbol: get_request+0x37e/0x680 ilen: 5 8942 Jump label: addr: ffffffff81184c1f symbol: ext3_evict_inode+0x1f/0x300 ilen: 5 8428 Jump label: addr: ffffffff811819fd symbol: ext3_mark_inode_dirty+0x2d/0x110 ilen: 2 8419 Jump label: addr: ffffffff81039e05 symbol: do_wait+0x15/0x2b0 ilen: 5 8310 Jump label: addr: ffffffff810a4650 symbol: rcu_note_context_switch+0x10/0x3d0 ilen: 2 8183 Jump label: addr: ffffffff815a89fe symbol: rcu_cpu_notify+0x7a2/0x882 ilen: 5 7871 Jump label: addr: ffffffff8105fae7 symbol: __run_hrtimer+0x27/0x2a0 ilen: 5 7746 
Jump label: addr: ffffffff8158c796 symbol: svc_udp_recvfrom+0x276/0x470 ilen: 2 7540 Jump label: addr: ffffffff8106ca15 symbol: update_curr+0xb5/0x1e0 ilen: 2 7149 Jump label: addr: ffffffff814c144c symbol: kfree_skb+0x3c/0x120 ilen: 2 6955 Jump label: addr: ffffffff81100f61 symbol: blk_queue_bounce+0x2e1/0x320 ilen: 5 6673 Jump label: addr: ffffffff81365b8f symbol: wakeup_source_deactivate+0xaf/0x190 ilen: 2 6608 Jump label: addr: ffffffff81088561 symbol: load_module+0x1a21/0x2390 ilen: 5 6605 Jump label: addr: ffffffff81283913 symbol: bio_attempt_front_merge+0x43/0x1b0 ilen: 5 6296 Jump label: addr: ffffffff8106a4d7 symbol: wake_up_new_task+0xc7/0x1e0 ilen: 2 6057 Jump label: addr: ffffffff8105c88d symbol: kthread_stop+0xd/0x1a0 ilen: 2 6057 Jump label: addr: ffffffff81427d48 symbol: return_io+0x58/0x110 ilen: 2 5594 Jump label: addr: ffffffff8109f7d0 symbol: _rcu_barrier_trace+0x40/0x120 ilen: 2 5356 Jump label: addr: ffffffff81560625 symbol: udpv6_recvmsg+0x515/0x610 ilen: 2 5342 Jump label: addr: ffffffff810600cc symbol: __hrtimer_start_range_ns+0x4c/0x420 ilen: 5 5025 Jump label: addr: ffffffff810071e9 symbol: emulate_vsyscall+0x59/0x400 ilen: 5 4906 Jump label: addr: ffffffff810dea68 symbol: shrink_slab+0x138/0x390 ilen: 5 4203 Jump label: addr: ffffffff81285e55 symbol: blk_peek_request+0x65/0x270 ilen: 5 4091 Jump label: addr: ffffffff814cc1fa symbol: netif_receive_skb+0x1a/0xc0 ilen: 2 4078 Jump label: addr: ffffffff81189404 symbol: ext3_unlink+0x14/0x300 ilen: 5 3943 Jump label: addr: ffffffff8109d830 symbol: handle_percpu_devid_irq+0x50/0x1c0 ilen: 2 3862 Jump label: addr: ffffffff810e1aa0 symbol: try_to_free_pages+0x3c0/0x530 ilen: 2 3696 Jump label: addr: ffffffff8103cdf7 symbol: __do_softirq+0xc7/0x310 ilen: 5 3669 Jump label: addr: ffffffff810a4541 symbol: rcu_read_unlock_special+0x2c1/0x3c0 ilen: 2 3662 Jump label: addr: ffffffff8109d84c symbol: handle_percpu_devid_irq+0x6c/0x1c0 ilen: 5 3662 Jump label: addr: ffffffff8117d24d symbol: 
ext3_trim_fs+0x53d/0x8c0 ilen: 5 3524 Jump label: addr: ffffffff812844a7 symbol: blk_update_request+0x27/0x420 ilen: 5 3487 Jump label: addr: ffffffff8109a438 symbol: handle_irq_event_percpu+0x68/0x2a0 ilen: 5 3400 Jump label: addr: ffffffff8107de91 symbol: __tick_nohz_idle_enter+0x1b1/0x4c0 ilen: 5 3400 Jump label: addr: ffffffff8117bb4e symbol: ext3_discard_reservation+0x7e/0x170 ilen: 2 3393 Jump label: addr: ffffffff8117b474 symbol: ext3_try_to_allocate_with_rsv+0x184/0x760 ilen: 5 3393 Jump label: addr: ffffffff810450ec symbol: call_timer_fn+0x3c/0x1f0 ilen: 5 2989 Jump label: addr: ffffffff815886a3 symbol: __rpc_execute+0x293/0x3d0 ilen: 2 2806 Jump label: addr: ffffffff8157ff7c symbol: call_status+0x8c/0x280 ilen: 2 2806 Jump label: addr: ffffffff815878c5 symbol: __rpc_sleep_on_priority+0x35/0x350 ilen: 5 2804 Jump label: addr: ffffffff814cadb7 symbol: dev_queue_xmit_nit+0x107/0x220 ilen: 5 2584 Jump label: addr: ffffffff814cc1e8 symbol: netif_receive_skb+0x8/0xc0 ilen: 2 2536 Jump label: addr: ffffffff811143c6 symbol: search_binary_handler+0x266/0x3c0 ilen: 5 2369 Jump label: addr: ffffffff810d0703 symbol: __delete_from_page_cache+0x23/0x1a0 ilen: 5 2324 Jump label: addr: ffffffff810457cf symbol: mod_timer_pinned+0x5f/0x1c0 ilen: 5 2108 Jump label: addr: ffffffff813656a0 symbol: wakeup_source_report_event+0xa0/0x140 ilen: 2 2051 Jump label: addr: ffffffff81074077 symbol: suspend_devices_and_enter+0x177/0x3b0 ilen: 5 2051 Jump label: addr: ffffffff81073f21 symbol: suspend_devices_and_enter+0x21/0x3b0 ilen: 5 2051 Jump label: addr: ffffffff815801f0 symbol: call_bind_status+0x50/0x230 ilen: 2 1835 Jump label: addr: ffffffff814c85a5 symbol: net_tx_action+0x65/0x1f0 ilen: 5 1701 Jump label: addr: ffffffff814c8996 symbol: __netif_receive_skb_core+0x16/0x5e0 ilen: 5 1683 Jump label: addr: ffffffff814ca83d symbol: netif_rx+0xd/0x1f0 ilen: 5 1681 Jump label: addr: ffffffff81066eae symbol: finish_task_switch+0x2e/0xf0 ilen: 2 1679 Jump label: addr: ffffffff815afd65 
symbol: __schedule+0x195/0x8e0 ilen: 5 1677 Jump label: addr: ffffffff810d858a symbol: write_cache_pages+0x1aa/0x470 ilen: 5 1619 Jump label: addr: ffffffff811387b0 symbol: queue_io+0x150/0x270 ilen: 2 1614 Jump label: addr: ffffffff810459bb symbol: mod_timer+0x8b/0x270 ilen: 5 1587 Jump label: addr: ffffffff8151d70d symbol: udp_destroy_sock+0x2d/0x90 ilen: 2 1452 Jump label: addr: ffffffff815afd50 symbol: __schedule+0x180/0x8e0 ilen: 5 1445 Jump label: addr: ffffffff8109a458 symbol: handle_irq_event_percpu+0x88/0x2a0 ilen: 2 1427 Jump label: addr: ffffffff8117fee0 symbol: ext3_writeback_writepage+0xd0/0x230 ilen: 5 1303 Jump label: addr: ffffffff81180110 symbol: ext3_ordered_writepage+0xd0/0x2f0 ilen: 5 1209 Jump label: addr: ffffffff8151a063 symbol: raw_sendmsg+0x793/0x940 ilen: 5 1128 Jump label: addr: ffffffff81588b03 symbol: rpc_execute+0x23/0x120 ilen: 2 1127 Jump label: addr: ffffffff8158848a symbol: __rpc_execute+0x7a/0x3d0 ilen: 2 1127 Jump label: addr: ffffffff8157f230 symbol: call_connect_status+0x30/0x190 ilen: 2 1127 Jump label: addr: ffffffff814f7632 symbol: __ip_local_out+0x32/0x80 ilen: 2 1121 Jump label: addr: ffffffff81538d89 symbol: xfrm4_output+0x39/0xa0 ilen: 2 1120 Jump label: addr: ffffffff8103d1eb symbol: __raise_softirq_irqoff+0x1b/0xe0 ilen: 2 1119 Jump label: addr: ffffffff814f2eae symbol: ip_rcv+0x1fe/0x330 ilen: 2 1110 Jump label: addr: ffffffff81085905 symbol: module_put+0x45/0x150 ilen: 2 907 Jump label: addr: ffffffff81067bf7 symbol: ttwu_do_wakeup+0x27/0x150 ilen: 5 907 Jump label: addr: ffffffff81138598 symbol: __bdi_start_writeback+0x88/0x150 ilen: 2 613 Jump label: addr: ffffffff811e3dca symbol: ext4_es_find_delayed_extent_range+0x2a/0x260 ilen: 5 514 Jump label: addr: ffffffff810453f1 symbol: run_timer_softirq+0x151/0x2b0 ilen: 5 419 Jump label: addr: ffffffff814ca867 symbol: netif_rx+0x37/0x1f0 ilen: 2 414 Jump label: addr: ffffffff8103b6d4 symbol: do_setitimer+0x164/0x2e0 ilen: 5 398 Jump label: addr: ffffffff8117b828 symbol: 
ext3_try_to_allocate_with_rsv+0x538/0x760 ilen: 5 351 Jump label: addr: ffffffff81184919 symbol: ext3_direct_IO+0x169/0x450 ilen: 5 299 Jump label: addr: ffffffff81184831 symbol: ext3_direct_IO+0x81/0x450 ilen: 5 276 Jump label: addr: ffffffff8117afbb symbol: read_block_bitmap+0x3b/0x230 ilen: 2 206 Jump label: addr: ffffffff81138e6b symbol: bdi_start_background_writeback+0x1b/0xe0 ilen: 2 205 Jump label: addr: ffffffff8118068b symbol: ext3_readpage+0x1b/0xd0 ilen: 2 202 Jump label: addr: ffffffff8117caa7 symbol: ext3_new_blocks+0x6b7/0x790 ilen: 2 183 Jump label: addr: ffffffff8117c288 symbol: ext3_free_blocks+0x38/0x130 ilen: 2 183 Jump label: addr: ffffffff81074701 symbol: cpu_startup_entry+0x141/0x2d0 ilen: 5 183 Jump label: addr: ffffffff8100a1b0 symbol: default_idle+0x30/0x170 ilen: 5 183 Jump label: addr: ffffffff810e9c15 symbol: bdi_register+0x115/0x1c0 ilen: 2 175 Jump label: addr: ffffffff810e9902 symbol: bdi_unregister+0x22/0x220 ilen: 5 175 Jump label: addr: ffffffff810a469c symbol: rcu_note_context_switch+0x5c/0x3d0 ilen: 5 171 Jump label: addr: ffffffff811e3a30 symbol: ext4_es_lookup_extent+0xa0/0x230 ilen: 5 169 Jump label: addr: ffffffff8117e24f symbol: ext3_sync_file+0x4f/0x2e0 ilen: 2 167 Jump label: addr: ffffffff8127f157 symbol: elv_abort_queue+0x37/0xf0 ilen: 2 161 Jump label: addr: ffffffff811828b5 symbol: ext3_ordered_write_end+0x45/0x220 ilen: 5 140 Jump label: addr: ffffffff811826c9 symbol: ext3_writeback_write_end+0x59/0x200 ilen: 5 140 Jump label: addr: ffffffff811e3ea1 symbol: ext4_es_find_delayed_extent_range+0x101/0x260 ilen: 2 138 Jump label: addr: ffffffff811e39a6 symbol: ext4_es_lookup_extent+0x16/0x230 ilen: 5 138 Jump label: addr: ffffffff814ff669 symbol: tcp_prequeue_process+0x59/0xc0 ilen: 2 127 Jump label: addr: ffffffff814cc00e symbol: __netif_receive_skb+0xe/0x80 ilen: 2 109 Jump label: addr: ffffffff81180d03 symbol: ext3_forget+0x33/0x1c0 ilen: 5 100 Jump label: addr: ffffffff8132924d symbol: __mix_pool_bytes+0x3d/0x110 
ilen: 2 85 Jump label: addr: ffffffff812835aa symbol: generic_make_request_checks+0x20a/0x3e0 ilen: 5 85 Jump label: addr: ffffffff811f7986 symbol: do_get_write_access+0x2a6/0x770 ilen: 2 76 Jump label: addr: ffffffff811f07cc symbol: log_do_checkpoint+0x1c/0x6b0 ilen: 5 76 Jump label: addr: ffffffff811ee570 symbol: journal_commit_transaction+0x630/0x1720 ilen: 5 76 Jump label: addr: ffffffff811ee1b8 symbol: journal_commit_transaction+0x278/0x1720 ilen: 5 76 Jump label: addr: ffffffff811edfb7 symbol: journal_commit_transaction+0x77/0x1720 ilen: 5 76 Jump label: addr: ffffffff811edf89 symbol: journal_commit_transaction+0x49/0x1720 ilen: 5 76 Jump label: addr: ffffffff810e179b symbol: try_to_free_pages+0xbb/0x530 ilen: 5 75 Jump label: addr: ffffffff811f9ece symbol: jbd2_journal_commit_transaction+0x150e/0x1f90 ilen: 5 74 Jump label: addr: ffffffff8118b042 symbol: ext3_sync_fs+0x22/0x120 ilen: 2 71 Jump label: addr: ffffffff8117e371 symbol: ext3_sync_file+0x171/0x2e0 ilen: 2 71 Jump label: addr: ffffffff81139146 symbol: wb_do_writeback+0xa6/0x210 ilen: 2 64 Jump label: addr: ffffffff81137622 symbol: bdi_queue_work+0x22/0x130 ilen: 2 64 Jump label: addr: ffffffff81137179 symbol: __writeback_single_inode+0x219/0x380 ilen: 2 64 Jump label: addr: ffffffff814ce724 symbol: dev_hard_start_xmit+0x214/0x620 ilen: 5 62 Jump label: addr: ffffffff81045c1c symbol: mod_timer_pending+0x5c/0x1b0 ilen: 5 59 Jump label: addr: ffffffff810d7e52 symbol: account_page_dirtied+0x22/0x170 ilen: 5 49 Jump label: addr: ffffffff8117fc50 symbol: ext3_journalled_writepage+0xd0/0x290 ilen: 5 34 Jump label: addr: ffffffff8109fa94 symbol: rcu_implicit_dynticks_qs+0x64/0x230 ilen: 2 33 Jump label: addr: ffffffff8145e4a8 symbol: cpuidle_idle_call+0xf8/0x2d0 ilen: 2 30 Jump label: addr: ffffffff81137004 symbol: __writeback_single_inode+0xa4/0x380 ilen: 5 29 Jump label: addr: ffffffff814f7d3f symbol: ip_mc_output+0xef/0x240 ilen: 2 28 Jump label: addr: ffffffff810a2c48 symbol: rcu_bh_qs+0x38/0x100 ilen: 
2 24 Jump label: addr: ffffffff81084bc3 symbol: try_module_get+0x53/0x150 ilen: 2 24 Jump label: addr: ffffffff8103a2fe symbol: do_exit+0x25e/0xa20 ilen: 5 24 Jump label: addr: ffffffff814285e7 symbol: raid5_align_endio+0x87/0x240 ilen: 5 22 Jump label: addr: ffffffff810682ee symbol: wait_task_inactive+0x5e/0x1d0 ilen: 5 22 Jump label: addr: ffffffff81067f64 symbol: set_task_cpu+0x34/0x1d0 ilen: 5 22 Jump label: addr: ffffffff810e9362 symbol: wait_iff_congested+0xb2/0x1b0 ilen: 2 20 Jump label: addr: ffffffff810a2b48 symbol: rcu_sched_qs+0x38/0x100 ilen: 2 19 Jump label: addr: ffffffff8116a76a symbol: oom_adj_write+0x18a/0x2f0 ilen: 2 14 Jump label: addr: ffffffff81045622 symbol: add_timer_on+0x72/0x150 ilen: 2 12 Jump label: addr: ffffffff81538924 symbol: xfrm4_transport_finish+0x54/0x100 ilen: 2 10 Jump label: addr: ffffffff8132a1a6 symbol: mix_pool_bytes.constprop.16+0x36/0x120 ilen: 2 10 Jump label: addr: ffffffff814f7cec symbol: ip_mc_output+0x9c/0x240 ilen: 5 9 Jump label: addr: ffffffff814f2c35 symbol: ip_local_deliver+0x25/0xa0 ilen: 2 9 Jump label: addr: ffffffff810a0b1b symbol: __note_new_gpnum.isra.16+0x3b/0x110 ilen: 2 9 Jump label: addr: ffffffff810864fb symbol: free_module+0xb/0x290 ilen: 5 8 Jump label: addr: ffffffff81562237 symbol: udpv6_queue_rcv_skb+0x57/0x370 ilen: 5 7 Jump label: addr: ffffffff8156143b symbol: udpv6_destroy_sock+0x1b/0x70 ilen: 2 7 Jump label: addr: ffffffff8105dbc4 symbol: check_cpu_itimer.part.3+0x74/0x1b0 ilen: 5 6 Jump label: addr: ffffffff81431f8c symbol: raid5d+0x2dc/0x5a0 ilen: 5 5 Jump label: addr: ffffffff815ae662 symbol: schedule_timeout+0x92/0x250 ilen: 5 4 Jump label: addr: ffffffff8151f08c symbol: arp_xmit+0xc/0x60 ilen: 2 4 Jump label: addr: ffffffff814f7ec8 symbol: ip_output+0x38/0x90 ilen: 2 4 Jump label: addr: ffffffff81181f87 symbol: ext3_truncate+0x27/0x6f0 ilen: 5 3 Jump label: addr: ffffffff814c38b0 symbol: __netdev_alloc_skb+0x70/0x110 ilen: 2 2 Jump label: addr: ffffffff814c2267 symbol: 
__alloc_skb+0x37/0x2b0 ilen: 5 2 Jump label: addr: ffffffff811f5e27 symbol: jbd2_journal_extend+0x137/0x340 ilen: 2 2 Jump label: addr: ffffffff81573d52 symbol: xfrm6_transport_finish+0x62/0xc0 ilen: 2 1 Jump label: addr: ffffffff81284f43 symbol: get_request+0x2e3/0x680 ilen: 5 1 Jump label: addr: ffffffff8110b4fd symbol: kmem_cache_free+0xdd/0x220 ilen: 2 1 Jump label: addr: ffffffff8110b254 symbol: kfree+0x14/0x1e0 ilen: 5 1 Jump label: addr: ffffffff810a32d9 symbol: rcu_gp_kthread+0x179/0x920 ilen: 5 1 Jump label: addr: ffffffff81067f9e symbol: set_task_cpu+0x6e/0x1d0 ilen: 5 1 Jump label: addr: ffffffff815b4ad8 symbol: __do_page_fault+0x378/0x580 ilen: 2 0 Jump label: addr: ffffffff815b4953 symbol: __do_page_fault+0x1f3/0x580 ilen: 5 0 Jump label: addr: ffffffff815b4898 symbol: __do_page_fault+0x138/0x580 ilen: 5 0 Jump label: addr: ffffffff815aabef symbol: migrate_timer_list+0x1f/0xc0 ilen: 2 0 Jump label: addr: ffffffff815a844c symbol: rcu_cpu_notify+0x1f0/0x882 ilen: 5 0 Jump label: addr: ffffffff81587db0 symbol: rpc_wake_up_task_queue_locked+0x70/0x280 ilen: 5 0 Jump label: addr: ffffffff815812ad symbol: rpc_task_set_client+0x3d/0xd0 ilen: 2 0 Jump label: addr: ffffffff81574124 symbol: xfrm6_output+0x14/0x70 ilen: 2 0 Jump label: addr: ffffffff8156842f symbol: igmp6_send+0x20f/0x3a0 ilen: 2 0 Jump label: addr: ffffffff8156785f symbol: mld_sendpack+0x12f/0x290 ilen: 2 0 Jump label: addr: ffffffff8156409d symbol: rawv6_sendmsg+0x67d/0xb60 ilen: 5 0 Jump label: addr: ffffffff8155d751 symbol: ndisc_send_skb+0x151/0x2a0 ilen: 2 0 Jump label: addr: ffffffff8154d2dc symbol: ip6_input+0xc/0x60 ilen: 2 0 Jump label: addr: ffffffff8154d113 symbol: ipv6_rcv+0x263/0x420 ilen: 5 0 Jump label: addr: ffffffff8154ca5d symbol: ip6_output+0x2d/0xb0 ilen: 2 0 Jump label: addr: ffffffff8154b994 symbol: ip6_forward+0x1f4/0x7c0 ilen: 2 0 Jump label: addr: ffffffff8154b313 symbol: __ip6_local_out+0x33/0x90 ilen: 2 0 Jump label: addr: ffffffff8154b0dc symbol: ip6_xmit+0x1ac/0x3b0 
ilen: 5 0 Jump label: addr: ffffffff8154ac46 symbol: ip6_finish_output2+0x346/0x470 ilen: 2 0 Jump label: addr: ffffffff8151fc56 symbol: arp_rcv+0xd6/0x160 ilen: 2 0 Jump label: addr: ffffffff8151d9f7 symbol: udp_queue_rcv_skb+0x57/0x350 ilen: 5 0 Jump label: addr: ffffffff8151a7c5 symbol: __udp_queue_rcv_skb+0x95/0x1a0 ilen: 2 0 Jump label: addr: ffffffff815145d9 symbol: tcp_prequeue+0xe9/0x260 ilen: 2 0 Jump label: addr: ffffffff81510d41 symbol: tcp_delack_timer_handler+0xb1/0x1e0 ilen: 2 0 Jump label: addr: ffffffff814f7dcd symbol: ip_mc_output+0x17d/0x240 ilen: 2 0 Jump label: addr: ffffffff814f4408 symbol: ip_forward+0x218/0x3b0 ilen: 5 0 Jump label: addr: ffffffff814ceb84 symbol: dev_queue_xmit+0x54/0x4d0 ilen: 5 0 Jump label: addr: ffffffff814ce847 symbol: dev_hard_start_xmit+0x337/0x620 ilen: 2 0 Jump label: addr: ffffffff814cc501 symbol: net_rx_action+0xa1/0x200 ilen: 2 0 Jump label: addr: ffffffff814ca852 symbol: netif_rx+0x22/0x1f0 ilen: 5 0 Jump label: addr: ffffffff814c89ae symbol: __netif_receive_skb_core+0x2e/0x5e0 ilen: 5 0 Jump label: addr: ffffffff814c4ddb symbol: skb_copy_datagram_iovec+0x2b/0x2c0 ilen: 5 0 Jump label: addr: ffffffff814c1333 symbol: consume_skb+0x33/0x110 ilen: 2 0 Jump label: addr: ffffffff814bb85b symbol: sock_queue_rcv_skb+0x3b/0x230 ilen: 2 0 Jump label: addr: ffffffff814bb743 symbol: sk_receive_skb+0x93/0x170 ilen: 2 0 Jump label: addr: ffffffff814baf38 symbol: release_sock+0x88/0x170 ilen: 5 0 Jump label: addr: ffffffff814ba800 symbol: __sk_mem_schedule+0xf0/0x310 ilen: 5 0 Jump label: addr: ffffffff81456b7c symbol: ghes_edac_report_mem_error+0x2fc/0xdc0 ilen: 2 0 Jump label: addr: ffffffff81451cf7 symbol: edac_mc_handle_error+0x417/0x6b0 ilen: 2 0 Jump label: addr: ffffffff8144890c symbol: dm_request_fn+0x12c/0x2b0 ilen: 2 0 Jump label: addr: ffffffff814469c5 symbol: __map_bio+0xc5/0x1d0 ilen: 2 0 Jump label: addr: ffffffff81430147 symbol: handle_stripe+0x9a7/0x22c0 ilen: 5 0 Jump label: addr: ffffffff8142ffea symbol: 
handle_stripe+0x84a/0x22c0 ilen: 5 0 Jump label: addr: ffffffff8142c795 symbol: make_request+0x6a5/0xb60 ilen: 5 0 Jump label: addr: ffffffff8142afc7 symbol: raid5_unplug+0x97/0x150 ilen: 2 0 Jump label: addr: ffffffff8138bba9 symbol: scsi_times_out+0x29/0x150 ilen: 2 0 Jump label: addr: ffffffff8138b988 symbol: scsi_eh_wakeup+0x48/0x100 ilen: 2 0 Jump label: addr: ffffffff81387b55 symbol: scsi_dispatch_cmd+0xd5/0x2c0 ilen: 5 0 Jump label: addr: ffffffff81387b1d symbol: scsi_dispatch_cmd+0x9d/0x2c0 ilen: 5 0 Jump label: addr: ffffffff813870db symbol: scsi_done+0x1b/0xd0 ilen: 2 0 Jump label: addr: ffffffff8133d17b symbol: drm_wait_vblank+0x46b/0x650 ilen: 2 0 Jump label: addr: ffffffff8133c76d symbol: drm_handle_vblank+0x2cd/0x3a0 ilen: 2 0 Jump label: addr: ffffffff8133b083 symbol: send_vblank_event+0x93/0x1a0 ilen: 2 0 Jump label: addr: ffffffff8132ab51 symbol: extract_entropy_user+0x31/0x1c0 ilen: 5 0 Jump label: addr: ffffffff8132a75e symbol: extract_entropy+0x3e/0x170 ilen: 5 0 Jump label: addr: ffffffff81283b4f symbol: blk_requeue_request+0x2f/0x120 ilen: 2 0 Jump label: addr: ffffffff812834a6 symbol: generic_make_request_checks+0x106/0x3e0 ilen: 5 0 Jump label: addr: ffffffff81201f05 symbol: __jbd2_update_log_tail+0x75/0x1b0 ilen: 2 0 Jump label: addr: ffffffff81200dd0 symbol: jbd2_write_superblock+0x30/0x1f0 ilen: 5 0 Jump label: addr: ffffffff811fc229 symbol: jbd2_log_do_checkpoint+0x29/0x570 ilen: 5 0 Jump label: addr: ffffffff811fbf89 symbol: __jbd2_journal_remove_checkpoint+0xf9/0x1f0 ilen: 2 0 Jump label: addr: ffffffff811fbd65 symbol: __jbd2_journal_drop_transaction+0xe5/0x210 ilen: 2 0 Jump label: addr: ffffffff811fa052 symbol: jbd2_journal_commit_transaction+0x1692/0x1f90 ilen: 5 0 Jump label: addr: ffffffff811f8fa1 symbol: jbd2_journal_commit_transaction+0x5e1/0x1f90 ilen: 5 0 Jump label: addr: ffffffff811f8e5b symbol: jbd2_journal_commit_transaction+0x49b/0x1f90 ilen: 5 0 Jump label: addr: ffffffff811f8d37 symbol: 
jbd2_journal_commit_transaction+0x377/0x1f90 ilen: 5 0 Jump label: addr: ffffffff811f8ac6 symbol: jbd2_journal_commit_transaction+0x106/0x1f90 ilen: 5 0 Jump label: addr: ffffffff811f8a8a symbol: jbd2_journal_commit_transaction+0xca/0x1f90 ilen: 5 0 Jump label: addr: ffffffff811f6367 symbol: jbd2_journal_stop+0x147/0x4f0 ilen: 5 0 Jump label: addr: ffffffff811f5bf9 symbol: jbd2__journal_start+0x119/0x200 ilen: 2 0 Jump label: addr: ffffffff811f2bab symbol: journal_write_superblock+0x2b/0x200 ilen: 5 0 Jump label: addr: ffffffff811f03a0 symbol: __journal_drop_transaction+0xd0/0x190 ilen: 2 0 Jump label: addr: ffffffff811f0128 symbol: cleanup_journal_tail+0xf8/0x220 ilen: 2 0 Jump label: addr: ffffffff811ef5ad symbol: journal_commit_transaction+0x166d/0x1720 ilen: 2 0 Jump label: addr: ffffffff811ef376 symbol: journal_commit_transaction+0x1436/0x1720 ilen: 2 0 Jump label: addr: ffffffff811ef0c2 symbol: journal_commit_transaction+0x1182/0x1720 ilen: 5 0 Jump label: addr: ffffffff811ee428 symbol: journal_commit_transaction+0x4e8/0x1720 ilen: 5 0 Jump label: addr: ffffffff811e3c27 symbol: ext4_es_insert_extent+0x67/0x190 ilen: 2 0 Jump label: addr: ffffffff811e378a symbol: ext4_es_remove_extent+0x2a/0x130 ilen: 2 0 Jump label: addr: ffffffff811e2fbf symbol: ext4_es_shrink+0x23f/0x3c0 ilen: 5 0 Jump label: addr: ffffffff811e2db3 symbol: ext4_es_shrink+0x33/0x3c0 ilen: 2 0 Jump label: addr: ffffffff811e1ba1 symbol: ext4_ind_map_blocks+0xb1/0x740 ilen: 5 0 Jump label: addr: ffffffff811e1b3c symbol: ext4_ind_map_blocks+0x4c/0x740 ilen: 5 0 Jump label: addr: ffffffff811dd1ad symbol: ext4_trim_fs+0x47d/0xa90 ilen: 5 0 Jump label: addr: ffffffff811dd0f1 symbol: ext4_trim_fs+0x3c1/0xa90 ilen: 5 0 Jump label: addr: ffffffff811dcef9 symbol: ext4_trim_fs+0x1c9/0xa90 ilen: 5 0 Jump label: addr: ffffffff811dc094 symbol: ext4_free_blocks+0x434/0xbd0 ilen: 5 0 Jump label: addr: ffffffff811dc002 symbol: ext4_free_blocks+0x3a2/0xbd0 ilen: 5 0 Jump label: addr: ffffffff811dbcbe symbol: 
ext4_free_blocks+0x5e/0xbd0 ilen: 5 0 Jump label: addr: ffffffff811dbaca symbol: ext4_mb_new_blocks+0x45a/0x5f0 ilen: 5 0 Jump label: addr: ffffffff811db7ee symbol: ext4_mb_new_blocks+0x17e/0x5f0 ilen: 5 0 Jump label: addr: ffffffff811db698 symbol: ext4_mb_new_blocks+0x28/0x5f0 ilen: 5 0 Jump label: addr: ffffffff811db218 symbol: ext4_discard_preallocations+0x38/0x490 ilen: 5 0 Jump label: addr: ffffffff811d9e52 symbol: ext4_mb_release_inode_pa.isra.23+0x152/0x3a0 ilen: 5 0 Jump label: addr: ffffffff811d9e27 symbol: ext4_mb_release_inode_pa.isra.23+0x127/0x3a0 ilen: 5 0 Jump label: addr: ffffffff811d99e0 symbol: ext4_mb_release_context+0x2b0/0x5d0 ilen: 2 0 Jump label: addr: ffffffff811d981a symbol: ext4_mb_release_context+0xea/0x5d0 ilen: 2 0 Jump label: addr: ffffffff811d92e4 symbol: ext4_mb_release_group_pa+0x94/0x1d0 ilen: 5 0 Jump label: addr: ffffffff811d9268 symbol: ext4_mb_release_group_pa+0x18/0x1d0 ilen: 5 0 Jump label: addr: ffffffff811d8fab symbol: ext4_free_data_callback+0x4b/0x2f0 ilen: 5 0 Jump label: addr: ffffffff811d6b37 symbol: ext4_mb_init_cache+0x3e7/0x780 ilen: 5 0 Jump label: addr: ffffffff811d69cc symbol: ext4_mb_init_cache+0x27c/0x780 ilen: 5 0 Jump label: addr: ffffffff811d5448 symbol: ext4_mb_new_inode_pa+0x158/0x3c0 ilen: 5 0 Jump label: addr: ffffffff811d50ad symbol: ext4_mb_new_group_pa+0xdd/0x320 ilen: 5 0 Jump label: addr: ffffffff811d375e symbol: __ext4_forget+0x3e/0x2a0 ilen: 5 0 Jump label: addr: ffffffff811d3394 symbol: __ext4_journal_start_sb+0x34/0x1e0 ilen: 2 0 Jump label: addr: ffffffff811d2bba symbol: ext4_fallocate+0x50a/0x5e0 ilen: 2 0 Jump label: addr: ffffffff811d2924 symbol: ext4_fallocate+0x274/0x5e0 ilen: 5 0 Jump label: addr: ffffffff811d2720 symbol: ext4_fallocate+0x70/0x5e0 ilen: 5 0 Jump label: addr: ffffffff811d1739 symbol: ext4_ext_map_blocks+0x1b9/0x1070 ilen: 5 0 Jump label: addr: ffffffff811d1661 symbol: ext4_ext_map_blocks+0xe1/0x1070 ilen: 5 0 Jump label: addr: ffffffff811d15d2 symbol: 
ext4_ext_map_blocks+0x52/0x1070 ilen: 5 0 Jump label: addr: ffffffff811d0eac symbol: ext4_ext_handle_uninitialized_extents+0x7dc/0xe90 ilen: 5 0 Jump label: addr: ffffffff811d0cbb symbol: ext4_ext_handle_uninitialized_extents+0x5eb/0xe90 ilen: 5 0 Jump label: addr: ffffffff811d0b92 symbol: ext4_ext_handle_uninitialized_extents+0x4c2/0xe90 ilen: 5 0 Jump label: addr: ffffffff811d0707 symbol: ext4_ext_handle_uninitialized_extents+0x37/0xe90 ilen: 5 0 Jump label: addr: ffffffff811d057a symbol: get_reserved_cluster_alloc+0x5a/0x1b0 ilen: 2 0 Jump label: addr: ffffffff811cfbd3 symbol: ext4_ext_remove_space+0x7a3/0x1060 ilen: 5 0 Jump label: addr: ffffffff811cf9dd symbol: ext4_ext_remove_space+0x5ad/0x1060 ilen: 5 0 Jump label: addr: ffffffff811cf879 symbol: ext4_ext_remove_space+0x449/0x1060 ilen: 5 0 Jump label: addr: ffffffff811cf4c0 symbol: ext4_ext_remove_space+0x90/0x1060 ilen: 5 0 Jump label: addr: ffffffff811cd4bd symbol: ext4_ext_find_extent+0x14d/0x400 ilen: 5 0 Jump label: addr: ffffffff811cd09e symbol: ext4_ext_rm_idx+0xde/0x240 ilen: 5 0 Jump label: addr: ffffffff811cc704 symbol: get_implied_cluster_alloc+0xe4/0x240 ilen: 5 0 Jump label: addr: ffffffff811cc688 symbol: get_implied_cluster_alloc+0x68/0x240 ilen: 5 0 Jump label: addr: ffffffff811b898e symbol: ext4_sync_fs+0x2e/0x130 ilen: 2 0 Jump label: addr: ffffffff811b88a9 symbol: ext4_drop_inode+0x39/0xf0 ilen: 2 0 Jump label: addr: ffffffff811b5496 symbol: ext4_unlink+0x1c6/0x3a0 ilen: 5 0 Jump label: addr: ffffffff811b52e4 symbol: ext4_unlink+0x14/0x3a0 ilen: 5 0 Jump label: addr: ffffffff811ade40 symbol: ext4_evict_inode+0x440/0x530 ilen: 2 0 Jump label: addr: ffffffff811ada0d symbol: ext4_evict_inode+0xd/0x530 ilen: 2 0 Jump label: addr: ffffffff811ad73a symbol: ext4_journalled_write_end+0x3a/0x300 ilen: 2 0 Jump label: addr: ffffffff811ad5ea symbol: ext4_da_writepages+0x59a/0x6b0 ilen: 2 0 Jump label: addr: ffffffff811ad3f9 symbol: ext4_da_writepages+0x3a9/0x6b0 ilen: 5 0 Jump label: addr: 
Jump label: addr: ffffffff811ad091 symbol: ext4_da_writepages+0x41/0x6b0 ilen: 2 0
Jump label: addr: ffffffff811ac4f6 symbol: ext4_da_write_end+0x76/0x2e0 ilen: 5 0
Jump label: addr: ffffffff811ac1ae symbol: ext4_write_end+0x3e/0x310 ilen: 5 0
Jump label: addr: ffffffff811abc61 symbol: ext4_punch_hole+0x71/0x580 ilen: 2 0
Jump label: addr: ffffffff811aba94 symbol: ext4_setattr+0x4d4/0x630 ilen: 5 0
Jump label: addr: ffffffff811ab2db symbol: ext4_da_write_begin+0x5b/0x340 ilen: 5 0
Jump label: addr: ffffffff811aae3e symbol: ext4_write_begin+0x2e/0x470 ilen: 5 0
Jump label: addr: ffffffff811aaba8 symbol: ext4_truncate+0x198/0x400 ilen: 5 0
Jump label: addr: ffffffff811aaa3b symbol: ext4_truncate+0x2b/0x400 ilen: 5 0
Jump label: addr: ffffffff811aa7f8 symbol: ext4_mark_inode_dirty+0x38/0x250 ilen: 2 0
Jump label: addr: ffffffff811a957f symbol: ext4_writepage+0x2f/0x460 ilen: 5 0
Jump label: addr: ffffffff811a864b symbol: ext4_alloc_da_blocks+0x1b/0x110 ilen: 2 0
Jump label: addr: ffffffff811a7bee symbol: ext4_da_update_reserve_space+0x4e/0x2e0 ilen: 5 0
Jump label: addr: ffffffff811a7933 symbol: ext4_da_invalidatepage+0x163/0x2f0 ilen: 2 0
Jump label: addr: ffffffff811a70bc symbol: ext4_da_get_block_prep+0x48c/0x760 ilen: 5 0
Jump label: addr: ffffffff811a6f74 symbol: ext4_da_get_block_prep+0x344/0x760 ilen: 5 0
Jump label: addr: ffffffff811a6b26 symbol: ext4_readpage+0x26/0x130 ilen: 2 0
Jump label: addr: ffffffff811a68a1 symbol: ext4_releasepage+0x41/0x180 ilen: 2 0
Jump label: addr: ffffffff811a6591 symbol: ext4_direct_IO+0x321/0x510 ilen: 2 0
Jump label: addr: ffffffff811a6381 symbol: ext4_direct_IO+0x111/0x510 ilen: 5 0
Jump label: addr: ffffffff811a5d42 symbol: ext4_invalidatepage+0x22/0x110 ilen: 2 0
Jump label: addr: ffffffff811a5a30 symbol: __ext4_get_inode_loc+0x180/0x470 ilen: 2 0
Jump label: addr: ffffffff811a57e1 symbol: __ext4_journalled_invalidatepage+0x41/0x110 ilen: 2 0
Jump label: addr: ffffffff811a4e2b symbol: __ext4_new_inode+0x13bb/0x1460 ilen: 2 0
Jump label: addr: ffffffff811a3ad4 symbol: __ext4_new_inode+0x64/0x1460 ilen: 5 0
Jump label: addr: ffffffff811a350c symbol: ext4_free_inode+0x6c/0x5d0 ilen: 5 0
Jump label: addr: ffffffff811a30b6 symbol: ext4_read_inode_bitmap+0x246/0x630 ilen: 5 0
Jump label: addr: ffffffff811a26de symbol: ext4_sync_file+0x1be/0x410 ilen: 2 0
Jump label: addr: ffffffff811a2586 symbol: ext4_sync_file+0x66/0x410 ilen: 2 0
Jump label: addr: ffffffff811a01d0 symbol: ext4_read_block_bitmap_nowait+0x180/0x330 ilen: 5 0
Jump label: addr: ffffffff8118af69 symbol: ext3_drop_inode+0x39/0xf0 ilen: 2 0
Jump label: addr: ffffffff81189499 symbol: ext3_unlink+0xa9/0x300 ilen: 5 0
Jump label: addr: ffffffff8118305b symbol: ext3_get_blocks_handle+0x9b/0xd80 ilen: 5 0
Jump label: addr: ffffffff81182acb symbol: ext3_write_begin+0x3b/0x2c0 ilen: 5 0
Jump label: addr: ffffffff8118205d symbol: ext3_truncate+0xfd/0x6f0 ilen: 5 0
Jump label: addr: ffffffff81180909 symbol: __ext3_get_inode_loc+0x149/0x3c0 ilen: 2 0
Jump label: addr: ffffffff811804d1 symbol: ext3_invalidatepage+0x41/0x110 ilen: 2 0
Jump label: addr: ffffffff81180371 symbol: ext3_releasepage+0x41/0x160 ilen: 2 0
Jump label: addr: ffffffff8117f38a symbol: ext3_new_inode+0xa8a/0xb60 ilen: 2 0
Jump label: addr: ffffffff8117c467 symbol: ext3_new_blocks+0x77/0x790 ilen: 5 0
Jump label: addr: ffffffff8117b1dd symbol: ext3_rsv_window_add+0x2d/0x140 ilen: 2 0
Jump label: addr: ffffffff81142e31 symbol: bio_split+0x51/0x2f0 ilen: 5 0
Jump label: addr: ffffffff81140af4 symbol: __find_get_block+0x94/0x270 ilen: 2 0
Jump label: addr: ffffffff8113ebd6 symbol: mark_buffer_dirty+0x26/0x160 ilen: 2 0
Jump label: addr: ffffffff8113e15b symbol: touch_buffer+0x1b/0xd0 ilen: 2 0
Jump label: addr: ffffffff8113947b symbol: bdi_writeback_workfn+0x1cb/0x270 ilen: 2 0
Jump label: addr: ffffffff811389da symbol: wb_writeback+0x10a/0x400 ilen: 5 0
Jump label: addr: ffffffff81138992 symbol: wb_writeback+0xc2/0x400 ilen: 5 0
Jump label: addr: ffffffff81137a7a symbol: __mark_inode_dirty+0x5a/0x3d0 ilen: 5 0
Jump label: addr: ffffffff81137a51 symbol: __mark_inode_dirty+0x31/0x3d0 ilen: 5 0
Jump label: addr: ffffffff81137150 symbol: __writeback_single_inode+0x1f0/0x380 ilen: 5 0
Jump label: addr: ffffffff81136f8c symbol: __writeback_single_inode+0x2c/0x380 ilen: 5 0
Jump label: addr: ffffffff81115db7 symbol: set_task_comm+0x37/0x110 ilen: 2 0
Jump label: addr: ffffffff8110c779 symbol: __kmalloc_track_caller+0xb9/0x1e0 ilen: 2 0
Jump label: addr: ffffffff8110bf0e symbol: __kmalloc+0xbe/0x1e0 ilen: 2 0
Jump label: addr: ffffffff8110b8ef symbol: kmem_cache_alloc_trace+0xaf/0x1d0 ilen: 2 0
Jump label: addr: ffffffff8110b724 symbol: kmem_cache_alloc+0xb4/0x1d0 ilen: 2 0
Jump label: addr: ffffffff8110a6e1 symbol: kmalloc_order_trace+0x51/0x120 ilen: 2 0
Jump label: addr: ffffffff810e1551 symbol: kswapd+0x7a1/0x930 ilen: 5 0
Jump label: addr: ffffffff810e0f1a symbol: kswapd+0x16a/0x930 ilen: 5 0
Jump label: addr: ffffffff810df1b6 symbol: wakeup_kswapd+0xa6/0x1a0 ilen: 2 0
Jump label: addr: ffffffff810d7a66 symbol: free_hot_cold_page_list+0x46/0x100 ilen: 2 0
Jump label: addr: ffffffff810d7141 symbol: free_pcppages_bulk+0x181/0x420 ilen: 5 0
Jump label: addr: ffffffff810d6474 symbol: __alloc_pages_nodemask+0x114/0x7e0 ilen: 2 0
Jump label: addr: ffffffff810d5bee symbol: free_pages_prepare+0xe/0x160 ilen: 5 0
Jump label: addr: ffffffff810d53e0 symbol: __rmqueue+0x2b0/0x560 ilen: 5 0
Jump label: addr: ffffffff810d53a2 symbol: __rmqueue+0x272/0x560 ilen: 5 0
Jump label: addr: ffffffff810a48b6 symbol: rcu_note_context_switch+0x276/0x3d0 ilen: 2 0
Jump label: addr: ffffffff810a43bd symbol: rcu_read_unlock_special+0x13d/0x3c0 ilen: 5 0
Jump label: addr: ffffffff810a3d17 symbol: rcu_check_callbacks+0x297/0x7a0 ilen: 5 0
Jump label: addr: ffffffff810a3a94 symbol: rcu_check_callbacks+0x14/0x7a0 ilen: 5 0
Jump label: addr: ffffffff810a3800 symbol: rcu_gp_kthread+0x6a0/0x920 ilen: 5 0
Jump label: addr: ffffffff810a340f symbol: rcu_gp_kthread+0x2af/0x920 ilen: 5 0
Jump label: addr: ffffffff810a27fe symbol: __call_rcu.constprop.45+0xde/0x3a0 ilen: 5 0
Jump label: addr: ffffffff810a1c52 symbol: rcu_process_callbacks+0x452/0x980 ilen: 5 0
Jump label: addr: ffffffff810a1000 symbol: rcu_accelerate_cbs+0x2f0/0x420 ilen: 2 0
Jump label: addr: ffffffff810a09de symbol: trace_rcu_future_gp.isra.5+0x5e/0x160 ilen: 2 0
Jump label: addr: ffffffff8109fbbd symbol: rcu_implicit_dynticks_qs+0x18d/0x230 ilen: 2 0
Jump label: addr: ffffffff8109f958 symbol: rcu_preempt_qs+0x38/0x110 ilen: 2 0
Jump label: addr: ffffffff8105f7fa symbol: hrtimer_init+0x2a/0x1c0 ilen: 5 0
Jump label: addr: ffffffff81052de7 symbol: process_one_work+0x157/0x4a0 ilen: 5 0
Jump label: addr: ffffffff81052dcc symbol: process_one_work+0x13c/0x4a0 ilen: 5 0
Jump label: addr: ffffffff8105247d symbol: __queue_work+0xfd/0x380 ilen: 5 0
Jump label: addr: ffffffff810505bf symbol: pwq_activate_delayed_work+0x2f/0x100 ilen: 2 0
Jump label: addr: ffffffff81045108 symbol: call_timer_fn+0x58/0x1f0 ilen: 2 0
Jump label: addr: ffffffff8103b4a3 symbol: it_real_fn+0x23/0xf0 ilen: 2 0
Jump label: addr: ffffffff81038f1e symbol: delayed_put_task_struct+0x2e/0x120 ilen: 2 0
Jump label: addr: ffffffff81033cf6 symbol: copy_process+0xe16/0x1350 ilen: 2 0
Jump label: addr: ffffffff8101cadd symbol: mce_log+0xd/0x1c0 ilen: 5 0

Thoughts ?

Thanks,

Mathieu

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com

^ permalink raw reply	[flat|nested] 68+ messages in thread
* Re: [RFC] gcc feature request: Moving blocks into sections
  2013-08-05 18:39 ` Steven Rostedt
  2013-08-05 18:49   ` Linus Torvalds
@ 2013-08-05 20:06   ` Jason Baron
  1 sibling, 0 replies; 68+ messages in thread
From: Jason Baron @ 2013-08-05 20:06 UTC (permalink / raw)
To: Steven Rostedt
Cc: Linus Torvalds, LKML, gcc, Ingo Molnar, Mathieu Desnoyers,
    H. Peter Anvin, Thomas Gleixner, David Daney, Behan Webster,
    Peter Zijlstra, Herbert Xu

On 08/05/2013 02:39 PM, Steven Rostedt wrote:
> On Mon, 2013-08-05 at 11:20 -0700, Linus Torvalds wrote:
>
>> Of course, it would be good to optimize static_key_false() itself -
>> right now those static key jumps are always five bytes, and while they
>> get nopped out, it would still be nice if there was some way to have
>> just a two-byte nop (turning into a short branch) *if* we can reach
>> another jump that way.. For small functions that would be lovely. Oh
>> well.
>
> I had patches that did exactly this:
>
> https://lkml.org/lkml/2012/3/8/461
>
> But it got dropped for some reason. I don't remember why. Maybe because
> of the complexity?
>
> -- Steve

Hi Steve,

I recall testing your patches and the text size increased unexpectedly.
I believe I correctly accounted for changes to the text size *outside*
of branch points. If you do re-visit the series, that is one thing I'd
like to double check/understand.

Thanks,

-Jason

^ permalink raw reply	[flat|nested] 68+ messages in thread
* Re: [RFC] gcc feature request: Moving blocks into sections
  2013-08-05 16:55 [RFC] gcc feature request: Moving blocks into sections Steven Rostedt
  2013-08-05 17:02 ` H. Peter Anvin
  2013-08-05 17:12 ` Linus Torvalds
@ 2013-08-05 19:04 ` Andi Kleen
  2013-08-05 19:16   ` Steven Rostedt
  2013-08-05 19:25   ` Linus Torvalds
  2013-08-12  9:17 ` Peter Zijlstra
  3 siblings, 2 replies; 68+ messages in thread
From: Andi Kleen @ 2013-08-05 19:04 UTC (permalink / raw)
To: Steven Rostedt
Cc: LKML, gcc, Linus Torvalds, Ingo Molnar, Mathieu Desnoyers,
    H. Peter Anvin, Thomas Gleixner, David Daney, Behan Webster,
    Peter Zijlstra

Steven Rostedt <rostedt@goodmis.org> writes:

Can't you just use -freorder-blocks-and-partition?

This should already partition unlikely blocks into a
different section. Just a single one of course.

FWIW the disadvantage is that multiple code sections tends
to break various older dwarf unwinders, as it needs
dwarf3 latest'n'greatest.

-Andi

-- 
ak@linux.intel.com -- Speaking for myself only

^ permalink raw reply	[flat|nested] 68+ messages in thread
* Re: [RFC] gcc feature request: Moving blocks into sections
  2013-08-05 19:04 ` Andi Kleen
@ 2013-08-05 19:16   ` Steven Rostedt
  2013-08-05 19:30     ` Xinliang David Li
  2013-08-05 19:25   ` Linus Torvalds
  1 sibling, 1 reply; 68+ messages in thread
From: Steven Rostedt @ 2013-08-05 19:16 UTC (permalink / raw)
To: Andi Kleen
Cc: LKML, gcc, Linus Torvalds, Ingo Molnar, Mathieu Desnoyers,
    H. Peter Anvin, Thomas Gleixner, David Daney, Behan Webster,
    Peter Zijlstra

On Mon, 2013-08-05 at 12:04 -0700, Andi Kleen wrote:
> Steven Rostedt <rostedt@goodmis.org> writes:
>
> Can't you just use -freorder-blocks-and-partition?

Yeah, I'm familiar with this option.

> This should already partition unlikely blocks into a
> different section. Just a single one of course.
>
> FWIW the disadvantage is that multiple code sections tends
> to break various older dwarf unwinders, as it needs
> dwarf3 latest'n'greatest.

If the option was so good, I would expect everyone would be using it ;-)

I'm mainly only concerned with the tracepoints. I'm asking to be able to
do this with just the tracepoint code, and affect nobody else.

-- Steve

^ permalink raw reply	[flat|nested] 68+ messages in thread
* Re: [RFC] gcc feature request: Moving blocks into sections
  2013-08-05 19:16 ` Steven Rostedt
@ 2013-08-05 19:30   ` Xinliang David Li
  0 siblings, 0 replies; 68+ messages in thread
From: Xinliang David Li @ 2013-08-05 19:30 UTC (permalink / raw)
To: Steven Rostedt
Cc: Andi Kleen, LKML, gcc, Linus Torvalds, Ingo Molnar,
    Mathieu Desnoyers, H. Peter Anvin, Thomas Gleixner, David Daney,
    Behan Webster, Peter Zijlstra, Rong Xu, Teresa Johnson

On Mon, Aug 5, 2013 at 12:16 PM, Steven Rostedt <rostedt@goodmis.org> wrote:
> On Mon, 2013-08-05 at 12:04 -0700, Andi Kleen wrote:
>> Steven Rostedt <rostedt@goodmis.org> writes:
>>
>> Can't you just use -freorder-blocks-and-partition?
>
> Yeah, I'm familiar with this option.
>

This option works best with FDO. FDOed linux kernel rocks :)

>>
>> This should already partition unlikely blocks into a
>> different section. Just a single one of course.
>>
>> FWIW the disadvantage is that multiple code sections tends
>> to break various older dwarf unwinders, as it needs
>> dwarf3 latest'n'greatest.
>
> If the option was so good, I would expect everyone would be using it ;-)
>

There were lots of problems with this option -- recently cleaned
up/fixed by Teresa in GCC trunk.

thanks,

David

>
> I'm mainly only concerned with the tracepoints. I'm asking to be able to
> do this with just the tracepoint code, and affect nobody else.
>
> -- Steve
>

^ permalink raw reply	[flat|nested] 68+ messages in thread
* Re: [RFC] gcc feature request: Moving blocks into sections
  2013-08-05 19:04 ` Andi Kleen
  2013-08-05 19:16   ` Steven Rostedt
@ 2013-08-05 19:25   ` Linus Torvalds
  1 sibling, 0 replies; 68+ messages in thread
From: Linus Torvalds @ 2013-08-05 19:25 UTC (permalink / raw)
To: Andi Kleen
Cc: Steven Rostedt, LKML, gcc, Ingo Molnar, Mathieu Desnoyers,
    H. Peter Anvin, Thomas Gleixner, David Daney, Behan Webster,
    Peter Zijlstra

On Mon, Aug 5, 2013 at 12:04 PM, Andi Kleen <andi@firstfloor.org> wrote:
> Steven Rostedt <rostedt@goodmis.org> writes:
>
> Can't you just use -freorder-blocks-and-partition?
>
> This should already partition unlikely blocks into a
> different section. Just a single one of course.

That's horrible. Not because of dwarf problems, but exactly because
unlikely code isn't necessarily *that* unlikely, and normal unlikely
code is reached with a small branch. Making it a whole different
section breaks both of those.

Maybe some "really_unlikely()" would make it ok.

             Linus

^ permalink raw reply	[flat|nested] 68+ messages in thread
* Re: [RFC] gcc feature request: Moving blocks into sections
  2013-08-05 16:55 [RFC] gcc feature request: Moving blocks into sections Steven Rostedt
  ` (2 preceding siblings ...)
  2013-08-05 19:04 ` Andi Kleen
@ 2013-08-12  9:17 ` Peter Zijlstra
  2013-08-12 14:56   ` H. Peter Anvin
  3 siblings, 1 reply; 68+ messages in thread
From: Peter Zijlstra @ 2013-08-12 9:17 UTC (permalink / raw)
To: Steven Rostedt
Cc: LKML, gcc, Linus Torvalds, Ingo Molnar, Mathieu Desnoyers,
    H. Peter Anvin, Thomas Gleixner, David Daney, Behan Webster

On Mon, Aug 05, 2013 at 12:55:15PM -0400, Steven Rostedt wrote:
> [ sent to both Linux kernel mailing list and to gcc list ]
>

Let me hijack this thread for something related...

I've been wanting to 'abuse' static_key/asm-goto to sort-of JIT
if-forest functions like perf_prepare_sample() and perf_output_sample().

They are of the form:

void func(obj, args..)
{
	unsigned long f = ...;

	if (f & F1)
		do_f1();

	if (f & F2)
		do_f2();

	...

	if (f & FN)
		do_fn();
}

Where f is constant for the entire lifetime of the particular object.

So I was thinking of having these functions use static_key/asm-goto;
then write the proper static key values unsafe so as to avoid all
trickery (as these functions would never actually be used) and copy the
end result into object private memory. The object will then use indirect
calls into these functions.

The advantage of using something like this is that it would work for all
architectures that now support the asm-goto feature. For arch/gcc
combinations that do not we'd simply revert to the current state of
affairs.

I suppose the question is, do people strenuously object to creativity
like that and or is there something GCC can do to make this
easier/better still?

^ permalink raw reply	[flat|nested] 68+ messages in thread
* Re: [RFC] gcc feature request: Moving blocks into sections
  2013-08-12 9:17 ` Peter Zijlstra
@ 2013-08-12 14:56   ` H. Peter Anvin
  2013-08-12 16:02     ` Andi Kleen
  2013-08-12 16:09     ` Peter Zijlstra
  0 siblings, 2 replies; 68+ messages in thread
From: H. Peter Anvin @ 2013-08-12 14:56 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Steven Rostedt, LKML, gcc, Linus Torvalds, Ingo Molnar,
    Mathieu Desnoyers, Thomas Gleixner, David Daney, Behan Webster

On 08/12/2013 02:17 AM, Peter Zijlstra wrote:
>
> I've been wanting to 'abuse' static_key/asm-goto to sort-of JIT
> if-forest functions like perf_prepare_sample() and perf_output_sample().
>
> They are of the form:
>
> void func(obj, args..)
> {
> 	unsigned long f = ...;
>
> 	if (f & F1)
> 		do_f1();
>
> 	if (f & F2)
> 		do_f2();
>
> 	...
>
> 	if (f & FN)
> 		do_fn();
> }
>

Am I reading this right that f can be a combination of any of these?

> Where f is constant for the entire lifetime of the particular object.
>
> So I was thinking of having these functions use static_key/asm-goto;
> then write the proper static key values unsafe so as to avoid all
> trickery (as these functions would never actually be used) and copy the
> end result into object private memory. The object will then use indirect
> calls into these functions.

I'm really not following what you are proposing here, especially not
"copy the end result into object private memory."

With asm goto you end up with at minimum a jump or NOP for each of these
function entries, whereas an actual JIT can elide that as well.

On the majority of architectures, including x86, you cannot simply copy
a piece of code elsewhere and have it still work. You end up doing a
bunch of the work that a JIT would do anyway, and would end up with
considerably higher complexity and worse results than a true JIT.

You also say "the object will then use indirect calls into these
functions"... you mean the JIT or pseudo-JIT generated functions, or the
calls inside them?

> I suppose the question is, do people strenuously object to creativity
> like that and or is there something GCC can do to make this
> easier/better still?

I think it would be much easier to just write a minimal JIT for this,
even though it is per architecture. However, I would really like to
understand what the value is.

	-hpa

^ permalink raw reply	[flat|nested] 68+ messages in thread
* Re: [RFC] gcc feature request: Moving blocks into sections
  2013-08-12 14:56 ` H. Peter Anvin
@ 2013-08-12 16:02   ` Andi Kleen
  2013-08-12 16:11     ` Peter Zijlstra
  1 sibling, 1 reply; 68+ messages in thread
From: Andi Kleen @ 2013-08-12 16:02 UTC (permalink / raw)
To: H. Peter Anvin
Cc: Peter Zijlstra, Steven Rostedt, LKML, gcc, Linus Torvalds,
    Ingo Molnar, Mathieu Desnoyers, Thomas Gleixner, David Daney,
    Behan Webster

"H. Peter Anvin" <hpa@linux.intel.com> writes:

> However, I would really like to
> understand what the value is.

Probably very little. When I last looked at it, the main overhead in
perf currently seems to be backtraces and the ring buffer, not this
code.

-Andi

-- 
ak@linux.intel.com -- Speaking for myself only

^ permalink raw reply	[flat|nested] 68+ messages in thread
* Re: [RFC] gcc feature request: Moving blocks into sections
  2013-08-12 16:02 ` Andi Kleen
@ 2013-08-12 16:11   ` Peter Zijlstra
  0 siblings, 0 replies; 68+ messages in thread
From: Peter Zijlstra @ 2013-08-12 16:11 UTC (permalink / raw)
To: Andi Kleen
Cc: H. Peter Anvin, Steven Rostedt, LKML, gcc, Linus Torvalds,
    Ingo Molnar, Mathieu Desnoyers, Thomas Gleixner, David Daney,
    Behan Webster

On Mon, Aug 12, 2013 at 09:02:02AM -0700, Andi Kleen wrote:
> "H. Peter Anvin" <hpa@linux.intel.com> writes:
>
>> However, I would really like to
>> understand what the value is.
>
> Probably very little. When I last looked at it, the main overhead in
> perf currently seems to be backtraces and the ring buffer, not this
> code.

backtraces do indeed blow and make pretty much everything else
irrelevant, but when not using them the branch forest was significant
when I last looked at it.

^ permalink raw reply	[flat|nested] 68+ messages in thread
* Re: [RFC] gcc feature request: Moving blocks into sections
  2013-08-12 14:56 ` H. Peter Anvin
  2013-08-12 16:02   ` Andi Kleen
@ 2013-08-12 16:09   ` Peter Zijlstra
  2013-08-12 17:47     ` H. Peter Anvin
  1 sibling, 1 reply; 68+ messages in thread
From: Peter Zijlstra @ 2013-08-12 16:09 UTC (permalink / raw)
To: H. Peter Anvin
Cc: Steven Rostedt, LKML, gcc, Linus Torvalds, Ingo Molnar,
    Mathieu Desnoyers, Thomas Gleixner, David Daney, Behan Webster

On Mon, Aug 12, 2013 at 07:56:10AM -0700, H. Peter Anvin wrote:
> On 08/12/2013 02:17 AM, Peter Zijlstra wrote:
> >
> > I've been wanting to 'abuse' static_key/asm-goto to sort-of JIT
> > if-forest functions like perf_prepare_sample() and perf_output_sample().
> >
> > They are of the form:
> >
> > void func(obj, args..)
> > {
> > 	unsigned long f = ...;
> >
> > 	if (f & F1)
> > 		do_f1();
> >
> > 	if (f & F2)
> > 		do_f2();
> >
> > 	...
> >
> > 	if (f & FN)
> > 		do_fn();
> > }
> >
>
> Am I reading this right that f can be a combination of any of these?

Correct.

> > Where f is constant for the entire lifetime of the particular object.
> >
> > So I was thinking of having these functions use static_key/asm-goto;
> > then write the proper static key values unsafe so as to avoid all
> > trickery (as these functions would never actually be used) and copy the
> > end result into object private memory. The object will then use indirect
> > calls into these functions.
>
> I'm really not following what you are proposing here, especially not
> "copy the end result into object private memory."
>
> With asm goto you end up with at minimum a jump or NOP for each of these
> function entries, whereas an actual JIT can elide that as well.
>
> On the majority of architectures, including x86, you cannot simply copy
> a piece of code elsewhere and have it still work.

I thought we used -fPIC which would allow just that.

> You end up doing a
> bunch of the work that a JIT would do anyway, and would end up with
> considerably higher complexity and worse results than a true JIT.

Well, less complexity but worse result, yes. We'd only poke the specific
static_branch sites with either NOPs or the (relative) jump target for
each of these branches. Then copy the result.

> You
> also say "the object will then use indirect calls into these
> functions"... you mean the JIT or pseudo-JIT generated functions, or the
> calls inside them?

The calls to these pseudo-JIT generated functions.

> > I suppose the question is, do people strenuously object to creativity
> > like that and or is there something GCC can do to make this
> > easier/better still?
>
> I think it would be much easier to just write a minimal JIT for this,
> even though it is per architecture. However, I would really like to
> understand what the value is.

Removing a lot of the conditionals from the sample path. Depending on
the configuration these can be quite expensive.

^ permalink raw reply	[flat|nested] 68+ messages in thread
* Re: [RFC] gcc feature request: Moving blocks into sections
  2013-08-12 16:09 ` Peter Zijlstra
@ 2013-08-12 17:47   ` H. Peter Anvin
  2013-08-13  7:50     ` Peter Zijlstra
  0 siblings, 1 reply; 68+ messages in thread
From: H. Peter Anvin @ 2013-08-12 17:47 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Steven Rostedt, LKML, gcc, Linus Torvalds, Ingo Molnar,
    Mathieu Desnoyers, Thomas Gleixner, David Daney, Behan Webster

On 08/12/2013 09:09 AM, Peter Zijlstra wrote:
>>
>> On the majority of architectures, including x86, you cannot simply copy
>> a piece of code elsewhere and have it still work.
>
> I thought we used -fPIC which would allow just that.
>

Doubly wrong. The kernel is not compiled with -fPIC, nor does -fPIC
allow this kind of movement for code that contains intramodule
references (that is *all* references in the kernel). Since we really
doesn't want to burden the kernel with a GOT and a PLT, that is life.

>> You end up doing a
>> bunch of the work that a JIT would do anyway, and would end up with
>> considerably higher complexity and worse results than a true JIT.
>
> Well, less complexity but worse result, yes. We'd only poke the specific
> static_branch sites with either NOPs or the (relative) jump target for
> each of these branches. Then copy the result.

Once again, you can't "copy the result". You end up with a full
disassembler.

	-hpa

^ permalink raw reply	[flat|nested] 68+ messages in thread
* Re: [RFC] gcc feature request: Moving blocks into sections
  2013-08-12 17:47 ` H. Peter Anvin
@ 2013-08-13  7:50   ` Peter Zijlstra
  2013-08-13 14:46     ` H. Peter Anvin
  0 siblings, 1 reply; 68+ messages in thread
From: Peter Zijlstra @ 2013-08-13 7:50 UTC (permalink / raw)
To: H. Peter Anvin
Cc: Steven Rostedt, LKML, gcc, Linus Torvalds, Ingo Molnar,
    Mathieu Desnoyers, Thomas Gleixner, David Daney, Behan Webster

On Mon, Aug 12, 2013 at 10:47:37AM -0700, H. Peter Anvin wrote:
> On 08/12/2013 09:09 AM, Peter Zijlstra wrote:
> >>
> >> On the majority of architectures, including x86, you cannot simply copy
> >> a piece of code elsewhere and have it still work.
> >
> > I thought we used -fPIC which would allow just that.
> >
>
> Doubly wrong. The kernel is not compiled with -fPIC, nor does -fPIC
> allow this kind of movement for code that contains intramodule
> references (that is *all* references in the kernel). Since we really
> doesn't want to burden the kernel with a GOT and a PLT, that is life.

OK. never mind then..

^ permalink raw reply	[flat|nested] 68+ messages in thread
* Re: [RFC] gcc feature request: Moving blocks into sections
  2013-08-13 7:50 ` Peter Zijlstra
@ 2013-08-13 14:46   ` H. Peter Anvin
  2013-08-13 14:52     ` Steven Rostedt
  0 siblings, 1 reply; 68+ messages in thread
From: H. Peter Anvin @ 2013-08-13 14:46 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Steven Rostedt, LKML, gcc, Linus Torvalds, Ingo Molnar,
    Mathieu Desnoyers, Thomas Gleixner, David Daney, Behan Webster

> On Mon, Aug 12, 2013 at 10:47:37AM -0700, H. Peter Anvin wrote:
>> Since we really doesn't want to...

Ow. Can't believe I wrote that.

	-hpa

^ permalink raw reply	[flat|nested] 68+ messages in thread
* Re: [RFC] gcc feature request: Moving blocks into sections
  2013-08-13 14:46 ` H. Peter Anvin
@ 2013-08-13 14:52   ` Steven Rostedt
  0 siblings, 0 replies; 68+ messages in thread
From: Steven Rostedt @ 2013-08-13 14:52 UTC (permalink / raw)
To: H. Peter Anvin
Cc: Peter Zijlstra, LKML, gcc, Linus Torvalds, Ingo Molnar,
    Mathieu Desnoyers, Thomas Gleixner, David Daney, Behan Webster

On Tue, 13 Aug 2013 07:46:46 -0700
"H. Peter Anvin" <hpa@linux.intel.com> wrote:

>> On Mon, Aug 12, 2013 at 10:47:37AM -0700, H. Peter Anvin wrote:
>>> Since we really doesn't want to...
>
> Ow. Can't believe I wrote that.
>

All your base are belong to us!

-- Steve

^ permalink raw reply	[flat|nested] 68+ messages in thread
end of thread, other threads:[~2013-08-13 14:52 UTC | newest]

Thread overview: 68+ messages
2013-08-05 16:55 [RFC] gcc feature request: Moving blocks into sections Steven Rostedt
2013-08-05 17:02 ` H. Peter Anvin
2013-08-05 17:24 ` Steven Rostedt
2013-08-05 17:12 ` Linus Torvalds
2013-08-05 17:15 ` Linus Torvalds
2013-08-05 17:55 ` Steven Rostedt
2013-08-05 18:11 ` Steven Rostedt
2013-08-05 18:17 ` H. Peter Anvin
2013-08-05 18:23 ` Steven Rostedt
2013-08-05 18:29 ` H. Peter Anvin
2013-08-05 18:49 ` Steven Rostedt
2013-08-05 18:51 ` H. Peter Anvin
2013-08-05 19:01 ` Linus Torvalds
2013-08-05 19:54 ` Mathieu Desnoyers
2013-08-05 19:57 ` Linus Torvalds
2013-08-05 20:02 ` Steven Rostedt
2013-08-05 21:28 ` Mathieu Desnoyers
2013-08-05 21:43 ` H. Peter Anvin
2013-08-06  4:14 ` Mathieu Desnoyers
2013-08-06  4:28 ` H. Peter Anvin
2013-08-06 16:15 ` Steven Rostedt
2013-08-06 16:19 ` H. Peter Anvin
2013-08-06 16:26 ` Steven Rostedt
2013-08-06 16:29 ` H. Peter Anvin
2013-08-05 21:44 ` Steven Rostedt
2013-08-05 22:08 ` Mathieu Desnoyers
2013-08-05 19:09 ` Steven Rostedt
2013-08-05 18:20 ` Linus Torvalds
2013-08-05 18:24 ` Linus Torvalds
2013-08-05 18:34 ` Linus Torvalds
2013-08-05 18:38 ` H. Peter Anvin
2013-08-05 19:04 ` Steven Rostedt
2013-08-05 19:40 ` Marek Polacek
2013-08-05 19:56 ` Linus Torvalds
2013-08-05 19:57 ` Jason Baron
2013-08-05 20:35 ` Richard Henderson
2013-08-06  2:26 ` Jason Baron
2013-08-06  3:03 ` Steven Rostedt
2013-08-05 18:33 ` H. Peter Anvin
2013-08-05 18:39 ` Steven Rostedt
2013-08-05 18:49 ` Linus Torvalds
2013-08-05 19:39 ` Steven Rostedt
2013-08-06 14:19 ` Steven Rostedt
2013-08-06 17:48 ` Linus Torvalds
2013-08-06 17:58 ` Steven Rostedt
2013-08-06 20:33 ` Mathieu Desnoyers
2013-08-06 20:43 ` Steven Rostedt
2013-08-07  0:45 ` Steven Rostedt
2013-08-07  0:56 ` Steven Rostedt
2013-08-07  5:06 ` Ondřej Bílka
2013-08-07 15:02 ` Steven Rostedt
2013-08-07 16:03 ` Mathieu Desnoyers
2013-08-07 16:11 ` Steven Rostedt
2013-08-07 23:22 ` Mathieu Desnoyers
2013-08-05 20:06 ` Jason Baron
2013-08-05 19:04 ` Andi Kleen
2013-08-05 19:16 ` Steven Rostedt
2013-08-05 19:30 ` Xinliang David Li
2013-08-05 19:25 ` Linus Torvalds
2013-08-12  9:17 ` Peter Zijlstra
2013-08-12 14:56 ` H. Peter Anvin
2013-08-12 16:02 ` Andi Kleen
2013-08-12 16:11 ` Peter Zijlstra
2013-08-12 16:09 ` Peter Zijlstra
2013-08-12 17:47 ` H. Peter Anvin
2013-08-13  7:50 ` Peter Zijlstra
2013-08-13 14:46 ` H. Peter Anvin
2013-08-13 14:52 ` Steven Rostedt