* Re: [uClinux-dev] Kernel 2.6 size increase - get_current()?
[not found] ` <cArg.74D.11@gated-at.bofh.it>
@ 2003-07-24 8:13 ` Ihar "Philips" Filipau
2003-07-25 7:25 ` Denis Vlasenko
2003-07-25 18:36 ` bill davidsen
0 siblings, 2 replies; 20+ messages in thread
From: Ihar "Philips" Filipau @ 2003-07-24 8:13 UTC (permalink / raw)
To: linux-kernel
Bernardo Innocenti wrote:
> On Wednesday 23 July 2003 22:27, Christoph Hellwig wrote:
>
>>On Wed, Jul 23, 2003 at 01:22:56PM -0700, David S. Miller wrote:
>>>Drivers weren't audited much, and there's a lot of boneheaded
>>>stuff in this area. But these should be mostly identical
>>>to what would happen on the 2.4.x side
>>
>>Please read the original message again - he stated that every single
>>module in fs/ got a lot bigger - if it gets smaller or at least the
>>same size as 2.4 it's clearly a sign of inlines gone mad in the
>>filesystem/VM code and we need to look at that. If not we have to look
>>elsewhere.
>
> Here is my humble opinion:
>
> In 2.4.20 (m68knommu):
> -------------------------------------------------------------------------
> #define current _current_task
> -------------------------------------------------------------------------
>
> In 2.6.0-test1 (m68knommu):
> -------------------------------------------------------------------------
> static inline struct task_struct *get_current(void)
> {
[cut]
> }
> static inline struct thread_info *current_thread_info(void)
> {
[cut]
> }
> -------------------------------------------------------------------------
>
> This takes 18*11 = 198 bytes just for invoking the 'current'
> macro so many times.
>
Just curious.
Is there any way to tell one kind of inline from another?
I mean 'inline' which means 'this has to be inlined or it will
break' and 'inline' which means 'inline this please - it adds only 10k
of code bloat and improves performance in my suppa-puppa-bench by 0.000001%!'
Strictly speaking - split 'inline' into 'require_inline' and
'better_inline'.
So people who really care about image size can define
'better_inline' to nothing, without harm to functionality.
Actually I saw real performance improvements on my Pentium MMX 133
(it has 16k I-cache + 16k D-cache, I believe) when I was cutting some of
the inlines out. And I'm not talking about (cache-poor) embedded systems...
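A minimal sketch of such a split, assuming GCC (or a compatible compiler) and its always_inline attribute. The names 'require_inline' and 'better_inline' come from the suggestion above; CONFIG_SMALL_IMAGE, read_flag() and sum_to() are hypothetical and exist only for illustration.

```c
/* Sketch only: assumes GCC-style attributes. CONFIG_SMALL_IMAGE is a
 * hypothetical build option, not an existing kernel symbol. */

/* Must be inlined for correctness: force it regardless of optimisation. */
#define require_inline static inline __attribute__((always_inline))

/* Inlined only as a speed hint; a size-conscious build drops the hint. */
#ifdef CONFIG_SMALL_IMAGE
#define better_inline static __attribute__((unused))
#else
#define better_inline static inline
#endif

/* Hypothetical helper that its callers need expanded in place. */
require_inline int read_flag(const int *flags, int bit)
{
    return (*flags >> bit) & 1;
}

/* Hypothetical helper where inlining is merely a performance preference. */
better_inline int sum_to(int n)
{
    int total = 0;
    for (int i = 1; i <= n; i++)
        total += i;
    return total;
}
```

Note that dropping the keyword only removes the hint: the compiler may still inline a static function on its own, so a build that wants a hard guarantee would probably need GCC's noinline attribute as well.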
* Re: [uClinux-dev] Kernel 2.6 size increase - get_current()?
2003-07-24 8:13 ` [uClinux-dev] Kernel 2.6 size increase - get_current()? Ihar "Philips" Filipau
@ 2003-07-25 7:25 ` Denis Vlasenko
2003-07-25 18:36 ` bill davidsen
1 sibling, 0 replies; 20+ messages in thread
From: Denis Vlasenko @ 2003-07-25 7:25 UTC (permalink / raw)
To: filia, linux-kernel
On 24 July 2003 11:13, Ihar "Philips" Filipau wrote:
> I mean 'inline' which means 'this has to be inlined or it will
> break' and 'inline' which means 'inline this please - it adds only 10k
> of code bloat and improves performance in my suppa-puppa-bench by 0.000001%!'
>
> Strictly speaking - split 'inline' into 'require_inline' and
> 'better_inline'.
> So people who really care about image size can define
> 'better_inline' to nothing, without harm to functionality.
> Actually I saw real performance improvements on my Pentium MMX 133
> (it has 16k I-cache + 16k D-cache, I believe) when I was cutting some of
> the inlines out. And I'm not talking about (cache-poor) embedded systems...
Which inlines? Let the list know
--
vda
* Re: [uClinux-dev] Kernel 2.6 size increase - get_current()?
2003-07-24 8:13 ` [uClinux-dev] Kernel 2.6 size increase - get_current()? Ihar "Philips" Filipau
2003-07-25 7:25 ` Denis Vlasenko
@ 2003-07-25 18:36 ` bill davidsen
1 sibling, 0 replies; 20+ messages in thread
From: bill davidsen @ 2003-07-25 18:36 UTC (permalink / raw)
To: linux-kernel
In article <3F1F9531.2050204@softhome.net>,
Ihar "Philips" Filipau <filia@softhome.net> wrote:
| Just curious.
|
| Is there any way to tell one kind of inline from another?
|
| I mean 'inline' which means 'this has to be inlined or it will
| break' and 'inline' which means 'inline this please - it adds only 10k
| of code bloat and improves performance in my suppa-puppa-bench by 0.000001%!'
|
| Strictly speaking - split 'inline' into 'require_inline' and
| 'better_inline'.
| So people who really care about image size can define
| 'better_inline' to nothing, without harm to functionality.
| Actually I saw real performance improvements on my Pentium MMX 133
| (it has 16k I-cache + 16k D-cache, I believe) when I was cutting some of
| the inlines out. And I'm not talking about (cache-poor) embedded systems...
Actually you have a very different CPU to memory bandwidth ratio than a
processor manufactured in this millennium. I use a system like that for
test, but please don't optimize for it!
Speculation of the day: I suspect that on some laptops which run
seriously slower when on battery, the CPU/memory speed changes enough
that you could see and measure better performance with a 'slow' and a
'fast' kernel.
Speculation, since I'm sure the gain would be down in the noise, one of
those 'difference without a distinction' things.
--
bill davidsen <davidsen@tmr.com>
CTO, TMR Associates, Inc
Doing interesting things with little computers since 1979.
* Re: [uClinux-dev] Kernel 2.6 size increase - get_current()?
2003-07-24 12:04 ` David McCullough
2003-07-24 14:48 ` Alan Cox
@ 2003-07-25 18:25 ` bill davidsen
1 sibling, 0 replies; 20+ messages in thread
From: bill davidsen @ 2003-07-25 18:25 UTC (permalink / raw)
To: linux-kernel
In article <20030724120441.GC16168@beast>,
David McCullough <davidm@snapgear.com> wrote:
| So should the trend be away from inlining, especially larger functions ?
|
| I know on m68k some of the really simple inlines are actually smaller as
| an inline than as a function call. But they have to be very simple, or
| only used once.
Actually, I would think that the compiler would make the decision in a
perfect world. (no smiley) Clearly some programmers think the compiler
isn't aggressive about this, and that may be the root problem. Certainly
if the compiler makes the choice then -Os should avoid the inline.
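For illustration only: GCC predefines __OPTIMIZE_SIZE__ when building with -Os, so a source file can at least tell which goal it was compiled for. In the sketch below, clamp_byte() and inline_policy() are hypothetical names, and the function deliberately carries no 'inline' keyword so the choice stays with the compiler.

```c
/* Sketch only. GCC defines __OPTIMIZE_SIZE__ under -Os; the helper names
 * below are hypothetical. */

/* No 'inline' keyword: the decision is left entirely to the compiler. */
static int clamp_byte(int v)
{
    if (v < 0)
        return 0;
    if (v > 255)
        return 255;
    return v;
}

/* Reports which optimisation goal this translation unit was built for. */
static const char *inline_policy(void)
{
#ifdef __OPTIMIZE_SIZE__
    return "size";
#else
    return "speed-or-default";
#endif
}
```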
--
bill davidsen <davidsen@tmr.com>
CTO, TMR Associates, Inc
Doing interesting things with little computers since 1979.
* Re: [uClinux-dev] Kernel 2.6 size increase - get_current()?
[not found] ` <dbTZ.5Z5.19@gated-at.bofh.it>
@ 2003-07-25 15:37 ` Ihar "Philips" Filipau
0 siblings, 0 replies; 20+ messages in thread
From: Ihar "Philips" Filipau @ 2003-07-25 15:37 UTC (permalink / raw)
To: linux-kernel
Hollis Blanchard wrote:
> I believe the point Alan was trying to make is not that we should have
> more or less inlines, but we should have smarter inlines. I.E. don't
> just inline a function to "make it fast"; think about the implications
> (and ideally measure it, though I think that becomes problematic when so
> many other factors can affect the benefit of a single inlined function).
> The specific example he gave was inlining code on the fast path, while
> accepting branch/cache penalties for non-inlined code on the slow path.
>
But you cannot make this kind of decision universally.
Some kind of compromise should be found between arch maintainers and
subsystem maintainers.
Or beat the GCC developers hard so they finally produce a good
optimizing compiler ;-)
Or ask all kernel developers to work one hour per week on GCC
optimization - I bet GCC will outperform everything else in the industry in
less than one year ;-)))
To remind: the source of the problem is not inlines; the problem is the
compiler, which cannot read our minds yet and generate the code we
expected it to generate.
P.S. Offtopic. As I see it, Linux & Linus have made the decision on
optimization. Linux after all is a creation of capitalism: whoever has more
money controls everything. The server market has more money - they do more
work on the kernel and their systems are not that far from developers'
workstations - so Linux gets more and more server/workstation oriented.
This will fit the desktop market too - if your computer was made to run
WinXP AKA exp(bloat) - it will be capable of running any OS. Linus repeating
'small is beautiful' sounds more and more like a crude joke...
As for the embedded market - it is already in a deep fork and far far away
from vanilla kernels... Vanilla is really not that relevant to the real world...
* Re: [uClinux-dev] Kernel 2.6 size increase - get_current()?
2003-07-25 4:22 ` Otto Solares
@ 2003-07-25 14:38 ` Hollis Blanchard
0 siblings, 0 replies; 20+ messages in thread
From: Hollis Blanchard @ 2003-07-25 14:38 UTC (permalink / raw)
To: Otto Solares
Cc: J.A. Magallon, Alan Cox, David McCullough, uclinux-dev,
Linux Kernel Mailing List, Ihar Philips Filipau
On Thursday, Jul 24, 2003, at 23:22 US/Central, Otto Solares wrote:
> On Thu, Jul 24, 2003 at 11:20:00PM +0200, J.A. Magallon wrote:
>> Or you just define must_inline, and let gcc inline the rest of 'inlines',
>> based on its own rules for function size, adjusting the parameters
>> to gcc to ensure (more or less) that what is inlined fits in cache of
>> the processor one is building for...
>> (this can be hard, help from gcc hackers will be needed...)
>
> IMO just a CONFIG_INLINE_FUNCTIONS will work: if you
> want to conserve space at the expense of speed, simply
> don't select this option; otherwise you get speed but
> a big kernel.
Inlines don't always help performance (depending on cache sizes, branch
penalties, frequency of code access...), but they do always increase
code size.
I believe the point Alan was trying to make is not that we should have
more or less inlines, but we should have smarter inlines. I.E. don't
just inline a function to "make it fast"; think about the implications
(and ideally measure it, though I think that becomes problematic when
so many other factors can affect the benefit of a single inlined
function). The specific example he gave was inlining code on the fast
path, while accepting branch/cache penalties for non-inlined code on
the slow path.
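A small sketch of that fast-path/slow-path split, assuming GCC-style attributes and builtins (noinline, __builtin_expect). The buffer type and the functions buf_put() and buf_put_slow() are hypothetical and not taken from the kernel.

```c
#include <stddef.h>

/* Sketch only: assumes GCC-style attributes and builtins. All names here
 * are hypothetical. */
struct small_buf {
    char   data[64];
    size_t used;
};

/* Unusual path: kept out of line so it adds no bulk to each call site. */
static __attribute__((noinline)) int buf_put_slow(struct small_buf *b, char c)
{
    (void)b;
    (void)c;
    return -1;          /* buffer full: report failure */
}

/* Common path: short enough that inlining it costs little code. */
static inline int buf_put(struct small_buf *b, char c)
{
    if (__builtin_expect(b->used < sizeof(b->data), 1)) {
        b->data[b->used++] = c;
        return 0;
    }
    return buf_put_slow(b, c);
}
```

Each call site then pays only for the bounds check and the store; the rarely taken branch costs one call instruction per site instead of a full copy of the slow code.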
--
Hollis Blanchard
IBM Linux Technology Center
* Re: [uClinux-dev] Kernel 2.6 size increase - get_current()?
2003-07-24 21:20 ` J.A. Magallon
@ 2003-07-25 4:22 ` Otto Solares
2003-07-25 14:38 ` Hollis Blanchard
0 siblings, 1 reply; 20+ messages in thread
From: Otto Solares @ 2003-07-25 4:22 UTC (permalink / raw)
To: J.A. Magallon
Cc: Hollis Blanchard, Alan Cox, David McCullough, uclinux-dev,
Linux Kernel Mailing List, Ihar Philips Filipau
On Thu, Jul 24, 2003 at 11:20:00PM +0200, J.A. Magallon wrote:
> Or you just define must_inline, and let gcc inline the rest of 'inlines',
> based on its own rules for function size, adjusting the parameters
> to gcc to ensure (more or less) that what is inlined fits in cache of
> the processor one is building for...
> (this can be hard, help from gcc hackers will be needed...)
IMO just a CONFIG_INLINE_FUNCTIONS will work: if you
want to conserve space at the expense of speed, simply
don't select this option; otherwise you get speed but
a big kernel.
-solca
* Re: [uClinux-dev] Kernel 2.6 size increase - get_current()?
2003-07-24 19:51 ` Hollis Blanchard
@ 2003-07-24 21:20 ` J.A. Magallon
2003-07-25 4:22 ` Otto Solares
0 siblings, 1 reply; 20+ messages in thread
From: J.A. Magallon @ 2003-07-24 21:20 UTC (permalink / raw)
To: Hollis Blanchard
Cc: Alan Cox, David McCullough, uclinux-dev,
Linux Kernel Mailing List, Ihar Philips Filipau
On 07.24, Hollis Blanchard wrote:
> On Thursday, Jul 24, 2003, at 14:37 US/Central, Alan Cox wrote:
>
> > On Iau, 2003-07-24 at 16:30, Hollis Blanchard wrote:
> >> So you're arguing for more inlining, because icache speculative
> >> prefetch will pick up the inlined code?
> >
> > I'm arguing for short inlined fast paths and non inlined unusual
> > paths.
> >
> >> Or you're arguing for less, because code like get_current() which is
> >> called frequently could have a single copy living in icache?
> >
> > Depends how much the jump costs you.
>
> And also how big your icache is, and maybe even cpu/bus ratio, etc...
> which depend on the arch of course.
>
> So as I saw Ihar suggest earlier in this thread, perhaps there should
> be two inline directives: must_inline (for code whose correctness
> depends on it) and could_help_performance_inline. Then different archs
> could #define could_help_performance_inline as appropriate.
>
Or you just define must_inline, and let gcc inline the rest of 'inlines',
based on its own rules for function size, adjusting the parameters
to gcc to ensure (more or less) that what is inlined fits in cache of
the processor one is building for...
(this can be hard, help from gcc hackers will be needed...)
--
J.A. Magallon <jamagallon@able.es> \ Software is like sex:
werewolf.able.es \ It's better when it's free
Mandrake Linux release 9.2 (Cooker) for i586
Linux 2.4.22-pre7-jam1m (gcc 3.3.1 (Mandrake Linux 9.2 3.3.1-0.6mdk))
* Re: [uClinux-dev] Kernel 2.6 size increase - get_current()?
2003-07-24 19:37 ` Alan Cox
@ 2003-07-24 19:51 ` Hollis Blanchard
2003-07-24 21:20 ` J.A. Magallon
0 siblings, 1 reply; 20+ messages in thread
From: Hollis Blanchard @ 2003-07-24 19:51 UTC (permalink / raw)
To: Alan Cox
Cc: David McCullough, uclinux-dev, Linux Kernel Mailing List,
Ihar "Philips" Filipau
On Thursday, Jul 24, 2003, at 14:37 US/Central, Alan Cox wrote:
> On Iau, 2003-07-24 at 16:30, Hollis Blanchard wrote:
>> So you're arguing for more inlining, because icache speculative
>> prefetch will pick up the inlined code?
>
> I'm arguing for short inlined fast paths and non inlined unusual
> paths.
>
>> Or you're arguing for less, because code like get_current() which is
>> called frequently could have a single copy living in icache?
>
> Depends how much the jump costs you.
And also how big your icache is, and maybe even cpu/bus ratio, etc...
which depend on the arch of course.
So as I saw Ihar suggest earlier in this thread, perhaps there should
be two inline directives: must_inline (for code whose correctness
depends on it) and could_help_performance_inline. Then different archs
could #define could_help_performance_inline as appropriate.
--
Hollis Blanchard
IBM Linux Technology Center
* Re: [uClinux-dev] Kernel 2.6 size increase - get_current()?
2003-07-24 15:30 ` Hollis Blanchard
@ 2003-07-24 19:37 ` Alan Cox
2003-07-24 19:51 ` Hollis Blanchard
0 siblings, 1 reply; 20+ messages in thread
From: Alan Cox @ 2003-07-24 19:37 UTC (permalink / raw)
To: Hollis Blanchard; +Cc: David McCullough, uclinux-dev, Linux Kernel Mailing List
On Iau, 2003-07-24 at 16:30, Hollis Blanchard wrote:
> So you're arguing for more inlining, because icache speculative
> prefetch will pick up the inlined code?
I'm arguing for short inlined fast paths and non inlined unusual
paths.
> Or you're arguing for less, because code like get_current() which is
> called frequently could have a single copy living in icache?
Depends how much the jump costs you.
* Re: [uClinux-dev] Kernel 2.6 size increase - get_current()?
2003-07-24 11:28 ` Alan Cox
2003-07-24 12:04 ` David McCullough
@ 2003-07-24 15:30 ` Hollis Blanchard
2003-07-24 19:37 ` Alan Cox
1 sibling, 1 reply; 20+ messages in thread
From: Hollis Blanchard @ 2003-07-24 15:30 UTC (permalink / raw)
To: Alan Cox; +Cc: David McCullough, uclinux-dev, Linux Kernel Mailing List
On Thursday, Jul 24, 2003, at 06:28 US/Central, Alan Cox wrote:
>
> Code size for critical paths is getting more and more performance
> critical
> on x86 as well as on the embedded CPU systems. 3Ghz superscalar
> processors
> lose a lot of clocks to a memory stall.
So you're arguing for more inlining, because icache speculative
prefetch will pick up the inlined code?
Or you're arguing for less, because code like get_current() which is
called frequently could have a single copy living in icache?
--
Hollis Blanchard
IBM Linux Technology Center
* Re: [uClinux-dev] Kernel 2.6 size increase - get_current()?
2003-07-24 12:04 ` David McCullough
@ 2003-07-24 14:48 ` Alan Cox
2003-07-25 18:25 ` bill davidsen
1 sibling, 0 replies; 20+ messages in thread
From: Alan Cox @ 2003-07-24 14:48 UTC (permalink / raw)
To: David McCullough
Cc: Bernardo Innocenti, Christoph Hellwig, David S. Miller,
uclinux-dev, Linux Kernel Mailing List, Greg Ungerer
On Iau, 2003-07-24 at 13:04, David McCullough wrote:
> So should the trend be away from inlining, especially larger functions ?
>
> I know on m68k some of the really simple inlines are actually smaller as
> an inline than as a function call. But they have to be very simple, or
> only used once.
Cool. As to trends well there are two conflicting ones - less inlines but
also more code because of adding fast paths to cut conditions down on normal
sequences of execution.
* Re: [uClinux-dev] Kernel 2.6 size increase - get_current()?
2003-07-24 11:28 ` Alan Cox
@ 2003-07-24 12:04 ` David McCullough
2003-07-24 14:48 ` Alan Cox
2003-07-25 18:25 ` bill davidsen
2003-07-24 15:30 ` Hollis Blanchard
1 sibling, 2 replies; 20+ messages in thread
From: David McCullough @ 2003-07-24 12:04 UTC (permalink / raw)
To: Alan Cox
Cc: Bernardo Innocenti, Christoph Hellwig, David S. Miller,
uclinux-dev, Linux Kernel Mailing List, Greg Ungerer
Jivin Alan Cox lays it down ...
> On Iau, 2003-07-24 at 06:06, David McCullough wrote:
> > Back when I first did the 2.4 uClinux port, the m68k MMU code was
> > dedicating a register (a2) for current. I thought that was a bad idea
> > given how often you run out of registers on the 68k, and made it a
>
> On some platforms a global register current was a win, I can't speak for
> m68k - current is used a lot.
I'm sure that using a register for current was the right thing to do at
the time. One problem with a global register approach is that the more
inlining the code uses, the more likely the compiler is going to want
that extra register :-)
> > On the 2.5/2.6 front, I think the change comes from the 8K (2 page) task
> > structure and everyone just masking the kernel stack pointer to get the
> > task pointer. Gerg would know for sure, he did the 2.5 work in this area.
> > We should be easily able to switch back to the current_task pointer with a
> > few small mods to entry.S.
>
> A lot of platforms went this way because "current" is hard to do right
> on an SMP box. It's effectively per-CPU dependent, and that means you
> either set up the MMU to do per CPU pages (via segments or tables) which
> is a pita, or you do the stack trick. For uniprocessor a global still
> works perfectly well.
Sounds like something that can at least be made conditional on SMP.
I'll look into it for m68knommu since it is more likely to care about "size"
than SMP.
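A rough sketch of what "conditional on SMP" could look like, written as user-space C rather than kernel code. CONFIG_SMP is the kernel's option name; struct demo_task, demo_current_task, demo_switch_to(), demo_current() and demo_lookup_current() are hypothetical names used only here.

```c
/* Sketch only: user-space stand-ins, not kernel code. */
struct demo_task { int pid; };

#ifdef CONFIG_SMP
/* SMP: "current" differs per CPU, so it has to be looked up each time
 * (for example with the stack-pointer masking trick). Declared only. */
struct demo_task *demo_lookup_current(void);
#define demo_current() demo_lookup_current()
#else
/* Uniprocessor: a single global pointer is enough. */
static struct demo_task *demo_current_task;
#define demo_current() (demo_current_task)

/* Called on every context switch to repoint the global. */
static void demo_switch_to(struct demo_task *next)
{
    demo_current_task = next;
}
#endif
```

On the uniprocessor branch every use of demo_current() is a plain load of one global, which is the cheap case the 2.4 code had.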
> > A general comment on the use of inline throughout the kernel. Although
> > they may show gains on x86 platforms, they often perform worse on
> > embedded processors with limited cache, as well as adding size. I
>
> Code size for critical paths is getting more and more performance critical
> on x86 as well as on the embedded CPU systems. 3Ghz superscalar processors
> lose a lot of clocks to a memory stall.
So should the trend be away from inlining, especially larger functions ?
I know on m68k some of the really simple inlines are actually smaller as
an inline than as a function call. But they have to be very simple, or
only used once.
Cheers,
Davidm
--
David McCullough, davidm@snapgear.com Ph:+61 7 34352815 http://www.SnapGear.com
Custom Embedded Solutions + Security Fx:+61 7 38913630 http://www.uCdot.org
* Re: [uClinux-dev] Kernel 2.6 size increase - get_current()?
2003-07-24 8:27 Ihar "Philips" Filipau
@ 2003-07-24 11:50 ` David McCullough
0 siblings, 0 replies; 20+ messages in thread
From: David McCullough @ 2003-07-24 11:50 UTC (permalink / raw)
To: Ihar Philips Filipau; +Cc: linux-kernel
Jivin Ihar Philips Filipau lays it down ...
> David McCullough wrote:
> >
> >A general comment on the use of inline throughout the kernel. Although
> >they may show gains on x86 platforms, they often perform worse on
> >embedded processors with limited cache, as well as adding size. I
> >can't see any way of coding around this though. As long as x86 is
> >driving influence, other platforms will just have to deal with it as
> >best they can.
> >
>
> Actually I'm a victim of over-inlining too. Or was, at least.
> I was running a router on old Pentiums. I remember an almost
> dramatic drop in performance with newer kernels because of inlining in
> net/*. But sure, on a Xeon P4 it boosts performance...
>
> Anyway, here is my point.
> We have the classic situation where representation and
> intentions are mixed up.
>
> Representation == 'inline', but intentions == 'inline or it will
> break' _and_ 'inline - it runs faster'.
> These obviously should be separated.
The biggest problem I see is that the inlines are done in header files
generally, and to stop them from inlining, you need to be able to
switch from an inline to a prototype in the header file. The code from
the header then needs to be added to a .o somewhere in the build for the
case where inlines are stripped out.
Other than providing non-critical inlines either on or off, I can't see
the level approach working all that well. A combination of levels that
work well on a few platforms may not work well at all on another.
Still, just the ability to reduce the inlines would be very useful.
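A sketch of the header arrangement described above, collapsed into a single listing. The option CONFIG_STRIP_INLINES, the marker EMIT_OUT_OF_LINE, the DEMO_ macros and demo_add_sat() are all hypothetical names chosen for this example.

```c
#include <limits.h>

/* Sketch only. In a real tree this would be a header; one .c file would
 * define EMIT_OUT_OF_LINE before including it. */
#if !defined(CONFIG_STRIP_INLINES)
#define DEMO_LINKAGE static inline      /* normal build: inline everywhere */
#define DEMO_WANT_BODY 1
#elif defined(EMIT_OUT_OF_LINE)
#define DEMO_LINKAGE                    /* the one file that owns the copy */
#define DEMO_WANT_BODY 1
#else
#define DEMO_WANT_BODY 0                /* everyone else sees a prototype */
#endif

#if DEMO_WANT_BODY
/* Saturating add: clamps to INT_MAX/INT_MIN instead of overflowing. */
DEMO_LINKAGE int demo_add_sat(int a, int b)
{
    if (b > 0 && a > INT_MAX - b)
        return INT_MAX;
    if (b < 0 && a < INT_MIN - b)
        return INT_MIN;
    return a + b;
}
#else
int demo_add_sat(int a, int b);
#endif
```

With the strip option set, exactly one object file carries the function body and every other user links against it, which is the extra build step mentioned above.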
Cheers,
Davidm
> Even more:
>
> #define INLINE_LEVEL some_platform_specific_number
>
> ---------
>
> #define inline0 inline_always
>
> #if INLINE_LEVEL >= 1
> # define inline1 inline_always
> #else
> # define inline1
> #endif
> ...
> #if INLINE_LEVEL >= N
> # define inlineN inline_always
> #else
> # define inlineN
> #endif
>
> and so on, giving a platform a chance to influence the amount of inlining.
> Better to put it into the config, with defaults defined by the platform.
--
David McCullough, davidm@snapgear.com Ph:+61 7 34352815 http://www.SnapGear.com
Custom Embedded Solutions + Security Fx:+61 7 38913630 http://www.uCdot.org
* Re: [uClinux-dev] Kernel 2.6 size increase - get_current()?
2003-07-24 5:06 ` David McCullough
@ 2003-07-24 11:28 ` Alan Cox
2003-07-24 12:04 ` David McCullough
2003-07-24 15:30 ` Hollis Blanchard
0 siblings, 2 replies; 20+ messages in thread
From: Alan Cox @ 2003-07-24 11:28 UTC (permalink / raw)
To: David McCullough
Cc: Bernardo Innocenti, Christoph Hellwig, David S. Miller,
uclinux-dev, Linux Kernel Mailing List, Greg Ungerer
On Iau, 2003-07-24 at 06:06, David McCullough wrote:
> Back when I first did the 2.4 uClinux port, the m68k MMU code was
> dedicating a register (a2) for current. I thought that was a bad idea
> given how often you run out of registers on the 68k, and made it a
On some platforms a global register current was a win, I can't speak for
m68k - current is used a lot.
> On the 2.5/2.6 front, I think the change comes from the 8K (2 page) task
> structure and everyone just masking the kernel stack pointer to get the
> task pointer. Gerg would know for sure, he did the 2.5 work in this area.
> We should be easily able to switch back to the current_task pointer with a
> few small mods to entry.S.
A lot of platforms went this way because "current" is hard to do right
on an SMP box. It's effectively per-CPU dependent, and that means you
either set up the MMU to do per CPU pages (via segments or tables) which
is a pita, or you do the stack trick. For uniprocessor a global still
works perfectly well.
> A general comment on the use of inline throughout the kernel. Although
> they may show gains on x86 platforms, they often perform worse on
> embedded processors with limited cache, as well as adding size. I
Code size for critical paths is getting more and more performance critical
on x86 as well as on the embedded CPU systems. 3Ghz superscalar processors
lose a lot of clocks to a memory stall.
* Re: [uClinux-dev] Kernel 2.6 size increase - get_current()?
@ 2003-07-24 8:27 Ihar "Philips" Filipau
2003-07-24 11:50 ` David McCullough
0 siblings, 1 reply; 20+ messages in thread
From: Ihar "Philips" Filipau @ 2003-07-24 8:27 UTC (permalink / raw)
To: David McCullough; +Cc: linux-kernel
David McCullough wrote:
>
> A general comment on the use of inline throughout the kernel. Although
> they may show gains on x86 platforms, they often perform worse on
> embedded processors with limited cache, as well as adding size. I
> can't see any way of coding around this though. As long as x86 is
> driving influence, other platforms will just have to deal with it as
> best they can.
>
Actually I'm a victim of over-inlining too. Or was, at least.
I was running a router on old Pentiums. I remember an almost
dramatic drop in performance with newer kernels because of inlining in
net/*. But sure, on a Xeon P4 it boosts performance...
Anyway, here is my point.
We have the classic situation where representation and
intentions are mixed up.
Representation == 'inline', but intentions == 'inline or it will
break' _and_ 'inline - it runs faster'.
These obviously should be separated.
Even more:
#define INLINE_LEVEL some_platform_specific_number
---------
#define inline0 inline_always
#if INLINE_LEVEL >= 1
# define inline1 inline_always
#else
# define inline1
#endif
...
#if INLINE_LEVEL >= N
# define inlineN inline_always
#else
# define inlineN
#endif
and so on, giving a platform a chance to influence the amount of inlining.
Better to put it into the config, with defaults defined by the platform.
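A compilable variant of the level scheme, under the assumption that 'inline_always' maps to GCC's always_inline attribute (the outline above leaves it undefined). The default INLINE_LEVEL of 1 and the helpers low_bit(), twice() and cube() are hypothetical.

```c
/* Sketch only: assumes GCC-style attributes. INLINE_LEVEL would come from
 * the platform configuration; the default of 1 is arbitrary. */
#ifndef INLINE_LEVEL
#define INLINE_LEVEL 1
#endif

#define inline_always inline __attribute__((always_inline))

#define inline0 inline_always           /* correctness depends on it */

#if INLINE_LEVEL >= 1
#define inline1 inline_always
#else
#define inline1
#endif

#if INLINE_LEVEL >= 2
#define inline2 inline_always
#else
#define inline2
#endif

/* Hypothetical helpers, one per level. */
static inline0 int low_bit(unsigned v) { return (int)(v & 1u); }
static inline1 int twice(int v)        { return 2 * v; }
static inline2 int cube(int v)         { return v * v * v; }
```

With INLINE_LEVEL at 1, cube() becomes an ordinary static function and the compiler is free to keep a single out-of-line copy per file.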
* Re: [uClinux-dev] Kernel 2.6 size increase - get_current()?
2003-07-23 23:00 ` Bernardo Innocenti
@ 2003-07-24 5:06 ` David McCullough
2003-07-24 11:28 ` Alan Cox
0 siblings, 1 reply; 20+ messages in thread
From: David McCullough @ 2003-07-24 5:06 UTC (permalink / raw)
To: Bernardo Innocenti
Cc: Alan Cox, Christoph Hellwig, David S. Miller, uclinux-dev,
Linux Kernel Mailing List, Greg Ungerer
Jivin Bernardo Innocenti lays it down ...
> On Thursday 24 July 2003 00:37, Alan Cox wrote:
>
> > On Mer, 2003-07-23 at 23:35, Bernardo Innocenti wrote:
> > > It's a sequence of 6 instructions, 18 bytes long, clobbering 4 registers.
> > > The compiler cannot see around it.
> > > This takes 18*11 = 198 bytes just for invoking the 'current'
> > > macro so many times.
> >
> > Unless you support SMP I'm not sure I understand why m68k nommu changed
> > from using a global for current_task ?
>
> The people who might know best are Greg and David from SnapGear.
> I'm appending them to the Cc list.
>
> But I noticed that most archs in 2.6 do like this. Is it some kind
> of flock-effect? Things get changed in i386 and all other archs
> just follow... :-)
It's a little this way for sure.
Back when I first did the 2.4 uClinux port, the m68k MMU code was
dedicating a register (a2) for current. I thought that was a bad idea
given how often you run out of registers on the 68k, and made it a
global. Because it was still effectively a pointer, the code size
change was not a factor. I just didn't want to give up a register.
So that is the 2.4 history and it has served us well so far ;-)
On the 2.5/2.6 front, I think the change comes from the 8K (2 page) task
structure and everyone just masking the kernel stack pointer to get the
task pointer. Gerg would know for sure, he did the 2.5 work in this area.
We should be easily able to switch back to the current_task pointer with a
few small mods to entry.S.
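The masking trick can be imitated in portable user-space C. This is only a simulation under stated assumptions: the 8K size is the one quoted above, while the demo_ types, demo_stack and info_from_sp() are hypothetical and merely mimic the layout (thread info at the lowest address of an aligned block).

```c
#include <stdint.h>

#define THREAD_SIZE 8192u               /* 8K (2 pages), as quoted above */

/* Sketch only: hypothetical stand-ins for the kernel structures. */
struct demo_task        { int pid; };
struct demo_thread_info { struct demo_task *task; };

/* One "kernel stack": the info block sits at the bottom of the region. */
union demo_thread_stack {
    struct demo_thread_info info;
    unsigned char           stack[THREAD_SIZE];
};

/* Must be aligned to its own size for the mask to land on its base. */
static _Alignas(THREAD_SIZE) union demo_thread_stack demo_stack;

/* Clears the low bits of any address inside the block to find its base. */
static struct demo_thread_info *info_from_sp(const void *sp)
{
    uintptr_t base = (uintptr_t)sp & ~(uintptr_t)(THREAD_SIZE - 1);
    return (struct demo_thread_info *)base;
}
```

Any pointer into demo_stack.stack maps back to &demo_stack.info, which is the property that lets each CPU find its own task without a shared global.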
A general comment on the use of inline throughout the kernel. Although
they may show gains on x86 platforms, they often perform worse on
embedded processors with limited cache, as well as adding size. I
can't see any way of coding around this though. As long as x86 is
driving influence, other platforms will just have to deal with it as
best they can.
Cheers,
Davidm
--
David McCullough, davidm@snapgear.com Ph:+61 7 34352815 http://www.SnapGear.com
Custom Embedded Solutions + Security Fx:+61 7 38913630 http://www.uCdot.org
* Re: [uClinux-dev] Kernel 2.6 size increase - get_current()?
2003-07-23 22:37 ` Alan Cox
@ 2003-07-23 23:00 ` Bernardo Innocenti
2003-07-24 5:06 ` David McCullough
0 siblings, 1 reply; 20+ messages in thread
From: Bernardo Innocenti @ 2003-07-23 23:00 UTC (permalink / raw)
To: Alan Cox
Cc: Christoph Hellwig, David S. Miller, uclinux-dev,
Linux Kernel Mailing List, Greg Ungerer, David McCullough
On Thursday 24 July 2003 00:37, Alan Cox wrote:
> On Mer, 2003-07-23 at 23:35, Bernardo Innocenti wrote:
> > It's a sequence of 6 instructions, 18 bytes long, clobbering 4 registers.
> > The compiler cannot see around it.
> > This takes 18*11 = 198 bytes just for invoking the 'current'
> > macro so many times.
>
> Unless you support SMP I'm not sure I understand why m68k nommu changed
> from using a global for current_task ?
The people who might know best are Greg and David from SnapGear.
I'm appending them to the Cc list.
But I noticed that most archs in 2.6 do like this. Is it some kind
of flock-effect? Things get changed in i386 and all other archs
just follow... :-)
--
// Bernardo Innocenti - Develer S.r.l., R&D dept.
\X/ http://www.develer.com/
Please don't send Word attachments - http://www.gnu.org/philosophy/no-word-attachments.html
* Re: [uClinux-dev] Kernel 2.6 size increase - get_current()?
2003-07-23 22:35 ` [uClinux-dev] Kernel 2.6 size increase - get_current()? Bernardo Innocenti
@ 2003-07-23 22:37 ` Alan Cox
2003-07-23 23:00 ` Bernardo Innocenti
0 siblings, 1 reply; 20+ messages in thread
From: Alan Cox @ 2003-07-23 22:37 UTC (permalink / raw)
To: Bernardo Innocenti
Cc: Christoph Hellwig, David S. Miller, uclinux-dev,
Linux Kernel Mailing List
On Mer, 2003-07-23 at 23:35, Bernardo Innocenti wrote:
> It's a sequence of 6 instructions, 18 bytes long, clobbering 4 registers.
> The compiler cannot see around it.
> This takes 18*11 = 198 bytes just for invoking the 'current'
> macro so many times.
Unless you support SMP I'm not sure I understand why m68k nommu changed
from using a global for current_task ?
* Re: [uClinux-dev] Kernel 2.6 size increase - get_current()?
2003-07-23 20:27 ` Christoph Hellwig
@ 2003-07-23 22:35 ` Bernardo Innocenti
2003-07-23 22:37 ` Alan Cox
0 siblings, 1 reply; 20+ messages in thread
From: Bernardo Innocenti @ 2003-07-23 22:35 UTC (permalink / raw)
To: Christoph Hellwig, David S. Miller; +Cc: uclinux-dev, linux-kernel
On Wednesday 23 July 2003 22:27, Christoph Hellwig wrote:
> On Wed, Jul 23, 2003 at 01:22:56PM -0700, David S. Miller wrote:
> > Drivers weren't audited much, and there's a lot of boneheaded
> > stuff in this area. But these should be mostly identical
> > to what would happen on the 2.4.x side
>
> Please read the original message again - he stated that every single
> module in fs/ got a lot bigger - if it gets smaller or at least the
> same size as 2.4 it's clearly a sign of inlines gone mad in the
> filesystem/VM code and we need to look at that. If not we have to look
> elsewhere.
Here is my humble opinion:
In 2.4.20 (m68knommu):
-------------------------------------------------------------------------
#define current _current_task
-------------------------------------------------------------------------
In 2.6.0-test1 (m68knommu):
-------------------------------------------------------------------------
#define current get_current()
static inline struct task_struct *get_current(void)
{
        return(current_thread_info()->task);
}
static inline struct thread_info *current_thread_info(void)
{
        struct thread_info *ti;
        __asm__(
                "move.l %%sp, %0 \n\t"
                "and.l %1, %0"
                : "=&d"(ti)
                : "d" (~(THREAD_SIZE-1))
        );
        return ti;
}
-------------------------------------------------------------------------
The latter expands to:
0: movel #-8192,%d0
6: movel %sp,%d2
8: andl %d0,%d2
a: moveal %d2,%a1
c: moveal %a1@,%a0
e: moveal %a0@(92),%a0
12:
It's a sequence of 6 instructions, 18 bytes long, clobbering 4 registers.
The compiler cannot see around it.
"current" is being used very lightly all over the kernel, like in this
code snippet from fs/open.c:
        old_fsuid = current->fsuid;
        old_fsgid = current->fsgid;
        old_cap = current->cap_effective;
        current->fsuid = current->uid;
        current->fsgid = current->gid;
        if (current->uid)
                cap_clear(current->cap_effective);
        else
                current->cap_effective = current->cap_permitted;
This takes 18*11 = 198 bytes just for invoking the 'current'
macro so many times.
Perhaps adding __attribute__((const)) to current_thread_info() and
get_current() would help eliminate some unnecessary accesses.
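A user-space sketch of that idea, assuming GCC's const function attribute. The demo_ names are hypothetical stand-ins, not the kernel's definitions, and whether 'const' is strictly valid for get_current() itself, which reads memory through the returned pointer, is a question this sketch does not settle.

```c
/* Sketch only: assumes GCC's const attribute; demo_ names are
 * hypothetical user-space stand-ins. */
struct demo_task        { int uid; int fsuid; };
struct demo_thread_info { struct demo_task *task; };

static struct demo_task        demo_task0 = { 1000, 0 };
static struct demo_thread_info demo_ti    = { &demo_task0 };

/* 'const': the result depends on no changing state, so the compiler is
 * allowed to merge repeated calls into one. */
static __attribute__((const))
struct demo_thread_info *demo_current_thread_info(void)
{
    return &demo_ti;
}

#define demo_current (demo_current_thread_info()->task)

/* Two uses of the macro; the compiler may evaluate the call only once. */
static void demo_copy_uid(void)
{
    demo_current->fsuid = demo_current->uid;
}
```

Whether the compiler actually merges the calls depends on the optimisation level, so any size saving would have to be confirmed by inspecting the generated code.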
--
// Bernardo Innocenti - Develer S.r.l., R&D dept.
\X/ http://www.develer.com/
Please don't send Word attachments - http://www.gnu.org/philosophy/no-word-attachments.html
Thread overview: 20+ messages
[not found] <cwQJ.3BO.29@gated-at.bofh.it>
[not found] ` <cypH.5dM.35@gated-at.bofh.it>
[not found] ` <cyza.5lN.13@gated-at.bofh.it>
[not found] ` <cArg.74D.11@gated-at.bofh.it>
2003-07-24 8:13 ` [uClinux-dev] Kernel 2.6 size increase - get_current()? Ihar "Philips" Filipau
2003-07-25 7:25 ` Denis Vlasenko
2003-07-25 18:36 ` bill davidsen
[not found] <d2nx.4QV.15@gated-at.bofh.it>
[not found] ` <dbTZ.5Z5.19@gated-at.bofh.it>
2003-07-25 15:37 ` Ihar "Philips" Filipau
2003-07-24 8:27 Ihar "Philips" Filipau
2003-07-24 11:50 ` David McCullough
-- strict thread matches above, loose matches on Subject: below --
2003-07-23 18:46 Kernel 2.6 size increase Bernardo Innocenti
2003-07-23 20:22 ` [uClinux-dev] " David S. Miller
2003-07-23 20:27 ` Christoph Hellwig
2003-07-23 22:35 ` [uClinux-dev] Kernel 2.6 size increase - get_current()? Bernardo Innocenti
2003-07-23 22:37 ` Alan Cox
2003-07-23 23:00 ` Bernardo Innocenti
2003-07-24 5:06 ` David McCullough
2003-07-24 11:28 ` Alan Cox
2003-07-24 12:04 ` David McCullough
2003-07-24 14:48 ` Alan Cox
2003-07-25 18:25 ` bill davidsen
2003-07-24 15:30 ` Hollis Blanchard
2003-07-24 19:37 ` Alan Cox
2003-07-24 19:51 ` Hollis Blanchard
2003-07-24 21:20 ` J.A. Magallon
2003-07-25 4:22 ` Otto Solares
2003-07-25 14:38 ` Hollis Blanchard