linux-kernel.vger.kernel.org archive mirror
* Re: [uClinux-dev] Kernel 2.6 size increase - get_current()?
       [not found] ` <dbTZ.5Z5.19@gated-at.bofh.it>
@ 2003-07-25 15:37   ` Ihar "Philips" Filipau
  2003-07-25 20:46     ` OT: Vanilla not for embedded?! " Mike Fedyk
  0 siblings, 1 reply; 24+ messages in thread
From: Ihar "Philips" Filipau @ 2003-07-25 15:37 UTC (permalink / raw)
  To: linux-kernel

Hollis Blanchard wrote:

> I believe the point Alan was trying to make is not that we should have 
> more or less inlines, but we should have smarter inlines. I.E. don't 
> just inline a function to "make it fast"; think about the implications 
> (and ideally measure it, though I think that becomes problematic when so 
> many other factors can affect the benefit of a single inlined function). 
> The specific example he gave was inlining code on the fast path, while 
> accepting branch/cache penalties for non-inlined code on the slow path.
> 

   But you cannot make this kind of decision universally.
   Some kind of compromise has to be found between arch maintainers and
subsystem maintainers.

   Or beat the GCC developers hard so they finally produce a good
optimizing compiler ;-)

   Or ask all kernel developers to work one hour per week on GCC
optimization - I bet GCC will outperform everything else in the industry
in less than one year ;-)))

   To remind: the source of the problem is not inlines; the problem is
the compiler, which cannot read our minds yet and generate the code we
expect it to generate.

P.S. Offtopic. As I see it, Linux & Linus have made their decision about
optimization. Linux after all is a creation of capitalism: whoever has
more money controls everything. The server market has more money - they
do more work on the kernel and their systems are not that far from
developers' workstations - so Linux gets more and more server/workstation
oriented. This will fit the desktop market too - if your computer was
made to run WinXP AKA exp(bloat) - it will be capable of running any OS.
Linus repeating 'small is beautiful' sounds more and more like a crude
joke...
As for the embedded market - it is already in a deep fork and far, far
away from vanilla kernels... Vanilla is really not that relevant to the
real world...



* Re: OT: Vanilla not for embedded?! Re: Kernel 2.6 size increase - get_current()?
  2003-07-25 20:46     ` OT: Vanilla not for embedded?! " Mike Fedyk
@ 2003-07-25 20:43       ` Andre Hedrick
  2003-07-27 11:57       ` Ihar "Philips" Filipau
  1 sibling, 0 replies; 24+ messages in thread
From: Andre Hedrick @ 2003-07-25 20:43 UTC (permalink / raw)
  To: Mike Fedyk; +Cc: Ihar Philips Filipau, linux-kernel


Where is the "deep" fork stored? Sounds interesting!
At least it should be business friendly.

-a

On Fri, 25 Jul 2003, Mike Fedyk wrote:

> On Fri, Jul 25, 2003 at 05:37:39PM +0200, Ihar Philips Filipau wrote:
> > P.S. Offtopic. As I see it, Linux & Linus have made their decision about
> > optimization. Linux after all is a creation of capitalism: whoever has
> > more money controls everything. The server market has more money - they
> > do more work on the kernel and their systems are not that far from
> > developers' workstations - so Linux gets more and more server/workstation
> > oriented. This will fit the desktop market too - if your computer was
> > made to run WinXP AKA exp(bloat) - it will be capable of running any OS.
> > Linus repeating 'small is beautiful' sounds more and more like a crude
> > joke...
> > As for the embedded market - it is already in a deep fork and far, far
> > away from vanilla kernels... Vanilla is really not that relevant to the
> > real world...
> 
> Vanilla will be what people put into it.  And I have seen more messages from
> embedded people complaining, than actually doing and submitting patches for
> merging.
> 
> So the embedded trees are a deep fork huh?  Did you or anyone else do
> anything to merge during 2.5?!
> 
> And now you see why there is a "deep" fork...



* OT: Vanilla not for embedded?! Re: Kernel 2.6 size increase - get_current()?
  2003-07-25 15:37   ` [uClinux-dev] Kernel 2.6 size increase - get_current()? Ihar "Philips" Filipau
@ 2003-07-25 20:46     ` Mike Fedyk
  2003-07-25 20:43       ` Andre Hedrick
  2003-07-27 11:57       ` Ihar "Philips" Filipau
  0 siblings, 2 replies; 24+ messages in thread
From: Mike Fedyk @ 2003-07-25 20:46 UTC (permalink / raw)
  To: Ihar Philips Filipau; +Cc: linux-kernel

On Fri, Jul 25, 2003 at 05:37:39PM +0200, Ihar Philips Filipau wrote:
> P.S. Offtopic. As I see it, Linux & Linus have made their decision about
> optimization. Linux after all is a creation of capitalism: whoever has
> more money controls everything. The server market has more money - they
> do more work on the kernel and their systems are not that far from
> developers' workstations - so Linux gets more and more server/workstation
> oriented. This will fit the desktop market too - if your computer was
> made to run WinXP AKA exp(bloat) - it will be capable of running any OS.
> Linus repeating 'small is beautiful' sounds more and more like a crude
> joke...
> As for the embedded market - it is already in a deep fork and far, far
> away from vanilla kernels... Vanilla is really not that relevant to the
> real world...

Vanilla will be what people put into it.  And I have seen more messages from
embedded people complaining, than actually doing and submitting patches for
merging.

So the embedded trees are a deep fork huh?  Did you or anyone else do
anything to merge during 2.5?!

And now you see why there is a "deep" fork...


* Re: OT: Vanilla not for embedded?! Re: Kernel 2.6 size increase - get_current()?
  2003-07-25 20:46     ` OT: Vanilla not for embedded?! " Mike Fedyk
  2003-07-25 20:43       ` Andre Hedrick
@ 2003-07-27 11:57       ` Ihar "Philips" Filipau
  2003-07-27 13:05         ` Francois Romieu
  1 sibling, 1 reply; 24+ messages in thread
From: Ihar "Philips" Filipau @ 2003-07-27 11:57 UTC (permalink / raw)
  To: Mike Fedyk; +Cc: linux-kernel

Mike Fedyk wrote:
> 
> Vanilla will be what people put into it.  And I have seen more messages from
> embedded people complaining, than actually doing and submitting patches for
> merging.
> 
> So the embedded trees are a deep fork huh?  Did you or anyone else do
> anything to merge during 2.5?!
> 
> And now you see why there is a "deep" fork...
> 

   Real-time stuff is a must - something like RTAI.
   Things like the Linux Trace Toolkit - sooner or later you have to
start using them to tune performance.
   Patches to remove mandatory (for 2.2/2.0) PCI/IDE support were pretty
common too.
   A patch to shrink the network hashes - a fact of life.
   A patch to kill the PCI names database.
   And these are only the things I was using personally (and that I
remember) in my short 4-year career.

   CONFIG_TINY - http://lwn.net/Articles/14186/ - has something like
this been merged? If so, I'm the first guy in the download queue on
ftp.kernel.org!

   The kernel is heavily tuned for servers and workstations (read -
modern PCs).

   At my previous position the company was using a kernel prepared by
Karim Yaghmour, and right now we are using kernels from MontaVista.
   Far from vanilla.

 > embedded people complaining

   Sure, complaining.
   For some reason all "improvements" to the kernel have led to an
increase in kernel size, not a decrease. Strange, isn't it?




* Re: OT: Vanilla not for embedded?! Re: Kernel 2.6 size increase - get_current()?
  2003-07-27 11:57       ` Ihar "Philips" Filipau
@ 2003-07-27 13:05         ` Francois Romieu
  0 siblings, 0 replies; 24+ messages in thread
From: Francois Romieu @ 2003-07-27 13:05 UTC (permalink / raw)
  To: Ihar Philips Filipau; +Cc: linux-kernel

Ihar Philips Filipau <filia@softhome.net> :
[...]
>    Patches to remove mandatory (for 2.2/2.0) PCI/IDE support were pretty 
> common too.
>    A patch to shrink the network hashes - a fact of life.
>    A patch to kill the PCI names database.
>    And these are only the things I was using personally (and that I
> remember) in my short 4-year career.

Would you mind publishing the patches ?

>    CONFIG_TINY - http://lwn.net/Articles/14186/ - has something like
> this been merged? If so, I'm the first guy in the download queue on
> ftp.kernel.org!

See CONFIG_EMBEDDED.

[...]
>    For some reason all "improvements" to the kernel have led to an
> increase in kernel size, not a decrease. Strange, isn't it?

No time for sarcasm here.

Regards

--
Ueimor


* Re: [uClinux-dev] Kernel 2.6 size increase - get_current()?
  2003-07-24  8:13       ` Ihar "Philips" Filipau
  2003-07-25  7:25         ` Denis Vlasenko
@ 2003-07-25 18:36         ` bill davidsen
  1 sibling, 0 replies; 24+ messages in thread
From: bill davidsen @ 2003-07-25 18:36 UTC (permalink / raw)
  To: linux-kernel

In article <3F1F9531.2050204@softhome.net>,
Ihar \"Philips\" Filipau <filia@softhome.net> wrote:

|     Just curious.
| 
|     Is there any way to tell one 'inline' from another?
| 
|     I mean the 'inline' which means 'this has to be inlined or it will
| break' and the 'inline' which means 'inline this please - it adds only
| 10k of code bloat and improves performance in my suppa-puppa-bench by
| 0.000001%!'
| 
|     Strictly speaking - separate 'inline' into 'require_inline' and
| 'better_inline'.
|     So people who really care about image size can define
| 'better_inline' to nothing, without harm to functionality.
|     Actually I saw real performance improvements on my Pentium MMX 133
| (it has a 16k I-cache + 16k D-cache, I believe) when I was cutting some
| of the inlines out, and I'm not even talking about (cache-poor) embedded
| systems...

Actually you have a very different CPU to memory bandwidth ratio than a
processor manufactured in this millennium. I use a system like that for
test, but please don't optimize for it!

Speculation of the day: I suspect that on some laptops which run
seriously slower when on battery, the CPU/memory speed changes enough
that you could see and measure better performance with a 'slow' and a
'fast' kernel.

Speculation, since I'm sure the gain would be down in the noise, one of
those 'difference without a distinction' things.
-- 
bill davidsen <davidsen@tmr.com>
  CTO, TMR Associates, Inc
Doing interesting things with little computers since 1979.


* Re: [uClinux-dev] Kernel 2.6 size increase - get_current()?
  2003-07-24 12:04               ` David McCullough
  2003-07-24 14:48                 ` Alan Cox
@ 2003-07-25 18:25                 ` bill davidsen
  1 sibling, 0 replies; 24+ messages in thread
From: bill davidsen @ 2003-07-25 18:25 UTC (permalink / raw)
  To: linux-kernel

In article <20030724120441.GC16168@beast>,
David McCullough  <davidm@snapgear.com> wrote:

| So should the trend be away from inlining,  especially larger functions ?
| 
| I know on m68k some of the really simple inlines are actually smaller as
| an inline than as a function call.  But they have to be very simple,  or
| only used once.

Actually, I would think that the compiler would make the decision in a
perfect world. (no smiley) Clearly some programmers think the compiler
isn't aggressive about this, and that may be the root problem. Certainly
if the compiler makes the choice then -Os should avoid the inline.
-- 
bill davidsen <davidsen@tmr.com>
  CTO, TMR Associates, Inc
Doing interesting things with little computers since 1979.


* Re: [uClinux-dev] Kernel 2.6 size increase - get_current()?
  2003-07-25  4:22                       ` Otto Solares
@ 2003-07-25 14:38                         ` Hollis Blanchard
  0 siblings, 0 replies; 24+ messages in thread
From: Hollis Blanchard @ 2003-07-25 14:38 UTC (permalink / raw)
  To: Otto Solares
  Cc: J.A. Magallon, Alan Cox, David McCullough, uclinux-dev,
	Linux Kernel Mailing List, Ihar Philips Filipau

On Thursday, Jul 24, 2003, at 23:22 US/Central, Otto Solares wrote:

> On Thu, Jul 24, 2003 at 11:20:00PM +0200, J.A. Magallon wrote:
>> Or you just define must_inline, and let gcc inline the rest of the
>> 'inlines', based on its own rules for function size, adjusting the
>> parameters to gcc to ensure (more or less) that what is inlined fits
>> in the cache of the processor one is building for...
>> (this can be hard, help from gcc hackers will be needed...)
>
> IMO just a CONFIG_INLINE_FUNCTIONS will work: if you
> want to conserve space to the detriment of speed, simply
> don't select this option; otherwise you have speed but
> a big kernel.

Inlines don't always help performance (depending on cache sizes, branch 
penalties, frequency of code access...), but they do always increase 
code size.

I believe the point Alan was trying to make is not that we should have 
more or less inlines, but we should have smarter inlines. I.E. don't 
just inline a function to "make it fast"; think about the implications 
(and ideally measure it, though I think that becomes problematic when 
so many other factors can affect the benefit of a single inlined 
function). The specific example he gave was inlining code on the fast 
path, while accepting branch/cache penalties for non-inlined code on 
the slow path.
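
A minimal sketch of the fast-path/slow-path split described above, assuming GCC. All names here (buf_put, buf_put_slow, demo_fill) are hypothetical and not taken from the kernel tree; unlikely() is defined locally on top of GCC's __builtin_expect. The common case is a compare and a store, cheap to duplicate at every call site, while the rare case stays in a single out-of-line copy:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: tell GCC the condition is rarely true. */
#define unlikely(x) __builtin_expect(!!(x), 0)

struct buf {
    char  *data;
    size_t len;
    size_t cap;
    int    refills;   /* counts slow-path entries, for illustration only */
};

/* Slow path: rare, so keep exactly one out-of-line copy. */
__attribute__((noinline))
static int buf_put_slow(struct buf *b, char c)
{
    b->refills++;
    b->len = 0;               /* pretend the buffer was flushed */
    b->data[b->len++] = c;
    return 1;
}

/* Fast path: small enough that duplicating it per call site is cheap. */
static inline int buf_put(struct buf *b, char c)
{
    if (unlikely(b->len == b->cap))
        return buf_put_slow(b, c);
    b->data[b->len++] = c;
    return 0;
}

/* Push n bytes through a 4-byte buffer; return how often the slow path ran. */
static int demo_fill(size_t n)
{
    char storage[4];
    struct buf b = { storage, 0, sizeof(storage), 0 };

    for (size_t i = 0; i < n; i++)
        buf_put(&b, 'x');
    assert(b.len <= b.cap);
    return b.refills;
}
```

The call and branch penalty is paid only when the buffer is full, which is the trade Alan describes. Whether it actually wins on a given CPU still has to be measured.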

-- 
Hollis Blanchard
IBM Linux Technology Center



* Re: [uClinux-dev] Kernel 2.6 size increase - get_current()?
  2003-07-24  8:13       ` Ihar "Philips" Filipau
@ 2003-07-25  7:25         ` Denis Vlasenko
  2003-07-25 18:36         ` bill davidsen
  1 sibling, 0 replies; 24+ messages in thread
From: Denis Vlasenko @ 2003-07-25  7:25 UTC (permalink / raw)
  To: filia, linux-kernel

On 24 July 2003 11:13, Ihar \"Philips\" Filipau wrote:
>     I mean the 'inline' which means 'this has to be inlined or it will
> break' and the 'inline' which means 'inline this please - it adds only
> 10k of code bloat and improves performance in my suppa-puppa-bench by
> 0.000001%!'
> 
>     Strictly speaking - separate 'inline' into 'require_inline' and
> 'better_inline'.
>     So people who really care about image size can define
> 'better_inline' to nothing, without harm to functionality.
>     Actually I saw real performance improvements on my Pentium MMX 133
> (it has a 16k I-cache + 16k D-cache, I believe) when I was cutting some
> of the inlines out, and I'm not even talking about (cache-poor) embedded
> systems...

Which inlines? Let the list know
--
vda


* Re: [uClinux-dev] Kernel 2.6 size increase - get_current()?
  2003-07-24 21:20                     ` J.A. Magallon
@ 2003-07-25  4:22                       ` Otto Solares
  2003-07-25 14:38                         ` Hollis Blanchard
  0 siblings, 1 reply; 24+ messages in thread
From: Otto Solares @ 2003-07-25  4:22 UTC (permalink / raw)
  To: J.A. Magallon
  Cc: Hollis Blanchard, Alan Cox, David McCullough, uclinux-dev,
	Linux Kernel Mailing List, Ihar Philips Filipau

On Thu, Jul 24, 2003 at 11:20:00PM +0200, J.A. Magallon wrote:
> Or you just define must_inline, and let gcc inline the rest of the 'inlines',
> based on its own rules for function size, adjusting the parameters
> to gcc to ensure (more or less) that what is inlined fits in the cache of
> the processor one is building for...
> (this can be hard, help from gcc hackers will be needed...)

IMO just a CONFIG_INLINE_FUNCTIONS will work: if you
want to conserve space to the detriment of speed, simply
don't select this option; otherwise you have speed but
a big kernel.

-solca



* Re: [uClinux-dev] Kernel 2.6 size increase - get_current()?
  2003-07-24 19:51                   ` Hollis Blanchard
@ 2003-07-24 21:20                     ` J.A. Magallon
  2003-07-25  4:22                       ` Otto Solares
  0 siblings, 1 reply; 24+ messages in thread
From: J.A. Magallon @ 2003-07-24 21:20 UTC (permalink / raw)
  To: Hollis Blanchard
  Cc: Alan Cox, David McCullough, uclinux-dev,
	Linux Kernel Mailing List, Ihar Philips Filipau


On 07.24, Hollis Blanchard wrote:
> On Thursday, Jul 24, 2003, at 14:37 US/Central, Alan Cox wrote:
> 
> > On Iau, 2003-07-24 at 16:30, Hollis Blanchard wrote:
> >> So you're arguing for more inlining, because icache speculative
> >> prefetch will pick up the inlined code?
> >
> > I'm arguing for short inlined fast paths and non inlined unusual
> > paths.
> >
> >> Or you're arguing for less, because code like get_current() which is
> >> called frequently could have a single copy living in icache?
> >
> > Depends how much the jump costs you.
> 
> And also how big your icache is, and maybe even cpu/bus ratio, etc... 
> which depend on the arch of course.
> 
> So as I saw Ihar suggest earlier in this thread, perhaps there should 
> be two inline directives: must_inline (for code whose correctness 
> depends on it) and could_help_performance_inline. Then different archs 
> could #define could_help_performance_inline as appropriate.
> 

Or you just define must_inline, and let gcc inline the rest of the 'inlines',
based on its own rules for function size, adjusting the parameters
to gcc to ensure (more or less) that what is inlined fits in the cache of
the processor one is building for...
(this can be hard, help from gcc hackers will be needed...)

-- 
J.A. Magallon <jamagallon@able.es>      \                 Software is like sex:
werewolf.able.es                         \           It's better when it's free
Mandrake Linux release 9.2 (Cooker) for i586
Linux 2.4.22-pre7-jam1m (gcc 3.3.1 (Mandrake Linux 9.2 3.3.1-0.6mdk))


* Re: [uClinux-dev] Kernel 2.6 size increase - get_current()?
  2003-07-24 19:37                 ` Alan Cox
@ 2003-07-24 19:51                   ` Hollis Blanchard
  2003-07-24 21:20                     ` J.A. Magallon
  0 siblings, 1 reply; 24+ messages in thread
From: Hollis Blanchard @ 2003-07-24 19:51 UTC (permalink / raw)
  To: Alan Cox
  Cc: David McCullough, uclinux-dev, Linux Kernel Mailing List,
	Ihar "Philips" Filipau

On Thursday, Jul 24, 2003, at 14:37 US/Central, Alan Cox wrote:

> On Iau, 2003-07-24 at 16:30, Hollis Blanchard wrote:
>> So you're arguing for more inlining, because icache speculative
>> prefetch will pick up the inlined code?
>
> I'm arguing for short inlined fast paths and non inlined unusual
> paths.
>
>> Or you're arguing for less, because code like get_current() which is
>> called frequently could have a single copy living in icache?
>
> Depends how much the jump costs you.

And also how big your icache is, and maybe even cpu/bus ratio, etc... 
which depend on the arch of course.

So as I saw Ihar suggest earlier in this thread, perhaps there should 
be two inline directives: must_inline (for code whose correctness 
depends on it) and could_help_performance_inline. Then different archs 
could #define could_help_performance_inline as appropriate.
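
One way the two directives might look, as a sketch only and assuming GCC. The macro names come from the paragraph above; ARCH_WANTS_PERF_INLINES is a hypothetical stand-in for whatever per-arch or config switch would be used, and the helper functions are invented for illustration:

```c
#include <assert.h>

/* Correctness depends on inlining: always force it (GCC attribute). */
#define must_inline inline __attribute__((always_inline))

/* Hypothetical per-arch switch: archs with large caches opt in. */
#ifdef ARCH_WANTS_PERF_INLINES
#  define could_help_performance_inline inline
#else
#  define could_help_performance_inline __attribute__((noinline))
#endif

/* Round addr down to a multiple of size (size must be a power of two). */
static must_inline unsigned long align_down(unsigned long addr,
                                            unsigned long size)
{
    return addr & ~(size - 1);
}

/* Pure speed hint: the result is the same whether inlined or not. */
static could_help_performance_inline unsigned int
byte_sum(const unsigned char *p, unsigned int n)
{
    unsigned int sum = 0;

    while (n--)
        sum += *p++;
    return sum;
}

static unsigned int demo_directives(void)
{
    const unsigned char bytes[] = { 1, 2, 3 };

    assert(align_down(0x1234UL, 0x100UL) == 0x1200UL);
    return byte_sum(bytes, 3);
}
```

Defining could_help_performance_inline as noinline is the size-first choice; an arch could equally define it as plain inline and leave the decision to the compiler.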

-- 
Hollis Blanchard
IBM Linux Technology Center



* Re: [uClinux-dev] Kernel 2.6 size increase - get_current()?
  2003-07-24 15:30               ` Hollis Blanchard
@ 2003-07-24 19:37                 ` Alan Cox
  2003-07-24 19:51                   ` Hollis Blanchard
  0 siblings, 1 reply; 24+ messages in thread
From: Alan Cox @ 2003-07-24 19:37 UTC (permalink / raw)
  To: Hollis Blanchard; +Cc: David McCullough, uclinux-dev, Linux Kernel Mailing List

On Iau, 2003-07-24 at 16:30, Hollis Blanchard wrote:
> So you're arguing for more inlining, because icache speculative 
> prefetch will pick up the inlined code?

I'm arguing for short inlined fast paths and non inlined unusual
paths.

> Or you're arguing for less, because code like get_current() which is 
> called frequently could have a single copy living in icache?

Depends how much the jump costs you.



* Re: [uClinux-dev] Kernel 2.6 size increase - get_current()?
  2003-07-24 11:28             ` Alan Cox
  2003-07-24 12:04               ` David McCullough
@ 2003-07-24 15:30               ` Hollis Blanchard
  2003-07-24 19:37                 ` Alan Cox
  1 sibling, 1 reply; 24+ messages in thread
From: Hollis Blanchard @ 2003-07-24 15:30 UTC (permalink / raw)
  To: Alan Cox; +Cc: David McCullough, uclinux-dev, Linux Kernel Mailing List

On Thursday, Jul 24, 2003, at 06:28 US/Central, Alan Cox wrote:
>
> Code size for critical paths is getting more and more performance
> critical on x86 as well as on the embedded CPU systems. 3GHz
> superscalar processors lose a lot of clocks to a memory stall.

So you're arguing for more inlining, because icache speculative 
prefetch will pick up the inlined code?

Or you're arguing for less, because code like get_current() which is 
called frequently could have a single copy living in icache?

-- 
Hollis Blanchard
IBM Linux Technology Center



* Re: [uClinux-dev] Kernel 2.6 size increase - get_current()?
  2003-07-24 12:04               ` David McCullough
@ 2003-07-24 14:48                 ` Alan Cox
  2003-07-25 18:25                 ` bill davidsen
  1 sibling, 0 replies; 24+ messages in thread
From: Alan Cox @ 2003-07-24 14:48 UTC (permalink / raw)
  To: David McCullough
  Cc: Bernardo Innocenti, Christoph Hellwig, David S. Miller,
	uclinux-dev, Linux Kernel Mailing List, Greg Ungerer

On Iau, 2003-07-24 at 13:04, David McCullough wrote:
> So should the trend be away from inlining,  especially larger functions ?
> 
> I know on m68k some of the really simple inlines are actually smaller as
> an inline than as a function call.  But they have to be very simple,  or
> only used once.

Cool. As to trends well there are two conflicting ones - less inlines but
also more code because of adding fast paths to cut conditions down on normal
sequences of execution.




* Re: [uClinux-dev] Kernel 2.6 size increase - get_current()?
  2003-07-24 11:28             ` Alan Cox
@ 2003-07-24 12:04               ` David McCullough
  2003-07-24 14:48                 ` Alan Cox
  2003-07-25 18:25                 ` bill davidsen
  2003-07-24 15:30               ` Hollis Blanchard
  1 sibling, 2 replies; 24+ messages in thread
From: David McCullough @ 2003-07-24 12:04 UTC (permalink / raw)
  To: Alan Cox
  Cc: Bernardo Innocenti, Christoph Hellwig, David S. Miller,
	uclinux-dev, Linux Kernel Mailing List, Greg Ungerer


Jivin Alan Cox lays it down ...
> On Iau, 2003-07-24 at 06:06, David McCullough wrote:
> > Back when I first did the 2.4 uClinux port,  the m68k MMU code was
> > dedicating a register (a2) for current.  I thought that was a bad idea
> > given how often you run out of registers on the 68k,  and made it a
> 
> On some platforms a global register current was a win, I can't speak for
> m68k - current is used a lot.


I'm sure that using a register for current was the right thing to do at
the time.  One problem with a global register approach is that the more
inlining the code uses,  the more likely the compiler is going to want
that extra register :-)


> > On the 2.5/2.6 front,  I think the change comes from the 8K (2 page) task
> > structure and everyone just masking the kernel stack pointer to get the
> > task pointer.  Greg would know for sure,  he did the 2.5 work in this area.
> > We should be easily able to switch back to the current_task pointer with a
> > few small mods to entry.S.
> 
> A lot of platforms went this way because "current" is hard to do right
> on an SMP box. It's effectively per-CPU dependent, and that means you
> either set up the MMU to do per CPU pages (via segments or tables) which
> is a pita, or you do the stack trick. For uniprocessor a global still
> works perfectly well.


Sounds like something that can at least be made conditional on SMP.
I'll look into it for m68knommu since it is more likely to care about "size"
than SMP.


> > A general comment on the use of inline throughout the kernel.  Although
> > they may show gains on x86 platforms,  they often perform worse on 
> > embedded processors with limited cache,  as well as adding size.  I
> 
> Code size for critical paths is getting more and more performance critical
> on x86 as well as on the embedded CPU systems. 3GHz superscalar processors
> lose a lot of clocks to a memory stall.

So should the trend be away from inlining,  especially larger functions ?

I know on m68k some of the really simple inlines are actually smaller as
an inline than as a function call.  But they have to be very simple,  or
only used once.

Cheers,
Davidm

-- 
David McCullough, davidm@snapgear.com  Ph:+61 7 34352815 http://www.SnapGear.com
Custom Embedded Solutions + Security   Fx:+61 7 38913630 http://www.uCdot.org


* Re: [uClinux-dev] Kernel 2.6 size increase - get_current()?
  2003-07-24  8:27 [uClinux-dev] " Ihar "Philips" Filipau
@ 2003-07-24 11:50 ` David McCullough
  0 siblings, 0 replies; 24+ messages in thread
From: David McCullough @ 2003-07-24 11:50 UTC (permalink / raw)
  To: Ihar Philips Filipau; +Cc: linux-kernel


Jivin Ihar Philips Filipau lays it down ...
> David McCullough wrote:
> >
> >A general comment on the use of inline throughout the kernel.  Although
> >they may show gains on x86 platforms,  they often perform worse on 
> >embedded processors with limited cache,  as well as adding size.  I
> >can't see any way of coding around this though.  As long as x86 is
> >the driving influence,  other platforms will just have to deal with it as
> >best they can.
> >
> 
>   Actually I'm a victim of over-inlining too. Was, at least.
>   I was running some router on old Pentiums. I remember an almost
> dramatic drop in performance with newer kernels because of inlining in
> net/*. But sure, on a Xeon P4 it boosts performance...
> 
>   What I'm actually getting at:
>   We have the classic situation where representation and intentions
> are mixed up.
> 
>   Representation == 'inline', but the intentions are 'inline or it will
> break' _and_ 'inline - it runs faster'.
>   These obviously should be separated.


The biggest problem I see is that the inlines are done in header files
generally,  and to stop them from inlining,  you need to be able to
switch from an inline to a prototype in the header file.  The code from
the header then needs to be added to a .o somewhere in the build for the
case where inlines are stripped out.


Other than providing non-critical inlines either on or off,  I can't see
the level approach working all that well.  A combination of levels that
work well on a few platforms may not work well at all on another.
Still, just the ability to reduce the inlines would be very useful.
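
The header problem described above could be handled roughly as follows. This is a sketch under assumptions, not an existing kernel mechanism: CONFIG_STRIP_INLINES, EMIT_OUT_OF_LINE, OPT_INLINE and clamp_add are all hypothetical names. The header carries the body once; with inlines stripped it degrades to a prototype, and exactly one .c file defines EMIT_OUT_OF_LINE before including it so that the body lands in a single object file:

```c
#include <assert.h>

/* ---- contents of a hypothetical header ---- */
#ifdef CONFIG_STRIP_INLINES
#  define OPT_INLINE                    /* plain external function */
int clamp_add(int a, int b, int max);   /* callers see only this prototype */
#else
#  define OPT_INLINE static inline
#endif

/* Body is visible when inlining is allowed, or in the one file that
 * defines EMIT_OUT_OF_LINE to provide the shared out-of-line copy. */
#if !defined(CONFIG_STRIP_INLINES) || defined(EMIT_OUT_OF_LINE)
OPT_INLINE int clamp_add(int a, int b, int max)
{
    int s = a + b;

    return s > max ? max : s;
}
#endif
/* ---- end of header ---- */

static int demo_clamp(void)
{
    assert(clamp_add(1, 1, 10) == 2);
    return clamp_add(8, 5, 10);
}
```

As written (neither macro defined) the function is an ordinary static inline. The cost David points out remains: every such header needs this boilerplate, and the build needs one translation unit that emits the out-of-line copies.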


Cheers,
Davidm




>   Even more:
> 
> #define INLINE_LEVEL some_platform_specific_number
> 
> ---------
> 
> #define inline0 inline_always
> 
> #if INLINE_LEVEL >= 1
> #  define inline1 inline_always
> #else
> #  define inline1
> #endif
> ...
> #if INLINE_LEVEL >= N
> #  define inlineN inline_always
> #else
> #  define inlineN
> #endif
> 
>    and so on, giving a platform a chance to influence the amount of inlining.
>    Better to put it into the config, with defaults defined by the platform.
-- 
David McCullough, davidm@snapgear.com  Ph:+61 7 34352815 http://www.SnapGear.com
Custom Embedded Solutions + Security   Fx:+61 7 38913630 http://www.uCdot.org


* Re: [uClinux-dev] Kernel 2.6 size increase - get_current()?
  2003-07-24  5:06           ` David McCullough
@ 2003-07-24 11:28             ` Alan Cox
  2003-07-24 12:04               ` David McCullough
  2003-07-24 15:30               ` Hollis Blanchard
  0 siblings, 2 replies; 24+ messages in thread
From: Alan Cox @ 2003-07-24 11:28 UTC (permalink / raw)
  To: David McCullough
  Cc: Bernardo Innocenti, Christoph Hellwig, David S. Miller,
	uclinux-dev, Linux Kernel Mailing List, Greg Ungerer

On Iau, 2003-07-24 at 06:06, David McCullough wrote:
> Back when I first did the 2.4 uClinux port,  the m68k MMU code was
> dedicating a register (a2) for current.  I thought that was a bad idea
> given how often you run out of registers on the 68k,  and made it a

On some platforms a global register current was a win, I can't speak for
m68k - current is used a lot.

> On the 2.5/2.6 front,  I think the change comes from the 8K (2 page) task
> structure and everyone just masking the kernel stack pointer to get the
> task pointer.  Greg would know for sure,  he did the 2.5 work in this area.
> We should be easily able to switch back to the current_task pointer with a
> few small mods to entry.S.

A lot of platforms went this way because "current" is hard to do right
on an SMP box. It's effectively per-CPU dependent, and that means you
either set up the MMU to do per CPU pages (via segments or tables) which
is a pita, or you do the stack trick. For uniprocessor a global still
works perfectly well.
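
The "stack trick" can be illustrated in user space. This is a sketch, not the kernel's code: the type and function names are hypothetical, and the 8K size is taken from the quoted text. If every task's stack lives in an 8K-aligned block with its bookkeeping structure at the start of the block, then masking any address inside the stack yields the structure, with no global variable and no dedicated register:

```c
#include <assert.h>
#include <stdint.h>

#define THREAD_SIZE 8192u   /* 8K (2 pages), as in the quoted text */

/* Hypothetical stand-in for the per-task bookkeeping structure. */
struct thread_info_sketch {
    int task_id;
};

/* Stack and bookkeeping share one THREAD_SIZE-aligned block. */
union thread_union_sketch {
    struct thread_info_sketch info;
    unsigned char stack[THREAD_SIZE];
} __attribute__((aligned(THREAD_SIZE)));

/* Mask off the low bits of a stack address to find the block start. */
static inline struct thread_info_sketch *info_from_sp(const void *sp)
{
    uintptr_t base = (uintptr_t)sp & ~(uintptr_t)(THREAD_SIZE - 1);

    return (struct thread_info_sketch *)base;
}

static int demo_lookup(void)
{
    static union thread_union_sketch t;
    const void *fake_sp;

    t.info.task_id = 42;
    fake_sp = &t.stack[THREAD_SIZE - 64];   /* somewhere near the stack top */
    assert(info_from_sp(fake_sp) == &t.info);
    return info_from_sp(fake_sp)->task_id;
}
```

Each CPU has its own stack pointer, so this works on SMP for free. The price is a mask instruction at every use, which is where the size increase discussed in this thread comes from once the accessor is inlined everywhere.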

> A general comment on the use of inline throughout the kernel.  Although
> they may show gains on x86 platforms,  they often perform worse on 
> embedded processors with limited cache,  as well as adding size.  I

Code size for critical paths is getting more and more performance critical
on x86 as well as on the embedded CPU systems. 3GHz superscalar processors
lose a lot of clocks to a memory stall.



* Re: [uClinux-dev] Kernel 2.6 size increase - get_current()?
@ 2003-07-24  8:27 Ihar "Philips" Filipau
  2003-07-24 11:50 ` David McCullough
  0 siblings, 1 reply; 24+ messages in thread
From: Ihar "Philips" Filipau @ 2003-07-24  8:27 UTC (permalink / raw)
  To: David McCullough; +Cc: linux-kernel

David McCullough wrote:
> 
> A general comment on the use of inline throughout the kernel.  Although
> they may show gains on x86 platforms,  they often perform worse on 
> embedded processors with limited cache,  as well as adding size.  I
> can't see any way of coding around this though.  As long as x86 is
> the driving influence,  other platforms will just have to deal with it as
> best they can.
> 

   Actually I'm a victim of over-inlining too. Was, at least.
   I was running some router on old Pentiums. I remember an almost
dramatic drop in performance with newer kernels because of inlining in
net/*. But sure, on a Xeon P4 it boosts performance...

   What I'm actually getting at:
   We have the classic situation where representation and intentions
are mixed up.

   Representation == 'inline', but the intentions are 'inline or it will
break' _and_ 'inline - it runs faster'.
   These obviously should be separated.

   Even more:

#define INLINE_LEVEL some_platform_specific_number

---------

#define inline0 inline_always

#if INLINE_LEVEL >= 1
#  define inline1 inline_always
#else
#  define inline1
#endif
...
#if INLINE_LEVEL >= N
#  define inlineN inline_always
#else
#  define inlineN
#endif

    and so on, giving a platform a chance to influence the amount of inlining.
    Better to put it into the config, with defaults defined by the platform.
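
For reference, a compilable variant of the level scheme, as a sketch assuming GCC. The original leaves inline_always undefined; here it is assumed to map to GCC's always_inline attribute, INLINE_LEVEL gets an arbitrary default of 1, and the three helper functions are invented for illustration:

```c
#include <assert.h>

#ifndef INLINE_LEVEL
#  define INLINE_LEVEL 1   /* hypothetical platform default */
#endif

/* Assumption: inline_always means "force inlining" (GCC attribute). */
#define inline_always inline __attribute__((always_inline))

#define inline0 inline_always          /* level 0: must be inlined */

#if INLINE_LEVEL >= 1
#  define inline1 inline_always
#else
#  define inline1                      /* falls back to a normal function */
#endif

#if INLINE_LEVEL >= 2
#  define inline2 inline_always
#else
#  define inline2
#endif

static inline0 int required_helper(int x) { return x + 1; }
static inline1 int hot_helper(int x)      { return x * 2; }
static inline2 int warm_helper(int x)     { return x - 3; }

static int demo_levels(void)
{
    int r = warm_helper(hot_helper(required_helper(4)));

    assert(r == (4 + 1) * 2 - 3);
    return r;
}
```

With INLINE_LEVEL at 1, warm_helper is an ordinary static function and the compiler decides for itself; raising the level forces more inlining without touching the call sites.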



* Re: [uClinux-dev] Kernel 2.6 size increase - get_current()?
       [not found]     ` <cArg.74D.11@gated-at.bofh.it>
@ 2003-07-24  8:13       ` Ihar "Philips" Filipau
  2003-07-25  7:25         ` Denis Vlasenko
  2003-07-25 18:36         ` bill davidsen
  0 siblings, 2 replies; 24+ messages in thread
From: Ihar "Philips" Filipau @ 2003-07-24  8:13 UTC (permalink / raw)
  To: linux-kernel

Bernardo Innocenti wrote:
> On Wednesday 23 July 2003 22:27, Christoph Hellwig wrote:
> 
>>On Wed, Jul 23, 2003 at 01:22:56PM -0700, David S. Miller wrote:
>>>Drivers weren't audited much, and there's a lot of boneheaded
>>>stuff in this area.  But these should be mostly identical
>>>to what would happen on the 2.4.x side
>>
>>Please read the original message again - he stated that every single
>>module in fs/ got a lot bigger - if it gets smaller or at least the
>>same size as 2.4 it's clearly a sign of inlines gone mad in the
>>filesystem/VM code and we need to look at that.  If not we have to look
>>elsewhere.
> 
> Here is my humble opinion:
> 
> In 2.4.20 (m68knommu):
> -------------------------------------------------------------------------
> #define current _current_task
> -------------------------------------------------------------------------
> 
> In 2.6.0-test1 (m68knommu):
> -------------------------------------------------------------------------
> static inline struct task_struct *get_current(void)
> {
    [cut]
> }
> static inline struct thread_info *current_thread_info(void)
> {
    [cut]
> }
> -------------------------------------------------------------------------
> 
> This takes 18*11 = 198 bytes just for invoking the 'current'
> macro so many times.
> 

    Just curious.

    Is there any way to tell one kind of inline from the other?

    I mean the 'inline' which means 'this has to be inlined or it will 
break' versus the 'inline' which means 'inline this please - it adds only 
10k of code bloat and improves performance in my suppa-puppa-bench by 
0.000001%!'

    Strictly speaking - split 'inline' into 'require_inline' and 
'better_inline'.
    Then people who really care about image size can define 
'better_inline' to nothing, without harm to functionality.
    Actually I saw real performance improvements on my Pentium MMX 133 
(it has 16k I-cache + 16k D-cache, I believe) when I was cutting some of 
the inlines out. And I'm not talking about (cache-poor) embedded systems...
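
    The two-way split could look roughly like this. It is only a sketch: 
'require_inline' and 'better_inline' are the names proposed above, 
SMALL_IMAGE is an assumed build switch, and the two functions are 
invented for illustration. It assumes GCC or Clang for the always_inline 
attribute.

```c
/* Hypothetical macros; SMALL_IMAGE is an assumed size-over-speed switch. */
#define require_inline inline __attribute__((always_inline))

#ifdef SMALL_IMAGE
#  define better_inline          /* hint dropped: ordinary function */
#else
#  define better_inline inline   /* plain hint, compiler may still decline */
#endif

/* Must be inlined in this sketch, so it uses require_inline. */
static require_inline unsigned long mask_low_bits(unsigned long v,
                                                  unsigned long size)
{
    return v & ~(size - 1);
}

/* Only a speed hint, so it uses better_inline. */
static better_inline unsigned long sum_to(unsigned long n)
{
    unsigned long i, s = 0;

    for (i = 1; i <= n; i++)
        s += i;
    return s;
}

unsigned long inline_split_demo(void)
{
    return mask_low_bits(0x12345UL, 0x2000UL) + sum_to(4);
}
```

    Building with -DSMALL_IMAGE changes only how sum_to() is emitted, 
not what the code computes.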



* Re: [uClinux-dev] Kernel 2.6 size increase - get_current()?
  2003-07-23 23:00         ` Bernardo Innocenti
@ 2003-07-24  5:06           ` David McCullough
  2003-07-24 11:28             ` Alan Cox
  0 siblings, 1 reply; 24+ messages in thread
From: David McCullough @ 2003-07-24  5:06 UTC (permalink / raw)
  To: Bernardo Innocenti
  Cc: Alan Cox, Christoph Hellwig, David S. Miller, uclinux-dev,
	Linux Kernel Mailing List, Greg Ungerer


Jivin Bernardo Innocenti lays it down ...
> On Thursday 24 July 2003 00:37, Alan Cox wrote:
> 
> > On Wed, 2003-07-23 at 23:35, Bernardo Innocenti wrote:
> > > It's a sequence of 6 instructions, 18 bytes long, clobbering 4 registers.
> > > The compiler cannot see around it.
> > > This takes 18*11 = 198 bytes just for invoking the 'current'
> > > macro so many times.
> >
> > Unless you support SMP I'm not sure I understand why m68k nommu changed
> > from using a global for current_task ?
> 
> The people who might know best are Greg and David from SnapGear.
> I'm appending them to the Cc list.
> 
> But I noticed that most archs in 2.6 do it like this. Is it some kind
> of flock-effect? Things get changed in i386 and all other archs
> just follow... :-)

It's a little this way for sure.

Back when I first did the 2.4 uClinux port,  the m68k MMU code was
dedicating a register (a2) for current.  I thought that was a bad idea
given how often you run out of registers on the 68k,  and made it a
global.  Because it was still effectively a pointer,  the code size
change was not a factor.  I just didn't want to give up a register.
So that is the 2.4 history and it has served us well so far ;-)

On the 2.5/2.6 front,  I think the change comes from the 8K (2 page) task
structure and everyone just masking the kernel stack pointer to get the
task pointer.  Gerg would know for sure,  he did the 2.5 work in this area.
We should be easily able to switch back to the current_task pointer with a
few small mods to entry.S.
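
For readers comparing the two schemes, here is a user-space sketch of
both lookups. It is an illustration under stated assumptions, not the
m68knommu code: the struct layouts are reduced to one field each,
current_task_ptr stands in for the 2.4 global, and a block from
aligned_alloc() plays the role of the 8K kernel stack.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define THREAD_SIZE 8192u   /* 2 pages, as described above */

struct task_struct { int pid; };
struct thread_info { struct task_struct *task; };

/* 2.4 style: one global pointer, updated on every context switch. */
static struct task_struct *current_task_ptr;

/* 2.6 style: thread_info sits at the base of the THREAD_SIZE-aligned
 * stack, so masking any address inside that stack recovers it. */
static struct thread_info *thread_info_from_sp(const void *sp)
{
    uintptr_t base = (uintptr_t)sp & ~(uintptr_t)(THREAD_SIZE - 1);

    return (struct thread_info *)base;
}

/* Returns 1 if both lookups agree for a simulated stack. */
int current_lookup_demo(void)
{
    static struct task_struct task = { 42 };
    unsigned char *stack = aligned_alloc(THREAD_SIZE, THREAD_SIZE);
    int ok;

    assert(stack != NULL);
    ((struct thread_info *)stack)->task = &task;  /* thread_info at base */
    current_task_ptr = &task;

    /* Pretend the stack pointer is somewhere inside the stack. */
    ok = thread_info_from_sp(stack + 5000)->task == current_task_ptr;
    free(stack);
    return ok;
}
```

The global costs one memory load per use; the masking variant needs the
extra move and and instructions every time, which is where the size
difference discussed in this thread comes from.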

A general comment on the use of inline throughout the kernel.  Although
they may show gains on x86 platforms,  they often perform worse on 
embedded processors with limited cache,  as well as adding size.  I
can't see any way of coding around this though.  As long as x86 is
the driving influence,  other platforms will just have to deal with it as
best they can.

Cheers,
Davidm

-- 
David McCullough, davidm@snapgear.com  Ph:+61 7 34352815 http://www.SnapGear.com
Custom Embedded Solutions + Security   Fx:+61 7 38913630 http://www.uCdot.org


* Re: [uClinux-dev] Kernel 2.6 size increase - get_current()?
  2003-07-23 22:37       ` Alan Cox
@ 2003-07-23 23:00         ` Bernardo Innocenti
  2003-07-24  5:06           ` David McCullough
  0 siblings, 1 reply; 24+ messages in thread
From: Bernardo Innocenti @ 2003-07-23 23:00 UTC (permalink / raw)
  To: Alan Cox
  Cc: Christoph Hellwig, David S. Miller, uclinux-dev,
	Linux Kernel Mailing List, Greg Ungerer, David McCullough

On Thursday 24 July 2003 00:37, Alan Cox wrote:

> On Wed, 2003-07-23 at 23:35, Bernardo Innocenti wrote:
> > It's a sequence of 6 instructions, 18 bytes long, clobbering 4 registers.
> > The compiler cannot see around it.
> > This takes 18*11 = 198 bytes just for invoking the 'current'
> > macro so many times.
>
> Unless you support SMP I'm not sure I understand why m68k nommu changed
> from using a global for current_task ?

The people who might know best are Greg and David from SnapGear.
I'm appending them to the Cc list.

But I noticed that most archs in 2.6 do it like this. Is it some kind
of flock-effect? Things get changed in i386 and all other archs
just follow... :-)

-- 
  // Bernardo Innocenti - Develer S.r.l., R&D dept.
\X/  http://www.develer.com/

Please don't send Word attachments - http://www.gnu.org/philosophy/no-word-attachments.html




* Re: [uClinux-dev] Kernel 2.6 size increase - get_current()?
  2003-07-23 22:35     ` [uClinux-dev] Kernel 2.6 size increase - get_current()? Bernardo Innocenti
@ 2003-07-23 22:37       ` Alan Cox
  2003-07-23 23:00         ` Bernardo Innocenti
  0 siblings, 1 reply; 24+ messages in thread
From: Alan Cox @ 2003-07-23 22:37 UTC (permalink / raw)
  To: Bernardo Innocenti
  Cc: Christoph Hellwig, David S. Miller, uclinux-dev,
	Linux Kernel Mailing List

On Wed, 2003-07-23 at 23:35, Bernardo Innocenti wrote:
> It's a sequence of 6 instructions, 18 bytes long, clobbering 4 registers.
> The compiler cannot see around it.
> This takes 18*11 = 198 bytes just for invoking the 'current'
> macro so many times.

Unless you support SMP I'm not sure I understand why m68k nommu changed
from using a global for current_task ?



* Re: [uClinux-dev] Kernel 2.6 size increase - get_current()?
  2003-07-23 20:27   ` Christoph Hellwig
@ 2003-07-23 22:35     ` Bernardo Innocenti
  2003-07-23 22:37       ` Alan Cox
  0 siblings, 1 reply; 24+ messages in thread
From: Bernardo Innocenti @ 2003-07-23 22:35 UTC (permalink / raw)
  To: Christoph Hellwig, David S. Miller; +Cc: uclinux-dev, linux-kernel

On Wednesday 23 July 2003 22:27, Christoph Hellwig wrote:

> On Wed, Jul 23, 2003 at 01:22:56PM -0700, David S. Miller wrote:
> > Drivers weren't audited much, and there's a lot of boneheaded
> > stuff in this area.  But these should be mostly identical
> > to what would happen on the 2.4.x side
>
> Please read the original message again - he stated that every single
> module in fs/ got a lot bigger - if it gets smaller or at least the
> same size as 2.4 it's clearly a sign of inlines gone mad in the
> filesystem/VM code and we need to look at that.  If not we have to look
> elsewhere.

Here is my humble opinion:

In 2.4.20 (m68knommu):
-------------------------------------------------------------------------
#define current _current_task
-------------------------------------------------------------------------

In 2.6.0-test1 (m68knommu):
-------------------------------------------------------------------------
#define current get_current()
static inline struct task_struct *get_current(void)
{
        return(current_thread_info()->task);
}
static inline struct thread_info *current_thread_info(void)
{
        struct thread_info *ti;
        __asm__(
                "move.l %%sp, %0 \n\t"
                "and.l  %1, %0"
                : "=&d"(ti)
                : "d" (~(THREAD_SIZE-1))
                );
        return ti;
}
-------------------------------------------------------------------------

The latter expands to:

 0:	movel #-8192,%d0
 6:	movel %sp,%d2
 8:	andl %d0,%d2
 a:	moveal %d2,%a1
 c:	moveal %a1@,%a0
 e:	moveal %a0@(92),%a0
12:

It's a sequence of 6 instructions, 18 bytes long, clobbering 4 registers.
The compiler cannot see around it.

"current" is used very liberally all over the kernel, as if it were
free, like in this code snippet from fs/open.c:

        old_fsuid = current->fsuid;
        old_fsgid = current->fsgid;
        old_cap = current->cap_effective;
        current->fsuid = current->uid;
        current->fsgid = current->gid;
        if (current->uid)
                cap_clear(current->cap_effective);
        else
                current->cap_effective = current->cap_permitted;

This takes 18*11 = 198 bytes just for invoking the 'current'
macro so many times.

Perhaps adding __attribute__((const)) to current_thread_info() and
get_current() would help eliminate some unnecessary accesses.
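
A rough sketch of that idea, next to the manual alternative of caching
the pointer in a local. Everything here is a stand-in for illustration:
the reduced task_struct, the static the_task object, and a get_current()
that simply returns its address. Whether the compiler really merges the
repeated calls depends on the compiler version and optimization level,
and whether the const attribute is appropriate for the real inline-asm
version is an open question that this sketch does not settle.

```c
/* Reduced stand-in for the kernel structure; fields are illustrative. */
struct task_struct { int uid, gid, fsuid, fsgid; };

static struct task_struct the_task = { 1000, 100, 0, 0 };

/* const: the result depends on nothing but the (empty) argument list,
 * so the compiler is allowed to fold repeated calls into one. */
static __attribute__((const)) struct task_struct *get_current(void)
{
    return &the_task;
}
#define current get_current()

/* Five uses of 'current'; with the attribute they may share one lookup. */
int set_fsids_demo(void)
{
    current->fsuid = current->uid;
    current->fsgid = current->gid;
    return current->fsuid + current->fsgid;
}

/* Manual alternative: one explicit lookup, independent of the compiler. */
int set_fsids_cached(void)
{
    struct task_struct *tsk = current;

    tsk->fsuid = tsk->uid;
    tsk->fsgid = tsk->gid;
    return tsk->fsuid + tsk->fsgid;
}
```

The cached-local form gives the same saving without relying on the
attribute, at the cost of touching every call site.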

-- 
  // Bernardo Innocenti - Develer S.r.l., R&D dept.
\X/  http://www.develer.com/

Please don't send Word attachments - http://www.gnu.org/philosophy/no-word-attachments.html




end of thread, other threads:[~2003-07-27 12:50 UTC | newest]

Thread overview: 24+ messages (download: mbox.gz / follow: Atom feed)
     [not found] <d2nx.4QV.15@gated-at.bofh.it>
     [not found] ` <dbTZ.5Z5.19@gated-at.bofh.it>
2003-07-25 15:37   ` [uClinux-dev] Kernel 2.6 size increase - get_current()? Ihar "Philips" Filipau
2003-07-25 20:46     ` OT: Vanilla not for embedded?! " Mike Fedyk
2003-07-25 20:43       ` Andre Hedrick
2003-07-27 11:57       ` Ihar "Philips" Filipau
2003-07-27 13:05         ` Francois Romieu
2003-07-24  8:27 [uClinux-dev] " Ihar "Philips" Filipau
2003-07-24 11:50 ` David McCullough
     [not found] <cwQJ.3BO.29@gated-at.bofh.it>
     [not found] ` <cypH.5dM.35@gated-at.bofh.it>
     [not found]   ` <cyza.5lN.13@gated-at.bofh.it>
     [not found]     ` <cArg.74D.11@gated-at.bofh.it>
2003-07-24  8:13       ` Ihar "Philips" Filipau
2003-07-25  7:25         ` Denis Vlasenko
2003-07-25 18:36         ` bill davidsen
  -- strict thread matches above, loose matches on Subject: below --
2003-07-23 18:46 Kernel 2.6 size increase Bernardo Innocenti
2003-07-23 20:22 ` [uClinux-dev] " David S. Miller
2003-07-23 20:27   ` Christoph Hellwig
2003-07-23 22:35     ` [uClinux-dev] Kernel 2.6 size increase - get_current()? Bernardo Innocenti
2003-07-23 22:37       ` Alan Cox
2003-07-23 23:00         ` Bernardo Innocenti
2003-07-24  5:06           ` David McCullough
2003-07-24 11:28             ` Alan Cox
2003-07-24 12:04               ` David McCullough
2003-07-24 14:48                 ` Alan Cox
2003-07-25 18:25                 ` bill davidsen
2003-07-24 15:30               ` Hollis Blanchard
2003-07-24 19:37                 ` Alan Cox
2003-07-24 19:51                   ` Hollis Blanchard
2003-07-24 21:20                     ` J.A. Magallon
2003-07-25  4:22                       ` Otto Solares
2003-07-25 14:38                         ` Hollis Blanchard
