linux-kernel.vger.kernel.org archive mirror
* Re: x86: 4kstacks default
       [not found] <200804181737.m3IHbabI010051@hera.kernel.org>
@ 2008-04-18 21:29 ` Andrew Morton
  2008-04-19 14:23   ` Ingo Molnar
  0 siblings, 1 reply; 162+ messages in thread
From: Andrew Morton @ 2008-04-18 21:29 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Linux Kernel Mailing List

On Fri, 18 Apr 2008 17:37:36 GMT
Linux Kernel Mailing List <linux-kernel@vger.kernel.org> wrote:

> Gitweb:     http://git.kernel.org/git/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=d61ecf0b53131564949bc4196e70f676000a845a
> Commit:     d61ecf0b53131564949bc4196e70f676000a845a
> Parent:     f408b43ceedce49f26c01cd4a68dbbdbe2743e51
> Author:     Ingo Molnar <mingo@elte.hu>
> AuthorDate: Fri Apr 4 17:11:09 2008 +0200
> Committer:  Ingo Molnar <mingo@elte.hu>
> CommitDate: Thu Apr 17 17:41:34 2008 +0200
> 
>     x86: 4kstacks default
>     
>     Signed-off-by: Ingo Molnar <mingo@elte.hu>
> ---
>  arch/x86/Kconfig.debug |    2 +-
>  1 files changed, 1 insertions(+), 1 deletions(-)
> 
> diff --git a/arch/x86/Kconfig.debug b/arch/x86/Kconfig.debug
> index f4413c0..610aaec 100644
> --- a/arch/x86/Kconfig.debug
> +++ b/arch/x86/Kconfig.debug
> @@ -106,8 +106,8 @@ config DEBUG_NX_TEST
>  
>  config 4KSTACKS
>  	bool "Use 4Kb for kernel stacks instead of 8Kb"
> -	depends on DEBUG_KERNEL
>  	depends on X86_32
> +	default y

This patch will cause kernels to crash.

It has no changelog which explains or justifies the alteration.

afaict the patch was not posted to the mailing list and was not
discussed or reviewed.

^ permalink raw reply	[flat|nested] 162+ messages in thread

* Re: x86: 4kstacks default
  2008-04-18 21:29 ` x86: 4kstacks default Andrew Morton
@ 2008-04-19 14:23   ` Ingo Molnar
  2008-04-19 14:35     ` Oliver Pinter
                       ` (4 more replies)
  0 siblings, 5 replies; 162+ messages in thread
From: Ingo Molnar @ 2008-04-19 14:23 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Linux Kernel Mailing List, Arjan van de Ven, Thomas Gleixner


* Andrew Morton <akpm@linux-foundation.org> wrote:

> >  config 4KSTACKS
> >  	bool "Use 4Kb for kernel stacks instead of 8Kb"
> > -	depends on DEBUG_KERNEL
> >  	depends on X86_32
> > +	default y
> 
> This patch will cause kernels to crash.

what mainline kernels crash and how will they crash? Fedora and other 
distros have had 4K stacks enabled for years:

  $ grep 4K /boot/config-2.6.24-9.fc9
  CONFIG_4KSTACKS=y

and we've conducted tens of thousands of bootup tests with all sorts of 
drivers and kernel options enabled and have yet to see a single crash 
due to 4K stacks. So basically the kernel default just follows the 
common distro default now. (distros and users can still disable it)

	Ingo
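
The grep check above can also be scripted. The following is a minimal Python sketch, not part of the original mail: the helper name `config_enabled` and the sample config text are assumptions made up for illustration.

```python
def config_enabled(config_text, option):
    """Return True if `option` is set to 'y' in kernel .config-style text."""
    for line in config_text.splitlines():
        line = line.strip()
        # Lines such as "# CONFIG_FOO is not set" are comments: option disabled.
        if line.startswith("#") or "=" not in line:
            continue
        name, value = line.split("=", 1)
        if name == option:
            return value == "y"
    return False


# Hypothetical config fragment; a real one would be read from /boot/config-*.
sample = """\
# CONFIG_DEBUG_STACKOVERFLOW is not set
CONFIG_4KSTACKS=y
CONFIG_X86_32=y
"""

print(config_enabled(sample, "CONFIG_4KSTACKS"))
print(config_enabled(sample, "CONFIG_DEBUG_STACKOVERFLOW"))
```

On a real system the text would come from the distro's config file, e.g. the one grepped above.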


* Re: x86: 4kstacks default
  2008-04-19 14:23   ` Ingo Molnar
@ 2008-04-19 14:35     ` Oliver Pinter
  2008-04-19 15:19       ` Adrian Bunk
  2008-04-19 14:59     ` Shawn Bohrer
                       ` (3 subsequent siblings)
  4 siblings, 1 reply; 162+ messages in thread
From: Oliver Pinter @ 2008-04-19 14:35 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Andrew Morton, Linux Kernel Mailing List, Arjan van de Ven,
	Thomas Gleixner, Christoph Hellwig, David Chinner, xfs

Hi Ingo!

With older kernels the typical problem case is: xfs+nfs+4k stack(+lvm)



On 4/19/08, Ingo Molnar <mingo@elte.hu> wrote:
>
> * Andrew Morton <akpm@linux-foundation.org> wrote:
>
> > >  config 4KSTACKS
> > >  	bool "Use 4Kb for kernel stacks instead of 8Kb"
> > > -	depends on DEBUG_KERNEL
> > >  	depends on X86_32
> > > +	default y
> >
> > This patch will cause kernels to crash.
>
> what mainline kernels crash and how will they crash? Fedora and other
> distros have had 4K stacks enabled for years:
>
>   $ grep 4K /boot/config-2.6.24-9.fc9
>   CONFIG_4KSTACKS=y
>
> and we've conducted tens of thousands of bootup tests with all sorts of
> drivers and kernel options enabled and have yet to see a single crash
> due to 4K stacks. So basically the kernel default just follows the
> common distro default now. (distros and users can still disable it)
>
> 	Ingo


-- 
Thanks,
Oliver


* Re: x86: 4kstacks default
  2008-04-19 14:23   ` Ingo Molnar
  2008-04-19 14:35     ` Oliver Pinter
@ 2008-04-19 14:59     ` Shawn Bohrer
  2008-04-19 18:00       ` Arjan van de Ven
  2008-04-20  8:09       ` Adrian Bunk
  2008-04-19 17:49     ` Andrew Morton
                       ` (2 subsequent siblings)
  4 siblings, 2 replies; 162+ messages in thread
From: Shawn Bohrer @ 2008-04-19 14:59 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Andrew Morton, Linux Kernel Mailing List, Arjan van de Ven,
	Thomas Gleixner

On Sat, Apr 19, 2008 at 04:23:29PM +0200, Ingo Molnar wrote:
> 
> * Andrew Morton <akpm@linux-foundation.org> wrote:
> 
> > >  config 4KSTACKS
> > >  	bool "Use 4Kb for kernel stacks instead of 8Kb"
> > > -	depends on DEBUG_KERNEL
> > >  	depends on X86_32
> > > +	default y
> > 
> > This patch will cause kernels to crash.
> 
> what mainline kernels crash and how will they crash? Fedora and other 
> distros have had 4K stacks enabled for years:

If by other distros you mean RHEL then yes.  However, openSUSE,
Ubuntu, and Mandriva all still have 8K stacks.  I know of no other
distributions that default to 4K.

--
Shawn


* Re: x86: 4kstacks default
  2008-04-19 14:35     ` Oliver Pinter
@ 2008-04-19 15:19       ` Adrian Bunk
  2008-04-19 15:42         ` Oliver Pinter
                           ` (2 more replies)
  0 siblings, 3 replies; 162+ messages in thread
From: Adrian Bunk @ 2008-04-19 15:19 UTC (permalink / raw)
  To: Oliver Pinter
  Cc: Ingo Molnar, Andrew Morton, Linux Kernel Mailing List,
	Arjan van de Ven, Thomas Gleixner, Christoph Hellwig,
	David Chinner, xfs

On Sat, Apr 19, 2008 at 04:35:31PM +0200, Oliver Pinter wrote:
>...
> With older kernels the typical problem case is: xfs+nfs+4k stack(+lvm)

Does anyone still experience problems with 2.6.25?

We all know that there once were problems, but if there are any left 
they should be reported and fixed.

> Thanks,
> Oliver

cu
Adrian

-- 

       "Is there not promise of rain?" Ling Tan asked suddenly out
        of the darkness. There had been need of rain for many days.
       "Only a promise," Lao Er said.
                                       Pearl S. Buck - Dragon Seed



* Re: x86: 4kstacks default
  2008-04-19 15:19       ` Adrian Bunk
@ 2008-04-19 15:42         ` Oliver Pinter
  2008-04-20  1:56         ` Eric Sandeen
       [not found]         ` <480AA2B9.10305__23983.3358479247$1208657639$gmane$org@sandeen.net>
  2 siblings, 0 replies; 162+ messages in thread
From: Oliver Pinter @ 2008-04-19 15:42 UTC (permalink / raw)
  To: Adrian Bunk
  Cc: Ingo Molnar, Andrew Morton, Linux Kernel Mailing List,
	Arjan van de Ven, Thomas Gleixner, Christoph Hellwig,
	David Chinner, xfs

I don't know whether this problem is present in 2.6.25, but in older
kernels yes (2.6.22> or 2.6.23>).

On 4/19/08, Adrian Bunk <bunk@kernel.org> wrote:
> On Sat, Apr 19, 2008 at 04:35:31PM +0200, Oliver Pinter wrote:
> >...
> > With older kernels the typical problem case is: xfs+nfs+4k stack(+lvm)
>
> Does anyone still experience problems with 2.6.25?
>
> We all know that there once were problems, but if there are any left
> they should be reported and fixed.
>
> > Thanks,
> > Oliver
>
> cu
> Adrian
>
> --
>
>        "Is there not promise of rain?" Ling Tan asked suddenly out
>         of the darkness. There had been need of rain for many days.
>        "Only a promise," Lao Er said.
>                                        Pearl S. Buck - Dragon Seed
>
>


-- 
Thanks,
Oliver


* Re: x86: 4kstacks default
  2008-04-19 14:23   ` Ingo Molnar
  2008-04-19 14:35     ` Oliver Pinter
  2008-04-19 14:59     ` Shawn Bohrer
@ 2008-04-19 17:49     ` Andrew Morton
  2008-04-25 17:39       ` Parag Warudkar
  2008-04-20  3:29     ` Eric Sandeen
  2008-04-23  5:27     ` Benjamin Herrenschmidt
  4 siblings, 1 reply; 162+ messages in thread
From: Andrew Morton @ 2008-04-19 17:49 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: linux-kernel, arjan, tglx

> On Sat, 19 Apr 2008 16:23:29 +0200 Ingo Molnar <mingo@elte.hu> wrote:
> 
> * Andrew Morton <akpm@linux-foundation.org> wrote:
> 
> > >  config 4KSTACKS
> > >  	bool "Use 4Kb for kernel stacks instead of 8Kb"
> > > -	depends on DEBUG_KERNEL
> > >  	depends on X86_32
> > > +	default y
> > 
> > This patch will cause kernels to crash.
> 
> what mainline kernels crash and how will they crash?

There has been a dribble of reports - I don't have the links handy, nor did
I search for them.

> Fedora and other 
> distros have had 4K stacks enabled for years:
> 
>   $ grep 4K /boot/config-2.6.24-9.fc9
>   CONFIG_4KSTACKS=y
> 
> and we've conducted tens of thousands of bootup tests with all sorts of 
> drivers and kernel options enabled and have yet to see a single crash 
> due to 4K stacks.

I doubt if you're testing things like nfsd-on-xfs-on-md-on-porky-scsi-driver.

Enable CONFIG_DEBUG_STACK_USAGE.  Monitor the results.  It's so scary that
I wonder if the feature is busted.
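
As a rough illustration of what such monitoring measures, here is a minimal Python sketch. It assumes the usual high-water-mark approach (the stack starts zero-filled and the untouched bytes at the low end are counted afterwards); it is a simulation, not the kernel's code, and `deepest_use` is a hypothetical number.

```python
THREAD_SIZE = 4096  # 4K stack, as with CONFIG_4KSTACKS=y


def stack_not_used(stack):
    """Count bytes at the low end of the stack that were never written."""
    free = 0
    for byte in stack:
        if byte != 0:
            break
        free += 1
    return free


# The stack grows downward, so the deepest call chain dirties the
# lowest addresses that were ever reached.
stack = bytearray(THREAD_SIZE)      # zero-filled when the task is created
deepest_use = 3000                  # hypothetical peak depth in bytes
for i in range(THREAD_SIZE - deepest_use, THREAD_SIZE):
    stack[i] = 0xAA                 # simulate bytes written by stack frames

print(stack_not_used(stack))        # bytes of headroom that were never touched
```

The smaller that number gets across a workload, the closer the task came to overflowing.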

> So basically the kernel default just follows the 
> common distro default now. (distros and users can still disable it)

Apparently not.  I wouldn't enable it if I had a distro.

Anyway.  We should be having this sort of discussion _before_ a patch
gets merged, no?


* Re: x86: 4kstacks default
  2008-04-19 14:59     ` Shawn Bohrer
@ 2008-04-19 18:00       ` Arjan van de Ven
  2008-04-19 18:33         ` Ingo Molnar
  2008-04-20  2:36         ` Eric Sandeen
  2008-04-20  8:09       ` Adrian Bunk
  1 sibling, 2 replies; 162+ messages in thread
From: Arjan van de Ven @ 2008-04-19 18:00 UTC (permalink / raw)
  To: Shawn Bohrer
  Cc: Ingo Molnar, Andrew Morton, Linux Kernel Mailing List, Thomas Gleixner

On Sat, 19 Apr 2008 09:59:48 -0500
Shawn Bohrer <shawn.bohrer@gmail.com> wrote:

> On Sat, Apr 19, 2008 at 04:23:29PM +0200, Ingo Molnar wrote:
> > 
> > * Andrew Morton <akpm@linux-foundation.org> wrote:
> > 
> > > >  config 4KSTACKS
> > > >  	bool "Use 4Kb for kernel stacks instead of 8Kb"
> > > > -	depends on DEBUG_KERNEL
> > > >  	depends on X86_32
> > > > +	default y
> > > 
> > > This patch will cause kernels to crash.
> > 
> > what mainline kernels crash and how will they crash? Fedora and
> > other distros have had 4K stacks enabled for years:
> 
> If by other distros you mean RHEL then yes.  However, openSUSE,
> Ubuntu, and Mandriva all still have 8K stacks.  I know of no other
> distributions that default to 4K.

centos, oracle and redflag tend to follow the RHEL/fedora settings.

To be honest, at this point we're at a situation where
* Several very popular distributions have this enabled for 5+ years,
  apparently without any real issues (otherwise the enterprise releases
  would have turned this off)
* The early "hot known issues" have been resolved afaik, things like
  block device stacking, and symlink recursion lookups are either no longer
  recursive, or a lot less recursive than they used to be.

There are clear benefits to 4K stacks (no need to reiterate the flamewar,
but worth mentioning)
* Less memory consumption in the lowmem zone (critical for enterprise use,
  also good for general performance)
* Kernel stacks at 8K are one of the most prominent order-1 allocations in the
  kernel; again with big-memory systems the fragmentation of the lowmem zone
  is a problem (and the distros that ship 4K stacks went there because of customer
  complaints)
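
To put numbers on the two points above, here is a back-of-the-envelope Python sketch. The task count is a hypothetical figure chosen for illustration; only the 4K/8K stack sizes and the order-1 observation come from the discussion.

```python
PAGE_SIZE = 4096  # x86-32 page size in bytes


def alloc_order(stack_size):
    """Smallest order n such that PAGE_SIZE * 2**n covers stack_size."""
    order = 0
    while (PAGE_SIZE << order) < stack_size:
        order += 1
    return order


def stack_memory_kib(num_tasks, stack_size):
    """Total lowmem consumed by kernel stacks, in KiB."""
    return num_tasks * stack_size // 1024


num_tasks = 10000  # hypothetical number of threads on a busy server
for size in (8192, 4096):
    print(size, "bytes: order", alloc_order(size),
          "-", stack_memory_kib(num_tasks, size), "KiB total")
```

An 8K stack needs two physically contiguous pages (order 1), which is what makes it sensitive to lowmem fragmentation; a 4K stack is a single page (order 0).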

On the flipside the arguments tend to be
1) certain stackings of components still run the risk of overflowing
2) I want to run ndiswrapper
3) general, unspecified uneasiness.

For 1), we need to know which they are, and then solve them, because even on x86-64 with 8k stacks
they can be a problem (just because the stack frames are bigger, although not quite double, there).
I've not seen any recent reports, I'll try to extend the kerneloops.org client to collect the
"stack is getting low" warning to be able to see how much this really happens.

for 2), the real answer there is "ndiswrapper needs 12kb not 8kb"

for 3), this is hard to deal with but also generally unfounded... you can use this argument against any change in the kernel.



* Re: x86: 4kstacks default
  2008-04-19 18:00       ` Arjan van de Ven
@ 2008-04-19 18:33         ` Ingo Molnar
  2008-04-19 19:10           ` Stefan Richter
  2008-04-20  2:36         ` Eric Sandeen
  1 sibling, 1 reply; 162+ messages in thread
From: Ingo Molnar @ 2008-04-19 18:33 UTC (permalink / raw)
  To: Arjan van de Ven
  Cc: Shawn Bohrer, Andrew Morton, Linux Kernel Mailing List, Thomas Gleixner


* Arjan van de Ven <arjan@infradead.org> wrote:

> On Sat, 19 Apr 2008 09:59:48 -0500
> Shawn Bohrer <shawn.bohrer@gmail.com> wrote:
> 
> > On Sat, Apr 19, 2008 at 04:23:29PM +0200, Ingo Molnar wrote:
> > > 
> > > * Andrew Morton <akpm@linux-foundation.org> wrote:
> > > 
> > > > >  config 4KSTACKS
> > > > >  	bool "Use 4Kb for kernel stacks instead of 8Kb"
> > > > > -	depends on DEBUG_KERNEL
> > > > >  	depends on X86_32
> > > > > +	default y
> > > > 
> > > > This patch will cause kernels to crash.
> > > 
> > > what mainline kernels crash and how will they crash? Fedora and
> > > other distros have had 4K stacks enabled for years:
> > 
> > If by other distros you mean RHEL then yes.  However, openSUSE,
> > Ubuntu, and Mandriva all still have 8K stacks.  I know of no other
> > distributions that default to 4K.
> 
> centos, oracle and redflag tend to follow the RHEL/fedora settings.
> 
> To be honest, at this point we're at a situation where
> * Several very popular distributions have this enabled for 5+ years,
>   apparently without any real issues (otherwise the enterprise releases
>   would have turned this off)
> * The early "hot known issues" have been resolved afaik, things like
>   block device stacking, and symlink recursion lookups are either no longer
>   recursive, or a lot less recursive than they used to be.
> 
> There are clear benefits to 4K stacks (no need to reiterate the flamewar,
> but worth mentioning)
> * Less memory consumption in the lowmem zone (critical for enterprise use,
>   also good for general performance)
> * Kernel stacks at 8K are one of the most prominent order-1 allocations in the
>   kernel; again with big-memory systems the fragmentation of the lowmem zone
>   is a problem (and the distros that ship 4K stacks went there because of customer
>   complaints)
> 
> On the flipside the arguments tend to be
> 1) certain stackings of components still run the risk of overflowing
> 2) I want to run ndiswrapper
> 3) general, unspecified uneasiness.
> 
> For 1), we need to know which they are, and then solve them, because 
> even on x86-64 with 8k stacks they can be a problem (just because the 
> stack frames are bigger, although not quite double, there). I've not 
> seen any recent reports, I'll try to extend the kerneloops.org client 
> to collect the "stack is getting low" warning to be able to see how 
> much this really happens.
> 
> for 2), the real answer there is "ndiswrapper needs 12kb not 8kb"
> 
> for 3), this is hard to deal with but also generally unfounded... you 
> can use this argument against any change in the kernel.

and let's observe that 8K stacks are of course still offered, so if 
anyone disables 4K stacks in the .config, it will stay disabled.

	Ingo


* Re: x86: 4kstacks default
  2008-04-19 18:33         ` Ingo Molnar
@ 2008-04-19 19:10           ` Stefan Richter
  0 siblings, 0 replies; 162+ messages in thread
From: Stefan Richter @ 2008-04-19 19:10 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Arjan van de Ven, Shawn Bohrer, Andrew Morton,
	Linux Kernel Mailing List, Thomas Gleixner

Ingo Molnar wrote:
> and let's observe that 8K stacks are of course still offered, so if 
> anyone disables 4K stacks in the .config, it will stay disabled.

While you change the default, maybe move it also from the "Kernel 
hacking" menu into the "General setup" menu?  An option with default=y 
is probably not an option that is targeted towards kernel hackers only.
-- 
Stefan Richter
-=====-==--- -=-- =--==
http://arcgraph.de/sr/


* Re: x86: 4kstacks default
  2008-04-19 15:19       ` Adrian Bunk
  2008-04-19 15:42         ` Oliver Pinter
@ 2008-04-20  1:56         ` Eric Sandeen
  2008-04-20  7:42           ` Adrian Bunk
       [not found]         ` <480AA2B9.10305__23983.3358479247$1208657639$gmane$org@sandeen.net>
  2 siblings, 1 reply; 162+ messages in thread
From: Eric Sandeen @ 2008-04-20  1:56 UTC (permalink / raw)
  To: Adrian Bunk
  Cc: Oliver Pinter, Ingo Molnar, Andrew Morton,
	Linux Kernel Mailing List, Arjan van de Ven, Thomas Gleixner,
	Christoph Hellwig, David Chinner, xfs

Adrian Bunk wrote:
> On Sat, Apr 19, 2008 at 04:35:31PM +0200, Oliver Pinter wrote:
>> ...
>> With older kernels the typical problem case is: xfs+nfs+4k stack(+lvm)
> 
> Does anyone still experience problems with 2.6.25?

There are always problems.  You can always come up with something that
will crash in 4k, IMHO.

Rather than foisting this upon everyone, I'd rather see work put into
making stack size a boot parameter or something, so that people can
choose what's appropriate for their workload (or their IO stack, if you
prefer).

-Eric

> We all know that there once were problems, but if there are any left 
> they should be reported and fixed.
> 
>> Thanks,
>> Oliver
> 
> cu
> Adrian
> 



* Re: x86: 4kstacks default
  2008-04-19 18:00       ` Arjan van de Ven
  2008-04-19 18:33         ` Ingo Molnar
@ 2008-04-20  2:36         ` Eric Sandeen
  2008-04-20  6:11           ` Arjan van de Ven
  2008-04-20 22:53           ` David Chinner
  1 sibling, 2 replies; 162+ messages in thread
From: Eric Sandeen @ 2008-04-20  2:36 UTC (permalink / raw)
  To: Arjan van de Ven
  Cc: Shawn Bohrer, Ingo Molnar, Andrew Morton,
	Linux Kernel Mailing List, Thomas Gleixner

Arjan van de Ven wrote:

> On the flipside the arguments tend to be
> 1) certain stackings of components still run the risk of overflowing
> 2) I want to run ndiswrapper
> 3) general, unspecified uneasiness.
> 
> For 1), we need to know which they are, and then solve them, because even on x86-64 with 8k stacks
> they can be a problem (just because the stack frames are bigger, although not quite double, there).

Except, apparently, not, at least in my experience.

Ask the xfs guys if they see stack overflows on x86_64, or on x86.

I've personally never seen common stack problems with xfs on x86_64, but
it's very common on x86.  I don't have a great answer for why, but
that's my anecdotal evidence.

> I've not seen any recent reports, I'll try to extend the kerneloops.org client to collect the
> "stack is getting low" warning to be able to see how much this really happens.

That sounds like a very good thing to collect, and maybe if I re-send a
"clearly state stack overflows at oops time" patch you can easily keep tabs.

Thanks,

-Eric


* Re: x86: 4kstacks default
  2008-04-19 14:23   ` Ingo Molnar
                       ` (2 preceding siblings ...)
  2008-04-19 17:49     ` Andrew Morton
@ 2008-04-20  3:29     ` Eric Sandeen
  2008-04-20 12:36       ` Andi Kleen
  2008-04-21 14:31       ` Ingo Molnar
  2008-04-23  5:27     ` Benjamin Herrenschmidt
  4 siblings, 2 replies; 162+ messages in thread
From: Eric Sandeen @ 2008-04-20  3:29 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Andrew Morton, Linux Kernel Mailing List, Arjan van de Ven,
	Thomas Gleixner

Ingo Molnar wrote:
> * Andrew Morton <akpm@linux-foundation.org> wrote:
> 
>>>  config 4KSTACKS
>>>  	bool "Use 4Kb for kernel stacks instead of 8Kb"
>>> -	depends on DEBUG_KERNEL
>>>  	depends on X86_32
>>> +	default y
>> This patch will cause kernels to crash.
> 
> what mainline kernels crash and how will they crash? Fedora and other 
> distros have had 4K stacks enabled for years:
> 
>   $ grep 4K /boot/config-2.6.24-9.fc9
>   CONFIG_4KSTACKS=y
> 
> and we've conducted tens of thousands of bootup tests with all sorts of 
> drivers and kernel options enabled and have yet to see a single crash 
> due to 4K stacks. 

Really, not one?

https://bugzilla.redhat.com/show_bug.cgi?id=247158
https://bugzilla.redhat.com/show_bug.cgi?id=227331
https://bugzilla.redhat.com/show_bug.cgi?id=240077

(hehe, ok, xfs is a common component there...)

and it's not always obvious that you've overflowed the stack.

CONFIG_DEBUG_STACKOVERFLOW isn't very useful because the warning printk
it generates uses the remaining amount of stack, and tips the box.

> So basically the kernel default just follows the 
> common distro default now. (distros and users can still disable it)

If Fedora is the common distro, ok. :)

Fedora is a pretty narrow sample in terms of IO stacks at least.  I have
plenty of fondness for Fedora, but it's almost 100% ext3[1].  I spent a
fair amount of time getting xfs+lvm to survive 4k on F8; gcc caused
stack usage to grow in general from F7 to F8, and F9 seems to have
gotten tight again but I haven't gotten to the bottom of it yet.

Heck my ext3-root-on-sda1 pre-beta F9 box, no nfs or lvm or xfs or
anything gets within 744 bytes of the end of the 4k stack simply by
*booting* (it was a modprobe process... maybe some module needs help)
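
For a sense of scale, here is a small Python sketch of that headroom figure, together with the earlier point that a warning path needs stack of its own. The 744-byte figure is from the box described above; the `warn_path_cost` values are purely hypothetical, since the real cost of the warning printk is not stated here.

```python
THREAD_SIZE = 4096  # 4K kernel stack


def headroom_report(free_bytes, thread_size=THREAD_SIZE):
    """Return (bytes used, percent of the stack used) for a given headroom."""
    used = thread_size - free_bytes
    return used, 100.0 * used / thread_size


def warning_fits(free_bytes, warn_path_cost):
    """True if a warning path needing warn_path_cost bytes still fits."""
    return warn_path_cost <= free_bytes


used, pct = headroom_report(744)
print(used, "bytes used,", round(pct, 1), "% of the stack")

# Hypothetical stack costs for the code that prints the overflow warning.
print(warning_fits(744, 512))
print(warning_fits(744, 1024))
```

If the warning path costs more than the remaining headroom, emitting the warning is itself what overflows the stack.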

How many other distros use 4K stacks on x86, really?

-Eric

[1] http://www.smolts.org/static/stats/stats.html shows 24588 ext3
filesystems, compared to 366 xfs, 248 reiserfs, 76 jfs ...


* Re: x86: 4kstacks default
  2008-04-20  2:36         ` Eric Sandeen
@ 2008-04-20  6:11           ` Arjan van de Ven
  2008-04-20 22:53           ` David Chinner
  1 sibling, 0 replies; 162+ messages in thread
From: Arjan van de Ven @ 2008-04-20  6:11 UTC (permalink / raw)
  To: Eric Sandeen
  Cc: Shawn Bohrer, Ingo Molnar, Andrew Morton,
	Linux Kernel Mailing List, Thomas Gleixner

On Sat, 19 Apr 2008 21:36:16 -0500
Eric Sandeen <sandeen@sandeen.net> wrote:


> > For 1), we need to know which they are, and then solve them,
> > because even on x86-64 with 8k stacks they can be a problem (just
> > because the stack frames are bigger, although not quite double,
> > there).
> 
> Except, apparently, not, at least in my experience.

if you actually go over on x86, it's not unlikely that you're getting close to the edge on 64 bit.

At minimum we really do want to fix these things...

> I've personally never seen common stack problems with xfs on x86_64,
> but it's very common on x86.  I don't have a great answer for why, but
> that's my anecdotal evidence.

One thing I've learned with the kerneloops.org work is that people don't read
their dmesg..... 
> 
> > I've not seen any recent reports, I'll try to extend the
> > kerneloops.org client to collect the "stack is getting low" warning
> > to be able to see how much this really happens.
> 
> That sounds like a very good thing to collect, and maybe if I re-send
> a "clearly state stack overflows at oops time" patch you can easily
> keep tabs.

... which makes me think we need to strengthen this part of the kernel.
(and then have kerneloops.org collect the issues)

If there's a clear pattern in the backtraces we will find it. 
And then we can fix it... which is absolutely the right thing,
I don't think anyone disagrees with that.

So yes if you can dig up your patch, yes please!


-- 
If you want to reach me at my work email, use arjan@linux.intel.com
For development, discussion and tips for power savings, 
visit http://www.lesswatts.org


* Re: x86: 4kstacks default
  2008-04-20  1:56         ` Eric Sandeen
@ 2008-04-20  7:42           ` Adrian Bunk
  2008-04-20 16:59             ` Chris Wedgwood
  0 siblings, 1 reply; 162+ messages in thread
From: Adrian Bunk @ 2008-04-20  7:42 UTC (permalink / raw)
  To: Eric Sandeen
  Cc: Oliver Pinter, Ingo Molnar, Andrew Morton,
	Linux Kernel Mailing List, Arjan van de Ven, Thomas Gleixner,
	Christoph Hellwig, David Chinner, xfs

On Sat, Apr 19, 2008 at 08:56:09PM -0500, Eric Sandeen wrote:
> Adrian Bunk wrote:
> > On Sat, Apr 19, 2008 at 04:35:31PM +0200, Oliver Pinter wrote:
> >> ...
> >> With older kernels the typical problem case is: xfs+nfs+4k stack(+lvm)
> > 
> > Does anyone still experience problems with 2.6.25?
> 
> There are always problems.  You can always come up with something that
> will crash in 4k, IMHO.

We are going from 6k to 4k.

Your "You can always come up with something that will crash in" point 
would be invariant to this change (although it might be harder to 
trigger in real life).

> Rather than foisting this upon everyone, I'd rather see work put into
> making stack size a boot parameter or something, so that people can
> choose what's appropriate for their workload (or their IO stack, if you
> prefer).

Why should users have to poke with such deeply internal things?
That doesn't sound right.

Excessive stack usage in the kernel is considered to be a bug.

We should identify and fix all remaining problems (if any).

> -Eric

cu
Adrian

-- 

       "Is there not promise of rain?" Ling Tan asked suddenly out
        of the darkness. There had been need of rain for many days.
       "Only a promise," Lao Er said.
                                       Pearl S. Buck - Dragon Seed



* Re: x86: 4kstacks default
  2008-04-20  8:09       ` Adrian Bunk
@ 2008-04-20  8:06         ` Alan Cox
  2008-04-20  8:51           ` Adrian Bunk
  0 siblings, 1 reply; 162+ messages in thread
From: Alan Cox @ 2008-04-20  8:06 UTC (permalink / raw)
  To: Adrian Bunk
  Cc: Shawn Bohrer, Ingo Molnar, Andrew Morton,
	Linux Kernel Mailing List, Arjan van de Ven, Thomas Gleixner

> The stack problems in the kernel tend to not be in arch code, and if 
> we don't get i386 to always run with 4k stacks there's no chance that 
> it will ever work reliably on other architectures.

Not really the case - embedded tends not to use deep stacks of drivers.

Alan


* Re: x86: 4kstacks default
  2008-04-19 14:59     ` Shawn Bohrer
  2008-04-19 18:00       ` Arjan van de Ven
@ 2008-04-20  8:09       ` Adrian Bunk
  2008-04-20  8:06         ` Alan Cox
  1 sibling, 1 reply; 162+ messages in thread
From: Adrian Bunk @ 2008-04-20  8:09 UTC (permalink / raw)
  To: Shawn Bohrer
  Cc: Ingo Molnar, Andrew Morton, Linux Kernel Mailing List,
	Arjan van de Ven, Thomas Gleixner

On Sat, Apr 19, 2008 at 09:59:48AM -0500, Shawn Bohrer wrote:
> On Sat, Apr 19, 2008 at 04:23:29PM +0200, Ingo Molnar wrote:
> > 
> > * Andrew Morton <akpm@linux-foundation.org> wrote:
> > 
> > > >  config 4KSTACKS
> > > >  	bool "Use 4Kb for kernel stacks instead of 8Kb"
> > > > -	depends on DEBUG_KERNEL
> > > >  	depends on X86_32
> > > > +	default y
> > > 
> > > This patch will cause kernels to crash.
> > 
> > what mainline kernels crash and how will they crash? Fedora and other 
> > distros have had 4K stacks enabled for years:
> 
> If by other distros you mean RHEL then yes.  However, openSUSE,
> Ubuntu, and Mandriva all still have 8K stacks.  I know of no other
> distributions that default to 4K.

MontaVista offers 4k stacks for arm (currently an external patch) and 
markets that as a feature to customers, so many of them might use it.

In-kernel the sh and m68knommu ports also offer 4k stacks (for both 
archs there's also a defconfig using it), and the mn10300 port contains 
an #ifdef but no config option.

The stack problems in the kernel tend to not be in arch code, and if 
we don't get i386 to always run with 4k stacks there's no chance that 
it will ever work reliably on other architectures.

> Shawn

cu
Adrian

-- 

       "Is there not promise of rain?" Ling Tan asked suddenly out
        of the darkness. There had been need of rain for many days.
       "Only a promise," Lao Er said.
                                       Pearl S. Buck - Dragon Seed



* Re: x86: 4kstacks default
  2008-04-20  8:06         ` Alan Cox
@ 2008-04-20  8:51           ` Adrian Bunk
  2008-04-20  9:36             ` Alan Cox
  0 siblings, 1 reply; 162+ messages in thread
From: Adrian Bunk @ 2008-04-20  8:51 UTC (permalink / raw)
  To: Alan Cox
  Cc: Shawn Bohrer, Ingo Molnar, Andrew Morton,
	Linux Kernel Mailing List, Arjan van de Ven, Thomas Gleixner

On Sun, Apr 20, 2008 at 09:06:23AM +0100, Alan Cox wrote:
> > The stack problems in the kernel tend to not be in arch code, and if 
> > we don't get i386 to always run with 4k stacks there's no chance that 
> > it will ever work reliably on other architectures.
> 
> Not really the case - embedded tends not to use deep stacks of drivers.

Something like nfsd-over-xfs-over-raid is (or was) the most common 
problem - and this or similar stackings might be used in NAS devices.

> Alan

cu
Adrian

-- 

       "Is there not promise of rain?" Ling Tan asked suddenly out
        of the darkness. There had been need of rain for many days.
       "Only a promise," Lao Er said.
                                       Pearl S. Buck - Dragon Seed



* Re: x86: 4kstacks default
  2008-04-20  8:51           ` Adrian Bunk
@ 2008-04-20  9:36             ` Alan Cox
  2008-04-20 10:44               ` Adrian Bunk
  0 siblings, 1 reply; 162+ messages in thread
From: Alan Cox @ 2008-04-20  9:36 UTC (permalink / raw)
  To: Adrian Bunk
  Cc: Shawn Bohrer, Ingo Molnar, Andrew Morton,
	Linux Kernel Mailing List, Arjan van de Ven, Thomas Gleixner

On Sun, 20 Apr 2008 11:51:04 +0300
Adrian Bunk <bunk@kernel.org> wrote:

> On Sun, Apr 20, 2008 at 09:06:23AM +0100, Alan Cox wrote:
> > > The stack problems in the kernel tend to not be in arch code, and if 
> > > we don't get i386 to always run with 4k stacks there's no chance that 
> > > it will ever work reliably on other architectures.
> > 
> > Not really the case - embedded tends not to use deep stacks of drivers.
> 
> Something like nfsd-over-xfs-over-raid is (or was) the most common 
> problem - and this or similar stackings might be used in NAS devices.

Specific cases yes, but such NAS devices have big processors and are not
little embedded CPUs. On an embedded box you know at build time what it
will be doing.


* Re: x86: 4kstacks default
  2008-04-20  9:36             ` Alan Cox
@ 2008-04-20 10:44               ` Adrian Bunk
  2008-04-20 11:02                 ` Alan Cox
                                   ` (2 more replies)
  0 siblings, 3 replies; 162+ messages in thread
From: Adrian Bunk @ 2008-04-20 10:44 UTC (permalink / raw)
  To: Alan Cox
  Cc: Shawn Bohrer, Ingo Molnar, Andrew Morton,
	Linux Kernel Mailing List, Arjan van de Ven, Thomas Gleixner

On Sun, Apr 20, 2008 at 10:36:11AM +0100, Alan Cox wrote:
> On Sun, 20 Apr 2008 11:51:04 +0300
> Adrian Bunk <bunk@kernel.org> wrote:
> 
> > On Sun, Apr 20, 2008 at 09:06:23AM +0100, Alan Cox wrote:
> > > > The stack problems in the kernel tend to not be in arch code, and if 
> > > > we don't get i386 to always run with 4k stacks there's no chance that 
> > > > it will ever work reliably on other architectures.
> > > 
> > > Not really the case - embedded tends not to use deep stacks of drivers.
> > 
> > Something like nfsd-over-xfs-over-raid is (or was) the most common 
> > problem - and this or similar stackings might be used in NAS devices.
> 
> Specific cases yes, but such NAS devices have big processors and are not
> little embedded CPUs. On an embedded box you know at build time what it
> will be doing.

The code in the kernel that gets the least coverage of all is our 
error paths, and some vendor might try 4k stacks, validate it works in 
all use cases - and then it will blow up in some error condition he 
didn't test.

6k is known to work, and there aren't many problems known with 4k.

And from a QA point of view the only way of getting 4k thoroughly tested 
by users, and also well tested in -rc kernels to catch regressions 
before they get into stable kernels, is to get 4k stacks enabled 
unconditionally on i386.

cu
Adrian


^ permalink raw reply	[flat|nested] 162+ messages in thread

* Re: x86: 4kstacks default
  2008-04-20 10:44               ` Adrian Bunk
@ 2008-04-20 11:02                 ` Alan Cox
  2008-04-20 11:54                   ` Adrian Bunk
  2008-04-20 12:27                 ` Andi Kleen
  2008-04-20 13:22                 ` Mark Lord
  2 siblings, 1 reply; 162+ messages in thread
From: Alan Cox @ 2008-04-20 11:02 UTC (permalink / raw)
  To: Adrian Bunk
  Cc: Shawn Bohrer, Ingo Molnar, Andrew Morton,
	Linux Kernel Mailing List, Arjan van de Ven, Thomas Gleixner

> The code in the kernel that gets the fewest coverage at all are our 
> error paths, and some vendor might try 4k stacks, validate it works in 
> all use cases - and then it will blow up in some error condition he 
> didn't test.

Which you won't fix by changing the x86 defaults. More of a problem in
embedded small devices is the 8K allocation failing in the first place -
plus 4K x 80 processes == lots.

> And from a QA point of view the only way of getting 4k thoroughly tested 
> by users, and well also tested in -rc kernels for catching regressions 
> before they get into stable kernels, is if we get 4k stacks enabled 
> unconditionally on i386.

At which point some distros will simply patch it back no doubt.
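
To put rough numbers on the "4K x 80 processes" and "50k kernel threads"
figures in this thread, here is a minimal sketch of the arithmetic. It
assumes one kernel stack per task and that the only difference is 8 KiB
versus 4 KiB per stack; the helper name is made up for illustration and
is not a kernel function.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper, not a kernel API: bytes saved when ntasks kernel
 * stacks shrink from 8 KiB to 4 KiB each (one 4 KiB page per task). */
uint64_t stack_savings_bytes(uint64_t ntasks)
{
	const uint64_t stack_8k = 8 * 1024;
	const uint64_t stack_4k = 4 * 1024;
	const uint64_t per_task = stack_8k - stack_4k;

	/* Guard against overflow of the multiplication below. */
	assert(ntasks <= UINT64_MAX / per_task);
	return ntasks * per_task;
}
```

With these assumptions, 80 tasks save 320 KiB and 50000 threads save
roughly 195 MiB. The separate point above, that the 8K allocation can
fail in the first place, is about needing two contiguous pages and is
not captured by this arithmetic.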

^ permalink raw reply	[flat|nested] 162+ messages in thread

* Re: x86: 4kstacks default
  2008-04-20 11:54                   ` Adrian Bunk
@ 2008-04-20 11:37                     ` Alan Cox
  2008-04-20 12:18                       ` Adrian Bunk
  2008-04-20 12:37                     ` Andi Kleen
  1 sibling, 1 reply; 162+ messages in thread
From: Alan Cox @ 2008-04-20 11:37 UTC (permalink / raw)
  To: Adrian Bunk
  Cc: Shawn Bohrer, Ingo Molnar, Andrew Morton,
	Linux Kernel Mailing List, Arjan van de Ven, Thomas Gleixner

On Sun, 20 Apr 2008 14:54:55 +0300
Adrian Bunk <bunk@kernel.org> wrote:

> Red Hat seems to get usable kernels with 4k for some years?

Yes and I think it is the right setting.

> If we get whatever is still missing for 4k working once and then the 
> coverage of all i386 -rc testers for noticing new issues immediately
> there should be no stability reason for distros to patch it back in.

You don't get to dictate to people however. 

Alan
--
  "If we become a great evil avaricious hegemony, I wanna cool uniform"
                        -- robk

^ permalink raw reply	[flat|nested] 162+ messages in thread

* Re: x86: 4kstacks default
       [not found]         ` <480AA2B9.10305__23983.3358479247$1208657639$gmane$org@sandeen.net>
@ 2008-04-20 11:48           ` Andi Kleen
  0 siblings, 0 replies; 162+ messages in thread
From: Andi Kleen @ 2008-04-20 11:48 UTC (permalink / raw)
  To: Eric Sandeen
  Cc: Adrian Bunk, Oliver Pinter, Ingo Molnar, Andrew Morton,
	Linux Kernel Mailing List, Arjan van de Ven, Thomas Gleixner,
	Christoph Hellwig, David Chinner, xfs

Eric Sandeen <sandeen@sandeen.net> writes:

> Adrian Bunk wrote:
>> On Sat, Apr 19, 2008 at 04:35:31PM +0200, Oliver Pinter wrote:
>>> ...
>>> with the older kernel is typical: xfs+nfs+4k stack(+lvm)
>> 
>> Does anyone still experience problems with 2.6.25?
>
> There are always problems.  You can always come up with something that
> will crash in 4k, IMHO.

But what are a few crashes compared against the ability to run 50000
kernel threads on a 32bit machine? Something has to give in the aim
for useless checkbox numbers after all. 

-Andi

^ permalink raw reply	[flat|nested] 162+ messages in thread

* Re: x86: 4kstacks default
  2008-04-20 11:02                 ` Alan Cox
@ 2008-04-20 11:54                   ` Adrian Bunk
  2008-04-20 11:37                     ` Alan Cox
  2008-04-20 12:37                     ` Andi Kleen
  0 siblings, 2 replies; 162+ messages in thread
From: Adrian Bunk @ 2008-04-20 11:54 UTC (permalink / raw)
  To: Alan Cox
  Cc: Shawn Bohrer, Ingo Molnar, Andrew Morton,
	Linux Kernel Mailing List, Arjan van de Ven, Thomas Gleixner

On Sun, Apr 20, 2008 at 12:02:50PM +0100, Alan Cox wrote:
> > The code in the kernel that gets the fewest coverage at all are our 
> > error paths, and some vendor might try 4k stacks, validate it works in 
> > all use cases - and then it will blow up in some error condition he 
> > didn't test.
> 
> Which you won't fix by changing the x86 defaults.

Stuff like nfsd, xfs and raid is covered by the x86 defaults.

It's not 100% coverage, but it is quite a lot.

> More of a problem in
> embedded small devices is the 8K allocation failing in the first place -
> plus 4K x 80 processes == lots.
> 
> > And from a QA point of view the only way of getting 4k thoroughly tested 
> > by users, and well also tested in -rc kernels for catching regressions 
> > before they get into stable kernels, is if we get 4k stacks enabled 
> > unconditionally on i386.
> 
> At which point some distros will simply patch it back no doubt.

Red Hat seems to get usable kernels with 4k for some years?

If we get whatever is still missing for 4k working once and then the 
coverage of all i386 -rc testers for noticing new issues immediately
there should be no stability reason for distros to patch it back in.

cu
Adrian


^ permalink raw reply	[flat|nested] 162+ messages in thread

* Re: x86: 4kstacks default
  2008-04-20 11:37                     ` Alan Cox
@ 2008-04-20 12:18                       ` Adrian Bunk
  2008-04-20 14:05                         ` Eric Sandeen
  0 siblings, 1 reply; 162+ messages in thread
From: Adrian Bunk @ 2008-04-20 12:18 UTC (permalink / raw)
  To: Alan Cox
  Cc: Shawn Bohrer, Ingo Molnar, Andrew Morton,
	Linux Kernel Mailing List, Arjan van de Ven, Thomas Gleixner

On Sun, Apr 20, 2008 at 12:37:31PM +0100, Alan Cox wrote:
> On Sun, 20 Apr 2008 14:54:55 +0300
> Adrian Bunk <bunk@kernel.org> wrote:
> 
> > Red Hat seems to get usable kernels with 4k for some years?
> 
> Yes and I think it is the right setting.
> 
> > If we get whatever is still missing for 4k working once and then the 
> > coverage of all i386 -rc testers for noticing new issues immediately
> > there should be no stability reason for distros to patch it back in.
> 
> You don't get to dictate to people however. 

Everyone is free to patch whatever stacksize he wants into his kernel.

But the more users get 4k stacks, the more testing we have, and the 
better both existing and new bugs get shaken out.

And if there were only 4k stacks in the vanilla kernel, and therefore 
all people on i386 testing -rc kernels would get it, that would give a 
better chance of finding stack regressions before they get into a 
stable kernel.

If a distribution or user then wants to increase it that's his choice 
(and easy to do), but nothing the upstream kernel has to offer.

> Alan

cu
Adrian


^ permalink raw reply	[flat|nested] 162+ messages in thread

* Re: x86: 4kstacks default
  2008-04-20 10:44               ` Adrian Bunk
  2008-04-20 11:02                 ` Alan Cox
@ 2008-04-20 12:27                 ` Andi Kleen
  2008-04-20 12:32                   ` Adrian Bunk
                                     ` (2 more replies)
  2008-04-20 13:22                 ` Mark Lord
  2 siblings, 3 replies; 162+ messages in thread
From: Andi Kleen @ 2008-04-20 12:27 UTC (permalink / raw)
  To: Adrian Bunk
  Cc: Alan Cox, Shawn Bohrer, Ingo Molnar, Andrew Morton,
	Linux Kernel Mailing List, Arjan van de Ven, Thomas Gleixner

Adrian Bunk <bunk@kernel.org> writes:
>
> 6k is known to work, and there aren't many problems known with 4k.
>
> And from a QA point of view the only way of getting 4k thoroughly tested 

But you have to first ask why do you want 4k tested? Does it serve
any useful purpose in itself? I don't think so. Or you're saying
it's important to support 50k kernel threads on 32bit kernels?

-Andi

^ permalink raw reply	[flat|nested] 162+ messages in thread

* Re: x86: 4kstacks default
  2008-04-20 12:27                 ` Andi Kleen
@ 2008-04-20 12:32                   ` Adrian Bunk
  2008-04-20 12:47                   ` Willy Tarreau
  2008-04-20 15:44                   ` Daniel Hazelton
  2 siblings, 0 replies; 162+ messages in thread
From: Adrian Bunk @ 2008-04-20 12:32 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Alan Cox, Shawn Bohrer, Ingo Molnar, Andrew Morton,
	Linux Kernel Mailing List, Arjan van de Ven, Thomas Gleixner

On Sun, Apr 20, 2008 at 02:27:14PM +0200, Andi Kleen wrote:
> Adrian Bunk <bunk@kernel.org> writes:
> >
> > 6k is known to work, and there aren't many problems known with 4k.
> >
> > And from a QA point of view the only way of getting 4k thoroughly tested 
> 
> But you have to first ask why do you want 4k tested? Does it serve
> any useful purpose in itself? I don't think so. Or you're saying
> it's important to support 50k kernel threads on 32bit kernels?

Small embedded systems like the space savings.

> -Andi

cu
Adrian


^ permalink raw reply	[flat|nested] 162+ messages in thread

* Re: x86: 4kstacks default
  2008-04-20  3:29     ` Eric Sandeen
@ 2008-04-20 12:36       ` Andi Kleen
  2008-04-21 14:31       ` Ingo Molnar
  1 sibling, 0 replies; 162+ messages in thread
From: Andi Kleen @ 2008-04-20 12:36 UTC (permalink / raw)
  To: Eric Sandeen
  Cc: Ingo Molnar, Andrew Morton, Linux Kernel Mailing List,
	Arjan van de Ven, Thomas Gleixner

Eric Sandeen <sandeen@sandeen.net> writes:
>
> CONFIG_DEBUG_STACKOVERFLOW isn't very useful because the warning printk
> it generates uses the remaining amount of stack, and tips the box.

That could be easily fixed by executing the printk on the interrupt
stack on i386.  Currently it runs before the stack switch, which is wrong,
agreed. On x86-64 it should already execute on the interrupt stack. Or
perhaps it would be better to just move the stack switch on i386 into
entry.S too, similar to 64bit.

That wouldn't help without interrupt stacks of course, but these
should be always on anyways even with 8k stacks.

Experimental patch appended to do this.

-Andi

---

i386: Execute stack overflow warning on interrupt stack

Previously it would run on the process stack, which risks overflowing
an already low stack. Instead, execute it on the interrupt stack.

Based on an observation by Eric Sandeen.

Signed-off-by: Andi Kleen <andi@firstfloor.org>

Index: linux/arch/x86/kernel/irq_32.c
===================================================================
--- linux.orig/arch/x86/kernel/irq_32.c
+++ linux/arch/x86/kernel/irq_32.c
@@ -61,6 +61,26 @@ static union irq_ctx *hardirq_ctx[NR_CPU
 static union irq_ctx *softirq_ctx[NR_CPUS] __read_mostly;
 #endif
 
+static void stack_overflow(void)
+{
+	printk("low stack detected by irq handler\n");
+	dump_stack();
+}
+
+static inline void call_on_stack2(void *func, unsigned long stack,
+			   unsigned long arg1, unsigned long arg2)
+{
+	unsigned long bx;
+	asm volatile(
+			"       xchgl  %%ebx,%%esp    \n"
+			"       call   *%%edi         \n"
+			"       movl   %%ebx,%%esp    \n"
+			: "=a" (arg1), "=d" (arg2), "=b" (bx)
+			:  "0" (arg1),   "1" (arg2),  "2" (stack),
+			   "D" (func)
+			: "memory", "cc");
+}
+
 /*
  * do_IRQ handles all normal device IRQ's (the special
  * SMP cross-CPU interrupts have their own specific
@@ -76,6 +96,7 @@ unsigned int do_IRQ(struct pt_regs *regs
 	union irq_ctx *curctx, *irqctx;
 	u32 *isp;
 #endif
+	int overflow = 0;
 
 	if (unlikely((unsigned)irq >= NR_IRQS)) {
 		printk(KERN_EMERG "%s: cannot handle IRQ %d\n",
@@ -92,11 +113,8 @@ unsigned int do_IRQ(struct pt_regs *regs
 
 		__asm__ __volatile__("andl %%esp,%0" :
 					"=r" (sp) : "0" (THREAD_SIZE - 1));
-		if (unlikely(sp < (sizeof(struct thread_info) + STACK_WARN))) {
-			printk("do_IRQ: stack overflow: %ld\n",
-				sp - sizeof(struct thread_info));
-			dump_stack();
-		}
+		if (unlikely(sp < (sizeof(struct thread_info) + STACK_WARN)))
+			overflow = 1;
 	}
 #endif
 
@@ -112,8 +130,6 @@ unsigned int do_IRQ(struct pt_regs *regs
 	 * current stack (which is the irq stack already after all)
 	 */
 	if (curctx != irqctx) {
-		int arg1, arg2, bx;
-
 		/* build the stack frame on the IRQ stack */
 		isp = (u32*) ((char*)irqctx + sizeof(*irqctx));
 		irqctx->tinfo.task = curctx->tinfo.task;
@@ -127,18 +143,20 @@ unsigned int do_IRQ(struct pt_regs *regs
 			(irqctx->tinfo.preempt_count & ~SOFTIRQ_MASK) |
 			(curctx->tinfo.preempt_count & SOFTIRQ_MASK);
 
-		asm volatile(
-			"       xchgl  %%ebx,%%esp    \n"
-			"       call   *%%edi         \n"
-			"       movl   %%ebx,%%esp    \n"
-			: "=a" (arg1), "=d" (arg2), "=b" (bx)
-			:  "0" (irq),   "1" (desc),  "2" (isp),
-			   "D" (desc->handle_irq)
-			: "memory", "cc"
-		);
+		/* Execute warning on interrupt stack */
+		if (unlikely(overflow))
+			call_on_stack2(stack_overflow, isp, 0, 0);
+
+		call_on_stack2(desc->handle_irq, isp, irq, desc);
+
 	} else
 #endif
+	{
+		/* AK: Slightly bogus here */
+		if (overflow)
+			stack_overflow();
 		desc->handle_irq(irq, desc);
+	}
 
 	irq_exit();
 	set_irq_regs(old_regs);
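
For readers following the check that this patch keeps in do_IRQ: masking
the stack pointer with THREAD_SIZE - 1 gives the offset of the stack
pointer inside the thread's stack area, which is the distance left before
it reaches the base where struct thread_info sits. Below is a minimal
userspace sketch of that arithmetic only. It assumes a power-of-two,
size-aligned stack area; the function names and the example sizes in the
note after it (64 bytes of bookkeeping, 512 bytes of margin) are
illustrative assumptions, not kernel values.

```c
#include <assert.h>
#include <stdint.h>

/* Offset of sp inside a thread_size-aligned stack area. Because the stack
 * grows down, this is the distance from sp down to the base of the area. */
uintptr_t stack_offset(uintptr_t sp, uintptr_t thread_size)
{
	/* The mask trick only works for power-of-two sizes. */
	assert(thread_size != 0 && (thread_size & (thread_size - 1)) == 0);
	return sp & (thread_size - 1);
}

/* Same shape as the test in do_IRQ: the stack counts as low once the
 * remaining distance no longer covers the bookkeeping kept at the base
 * (info_size) plus a warning margin (warn_margin). */
int stack_is_low(uintptr_t sp, uintptr_t thread_size,
		 uintptr_t info_size, uintptr_t warn_margin)
{
	return stack_offset(sp, thread_size) < info_size + warn_margin;
}
```

For a 4096-byte area, a stack pointer ending in 0xf00 has 3840 bytes left
and is fine, while one ending in 0x100 has only 256 bytes left and would
trip the warning with the example sizes above.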

^ permalink raw reply	[flat|nested] 162+ messages in thread

* Re: x86: 4kstacks default
  2008-04-20 11:54                   ` Adrian Bunk
  2008-04-20 11:37                     ` Alan Cox
@ 2008-04-20 12:37                     ` Andi Kleen
  1 sibling, 0 replies; 162+ messages in thread
From: Andi Kleen @ 2008-04-20 12:37 UTC (permalink / raw)
  To: Adrian Bunk
  Cc: Alan Cox, Shawn Bohrer, Ingo Molnar, Andrew Morton,
	Linux Kernel Mailing List, Arjan van de Ven, Thomas Gleixner

Adrian Bunk <bunk@kernel.org> writes:
>
> Red Hat seems to get usable kernels with 4k for some years?

One way they do that is by marking significant parts of the kernel
unsupported. I don't think that's an option for mainline.

-Andi

^ permalink raw reply	[flat|nested] 162+ messages in thread

* Re: x86: 4kstacks default
  2008-04-20 12:27                 ` Andi Kleen
  2008-04-20 12:32                   ` Adrian Bunk
@ 2008-04-20 12:47                   ` Willy Tarreau
  2008-04-20 13:06                     ` Andi Kleen
                                       ` (2 more replies)
  2008-04-20 15:44                   ` Daniel Hazelton
  2 siblings, 3 replies; 162+ messages in thread
From: Willy Tarreau @ 2008-04-20 12:47 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Adrian Bunk, Alan Cox, Shawn Bohrer, Ingo Molnar, Andrew Morton,
	Linux Kernel Mailing List, Arjan van de Ven, Thomas Gleixner

On Sun, Apr 20, 2008 at 02:27:14PM +0200, Andi Kleen wrote:
> Adrian Bunk <bunk@kernel.org> writes:
> >
> > 6k is known to work, and there aren't many problems known with 4k.
> >
> > And from a QA point of view the only way of getting 4k thoroughly tested 
> 
> But you have to first ask why do you want 4k tested? Does it serve
> any useful purpose in itself? I don't think so. Or you're saying
> it's important to support 50k kernel threads on 32bit kernels?

Clearly if I have the choice between a kernel which can run 50k threads
and a kernel which does not crash under me during an I/O error, I choose
the latter! I can't even imagine what purpose 50k kernel threads may serve.
I certainly can understand that reducing memory footprint is useful, but
if we want wider testing of 4k stacks, considering they may fail in error
path in complex I/O environment, it's not likely during -rc kernels that
we'll detect problems, and if we push them down the throat of users in a
stable release, of course they will thank us very much for crashing their
NFS servers in production during peak hours.

I have nothing against changing the default setting to 4k provided that
it is easy to get back to the safe setting (i.e. changing a config option,
or better, a cmdline parameter). I just don't agree with the idea of
forcing users to swim in the sh*t, it only brings bad reputation to
Linux.

What would really help would be to have 8k stacks with the lower page
causing a fault and print a stack trace upon first access. That way,
the safe setting would still report us useful information without
putting users into trouble.
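
As a userspace analogy only (kernel stacks would need their own mapping
and fault handling, which is not shown here), this is a minimal sketch of
the idea: two pages are mapped, access to the lower one is revoked, and
the first write into it faults and can be reported. It assumes Linux with
mmap, mprotect and sigaction; all function names are hypothetical.

```c
#define _DEFAULT_SOURCE		/* for MAP_ANONYMOUS under strict -std= modes */
#include <assert.h>
#include <setjmp.h>
#include <signal.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

static sigjmp_buf fault_env;

static void on_segv(int sig)
{
	(void)sig;
	siglongjmp(fault_env, 1);	/* unwind back to the probe below */
}

/* Returns 1 if writing one byte to p faults, 0 if the write succeeds. */
int write_faults(volatile char *p)
{
	struct sigaction sa, old;
	int faulted;

	memset(&sa, 0, sizeof(sa));
	sa.sa_handler = on_segv;
	sigemptyset(&sa.sa_mask);
	sigaction(SIGSEGV, &sa, &old);

	if (sigsetjmp(fault_env, 1) == 0) {
		*p = 1;
		faulted = 0;
	} else {
		faulted = 1;		/* reached via siglongjmp */
	}
	sigaction(SIGSEGV, &old, NULL);	/* restore the previous handler */
	return faulted;
}

/* Map two pages and revoke access to the lower one, the direction a
 * downward-growing stack overflows into. Returns the base, or NULL. */
char *map_guarded_stack(size_t *page_out)
{
	long page = sysconf(_SC_PAGESIZE);
	char *base;

	assert(page > 0);
	base = mmap(NULL, 2 * (size_t)page, PROT_READ | PROT_WRITE,
		    MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (base == MAP_FAILED)
		return NULL;
	if (mprotect(base, (size_t)page, PROT_NONE) != 0) {
		munmap(base, 2 * (size_t)page);
		return NULL;
	}
	*page_out = (size_t)page;
	return base;
}
```

In this sketch a write to the upper page succeeds and a write to the
lower page is caught. A kernel version would print a stack trace on the
first hit instead of jumping back.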

Willy


^ permalink raw reply	[flat|nested] 162+ messages in thread

* Re: x86: 4kstacks default
  2008-04-20 12:47                   ` Willy Tarreau
@ 2008-04-20 13:06                     ` Andi Kleen
  2008-04-20 13:30                       ` Adrian Bunk
  2008-04-20 13:21                     ` Adrian Bunk
  2008-04-20 13:27                     ` Mark Lord
  2 siblings, 1 reply; 162+ messages in thread
From: Andi Kleen @ 2008-04-20 13:06 UTC (permalink / raw)
  To: Willy Tarreau
  Cc: Adrian Bunk, Alan Cox, Shawn Bohrer, Ingo Molnar, Andrew Morton,
	Linux Kernel Mailing List, Arjan van de Ven, Thomas Gleixner

Willy Tarreau wrote:

> Clearly if I have the choice between a kernel which can run 50k threads
> and a kernel which does not crash under me during an I/O error, I choose
> the latter! I can't even imagine what purpose 50k kernel threads may serve.

I don't know either but it was quoted to me earlier as the primary
reason 4k stacks were introduced.

> I have nothing against changing the default setting to 4k provided that
> it is easy to get back to the safe setting

So you're saying that only advanced users who understand all their
CONFIG options should have the safe settings? And everyone else
the "only explodes once a week" mode?

For me that is exactly the wrong way around.

If someone is sure they know what they're doing they can set whatever
crazy settings they want (given there is a quick way to check
for the crazy settings in oops reports so that I can ignore those), but
the default should be always safe and optimized for reliability.

-Andi


^ permalink raw reply	[flat|nested] 162+ messages in thread

* Re: x86: 4kstacks default
  2008-04-20 12:47                   ` Willy Tarreau
  2008-04-20 13:06                     ` Andi Kleen
@ 2008-04-20 13:21                     ` Adrian Bunk
  2008-04-23  9:13                       ` Helge Hafting
  2008-04-28 18:38                       ` Bill Davidsen
  2008-04-20 13:27                     ` Mark Lord
  2 siblings, 2 replies; 162+ messages in thread
From: Adrian Bunk @ 2008-04-20 13:21 UTC (permalink / raw)
  To: Willy Tarreau
  Cc: Andi Kleen, Alan Cox, Shawn Bohrer, Ingo Molnar, Andrew Morton,
	Linux Kernel Mailing List, Arjan van de Ven, Thomas Gleixner

On Sun, Apr 20, 2008 at 02:47:17PM +0200, Willy Tarreau wrote:
>...
> I certainly can understand that reducing memory footprint is useful, but
> if we want wider testing of 4k stacks, considering they may fail in error
> path in complex I/O environment, it's not likely during -rc kernels that
> we'll detect problems, and if we push them down the throat of users in a
> stable release, of course they will thank us very much for crashing their
> NFS servers in production during peak hours.

I've seen many bugs in error paths in the kernel and fixed quite a 
few of them - and stack problems were not a significant part of them.

There are so many possible bugs (that also occur in practice) that 
singling out stack usage won't gain much.

> I have nothing against changing the default setting to 4k provided that
> it is easy to get back to the safe setting (i.e. changing a config option,
> or better, a cmdline parameter). I just don't agree with the idea of
> forcing users to swim in the sh*t, it only brings bad reputation to
> Linux.
>...

What actually brings bad reputation is shipping a 4k option that is 
known to break under some circumstances.

And history has shown that as long as 8k stacks are available on i386 
some problems will not get fixed. 4k stacks have been available as an 
option on i386 for more than 4 years, and for about as long we have known 
that there are some setups (AFAIK all that might still be present seem to 
include XFS) that do not work reliably with 4k stacks.

If we go after stability and reputation, we have to make a decision 
whether we want to get 4k stacks on 32bit architectures with 4k page 
size unconditionally or not at all. That's the way that gets the maximal 
number of bugs shaken out [1] for all supported configurations before 
they would hit a stable kernel.

> Willy

cu
Adrian

[1] obviously not all, but that's true for all classes of bugs


^ permalink raw reply	[flat|nested] 162+ messages in thread

* Re: x86: 4kstacks default
  2008-04-20 10:44               ` Adrian Bunk
  2008-04-20 11:02                 ` Alan Cox
  2008-04-20 12:27                 ` Andi Kleen
@ 2008-04-20 13:22                 ` Mark Lord
  2 siblings, 0 replies; 162+ messages in thread
From: Mark Lord @ 2008-04-20 13:22 UTC (permalink / raw)
  To: Adrian Bunk
  Cc: Alan Cox, Shawn Bohrer, Ingo Molnar, Andrew Morton,
	Linux Kernel Mailing List, Arjan van de Ven, Thomas Gleixner

Adrian Bunk wrote:
>
> The code in the kernel that gets the fewest coverage at all are our 
> error paths, and some vendor might try 4k stacks, validate it works in 
> all use cases - and then it will blow up in some error condition he 
> didn't test.
..

That's exactly the worry.

If anyone wants to take a crack at testing some of the more likely
fail paths there, just introduce a media error onto a SATA disk
that's buried at the bottom of a stacked RAID1 over RAID0 over LVM,
with XFS and nfsd on top.

Or something like that.
And then experiment with corrupting meta data rather than simply file data.
How to introduce a media error?  hdparm --make-bad-sector nnnnnn /dev/sdX

This catches the most likely (IMHO) failure scenarios,
but still comes nowhere near 100% code coverage.  :(

Cheers

^ permalink raw reply	[flat|nested] 162+ messages in thread

* Re: x86: 4kstacks default
  2008-04-20 12:47                   ` Willy Tarreau
  2008-04-20 13:06                     ` Andi Kleen
  2008-04-20 13:21                     ` Adrian Bunk
@ 2008-04-20 13:27                     ` Mark Lord
  2008-04-20 13:38                       ` Willy Tarreau
  2008-04-20 14:09                       ` Eric Sandeen
  2 siblings, 2 replies; 162+ messages in thread
From: Mark Lord @ 2008-04-20 13:27 UTC (permalink / raw)
  To: Willy Tarreau
  Cc: Andi Kleen, Adrian Bunk, Alan Cox, Shawn Bohrer, Ingo Molnar,
	Andrew Morton, Linux Kernel Mailing List, Arjan van de Ven,
	Thomas Gleixner

Willy Tarreau wrote:
>
> What would really help would be to have 8k stacks with the lower page
> causing a fault and print a stack trace upon first access. That way,
> the safe setting would still report us useful information without
> putting users into trouble.
..

That's the best suggestion from this thread, by far!
Can you produce a patch for 2.6.26 for this?
Or perhaps someone else here, with the right code familiarity, could?

Some sort of CONFIG option would likely be wanted to
either enable/disable this feature, of course.

Cheers

^ permalink raw reply	[flat|nested] 162+ messages in thread

* Re: x86: 4kstacks default
  2008-04-20 13:06                     ` Andi Kleen
@ 2008-04-20 13:30                       ` Adrian Bunk
  2008-04-20 13:34                         ` Willy Tarreau
  2008-04-28 17:56                         ` Bill Davidsen
  0 siblings, 2 replies; 162+ messages in thread
From: Adrian Bunk @ 2008-04-20 13:30 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Willy Tarreau, Alan Cox, Shawn Bohrer, Ingo Molnar,
	Andrew Morton, Linux Kernel Mailing List, Arjan van de Ven,
	Thomas Gleixner

On Sun, Apr 20, 2008 at 03:06:23PM +0200, Andi Kleen wrote:
> Willy Tarreau wrote:
>...
> > I have nothing against changing the default setting to 4k provided that
> > it is easy to get back to the safe setting
> 
> So you're saying that only advanced users who understand all their
> CONFIG options should have the safe settings? And everyone else
> the "only explodes once a week" mode?
> 
> For me that is exactly the wrong way around.
> 
> If someone is sure they know what they're doing they can set whatever
> crazy settings they want (given there is a quick way to check
> for the crazy settings in oops reports so that I can ignore those), but
> the default should be always safe and optimized for reliability.

That means we'll have nearly zero testing of the "crazy setting" and 
when someone tries it he'll have a high probability of running into some
problems.

Such a "crazy setting" shouldn't be offered to users at all.

We should either aim at 4k stacks unconditionally for all 32bit 
architectures with 4k page size or not allow any architecture
to offer 4k stacks.

> -Andi

cu
Adrian


^ permalink raw reply	[flat|nested] 162+ messages in thread

* Re: x86: 4kstacks default
  2008-04-20 13:30                       ` Adrian Bunk
@ 2008-04-20 13:34                         ` Willy Tarreau
  2008-04-20 14:04                           ` Adrian Bunk
  2008-04-28 17:56                         ` Bill Davidsen
  1 sibling, 1 reply; 162+ messages in thread
From: Willy Tarreau @ 2008-04-20 13:34 UTC (permalink / raw)
  To: Adrian Bunk
  Cc: Andi Kleen, Alan Cox, Shawn Bohrer, Ingo Molnar, Andrew Morton,
	Linux Kernel Mailing List, Arjan van de Ven, Thomas Gleixner

On Sun, Apr 20, 2008 at 04:30:07PM +0300, Adrian Bunk wrote:
> Such a "crazy setting" shouldn't be offered to users at all.
> 
> We should either aim at 4k stacks unconditionally for all 32bit 
> architectures with 4k page size or don't allow any architecture
> to offer 4k stacks.

I agree you make a valid point here. Then wouldn't it be easier to
simply remove 4k and agree it was a wet dream ?

Willy


^ permalink raw reply	[flat|nested] 162+ messages in thread

* Re: x86: 4kstacks default
  2008-04-20 13:27                     ` Mark Lord
@ 2008-04-20 13:38                       ` Willy Tarreau
  2008-04-20 14:19                         ` Andi Kleen
  2008-04-20 14:09                       ` Eric Sandeen
  1 sibling, 1 reply; 162+ messages in thread
From: Willy Tarreau @ 2008-04-20 13:38 UTC (permalink / raw)
  To: Mark Lord
  Cc: Andi Kleen, Adrian Bunk, Alan Cox, Shawn Bohrer, Ingo Molnar,
	Andrew Morton, Linux Kernel Mailing List, Arjan van de Ven,
	Thomas Gleixner

On Sun, Apr 20, 2008 at 09:27:32AM -0400, Mark Lord wrote:
> Willy Tarreau wrote:
> >
> >What would really help would be to have 8k stacks with the lower page
> >causing a fault and print a stack trace upon first access. That way,
> >the safe setting would still report us useful information without
> >putting users into trouble.
> ..
> 
> That's the best suggestion from this thread, by far!
> Can you produce a patch for 2.6.26 for this?

Unfortunately, I can't. I wouldn't know where to start from.

> Or perhaps someone else here, with the right code familiarity, could?

I hope so.

> Some sort of CONFIG option would likely be wanted to
> either enable/disable this feature, of course.

If we want to migrate to 4k sooner or later, this behaviour would not
need a config option, maybe just a /proc or /sys tunable to disable
the warning. Config would be either (4k + risk of crash) or (8k + warning).

The *real* issue is to decide whether we need/want 4k or not, because
I think we're still discussing the subject for no reason, as usual...

Willy


^ permalink raw reply	[flat|nested] 162+ messages in thread

* Re: x86: 4kstacks default
  2008-04-20 13:34                         ` Willy Tarreau
@ 2008-04-20 14:04                           ` Adrian Bunk
  0 siblings, 0 replies; 162+ messages in thread
From: Adrian Bunk @ 2008-04-20 14:04 UTC (permalink / raw)
  To: Willy Tarreau
  Cc: Andi Kleen, Alan Cox, Shawn Bohrer, Ingo Molnar, Andrew Morton,
	Linux Kernel Mailing List, Arjan van de Ven, Thomas Gleixner

On Sun, Apr 20, 2008 at 03:34:41PM +0200, Willy Tarreau wrote:
> On Sun, Apr 20, 2008 at 04:30:07PM +0300, Adrian Bunk wrote:
> > Such a "crazy setting" shouldn't be offered to users at all.
> > 
> > We should either aim at 4k stacks unconditionally for all 32bit 
> > architectures with 4k page size or don't allow any architecture
> > to offer 4k stacks.
> 
> I agree you make a valid point here. Then wouldn't it be easier to
> simply remove 4k and agree it was a wet dream ?

If the sh maintainer and the m68knommu maintainer (and perhaps 
MontaVista) agree that it was a wet dream.

> Willy

cu
Adrian


^ permalink raw reply	[flat|nested] 162+ messages in thread

* Re: x86: 4kstacks default
  2008-04-20 12:18                       ` Adrian Bunk
@ 2008-04-20 14:05                         ` Eric Sandeen
  2008-04-20 14:21                           ` Adrian Bunk
                                             ` (2 more replies)
  0 siblings, 3 replies; 162+ messages in thread
From: Eric Sandeen @ 2008-04-20 14:05 UTC (permalink / raw)
  To: Adrian Bunk
  Cc: Alan Cox, Shawn Bohrer, Ingo Molnar, Andrew Morton,
	Linux Kernel Mailing List, Arjan van de Ven, Thomas Gleixner

Adrian Bunk wrote:

> But the more users will get 4k stacks the more testing we have, and the 
> better both existing and new bugs get shaken out.
> 
> And if there were only 4k stacks in the vanilla kernel, and therefore 
> all people on i386 testing -rc kernels would get it, that would give a 
> better chance of finding stack regressions before they get into a 
> stable kernel.

Heck, maybe you should make it 2k by default in all -rc kernels; that
way when people run -final with the 4k it'll be 100% bulletproof, right?
 'cause all those piggy drivers that blow a 2k stack will finally have
to get fixed?  Or leave it at 2k and find a way to share pages for
stacks, think how much memory you could save and how many java threads
you could run!

4K just happens to be the page size; other than that it's really just
some random/magic number picked, and now dictated that if you (and
everything around you) don't fit, you're broken.

That bugs me.

-Eric

(yes, I know there are advantages to only allocating a single page for a
new thread, but from an "all callchains after that must fit in that
space" perspective, it's just a randomly picked number)

^ permalink raw reply	[flat|nested] 162+ messages in thread

* Re: x86: 4kstacks default
  2008-04-20 13:27                     ` Mark Lord
  2008-04-20 13:38                       ` Willy Tarreau
@ 2008-04-20 14:09                       ` Eric Sandeen
  2008-04-20 14:20                         ` Willy Tarreau
  1 sibling, 1 reply; 162+ messages in thread
From: Eric Sandeen @ 2008-04-20 14:09 UTC (permalink / raw)
  To: Mark Lord
  Cc: Willy Tarreau, Andi Kleen, Adrian Bunk, Alan Cox, Shawn Bohrer,
	Ingo Molnar, Andrew Morton, Linux Kernel Mailing List,
	Arjan van de Ven, Thomas Gleixner

Mark Lord wrote:
> Willy Tarreau wrote:
>> What would really help would be to have 8k stacks with the lower page
>> causing a fault and print a stack trace upon first access. That way,
>> the safe setting would still report us useful information without
>> putting users into trouble.
> ..
> 
> That's the best suggestion from this thread, by far!
> Can you produce a patch for 2.6.26 for this?
> Or perhaps someone else here, with the right code familiarity, could?
> 
> Some sort of CONFIG option would likely be wanted to
> either enable/disable this feature, of course.

Changing the default warning threshold is easy, it's just a #define.
Although setting it too low would spam syslogs on some setups.

When I was trying to cram stuff into 4k in the past, I had a patch which
added a sysctl to dynamically change the warning threshold, and
optionally BUG() when I hit it for crash analysis.  It was good for
debugging, at least.  If something along those lines is desired, I could
resurrect it.
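
For illustration only (this is not the patch mentioned above), the kind of
threshold check being discussed can be sketched in userspace C.  Every name
here (sketch_stack_left, sketch_stack_low, SKETCH_HEADER_SIZE) is
hypothetical, and the 64-byte header size is an assumption.  The sketch
assumes a downward-growing stack that is aligned to its own size, with a
small per-thread header at its base.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed size of the per-thread header at the stack base (hypothetical). */
#define SKETCH_HEADER_SIZE 64u

/*
 * Bytes still free below the stack pointer, for a stack area that is
 * thread_size bytes long, aligned to thread_size, and grows downwards.
 */
static size_t sketch_stack_left(uintptr_t sp, size_t thread_size)
{
	/* The masking trick only works for power-of-two sizes. */
	assert(thread_size != 0 && (thread_size & (thread_size - 1)) == 0);

	/* Offset of sp from the base of its stack area. */
	size_t offset = (size_t)(sp & (thread_size - 1));

	return offset > SKETCH_HEADER_SIZE ? offset - SKETCH_HEADER_SIZE : 0;
}

/* True when fewer than warn_bytes remain; warn_bytes is the tunable. */
static bool sketch_stack_low(uintptr_t sp, size_t thread_size,
			     size_t warn_bytes)
{
	return sketch_stack_left(sp, thread_size) < warn_bytes;
}
```

With an 8 KiB stack based at 0x10000 and a stack pointer of 0x10200,
512 - 64 = 448 bytes remain, so a 1024-byte threshold would warn.  Passing
warn_bytes as a parameter stands in for what a sysctl would make adjustable
at runtime.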

-Eric

^ permalink raw reply	[flat|nested] 162+ messages in thread

* Re: x86: 4kstacks default
  2008-04-20 13:38                       ` Willy Tarreau
@ 2008-04-20 14:19                         ` Andi Kleen
  2008-04-20 16:41                           ` Jörn Engel
  0 siblings, 1 reply; 162+ messages in thread
From: Andi Kleen @ 2008-04-20 14:19 UTC (permalink / raw)
  To: Willy Tarreau
  Cc: Mark Lord, Adrian Bunk, Alan Cox, Shawn Bohrer, Ingo Molnar,
	Andrew Morton, Linux Kernel Mailing List, Arjan van de Ven,
	Thomas Gleixner

Willy Tarreau wrote:
> On Sun, Apr 20, 2008 at 09:27:32AM -0400, Mark Lord wrote:
>> Willy Tarreau wrote:
>>> What would really help would be to have 8k stacks with the lower page
>>> causing a fault and print a stack trace upon first access. That way,
>>> the safe setting would still report us useful information without
>>> putting users into trouble.
>> ..
>>
>> That's the best suggestion from this thread, by far!

Only if you believe that 4K stack pages are a worthy goal.
As far as I can figure out they are not. They might have been
a worthy goal on crappy 2.4 VMs, but these times are long gone.

The "saving memory on embedded" argument also does not 
quite convince me, it is unclear if that is really
a significant amount of memory on these systems and if that 
couldn't be addressed better (e.g. by generally running
fewer kernel threads).  I don't have numbers on this,
but then the people who made this argument didn't have any
either :) 

If anybody has concrete statistics on this
(including other kernel memory users in realistic situations)
please feel free to post them.


>> Can you produce a patch for 2.6.26 for this?
> 
> Unfortunately, I can't. I wouldn't know where to start from.

The problem with his suggestion is that the lower 4K of the stack area
is accessed in normal operation too, because it contains the thread_info.
That could be changed, but it would be a relatively large change
because you would need to audit/change a lot of code that assumes
thread_info and stack are contiguous.
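
A minimal userspace sketch of why a fault-on-access lower page cannot tell
an overflow from a normal access: it assumes the per-thread structure sits
at the base of a stack area aligned to its own size, so the base can be
found by masking the stack pointer.  All names (sketch_stack_base,
sketch_hits_lowest_page, SKETCH_PAGE_SIZE) are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096u	/* assumed page size */

/* Base of the stack area containing sp (area aligned to its size). */
static uintptr_t sketch_stack_base(uintptr_t sp, size_t thread_size)
{
	assert(thread_size != 0 && (thread_size & (thread_size - 1)) == 0);
	return sp & ~((uintptr_t)thread_size - 1);
}

/* Would an access to addr land in the lowest page of sp's stack area? */
static bool sketch_hits_lowest_page(uintptr_t addr, uintptr_t sp,
				    size_t thread_size)
{
	uintptr_t base = sketch_stack_base(sp, thread_size);

	return addr >= base && addr - base < SKETCH_PAGE_SIZE;
}
```

For an 8 KiB area based at 0x20000, both the per-thread structure at
0x20000 and a deep stack access at 0x20ff0 land in the lowest page, while
0x21000 and above do not.  A trap on that page would therefore fire on
ordinary accesses to the structure as well as on real overflows.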

If that were changed, implementing Willy's suggestion would not be that
difficult using cpa(), at the cost of some general slowdown from
increased TLB misses, much higher thread creation/teardown cost, etc.
Using the alternative vmalloc way also has other issues.

But still the fundamental problem is that it would likely only
hit the interesting cases in real production setups and I don't 
think the production users would be very happy to slow down
their kernels and handle strange backtraces just to act as guinea pigs 
for something dubious.

-Andi


^ permalink raw reply	[flat|nested] 162+ messages in thread

* Re: x86: 4kstacks default
  2008-04-20 14:09                       ` Eric Sandeen
@ 2008-04-20 14:20                         ` Willy Tarreau
  2008-04-20 14:40                           ` Eric Sandeen
  0 siblings, 1 reply; 162+ messages in thread
From: Willy Tarreau @ 2008-04-20 14:20 UTC (permalink / raw)
  To: Eric Sandeen
  Cc: Mark Lord, Andi Kleen, Adrian Bunk, Alan Cox, Shawn Bohrer,
	Ingo Molnar, Andrew Morton, Linux Kernel Mailing List,
	Arjan van de Ven, Thomas Gleixner

On Sun, Apr 20, 2008 at 09:09:37AM -0500, Eric Sandeen wrote:
> Mark Lord wrote:
> > Willy Tarreau wrote:
> >> What would really help would be to have 8k stacks with the lower page
> >> causing a fault and print a stack trace upon first access. That way,
> >> the safe setting would still report us useful information without
> >> putting users into trouble.
> > ..
> > 
> > That's the best suggestion from this thread, by far!
> > Can you produce a patch for 2.6.26 for this?
> > Or perhaps someone else here, with the right code familiarity, could?
> > 
> > Some sort of CONFIG option would likely be wanted to
> > either enable/disable this feature, of course.
> 
> Changing the default warning threshold is easy, it's just a #define.

I thought it was checked only at a few places (eg: during irqs). If so,
maybe it can miss some call chains ?

> Although setting it too low would spam syslogs on some setups.

we should set it slightly below the 4k limit if we want users to switch
to 4k.

> When I was trying to cram stuff into 4k in the past, I had a patch which
> added a sysctl to dynamically change the warning threshold, and
> optionally BUG() when I hit it for crash analysis.  It was good for
> debugging, at least.  If something along those lines is desired, I could
> resurrect it.

While it's good for debugging, having users tweak the limit to eliminate
the warning is the opposite of what we're looking for. We just want to
have them report the warning without their service being disrupted.

Willy


^ permalink raw reply	[flat|nested] 162+ messages in thread

* Re: x86: 4kstacks default
  2008-04-20 14:05                         ` Eric Sandeen
@ 2008-04-20 14:21                           ` Adrian Bunk
  2008-04-20 14:56                             ` Eric Sandeen
  2008-04-20 15:41                           ` Arjan van de Ven
  2008-04-21  7:45                           ` Denys Vlasenko
  2 siblings, 1 reply; 162+ messages in thread
From: Adrian Bunk @ 2008-04-20 14:21 UTC (permalink / raw)
  To: Eric Sandeen
  Cc: Alan Cox, Shawn Bohrer, Ingo Molnar, Andrew Morton,
	Linux Kernel Mailing List, Arjan van de Ven, Thomas Gleixner

On Sun, Apr 20, 2008 at 09:05:40AM -0500, Eric Sandeen wrote:
> Adrian Bunk wrote:
> 
> > But the more users will get 4k stacks the more testing we have, and the 
> > better both existing and new bugs get shaken out.
> > 
> > And if there were only 4k stacks in the vanilla kernel, and therefore 
> > all people on i386 testing -rc kernels would get it, that would give a 
> > better chance of finding stack regressions before they get into a 
> > stable kernel.
> 
> Heck, maybe you should make it 2k by default in all -rc kernels; that
> way when people run -final with the 4k it'll be 100% bulletproof, right?
>  'cause all those piggy drivers that blow a 2k stack will finally have
> to get fixed?

I'm arguing for aiming at having all 32bit architectures with 4k page 
size using the same stack size. Not for having -rc kernels differ from 
release kernels.

>  Or leave it at 2k and find a way to share pages for
> stacks, think how much memory you could save and how many java threads
> you could run!

The only architecture that already defaults to 4k stacks is m68knommu, 
and I doubt they do it for many java threads...

>...
> -Eric
>...

cu
Adrian

-- 

       "Is there not promise of rain?" Ling Tan asked suddenly out
        of the darkness. There had been need of rain for many days.
       "Only a promise," Lao Er said.
                                       Pearl S. Buck - Dragon Seed


^ permalink raw reply	[flat|nested] 162+ messages in thread

* Re: x86: 4kstacks default
  2008-04-20 14:20                         ` Willy Tarreau
@ 2008-04-20 14:40                           ` Eric Sandeen
  0 siblings, 0 replies; 162+ messages in thread
From: Eric Sandeen @ 2008-04-20 14:40 UTC (permalink / raw)
  To: Willy Tarreau
  Cc: Mark Lord, Andi Kleen, Adrian Bunk, Alan Cox, Shawn Bohrer,
	Ingo Molnar, Andrew Morton, Linux Kernel Mailing List,
	Arjan van de Ven, Thomas Gleixner

Willy Tarreau wrote:
> On Sun, Apr 20, 2008 at 09:09:37AM -0500, Eric Sandeen wrote:
>> Mark Lord wrote:
>>> Willy Tarreau wrote:
>>>> What would really help would be to have 8k stacks with the lower page
>>>> causing a fault and print a stack trace upon first access. That way,
>>>> the safe setting would still report us useful information without
>>>> putting users into trouble.
>>> ..
>>>
>>> That's the best suggestion from this thread, by far!
>>> Can you produce a patch for 2.6.26 for this?
>>> Or perhaps someone else here, with the right code familiarity, could?
>>>
>>> Some sort of CONFIG option would likely be wanted to
>>> either enable/disable this feature, of course.
>> Changing the default warning threshold is easy, it's just a #define.
> 
> I thought it was checked only at a few places (eg: during irqs). If so,
> maybe it can miss some call chains ?

Ah, ok I skimmed your first suggestion too quickly.  100% coverage
reports on the initial access to the 2nd 4k that way would be nice.
Well, it would be nice if we all really wanted 4k stacks some day... :)

-Eric

^ permalink raw reply	[flat|nested] 162+ messages in thread

* Re: x86: 4kstacks default
  2008-04-20 14:21                           ` Adrian Bunk
@ 2008-04-20 14:56                             ` Eric Sandeen
  0 siblings, 0 replies; 162+ messages in thread
From: Eric Sandeen @ 2008-04-20 14:56 UTC (permalink / raw)
  To: Adrian Bunk
  Cc: Alan Cox, Shawn Bohrer, Ingo Molnar, Andrew Morton,
	Linux Kernel Mailing List, Arjan van de Ven, Thomas Gleixner

Adrian Bunk wrote:
> On Sun, Apr 20, 2008 at 09:05:40AM -0500, Eric Sandeen wrote:
>> Adrian Bunk wrote:
>>
>>> But the more users will get 4k stacks the more testing we have, and the 
>>> better both existing and new bugs get shaken out.
>>>
>>> And if there were only 4k stacks in the vanilla kernel, and therefore 
>>> all people on i386 testing -rc kernels would get it, that would give a 
>>> better chance of finding stack regressions before they get into a 
>>> stable kernel.
>> Heck, maybe you should make it 2k by default in all -rc kernels; that
>> way when people run -final with the 4k it'll be 100% bulletproof, right?
>>  'cause all those piggy drivers that blow a 2k stack will finally have
>> to get fixed?
> 
> I'm arguing for aiming at having all 32bit architectures with 4k page 
> size using the same stack size. Not for having -rc kernels differ from 
> release kernels.

Oh, I know.  I'm just saying that 4k seems chosen out of convenience for
memory management, without any real correlation to what you might
actually need to run a thread.  They do happen to be roughly equivalent
for many cases, but not all.  Setting a default which is not safe for
several common use cases does not seem wise...

I guess what I'm saying is, I don't agree that any callchain which needs
more than 4k of stack indicates brokenness that must be fixed, as
various posts in this thread seem to suggest.

Sure, 1k char buffers on the stack and massive structs and unlimited
recursion we can agree on as things to fix, but complex/deep/stacked
callchains which don't fit in 4k are much more of a grey area.

-Eric

^ permalink raw reply	[flat|nested] 162+ messages in thread

* Re: x86: 4kstacks default
  2008-04-20 14:05                         ` Eric Sandeen
  2008-04-20 14:21                           ` Adrian Bunk
@ 2008-04-20 15:41                           ` Arjan van de Ven
  2008-04-20 16:03                             ` Adrian Bunk
  2008-04-21  7:45                           ` Denys Vlasenko
  2 siblings, 1 reply; 162+ messages in thread
From: Arjan van de Ven @ 2008-04-20 15:41 UTC (permalink / raw)
  To: Eric Sandeen
  Cc: Adrian Bunk, Alan Cox, Shawn Bohrer, Ingo Molnar, Andrew Morton,
	Linux Kernel Mailing List, Thomas Gleixner

On Sun, 20 Apr 2008 09:05:40 -0500
Eric Sandeen <sandeen@sandeen.net> wrote:

> 
> 4K just happens to be the page size; other than that it's really just
> some random/magic number picked, and now dictated that if you (and
> everything around you) don't fit, you're broken.

it wasn't randomly picked; it was based on 2.4 kernels
(where we had 8kb, but that was roughly 2.5Kb or so for the task struct,
which was on stack back then, then 4Kb for user context and 2Kb for IRQ context)

> 
> That bugs me.
> 

yes. Adrian is waay off in the weeds on this one. Nobody but him is suggesting to remove
8Kb stacks. I think everyone else agrees that having both options is valuable; and there
are better ways to find+fix stack bloat than removing this config option.


-- 
If you want to reach me at my work email, use arjan@linux.intel.com
For development, discussion and tips for power savings, 
visit http://www.lesswatts.org

^ permalink raw reply	[flat|nested] 162+ messages in thread

* Re: x86: 4kstacks default
  2008-04-20 12:27                 ` Andi Kleen
  2008-04-20 12:32                   ` Adrian Bunk
  2008-04-20 12:47                   ` Willy Tarreau
@ 2008-04-20 15:44                   ` Daniel Hazelton
  2008-04-20 17:26                     ` Andi Kleen
  2008-04-22 18:20                     ` Romano Giannetti
  2 siblings, 2 replies; 162+ messages in thread
From: Daniel Hazelton @ 2008-04-20 15:44 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Adrian Bunk, Alan Cox, Shawn Bohrer, Ingo Molnar, Andrew Morton,
	Linux Kernel Mailing List, Arjan van de Ven, Thomas Gleixner

On Sunday 20 April 2008 08:27:14 Andi Kleen wrote:
> Adrian Bunk <bunk@kernel.org> writes:
> > 6k is known to work, and there aren't many problems known with 4k.
> >
> > And from a QA point of view the only way of getting 4k thoroughly tested
>
> But you have to first ask why do you want 4k tested? Does it serve
> any useful purpose in itself? I don't think so. Or you're saying
> it's important to support 50k kernel threads on 32bit kernels?
>
> -Andi

Andi, you're the only one I've seen seriously pounding the "50k threads" 
thing - I don't think anyone is really fooled by the straw-man, so I'd 
suggest you drop it.

The real issue is that you think (and are correct in thinking) that people are 
idiots. Yes, there will be breakages if the default is changed to 4k stacks - 
but if people are running new kernels on boxes that'll hit stack use problems 
(that *AREN'T* related to ndiswrapper) and haven't made sure that they've 
configured the kernel properly, then they deserve the outcome. It isn't the 
job of the Linux Kernel to protect the incompetent - nor is it the job of 
linux kernel developers to do such.

If people are doing a "zcat /proc/config.gz > .config && make oldconfig" (or 
similar) the problem shouldn't even appear, really. They'll get whatever 
setting was in their old config for the stack size. And until the problems 
with deep-stack setups - like nfs+xfs+raid - get resolved I'd think that the 
option to configure the stack size would remain.

Since the second-most-common reason for stack overages is ndiswrapper... Well, 
with there being so much more hardware now supported directly by the linux 
kernel... I'm stunned every time someone tells me "I can't run Linux on my 
laptop, there is hardware that isn't supported without me having to get 
ndiswrapper". The last time someone said that to me I pointed to the fact 
that their hardware is supported by the latest kernel and even offered to 
build&install it for them.

DRH

-- 
Dialup is like pissing through a pipette. Slow and excruciatingly painful.

^ permalink raw reply	[flat|nested] 162+ messages in thread

* Re: x86: 4kstacks default
  2008-04-20 15:41                           ` Arjan van de Ven
@ 2008-04-20 16:03                             ` Adrian Bunk
  2008-04-21  3:30                               ` Alexander E. Patrakov
  0 siblings, 1 reply; 162+ messages in thread
From: Adrian Bunk @ 2008-04-20 16:03 UTC (permalink / raw)
  To: Arjan van de Ven
  Cc: Eric Sandeen, Alan Cox, Shawn Bohrer, Ingo Molnar, Andrew Morton,
	Linux Kernel Mailing List, Thomas Gleixner

On Sun, Apr 20, 2008 at 08:41:27AM -0700, Arjan van de Ven wrote:
>...
> yes. Adrian is waay off in the weeds on this one. Nobody but him is suggesting to remove
> 8Kb stacks. I think everyone else agrees that having both options is valuable; and there
> are better ways to find+fix stack bloat than removing this config option.

I'm not arguing for removing the option immediately, but long-term we 
shouldn't need it.

This comes from my experience of removing obsolete drivers for hardware 
for which also a more recent driver exists:
As long as there is some workaround (e.g. using an older driver or
8k stacks), the workaround will be used instead of us getting proper 
bug reports and fixes.

As far as I know all problems that are known with 4k stacks are some 
nested things with XFS in the trace.

If this class of issues would get fixed one day, why would it be 
valuable to also offer 8k stacks long-term? Especially weighed
against the fact that with only 4k stacks we will have more people
running into stack problems in -rc kernels if any new ones pop up,
resulting in getting more such problems fixed during -rc.

cu
Adrian

-- 

       "Is there not promise of rain?" Ling Tan asked suddenly out
        of the darkness. There had been need of rain for many days.
       "Only a promise," Lao Er said.
                                       Pearl S. Buck - Dragon Seed


^ permalink raw reply	[flat|nested] 162+ messages in thread

* Re: x86: 4kstacks default
  2008-04-20 14:19                         ` Andi Kleen
@ 2008-04-20 16:41                           ` Jörn Engel
  2008-04-20 17:19                             ` Andi Kleen
  0 siblings, 1 reply; 162+ messages in thread
From: Jörn Engel @ 2008-04-20 16:41 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Willy Tarreau, Mark Lord, Adrian Bunk, Alan Cox, Shawn Bohrer,
	Ingo Molnar, Andrew Morton, Linux Kernel Mailing List,
	Arjan van de Ven, Thomas Gleixner

On Sun, 20 April 2008 16:19:29 +0200, Andi Kleen wrote:
> 
> Only if you believe that 4K stack pages are a worthy goal.
> As far as I can figure out they are not. They might have been
> a worthy goal on crappy 2.4 VMs, but these times are long gone.
> 
> The "saving memory on embedded" argument also does not 
> quite convince me, it is unclear if that is really
> a significant amount of memory on these systems and if that 
> couldn't be addressed better (e.g. by generally running
> fewer kernel threads).  I don't have numbers on this,
> but then the people who made this argument didn't have any
> either :) 

It is not uncommon for embedded systems to be designed around 16MiB.
Some may even have less, although I haven't encountered any of those
lately.

When dealing in those dimensions, savings of 100k are substantial.  In
some cases they may be the difference between 16MiB or 32MiB, which
translates to manufacturing costs.  In others it simply means that the
system can cache a bit more and run faster, or it can have a little more
functionality.

In most cases it simply allows userspace programmers to avoid looking
harder to save those 100k, as they are already saved in kernel space.
Therefore we made life hard for us in order to make life easier for
someone else, saving them time and money.

Whether that is worth it depends on your personal point of view.  Many
embedded people will claim "Hell yes!"  Of those that don't, most are
currently just ignoring mainline kernels and will regret the
development later.  They care, they just don't tend to care enough to
engage in these discussions or even know about them. :(

Jörn

-- 
Eighty percent of success is showing up.
-- Woody Allen

^ permalink raw reply	[flat|nested] 162+ messages in thread

* Re: x86: 4kstacks default
  2008-04-20  7:42           ` Adrian Bunk
@ 2008-04-20 16:59             ` Chris Wedgwood
  0 siblings, 0 replies; 162+ messages in thread
From: Chris Wedgwood @ 2008-04-20 16:59 UTC (permalink / raw)
  To: Adrian Bunk
  Cc: Eric Sandeen, Oliver Pinter, Ingo Molnar, Andrew Morton,
	Linux Kernel Mailing List, Arjan van de Ven, Thomas Gleixner,
	Christoph Hellwig, David Chinner, xfs

On Sun, Apr 20, 2008 at 10:42:28AM +0300, Adrian Bunk wrote:

> We are going from 6k to 4k.

6k?

> Why should users have to poke with such deeply internal things?
> That doesn't sound right.

they shouldn't, so a 4k default is a problem for them

> Excessive stack usage in the kernel is considered to be a bug.

define excessive

> We should identify and fix all remaining problems (if any).

let's see your patches then

^ permalink raw reply	[flat|nested] 162+ messages in thread

* Re: x86: 4kstacks default
  2008-04-20 16:41                           ` Jörn Engel
@ 2008-04-20 17:19                             ` Andi Kleen
  2008-04-20 17:43                               ` Jörn Engel
  0 siblings, 1 reply; 162+ messages in thread
From: Andi Kleen @ 2008-04-20 17:19 UTC (permalink / raw)
  To: Jörn Engel
  Cc: Willy Tarreau, Mark Lord, Adrian Bunk, Alan Cox, Shawn Bohrer,
	Ingo Molnar, Andrew Morton, Linux Kernel Mailing List,
	Arjan van de Ven, Thomas Gleixner

Jörn Engel wrote:
> On Sun, 20 April 2008 16:19:29 +0200, Andi Kleen wrote:
>> Only if you believe that 4K stack pages are a worthy goal.
>> As far as I can figure out they are not. They might have been
>> a worthy goal on crappy 2.4 VMs, but these times are long gone.
>>
>> The "saving memory on embedded" argument also does not 
>> quite convince me, it is unclear if that is really
>> a significant amount of memory on these systems and if that 
>> couldn't be addressed better (e.g. by generally running
>> fewer kernel threads).  I don't have numbers on this,
>> but then the people who made this argument didn't have any
>> either :) 
> 
> It is not uncommon for embedded systems to be designed around 16MiB.

But these are SoC systems. Do they really run x86?
(note we're talking about an x86 default option here)

Also I suspect in a true 16MB system you have to strip down
everything kernel side so much that you're pretty much outside
the "validated by testers" realm that Adrian cares about.

> When dealing in those dimensions, savings of 100k are substantial.  In
> some cases they may be the difference between 16MiB or 32MiB, which
> translates to manufacturing costs.  In others it simply means that the
> system can cache 

If you need the stack you don't have any less cache foot print.
If you don't need it you don't have any either.

-Andi

^ permalink raw reply	[flat|nested] 162+ messages in thread

* Re: x86: 4kstacks default
  2008-04-20 15:44                   ` Daniel Hazelton
@ 2008-04-20 17:26                     ` Andi Kleen
  2008-04-20 18:48                       ` Arjan van de Ven
  2008-04-22 18:20                     ` Romano Giannetti
  1 sibling, 1 reply; 162+ messages in thread
From: Andi Kleen @ 2008-04-20 17:26 UTC (permalink / raw)
  To: Daniel Hazelton
  Cc: Adrian Bunk, Alan Cox, Shawn Bohrer, Ingo Molnar, Andrew Morton,
	Linux Kernel Mailing List, Arjan van de Ven, Thomas Gleixner

Daniel Hazelton wrote:

> Andi, you're the only one I've seen seriously pounding the "50k threads" 
> thing. I don't think anyone is really fooled by the straw-man, so I'd
> suggest you drop it.

Ok, perhaps we can settle this properly. Like historians. We study the
original sources.

The primary resource is the original commit adding the 4k stack code.
You cannot find this in latest git because it predates 2.6.12, but it is
available in one of the historic trees imported from BitKeeper like
git://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.git

Here's the log:
>>
commit 95f238eac82907c4ccbc301cd5788e67db0715ce
Author: Andrew Morton <akpm@osdl.org>
Date:   Sun Apr 11 23:18:43 2004 -0700

    [PATCH] ia32: 4Kb stacks (and irqstacks) patch

    From: Arjan van de Ven <arjanv@redhat.com>

    Below is a patch to enable 4Kb stacks for x86. The goal of this is to

    1) Reduce footprint per thread so that systems can run many more threads
       (for the java people)

    2) Reduce the pressure on the VM for order > 0 allocations. We see real life
       workloads (granted with 2.4 but the fundamental fragmentation issue isn't
       solved in 2.6 and isn't solvable in theory) where this can be a problem.
       In addition order > 0 allocations can make the VM "stutter" and give more
       latency due to having to do much much more work trying to defragment

...
<<

This gives us two reasons, as you can see: one of them many threads,
and the other mostly only relevant to 2.4.
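
For readers unfamiliar with the "order" terminology in the log: an order-n
allocation is a block of 2^n physically contiguous pages.  The following
is a minimal userspace sketch of that mapping, not kernel code; the name
sketch_alloc_order is hypothetical and the page size is passed in as an
assumption.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Smallest order such that (page_size << order) >= bytes, i.e. how many
 * times a single page must be doubled to hold the request.
 */
static unsigned int sketch_alloc_order(size_t bytes, size_t page_size)
{
	assert(bytes > 0 && page_size > 0);

	size_t pages = (bytes + page_size - 1) / page_size;	/* round up */
	unsigned int order = 0;

	while (((size_t)1 << order) < pages)
		order++;

	return order;
}
```

With 4 KiB pages, a 4 KiB stack is an order-0 request (any single free
page will do), while an 8 KiB stack is an order-1 request that needs two
adjacent free pages, which is where fragmentation comes into play.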

Now I was also assuming that nobody took (1) really seriously and
attacked (2) in an earlier thread; in particular in

http://article.gmane.org/gmane.linux.kernel/665584

>>
Actually the real reason the 4K stacks were introduced IIRC was that
the VM is not very good at allocation of order > 0 pages and that only
using order 0 and not order 1 in normal operation prevented some stalls.

This rationale also goes back to 2.4 (especially some of the early 2.4
VMs were not very good) and the 2.6 VM is generally better and on
x86-64 I don't see much evidence that these stalls are a big problem
(but then x86-64 also has more lowmem).
<<

This was corrected by Ingo who was one of the primary authors of the patch:

http://thread.gmane.org/gmane.linux.kernel/665420:

>>
no, the primary motivation Arjan and me started working on 4K stacks and
implemented it was what Denys mentioned: i had a testcase that ran
50,000 threads before it ran out of memory - i wanted it to run 100,000
threads. The improved order-0 behavior was just icing on the cake.

	Ingo
<<

and then from Arjan:

http://thread.gmane.org/gmane.linux.kernel/665420

>>
> no, the primary motivation Arjan and me started working on 4K stacks
> and implemented it was what Denys mentioned: i had a testcase that

well that and the fact that RH had customers who had major issues at
fewer threads with 8Kb versus fragmentation.
<<

So both the primary authors of the patch state that 50k threads
was the main reason. I didn't believe it at first either, but after
these forceful corrections I do now.
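
The arithmetic behind that testcase can be sketched as follows.  This is
only an illustration that counts kernel stack memory and ignores every
other per-thread cost; the function name sketch_stack_total_kib is
hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Total kernel stack memory in KiB for a given thread count. */
static uint64_t sketch_stack_total_kib(uint64_t threads, uint64_t stack_kib)
{
	/* Guard against overflow of the multiplication. */
	assert(stack_kib == 0 || threads <= UINT64_MAX / stack_kib);

	return threads * stack_kib;
}
```

50,000 threads with 8 KiB stacks need 400,000 KiB (about 390 MiB) of
stack alone, and 100,000 threads with 4 KiB stacks need exactly the same
amount, so halving the stack size doubles the thread count for a given
stack budget.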

You're totally wrong when you call it a straw man.

-Andi


^ permalink raw reply	[flat|nested] 162+ messages in thread

* Re: x86: 4kstacks default
  2008-04-20 17:19                             ` Andi Kleen
@ 2008-04-20 17:43                               ` Jörn Engel
  2008-04-20 18:19                                 ` Andi Kleen
  0 siblings, 1 reply; 162+ messages in thread
From: Jörn Engel @ 2008-04-20 17:43 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Willy Tarreau, Mark Lord, Adrian Bunk, Alan Cox, Shawn Bohrer,
	Ingo Molnar, Andrew Morton, Linux Kernel Mailing List,
	Arjan van de Ven, Thomas Gleixner

On Sun, 20 April 2008 19:19:26 +0200, Andi Kleen wrote:
> 
> But these are SoC systems. Do they really run x86?
> (note we're talking about an x86 default option here)
> 
> Also I suspect in a true 16MB system you have to strip down
> everything kernel side so much that you're pretty much outside
> the "validated by testers" realm that Adrian cares about.

Maybe.  I merely showed that embedded people (not me) have good reasons
to care about small stacks.  Whether they care enough to actually spend
work on it - doubtful.

> > When dealing in those dimensions, savings of 100k are substantial.  In
> > some cases they may be the difference between 16MiB or 32MiB, which
> > translates to manufacturing costs.  In others it simply means that the
> > system can cache 
> 
> If you need the stack you don't have any less cache foot print.
> If you don't need it you don't have any either.

This part I don't understand.

Jörn

-- 
You ain't got no problem, Jules. I'm on the motherfucker. Go back in
there, chill them niggers out and wait for the Wolf, who should be
coming directly.
-- Marsellus Wallace

^ permalink raw reply	[flat|nested] 162+ messages in thread

* Re: x86: 4kstacks default
  2008-04-20 17:43                               ` Jörn Engel
@ 2008-04-20 18:19                                 ` Andi Kleen
  2008-04-20 18:50                                   ` Arjan van de Ven
                                                     ` (2 more replies)
  0 siblings, 3 replies; 162+ messages in thread
From: Andi Kleen @ 2008-04-20 18:19 UTC (permalink / raw)
  To: Jörn Engel
  Cc: Willy Tarreau, Mark Lord, Adrian Bunk, Alan Cox, Shawn Bohrer,
	Ingo Molnar, Andrew Morton, Linux Kernel Mailing List,
	Arjan van de Ven, Thomas Gleixner

Jörn Engel wrote:
> On Sun, 20 April 2008 19:19:26 +0200, Andi Kleen wrote:
>> But these are SoC systems. Do they really run x86?
>> (note we're talking about an x86 default option here)
>>
>> Also I suspect in a true 16MB system you have to strip down
>> everything kernel side so much that you're pretty much outside
>> the "validated by testers" realm that Adrian cares about.
> 
> Maybe.  I merely showed that embedded people (not me) have good reasons
> to care about small stacks. 

Sure, but I don't think they're x86 embedded people. Right now there
are very few x86 SoCs, if any (iirc there is only some obscure Rise
core), and future SoCs will likely have more RAM.

Anyway, I don't have a problem with giving these people any special options
they need to do whatever they want.  I just object to changing the
default options on important architectures to force people in completely
different setups to do part of their testing.


> Whether they care enough to actually spend
> work on it - doubtful.
> 
>>> When dealing in those dimensions, savings of 100k are substantial.  In
>>> some cases they may be the difference between 16MiB or 32MiB, which
>>> translates to manufacturing costs.  In others it simply means that the
>>> system can cache 
>> If you need the stack you don't have any less cache foot print.
>> If you don't need it you don't have any either.
> 
> This part I don't understand.

I was just objecting to your claim that small stack implies smaller
cache foot print. Smaller stacks rarely give you smaller cache foot
print in my kernel coding experience:

First, some stack is always safety margin and in practice unused. It
won't be in cache.

Then, the typical kernel stack pigs are just overly large
buffers on the stack which are not fully used. These also
don't have much cache foot print.

Or if you have a complicated call stack the typical fix
is to move parts of it into another thread. But that doesn't
give you less cache footprint because the cache foot print
is just in someone else's stack. In fact you'll likely
have slightly more cache foot print from that due to the
context of the other thread.

In theory if you e.g. convert a recursive algorithm
to iterative you might save some cache foot print, but I don't
think that really happens in kernel code.
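
As a rough illustration of the recursive-to-iterative conversion
mentioned above, here is a toy sketch in Python. It is not kernel code:
the LINKS table and both function names are made up for this example,
and the link chain is assumed to be acyclic. The recursive form needs
one frame per hop, while the loop needs a single frame however long the
chain is.

```python
# Hypothetical link -> target map standing in for a symlink chain.
LINKS = {"a": "b", "b": "c", "c": "d"}


def resolve_recursive(name, depth=1):
    """Follow links recursively; returns (final_target, frames_used)."""
    if name not in LINKS:
        return name, depth
    # Each hop adds one more frame to the call stack.
    return resolve_recursive(LINKS[name], depth + 1)


def resolve_iterative(name):
    """Follow links in a loop; returns (final_target, hops_taken)."""
    hops = 0
    while name in LINKS:
        name = LINKS[name]
        hops += 1
    # Only this one frame was ever live, regardless of chain length.
    return name, hops


if __name__ == "__main__":
    print(resolve_recursive("a"))  # ('d', 4)
    print(resolve_iterative("a"))  # ('d', 3)
```

Whether such a conversion also reduces cache footprint is a separate
question; the sketch only shows the difference in stack depth.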

-Andi

^ permalink raw reply	[flat|nested] 162+ messages in thread

* Re: x86: 4kstacks default
  2008-04-20 17:26                     ` Andi Kleen
@ 2008-04-20 18:48                       ` Arjan van de Ven
  2008-04-20 20:01                         ` Andi Kleen
  2008-04-20 21:45                         ` Andrew Morton
  0 siblings, 2 replies; 162+ messages in thread
From: Arjan van de Ven @ 2008-04-20 18:48 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Daniel Hazelton, Adrian Bunk, Alan Cox, Shawn Bohrer,
	Ingo Molnar, Andrew Morton, Linux Kernel Mailing List,
	Thomas Gleixner

On Sun, 20 Apr 2008 19:26:10 +0200
Andi Kleen <andi@firstfloor.org> wrote:

> Daniel Hazelton wrote:
> 
> > Andi, you're the only one I've seen seriously pounding the "50k
> > threads" thing. I don't think anyone is really fooled by the
> > straw-man, so I'd suggest you drop it.
> 
> Ok, perhaps we can settle this properly. Like historians. We study
> the original sources.
> 
> The primary resource is the original commit adding the 4k stack code.
> You cannot find this in latest git because it predates 2.6.12, but it
> is available in one of the historic trees imported from BitKeeper like
> git://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.git
> 
> Here's the log:
> >>
> commit 95f238eac82907c4ccbc301cd5788e67db0715ce
> Author: Andrew Morton <akpm@osdl.org>
> Date:   Sun Apr 11 23:18:43 2004 -0700
> 
>     [PATCH] ia32: 4Kb stacks (and irqstacks) patch
> 
>     From: Arjan van de Ven <arjanv@redhat.com>
> 
>     Below is a patch to enable 4Kb stacks for x86. The goal of this
> is to
> 
>     1) Reduce footprint per thread so that systems can run many more
> threads (for the java people)
> 
>     2) Reduce the pressure on the VM for order > 0 allocations. We see
> real life
>        workloads (granted with 2.4 but the fundamental fragmentation
> issue isn't
>        solved in 2.6 and isn't solvable in theory) where this can be a
> problem.
>        In addition order > 0 allocations can make the VM "stutter" and
> give more
>        latency due to having to do much much more work trying to
> defragment
> 
> ...
> <<
> 
> This gives us two reasons as you can see, one of them many threads
> and another mostly only relevant to 2.4
> 
> Now I was also assuming that nobody took (1) really serious and

I'm sorry but I really hope nobody shares your assumption here.
These are real customer workloads; java based "many things going on" at a time
showed several thousands of threads in the system (a dozen or two per request, multiplied
by the number of outstanding connections) for *real customers*.
That you don't take that seriously, fair, you can take seriously whatever you want.


> attacked (2) in earlier thread; in particular in

yes you did attack. But let's please use more friendly language here than words like
"attack". This is not a war, and we really shouldn't be hostile in this forum, neither
in words nor in intention.

> 
> http://article.gmane.org/gmane.linux.kernel/665584
> 
> >>
> Actually the real reason the 4K stacks were introduced IIRC was that
> the VM is not very good at allocation of order > 0 pages and that only
> using order 0 and not order 1 in normal operation prevented some
> stalls.
> 
> This rationale also goes back to 2.4 (especially some of the early 2.4
> VMs were not very good) and the 2.6 VM is generally better and on
> x86-64 I don't see much evidence that these stalls are a big problem
> (but then x86-64 also has more lowmem).
> <<

What you didn't atta^Waddress was the observation that fragmentation is fundamentally unsolvable.
Yes 2.4 sucked a lot more than 2.6 does. But even 2.6 will (and does) have fragmentation issues.
We don't have effective physical address based reclaim yet for higher order allocs.
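
To illustrate why free memory alone does not guarantee that an order-1
allocation (two contiguous pages) succeeds, here is a toy model in
Python. It is only a sketch: free_blocks and the two page maps are
invented for this example, and the real buddy allocator is far more
involved.

```python
def free_blocks(free, order):
    """Count naturally aligned runs of 2**order free pages.

    `free` is a list of booleans, one per page (True means free).
    """
    size = 1 << order
    return sum(
        all(free[i:i + size])
        for i in range(0, len(free) - size + 1, size)
    )


# Half of all pages are free, but never two adjacent ones.
checkerboard = [i % 2 == 0 for i in range(16)]

# The same number of free pages, all next to each other.
compact = [True] * 8 + [False] * 8

if __name__ == "__main__":
    print(free_blocks(checkerboard, 0), free_blocks(checkerboard, 1))  # 8 0
    print(free_blocks(compact, 0), free_blocks(compact, 1))            # 8 4
```

Both maps have eight free pages, so order-0 (4k stack) allocations
succeed in either case; only the compact map can satisfy an order-1
(8k stack) request.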

> 
> http://thread.gmane.org/gmane.linux.kernel/665420:
> 
> >>
> no, the primary motivation Arjan and me started working on 4K stacks
> and implemented it was what Denys mentioned: i had a testcase that ran
> 50,000 threads before it ran out of memory - i wanted it to run
> 100,000 threads. The improved order-0 behavior was just icing on the
> cake.
> 
> 	Ingo
> <<
> 
> and then from Arjan:
> 
> http://thread.gmane.org/gmane.linux.kernel/665420
> 
> >>
> > no, the primary motivation Arjan and me started working on 4K stacks
> > and implemented it was what Denys mentioned: i had a testcase that
> 
> well that and the fact that RH had customers who had major issues at
> fewer threads
> with 8Kb versus fragmentation.
> <<
> 
> So both the primary authors of the patch state that 50k threads
> was the main reason. I didn't believe it at first either, but after
> these forceful corrections I do now.

I'm sorry but I fail to entirely understand where your "So" or the rest of your 
conclusion comes from in terms of "both the authors". Which part of "fewer threads" and
"8kb versus fragmentation" did you misunderstand to get to your conclusion?

-- 
If you want to reach me at my work email, use arjan@linux.intel.com
For development, discussion and tips for power savings, 
visit http://www.lesswatts.org

^ permalink raw reply	[flat|nested] 162+ messages in thread

* Re: x86: 4kstacks default
  2008-04-20 18:19                                 ` Andi Kleen
@ 2008-04-20 18:50                                   ` Arjan van de Ven
  2008-04-20 20:09                                     ` Andi Kleen
  2008-04-20 21:50                                     ` Andrew Morton
  2008-04-20 20:32                                   ` Jörn Engel
  2008-04-20 20:35                                   ` Jörn Engel
  2 siblings, 2 replies; 162+ messages in thread
From: Arjan van de Ven @ 2008-04-20 18:50 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Jörn Engel, Willy Tarreau, Mark Lord, Adrian Bunk, Alan Cox,
	Shawn Bohrer, Ingo Molnar, Andrew Morton,
	Linux Kernel Mailing List, Thomas Gleixner

On Sun, 20 Apr 2008 20:19:30 +0200
Andi Kleen <andi@firstfloor.org> wrote:

> In theory if you e.g. convert a recursive algorithm
> to iterative you might save some cache foot print, but I don't
> think that really happens in kernel code.
> 

this is what Al did for the symlink recursion thing, and Jens did for the block layer...
so yes this conversion does happen for real.

-- 
If you want to reach me at my work email, use arjan@linux.intel.com
For development, discussion and tips for power savings, 
visit http://www.lesswatts.org

^ permalink raw reply	[flat|nested] 162+ messages in thread

* Re: x86: 4kstacks default
  2008-04-20 18:48                       ` Arjan van de Ven
@ 2008-04-20 20:01                         ` Andi Kleen
  2008-04-20 20:43                           ` Daniel Hazelton
                                             ` (2 more replies)
  2008-04-20 21:45                         ` Andrew Morton
  1 sibling, 3 replies; 162+ messages in thread
From: Andi Kleen @ 2008-04-20 20:01 UTC (permalink / raw)
  To: Arjan van de Ven
  Cc: Daniel Hazelton, Adrian Bunk, Alan Cox, Shawn Bohrer,
	Ingo Molnar, Andrew Morton, Linux Kernel Mailing List,
	Thomas Gleixner



> These are real customer workloads; java based "many things going on" at a time
> showed several thousands of threads in the system (a dozen or two per request, multiplied
> by the number of outstanding connections) for *real customers*.

Several thousands or 50k? Several thousands sounds large, but not entirely unreasonable, 
but it is far from 50k.

> That you don't take that serious, fair, you can take serious whatever you want.

No, I don't take 50k threads on 32bit seriously. And I hope you do not
either.

Why I don't take it seriously: on 32bit, 50k threads will lead
to lowmem exhaustion if the threads are actually doing something 
(like keeping select pages around or similar and having some thread
local data). You'll easily be at 16-32K/thread and that is already 
far beyond the lowmem available on any 3:1 split 32bit kernel, likely 
even beyond 2:2. Even with 3:1 it could be tight.

So you can say about customer workloads what you want, but you'll
have a hard time convincing me they really run 50k threads 
doing something on 32bit.

Now if we take the realistic overhead of a thread into
account, 4k more or less doesn't really matter all that much,
and the decreased safety from the 4k stack starts to look
like a very bad bargain.

>> attacked (2) in earlier thread; in particular in
> 
> yes you did attack. 
> But lets please use more friendly conversation here than words like
> "attack". This is not a war, and we really shouldn't be hostile in this forum, neither
> in words nor in intention.

Ok what word would you prefer? 

There is no war involved, right, just a technical argument. I previously
always assumed that "attacking" was a standard term in discussions, but
if you don't like it I can switch to another one.

Regarding war like terminology: I used to think that people who commonly 
talk about "nuking code" went a little too far, but at some point
I adapted to them I think. Perhaps it comes from that.


> What you didn't atta^Waddress 

Fine, I will call it address from now on.

> was the observation that fragmentation is fundamentally unsolvable.

Where was that observation? 

> Yes 2.4 sucked a lot more than 2.6 does. But even 2.6 will (and does) have fragmentation issues.
> We don't have effective physical address based reclaim yet for higher order allocs.

I don't see any evidence that there are serious order 1 fragmentation 
issues on 2.6. If you have any please post it.

-Andi

^ permalink raw reply	[flat|nested] 162+ messages in thread

* Re: x86: 4kstacks default
  2008-04-20 18:50                                   ` Arjan van de Ven
@ 2008-04-20 20:09                                     ` Andi Kleen
  2008-04-20 21:50                                     ` Andrew Morton
  1 sibling, 0 replies; 162+ messages in thread
From: Andi Kleen @ 2008-04-20 20:09 UTC (permalink / raw)
  To: Arjan van de Ven
  Cc: Jörn Engel, Willy Tarreau, Mark Lord, Adrian Bunk, Alan Cox,
	Shawn Bohrer, Ingo Molnar, Andrew Morton,
	Linux Kernel Mailing List, Thomas Gleixner

Arjan van de Ven wrote:

> this is what Al did for the symlink recursion thing, 

AFAIK most symlink lookups are still recursive.

-Andi

^ permalink raw reply	[flat|nested] 162+ messages in thread

* Re: x86: 4kstacks default
  2008-04-20 18:19                                 ` Andi Kleen
  2008-04-20 18:50                                   ` Arjan van de Ven
@ 2008-04-20 20:32                                   ` Jörn Engel
  2008-04-20 20:35                                   ` Jörn Engel
  2 siblings, 0 replies; 162+ messages in thread
From: Jörn Engel @ 2008-04-20 20:32 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Willy Tarreau, Mark Lord, Adrian Bunk, Alan Cox, Shawn Bohrer,
	Ingo Molnar, Andrew Morton, Linux Kernel Mailing List,
	Arjan van de Ven, Thomas Gleixner

On Sun, 20 April 2008 20:19:30 +0200, Andi Kleen wrote:
> > 
> >>> When dealing in those dimensions, savings of 100k are substantial.  In
> >>> some cases they may be the difference between 16MiB and 32MiB, which
> >>> translates to manufacturing costs.  In others it simply means that the
> >>> system can cache 
> >> If you need the stack you don't have any less cache foot print.
> >> If you don't need it you don't have any either.
> > 
> > This part I don't understand.
> 
> I was just objecting to your claim that small stack implies smaller
> cache foot print.

The cache I referred to is called DRAM, not L1.

Jörn

-- 
Don't worry about people stealing your ideas. If your ideas are any good,
you'll have to ram them down people's throats.
-- Howard Aiken quoted by Ken Iverson quoted by Jim Horning quoted by
   Raph Levien, 1979

^ permalink raw reply	[flat|nested] 162+ messages in thread

* Re: x86: 4kstacks default
  2008-04-20 18:19                                 ` Andi Kleen
  2008-04-20 18:50                                   ` Arjan van de Ven
  2008-04-20 20:32                                   ` Jörn Engel
@ 2008-04-20 20:35                                   ` Jörn Engel
  2 siblings, 0 replies; 162+ messages in thread
From: Jörn Engel @ 2008-04-20 20:35 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Willy Tarreau, Mark Lord, Adrian Bunk, Alan Cox, Shawn Bohrer,
	Ingo Molnar, Andrew Morton, Linux Kernel Mailing List,
	Arjan van de Ven, Thomas Gleixner

On Sun, 20 April 2008 20:19:30 +0200, Andi Kleen wrote:
> 
> Sure but I don't think they're x86 embedded people. Right now there
> are very little x86 SOCs if any (iirc there is only some obscure rise
> core) and future SOCs will likely have more RAM.
> 
> Anyways I don't have a problem to give these people any special options
> they need to do whatever they want.  I just object to changing the
> default options on important architectures to force people in completely
> different setups to do part of their testing.

Ah, ok.  The question whether 4k stacks should become the default I
prefer not touching with an 80' pole.

Jörn

-- 
Why do musicians compose symphonies and poets write poems?
They do it because life wouldn't have any meaning for them if they didn't.
That's why I draw cartoons.  It's my life.
-- Charles Schulz

^ permalink raw reply	[flat|nested] 162+ messages in thread

* Re: x86: 4kstacks default
  2008-04-20 20:01                         ` Andi Kleen
@ 2008-04-20 20:43                           ` Daniel Hazelton
  2008-04-20 21:40                             ` Andi Kleen
  2008-04-20 22:33                           ` Arjan van de Ven
  2008-04-20 22:33                           ` Arjan van de Ven
  2 siblings, 1 reply; 162+ messages in thread
From: Daniel Hazelton @ 2008-04-20 20:43 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Arjan van de Ven, Adrian Bunk, Alan Cox, Shawn Bohrer,
	Ingo Molnar, Andrew Morton, Linux Kernel Mailing List,
	Thomas Gleixner

On Sunday 20 April 2008 16:01:46 Andi Kleen wrote:
> > These are real customer workloads; java based "many things going on" at a
> > time showed several thousands of threads in the system (a dozen or two
> > per request, multiplied by the number of outstanding connections) for
> > *real customers*.
>
> Several thousands or 50k? Several thousands sounds large, but not entirely
> unreasonable, but it is far from 50k.
>

At 12 threads per request it'd only take about 4200 outstanding requests. That 
is high, but I can see it happening. At 24 threads per request the number of 
outstanding requests it takes to reach that is cut in half, to about 2100. 
That number is more realistic. Since all outstanding requests aren't going to 
be at the extremes, let us assume that it's a mid-point between the two for 
the number of outstanding requests - say somewhere around 3150 outstanding 
requests.

While that is a rather high number, if a company - a decently sized one - is 
using a piece of Java code internally for some reason they could easily have 
that level of requests coming in from the users. For a website with a decent 
load that routes a common request to the machine running the code it'd be 
even easier to hit that limit. So yes, 50K threads *IS* actually pretty easy 
to reach and could be a common workload.
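
The request counts above can be re-derived with a few lines of Python.
This is just the arithmetic from the preceding paragraphs; the function
name is made up for the example, and the 12 and 24 threads per request
are the "dozen or two" quoted earlier.

```python
def outstanding_requests(total_threads, threads_per_request):
    """How many in-flight requests it takes to reach total_threads."""
    return total_threads / threads_per_request


high = outstanding_requests(50_000, 12)  # fewer threads per request
low = outstanding_requests(50_000, 24)   # more threads per request
mid = (low + high) / 2

if __name__ == "__main__":
    print(round(high), round(low), round(mid))  # 4167 2083 3125
```

These come out close to the rounded figures of 4200, 2100 and 3150 used
in the text.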

> > That you don't take that serious, fair, you can take serious whatever you
> > want.
>
> No I don't take 50k threads on 32bit serious. And I hope you do not
> either.

That just makes you sound foolish. Run the numbers yourself and you'll see that
it is easy for a machine running highly threaded code to hit 50K threads.

<snip>
> > Yes 2.4 sucked a lot more than 2.6 does. But even 2.6 will (and does)
> > have fragmentation issues. We don't have effective physical address based
> > reclaim yet for higher order allocs.
>
> I don't see any evidence that there are serious order 1 fragmentation
> issues on 2.6. If you have any please post it.

Due to me screwing up the configuration of Apache (2) and MySQL I have seen a 
machine I own hit problems with memory fragmentation - and it's running a 2.6 
series kernel (a distro 2.6.17)

Because I was able to see that it was a problem I caused I didn't even *THINK* 
about posting information about it to LKML. I didn't keep the logs of that 
around - it happened more than three months ago and I clean the logs out 
every three months or so.

DRH

-- 
Dialup is like pissing through a pipette. Slow and excruciatingly painful.

^ permalink raw reply	[flat|nested] 162+ messages in thread

* Re: x86: 4kstacks default
  2008-04-20 20:43                           ` Daniel Hazelton
@ 2008-04-20 21:40                             ` Andi Kleen
  2008-04-20 22:17                               ` Bernd Eckenfels
  2008-04-21  1:45                               ` Daniel Hazelton
  0 siblings, 2 replies; 162+ messages in thread
From: Andi Kleen @ 2008-04-20 21:40 UTC (permalink / raw)
  To: Daniel Hazelton
  Cc: Arjan van de Ven, Adrian Bunk, Alan Cox, Shawn Bohrer,
	Ingo Molnar, Andrew Morton, Linux Kernel Mailing List,
	Thomas Gleixner

Daniel Hazelton wrote:

> At 12 threads per request it'd only take about 4200 outstanding requests. That 
> is high, but I can see it happening.

If it happens it just won't work on 32bit.

> Just makes you sound foolish. Run the numbers yourself and you'll see that it 
> is easy for a machine running highly threaded code to easily hit 50K threads.

I ran the numbers and the numbers showed that you need > 1.5GB of lowmem
with a somewhat realistic scenario (32K per thread) at 50k threads. And
subtracting 4k from that 32k number won't make any significant
difference (still 1.3GB).

If you claim that works on a 32bit system with typically 300-600MB
lowmem available (which is also shared by other subsystems) I know who
sounds foolish.
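
For reference, the lowmem figures above can be reproduced as follows.
This is a minimal sketch in Python: the function name is invented here,
the 32K per thread is the estimate from this message, and the result is
expressed in binary units (GiB), which is an assumption about how the
numbers were rounded.

```python
KIB = 1024
GIB = 1024 ** 3


def pinned_lowmem_gib(threads, per_thread_kib):
    """Lowmem pinned by `threads` threads at `per_thread_kib` KiB each."""
    return threads * per_thread_kib * KIB / GIB


with_8k_stacks = pinned_lowmem_gib(50_000, 32)  # about 1.53
with_4k_stacks = pinned_lowmem_gib(50_000, 28)  # about 1.34

if __name__ == "__main__":
    print(f"{with_8k_stacks:.2f} GiB vs {with_4k_stacks:.2f} GiB")
```

Under these assumptions the 4k saving is 4/32, or 12.5%, of the
per-thread footprint.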

-Andi

^ permalink raw reply	[flat|nested] 162+ messages in thread

* Re: x86: 4kstacks default
  2008-04-20 18:48                       ` Arjan van de Ven
  2008-04-20 20:01                         ` Andi Kleen
@ 2008-04-20 21:45                         ` Andrew Morton
  2008-04-20 21:51                           ` Andi Kleen
  1 sibling, 1 reply; 162+ messages in thread
From: Andrew Morton @ 2008-04-20 21:45 UTC (permalink / raw)
  To: Arjan van de Ven
  Cc: andi, dhazelton, bunk, alan, shawn.bohrer, mingo, linux-kernel, tglx

> On Sun, 20 Apr 2008 11:48:45 -0700 Arjan van de Ven <arjan@infradead.org> wrote:
> We don't have effective physical address based reclaim yet for higher order allocs.

Lumpy reclaim is supposed to be exactly that.

^ permalink raw reply	[flat|nested] 162+ messages in thread

* Re: x86: 4kstacks default
  2008-04-20 18:50                                   ` Arjan van de Ven
  2008-04-20 20:09                                     ` Andi Kleen
@ 2008-04-20 21:50                                     ` Andrew Morton
  2008-04-20 21:55                                       ` Andi Kleen
  2008-04-21 14:29                                       ` Ingo Molnar
  1 sibling, 2 replies; 162+ messages in thread
From: Andrew Morton @ 2008-04-20 21:50 UTC (permalink / raw)
  To: Arjan van de Ven
  Cc: andi, joern, w, lkml, bunk, alan, shawn.bohrer, mingo,
	linux-kernel, tglx

> On Sun, 20 Apr 2008 11:50:53 -0700 Arjan van de Ven <arjan@infradead.org> wrote:
> On Sun, 20 Apr 2008 20:19:30 +0200
> Andi Kleen <andi@firstfloor.org> wrote:
> 
> > In theory if you e.g. convert a recursive algorithm
> > to iterative you might save some cache foot print, but I don't
> > think that really happens in kernel code.
> > 
> 
> this is what Al did for the symlink recursion thing, and Jens did for the block layer...
> so yes this conversion does happen for real.

md got mostly-fixed too, via Neil's patch which sat in -mm for nearly two
years.


^ permalink raw reply	[flat|nested] 162+ messages in thread

* Re: x86: 4kstacks default
  2008-04-20 21:45                         ` Andrew Morton
@ 2008-04-20 21:51                           ` Andi Kleen
  0 siblings, 0 replies; 162+ messages in thread
From: Andi Kleen @ 2008-04-20 21:51 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Arjan van de Ven, dhazelton, bunk, alan, shawn.bohrer, mingo,
	linux-kernel, tglx

Andrew Morton wrote:
>> On Sun, 20 Apr 2008 11:48:45 -0700 Arjan van de Ven <arjan@infradead.org> wrote:
>> We don't have effective physical address based reclaim yet for higher order allocs.
> 
> Lumpy reclaim is supposed to be exactly that.

Also if order 1 allocs were a significant problem on i386 we would have
had lots of reports of EAGAIN on fork/clone with !4k stack kernels. I'm
not aware of a significant number of such reports (there were a few
occasionally, but that is probably normal and unavoidable and can
be caused by other things too, like simply running out of lowmem).

-Andi


^ permalink raw reply	[flat|nested] 162+ messages in thread

* Re: x86: 4kstacks default
  2008-04-20 21:50                                     ` Andrew Morton
@ 2008-04-20 21:55                                       ` Andi Kleen
  2008-04-21 14:29                                       ` Ingo Molnar
  1 sibling, 0 replies; 162+ messages in thread
From: Andi Kleen @ 2008-04-20 21:55 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Arjan van de Ven, joern, w, lkml, bunk, alan, shawn.bohrer,
	mingo, linux-kernel, tglx

Andrew Morton wrote:
>> On Sun, 20 Apr 2008 11:50:53 -0700 Arjan van de Ven <arjan@infradead.org> wrote:
>> On Sun, 20 Apr 2008 20:19:30 +0200
>> Andi Kleen <andi@firstfloor.org> wrote:
>>
>>> In theory if you e.g. convert a recursive algorithm
>>> to iterative you might save some cache foot print, but I don't
>>> think that really happens in kernel code.
>>>
>> this is what Al did for the symlink recursion thing, and Jens did for the block layer...
>> so yes this conversion does happen for real.
> 
> md got mostly-fixed too, via Neil's patch which sat in -mm for nearly two
> years.

Congratulations, you found three examples in 8.4MLOC.

Ok ok I should have said it only happens very rarely (I still stand by
that :)

Anyway, it is moot because it was a miscommunication between me and Jörn.

-Andi



^ permalink raw reply	[flat|nested] 162+ messages in thread

* Re: x86: 4kstacks default
  2008-04-20 21:40                             ` Andi Kleen
@ 2008-04-20 22:17                               ` Bernd Eckenfels
  2008-04-20 23:48                                 ` Avi Kivity
  2008-04-21  1:45                               ` Daniel Hazelton
  1 sibling, 1 reply; 162+ messages in thread
From: Bernd Eckenfels @ 2008-04-20 22:17 UTC (permalink / raw)
  To: linux-kernel

In article <480BB85A.6020508@firstfloor.org> you wrote:
> If you claim that works on a 32bit system with typically 300-600MB
> lowmem available (which is also shared by other subsystem) I know who
> sounds foolish.

A question along these lines: why is the userspace thread bound to a
kernel-space stack at all? I could imagine a solution like stack pools
assigned only if a thread enters kernel space, or something like this.

Gruss
Bernd

^ permalink raw reply	[flat|nested] 162+ messages in thread

* Re: x86: 4kstacks default
  2008-04-20 20:01                         ` Andi Kleen
  2008-04-20 20:43                           ` Daniel Hazelton
@ 2008-04-20 22:33                           ` Arjan van de Ven
  2008-04-20 22:33                           ` Arjan van de Ven
  2 siblings, 0 replies; 162+ messages in thread
From: Arjan van de Ven @ 2008-04-20 22:33 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Daniel Hazelton, Adrian Bunk, Alan Cox, Shawn Bohrer,
	Ingo Molnar, Andrew Morton, Linux Kernel Mailing List,
	Thomas Gleixner

On Sun, 20 Apr 2008 22:01:46 +0200
Andi Kleen <andi@firstfloor.org> wrote:

> 
> 
> > These are real customer workloads; java based "many things going
> > on" at a time showed several thousands of threads in the system (a
> > dozen or two per request, multiplied by the number of outstanding
> > connections) for *real customers*.
> 
> Several thousands or 50k? Several thousands sounds large, but not
> entirely unreasonable, but it is far from 50k.

it is you who keeps putting up the 50k argument.
What I'm talking about is in the 10k to 20k range; and that is actual workloads
by real customers.
> 
> > That you don't take that serious, fair, you can take serious
> > whatever you want.
> 
> No I don't take 50k threads on 32bit serious. And I hope you do not
> either.

[ removed a bunch of stuff about 50k again ]

> 
> > was the observation that fragmentation is fundamentally unsolvable.
> 
> Where was that observation? 

it was in the commit message from me you quoted, and was rather widely discussed at the time.
It's also basic math; the Linux VM gets to deal with both short and long lasting allocations;
no matter how hard you try, you get some degree of fragmentation; especially due to the
15:1 acceleration you get due to the lowmem issue.

And before you say "you should use 64 bit on such machines"; I would love it if more people used 64 bit linux.
Sadly the adoption rate of that is not very good still.... by far ;(

> 
> > Yes 2.4 sucked a lot more than 2.6 does. But even 2.6 will (and
> > does) have fragmentation issues. We don't have effective physical
> > address based reclaim yet for higher order allocs.
> 
> I don't see any evidence that there are serious order 1 fragmentation 
> issues on 2.6. 

I assume you're not asking me to give you customer confidential data from a previous job in public ;)

>If you have any please post it.

just like you're posting the evidence that 4k stacks overflow?

Google scores:

1-order allocation failed		54000 pages
do_IRQ: stack overflow			4560 pages


-- 
If you want to reach me at my work email, use arjan@linux.intel.com
For development, discussion and tips for power savings, 
visit http://www.lesswatts.org

^ permalink raw reply	[flat|nested] 162+ messages in thread

* Re: x86: 4kstacks default
  2008-04-20  2:36         ` Eric Sandeen
  2008-04-20  6:11           ` Arjan van de Ven
@ 2008-04-20 22:53           ` David Chinner
  1 sibling, 0 replies; 162+ messages in thread
From: David Chinner @ 2008-04-20 22:53 UTC (permalink / raw)
  To: Eric Sandeen
  Cc: Arjan van de Ven, Shawn Bohrer, Ingo Molnar, Andrew Morton,
	Linux Kernel Mailing List, Thomas Gleixner

On Sat, Apr 19, 2008 at 09:36:16PM -0500, Eric Sandeen wrote:
> Arjan van de Ven wrote:
> 
> > On the flipside the arguments tend to be
> > 1) certain stackings of components still run the risk of overflowing
> > 2) I want to run ndiswrapper
> > 3) general, unspecified uneasiness.
> > 
> > For 1), we need to know which they are, and then solve them, because even on x86-64 with 8k stacks
> > they can be a problem (just because the stack frames are bigger, although not quite double, there).
> 
> Except, apparently, not, at least in my experience.
> 
> Ask the xfs guys if they see stack overflows on x86_64, or on x86.

We see them regularly enough on x86 to know that the first question
to any strange crash is "are you using 4k stacks?". In comparison,
I have never heard of a single stack overflow on x86_64....

> I've personally never seen common stack problems with xfs on x86_64, but
> it's very common on x86.  I don't have a great answer for why, but
> that's my anecdotal evidence.

Why? Because XFS makes extensive use of 64 bit types and so stack
usage in the critical paths changes by a relatively small amount
between 32 bit and 64 bit machines.  IIRC, x86_64 only uses about
30% more stack than x86. So given that the stack doubles on x86_64
and we only increase usage (in XFS) from about 1500 bytes to 2000
bytes of stack usage, we have *lots* more stack space to spare on
x86_64 compared to 4k stacks on x86....
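
As a back-of-the-envelope check of the headroom argument, using only
the figures quoted above (roughly 1500 bytes of XFS stack usage on x86
and 2000 bytes on x86_64), here is a small Python sketch. The function
name is made up, and the model is a simplification that ignores
everything else living on the stack.

```python
def headroom(stack_bytes, xfs_bytes):
    """Stack left over for everything that is not XFS."""
    return stack_bytes - xfs_bytes


x86_4k = headroom(4096, 1500)     # 2596 bytes left
x86_64_8k = headroom(8192, 2000)  # 6192 bytes left

if __name__ == "__main__":
    print(x86_4k, x86_64_8k)            # 2596 6192
    print(round(2000 / 1500 - 1, 2))    # 0.33, i.e. about a third more
```

So under these numbers x86_64 has well over twice the spare stack of a
4k-stack x86 kernel, even though its frames are somewhat larger.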

Cheers,

Dave.
-- 
Dave Chinner
Principal Engineer
SGI Australian Software Group

^ permalink raw reply	[flat|nested] 162+ messages in thread

* Re: x86: 4kstacks default
  2008-04-20 22:33                           ` Arjan van de Ven
@ 2008-04-20 23:16                             ` Andi Kleen
  2008-04-21  5:53                               ` Arjan van de Ven
  2008-04-21  3:06                             ` Eric Sandeen
  1 sibling, 1 reply; 162+ messages in thread
From: Andi Kleen @ 2008-04-20 23:16 UTC (permalink / raw)
  To: Arjan van de Ven
  Cc: Daniel Hazelton, Adrian Bunk, Alan Cox, Shawn Bohrer,
	Ingo Molnar, Andrew Morton, Linux Kernel Mailing List,
	Thomas Gleixner

Arjan van de Ven <arjan@infradead.org> writes:
>
> it is you who keeps putting up the 50k argument.

See the links I posted and quoted in an earlier message up the thread if you
don't remember what you wrote yourself.

I originally only held up the fragmentation argument (or rather only
argued against it), until I was corrected by both Ingo and you in the
earlier thread and you both insisted that 50k threads were the real
raison d'être for 4k stacks.

You're saying that was wrong and the fragmentation issue was actually the
real reason for 4k stacks? If both you and Ingo can agree on that
I would be happy to forget the 50k threads :)

> What I'm talking about is in the 10k to 20k range; and that is actual workloads
> by real customers.

On a 32bit kernel? 

My estimate is that you need around 32k for a functional blocked thread
in a network server (8k + 2*4k for poll with large fd table and wait queues + 
some pinned dentries and inodes + misc other stuff). With 20k you're 625MB into
your lowmem which leaves about 200MB left on a 3:1 system with 16GB 
(and ~128MB mem_map).  That might work for some time, but I expect it will fall
over at some point because there is just too much pinned lowmem
and not enough left for other stuff (like networking buffers etc.) 
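
The 625MB figure follows directly from the 32k-per-thread estimate. A small sketch of the arithmetic, which also shows what shaving 4k off each thread would save. The 32k per-thread cost is the estimate from the paragraph above; the helper name and the choice of thread counts are illustrative only.

```python
# Pinned lowmem per blocked thread, using the ~32k estimate above.
KIB = 1024
MIB = 1024 * KIB

def pinned_lowmem_mib(threads, per_thread_kib):
    """Total lowmem pinned by `threads` blocked threads, in MiB."""
    return threads * per_thread_kib * KIB / MIB

for threads in (10_000, 20_000, 50_000):
    with_8k = pinned_lowmem_mib(threads, 32)        # 8k stack included
    with_4k = pinned_lowmem_mib(threads, 32 - 4)    # same thread, 4k stack
    print("%6d threads: %7.1f MiB with 8k stacks, %7.1f MiB with 4k stacks"
          % (threads, with_8k, with_4k))
```

At 20k threads the difference between the two stack sizes is about 78 MiB out of 625 MiB, i.e. one eighth of the per-thread cost.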

10k sounds more doable. But again, does 4k more or less make
a big difference with the other thread overhead? I don't think so.

And trading reliability (and functionality -- you basically have to
cut off XFS) just for 4k/thread doesn't seem like a good bargain to
me. Especially with kernel code getting more complicated all the time.

>> I don't see any evidence that there are serious order 1 fragmentation 
>> issues on 2.6. 
>
> I assume you're not asking me to give you customer confidential data from a previous job in public ;)

Well if it is that serious a problem surely it will have hit some public
bugzillas or mailing lists?  Arguing with something secret is also not 
very useful.

Also I find it always important to reevaluate assumptions when new 
facts come up. In this case we should reevaluate a decision that made
sense[1] in 2.4 with the new facts of 2.6 (e.g. new VM with much better
reclaim)

[1] referring to the fragmentation argument, not the 50k threads which
were always unrealistic.

-Andi


* Re: x86: 4kstacks default
  2008-04-20 22:17                               ` Bernd Eckenfels
@ 2008-04-20 23:48                                 ` Avi Kivity
  0 siblings, 0 replies; 162+ messages in thread
From: Avi Kivity @ 2008-04-20 23:48 UTC (permalink / raw)
  To: Bernd Eckenfels; +Cc: linux-kernel

Bernd Eckenfels wrote:
> A question along this line. Why is the Userspace Thread bound to a
> Kernel-Space Stack at all? I could imagine a solution like Stack Pools
> assigned only if a Thread enters kernel space, or something like this?
>
>   

The vast majority of threads are sleeping (with a stack footprint in the 
kernel).  If you have an N-way system, at most N threads can be in 
userspace at any given moment.

You could multiplex several userspace threads on one kernel thread (the 
M:N model), but it gets fairly complex.

-- 
Do not meddle in the internals of kernels, for they are subtle and quick to panic.



* Re: x86: 4kstacks default
  2008-04-20 21:40                             ` Andi Kleen
  2008-04-20 22:17                               ` Bernd Eckenfels
@ 2008-04-21  1:45                               ` Daniel Hazelton
  2008-04-21  7:51                                 ` Andi Kleen
  1 sibling, 1 reply; 162+ messages in thread
From: Daniel Hazelton @ 2008-04-21  1:45 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Arjan van de Ven, Adrian Bunk, Alan Cox, Shawn Bohrer,
	Ingo Molnar, Andrew Morton, Linux Kernel Mailing List,
	Thomas Gleixner

On Sunday 20 April 2008 17:40:42 Andi Kleen wrote:
> Daniel Hazelton wrote:
> > At 12 threads per request it'd only take about 4200 outstanding requests.
> > That is high, but I can see it happening.
>
> If it happens it just won't work on 32bit.

No, it won't. Which is what I was pointing out. You're hitting a different 
wall there.

> > Just makes you sound foolish. Run the numbers yourself and you'll see
> > that it is easy for a machine running highly threaded code to easily hit
> > 50K threads.
>
> I ran the numbers and the numbers showed that you need > 1.5GB of lowmem
> with a somewhat realistic scenario (32K per thread) at 50k threads. And
> subtracting 4k from that 32k number won't make any significant
> difference (still 1.3GB)
>
> If you claim that works on a 32bit system with typically 300-600MB
> lowmem available (which is also shared by other subsystem) I know who
> sounds foolish.

Never said it worked on a 32bit system. I was pointing out that there can be 
workloads that do reach that 50K thread-count that you seem to be 
calling "stupid". 

As I pointed out later in the message, I *HAVE* run into lowmem starvation on 
a 32bit x86 system. You thoughtfully removed this, perhaps because you felt 
it damaged your argument. The machine in question is an old P3 box with less 
than 1G of memory in it. (Phys+Swap on that machine is only about 1.4G)

So yes, on a 32bit machine you run into problems at much, much less of a 
workload and a much lower thread-count than the magic 50K you are so fond of 
talking about. If I had been running 4K stacks on that machine I probably 
would have survived the mis-configuration without the reboot it took to make 
the machine functional again - I probably would still have reconfigured 
Apache and MySQL, though - the machine still would have gone largely 
unresponsive.

DRH

--
Dialup is like pissing through a pipette. Slow and excruciatingly painful.


* Re: x86: 4kstacks default
  2008-04-20 22:33                           ` Arjan van de Ven
  2008-04-20 23:16                             ` Andi Kleen
@ 2008-04-21  3:06                             ` Eric Sandeen
  1 sibling, 0 replies; 162+ messages in thread
From: Eric Sandeen @ 2008-04-21  3:06 UTC (permalink / raw)
  To: Arjan van de Ven
  Cc: Andi Kleen, Daniel Hazelton, Adrian Bunk, Alan Cox, Shawn Bohrer,
	Ingo Molnar, Andrew Morton, Linux Kernel Mailing List,
	Thomas Gleixner

Arjan van de Ven wrote:

> Google scores:
> 
> 1-order allocation failed		54000 pages
> do_IRQ: stack overflow			4560 pages

with quotes for exact matches:

"1-order allocation failed"	790 pages
"do_IRQ: stack overflow"	1,880 pages

http://www.google.com/search?q=%221-order+allocation+failed%22
http://www.google.com/search?q=%22do_IRQ%3A+stack+overflow%22

-Eric


* Re: x86: 4kstacks default
  2008-04-20 16:03                             ` Adrian Bunk
@ 2008-04-21  3:30                               ` Alexander E. Patrakov
  2008-04-23  8:57                                 ` Helge Hafting
  0 siblings, 1 reply; 162+ messages in thread
From: Alexander E. Patrakov @ 2008-04-21  3:30 UTC (permalink / raw)
  To: Adrian Bunk
  Cc: Arjan van de Ven, Eric Sandeen, Alan Cox, Shawn Bohrer,
	Ingo Molnar, Andrew Morton, Linux Kernel Mailing List,
	Thomas Gleixner

Adrian Bunk wrote:
> On Sun, Apr 20, 2008 at 08:41:27AM -0700, Arjan van de Ven wrote:
>> ...
>> yes. Adrian is waay off in the weeds on this one. Nobody but him is suggesting to remove
>> 8Kb stacks. I think everyone else agrees that having both options is valuable; and there
>> are better ways to find+fix stack bloat than removing this config option.
> 
> I'm not arguing for removing the option immediately, but long-term we 
> shouldn't need it.
> 
> This comes from my experience of removing obsolete drivers for hardware 
> for which also a more recent driver exists:
> As long as there is some workaround (e.g. using an older driver or
> 8k stacks) the workaround will be used instead of getting proper 
> bug reports and fixes.
> 
> As far as I know all problems that are known with 4k stacks are some 
> nested things with XFS in the trace.

This "as far as I know" is a problem itself. Is it possible to implement (e.g., 
using some form of memory protection in hardware, but I am not an expert here) 
an option with 8k stacks that, however, spams the log if the actual usage goes 
above 4k, and have this as a default for some time? If 4k stacks are the goal 
that is almost achieved, then this debugging option should have zero impact on 
performance.
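
One software-only way to approximate this idea, without the hardware memory protection asked about above, is stack poisoning: fill the 8k stack with a known pattern, then look for the deepest byte that has been overwritten. The sketch below is a user-space simulation in Python, not kernel code; STACK_SIZE, WARN_AT, POISON and all helper names are hypothetical and chosen only for illustration.

```python
# Simulated high-water-mark check on an 8k stack (not kernel code).
STACK_SIZE = 8 * 1024   # allocated stack
WARN_AT = 4 * 1024      # threshold we would like to fit under
POISON = 0xAA           # fill pattern for untouched stack bytes

def new_stack():
    """A fresh stack, fully poisoned."""
    return bytearray([POISON]) * STACK_SIZE

def use_stack(stack, depth):
    """Simulate a call chain writing `depth` bytes; the stack grows
    downward from the top (highest index)."""
    for i in range(STACK_SIZE - depth, STACK_SIZE):
        stack[i] = 0x00

def high_water_mark(stack):
    """Deepest usage seen so far: distance from the first overwritten
    byte to the top of the stack."""
    for i, byte in enumerate(stack):
        if byte != POISON:
            return STACK_SIZE - i
    return 0

def check(stack):
    """Return a warning string if usage ever exceeded WARN_AT."""
    used = high_water_mark(stack)
    if used > WARN_AT:
        return "warning: stack usage %d bytes exceeds %d" % (used, WARN_AT)
    return None

stack = new_stack()
use_stack(stack, 3000)
print(check(stack))     # within the 4k budget, no warning
use_stack(stack, 5000)
print(check(stack))     # deeper call chain trips the warning
```

A scan like this only runs when something asks for it, so it costs nothing on the normal call path; the trade-off is that it reports the overrun after the fact and misses writes that happen to store the poison value.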

-- 
Alexander E. Patrakov


* Re: x86: 4kstacks default
  2008-04-20 23:16                             ` Andi Kleen
@ 2008-04-21  5:53                               ` Arjan van de Ven
  0 siblings, 0 replies; 162+ messages in thread
From: Arjan van de Ven @ 2008-04-21  5:53 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Daniel Hazelton, Adrian Bunk, Alan Cox, Shawn Bohrer,
	Ingo Molnar, Andrew Morton, Linux Kernel Mailing List,
	Thomas Gleixner

On Mon, 21 Apr 2008 01:16:22 +0200
Andi Kleen <andi@firstfloor.org> wrote:

> Arjan van de Ven <arjan@infradead.org> writes:
> >
> > it is you who keeps putting up the 50k argument.
> 
> See the links I posted and quoted in an earlier message up the thread
> if you don't remember what you wrote yourself.
> 
> I originally only held up the fragmentation argument (or rather only
> argued against it), until I was corrected by both Ingo and you in the
> earlier thread and you both insisted that 50k threads were the real
> raison d'être for 4k stacks.
> 
> You're saying that was wrong and the fragmentation issue was really
> the real reason for 4k stacks? If both you and Ingo can agree on that
> I would be happy to forget the 50k threads :)

I already corrected you misquoting/misunderstanding me; should I do this again?

> 
> > What I'm talking about is in the 10k to 20k range; and that is
> > actual workloads by real customers.
> 
> On a 32bit kernel? 
> 
> My estimate is that you need around 32k for a functional blocked
> thread in a network server (8k + 2*4k for poll with large fd table
> and wait queues + some pinned dentries and inodes + misc other
> stuff). With 20k you're 625MB into your lowmem which leaves about
> 200MB left on a 3:1 system with 16GB (and ~128MB mem_map).  That
> might work for some time, but I expect it will fall over at some
> point because there is just too much pinned lowmem and not enough
> left for other stuff (like networking buffers etc.) 
> 
> 10k sounds more doable. But again do 4k more or less make 
> a big difference with the other thread overhead? I don't think so.

no but the other ones are order 0..
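
The distinction matters because an 8k stack on a machine with 4k pages is an order-1 allocation: two physically contiguous pages, whereas the other per-thread allocations are single pages. A simplified model of why that can fail while plenty of memory is free is sketched below. It assumes buddy-style natural alignment of blocks; the function name and the 1024-page memory are illustrative only.

```python
# Simplified model of order-N page allocation under fragmentation.
# Assumption: a block of 2**order pages must be contiguous and
# aligned to its own size, as in a buddy allocator.

def can_allocate(free_pages, order):
    """True if some aligned run of 2**order free pages exists."""
    size = 1 << order
    for start in range(0, len(free_pages) - size + 1, size):
        if all(free_pages[start:start + size]):
            return True
    return False

# Worst-case fragmentation: every other page is free.
fragmented = [i % 2 == 0 for i in range(1024)]

print("free pages:      ", sum(fragmented))
print("order-0 possible:", can_allocate(fragmented, 0))
print("order-1 possible:", can_allocate(fragmented, 1))
```

Half of the memory is free here, yet no 8k stack can be allocated, while a 4k stack always can be as long as any page is free.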

-- 
If you want to reach me at my work email, use arjan@linux.intel.com
For development, discussion and tips for power savings, 
visit http://www.lesswatts.org


* Re: x86: 4kstacks default
  2008-04-20 14:05                         ` Eric Sandeen
  2008-04-20 14:21                           ` Adrian Bunk
  2008-04-20 15:41                           ` Arjan van de Ven
@ 2008-04-21  7:45                           ` Denys Vlasenko
  2008-04-21  9:55                             ` Andi Kleen
  2008-04-21 13:29                             ` Eric Sandeen
  2 siblings, 2 replies; 162+ messages in thread
From: Denys Vlasenko @ 2008-04-21  7:45 UTC (permalink / raw)
  To: Eric Sandeen
  Cc: Adrian Bunk, Alan Cox, Shawn Bohrer, Ingo Molnar, Andrew Morton,
	Linux Kernel Mailing List, Arjan van de Ven, Thomas Gleixner

On Sunday 20 April 2008 16:05, Eric Sandeen wrote:
> Adrian Bunk wrote:
> > But the more users will get 4k stacks the more testing we have, and the 
> > better both existing and new bugs get shaken out.
> > 
> > And if there were only 4k stacks in the vanilla kernel, and therefore 
> > all people on i386 testing -rc kernels would get it, that would give a 
> > better chance of finding stack regressions before they get into a 
> > stable kernel.
> 
> Heck, maybe you should make it 2k by default in all -rc kernels; that
> way when people run -final with the 4k it'll be 100% bulletproof, right?
>  'cause all those piggy drivers that blow a 2k stack will finally have
> to get fixed?  Or leave it at 2k and find a way to share pages for
> stacks, think how much memory you could save and how many java threads
> you could run!
> 
> 4K just happens to be the page size; other than that it's really just
> some random/magic number picked, and now dictated that if you (and
> everything around you) doesn't fit, you're broken.

Some number has to be picked. Why fitting in 4k is "bad" and fitting
in 8k is "not bad"?

Look what happens when this number is too big: Windows is "generous",
and as a result Windows drivers routinely need 12k, sometimes 16k of stack.
We know it from ndiswrapper. We don't want to go that way, right?

Forget about 50k threads. 4k of waste per process is a waste nevertheless.
It's not at all unusual to have 250+ processes, and 250 processes with 8k
stack each waste 1M. Do you think extra 1M won't be useful to have?

It seems that 4k works for everybody sans xfs. Making it work took some effort,
but it is already done. Why not use it after all?

And since i386 is such a common architecture, other 32-bit arches will be
relieved from the burden of hunting down stack overflows which happen
only on those arches. (For example, different ABI or different gcc behavior
may make $OTHER_ARCH slightly more stack-greedy). God knows non-mainstream
arches have enough problems already.
--
vda


* Re: x86: 4kstacks default
  2008-04-21  1:45                               ` Daniel Hazelton
@ 2008-04-21  7:51                                 ` Andi Kleen
  2008-04-21 17:34                                   ` Daniel Hazelton
  0 siblings, 1 reply; 162+ messages in thread
From: Andi Kleen @ 2008-04-21  7:51 UTC (permalink / raw)
  To: Daniel Hazelton
  Cc: Andi Kleen, Arjan van de Ven, Adrian Bunk, Alan Cox,
	Shawn Bohrer, Ingo Molnar, Andrew Morton,
	Linux Kernel Mailing List, Thomas Gleixner

> Never said it worked on a 32bit system. I was pointing out that there can be 
> workloads that do reach 

Ah your point was that people might do this on 64bit systems? 

They could indeed. It would not be very efficient but it should work
in theory at least with enough memory. Of course they don't need 4k
stacks for it. They can also try it on 32bit and it will work
to some extent too, just not scale very far.  And 4k stack more or less
won't make much difference for that because the stack is only
a small part of the lowmem needed for a blocked thread with
open sockets.

But this thread clearly was about 32bit systems only.

> that 50K thread-count that you seem to be 
> calling "stupid". 

Note I didn't come up with that number, it was quoted to me earlier
(but one of its authors has distanced himself from it now, so it
seems to be becoming more and more irrelevant indeed now)

Stupid in this case just refers to the general observation that
it is quite inefficient to do one thread per request on servers
that are expected to process lots of long running connections.

Perhaps I could have put that better I will give you that. Please
assume I always meant "inefficient" when I wrote "stupid".

> talking about. If I had been running 4K stacks on that machine I probably 
> would have survived the mis-configuration without the reboot it took to make 

Now that is a very doubtful claim. You realize that a functional network
server thread needs a lot more lowmem than just the stack? 

-Andi



* Re: x86: 4kstacks default
  2008-04-21  7:45                           ` Denys Vlasenko
@ 2008-04-21  9:55                             ` Andi Kleen
  2008-04-21 13:29                             ` Eric Sandeen
  1 sibling, 0 replies; 162+ messages in thread
From: Andi Kleen @ 2008-04-21  9:55 UTC (permalink / raw)
  To: Denys Vlasenko
  Cc: Eric Sandeen, Adrian Bunk, Alan Cox, Shawn Bohrer, Ingo Molnar,
	Andrew Morton, Linux Kernel Mailing List, Arjan van de Ven,
	Thomas Gleixner

Denys Vlasenko <vda.linux@googlemail.com> writes:
>
> Forget about 50k threads. 4k of waste per process is a waste nevertheless.
> It's not at all unusual to have 250+ processes, and 250 processes with 8k
> stack each waste 1M. Do you think extra 1M won't be useful to have?

If the 1M gives you more reliability (and I think it does) I don't 
think it is "wasted". Would you trade occasional crashes for 1MB? 
I wouldn't.

Also a typical process uses much more memory than just 4K. If it's
not a thread it needs its own page tables and from those alone you're
easily into 10+ pages even for a quite small process. But even threads
in practice have other overheads too if they actually do something.
The 4K won't save or break you.

[BTW if you're really interested in saving memory there are lots
of other subsystems where you could very likely save more. A common
example is the standard hash tables, which are still too big]

The trends are also against it: kernel code is getting more and more
complex all the time with more and more complicated stacks of
different subsystems on top of each other. It wouldn't surprise me if
at some point 8KB isn't even enough anymore. Going into the
other direction is definitely the wrong way.

-Andi


* Re: x86: 4kstacks default
  2008-04-21  7:45                           ` Denys Vlasenko
  2008-04-21  9:55                             ` Andi Kleen
@ 2008-04-21 13:29                             ` Eric Sandeen
  2008-04-21 19:51                               ` Denys Vlasenko
  1 sibling, 1 reply; 162+ messages in thread
From: Eric Sandeen @ 2008-04-21 13:29 UTC (permalink / raw)
  To: Denys Vlasenko
  Cc: Adrian Bunk, Alan Cox, Shawn Bohrer, Ingo Molnar, Andrew Morton,
	Linux Kernel Mailing List, Arjan van de Ven, Thomas Gleixner

Denys Vlasenko wrote:
> On Sunday 20 April 2008 16:05, Eric Sandeen wrote:
>> Adrian Bunk wrote:
>>> But the more users will get 4k stacks the more testing we have, and the 
>>> better both existing and new bugs get shaken out.
>>>
>>> And if there were only 4k stacks in the vanilla kernel, and therefore 
>>> all people on i386 testing -rc kernels would get it, that would give a 
>>> better chance of finding stack regressions before they get into a 
>>> stable kernel.
>> Heck, maybe you should make it 2k by default in all -rc kernels; that
>> way when people run -final with the 4k it'll be 100% bulletproof, right?
>>  'cause all those piggy drivers that blow a 2k stack will finally have
>> to get fixed?  Or leave it at 2k and find a way to share pages for
>> stacks, think how much memory you could save and how many java threads
>> you could run!
>>
>> 4K just happens to be the page size; other than that it's really just
>> some random/magic number picked, and now dictated that if you (and
>> everything around you) doesn't fit, you're broken.
> 
> Some number has to be picked. Why fitting in 4k is "bad" and fitting
> in 8k is "not bad"?


Because well-written code in several subsystems, used in combination in
common configurations, does not always fit, that is why.

Show me the "bug" in an nfs+xfs+md+scsi writeback stack oops and I'm
sure it'll get "fixed."  But if it's simply complex code that happens to
need >4k, I will continue to argue that the limited stack size selection
is the problem, not the code running in it.

Perhaps not surprisingly, ext4, which is significantly more complex than
ext3, has many more individual functions > 100 bytes than ext3 has.  As
others have said, there is no trend towards smaller, simpler, less
interesting, and less functional code which fits in a smaller and
smaller footprint in the general case.

If someone has a workload and configuration which happens to fit in 4k
then turn it on, test the heck out of it, and have fun.  I've not seen
what I consider to be a convincing argument for making it the default
for everyone.

-Eric


* Re: x86: 4kstacks default
  2008-04-20 21:50                                     ` Andrew Morton
  2008-04-20 21:55                                       ` Andi Kleen
@ 2008-04-21 14:29                                       ` Ingo Molnar
  1 sibling, 0 replies; 162+ messages in thread
From: Ingo Molnar @ 2008-04-21 14:29 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Arjan van de Ven, andi, joern, w, lkml, bunk, alan, shawn.bohrer,
	linux-kernel, tglx


* Andrew Morton <akpm@linux-foundation.org> wrote:

> > On Sun, 20 Apr 2008 11:50:53 -0700 Arjan van de Ven <arjan@infradead.org> wrote:
> > On Sun, 20 Apr 2008 20:19:30 +0200
> > Andi Kleen <andi@firstfloor.org> wrote:
> > 
> > > In theory if you e.g. convert a recursive algorithm
> > > to iterative you might save some cache foot print, but I don't
> > > think that really happens in kernel code.
> > > 
> > 
> > this is what Al did for the symlink recursion thing, and Jens did 
> > for the block layer... so yes this conversion does happen for real.
> 
> md got mostly-fixed too, via Neil's patch which sat in -mm for nearly 
> two years.

had we done the de-obfuscate-4K-stacks Kconfig change earlier it might 
have gotten upstream faster.

	Ingo


* Re: x86: 4kstacks default
  2008-04-20  3:29     ` Eric Sandeen
  2008-04-20 12:36       ` Andi Kleen
@ 2008-04-21 14:31       ` Ingo Molnar
  1 sibling, 0 replies; 162+ messages in thread
From: Ingo Molnar @ 2008-04-21 14:31 UTC (permalink / raw)
  To: Eric Sandeen
  Cc: Andrew Morton, Linux Kernel Mailing List, Arjan van de Ven,
	Thomas Gleixner


* Eric Sandeen <sandeen@sandeen.net> wrote:

> > and we've conducted tens of thousands of bootup tests with all sorts 
> > of drivers and kernel options enabled and have yet to see a single 
> > crash due to 4K stacks.
> 
> Really, not one?
> 
> https://bugzilla.redhat.com/show_bug.cgi?id=247158
> https://bugzilla.redhat.com/show_bug.cgi?id=227331
> https://bugzilla.redhat.com/show_bug.cgi?id=240077
> 
> (hehe, ok, xfs is a common component there...)
> 
> and it's not always obvious that you've overflowed the stack.
> 
> CONFIG_DEBUG_STACKOVERFLOW isn't very useful because the warning printk 
> it generates uses the remaining amount of stack, and tips the box.

note that in -rt we have an ftrace plugin that measures _precise_ stack 
footprint, when it happens.

so it's possible to measure exact stack footprint and save a stack trace 
when that happens.

	Ingo


* Re: x86: 4kstacks default
  2008-04-21  7:51                                 ` Andi Kleen
@ 2008-04-21 17:34                                   ` Daniel Hazelton
  0 siblings, 0 replies; 162+ messages in thread
From: Daniel Hazelton @ 2008-04-21 17:34 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Arjan van de Ven, Adrian Bunk, Alan Cox, Shawn Bohrer,
	Ingo Molnar, Andrew Morton, Linux Kernel Mailing List,
	Thomas Gleixner

On Monday 21 April 2008 03:51:02 Andi Kleen wrote:
> > Never said it worked on a 32bit system. I was pointing out that there can
> > be workloads that do reach
>
> Ah your point was that people might do this on 64bit systems?

My point was that people might try to make such a system work on a 32bit 
system and fail. The fact that the limit does exist and changing the stack 
size doesn't really help things is a key there.

My point is that you can get a few more threads out of a machine with 4K 
stacks, even on 32bit. Sure, the difference is basically negligible, but it 
does happen. That extra available space may be the difference between a 
poorly coded program triggering random crashes (and the OOM killer) and the 
system surviving it. 

While it's true that I feel that the job of the kernel isn't to protect the 
incompetent, it should protect the competent admins from the incompetent 
developers (and middle management).

> They could indeed. It would not be very efficient but it should work
> in theory at least with enough memory. Of course they don't need 4k
> stacks for it. They can also try it on 32bit and it will work
> to some extent too, just not scale very far.  And 4k stack more or less
> won't make much difference for that because the stack is only
> a small part of the lowmem needed for a blocked thread with
> open sockets.

True. But having that tiny bit of extra memory might be the difference between 
a crash and a somewhat memory starved but surviving system.

> But this thread clearly was about 32bit systems only.

I didn't say otherwise. I was pointing out that 50K threads isn't out of the 
question when looking at the workload provided (and ignoring all other memory 
concerns).

However, I had hoped I wouldn't have to spell out the stuff I've had to point 
out in this mail.

> > that 50K thread-count that you seem to be
> > calling "stupid".
>
> Note I didn't come up with that number, it was quoted to me earlier
> (but one of its authors has distanced himself from it now, so it
> seems to be becoming more and more irrelevant indeed now)

Yes, I know you didn't come up with it. But in seeing the original commit-log 
for it, I'm thinking that the '50K' number was initially meant as either a 
small joke or a dream of a maximum.

> Stupid in this case just refers to the general observation that
> it is quite inefficient to do one thread per request on servers
> that are expected to process lots of long running connections.

Remember, you're talking about people that write the code in Java. It's going 
to spawn all kinds of threads anyway. I, personally, would write the code in 
a language giving me better control over the available resources. However, 
I'm not employed by any major company because I will almost always refuse to 
work on a project if it's being done in an inefficient manner.

> Perhaps I could have put that better I will give you that. Please
> assume I always meant "inefficient" when I wrote "stupid".

In that case I agree. It is very inefficient to do things that way.

> > talking about. If I had been running 4K stacks on that machine I probably
> > would have survived the mis-configuration without the reboot it took to
> > make
>
> Now that is a very doubtful claim. You realize that a functional network
> server thread needs a lot more lowmem than just the stack?

There was nothing else running on the machine and it was reporting lowmem free 
in the logs, just none "usable". Since the two biggest hogs on that box are 
Apache2 and MySQL - and since repairing the Apache2 config damage has halted 
further OOM's on that machine, I'm pretty much certain that it was Apache2 at 
fault, though since there were reports of free lowmem, I'm pretty certain it 
was a combination of fragmentation and Apache2.

DRH


-- 
Dialup is like pissing through a pipette. Slow and excruciatingly painful.


* Re: x86: 4kstacks default
  2008-04-21 13:29                             ` Eric Sandeen
@ 2008-04-21 19:51                               ` Denys Vlasenko
  2008-04-21 20:28                                 ` Denys Vlasenko
  2008-04-22  1:28                                 ` David Chinner
  0 siblings, 2 replies; 162+ messages in thread
From: Denys Vlasenko @ 2008-04-21 19:51 UTC (permalink / raw)
  To: Eric Sandeen
  Cc: Adrian Bunk, Alan Cox, Shawn Bohrer, Ingo Molnar, Andrew Morton,
	Linux Kernel Mailing List, Arjan van de Ven, Thomas Gleixner

On Monday 21 April 2008 15:29, Eric Sandeen wrote:
> > Some number has to be picked. Why fitting in 4k is "bad" and fitting
> > in 8k is "not bad"?
> 
> 
> Because well-written code in several subsystems, used in combination in
> common configurations, does not always fit, that is why.
> 
> Show me the "bug" in an nfs+xfs+md+scsi writeback stack oops

Why nfs+xfs+md+ide works? Does scsi intrinsically require more stack
than ide?

Why xfs code is said to be 5 timed bigged than e.g. reiserfs?
Does it have to be that big? Does it really have to eat lots of stack?

> and I'm 
> sure it'll get "fixed."  But if it's simply complex code that happens to
> need >4k, I will continue to argue that the limited stack size selection
> is the problem, not the code running in it.

8k stack is limited too. Other Operating System, no doubt in the name
of better stability, has even larger stack (16k or more).

For what it's worth, I do realize that there is a point of diminishing
returns and increased pain when one tries to reduce stack usage.
I feel that 4k for 32-bit x86 is not too painful. IMO, of course.

> If someone has a workload and configuration which happens to fit in 4k
> then turn it on, test the heck out of it, and have fun.  I've not seen
> what I consider to be a convincing argument for making it the default
> for everyone.

Conversely:

"If someone is strongly concerned about possibility of stack overflow,
then turn on 8k option, and enjoy the benefits of wide testing which
is provided by millions of people who run 4k stacks. If _that_ works
ok in practice, 8k _ought_ to be 100.00% safe versus stack overflow".

These threads about 4k stacks seem to degenerate into ping-ponging
of these arguments again and again.
--
vda


* Re: x86: 4kstacks default
  2008-04-21 19:51                               ` Denys Vlasenko
@ 2008-04-21 20:28                                 ` Denys Vlasenko
  2008-04-22  1:28                                 ` David Chinner
  1 sibling, 0 replies; 162+ messages in thread
From: Denys Vlasenko @ 2008-04-21 20:28 UTC (permalink / raw)
  To: Eric Sandeen
  Cc: Adrian Bunk, Alan Cox, Shawn Bohrer, Ingo Molnar, Andrew Morton,
	Linux Kernel Mailing List, Arjan van de Ven, Thomas Gleixner

On Monday 21 April 2008 21:51, Denys Vlasenko wrote:
> Why xfs code is said to be 5 timed bigged than e.g. reiserfs?

s/timed bigged/times bigger/


* Re: x86: 4kstacks default
  2008-04-21 19:51                               ` Denys Vlasenko
  2008-04-21 20:28                                 ` Denys Vlasenko
@ 2008-04-22  1:28                                 ` David Chinner
  2008-04-22  2:33                                   ` [PATCH] xfs: do not pass size into kmem_free, it's unused Denys Vlasenko
                                                     ` (2 more replies)
  1 sibling, 3 replies; 162+ messages in thread
From: David Chinner @ 2008-04-22  1:28 UTC (permalink / raw)
  To: Denys Vlasenko
  Cc: Eric Sandeen, Adrian Bunk, Alan Cox, Shawn Bohrer, Ingo Molnar,
	Andrew Morton, Linux Kernel Mailing List, Arjan van de Ven,
	Thomas Gleixner

On Mon, Apr 21, 2008 at 09:51:02PM +0200, Denys Vlasenko wrote:
> On Monday 21 April 2008 15:29, Eric Sandeen wrote:
> > > Some number has to be picked. Why fitting in 4k is "bad" and fitting
> > > in 8k is "not bad"?
> > 
> > 
> > Because well-written code in several subsystems, used in combination in
> > common configurations, does not always fit, that is why.
> > 
> > Show me the "bug" in an nfs+xfs+md+scsi writeback stack oops
> 
> Why nfs+xfs+md+ide works?

Luck?

With 4k stacks, you really don't need NFS at all - you just have to
enter memory reclaim at the wrong time (i.e. when something else
was already consuming 2/3rds of the 4k stack).
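
A quick sketch of that scenario in numbers. The two-thirds figure is from the paragraph above, and the roughly 1500 bytes is the XFS stack usage on 32-bit x86 mentioned elsewhere in this thread; treating both as exact, and ignoring everything else on the stack, is a simplification.

```python
# Entering memory reclaim with 2/3 of a 4k stack already consumed.
STACK = 4 * 1024
XFS_PATH = 1500                      # approximate XFS usage on 32-bit x86

already_used = STACK * 2 // 3        # "2/3rds of the 4k stack"
remaining = STACK - already_used
fits = remaining >= XFS_PATH

print("already used:", already_used)
print("remaining:   ", remaining)
print("XFS path fits:", fits)

# Same starting depth on an 8k stack, for comparison.
remaining_8k = 8 * 1024 - already_used
print("remaining on 8k:", remaining_8k, "fits:", remaining_8k >= XFS_PATH)
```

With these assumptions the writeback path is short by more than a hundred bytes on a 4k stack and has several kilobytes to spare on an 8k one.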

> Does scsi intrinsically require more stack than ide?

<shrug>

> Why xfs code is said to be 5 timed bigged than e.g. reiserfs?
> Does it have to be that big?

If we cut the bulkstat code out, the handle interface, the
preallocation, the journalled quota, the delayed allocation, all the
runtime validation, the shutdown code, the debug code, the tracing
code, etc, then we might get down to the same size as reiser....

> Does it really have to eat lots of stack?

Writeback is done under ENOMEM pressure, and XFS can't provide the
guarantees mempools need to work. That leaves the stack as the only
place we can put the things we need. e.g. the args structures that
tell the allocator what to do and retain state between subsequent
low level allocation calls use ~250 bytes of stack just by
themselves....

We've already chopped off the low hanging fruit, added noinline to
every function definition to prevent compiler heuristics from
blowing out stack usage by 25% and reduced use of temporary
variables as much as possible. There's very little fat left to trim,
and still we can't reliably fit in 4k stacks.

Patches are welcome - I'd be over the moon if any of the known 4k
stack advocates sent a stack reduction patch for XFS, but it seems
that actually trying to fix the problems is much harder than
resending a one line patch every few months....

Cheers,

Dave.
-- 
Dave Chinner
Principal Engineer
SGI Australian Software Group

^ permalink raw reply	[flat|nested] 162+ messages in thread

* [PATCH] xfs: do not pass size into kmem_free, it's unused
  2008-04-22  1:28                                 ` David Chinner
@ 2008-04-22  2:33                                   ` Denys Vlasenko
  2008-04-22  3:03                                     ` [PATCH] xfs: do not pass unused params to xfs_flush_pages Denys Vlasenko
                                                       ` (2 more replies)
  2008-04-22 12:48                                   ` x86: 4kstacks default Denys Vlasenko
  2008-04-27 19:27                                   ` Jörn Engel
  2 siblings, 3 replies; 162+ messages in thread
From: Denys Vlasenko @ 2008-04-22  2:33 UTC (permalink / raw)
  To: David Chinner
  Cc: Eric Sandeen, Adrian Bunk, Alan Cox, Shawn Bohrer, Ingo Molnar,
	Andrew Morton, Linux Kernel Mailing List, Arjan van de Ven,
	Thomas Gleixner

[-- Attachment #1: Type: text/plain, Size: 744 bytes --]

Hi David,

> Patches are welcome - I'd be over the moon if any of the known 4k
> stack advocates sent a stack reduction patch for XFS, but it seems
> that actually trying to fix the problems is much harder than
> resending a one line patch every few months....

The kmem_free() function takes (ptr, size) arguments but doesn't
actually use the second one.

This patch removes the size argument from the function and all callsites.
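
The shape of the change can be sketched in userspace as follows, with
free() standing in for the kernel allocator. The names free_with_size(),
free_plain() and demo() are hypothetical and only illustrate the
before/after signatures: with the old form every caller computes and
passes a size that the callee throws away.

```c
#include <stdlib.h>

/* Old shape: the size is accepted but never used. */
static void free_with_size(void *ptr, size_t size)
{
	(void)size;	/* ignored, yet every caller had to compute it */
	free(ptr);
}

/* New shape: only the pointer is needed. */
static void free_plain(void *ptr)
{
	free(ptr);
}

/* Both calls release memory the same way; the first also evaluates
 * n * sizeof(*a) and passes it as an extra argument. */
static int demo(size_t n)
{
	int *a = calloc(n, sizeof(*a));
	int *b = calloc(n, sizeof(*b));

	if (!a || !b) {
		free(a);
		free(b);
		return -1;
	}
	free_with_size(a, n * sizeof(*a));
	free_plain(b);
	return 0;
}
```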

Code size difference on 32-bit x86:

# size */fs/xfs/xfs.o
   text    data     bss     dec     hex filename
 391271    2748    1708  395727   609cf linux-2.6-xfs0-TEST/fs/xfs/xfs.o
 390739    2748    1708  395195   607bb linux-2.6-xfs1-TEST/fs/xfs/xfs.o

Compile-tested only.

Signed-off-by: Denys Vlasenko <vda.linux@googlemail.com>
--
vda

[-- Attachment #2: xfs-kmem_free-remove_unused_param.patch --]
[-- Type: text/x-diff, Size: 28392 bytes --]

--- linux-2.6-xfs0/fs/xfs/linux-2.6/kmem.c	Tue Apr 22 03:56:05 2008
+++ linux-2.6-xfs1/fs/xfs/linux-2.6/kmem.c	Tue Apr 22 04:13:50 2008
@@ -90,7 +90,7 @@
 }
 
 void
-kmem_free(void *ptr, size_t size)
+kmem_free(void *ptr)
 {
 	if (!is_vmalloc_addr(ptr)) {
 		kfree(ptr);
@@ -110,7 +110,7 @@
 		if (new)
 			memcpy(new, ptr,
 				((oldsize < newsize) ? oldsize : newsize));
-		kmem_free(ptr, oldsize);
+		kmem_free(ptr);
 	}
 	return new;
 }
--- linux-2.6-xfs0/fs/xfs/linux-2.6/kmem.h	Tue Apr 22 03:56:05 2008
+++ linux-2.6-xfs1/fs/xfs/linux-2.6/kmem.h	Tue Apr 22 04:12:09 2008
@@ -58,7 +58,7 @@
 extern void *kmem_zalloc(size_t, unsigned int __nocast);
 extern void *kmem_zalloc_greedy(size_t *, size_t, size_t, unsigned int __nocast);
 extern void *kmem_realloc(void *, size_t, size_t, unsigned int __nocast);
-extern void  kmem_free(void *, size_t);
+extern void  kmem_free(void *);
 
 /*
  * Zone interfaces
--- linux-2.6-xfs0/fs/xfs/linux-2.6/xfs_buf.c	Tue Apr 22 03:56:05 2008
+++ linux-2.6-xfs1/fs/xfs/linux-2.6/xfs_buf.c	Tue Apr 22 04:14:06 2008
@@ -310,8 +310,7 @@
 	xfs_buf_t	*bp)
 {
 	if (bp->b_pages != bp->b_page_array) {
-		kmem_free(bp->b_pages,
-			  bp->b_page_count * sizeof(struct page *));
+		kmem_free(bp->b_pages);
 	}
 }
 
@@ -1382,7 +1381,7 @@
 xfs_free_bufhash(
 	xfs_buftarg_t		*btp)
 {
-	kmem_free(btp->bt_hash, (1<<btp->bt_hashshift) * sizeof(xfs_bufhash_t));
+	kmem_free(btp->bt_hash);
 	btp->bt_hash = NULL;
 }
 
@@ -1428,7 +1427,7 @@
 	xfs_unregister_buftarg(btp);
 	kthread_stop(btp->bt_task);
 
-	kmem_free(btp, sizeof(*btp));
+	kmem_free(btp);
 }
 
 STATIC int
@@ -1559,7 +1558,7 @@
 	return btp;
 
 error:
-	kmem_free(btp, sizeof(*btp));
+	kmem_free(btp);
 	return NULL;
 }
 
--- linux-2.6-xfs0/fs/xfs/linux-2.6/xfs_super.c	Tue Apr 22 03:56:05 2008
+++ linux-2.6-xfs1/fs/xfs/linux-2.6/xfs_super.c	Tue Apr 22 04:14:19 2008
@@ -1074,7 +1074,7 @@
 			list_del(&work->w_list);
 			if (work == &mp->m_sync_work)
 				continue;
-			kmem_free(work, sizeof(struct bhv_vfs_sync_work));
+			kmem_free(work);
 		}
 	}
 
@@ -1222,7 +1222,7 @@
 	error = xfs_parseargs(mp, options, args, 1);
 	if (!error)
 		error = xfs_mntupdate(mp, flags, args);
-	kmem_free(args, sizeof(*args));
+	kmem_free(args);
 	return -error;
 }
 
@@ -1370,7 +1370,7 @@
 
 	xfs_itrace_exit(XFS_I(sb->s_root->d_inode));
 
-	kmem_free(args, sizeof(*args));
+	kmem_free(args);
 	return 0;
 
 fail_vnrele:
@@ -1385,7 +1385,7 @@
 	xfs_unmount(mp, 0, NULL);
 
 fail_vfsop:
-	kmem_free(args, sizeof(*args));
+	kmem_free(args);
 	return -error;
 }
 
--- linux-2.6-xfs0/fs/xfs/quota/xfs_dquot_item.c	Tue Apr 22 03:56:05 2008
+++ linux-2.6-xfs1/fs/xfs/quota/xfs_dquot_item.c	Tue Apr 22 04:13:40 2008
@@ -566,8 +566,8 @@
 	 * xfs_trans_delete_ail() drops the AIL lock.
 	 */
 	xfs_trans_delete_ail(qfs->qql_item.li_mountp, (xfs_log_item_t *)qfs);
-	kmem_free(qfs, sizeof(xfs_qoff_logitem_t));
-	kmem_free(qfe, sizeof(xfs_qoff_logitem_t));
+	kmem_free(qfs);
+	kmem_free(qfe);
 	return (xfs_lsn_t)-1;
 }
 
--- linux-2.6-xfs0/fs/xfs/quota/xfs_qm.c	Tue Apr 22 03:56:05 2008
+++ linux-2.6-xfs1/fs/xfs/quota/xfs_qm.c	Tue Apr 22 04:13:03 2008
@@ -192,8 +192,8 @@
 		xfs_qm_list_destroy(&(xqm->qm_usr_dqhtable[i]));
 		xfs_qm_list_destroy(&(xqm->qm_grp_dqhtable[i]));
 	}
-	kmem_free(xqm->qm_usr_dqhtable, hsize * sizeof(xfs_dqhash_t));
-	kmem_free(xqm->qm_grp_dqhtable, hsize * sizeof(xfs_dqhash_t));
+	kmem_free(xqm->qm_usr_dqhtable);
+	kmem_free(xqm->qm_grp_dqhtable);
 	xqm->qm_usr_dqhtable = NULL;
 	xqm->qm_grp_dqhtable = NULL;
 	xqm->qm_dqhashmask = 0;
@@ -201,7 +201,7 @@
 #ifdef DEBUG
 	mutex_destroy(&qcheck_lock);
 #endif
-	kmem_free(xqm, sizeof(xfs_qm_t));
+	kmem_free(xqm);
 }
 
 /*
@@ -1133,7 +1133,7 @@
 	 * and change the superblock accordingly.
 	 */
 	if ((error = xfs_qm_init_quotainos(mp))) {
-		kmem_free(qinf, sizeof(xfs_quotainfo_t));
+		kmem_free(qinf);
 		mp->m_quotainfo = NULL;
 		return error;
 	}
@@ -1247,7 +1247,7 @@
 		qi->qi_gquotaip = NULL;
 	}
 	mutex_destroy(&qi->qi_quotaofflock);
-	kmem_free(qi, sizeof(xfs_quotainfo_t));
+	kmem_free(qi);
 	mp->m_quotainfo = NULL;
 }
 
@@ -1624,7 +1624,7 @@
 			break;
 	} while (nmaps > 0);
 
-	kmem_free(map, XFS_DQITER_MAP_SIZE * sizeof(*map));
+	kmem_free(map);
 
 	return error;
 }
--- linux-2.6-xfs0/fs/xfs/quota/xfs_qm_syscalls.c	Tue Apr 22 03:56:05 2008
+++ linux-2.6-xfs1/fs/xfs/quota/xfs_qm_syscalls.c	Tue Apr 22 04:13:32 2008
@@ -1447,14 +1447,14 @@
 		for (d = (xfs_dqtest_t *) h1->qh_next; d != NULL; ) {
 			xfs_dqtest_cmp(d);
 			e = (xfs_dqtest_t *) d->HL_NEXT;
-			kmem_free(d, sizeof(xfs_dqtest_t));
+			kmem_free(d);
 			d = e;
 		}
 		h1 = &qmtest_gdqtab[i];
 		for (d = (xfs_dqtest_t *) h1->qh_next; d != NULL; ) {
 			xfs_dqtest_cmp(d);
 			e = (xfs_dqtest_t *) d->HL_NEXT;
-			kmem_free(d, sizeof(xfs_dqtest_t));
+			kmem_free(d);
 			d = e;
 		}
 	}
@@ -1465,8 +1465,8 @@
 	} else {
 		cmn_err(CE_DEBUG, "******** quotacheck successful! ********");
 	}
-	kmem_free(qmtest_udqtab, qmtest_hashmask * sizeof(xfs_dqhash_t));
-	kmem_free(qmtest_gdqtab, qmtest_hashmask * sizeof(xfs_dqhash_t));
+	kmem_free(qmtest_udqtab);
+	kmem_free(qmtest_gdqtab);
 	mutex_unlock(&qcheck_lock);
 	return (qmtest_nfails);
 }
--- linux-2.6-xfs0/fs/xfs/support/ktrace.c	Tue Apr 22 03:56:05 2008
+++ linux-2.6-xfs1/fs/xfs/support/ktrace.c	Tue Apr 22 04:13:47 2008
@@ -85,7 +85,7 @@
 		if (sleep & KM_SLEEP)
 			panic("ktrace_alloc: NULL memory on KM_SLEEP request!");
 
-		kmem_free(ktp, sizeof(*ktp));
+		kmem_free(ktp);
 
 		return NULL;
 	}
@@ -120,7 +120,7 @@
 	} else {
 		entries_size = (int)(ktp->kt_nentries * sizeof(ktrace_entry_t));
 
-		kmem_free(ktp->kt_entries, entries_size);
+		kmem_free(ktp->kt_entries);
 	}
 
 	kmem_zone_free(ktrace_hdr_zone, ktp);
--- linux-2.6-xfs0/fs/xfs/xfs_attr_leaf.c	Tue Apr 22 03:56:05 2008
+++ linux-2.6-xfs1/fs/xfs/xfs_attr_leaf.c	Tue Apr 22 04:16:04 2008
@@ -555,7 +555,7 @@
 out:
 	if(bp)
 		xfs_da_buf_done(bp);
-	kmem_free(tmpbuffer, size);
+	kmem_free(tmpbuffer);
 	return(error);
 }
 
@@ -676,7 +676,7 @@
 					     XFS_ERRLEVEL_LOW,
 					     context->dp->i_mount, sfe);
 			xfs_attr_trace_l_c("sf corrupted", context);
-			kmem_free(sbuf, sbsize);
+			kmem_free(sbuf);
 			return XFS_ERROR(EFSCORRUPTED);
 		}
 		if (!xfs_attr_namesp_match_overrides(context->flags, sfe->flags)) {
@@ -717,7 +717,7 @@
 		}
 	}
 	if (i == nsbuf) {
-		kmem_free(sbuf, sbsize);
+		kmem_free(sbuf);
 		xfs_attr_trace_l_c("blk end", context);
 		return(0);
 	}
@@ -747,7 +747,7 @@
 		cursor->offset++;
 	}
 
-	kmem_free(sbuf, sbsize);
+	kmem_free(sbuf);
 	xfs_attr_trace_l_c("sf E-O-F", context);
 	return(0);
 }
@@ -873,7 +873,7 @@
 	error = 0;
 
 out:
-	kmem_free(tmpbuffer, XFS_LBSIZE(dp->i_mount));
+	kmem_free(tmpbuffer);
 	return(error);
 }
 
@@ -1271,7 +1271,7 @@
 				be16_to_cpu(hdr_s->count), mp);
 	xfs_da_log_buf(trans, bp, 0, XFS_LBSIZE(mp) - 1);
 
-	kmem_free(tmpbuffer, XFS_LBSIZE(mp));
+	kmem_free(tmpbuffer);
 }
 
 /*
@@ -1921,7 +1921,7 @@
 				be16_to_cpu(drop_hdr->count), mp);
 		}
 		memcpy((char *)save_leaf, (char *)tmp_leaf, state->blocksize);
-		kmem_free(tmpbuffer, state->blocksize);
+		kmem_free(tmpbuffer);
 	}
 
 	xfs_da_log_buf(state->args->trans, save_blk->bp, 0,
@@ -2451,7 +2451,7 @@
 						(int)name_rmt->namelen,
 						valuelen,
 						(char*)args.value);
-				kmem_free(args.value, valuelen);
+				kmem_free(args.value);
 			}
 			else {
 				retval = context->put_listent(context,
@@ -2954,7 +2954,7 @@
 			error = tmp;	/* save only the 1st errno */
 	}
 
-	kmem_free((xfs_caddr_t)list, size);
+	kmem_free((xfs_caddr_t)list);
 	return(error);
 }
 
--- linux-2.6-xfs0/fs/xfs/xfs_bmap.c	Tue Apr 22 03:56:05 2008
+++ linux-2.6-xfs1/fs/xfs/xfs_bmap.c	Tue Apr 22 04:16:25 2008
@@ -5965,7 +5965,7 @@
 	xfs_iunlock_map_shared(ip, lock);
 	xfs_iunlock(ip, XFS_IOLOCK_SHARED);
 
-	kmem_free(map, subnex * sizeof(*map));
+	kmem_free(map);
 
 	return error;
 }
--- linux-2.6-xfs0/fs/xfs/xfs_buf_item.c	Tue Apr 22 03:56:05 2008
+++ linux-2.6-xfs1/fs/xfs/xfs_buf_item.c	Tue Apr 22 04:15:04 2008
@@ -884,9 +884,9 @@
 	}
 
 #ifdef XFS_TRANS_DEBUG
-	kmem_free(bip->bli_orig, XFS_BUF_COUNT(bp));
+	kmem_free(bip->bli_orig);
 	bip->bli_orig = NULL;
-	kmem_free(bip->bli_logged, XFS_BUF_COUNT(bp) / NBBY);
+	kmem_free(bip->bli_logged);
 	bip->bli_logged = NULL;
 #endif /* XFS_TRANS_DEBUG */
 
@@ -1133,9 +1133,9 @@
 	xfs_trans_delete_ail(mp, (xfs_log_item_t *)bip);
 
 #ifdef XFS_TRANS_DEBUG
-	kmem_free(bip->bli_orig, XFS_BUF_COUNT(bp));
+	kmem_free(bip->bli_orig);
 	bip->bli_orig = NULL;
-	kmem_free(bip->bli_logged, XFS_BUF_COUNT(bp) / NBBY);
+	kmem_free(bip->bli_logged);
 	bip->bli_logged = NULL;
 #endif /* XFS_TRANS_DEBUG */
 
--- linux-2.6-xfs0/fs/xfs/xfs_da_btree.c	Tue Apr 22 03:56:05 2008
+++ linux-2.6-xfs1/fs/xfs/xfs_da_btree.c	Tue Apr 22 04:21:21 2008
@@ -1598,7 +1598,7 @@
 					args->firstblock, args->total,
 					&mapp[mapi], &nmap, args->flist,
 					NULL))) {
-				kmem_free(mapp, sizeof(*mapp) * count);
+				kmem_free(mapp);
 				return error;
 			}
 			if (nmap < 1)
@@ -1620,11 +1620,11 @@
 	    mapp[mapi - 1].br_startoff + mapp[mapi - 1].br_blockcount !=
 	    bno + count) {
 		if (mapp != &map)
-			kmem_free(mapp, sizeof(*mapp) * count);
+			kmem_free(mapp);
 		return XFS_ERROR(ENOSPC);
 	}
 	if (mapp != &map)
-		kmem_free(mapp, sizeof(*mapp) * count);
+		kmem_free(mapp);
 	*new_blkno = (xfs_dablk_t)bno;
 	return 0;
 }
@@ -2090,10 +2090,10 @@
 		}
 	}
 	if (bplist) {
-		kmem_free(bplist, sizeof(*bplist) * nmap);
+		kmem_free(bplist);
 	}
 	if (mapp != &map) {
-		kmem_free(mapp, sizeof(*mapp) * nfsb);
+		kmem_free(mapp);
 	}
 	if (bpp)
 		*bpp = rbp;
@@ -2102,11 +2102,11 @@
 	if (bplist) {
 		for (i = 0; i < nbplist; i++)
 			xfs_trans_brelse(trans, bplist[i]);
-		kmem_free(bplist, sizeof(*bplist) * nmap);
+		kmem_free(bplist);
 	}
 exit0:
 	if (mapp != &map)
-		kmem_free(mapp, sizeof(*mapp) * nfsb);
+		kmem_free(mapp);
 	if (bpp)
 		*bpp = NULL;
 	return error;
@@ -2315,7 +2315,7 @@
 	if (dabuf->dirty)
 		xfs_da_buf_clean(dabuf);
 	if (dabuf->nbuf > 1)
-		kmem_free(dabuf->data, BBTOB(dabuf->bbcount));
+		kmem_free(dabuf->data);
 #ifdef XFS_DABUF_DEBUG
 	{
 		spin_lock(&xfs_dabuf_global_lock);
@@ -2332,7 +2332,7 @@
 	if (dabuf->nbuf == 1)
 		kmem_zone_free(xfs_dabuf_zone, dabuf);
 	else
-		kmem_free(dabuf, XFS_DA_BUF_SIZE(dabuf->nbuf));
+		kmem_free(dabuf);
 }
 
 /*
@@ -2403,7 +2403,7 @@
 	for (i = 0; i < nbuf; i++)
 		xfs_trans_brelse(tp, bplist[i]);
 	if (bplist != &bp)
-		kmem_free(bplist, nbuf * sizeof(*bplist));
+		kmem_free(bplist);
 }
 
 /*
@@ -2429,7 +2429,7 @@
 	for (i = 0; i < nbuf; i++)
 		xfs_trans_binval(tp, bplist[i]);
 	if (bplist != &bp)
-		kmem_free(bplist, nbuf * sizeof(*bplist));
+		kmem_free(bplist);
 }
 
 /*
--- linux-2.6-xfs0/fs/xfs/xfs_dfrag.c	Tue Apr 22 03:56:05 2008
+++ linux-2.6-xfs1/fs/xfs/xfs_dfrag.c	Tue Apr 22 04:15:27 2008
@@ -116,7 +116,7 @@
  out_put_file:
 	fput(file);
  out_free_sxp:
-	kmem_free(sxp, sizeof(xfs_swapext_t));
+	kmem_free(sxp);
  out:
 	return error;
 }
@@ -381,6 +381,6 @@
 		xfs_iunlock(tip, lock_flags);
 	}
 	if (tempifp != NULL)
-		kmem_free(tempifp, sizeof(xfs_ifork_t));
+		kmem_free(tempifp);
 	return error;
 }
--- linux-2.6-xfs0/fs/xfs/xfs_dir2.c	Tue Apr 22 03:56:05 2008
+++ linux-2.6-xfs1/fs/xfs/xfs_dir2.c	Tue Apr 22 04:16:39 2008
@@ -499,7 +499,7 @@
 					args->firstblock, args->total,
 					&mapp[mapi], &nmap, args->flist,
 					NULL))) {
-				kmem_free(mapp, sizeof(*mapp) * count);
+				kmem_free(mapp);
 				return error;
 			}
 			if (nmap < 1)
@@ -531,14 +531,14 @@
 	    mapp[mapi - 1].br_startoff + mapp[mapi - 1].br_blockcount !=
 	    bno + count) {
 		if (mapp != &map)
-			kmem_free(mapp, sizeof(*mapp) * count);
+			kmem_free(mapp);
 		return XFS_ERROR(ENOSPC);
 	}
 	/*
 	 * Done with the temporary mapping table.
 	 */
 	if (mapp != &map)
-		kmem_free(mapp, sizeof(*mapp) * count);
+		kmem_free(mapp);
 	*dbp = xfs_dir2_da_to_db(mp, (xfs_dablk_t)bno);
 	/*
 	 * Update file's size if this is the data space and it grew.
--- linux-2.6-xfs0/fs/xfs/xfs_dir2_block.c	Tue Apr 22 03:56:05 2008
+++ linux-2.6-xfs1/fs/xfs/xfs_dir2_block.c	Tue Apr 22 04:19:32 2008
@@ -1071,7 +1071,7 @@
 	 */
 	error = xfs_dir2_grow_inode(args, XFS_DIR2_DATA_SPACE, &blkno);
 	if (error) {
-		kmem_free(buf, buf_len);
+		kmem_free(buf);
 		return error;
 	}
 	/*
@@ -1079,7 +1079,7 @@
 	 */
 	error = xfs_dir2_data_init(args, blkno, &bp);
 	if (error) {
-		kmem_free(buf, buf_len);
+		kmem_free(buf);
 		return error;
 	}
 	block = bp->data;
@@ -1198,7 +1198,7 @@
 			sfep = xfs_dir2_sf_nextentry(sfp, sfep);
 	}
 	/* Done with the temporary buffer */
-	kmem_free(buf, buf_len);
+	kmem_free(buf);
 	/*
 	 * Sort the leaf entries by hash value.
 	 */
--- linux-2.6-xfs0/fs/xfs/xfs_dir2_leaf.c	Tue Apr 22 03:56:05 2008
+++ linux-2.6-xfs1/fs/xfs/xfs_dir2_leaf.c	Tue Apr 22 04:18:27 2008
@@ -1110,7 +1110,7 @@
 		*offset = XFS_DIR2_MAX_DATAPTR;
 	else
 		*offset = xfs_dir2_byte_to_dataptr(mp, curoff);
-	kmem_free(map, map_size * sizeof(*map));
+	kmem_free(map);
 	if (bp)
 		xfs_da_brelse(NULL, bp);
 	return error;
--- linux-2.6-xfs0/fs/xfs/xfs_dir2_sf.c	Tue Apr 22 03:56:05 2008
+++ linux-2.6-xfs1/fs/xfs/xfs_dir2_sf.c	Tue Apr 22 04:19:21 2008
@@ -255,7 +255,7 @@
 	xfs_dir2_sf_check(args);
 out:
 	xfs_trans_log_inode(args->trans, dp, logflags);
-	kmem_free(block, mp->m_dirblksize);
+	kmem_free(block);
 	return error;
 }
 
@@ -512,7 +512,7 @@
 		sfep = xfs_dir2_sf_nextentry(sfp, sfep);
 		memcpy(sfep, oldsfep, old_isize - nbytes);
 	}
-	kmem_free(buf, old_isize);
+	kmem_free(buf);
 	dp->i_d.di_size = new_isize;
 	xfs_dir2_sf_check(args);
 }
@@ -1174,7 +1174,7 @@
 	/*
 	 * Clean up the inode.
 	 */
-	kmem_free(buf, oldsize);
+	kmem_free(buf);
 	dp->i_d.di_size = newsize;
 	xfs_trans_log_inode(args->trans, dp, XFS_ILOG_CORE | XFS_ILOG_DDATA);
 }
@@ -1251,7 +1251,7 @@
 	/*
 	 * Clean up the inode.
 	 */
-	kmem_free(buf, oldsize);
+	kmem_free(buf);
 	dp->i_d.di_size = newsize;
 	xfs_trans_log_inode(args->trans, dp, XFS_ILOG_CORE | XFS_ILOG_DDATA);
 }
--- linux-2.6-xfs0/fs/xfs/xfs_error.c	Tue Apr 22 03:56:05 2008
+++ linux-2.6-xfs1/fs/xfs/xfs_error.c	Tue Apr 22 04:14:29 2008
@@ -150,8 +150,7 @@
 				xfs_etest[i]);
 			xfs_etest[i] = 0;
 			xfs_etest_fsid[i] = 0LL;
-			kmem_free(xfs_etest_fsname[i],
-				  strlen(xfs_etest_fsname[i]) + 1);
+			kmem_free(xfs_etest_fsname[i]);
 			xfs_etest_fsname[i] = NULL;
 		}
 	}
@@ -175,7 +174,7 @@
 		newfmt = kmem_alloc(len, KM_SLEEP);
 		sprintf(newfmt, "Filesystem \"%s\": %s", mp->m_fsname, fmt);
 		icmn_err(level, newfmt, ap);
-		kmem_free(newfmt, len);
+		kmem_free(newfmt);
 	} else {
 		icmn_err(level, fmt, ap);
 	}
--- linux-2.6-xfs0/fs/xfs/xfs_extfree_item.c	Tue Apr 22 03:56:05 2008
+++ linux-2.6-xfs1/fs/xfs/xfs_extfree_item.c	Tue Apr 22 04:15:19 2008
@@ -41,8 +41,7 @@
 	int nexts = efip->efi_format.efi_nextents;
 
 	if (nexts > XFS_EFI_MAX_FAST_EXTENTS) {
-		kmem_free(efip, sizeof(xfs_efi_log_item_t) +
-				(nexts - 1) * sizeof(xfs_extent_t));
+		kmem_free(efip);
 	} else {
 		kmem_zone_free(xfs_efi_zone, efip);
 	}
@@ -374,8 +373,7 @@
 	int nexts = efdp->efd_format.efd_nextents;
 
 	if (nexts > XFS_EFD_MAX_FAST_EXTENTS) {
-		kmem_free(efdp, sizeof(xfs_efd_log_item_t) +
-				(nexts - 1) * sizeof(xfs_extent_t));
+		kmem_free(efdp);
 	} else {
 		kmem_zone_free(xfs_efd_zone, efdp);
 	}
--- linux-2.6-xfs0/fs/xfs/xfs_inode.c	Tue Apr 22 03:56:05 2008
+++ linux-2.6-xfs1/fs/xfs/xfs_inode.c	Tue Apr 22 04:17:46 2008
@@ -2337,7 +2337,7 @@
 		xfs_trans_binval(tp, bp);
 	}
 
-	kmem_free(ip_found, ninodes * sizeof(xfs_inode_t *));
+	kmem_free(ip_found);
 	xfs_put_perag(mp, pag);
 }
 
@@ -2549,7 +2549,7 @@
 						     (int)new_size);
 		memcpy(np, op, new_max * (uint)sizeof(xfs_dfsbno_t));
 	}
-	kmem_free(ifp->if_broot, ifp->if_broot_bytes);
+	kmem_free(ifp->if_broot);
 	ifp->if_broot = new_broot;
 	ifp->if_broot_bytes = (int)new_size;
 	ASSERT(ifp->if_broot_bytes <=
@@ -2593,7 +2593,7 @@
 
 	if (new_size == 0) {
 		if (ifp->if_u1.if_data != ifp->if_u2.if_inline_data) {
-			kmem_free(ifp->if_u1.if_data, ifp->if_real_bytes);
+			kmem_free(ifp->if_u1.if_data);
 		}
 		ifp->if_u1.if_data = NULL;
 		real_size = 0;
@@ -2608,7 +2608,7 @@
 			ASSERT(ifp->if_real_bytes != 0);
 			memcpy(ifp->if_u2.if_inline_data, ifp->if_u1.if_data,
 			      new_size);
-			kmem_free(ifp->if_u1.if_data, ifp->if_real_bytes);
+			kmem_free(ifp->if_u1.if_data);
 			ifp->if_u1.if_data = ifp->if_u2.if_inline_data;
 		}
 		real_size = 0;
@@ -2698,7 +2698,7 @@
 
 	ifp = XFS_IFORK_PTR(ip, whichfork);
 	if (ifp->if_broot != NULL) {
-		kmem_free(ifp->if_broot, ifp->if_broot_bytes);
+		kmem_free(ifp->if_broot);
 		ifp->if_broot = NULL;
 	}
 
@@ -2712,7 +2712,7 @@
 		if ((ifp->if_u1.if_data != ifp->if_u2.if_inline_data) &&
 		    (ifp->if_u1.if_data != NULL)) {
 			ASSERT(ifp->if_real_bytes != 0);
-			kmem_free(ifp->if_u1.if_data, ifp->if_real_bytes);
+			kmem_free(ifp->if_u1.if_data);
 			ifp->if_u1.if_data = NULL;
 			ifp->if_real_bytes = 0;
 		}
@@ -3861,7 +3861,7 @@
 			erp = xfs_iext_irec_new(ifp, erp_idx);
 		}
 		memmove(&erp->er_extbuf[i], nex2_ep, byte_diff);
-		kmem_free(nex2_ep, byte_diff);
+		kmem_free(nex2_ep);
 		erp->er_extcount += nex2;
 		xfs_iext_irec_update_extoffs(ifp, erp_idx + 1, nex2);
 	}
@@ -4137,7 +4137,7 @@
 	 */
 	memcpy(ifp->if_u2.if_inline_ext, ifp->if_u1.if_extents,
 		nextents * sizeof(xfs_bmbt_rec_t));
-	kmem_free(ifp->if_u1.if_extents, ifp->if_real_bytes);
+	kmem_free(ifp->if_u1.if_extents);
 	ifp->if_u1.if_extents = ifp->if_u2.if_inline_ext;
 	ifp->if_real_bytes = 0;
 }
@@ -4211,7 +4211,7 @@
 	ASSERT(ifp->if_real_bytes == XFS_IEXT_BUFSZ);
 
 	ep = ifp->if_u1.if_ext_irec->er_extbuf;
-	kmem_free(ifp->if_u1.if_ext_irec, sizeof(xfs_ext_irec_t));
+	kmem_free(ifp->if_u1.if_ext_irec);
 	ifp->if_flags &= ~XFS_IFEXTIREC;
 	ifp->if_u1.if_extents = ep;
 	ifp->if_bytes = size;
@@ -4237,7 +4237,7 @@
 		}
 		ifp->if_flags &= ~XFS_IFEXTIREC;
 	} else if (ifp->if_real_bytes) {
-		kmem_free(ifp->if_u1.if_extents, ifp->if_real_bytes);
+		kmem_free(ifp->if_u1.if_extents);
 	} else if (ifp->if_bytes) {
 		memset(ifp->if_u2.if_inline_ext, 0, XFS_INLINE_EXTS *
 			sizeof(xfs_bmbt_rec_t));
@@ -4508,7 +4508,7 @@
 	if (erp->er_extbuf) {
 		xfs_iext_irec_update_extoffs(ifp, erp_idx + 1,
 			-erp->er_extcount);
-		kmem_free(erp->er_extbuf, XFS_IEXT_BUFSZ);
+		kmem_free(erp->er_extbuf);
 	}
 	/* Compact extent records */
 	erp = ifp->if_u1.if_ext_irec;
@@ -4526,8 +4526,7 @@
 		xfs_iext_realloc_indirect(ifp,
 			nlists * sizeof(xfs_ext_irec_t));
 	} else {
-		kmem_free(ifp->if_u1.if_ext_irec,
-			sizeof(xfs_ext_irec_t));
+		kmem_free(ifp->if_u1.if_ext_irec);
 	}
 	ifp->if_real_bytes = nlists * XFS_IEXT_BUFSZ;
 }
@@ -4596,7 +4595,7 @@
 			 * so er_extoffs don't get modified in
 			 * xfs_iext_irec_remove.
 			 */
-			kmem_free(erp_next->er_extbuf, XFS_IEXT_BUFSZ);
+			kmem_free(erp_next->er_extbuf);
 			erp_next->er_extbuf = NULL;
 			xfs_iext_irec_remove(ifp, erp_idx + 1);
 			nlists = ifp->if_real_bytes / XFS_IEXT_BUFSZ;
@@ -4639,8 +4638,7 @@
 			 * so er_extoffs don't get modified in
 			 * xfs_iext_irec_remove.
 			 */
-			kmem_free(erp_next->er_extbuf,
-				erp_next->er_extcount * sizeof(xfs_bmbt_rec_t));
+			kmem_free(erp_next->er_extbuf);
 			erp_next->er_extbuf = NULL;
 			xfs_iext_irec_remove(ifp, erp_idx + 1);
 			erp = &ifp->if_u1.if_ext_irec[erp_idx];
--- linux-2.6-xfs0/fs/xfs/xfs_inode_item.c	Tue Apr 22 03:56:05 2008
+++ linux-2.6-xfs1/fs/xfs/xfs_inode_item.c	Tue Apr 22 04:16:19 2008
@@ -685,7 +685,7 @@
 		ASSERT(ip->i_d.di_nextents > 0);
 		ASSERT(iip->ili_format.ilf_fields & XFS_ILOG_DEXT);
 		ASSERT(ip->i_df.if_bytes > 0);
-		kmem_free(iip->ili_extents_buf, ip->i_df.if_bytes);
+		kmem_free(iip->ili_extents_buf);
 		iip->ili_extents_buf = NULL;
 	}
 	if (iip->ili_aextents_buf != NULL) {
@@ -693,7 +693,7 @@
 		ASSERT(ip->i_d.di_anextents > 0);
 		ASSERT(iip->ili_format.ilf_fields & XFS_ILOG_AEXT);
 		ASSERT(ip->i_afp->if_bytes > 0);
-		kmem_free(iip->ili_aextents_buf, ip->i_afp->if_bytes);
+		kmem_free(iip->ili_aextents_buf);
 		iip->ili_aextents_buf = NULL;
 	}
 
@@ -951,8 +951,7 @@
 {
 #ifdef XFS_TRANS_DEBUG
 	if (ip->i_itemp->ili_root_size != 0) {
-		kmem_free(ip->i_itemp->ili_orig_root,
-			  ip->i_itemp->ili_root_size);
+		kmem_free(ip->i_itemp->ili_orig_root);
 	}
 #endif
 	kmem_zone_free(xfs_ili_zone, ip->i_itemp);
--- linux-2.6-xfs0/fs/xfs/xfs_itable.c	Tue Apr 22 03:56:05 2008
+++ linux-2.6-xfs1/fs/xfs/xfs_itable.c	Tue Apr 22 04:19:05 2008
@@ -265,7 +265,7 @@
 		*ubused = error;
 
  out_free:
-	kmem_free(buf, sizeof(*buf));
+	kmem_free(buf);
 	return error;
 }
 
@@ -715,7 +715,7 @@
 	/*
 	 * Done, we're either out of filesystem or space to put the data.
 	 */
-	kmem_free(irbuf, irbsize);
+	kmem_free(irbuf);
 	*ubcountp = ubelem;
 	/*
 	 * Found some inodes, return them now and return the error next time.
@@ -921,7 +921,7 @@
 		}
 		*lastino = XFS_AGINO_TO_INO(mp, agno, agino);
 	}
-	kmem_free(buffer, bcount * sizeof(*buffer));
+	kmem_free(buffer);
 	if (cur)
 		xfs_btree_del_cursor(cur, (error ? XFS_BTREE_ERROR :
 					   XFS_BTREE_NOERROR));
--- linux-2.6-xfs0/fs/xfs/xfs_log.c	Tue Apr 22 03:56:05 2008
+++ linux-2.6-xfs1/fs/xfs/xfs_log.c	Tue Apr 22 04:16:49 2008
@@ -1552,7 +1552,7 @@
 		}
 #endif
 		next_iclog = iclog->ic_next;
-		kmem_free(iclog, sizeof(xlog_in_core_t));
+		kmem_free(iclog);
 		iclog = next_iclog;
 	}
 	freesema(&log->l_flushsema);
@@ -1571,7 +1571,7 @@
 		tic = log->l_unmount_free;
 		while (tic) {
 			next_tic = tic->t_next;
-			kmem_free(tic, PAGE_SIZE);
+			kmem_free(tic);
 			tic = next_tic;
 		}
 	}
@@ -1585,7 +1585,7 @@
 	}
 #endif
 	log->l_mp->m_log = NULL;
-	kmem_free(log, sizeof(xlog_t));
+	kmem_free(log);
 }	/* xlog_dealloc_log */
 
 /*
--- linux-2.6-xfs0/fs/xfs/xfs_log_recover.c	Tue Apr 22 03:56:05 2008
+++ linux-2.6-xfs1/fs/xfs/xfs_log_recover.c	Tue Apr 22 04:20:46 2008
@@ -1709,8 +1709,7 @@
 					} else {
 						prevp->bc_next = bcp->bc_next;
 					}
-					kmem_free(bcp,
-						  sizeof(xfs_buf_cancel_t));
+					kmem_free(bcp);
 				}
 			}
 			return 1;
@@ -2511,7 +2510,7 @@
 
 error:
 	if (need_free)
-		kmem_free(in_f, sizeof(*in_f));
+		kmem_free(in_f);
 	return XFS_ERROR(error);
 }
 
@@ -2822,16 +2821,14 @@
 		item = item->ri_next;
 		 /* Free the regions in the item. */
 		for (i = 0; i < free_item->ri_cnt; i++) {
-			kmem_free(free_item->ri_buf[i].i_addr,
-				  free_item->ri_buf[i].i_len);
+			kmem_free(free_item->ri_buf[i].i_addr);
 		}
 		/* Free the item itself */
-		kmem_free(free_item->ri_buf,
-			  (free_item->ri_total * sizeof(xfs_log_iovec_t)));
-		kmem_free(free_item, sizeof(xlog_recover_item_t));
+		kmem_free(free_item->ri_buf);
+		kmem_free(free_item);
 	} while (first_item != item);
 	/* Free the transaction recover structure */
-	kmem_free(trans, sizeof(xlog_recover_t));
+	kmem_free(trans);
 }
 
 STATIC int
@@ -3747,8 +3744,7 @@
 	error = xlog_do_recovery_pass(log, head_blk, tail_blk,
 				      XLOG_RECOVER_PASS1);
 	if (error != 0) {
-		kmem_free(log->l_buf_cancel_table,
-			  XLOG_BC_TABLE_SIZE * sizeof(xfs_buf_cancel_t*));
+		kmem_free(log->l_buf_cancel_table);
 		log->l_buf_cancel_table = NULL;
 		return error;
 	}
@@ -3767,8 +3763,7 @@
 	}
 #endif	/* DEBUG */
 
-	kmem_free(log->l_buf_cancel_table,
-		  XLOG_BC_TABLE_SIZE * sizeof(xfs_buf_cancel_t*));
+	kmem_free(log->l_buf_cancel_table);
 	log->l_buf_cancel_table = NULL;
 
 	return error;
--- linux-2.6-xfs0/fs/xfs/xfs_mount.c	Tue Apr 22 03:56:05 2008
+++ linux-2.6-xfs1/fs/xfs/xfs_mount.c	Tue Apr 22 04:20:08 2008
@@ -158,11 +158,8 @@
 
 		for (agno = 0; agno < mp->m_maxagi; agno++)
 			if (mp->m_perag[agno].pagb_list)
-				kmem_free(mp->m_perag[agno].pagb_list,
-						sizeof(xfs_perag_busy_t) *
-							XFS_PAGB_NUM_SLOTS);
-		kmem_free(mp->m_perag,
-			  sizeof(xfs_perag_t) * mp->m_sb.sb_agcount);
+				kmem_free(mp->m_perag[agno].pagb_list);
+		kmem_free(mp->m_perag);
 	}
 
 	spinlock_destroy(&mp->m_ail_lock);
@@ -173,11 +170,11 @@
 		XFS_QM_DONE(mp);
 
 	if (mp->m_fsname != NULL)
-		kmem_free(mp->m_fsname, mp->m_fsname_len);
+		kmem_free(mp->m_fsname);
 	if (mp->m_rtname != NULL)
-		kmem_free(mp->m_rtname, strlen(mp->m_rtname) + 1);
+		kmem_free(mp->m_rtname);
 	if (mp->m_logname != NULL)
-		kmem_free(mp->m_logname, strlen(mp->m_logname) + 1);
+		kmem_free(mp->m_logname);
 
 	xfs_icsb_destroy_counters(mp);
 }
@@ -1219,9 +1216,8 @@
  error2:
 	for (agno = 0; agno < sbp->sb_agcount; agno++)
 		if (mp->m_perag[agno].pagb_list)
-			kmem_free(mp->m_perag[agno].pagb_list,
-			  sizeof(xfs_perag_busy_t) * XFS_PAGB_NUM_SLOTS);
-	kmem_free(mp->m_perag, sbp->sb_agcount * sizeof(xfs_perag_t));
+			kmem_free(mp->m_perag[agno].pagb_list);
+	kmem_free(mp->m_perag);
 	mp->m_perag = NULL;
 	/* FALLTHROUGH */
  error1:
--- linux-2.6-xfs0/fs/xfs/xfs_mru_cache.c	Tue Apr 22 03:56:05 2008
+++ linux-2.6-xfs1/fs/xfs/xfs_mru_cache.c	Tue Apr 22 04:18:08 2008
@@ -382,9 +382,9 @@
 
 exit:
 	if (err && mru && mru->lists)
-		kmem_free(mru->lists, mru->grp_count * sizeof(*mru->lists));
+		kmem_free(mru->lists);
 	if (err && mru)
-		kmem_free(mru, sizeof(*mru));
+		kmem_free(mru);
 
 	return err;
 }
@@ -424,8 +424,8 @@
 
 	xfs_mru_cache_flush(mru);
 
-	kmem_free(mru->lists, mru->grp_count * sizeof(*mru->lists));
-	kmem_free(mru, sizeof(*mru));
+	kmem_free(mru->lists);
+	kmem_free(mru);
 }
 
 /*
--- linux-2.6-xfs0/fs/xfs/xfs_rtalloc.c	Tue Apr 22 03:56:05 2008
+++ linux-2.6-xfs1/fs/xfs/xfs_rtalloc.c	Tue Apr 22 04:18:12 2008
@@ -2053,7 +2053,7 @@
 	/*
 	 * Free the fake mp structure.
 	 */
-	kmem_free(nmp, sizeof(*nmp));
+	kmem_free(nmp);
 
 	return error;
 }
--- linux-2.6-xfs0/fs/xfs/xfs_trans.c	Tue Apr 22 03:56:05 2008
+++ linux-2.6-xfs1/fs/xfs/xfs_trans.c	Tue Apr 22 04:18:22 2008
@@ -889,7 +889,7 @@
 
 	tp->t_commit_lsn = commit_lsn;
 	if (nvec > XFS_TRANS_LOGVEC_COUNT) {
-		kmem_free(log_vector, nvec * sizeof(xfs_log_iovec_t));
+		kmem_free(log_vector);
 	}
 
 	/*
@@ -1265,7 +1265,7 @@
 		ASSERT(!XFS_LIC_ARE_ALL_FREE(licp));
 		xfs_trans_chunk_committed(licp, tp->t_lsn, abortflag);
 		next_licp = licp->lic_next;
-		kmem_free(licp, sizeof(xfs_log_item_chunk_t));
+		kmem_free(licp);
 		licp = next_licp;
 	}
 
--- linux-2.6-xfs0/fs/xfs/xfs_trans_inode.c	Tue Apr 22 03:56:05 2008
+++ linux-2.6-xfs1/fs/xfs/xfs_trans_inode.c	Tue Apr 22 04:12:18 2008
@@ -291,7 +291,7 @@
 	iip = ip->i_itemp;
 	if (iip->ili_root_size != 0) {
 		ASSERT(iip->ili_orig_root != NULL);
-		kmem_free(iip->ili_orig_root, iip->ili_root_size);
+		kmem_free(iip->ili_orig_root);
 		iip->ili_root_size = 0;
 		iip->ili_orig_root = NULL;
 	}
--- linux-2.6-xfs0/fs/xfs/xfs_trans_item.c	Tue Apr 22 03:56:05 2008
+++ linux-2.6-xfs1/fs/xfs/xfs_trans_item.c	Tue Apr 22 04:14:45 2008
@@ -161,7 +161,7 @@
 			licpp = &((*licpp)->lic_next);
 		}
 		*licpp = licp->lic_next;
-		kmem_free(licp, sizeof(xfs_log_item_chunk_t));
+		kmem_free(licp);
 		tp->t_items_free -= XFS_LIC_NUM_SLOTS;
 	}
 }
@@ -314,7 +314,7 @@
 		ASSERT(!XFS_LIC_ARE_ALL_FREE(licp));
 		(void) xfs_trans_unlock_chunk(licp, 1, abort, NULLCOMMITLSN);
 		next_licp = licp->lic_next;
-		kmem_free(licp, sizeof(xfs_log_item_chunk_t));
+		kmem_free(licp);
 		licp = next_licp;
 	}
 
@@ -363,7 +363,7 @@
 		next_licp = licp->lic_next;
 		if (XFS_LIC_ARE_ALL_FREE(licp)) {
 			*licpp = next_licp;
-			kmem_free(licp, sizeof(xfs_log_item_chunk_t));
+			kmem_free(licp);
 			freed -= XFS_LIC_NUM_SLOTS;
 		} else {
 			licpp = &(licp->lic_next);
@@ -530,7 +530,7 @@
 	lbcp = tp->t_busy.lbc_next;
 	while (lbcp != NULL) {
 		lbcq = lbcp->lbc_next;
-		kmem_free(lbcp, sizeof(xfs_log_busy_chunk_t));
+		kmem_free(lbcp);
 		lbcp = lbcq;
 	}
 
--- linux-2.6-xfs0/fs/xfs/xfs_vfsops.c	Tue Apr 22 03:56:05 2008
+++ linux-2.6-xfs1/fs/xfs/xfs_vfsops.c	Tue Apr 22 04:18:53 2008
@@ -641,7 +641,7 @@
 		xfs_unmountfs(mp, credp);
 		xfs_qmops_put(mp);
 		xfs_dmops_put(mp);
-		kmem_free(mp, sizeof(xfs_mount_t));
+		kmem_free(mp);
 	}
 
 	return XFS_ERROR(error);
@@ -1054,7 +1054,7 @@
 
 		if (XFS_FORCED_SHUTDOWN(mp) && !(flags & SYNC_CLOSE)) {
 			XFS_MOUNT_IUNLOCK(mp);
-			kmem_free(ipointer, sizeof(xfs_iptr_t));
+			kmem_free(ipointer);
 			return 0;
 		}
 
@@ -1200,7 +1200,7 @@
 			}
 			XFS_MOUNT_IUNLOCK(mp);
 			ASSERT(ipointer_in == B_FALSE);
-			kmem_free(ipointer, sizeof(xfs_iptr_t));
+			kmem_free(ipointer);
 			return XFS_ERROR(error);
 		}
 
@@ -1230,7 +1230,7 @@
 
 	ASSERT(ipointer_in == B_FALSE);
 
-	kmem_free(ipointer, sizeof(xfs_iptr_t));
+	kmem_free(ipointer);
 	return XFS_ERROR(last_error);
 }
 

^ permalink raw reply	[flat|nested] 162+ messages in thread

* [PATCH] xfs: do not pass unused params to xfs_flush_pages
  2008-04-22  2:33                                   ` [PATCH] xfs: do not pass size into kmem_free, it's unused Denys Vlasenko
@ 2008-04-22  3:03                                     ` Denys Vlasenko
  2008-04-22  3:14                                       ` [PATCH] xfs: use smaller int param in call " Denys Vlasenko
                                                         ` (2 more replies)
  2008-04-22  3:09                                     ` [PATCH] xfs: do not pass size into kmem_free, it's unused Eric Sandeen
  2008-04-22 22:02                                     ` David Chinner
  2 siblings, 3 replies; 162+ messages in thread
From: Denys Vlasenko @ 2008-04-22  3:03 UTC (permalink / raw)
  To: David Chinner
  Cc: Eric Sandeen, Adrian Bunk, Alan Cox, Shawn Bohrer, Ingo Molnar,
	Andrew Morton, Linux Kernel Mailing List, Arjan van de Ven,
	Thomas Gleixner

[-- Attachment #1: Type: text/plain, Size: 474 bytes --]

Hi David,

xfs_flush_pages() does not use some of its parameters, namely:
first, last and fiopt.

This patch removes these parameters from all callsites.

Code size difference on 32-bit x86:

   text    data     bss     dec     hex filename
 390739    2748    1708  395195   607bb linux-2.6-xfs1-TEST/fs/xfs/xfs.o
 390567    2748    1708  395023   6070f linux-2.6-xfs2-TEST/fs/xfs/xfs.o
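
For reference, the text-size savings implied by the size(1) output in
this thread can be worked out as below. The enum names and the helper
text_saved() are made up for the sketch; the numbers are the text
column values quoted for the xfs0, xfs1 and xfs2 builds.

```c
/* .text sizes in bytes, as reported by size(1) in this thread. */
enum {
	XFS0_TEXT = 391271,	/* before the kmem_free patch */
	XFS1_TEXT = 390739,	/* after the kmem_free patch */
	XFS2_TEXT = 390567,	/* after the xfs_flush_pages patch */
};

/* Bytes of .text removed going from one build to the next. */
static long text_saved(long before, long after)
{
	return before - after;
}
```

That gives 532 bytes for the kmem_free change, 172 bytes for the
xfs_flush_pages change, and 704 bytes for the two together.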

Compile-tested only.

Signed-off-by: Denys Vlasenko <vda.linux@googlemail.com>
--
vda

[-- Attachment #2: xfs1-xfs_flush_pages-remove_unused_params.patch --]
[-- Type: text/x-diff, Size: 2850 bytes --]

--- linux-2.6-xfs1/fs/xfs/linux-2.6/xfs_aops.c	Tue Apr 22 04:06:46 2008
+++ linux-2.6-xfs2/fs/xfs/linux-2.6/xfs_aops.c	Tue Apr 22 04:46:49 2008
@@ -1533,7 +1532,7 @@
 
 	xfs_itrace_entry(XFS_I(inode));
 	xfs_rwlock(ip, VRWLOCK_READ);
-	xfs_flush_pages(ip, (xfs_off_t)0, -1, 0, FI_REMAPF);
+	xfs_flush_pages(ip, 0);
 	xfs_rwunlock(ip, VRWLOCK_READ);
 	return generic_block_bmap(mapping, block, xfs_get_blocks);
 }
--- linux-2.6-xfs1/fs/xfs/linux-2.6/xfs_fs_subr.c	Tue Apr 22 04:06:46 2008
+++ linux-2.6-xfs2/fs/xfs/linux-2.6/xfs_fs_subr.c	Tue Apr 22 04:46:33 2008
@@ -72,10 +72,7 @@
 int
 xfs_flush_pages(
 	xfs_inode_t	*ip,
-	xfs_off_t	first,
-	xfs_off_t	last,
-	uint64_t	flags,
-	int		fiopt)
+	uint64_t	flags)
 {
 	bhv_vnode_t	*vp = XFS_ITOV(ip);
 	struct inode	*inode = vn_to_inode(vp);
--- linux-2.6-xfs1/fs/xfs/xfs_bmap.c	Tue Apr 22 04:16:25 2008
+++ linux-2.6-xfs2/fs/xfs/xfs_bmap.c	Tue Apr 22 04:47:22 2008
@@ -5867,8 +5867,7 @@
 	if (whichfork == XFS_DATA_FORK &&
 		(ip->i_delayed_blks || ip->i_size > ip->i_d.di_size)) {
 		/* xfs_fsize_t last_byte = xfs_file_last_byte(ip); */
-		error = xfs_flush_pages(ip, (xfs_off_t)0,
-					       -1, 0, FI_REMAPF);
+		error = xfs_flush_pages(ip, 0);
 	}
 
 	ASSERT(whichfork == XFS_ATTR_FORK || ip->i_delayed_blks == 0);
--- linux-2.6-xfs1/fs/xfs/xfs_vfsops.c	Tue Apr 22 04:18:53 2008
+++ linux-2.6-xfs2/fs/xfs/xfs_vfsops.c	Tue Apr 22 04:47:36 2008
@@ -1121,8 +1121,7 @@
 					error = xfs_flushinval_pages(ip,
 							0, -1, FI_REMAPF);
 			} else if ((flags & SYNC_DELWRI) && VN_DIRTY(vp)) {
-				error = xfs_flush_pages(ip, 0,
-							-1, fflag, FI_NONE);
+				error = xfs_flush_pages(ip, fflag);
 			}
 
 			/*
--- linux-2.6-xfs1/fs/xfs/xfs_vnodeops.c	Tue Apr 22 04:06:44 2008
+++ linux-2.6-xfs2/fs/xfs/xfs_vnodeops.c	Tue Apr 22 04:47:13 2008
@@ -592,9 +592,7 @@
 		if (!code &&
 		    (ip->i_size != ip->i_d.di_size) &&
 		    (vap->va_size > ip->i_d.di_size)) {
-			code = xfs_flush_pages(ip,
-					ip->i_d.di_size, vap->va_size,
-					XFS_B_ASYNC, FI_NONE);
+			code = xfs_flush_pages(ip, XFS_B_ASYNC);
 		}
 
 		/* wait for all I/O to complete */
@@ -1517,7 +1515,7 @@
 		 */
 		truncated = xfs_iflags_test_and_clear(ip, XFS_ITRUNCATED);
 		if (truncated && VN_DIRTY(vp) && ip->i_delayed_blks > 0)
-			xfs_flush_pages(ip, 0, -1, XFS_B_ASYNC, FI_NONE);
+			xfs_flush_pages(ip, XFS_B_ASYNC);
 	}
 
 #ifdef HAVE_REFCACHE
--- linux-2.6-xfs1/fs/xfs/xfs_vnodeops.h	Tue Apr 22 04:06:44 2008
+++ linux-2.6-xfs2/fs/xfs/xfs_vnodeops.h	Tue Apr 22 04:46:59 2008
@@ -78,7 +78,6 @@
 		xfs_off_t last, int fiopt);
 int xfs_flushinval_pages(struct xfs_inode *ip, xfs_off_t first,
 		xfs_off_t last, int fiopt);
-int xfs_flush_pages(struct xfs_inode *ip, xfs_off_t first,
-		xfs_off_t last, uint64_t flags, int fiopt);
+int xfs_flush_pages(struct xfs_inode *ip, uint64_t flags);
 
 #endif /* _XFS_VNODEOPS_H */

^ permalink raw reply	[flat|nested] 162+ messages in thread

* Re: [PATCH] xfs: do not pass size into kmem_free, it's unused
  2008-04-22  2:33                                   ` [PATCH] xfs: do not pass size into kmem_free, it's unused Denys Vlasenko
  2008-04-22  3:03                                     ` [PATCH] xfs: do not pass unused params to xfs_flush_pages Denys Vlasenko
@ 2008-04-22  3:09                                     ` Eric Sandeen
  2008-04-22  3:35                                       ` Eric Sandeen
  2008-04-22 22:02                                     ` David Chinner
  2 siblings, 1 reply; 162+ messages in thread
From: Eric Sandeen @ 2008-04-22  3:09 UTC (permalink / raw)
  To: Denys Vlasenko
  Cc: David Chinner, Adrian Bunk, Alan Cox, Shawn Bohrer, Ingo Molnar,
	Andrew Morton, Linux Kernel Mailing List, Arjan van de Ven,
	Thomas Gleixner

Denys Vlasenko wrote:
> Hi David,
> 
>> Patches are welcome - I'd be over the moon if any of the known 4k
>> stack advocates sent a stack reduction patch for XFS, but it seems
>> that actually trying to fix the problems is much harder than
>> resending a one line patch every few months....
> 
> kmem_free() function takes (ptr, size) arguments but doesn't
> actually use second one.
> 
> This patch removes size argument from all callsites.

I didn't expect it to but this does reduce a few things slightly.

On x86_64:

-xfs_attr_leaf_list_int 200
+xfs_attr_leaf_list_int 184

-xfs_dir2_sf_to_block 136
+xfs_dir2_sf_to_block 120

-xfs_ifree_cluster 136
+xfs_ifree_cluster 120

-xfs_inumbers 184
+xfs_inumbers 168

-xfs_mount_free 24

Thanks,
-Eric


^ permalink raw reply	[flat|nested] 162+ messages in thread

* [PATCH] xfs: use smaller int param in call to xfs_flush_pages
  2008-04-22  3:03                                     ` [PATCH] xfs: do not pass unused params to xfs_flush_pages Denys Vlasenko
@ 2008-04-22  3:14                                       ` Denys Vlasenko
  2008-04-22  3:18                                         ` Eric Sandeen
                                                           ` (2 more replies)
  2008-04-22  3:15                                       ` [PATCH] xfs: do not pass unused params " Eric Sandeen
  2008-04-22 22:07                                       ` David Chinner
  2 siblings, 3 replies; 162+ messages in thread
From: Denys Vlasenko @ 2008-04-22  3:14 UTC (permalink / raw)
  To: David Chinner
  Cc: Eric Sandeen, Adrian Bunk, Alan Cox, Shawn Bohrer, Ingo Molnar,
	Andrew Morton, Linux Kernel Mailing List, Arjan van de Ven,
	Thomas Gleixner

[-- Attachment #1: Type: text/plain, Size: 739 bytes --]

Hi David,

The flags parameter of xfs_flush_pages() is declared as uint64_t, but
the code never passes values which do not fit into 32 bits.
All callsites but one pass the constant zero or XFS_B_ASYNC, and the
remaining one passes a variable holding XFS_B_DELWRI, XFS_B_ASYNC or zero.
These values are defined in enum xfs_buf_flags_t and they
all fit in 32 bits.

This patch changes the type of the parameter, and of the one variable
used to pass it, to unsigned int.
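
A minimal sketch of the narrowing argument, in plain C. The DEMO_* flag
values and the demo_* helpers are hypothetical stand-ins (the real values
live in enum xfs_buf_flags_t), and the 4-byte difference assumes the usual
4-byte unsigned int:

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flag bits; the real values live in enum xfs_buf_flags_t. */
#define DEMO_B_ASYNC	(1u << 4)
#define DEMO_B_DELWRI	(1u << 6)

/* Returns 1 if a 64-bit flags value survives narrowing to unsigned int. */
int demo_flags_fit(uint64_t flags)
{
	return flags <= UINT_MAX;
}

/* Size difference between the old and the new parameter type. */
size_t demo_type_bytes_saved(void)
{
	return sizeof(uint64_t) - sizeof(unsigned int);
}

/* Narrowing helper: checks the value first, then converts. */
unsigned int demo_narrow_flags(uint64_t flags)
{
	assert(demo_flags_fit(flags));
	return (unsigned int)flags;
}
```

Where arguments are passed on the stack, as on 32-bit x86 without
regparm, that type difference is paid at every call.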

Code size difference on 32-bit x86:

# size */fs/xfs/xfs.o
   text    data     bss     dec     hex filename
 390567    2748    1708  395023   6070f linux-2.6-xfs2-TEST/fs/xfs/xfs.o
 390507    2748    1708  394963   606d3 linux-2.6-xfs3-TEST/fs/xfs/xfs.o

Compile-tested only.

Signed-off-by: Denys Vlasenko <vda.linux@googlemail.com>
--
vda

[-- Attachment #2: xfs2-xfs_flush_pages-use_smaller_int_param.patch --]
[-- Type: text/x-diff, Size: 1079 bytes --]

--- linux-2.6-xfs2/fs/xfs/linux-2.6/xfs_fs_subr.c	Tue Apr 22 04:46:33 2008
+++ linux-2.6-xfs3/fs/xfs/linux-2.6/xfs_fs_subr.c	Tue Apr 22 05:05:47 2008
@@ -72,7 +72,7 @@
 int
 xfs_flush_pages(
 	xfs_inode_t	*ip,
-	uint64_t	flags)
+	unsigned int	flags)
 {
 	bhv_vnode_t	*vp = XFS_ITOV(ip);
 	struct inode	*inode = vn_to_inode(vp);
--- linux-2.6-xfs2/fs/xfs/xfs_vfsops.c	Tue Apr 22 04:47:36 2008
+++ linux-2.6-xfs3/fs/xfs/xfs_vfsops.c	Tue Apr 22 05:05:17 2008
@@ -897,7 +897,7 @@
 	bhv_vnode_t	*vp = NULL;
 	int		error;
 	int		last_error;
-	uint64_t	fflag;
+	unsigned int	fflag;
 	uint		lock_flags;
 	uint		base_lock_flags;
 	boolean_t	mount_locked;
--- linux-2.6-xfs2/fs/xfs/xfs_vnodeops.h	Tue Apr 22 04:46:59 2008
+++ linux-2.6-xfs3/fs/xfs/xfs_vnodeops.h	Tue Apr 22 05:04:52 2008
@@ -78,6 +78,6 @@
 		xfs_off_t last, int fiopt);
 int xfs_flushinval_pages(struct xfs_inode *ip, xfs_off_t first,
 		xfs_off_t last, int fiopt);
-int xfs_flush_pages(struct xfs_inode *ip, uint64_t flags);
+int xfs_flush_pages(struct xfs_inode *ip, unsigned int flags);
 
 #endif /* _XFS_VNODEOPS_H */

^ permalink raw reply	[flat|nested] 162+ messages in thread

* Re: [PATCH] xfs: do not pass unused params to xfs_flush_pages
  2008-04-22  3:03                                     ` [PATCH] xfs: do not pass unused params to xfs_flush_pages Denys Vlasenko
  2008-04-22  3:14                                       ` [PATCH] xfs: use smaller int param in call " Denys Vlasenko
@ 2008-04-22  3:15                                       ` Eric Sandeen
  2008-04-22  8:57                                         ` Denys Vlasenko
  2008-04-22 22:07                                       ` David Chinner
  2 siblings, 1 reply; 162+ messages in thread
From: Eric Sandeen @ 2008-04-22  3:15 UTC (permalink / raw)
  To: Denys Vlasenko
  Cc: David Chinner, Adrian Bunk, Alan Cox, Shawn Bohrer, Ingo Molnar,
	Andrew Morton, Linux Kernel Mailing List, Arjan van de Ven,
	Thomas Gleixner

Denys Vlasenko wrote:
> Hi David,
> 
> xfs_flush_pages() does not use some of its parameters, namely:
> first, last and fiopt.
> 
> This patch removes these parameters from all callsites.
> 
> Code size difference on 32-bit x86:
> 
>    text    data     bss     dec     hex filename
>  390739    2748    1708  395195   607bb linux-2.6-xfs1-TEST/fs/xfs/xfs.o
>  390567    2748    1708  395023   6070f linux-2.6-xfs2-TEST/fs/xfs/xfs.o
> 
> Compile-tested only.

FWIW this one actually does not seem to reduce stack usage anywhere.

-Eric

^ permalink raw reply	[flat|nested] 162+ messages in thread

* Re: [PATCH] xfs: use smaller int param in call to xfs_flush_pages
  2008-04-22  3:14                                       ` [PATCH] xfs: use smaller int param in call " Denys Vlasenko
@ 2008-04-22  3:18                                         ` Eric Sandeen
  2008-04-22  4:10                                           ` David Chinner
  2008-04-22  9:42                                         ` [PATCH] xfs: remove unused parameter of xfs_qm_dqpurge Denys Vlasenko
  2008-04-22 22:08                                         ` [PATCH] xfs: use smaller int param in call to xfs_flush_pages David Chinner
  2 siblings, 1 reply; 162+ messages in thread
From: Eric Sandeen @ 2008-04-22  3:18 UTC (permalink / raw)
  To: Denys Vlasenko
  Cc: David Chinner, Adrian Bunk, Alan Cox, Shawn Bohrer, Ingo Molnar,
	Andrew Morton, Linux Kernel Mailing List, Arjan van de Ven,
	Thomas Gleixner

Denys Vlasenko wrote:
> Hi David,
> 
> The flags parameter of xfs_flush_pages() is declared as uint64_t, but
> the code never passes values which do not fit into 32 bits.
> All callsites but one pass the constant zero or XFS_B_ASYNC, and the
> remaining one passes a variable holding XFS_B_DELWRI, XFS_B_ASYNC or zero.
> These values are defined in enum xfs_buf_flags_t and they
> all fit in 32 bits.
> 
> This patch changes the type of the parameter, and of the one variable
> used to pass it, to unsigned int.


FWIW this one also seems to make no stack difference, at least on x86_64.

Not complaining; just checking it out. :)

If you can shrink xfs_bmapi, let me know.  :)

Thanks,
-Eric

^ permalink raw reply	[flat|nested] 162+ messages in thread

* Re: [PATCH] xfs: do not pass size into kmem_free, it's unused
  2008-04-22  3:09                                     ` [PATCH] xfs: do not pass size into kmem_free, it's unused Eric Sandeen
@ 2008-04-22  3:35                                       ` Eric Sandeen
  0 siblings, 0 replies; 162+ messages in thread
From: Eric Sandeen @ 2008-04-22  3:35 UTC (permalink / raw)
  To: Denys Vlasenko
  Cc: David Chinner, Adrian Bunk, Alan Cox, Shawn Bohrer, Ingo Molnar,
	Andrew Morton, Linux Kernel Mailing List, Arjan van de Ven,
	Thomas Gleixner

Eric Sandeen wrote:
> Denys Vlasenko wrote:
>> Hi David,
>>
>>> Patches are welcome - I'd be over the moon if any of the known 4k
>>> stack advocates sent a stack reduction patch for XFS, but it seems
>>> that actually trying to fix the problems is much harder than
>>> resending a one line patch every few months....
>> kmem_free() function takes (ptr, size) arguments but doesn't
>> actually use second one.
>>
>> This patch removes size argument from all callsites.
> 
> I didn't expect it to but this does reduce a few things slightly.
> 
> On x86_64:
> 
> -xfs_attr_leaf_list_int 200
> +xfs_attr_leaf_list_int 184
> 
> -xfs_dir2_sf_to_block 136
> +xfs_dir2_sf_to_block 120
> 
> -xfs_ifree_cluster 136
> +xfs_ifree_cluster 120
> 
> -xfs_inumbers 184
> +xfs_inumbers 168
> 
> -xfs_mount_free 24

And on x86, just for the record (fedora 9 config in both cases...)

-xfs_attr_leaf_inactive 36
+xfs_attr_leaf_inactive 32

-xfs_attr_shortform_list 40
+xfs_attr_shortform_list 36

-xfs_da_grow_inode 96
+xfs_da_grow_inode 92

-xfs_dir2_grow_inode 116
+xfs_dir2_grow_inode 104

-xfs_dir2_leaf_getdents 176
+xfs_dir2_leaf_getdents 172

-xfs_dir2_sf_to_block 92
+xfs_dir2_sf_to_block 88

-xfs_ifree_cluster 108
+xfs_ifree_cluster 104

-xfs_inumbers 88
+xfs_inumbers 84

-xfs_lock_inodes 24
+xfs_lock_inodes 28

-Eric

^ permalink raw reply	[flat|nested] 162+ messages in thread

* Re: [PATCH] xfs: use smaller int param in call to xfs_flush_pages
  2008-04-22  3:18                                         ` Eric Sandeen
@ 2008-04-22  4:10                                           ` David Chinner
  0 siblings, 0 replies; 162+ messages in thread
From: David Chinner @ 2008-04-22  4:10 UTC (permalink / raw)
  To: Eric Sandeen
  Cc: Denys Vlasenko, David Chinner, Adrian Bunk, Alan Cox,
	Shawn Bohrer, Ingo Molnar, Andrew Morton,
	Linux Kernel Mailing List, Arjan van de Ven, Thomas Gleixner

On Mon, Apr 21, 2008 at 10:18:24PM -0500, Eric Sandeen wrote:
> FWIW this one also seems to make no stack difference, at least on x86_64.
> 
> Not complaining; just checking it out. :)
> 
> If you can shrink xfs_bmapi, let me know.  :)

FWIW, the path we care about is this path through ->writepage:

(submit_bio)
_xfs_buf_ioapply		32
xfs_buf_iorequest		0
xfs_buf_iostart			0
xfs_buf_read_flags		0
xfs_trans_read_buf		4
xfs_btree_read_bufs		16
xfs_alloc_lookup		56
xfs_alloc_lookup_eq		16
xfs_alloc_fixup_trees		20
xfs_alloc_ag_vextent_near	76
xfs_alloc_ag_vextent		0
xfs_alloc_vextent		48
xfs_bmap_btalloc		164
xfs_bmap_alloc			0
xfs_bmapi			228
xfs_iomap_write_allocate	116
xfs_iomap			20
xfs_map_blocks			16
xfs_page_state_convert		124
xfs_vm_writepage		12
-------------------------------------
checkstack total:		948

Realistically, the only functions we can trim anything off are xfs_bmapi,
xfs_bmap_btalloc, xfs_iomap_write_allocate, and xfs_page_state_convert.
It's going to take a lot of work to get any significant change into
those functions given their complexity....

FWIW, if we've come through a syscall, the rest of the trace looks
like:

__writepage				0
write_cache_pages			100
generic_writepages			0
xfs_vm_writepages			12
do_writepages				0
__writeback_single_inode		36
sync_sb_inodes				40
writeback_inodes			0
balance_dirty_pages_ratelimited_nr	76
generic_file_buffered_write		96
xfs_write				80
xfs_file_aio_write			12
do_sync_write				140
vfs_write				12
--------------------------------------------
total					604

So the normal case uses 604 bytes prior to entering ->writepage.

It's when we are already using >2k of the stack when we enter
->writepage that we get into trouble....
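
As a cross-check of the two totals, a small C sketch that sums the
per-function frame sizes from both traces. The array and function names
are made up for illustration; the numbers are copied from the listings
in this message.

```c
#include <assert.h>
#include <stddef.h>

/* Frame sizes (bytes) copied from the ->writepage trace, top to bottom. */
static const unsigned int writepage_frames[] = {
	32, 0, 0, 0, 4, 16, 56, 16, 20, 76,
	0, 48, 164, 0, 228, 116, 20, 16, 124, 12,
};

/* Frame sizes (bytes) copied from the syscall-side trace, top to bottom. */
static const unsigned int syscall_frames[] = {
	0, 100, 0, 12, 0, 36, 40, 0, 76, 96, 80, 12, 140, 12,
};

#define FRAME_COUNT(a)	(sizeof(a) / sizeof((a)[0]))

/* Sum an array of per-function frame sizes (hypothetical helper). */
static unsigned int sum_frames(const unsigned int *frames, size_t n)
{
	unsigned int total = 0;
	size_t i;

	assert(frames != NULL);
	for (i = 0; i < n; i++)
		total += frames[i];
	return total;
}

unsigned int writepage_total(void)
{
	return sum_frames(writepage_frames, FRAME_COUNT(writepage_frames));
}

unsigned int syscall_total(void)
{
	return sum_frames(syscall_frames, FRAME_COUNT(syscall_frames));
}
```

The two paths add up to 948 + 604 = 1552 bytes as reported by checkstack.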

Cheers,

Dave.
-- 
Dave Chinner
Principal Engineer
SGI Australian Software Group

^ permalink raw reply	[flat|nested] 162+ messages in thread

* Re: [PATCH] xfs: do not pass unused params to xfs_flush_pages
  2008-04-22  3:15                                       ` [PATCH] xfs: do not pass unused params " Eric Sandeen
@ 2008-04-22  8:57                                         ` Denys Vlasenko
  2008-04-22  9:56                                           ` Jakub Jelinek
  2008-04-22 12:51                                           ` Eric Sandeen
  0 siblings, 2 replies; 162+ messages in thread
From: Denys Vlasenko @ 2008-04-22  8:57 UTC (permalink / raw)
  To: Eric Sandeen
  Cc: David Chinner, Adrian Bunk, Alan Cox, Shawn Bohrer, Ingo Molnar,
	Andrew Morton, Linux Kernel Mailing List, Arjan van de Ven,
	Thomas Gleixner

On Tuesday 22 April 2008 05:15, Eric Sandeen wrote:
> Denys Vlasenko wrote:
> > Hi David,
> > 
> > xfs_flush_pages() does not use some of its parameters, namely:
> > first, last and fiopt.
> > 
> > This patch removes these parameters from all callsites.
> > 
> > Code size difference on 32-bit x86:
> > 
> >    text    data     bss     dec     hex filename
> >  390739    2748    1708  395195   607bb linux-2.6-xfs1-TEST/fs/xfs/xfs.o
> >  390567    2748    1708  395023   6070f linux-2.6-xfs2-TEST/fs/xfs/xfs.o
> > 
> > Compile-tested only.
> 
> FWIW this one actually does not seem to reduce stack usage anywhere.

I hope this will not deteriorate into a contest whether
every particular patch reduces stack usage or not, but:

You do not see reduced stack usage in "make checkstack",
because "make checkstack" shows only stack usage caused by
local variables (it analyses sub %esp,NN instructions which
make room for them). Parameters also take up stack, but
they are pushed on stack with push instruction,
and so are invisible in "make checkstack" output.
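
For illustration, a simplified C sketch of the kind of match described
above: it recognises a "sub $N,%esp" line and reports N, and reports
nothing for a push. The function name is hypothetical, and the real
scripts/checkstack.pl is a Perl script with per-architecture regular
expressions; this only mirrors the idea and expects the disassembly line
without a trailing newline.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Return the number of bytes reserved by a "sub $N,%esp" instruction
 * in one line of AT&T-syntax disassembly, or 0 for any other line
 * (including push instructions, which is why pushed arguments do not
 * show up in this kind of accounting).
 */
unsigned long frame_bytes_from_insn(const char *line)
{
	const char *p;
	char *end;
	unsigned long n;

	assert(line != NULL);

	p = strstr(line, "sub");
	if (p == NULL)
		return 0;
	p = strchr(p, '$');
	if (p == NULL)
		return 0;

	/* Base 0 lets strtoul accept both "0x7c" and "124". */
	n = strtoul(p + 1, &end, 0);
	if (end == p + 1 || strcmp(end, ",%esp") != 0)
		return 0;
	return n;
}
```

With this, "sub    $0x7c,%esp" counts as 124 bytes while "push   %eax"
counts as nothing.
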
--
vda

^ permalink raw reply	[flat|nested] 162+ messages in thread

* [PATCH] xfs: remove unused parameter of xfs_qm_dqpurge
  2008-04-22  3:14                                       ` [PATCH] xfs: use smaller int param in call " Denys Vlasenko
  2008-04-22  3:18                                         ` Eric Sandeen
@ 2008-04-22  9:42                                         ` Denys Vlasenko
  2008-04-22 10:16                                           ` [PATCH] xfs: remove unused parameter of xfs_iomap_write_allocate Denys Vlasenko
                                                             ` (2 more replies)
  2008-04-22 22:08                                         ` [PATCH] xfs: use smaller int param in call to xfs_flush_pages David Chinner
  2 siblings, 3 replies; 162+ messages in thread
From: Denys Vlasenko @ 2008-04-22  9:42 UTC (permalink / raw)
  To: David Chinner
  Cc: Eric Sandeen, Adrian Bunk, Alan Cox, Shawn Bohrer, Ingo Molnar,
	Andrew Morton, Linux Kernel Mailing List, Arjan van de Ven,
	Thomas Gleixner

[-- Attachment #1: Type: text/plain, Size: 423 bytes --]

Hi David,

xfs_qm_dqpurge() does not use flags parameter.
This patch removes it.

Code size difference on 32-bit x86:

# size */fs/xfs/xfs.o
   text    data     bss     dec     hex filename
 390507    2748    1708  394963   606d3 linux-2.6-xfs3-TEST/fs/xfs/xfs.o
 390491    2748    1708  394947   606c3 linux-2.6-xfs4-TEST/fs/xfs/xfs.o

Compile-tested only.

Signed-off-by: Denys Vlasenko <vda.linux@googlemail.com>
--
vda

[-- Attachment #2: xfs3-xfs_qm_dqpurge-remove_unused_params.patch --]
[-- Type: text/x-diff, Size: 1151 bytes --]

--- linux-2.6-xfs3/fs/xfs/quota/xfs_dquot.c	Tue Apr 22 04:06:46 2008
+++ linux-2.6-xfs4/fs/xfs/quota/xfs_dquot.c	Tue Apr 22 11:34:01 2008
@@ -1435,8 +1435,7 @@
 /* ARGSUSED */
 int
 xfs_qm_dqpurge(
-	xfs_dquot_t	*dqp,
-	uint		flags)
+	xfs_dquot_t	*dqp)
 {
 	xfs_dqhash_t	*thishash;
 	xfs_mount_t	*mp;
--- linux-2.6-xfs3/fs/xfs/quota/xfs_dquot.h	Tue Apr 22 04:06:46 2008
+++ linux-2.6-xfs4/fs/xfs/quota/xfs_dquot.h	Tue Apr 22 11:33:54 2008
@@ -164,7 +164,7 @@
 
 extern void		xfs_qm_dqdestroy(xfs_dquot_t *);
 extern int		xfs_qm_dqflush(xfs_dquot_t *, uint);
-extern int		xfs_qm_dqpurge(xfs_dquot_t *, uint);
+extern int		xfs_qm_dqpurge(xfs_dquot_t *);
 extern void		xfs_qm_dqunpin_wait(xfs_dquot_t *);
 extern int		xfs_qm_dqlock_nowait(xfs_dquot_t *);
 extern int		xfs_qm_dqflock_nowait(xfs_dquot_t *);
--- linux-2.6-xfs3/fs/xfs/quota/xfs_qm.c	Tue Apr 22 04:13:03 2008
+++ linux-2.6-xfs4/fs/xfs/quota/xfs_qm.c	Tue Apr 22 11:34:31 2008
@@ -631,7 +631,7 @@
 		 * freelist in INACTIVE state.
 		 */
 		nextdqp = dqp->MPL_NEXT;
-		nmisses += xfs_qm_dqpurge(dqp, flags);
+		nmisses += xfs_qm_dqpurge(dqp);
 		dqp = nextdqp;
 	}
 	xfs_qm_mplist_unlock(mp);

^ permalink raw reply	[flat|nested] 162+ messages in thread

* Re: [PATCH] xfs: do not pass unused params to xfs_flush_pages
  2008-04-22  8:57                                         ` Denys Vlasenko
@ 2008-04-22  9:56                                           ` Jakub Jelinek
  2008-04-22 10:33                                             ` Denys Vlasenko
  2008-04-22 12:51                                           ` Eric Sandeen
  1 sibling, 1 reply; 162+ messages in thread
From: Jakub Jelinek @ 2008-04-22  9:56 UTC (permalink / raw)
  To: Denys Vlasenko
  Cc: Eric Sandeen, David Chinner, Adrian Bunk, Alan Cox, Shawn Bohrer,
	Ingo Molnar, Andrew Morton, Linux Kernel Mailing List,
	Arjan van de Ven, Thomas Gleixner

On Tue, Apr 22, 2008 at 10:57:33AM +0200, Denys Vlasenko wrote:
> You do not see reduced stack usage in "make checkstack",
> because "make checkstack" shows only stack usage caused by
> local variables (it analyses sub %esp,NN instructions which
> make room for them). Parameters also take up stack, but
> they are pushed on stack with push instruction,
> and so are invisible in "make checkstack" output.

That on i?86 actually depends on whether -maccumulate-outgoing-args
is on or off (the default is off for -Os and most pre-i686 tunings,
and on for i686 and most post-i686 tunings when not -Os).

	Jakub

^ permalink raw reply	[flat|nested] 162+ messages in thread

* [PATCH] xfs: remove unused parameter of xfs_iomap_write_allocate
  2008-04-22  9:42                                         ` [PATCH] xfs: remove unused parameter of xfs_qm_dqpurge Denys Vlasenko
@ 2008-04-22 10:16                                           ` Denys Vlasenko
  2008-04-22 11:20                                             ` [PATCH] xfs: #define out unused parameters of xfs_bmap_add_free and xfs_btree_read_bufl Denys Vlasenko
  2008-04-22 22:33                                             ` [PATCH] xfs: remove unused parameter of xfs_iomap_write_allocate David Chinner
  2008-04-22 22:11                                           ` [PATCH] xfs: remove unused parameter of xfs_qm_dqpurge David Chinner
  2008-04-23  8:18                                           ` Christoph Hellwig
  2 siblings, 2 replies; 162+ messages in thread
From: Denys Vlasenko @ 2008-04-22 10:16 UTC (permalink / raw)
  To: David Chinner
  Cc: Eric Sandeen, Adrian Bunk, Alan Cox, Shawn Bohrer, Ingo Molnar,
	Andrew Morton, Linux Kernel Mailing List, Arjan van de Ven,
	Thomas Gleixner

[-- Attachment #1: Type: text/plain, Size: 384 bytes --]

Hi David,

xfs_iomap_write_allocate() does not use count parameter.
This patch removes it.

Code size difference on 32-bit x86:

# size */fs/xfs/xfs.o
   text    data     bss     dec     hex filename
 393457    2904    2952  399313   617d1 linux-2.6-xfs4-TEST/fs/xfs/xfs.o
 393441    2904    2952  399297   617c1 linux-2.6-xfs5-TEST/fs/xfs/xfs.o

Compile tested only.

Signed-off-by: Denys Vlasenko <vda.linux@googlemail.com>
--
vda

[-- Attachment #2: xfs4-xfs_iomap_write_allocate-remove_unused_params.patch --]
[-- Type: text/x-diff, Size: 1022 bytes --]

--- linux-2.6-xfs4/fs/xfs/xfs_iomap.c	Tue Apr 22 04:06:44 2008
+++ linux-2.6-xfs5/fs/xfs/xfs_iomap.c	Tue Apr 22 11:59:32 2008
@@ -267,7 +267,7 @@
 			break;
 		}
 
-		error = xfs_iomap_write_allocate(ip, offset, count,
+		error = xfs_iomap_write_allocate(ip, offset,
 						 &imap, &nimaps);
 		break;
 	}
@@ -710,7 +710,6 @@
 xfs_iomap_write_allocate(
 	xfs_inode_t	*ip,
 	xfs_off_t	offset,
-	size_t		count,
 	xfs_bmbt_irec_t *map,
 	int		*retmap)
 {
--- linux-2.6-xfs4/fs/xfs/xfs_iomap.h	Tue Apr 22 04:06:44 2008
+++ linux-2.6-xfs5/fs/xfs/xfs_iomap.h	Tue Apr 22 11:59:16 2008
@@ -80,7 +80,7 @@
 				  int, struct xfs_bmbt_irec *, int *, int);
 extern int xfs_iomap_write_delay(struct xfs_inode *, xfs_off_t, size_t, int,
 				 struct xfs_bmbt_irec *, int *);
-extern int xfs_iomap_write_allocate(struct xfs_inode *, xfs_off_t, size_t,
+extern int xfs_iomap_write_allocate(struct xfs_inode *, xfs_off_t,
 				struct xfs_bmbt_irec *, int *);
 extern int xfs_iomap_write_unwritten(struct xfs_inode *, xfs_off_t, size_t);
 

^ permalink raw reply	[flat|nested] 162+ messages in thread

* Re: [PATCH] xfs: do not pass unused params to xfs_flush_pages
  2008-04-22  9:56                                           ` Jakub Jelinek
@ 2008-04-22 10:33                                             ` Denys Vlasenko
  0 siblings, 0 replies; 162+ messages in thread
From: Denys Vlasenko @ 2008-04-22 10:33 UTC (permalink / raw)
  To: Jakub Jelinek
  Cc: Eric Sandeen, David Chinner, Adrian Bunk, Alan Cox, Shawn Bohrer,
	Ingo Molnar, Andrew Morton, Linux Kernel Mailing List,
	Arjan van de Ven, Thomas Gleixner

On Tuesday 22 April 2008 11:56, Jakub Jelinek wrote:
> On Tue, Apr 22, 2008 at 10:57:33AM +0200, Denys Vlasenko wrote:
> > You do not see reduced stack usage in "make checkstack",
> > because "make checkstack" shows only stack usage caused by
> > local variables (it analyses sub %esp,NN instructions which
> > make room for them). Parameters also take up stack, but
> > they are pushed on stack with push instruction,
> > and so are invisible in "make checkstack" output.
> 
> That on i?86 actually depends on whether -maccumulate-outgoing-args
> is on or off (the default is off for -Os and most pre-i686 tunings,
> and on for i686 and most post-i686 tunings when not -Os).

I trust you know it better than I.

I removed a few parameters of a non-static, non-inline function.
Since at the call site gcc has no way of knowing that these parameters
will not be used by the callee, and the function is not regparm
(explicitly or implicitly by being static), I am fairly sure
gcc is putting these parameters on the stack.

"make checkstack" doesn't see any difference. It can only
mean that "make checkstack" does not account for stack space
taken by parameters, not that there is no difference
in stack usage after this change. That is simply not possible IMO.
--
vda

^ permalink raw reply	[flat|nested] 162+ messages in thread

* Re: [PATCH] xfs: #define out unused parameters of xfs_bmap_add_free and xfs_btree_read_bufl
  2008-04-22 10:16                                           ` [PATCH] xfs: remove unused parameter of xfs_iomap_write_allocate Denys Vlasenko
@ 2008-04-22 11:20                                             ` Denys Vlasenko
  2008-04-22 11:48                                               ` [PATCH] xfs: #define out unused parameters for seven functions in xfs_trans.h Denys Vlasenko
                                                                 ` (3 more replies)
  2008-04-22 22:33                                             ` [PATCH] xfs: remove unused parameter of xfs_iomap_write_allocate David Chinner
  1 sibling, 4 replies; 162+ messages in thread
From: Denys Vlasenko @ 2008-04-22 11:20 UTC (permalink / raw)
  To: David Chinner
  Cc: Eric Sandeen, Adrian Bunk, Alan Cox, Shawn Bohrer, Ingo Molnar,
	Andrew Morton, Linux Kernel Mailing List, Arjan van de Ven,
	Thomas Gleixner

[-- Attachment #1: Type: text/plain, Size: 574 bytes --]

Hi David,

The xfs_bmap_add_free and xfs_btree_read_bufl functions
use some of their parameters only in some cases
(e.g. if DEBUG is defined, or on a non-Linux OS :)

This patch removes these parameters using a #define hack
which makes them "disappear" without the need to uglify
every callsite with #ifdefs.
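
A self-contained sketch of how such a #define hack behaves, using a
made-up function rather than XFS code (demo_hold, demo_caller and the
"ctx" argument are all hypothetical). It relies on the preprocessor rule
that a function-like macro is not expanded again inside its own expansion:

```c
#include <assert.h>

#ifndef DEBUG
/*
 * Every later "demo_hold(a, b)" is rewritten to "demo_hold(b)":
 * the prototype, the definition and all calls. The inner name is
 * not re-expanded, so this does not recurse.
 */
#define demo_hold(ctx, item) demo_hold(item)
#endif

/* Expands to "int demo_hold(int item);" when DEBUG is not defined. */
int demo_hold(const char *ctx, int item);

int demo_hold(const char *ctx, int item)
{
#ifdef DEBUG
	assert(ctx != NULL);	/* ctx is consulted only when debugging */
#endif
	return item * 2;
}

/* The caller keeps the two-argument spelling in both build flavours. */
int demo_caller(void)
{
	return demo_hold("unused outside DEBUG", 21);
}
```

One consequence worth keeping in mind: in non-DEBUG builds the dropped
argument expression is not evaluated at all, so any side effect in it
disappears together with the parameter.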

Code size difference on 32-bit x86:
   text    data     bss     dec     hex filename
 393457    2904    2952  399313   617d1 linux-2.6-xfs6-TEST/fs/xfs/xfs.o
 393441    2904    2952  399297   617c1 linux-2.6-xfs7-TEST/fs/xfs/xfs.o

Compile tested only.

Signed-off-by: Denys Vlasenko <vda.linux@googlemail.com>
--
vda

[-- Attachment #2: xfs5-xfs_bmap_add_free_xfs_btree_read_bufl-remove_unused_params.patch --]
[-- Type: text/x-diff, Size: 1220 bytes --]

diff -urpN linux-2.6-xfs6/fs/xfs/xfs_bmap.h linux-2.6-xfs7/fs/xfs/xfs_bmap.h
--- linux-2.6-xfs6/fs/xfs/xfs_bmap.h	2008-04-22 04:06:43.000000000 +0200
+++ linux-2.6-xfs7/fs/xfs/xfs_bmap.h	2008-04-22 13:10:36.000000000 +0200
@@ -170,6 +170,10 @@ xfs_bmap_add_attrfork(
  * Add the extent to the list of extents to be free at transaction end.
  * The list is maintained sorted (by block number).
  */
+#ifndef DEBUG
+/* "mp" is used only for debugging */
+#define xfs_bmap_add_free(bno, len, flist, mp) xfs_bmap_add_free(bno, len, flist)
+#endif
 void
 xfs_bmap_add_free(
 	xfs_fsblock_t		bno,		/* fs block number of extent */
diff -urpN linux-2.6-xfs6/fs/xfs/xfs_btree.h linux-2.6-xfs7/fs/xfs/xfs_btree.h
--- linux-2.6-xfs6/fs/xfs/xfs_btree.h	2008-04-22 04:06:43.000000000 +0200
+++ linux-2.6-xfs7/fs/xfs/xfs_btree.h	2008-04-22 13:10:36.000000000 +0200
@@ -363,6 +363,9 @@ xfs_btree_offsets(
  * Get a buffer for the block, return it read in.
  * Long-form addressing.
  */
+/* In Linux, "refval" is not used */
+#define xfs_btree_read_bufl(mp, tp, fsbno, lock, bpp, refval) \
+	xfs_btree_read_bufl(mp, tp, fsbno, lock, bpp)
 int					/* error */
 xfs_btree_read_bufl(
 	struct xfs_mount	*mp,	/* file system mount point */

^ permalink raw reply	[flat|nested] 162+ messages in thread

* [PATCH] xfs: #define out unused parameters for seven functions in xfs_trans.h
  2008-04-22 11:20                                             ` [PATCH] xfs: #define out unused parameters of xfs_bmap_add_free and xfs_btree_read_bufl Denys Vlasenko
@ 2008-04-22 11:48                                               ` Denys Vlasenko
  2008-04-22 11:51                                               ` Denys Vlasenko
                                                                 ` (2 subsequent siblings)
  3 siblings, 0 replies; 162+ messages in thread
From: Denys Vlasenko @ 2008-04-22 11:48 UTC (permalink / raw)
  To: David Chinner
  Cc: Eric Sandeen, Adrian Bunk, Alan Cox, Shawn Bohrer, Ingo Molnar,
	Andrew Morton, Linux Kernel Mailing List, Arjan van de Ven,
	Thomas Gleixner

Hi David,

Seven xfs_trans_XXX functions declared in xfs_trans.h
do not use their "tp" parameter in non-debug builds,
but it still takes stack space since these functions
are not static and gcc cannot optimize it out.

This patch removes these parameters using a #define hack
which makes them "disappear" without the need to uglify
every callsite with #ifdefs.

Code size difference on 32-bit x86:
   text    data     bss     dec     hex filename
 393441    2904    2952  399297   617c1 linux-2.6-xfs7-TEST/fs/xfs/xfs.o
 393289    2904    2952  399145   61729 linux-2.6-xfs8-TEST/fs/xfs/xfs.o

Compile tested only.

Signed-off-by: Denys Vlasenko <vda.linux@googlemail.com>
--
vda

^ permalink raw reply	[flat|nested] 162+ messages in thread

* [PATCH] xfs: #define out unused parameters for seven functions in xfs_trans.h
  2008-04-22 11:20                                             ` [PATCH] xfs: #define out unused parameters of xfs_bmap_add_free and xfs_btree_read_bufl Denys Vlasenko
  2008-04-22 11:48                                               ` [PATCH] xfs: #define out unused parameters for seven functions in xfs_trans.h Denys Vlasenko
@ 2008-04-22 11:51                                               ` Denys Vlasenko
  2008-04-22 13:32                                                 ` [PATCH] xfs: remove unused params from functions in xfs_dir2_leaf.h Denys Vlasenko
  2008-04-22 22:47                                                 ` [PATCH] xfs: #define out unused parameters for seven functions in xfs_trans.h David Chinner
  2008-04-22 14:28                                               ` [PATCH] xfs: #define out unused parameters of xfs_bmap_add_free and xfs_btree_read_bufl Adrian Bunk
  2008-04-22 22:43                                               ` David Chinner
  3 siblings, 2 replies; 162+ messages in thread
From: Denys Vlasenko @ 2008-04-22 11:51 UTC (permalink / raw)
  To: David Chinner
  Cc: Eric Sandeen, Adrian Bunk, Alan Cox, Shawn Bohrer, Ingo Molnar,
	Andrew Morton, Linux Kernel Mailing List, Arjan van de Ven,
	Thomas Gleixner

[-- Attachment #1: Type: text/plain, Size: 692 bytes --]

[ resend: now with patch attached! :) ]

Hi David,

Seven xfs_trans_XXX functions declared in xfs_trans.h
do not use their "tp" parameter in non-debug builds,
but it still takes stack space since these functions
are not static and gcc cannot optimize it out.

This patch removes these parameters using a #define hack
which makes them "disappear" without the need to uglify
every callsite with #ifdefs.

Code size difference on 32-bit x86:
   text    data     bss     dec     hex filename
 393441    2904    2952  399297   617c1 linux-2.6-xfs7-TEST/fs/xfs/xfs.o
 393289    2904    2952  399145   61729 linux-2.6-xfs8-TEST/fs/xfs/xfs.o

Compile tested only.

Signed-off-by: Denys Vlasenko <vda.linux@googlemail.com>
--
vda

[-- Attachment #2: xfs6-xfs_trans.h-remove_unused_params.patch --]
[-- Type: text/x-diff, Size: 2630 bytes --]

diff -urpN linux-2.6-xfs7/fs/xfs/xfs_trans_buf.c linux-2.6-xfs8/fs/xfs/xfs_trans_buf.c
--- linux-2.6-xfs7/fs/xfs/xfs_trans_buf.c	2008-04-22 04:06:44.000000000 +0200
+++ linux-2.6-xfs8/fs/xfs/xfs_trans_buf.c	2008-04-22 13:37:23.000000000 +0200
@@ -687,7 +687,6 @@ xfs_trans_bjoin(xfs_trans_t	*tp,
  * IOP_UNLOCK() routine is called.  The buffer must already be locked
  * and associated with the given transaction.
  */
-/* ARGSUSED */
 void
 xfs_trans_bhold(xfs_trans_t	*tp,
 		xfs_buf_t	*bp)
@@ -950,7 +949,6 @@ xfs_trans_stale_inode_buf(
  * xfs_buf_item_committed() to ensure that the buffer remains in the
  * AIL at its original location even after it has been relogged.
  */
-/* ARGSUSED */
 void
 xfs_trans_inode_alloc_buf(
 	xfs_trans_t	*tp,
@@ -979,7 +977,6 @@ xfs_trans_inode_alloc_buf(
  * between usr dquot bufs and grp dquot bufs, because usr and grp quotas
  * can be turned off independently.
  */
-/* ARGSUSED */
 void
 xfs_trans_dquot_buf(
 	xfs_trans_t	*tp,
diff -urpN linux-2.6-xfs7/fs/xfs/xfs_trans.h linux-2.6-xfs8/fs/xfs/xfs_trans.h
--- linux-2.6-xfs7/fs/xfs/xfs_trans.h	2008-04-22 13:10:36.000000000 +0200
+++ linux-2.6-xfs8/fs/xfs/xfs_trans.h	2008-04-22 13:41:38.000000000 +0200
@@ -961,6 +961,16 @@ struct xfs_buf	*xfs_trans_getsb(xfs_tran
 
 void		xfs_trans_brelse(xfs_trans_t *, struct xfs_buf *);
 void		xfs_trans_bjoin(xfs_trans_t *, struct xfs_buf *);
+#ifndef DEBUG
+/* "tp" is used only for debugging */
+#define         xfs_trans_bhold(tp, bp)           xfs_trans_bhold(bp)
+#define         xfs_trans_bhold_release(tp, bp)   xfs_trans_bhold_release(bp)
+#define         xfs_trans_inode_buf(tp, bp)       xfs_trans_inode_buf(bp)
+#define         xfs_trans_stale_inode_buf(tp, bp) xfs_trans_stale_inode_buf(bp)
+#define         xfs_trans_inode_alloc_buf(tp, bp) xfs_trans_inode_alloc_buf(bp)
+#define         xfs_trans_dquot_buf(tp, bp, type) xfs_trans_dquot_buf(bp, type)
+#define         xfs_trans_ihold(tp, bp)           xfs_trans_ihold(bp)
+#endif
 void		xfs_trans_bhold(xfs_trans_t *, struct xfs_buf *);
 void		xfs_trans_bhold_release(xfs_trans_t *, struct xfs_buf *);
 void		xfs_trans_binval(xfs_trans_t *, struct xfs_buf *);
diff -urpN linux-2.6-xfs7/fs/xfs/xfs_trans_inode.c linux-2.6-xfs8/fs/xfs/xfs_trans_inode.c
--- linux-2.6-xfs7/fs/xfs/xfs_trans_inode.c	2008-04-22 04:12:18.000000000 +0200
+++ linux-2.6-xfs8/fs/xfs/xfs_trans_inode.c	2008-04-22 13:38:10.000000000 +0200
@@ -224,7 +224,6 @@ xfs_trans_ijoin(
  * IOP_UNLOCK() routine is called.  The inode must already be locked
  * and associated with the given transaction.
  */
-/*ARGSUSED*/
 void
 xfs_trans_ihold(
 	xfs_trans_t	*tp,

^ permalink raw reply	[flat|nested] 162+ messages in thread

* Re: x86: 4kstacks default
  2008-04-22  1:28                                 ` David Chinner
  2008-04-22  2:33                                   ` [PATCH] xfs: do not pass size into kmem_free, it's unused Denys Vlasenko
@ 2008-04-22 12:48                                   ` Denys Vlasenko
  2008-04-22 13:01                                     ` Adrian Bunk
  2008-04-27 19:27                                   ` Jörn Engel
  2 siblings, 1 reply; 162+ messages in thread
From: Denys Vlasenko @ 2008-04-22 12:48 UTC (permalink / raw)
  To: David Chinner
  Cc: Eric Sandeen, Adrian Bunk, Alan Cox, Shawn Bohrer, Ingo Molnar,
	Andrew Morton, Linux Kernel Mailing List, Arjan van de Ven,
	Thomas Gleixner

On Tuesday 22 April 2008 03:28, David Chinner wrote:
> We've already chopped off the low hanging fruit, added noinline to
> every function definition to prevent compiler heuristics from
> blowing out stack usage by 25% and reduced use of temporary
> variables as much as possible. There's very little fat left to trim,
> and still we can't reliably fit in 4k stacks.

And yet, I got four screenfuls of

fs/xfs/XXXXX.c: warning: unused parameter 'foo'

when I added -Wunused-parameter to the Makefile.
Clearly there is some room for improvement.

> Patches are welcome - I'd be over the moon if any of the known 4k
> stack advocates sent a stack reduction patch for XFS, but it seems
> that actually trying to fix the problems is much harder than
> resending a one line patch every few months....

Sent a few.
I would like to ask you to ACK/NAK every individual patch
in some reasonable period of time, say, 1-3 days. If you NAK a patch,
please let me know what is wrong with it.

I am not eager at all to experience a repeat of the aic7xxx
patch saga, when I did not get any meaningful reply
for months.

Best regards, Denys.
--
vda

^ permalink raw reply	[flat|nested] 162+ messages in thread

* Re: [PATCH] xfs: do not pass unused params to xfs_flush_pages
  2008-04-22  8:57                                         ` Denys Vlasenko
  2008-04-22  9:56                                           ` Jakub Jelinek
@ 2008-04-22 12:51                                           ` Eric Sandeen
  1 sibling, 0 replies; 162+ messages in thread
From: Eric Sandeen @ 2008-04-22 12:51 UTC (permalink / raw)
  To: Denys Vlasenko
  Cc: David Chinner, Adrian Bunk, Alan Cox, Shawn Bohrer, Ingo Molnar,
	Andrew Morton, Linux Kernel Mailing List, Arjan van de Ven,
	Thomas Gleixner

Denys Vlasenko wrote:
> On Tuesday 22 April 2008 05:15, Eric Sandeen wrote:


>>> Compile-tested only.
>> FWIW this one actually does not seem to reduce stack usage anywhere.
> 
> I hope this will not deteriorate into a contest whether
> every particular patch reduces stack usage or not, but:

Sorry if you took it that way; since the patch was in response to Dave's
mention of accepting stack-reducing patches, I thought it was worth
checking and highlighting whether it seemed to help.  It wasn't supposed
to be an attack or argument.

> You do not see reduced stack usage in "make checkstack",
> because "make checkstack" shows only stack usage caused by
> local variables (it analyses sub %esp,NN instructions which
> make room for them). Parameters also take up stack, but
> they are pushed on stack with push instruction,
> and so are invisible in "make checkstack" output.

Hm, I had assumed that the %esp subtraction also made room for the
arguments pushed onto the stack.  Is there no way to analyze that part?
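
The limitation described in the quoted text can be sketched roughly
as follows. This is a C illustration of a checkstack-style scan under
stated assumptions, not the real tool (which is a Perl script,
scripts/checkstack.pl); the function name frame_bytes and the exact
pattern it matches are hypothetical:

```c
#include <stdio.h>
#include <string.h>

/*
 * Return the number of bytes one disassembled instruction reserves
 * for locals, or 0. Only "sub $0xNN,%esp" is recognised; "push"
 * lines contribute nothing, so pushed arguments are not counted.
 */
unsigned long frame_bytes(const char *insn)
{
	unsigned long n;
	char reg[8];

	if (sscanf(insn, " sub $0x%lx,%%%7s", &n, reg) == 2 &&
	    strcmp(reg, "esp") == 0)
		return n;
	return 0;
}
```

A scan like this would have to track push instructions (and the
matching stack adjustments after calls) as well in order to account
for argument space.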

Thanks,
-Eric

^ permalink raw reply	[flat|nested] 162+ messages in thread

* Re: x86: 4kstacks default
  2008-04-22 12:48                                   ` x86: 4kstacks default Denys Vlasenko
@ 2008-04-22 13:01                                     ` Adrian Bunk
  2008-04-22 13:51                                       ` Denys Vlasenko
  0 siblings, 1 reply; 162+ messages in thread
From: Adrian Bunk @ 2008-04-22 13:01 UTC (permalink / raw)
  To: Denys Vlasenko
  Cc: David Chinner, Eric Sandeen, Alan Cox, Shawn Bohrer, Ingo Molnar,
	Andrew Morton, Linux Kernel Mailing List, Arjan van de Ven,
	Thomas Gleixner

On Tue, Apr 22, 2008 at 02:48:09PM +0200, Denys Vlasenko wrote:
> On Tuesday 22 April 2008 03:28, David Chinner wrote:
>...
> > Patches are welcome - I'd be over the moon if any of the known 4k
> > stack advocates sent a stack reduction patch for XFS, but it seems
> > that actually trying to fix the problems is much harder than
> > resending a one line patch every few months....
> 
> Sent a few.
> I would like to ask you to ACK/NAK every individual patch
> in some reasonable period of time, say, 1-3 days. If you NAK a patch,
> please let me know what is wrong with it.
>...

I know the feeling of resending patches again and again without any 
reaction quite well, but that's not David's fault and not true for XFS 
patches, so when you try to put pressure on him you hit the wrong 
person.

> Best regards, Denys.

cu
Adrian

-- 

       "Is there not promise of rain?" Ling Tan asked suddenly out
        of the darkness. There had been need of rain for many days.
       "Only a promise," Lao Er said.
                                       Pearl S. Buck - Dragon Seed


^ permalink raw reply	[flat|nested] 162+ messages in thread

* [PATCH] xfs: remove unused params from functions in xfs_dir2_leaf.h
  2008-04-22 11:51                                               ` Denys Vlasenko
@ 2008-04-22 13:32                                                 ` Denys Vlasenko
  2008-04-22 13:40                                                   ` [PATCH] xfs: remove unused params from functions in xfs/quota/* Denys Vlasenko
  2008-04-22 22:47                                                 ` [PATCH] xfs: #define out unused parameters for seven functions in xfs_trans.h David Chinner
  1 sibling, 1 reply; 162+ messages in thread
From: Denys Vlasenko @ 2008-04-22 13:32 UTC (permalink / raw)
  To: David Chinner
  Cc: Eric Sandeen, Adrian Bunk, Alan Cox, Shawn Bohrer, Ingo Molnar,
	Andrew Morton, Linux Kernel Mailing List, Arjan van de Ven,
	Thomas Gleixner

[-- Attachment #1: Type: text/plain, Size: 507 bytes --]

Hi David,

Inline functions xfs_dir2_dataptr_to_byte and xfs_dir2_byte_to_dataptr
do not use their first argument. gcc is able to optimize that out.

I still want to delete these parameters, as they serve no useful purpose,
and by removing them I can make gcc notice some additional
unused variables in the callers of these inlines, and warn me
about that.

There is no object code size difference from this change.

Compile tested only.

Signed-off-by: Denys Vlasenko <vda.linux@googlemail.com>
--
vda 

[-- Attachment #2: xfs7-xfs_dir2_leaf.h-remove_unused_params.patch --]
[-- Type: text/x-diff, Size: 4887 bytes --]

diff -urpN linux-2.6-xfs5/fs/xfs/xfs_dir2_block.c linux-2.6-xfs6/fs/xfs/xfs_dir2_block.c
--- linux-2.6-xfs5/fs/xfs/xfs_dir2_block.c	2008-04-22 04:19:32.000000000 +0200
+++ linux-2.6-xfs6/fs/xfs/xfs_dir2_block.c	2008-04-22 12:19:44.000000000 +0200
@@ -397,7 +397,7 @@ xfs_dir2_block_addname(
 	 * Fill in the leaf entry.
 	 */
 	blp[mid].hashval = cpu_to_be32(args->hashval);
-	blp[mid].address = cpu_to_be32(xfs_dir2_byte_to_dataptr(mp,
+	blp[mid].address = cpu_to_be32(xfs_dir2_byte_to_dataptr(
 				(char *)dep - (char *)block));
 	xfs_dir2_block_log_leaf(tp, bp, lfloglow, lfloghigh);
 	/*
@@ -1124,7 +1124,7 @@ xfs_dir2_sf_to_block(
 	*tagp = cpu_to_be16((char *)dep - (char *)block);
 	xfs_dir2_data_log_entry(tp, bp, dep);
 	blp[0].hashval = cpu_to_be32(xfs_dir_hash_dot);
-	blp[0].address = cpu_to_be32(xfs_dir2_byte_to_dataptr(mp,
+	blp[0].address = cpu_to_be32(xfs_dir2_byte_to_dataptr(
 				(char *)dep - (char *)block));
 	/*
 	 * Create entry for ..
@@ -1138,7 +1138,7 @@ xfs_dir2_sf_to_block(
 	*tagp = cpu_to_be16((char *)dep - (char *)block);
 	xfs_dir2_data_log_entry(tp, bp, dep);
 	blp[1].hashval = cpu_to_be32(xfs_dir_hash_dotdot);
-	blp[1].address = cpu_to_be32(xfs_dir2_byte_to_dataptr(mp,
+	blp[1].address = cpu_to_be32(xfs_dir2_byte_to_dataptr(
 				(char *)dep - (char *)block));
 	offset = XFS_DIR2_DATA_FIRST_OFFSET;
 	/*
@@ -1189,7 +1189,7 @@ xfs_dir2_sf_to_block(
 		xfs_dir2_data_log_entry(tp, bp, dep);
 		blp[2 + i].hashval = cpu_to_be32(xfs_da_hashname(
 					(char *)sfep->name, sfep->namelen));
-		blp[2 + i].address = cpu_to_be32(xfs_dir2_byte_to_dataptr(mp,
+		blp[2 + i].address = cpu_to_be32(xfs_dir2_byte_to_dataptr(
 						 (char *)dep - (char *)block));
 		offset = (int)((char *)(tagp + 1) - (char *)block);
 		if (++i == sfp->hdr.count)
diff -urpN linux-2.6-xfs5/fs/xfs/xfs_dir2_leaf.c linux-2.6-xfs6/fs/xfs/xfs_dir2_leaf.c
--- linux-2.6-xfs5/fs/xfs/xfs_dir2_leaf.c	2008-04-22 04:18:27.000000000 +0200
+++ linux-2.6-xfs6/fs/xfs/xfs_dir2_leaf.c	2008-04-22 12:19:21.000000000 +0200
@@ -804,7 +804,7 @@ xfs_dir2_leaf_getdents(
 	 * Inside the loop we keep the main offset value as a byte offset
 	 * in the directory file.
 	 */
-	curoff = xfs_dir2_dataptr_to_byte(mp, *offset);
+	curoff = xfs_dir2_dataptr_to_byte(*offset);
 
 	/*
 	 * Force this conversion through db so we truncate the offset
@@ -1091,7 +1091,7 @@ xfs_dir2_leaf_getdents(
 		 * Won't fit.  Return to caller.
 		 */
 		if (filldir(dirent, dep->name, dep->namelen,
-			    xfs_dir2_byte_to_dataptr(mp, curoff),
+			    xfs_dir2_byte_to_dataptr(curoff),
 			    ino, DT_UNKNOWN))
 			break;
 
@@ -1106,10 +1106,10 @@ xfs_dir2_leaf_getdents(
 	/*
 	 * All done.  Set output offset value to current offset.
 	 */
-	if (curoff > xfs_dir2_dataptr_to_byte(mp, XFS_DIR2_MAX_DATAPTR))
+	if (curoff > xfs_dir2_dataptr_to_byte(XFS_DIR2_MAX_DATAPTR))
 		*offset = XFS_DIR2_MAX_DATAPTR;
 	else
-		*offset = xfs_dir2_byte_to_dataptr(mp, curoff);
+		*offset = xfs_dir2_byte_to_dataptr(curoff);
 	kmem_free(map);
 	if (bp)
 		xfs_da_brelse(NULL, bp);
diff -urpN linux-2.6-xfs5/fs/xfs/xfs_dir2_leaf.h linux-2.6-xfs6/fs/xfs/xfs_dir2_leaf.h
--- linux-2.6-xfs5/fs/xfs/xfs_dir2_leaf.h	2008-04-22 04:06:43.000000000 +0200
+++ linux-2.6-xfs6/fs/xfs/xfs_dir2_leaf.h	2008-04-22 12:19:29.000000000 +0200
@@ -112,7 +112,7 @@ xfs_dir2_leaf_bests_p(xfs_dir2_leaf_tail
  * Convert dataptr to byte in file space
  */
 static inline xfs_dir2_off_t
-xfs_dir2_dataptr_to_byte(struct xfs_mount *mp, xfs_dir2_dataptr_t dp)
+xfs_dir2_dataptr_to_byte(xfs_dir2_dataptr_t dp)
 {
 	return (xfs_dir2_off_t)(dp) << XFS_DIR2_DATA_ALIGN_LOG;
 }
@@ -121,7 +121,7 @@ xfs_dir2_dataptr_to_byte(struct xfs_moun
  * Convert byte in file space to dataptr.  It had better be aligned.
  */
 static inline xfs_dir2_dataptr_t
-xfs_dir2_byte_to_dataptr(struct xfs_mount *mp, xfs_dir2_off_t by)
+xfs_dir2_byte_to_dataptr(xfs_dir2_off_t by)
 {
 	return (xfs_dir2_dataptr_t)((by) >> XFS_DIR2_DATA_ALIGN_LOG);
 }
@@ -142,7 +142,7 @@ xfs_dir2_byte_to_db(struct xfs_mount *mp
 static inline xfs_dir2_db_t
 xfs_dir2_dataptr_to_db(struct xfs_mount *mp, xfs_dir2_dataptr_t dp)
 {
-	return xfs_dir2_byte_to_db(mp, xfs_dir2_dataptr_to_byte(mp, dp));
+	return xfs_dir2_byte_to_db(mp, xfs_dir2_dataptr_to_byte(dp));
 }
 
 /*
@@ -161,7 +161,7 @@ xfs_dir2_byte_to_off(struct xfs_mount *m
 static inline xfs_dir2_data_aoff_t
 xfs_dir2_dataptr_to_off(struct xfs_mount *mp, xfs_dir2_dataptr_t dp)
 {
-	return xfs_dir2_byte_to_off(mp, xfs_dir2_dataptr_to_byte(mp, dp));
+	return xfs_dir2_byte_to_off(mp, xfs_dir2_dataptr_to_byte(dp));
 }
 
 /*
@@ -200,7 +200,7 @@ static inline xfs_dir2_dataptr_t
 xfs_dir2_db_off_to_dataptr(struct xfs_mount *mp, xfs_dir2_db_t db,
 			   xfs_dir2_data_aoff_t o)
 {
-	return xfs_dir2_byte_to_dataptr(mp, xfs_dir2_db_off_to_byte(mp, db, o));
+	return xfs_dir2_byte_to_dataptr(xfs_dir2_db_off_to_byte(mp, db, o));
 }
 
 /*

^ permalink raw reply	[flat|nested] 162+ messages in thread

* [PATCH] xfs: remove unused params from functions in xfs/quota/*
  2008-04-22 13:32                                                 ` [PATCH] xfs: remove unused params from functions in xfs_dir2_leaf.h Denys Vlasenko
@ 2008-04-22 13:40                                                   ` Denys Vlasenko
  2008-04-22 13:46                                                     ` [PATCH] xfs: expose no-op xfs_put_perag() Denys Vlasenko
  2008-04-22 23:08                                                     ` [PATCH] xfs: remove unused params from functions in xfs/quota/* David Chinner
  0 siblings, 2 replies; 162+ messages in thread
From: Denys Vlasenko @ 2008-04-22 13:40 UTC (permalink / raw)
  To: David Chinner
  Cc: Eric Sandeen, Adrian Bunk, Alan Cox, Shawn Bohrer, Ingo Molnar,
	Andrew Morton, Linux Kernel Mailing List, Arjan van de Ven,
	Thomas Gleixner

[-- Attachment #1: Type: text/plain, Size: 505 bytes --]

Hi David,

This patch deals with the remaining cases of unused parameters in
fs/xfs/quota/*, as far as I can see so far. The rest of the unused
parameters in fs/xfs/quota/* cannot be easily eliminated because the
addresses of those functions are taken.

Code size difference on 32-bit x86:
 393289    2904    2952  399145   61729 linux-2.6-xfs8-TEST/fs/xfs/xfs.o
 393236    2904    2952  399092   616f4 linux-2.6-xfs9-TEST/fs/xfs/xfs.o

Compile tested only.

Signed-off-by: Denys Vlasenko <vda.linux@googlemail.com>
--
vda

[-- Attachment #2: xfs8-quota-remove_unused_params.patch --]
[-- Type: text/x-diff, Size: 2138 bytes --]

diff -urpN linux-2.6-xfs8/fs/xfs/quota/xfs_dquot.c linux-2.6-xfs9/fs/xfs/quota/xfs_dquot.c
--- linux-2.6-xfs8/fs/xfs/quota/xfs_dquot.c	2008-04-22 11:34:01.000000000 +0200
+++ linux-2.6-xfs9/fs/xfs/quota/xfs_dquot.c	2008-04-22 15:12:08.000000000 +0200
@@ -640,7 +640,11 @@ xfs_qm_dqtobp(
  * and release the buffer immediately.
  *
  */
-/* ARGSUSED */
+#ifndef DEBUG
+/* "id" is used only for debugging */
+#define xfs_qm_dqread(tpp, id, dqp, flags) \
+	xfs_qm_dqread(tpp, dqp, flags)
+#endif
 STATIC int
 xfs_qm_dqread(
 	xfs_trans_t	**tpp,
diff -urpN linux-2.6-xfs8/fs/xfs/quota/xfs_qm.c linux-2.6-xfs9/fs/xfs/quota/xfs_qm.c
--- linux-2.6-xfs8/fs/xfs/quota/xfs_qm.c	2008-04-22 11:34:31.000000000 +0200
+++ linux-2.6-xfs9/fs/xfs/quota/xfs_qm.c	2008-04-22 15:12:54.000000000 +0200
@@ -65,6 +65,8 @@ kmem_zone_t	*qm_dqtrxzone;
 
 static cred_t	xfs_zerocr;
 
+/* "str" and "n" are unused */
+#define xfs_qm_list_init(list, str, n) xfs_qm_list_init(list)
 STATIC void	xfs_qm_list_init(xfs_dqlist_t *, char *, int);
 STATIC void	xfs_qm_list_destroy(xfs_dqlist_t *);
 
@@ -210,10 +212,10 @@ xfs_qm_destroy(
  * structures are pretty independent, but it helps the XQM keep a
  * global view of what's going on.
  */
-/* ARGSUSED */
 STATIC int
 xfs_qm_hold_quotafs_ref(
-	struct xfs_mount *mp)
+	void
+	/*struct xfs_mount *mp*/)
 {
 	/*
 	 * Need to lock the xfs_Gqm structure for things like this. For example,
@@ -243,7 +245,8 @@ xfs_qm_hold_quotafs_ref(
 /* ARGSUSED */
 STATIC void
 xfs_qm_rele_quotafs_ref(
-	struct xfs_mount *mp)
+	void
+	/*struct xfs_mount *mp*/)
 {
 	xfs_dquot_t	*dqp, *nextdqp;
 
@@ -1122,7 +1125,7 @@ xfs_qm_init_quotainfo(
 	/*
 	 * Tell XQM that we exist as soon as possible.
 	 */
-	if ((error = xfs_qm_hold_quotafs_ref(mp))) {
+	if ((error = xfs_qm_hold_quotafs_ref(/* mp */))) {
 		return error;
 	}
 
@@ -1233,7 +1236,7 @@ xfs_qm_destroy_quotainfo(
 	 * when the XQM structure should be freed. We cannot assume
 	 * that xfs_Gqm is non-null after this point.
 	 */
-	xfs_qm_rele_quotafs_ref(mp);
+	xfs_qm_rele_quotafs_ref(/* mp */);
 
 	spinlock_destroy(&qi->qi_pinlock);
 	xfs_qm_list_destroy(&qi->qi_dqlist);

^ permalink raw reply	[flat|nested] 162+ messages in thread

* [PATCH] xfs: expose no-op xfs_put_perag()
  2008-04-22 13:40                                                   ` [PATCH] xfs: remove unused params from functions in xfs/quota/* Denys Vlasenko
@ 2008-04-22 13:46                                                     ` Denys Vlasenko
  2008-04-22 14:08                                                       ` Eric Sandeen
  2008-04-22 23:16                                                       ` David Chinner
  2008-04-22 23:08                                                     ` [PATCH] xfs: remove unused params from functions in xfs/quota/* David Chinner
  1 sibling, 2 replies; 162+ messages in thread
From: Denys Vlasenko @ 2008-04-22 13:46 UTC (permalink / raw)
  To: David Chinner
  Cc: Eric Sandeen, Adrian Bunk, Alan Cox, Shawn Bohrer, Ingo Molnar,
	Andrew Morton, Linux Kernel Mailing List, Arjan van de Ven,
	Thomas Gleixner

[-- Attachment #1: Type: text/plain, Size: 529 bytes --]

Hi David,

Inline function xfs_put_perag() in fs/xfs/xfs_mount.h is a no-op.

This patch converts it to a no-op macro.

As a result, gcc will emit warnings about unused variables,
parameters and so on not in this function, but in its callers,
which is more useful.
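
A minimal sketch of the effect, using hypothetical names (put_perag,
struct mount, struct perag, count_ags) rather than the XFS ones:

```c
struct mount { int nags; };	/* hypothetical stand-in */
struct perag { int refs; };	/* hypothetical stand-in */

/*
 * As a macro, the no-op discards its arguments during preprocessing.
 * An empty inline function would instead count as a use of them.
 */
#define put_perag(mp, pag) ((void)0)

int count_ags(struct mount *mp, struct perag *pag)
{
	/*
	 * "pag" is referenced only here, so after expansion it is not
	 * used at all; -Wunused-parameter reports it in this caller.
	 */
	put_perag(mp, pag);
	return mp->nags;
}
```

The price is that the macro no longer type-checks its arguments.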

This patch, together with previous ones, has already resulted
in more unused params discovered and warned about by gcc.

There is no object code size difference from this change.

Compile tested only.

Signed-off-by: Denys Vlasenko <vda.linux@googlemail.com>
--
vda

[-- Attachment #2: xfs_trivial0-expose-xfs_put_perag-noop.patch --]
[-- Type: text/x-diff, Size: 785 bytes --]

diff -urpN linux-2.6-xfs5/fs/xfs/xfs_mount.h linux-2.6-xfs6/fs/xfs/xfs_mount.h
--- linux-2.6-xfs5/fs/xfs/xfs_mount.h	2008-04-22 04:06:44.000000000 +0200
+++ linux-2.6-xfs6/fs/xfs/xfs_mount.h	2008-04-22 12:13:17.000000000 +0200
@@ -471,11 +471,17 @@ xfs_get_perag(struct xfs_mount *mp, xfs_
 	return &mp->m_perag[XFS_INO_TO_AGNO(mp, ino)];
 }
 
+/* Macro (instead of inline) makes gcc understand that params are not used
+ * and emit "unused" warnings *in callers* if they otherwise are not using
+ * variables passed to xfs_put_perag. We want to know that. */
+#define xfs_put_perag(mp, pag) ((void)0)
+#if 0
 static inline void
 xfs_put_perag(struct xfs_mount *mp, xfs_perag_t *pag)
 {
 	/* nothing to see here, move along */
 }
+#endif
 
 /*
  * Per-cpu superblock locking functions

^ permalink raw reply	[flat|nested] 162+ messages in thread

* Re: x86: 4kstacks default
  2008-04-22 13:01                                     ` Adrian Bunk
@ 2008-04-22 13:51                                       ` Denys Vlasenko
  0 siblings, 0 replies; 162+ messages in thread
From: Denys Vlasenko @ 2008-04-22 13:51 UTC (permalink / raw)
  To: Adrian Bunk
  Cc: David Chinner, Eric Sandeen, Alan Cox, Shawn Bohrer, Ingo Molnar,
	Andrew Morton, Linux Kernel Mailing List, Arjan van de Ven,
	Thomas Gleixner

On Tuesday 22 April 2008 15:01, Adrian Bunk wrote:
> > Sent a few.
> > I would like to ask you to ACK/NAK every individual patch
> > in some reasonable period of time, say, 1-3 days. If you NAK a patch,
> > please let me know what is wrong with it.
> >...
> 
> I know the feeling of resending patches again and again without any 
> reaction quite well, but that's not David's fault and not true for XFS 
> patches, so when you try to put pressure on him you hit the wrong 
> person.

Yeah, sorry about that. I was not implying that XFS people were
not responsive.
--
vda

^ permalink raw reply	[flat|nested] 162+ messages in thread

* Re: [PATCH] xfs: expose no-op xfs_put_perag()
  2008-04-22 13:46                                                     ` [PATCH] xfs: expose no-op xfs_put_perag() Denys Vlasenko
@ 2008-04-22 14:08                                                       ` Eric Sandeen
  2008-04-22 23:16                                                       ` David Chinner
  1 sibling, 0 replies; 162+ messages in thread
From: Eric Sandeen @ 2008-04-22 14:08 UTC (permalink / raw)
  To: Denys Vlasenko; +Cc: David Chinner, Adrian Bunk, Linux Kernel Mailing List

Denys Vlasenko wrote:
> Hi David,
> 
> Inline function xfs_put_perag() in fs/xfs/xfs_mount.h is a no-op.
> 
> This patch converts it to no-op macro.
> 
> As a result, gcc will emit warning about unused variables,
> parameters and so on not in this function, but in its callers,
> which is more useful.
> 
> This patch, together with previous ones, has already resulted
> in more unused params discovered and warned about by gcc.
> 
> There is no object code size difference from this change.

Denys, thanks for going through all this; I didn't mean to discount the
work with the checkstack reports.  I've done a lot of similar xfs
pruning in the past, and every little bit helps.  It is still hard to
find significant reductions in the critical callchains though!

If the xfs codebase gets to the point where things are fairly well
cleaned up it might be nice to add the gcc warning to the makefiles, add
unused attributes to the vfs ops vectors as needed, and keep it clean
from this point on...

Thanks,

-Eric

^ permalink raw reply	[flat|nested] 162+ messages in thread

* Re: [PATCH] xfs: #define out unused parameters of xfs_bmap_add_free and xfs_btree_read_bufl
  2008-04-22 11:20                                             ` [PATCH] xfs: #define out unused parameters of xfs_bmap_add_free and xfs_btree_read_bufl Denys Vlasenko
  2008-04-22 11:48                                               ` [PATCH] xfs: #define out unused parameters for seven functions in xfs_trans.h Denys Vlasenko
  2008-04-22 11:51                                               ` Denys Vlasenko
@ 2008-04-22 14:28                                               ` Adrian Bunk
  2008-04-22 16:17                                                 ` Denys Vlasenko
  2008-04-22 22:43                                               ` David Chinner
  3 siblings, 1 reply; 162+ messages in thread
From: Adrian Bunk @ 2008-04-22 14:28 UTC (permalink / raw)
  To: Denys Vlasenko
  Cc: David Chinner, Eric Sandeen, Alan Cox, Shawn Bohrer, Ingo Molnar,
	Andrew Morton, Linux Kernel Mailing List, Arjan van de Ven,
	Thomas Gleixner

On Tue, Apr 22, 2008 at 01:20:54PM +0200, Denys Vlasenko wrote:
> Hi David,
> 
> xfs_bmap_add_free and xfs_btree_read_bufl functions
> use some of their parameters only in some cases
> (e.g. if DEBUG is defined, or on non-Linux OS :)
> 
> This patch removes these parameters using #define hack
> which makes them "disappear" without the need of uglifying
> every callsite with #ifdefs.
> 
> Code size difference on 32-bit x86:
>  393457    2904    2952  399313   617d1 linux-2.6-xfs6-TEST/fs/xfs/xfs.o
>  393441    2904    2952  399297   617c1 linux-2.6-xfs7-TEST/fs/xfs/xfs.o
>...

Elimination of completely unused parameters makes sense, but IMHO using 
such #define hacks for minuscule code size and stack usage advantages is 
not worth it.

cu
Adrian

-- 

       "Is there not promise of rain?" Ling Tan asked suddenly out
        of the darkness. There had been need of rain for many days.
       "Only a promise," Lao Er said.
                                       Pearl S. Buck - Dragon Seed


^ permalink raw reply	[flat|nested] 162+ messages in thread

* Re: [PATCH] xfs: #define out unused parameters of xfs_bmap_add_free and xfs_btree_read_bufl
  2008-04-22 14:28                                               ` [PATCH] xfs: #define out unused parameters of xfs_bmap_add_free and xfs_btree_read_bufl Adrian Bunk
@ 2008-04-22 16:17                                                 ` Denys Vlasenko
  2008-04-22 17:21                                                   ` Adrian Bunk
  0 siblings, 1 reply; 162+ messages in thread
From: Denys Vlasenko @ 2008-04-22 16:17 UTC (permalink / raw)
  To: Adrian Bunk
  Cc: David Chinner, Eric Sandeen, Alan Cox, Shawn Bohrer, Ingo Molnar,
	Andrew Morton, Linux Kernel Mailing List, Arjan van de Ven,
	Thomas Gleixner

On Tuesday 22 April 2008 16:28, Adrian Bunk wrote:
> > xfs_bmap_add_free and xfs_btree_read_bufl functions
> > use some of their parameters only in some cases
> > (e.g. if DEBUG is defined, or on non-Linux OS :)
> > 
> > This patch removes these parameters using #define hack
> > which makes them "disappear" without the need of uglifying
> > every callsite with #ifdefs.
> > 
> > Code size difference on 32-bit x86:
> >  393457    2904    2952  399313   617d1 linux-2.6-xfs6-TEST/fs/xfs/xfs.o
> >  393441    2904    2952  399297   617c1 linux-2.6-xfs7-TEST/fs/xfs/xfs.o
> >...
> 
> Elimination of completely unused parameters makes sense, but IMHO using 
> such #define hacks for minuscule code size and stack usage advantages is 
> not worth it.

In busybox this trick is used extensively.

I don't know how to eliminate these unused parameters with less
intervention, but I also don't want to leave it unfixed.

I want to eventually reach the state with no warnings
about unused parameters.
--
vda

^ permalink raw reply	[flat|nested] 162+ messages in thread

* Re: [PATCH] xfs: #define out unused parameters of xfs_bmap_add_free and xfs_btree_read_bufl
  2008-04-22 16:17                                                 ` Denys Vlasenko
@ 2008-04-22 17:21                                                   ` Adrian Bunk
  2008-04-22 17:26                                                     ` Eric Sandeen
  0 siblings, 1 reply; 162+ messages in thread
From: Adrian Bunk @ 2008-04-22 17:21 UTC (permalink / raw)
  To: Denys Vlasenko
  Cc: David Chinner, Eric Sandeen, Alan Cox, Shawn Bohrer, Ingo Molnar,
	Andrew Morton, Linux Kernel Mailing List, Arjan van de Ven,
	Thomas Gleixner

On Tue, Apr 22, 2008 at 06:17:03PM +0200, Denys Vlasenko wrote:
> On Tuesday 22 April 2008 16:28, Adrian Bunk wrote:
> > > xfs_bmap_add_free and xfs_btree_read_bufl functions
> > > use some of their parameters only in some cases
> > > (e.g. if DEBUG is defined, or on non-Linux OS :)
> > > 
> > > This patch removes these parameters using #define hack
> > > which makes them "disappear" without the need of uglifying
> > > every callsite with #ifdefs.
> > > 
> > > Code size difference on 32-bit x86:
> > >  393457    2904    2952  399313   617d1 linux-2.6-xfs6-TEST/fs/xfs/xfs.o
> > >  393441    2904    2952  399297   617c1 linux-2.6-xfs7-TEST/fs/xfs/xfs.o
> > >...
> > 
> > Elimination of completely unused parameters makes sense, but IMHO using 
> > such #define hacks for minuscule code size and stack usage advantages is 
> > not worth it.
> 
> In busybox this trick is used extensively.

Busybox does not have more than one million lines changed in
one release.

In the Linux kernel maintainability is much more important than in 
smaller projects.

> I don't know how to eliminate these unused parameters with less
> intervention, but I also don't want to leave it unfixed.
> 
> I want to eventually reach the state with no warnings
> about unused parameters.

The standard kernel pattern is using empty static inline functions (that 
allow type checking).
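
As a sketch of that pattern, assuming a hypothetical config symbol
CONFIG_DEMO_TRACE and hypothetical names demo_trace and demo_open:

```c
struct dev { int opened; };	/* hypothetical stand-in */

#ifdef CONFIG_DEMO_TRACE	/* hypothetical config symbol */
void demo_trace(struct dev *d, int event);
#else
/*
 * Empty stub: it generates no code once inlined, but the compiler
 * still checks the argument types at every call site.
 */
static inline void demo_trace(struct dev *d, int event)
{
}
#endif

int demo_open(struct dev *d)
{
	demo_trace(d, 1);	/* a wrongly typed argument is diagnosed */
	d->opened = 1;
	return 0;
}
```

Unlike a #define that drops arguments, the stub keeps a single
prototype for both configurations.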

And I'm not sure whether the number of functions you'd have to change 
for reaching your goal has four, five or six digits.

> vda

cu
Adrian

-- 

       "Is there not promise of rain?" Ling Tan asked suddenly out
        of the darkness. There had been need of rain for many days.
       "Only a promise," Lao Er said.
                                       Pearl S. Buck - Dragon Seed


^ permalink raw reply	[flat|nested] 162+ messages in thread

* Re: [PATCH] xfs: #define out unused parameters of xfs_bmap_add_free and xfs_btree_read_bufl
  2008-04-22 17:21                                                   ` Adrian Bunk
@ 2008-04-22 17:26                                                     ` Eric Sandeen
  2008-04-22 17:50                                                       ` Denys Vlasenko
  2008-04-22 20:46                                                       ` [PATCH] xfs: #define out unused parameters of xfs_bmap_add_free " Denys Vlasenko
  0 siblings, 2 replies; 162+ messages in thread
From: Eric Sandeen @ 2008-04-22 17:26 UTC (permalink / raw)
  To: Adrian Bunk; +Cc: Denys Vlasenko, Linux Kernel Mailing List

Adrian Bunk wrote:
> On Tue, Apr 22, 2008 at 06:17:03PM +0200, Denys Vlasenko wrote:


>>> Elimination of completely unused parameters makes sense, but IMHO using 
>>> such #define hacks for minuscule code size and stack usage advantages is 
>>> not worth it.
>> In busybox this trick is used extensively.
> 
> Busybox does not have more than one million lines changed in
> one release.
> 
> In the Linux kernel maintainability is much more important than in 
> smaller projects.
> 
>> I don't know how to eliminate these unused parameters with less
>> intervention, but I also don't want to leave it unfixed.
>>
>> I want to eventually reach the state with no warnings
>> about unused parameters.
> 
> The standard kernel pattern is using empty static inline functions (that 
> allow type checking).
> 
> And I'm not sure whether the number of functions you'd have to change 
> for reaching your goal has four, five or six digits.

It would be a huge undertaking.

Just building xfs w/ the warning in place exposes tons of unused
parameter warnings from outside xfs as well.

But, if it was deemed important enough, you could go annotate them as
unused, I suppose, and hack away at it...  Does marking as unused just
shut up the warning or does it let gcc do further optimizations?
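
For illustration only, a sketch of such an annotation using GCC's
"unused" attribute; the macro, struct and function names are
hypothetical. As far as the GCC documentation describes it, the
attribute only silences the warning: the parameter stays in the
signature and is still passed by every caller.

```c
#define DEMO_UNUSED __attribute__((unused))	/* hypothetical name */

/* An ops vector fixes the signature, so the parameter cannot go. */
struct demo_ops {
	int (*flush)(void *ctx, int flags);
};

/* "flags" is required by the ops signature but not needed here. */
static int demo_flush(void *ctx, int flags DEMO_UNUSED)
{
	return ctx != 0;
}

const struct demo_ops demo_ops = { demo_flush };
```

This fits the case where a function's address is taken and its
prototype has to match the other implementations.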

-Eric

^ permalink raw reply	[flat|nested] 162+ messages in thread

* Re: [PATCH] xfs: #define out unused parameters of xfs_bmap_add_free and xfs_btree_read_bufl
  2008-04-22 17:26                                                     ` Eric Sandeen
@ 2008-04-22 17:50                                                       ` Denys Vlasenko
  2008-04-22 18:28                                                         ` [PATCH] xfs: #define out unused parameters of?xfs_bmap_add_free " Adrian Bunk
  2008-04-22 20:46                                                       ` [PATCH] xfs: #define out unused parameters of xfs_bmap_add_free " Denys Vlasenko
  1 sibling, 1 reply; 162+ messages in thread
From: Denys Vlasenko @ 2008-04-22 17:50 UTC (permalink / raw)
  To: Eric Sandeen; +Cc: Adrian Bunk, Linux Kernel Mailing List

On Tuesday 22 April 2008 19:26, Eric Sandeen wrote:
> >> I want to eventually reach the state with no warnings
> >> about unused parameters.
> > 
> > The standard kernel pattern is using empty static inline functions (that 
> > allow type checking).
> > 
> > And I'm not sure whether the number of functions you'd have to change 
> > for reaching your goal has four, five or six digits.
> 
> It would be a huge undertaking.
> 
> Just building xfs w/ the warning in place exposes tons of unused
> parameter warnings from outside xfs as well.

Eh... I meant "no warnings about unused parameters" for fs/xfs/* only,
not for the entire kernel. I filter out other warnings.

I want to do it not as an exercise in perfectionism,
but as a means of making sure we do not waste stack
passing useless parameters, which is important for xfs.
--
vda

^ permalink raw reply	[flat|nested] 162+ messages in thread

* Re: x86: 4kstacks default
  2008-04-20 15:44                   ` Daniel Hazelton
  2008-04-20 17:26                     ` Andi Kleen
@ 2008-04-22 18:20                     ` Romano Giannetti
  2008-04-23  5:03                       ` Denys Vlasenko
  1 sibling, 1 reply; 162+ messages in thread
From: Romano Giannetti @ 2008-04-22 18:20 UTC (permalink / raw)
  To: Daniel Hazelton; +Cc: linux-kernel



On Sun, 2008-04-20 at 11:44 -0400, Daniel Hazelton wrote:
> Since the second-most-common reason for stack overages is ndiswrapper... Well, 
> with there being so much more hardware now supported directly by the linux 
> kernel... 

How I would like you to be right... Atheros AR5008, AR5414 PHY, "not yet
here". It's almost one year now since I bought this laptop, and till now
it's the cable or ndiswrapper. But yes, it's going better. For my first
wifi laptop I waited two and a half years, now it seems that in a bit
more than one there will be an open source driver...

I know all the trouble ndiswrapper signifies. But I see also that people
around me with a laptop and linux use more ndiswrapper than a real
driver, so... be gentle with it. 

Thanks,
	Romano 	

-- 
Sorry for the disclaimer --- ¡I cannot stop it!



--
This communication contains confidential information. It is for the exclusive use of the intended addressee. If you are not the intended addressee, please note that any form of distribution, copying or use of this communication or the information in it is strictly prohibited by law. If you have received this communication in error, please immediately notify the sender by reply e-mail and destroy this message. Thank you for your cooperation. 

^ permalink raw reply	[flat|nested] 162+ messages in thread

* Re: [PATCH] xfs: #define out unused parameters of xfs_bmap_add_free and xfs_btree_read_bufl
  2008-04-22 17:50                                                       ` Denys Vlasenko
@ 2008-04-22 18:28                                                         ` Adrian Bunk
  2008-04-22 19:32                                                           ` Denys Vlasenko
  0 siblings, 1 reply; 162+ messages in thread
From: Adrian Bunk @ 2008-04-22 18:28 UTC (permalink / raw)
  To: Denys Vlasenko; +Cc: Eric Sandeen, Linux Kernel Mailing List

On Tue, Apr 22, 2008 at 07:50:54PM +0200, Denys Vlasenko wrote:
> On Tuesday 22 April 2008 19:26, Eric Sandeen wrote:
> > >> I want to eventually reach the state with no warnings
> > >> about unused parameters.
> > > 
> > > The standard kernel pattern is using empty static inline functions (that 
> > > allow type checking).
> > > 
> > > And I'm not sure whether the number of functions you'd have to change 
> > > for reaching your goal has four, five or six digits.
> > 
> > It would be a huge undertaking.
> > 
> > Just building xfs w/ the warning in place exposes tons of unused
> > parameter warnings from outside xfs as well.
> 
> Eh... I meant "no warnings about unused parameters" for fs/xfs/* only,
> not for the entire kernel. I filter out other warnings.
> 
> I want to do it not as an exercise in perfectionism,
> but as a means of making sure we do not waste stack
> passing useless parameters, which is important for xfs.

That's not really maintainable, and the stack gains are too small to 
bring us significantly nearer to a solution.

> vda

cu
Adrian

-- 

       "Is there not promise of rain?" Ling Tan asked suddenly out
        of the darkness. There had been need of rain for many days.
       "Only a promise," Lao Er said.
                                       Pearl S. Buck - Dragon Seed


^ permalink raw reply	[flat|nested] 162+ messages in thread

* Re: [PATCH] xfs: #define out unused parameters of xfs_bmap_add_free and xfs_btree_read_bufl
  2008-04-22 18:28                                                         ` [PATCH] xfs: #define out unused parameters of xfs_bmap_add_free " Adrian Bunk
@ 2008-04-22 19:32                                                           ` Denys Vlasenko
  2008-04-22 23:53                                                             ` Adrian Bunk
  0 siblings, 1 reply; 162+ messages in thread
From: Denys Vlasenko @ 2008-04-22 19:32 UTC (permalink / raw)
  To: Adrian Bunk; +Cc: Eric Sandeen, Linux Kernel Mailing List

On Tuesday 22 April 2008 20:28, Adrian Bunk wrote:
> > Eh... I meant "no warnings about unused parameters" for fs/xfs/* only,
> > not for the entire kernel. I filter out other warnings.
> > 
> > I want to do it not as an exercise in perfectionism,
> > but as a means of making sure we do not waste stack
> > passing useless parameters, which is important for xfs.
> 
> That's not really maintainable,

Why? Adding -Wunused -Wunused-parameter in fs/xfs/Makefile:

EXTRA_CFLAGS += -I$(src) -I$(src)/linux-2.6 -funsigned-char
#EXTRA_CFLAGS += -Wunused -Wunused-parameter

and making a test build with it uncommented once in a while
will reveal a bit of fallout, which is then fixed.
The busybox source is three times as big as the xfs source,
and from experience I'd say it's not difficult
to keep it in shape.

> and the stack gains are too small to
> bring us significantly nearer to a solution.

I promise to take a look at the critical (wrt stack use) path next.
--
vda

^ permalink raw reply	[flat|nested] 162+ messages in thread

* Re: [PATCH] xfs: #define out unused parameters of xfs_bmap_add_free and xfs_btree_read_bufl
  2008-04-22 17:26                                                     ` Eric Sandeen
  2008-04-22 17:50                                                       ` Denys Vlasenko
@ 2008-04-22 20:46                                                       ` Denys Vlasenko
  1 sibling, 0 replies; 162+ messages in thread
From: Denys Vlasenko @ 2008-04-22 20:46 UTC (permalink / raw)
  To: Eric Sandeen; +Cc: Adrian Bunk, Linux Kernel Mailing List

On Tuesday 22 April 2008 19:26, Eric Sandeen wrote:
> It would be a huge undertaking.
> 
> Just building xfs w/ the warning in place exposes tons of unused
> parameter warnings from outside xfs as well.

I was grepping them away.

> But, if it was deemed important enough, you could go annotate them as
> unused, I suppose, and hack away at it...  Does marking as unused just
> shut up the warning or does it let gcc do further optimizations?

It just shuts up the warning. It is still useful - suppresses
false positives.

I didn't check whether gcc is clever enough to reuse stack space
occupied by unused parameter(s) as a free space for automatic
variables. In theory it is allowed to do that and reduce stack usage
that way.
--
vda
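
As a rough illustration of the "it just shuts up the warning" point
above: the sketch below uses GCC's unused attribute on a parameter of a
hypothetical function named scale. The parameter remains part of the
signature, so every caller still has to pass it.

```c
/* Hypothetical function; the attribute is GCC's, spelled out in full. */
static int scale(int value, int flags __attribute__((unused)))
{
	/* flags is still passed by every caller; only the warning is gone. */
	return value * 2;
}
```

Removing the parameter from the signature is the only change that
guarantees callers stop passing it.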

^ permalink raw reply	[flat|nested] 162+ messages in thread

* Re: [PATCH] xfs: do not pass size into kmem_free, it's unused
  2008-04-22  2:33                                   ` [PATCH] xfs: do not pass size into kmem_free, it's unused Denys Vlasenko
  2008-04-22  3:03                                     ` [PATCH] xfs: do not pass unused params to xfs_flush_pages Denys Vlasenko
  2008-04-22  3:09                                     ` [PATCH] xfs: do not pass size into kmem_free, it's unused Eric Sandeen
@ 2008-04-22 22:02                                     ` David Chinner
  2 siblings, 0 replies; 162+ messages in thread
From: David Chinner @ 2008-04-22 22:02 UTC (permalink / raw)
  To: Denys Vlasenko
  Cc: David Chinner, Eric Sandeen, Adrian Bunk, Alan Cox, Shawn Bohrer,
	Ingo Molnar, Andrew Morton, Linux Kernel Mailing List,
	Arjan van de Ven, Thomas Gleixner

On Tue, Apr 22, 2008 at 04:33:03AM +0200, Denys Vlasenko wrote:
> Hi David,
> 
> > Patches are welcome - I'd be over the moon if any of the known 4k
> > stack advocates sent a stack reduction patch for XFS, but it seems
> > that actually trying to fix the problems is much harder than
> > resending a one line patch every few months....
> 
> kmem_free() function takes (ptr, size) arguments but doesn't
> actually use second one.

Ack. Pulled into my qa tree.

FWIW, can you send patches inline next time? It makes it easier to
quote them on review....

Cheers,

Dave.
-- 
Dave Chinner
Principal Engineer
SGI Australian Software Group
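
The shape of this cleanup can be sketched in user space as below. The
names foo_free_sized and foo_free are hypothetical stand-ins, and the
sketch uses the C library's free() rather than the kernel allocator; it
is not the actual XFS code.

```c
#include <stdlib.h>

/* Hypothetical stand-in for a free wrapper taking (ptr, size). */
static void foo_free_sized(void *ptr, size_t size)
{
	(void)size;	/* never consulted, yet every caller must supply it */
	free(ptr);
}

/* The same wrapper after dropping the unused argument. */
static void foo_free(void *ptr)
{
	free(ptr);
}
```

Callers of the second form no longer need to keep the allocation size
around just to hand it to the free routine.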

^ permalink raw reply	[flat|nested] 162+ messages in thread

* Re: [PATCH] xfs: do not pass unused params to xfs_flush_pages
  2008-04-22  3:03                                     ` [PATCH] xfs: do not pass unused params to xfs_flush_pages Denys Vlasenko
  2008-04-22  3:14                                       ` [PATCH] xfs: use smaller int param in call " Denys Vlasenko
  2008-04-22  3:15                                       ` [PATCH] xfs: do not pass unused params " Eric Sandeen
@ 2008-04-22 22:07                                       ` David Chinner
  2 siblings, 0 replies; 162+ messages in thread
From: David Chinner @ 2008-04-22 22:07 UTC (permalink / raw)
  To: Denys Vlasenko
  Cc: David Chinner, Eric Sandeen, Adrian Bunk, Alan Cox, Shawn Bohrer,
	Ingo Molnar, Andrew Morton, Linux Kernel Mailing List,
	Arjan van de Ven, Thomas Gleixner

On Tue, Apr 22, 2008 at 05:03:16AM +0200, Denys Vlasenko wrote:
> Hi David,
> 
> xfs_flush_pages() does not use some of its parameters, namely:
> first, last and fiopt.

These were never removed because they are place holders for
stuff that Linux didn't support when the original port was done.
Now Linux supports range flushes, these functions should be changed
to do that, and hence the first/last parameters will be used.

But the fiopt flag can probably be killed....

Cheers,

Dave.
-- 
Dave Chinner
Principal Engineer
SGI Australian Software Group

^ permalink raw reply	[flat|nested] 162+ messages in thread

* Re: [PATCH] xfs: use smaller int param in call to xfs_flush_pages
  2008-04-22  3:14                                       ` [PATCH] xfs: use smaller int param in call " Denys Vlasenko
  2008-04-22  3:18                                         ` Eric Sandeen
  2008-04-22  9:42                                         ` [PATCH] xfs: remove unused parameter of xfs_qm_dqpurge Denys Vlasenko
@ 2008-04-22 22:08                                         ` David Chinner
  2 siblings, 0 replies; 162+ messages in thread
From: David Chinner @ 2008-04-22 22:08 UTC (permalink / raw)
  To: Denys Vlasenko
  Cc: David Chinner, Eric Sandeen, Adrian Bunk, Alan Cox, Shawn Bohrer,
	Ingo Molnar, Andrew Morton, Linux Kernel Mailing List,
	Arjan van de Ven, Thomas Gleixner

On Tue, Apr 22, 2008 at 05:14:45AM +0200, Denys Vlasenko wrote:
> Hi David,
> 
> xfs_flush_pages() flags parameter is declared as uint64_t, but
> the code never passes values which do not fit into 32 bits.
> All callsites sans one pass zero, and the last one passes
> XFS_B_DELWRI, XFS_B_ASYNC or zero.
> These values are defined in enum xfs_buf_flags_t and they
> all fit in 32 bits.

Can you fold this into the previous patch that kills fiopt to
this function?

Cheers,

Dave.
-- 
Dave Chinner
Principal Engineer
SGI Australian Software Group
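
A sketch of the narrowing argument, under stated assumptions: the flag
names and values below are hypothetical (the real ones live in enum
xfs_buf_flags_t), and foo_flush is not an XFS function. On a 32-bit
target a uint64_t argument needs two 32-bit slots where a uint32_t
needs one.

```c
#include <stdint.h>

/* Hypothetical flag values; not the real XFS_B_* definitions. */
enum foo_buf_flags {
	FOO_B_ASYNC  = 1u << 4,
	FOO_B_DELWRI = 1u << 6,
};

/* Compile-time check that every flag fits in 32 bits. */
typedef char foo_flags_fit_in_32_bits[
	((uint64_t)(FOO_B_ASYNC | FOO_B_DELWRI) <= UINT32_MAX) ? 1 : -1];

/* Narrow parameter: one 32-bit slot instead of two on a 32-bit target. */
static int foo_flush(uint32_t flags)
{
	return (flags & FOO_B_DELWRI) != 0;
}
```

The typedef fails to compile (negative array size) if a flag ever grows
past 32 bits, which guards the narrowed signature.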

^ permalink raw reply	[flat|nested] 162+ messages in thread

* Re: [PATCH] xfs: remove unused parameter of xfs_qm_dqpurge
  2008-04-22  9:42                                         ` [PATCH] xfs: remove unused parameter of xfs_qm_dqpurge Denys Vlasenko
  2008-04-22 10:16                                           ` [PATCH] xfs: remove unused parameter of xfs_iomap_write_allocate Denys Vlasenko
@ 2008-04-22 22:11                                           ` David Chinner
  2008-04-23  8:18                                           ` Christoph Hellwig
  2 siblings, 0 replies; 162+ messages in thread
From: David Chinner @ 2008-04-22 22:11 UTC (permalink / raw)
  To: Denys Vlasenko
  Cc: David Chinner, Eric Sandeen, Adrian Bunk, Alan Cox, Shawn Bohrer,
	Ingo Molnar, Andrew Morton, Linux Kernel Mailing List,
	Arjan van de Ven, Thomas Gleixner

On Tue, Apr 22, 2008 at 11:42:00AM +0200, Denys Vlasenko wrote:
> Hi David,
> 
> xfs_qm_dqpurge() does not use flags parameter.
> This patch removes it.

Ok. Will test.

Cheers,

Dave.
-- 
Dave Chinner
Principal Engineer
SGI Australian Software Group

^ permalink raw reply	[flat|nested] 162+ messages in thread

* Re: [PATCH] xfs: remove unused parameter of xfs_iomap_write_allocate
  2008-04-22 10:16                                           ` [PATCH] xfs: remove unused parameter of xfs_iomap_write_allocate Denys Vlasenko
  2008-04-22 11:20                                             ` [PATCH] xfs: #define out unused parameters of xfs_bmap_add_free and xfs_btree_read_bufl Denys Vlasenko
@ 2008-04-22 22:33                                             ` David Chinner
  1 sibling, 0 replies; 162+ messages in thread
From: David Chinner @ 2008-04-22 22:33 UTC (permalink / raw)
  To: Denys Vlasenko
  Cc: David Chinner, Eric Sandeen, Adrian Bunk, Alan Cox, Shawn Bohrer,
	Ingo Molnar, Andrew Morton, Linux Kernel Mailing List,
	Arjan van de Ven, Thomas Gleixner

On Tue, Apr 22, 2008 at 12:16:25PM +0200, Denys Vlasenko wrote:
> Hi David,
> 
> xfs_iomap_write_allocate() does not use count parameter.

Hmmm - I'm wondering if that is actually a bug. Certainly the
code is in conflict with the comment for the function, and
it points out that I could have fixed a recent bug in a better
way.

I'm going to hold off this one until I've had time to look at this
in more detail....

Cheers,

Dave.
-- 
Dave Chinner
Principal Engineer
SGI Australian Software Group

^ permalink raw reply	[flat|nested] 162+ messages in thread

* Re: [PATCH] xfs: #define out unused parameters of xfs_bmap_add_free and xfs_btree_read_bufl
  2008-04-22 11:20                                             ` [PATCH] xfs: #define out unused parameters of xfs_bmap_add_free and xfs_btree_read_bufl Denys Vlasenko
                                                                 ` (2 preceding siblings ...)
  2008-04-22 14:28                                               ` [PATCH] xfs: #define out unused parameters of xfs_bmap_add_free and xfs_btree_read_bufl Adrian Bunk
@ 2008-04-22 22:43                                               ` David Chinner
  3 siblings, 0 replies; 162+ messages in thread
From: David Chinner @ 2008-04-22 22:43 UTC (permalink / raw)
  To: Denys Vlasenko
  Cc: David Chinner, Eric Sandeen, Adrian Bunk, Alan Cox, Shawn Bohrer,
	Ingo Molnar, Andrew Morton, Linux Kernel Mailing List,
	Arjan van de Ven, Thomas Gleixner

On Tue, Apr 22, 2008 at 01:20:54PM +0200, Denys Vlasenko wrote:
> Hi David,
> 
> xfs_bmap_add_free and xfs_btree_read_bufl functions
> use some of their parameters only in some cases
> (e.g. if DEBUG is defined, or on non-Linux OS :)
> 
> This patch removes these parameters using #define hack
> which makes them "disappear" without the need of uglifying
> every callsite with #ifdefs.

We don't use pre-processor hacks to hide function variables for different
config options.  The XFS header files are messy enough without adding
additional redefinitions of function types to them.

w.r.t xfs_bmap_add_free(), the correct thing to do is to factor the
debug code out into a different function that is only compiled
on debug kernels and remove all the debug checks from xfs_bmap_add_free().

As it is, I don't think that the change is worth the maintenance
cost for a few bytes of stack space in non-critical paths.

Cheers,

Dave.
-- 
Dave Chinner
Principal Engineer
SGI Australian Software Group
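
For readers who have not seen the trick under discussion, here is a
minimal sketch of "#define-ing out" a parameter. All names (foo_list,
foo_add, foo_add_nodebug, FOO_DEBUG) are hypothetical; this is not the
patch that was posted.

```c
#include <stddef.h>

/* Hypothetical type and names; FOO_DEBUG stands in for a debug option. */
struct foo_list {
	size_t count;
};

#ifdef FOO_DEBUG
/* Debug build: the third parameter is really used. */
static void foo_add(struct foo_list *list, int item, const char *caller)
{
	if (caller == NULL)
		return;	/* stand-in for a debug assertion */
	(void)item;
	list->count++;
}
#else
/* Non-debug build: the real function takes two parameters... */
static void foo_add_nodebug(struct foo_list *list, int item)
{
	(void)item;
	list->count++;
}
/* ...and a macro silently drops the third argument at every call site. */
#define foo_add(list, item, caller) foo_add_nodebug((list), (item))
#endif
```

Call sites stay unchanged, but in the non-debug build the dropped
argument expression is never evaluated and the function now has two
different signatures, which is the header clutter objected to above.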

^ permalink raw reply	[flat|nested] 162+ messages in thread

* Re: [PATCH] xfs: #define out unused parameters for seven functions in xfs_trans.h
  2008-04-22 11:51                                               ` Denys Vlasenko
  2008-04-22 13:32                                                 ` [PATCH] xfs: remove unused params from functions in xfs_dir2_leaf.h Denys Vlasenko
@ 2008-04-22 22:47                                                 ` David Chinner
  1 sibling, 0 replies; 162+ messages in thread
From: David Chinner @ 2008-04-22 22:47 UTC (permalink / raw)
  To: Denys Vlasenko
  Cc: David Chinner, Eric Sandeen, Adrian Bunk, Alan Cox, Shawn Bohrer,
	Ingo Molnar, Andrew Morton, Linux Kernel Mailing List,
	Arjan van de Ven, Thomas Gleixner

On Tue, Apr 22, 2008 at 01:51:03PM +0200, Denys Vlasenko wrote:
> [ resend: now with patch attached! :) ]
> 
> Hi David,
> 
> Seven xfs_trans_XXX functions declared in xfs_trans.h
> are not using "tp" parameter in non-debug builds,
> but it still takes stack space since these functions
> are not static and gcc cannot optimize it out.

Same as my last comments - I don't think the savings are
worth the additional clutter it introduces.

Cheers,

Dave.
-- 
Dave Chinner
Principal Engineer
SGI Australian Software Group

^ permalink raw reply	[flat|nested] 162+ messages in thread

* Re: [PATCH] xfs: remove unused params from functions in xfs/quota/*
  2008-04-22 13:40                                                   ` [PATCH] xfs: remove unused params from functions in xfs/quota/* Denys Vlasenko
  2008-04-22 13:46                                                     ` [PATCH] xfs: expose no-op xfs_put_perag() Denys Vlasenko
@ 2008-04-22 23:08                                                     ` David Chinner
  1 sibling, 0 replies; 162+ messages in thread
From: David Chinner @ 2008-04-22 23:08 UTC (permalink / raw)
  To: Denys Vlasenko
  Cc: David Chinner, Eric Sandeen, Adrian Bunk, Alan Cox, Shawn Bohrer,
	Ingo Molnar, Andrew Morton, Linux Kernel Mailing List,
	Arjan van de Ven, Thomas Gleixner

On Tue, Apr 22, 2008 at 03:40:12PM +0200, Denys Vlasenko wrote:
> Hi David,
> 
> This patch deals with remaining cases of unused parameters in fs/xfs/quota/*
> as far as I can see so far. The rest of unused parameters
> in fs/xfs/quota/* cannot be easily eliminated due to addresses
> of functions being taken.

I'd just kill the parameters to xfs_qm_hold_quotafs_ref and
xfs_qm_rele_quotafs_ref and I wouldn't worry about removing the debug-only
id parameter to xfs_qm_dqread as it's not in a stack critical path.

Cheers,

Dave.
-- 
Dave Chinner
Principal Engineer
SGI Australian Software Group

^ permalink raw reply	[flat|nested] 162+ messages in thread

* Re: [PATCH] xfs: expose no-op xfs_put_perag()
  2008-04-22 13:46                                                     ` [PATCH] xfs: expose no-op xfs_put_perag() Denys Vlasenko
  2008-04-22 14:08                                                       ` Eric Sandeen
@ 2008-04-22 23:16                                                       ` David Chinner
  1 sibling, 0 replies; 162+ messages in thread
From: David Chinner @ 2008-04-22 23:16 UTC (permalink / raw)
  To: Denys Vlasenko
  Cc: David Chinner, Eric Sandeen, Adrian Bunk, Alan Cox, Shawn Bohrer,
	Ingo Molnar, Andrew Morton, Linux Kernel Mailing List,
	Arjan van de Ven, Thomas Gleixner

On Tue, Apr 22, 2008 at 03:46:58PM +0200, Denys Vlasenko wrote:
> Hi David,
> 
> Inline function xfs_put_perag() in fs/xfs/xfs_mount.h is a no-op.
> 
> This patch converts it to no-op macro.

xfs_put_perag() is paired with xfs_get_perag() and should never be
called by itself. It is a stub for AG reference counting the
in-memory per-ag structures and, in future, locking to allow us to
avoid certain deadlocks that can occur (rarely) when growing and
shrinking the filesystem.

Also, I've got patches that put stuff in this function, so I'd
prefer to leave it as it is right now...

Cheers,

Dave.
-- 
Dave Chinner
Principal Engineer
SGI Australian Software Group

^ permalink raw reply	[flat|nested] 162+ messages in thread

* Re: [PATCH] xfs: #define out unused parameters of xfs_bmap_add_free and xfs_btree_read_bufl
  2008-04-22 19:32                                                           ` Denys Vlasenko
@ 2008-04-22 23:53                                                             ` Adrian Bunk
  0 siblings, 0 replies; 162+ messages in thread
From: Adrian Bunk @ 2008-04-22 23:53 UTC (permalink / raw)
  To: Denys Vlasenko; +Cc: Eric Sandeen, Linux Kernel Mailing List

On Tue, Apr 22, 2008 at 09:32:38PM +0200, Denys Vlasenko wrote:
> On Tuesday 22 April 2008 20:28, Adrian Bunk wrote:
> > > Eh... I meant "no warnings about unused parameters" for fs/xfs/* only,
> > > not for the entire kernel. I filter out other warnings.
> > > 
> > > I want to do it not as an exercise in perfectionism,
> > > but as a means of making sure we do not waste stack
> > > passing useless parameters, which is important for xfs.
> > 
> > That's not really maintainable,
> 
> Why? Adding -Wunused -Wunused-parameter in fs/xfs/Makefile:
> 
> EXTRA_CFLAGS += -I$(src) -I$(src)/linux-2.6 -funsigned-char
> #EXTRA_CFLAGS += -Wunused -Wunused-parameter
> 
> and making a test build with it uncommented once in a while
> will reveal a bit of fallout, which is then fixed.
>...

The problem isn't in the Makefile; the problem is the ugly #ifdefs in 
the code.

And for getting the stack problems fixed, the effect is in any case two 
orders of magnitude too small, so there's no real gain in exchange.

> vda

cu
Adrian

-- 

       "Is there not promise of rain?" Ling Tan asked suddenly out
        of the darkness. There had been need of rain for many days.
       "Only a promise," Lao Er said.
                                       Pearl S. Buck - Dragon Seed


^ permalink raw reply	[flat|nested] 162+ messages in thread

* Re: x86: 4kstacks default
  2008-04-22 18:20                     ` Romano Giannetti
@ 2008-04-23  5:03                       ` Denys Vlasenko
  2008-04-23  5:21                         ` Daniel Hazelton
  0 siblings, 1 reply; 162+ messages in thread
From: Denys Vlasenko @ 2008-04-23  5:03 UTC (permalink / raw)
  To: Romano Giannetti; +Cc: Daniel Hazelton, linux-kernel

On Tuesday 22 April 2008 20:20, Romano Giannetti wrote:
> On Sun, 2008-04-20 at 11:44 -0400, Daniel Hazelton wrote:
> > Since the second-most-common reason for stack overages is ndiswrapper... Well, 
> > with there being so much more hardware now supported directly by the linux 
> > kernel... 
> 
> How I would like you to be right... Atheros AR5008, AR5414 PHY, "not yet
> here". It's almost one year now since I bought this laptop, and till now
> it's the cable or ndiswrapper. But yes, it's going better. For my first
> wifi laptop I waited two and a half years, now it seems that in a bit
> more than one there will be an open source driver...
> 
> I know all the trouble ndiswrapper signifies. But I see also that people
> around me with a laptop and linux use more ndiswrapper than a real
> driver, so... be gentle with it. 

Nobody knows how much potential development is not done because
"you can make your wifi work with ndiswrapper".
--
vda

^ permalink raw reply	[flat|nested] 162+ messages in thread

* Re: x86: 4kstacks default
  2008-04-23  5:03                       ` Denys Vlasenko
@ 2008-04-23  5:21                         ` Daniel Hazelton
  2008-04-23  5:25                           ` david
  0 siblings, 1 reply; 162+ messages in thread
From: Daniel Hazelton @ 2008-04-23  5:21 UTC (permalink / raw)
  To: Denys Vlasenko; +Cc: Romano Giannetti, linux-kernel

On Wednesday 23 April 2008 01:03:11 Denys Vlasenko wrote:
> On Tuesday 22 April 2008 20:20, Romano Giannetti wrote:
> > On Sun, 2008-04-20 at 11:44 -0400, Daniel Hazelton wrote:
> > > Since the second-most-common reason for stack overages is
> > > ndiswrapper... Well, with there being so much more hardware now
> > > supported directly by the linux kernel...
> >
> > How I would like you to be right... Atheros AR5008, AR5414 PHY, "not yet
> > here". It's almost one year now since I bought this laptop, and till now
> > it's the cable or ndiswrapper. But yes, it's going better. For my first
> > wifi laptop I waited two and a half years, now it seems that in a bit
> > more than one there will be an open source driver...
> >
> > I know all the trouble ndiswrapper signifies. But I see also that people
> > around me with a laptop and linux use more ndiswrapper than a real
> > driver, so... be gentle with it.
>
> Nobody knows how much potential development is not done because
> "you can make your wifi work with ndiswrapper".

I've got to agree with that sentiment. Once a working solution is found, no 
matter how crappy, it seems that almost all development stops.

DRH

-- 
Dialup is like pissing through a pipette. Slow and excruciatingly painful.

^ permalink raw reply	[flat|nested] 162+ messages in thread

* Re: x86: 4kstacks default
  2008-04-23  5:21                         ` Daniel Hazelton
@ 2008-04-23  5:25                           ` david
  2008-04-23  5:41                             ` Daniel Hazelton
  0 siblings, 1 reply; 162+ messages in thread
From: david @ 2008-04-23  5:25 UTC (permalink / raw)
  To: Daniel Hazelton; +Cc: Denys Vlasenko, Romano Giannetti, linux-kernel

On Wed, 23 Apr 2008, Daniel Hazelton wrote:

> On Wednesday 23 April 2008 01:03:11 Denys Vlasenko wrote:
>> On Tuesday 22 April 2008 20:20, Romano Giannetti wrote:
>>> On Sun, 2008-04-20 at 11:44 -0400, Daniel Hazelton wrote:
>>>> Since the second-most-common reason for stack overages is
>>>> ndiswrapper... Well, with there being so much more hardware now
>>>> supported directly by the linux kernel...
>>>
>>> How I would like you to be right... Atheros AR5008, AR5414 PHY, "not yet
>>> here". It's almost one year now since I bought this laptop, and till now
>>> it's the cable or ndiswrapper. But yes, it's going better. For my first
>>> wifi laptop I waited two and a half years, now it seems that in a bit
>>> more than one there will be an open source driver...
>>>
>>> I know all the trouble ndiswrapper signifies. But I see also that people
>>> around me with a laptop and linux use more ndiswrapper than a real
>>> driver, so... be gentle with it.
>>
>> Nobody knows how much potential development is not done because
>> "you can make your wifi work with ndiswrapper".
>
> I've got to agree with that sentiment. Once a working solution is found, no
> matter how crappy, it seems that almost all development stops.

And nobody knows how many people are running Linux instead of Windows 
because they were able to use ndiswrapper to get things running. Most of 
those people contributed nothing to the kernel, but they all contributed 
to Linux, if nothing else as examples that Linux is a reasonable option 
(and some percentage of those users have contributed to other open source 
projects that they would probably never have bumped into if they were 
running Windows instead).

I know we will never convince each other, but we do need to recognise that 
there is another valid point of view.

David Lang

^ permalink raw reply	[flat|nested] 162+ messages in thread

* Re: x86: 4kstacks default
  2008-04-19 14:23   ` Ingo Molnar
                       ` (3 preceding siblings ...)
  2008-04-20  3:29     ` Eric Sandeen
@ 2008-04-23  5:27     ` Benjamin Herrenschmidt
  2008-04-23 23:36       ` David Chinner
  4 siblings, 1 reply; 162+ messages in thread
From: Benjamin Herrenschmidt @ 2008-04-23  5:27 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Andrew Morton, Linux Kernel Mailing List, Arjan van de Ven,
	Thomas Gleixner


On Sat, 2008-04-19 at 16:23 +0200, Ingo Molnar wrote:
> * Andrew Morton <akpm@linux-foundation.org> wrote:
> 
> > >  config 4KSTACKS
> > >  	bool "Use 4Kb for kernel stacks instead of 8Kb"
> > > -	depends on DEBUG_KERNEL
> > >  	depends on X86_32
> > > +	default y
> > 
> > This patch will cause kernels to crash.
> 
> what mainline kernels crash and how will they crash? Fedora and other 
> distros have had 4K stacks enabled for years:
> 
>   $ grep 4K /boot/config-2.6.24-9.fc9
>   CONFIG_4KSTACKS=y
> 
> and we've conducted tens of thousands of bootup tests with all sorts of 
> drivers and kernel options enabled and have yet to see a single crash 
> due to 4K stacks. So basically the kernel default just follows the 
> common distro default now. (distros and users can still disable it)

Do we routinely test nasty scenarios such as a GFP_KERNEL allocation deep
in a call stack trying to swap something out to NFS?

Ben.



^ permalink raw reply	[flat|nested] 162+ messages in thread

* Re: x86: 4kstacks default
  2008-04-23  5:25                           ` david
@ 2008-04-23  5:41                             ` Daniel Hazelton
  2008-04-23  7:46                               ` Romano Giannetti
  0 siblings, 1 reply; 162+ messages in thread
From: Daniel Hazelton @ 2008-04-23  5:41 UTC (permalink / raw)
  To: david; +Cc: Denys Vlasenko, Romano Giannetti, linux-kernel

On Wednesday 23 April 2008 01:25:27 david@lang.hm wrote:
> On Wed, 23 Apr 2008, Daniel Hazelton wrote:
> > On Wednesday 23 April 2008 01:03:11 Denys Vlasenko wrote:
> >> On Tuesday 22 April 2008 20:20, Romano Giannetti wrote:
> >>> On Sun, 2008-04-20 at 11:44 -0400, Daniel Hazelton wrote:
> >>>> Since the second-most-common reason for stack overages is
> >>>> ndiswrapper... Well, with there being so much more hardware now
> >>>> supported directly by the linux kernel...
> >>>
> >>> How I would like you to be right... Atheros AR5008, AR5414 PHY, "not
> >>> yet here". It's almost one year now since I bought this laptop, and
> >>> till now it's the cable or ndiswrapper. But yes, it's going better. For
> >>> my first wifi laptop I waited two and a half years, now it seems that
> >>> in a bit more than one there will be an open source driver...
> >>>
> >>> I know all the trouble ndiswrapper signifies. But I see also that people
> >>> around me with a laptop and linux use more ndiswrapper than a real
> >>> driver, so... be gentle with it.
> >>
> >> Nobody knows how much potential development is not done because
> >> "you can make your wifi work with ndiswrapper".
> >
> > I've got to agree with that sentiment. Once a working solution is found,
> > no matter how crappy, it seems that almost all development stops.
>
> And nobody knows how many people are running Linux instead of Windows
> because they were able to use ndiswrapper to get things running. Most of
> those people contributed nothing to the kernel, but they all contributed
> to Linux, if nothing else as examples that Linux is a reasonable option
> (and some percentage of those users have contributed to other open source
> projects that they would probably never have bumped into if they were
> running Windows instead).
>
> I know we will never convince each other, but we do need to recognise that
> there is another valid point of view.

And who knows how many more people would be running Linux if they didn't need 
ndiswrapper at all?

And how much better would it be if the drivers were native linux code and were 
fully supportable because of that?

There are many, many reasons why it'd be better if ndiswrapper didn't exist as 
a solution or if development on native solutions continued on at the level it 
would without ndiswrapper. 

DRH

-- 
Dialup is like pissing through a pipette. Slow and excruciatingly painful.

^ permalink raw reply	[flat|nested] 162+ messages in thread

* Re: x86: 4kstacks default
  2008-04-23  5:41                             ` Daniel Hazelton
@ 2008-04-23  7:46                               ` Romano Giannetti
  2008-04-23 11:24                                 ` Stefan Richter
  0 siblings, 1 reply; 162+ messages in thread
From: Romano Giannetti @ 2008-04-23  7:46 UTC (permalink / raw)
  To: Daniel Hazelton; +Cc: david, Denys Vlasenko, linux-kernel



[Trimmed, I hope I got the authors right...]

On Wed, 2008-04-23 at 01:41 -0400, Daniel Hazelton wrote:
> > On Wed, 23 Apr 2008, Daniel Hazelton wrote:
> > > On Wednesday 23 April 2008 01:03:11 Denys Vlasenko wrote:
> > >> On Tuesday 22 April 2008 20:20, Romano Giannetti wrote:

> > >>> I know all the trouble ndiswrapper signifies. But I also see that people
> > >>> around me with a laptop and linux use ndiswrapper more than a real
> > >>> driver, so... be gentle with it.
> > >>
> > >> Nobody knows how much potential development is not done because
> > >> "you can make your wifi work with ndiswrapper".
> > >

> And how much better would it be if the drivers were native linux code and were 
> fully supportable because of that?

I understand your position, but let me give my example. I have this
laptop that is one year old. I'm helping however I can with the
development of ath5k --- IOW, offering testing, as I am not an expert on
this. 

But the mere fact that ndiswrapper exists enabled me to use this laptop
on a daily basis, and so I could test new kernels (and if you look at the
logs you'll see I have at least helped to fix a nasty MMC bug, and to
make sound work on this laptop) and help in other areas, like
suspend/resume testing and bug chasing. 

There is not only wireless development. Without ndiswrapper, I wouldn't
have been in any position to help in other areas. I would have had a
crippled laptop[1], a much higher Vista uptime (which now is 0), and a
far more bitter Linux experience.  

And this is the point of view of someone who has been using Linux since
0.99pl9, so I have quite a bit of experience. 99% of normal users would
simply say "don't work"[2].

		Romano

[1] yes, there's a madwifi version locked to a specific kernel that
works with my card. But I do not think that this would be so much
different. 

[2] a nice page with "_this_ laptop will fully work with linux" would be
nice. Linux on laptop or similar is too complex to be a real help when
you have to buy a laptop in 2 days. 


-- 
Sorry for the disclaimer --- ¡I cannot stop it!



--
This communication contains confidential information. It is for the exclusive use of the intended addressee. If you are not the intended addressee, please note that any form of distribution, copying or use of this communication or the information in it is strictly prohibited by law. If you have received this communication in error, please immediately notify the sender by reply e-mail and destroy this message. Thank you for your cooperation. 

^ permalink raw reply	[flat|nested] 162+ messages in thread

* Re: [PATCH] xfs: remove unused parameter of xfs_qm_dqpurge
  2008-04-22  9:42                                         ` [PATCH] xfs: remove unused parameter of xfs_qm_dqpurge Denys Vlasenko
  2008-04-22 10:16                                           ` [PATCH] xfs: remove unused parameter of xfs_iomap_write_allocate Denys Vlasenko
  2008-04-22 22:11                                           ` [PATCH] xfs: remove unused parameter of xfs_qm_dqpurge David Chinner
@ 2008-04-23  8:18                                           ` Christoph Hellwig
  2 siblings, 0 replies; 162+ messages in thread
From: Christoph Hellwig @ 2008-04-23  8:18 UTC (permalink / raw)
  To: Denys Vlasenko
  Cc: David Chinner, Eric Sandeen, Adrian Bunk, Alan Cox, Shawn Bohrer,
	Ingo Molnar, Andrew Morton, Linux Kernel Mailing List,
	Arjan van de Ven, Thomas Gleixner

FYI: if you want to submit xfs patches it makes a lot of sense to send
them to the xfs list..


^ permalink raw reply	[flat|nested] 162+ messages in thread

* Re: x86: 4kstacks default
  2008-04-21  3:30                               ` Alexander E. Patrakov
@ 2008-04-23  8:57                                 ` Helge Hafting
  0 siblings, 0 replies; 162+ messages in thread
From: Helge Hafting @ 2008-04-23  8:57 UTC (permalink / raw)
  To: Alexander E. Patrakov
  Cc: Adrian Bunk, Arjan van de Ven, Eric Sandeen, Alan Cox,
	Shawn Bohrer, Ingo Molnar, Andrew Morton,
	Linux Kernel Mailing List, Thomas Gleixner

Alexander E. Patrakov wrote:
> Adrian Bunk wrote:
>> On Sun, Apr 20, 2008 at 08:41:27AM -0700, Arjan van de Ven wrote:
>>> ...
>>> yes. Adrian is waay off in the weeds on this one. Nobody but him is 
>>> suggesting to remove
>>> 8Kb stacks. I think everyone else agrees that having both options is 
>>> valuable; and there
>>> are better ways to find+fix stack bloat than removing this config 
>>> option.
>>
>> I'm not arguing for removing the option immediately, but long-term we 
>> shouldn't need it.
>>
>> This comes from my experience of removing obsolete drivers for 
>> hardware for which also a more recent driver exists:
>> As long as there is some workaround (e.g. using an older driver or
>> 8k stacks) the workaround will be used instead of the getting proper 
>> bug reports and fixes.
>>
>> As far as I know all problems that are known with 4k stacks are some 
>> nested things with XFS in the trace.
>
> This "as far as I know" is a problem itself. Is it possible to 
> implement (e.g., using some form of memory protection in hardware, but 
> I am not an expert here) an option with 8k stacks that, however, spams 
> the log if the actual usage goes above 4k, and have this as a default 
> for some time? If 4k stacks are the goal that is almost achieved, then 
> this debugging option should have zero impact on performance.
Shouldn't be hard. Use the 8k stack, and have the system mark the second
page as "not present". If it ever gets used you get a page fault. The page
fault handler then has to mark the page present before returning, as well
as queue up some spam (the call chain perhaps) for the log.

A less intrusive way is to use 8k stacks as-is, but put a signature in
the second page. When the process quits, examine the second stack page to
see if the signature got overwritten. This approach will only show that a
problem exists, it won't pinpoint exactly what does it.
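
The signature approach can be sketched roughly as below. This is a
userspace simulation in Python, not kernel code: the stack is just a
bytearray, and the names (new_stack, use_stack, overflow_bytes) and the
signature byte are made up for illustration. It assumes the stack grows
downwards, so the first 4k of usage stays in the upper page and anything
beyond that spills into the signed lower page.

```python
# Userspace simulation only; nothing here is a kernel API.
PAGE_SIZE = 4096
STACK_SIZE = 2 * PAGE_SIZE   # an 8k stack made of two pages
SIGNATURE = 0xA5             # arbitrary signature byte (assumption)
SCRIBBLE = 0x11              # stands in for whatever the call chain wrote


def new_stack() -> bytearray:
    """Allocate an 8k stack and fill the lower (second) page with the signature."""
    stack = bytearray(STACK_SIZE)
    stack[:PAGE_SIZE] = bytes([SIGNATURE]) * PAGE_SIZE
    return stack


def use_stack(stack: bytearray, depth: int) -> None:
    """Pretend a call chain used `depth` bytes, growing down from the top."""
    if not 0 <= depth <= STACK_SIZE:
        raise ValueError("depth must fit inside the 8k stack")
    stack[STACK_SIZE - depth:] = bytes([SCRIBBLE]) * depth


def overflow_bytes(stack: bytearray) -> int:
    """At exit time, report how far usage went past 4k (0 if it never did)."""
    # Scan the signed page from its lowest address; the first damaged byte
    # marks the deepest point the stack reached.
    for offset in range(PAGE_SIZE):
        if stack[offset] != SIGNATURE:
            return PAGE_SIZE - offset
    return 0
```

As said above, this only tells you that a problem happened, not which
call chain caused it. It also misses any usage that reserves stack space
without writing to it, or that happens to write the signature byte.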

Helge Hafting


^ permalink raw reply	[flat|nested] 162+ messages in thread

* Re: x86: 4kstacks default
  2008-04-20 13:21                     ` Adrian Bunk
@ 2008-04-23  9:13                       ` Helge Hafting
  2008-04-23 23:29                         ` David Chinner
  2008-04-24 15:46                         ` Eric Sandeen
  2008-04-28 18:38                       ` Bill Davidsen
  1 sibling, 2 replies; 162+ messages in thread
From: Helge Hafting @ 2008-04-23  9:13 UTC (permalink / raw)
  To: Adrian Bunk
  Cc: Willy Tarreau, Andi Kleen, Alan Cox, Shawn Bohrer, Ingo Molnar,
	Andrew Morton, Linux Kernel Mailing List, Arjan van de Ven,
	Thomas Gleixner

Adrian Bunk wrote:
> What actually brings bad reputation is shipping a 4k option that is 
> known to break under some circumstances.
>   
How about making 4k stacks incompatible with those circumstances then?
I.e. if you select 4k stacks, then you can't select XFS because we know
that _may_ fail. Similarly for ndiswrapper networking, and other
stuff where problems have been noticed.

Some people don't need any of these, and can then use
safe 4k stacks. Well, at least as safe as the 8k stacks are; there is no
mathematical proof for their safety in all cases either.

Helge Hafting

^ permalink raw reply	[flat|nested] 162+ messages in thread

* Re: x86: 4kstacks default
  2008-04-23  7:46                               ` Romano Giannetti
@ 2008-04-23 11:24                                 ` Stefan Richter
  2008-04-23 12:15                                   ` Romano Giannetti
  0 siblings, 1 reply; 162+ messages in thread
From: Stefan Richter @ 2008-04-23 11:24 UTC (permalink / raw)
  To: Romano Giannetti; +Cc: Daniel Hazelton, david, Denys Vlasenko, linux-kernel

Romano Giannetti wrote:
[stack size requirements -> ndiswrapper -> hardware support status -> ?]
> a nice page with "_this_ laptop will fully work with linux" would be
> nice. Linux on laptop or similar is too complex to be a real help when
> you have to buy a laptop in 2 days. 

If sites like tuxmobil.org, hardware4linux.info, and the hardware
compatibility databases of Linux distributors don't work for you, then
just ask the notebook vendors directly.
-- 
Stefan Richter
-=====-==--- -=-- =-===
http://arcgraph.de/sr/

^ permalink raw reply	[flat|nested] 162+ messages in thread

* Re: x86: 4kstacks default
  2008-04-23 11:24                                 ` Stefan Richter
@ 2008-04-23 12:15                                   ` Romano Giannetti
  2008-04-23 15:59                                     ` Lennart Sorensen
  0 siblings, 1 reply; 162+ messages in thread
From: Romano Giannetti @ 2008-04-23 12:15 UTC (permalink / raw)
  To: Stefan Richter; +Cc: Daniel Hazelton, david, Denys Vlasenko, linux-kernel



On Wed, 2008-04-23 at 13:24 +0200, Stefan Richter wrote:
> Romano Giannetti wrote:
> [stack size requirements -> ndiswrapper -> hardware support status -> ?]
> > a nice page with "_this_ laptop will fully work with linux" would be
> > nice. Linux on laptop or similar is too complex to be a real help when
> > you have to buy a laptop in 2 days. 
> 
> If sites like tuxmobil.org, hardware4linux.info, and the hardware
> compatibility databases of Linux distributors don't work for you, then
> just ask the notebook vendors directly.

Unfortunately, it is quite a complex thing to check. Mind you, I
bought this laptop after looking all over those sites, but:

- tuxmobil & Co are very user-driven, and you have to wade through tens
of "similar" computers;

- it's not so easy to know what exactly is bundled with a laptop[1];

- vendors say "works" (and it is often listed as working on the aforementioned
sites too) regardless of whether it works with an open source driver or not.
As an example, all the nvidia-based graphics are marked "works".

Romano 

[1] In my case, I selected this Toshiba over, for example, an HP or an Acer
because it had "atheros wifi" (but guess what: the PHY version is too new
to be supported...), "intel hda sound" (but guess what: the specific
codec didn't work at all, and continues to have a lot of problems), 
"intel graphics" (and that at least was a good decision!). 


-- 
Sorry for the disclaimer --- ¡I cannot stop it!



^ permalink raw reply	[flat|nested] 162+ messages in thread

* Re: x86: 4kstacks default
  2008-04-23 12:15                                   ` Romano Giannetti
@ 2008-04-23 15:59                                     ` Lennart Sorensen
  0 siblings, 0 replies; 162+ messages in thread
From: Lennart Sorensen @ 2008-04-23 15:59 UTC (permalink / raw)
  To: Romano Giannetti
  Cc: Stefan Richter, Daniel Hazelton, david, Denys Vlasenko, linux-kernel

On Wed, Apr 23, 2008 at 02:15:01PM +0200, Romano Giannetti wrote:
> - vendors say "works" (and it is often listed as working on the aforementioned
> sites too) regardless of whether it works with an open source driver or not.
> As an example, all the nvidia-based graphics are marked "works".

The nv driver does work for all the nvidia cards as far as I know.  Sure
you don't get 3D acceleration, but you do get working X.

But yes it is quite annoying when companies like highpoint (and others)
claim to support linux when all they have is binary blobs as part of their
"driver".

-- 
Len Sorensen

^ permalink raw reply	[flat|nested] 162+ messages in thread

* Re: x86: 4kstacks default
  2008-04-23  9:13                       ` Helge Hafting
@ 2008-04-23 23:29                         ` David Chinner
  2008-04-24 15:46                         ` Eric Sandeen
  1 sibling, 0 replies; 162+ messages in thread
From: David Chinner @ 2008-04-23 23:29 UTC (permalink / raw)
  To: Helge Hafting
  Cc: Adrian Bunk, Willy Tarreau, Andi Kleen, Alan Cox, Shawn Bohrer,
	Ingo Molnar, Andrew Morton, Linux Kernel Mailing List,
	Arjan van de Ven, Thomas Gleixner

On Wed, Apr 23, 2008 at 11:13:55AM +0200, Helge Hafting wrote:
> Adrian Bunk wrote:
> >What actually brings bad reputation is shipping a 4k option that is 
> >known to break under some circumstances.
> >  
> How about making 4k stacks incompatible with those circumstances then?
> I.e. if you select 4k stacks, then you can't select XFS because we know
> that _may_ fail. Similarly for ndiswrapper networking, and other
> stuff where problems have been noticed.

Yeah, that means every distro that supports XFS (i.e. pretty much
all of them including Fedora) will be forced to disable 4k stacks on
x86.  I'd be happy with this solution.

FWIW, this would make 4k stacks pretty much unused outside of custom
kernels.  At which point I'd suggest a default of 4k is wrong....

Cheers,

Dave.
-- 
Dave Chinner
Principal Engineer
SGI Australian Software Group

^ permalink raw reply	[flat|nested] 162+ messages in thread

* Re: x86: 4kstacks default
  2008-04-23  5:27     ` Benjamin Herrenschmidt
@ 2008-04-23 23:36       ` David Chinner
  2008-04-24  0:45         ` Arjan van de Ven
  2008-04-24  0:56         ` Benjamin Herrenschmidt
  0 siblings, 2 replies; 162+ messages in thread
From: David Chinner @ 2008-04-23 23:36 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: Ingo Molnar, Andrew Morton, Linux Kernel Mailing List,
	Arjan van de Ven, Thomas Gleixner

On Wed, Apr 23, 2008 at 03:27:01PM +1000, Benjamin Herrenschmidt wrote:
> On Sat, 2008-04-19 at 16:23 +0200, Ingo Molnar wrote:
> > * Andrew Morton <akpm@linux-foundation.org> wrote:
> > 
> > > >  config 4KSTACKS
> > > >  	bool "Use 4Kb for kernel stacks instead of 8Kb"
> > > > -	depends on DEBUG_KERNEL
> > > >  	depends on X86_32
> > > > +	default y
> > > 
> > > This patch will cause kernels to crash.
> > 
> > what mainline kernels crash and how will they crash? Fedora and other 
> > distros have had 4K stacks enabled for years:
> > 
> >   $ grep 4K /boot/config-2.6.24-9.fc9
> >   CONFIG_4KSTACKS=y
> > 
> > and we've conducted tens of thousands of bootup tests with all sorts of 
> > drivers and kernel options enabled and have yet to see a single crash 
> > due to 4K stacks. So basically the kernel default just follows the 
> > common distro default now. (distros and users can still disable it)
> 
> Do we routinely test nasty scenarii such as a GFP_KERNEL allocation deep
> in a call stack trying to swap something out to NFS ?

I doubt it, because this is the place that a local XFS filesystem
typically blows a 4k stack (direct memory reclaim triggering
->writepage). Boot testing does nothing to exercise the potential
paths for stack overflows....

Cheers,

Dave.
-- 
Dave Chinner
Principal Engineer
SGI Australian Software Group

^ permalink raw reply	[flat|nested] 162+ messages in thread

* Re: x86: 4kstacks default
  2008-04-23 23:36       ` David Chinner
@ 2008-04-24  0:45         ` Arjan van de Ven
  2008-04-24  9:52           ` Christoph Hellwig
  2008-04-24  0:56         ` Benjamin Herrenschmidt
  1 sibling, 1 reply; 162+ messages in thread
From: Arjan van de Ven @ 2008-04-24  0:45 UTC (permalink / raw)
  To: T David Chinner
  Cc: Benjamin Herrenschmidt, Ingo Molnar, Andrew Morton,
	Linux Kernel Mailing List, Thomas Gleixner

On Thu, 24 Apr 2008 09:36:52 +1000
David Chinner <dgc@sgi.com> wrote:

> On Wed, Apr 23, 2008 at 03:27:01PM +1000, Benjamin Herrenschmidt
> wrote:
> > On Sat, 2008-04-19 at 16:23 +0200, Ingo Molnar wrote:
> > > * Andrew Morton <akpm@linux-foundation.org> wrote:
> > > 
> > > > >  config 4KSTACKS
> > > > >  	bool "Use 4Kb for kernel stacks instead of 8Kb"
> > > > > -	depends on DEBUG_KERNEL
> > > > >  	depends on X86_32
> > > > > +	default y
> > > > 
> > > > This patch will cause kernels to crash.
> > > 
> > > what mainline kernels crash and how will they crash? Fedora and
> > > other distros have had 4K stacks enabled for years:
> > > 
> > >   $ grep 4K /boot/config-2.6.24-9.fc9
> > >   CONFIG_4KSTACKS=y
> > > 
> > > and we've conducted tens of thousands of bootup tests with all
> > > sorts of drivers and kernel options enabled and have yet to see a
> > > single crash due to 4K stacks. So basically the kernel default
> > > just follows the common distro default now. (distros and users
> > > can still disable it)
> > 
> > Do we routinely test nasty scenarii such as a GFP_KERNEL allocation
> > deep in a call stack trying to swap something out to NFS ?
> 
> I doubt it, because this is the place that a local XFS filesystem
> typically blows a 4k stack (direct memory reclaim triggering
> ->writepage). Boot testing does nothing to exercise the potential
> paths for stack overflows....
> 

The good news is that direct reclaim is.. rare.
And I also doubt XFS is unique here; imagine the whole stacking thing on x86-64 just the same ...

I wonder if the direct reclaim path should avoid direct reclaim if the stack has only X bytes left.
(where the value of X is... well we can figure that one out later)

The rarity of direct reclaim during normal use ought to make this not a performance problem per se,
and the benefits go further than just "XFS" or "4K stacks".
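
Roughly, the check could look like the sketch below. It is plain Python
to show the decision only, not kernel code. The function names and the
min_free parameter (the "X" above) are hypothetical, and what the caller
does instead of direct reclaim is deliberately left open here.

```python
def stack_left(stack_size: int, stack_used: int) -> int:
    """Bytes of stack still free for the current call chain."""
    return stack_size - stack_used


def reclaim_mode(stack_size: int, stack_used: int, min_free: int) -> str:
    """Decide whether this allocation may enter direct reclaim.

    min_free is the "X" threshold; its value is an open question.
    Returns "direct" when enough stack is left, otherwise "deferred",
    meaning the caller must do something else (not specified here).
    """
    if stack_left(stack_size, stack_used) >= min_free:
        return "direct"
    return "deferred"
```

With a made-up threshold of 2048 bytes, a caller 1000 bytes deep in a 4k
stack would still go into direct reclaim, while one 3000 bytes deep
would not. The same check applies unchanged to 8k stacks.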

-- 
If you want to reach me at my work email, use arjan@linux.intel.com
For development, discussion and tips for power savings, 
visit http://www.lesswatts.org

^ permalink raw reply	[flat|nested] 162+ messages in thread

* Re: x86: 4kstacks default
  2008-04-23 23:36       ` David Chinner
  2008-04-24  0:45         ` Arjan van de Ven
@ 2008-04-24  0:56         ` Benjamin Herrenschmidt
  1 sibling, 0 replies; 162+ messages in thread
From: Benjamin Herrenschmidt @ 2008-04-24  0:56 UTC (permalink / raw)
  To: David Chinner
  Cc: Ingo Molnar, Andrew Morton, Linux Kernel Mailing List,
	Arjan van de Ven, Thomas Gleixner


On Thu, 2008-04-24 at 09:36 +1000, David Chinner wrote:
> > Do we routinely test nasty scenarii such as a GFP_KERNEL allocation
> > deep in a call stack trying to swap something out to NFS ?
> 
> I doubt it, because this is the place that a local XFS filesystem
> typically blows a 4k stack (direct memory reclaim triggering
> ->writepage). Boot testing does nothing to exercise the potential
> paths for stack overflows....

Yup, not even counting when the said NFS is on top of some fancy
network stack with a driver on top of USB .... I mean, we do have
potential for worst-case scenarios that I think -will- blow a 4k stack.

Ben.



^ permalink raw reply	[flat|nested] 162+ messages in thread

* Re: x86: 4kstacks default
  2008-04-24  0:45         ` Arjan van de Ven
@ 2008-04-24  9:52           ` Christoph Hellwig
  2008-04-24 12:25             ` Peter Zijlstra
  2008-04-24 15:41             ` Chris Mason
  0 siblings, 2 replies; 162+ messages in thread
From: Christoph Hellwig @ 2008-04-24  9:52 UTC (permalink / raw)
  To: Arjan van de Ven
  Cc: T David Chinner, Benjamin Herrenschmidt, Ingo Molnar,
	Andrew Morton, Linux Kernel Mailing List, Thomas Gleixner

On Wed, Apr 23, 2008 at 05:45:16PM -0700, Arjan van de Ven wrote:
> The good news is that direct reclaim is.. rare.
> And I also doubt XFS is unique here; imagine the whole stacking thing on x86-64 just the same ...

It's bad news actually.  Because it means the stack overflow happens
totally randomly and is hard to reproduce.   And no, XFS is not unique there,
any filesystem with a complex enough writeback path (aka extents +
delalloc + smart allocator) will have to use quite a lot here.  I'll bet
my 2 cents that ext4, once finished up, will run into this just as likely.

> I wonder if the direct reclaim path should avoid direct reclaim if the stack has only X bytes left.
> (where the value of X is... well we can figure that one out later)

Actually direct reclaim should be totally avoided for complex
filesystems.  It's horrible for the stack and for the filesystem
writeout policy and ondisk allocation strategies.


^ permalink raw reply	[flat|nested] 162+ messages in thread

* Re: x86: 4kstacks default
  2008-04-24  9:52           ` Christoph Hellwig
@ 2008-04-24 12:25             ` Peter Zijlstra
  2008-04-24 15:41             ` Chris Mason
  1 sibling, 0 replies; 162+ messages in thread
From: Peter Zijlstra @ 2008-04-24 12:25 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Arjan van de Ven, T David Chinner, Benjamin Herrenschmidt,
	Ingo Molnar, Andrew Morton, Linux Kernel Mailing List,
	Thomas Gleixner

On Thu, 2008-04-24 at 05:52 -0400, Christoph Hellwig wrote:

> > I wonder if the direct reclaim path should avoid direct reclaim if the stack has only X bytes left.
> > (where the value of X is... well we can figure that one out later)
> 
> Actually direct reclaim should be totally avoided for complex
> filesystems.  It's horrible for the stack and for the filesystem
> writeout policy and ondisk allocation strategies.

That's basically any reclaim, even kswapd will ruin policy and block
allocation smarts.


^ permalink raw reply	[flat|nested] 162+ messages in thread

* Re: x86: 4kstacks default
  2008-04-24  9:52           ` Christoph Hellwig
  2008-04-24 12:25             ` Peter Zijlstra
@ 2008-04-24 15:41             ` Chris Mason
  2008-04-24 18:30               ` Alexander van Heukelum
  1 sibling, 1 reply; 162+ messages in thread
From: Chris Mason @ 2008-04-24 15:41 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Arjan van de Ven, T David Chinner, Benjamin Herrenschmidt,
	Ingo Molnar, Andrew Morton, Linux Kernel Mailing List,
	Thomas Gleixner, sandeen

On Thursday 24 April 2008, Christoph Hellwig wrote:
> On Wed, Apr 23, 2008 at 05:45:16PM -0700, Arjan van de Ven wrote:
> > The good news is that direct reclaim is.. rare.
> > And I also doubt XFS is unique here; imagine the whole stacking thing on
> > x86-64 just the same ...
>
> It's bad news actually.  Because it means the stack overflow happens
> totally randomly and is hard to reproduce.   And no, XFS is not unique there,
> any filesystem with a complex enough writeback path (aka extents +
> delalloc + smart allocator) will have to use quite a lot here.  I'll bet
> my 2 cents that ext4, once finished up, will run into this just as likely.
>
> > I wonder if the direct reclaim path should avoid direct reclaim if the
> > stack has only X bytes left. (where the value of X is... well we can
> > figure that one out later)
>
> Actually direct reclaim should be totally avoided for complex
> filesystems.  It's horrible for the stack and for the filesystem
> writeout policy and ondisk allocation strategies.

Just as a data point, XFS isn't alone.  I run through once or twice a month 
and try to get rid of any new btrfs stack pigs, but keeping under the 4k 
stack barrier is a constant challenge.  

My storage configuration is fairly simple; if we spin the wheel of stacked IO 
devices...it won't be pretty.

Does it make more sense to kill off some brain cells on finding ways to 
dynamically increase the stack as we run out?  Or even give the robust stack 
users like xfs/btrfs a way to say: I'm pretty sure this call path is going to 
hurt, please make my stack bigger now.

We have relatively few entry points between the rest of the kernel and the FS, 
there should be some ways to compromise here.

-chris

^ permalink raw reply	[flat|nested] 162+ messages in thread

* Re: x86: 4kstacks default
  2008-04-23  9:13                       ` Helge Hafting
  2008-04-23 23:29                         ` David Chinner
@ 2008-04-24 15:46                         ` Eric Sandeen
  1 sibling, 0 replies; 162+ messages in thread
From: Eric Sandeen @ 2008-04-24 15:46 UTC (permalink / raw)
  To: Helge Hafting
  Cc: Adrian Bunk, Willy Tarreau, Andi Kleen, Alan Cox, Shawn Bohrer,
	Ingo Molnar, Andrew Morton, Linux Kernel Mailing List,
	Arjan van de Ven, Thomas Gleixner

Helge Hafting wrote:
> Adrian Bunk wrote:
>> What actually brings bad reputation is shipping a 4k option that is 
>> known to break under some circumstances.
>>   
> How about making 4k stacks incompatible with those circumstances then?
> I.e. is you select 4k stacks, then you can't select XFS because we know
> that _may_ fail. Similiar for ndiswrapper networking, and other
> stuff where problems have been noticed.

Problem is, it's the storage configuration (at administration time, not
kernel build time) that matters, too.

I have XFS on Fedora with 4k stacks on SATA /dev/sdb1 on my x86 mythbox,
and it's perfectly fine.  But that's a nice, simple setup.  If I stacked
more things over/under it, I'd be more likely to have trouble.
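
To illustrate why the configuration matters: stack usage adds up across
every layer in the call chain. The sketch below is plain Python, and
every number in it is invented purely for illustration, not measured on
any kernel. The layer names are only examples of what might get stacked.

```python
from typing import Dict

# Hypothetical per-layer stack usage in bytes. These are made-up numbers,
# NOT measurements; only the additive effect is the point.
SIMPLE_SETUP: Dict[str, int] = {
    "syscall + VFS": 600,
    "filesystem": 1500,
    "block layer": 500,
    "SATA driver": 400,
}

# The same setup with more layers stacked over/under the filesystem.
STACKED_SETUP: Dict[str, int] = dict(
    SIMPLE_SETUP, **{"NFS export": 700, "MD/DM": 600}
)


def total_stack(layers: Dict[str, int]) -> int:
    """Worst-case usage if every layer is on the stack at the same time."""
    return sum(layers.values())


def fits(layers: Dict[str, int], stack_size: int) -> bool:
    """True if the summed usage stays within the given stack size."""
    return total_stack(layers) <= stack_size
```

With these invented numbers the simple setup sums to 3000 bytes and fits
in 4k, while the stacked one sums to 4300 bytes and only fits in 8k. The
kernel build is identical in both cases; only the administration-time
storage layout differs.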

-Eric

^ permalink raw reply	[flat|nested] 162+ messages in thread

* Re: x86: 4kstacks default
  2008-04-24 15:41             ` Chris Mason
@ 2008-04-24 18:30               ` Alexander van Heukelum
  0 siblings, 0 replies; 162+ messages in thread
From: Alexander van Heukelum @ 2008-04-24 18:30 UTC (permalink / raw)
  To: Chris Mason, Christoph Hellwig
  Cc: Arjan van de Ven, T David Chinner, Benjamin Herrenschmidt,
	Ingo Molnar, Andrew Morton, Linux Kernel Mailing List, Th

On Thu, 24 Apr 2008 11:41:30 -0400, "Chris Mason"
<chris.mason@oracle.com> said:
> On Thursday 24 April 2008, Christoph Hellwig wrote:
> > On Wed, Apr 23, 2008 at 05:45:16PM -0700, Arjan van de Ven wrote:
> > > The good news is that direct reclaim is.. rare. And I also doubt
> > > XFS is unique here; imagine the whole stacking thing on x86-64
> > > just the same ...
> >
> > It's bad news actually.  Because it means the stack overflow happens
> > totally randomly and is hard to reproduce.   And no, XFS is not unique
> > there, any filesystem with a complex enough writeback path (aka
> > extents + delalloc + smart allocator) will have to use quite a lot
> > here.  I'll bet my 2 cents that ext4, once finished up, will run into
> > this just as likely.
> >
> > > I wonder if the direct reclaim path should avoid direct reclaim if
> > > the stack has only X bytes left. (where the value of X is... well
> > > we can figure that one out later)
> >
> > Actually direct reclaim should be totally avoided for complex
> > filesystems.  It's horrible for the stack and for the filesystem
> > writeout policy and ondisk allocation strategies.
>
> Just as a data point, XFS isn't alone.  I run through once or twice a
> month and try to get rid of any new btrfs stack pigs, but keeping
> under the 4k  stack barrier is a constant challenge.
>
> My storage configuration is fairly simple, if we spin the wheel of
> stacked IO devices...it won't be pretty.
>
> Does it make more sense to kill off some brain cells on finding ways
> to dynamically increase the stack as we run out?  Or even give the
> robust stack users like xfs/btrfs a way to say: I'm pretty sure this
> call path is going to hurt, please make my stack bigger now.

Hi,

(Rookie warning goes here.) To me, growing the stack at more or less
random places in the kernel seems to be quite a complicated thing to do
and it will be quite a maintenance burden to find the right spots to
insert stack usage checks. So I'd say: lose the dynamic aspect.

How about unconditionally switching stacks at some defined points within
the core code of the kernel, just before calling into any driver code,
for example? The 4k-option has separate irq stacks already, why not have
driver stacks too?

I think the most important consideration to keep the stack size small
was that non-order-0 allocations are unreliable under/after memory
pressure due to fragmentation and that this allocation has to be done
for each thread. It is therefore preferable not to do any higher-order
allocations at all, unless there is a fall-back mechanism if the
allocation fails. For higher-order stacks there isn't such a fallback...
Can the system get by (without deadlocks at least in practice) with a
limited number of preallocated but 'large' stacks (in addition to a
small per-thread stack)?

It was discussed that stack space is needed for any sleeping process.
Could it be arranged that this waiting happens on the smallish stack, at
least for the most common cases, while non-waiting activity can use the
big stacks?
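
As a rough sketch of the bookkeeping I have in mind (plain Python, not
kernel code; LargeStackPool, acquire and release are hypothetical names,
and the pool size is arbitrary): a fixed number of big stacks is
allocated once, up front, and a thread that cannot get one has to wait
on its own small stack instead of triggering a new higher-order
allocation.

```python
from collections import deque
from typing import Optional


class LargeStackPool:
    """A fixed set of preallocated 'large' stacks, handed out on demand."""

    def __init__(self, count: int, size: int) -> None:
        # All allocation happens here, once; nothing is allocated later.
        self._free = deque(bytearray(size) for _ in range(count))

    def acquire(self) -> Optional[bytearray]:
        """Return a free large stack, or None if all of them are in use.

        On None the caller keeps running (or sleeps) on its small
        per-thread stack; how it waits is not modelled here.
        """
        return self._free.popleft() if self._free else None

    def release(self, stack: bytearray) -> None:
        """Give a large stack back to the pool once the deep call is done."""
        self._free.append(stack)
```

Whether such a pool can be made deadlock-free in practice is exactly the
open question above; the sketch only shows that the allocation side
needs no fallback once the pool exists.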

Greetings,
    Alexander

> We have relatively few entry points between the rest of the kernel and
> the FS, there should be some ways to compromise here.
>
> -chris
-- 
  Alexander van Heukelum
  heukelum@fastmail.fm



^ permalink raw reply	[flat|nested] 162+ messages in thread

* Re: x86: 4kstacks default
  2008-04-19 17:49     ` Andrew Morton
@ 2008-04-25 17:39       ` Parag Warudkar
  0 siblings, 0 replies; 162+ messages in thread
From: Parag Warudkar @ 2008-04-25 17:39 UTC (permalink / raw)
  To: linux-kernel

Andrew Morton <akpm <at> linux-foundation.org> writes:

> 
> > On Sat, 19 Apr 2008 16:23:29 +0200 Ingo Molnar <mingo <at> elte.hu> wrote:
> > 
> > * Andrew Morton <akpm <at> linux-foundation.org> wrote:
> > 
> > > >  config 4KSTACKS
> > > >  	bool "Use 4Kb for kernel stacks instead of 8Kb"
> > > > -	depends on DEBUG_KERNEL
> > > >  	depends on X86_32
> > > > +	default y
> > > 
> > > This patch will cause kernels to crash.
> > 
> > what mainline kernels crash and how will they crash?
> 
> There has been a dribble of reports - I don't have the links handy, nor did
> I search for them.
> 
> > Fedora and other 
> > distros have had 4K stacks enabled for years:
> > 
> >   $ grep 4K /boot/config-2.6.24-9.fc9
> >   CONFIG_4KSTACKS=y

Here is a report - Fedora 8 default kernel, Mac Mini file server, Not Tainted. 
Attempt to copy 100GB+ of data from an hfsplus file system on a USB drive to a 
firewire drive with XFS filesystem - I got a nasty panic with a huge stack 
backtrace. I gave up and switched to Ubuntu. With a stock kernel.org kernel I 
was able to successfully copy the data over. I still have the machine and the 
restored drives and can try to reproduce it with Fedora 9 w/4K stacks if 
anyone thinks it is worthwhile (i.e. fixable).

Parag


^ permalink raw reply	[flat|nested] 162+ messages in thread

* Re: x86: 4kstacks default
  2008-04-22  1:28                                 ` David Chinner
  2008-04-22  2:33                                   ` [PATCH] xfs: do not pass size into kmem_free, it's unused Denys Vlasenko
  2008-04-22 12:48                                   ` x86: 4kstacks default Denys Vlasenko
@ 2008-04-27 19:27                                   ` Jörn Engel
  2008-04-27 23:02                                     ` Denys Vlasenko
  2 siblings, 1 reply; 162+ messages in thread
From: Jörn Engel @ 2008-04-27 19:27 UTC (permalink / raw)
  To: David Chinner
  Cc: Denys Vlasenko, Eric Sandeen, Adrian Bunk, Alan Cox,
	Shawn Bohrer, Ingo Molnar, Andrew Morton,
	Linux Kernel Mailing List, Arjan van de Ven, Thomas Gleixner

On Tue, 22 April 2008 11:28:19 +1000, David Chinner wrote:
> On Mon, Apr 21, 2008 at 09:51:02PM +0200, Denys Vlasenko wrote:
> 
> > Why xfs code is said to be 5 times bigger than e.g. reiserfs?
> > Does it have to be that big?
> 
> If we cut the bulkstat code out, the handle interface, the
> preallocation, the journalled quota, the delayed allocation, all the
> runtime validation, the shutdown code, the debug code, the tracing
> code, etc, then we might get down to the same size reiser....

Just noticed this bit of FUD.  Last time I did some static analysis on
stack usage, reiserfs alone would blow away 3k, while xfs was somewhere
below.  Reiserfs was improved afaik, but I'd still expect it to be worse
than xfs until shown otherwise.

Maybe reiserfs simply isn't used that much in nfs+*fs+md+whatnot+scsi
setups?

Jörn

-- 
Courage is not the absence of fear, but rather the judgement that
something else is more important than fear.
-- Ambrose Redmoon

^ permalink raw reply	[flat|nested] 162+ messages in thread

* Re: x86: 4kstacks default
  2008-04-27 19:27                                   ` Jörn Engel
@ 2008-04-27 23:02                                     ` Denys Vlasenko
  2008-04-27 23:08                                       ` Eric Sandeen
  0 siblings, 1 reply; 162+ messages in thread
From: Denys Vlasenko @ 2008-04-27 23:02 UTC (permalink / raw)
  To: Jörn Engel
  Cc: David Chinner, Eric Sandeen, Adrian Bunk, Alan Cox, Shawn Bohrer,
	Ingo Molnar, Andrew Morton, Linux Kernel Mailing List,
	Arjan van de Ven, Thomas Gleixner

On Sunday 27 April 2008 21:27, Jörn Engel wrote:
> On Tue, 22 April 2008 11:28:19 +1000, David Chinner wrote:
> > On Mon, Apr 21, 2008 at 09:51:02PM +0200, Denys Vlasenko wrote:
> > 
> > > Why xfs code is said to be 5 times bigger than e.g. reiserfs?
> > > Does it have to be that big?
> > 
> > If we cut the bulkstat code out, the handle interface, the
> > preallocation, the journalled quota, the delayed allocation, all the
> > runtime validation, the shutdown code, the debug code, the tracing
> > code, etc, then we might get down to the same size reiser....
> 
> Just noticed this bit of FUD. Last time I did some static analysis on
> stack usage, reiserfs alone would blow away 3k, while xfs was somewhere
> below.

I'm sorry, but it's not what I said.
I didn't say reiserfs eats less stack. I don't know.
I said it is smaller.

reiserfs/*  821474 bytes
xfs/*      3019689 bytes
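
For reference, the ratio implied by these two byte counts can be worked out
directly. A minimal Python sketch follows; it is not part of the original
mail, the constant and function names are illustrative, and it assumes
nothing beyond the two figures quoted above.

```python
# Directory sizes in bytes, exactly as quoted in the message.
REISERFS_BYTES = 821474
XFS_BYTES = 3019689


def size_ratio(larger: int, smaller: int) -> float:
    """Return how many times bigger `larger` is than `smaller`."""
    return larger / smaller


ratio = size_ratio(XFS_BYTES, REISERFS_BYTES)
print(f"xfs/* is about {ratio:.1f}x the size of reiserfs/*")
```

By this measure xfs is a little under four times the size of reiserfs,
somewhat short of the "5 times" figure quoted earlier in the thread.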
--
vda

^ permalink raw reply	[flat|nested] 162+ messages in thread

* Re: x86: 4kstacks default
  2008-04-27 23:02                                     ` Denys Vlasenko
@ 2008-04-27 23:08                                       ` Eric Sandeen
  2008-04-28  0:00                                         ` Denys Vlasenko
  0 siblings, 1 reply; 162+ messages in thread
From: Eric Sandeen @ 2008-04-27 23:08 UTC (permalink / raw)
  To: Denys Vlasenko
  Cc: Jörn Engel, David Chinner, Adrian Bunk, Alan Cox,
	Shawn Bohrer, Ingo Molnar, Andrew Morton,
	Linux Kernel Mailing List, Arjan van de Ven, Thomas Gleixner

Denys Vlasenko wrote:
> On Sunday 27 April 2008 21:27, Jörn Engel wrote:
>> On Tue, 22 April 2008 11:28:19 +1000, David Chinner wrote:
>>> On Mon, Apr 21, 2008 at 09:51:02PM +0200, Denys Vlasenko wrote:
>>>
>>>> Why xfs code is said to be 5 times bigger than e.g. reiserfs?
>>>> Does it have to be that big?
>>> If we cut the bulkstat code out, the handle interface, the
>>> preallocation, the journalled quota, the delayed allocation, all the
>>> runtime validation, the shutdown code, the debug code, the tracing
>>> code, etc, then we might get down to the same size reiser....
>> Just noticed this bit of FUD. Last time I did some static analysis on
>> stack usage, reiserfs alone would blow away 3k, while xfs was somewhere
>> below.
> 
> I'm sorry, but it's not what I said.
> I didn't say reiserfs eats less stack. I don't know.
> I said it is smaller.
> 
> reiserfs/*  821474 bytes
> xfs/*      3019689 bytes

FWIW, the reason for that is in large part all the features Dave listed
above, and probably more.

And, while certainly not yet tiny, the recent trend actually is that xfs
is getting a bit smaller:

http://oss.sgi.com/~sandeen/xfs-linedata.png

(note, though - the Y axis does not start at 0)  :)

-Eric

^ permalink raw reply	[flat|nested] 162+ messages in thread

* Re: x86: 4kstacks default
  2008-04-27 23:08                                       ` Eric Sandeen
@ 2008-04-28  0:00                                         ` Denys Vlasenko
  0 siblings, 0 replies; 162+ messages in thread
From: Denys Vlasenko @ 2008-04-28  0:00 UTC (permalink / raw)
  To: Eric Sandeen
  Cc: Jörn Engel, David Chinner, Adrian Bunk, Alan Cox,
	Shawn Bohrer, Ingo Molnar, Andrew Morton,
	Linux Kernel Mailing List, Arjan van de Ven, Thomas Gleixner

On Monday 28 April 2008 01:08, Eric Sandeen wrote:
> >>>> Why xfs code is said to be 5 times bigger than e.g. reiserfs?
> >>>> Does it have to be that big?
> >>> If we cut the bulkstat code out, the handle interface, the
> >>> preallocation, the journalled quota, the delayed allocation, all the
> >>> runtime validation, the shutdown code, the debug code, the tracing
> >>> code, etc, then we might get down to the same size reiser....
> >> Just noticed this bit of FUD. Last time I did some static analysis on
> >> stack usage, reiserfs alone would blow away 3k, while xfs was somewhere
> >> below.
> > 
> > I'm sorry, but it's not what I said.
> > I didn't say reiserfs eats less stack. I don't know.
> > I said it is smaller.
> > 
> > reiserfs/*  821474 bytes
> > xfs/*      3019689 bytes
> 
> FWIW, the reason for that is in large part all the features Dave listed
> above, and probably more.
> 
> And, while certainly not yet tiny, the recent trend actually is that xfs
> is getting a bit smaller:
> 
> http://oss.sgi.com/~sandeen/xfs-linedata.png

~30% line count reduction? Impressive, especially in this age
of creeping bloat. Thanks.
--
vda

^ permalink raw reply	[flat|nested] 162+ messages in thread

* Re: x86: 4kstacks default
  2008-04-20 13:30                       ` Adrian Bunk
  2008-04-20 13:34                         ` Willy Tarreau
@ 2008-04-28 17:56                         ` Bill Davidsen
  1 sibling, 0 replies; 162+ messages in thread
From: Bill Davidsen @ 2008-04-28 17:56 UTC (permalink / raw)
  To: linux-kernel
  Cc: Andi Kleen, Willy Tarreau, Alan Cox, Shawn Bohrer, Ingo Molnar,
	Andrew Morton, Linux Kernel Mailing List, Arjan van de Ven,
	Thomas Gleixner

Adrian Bunk wrote:
> On Sun, Apr 20, 2008 at 03:06:23PM +0200, Andi Kleen wrote:
>> Willy Tarreau wrote:
>> ...
>>> I have nothing against changing the default setting to 4k provided that
>>> it is easy to get back to the safe setting
>> So you're saying that only advanced users who understand all their
>> CONFIG options should have the safe settings? And everyone else
>> the "only explodes once a week" mode?
>>
>> For me that is exactly the wrong way around.
>>
>> If someone is sure they know what they're doing they can set whatever
>> crazy settings they want (given there is a quick way to check
>> for the crazy settings in oops reports so that I can ignore those), but
>> the default should be always safe and optimized for reliability.
> 
> That means we'll have nearly zero testing of the "crazy setting" and 
> when someone tries it he'll have a high probability of running into some
> problems.
> 
> Such a "crazy setting" shouldn't be offered to users at all.
> 
> We should either aim at 4k stacks unconditionally for all 32bit 
> architectures with 4k page size or don't allow any architecture
> to offer 4k stacks.
> 
I have suggested before that the solution is to allocate memory in 
"stack size" units (obviously must be a multiple of the hardware page 
size). The reason allocation fails is more often fragmentation than 
actual lack of memory, or so it has been reported.
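
The fragmentation point can be illustrated with a toy model. On a machine
with 4K pages, an 8K stack needs two physically contiguous, naturally
aligned pages (an order-1 allocation), while a 4K stack needs only one.
The Python sketch below is a simplified illustration and not kernel code:
the 16-page memory map and the helper can_allocate() are hypothetical.

```python
def can_allocate(free, order):
    """Return True if a naturally aligned run of 2**order free pages exists.

    `free` is a list of booleans, one per page (True means the page is free).
    """
    run = 1 << order
    return any(
        all(free[start:start + run])
        for start in range(0, len(free) - run + 1, run)
    )


# Hypothetical worst case: every other page is in use.
free = [page % 2 == 0 for page in range(16)]
free_pages = sum(free)

print("free pages:", free_pages)
print("4K stack (order 0) possible:", can_allocate(free, 0))
print("8K stack (order 1) possible:", can_allocate(free, 1))
```

In this map half of memory is free, yet no 8K stack can be allocated,
which is the "fragmentation rather than actual lack of memory" case.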

-- 
Bill Davidsen <davidsen@tmr.com>
   "We have more to fear from the bungling of the incompetent than from
the machinations of the wicked."  - from Slashdot


^ permalink raw reply	[flat|nested] 162+ messages in thread

* Re: x86: 4kstacks default
  2008-04-20 13:21                     ` Adrian Bunk
  2008-04-23  9:13                       ` Helge Hafting
@ 2008-04-28 18:38                       ` Bill Davidsen
  1 sibling, 0 replies; 162+ messages in thread
From: Bill Davidsen @ 2008-04-28 18:38 UTC (permalink / raw)
  To: Adrian Bunk
  Cc: Willy Tarreau, Andi Kleen, Alan Cox, Shawn Bohrer, Ingo Molnar,
	Andrew Morton, Linux Kernel Mailing List, Arjan van de Ven,
	Thomas Gleixner

Adrian Bunk wrote:
> On Sun, Apr 20, 2008 at 02:47:17PM +0200, Willy Tarreau wrote:
>> ...
>> I certainly can understand that reducing memory footprint is useful, but
>> if we want wider testing of 4k stacks, considering they may fail in error
>> path in complex I/O environment, it's not likely during -rc kernels that
>> we'll detect problems, and if we push them down the throat of users in a
>> stable release, of course they will thank us very much for crashing their
>> NFS servers in production during peak hours.
> 
> I've seen many bugs in error paths in the kernel and fixed quite a 
> few of them - and stack problems were not a significant part of them.
> 
> There are so many possible bugs (that also occur in practice) that 
> singling out stack usage won't gain much.
> 
>> I have nothing against changing the default setting to 4k provided that
>> it is easy to get back to the safe setting (ie changing a config option,
>> or better, a cmdline parameter). I just don't agree with the idea of
>> forcing users to swim in the sh*t, it only brings bad reputation to
>> Linux.
>> ...
> 
> What actually brings bad reputation is shipping a 4k option that is 
> known to break under some circumstances.
> 
> And history has shown that as long as 8k stacks are available on i386 
> some problems will not get fixed. 4k stacks are available as an option 
> on i386 for more than 4 years, and at about as long we know that there 
> are some setups (AFAIK all that might still be present seem to include 
> XFS) that are known to not work reliably with 4k stacks.
> 
> If we go after stability and reputation, we have to make a decision 
> whether we want to get 4k stacks on 32bit architectures with 4k page 
> size unconditionally or not at all. That's the way that gets the maximal 
> number of bugs shaken out [1] for all supported configurations before 
> they would hit a stable kernel.
> 
A good argument for keeping the default 8k and letting people who know 
what they are doing, or think they do, test their system for 4k 
operation. Embedded systems typically have far better defined loads than 
servers or desktops, and are less likely to have different behavior 
change the stack requirements. That doesn't mean they do less, just that 
the load is usually better characterized.

Vendors shipping a 4k stack kernel are probably not going to be happy if 
someone nfs exports an xfs filesystem on lvm, running on md raid0 
composed of raid5 arrays, containing multipath, iSCSI, SATA and nbd 
devices. No, I didn't make that up, someone asked me what I thought 
their problem was with that setup.

The kernel is getting more complex, and I don't think that anyone but 
you is interested in making 4k stacks mandatory, or in eliminating them, 
either.

You frequently take the attitude that something you don't like (like all 
the old but WORKING network drivers) should be removed from the kernel, 
so that people will be forced to use the new whatever and find bugs so 
they can be fixed. Unfortunately in some cases the bugs are never fixed 
and Linux loses a capability it once had.

The arbitrary 4k limit requires a lot of work on dropping stack usage 
even more than has already been done, and is mostly an effort you want 
other people to make so you can be happy (I assume that if you were 
offering to do it all yourself you already would have), and most 
importantly it would waste a lot of developer effort on a low return 
goal, which could be used on useful new features or fixing corner case 
bugs. Or drinking beer...

Hell, it wastes your time arguing about it, and you do lots of useful 
things when you're not trying to force your minimalist philosophy on people.

-- 
Bill Davidsen <davidsen@tmr.com>
   "We have more to fear from the bungling of the incompetent than from
the machinations of the wicked."  - from Slashdot

^ permalink raw reply	[flat|nested] 162+ messages in thread

* Re: x86: 4kstacks default
  2008-04-21 20:05           ` Bodo Eggert
@ 2008-04-22 15:34             ` Daniel Hazelton
  0 siblings, 0 replies; 162+ messages in thread
From: Daniel Hazelton @ 2008-04-22 15:34 UTC (permalink / raw)
  To: Bodo Eggert
  Cc: 7eggert, Adrian Bunk, Alan Cox, Shawn Bohrer, Ingo Molnar,
	Andrew Morton, Linux Kernel Mailing List, Arjan van de Ven,
	Thomas Gleixner, Andi Kleen

On Monday 21 April 2008 16:05:38 you wrote:
> On Sun, 20 Apr 2008, Daniel Hazelton wrote:
> > On Sunday 20 April 2008 16:23:45 Bodo Eggert wrote:
> > > Daniel Hazelton <dhazelton@enter.net> wrote:
> > > > On Sunday 20 April 2008 08:27:14 Andi Kleen wrote:
> > > >> Adrian Bunk <bunk@kernel.org> writes:
> > > >> > 6k is known to work, and there aren't many problems known with 4k.
> > > >> >
> > > >> > And from a QA point of view the only way of getting 4k thoroughly
> > > >> > tested
> > > >>
> > > >> But you have to first ask why do you want 4k tested? Does it serve
> > > >> any useful purpose in itself? I don't think so. Or you're saying
> > > >> it's important to support 50k kernel threads on 32bit kernels?
> > > >
> > > > Andi, you're the only one I've seen seriously pounding the "50k
> > > > threads" thing - I don't think anyone is really fooled by the
> > > > straw-man, so I'd suggest you drop it.
> > > >
> > > > The real issue is that you think (and are correct in thinking) that
> > > > people are idiots. Yes, there will be breakages if the default is
> > > > changed to 4k stacks - but if people are running new kernels on boxes
> > > > that'll hit stack use problems (that *AREN'T* related to ndiswrapper)
> > > > and haven't made sure that they've configured the kernel properly,
> > > > then they deserve the outcome. It isn't the job of the Linux Kernel
> > > > to protect the incompetent - nor is it the job of linux kernel
> > > > developers to do such.
> > >
> > > It's the job of the kernel developers to mark experimental and broken
> > > options, and to put a warning:
> > >
> > > "This will break stacking of drivers, especially if disk manager, xfs,
> > > RAID and nfs are used. Yes, linux is broken by default, but only if you
> > > intend to set up a reliable system, so this will be OK!"
> > >
> > > into the help text, instead of expecting each admin to read lkml.
> >
> > Note that I've yet to meet a competent admin that creates brand new
> > configurations each time they build a new kernel for a machine.
>
> Once is enough, and if you build a custom kernel, you'll certainly not
> want to start from the distribution's allmodconfig.

No, you wouldn't. But, at least at the companies I've worked for, there was 
already a custom kernel running and it had a default configuration file that 
was updated and carried over to each new kernel, with a "make oldconfig" done 
to update it.

> > Usually they
> > have a "default configuration" for each machine that gets updated each
> > time a new kernel is built. Usually they don't change working options.
> > And since changing things to 4K stacks default would cause a new option -
> > the "8K stacks" option to show up in a "make oldconfig" run - the admin
> > would see it and, hopefully, check the help text and see that it his
> > system, with a deeply stacked driver system (nfs+xfs+raid, for example)
> > and set the 8K stacks option to "Y".
>
> The help text does not yet say anything about crashing.

It should be updated to note that there are configurations that will overrun 
the stack and cause crashes. (However, it shouldn't be the default - I'll 
agree to that)

> > As I said, it isn't the job of the kernel or kernel developers to protect
> > the incompetent (or the lazy).
>
> It's only incompetent if it's reasonable to expect a crashing kernel to
> result from choosing the default values.

Agreed. I've been arguing about it without being clear that my arguments for 
it (including making it the default) are from the perspective of a desktop 
user who had run the "stack depth check" on an 8K stacks kernel for a long 
time and found that the only times he ever had problems was during boot - 
with "sed" and "grep" being the culprits.

Booting a 4K stacks kernel wouldn't work if I hadn't also modified the 
initscripts here, and that wasn't easy - I've got a report of grep using 
enough stack that only about 3900 bytes were left on the 8K stack.  However, 
I think this does show that there are problems left in a number of places 
that make moving to a default of 4K stacks dangerous.
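
The arithmetic behind that figure can be spelled out. This is a rough
Python sketch under stated assumptions: the 3900 bytes is the approximate
value from the report above, the names are illustrative, and the model
ignores the thread_info structure that shares the stack area as well as
the separate interrupt stacks used with 4K stacks, so it is only an
estimate.

```python
STACK_8K = 8192    # bytes in an 8K kernel stack
STACK_4K = 4096    # bytes in a 4K kernel stack
BYTES_LEFT = 3900  # approximate free space reported on the 8K stack


def peak_usage(stack_size: int, bytes_left: int) -> int:
    """Return the deepest stack usage implied by the remaining free space."""
    return stack_size - bytes_left


used = peak_usage(STACK_8K, BYTES_LEFT)
overshoot = used - STACK_4K

print(f"peak usage: {used} bytes")
print(f"overshoot on a 4K stack: {overshoot} bytes")
```

Under these assumptions the same call chain would need about 4292 bytes,
roughly 200 bytes more than a 4K stack provides.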

(I've recently checked this by undoing my changes and setting up an 8K kernel 
on this laptop - I don't know if the next version of the distro will ship 
with 4K stacks, but I'm pretty certain it won't)

In light of this I'm going to pull out of this discussion, because I can no 
longer support a move to 4K stacks default. This doesn't mean I don't still 
support having them around, or even a move to them as a default at a later 
date, but right now there are still many places where there are problems that 
would cause "mysterious" failures and corruption with 4K stacks.

DRH

-- 
Dialup is like pissing through a pipette. Slow and excruciatingly painful.

^ permalink raw reply	[flat|nested] 162+ messages in thread

* Re: x86: 4kstacks default
  2008-04-20 20:59         ` Daniel Hazelton
@ 2008-04-21 20:05           ` Bodo Eggert
  2008-04-22 15:34             ` Daniel Hazelton
  0 siblings, 1 reply; 162+ messages in thread
From: Bodo Eggert @ 2008-04-21 20:05 UTC (permalink / raw)
  To: Daniel Hazelton
  Cc: 7eggert, Adrian Bunk, Alan Cox, Shawn Bohrer, Ingo Molnar,
	Andrew Morton, Linux Kernel Mailing List, Arjan van de Ven,
	Thomas Gleixner, Andi Kleen

On Sun, 20 Apr 2008, Daniel Hazelton wrote:
> On Sunday 20 April 2008 16:23:45 Bodo Eggert wrote:
> > Daniel Hazelton <dhazelton@enter.net> wrote:
> > > On Sunday 20 April 2008 08:27:14 Andi Kleen wrote:
> > >> Adrian Bunk <bunk@kernel.org> writes:

> > >> > 6k is known to work, and there aren't many problems known with 4k.
> > >> >
> > >> > And from a QA point of view the only way of getting 4k thoroughly
> > >> > tested
> > >>
> > >> But you have to first ask why do you want 4k tested? Does it serve
> > >> any useful purpose in itself? I don't think so. Or you're saying
> > >> it's important to support 50k kernel threads on 32bit kernels?
> > >
> > > Andi, you're the only one I've seen seriously pounding the "50k threads"
> > > thing - I don't think anyone is really fooled by the straw-man, so I'd
> > > suggest you drop it.
> > >
> > > The real issue is that you think (and are correct in thinking) that
> > > people are idiots. Yes, there will be breakages if the default is changed
> > > to 4k stacks - but if people are running new kernels on boxes that'll hit
> > > stack use problems (that *AREN'T* related to ndiswrapper) and haven't
> > > made sure that they've configured the kernel properly, then they deserve
> > > the outcome. It isn't the job of the Linux Kernel to protect the
> > > incompetent - nor is it the job of linux kernel developers to do such.
> >
> > It's the job of the kernel developers to mark experimental and broken
> > options, and to put a warning:
> >
> > "This will break stacking of drivers, especially if disk manager, xfs, RAID
> > and nfs are used. Yes, linux is broken by default, but only if you intend
> > to set up a reliable system, so this will be OK!"
> >
> > into the help text, instead of expecting each admin to read lkml.
> 
> Note that I've yet to meet a competent admin that creates brand new 
> configurations each time they build a new kernel for a machine.

Once is enough, and if you build a custom kernel, you'll certainly not 
want to start from the distribution's allmodconfig.

> Usually they 
> have a "default configuration" for each machine that gets updated each time a 
> new kernel is built. Usually they don't change working options. And since 
> changing things to 4K stacks default would cause a new option - the "8K 
> stacks" option to show up in a "make oldconfig" run - the admin would see it 
> and, hopefully, check the help text and see that it his system, with a deeply 
> stacked driver system (nfs+xfs+raid, for example) and set the 8K stacks 
> option to "Y".

The help text does not yet say anything about crashing.

> As I said, it isn't the job of the kernel or kernel developers to protect the 
> incompetent (or the lazy).

It's only incompetent if it's reasonable to expect a crashing kernel to
result from choosing the default values.
-- 
bus error. passengers dumped.

^ permalink raw reply	[flat|nested] 162+ messages in thread

* Re: x86: 4kstacks default
  2008-04-20 20:23       ` Bodo Eggert
@ 2008-04-20 20:59         ` Daniel Hazelton
  2008-04-21 20:05           ` Bodo Eggert
  0 siblings, 1 reply; 162+ messages in thread
From: Daniel Hazelton @ 2008-04-20 20:59 UTC (permalink / raw)
  To: 7eggert
  Cc: Adrian Bunk, Alan Cox, Shawn Bohrer, Ingo Molnar, Andrew Morton,
	Linux Kernel Mailing List, Arjan van de Ven, Thomas Gleixner,
	Andi Kleen

On Sunday 20 April 2008 16:23:45 Bodo Eggert wrote:
> Daniel Hazelton <dhazelton@enter.net> wrote:
> > On Sunday 20 April 2008 08:27:14 Andi Kleen wrote:
> >> Adrian Bunk <bunk@kernel.org> writes:
> >> > 6k is known to work, and there aren't many problems known with 4k.
> >> >
> >> > And from a QA point of view the only way of getting 4k thoroughly
> >> > tested
> >>
> >> But you have to first ask why do you want 4k tested? Does it serve
> >> any useful purpose in itself? I don't think so. Or you're saying
> >> it's important to support 50k kernel threads on 32bit kernels?
> >
> > Andi, you're the only one I've seen seriously pounding the "50k threads"
> > thing - I don't think anyone is really fooled by the straw-man, so I'd
> > suggest you drop it.
> >
> > The real issue is that you think (and are correct in thinking) that
> > people are idiots. Yes, there will be breakages if the default is changed
> > to 4k stacks - but if people are running new kernels on boxes that'll hit
> > stack use problems (that *AREN'T* related to ndiswrapper) and haven't
> > made sure that they've configured the kernel properly, then they deserve
> > the outcome. It isn't the job of the Linux Kernel to protect the
> > incompetent - nor is it the job of linux kernel developers to do such.
>
> It's the job of the kernel developers to mark experimental and broken
> options, and to put a warning:
>
> "This will break stacking of drivers, especially if disk manager, xfs, RAID
> and nfs are used. Yes, linux is broken by default, but only if you intend
> to set up a reliable system, so this will be OK!"
>
> into the help text, instead of expecting each admin to read lkml.

Note that I've yet to meet a competent admin that creates brand new 
configurations each time they build a new kernel for a machine. Usually they 
have a "default configuration" for each machine that gets updated each time a 
new kernel is built. Usually they don't change working options. And since 
changing things to 4K stacks default would cause a new option - the "8K 
stacks" option to show up in a "make oldconfig" run - the admin would see it 
and, hopefully, check the help text and see that it his system, with a deeply 
stacked driver system (nfs+xfs+raid, for example) and set the 8K stacks 
option to "Y".

As I said, it isn't the job of the kernel or kernel developers to protect the 
incompetent (or the lazy).

DRH

-- 
Dialup is like pissing through a pipette. Slow and excruciatingly painful.

^ permalink raw reply	[flat|nested] 162+ messages in thread

* Re: x86: 4kstacks default
       [not found]     ` <akKhi-91-37@gated-at.bofh.it>
@ 2008-04-20 20:23       ` Bodo Eggert
  2008-04-20 20:59         ` Daniel Hazelton
  0 siblings, 1 reply; 162+ messages in thread
From: Bodo Eggert @ 2008-04-20 20:23 UTC (permalink / raw)
  To: Daniel Hazelton, Adrian Bunk, Alan Cox, Shawn Bohrer,
	Ingo Molnar, Andrew Morton, Linux Kernel Mailing List,
	Arjan van de Ven, Thomas Gleixner, Andi Kleen

Daniel Hazelton <dhazelton@enter.net> wrote:
> On Sunday 20 April 2008 08:27:14 Andi Kleen wrote:
>> Adrian Bunk <bunk@kernel.org> writes:
>> > 6k is known to work, and there aren't many problems known with 4k.
>> >
>> > And from a QA point of view the only way of getting 4k thoroughly tested
>>
>> But you have to first ask why do you want 4k tested? Does it serve
>> any useful purpose in itself? I don't think so. Or you're saying
>> it's important to support 50k kernel threads on 32bit kernels?

> Andi, you're the only one I've seen seriously pounding the "50k threads"
> thing - I don't think anyone is really fooled by the straw-man, so I'd
> suggest you drop it.
> 
> The real issue is that you think (and are correct in thinking) that people are
> idiots. Yes, there will be breakages if the default is changed to 4k stacks -
> but if people are running new kernels on boxes that'll hit stack use problems
> (that *AREN'T* related to ndiswrapper) and haven't made sure that they've
> configured the kernel properly, then they deserve the outcome. It isn't the
> job of the Linux Kernel to protect the incompetent - nor is it the job of
> linux kernel developers to do such.

It's the job of the kernel developers to mark experimental and broken options,
and to put a warning:

"This will break stacking of drivers, especially if disk manager, xfs, RAID
and nfs are used. Yes, linux is broken by default, but only if you intend to
set up a reliable system, so this will be OK!"

into the help text, instead of expecting each admin to read lkml.


^ permalink raw reply	[flat|nested] 162+ messages in thread

* Re: x86: 4kstacks default
       [not found] ` <ak6tq-32p-5@gated-at.bofh.it>
@ 2008-04-19 10:56   ` Bodo Eggert
  0 siblings, 0 replies; 162+ messages in thread
From: Bodo Eggert @ 2008-04-19 10:56 UTC (permalink / raw)
  To: Andrew Morton, Ingo Molnar, Linux Kernel Mailing List

Andrew Morton <akpm@linux-foundation.org> wrote:
> Linux Kernel Mailing List <linux-kernel@vger.kernel.org> wrote:

>> Gitweb:     http://git.kernel.org/git/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=d61ecf0b53131564949bc4196e70f676000a845a
>> Commit:     d61ecf0b53131564949bc4196e70f676000a845a
>> Author:     Ingo Molnar <mingo@elte.hu>
>> Committer:  Ingo Molnar <mingo@elte.hu>

>>     x86: 4kstacks default

> This patch will cause kernels to crash.
> 
> It has no changelog which explains or justifies the alteration.
> 
> afaict the patch was not posted to the mailing list and was not
> discussed or reviewed.

The patch (or a similar one) was discussed and rejected for the reason above.


^ permalink raw reply	[flat|nested] 162+ messages in thread

end of thread, other threads:[~2008-04-28 18:34 UTC | newest]

Thread overview: 162+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <200804181737.m3IHbabI010051@hera.kernel.org>
2008-04-18 21:29 ` x86: 4kstacks default Andrew Morton
2008-04-19 14:23   ` Ingo Molnar
2008-04-19 14:35     ` Oliver Pinter
2008-04-19 15:19       ` Adrian Bunk
2008-04-19 15:42         ` Oliver Pinter
2008-04-20  1:56         ` Eric Sandeen
2008-04-20  7:42           ` Adrian Bunk
2008-04-20 16:59             ` Chris Wedgwood
     [not found]         ` <480AA2B9.10305__23983.3358479247$1208657639$gmane$org@sandeen.net>
2008-04-20 11:48           ` Andi Kleen
2008-04-19 14:59     ` Shawn Bohrer
2008-04-19 18:00       ` Arjan van de Ven
2008-04-19 18:33         ` Ingo Molnar
2008-04-19 19:10           ` Stefan Richter
2008-04-20  2:36         ` Eric Sandeen
2008-04-20  6:11           ` Arjan van de Ven
2008-04-20 22:53           ` David Chinner
2008-04-20  8:09       ` Adrian Bunk
2008-04-20  8:06         ` Alan Cox
2008-04-20  8:51           ` Adrian Bunk
2008-04-20  9:36             ` Alan Cox
2008-04-20 10:44               ` Adrian Bunk
2008-04-20 11:02                 ` Alan Cox
2008-04-20 11:54                   ` Adrian Bunk
2008-04-20 11:37                     ` Alan Cox
2008-04-20 12:18                       ` Adrian Bunk
2008-04-20 14:05                         ` Eric Sandeen
2008-04-20 14:21                           ` Adrian Bunk
2008-04-20 14:56                             ` Eric Sandeen
2008-04-20 15:41                           ` Arjan van de Ven
2008-04-20 16:03                             ` Adrian Bunk
2008-04-21  3:30                               ` Alexander E. Patrakov
2008-04-23  8:57                                 ` Helge Hafting
2008-04-21  7:45                           ` Denys Vlasenko
2008-04-21  9:55                             ` Andi Kleen
2008-04-21 13:29                             ` Eric Sandeen
2008-04-21 19:51                               ` Denys Vlasenko
2008-04-21 20:28                                 ` Denys Vlasenko
2008-04-22  1:28                                 ` David Chinner
2008-04-22  2:33                                   ` [PATCH] xfs: do not pass size into kmem_free, it's unused Denys Vlasenko
2008-04-22  3:03                                     ` [PATCH] xfs: do not pass unused params to xfs_flush_pages Denys Vlasenko
2008-04-22  3:14                                       ` [PATCH] xfs: use smaller int param in call " Denys Vlasenko
2008-04-22  3:18                                         ` Eric Sandeen
2008-04-22  4:10                                           ` David Chinner
2008-04-22  9:42                                         ` [PATCH] xfs: remove unused parameter of xfs_qm_dqpurge Denys Vlasenko
2008-04-22 10:16                                           ` [PATCH] xfs: remove unused parameter of xfs_iomap_write_allocate Denys Vlasenko
2008-04-22 11:20                                             ` [PATCH] xfs: #define out unused parameters of xfs_bmap_add_free and xfs_btree_read_bufl Denys Vlasenko
2008-04-22 11:48                                               ` [PATCH] xfs: #define out unused parameters for seven functions in xfs_trans.h Denys Vlasenko
2008-04-22 11:51                                               ` Denys Vlasenko
2008-04-22 13:32                                                 ` [PATCH] xfs: remove unused params from functions in xfs_dir2_leaf.h Denys Vlasenko
2008-04-22 13:40                                                   ` [PATCH] xfs: remove unused params from functions in xfs/quota/* Denys Vlasenko
2008-04-22 13:46                                                     ` [PATCH] xfs: expose no-op xfs_put_perag() Denys Vlasenko
2008-04-22 14:08                                                       ` Eric Sandeen
2008-04-22 23:16                                                       ` David Chinner
2008-04-22 23:08                                                     ` [PATCH] xfs: remove unused params from functions in xfs/quota/* David Chinner
2008-04-22 22:47                                                 ` [PATCH] xfs: #define out unused parameters for seven functions in xfs_trans.h David Chinner
2008-04-22 14:28                                               ` [PATCH] xfs: #define out unused parameters of xfs_bmap_add_free and xfs_btree_read_bufl Adrian Bunk
2008-04-22 16:17                                                 ` Denys Vlasenko
2008-04-22 17:21                                                   ` Adrian Bunk
2008-04-22 17:26                                                     ` Eric Sandeen
2008-04-22 17:50                                                       ` Denys Vlasenko
2008-04-22 18:28                                                         ` [PATCH] xfs: #define out unused parameters of xfs_bmap_add_free " Adrian Bunk
2008-04-22 19:32                                                           ` Denys Vlasenko
2008-04-22 23:53                                                             ` Adrian Bunk
2008-04-22 20:46                                                       ` [PATCH] xfs: #define out unused parameters of xfs_bmap_add_free " Denys Vlasenko
2008-04-22 22:43                                               ` David Chinner
2008-04-22 22:33                                             ` [PATCH] xfs: remove unused parameter of xfs_iomap_write_allocate David Chinner
2008-04-22 22:11                                           ` [PATCH] xfs: remove unused parameter of xfs_qm_dqpurge David Chinner
2008-04-23  8:18                                           ` Christoph Hellwig
2008-04-22 22:08                                         ` [PATCH] xfs: use smaller int param in call to xfs_flush_pages David Chinner
2008-04-22  3:15                                       ` [PATCH] xfs: do not pass unused params " Eric Sandeen
2008-04-22  8:57                                         ` Denys Vlasenko
2008-04-22  9:56                                           ` Jakub Jelinek
2008-04-22 10:33                                             ` Denys Vlasenko
2008-04-22 12:51                                           ` Eric Sandeen
2008-04-22 22:07                                       ` David Chinner
2008-04-22  3:09                                     ` [PATCH] xfs: do not pass size into kmem_free, it's unused Eric Sandeen
2008-04-22  3:35                                       ` Eric Sandeen
2008-04-22 22:02                                     ` David Chinner
2008-04-22 12:48                                   ` x86: 4kstacks default Denys Vlasenko
2008-04-22 13:01                                     ` Adrian Bunk
2008-04-22 13:51                                       ` Denys Vlasenko
2008-04-27 19:27                                   ` Jörn Engel
2008-04-27 23:02                                     ` Denys Vlasenko
2008-04-27 23:08                                       ` Eric Sandeen
2008-04-28  0:00                                         ` Denys Vlasenko
2008-04-20 12:37                     ` Andi Kleen
2008-04-20 12:27                 ` Andi Kleen
2008-04-20 12:32                   ` Adrian Bunk
2008-04-20 12:47                   ` Willy Tarreau
2008-04-20 13:06                     ` Andi Kleen
2008-04-20 13:30                       ` Adrian Bunk
2008-04-20 13:34                         ` Willy Tarreau
2008-04-20 14:04                           ` Adrian Bunk
2008-04-28 17:56                         ` Bill Davidsen
2008-04-20 13:21                     ` Adrian Bunk
2008-04-23  9:13                       ` Helge Hafting
2008-04-23 23:29                         ` David Chinner
2008-04-24 15:46                         ` Eric Sandeen
2008-04-28 18:38                       ` Bill Davidsen
2008-04-20 13:27                     ` Mark Lord
2008-04-20 13:38                       ` Willy Tarreau
2008-04-20 14:19                         ` Andi Kleen
2008-04-20 16:41                           ` Jörn Engel
2008-04-20 17:19                             ` Andi Kleen
2008-04-20 17:43                               ` Jörn Engel
2008-04-20 18:19                                 ` Andi Kleen
2008-04-20 18:50                                   ` Arjan van de Ven
2008-04-20 20:09                                     ` Andi Kleen
2008-04-20 21:50                                     ` Andrew Morton
2008-04-20 21:55                                       ` Andi Kleen
2008-04-21 14:29                                       ` Ingo Molnar
2008-04-20 20:32                                   ` Jörn Engel
2008-04-20 20:35                                   ` Jörn Engel
2008-04-20 14:09                       ` Eric Sandeen
2008-04-20 14:20                         ` Willy Tarreau
2008-04-20 14:40                           ` Eric Sandeen
2008-04-20 15:44                   ` Daniel Hazelton
2008-04-20 17:26                     ` Andi Kleen
2008-04-20 18:48                       ` Arjan van de Ven
2008-04-20 20:01                         ` Andi Kleen
2008-04-20 20:43                           ` Daniel Hazelton
2008-04-20 21:40                             ` Andi Kleen
2008-04-20 22:17                               ` Bernd Eckenfels
2008-04-20 23:48                                 ` Avi Kivity
2008-04-21  1:45                               ` Daniel Hazelton
2008-04-21  7:51                                 ` Andi Kleen
2008-04-21 17:34                                   ` Daniel Hazelton
2008-04-20 22:33                           ` Arjan van de Ven
2008-04-20 22:33                           ` Arjan van de Ven
2008-04-20 23:16                             ` Andi Kleen
2008-04-21  5:53                               ` Arjan van de Ven
2008-04-21  3:06                             ` Eric Sandeen
2008-04-20 21:45                         ` Andrew Morton
2008-04-20 21:51                           ` Andi Kleen
2008-04-22 18:20                     ` Romano Giannetti
2008-04-23  5:03                       ` Denys Vlasenko
2008-04-23  5:21                         ` Daniel Hazelton
2008-04-23  5:25                           ` david
2008-04-23  5:41                             ` Daniel Hazelton
2008-04-23  7:46                               ` Romano Giannetti
2008-04-23 11:24                                 ` Stefan Richter
2008-04-23 12:15                                   ` Romano Giannetti
2008-04-23 15:59                                     ` Lennart Sorensen
2008-04-20 13:22                 ` Mark Lord
2008-04-19 17:49     ` Andrew Morton
2008-04-25 17:39       ` Parag Warudkar
2008-04-20  3:29     ` Eric Sandeen
2008-04-20 12:36       ` Andi Kleen
2008-04-21 14:31       ` Ingo Molnar
2008-04-23  5:27     ` Benjamin Herrenschmidt
2008-04-23 23:36       ` David Chinner
2008-04-24  0:45         ` Arjan van de Ven
2008-04-24  9:52           ` Christoph Hellwig
2008-04-24 12:25             ` Peter Zijlstra
2008-04-24 15:41             ` Chris Mason
2008-04-24 18:30               ` Alexander van Heukelum
2008-04-24  0:56         ` Benjamin Herrenschmidt
     [not found] <ak6tq-32p-7@gated-at.bofh.it>
     [not found] ` <ak6tq-32p-5@gated-at.bofh.it>
2008-04-19 10:56   ` Bodo Eggert
     [not found] ` <akFri-4yB-25@gated-at.bofh.it>
     [not found]   ` <akGQg-7TX-1@gated-at.bofh.it>
     [not found]     ` <akKhi-91-37@gated-at.bofh.it>
2008-04-20 20:23       ` Bodo Eggert
2008-04-20 20:59         ` Daniel Hazelton
2008-04-21 20:05           ` Bodo Eggert
2008-04-22 15:34             ` Daniel Hazelton
