[Xenomai-help] native: A 32k stack is not always a 'reasonable' size

* [Xenomai-help] native: A 32k stack is not always a 'reasonable' size
@ 2010-07-06 19:25 Peter Soetens
  2010-07-07  9:06 ` Gilles Chanteperdrix
  2010-07-11 13:15 ` Gilles Chanteperdrix
  0 siblings, 2 replies; 21+ messages in thread
From: Peter Soetens @ 2010-07-06 19:25 UTC (permalink / raw)
  To: xenomai-help

At least, not for Orocos applications. We've had hard-to-debug
application segfaults in code that used just a 'little' bit more than 32k. We
had to raise the stack size to 128k to get reliably through our
application startup. I stem from the old 'mlockall ate my RAM'
generation, where we typically reduced stack sizes in order to have
some crumbs left for the heap. But 32k wasn't really what we were
aiming for.

Maybe we should explicitly document the 32k limit and its limitations
for certain applications...?

Just my 2ct,
Peter

PS: can anyone allow 'sspr' (=me) to edit/add stuff on the wiki ?

* Re: [Xenomai-help] native: A 32k stack is not always a 'reasonable' size
  2010-07-06 19:25 [Xenomai-help] native: A 32k stack is not always a 'reasonable' size Peter Soetens
@ 2010-07-07  9:06 ` Gilles Chanteperdrix
  2010-07-07 20:57   ` Peter Soetens
  2010-07-11 13:15 ` Gilles Chanteperdrix
  1 sibling, 1 reply; 21+ messages in thread
From: Gilles Chanteperdrix @ 2010-07-07  9:06 UTC (permalink / raw)
  To: Peter Soetens; +Cc: xenomai-help

Peter Soetens wrote:
> At least, not for Orocos applications. We've had hard to debug
> application segfaults that used just a 'little' bit more than 32k. We
> had to raise the stack size to 128k to get reliably through our
> application startup. I stem from the old 'mlockall ate my RAM'
> generation where we typically reduced stack sizes in order to have
> some crumbles left for the heap. But 32k wasn't really what we were
> aiming for.
> 
> Maybe we should explicitly document the 32k limit and its limitations
> for certain applications...?

Again, things have been fixed in 2.5.3 with regard to stack sizes. Could
you check whether you get the same behaviour with that version?

As for 32KiB, it is only a default stack size; it is reasonable only in
the sense that 2MiB is unreasonable on a low-end system. 32KiB was
picked because it allows printf to work. Whatever stack size we
choose, there will be applications which need more; this does not really
make the default unreasonable.
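
The memory argument can be made concrete with a little arithmetic. This is only a sketch: it assumes a process that has called mlockall(), so that every thread stack is fully resident in RAM, and the thread count of 64 used below is a hypothetical figure, not something taken from Xenomai.

```c
#include <stddef.h>
#include <stdio.h>

/* RAM pinned by thread stacks in an mlockall()'ed process: each stack is
 * assumed to be fully resident, so the cost is simply count * size. */
size_t locked_stack_bytes(size_t nthreads, size_t stack_bytes)
{
    return nthreads * stack_bytes;
}

/* Print the 2 MiB versus 32 KiB comparison for a given thread count. */
void print_stack_budget(size_t nthreads)
{
    printf("%zu threads x 2 MiB  = %zu KiB\n", nthreads,
           locked_stack_bytes(nthreads, 2u * 1024 * 1024) / 1024);
    printf("%zu threads x 32 KiB = %zu KiB\n", nthreads,
           locked_stack_bytes(nthreads, 32u * 1024) / 1024);
}
```

With 64 threads this comes to 128 MiB of locked stack at 2 MiB each, against 2 MiB in total at 32 KiB each.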

> PS: can anyone allow 'sspr' (=me) to edit/add stuff on the wiki ?

Looks like you passed an incorrect mail address for this account, so it
could not be verified. Did you fix this?

-- 
					    Gilles.

* Re: [Xenomai-help] native: A 32k stack is not always a 'reasonable' size
  2010-07-07  9:06 ` Gilles Chanteperdrix
@ 2010-07-07 20:57   ` Peter Soetens
  2010-07-07 21:19     ` Gilles Chanteperdrix
  0 siblings, 1 reply; 21+ messages in thread
From: Peter Soetens @ 2010-07-07 20:57 UTC (permalink / raw)
  To: Gilles Chanteperdrix; +Cc: xenomai-help

On Wed, Jul 7, 2010 at 11:06 AM, Gilles Chanteperdrix
<gilles.chanteperdrix@xenomai.org> wrote:
> Peter Soetens wrote:
>> At least, not for Orocos applications. We've had hard to debug
>> application segfaults that used just a 'little' bit more than 32k. We
>> had to raise the stack size to 128k to get reliably through our
>> application startup. I stem from the old 'mlockall ate my RAM'
>> generation where we typically reduced stack sizes in order to have
>> some crumbles left for the heap. But 32k wasn't really what we were
>> aiming for.
>>
>> Maybe we should explicitly document the 32k limit and its limitations
>> for certain applications...?
>
> Again, things have been fixed in 2.5.3 with regard to stack sizes, could
> you check that you have the same behaviour?

I think we had, but I'm uncertain right now.

>
> As for 32KiB, it is only a default stack size, it is only reasonable in
> the sense that 2MiB is unreasonable on a low-end system. 32KiB was
> picked because it allows printf to work. Now, whatever stack size we
> choose, there will be applications which need more, this does not really
> make the default unreasonable.

I knew you would say that. It deserves an entry in the faq or some
trouble shooting document though.

>
>> PS: can anyone allow 'sspr' (=me) to edit/add stuff on the wiki ?
>
> Looks like you passed an incorrect mail address for this account, so it
> could not be verified, did you fix this?

I did. Didn't realize there was a problem.

Peter


* Re: [Xenomai-help] native: A 32k stack is not always a 'reasonable' size
  2010-07-07 20:57   ` Peter Soetens
@ 2010-07-07 21:19     ` Gilles Chanteperdrix
  2010-07-07 22:31       ` Peter Soetens
  0 siblings, 1 reply; 21+ messages in thread
From: Gilles Chanteperdrix @ 2010-07-07 21:19 UTC (permalink / raw)
  To: Peter Soetens; +Cc: xenomai-help

Peter Soetens wrote:
> On Wed, Jul 7, 2010 at 11:06 AM, Gilles Chanteperdrix
> <gilles.chanteperdrix@xenomai.org> wrote:
>> Peter Soetens wrote:
>>> At least, not for Orocos applications. We've had hard to debug
>>> application segfaults that used just a 'little' bit more than 32k. We
>>> had to raise the stack size to 128k to get reliably through our
>>> application startup. I stem from the old 'mlockall ate my RAM'
>>> generation where we typically reduced stack sizes in order to have
>>> some crumbles left for the heap. But 32k wasn't really what we were
>>> aiming for.
>>>
>>> Maybe we should explicitly document the 32k limit and its limitations
>>> for certain applications...?
>> Again, things have been fixed in 2.5.3 with regard to stack sizes, could
>> you check that you have the same behaviour?
> 
> I think we had, but I'm uncertain right now.
> 
>> As for 32KiB, it is only a default stack size, it is only reasonable in
>> the sense that 2MiB is unreasonable on a low-end system. 32KiB was
>> picked because it allows printf to work. Now, whatever stack size we
>> choose, there will be applications which need more, this does not really
>> make the default unreasonable.
> 
> I knew you would say that. It deserves an entry in the faq or some
> trouble shooting document though.

It is documented. For instance, rt_task_create says:
stksize 	The size of the stack (in bytes) for the new task. If
		zero is passed, a reasonable pre-defined size will be substituted.

What else can we say? Documenting that this size is 32 KiB would be
wrong, because we do not want applications to rely on a particular
value, in case we want to change it. And the fact that you will get
problems if your stack is too small is kind of obvious, at least to anyone
who has played with stack sizes on Linux or any proprietary RTOS.
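
An application that knows it needs more than the default can ask for an explicit size instead of passing zero. In the native skin that figure would go into the stksize argument of rt_task_create. The sketch below uses plain POSIX threads instead, purely so that it builds without Xenomai headers; the 128 KiB value is the one reported earlier in this thread, and the function names are made up for illustration.

```c
#include <pthread.h>
#include <stddef.h>

/* Thread body that needs a bit more than 32 KiB of local storage. */
static void *needs_big_stack(void *arg)
{
    volatile char scratch[48 * 1024];   /* would not fit a 32 KiB stack */
    size_t i;

    (void)arg;
    for (i = 0; i < sizeof scratch; i += 4096)
        scratch[i] = 1;                 /* touch the buffer every 4 KiB */
    return NULL;
}

/* Create and join a thread with an explicit stack size.
 * Returns 0 on success and stores the size recorded in the
 * attribute object in *actual. */
int spawn_with_stack(size_t stack_bytes, size_t *actual)
{
    pthread_attr_t attr;
    pthread_t tid;
    int err = pthread_attr_init(&attr);

    if (err)
        return err;
    err = pthread_attr_setstacksize(&attr, stack_bytes);
    if (!err)
        err = pthread_attr_getstacksize(&attr, actual);
    if (!err)
        err = pthread_create(&tid, &attr, needs_big_stack, NULL);
    if (!err)
        err = pthread_join(tid, NULL);
    pthread_attr_destroy(&attr);
    return err;
}
```

Calling spawn_with_stack(128 * 1024, &got) should succeed, whereas the same thread body on a 32 KiB stack would be expected to overflow.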

-- 
					    Gilles.


* Re: [Xenomai-help] native: A 32k stack is not always a 'reasonable' size
  2010-07-07 21:19     ` Gilles Chanteperdrix
@ 2010-07-07 22:31       ` Peter Soetens
  2010-07-07 23:08         ` Gilles Chanteperdrix
  0 siblings, 1 reply; 21+ messages in thread
From: Peter Soetens @ 2010-07-07 22:31 UTC (permalink / raw)
  To: Gilles Chanteperdrix; +Cc: xenomai-help

On Wed, Jul 7, 2010 at 11:19 PM, Gilles Chanteperdrix
<gilles.chanteperdrix@xenomai.org> wrote:
> Peter Soetens wrote:
>> On Wed, Jul 7, 2010 at 11:06 AM, Gilles Chanteperdrix
>> <gilles.chanteperdrix@xenomai.org> wrote:
>>> Peter Soetens wrote:
>>>> At least, not for Orocos applications. We've had hard to debug
>>>> application segfaults that used just a 'little' bit more than 32k. We
>>>> had to raise the stack size to 128k to get reliably through our
>>>> application startup. I stem from the old 'mlockall ate my RAM'
>>>> generation where we typically reduced stack sizes in order to have
>>>> some crumbles left for the heap. But 32k wasn't really what we were
>>>> aiming for.
>>>>
>>>> Maybe we should explicitly document the 32k limit and its limitations
>>>> for certain applications...?
>>> Again, things have been fixed in 2.5.3 with regard to stack sizes, could
>>> you check that you have the same behaviour?
>>
>> I think we had, but I'm uncertain right now.
>>
>>> As for 32KiB, it is only a default stack size, it is only reasonable in
>>> the sense that 2MiB is unreasonable on a low-end system. 32KiB was
>>> picked because it allows printf to work. Now, whatever stack size we
>>> choose, there will be applications which need more, this does not really
>>> make the default unreasonable.
>>
>> I knew you would say that. It deserves an entry in the faq or some
>> trouble shooting document though.
>
> It is documented. For instance, rt_task_create says:
> stksize         The size of the stack (in bytes) for the new task. If
>                zero is passed, a reasonable pre-defined size will be substituted.
>
> What else can we say? Documenting that this size is 32 KiB would be
> wrong, because we do not want applications to rely on a particular
> value, in case we want to change it. And the fact that if your stack is
> too small, you will get problems is kind of obvious. For anyone having
> played with stack sizes with Linux or any proprietary RTOS, at least.

And what about new RTOS/Xenomai users?

You have to take the user's perspective here. The problem with stack
overflows is that they occur once the development of a program has
progressed for a while and the application has reached a certain level of
complexity (otherwise the overflow wouldn't have happened in the first
place). So it suddenly starts to segfault (from time to time). What the user
does is this: he fires up the debugger to get a backtrace, sees
trouble, wrongly assumes that gdb can't really handle these Xenomai
threads, and tries to eliminate causes of the crashes. The user quickly
comes to the conclusion that 'putting it all together' causes the
crash (the individual unit tests pass) and starts looking for a software
integration problem. In reality, it's the stack.

If you've been through all this and still came to the correct
conclusion the same day, you've either been burnt before or are the
exception.

In my view, 32k is a premature optimization. At least, it shows the
side effects of one.

Peter


* Re: [Xenomai-help] native: A 32k stack is not always a 'reasonable' size
  2010-07-07 22:31       ` Peter Soetens
@ 2010-07-07 23:08         ` Gilles Chanteperdrix
  2010-07-08  8:37           ` Philippe Gerum
  0 siblings, 1 reply; 21+ messages in thread
From: Gilles Chanteperdrix @ 2010-07-07 23:08 UTC (permalink / raw)
  To: Peter Soetens; +Cc: xenomai-help

Peter Soetens wrote:
> On Wed, Jul 7, 2010 at 11:19 PM, Gilles Chanteperdrix
> <gilles.chanteperdrix@xenomai.org> wrote:
>> Peter Soetens wrote:
>>> On Wed, Jul 7, 2010 at 11:06 AM, Gilles Chanteperdrix
>>> <gilles.chanteperdrix@xenomai.org> wrote:
>>>> Peter Soetens wrote:
>>>>> At least, not for Orocos applications. We've had hard to debug
>>>>> application segfaults that used just a 'little' bit more than 32k. We
>>>>> had to raise the stack size to 128k to get reliably through our
>>>>> application startup. I stem from the old 'mlockall ate my RAM'
>>>>> generation where we typically reduced stack sizes in order to have
>>>>> some crumbles left for the heap. But 32k wasn't really what we were
>>>>> aiming for.
>>>>>
>>>>> Maybe we should explicitly document the 32k limit and its limitations
>>>>> for certain applications...?
>>>> Again, things have been fixed in 2.5.3 with regard to stack sizes, could
>>>> you check that you have the same behaviour?
>>> I think we had, but I'm uncertain right now.
>>>
>>>> As for 32KiB, it is only a default stack size, it is only reasonable in
>>>> the sense that 2MiB is unreasonable on a low-end system. 32KiB was
>>>> picked because it allows printf to work. Now, whatever stack size we
>>>> choose, there will be applications which need more, this does not really
>>>> make the default unreasonable.
>>> I knew you would say that. It deserves an entry in the faq or some
>>> trouble shooting document though.
>> It is documented. For instance, rt_task_create says:
>> stksize         The size of the stack (in bytes) for the new task. If
>>                zero is passed, a reasonable pre-defined size will be substituted.
>>
>> What else can we say? Documenting that this size is 32 KiB would be
>> wrong, because we do not want applications to rely on a particular
>> value, in case we want to change it. And the fact that if your stack is
>> too small, you will get problems is kind of obvious. For anyone having
>> played with stack sizes with Linux or any proprietary RTOS, at least.
> 
> And what with new RTOS/Xenomai users ?
> 
> You have to take the user perspective here. The problem with stack
> overflows is that they occur when the development of a program has
> progressed a while and applications reached a certain level of
> complexity (otherwise the overflow wouldn't have happened in the first
> place). So it suddenly starts to segfault (from time to time). What he
> does is this: he fires up the debugger to get a backtrace, sees
> trouble and wrongly assumes that gdb can't really handle these Xenomai
> threads and tries to eliminate causes of the crashes.. 

Last time I tried, debugging a stack overflow with gdb was possible. You
can print the stack pointer and compare the value with the contents of
/proc/pid/maps.
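
In gdb this amounts to something like "print $sp" followed by "info proc mappings". The same comparison can also be done from inside the program. The following is a minimal sketch, assuming Linux and using the address of a local variable as a stand-in for the stack pointer; find_mapping and stack_headroom are hypothetical helper names.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Look up the /proc/self/maps entry that contains addr.
 * Returns 0 and fills *lo / *hi with the mapping bounds, or -1 if none. */
int find_mapping(const void *addr, uintptr_t *lo, uintptr_t *hi)
{
    FILE *f = fopen("/proc/self/maps", "r");
    char line[1024];
    int found = -1;

    if (!f)
        return -1;
    while (fgets(line, sizeof line, f)) {
        uintptr_t start, end;

        /* Each entry starts with "start-end" in hexadecimal. */
        if (sscanf(line, "%" SCNxPTR "-%" SCNxPTR, &start, &end) != 2)
            continue;
        if ((uintptr_t)addr >= start && (uintptr_t)addr < end) {
            *lo = start;
            *hi = end;
            found = 0;
            break;
        }
    }
    fclose(f);
    return found;
}

/* Bytes between the current stack position (approximated by a local
 * variable) and the low end of the mapping it lives in, or -1 on error. */
long stack_headroom(void)
{
    char marker;
    uintptr_t lo, hi;

    if (find_mapping(&marker, &lo, &hi) != 0)
        return -1;
    return (long)((uintptr_t)&marker - lo);
}
```

A headroom close to zero in a thread suggests it is about to run off its stack. For the main thread the figure is less meaningful, since that mapping can still grow on demand.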

> The user comes
> quickly to the conclusion that 'putting it all together' causes the
> crash (the single unit tests pass) and is looking for a software
> integration problem. In reality, it's the stack.
> 
> If you've been through all this and then came to the correct
> conclusion the same day, you've been burnt before, or are the
> exception.
> 
> In my view, 32k is a premature optimization. At least, it shows the
> side effects of one.

I guess you run Xenomai on one of these big irons, don't you? Because if
you ran it on a low-end machine, you would understand why we cannot
keep the 2MB default limit. 32 KiB already looks like a pretty large
limit, so maybe there is a problem in your application?

The I-pipe patch for ARM detects stack overflows; I guess we can modify
the kernel to do the same thing on all architectures.

-- 
					    Gilles.


* Re: [Xenomai-help] native: A 32k stack is not always a 'reasonable' size
  2010-07-07 23:08         ` Gilles Chanteperdrix
@ 2010-07-08  8:37           ` Philippe Gerum
  2010-07-08  8:58             ` Gilles Chanteperdrix
  0 siblings, 1 reply; 21+ messages in thread
From: Philippe Gerum @ 2010-07-08  8:37 UTC (permalink / raw)
  To: Gilles Chanteperdrix; +Cc: xenomai-help

On Thu, 2010-07-08 at 01:08 +0200, Gilles Chanteperdrix wrote:
> Peter Soetens wrote:
> > On Wed, Jul 7, 2010 at 11:19 PM, Gilles Chanteperdrix
> > <gilles.chanteperdrix@xenomai.org> wrote:
> >> Peter Soetens wrote:
> >>> On Wed, Jul 7, 2010 at 11:06 AM, Gilles Chanteperdrix
> >>> <gilles.chanteperdrix@xenomai.org> wrote:
> >>>> Peter Soetens wrote:
> >>>>> At least, not for Orocos applications. We've had hard to debug
> >>>>> application segfaults that used just a 'little' bit more than 32k. We
> >>>>> had to raise the stack size to 128k to get reliably through our
> >>>>> application startup. I stem from the old 'mlockall ate my RAM'
> >>>>> generation where we typically reduced stack sizes in order to have
> >>>>> some crumbles left for the heap. But 32k wasn't really what we were
> >>>>> aiming for.
> >>>>>
> >>>>> Maybe we should explicitly document the 32k limit and its limitations
> >>>>> for certain applications...?
> >>>> Again, things have been fixed in 2.5.3 with regard to stack sizes, could
> >>>> you check that you have the same behaviour?
> >>> I think we had, but I'm uncertain right now.
> >>>
> >>>> As for 32KiB, it is only a default stack size, it is only reasonable in
> >>>> the sense that 2MiB is unreasonable on a low-end system. 32KiB was
> >>>> picked because it allows printf to work. Now, whatever stack size we
> >>>> choose, there will be applications which need more, this does not really
> >>>> make the default unreasonable.
> >>> I knew you would say that. It deserves an entry in the faq or some
> >>> trouble shooting document though.
> >> It is documented. For instance, rt_task_create says:
> >> stksize         The size of the stack (in bytes) for the new task. If
> >>                zero is passed, a reasonable pre-defined size will be substituted.
> >>
> >> What else can we say? Documenting that this size is 32 KiB would be
> >> wrong, because we do not want applications to rely on a particular
> >> value, in case we want to change it. And the fact that if your stack is
> >> too small, you will get problems is kind of obvious. For anyone having
> >> played with stack sizes with Linux or any proprietary RTOS, at least.
> > 
> > And what with new RTOS/Xenomai users ?
> > 
> > You have to take the user perspective here. The problem with stack
> > overflows is that they occur when the development of a program has
> > progressed a while and applications reached a certain level of
> > complexity (otherwise the overflow wouldn't have happened in the first
> > place). So it suddenly starts to segfault (from time to time). What he
> > does is this: he fires up the debugger to get a backtrace, sees
> > trouble and wrongly assumes that gdb can't really handle these Xenomai
> > threads and tries to eliminate causes of the crashes.. 
> 
> Last time I tried, debugging a stack overflow with gdb was possible. You
> can print the stack pointer and compare the value with the contents of
> /proc/pid/maps.
> 
> > The user comes
> > quickly to the conclusion that 'putting it all together' causes the
> > crash (the single unit tests pass) and is looking for a software
> > integration problem. In reality, it's the stack.
> > 
> > If you've been through all this and then came to the correct
> > conclusion the same day, you've been burnt before, or are the
> > exception.
> > 
> > In my view, 32k is a premature optimization. At least, it shows the
> > side effects of one.
> 
> I guess you run Xenomai on one of these big irons, do you? Because if
> you ran on a low-end machine, you would understand why we cannot
> keep the 2MB default limit. 32 KiB looks already like a pretty large
> limit, so, maybe there is a problem in your application?
> 
> The I-pipe patch for ARM detects stack overflows, I guess we can modify
> the kernel on all architectures to do the same thing on all architectures.
> 

Peter made a good point considering the various braindamage outcomes a
stack smashing issue could trigger. I'm unsure whether anyone can
immediately suspect a stack overflow to be the cause of any random
application behavior; typically, that issue could cause a branch to any
random IP value on x86 since the return address is living on the stack
and could get trashed, but not necessarily on architectures with
branch-and-link registers. In the former case, GDB is of little help,
except for single-stepping until the offending statement is reached and
we can observe the trashing live, which means that we actually did the
work of spotting the issue manually.

It turns out that people with large applications and lots of contexts
often end up naked in the cold when facing those
things, and the only option left to them is to go backward on the
integration path, in order to find a possibly faulty component. Before
people can reasonably compare %sp values, they need some help to narrow
the search; otherwise, it's hopeless.

To this end, maybe an option would be to enable gcc's
-fstack-protector[-all] -fstack-check when the debug switch is given to
the configure script, provided the compiler in use supports this.

Granted, a stack overflow is not identical to a smashing, but quite
often the stack memory unduly consumed by a thread belongs to some other
memory object, and therefore usually gets trashed when that object is
modified. At least, enabling some canary word checking in that case may
help.
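
To make the canary idea concrete, here is a hand-rolled variant. It is not what gcc's -fstack-protector does (that places a canary in individual stack frames); it only paints a few words at the low end of a caller-allocated thread stack and checks them afterwards. It assumes a downward-growing stack, and the pattern, word count and function names are arbitrary choices for illustration.

```c
#include <pthread.h>
#include <stdint.h>
#include <stdlib.h>
#include <unistd.h>

#define CANARY_WORD  0xDEADBEEFu  /* arbitrary pattern */
#define CANARY_WORDS 16           /* words painted at the low end */

struct guarded_stack {
    void  *mem;    /* lowest address of the stack area */
    size_t size;   /* total size in bytes */
};

/* Allocate a page-aligned stack area and paint canaries at its low end. */
int guarded_stack_init(struct guarded_stack *gs, size_t size)
{
    uint32_t *low;
    int i;

    if (posix_memalign(&gs->mem, (size_t)sysconf(_SC_PAGESIZE), size) != 0)
        return -1;
    gs->size = size;
    low = gs->mem;
    for (i = 0; i < CANARY_WORDS; i++)
        low[i] = CANARY_WORD;
    return 0;
}

/* Returns 1 while every canary word is untouched, 0 otherwise. */
int guarded_stack_intact(const struct guarded_stack *gs)
{
    const uint32_t *low = gs->mem;
    int i;

    for (i = 0; i < CANARY_WORDS; i++)
        if (low[i] != CANARY_WORD)
            return 0;
    return 1;
}

/* Run fn(arg) to completion in a thread that uses gs as its stack. */
int run_on_stack(struct guarded_stack *gs, void *(*fn)(void *), void *arg)
{
    pthread_attr_t attr;
    pthread_t tid;
    int err = pthread_attr_init(&attr);

    if (err)
        return err;
    err = pthread_attr_setstack(&attr, gs->mem, gs->size);
    if (!err)
        err = pthread_create(&tid, &attr, fn, arg);
    if (!err)
        err = pthread_join(tid, NULL);
    pthread_attr_destroy(&attr);
    return err;
}

/* Example task with a small, well-behaved stack footprint. */
void *small_task(void *arg)
{
    return arg;
}
```

A check like guarded_stack_intact() only reports the damage after the fact, and only if the overrun actually wrote to the painted words, but it needs no MMU support.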

-- 
Philippe.




* Re: [Xenomai-help] native: A 32k stack is not always a 'reasonable' size
  2010-07-08  8:37           ` Philippe Gerum
@ 2010-07-08  8:58             ` Gilles Chanteperdrix
  2010-07-08  9:31               ` Philippe Gerum
  2010-07-08  9:50               ` Philippe Gerum
  0 siblings, 2 replies; 21+ messages in thread
From: Gilles Chanteperdrix @ 2010-07-08  8:58 UTC (permalink / raw)
  To: Philippe Gerum; +Cc: xenomai-help

Philippe Gerum wrote:
> On Thu, 2010-07-08 at 01:08 +0200, Gilles Chanteperdrix wrote:
>> Peter Soetens wrote:
>>> On Wed, Jul 7, 2010 at 11:19 PM, Gilles Chanteperdrix
>>> <gilles.chanteperdrix@xenomai.org> wrote:
>>>> Peter Soetens wrote:
>>>>> On Wed, Jul 7, 2010 at 11:06 AM, Gilles Chanteperdrix
>>>>> <gilles.chanteperdrix@xenomai.org> wrote:
>>>>>> Peter Soetens wrote:
>>>>>>> At least, not for Orocos applications. We've had hard to debug
>>>>>>> application segfaults that used just a 'little' bit more than 32k. We
>>>>>>> had to raise the stack size to 128k to get reliably through our
>>>>>>> application startup. I stem from the old 'mlockall ate my RAM'
>>>>>>> generation where we typically reduced stack sizes in order to have
>>>>>>> some crumbles left for the heap. But 32k wasn't really what we were
>>>>>>> aiming for.
>>>>>>>
>>>>>>> Maybe we should explicitly document the 32k limit and its limitations
>>>>>>> for certain applications...?
>>>>>> Again, things have been fixed in 2.5.3 with regard to stack sizes, could
>>>>>> you check that you have the same behaviour?
>>>>> I think we had, but I'm uncertain right now.
>>>>>
>>>>>> As for 32KiB, it is only a default stack size, it is only reasonable in
>>>>>> the sense that 2MiB is unreasonable on a low-end system. 32KiB was
>>>>>> picked because it allows printf to work. Now, whatever stack size we
>>>>>> choose, there will be applications which need more, this does not really
>>>>>> make the default unreasonable.
>>>>> I knew you would say that. It deserves an entry in the faq or some
>>>>> trouble shooting document though.
>>>> It is documented. For instance, rt_task_create says:
>>>> stksize         The size of the stack (in bytes) for the new task. If
>>>>                zero is passed, a reasonable pre-defined size will be substituted.
>>>>
>>>> What else can we say? Documenting that this size is 32 KiB would be
>>>> wrong, because we do not want applications to rely on a particular
>>>> value, in case we want to change it. And the fact that if your stack is
>>>> too small, you will get problems is kind of obvious. For anyone having
>>>> played with stack sizes with Linux or any proprietary RTOS, at least.
>>> And what with new RTOS/Xenomai users ?
>>>
>>> You have to take the user perspective here. The problem with stack
>>> overflows is that they occur when the development of a program has
>>> progressed a while and applications reached a certain level of
>>> complexity (otherwise the overflow wouldn't have happened in the first
>>> place). So it suddenly starts to segfault (from time to time). What he
>>> does is this: he fires up the debugger to get a backtrace, sees
>>> trouble and wrongly assumes that gdb can't really handle these Xenomai
>>> threads and tries to eliminate causes of the crashes.. 
>> Last time I tried, debugging a stack overflow with gdb was possible. You
>> can print the stack pointer and compare the value with the contents of
>> /proc/pid/maps.
>>
>>> The user comes
>>> quickly to the conclusion that 'putting it all together' causes the
>>> crash (the single unit tests pass) and is looking for a software
>>> integration problem. In reality, it's the stack.
>>>
>>> If you've been through all this and then came to the correct
>>> conclusion the same day, you've been burnt before, or are the
>>> exception.
>>>
>>> In my view, 32k is a premature optimization. At least, it shows the
>>> side effects of one.
>> I guess you run Xenomai on one of these big irons, do you? Because if
>> you ran on a low-end machine, you would understand why we cannot
>> keep the 2MB default limit. 32 KiB looks already like a pretty large
>> limit, so, maybe there is a problem in your application?
>>
>> The I-pipe patch for ARM detects stack overflows, I guess we can modify
>> the kernel on all architectures to do the same thing on all architectures.
>>
> 
> Peter made a good point considering the various braindamage outcomes a
> stack smashing issue could trigger. I'm unsure whether anyone can
> immediately suspect a stack overflow to be the cause of any random
> application behavior; typically, that issue could cause a branch to any
> random IP value on x86 since the return address is living on the stack
> and could get trashed, but not necessarily on architectures with
> branch-and-link registers. In the former case, GDB is of little help,
> except for single-stepping until the offending statement is reached and
> we can observe the trashing live, which means that we actually did the
> work of spotting the issue manually.
> 
> It turns out that people with large applications and lots of contexts
> often end up naked in the cold most of the time when facing those
> things, and the only option left to them is to go backward on the
> integration path, in order to find a possibly faulty component. Before
> people can reasonably compare %sp values, they need some help to narrow
> the search, otherwise, it's hopeless.
> 
> To this end, maybe an option would be to enable gcc's
> -fstack-protector[-all] -fstack-check when the debug switch is given to
> the configure script, provided the compiler in use supports this.
> 
> Granted, a stack overflow is not identical to a smashing, but quite
> often the stack memory unduly consumed by a thread belongs to some other
> memory object, and therefore usually gets trashed when that object is
> modified. At least, enabling some canary word checking in that case may
> help.

I do not think so. glibc maps an unreadable, unwritable guard page below
the stack, so what you get is a segmentation fault. Unless, of course,
you overflow by more than one page. But we can map more than one guard
page by using pthread_attr_setguardsize, if one page is not enough.
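
As a minimal sketch of that last point, the helper below prepares a thread attribute object with a guard area of several pages. The helper name and its parameters are hypothetical; only the pthread_attr_* calls are standard POSIX.

```c
#include <pthread.h>
#include <stddef.h>
#include <unistd.h>

/* Initialise *attr with the given stack size and a guard area of
 * guard_pages pages. Returns 0 on success, an errno-style code otherwise. */
int make_attr_with_guard(pthread_attr_t *attr, size_t stack_bytes,
                         size_t guard_pages)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    int err = pthread_attr_init(attr);

    if (err)
        return err;
    err = pthread_attr_setstacksize(attr, stack_bytes);
    if (!err)
        err = pthread_attr_setguardsize(attr, guard_pages * page);
    if (err)
        pthread_attr_destroy(attr);   /* do not leak a half-built attr */
    return err;
}
```

The resulting attribute object can be passed to pthread_create as usual. Depending on the C library, the guard area may be carved out of the requested stack size rather than added to it, so it may be worth leaving some margin when enlarging the guard.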

We can detect the stack overflow in kernel space, where it is easy to
detect. The problem is that x86 users, who are the ones most likely to
be hit by a stack overflow, may not be watching the console and so may
not see the message.

Or we can install a SIGSEGV handler which detects stack overflows
(it will be a little harder than in kernel space) and prints a clear
message in that case, but we will have to use an alternate stack for the
signal handler (obviously, the SIGSEGV handler cannot run on the
stack that has just overflowed).

Or we can increase the default stack size, but in my view, we will only
be delaying the problem a bit further down the "new users" development
process.

-- 
					    Gilles.


* Re: [Xenomai-help] native: A 32k stack is not always a 'reasonable' size
  2010-07-08  8:58             ` Gilles Chanteperdrix
@ 2010-07-08  9:31               ` Philippe Gerum
  2010-07-08  9:35                 ` Gilles Chanteperdrix
  2010-07-08  9:50               ` Philippe Gerum
  1 sibling, 1 reply; 21+ messages in thread
From: Philippe Gerum @ 2010-07-08  9:31 UTC (permalink / raw)
  To: Gilles Chanteperdrix; +Cc: xenomai-help

On Thu, 2010-07-08 at 10:58 +0200, Gilles Chanteperdrix wrote:
> Philippe Gerum wrote:
> > On Thu, 2010-07-08 at 01:08 +0200, Gilles Chanteperdrix wrote:
> >> Peter Soetens wrote:
> >>> On Wed, Jul 7, 2010 at 11:19 PM, Gilles Chanteperdrix
> >>> <gilles.chanteperdrix@xenomai.org> wrote:
> >>>> Peter Soetens wrote:
> >>>>> On Wed, Jul 7, 2010 at 11:06 AM, Gilles Chanteperdrix
> >>>>> <gilles.chanteperdrix@xenomai.org> wrote:
> >>>>>> Peter Soetens wrote:
> >>>>>>> At least, not for Orocos applications. We've had hard to debug
> >>>>>>> application segfaults that used just a 'little' bit more than 32k. We
> >>>>>>> had to raise the stack size to 128k to get reliably through our
> >>>>>>> application startup. I stem from the old 'mlockall ate my RAM'
> >>>>>>> generation where we typically reduced stack sizes in order to have
> >>>>>>> some crumbles left for the heap. But 32k wasn't really what we were
> >>>>>>> aiming for.
> >>>>>>>
> >>>>>>> Maybe we should explicitly document the 32k limit and its limitations
> >>>>>>> for certain applications...?
> >>>>>> Again, things have been fixed in 2.5.3 with regard to stack sizes, could
> >>>>>> you check that you have the same behaviour?
> >>>>> I think we had, but I'm uncertain right now.
> >>>>>
> >>>>>> As for 32KiB, it is only a default stack size, it is only reasonable in
> >>>>>> the sense that 2MiB is unreasonable on a low-end system. 32KiB was
> >>>>>> picked because it allows printf to work. Now, whatever stack size we
> >>>>>> choose, there will be applications which need more, this does not really
> >>>>>> make the default unreasonable.
> >>>>> I knew you would say that. It deserves an entry in the faq or some
> >>>>> trouble shooting document though.
> >>>> It is documented. For instance, rt_task_create says:
> >>>> stksize         The size of the stack (in bytes) for the new task. If
> >>>>                zero is passed, a reasonable pre-defined size will be substituted.
> >>>>
> >>>> What else can we say? Documenting that this size is 32 KiB would be
> >>>> wrong, because we do not want applications to rely on a particular
> >>>> value, in case we want to change it. And the fact that if your stack is
> >>>> too small, you will get problems is kind of obvious. For anyone having
> >>>> played with stack sizes with Linux or any proprietary RTOS, at least.
> >>> And what with new RTOS/Xenomai users ?
> >>>
> >>> You have to take the user perspective here. The problem with stack
> >>> overflows is that they occur when the development of a program has
> >>> progressed a while and applications reached a certain level of
> >>> complexity (otherwise the overflow wouldn't have happened in the first
> >>> place). So it suddenly starts to segfault (from time to time). What he
> >>> does is this: he fires up the debugger to get a backtrace, sees
> >>> trouble and wrongly assumes that gdb can't really handle these Xenomai
> >>> threads and tries to eliminate causes of the crashes.. 
> >> Last time I tried, debugging a stack overflow with gdb was possible. You
> >> can print the stack pointer and compare the value with the contents of
> >> /proc/pid/maps.
> >>
> >>> The user comes
> >>> quickly to the conclusion that 'putting it all together' causes the
> >>> crash (the single unit tests pass) and is looking for a software
> >>> integration problem. In reality, it's the stack.
> >>>
> >>> If you've been through all this and then came to the correct
> >>> conclusion the same day, you've been burnt before, or are the
> >>> exception.
> >>>
> >>> In my view, 32k is a premature optimization. At least, it shows the
> >>> side effects of one.
> >> I guess you run Xenomai on one of these big irons, do you? Because if
> >> you ran on a low-end machine, you would understand why we cannot
> >> keep the 2MB default limit. 32 KiB looks already like a pretty large
> >> limit, so, maybe there is a problem in your application?
> >>
> >> The I-pipe patch for ARM detects stack overflows, I guess we can modify
> >> the kernel on all architectures to do the same thing on all architectures.
> >>
> > 
> > Peter made a good point considering the various braindamage outcomes a
> > stack smashing issue could trigger. I'm unsure whether anyone can
> > immediately suspect a stack overflow to be the cause of any random
> > application behavior; typically, that issue could cause a branch to any
> > random IP value on x86 since the return address is living on the stack
> > and could get trashed, but not necessarily on architectures with
> > branch-and-link registers. In the former case, GDB is of little help,
> > except for single-stepping until the offending statement is reached and
> > we can observe the trashing live, which means that we actually did the
> > work of spotting the issue manually.
> > 
> > It turns out that people with large applications and lots of contexts
> > often end up naked in the cold most of the time when facing those
> > things, and the only option left to them is to go backward on the
> > integration path, in order to find a possibly faulty component. Before
> > people can reasonably compare %sp values, they need some help to narrow
> > the search, otherwise, it's hopeless.
> > 
> > To this end, maybe an option would be to enable gcc's
> > -fstack-protector[-all] -fstack-check when the debug switch is given to
> > the configure script, provided the compiler in use supports this.
> > 
> > Granted, a stack overflow is not identical to a smashing, but quite
> > often the stack memory unduly consumed by a thread belongs to some other
> > memory object, and therefore usually gets trashed when that object is
> > modified. At least, enabling some canary word checking in that case may
> > help.
> 
> I do not think so. The glibc maps an unreadable/unwritable page below
> the stack. So, what you get is a segmentation fault. Unless, of course,
> you overflow more than one page. But we can map more than one page by
> using pthread_attr_setguardsize, if one page is not enough.

The guard page is restricted to MMU-enabled systems, and two out of six
of our architectures run without an MMU. In that case, the only
option left that may work is the stack protector based on canary
word checking.

Relying on pthread_attr_setguardsize() when available will trigger the
same amount of uncertainty as we have now with setting the minimum
stack size. Which guard value would be a sane default? One, two, four
pages?
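
For illustration only, here is a minimal sketch of requesting an explicit
stack size together with a larger guard area using plain POSIX threads.
The helper name spawn_with_guard() and the sizes in the usage note are
assumptions made for this example, not Xenomai defaults or
recommendations, and the sketch does not settle which guard value is sane.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical helper: create a thread with an explicit stack size and
 * a guard area of guard_pages pages. Returns 0 on success, or the error
 * code reported by the failing pthread_* call. */
static int spawn_with_guard(pthread_t *tid, size_t stack_size,
                            size_t guard_pages,
                            void *(*fn)(void *), void *arg)
{
    pthread_attr_t attr;
    long page = sysconf(_SC_PAGESIZE);
    int ret;

    assert(page > 0);

    ret = pthread_attr_init(&attr);
    if (ret)
        return ret;

    /* The guard size is given in bytes, hence the page multiplication. */
    ret = pthread_attr_setstacksize(&attr, stack_size);
    if (!ret)
        ret = pthread_attr_setguardsize(&attr, guard_pages * (size_t)page);
    if (!ret)
        ret = pthread_create(tid, &attr, fn, arg);

    pthread_attr_destroy(&attr);
    return ret;
}

/* Trivial thread body, only there to show the helper in use. */
static void *worker(void *arg)
{
    *(int *)arg = 42;
    return NULL;
}
```

Calling spawn_with_guard(&tid, 128 * 1024, 2, worker, &result) would
request a 128 KiB stack with two guard pages. A stack size below
PTHREAD_STACK_MIN is rejected by pthread_attr_setstacksize().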

> 
> We can detect the stack overflow in kernel-space, there it is easy to
> detect, the problem is that x86 users, which are the ones more likely to
> be hit by a stack overflow, may not be watching the console, so may not
> see the message.
> 

Kernel-space is another issue, people writing applications in kernel
space are mostly on their own these days, and others implementing
drivers are expected to always consider stack space as a scarce resource
anyway. But helping with solving userland problems seems to be the most
urgent thing to do, since common practices in that environment may
conflict badly with real-time restrictions and requirements.

> Or we can install a handler for SIGSEGV which detects stack overflows
> (it will be a little harder than in kernel-space) and prints a clear
> message in that case but we will have to use an alternate stack for the
> signal handler (obviously, the SIGSEGV handler can not be stacked over
> the stack overflow).
> 
> Or we can increase the default stack size, but in my view, we will only
> be delaying the problem a bit further down the "new users" development
> process.
> 

I agree with your view here, but this also creates the requirement for
helping people to detect stack trashing early enough.

-- 
Philippe.





* Re: [Xenomai-help] native: A 32k stack is not always a 'reasonable' size
  2010-07-08  9:31               ` Philippe Gerum
@ 2010-07-08  9:35                 ` Gilles Chanteperdrix
  2010-07-08  9:58                   ` Philippe Gerum
  0 siblings, 1 reply; 21+ messages in thread
From: Gilles Chanteperdrix @ 2010-07-08  9:35 UTC (permalink / raw)
  To: Philippe Gerum; +Cc: xenomai-help

Philippe Gerum wrote:
> On Thu, 2010-07-08 at 10:58 +0200, Gilles Chanteperdrix wrote:
>> Philippe Gerum wrote:
>>> On Thu, 2010-07-08 at 01:08 +0200, Gilles Chanteperdrix wrote:
>>>> Peter Soetens wrote:
>>>>> On Wed, Jul 7, 2010 at 11:19 PM, Gilles Chanteperdrix
>>>>> <gilles.chanteperdrix@xenomai.org> wrote:
>>>>>> Peter Soetens wrote:
>>>>>>> On Wed, Jul 7, 2010 at 11:06 AM, Gilles Chanteperdrix
>>>>>>> <gilles.chanteperdrix@xenomai.org> wrote:
>>>>>>>> Peter Soetens wrote:
>>>>>>>>> At least, not for Orocos applications. We've had hard to debug
>>>>>>>>> application segfaults that used just a 'little' bit more than 32k. We
>>>>>>>>> had to raise the stack size to 128k to get reliably through our
>>>>>>>>> application startup. I stem from the old 'mlockall ate my RAM'
>>>>>>>>> generation where we typically reduced stack sizes in order to have
>>>>>>>>> some crumbles left for the heap. But 32k wasn't really what we were
>>>>>>>>> aiming for.
>>>>>>>>>
>>>>>>>>> Maybe we should explicitly document the 32k limit and its limitations
>>>>>>>>> for certain applications...?
>>>>>>>> Again, things have been fixed in 2.5.3 with regard to stack sizes, could
>>>>>>>> you check that you have the same behaviour?
>>>>>>> I think we had, but I'm uncertain right now.
>>>>>>>
>>>>>>>> As for 32KiB, it is only a default stack size, it is only reasonable in
>>>>>>>> the sense that 2MiB is unreasonable on a low-end system. 32KiB was
>>>>>>>> picked because it allows printf to work. Now, whatever stack size we
>>>>>>>> choose, there will be applications which need more, this does not really
>>>>>>>> make the default unreasonable.
>>>>>>> I knew you would say that. It deserves an entry in the faq or some
>>>>>>> trouble shooting document though.
>>>>>> It is documented. For instance, rt_task_create says:
>>>>>> stksize         The size of the stack (in bytes) for the new task. If
>>>>>>                zero is passed, a reasonable pre-defined size will be substituted.
>>>>>>
>>>>>> What else can we say? Documenting that this size is 32 KiB would be
>>>>>> wrong, because we do not want applications to rely on a particular
>>>>>> value, in case we want to change it. And the fact that if your stack is
>>>>>> too small, you will get problems is kind of obvious. For anyone having
>>>>>> played with stack sizes with Linux or any proprietary RTOS, at least.
>>>>> And what with new RTOS/Xenomai users ?
>>>>>
>>>>> You have to take the user perspective here. The problem with stack
>>>>> overflows is that they occur when the development of a program has
>>>>> progressed a while and applications reached a certain level of
>>>>> complexity (otherwise the overflow wouldn't have happened in the first
>>>>> place). So it suddenly starts to segfault (from time to time). What he
>>>>> does is this: he fires up the debugger to get a backtrace, sees
>>>>> trouble and wrongly assumes that gdb can't really handle these Xenomai
>>>>> threads and tries to eliminate causes of the crashes.. 
>>>> Last time I tried, debugging a stack overflow with gdb was possible. You
>>>> can print the stack pointer and compare the value with the contents of
>>>> /proc/pid/maps.
>>>>
>>>> The user comes
>>>>> quickly to the conclusion that 'putting it all together' causes the
>>>>> crash (the single unit tests pass) and is looking for a software
>>>>> integration problem. In reality, it's the stack.
>>>>>
>>>>> If you've been through all this and then came to the correct
>>>>> conclusion the same day, you've been burnt before, or are the
>>>>> exception.
>>>>>
>>>>> In my view, 32k is a premature optimization. At least, it shows the
>>>>> side effects of one.
>>>> I guess you run Xenomai on one of these big irons, do you? Because if
>>>> you ran on a low-end machine, you would have understood why we can not
>>>> keep the 2MB default limit. 32 KiB looks already like a pretty large
>>>> limit, so, maybe there is a problem in your application?
>>>>
>>>> The I-pipe patch for ARM detects stack overflows, I guess we can modify
>>>> the kernel on all architectures to do the same thing on all architectures.
>>>>
>>> Peter made a good point considering the various braindamage outcomes a
>>> stack smashing issue could trigger. I'm unsure whether anyone can
>>> immediately suspect a stack overflow to be the cause of any random
>>> application behavior; typically, that issue could cause a branch to any
>>> random IP value on x86 since the return address is living on the stack
>>> and could get trashed, but not necessarily on architectures with
>>> branch-and-link registers. In the former case, GDB is of little help,
>>> except for single-stepping until the offending statement is reached and
>>> we can observe the trashing live, which means that we actually did the
>>> work of spotting the issue manually.
>>>
>>> It turns out that people with large applications and lots of contexts
>>> often end up naked in the cold most of the time when facing those
>>> things, and the only option left to them is to go backward on the
>>> integration path, in order to find a possibly faulty component. Before
>>> people can reasonably compare %sp values, they need some help to narrow
>>> the search, otherwise, it's hopeless.
>>>
>>> To this end, maybe an option would be to enable gcc's
>>> -fstack-protector[-all] -fstack-check when the debug switch is given to
>>> the configure script, provided the compiler in use supports this.
>>>
>>> Granted, a stack overflow is not identical to a smashing, but quite
>>> often the stack memory unduly consumed by a thread belongs to some other
>>> memory object, and therefore usually gets trashed when that object is
>>> modified. At least, enabling some canary word checking in that case may
>>> help.
>> I do not think so. The glibc maps an unreadable/unwritable page below
>> the stack. So, what you get is a segmentation fault. Unless, of course,
>> you overflow more than one page. But we can map more than one page by
>> using pthread_attr_setguardsize, if one page is not enough.
> 
> The guard page is restricted to MMU-enabled systems, and two out of six
> of our architectures run without an MMU. In that case, the only
> option left that may work is the stack protector based on canary
> word checking.
> 
> Relying on pthread_attr_setguardsize() when available will trigger the
> same amount of uncertainty as we have now with setting the minimum
> stack size. Which guard value would be a sane default? One, two, four
> pages?
> 
>> We can detect the stack overflow in kernel-space, there it is easy to
>> detect, the problem is that x86 users, which are the ones more likely to
>> be hit by a stack overflow, may not be watching the console, so may not
>> see the message.
>>
> 
> Kernel-space is another issue, people writing applications in kernel
> space are mostly on their own these days, and others implementing
> drivers are expected to always consider stack space as a scarce resource
> anyway. But helping with solving userland problems seems to be the most
> urgent thing to do, since common practices in that environment may
> conflict badly with real-time restrictions and requirements.

I mean detecting the user-space stack overflows when handling user-space
page faults in kernel-space. But granted, that also only works for 
systems with an MMU. The following piece of code does it in the I-pipe 
patch for ARM with FCSE enabled:

+       down_read(&mm->mmap_sem);
+       if (find_vma(mm, addr) == find_vma(mm, regs->ARM_sp))
+               printk(KERN_INFO "FCSE: process %u(%s) probably overflowed stack at 0x%08lx.\n",
+                      current->pid, current->comm, regs->ARM_pc);
+       up_read(&mm->mmap_sem);
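
The same comparison can be approximated from user space by reading
/proc/<pid>/maps, for instance from a debugging helper or a post-mortem
tool. The sketch below is only an illustration and not part of the I-pipe
patch: find_vma_like() and same_vma() are hypothetical names, and the
lookup rule (first mapping whose end lies above the address) is meant to
mirror what find_vma() does in the kernel.

```c
#include <assert.h>
#include <stdio.h>

/* Scan a stream formatted like /proc/<pid>/maps ("start-end perms ..."
 * per line, hex addresses, sorted by address) and return the first
 * mapping whose end lies above addr. Returns 1 and fills start/end
 * when such a mapping exists, 0 otherwise. */
static int find_vma_like(FILE *maps, unsigned long addr,
                         unsigned long *start, unsigned long *end)
{
    char line[512];
    unsigned long s, e;

    assert(maps != NULL);

    rewind(maps);
    while (fgets(line, sizeof(line), maps)) {
        if (sscanf(line, "%lx-%lx", &s, &e) != 2)
            continue;   /* not the start of a maps entry */
        if (addr < e) {
            *start = s;
            *end = e;
            return 1;
        }
    }
    return 0;
}

/* Rough counterpart of the find_vma() comparison above: do the faulting
 * address and the stack pointer resolve to the same mapping? */
static int same_vma(FILE *maps, unsigned long fault_addr, unsigned long sp)
{
    unsigned long s1, e1, s2, e2;

    if (!find_vma_like(maps, fault_addr, &s1, &e1))
        return 0;
    if (!find_vma_like(maps, sp, &s2, &e2))
        return 0;
    return s1 == s2 && e1 == e2;
}
```

With a stream opened on /proc/self/maps, same_vma(maps, fault_addr, sp)
returning 1 would correspond to the "probably overflowed stack" condition
above. The stdio calls are not async-signal-safe, so this is not meant to
be called directly from a signal handler.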


> 
>> Or we can install a handler for SIGSEGV which detects stack overflows
>> (it will be a little harder than in kernel-space) and prints a clear
>> message in that case but we will have to use an alternate stack for the
>> signal handler (obviously, the SIGSEGV handler can not be stacked over
>> the stack overflow).
>>
>> Or we can increase the default stack size, but in my view, we will only
>> be delaying the problem a bit further down the "new users" development
>> process.
>>
> 
> I agree with your view here, but this also creates the requirement for
> helping people to detect stack trashing early enough.
> 


-- 
					    Gilles.



* Re: [Xenomai-help] native: A 32k stack is not always a 'reasonable' size
  2010-07-08  8:58             ` Gilles Chanteperdrix
  2010-07-08  9:31               ` Philippe Gerum
@ 2010-07-08  9:50               ` Philippe Gerum
  2010-07-08  9:55                 ` Gilles Chanteperdrix
  1 sibling, 1 reply; 21+ messages in thread
From: Philippe Gerum @ 2010-07-08  9:50 UTC (permalink / raw)
  To: Gilles Chanteperdrix; +Cc: xenomai-help

On Thu, 2010-07-08 at 10:58 +0200, Gilles Chanteperdrix wrote:
> Philippe Gerum wrote:
> > On Thu, 2010-07-08 at 01:08 +0200, Gilles Chanteperdrix wrote:
> >> Peter Soetens wrote:
> >>> On Wed, Jul 7, 2010 at 11:19 PM, Gilles Chanteperdrix
> >>> <gilles.chanteperdrix@xenomai.org> wrote:
> >>>> Peter Soetens wrote:
> >>>>> On Wed, Jul 7, 2010 at 11:06 AM, Gilles Chanteperdrix
> >>>>> <gilles.chanteperdrix@xenomai.org> wrote:
> >>>>>> Peter Soetens wrote:
> >>>>>>> At least, not for Orocos applications. We've had hard to debug
> >>>>>>> application segfaults that used just a 'little' bit more than 32k. We
> >>>>>>> had to raise the stack size to 128k to get reliably through our
> >>>>>>> application startup. I stem from the old 'mlockall ate my RAM'
> >>>>>>> generation where we typically reduced stack sizes in order to have
> >>>>>>> some crumbles left for the heap. But 32k wasn't really what we were
> >>>>>>> aiming for.
> >>>>>>>
> >>>>>>> Maybe we should explicitly document the 32k limit and its limitations
> >>>>>>> for certain applications...?
> >>>>>> Again, things have been fixed in 2.5.3 with regard to stack sizes, could
> >>>>>> you check that you have the same behaviour?
> >>>>> I think we had, but I'm uncertain right now.
> >>>>>
> >>>>>> As for 32KiB, it is only a default stack size, it is only reasonable in
> >>>>>> the sense that 2MiB is unreasonable on a low-end system. 32KiB was
> >>>>>> picked because it allows printf to work. Now, whatever stack size we
> >>>>>> choose, there will be applications which need more, this does not really
> >>>>>> make the default unreasonable.
> >>>>> I knew you would say that. It deserves an entry in the faq or some
> >>>>> trouble shooting document though.
> >>>> It is documented. For instance, rt_task_create says:
> >>>> stksize         The size of the stack (in bytes) for the new task. If
> >>>>                zero is passed, a reasonable pre-defined size will be substituted.
> >>>>
> >>>> What else can we say? Documenting that this size is 32 KiB would be
> >>>> wrong, because we do not want applications to rely on a particular
> >>>> value, in case we want to change it. And the fact that if your stack is
> >>>> too small, you will get problems is kind of obvious. For anyone having
> >>>> played with stack sizes with Linux or any proprietary RTOS, at least.
> >>> And what with new RTOS/Xenomai users ?
> >>>
> >>> You have to take the user perspective here. The problem with stack
> >>> overflows is that they occur when the development of a program has
> >>> progressed a while and applications reached a certain level of
> >>> complexity (otherwise the overflow wouldn't have happened in the first
> >>> place). So it suddenly starts to segfault (from time to time). What he
> >>> does is this: he fires up the debugger to get a backtrace, sees
> >>> trouble and wrongly assumes that gdb can't really handle these Xenomai
> >>> threads and tries to eliminate causes of the crashes.. 
> >> Last time I tried, debugging a stack overflow with gdb was possible. You
> >> can print the stack pointer and compare the value with the contents of
> >> /proc/pid/maps.
> >>
> >> The user comes
> >>> quickly to the conclusion that 'putting it all together' causes the
> >>> crash (the single unit tests pass) and is looking for a software
> >>> integration problem. In reality, it's the stack.
> >>>
> >>> If you've been through all this and then came to the correct
> >>> conclusion the same day, you've been burnt before, or are the
> >>> exception.
> >>>
> >>> In my view, 32k is a premature optimization. At least, it shows the
> >>> side effects of one.
> >> I guess you run Xenomai on one of these big irons, do you? Because if
> >> you ran on a low-end machine, you would have understood why we can not
> >> keep the 2MB default limit. 32 KiB looks already like a pretty large
> >> limit, so, maybe there is a problem in your application?
> >>
> >> The I-pipe patch for ARM detects stack overflows, I guess we can modify
> >> the kernel on all architectures to do the same thing on all architectures.
> >>
> > 
> > Peter made a good point considering the various braindamage outcomes a
> > stack smashing issue could trigger. I'm unsure whether anyone can
> > immediately suspect a stack overflow to be the cause of any random
> > application behavior; typically, that issue could cause a branch to any
> > random IP value on x86 since the return address is living on the stack
> > and could get trashed, but not necessarily on architectures with
> > branch-and-link registers. In the former case, GDB is of little help,
> > except for single-stepping until the offending statement is reached and
> > we can observe the trashing live, which means that we actually did the
> > work of spotting the issue manually.
> > 
> > It turns out that people with large applications and lots of contexts
> > often end up naked in the cold most of the time when facing those
> > things, and the only option left to them is to go backward on the
> > integration path, in order to find a possibly faulty component. Before
> > people can reasonably compare %sp values, they need some help to narrow
> > the search, otherwise, it's hopeless.
> > 
> > To this end, maybe an option would be to enable gcc's
> > -fstack-protector[-all] -fstack-check when the debug switch is given to
> > the configure script, provided the compiler in use supports this.
> > 
> > Granted, a stack overflow is not identical to a smashing, but quite
> > often the stack memory unduly consumed by a thread belongs to some other
> > memory object, and therefore usually gets trashed when that object is
> > modified. At least, enabling some canary word checking in that case may
> > help.
> 
> I do not think so. The glibc maps an unreadable/unwritable page below
> the stack. So, what you get is a segmentation fault. Unless, of course,
> you overflow more than one page. But we can map more than one page by
> using pthread_attr_setguardsize, if one page is not enough.

Actually, I guess that the stack guard area will not be contiguous to
any valid page in most cases, so the size of that area should not be the
main issue; i.e. at worst, the code would write to an unmapped address
and raise a fault the same way. But despite this, identifying whether we
had a stack overflow is still a pain, because that situation sometimes
deeply confuses GDB. Or confuses the developer because function
prologues and other hidden code do refer to stack memory, so unless we
trace the program at instruction level, in single-stepping mode, we are
toast.

In short, I'd say that the issue is not so much about pulling the
brake when a stack overflow is detected (which happens one way or
another anyway), but rather about obtaining a reasonably precise hint as
to _where_ the problem occurs.

-- 
Philippe.





* Re: [Xenomai-help] native: A 32k stack is not always a 'reasonable' size
  2010-07-08  9:50               ` Philippe Gerum
@ 2010-07-08  9:55                 ` Gilles Chanteperdrix
  2010-07-08 10:19                   ` Philippe Gerum
  0 siblings, 1 reply; 21+ messages in thread
From: Gilles Chanteperdrix @ 2010-07-08  9:55 UTC (permalink / raw)
  To: Philippe Gerum; +Cc: xenomai-help

Philippe Gerum wrote:
> On Thu, 2010-07-08 at 10:58 +0200, Gilles Chanteperdrix wrote:
>> Philippe Gerum wrote:
>>> On Thu, 2010-07-08 at 01:08 +0200, Gilles Chanteperdrix wrote:
>>>> Peter Soetens wrote:
>>>>> On Wed, Jul 7, 2010 at 11:19 PM, Gilles Chanteperdrix
>>>>> <gilles.chanteperdrix@xenomai.org> wrote:
>>>>>> Peter Soetens wrote:
>>>>>>> On Wed, Jul 7, 2010 at 11:06 AM, Gilles Chanteperdrix
>>>>>>> <gilles.chanteperdrix@xenomai.org> wrote:
>>>>>>>> Peter Soetens wrote:
>>>>>>>>> At least, not for Orocos applications. We've had hard to debug
>>>>>>>>> application segfaults that used just a 'little' bit more than 32k. We
>>>>>>>>> had to raise the stack size to 128k to get reliably through our
>>>>>>>>> application startup. I stem from the old 'mlockall ate my RAM'
>>>>>>>>> generation where we typically reduced stack sizes in order to have
>>>>>>>>> some crumbles left for the heap. But 32k wasn't really what we were
>>>>>>>>> aiming for.
>>>>>>>>>
>>>>>>>>> Maybe we should explicitly document the 32k limit and its limitations
>>>>>>>>> for certain applications...?
>>>>>>>> Again, things have been fixed in 2.5.3 with regard to stack sizes, could
>>>>>>>> you check that you have the same behaviour?
>>>>>>> I think we had, but I'm uncertain right now.
>>>>>>>
>>>>>>>> As for 32KiB, it is only a default stack size, it is only reasonable in
>>>>>>>> the sense that 2MiB is unreasonable on a low-end system. 32KiB was
>>>>>>>> picked because it allows printf to work. Now, whatever stack size we
>>>>>>>> choose, there will be applications which need more, this does not really
>>>>>>>> make the default unreasonable.
>>>>>>> I knew you would say that. It deserves an entry in the faq or some
>>>>>>> trouble shooting document though.
>>>>>> It is documented. For instance, rt_task_create says:
>>>>>> stksize         The size of the stack (in bytes) for the new task. If
>>>>>>                zero is passed, a reasonable pre-defined size will be substituted.
>>>>>>
>>>>>> What else can we say? Documenting that this size is 32 KiB would be
>>>>>> wrong, because we do not want applications to rely on a particular
>>>>>> value, in case we want to change it. And the fact that if your stack is
>>>>>> too small, you will get problems is kind of obvious. For anyone having
>>>>>> played with stack sizes with Linux or any proprietary RTOS, at least.
>>>>> And what with new RTOS/Xenomai users ?
>>>>>
>>>>> You have to take the user perspective here. The problem with stack
>>>>> overflows is that they occur when the development of a program has
>>>>> progressed a while and applications reached a certain level of
>>>>> complexity (otherwise the overflow wouldn't have happened in the first
>>>>> place). So it suddenly starts to segfault (from time to time). What he
>>>>> does is this: he fires up the debugger to get a backtrace, sees
>>>>> trouble and wrongly assumes that gdb can't really handle these Xenomai
>>>>> threads and tries to eliminate causes of the crashes.. 
>>>> Last time I tried, debugging a stack overflow with gdb was possible. You
>>>> can print the stack pointer and compare the value with the contents of
>>>> /proc/pid/maps.
>>>>
>>>> The user comes
>>>>> quickly to the conclusion that 'putting it all together' causes the
>>>>> crash (the single unit tests pass) and is looking for a software
>>>>> integration problem. In reality, it's the stack.
>>>>>
>>>>> If you've been through all this and then came to the correct
>>>>> conclusion the same day, you've been burnt before, or are the
>>>>> exception.
>>>>>
>>>>> In my view, 32k is a premature optimization. At least, it shows the
>>>>> side effects of one.
>>>> I guess you run Xenomai on one of these big irons, do you? Because if
>>>> you ran on a low-end machine, you would have understood why we can not
>>>> keep the 2MB default limit. 32 KiB looks already like a pretty large
>>>> limit, so, maybe there is a problem in your application?
>>>>
>>>> The I-pipe patch for ARM detects stack overflows, I guess we can modify
>>>> the kernel on all architectures to do the same thing on all architectures.
>>>>
>>> Peter made a good point considering the various braindamage outcomes a
>>> stack smashing issue could trigger. I'm unsure whether anyone can
>>> immediately suspect a stack overflow to be the cause of any random
>>> application behavior; typically, that issue could cause a branch to any
>>> random IP value on x86 since the return address is living on the stack
>>> and could get trashed, but not necessarily on architectures with
>>> branch-and-link registers. In the former case, GDB is of little help,
>>> except for single-stepping until the offending statement is reached and
>>> we can observe the trashing live, which means that we actually did the
>>> work of spotting the issue manually.
>>>
>>> It turns out that people with large applications and lots of contexts
>>> often end up naked in the cold most of the time when facing those
>>> things, and the only option left to them is to go backward on the
>>> integration path, in order to find a possibly faulty component. Before
>>> people can reasonably compare %sp values, they need some help to narrow
>>> the search, otherwise, it's hopeless.
>>>
>>> To this end, maybe an option would be to enable gcc's
>>> -fstack-protector[-all] -fstack-check when the debug switch is given to
>>> the configure script, provided the compiler in use supports this.
>>>
>>> Granted, a stack overflow is not identical to a smashing, but quite
>>> often the stack memory unduly consumed by a thread belongs to some other
>>> memory object, and therefore usually gets trashed when that object is
>>> modified. At least, enabling some canary word checking in that case may
>>> help.
>> I do not think so. The glibc maps an unreadable/unwritable page below
>> the stack. So, what you get is a segmentation fault. Unless, of course,
>> you overflow more than one page. But we can map more than one page by
>> using pthread_attr_setguardsize, if one page is not enough.
> 
> Actually, I guess that the stack guard area will not be contiguous to
> any valid page in most cases, so the size of that area should not be the
> main issue; i.e. at worst, the code would write to an unmapped address
> and raise a fault the same way. But despite this, identifying whether we
> had a stack overflow is still a pain, because that situation sometimes
> deeply confuses GDB. Or confuses the developer because function
> prologues and other hidden code do refer to stack memory, so unless we
> trace the program at instruction level, in single-stepping mode, we are
> toast.

Unfortunately, the thread stacks get allocated with mmap, so they all
get "stacked", no pun intended. They are only separated by the guard
pages, so, yes, if you overflow badly, you may overwrite another
thread's stack. And since you will tend to overflow into the top of the
other thread's stack, it will take some time before you detect the overrun.

> 
> In short, I'd say that the issue is not so much about pulling the
> brake when a stack overflow is detected (which happens one way or
> another anyway), but rather about obtaining a reasonably precise hint as
> to _where_ the problem occurs.
> 

Hence the proposal of kernel instrumentation, or of a SIGSEGV handler.
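
A rough sketch of the SIGSEGV handler variant, under a few loud
assumptions: the names install_segv_reporter(), in_guard_area(),
stack_low and guard_size are hypothetical, the stack bounds are kept in
plain globals (so this only covers a single monitored thread), and the
64 KiB alternate stack size is arbitrary. It only shows the mechanics of
running the handler on an alternate stack and classifying the fault
address.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <unistd.h>

#define ALT_STACK_SIZE (64 * 1024)  /* arbitrary size, an assumption */

static char alt_stack[ALT_STACK_SIZE];

/* Hypothetical globals: lowest valid stack address and guard size of the
 * monitored thread. A real implementation would track these per thread. */
static volatile uintptr_t stack_low;
static volatile size_t guard_size;

/* True if fault lies in the guard area just below the stack. */
static int in_guard_area(uintptr_t fault, uintptr_t low, size_t guard)
{
    return fault < low && low - fault <= guard;
}

/* Runs on the alternate stack; uses only async-signal-safe calls. */
static void segv_handler(int sig, siginfo_t *si, void *ctx)
{
    static const char overflow_msg[] = "SIGSEGV: probable stack overflow\n";
    static const char other_msg[] = "SIGSEGV: fault outside the guard area\n";
    ssize_t n;

    (void)sig;
    (void)ctx;
    if (in_guard_area((uintptr_t)si->si_addr, stack_low, guard_size))
        n = write(STDERR_FILENO, overflow_msg, sizeof(overflow_msg) - 1);
    else
        n = write(STDERR_FILENO, other_msg, sizeof(other_msg) - 1);
    (void)n;
    _exit(1);
}

/* Register the alternate stack and the handler for the calling thread. */
static int install_segv_reporter(uintptr_t low, size_t guard)
{
    stack_t ss;
    struct sigaction sa;

    assert(low != 0);
    stack_low = low;
    guard_size = guard;

    ss.ss_sp = alt_stack;
    ss.ss_size = sizeof(alt_stack);
    ss.ss_flags = 0;
    if (sigaltstack(&ss, NULL) < 0)
        return -1;

    memset(&sa, 0, sizeof(sa));
    sa.sa_sigaction = segv_handler;
    sa.sa_flags = SA_SIGINFO | SA_ONSTACK;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGSEGV, &sa, NULL);
}
```

The alternate signal stack is a per-thread setting, so each thread to be
covered would need its own sigaltstack() call and its own buffer.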

-- 
					    Gilles.



* Re: [Xenomai-help] native: A 32k stack is not always a 'reasonable' size
  2010-07-08  9:35                 ` Gilles Chanteperdrix
@ 2010-07-08  9:58                   ` Philippe Gerum
  2010-07-08 10:04                     ` Gilles Chanteperdrix
  2010-07-08 11:52                     ` Gilles Chanteperdrix
  0 siblings, 2 replies; 21+ messages in thread
From: Philippe Gerum @ 2010-07-08  9:58 UTC (permalink / raw)
  To: Gilles Chanteperdrix; +Cc: xenomai-help

On Thu, 2010-07-08 at 11:35 +0200, Gilles Chanteperdrix wrote:
> Philippe Gerum wrote:
> > On Thu, 2010-07-08 at 10:58 +0200, Gilles Chanteperdrix wrote:
> >> Philippe Gerum wrote:
> >>> On Thu, 2010-07-08 at 01:08 +0200, Gilles Chanteperdrix wrote:
> >>>> Peter Soetens wrote:
> >>>>> On Wed, Jul 7, 2010 at 11:19 PM, Gilles Chanteperdrix
> >>>>> <gilles.chanteperdrix@xenomai.org> wrote:
> >>>>>> Peter Soetens wrote:
> >>>>>>> On Wed, Jul 7, 2010 at 11:06 AM, Gilles Chanteperdrix
> >>>>>>> <gilles.chanteperdrix@xenomai.org> wrote:
> >>>>>>>> Peter Soetens wrote:
> >>>>>>>>> At least, not for Orocos applications. We've had hard to debug
> >>>>>>>>> application segfaults that used just a 'little' bit more than 32k. We
> >>>>>>>>> had to raise the stack size to 128k to get reliably through our
> >>>>>>>>> application startup. I stem from the old 'mlockall ate my RAM'
> >>>>>>>>> generation where we typically reduced stack sizes in order to have
> >>>>>>>>> some crumbles left for the heap. But 32k wasn't really what we were
> >>>>>>>>> aiming for.
> >>>>>>>>>
> >>>>>>>>> Maybe we should explicitly document the 32k limit and its limitations
> >>>>>>>>> for certain applications...?
> >>>>>>>> Again, things have been fixed in 2.5.3 with regard to stack sizes, could
> >>>>>>>> you check that you have the same behaviour?
> >>>>>>> I think we had, but I'm uncertain right now.
> >>>>>>>
> >>>>>>>> As for 32KiB, it is only a default stack size, it is only reasonable in
> >>>>>>>> the sense that 2MiB is unreasonable on a low-end system. 32KiB was
> >>>>>>>> picked because it allows printf to work. Now, whatever stack size we
> >>>>>>>> choose, there will be applications which need more, this does not really
> >>>>>>>> make the default unreasonable.
> >>>>>>> I knew you would say that. It deserves an entry in the faq or some
> >>>>>>> trouble shooting document though.
> >>>>>> It is documented. For instance, rt_task_create says:
> >>>>>> stksize         The size of the stack (in bytes) for the new task. If
> >>>>>>                zero is passed, a reasonable pre-defined size will be substituted.
> >>>>>>
> >>>>>> What else can we say? Documenting that this size is 32 KiB would be
> >>>>>> wrong, because we do not want applications to rely on a particular
> >>>>>> value, in case we want to change it. And the fact that if your stack is
> >>>>>> too small, you will get problems is kind of obvious. For anyone having
> >>>>>> played with stack sizes with Linux or any proprietary RTOS, at least.
> >>>>> And what with new RTOS/Xenomai users ?
> >>>>>
> >>>>> You have to take the user perspective here. The problem with stack
> >>>>> overflows is that they occur when the development of a program has
> >>>>> progressed a while and applications reached a certain level of
> >>>>> complexity (otherwise the overflow wouldn't have happened in the first
> >>>>> place). So it suddenly starts to segfault (from time to time). What he
> >>>>> does is this: he fires up the debugger to get a backtrace, sees
> >>>>> trouble and wrongly assumes that gdb can't really handle these Xenomai
> >>>>> threads and tries to eliminate causes of the crashes.. 
> >>>> Last time I tried, debugging a stack overflow with gdb was possible. You
> >>>> can print the stack pointer and compare the value with the contents of
> >>>> /proc/pid/maps.
> >>>>
> >>>> The user comes
> >>>>> quickly to the conclusion that 'putting it all together' causes the
> >>>>> crash (the single unit tests pass) and is looking for a software
> >>>>> integration problem. In reality, it's the stack.
> >>>>>
> >>>>> If you've been through all this and then came to the correct
> >>>>> conclusion the same day, you've been burnt before, or are the
> >>>>> exception.
> >>>>>
> >>>>> In my view, 32k is a premature optimization. At least, it shows the
> >>>>> side effects of one.
> >>>> I guess you run Xenomai on one of these big irons, do you? Because if
> >>>> you ran on a low-end machine, you would have understood why we can not
> >>>> keep the 2MB default limit. 32 KiB looks already like a pretty large
> >>>> limit, so, maybe there is a problem in your application?
> >>>>
> >>>> The I-pipe patch for ARM detects stack overflows, I guess we can modify
> >>>> the kernel to do the same thing on all architectures.
> >>>>
> >>> Peter made a good point considering the various braindamage outcomes a
> >>> stack smashing issue could trigger. I'm unsure whether anyone can
> >>> immediately suspect a stack overflow to be the cause of any random
> >>> application behavior; typically, that issue could cause a branch to any
> >>> random IP value on x86 since the return address is living on the stack
> >>> and could get trashed, but not necessarily on architectures with
> >>> branch-and-link registers. In the former case, GDB is of little help,
> >>> except for single-stepping until the offending statement is reached and
> >>> we can observe the trashing live, which means that we actually did the
> >>> work of spotting the issue manually.
> >>>
> >>> It turns out that people with large applications and lots of contexts
> >>> often end up naked in the cold most of the time when facing those
> >>> things, and the only option left to them is to go backward on the
> >>> integration path, in order to find a possibly faulty component. Before
> >>> people can reasonably compare %sp values, they need some help to narrow
> >>> the search, otherwise, it's hopeless.
> >>>
> >>> To this end, maybe an option would be to enable gcc's
> >>> -fstack-protector[-all] -fstack-check when the debug switch is given to
> >>> the configure script, provided the compiler in use supports this.
> >>>
> >>> Granted, a stack overflow is not identical to a smashing, but quite
> >>> often the stack memory unduly consumed by a thread belongs to some other
> >>> memory object, and therefore usually gets trashed when that object is
> >>> modified. At least, enabling some canary word checking in that case may
> >>> help.
> >> I do not think so. The glibc maps an unreadable/unwritable page below
> >> the stack. So, what you get is a segmentation fault. Unless, of course,
> >> you overflow more than one page. But we can map more than one page by
> >> using pthread_attr_setguardsize, if one page is not enough.
> > 
> > The page guard is restricted to MMU-enabled systems, we have two out of
> > six of our architectures running without MMU. In this case, the only
> > option left that may work is the stack protector based on the canary
> > word checking.
> > 
> > Relying on pthread_attr_setguardsize() when available will trigger the
> > same amount of uncertainty as we have now with setting the minimum
> > stack size. Which guard value would be a sane default? One, two, four
> > pages?
> > 
> >> We can detect the stack overflow in kernel-space, there it is easy to
> >> detect, the problem is that x86 users, which are the ones more likely to
> >> be hit by a stack overflow, may not be watching the console, so may not
> >> see the message.
> >>
> > 
> > Kernel-space is another issue, people writing applications in kernel
> > space are mostly on their own these days, and others implementing
> > drivers are expected to always consider stack space as a scarce resource
> > anyway. But helping with solving userland problems seems to be the most
> > urgent thing to do, since common practices in that environment may
> > conflict badly with real-time restrictions and requirements.
> 
> I mean detecting the user-space stack overflows when handling user-space
> page faults in kernel-space. But granted, that also only works for 
> systems with an MMU. The following piece of code does it in the I-pipe 
> patch for ARM with FCSE enabled:
> 
> +       down_read(&mm->mmap_sem);
> +       if (find_vma(mm, addr) == find_vma(mm, regs->ARM_sp))
> +               printk(KERN_INFO "FCSE: process %u(%s) probably overflowed stack at 0x%08lx.\n",
> +                      current->pid, current->comm, regs->ARM_pc);
> +       up_read(&mm->mmap_sem);
> 

My understanding is that such code detects faulty references within the
_valid_ address space, typically when hitting a page guard area. But I
guess that this won't work when treading on stack memory outside of the
address space, e.g. below the red zone, will it? AFAIU,
those things may happen when the heading space of preposterously large
stack-based objects is addressed.

> 
> > 
> >> Or we can install a handler for SIGSEGV which detects stack overflows
> >> (it will be a little harder than in kernel-space) and prints a clear
> >> message in that case but we will have to use an alternate stack for the
> >> signal handler (obviously, the SIGSEGV handler can not be stacked over
> >> the stack overflow).
> >>
> >> Or we can increase the default stack size, but in my view, we will only
> >> be delaying the problem a bit further down the "new users" development
> >> process.
> >>
> > 
> > I agree with your view here, but this also creates the requirement for
> > helping people to detect stack trashing early enough.
> > 
> 
> 

-- 
Philippe.





* Re: [Xenomai-help] native: A 32k stack is not always a 'reasonable' size
  2010-07-08  9:58                   ` Philippe Gerum
@ 2010-07-08 10:04                     ` Gilles Chanteperdrix
  2010-07-08 10:09                       ` Gilles Chanteperdrix
  2010-07-08 11:52                     ` Gilles Chanteperdrix
  1 sibling, 1 reply; 21+ messages in thread
From: Gilles Chanteperdrix @ 2010-07-08 10:04 UTC (permalink / raw)
  To: Philippe Gerum; +Cc: xenomai-help

Philippe Gerum wrote:
> On Thu, 2010-07-08 at 11:35 +0200, Gilles Chanteperdrix wrote:
>> Philippe Gerum wrote:
>>> On Thu, 2010-07-08 at 10:58 +0200, Gilles Chanteperdrix wrote:
>>>> Philippe Gerum wrote:
>>>>> On Thu, 2010-07-08 at 01:08 +0200, Gilles Chanteperdrix wrote:
>>>>>> Peter Soetens wrote:
>>>>>>> On Wed, Jul 7, 2010 at 11:19 PM, Gilles Chanteperdrix
>>>>>>> <gilles.chanteperdrix@xenomai.org> wrote:
>>>>>>>> Peter Soetens wrote:
>>>>>>>>> On Wed, Jul 7, 2010 at 11:06 AM, Gilles Chanteperdrix
>>>>>>>>> <gilles.chanteperdrix@xenomai.org> wrote:
>>>>>>>>>> Peter Soetens wrote:
>>>>>>>>>>> At least, not for Orocos applications. We've had hard to debug
>>>>>>>>>>> application segfaults that used just a 'little' bit more than 32k. We
>>>>>>>>>>> had to raise the stack size to 128k to get reliably through our
>>>>>>>>>>> application startup. I stem from the old 'mlockall ate my RAM'
>>>>>>>>>>> generation where we typically reduced stack sizes in order to have
>>>>>>>>>>> some crumbles left for the heap. But 32k wasn't really what we were
>>>>>>>>>>> aiming for.
>>>>>>>>>>>
>>>>>>>>>>> Maybe we should explicitly document the 32k limit and its limitations
>>>>>>>>>>> for certain applications...?
>>>>>>>>>> Again, things have been fixed in 2.5.3 with regard to stack sizes, could
>>>>>>>>>> you check that you have the same behaviour?
>>>>>>>>> I think we had, but I'm uncertain right now.
>>>>>>>>>
>>>>>>>>>> As for 32KiB, it is only a default stack size, it is only reasonable in
>>>>>>>>>> the sense that 2MiB is unreasonable on a low-end system. 32KiB was
>>>>>>>>>> picked because it allows printf to work. Now, whatever stack size we
>>>>>>>>>> choose, there will be applications which need more, this does not really
>>>>>>>>>> make the default unreasonable.
>>>>>>>>> I knew you would say that. It deserves an entry in the faq or some
>>>>>>>>> trouble shooting document though.
>>>>>>>> It is documented. For instance, rt_task_create says:
>>>>>>>> stksize         The size of the stack (in bytes) for the new task. If
>>>>>>>>                zero is passed, a reasonable pre-defined size will be substituted.
>>>>>>>>
>>>>>>>> What else can we say? Documenting that this size is 32 KiB would be
>>>>>>>> wrong, because we do not want applications to rely on a particular
>>>>>>>> value, in case we want to change it. And the fact that if your stack is
>>>>>>>> too small, you will get problems is kind of obvious. For anyone having
>>>>>>>> played with stack sizes with Linux or any proprietary RTOS, at least.
>>>>>>> And what with new RTOS/Xenomai users ?
>>>>>>>
>>>>>>> You have to take the user perspective here. The problem with stack
>>>>>>> overflows is that they occur when the development of a program has
>>>>>>> progressed a while and applications reached a certain level of
>>>>>>> complexity (otherwise the overflow wouldn't have happened in the first
>>>>>>> place). So it suddenly starts to segfault (from time to time). What he
>>>>>>> does is this: he fires up the debugger to get a backtrace, sees
>>>>>>> trouble and wrongly assumes that gdb can't really handle these Xenomai
>>>>>>> threads and tries to eliminate causes of the crashes.. 
>>>>>> Last time I tried, debugging a stack overflow with gdb was possible. You
>>>>>> can print the stack pointer and compare the value with the contents of
>>>>>> /proc/pid/maps.
>>>>>>
>>>>>> The user comes
>>>>>>> quickly to the conclusion that 'putting it all together' causes the
>>>>>>> crash (the single unit tests pass) and is looking for a software
>>>>>>> integration problem. In reality, it's the stack.
>>>>>>>
>>>>>>> If you've been through all this and then came to the correct
>>>>>>> conclusion the same day, you've been burnt before, or are the
>>>>>>> exception.
>>>>>>>
>>>>>>> In my view, 32k is a premature optimization. At least, it shows the
>>>>>>> side effects of one.
>>>>>> I guess you run Xenomai on one of these big irons, do you? Because if
>>>>>> you ran on a low-end machine, you would understand why we cannot
>>>>>> keep the 2MB default limit. 32 KiB looks already like a pretty large
>>>>>> limit, so, maybe there is a problem in your application?
>>>>>>
>>>>>> The I-pipe patch for ARM detects stack overflows, I guess we can modify
>>>>>> the kernel to do the same thing on all architectures.
>>>>>>
>>>>> Peter made a good point considering the various braindamage outcomes a
>>>>> stack smashing issue could trigger. I'm unsure whether anyone can
>>>>> immediately suspect a stack overflow to be the cause of any random
>>>>> application behavior; typically, that issue could cause a branch to any
>>>>> random IP value on x86 since the return address is living on the stack
>>>>> and could get trashed, but not necessarily on architectures with
>>>>> branch-and-link registers. In the former case, GDB is of little help,
>>>>> except for single-stepping until the offending statement is reached and
>>>>> we can observe the trashing live, which means that we actually did the
>>>>> work of spotting the issue manually.
>>>>>
>>>>> It turns out that people with large applications and lots of contexts
>>>>> often end up naked in the cold most of the time when facing those
>>>>> things, and the only option left to them is to go backward on the
>>>>> integration path, in order to find a possibly faulty component. Before
>>>>> people can reasonably compare %sp values, they need some help to narrow
>>>>> the search, otherwise, it's hopeless.
>>>>>
>>>>> To this end, maybe an option would be to enable gcc's
>>>>> -fstack-protector[-all] -fstack-check when the debug switch is given to
>>>>> the configure script, provided the compiler in use supports this.
>>>>>
>>>>> Granted, a stack overflow is not identical to a smashing, but quite
>>>>> often the stack memory unduly consumed by a thread belongs to some other
>>>>> memory object, and therefore usually gets trashed when that object is
>>>>> modified. At least, enabling some canary word checking in that case may
>>>>> help.
>>>> I do not think so. The glibc maps an unreadable/unwritable page below
>>>> the stack. So, what you get is a segmentation fault. Unless, of course,
>>>> you overflow more than one page. But we can map more than one page by
>>>> using pthread_attr_setguardsize, if one page is not enough.
>>> The page guard is restricted to MMU-enabled systems, we have two out of
>>> six of our architectures running without MMU. In this case, the only
>>> option left that may work is the stack protector based on the canary
>>> word checking.
>>>
>>> Relying on pthread_attr_setguardsize() when available will trigger the
>>> same amount of uncertainty as we have now with setting the minimum
>>> stack size. Which guard value would be a sane default? One, two, four
>>> pages?
>>>
>>>> We can detect the stack overflow in kernel-space, there it is easy to
>>>> detect, the problem is that x86 users, which are the ones more likely to
>>>> be hit by a stack overflow, may not be watching the console, so may not
>>>> see the message.
>>>>
>>> Kernel-space is another issue, people writing applications in kernel
>>> space are mostly on their own these days, and others implementing
>>> drivers are expected to always consider stack space as a scarce resource
>>> anyway. But helping with solving userland problems seems to be the most
>>> urgent thing to do, since common practices in that environment may
>>> conflict badly with real-time restrictions and requirements.
>> I mean detecting the user-space stack overflows when handling user-space
>> page faults in kernel-space. But granted, that also only works for 
>> systems with an MMU. The following piece of code does it in the I-pipe 
>> patch for ARM with FCSE enabled:
>>
>> +       down_read(&mm->mmap_sem);
>> +       if (find_vma(mm, addr) == find_vma(mm, regs->ARM_sp))
>> +               printk(KERN_INFO "FCSE: process %u(%s) probably overflowed stack at 0x%08lx.\n",
>> +                      current->pid, current->comm, regs->ARM_pc);
>> +       up_read(&mm->mmap_sem);
>>
> 
> My understanding is that such code detects faulty references within the
> _valid_ address space, typically when hitting a page guard area. But I
> guess that this won't work when treading on stack memory outside of the
> address space, e.g. below the red zone, will it? AFAIU,
> those things may happen when the heading space of preposterously large
> stack-based objects is addressed.

Yes, exactly, but that would have been enough to detect Peter's problem.
I thought gcc had an option to yell when the objects on stack grow
beyond some size, but I can not find it.

-- 
					    Gilles.



* Re: [Xenomai-help] native: A 32k stack is not always a 'reasonable' size
  2010-07-08 10:04                     ` Gilles Chanteperdrix
@ 2010-07-08 10:09                       ` Gilles Chanteperdrix
  0 siblings, 0 replies; 21+ messages in thread
From: Gilles Chanteperdrix @ 2010-07-08 10:09 UTC (permalink / raw)
  To: Philippe Gerum; +Cc: xenomai-help

Gilles Chanteperdrix wrote:
> Philippe Gerum wrote:
>> On Thu, 2010-07-08 at 11:35 +0200, Gilles Chanteperdrix wrote:
>>> +       down_read(&mm->mmap_sem);
>>> +       if (find_vma(mm, addr) == find_vma(mm, regs->ARM_sp))
>>> +               printk(KERN_INFO "FCSE: process %u(%s) probably overflowed stack at 0x%08lx.\n",
>>> +                      current->pid, current->comm, regs->ARM_pc);
>>> +       up_read(&mm->mmap_sem);
>>>
>> My understanding is that such code detects faulty references within the
>> _valid_ address space, typically when hitting a page guard area. But I
>> guess that this won't work when treading on stack memory outside of the
>> address space, e.g. below the red zone, will it? AFAIU,
>> those things may happen when the heading space of preposterously large
>> stack-based objects is addressed.
> 
> Yes, exactly, but that would have been enough to detect Peter's problem.
> I thought gcc had an option to yell when the objects on stack grow
> beyond some size, but I can not find it.

The option is -Wframe-larger-than=<bytes>; the kernel passes it when
CONFIG_FRAME_WARN is set to a non-zero value.
It seems to require gcc 4.4 though.
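
As an illustration of what this warning is meant to flag, here is a minimal sketch. The function name sum_scratch, the 64 KiB buffer and the 4096-byte threshold in the comment are all made up for the example; only the option itself comes from the discussion above.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical example: the 64 KiB local array alone is twice the size
 * of a 32 KiB task stack. Compiling this file with, e.g.,
 *     gcc -Wframe-larger-than=4096 -c frame.c
 * should make gcc warn about the frame size of this function.
 */
unsigned long sum_scratch(const unsigned char *src, size_t len)
{
        unsigned char scratch[64 * 1024];   /* lives on the caller's stack */
        unsigned long sum = 0;
        size_t i;

        if (len > sizeof(scratch))
                len = sizeof(scratch);
        memcpy(scratch, src, len);
        for (i = 0; i < len; i++)
                sum += scratch[i];
        return sum;
}
```

The warning is purely static: it looks at one frame at a time and says nothing about how deep the call chain gets, so it only helps with the "one huge object on the stack" kind of overflow.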

-- 
					    Gilles.



* Re: [Xenomai-help] native: A 32k stack is not always a 'reasonable' size
  2010-07-08  9:55                 ` Gilles Chanteperdrix
@ 2010-07-08 10:19                   ` Philippe Gerum
  2010-07-08 11:47                     ` Gilles Chanteperdrix
  0 siblings, 1 reply; 21+ messages in thread
From: Philippe Gerum @ 2010-07-08 10:19 UTC (permalink / raw)
  To: Gilles Chanteperdrix; +Cc: xenomai-help

On Thu, 2010-07-08 at 11:55 +0200, Gilles Chanteperdrix wrote:
> Philippe Gerum wrote:
> > On Thu, 2010-07-08 at 10:58 +0200, Gilles Chanteperdrix wrote:
> >> Philippe Gerum wrote:
> >>> On Thu, 2010-07-08 at 01:08 +0200, Gilles Chanteperdrix wrote:
> >>>> Peter Soetens wrote:
> >>>>> On Wed, Jul 7, 2010 at 11:19 PM, Gilles Chanteperdrix
> >>>>> <gilles.chanteperdrix@xenomai.org> wrote:
> >>>>>> Peter Soetens wrote:
> >>>>>>> On Wed, Jul 7, 2010 at 11:06 AM, Gilles Chanteperdrix
> >>>>>>> <gilles.chanteperdrix@xenomai.org> wrote:
> >>>>>>>> Peter Soetens wrote:
> >>>>>>>>> At least, not for Orocos applications. We've had hard to debug
> >>>>>>>>> application segfaults that used just a 'little' bit more than 32k. We
> >>>>>>>>> had to raise the stack size to 128k to get reliably through our
> >>>>>>>>> application startup. I stem from the old 'mlockall ate my RAM'
> >>>>>>>>> generation where we typically reduced stack sizes in order to have
> >>>>>>>>> some crumbles left for the heap. But 32k wasn't really what we were
> >>>>>>>>> aiming for.
> >>>>>>>>>
> >>>>>>>>> Maybe we should explicitly document the 32k limit and its limitations
> >>>>>>>>> for certain applications...?
> >>>>>>>> Again, things have been fixed in 2.5.3 with regard to stack sizes, could
> >>>>>>>> you check that you have the same behaviour?
> >>>>>>> I think we had, but I'm uncertain right now.
> >>>>>>>
> >>>>>>>> As for 32KiB, it is only a default stack size, it is only reasonable in
> >>>>>>>> the sense that 2MiB is unreasonable on a low-end system. 32KiB was
> >>>>>>>> picked because it allows printf to work. Now, whatever stack size we
> >>>>>>>> choose, there will be applications which need more, this does not really
> >>>>>>>> make the default unreasonable.
> >>>>>>> I knew you would say that. It deserves an entry in the faq or some
> >>>>>>> trouble shooting document though.
> >>>>>> It is documented. For instance, rt_task_create says:
> >>>>>> stksize         The size of the stack (in bytes) for the new task. If
> >>>>>>                zero is passed, a reasonable pre-defined size will be substituted.
> >>>>>>
> >>>>>> What else can we say? Documenting that this size is 32 KiB would be
> >>>>>> wrong, because we do not want applications to rely on a particular
> >>>>>> value, in case we want to change it. And the fact that if your stack is
> >>>>>> too small, you will get problems is kind of obvious. For anyone having
> >>>>>> played with stack sizes with Linux or any proprietary RTOS, at least.
> >>>>> And what with new RTOS/Xenomai users ?
> >>>>>
> >>>>> You have to take the user perspective here. The problem with stack
> >>>>> overflows is that they occur when the development of a program has
> >>>>> progressed a while and applications reached a certain level of
> >>>>> complexity (otherwise the overflow wouldn't have happened in the first
> >>>>> place). So it suddenly starts to segfault (from time to time). What he
> >>>>> does is this: he fires up the debugger to get a backtrace, sees
> >>>>> trouble and wrongly assumes that gdb can't really handle these Xenomai
> >>>>> threads and tries to eliminate causes of the crashes.. 
> >>>> Last time I tried, debugging a stack overflow with gdb was possible. You
> >>>> can print the stack pointer and compare the value with the contents of
> >>>> /proc/pid/maps.
> >>>>
> >>>> The user comes
> >>>>> quickly to the conclusion that 'putting it all together' causes the
> >>>>> crash (the single unit tests pass) and is looking for a software
> >>>>> integration problem. In reality, it's the stack.
> >>>>>
> >>>>> If you've been through all this and then came to the correct
> >>>>> conclusion the same day, you've been burnt before, or are the
> >>>>> exception.
> >>>>>
> >>>>> In my view, 32k is a premature optimization. At least, it shows the
> >>>>> side effects of one.
> >>>> I guess you run Xenomai on one of these big irons, do you? Because if
> >>>> you ran on a low-end machine, you would understand why we cannot
> >>>> keep the 2MB default limit. 32 KiB looks already like a pretty large
> >>>> limit, so, maybe there is a problem in your application?
> >>>>
> >>>> The I-pipe patch for ARM detects stack overflows, I guess we can modify
> >>>> the kernel to do the same thing on all architectures.
> >>>>
> >>> Peter made a good point considering the various braindamage outcomes a
> >>> stack smashing issue could trigger. I'm unsure whether anyone can
> >>> immediately suspect a stack overflow to be the cause of any random
> >>> application behavior; typically, that issue could cause a branch to any
> >>> random IP value on x86 since the return address is living on the stack
> >>> and could get trashed, but not necessarily on architectures with
> >>> branch-and-link registers. In the former case, GDB is of little help,
> >>> except for single-stepping until the offending statement is reached and
> >>> we can observe the trashing live, which means that we actually did the
> >>> work of spotting the issue manually.
> >>>
> >>> It turns out that people with large applications and lots of contexts
> >>> often end up naked in the cold most of the time when facing those
> >>> things, and the only option left to them is to go backward on the
> >>> integration path, in order to find a possibly faulty component. Before
> >>> people can reasonably compare %sp values, they need some help to narrow
> >>> the search, otherwise, it's hopeless.
> >>>
> >>> To this end, maybe an option would be to enable gcc's
> >>> -fstack-protector[-all] -fstack-check when the debug switch is given to
> >>> the configure script, provided the compiler in use supports this.
> >>>
> >>> Granted, a stack overflow is not identical to a smashing, but quite
> >>> often the stack memory unduly consumed by a thread belongs to some other
> >>> memory object, and therefore usually gets trashed when that object is
> >>> modified. At least, enabling some canary word checking in that case may
> >>> help.
> >> I do not think so. The glibc maps an unreadable/unwritable page below
> >> the stack. So, what you get is a segmentation fault. Unless, of course,
> >> you overflow more than one page. But we can map more than one page by
> >> using pthread_attr_setguardsize, if one page is not enough.
> > 
> > Actually, I guess that the stack guard area will not be contiguous to
> > any valid page in most cases, so the size of that area should not be the
> > main issue; i.e. at worst, the code would write to an unmapped address
> > and raise a fault the same way. But despite this, identifying whether we
> > had a stack overflow is still a pain, because that situation sometimes
> > deeply confuses GDB. Or confuses the developer because function
> > prologues and other hidden code do refer to stack memory, so unless we
> > trace the program at instruction level, in single-stepping mode, we are
> > toast.
> 
> Unfortunately, the thread stacks get allocated with mmap, so, they all
> get "stacked", no pun intended. They are only separated with the guard
> pages, so, yes, if you overflow badly, you may overwrite another
> thread's stack. And since you will have a tendency to overflow the other
> thread's stack top, it will take some time before you detect the overrun.
> 

If I understand the glibc code properly, the stack cache is not
pre-filled, but merely serves to recycle old stacks from terminated
threads. So, at least until a stack area could actually be reused from
that cache, fresh new stack space for new threads is always obtained via
mmap(), which means that we may have non-contiguous stack spaces most of
the time. It seems that things would start to hit the crapper when some
recycling takes place, in which case an overflow situation could cause a
stack to overflow on its neighbor.


-- 
Philippe.





* Re: [Xenomai-help] native: A 32k stack is not always a 'reasonable' size
  2010-07-08 10:19                   ` Philippe Gerum
@ 2010-07-08 11:47                     ` Gilles Chanteperdrix
  2010-07-08 15:01                       ` Philippe Gerum
  0 siblings, 1 reply; 21+ messages in thread
From: Gilles Chanteperdrix @ 2010-07-08 11:47 UTC (permalink / raw)
  To: Philippe Gerum; +Cc: xenomai-help

Philippe Gerum wrote:
> If I understand the glibc code properly, the stack cache is not
> pre-filled, but merely serves to recycle old stacks from terminated
> threads. So, at least until a stack area could actually be reused from
> that cache, fresh new stack space for new threads is always obtained via
> mmap(), which means that we may have non-contiguous stack spaces most of
> the time. It seems that things would start to hit the crapper when some
> recycling takes place, in which case an overflow situation could cause a
> stack to overflow on its neighbor.

I am not sure I understand what you mean. So, I am going to try and show
you what I mean. I run the following program:

#include <stdio.h>

#include <pthread.h>
#include <unistd.h>

void *thread(void *cookie)
{
        int x;
        printf("sp: %p\n", &x);
        pause();
        return cookie;
}

int main(void)
{
        pthread_t ida, idb;
        pthread_create(&ida, NULL, thread, NULL);
        pthread_create(&idb, NULL, thread, NULL);
        pthread_join(ida, NULL);
        return 0;
}

On an ARMv7 (no FCSE involved) platform. It prints:
sp: 0x411a2ddc
sp: 0x409a2ddc

I then dump the process mappings, and I get everything contiguous:
401a4000-401a5000 ---p 00000000 00:00 0
401a5000-409a4000 rw-p 00000000 00:00 0
409a4000-409a5000 ---p 00000000 00:00 0
409a5000-411a4000 rw-p 00000000 00:00 0

So, it looks to me like if the thread with the highest stack address goes
below the guard page, it will overrun the other thread's stack.

On x86, this is a different story. I guess because the kernel or glibc
has a stack top randomization strategy.
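
To make the guard-page point concrete, here is a minimal sketch, in plain POSIX threads rather than the Xenomai native API, of starting a thread with an explicit stack size and a wider guard area through pthread_attr_setguardsize(). The helper names spawn_with_guard and mark_ran, and the sizes used in the note below, are invented for illustration. A wider guard only catches overflows that land inside it.

```c
#include <pthread.h>
#include <stddef.h>

/*
 * Start fn(arg) on a stack of stack_bytes, with guard_bytes of guard
 * area requested below it. Returns 0 or an errno value.
 * Hypothetical helper, not part of any library.
 */
int spawn_with_guard(pthread_t *tid, void *(*fn)(void *), void *arg,
                     size_t stack_bytes, size_t guard_bytes)
{
        pthread_attr_t attr;
        int ret;

        ret = pthread_attr_init(&attr);
        if (ret)
                return ret;
        ret = pthread_attr_setstacksize(&attr, stack_bytes);
        if (!ret)
                ret = pthread_attr_setguardsize(&attr, guard_bytes);
        if (!ret)
                ret = pthread_create(tid, &attr, fn, arg);
        pthread_attr_destroy(&attr);
        return ret;
}

/* Trivial thread body for trying the helper: sets the int behind arg to 1. */
void *mark_ran(void *arg)
{
        *(int *)arg = 1;
        return NULL;
}
```

For instance, spawn_with_guard(&tid, mark_ran, &flag, 128 * 1024, 4 * 4096) asks for a 128 KiB stack with 16 KiB of guard. An access that jumps further than the guard in one go can still land in whatever is mapped next.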

-- 
					    Gilles.



* Re: [Xenomai-help] native: A 32k stack is not always a 'reasonable' size
  2010-07-08  9:58                   ` Philippe Gerum
  2010-07-08 10:04                     ` Gilles Chanteperdrix
@ 2010-07-08 11:52                     ` Gilles Chanteperdrix
  1 sibling, 0 replies; 21+ messages in thread
From: Gilles Chanteperdrix @ 2010-07-08 11:52 UTC (permalink / raw)
  To: Philippe Gerum; +Cc: xenomai-help

Philippe Gerum wrote:
> On Thu, 2010-07-08 at 11:35 +0200, Gilles Chanteperdrix wrote:
>> +       down_read(&mm->mmap_sem);
>> +       if (find_vma(mm, addr) == find_vma(mm, regs->ARM_sp))
>> +               printk(KERN_INFO "FCSE: process %u(%s) probably overflowed stack at 0x%08lx.\n",
>> +                      current->pid, current->comm, regs->ARM_pc);
>> +       up_read(&mm->mmap_sem);
>>
> 
> My understanding is that such code detects faulty references within the
> _valid_ address space, typically when hitting a page guard area. But I
> guess that this won't work when treading on stack memory outside of the
> address space, e.g. below the red zone, will it? AFAIU,
> those things may happen when the heading space of preposterously large
> stack-based objects is addressed.

We only get the case where addr and sp are both in the guard page, or
both in a memory mapping hole. We can improve a bit by trying:

	if (!find_vma(mm, regs->ARM_sp) ||
		find_vma(mm, addr) == find_vma(mm, regs->ARM_sp))

We will also catch the case where addr is in the guard page, whereas sp
is in a memory mapping hole. But as I said in the other mail I just
sent, this will only work on machines with holes between thread stacks.

-- 
					    Gilles.



* Re: [Xenomai-help] native: A 32k stack is not always a 'reasonable' size
  2010-07-08 11:47                     ` Gilles Chanteperdrix
@ 2010-07-08 15:01                       ` Philippe Gerum
  2010-07-08 16:33                         ` Gilles Chanteperdrix
  0 siblings, 1 reply; 21+ messages in thread
From: Philippe Gerum @ 2010-07-08 15:01 UTC (permalink / raw)
  To: Gilles Chanteperdrix; +Cc: xenomai-help

On Thu, 2010-07-08 at 13:47 +0200, Gilles Chanteperdrix wrote:
> Philippe Gerum wrote:
> > If I understand the glibc code properly, the stack cache is not
> > pre-filled, but merely serves to recycle old stacks from terminated
> > threads. So, at least until a stack area could actually be reused from
> > that cache, fresh new stack space for new threads is always obtained via
> > mmap(), which means that we may have non-contiguous stack spaces most of
> > the time. It seems that things would start to hit the crapper when some
> > recycling takes place, in which case an overflow situation could cause a
> > stack to overflow on its neighbor.
> 
> I am not sure I understand what you mean. So, I am going to try and show
> you what I mean. I run the following program:
> 
> #include <stdio.h>
> 
> #include <pthread.h>
> #include <unistd.h>
> 
> void *thread(void *cookie)
> {
>         int x;
>         printf("sp: %p\n", &x);
>         pause();
>         return cookie;
> }
> 
> int main(void)
> {
>         pthread_t ida, idb;
>         pthread_create(&ida, NULL, thread, NULL);
>         pthread_create(&idb, NULL, thread, NULL);
>         pthread_join(ida, NULL);
>         return 0;
> }
> 
> On an ARMv7 (no FCSE involved) platform. It prints:
> sp: 0x411a2ddc
> sp: 0x409a2ddc
> 
> I then dump the process mappings, and I get everything contiguous:
> 401a4000-401a5000 ---p 00000000 00:00 0
> 401a5000-409a4000 rw-p 00000000 00:00 0
> 409a4000-409a5000 ---p 00000000 00:00 0
> 409a5000-411a4000 rw-p 00000000 00:00 0
> 
> So, it looks to me like if the thread with the highest stack address goes
> below the guard page, it will overrun the other thread's stack.

I mean that glibc does not pre-allocate pieces of anon memory to honor
requests for stack chunks, it gets them on the fly from an internal
cache if one matches, or mmaps it. Besides, the cache itself is only
fed with recycled stacks from terminated threads it seems, so we can't
predict whether all stacks there would be contiguous.

For instance, I'm assuming that tweaking your code like below would
likely prevent the stack segments from being contiguous:

         pthread_create(&ida, NULL, thread, NULL);
+        mmap(NULL, 8*1024*1024, PROT_READ, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
         pthread_create(&idb, NULL, thread, NULL);
         pthread_join(ida, NULL);

If so, it is indeed likely that segments would be contiguous if threads
are started the way you did; on the other hand, it is possible that a
more complex application does not suffer this. Granted, this does not
help us that much anyway.

My point is that nothing guarantees us either contiguous or sparse stack
address ranges, so we probably should not rely on those assumptions.
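
Since the layout cannot be assumed, one alternative is to ask the threading library where a given stack actually lives. The sketch below does that with pthread_getattr_np(), which is a GNU extension, so this is glibc-specific and not part of the Xenomai API. The helper names current_stack_bounds and within_stack are hypothetical.

```c
#define _GNU_SOURCE
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Fill in the lowest address and the size of the calling thread's
 * stack. Returns 0 or an errno value.
 */
int current_stack_bounds(void **lo, size_t *size)
{
        pthread_attr_t attr;
        int ret;

        ret = pthread_getattr_np(pthread_self(), &attr);
        if (ret)
                return ret;
        ret = pthread_attr_getstack(&attr, lo, size);
        pthread_attr_destroy(&attr);
        return ret;
}

/* Return 1 if p lies inside [lo, lo + size), 0 otherwise. */
int within_stack(const void *p, const void *lo, size_t size)
{
        uintptr_t a = (uintptr_t)p;
        uintptr_t base = (uintptr_t)lo;

        return a >= base && a - base < size;
}
```

A thread can call current_stack_bounds() once at startup and later check a suspicious address (or the address of a local variable) with within_stack(), instead of reasoning about where neighbouring mappings happen to be.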

> 
> On x86, this is a different story. I guess because the kernel or glibc
> has a stack top randomization strategy.
> 

-- 
Philippe.





* Re: [Xenomai-help] native: A 32k stack is not always a 'reasonable' size
  2010-07-08 15:01                       ` Philippe Gerum
@ 2010-07-08 16:33                         ` Gilles Chanteperdrix
  0 siblings, 0 replies; 21+ messages in thread
From: Gilles Chanteperdrix @ 2010-07-08 16:33 UTC (permalink / raw)
  To: Philippe Gerum; +Cc: xenomai-help

Philippe Gerum wrote:
> I mean that glibc does not pre-allocate pieces of anon memory to honor
> requests for stack chunks, it gets them on the fly from an internal
> cache if one matches, or mmaps it. Besides, the cache itself is only
> fed with recycled stacks from terminated threads it seems, so we can't
> predict whether all stacks there would be contiguous.
> 
> For instance, I'm assuming that tweaking your code like below would
> likely prevent the stack segments from being contiguous:
> 
>         pthread_create(&ida, NULL, thread, NULL);
>       + mmap(NULL, 8*1024*1024, PROT_READ, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
>         pthread_create(&idb, NULL, thread, NULL);
>         pthread_join(ida, NULL);
> 
> If so, it is indeed likely that segments would be contiguous if threads
> are started the way you did; on the other hand, it is possible that a
> more complex application does not suffer this. Granted, this does not
> help us that much anyway.
> 
> My point is that nothing guarantees us either contiguous or sparse stack
> address ranges, so we probably should not rely on those assumptions.

So the worst case, in the event of a massive stack overflow or on a
system without an MMU, is silent corruption of unrelated data.

I am not sure what we can do about that. I am not sure that
-fstack-protector/-fstack-check is a solution either.
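
One thing an application can at least do for itself is to ask for its
stack size and guard area explicitly instead of relying on defaults. A
minimal sketch, assuming plain POSIX threads rather than the native
skin; create_with_stack() and touch_stack() are hypothetical names used
only for this illustration. A wider guard area makes it less likely
that a large frame jumps clean over the guard, but it does not rule it
out, and it does nothing on a system without an MMU.

```c
#include <pthread.h>
#include <stddef.h>

/*
 * Create a thread with an explicit stack size and guard area.
 * Returns 0 on success, or the error code of the failing pthread call.
 */
int create_with_stack(pthread_t *tid, void *(*fn)(void *), void *arg,
		      size_t stack_size, size_t guard_size)
{
	pthread_attr_t attr;
	int ret;

	ret = pthread_attr_init(&attr);
	if (ret)
		return ret;
	/* Fails with EINVAL if stack_size is below PTHREAD_STACK_MIN. */
	ret = pthread_attr_setstacksize(&attr, stack_size);
	if (!ret)
		ret = pthread_attr_setguardsize(&attr, guard_size);
	if (!ret)
		ret = pthread_create(tid, &attr, fn, arg);
	pthread_attr_destroy(&attr);
	return ret;
}

/* Example thread body: touches 64k of stack, one byte per 4k step. */
void *touch_stack(void *arg)
{
	volatile char buf[64 * 1024];
	size_t i;

	for (i = 0; i < sizeof(buf); i += 4096)
		buf[i] = 1;
	return arg;
}
```

For instance, create_with_stack(&tid, touch_stack, NULL, 128 * 1024,
16 * 1024) asks for the 128k stack mentioned at the start of this
thread together with a 16k guard area; the 16k figure is an arbitrary
choice for the example.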

-- 
					    Gilles.



* Re: [Xenomai-help] native: A 32k stack is not always a 'reasonable' size
  2010-07-06 19:25 [Xenomai-help] native: A 32k stack is not always a 'reasonable' size Peter Soetens
  2010-07-07  9:06 ` Gilles Chanteperdrix
@ 2010-07-11 13:15 ` Gilles Chanteperdrix
  1 sibling, 0 replies; 21+ messages in thread
From: Gilles Chanteperdrix @ 2010-07-11 13:15 UTC (permalink / raw)
  To: Peter Soetens; +Cc: xenomai-help

Peter Soetens wrote:
> PS: can anyone allow 'sspr' (=me) to edit/add stuff on the wiki ?

Should be done now.

-- 
					    Gilles.



Thread overview: 21+ messages
2010-07-06 19:25 [Xenomai-help] native: A 32k stack is not always a 'reasonable' size Peter Soetens
2010-07-07  9:06 ` Gilles Chanteperdrix
2010-07-07 20:57   ` Peter Soetens
2010-07-07 21:19     ` Gilles Chanteperdrix
2010-07-07 22:31       ` Peter Soetens
2010-07-07 23:08         ` Gilles Chanteperdrix
2010-07-08  8:37           ` Philippe Gerum
2010-07-08  8:58             ` Gilles Chanteperdrix
2010-07-08  9:31               ` Philippe Gerum
2010-07-08  9:35                 ` Gilles Chanteperdrix
2010-07-08  9:58                   ` Philippe Gerum
2010-07-08 10:04                     ` Gilles Chanteperdrix
2010-07-08 10:09                       ` Gilles Chanteperdrix
2010-07-08 11:52                     ` Gilles Chanteperdrix
2010-07-08  9:50               ` Philippe Gerum
2010-07-08  9:55                 ` Gilles Chanteperdrix
2010-07-08 10:19                   ` Philippe Gerum
2010-07-08 11:47                     ` Gilles Chanteperdrix
2010-07-08 15:01                       ` Philippe Gerum
2010-07-08 16:33                         ` Gilles Chanteperdrix
2010-07-11 13:15 ` Gilles Chanteperdrix
