* [Xenomai] t_suspend and XNBREAK
@ 2013-10-09 16:29 Daniel Merrill
  2013-10-09 16:33 ` Philippe Gerum
  0 siblings, 1 reply; 18+ messages in thread
From: Daniel Merrill @ 2013-10-09 16:29 UTC (permalink / raw)
  To: xenomai

All,

 

I'm hoping someone can shed a little more light on an issue we see
occasionally. Our code using the psos+ skin will sometimes fail a
t_suspend(0) with error code -4, which I found to be EINTR and which
appears to be returned when the XNBREAK flag is set. After digging around
in the documentation I found some references that seem to indicate that
this means the thread was forcibly unblocked for some reason. Is there some
way to diagnose what caused this? I'm having trouble pinpointing anything.
When debugging, it appears that the thread never really suspends at all but
returns immediately from the call. Does anyone have pointers on a good
place to start looking for the culprit? Thanks in advance.
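
For anyone decoding the status by hand: the -4 above follows the Linux
convention of returning a negated errno value, so it can be looked up with
the standard C library. A minimal sketch, assuming a Linux host;
status_text() is a hypothetical helper for illustration and is not part of
the psos+ skin.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

/* Hypothetical helper: turn a status such as -4 into its errno text.
 * Accepts either sign, since some APIs return the positive value. */
static const char *status_text(long status)
{
	if (status == 0)
		return "success";
	if (status < 0)
		status = -status;
	/* Sanity bound: Linux errno values stay well below 4096. */
	assert(status < 4096);
	return strerror((int)status);
}
```

Calling status_text(-4) gives the same text as strerror(EINTR) on Linux,
which is a quick way to confirm which error a skin call is reporting.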

 

 

Dan Merrill



* Re: [Xenomai] t_suspend and XNBREAK
  2013-10-09 16:29 [Xenomai] t_suspend and XNBREAK Daniel Merrill
@ 2013-10-09 16:33 ` Philippe Gerum
  2013-10-09 16:37   ` Daniel Merrill
  0 siblings, 1 reply; 18+ messages in thread
From: Philippe Gerum @ 2013-10-09 16:33 UTC (permalink / raw)
  To: Daniel Merrill, xenomai

On 10/09/2013 06:29 PM, Daniel Merrill wrote:
> All,
>
>
>
> I'm hoping maybe someone can shed a little more light on the issue we see
> occasionally. Occasionally our code using the psos+ skin will fail a
> t_suspend(0) with error code -4, which I found to be EINTR and appears to
> be set if the XNBREAK flag is set. After digging around in the
> documentation I found some references that seem to indicate that this
> means the thread was forcibly unblocked for some reason. Is there some way
> to diagnose what caused this (I'm having trouble pinpointing anything)? It
> appears when debugging that the thread never really suspends at all but
> returns immediately from the call. Does anyone have some pointers on what
> might be a good place to start looking for the culprit? Thanks in advance.

Are you tracing the application with GDB?

>
>
>
>
>
> Dan Merrill
>
> _______________________________________________
> Xenomai mailing list
> Xenomai@xenomai.org
> http://www.xenomai.org/mailman/listinfo/xenomai
>


-- 
Philippe.



* Re: [Xenomai] t_suspend and XNBREAK
  2013-10-09 16:33 ` Philippe Gerum
@ 2013-10-09 16:37   ` Daniel Merrill
  2013-10-09 16:43     ` Philippe Gerum
  0 siblings, 1 reply; 18+ messages in thread
From: Daniel Merrill @ 2013-10-09 16:37 UTC (permalink / raw)
  To: 'Philippe Gerum', xenomai

> On 10/09/2013 06:29 PM, Daniel Merrill wrote:
>> All,
>>
>> I'm hoping maybe someone can shed a little more light on the issue we
>> see occasionally. Occasionally our code using the psos+ skin will fail
>> a t_suspend(0) with error code -4, which I found to be EINTR and appears
>> to be set if the XNBREAK flag is set. After digging around in the
>> documentation I found some references that seem to indicate that this
>> means the thread was forcibly unblocked for some reason. Is there some
>> way to diagnose what caused this (I'm having trouble pinpointing
>> anything)? It appears when debugging that the thread never really
>> suspends at all but returns immediately from the call. Does anyone
>> have some pointers on what might be a good place to start looking for
>> the culprit? Thanks in advance.
>
> Are you tracing the application with GDB?

We are using GDB to diagnose problems, can this cause the issue?

* Re: [Xenomai] t_suspend and XNBREAK
  2013-10-09 16:37   ` Daniel Merrill
@ 2013-10-09 16:43     ` Philippe Gerum
  2013-10-09 17:01       ` Daniel Merrill
  0 siblings, 1 reply; 18+ messages in thread
From: Philippe Gerum @ 2013-10-09 16:43 UTC (permalink / raw)
  To: Daniel Merrill, xenomai

On 10/09/2013 06:37 PM, Daniel Merrill wrote:
>> Are you tracing the application with GDB?
>
> We are using GDB to diagnose problems, can this cause the issue?

It should not, but one of the reasons for a thread to get forcibly 
unblocked is to receive a regular Linux signal when blocked in primary 
mode. Since GDB does send quite a few signals to the application when 
single-stepping/breakpointing the code, this information may be useful 
to know. In short, GDB/ptracing might magnify a bug in this area.

If this issue also happens with no ptracing, then some other source 
kicked the thread out of wait state, and we'd have to instrument the 
code to know who does this.
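
The general mechanism, a signal forcing a blocked call to return early
with EINTR, can be seen with plain POSIX calls. The following is only a
Linux-side analogue under stated assumptions: it uses nanosleep() and
SIGALRM rather than a Xenomai primary-mode wait, and on_alarm() and
sleep_until_signalled() are hypothetical names.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <time.h>
#include <unistd.h>

static volatile sig_atomic_t alarms_seen;

/* The handler only has to exist so that SIGALRM is caught, not fatal. */
static void on_alarm(int sig)
{
	(void)sig;
	alarms_seen++;
}

/* Block for up to 5 s while SIGALRM is scheduled after about 1 s.
 * Returns 0 if the sleep completed, the errno value if it was cut
 * short, or -1 if the handler could not be installed. */
static int sleep_until_signalled(void)
{
	struct sigaction sa;
	struct timespec req = { .tv_sec = 5, .tv_nsec = 0 };

	memset(&sa, 0, sizeof(sa));
	sa.sa_handler = on_alarm;
	sigemptyset(&sa.sa_mask);
	if (sigaction(SIGALRM, &sa, NULL) != 0)
		return -1;

	alarms_seen = 0;
	alarm(1);

	if (nanosleep(&req, NULL) == 0)
		return 0;

	/* An EINTR result implies the handler actually ran. */
	assert(errno != EINTR || alarms_seen > 0);
	return errno;
}
```

Here the sleeper comes back after roughly one second with EINTR instead of
sleeping the full five, which is the same shape of failure as a wait that
"returns immediately" once something has kicked the thread.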

-- 
Philippe.



* Re: [Xenomai] t_suspend and XNBREAK
  2013-10-09 16:43     ` Philippe Gerum
@ 2013-10-09 17:01       ` Daniel Merrill
  2013-10-16 18:30         ` Daniel Merrill
  0 siblings, 1 reply; 18+ messages in thread
From: Daniel Merrill @ 2013-10-09 17:01 UTC (permalink / raw)
  To: 'Philippe Gerum', xenomai

>> We are using GDB to diagnose problems, can this cause the issue?
>
> It should not, but one of the reasons for a thread to get forcibly
> unblocked is to receive a regular Linux signal when blocked in primary
> mode. Since GDB does send quite a few signals to the application when
> single-stepping/breakpointing the code, this information may be useful to
> know. In short, GDB/ptracing might magnify a bug in this area.
>
> If this issue also happens with no ptracing, then some other source kicked
> the thread out of wait state, and we'd have to instrument the code to know
> who does this.

I believe we do see it more often when we are using GDB. I know of one
specific case that happens under gdb but does not happen if we run with no
debugger. In that case it doesn't matter whether we are single-stepping or
not, with breakpoints or without: the mere fact that gdb is attached seems
to trigger the issue. We haven't been able to tell whether gdb itself is
directly causing the issue or whether attaching gdb causes some other side
effect (maybe timing related) that then makes the problem appear.



* Re: [Xenomai] t_suspend and XNBREAK
  2013-10-09 17:01       ` Daniel Merrill
@ 2013-10-16 18:30         ` Daniel Merrill
  2013-10-21 18:21           ` Daniel Merrill
  2014-05-20 12:52           ` Philippe Gerum
  0 siblings, 2 replies; 18+ messages in thread
From: Daniel Merrill @ 2013-10-16 18:30 UTC (permalink / raw)
  To: 'Philippe Gerum', xenomai



So I've been playing around trying to pinpoint the issue. It does appear to
be caused only by GDB, which is unfortunate because it makes it harder for
us to find issues with our platform, but we can live with it if we have to.
My question now is this: is there any way to clear XNBREAK? It seems that
once the issue happens, no thread will suspend correctly, and in our case a
few get stuck in an infinite loop since they were depending on the suspend
to keep them from doing so. Thanks again.
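
Until the root cause is known, one application-side way to limit the
damage is to stop treating a failed suspend as "cannot happen" and to
bound the retries explicitly. This is a sketch under stated assumptions,
not a way to clear XNBREAK: retry_on_eintr() is a hypothetical helper, and
fake_suspend() is a stand-in for the real blocking call so the example
runs without Xenomai.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

typedef long (*blocking_call_t)(void *arg);

/* Hypothetical helper: call op() again while it reports -EINTR, at most
 * max_tries times. Returns the last status so the caller can decide
 * what to do if the call keeps getting interrupted. */
static long retry_on_eintr(blocking_call_t op, void *arg, int max_tries)
{
	long status = -EINTR;
	int i;

	assert(op != NULL && max_tries > 0);
	for (i = 0; i < max_tries && status == -EINTR; i++)
		status = op(arg);
	return status;
}

/* Stand-in for the blocking call: fails with -EINTR as many times as
 * the counter says, then succeeds. */
static long fake_suspend(void *arg)
{
	int *failures_left = arg;

	if (*failures_left > 0) {
		(*failures_left)--;
		return -EINTR;
	}
	return 0;
}
```

If the status is still -EINTR after the last try, the caller can log it
and block on something else (a sleep, for instance) instead of falling
through into a busy loop.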


* Re: [Xenomai] t_suspend and XNBREAK
  2013-10-16 18:30         ` Daniel Merrill
@ 2013-10-21 18:21           ` Daniel Merrill
  2013-10-22 13:46             ` Philippe Gerum
  2014-05-20 12:52           ` Philippe Gerum
  1 sibling, 1 reply; 18+ messages in thread
From: Daniel Merrill @ 2013-10-21 18:21 UTC (permalink / raw)
  To: 'Philippe Gerum', xenomai



More follow-up on this: we went ahead and put some logging in shadow.c,
which from what we could find is where the signal is "kicking" the thread.
From the logging, it looks like the only signals we get (while attached to
GDB) are SIGSTOP, SIGTRAP, SIGRT32 and SIGKILL (upon exiting the debugger).
I'm assuming SIGSTOP, SIGTRAP and SIGKILL are normal from the debugger. It
looks like shadow.c checks for SIGTRAP and SIGSTOP and sets an XNDEBUG
state on the thread, which I assume allows it to restart the suspend?
SIGRT32, I believe, comes from our calls to t_delete. I'm guessing this is
what's causing the suspends to fail? Anyway, I appreciate any additional
insight anyone can offer. Thanks again for all the help.
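
When reading such a log it helps to know which signal numbers the C
library keeps for itself. On Linux the kernel's real-time signals start at
32, but glibc reserves the lowest ones for its threading implementation,
so the SIGRTMIN seen by applications is higher. A minimal sketch, assuming
Linux with glibc; classify_signal() and the ORIGIN_* names are hypothetical
and only illustrate the numbering.

```c
#include <assert.h>
#include <signal.h>

/* First real-time signal number as seen by the Linux kernel. */
#define KERNEL_RT_BASE 32

enum sig_origin {
	ORIGIN_INVALID,
	ORIGIN_STANDARD,	/* classic signals such as SIGSTOP, SIGTRAP */
	ORIGIN_LIBC_RESERVED,	/* real-time numbers the C library keeps */
	ORIGIN_REALTIME		/* SIGRTMIN..SIGRTMAX, usable by applications */
};

static enum sig_origin classify_signal(int sig)
{
	/* The ranges below assume the library never hands out signals
	 * below the kernel's real-time base. */
	assert(SIGRTMIN >= KERNEL_RT_BASE);

	if (sig < 1 || sig > SIGRTMAX)
		return ORIGIN_INVALID;
	if (sig < KERNEL_RT_BASE)
		return ORIGIN_STANDARD;
	if (sig < SIGRTMIN)
		return ORIGIN_LIBC_RESERVED;
	return ORIGIN_REALTIME;
}
```

With this, signal 32 in a kernel-side log lands in the library-reserved
range rather than among the signals an application would send itself.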

Dan Merrill



* Re: [Xenomai] t_suspend and XNBREAK
  2013-10-21 18:21           ` Daniel Merrill
@ 2013-10-22 13:46             ` Philippe Gerum
  2013-10-22 14:49               ` Gilles Chanteperdrix
  2013-10-22 16:50               ` Daniel Merrill
  0 siblings, 2 replies; 18+ messages in thread
From: Philippe Gerum @ 2013-10-22 13:46 UTC (permalink / raw)
  To: Daniel Merrill, xenomai

On 10/21/2013 08:21 PM, Daniel Merrill wrote:

> More follow up on this, we went ahead and put some logging in shadow.c
> which from what we could find is where the signal is "kicking" the thread.
> From the logging it looks like the only signals we get (while attached to
> GDB) are SIGSTOP, SIGTRAP, SIGRT32 and SIGKILL(upon exiting the debugger).
> I'm assuming the SIGSTOP, SIGTRAP and SIGKILL are normal from the
> debugger. It looks like shadow.c looks for SIGTRAP and SIGSTOP and sets an
> XNDEBUG state on the thread which I assume allows it to restart the
> suspend?

XNDEBUG marks a thread which is ptraced; this has implications when 
managing the system timer while the app is single-stepped/stopped by a 
debugger.

> SIGRT32 I believe comes from our calls to t_delete. I'm guessing
> this is what's causing the suspends to fail? Anyway, I appreciate any
> additional insight anyone can offer. Thanks again for all the help.
>

t_delete() will cause t_suspend() to unblock if sent to the suspended 
task, due to receiving SIGRT32/SIGCANCEL from the Linux side, which is 
how the NPTL deals with async cancellation internally (t_delete() -> 
pthread_cancel() -> t(g)kill(SIGCANCEL)).

Internally, XNBREAK will be raised for that task, causing -EINTR to be 
propagated back. However, since there is SIGCANCEL pending for the task, 
the NPTL handler should run on the way back to the call site in 
t_suspend(), and the task should never return from this handler.

In short, receiving EINTR from t_suspend() is unexpected, particularly 
when unblocked by SIGCANCEL. I could not reproduce this issue with a 
simple test running over GDB (7.5.1).
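
The expected behaviour described above (a cancelled thread never gets
control back after its blocking call) can be checked with plain POSIX
threads. This is a minimal sketch assuming Linux with NPTL; it uses
sem_wait() as the blocking call instead of t_suspend(), and
blocked_task() and cancel_blocked_thread() are hypothetical names.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <semaphore.h>
#include <stddef.h>

static sem_t never_posted;
static volatile int returned_from_wait;

static void *blocked_task(void *arg)
{
	(void)arg;
	sem_wait(&never_posted);	/* blocks; also a cancellation point */
	returned_from_wait = 1;		/* not reached if the cancel is acted upon */
	return NULL;
}

/* Returns 1 if the thread was terminated by cancellation without ever
 * resuming after its blocking call, 0 otherwise. */
static int cancel_blocked_thread(void)
{
	pthread_t tid;
	void *result = NULL;
	int rc;

	returned_from_wait = 0;
	rc = sem_init(&never_posted, 0, 0);
	assert(rc == 0);
	rc = pthread_create(&tid, NULL, blocked_task, NULL);
	assert(rc == 0);

	rc = pthread_cancel(tid);
	assert(rc == 0);
	rc = pthread_join(tid, &result);
	assert(rc == 0);
	(void)rc;

	sem_destroy(&never_posted);
	return result == PTHREAD_CANCELED && !returned_from_wait;
}
```

The join result is PTHREAD_CANCELED and the line after the wait never
runs, which is why an EINTR status showing up in the caller is surprising.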

A few more questions:

- regardless of t_delete(), is the problem about one or multiple threads 
unblocking unexpectedly from t_suspend(0), when single-stepping a 
distinct thread over GDB?

- I'm testing with Xenomai 2.6.3. Which version have you been using, on 
which cpu/platform, using which I-pipe release in the kernel (check 
/proc/xenomai/{version, hal})?

- Also could you write a simple test code illustrating the issue so that 
I could try reproducing it? Typically, would this be reproducible on 
your setup with a single task running t_suspend(0), while ptracing the 
main routine in parallel?

TIA,

-- 
Philippe.



* Re: [Xenomai] t_suspend and XNBREAK
  2013-10-22 13:46             ` Philippe Gerum
@ 2013-10-22 14:49               ` Gilles Chanteperdrix
  2013-10-22 16:53                 ` Philippe Gerum
  2013-10-22 16:50               ` Daniel Merrill
  1 sibling, 1 reply; 18+ messages in thread
From: Gilles Chanteperdrix @ 2013-10-22 14:49 UTC (permalink / raw)
  To: Philippe Gerum; +Cc: xenomai

On 10/22/2013 03:46 PM, Philippe Gerum wrote:
> Internally, XNBREAK will be raised for that task, causing -EINTR to be
> propagated back. However, since there is SIGCANCEL pending for the task,
> the NPTL handler should run on the way back to the call site in
> t_suspend(), and the task should never return from this handler.
>
> In short, receiving EINTR from t_suspend() is unexpected, particularly
> when unblocked by SIGCANCEL. I could not reproduce this issue based on
> the simple test, running over GDB (7.5.1).

Maybe cancellation is disabled with pthread_setcancelstate?
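
What disabled cancellation looks like from the application side can be
shown with plain POSIX threads: the cancel request stays pending, the
blocking call hands control back to the thread, and the request is only
honoured once cancellation is enabled again. A minimal sketch assuming
Linux with NPTL; masked_task() and cancel_while_disabled() are
hypothetical names, and it says nothing about whether this is what
happens in the reported case.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <semaphore.h>
#include <stddef.h>

static sem_t task_ready, task_go;
static volatile int resumed_after_wait;

static void *masked_task(void *arg)
{
	int old;

	(void)arg;
	pthread_setcancelstate(PTHREAD_CANCEL_DISABLE, &old);
	sem_post(&task_ready);
	sem_wait(&task_go);		/* cancellation is disabled, so control comes back */
	resumed_after_wait = 1;
	pthread_setcancelstate(PTHREAD_CANCEL_ENABLE, &old);
	pthread_testcancel();		/* the pending request is acted upon here */
	return NULL;
}

/* Returns 1 if the task ran past its blocking call despite the cancel
 * request and was only cancelled afterwards, 0 otherwise. */
static int cancel_while_disabled(void)
{
	pthread_t tid;
	void *result = NULL;
	int rc;

	resumed_after_wait = 0;
	rc = sem_init(&task_ready, 0, 0);
	assert(rc == 0);
	rc = sem_init(&task_go, 0, 0);
	assert(rc == 0);
	rc = pthread_create(&tid, NULL, masked_task, NULL);
	assert(rc == 0);

	/* Wait until the task has disabled cancellation. */
	while (sem_wait(&task_ready) == -1 && errno == EINTR)
		;
	rc = pthread_cancel(tid);
	assert(rc == 0);
	sem_post(&task_go);
	rc = pthread_join(tid, &result);
	assert(rc == 0);
	(void)rc;

	sem_destroy(&task_ready);
	sem_destroy(&task_go);
	return result == PTHREAD_CANCELED && resumed_after_wait;
}
```

Checking the cancel state of the affected tasks (the old value returned by
pthread_setcancelstate()) would be one way to test this hypothesis.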

-- 
					    Gilles.



* Re: [Xenomai] t_suspend and XNBREAK
  2013-10-22 13:46             ` Philippe Gerum
  2013-10-22 14:49               ` Gilles Chanteperdrix
@ 2013-10-22 16:50               ` Daniel Merrill
  1 sibling, 0 replies; 18+ messages in thread
From: Daniel Merrill @ 2013-10-22 16:50 UTC (permalink / raw)
  To: 'Philippe Gerum', xenomai



> -----Original Message-----
> From: Philippe Gerum [mailto:rpm@xenomai.org]
> Sent: Tuesday, October 22, 2013 6:47 AM
> To: Daniel Merrill; xenomai@xenomai.org
> Subject: Re: [Xenomai] t_suspend and XNBREAK
>
> A few questions more:
>
> - regardless of t_delete(), is the problem about one or multiple threads
> unblocking unexpectedly from t_suspend(0), when single-stepping a
> distinct thread over GDB?

There are about 5 threads that this occurs on, all of which are deleted by
an outside "controlling" thread. Essentially, some of the threads will
occasionally fail to suspend because of the EINTR issue. Each thread is
supposed to suspend itself and then wait in that state until it is removed
by the t_delete. Most of the time it works even when the debugger is
attached; however, taking some code paths seems to put it into a state
where some or all of the threads that receive the t_delete begin to fail
their t_suspend(0) calls.

> - I'm testing with Xenomai 2.6.3. Which version have you been using, on
> which cpu/platform, using which I-pipe release in the kernel (check
> /proc/xenomai/{version, hal})?

We're using Xenomai 2.6.2.1, i-pipe-core-3.5.7-x86-3

> - Also could you write a simple test code illustrating the issue so that
> I could try reproducing it? Typically, would this be reproducible on
> your setup with a single task running t_suspend(0), while ptracing the
> main routine in parallel?
> 
I've been wanting to give this a try and will attempt it now that I think
I understand what is causing the issue. Up to this point I've been at a
loss for what to even try. I will give it my best shot and see if I can get
this to happen in a much simpler context.


* Re: [Xenomai] t_suspend and XNBREAK
  2013-10-22 14:49               ` Gilles Chanteperdrix
@ 2013-10-22 16:53                 ` Philippe Gerum
  2013-10-22 22:10                   ` Daniel Merrill
  0 siblings, 1 reply; 18+ messages in thread
From: Philippe Gerum @ 2013-10-22 16:53 UTC (permalink / raw)
  To: Gilles Chanteperdrix; +Cc: xenomai

On 10/22/2013 04:49 PM, Gilles Chanteperdrix wrote:
> Maybe cancellation is disabled with pthread_setcancelstate?

AFAIU the NPTL code, SIGCANCEL should not be sent in that case.

-- 
Philippe.



* Re: [Xenomai] t_suspend and XNBREAK
  2013-10-22 16:53                 ` Philippe Gerum
@ 2013-10-22 22:10                   ` Daniel Merrill
  2013-10-23 17:31                     ` Philippe Gerum
  0 siblings, 1 reply; 18+ messages in thread
From: Daniel Merrill @ 2013-10-22 22:10 UTC (permalink / raw)
  To: 'Philippe Gerum', 'Gilles Chanteperdrix'; +Cc: xenomai



> -----Original Message-----
> From: Philippe Gerum [mailto:rpm@xenomai.org]
> Sent: Tuesday, October 22, 2013 9:53 AM
> To: Gilles Chanteperdrix
> Cc: Daniel Merrill; xenomai@xenomai.org
> Subject: Re: [Xenomai] t_suspend and XNBREAK
>
> On 10/22/2013 04:49 PM, Gilles Chanteperdrix wrote:
> > Maybe cancellation is disabled with pthread_setcancelstate?
>
> AFAIU the NPTL code, then SIGCANCEL should not be sent.

OK, if I didn't do anything stupid (don't hold it against me if I did), the
following code seems to reproduce the issue on my system:

#include <stdlib.h>
#include <stdio.h>
#include <psos+/psos.h>
#include <sys/mman.h>

#define CONTROLLER_PRIORITY	5
#define SUB_TASK1_PRIORITY	2
#define SUB_TASK2_PRIORITY	3
#define SUB_TASK3_PRIORITY	4

void subTask3()
{
	u_long retValue = 0;
	int count = 0;
	retValue = t_suspend(0);
	printf("subTask3, suspend returned %ld\n", retValue);
	/* count to 1000000 */
	while(count < 1000000)
		count++;
	retValue = t_suspend(0);
	printf("subTask3, suspend returned %ld\n", retValue);
	/*should never get here, we should have either suspended or
	 * been deleted*/
	while(1);
}

void subTask2()
{
	u_long retValue = 0;
	int count = 0;
	retValue = t_suspend(0);
	printf("subTask2, suspend returned %ld\n", retValue);
	/* count to 100000 */
	while(count < 100000)
		count++;
	retValue = t_suspend(0);
	printf("subTask2, suspend returned %ld\n", retValue);
	/*should never get here, we should have either suspended or
	 * been deleted*/
	while(1);
}

void subTask1()
{
	u_long retValue = 0;
	int count = 0;
	retValue = t_suspend(0);
	printf("subTask1, suspend returned %ld\n", retValue);
	/* count to 10000 */
	while(count < 10000)
		count++;
	retValue = t_suspend(0);
	printf("subTask1, suspend returned %ld\n", retValue);
	/*should never get here, we should have either suspended or
	 * been deleted*/
	while(1);
}

void controllerTask()
{
	u_long tid;

	t_ident("SUB1", 0, &tid);
	t_resume(tid);
	tm_wkafter(5);
	t_delete(tid);

	t_ident("SUB2", 0, &tid);
	t_resume(tid);
	tm_wkafter(5);
	t_delete(tid);

	t_ident("SUB3", 0, &tid);
	t_resume(tid);
	tm_wkafter(5);
	t_delete(tid);

	t_ident("MAIN", 0, &tid);
	ev_send(tid, 0x00000001);
	t_suspend(0);
}


int main(int argc, char *argv[])
{
	u_long contId, sub1Id, sub2Id, sub3Id;
	u_long eventsReceived;

	mlockall(MCL_CURRENT | MCL_FUTURE);

	t_create("CONT", CONTROLLER_PRIORITY, 1000, 1000,
			T_FPU | T_LOCAL, &contId);
	t_create("SUB1", SUB_TASK1_PRIORITY, 1000, 1000,
			T_FPU | T_LOCAL, &sub1Id);
	t_create("SUB2", SUB_TASK2_PRIORITY, 1000, 1000,
		       	T_FPU | T_LOCAL, &sub2Id);
	t_create("SUB3", SUB_TASK3_PRIORITY, 1000, 1000,
		       	T_FPU | T_LOCAL, &sub3Id);
	t_start(sub1Id, T_PREEMPT | T_SUPV | T_NOASR, subTask1, NULL);
	t_start(sub2Id, T_PREEMPT | T_SUPV | T_NOASR, subTask2, NULL);
	t_start(sub3Id, T_PREEMPT | T_SUPV | T_NOASR, subTask3, NULL);
	tm_wkafter(5);
	t_start(contId, T_PREEMPT | T_SUPV | T_NOASR, controllerTask, NULL);

	ev_receive(0x00000001, EV_WAIT | EV_ALL, 0, &eventsReceived);

}

This was compiled with the following command:

gcc -g -I/usr/include/xenomai -D_GNU_SOURCE -D_REENTRANT -D__XENO__ \
    -I/usr/include/xenomai/psos+ \
    test.c -lpsos -L/usr/lib -lxenomai -lpthread -lrt

It was then run in gdb using the following gdb script:

break 20
commands
next
continue
end
b 32
commands
next
continue
end
b 52
commands
next
continue
end
run

Please let me know what you think. If I made a mistake in the test I'm more 
than happy to try again, just let me know what I did wrong. Thanks for 
sticking with me through all this. I appreciate all the advice and help 
that's been given.

Dan Merrill



* Re: [Xenomai] t_suspend and XNBREAK
  2013-10-22 22:10                   ` Daniel Merrill
@ 2013-10-23 17:31                     ` Philippe Gerum
  2013-10-23 23:11                       ` Daniel Merrill
  0 siblings, 1 reply; 18+ messages in thread
From: Philippe Gerum @ 2013-10-23 17:31 UTC (permalink / raw)
  To: Daniel Merrill, 'Gilles Chanteperdrix'; +Cc: xenomai

On 10/23/2013 12:10 AM, Daniel Merrill wrote:
> 
> 
>> -----Original Message-----
>> From: Philippe Gerum [mailto:rpm@xenomai.org]
>> Sent: Tuesday, October 22, 2013 9:53 AM
>> To: Gilles Chanteperdrix
>> Cc: Daniel Merrill; xenomai@xenomai.org
>> Subject: Re: [Xenomai] t_suspend and XNBREAK
>>
>> On 10/22/2013 04:49 PM, Gilles Chanteperdrix wrote:
>>> On 10/22/2013 03:46 PM, Philippe Gerum wrote:
>>>> On 10/21/2013 08:21 PM, Daniel Merrill wrote:
>>>>
>>>>> More follow up on this, we went ahead and put some logging in
>>>>> shadow.c which from what we could find is where the signal is
>>>>> "kicking" the thread.
>>>>> From the logging it looks like the only signals we get (while
>>>>> attached to
>>>>> GDB) are SIGSTOP, SIGTRAP, SIGRT32 and SIGKILL(upon exiting the
>>>>> debugger).
>>>>> I'm assuming the SIGSTOP, SIGTRAP and SIGKILL are normal from the
>>>>> debugger. It looks like shadow.c looks for SIGTRAP and SIGSTOP and
>>>>> sets an XNDEBUG state on the thread which I assume allows it to
>>>>> restart the suspend?
>>>>
>>>> XNDEBUG marks a thread which is ptraced, this has implications when
>>>> managing the system timer while the app is single-stepped/stopped by
>>>> a debugger.
>>>>
>>>>     SIGRT32 I believe comes from our calls to t_delete. I'm guessing
>>>>> this is what's causing the suspends to fail? Anyway, I appreciate
>>>>> any additional insight anyone can offer. Thanks again for all the
>>>>> help.
>>>>>
>>>>
>>>> t_delete() will cause t_suspend() to unblock if sent to the suspended
>>>> task, due to receiving SIGRT32/SIGCANCEL from the linux side, which
>>>> is how the NPTL deals with async cancellation internally (t_delete()
>>>> ->
>>>> pthread_cancel() -> t(g)kill(SIGCANCEL)).
>>>>
>>>> Internally, XNBREAK will be raised for that task, causing -EINTR to
>>>> be propagated back. However, since there is SIGCANCEL pending for the
>>>> task, the NPTL handler should run on the way back to the call site in
>>>> t_suspend(), and the task should never return from this handler.
>>>>
>>>> In short, receiving EINTR from t_suspend() is unexpected,
>>>> particularly when unblocked by SIGCANCEL. I could not reproduce this
>>>> issue based on the simple test, running over GDB (7.5.1).
>>>>
>>>> A few questions more:
>>>>
>>>> - regardless of t_delete(), is the problem about one or multiple
>>>> threads unblocking unexpectedly from t_suspend(0), when
>>>> single-stepping a distinct thread over GDB?
>>>>
>>>> - I'm testing with Xenomai 2.6.3. Which version have you been using,
>>>> on which cpu/platform, using which I-pipe release in the kernel
>>>> (check /proc/xenomai/{version, hal}?
>>>>
>>>> - Also could you write a simple test code illustrating the issue so
>>>> that I could try reproducing it? Typically, would this be
>>>> reproducible on your setup with a single task running t_suspend(0),
>>>> while ptracing the main routine in parallel?
>>>
>>> Maybe cancellation is disabled with pthread_setcancelstate?
>>>
>>
>> AFAIU the NPTL code, then SIGCANCEL should not be sent.
>>
>> --
>> Philippe.
> 
> Ok, If I didn't do anything stupid (don't hold it against me if I did) the
> following code seems to reproduce the issue on my system:
> 
> #include <stdlib.h>
> #include <stdio.h>
> #include <psos+/psos.h>
> #include <sys/mman.h>
> 
> #define CONTROLLER_PRIORITY	5
> #define SUB_TASK1_PRIORITY	2
> #define SUB_TASK2_PRIORITY	3
> #define SUB_TASK3_PRIORITY	4
> 
> void subTask3()
> {
> 	u_long retValue = 0;
> 	int count = 0;
> 	retValue = t_suspend(0);
> 	printf("subTask3, suspend returned %ld\n", retValue);
> 	/* count to 1000000 */
> 	while(count < 1000000)
> 		count++;
> 	retValue = t_suspend(0);
> 	printf("subTask3, suspend returned %ld\n", retValue);
> 	/*should never get here, we should have either suspended or
> 	 * been deleted*/
> 	while(1);
> }
> 
> void subTask2()
> {
> 	u_long retValue = 0;
> 	int count = 0;
> 	retValue = t_suspend(0);
> 	printf("subTask2, suspend returned %ld\n", retValue);
> 	/* count to 100000 */
> 	while(count < 100000)
> 		count++;
> 	retValue = t_suspend(0);
> 	printf("subTask2, suspend returned %ld\n", retValue);
> 	/*should never get here, we should have either suspended or
> 	 * been deleted*/
> 	while(1);
> }
> 
> void subTask1()
> {
> 	u_long retValue = 0;
> 	int count = 0;
> 	retValue = t_suspend(0);
> 	printf("subTask1, suspend returned %ld\n", retValue);
> 	/* count to 10000 */
> 	while(count < 10000)
> 		count++;
> 	retValue = t_suspend(0);
> 	printf("subTask1, suspend returned %ld\n", retValue);
> 	/*should never get here, we should have either suspended or
> 	 * been deleted*/
> 	while(1);
> }
> 
> void controllerTask()
> {
> 	u_long tid;
> 
> 	t_ident("SUB1", 0, &tid);
> 	t_resume(tid);
> 	tm_wkafter(5);
> 	t_delete(tid);
> 
> 	t_ident("SUB2", 0, &tid);
> 	t_resume(tid);
> 	tm_wkafter(5);
> 	t_delete(tid);
> 
> 	t_ident("SUB3", 0, &tid);
> 	t_resume(tid);
> 	tm_wkafter(5);
> 	t_delete(tid);
> 
> 	t_ident("MAIN", 0, &tid);
> 	ev_send(tid, 0x00000001);
> 	t_suspend(0);
> }
> 
> 
> int main(int argc, char *argv[])
> {
> 	u_long contId, sub1Id, sub2Id, sub3Id;
> 	u_long eventsReceived;
> 
> 	mlockall(MCL_CURRENT | MCL_FUTURE);
> 
> 	t_create("CONT", CONTROLLER_PRIORITY, 1000, 1000,
> 			T_FPU | T_LOCAL, &contId);
> 	t_create("SUB1", SUB_TASK1_PRIORITY, 1000, 1000,
> 			T_FPU | T_LOCAL, &sub1Id);
> 	t_create("SUB2", SUB_TASK2_PRIORITY, 1000, 1000,
> 		       	T_FPU | T_LOCAL, &sub2Id);
> 	t_create("SUB3", SUB_TASK3_PRIORITY, 1000, 1000,
> 		       	T_FPU | T_LOCAL, &sub3Id);
> 	t_start(sub1Id, T_PREEMPT | T_SUPV | T_NOASR, subTask1, NULL);
> 	t_start(sub2Id, T_PREEMPT | T_SUPV | T_NOASR, subTask2, NULL);
> 	t_start(sub3Id, T_PREEMPT | T_SUPV | T_NOASR, subTask3, NULL);
> 	tm_wkafter(5);
> 	t_start(contId, T_PREEMPT | T_SUPV | T_NOASR, controllerTask, NULL);
> 
> 	ev_receive(0x00000001, EV_WAIT | EV_ALL, 0, &eventsReceived);
> 
> }
> 
> This was compiled with the following command:
> 
> gcc -g -I/usr/include/xenomai -D_GNU_SOURCE -D_REENTRANT -D__XENO__
>   -I/usr/include/xenomai/psos+
> test.c -lpsos -L/usr/lib -lxenomai -lpthread -lrt
> 
> It was then run in gdb using the following gdb script:
> 
> break 20
> commands
> next
> continue
> end
> b 32
> commands
> next
> continue
> end
> b 52
> commands
> next
> continue
> end
> run
> 
> Please let me know what you think. If I made a mistake in the test I'm more
> than happy to try again, just let me know what I did wrong. Thanks for
> sticking with me through all this. I appreciate all the advice and help
> that's been given.
> 

Ok, I can't reproduce with this code yet. Let's proceed differently.
Could you apply the patch below, then send back the kernel output
you should get when the issue happens? The traces are emitted only when
a task self-suspends using a null tid, which should restrict the scope
enough to avoid extraneous messages. I'd be interested in the gdb output
you get as well when running over gdb into this issue (i.e. which t_suspend()
call fails with -EINTR in this case - lineno?).

In the meantime, you may also try switching temporarily to 2.6.3,
just for testing. It is 100% ABI and API compatible with 2.6.2.1,
you would only need to recompile your app, nothing more. You don't
have to switch i-pipe support for doing this. We fixed a
couple of issues in the 2.6.3 time frame wrt GDB support. Although
I don't see a direct relationship between your issue and what we fixed,
it makes sense to do a quick check anyway.

TIA,

diff --git a/ksrc/skins/psos+/syscall.c b/ksrc/skins/psos+/syscall.c
index 496ee2b..84d078e 100644
--- a/ksrc/skins/psos+/syscall.c
+++ b/ksrc/skins/psos+/syscall.c
@@ -199,15 +199,13 @@ static int __t_delete(struct pt_regs *regs)
 static int __t_suspend(struct pt_regs *regs)
 {
 	xnhandle_t handle = __xn_reg_arg1(regs);
-	psostask_t *task;
+	psostask_t *task = NULL;
 
-	if (handle)
+	if (handle) {
 		task = __psos_task_lookup(handle);
-	else
-		task = __psos_task_current(current);
-
-	if (!task)
-		return ERR_OBJID;
+		if (task == NULL)
+			return ERR_OBJID;
+	}
 
 	return t_suspend((u_long)task);
 }
diff --git a/ksrc/skins/psos+/task.c b/ksrc/skins/psos+/task.c
index 3de4da4..259e1d6 100644
--- a/ksrc/skins/psos+/task.c
+++ b/ksrc/skins/psos+/task.c
@@ -469,18 +469,40 @@ unlock_and_exit:
 
 u_long t_suspend(u_long tid)
 {
+	struct task_struct *t = current;
 	u_long err = SUCCESS;
+	sigset_t pending;
 	psostask_t *task;
 	spl_t s;
+	int sig;
 
 	if (tid == 0) {
+		printk(KERN_WARNING "%s: self-suspend %s[%d], sigpending=%d, state=0x%lx, info=0x%lx\n",
+		       __func__, t->comm, t->pid,
+		       signal_pending(t),
+		       psos_current_task()->threadbase.state,
+		       psos_current_task()->threadbase.info);
+
 		if (xnpod_unblockable_p())
 			return -EPERM;
 
 		xnpod_suspend_self();
 
-		if (xnthread_test_info(&psos_current_task()->threadbase, XNBREAK))
+		if (xnthread_test_info(&psos_current_task()->threadbase, XNBREAK)) {
+			printk(KERN_WARNING "%s: unblocked %s[%d], sigpending=%d, state=0x%lx, info=0x%lx\n",
+			       __func__, t->comm, t->pid,
+			       signal_pending(t),
+			       psos_current_task()->threadbase.state,
+			       psos_current_task()->threadbase.info);
+			if (signal_pending(t)) {
+				wrap_get_sigpending(&pending, t);
+				for (sig = 1; sig <= _NSIG; sig++) {
+					if (sigismember(&pending, sig))
+						printk(KERN_WARNING "   => SIG%d pending\n", sig);
+				}
+			}
 			return -EINTR;
+		}
 
 		return SUCCESS;
 	}

-- 
Philippe.



* Re: [Xenomai] t_suspend and XNBREAK
  2013-10-23 17:31                     ` Philippe Gerum
@ 2013-10-23 23:11                       ` Daniel Merrill
  2013-10-24 10:03                         ` Philippe Gerum
  0 siblings, 1 reply; 18+ messages in thread
From: Daniel Merrill @ 2013-10-23 23:11 UTC (permalink / raw)
  To: 'Philippe Gerum', 'Gilles Chanteperdrix'; +Cc: xenomai

>
> Ok, I can't reproduce with this code yet. Let's proceed differently.
> Could you apply the patch below, then send back the kernel output
> you should get when the issue happens? The traces are emitted only when
> a task self-suspends using a null tid, which should restrict the scope
> enough to avoid extraneous messages. I'd be interested in the gdb output
> you get as well when running over gdb into this issue (i.e. which 
> t_suspend()
> call fails with -EINTR in this case - lineno?).
>
> In the meantime, you may also try switching temporarily to 2.6.3,
> just for testing. It is 100% ABI and API compatible with 2.6.2.1,
> you would only need to recompile your app, nothing more. You don't
> have to switch i-pipe support for doing this. We fixed a
> couple of issues in the 2.6.3 time frame wrt GDB support. Although
> I don't see a direct relationship between your issue and what we fixed,
> it makes sense to do a quick check anyway.
>
> TIA,
> --
> Philippe.

OK, here is the kernel log output, plus the gdb output, from a run on 2.6.3:

Oct 23 16:05:55 JETSdev kernel: [ 7013.901317] t_suspend: self-suspend SUB1[1387], sigpending=0, state=0x300180, info=0x0
Oct 23 16:05:55 JETSdev kernel: [ 7013.901324] t_suspend: self-suspend SUB2[1388], sigpending=0, state=0x300180, info=0x0
Oct 23 16:05:55 JETSdev kernel: [ 7013.901344] t_suspend: self-suspend SUB3[1389], sigpending=0, state=0x300180, info=0x0
Oct 23 16:05:55 JETSdev kernel: [ 7013.906328] t_suspend: unblocked SUB3[1389], sigpending=1, state=0x302180, info=0xc
Oct 23 16:05:55 JETSdev kernel: [ 7013.906330]    => SIG19 pending
Oct 23 16:05:55 JETSdev kernel: [ 7013.906339] t_suspend: unblocked SUB2[1388], sigpending=1, state=0x302180, info=0xc
Oct 23 16:05:55 JETSdev kernel: [ 7013.906340]    => SIG19 pending
Oct 23 16:05:55 JETSdev kernel: [ 7013.907202] t_suspend: self-suspend SUB3[1389], sigpending=0, state=0x300180, info=0x0
Oct 23 16:05:55 JETSdev kernel: [ 7013.907210] t_suspend: self-suspend SUB2[1388], sigpending=0, state=0x300180, info=0x0
Oct 23 16:05:55 JETSdev kernel: [ 7013.907223] t_suspend: self-suspend SUB1[1387], sigpending=0, state=0x300180, info=0x0
Oct 23 16:05:55 JETSdev kernel: [ 7013.912273] t_suspend: unblocked SUB3[1389], sigpending=1, state=0x302180, info=0xc
Oct 23 16:05:55 JETSdev kernel: [ 7013.912274]    => SIG19 pending
Oct 23 16:05:55 JETSdev kernel: [ 7013.912282] t_suspend: unblocked SUB2[1388], sigpending=1, state=0x302180, info=0xc
Oct 23 16:05:55 JETSdev kernel: [ 7013.912283]    => SIG19 pending
Oct 23 16:05:55 JETSdev kernel: [ 7013.912290] t_suspend: unblocked SUB1[1387], sigpending=1, state=0x302180, info=0xc
Oct 23 16:05:55 JETSdev kernel: [ 7013.912291]    => SIG19 pending
Oct 23 16:05:55 JETSdev kernel: [ 7013.912694] t_suspend: self-suspend SUB3[1389], sigpending=0, state=0x300180, info=0x0
Oct 23 16:05:55 JETSdev kernel: [ 7013.912701] t_suspend: self-suspend SUB2[1388], sigpending=0, state=0x300180, info=0x0
Oct 23 16:05:55 JETSdev kernel: [ 7013.912707] t_suspend: self-suspend SUB1[1387], sigpending=0, state=0x300180, info=0x0
Oct 23 16:05:55 JETSdev kernel: [ 7013.912783] t_suspend: unblocked SUB3[1389], sigpending=1, state=0x302180, info=0xc
Oct 23 16:05:55 JETSdev kernel: [ 7013.912784]    => SIG19 pending
Oct 23 16:05:55 JETSdev kernel: [ 7013.912792] t_suspend: unblocked SUB2[1388], sigpending=1, state=0x302180, info=0xc
Oct 23 16:05:55 JETSdev kernel: [ 7013.912792]    => SIG19 pending
Oct 23 16:05:55 JETSdev kernel: [ 7013.912799] t_suspend: unblocked SUB1[1387], sigpending=1, state=0x302180, info=0xc
Oct 23 16:05:55 JETSdev kernel: [ 7013.912800]    => SIG19 pending
Oct 23 16:05:55 JETSdev kernel: [ 7013.914466] t_suspend: self-suspend SUB3[1389], sigpending=0, state=0x300180, info=0x0
Oct 23 16:05:55 JETSdev kernel: [ 7013.914474] t_suspend: self-suspend SUB2[1388], sigpending=0, state=0x300180, info=0x0
Oct 23 16:05:55 JETSdev kernel: [ 7013.914480] t_suspend: self-suspend SUB1[1387], sigpending=0, state=0x300180, info=0x0
Oct 23 16:05:55 JETSdev kernel: [ 7013.914515] t_suspend: unblocked SUB1[1387], sigpending=1, state=0x300180, info=0xc
Oct 23 16:05:55 JETSdev kernel: [ 7013.914516]    => SIG32 pending
Oct 23 16:05:55 JETSdev kernel: [ 7013.914751] t_suspend: unblocked SUB3[1389], sigpending=1, state=0x302180, info=0xc
Oct 23 16:05:55 JETSdev kernel: [ 7013.914752]    => SIG19 pending
Oct 23 16:05:55 JETSdev kernel: [ 7013.915840] t_suspend: self-suspend SUB3[1389], sigpending=0, state=0x300180, info=0x0
Oct 23 16:05:55 JETSdev kernel: [ 7013.915858] t_suspend: self-suspend SUB2[1388], sigpending=0, state=0x300180, info=0x0
Oct 23 16:05:55 JETSdev kernel: [ 7013.920201] t_suspend: unblocked SUB2[1388], sigpending=1, state=0x300180, info=0xc
Oct 23 16:05:55 JETSdev kernel: [ 7013.920203]    => SIG32 pending
Oct 23 16:05:55 JETSdev kernel: [ 7013.922922] t_suspend: unblocked SUB3[1389], sigpending=1, state=0x302180, info=0xc
Oct 23 16:05:55 JETSdev kernel: [ 7013.922924]    => SIG19 pending
Oct 23 16:05:55 JETSdev kernel: [ 7013.923319] t_suspend: self-suspend SUB3[1389], sigpending=0, state=0x300180, info=0x0
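
For anyone reading the state= and info= masks in the trace above, here is a
minimal decoding sketch. It is not part of the patch or of Xenomai itself:
decode_flags(), decode_state() and decode_info() are hypothetical helper
names, and the bit values are an assumption recalled from the Xenomai 2.6
include/nucleus/thread.h header, so they should be checked against the tree
actually in use. Only a subset of the flags is listed.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

struct flag_name {
	unsigned long bit;
	const char *name;
};

/* ASSUMPTION: values recalled from Xenomai 2.6 nucleus/thread.h (subset). */
static const struct flag_name state_flags[] = {
	{ 0x00000001, "XNSUSP" },
	{ 0x00000080, "XNSTARTED" },
	{ 0x00000100, "XNMAPPED" },
	{ 0x00000200, "XNRELAX" },
	{ 0x00002000, "XNDEBUG" },
	{ 0x00100000, "XNFPU" },
	{ 0x00200000, "XNSHADOW" },
};

/* ASSUMPTION: values recalled from Xenomai 2.6 nucleus/thread.h (subset). */
static const struct flag_name info_flags[] = {
	{ 0x00000001, "XNTIMEO" },
	{ 0x00000002, "XNRMID" },
	{ 0x00000004, "XNBREAK" },
	{ 0x00000008, "XNKICKED" },
};

/* Write the names of the bits set in 'mask' into 'buf', joined by '|'.
 * Bits that are not in the table are appended as one hex remainder. */
void decode_flags(unsigned long mask, const struct flag_name *tab,
		  size_t ntab, char *buf, size_t len)
{
	size_t i, used = 0;

	assert(buf != NULL && len > 0);
	buf[0] = '\0';

	for (i = 0; i < ntab && used < len; i++) {
		if (!(mask & tab[i].bit))
			continue;
		used += (size_t)snprintf(buf + used, len - used, "%s%s",
					 used ? "|" : "", tab[i].name);
		mask &= ~tab[i].bit;
	}

	if (mask && used < len)
		snprintf(buf + used, len - used, "%s0x%lx",
			 used ? "|" : "", mask);
}

void decode_state(unsigned long mask, char *buf, size_t len)
{
	decode_flags(mask, state_flags,
		     sizeof(state_flags) / sizeof(state_flags[0]), buf, len);
}

void decode_info(unsigned long mask, char *buf, size_t len)
{
	decode_flags(mask, info_flags,
		     sizeof(info_flags) / sizeof(info_flags[0]), buf, len);
}
```

Under those assumed values, info=0xc decodes to XNBREAK|XNKICKED, and the
only bit that differs between state=0x300180 and state=0x302180 is XNDEBUG.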

jets@JETSdev:~/projects/tdelete_fail_test$ gdb test
GNU gdb (Ubuntu/Linaro 7.4-2012.04-0ubuntu2.1) 7.4-2012.04
Copyright (C) 2012 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "i686-linux-gnu".
For bug reporting instructions, please see:
<http://bugs.launchpad.net/gdb-linaro/>...
Reading symbols from /home/jets/projects/tdelete_fail_test/test...done.
Breakpoint 1 at 0x80486fa: file test.c, line 20.
Breakpoint 2 at 0x8048765: file test.c, line 35.
Breakpoint 3 at 0x80487d0: file test.c, line 50.
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/i386-linux-gnu/libthread_db.so.1".
[New Thread 0xb7fd3b40 (LWP 1386)]
[New Thread 0xb7fceb40 (LWP 1387)]
[New Thread 0xb7deab40 (LWP 1388)]
[New Thread 0xb7de5b40 (LWP 1389)]
SUB1 TID:124
subTask1, suspend returned 0
[Switching to Thread 0xb7fceb40 (LWP 1387)]

Breakpoint 3, subTask1 () at test.c:50
50		retValue = t_suspend(0);
SUB2 TID:125
subTask2, suspend returned 0
[Switching to Thread 0xb7deab40 (LWP 1388)]

Breakpoint 2, subTask2 () at test.c:35
35		retValue = t_suspend(0);
subTask1, suspend returned -4
[Thread 0xb7deab40 (LWP 1388) exited]

The run hangs after subTask1 fails to suspend: from that point subTask1 spins
in its final while(1) loop and never gives up the CPU.

Anyway, hopefully this starts to lead us down the right path. Thanks.

Dan Merrill




* Re: [Xenomai] t_suspend and XNBREAK
  2013-10-23 23:11                       ` Daniel Merrill
@ 2013-10-24 10:03                         ` Philippe Gerum
  2013-10-24 15:52                           ` Daniel Merrill
  2013-10-24 18:24                           ` Daniel Merrill
  0 siblings, 2 replies; 18+ messages in thread
From: Philippe Gerum @ 2013-10-24 10:03 UTC (permalink / raw)
  To: Daniel Merrill, 'Gilles Chanteperdrix'; +Cc: xenomai

On 10/24/2013 01:11 AM, Daniel Merrill wrote:
>>
>> Ok, I can't reproduce with this code yet. Let's proceed differently.
>> Could you apply the patch below, then send back the kernel output
>> you should get when the issue happens? The traces are emitted only when
>> a task self-suspends using a null tid, which should restrict the scope
>> enough to avoid extraneous messages. I'd be interested in the gdb output
>> you get as well when running over gdb into this issue (i.e. which
>> t_suspend()
>> call fails with -EINTR in this case - lineno?).
>>
>> In the meantime, you may also try switching temporarily to 2.6.3,
>> just for testing. It is 100% ABI and API compatible with 2.6.2.1,
>> you would only need to recompile your app, nothing more. You don't
>> have to switch i-pipe support for doing this. We fixed a
>> couple of issues in the 2.6.3 time frame wrt GDB support. Although
>> I don't see a direct relationship between your issue and what we fixed,
>> it makes sense to do a quick check anyway.
>>
>> TIA,
>> --
>> Philippe.
> 
> Ok, here is the log output, plus the gdb output run on 2.6.3
> 
> Oct 23 16:05:55 JETSdev kernel: [ 7013.901317] t_suspend: self-suspend
> SUB1[1387], sigpending=0, state=0x300180, info=0x0
> Oct 23 16:05:55 JETSdev kernel: [ 7013.901324] t_suspend: self-suspend
> SUB2[1388], sigpending=0, state=0x300180, info=0x0
> Oct 23 16:05:55 JETSdev kernel: [ 7013.901344] t_suspend: self-suspend
> SUB3[1389], sigpending=0, state=0x300180, info=0x0
> Oct 23 16:05:55 JETSdev kernel: [ 7013.906328] t_suspend: unblocked
> SUB3[1389], sigpending=1, state=0x302180, info=0xc
> Oct 23 16:05:55 JETSdev kernel: [ 7013.906330]    => SIG19 pending
> Oct 23 16:05:55 JETSdev kernel: [ 7013.906339] t_suspend: unblocked
> SUB2[1388], sigpending=1, state=0x302180, info=0xc
> Oct 23 16:05:55 JETSdev kernel: [ 7013.906340]    => SIG19 pending
> Oct 23 16:05:55 JETSdev kernel: [ 7013.907202] t_suspend: self-suspend
> SUB3[1389], sigpending=0, state=0x300180, info=0x0
> Oct 23 16:05:55 JETSdev kernel: [ 7013.907210] t_suspend: self-suspend
> SUB2[1388], sigpending=0, state=0x300180, info=0x0
> Oct 23 16:05:55 JETSdev kernel: [ 7013.907223] t_suspend: self-suspend
> SUB1[1387], sigpending=0, state=0x300180, info=0x0
> Oct 23 16:05:55 JETSdev kernel: [ 7013.912273] t_suspend: unblocked
> SUB3[1389], sigpending=1, state=0x302180, info=0xc
> Oct 23 16:05:55 JETSdev kernel: [ 7013.912274]    => SIG19 pending
> Oct 23 16:05:55 JETSdev kernel: [ 7013.912282] t_suspend: unblocked
> SUB2[1388], sigpending=1, state=0x302180, info=0xc
> Oct 23 16:05:55 JETSdev kernel: [ 7013.912283]    => SIG19 pending
> Oct 23 16:05:55 JETSdev kernel: [ 7013.912290] t_suspend: unblocked
> SUB1[1387], sigpending=1, state=0x302180, info=0xc
> Oct 23 16:05:55 JETSdev kernel: [ 7013.912291]    => SIG19 pending
> Oct 23 16:05:55 JETSdev kernel: [ 7013.912694] t_suspend: self-suspend
> SUB3[1389], sigpending=0, state=0x300180, info=0x0
> Oct 23 16:05:55 JETSdev kernel: [ 7013.912701] t_suspend: self-suspend
> SUB2[1388], sigpending=0, state=0x300180, info=0x0
> Oct 23 16:05:55 JETSdev kernel: [ 7013.912707] t_suspend: self-suspend
> SUB1[1387], sigpending=0, state=0x300180, info=0x0
> Oct 23 16:05:55 JETSdev kernel: [ 7013.912783] t_suspend: unblocked
> SUB3[1389], sigpending=1, state=0x302180, info=0xc
> Oct 23 16:05:55 JETSdev kernel: [ 7013.912784]    => SIG19 pending
> Oct 23 16:05:55 JETSdev kernel: [ 7013.912792] t_suspend: unblocked
> SUB2[1388], sigpending=1, state=0x302180, info=0xc
> Oct 23 16:05:55 JETSdev kernel: [ 7013.912792]    => SIG19 pending
> Oct 23 16:05:55 JETSdev kernel: [ 7013.912799] t_suspend: unblocked
> SUB1[1387], sigpending=1, state=0x302180, info=0xc
> Oct 23 16:05:55 JETSdev kernel: [ 7013.912800]    => SIG19 pending
> Oct 23 16:05:55 JETSdev kernel: [ 7013.914466] t_suspend: self-suspend
> SUB3[1389], sigpending=0, state=0x300180, info=0x0
> Oct 23 16:05:55 JETSdev kernel: [ 7013.914474] t_suspend: self-suspend
> SUB2[1388], sigpending=0, state=0x300180, info=0x0
> Oct 23 16:05:55 JETSdev kernel: [ 7013.914480] t_suspend: self-suspend
> SUB1[1387], sigpending=0, state=0x300180, info=0x0
> Oct 23 16:05:55 JETSdev kernel: [ 7013.914515] t_suspend: unblocked
> SUB1[1387], sigpending=1, state=0x300180, info=0xc
> Oct 23 16:05:55 JETSdev kernel: [ 7013.914516]    => SIG32 pending
> Oct 23 16:05:55 JETSdev kernel: [ 7013.914751] t_suspend: unblocked
> SUB3[1389], sigpending=1, state=0x302180, info=0xc
> Oct 23 16:05:55 JETSdev kernel: [ 7013.914752]    => SIG19 pending
> Oct 23 16:05:55 JETSdev kernel: [ 7013.915840] t_suspend: self-suspend
> SUB3[1389], sigpending=0, state=0x300180, info=0x0
> Oct 23 16:05:55 JETSdev kernel: [ 7013.915858] t_suspend: self-suspend
> SUB2[1388], sigpending=0, state=0x300180, info=0x0
> Oct 23 16:05:55 JETSdev kernel: [ 7013.920201] t_suspend: unblocked
> SUB2[1388], sigpending=1, state=0x300180, info=0xc
> Oct 23 16:05:55 JETSdev kernel: [ 7013.920203]    => SIG32 pending
> Oct 23 16:05:55 JETSdev kernel: [ 7013.922922] t_suspend: unblocked
> SUB3[1389], sigpending=1, state=0x302180, info=0xc
> Oct 23 16:05:55 JETSdev kernel: [ 7013.922924]    => SIG19 pending
> Oct 23 16:05:55 JETSdev kernel: [ 7013.923319] t_suspend: self-suspend
> SUB3[1389], sigpending=0, state=0x300180, info=0x0
> 
> jets@JETSdev:~/projects/tdelete_fail_test$ gdb test
> GNU gdb (Ubuntu/Linaro 7.4-2012.04-0ubuntu2.1) 7.4-2012.04
> Copyright (C) 2012 Free Software Foundation, Inc.
> License GPLv3+: GNU GPL version 3 or later
> <http://gnu.org/licenses/gpl.html>
> This is free software: you are free to change and redistribute it.
> There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
> and "show warranty" for details.
> This GDB was configured as "i686-linux-gnu".
> For bug reporting instructions, please see:
> <http://bugs.launchpad.net/gdb-linaro/>...
> Reading symbols from /home/jets/projects/tdelete_fail_test/test...done.
> Breakpoint 1 at 0x80486fa: file test.c, line 20.
> Breakpoint 2 at 0x8048765: file test.c, line 35.
> Breakpoint 3 at 0x80487d0: file test.c, line 50.
> [Thread debugging using libthread_db enabled]
> Using host libthread_db library "/lib/i386-linux-gnu/libthread_db.so.1".
> [New Thread 0xb7fd3b40 (LWP 1386)]
> [New Thread 0xb7fceb40 (LWP 1387)]
> [New Thread 0xb7deab40 (LWP 1388)]
> [New Thread 0xb7de5b40 (LWP 1389)]
> SUB1 TID:124
> subTask1, suspend returned 0
> [Switching to Thread 0xb7fceb40 (LWP 1387)]
> 
> Breakpoint 3, subTask1 () at test.c:50
> 50		retValue = t_suspend(0);
> SUB2 TID:125
> subTask2, suspend returned 0
> [Switching to Thread 0xb7deab40 (LWP 1388)]
> 
> Breakpoint 2, subTask2 () at test.c:35
> 35		retValue = t_suspend(0);
> subTask1, suspend returned -4
> [Thread 0xb7deab40 (LWP 1388) exited]
> 
> It hangs after subTask1 fails to suspend because subTask1 starts misbehaving
> at that point and never gives up time.
> 
> Anyway hopefully this starts to lead us down the right path. Thanks
> 

- Do you handle SIG32 specifically in GDB, so that the
latter does not intercept the signal? If not, could you check whether
this makes any difference? (i.e. "handle SIG32 nostop noprint pass")

- Could you apply the patch below, keeping the previous one in?
We should be sending back -ERESTARTSYS to the kernel when the
issue happens, and this behaves as if the Xenomai core did
not.
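
As background on why the trace reports a bare "SIG32" rather than a named
signal: glibc/NPTL keeps the first kernel real-time signal numbers for its
own use (SIGCANCEL being the first of them), so the SIGRTMIN that
applications see starts above them. The sketch below only illustrates that
numbering on a plain Linux/glibc system; signo_is_libc_reserved() is a
hypothetical helper, and the value 32 for the kernel's first real-time
signal is an assumption stated in the code.

```c
#include <assert.h>
#include <signal.h>

/* ASSUMPTION: the Linux kernel numbers its first real-time signal 32. */
#define KERNEL_FIRST_RT_SIGNAL 32

/* Return 1 if 'signo' is a kernel real-time signal that the C library
 * keeps for internal use, i.e. it lies below the SIGRTMIN value that
 * applications are given; return 0 otherwise. */
int signo_is_libc_reserved(int signo)
{
	assert(signo > 0);
	return signo >= KERNEL_FIRST_RT_SIGNAL && signo < SIGRTMIN;
}
```

With glibc, signo_is_libc_reserved(32) is true, which is consistent with
signal 32 being the cancellation signal that t_delete() ends up sending via
pthread_cancel(), as described earlier in the thread.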

diff --git a/ksrc/nucleus/shadow.c b/ksrc/nucleus/shadow.c
index c91a6f3..9b96029 100644
--- a/ksrc/nucleus/shadow.c
+++ b/ksrc/nucleus/shadow.c
@@ -699,11 +699,21 @@ void __init xnheap_init_vdso(void)
 
 static inline void request_syscall_restart(xnthread_t *thread,
 					   struct pt_regs *regs,
-					   int sysflags)
+					   int sysflags,
+					   int head)
 {
 	int notify = 0;
 
 	if (xnthread_test_info(thread, XNKICKED)) {
+		printk(KERN_WARNING
+		       "%s[%s]: %s, state=0x%lx, info=0x%lx, rval=%ld, sysflags=0x%x\n",
+		       __func__,
+		       head ? "hi" : "lo",
+		       thread->name,
+		       thread->state, thread->info,
+		       __xn_reg_rval(regs),
+		       sysflags);
+
 		if (__xn_interrupted_p(regs)) {
 			__xn_error_return(regs,
 					  (sysflags & __xn_exec_norestart) ?
@@ -2364,7 +2374,7 @@ int do_hisyscall_event(unsigned event, rthal_pipeline_stage_t *stage,
 		if (signal_pending(p) || xnthread_amok_p(thread)) {
 			sigs = 1;
 			xnthread_clear_amok(thread);
-			request_syscall_restart(thread, regs, sysflags);
+			request_syscall_restart(thread, regs, sysflags, 1);
 		} else if (xnthread_test_state(thread, XNOTHER) &&
 			   xnthread_get_rescnt(thread) == 0) {
 			if (switched)
@@ -2537,7 +2547,7 @@ int do_losyscall_event(unsigned event, rthal_pipeline_stage_t *stage,
 		thread = xnshadow_thread(current);
 		if (signal_pending(current)) {
 			sigs = 1;
-			request_syscall_restart(thread, regs, sysflags);
+			request_syscall_restart(thread, regs, sysflags, 0);
 		} else if (xnthread_test_state(thread, XNOTHER) &&
 			   xnthread_get_rescnt(thread) == 0)
 			sysflags |= __xn_exec_switchback;

-- 
Philippe.



* Re: [Xenomai] t_suspend and XNBREAK
  2013-10-24 10:03                         ` Philippe Gerum
@ 2013-10-24 15:52                           ` Daniel Merrill
  2013-10-24 18:24                           ` Daniel Merrill
  1 sibling, 0 replies; 18+ messages in thread
From: Daniel Merrill @ 2013-10-24 15:52 UTC (permalink / raw)
  To: 'Philippe Gerum', 'Gilles Chanteperdrix'; +Cc: xenomai



> -----Original Message-----
> From: Philippe Gerum [mailto:rpm@xenomai.org]
> Sent: Thursday, October 24, 2013 3:03 AM
> To: Daniel Merrill; 'Gilles Chanteperdrix'
> Cc: xenomai@xenomai.org
> Subject: Re: [Xenomai] t_suspend and XNBREAK
>
> On 10/24/2013 01:11 AM, Daniel Merrill wrote:
> >>
> >> Ok, I can't reproduce with this code yet. Let's proceed differently.
> >> Could you apply the patch below, then send back the kernel output you
> >> should get when the issue happens? The traces are emitted only when a
> >> task self-suspends using a null tid, which should restrict the scope
> >> enough to avoid extraneous messages. I'd be interested in the gdb
> >> output you get as well when running over gdb into this issue (i.e.
> >> which
> >> t_suspend()
> >> call fails with -EINTR in this case - lineno?).
> >>
> >> In the meantime, you may also try switching temporarily to 2.6.3,
> >> just for testing. It is 100% ABI and API compatible with 2.6.2.1, you
> >> would only need to recompile your app, nothing more. You don't have
> >> to switch i-pipe support for doing this. We fixed a couple of issues
> >> in the 2.6.3 time frame wrt GDB support. Although I don't see a
> >> direct relationship between your issue and what we fixed, it makes
> >> sense to do a quick check anyway.
> >>
> >> TIA,
> >> --
> >> Philippe.
> >
> > Ok, here is the log output, plus the gdb output run on 2.6.3
> >
> > Oct 23 16:05:55 JETSdev kernel: [ 7013.901317] t_suspend: self-suspend
> > SUB1[1387], sigpending=0, state=0x300180, info=0x0
> > Oct 23 16:05:55 JETSdev kernel: [ 7013.901324] t_suspend: self-suspend
> > SUB2[1388], sigpending=0, state=0x300180, info=0x0
> > Oct 23 16:05:55 JETSdev kernel: [ 7013.901344] t_suspend: self-suspend
> > SUB3[1389], sigpending=0, state=0x300180, info=0x0
> > Oct 23 16:05:55 JETSdev kernel: [ 7013.906328] t_suspend: unblocked
> > SUB3[1389], sigpending=1, state=0x302180, info=0xc
> > Oct 23 16:05:55 JETSdev kernel: [ 7013.906330]    => SIG19 pending
> > Oct 23 16:05:55 JETSdev kernel: [ 7013.906339] t_suspend: unblocked
> > SUB2[1388], sigpending=1, state=0x302180, info=0xc
> > Oct 23 16:05:55 JETSdev kernel: [ 7013.906340]    => SIG19 pending
> > Oct 23 16:05:55 JETSdev kernel: [ 7013.907202] t_suspend: self-suspend
> > SUB3[1389], sigpending=0, state=0x300180, info=0x0
> > Oct 23 16:05:55 JETSdev kernel: [ 7013.907210] t_suspend: self-suspend
> > SUB2[1388], sigpending=0, state=0x300180, info=0x0
> > Oct 23 16:05:55 JETSdev kernel: [ 7013.907223] t_suspend: self-suspend
> > SUB1[1387], sigpending=0, state=0x300180, info=0x0
> > Oct 23 16:05:55 JETSdev kernel: [ 7013.912273] t_suspend: unblocked
> > SUB3[1389], sigpending=1, state=0x302180, info=0xc
> > Oct 23 16:05:55 JETSdev kernel: [ 7013.912274]    => SIG19 pending
> > Oct 23 16:05:55 JETSdev kernel: [ 7013.912282] t_suspend: unblocked
> > SUB2[1388], sigpending=1, state=0x302180, info=0xc
> > Oct 23 16:05:55 JETSdev kernel: [ 7013.912283]    => SIG19 pending
> > Oct 23 16:05:55 JETSdev kernel: [ 7013.912290] t_suspend: unblocked
> > SUB1[1387], sigpending=1, state=0x302180, info=0xc
> > Oct 23 16:05:55 JETSdev kernel: [ 7013.912291]    => SIG19 pending
> > Oct 23 16:05:55 JETSdev kernel: [ 7013.912694] t_suspend: self-suspend
> > SUB3[1389], sigpending=0, state=0x300180, info=0x0
> > Oct 23 16:05:55 JETSdev kernel: [ 7013.912701] t_suspend: self-suspend
> > SUB2[1388], sigpending=0, state=0x300180, info=0x0
> > Oct 23 16:05:55 JETSdev kernel: [ 7013.912707] t_suspend: self-suspend
> > SUB1[1387], sigpending=0, state=0x300180, info=0x0
> > Oct 23 16:05:55 JETSdev kernel: [ 7013.912783] t_suspend: unblocked
> > SUB3[1389], sigpending=1, state=0x302180, info=0xc
> > Oct 23 16:05:55 JETSdev kernel: [ 7013.912784]    => SIG19 pending
> > Oct 23 16:05:55 JETSdev kernel: [ 7013.912792] t_suspend: unblocked
> > SUB2[1388], sigpending=1, state=0x302180, info=0xc
> > Oct 23 16:05:55 JETSdev kernel: [ 7013.912792]    => SIG19 pending
> > Oct 23 16:05:55 JETSdev kernel: [ 7013.912799] t_suspend: unblocked
> > SUB1[1387], sigpending=1, state=0x302180, info=0xc
> > Oct 23 16:05:55 JETSdev kernel: [ 7013.912800]    => SIG19 pending
> > Oct 23 16:05:55 JETSdev kernel: [ 7013.914466] t_suspend: self-suspend
> > SUB3[1389], sigpending=0, state=0x300180, info=0x0
> > Oct 23 16:05:55 JETSdev kernel: [ 7013.914474] t_suspend: self-suspend
> > SUB2[1388], sigpending=0, state=0x300180, info=0x0
> > Oct 23 16:05:55 JETSdev kernel: [ 7013.914480] t_suspend: self-suspend
> > SUB1[1387], sigpending=0, state=0x300180, info=0x0
> > Oct 23 16:05:55 JETSdev kernel: [ 7013.914515] t_suspend: unblocked
> > SUB1[1387], sigpending=1, state=0x300180, info=0xc
> > Oct 23 16:05:55 JETSdev kernel: [ 7013.914516]    => SIG32 pending
> > Oct 23 16:05:55 JETSdev kernel: [ 7013.914751] t_suspend: unblocked
> > SUB3[1389], sigpending=1, state=0x302180, info=0xc
> > Oct 23 16:05:55 JETSdev kernel: [ 7013.914752]    => SIG19 pending
> > Oct 23 16:05:55 JETSdev kernel: [ 7013.915840] t_suspend: self-suspend
> > SUB3[1389], sigpending=0, state=0x300180, info=0x0
> > Oct 23 16:05:55 JETSdev kernel: [ 7013.915858] t_suspend: self-suspend
> > SUB2[1388], sigpending=0, state=0x300180, info=0x0
> > Oct 23 16:05:55 JETSdev kernel: [ 7013.920201] t_suspend: unblocked
> > SUB2[1388], sigpending=1, state=0x300180, info=0xc
> > Oct 23 16:05:55 JETSdev kernel: [ 7013.920203]    => SIG32 pending
> > Oct 23 16:05:55 JETSdev kernel: [ 7013.922922] t_suspend: unblocked
> > SUB3[1389], sigpending=1, state=0x302180, info=0xc
> > Oct 23 16:05:55 JETSdev kernel: [ 7013.922924]    => SIG19 pending
> > Oct 23 16:05:55 JETSdev kernel: [ 7013.923319] t_suspend: self-suspend
> > SUB3[1389], sigpending=0, state=0x300180, info=0x0
> >
> > jets@JETSdev:~/projects/tdelete_fail_test$ gdb test
> > GNU gdb (Ubuntu/Linaro 7.4-2012.04-0ubuntu2.1) 7.4-2012.04
> > Copyright (C) 2012 Free Software Foundation, Inc.
> > License GPLv3+: GNU GPL version 3 or later
> > <http://gnu.org/licenses/gpl.html>
> > This is free software: you are free to change and redistribute it.
> > There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
> > and "show warranty" for details.
> > This GDB was configured as "i686-linux-gnu".
> > For bug reporting instructions, please see:
> > <http://bugs.launchpad.net/gdb-linaro/>...
> > Reading symbols from /home/jets/projects/tdelete_fail_test/test...done.
> > Breakpoint 1 at 0x80486fa: file test.c, line 20.
> > Breakpoint 2 at 0x8048765: file test.c, line 35.
> > Breakpoint 3 at 0x80487d0: file test.c, line 50.
> > [Thread debugging using libthread_db enabled]
> > Using host libthread_db library "/lib/i386-linux-gnu/libthread_db.so.1".
> > [New Thread 0xb7fd3b40 (LWP 1386)]
> > [New Thread 0xb7fceb40 (LWP 1387)]
> > [New Thread 0xb7deab40 (LWP 1388)]
> > [New Thread 0xb7de5b40 (LWP 1389)]
> > SUB1 TID:124
> > subTask1, suspend returned 0
> > [Switching to Thread 0xb7fceb40 (LWP 1387)]
> >
> > Breakpoint 3, subTask1 () at test.c:50
> > 50		retValue = t_suspend(0);
> > SUB2 TID:125
> > subTask2, suspend returned 0
> > [Switching to Thread 0xb7deab40 (LWP 1388)]
> >
> > Breakpoint 2, subTask2 () at test.c:35
> > 35		retValue = t_suspend(0);
> > subTask1, suspend returned -4
> > [Thread 0xb7deab40 (LWP 1388) exited]
> >
> > It hangs after subTask1 fails to suspend because subTask1 starts
> > misbehaving at that point and never gives up time.
> >
> > Anyway hopefully this starts to lead us down the right path. Thanks
> >
>
> - Do you handle SIG32 specifically in GDB, so that the latter does not
> intercept the signal? If not, could you check whether this makes any
> difference? (i.e. "handle SIG32 nostop noprint pass")
>
I've played around with that quite a bit and it doesn't seem to make any
difference whether SIG32 is handled specifically in GDB or not. It is
usually set as you specified (nostop, noprint, pass). I will apply the
patch and get back to you with the results.



* Re: [Xenomai] t_suspend and XNBREAK
  2013-10-24 10:03                         ` Philippe Gerum
  2013-10-24 15:52                           ` Daniel Merrill
@ 2013-10-24 18:24                           ` Daniel Merrill
  1 sibling, 0 replies; 18+ messages in thread
From: Daniel Merrill @ 2013-10-24 18:24 UTC (permalink / raw)
  To: 'Philippe Gerum', 'Gilles Chanteperdrix'; +Cc: xenomai


>
> - Do you handle SIG32 specifically in GDB, so that the
> latter does not intercept the signal? If not, could you check whether
> this makes any difference? (i.e. "handle SIG32 nostop noprint pass")
>
> - Could you apply the patch below, keeping the previous one in?
> We should be sending back -ERESTARTSYS to the kernel when the
> issue happens, and this behaves as if the Xenomai core did
> not.
>

Here are the new kernel log and gdb output with the patch applied:

Oct 24 11:16:19 JETSdev kernel: [  127.043727] t_suspend: self-suspend 
SUB1[2280], sigpending=0, state=0x300180, info=0x0
Oct 24 11:16:19 JETSdev kernel: [  127.043735] t_suspend: self-suspend 
SUB2[2281], sigpending=0, state=0x300180, info=0x0
Oct 24 11:16:19 JETSdev kernel: [  127.043754] t_suspend: self-suspend 
SUB3[2282], sigpending=0, state=0x300180, info=0x0
Oct 24 11:16:19 JETSdev kernel: [  127.054237] t_suspend: unblocked 
SUB3[2282], sigpending=1, state=0x302180, info=0xc
Oct 24 11:16:19 JETSdev kernel: [  127.054239]    => SIG19 pending
Oct 24 11:16:19 JETSdev kernel: [  127.054240] request_syscall_restart[hi]: 
SUB3, state=0x302180, info=0xc, rval=-4, sysflags=0x22
Oct 24 11:16:19 JETSdev kernel: [  127.054251] t_suspend: unblocked 
SUB2[2281], sigpending=1, state=0x302180, info=0xc
Oct 24 11:16:19 JETSdev kernel: [  127.054251]    => SIG19 pending
Oct 24 11:16:19 JETSdev kernel: [  127.054252] request_syscall_restart[hi]: 
SUB2, state=0x302180, info=0xc, rval=-4, sysflags=0x22
Oct 24 11:16:19 JETSdev kernel: [  127.054261] request_syscall_restart[lo]: 
CONT, state=0x302180, info=0xc, rval=-4, sysflags=0x6
Oct 24 11:16:19 JETSdev kernel: [  127.054269] request_syscall_restart[lo]: 
MAIN, state=0xb02180, info=0xc, rval=-4, sysflags=0x6
Oct 24 11:16:19 JETSdev kernel: [  127.055902] t_suspend: self-suspend 
SUB3[2282], sigpending=0, state=0x300180, info=0x0
Oct 24 11:16:19 JETSdev kernel: [  127.055910] t_suspend: self-suspend 
SUB2[2281], sigpending=0, state=0x300180, info=0x0
Oct 24 11:16:19 JETSdev kernel: [  127.055928] t_suspend: self-suspend 
SUB1[2280], sigpending=0, state=0x300180, info=0x0
Oct 24 11:16:19 JETSdev kernel: [  127.060439] t_suspend: unblocked 
SUB3[2282], sigpending=1, state=0x302180, info=0xc
Oct 24 11:16:19 JETSdev kernel: [  127.060441]    => SIG19 pending
Oct 24 11:16:19 JETSdev kernel: [  127.060442] request_syscall_restart[lo]: 
SUB3, state=0x302180, info=0xc, rval=-4, sysflags=0x22
Oct 24 11:16:19 JETSdev kernel: [  127.060453] t_suspend: unblocked 
SUB2[2281], sigpending=1, state=0x302180, info=0xc
Oct 24 11:16:19 JETSdev kernel: [  127.060454]    => SIG19 pending
Oct 24 11:16:19 JETSdev kernel: [  127.060455] request_syscall_restart[lo]: 
SUB2, state=0x302180, info=0xc, rval=-4, sysflags=0x22
Oct 24 11:16:19 JETSdev kernel: [  127.060464] t_suspend: unblocked 
SUB1[2280], sigpending=1, state=0x302180, info=0xc
Oct 24 11:16:19 JETSdev kernel: [  127.060465]    => SIG19 pending
Oct 24 11:16:19 JETSdev kernel: [  127.060466] request_syscall_restart[lo]: 
SUB1, state=0x302180, info=0xc, rval=-4, sysflags=0x22
Oct 24 11:16:19 JETSdev kernel: [  127.060477] request_syscall_restart[lo]: 
MAIN, state=0xb02180, info=0xc, rval=-4, sysflags=0x6
Oct 24 11:16:19 JETSdev kernel: [  127.060935] t_suspend: self-suspend 
SUB3[2282], sigpending=0, state=0x300180, info=0x0
Oct 24 11:16:19 JETSdev kernel: [  127.060943] t_suspend: self-suspend 
SUB2[2281], sigpending=0, state=0x300180, info=0x0
Oct 24 11:16:19 JETSdev kernel: [  127.060950] t_suspend: self-suspend 
SUB1[2280], sigpending=0, state=0x300180, info=0x0
Oct 24 11:16:19 JETSdev kernel: [  127.061060] t_suspend: unblocked 
SUB3[2282], sigpending=1, state=0x302180, info=0xc
Oct 24 11:16:19 JETSdev kernel: [  127.061061]    => SIG19 pending
Oct 24 11:16:19 JETSdev kernel: [  127.061062] request_syscall_restart[lo]: 
SUB3, state=0x302180, info=0xc, rval=-4, sysflags=0x22
Oct 24 11:16:19 JETSdev kernel: [  127.061070] t_suspend: unblocked 
SUB2[2281], sigpending=1, state=0x302180, info=0xc
Oct 24 11:16:19 JETSdev kernel: [  127.061071]    => SIG19 pending
Oct 24 11:16:19 JETSdev kernel: [  127.061072] request_syscall_restart[lo]: 
SUB2, state=0x302180, info=0xc, rval=-4, sysflags=0x22
Oct 24 11:16:19 JETSdev kernel: [  127.061079] t_suspend: unblocked 
SUB1[2280], sigpending=1, state=0x302180, info=0xc
Oct 24 11:16:19 JETSdev kernel: [  127.061080]    => SIG19 pending
Oct 24 11:16:19 JETSdev kernel: [  127.061081] request_syscall_restart[lo]: 
SUB1, state=0x302180, info=0xc, rval=-4, sysflags=0x22
Oct 24 11:16:19 JETSdev kernel: [  127.061089] request_syscall_restart[lo]: 
MAIN, state=0xb02180, info=0xc, rval=-4, sysflags=0x6
Oct 24 11:16:19 JETSdev kernel: [  127.062059] t_suspend: self-suspend 
SUB3[2282], sigpending=0, state=0x300180, info=0x0
Oct 24 11:16:19 JETSdev kernel: [  127.062068] t_suspend: self-suspend 
SUB2[2281], sigpending=0, state=0x300180, info=0x0
Oct 24 11:16:19 JETSdev kernel: [  127.062074] t_suspend: self-suspend 
SUB1[2280], sigpending=0, state=0x300180, info=0x0
Oct 24 11:16:19 JETSdev kernel: [  127.062116] t_suspend: unblocked 
SUB1[2280], sigpending=1, state=0x300180, info=0xc
Oct 24 11:16:19 JETSdev kernel: [  127.062117]    => SIG32 pending
Oct 24 11:16:19 JETSdev kernel: [  127.062117] request_syscall_restart[lo]: 
SUB1, state=0x300180, info=0xc, rval=-4, sysflags=0x22
Oct 24 11:16:19 JETSdev kernel: [  127.062385] t_suspend: unblocked 
SUB3[2282], sigpending=1, state=0x302180, info=0xc
Oct 24 11:16:19 JETSdev kernel: [  127.062386]    => SIG19 pending
Oct 24 11:16:19 JETSdev kernel: [  127.062387] request_syscall_restart[lo]: 
SUB3, state=0x302180, info=0xc, rval=-4, sysflags=0x22
Oct 24 11:16:19 JETSdev kernel: [  127.062396] request_syscall_restart[lo]: 
CONT, state=0x302180, info=0xc, rval=-4, sysflags=0x6
Oct 24 11:16:19 JETSdev kernel: [  127.062404] request_syscall_restart[lo]: 
MAIN, state=0xb02180, info=0xc, rval=-4, sysflags=0x6
Oct 24 11:16:19 JETSdev kernel: [  127.063467] t_suspend: self-suspend 
SUB3[2282], sigpending=0, state=0x300180, info=0x0
Oct 24 11:16:19 JETSdev kernel: [  127.063482] t_suspend: self-suspend 
SUB2[2281], sigpending=0, state=0x300180, info=0x0
Oct 24 11:16:19 JETSdev kernel: [  127.068227] t_suspend: unblocked 
SUB2[2281], sigpending=1, state=0x300180, info=0xc
Oct 24 11:16:19 JETSdev kernel: [  127.068228]    => SIG32 pending
Oct 24 11:16:19 JETSdev kernel: [  127.068229] request_syscall_restart[lo]: 
SUB2, state=0x300180, info=0xc, rval=-4, sysflags=0x22
Oct 24 11:16:19 JETSdev kernel: [  127.068292] t_suspend: unblocked 
SUB3[2282], sigpending=1, state=0x302180, info=0xc
Oct 24 11:16:19 JETSdev kernel: [  127.068293]    => SIG19 pending
Oct 24 11:16:19 JETSdev kernel: [  127.068294] request_syscall_restart[lo]: 
SUB3, state=0x302180, info=0xc, rval=-4, sysflags=0x22
Oct 24 11:16:19 JETSdev kernel: [  127.068310] request_syscall_restart[lo]: 
MAIN, state=0xb02180, info=0xc, rval=-4, sysflags=0x6
Oct 24 11:16:19 JETSdev kernel: [  127.068562] t_suspend: self-suspend 
SUB3[2282], sigpending=0, state=0x300180, info=0x0


GNU gdb (Ubuntu/Linaro 7.4-2012.04-0ubuntu2.1) 7.4-2012.04
Copyright (C) 2012 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later 
<http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "i686-linux-gnu".
For bug reporting instructions, please see:
<http://bugs.launchpad.net/gdb-linaro/>...
Reading symbols from /home/jets/projects/tdelete_fail_test/test...done.
Breakpoint 1 at 0x80486fa: file test.c, line 20.
Breakpoint 2 at 0x8048765: file test.c, line 35.
Breakpoint 3 at 0x80487d0: file test.c, line 50.
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/i386-linux-gnu/libthread_db.so.1".
[New Thread 0xb7fd3b40 (LWP 2279)]
[New Thread 0xb7fceb40 (LWP 2280)]
[New Thread 0xb7deab40 (LWP 2281)]
[New Thread 0xb7de5b40 (LWP 2282)]
SUB1 TID:4
subTask1, suspend returned 0
[Switching to Thread 0xb7fceb40 (LWP 2280)]

Breakpoint 3, subTask1 () at test.c:50
50		retValue = t_suspend(0);
SUB2 TID:5
subTask2, suspend returned 0
[Switching to Thread 0xb7deab40 (LWP 2281)]

Breakpoint 2, subTask2 () at test.c:35
35		retValue = t_suspend(0);
subTask1, suspend returned -4
[Thread 0xb7deab40 (LWP 2281) exited]




* Re: [Xenomai] t_suspend and XNBREAK
  2013-10-16 18:30         ` Daniel Merrill
  2013-10-21 18:21           ` Daniel Merrill
@ 2014-05-20 12:52           ` Philippe Gerum
  1 sibling, 0 replies; 18+ messages in thread
From: Philippe Gerum @ 2014-05-20 12:52 UTC (permalink / raw)
  To: Daniel Merrill, xenomai

On 10/16/2013 08:30 PM, Daniel Merrill wrote:
>
>
> -----Original Message-----
> From: xenomai-bounces@xenomai.org [mailto:xenomai-bounces@xenomai.org] On
> Behalf Of Daniel Merrill
> Sent: Wednesday, October 09, 2013 10:02 AM
> To: 'Philippe Gerum'; xenomai@xenomai.org
> Subject: Re: [Xenomai] t_suspend and XNBREAK
>
> On 10/09/2013 06:37 PM, Daniel Merrill wrote:
>> On 10/09/2013 06:29 PM, Daniel Merrill wrote:
>>> All,
>>>
>>>
>>>
>>> I'm hoping maybe someone can shed a little more light on the issue we
>>> see occasionally. Occasionally our code using the psos+ skin will
>>> fail a
>>> t_suspend(0) with error code -4, which I found to be EINTR and
>>> appears to be set if the XNBREAK flag is set. After digging around in
>>> the documentation I found some references that seem to indicate that
>>> this means the thread was forcibly unblocked for some reason. Is
>>> there some way to diagnose what caused this (I'm having trouble
>>> pinpointing anything)? It appears when debugging that the thread
>>> never really suspends at all but returns immediately from the call.
>>> Does anyone have some pointers on what might be a good place to start
>>> looking for
>> the culprit? Thanks in advance.
>>
>> Are you tracing the application with GDB?
>>
>> We are using GDB to diagnose problems, can this cause the issue?
>
> It should not, but one of the reasons for a thread to get forcibly
> unblocked is to receive a regular linux signal when blocked in primary
> mode. Since GDB does send quite a few signals to the application when
> single-stepping/breakpointing the code, this information may be useful to
> know. In short, GDB/ptracing might magnify a bug in this area.
>
> If this issue also happens with no ptracing, then some other source kicked
> the thread out of wait state, and we'd have to instrument the code to know
> who does this.
>
> I believe we do see it more often when we are using GDB. I do know of one
> specific example that happens in gdb but does not happen if we run with no
> debugger. In that particular case it doesn't matter if we are single
> stepping or not, with breakpoints or without; the mere fact that gdb is
> attached seems to trigger the issue. We haven't been able to tell if it
> is gdb itself that is directly causing the issue or if attaching gdb is
> causing some other side effect (maybe timing related) that then causes the
> problem to appear.
>
> So I've been playing around trying to pinpoint the issue. It does appear
> to be caused only by GDB, which is unfortunate because it makes it harder
> for us to find issues with our platform, but we can live with it if we
> have to. My question now is this: is there any way to clear XNBREAK?
> It seems that once the issue happens no thread will suspend correctly, and
> in our case a few get stuck in an infinite loop since they were depending
> on the suspend to keep them from doing so. Thanks again.
>

You may want to try out the patch below, fixing a recent issue. Although 
I could not reproduce the bug you observed (it's most likely very 
timing-dependent), there are a lot of similarities between both issues:

http://git.xenomai.org/xenomai-2.6.git/commit/?id=589882956280d5cb4fdc181a5fcd5ae1188ab6ed

HTH,

-- 
Philippe.



end of thread, other threads:[~2014-05-20 12:52 UTC | newest]

Thread overview: 18+ messages
2013-10-09 16:29 [Xenomai] t_suspend and XNBREAK Daniel Merrill
2013-10-09 16:33 ` Philippe Gerum
2013-10-09 16:37   ` Daniel Merrill
2013-10-09 16:43     ` Philippe Gerum
2013-10-09 17:01       ` Daniel Merrill
2013-10-16 18:30         ` Daniel Merrill
2013-10-21 18:21           ` Daniel Merrill
2013-10-22 13:46             ` Philippe Gerum
2013-10-22 14:49               ` Gilles Chanteperdrix
2013-10-22 16:53                 ` Philippe Gerum
2013-10-22 22:10                   ` Daniel Merrill
2013-10-23 17:31                     ` Philippe Gerum
2013-10-23 23:11                       ` Daniel Merrill
2013-10-24 10:03                         ` Philippe Gerum
2013-10-24 15:52                           ` Daniel Merrill
2013-10-24 18:24                           ` Daniel Merrill
2013-10-22 16:50               ` Daniel Merrill
2014-05-20 12:52           ` Philippe Gerum
