All of lore.kernel.org
* [Xenomai] Re : Sporadic problem : rt_task_sleep locked after debugging
@ 2013-05-02 13:49 Paolo Minazzi
  2013-05-03  1:06 ` Gilles Chanteperdrix
  0 siblings, 1 reply; 16+ messages in thread
From: Paolo Minazzi @ 2013-05-02 13:49 UTC (permalink / raw)
  To: xenomai

I'm Paolo Minazzi; some time ago I wrote to you about a problem
related to Xenomai and gdb.
After a long time, I think I am close to a solution.
I'd like your help and opinions, because other people may
have the same problem as well.

In short, if I debug a Xenomai user-space application,
sometimes after an rt_task_sleep the application appears locked.
Once this happens, I have to restart my ARM processor,
because Xenomai seems locked up. Normal Linux applications,
however, continue to work correctly.

A workaround that works for me is to comment out some lines in
xenomai-2.5.6/ksrc/nucleus/shadow.c:2620.
Please see my comments, which begin with "// COMMENT".

==================================================================================
if (xnthread_test_state(next, XNDEBUG)) {
         if (signal_pending(next_task)) {
                 sigset_t pending;
                 /*
                  * Do not grab the sighand lock here:
                  * it's useless, and we already own
                  * the runqueue lock, so this would
                  * expose us to deadlock situations on
                  * SMP.
                  */
                 wrap_get_sigpending(&pending, next_task);

                 // COMMENT if (sigismember(&pending, SIGSTOP) ||
                 // COMMENT     sigismember(&pending, SIGINT))
                 // COMMENT         goto no_ptrace;
         }
         xnthread_clear_state(next, XNDEBUG);
         unlock_timers();
}
no_ptrace:
==================================================================================

The source of my problem seems to be the locked timers.
If I unlock the timers manually (using a dirty hack) while in the bug
condition, I can repair the problem and continue to use Xenomai without
any further issues.

In xenomai-2.5.6/ksrc/nucleus/shadow.c:2615

                 /*
                  * Check whether we need to unlock the timers, each
                  * time a Linux task resumes from a stopped state,
                  * excluding tasks resuming shortly for entering a
                  * stopped state asap due to ptracing. To identify the
                  * latter, we need to check for SIGSTOP and SIGINT in
                  * order to encompass both the NPTL and LinuxThreads
                  * behaviours.
                  */

which explains why SIGINT and SIGSTOP are checked. I do not fully
understand all of this, but it seems related to the signals used by gdb.
Do you have an idea of how to solve the problem the right way?

Thanks for your time

Paolo Minazzi



^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [Xenomai] Re : Sporadic problem : rt_task_sleep locked after debugging
  2013-05-02 13:49 [Xenomai] Re : Sporadic problem : rt_task_sleep locked after debugging Paolo Minazzi
@ 2013-05-03  1:06 ` Gilles Chanteperdrix
  2013-05-03 14:46   ` Paolo Minazzi
  0 siblings, 1 reply; 16+ messages in thread
From: Gilles Chanteperdrix @ 2013-05-03  1:06 UTC (permalink / raw)
  To: Paolo Minazzi; +Cc: xenomai

On 05/02/2013 03:49 PM, Paolo Minazzi wrote:

> I'm Paolo Minazzi; some time ago I wrote to you about a problem
> related to Xenomai and gdb.
> After a long time, I think I am close to a solution.
> I'd like your help and opinions, because other people may
> have the same problem as well.
> 
> In short, if I debug a Xenomai user-space application,
> sometimes after an rt_task_sleep the application appears locked.
> Once this happens, I have to restart my ARM processor,
> because Xenomai seems locked up. Normal Linux applications,
> however, continue to work correctly.
> 
> A workaround that works for me is to comment out some lines in
> xenomai-2.5.6/ksrc/nucleus/shadow.c:2620.
> Please see my comments, which begin with "// COMMENT".


The problem, if I understand it correctly, is that timers are locked
when you debug your application with gdb (which is correct), but do not
restart when they should. What you have done here is essentially restart
them upon any SIGSTOP or SIGINT, which means that the timers are in fact
not stopped.

What exactly do you do with gdb when the bug happens? Do you restart the
application with "continue", do you detach the debugger with "detach", or
something else?

What you should do to understand what happens, is to enable the I-pipe
tracer and trigger a trace freeze when you hit a point where you are
sure the timers should have been restarted but have not. With enough
backlog, you may have some luck finding back what happened.

Regards.

-- 
                                                                Gilles.


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [Xenomai] Re : Sporadic problem : rt_task_sleep locked after debugging
  2013-05-03  1:06 ` Gilles Chanteperdrix
@ 2013-05-03 14:46   ` Paolo Minazzi
  2013-05-08  8:03     ` Paolo Minazzi
  0 siblings, 1 reply; 16+ messages in thread
From: Paolo Minazzi @ 2013-05-03 14:46 UTC (permalink / raw)
  To: xenomai

On 03/05/2013 03:06, Gilles Chanteperdrix wrote:
> The problem, if I understand it correctly, is that timers are locked 
> when you debug your application with gdb (which is correct), but do 
> not restart when they should. What you have done here is essentially 
> restart them upon any SIGSTOP or SIGINT, which means that the timers 
> are in fact not stopped.

Hi Gilles, you understood correctly.

Here is another test that shows the timers are locked.
I have also printed the following value:

xnarch_atomic_get(&nkpod->timerlck)

xnarch_atomic_get(&nkpod->timerlck) == 0    means     unlocked
xnarch_atomic_get(&nkpod->timerlck) > 0     means     locked

While stopped in gdb, I see 70 (OK, timers locked).
When I restart and all is OK, I see 0 (zero, i.e. timers unlocked).
When I restart and hit the bug, I see 1 (bug: I should see zero).

Sometimes, for some reason, the timers remain locked.
This is the problem.

>
> What exactly do you do with gdb when the bug happens? Do you restart the
> application with "continue", do you detach the debugger with "detach", or
> something else?
To see the bug, I have to:

-1- start gdbserver
-2- start gdbclient
-3- handle SIG32 nostop noprint
-4- set brk point #1
-5- set brk point #2
-6- set brk point #3
-7- c to continue (start the program)
-8- it stops on brk point #3
-9- c to continue
-10- it stops on brk point #1
-11- c to continue
-12- it stops on brk point #2
-13- disable brk point #2
-14- c to continue
====> now timers are locked !


It may be interesting that if I remove command -3-,

-3- handle SIG32 nostop noprint

then at point -12- I begin to receive SIG32 signals.
The SIG32 signals seem to be due to the detection of new threads; at this
moment, in fact, some threads are being created.
Maybe these SIG32 signals cause confusion.

> What you should do to understand what happens, is to enable the I-pipe
> tracer and trigger a trace freeze when you hit a point where you are
> sure the timers should have been restarted but have not. With enough
> backlog, you may have some luck finding back what happened.
>
> Regards.
>
I have tried, but for now it is difficult for me to capture a trace at the
point of the bug.

Thanks
Paolo



^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [Xenomai] Re : Sporadic problem : rt_task_sleep locked after debugging
  2013-05-03 14:46   ` Paolo Minazzi
@ 2013-05-08  8:03     ` Paolo Minazzi
  2013-05-08 12:58       ` Gilles Chanteperdrix
  0 siblings, 1 reply; 16+ messages in thread
From: Paolo Minazzi @ 2013-05-08  8:03 UTC (permalink / raw)
  To: xenomai

Hi Gilles,
I have added a check in

file          :   xenomai-2.5.6/ksrc/skins/native/task.c
function      :   int rt_task_delete(RT_TASK *task)

See my code marked with =======>

It seems that if I'm working in a gdb session, rt_task_delete() keeps
the timer lock of the killed thread.
With my additional check, all seems OK.

I could also insert my check directly in xnpod_delete_thread().

What do you think about it?
Paolo



// ********************************************************************************
int rt_task_delete(RT_TASK *task)
{
         int err = 0;
         spl_t s;

         if (!task) {
                 if (!xnpod_primary_p())
                         return -EPERM;

                 task = xeno_current_task();
         } else if (xnpod_asynch_p())
                 return -EPERM;

         xnlock_get_irqsave(&nklock, s);

         task = xeno_h2obj_validate(task, XENO_TASK_MAGIC, RT_TASK);

         if (!task) {
                 err = xeno_handle_error(task, XENO_TASK_MAGIC, RT_TASK);
                 goto unlock_and_exit;
         }

         /* Make sure the target task is out of any safe section. */
         err = __native_task_safewait(task);

         if (err)
                 goto unlock_and_exit;

         =========> if (xnthread_test_state(&task->thread_base, XNDEBUG))
         =========> {
         =========>     unlock_timers();
         =========> }


         /* Does not return if task is current. */
         xnpod_delete_thread(&task->thread_base);

       unlock_and_exit:

         xnlock_put_irqrestore(&nklock, s);

         return err;
}
// ********************************************************************************



^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [Xenomai] Re : Sporadic problem : rt_task_sleep locked after debugging
  2013-05-08  8:03     ` Paolo Minazzi
@ 2013-05-08 12:58       ` Gilles Chanteperdrix
       [not found]         ` <518A505C.2090207@mitrol.it>
  0 siblings, 1 reply; 16+ messages in thread
From: Gilles Chanteperdrix @ 2013-05-08 12:58 UTC (permalink / raw)
  To: Paolo Minazzi; +Cc: xenomai

On 05/08/2013 10:03 AM, Paolo Minazzi wrote:

> Hi Gilles,
> I have added a check in
> 
> file          :   xenomai-2.5.6/ksrc/skins/native/task.c
> function      :   int rt_task_delete(RT_TASK *task)
> 
> See my code marked with =======>
> 
> It seems that if I'm working in a gdb session, rt_task_delete() keeps
> the timer lock of the killed thread.
> With my additional check, all seems OK.
> 
> I could also insert my check directly in xnpod_delete_thread().
> 
> What do you think about it?


That is a hack. You are working around the real bug. The real fix is to
get the timers unlocked at the time when they should be unlocked. To do
this, I suggest once more that you enable the I-pipe tracer and trigger a
trace freeze at a point in the code where you are sure the timers should
be unlocked but are not, and, with enough backlog, try to understand what
happened.

In the example you sent, they should be unlocked as soon as you type
"continue" in gdb console, not any time later.

-- 
                                                                Gilles.


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [Xenomai] Re : Sporadic problem : rt_task_sleep locked after debugging
       [not found]             ` <518A5600.20508@mitrol.it>
@ 2013-05-08 14:30               ` Paolo Minazzi
  2013-05-08 14:43                 ` Gilles Chanteperdrix
  2013-05-08 16:06                 ` Philippe Gerum
  0 siblings, 2 replies; 16+ messages in thread
From: Paolo Minazzi @ 2013-05-08 14:30 UTC (permalink / raw)
  To: xenomai

I think I am very close to the solution of this problem.
Thanks to Gilles for his patience.

Now I will try again to summarize the problem.

********************
**** MY PROBLEM ****
********************
I have a user-space Xenomai application.
It works fine. But if I try to debug it with gdb, sometimes, at some points,
Xenomai stops working. All Xenomai threads appear to be stopped.
In my case they are stopped in rt_task_sleep() or other wait operations.
Linux, however, continues to work normally.
To make Xenomai usable and working again, I have to reset the processor.
I have no other way.


**************************
**** SOLUTION / PATCH ****
**************************
file          :   xenomai-2.5.6/ksrc/skins/native/task.c
function      :   int rt_task_delete(RT_TASK *task)

See my code marked with =======>

int rt_task_delete(RT_TASK *task)
{
         int err = 0;
         spl_t s;

         if (!task) {
                 if (!xnpod_primary_p())
                         return -EPERM;

                 task = xeno_current_task();
         } else if (xnpod_asynch_p())
                 return -EPERM;

         xnlock_get_irqsave(&nklock, s);

         task = xeno_h2obj_validate(task, XENO_TASK_MAGIC, RT_TASK);

         if (!task) {
                 err = xeno_handle_error(task, XENO_TASK_MAGIC, RT_TASK);
                 goto unlock_and_exit;
         }

         /* Make sure the target task is out of any safe section. */
         err = __native_task_safewait(task);

         if (err)
                 goto unlock_and_exit;

         =========> if (xnthread_test_state(&task->thread_base, XNDEBUG))
         =========> {
         =========>     unlock_timers();
         =========> }


         /* Does not return if task is current. */
         xnpod_delete_thread(&task->thread_base);

       unlock_and_exit:

         xnlock_put_irqrestore(&nklock, s);

         return err;
}

As you can see, my patch unlocks the timers if the Xenomai thread being
deleted is still in debug mode.


***************************
**** DETAILS OF MY BUG ****
***************************
I have a Xenomai application with 70 real-time threads.
Suppose that thread 1 deletes thread 70 with rt_task_delete().
This is my case.

I put a breakpoint and run my Xenomai application.
When my breakpoint is reached, gdb stops all my Xenomai threads.

If I press "c" to continue, gdb restarts my application,
that is, every thread, one by one (thread 1, thread 2, thread 3,
..., thread 70).

When I press "c" in gdb, the following scenario can happen:

1) gdb starts thread 1
2) thread 1 starts immediately
3)   thread 1 deletes thread 70 via rt_task_delete()
...
4) gdb starts thread 2
5) gdb starts thread 3
6) gdb starts thread 4
7) gdb starts thread 5
8) gdb starts thread 6
...
.) gdb starts thread 68
.) gdb starts thread 69
.) gdb starts thread 70

Thread 1 finds thread 70 still in debug mode!

My patch fixes this problem.

I realize that it is a very special case, but it is my case.

I'd like to know if the patch is valid, or whether it can be written in a
different way.
For example, I could insert my patch directly in xnpod_delete_thread().

The function unlock_timers() cannot be called from
xenomai-2.5.6/ksrc/skins/native/task.c
because it is defined static. This is a detail; there are simple ways to
solve this.

Thanks in advance,
Paolo Minazzi



^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [Xenomai] Re : Sporadic problem : rt_task_sleep locked after debugging
  2013-05-08 14:30               ` Paolo Minazzi
@ 2013-05-08 14:43                 ` Gilles Chanteperdrix
  2013-05-08 16:06                 ` Philippe Gerum
  1 sibling, 0 replies; 16+ messages in thread
From: Gilles Chanteperdrix @ 2013-05-08 14:43 UTC (permalink / raw)
  To: Paolo Minazzi; +Cc: xenomai

On 05/08/2013 04:30 PM, Paolo Minazzi wrote:

> I have a Xenomai application with 70 real-time threads.
> Suppose that thread 1 deletes thread 70 with rt_task_delete().
> This is my case.
> 
> I put a breakpoint and run my Xenomai application.
> When my breakpoint is reached, gdb stops all my Xenomai threads.
> 
> If I press "c" to continue, gdb restarts my application,
> that is, every thread, one by one (thread 1, thread 2, thread 3,
> ..., thread 70).
> 
> When I press "c" in gdb, the following scenario can happen:
> 
> 1) gdb starts thread 1
> 2) thread 1 starts immediately
> 3)   thread 1 deletes thread 70 via rt_task_delete()
> ...
> 4) gdb starts thread 2
> 5) gdb starts thread 3
> 6) gdb starts thread 4
> 7) gdb starts thread 5
> 8) gdb starts thread 6
> ...
> .) gdb starts thread 68
> .) gdb starts thread 69
> .) gdb starts thread 70
> 
> Thread 1 finds thread 70 still in debug mode!
> 
> My patch fixes this problem.
> 
> I realize that it is a very special case, but it is my case.
> 
> I'd like to know if the patch is valid, or whether it can be written in a
> different way.
> For example, I could insert my patch directly in xnpod_delete_thread().
> 
> The function unlock_timers() cannot be called from
> xenomai-2.5.6/ksrc/skins/native/task.c
> because it is defined static. This is a detail; there are simple ways to
> solve this.


Hi Paolo,

this explanation makes the conditions under which you observe the bug
clearer, but this case is supposed to be handled already by the Xenomai code.

If you look at do_taskexit_event() in ksrc/nucleus/shadow.c, you will see
at the very beginning a test that does exactly the same thing as your
patch.

Normally, any deletion of a Xenomai thread should cause this function to
be called. So some more debugging is necessary to understand what
happens when rt_task_delete is called in your case. Please try to
trigger an I-pipe tracer freeze in xnpod_delete_thread, and use a large
number of points in /proc/ipipe/trace/post_trace_points.

Regards.

-- 
                                                                Gilles.


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [Xenomai] Re : Sporadic problem : rt_task_sleep locked after debugging
  2013-05-08 14:30               ` Paolo Minazzi
  2013-05-08 14:43                 ` Gilles Chanteperdrix
@ 2013-05-08 16:06                 ` Philippe Gerum
  2013-05-08 16:10                   ` Philippe Gerum
  1 sibling, 1 reply; 16+ messages in thread
From: Philippe Gerum @ 2013-05-08 16:06 UTC (permalink / raw)
  To: Paolo Minazzi; +Cc: xenomai

On 05/08/2013 04:30 PM, Paolo Minazzi wrote:
> I think to be very near to the solution of this problem.
> Thanks to Gilles for his patience.
> 
> Now I will retry to make a summary of the problem.
>

<snip>

> Thread 1 finds thread 70 still in debug mode!
> 

Which is expected: thread 70 has to be scheduled in with no pending ptrace signals in order to leave this mode, and this may happen long after the truckload of other threads releases the CPU.

> My patch fixes this problem.
> 
> I realize that it is a very special case, but it is my case.
> 
> I'd like to know if the patch is valid, or whether it can be written in a
> different way.
> For example, I could insert my patch directly in xnpod_delete_thread().
> 
> The function unlock_timers() cannot be called from
> xenomai-2.5.6/ksrc/skins/native/task.c
> because it is defined static. This is a detail; there are simple ways to
> solve this.
> 

No, the patch really is wrong, but what you expose does reveal a bug in the Xenomai core for sure. As Gilles told you, you would only be papering over the real bug, which would likely show up in a different situation.

First we need to check for a lock imbalance; I don't think that code is particularly safe.
Please apply the patch below, hoping that it won't affect the timings too much.
A lock imbalance should trigger a BUG assertion, and we will try to find any weirdness
in the locking sequence based on the kernel log output this patch also produces.
Please apply this patch on the stock Xenomai code, then send us back any valuable
kernel output. TIA,

diff --git a/ksrc/nucleus/shadow.c b/ksrc/nucleus/shadow.c
index c91a6f3..edbbbfd 100644
--- a/ksrc/nucleus/shadow.c
+++ b/ksrc/nucleus/shadow.c
@@ -725,18 +725,30 @@ static inline void set_linux_task_priority(struct task_struct *p, int prio)
 		       prio, p->comm);
 }
 
-static inline void lock_timers(void)
+static inline void __lock_timers(struct xnthread *thread, const char *fn)
 {
 	xnarch_atomic_inc(&nkpod->timerlck);
 	setbits(nktbase.status, XNTBLCK);
+	XENO_BUGON(NUCLEUS, xnarch_atomic_get(&nkpod->timerlck) == 0);
+	printk(KERN_WARNING "%s LOCK, thread=%s[%d], count=%d\n",
+	       fn, thread->name, xnthread_user_pid(thread),
+	       xnarch_atomic_get(&nkpod->timerlck));
 }
 
-static inline void unlock_timers(void)
+static inline void __unlock_timers(struct xnthread *thread, const char *fn)
 {
-	if (xnarch_atomic_dec_and_test(&nkpod->timerlck))
+	if (xnarch_atomic_dec_and_test(&nkpod->timerlck)) {
 		clrbits(nktbase.status, XNTBLCK);
+		XENO_BUGON(NUCLEUS, xnarch_atomic_get(&nkpod->timerlck) != 0);
+	}
+	printk(KERN_WARNING "%s UNLOCK, thread=%s[%d], count=%d\n",
+	       fn, thread->name, xnthread_user_pid(thread),
+	       xnarch_atomic_get(&nkpod->timerlck));
 }
 
+#define lock_timers(t)		__lock_timers((t), __func__)
+#define unlock_timers(t)	__unlock_timers((t), __func__)
+
 static void xnshadow_dereference_skin(unsigned magic)
 {
 	unsigned muxid;
@@ -2572,7 +2584,7 @@ static inline void do_taskexit_event(struct task_struct *p)
 	XENO_BUGON(NUCLEUS, !xnpod_root_p());
 
 	if (xnthread_test_state(thread, XNDEBUG))
-		unlock_timers();
+		unlock_timers(thread);
 
 	magic = xnthread_get_magic(thread);
 
@@ -2636,7 +2648,7 @@ static inline void do_schedule_event(struct task_struct *next_task)
 					goto no_ptrace;
 			}
 			xnthread_clear_state(next, XNDEBUG);
-			unlock_timers();
+			unlock_timers(next);
 		}
 
 	      no_ptrace:
@@ -2691,7 +2703,7 @@ static inline void do_sigwake_event(struct task_struct *p)
 		    sigismember(&pending, SIGSTOP)
 		    || sigismember(&pending, SIGINT)) {
 			xnthread_set_state(thread, XNDEBUG);
-			lock_timers();
+			lock_timers(thread);
 		}
 	}

-- 
Philippe.


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [Xenomai] Re : Sporadic problem : rt_task_sleep locked after debugging
  2013-05-08 16:06                 ` Philippe Gerum
@ 2013-05-08 16:10                   ` Philippe Gerum
  2013-05-09 13:36                     ` Philippe Gerum
  0 siblings, 1 reply; 16+ messages in thread
From: Philippe Gerum @ 2013-05-08 16:10 UTC (permalink / raw)
  To: Paolo Minazzi; +Cc: xenomai

On 05/08/2013 06:06 PM, Philippe Gerum wrote:
> On 05/08/2013 04:30 PM, Paolo Minazzi wrote:
>> I think to be very near to the solution of this problem.
>> Thanks to Gilles for his patience.
>>
>> Now I will retry to make a summary of the problem.
>>
>
> <snip>
>
>> The thread 1 finds thread 70 in debug mode !
>>
>
> Which is expected. thread 70 has to be scheduled in with no pending ptrace signals for leaving this mode, and this may happen long after the truckload of other threads releases the CPU.
>
>> My patch adjust this problem.
>>
>> I realize that it is a very special case, but it is my case.
>>
>> I'd like to know if the patch is valid or can be written in a different
>> way.
>> For example, I could insert my patch directly in xnpod_delete_thread().
>>
>> The function unlock_timers() cannot be called from
>> xenomai-2.5.6/ksrc/skins/native/task.c
>> because it is defined static. This is a detail. There are simple ways to
>> solve this.
>>
>
> No, really the patch is wrong, but what you expose does reveal a bug in the Xenomai core for sure. As Gilles told you, you would be only papering over that real bug, which would likely show up in a different situation.
>
> First we need to check for a lock imbalance, I don't think that code is particularly safe.

I mean a lock imbalance introduced by an unexpected race between the 
locking/unlocking calls. The assertions introduced by this patch might 
help detect this, with some luck.

-- 
Philippe.


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [Xenomai] Re : Sporadic problem : rt_task_sleep locked after debugging
  2013-05-08 16:10                   ` Philippe Gerum
@ 2013-05-09 13:36                     ` Philippe Gerum
  2013-05-09 13:52                       ` Paolo Minazzi
  0 siblings, 1 reply; 16+ messages in thread
From: Philippe Gerum @ 2013-05-09 13:36 UTC (permalink / raw)
  To: Paolo Minazzi; +Cc: xenomai

On 05/08/2013 06:10 PM, Philippe Gerum wrote:
> On 05/08/2013 06:06 PM, Philippe Gerum wrote:
>> On 05/08/2013 04:30 PM, Paolo Minazzi wrote:
>>> I think to be very near to the solution of this problem.
>>> Thanks to Gilles for his patience.
>>>
>>> Now I will retry to make a summary of the problem.
>>>
>>
>> <snip>
>>
>>> The thread 1 finds thread 70 in debug mode !
>>>
>>
>> Which is expected. thread 70 has to be scheduled in with no pending 
>> ptrace signals for leaving this mode, and this may happen long after 
>> the truckload of other threads releases the CPU.
>>
>>> My patch adjust this problem.
>>>
>>> I realize that it is a very special case, but it is my case.
>>>
>>> I'd like to know if the patch is valid or can be written in a different
>>> way.
>>> For example, I could insert my patch directly in xnpod_delete_thread().
>>>
>>> The function unlock_timers() cannot be called from
>>> xenomai-2.5.6/ksrc/skins/native/task.c
>>> because it is defined static. This is a detail. There are simple ways to
>>> solve this.
>>>
>>
>> No, really the patch is wrong, but what you expose does reveal a bug 
>> in the Xenomai core for sure. As Gilles told you, you would be only 
>> papering over that real bug, which would likely show up in a different 
>> situation.
>>
>> First we need to check for a lock imbalance, I don't think that code 
>> is particularly safe.
> 
> I mean a lock imbalance introduced by an unexpected race between the 
> locking/unlocking calls. The assertions introduced by this patch might 
> help detecting this, with some luck.
> 

Could you apply the patch below, and report whether some task triggers
the message it introduces when things go wrong with gdb? TIA,

diff --git a/ksrc/nucleus/pod.c b/ksrc/nucleus/pod.c
index 868f98f..2da3265 100644
--- a/ksrc/nucleus/pod.c
+++ b/ksrc/nucleus/pod.c
@@ -1215,6 +1215,10 @@ void xnpod_delete_thread(xnthread_t *thread)
 #else /* !CONFIG_XENO_HW_UNLOCKED_SWITCH */
 	} else {
 #endif /* !CONFIG_XENO_HW_UNLOCKED_SWITCH */
+		if (xnthread_test_state(thread, XNSHADOW|XNMAPPED) == XNSHADOW)
+			printk(KERN_WARNING "%s: deleting unmapped shadow %s\n",
+			       __func__, thread->name);
+
 		xnpod_run_hooks(&nkpod->tdeleteq, thread, "DELETE");
 
 		xnsched_forget(thread);

-- 
Philippe.


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [Xenomai] Re : Sporadic problem : rt_task_sleep locked after debugging
  2013-05-09 13:36                     ` Philippe Gerum
@ 2013-05-09 13:52                       ` Paolo Minazzi
  2013-05-09 13:58                         ` Gilles Chanteperdrix
  0 siblings, 1 reply; 16+ messages in thread
From: Paolo Minazzi @ 2013-05-09 13:52 UTC (permalink / raw)
  To: Philippe Gerum; +Cc: xenomai

On 09/05/2013 15:36, Philippe Gerum wrote:
> On 05/08/2013 06:10 PM, Philippe Gerum wrote:
>> On 05/08/2013 06:06 PM, Philippe Gerum wrote:
>>> On 05/08/2013 04:30 PM, Paolo Minazzi wrote:
>>>> I think to be very near to the solution of this problem.
>>>> Thanks to Gilles for his patience.
>>>>
>>>> Now I will retry to make a summary of the problem.
>>>>
>>> <snip>
>>>
>>>> The thread 1 finds thread 70 in debug mode !
>>>>
>>> Which is expected. thread 70 has to be scheduled in with no pending
>>> ptrace signals for leaving this mode, and this may happen long after
>>> the truckload of other threads releases the CPU.
>>>
>>>> My patch adjust this problem.
>>>>
>>>> I realize that it is a very special case, but it is my case.
>>>>
>>>> I'd like to know if the patch is valid or can be written in a different
>>>> way.
>>>> For example, I could insert my patch directly in xnpod_delete_thread().
>>>>
>>>> The function unlock_timers() cannot be called from
>>>> xenomai-2.5.6/ksrc/skins/native/task.c
>>>> because it is defined static. This is a detail. There are simple ways to
>>>> solve this.
>>>>
>>> No, really the patch is wrong, but what you expose does reveal a bug
>>> in the Xenomai core for sure. As Gilles told you, you would be only
>>> papering over that real bug, which would likely show up in a different
>>> situation.
>>>
>>> First we need to check for a lock imbalance, I don't think that code
>>> is particularly safe.
>> I mean a lock imbalance introduced by an unexpected race between the
>> locking/unlocking calls. The assertions introduced by this patch might
>> help detecting this, with some luck.
>>
> Could you apply that patch below, and report whether some task triggers
> the message it introduces, when things go wrong with gdb? TIA,
>
> diff --git a/ksrc/nucleus/pod.c b/ksrc/nucleus/pod.c
> index 868f98f..2da3265 100644
> --- a/ksrc/nucleus/pod.c
> +++ b/ksrc/nucleus/pod.c
> @@ -1215,6 +1215,10 @@ void xnpod_delete_thread(xnthread_t *thread)
>   #else /* !CONFIG_XENO_HW_UNLOCKED_SWITCH */
>   	} else {
>   #endif /* !CONFIG_XENO_HW_UNLOCKED_SWITCH */
> +		if (xnthread_test_state(thread, XNSHADOW|XNMAPPED) == XNSHADOW)
> +			printk(KERN_WARNING "%s: deleting unmapped shadow %s\n",
> +			       __func__, thread->name);
> +
>   		xnpod_run_hooks(&nkpod->tdeleteq, thread, "DELETE");
>
>   		xnsched_forget(thread);
>
I have tried, but there are no messages on the console.
This time I'm sure: to be certain, I added other messages, and I do see them.
Paolo




^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [Xenomai] Re : Sporadic problem : rt_task_sleep locked after debugging
  2013-05-09 13:52                       ` Paolo Minazzi
@ 2013-05-09 13:58                         ` Gilles Chanteperdrix
  2013-05-09 14:04                           ` Philippe Gerum
  0 siblings, 1 reply; 16+ messages in thread
From: Gilles Chanteperdrix @ 2013-05-09 13:58 UTC (permalink / raw)
  To: Paolo Minazzi; +Cc: xenomai

On 05/09/2013 03:52 PM, Paolo Minazzi wrote:

> Il 09/05/2013 15.36, Philippe Gerum ha scritto:
>> On 05/08/2013 06:10 PM, Philippe Gerum wrote:
>>> On 05/08/2013 06:06 PM, Philippe Gerum wrote:
>>>> On 05/08/2013 04:30 PM, Paolo Minazzi wrote:
>>>>> I think to be very near to the solution of this problem.
>>>>> Thanks to Gilles for his patience.
>>>>>
>>>>> Now I will retry to make a summary of the problem.
>>>>>
>>>> <snip>
>>>>
>>>>> The thread 1 finds thread 70 in debug mode !
>>>>>
>>>> Which is expected. thread 70 has to be scheduled in with no pending
>>>> ptrace signals for leaving this mode, and this may happen long after
>>>> the truckload of other threads releases the CPU.
>>>>
>>>>> My patch adjust this problem.
>>>>>
>>>>> I realize that it is a very special case, but it is my case.
>>>>>
>>>>> I'd like to know if the patch is valid or can be written in a different
>>>>> way.
>>>>> For example, I could insert my patch directly in xnpod_delete_thread().
>>>>>
>>>>> The function unlock_timers() cannot be called from
>>>>> xenomai-2.5.6/ksrc/skins/native/task.c
>>>>> because it is defined static. This is a detail. There are simple ways to
>>>>> solve this.
>>>>>
>>>> No, really the patch is wrong, but what you expose does reveal a bug
>>>> in the Xenomai core for sure. As Gilles told you, you would be only
>>>> papering over that real bug, which would likely show up in a different
>>>> situation.
>>>>
>>>> First we need to check for a lock imbalance, I don't think that code
>>>> is particularly safe.
>>> I mean a lock imbalance introduced by an unexpected race between the
>>> locking/unlocking calls. The assertions introduced by this patch might
>>> help detecting this, with some luck.
>>>
>> Could you apply that patch below, and report whether some task triggers
>> the message it introduces, when things go wrong with gdb? TIA,
>>
>> diff --git a/ksrc/nucleus/pod.c b/ksrc/nucleus/pod.c
>> index 868f98f..2da3265 100644
>> --- a/ksrc/nucleus/pod.c
>> +++ b/ksrc/nucleus/pod.c
>> @@ -1215,6 +1215,10 @@ void xnpod_delete_thread(xnthread_t *thread)
>>   #else /* !CONFIG_XENO_HW_UNLOCKED_SWITCH */
>>   	} else {
>>   #endif /* !CONFIG_XENO_HW_UNLOCKED_SWITCH */
>> +		if (xnthread_test_state(thread, XNSHADOW|XNMAPPED) == XNSHADOW)
>> +			printk(KERN_WARNING "%s: deleting unmapped shadow %s\n",
>> +			       __func__, thread->name);
>> +
>>   		xnpod_run_hooks(&nkpod->tdeleteq, thread, "DELETE");
>>
>>   		xnsched_forget(thread);
>>
> I have tried but no messages on the console.
> This time I'm sure. To be sure I have added other messages and I see them.
> Paolo


On my side, I have run your example with CONFIG_XENO_OPT_DEBUG_NUCLEUS 
turned on, and I get the following message in the same conditions as 
you:

Xenomai: xnshadow_unmap invoked for a non-current task (t=demo0/p=demo0)
Master time base: clock=8300431919
(...)
[<c002da88>] (unwind_backtrace+0x0/0xf4) from [<c002bf68>] (show_stack+0x20/0x24)
[<c002bf68>] (show_stack+0x20/0x24) from [<c00e6198>] (xnshadow_unmap+0x1e0/0x270)
[<c00e6198>] (xnshadow_unmap+0x1e0/0x270) from [<c010918c>] (__shadow_delete_hook+0x4c/0x54)
[<c010918c>] (__shadow_delete_hook+0x4c/0x54) from [<c00a3c78>] (xnpod_fire_callouts+0x44/0x80)
[<c00a3c78>] (xnpod_fire_callouts+0x44/0x80) from [<c00ae220>] (xnpod_delete_thread+0x91c/0x15b4)
[<c00ae220>] (xnpod_delete_thread+0x91c/0x15b4) from [<c0106fc4>] (rt_task_delete+0x100/0x344)
[<c0106fc4>] (rt_task_delete+0x100/0x344) from [<c01119a8>] (__rt_task_delete+0x8c/0x90)
[<c01119a8>] (__rt_task_delete+0x8c/0x90) from [<c00e6f08>] (losyscall_event+0xd8/0x268)
[<c00e6f08>] (losyscall_event+0xd8/0x268) from [<c007f7f0>] (__ipipe_dispatch_event+0x100/0x264)
[<c007f7f0>] (__ipipe_dispatch_event+0x100/0x264) from [<c002e444>] (__ipipe_syscall_root+0x64/0x13c)
[<c002e444>] (__ipipe_syscall_root+0x64/0x13c) from [<c002736c>] (vector_swi+0x6c/0xac)

So, maybe the XNSHADOW bit is not set yet? Because the deletion hooks
are definitely run when the bug happens.

-- 
                                                                Gilles.


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [Xenomai] Re : Sporadic problem : rt_task_sleep locked after debugging
  2013-05-09 13:58                         ` Gilles Chanteperdrix
@ 2013-05-09 14:04                           ` Philippe Gerum
  2013-05-09 14:08                             ` Philippe Gerum
  0 siblings, 1 reply; 16+ messages in thread
From: Philippe Gerum @ 2013-05-09 14:04 UTC (permalink / raw)
  To: Gilles Chanteperdrix; +Cc: xenomai

On 05/09/2013 03:58 PM, Gilles Chanteperdrix wrote:
> On 05/09/2013 03:52 PM, Paolo Minazzi wrote:
>
>> Il 09/05/2013 15.36, Philippe Gerum ha scritto:
>>> On 05/08/2013 06:10 PM, Philippe Gerum wrote:
>>>> On 05/08/2013 06:06 PM, Philippe Gerum wrote:
>>>>> On 05/08/2013 04:30 PM, Paolo Minazzi wrote:
>>>>>> I think to be very near to the solution of this problem.
>>>>>> Thanks to Gilles for his patience.
>>>>>>
>>>>>> Now I will retry to make a summary of the problem.
>>>>>>
>>>>> <snip>
>>>>>
>>>>>> The thread 1 finds thread 70 in debug mode !
>>>>>>
>>>>> Which is expected. thread 70 has to be scheduled in with no pending
>>>>> ptrace signals for leaving this mode, and this may happen long after
>>>>> the truckload of other threads releases the CPU.
>>>>>
>>>>>> My patch adjust this problem.
>>>>>>
>>>>>> I realize that it is a very special case, but it is my case.
>>>>>>
>>>>>> I'd like to know if the patch is valid or can be written in a different
>>>>>> way.
>>>>>> For example, I could insert my patch directly in xnpod_delete_thread().
>>>>>>
>>>>>> The function unlock_timers() cannot be called from
>>>>>> xenomai-2.5.6/ksrc/skins/native/task.c
>>>>>> because it is defined static. This is a detail. There are simple ways to
>>>>>> solve this.
>>>>>>
>>>>> No, really the patch is wrong, but what you expose does reveal a bug
>>>>> in the Xenomai core for sure. As Gilles told you, you would be only
>>>>> papering over that real bug, which would likely show up in a different
>>>>> situation.
>>>>>
>>>>> First we need to check for a lock imbalance, I don't think that code
>>>>> is particularly safe.
>>>> I mean a lock imbalance introduced by an unexpected race between the
>>>> locking/unlocking calls. The assertions introduced by this patch might
>>>> help detecting this, with some luck.
>>>>
>>> Could you apply that patch below, and report whether some task triggers
>>> the message it introduces, when things go wrong with gdb? TIA,
>>>
>>> diff --git a/ksrc/nucleus/pod.c b/ksrc/nucleus/pod.c
>>> index 868f98f..2da3265 100644
>>> --- a/ksrc/nucleus/pod.c
>>> +++ b/ksrc/nucleus/pod.c
>>> @@ -1215,6 +1215,10 @@ void xnpod_delete_thread(xnthread_t *thread)
>>>    #else /* !CONFIG_XENO_HW_UNLOCKED_SWITCH */
>>>    	} else {
>>>    #endif /* !CONFIG_XENO_HW_UNLOCKED_SWITCH */
>>> +		if (xnthread_test_state(thread, XNSHADOW|XNMAPPED) == XNSHADOW)
>>> +			printk(KERN_WARNING "%s: deleting unmapped shadow %s\n",
>>> +			       __func__, thread->name);
>>> +
>>>    		xnpod_run_hooks(&nkpod->tdeleteq, thread, "DELETE");
>>>
>>>    		xnsched_forget(thread);
>>>
>> I have tried but no messages on the console.
>> This time I'm sure. To be sure I have added other messages and I see them.
>> Paolo
>
>
> On my side, I have run your example with CONFIG_XENO_OPT_DEBUG_NUCLEUS
> turned on, and I get the following message in the same conditions as
> you:
>
> Xenomai: xnshadow_unmap invoked for a non-current task (t=demo0/p=demo0)
> Master time base: clock=8300431919
> (...)
> [<c002da88>] (unwind_backtrace+0x0/0xf4) from [<c002bf68>] (show_stack+0x20/0x24)
> [<c002bf68>] (show_stack+0x20/0x24) from [<c00e6198>] (xnshadow_unmap+0x1e0/0x270)
> [<c00e6198>] (xnshadow_unmap+0x1e0/0x270) from [<c010918c>] (__shadow_delete_hook+0x4c/0x54)
> [<c010918c>] (__shadow_delete_hook+0x4c/0x54) from [<c00a3c78>] (xnpod_fire_callouts+0x44/0x80)
> [<c00a3c78>] (xnpod_fire_callouts+0x44/0x80) from [<c00ae220>] (xnpod_delete_thread+0x91c/0x15b4)
> [<c00ae220>] (xnpod_delete_thread+0x91c/0x15b4) from [<c0106fc4>] (rt_task_delete+0x100/0x344)
> [<c0106fc4>] (rt_task_delete+0x100/0x344) from [<c01119a8>] (__rt_task_delete+0x8c/0x90)
> [<c01119a8>] (__rt_task_delete+0x8c/0x90) from [<c00e6f08>] (losyscall_event+0xd8/0x268)
> [<c00e6f08>] (losyscall_event+0xd8/0x268) from [<c007f7f0>] (__ipipe_dispatch_event+0x100/0x264)
> [<c007f7f0>] (__ipipe_dispatch_event+0x100/0x264) from [<c002e444>] (__ipipe_syscall_root+0x64/0x13c)
> [<c002e444>] (__ipipe_syscall_root+0x64/0x13c) from [<c002736c>] (vector_swi+0x6c/0xac)
>
> So, maybe the XNSHADOW bit is not set yet? Because the deletion hooks
> are definitely run when the bug happens.
>

XNSHADOW is set early, when the tcb is created over the userland 
trampoline routine. The problem is that such a thread may still be 
bearing XNDORMANT, so it would not be caught in the userland kill 
redirect earlier in xnpod_delete(). That would in turn explain why the 
latest assertion did not trigger. Ok, that would be a bug. Need to think 
about this now.

-- 
Philippe.


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [Xenomai] Re : Sporadic problem : rt_task_sleep locked after debugging
  2013-05-09 14:04                           ` Philippe Gerum
@ 2013-05-09 14:08                             ` Philippe Gerum
  2013-05-10  6:47                               ` Paolo Minazzi
  0 siblings, 1 reply; 16+ messages in thread
From: Philippe Gerum @ 2013-05-09 14:08 UTC (permalink / raw)
  To: Paolo Minazzi; +Cc: xenomai

On 05/09/2013 04:04 PM, Philippe Gerum wrote:
> On 05/09/2013 03:58 PM, Gilles Chanteperdrix wrote:
>> On 05/09/2013 03:52 PM, Paolo Minazzi wrote:
>>
>>> Il 09/05/2013 15.36, Philippe Gerum ha scritto:
>>>> On 05/08/2013 06:10 PM, Philippe Gerum wrote:
>>>>> On 05/08/2013 06:06 PM, Philippe Gerum wrote:
>>>>>> On 05/08/2013 04:30 PM, Paolo Minazzi wrote:
>>>>>>> I think to be very near to the solution of this problem.
>>>>>>> Thanks to Gilles for his patience.
>>>>>>>
>>>>>>> Now I will retry to make a summary of the problem.
>>>>>>>
>>>>>> <snip>
>>>>>>
>>>>>>> The thread 1 finds thread 70 in debug mode !
>>>>>>>
>>>>>> Which is expected. thread 70 has to be scheduled in with no pending
>>>>>> ptrace signals for leaving this mode, and this may happen long after
>>>>>> the truckload of other threads releases the CPU.
>>>>>>
>>>>>>> My patch adjust this problem.
>>>>>>>
>>>>>>> I realize that it is a very special case, but it is my case.
>>>>>>>
>>>>>>> I'd like to know if the patch is valid or can be written in a 
>>>>>>> different
>>>>>>> way.
>>>>>>> For example, I could insert my patch directly in 
>>>>>>> xnpod_delete_thread().
>>>>>>>
>>>>>>> The function unlock_timers() cannot be called from
>>>>>>> xenomai-2.5.6/ksrc/skins/native/task.c
>>>>>>> because it is defined static. This is a detail. There are simple 
>>>>>>> ways to
>>>>>>> solve this.
>>>>>>>
>>>>>> No, really the patch is wrong, but what you expose does reveal a bug
>>>>>> in the Xenomai core for sure. As Gilles told you, you would be only
>>>>>> papering over that real bug, which would likely show up in a 
>>>>>> different
>>>>>> situation.
>>>>>>
>>>>>> First we need to check for a lock imbalance, I don't think that code
>>>>>> is particularly safe.
>>>>> I mean a lock imbalance introduced by an unexpected race between the
>>>>> locking/unlocking calls. The assertions introduced by this patch might
>>>>> help detecting this, with some luck.
>>>>>
>>>> Could you apply that patch below, and report whether some task triggers
>>>> the message it introduces, when things go wrong with gdb? TIA,
>>>>
>>>> diff --git a/ksrc/nucleus/pod.c b/ksrc/nucleus/pod.c
>>>> index 868f98f..2da3265 100644
>>>> --- a/ksrc/nucleus/pod.c
>>>> +++ b/ksrc/nucleus/pod.c
>>>> @@ -1215,6 +1215,10 @@ void xnpod_delete_thread(xnthread_t *thread)
>>>>    #else /* !CONFIG_XENO_HW_UNLOCKED_SWITCH */
>>>>        } else {
>>>>    #endif /* !CONFIG_XENO_HW_UNLOCKED_SWITCH */
>>>> +        if (xnthread_test_state(thread, XNSHADOW|XNMAPPED) == 
>>>> XNSHADOW)
>>>> +            printk(KERN_WARNING "%s: deleting unmapped shadow %s\n",
>>>> +                   __func__, thread->name);
>>>> +
>>>>            xnpod_run_hooks(&nkpod->tdeleteq, thread, "DELETE");
>>>>
>>>>            xnsched_forget(thread);
>>>>
>>> I have tried but no messages on the console.
>>> This time I'm sure. To be sure I have added other messages and I see 
>>> them.
>>> Paolo
>>
>>
>> On my side, I have run your example with CONFIG_XENO_OPT_DEBUG_NUCLEUS
>> turned on, and I get the following message in the same conditions as
>> you:
>>
>> Xenomai: xnshadow_unmap invoked for a non-current task (t=demo0/p=demo0)
>> Master time base: clock=8300431919
>> (...)
>> [<c002da88>] (unwind_backtrace+0x0/0xf4) from [<c002bf68>] 
>> (show_stack+0x20/0x24)
>> [<c002bf68>] (show_stack+0x20/0x24) from [<c00e6198>] 
>> (xnshadow_unmap+0x1e0/0x270)
>> [<c00e6198>] (xnshadow_unmap+0x1e0/0x270) from [<c010918c>] 
>> (__shadow_delete_hook+0x4c/0x54)
>> [<c010918c>] (__shadow_delete_hook+0x4c/0x54) from [<c00a3c78>] 
>> (xnpod_fire_callouts+0x44/0x80)
>> [<c00a3c78>] (xnpod_fire_callouts+0x44/0x80) from [<c00ae220>] 
>> (xnpod_delete_thread+0x91c/0x15b4)
>> [<c00ae220>] (xnpod_delete_thread+0x91c/0x15b4) from [<c0106fc4>] 
>> (rt_task_delete+0x100/0x344)
>> [<c0106fc4>] (rt_task_delete+0x100/0x344) from [<c01119a8>] 
>> (__rt_task_delete+0x8c/0x90)
>> [<c01119a8>] (__rt_task_delete+0x8c/0x90) from [<c00e6f08>] 
>> (losyscall_event+0xd8/0x268)
>> [<c00e6f08>] (losyscall_event+0xd8/0x268) from [<c007f7f0>] 
>> (__ipipe_dispatch_event+0x100/0x264)
>> [<c007f7f0>] (__ipipe_dispatch_event+0x100/0x264) from [<c002e444>] 
>> (__ipipe_syscall_root+0x64/0x13c)
>> [<c002e444>] (__ipipe_syscall_root+0x64/0x13c) from [<c002736c>] 
>> (vector_swi+0x6c/0xac)
>>
>> So, maybe the XNSHADOW bit is not set yet? Because the deletion hooks
>> are definitely run when the bug happens.
>>
> 
> XNSHADOW is set early when the tcb is created over the userland 
> trampoline routine. The problem is that such thread may be still bearing 
> XNDORMANT, so it would not be caught in the userland kill redirect, 
> earlier in xnpod_delete(). This would explain in turn why the latest 
> assertion did not trigger. Ok, that would be a bug. Need to think about 
> this now.
> 

Ok, does this one trigger?

diff --git a/ksrc/nucleus/pod.c b/ksrc/nucleus/pod.c
index 868f98f..9f14bf1 100644
--- a/ksrc/nucleus/pod.c
+++ b/ksrc/nucleus/pod.c
@@ -1215,6 +1215,10 @@ void xnpod_delete_thread(xnthread_t *thread)
 #else /* !CONFIG_XENO_HW_UNLOCKED_SWITCH */
 	} else {
 #endif /* !CONFIG_XENO_HW_UNLOCKED_SWITCH */
+		if (xnthread_test_state(thread, XNSHADOW|XNDORMANT) == (XNSHADOW|XNDORMANT))
+			printk(KERN_WARNING "%s: deleting dormant shadow %s\n",
+			       __func__, thread->name);
+
 		xnpod_run_hooks(&nkpod->tdeleteq, thread, "DELETE");
 
 		xnsched_forget(thread);

-- 
Philippe.


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [Xenomai] Re : Sporadic problem : rt_task_sleep locked after debugging
  2013-05-09 14:08                             ` Philippe Gerum
@ 2013-05-10  6:47                               ` Paolo Minazzi
  2013-06-17  7:20                                 ` Philippe Gerum
  0 siblings, 1 reply; 16+ messages in thread
From: Paolo Minazzi @ 2013-05-10  6:47 UTC (permalink / raw)
  To: Philippe Gerum, Gilles Chanteperdrix; +Cc: xenomai

Il 09/05/2013 16.08, Philippe Gerum ha scritto:
> On 05/09/2013 04:04 PM, Philippe Gerum wrote:
>> On 05/09/2013 03:58 PM, Gilles Chanteperdrix wrote:
>>> On 05/09/2013 03:52 PM, Paolo Minazzi wrote:
>>>
>>>> Il 09/05/2013 15.36, Philippe Gerum ha scritto:
>>>>> On 05/08/2013 06:10 PM, Philippe Gerum wrote:
>>>>>> On 05/08/2013 06:06 PM, Philippe Gerum wrote:
>>>>>>> On 05/08/2013 04:30 PM, Paolo Minazzi wrote:
>>>>>>>> I think to be very near to the solution of this problem.
>>>>>>>> Thanks to Gilles for his patience.
>>>>>>>>
>>>>>>>> Now I will retry to make a summary of the problem.
>>>>>>>>
>>>>>>> <snip>
>>>>>>>
>>>>>>>> The thread 1 finds thread 70 in debug mode !
>>>>>>>>
>>>>>>> Which is expected. thread 70 has to be scheduled in with no pending
>>>>>>> ptrace signals for leaving this mode, and this may happen long after
>>>>>>> the truckload of other threads releases the CPU.
>>>>>>>
>>>>>>>> My patch adjust this problem.
>>>>>>>>
>>>>>>>> I realize that it is a very special case, but it is my case.
>>>>>>>>
>>>>>>>> I'd like to know if the patch is valid or can be written in a
>>>>>>>> different
>>>>>>>> way.
>>>>>>>> For example, I could insert my patch directly in
>>>>>>>> xnpod_delete_thread().
>>>>>>>>
>>>>>>>> The function unlock_timers() cannot be called from
>>>>>>>> xenomai-2.5.6/ksrc/skins/native/task.c
>>>>>>>> because it is defined static. This is a detail. There are simple
>>>>>>>> ways to
>>>>>>>> solve this.
>>>>>>>>
>>>>>>> No, really the patch is wrong, but what you expose does reveal a bug
>>>>>>> in the Xenomai core for sure. As Gilles told you, you would be only
>>>>>>> papering over that real bug, which would likely show up in a
>>>>>>> different
>>>>>>> situation.
>>>>>>>
>>>>>>> First we need to check for a lock imbalance, I don't think that code
>>>>>>> is particularly safe.
>>>>>> I mean a lock imbalance introduced by an unexpected race between the
>>>>>> locking/unlocking calls. The assertions introduced by this patch might
>>>>>> help detecting this, with some luck.
>>>>>>
>>>>> Could you apply that patch below, and report whether some task triggers
>>>>> the message it introduces, when things go wrong with gdb? TIA,
>>>>>
>>>>> diff --git a/ksrc/nucleus/pod.c b/ksrc/nucleus/pod.c
>>>>> index 868f98f..2da3265 100644
>>>>> --- a/ksrc/nucleus/pod.c
>>>>> +++ b/ksrc/nucleus/pod.c
>>>>> @@ -1215,6 +1215,10 @@ void xnpod_delete_thread(xnthread_t *thread)
>>>>>     #else /* !CONFIG_XENO_HW_UNLOCKED_SWITCH */
>>>>>         } else {
>>>>>     #endif /* !CONFIG_XENO_HW_UNLOCKED_SWITCH */
>>>>> +        if (xnthread_test_state(thread, XNSHADOW|XNMAPPED) ==
>>>>> XNSHADOW)
>>>>> +            printk(KERN_WARNING "%s: deleting unmapped shadow %s\n",
>>>>> +                   __func__, thread->name);
>>>>> +
>>>>>             xnpod_run_hooks(&nkpod->tdeleteq, thread, "DELETE");
>>>>>
>>>>>             xnsched_forget(thread);
>>>>>
>>>> I have tried but no messages on the console.
>>>> This time I'm sure. To be sure I have added other messages and I see
>>>> them.
>>>> Paolo
>>>
>>> On my side, I have run your example with CONFIG_XENO_OPT_DEBUG_NUCLEUS
>>> turned on, and I get the following message in the same conditions as
>>> you:
>>>
>>> Xenomai: xnshadow_unmap invoked for a non-current task (t=demo0/p=demo0)
>>> Master time base: clock=8300431919
>>> (...)
>>> [<c002da88>] (unwind_backtrace+0x0/0xf4) from [<c002bf68>]
>>> (show_stack+0x20/0x24)
>>> [<c002bf68>] (show_stack+0x20/0x24) from [<c00e6198>]
>>> (xnshadow_unmap+0x1e0/0x270)
>>> [<c00e6198>] (xnshadow_unmap+0x1e0/0x270) from [<c010918c>]
>>> (__shadow_delete_hook+0x4c/0x54)
>>> [<c010918c>] (__shadow_delete_hook+0x4c/0x54) from [<c00a3c78>]
>>> (xnpod_fire_callouts+0x44/0x80)
>>> [<c00a3c78>] (xnpod_fire_callouts+0x44/0x80) from [<c00ae220>]
>>> (xnpod_delete_thread+0x91c/0x15b4)
>>> [<c00ae220>] (xnpod_delete_thread+0x91c/0x15b4) from [<c0106fc4>]
>>> (rt_task_delete+0x100/0x344)
>>> [<c0106fc4>] (rt_task_delete+0x100/0x344) from [<c01119a8>]
>>> (__rt_task_delete+0x8c/0x90)
>>> [<c01119a8>] (__rt_task_delete+0x8c/0x90) from [<c00e6f08>]
>>> (losyscall_event+0xd8/0x268)
>>> [<c00e6f08>] (losyscall_event+0xd8/0x268) from [<c007f7f0>]
>>> (__ipipe_dispatch_event+0x100/0x264)
>>> [<c007f7f0>] (__ipipe_dispatch_event+0x100/0x264) from [<c002e444>]
>>> (__ipipe_syscall_root+0x64/0x13c)
>>> [<c002e444>] (__ipipe_syscall_root+0x64/0x13c) from [<c002736c>]
>>> (vector_swi+0x6c/0xac)
>>>
>>> So, maybe the XNSHADOW bit is not set yet? Because the deletion hooks
>>> are definitely run when the bug happens.
>>>
>> XNSHADOW is set early when the tcb is created over the userland
>> trampoline routine. The problem is that such thread may be still bearing
>> XNDORMANT, so it would not be caught in the userland kill redirect,
>> earlier in xnpod_delete(). This would explain in turn why the latest
>> assertion did not trigger. Ok, that would be a bug. Need to think about
>> this now.
>>
> Ok, does this one trigger?
>
> diff --git a/ksrc/nucleus/pod.c b/ksrc/nucleus/pod.c
> index 868f98f..9f14bf1 100644
> --- a/ksrc/nucleus/pod.c
> +++ b/ksrc/nucleus/pod.c
> @@ -1215,6 +1215,10 @@ void xnpod_delete_thread(xnthread_t *thread)
>   #else /* !CONFIG_XENO_HW_UNLOCKED_SWITCH */
>   	} else {
>   #endif /* !CONFIG_XENO_HW_UNLOCKED_SWITCH */
> +		if (xnthread_test_state(thread, XNSHADOW|XNDORMANT) == (XNSHADOW|XNDORMANT))
> +			printk(KERN_WARNING "%s: deleting dormant shadow %s\n",
> +			       __func__, thread->name);
> +
>   		xnpod_run_hooks(&nkpod->tdeleteq, thread, "DELETE");
>
>   		xnsched_forget(thread);
>
Ok, it seems to trigger.
This is my output on the "c" command of gdb.

/D # gdbserver :8888 prova
Process prova created; pid = 295
Listening on port 8888
Remote debugging from host 198.18.0.1
xnpod_delete_thread: deleting dormant shadow demo0
xnpod_delete_thread: deleting dormant shadow demo1
xnpod_delete_thread: deleting dormant shadow demo2
xnpod_delete_thread: deleting dormant shadow demo3
xnpod_delete_thread: deleting dormant shadow demo4
xnpod_delete_thread: deleting dormant shadow demo5
xnpod_delete_thread: deleting dormant shadow demo6
xnpod_delete_thread: deleting dormant shadow demo7
xnpod_delete_thread: deleting dormant shadow demo8
xnpod_delete_thread: deleting dormant shadow demo9
xnpod_delete_thread: deleting dormant shadow demo10
xnpod_delete_thread: deleting dormant shadow demo11
xnpod_delete_thread: deleting dormant shadow demo12
....

Paolo



^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [Xenomai] Re : Sporadic problem : rt_task_sleep locked after debugging
  2013-05-10  6:47                               ` Paolo Minazzi
@ 2013-06-17  7:20                                 ` Philippe Gerum
  0 siblings, 0 replies; 16+ messages in thread
From: Philippe Gerum @ 2013-06-17  7:20 UTC (permalink / raw)
  To: Paolo Minazzi; +Cc: xenomai

On 05/10/2013 08:47 AM, Paolo Minazzi wrote:

> /D # gdbserver :8888 prova
> Process prova created; pid = 295
> Listening on port 8888
> Remote debugging from host 198.18.0.1
> xnpod_delete_thread: deleting dormant shadow demo0
> xnpod_delete_thread: deleting dormant shadow demo1
> xnpod_delete_thread: deleting dormant shadow demo2
> xnpod_delete_thread: deleting dormant shadow demo3
> xnpod_delete_thread: deleting dormant shadow demo4
> xnpod_delete_thread: deleting dormant shadow demo5
> xnpod_delete_thread: deleting dormant shadow demo6
> xnpod_delete_thread: deleting dormant shadow demo7
> xnpod_delete_thread: deleting dormant shadow demo8
> xnpod_delete_thread: deleting dormant shadow demo9
> xnpod_delete_thread: deleting dormant shadow demo10
> xnpod_delete_thread: deleting dormant shadow demo11
> xnpod_delete_thread: deleting dormant shadow demo12

This should fix the issue:
http://git.xenomai.org/?p=xenomai-2.6.git;a=commit;h=b2ddb284c9b7e6cbd6c2a0d26544e9a0a618f32a

> ....
> 
> Paolo
> 
> 


-- 
Philippe.


^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2013-06-17  7:20 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-05-02 13:49 [Xenomai] Re : Sporadic problem : rt_task_sleep locked after debugging Paolo Minazzi
2013-05-03  1:06 ` Gilles Chanteperdrix
2013-05-03 14:46   ` Paolo Minazzi
2013-05-08  8:03     ` Paolo Minazzi
2013-05-08 12:58       ` Gilles Chanteperdrix
     [not found]         ` <518A505C.2090207@mitrol.it>
     [not found]           ` <518A52A7.5000801@xenomai.org>
     [not found]             ` <518A5600.20508@mitrol.it>
2013-05-08 14:30               ` Paolo Minazzi
2013-05-08 14:43                 ` Gilles Chanteperdrix
2013-05-08 16:06                 ` Philippe Gerum
2013-05-08 16:10                   ` Philippe Gerum
2013-05-09 13:36                     ` Philippe Gerum
2013-05-09 13:52                       ` Paolo Minazzi
2013-05-09 13:58                         ` Gilles Chanteperdrix
2013-05-09 14:04                           ` Philippe Gerum
2013-05-09 14:08                             ` Philippe Gerum
2013-05-10  6:47                               ` Paolo Minazzi
2013-06-17  7:20                                 ` Philippe Gerum
