* [Xenomai-core] Linux lock-up with rtcanrecv
@ 2007-02-07 23:04 Jan Kiszka
  2007-02-07 23:09 ` Jan Kiszka
  2007-02-08  8:46 ` [Xenomai-core] " Wolfgang Grandegger
  0 siblings, 2 replies; 6+ messages in thread
From: Jan Kiszka @ 2007-02-07 23:04 UTC (permalink / raw)
  To: Philippe Gerum, Wolfgang Grandegger; +Cc: xenomai-core



Hi all,

fiddling with latest Xenomai trunk and 2.3.x on one of our robots (there
is still a bug in trunk /wrt broken timeouts of rt_dev_read on
xeno_16550A - different issue...), I ran into a weird behaviour of
rtcanrecv:

I have a continuous stream of a few thousand packets/s towards the
robot. When I start up two "rtcanrecv rtcan0 -p1000" instances (or one +
our own receiver application), the second one causes a Linux lock-up.
Sometimes this happens during startup of the second rtcanrecv, but at
the latest on its termination. Other RT tasks are still running. I can
resolve the lock-up by pulling the CAN cable; everything is fine
afterwards and can be cleaned up. I played with quite a few combinations
of recent ipipe patches and Xenomai revisions (even back to #1084 in
v2.3.x), no noticeable difference.

Seems like I have to take a closer look - once time permits and the
robot is available. So any ideas or attempts to reproduce this are
welcome, current .config attached (no magic knob found there yet).

Jan


PS: Wolfgang, any objections against "decoupling" -v from -p and
lowering the receiver priority to 0?

Index: src/utils/can/rtcanrecv.c
===================================================================
--- src/utils/can/rtcanrecv.c	(revision 2146)
+++ src/utils/can/rtcanrecv.c	(working copy)
@@ -192,6 +192,7 @@ int main(int argc, char **argv)

 	case 'p':
 	    print = strtoul(optarg, NULL, 0);
+	    break;

 	case 'v':
 	    verbose = 1;
@@ -312,7 +313,7 @@ int main(int argc, char **argv)
     }

     snprintf(name, sizeof(name), "rtcanrecv-%d", getpid());
-    ret = rt_task_shadow(&rt_task_desc, name, 1, 0);
+    ret = rt_task_shadow(&rt_task_desc, name, 0, 0);
     if (ret) {
 	fprintf(stderr, "rt_task_shadow: %s\n", strerror(-ret));
 	goto failure;
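
To illustrate the first hunk: without the `break`, the `'p'` case falls through into `'v'`, so every `-p` also switches on verbose output. The following is a minimal sketch of that effect, not the actual rtcanrecv code; `struct opts`, `handle_opt_before` and `handle_opt_after` are hypothetical names introduced only for this illustration.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, trimmed-down option state (not the real rtcanrecv variables). */
struct opts {
    unsigned long print;
    int verbose;
};

/* Pre-patch behaviour: no break after 'p', so the 'v' case runs as well. */
void handle_opt_before(struct opts *o, int opt, const char *arg)
{
    switch (opt) {
    case 'p':
        assert(arg != NULL);
        o->print = strtoul(arg, NULL, 0);
        /* falls through */
    case 'v':
        o->verbose = 1;
        break;
    default:
        break;
    }
}

/* Patched behaviour: 'p' only sets the print interval. */
void handle_opt_after(struct opts *o, int opt, const char *arg)
{
    switch (opt) {
    case 'p':
        assert(arg != NULL);
        o->print = strtoul(arg, NULL, 0);
        break;
    case 'v':
        o->verbose = 1;
        break;
    default:
        break;
    }
}
```

Calling `handle_opt_before(&o, 'p', "1000")` on a zeroed `struct opts` leaves `o.verbose` set to 1, whereas `handle_opt_after` leaves it at 0.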


[-- Attachment #1.2: config.bz2 --]
[-- Type: application/octet-stream, Size: 7460 bytes --]


* Re: [Xenomai-core] Linux lock-up with rtcanrecv
  2007-02-07 23:04 [Xenomai-core] Linux lock-up with rtcanrecv Jan Kiszka
@ 2007-02-07 23:09 ` Jan Kiszka
  2007-02-08  8:44   ` Wolfgang Grandegger
  2007-02-08  8:46 ` [Xenomai-core] " Wolfgang Grandegger
  1 sibling, 1 reply; 6+ messages in thread
From: Jan Kiszka @ 2007-02-07 23:09 UTC (permalink / raw)
  To: Philippe Gerum, Wolfgang Grandegger; +Cc: xenomai-core


Jan Kiszka wrote:
> Hi all,
> 
> fiddling with latest Xenomai trunk and 2.3.x on one of our robots (there
> is still a bug in trunk /wrt broken timeouts of rt_dev_read on
> xeno_16550A - different issue...), I ran into a weird behaviour of
> rtcanrecv:
> 
> I have a continuous stream of a few thousand packets/s towards the
> robot. When I start up two "rtcanrecv rtcan0 -p1000" instances (or one +
> our own receiver application), the second one causes a Linux lock-up.
> Sometimes this happens during startup of the second rtcanrecv, but at
> latest on its termination. Other RT tasks are still running. I can
> resolve the lock-up by pulling the CAN cable, everyone is fine
> afterwards and can be cleaned up. I played with quite a few combinations
> of recent ipipe patches and Xenomai revisions (even back to #1084 in
> v2.3.x), no noticeable difference.
> 

Forgot to mention one further observation: removing the usleep from
rtcanrecv's cleanup() works around the shutdown lock-up. I can't
interpret this yet. [BTW, Wolfgang, what is it good for?]

Jan



* [Xenomai-core] Re: Linux lock-up with rtcanrecv
  2007-02-08  8:46 ` [Xenomai-core] " Wolfgang Grandegger
@ 2007-02-08  8:43   ` Jan Kiszka
  2007-02-08 12:28     ` Jan Kiszka
  0 siblings, 1 reply; 6+ messages in thread
From: Jan Kiszka @ 2007-02-08  8:43 UTC (permalink / raw)
  To: Wolfgang Grandegger; +Cc: xenomai-core


Wolfgang Grandegger wrote:
> Hi Jan,
> 
> Jan Kiszka wrote:
>> Hi all,
>>
>> fiddling with latest Xenomai trunk and 2.3.x on one of our robots (there
>> is still a bug in trunk /wrt broken timeouts of rt_dev_read on
>> xeno_16550A - different issue...), I ran into a weird behaviour of
>> rtcanrecv:
>>
>> I have a continuous stream of a few thousand packets/s towards the
>> robot. When I start up two "rtcanrecv rtcan0 -p1000" instances (or one +
>> our own receiver application), the second one causes a Linux lock-up.
>> Sometimes this happens during startup of the second rtcanrecv, but at
>> latest on its termination. Other RT tasks are still running. I can
>> resolve the lock-up by pulling the CAN cable, everyone is fine
>> afterwards and can be cleaned up. I played with quite a few combinations
>> of recent ipipe patches and Xenomai revisions (even back to #1084 in
>> v2.3.x), no noticeable difference.
>>
>> Seems like I have to take a closer look - once time permits and the
>> robot is available. So any ideas or attempts to reproduce this are
>> welcome, current .config attached (no magic knob found there yet).
> 
> I will try to reproduce the problem a.s.a.p.

TiA.

> 
>> Jan
>>
>>
>> PS: Wolfgang, any objections against "decoupling" -v from -p and
>> lowering the receiver priority to 0?
> 
> No, -v with -p looks like a bug anyway. And does it make sense to define
> an option for the task priority?

I don't think so because a) the timestamps are recorded at IRQ level
anyway and b) we printf the result in secondary mode. My reason for
lowering the prio was to avoid having the receiver run under Linux with
SCHED_FIFO.
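
As background for the priority argument, here is a small sketch using plain POSIX calls rather than the Xenomai API. On Linux, SCHED_FIFO static priorities start at 1, while 0 is the only static priority valid for SCHED_OTHER. The assumption (not spelled out in this thread) is that the priority passed to rt_task_shadow is mirrored to the Linux side of the shadowed thread, so 0 keeps it out of SCHED_FIFO while it runs in secondary mode. `prio_valid_for` is a hypothetical helper.

```c
#include <sched.h>

/* Hypothetical helper: return 1 if `prio` is a valid static priority
 * for the given scheduling policy on this system, 0 otherwise. */
int prio_valid_for(int policy, int prio)
{
    int lo = sched_get_priority_min(policy);
    int hi = sched_get_priority_max(policy);

    /* Both calls return -1 on error (e.g. unknown policy). */
    if (lo == -1 || hi == -1)
        return 0;

    return prio >= lo && prio <= hi;
}
```

On a Linux host, `prio_valid_for(SCHED_FIFO, 0)` should be false and `prio_valid_for(SCHED_OTHER, 0)` true; other systems may use different ranges.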

Jan



* Re: [Xenomai-core] Linux lock-up with rtcanrecv
  2007-02-07 23:09 ` Jan Kiszka
@ 2007-02-08  8:44   ` Wolfgang Grandegger
  0 siblings, 0 replies; 6+ messages in thread
From: Wolfgang Grandegger @ 2007-02-08  8:44 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: xenomai-core

Jan Kiszka wrote:
> Jan Kiszka wrote:
>> Hi all,
>>
>> fiddling with latest Xenomai trunk and 2.3.x on one of our robots (there
>> is still a bug in trunk /wrt broken timeouts of rt_dev_read on
>> xeno_16550A - different issue...), I ran into a weird behaviour of
>> rtcanrecv:
>>
>> I have a continuous stream of a few thousand packets/s towards the
>> robot. When I start up two "rtcanrecv rtcan0 -p1000" instances (or one +
>> our own receiver application), the second one causes a Linux lock-up.
>> Sometimes this happens during startup of the second rtcanrecv, but at
>> latest on its termination. Other RT tasks are still running. I can
>> resolve the lock-up by pulling the CAN cable, everyone is fine
>> afterwards and can be cleaned up. I played with quite a few combinations
>> of recent ipipe patches and Xenomai revisions (even back to #1084 in
>> v2.3.x), no noticeable difference.
>>
> 
> Forgot to mention one further observation: removing the usleep from
> rtcanrecv's cleanup() works around the shutdown lock-up. I can't
> interpret this yet. [BTW, Wolfgang, what is it good for?]

Hm, I think the usleep() only makes sense for rtcansend, to allow messages
to go out before the close. You can remove it.

Wolfgang.




* [Xenomai-core] Re: Linux lock-up with rtcanrecv
  2007-02-07 23:04 [Xenomai-core] Linux lock-up with rtcanrecv Jan Kiszka
  2007-02-07 23:09 ` Jan Kiszka
@ 2007-02-08  8:46 ` Wolfgang Grandegger
  2007-02-08  8:43   ` Jan Kiszka
  1 sibling, 1 reply; 6+ messages in thread
From: Wolfgang Grandegger @ 2007-02-08  8:46 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: xenomai-core

Hi Jan,

Jan Kiszka wrote:
> Hi all,
> 
> fiddling with latest Xenomai trunk and 2.3.x on one of our robots (there
> is still a bug in trunk /wrt broken timeouts of rt_dev_read on
> xeno_16550A - different issue...), I ran into a weird behaviour of
> rtcanrecv:
> 
> I have a continuous stream of a few thousand packets/s towards the
> robot. When I start up two "rtcanrecv rtcan0 -p1000" instances (or one +
> our own receiver application), the second one causes a Linux lock-up.
> Sometimes this happens during startup of the second rtcanrecv, but at
> latest on its termination. Other RT tasks are still running. I can
> resolve the lock-up by pulling the CAN cable, everyone is fine
> afterwards and can be cleaned up. I played with quite a few combinations
> of recent ipipe patches and Xenomai revisions (even back to #1084 in
> v2.3.x), no noticeable difference.
> 
> Seems like I have to take a closer look - once time permits and the
> robot is available. So any ideas or attempts to reproduce this are
> welcome, current .config attached (no magic knob found there yet).

I will try to reproduce the problem a.s.a.p.

> Jan
> 
> 
> PS: Wolfgang, any objections against "decoupling" -v from -p and
> lowering the receiver priority to 0?

No, -v with -p looks like a bug anyway. And does it make sense to define 
an option for the task priority?

> 
> Index: src/utils/can/rtcanrecv.c
> ===================================================================
> --- src/utils/can/rtcanrecv.c	(revision 2146)
> +++ src/utils/can/rtcanrecv.c	(working copy)
> @@ -192,6 +192,7 @@ int main(int argc, char **argv)
> 
>  	case 'p':
>  	    print = strtoul(optarg, NULL, 0);
> +	    break;
> 
>  	case 'v':
>  	    verbose = 1;
> @@ -312,7 +313,7 @@ int main(int argc, char **argv)
>      }
> 
>      snprintf(name, sizeof(name), "rtcanrecv-%d", getpid());
> -    ret = rt_task_shadow(&rt_task_desc, name, 1, 0);
> +    ret = rt_task_shadow(&rt_task_desc, name, 0, 0);
>      if (ret) {
>  	fprintf(stderr, "rt_task_shadow: %s\n", strerror(-ret));
>  	goto failure;
> 

Wolfgang.



* Re: [Xenomai-core] Re: Linux lock-up with rtcanrecv
  2007-02-08  8:43   ` Jan Kiszka
@ 2007-02-08 12:28     ` Jan Kiszka
  0 siblings, 0 replies; 6+ messages in thread
From: Jan Kiszka @ 2007-02-08 12:28 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: xenomai-core


Jan Kiszka wrote:
> Wolfgang Grandegger wrote:
>> Hi Jan,
>>
>> Jan Kiszka wrote:
>>> Hi all,
>>>
>>> fiddling with latest Xenomai trunk and 2.3.x on one of our robots (there
>>> is still a bug in trunk /wrt broken timeouts of rt_dev_read on
>>> xeno_16550A - different issue...), I ran into a weird behaviour of
>>> rtcanrecv:
>>>
>>> I have a continuous stream of a few thousand packets/s towards the
>>> robot. When I start up two "rtcanrecv rtcan0 -p1000" instances (or one +
>>> our own receiver application), the second one causes a Linux lock-up.
>>> Sometimes this happens during startup of the second rtcanrecv, but at
>>> latest on its termination. Other RT tasks are still running. I can
>>> resolve the lock-up by pulling the CAN cable, everyone is fine
>>> afterwards and can be cleaned up. I played with quite a few combinations
>>> of recent ipipe patches and Xenomai revisions (even back to #1084 in
>>> v2.3.x), no noticeable difference.
>>>
>>> Seems like I have to take a closer look - once time permits and the
>>> robot is available. So any ideas or attempts to reproduce this are
>>> welcome, current .config attached (no magic knob found there yet).
>> I will try to reproduce the problem a.s.a.p.
> 
> TiA.

Grmbl. You can forget about it, I found the magic knob.

Normally I don't even notice that the tracer is running in the background.
This time I did notice it, but didn't realise that it was the cause.
Merely disabling it at runtime already "solves" my problem. It looks like
its overhead, combined with a few more Linux debug options, the high
IRQ load, and a low-end board, drove the otherwise only moderately loaded
box into starvation.

Sorry for making noise, let's go back to business.

Jan

