Re: [Xenomai] [Emc-developers] "new RTOS" status: Scheduler (?) lockup on ARM

* Re: [Xenomai] [Emc-developers] "new RTOS" status: Scheduler (?) lockup on ARM
       [not found]   ` <F1ACE4FC-3E71-4498-B683-81F5C40CB6E3@mah.priv.at>
@ 2013-01-17  7:59     ` Bas Laarhoven
  2013-01-17  8:53       ` Gilles Chanteperdrix
  0 siblings, 1 reply; 17+ messages in thread
From: Bas Laarhoven @ 2013-01-17  7:59 UTC (permalink / raw)
  To: EMC developers; +Cc: xenomai

On 16-1-2013 20:36, Michael Haberler wrote:
> Am 16.01.2013 um 17:45 schrieb Bas Laarhoven:
>
>> On 16-1-2013 15:15, Michael Haberler wrote:
>>> ARM work:
>>>
>>> Several people have been able to get the Beaglebone ubuntu/xenomai setup working as outlined here: http://wiki.linuxcnc.org/cgi-bin/wiki.pl?BeagleboneDevsetup
>>> I have updated the kernel and rootfs image a few days ago so the kernel includes ext2/3/4 support compiled in, which should take care of two failure reports I got.
>>>
>>> Again that xenomai kernel is based on 3.2.21; it works very stable for me but there have been several reports of 'sudden stops'. The BB is a bit sensitive to power fluctuations but it might be more than that. As for that kernel, it works, but it is based on a branch which will see no further development. It supports most of the stuff needed to development; there might be some patches coming from more active BB users than me.
>> Hi Michael,
>>
>> Are you saying you don't have seen these 'sudden stops' yourself?
> No, never, after swapping to stronger power supplies; I have two of these boards running over NFS all the time. I dont have Linuxcnc running on them though, I'll do that and see if that changes the picture. Maybe keeping the torture test running helps trigger it.

Beginners error! :-P The power supply is indeed critical, but the 
stepdown converter on my BeBoPr is dimensioned for at least 2A and 
hasn't failed me yet.

I think that running linuxcnc is mandatory for the lockup. After a dozen 
runs, it looks like I can reproduce the lockup with 100% certainty 
within one hour.
Using the JTAG interface to attach a debugger to the Bone, I've found 
that once stalled the kernel is still running. It looks like it won't 
schedule properly and almost all time is spent in the cpu_idle thread.

The kernel with extra diagnostics produces these messages:

[ 3480.386342] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" 
disables this message.
[ 3480.395913] INFO: task axis:799 blocked for more than 120 seconds.
[ 3480.406643] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" 
disables this message.
[ 3600.408670] INFO: task hal_manualtoolc:788 blocked for more than 120 
seconds.

On one run I was able to re-issue a command from the command history 
before that console froze too.
Since the x86 version seems to be having none of these problems, it 
might be ARM specific.
Any suggestions on how to proceed? Are other people working on the ARM 
version?

I'm also sending this message to the xenomai mailing list as that might 
be a better place to resume this thread.

-- Bas

>
> NB there is an ipipe trace option, but that doesnt help if you cant talk to the damn thing.
>   
>> My system has frozen within one hour every time.
>> I'm aware of the power supply issues, but my configuration has _never_ experienced this problem over at least half a year of (heavy) use.
> just to clarifiy: you get the lockups only with the Xenomai kernel, I assume ? your other option is some Angström kernel or what exactly (isn't the list of options bewildering ;-?)
>
>> So I dare say that isn't the problem, at least not with my lock-ups I'm seeing.
>>
>> Currently I'm debugging the kernel to see what's going on. It looks like the kernel is idling, but the system is completely frozen (blocked, not scheduling?).
>> I've built a kernel with symbols a lot of extra debug options and am waiting for it to stop again right now. It's been running axis with the demo for almost an hour, the best result up to now...
>>
>> Do you have an opinion on what would be the best kernel version for (future) development? Is Xenomai up with the current kernels? Are the DT kernels usable on the bone or do we have to wait another couple of months for that?
> again it's a question of matching a Xenomai patch version with a stable base version, and have the itimer support in it - that's what reduces the range of options
>
> there are several base versions one could try; the integration towards mainline is now targeted at 3.8 and it seems the stock kernel has much of what is needed including PRUSS. It's also possible that the current Xenomai work for a 3.5.x base results in a match, I need to look into it. I was suggested to 'forward port the ipipe patch myself' but I chickened out on that one.
>
> summary: I'm pretty sure there is; I am not aware of tangible results.
>
> I will push the two patches I got from Stephan Kappertz and Sheng Chao Wong, I dont think they are online.
>
> - Michael
>
>
>> -- Bas
>>
>> Yes! Frozen Bone after 56 minutes uptime : ) Time to start debugging again!
>>
>>> Charles has done some great work for a high-speed stepgen on the Beaglebone, and a few folks have reproduced that, but I leave the fanfare to Charles here;)
>>>
>>> I have done no further work on the Raspberry, I do not consider that platform particularly useful to base work on.
>>>
>>> RTAI note:
>>>
>>> I was pointed to this thread recently, which is interesting to read for several reasons:
>>> https://mail.rtai.org/pipermail/rtai/2012-December/thread.html  "Git repository for RTAI"
>>>
>>> It does mention a Ubuntu 12.04 RTAI kernel (Shahbaz Youssefi shabbyx at gmail.com Tue Dec 18 11:09:41 CET 2012) - it might be worth following that up, maybe this is an option to get the current builds out of the 10.04 end-of-support-life situation. I would appreciate if somebody more RTAI-aware than me would pick that up.
>>>
>>> It also touches on the issue how the source repository and collaboration model touches upon a project's success, and that's an interesting read. It looks like the nature of open source communities changes due to for instance the github model, making it easier for the casual contributor, which is a sore spot with the linuxcnc proejct. Something to think about.
>>>
>>> - Michael
>>>
>>>
>>>
>>> ------------------------------------------------------------------------------
>>> Master Java SE, Java EE, Eclipse, Spring, Hibernate, JavaScript, jQuery
>>> and much more. Keep your Java skills current with LearnJavaNow -
>>> 200+ hours of step-by-step video tutorials by Java experts.
>>> SALE $49.99 this month only -- learn more at:
>>> http://p.sf.net/sfu/learnmore_122612
>>> _______________________________________________
>>> Emc-developers mailing list
>>> Emc-developers@lists.sourceforge.net
>>> https://lists.sourceforge.net/lists/listinfo/emc-developers
>
> ------------------------------------------------------------------------------
> Master Java SE, Java EE, Eclipse, Spring, Hibernate, JavaScript, jQuery
> and much more. Keep your Java skills current with LearnJavaNow -
> 200+ hours of step-by-step video tutorials by Java experts.
> SALE $49.99 this month only -- learn more at:
> http://p.sf.net/sfu/learnmore_122612
> _______________________________________________
> Emc-developers mailing list
> Emc-developers@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/emc-developers

^ permalink raw reply	[flat|nested] 17+ messages in thread