* [Xenomai] Command line freeze during xeno-regression-test on omap4460
@ 2014-01-06 15:30 Andreas Glatz
  2014-01-06 17:33 ` Gilles Chanteperdrix
                   ` (2 more replies)
  0 siblings, 3 replies; 28+ messages in thread
From: Andreas Glatz @ 2014-01-06 15:30 UTC (permalink / raw)
  To: xenomai

Hi,

I managed to produce a kernel (v3.8.13) with the xenomai 2.6.3 ipipe patch and
a rootfs (debian wheezy) with the xenomai 2.6.3 libraries for my Pandaboard ES
(omap4460). The simple regression test, which only calls dd during the
switchtest, works fine. However, the regression test with the linux test
project (ltp-full-20130904) scripts causes some sort of system lockup.
After that I can only ctrl-c xeno-regression-test (i.e. switchtest), which,
however, doesn't help to regain console access (neither over ethernet nor
serial).

Here's what I did:

-- Building --
As recommended in the Xenomai 2.6 readme I followed the instructions in [1]
to produce a kernel and filesystem. To get a xenomai kernel I had to do
three things differently:

*) I used: git checkout origin/v3.8.x -b tmp
*) I applied ipipe-core-3.8.13-arm-3.patch from the xenomai-2.6 git tree as
described in the Xenomai 2.6 readme
*) I disabled KGDB and TIDSPBRIDGE since those produced compile errors (see
config [2])

After a while I obtained the following messages from dmesg [3] and from the
command prompt:

root@arm:~# cat /proc/version
Linux version 3.8.13-x3.6 (aglatz@linuxvbox) (gcc version 4.7.3 20130328
(prerelease) (crosstool-NG linaro-1.13.1-4.7-2013.04-20130415 - Linaro GCC
2013.04) ) #4 SMP Sat Jan 4 15:54:20 GMT 2014

-- Testing Linux --
To see if everything works I downloaded and cross-compiled
ltp-full-20130904 [4] with the same toolchain and flags (-march=armv7-a
-mfpu=vfp3) as the xenomai libs and runtime. I started ltp with "./runltp
-p -l dohell-2014-01-06-1.log -S xenomai.skiplist" and after a while it
finished with a few failed tests [5]. The console access, however, worked
fine.

-- Testing Xenomai --
First I could successfully run the simple xenomai regression test:
xeno-regression-test -l "/usr/lib/xenomai/testsuite/dohell -m /tmp 100" -t
2 which produced the output in [6] and the following additional messages
with dmesg:

[  476.215057] Xenomai: RTDM: closing file descriptor 1.
[  477.434936] Xenomai: Posix: destroying semaphore f0069c00.
[  477.440887] Xenomai: Posix: destroying mutex f0069a00.
[  477.475372] xnheap: destroying shared heap 'rt_heap: heap' with 16384
bytes still in use.
[  479.008453] Xenomai: Switching rt_task to secondary mode after exception
#0 from user-space at 0x9620 (pid 2145)
[  480.574462] Xenomai: watchdog triggered -- signaling runaway thread
'rt_task'
[  480.582061] [sched_delayed] sched: RT throttling activated
[  557.336425] Xenomai: Posix: closing message queue descriptor 3.

and  "cat /proc/xenomai/*" produced [7].

When I started the realistic xenomai regression test: xeno-regression-test
-l "/usr/lib/xenomai/testsuite/dohell -m /tmp -l /opt/ltp" -t 2 everything
seemed fine at first - I could log on and start top to inspect the running
processes. However, the command line (over serial and ethernet)
consistently freezes after a while (at different ltp tests though). First I
thought it was the massive system load which doesn't leave CPU for the
console... however ctrl-c of xeno-regression-test does not help to regain
console access... even after waiting for ~10mins I could not regain access
to the existing consoles nor open new consoles over ethernet. It seems to me
that every syscall into kernel space causes the calling process to get
blocked and never be scheduled again.

-- Remaining questions --
*) Has anyone experienced something similar and/or found a(n)
explanation/fix/workaround?
*) Are there more debugging options I could try?

Thanks for any help,

Andreas

-- References --
[1] http://eewiki.net/display/linuxonarm/PandaBoard
[2]
https://dcn060062.dcn.ed.ac.uk/main.php?cmd=image&var1=XenomaiOnArm%2F3.8.13-x3.6.config
[3]
https://dcn060062.dcn.ed.ac.uk/main.php?cmd=image&var1=XenomaiOnArm%2Fdmesg_after_boot.txt
[4]
https://sourceforge.net/projects/ltp/files/LTP%20Source/ltp-20130904/ltp-full-20130904.tar.xz/download
[5]
https://dcn060062.dcn.ed.ac.uk/main.php?cmd=image&var1=XenomaiOnArm%2Fdohell-2014-01-06-1.log
[6]
https://dcn060062.dcn.ed.ac.uk/main.php?cmd=image&var1=XenomaiOnArm%2Fxeno-regression-test_simple.txt
[7]
https://dcn060062.dcn.ed.ac.uk/main.php?cmd=image&var1=XenomaiOnArm%2Fxeno-regression-test_realistic_proc.txt


* Re: [Xenomai] Command line freeze during xeno-regression-test on omap4460
  2014-01-06 15:30 [Xenomai] Command line freeze during xeno-regression-test on omap4460 Andreas Glatz
@ 2014-01-06 17:33 ` Gilles Chanteperdrix
  2014-01-06 17:39 ` Gilles Chanteperdrix
  2014-04-14  7:13 ` Gilles Chanteperdrix
  2 siblings, 0 replies; 28+ messages in thread
From: Gilles Chanteperdrix @ 2014-01-06 17:33 UTC (permalink / raw)
  To: Andreas Glatz; +Cc: xenomai

On 01/06/2014 04:30 PM, Andreas Glatz wrote:
> Hi,
>
> I managed to produce a kernel (v3.8.13) with xenomai 2.6.3 ipipe patch and
> rootfs (debian wheezy) with xenomai 2.6.3 libraries for my Pandaboard ES
> (omap4460). The simple regression test, which only calls dd during the
> switchtest, works fine. However the regression test with the linux test
> project (ltp-full-20130904) scripts causes some sort of system lock up.
> After that I can only ctrl-c xeno-regression-test (i.e. switchtest), which,
> however, doesn't help to regain console access (neither over ethernet nor
> serial).

If the problem happens during the ltp test itself (notably while running
msgctl10 or msgctl11), this is normal: the system is completely
overloaded. You have to wait for some time before it returns to normal.

-- 
					    Gilles.



* Re: [Xenomai] Command line freeze during xeno-regression-test on omap4460
  2014-01-06 15:30 [Xenomai] Command line freeze during xeno-regression-test on omap4460 Andreas Glatz
  2014-01-06 17:33 ` Gilles Chanteperdrix
@ 2014-01-06 17:39 ` Gilles Chanteperdrix
  2014-01-07  7:23   ` Andreas Glatz
  2014-04-04 10:27   ` Andreas Glatz
  2014-04-14  7:13 ` Gilles Chanteperdrix
  2 siblings, 2 replies; 28+ messages in thread
From: Gilles Chanteperdrix @ 2014-01-06 17:39 UTC (permalink / raw)
  To: Andreas Glatz; +Cc: xenomai

On 01/06/2014 04:30 PM, Andreas Glatz wrote:
> When I started the realistic xenomai regression test: xeno-regression-test
> -l "/usr/lib/xenomai/testsuite/dohell -m /tmp -l /opt/ltp" -t 2 everything
> seemed fine at first - I could logon and start top to inspect the running
> processes. However, the command line (over serial and ethernet)
> consistently freezes after a while (at different ltp tests though). First I
> thought it's the massive system load which doesn't leave CPU for the
> console... however ctrl-c of xeno-regression-test does not help to regain
> console access...

That is because killing xeno-regression-test does not kill all the script
children. So, basically, the load tasks are still running. Also, what
filesystem is /tmp? dohell is using dd to alternately write to /tmp,
then erase the file. If /tmp is some flash, it will become slow after a
while. If it is a tmpfs, it will eat RAM.
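
A minimal sketch of how both points could be checked from a shell. The function
names fs_type_of and kill_group are made up for this illustration and are not
part of the Xenomai testsuite; the sketch assumes the load script was started
in its own process group so that the group can be signalled as a whole.

```shell
#!/bin/sh
# Print the filesystem type of a mount point, reading a mounts table in
# /proc/mounts format (fields: device mountpoint fstype options ...).
# The last matching entry wins, since later mounts shadow earlier ones.
fs_type_of() {
    mountpoint=$1
    table=${2:-/proc/mounts}
    awk -v mp="$mountpoint" \
        '$2 == mp { t = $3 } END { if (t != "") print t }' "$table"
}

# Send SIGTERM to every process in a process group, so that the children
# spawned by a load script go away together with it. The "-" in front of
# the group id tells kill to signal the whole group.
kill_group() {
    kill -s TERM -- "-$1"
}
```

For example, "fs_type_of /tmp" prints tmpfs when /tmp is RAM-backed, and the
group id to pass to kill_group can be read with "ps -o pgid= -p <pid>" for a
known pid of the test script.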

-- 
					    Gilles.



* Re: [Xenomai] Command line freeze during xeno-regression-test on omap4460
  2014-01-06 17:39 ` Gilles Chanteperdrix
@ 2014-01-07  7:23   ` Andreas Glatz
  2014-01-07  8:10     ` Andreas Glatz
  2014-04-04 10:27   ` Andreas Glatz
  1 sibling, 1 reply; 28+ messages in thread
From: Andreas Glatz @ 2014-01-07  7:23 UTC (permalink / raw)
  To: Gilles Chanteperdrix; +Cc: xenomai

Hi Gilles,

At first /tmp was tmpfs since I didn't want to wear out my flash with the
testing. Now I connected an external USB hard drive to the panda and mounted
one of the hard drive partitions as /tmp. Additionally, I did not load
xeno_klat and xeno_rtdmtest, which were both loaded before. I started
xeno-regression-test -l "/usr/lib/xenomai/testsuite/dohell -m /tmp -l
/opt/ltp" -t 2 again (btw I modified dohell to call runltp instead of
runallscripts.sh since the latter is deprecated). And the good news was
that it ran nearly all the tests (everything up to cgroup*) and 'top' was
working all the way. However this morning I noticed that it's still at the
first test of the cgroup* and didn't go any further. 'top' stopped working
and I cannot open additional consoles. On the consoles that are still open
(3 of them) I can execute simple commands like ps, grep, df, ... but
nothing like reboot, top, shutdown, ... Surprisingly, cat /var/log/messages
also gets blocked:

Console 1:
root@arm:~# cat /var/log/messages
^C # <-- notice: I tried to kill it here but no response

Console 2:
root@arm:/opt/ltp/results# ps ax
  PID TTY      STAT   TIME COMMAND
...
15149 pts/2    D+     0:00 cat /var/log/messages
...

Nothing else is running though (neither on linux nor xenomai) :
root@arm:/opt/ltp/results# cat /proc/xenomai/stat
CPU  PID    MSW        CSW        PF    STAT       %CPU  NAME
  0  0      0          986564     0     00500080  100.0  ROOT/0
  1  0      0          992712     0     00500080  100.0  ROOT/1
  1  0      0          309543     0     00000000    0.0  IRQ29: [timer]
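
The "D+" in the ps output above marks a task in uninterruptible sleep, which
usually means it is waiting on I/O. A small sketch for listing all such tasks
at once (d_state_procs is a name made up here, and the input is assumed to be
"PID STAT COMMAND" lines as printed by procps ps):

```shell
#!/bin/sh
# Filter "PID STAT COMMAND..." lines read from stdin, keeping only tasks
# whose state starts with D (uninterruptible sleep).
d_state_procs() {
    awk '$2 ~ /^D/'
}
```

It could be fed with "ps -eo pid=,stat=,args= | d_state_procs". If the kernel
was built with magic SysRq support, "echo w > /proc/sysrq-trigger" should
additionally dump the blocked tasks with their kernel stacks to the log.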

To me this looks like a problem with the filesystem (maybe my SD flash card
where the rootfs resides). I will try to install the rootfs on the
external hard drive and repeat everything... maybe this will solve the
problem.

I still have all three consoles open, of which just one is still responsive.
Any further suggestions?

Thanks for any help,

A.

* Re: [Xenomai] Command line freeze during xeno-regression-test on omap4460
  2014-01-07  7:23   ` Andreas Glatz
@ 2014-01-07  8:10     ` Andreas Glatz
  0 siblings, 0 replies; 28+ messages in thread
From: Andreas Glatz @ 2014-01-07  8:10 UTC (permalink / raw)
  To: Gilles Chanteperdrix; +Cc: xenomai

 Hi Gilles,


On Tue, Jan 7, 2014 at 7:23 AM, Andreas Glatz <andi.glatz@gmail.com> wrote:

> To me this looks like a problem with the filesystem (maybe my sd flash
> card where the rootfs resides). I will try and install the rootfs on the
> external harddrive and repeat everything... maybe this might solve the
> problem.
>


Firstly sorry for the toppost :)

Secondly, I also noticed that the status LED on the panda, which is
triggered by mmc0, is constantly on after the failure, whereas after a
reboot it only turns on when the flash partition is accessed. So I guess
that's a good indication that mmc0 might have something to do with it?
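
Which trigger drives which LED can be read from sysfs, where each LED has a
"trigger" file listing the available triggers with the active one in
brackets. A hedged sketch (the function names are invented for this example,
and the LED names on a given board will differ):

```shell
#!/bin/sh
# Print the active trigger from a sysfs LED trigger file. The kernel lists
# all available triggers on one line and puts the active one in brackets,
# e.g. "none mmc0 [mmc1] heartbeat".
active_trigger() {
    sed -n 's/.*\[\([^]]*\)\].*/\1/p' "$1"
}

# Print "<led name>: <active trigger>" for every LED below a directory
# (default: /sys/class/leds). LEDs without a readable trigger are skipped.
list_led_triggers() {
    leds_dir=${1:-/sys/class/leds}
    for led in "$leds_dir"/*; do
        [ -r "$led/trigger" ] || continue
        printf '%s: %s\n' "${led##*/}" "$(active_trigger "$led/trigger")"
    done
}
```

Running list_led_triggers without arguments on the board should show whether
the status LED in question really follows mmc0.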

A.

* Re: [Xenomai] Command line freeze during xeno-regression-test on omap4460
  2014-01-06 17:39 ` Gilles Chanteperdrix
  2014-01-07  7:23   ` Andreas Glatz
@ 2014-04-04 10:27   ` Andreas Glatz
  2014-04-04 10:44     ` Gilles Chanteperdrix
  2014-04-04 11:00     ` Gilles Chanteperdrix
  1 sibling, 2 replies; 28+ messages in thread
From: Andreas Glatz @ 2014-04-04 10:27 UTC (permalink / raw)
  To: Gilles Chanteperdrix; +Cc: xenomai

Hi Gilles,

I'm finally back to my original problem below:

On 6 Jan 2014, at 17:39, Gilles Chanteperdrix wrote:

> On 01/06/2014 04:30 PM, Andreas Glatz wrote:
>> When I started the realistic xenomai regression test: xeno-regression-test
>> -l "/usr/lib/xenomai/testsuite/dohell -m /tmp -l /opt/ltp" -t 2 everything
>> seemed fine at first - I could logon and start top to inspect the running
>> processes. However, the command line (over serial and ethernet)
>> consistently freezes after a while (at different ltp tests though). First I
>> thought it's the massive system load which doesn't leave CPU for the
>> console... however ctrl-c of xeno-regression-test does not help to regain
>> console access...
>
> That is because kill xeno-regression-test does not kill all the script
> children. So, basically, the load tasks are still running. Also, what
> filesystem is /tmp? dohell is using dd to alternatively write to /tmp,
> then erase the file. If /tmp is some flash, it will become slow after a
> while. If it is a tmpfs, it will eat RAM.
>

The described problem is _very_ reproducible on my PandaBoard ES
(omap4460), where I boot from an SD card partition and the rootfs is
also on an SD card partition. I tried it with several kernel versions
(3.8.13, 3.10.18, and 3.10.34) with the latest ipipe and xenomai from
the git repos. Every time I start the regression test (see command
above) the following happens: Everything works fine until the
switch/latency tests start. Then I see that there is heavy access to the SD
card, which is expected, as the status LED 2 is blinking. After ~5mins
this status LED is constantly on. That's when I know that everything
is over. On the console I can only execute commands that are already
in RAM, such as the bash things like ps, mount, ... However, if I try
a simple 'touch new' it blocks forever and I know that it blocks in
the syscall where the file should be created, because I looked at it
with strace. I tried several things: I turned off CONFIG_PM (which was
on by default), turned on the MMC debugging, put extra printk's in the
omap_hsmmc.c ISR. However, everything seems to work on this level: DMA
requests are started and do finish, the ISR is called regularly (because
at first I thought that Xenomai would starve it).
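
Whether an interrupt is still being delivered to Linux can also be
cross-checked without printk's by sampling /proc/interrupts twice. A rough
sketch, in which the function names are made up and the pattern used to find
the MMC line (for instance "mmc0") is an assumption that has to be checked
against the actual table on the board:

```shell
#!/bin/sh
# Sum the per-CPU counters of every line in an interrupts table whose text
# matches a pattern. The table is expected in /proc/interrupts format: a
# header naming the CPUs, then "IRQ: count-per-CPU... controller name".
irq_count() {
    pattern=$1
    table=${2:-/proc/interrupts}
    awk -v pat="$pattern" '
        NR == 1 { ncpu = NF; next }          # header: one field per CPU
        $0 ~ pat { for (i = 2; i <= ncpu + 1; i++) sum += $i }
        END { printf "%.0f\n", sum }
    ' "$table"
}

# Succeed if the counter moved during an interval (default 5 seconds).
irq_is_alive() {
    before=$(irq_count "$1")
    sleep "${2:-5}"
    after=$(irq_count "$1")
    [ "$after" -gt "$before" ]
}
```

Something like "irq_is_alive mmc0 && echo still firing" while the blocked
'touch new' is pending would then show whether the controller interrupt is
still counted on the Linux side.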

Have you ever run Xenomai on this _specific_ board (since everything
is running smoothly on the omap5 board)?
Any more ideas how to debug it?

Currently, I'm compiling the ipipe trace in the hope that it will tell me
something useful...

Oh yes, the best bit is that the regression test works perfectly fine
if I boot from an external USB HD _AND_ unmount (!) all MMC partitions.
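
To confirm that no MMC partition is left mounted before starting a run, the
mount table can be filtered for MMC block devices. A minimal sketch, assuming
the usual /dev/mmcblk* device naming (a rootfs listed as /dev/root would not
be caught); mmc_mounts is a name invented for this example:

```shell
#!/bin/sh
# Print the mount point of every mounted partition that lives on an MMC/SD
# block device (/dev/mmcblk*), reading a table in /proc/mounts format.
mmc_mounts() {
    awk '$1 ~ /^\/dev\/mmcblk/ { print $2 }' "${1:-/proc/mounts}"
}
```

In the working USB-boot setup described above, mmc_mounts should print
nothing.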

Thanks,

A.

* Re: [Xenomai] Command line freeze during xeno-regression-test on omap4460
  2014-04-04 10:27   ` Andreas Glatz
@ 2014-04-04 10:44     ` Gilles Chanteperdrix
  2014-04-04 11:19       ` Andreas Glatz
  2014-04-06 11:21       ` Andreas Glatz
  2014-04-04 11:00     ` Gilles Chanteperdrix
  1 sibling, 2 replies; 28+ messages in thread
From: Gilles Chanteperdrix @ 2014-04-04 10:44 UTC (permalink / raw)
  To: Andreas Glatz; +Cc: xenomai

On 04/04/2014 12:27 PM, Andreas Glatz wrote:
> Hi Gilles,
> 
> I'm finally back to my original problem below:
> 
> On 6 Jan 2014, at 17:39, Gilles Chanteperdrix wrote:
> 
>> On 01/06/2014 04:30 PM, Andreas Glatz wrote:
>>> Hi,
>>>
>>> When I started the realistic xenomai regression test:
>>> xeno-regression-test -l "/usr/lib/xenomai/testsuite/dohell -m /tmp -l /opt/ltp" -t 2
>>> everything seemed fine at first - I could logon and start top to
>>> inspect the running processes. However, the command line (over serial
>>> and ethernet) consistently freezes after a while (at different ltp
>>> tests though). First I thought it's the massive system load which
>>> doesn't leave CPU for the console... however ctrl-c of
>>> xeno-regression-test does not help to regain console access...
>>
>> That is because killing xeno-regression-test does not kill all the
>> script children. So, basically, the load tasks are still running.
>> Also, what filesystem is /tmp? dohell is using dd to alternately
>> write to /tmp, then erase the file. If /tmp is some flash, it will
>> become slow after a while. If it is a tmpfs, it will eat RAM.
>>
>>
> 
> The described problem is _very_ reproducible on my PandaBoard ES
> (omap4460), where I boot from an SD card partition and the rootfs is
> also on the SD card partition. I tried it with several kernel versions
> (3.8.13, 3.10.18, and 3.10.34) with the latest ipipe and xenomai from
> the git repos. Every time I start the regression test (see command
> above) the following happens: Everything works fine until the
> switch/latency tests start. Then I see that there is heavy access to
> the SD card, which is expected, as the status LED 2 is blinking. After
> ~5 mins this status LED is constantly on. That's when I know that
> everything is over. On the console I can only execute commands that
> are already in RAM, such as the bash things like ps, mount, ...
> However, if I try a simple 'touch new' it blocks forever and I know
> that it blocks in the syscall where the file should be created,
> because I looked at it with strace. I tried several things: I turned
> off CONFIG_PM (which was on by default), turned on the MMC debugging,
> put extra printk's in the omap_hsmmc.c ISR. However, everything seems
> to work on this level: DMA requests are started and do finish, the ISR
> is called regularly (because at first I thought that Xenomai would
> starve it).
>
> Have you ever run Xenomai on this _specific_ board (since everything
> is running smoothly on the omap5 board)?
> Any more ideas how to debug it?
>
> Currently, I'm compiling the ipipe trace in the hope that it will tell
> me something useful...
>
> Oh yes, the best bit is that the regression test works perfectly fine
> if I boot from an external USB HD _AND_ unmount (!) all MMC partitions.

So, the MMC driver has a problem. Have you tried:
- running the exact same kernel configuration with only CONFIG_XENOMAI
disabled (and stressing it with dohell)
- then with both CONFIG_XENOMAI and CONFIG_IPIPE disabled?

Also, do you have this patch in the tree you tried?
http://git.xenomai.org/ipipe.git/commit/?h=stable/ipipe-3.10.18&id=c26e7ad5679f9391cd8ea1db001bf301d2f6bc88
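
To keep the kernels in that comparison apart, a small helper along these
lines could report which of the two options a given .config actually
enables. This is only a sketch: the function names config_state and
classify_config are made up for illustration, and it assumes both symbols
show up as plain y/n options in the .config.

```shell
#!/bin/sh
# config_state SYMBOL FILE
# Print the value of a Kconfig symbol from a .config file, or "n" when
# the symbol is absent or listed as "# SYMBOL is not set".
config_state() {
    sym=$1
    file=$2
    # The trailing "=" keeps CONFIG_IPIPE from matching CONFIG_IPIPE_<suffix>.
    val=$(sed -n "s/^${sym}=\(.*\)$/\1/p" "$file" | head -n 1)
    if [ -n "$val" ]; then
        printf '%s\n' "$val"
    else
        printf 'n\n'
    fi
}

# classify_config FILE
# Name the variant of the comparison a .config corresponds to.
classify_config() {
    ipipe=$(config_state CONFIG_IPIPE "$1")
    xeno=$(config_state CONFIG_XENOMAI "$1")
    case "$ipipe/$xeno" in
        y/y) echo "ipipe+xenomai" ;;
        y/n) echo "ipipe-only" ;;
        n/n) echo "vanilla" ;;
        *)   echo "unexpected ($ipipe/$xeno)" ;;
    esac
}

# Example (the path is a placeholder):
#   classify_config /path/to/linux/.config
```

Running classify_config on each build's .config before starting dohell
makes it harder to mix up which kernel produced which result.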



-- 
					    Gilles.



* Re: [Xenomai] Command line freeze during xeno-regression-test on omap4460
  2014-04-04 10:27   ` Andreas Glatz
  2014-04-04 10:44     ` Gilles Chanteperdrix
@ 2014-04-04 11:00     ` Gilles Chanteperdrix
  2014-04-04 13:38       ` Andreas Glatz
  1 sibling, 1 reply; 28+ messages in thread
From: Gilles Chanteperdrix @ 2014-04-04 11:00 UTC (permalink / raw)
  To: Andreas Glatz; +Cc: xenomai

On 04/04/2014 12:27 PM, Andreas Glatz wrote:
> The described problem is _very_ reproducible on my PandaBoard ES
> (omap4460), where I boot from an SD card partition and the rootfs is

I have a pandaboard, I can check whether I can reproduce that.

I believe the same problem has also been reported on beagleboard XM:
http://www.xenomai.org/pipermail/xenomai/2014-March/030311.html

So, there may be an issue with Xenomai or interrupt pipelining and the 
MMC driver for omap3 and omap4.


-- 
					    Gilles.



* Re: [Xenomai] Command line freeze during xeno-regression-test on omap4460
  2014-04-04 10:44     ` Gilles Chanteperdrix
@ 2014-04-04 11:19       ` Andreas Glatz
  2014-04-04 11:21         ` Gilles Chanteperdrix
  2014-04-06 11:21       ` Andreas Glatz
  1 sibling, 1 reply; 28+ messages in thread
From: Andreas Glatz @ 2014-04-04 11:19 UTC (permalink / raw)
  To: Gilles Chanteperdrix; +Cc: xenomai


On 4 Apr 2014, at 11:44, Gilles Chanteperdrix wrote:

> So, the MMC driver has a problem. Have you tried:
> - running the exact same kernel configuration only with CONFIG_XENOMAI
> disabled (and stress with dohell)
> - then with CONFIG_XENOMAI and CONFIG_IPIPE disabled.
>
> Also, do you have this patch in the tree you tried?
> http://git.xenomai.org/ipipe.git/commit/?h=stable/ipipe-3.10.18&id=c26e7ad5679f9391cd8ea1db001bf301d2f6bc88
>
>

I did try the regression test without the switch/latency tests (i.e.
'/usr/lib/xenomai/testsuite/dohell -m /tmp -l /opt/ltp') and, as far as
I recall, it finished successfully. I also couldn't find any error
reports about the omap mmc driver on the kernel mailing list. The only
thing I found was this patch [1], which I also applied. It didn't
change a thing though.

However, I'll try and run the tests you suggested on my shiny new
3.10.34 kernel. I built it last Monday after merging all the ipipe git
stuff with CNelson's 3.18.14 kernel. I saw that the patch you mentioned
is in the 3.10.18 tree. Shall I apply it to my kernel as well?

A.

[1] http://www.spinics.net/lists/linux-omap/msg104712.html





* Re: [Xenomai] Command line freeze during xeno-regression-test on omap4460
  2014-04-04 11:19       ` Andreas Glatz
@ 2014-04-04 11:21         ` Gilles Chanteperdrix
  0 siblings, 0 replies; 28+ messages in thread
From: Gilles Chanteperdrix @ 2014-04-04 11:21 UTC (permalink / raw)
  To: Andreas Glatz; +Cc: xenomai

On 04/04/2014 01:19 PM, Andreas Glatz wrote:
> I saw that the patch you mentioned is in
> the 3.10.18 tree. Shall I apply it to my kernel as well?

Yes, definitely.

-- 
					    Gilles.



* Re: [Xenomai] Command line freeze during xeno-regression-test on omap4460
  2014-04-04 11:00     ` Gilles Chanteperdrix
@ 2014-04-04 13:38       ` Andreas Glatz
  0 siblings, 0 replies; 28+ messages in thread
From: Andreas Glatz @ 2014-04-04 13:38 UTC (permalink / raw)
  To: Gilles Chanteperdrix; +Cc: xenomai


On 4 Apr 2014, at 12:00, Gilles Chanteperdrix wrote:

> On 04/04/2014 12:27 PM, Andreas Glatz wrote:
>> The described problem is _very_ reproducible on my PandaBoard ES
>> (omap4460), where I boot from an SD card partition and the rootfs is
>
> I have a pandaboard, I can check whether I can reproduce that.

Thanks, I really appreciate that. For completeness' sake I'm also
including my current kernel config. This config is derived from
CNelson's config and still has a lot of unnecessary stuff in it.
CONFIG_PM can be disabled after disabling CONFIG_ARCH_OMAP2PLUS_TYPICAL.

A.

-------------- next part --------------
A non-text attachment was scrubbed...
Name: configv2
Type: application/octet-stream
Size: 120523 bytes
Desc: not available
URL: <http://www.xenomai.org/pipermail/xenomai/attachments/20140404/81e66bf9/attachment.obj>
-------------- next part --------------






* Re: [Xenomai] Command line freeze during xeno-regression-test on omap4460
  2014-04-04 10:44     ` Gilles Chanteperdrix
  2014-04-04 11:19       ` Andreas Glatz
@ 2014-04-06 11:21       ` Andreas Glatz
  2014-04-06 14:44         ` Gilles Chanteperdrix
  2014-04-06 15:54         ` Gilles Chanteperdrix
  1 sibling, 2 replies; 28+ messages in thread
From: Andreas Glatz @ 2014-04-06 11:21 UTC (permalink / raw)
  To: Gilles Chanteperdrix; +Cc: xenomai


On 4 Apr 2014, at 11:44, Gilles Chanteperdrix wrote:

> So, the MMC driver has a problem. Have you tried:
> - running the exact same kernel configuration only with CONFIG_XENOMAI
> disabled (and stress with dohell)
> - then with CONFIG_XENOMAI and CONFIG_IPIPE disabled.
>
> Also, do you have this patch in the tree you tried?
> http://git.xenomai.org/ipipe.git/commit/?h=stable/ipipe-3.10.18&id=c26e7ad5679f9391cd8ea1db001bf301d2f6bc88
>

First I mounted tmpfs on /tmp so I don't wear out the SD card too much:
mount -t tmpfs -osize=192M tmpfs /tmp
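
Since the filesystem behind /tmp matters for dohell, one way to confirm
what is actually mounted there is to look it up in /proc/mounts. A minimal
sketch; the helper name fstype_of is hypothetical, and the optional second
argument (an alternative mounts file) is only there to make it easy to try
out on a copy:

```shell
#!/bin/sh
# fstype_of MOUNTPOINT [MOUNTS_FILE]
# Print the filesystem type mounted at MOUNTPOINT, reading /proc/mounts
# by default. The last matching entry wins, because a later mount on the
# same directory hides the earlier one. Returns non-zero if nothing is
# mounted exactly at MOUNTPOINT.
fstype_of() {
    mnt=$1
    mounts=${2:-/proc/mounts}
    awk -v m="$mnt" '
        $2 == m { t = $3 }
        END { if (t == "") exit 1; print t }
    ' "$mounts"
}

# Example:
#   fstype_of /tmp || echo "/tmp is not a separate mount"
```

If this prints tmpfs, the dd load from dohell goes to RAM rather than to
the SD card.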

Then I used the following load command for the test (MYTEST below
stands for this line):
/usr/lib/xenomai/testsuite/dohell -m /tmp -l /opt/ltp

Note: I always monitored the test over wifi with 'top' so I also had  
some network load...

I got the following results with the 3.10.34 kernel, which includes
everything up to the current ipipe-3.10 tag (it also includes the
patch you mentioned):

- xeno-regression-test "MYTEST" -> FAIL if booted from SD card (see  
description above); OK if booted from ext USB HD _AND_ no mmc  
partitions mounted
- CONFIG_IPIPE && CONFIG_XENOMAI && MYTEST -> FAIL (got status LED 2  
constantly on as described above)
- CONFIG_IPIPE only (CONFIG_XENOMAI disabled) && MYTEST -> OK (see
attached config file and ltp test log)

Anything else I should try?

A.

-------------- next part --------------
A non-text attachment was scrubbed...
Name: config_v3.10.34
Type: application/octet-stream
Size: 115686 bytes
Desc: not available
URL: <http://www.xenomai.org/pipermail/xenomai/attachments/20140406/b982a10e/attachment.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: LTP_RUN_ON-2014_Apr_05-16h_41m_09s.log
Type: application/octet-stream
Size: 64909 bytes
Desc: not available
URL: <http://www.xenomai.org/pipermail/xenomai/attachments/20140406/b982a10e/attachment-0001.obj>
-------------- next part --------------




* Re: [Xenomai] Command line freeze during xeno-regression-test on omap4460
  2014-04-06 11:21       ` Andreas Glatz
@ 2014-04-06 14:44         ` Gilles Chanteperdrix
  2014-04-06 15:22           ` Andreas Glatz
  2014-04-06 15:54         ` Gilles Chanteperdrix
  1 sibling, 1 reply; 28+ messages in thread
From: Gilles Chanteperdrix @ 2014-04-06 14:44 UTC (permalink / raw)
  To: Andreas Glatz; +Cc: xenomai

On 04/06/2014 01:21 PM, Andreas Glatz wrote:
> 
> On 4 Apr 2014, at 11:44, Gilles Chanteperdrix wrote:
> 
>> On 04/04/2014 12:27 PM, Andreas Glatz wrote:
>>> Hi Gilles,
>>>
>>> I'm finally back to my original problem below:
>>>
>>> On 6 Jan 2014, at 17:39, Gilles Chanteperdrix wrote:
>>>
>>>> On 01/06/2014 04:30 PM, Andreas Glatz wrote:
>>>>> Hi,
>>>>>
>>>>> -- Testing Xenomai --
>>>>> First I sucessfully could run the simple xenomai regression test:
>>>>> xeno-regression-test -l "/usr/lib/xenomai/testsuite/dohell -m /tmp
>>>>> 100" -t
>>>>> 2 which produced the output in [6] and the following additional
>>>>> messages
>>>>> with dmesg:
>>>>>
>>>>> [  476.215057] Xenomai: RTDM: closing file descriptor 1.
>>>>> [  477.434936] Xenomai: Posix: destroying semaphore f0069c00.
>>>>> [  477.440887] Xenomai: Posix: destroying mutex f0069a00.
>>>>> [  477.475372] xnheap: destroying shared heap 'rt_heap: heap' with
>>>>> 16384
>>>>> bytes still in use.
>>>>> [  479.008453] Xenomai: Switching rt_task to secondary mode after
>>>>> exception
>>>>> #0 from user-space at 0x9620 (pid 2145)
>>>>> [  480.574462] Xenomai: watchdog triggered -- signaling runaway
>>>>> thread
>>>>> 'rt_task'
>>>>> [  480.582061] [sched_delayed] sched: RT throttling activated
>>>>> [  557.336425] Xenomai: Posix: closing message queue descriptor 3.
>>>>>
>>>>> and  "cat /proc/xenomai/*" produced [7].
>>>>>
>>>>> When I started the realistic xenomai regression test: xeno-
>>>>> regression-test
>>>>> -l "/usr/lib/xenomai/testsuite/dohell -m /tmp -l /opt/ltp" -t 2
>>>>> everything
>>>>> seemed fine at first - I could logon and start top to inspect the
>>>>> running
>>>>> processes. However, the command line (over serial and ethernet)
>>>>> consistently freezes after a while (at different ltp tests though).
>>>>> First I
>>>>> thought it's the massive system load which doesn't leave CPU for  
>>>>> the
>>>>> console... however ctrl-c of xeno-regression-test does not help to
>>>>> regain
>>>>> console access...
>>>>
>>>> That is because kill xeno-regression-test does not kill all the
>>>> script children. So, basically, the load tasks are still running.
>>>> Also, what filesystem is /tmp? dohell is using dd to alternatively
>>>> write to /tmp, then erase the file. If /tmp is some flash, it will
>>>> become slow after a while. If it is a tmpfs, it will eat RAM.
>>>>
>>>>
>>>
>>> The described problem is _very_ reproducible on my PandaBoard ES
>>> (omap4460), where I boot from an SD card partition and the rootfs is
>>> also on the SD card partition. I tried it with several kernel  
>>> versions
>>> (3.8.13, 3.10.18, and 3.10.34) with the latest ipipe and xenomai from
>>> git the git repos. Everytime I start the regression test (see command
>>> above) the following happens: Everything works fine until the switch/
>>> latency tests start. Then I see that there is heavy access to the SD
>>> card, which is expected, as the status LED 2 is blinking. After  
>>> ~5mins
>>> this status LED is constantly on. That's when I know that everything
>>> is over. On the console I can only execute commands that are already
>>> in RAM, such as the bash things like ps, mount, ... However, if I try
>>> a simple 'touch new' it blocks forever and I know that it blocks in
>>> the syscall where the file should be created, because I looked at it
>>> with strace. I tried several things: I turned off CONFIG_PM (which  
>>> was
>>> on by default), turned on the MMC debugging, put extra prink's in the
>>> omap_hsmmc.c ISR. However, everything seems to work on this level:  
>>> DMA
>>> requests are started and do finish, the ISR is called regularly (bc
>>> first I though that Xenomai would starve it).
>>>
>>> Have you every run Xenonmai on this _specific_ board (since  
>>> everything
>>> is running smoothly on the omap5 board)?
>>> Any more ideas how to debug it?
>>>
>>> Currently, I'm compiling the ipipe trace in hope that it would tell  
>>> me
>>> something useful...
>>>
>>> Oh yes, the best bit is that the regression test works perfectly fine
>>> if I boot from an external USB HD _AND_ unmount (!) all MMC  
>>> partitions.
>>
>> So, the MMC driver has a problem. Have you tried:
>> - running the exact same kernel configuration only with CONFIG_XENOMAI
>> disabled (and stress with dohell)
>> - then with CONFIG_XENOMAI and CONFIG_IPIPE disabled.
>>
>> Also, do you have this patch in the tree you tried?
>> http://git.xenomai.org/ipipe.git/commit/?h=stable/ipipe-3.10.18&id=c26e7ad5679f9391cd8ea1db001bf301d2f6bc88
>>
> 
> First i mounted tmpfs on /tmp so I don't wear out the SD card too much:
> mount -t tmpfs -osize=192M tmpfs /tmp
> 
> Then I used the following line to start the test (substitute MYTEST  
> below with the following line):
> /usr/lib/xenomai/testsuite/dohell -m /tmp -l /opt/ltp
> 
> Note: I always monitored the test over wifi with 'top' so I also had  
> some network load...
> 
> I got the following results with the 3.10.34 kernel, which includes  
> everything up to the current ipipe-3.10 tag (it also included the  
> patch you mentioned):
> 
> - xeno-regression-test "MYTEST" -> FAIL if booted from SD card (see  
> description above); OK if booted from ext USB HD _AND_ no mmc  
> partitions mounted
> - CONFIG_IPIPE && CONFIG_XENOMAI && MYTEST -> FAIL (got status LED 2  
> constantly on as described above)
> - CONFIG_IPIPE && MYTEST -> OK (see attached config file and ltp test  
> log)
> 
> Anything else I should try?

Is the LTP test that is running when the failure happens always the same?

-- 
                                                                Gilles.


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [Xenomai] Command line freeze during xeno-regression-test on omap4460
  2014-04-06 14:44         ` Gilles Chanteperdrix
@ 2014-04-06 15:22           ` Andreas Glatz
  2014-04-06 15:28             ` Gilles Chanteperdrix
  0 siblings, 1 reply; 28+ messages in thread
From: Andreas Glatz @ 2014-04-06 15:22 UTC (permalink / raw)
  To: Gilles Chanteperdrix; +Cc: xenomai


On 6 Apr 2014, at 15:44, Gilles Chanteperdrix wrote:

> On 04/06/2014 01:21 PM, Andreas Glatz wrote:
>>
>> On 4 Apr 2014, at 11:44, Gilles Chanteperdrix wrote:
>>
>>> On 04/04/2014 12:27 PM, Andreas Glatz wrote:
>>>> Hi Gilles,
>>>>
>>>> I'm finally back to my original problem below:
>>>>
>>>> On 6 Jan 2014, at 17:39, Gilles Chanteperdrix wrote:
>>>>
>>>>> On 01/06/2014 04:30 PM, Andreas Glatz wrote:
>>>>>> Hi,
>>>>>>
>>>>>> I managed to produce a kernel (v3.8.13) with xenomai 2.6.3 ipipe
>>>>>> patch and
>>>>>> rootfs (debian wheezy) with xenomai 2.6.3 libraries for my
>>>>>> Pandaboard ES
>>>>>> (omap4460). The simple regression test, which only calls dd  
>>>>>> during
>>>>>> the
>>>>>> switchtest, works fine. However the regression test with the  
>>>>>> linux
>>>>>> test
>>>>>> project (ltp-full-20130904) scripts causes some sort of system  
>>>>>> lock
>>>>>> up.
>>>>>> After that I only can ctrl-c xeno-regression-test (i.e.
>>>>>> switchtest), which,
>>>>>> however, doesn't help to regain console access (neigher over
>>>>>> ethernet nor
>>>>>> serial).
>>>>>>
>>>>>> Here's what I did:
>>>>>>
>>>>>> -- Building --
>>>>>> As recomended in the Xenomai 2.6 readme I followed the  
>>>>>> instructions
>>>>>> in [1]
>>>>>> to produce a kernel and filesystem. To get a xenomai kernel I had
>>>>>> to do
>>>>>> three things differently:
>>>>>>
>>>>>> *) I used: git checkout origin/v3.8.x -b tmp
>>>>>> *) I applied ipipe-core-3.8.13-arm-3.patch from the xenomai-2.6  
>>>>>> git
>>>>>> tree as
>>>>>> described in the Xenomai 2.6 readme
>>>>>> *) I disabled KGDB and TIDSPBRIDGE since those produced compile
>>>>>> errors (see
>>>>>> config [2])
>>>>>>
>>>>>> After a while I obtained the following messages from dmesg [3]  
>>>>>> and
>>>>>> from the
>>>>>> command prompt:
>>>>>>
>>>>>> root@arm:~# cat /proc/version
>>>>>> Linux version 3.8.13-x3.6 (aglatz@linuxvbox) (gcc version 4.7.3
>>>>>> 20130328
>>>>>> (prerelease) (crosstool-NG linaro-1.13.1-4.7-2013.04-20130415 -
>>>>>> Linaro GCC
>>>>>> 2013.04) ) #4 SMP Sat Jan 4 15:54:20 GMT 2014
>>>>>>
>>>>>> -- Testing Linux --
>>>>>> To see if everything works I downloaded and cross-compiled
>>>>>> ltp-full-20130904 [4] with the same toolchain and flags (-
>>>>>> march=armv7-a
>>>>>> -mfpu=vfp3) as the xenomai libs and runtime. I started ltp with  
>>>>>> "./
>>>>>> runltp
>>>>>> -p -l dohell-2014-01-06-1.log -S xenomai.skiplist" and after a
>>>>>> while it
>>>>>> finished with a few failed tests [5]. The console access,  
>>>>>> however,
>>>>>> worked
>>>>>> fine.
>>>>>>
>>>>>> -- Testing Xenomai --
>>>>>> First I sucessfully could run the simple xenomai regression test:
>>>>>> xeno-regression-test -l "/usr/lib/xenomai/testsuite/dohell -m / 
>>>>>> tmp
>>>>>> 100" -t
>>>>>> 2 which produced the output in [6] and the following additional
>>>>>> messages
>>>>>> with dmesg:
>>>>>>
>>>>>> [  476.215057] Xenomai: RTDM: closing file descriptor 1.
>>>>>> [  477.434936] Xenomai: Posix: destroying semaphore f0069c00.
>>>>>> [  477.440887] Xenomai: Posix: destroying mutex f0069a00.
>>>>>> [  477.475372] xnheap: destroying shared heap 'rt_heap: heap'  
>>>>>> with
>>>>>> 16384
>>>>>> bytes still in use.
>>>>>> [  479.008453] Xenomai: Switching rt_task to secondary mode after
>>>>>> exception
>>>>>> #0 from user-space at 0x9620 (pid 2145)
>>>>>> [  480.574462] Xenomai: watchdog triggered -- signaling runaway
>>>>>> thread
>>>>>> 'rt_task'
>>>>>> [  480.582061] [sched_delayed] sched: RT throttling activated
>>>>>> [  557.336425] Xenomai: Posix: closing message queue descriptor  
>>>>>> 3.
>>>>>>
>>>>>> and  "cat /proc/xenomai/*" produced [7].
>>>>>>
>>>>>> When I started the realistic xenomai regression test: xeno-
>>>>>> regression-test
>>>>>> -l "/usr/lib/xenomai/testsuite/dohell -m /tmp -l /opt/ltp" -t 2
>>>>>> everything
>>>>>> seemed fine at first - I could logon and start top to inspect the
>>>>>> running
>>>>>> processes. However, the command line (over serial and ethernet)
>>>>>> consistently freezes after a while (at different ltp tests  
>>>>>> though).
>>>>>> First I
>>>>>> thought it's the massive system load which doesn't leave CPU for
>>>>>> the
>>>>>> console... however ctrl-c of xeno-regression-test does not help  
>>>>>> to
>>>>>> regain
>>>>>> console access...
>>>>>
>>>>> That is because kill xeno-regression-test does not kill all the
>>>>> script children. So, basically, the load tasks are still running.
>>>>> Also, what filesystem is /tmp? dohell is using dd to alternatively
>>>>> write to /tmp, then erase the file. If /tmp is some flash, it will
>>>>> become slow after a while. If it is a tmpfs, it will eat RAM.
>>>>>
>>>>>
>>>>
>>>> The described problem is _very_ reproducible on my PandaBoard ES
>>>> (omap4460), where I boot from an SD card partition and the rootfs  
>>>> is
>>>> also on the SD card partition. I tried it with several kernel
>>>> versions
>>>> (3.8.13, 3.10.18, and 3.10.34) with the latest ipipe and xenomai  
>>>> from
>>>> git the git repos. Everytime I start the regression test (see  
>>>> command
>>>> above) the following happens: Everything works fine until the  
>>>> switch/
>>>> latency tests start. Then I see that there is heavy access to the  
>>>> SD
>>>> card, which is expected, as the status LED 2 is blinking. After
>>>> ~5mins
>>>> this status LED is constantly on. That's when I know that  
>>>> everything
>>>> is over. On the console I can only execute commands that are  
>>>> already
>>>> in RAM, such as the bash things like ps, mount, ... However, if I  
>>>> try
>>>> a simple 'touch new' it blocks forever and I know that it blocks in
>>>> the syscall where the file should be created, because I looked at  
>>>> it
>>>> with strace. I tried several things: I turned off CONFIG_PM (which
>>>> was
>>>> on by default), turned on the MMC debugging, put extra prink's in  
>>>> the
>>>> omap_hsmmc.c ISR. However, everything seems to work on this level:
>>>> DMA
>>>> requests are started and do finish, the ISR is called regularly (bc
>>>> first I though that Xenomai would starve it).
>>>>
>>>> Have you every run Xenonmai on this _specific_ board (since
>>>> everything
>>>> is running smoothly on the omap5 board)?
>>>> Any more ideas how to debug it?
>>>>
>>>> Currently, I'm compiling the ipipe trace in hope that it would tell
>>>> me
>>>> something useful...
>>>>
>>>> Oh yes, the best bit is that the regression test works perfectly  
>>>> fine
>>>> if I boot from an external USB HD _AND_ unmount (!) all MMC
>>>> partitions.
>>>
>>> So, the MMC driver has a problem. Have you tried:
>>> - running the exact same kernel configuration only with  
>>> CONFIG_XENOMAI
>>> disabled (and stress with dohell)
>>> - then with CONFIG_XENOMAI and CONFIG_IPIPE disabled.
>>>
>>> Also, do you have this patch in the tree you tried?
>>> http://git.xenomai.org/ipipe.git/commit/?h=stable/ipipe-3.10.18&id=c26e7ad5679f9391cd8ea1db001bf301d2f6bc88
>>>
>>
>> First i mounted tmpfs on /tmp so I don't wear out the SD card too  
>> much:
>> mount -t tmpfs -osize=192M tmpfs /tmp
>>
>> Then I used the following line to start the test (substitute MYTEST
>> below with the following line):
>> /usr/lib/xenomai/testsuite/dohell -m /tmp -l /opt/ltp
>>
>> Note: I always monitored the test over wifi with 'top' so I also had
>> some network load...
>>
>> I got the following results with the 3.10.34 kernel, which includes
>> everything up to the current ipipe-3.10 tag (it also included the
>> patch you mentioned):
>>
>> - xeno-regression-test "MYTEST" -> FAIL if booted from SD card (see
>> description above); OK if booted from ext USB HD _AND_ no mmc
>> partitions mounted
>> - CONFIG_IPIPE && CONFIG_XENOMAI && MYTEST -> FAIL (got status LED 2
>> constantly on as described above)
>> - CONFIG_IPIPE && MYTEST -> OK (see attached config file and ltp test
>> log)
>>
>> Anything else I should try?
>
> Is the LTP test that is running when the failure happens always the same?
>
>

I went through all the logfiles on my pandaboard and identified
the last tests that ltp logged before the error occurred (I'm assuming
that ltp writes to the file in /opt/ltp/results after completing the
test, since the PASS/FAIL note is there as well, which logically should
only be available after completing the test):

test                 count
==========================
rt_sigqueueinfo01        1
clock_nanosleep01       10
munmap02                 1
semget06                 1
epoll_create1_01         5
splice01                 1
clock_getres01           1
rename13                 1
BindMounts               1
utimes01                 1

So the test after 'clock_nanosleep01', which is 'clone01' according to
the LTP log file I sent you, seems to be the prime hotspot of failure,
followed by 'epoll01', which comes after 'epoll_create1_01'.

I'm using the standard LTP version 'ltp-full-20130904', which I  
downloaded and compiled on the target with gcc 4.6.3 (default debian  
wheezy).
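
A tally like the one above can be produced with a short script. The
following is only a minimal sketch, not the method used in this thread.
It assumes that each LTP log is available as text and that result lines
have the form "<testname> <PASS|FAIL|CONF|BROK> <exit value>"; the
function names are hypothetical.

```python
from collections import Counter

# Result keywords assumed to appear in the second column of an LTP log line.
RESULTS = {"PASS", "FAIL", "CONF", "BROK"}


def last_logged_test(log_text):
    """Return the name of the last test with a result line, or None."""
    last = None
    for line in log_text.splitlines():
        fields = line.split()
        # Header and separator lines do not have a result keyword in column 2.
        if len(fields) >= 2 and fields[1] in RESULTS:
            last = fields[0]
    return last


def tally_last_tests(log_texts):
    """Count how often each test was the last one logged across several logs."""
    names = (last_logged_test(text) for text in log_texts)
    return Counter(name for name in names if name is not None)
```

Calling tally_last_tests() with the contents of every log from a failed
run gives the per-test counts. The test that actually hung is presumably
the one that follows the last logged entry in the run order.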

A.


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [Xenomai] Command line freeze during xeno-regression-test on omap4460
  2014-04-06 15:22           ` Andreas Glatz
@ 2014-04-06 15:28             ` Gilles Chanteperdrix
  2014-04-06 20:57               ` Andreas Glatz
  0 siblings, 1 reply; 28+ messages in thread
From: Gilles Chanteperdrix @ 2014-04-06 15:28 UTC (permalink / raw)
  To: Andreas Glatz; +Cc: xenomai

On 04/06/2014 05:22 PM, Andreas Glatz wrote:
> 
> On 6 Apr 2014, at 15:44, Gilles Chanteperdrix wrote:
> 
>> On 04/06/2014 01:21 PM, Andreas Glatz wrote:
>>>
>>> On 4 Apr 2014, at 11:44, Gilles Chanteperdrix wrote:
>>>
>>>> On 04/04/2014 12:27 PM, Andreas Glatz wrote:
>>>>> Hi Gilles,
>>>>>
>>>>> I'm finally back to my original problem below:
>>>>>
>>>>> On 6 Jan 2014, at 17:39, Gilles Chanteperdrix wrote:
>>>>>
>>>>>> On 01/06/2014 04:30 PM, Andreas Glatz wrote:
>>>>>>> Hi,
>>>>>>>
>>>>>>> I managed to produce a kernel (v3.8.13) with xenomai 2.6.3 ipipe
>>>>>>> patch and
>>>>>>> rootfs (debian wheezy) with xenomai 2.6.3 libraries for my
>>>>>>> Pandaboard ES
>>>>>>> (omap4460). The simple regression test, which only calls dd  
>>>>>>> during
>>>>>>> the
>>>>>>> switchtest, works fine. However the regression test with the  
>>>>>>> linux
>>>>>>> test
>>>>>>> project (ltp-full-20130904) scripts causes some sort of system  
>>>>>>> lock
>>>>>>> up.
>>>>>>> After that I only can ctrl-c xeno-regression-test (i.e.
>>>>>>> switchtest), which,
>>>>>>> however, doesn't help to regain console access (neigher over
>>>>>>> ethernet nor
>>>>>>> serial).
>>>>>>>
>>>>>>> Here's what I did:
>>>>>>>
>>>>>>> -- Building --
>>>>>>> As recomended in the Xenomai 2.6 readme I followed the  
>>>>>>> instructions
>>>>>>> in [1]
>>>>>>> to produce a kernel and filesystem. To get a xenomai kernel I had
>>>>>>> to do
>>>>>>> three things differently:
>>>>>>>
>>>>>>> *) I used: git checkout origin/v3.8.x -b tmp
>>>>>>> *) I applied ipipe-core-3.8.13-arm-3.patch from the xenomai-2.6  
>>>>>>> git
>>>>>>> tree as
>>>>>>> described in the Xenomai 2.6 readme
>>>>>>> *) I disabled KGDB and TIDSPBRIDGE since those produced compile
>>>>>>> errors (see
>>>>>>> config [2])
>>>>>>>
>>>>>>> After a while I obtained the following messages from dmesg [3]  
>>>>>>> and
>>>>>>> from the
>>>>>>> command prompt:
>>>>>>>
>>>>>>> root@arm:~# cat /proc/version
>>>>>>> Linux version 3.8.13-x3.6 (aglatz@linuxvbox) (gcc version 4.7.3
>>>>>>> 20130328
>>>>>>> (prerelease) (crosstool-NG linaro-1.13.1-4.7-2013.04-20130415 -
>>>>>>> Linaro GCC
>>>>>>> 2013.04) ) #4 SMP Sat Jan 4 15:54:20 GMT 2014
>>>>>>>
>>>>>>> -- Testing Linux --
>>>>>>> To see if everything works I downloaded and cross-compiled
>>>>>>> ltp-full-20130904 [4] with the same toolchain and flags (-
>>>>>>> march=armv7-a
>>>>>>> -mfpu=vfp3) as the xenomai libs and runtime. I started ltp with  
>>>>>>> "./
>>>>>>> runltp
>>>>>>> -p -l dohell-2014-01-06-1.log -S xenomai.skiplist" and after a
>>>>>>> while it
>>>>>>> finished with a few failed tests [5]. The console access,  
>>>>>>> however,
>>>>>>> worked
>>>>>>> fine.
>>>>>>>
>>>>>>> -- Testing Xenomai --
>>>>>>> First I sucessfully could run the simple xenomai regression test:
>>>>>>> xeno-regression-test -l "/usr/lib/xenomai/testsuite/dohell -m / 
>>>>>>> tmp
>>>>>>> 100" -t
>>>>>>> 2 which produced the output in [6] and the following additional
>>>>>>> messages
>>>>>>> with dmesg:
>>>>>>>
>>>>>>> [  476.215057] Xenomai: RTDM: closing file descriptor 1.
>>>>>>> [  477.434936] Xenomai: Posix: destroying semaphore f0069c00.
>>>>>>> [  477.440887] Xenomai: Posix: destroying mutex f0069a00.
>>>>>>> [  477.475372] xnheap: destroying shared heap 'rt_heap: heap'  
>>>>>>> with
>>>>>>> 16384
>>>>>>> bytes still in use.
>>>>>>> [  479.008453] Xenomai: Switching rt_task to secondary mode after
>>>>>>> exception
>>>>>>> #0 from user-space at 0x9620 (pid 2145)
>>>>>>> [  480.574462] Xenomai: watchdog triggered -- signaling runaway
>>>>>>> thread
>>>>>>> 'rt_task'
>>>>>>> [  480.582061] [sched_delayed] sched: RT throttling activated
>>>>>>> [  557.336425] Xenomai: Posix: closing message queue descriptor  
>>>>>>> 3.
>>>>>>>
>>>>>>> and  "cat /proc/xenomai/*" produced [7].
>>>>>>>
>>>>>>> When I started the realistic xenomai regression test: xeno-
>>>>>>> regression-test
>>>>>>> -l "/usr/lib/xenomai/testsuite/dohell -m /tmp -l /opt/ltp" -t 2
>>>>>>> everything
>>>>>>> seemed fine at first - I could logon and start top to inspect the
>>>>>>> running
>>>>>>> processes. However, the command line (over serial and ethernet)
>>>>>>> consistently freezes after a while (at different ltp tests  
>>>>>>> though).
>>>>>>> First I
>>>>>>> thought it's the massive system load which doesn't leave CPU for
>>>>>>> the
>>>>>>> console... however ctrl-c of xeno-regression-test does not help  
>>>>>>> to
>>>>>>> regain
>>>>>>> console access...
>>>>>>
>>>>>> That is because kill xeno-regression-test does not kill all the
>>>>>> script children. So, basically, the load tasks are still running.
>>>>>> Also, what filesystem is /tmp? dohell is using dd to alternatively
>>>>>> write to /tmp, then erase the file. If /tmp is some flash, it will
>>>>>> become slow after a while. If it is a tmpfs, it will eat RAM.
>>>>>>
>>>>>>
>>>>>
>>>>> The described problem is _very_ reproducible on my PandaBoard ES
>>>>> (omap4460), where I boot from an SD card partition and the rootfs  
>>>>> is
>>>>> also on the SD card partition. I tried it with several kernel
>>>>> versions
>>>>> (3.8.13, 3.10.18, and 3.10.34) with the latest ipipe and xenomai  
>>>>> from
>>>>> git the git repos. Everytime I start the regression test (see  
>>>>> command
>>>>> above) the following happens: Everything works fine until the  
>>>>> switch/
>>>>> latency tests start. Then I see that there is heavy access to the  
>>>>> SD
>>>>> card, which is expected, as the status LED 2 is blinking. After
>>>>> ~5mins
>>>>> this status LED is constantly on. That's when I know that  
>>>>> everything
>>>>> is over. On the console I can only execute commands that are  
>>>>> already
>>>>> in RAM, such as the bash things like ps, mount, ... However, if I  
>>>>> try
>>>>> a simple 'touch new' it blocks forever and I know that it blocks in
>>>>> the syscall where the file should be created, because I looked at  
>>>>> it
>>>>> with strace. I tried several things: I turned off CONFIG_PM (which
>>>>> was
>>>>> on by default), turned on the MMC debugging, put extra prink's in  
>>>>> the
>>>>> omap_hsmmc.c ISR. However, everything seems to work on this level:
>>>>> DMA
>>>>> requests are started and do finish, the ISR is called regularly (bc
>>>>> first I though that Xenomai would starve it).
>>>>>
>>>>> Have you every run Xenonmai on this _specific_ board (since
>>>>> everything
>>>>> is running smoothly on the omap5 board)?
>>>>> Any more ideas how to debug it?
>>>>>
>>>>> Currently, I'm compiling the ipipe trace in hope that it would tell
>>>>> me
>>>>> something useful...
>>>>>
>>>>> Oh yes, the best bit is that the regression test works perfectly  
>>>>> fine
>>>>> if I boot from an external USB HD _AND_ unmount (!) all MMC
>>>>> partitions.
>>>>
>>>> So, the MMC driver has a problem. Have you tried:
>>>> - running the exact same kernel configuration only with  
>>>> CONFIG_XENOMAI
>>>> disabled (and stress with dohell)
>>>> - then with CONFIG_XENOMAI and CONFIG_IPIPE disabled.
>>>>
>>>> Also, do you have this patch in the tree you tried?
>>>> http://git.xenomai.org/ipipe.git/commit/?h=stable/ipipe-3.10.18&id=c26e7ad5679f9391cd8ea1db001bf301d2f6bc88
>>>>
>>>
>>> First i mounted tmpfs on /tmp so I don't wear out the SD card too  
>>> much:
>>> mount -t tmpfs -osize=192M tmpfs /tmp
>>>
>>> Then I used the following line to start the test (substitute MYTEST
>>> below with the following line):
>>> /usr/lib/xenomai/testsuite/dohell -m /tmp -l /opt/ltp
>>>
>>> Note: I always monitored the test over wifi with 'top' so I also had
>>> some network load...
>>>
>>> I got the following results with the 3.10.34 kernel, which includes
>>> everything up to the current ipipe-3.10 tag (it also included the
>>> patch you mentioned):
>>>
>>> - xeno-regression-test "MYTEST" -> FAIL if booted from SD card (see
>>> description above); OK if booted from ext USB HD _AND_ no mmc
>>> partitions mounted
>>> - CONFIG_IPIPE && CONFIG_XENOMAI && MYTEST -> FAIL (got status LED 2
>>> constantly on as described above)
>>> - CONFIG_IPIPE && MYTEST -> OK (see attached config file and ltp test
>>> log)
>>>
>>> Anything else I should try?
>>
>> Is the LTP test that is running when the failure happens always the same?
>>
>>
> 
> I went through all the logfiles on my pandaboard and identified
> the last tests that ltp logged before the error occurred (I'm assuming  
> that ltp writes to the file in /opt/ltp/results after completing the  
> test since there is the PASS/FAIL note as well, which logically should  
> only be available after completing the test):
> 
> test                 count
> ==========================
> rt_sigqueueinfo01        1
> clock_nanosleep01       10
> munmap02                 1
> semget06                 1
> epoll_create1_01         5
> splice01                 1
> clock_getres01           1
> rename13                 1
> BindMounts               1
> utimes01                 1
> 
> So the test after 'clock_nanosleep01', which is 'clone01' according to
> the LTP log file I sent you, seems to be the prime hotspot of failure,
> followed by 'epoll01', which comes after 'epoll_create1_01'.
> 
> I'm using the standard LTP version 'ltp-full-20130904', which I  
> downloaded and compiled on the target with gcc 4.6.3 (default debian  
> wheezy).

Ok. I am not sure it is meaningful. Anyway, the only difference between
CONFIG_XENOMAI + CONFIG_IPIPE and CONFIG_IPIPE alone, provided that you
are not running any program using Xenomai, is the host tick emulation.

So, could you please try to turn off
CONFIG_NO_HZ_IDLE
CONFIG_NO_HZ
CONFIG_HIGH_RES_TIMERS

And see if it works better?
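
Before rebuilding, it may be worth confirming that the three options
really ended up disabled, since dependencies can turn them back on. This
is a minimal sketch that assumes the usual kernel .config syntax
("CONFIG_FOO=y" or "# CONFIG_FOO is not set"); the function names are
hypothetical.

```python
def config_value(config_text, option):
    """Return the value of `option` in .config text, or None if it is unset."""
    for line in config_text.splitlines():
        line = line.strip()
        if line == f"# {option} is not set":
            return None
        # Match "OPTION=" exactly so CONFIG_NO_HZ does not match CONFIG_NO_HZ_IDLE.
        if line.startswith(option + "="):
            return line.split("=", 1)[1]
    return None


def still_enabled(config_text, options):
    """Return the options from `options` that are still set to y or m."""
    return [opt for opt in options if config_value(config_text, opt) in ("y", "m")]


# The three options named above.
TIMER_OPTIONS = ["CONFIG_NO_HZ_IDLE", "CONFIG_NO_HZ", "CONFIG_HIGH_RES_TIMERS"]
```

An empty list from still_enabled(open(".config").read(), TIMER_OPTIONS)
means all three options are off in that configuration.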

-- 
                                                                Gilles.


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [Xenomai] Command line freeze during xeno-regression-test on omap4460
  2014-04-06 11:21       ` Andreas Glatz
  2014-04-06 14:44         ` Gilles Chanteperdrix
@ 2014-04-06 15:54         ` Gilles Chanteperdrix
  2014-04-06 16:02           ` Andreas Glatz
  1 sibling, 1 reply; 28+ messages in thread
From: Gilles Chanteperdrix @ 2014-04-06 15:54 UTC (permalink / raw)
  To: Andreas Glatz; +Cc: xenomai

On 04/06/2014 01:21 PM, Andreas Glatz wrote:
> First i mounted tmpfs on /tmp so I don't wear out the SD card too much:
> mount -t tmpfs -osize=192M tmpfs /tmp
> 
> Then I used the following line to start the test (substitute MYTEST  
> below with the following line):
> /usr/lib/xenomai/testsuite/dohell -m /tmp -l /opt/ltp
> 
> Note: I always monitored the test over wifi with 'top' so I also had  
> some network load...
> 
> I got the following results with the 3.10.34 kernel, which includes  
> everything up to the current ipipe-3.10 tag (it also included the  
> patch you mentioned):
> 
> - xeno-regression-test "MYTEST" -> FAIL if booted from SD card (see  
> description above); OK if booted from ext USB HD _AND_ no mmc  
> partitions mounted
> - CONFIG_IPIPE && CONFIG_XENOMAI && MYTEST -> FAIL (got status LED 2  
> constantly on as described above)
> - CONFIG_IPIPE && MYTEST -> OK (see attached config file and ltp test  
> log)

Of course, I assume you used the exact same kernel configuration, the
only difference being CONFIG_XENOMAI in the two cases, right?
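
One way to check this is to compare the two .config files option by
option. The following is a minimal sketch under the assumption that both
files use the standard "CONFIG_FOO=value" syntax, with unset options
appearing only as comments; the function names are hypothetical.

```python
def parse_config(config_text):
    """Map each set option in .config text to its value."""
    options = {}
    for line in config_text.splitlines():
        line = line.strip()
        # "# CONFIG_FOO is not set" lines are comments and are skipped here.
        if line.startswith("CONFIG_") and "=" in line:
            name, value = line.split("=", 1)
            options[name] = value
    return options


def config_diff(text_a, text_b):
    """Return {option: (value_in_a, value_in_b)} for options that differ."""
    a, b = parse_config(text_a), parse_config(text_b)
    return {
        name: (a.get(name), b.get(name))
        for name in sorted(set(a) | set(b))
        if a.get(name) != b.get(name)
    }
```

A value of None on one side means the option is unset there. Note that
toggling CONFIG_XENOMAI will likely change dependent options too, so more
than one entry in the result is to be expected.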

-- 
                                                                Gilles.


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [Xenomai] Command line freeze during xeno-regression-test on omap4460
  2014-04-06 15:54         ` Gilles Chanteperdrix
@ 2014-04-06 16:02           ` Andreas Glatz
  2014-04-06 20:54             ` Gilles Chanteperdrix
  0 siblings, 1 reply; 28+ messages in thread
From: Andreas Glatz @ 2014-04-06 16:02 UTC (permalink / raw)
  To: Gilles Chanteperdrix; +Cc: xenomai


On 6 Apr 2014, at 16:54, Gilles Chanteperdrix wrote:

> On 04/06/2014 01:21 PM, Andreas Glatz wrote:
>> First i mounted tmpfs on /tmp so I don't wear out the SD card too  
>> much:
>> mount -t tmpfs -osize=192M tmpfs /tmp
>>
>> Then I used the following line to start the test (substitute MYTEST
>> below with the following line):
>> /usr/lib/xenomai/testsuite/dohell -m /tmp -l /opt/ltp
>>
>> Note: I always monitored the test over wifi with 'top' so I also had
>> some network load...
>>
>> I got the following results with the 3.10.34 kernel, which includes
>> everything up to the current ipipe-3.10 tag (it also included the
>> patch you mentioned):
>>
>> - xeno-regression-test "MYTEST" -> FAIL if booted from SD card (see
>> description above); OK if booted from ext USB HD _AND_ no mmc
>> partitions mounted
>> - CONFIG_IPIPE && CONFIG_XENOMAI && MYTEST -> FAIL (got status LED 2
>> constantly on as described above)
>> - CONFIG_IPIPE && MYTEST -> OK (see attached config file and ltp test
>> log)
>
> Of course, I assume you used the exact same kernel configuration, the
> only difference being CONFIG_XENOMAI in the two cases, right?

Yes! I just went into menuconfig and disabled CONFIG_XENOMAI, rebuilt
it, installed it and rebooted. I'm now recompiling the kernel with the
last config I sent you and the changes I attached (I got all those
changes after enabling CONFIG_XENOMAI and applying your CONFIG_* changes
with make menuconfig). After everything is built, I'll install it and
run 'MYTEST' again without 'xeno-regression-test'.

A.

-------------- next part --------------
A non-text attachment was scrubbed...
Name: config.diff
Type: application/octet-stream
Size: 5246 bytes
Desc: not available
URL: <http://www.xenomai.org/pipermail/xenomai/attachments/20140406/1ac7eef7/attachment.obj>
-------------- next part --------------


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [Xenomai] Command line freeze during xeno-regression-test on omap4460
  2014-04-06 16:02           ` Andreas Glatz
@ 2014-04-06 20:54             ` Gilles Chanteperdrix
  2014-04-06 21:23               ` Andreas Glatz
  0 siblings, 1 reply; 28+ messages in thread
From: Gilles Chanteperdrix @ 2014-04-06 20:54 UTC (permalink / raw)
  To: Andreas Glatz; +Cc: xenomai

On 04/06/2014 06:02 PM, Andreas Glatz wrote:
> 
> On 6 Apr 2014, at 16:54, Gilles Chanteperdrix wrote:
> 
>> On 04/06/2014 01:21 PM, Andreas Glatz wrote:
>>> First i mounted tmpfs on /tmp so I don't wear out the SD card too  
>>> much:
>>> mount -t tmpfs -osize=192M tmpfs /tmp
>>>
>>> Then I used the following line to start the test (substitute MYTEST
>>> below with the following line):
>>> /usr/lib/xenomai/testsuite/dohell -m /tmp -l /opt/ltp
>>>
>>> Note: I always monitored the test over wifi with 'top' so I also had
>>> some network load...
>>>
>>> I got the following results with the 3.10.34 kernel, which includes
>>> everything up to the current ipipe-3.10 tag (it also included the
>>> patch you mentioned):
>>>
>>> - xeno-regression-test "MYTEST" -> FAIL if booted from SD card (see
>>> description above); OK if booted from ext USB HD _AND_ no mmc
>>> partitions mounted
>>> - CONFIG_IPIPE && CONFIG_XENOMAI && MYTEST -> FAIL (got status LED 2
>>> constantly on as described above)
>>> - CONFIG_IPIPE && MYTEST -> OK (see attached config file and ltp test
>>> log)
>>
>> Of course, I assume you used the exact same kernel configuration, the
>> only difference being CONFIG_XENOMAI in the two cases, right?
> 
> Yes! I just went into menuconfig and disabled CONFIG_XENOMAI, rebuilt
> it, installed it and rebooted. I'm now recompiling the kernel with the
> last config I sent you and the changes I attached (I got all those
> changes after enabling CONFIG_XENOMAI and applying your CONFIG_* changes
> with make menuconfig). After everything is built, I'll install it and
> run 'MYTEST' again without 'xeno-regression-test'.

Another interesting test would be to enable CONFIG_DETECT_HUNG_TASK.
With a little luck, we will find out what the kernel is blocked on.
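
Before rerunning the test it may be worth confirming that the rebuilt
kernel really carries the intended options. The following is only a
minimal sketch: it assumes the configuration is available as a plain-text
.config file, and the helper names (parse_kconfig, check, WANTED) are made
up for illustration. The option list combines CONFIG_DETECT_HUNG_TASK with
the timer options discussed earlier in this thread.

```python
import re
import sys

# Options discussed in this thread and the values we expect to see.
WANTED = {
    "CONFIG_DETECT_HUNG_TASK": "y",
    "CONFIG_NO_HZ_IDLE": "n",
    "CONFIG_NO_HZ": "n",
    "CONFIG_HIGH_RES_TIMERS": "n",
}

def parse_kconfig(text):
    """Map each option in a .config text to its value ('n' if 'is not set')."""
    opts = {}
    for line in text.splitlines():
        line = line.strip()
        m = re.match(r"^(CONFIG_\w+)=(.*)$", line)
        if m:
            opts[m.group(1)] = m.group(2)
            continue
        m = re.match(r"^# (CONFIG_\w+) is not set$", line)
        if m:
            opts[m.group(1)] = "n"
    return opts

def check(opts, wanted):
    """Return (option, expected, actual) for every mismatch.

    An option that does not appear in the file at all is treated as 'n'.
    """
    return [(k, v, opts.get(k, "n"))
            for k, v in wanted.items() if opts.get(k, "n") != v]

if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1]) as f:
        for name, expected, actual in check(parse_kconfig(f.read()), WANTED):
            print("%s: expected %s, found %s" % (name, expected, actual))
```

Run it with the path to the .config of the kernel build tree as the only
argument; no output means all four options have the expected values.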

-- 
                                                                Gilles.


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [Xenomai] Command line freeze during xeno-regression-test on omap4460
  2014-04-06 15:28             ` Gilles Chanteperdrix
@ 2014-04-06 20:57               ` Andreas Glatz
  2014-04-06 21:04                 ` Gilles Chanteperdrix
  0 siblings, 1 reply; 28+ messages in thread
From: Andreas Glatz @ 2014-04-06 20:57 UTC (permalink / raw)
  To: Gilles Chanteperdrix; +Cc: xenomai


On 6 Apr 2014, at 16:28, Gilles Chanteperdrix wrote:

> On 04/06/2014 05:22 PM, Andreas Glatz wrote:
>>
>> On 6 Apr 2014, at 15:44, Gilles Chanteperdrix wrote:
>>
>>> On 04/06/2014 01:21 PM, Andreas Glatz wrote:
>>>>
>>>> On 4 Apr 2014, at 11:44, Gilles Chanteperdrix wrote:
>>>>
>>>>> On 04/04/2014 12:27 PM, Andreas Glatz wrote:
>>>>>> Hi Gilles,
>>>>>>
>>>>>> I'm finally back to my original problem below:
>>>>>>
>>>>>> On 6 Jan 2014, at 17:39, Gilles Chanteperdrix wrote:
>>>>>>
>>>>>>> On 01/06/2014 04:30 PM, Andreas Glatz wrote:
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> I managed to produce a kernel (v3.8.13) with xenomai 2.6.3  
>>>>>>>> ipipe
>>>>>>>> patch and
>>>>>>>> rootfs (debian wheezy) with xenomai 2.6.3 libraries for my
>>>>>>>> Pandaboard ES
>>>>>>>> (omap4460). The simple regression test, which only calls dd
>>>>>>>> during
>>>>>>>> the
>>>>>>>> switchtest, works fine. However the regression test with the
>>>>>>>> linux
>>>>>>>> test
>>>>>>>> project (ltp-full-20130904) scripts causes some sort of system
>>>>>>>> lock
>>>>>>>> up.
>>>>>>>> After that I only can ctrl-c xeno-regression-test (i.e.
>>>>>>>> switchtest), which,
>>>>>>>> however, doesn't help to regain console access (neigher over
>>>>>>>> ethernet nor
>>>>>>>> serial).
>>>>>>>>
>>>>>>>> Here's what I did:
>>>>>>>>
>>>>>>>> -- Building --
>>>>>>>> As recomended in the Xenomai 2.6 readme I followed the
>>>>>>>> instructions
>>>>>>>> in [1]
>>>>>>>> to produce a kernel and filesystem. To get a xenomai kernel I  
>>>>>>>> had
>>>>>>>> to do
>>>>>>>> three things differently:
>>>>>>>>
>>>>>>>> *) I used: git checkout origin/v3.8.x -b tmp
>>>>>>>> *) I applied ipipe-core-3.8.13-arm-3.patch from the xenomai-2.6
>>>>>>>> git
>>>>>>>> tree as
>>>>>>>> described in the Xenomai 2.6 readme
>>>>>>>> *) I disabled KGDB and TIDSPBRIDGE since those produced compile
>>>>>>>> errors (see
>>>>>>>> config [2])
>>>>>>>>
>>>>>>>> After a while I obtained the following messages from dmesg [3]
>>>>>>>> and
>>>>>>>> from the
>>>>>>>> command prompt:
>>>>>>>>
>>>>>>>> root@arm:~# cat /proc/version
>>>>>>>> Linux version 3.8.13-x3.6 (aglatz@linuxvbox) (gcc version 4.7.3
>>>>>>>> 20130328
>>>>>>>> (prerelease) (crosstool-NG linaro-1.13.1-4.7-2013.04-20130415 -
>>>>>>>> Linaro GCC
>>>>>>>> 2013.04) ) #4 SMP Sat Jan 4 15:54:20 GMT 2014
>>>>>>>>
>>>>>>>> -- Testing Linux --
>>>>>>>> To see if everything works I downloaded and cross-compiled
>>>>>>>> ltp-full-20130904 [4] with the same toolchain and flags (-
>>>>>>>> march=armv7-a
>>>>>>>> -mfpu=vfp3) as the xenomai libs and runtime. I started ltp with
>>>>>>>> "./
>>>>>>>> runltp
>>>>>>>> -p -l dohell-2014-01-06-1.log -S xenomai.skiplist" and after a
>>>>>>>> while it
>>>>>>>> finished with a few failed tests [5]. The console access,
>>>>>>>> however,
>>>>>>>> worked
>>>>>>>> fine.
>>>>>>>>
>>>>>>>> -- Testing Xenomai --
>>>>>>>> First I sucessfully could run the simple xenomai regression  
>>>>>>>> test:
>>>>>>>> xeno-regression-test -l "/usr/lib/xenomai/testsuite/dohell -m /
>>>>>>>> tmp
>>>>>>>> 100" -t
>>>>>>>> 2 which produced the output in [6] and the following additional
>>>>>>>> messages
>>>>>>>> with dmesg:
>>>>>>>>
>>>>>>>> [  476.215057] Xenomai: RTDM: closing file descriptor 1.
>>>>>>>> [  477.434936] Xenomai: Posix: destroying semaphore f0069c00.
>>>>>>>> [  477.440887] Xenomai: Posix: destroying mutex f0069a00.
>>>>>>>> [  477.475372] xnheap: destroying shared heap 'rt_heap: heap'
>>>>>>>> with
>>>>>>>> 16384
>>>>>>>> bytes still in use.
>>>>>>>> [  479.008453] Xenomai: Switching rt_task to secondary mode  
>>>>>>>> after
>>>>>>>> exception
>>>>>>>> #0 from user-space at 0x9620 (pid 2145)
>>>>>>>> [  480.574462] Xenomai: watchdog triggered -- signaling runaway
>>>>>>>> thread
>>>>>>>> 'rt_task'
>>>>>>>> [  480.582061] [sched_delayed] sched: RT throttling activated
>>>>>>>> [  557.336425] Xenomai: Posix: closing message queue descriptor
>>>>>>>> 3.
>>>>>>>>
>>>>>>>> and  "cat /proc/xenomai/*" produced [7].
>>>>>>>>
>>>>>>>> When I started the realistic xenomai regression test: xeno-
>>>>>>>> regression-test
>>>>>>>> -l "/usr/lib/xenomai/testsuite/dohell -m /tmp -l /opt/ltp" -t 2
>>>>>>>> everything
>>>>>>>> seemed fine at first - I could logon and start top to inspect  
>>>>>>>> the
>>>>>>>> running
>>>>>>>> processes. However, the command line (over serial and ethernet)
>>>>>>>> consistently freezes after a while (at different ltp tests
>>>>>>>> though).
>>>>>>>> First I
>>>>>>>> thought it's the massive system load which doesn't leave CPU  
>>>>>>>> for
>>>>>>>> the
>>>>>>>> console... however ctrl-c of xeno-regression-test does not help
>>>>>>>> to
>>>>>>>> regain
>>>>>>>> console access...
>>>>>>>
>>>>>>> That is because kill xeno-regression-test does not kill all the
>>>>>>> script children. So, basically, the load tasks are still  
>>>>>>> running.
>>>>>>> Also, what filesystem is /tmp? dohell is using dd to  
>>>>>>> alternatively
>>>>>>> write to /tmp, then erase the file. If /tmp is some flash, it  
>>>>>>> will
>>>>>>> become slow after a while. If it is a tmpfs, it will eat RAM.
>>>>>>>
>>>>>>>
>>>>>>
>>>>>> The described problem is _very_ reproducible on my PandaBoard ES
>>>>>> (omap4460), where I boot from an SD card partition and the rootfs
>>>>>> is
>>>>>> also on the SD card partition. I tried it with several kernel
>>>>>> versions
>>>>>> (3.8.13, 3.10.18, and 3.10.34) with the latest ipipe and xenomai
>>>>>> from
>>>>>> git the git repos. Everytime I start the regression test (see
>>>>>> command
>>>>>> above) the following happens: Everything works fine until the
>>>>>> switch/
>>>>>> latency tests start. Then I see that there is heavy access to the
>>>>>> SD
>>>>>> card, which is expected, as the status LED 2 is blinking. After
>>>>>> ~5mins
>>>>>> this status LED is constantly on. That's when I know that
>>>>>> everything
>>>>>> is over. On the console I can only execute commands that are
>>>>>> already
>>>>>> in RAM, such as the bash things like ps, mount, ... However, if I
>>>>>> try
>>>>>> a simple 'touch new' it blocks forever and I know that it  
>>>>>> blocks in
>>>>>> the syscall where the file should be created, because I looked at
>>>>>> it
>>>>>> with strace. I tried several things: I turned off CONFIG_PM  
>>>>>> (which
>>>>>> was
>>>>>> on by default), turned on the MMC debugging, put extra prink's in
>>>>>> the
>>>>>> omap_hsmmc.c ISR. However, everything seems to work on this  
>>>>>> level:
>>>>>> DMA
>>>>>> requests are started and do finish, the ISR is called regularly  
>>>>>> (bc
>>>>>> first I though that Xenomai would starve it).
>>>>>>
>>>>>> Have you every run Xenonmai on this _specific_ board (since
>>>>>> everything
>>>>>> is running smoothly on the omap5 board)?
>>>>>> Any more ideas how to debug it?
>>>>>>
>>>>>> Currently, I'm compiling the ipipe trace in hope that it would  
>>>>>> tell
>>>>>> me
>>>>>> something useful...
>>>>>>
>>>>>> Oh yes, the best bit is that the regression test works perfectly
>>>>>> fine
>>>>>> if I boot from an external USB HD _AND_ unmount (!) all MMC
>>>>>> partitions.
>>>>>
>>>>> So, the MMC driver has a problem. Have you tried:
>>>>> - running the exact same kernel configuration only with
>>>>> CONFIG_XENOMAI
>>>>> disabled (and stress with dohell)
>>>>> - then with CONFIG_XENOMAI and CONFIG_IPIPE disabled.
>>>>>
>>>>> Also, do you have this patch in the tree you tried?
>>>>> http://git.xenomai.org/ipipe.git/commit/?h=stable/ipipe-3.10.18&id=c26e7ad5679f9391cd8ea1db001bf301d2f6bc88
>>>>>
>>>>
>>>> First i mounted tmpfs on /tmp so I don't wear out the SD card too
>>>> much:
>>>> mount -t tmpfs -osize=192M tmpfs /tmp
>>>>
>>>> Then I used the following line to start the test (substitute MYTEST
>>>> below with the following line):
>>>> /usr/lib/xenomai/testsuite/dohell -m /tmp -l /opt/ltp
>>>>
>>>> Note: I always monitored the test over wifi with 'top' so I also  
>>>> had
>>>> some network load...
>>>>
>>>> I got the following results with the 3.10.34 kernel, which includes
>>>> everything up to the current ipipe-3.10 tag (it also included the
>>>> patch you mentioned):
>>>>
>>>> - xeno-regression-test "MYTEST" -> FAIL if booted from SD card (see
>>>> description above); OK if booted from ext USB HD _AND_ no mmc
>>>> partitions mounted
>>>> - CONFIG_IPIPE && CONFIG_XENOMAI && MYTEST -> FAIL (got status  
>>>> LED 2
>>>> constantly on as described above)
>>>> - CONFIG_IPIPE && MYTEST -> OK (see attached config file and ltp  
>>>> test
>>>> log)
>>>>
>>>> Anything else I should try?
>>>
>>> Is the current LTP test when the failure happens always the same?
>>>
>>>
>>
>> I went through all the logfiles on my pandaboard and and identified
>> the last tests that ltp logged before the error occurred (I'm  
>> assuming
>> that ltp writes to the file in /opt/ltp/results after completing the
>> test since there is the PASS/FAIL note as well, which logically  
>> should
>> only be available after completing the test):
>>
>> test                               count
>> ========================
>> rt_sigqueueinfo01    1
>> clock_nanosleep01 10
>> munmap02                1
>> semget06                   1
>> epoll_create1_01     5
>> splice01                      1
>> clock_getres01          1
>> rename13                   1
>> BindMounts                1
>> utimes01                     1
>>
>> So it seems that the test after 'clock_nanosleep01', which is
>> 'clone01' according to the LTP log file I sent you, seems to be the
>> prime hotspot of failure followed by 'epoll01', which comes after
>> 'epoll_create1_01'.
>>
>> I'm using the standard LTP version 'ltp-full-20130904', which I
>> downloaded and compiled on the target with gcc 4.6.3 (default debian
>> wheezy).
>
> Ok. I am not sure it is meaningful. Anyway, the only difference  
> between
> CONFIG_XENOMAI + CONFIG_IPIPE and CONFIG_IPIPE alone, provided that  
> you
> are not running any program using Xenomai, is the host tick emulation.
>
> So, could you please try to turn off
> CONFIG_NO_HZ_IDLE
> CONFIG_NO_HZ
> CONFIG_HIGH_RES_TIMERS
>
> And see if it works better?
>

As I wrote before, I recompiled the kernel with your timer options and
CONFIG_XENOMAI, installed it, ran sync, and rebooted after cutting the
power to the board for ~10 secs.

With those options it seems to have got much further through the tests.
However, eventually all ssh connections dropped, and the last messages
on the console where I started dohell were:

[...]
102400000 bytes (102 MB) copied, 2.97674 s, 34.4 MB/s
100+0 records in
100+0 records out
102400000 bytes (102 MB) copied, 1.97433 s, 51.9 MB/s
100+0 records in
100+0 records out
102400000 bytes (102 MB) copied, 2.68371 s, 38.2 MB/s
100+0 records in
100+0 records out
102400000 bytes (102 MB) copied, 2.57073 s, 39.8 MB/s
dd: writing `/tmp/bigfile': No space left on device
7+0 records in
6+0 records out
6164480 bytes (6.2 MB) copied, 0.189001 s, 32.6 MB/s
/usr/lib/xenomai/testsuite/dohell: 62: /usr/lib/xenomai/testsuite/dohell: Cannot fork
Write failed: Host is down

... and as usual status LED 2 is permanently on.

Since you suspect there's something wrong with the timer subsystem, I
looked around a bit at what extra patches went into RobertCNelson's
3.10.14 kernel, which I used as a base to merge the ipipe git tree.
Here is the list:

0001-panda-fix-wl12xx-regulator.patch
0002-ti-st-st-kim-fixing-firmware-path.patch
0003-Panda-expansion-add-spidev.patch
0004-HACK-PandaES-disable-cpufreq-so-board-will-boot.patch
0005-HACK-panda-enable-OMAP4_ERRATA_I688.patch
0006-ARM-hw_breakpoint-Enable-debug-powerdown-only-if-sys.patch
0007-Revert-regulator-twl-Remove-TWL6030_FIXED_RESOURCE.patch
0008-Revert-regulator-twl-Remove-another-unused-variable-.patch
0009-Revert-regulator-twl-Remove-references-to-the-twl403.patch
0010-Revert-regulator-twl-Remove-references-to-32kHz-cloc.patch
0011-panda-spidev-setup-pinmux.patch

Do you think those may have something to do with it?

A.





^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [Xenomai] Command line freeze during xeno-regression-test on omap4460
  2014-04-06 20:57               ` Andreas Glatz
@ 2014-04-06 21:04                 ` Gilles Chanteperdrix
  2014-04-07 10:18                   ` Andreas Glatz
  0 siblings, 1 reply; 28+ messages in thread
From: Gilles Chanteperdrix @ 2014-04-06 21:04 UTC (permalink / raw)
  To: Andreas Glatz; +Cc: xenomai

On 04/06/2014 10:57 PM, Andreas Glatz wrote:
> 
> On 6 Apr 2014, at 16:28, Gilles Chanteperdrix wrote:
> 
>> On 04/06/2014 05:22 PM, Andreas Glatz wrote:
>>>
>>> On 6 Apr 2014, at 15:44, Gilles Chanteperdrix wrote:
>>>
>>>> On 04/06/2014 01:21 PM, Andreas Glatz wrote:
>>>>>
>>>>> On 4 Apr 2014, at 11:44, Gilles Chanteperdrix wrote:
>>>>>
>>>>>> On 04/04/2014 12:27 PM, Andreas Glatz wrote:
>>>>>>> Hi Gilles,
>>>>>>>
>>>>>>> I'm finally back to my original problem below:
>>>>>>>
>>>>>>> On 6 Jan 2014, at 17:39, Gilles Chanteperdrix wrote:
>>>>>>>
>>>>>>>> On 01/06/2014 04:30 PM, Andreas Glatz wrote:
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> I managed to produce a kernel (v3.8.13) with xenomai 2.6.3  
>>>>>>>>> ipipe
>>>>>>>>> patch and
>>>>>>>>> rootfs (debian wheezy) with xenomai 2.6.3 libraries for my
>>>>>>>>> Pandaboard ES
>>>>>>>>> (omap4460). The simple regression test, which only calls dd
>>>>>>>>> during
>>>>>>>>> the
>>>>>>>>> switchtest, works fine. However the regression test with the
>>>>>>>>> linux
>>>>>>>>> test
>>>>>>>>> project (ltp-full-20130904) scripts causes some sort of system
>>>>>>>>> lock
>>>>>>>>> up.
>>>>>>>>> After that I only can ctrl-c xeno-regression-test (i.e.
>>>>>>>>> switchtest), which,
>>>>>>>>> however, doesn't help to regain console access (neigher over
>>>>>>>>> ethernet nor
>>>>>>>>> serial).
>>>>>>>>>
>>>>>>>>> Here's what I did:
>>>>>>>>>
>>>>>>>>> -- Building --
>>>>>>>>> As recomended in the Xenomai 2.6 readme I followed the
>>>>>>>>> instructions
>>>>>>>>> in [1]
>>>>>>>>> to produce a kernel and filesystem. To get a xenomai kernel I  
>>>>>>>>> had
>>>>>>>>> to do
>>>>>>>>> three things differently:
>>>>>>>>>
>>>>>>>>> *) I used: git checkout origin/v3.8.x -b tmp
>>>>>>>>> *) I applied ipipe-core-3.8.13-arm-3.patch from the xenomai-2.6
>>>>>>>>> git
>>>>>>>>> tree as
>>>>>>>>> described in the Xenomai 2.6 readme
>>>>>>>>> *) I disabled KGDB and TIDSPBRIDGE since those produced compile
>>>>>>>>> errors (see
>>>>>>>>> config [2])
>>>>>>>>>
>>>>>>>>> After a while I obtained the following messages from dmesg [3]
>>>>>>>>> and
>>>>>>>>> from the
>>>>>>>>> command prompt:
>>>>>>>>>
>>>>>>>>> root@arm:~# cat /proc/version
>>>>>>>>> Linux version 3.8.13-x3.6 (aglatz@linuxvbox) (gcc version 4.7.3
>>>>>>>>> 20130328
>>>>>>>>> (prerelease) (crosstool-NG linaro-1.13.1-4.7-2013.04-20130415 -
>>>>>>>>> Linaro GCC
>>>>>>>>> 2013.04) ) #4 SMP Sat Jan 4 15:54:20 GMT 2014
>>>>>>>>>
>>>>>>>>> -- Testing Linux --
>>>>>>>>> To see if everything works I downloaded and cross-compiled
>>>>>>>>> ltp-full-20130904 [4] with the same toolchain and flags (-
>>>>>>>>> march=armv7-a
>>>>>>>>> -mfpu=vfp3) as the xenomai libs and runtime. I started ltp with
>>>>>>>>> "./
>>>>>>>>> runltp
>>>>>>>>> -p -l dohell-2014-01-06-1.log -S xenomai.skiplist" and after a
>>>>>>>>> while it
>>>>>>>>> finished with a few failed tests [5]. The console access,
>>>>>>>>> however,
>>>>>>>>> worked
>>>>>>>>> fine.
>>>>>>>>>
>>>>>>>>> -- Testing Xenomai --
>>>>>>>>> First I sucessfully could run the simple xenomai regression  
>>>>>>>>> test:
>>>>>>>>> xeno-regression-test -l "/usr/lib/xenomai/testsuite/dohell -m /
>>>>>>>>> tmp
>>>>>>>>> 100" -t
>>>>>>>>> 2 which produced the output in [6] and the following additional
>>>>>>>>> messages
>>>>>>>>> with dmesg:
>>>>>>>>>
>>>>>>>>> [  476.215057] Xenomai: RTDM: closing file descriptor 1.
>>>>>>>>> [  477.434936] Xenomai: Posix: destroying semaphore f0069c00.
>>>>>>>>> [  477.440887] Xenomai: Posix: destroying mutex f0069a00.
>>>>>>>>> [  477.475372] xnheap: destroying shared heap 'rt_heap: heap'
>>>>>>>>> with
>>>>>>>>> 16384
>>>>>>>>> bytes still in use.
>>>>>>>>> [  479.008453] Xenomai: Switching rt_task to secondary mode  
>>>>>>>>> after
>>>>>>>>> exception
>>>>>>>>> #0 from user-space at 0x9620 (pid 2145)
>>>>>>>>> [  480.574462] Xenomai: watchdog triggered -- signaling runaway
>>>>>>>>> thread
>>>>>>>>> 'rt_task'
>>>>>>>>> [  480.582061] [sched_delayed] sched: RT throttling activated
>>>>>>>>> [  557.336425] Xenomai: Posix: closing message queue descriptor
>>>>>>>>> 3.
>>>>>>>>>
>>>>>>>>> and  "cat /proc/xenomai/*" produced [7].
>>>>>>>>>
>>>>>>>>> When I started the realistic xenomai regression test: xeno-
>>>>>>>>> regression-test
>>>>>>>>> -l "/usr/lib/xenomai/testsuite/dohell -m /tmp -l /opt/ltp" -t 2
>>>>>>>>> everything
>>>>>>>>> seemed fine at first - I could logon and start top to inspect  
>>>>>>>>> the
>>>>>>>>> running
>>>>>>>>> processes. However, the command line (over serial and ethernet)
>>>>>>>>> consistently freezes after a while (at different ltp tests
>>>>>>>>> though).
>>>>>>>>> First I
>>>>>>>>> thought it's the massive system load which doesn't leave CPU  
>>>>>>>>> for
>>>>>>>>> the
>>>>>>>>> console... however ctrl-c of xeno-regression-test does not help
>>>>>>>>> to
>>>>>>>>> regain
>>>>>>>>> console access...
>>>>>>>>
>>>>>>>> That is because kill xeno-regression-test does not kill all the
>>>>>>>> script children. So, basically, the load tasks are still  
>>>>>>>> running.
>>>>>>>> Also, what filesystem is /tmp? dohell is using dd to  
>>>>>>>> alternatively
>>>>>>>> write to /tmp, then erase the file. If /tmp is some flash, it  
>>>>>>>> will
>>>>>>>> become slow after a while. If it is a tmpfs, it will eat RAM.
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>> The described problem is _very_ reproducible on my PandaBoard ES
>>>>>>> (omap4460), where I boot from an SD card partition and the rootfs
>>>>>>> is
>>>>>>> also on the SD card partition. I tried it with several kernel
>>>>>>> versions
>>>>>>> (3.8.13, 3.10.18, and 3.10.34) with the latest ipipe and xenomai
>>>>>>> from
>>>>>>> git the git repos. Everytime I start the regression test (see
>>>>>>> command
>>>>>>> above) the following happens: Everything works fine until the
>>>>>>> switch/
>>>>>>> latency tests start. Then I see that there is heavy access to the
>>>>>>> SD
>>>>>>> card, which is expected, as the status LED 2 is blinking. After
>>>>>>> ~5mins
>>>>>>> this status LED is constantly on. That's when I know that
>>>>>>> everything
>>>>>>> is over. On the console I can only execute commands that are
>>>>>>> already
>>>>>>> in RAM, such as the bash things like ps, mount, ... However, if I
>>>>>>> try
>>>>>>> a simple 'touch new' it blocks forever and I know that it  
>>>>>>> blocks in
>>>>>>> the syscall where the file should be created, because I looked at
>>>>>>> it
>>>>>>> with strace. I tried several things: I turned off CONFIG_PM  
>>>>>>> (which
>>>>>>> was
>>>>>>> on by default), turned on the MMC debugging, put extra prink's in
>>>>>>> the
>>>>>>> omap_hsmmc.c ISR. However, everything seems to work on this  
>>>>>>> level:
>>>>>>> DMA
>>>>>>> requests are started and do finish, the ISR is called regularly  
>>>>>>> (bc
>>>>>>> first I though that Xenomai would starve it).
>>>>>>>
>>>>>>> Have you every run Xenonmai on this _specific_ board (since
>>>>>>> everything
>>>>>>> is running smoothly on the omap5 board)?
>>>>>>> Any more ideas how to debug it?
>>>>>>>
>>>>>>> Currently, I'm compiling the ipipe trace in hope that it would  
>>>>>>> tell
>>>>>>> me
>>>>>>> something useful...
>>>>>>>
>>>>>>> Oh yes, the best bit is that the regression test works perfectly
>>>>>>> fine
>>>>>>> if I boot from an external USB HD _AND_ unmount (!) all MMC
>>>>>>> partitions.
>>>>>>
>>>>>> So, the MMC driver has a problem. Have you tried:
>>>>>> - running the exact same kernel configuration only with
>>>>>> CONFIG_XENOMAI
>>>>>> disabled (and stress with dohell)
>>>>>> - then with CONFIG_XENOMAI and CONFIG_IPIPE disabled.
>>>>>>
>>>>>> Also, do you have this patch in the tree you tried?
>>>>>> http://git.xenomai.org/ipipe.git/commit/?h=stable/ipipe-3.10.18&id=c26e7ad5679f9391cd8ea1db001bf301d2f6bc88
>>>>>>
>>>>>
>>>>> First i mounted tmpfs on /tmp so I don't wear out the SD card too
>>>>> much:
>>>>> mount -t tmpfs -osize=192M tmpfs /tmp
>>>>>
>>>>> Then I used the following line to start the test (substitute MYTEST
>>>>> below with the following line):
>>>>> /usr/lib/xenomai/testsuite/dohell -m /tmp -l /opt/ltp
>>>>>
>>>>> Note: I always monitored the test over wifi with 'top' so I also  
>>>>> had
>>>>> some network load...
>>>>>
>>>>> I got the following results with the 3.10.34 kernel, which includes
>>>>> everything up to the current ipipe-3.10 tag (it also included the
>>>>> patch you mentioned):
>>>>>
>>>>> - xeno-regression-test "MYTEST" -> FAIL if booted from SD card (see
>>>>> description above); OK if booted from ext USB HD _AND_ no mmc
>>>>> partitions mounted
>>>>> - CONFIG_IPIPE && CONFIG_XENOMAI && MYTEST -> FAIL (got status  
>>>>> LED 2
>>>>> constantly on as described above)
>>>>> - CONFIG_IPIPE && MYTEST -> OK (see attached config file and ltp  
>>>>> test
>>>>> log)
>>>>>
>>>>> Anything else I should try?
>>>>
>>>> Is the current LTP test when the failure happens always the same?
>>>>
>>>>
>>>
>>> I went through all the logfiles on my pandaboard and and identified
>>> the last tests that ltp logged before the error occurred (I'm  
>>> assuming
>>> that ltp writes to the file in /opt/ltp/results after completing the
>>> test since there is the PASS/FAIL note as well, which logically  
>>> should
>>> only be available after completing the test):
>>>
>>> test                               count
>>> ========================
>>> rt_sigqueueinfo01    1
>>> clock_nanosleep01 10
>>> munmap02                1
>>> semget06                   1
>>> epoll_create1_01     5
>>> splice01                      1
>>> clock_getres01          1
>>> rename13                   1
>>> BindMounts                1
>>> utimes01                     1
>>>
>>> So it seems that the test after 'clock_nanosleep01', which is
>>> 'clone01' according to the LTP log file I sent you, seems to be the
>>> prime hotspot of failure followed by 'epoll01', which comes after
>>> 'epoll_create1_01'.
>>>
>>> I'm using the standard LTP version 'ltp-full-20130904', which I
>>> downloaded and compiled on the target with gcc 4.6.3 (default debian
>>> wheezy).
>>
>> Ok. I am not sure it is meaningful. Anyway, the only difference  
>> between
>> CONFIG_XENOMAI + CONFIG_IPIPE and CONFIG_IPIPE alone, provided that  
>> you
>> are not running any program using Xenomai, is the host tick emulation.
>>
>> So, could you please try to turn off
>> CONFIG_NO_HZ_IDLE
>> CONFIG_NO_HZ
>> CONFIG_HIGH_RES_TIMERS
>>
>> And see if it works better?
>>
> 
> As I wrote before, I recompiled the Kernel with your timer options and  
> CONFIG_XENOMAI, installed it, synced it and rebooted after cutting the  
> power to the board for ~10secs.
> 
> It seems with those options it got much further with the tests.  
> However, eventually all ssh connections broke up and the last messages  
> on the console, where I started do hell were:
> 
> [...]
> 102400000 bytes (102 MB) copied, 2.97674 s, 34.4 MB/s
> 100+0 records in
> 100+0 records out
> 102400000 bytes (102 MB) copied, 1.97433 s, 51.9 MB/s
> 100+0 records in
> 100+0 records out
> 102400000 bytes (102 MB) copied, 2.68371 s, 38.2 MB/s
> 100+0 records in
> 100+0 records out
> 102400000 bytes (102 MB) copied, 2.57073 s, 39.8 MB/s
> dd: writing `/tmp/bigfile': No space left on device
> 7+0 records in
> 6+0 records out
> 6164480 bytes (6.2 MB) copied, 0.189001 s, 32.6 MB/s
> /usr/lib/xenomai/testsuite/dohell: 62: /usr/lib/xenomai/testsuite/ 
> dohell: Cannot fork

This may simply be due to some LTP test which forks a lot and prevents
the system from being able to fork. This should be a temporary situation.

> Write failed: Host is down
> 
> ... and as usuall status LED 2 is permanently on.
> 
> As u suspect there's something wrong with the timer subsystem I looked  
> around a bit what extra patches went into the 3.10.14 kernel of  
> RobertCNelson, which I used as a base to merge the ipipe git tree.  
> Here is the list:
> 
> 0001-panda-fix-wl12xx-regulator.patch
> 0002-ti-st-st-kim-fixing-firmware-path.patch
> 0003-Panda-expansion-add-spidev.patch
> 0004-HACK-PandaES-disable-cpufreq-so-board-will-boot.patch
> 0005-HACK-panda-enable-OMAP4_ERRATA_I688.patch
> 0006-ARM-hw_breakpoint-Enable-debug-powerdown-only-if-sys.patch
> 0007-Revert-regulator-twl-Remove-TWL6030_FIXED_RESOURCE.patch
> 0008-Revert-regulator-twl-Remove-another-unused-variable-.patch
> 0009-Revert-regulator-twl-Remove-references-to-the-twl403.patch
> 0010-Revert-regulator-twl-Remove-references-to-32kHz-cloc.patch
> 0011-panda-spidev-setup-pinmux.patch
> 
> Do you think those may have something to do with it?

I do not think so. While the LED is stuck on, can you use the serial
console to run cat /proc/interrupts to see if the timer is still ticking?
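
One way to answer the "still ticking" question is to take two copies of
/proc/interrupts a few seconds apart and compare the counters. The
following is a rough sketch only: the function names are made up, the
per-CPU counts are simply summed, and which line corresponds to the timer
on a given board is something to read off the description column rather
than something this code knows.

```python
def parse_interrupts(text):
    """Parse /proc/interrupts text into {irq_label: (total_count, description)}."""
    lines = text.strip().splitlines()
    ncpu = len(lines[0].split())          # header row: CPU0 CPU1 ...
    table = {}
    for line in lines[1:]:
        label, sep, rest = line.partition(":")
        if not sep:
            continue                      # not an "irq: counts" row
        fields = rest.split()
        counts = []
        for tok in fields[:ncpu]:         # some rows carry fewer counters
            if not tok.isdigit():
                break
            counts.append(int(tok))
        desc = " ".join(fields[len(counts):])
        table[label.strip()] = (sum(counts), desc)
    return table

def advancing(before, after):
    """Return {irq_label: (delta, description)} for counters that increased."""
    out = {}
    for irq, (total, desc) in after.items():
        delta = total - before.get(irq, (0, ""))[0]
        if delta > 0:
            out[irq] = (delta, desc)
    return out
```

On the target, the two snapshots could be taken by reading
/proc/interrupts, sleeping for a few seconds, and reading it again; if the
timer line is missing from the result of advancing(), the tick has stopped.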


-- 
                                                                Gilles.


^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [Xenomai] Command line freeze during xeno-regression-test on omap4460
  2014-04-06 20:54             ` Gilles Chanteperdrix
@ 2014-04-06 21:23               ` Andreas Glatz
  0 siblings, 0 replies; 28+ messages in thread
From: Andreas Glatz @ 2014-04-06 21:23 UTC (permalink / raw)
  To: Gilles Chanteperdrix; +Cc: xenomai


On 6 Apr 2014, at 21:54, Gilles Chanteperdrix wrote:

> On 04/06/2014 06:02 PM, Andreas Glatz wrote:
>>
>> On 6 Apr 2014, at 16:54, Gilles Chanteperdrix wrote:
>>
>>> On 04/06/2014 01:21 PM, Andreas Glatz wrote:
>>>> First i mounted tmpfs on /tmp so I don't wear out the SD card too
>>>> much:
>>>> mount -t tmpfs -osize=192M tmpfs /tmp
>>>>
>>>> Then I used the following line to start the test (substitute MYTEST
>>>> below with the following line):
>>>> /usr/lib/xenomai/testsuite/dohell -m /tmp -l /opt/ltp
>>>>
>>>> Note: I always monitored the test over wifi with 'top' so I also  
>>>> had
>>>> some network load...
>>>>
>>>> I got the following results with the 3.10.34 kernel, which includes
>>>> everything up to the current ipipe-3.10 tag (it also included the
>>>> patch you mentioned):
>>>>
>>>> - xeno-regression-test "MYTEST" -> FAIL if booted from SD card (see
>>>> description above); OK if booted from ext USB HD _AND_ no mmc
>>>> partitions mounted
>>>> - CONFIG_IPIPE && CONFIG_XENOMAI && MYTEST -> FAIL (got status  
>>>> LED 2
>>>> constantly on as described above)
>>>> - CONFIG_IPIPE && MYTEST -> OK (see attached config file and ltp  
>>>> test
>>>> log)
>>>
>>> Of course, I assume you used the exact same kernel configuration,  
>>> the
>>> only difference being CONFIG_XENOMAI in the two cases, right?
>>
>> Yes! I just went into menuconfig and disabled CONFIG_XENOMAI, rebuilt
>> it, installed it and rebooted. I'm now recompiling the kernel with  
>> the
>> last config I sent you and the changes I attached (i got all those
>> changes after enabling CONFIG_XENOMAI and your CONFIG_* changes with
>> make menuconfig). After everything is built, I'll install it and
>> repeat running 'MYTEST' without 'xeno-regression-test'.
>
> Another interesting test would be to enable CONFIG_DETECT_HUNG_TASK.
> With a little luck, we will find on what is blocked the kernel.
>

Unfortunately, I rebooted the system and couldn't check the serial  
console. I started ltp again... so I should have more info tomorrow.

However, last week I got the following backtraces with a CONFIG_IPIPE  
&& CONFIG_XENOMAI kernel:

[10683.230000] INFO: task arith:2623 blocked for more than 120 seconds.
[10683.240000] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[10683.250000] arith           D c0825a34     0  2623      1 0x00000001
[10683.260000] [<c0825a34>] (__schedule+0x550/0x858) from [<c0825dcc>] (schedule+0x90/0x94)
[10683.270000] [<c0825dcc>] (schedule+0x90/0x94) from [<c08260b4>] (io_schedule+0xbc/0x12c)
[10683.280000] [<c08260b4>] (io_schedule+0xbc/0x12c) from [<c02077a4>] (sleep_on_buffer+0x18/0x20)
[10683.290000] [<c02077a4>] (sleep_on_buffer+0x18/0x20) from [<c0823ef0>] (__wait_on_bit+0x64/0xb0)
[10683.300000] [<c0823ef0>] (__wait_on_bit+0x64/0xb0) from [<c0823fc4>] (out_of_line_wait_on_bit+0x88/0x94)
[10683.310000] [<c0823fc4>] (out_of_line_wait_on_bit+0x88/0x94) from [<c0207860>] (__wait_on_buffer+0x30/0x38)
[10683.320000] [<c0207860>] (__wait_on_buffer+0x30/0x38) from [<c0270e34>] (__ext4_get_inode_loc+0x1cc/0x448)
[10683.330000] [<c0270e34>] (__ext4_get_inode_loc+0x1cc/0x448) from [<c0272b64>] (ext4_iget+0x64/0x840)
[10683.340000] [<c0272b64>] (ext4_iget+0x64/0x840) from [<c027b9d4>] (ext4_lookup+0x120/0x168)
[10683.350000] [<c027b9d4>] (ext4_lookup+0x120/0x168) from [<c01e37e4>] (lookup_real+0x40/0x5c)
[10683.360000] [<c01e37e4>] (lookup_real+0x40/0x5c) from [<c01e7b64>] (do_last+0x604/0xd24)
[10683.370000] [<c01e7b64>] (do_last+0x604/0xd24) from [<c01e8348>] (path_openat+0xc4/0x460)
[10683.380000] [<c01e8348>] (path_openat+0xc4/0x460) from [<c01e9440>] (do_filp_open+0x3c/0x88)
[10683.390000] [<c01e9440>] (do_filp_open+0x3c/0x88) from [<c01d9c48>] (do_sys_open+0xf4/0x180)
[10683.400000] [<c01d9c48>] (do_sys_open+0xf4/0x180) from [<c01d9d04>] (SyS_open+0x30/0x34)
[10683.410000] [<c01d9d04>] (SyS_open+0x30/0x34) from [<c000e020>] (ret_fast_syscall+0x0/0x50)

[10683.070000] INFO: task rs:main Q:Reg:2063 blocked for more than 120 seconds.
[10683.070000] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[10683.080000] rs:main Q:Reg   D c0825a34     0  2063      1 0x00000000
[10683.090000] [<c0825a34>] (__schedule+0x550/0x858) from [<c0825dcc>] (schedule+0x90/0x94)
[10683.100000] [<c0825dcc>] (schedule+0x90/0x94) from [<c08260b4>] (io_schedule+0xbc/0x12c)
[10683.110000] [<c08260b4>] (io_schedule+0xbc/0x12c) from [<c0195570>] (sleep_on_page+0x18/0x20)
[10683.120000] [<c0195570>] (sleep_on_page+0x18/0x20) from [<c0823ef0>] (__wait_on_bit+0x64/0xb0)
[10683.130000] [<c0823ef0>] (__wait_on_bit+0x64/0xb0) from [<c0195364>] (wait_on_page_bit+0xa0/0xb0)
[10683.140000] [<c0195364>] (wait_on_page_bit+0xa0/0xb0) from [<c02765fc>] (ext4_da_write_begin+0x1d4/0x28c)
[10683.150000] [<c02765fc>] (ext4_da_write_begin+0x1d4/0x28c) from [<c01966ec>] (generic_file_buffered_write+0xdc/0x240)
[10683.160000] [<c01966ec>] (generic_file_buffered_write+0xdc/0x240) from [<c01979b0>] (__generic_file_aio_write+0x360/0x3ac)
[10683.170000] [<c01979b0>] (__generic_file_aio_write+0x360/0x3ac) from [<c0197a64>] (generic_file_aio_write+0x68/0xc8)
[10683.190000] [<c0197a64>] (generic_file_aio_write+0x68/0xc8) from [<c026d33c>] (ext4_file_write+0x36c/0x454)
[10683.200000] [<c026d33c>] (ext4_file_write+0x36c/0x454) from [<c01da120>] (do_sync_write+0x84/0xa8)
[10683.210000] [<c01da120>] (do_sync_write+0x84/0xa8) from [<c01da8c0>] (vfs_write+0xe0/0x1c8)
[10683.220000] [<c01da8c0>] (vfs_write+0xe0/0x1c8) from [<c01daec8>] (SyS_write+0x4c/0x7c)
[10683.230000] [<c01daec8>] (SyS_write+0x4c/0x7c) from [<c000e020>] (ret_fast_syscall+0x0/0x50)
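
Both traces above end up in io_schedule() below an ext4 path, i.e. the
tasks are waiting for block I/O to complete. When there are many such
reports, a small script can summarise them. This is only a sketch under
some assumptions: the regular expressions are written against the ARM
backtrace format shown above, the function names are made up, and the set
of "generic scheduler frames" to skip is my own choice.

```python
import re

# "INFO: task NAME:PID blocked for more than N seconds"; NAME may contain ':'.
TASK_RE = re.compile(
    r"INFO: task (.+?):(\d+) blocked for more than\s+(\d+)\s+seconds")
# "(function+0xoff/0xsize)" frames; works on mail-wrapped lines as well.
FRAME_RE = re.compile(r"\((\w+)\+0x[0-9a-f]+/0x[0-9a-f]+\)")
# Frames common to every sleeping task, skipped when looking for the wait site.
SCHED_FRAMES = {"__schedule", "schedule", "io_schedule"}

def parse_hung_tasks(log):
    """Return one dict per hung-task report, with the stack innermost first."""
    reports = []
    matches = list(TASK_RE.finditer(log))
    for i, m in enumerate(matches):
        end = matches[i + 1].start() if i + 1 < len(matches) else len(log)
        stack = []
        for fn in FRAME_RE.findall(log[m.end():end]):
            # Each function shows up as callee on one line and as caller
            # on the previous one; drop the consecutive duplicate.
            if not stack or stack[-1] != fn:
                stack.append(fn)
        reports.append({"task": m.group(1), "pid": int(m.group(2)),
                        "secs": int(m.group(3)), "stack": stack})
    return reports

def wait_site(stack):
    """First frame that is not a generic scheduler frame, or None."""
    for fn in stack:
        if fn not in SCHED_FRAMES:
            return fn
    return None
```

Fed with the saved dmesg output, parse_hung_tasks() followed by
wait_site() on each stack gives one line per blocked task, which makes it
easier to see whether all of them are stuck behind the same kind of wait.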

A.





* Re: [Xenomai] Command line freeze during xeno-regression-test on omap4460
  2014-04-06 21:04                 ` Gilles Chanteperdrix
@ 2014-04-07 10:18                   ` Andreas Glatz
  2014-04-07 10:52                     ` Gilles Chanteperdrix
  0 siblings, 1 reply; 28+ messages in thread
From: Andreas Glatz @ 2014-04-07 10:18 UTC (permalink / raw)
  To: Gilles Chanteperdrix; +Cc: xenomai


On 6 Apr 2014, at 22:04, Gilles Chanteperdrix wrote:

> On 04/06/2014 10:57 PM, Andreas Glatz wrote:
>>
>> On 6 Apr 2014, at 16:28, Gilles Chanteperdrix wrote:
>>
>>> On 04/06/2014 05:22 PM, Andreas Glatz wrote:
>>>>
>>>> On 6 Apr 2014, at 15:44, Gilles Chanteperdrix wrote:
>>>>
>>>>> On 04/06/2014 01:21 PM, Andreas Glatz wrote:
>>>>>>
>>>>>> On 4 Apr 2014, at 11:44, Gilles Chanteperdrix wrote:
>>>>>>
>>>>>>> On 04/04/2014 12:27 PM, Andreas Glatz wrote:
>>>>>>>> Hi Gilles,
>>>>>>>>
>>>>>>>> I'm finally back to my original problem below:
>>>>>>>>
>>>>>>>> On 6 Jan 2014, at 17:39, Gilles Chanteperdrix wrote:
>>>>>>>>
>>>>>>>>> On 01/06/2014 04:30 PM, Andreas Glatz wrote:
>>>>>>>>>> Hi,
>>>>>>>>>>
>>>>>>>>>> I managed to produce a kernel (v3.8.13) with xenomai 2.6.3
>>>>>>>>>> ipipe
>>>>>>>>>> patch and
>>>>>>>>>> rootfs (debian wheezy) with xenomai 2.6.3 libraries for my
>>>>>>>>>> Pandaboard ES
>>>>>>>>>> (omap4460). The simple regression test, which only calls dd
>>>>>>>>>> during
>>>>>>>>>> the
>>>>>>>>>> switchtest, works fine. However the regression test with the
>>>>>>>>>> linux
>>>>>>>>>> test
>>>>>>>>>> project (ltp-full-20130904) scripts causes some sort of  
>>>>>>>>>> system
>>>>>>>>>> lock
>>>>>>>>>> up.
>>>>>>>>>> After that I only can ctrl-c xeno-regression-test (i.e.
>>>>>>>>>> switchtest), which,
>>>>>>>>>> however, doesn't help to regain console access (neither over
>>>>>>>>>> ethernet nor
>>>>>>>>>> serial).
>>>>>>>>>>
>>>>>>>>>> Here's what I did:
>>>>>>>>>>
>>>>>>>>>> -- Building --
>>>>>>>>>> As recommended in the Xenomai 2.6 readme I followed the
>>>>>>>>>> instructions
>>>>>>>>>> in [1]
>>>>>>>>>> to produce a kernel and filesystem. To get a xenomai kernel I
>>>>>>>>>> had
>>>>>>>>>> to do
>>>>>>>>>> three things differently:
>>>>>>>>>>
>>>>>>>>>> *) I used: git checkout origin/v3.8.x -b tmp
>>>>>>>>>> *) I applied ipipe-core-3.8.13-arm-3.patch from the  
>>>>>>>>>> xenomai-2.6
>>>>>>>>>> git
>>>>>>>>>> tree as
>>>>>>>>>> described in the Xenomai 2.6 readme
>>>>>>>>>> *) I disabled KGDB and TIDSPBRIDGE since those produced  
>>>>>>>>>> compile
>>>>>>>>>> errors (see
>>>>>>>>>> config [2])
>>>>>>>>>>
>>>>>>>>>> After a while I obtained the following messages from dmesg  
>>>>>>>>>> [3]
>>>>>>>>>> and
>>>>>>>>>> from the
>>>>>>>>>> command prompt:
>>>>>>>>>>
>>>>>>>>>> root@arm:~# cat /proc/version
>>>>>>>>>> Linux version 3.8.13-x3.6 (aglatz@linuxvbox) (gcc version  
>>>>>>>>>> 4.7.3
>>>>>>>>>> 20130328
>>>>>>>>>> (prerelease) (crosstool-NG  
>>>>>>>>>> linaro-1.13.1-4.7-2013.04-20130415 -
>>>>>>>>>> Linaro GCC
>>>>>>>>>> 2013.04) ) #4 SMP Sat Jan 4 15:54:20 GMT 2014
>>>>>>>>>>
>>>>>>>>>> -- Testing Linux --
>>>>>>>>>> To see if everything works I downloaded and cross-compiled
>>>>>>>>>> ltp-full-20130904 [4] with the same toolchain and flags (-
>>>>>>>>>> march=armv7-a
>>>>>>>>>> -mfpu=vfp3) as the xenomai libs and runtime. I started ltp  
>>>>>>>>>> with
>>>>>>>>>> "./
>>>>>>>>>> runltp
>>>>>>>>>> -p -l dohell-2014-01-06-1.log -S xenomai.skiplist" and  
>>>>>>>>>> after a
>>>>>>>>>> while it
>>>>>>>>>> finished with a few failed tests [5]. The console access,
>>>>>>>>>> however,
>>>>>>>>>> worked
>>>>>>>>>> fine.
>>>>>>>>>>
>>>>>>>>>> -- Testing Xenomai --
>>>>>>>>>> First I could successfully run the simple xenomai regression
>>>>>>>>>> test:
>>>>>>>>>> xeno-regression-test -l "/usr/lib/xenomai/testsuite/dohell - 
>>>>>>>>>> m /
>>>>>>>>>> tmp
>>>>>>>>>> 100" -t
>>>>>>>>>> 2 which produced the output in [6] and the following  
>>>>>>>>>> additional
>>>>>>>>>> messages
>>>>>>>>>> with dmesg:
>>>>>>>>>>
>>>>>>>>>> [  476.215057] Xenomai: RTDM: closing file descriptor 1.
>>>>>>>>>> [  477.434936] Xenomai: Posix: destroying semaphore f0069c00.
>>>>>>>>>> [  477.440887] Xenomai: Posix: destroying mutex f0069a00.
>>>>>>>>>> [  477.475372] xnheap: destroying shared heap 'rt_heap: heap'
>>>>>>>>>> with
>>>>>>>>>> 16384
>>>>>>>>>> bytes still in use.
>>>>>>>>>> [  479.008453] Xenomai: Switching rt_task to secondary mode
>>>>>>>>>> after
>>>>>>>>>> exception
>>>>>>>>>> #0 from user-space at 0x9620 (pid 2145)
>>>>>>>>>> [  480.574462] Xenomai: watchdog triggered -- signaling  
>>>>>>>>>> runaway
>>>>>>>>>> thread
>>>>>>>>>> 'rt_task'
>>>>>>>>>> [  480.582061] [sched_delayed] sched: RT throttling activated
>>>>>>>>>> [  557.336425] Xenomai: Posix: closing message queue  
>>>>>>>>>> descriptor
>>>>>>>>>> 3.
>>>>>>>>>>
>>>>>>>>>> and  "cat /proc/xenomai/*" produced [7].
>>>>>>>>>>
>>>>>>>>>> When I started the realistic xenomai regression test: xeno-
>>>>>>>>>> regression-test
>>>>>>>>>> -l "/usr/lib/xenomai/testsuite/dohell -m /tmp -l /opt/ltp" - 
>>>>>>>>>> t 2
>>>>>>>>>> everything
>>>>>>>>>> seemed fine at first - I could logon and start top to inspect
>>>>>>>>>> the
>>>>>>>>>> running
>>>>>>>>>> processes. However, the command line (over serial and  
>>>>>>>>>> ethernet)
>>>>>>>>>> consistently freezes after a while (at different ltp tests
>>>>>>>>>> though).
>>>>>>>>>> First I
>>>>>>>>>> thought it's the massive system load which doesn't leave CPU
>>>>>>>>>> for
>>>>>>>>>> the
>>>>>>>>>> console... however ctrl-c of xeno-regression-test does not  
>>>>>>>>>> help
>>>>>>>>>> to
>>>>>>>>>> regain
>>>>>>>>>> console access...
>>>>>>>>>
>>>>>>>>> That is because kill xeno-regression-test does not kill all  
>>>>>>>>> the
>>>>>>>>> script children. So, basically, the load tasks are still
>>>>>>>>> running.
>>>>>>>>> Also, what filesystem is /tmp? dohell is using dd to
>>>>>>>>> alternatively
>>>>>>>>> write to /tmp, then erase the file. If /tmp is some flash, it
>>>>>>>>> will
>>>>>>>>> become slow after a while. If it is a tmpfs, it will eat RAM.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>> The described problem is _very_ reproducible on my PandaBoard  
>>>>>>>> ES
>>>>>>>> (omap4460), where I boot from an SD card partition and the  
>>>>>>>> rootfs
>>>>>>>> is
>>>>>>>> also on the SD card partition. I tried it with several kernel
>>>>>>>> versions
>>>>>>>> (3.8.13, 3.10.18, and 3.10.34) with the latest ipipe and  
>>>>>>>> xenomai
>>>>>>>> from
>>>>>>>> the git repos. Every time I start the regression test (see
>>>>>>>> command
>>>>>>>> above) the following happens: Everything works fine until the
>>>>>>>> switch/
>>>>>>>> latency tests start. Then I see that there is heavy access to  
>>>>>>>> the
>>>>>>>> SD
>>>>>>>> card, which is expected, as the status LED 2 is blinking. After
>>>>>>>> ~5mins
>>>>>>>> this status LED is constantly on. That's when I know that
>>>>>>>> everything
>>>>>>>> is over. On the console I can only execute commands that are
>>>>>>>> already
>>>>>>>> in RAM, such as the bash things like ps, mount, ... However,  
>>>>>>>> if I
>>>>>>>> try
>>>>>>>> a simple 'touch new' it blocks forever and I know that it
>>>>>>>> blocks in
>>>>>>>> the syscall where the file should be created, because I  
>>>>>>>> looked at
>>>>>>>> it
>>>>>>>> with strace. I tried several things: I turned off CONFIG_PM
>>>>>>>> (which
>>>>>>>> was
>>>>>>>> on by default), turned on the MMC debugging, put extra  
>>>>>>>> printk's in
>>>>>>>> the
>>>>>>>> omap_hsmmc.c ISR. However, everything seems to work on this
>>>>>>>> level:
>>>>>>>> DMA
>>>>>>>> requests are started and do finish, the ISR is called regularly
>>>>>>>> (bc
>>>>>>>> first I thought that Xenomai would starve it).
>>>>>>>>
>>>>>>>> Have you ever run Xenomai on this _specific_ board (since
>>>>>>>> everything
>>>>>>>> is running smoothly on the omap5 board)?
>>>>>>>> Any more ideas how to debug it?
>>>>>>>>
>>>>>>>> Currently, I'm compiling the ipipe trace in hope that it would
>>>>>>>> tell
>>>>>>>> me
>>>>>>>> something useful...
>>>>>>>>
>>>>>>>> Oh yes, the best bit is that the regression test works  
>>>>>>>> perfectly
>>>>>>>> fine
>>>>>>>> if I boot from an external USB HD _AND_ unmount (!) all MMC
>>>>>>>> partitions.
>>>>>>>
>>>>>>> So, the MMC driver has a problem. Have you tried:
>>>>>>> - running the exact same kernel configuration only with
>>>>>>> CONFIG_XENOMAI
>>>>>>> disabled (and stress with dohell)
>>>>>>> - then with CONFIG_XENOMAI and CONFIG_IPIPE disabled.
>>>>>>>
>>>>>>> Also, do you have this patch in the tree you tried?
>>>>>>> http://git.xenomai.org/ipipe.git/commit/?h=stable/ipipe-3.10.18&id=c26e7ad5679f9391cd8ea1db001bf301d2f6bc88
>>>>>>>
>>>>>>
>>>>>> First i mounted tmpfs on /tmp so I don't wear out the SD card too
>>>>>> much:
>>>>>> mount -t tmpfs -osize=192M tmpfs /tmp
>>>>>>
>>>>>> Then I used the following line to start the test (substitute  
>>>>>> MYTEST
>>>>>> below with the following line):
>>>>>> /usr/lib/xenomai/testsuite/dohell -m /tmp -l /opt/ltp
>>>>>>
>>>>>> Note: I always monitored the test over wifi with 'top' so I also
>>>>>> had
>>>>>> some network load...
>>>>>>
>>>>>> I got the following results with the 3.10.34 kernel, which  
>>>>>> includes
>>>>>> everything up to the current ipipe-3.10 tag (it also included the
>>>>>> patch you mentioned):
>>>>>>
>>>>>> - xeno-regression-test "MYTEST" -> FAIL if booted from SD card  
>>>>>> (see
>>>>>> description above); OK if booted from ext USB HD _AND_ no mmc
>>>>>> partitions mounted
>>>>>> - CONFIG_IPIPE && CONFIG_XENOMAI && MYTEST -> FAIL (got status
>>>>>> LED 2
>>>>>> constantly on as described above)
>>>>>> - CONFIG_IPIPE && MYTEST -> OK (see attached config file and ltp
>>>>>> test
>>>>>> log)
>>>>>>
>>>>>> Anything else I should try?
>>>>>
>>>>> Is the current LTP test when the failure happens always the same?
>>>>>
>>>>>
>>>>
>>>> I went through all the logfiles on my pandaboard and identified
>>>> the last tests that ltp logged before the error occurred (I'm
>>>> assuming
>>>> that ltp writes to the file in /opt/ltp/results after completing  
>>>> the
>>>> test since there is the PASS/FAIL note as well, which logically
>>>> should
>>>> only be available after completing the test):
>>>>
>>>> test                 count
>>>> ==========================
>>>> rt_sigqueueinfo01        1
>>>> clock_nanosleep01       10
>>>> munmap02                 1
>>>> semget06                 1
>>>> epoll_create1_01         5
>>>> splice01                 1
>>>> clock_getres01           1
>>>> rename13                 1
>>>> BindMounts               1
>>>> utimes01                 1
>>>>
>>>> So it seems that the test after 'clock_nanosleep01', which is
>>>> 'clone01' according to the LTP log file I sent you, seems to be the
>>>> prime hotspot of failure followed by 'epoll01', which comes after
>>>> 'epoll_create1_01'.
>>>>
>>>> I'm using the standard LTP version 'ltp-full-20130904', which I
>>>> downloaded and compiled on the target with gcc 4.6.3 (default  
>>>> debian
>>>> wheezy).
>>>
>>> Ok. I am not sure it is meaningful. Anyway, the only difference
>>> between
>>> CONFIG_XENOMAI + CONFIG_IPIPE and CONFIG_IPIPE alone, provided that
>>> you
>>> are not running any program using Xenomai, is the host tick  
>>> emulation.
>>>
>>> So, could you please try to turn off
>>> CONFIG_NO_HZ_IDLE
>>> CONFIG_NO_HZ
>>> CONFIG_HIGH_RES_TIMERS
>>>
>>> And see if it works better?
>>>
>>
>> As I wrote before, I recompiled the Kernel with your timer options  
>> and
>> CONFIG_XENOMAI, installed it, synced it and rebooted after cutting  
>> the
>> power to the board for ~10secs.
>>
>> It seems with those options it got much further with the tests.
>> However, eventually all ssh connections broke up and the last  
>> messages
>> on the console, where I started dohell, were:
>>
>> [...]
>> 102400000 bytes (102 MB) copied, 2.97674 s, 34.4 MB/s
>> 100+0 records in
>> 100+0 records out
>> 102400000 bytes (102 MB) copied, 1.97433 s, 51.9 MB/s
>> 100+0 records in
>> 100+0 records out
>> 102400000 bytes (102 MB) copied, 2.68371 s, 38.2 MB/s
>> 100+0 records in
>> 100+0 records out
>> 102400000 bytes (102 MB) copied, 2.57073 s, 39.8 MB/s
>> dd: writing `/tmp/bigfile': No space left on device
>> 7+0 records in
>> 6+0 records out
>> 6164480 bytes (6.2 MB) copied, 0.189001 s, 32.6 MB/s
>> /usr/lib/xenomai/testsuite/dohell: 62: /usr/lib/xenomai/testsuite/
>> dohell: Cannot fork
>
> This may simply be due to some LTP test which forks a lot and prevents
> the system from being able to fork. This should be a temporary
> situation.
>
>> Write failed: Host is down
>>
>> ... and as usual status LED 2 is permanently on.
>>
>> As you suspect there's something wrong with the timer subsystem I
>> looked
>> around a bit what extra patches went into the 3.10.14 kernel of
>> RobertCNelson, which I used as a base to merge the ipipe git tree.
>> Here is the list:
>>
>> 0001-panda-fix-wl12xx-regulator.patch
>> 0002-ti-st-st-kim-fixing-firmware-path.patch
>> 0003-Panda-expansion-add-spidev.patch
>> 0004-HACK-PandaES-disable-cpufreq-so-board-will-boot.patch
>> 0005-HACK-panda-enable-OMAP4_ERRATA_I688.patch
>> 0006-ARM-hw_breakpoint-Enable-debug-powerdown-only-if-sys.patch
>> 0007-Revert-regulator-twl-Remove-TWL6030_FIXED_RESOURCE.patch
>> 0008-Revert-regulator-twl-Remove-another-unused-variable-.patch
>> 0009-Revert-regulator-twl-Remove-references-to-the-twl403.patch
>> 0010-Revert-regulator-twl-Remove-references-to-32kHz-cloc.patch
>> 0011-panda-spidev-setup-pinmux.patch
>>
>> Do you think those may have something to do with it?
>
> I do not think so. When the LED is still on, can you use the serial
> console to run cat /proc/interrupts to see if the timer is still  
> ticking?
>

I ran the test again with the same kernel and traced the messages from  
the serial console with minicom. Again, the test ran for quite some  
time until I got stacktraces similar to [1] (which might be just  
related to the ltp memcg test).

However, after these stacktraces I got the following message on the  
serial console (LED2 also went on and stayed on):

[...]
[ 6674.540000] omap_hsmmc omap_hsmmc.0: MMC start dma failure
[ 6674.540000] mmcblk0: unknown error -22 sending read/write command,  
card status 0x900
[ 6674.550000] end_request: I/O error, dev mmcblk0, sector 12751744
[ 6674.560000] EXT4-fs warning (device mmcblk0p2):  
__ext4_read_dirblock:908: error reading directory block (ino 397703,  
block 0)
[...]
[ 6932.610000] omap_hsmmc omap_hsmmc.0: MMC start dma failure
[ 6932.610000] mmcblk0: unknown error -22 sending read/write command,  
card status 0x900
[ 6932.620000] end_request: I/O error, dev mmcblk0, sector 21142904
[ 6932.630000] EXT4-fs warning (device mmcblk0p2):  
__ext4_read_dirblock:908: error reading directory block (ino 657554,  
block 0)
[...]
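
For reference, "error -22" in the mmcblk0 line is a negative Linux errno value, and the sector number in the end_request line counts 512-byte units. The sketch below only translates those two numbers; the function name describe_mmc_error is made up for illustration, and the result does not explain why omap_hsmmc failed to start the DMA.

```python
import errno
import os

SECTOR_SIZE = 512  # block-layer sectors in these messages are 512 bytes


def describe_mmc_error(code, sector):
    """Translate a negative kernel error code and a sector number."""
    return {
        # Symbolic errno name, or "unknown" if the value is not defined.
        "errno": errno.errorcode.get(-code, "unknown"),
        # Human-readable text as reported by the C library.
        "message": os.strerror(-code),
        # Byte offset of the failing sector from the start of mmcblk0.
        "byte_offset": sector * SECTOR_SIZE,
    }


info = describe_mmc_error(-22, 12751744)
print(info["errno"], info["message"], info["byte_offset"])
```

So the driver reported EINVAL ("Invalid argument") when it tried to start the transfer.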

Although dd is still running on minicom, I lost the ssh connection
over Ethernet (and I couldn't get it back even after disconnecting and
reconnecting the cable, which didn't cause any PHY interrupt in dmesg
either) and I cannot Ctrl-C or do anything on the serial console... I
just see dd, which was started by dohell, getting invoked.

So with the periodic timer ltp runs for much longer, however I can't
get the console back after the mmc (?) failure, which I was able to do
with the original timer subsystem config.

... and xeno-regression-test "MYTEST" fails as usual after ~ 5mins.
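
Regarding the /proc/interrupts check: a minimal sketch of how two captures taken a few seconds apart could be compared, assuming the usual layout with a CPU header row followed by one row per interrupt. The IRQ numbers, names and counts in the sample are hypothetical and are not taken from this board.

```python
def parse_interrupts(text):
    """Map each IRQ label to its count summed over all CPUs."""
    lines = text.splitlines()
    ncpus = len(lines[0].split())  # header row: CPU0 CPU1 ...
    totals = {}
    for line in lines[1:]:
        label, _, rest = line.partition(":")
        fields = rest.split()
        # Only the first ncpus columns are counters; the rest is the name.
        counts = [int(f) for f in fields[:ncpus] if f.isdigit()]
        totals[label.strip()] = sum(counts)
    return totals


def still_ticking(before, after):
    """Return the IRQ labels whose counters advanced between two captures."""
    old, new = parse_interrupts(before), parse_interrupts(after)
    return sorted(k for k in new if new[k] > old.get(k, 0))


# Hypothetical captures, for illustration only.
snap1 = """\
           CPU0       CPU1
 29:       1000        900   GIC  twd
 91:        500          0   GIC  mmc0
"""
snap2 = """\
           CPU0       CPU1
 29:       1250       1100   GIC  twd
 91:        500          0   GIC  mmc0
"""
print(still_ticking(snap1, snap2))
```

If the timer line stops appearing in the result while the LED is stuck on, that would point at the tick; if it keeps advancing, the timer is probably not the culprit.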

A.



[1] memcg related stacktrace:
=======================
[ 6606.000000] memcg_process invoked oom-killer: gfp_mask=0xd0,
order=0, oom_score_adj=0
[ 6606.010000] memcg_process cpuset=/ mems_allowed=0
[ 6606.010000] CPU: 0 PID: 26237 Comm: memcg_process Tainted: G         
W    3.10.32-x3.4 #26
[ 6606.020000] [<c0014e0c>] (unwind_backtrace+0x0/0xe8) from  
[<c00122ac>] (show_stack+0x20/0x24)
[ 6606.030000] [<c00122ac>] (show_stack+0x20/0x24) from [<c081e0b0>]  
(dump_stack+0x20/0x28)
[ 6606.040000] [<c081e0b0>] (dump_stack+0x20/0x28) from [<c081a610>]  
(dump_header.isra.11+0x98/0x1ac)
[ 6606.050000] [<c081a610>] (dump_header.isra.11+0x98/0x1ac) from  
[<c01948e8>] (oom_kill_process+0x6c/0x3a0)
[ 6606.060000] [<c01948e8>] (oom_kill_process+0x6c/0x3a0) from  
[<c01d0fe8>] (__mem_cgroup_try_charge+0xb00/0xb50)
[ 6606.070000] [<c01d0fe8>] (__mem_cgroup_try_charge+0xb00/0xb50) from  
[<c01d14f0>] (mem_cgroup_charge_common+0x44/0x6c)
[ 6606.080000] [<c01d14f0>] (mem_cgroup_charge_common+0x44/0x6c) from  
[<c01d2958>] (mem_cgroup_newpage_charge+0x34/0x3c)
[ 6606.090000] [<c01d2958>] (mem_cgroup_newpage_charge+0x34/0x3c) from  
[<c01b5718>] (handle_pte_fault+0x718/0x878)
[ 6606.100000] [<c01b5718>] (handle_pte_fault+0x718/0x878) from  
[<c01b5968>] (handle_mm_fault+0xf0/0x144)
[ 6606.110000] [<c01b5968>] (handle_mm_fault+0xf0/0x144) from  
[<c01b5c7c>] (__get_user_pages.part.72+0x2c0/0x434)
[ 6606.120000] [<c01b5c7c>] (__get_user_pages.part.72+0x2c0/0x434)  
from [<c01b5e38>] (__get_user_pages+0x48/0x50)
[ 6606.130000] [<c01b5e38>] (__get_user_pages+0x48/0x50) from  
[<c01b6b24>] (__mlock_vma_pages_range+0x74/0x7c)
[ 6606.140000] [<c01b6b24>] (__mlock_vma_pages_range+0x74/0x7c) from  
[<c01b6fc4>] (__mm_populate+0xd8/0x13c)
[ 6606.150000] [<c01b6fc4>] (__mm_populate+0xd8/0x13c) from  
[<c01a9930>] (vm_mmap_pgoff+0xac/0xb8)
[ 6606.160000] [<c01a9930>] (vm_mmap_pgoff+0xac/0xb8) from
[<c01b8dd8>] (SyS_mmap_pgoff+0xb0/0xec)
[ 6606.160000] [<c01b8dd8>] (SyS_mmap_pgoff+0xb0/0xec) from
[<c000e020>] (ret_fast_syscall+0x0/0x50)
[ 6606.170000] Task in /1/subgroup killed as a result of limit of /1
[ 6606.180000] memory: usage 4kB, limit 4kB, failcnt 6
[ 6606.190000] memory+swap: usage 4kB, limit 9007199254740991kB, failcnt 0
[ 6606.190000] kmem: usage 0kB, limit 9007199254740991kB, failcnt 0
[ 6606.200000] Memory cgroup stats for /1: cache:0KB rss:0KB
rss_huge:0KB mapped_file:0KB swap:0KB inactive_anon:0KB active_anon:0KB
inactive_file:0KB active_file:0KB unevictable:0KB
[ 6606.220000] Memory cgroup stats for /1/subgroup: cache:0KB rss:4KB
rss_huge:0KB mapped_file:0KB swap:0KB inactive_anon:0KB active_anon:0KB
inactive_file:0KB active_file:0KB unevictable:4KB
[ 6606.230000] [ pid ]   uid  tgid total_vm      rss nr_ptes swapents  
oom_score_adj name
[ 6606.240000] [26237]     0 26237      404       84       3         
0             0 memcg_process
[ 6606.250000] Memory cgroup out of memory: Kill process 26237  
(memcg_process) score 85000 or sacrifice child
[ 6606.260000] Killed process 26237 (memcg_process) total-vm:1616kB,  
anon-rss:68kB, file-rss:268kB







* Re: [Xenomai] Command line freeze during xeno-regression-test on omap4460
  2014-04-07 10:18                   ` Andreas Glatz
@ 2014-04-07 10:52                     ` Gilles Chanteperdrix
  2014-04-07 13:41                       ` Andreas Glatz
  0 siblings, 1 reply; 28+ messages in thread
From: Gilles Chanteperdrix @ 2014-04-07 10:52 UTC (permalink / raw)
  To: Andreas Glatz; +Cc: xenomai

On 04/07/2014 12:18 PM, Andreas Glatz wrote:
> 
> On 6 Apr 2014, at 22:04, Gilles Chanteperdrix wrote:
> 
>> On 04/06/2014 10:57 PM, Andreas Glatz wrote:
>>>
>>> On 6 Apr 2014, at 16:28, Gilles Chanteperdrix wrote:
>>>
>>>> On 04/06/2014 05:22 PM, Andreas Glatz wrote:
>>>>>
>>>>> On 6 Apr 2014, at 15:44, Gilles Chanteperdrix wrote:
>>>>>
>>>>>> On 04/06/2014 01:21 PM, Andreas Glatz wrote:
>>>>>>>
>>>>>>> On 4 Apr 2014, at 11:44, Gilles Chanteperdrix wrote:
>>>>>>>
>>>>>>>> On 04/04/2014 12:27 PM, Andreas Glatz wrote:
>>>>>>>>> Hi Gilles,
>>>>>>>>>
>>>>>>>>> I'm finally back to my original problem below:
>>>>>>>>>
>>>>>>>>> On 6 Jan 2014, at 17:39, Gilles Chanteperdrix wrote:
>>>>>>>>>
>>>>>>>>>> On 01/06/2014 04:30 PM, Andreas Glatz wrote:
>>>>>>>>>>> Hi,
>>>>>>>>>>>
>>>>>>>>>>> I managed to produce a kernel (v3.8.13) with xenomai 2.6.3
>>>>>>>>>>> ipipe
>>>>>>>>>>> patch and
>>>>>>>>>>> rootfs (debian wheezy) with xenomai 2.6.3 libraries for my
>>>>>>>>>>> Pandaboard ES
>>>>>>>>>>> (omap4460). The simple regression test, which only calls dd
>>>>>>>>>>> during
>>>>>>>>>>> the
>>>>>>>>>>> switchtest, works fine. However the regression test with the
>>>>>>>>>>> linux
>>>>>>>>>>> test
>>>>>>>>>>> project (ltp-full-20130904) scripts causes some sort of  
>>>>>>>>>>> system
>>>>>>>>>>> lock
>>>>>>>>>>> up.
>>>>>>>>>>> After that I only can ctrl-c xeno-regression-test (i.e.
>>>>>>>>>>> switchtest), which,
>>>>>>>>>>> however, doesn't help to regain console access (neither over
>>>>>>>>>>> ethernet nor
>>>>>>>>>>> serial).
>>>>>>>>>>>
>>>>>>>>>>> Here's what I did:
>>>>>>>>>>>
>>>>>>>>>>> -- Building --
>>>>>>>>>>> As recommended in the Xenomai 2.6 readme I followed the
>>>>>>>>>>> instructions
>>>>>>>>>>> in [1]
>>>>>>>>>>> to produce a kernel and filesystem. To get a xenomai kernel I
>>>>>>>>>>> had
>>>>>>>>>>> to do
>>>>>>>>>>> three things differently:
>>>>>>>>>>>
>>>>>>>>>>> *) I used: git checkout origin/v3.8.x -b tmp
>>>>>>>>>>> *) I applied ipipe-core-3.8.13-arm-3.patch from the  
>>>>>>>>>>> xenomai-2.6
>>>>>>>>>>> git
>>>>>>>>>>> tree as
>>>>>>>>>>> described in the Xenomai 2.6 readme
>>>>>>>>>>> *) I disabled KGDB and TIDSPBRIDGE since those produced  
>>>>>>>>>>> compile
>>>>>>>>>>> errors (see
>>>>>>>>>>> config [2])
>>>>>>>>>>>
>>>>>>>>>>> After a while I obtained the following messages from dmesg  
>>>>>>>>>>> [3]
>>>>>>>>>>> and
>>>>>>>>>>> from the
>>>>>>>>>>> command prompt:
>>>>>>>>>>>
>>>>>>>>>>> root@arm:~# cat /proc/version
>>>>>>>>>>> Linux version 3.8.13-x3.6 (aglatz@linuxvbox) (gcc version  
>>>>>>>>>>> 4.7.3
>>>>>>>>>>> 20130328
>>>>>>>>>>> (prerelease) (crosstool-NG  
>>>>>>>>>>> linaro-1.13.1-4.7-2013.04-20130415 -
>>>>>>>>>>> Linaro GCC
>>>>>>>>>>> 2013.04) ) #4 SMP Sat Jan 4 15:54:20 GMT 2014
>>>>>>>>>>>
>>>>>>>>>>> -- Testing Linux --
>>>>>>>>>>> To see if everything works I downloaded and cross-compiled
>>>>>>>>>>> ltp-full-20130904 [4] with the same toolchain and flags (-
>>>>>>>>>>> march=armv7-a
>>>>>>>>>>> -mfpu=vfp3) as the xenomai libs and runtime. I started ltp  
>>>>>>>>>>> with
>>>>>>>>>>> "./
>>>>>>>>>>> runltp
>>>>>>>>>>> -p -l dohell-2014-01-06-1.log -S xenomai.skiplist" and  
>>>>>>>>>>> after a
>>>>>>>>>>> while it
>>>>>>>>>>> finished with a few failed tests [5]. The console access,
>>>>>>>>>>> however,
>>>>>>>>>>> worked
>>>>>>>>>>> fine.
>>>>>>>>>>>
>>>>>>>>>>> -- Testing Xenomai --
>>>>>>>>>>> First I could successfully run the simple xenomai regression
>>>>>>>>>>> test:
>>>>>>>>>>> xeno-regression-test -l "/usr/lib/xenomai/testsuite/dohell - 
>>>>>>>>>>> m /
>>>>>>>>>>> tmp
>>>>>>>>>>> 100" -t
>>>>>>>>>>> 2 which produced the output in [6] and the following  
>>>>>>>>>>> additional
>>>>>>>>>>> messages
>>>>>>>>>>> with dmesg:
>>>>>>>>>>>
>>>>>>>>>>> [  476.215057] Xenomai: RTDM: closing file descriptor 1.
>>>>>>>>>>> [  477.434936] Xenomai: Posix: destroying semaphore f0069c00.
>>>>>>>>>>> [  477.440887] Xenomai: Posix: destroying mutex f0069a00.
>>>>>>>>>>> [  477.475372] xnheap: destroying shared heap 'rt_heap: heap'
>>>>>>>>>>> with
>>>>>>>>>>> 16384
>>>>>>>>>>> bytes still in use.
>>>>>>>>>>> [  479.008453] Xenomai: Switching rt_task to secondary mode
>>>>>>>>>>> after
>>>>>>>>>>> exception
>>>>>>>>>>> #0 from user-space at 0x9620 (pid 2145)
>>>>>>>>>>> [  480.574462] Xenomai: watchdog triggered -- signaling  
>>>>>>>>>>> runaway
>>>>>>>>>>> thread
>>>>>>>>>>> 'rt_task'
>>>>>>>>>>> [  480.582061] [sched_delayed] sched: RT throttling activated
>>>>>>>>>>> [  557.336425] Xenomai: Posix: closing message queue  
>>>>>>>>>>> descriptor
>>>>>>>>>>> 3.
>>>>>>>>>>>
>>>>>>>>>>> and  "cat /proc/xenomai/*" produced [7].
>>>>>>>>>>>
>>>>>>>>>>> When I started the realistic xenomai regression test: xeno-
>>>>>>>>>>> regression-test
>>>>>>>>>>> -l "/usr/lib/xenomai/testsuite/dohell -m /tmp -l /opt/ltp" - 
>>>>>>>>>>> t 2
>>>>>>>>>>> everything
>>>>>>>>>>> seemed fine at first - I could logon and start top to inspect
>>>>>>>>>>> the
>>>>>>>>>>> running
>>>>>>>>>>> processes. However, the command line (over serial and  
>>>>>>>>>>> ethernet)
>>>>>>>>>>> consistently freezes after a while (at different ltp tests
>>>>>>>>>>> though).
>>>>>>>>>>> First I
>>>>>>>>>>> thought it's the massive system load which doesn't leave CPU
>>>>>>>>>>> for
>>>>>>>>>>> the
>>>>>>>>>>> console... however ctrl-c of xeno-regression-test does not  
>>>>>>>>>>> help
>>>>>>>>>>> to
>>>>>>>>>>> regain
>>>>>>>>>>> console access...
>>>>>>>>>>
>>>>>>>>>> That is because kill xeno-regression-test does not kill all  
>>>>>>>>>> the
>>>>>>>>>> script children. So, basically, the load tasks are still
>>>>>>>>>> running.
>>>>>>>>>> Also, what filesystem is /tmp? dohell is using dd to
>>>>>>>>>> alternatively
>>>>>>>>>> write to /tmp, then erase the file. If /tmp is some flash, it
>>>>>>>>>> will
>>>>>>>>>> become slow after a while. If it is a tmpfs, it will eat RAM.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> The described problem is _very_ reproducible on my PandaBoard  
>>>>>>>>> ES
>>>>>>>>> (omap4460), where I boot from an SD card partition and the  
>>>>>>>>> rootfs
>>>>>>>>> is
>>>>>>>>> also on the SD card partition. I tried it with several kernel
>>>>>>>>> versions
>>>>>>>>> (3.8.13, 3.10.18, and 3.10.34) with the latest ipipe and  
>>>>>>>>> xenomai
>>>>>>>>> from
>>>>>>>>> the git repos. Every time I start the regression test (see
>>>>>>>>> command
>>>>>>>>> above) the following happens: Everything works fine until the
>>>>>>>>> switch/
>>>>>>>>> latency tests start. Then I see that there is heavy access to  
>>>>>>>>> the
>>>>>>>>> SD
>>>>>>>>> card, which is expected, as the status LED 2 is blinking. After
>>>>>>>>> ~5mins
>>>>>>>>> this status LED is constantly on. That's when I know that
>>>>>>>>> everything
>>>>>>>>> is over. On the console I can only execute commands that are
>>>>>>>>> already
>>>>>>>>> in RAM, such as the bash things like ps, mount, ... However,  
>>>>>>>>> if I
>>>>>>>>> try
>>>>>>>>> a simple 'touch new' it blocks forever and I know that it
>>>>>>>>> blocks in
>>>>>>>>> the syscall where the file should be created, because I  
>>>>>>>>> looked at
>>>>>>>>> it
>>>>>>>>> with strace. I tried several things: I turned off CONFIG_PM
>>>>>>>>> (which
>>>>>>>>> was
>>>>>>>>> on by default), turned on the MMC debugging, put extra  
>>>>>>>>> printk's in
>>>>>>>>> the
>>>>>>>>> omap_hsmmc.c ISR. However, everything seems to work on this
>>>>>>>>> level:
>>>>>>>>> DMA
>>>>>>>>> requests are started and do finish, the ISR is called regularly
>>>>>>>>> (bc
>>>>>>>>> first I thought that Xenomai would starve it).
>>>>>>>>>
>>>>>>>>> Have you ever run Xenomai on this _specific_ board (since
>>>>>>>>> everything
>>>>>>>>> is running smoothly on the omap5 board)?
>>>>>>>>> Any more ideas how to debug it?
>>>>>>>>>
>>>>>>>>> Currently, I'm compiling the ipipe trace in hope that it would
>>>>>>>>> tell
>>>>>>>>> me
>>>>>>>>> something useful...
>>>>>>>>>
>>>>>>>>> Oh yes, the best bit is that the regression test works  
>>>>>>>>> perfectly
>>>>>>>>> fine
>>>>>>>>> if I boot from an external USB HD _AND_ unmount (!) all MMC
>>>>>>>>> partitions.
>>>>>>>>
>>>>>>>> So, the MMC driver has a problem. Have you tried:
>>>>>>>> - running the exact same kernel configuration only with
>>>>>>>> CONFIG_XENOMAI
>>>>>>>> disabled (and stress with dohell)
>>>>>>>> - then with CONFIG_XENOMAI and CONFIG_IPIPE disabled.
>>>>>>>>
>>>>>>>> Also, do you have this patch in the tree you tried?
>>>>>>>> http://git.xenomai.org/ipipe.git/commit/?h=stable/ipipe-3.10.18&id=c26e7ad5679f9391cd8ea1db001bf301d2f6bc88
>>>>>>>>
>>>>>>>
>>>>>>> First i mounted tmpfs on /tmp so I don't wear out the SD card too
>>>>>>> much:
>>>>>>> mount -t tmpfs -osize=192M tmpfs /tmp
>>>>>>>
>>>>>>> Then I used the following line to start the test (substitute  
>>>>>>> MYTEST
>>>>>>> below with the following line):
>>>>>>> /usr/lib/xenomai/testsuite/dohell -m /tmp -l /opt/ltp
>>>>>>>
>>>>>>> Note: I always monitored the test over wifi with 'top' so I also
>>>>>>> had
>>>>>>> some network load...
>>>>>>>
>>>>>>> I got the following results with the 3.10.34 kernel, which  
>>>>>>> includes
>>>>>>> everything up to the current ipipe-3.10 tag (it also included the
>>>>>>> patch you mentioned):
>>>>>>>
>>>>>>> - xeno-regression-test "MYTEST" -> FAIL if booted from SD card  
>>>>>>> (see
>>>>>>> description above); OK if booted from ext USB HD _AND_ no mmc
>>>>>>> partitions mounted
>>>>>>> - CONFIG_IPIPE && CONFIG_XENOMAI && MYTEST -> FAIL (got status
>>>>>>> LED 2
>>>>>>> constantly on as described above)
>>>>>>> - CONFIG_IPIPE && MYTEST -> OK (see attached config file and ltp
>>>>>>> test
>>>>>>> log)
>>>>>>>
>>>>>>> Anything else I should try?
>>>>>>
>>>>>> Is the current LTP test when the failure happens always the same?
>>>>>>
>>>>>>
>>>>>
>>>>> I went through all the logfiles on my pandaboard and identified
>>>>> the last tests that ltp logged before the error occurred (I'm
>>>>> assuming
>>>>> that ltp writes to the file in /opt/ltp/results after completing  
>>>>> the
>>>>> test since there is the PASS/FAIL note as well, which logically
>>>>> should
>>>>> only be available after completing the test):
>>>>>
>>>>> test                 count
>>>>> ==========================
>>>>> rt_sigqueueinfo01        1
>>>>> clock_nanosleep01       10
>>>>> munmap02                 1
>>>>> semget06                 1
>>>>> epoll_create1_01         5
>>>>> splice01                 1
>>>>> clock_getres01           1
>>>>> rename13                 1
>>>>> BindMounts               1
>>>>> utimes01                 1
>>>>>
>>>>> So it seems that the test after 'clock_nanosleep01', which is
>>>>> 'clone01' according to the LTP log file I sent you, seems to be the
>>>>> prime hotspot of failure followed by 'epoll01', which comes after
>>>>> 'epoll_create1_01'.
>>>>>
>>>>> I'm using the standard LTP version 'ltp-full-20130904', which I
>>>>> downloaded and compiled on the target with gcc 4.6.3 (default  
>>>>> debian
>>>>> wheezy).
>>>>
>>>> Ok. I am not sure it is meaningful. Anyway, the only difference
>>>> between
>>>> CONFIG_XENOMAI + CONFIG_IPIPE and CONFIG_IPIPE alone, provided that
>>>> you
>>>> are not running any program using Xenomai, is the host tick  
>>>> emulation.
>>>>
>>>> So, could you please try to turn off
>>>> CONFIG_NO_HZ_IDLE
>>>> CONFIG_NO_HZ
>>>> CONFIG_HIGH_RES_TIMERS
>>>>
>>>> And see if it works better?
>>>>
>>>
>>> As I wrote before, I recompiled the Kernel with your timer options  
>>> and
>>> CONFIG_XENOMAI, installed it, synced it and rebooted after cutting  
>>> the
>>> power to the board for ~10secs.
>>>
>>> It seems with those options it got much further with the tests.
>>> However, eventually all ssh connections broke up and the last  
>>> messages
>>> on the console, where I started dohell, were:
>>>
>>> [...]
>>> 102400000 bytes (102 MB) copied, 2.97674 s, 34.4 MB/s
>>> 100+0 records in
>>> 100+0 records out
>>> 102400000 bytes (102 MB) copied, 1.97433 s, 51.9 MB/s
>>> 100+0 records in
>>> 100+0 records out
>>> 102400000 bytes (102 MB) copied, 2.68371 s, 38.2 MB/s
>>> 100+0 records in
>>> 100+0 records out
>>> 102400000 bytes (102 MB) copied, 2.57073 s, 39.8 MB/s
>>> dd: writing `/tmp/bigfile': No space left on device
>>> 7+0 records in
>>> 6+0 records out
>>> 6164480 bytes (6.2 MB) copied, 0.189001 s, 32.6 MB/s
>>> /usr/lib/xenomai/testsuite/dohell: 62: /usr/lib/xenomai/testsuite/
>>> dohell: Cannot fork
>>
>> This may simply be due to some LTP test which forks a lot and prevents
>> the system from being able to fork. This should be a temporary
>> situation.
>>
>>> Write failed: Host is down
>>>
>>> ... and as usual status LED 2 is permanently on.
>>>
>>> As you suspect there's something wrong with the timer subsystem I
>>> looked
>>> around a bit what extra patches went into the 3.10.14 kernel of
>>> RobertCNelson, which I used as a base to merge the ipipe git tree.
>>> Here is the list:
>>>
>>> 0001-panda-fix-wl12xx-regulator.patch
>>> 0002-ti-st-st-kim-fixing-firmware-path.patch
>>> 0003-Panda-expansion-add-spidev.patch
>>> 0004-HACK-PandaES-disable-cpufreq-so-board-will-boot.patch
>>> 0005-HACK-panda-enable-OMAP4_ERRATA_I688.patch
>>> 0006-ARM-hw_breakpoint-Enable-debug-powerdown-only-if-sys.patch
>>> 0007-Revert-regulator-twl-Remove-TWL6030_FIXED_RESOURCE.patch
>>> 0008-Revert-regulator-twl-Remove-another-unused-variable-.patch
>>> 0009-Revert-regulator-twl-Remove-references-to-the-twl403.patch
>>> 0010-Revert-regulator-twl-Remove-references-to-32kHz-cloc.patch
>>> 0011-panda-spidev-setup-pinmux.patch
>>>
>>> Do you think those may have something to do with it?
>>
>> I do not think so. When the LED is still on, can you use the serial
>> console to run cat /proc/interrupts to see if the timer is still  
>> ticking?
>>
> 
> I ran the test again with the same kernel and traced the messages from  
> the serial console with minicom. Again, the test ran for quite some  
> time until I got stacktraces similar to [1] (which might be just  
> related to the ltp memcg test).
> 
> However, after these stacktraces I got the following message on the  
> serial console (LED2 also went on and stayed on):
> 
> [...]
> [ 6674.540000] omap_hsmmc omap_hsmmc.0: MMC start dma failure
> [ 6674.540000] mmcblk0: unknown error -22 sending read/write command,  
> card status 0x900
> [ 6674.550000] end_request: I/O error, dev mmcblk0, sector 12751744
> [ 6674.560000] EXT4-fs warning (device mmcblk0p2):  
> __ext4_read_dirblock:908: error reading directory block (ino 397703,  
> block 0)
> [...]
> [ 6932.610000] omap_hsmmc omap_hsmmc.0: MMC start dma failure
> [ 6932.610000] mmcblk0: unknown error -22 sending read/write command,  
> card status 0x900
> [ 6932.620000] end_request: I/O error, dev mmcblk0, sector 21142904
> [ 6932.630000] EXT4-fs warning (device mmcblk0p2):  
> __ext4_read_dirblock:908: error reading directory block (ino 657554,  
> block 0)
> [...]
> 
> Although dd is still running on minicom, I lost the ssh connection  
> over Ethernet (and I couldn't get it back even after unconnecting and  
> reconnecting the cable, which didn't cause any PHY interrupt in dmesg  
> as well) and I cannot Ctrl-C or do anything on the serial console... I  
> just see dd, which was started by dohell, getting invoked.

What I meant is to use minicom as a console, already logged in, doing
nothing, ready to be used when the bug happens.

Anyway, I think there is no way around understanding the MMC driver now.

The bug when starting DMA may simply be due to the fact that all
previous DMAs stalled.
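
To see when the first I/O error appeared relative to the "MMC start dma
failure" messages, the serial log can be filtered mechanically. The
following is only a minimal sketch in Python; the function name and the
regular expression are assumptions based on the log lines quoted above,
not part of any Xenomai or kernel tool.

```python
import re

# Matches lines such as:
# [ 6674.550000] end_request: I/O error, dev mmcblk0, sector 12751744
# The pattern is an assumption derived from the log excerpt in this thread.
IO_ERROR_RE = re.compile(
    r"\[\s*(\d+\.\d+)\]\s+end_request: I/O error, dev (\w+), sector (\d+)"
)


def extract_io_errors(log_text):
    """Return a list of (timestamp, device, sector) for each I/O error line."""
    errors = []
    for match in IO_ERROR_RE.finditer(log_text):
        timestamp = float(match.group(1))
        device = match.group(2)
        sector = int(match.group(3))
        errors.append((timestamp, device, sector))
    return errors
```

Because the pattern is searched anywhere in the text, it also works on a
log that carries email quote prefixes in front of each line.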

Will look at this if I can reproduce it on my panda (it is currently
testing the I-pipe for 3.14, but when it is finished, I will try and
reproduce the bug).


-- 
                                                                Gilles.



* Re: [Xenomai] Command line freeze during xeno-regression-test on omap4460
  2014-04-07 10:52                     ` Gilles Chanteperdrix
@ 2014-04-07 13:41                       ` Andreas Glatz
  0 siblings, 0 replies; 28+ messages in thread
From: Andreas Glatz @ 2014-04-07 13:41 UTC (permalink / raw)
  To: Gilles Chanteperdrix; +Cc: xenomai


On 7 Apr 2014, at 11:52, Gilles Chanteperdrix wrote:

> On 04/07/2014 12:18 PM, Andreas Glatz wrote:
>>
>> On 6 Apr 2014, at 22:04, Gilles Chanteperdrix wrote:
>>
>>> On 04/06/2014 10:57 PM, Andreas Glatz wrote:
>>>>
>>>> On 6 Apr 2014, at 16:28, Gilles Chanteperdrix wrote:
>>>>
>>>>> On 04/06/2014 05:22 PM, Andreas Glatz wrote:
>>>>>>
>>>>>> On 6 Apr 2014, at 15:44, Gilles Chanteperdrix wrote:
>>>>>>
>>>>>>> On 04/06/2014 01:21 PM, Andreas Glatz wrote:
>>>>>>>>
>>>>>>>> On 4 Apr 2014, at 11:44, Gilles Chanteperdrix wrote:
>>>>>>>>
>>>>>>>>> On 04/04/2014 12:27 PM, Andreas Glatz wrote:
>>>>>>>>>> Hi Gilles,
>>>>>>>>>>
>>>>>>>>>> I'm finally back to my original problem below:
>>>>>>>>>>
>>>>>>>>>> On 6 Jan 2014, at 17:39, Gilles Chanteperdrix wrote:
>>>>>>>>>>
>>>>>>>>>>> On 01/06/2014 04:30 PM, Andreas Glatz wrote:
>>>>>>>>>>>> Hi,
>>>>>>>>>>>>
>>>>>>>>>>>> I managed to produce a kernel (v3.8.13) with xenomai 2.6.3
>>>>>>>>>>>> ipipe
>>>>>>>>>>>> patch and
>>>>>>>>>>>> rootfs (debian wheezy) with xenomai 2.6.3 libraries for my
>>>>>>>>>>>> Pandaboard ES
>>>>>>>>>>>> (omap4460). The simple regression test, which only calls dd
>>>>>>>>>>>> during
>>>>>>>>>>>> the
>>>>>>>>>>>> switchtest, works fine. However the regression test with  
>>>>>>>>>>>> the
>>>>>>>>>>>> linux
>>>>>>>>>>>> test
>>>>>>>>>>>> project (ltp-full-20130904) scripts causes some sort of
>>>>>>>>>>>> system
>>>>>>>>>>>> lock
>>>>>>>>>>>> up.
>>>>>>>>>>>> After that I only can ctrl-c xeno-regression-test (i.e.
>>>>>>>>>>>> switchtest), which,
>>>>>>>>>>>> however, doesn't help to regain console access (neigher  
>>>>>>>>>>>> over
>>>>>>>>>>>> ethernet nor
>>>>>>>>>>>> serial).
>>>>>>>>>>>>
>>>>>>>>>>>> Here's what I did:
>>>>>>>>>>>>
>>>>>>>>>>>> -- Building --
>>>>>>>>>>>> As recomended in the Xenomai 2.6 readme I followed the
>>>>>>>>>>>> instructions
>>>>>>>>>>>> in [1]
>>>>>>>>>>>> to produce a kernel and filesystem. To get a xenomai  
>>>>>>>>>>>> kernel I
>>>>>>>>>>>> had
>>>>>>>>>>>> to do
>>>>>>>>>>>> three things differently:
>>>>>>>>>>>>
>>>>>>>>>>>> *) I used: git checkout origin/v3.8.x -b tmp
>>>>>>>>>>>> *) I applied ipipe-core-3.8.13-arm-3.patch from the
>>>>>>>>>>>> xenomai-2.6
>>>>>>>>>>>> git
>>>>>>>>>>>> tree as
>>>>>>>>>>>> described in the Xenomai 2.6 readme
>>>>>>>>>>>> *) I disabled KGDB and TIDSPBRIDGE since those produced
>>>>>>>>>>>> compile
>>>>>>>>>>>> errors (see
>>>>>>>>>>>> config [2])
>>>>>>>>>>>>
>>>>>>>>>>>> After a while I obtained the following messages from dmesg
>>>>>>>>>>>> [3]
>>>>>>>>>>>> and
>>>>>>>>>>>> from the
>>>>>>>>>>>> command prompt:
>>>>>>>>>>>>
>>>>>>>>>>>> root@arm:~# cat /proc/version
>>>>>>>>>>>> Linux version 3.8.13-x3.6 (aglatz@linuxvbox) (gcc version
>>>>>>>>>>>> 4.7.3
>>>>>>>>>>>> 20130328
>>>>>>>>>>>> (prerelease) (crosstool-NG
>>>>>>>>>>>> linaro-1.13.1-4.7-2013.04-20130415 -
>>>>>>>>>>>> Linaro GCC
>>>>>>>>>>>> 2013.04) ) #4 SMP Sat Jan 4 15:54:20 GMT 2014
>>>>>>>>>>>>
>>>>>>>>>>>> -- Testing Linux --
>>>>>>>>>>>> To see if everything works I downloaded and cross-compiled
>>>>>>>>>>>> ltp-full-20130904 [4] with the same toolchain and flags (-
>>>>>>>>>>>> march=armv7-a
>>>>>>>>>>>> -mfpu=vfp3) as the xenomai libs and runtime. I started ltp
>>>>>>>>>>>> with
>>>>>>>>>>>> "./
>>>>>>>>>>>> runltp
>>>>>>>>>>>> -p -l dohell-2014-01-06-1.log -S xenomai.skiplist" and
>>>>>>>>>>>> after a
>>>>>>>>>>>> while it
>>>>>>>>>>>> finished with a few failed tests [5]. The console access,
>>>>>>>>>>>> however,
>>>>>>>>>>>> worked
>>>>>>>>>>>> fine.
>>>>>>>>>>>>
>>>>>>>>>>>> -- Testing Xenomai --
>>>>>>>>>>>> First I sucessfully could run the simple xenomai regression
>>>>>>>>>>>> test:
>>>>>>>>>>>> xeno-regression-test -l "/usr/lib/xenomai/testsuite/ 
>>>>>>>>>>>> dohell -
>>>>>>>>>>>> m /
>>>>>>>>>>>> tmp
>>>>>>>>>>>> 100" -t
>>>>>>>>>>>> 2 which produced the output in [6] and the following
>>>>>>>>>>>> additional
>>>>>>>>>>>> messages
>>>>>>>>>>>> with dmesg:
>>>>>>>>>>>>
>>>>>>>>>>>> [  476.215057] Xenomai: RTDM: closing file descriptor 1.
>>>>>>>>>>>> [  477.434936] Xenomai: Posix: destroying semaphore  
>>>>>>>>>>>> f0069c00.
>>>>>>>>>>>> [  477.440887] Xenomai: Posix: destroying mutex f0069a00.
>>>>>>>>>>>> [  477.475372] xnheap: destroying shared heap 'rt_heap:  
>>>>>>>>>>>> heap'
>>>>>>>>>>>> with
>>>>>>>>>>>> 16384
>>>>>>>>>>>> bytes still in use.
>>>>>>>>>>>> [  479.008453] Xenomai: Switching rt_task to secondary mode
>>>>>>>>>>>> after
>>>>>>>>>>>> exception
>>>>>>>>>>>> #0 from user-space at 0x9620 (pid 2145)
>>>>>>>>>>>> [  480.574462] Xenomai: watchdog triggered -- signaling
>>>>>>>>>>>> runaway
>>>>>>>>>>>> thread
>>>>>>>>>>>> 'rt_task'
>>>>>>>>>>>> [  480.582061] [sched_delayed] sched: RT throttling  
>>>>>>>>>>>> activated
>>>>>>>>>>>> [  557.336425] Xenomai: Posix: closing message queue
>>>>>>>>>>>> descriptor
>>>>>>>>>>>> 3.
>>>>>>>>>>>>
>>>>>>>>>>>> and  "cat /proc/xenomai/*" produced [7].
>>>>>>>>>>>>
>>>>>>>>>>>> When I started the realistic xenomai regression test: xeno-
>>>>>>>>>>>> regression-test
>>>>>>>>>>>> -l "/usr/lib/xenomai/testsuite/dohell -m /tmp -l /opt/ 
>>>>>>>>>>>> ltp" -
>>>>>>>>>>>> t 2
>>>>>>>>>>>> everything
>>>>>>>>>>>> seemed fine at first - I could logon and start top to  
>>>>>>>>>>>> inspect
>>>>>>>>>>>> the
>>>>>>>>>>>> running
>>>>>>>>>>>> processes. However, the command line (over serial and
>>>>>>>>>>>> ethernet)
>>>>>>>>>>>> consistently freezes after a while (at different ltp tests
>>>>>>>>>>>> though).
>>>>>>>>>>>> First I
>>>>>>>>>>>> thought it's the massive system load which doesn't leave  
>>>>>>>>>>>> CPU
>>>>>>>>>>>> for
>>>>>>>>>>>> the
>>>>>>>>>>>> console... however ctrl-c of xeno-regression-test does not
>>>>>>>>>>>> help
>>>>>>>>>>>> to
>>>>>>>>>>>> regain
>>>>>>>>>>>> console access...
>>>>>>>>>>>
>>>>>>>>>>> That is because kill xeno-regression-test does not kill all
>>>>>>>>>>> the
>>>>>>>>>>> script children. So, basically, the load tasks are still
>>>>>>>>>>> running.
>>>>>>>>>>> Also, what filesystem is /tmp? dohell is using dd to
>>>>>>>>>>> alternatively
>>>>>>>>>>> write to /tmp, then erase the file. If /tmp is some flash,  
>>>>>>>>>>> it
>>>>>>>>>>> will
>>>>>>>>>>> become slow after a while. If it is a tmpfs, it will eat  
>>>>>>>>>>> RAM.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> The described problem is _very_ reproducible on my PandaBoard
>>>>>>>>>> ES
>>>>>>>>>> (omap4460), where I boot from an SD card partition and the
>>>>>>>>>> rootfs
>>>>>>>>>> is
>>>>>>>>>> also on the SD card partition. I tried it with several kernel
>>>>>>>>>> versions
>>>>>>>>>> (3.8.13, 3.10.18, and 3.10.34) with the latest ipipe and
>>>>>>>>>> xenomai
>>>>>>>>>> from
>>>>>>>>>> git the git repos. Everytime I start the regression test (see
>>>>>>>>>> command
>>>>>>>>>> above) the following happens: Everything works fine until the
>>>>>>>>>> switch/
>>>>>>>>>> latency tests start. Then I see that there is heavy access to
>>>>>>>>>> the
>>>>>>>>>> SD
>>>>>>>>>> card, which is expected, as the status LED 2 is blinking.  
>>>>>>>>>> After
>>>>>>>>>> ~5mins
>>>>>>>>>> this status LED is constantly on. That's when I know that
>>>>>>>>>> everything
>>>>>>>>>> is over. On the console I can only execute commands that are
>>>>>>>>>> already
>>>>>>>>>> in RAM, such as the bash things like ps, mount, ... However,
>>>>>>>>>> if I
>>>>>>>>>> try
>>>>>>>>>> a simple 'touch new' it blocks forever and I know that it
>>>>>>>>>> blocks in
>>>>>>>>>> the syscall where the file should be created, because I
>>>>>>>>>> looked at
>>>>>>>>>> it
>>>>>>>>>> with strace. I tried several things: I turned off CONFIG_PM
>>>>>>>>>> (which
>>>>>>>>>> was
>>>>>>>>>> on by default), turned on the MMC debugging, put extra
>>>>>>>>>> prink's in
>>>>>>>>>> the
>>>>>>>>>> omap_hsmmc.c ISR. However, everything seems to work on this
>>>>>>>>>> level:
>>>>>>>>>> DMA
>>>>>>>>>> requests are started and do finish, the ISR is called  
>>>>>>>>>> regularly
>>>>>>>>>> (bc
>>>>>>>>>> first I though that Xenomai would starve it).
>>>>>>>>>>
>>>>>>>>>> Have you every run Xenonmai on this _specific_ board (since
>>>>>>>>>> everything
>>>>>>>>>> is running smoothly on the omap5 board)?
>>>>>>>>>> Any more ideas how to debug it?
>>>>>>>>>>
>>>>>>>>>> Currently, I'm compiling the ipipe trace in hope that it  
>>>>>>>>>> would
>>>>>>>>>> tell
>>>>>>>>>> me
>>>>>>>>>> something useful...
>>>>>>>>>>
>>>>>>>>>> Oh yes, the best bit is that the regression test works
>>>>>>>>>> perfectly
>>>>>>>>>> fine
>>>>>>>>>> if I boot from an external USB HD _AND_ unmount (!) all MMC
>>>>>>>>>> partitions.
>>>>>>>>>
>>>>>>>>> So, the MMC driver has a problem. Have you tried:
>>>>>>>>> - running the exact same kernel configuration only with
>>>>>>>>> CONFIG_XENOMAI
>>>>>>>>> disabled (and stress with dohell)
>>>>>>>>> - then with CONFIG_XENOMAI and CONFIG_IPIPE disabled.
>>>>>>>>>
>>>>>>>>> Also, do you have this patch in the tree you tried?
>>>>>>>>> http://git.xenomai.org/ipipe.git/commit/?h=stable/ipipe-3.10.18&id=c26e7ad5679f9391cd8ea1db001bf301d2f6bc88
>>>>>>>>>
>>>>>>>>
>>>>>>>> First i mounted tmpfs on /tmp so I don't wear out the SD card  
>>>>>>>> too
>>>>>>>> much:
>>>>>>>> mount -t tmpfs -osize=192M tmpfs /tmp
>>>>>>>>
>>>>>>>> Then I used the following line to start the test (substitute
>>>>>>>> MYTEST
>>>>>>>> below with the following line):
>>>>>>>> /usr/lib/xenomai/testsuite/dohell -m /tmp -l /opt/ltp
>>>>>>>>
>>>>>>>> Note: I always monitored the test over wifi with 'top' so I  
>>>>>>>> also
>>>>>>>> had
>>>>>>>> some network load...
>>>>>>>>
>>>>>>>> I got the following results with the 3.10.34 kernel, which
>>>>>>>> includes
>>>>>>>> everything up to the current ipipe-3.10 tag (it also included  
>>>>>>>> the
>>>>>>>> patch you mentioned):
>>>>>>>>
>>>>>>>> - xeno-regression-test "MYTEST" -> FAIL if booted from SD card
>>>>>>>> (see
>>>>>>>> description above); OK if booted from ext USB HD _AND_ no mmc
>>>>>>>> partitions mounted
>>>>>>>> - CONFIG_IPIPE && CONFIG_XENOMAI && MYTEST -> FAIL (got status
>>>>>>>> LED 2
>>>>>>>> constantly on as described above)
>>>>>>>> - CONFIG_IPIPE && MYTEST -> OK (see attached config file and  
>>>>>>>> ltp
>>>>>>>> test
>>>>>>>> log)
>>>>>>>>
>>>>>>>> Anything else I should try?
>>>>>>>
>>>>>>> Is the current LTP test when the failure happens always the  
>>>>>>> same?
>>>>>>>
>>>>>>>
>>>>>>
>>>>>> I went through all the logfiles on my pandaboard and and  
>>>>>> identified
>>>>>> the last tests that ltp logged before the error occurred (I'm
>>>>>> assuming
>>>>>> that ltp writes to the file in /opt/ltp/results after completing
>>>>>> the
>>>>>> test since there is the PASS/FAIL note as well, which logically
>>>>>> should
>>>>>> only be available after completing the test):
>>>>>>
>>>>>> test                               count
>>>>>> ========================
>>>>>> rt_sigqueueinfo01    1
>>>>>> clock_nanosleep01 10
>>>>>> munmap02                1
>>>>>> semget06                   1
>>>>>> epoll_create1_01     5
>>>>>> splice01                      1
>>>>>> clock_getres01          1
>>>>>> rename13                   1
>>>>>> BindMounts                1
>>>>>> utimes01                     1
>>>>>>
>>>>>> So it seems that the test after 'clock_nanosleep01', which is
>>>>>> 'clone01' according to the LTP log file I sent you, seems to be  
>>>>>> the
>>>>>> prime hotspot of failure followed by 'epoll01', which comes after
>>>>>> 'epoll_create1_01'.
>>>>>>
>>>>>> I'm using the standard LTP version 'ltp-full-20130904', which I
>>>>>> downloaded and compiled on the target with gcc 4.6.3 (default
>>>>>> debian
>>>>>> wheezy).
>>>>>
>>>>> Ok. I am not sure it is meaningful. Anyway, the only difference
>>>>> between
>>>>> CONFIG_XENOMAI + CONFIG_IPIPE and CONFIG_IPIPE alone, provided  
>>>>> that
>>>>> you
>>>>> are not running any program using Xenomai, is the host tick
>>>>> emulation.
>>>>>
>>>>> So, could you please try to turn off
>>>>> CONFIG_NO_HZ_IDLE
>>>>> CONFIG_NO_HZ
>>>>> CONFIG_HIGH_RES_TIMERS
>>>>>
>>>>> And see if it works better?
>>>>>
>>>>
>>>> As I wrote before, I recompiled the Kernel with your timer options
>>>> and
>>>> CONFIG_XENOMAI, installed it, synced it and rebooted after cutting
>>>> the
>>>> power to the board for ~10secs.
>>>>
>>>> It seems with those options it got much further with the tests.
>>>> However, eventually all ssh connections broke up and the last
>>>> messages
>>>> on the console, where I started do hell were:
>>>>
>>>> [...]
>>>> 102400000 bytes (102 MB) copied, 2.97674 s, 34.4 MB/s
>>>> 100+0 records in
>>>> 100+0 records out
>>>> 102400000 bytes (102 MB) copied, 1.97433 s, 51.9 MB/s
>>>> 100+0 records in
>>>> 100+0 records out
>>>> 102400000 bytes (102 MB) copied, 2.68371 s, 38.2 MB/s
>>>> 100+0 records in
>>>> 100+0 records out
>>>> 102400000 bytes (102 MB) copied, 2.57073 s, 39.8 MB/s
>>>> dd: writing `/tmp/bigfile': No space left on device
>>>> 7+0 records in
>>>> 6+0 records out
>>>> 6164480 bytes (6.2 MB) copied, 0.189001 s, 32.6 MB/s
>>>> /usr/lib/xenomai/testsuite/dohell: 62: /usr/lib/xenomai/testsuite/
>>>> dohell: Cannot fork
>>>
>>> This may simply be due to some LTP test which forks a lot and  
>>> prevent
>>> the system from being able to fork. This should be a temporary
>>> solution.
>>>
>>>> Write failed: Host is down
>>>>
>>>> ... and as usuall status LED 2 is permanently on.
>>>>
>>>> As u suspect there's something wrong with the timer subsystem I
>>>> looked
>>>> around a bit what extra patches went into the 3.10.14 kernel of
>>>> RobertCNelson, which I used as a base to merge the ipipe git tree.
>>>> Here is the list:
>>>>
>>>> 0001-panda-fix-wl12xx-regulator.patch
>>>> 0002-ti-st-st-kim-fixing-firmware-path.patch
>>>> 0003-Panda-expansion-add-spidev.patch
>>>> 0004-HACK-PandaES-disable-cpufreq-so-board-will-boot.patch
>>>> 0005-HACK-panda-enable-OMAP4_ERRATA_I688.patch
>>>> 0006-ARM-hw_breakpoint-Enable-debug-powerdown-only-if-sys.patch
>>>> 0007-Revert-regulator-twl-Remove-TWL6030_FIXED_RESOURCE.patch
>>>> 0008-Revert-regulator-twl-Remove-another-unused-variable-.patch
>>>> 0009-Revert-regulator-twl-Remove-references-to-the-twl403.patch
>>>> 0010-Revert-regulator-twl-Remove-references-to-32kHz-cloc.patch
>>>> 0011-panda-spidev-setup-pinmux.patch
>>>>
>>>> Do you think those may have something to do with it?
>>>
>>> I do not think so. When the LED is still on, can you use the serial
>>> console to run cat /proc/interrupts to see if the timer is still
>>> ticking?
>>>
>>
>> I ran the test again with the same kernel and traced the messages  
>> from
>> the serial console with minicom. Again, the test ran for quite some
>> time until I got stacktraces similar to [1] (which might be just
>> related to the ltp memcg test).
>>
>> However, after these stacktraces I got the following message on the
>> serial console (LED2 also went on and stayed on):
>>
>> [...]
>> [ 6674.540000] omap_hsmmc omap_hsmmc.0: MMC start dma failure
>> [ 6674.540000] mmcblk0: unknown error -22 sending read/write command,
>> card status 0x900
>> [ 6674.550000] end_request: I/O error, dev mmcblk0, sector 12751744
>> [ 6674.560000] EXT4-fs warning (device mmcblk0p2):
>> __ext4_read_dirblock:908: error reading directory block (ino 397703,
>> block 0)
>> [...]
>> [ 6932.610000] omap_hsmmc omap_hsmmc.0: MMC start dma failure
>> [ 6932.610000] mmcblk0: unknown error -22 sending read/write command,
>> card status 0x900
>> [ 6932.620000] end_request: I/O error, dev mmcblk0, sector 21142904
>> [ 6932.630000] EXT4-fs warning (device mmcblk0p2):
>> __ext4_read_dirblock:908: error reading directory block (ino 657554,
>> block 0)
>> [...]
>>
>> Although dd is still running on minicom, I lost the ssh connection
>> over Ethernet (and I couldn't get it back even after unconnecting and
>> reconnecting the cable, which didn't cause any PHY interrupt in dmesg
>> as well) and I cannot Ctrl-C or do anything on the serial  
>> console... I
>> just see dd, which was started by dohell, getting invoked.
>
> What I meant is to use minicom as a console, already logged in, doing
> nothing, ready to be used when the bug happens.

So now I started dohell over ssh and connected to the serial console
over minicom, where I just logged in as root.

The result was that I got approx. as far as last time with the ltp
tests. However, I couldn't see the omap_hsmmc errors this time. As
before, I couldn't do anything on the console or over ssh after the
failure occurred... so unfortunately, I don't have any news on the
'cat /proc/interrupts' front.

I do remember that with my original timer subsystem setup I was
seeing interrupts after the failure occurred.
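
For the record, the check I have in mind is to take two snapshots of
/proc/interrupts a few seconds apart and see which counters still move.
Below is a rough Python sketch of that comparison. The function names
are made up for illustration, and which row is the timer depends on the
kernel and platform, so that has to be read off the real file.

```python
import time


def parse_interrupts(text):
    """Map each row label of /proc/interrupts to its count summed over CPUs."""
    counts = {}
    lines = text.splitlines()
    if not lines:
        return counts
    ncpus = len(lines[0].split())  # header row lists CPU0, CPU1, ...
    for line in lines[1:]:
        label, sep, rest = line.partition(":")
        if not sep:
            continue
        per_cpu = []
        for field in rest.split()[:ncpus]:
            if not field.isdigit():
                break  # reached the controller/device description
            per_cpu.append(int(field))
        if per_cpu:
            counts[label.strip()] = sum(per_cpu)
    return counts


def compare_snapshots(before, after):
    """Return (ticking, stalled) row labels between two snapshots."""
    old, new = parse_interrupts(before), parse_interrupts(after)
    ticking = sorted(k for k in new if new[k] > old.get(k, 0))
    stalled = sorted(k for k in new if new[k] == old.get(k, 0))
    return ticking, stalled


def sample_twice(path="/proc/interrupts", delay=5.0):
    """Read the file twice, `delay` seconds apart, and compare the snapshots."""
    with open(path) as f:
        before = f.read()
    time.sleep(delay)
    with open(path) as f:
        after = f.read()
    return compare_snapshots(before, after)
```

If the timer row ends up in the "stalled" list while the board is in the
failed state, that would point at the tick; if it is still in "ticking",
the problem is more likely elsewhere.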

>
> Anyway, I think there is no way around understanding the MMC driver  
> now.
>
> The bug when starting DMA may simply be due to the fact that all
> previous DMAs stalled.
>
> Will look at this if I can reproduce it on my panda (it is currently
> testing the I-pipe for 3.14, but when it is finished, I will try and
> reproduce the bug).
>

Let me know if/how I can help. I also have a JTAG hardware debugger,
which I've never used before... it might take some time to set that up
though...

A.




* Re: [Xenomai] Command line freeze during xeno-regression-test on omap4460
  2014-01-06 15:30 [Xenomai] Command line freeze during xeno-regression-test on omap4460 Andreas Glatz
  2014-01-06 17:33 ` Gilles Chanteperdrix
  2014-01-06 17:39 ` Gilles Chanteperdrix
@ 2014-04-14  7:13 ` Gilles Chanteperdrix
  2014-04-14  7:24   ` Andreas Glatz
  2 siblings, 1 reply; 28+ messages in thread
From: Gilles Chanteperdrix @ 2014-04-14  7:13 UTC (permalink / raw)
  To: Andreas Glatz; +Cc: xenomai

On 01/06/2014 04:30 PM, Andreas Glatz wrote:
> Hi,
> 
> I managed to produce a kernel (v3.8.13) with xenomai 2.6.3 ipipe patch and
> rootfs (debian wheezy) with xenomai 2.6.3 libraries for my Pandaboard ES
> (omap4460). The simple regression test, which only calls dd during the
> switchtest, works fine. However the regression test with the linux test
> project (ltp-full-20130904) scripts causes some sort of system lock up.
> After that I only can ctrl-c xeno-regression-test (i.e. switchtest), which,
> however, doesn't help to regain console access (neigher over ethernet nor
> serial).

Hi,

I finally ran some tests with SD card: I booted my pandaboard using NFS
as usual, but ran the xeno-test script passing the mount point of the SD
card to dohell's -m option. And I could not reproduce any issue. The
kernel I used is 3.14, the configuration is omap2plus_defconfig.

Regards.

-- 
                                                                Gilles.



* Re: [Xenomai] Command line freeze during xeno-regression-test on omap4460
  2014-04-14  7:13 ` Gilles Chanteperdrix
@ 2014-04-14  7:24   ` Andreas Glatz
  2014-04-14  7:35     ` Gilles Chanteperdrix
  0 siblings, 1 reply; 28+ messages in thread
From: Andreas Glatz @ 2014-04-14  7:24 UTC (permalink / raw)
  To: Gilles Chanteperdrix; +Cc: xenomai


On 14 Apr 2014, at 08:13, Gilles Chanteperdrix wrote:

> On 01/06/2014 04:30 PM, Andreas Glatz wrote:
>> Hi,
>>
>> I managed to produce a kernel (v3.8.13) with xenomai 2.6.3 ipipe  
>> patch and
>> rootfs (debian wheezy) with xenomai 2.6.3 libraries for my  
>> Pandaboard ES
>> (omap4460). The simple regression test, which only calls dd during  
>> the
>> switchtest, works fine. However the regression test with the linux  
>> test
>> project (ltp-full-20130904) scripts causes some sort of system lock  
>> up.
>> After that I only can ctrl-c xeno-regression-test (i.e.  
>> switchtest), which,
>> however, doesn't help to regain console access (neigher over  
>> ethernet nor
>> serial).
>
> Hi,
>
> I finally ran some tests with SD card: I booted my pandaboard using  
> NFS
> as usual, but ran the xeno-test script passing the mount point of  
> the SD
> card to dohell's -m option. And I could not reproduce any issue. The
> kernel I used is 3.14, the configuration is omap2plus_defconfig.
>

OK, brilliant. I'll give that a try. I'm assuming that you tested with  
the git tag 'raw/ipipe-3.14.0' ?

A.




* Re: [Xenomai] Command line freeze during xeno-regression-test on omap4460
  2014-04-14  7:24   ` Andreas Glatz
@ 2014-04-14  7:35     ` Gilles Chanteperdrix
  2014-04-14 15:55       ` Andreas Glatz
  0 siblings, 1 reply; 28+ messages in thread
From: Gilles Chanteperdrix @ 2014-04-14  7:35 UTC (permalink / raw)
  To: Andreas Glatz; +Cc: xenomai

On 04/14/2014 09:24 AM, Andreas Glatz wrote:
> 
> On 14 Apr 2014, at 08:13, Gilles Chanteperdrix wrote:
> 
>> On 01/06/2014 04:30 PM, Andreas Glatz wrote:
>>> Hi,
>>>
>>> I managed to produce a kernel (v3.8.13) with xenomai 2.6.3 ipipe  
>>> patch and
>>> rootfs (debian wheezy) with xenomai 2.6.3 libraries for my  
>>> Pandaboard ES
>>> (omap4460). The simple regression test, which only calls dd during  
>>> the
>>> switchtest, works fine. However the regression test with the linux  
>>> test
>>> project (ltp-full-20130904) scripts causes some sort of system lock  
>>> up.
>>> After that I only can ctrl-c xeno-regression-test (i.e.  
>>> switchtest), which,
>>> however, doesn't help to regain console access (neigher over  
>>> ethernet nor
>>> serial).
>>
>> Hi,
>>
>> I finally ran some tests with SD card: I booted my pandaboard using  
>> NFS
>> as usual, but ran the xeno-test script passing the mount point of  
>> the SD
>> card to dohell's -m option. And I could not reproduce any issue. The
>> kernel I used is 3.14, the configuration is omap2plus_defconfig.
>>
> 
> OK, brilliant. I'll give that a try. I'm assuming that you tested with  
> the git tag 'raw/ipipe-3.14.0' ?

It is a branch actually, but yes.

-- 
                                                                Gilles.



* Re: [Xenomai] Command line freeze during xeno-regression-test on omap4460
  2014-04-14  7:35     ` Gilles Chanteperdrix
@ 2014-04-14 15:55       ` Andreas Glatz
  0 siblings, 0 replies; 28+ messages in thread
From: Andreas Glatz @ 2014-04-14 15:55 UTC (permalink / raw)
  To: Gilles Chanteperdrix; +Cc: xenomai

On 14 Apr 2014, at 08:35, Gilles Chanteperdrix wrote:

> On 04/14/2014 09:24 AM, Andreas Glatz wrote:
>>
>> On 14 Apr 2014, at 08:13, Gilles Chanteperdrix wrote:
>>
>>> On 01/06/2014 04:30 PM, Andreas Glatz wrote:
>>>> Hi,
>>>>
>>>> I managed to produce a kernel (v3.8.13) with xenomai 2.6.3 ipipe
>>>> patch and
>>>> rootfs (debian wheezy) with xenomai 2.6.3 libraries for my
>>>> Pandaboard ES
>>>> (omap4460). The simple regression test, which only calls dd during
>>>> the
>>>> switchtest, works fine. However the regression test with the linux
>>>> test
>>>> project (ltp-full-20130904) scripts causes some sort of system lock
>>>> up.
>>>> After that I only can ctrl-c xeno-regression-test (i.e.
>>>> switchtest), which,
>>>> however, doesn't help to regain console access (neigher over
>>>> ethernet nor
>>>> serial).
>>>
>>> Hi,
>>>
>>> I finally ran some tests with SD card: I booted my pandaboard using
>>> NFS
>>> as usual, but ran the xeno-test script passing the mount point of
>>> the SD
>>> card to dohell's -m option. And I could not reproduce any issue. The
>>> kernel I used is 3.14, the configuration is omap2plus_defconfig.
>>>
>>
>> OK, brilliant. I'll give that a try. I'm assuming that you tested  
>> with
>> the git tag 'raw/ipipe-3.14.0' ?
>
> It is a branch actually, but yes.
>

At the bottom of this email is the result of the first LTP pass
(started from dohell, which was started from xeno-regression-test). I
have never gotten this far with just the SD card in the panda (I ran
exactly the same ltp test as before). LED2 is still constantly on, but
the system remains responsive over wifi, ethernet and serial. Max
latency is ~7us and worst latency is ~18us. Will post the final output
ASAP.

Judging from these results and the other reports on the mailing list, I
think I might be well off using this kernel as a base for our
open-source/-hardware DAQ project.

Thanks a lot Gilles!

A.


mv_tests01                     PASS       0
size01                         PASS       0
sssd01                         PASS       0
sssd02                         PASS       0
sssd03                         PASS       0
smt_smp_enabled                PASS       0
smt_smp_affinity               PASS       0
ht_interrupt                   PASS       0
kmsg01                         PASS       0
fw_load                        FAIL       2

-----------------------------------------------
Total Tests: 1345
Total Failures: 51
Kernel Version: 3.14.0-ipipe-38801-g9b33fee-dirty
Machine Architecture: armv7l
Hostname: arm

root@arm:~#
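
To compare this pass with later runs, the per-test lines and the totals
above can be pulled out of the LTP log with a few lines of Python. This
is only a sketch: the function names are invented here, and the set of
status words (PASS, FAIL, CONF) is an assumption about the log format
rather than something taken from the LTP documentation.

```python
import re

# "name  STATUS  exit_code" lines, e.g. "fw_load   FAIL   2"
RESULT_RE = re.compile(r"^(\S+)\s+(PASS|FAIL|CONF)\s+(\d+)\s*$")
# "Total Tests: 1345" / "Total Failures: 51"
TOTAL_RE = re.compile(r"^Total (Tests|Failures):\s*(\d+)\s*$")


def parse_ltp_results(text):
    """Return (results, totals) parsed from an LTP result log excerpt."""
    results = {}
    totals = {}
    for line in text.splitlines():
        match = RESULT_RE.match(line)
        if match:
            # A test name that appears twice keeps only its last entry.
            results[match.group(1)] = (match.group(2), int(match.group(3)))
            continue
        match = TOTAL_RE.match(line)
        if match:
            totals[match.group(1).lower()] = int(match.group(2))
    return results, totals


def failure_rate(totals):
    """Fraction of failed tests according to the summary totals."""
    return totals["failures"] / totals["tests"]
```

With the totals above (51 failures out of 1345 tests) this gives a
failure rate of roughly 3.8%.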





Thread overview: 28+ messages
2014-01-06 15:30 [Xenomai] Command line freeze during xeno-regression-test on omap4460 Andreas Glatz
2014-01-06 17:33 ` Gilles Chanteperdrix
2014-01-06 17:39 ` Gilles Chanteperdrix
2014-01-07  7:23   ` Andreas Glatz
2014-01-07  8:10     ` Andreas Glatz
2014-04-04 10:27   ` Andreas Glatz
2014-04-04 10:44     ` Gilles Chanteperdrix
2014-04-04 11:19       ` Andreas Glatz
2014-04-04 11:21         ` Gilles Chanteperdrix
2014-04-06 11:21       ` Andreas Glatz
2014-04-06 14:44         ` Gilles Chanteperdrix
2014-04-06 15:22           ` Andreas Glatz
2014-04-06 15:28             ` Gilles Chanteperdrix
2014-04-06 20:57               ` Andreas Glatz
2014-04-06 21:04                 ` Gilles Chanteperdrix
2014-04-07 10:18                   ` Andreas Glatz
2014-04-07 10:52                     ` Gilles Chanteperdrix
2014-04-07 13:41                       ` Andreas Glatz
2014-04-06 15:54         ` Gilles Chanteperdrix
2014-04-06 16:02           ` Andreas Glatz
2014-04-06 20:54             ` Gilles Chanteperdrix
2014-04-06 21:23               ` Andreas Glatz
2014-04-04 11:00     ` Gilles Chanteperdrix
2014-04-04 13:38       ` Andreas Glatz
2014-04-14  7:13 ` Gilles Chanteperdrix
2014-04-14  7:24   ` Andreas Glatz
2014-04-14  7:35     ` Gilles Chanteperdrix
2014-04-14 15:55       ` Andreas Glatz
