* [Xenomai] Command line freeze during xeno-regression-test on omap4460
@ 2014-01-06 15:30 Andreas Glatz
  2014-01-06 17:33 ` Gilles Chanteperdrix
  ` (2 more replies)
  0 siblings, 3 replies; 28+ messages in thread
From: Andreas Glatz @ 2014-01-06 15:30 UTC (permalink / raw)
  To: xenomai

Hi,

I managed to produce a kernel (v3.8.13) with the xenomai 2.6.3 ipipe patch and a rootfs (debian wheezy) with the xenomai 2.6.3 libraries for my Pandaboard ES (omap4460). The simple regression test, which only calls dd during the switchtest, works fine. However, the regression test with the linux test project (ltp-full-20130904) scripts causes some sort of system lockup. After that I can only ctrl-c xeno-regression-test (i.e. switchtest), which, however, doesn't help to regain console access (neither over ethernet nor serial).

Here's what I did:

-- Building --
As recommended in the Xenomai 2.6 readme, I followed the instructions in [1] to produce a kernel and filesystem. To get a xenomai kernel I had to do three things differently:

*) I used: git checkout origin/v3.8.x -b tmp
*) I applied ipipe-core-3.8.13-arm-3.patch from the xenomai-2.6 git tree as described in the Xenomai 2.6 readme
*) I disabled KGDB and TIDSPBRIDGE since those produced compile errors (see config [2])

After a while I obtained the following messages from dmesg [3] and from the command prompt:

root@arm:~# cat /proc/version
Linux version 3.8.13-x3.6 (aglatz@linuxvbox) (gcc version 4.7.3 20130328 (prerelease) (crosstool-NG linaro-1.13.1-4.7-2013.04-20130415 - Linaro GCC 2013.04) ) #4 SMP Sat Jan 4 15:54:20 GMT 2014

-- Testing Linux --
To see if everything works, I downloaded and cross-compiled ltp-full-20130904 [4] with the same toolchain and flags (-march=armv7-a -mfpu=vfp3) as the xenomai libs and runtime. I started ltp with "./runltp -p -l dohell-2014-01-06-1.log -S xenomai.skiplist" and after a while it finished with a few failed tests [5]. The console access, however, worked fine.

-- Testing Xenomai --
First I could successfully run the simple xenomai regression test:

xeno-regression-test -l "/usr/lib/xenomai/testsuite/dohell -m /tmp 100" -t 2

which produced the output in [6] and the following additional messages with dmesg:

[ 476.215057] Xenomai: RTDM: closing file descriptor 1.
[ 477.434936] Xenomai: Posix: destroying semaphore f0069c00.
[ 477.440887] Xenomai: Posix: destroying mutex f0069a00.
[ 477.475372] xnheap: destroying shared heap 'rt_heap: heap' with 16384 bytes still in use.
[ 479.008453] Xenomai: Switching rt_task to secondary mode after exception #0 from user-space at 0x9620 (pid 2145)
[ 480.574462] Xenomai: watchdog triggered -- signaling runaway thread 'rt_task'
[ 480.582061] [sched_delayed] sched: RT throttling activated
[ 557.336425] Xenomai: Posix: closing message queue descriptor 3.

and "cat /proc/xenomai/*" produced [7].

When I started the realistic xenomai regression test:

xeno-regression-test -l "/usr/lib/xenomai/testsuite/dohell -m /tmp -l /opt/ltp" -t 2

everything seemed fine at first: I could log on and start top to inspect the running processes. However, the command line (over serial and ethernet) consistently freezes after a while (at different ltp tests, though). At first I thought it was the massive system load not leaving any CPU for the console. However, ctrl-c of xeno-regression-test does not help to regain console access; even after waiting for ~10 minutes I could regain access neither to the existing consoles nor to new consoles over ethernet. It seems to me that every syscall into kernel space causes the calling process to be blocked and never scheduled again.

-- Remaining questions --
*) Has anyone experienced something similar and/or found an explanation/fix/workaround?
*) Are there more debugging options I could try?

Thanks for any help,
Andreas

-- References --
[1] http://eewiki.net/display/linuxonarm/PandaBoard
[2] https://dcn060062.dcn.ed.ac.uk/main.php?cmd=image&var1=XenomaiOnArm%2F3.8.13-x3.6.config
[3] https://dcn060062.dcn.ed.ac.uk/main.php?cmd=image&var1=XenomaiOnArm%2Fdmesg_after_boot.txt
[4] https://sourceforge.net/projects/ltp/files/LTP%20Source/ltp-20130904/ltp-full-20130904.tar.xz/download
[5] https://dcn060062.dcn.ed.ac.uk/main.php?cmd=image&var1=XenomaiOnArm%2Fdohell-2014-01-06-1.log
[6] https://dcn060062.dcn.ed.ac.uk/main.php?cmd=image&var1=XenomaiOnArm%2Fxeno-regression-test_simple.txt
[7] https://dcn060062.dcn.ed.ac.uk/main.php?cmd=image&var1=XenomaiOnArm%2Fxeno-regression-test_realistic_proc.txt

^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [Xenomai] Command line freeze during xeno-regression-test on omap4460
  2014-01-06 15:30 [Xenomai] Command line freeze during xeno-regression-test on omap4460 Andreas Glatz
@ 2014-01-06 17:33 ` Gilles Chanteperdrix
  2014-01-06 17:39 ` Gilles Chanteperdrix
  2014-04-14 7:13 ` Gilles Chanteperdrix
  2 siblings, 0 replies; 28+ messages in thread
From: Gilles Chanteperdrix @ 2014-01-06 17:33 UTC (permalink / raw)
  To: Andreas Glatz; +Cc: xenomai

On 01/06/2014 04:30 PM, Andreas Glatz wrote:
> Hi,
>
> I managed to produce a kernel (v3.8.13) with xenomai 2.6.3 ipipe patch and
> rootfs (debian wheezy) with xenomai 2.6.3 libraries for my Pandaboard ES
> (omap4460). The simple regression test, which only calls dd during the
> switchtest, works fine. However the regression test with the linux test
> project (ltp-full-20130904) scripts causes some sort of system lock up.
> After that I only can ctrl-c xeno-regression-test (i.e. switchtest), which,
> however, doesn't help to regain console access (neither over ethernet nor
> serial).

If the problem happens during the ltp test itself (notably while running msgctl10 or msgctl11), this is normal: the system is completely overloaded. You have to wait for some time before it returns to normal.

--
Gilles.

^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [Xenomai] Command line freeze during xeno-regression-test on omap4460
  2014-01-06 15:30 [Xenomai] Command line freeze during xeno-regression-test on omap4460 Andreas Glatz
  2014-01-06 17:33 ` Gilles Chanteperdrix
@ 2014-01-06 17:39 ` Gilles Chanteperdrix
  2014-01-07 7:23 ` Andreas Glatz
  2014-04-04 10:27 ` Andreas Glatz
  2014-04-14 7:13 ` Gilles Chanteperdrix
  2 siblings, 2 replies; 28+ messages in thread
From: Gilles Chanteperdrix @ 2014-01-06 17:39 UTC (permalink / raw)
  To: Andreas Glatz; +Cc: xenomai

On 01/06/2014 04:30 PM, Andreas Glatz wrote:
> Hi,
>
> I managed to produce a kernel (v3.8.13) with xenomai 2.6.3 ipipe patch and
> rootfs (debian wheezy) with xenomai 2.6.3 libraries for my Pandaboard ES
> (omap4460). The simple regression test, which only calls dd during the
> switchtest, works fine. However the regression test with the linux test
> project (ltp-full-20130904) scripts causes some sort of system lock up.
> After that I only can ctrl-c xeno-regression-test (i.e. switchtest), which,
> however, doesn't help to regain console access (neither over ethernet nor
> serial).
>
> Here's what I did:
>
> -- Building --
> As recommended in the Xenomai 2.6 readme I followed the instructions in [1]
> to produce a kernel and filesystem.
> [...]
>
> When I started the realistic xenomai regression test: xeno-regression-test
> -l "/usr/lib/xenomai/testsuite/dohell -m /tmp -l /opt/ltp" -t 2 everything
> seemed fine at first - I could logon and start top to inspect the running
> processes. However, the command line (over serial and ethernet)
> consistently freezes after a while (at different ltp tests though). First I
> thought it's the massive system load which doesn't leave CPU for the
> console... however ctrl-c of xeno-regression-test does not help to regain
> console access...

That is because killing xeno-regression-test does not kill all of the script's children. So, basically, the load tasks are still running.

Also, what filesystem is /tmp? dohell uses dd to alternately write a file to /tmp and then erase it. If /tmp is on flash, it will become slow after a while. If it is a tmpfs, it will eat RAM.

--
Gilles.

^ permalink raw reply [flat|nested] 28+ messages in thread
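
A possible way to stop the load tasks that survive ctrl-c, as a minimal sketch. It assumes the leftover tasks share a process group with a PID you can still identify with ps, which has not been checked against the xeno-regression-test script. kill_group is a hypothetical helper name, not part of the Xenomai test suite.

```shell
#!/bin/sh
# kill_group PID [SIGNAL]
# Hypothetical helper (not part of Xenomai): send SIGNAL (default TERM)
# to every process in the process group that PID belongs to.
kill_group() {
    target=$(ps -o pgid= -p "$1" 2>/dev/null | tr -d ' ')
    own=$(ps -o pgid= -p "$$" 2>/dev/null | tr -d ' ')
    # Give up if PID does not exist.
    [ -n "$target" ] || return 1
    # Refuse to signal our own process group: that would also
    # terminate the shell that is calling this function.
    [ "$target" != "$own" ] || return 2
    # A negative PID argument addresses the whole process group.
    kill -s "${2:-TERM}" -- "-$target"
}
```

Usage would be something like `kill_group "$pid"`, where `$pid` is one of the remaining load tasks shown by ps. SIGTERM is used by default because background jobs started by a non-interactive shell script normally ignore SIGINT, which may be part of why ctrl-c alone does not stop them.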
* Re: [Xenomai] Command line freeze during xeno-regression-test on omap4460
  2014-01-06 17:39 ` Gilles Chanteperdrix
@ 2014-01-07 7:23 ` Andreas Glatz
  2014-01-07 8:10 ` Andreas Glatz
  2014-04-04 10:27 ` Andreas Glatz
  1 sibling, 1 reply; 28+ messages in thread
From: Andreas Glatz @ 2014-01-07 7:23 UTC (permalink / raw)
  To: Gilles Chanteperdrix; +Cc: xenomai

Hi Gilles,

At first /tmp was a tmpfs, since I didn't want to wear out my flash with the testing. Now I have connected an external USB hard drive to the panda and mounted one of its partitions as /tmp. Additionally, I did not load xeno_klat and xeno_rtdmtest, which were both loaded before. I started

xeno-regression-test -l "/usr/lib/xenomai/testsuite/dohell -m /tmp -l /opt/ltp" -t 2

again (btw, I modified dohell to call runltp instead of runallscripts.sh, since the latter is deprecated). The good news is that it ran nearly all the tests (everything up to cgroup*) and 'top' was working all the way. However, this morning I noticed that it was still at the first of the cgroup* tests and had not gone any further. 'top' stopped working and I cannot open additional consoles. On the consoles that are still open (3 of them) I can execute simple commands like ps, grep, df, ... but nothing like reboot, top, shutdown, ... Surprisingly, cat /var/log/messages also gets blocked:

Console 1:
root@arm:~# cat /var/log/messages
^C # <-- notice: I tried to kill it here but no response

Console 2:
root@arm:/opt/ltp/results# ps ax
  PID TTY      STAT   TIME COMMAND
...
15149 pts/2    D+     0:00 cat /var/log/messages
...

Nothing else is running, though (neither on linux nor xenomai):

root@arm:/opt/ltp/results# cat /proc/xenomai/stat
CPU  PID  MSW  CSW     PF  STAT      %CPU   NAME
  0  0    0    986564  0   00500080  100.0  ROOT/0
  1  0    0    992712  0   00500080  100.0  ROOT/1
  1  0    0    309543  0   00000000    0.0  IRQ29: [timer]

To me this looks like a problem with the filesystem (maybe my SD flash card, where the rootfs resides).
I will try to install the rootfs on the external hard drive and repeat everything; maybe that will solve the problem.

I still have all three consoles open, of which just one is still responsive. Any further suggestions?

Thanks for any help,

A.

On Mon, Jan 6, 2014 at 5:39 PM, Gilles Chanteperdrix <gilles.chanteperdrix@xenomai.org> wrote:

> On 01/06/2014 04:30 PM, Andreas Glatz wrote:
>
>> Hi,
>>
>> I managed to produce a kernel (v3.8.13) with xenomai 2.6.3 ipipe patch and
>> rootfs (debian wheezy) with xenomai 2.6.3 libraries for my Pandaboard ES
>> (omap4460). The simple regression test, which only calls dd during the
>> switchtest, works fine. However the regression test with the linux test
>> project (ltp-full-20130904) scripts causes some sort of system lock up.
>> After that I only can ctrl-c xeno-regression-test (i.e. switchtest), which,
>> however, doesn't help to regain console access (neither over ethernet nor
>> serial).
>>
>> Here's what I did:
>>
>> -- Building --
>> As recommended in the Xenomai 2.6 readme I followed the instructions in [1]
>> to produce a kernel and filesystem.
>> [...]
>>
>> and "cat /proc/xenomai/*" produced [7].
>>
>> When I started the realistic xenomai regression test: xeno-regression-test
>> -l "/usr/lib/xenomai/testsuite/dohell -m /tmp -l /opt/ltp" -t 2 everything
>> seemed fine at first - I could logon and start top to inspect the running
>> processes. However, the command line (over serial and ethernet)
>> consistently freezes after a while (at different ltp tests though). First I
>> thought it's the massive system load which doesn't leave CPU for the
>> console... however ctrl-c of xeno-regression-test does not help to regain
>> console access...
>>
>
> That is because kill xeno-regression-test does not kill all the script
> children. So, basically, the load tasks are still running. Also, what
> filesystem is /tmp? dohell is using dd to alternatively write to /tmp, then
> erase the file. If /tmp is some flash, it will become slow after a while.
> If it is a tmpfs, it will eat RAM.
>
> --
> Gilles.
>

^ permalink raw reply [flat|nested] 28+ messages in thread
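
The `D+` state reported by ps for the stuck cat means uninterruptible sleep, typically a task waiting inside the kernel for I/O. As a minimal sketch, the helpers below list every such task together with the kernel function it is waiting in. They assume the procps version of ps (the `wchan:32` width syntax is procps-specific), and filter_blocked and list_blocked are hypothetical names chosen for illustration.

```shell
#!/bin/sh
# filter_blocked: keep the header line plus every row whose second
# column (STAT) starts with D, i.e. uninterruptible sleep.
filter_blocked() {
    awk 'NR == 1 || $2 ~ /^D/'
}

# list_blocked: show PID, state, kernel wait channel and command line
# of all blocked tasks. The WCHAN column may show "-" or a raw address
# if the kernel does not export symbol names.
list_blocked() {
    ps -eo pid,stat,wchan:32,args | filter_blocked
}
```

Running `list_blocked` on a console that still responds shows where the blocked tasks are waiting. If the kernel was built with CONFIG_MAGIC_SYSRQ, `echo w > /proc/sysrq-trigger` additionally asks the kernel to log stack traces of all blocked tasks, which can then be read with dmesg.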
* Re: [Xenomai] Command line freeze during xeno-regression-test on omap4460
  2014-01-07 7:23 ` Andreas Glatz
@ 2014-01-07 8:10 ` Andreas Glatz
  0 siblings, 0 replies; 28+ messages in thread
From: Andreas Glatz @ 2014-01-07 8:10 UTC (permalink / raw)
  To: Gilles Chanteperdrix; +Cc: xenomai

Hi Gilles,

On Tue, Jan 7, 2014 at 7:23 AM, Andreas Glatz <andi.glatz@gmail.com> wrote:

> Hi Gilles,
>
> At first /tmp was tmpfs since I didn't want to wear out my flash with the
> testing. Now I connected an external usb harddrive to the panda and mounted
> one of the harddrive partitions as /tmp. Additionally, I did not load
> xeno_klat and xeno_rtdmtest, which were both in before. I started
> xeno-regression-test -l "/usr/lib/xenomai/testsuite/dohell -m /tmp -l
> /opt/ltp" -t 2 again (btw I modified dohell to call runltp instead of
> runallscripts.sh since the latter is deprecated). And the good news was
> that it ran nearly all the tests (everything up to cgroup*) and 'top' was
> working all the way. However this morning I noticed that it's still at the
> first test of the cgroup* and didn't go any further. 'top' stopped working
> and I cannot open additional consoles. On the consoles that are still open
> (3 of them) I can execute simple commands like ps, grep, df, ... but
> nothing like reboot, top, shutdown, ... Surprisingly, cat /var/log/messages
> also gets blocked:
>
> Console 1:
> root@arm:~# cat /var/log/messages
> ^C # <-- notice: I tried to kill it here but no response
>
> Console 2:
> root@arm:/opt/ltp/results# ps ax
>   PID TTY      STAT   TIME COMMAND
> ...
> 15149 pts/2    D+     0:00 cat /var/log/messages
> ...
>
> Nothing else is running though (neither on linux nor xenomai) :
> root@arm:/opt/ltp/results# cat /proc/xenomai/stat
> CPU  PID  MSW  CSW     PF  STAT      %CPU   NAME
>   0  0    0    986564  0   00500080  100.0  ROOT/0
>   1  0    0    992712  0   00500080  100.0  ROOT/1
>   1  0    0    309543  0   00000000    0.0  IRQ29: [timer]
>
> To me this looks like a problem with the filesystem (maybe my sd flash
> card where the rootfs resides). I will try and install the rootfs on the
> external harddrive and repeat everything... maybe this might solve the
> problem.
>

First, sorry for the top-post :) Second, I also noticed that the status LED on the panda, which is triggered by mmc0, is constantly on after the failure, whereas after a reboot it only turns on while the flash partition is being accessed. So I guess that's a good indication that mmc0 might have something to do with it?

A.

>
> I still have all three consoles open, where just one is still responsive.
> Any further suggestions?
>
> Thanks for any help,
>
> A.
>
> On Mon, Jan 6, 2014 at 5:39 PM, Gilles Chanteperdrix <
> gilles.chanteperdrix@xenomai.org> wrote:
>
>> On 01/06/2014 04:30 PM, Andreas Glatz wrote:
>>
>>> Hi,
>>>
>>> I managed to produce a kernel (v3.8.13) with xenomai 2.6.3 ipipe patch and
>>> rootfs (debian wheezy) with xenomai 2.6.3 libraries for my Pandaboard ES
>>> (omap4460). The simple regression test, which only calls dd during the
>>> switchtest, works fine. However the regression test with the linux test
>>> project (ltp-full-20130904) scripts causes some sort of system lock up.
>>> After that I only can ctrl-c xeno-regression-test (i.e. switchtest), which,
>>> however, doesn't help to regain console access (neither over ethernet nor
>>> serial).
>>>
>>> Here's what I did:
>>>
>>> -- Building --
>>> As recommended in the Xenomai 2.6 readme I followed the instructions in [1]
>>> to produce a kernel and filesystem.
>>> [...]
>>> [ 479.008453] Xenomai: Switching rt_task to secondary mode after exception
>>> #0 from user-space at 0x9620 (pid 2145)
>>> [ 480.574462] Xenomai: watchdog triggered -- signaling runaway thread
>>> 'rt_task'
>>> [ 480.582061] [sched_delayed] sched: RT throttling activated
>>> [ 557.336425] Xenomai: Posix: closing message queue descriptor 3.
>>>
>>> and "cat /proc/xenomai/*" produced [7].
>>>
>>> When I started the realistic xenomai regression test: xeno-regression-test
>>> -l "/usr/lib/xenomai/testsuite/dohell -m /tmp -l /opt/ltp" -t 2 everything
>>> seemed fine at first - I could logon and start top to inspect the running
>>> processes. However, the command line (over serial and ethernet)
>>> consistently freezes after a while (at different ltp tests though). First I
>>> thought it's the massive system load which doesn't leave CPU for the
>>> console... however ctrl-c of xeno-regression-test does not help to regain
>>> console access...
>>>
>>
>> That is because kill xeno-regression-test does not kill all the script
>> children. So, basically, the load tasks are still running. Also, what
>> filesystem is /tmp? dohell is using dd to alternatively write to /tmp, then
>> erase the file. If /tmp is some flash, it will become slow after a while.
>> If it is a tmpfs, it will eat RAM.
>>
>> --
>> Gilles.
>>
>

^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [Xenomai] Command line freeze during xeno-regression-test on omap4460
  2014-01-06 17:39 ` Gilles Chanteperdrix
  2014-01-07 7:23 ` Andreas Glatz
@ 2014-04-04 10:27 ` Andreas Glatz
  2014-04-04 10:44 ` Gilles Chanteperdrix
  2014-04-04 11:00 ` Gilles Chanteperdrix
  1 sibling, 2 replies; 28+ messages in thread
From: Andreas Glatz @ 2014-04-04 10:27 UTC (permalink / raw)
  To: Gilles Chanteperdrix; +Cc: xenomai

Hi Gilles,

I'm finally back to my original problem below:

On 6 Jan 2014, at 17:39, Gilles Chanteperdrix wrote:

> On 01/06/2014 04:30 PM, Andreas Glatz wrote:
>> Hi,
>>
>> I managed to produce a kernel (v3.8.13) with xenomai 2.6.3 ipipe patch and
>> rootfs (debian wheezy) with xenomai 2.6.3 libraries for my Pandaboard ES
>> (omap4460). The simple regression test, which only calls dd during the
>> switchtest, works fine. However the regression test with the linux test
>> project (ltp-full-20130904) scripts causes some sort of system lock up.
>> After that I only can ctrl-c xeno-regression-test (i.e. switchtest), which,
>> however, doesn't help to regain console access (neither over ethernet nor
>> serial).
>>
>> Here's what I did:
>>
>> -- Building --
>> As recommended in the Xenomai 2.6 readme I followed the instructions in [1]
>> to produce a kernel and filesystem.
>> [...]
>> [ 479.008453] Xenomai: Switching rt_task to secondary mode after exception
>> #0 from user-space at 0x9620 (pid 2145)
>> [ 480.574462] Xenomai: watchdog triggered -- signaling runaway thread
>> 'rt_task'
>> [ 480.582061] [sched_delayed] sched: RT throttling activated
>> [ 557.336425] Xenomai: Posix: closing message queue descriptor 3.
>>
>> and "cat /proc/xenomai/*" produced [7].
>>
>> When I started the realistic xenomai regression test: xeno-regression-test
>> -l "/usr/lib/xenomai/testsuite/dohell -m /tmp -l /opt/ltp" -t 2 everything
>> seemed fine at first - I could logon and start top to inspect the running
>> processes. However, the command line (over serial and ethernet)
>> consistently freezes after a while (at different ltp tests though). First I
>> thought it's the massive system load which doesn't leave CPU for the
>> console... however ctrl-c of xeno-regression-test does not help to regain
>> console access...
>
> That is because kill xeno-regression-test does not kill all the
> script children. So, basically, the load tasks are still running.
> Also, what filesystem is /tmp? dohell is using dd to alternatively
> write to /tmp, then erase the file. If /tmp is some flash, it will
> become slow after a while. If it is a tmpfs, it will eat RAM.
>

The described problem is _very_ reproducible on my PandaBoard ES (omap4460), where I boot from an SD card partition and the rootfs is also on an SD card partition. I tried it with several kernel versions (3.8.13, 3.10.18, and 3.10.34) with the latest ipipe and xenomai from the git repos. Every time I start the regression test (see command above) the following happens: everything works fine until the switch/latency tests start. Then I see heavy access to the SD card, which is expected, as status LED 2 is blinking. After ~5 minutes this status LED is constantly on. That's when I know that everything is over.
On the console I can only execute commands that are already in RAM, such as ps, mount, ... However, if I try a simple 'touch new' it blocks forever, and I know that it blocks in the syscall where the file should be created, because I looked at it with strace.

I tried several things: I turned off CONFIG_PM (which was on by default), turned on the MMC debugging, and put extra printk's in the omap_hsmmc.c ISR. However, everything seems to work at this level: DMA requests are started and do finish, and the ISR is called regularly (at first I thought that Xenomai would starve it).

Have you ever run Xenomai on this _specific_ board (since everything is running smoothly on the omap5 board)?
Any more ideas on how to debug it?

Currently, I'm compiling in the ipipe tracer in the hope that it will tell me something useful...

Oh yes, the best bit is that the regression test works perfectly fine if I boot from an external USB HD _AND_ unmount (!) all MMC partitions.

Thanks,
A.

^ permalink raw reply [flat|nested] 28+ messages in thread
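
A lighter-weight complement to printk's in the ISR is to sample /proc/interrupts and see whether the MMC interrupt counter still advances once the LED stays on. The following is only a sketch: irq_count and irq_delta are hypothetical helper names, and the assumption that the controller shows up under a name containing "mmc0" should be checked against /proc/interrupts on the board.

```shell
#!/bin/sh
# irq_count NAME [FILE]
# Hypothetical helper: sum the per-CPU counters of every line in FILE
# (default /proc/interrupts) that contains NAME.
irq_count() {
    awk -v name="$1" '
        index($0, name) {
            # Field 1 is the IRQ label (e.g. "115:"); the per-CPU
            # counters follow and end at the first non-numeric field.
            for (i = 2; i <= NF && $i ~ /^[0-9]+$/; i++)
                total += $i
        }
        END { print total + 0 }
    ' "${2:-/proc/interrupts}"
}

# irq_delta NAME SECONDS
# Print how many matching interrupts arrived within SECONDS.
irq_delta() {
    before=$(irq_count "$1")
    sleep "$2"
    after=$(irq_count "$1")
    echo $((after - before))
}
```

For example, `irq_delta mmc0 5` prints the number of interrupts seen in five seconds. A count that keeps growing would agree with the observation that the ISR is still being called, and would point the search towards request completion or the block layer rather than interrupt delivery.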
* Re: [Xenomai] Command line freeze during xeno-regression-test on omap4460
  2014-04-04 10:27 ` Andreas Glatz
@ 2014-04-04 10:44 ` Gilles Chanteperdrix
  2014-04-04 11:19 ` Andreas Glatz
  2014-04-06 11:21 ` Andreas Glatz
  2014-04-04 11:00 ` Gilles Chanteperdrix
  1 sibling, 2 replies; 28+ messages in thread
From: Gilles Chanteperdrix @ 2014-04-04 10:44 UTC (permalink / raw)
  To: Andreas Glatz; +Cc: xenomai

On 04/04/2014 12:27 PM, Andreas Glatz wrote:
> Hi Gilles,
>
> I'm finally back to my original problem below:
>
> On 6 Jan 2014, at 17:39, Gilles Chanteperdrix wrote:
>
>> On 01/06/2014 04:30 PM, Andreas Glatz wrote:
>>> Hi,
>>>
>>> I managed to produce a kernel (v3.8.13) with xenomai 2.6.3 ipipe patch and
>>> rootfs (debian wheezy) with xenomai 2.6.3 libraries for my Pandaboard ES
>>> (omap4460). The simple regression test, which only calls dd during the
>>> switchtest, works fine. However the regression test with the linux test
>>> project (ltp-full-20130904) scripts causes some sort of system lock up.
>>> After that I only can ctrl-c xeno-regression-test (i.e. switchtest), which,
>>> however, doesn't help to regain console access (neither over ethernet nor
>>> serial).
>>>
>>> Here's what I did:
>>>
>>> -- Building --
>>> As recommended in the Xenomai 2.6 readme I followed the instructions in [1]
>>> to produce a kernel and filesystem.
>>> [...]
>>> [ 479.008453] Xenomai: Switching rt_task to secondary mode after exception
>>> #0 from user-space at 0x9620 (pid 2145)
>>> [ 480.574462] Xenomai: watchdog triggered -- signaling runaway thread
>>> 'rt_task'
>>> [ 480.582061] [sched_delayed] sched: RT throttling activated
>>> [ 557.336425] Xenomai: Posix: closing message queue descriptor 3.
>>>
>>> and "cat /proc/xenomai/*" produced [7].
>>>
>>> When I started the realistic xenomai regression test: xeno-regression-test
>>> -l "/usr/lib/xenomai/testsuite/dohell -m /tmp -l /opt/ltp" -t 2 everything
>>> seemed fine at first - I could logon and start top to inspect the running
>>> processes. However, the command line (over serial and ethernet)
>>> consistently freezes after a while (at different ltp tests though). First I
>>> thought it's the massive system load which doesn't leave CPU for the
>>> console... however ctrl-c of xeno-regression-test does not help to regain
>>> console access...
>>
>> That is because kill xeno-regression-test does not kill all the
>> script children. So, basically, the load tasks are still running.
>> Also, what filesystem is /tmp? dohell is using dd to alternatively
>> write to /tmp, then erase the file. If /tmp is some flash, it will
>> become slow after a while. If it is a tmpfs, it will eat RAM.
>>
>
> The described problem is _very_ reproducible on my PandaBoard ES
> (omap4460), where I boot from an SD card partition and the rootfs is
> also on the SD card partition. I tried it with several kernel versions
> (3.8.13, 3.10.18, and 3.10.34) with the latest ipipe and xenomai from
> the git repos. Every time I start the regression test (see command
> above) the following happens: Everything works fine until the switch/
> latency tests start. Then I see that there is heavy access to the SD
> card, which is expected, as the status LED 2 is blinking. After ~5mins
> this status LED is constantly on. That's when I know that everything
> is over.
On the console I can only execute commands that are already
> in RAM, such as the bash things like ps, mount, ... However, if I try
> a simple 'touch new' it blocks forever and I know that it blocks in
> the syscall where the file should be created, because I looked at it
> with strace. I tried several things: I turned off CONFIG_PM (which was
> on by default), turned on the MMC debugging, put extra printk's in the
> omap_hsmmc.c ISR. However, everything seems to work on this level: DMA
> requests are started and do finish, the ISR is called regularly (bc
> first I thought that Xenomai would starve it).
>
> Have you ever run Xenomai on this _specific_ board (since everything
> is running smoothly on the omap5 board)?
> Any more ideas how to debug it?
>
> Currently, I'm compiling the ipipe trace in hope that it would tell me
> something useful...
>
> Oh yes, the best bit is that the regression test works perfectly fine
> if I boot from an external USB HD _AND_ unmount (!) all MMC partitions.

So, the MMC driver has a problem. Have you tried:
- running the exact same kernel configuration only with CONFIG_XENOMAI
  disabled (and stress with dohell)
- then with CONFIG_XENOMAI and CONFIG_IPIPE disabled.

Also, do you have this patch in the tree you tried?
http://git.xenomai.org/ipipe.git/commit/?h=stable/ipipe-3.10.18&id=c26e7ad5679f9391cd8ea1db001bf301d2f6bc88

--
Gilles.

^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [Xenomai] Command line freeze during xeno-regression-test on omap4460 2014-04-04 10:44 ` Gilles Chanteperdrix @ 2014-04-04 11:19 ` Andreas Glatz 2014-04-04 11:21 ` Gilles Chanteperdrix 2014-04-06 11:21 ` Andreas Glatz 1 sibling, 1 reply; 28+ messages in thread From: Andreas Glatz @ 2014-04-04 11:19 UTC (permalink / raw) To: Gilles Chanteperdrix; +Cc: xenomai

On 4 Apr 2014, at 11:44, Gilles Chanteperdrix wrote:

> So, the MMC driver has a problem. Have you tried:
> - running the exact same kernel configuration only with CONFIG_XENOMAI
>   disabled (and stress with dohell)
> - then with CONFIG_XENOMAI and CONFIG_IPIPE disabled.
>
> Also, do you have this patch in the tree you tried?
> http://git.xenomai.org/ipipe.git/commit/?h=stable/ipipe-3.10.18&id=c26e7ad5679f9391cd8ea1db001bf301d2f6bc88

I did try the regression test without the switch/latency tests (i.e. only '/usr/lib/xenomai/testsuite/dohell -m /tmp -l /opt/ltp') and, as far as I recall, it finished successfully. I also couldn't find any error reports about the omap mmc driver on the kernel mailing list. The only thing I found was this patch [1], which I also applied. It didn't change a thing, though.

However, I'll try to run the tests you suggested on my shiny new 3.10.34 kernel. I built it last Monday after merging all the ipipe git stuff with CNelsons 3.18.14 kernel. I saw that the patch you mentioned is in the 3.10.18 tree. Shall I apply it to my kernel as well?

A.

[1] http://www.spinics.net/lists/linux-omap/msg104712.html

^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [Xenomai] Command line freeze during xeno-regression-test on omap4460 2014-04-04 11:19 ` Andreas Glatz @ 2014-04-04 11:21 ` Gilles Chanteperdrix 0 siblings, 0 replies; 28+ messages in thread From: Gilles Chanteperdrix @ 2014-04-04 11:21 UTC (permalink / raw) To: Andreas Glatz; +Cc: xenomai On 04/04/2014 01:19 PM, Andreas Glatz wrote: > I saw that the patch you mentioned is in > the 3.10.18 tree. Shall I apply it to my kernel as well? Yes, definitely. -- Gilles. ^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [Xenomai] Command line freeze during xeno-regression-test on omap4460 2014-04-04 10:44 ` Gilles Chanteperdrix 2014-04-04 11:19 ` Andreas Glatz @ 2014-04-06 11:21 ` Andreas Glatz 2014-04-06 14:44 ` Gilles Chanteperdrix 2014-04-06 15:54 ` Gilles Chanteperdrix 1 sibling, 2 replies; 28+ messages in thread From: Andreas Glatz @ 2014-04-06 11:21 UTC (permalink / raw) To: Gilles Chanteperdrix; +Cc: xenomai

On 4 Apr 2014, at 11:44, Gilles Chanteperdrix wrote:

> So, the MMC driver has a problem. Have you tried:
> - running the exact same kernel configuration only with CONFIG_XENOMAI
>   disabled (and stress with dohell)
> - then with CONFIG_XENOMAI and CONFIG_IPIPE disabled.
>
> Also, do you have this patch in the tree you tried?
> http://git.xenomai.org/ipipe.git/commit/?h=stable/ipipe-3.10.18&id=c26e7ad5679f9391cd8ea1db001bf301d2f6bc88

First I mounted tmpfs on /tmp so I don't wear out the SD card too much:

  mount -t tmpfs -osize=192M tmpfs /tmp

Then I used the following line to start the test (substitute MYTEST below with the following line):

  /usr/lib/xenomai/testsuite/dohell -m /tmp -l /opt/ltp

Note: I always monitored the test over wifi with 'top' so I also had some network load...

I got the following results with the 3.10.34 kernel, which includes everything up to the current ipipe-3.10 tag (it also included the patch you mentioned):

- xeno-regression-test "MYTEST" -> FAIL if booted from SD card (see description above); OK if booted from ext USB HD _AND_ no mmc partitions mounted
- CONFIG_IPIPE && CONFIG_XENOMAI && MYTEST -> FAIL (got status LED 2 constantly on as described above)
- CONFIG_IPIPE && MYTEST -> OK (see attached config file and ltp test log)

Anything else I should try?

A.

-------------- next part --------------
A non-text attachment was scrubbed...
Name: config_v3.10.34
Type: application/octet-stream
Size: 115686 bytes
Desc: not available
URL: <http://www.xenomai.org/pipermail/xenomai/attachments/20140406/b982a10e/attachment.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: LTP_RUN_ON-2014_Apr_05-16h_41m_09s.log
Type: application/octet-stream
Size: 64909 bytes
Desc: not available
URL: <http://www.xenomai.org/pipermail/xenomai/attachments/20140406/b982a10e/attachment-0001.obj>

^ permalink raw reply [flat|nested] 28+ messages in thread
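The results in the message above depend on which of CONFIG_IPIPE and CONFIG_XENOMAI was enabled in each kernel build. A minimal sketch of how one might double-check that from a kernel config file, assuming the usual `CONFIG_X=y` / `# CONFIG_X is not set` text format; the function name `config_flags` and the sample text are illustrative and not part of the thread:

```python
import re

def config_flags(config_text, names=("CONFIG_IPIPE", "CONFIG_XENOMAI")):
    """Return {option: bool} telling whether each option is enabled (y or m)."""
    flags = {name: False for name in names}
    for line in config_text.splitlines():
        # Enabled options look like "CONFIG_FOO=y"; disabled ones are comments.
        m = re.match(r"^(CONFIG_\w+)=(.*)$", line.strip())
        if m and m.group(1) in flags:
            flags[m.group(1)] = m.group(2) in ("y", "m")
    return flags

# Made-up sample resembling the "CONFIG_IPIPE && MYTEST" variant.
sample = """\
CONFIG_IPIPE=y
# CONFIG_XENOMAI is not set
CONFIG_PM=y
"""
print(config_flags(sample))
```

Feeding it the text of the config used for each build gives a quick record of which variant a given test log belongs to.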
* Re: [Xenomai] Command line freeze during xeno-regression-test on omap4460 2014-04-06 11:21 ` Andreas Glatz @ 2014-04-06 14:44 ` Gilles Chanteperdrix 2014-04-06 15:22 ` Andreas Glatz 2014-04-06 15:54 ` Gilles Chanteperdrix 1 sibling, 1 reply; 28+ messages in thread From: Gilles Chanteperdrix @ 2014-04-06 14:44 UTC (permalink / raw) To: Andreas Glatz; +Cc: xenomai

On 04/06/2014 01:21 PM, Andreas Glatz wrote:

> I got the following results with the 3.10.34 kernel, which includes
> everything up to the current ipipe-3.10 tag (it also included the
> patch you mentioned):
>
> - xeno-regression-test "MYTEST" -> FAIL if booted from SD card (see
>   description above); OK if booted from ext USB HD _AND_ no mmc
>   partitions mounted
> - CONFIG_IPIPE && CONFIG_XENOMAI && MYTEST -> FAIL (got status LED 2
>   constantly on as described above)
> - CONFIG_IPIPE && MYTEST -> OK (see attached config file and ltp test
>   log)
>
> Anything else I should try?

Is the current LTP test when the failure happens always the same?

--
Gilles.

^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [Xenomai] Command line freeze during xeno-regression-test on omap4460 2014-04-06 14:44 ` Gilles Chanteperdrix @ 2014-04-06 15:22 ` Andreas Glatz 2014-04-06 15:28 ` Gilles Chanteperdrix 0 siblings, 1 reply; 28+ messages in thread From: Andreas Glatz @ 2014-04-06 15:22 UTC (permalink / raw) To: Gilles Chanteperdrix; +Cc: xenomai On 6 Apr 2014, at 15:44, Gilles Chanteperdrix wrote: > On 04/06/2014 01:21 PM, Andreas Glatz wrote: >> >> On 4 Apr 2014, at 11:44, Gilles Chanteperdrix wrote: >> >>> On 04/04/2014 12:27 PM, Andreas Glatz wrote: >>>> Hi Gilles, >>>> >>>> I'm finally back to my original problem below: >>>> >>>> On 6 Jan 2014, at 17:39, Gilles Chanteperdrix wrote: >>>> >>>>> On 01/06/2014 04:30 PM, Andreas Glatz wrote: >>>>>> Hi, >>>>>> >>>>>> I managed to produce a kernel (v3.8.13) with xenomai 2.6.3 ipipe >>>>>> patch and >>>>>> rootfs (debian wheezy) with xenomai 2.6.3 libraries for my >>>>>> Pandaboard ES >>>>>> (omap4460). The simple regression test, which only calls dd >>>>>> during >>>>>> the >>>>>> switchtest, works fine. However the regression test with the >>>>>> linux >>>>>> test >>>>>> project (ltp-full-20130904) scripts causes some sort of system >>>>>> lock >>>>>> up. >>>>>> After that I only can ctrl-c xeno-regression-test (i.e. >>>>>> switchtest), which, >>>>>> however, doesn't help to regain console access (neigher over >>>>>> ethernet nor >>>>>> serial). >>>>>> >>>>>> Here's what I did: >>>>>> >>>>>> -- Building -- >>>>>> As recomended in the Xenomai 2.6 readme I followed the >>>>>> instructions >>>>>> in [1] >>>>>> to produce a kernel and filesystem. 
To get a xenomai kernel I had >>>>>> to do >>>>>> three things differently: >>>>>> >>>>>> *) I used: git checkout origin/v3.8.x -b tmp >>>>>> *) I applied ipipe-core-3.8.13-arm-3.patch from the xenomai-2.6 >>>>>> git >>>>>> tree as >>>>>> described in the Xenomai 2.6 readme >>>>>> *) I disabled KGDB and TIDSPBRIDGE since those produced compile >>>>>> errors (see >>>>>> config [2]) >>>>>> >>>>>> After a while I obtained the following messages from dmesg [3] >>>>>> and >>>>>> from the >>>>>> command prompt: >>>>>> >>>>>> root@arm:~# cat /proc/version >>>>>> Linux version 3.8.13-x3.6 (aglatz@linuxvbox) (gcc version 4.7.3 >>>>>> 20130328 >>>>>> (prerelease) (crosstool-NG linaro-1.13.1-4.7-2013.04-20130415 - >>>>>> Linaro GCC >>>>>> 2013.04) ) #4 SMP Sat Jan 4 15:54:20 GMT 2014 >>>>>> >>>>>> -- Testing Linux -- >>>>>> To see if everything works I downloaded and cross-compiled >>>>>> ltp-full-20130904 [4] with the same toolchain and flags (- >>>>>> march=armv7-a >>>>>> -mfpu=vfp3) as the xenomai libs and runtime. I started ltp with >>>>>> "./ >>>>>> runltp >>>>>> -p -l dohell-2014-01-06-1.log -S xenomai.skiplist" and after a >>>>>> while it >>>>>> finished with a few failed tests [5]. The console access, >>>>>> however, >>>>>> worked >>>>>> fine. >>>>>> >>>>>> -- Testing Xenomai -- >>>>>> First I sucessfully could run the simple xenomai regression test: >>>>>> xeno-regression-test -l "/usr/lib/xenomai/testsuite/dohell -m / >>>>>> tmp >>>>>> 100" -t >>>>>> 2 which produced the output in [6] and the following additional >>>>>> messages >>>>>> with dmesg: >>>>>> >>>>>> [ 476.215057] Xenomai: RTDM: closing file descriptor 1. >>>>>> [ 477.434936] Xenomai: Posix: destroying semaphore f0069c00. >>>>>> [ 477.440887] Xenomai: Posix: destroying mutex f0069a00. >>>>>> [ 477.475372] xnheap: destroying shared heap 'rt_heap: heap' >>>>>> with >>>>>> 16384 >>>>>> bytes still in use. 
>>>>>> [ 479.008453] Xenomai: Switching rt_task to secondary mode after >>>>>> exception >>>>>> #0 from user-space at 0x9620 (pid 2145) >>>>>> [ 480.574462] Xenomai: watchdog triggered -- signaling runaway >>>>>> thread >>>>>> 'rt_task' >>>>>> [ 480.582061] [sched_delayed] sched: RT throttling activated >>>>>> [ 557.336425] Xenomai: Posix: closing message queue descriptor >>>>>> 3. >>>>>> >>>>>> and "cat /proc/xenomai/*" produced [7]. >>>>>> >>>>>> When I started the realistic xenomai regression test: xeno- >>>>>> regression-test >>>>>> -l "/usr/lib/xenomai/testsuite/dohell -m /tmp -l /opt/ltp" -t 2 >>>>>> everything >>>>>> seemed fine at first - I could logon and start top to inspect the >>>>>> running >>>>>> processes. However, the command line (over serial and ethernet) >>>>>> consistently freezes after a while (at different ltp tests >>>>>> though). >>>>>> First I >>>>>> thought it's the massive system load which doesn't leave CPU for >>>>>> the >>>>>> console... however ctrl-c of xeno-regression-test does not help >>>>>> to >>>>>> regain >>>>>> console access... >>>>> >>>>> That is because kill xeno-regression-test does not kill all the >>>>> script children. So, basically, the load tasks are still running. >>>>> Also, what filesystem is /tmp? dohell is using dd to alternatively >>>>> write to /tmp, then erase the file. If /tmp is some flash, it will >>>>> become slow after a while. If it is a tmpfs, it will eat RAM. >>>>> >>>>> >>>> >>>> The described problem is _very_ reproducible on my PandaBoard ES >>>> (omap4460), where I boot from an SD card partition and the rootfs >>>> is >>>> also on the SD card partition. I tried it with several kernel >>>> versions >>>> (3.8.13, 3.10.18, and 3.10.34) with the latest ipipe and xenomai >>>> from >>>> git the git repos. Everytime I start the regression test (see >>>> command >>>> above) the following happens: Everything works fine until the >>>> switch/ >>>> latency tests start. 
Then I see that there is heavy access to the >>>> SD >>>> card, which is expected, as the status LED 2 is blinking. After >>>> ~5mins >>>> this status LED is constantly on. That's when I know that >>>> everything >>>> is over. On the console I can only execute commands that are >>>> already >>>> in RAM, such as the bash things like ps, mount, ... However, if I >>>> try >>>> a simple 'touch new' it blocks forever and I know that it blocks in >>>> the syscall where the file should be created, because I looked at >>>> it >>>> with strace. I tried several things: I turned off CONFIG_PM (which >>>> was >>>> on by default), turned on the MMC debugging, put extra prink's in >>>> the >>>> omap_hsmmc.c ISR. However, everything seems to work on this level: >>>> DMA >>>> requests are started and do finish, the ISR is called regularly (bc >>>> first I though that Xenomai would starve it). >>>> >>>> Have you every run Xenonmai on this _specific_ board (since >>>> everything >>>> is running smoothly on the omap5 board)? >>>> Any more ideas how to debug it? >>>> >>>> Currently, I'm compiling the ipipe trace in hope that it would tell >>>> me >>>> something useful... >>>> >>>> Oh yes, the best bit is that the regression test works perfectly >>>> fine >>>> if I boot from an external USB HD _AND_ unmount (!) all MMC >>>> partitions. >>> >>> So, the MMC driver has a problem. Have you tried: >>> - running the exact same kernel configuration only with >>> CONFIG_XENOMAI >>> disabled (and stress with dohell) >>> - then with CONFIG_XENOMAI and CONFIG_IPIPE disabled. >>> >>> Also, do you have this patch in the tree you tried? 
>>> http://git.xenomai.org/ipipe.git/commit/?h=stable/ipipe-3.10.18&id=c26e7ad5679f9391cd8ea1db001bf301d2f6bc88
>>>
>>
>> First I mounted tmpfs on /tmp so I don't wear out the SD card too
>> much:
>> mount -t tmpfs -osize=192M tmpfs /tmp
>>
>> Then I used the following line to start the test (substitute MYTEST
>> below with the following line):
>> /usr/lib/xenomai/testsuite/dohell -m /tmp -l /opt/ltp
>>
>> Note: I always monitored the test over wifi with 'top' so I also had
>> some network load...
>>
>> I got the following results with the 3.10.34 kernel, which includes
>> everything up to the current ipipe-3.10 tag (it also included the
>> patch you mentioned):
>>
>> - xeno-regression-test "MYTEST" -> FAIL if booted from SD card (see
>>   description above); OK if booted from ext USB HD _AND_ no mmc
>>   partitions mounted
>> - CONFIG_IPIPE && CONFIG_XENOMAI && MYTEST -> FAIL (got status LED 2
>>   constantly on as described above)
>> - CONFIG_IPIPE && MYTEST -> OK (see attached config file and ltp test
>>   log)
>>
>> Anything else I should try?
>
> Is the current LTP test when the failure happens always the same?
>

I went through all the logfiles on my pandaboard and identified the
last tests that ltp logged before the error occurred (I'm assuming that
ltp writes to the file in /opt/ltp/results after completing the test
since there is the PASS/FAIL note as well, which logically should only
be available after completing the test):

test                 count
==========================
rt_sigqueueinfo01        1
clock_nanosleep01       10
munmap02                 1
semget06                 1
epoll_create1_01         5
splice01                 1
clock_getres01           1
rename13                 1
BindMounts               1
utimes01                 1

So it seems that the test after 'clock_nanosleep01', which is 'clone01'
according to the LTP log file I sent you, is the prime hotspot of
failure, followed by 'epoll01', which comes after 'epoll_create1_01'.

I'm using the standard LTP version 'ltp-full-20130904', which I
downloaded and compiled on the target with gcc 4.6.3 (default debian
wheezy).

A.
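The per-test tally above was compiled by going through the result logs by hand. A minimal shell sketch of the same count follows, assuming each log holds one test per line with the test name in the first field and a result keyword (PASS, FAIL, CONF or BROK) in the second. The function name `last_logged_tests` and the log path in the usage line are hypothetical. Since LTP appears to write an entry only after a test completes, the test that was running at the freeze is the one following each name reported here.

```shell
# Sketch: count how often each LTP test was the last entry in a result log.
# Assumption: one test per line, "<name> <PASS|FAIL|CONF|BROK> <exit value>".
last_logged_tests() {
    for log in "$@"; do
        # Remember the most recent line carrying a result keyword and
        # print its test name once the whole log has been read.
        awk '$2 ~ /^(PASS|FAIL|CONF|BROK)$/ { name = $1 }
             END { if (name != "") print name }' "$log"
    done | sort | uniq -c | sort -rn
}

# Usage (hypothetical path):
#   last_logged_tests /opt/ltp/results/*.log
```

The output lists one line per test name, most frequent first, in the `uniq -c` layout of count followed by name.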
* Re: [Xenomai] Command line freeze during xeno-regression-test on omap4460
From: Gilles Chanteperdrix @ 2014-04-06 15:28 UTC
To: Andreas Glatz; +Cc: xenomai

On 04/06/2014 05:22 PM, Andreas Glatz wrote:
> [...]
> I went through all the logfiles on my pandaboard and identified
> the last tests that ltp logged before the error occurred (I'm assuming
> that ltp writes to the file in /opt/ltp/results after completing the
> test since there is the PASS/FAIL note as well, which logically should
> only be available after completing the test):
>
> test                 count
> ==========================
> rt_sigqueueinfo01        1
> clock_nanosleep01       10
> munmap02                 1
> semget06                 1
> epoll_create1_01         5
> splice01                 1
> clock_getres01           1
> rename13                 1
> BindMounts               1
> utimes01                 1
>
> So it seems that the test after 'clock_nanosleep01', which is
> 'clone01' according to the LTP log file I sent you, seems to be the
> prime hotspot of failure followed by 'epoll01', which comes after
> 'epoll_create1_01'.
>
> I'm using the standard LTP version 'ltp-full-20130904', which I
> downloaded and compiled on the target with gcc 4.6.3 (default debian
> wheezy).

Ok. I am not sure it is meaningful. Anyway, the only difference between
CONFIG_XENOMAI + CONFIG_IPIPE and CONFIG_IPIPE alone, provided that you
are not running any program using Xenomai, is the host tick emulation.

So, could you please try to turn off
CONFIG_NO_HZ_IDLE
CONFIG_NO_HZ
CONFIG_HIGH_RES_TIMERS

And see if it works better?

-- 
Gilles.
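Whether the three options are really off in the kernel that was booted can be checked against its config file. A minimal sketch, assuming a plain-text kernel .config is available (for example extracted from /proc/config.gz, which exists only if the kernel was built with CONFIG_IKCONFIG_PROC). The function name `check_timer_options` is made up for illustration.

```shell
# Sketch: report whether the three timer options are enabled in a kernel
# config file. An option counts as enabled only if "<OPTION>=y" is present;
# "# <OPTION> is not set" or a missing line both count as disabled.
check_timer_options() {
    config_file="$1"
    for opt in CONFIG_NO_HZ_IDLE CONFIG_NO_HZ CONFIG_HIGH_RES_TIMERS; do
        if grep -q "^${opt}=y" "$config_file"; then
            echo "$opt: enabled"
        else
            echo "$opt: disabled"
        fi
    done
}

# Usage (assumes CONFIG_IKCONFIG_PROC on the running kernel):
#   zcat /proc/config.gz > /tmp/running.config
#   check_timer_options /tmp/running.config
```

Anchoring the pattern at the start of the line and requiring `=y` keeps CONFIG_NO_HZ from matching the CONFIG_NO_HZ_IDLE line.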
* Re: [Xenomai] Command line freeze during xeno-regression-test on omap4460
From: Andreas Glatz @ 2014-04-06 20:57 UTC
To: Gilles Chanteperdrix; +Cc: xenomai

On 6 Apr 2014, at 16:28, Gilles Chanteperdrix wrote:

> [...]
> Ok. I am not sure it is meaningful. Anyway, the only difference
> between CONFIG_XENOMAI + CONFIG_IPIPE and CONFIG_IPIPE alone, provided
> that you are not running any program using Xenomai, is the host tick
> emulation.
>
> So, could you please try to turn off
> CONFIG_NO_HZ_IDLE
> CONFIG_NO_HZ
> CONFIG_HIGH_RES_TIMERS
>
> And see if it works better?
>

As I wrote before, I recompiled the kernel with your timer options and
CONFIG_XENOMAI, installed it, synced it and rebooted after cutting the
power to the board for ~10secs.

It seems with those options it got much further with the tests.
However, eventually all ssh connections broke up and the last messages
on the console, where I started dohell, were:

[...]
102400000 bytes (102 MB) copied, 2.97674 s, 34.4 MB/s
100+0 records in
100+0 records out
102400000 bytes (102 MB) copied, 1.97433 s, 51.9 MB/s
100+0 records in
100+0 records out
102400000 bytes (102 MB) copied, 2.68371 s, 38.2 MB/s
100+0 records in
100+0 records out
102400000 bytes (102 MB) copied, 2.57073 s, 39.8 MB/s
dd: writing `/tmp/bigfile': No space left on device
7+0 records in
6+0 records out
6164480 bytes (6.2 MB) copied, 0.189001 s, 32.6 MB/s
/usr/lib/xenomai/testsuite/dohell: 62: /usr/lib/xenomai/testsuite/dohell: Cannot fork
Write failed: Host is down

... and as usual status LED 2 is permanently on.

As you suspect there's something wrong with the timer subsystem, I
looked around a bit at what extra patches went into the 3.10.14 kernel
of RobertCNelson, which I used as a base to merge the ipipe git tree.
Here is the list:

0001-panda-fix-wl12xx-regulator.patch
0002-ti-st-st-kim-fixing-firmware-path.patch
0003-Panda-expansion-add-spidev.patch
0004-HACK-PandaES-disable-cpufreq-so-board-will-boot.patch
0005-HACK-panda-enable-OMAP4_ERRATA_I688.patch
0006-ARM-hw_breakpoint-Enable-debug-powerdown-only-if-sys.patch
0007-Revert-regulator-twl-Remove-TWL6030_FIXED_RESOURCE.patch
0008-Revert-regulator-twl-Remove-another-unused-variable-.patch
0009-Revert-regulator-twl-Remove-references-to-the-twl403.patch
0010-Revert-regulator-twl-Remove-references-to-32kHz-cloc.patch
0011-panda-spidev-setup-pinmux.patch

Do you think those may have something to do with it?

A.
* Re: [Xenomai] Command line freeze during xeno-regression-test on omap4460 2014-04-06 20:57 ` Andreas Glatz @ 2014-04-06 21:04 ` Gilles Chanteperdrix 2014-04-07 10:18 ` Andreas Glatz 0 siblings, 1 reply; 28+ messages in thread From: Gilles Chanteperdrix @ 2014-04-06 21:04 UTC (permalink / raw) To: Andreas Glatz; +Cc: xenomai On 04/06/2014 10:57 PM, Andreas Glatz wrote: > > On 6 Apr 2014, at 16:28, Gilles Chanteperdrix wrote: > >> On 04/06/2014 05:22 PM, Andreas Glatz wrote: >>> >>> On 6 Apr 2014, at 15:44, Gilles Chanteperdrix wrote: >>> >>>> On 04/06/2014 01:21 PM, Andreas Glatz wrote: >>>>> >>>>> On 4 Apr 2014, at 11:44, Gilles Chanteperdrix wrote: >>>>> >>>>>> On 04/04/2014 12:27 PM, Andreas Glatz wrote: >>>>>>> Hi Gilles, >>>>>>> >>>>>>> I'm finally back to my original problem below: >>>>>>> >>>>>>> On 6 Jan 2014, at 17:39, Gilles Chanteperdrix wrote: >>>>>>> >>>>>>>> On 01/06/2014 04:30 PM, Andreas Glatz wrote: >>>>>>>>> Hi, >>>>>>>>> >>>>>>>>> I managed to produce a kernel (v3.8.13) with xenomai 2.6.3 >>>>>>>>> ipipe >>>>>>>>> patch and >>>>>>>>> rootfs (debian wheezy) with xenomai 2.6.3 libraries for my >>>>>>>>> Pandaboard ES >>>>>>>>> (omap4460). The simple regression test, which only calls dd >>>>>>>>> during >>>>>>>>> the >>>>>>>>> switchtest, works fine. However the regression test with the >>>>>>>>> linux >>>>>>>>> test >>>>>>>>> project (ltp-full-20130904) scripts causes some sort of system >>>>>>>>> lock >>>>>>>>> up. >>>>>>>>> After that I only can ctrl-c xeno-regression-test (i.e. >>>>>>>>> switchtest), which, >>>>>>>>> however, doesn't help to regain console access (neigher over >>>>>>>>> ethernet nor >>>>>>>>> serial). >>>>>>>>> >>>>>>>>> Here's what I did: >>>>>>>>> >>>>>>>>> -- Building -- >>>>>>>>> As recomended in the Xenomai 2.6 readme I followed the >>>>>>>>> instructions >>>>>>>>> in [1] >>>>>>>>> to produce a kernel and filesystem. 
To get a xenomai kernel I >>>>>>>>> had >>>>>>>>> to do >>>>>>>>> three things differently: >>>>>>>>> >>>>>>>>> *) I used: git checkout origin/v3.8.x -b tmp >>>>>>>>> *) I applied ipipe-core-3.8.13-arm-3.patch from the xenomai-2.6 >>>>>>>>> git >>>>>>>>> tree as >>>>>>>>> described in the Xenomai 2.6 readme >>>>>>>>> *) I disabled KGDB and TIDSPBRIDGE since those produced compile >>>>>>>>> errors (see >>>>>>>>> config [2]) >>>>>>>>> >>>>>>>>> After a while I obtained the following messages from dmesg [3] >>>>>>>>> and >>>>>>>>> from the >>>>>>>>> command prompt: >>>>>>>>> >>>>>>>>> root@arm:~# cat /proc/version >>>>>>>>> Linux version 3.8.13-x3.6 (aglatz@linuxvbox) (gcc version 4.7.3 >>>>>>>>> 20130328 >>>>>>>>> (prerelease) (crosstool-NG linaro-1.13.1-4.7-2013.04-20130415 - >>>>>>>>> Linaro GCC >>>>>>>>> 2013.04) ) #4 SMP Sat Jan 4 15:54:20 GMT 2014 >>>>>>>>> >>>>>>>>> -- Testing Linux -- >>>>>>>>> To see if everything works I downloaded and cross-compiled >>>>>>>>> ltp-full-20130904 [4] with the same toolchain and flags (- >>>>>>>>> march=armv7-a >>>>>>>>> -mfpu=vfp3) as the xenomai libs and runtime. I started ltp with >>>>>>>>> "./ >>>>>>>>> runltp >>>>>>>>> -p -l dohell-2014-01-06-1.log -S xenomai.skiplist" and after a >>>>>>>>> while it >>>>>>>>> finished with a few failed tests [5]. The console access, >>>>>>>>> however, >>>>>>>>> worked >>>>>>>>> fine. >>>>>>>>> >>>>>>>>> -- Testing Xenomai -- >>>>>>>>> First I sucessfully could run the simple xenomai regression >>>>>>>>> test: >>>>>>>>> xeno-regression-test -l "/usr/lib/xenomai/testsuite/dohell -m / >>>>>>>>> tmp >>>>>>>>> 100" -t >>>>>>>>> 2 which produced the output in [6] and the following additional >>>>>>>>> messages >>>>>>>>> with dmesg: >>>>>>>>> >>>>>>>>> [ 476.215057] Xenomai: RTDM: closing file descriptor 1. >>>>>>>>> [ 477.434936] Xenomai: Posix: destroying semaphore f0069c00. >>>>>>>>> [ 477.440887] Xenomai: Posix: destroying mutex f0069a00. 
>>>>>>>>> [ 477.475372] xnheap: destroying shared heap 'rt_heap: heap' >>>>>>>>> with >>>>>>>>> 16384 >>>>>>>>> bytes still in use. >>>>>>>>> [ 479.008453] Xenomai: Switching rt_task to secondary mode >>>>>>>>> after >>>>>>>>> exception >>>>>>>>> #0 from user-space at 0x9620 (pid 2145) >>>>>>>>> [ 480.574462] Xenomai: watchdog triggered -- signaling runaway >>>>>>>>> thread >>>>>>>>> 'rt_task' >>>>>>>>> [ 480.582061] [sched_delayed] sched: RT throttling activated >>>>>>>>> [ 557.336425] Xenomai: Posix: closing message queue descriptor >>>>>>>>> 3. >>>>>>>>> >>>>>>>>> and "cat /proc/xenomai/*" produced [7]. >>>>>>>>> >>>>>>>>> When I started the realistic xenomai regression test: xeno- >>>>>>>>> regression-test >>>>>>>>> -l "/usr/lib/xenomai/testsuite/dohell -m /tmp -l /opt/ltp" -t 2 >>>>>>>>> everything >>>>>>>>> seemed fine at first - I could logon and start top to inspect >>>>>>>>> the >>>>>>>>> running >>>>>>>>> processes. However, the command line (over serial and ethernet) >>>>>>>>> consistently freezes after a while (at different ltp tests >>>>>>>>> though). >>>>>>>>> First I >>>>>>>>> thought it's the massive system load which doesn't leave CPU >>>>>>>>> for >>>>>>>>> the >>>>>>>>> console... however ctrl-c of xeno-regression-test does not help >>>>>>>>> to >>>>>>>>> regain >>>>>>>>> console access... >>>>>>>> >>>>>>>> That is because kill xeno-regression-test does not kill all the >>>>>>>> script children. So, basically, the load tasks are still >>>>>>>> running. >>>>>>>> Also, what filesystem is /tmp? dohell is using dd to >>>>>>>> alternatively >>>>>>>> write to /tmp, then erase the file. If /tmp is some flash, it >>>>>>>> will >>>>>>>> become slow after a while. If it is a tmpfs, it will eat RAM. >>>>>>>> >>>>>>>> >>>>>>> >>>>>>> The described problem is _very_ reproducible on my PandaBoard ES >>>>>>> (omap4460), where I boot from an SD card partition and the rootfs >>>>>>> is >>>>>>> also on the SD card partition. 
I tried it with several kernel >>>>>>> versions >>>>>>> (3.8.13, 3.10.18, and 3.10.34) with the latest ipipe and xenomai >>>>>>> from >>>>>>> git the git repos. Everytime I start the regression test (see >>>>>>> command >>>>>>> above) the following happens: Everything works fine until the >>>>>>> switch/ >>>>>>> latency tests start. Then I see that there is heavy access to the >>>>>>> SD >>>>>>> card, which is expected, as the status LED 2 is blinking. After >>>>>>> ~5mins >>>>>>> this status LED is constantly on. That's when I know that >>>>>>> everything >>>>>>> is over. On the console I can only execute commands that are >>>>>>> already >>>>>>> in RAM, such as the bash things like ps, mount, ... However, if I >>>>>>> try >>>>>>> a simple 'touch new' it blocks forever and I know that it >>>>>>> blocks in >>>>>>> the syscall where the file should be created, because I looked at >>>>>>> it >>>>>>> with strace. I tried several things: I turned off CONFIG_PM >>>>>>> (which >>>>>>> was >>>>>>> on by default), turned on the MMC debugging, put extra prink's in >>>>>>> the >>>>>>> omap_hsmmc.c ISR. However, everything seems to work on this >>>>>>> level: >>>>>>> DMA >>>>>>> requests are started and do finish, the ISR is called regularly >>>>>>> (bc >>>>>>> first I though that Xenomai would starve it). >>>>>>> >>>>>>> Have you every run Xenonmai on this _specific_ board (since >>>>>>> everything >>>>>>> is running smoothly on the omap5 board)? >>>>>>> Any more ideas how to debug it? >>>>>>> >>>>>>> Currently, I'm compiling the ipipe trace in hope that it would >>>>>>> tell >>>>>>> me >>>>>>> something useful... >>>>>>> >>>>>>> Oh yes, the best bit is that the regression test works perfectly >>>>>>> fine >>>>>>> if I boot from an external USB HD _AND_ unmount (!) all MMC >>>>>>> partitions. >>>>>> >>>>>> So, the MMC driver has a problem. 
Have you tried: >>>>>> - running the exact same kernel configuration only with >>>>>> CONFIG_XENOMAI >>>>>> disabled (and stress with dohell) >>>>>> - then with CONFIG_XENOMAI and CONFIG_IPIPE disabled. >>>>>> >>>>>> Also, do you have this patch in the tree you tried? >>>>>> http://git.xenomai.org/ipipe.git/commit/?h=stable/ipipe-3.10.18&id=c26e7ad5679f9391cd8ea1db001bf301d2f6bc88 >>>>>> >>>>> >>>>> First i mounted tmpfs on /tmp so I don't wear out the SD card too >>>>> much: >>>>> mount -t tmpfs -osize=192M tmpfs /tmp >>>>> >>>>> Then I used the following line to start the test (substitute MYTEST >>>>> below with the following line): >>>>> /usr/lib/xenomai/testsuite/dohell -m /tmp -l /opt/ltp >>>>> >>>>> Note: I always monitored the test over wifi with 'top' so I also >>>>> had >>>>> some network load... >>>>> >>>>> I got the following results with the 3.10.34 kernel, which includes >>>>> everything up to the current ipipe-3.10 tag (it also included the >>>>> patch you mentioned): >>>>> >>>>> - xeno-regression-test "MYTEST" -> FAIL if booted from SD card (see >>>>> description above); OK if booted from ext USB HD _AND_ no mmc >>>>> partitions mounted >>>>> - CONFIG_IPIPE && CONFIG_XENOMAI && MYTEST -> FAIL (got status >>>>> LED 2 >>>>> constantly on as described above) >>>>> - CONFIG_IPIPE && MYTEST -> OK (see attached config file and ltp >>>>> test >>>>> log) >>>>> >>>>> Anything else I should try? >>>> >>>> Is the current LTP test when the failure happens always the same? 
>>>> >>>> >>> >>> I went through all the logfiles on my pandaboard and and identified >>> the last tests that ltp logged before the error occurred (I'm >>> assuming >>> that ltp writes to the file in /opt/ltp/results after completing the >>> test since there is the PASS/FAIL note as well, which logically >>> should >>> only be available after completing the test): >>> >>> test count >>> ======================== >>> rt_sigqueueinfo01 1 >>> clock_nanosleep01 10 >>> munmap02 1 >>> semget06 1 >>> epoll_create1_01 5 >>> splice01 1 >>> clock_getres01 1 >>> rename13 1 >>> BindMounts 1 >>> utimes01 1 >>> >>> So it seems that the test after 'clock_nanosleep01', which is >>> 'clone01' according to the LTP log file I sent you, seems to be the >>> prime hotspot of failure followed by 'epoll01', which comes after >>> 'epoll_create1_01'. >>> >>> I'm using the standard LTP version 'ltp-full-20130904', which I >>> downloaded and compiled on the target with gcc 4.6.3 (default debian >>> wheezy). >> >> Ok. I am not sure it is meaningful. Anyway, the only difference >> between >> CONFIG_XENOMAI + CONFIG_IPIPE and CONFIG_IPIPE alone, provided that >> you >> are not running any program using Xenomai, is the host tick emulation. >> >> So, could you please try to turn off >> CONFIG_NO_HZ_IDLE >> CONFIG_NO_HZ >> CONFIG_HIGH_RES_TIMERS >> >> And see if it works better? >> > > As I wrote before, I recompiled the Kernel with your timer options and > CONFIG_XENOMAI, installed it, synced it and rebooted after cutting the > power to the board for ~10secs. > > It seems with those options it got much further with the tests. > However, eventually all ssh connections broke up and the last messages > on the console, where I started do hell were: > > [...] 
> 102400000 bytes (102 MB) copied, 2.97674 s, 34.4 MB/s
> 100+0 records in
> 100+0 records out
> 102400000 bytes (102 MB) copied, 1.97433 s, 51.9 MB/s
> 100+0 records in
> 100+0 records out
> 102400000 bytes (102 MB) copied, 2.68371 s, 38.2 MB/s
> 100+0 records in
> 100+0 records out
> 102400000 bytes (102 MB) copied, 2.57073 s, 39.8 MB/s
> dd: writing `/tmp/bigfile': No space left on device
> 7+0 records in
> 6+0 records out
> 6164480 bytes (6.2 MB) copied, 0.189001 s, 32.6 MB/s
> /usr/lib/xenomai/testsuite/dohell: 62: /usr/lib/xenomai/testsuite/dohell: Cannot fork

This may simply be due to some LTP test which forks a lot and prevents
the system from being able to fork. This should be a temporary situation.

> Write failed: Host is down
>
> ... and as usuall status LED 2 is permanently on.
>
> As u suspect there's something wrong with the timer subsystem I looked
> around a bit what extra patches went into the 3.10.14 kernel of
> RobertCNelson, which I used as a base to merge the ipipe git tree.
> Here is the list:
>
> 0001-panda-fix-wl12xx-regulator.patch
> 0002-ti-st-st-kim-fixing-firmware-path.patch
> 0003-Panda-expansion-add-spidev.patch
> 0004-HACK-PandaES-disable-cpufreq-so-board-will-boot.patch
> 0005-HACK-panda-enable-OMAP4_ERRATA_I688.patch
> 0006-ARM-hw_breakpoint-Enable-debug-powerdown-only-if-sys.patch
> 0007-Revert-regulator-twl-Remove-TWL6030_FIXED_RESOURCE.patch
> 0008-Revert-regulator-twl-Remove-another-unused-variable-.patch
> 0009-Revert-regulator-twl-Remove-references-to-the-twl403.patch
> 0010-Revert-regulator-twl-Remove-references-to-32kHz-cloc.patch
> 0011-panda-spidev-setup-pinmux.patch
>
> Do you think those may have something to do with it?

I do not think so. When the LED is still on, can you use the serial
console to run cat /proc/interrupts to see if the timer is still ticking?

--
Gilles.

^ permalink raw reply [flat|nested] 28+ messages in thread
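The /proc/interrupts check suggested above boils down to reading the file twice and comparing the counters. A minimal sketch of that comparison follows, in case doing it by eye over a slow serial line is awkward. The function names (parse_interrupts, advancing, report) and the default pattern "timer|twd" are illustrative assumptions; the thread does not say how the timer interrupt is labelled on this board, so the pattern would have to be adjusted to whatever /proc/interrupts actually shows. Since only commands already in RAM still run once the LED stays on, the script would probably have to be started before the hang.

```python
import re
import time


def parse_interrupts(text):
    """Parse /proc/interrupts text into {label: (total_count, description)}."""
    lines = text.splitlines()
    if not lines:
        return {}
    ncpu = len(lines[0].split())  # header row: CPU0 CPU1 ...
    counts = {}
    for line in lines[1:]:
        if ":" not in line:
            continue
        label, rest = line.split(":", 1)
        fields = rest.split()
        per_cpu = []
        for field in fields[:ncpu]:
            if not field.isdigit():
                break
            per_cpu.append(int(field))
        description = " ".join(fields[len(per_cpu):])
        counts[label.strip()] = (sum(per_cpu), description)
    return counts


def advancing(before, after, pattern):
    """Return {label: delta} for lines whose label or description matches pattern."""
    rx = re.compile(pattern, re.IGNORECASE)
    deltas = {}
    for label, (total, description) in after.items():
        if rx.search(label) or rx.search(description):
            deltas[label] = total - before.get(label, (0, ""))[0]
    return deltas


def report(interval=2.0, pattern=r"timer|twd", path="/proc/interrupts"):
    """Sample the file twice and print how far the matching counters moved.

    The default pattern is a guess, not taken from the thread.
    """
    with open(path) as f:
        first = parse_interrupts(f.read())
    time.sleep(interval)
    with open(path) as f:
        second = parse_interrupts(f.read())
    for label, delta in sorted(advancing(first, second, pattern).items()):
        print("%s: +%d" % (label, delta))
```

Calling report() prints one line per matching interrupt; a delta of 0 over a couple of seconds would suggest that the corresponding timer has stopped ticking.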
* Re: [Xenomai] Command line freeze during xeno-regression-test on omap4460 2014-04-06 21:04 ` Gilles Chanteperdrix @ 2014-04-07 10:18 ` Andreas Glatz 2014-04-07 10:52 ` Gilles Chanteperdrix 0 siblings, 1 reply; 28+ messages in thread From: Andreas Glatz @ 2014-04-07 10:18 UTC (permalink / raw) To: Gilles Chanteperdrix; +Cc: xenomai On 6 Apr 2014, at 22:04, Gilles Chanteperdrix wrote: > On 04/06/2014 10:57 PM, Andreas Glatz wrote: >> >> On 6 Apr 2014, at 16:28, Gilles Chanteperdrix wrote: >> >>> On 04/06/2014 05:22 PM, Andreas Glatz wrote: >>>> >>>> On 6 Apr 2014, at 15:44, Gilles Chanteperdrix wrote: >>>> >>>>> On 04/06/2014 01:21 PM, Andreas Glatz wrote: >>>>>> >>>>>> On 4 Apr 2014, at 11:44, Gilles Chanteperdrix wrote: >>>>>> >>>>>>> On 04/04/2014 12:27 PM, Andreas Glatz wrote: >>>>>>>> Hi Gilles, >>>>>>>> >>>>>>>> I'm finally back to my original problem below: >>>>>>>> >>>>>>>> On 6 Jan 2014, at 17:39, Gilles Chanteperdrix wrote: >>>>>>>> >>>>>>>>> On 01/06/2014 04:30 PM, Andreas Glatz wrote: >>>>>>>>>> Hi, >>>>>>>>>> >>>>>>>>>> I managed to produce a kernel (v3.8.13) with xenomai 2.6.3 >>>>>>>>>> ipipe >>>>>>>>>> patch and >>>>>>>>>> rootfs (debian wheezy) with xenomai 2.6.3 libraries for my >>>>>>>>>> Pandaboard ES >>>>>>>>>> (omap4460). The simple regression test, which only calls dd >>>>>>>>>> during >>>>>>>>>> the >>>>>>>>>> switchtest, works fine. However the regression test with the >>>>>>>>>> linux >>>>>>>>>> test >>>>>>>>>> project (ltp-full-20130904) scripts causes some sort of >>>>>>>>>> system >>>>>>>>>> lock >>>>>>>>>> up. >>>>>>>>>> After that I only can ctrl-c xeno-regression-test (i.e. >>>>>>>>>> switchtest), which, >>>>>>>>>> however, doesn't help to regain console access (neigher over >>>>>>>>>> ethernet nor >>>>>>>>>> serial). 
>>>>>>>>>> >>>>>>>>>> Here's what I did: >>>>>>>>>> >>>>>>>>>> -- Building -- >>>>>>>>>> As recomended in the Xenomai 2.6 readme I followed the >>>>>>>>>> instructions >>>>>>>>>> in [1] >>>>>>>>>> to produce a kernel and filesystem. To get a xenomai kernel I >>>>>>>>>> had >>>>>>>>>> to do >>>>>>>>>> three things differently: >>>>>>>>>> >>>>>>>>>> *) I used: git checkout origin/v3.8.x -b tmp >>>>>>>>>> *) I applied ipipe-core-3.8.13-arm-3.patch from the >>>>>>>>>> xenomai-2.6 >>>>>>>>>> git >>>>>>>>>> tree as >>>>>>>>>> described in the Xenomai 2.6 readme >>>>>>>>>> *) I disabled KGDB and TIDSPBRIDGE since those produced >>>>>>>>>> compile >>>>>>>>>> errors (see >>>>>>>>>> config [2]) >>>>>>>>>> >>>>>>>>>> After a while I obtained the following messages from dmesg >>>>>>>>>> [3] >>>>>>>>>> and >>>>>>>>>> from the >>>>>>>>>> command prompt: >>>>>>>>>> >>>>>>>>>> root@arm:~# cat /proc/version >>>>>>>>>> Linux version 3.8.13-x3.6 (aglatz@linuxvbox) (gcc version >>>>>>>>>> 4.7.3 >>>>>>>>>> 20130328 >>>>>>>>>> (prerelease) (crosstool-NG >>>>>>>>>> linaro-1.13.1-4.7-2013.04-20130415 - >>>>>>>>>> Linaro GCC >>>>>>>>>> 2013.04) ) #4 SMP Sat Jan 4 15:54:20 GMT 2014 >>>>>>>>>> >>>>>>>>>> -- Testing Linux -- >>>>>>>>>> To see if everything works I downloaded and cross-compiled >>>>>>>>>> ltp-full-20130904 [4] with the same toolchain and flags (- >>>>>>>>>> march=armv7-a >>>>>>>>>> -mfpu=vfp3) as the xenomai libs and runtime. I started ltp >>>>>>>>>> with >>>>>>>>>> "./ >>>>>>>>>> runltp >>>>>>>>>> -p -l dohell-2014-01-06-1.log -S xenomai.skiplist" and >>>>>>>>>> after a >>>>>>>>>> while it >>>>>>>>>> finished with a few failed tests [5]. The console access, >>>>>>>>>> however, >>>>>>>>>> worked >>>>>>>>>> fine. 
>>>>>>>>>> >>>>>>>>>> -- Testing Xenomai -- >>>>>>>>>> First I sucessfully could run the simple xenomai regression >>>>>>>>>> test: >>>>>>>>>> xeno-regression-test -l "/usr/lib/xenomai/testsuite/dohell - >>>>>>>>>> m / >>>>>>>>>> tmp >>>>>>>>>> 100" -t >>>>>>>>>> 2 which produced the output in [6] and the following >>>>>>>>>> additional >>>>>>>>>> messages >>>>>>>>>> with dmesg: >>>>>>>>>> >>>>>>>>>> [ 476.215057] Xenomai: RTDM: closing file descriptor 1. >>>>>>>>>> [ 477.434936] Xenomai: Posix: destroying semaphore f0069c00. >>>>>>>>>> [ 477.440887] Xenomai: Posix: destroying mutex f0069a00. >>>>>>>>>> [ 477.475372] xnheap: destroying shared heap 'rt_heap: heap' >>>>>>>>>> with >>>>>>>>>> 16384 >>>>>>>>>> bytes still in use. >>>>>>>>>> [ 479.008453] Xenomai: Switching rt_task to secondary mode >>>>>>>>>> after >>>>>>>>>> exception >>>>>>>>>> #0 from user-space at 0x9620 (pid 2145) >>>>>>>>>> [ 480.574462] Xenomai: watchdog triggered -- signaling >>>>>>>>>> runaway >>>>>>>>>> thread >>>>>>>>>> 'rt_task' >>>>>>>>>> [ 480.582061] [sched_delayed] sched: RT throttling activated >>>>>>>>>> [ 557.336425] Xenomai: Posix: closing message queue >>>>>>>>>> descriptor >>>>>>>>>> 3. >>>>>>>>>> >>>>>>>>>> and "cat /proc/xenomai/*" produced [7]. >>>>>>>>>> >>>>>>>>>> When I started the realistic xenomai regression test: xeno- >>>>>>>>>> regression-test >>>>>>>>>> -l "/usr/lib/xenomai/testsuite/dohell -m /tmp -l /opt/ltp" - >>>>>>>>>> t 2 >>>>>>>>>> everything >>>>>>>>>> seemed fine at first - I could logon and start top to inspect >>>>>>>>>> the >>>>>>>>>> running >>>>>>>>>> processes. However, the command line (over serial and >>>>>>>>>> ethernet) >>>>>>>>>> consistently freezes after a while (at different ltp tests >>>>>>>>>> though). >>>>>>>>>> First I >>>>>>>>>> thought it's the massive system load which doesn't leave CPU >>>>>>>>>> for >>>>>>>>>> the >>>>>>>>>> console... 
however ctrl-c of xeno-regression-test does not >>>>>>>>>> help >>>>>>>>>> to >>>>>>>>>> regain >>>>>>>>>> console access... >>>>>>>>> >>>>>>>>> That is because kill xeno-regression-test does not kill all >>>>>>>>> the >>>>>>>>> script children. So, basically, the load tasks are still >>>>>>>>> running. >>>>>>>>> Also, what filesystem is /tmp? dohell is using dd to >>>>>>>>> alternatively >>>>>>>>> write to /tmp, then erase the file. If /tmp is some flash, it >>>>>>>>> will >>>>>>>>> become slow after a while. If it is a tmpfs, it will eat RAM. >>>>>>>>> >>>>>>>>> >>>>>>>> >>>>>>>> The described problem is _very_ reproducible on my PandaBoard >>>>>>>> ES >>>>>>>> (omap4460), where I boot from an SD card partition and the >>>>>>>> rootfs >>>>>>>> is >>>>>>>> also on the SD card partition. I tried it with several kernel >>>>>>>> versions >>>>>>>> (3.8.13, 3.10.18, and 3.10.34) with the latest ipipe and >>>>>>>> xenomai >>>>>>>> from >>>>>>>> git the git repos. Everytime I start the regression test (see >>>>>>>> command >>>>>>>> above) the following happens: Everything works fine until the >>>>>>>> switch/ >>>>>>>> latency tests start. Then I see that there is heavy access to >>>>>>>> the >>>>>>>> SD >>>>>>>> card, which is expected, as the status LED 2 is blinking. After >>>>>>>> ~5mins >>>>>>>> this status LED is constantly on. That's when I know that >>>>>>>> everything >>>>>>>> is over. On the console I can only execute commands that are >>>>>>>> already >>>>>>>> in RAM, such as the bash things like ps, mount, ... However, >>>>>>>> if I >>>>>>>> try >>>>>>>> a simple 'touch new' it blocks forever and I know that it >>>>>>>> blocks in >>>>>>>> the syscall where the file should be created, because I >>>>>>>> looked at >>>>>>>> it >>>>>>>> with strace. I tried several things: I turned off CONFIG_PM >>>>>>>> (which >>>>>>>> was >>>>>>>> on by default), turned on the MMC debugging, put extra >>>>>>>> prink's in >>>>>>>> the >>>>>>>> omap_hsmmc.c ISR. 
However, everything seems to work on this >>>>>>>> level: >>>>>>>> DMA >>>>>>>> requests are started and do finish, the ISR is called regularly >>>>>>>> (bc >>>>>>>> first I though that Xenomai would starve it). >>>>>>>> >>>>>>>> Have you every run Xenonmai on this _specific_ board (since >>>>>>>> everything >>>>>>>> is running smoothly on the omap5 board)? >>>>>>>> Any more ideas how to debug it? >>>>>>>> >>>>>>>> Currently, I'm compiling the ipipe trace in hope that it would >>>>>>>> tell >>>>>>>> me >>>>>>>> something useful... >>>>>>>> >>>>>>>> Oh yes, the best bit is that the regression test works >>>>>>>> perfectly >>>>>>>> fine >>>>>>>> if I boot from an external USB HD _AND_ unmount (!) all MMC >>>>>>>> partitions. >>>>>>> >>>>>>> So, the MMC driver has a problem. Have you tried: >>>>>>> - running the exact same kernel configuration only with >>>>>>> CONFIG_XENOMAI >>>>>>> disabled (and stress with dohell) >>>>>>> - then with CONFIG_XENOMAI and CONFIG_IPIPE disabled. >>>>>>> >>>>>>> Also, do you have this patch in the tree you tried? >>>>>>> http://git.xenomai.org/ipipe.git/commit/?h=stable/ipipe-3.10.18&id=c26e7ad5679f9391cd8ea1db001bf301d2f6bc88 >>>>>>> >>>>>> >>>>>> First i mounted tmpfs on /tmp so I don't wear out the SD card too >>>>>> much: >>>>>> mount -t tmpfs -osize=192M tmpfs /tmp >>>>>> >>>>>> Then I used the following line to start the test (substitute >>>>>> MYTEST >>>>>> below with the following line): >>>>>> /usr/lib/xenomai/testsuite/dohell -m /tmp -l /opt/ltp >>>>>> >>>>>> Note: I always monitored the test over wifi with 'top' so I also >>>>>> had >>>>>> some network load... 
>>>>>> >>>>>> I got the following results with the 3.10.34 kernel, which >>>>>> includes >>>>>> everything up to the current ipipe-3.10 tag (it also included the >>>>>> patch you mentioned): >>>>>> >>>>>> - xeno-regression-test "MYTEST" -> FAIL if booted from SD card >>>>>> (see >>>>>> description above); OK if booted from ext USB HD _AND_ no mmc >>>>>> partitions mounted >>>>>> - CONFIG_IPIPE && CONFIG_XENOMAI && MYTEST -> FAIL (got status >>>>>> LED 2 >>>>>> constantly on as described above) >>>>>> - CONFIG_IPIPE && MYTEST -> OK (see attached config file and ltp >>>>>> test >>>>>> log) >>>>>> >>>>>> Anything else I should try? >>>>> >>>>> Is the current LTP test when the failure happens always the same? >>>>> >>>>> >>>> >>>> I went through all the logfiles on my pandaboard and and identified >>>> the last tests that ltp logged before the error occurred (I'm >>>> assuming >>>> that ltp writes to the file in /opt/ltp/results after completing >>>> the >>>> test since there is the PASS/FAIL note as well, which logically >>>> should >>>> only be available after completing the test): >>>> >>>> test count >>>> ======================== >>>> rt_sigqueueinfo01 1 >>>> clock_nanosleep01 10 >>>> munmap02 1 >>>> semget06 1 >>>> epoll_create1_01 5 >>>> splice01 1 >>>> clock_getres01 1 >>>> rename13 1 >>>> BindMounts 1 >>>> utimes01 1 >>>> >>>> So it seems that the test after 'clock_nanosleep01', which is >>>> 'clone01' according to the LTP log file I sent you, seems to be the >>>> prime hotspot of failure followed by 'epoll01', which comes after >>>> 'epoll_create1_01'. >>>> >>>> I'm using the standard LTP version 'ltp-full-20130904', which I >>>> downloaded and compiled on the target with gcc 4.6.3 (default >>>> debian >>>> wheezy). >>> >>> Ok. I am not sure it is meaningful. Anyway, the only difference >>> between >>> CONFIG_XENOMAI + CONFIG_IPIPE and CONFIG_IPIPE alone, provided that >>> you >>> are not running any program using Xenomai, is the host tick >>> emulation. 
>>> >>> So, could you please try to turn off >>> CONFIG_NO_HZ_IDLE >>> CONFIG_NO_HZ >>> CONFIG_HIGH_RES_TIMERS >>> >>> And see if it works better? >>> >> >> As I wrote before, I recompiled the Kernel with your timer options >> and >> CONFIG_XENOMAI, installed it, synced it and rebooted after cutting >> the >> power to the board for ~10secs. >> >> It seems with those options it got much further with the tests. >> However, eventually all ssh connections broke up and the last >> messages >> on the console, where I started do hell were: >> >> [...] >> 102400000 bytes (102 MB) copied, 2.97674 s, 34.4 MB/s >> 100+0 records in >> 100+0 records out >> 102400000 bytes (102 MB) copied, 1.97433 s, 51.9 MB/s >> 100+0 records in >> 100+0 records out >> 102400000 bytes (102 MB) copied, 2.68371 s, 38.2 MB/s >> 100+0 records in >> 100+0 records out >> 102400000 bytes (102 MB) copied, 2.57073 s, 39.8 MB/s >> dd: writing `/tmp/bigfile': No space left on device >> 7+0 records in >> 6+0 records out >> 6164480 bytes (6.2 MB) copied, 0.189001 s, 32.6 MB/s >> /usr/lib/xenomai/testsuite/dohell: 62: /usr/lib/xenomai/testsuite/ >> dohell: Cannot fork > > This may simply be due to some LTP test which forks a lot and prevent > the system from being able to fork. This should be a temporary > solution. > >> Write failed: Host is down >> >> ... and as usuall status LED 2 is permanently on. >> >> As u suspect there's something wrong with the timer subsystem I >> looked >> around a bit what extra patches went into the 3.10.14 kernel of >> RobertCNelson, which I used as a base to merge the ipipe git tree. 
>> Here is the list:
>>
>> 0001-panda-fix-wl12xx-regulator.patch
>> 0002-ti-st-st-kim-fixing-firmware-path.patch
>> 0003-Panda-expansion-add-spidev.patch
>> 0004-HACK-PandaES-disable-cpufreq-so-board-will-boot.patch
>> 0005-HACK-panda-enable-OMAP4_ERRATA_I688.patch
>> 0006-ARM-hw_breakpoint-Enable-debug-powerdown-only-if-sys.patch
>> 0007-Revert-regulator-twl-Remove-TWL6030_FIXED_RESOURCE.patch
>> 0008-Revert-regulator-twl-Remove-another-unused-variable-.patch
>> 0009-Revert-regulator-twl-Remove-references-to-the-twl403.patch
>> 0010-Revert-regulator-twl-Remove-references-to-32kHz-cloc.patch
>> 0011-panda-spidev-setup-pinmux.patch
>>
>> Do you think those may have something to do with it?
>
> I do not think so. When the LED is still on, can you use the serial
> console to run cat /proc/interrupts to see if the timer is still
> ticking?
>

I ran the test again with the same kernel and traced the messages from the serial console with minicom. Again, the test ran for quite some time until I got stacktraces similar to [1] (which might be just related to the ltp memcg test).

However, after these stacktraces I got the following messages on the serial console (LED2 also went on and stayed on):

[...]
[ 6674.540000] omap_hsmmc omap_hsmmc.0: MMC start dma failure
[ 6674.540000] mmcblk0: unknown error -22 sending read/write command, card status 0x900
[ 6674.550000] end_request: I/O error, dev mmcblk0, sector 12751744
[ 6674.560000] EXT4-fs warning (device mmcblk0p2): __ext4_read_dirblock:908: error reading directory block (ino 397703, block 0)
[...]
[ 6932.610000] omap_hsmmc omap_hsmmc.0: MMC start dma failure
[ 6932.610000] mmcblk0: unknown error -22 sending read/write command, card status 0x900
[ 6932.620000] end_request: I/O error, dev mmcblk0, sector 21142904
[ 6932.630000] EXT4-fs warning (device mmcblk0p2): __ext4_read_dirblock:908: error reading directory block (ino 657554, block 0)
[...]

Although dd is still running on minicom, I lost the ssh connection over Ethernet (and I couldn't get it back even after disconnecting and reconnecting the cable, which did not produce any PHY interrupt message in dmesg either) and I cannot Ctrl-C or do anything on the serial console... I just see dd, which was started by dohell, getting invoked.

So with the periodic timer ltp runs for much longer; however, I can't get the console back after the mmc (?) errors, which I was able to do with the original timer subsystem config.

... and xeno-regression-test "MYTEST" fails as usual after ~5mins.

A.

[1] memcg related stacktrace:
=======================
[ 6606.000000] memcg_process invoked oom-killer: gfp_mask=0xd0, order=0, oom_score_adj=0
[ 6606.010000] memcg_process cpuset=/ mems_allowed=0
[ 6606.010000] CPU: 0 PID: 26237 Comm: memcg_process Tainted: G W 3.10.32-x3.4 #26
[ 6606.020000] [<c0014e0c>] (unwind_backtrace+0x0/0xe8) from [<c00122ac>] (show_stack+0x20/0x24)
[ 6606.030000] [<c00122ac>] (show_stack+0x20/0x24) from [<c081e0b0>] (dump_stack+0x20/0x28)
[ 6606.040000] [<c081e0b0>] (dump_stack+0x20/0x28) from [<c081a610>] (dump_header.isra.11+0x98/0x1ac)
[ 6606.050000] [<c081a610>] (dump_header.isra.11+0x98/0x1ac) from [<c01948e8>] (oom_kill_process+0x6c/0x3a0)
[ 6606.060000] [<c01948e8>] (oom_kill_process+0x6c/0x3a0) from [<c01d0fe8>] (__mem_cgroup_try_charge+0xb00/0xb50)
[ 6606.070000] [<c01d0fe8>] (__mem_cgroup_try_charge+0xb00/0xb50) from [<c01d14f0>] (mem_cgroup_charge_common+0x44/0x6c)
[ 6606.080000] [<c01d14f0>] (mem_cgroup_charge_common+0x44/0x6c) from [<c01d2958>] (mem_cgroup_newpage_charge+0x34/0x3c)
[ 6606.090000] [<c01d2958>] (mem_cgroup_newpage_charge+0x34/0x3c) from [<c01b5718>] (handle_pte_fault+0x718/0x878)
[ 6606.100000] [<c01b5718>] (handle_pte_fault+0x718/0x878) from [<c01b5968>] (handle_mm_fault+0xf0/0x144)
[ 6606.110000] [<c01b5968>] (handle_mm_fault+0xf0/0x144) from [<c01b5c7c>] (__get_user_pages.part.72+0x2c0/0x434)
[ 6606.120000] [<c01b5c7c>] (__get_user_pages.part.72+0x2c0/0x434) from [<c01b5e38>] (__get_user_pages+0x48/0x50)
[ 6606.130000] [<c01b5e38>] (__get_user_pages+0x48/0x50) from [<c01b6b24>] (__mlock_vma_pages_range+0x74/0x7c)
[ 6606.140000] [<c01b6b24>] (__mlock_vma_pages_range+0x74/0x7c) from [<c01b6fc4>] (__mm_populate+0xd8/0x13c)
[ 6606.150000] [<c01b6fc4>] (__mm_populate+0xd8/0x13c) from [<c01a9930>] (vm_mmap_pgoff+0xac/0xb8)
[ 6606.160000] [<c01a9930>] (vm_mmap_pgoff+0xac/0xb8) from [<c01b8dd8>] (SyS_mmap_pgoff+0xb0/0xec)
[ 6606.160000] [<c01b8dd8>] (SyS_mmap_pgoff+0xb0/0xec) from [<c000e020>] (ret_fast_syscall+0x0/0x50)
[ 6606.170000] Task in /1/subgroup killed as a result of limit of /1
[ 6606.180000] memory: usage 4kB, limit 4kB, failcnt 6
[ 6606.190000] memory+swap: usage 4kB, limit 9007199254740991kB, failcnt 0
[ 6606.190000] kmem: usage 0kB, limit 9007199254740991kB, failcnt 0
[ 6606.200000] Memory cgroup stats for /1: cache:0KB rss:0KB rss_huge:0KB mapped_file:0KB swap:0KB inactive_anon:0KB active_anon:0KB inactive_file:0KB active_file:0KB unevictable:0KB
[ 6606.220000] Memory cgroup stats for /1/subgroup: cache:0KB rss:4KB rss_huge:0KB mapped_file:0KB swap:0KB inactive_anon:0KB active_anon:0KB inactive_file:0KB active_file:0KB unevictable:4KB
[ 6606.230000] [ pid ] uid tgid total_vm rss nr_ptes swapents oom_score_adj name
[ 6606.240000] [26237] 0 26237 404 84 3 0 0 memcg_process
[ 6606.250000] Memory cgroup out of memory: Kill process 26237 (memcg_process) score 85000 or sacrifice child
[ 6606.260000] Killed process 26237 (memcg_process) total-vm:1616kB, anon-rss:68kB, file-rss:268kB

^ permalink raw reply [flat|nested] 28+ messages in thread
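To line up the memcg OOM kills with the MMC errors in a long minicom capture, the relevant lines can be pulled out with their timestamps. The following is only a sketch: the function name summarize and the event labels are made up for illustration, and the regular expressions are written against the three message formats quoted in this mail, so other kernel versions may word them differently.

```python
import re

# Kernel timestamp prefix as it appears in the capture, e.g. "[ 6674.540000] ".
TS = r"\[\s*(\d+\.\d+)\]\s*"

OOM = re.compile(TS + r"(\S+) invoked oom-killer")
DMA_FAIL = re.compile(TS + r"omap_hsmmc \S+ MMC start dma failure")
IO_ERR = re.compile(TS + r"end_request: I/O error, dev (\w+), sector (\d+)")


def summarize(log_text):
    """Return a list of (timestamp, kind, detail) tuples in log order.

    kind is "oom", "dma_failure" or "io_error" (labels chosen here,
    not kernel terminology).
    """
    events = []
    for line in log_text.splitlines():
        m = OOM.search(line)
        if m:
            events.append((float(m.group(1)), "oom", m.group(2)))
            continue
        m = DMA_FAIL.search(line)
        if m:
            events.append((float(m.group(1)), "dma_failure", None))
            continue
        m = IO_ERR.search(line)
        if m:
            events.append((float(m.group(1)), "io_error",
                           (m.group(2), int(m.group(3)))))
    return events
```

Feeding the saved console log to summarize() gives a compact timeline; for the excerpts above, the first DMA failure is logged roughly 68 seconds after the OOM kill in [1], which by itself does not show that the two are related.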
* Re: [Xenomai] Command line freeze during xeno-regression-test on omap4460 2014-04-07 10:18 ` Andreas Glatz @ 2014-04-07 10:52 ` Gilles Chanteperdrix 2014-04-07 13:41 ` Andreas Glatz 0 siblings, 1 reply; 28+ messages in thread From: Gilles Chanteperdrix @ 2014-04-07 10:52 UTC (permalink / raw) To: Andreas Glatz; +Cc: xenomai On 04/07/2014 12:18 PM, Andreas Glatz wrote: > > On 6 Apr 2014, at 22:04, Gilles Chanteperdrix wrote: > >> On 04/06/2014 10:57 PM, Andreas Glatz wrote: >>> >>> On 6 Apr 2014, at 16:28, Gilles Chanteperdrix wrote: >>> >>>> On 04/06/2014 05:22 PM, Andreas Glatz wrote: >>>>> >>>>> On 6 Apr 2014, at 15:44, Gilles Chanteperdrix wrote: >>>>> >>>>>> On 04/06/2014 01:21 PM, Andreas Glatz wrote: >>>>>>> >>>>>>> On 4 Apr 2014, at 11:44, Gilles Chanteperdrix wrote: >>>>>>> >>>>>>>> On 04/04/2014 12:27 PM, Andreas Glatz wrote: >>>>>>>>> Hi Gilles, >>>>>>>>> >>>>>>>>> I'm finally back to my original problem below: >>>>>>>>> >>>>>>>>> On 6 Jan 2014, at 17:39, Gilles Chanteperdrix wrote: >>>>>>>>> >>>>>>>>>> On 01/06/2014 04:30 PM, Andreas Glatz wrote: >>>>>>>>>>> Hi, >>>>>>>>>>> >>>>>>>>>>> I managed to produce a kernel (v3.8.13) with xenomai 2.6.3 >>>>>>>>>>> ipipe >>>>>>>>>>> patch and >>>>>>>>>>> rootfs (debian wheezy) with xenomai 2.6.3 libraries for my >>>>>>>>>>> Pandaboard ES >>>>>>>>>>> (omap4460). The simple regression test, which only calls dd >>>>>>>>>>> during >>>>>>>>>>> the >>>>>>>>>>> switchtest, works fine. However the regression test with the >>>>>>>>>>> linux >>>>>>>>>>> test >>>>>>>>>>> project (ltp-full-20130904) scripts causes some sort of >>>>>>>>>>> system >>>>>>>>>>> lock >>>>>>>>>>> up. >>>>>>>>>>> After that I only can ctrl-c xeno-regression-test (i.e. >>>>>>>>>>> switchtest), which, >>>>>>>>>>> however, doesn't help to regain console access (neigher over >>>>>>>>>>> ethernet nor >>>>>>>>>>> serial). 
>>>>>>>>>>> >>>>>>>>>>> Here's what I did: >>>>>>>>>>> >>>>>>>>>>> -- Building -- >>>>>>>>>>> As recomended in the Xenomai 2.6 readme I followed the >>>>>>>>>>> instructions >>>>>>>>>>> in [1] >>>>>>>>>>> to produce a kernel and filesystem. To get a xenomai kernel I >>>>>>>>>>> had >>>>>>>>>>> to do >>>>>>>>>>> three things differently: >>>>>>>>>>> >>>>>>>>>>> *) I used: git checkout origin/v3.8.x -b tmp >>>>>>>>>>> *) I applied ipipe-core-3.8.13-arm-3.patch from the >>>>>>>>>>> xenomai-2.6 >>>>>>>>>>> git >>>>>>>>>>> tree as >>>>>>>>>>> described in the Xenomai 2.6 readme >>>>>>>>>>> *) I disabled KGDB and TIDSPBRIDGE since those produced >>>>>>>>>>> compile >>>>>>>>>>> errors (see >>>>>>>>>>> config [2]) >>>>>>>>>>> >>>>>>>>>>> After a while I obtained the following messages from dmesg >>>>>>>>>>> [3] >>>>>>>>>>> and >>>>>>>>>>> from the >>>>>>>>>>> command prompt: >>>>>>>>>>> >>>>>>>>>>> root@arm:~# cat /proc/version >>>>>>>>>>> Linux version 3.8.13-x3.6 (aglatz@linuxvbox) (gcc version >>>>>>>>>>> 4.7.3 >>>>>>>>>>> 20130328 >>>>>>>>>>> (prerelease) (crosstool-NG >>>>>>>>>>> linaro-1.13.1-4.7-2013.04-20130415 - >>>>>>>>>>> Linaro GCC >>>>>>>>>>> 2013.04) ) #4 SMP Sat Jan 4 15:54:20 GMT 2014 >>>>>>>>>>> >>>>>>>>>>> -- Testing Linux -- >>>>>>>>>>> To see if everything works I downloaded and cross-compiled >>>>>>>>>>> ltp-full-20130904 [4] with the same toolchain and flags (- >>>>>>>>>>> march=armv7-a >>>>>>>>>>> -mfpu=vfp3) as the xenomai libs and runtime. I started ltp >>>>>>>>>>> with >>>>>>>>>>> "./ >>>>>>>>>>> runltp >>>>>>>>>>> -p -l dohell-2014-01-06-1.log -S xenomai.skiplist" and >>>>>>>>>>> after a >>>>>>>>>>> while it >>>>>>>>>>> finished with a few failed tests [5]. The console access, >>>>>>>>>>> however, >>>>>>>>>>> worked >>>>>>>>>>> fine. 
>>>>>>>>>>> >>>>>>>>>>> -- Testing Xenomai -- >>>>>>>>>>> First I sucessfully could run the simple xenomai regression >>>>>>>>>>> test: >>>>>>>>>>> xeno-regression-test -l "/usr/lib/xenomai/testsuite/dohell - >>>>>>>>>>> m / >>>>>>>>>>> tmp >>>>>>>>>>> 100" -t >>>>>>>>>>> 2 which produced the output in [6] and the following >>>>>>>>>>> additional >>>>>>>>>>> messages >>>>>>>>>>> with dmesg: >>>>>>>>>>> >>>>>>>>>>> [ 476.215057] Xenomai: RTDM: closing file descriptor 1. >>>>>>>>>>> [ 477.434936] Xenomai: Posix: destroying semaphore f0069c00. >>>>>>>>>>> [ 477.440887] Xenomai: Posix: destroying mutex f0069a00. >>>>>>>>>>> [ 477.475372] xnheap: destroying shared heap 'rt_heap: heap' >>>>>>>>>>> with >>>>>>>>>>> 16384 >>>>>>>>>>> bytes still in use. >>>>>>>>>>> [ 479.008453] Xenomai: Switching rt_task to secondary mode >>>>>>>>>>> after >>>>>>>>>>> exception >>>>>>>>>>> #0 from user-space at 0x9620 (pid 2145) >>>>>>>>>>> [ 480.574462] Xenomai: watchdog triggered -- signaling >>>>>>>>>>> runaway >>>>>>>>>>> thread >>>>>>>>>>> 'rt_task' >>>>>>>>>>> [ 480.582061] [sched_delayed] sched: RT throttling activated >>>>>>>>>>> [ 557.336425] Xenomai: Posix: closing message queue >>>>>>>>>>> descriptor >>>>>>>>>>> 3. >>>>>>>>>>> >>>>>>>>>>> and "cat /proc/xenomai/*" produced [7]. >>>>>>>>>>> >>>>>>>>>>> When I started the realistic xenomai regression test: xeno- >>>>>>>>>>> regression-test >>>>>>>>>>> -l "/usr/lib/xenomai/testsuite/dohell -m /tmp -l /opt/ltp" - >>>>>>>>>>> t 2 >>>>>>>>>>> everything >>>>>>>>>>> seemed fine at first - I could logon and start top to inspect >>>>>>>>>>> the >>>>>>>>>>> running >>>>>>>>>>> processes. However, the command line (over serial and >>>>>>>>>>> ethernet) >>>>>>>>>>> consistently freezes after a while (at different ltp tests >>>>>>>>>>> though). >>>>>>>>>>> First I >>>>>>>>>>> thought it's the massive system load which doesn't leave CPU >>>>>>>>>>> for >>>>>>>>>>> the >>>>>>>>>>> console... 
however ctrl-c of xeno-regression-test does not >>>>>>>>>>> help >>>>>>>>>>> to >>>>>>>>>>> regain >>>>>>>>>>> console access... >>>>>>>>>> >>>>>>>>>> That is because kill xeno-regression-test does not kill all >>>>>>>>>> the >>>>>>>>>> script children. So, basically, the load tasks are still >>>>>>>>>> running. >>>>>>>>>> Also, what filesystem is /tmp? dohell is using dd to >>>>>>>>>> alternatively >>>>>>>>>> write to /tmp, then erase the file. If /tmp is some flash, it >>>>>>>>>> will >>>>>>>>>> become slow after a while. If it is a tmpfs, it will eat RAM. >>>>>>>>>> >>>>>>>>>> >>>>>>>>> >>>>>>>>> The described problem is _very_ reproducible on my PandaBoard >>>>>>>>> ES >>>>>>>>> (omap4460), where I boot from an SD card partition and the >>>>>>>>> rootfs >>>>>>>>> is >>>>>>>>> also on the SD card partition. I tried it with several kernel >>>>>>>>> versions >>>>>>>>> (3.8.13, 3.10.18, and 3.10.34) with the latest ipipe and >>>>>>>>> xenomai >>>>>>>>> from >>>>>>>>> git the git repos. Everytime I start the regression test (see >>>>>>>>> command >>>>>>>>> above) the following happens: Everything works fine until the >>>>>>>>> switch/ >>>>>>>>> latency tests start. Then I see that there is heavy access to >>>>>>>>> the >>>>>>>>> SD >>>>>>>>> card, which is expected, as the status LED 2 is blinking. After >>>>>>>>> ~5mins >>>>>>>>> this status LED is constantly on. That's when I know that >>>>>>>>> everything >>>>>>>>> is over. On the console I can only execute commands that are >>>>>>>>> already >>>>>>>>> in RAM, such as the bash things like ps, mount, ... However, >>>>>>>>> if I >>>>>>>>> try >>>>>>>>> a simple 'touch new' it blocks forever and I know that it >>>>>>>>> blocks in >>>>>>>>> the syscall where the file should be created, because I >>>>>>>>> looked at >>>>>>>>> it >>>>>>>>> with strace. 
I tried several things: I turned off CONFIG_PM >>>>>>>>> (which >>>>>>>>> was >>>>>>>>> on by default), turned on the MMC debugging, put extra >>>>>>>>> prink's in >>>>>>>>> the >>>>>>>>> omap_hsmmc.c ISR. However, everything seems to work on this >>>>>>>>> level: >>>>>>>>> DMA >>>>>>>>> requests are started and do finish, the ISR is called regularly >>>>>>>>> (bc >>>>>>>>> first I though that Xenomai would starve it). >>>>>>>>> >>>>>>>>> Have you every run Xenonmai on this _specific_ board (since >>>>>>>>> everything >>>>>>>>> is running smoothly on the omap5 board)? >>>>>>>>> Any more ideas how to debug it? >>>>>>>>> >>>>>>>>> Currently, I'm compiling the ipipe trace in hope that it would >>>>>>>>> tell >>>>>>>>> me >>>>>>>>> something useful... >>>>>>>>> >>>>>>>>> Oh yes, the best bit is that the regression test works >>>>>>>>> perfectly >>>>>>>>> fine >>>>>>>>> if I boot from an external USB HD _AND_ unmount (!) all MMC >>>>>>>>> partitions. >>>>>>>> >>>>>>>> So, the MMC driver has a problem. Have you tried: >>>>>>>> - running the exact same kernel configuration only with >>>>>>>> CONFIG_XENOMAI >>>>>>>> disabled (and stress with dohell) >>>>>>>> - then with CONFIG_XENOMAI and CONFIG_IPIPE disabled. >>>>>>>> >>>>>>>> Also, do you have this patch in the tree you tried? >>>>>>>> http://git.xenomai.org/ipipe.git/commit/?h=stable/ipipe-3.10.18&id=c26e7ad5679f9391cd8ea1db001bf301d2f6bc88 >>>>>>>> >>>>>>> >>>>>>> First i mounted tmpfs on /tmp so I don't wear out the SD card too >>>>>>> much: >>>>>>> mount -t tmpfs -osize=192M tmpfs /tmp >>>>>>> >>>>>>> Then I used the following line to start the test (substitute >>>>>>> MYTEST >>>>>>> below with the following line): >>>>>>> /usr/lib/xenomai/testsuite/dohell -m /tmp -l /opt/ltp >>>>>>> >>>>>>> Note: I always monitored the test over wifi with 'top' so I also >>>>>>> had >>>>>>> some network load... 
>>>>>>> >>>>>>> I got the following results with the 3.10.34 kernel, which >>>>>>> includes >>>>>>> everything up to the current ipipe-3.10 tag (it also included the >>>>>>> patch you mentioned): >>>>>>> >>>>>>> - xeno-regression-test "MYTEST" -> FAIL if booted from SD card >>>>>>> (see >>>>>>> description above); OK if booted from ext USB HD _AND_ no mmc >>>>>>> partitions mounted >>>>>>> - CONFIG_IPIPE && CONFIG_XENOMAI && MYTEST -> FAIL (got status >>>>>>> LED 2 >>>>>>> constantly on as described above) >>>>>>> - CONFIG_IPIPE && MYTEST -> OK (see attached config file and ltp >>>>>>> test >>>>>>> log) >>>>>>> >>>>>>> Anything else I should try? >>>>>> >>>>>> Is the current LTP test when the failure happens always the same? >>>>>> >>>>>> >>>>> >>>>> I went through all the logfiles on my pandaboard and and identified >>>>> the last tests that ltp logged before the error occurred (I'm >>>>> assuming >>>>> that ltp writes to the file in /opt/ltp/results after completing >>>>> the >>>>> test since there is the PASS/FAIL note as well, which logically >>>>> should >>>>> only be available after completing the test): >>>>> >>>>> test count >>>>> ======================== >>>>> rt_sigqueueinfo01 1 >>>>> clock_nanosleep01 10 >>>>> munmap02 1 >>>>> semget06 1 >>>>> epoll_create1_01 5 >>>>> splice01 1 >>>>> clock_getres01 1 >>>>> rename13 1 >>>>> BindMounts 1 >>>>> utimes01 1 >>>>> >>>>> So it seems that the test after 'clock_nanosleep01', which is >>>>> 'clone01' according to the LTP log file I sent you, seems to be the >>>>> prime hotspot of failure followed by 'epoll01', which comes after >>>>> 'epoll_create1_01'. >>>>> >>>>> I'm using the standard LTP version 'ltp-full-20130904', which I >>>>> downloaded and compiled on the target with gcc 4.6.3 (default >>>>> debian >>>>> wheezy). >>>> >>>> Ok. I am not sure it is meaningful. 
Anyway, the only difference >>>> between >>>> CONFIG_XENOMAI + CONFIG_IPIPE and CONFIG_IPIPE alone, provided that >>>> you >>>> are not running any program using Xenomai, is the host tick >>>> emulation. >>>> >>>> So, could you please try to turn off >>>> CONFIG_NO_HZ_IDLE >>>> CONFIG_NO_HZ >>>> CONFIG_HIGH_RES_TIMERS >>>> >>>> And see if it works better? >>>> >>> >>> As I wrote before, I recompiled the Kernel with your timer options >>> and >>> CONFIG_XENOMAI, installed it, synced it and rebooted after cutting >>> the >>> power to the board for ~10secs. >>> >>> It seems with those options it got much further with the tests. >>> However, eventually all ssh connections broke up and the last >>> messages >>> on the console, where I started do hell were: >>> >>> [...] >>> 102400000 bytes (102 MB) copied, 2.97674 s, 34.4 MB/s >>> 100+0 records in >>> 100+0 records out >>> 102400000 bytes (102 MB) copied, 1.97433 s, 51.9 MB/s >>> 100+0 records in >>> 100+0 records out >>> 102400000 bytes (102 MB) copied, 2.68371 s, 38.2 MB/s >>> 100+0 records in >>> 100+0 records out >>> 102400000 bytes (102 MB) copied, 2.57073 s, 39.8 MB/s >>> dd: writing `/tmp/bigfile': No space left on device >>> 7+0 records in >>> 6+0 records out >>> 6164480 bytes (6.2 MB) copied, 0.189001 s, 32.6 MB/s >>> /usr/lib/xenomai/testsuite/dohell: 62: /usr/lib/xenomai/testsuite/ >>> dohell: Cannot fork >> >> This may simply be due to some LTP test which forks a lot and prevent >> the system from being able to fork. This should be a temporary >> solution. >> >>> Write failed: Host is down >>> >>> ... and as usuall status LED 2 is permanently on. >>> >>> As u suspect there's something wrong with the timer subsystem I >>> looked >>> around a bit what extra patches went into the 3.10.14 kernel of >>> RobertCNelson, which I used as a base to merge the ipipe git tree. 
>>> Here is the list: >>> >>> 0001-panda-fix-wl12xx-regulator.patch >>> 0002-ti-st-st-kim-fixing-firmware-path.patch >>> 0003-Panda-expansion-add-spidev.patch >>> 0004-HACK-PandaES-disable-cpufreq-so-board-will-boot.patch >>> 0005-HACK-panda-enable-OMAP4_ERRATA_I688.patch >>> 0006-ARM-hw_breakpoint-Enable-debug-powerdown-only-if-sys.patch >>> 0007-Revert-regulator-twl-Remove-TWL6030_FIXED_RESOURCE.patch >>> 0008-Revert-regulator-twl-Remove-another-unused-variable-.patch >>> 0009-Revert-regulator-twl-Remove-references-to-the-twl403.patch >>> 0010-Revert-regulator-twl-Remove-references-to-32kHz-cloc.patch >>> 0011-panda-spidev-setup-pinmux.patch >>> >>> Do you think those may have something to do with it? >> >> I do not think so. When the LED is still on, can you use the serial >> console to run cat /proc/interrupts to see if the timer is still >> ticking? >> > > I ran the test again with the same kernel and traced the messages from > the serial console with minicom. Again, the test ran for quite some > time until I got stacktraces similar to [1] (which might be just > related to the ltp memcg test). > > However, after these stacktraces I got the following message on the > serial console (LED2 also went on and stayed on): > > [...] > [ 6674.540000] omap_hsmmc omap_hsmmc.0: MMC start dma failure > [ 6674.540000] mmcblk0: unknown error -22 sending read/write command, > card status 0x900 > [ 6674.550000] end_request: I/O error, dev mmcblk0, sector 12751744 > [ 6674.560000] EXT4-fs warning (device mmcblk0p2): > __ext4_read_dirblock:908: error reading directory block (ino 397703, > block 0) > [...] > [ 6932.610000] omap_hsmmc omap_hsmmc.0: MMC start dma failure > [ 6932.610000] mmcblk0: unknown error -22 sending read/write command, > card status 0x900 > [ 6932.620000] end_request: I/O error, dev mmcblk0, sector 21142904 > [ 6932.630000] EXT4-fs warning (device mmcblk0p2): > __ext4_read_dirblock:908: error reading directory block (ino 657554, > block 0) > [...] 
>
> Although dd is still running on minicom, I lost the ssh connection
> over Ethernet (and I couldn't get it back even after disconnecting and
> reconnecting the cable, which didn't cause any PHY interrupt in dmesg
> either) and I cannot Ctrl-C or do anything on the serial console... I
> just see dd, which was started by dohell, getting invoked.

What I meant is to use minicom as a console, already logged in, doing
nothing, ready to be used when the bug happens.

Anyway, I think there is no way around understanding the MMC driver now.

The bug when starting DMA may simply be due to the fact that all
previous DMAs stalled.

Will look at this if I can reproduce it on my panda (it is currently
testing the I-pipe for 3.14, but when it is finished, I will try and
reproduce the bug).

--
Gilles.

^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [Xenomai] Command line freeze during xeno-regression-test on omap4460 2014-04-07 10:52 ` Gilles Chanteperdrix @ 2014-04-07 13:41 ` Andreas Glatz 0 siblings, 0 replies; 28+ messages in thread From: Andreas Glatz @ 2014-04-07 13:41 UTC (permalink / raw) To: Gilles Chanteperdrix; +Cc: xenomai

On 7 Apr 2014, at 11:52, Gilles Chanteperdrix wrote:

> On 04/07/2014 12:18 PM, Andreas Glatz wrote:
>> [...]
>>
>> I ran the test again with the same kernel and traced the messages from
>> the serial console with minicom. Again, the test ran for quite some
>> time until I got stack traces similar to [1] (which might be just
>> related to the ltp memcg test).
>>
>> However, after these stack traces I got the following message on the
>> serial console (LED2 also went on and stayed on):
>>
>> [...]
>> [ 6674.540000] omap_hsmmc omap_hsmmc.0: MMC start dma failure
>> [ 6674.540000] mmcblk0: unknown error -22 sending read/write command, card status 0x900
>> [ 6674.550000] end_request: I/O error, dev mmcblk0, sector 12751744
>> [ 6674.560000] EXT4-fs warning (device mmcblk0p2): __ext4_read_dirblock:908: error reading directory block (ino 397703, block 0)
>> [...]
>> [ 6932.610000] omap_hsmmc omap_hsmmc.0: MMC start dma failure
>> [ 6932.610000] mmcblk0: unknown error -22 sending read/write command, card status 0x900
>> [ 6932.620000] end_request: I/O error, dev mmcblk0, sector 21142904
>> [ 6932.630000] EXT4-fs warning (device mmcblk0p2): __ext4_read_dirblock:908: error reading directory block (ino 657554, block 0)
>> [...]
>>
>> Although dd is still running on minicom, I lost the ssh connection
>> over Ethernet (and I couldn't get it back even after disconnecting and
>> reconnecting the cable, which didn't cause any PHY interrupt in dmesg
>> either) and I cannot Ctrl-C or do anything on the serial console... I
>> just see dd, which was started by dohell, getting invoked.
>
> What I meant is to use minicom as a console, already logged in, doing
> nothing, ready to be used when the bug happens.

So now I started dohell over ssh and connected to the serial console over minicom, where I just logged in as root. The result was that I got approximately as far as last time with the LTP tests. However, this time I couldn't see the omap_hsmmc errors.
What did match last time is that I couldn't do anything on the console or over ssh after the failure occurred, so unfortunately I don't have any news on the 'cat /proc/interrupts' front. I do remember that with my original timer subsystem setup I was still seeing interrupts after the failure occurred.

>
> Anyway, I think there is no way around understanding the MMC driver
> now.
>
> The bug when starting DMA may simply be due to the fact that all
> previous DMAs stalled.
>
> Will look at this if I can reproduce it on my panda (it is currently
> testing the I-pipe for 3.14, but when it is finished, I will try and
> reproduce the bug).
>

Let me know if/how I can help. I also have a JTAG hardware debugger, which I've never used before... it might take some time to set that up though...

A.

^ permalink raw reply [flat|nested] 28+ messages in thread
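The 'cat /proc/interrupts' check discussed above boils down to taking two snapshots a few seconds apart and seeing which counters still move. A minimal sketch of that comparison follows; it is not part of the thread. The function names are hypothetical, and the IRQ numbers and labels in the sample (29/twd, 83/mmc0) are made up for illustration rather than taken from the Pandaboard.

```python
def parse_interrupts(text):
    """Map each IRQ label to its count summed over all CPU columns."""
    lines = text.strip().splitlines()
    ncpus = len(lines[0].split())          # header row: CPU0 CPU1 ...
    totals = {}
    for line in lines[1:]:
        label, _, rest = line.partition(":")
        fields = rest.split()
        # Only the first ncpus fields can be per-CPU counters.
        counts = [int(f) for f in fields[:ncpus] if f.isdigit()]
        totals[label.strip()] = sum(counts)
    return totals


def still_ticking(before, after):
    """Return {irq: delta} for every IRQ whose count grew between snapshots."""
    a, b = parse_interrupts(before), parse_interrupts(after)
    deltas = {irq: b[irq] - a.get(irq, 0) for irq in b}
    return {irq: d for irq, d in deltas.items() if d > 0}


# Made-up snapshots, only to show the expected input shape.
BEFORE = """           CPU0       CPU1
 29:       1000        900       GIC  twd
 83:        500          0       GIC  mmc0
Err:          0
"""
AFTER = """           CPU0       CPU1
 29:       1100        950       GIC  twd
 83:        500          0       GIC  mmc0
Err:          0
"""

if __name__ == "__main__":
    print(still_ticking(BEFORE, AFTER))
```

In this made-up sample the timer line keeps counting while the MMC line is stuck, which is the pattern one would look for once LED 2 stays on.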
* Re: [Xenomai] Command line freeze during xeno-regression-test on omap4460 2014-04-06 11:21 ` Andreas Glatz 2014-04-06 14:44 ` Gilles Chanteperdrix @ 2014-04-06 15:54 ` Gilles Chanteperdrix 2014-04-06 16:02 ` Andreas Glatz 1 sibling, 1 reply; 28+ messages in thread From: Gilles Chanteperdrix @ 2014-04-06 15:54 UTC (permalink / raw) To: Andreas Glatz; +Cc: xenomai

On 04/06/2014 01:21 PM, Andreas Glatz wrote:
> First I mounted tmpfs on /tmp so I don't wear out the SD card too much:
> mount -t tmpfs -osize=192M tmpfs /tmp
>
> Then I used the following line to start the test (substitute MYTEST
> below with the following line):
> /usr/lib/xenomai/testsuite/dohell -m /tmp -l /opt/ltp
>
> Note: I always monitored the test over wifi with 'top' so I also had
> some network load...
>
> I got the following results with the 3.10.34 kernel, which includes
> everything up to the current ipipe-3.10 tag (it also included the
> patch you mentioned):
>
> - xeno-regression-test "MYTEST" -> FAIL if booted from SD card (see
> description above); OK if booted from ext USB HD _AND_ no mmc
> partitions mounted
> - CONFIG_IPIPE && CONFIG_XENOMAI && MYTEST -> FAIL (got status LED 2
> constantly on as described above)
> - CONFIG_IPIPE && MYTEST -> OK (see attached config file and ltp test
> log)

Of course, I assume you used the exact same kernel configuration, the
only difference being CONFIG_XENOMAI in the two cases, right?

--
Gilles.

^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [Xenomai] Command line freeze during xeno-regression-test on omap4460 2014-04-06 15:54 ` Gilles Chanteperdrix @ 2014-04-06 16:02 ` Andreas Glatz 2014-04-06 20:54 ` Gilles Chanteperdrix 0 siblings, 1 reply; 28+ messages in thread From: Andreas Glatz @ 2014-04-06 16:02 UTC (permalink / raw) To: Gilles Chanteperdrix; +Cc: xenomai

On 6 Apr 2014, at 16:54, Gilles Chanteperdrix wrote:

> On 04/06/2014 01:21 PM, Andreas Glatz wrote:
>> First I mounted tmpfs on /tmp so I don't wear out the SD card too
>> much:
>> mount -t tmpfs -osize=192M tmpfs /tmp
>>
>> Then I used the following line to start the test (substitute MYTEST
>> below with the following line):
>> /usr/lib/xenomai/testsuite/dohell -m /tmp -l /opt/ltp
>>
>> Note: I always monitored the test over wifi with 'top' so I also had
>> some network load...
>>
>> I got the following results with the 3.10.34 kernel, which includes
>> everything up to the current ipipe-3.10 tag (it also included the
>> patch you mentioned):
>>
>> - xeno-regression-test "MYTEST" -> FAIL if booted from SD card (see
>> description above); OK if booted from ext USB HD _AND_ no mmc
>> partitions mounted
>> - CONFIG_IPIPE && CONFIG_XENOMAI && MYTEST -> FAIL (got status LED 2
>> constantly on as described above)
>> - CONFIG_IPIPE && MYTEST -> OK (see attached config file and ltp test
>> log)
>
> Of course, I assume you used the exact same kernel configuration, the
> only difference being CONFIG_XENOMAI in the two cases, right?

Yes! I just went into menuconfig and disabled CONFIG_XENOMAI, rebuilt it, installed it and rebooted. I'm now recompiling the kernel with the last config I sent you and the changes I attached (I got all those changes after enabling CONFIG_XENOMAI and your CONFIG_* changes with make menuconfig). After everything is built, I'll install it and repeat running 'MYTEST' without 'xeno-regression-test'.

A.

-------------- next part --------------
A non-text attachment was scrubbed...
Name: config.diff
Type: application/octet-stream
Size: 5246 bytes
Desc: not available
URL: <http://www.xenomai.org/pipermail/xenomai/attachments/20140406/1ac7eef7/attachment.obj>

^ permalink raw reply [flat|nested] 28+ messages in thread
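Since the comparison above hinges on two kernel builds differing only in CONFIG_XENOMAI, it can help to check the two .config files mechanically. The sketch below is an illustration and not the attached config.diff, whose contents are not shown in the archive. The function names are hypothetical, and the sample configs are reduced to three options taken from the thread. The kernel tree also ships scripts/diffconfig for the same purpose.

```python
NOT_SET_SUFFIX = " is not set"


def parse_config(text):
    """Parse kernel .config text into {option: value}; unset options map to 'n'."""
    opts = {}
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("CONFIG_") and "=" in line:
            name, _, value = line.partition("=")
            opts[name] = value
        elif line.startswith("# CONFIG_") and line.endswith(NOT_SET_SUFFIX):
            # "# CONFIG_FOO is not set" -> CONFIG_FOO = n
            opts[line[2:-len(NOT_SET_SUFFIX)]] = "n"
    return opts


def config_diff(old_text, new_text):
    """Return {option: (old, new)} for every option whose value differs."""
    old, new = parse_config(old_text), parse_config(new_text)
    diff = {}
    for name in sorted(set(old) | set(new)):
        before, after = old.get(name, "n"), new.get(name, "n")
        if before != after:
            diff[name] = (before, after)
    return diff


# Reduced, illustrative configs (not the real ones from the thread).
WITH_XENOMAI = "CONFIG_IPIPE=y\nCONFIG_XENOMAI=y\nCONFIG_NO_HZ=y\n"
WITHOUT_XENOMAI = "CONFIG_IPIPE=y\n# CONFIG_XENOMAI is not set\nCONFIG_NO_HZ=y\n"

if __name__ == "__main__":
    for name, (before, after) in config_diff(WITH_XENOMAI, WITHOUT_XENOMAI).items():
        print(f"{name}: {before} -> {after}")
```

Anything reported beyond CONFIG_XENOMAI and the options it selects would be a second variable in the experiment.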
* Re: [Xenomai] Command line freeze during xeno-regression-test on omap4460 2014-04-06 16:02 ` Andreas Glatz @ 2014-04-06 20:54 ` Gilles Chanteperdrix 2014-04-06 21:23 ` Andreas Glatz 0 siblings, 1 reply; 28+ messages in thread From: Gilles Chanteperdrix @ 2014-04-06 20:54 UTC (permalink / raw) To: Andreas Glatz; +Cc: xenomai

On 04/06/2014 06:02 PM, Andreas Glatz wrote:
>
> On 6 Apr 2014, at 16:54, Gilles Chanteperdrix wrote:
>
>> On 04/06/2014 01:21 PM, Andreas Glatz wrote:
>>> First I mounted tmpfs on /tmp so I don't wear out the SD card too
>>> much:
>>> mount -t tmpfs -osize=192M tmpfs /tmp
>>>
>>> Then I used the following line to start the test (substitute MYTEST
>>> below with the following line):
>>> /usr/lib/xenomai/testsuite/dohell -m /tmp -l /opt/ltp
>>>
>>> Note: I always monitored the test over wifi with 'top' so I also had
>>> some network load...
>>>
>>> I got the following results with the 3.10.34 kernel, which includes
>>> everything up to the current ipipe-3.10 tag (it also included the
>>> patch you mentioned):
>>>
>>> - xeno-regression-test "MYTEST" -> FAIL if booted from SD card (see
>>> description above); OK if booted from ext USB HD _AND_ no mmc
>>> partitions mounted
>>> - CONFIG_IPIPE && CONFIG_XENOMAI && MYTEST -> FAIL (got status LED 2
>>> constantly on as described above)
>>> - CONFIG_IPIPE && MYTEST -> OK (see attached config file and ltp test
>>> log)
>>
>> Of course, I assume you used the exact same kernel configuration, the
>> only difference being CONFIG_XENOMAI in the two cases, right?
>
> Yes! I just went into menuconfig and disabled CONFIG_XENOMAI, rebuilt
> it, installed it and rebooted. I'm now recompiling the kernel with the
> last config I sent you and the changes I attached (I got all those
> changes after enabling CONFIG_XENOMAI and your CONFIG_* changes with
> make menuconfig). After everything is built, I'll install it and
> repeat running 'MYTEST' without 'xeno-regression-test'.

Another interesting test would be to enable CONFIG_DETECT_HUNG_TASK.
With a little luck, we will find out what the kernel is blocked on.

--
Gilles.

^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [Xenomai] Command line freeze during xeno-regression-test on omap4460 2014-04-06 20:54 ` Gilles Chanteperdrix @ 2014-04-06 21:23 ` Andreas Glatz 0 siblings, 0 replies; 28+ messages in thread From: Andreas Glatz @ 2014-04-06 21:23 UTC (permalink / raw) To: Gilles Chanteperdrix; +Cc: xenomai

On 6 Apr 2014, at 21:54, Gilles Chanteperdrix wrote:

> On 04/06/2014 06:02 PM, Andreas Glatz wrote:
>>
>> On 6 Apr 2014, at 16:54, Gilles Chanteperdrix wrote:
>>
>>> On 04/06/2014 01:21 PM, Andreas Glatz wrote:
>>>> First I mounted tmpfs on /tmp so I don't wear out the SD card too
>>>> much:
>>>> mount -t tmpfs -osize=192M tmpfs /tmp
>>>>
>>>> Then I used the following line to start the test (substitute MYTEST
>>>> below with the following line):
>>>> /usr/lib/xenomai/testsuite/dohell -m /tmp -l /opt/ltp
>>>>
>>>> Note: I always monitored the test over wifi with 'top' so I also had
>>>> some network load...
>>>>
>>>> I got the following results with the 3.10.34 kernel, which includes
>>>> everything up to the current ipipe-3.10 tag (it also included the
>>>> patch you mentioned):
>>>>
>>>> - xeno-regression-test "MYTEST" -> FAIL if booted from SD card (see
>>>> description above); OK if booted from ext USB HD _AND_ no mmc
>>>> partitions mounted
>>>> - CONFIG_IPIPE && CONFIG_XENOMAI && MYTEST -> FAIL (got status LED 2
>>>> constantly on as described above)
>>>> - CONFIG_IPIPE && MYTEST -> OK (see attached config file and ltp test
>>>> log)
>>>
>>> Of course, I assume you used the exact same kernel configuration, the
>>> only difference being CONFIG_XENOMAI in the two cases, right?
>>
>> Yes! I just went into menuconfig and disabled CONFIG_XENOMAI, rebuilt
>> it, installed it and rebooted. I'm now recompiling the kernel with the
>> last config I sent you and the changes I attached (I got all those
>> changes after enabling CONFIG_XENOMAI and your CONFIG_* changes with
>> make menuconfig). After everything is built, I'll install it and
>> repeat running 'MYTEST' without 'xeno-regression-test'.
>
> Another interesting test would be to enable CONFIG_DETECT_HUNG_TASK.
> With a little luck, we will find out what the kernel is blocked on.
>

Unfortunately, I rebooted the system and couldn't check the serial console. I started ltp again... so I should have more info tomorrow. However, last week I got the following backtraces with a CONFIG_IPIPE && CONFIG_XENOMAI kernel:

[10683.230000] INFO: task arith:2623 blocked for more than 120 seconds.
[10683.240000] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[10683.250000] arith D c0825a34 0 2623 1 0x00000001
[10683.260000] [<c0825a34>] (__schedule+0x550/0x858) from [<c0825dcc>] (schedule+0x90/0x94)
[10683.270000] [<c0825dcc>] (schedule+0x90/0x94) from [<c08260b4>] (io_schedule+0xbc/0x12c)
[10683.280000] [<c08260b4>] (io_schedule+0xbc/0x12c) from [<c02077a4>] (sleep_on_buffer+0x18/0x20)
[10683.290000] [<c02077a4>] (sleep_on_buffer+0x18/0x20) from [<c0823ef0>] (__wait_on_bit+0x64/0xb0)
[10683.300000] [<c0823ef0>] (__wait_on_bit+0x64/0xb0) from [<c0823fc4>] (out_of_line_wait_on_bit+0x88/0x94)
[10683.310000] [<c0823fc4>] (out_of_line_wait_on_bit+0x88/0x94) from [<c0207860>] (__wait_on_buffer+0x30/0x38)
[10683.320000] [<c0207860>] (__wait_on_buffer+0x30/0x38) from [<c0270e34>] (__ext4_get_inode_loc+0x1cc/0x448)
[10683.330000] [<c0270e34>] (__ext4_get_inode_loc+0x1cc/0x448) from [<c0272b64>] (ext4_iget+0x64/0x840)
[10683.340000] [<c0272b64>] (ext4_iget+0x64/0x840) from [<c027b9d4>] (ext4_lookup+0x120/0x168)
[10683.350000] [<c027b9d4>] (ext4_lookup+0x120/0x168) from [<c01e37e4>] (lookup_real+0x40/0x5c)
[10683.360000] [<c01e37e4>] (lookup_real+0x40/0x5c) from [<c01e7b64>] (do_last+0x604/0xd24)
[10683.370000] [<c01e7b64>] (do_last+0x604/0xd24) from [<c01e8348>] (path_openat+0xc4/0x460)
[10683.380000] [<c01e8348>] (path_openat+0xc4/0x460) from [<c01e9440>] (do_filp_open+0x3c/0x88)
[10683.390000] [<c01e9440>] (do_filp_open+0x3c/0x88) from [<c01d9c48>] (do_sys_open+0xf4/0x180)
[10683.400000] [<c01d9c48>] (do_sys_open+0xf4/0x180) from [<c01d9d04>] (SyS_open+0x30/0x34)
[10683.410000] [<c01d9d04>] (SyS_open+0x30/0x34) from [<c000e020>] (ret_fast_syscall+0x0/0x50)

[10683.070000] INFO: task rs:main Q:Reg:2063 blocked for more than 120 seconds.
[10683.070000] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[10683.080000] rs:main Q:Reg D c0825a34 0 2063 1 0x00000000
[10683.090000] [<c0825a34>] (__schedule+0x550/0x858) from [<c0825dcc>] (schedule+0x90/0x94)
[10683.100000] [<c0825dcc>] (schedule+0x90/0x94) from [<c08260b4>] (io_schedule+0xbc/0x12c)
[10683.110000] [<c08260b4>] (io_schedule+0xbc/0x12c) from [<c0195570>] (sleep_on_page+0x18/0x20)
[10683.120000] [<c0195570>] (sleep_on_page+0x18/0x20) from [<c0823ef0>] (__wait_on_bit+0x64/0xb0)
[10683.130000] [<c0823ef0>] (__wait_on_bit+0x64/0xb0) from [<c0195364>] (wait_on_page_bit+0xa0/0xb0)
[10683.140000] [<c0195364>] (wait_on_page_bit+0xa0/0xb0) from [<c02765fc>] (ext4_da_write_begin+0x1d4/0x28c)
[10683.150000] [<c02765fc>] (ext4_da_write_begin+0x1d4/0x28c) from [<c01966ec>] (generic_file_buffered_write+0xdc/0x240)
[10683.160000] [<c01966ec>] (generic_file_buffered_write+0xdc/0x240) from [<c01979b0>] (__generic_file_aio_write+0x360/0x3ac)
[10683.170000] [<c01979b0>] (__generic_file_aio_write+0x360/0x3ac) from [<c0197a64>] (generic_file_aio_write+0x68/0xc8)
[10683.190000] [<c0197a64>] (generic_file_aio_write+0x68/0xc8) from [<c026d33c>] (ext4_file_write+0x36c/0x454)
[10683.200000] [<c026d33c>] (ext4_file_write+0x36c/0x454) from [<c01da120>] (do_sync_write+0x84/0xa8)
[10683.210000] [<c01da120>] (do_sync_write+0x84/0xa8) from [<c01da8c0>] (vfs_write+0xe0/0x1c8)
[10683.220000] [<c01da8c0>] (vfs_write+0xe0/0x1c8) from [<c01daec8>] (SyS_write+0x4c/0x7c)
[10683.230000] [<c01daec8>] (SyS_write+0x4c/0x7c) from [<c000e020>] (ret_fast_syscall+0x0/0x50)

A.
* Re: [Xenomai] Command line freeze during xeno-regression-test on omap4460
From: Gilles Chanteperdrix @ 2014-04-04 11:00 UTC
To: Andreas Glatz; +Cc: xenomai

On 04/04/2014 12:27 PM, Andreas Glatz wrote:
> Hi Gilles,
>
> I'm finally back to my original problem below:
>
> On 6 Jan 2014, at 17:39, Gilles Chanteperdrix wrote:
>
>> On 01/06/2014 04:30 PM, Andreas Glatz wrote:
>>> [...]
>>> First I thought it's the massive system load which doesn't leave
>>> CPU for the console... however ctrl-c of xeno-regression-test does
>>> not help to regain console access...
>>
>> That is because kill xeno-regression-test does not kill all the
>> script children. So, basically, the load tasks are still running.
>> Also, what filesystem is /tmp? dohell is using dd to alternatively
>> write to /tmp, then erase the file. If /tmp is some flash, it will
>> become slow after a while. If it is a tmpfs, it will eat RAM.
>>
>
> The described problem is _very_ reproducible on my PandaBoard ES
> (omap4460), where I boot from an SD card partition and the rootfs is

I have a pandaboard, I can check whether I can reproduce that. I believe
the same problem has also been reported on beagleboard XM:
http://www.xenomai.org/pipermail/xenomai/2014-March/030311.html

So, there may be an issue with Xenomai or interrupt pipelining and the
MMC driver for omap3 and omap4.

-- 
Gilles.
* Re: [Xenomai] Command line freeze during xeno-regression-test on omap4460
From: Andreas Glatz @ 2014-04-04 13:38 UTC
To: Gilles Chanteperdrix; +Cc: xenomai

On 4 Apr 2014, at 12:00, Gilles Chanteperdrix wrote:

> On 04/04/2014 12:27 PM, Andreas Glatz wrote:
>> [...]
>> The described problem is _very_ reproducible on my PandaBoard ES
>> (omap4460), where I boot from an SD card partition and the rootfs is
>
> I have a pandaboard, I can check whether I can reproduce that.

Thanks, I really appreciate that.

For completeness' sake I'm also including my current kernel config. This config is derived from CNelson's config and still has a lot of unnecessary stuff in it. CONFIG_PM can be disabled after disabling CONFIG_ARCH_OMAP2PLUS_TYPICAL.

A.

-------------- next part --------------
A non-text attachment was scrubbed...
Name: configv2
Type: application/octet-stream
Size: 120523 bytes
Desc: not available
URL: <http://www.xenomai.org/pipermail/xenomai/attachments/20140404/81e66bf9/attachment.obj>
* Re: [Xenomai] Command line freeze during xeno-regression-test on omap4460
From: Gilles Chanteperdrix @ 2014-04-14 7:13 UTC
To: Andreas Glatz; +Cc: xenomai

On 01/06/2014 04:30 PM, Andreas Glatz wrote:
> Hi,
>
> I managed to produce a kernel (v3.8.13) with xenomai 2.6.3 ipipe patch and
> rootfs (debian wheezy) with xenomai 2.6.3 libraries for my Pandaboard ES
> (omap4460). The simple regression test, which only calls dd during the
> switchtest, works fine. However the regression test with the linux test
> project (ltp-full-20130904) scripts causes some sort of system lock up.
> After that I only can ctrl-c xeno-regression-test (i.e. switchtest), which,
> however, doesn't help to regain console access (neither over ethernet nor
> serial).

Hi,

I finally ran some tests with SD card: I booted my pandaboard using NFS as usual, but ran the xeno-test script passing the mount point of the SD card to dohell's -m option. And I could not reproduce any issue. The kernel I used is 3.14, the configuration is omap2plus_defconfig.

Regards.

-- 
Gilles.
* Re: [Xenomai] Command line freeze during xeno-regression-test on omap4460
From: Andreas Glatz @ 2014-04-14 7:24 UTC
To: Gilles Chanteperdrix; +Cc: xenomai

On 14 Apr 2014, at 08:13, Gilles Chanteperdrix wrote:

> On 01/06/2014 04:30 PM, Andreas Glatz wrote:
>> [...]
>
> Hi,
>
> I finally ran some tests with SD card: I booted my pandaboard using NFS
> as usual, but ran the xeno-test script passing the mount point of the SD
> card to dohell's -m option. And I could not reproduce any issue. The
> kernel I used is 3.14, the configuration is omap2plus_defconfig.
>

OK, brilliant. I'll give that a try. I'm assuming that you tested with the git tag 'raw/ipipe-3.14.0'?

A.
* Re: [Xenomai] Command line freeze during xeno-regression-test on omap4460
From: Gilles Chanteperdrix @ 2014-04-14 7:35 UTC
To: Andreas Glatz; +Cc: xenomai

On 04/14/2014 09:24 AM, Andreas Glatz wrote:
>
> On 14 Apr 2014, at 08:13, Gilles Chanteperdrix wrote:
>
>> On 01/06/2014 04:30 PM, Andreas Glatz wrote:
>>> [...]
>>
>> Hi,
>>
>> I finally ran some tests with SD card: I booted my pandaboard using NFS
>> as usual, but ran the xeno-test script passing the mount point of the SD
>> card to dohell's -m option. And I could not reproduce any issue. The
>> kernel I used is 3.14, the configuration is omap2plus_defconfig.
>>
>
> OK, brilliant. I'll give that a try. I'm assuming that you tested with
> the git tag 'raw/ipipe-3.14.0' ?

It is a branch actually, but yes.

-- 
Gilles.
* Re: [Xenomai] Command line freeze during xeno-regression-test on omap4460
From: Andreas Glatz @ 2014-04-14 15:55 UTC
To: Gilles Chanteperdrix; +Cc: xenomai

On 14 Apr 2014, at 08:35, Gilles Chanteperdrix wrote:

> On 04/14/2014 09:24 AM, Andreas Glatz wrote:
>> [...]
>> OK, brilliant. I'll give that a try. I'm assuming that you tested
>> with the git tag 'raw/ipipe-3.14.0' ?
>
> It is a branch actually, but yes.
>

At the bottom of this email is the result of the first LTP pass (started from dohell, started from xeno-regression-test). I never got so far with just the SD card in the panda (I ran exactly the same ltp test as before). LED2 is still constantly on, but the system remains responsive over wifi, ethernet and serial. Max latency is ~7us and worst latency is ~18us. Will post the final output ASAP.

From the results and the other reports on the mailing list, I think I might be well off using this kernel as a base for our open-source/-hardware DAQ project.

Thanks a lot Gilles!

A.

mv_tests01 PASS 0
size01 PASS 0
sssd01 PASS 0
sssd02 PASS 0
sssd03 PASS 0
smt_smp_enabled PASS 0
smt_smp_affinity PASS 0
ht_interrupt PASS 0
kmsg01 PASS 0
fw_load FAIL 2

-----------------------------------------------
Total Tests: 1345
Total Failures: 51
Kernel Version: 3.14.0-ipipe-38801-g9b33fee-dirty
Machine Architecture: armv7l
Hostname: arm

root@arm:~#
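To compare LTP passes between kernels, the result rows and the summary block above can be reduced to a list of failed tests and a failure rate. The following is only a rough Python sketch under the assumption that the log has the "name PASS|FAIL exit-value" rows and "Total Tests:" / "Total Failures:" lines shown in this message; parse_ltp_log, failure_rate and SAMPLE are hypothetical names made up for this illustration and are not part of LTP.

```python
import re

# One result row as shown above: "<testcase> <PASS|FAIL> <exit value>".
ROW_RE = re.compile(r"^(\S+)\s+(PASS|FAIL)\s+(\d+)$")
# Summary lines such as "Total Tests: 1345".
SUMMARY_RE = re.compile(r"^(Total Tests|Total Failures):\s*(\d+)$")


def parse_ltp_log(text):
    """Return (rows, summary) parsed from an LTP result log."""
    rows, summary = [], {}
    for raw in text.splitlines():
        line = raw.strip()
        row = ROW_RE.match(line)
        if row:
            rows.append((row.group(1), row.group(2), int(row.group(3))))
            continue
        total = SUMMARY_RE.match(line)
        if total:
            summary[total.group(1)] = int(total.group(2))
    return rows, summary


def failure_rate(summary):
    """Fraction of failed tests according to the summary block."""
    return summary["Total Failures"] / summary["Total Tests"]


# The last two result rows and the totals copied from this message.
SAMPLE = """\
kmsg01 PASS 0
fw_load FAIL 2
-----------------------------------------------
Total Tests: 1345
Total Failures: 51
"""

rows, summary = parse_ltp_log(SAMPLE)
failed = [name for name, status, _ in rows if status == "FAIL"]
print(failed, f"{failure_rate(summary):.1%}")
```

Feeding the complete log of a plain-Linux run and of a Xenomai run through the same function would make it possible to diff the two lists of failed tests instead of comparing the totals only.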