* [Xenomai] Switchtest failures on ODROIDU3
@ 2014-09-30  5:31 GP Orcullo
  2014-09-30 11:22 ` Gilles Chanteperdrix
  2014-09-30 11:30 ` Gilles Chanteperdrix
  0 siblings, 2 replies; 46+ messages in thread
From: GP Orcullo @ 2014-09-30  5:31 UTC (permalink / raw)
  To: xenomai

Hi,

Running the switchtest for extended periods (>10 mins) causes the
machine to lock up.

I'm running a modified xeno-regression-test which contains only the
following tests:

check_alive /usr/lib/xenomai/testsuite/switchtest
check_alive /usr/lib/xenomai/testsuite/switchtest -s 1000
check_alive /usr/lib/xenomai/testsuite/latency ${1+"$@"}

The script is invoked with the following arguments:

nohup sudo ./xeno-regression-test -l "/usr/lib/xenomai/testsuite/dohell -m /media/work 36000" -t 2 > /dev/null & top -d0.5
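In the invocation above, the test's standard output goes to /dev/null, and because nohup redirects a terminal's standard error to standard output, the error messages are most likely discarded too. Anything switchtest prints before the machine hangs is therefore lost. A minimal sketch, assuming a POSIX shell; `run_logged` and the log path are hypothetical names, not part of the Xenomai testsuite. It keeps the same command but appends both streams to a file instead:

```sh
#!/bin/sh
# Hypothetical helper, not part of the Xenomai testsuite.
# run_logged LOG CMD [ARGS...]
# Runs CMD immune to hangups and appends both its standard output and its
# standard error to LOG instead of discarding them. Append "&" at the call
# site to run it in the background. Returns CMD's exit status.
run_logged() {
    log=$1
    shift
    # Mark where each run starts, so several runs can share one log.
    printf '=== %s: %s\n' "$(date)" "$*" >> "$log"
    nohup "$@" >> "$log" 2>&1
    status=$?
    # Ask the kernel to write the log to disk; this improves, but does not
    # guarantee, the chance that it survives a later lockup.
    sync
    return "$status"
}
```

Used with the same arguments as the original invocation (the log path is an arbitrary choice):

    run_logged /var/tmp/xeno-regression.log sudo ./xeno-regression-test -l "/usr/lib/xenomai/testsuite/dohell -m /media/work 36000" -t 2 &
    top -d0.5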

The kernel dumps the OOPS information intermittently, so it's difficult
to diagnose the issue.
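Because the oops only shows up intermittently, one option is to keep a complete kernel log (for example a capture of the serial console) and search it afterwards. A minimal sketch, assuming a POSIX shell with `grep -E`; `scan_kernel_log` is a hypothetical name, and the patterns are common kernel message prefixes (oops, soft lockup, RCU stall, OOM kill, panic), not an exhaustive list:

```sh
#!/bin/sh
# Hypothetical helper: print, with line numbers, the lines of a captured
# kernel log that commonly mark an oops, a soft lockup, an RCU stall,
# an OOM kill or a panic. The pattern list is illustrative, not exhaustive.
KERNEL_TROUBLE_RE='Internal error: Oops|Unable to handle kernel|BUG: |detected stall|invoked oom-killer|Out of memory|Kernel panic'

# scan_kernel_log FILE
# Exit status follows grep: 0 if something matched, 1 if nothing did.
scan_kernel_log() {
    grep -n -E "$KERNEL_TROUBLE_RE" "$1"
}
```

The first line reported is usually the most useful one, since later messages are often consequences of the first failure.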

Attached are the kernel config and the log file.

Thanks,

GP Orcullo
-------------- next part --------------
A non-text attachment was scrubbed...
Name: config.gz
Type: application/x-gzip
Size: 18718 bytes
Desc: not available
URL: <http://www.xenomai.org/pipermail/xenomai/attachments/20140930/b6c892de/attachment.bin>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: dmesg.cap
Type: application/vnd.tcpdump.pcap
Size: 38750 bytes
Desc: not available
URL: <http://www.xenomai.org/pipermail/xenomai/attachments/20140930/b6c892de/attachment.cap>

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [Xenomai] Switchtest failures on ODROIDU3
  2014-09-30  5:31 [Xenomai] Switchtest failures on ODROIDU3 GP Orcullo
@ 2014-09-30 11:22 ` Gilles Chanteperdrix
  2014-09-30 11:30 ` Gilles Chanteperdrix
  1 sibling, 0 replies; 46+ messages in thread
From: Gilles Chanteperdrix @ 2014-09-30 11:22 UTC (permalink / raw)
  To: GP Orcullo, xenomai

On 09/30/2014 07:31 AM, GP Orcullo wrote:
> Hi,
> 
> Running the switchtest for extended periods (>10 mins) causes the
> machine to lock up.
> 
> I'm running a modified xeno-regression-test which contains only the
> following tests:
> 
> check_alive /usr/lib/xenomai/testsuite/switchtest
> check_alive /usr/lib/xenomai/testsuite/switchtest -s 1000
> check_alive /usr/lib/xenomai/testsuite/latency ${1+"$@"}
> 
> The script is invoked with the following arguments:
> 
> nohup sudo ./xeno-regression-test -l
> "/usr/lib/xenomai/testsuite/dohell -m /media/work 36000" -t 2 >
> /dev/null & top -d0.5
> 
> The kernel dumps the OOPS information intermittently, so it's difficult
> to diagnose the issue.
> 
> Attached are the kernel config and the log file.
> 

http://xenomai.org/asking-for-help/

-- 
                                                                Gilles.



* Re: [Xenomai] Switchtest failures on ODROIDU3
  2014-09-30  5:31 [Xenomai] Switchtest failures on ODROIDU3 GP Orcullo
  2014-09-30 11:22 ` Gilles Chanteperdrix
@ 2014-09-30 11:30 ` Gilles Chanteperdrix
  2014-09-30 12:04   ` GP Orcullo
  1 sibling, 1 reply; 46+ messages in thread
From: Gilles Chanteperdrix @ 2014-09-30 11:30 UTC (permalink / raw)
  To: GP Orcullo, xenomai

On 09/30/2014 07:31 AM, GP Orcullo wrote:
> Hi,
> 
> Running the switchtest for extended periods (>10 mins) causes the
> machine to lock up.
> 
> I'm running a modified xeno-regression-test which contains only the
> following tests:
> 
> check_alive /usr/lib/xenomai/testsuite/switchtest
> check_alive /usr/lib/xenomai/testsuite/switchtest -s 1000
> check_alive /usr/lib/xenomai/testsuite/latency ${1+"$@"}
> 
> The script is invoked with the following arguments:
> 
> nohup sudo ./xeno-regression-test -l
> "/usr/lib/xenomai/testsuite/dohell -m /media/work 36000" -t 2 >
> /dev/null & top -d0.5
> 
> The kernel dumps the OOPS information intermittently, so it's difficult
> to diagnose the issue.
> 
> Attached are the kernel config and the log file.

OK, this is an Exynos. Sorry, but I have never seen the patch for
Exynos, so I do not know what is inside. You should direct your
questions to whoever provided you with this support.

-- 
                                                                Gilles.



* Re: [Xenomai] Switchtest failures on ODROIDU3
  2014-09-30 11:30 ` Gilles Chanteperdrix
@ 2014-09-30 12:04   ` GP Orcullo
  2014-09-30 12:16     ` Gilles Chanteperdrix
  0 siblings, 1 reply; 46+ messages in thread
From: GP Orcullo @ 2014-09-30 12:04 UTC (permalink / raw)
  To: Gilles Chanteperdrix; +Cc: xenomai

On Sep 30, 2014 7:30 PM, "Gilles Chanteperdrix" <gilles.chanteperdrix@xenomai.org> wrote:
>
> On 09/30/2014 07:31 AM, GP Orcullo wrote:
> > Hi,
> >
> > Running the switchtest for extended periods (>10 mins) causes the
> > machine to lock up.
> >
> > I'm running a modified xeno-regression-test which contains only the
> > following tests:
> >
> > check_alive /usr/lib/xenomai/testsuite/switchtest
> > check_alive /usr/lib/xenomai/testsuite/switchtest -s 1000
> > check_alive /usr/lib/xenomai/testsuite/latency ${1+"$@"}
> >
> > The script is invoked with the following arguments:
> >
> > nohup sudo ./xeno-regression-test -l
> > "/usr/lib/xenomai/testsuite/dohell -m /media/work 36000" -t 2 >
> > /dev/null & top -d0.5
> >
> > The kernel dumps the OOPS information intermittently, so it's difficult
> > to diagnose the issue.
> >
> > Attached are the kernel config and the log file.
>
> OK, this is an Exynos. Sorry, but I have never seen the patch for
> Exynos, so I do not know what is inside. You should direct your
> questions to whoever provided you with this support.

I'm in the process of porting Xenomai to Exynos.

The ipipe-core-3.8.13-arm-3.patch applies cleanly to the 3.8.13.11 kernel
used by the ODROID-U3 board.

Attached is the ipipe patch that I've made.

I was just wondering what could cause switchtest to fail. The error I
can see is that the system runs out of memory, and I don't know
exactly what is causing it.
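Since the visible symptom is the system running out of memory, one way to narrow it down is to record memory usage over the whole run and see, for example, whether it shrinks steadily or drops suddenly, and whether the kernel side (Slab) or the process side (Committed_AS) grows. A minimal sketch, assuming a POSIX shell with awk; `meminfo_field`, `monitor_memory`, the log path and the 5-second default interval are hypothetical choices. MemAvailable is deliberately not used, because that field only appeared in /proc/meminfo with Linux 3.14, after the 3.8 kernel used here.

```sh
#!/bin/sh
# Hypothetical helpers, not part of the Xenomai testsuite.

# meminfo_field NAME [FILE]
# Print the value (in kB) of one /proc/meminfo field, e.g. MemFree.
# FILE defaults to /proc/meminfo; exit status is 1 if NAME is absent.
meminfo_field() {
    awk -v key="$1:" '$1 == key { print $2; found = 1 } END { exit !found }' \
        "${2:-/proc/meminfo}"
}

# monitor_memory LOG [INTERVAL]
# Append a timestamped sample every INTERVAL seconds (default 5) until
# killed. Slab is the kernel's in-kernel data structure cache,
# Committed_AS the memory allocated by processes. sync flushes each
# sample so it has a chance to reach the disk before a possible hang.
monitor_memory() {
    mlog_file=$1
    interval=${2:-5}
    while :; do
        printf '%s MemFree=%s kB Slab=%s kB Committed_AS=%s kB\n' \
            "$(date +%s)" \
            "$(meminfo_field MemFree)" \
            "$(meminfo_field Slab)" \
            "$(meminfo_field Committed_AS)" >> "$mlog_file"
        sync
        sleep "$interval"
    done
}
```

Started in the background before the regression test, for example:

    monitor_memory /var/tmp/meminfo.log 5 &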

I've reattached the dmesg log.

>
> --
>                                                                 Gilles.
-------------- next part --------------
Uncompressing Linux... done, booting the kernel.
[    0.000000] Booting Linux on physical CPU 0xa00
[    0.000000] Initializing cgroup subsys cpuset
[    0.000000] Linux version 3.8.13.11-xen (root@mba) (gcc version 4.7.2 (Debian 4.7.2-5) ) #10 SMP PREEMPT Tue Sep 30 03:45:05 UTC 2014
[    0.000000] CPU: ARMv7 Processor [413fc090] revision 0 (ARMv7), cr=10c5387d
[    0.000000] CPU: PIPT / VIPT nonaliasing data cache, VIPT aliasing instruction cache
[    0.000000] Machine: ODROIDU2
[    0.000000] bootconsole [earlycon0] enabled
[    0.000000] cma: CMA: reserved 64 MiB at 43000000
[    0.000000] cma: CMA: reserved 64 MiB at 51000000
[    0.000000] cma: CMA: reserved 128 MiB at 67800000
[    0.000000] Memory policy: ECC disabled, Data cache writealloc
[    0.000000] CPU EXYNOS4412 (id 0xe4412220)
[    0.000000] exynos4_init_clocks: initializing clocks
[    0.000000] S3C24XX Clocks, Copyright 2004 Simtec Electronics
[    0.000000] s3c_register_clksrc: clock armclk has no registers set
[    0.000000] s3c_register_clksrc: clock audiocdclk has no registers set
[    0.000000] audiocdclk: no parent clock specified
[    0.000000] exynos4_setup_clocks: registering clocks
[    0.000000] exynos4_setup_clocks: xtal is 24000000
[    0.000000] EXYNOS4: PLL settings, A=1000000000, M=880000000, E=96000000 V=350000000
[    0.000000] EXYNOS4: ARMCLK=1000000000, DMC=440000000, ACLK200=176000000
[    0.000000] ACLK100=110000000, ACLK160=176000000, ACLK133=146666666
[    0.000000] sclk_pwm: source is ext_xtal (0), rate is 24000000
[    0.000000] sclk_csis: source is xusbxti (1), rate is 1500000
[    0.000000] sclk_csis: source is xusbxti (1), rate is 1500000
[    0.000000] sclk_cam0: source is xusbxti (1), rate is 1500000
[    0.000000] sclk_cam1: source is xusbxti (1), rate is 1500000
[    0.000000] sclk_fimc: source is xusbxti (1), rate is 1500000
[    0.000000] sclk_fimc: source is xusbxti (1), rate is 1500000
[    0.000000] sclk_fimc: source is xusbxti (1), rate is 1500000
[    0.000000] sclk_fimc: source is xusbxti (1), rate is 1500000
[    0.000000] sclk_fimd: source is mout_mpll_user (6), rate is 55000000
[    0.000000] sclk_mfc: source is mout_mfc0 (0), rate is 55000000
[    0.000000] sclk_g3d: source is mout_g3d0 (0), rate is 55000000
[    0.000000] Unable to set parent aclk_160 of clock dout_mmc0.
[    0.000000] Unable to set parent aclk_160 of clock dout_mmc1.
[    0.000000] Unable to set parent aclk_160 of clock dout_mmc2.
[    0.000000] On node 0 totalpages: 524032
[    0.000000]   Normal zone: 1520 pages used for memmap
[    0.000000]   Normal zone: 0 pages reserved
[    0.000000]   Normal zone: 193040 pages, LIFO batch:31
[    0.000000]   HighMem zone: 2574 pages used for memmap
[    0.000000]   HighMem zone: 326898 pages, LIFO batch:31
[    0.000000] Running under secure firmware.
[    0.000000] PERCPU: Embedded 10 pages/cpu @c16f9000 s18496 r8192 d14272 u40960
[    0.000000] pcpu-alloc: s18496 r8192 d14272 u40960 alloc=10*4096
[    0.000000] pcpu-alloc: [0] 0 [0] 1 [0] 2 [0] 3 
[    0.000000] Built 1 zonelists in Zone order, mobility grouping on.  Total pages: 519938
[    0.000000] Kernel command line: console=tty1 mem=2047M earlyprintk=serial,ttySAC1,115200n8 console=tty1 console=ttySAC1,115200n8 fb_x_res=1280 fb_y_res=720 hdmi_phy_res=720 root=UUID=b8348df1-266c-4df4-b6ca-3a82803afe17 rootwait ro mem=2047M debug log_buf_len=10M
[    0.000000] log_buf_len: 16777216
[    0.000000] early log buf free: 127528(97%)
[    0.000000] PID hash table entries: 4096 (order: 2, 16384 bytes)
[    0.000000] Dentry cache hash table entries: 131072 (order: 7, 524288 bytes)
[    0.000000] Inode-cache hash table entries: 65536 (order: 6, 262144 bytes)
[    0.000000] __ex_table already sorted, skipping sort
[    0.000000] Memory: 2047MB = 2047MB total
[    0.000000] Memory: 1784456k/3568912k available, 623344k reserved, 1317888K highmem
[    0.000000] Virtual kernel memory layout:
[    0.000000]     vector  : 0xffff0000 - 0xffff1000   (   4 kB)
[    0.000000]     fixmap  : 0xfff00000 - 0xfffe0000   ( 896 kB)
[    0.000000]     vmalloc : 0xf0000000 - 0xff000000   ( 240 MB)
[    0.000000]     lowmem  : 0xc0000000 - 0xef800000   ( 760 MB)
[    0.000000]     pkmap   : 0xbfe00000 - 0xc0000000   (   2 MB)
[    0.000000]     modules : 0xbf000000 - 0xbfe00000   (  14 MB)
[    0.000000]       .text : 0xc0008000 - 0xc05b3950   (5807 kB)
[    0.000000]       .init : 0xc05b4000 - 0xc05e4840   ( 195 kB)
[    0.000000]       .data : 0xc05e6000 - 0xc0646ce0   ( 388 kB)
[    0.000000]        .bss : 0xc0646ce0 - 0xc06e3c6c   ( 628 kB)
[    0.000000] SLUB: Genslabs=11, HWalign=64, Order=0-3, MinObjects=0, CPUs=4, Nodes=1
[    0.000000] Preemptible hierarchical RCU implementation.
[    0.000000] NR_IRQS:549
[    0.000000] I-pipe, 24.000 MHz clocksource
[    0.000000] sched_clock: 32 bits at 24MHz, resolution 41ns, wraps every 178956ms
[    0.000000] Interrupt pipeline (release #3)
[    0.000000] Console: colour dummy device 80x30
[    0.000000] console [tty1] enabled
[    0.013274] Calibrating delay loop... 1992.29 BogoMIPS (lpj=4980736)
[    0.062786] pid_max: default: 32768 minimum: 301
[    0.067874] Mount-cache hash table entries: 512
[    0.073681] Initializing cgroup subsys cpuacct
[    0.078109] Initializing cgroup subsys devices
[    0.082674] CPU: Testing write buffer coherency: ok
[    0.087832] CPU0: thread -1, cpu 0, socket 10, mpidr 80000a00
[    0.093655] Setting up static identity map for 0x404446d8 - 0x40444730
[    0.100207] L310 cache controller enabled
[    0.104231] l2x0: 16 ways, CACHE_ID 0x000000c0, AUX_CTRL 0x7e470001, Cache size: 1048576 B
[    0.159021] CPU1: Booted secondary processor
[    0.178594] CPU1: thread -1, cpu 1, socket 10, mpidr 80000a01
[    0.189009] CPU2: Booted secondary processor
[    0.208593] CPU2: thread -1, cpu 2, socket 10, mpidr 80000a02
[    0.219033] CPU3: Booted secondary processor
[    0.238594] CPU3: thread -1, cpu 3, socket 10, mpidr 80000a03
[    0.238650] Brought up 4 CPUs
[    0.272027] SMP: Total of 4 processors activated (7969.17 BogoMIPS).
[    0.280584] devtmpfs: initialized
[    0.297760] regulator-dummy: no parameters
[    0.309276] NET: Registered protocol family 16
[    0.326240] DMA: preallocated 256 KiB pool for atomic coherent allocations
[    0.343656] s5p_hdmi_cec_set_platdata()
[    0.356027] hw-breakpoint: found 5 (+1 reserved) breakpoint and 1 watchpoint registers.
[    0.363985] hw-breakpoint: maximum watchpoint size is 4 bytes.
[    0.370086] S3C Power Management, Copyright 2004 Simtec Electronics
[    0.376479] EXYNOS4x12 PMU Initialize
[    0.383970] EXYNOS: Initializing architecture
[    0.389774] s3c24xx-pwm s3c24xx-pwm.0: tin at 110000000, tdiv at 110000000, tin=divclk, base 0
[    0.398423] s3c24xx-pwm s3c24xx-pwm.1: tin at 110000000, tdiv at 110000000, tin=divclk, base 8
[    0.435310] bio: create slab <bio-0> at 0
[    0.440225] hdmi_5v: 5000 mV 
[    0.446048] usbcore: registered new interface driver usbfs
[    0.451684] usbcore: registered new interface driver hub
[    0.457181] usbcore: registered new device driver usb
[    0.462792] s3c-i2c s3c2440-i2c.0: slave address 0x10
[    0.467785] s3c-i2c s3c2440-i2c.0: bus frequency set to 71 KHz
[    0.474885] max77686 0-0009: device found
[    0.479103] max77686 0-0009: irq is not specified
[    0.487194] LDO1 VDD_ALIVE: 1000 mV 
[    0.494591] LDO2 VDDQ_M1_1V8: 1200 mV 
[    0.502129] LDO3 VDDQ_AUD_1V8: 1800 mV 
[    0.509816] LDO4 VDDQ_MMC2_2V8: 2800 mV 
[    0.517555] LDO5 VDDQ_MMC1_1V8: 1800 mV 
[    0.525295] LDO6 VDD10_MPLL_1V0: 1000 mV 
[    0.533118] LDO7 VDD10_EPLL_1V0: 1000 mV 
[    0.540963] LDO8 VDD10_MIPI_1V0: 1000 mV 
[    0.548260] LDO9 VT_CORE_1V0: ODROIDU2: Disabled regulator
[    0.553698] LDO9 VT_CORE_1V0: 1000 mV 
[    0.559610] LDO10 VDD18_MIPI_1V8: 1800 mV 
[    0.567517] LDO11 VDD18_ABB1_1V8: 1800 mV 
[    0.573667] vdd_ldo12 range: 3300 mV 
[    0.581110] LDO13 VDD18_MIPIHSI_1V8: 1800 mV 
[    0.589294] LDO14 VDD18_ADC_1V8: 1800 mV 
[    0.595328] vdd_ldo15 range: 1000 mV 
[    0.602784] LDO16 VDD18_HSIC: 1800 mV 
[    0.610374] LDO17 VDDQ_CAM_1V8: 1800 mV 
[    0.617510] LDO18 VDDQ_ISP_1V8: ODROIDU2: Disabled regulator
[    0.623112] LDO18 VDDQ_ISP_1V8: 1800 mV 
[    0.628654] LDO19 VT_CAM_1V8: ODROIDU2: Disabled regulator
[    0.634091] LDO19 VT_CAM_1V8: 1800 mV 
[    0.639855] LDO20 EMMC_IO_1V8: 1800 mV 
[    0.647496] LDO21 TFLASH_2V8: 2800 mV 
[    0.655045] LDO22 2V8: 2800 mV 
[    0.661385] LDO23 VDD_TOUCH_2V8: ODROIDU2: Disabled regulator
[    0.667077] LDO23 VDD_TOUCH_2V8: 2800 mV 
[    0.674449] LDO24 VDD_TOUCHLED_3V3: ODROIDU2: Disabled regulator
[    0.680408] LDO24 VDD_TOUCHLED_3V3: 3300 mV 
[    0.688630] LDO25 VDDQ_LCD_3V0: 1800 mV 
[    0.695745] LDO26 VDD_MOTOR_3V0: ODROIDU2: Disabled regulator
[    0.701438] LDO26 VDD_MOTOR_3V0: 3000 mV 
[    0.707051] BUCK1 vdd_mif: 1100 mV 
[    0.712583] BUCK2 vdd_arm: 800 <--> 1500 mV at 1400 mV 
[    0.719318] BUCK3 vdd_int: 1125 mV 
[    0.724856] BUCK4 vdd_g3d: 850 <--> 1200 mV at 1125 mV 
[    0.733881] BUCK5 VDDQ_CKEM1_2: 1200 mV 
[    0.741604] BUCK6 1V35: 1350 mV 
[    0.748643] BUCK7 2V0: 2000 mV 
[    0.753836] ODROIDU2: Regulator BUCK8
[    0.880550] BUCK8 3V0: 3300 mV 
[    0.886897] BUCK9 1V2: ODROIDU2: Disabled regulator
[    0.891717] BUCK9 1V2: 1200 mV 
[    0.896132] s3c-i2c s3c2440-i2c.0: i2c-0: S3C I2C adapter
[    0.901555] s3c-i2c s3c2440-i2c.1: slave address 0x10
[    0.906601] s3c-i2c s3c2440-i2c.1: bus frequency set to 71 KHz
[    0.913023] s3c-i2c s3c2440-i2c.1: i2c-1: S3C I2C adapter
[    0.918440] s3c-i2c s3c2440-i2c.3: slave address 0x10
[    0.923486] s3c-i2c s3c2440-i2c.3: bus frequency set to 71 KHz
[    0.929670] s3c-i2c s3c2440-i2c.3: i2c-3: S3C I2C adapter
[    0.935087] s3c-i2c s3c2440-i2c.7: slave address 0x10
[    0.940144] s3c-i2c s3c2440-i2c.7: bus frequency set to 71 KHz
[    0.946325] s3c-i2c s3c2440-i2c.7: i2c-7: S3C I2C adapter
[    0.951757] s3c-i2c s3c2440-hdmiphy-i2c: slave address 0x10
[    0.957312] s3c-i2c s3c2440-hdmiphy-i2c: bus frequency set to 71 KHz
[    0.964013] s3c-i2c s3c2440-hdmiphy-i2c: i2c-8: S3C I2C adapter
[    0.970079] media: Linux media interface: v0.10
[    0.974751] Linux video capture interface: v2.00
[    0.980865] Advanced Linux Sound Architecture Driver Initialized.
[    0.988218] Switching to clocksource ipipe_tsc
[    1.016994] NET: Registered protocol family 2
[    1.021820] TCP established hash table entries: 8192 (order: 4, 65536 bytes)
[    1.028951] TCP bind hash table entries: 8192 (order: 5, 163840 bytes)
[    1.035556] TCP: Hash tables configured (established 8192 bind 8192)
[    1.041920] TCP: reno registered
[    1.045158] UDP hash table entries: 512 (order: 2, 24576 bytes)
[    1.051177] UDP-Lite hash table entries: 512 (order: 2, 24576 bytes)
[    1.057906] NET: Registered protocol family 1
[    1.062501] Trying to unpack rootfs image as initramfs...
[    1.546098] Freeing initrd memory: 8104K
[    1.550359] CPU PMU: probing PMU on CPU 1
[    1.554329] hw perfevents: enabled with ARMv7 Cortex-A9 PMU driver, 7 counters available
[    1.564970] I-pipe: head domain Xenomai registered.
[    1.569847] Xenomai: hal/arm started.
[    1.574053] Xenomai: scheduling class idle registered.
[    1.579135] Xenomai: scheduling class rt registered.
[    1.594882] Xenomai: real-time nucleus v2.6.3 (Lies and Truths) loaded.
[    1.601448] Xenomai: debug mode enabled.
[    1.606433] Xenomai: starting native API services.
[    1.611184] Xenomai: starting POSIX services.
[    1.615718] Xenomai: starting RTDM services.
[    1.620615] bounce pool size: 64 pages
[    1.641851] msgmni has been set to 1439
[    1.646805] io scheduler noop registered
[    1.650681] io scheduler deadline registered
[    1.655444] io scheduler cfq registered (default)
[    1.674662] dma-pl330 dma-pl330.0: Loaded driver for PL330 DMAC-267056
[    1.681150] dma-pl330 dma-pl330.0: 	DBUFF-32x4bytes Num_Chans-8 Num_Peri-32 Num_Events-32
[    1.697949] dma-pl330 dma-pl330.1: Loaded driver for PL330 DMAC-267056
[    1.704432] dma-pl330 dma-pl330.1: 	DBUFF-32x4bytes Num_Chans-8 Num_Peri-32 Num_Events-32
[    1.715149] dma-pl330 dma-pl330.2: Loaded driver for PL330 DMAC-267056
[    1.721630] dma-pl330 dma-pl330.2: 	DBUFF-64x8bytes Num_Chans-8 Num_Peri-1 Num_Events-32
[    1.886846] Serial: 8250/16550 driver, 4 ports, IRQ sharing disabled
[    1.895499] exynos4210-uart.0: ttySAC0 at MMIO 0x13800000 (irq = 84) is a S3C6400/10
[    1.903708] exynos4210-uart.1: ttySAC1 at MMIO 0x13810000 (irq = 85) is a S3C6400/10
[    1.911445] console [ttySAC1] enabled, bootconsole disabled
[    1.923304] exynos4210-uart.2: ttySAC2 at MMIO 0x13820000 (irq = 86) is a S3C6400/10
[    1.925900] exynos4210-uart.3: ttySAC3 at MMIO 0x13830000 (irq = 87) is a S3C6400/10
[    1.934237] [drm] Initialized drm 1.1.0 20060810
[    1.937745] Mali<2>: Inserting Mali v19 device driver. 
[    1.942949] Mali<2>: Compiled: Sep 25 2014, time: 11:01:49.
[    1.948493] Mali<2>: Driver revision: 
[    1.952201] Mali<2>: mali_module_init() registering driver
[    1.957733] Mali<2>: mali_probe(): Called for platform device mali-utgard
[    1.964459] Mali<2>: Memory system initializing
[    1.968963] Mali<2>: Using device defined memory settings (dedicated: 0x00000000@0x00000000, shared: 0x10000000)
[    1.979115] Mali<2>: Mali OS memory allocator created with max allocation size of 0x10000000 bytes, cpu_usage_adjust 0x00000000
[    1.990562] Mali<2>: Using device defined frame buffer settings (0x00000000@0x00000000)
[    1.998558] mali-utgard mali-utgard.0: start latency exceeded, new value 459 ns
[    2.005831] mali-utgard mali-utgard.0: state restore latency exceeded, new value 3584 ns
[    2.013902] Mali<2>: Mali PP: Creating Mali PP core: Mali_PP0
[    2.019605] Mali<2>: Mali PP: Base address of PP core: 0x13008000
[    2.025746] Mali<2>: Found Mali GPU Mali-400 MP r1p1
[    2.032018] Mali<2>: Mali L2 cache: Creating Mali L2 cache: Mali_L2
[    2.036881] Mali<2>: Mali MMU: Creating Mali MMU: Mali_GP_MMU
[    2.042608] Mali<2>: Mali GP: Creating Mali GP core: Mali_GP
[    2.048269] Mali<2>: Mali MMU: Creating Mali MMU: Mali_PP0_MMU
[    2.054083] Mali<2>: Mali PP: Creating Mali PP core: Mali_PP0
[    2.059761] Mali<2>: Mali PP: Base address of PP core: 0x13008000
[    2.065875] Mali<2>: Mali MMU: Creating Mali MMU: Mali_PP1_MMU
[    2.071656] Mali<2>: Mali PP: Creating Mali PP core: Mali_PP1
[    2.077347] Mali<2>: Mali PP: Base address of PP core: 0x1300a000
[    2.083480] Mali<2>: Mali MMU: Creating Mali MMU: Mali_PP2_MMU
[    2.089280] Mali<2>: Mali PP: Creating Mali PP core: Mali_PP2
[    2.094971] Mali<2>: Mali PP: Base address of PP core: 0x1300c000
[    2.101085] Mali<2>: Mali MMU: Creating Mali MMU: Mali_PP3_MMU
[    2.106869] Mali<2>: Mali PP: Creating Mali PP core: Mali_PP3
[    2.112559] Mali<2>: Mali PP: Base address of PP core: 0x1300e000
[    2.118687] Mali<2>: 4+0 PP cores initialized
[    2.122992] Mali<2>: Mali GPU Utilization: No utilization handler installed
[    2.129964] Mali<2>: = clk_set_rate : 533 , 1000000 
[    2.135260] Mali<2>: = clk_get_rate: 533 
[    2.138855] Mali: init_mali_clock mali_clock c060cff0 
[    2.144709] Mali<1>: = regulator_enable -> use cnt: 1 
[    2.149100] Mali<2>: = regulator_set_voltage: 1125000, 1125000 
[    2.158157] Mali<1>: = regulator_get_voltage: 1125000 
[    2.160086] Mali<2>: MALI Clock is set at mali driver
[    2.166180] Mali<2>: mali_probe(): Successfully initialized driver for platform device mali-utgard
[    2.174104] mali-utgard mali-utgard.0: stop latency exceeded, new value 542 ns
[    2.181305] mali-utgard mali-utgard.0: state save latency exceeded, new value 7917 ns
[    2.189546] Mali: Mali device driver loaded
[    2.193752] UMP: UMP device driver  loaded
[    2.211508] brd: module loaded
[    2.219095] loop: module loaded
[    2.219324] ehci_hcd: USB 2.0 'Enhanced' Host Controller (EHCI) Driver
[    2.223369] s5p-ehci s5p-ehci: S5P EHCI Host Controller
[    2.228378] s5p-ehci s5p-ehci: new USB bus registered, assigned bus number 1
[    2.235888] s5p-ehci s5p-ehci: irq 102, io mem 0x12580000
[    2.247687] s5p-ehci s5p-ehci: USB 2.0 started, EHCI 1.00
[    2.247876] usb usb1: New USB device found, idVendor=1d6b, idProduct=0002
[    2.254256] usb usb1: New USB device strings: Mfr=3, Product=2, SerialNumber=1
[    2.261446] usb usb1: Product: S5P EHCI Host Controller
[    2.266651] usb usb1: Manufacturer: Linux 3.8.13.11-xen ehci_hcd
[    2.272630] usb usb1: SerialNumber: s5p-ehci
[    2.277865] hub 1-0:1.0: USB hub found
[    2.280598] hub 1-0:1.0: 3 ports detected
[    2.285275] ohci_hcd: USB 1.1 'Open' Host Controller (OHCI) Driver
[    2.290877] exynos-ohci exynos-ohci: Already power on PHY
[    2.296139] exynos-ohci exynos-ohci: EXYNOS OHCI Host Controller
[    2.302151] exynos-ohci exynos-ohci: new USB bus registered, assigned bus number 2
[    2.309704] exynos-ohci exynos-ohci: irq 102, io mem 0x12590000
[    2.371786] usb usb2: New USB device found, idVendor=1d6b, idProduct=0001
[    2.372965] usb usb2: New USB device strings: Mfr=3, Product=2, SerialNumber=1
[    2.380162] usb usb2: Product: EXYNOS OHCI Host Controller
[    2.385629] usb usb2: Manufacturer: Linux 3.8.13.11-xen ohci_hcd
[    2.391608] usb usb2: SerialNumber: exynos-ohci
[    2.397071] hub 2-0:1.0: USB hub found
[    2.399868] hub 2-0:1.0: 3 ports detected
[    2.509141] usb3503 0-0008: CFG1 failed (-111)
[    2.509206] usb3503 0-0008: usb3503_probe: probed on  hub mode
[    2.514564] s3c-rtc s3c64xx-rtc: rtc core: registered s3c as rtc0
[    2.527282] s5p-fimc exynos4-fimc.0: sclk_fimc rate is 176000000
[    2.527744] s5p-fimc exynos4-fimc.0: start latency exceeded, new value 500 ns
[    2.534821] s5p-fimc exynos4-fimc.0: state restore latency exceeded, new value 17833 ns
[    2.542819] s5p-fimc exynos4-fimc.0: stop latency exceeded, new value 333 ns
[    2.549831] s5p-fimc exynos4-fimc.0: state save latency exceeded, new value 3917 ns
[    2.552776] s5p-fimc exynos4-fimc.1: sclk_fimc rate is 176000000
[    2.563449] s5p-fimc exynos4-fimc.1: start latency exceeded, new value 292 ns
[    2.570576] s5p-fimc exynos4-fimc.1: state restore latency exceeded, new value 37500 ns
[    2.578542] s5p-fimc exynos4-fimc.1: stop latency exceeded, new value 333 ns
[    2.585544] s5p-fimc exynos4-fimc.1: state save latency exceeded, new value 3459 ns
[    2.587776] s5p-fimc exynos4-fimc.2: sclk_fimc rate is 176000000
[    2.599153] s5p-fimc exynos4-fimc.1: stop latency exceeded, new value 334 ns
[    2.602703] usb 1-2: new high-speed USB device number 2 using s5p-ehci
[    2.612704] s5p-fimc exynos4-fimc.2: start latency exceeded, new value 375 ns
[    2.619804] s5p-fimc exynos4-fimc.2: state restore latency exceeded, new value 16208 ns
[    2.627793] s5p-fimc exynos4-fimc.2: stop latency exceeded, new value 375 ns
[    2.634800] s5p-fimc exynos4-fimc.2: state save latency exceeded, new value 3333 ns
[    2.637773] s5p-fimc exynos4-fimc.3: sclk_fimc rate is 176000000
[    2.648426] s5p-fimc exynos4-fimc.3: start latency exceeded, new value 334 ns
[    2.655536] s5p-fimc exynos4-fimc.3: state restore latency exceeded, new value 15917 ns
[    2.663525] s5p-fimc exynos4-fimc.3: stop latency exceeded, new value 333 ns
[    2.670531] s5p-fimc exynos4-fimc.3: state save latency exceeded, new value 3292 ns
[    2.673696] s5p-fimc-md: Registered fimc.0.m2m as /dev/video0
[    2.674043] s5p-fimc-md: Registered fimc.0.capture as /dev/video1
[    2.674374] s5p-fimc-md: Registered fimc.1.m2m as /dev/video2
[    2.674713] s5p-fimc-md: Registered fimc.1.capture as /dev/video3
[    2.675047] s5p-fimc-md: Registered fimc.2.m2m as /dev/video4
[    2.675378] s5p-fimc-md: Registered fimc.2.capture as /dev/video5
[    2.675717] s5p-fimc-md: Registered fimc.3.m2m as /dev/video6
[    2.676049] s5p-fimc-md: Registered fimc.3.capture as /dev/video7
[    2.677718] sclk_mfc rate is: 220
[    2.678149] s5p-mfc s5p-mfc: decoder registered as /dev/video8
[    2.678482] s5p-mfc s5p-mfc: encoder registered as /dev/video9
[    2.685360] s5p-hdmiphy 8-0038: probe successful
[    2.685383] s5p-tv: Board is ODROID-X/X2/U2
[    2.685390] s5p-hdmi exynos4-hdmi: probe successful
[    2.685600] Samsung TV Mixer driver, (c) 2010-2011 Samsung Electronics Co., Ltd.
[    2.685600] 
[    2.696749] s5p-mixer s5p-mixer: probe start
[    2.696883] s5p-mixer s5p-mixer: resources acquired
[    2.696899] s5p-mixer s5p-mixer: added output 'S5P HDMI connector' from module 's5p-hdmi'
[    2.696907] s5p-mixer s5p-mixer: module s5p-sdo is missing
[    2.697307] s5p-mixer s5p-mixer: registered layer graph0 as /dev/video10
[    2.733038] usb 1-2: New USB device found, idVendor=0424, idProduct=9730
[    2.733046] usb 1-2: New USB device strings: Mfr=0, Product=0, SerialNumber=0
[    2.805908] fb0: registered frame buffer emulation for /dev/video10
[    2.805918] s5p-fimc exynos4-fimc.3: stop latency exceeded, new value 458 ns
[    2.819551] s5p-mixer s5p-mixer: registered layer graph1 as /dev/video11
[    2.826251] fb1: registered frame buffer emulation for /dev/video11
[    2.832453] s5p-mixer s5p-mixer: registered layer video0 as /dev/video12
[    2.839170] fb2: registered frame buffer emulation for /dev/video12
[    2.845029] s5p-mixer s5p-mixer: probe successful
[    2.850462] s5p-g2d s5p-g2d.0: device registered as /dev/video13
[    2.856675] Exynos: Kernel Thermal management registered
[    2.861364] sdhci: Secure Digital Host Controller Interface driver
[    2.867137] sdhci: Copyright(c) Pierre Ossman
[    2.871624] s3c-sdhci exynos4-sdhci.2: clock source 2: mmc_busclk.2 (440000000 Hz)
[    2.879107] mmc0: no vqmmc regulator found
[    2.883109] mmc0: no vmmc regulator found
[    2.917700] mmc0: SDHCI controller on samsung-hsmmc [exynos4-sdhci.2] using ADMA
[    2.921092] dw_mmc dw_mmc: Using internal DMA controller.
[    2.924907] mmc1: no vmmc regulator found
[    2.957711] DWMMC: Div 2 = 125
[    2.957767] mmc_host mmc1: Bus speed (slot 0) = 100000000Hz (slot req 400000Hz, actual 400000HZ div = 125)
[    2.957771] dw_mmc dw_mmc: Version ID is 240a
[    2.957780] dw_mmc dw_mmc: DW MMC controller at irq 109, 32 bit host data width, 128 deep fifo
[    2.964436] usbcore: registered new interface driver usbhid
[    2.964440] usbhid: USB HID core driver
[    2.987116] usb 1-3: new high-speed USB device number 3 using s5p-ehci
[    2.998123] max98090 1-0010: revision 0x43
[    3.006342] 	[MAX98090] max98090_set_record_main_mic(150)
[    3.009018] hkdk-snd-max89090 hkdk-snd-max89090:  max98090-aif1 <-> samsung-i2s.0 mapping ok
[    3.016522] hkdk-snd-max89090 hkdk-snd-max89090:  max98090-aif1 <-> samsung-i2s.0 mapping ok
[    3.025540] TCP: cubic registered
[    3.026232] NET: Registered protocol family 17
[    3.030860] VFP support v0.3: implementor 41 architecture 3 part 30 variant 9 rev 4
[    3.038405] ThumbEE CPU extension supported.
[    3.042624] Registering SWP/SWPB emulation handler
[    3.051490] LDO20 EMMC_IO_1V8: incomplete constraints, leaving on
[    3.055192] LDO10 VDD18_MIPI_1V8: incomplete constraints, leaving on
[    3.060745] DRM: mali_platform_drm_probe()
[    3.063832] mali_drm_init(), driver name: mali_drm, version 0.1
[    3.063843] mmc1: BKOPS_EN bit is not set
[    3.074299] DRM: mali_driver_load()
[    3.077187] [drm] Initialized mali_drm 0.1.0 20100520 on minor 0
[    3.083203] DWMMC: Div 1 = 1
[    3.085997] mmc_host mmc1: Bus speed (slot 0) = 100000000Hz (slot req 52000000Hz, actual 50000000HZ div = 1)
[    3.086005] s3c-rtc s3c64xx-rtc: setting system clock to 2014-09-30 04:07:31 UTC (1412050051)
[    3.089586] exynos4_dvfs_hotplug_init, max(2000000),min(0)
[    3.089663] ALSA device list:
[    3.089666]   #0: Built-in Audio
[    3.116392] mmc1: new high speed DDR MMC card at address 0001
[    3.116488] Freeing init memory: 192K
[    3.123026] usb 1-3: New USB device found, idVendor=0424, idProduct=3503
[    3.123031] usb 1-3: New USB device strings: Mfr=0, Product=0, SerialNumber=0
[    3.123751] hub 1-3:1.0: USB hub found
[    3.123896] hub 1-3:1.0: 3 ports detected
[    3.147241] mmcblk0: mmc1:0001 008G92 7.28 GiB 
[    3.151473] mmcblk0boot0: mmc1:0001 008G92 partition 1 4.00 MiB
[    3.157367] mmcblk0boot1: mmc1:0001 008G92 partition 2 4.00 MiB
[    3.163279] mmcblk0rpmb: mmc1:0001 008G92 partition 3 512 KiB
[    3.169982]  mmcblk0: p1 p2
[    3.173347]  mmcblk0boot1: unknown partition table
[    3.175720] udevd[1348]: starting version 175
[    3.182552]  mmcblk0boot0: unknown partition table
Loading, please wait...
[    3.736164] EXT4-fs (mmcblk0p2): mounted filesystem with ordered data mode. Opts: (null)
INIT: version 2.88 booting
[info] Using makefile-style concurrent boot in runlevel S.
[....] Starting the hotplug events dispatcher: udevd[    4.291195] udevd[1605]: starting version 175
[ ok .
[....] Synthesizing the initial hotplug events...[    4.534198] input: gpio-keys as /devices/platform/gpio-keys.0/input/input0
[    4.568213] s5p-g2d s5p-g2d.0: instance opened
[    4.568429] s5p-g2d s5p-g2d.0: instance closed
[    4.643526] s5p-fimc exynos4-fimc.2: state restore latency exceeded, new value 17917 ns
[    4.645996] s5p-fimc exynos4-fimc.2: stop latency exceeded, new value 625 ns
[    4.653105] s5p-fimc exynos4-fimc.2: state save latency exceeded, new value 51125 ns
[    4.659723] s5p-mfc s5p-mfc: start latency exceeded, new value 458 ns
[    4.659729] s5p-mfc s5p-mfc: state restore latency exceeded, new value 1583 ns
[    4.674301] s5p-fimc exynos4-fimc.2: stop latency exceeded, new value 792 ns
[    4.683166] s5p-fimc exynos4-fimc.1: start latency exceeded, new value 416 ns
[    4.683180] s5p-fimc exynos4-fimc.3: state restore latency exceeded, new value 19833 ns
[    4.683203] s5p-fimc exynos4-fimc.3: stop latency exceeded, new value 500 ns
[    4.685276] s5p-mixer s5p-mixer: start latency exceeded, new value 333 ns
[    4.685315] s5p-mixer s5p-mixer: state restore latency exceeded, new value 32583 ns
[    4.685511] s5p-mixer s5p-mixer: stop latency exceeded, new value 500 ns
[    4.685522] s5p-mixer s5p-mixer: start latency exceeded, new value 542 ns
[    4.685536] s5p-mixer s5p-mixer: state save latency exceeded, new value 9000 ns
[    4.708612] s5p-mixer s5p-mixer: stop latency exceeded, new value 584 ns
[    4.708636] s5p-mixer s5p-mixer: state save latency exceeded, new value 10000 ns
[    4.757199] s5p-fimc exynos4-fimc.1: state restore latency exceeded, new value 4574959 ns
[    4.766108] s5p-fimc exynos4-fimc.1: stop latency exceeded, new value 833 ns
[    4.767885] s5p-fimc exynos4-fimc.3: start latency exceeded, new value 541 ns
[    4.774933] s5p-fimc exynos4-fimc.3: state save latency exceeded, new value 25958 ns
[    4.782698] s5p-fimc exynos4-fimc.3: stop latency exceeded, new value 625 ns
[    4.790157] s5p-fimc exynos4-fimc.1: start latency exceeded, new value 708 ns
[    4.796778] s5p-fimc exynos4-fimc.1: state save latency exceeded, new value 6209 ns
[    4.807284] s5p-fimc exynos4-fimc.0: start latency exceeded, new value 1125 ns
[    4.811980] s5p-fimc exynos4-fimc.0: state save latency exceeded, new value 366375 ns
[    4.819422] s5p-fimc exynos4-fimc.0: stop latency exceeded, new value 1125 ns
[ ok done.
[    4.875524] smsc95xx v1.0.4
[    4.878837] smsc95xx_read_mac_addr : Can't file(/etc/smsc95xx_mac_addr) create!!
[    4.880623] smsc95xx 1-2:1.0 (unregistered net_device): Failed to write /etc/smsc95xx_mac_addr file!
[....] Waiting for /dev to be fully populated...[    4.942973] smsc95xx 1-2:1.0 eth0: register 'smsc95xx' at usb-s5p-ehci-2, smsc95xx USB 2.0 Ethernet, 5e:ae:5c:c6:13:e1
[    4.948187] usbcore: registered new interface driver smsc95xx
[    5.115044] s5p_mfc_alloc_and_load_firmware:44: Firmware is not present in the /lib/firmware directory nor compiled in kernel
[    5.120762] s5p-mfc s5p-mfc: stop latency exceeded, new value 708 ns
[    5.127126] s5p-mfc s5p-mfc: state save latency exceeded, new value 1583 ns
[    5.135297] s5p-mfc s5p-mfc: state restore latency exceeded, new value 2292 ns
[    5.148422] s5p_mfc_alloc_and_load_firmware:44: Firmware is not present in the /lib/firmware directory nor compiled in kernel
[    5.154141] s5p-mfc s5p-mfc: stop latency exceeded, new value 917 ns
[ ok done.
[....] Setting parameters of disc: (none)[ ok .
[....] Activating swap...[ ok done.
[    5.505059] EXT4-fs (mmcblk0p2): re-mounted. Opts: (null)
[....] Cleaning up temporary files... /tmp[ ok .
[....] Activating lvm and md swap...[ ok done.
[....] Checking file systems...fsck from util-linux 2.20.1
dosfsck 3.0.13, 30 Jun 2012, FAT32, LFN
/dev/mmcblk0p1: 52 files, 16645/32857 clusters
[ ok done.
[....] Mounting local filesystems...[ ok done.
[....] Activating swapfile swap...[ ok done.
[....] Cleaning up temporary files...[ ok .
[....] Setting kernel variables ...[ ok done.
[....] Configuring network interfaces...[    8.717522] smsc95xx 1-2:1.0 eth0: link up, 100Mbps, full-duplex, lpa 0xC5E1
[    9.010862] 	[MAX98090] max98090_set_playback_speaker_headset(111)
Starting rpcbind daemon...rpcbind: cannot create socket for udp6
rpcbind: cannot create socket for tcp6
.
Starting NFS common utilities: statd[    9.740247] RPC: Registered named UNIX socket transport module.
[    9.740534] RPC: Registered udp transport module.
[    9.745227] RPC: Registered tcp transport module.
[    9.749907] RPC: Registered tcp NFSv4.1 backchannel transport module.
[   10.088083] Installing knfsd (copyright (C) 1996 okir@monad.swb.de).
 idmapd.
[   10.271807] NFS: Registering the id_resolver key type
[   10.271901] Key type id_resolver registered
[   10.275402] Key type id_legacy registered
ifup: interface eth0 already configured
[ ok done.
[....] Starting rpcbind daemon...[....] Already running.[ ok .
[....] Starting NFS common utilities: statd idmapd[ ok .
[....] Cleaning up temporary files...[ ok .
[....] Setting up X socket directories... /tmp/.X11-unix /tmp/.ICE-unix[ ok .
INIT: Entering runlevel: 2
[info] Using makefile-style concurrent boot in runlevel 2.
[....] Starting rpcbind daemon...[....] Already running.[ ok .
[....] Starting enhanced syslogd: rsyslogd[ ok .
[....] Starting NFS common utilities: statd idmapd[ ok .
[....] Starting ACPI services...RTNETLINK1 answers: No such file or directory
acpid: error talking to the kernel via netlink
[ ok .
[....] Starting periodic command scheduler: cron[ ok .
[....] Starting NTP server: ntpd[ ok .
[....] Starting OpenBSD Secure Shell server: sshd[ ok .
[....] Starting system message bus: dbus[ ok .
[....] Starting Avahi mDNS/DNS-SD Daemon: avahi-daemon[ ok .
saned disabled; edit /etc/default/saned
[   73.229506] ps invoked oom-killer: gfp_mask=0x800d0, order=0, oom_score_adj=0
[   73.231032] ps cpuset=/ mems_allowed=0
[   73.238894] [<c0015528>] (unwind_backtrace+0x0/0xf8) from [<c043e300>] (dump_header.isra.9+0x68/0x17c)
[   73.244060] [<c043e300>] (dump_header.isra.9+0x68/0x17c) from [<c01277e8>] (oom_kill_process+0x25c/0x3ac)
[   73.253580] [<c01277e8>] (oom_kill_process+0x25c/0x3ac) from [<c0127d6c>] (out_of_memory+0x26c/0x2b0)
[   73.262764] [<c0127d6c>] (out_of_memory+0x26c/0x2b0) from [<c012ba00>] (__alloc_pages_nodemask+0x904/0x928)
[   73.272476] [<c012ba00>] (__alloc_pages_nodemask+0x904/0x928) from [<c012ba34>] (__get_free_pages+0x10/0x24)
[   73.282277] [<c012ba34>] (__get_free_pages+0x10/0x24) from [<c01a7c54>] (proc_info_read+0x40/0xdc)
[   73.291212] [<c01a7c54>] (proc_info_read+0x40/0xdc) from [<c015d654>] (vfs_read+0x9c/0x140)
[   73.299536] [<c015d654>] (vfs_read+0x9c/0x140) from [<c015d734>] (sys_read+0x3c/0x70)
[   73.307342] [<c015d734>] (sys_read+0x3c/0x70) from [<c000e9c0>] (ret_fast_syscall+0x0/0x34)
[   73.315660] Mem-info:
[   73.317895] Normal per-cpu:
[   73.320679] CPU    0: hi:  186, btch:  31 usd:  52
[   73.332481] CPU    1: hi:  186, btch:  31 usd:  69
[   73.332538] CPU    2: hi:  186, btch:  31 usd: 156
[   73.336416] CPU    3: hi:  186, btch:  31 usd:  47
[   73.341186] HighMem per-cpu:
[   73.349817] CPU    0: hi:  186, btch:  31 usd:  74
[   73.349876] CPU    1: hi:  186, btch:  31 usd: 163
[   73.353759] CPU    2: hi:  186, btch:  31 usd:  68
[   73.358520] CPU    3: hi:  186, btch:  31 usd: 167
[   73.363295] active_anon:3029 inactive_anon:80 isolated_anon:0
[   73.363295]  active_file:914 inactive_file:26899 isolated_file:0
[   73.363295]  unevictable:6034 dirty:0 writeback:21618 unstable:3382
[   73.363295]  free:365186 slab_reclaimable:21851 slab_unreclaimable:78307
[   73.363295]  mapped:1961 shmem:85 pagetables:219 bounce:0
[   73.363295]  free_cma:65274
[   73.396106] Normal free:293796kB min:32768kB low:40960kB high:49152kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:136kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:772160kB managed:466568kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:87404kB slab_unreclaimable:313228kB kernel_stack:35392kB pagetables:876kB unstable:0kB bounce:0kB free_cma:261096kB writeback_tmp:0kB pages_scanned:8872 all_unreclaimable? yes
[   73.437905] lowmem_reserve[]: 0 10215 10215
[   73.443954] HighMem free:1166948kB min:512kB low:14384kB high:28256kB active_anon:12116kB inactive_anon:320kB active_file:3660kB inactive_file:107460kB unevictable:24136kB isolated(anon):0kB isolated(file):0kB present:1307592kB managed:1317888kB mlocked:24136kB dirty:0kB writeback:86472kB mapped:7844kB shmem:340kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:13528kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
[   73.484835] lowmem_reserve[]: 0 0 0
[   73.488243] Normal: 43*4kB (UM) 35*8kB (UMC) 26*16kB (M) 19*32kB (UMC) 14*64kB (UMC) 12*128kB (MC) 10*256kB (EMC) 11*512kB (UEMC) 5*1024kB (EM) 5*2048kB (MC) 65*4096kB (RC) = 293700kB
[   73.504558] HighMem: 1553*4kB (UM) 690*8kB (UM) 317*16kB (UM) 180*32kB (UM) 93*64kB (UM) 30*128kB (UM) 24*256kB (UM) 10*512kB (UM) 3*1024kB (UM) 3*2048kB (UM) 272*4096kB (UMR) = 1166948kB
[   73.521218] 28319 total pagecache pages
[   73.526089] 0 pages in swap cache
[   73.530070] Swap cache stats: add 0, delete 0, find 0/0
[   73.533545] Free swap  = 0kB
[   73.536388] Total swap = 0kB
[   73.625353] 1048064 pages of RAM
[   73.625403] 741072 free pages
[   73.625915] 20640 reserved pages
[   73.630964] 171398 slab pages
[   73.632058] 592340 pages shared
[   73.635170] 0 pages swap cached
[   73.638314] [ pid ]   uid  tgid total_vm      rss nr_ptes swapents oom_score_adj name
[   73.646141] [ 1605]     0  1605      639      318       4        0         -1000 udevd
[   73.660618] [ 2920]     0  2920      473      194       3        0             0 rpcbind
[   73.663075] [ 2933]   103  2933      551      280       4        0             0 rpc.statd
[   73.680625] [ 2952]     0  2952      571      175       3        0             0 rpc.idmapd
[   73.683340] [ 3259]     0  3259     6879      374       6        0             0 rsyslogd
[   73.701140] [ 3383]     0  3383      333      132       3        0             0 acpid
[   73.706415] [ 3397]     0  3397      463      176       3        0             0 cron
[   73.716297] [ 3400]   101  3400     1172      391       5        0             0 ntpd
[   73.723648] [ 3434]     0  3434     1290      248       4        0         -1000 sshd
[   73.730796] [ 3456]   104  3456      646      225       4        0             0 dbus-daemon
[   73.742246] [ 3500]   107  3500      715      351       4        0             0 avahi-daemon
[   73.747988] [ 3501]   107  3501      690      125       4        0             0 avahi-daemon
[   73.754186] [ 3530]     0  3530      460      180       3        0             0 getty
[   73.763341] [ 3531]     0  3531      460      180       3        0             0 getty
[   73.773442] [ 3532]     0  3532      460      180       3        0             0 getty
[   73.780235] [ 3533]     0  3533      460      180       4        0             0 getty
[   73.785320] [ 3534]     0  3534      460      180       4        0             0 getty
[   73.794893] [ 3535]     0  3535      460      180       3        0             0 getty
[   73.805040] [ 3536]     0  3536      638      258       4        0         -1000 udevd
[   73.811796] [ 3537]     0  3537      638      258       4        0         -1000 udevd
[   73.817872] [ 3538]     0  3538     2177      715       6        0             0 sshd
[   73.831421] [ 3543]     0  3543     6866      817       9        0             0 console-kit-dae
[   73.837427] [ 3610]     0  3610     5549      635       7        0             0 polkitd
[   73.847766] [ 3616]  1000  3616     2440      642       7        0             0 sshd
[   73.851107] [ 3621]  1000  3621      680      407       3        0             0 bash
[   73.859448] [ 3632]  1000  3632      607      137       3        0             0 gpg-agent
[   73.876118] [ 3635]     0  3635      755      342       4        0             0 sudo
[   73.878362] [ 3636]     0  3636      370      129       3        0             0 xeno-regression
[   73.889993] [ 3637]     0  3637      545      268       3        0             0 bash
[   73.896706] [ 3638]     0  3638      341      115       3        0             0 dohell
[   73.904646] [ 3639]     0  3639     3078     3040       8        0             0 switchtest
[   73.911285] [ 3640]     0  3640     3142     3104       8        0             0 switchtest
[   73.929813] [ 3641]     0  3641      566      508       4        0             0 latency
[   73.936223] [ 3642]     0  3642      341       66       3        0             0 dohell
[   73.940301] [ 3643]     0  3643      341       67       3        0             0 dohell
[   73.951169] [ 3644]     0  3644      341       66       3        0             0 dohell
[   73.962422] [ 3646]     0  3646      413      129       4        0             0 dd
[   73.967453] [ 3648]     0  3648      341       66       3        0             0 dohell
[   73.979465] [ 3649]     0  3649      329       97       3        0             0 sleep
[   73.982835] [ 3651]     0  3651      595      281       4        0             0 ls
[   73.991322] [ 3969]  1000  3969      667      290       4        0             0 top
[   73.997213] [ 7660]     0  7660      666      382       4        0             0 dd
[   74.006496] [ 7848]     0  7848      341       17       3        0             0 dohell
[   74.012845] [11962]     0 11962      551      181       4        0             0 ps
[   74.022176] Out of memory: Kill process 3616 (sshd) score 1 or sacrifice child
[   74.031554] Killed process 3621 (bash) total-vm:2720kB, anon-rss:472kB, file-rss:1156kB
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0003-Exynos4-ipipe.patch
Type: text/x-patch
Size: 8044 bytes
Desc: not available
URL: <http://www.xenomai.org/pipermail/xenomai/attachments/20140930/a54442a2/attachment.bin>


* Re: [Xenomai] Switchtest failures on ODROIDU3
  2014-09-30 12:04   ` GP Orcullo
@ 2014-09-30 12:16     ` Gilles Chanteperdrix
  2014-09-30 23:32       ` GP Orcullo
  0 siblings, 1 reply; 46+ messages in thread
From: Gilles Chanteperdrix @ 2014-09-30 12:16 UTC
  To: GP Orcullo; +Cc: xenomai

On 09/30/2014 02:04 PM, GP Orcullo wrote:
> On Sep 30, 2014 7:30 PM, "Gilles Chanteperdrix" <
> gilles.chanteperdrix@xenomai.org> wrote:
>>
>> On 09/30/2014 07:31 AM, GP Orcullo wrote:
>>> Hi,
>>>
>>> Running the switchtest for extended periods (>10 mins) causes the
>>> machine to lockup.
>>>
>>> I'm running a modified xeno-regression-test which contains only the
>>> following tests:
>>>
>>> check_alive /usr/lib/xenomai/testsuite/switchtest
>>> check_alive /usr/lib/xenomai/testsuite/switchtest -s 1000
>>> check_alive /usr/lib/xenomai/testsuite/latency ${1+"$@"}
>>>
>>> The script is invoked with the following arguments:
>>>
>>> nohup sudo ./xeno-regression-test -l
>>> "/usr/lib/xenomai/testsuite/dohell -m /media/work 36000" -t 2 >
>>> /dev/null & top -d0.5
>>>
>>> The kernel dumps the OOPS information intermittently so it's difficult
>>> to diagnose the issue.
>>>
>>> Attached is the kernel config and the logfile.
>>
>> Ok, this is an exynos. Sorry, but I have never seen the patch for
>> exynos, so I do not know what is inside. You should direct your
>> questions to whoever provided you with this support.
> 
> I'm in the process of porting xenomai to run on exynos.
> 
> The ipipe-core-3.8.13-arm-3.patch applies cleanly to the 3.8.13.11 kernel
> used by the odroid U3 board.
> 
> Attached is the ipipe patch that I've made.
> 
> I was just wondering what would cause switchtest to fail. The error that I
> can see is that the system is running out of memory and I don't know
> exactly what is causing this.

Certainly not switchtest as it does not do any memory allocation.
However, the dohell script has a loop creating a large file and removing
it. So, could you try and run the dohell script with an unpatched kernel
and see if you have the error?

-- 
                                                                Gilles.



* Re: [Xenomai] Switchtest failures on ODROIDU3
  2014-09-30 12:16     ` Gilles Chanteperdrix
@ 2014-09-30 23:32       ` GP Orcullo
  2014-10-01  7:54         ` Gilles Chanteperdrix
  0 siblings, 1 reply; 46+ messages in thread
From: GP Orcullo @ 2014-09-30 23:32 UTC
  To: Gilles Chanteperdrix; +Cc: xenomai

On Sep 30, 2014 8:16 PM, "Gilles Chanteperdrix" <
gilles.chanteperdrix@xenomai.org> wrote:
>
> On 09/30/2014 02:04 PM, GP Orcullo wrote:
> > On Sep 30, 2014 7:30 PM, "Gilles Chanteperdrix" <
> > gilles.chanteperdrix@xenomai.org> wrote:
> >>
> >> On 09/30/2014 07:31 AM, GP Orcullo wrote:
> >>> Hi,
> >>>
> >>> Running the switchtest for extended periods (>10 mins) causes the
> >>> machine to lockup.
> >>>
> >>> I'm running a modified xeno-regression-test which contains only the
> >>> following tests:
> >>>
> >>> check_alive /usr/lib/xenomai/testsuite/switchtest
> >>> check_alive /usr/lib/xenomai/testsuite/switchtest -s 1000
> >>> check_alive /usr/lib/xenomai/testsuite/latency ${1+"$@"}
> >>>
> >>> The script is invoked with the following arguments:
> >>>
> >>> nohup sudo ./xeno-regression-test -l
> >>> "/usr/lib/xenomai/testsuite/dohell -m /media/work 36000" -t 2 >
> >>> /dev/null & top -d0.5
> >>>
> >>> The kernel dumps the OOPS information intermittently so it's difficult
> >>> to diagnose the issue.
> >>>
> >>> Attached is the kernel config and the logfile.
> >>
> >> Ok, this is an exynos. Sorry, but I have never seen the patch for
> >> exynos, so I do not know what is inside. You should direct your
> >> questions to whoever provided you with this support.
> >
> > I'm in the process of porting xenomai to run on exynos.
> >
> > The ipipe-core-3.8.13-arm-3.patch applies cleanly to the 3.8.13.11 kernel
> > used by the odroid U3 board.
> >
> > Attached is the ipipe patch that I've made.
> >
> > I was just wondering what would cause switchtest to fail. The error that I
> > can see is that the system is running out of memory and I don't know
> > exactly what is causing this.
>
> Certainly not switchtest as it does not do any memory allocation.
> However, the dohell script has a loop creating a large file and removing
> it. So, could you try and run the dohell script with an unpatched kernel
> and see if you have the error?
>

Running dohell on a patched and unpatched kernel doesn't trigger the lockup.

Running switchtest without dohell works OK.

What's the best way to tackle this issue?

> --
>                                                                 Gilles.


* Re: [Xenomai] Switchtest failures on ODROIDU3
  2014-09-30 23:32       ` GP Orcullo
@ 2014-10-01  7:54         ` Gilles Chanteperdrix
  2014-10-01  9:12           ` GP Orcullo
  0 siblings, 1 reply; 46+ messages in thread
From: Gilles Chanteperdrix @ 2014-10-01  7:54 UTC
  To: GP Orcullo; +Cc: xenomai

On 10/01/2014 01:32 AM, GP Orcullo wrote:
> On Sep 30, 2014 8:16 PM, "Gilles Chanteperdrix" <
> gilles.chanteperdrix@xenomai.org> wrote:
>>
>> On 09/30/2014 02:04 PM, GP Orcullo wrote:
>>> On Sep 30, 2014 7:30 PM, "Gilles Chanteperdrix" <
>>> gilles.chanteperdrix@xenomai.org> wrote:
>>>>
>>>> On 09/30/2014 07:31 AM, GP Orcullo wrote:
>>>>> Hi,
>>>>>
>>>>> Running the switchtest for extended periods (>10 mins) causes the
>>>>> machine to lockup.
>>>>>
>>>>> I'm running a modified xeno-regression-test which contains only the
>>>>> following tests:
>>>>>
>>>>> check_alive /usr/lib/xenomai/testsuite/switchtest
>>>>> check_alive /usr/lib/xenomai/testsuite/switchtest -s 1000
>>>>> check_alive /usr/lib/xenomai/testsuite/latency ${1+"$@"}
>>>>>
>>>>> The script is invoked with the following arguments:
>>>>>
>>>>> nohup sudo ./xeno-regression-test -l
>>>>> "/usr/lib/xenomai/testsuite/dohell -m /media/work 36000" -t 2 >
>>>>> /dev/null & top -d0.5
>>>>>
>>>>> The kernel dumps the OOPS information intermittently so it's difficult
>>>>> to diagnose the issue.
>>>>>
>>>>> Attached is the kernel config and the logfile.
>>>>
>>>> Ok, this is an exynos. Sorry, but I have never seen the patch for
>>>> exynos, so I do not know what is inside. You should direct your
>>>> questions to whoever provided you with this support.
>>>
>>> I'm in the process of porting xenomai to run on exynos.
>>>
>>> The ipipe-core-3.8.13-arm-3.patch applies cleanly to the 3.8.13.11 kernel
>>> used by the odroid U3 board.
>>>
>>> Attached is the ipipe patch that I've made.
>>>
>>> I was just wondering what would cause switchtest to fail. The error that I
>>> can see is that the system is running out of memory and I don't know
>>> exactly what is causing this.
>>
>> Certainly not switchtest as it does not do any memory allocation.
>> However, the dohell script has a loop creating a large file and removing
>> it. So, could you try and run the dohell script with an unpatched kernel
>> and see if you have the error?
>>
> 
> Running dohell on a patched and unpatched kernel doesn't trigger the lockup.
> 
> Running switchtest without dohell works OK.

Is the problem a lockup, or an OOM?


-- 
                                                                Gilles.



* Re: [Xenomai] Switchtest failures on ODROIDU3
  2014-10-01  7:54         ` Gilles Chanteperdrix
@ 2014-10-01  9:12           ` GP Orcullo
  2014-10-01  9:20             ` Gilles Chanteperdrix
  0 siblings, 1 reply; 46+ messages in thread
From: GP Orcullo @ 2014-10-01  9:12 UTC
  To: Gilles Chanteperdrix; +Cc: xenomai

On Oct 1, 2014 3:54 PM, "Gilles Chanteperdrix" <
gilles.chanteperdrix@xenomai.org> wrote:
>
> On 10/01/2014 01:32 AM, GP Orcullo wrote:
> > On Sep 30, 2014 8:16 PM, "Gilles Chanteperdrix" <
> > gilles.chanteperdrix@xenomai.org> wrote:
> >>
> >> On 09/30/2014 02:04 PM, GP Orcullo wrote:
> >>> On Sep 30, 2014 7:30 PM, "Gilles Chanteperdrix" <
> >>> gilles.chanteperdrix@xenomai.org> wrote:
> >>>>
> >>>> On 09/30/2014 07:31 AM, GP Orcullo wrote:
> >>>>> Hi,
> >>>>>
> >>>>> Running the switchtest for extended periods (>10 mins) causes the
> >>>>> machine to lockup.
> >>>>>
> >>>>> I'm running a modified xeno-regression-test which contains only the
> >>>>> following tests:
> >>>>>
> >>>>> check_alive /usr/lib/xenomai/testsuite/switchtest
> >>>>> check_alive /usr/lib/xenomai/testsuite/switchtest -s 1000
> >>>>> check_alive /usr/lib/xenomai/testsuite/latency ${1+"$@"}
> >>>>>
> >>>>> The script is invoked with the following arguments:
> >>>>>
> >>>>> nohup sudo ./xeno-regression-test -l
> >>>>> "/usr/lib/xenomai/testsuite/dohell -m /media/work 36000" -t 2 >
> >>>>> /dev/null & top -d0.5
> >>>>>
> >>>>> The kernel dumps the OOPS information intermittently so it's difficult
> >>>>> to diagnose the issue.
> >>>>>
> >>>>> Attached is the kernel config and the logfile.
> >>>>
> >>>> Ok, this is an exynos. Sorry, but I have never seen the patch for
> >>>> exynos, so I do not know what is inside. You should direct your
> >>>> questions to whoever provided you with this support.
> >>>
> >>> I'm in the process of porting xenomai to run on exynos.
> >>>
> >>> The ipipe-core-3.8.13-arm-3.patch applies cleanly to the 3.8.13.11 kernel
> >>> used by the odroid U3 board.
> >>>
> >>> Attached is the ipipe patch that I've made.
> >>>
> >>> I was just wondering what would cause switchtest to fail. The error that I
> >>> can see is that the system is running out of memory and I don't know
> >>> exactly what is causing this.
> >>
> >> Certainly not switchtest as it does not do any memory allocation.
> >> However, the dohell script has a loop creating a large file and removing
> >> it. So, could you try and run the dohell script with an unpatched kernel
> >> and see if you have the error?
> >>
> >
> > Running dohell on a patched and unpatched kernel doesn't trigger the lockup.
> >
> > Running switchtest without dohell works OK.
>
> Is the problem a lockup, or an OOM?
>

It's a lockup.

The OOM message is the only one that I've captured so far.  Most of the
time the kernel doesn't spew any messages before the lockup.

The lockups are repeatable but generating any error messages isn't.

>
> --
>                                                                 Gilles.


* Re: [Xenomai] Switchtest failures on ODROIDU3
  2014-10-01  9:12           ` GP Orcullo
@ 2014-10-01  9:20             ` Gilles Chanteperdrix
  2014-10-02 13:27               ` GP Orcullo
  0 siblings, 1 reply; 46+ messages in thread
From: Gilles Chanteperdrix @ 2014-10-01  9:20 UTC
  To: GP Orcullo; +Cc: xenomai

On 10/01/2014 11:12 AM, GP Orcullo wrote:
> On Oct 1, 2014 3:54 PM, "Gilles Chanteperdrix" <
> gilles.chanteperdrix@xenomai.org> wrote:
>>
>> On 10/01/2014 01:32 AM, GP Orcullo wrote:
>>> On Sep 30, 2014 8:16 PM, "Gilles Chanteperdrix" <
>>> gilles.chanteperdrix@xenomai.org> wrote:
>>>>
>>>> On 09/30/2014 02:04 PM, GP Orcullo wrote:
>>>>> On Sep 30, 2014 7:30 PM, "Gilles Chanteperdrix" <
>>>>> gilles.chanteperdrix@xenomai.org> wrote:
>>>>>>
>>>>>> On 09/30/2014 07:31 AM, GP Orcullo wrote:
>>>>>>> Hi,
>>>>>>>
>>>>>>> Running the switchtest for extended periods (>10 mins) causes the
>>>>>>> machine to lockup.
>>>>>>>
>>>>>>> I'm running a modified xeno-regression-test which contains only the
>>>>>>> following tests:
>>>>>>>
>>>>>>> check_alive /usr/lib/xenomai/testsuite/switchtest
>>>>>>> check_alive /usr/lib/xenomai/testsuite/switchtest -s 1000
>>>>>>> check_alive /usr/lib/xenomai/testsuite/latency ${1+"$@"}
>>>>>>>
>>>>>>> The script is invoked with the following arguments:
>>>>>>>
>>>>>>> nohup sudo ./xeno-regression-test -l
>>>>>>> "/usr/lib/xenomai/testsuite/dohell -m /media/work 36000" -t 2 >
>>>>>>> /dev/null & top -d0.5
>>>>>>>
>>>>>>> The kernel dumps the OOPS information intermittently so it's difficult
>>>>>>> to diagnose the issue.
>>>>>>>
>>>>>>> Attached is the kernel config and the logfile.
>>>>>>
>>>>>> Ok, this is an exynos. Sorry, but I have never seen the patch for
>>>>>> exynos, so I do not know what is inside. You should direct your
>>>>>> questions to whoever provided you with this support.
>>>>>
>>>>> I'm in the process of porting xenomai to run on exynos.
>>>>>
>>>>> The ipipe-core-3.8.13-arm-3.patch applies cleanly to the 3.8.13.11 kernel
>>>>> used by the odroid U3 board.
>>>>>
>>>>> Attached is the ipipe patch that I've made.
>>>>>
>>>>> I was just wondering what would cause switchtest to fail. The error that I
>>>>> can see is that the system is running out of memory and I don't know
>>>>> exactly what is causing this.
>>>>
>>>> Certainly not switchtest as it does not do any memory allocation.
>>>> However, the dohell script has a loop creating a large file and removing
>>>> it. So, could you try and run the dohell script with an unpatched kernel
>>>> and see if you have the error?
>>>>
>>>
>>> Running dohell on a patched and unpatched kernel doesn't trigger the lockup.
>>>
>>> Running switchtest without dohell works OK.
>>
>> Is the problem a lockup, or an OOM?
>>
> 
> It's a lockup.
> 
> The OOM message is the only one that I've captured so far.  Most of the
> time the kernel doesn't spew any messages before the lockup.
> 
> The lockups are repeatable but generating any error messages isn't.

Are you running the tests on the serial console, or with ssh? Do you
have unlocked context switch enabled? Have you tried enabling some debug
options?

Also note that xeno-regression-test puts the system under a lot of
stress, so it may happen that there is no output for some time (several
minutes), normally the test should stop by itself if there is no output
for something like 30 minutes. So, I would recommend not redirecting
xeno-test output to see if there is any error before the lockup, and
when you see the lockup, leave the system for 30 minutes to see if it
does not restart or if xeno-regression-test can exit gracefully.

-- 
                                                                Gilles.



* Re: [Xenomai] Switchtest failures on ODROIDU3
  2014-10-01  9:20             ` Gilles Chanteperdrix
@ 2014-10-02 13:27               ` GP Orcullo
  2014-10-02 13:36                 ` Gilles Chanteperdrix
  0 siblings, 1 reply; 46+ messages in thread
From: GP Orcullo @ 2014-10-02 13:27 UTC
  To: Gilles Chanteperdrix; +Cc: xenomai

On Wed, Oct 1, 2014 at 5:20 PM, Gilles Chanteperdrix
<gilles.chanteperdrix@xenomai.org> wrote:
> On 10/01/2014 11:12 AM, GP Orcullo wrote:
>> On Oct 1, 2014 3:54 PM, "Gilles Chanteperdrix" <
>> gilles.chanteperdrix@xenomai.org> wrote:
>>>
>>> On 10/01/2014 01:32 AM, GP Orcullo wrote:
>>>> On Sep 30, 2014 8:16 PM, "Gilles Chanteperdrix" <
>>>> gilles.chanteperdrix@xenomai.org> wrote:
>>>>>
>>>>> On 09/30/2014 02:04 PM, GP Orcullo wrote:
>>>>>> On Sep 30, 2014 7:30 PM, "Gilles Chanteperdrix" <
>>>>>> gilles.chanteperdrix@xenomai.org> wrote:
>>>>>>>
>>>>>>> On 09/30/2014 07:31 AM, GP Orcullo wrote:
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> Running the switchtest for extended periods (>10 mins) causes the
>>>>>>>> machine to lockup.
>>>>>>>>
>>>>>>>> I'm running a modified xeno-regression-test which contains only the
>>>>>>>> following tests:
>>>>>>>>
>>>>>>>> check_alive /usr/lib/xenomai/testsuite/switchtest
>>>>>>>> check_alive /usr/lib/xenomai/testsuite/switchtest -s 1000
>>>>>>>> check_alive /usr/lib/xenomai/testsuite/latency ${1+"$@"}
>>>>>>>>
>>>>>>>> The script is invoked with the following arguments:
>>>>>>>>
>>>>>>>> nohup sudo ./xeno-regression-test -l
>>>>>>>> "/usr/lib/xenomai/testsuite/dohell -m /media/work 36000" -t 2 >
>>>>>>>> /dev/null & top -d0.5
>>>>>>>>
>>>>>>>> The kernel dumps the OOPS information intermittently so it's difficult
>>>>>>>> to diagnose the issue.
>>>>>>>>
>>>>>>>> Attached is the kernel config and the logfile.
>>>>>>>
>>>>>>> Ok, this is an exynos. Sorry, but I have never seen the patch for
>>>>>>> exynos, so I do not know what is inside. You should direct your
>>>>>>> questions to whoever provided you with this support.
>>>>>>
>>>>>> I'm in the process of porting xenomai to run on exynos.
>>>>>>
>>>>>> The ipipe-core-3.8.13-arm-3.patch applies cleanly to the 3.8.13.11 kernel
>>>>>> used by the odroid U3 board.
>>>>>>
>>>>>> Attached is the ipipe patch that I've made.
>>>>>>
>>>>>> I was just wondering what would cause switchtest to fail. The error that I
>>>>>> can see is that the system is running out of memory and I don't know
>>>>>> exactly what is causing this.
>>>>>
>>>>> Certainly not switchtest as it does not do any memory allocation.
>>>>> However, the dohell script has a loop creating a large file and removing
>>>>> it. So, could you try and run the dohell script with an unpatched kernel
>>>>> and see if you have the error?
>>>>>
>>>>
>>>> Running dohell on a patched and unpatched kernel doesn't trigger the lockup.
>>>>
>>>> Running switchtest without dohell works OK.
>>>
>>> Is the problem a lockup, or an OOM?
>>>
>>
>> It's a lockup.
>>
>> The OOM message is the only one that I've captured so far.  Most of the
>> time the kernel doesn't spew any messages before the lockup.
>>
>> The lockups are repeatable but generating any error messages isn't.
>
> Are you running the tests on the serial console, or with ssh? Do you
> have unlocked context switch enabled? Have you tried enabling some debug
> options?
>

I'm using the serial console to log the kernel messages and ssh to run
the command. Using only the serial console gives the same results.

Is this the unlocked context switch option: "CONFIG_XENO_HW_UNLOCKED_SWITCH=y"?

I will try playing with the debug options again and see if I can get
something useful.

> Also note that xeno-regression-test puts the system under a lot of
> stress, so it may happen that there is no output for some time (several
> minutes), normally the test should stop by itself if there is no output
> for something like 30 minutes. So, I would recommend not redirecting
> xeno-test output to see if there is any error before the lockup, and
> when you see the lockup, leave the system for 30 minutes to see if it
> does not restart or if xeno-regression-test can exit gracefully.
>

This is a total lockup: the heartbeat LED stops blinking when it occurs.

Attached is an error log that I captured previously, from a kernel with
CONFIG_CPU_IDLE enabled. I've lost track of which kernel this trace came
from, but maybe the error looks familiar.

> --
>                                                                 Gilles.

-- 
GP Orcullo
-------------- next part --------------
[ 4619.775000] Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
[ 4619.775000] [<c0014a94>] (unwind_backtrace+0x0/0xf8) from [<c02fd710>] (panic+0x8c/0x1e4)
[ 4619.775000] [<c02fd710>] (panic+0x8c/0x1e4) from [<c0028fa8>] (do_exit+0x770/0x828)
[ 4619.775000] [<c0028fa8>] (do_exit+0x770/0x828) from [<c00291ac>] (do_group_exit+0x3c/0xb0)
[ 4619.775000] [<c00291ac>] (do_group_exit+0x3c/0xb0) from [<c0032f00>] (get_signal_to_deliver+0x1c4/0x530)
[ 4619.775000] [<c0032f00>] (get_signal_to_deliver+0x1c4/0x530) from [<c001125c>] (do_signal+0x7c/0x480)
[ 4619.775000] [<c001125c>] (do_signal+0x7c/0x480) from [<c0011afc>] (do_work_pending+0x68/0xa8)
[ 4619.775000] [<c0011afc>] (do_work_pending+0x68/0xa8) from [<c000e340>] (work_pending+0xc/0x20)
[ 4619.775000] CPU1: stopping
[ 4619.775000] [<c0014a94>] (unwind_backtrace+0x0/0xf8) from [<c0013438>] (handle_IPI+0x120/0x14c)
[ 4619.775000] [<c0013438>] (handle_IPI+0x120/0x14c) from [<c000855c>] (gic_handle_irq+0x60/0x68)
[ 4619.775000] [<c000855c>] (gic_handle_irq+0x60/0x68) from [<c000df00>] (__irq_svc+0x40/0x70)
[ 4619.775000] Exception stack(0xc2113d88 to 0xc2113dd0)
[ 4619.775000] 3d80:                   e6c60d10 0001ffff 00000001 00797474 e4089008 00797474
[ 4619.775000] 3da0: 00008eb2 e6c60d10 c2113e58 e6b8c025 00000003 c2113f00 ff555a5a c2113dd0
[ 4619.775000] 3dc0: c00d43c8 c00d43f0 60000053 ffffffff
[ 4619.775000] [<c000df00>] (__irq_svc+0x40/0x70) from [<c00d43f0>] (__d_lookup+0x64/0x178)
[ 4619.775000] [<c00d43f0>] (__d_lookup+0x64/0x178) from [<c00c8800>] (lookup_fast+0x140/0x27c)                                          
[ 4619.775000] [<c00c8800>] (lookup_fast+0x140/0x27c) from [<c00c97e8>] (link_path_walk+0x178/0x8a0)                                     
[ 4619.775000] [<c00c97e8>] (link_path_walk+0x178/0x8a0) from [<c00ca64c>] (path_lookupat+0x54/0x774)                                    
[ 4619.775000] [<c00ca64c>] (path_lookupat+0x54/0x774) from [<c00cad8c>] (filename_lookup+0x20/0x60)                                     
[ 4619.775000] [<c00cad8c>] (filename_lookup+0x20/0x60) from [<c00ccda8>] (user_path_at_empty+0x50/0x7c)                                 
[ 4619.775000] [<c00ccda8>] (user_path_at_empty+0x50/0x7c) from [<c00ccde8>] (user_path_at+0x14/0x1c)                                    
[ 4619.775000] [<c00ccde8>] (user_path_at+0x14/0x1c) from [<c00defec>] (sys_lgetxattr+0x30/0x80)                                         
[ 4619.775000] [<c00defec>] (sys_lgetxattr+0x30/0x80) from [<c000e300>] (ret_fast_syscall+0x0/0x30)                                      
[ 4619.775000] CPU3: stopping                                                                                                            
[ 4619.775000] [<c0014a94>] (unwind_backtrace+0x0/0xf8) from [<c0013438>] (handle_IPI+0x120/0x14c)                                       
[ 4619.775000] [<c0013438>] (handle_IPI+0x120/0x14c) from [<c000855c>] (gic_handle_irq+0x60/0x68)                                        
[ 4619.775000] [<c000855c>] (gic_handle_irq+0x60/0x68) from [<c000df00>] (__irq_svc+0x40/0x70)                                           
[ 4619.775000] Exception stack(0xe6777ee8 to 0xe6777f30)                                                                                 
[ 4619.775000] 7ee0:                   00000001 a0000053 c00c9244 00000000 e7001480 c23ed000                                             
[ 4619.775000] 7f00: 426905c3 c23e8000 000000d0 c23ed000 426905c7 000201f0 010b3000 e6777f30                                             
[ 4619.775000] 7f20: c00c9244 c00ba378 20000053 ffffffff                                                                                 
[ 4619.775000] [<c000df00>] (__irq_svc+0x40/0x70) from [<c00ba378>] (kmem_cache_alloc+0x6c/0xe4)                                         
[ 4619.775000] [<c00ba378>] (kmem_cache_alloc+0x6c/0xe4) from [<c00c9244>] (getname_flags+0x20/0x118)                                    
[ 4619.775000] [<c00c9244>] (getname_flags+0x20/0x118) from [<c00bf2d4>] (do_sys_open+0xb4/0x174)                                        
[ 4619.775000] [<c00bf2d4>] (do_sys_open+0xb4/0x174) from [<c000e300>] (ret_fast_syscall+0x0/0x30)                                       
[ 4619.775000] CPU2: stopping                                                                                                            
[ 4619.775000] [<c0014a94>] (unwind_backtrace+0x0/0xf8) from [<c0013438>] (handle_IPI+0x120/0x14c)                                       
[ 4619.775000] [<c0013438>] (handle_IPI+0x120/0x14c) from [<c000855c>] (gic_handle_irq+0x60/0x68)                                        
[ 4619.775000] [<c000855c>] (gic_handle_irq+0x60/0x68) from [<c000df00>] (__irq_svc+0x40/0x70)                                           
[ 4619.775000] Exception stack(0xe706ff40 to 0xe706ff88)                                                                                 
[ 4619.775000] ff40: e706ff88 3b9aca00 a3d560f0 00000433 a3a76c30 00000433 c14d83f0 00000000                                             
[ 4619.775000] ff60: c045a380 413fc090 c0454f90 00000000 00000018 e706ff88 c0057900 c0255b64                                             
[ 4619.775000] ff80: 60000053 ffffffff                                                                                                   
[ 4619.775000] [<c000df00>] (__irq_svc+0x40/0x70) from [<c0255b64>] (cpuidle_wrap_enter+0x48/0x94)                                       
[ 4619.775000] [<c0255b64>] (cpuidle_wrap_enter+0x48/0x94) from [<c025586c>] (cpuidle_enter_state+0x14/0x68)                             
[ 4619.775000] [<c025586c>] (cpuidle_enter_state+0x14/0x68) from [<c0255954>] (cpuidle_idle_call+0x94/0x100)                             
[ 4619.775000] [<c0255954>] (cpuidle_idle_call+0x94/0x100) from [<c000f5e0>] (cpu_idle+0x90/0xec)                                        
[ 4619.775000] [<c000f5e0>] (cpu_idle+0x90/0xec) from [<402fa928>] (0x402fa928)  

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [Xenomai] Switchtest failures on ODROIDU3
  2014-10-02 13:27               ` GP Orcullo
@ 2014-10-02 13:36                 ` Gilles Chanteperdrix
  2014-10-02 15:52                   ` GP Orcullo
  0 siblings, 1 reply; 46+ messages in thread
From: Gilles Chanteperdrix @ 2014-10-02 13:36 UTC (permalink / raw)
  To: GP Orcullo; +Cc: xenomai

On 10/02/2014 03:27 PM, GP Orcullo wrote:
> On Wed, Oct 1, 2014 at 5:20 PM, Gilles Chanteperdrix
> <gilles.chanteperdrix@xenomai.org> wrote:
>> On 10/01/2014 11:12 AM, GP Orcullo wrote:
>>> On Oct 1, 2014 3:54 PM, "Gilles Chanteperdrix" <
>>> gilles.chanteperdrix@xenomai.org> wrote:
>>>>
>>>> On 10/01/2014 01:32 AM, GP Orcullo wrote:
>>>>> On Sep 30, 2014 8:16 PM, "Gilles Chanteperdrix" <
>>>>> gilles.chanteperdrix@xenomai.org> wrote:
>>>>>>
>>>>>> On 09/30/2014 02:04 PM, GP Orcullo wrote:
>>>>>>> On Sep 30, 2014 7:30 PM, "Gilles Chanteperdrix" <
>>>>>>> gilles.chanteperdrix@xenomai.org> wrote:
>>>>>>>>
>>>>>>>> On 09/30/2014 07:31 AM, GP Orcullo wrote:
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> Running the switchtest for extended periods (>10 mins) causes the
>>>>>>>>> machine to lockup.
>>>>>>>>>
>>>>>>>>> I'm running a modified xeno-regression-test which contains only the
>>>>>>>>> following tests:
>>>>>>>>>
>>>>>>>>> check_alive /usr/lib/xenomai/testsuite/switchtest
>>>>>>>>> check_alive /usr/lib/xenomai/testsuite/switchtest -s 1000
>>>>>>>>> check_alive /usr/lib/xenomai/testsuite/latency ${1+"$@"}
>>>>>>>>>
>>>>>>>>> The script is invoked with the following arguments:
>>>>>>>>>
>>>>>>>>> nohup sudo ./xeno-regression-test -l
>>>>>>>>> "/usr/lib/xenomai/testsuite/dohell -m /media/work 36000" -t 2 >
>>>>>>>>> /dev/null & top -d0.5
>>>>>>>>>
>>>>>>>>> The kernel dumps the OOPS information intermittently so it's difficult
>>>>>>>>> to diagnose the issue.
>>>>>>>>>
>>>>>>>>> Attached is the kernel config and the logfile.
>>>>>>>>
>>>>>>>> Ok, this is an exynos. Sorry, but I have never seen the patch for
>>>>>>>> exynos, so I do not know what is inside. You should direct your
>>>>>>>> questions to whoever provided you with this support.
>>>>>>>
>>>>>>> I'm in the process of porting xenomai to run on exynos.
>>>>>>>
>>>>>>> The ipipe-core-3.8.13-arm-3.patch applies cleanly to the 3.8.13.11 kernel
>>>>>>> used by the odroid U3 board.
>>>>>>>
>>>>>>> Attached is the ipipe patch that I've made.
>>>>>>>
>>>>>>> I was just wondering what would cause switchtest to fail. The error that I
>>>>>>> can see is that the system is running out of memory and I don't know
>>>>>>> exactly what is causing this.
>>>>>>
>>>>>> Certainly not switchtest as it does not do any memory allocation.
>>>>>> However, the dohell script has a loop creating a large file and removing
>>>>>> it. So, could you try and run the dohell script with an unpatched kernel
>>>>>> and see if you have the error?
>>>>>>
>>>>>
>>>>> Running dohell on a patched and unpatched kernel doesn't trigger the
>>>>> lockup.
>>>>>
>>>>> Running switchtest without dohell works OK.
>>>>
>>>> Is the problem a lockup, or an OOM?
>>>>
>>>
>>> It's a lockup.
>>>
>>> The OOM message is the only one that I've captured so far.  Most of the
>>> time the kernel doesn't spew any messages before the lockup.
>>>
>>> The lockups are repeatable but generating any error messages isn't.
>>
>> Are you running the tests on the serial console, or with ssh? Do you
>> have unlocked context switch enabled? Have you tried enabling some debug
>> options?
>>
> 
> I'm using the serial console to log the kernel messages and ssh to run
> the command. Using purely the serial console has the same results.

The main point was to avoid redirecting standard error to /dev/null, so
that you can see any application error message. Doing this on the serial
console may be a better idea than over ssh, because you are less likely
to miss a message sent just before the system dies.

> 
> Is this the unlocked context switch option: "CONFIG_XENO_HW_UNLOCKED_SWITCH=y"?

Yes, please try to disable it if you have it enabled.

> 
> I will try playing again with the debug options and see if I can get
> something useful.
> 
>> Also note that xeno-regression-test puts the system under a lot of
>> stress, so it may happen that there is no output for some time (several
>> minutes), normally the test should stop by itself if there is no output
>> for something like 30 minutes. So, I would recommend not redirecting
>> xeno-test output to see if there is any error before the lockup, and
>> when you see the lockup, leave the system for 30 minutes to see if it
>> does not restart or if xeno-regression-test can exit gracefully.
>>
> 
> This is a total lockup. There's a heartbeat LED that stops blinking when it occurs.

Well, the heartbeat LED does not prove anything: some Linux kernel
activity can very well prevent it from being toggled. For instance, if
it is toggled by a thread and the activity hogging the kernel is a
softirq that never ends, the LED stops blinking even though the system
is not necessarily completely locked up.

> 
> Attached is an error log that I captured previously; this one had
> CONFIG_CPU_IDLE enabled. I've lost track of which kernel this trace
> came from, but maybe the error looks familiar.

This trace is missing an important piece of information: the reason for
the error. So, please capture the serial console to a file, and post the
complete file, from boot up to the error.

Anyway, you did not answer my question: did you try leaving the system
on for, say, 30 minutes or 1 hour after the lockup to see whether it
recovers?


-- 
                                                                Gilles.


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [Xenomai] Switchtest failures on ODROIDU3
  2014-10-02 13:36                 ` Gilles Chanteperdrix
@ 2014-10-02 15:52                   ` GP Orcullo
  2014-10-02 17:13                     ` Gilles Chanteperdrix
  0 siblings, 1 reply; 46+ messages in thread
From: GP Orcullo @ 2014-10-02 15:52 UTC (permalink / raw)
  To: Gilles Chanteperdrix; +Cc: xenomai

On Thu, Oct 2, 2014 at 9:36 PM, Gilles Chanteperdrix
<gilles.chanteperdrix@xenomai.org> wrote:
> On 10/02/2014 03:27 PM, GP Orcullo wrote:
>> On Wed, Oct 1, 2014 at 5:20 PM, Gilles Chanteperdrix
>> <gilles.chanteperdrix@xenomai.org> wrote:
>>> On 10/01/2014 11:12 AM, GP Orcullo wrote:
>>>> On Oct 1, 2014 3:54 PM, "Gilles Chanteperdrix" <
>>>> gilles.chanteperdrix@xenomai.org> wrote:
>>>>>
>>>>> On 10/01/2014 01:32 AM, GP Orcullo wrote:
>>>>>> On Sep 30, 2014 8:16 PM, "Gilles Chanteperdrix" <
>>>>>> gilles.chanteperdrix@xenomai.org> wrote:
>>>>>>>
>>>>>>> On 09/30/2014 02:04 PM, GP Orcullo wrote:
>>>>>>>> On Sep 30, 2014 7:30 PM, "Gilles Chanteperdrix" <
>>>>>>>> gilles.chanteperdrix@xenomai.org> wrote:
>>>>>>>>>
>>>>>>>>> On 09/30/2014 07:31 AM, GP Orcullo wrote:
>>>>>>>>>> Hi,
>>>>>>>>>>
>>>>>>>>>> Running the switchtest for extended periods (>10 mins) causes the
>>>>>>>>>> machine to lockup.
>>>>>>>>>>
>>>>>>>>>> I'm running a modified xeno-regression-test which contains only the
>>>>>>>>>> following tests:
>>>>>>>>>>
>>>>>>>>>> check_alive /usr/lib/xenomai/testsuite/switchtest
>>>>>>>>>> check_alive /usr/lib/xenomai/testsuite/switchtest -s 1000
>>>>>>>>>> check_alive /usr/lib/xenomai/testsuite/latency ${1+"$@"}
>>>>>>>>>>
>>>>>>>>>> The script is invoked with the following arguments:
>>>>>>>>>>
>>>>>>>>>> nohup sudo ./xeno-regression-test -l
>>>>>>>>>> "/usr/lib/xenomai/testsuite/dohell -m /media/work 36000" -t 2 >
>>>>>>>>>> /dev/null & top -d0.5
>>>>>>>>>>
>>>>>>>>>> The kernel dumps the OOPS information intermittently so it's difficult
>>>>>>>>>> to diagnose the issue.
>>>>>>>>>>
>>>>>>>>>> Attached is the kernel config and the logfile.
>>>>>>>>>
>>>>>>>>> Ok, this is an exynos. Sorry, but I have never seen the patch for
>>>>>>>>> exynos, so I do not know what is inside. You should direct your
>>>>>>>>> questions to whoever provided you with this support.
>>>>>>>>
>>>>>>>> I'm in the process of porting xenomai to run on exynos.
>>>>>>>>
>>>>>>>> The ipipe-core-3.8.13-arm-3.patch applies cleanly to the 3.8.13.11 kernel
>>>>>>>> used by the odroid U3 board.
>>>>>>>>
>>>>>>>> Attached is the ipipe patch that I've made.
>>>>>>>>
>>>>>>>> I was just wondering what would cause switchtest to fail. The error that I
>>>>>>>> can see is that the system is running out of memory and I don't know
>>>>>>>> exactly what is causing this.
>>>>>>>
>>>>>>> Certainly not switchtest as it does not do any memory allocation.
>>>>>>> However, the dohell script has a loop creating a large file and removing
>>>>>>> it. So, could you try and run the dohell script with an unpatched kernel
>>>>>>> and see if you have the error?
>>>>>>>
>>>>>>
>>>>>> Running dohell on a patched and unpatched kernel doesn't trigger the
>>>>>> lockup.
>>>>>>
>>>>>> Running switchtest without dohell works OK.
>>>>>
>>>>> Is the problem a lockup, or an OOM?
>>>>>
>>>>
>>>> It's a lockup.
>>>>
>>>> The OOM message is the only one that I've captured so far.  Most of the
>>>> time the kernel doesn't spew any messages before the lockup.
>>>>
>>>> The lockups are repeatable but generating any error messages isn't.
>>>
>>> Are you running the tests on the serial console, or with ssh? Do you
>>> have unlocked context switch enabled? Have you tried enabling some debug
>>> options?
>>>
>>
>> I'm using the serial console to log the kernel messages and ssh to run
>> the command. Using purely the serial console has the same results.
>
> The main point was to avoid redirecting standard error to /dev/null, so
> that you can see any application error message. Doing this on the serial
> console may be a better idea than over ssh, because you are less likely
> to miss a message sent just before the system dies.
>
>>
>> Is this the unlocked context switch option: "CONFIG_XENO_HW_UNLOCKED_SWITCH=y"?
>
> Yes, please try to disable it if you have it enabled.
>
>>
>> I will try playing again with the debug options and see if I can get
>> something useful.
>>
>>> Also note that xeno-regression-test puts the system under a lot of
>>> stress, so it may happen that there is no output for some time (several
>>> minutes), normally the test should stop by itself if there is no output
>>> for something like 30 minutes. So, I would recommend not redirecting
>>> xeno-test output to see if there is any error before the lockup, and
>>> when you see the lockup, leave the system for 30 minutes to see if it
>>> does not restart or if xeno-regression-test can exit gracefully.
>>>
>>
>> This is a total lockup. There's a heartbeat LED that stops blinking when it occurs.
>
> Well, the heartbeat LED does not prove anything: some Linux kernel
> activity can very well prevent it from being toggled. For instance, if
> it is toggled by a thread and the activity hogging the kernel is a
> softirq that never ends, the LED stops blinking even though the system
> is not necessarily completely locked up.
>
>>
>> Attached is an error log that I captured previously; this one had
>> CONFIG_CPU_IDLE enabled. I've lost track of which kernel this trace
>> came from, but maybe the error looks familiar.
>
> This trace is missing an important piece of information: the reason for
> the error. So, please capture the serial console to a file, and post the
> complete file, from boot up to the error.
>
> Anyway, you did not answer my question: did you try leaving the system
> on for, say, 30 minutes or 1 hour after the lockup to see whether it
> recovers?
>
>

The system never recovered.

With the unlocked context switch disabled, I was able to capture this error:

[  210.482299] INFO: rcu_preempt detected stalls on CPUs/tasks:)
[  210.487790] Task dump for CPU 2:
[  210.490995] switchtest      R running      0  3915   3639 0x00000002
[  210.497340] [<c0453ddc>] (__schedule+0x1fc/0x5f8) from [<00000010>] (0x10)
[  390.507943] INFO: rcu_preempt detected stalls on CPUs/tasks: { 2} (detected )
[  390.513510] Task dump for CPU 2:
[  390.516716] switchtest      R running      0  3915   3639 0x00000002
[  390.523065] [<c0453ddc>] (__schedule+0x1fc/0x5f8) from [<00000010>] (0x10)

<c0453ddc> points to the following section:

#ifndef __ARCH_WANT_UNLOCKED_CTXSW
        spin_release(&rq->lock.dep_map, 1, _THIS_IP_);
c0453dc8:       ebf04b13        bl      c0066a1c <lock_release>
#endif

        context_tracking_task_switch(prev, next);
        /* Here we just switch the register state and the stack. */
        switch_to(prev, next, prev);
c0453dcc:       e1a00009        mov     r0, r9
c0453dd0:       e5991004        ldr     r1, [r9, #4]
c0453dd4:       e5982004        ldr     r2, [r8, #4]
c0453dd8:       ebeeeae5        bl      c000e974 <__switch_to>
c0453ddc:       e1a04000        mov     r4, r0

        barrier();

        if (unlikely(__ipipe_switch_tail()))
c0453de0:       ebf0ceca        bl      c0087910 <__ipipe_switch_tail>
c0453de4:       e3500000        cmp     r0, #0
c0453de8:       1a0000cc        bne     c0454120 <__schedule+0x540>
        /*
         * this_rq must be evaluated again because prev may have moved
         * CPUs since it called schedule(), thus the 'rq' on its stack
         * frame will be invalid.
         */
        finish_task_switch(this_rq(), prev);
c0453dec:       ebf7e104        bl      c024c204 <debug_smp_processor_id>
c0453df0:       e51bc074        ldr     ip, [fp, #-116] ; 0x74
c0453df4:       e1a01004        mov     r1, r4


> --
>                                                                 Gilles.

-- 
GP Orcullo


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [Xenomai] Switchtest failures on ODROIDU3
  2014-10-02 15:52                   ` GP Orcullo
@ 2014-10-02 17:13                     ` Gilles Chanteperdrix
  2014-10-02 23:40                       ` GP Orcullo
  2014-10-03  3:35                       ` GP Orcullo
  0 siblings, 2 replies; 46+ messages in thread
From: Gilles Chanteperdrix @ 2014-10-02 17:13 UTC (permalink / raw)
  To: GP Orcullo; +Cc: xenomai

On 10/02/2014 05:52 PM, GP Orcullo wrote:
> On Thu, Oct 2, 2014 at 9:36 PM, Gilles Chanteperdrix
> <gilles.chanteperdrix@xenomai.org> wrote:
>> On 10/02/2014 03:27 PM, GP Orcullo wrote:
>>> On Wed, Oct 1, 2014 at 5:20 PM, Gilles Chanteperdrix
>>> <gilles.chanteperdrix@xenomai.org> wrote:
>>>> On 10/01/2014 11:12 AM, GP Orcullo wrote:
>>>>> On Oct 1, 2014 3:54 PM, "Gilles Chanteperdrix" <
>>>>> gilles.chanteperdrix@xenomai.org> wrote:
>>>>>>
>>>>>> On 10/01/2014 01:32 AM, GP Orcullo wrote:
>>>>>>> On Sep 30, 2014 8:16 PM, "Gilles Chanteperdrix" <
>>>>>>> gilles.chanteperdrix@xenomai.org> wrote:
>>>>>>>>
>>>>>>>> On 09/30/2014 02:04 PM, GP Orcullo wrote:
>>>>>>>>> On Sep 30, 2014 7:30 PM, "Gilles Chanteperdrix" <
>>>>>>>>> gilles.chanteperdrix@xenomai.org> wrote:
>>>>>>>>>>
>>>>>>>>>> On 09/30/2014 07:31 AM, GP Orcullo wrote:
>>>>>>>>>>> Hi,
>>>>>>>>>>>
>>>>>>>>>>> Running the switchtest for extended periods (>10 mins) causes the
>>>>>>>>>>> machine to lockup.
>>>>>>>>>>>
>>>>>>>>>>> I'm running a modified xeno-regression-test which contains only the
>>>>>>>>>>> following tests:
>>>>>>>>>>>
>>>>>>>>>>> check_alive /usr/lib/xenomai/testsuite/switchtest
>>>>>>>>>>> check_alive /usr/lib/xenomai/testsuite/switchtest -s 1000
>>>>>>>>>>> check_alive /usr/lib/xenomai/testsuite/latency ${1+"$@"}
>>>>>>>>>>>
>>>>>>>>>>> The script is invoked with the following arguments:
>>>>>>>>>>>
>>>>>>>>>>> nohup sudo ./xeno-regression-test -l
>>>>>>>>>>> "/usr/lib/xenomai/testsuite/dohell -m /media/work 36000" -t 2 >
>>>>>>>>>>> /dev/null & top -d0.5
>>>>>>>>>>>
>>>>>>>>>>> The kernel dumps the OOPS information intermittently so it's difficult
>>>>>>>>>>> to diagnose the issue.
>>>>>>>>>>>
>>>>>>>>>>> Attached is the kernel config and the logfile.
>>>>>>>>>>
>>>>>>>>>> Ok, this is an exynos. Sorry, but I have never seen the patch for
>>>>>>>>>> exynos, so I do not know what is inside. You should direct your
>>>>>>>>>> questions to whoever provided you with this support.
>>>>>>>>>
>>>>>>>>> I'm in the process of porting xenomai to run on exynos.
>>>>>>>>>
>>>>>>>>> The ipipe-core-3.8.13-arm-3.patch applies cleanly to the 3.8.13.11 kernel
>>>>>>>>> used by the odroid U3 board.
>>>>>>>>>
>>>>>>>>> Attached is the ipipe patch that I've made.
>>>>>>>>>
>>>>>>>>> I was just wondering what would cause switchtest to fail. The error that I
>>>>>>>>> can see is that the system is running out of memory and I don't know
>>>>>>>>> exactly what is causing this.
>>>>>>>>
>>>>>>>> Certainly not switchtest as it does not do any memory allocation.
>>>>>>>> However, the dohell script has a loop creating a large file and removing
>>>>>>>> it. So, could you try and run the dohell script with an unpatched kernel
>>>>>>>> and see if you have the error?
>>>>>>>>
>>>>>>>
>>>>>>> Running dohell on a patched and unpatched kernel doesn't trigger the
>>>>>>> lockup.
>>>>>>>
>>>>>>> Running switchtest without dohell works OK.
>>>>>>
>>>>>> Is the problem a lockup, or an OOM?
>>>>>>
>>>>>
>>>>> It's a lockup.
>>>>>
>>>>> The OOM message is the only one that I've captured so far.  Most of the
>>>>> time the kernel doesn't spew any messages before the lockup.
>>>>>
>>>>> The lockups are repeatable but generating any error messages isn't.
>>>>
>>>> Are you running the tests on the serial console, or with ssh? Do you
>>>> have unlocked context switch enabled? Have you tried enabling some debug
>>>> options?
>>>>
>>>
>>> I'm using the serial console to log the kernel messages and ssh to run
>>> the command. Using purely the serial console has the same results.
>>
>> The main point was to avoid redirecting standard error to /dev/null, so
>> that you can see any application error message. Doing this on the serial
>> console may be a better idea than over ssh, because you are less likely
>> to miss a message sent just before the system dies.
>>
>>>
>>> Is this the unlocked context switch option: "CONFIG_XENO_HW_UNLOCKED_SWITCH=y"?
>>
>> Yes, please try to disable it if you have it enabled.
>>
>>>
>>> I will try playing again with the debug options and see if I can get
>>> something useful.
>>>
>>>> Also note that xeno-regression-test puts the system under a lot of
>>>> stress, so it may happen that there is no output for some time (several
>>>> minutes), normally the test should stop by itself if there is no output
>>>> for something like 30 minutes. So, I would recommend not redirecting
>>>> xeno-test output to see if there is any error before the lockup, and
>>>> when you see the lockup, leave the system for 30 minutes to see if it
>>>> does not restart or if xeno-regression-test can exit gracefully.
>>>>
>>>
>>> This is a total lockup. There's a heartbeat LED that stops blinking when it occurs.
>>
>> Well, the heartbeat LED does not prove anything: some Linux kernel
>> activity can very well prevent it from being toggled. For instance, if
>> it is toggled by a thread and the activity hogging the kernel is a
>> softirq that never ends, the LED stops blinking even though the system
>> is not necessarily completely locked up.
>>
>>>
>>> Attached is an error log that I captured previously; this one had
>>> CONFIG_CPU_IDLE enabled. I've lost track of which kernel this trace
>>> came from, but maybe the error looks familiar.
>>
>> This trace is missing an important piece of information: the reason for
>> the error. So, please capture the serial console to a file, and post the
>> complete file, from boot up to the error.
>>
>> Anyway, you did not answer my question: did you try leaving the system
>> on for, say, 30 minutes or 1 hour after the lockup to see whether it
>> recovers?
>>
>>
> 
> The system never recovered.
> 
> With the unlocked context switch disabled, I was able to capture this error:
> 
> [  210.482299] INFO: rcu_preempt detected stalls on CPUs/tasks:)
> [  210.487790] Task dump for CPU 2:
> [  210.490995] switchtest      R running      0  3915   3639 0x00000002
> [  210.497340] [<c0453ddc>] (__schedule+0x1fc/0x5f8) from [<00000010>] (0x10)
> [  390.507943] INFO: rcu_preempt detected stalls on CPUs/tasks: { 2} (detected )
> [  390.513510] Task dump for CPU 2:
> [  390.516716] switchtest      R running      0  3915   3639 0x00000002
> [  390.523065] [<c0453ddc>] (__schedule+0x1fc/0x5f8) from [<00000010>] (0x10)
> 
> <c0453ddc> points to the following section:
> 
> #ifndef __ARCH_WANT_UNLOCKED_CTXSW
>         spin_release(&rq->lock.dep_map, 1, _THIS_IP_);
> c0453dc8:       ebf04b13        bl      c0066a1c <lock_release>
> #endif
> 
>         context_tracking_task_switch(prev, next);

You do not have context tracking enabled, right?


-- 
                                                                Gilles.


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [Xenomai] Switchtest failures on ODROIDU3
  2014-10-02 17:13                     ` Gilles Chanteperdrix
@ 2014-10-02 23:40                       ` GP Orcullo
  2014-10-03  3:35                       ` GP Orcullo
  1 sibling, 0 replies; 46+ messages in thread
From: GP Orcullo @ 2014-10-02 23:40 UTC (permalink / raw)
  To: Gilles Chanteperdrix; +Cc: xenomai

On Fri, Oct 3, 2014 at 1:13 AM, Gilles Chanteperdrix
<gilles.chanteperdrix@xenomai.org> wrote:
> On 10/02/2014 05:52 PM, GP Orcullo wrote:
>> On Thu, Oct 2, 2014 at 9:36 PM, Gilles Chanteperdrix
>> <gilles.chanteperdrix@xenomai.org> wrote:
>>> On 10/02/2014 03:27 PM, GP Orcullo wrote:
>>>> On Wed, Oct 1, 2014 at 5:20 PM, Gilles Chanteperdrix
>>>> <gilles.chanteperdrix@xenomai.org> wrote:
>>>>> On 10/01/2014 11:12 AM, GP Orcullo wrote:
>>>>>> On Oct 1, 2014 3:54 PM, "Gilles Chanteperdrix" <
>>>>>> gilles.chanteperdrix@xenomai.org> wrote:
>>>>>>>
>>>>>>> On 10/01/2014 01:32 AM, GP Orcullo wrote:
>>>>>>>> On Sep 30, 2014 8:16 PM, "Gilles Chanteperdrix" <
>>>>>>>> gilles.chanteperdrix@xenomai.org> wrote:
>>>>>>>>>
>>>>>>>>> On 09/30/2014 02:04 PM, GP Orcullo wrote:
>>>>>>>>>> On Sep 30, 2014 7:30 PM, "Gilles Chanteperdrix" <
>>>>>>>>>> gilles.chanteperdrix@xenomai.org> wrote:
>>>>>>>>>>>
>>>>>>>>>>> On 09/30/2014 07:31 AM, GP Orcullo wrote:
>>>>>>>>>>>> Hi,
>>>>>>>>>>>>
>>>>>>>>>>>> Running the switchtest for extended periods (>10 mins) causes the
>>>>>>>>>>>> machine to lockup.
>>>>>>>>>>>>
>>>>>>>>>>>> I'm running a modified xeno-regression-test which contains only the
>>>>>>>>>>>> following tests:
>>>>>>>>>>>>
>>>>>>>>>>>> check_alive /usr/lib/xenomai/testsuite/switchtest
>>>>>>>>>>>> check_alive /usr/lib/xenomai/testsuite/switchtest -s 1000
>>>>>>>>>>>> check_alive /usr/lib/xenomai/testsuite/latency ${1+"$@"}
>>>>>>>>>>>>
>>>>>>>>>>>> The script is invoked with the following arguments:
>>>>>>>>>>>>
>>>>>>>>>>>> nohup sudo ./xeno-regression-test -l
>>>>>>>>>>>> "/usr/lib/xenomai/testsuite/dohell -m /media/work 36000" -t 2 >
>>>>>>>>>>>> /dev/null & top -d0.5
>>>>>>>>>>>>
>>>>>>>>>>>> The kernel dumps the OOPS information intermittently so it's difficult
>>>>>>>>>>>> to diagnose the issue.
>>>>>>>>>>>>
>>>>>>>>>>>> Attached is the kernel config and the logfile.
>>>>>>>>>>>
>>>>>>>>>>> Ok, this is an exynos. Sorry, but I have never seen the patch for
>>>>>>>>>>> exynos, so I do not know what is inside. You should direct your
>>>>>>>>>>> questions to whoever provided you with this support.
>>>>>>>>>>
>>>>>>>>>> I'm in the process of porting xenomai to run on exynos.
>>>>>>>>>>
>>>>>>>>>> The ipipe-core-3.8.13-arm-3.patch applies cleanly to the 3.8.13.11 kernel
>>>>>>>>>> used by the odroid U3 board.
>>>>>>>>>>
>>>>>>>>>> Attached is the ipipe patch that I've made.
>>>>>>>>>>
>>>>>>>>>> I was just wondering what would cause switchtest to fail. The error that I
>>>>>>>>>> can see is that the system is running out of memory and I don't know
>>>>>>>>>> exactly what is causing this.
>>>>>>>>>
>>>>>>>>> Certainly not switchtest as it does not do any memory allocation.
>>>>>>>>> However, the dohell script has a loop creating a large file and removing
>>>>>>>>> it. So, could you try and run the dohell script with an unpatched kernel
>>>>>>>>> and see if you have the error?
>>>>>>>>>
>>>>>>>>
>>>>>>>> Running dohell on a patched and unpatched kernel doesn't trigger the
>>>>>>>> lockup.
>>>>>>>>
>>>>>>>> Running switchtest without dohell works OK.
>>>>>>>
>>>>>>> Is the problem a lockup, or an OOM?
>>>>>>>
>>>>>>
>>>>>> It's a lockup.
>>>>>>
>>>>>> The OOM message is the only one that I've captured so far.  Most of the
>>>>>> time the kernel doesn't spew any messages before the lockup.
>>>>>>
>>>>>> The lockups are repeatable but generating any error messages isn't.
>>>>>
>>>>> Are you running the tests on the serial console, or with ssh? Do you
>>>>> have unlocked context switch enabled? Have you tried enabling some debug
>>>>> options?
>>>>>
>>>>
>>>> I'm using the serial console to log the kernel messages and ssh to run
>>>> the command. Using purely the serial console has the same results.
>>>
>>> The main point was to avoid redirecting standard error to /dev/null, so
>>> that you can see any application error message. Doing this on the serial
>>> console may be a better idea than over ssh, because you are less likely
>>> to miss a message sent just before the system dies.
>>>
>>>>
>>>> Is this the unlocked context switch option: "CONFIG_XENO_HW_UNLOCKED_SWITCH=y"?
>>>
>>> Yes, please try to disable it if you have it enabled.
>>>
>>>>
>>>> I will try playing again with the debug options and see if I can get
>>>> something useful.
>>>>
>>>>> Also note that xeno-regression-test puts the system under a lot of
>>>>> stress, so there may be no output for some time (several minutes).
>>>>> Normally, the test should stop by itself if there is no output for
>>>>> something like 30 minutes. So I would recommend not redirecting the
>>>>> xeno-test output, to see whether any error appears before the lockup,
>>>>> and, when you see the lockup, leaving the system alone for 30 minutes
>>>>> to see whether it restarts or xeno-regression-test exits gracefully.
>>>>>
>>>>
>>>> This is a total lockup. There's a heartbeat LED that dies when it occurs.
>>>
>>> Well, the heartbeat LED does not prove anything: some Linux kernel
>>> activity can very well prevent it from being toggled, for instance if
>>> it is toggled by a thread and the activity hogging the kernel is a
>>> softirq that never ends.
>>>
>>>>
>>>> Attached is one error log that I had captured previously and this one
>>>> had CONFIG_CPU_IDLE enabled. I've lost track of which kernel this
>>>> trace came from, but maybe the error looks familiar.
>>>
>>> This trace is missing an important piece of information: the reason
>>> for the error. So, please capture the serial console to a file, and
>>> post the complete file, from boot up to the error.
>>>
>>> Anyway, you did not answer my question: did you leave the system on
>>> for, say, 30 minutes or 1 hour after the lockup to see whether it
>>> recovers?
>>>
>>>
>>
>> The system never recovered.
>>
>> With the context switch disabled, I was able to capture this error:
>>
>> [  210.482299] INFO: rcu_preempt detected stalls on CPUs/tasks:)
>> [  210.487790] Task dump for CPU 2:
>> [  210.490995] switchtest      R running      0  3915   3639 0x00000002
>> [  210.497340] [<c0453ddc>] (__schedule+0x1fc/0x5f8) from [<00000010>] (0x10)
>> [  390.507943] INFO: rcu_preempt detected stalls on CPUs/tasks: { 2} (detected )
>> [  390.513510] Task dump for CPU 2:
>> [  390.516716] switchtest      R running      0  3915   3639 0x00000002
>> [  390.523065] [<c0453ddc>] (__schedule+0x1fc/0x5f8) from [<00000010>] (0x10)
>>
>> <c0453ddc> points to the following section:
>>
>> #ifndef __ARCH_WANT_UNLOCKED_CTXSW
>>         spin_release(&rq->lock.dep_map, 1, _THIS_IP_);
>> c0453dc8:       ebf04b13        bl      c0066a1c <lock_release>
>> #endif
>>
>>         context_tracking_task_switch(prev, next);
>
> You do not have context tracking enabled, right?
>

# CONFIG_XENO_HW_UNLOCKED_SWITCH is not set
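
The question above was about context tracking, while the line quoted is the unlocked-switch option; both can be checked directly against the running kernel instead of a build tree. A minimal sketch, assuming the kernel was built with CONFIG_IKCONFIG_PROC=y so that /proc/config.gz exists; `kconfig_state` is a hypothetical helper, not part of Xenomai or the kernel tree.

```shell
# Hypothetical helper: report whether Kconfig options are enabled ("=y" or "=m")
# in a kernel config file. Accepts a plain .config or a gzip-compressed file
# such as /proc/config.gz (only present if CONFIG_IKCONFIG_PROC=y).
kconfig_state() {
    cfg=$1
    shift
    case $cfg in
        *.gz) reader="gzip -dc" ;;   # decompress to stdout
        *)    reader="cat" ;;
    esac
    for opt in "$@"; do
        # Disabled options appear as "# CONFIG_FOO is not set" or not at all.
        if $reader "$cfg" | grep -q "^CONFIG_${opt}=[ym]\$"; then
            echo "CONFIG_${opt}: enabled"
        else
            echo "CONFIG_${opt}: not set"
        fi
    done
}

# Example (on the target board):
# kconfig_state /proc/config.gz XENO_HW_UNLOCKED_SWITCH CONTEXT_TRACKING
```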

Getting this board to spew out error messages is tough.
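
Since the board often dies before a message reaches an interactive terminal, one option is to log the serial console on a second machine from power-on, as suggested above, and stamp each line with the host's clock so the last output before the lockup can be placed in time. A minimal sketch, assuming a Linux host; the device name /dev/ttyUSB0, the 115200 baud rate, and the `timestamp_log` helper are all assumptions for illustration.

```shell
# Hypothetical helper: copy stdin to stdout and append it to a log file,
# prefixing every line with the host's wall-clock time.
timestamp_log() {
    logfile=$1
    cr=$(printf '\r')
    while IFS= read -r line; do
        line=${line%"$cr"}   # serial consoles usually send CR LF line endings
        printf '%s %s\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$line"
    done | tee -a "$logfile"
}

# Example (on the host, with the board's UART on an assumed USB adapter):
# stty -F /dev/ttyUSB0 115200 raw -echo
# timestamp_log odroid-console.log < /dev/ttyUSB0
```

Appending with `tee -a` keeps earlier boots in the same file, which makes it easier to post the complete log from boot up to the error.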

>
> --
>                                                                 Gilles.



-- 
GP Orcullo



* Re: [Xenomai] Switchtest failures on ODROIDU3
  2014-10-02 17:13                     ` Gilles Chanteperdrix
  2014-10-02 23:40                       ` GP Orcullo
@ 2014-10-03  3:35                       ` GP Orcullo
  2014-10-03  7:20                         ` Gilles Chanteperdrix
  1 sibling, 1 reply; 46+ messages in thread
From: GP Orcullo @ 2014-10-03  3:35 UTC (permalink / raw)
  To: Gilles Chanteperdrix; +Cc: xenomai

On Fri, Oct 3, 2014 at 1:13 AM, Gilles Chanteperdrix
<gilles.chanteperdrix@xenomai.org> wrote:
> On 10/02/2014 05:52 PM, GP Orcullo wrote:
>> On Thu, Oct 2, 2014 at 9:36 PM, Gilles Chanteperdrix
>> <gilles.chanteperdrix@xenomai.org> wrote:
>>> On 10/02/2014 03:27 PM, GP Orcullo wrote:
>>>> On Wed, Oct 1, 2014 at 5:20 PM, Gilles Chanteperdrix
>>>> <gilles.chanteperdrix@xenomai.org> wrote:
>>>>> On 10/01/2014 11:12 AM, GP Orcullo wrote:
>>>>>> On Oct 1, 2014 3:54 PM, "Gilles Chanteperdrix" <
>>>>>> gilles.chanteperdrix@xenomai.org> wrote:
>>>>>>>
>>>>>>> On 10/01/2014 01:32 AM, GP Orcullo wrote:
>>>>>>>> On Sep 30, 2014 8:16 PM, "Gilles Chanteperdrix" <
>>>>>>>> gilles.chanteperdrix@xenomai.org> wrote:
>>>>>>>>>
>>>>>>>>> On 09/30/2014 02:04 PM, GP Orcullo wrote:
>>>>>>>>>> On Sep 30, 2014 7:30 PM, "Gilles Chanteperdrix" <
>>>>>>>>>> gilles.chanteperdrix@xenomai.org> wrote:
>>>>>>>>>>>
>>>>>>>>>>> On 09/30/2014 07:31 AM, GP Orcullo wrote:
>>>>>>>>>>>> Hi,
>>>>>>>>>>>>
>>>>>>>>>>>> Running the switchtest for extended periods (>10 mins) causes the
>>>>>>>>>>>> machine to lockup.
>>>>>>>>>>>>
>>>>>>>>>>>> I'm running a modified xeno-regression-test which contains only the
>>>>>>>>>>>> following tests:
>>>>>>>>>>>>
>>>>>>>>>>>> check_alive /usr/lib/xenomai/testsuite/switchtest
>>>>>>>>>>>> check_alive /usr/lib/xenomai/testsuite/switchtest -s 1000
>>>>>>>>>>>> check_alive /usr/lib/xenomai/testsuite/latency ${1+"$@"}
>>>>>>>>>>>>
>>>>>>>>>>>> The script is invoked with the following arguments:
>>>>>>>>>>>>
>>>>>>>>>>>> nohup sudo ./xeno-regression-test -l
>>>>>>>>>>>> "/usr/lib/xenomai/testsuite/dohell -m /media/work 36000" -t 2 >
>>>>>>>>>>>> /dev/null & top -d0.5
>>>>>>>>>>>>
>>>>>>>>>>>> The kernel dumps the OOPS information intermittently so it's
>>>>>>>>>>>> difficult to diagnose the issue.
>>>>>>>>>>>>
>>>>>>>>>>>> Attached is the kernel config and the logfile.
>>>>>>>>>>>
>>>>>>>>>>> Ok, this is an exynos. Sorry, but I have never seen the patch for
>>>>>>>>>>> exynos, so I do not know what is inside. You should direct your
>>>>>>>>>>> questions to whoever provided you with this support.
>>>>>>>>>>
>>>>>>>>>> I'm in the process of porting xenomai to run on exynos.
>>>>>>>>>>
>>>>>>>>>>> The ipipe-core-3.8.13-arm-3.patch applies cleanly to the 3.8.13.11
>>>>>>>>>>> kernel used by the odroid U3 board.
>>>>>>>>>>
>>>>>>>>>> Attached is the ipipe patch that I've made.
>>>>>>>>>>
>>>>>>>>>>> I was just wondering what would cause switchtest to fail. The error
>>>>>>>>>>> that I can see is that the system is running out of memory and I
>>>>>>>>>>> don't know exactly what is causing this.
>>>>>>>>>
>>>>>>>>> Certainly not switchtest as it does not do any memory allocation.
>>>>>>>>> However, the dohell script has a loop creating a large file and
>>>>>>>>> removing it. So, could you try and run the dohell script with an
>>>>>>>>> unpatched kernel and see if you have the error?
>>>>>>>>>
>>>>>>>>
>>>>>>>> Running dohell on a patched and unpatched kernel doesn't trigger the
>>>>>>>> lockup.
>>>>>>>>
>>>>>>>> Running switchtest without dohell works OK.
>>>>>>>
>>>>>>> Is the problem a lockup, or an OOM?
>>>>>>>
>>>>>>
>>>>>> It's a lockup.
>>>>>>
>>>>>> The OOM message is the only one that I've captured so far.  Most of the
>>>>>> time the kernel doesn't spew any messages before the lockup.
>>>>>>
>>>>>> The lockups are repeatable but generating any error messages isn't.
>>>>>
>>>>> Are you running the tests on the serial console, or with ssh? Do you
>>>>> have unlocked context switch enabled? Have you tried enabling some debug
>>>>> options?
>>>>>
>>>>
>>>> I'm using the serial console to log the kernel messages and ssh to run
>>>> the command. Using purely the serial console has the same results.
>>>
>>> The main point was to avoid redirecting standard error to /dev/null to
>>> see any application error message. Doing this on the serial console may
>>> be a better idea than on ssh, because it means you are less likely to
>>> miss a message that would be sent just prior to the system dying.
>>>
>>>>
>>>> Is this the context switch?: "CONFIG_XENO_HW_UNLOCKED_SWITCH=y"
>>>
>>> Yes, please try to disable it if you have it enabled.
>>>
>>>>
>>>> I will try playing again with the debug options and see if I can get
>>>> something useful.
>>>>
>>>>> Also note that xeno-regression-test puts the system under a lot of
>>>>> stress, so there may be no output for some time (several minutes).
>>>>> Normally, the test should stop by itself if there is no output for
>>>>> something like 30 minutes. So I would recommend not redirecting the
>>>>> xeno-test output, to see whether any error appears before the lockup,
>>>>> and, when you see the lockup, leaving the system alone for 30 minutes
>>>>> to see whether it restarts or xeno-regression-test exits gracefully.
>>>>>
>>>>
>>>> This is a total lockup. There's a heartbeat LED that dies when it occurs.
>>>
>>> Well, the heartbeat LED does not prove anything: some Linux kernel
>>> activity can very well prevent it from being toggled, for instance if
>>> it is toggled by a thread and the activity hogging the kernel is a
>>> softirq that never ends.
>>>
>>>>
>>>> Attached is one error log that I had captured previously and this one
>>>> had CONFIG_CPU_IDLE enabled. I've lost track of which kernel this
>>>> trace came from, but maybe the error looks familiar.
>>>
>>> This trace is missing an important piece of information: the reason
>>> for the error. So, please capture the serial console to a file, and
>>> post the complete file, from boot up to the error.
>>>
>>> Anyway, you did not answer my question: did you leave the system on
>>> for, say, 30 minutes or 1 hour after the lockup to see whether it
>>> recovers?
>>>
>>>
>>
>> The system never recovered.
>>
>> With the context switch disabled, I was able to capture this error:
>>
>> [  210.482299] INFO: rcu_preempt detected stalls on CPUs/tasks:)
>> [  210.487790] Task dump for CPU 2:
>> [  210.490995] switchtest      R running      0  3915   3639 0x00000002
>> [  210.497340] [<c0453ddc>] (__schedule+0x1fc/0x5f8) from [<00000010>] (0x10)
>> [  390.507943] INFO: rcu_preempt detected stalls on CPUs/tasks: { 2} (detected )
>> [  390.513510] Task dump for CPU 2:
>> [  390.516716] switchtest      R running      0  3915   3639 0x00000002
>> [  390.523065] [<c0453ddc>] (__schedule+0x1fc/0x5f8) from [<00000010>] (0x10)
>>
>> <c0453ddc> points to the following section:
>>
>> #ifndef __ARCH_WANT_UNLOCKED_CTXSW
>>         spin_release(&rq->lock.dep_map, 1, _THIS_IP_);
>> c0453dc8:       ebf04b13        bl      c0066a1c <lock_release>
>> #endif
>>
>>         context_tracking_task_switch(prev, next);
>
> You do not have context tracking enabled, right?
>
>
> --
>                                                                 Gilles.

The system managed to dump some error messages.
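
The RCU stall reports in the log below carry two clocks: the printk timestamp in seconds and the stall age in jiffies (`t=`). For reports about the same stalled grace period (here g=22447 throughout), the ratio of the two deltas approximates the kernel tick rate, which turns the `t=` values into seconds. A rough sketch; `estimate_hz` is a hypothetical helper that assumes a POSIX awk and the message format shown in this log.

```shell
# Hypothetical helper: estimate HZ from "rcu_preempt detected stalls" lines.
# Each such line carries a printk timestamp "[ seconds]" and the stall age
# "t=N jiffies"; for reports about the same stalled grace period,
# (last t - first t) / (last seconds - first seconds) approximates HZ.
estimate_hz() {
    awk '
        /rcu_preempt detected stalls/ && match($0, /t=[0-9]+ jiffies/) {
            # Digits start after "t=" and end before " jiffies" (8 chars).
            jiffies = substr($0, RSTART + 2, RLENGTH - 10) + 0
            # Timestamp: text between the leading "[" and the first "]".
            ts = $0
            sub(/^\[ */, "", ts)
            sub(/\].*$/, "", ts)
            ts += 0
            if (n++ == 0) { j0 = jiffies; s0 = ts }
            j1 = jiffies; s1 = ts
        }
        END {
            if (n < 2 || s1 == s0) {
                print "estimate_hz: need two stall reports" > "/dev/stderr"
                exit 1
            }
            printf "%.0f\n", (j1 - j0) / (s1 - s0)
        }
    ' "$@"
}

# Example: estimate_hz dmesg.log
```

Using the first and last reports shown here, (300042 - 12002) / (1847.894064 - 407.688809) is about 200, so the first report at t=12002 jiffies means the grace period had already been stalled for roughly 60 s, and each later report arrives about 36005 jiffies (about 180 s) after the previous one while g and c never advance.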

-- 
GP Orcullo
-------------- next part --------------
[  407.688809] INFO: rcu_preempt detected stalls on CPUs/tasks: { 3} (detected by 0, t=12002 jiffies, g=22447, c=22446, q=113448)
[  407.694560] Task dump for CPU 3:
[  407.697763] switchtest      R running      0  3979   3640 0x00000002
[  407.704111] [<c0453ddc>] (__schedule+0x1fc/0x5f8) from [<00000010>] (0x10)
[  587.714458] INFO: rcu_preempt detected stalls on CPUs/tasks: { 3} (detected by 2, t=48007 jiffies, g=22447, c=22446, q=443504)
[  587.720205] Task dump for CPU 3:
[  587.723407] switchtest      R running      0  3979   3640 0x00000002
[  587.729755] [<c0453ddc>] (__schedule+0x1fc/0x5f8) from [<00000010>] (0x10)
[  767.740128] INFO: rcu_preempt detected stalls on CPUs/tasks: { 3} (detected by 2, t=84012 jiffies, g=22447, c=22446, q=1136198)
[  767.745964] Task dump for CPU 3:
[  767.749167] switchtest      R running      0  3979   3640 0x00000002
[  767.755515] [<c0453ddc>] (__schedule+0x1fc/0x5f8) from [<00000010>] (0x10)
[  782.763027] ------------[ cut here ]------------
[  782.763099] WARNING: at include/linux/ipipe_domain.h:212 __ipipe_dispatch_irq+0x35c/0x5a0()
[  782.770380] Modules linked in: nfsv3 nfsv4 nfsd exportfs nfs_acl auth_rpcgss nfs lockd sunrpc vfat fat smsc95xx usbnet mii gpio_keys
[  782.782218] [<c0015624>] (unwind_backtrace+0x0/0xf8) from [<c0026118>] (warn_slowpath_common+0x4c/0x64)
[  782.791606] [<c0026118>] (warn_slowpath_common+0x4c/0x64) from [<c002614c>] (warn_slowpath_null+0x1c/0x24)
[  782.801232] [<c002614c>] (warn_slowpath_null+0x1c/0x24) from [<c00886ac>] (__ipipe_dispatch_irq+0x35c/0x5a0)
[  782.811035] [<c00886ac>] (__ipipe_dispatch_irq+0x35c/0x5a0) from [<c00200c8>] (combiner_handle_cascade_irq+0x80/0xf4)
[  782.821620] [<c00200c8>] (combiner_handle_cascade_irq+0x80/0xf4) from [<c0019110>] (__ipipe_ack_irq+0xc/0x10)
[  782.831510] [<c0019110>] (__ipipe_ack_irq+0xc/0x10) from [<c0088594>] (__ipipe_dispatch_irq+0x244/0x5a0)
[  782.840956] [<c0088594>] (__ipipe_dispatch_irq+0x244/0x5a0) from [<c0008508>] (__ipipe_grab_irq+0x48/0xd0)
[  782.850582] [<c0008508>] (__ipipe_grab_irq+0x48/0xd0) from [<c0008790>] (gic_handle_irq+0x34/0x68)
[  782.859512] Exception stack(0xe6131c78 to 0xe6131cc0)
[  782.864539] 1c60:                                                       c001f9f8 c006cadc
[  782.872699] 1c80: 20000053 ffffffff e6131cc4 c000e540 f8820000 00000004 01512000 00000001
[  782.880848] 1ca0: c1b11180 c06021c0 00000000 00000003 c01322b8 00000001 00000000 c062af44
[  782.889000] [<c0008790>] (gic_handle_irq+0x34/0x68) from [<c000e540>] (__irq_svc+0x40/0x6c)
[  782.897323] Exception stack(0xe6131c90 to 0xe6131cd8)
[  782.902350] 1c80:                                     f8820000 00000004 01512000 00000001
[  782.910508] 1ca0: c1b11180 c06021c0 00000000 00000003 c01322b8 00000001 00000000 c062af44
[  782.918659] 1cc0: 00000004 e6131cd8 c001f9f8 c006cadc 20000053 ffffffff
[  782.925253] [<c000e540>] (__irq_svc+0x40/0x6c) from [<c006cadc>] (smp_call_function_many+0x1ac/0x288)
[  782.934456] [<c006cadc>] (smp_call_function_many+0x1ac/0x288) from [<c006cbf0>] (on_each_cpu_mask+0x38/0x98)
[  782.944251] [<c006cbf0>] (on_each_cpu_mask+0x38/0x98) from [<c0132aa0>] (__alloc_pages_nodemask+0x5ec/0x954)
[  782.954076] [<c0132aa0>] (__alloc_pages_nodemask+0x5ec/0x954) from [<bf0983cc>] (nfs_readdir_xdr_to_array+0xb4/0x234 [nfs])
[  782.965212] [<bf0983cc>] (nfs_readdir_xdr_to_array+0xb4/0x234 [nfs]) from [<bf098568>] (nfs_readdir_filler+0x1c/0x84 [nfs])
[  782.976270] [<bf098568>] (nfs_readdir_filler+0x1c/0x84 [nfs]) from [<c012c274>] (do_read_cache_page+0x70/0x15c)
[  782.986311] [<c012c274>] (do_read_cache_page+0x70/0x15c) from [<c012c3a4>] (read_cache_page_async+0x18/0x20)
[  782.996112] [<c012c3a4>] (read_cache_page_async+0x18/0x20) from [<c012c3b4>] (read_cache_page+0x8/0x10)
[  783.005494] [<c012c3b4>] (read_cache_page+0x8/0x10) from [<bf0986e0>] (nfs_readdir+0x110/0x554 [nfs])
[  783.014692] [<bf0986e0>] (nfs_readdir+0x110/0x554 [nfs]) from [<c0175b38>] (vfs_readdir+0x80/0xa4)
[  783.023617] [<c0175b38>] (vfs_readdir+0x80/0xa4) from [<c0175d04>] (sys_getdents64+0x64/0xd8)
[  783.032103] [<c0175d04>] (sys_getdents64+0x64/0xd8) from [<c000e9c0>] (ret_fast_syscall+0x0/0x34)
[  783.040951] ---[ end trace 0054ffe34e60e6b0 ]---
[  783.045534] ------------[ cut here ]------------
[  783.050118] WARNING: at include/linux/ipipe_domain.h:153 __ipipe_set_irq_pending+0xe8/0x16c()
[  783.058637] Modules linked in: nfsv3 nfsv4 nfsd exportfs nfs_acl auth_rpcgss nfs lockd sunrpc vfat fat smsc95xx usbnet mii gpio_keys
[  783.070497] [<c0015624>] (unwind_backtrace+0x0/0xf8) from [<c0026118>] (warn_slowpath_common+0x4c/0x64)
[  783.079893] [<c0026118>] (warn_slowpath_common+0x4c/0x64) from [<c002614c>] (warn_slowpath_null+0x1c/0x24)
[  783.089529] [<c002614c>] (warn_slowpath_null+0x1c/0x24) from [<c00861ac>] (__ipipe_set_irq_pending+0xe8/0x16c)
[  783.099495] [<c00861ac>] (__ipipe_set_irq_pending+0xe8/0x16c) from [<c0088544>] (__ipipe_dispatch_irq+0x1f4/0x5a0)
[  783.109815] [<c0088544>] (__ipipe_dispatch_irq+0x1f4/0x5a0) from [<c00200c8>] (combiner_handle_cascade_irq+0x80/0xf4)
[  783.120407] [<c00200c8>] (combiner_handle_cascade_irq+0x80/0xf4) from [<c0019110>] (__ipipe_ack_irq+0xc/0x10)
[  783.130283] [<c0019110>] (__ipipe_ack_irq+0xc/0x10) from [<c0088594>] (__ipipe_dispatch_irq+0x244/0x5a0)
[  783.139733] [<c0088594>] (__ipipe_dispatch_irq+0x244/0x5a0) from [<c0008508>] (__ipipe_grab_irq+0x48/0xd0)
[  783.149363] [<c0008508>] (__ipipe_grab_irq+0x48/0xd0) from [<c0008790>] (gic_handle_irq+0x34/0x68)
[  783.158286] Exception stack(0xe6131c78 to 0xe6131cc0)
[  783.163312] 1c60:                                                       c001f9f8 c006cadc
[  783.171470] 1c80: 20000053 ffffffff e6131cc4 c000e540 f8820000 00000004 01512000 00000001
[  783.179625] 1ca0: c1b11180 c06021c0 00000000 00000003 c01322b8 00000001 00000000 c062af44
[  783.187777] [<c0008790>] (gic_handle_irq+0x34/0x68) from [<c000e540>] (__irq_svc+0x40/0x6c)
[  783.196098] Exception stack(0xe6131c90 to 0xe6131cd8)
[  783.201125] 1c80:                                     f8820000 00000004 01512000 00000001
[  783.209283] 1ca0: c1b11180 c06021c0 00000000 00000003 c01322b8 00000001 00000000 c062af44
[  783.217435] 1cc0: 00000004 e6131cd8 c001f9f8 c006cadc 20000053 ffffffff
[  783.224027] [<c000e540>] (__irq_svc+0x40/0x6c) from [<c006cadc>] (smp_call_function_many+0x1ac/0x288)
[  783.233225] [<c006cadc>] (smp_call_function_many+0x1ac/0x288) from [<c006cbf0>] (on_each_cpu_mask+0x38/0x98)
[  783.243026] [<c006cbf0>] (on_each_cpu_mask+0x38/0x98) from [<c0132aa0>] (__alloc_pages_nodemask+0x5ec/0x954)
[  783.252851] [<c0132aa0>] (__alloc_pages_nodemask+0x5ec/0x954) from [<bf0983cc>] (nfs_readdir_xdr_to_array+0xb4/0x234 [nfs])
[  783.263958] [<bf0983cc>] (nfs_readdir_xdr_to_array+0xb4/0x234 [nfs]) from [<bf098568>] (nfs_readdir_filler+0x1c/0x84 [nfs])
[  783.275048] [<bf098568>] (nfs_readdir_filler+0x1c/0x84 [nfs]) from [<c012c274>] (do_read_cache_page+0x70/0x15c)
[  783.285090] [<c012c274>] (do_read_cache_page+0x70/0x15c) from [<c012c3a4>] (read_cache_page_async+0x18/0x20)
[  783.294889] [<c012c3a4>] (read_cache_page_async+0x18/0x20) from [<c012c3b4>] (read_cache_page+0x8/0x10)
[  783.304269] [<c012c3b4>] (read_cache_page+0x8/0x10) from [<bf0986e0>] (nfs_readdir+0x110/0x554 [nfs])
[  783.313463] [<bf0986e0>] (nfs_readdir+0x110/0x554 [nfs]) from [<c0175b38>] (vfs_readdir+0x80/0xa4)
[  783.322391] [<c0175b38>] (vfs_readdir+0x80/0xa4) from [<c0175d04>] (sys_getdents64+0x64/0xd8)
[  783.330878] [<c0175d04>] (sys_getdents64+0x64/0xd8) from [<c000e9c0>] (ret_fast_syscall+0x0/0x34)
[  783.339722] ---[ end trace 0054ffe34e60e6b1 ]---
[  783.344309] ------------[ cut here ]------------
[  783.348911] WARNING: at kernel/ipipe/core.c:582 __ipipe_set_irq_pending+0x150/0x16c()
[  783.356721] Modules linked in: nfsv3 nfsv4 nfsd exportfs nfs_acl auth_rpcgss nfs lockd sunrpc vfat fat smsc95xx usbnet mii gpio_keys
[  783.368577] [<c0015624>] (unwind_backtrace+0x0/0xf8) from [<c0026118>] (warn_slowpath_common+0x4c/0x64)
[  783.377972] [<c0026118>] (warn_slowpath_common+0x4c/0x64) from [<c002614c>] (warn_slowpath_null+0x1c/0x24)
[  783.387599] [<c002614c>] (warn_slowpath_null+0x1c/0x24) from [<c0086214>] (__ipipe_set_irq_pending+0x150/0x16c)
[  783.397663] [<c0086214>] (__ipipe_set_irq_pending+0x150/0x16c) from [<c0088544>] (__ipipe_dispatch_irq+0x1f4/0x5a0)
[  783.408067] [<c0088544>] (__ipipe_dispatch_irq+0x1f4/0x5a0) from [<c00200c8>] (combiner_handle_cascade_irq+0x80/0xf4)
[  783.418649] [<c00200c8>] (combiner_handle_cascade_irq+0x80/0xf4) from [<c0019110>] (__ipipe_ack_irq+0xc/0x10)
[  783.428535] [<c0019110>] (__ipipe_ack_irq+0xc/0x10) from [<c0088594>] (__ipipe_dispatch_irq+0x244/0x5a0)
[  783.437985] [<c0088594>] (__ipipe_dispatch_irq+0x244/0x5a0) from [<c0008508>] (__ipipe_grab_irq+0x48/0xd0)
[  783.447613] [<c0008508>] (__ipipe_grab_irq+0x48/0xd0) from [<c0008790>] (gic_handle_irq+0x34/0x68)
[  783.456541] Exception stack(0xe6131c78 to 0xe6131cc0)
[  783.461573] 1c60:                                                       c001f9f8 c006cadc
[  783.469729] 1c80: 20000053 ffffffff e6131cc4 c000e540 f8820000 00000004 01512000 00000001
[  783.477878] 1ca0: c1b11180 c06021c0 00000000 00000003 c01322b8 00000001 00000000 c062af44
[  783.486031] [<c0008790>] (gic_handle_irq+0x34/0x68) from [<c000e540>] (__irq_svc+0x40/0x6c)
[  783.494353] Exception stack(0xe6131c90 to 0xe6131cd8)
[  783.499395] 1c80:                                     f8820000 00000004 01512000 00000001
[  783.507538] 1ca0: c1b11180 c06021c0 00000000 00000003 c01322b8 00000001 00000000 c062af44
[  783.515690] 1cc0: 00000004 e6131cd8 c001f9f8 c006cadc 20000053 ffffffff
[  783.522283] [<c000e540>] (__irq_svc+0x40/0x6c) from [<c006cadc>] (smp_call_function_many+0x1ac/0x288)
[  783.531479] [<c006cadc>] (smp_call_function_many+0x1ac/0x288) from [<c006cbf0>] (on_each_cpu_mask+0x38/0x98)
[  783.541280] [<c006cbf0>] (on_each_cpu_mask+0x38/0x98) from [<c0132aa0>] (__alloc_pages_nodemask+0x5ec/0x954)
[  783.551099] [<c0132aa0>] (__alloc_pages_nodemask+0x5ec/0x954) from [<bf0983cc>] (nfs_readdir_xdr_to_array+0xb4/0x234 [nfs])
[  783.562218] [<bf0983cc>] (nfs_readdir_xdr_to_array+0xb4/0x234 [nfs]) from [<bf098568>] (nfs_readdir_filler+0x1c/0x84 [nfs])
[  783.573302] [<bf098568>] (nfs_readdir_filler+0x1c/0x84 [nfs]) from [<c012c274>] (do_read_cache_page+0x70/0x15c)
[  783.583342] [<c012c274>] (do_read_cache_page+0x70/0x15c) from [<c012c3a4>] (read_cache_page_async+0x18/0x20)
[  783.593142] [<c012c3a4>] (read_cache_page_async+0x18/0x20) from [<c012c3b4>] (read_cache_page+0x8/0x10)
[  783.602523] [<c012c3b4>] (read_cache_page+0x8/0x10) from [<bf0986e0>] (nfs_readdir+0x110/0x554 [nfs])
[  783.611719] [<bf0986e0>] (nfs_readdir+0x110/0x554 [nfs]) from [<c0175b38>] (vfs_readdir+0x80/0xa4)
[  783.620634] [<c0175b38>] (vfs_readdir+0x80/0xa4) from [<c0175d04>] (sys_getdents64+0x64/0xd8)
[  783.629133] [<c0175d04>] (sys_getdents64+0x64/0xd8) from [<c000e9c0>] (ret_fast_syscall+0x0/0x34)
[  783.637974] ---[ end trace 0054ffe34e60e6b2 ]---
[  800.110222] BUG: soft lockup - CPU#0 stuck for 23s! [ls:3654]
[  800.110322] Modules linked in: nfsv3 nfsv4 nfsd exportfs nfs_acl auth_rpcgss nfs lockd sunrpc vfat fat smsc95xx usbnet mii gpio_keys
[  800.122197] 
[  800.123674] Pid: 3654, comm:                   ls
[  800.128362] CPU: 0    Tainted: G        W     (3.8.13.11-xen-deb #21)
[  800.134784] PC is at s3c24xx_serial_console_putchar+0x38/0x74
[  800.140505] LR is at 0x0
[  800.143016] pc : [<c0290da4>]    lr : [<00000000>]    psr: 60000153
[  800.143016] sp : 00000000  ip : 00000000  fp : 00000000
[  800.154472] r10: 00000000  r9 : 00000000  r8 : 00000000
[  800.159669] r7 : 00000000  r6 : 00000000  r5 : 00000000  r4 : 00000000
[  800.166174] r3 : 00000000  r2 : 00000000  r1 : 00000000  r0 : 00000000
[  800.170226] BUG: soft lockup - CPU#2 stuck for 23s! [ps:17584]
[  800.170252] Modules linked in: nfsv3 nfsv4 nfsd exportfs nfs_acl auth_rpcgss nfs lockd sunrpc vfat fat smsc95xx usbnet mii gpio_keys
[  800.170254] 
[  800.170256] Pid: 17584, comm:                   ps
[  800.170259] CPU: 2    Tainted: G        W     (3.8.13.11-xen-deb #21)
[  800.170265] PC is at smp_call_function_many+0x1a8/0x288
[  800.170268] LR is at 0x0
[  800.170272] pc : [<c006cad8>]    lr : [<00000000>]    psr: 20000053
[  800.170272] sp : 00000000  ip : 00000000  fp : 00000000
[  800.170274] r10: 00000000  r9 : 00000000  r8 : 00000000
[  800.170277] r7 : 00000000  r6 : 00000000  r5 : 00000000  r4 : 00000000
[  800.170279] r3 : 00000000  r2 : 00000000  r1 : 00000000  r0 : 00000000
[  800.170283] Flags: nzCv  IRQs on  FIQs off  Mode SVC_32  ISA ARM  Segment user
[  800.170286] Control: 10c5387d  Table: 65fe404a  DAC: 00000015
[  800.170298] [<c0015624>] (unwind_backtrace+0x0/0xf8) from [<c007c138>] (watchdog_timer_fn+0x134/0x17c)
[  800.170309] [<c007c138>] (watchdog_timer_fn+0x134/0x17c) from [<c0046ba8>] (__run_hrtimer.isra.19+0x44/0xd0)
[  800.170316] [<c0046ba8>] (__run_hrtimer.isra.19+0x44/0xd0) from [<c0047318>] (hrtimer_interrupt+0x108/0x2b4)
[  800.170324] [<c0047318>] (hrtimer_interrupt+0x108/0x2b4) from [<c0020f30>] (exynos4_mct_tick_isr+0x28/0x34)
[  800.170331] [<c0020f30>] (exynos4_mct_tick_isr+0x28/0x34) from [<c007f6dc>] (handle_percpu_devid_irq+0x44/0xa0)
[  800.170338] [<c007f6dc>] (handle_percpu_devid_irq+0x44/0xa0) from [<c007c3bc>] (generic_handle_irq+0x18/0x2c)
[  800.170346] [<c007c3bc>] (generic_handle_irq+0x18/0x2c) from [<c000f918>] (handle_IRQ+0x38/0x94)
[  800.170354] [<c000f918>] (handle_IRQ+0x38/0x94) from [<c0087b54>] (__ipipe_do_sync_stage+0x1a8/0x2c0)
[  800.170360] [<c0087b54>] (__ipipe_do_sync_stage+0x1a8/0x2c0) from [<c0008508>] (__ipipe_grab_irq+0x48/0xd0)
[  800.170365] [<c0008508>] (__ipipe_grab_irq+0x48/0xd0) from [<c0008790>] (gic_handle_irq+0x34/0x68)
[  800.170368] Exception stack(0xe5fe9dd0 to 0xe5fe9e18)
[  800.170372] 9dc0:                                     c001f9f8 c006cad8 20000053 ffffffff
[  800.170378] 9de0: e5fe9e1c c000e540 f8828000 00000004 01526000 00000001 c1b25180 c06021c0
[  800.170383] 9e00: 00000000 00000003 c01322b8 00000001 00000000 c062af44
[  800.170389] [<c0008790>] (gic_handle_irq+0x34/0x68) from [<c000e540>] (__irq_svc+0x40/0x6c)
[  800.170391] Exception stack(0xe5fe9de8 to 0xe5fe9e30)
[  800.170396] 9de0:                   f8828000 00000004 01526000 00000001 c1b25180 c06021c0
[  800.170402] 9e00: 00000000 00000003 c01322b8 00000001 00000000 c062af44 00000004 e5fe9e30
[  800.170406] 9e20: c001f9f8 c006cad8 20000053 ffffffff
[  800.170413] [<c000e540>] (__irq_svc+0x40/0x6c) from [<c006cad8>] (smp_call_function_many+0x1a8/0x288)
[  800.170419] [<c006cad8>] (smp_call_function_many+0x1a8/0x288) from [<c006cbf0>] (on_each_cpu_mask+0x38/0x98)
[  800.170426] [<c006cbf0>] (on_each_cpu_mask+0x38/0x98) from [<c0132aa0>] (__alloc_pages_nodemask+0x5ec/0x954)
[  800.170432] [<c0132aa0>] (__alloc_pages_nodemask+0x5ec/0x954) from [<c0132e1c>] (__get_free_pages+0x14/0x2c)
[  800.170440] [<c0132e1c>] (__get_free_pages+0x14/0x2c) from [<c01b0d5c>] (proc_info_read+0x40/0xdc)
[  800.170450] [<c01b0d5c>] (proc_info_read+0x40/0xdc) from [<c0165ecc>] (vfs_read+0x9c/0x140)
[  800.170457] [<c0165ecc>] (vfs_read+0x9c/0x140) from [<c0165fac>] (sys_read+0x3c/0x70)
[  800.170464] [<c0165fac>] (sys_read+0x3c/0x70) from [<c000e9c0>] (ret_fast_syscall+0x0/0x34)
[  800.483079] Flags: nZCv  IRQs on  FIQs off  Mode SVC_32  ISA ARM  Segment user
[  800.490277] Control: 10c5387d  Table: 6614804a  DAC: 00000015
[  800.496006] [<c0015624>] (unwind_backtrace+0x0/0xf8) from [<c007c138>] (watchdog_timer_fn+0x134/0x17c)
[  800.505289] [<c007c138>] (watchdog_timer_fn+0x134/0x17c) from [<c0046ba8>] (__run_hrtimer.isra.19+0x44/0xd0)
[  800.515088] [<c0046ba8>] (__run_hrtimer.isra.19+0x44/0xd0) from [<c0047318>] (hrtimer_interrupt+0x108/0x2b4)
[  800.524889] [<c0047318>] (hrtimer_interrupt+0x108/0x2b4) from [<c0020f30>] (exynos4_mct_tick_isr+0x28/0x34)
[  800.534601] [<c0020f30>] (exynos4_mct_tick_isr+0x28/0x34) from [<c007f6dc>] (handle_percpu_devid_irq+0x44/0xa0)
[  800.544662] [<c007f6dc>] (handle_percpu_devid_irq+0x44/0xa0) from [<c007c3bc>] (generic_handle_irq+0x18/0x2c)
[  800.554549] [<c007c3bc>] (generic_handle_irq+0x18/0x2c) from [<c000f918>] (handle_IRQ+0x38/0x94)
[  800.563308] [<c000f918>] (handle_IRQ+0x38/0x94) from [<c0087b54>] (__ipipe_do_sync_stage+0x1a8/0x2c0)
[  800.572500] [<c0087b54>] (__ipipe_do_sync_stage+0x1a8/0x2c0) from [<c0008508>] (__ipipe_grab_irq+0x48/0xd0)
[  800.582213] [<c0008508>] (__ipipe_grab_irq+0x48/0xd0) from [<c0008790>] (gic_handle_irq+0x34/0x68)
[  800.591141] Exception stack(0xe6131c78 to 0xe6131cc0)
[  800.596168] 1c60:                                                       c001f9f8 c006cad8
[  800.604327] 1c80: 20000053 ffffffff e6131cc4 c000e540 f8820000 00000004 01512000 00000001
[  800.612479] 1ca0: c1b11180 c06021c0 00000000 00000003 c01322b8 00000001 00000000 c062af44
[  800.620632] [<c0008790>] (gic_handle_irq+0x34/0x68) from [<c000e540>] (__irq_svc+0x40/0x6c)
[  800.628954] Exception stack(0xe6131c90 to 0xe6131cd8)
[  800.633982] 1c80:                                     f8820000 00000004 01512000 00000001
[  800.642139] 1ca0: c1b11180 c06021c0 00000000 00000003 c01322b8 00000001 00000000 c062af44
[  800.650291] 1cc0: 00000004 e6131cd8 c001f9f8 c006cad8 20000053 ffffffff
[  800.656883] [<c000e540>] (__irq_svc+0x40/0x6c) from [<c006cad8>] (smp_call_function_many+0x1a8/0x288)
[  800.666080] [<c006cad8>] (smp_call_function_many+0x1a8/0x288) from [<c006cbf0>] (on_each_cpu_mask+0x38/0x98)
[  800.675880] [<c006cbf0>] (on_each_cpu_mask+0x38/0x98) from [<c0132aa0>] (__alloc_pages_nodemask+0x5ec/0x954)
[  800.685701] [<c0132aa0>] (__alloc_pages_nodemask+0x5ec/0x954) from [<bf0983cc>] (nfs_readdir_xdr_to_array+0xb4/0x234 [nfs])
[  800.696813] [<bf0983cc>] (nfs_readdir_xdr_to_array+0xb4/0x234 [nfs]) from [<bf098568>] (nfs_readdir_filler+0x1c/0x84 [nfs])
[  800.707899] [<bf098568>] (nfs_readdir_filler+0x1c/0x84 [nfs]) from [<c012c274>] (do_read_cache_page+0x70/0x15c)
[  800.717942] [<c012c274>] (do_read_cache_page+0x70/0x15c) from [<c012c3a4>] (read_cache_page_async+0x18/0x20)
[  800.727741] [<c012c3a4>] (read_cache_page_async+0x18/0x20) from [<c012c3b4>] (read_cache_page+0x8/0x10)
[  800.737123] [<c012c3b4>] (read_cache_page+0x8/0x10) from [<bf0986e0>] (nfs_readdir+0x110/0x554 [nfs])
[  800.746318] [<bf0986e0>] (nfs_readdir+0x110/0x554 [nfs]) from [<c0175b38>] (vfs_readdir+0x80/0xa4)
[  800.755234] [<c0175b38>] (vfs_readdir+0x80/0xa4) from [<c0175d04>] (sys_getdents64+0x64/0xd8)
[  800.763733] [<c0175d04>] (sys_getdents64+0x64/0xd8) from [<c000e9c0>] (ret_fast_syscall+0x0/0x34)
[  804.140245] BUG: soft lockup - CPU#1 stuck for 22s! [ls:5936]
[  804.140347] Modules linked in: nfsv3 nfsv4 nfsd exportfs nfs_acl auth_rpcgss nfs lockd sunrpc vfat fat smsc95xx usbnet mii gpio_keys
[  804.152222] 
[  804.153699] Pid: 5936, comm:                   ls
[  804.158386] CPU: 1    Tainted: G        W     (3.8.13.11-xen-deb #21)
[  804.164807] PC is at smp_call_function_many+0x1a4/0x288
[  804.170008] LR is at 0x0
[  804.172521] pc : [<c006cad4>]    lr : [<00000000>]    psr: 20000053
[  804.172521] sp : 00000000  ip : 00000000  fp : 00000000
[  804.183977] r10: 00000000  r9 : 00000000  r8 : 00000000
[  804.189174] r7 : 00000000  r6 : 00000000  r5 : 00000000  r4 : 00000000
[  804.195680] r3 : 00000000  r2 : 00000000  r1 : 00000000  r0 : 00000000
[  804.202185] Flags: nzCv  IRQs on  FIQs off  Mode SVC_32  ISA ARM  Segment user
[  804.209385] Control: 10c5387d  Table: 65af004a  DAC: 00000015
[  804.215113] [<c0015624>] (unwind_backtrace+0x0/0xf8) from [<c007c138>] (watchdog_timer_fn+0x134/0x17c)
[  804.224396] [<c007c138>] (watchdog_timer_fn+0x134/0x17c) from [<c0046ba8>] (__run_hrtimer.isra.19+0x44/0xd0)
[  804.234195] [<c0046ba8>] (__run_hrtimer.isra.19+0x44/0xd0) from [<c0047318>] (hrtimer_interrupt+0x108/0x2b4)
[  804.243997] [<c0047318>] (hrtimer_interrupt+0x108/0x2b4) from [<c0020f30>] (exynos4_mct_tick_isr+0x28/0x34)
[  804.253708] [<c0020f30>] (exynos4_mct_tick_isr+0x28/0x34) from [<c007f6dc>] (handle_percpu_devid_irq+0x44/0xa0)
[  804.263769] [<c007f6dc>] (handle_percpu_devid_irq+0x44/0xa0) from [<c007c3bc>] (generic_handle_irq+0x18/0x2c)
[  804.273656] [<c007c3bc>] (generic_handle_irq+0x18/0x2c) from [<c000f918>] (handle_IRQ+0x38/0x94)
[  804.282415] [<c000f918>] (handle_IRQ+0x38/0x94) from [<c0087b54>] (__ipipe_do_sync_stage+0x1a8/0x2c0)
[  804.291607] [<c0087b54>] (__ipipe_do_sync_stage+0x1a8/0x2c0) from [<c0008508>] (__ipipe_grab_irq+0x48/0xd0)
[  804.301320] [<c0008508>] (__ipipe_grab_irq+0x48/0xd0) from [<c0008790>] (gic_handle_irq+0x34/0x68)
[  804.310249] Exception stack(0xe5b87c78 to 0xe5b87cc0)
[  804.315276] 7c60:                                                       c001f9f8 c006cad4
[  804.323434] 7c80: 20000053 ffffffff e5b87cc4 c000e540 f8824000 00000004 0151c000 00000001
[  804.331586] 7ca0: c1b1b180 c06021c0 00000000 00000002 c01322b8 00000001 00000000 c062af44
[  804.339739] [<c0008790>] (gic_handle_irq+0x34/0x68) from [<c000e540>] (__irq_svc+0x40/0x6c)
[  804.348062] Exception stack(0xe5b87c90 to 0xe5b87cd8)
[  804.353089] 7c80:                                     f8824000 00000004 0151c000 00000001
[  804.361247] 7ca0: c1b1b180 c06021c0 00000000 00000002 c01322b8 00000001 00000000 c062af44
[  804.369398] 7cc0: 00000004 e5b87cd8 c001f9f8 c006cad4 20000053 ffffffff
[  804.375990] [<c000e540>] (__irq_svc+0x40/0x6c) from [<c006cad4>] (smp_call_function_many+0x1a4/0x288)
[  804.385186] [<c006cad4>] (smp_call_function_many+0x1a4/0x288) from [<c006cbf0>] (on_each_cpu_mask+0x38/0x98)
[  804.394987] [<c006cbf0>] (on_each_cpu_mask+0x38/0x98) from [<c0132aa0>] (__alloc_pages_nodemask+0x5ec/0x954)
[  804.404807] [<c0132aa0>] (__alloc_pages_nodemask+0x5ec/0x954) from [<bf0983cc>] (nfs_readdir_xdr_to_array+0xb4/0x234 [nfs])
[  804.415918] [<bf0983cc>] (nfs_readdir_xdr_to_array+0xb4/0x234 [nfs]) from [<bf098568>] (nfs_readdir_filler+0x1c/0x84 [nfs])
[  804.427004] [<bf098568>] (nfs_readdir_filler+0x1c/0x84 [nfs]) from [<c012c274>] (do_read_cache_page+0x70/0x15c)
[  804.437049] [<c012c274>] (do_read_cache_page+0x70/0x15c) from [<c012c3a4>] (read_cache_page_async+0x18/0x20)
[  804.446848] [<c012c3a4>] (read_cache_page_async+0x18/0x20) from [<c012c3b4>] (read_cache_page+0x8/0x10)
[  804.456228] [<c012c3b4>] (read_cache_page+0x8/0x10) from [<bf0986e0>] (nfs_readdir+0x110/0x554 [nfs])
[  804.465423] [<bf0986e0>] (nfs_readdir+0x110/0x554 [nfs]) from [<c0175b38>] (vfs_readdir+0x80/0xa4)
[  804.474341] [<c0175b38>] (vfs_readdir+0x80/0xa4) from [<c0175d04>] (sys_getdents64+0x64/0xd8)
[  804.482841] [<c0175d04>] (sys_getdents64+0x64/0xd8) from [<c000e9c0>] (ret_fast_syscall+0x0/0x34)
[  947.765771] INFO: rcu_preempt detected stalls on CPUs/tasks: { 3} (detected by 1, t=120017 jiffies, g=22447, c=22446, q=1166139)
[  947.771694] Task dump for CPU 3:
[  947.774897] switchtest      R running      0  3979   3640 0x00000002
[  947.781244] [<c0453ddc>] (__schedule+0x1fc/0x5f8) from [<00000010>] (0x10)
[ 1127.791434] INFO: rcu_preempt detected stalls on CPUs/tasks: { 3} (detected by 0, t=156022 jiffies, g=22447, c=22446, q=1166139)
[ 1127.797358] Task dump for CPU 3:
[ 1127.800561] switchtest      R running      0  3979   3640 0x00000002
[ 1127.806908] [<c0453ddc>] (__schedule+0x1fc/0x5f8) from [<00000010>] (0x10)
[ 1307.817087] INFO: rcu_preempt detected stalls on CPUs/tasks: { 3} (detected by 0, t=192027 jiffies, g=22447, c=22446, q=1166139)
[ 1307.823002] Task dump for CPU 3:
[ 1307.826206] switchtest      R running      0  3979   3640 0x00000002
[ 1307.832553] [<c0453ddc>] (__schedule+0x1fc/0x5f8) from [<00000010>] (0x10)
[ 1487.842745] INFO: rcu_preempt detected stalls on CPUs/tasks: { 3} (detected by 0, t=228032 jiffies, g=22447, c=22446, q=1166139)
[ 1487.848665] Task dump for CPU 3:
[ 1487.851868] switchtest      R running      0  3979   3640 0x00000002
[ 1487.858213] [<c0453ddc>] (__schedule+0x1fc/0x5f8) from [<00000010>] (0x10)
[ 1667.868406] INFO: rcu_preempt detected stalls on CPUs/tasks: { 3} (detected by 0, t=264037 jiffies, g=22447, c=22446, q=1166139)
[ 1667.874328] Task dump for CPU 3:
[ 1667.877531] switchtest      R running      0  3979   3640 0x00000002
[ 1667.883878] [<c0453ddc>] (__schedule+0x1fc/0x5f8) from [<00000010>] (0x10)
[ 1847.894064] INFO: rcu_preempt detected stalls on CPUs/tasks: { 3} (detected by 2, t=300042 jiffies, g=22447, c=22446, q=1166139)
[ 1847.899981] Task dump for CPU 3:
[ 1847.903185] switchtest      R running      0  3979   3640 0x00000002
[ 1847.909531] [<c0453ddc>] (__schedule+0x1fc/0x5f8) from [<00000010>] (0x10)
[ 2027.919723] INFO: rcu_preempt detected stalls on CPUs/tasks: { 3} (detected by 0, t=336047 jiffies, g=22447, c=22446, q=1166139)
[ 2027.925645] Task dump for CPU 3:
[ 2027.928848] switchtest      R running      0  3979   3640 0x00000002
[ 2027.935194] [<c0453ddc>] (__schedule+0x1fc/0x5f8) from [<00000010>] (0x10)

* Re: [Xenomai] Switchtest failures on ODROIDU3
  2014-10-03  3:35                       ` GP Orcullo
@ 2014-10-03  7:20                         ` Gilles Chanteperdrix
  2014-10-03  8:45                           ` GP Orcullo
  0 siblings, 1 reply; 46+ messages in thread
From: Gilles Chanteperdrix @ 2014-10-03  7:20 UTC (permalink / raw)
  To: GP Orcullo; +Cc: xenomai

On Fri, Oct 03, 2014 at 11:35:33AM +0800, GP Orcullo wrote:
> The system managed to dump some error messages.
>

> [  407.688809] INFO: rcu_preempt detected stalls on CPUs/tasks: { 3} (detected by 0, t=12002 jiffies, g=22447, c=22446, q=113448)
> [  407.694560] Task dump for CPU 3:
> [  407.697763] switchtest      R running      0  3979   3640 0x00000002
> [  407.704111] [<c0453ddc>] (__schedule+0x1fc/0x5f8) from [<00000010>] (0x10)
> [  587.714458] INFO: rcu_preempt detected stalls on CPUs/tasks: { 3} (detected by 2, t=48007 jiffies, g=22447, c=22446, q=443504)
> [  587.720205] Task dump for CPU 3:
> [  587.723407] switchtest      R running      0  3979   3640 0x00000002
> [  587.729755] [<c0453ddc>] (__schedule+0x1fc/0x5f8) from [<00000010>] (0x10)
> [  767.740128] INFO: rcu_preempt detected stalls on CPUs/tasks: { 3} (detected by 2, t=84012 jiffies, g=22447, c=22446, q=1136198)
> [  767.745964] Task dump for CPU 3:
> [  767.749167] switchtest      R running      0  3979   3640 0x00000002
> [  767.755515] [<c0453ddc>] (__schedule+0x1fc/0x5f8) from [<00000010>] (0x10)
> [  782.763027] ------------[ cut here ]------------
> [  782.763099] WARNING: at include/linux/ipipe_domain.h:212 __ipipe_dispatch_irq+0x35c/0x5a0()
> [  782.770380] Modules linked in: nfsv3 nfsv4 nfsd exportfs nfs_acl auth_rpcgss nfs lockd sunrpc vfat fat smsc95xx usbnet mii gpio_keys
> [  782.782218] [<c0015624>] (unwind_backtrace+0x0/0xf8) from [<c0026118>] (warn_slowpath_common+0x4c/0x64)
> [  782.791606] [<c0026118>] (warn_slowpath_common+0x4c/0x64) from [<c002614c>] (warn_slowpath_null+0x1c/0x24)
> [  782.801232] [<c002614c>] (warn_slowpath_null+0x1c/0x24) from [<c00886ac>] (__ipipe_dispatch_irq+0x35c/0x5a0)
> [  782.811035] [<c00886ac>] (__ipipe_dispatch_irq+0x35c/0x5a0) from [<c00200c8>] (combiner_handle_cascade_irq+0x80/0xf4)
> [  782.821620] [<c00200c8>] (combiner_handle_cascade_irq+0x80/0xf4) from [<c0019110>] (__ipipe_ack_irq+0xc/0x10)

This is bad. Can I see the code of the "combiner_handle_cascade_irq" function?

--
					    Gilles.


* Re: [Xenomai] Switchtest failures on ODROIDU3
  2014-10-03  7:20                         ` Gilles Chanteperdrix
@ 2014-10-03  8:45                           ` GP Orcullo
  2014-10-03  8:57                             ` Gilles Chanteperdrix
  0 siblings, 1 reply; 46+ messages in thread
From: GP Orcullo @ 2014-10-03  8:45 UTC (permalink / raw)
  To: Gilles Chanteperdrix; +Cc: xenomai

On Fri, Oct 3, 2014 at 3:20 PM, Gilles Chanteperdrix
<gilles.chanteperdrix@xenomai.org> wrote:
> On Fri, Oct 03, 2014 at 11:35:33AM +0800, GP Orcullo wrote:
>> The system managed to dump some error messages.
>>
>
>> [...]
>
> This is bad, can I see the code of the "combiner_handle_cascade_irq" function?
>
> --
>                                             Gilles.

Here's the original code:

static DEFINE_SPINLOCK(irq_controller_lock);

...

static void combiner_handle_cascade_irq(unsigned int irq, struct irq_desc *desc)
{
    struct combiner_chip_data *chip_data = irq_get_handler_data(irq);
    struct irq_chip *chip = irq_get_chip(irq);
    unsigned int cascade_irq, combiner_irq;
    unsigned long status;

    chained_irq_enter(chip, desc);

    spin_lock(&irq_controller_lock);
    status = __raw_readl(chip_data->base + COMBINER_INT_STATUS);
    spin_unlock(&irq_controller_lock);
    status &= chip_data->irq_mask;

    if (status == 0)
        goto out;

    combiner_irq = __ffs(status);

    cascade_irq = combiner_irq + (chip_data->irq_offset & ~31);
    if (unlikely(cascade_irq >= NR_IRQS))
        do_bad_IRQ(cascade_irq, desc);
    else
        ipipe_handle_demuxed_irq(cascade_irq);

 out:
    chained_irq_exit(chip, desc);
}


I've changed the DEFINE_SPINLOCK to IPIPE_DEFINE_SPINLOCK and I got
the same results.

Here's the link to the full code:

https://github.com/kinsamanka/linux-odroidu3/blob/odroid-3.8.y-xenomai/arch/arm/mach-exynos/common.c

Regards,

GP Orcullo


* Re: [Xenomai] Switchtest failures on ODROIDU3
  2014-10-03  8:45                           ` GP Orcullo
@ 2014-10-03  8:57                             ` Gilles Chanteperdrix
  2014-10-03 10:58                               ` GP Orcullo
  0 siblings, 1 reply; 46+ messages in thread
From: Gilles Chanteperdrix @ 2014-10-03  8:57 UTC (permalink / raw)
  To: GP Orcullo; +Cc: xenomai

On Fri, Oct 03, 2014 at 04:45:53PM +0800, GP Orcullo wrote:
> On Fri, Oct 3, 2014 at 3:20 PM, Gilles Chanteperdrix
> <gilles.chanteperdrix@xenomai.org> wrote:
> > On Fri, Oct 03, 2014 at 11:35:33AM +0800, GP Orcullo wrote:
> >> The system managed to dump some error messages.
> >>
> >
> >> [...]
> >
> > This is bad, can I see the code of the "combiner_handle_cascade_irq" function?
> >
>
> Here's the original code:
>
> [...]
>
>
> I've changed the DEFINE_SPINLOCK to IPIPE_DEFINE_SPINLOCK and I got
> the same results.

By "same results", do you mean you still get the warning in
ipipe_domain.h, or that you still have the lockup?

One of these functions uses local_irq_save/local_irq_restore. One
way to find out which is to enable the I-pipe tracer, enable panic
backtraces, and put a BUG_ON(hard_irqs_enabled()) right before the
call to ipipe_handle_demuxed_irq. You probably cannot call
do_bad_IRQ from here either, but I do not think this is your
problem. This check also looks a bit redundant: to guarantee that
cascade_irq is always less than NR_IRQS, it is sufficient to check
that chip_data->irq_offset & ~31 is no greater than NR_IRQS - 32,
which can be done once, instead of being repeated for every IRQ.
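The range-check suggestion can be sketched as a small user-space C model. This is an illustration only: NR_IRQS = 256 is an assumed value, first_set_bit() is a portable stand-in for the kernel's __ffs(), and combiner_range_ok() and combiner_cascade_irq() are hypothetical names, not kernel APIs. It shows that once (irq_offset & ~31) + 32 <= NR_IRQS has been verified at setup time, every IRQ number the demux can compute is below NR_IRQS, so the per-interrupt test becomes unnecessary.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed value for illustration; the real NR_IRQS is platform-specific. */
#define NR_IRQS 256u

/* Portable stand-in for the kernel's __ffs(): index (0..31) of the lowest
 * set bit. Like __ffs(), it is undefined for 0, so callers check first. */
static unsigned int first_set_bit(uint32_t word)
{
    unsigned int bit = 0;

    assert(word != 0);
    while (!(word & 1u)) {
        word >>= 1;
        bit++;
    }
    return bit;
}

/* One-time check (hypothetical), e.g. at combiner init: the 32-IRQ window
 * starting at (irq_offset & ~31) must fit entirely below NR_IRQS. */
static bool combiner_range_ok(unsigned int irq_offset)
{
    unsigned int base = irq_offset & ~31u;

    return base + 32u <= NR_IRQS;
}

/* Per-interrupt demux modelled on combiner_handle_cascade_irq(): no NR_IRQS
 * comparison is needed here once combiner_range_ok() has passed.
 * Returns -1 when no enabled source is pending (the "goto out" path). */
static int combiner_cascade_irq(unsigned int irq_offset, uint32_t status,
                                uint32_t irq_mask)
{
    status &= irq_mask;
    if (status == 0)
        return -1;
    return (int)(first_set_bit(status) + (irq_offset & ~31u));
}
```

For example, with irq_offset = 37 and only bit 5 pending under a 0xff mask, combiner_cascade_irq() returns 32 + 5 = 37; with nothing pending under the mask it returns -1. Since first_set_bit() never exceeds 31, the largest possible result is (irq_offset & ~31) + 31, which the one-time check already bounds.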

--
					    Gilles.


* Re: [Xenomai] Switchtest failures on ODROIDU3
  2014-10-03  8:57                             ` Gilles Chanteperdrix
@ 2014-10-03 10:58                               ` GP Orcullo
  2014-10-03 13:37                                 ` Gilles Chanteperdrix
  0 siblings, 1 reply; 46+ messages in thread
From: GP Orcullo @ 2014-10-03 10:58 UTC (permalink / raw)
  To: Gilles Chanteperdrix; +Cc: xenomai

On Fri, Oct 3, 2014 at 4:57 PM, Gilles Chanteperdrix
<gilles.chanteperdrix@xenomai.org> wrote:
> On Fri, Oct 03, 2014 at 04:45:53PM +0800, GP Orcullo wrote:
>> On Fri, Oct 3, 2014 at 3:20 PM, Gilles Chanteperdrix
>> <gilles.chanteperdrix@xenomai.org> wrote:
>> > On Fri, Oct 03, 2014 at 11:35:33AM +0800, GP Orcullo wrote:
>> >> The system managed to dump some error messages.
>> >>
>> >
>> >> [...]
>> >
>> > This is bad, can I see the code of the "combiner_handle_cascade_irq" function?
>> >
>>
>> Here's the original code:
>>
>> [...]
>>
>>
>> I've changed the DEFINE_SPINLOCK to IPIPE_DEFINE_SPINLOCK and I got
>> the same results.
>
> Same results meaning you still have the warning in ipipe_domain ? Or
> still have the lockup?
>

Lockup. It was just luck that that error message came up.

>
> One of these functions uses local_irq_save/local_irq_restore, one
> way to find which is to enable the I-pipe tracer, enable panic
> backtrace, and put a BUG_ON(hard_irqs_enabled()) right before the
> call to ipipe_handle_demuxed_irq. You probably can not call
> do_bad_IRQ from here either, but I do not think this is your
> problem, this check looks a bit redundant: in order to check that
> cascade_irq is always less than NR_IRQS, it is sufficient to check
> that chip_data->irq_offset & ~31 is less than NR_IRQS - 32, which
> can be done only once, and not repeated stupidly for every irq.
>
> --
>                                             Gilles.

Thanks, will let you know how it goes.

-- 
GP Orcullo


* Re: [Xenomai] Switchtest failures on ODROIDU3
  2014-10-03 10:58                               ` GP Orcullo
@ 2014-10-03 13:37                                 ` Gilles Chanteperdrix
  2014-10-03 15:28                                   ` GP Orcullo
  0 siblings, 1 reply; 46+ messages in thread
From: Gilles Chanteperdrix @ 2014-10-03 13:37 UTC (permalink / raw)
  To: GP Orcullo; +Cc: xenomai

On 10/03/2014 12:58 PM, GP Orcullo wrote:
> On Fri, Oct 3, 2014 at 4:57 PM, Gilles Chanteperdrix
> <gilles.chanteperdrix@xenomai.org> wrote:
>> On Fri, Oct 03, 2014 at 04:45:53PM +0800, GP Orcullo wrote:
>>> On Fri, Oct 3, 2014 at 3:20 PM, Gilles Chanteperdrix
>>> <gilles.chanteperdrix@xenomai.org> wrote:
>>>> On Fri, Oct 03, 2014 at 11:35:33AM +0800, GP Orcullo wrote:
>>>>> The system managed to dump some error messages.
>>>>>
>>>>
>>>>> [...]
>>>>
>>>> This is bad, can I see the code of the "combiner_handle_cascade_irq" function?
>>>>
>>>
>>> Here's the original code:
>>>
>>> [...]
>>>
>>>
>>> I've changed the DEFINE_SPINLOCK to IPIPE_DEFINE_SPINLOCK and I got
>>> the same results.
>>
>> Same results meaning you still have the warning in ipipe_domain ? Or
>> still have the lockup?
>>
> 
> Lockup. It was just luck that that error message came up.

The warning only shows up when I-pipe debugging is enabled, and it was
probably caused by the spinlock.

> 
>>
>> One of these functions uses local_irq_save/local_irq_restore, one
>> way to find which is to enable the I-pipe tracer, enable panic
>> backtrace, and put a BUG_ON(hard_irqs_enabled()) right before the
>> call to ipipe_handle_demuxed_irq. You probably can not call
>> do_bad_IRQ from here either, but I do not think this is your
>> problem, this check looks a bit redundant: in order to check that
>> cascade_irq is always less than NR_IRQS, it is sufficient to check
>> that chip_data->irq_offset & ~31 is less than NR_IRQS - 32, which
>> can be done only once, and not repeated stupidly for every irq.
>>
> 
> Thanks, will let you know how it goes.

I do not think that is necessary: if the error was due to the spinlock,
it should be gone now that you replaced it with an I-pipe spinlock. The
test will not help debug the lockup, which may be something different.
But you can run it anyway, to make sure that IRQs are not enabled at
this point.

In any case, the backtrace you sent explains why the CPU appears to be
stalled: because the root domain is unstalled when a GPIO IRQ happens,
smp_call_function_many ends up invoking callbacks on CPUs where Linux
has been preempted by Xenomai, and schedule is then invoked there. That
probably goes south, either because the Linux data are not in a
consistent state or, more likely, because it preempted a Xenomai kernel
thread (switchtest runs a lot of them) and tries to use current, which
cannot work.
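The failure mode described above can be illustrated with a deliberately simplified toy model of the interrupt pipeline's virtual masking. Everything here is hypothetical (the toy_* names, the single-CPU structure, the 32-entry IRQ log); it is not the I-pipe implementation, only the idea that Linux IRQs are logged while the root domain is stalled or while Xenomai owns the CPU, and replayed later from Linux context. The unchecked unstall stands in, as an assumption, for a root-domain operation such as local_irq_restore() reached from the head-domain path, which replays Linux handlers on a CPU that Xenomai has preempted.

```c
#include <stdbool.h>

/* Toy model only: names and structure are hypothetical, not the I-pipe API. */
enum toy_domain { TOY_ROOT, TOY_HEAD };  /* ROOT = Linux, HEAD = Xenomai */

struct toy_cpu {
    enum toy_domain current;  /* domain that currently owns this CPU */
    bool root_stalled;        /* Linux's virtual "IRQs off" flag */
    unsigned int pending;     /* bitmask of logged Linux IRQs (0..31) */
    int handled;              /* Linux handlers run so far */
    int handled_in_head;      /* ...of which ran while Xenomai owned the CPU */
};

static void toy_run_root_handler(struct toy_cpu *cpu)
{
    cpu->handled++;
    if (cpu->current != TOY_ROOT)
        cpu->handled_in_head++;  /* Linux code running in a Xenomai context */
}

/* Replay every logged Linux IRQ, lowest number first. */
static void toy_replay(struct toy_cpu *cpu)
{
    for (unsigned int irq = 0; irq < 32; irq++) {
        if (cpu->pending & (1u << irq)) {
            cpu->pending &= ~(1u << irq);
            toy_run_root_handler(cpu);
        }
    }
}

/* A Linux IRQ arrives: always log it, and replay immediately only if Linux
 * owns the CPU and is not virtually stalled. */
static void toy_root_irq(struct toy_cpu *cpu, unsigned int irq)
{
    cpu->pending |= 1u << irq;
    if (cpu->current == TOY_ROOT && !cpu->root_stalled)
        toy_replay(cpu);
}

/* Checked unstall: the log is replayed only from Linux context. */
static void toy_unstall_root(struct toy_cpu *cpu)
{
    cpu->root_stalled = false;
    if (cpu->current == TOY_ROOT)
        toy_replay(cpu);
}

/* Unchecked unstall: replays without looking at who owns the CPU. */
static void toy_unstall_root_unchecked(struct toy_cpu *cpu)
{
    cpu->root_stalled = false;
    toy_replay(cpu);
}
```

With the checked variant, an IRQ logged while Xenomai runs waits until Linux owns the CPU again; with the unchecked variant the Linux handler runs while Xenomai still owns the CPU (handled_in_head becomes 1), which is the kind of situation that cannot work.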

-- 
                                                                Gilles.


* Re: [Xenomai] Switchtest failures on ODROIDU3
  2014-10-03 13:37                                 ` Gilles Chanteperdrix
@ 2014-10-03 15:28                                   ` GP Orcullo
  2014-10-03 19:14                                     ` Gilles Chanteperdrix
  0 siblings, 1 reply; 46+ messages in thread
From: GP Orcullo @ 2014-10-03 15:28 UTC (permalink / raw)
  To: Gilles Chanteperdrix; +Cc: xenomai

On Fri, Oct 3, 2014 at 9:37 PM, Gilles Chanteperdrix
<gilles.chanteperdrix@xenomai.org> wrote:
> On 10/03/2014 12:58 PM, GP Orcullo wrote:
>> On Fri, Oct 3, 2014 at 4:57 PM, Gilles Chanteperdrix
>> <gilles.chanteperdrix@xenomai.org> wrote:
>>> On Fri, Oct 03, 2014 at 04:45:53PM +0800, GP Orcullo wrote:
>>>> On Fri, Oct 3, 2014 at 3:20 PM, Gilles Chanteperdrix
>>>> <gilles.chanteperdrix@xenomai.org> wrote:
>>>>> On Fri, Oct 03, 2014 at 11:35:33AM +0800, GP Orcullo wrote:
>>>>>> The system managed to dump some error messages.
>>>>>>
>>>>>
>>>>>> [...]
>>>>>
>>>>> This is bad, can I see the code of the "combiner_handle_cascade_irq" function?
>>>>>
>>>>
>>>> Here's the original code:
>>>>
>>>> [...]
>>>>
>>>>
>>>> I've changed the DEFINE_SPINLOCK to IPIPE_DEFINE_SPINLOCK and I got
>>>> the same results.
>>>
>>> Same results meaning you still have the warning in ipipe_domain ? Or
>>> still have the lockup?
>>>
>>
>> Lockup. It was just luck that that error message came up.
>
> The error message only happens with I-pipe debugging enabled. And the
> error was probably due to the spinlock.
>
>>
>>>
>>> One of these functions uses local_irq_save/local_irq_restore, one
>>> way to find which is to enable the I-pipe tracer, enable panic
>>> backtrace, and put a BUG_ON(hard_irqs_enabled()) right before the
>>> call to ipipe_handle_demuxed_irq. You probably can not call
>>> do_bad_IRQ from here either, but I do not think this is your
>>> problem, this check looks a bit redundant: in order to check that
>>> cascade_irq is always less than NR_IRQS, it is sufficient to check
>>> that chip_data->irq_offset & ~31 is less than NR_IRQS - 32, which
>>> can be done only once, and not repeated stupidly for every irq.
>>>
>>
>> Thanks, will let you know how it goes.
>
> I do not think that is necessary, if the error was due to the spinlock,
> now that you replaced with an I-pipe spinlock, you will no longer have
> it. It will not help debugging the spinlock, which may be something
> different. But you can test that anyway, to be sure that the irqs are
> not enabled at this point.
>
> In any case, the backtrace you sent explains why the cpu appears to be
> stalled: because root is unstalled while a GPIO irq happens,
> smp_call_functions ends up invoking callbacks on cpus where Linux has
> been preempted by Xenomai, then invokes schedule, which probably goes
> south because either Linux data are not in a consistent state or more
> probably because it preempted a Xenomai kernel thread (switchtest runs a
> lot of them), and tries to use current, which can not work.
>
> --
>                                                                 Gilles.

I've added BUG_ON(!hard_irqs_disabled()) to the code and got the
kernel to oops at startup.

Where shall I start looking for the offending code?

-- 
GP Orcullo
-------------- next part --------------
[   15.603409] Unable to handle kernel NULL pointer dereference at virtual address 0000000c
[   15.606348] pgd = e6004000, hw pgd = e6004000
[   15.610624] [0000000c] *pgd=00000000
[   15.613969] Internal error: Oops: 5 [#1] PREEMPT SMP ARM
[   15.616703] s5p-mixer s5p-mixer: start latency exceeded, new value 333 ns
[   15.616747] s5p-mixer s5p-mixer: state restore latency exceeded, new value 36542 ns
[   15.617000] s5p-mixer s5p-mixer: stop latency exceeded, new value 291 ns
[   15.617024] s5p-mixer s5p-mixer: state save latency exceeded, new value 11750 ns
[   15.617029] s5p-mixer s5p-mixer: stop latency exceeded, new value 292 ns
[   15.654205] Modules linked in: mii(+)
[   15.657830] CPU: 1    Not tainted  (3.8.13.11-xen #25)
[   15.662966] PC is at load_module+0x1908/0x1e18
[   15.667382] LR is at ipipe_root_only+0x58/0x160
[   15.671890] pc : [<c007935c>]    lr : [<c008f614>]    psr: a0000053
[   15.671890] sp : e6bddeb0  ip : bf000c40  fp : c04c4c68
[   15.683355] r10: 00000000  r9 : bf000b34  r8 : 00000000
[   15.688542] r7 : fffffff8  r6 : bf000aec  r5 : bf000af8  r4 : e6bddf58
[   15.695050] r3 : e7135800  r2 : e6bddea8  r1 : c08c7ac0  r0 : 00000000
[   15.701555] Flags: NzCv  IRQs on  FIQs off  Mode SVC_32  ISA ARM  Segment user
[   15.708756] Control: 10c5387d  Table: 6600404a  DAC: 00000015
[   15.714476] Process modprobe (pid: 1728, stack limit = 0xe6bdc240)
[   15.720634] Stack: (0xe6bddeb0 to 0xe6bde000)
[   15.724968] dea0:                                     bf000af8 00007fff c0074e48 00001302
[   15.733133] dec0: 00000000 f01d1000 b6ec1d50 e6bdc000 bf000c40 e6bddef4 e6bdddb0 00000000
[   15.741285] dee0: c001aa44 c000e860 f01f2000 b6da3000 00000c8f bf000980 00000008 00000000
[   15.749437] df00: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[   15.757589] df20: 00000000 00000000 00000000 00000000 000000d2 00021d0f b6d82000 b6ec1d50
[   15.765742] df40: 00000080 c000ef70 e6bdc000 00000000 00000000 c007994c f01d1000 00021d0f
[   15.773894] df60: f01eb2c0 f01eb171 f01f29d0 00000c3c 00000edc 00000000 00000000 00000000
[   15.782046] df80: 0000001f 00000020 0000000d 00000000 0000000a 00000000 00000000 b6f10290
[   15.790209] dfa0: b6f12910 c000ed40 00000000 b6f10290 b6d82000 00021d0f b6ec1d50 00000002
[   15.798361] dfc0: 00000000 b6f10290 b6f12910 00000080 00000000 b6ec1d50 00021d0f 00000000
[   15.800249] s5p-mixer s5p-mixer: start latency exceeded, new value 542 ns
[   15.800535] s5p-mixer s5p-mixer: stop latency exceeded, new value 333 ns
[   15.819951] dfe0: 00060000 be9ce914 b6ebbb07 b6e3f684 80000050 b6d82000 00000000 00000000
[   15.828113] [<c007935c>] (load_module+0x1908/0x1e18) from [<c007994c>] (sys_init_module+0xe0/0xf4)
[   15.837040] [<c007994c>] (sys_init_module+0xe0/0xf4) from [<c000ed40>] (ret_fast_syscall+0x0/0x34)
[   15.845971] Code: e59dc020 e15c0007 e2477008 0a000009 (e5973014) 
[   15.852033] I-pipe tracer log (100 points):
[   15.856183]      +func                    0 ipipe_trace_panic_freeze+0x8 (oops_enter+0x14)
[   15.864412]      +func                   -1 oops_enter+0x8 (die+0x20)
[   15.870830]      +func                   -1 die+0xc (__do_kernel_fault.part.8+0x5c)
[   15.878462]  |   #func                   -2 ipipe_root_only+0x8 (ipipe_unstall_root+0x14)
[   15.886615]      #func                   -3 ipipe_unstall_root+0x8 (vprintk_emit+0x1cc)
[   15.894593]      #func                   -3 wake_up_klogd+0x8 (vprintk_emit+0x1bc)
[   15.902138]      #func                   -4 ipipe_test_root+0x8 (preempt_schedule+0x24)
[   15.910118]      #func                   -4 ipipe_root_only+0x8 (sub_preempt_count+0x18)
[   15.918184]      #func                   -5 sub_preempt_count+0xc (_raw_spin_unlock_irqrestore+0x28)
[   15.927289]      #func                   -5 _raw_spin_unlock_irqrestore+0x8 (console_unlock+0x1e0)
[   15.936222]      #func                   -6 __ipipe_spin_unlock_debug+0x8 (console_unlock+0x1d4)
[   15.944981]      #func                   -6 ipipe_root_only+0x8 (add_preempt_count+0x18)
[   15.953047]      #func                   -7 add_preempt_count+0xc (_raw_spin_lock+0x18)
[   15.961026]      #func                   -7 _raw_spin_lock+0x8 (console_unlock+0x1c0)
[   15.968831]      #func                   -8 ipipe_test_root+0x8 (preempt_schedule+0x24)
[   15.976810]      #func                   -8 ipipe_root_only+0x8 (sub_preempt_count+0x18)
[   15.984876]      #func                   -9 sub_preempt_count+0xc (_raw_spin_unlock_irqrestore+0x28)
[   15.993982]      #func                   -9 _raw_spin_unlock_irqrestore+0x8 (console_unlock+0x1b8)
[   16.002915]      #func                  -10 __ipipe_spin_unlock_debug+0x8 (up+0x3c)
[   16.010547]      #func                  -10 ipipe_root_only+0x8 (add_preempt_count+0x18)
[   16.018612]      #func                  -11 add_preempt_count+0xc (_raw_spin_lock_irqsave+0x20)
[   16.027285]      #func                  -11 ipipe_root_only+0x8 (ipipe_test_and_stall_root+0x10)
[   16.036045]      #func                  -12 ipipe_test_and_stall_root+0x8 (_raw_spin_lock_irqsave+0x14)
[   16.045411]      #func                  -12 _raw_spin_lock_irqsave+0x8 (up+0x14)
[   16.052783]      #func                  -13 up+0x8 (console_unlock+0x1b8)
[   16.059548]      #func                  -13 ipipe_test_root+0x8 (preempt_schedule+0x24)
[   16.067527]      #func                  -14 ipipe_root_only+0x8 (sub_preempt_count+0x18)
[   16.075592]      #func                  -14 sub_preempt_count+0xc (_raw_spin_unlock+0x18)
[   16.083745]      #func                  -15 _raw_spin_unlock+0x8 (console_unlock+0x1b0)
[   16.091724]      #func                  -16 ipipe_root_only+0x8 (add_preempt_count+0x18)
[   16.099789]      #func                  -16 add_preempt_count+0xc (_raw_spin_lock_irqsave+0x20)
[   16.108462]      #func                  -17 ipipe_root_only+0x8 (ipipe_test_and_stall_root+0x10)
[   16.117222]      #func                  -17 ipipe_test_and_stall_root+0x8 (_raw_spin_lock_irqsave+0x14)
[   16.126588]      #func                  -17 _raw_spin_lock_irqsave+0x8 (console_unlock+0x8c)
[   16.135000]      #func                  -18 ipipe_test_root+0x8 (preempt_schedule+0x24)
[   16.142979]      #func                  -19 ipipe_root_only+0x8 (sub_preempt_count+0x18)
[   16.151045]      #func                  -19 sub_preempt_count+0xc (_raw_spin_unlock+0x18)
[   16.159197]      #func                  -19 _raw_spin_unlock+0x8 (call_console_drivers.constprop.15+0xf8)
[   16.168738]      #func                  -20 __rcu_read_unlock+0x8 (__atomic_notifier_call_chain+0x48)
[   16.177931]      #func                  -20 notifier_call_chain+0x8 (__atomic_notifier_call_chain+0x40)
[   16.187297]      #func                  -21 __rcu_read_lock+0x4 (__atomic_notifier_call_chain+0x24)
[   16.196317]      #func                  -21 __atomic_notifier_call_chain+0xc (atomic_notifier_call_chain+0x20)
[   16.206290]      #func                  -22 atomic_notifier_call_chain+0xc (notify_update+0x30)
[   16.214963]      #func                  -22 notify_update+0xc (vt_console_print+0x1cc)
[   16.222855]      #func                  -23 dummycon_dummy+0x4 (set_cursor+0x90)
[   16.230227]      #func                  -23 add_softcursor+0x8 (set_cursor+0x6c)
[   16.237599]      #func                  -24 set_cursor+0x8 (vt_console_print+0x1c4)
[   16.245231]      #func                  -24 __rcu_read_unlock+0x8 (__atomic_notifier_call_chain+0x48)
[   16.254424]      #func                  -25 notifier_call_chain+0x8 (__atomic_notifier_call_chain+0x40)
[   16.263791]      #func                  -25 __rcu_read_lock+0x4 (__atomic_notifier_call_chain+0x24)
[   16.272810]      #func                  -26 __atomic_notifier_call_chain+0xc (atomic_notifier_call_chain+0x20)
[   16.282784]      #func                  -26 atomic_notifier_call_chain+0xc (notify_write+0x28)
[   16.291370]      #func                  -26 notify_write+0xc (vt_console_print+0x260)
[   16.299175]      #func                  -27 __rcu_read_unlock+0x8 (__atomic_notifier_call_chain+0x48)
[   16.308368]      #func                  -27 notifier_call_chain+0x8 (__atomic_notifier_call_chain+0x40)
[   16.317735]      #func                  -28 __rcu_read_lock+0x4 (__atomic_notifier_call_chain+0x24)
[   16.326754]      #func                  -28 __atomic_notifier_call_chain+0xc (atomic_notifier_call_chain+0x20)
[   16.336728]      #func                  -29 atomic_notifier_call_chain+0xc (notify_write+0x28)
[   16.345314]      #func                  -29 notify_write+0xc (vt_console_print+0x230)
[   16.353120]      #func                  -33 dummycon_dummy+0x4 (scrup+0xe8)
[   16.360058]      #func                  -33 scrup+0xc (lf+0x6c)
[   16.365955]      #func                  -34 lf+0x8 (vt_console_print+0x230)
[   16.372893]      #func                  -34 dummycon_dummy+0x4 (hide_cursor+0x38)
[   16.380352]      #func                  -35 hide_cursor+0x8 (vt_console_print+0x2c0)
[   16.388071]      #func                  -36 ipipe_root_only+0x8 (add_preempt_count+0x18)
[   16.396136]      #func                  -36 add_preempt_count+0xc (_raw_spin_trylock+0x18)
[   16.404376]      #func                  -37 _raw_spin_trylock+0x8 (vt_console_print+0x48)
[   16.412528]      #func                  -37 vt_console_print+0xc (call_console_drivers.constprop.15+0xf8)
[   16.422068]      #func                  -38 ipipe_test_root+0x8 (debug_smp_processor_id+0x7c)
[   16.430567]      #func                  -39 s3c24xx_serial_console_putchar+0x8 (uart_console_write+0x5c)
[   16.440020]      #func                  -41 s3c24xx_serial_console_putchar+0x8 (uart_console_write+0x50)
[   16.449474]      #func                  -41 uart_console_write+0x8 (call_console_drivers.constprop.15+0xf8)
[   16.459187]      #func                  -42 s3c24xx_serial_console_write+0x4 (call_console_drivers.constprop.15+0xf8)
[   16.469768]      #func                  -42 ipipe_test_root+0x8 (debug_smp_processor_id+0x7c)
[   16.478267]      #func                  -43 call_console_drivers.constprop.15+0xc (console_unlock+0x288)
[   16.487720]      #func                  -43 ipipe_test_root+0x8 (preempt_schedule+0x24)
[   16.495699]      #func                  -44 ipipe_root_only+0x8 (sub_preempt_count+0x18)
[   16.503765]      #func                  -44 sub_preempt_count+0xc (_raw_spin_unlock+0x18)
[   16.511917]      #func                  -45 _raw_spin_unlock+0x8 (console_unlock+0x27c)
[   16.519896]      #func                  -46 ipipe_root_only+0x8 (add_preempt_count+0x18)
[   16.527962]      #func                  -46 add_preempt_count+0xc (_raw_spin_lock_irqsave+0x20)
[   16.536635]      #func                  -47 ipipe_root_only+0x8 (ipipe_test_and_stall_root+0x10)
[   16.545394]      #func                  -47 ipipe_test_and_stall_root+0x8 (_raw_spin_lock_irqsave+0x14)
[   16.554761]      #func                  -48 _raw_spin_lock_irqsave+0x8 (console_unlock+0x34)
[   16.563173]      #func                  -48 console_unlock+0xc (vprintk_emit+0x1bc)
[   16.570805]      #func                  -48 ipipe_test_root+0x8 (preempt_schedule+0x24)
[   16.578784]      #func                  -49 ipipe_root_only+0x8 (sub_preempt_count+0x18)
[   16.586850]      #func                  -49 sub_preempt_count+0xc (_raw_spin_unlock+0x18)
[   16.595002]      #func                  -50 _raw_spin_unlock+0x8 (vprintk_emit+0x1b8)
[   16.602807]      #func                  -51 ipipe_root_only+0x8 (sub_preempt_count+0x18)
[   16.610873]      #func                  -51 sub_preempt_count+0xc (_raw_spin_unlock_irqrestore+0x28)
[   16.619979]      #func                  -52 _raw_spin_unlock_irqrestore+0x8 (down_trylock+0x38)
[   16.628652]      #func                  -52 __ipipe_spin_unlock_debug+0x8 (down_trylock+0x2c)
[   16.637151]      #func                  -53 ipipe_root_only+0x8 (add_preempt_count+0x18)
[   16.645217]      #func                  -53 add_preempt_count+0xc (_raw_spin_lock_irqsave+0x20)
[   16.653890]      #func                  -54 ipipe_root_only+0x8 (ipipe_test_and_stall_root+0x10)
[   16.662650]      #func                  -54 ipipe_test_and_stall_root+0x8 (_raw_spin_lock_irqsave+0x14)
[   16.672016]      #func                  -55 _raw_spin_lock_irqsave+0x8 (down_trylock+0x14)
[   16.680255]      #func                  -55 down_trylock+0x8 (console_trylock+0x14)
[   16.687887]      #func                  -56 console_trylock+0x8 (vprintk_emit+0x16c)
[   16.695700] ---[ end trace c1425b6856ac7c36 ]---

* Re: [Xenomai] Switchtest failures on ODROIDU3
  2014-10-03 15:28                                   ` GP Orcullo
@ 2014-10-03 19:14                                     ` Gilles Chanteperdrix
  2014-10-03 22:45                                       ` GP Orcullo
  0 siblings, 1 reply; 46+ messages in thread
From: Gilles Chanteperdrix @ 2014-10-03 19:14 UTC (permalink / raw)
  To: GP Orcullo; +Cc: xenomai

On Fri, Oct 03, 2014 at 11:28:52PM +0800, GP Orcullo wrote:
> I've added BUG_ON(!hard_irqs_disabled()) to the code and got the
> kernel to oops at startup.
>
> Where shall I start looking for the offending code?

The thing is, all the tracepoints you have come from well after the
fault. A better strategy is to replace BUG_ON(!hard_irqs_disabled())
with:

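if (!hard_irqs_disabled()) {
   ipipe_trace_panic_freeze();   /* stop recording: keep the points leading here */
   ipipe_trace_panic_dump();     /* print the frozen trace to the console */
   BUG();                        /* then oops with a full backtrace */
}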

--
					    Gilles.


* Re: [Xenomai] Switchtest failures on ODROIDU3
  2014-10-03 19:14                                     ` Gilles Chanteperdrix
@ 2014-10-03 22:45                                       ` GP Orcullo
  2014-10-03 22:48                                         ` Gilles Chanteperdrix
  0 siblings, 1 reply; 46+ messages in thread
From: GP Orcullo @ 2014-10-03 22:45 UTC (permalink / raw)
  To: Gilles Chanteperdrix; +Cc: xenomai

On Sat, Oct 4, 2014 at 3:14 AM, Gilles Chanteperdrix
<gilles.chanteperdrix@xenomai.org> wrote:
> On Fri, Oct 03, 2014 at 11:28:52PM +0800, GP Orcullo wrote:
>> I've added BUG_ON(!hard_irqs_disabled()) to the code and got the
>> kernel to oops at startup.
>>
>> Where shall I start looking for the offending code?
>
> The thing is, all the tracepoints you have are late after the
> fault. A better strategy is to replace BUG_ON(!hard_irqs_disabled)
> with:
>
> if (!hard_irqs_disabled()) {
>    ipipe_trace_panic_freeze();
>    ipipe_trace_panic_dump();
>    BUG();
> }
>
> --
>                                             Gilles.

The results are nearly identical.

-- 
GP Orcullo
-------------- next part --------------
[   15.550769] Unable to handle kernel NULL pointer dereference at virtual address 0000000c
[   15.553538] pgd = e6078000, hw pgd = e6078000
[   15.557960] [0000000c] *pgd=00000000
[   15.561285] Internal error: Oops: 5 [#1] PREEMPT SMP ARM
[   15.566436] Modules linked in: mii(+)
[   15.570060] CPU: 2    Not tainted  (3.8.13.11-xen #27)
[   15.575199] PC is at load_module+0x1908/0x1e18
[   15.579618] LR is at ipipe_root_only+0x58/0x160
[   15.584122] pc : [<c0079364>]    lr : [<c008f61c>]    psr: a0000053
[   15.584122] sp : e604feb0  ip : bf000c40  fp : c04c4c68
[   15.595586] r10: 00000000  r9 : bf000b34  r8 : 00000000
[   15.600772] r7 : fffffff8  r6 : bf000aec  r5 : bf000af8  r4 : e604ff58
[   15.607277] r3 : e6b98c00  r2 : e604fea8  r1 : c08c7ac0  r0 : 00000000
[   15.613782] Flags: NzCv  IRQs on  FIQs off  Mode SVC_32  ISA ARM  Segment user
[   15.620991] Control: 10c5387d  Table: 6607804a  DAC: 00000015
[   15.626710] Process modprobe (pid: 1739, stack limit = 0xe604e240)
[   15.632865] Stack: (0xe604feb0 to 0xe6050000)
[   15.637199] fea0:                                     bf000af8 00007fff c0074e50 00001302
[   15.645364] fec0: 00000000 f01d1000 b6f39d50 e604e000 bf000c40 e604fef4 e604fdb0 00000000
[   15.653518] fee0: c001aa44 c000e860 f01f2000 b6e1b000 00000c8f bf000980 00000008 00000000
[   15.661670] ff00: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[   15.669821] ff20: 00000000 00000000 00000000 00000000 000000d2 00021d0f b6dfa000 b6f39d50
[   15.677975] ff40: 00000080 c000ef70 e604e000 00000000 00000000 c0079954 f01d1000 00021d0f
[   15.686126] ff60: f01eb2c0 f01eb171 f01f29d0 00000c3c 00000edc 00000000 00000000 00000000
[   15.694273] ff80: 0000001f 00000020 0000000d 00000000 0000000a 00000000 00000000 b6f88290
[   15.702426] ffa0: b6f8a910 c000ed40 00000000 b6f88290 b6dfa000 00021d0f b6f39d50 00000002
[   15.710585] ffc0: 00000000 b6f88290 b6f8a910 00000080 00000000 b6f39d50 00021d0f 00000000
[   15.718739] ffe0: 00060000 beea3914 b6f33b07 b6eb7684 80000050 b6dfa000 00000000 00000000
[   15.726896] [<c0079364>] (load_module+0x1908/0x1e18) from [<c0079954>] (sys_init_module+0xe0/0xf4)
[   15.735831] [<c0079954>] (sys_init_module+0xe0/0xf4) from [<c000ed40>] (ret_fast_syscall+0x0/0x34)
[   15.744757] Code: e59dc020 e15c0007 e2477008 0a000009 (e5973014) 
[   15.750824] I-pipe tracer log (100 points):
[   15.754972]      +func                    0 ipipe_trace_panic_freeze+0x8 (oops_enter+0x14)
[   15.763198]      +func                   -1 oops_enter+0x8 (die+0x20)
[   15.769614]      +func                   -2 die+0xc (__do_kernel_fault.part.8+0x5c)
[   15.777245]  |   #func                   -3 ipipe_root_only+0x8 (ipipe_unstall_root+0x14)
[   15.785398]      #func                   -3 ipipe_unstall_root+0x8 (vprintk_emit+0x1cc)
[   15.793377]      #func                   -4 wake_up_klogd+0x8 (vprintk_emit+0x1bc)
[   15.800924]      #func                   -5 ipipe_test_root+0x8 (preempt_schedule+0x24)
[   15.808901]      #func                   -5 ipipe_root_only+0x8 (sub_preempt_count+0x18)
[   15.816966]      #func                   -6 sub_preempt_count+0xc (_raw_spin_unlock_irqrestore+0x28)
[   15.826072]      #func                   -6 _raw_spin_unlock_irqrestore+0x8 (console_unlock+0x1e0)
[   15.835005]      #func                   -6 __ipipe_spin_unlock_debug+0x8 (console_unlock+0x1d4)
[   15.843765]      #func                   -7 ipipe_root_only+0x8 (add_preempt_count+0x18)
[   15.851830]      #func                   -7 add_preempt_count+0xc (_raw_spin_lock+0x18)
[   15.859809]      #func                   -8 _raw_spin_lock+0x8 (console_unlock+0x1c0)
[   15.867615]      #func                   -8 ipipe_test_root+0x8 (preempt_schedule+0x24)
[   15.875594]      #func                   -9 ipipe_root_only+0x8 (sub_preempt_count+0x18)
[   15.883659]      #func                   -9 sub_preempt_count+0xc (_raw_spin_unlock_irqrestore+0x28)
[   15.892766]      #func                  -10 _raw_spin_unlock_irqrestore+0x8 (console_unlock+0x1b8)
[   15.901699]      #func                  -10 __ipipe_spin_unlock_debug+0x8 (up+0x3c)
[   15.909330]      #func                  -11 ipipe_root_only+0x8 (add_preempt_count+0x18)
[   15.917396]      #func                  -12 add_preempt_count+0xc (_raw_spin_lock_irqsave+0x20)
[   15.926069]      #func                  -12 ipipe_root_only+0x8 (ipipe_test_and_stall_root+0x10)
[   15.934829]      #func                  -13 ipipe_test_and_stall_root+0x8 (_raw_spin_lock_irqsave+0x14)
[   15.944195]      #func                  -13 _raw_spin_lock_irqsave+0x8 (up+0x14)
[   15.951567]      #func                  -13 up+0x8 (console_unlock+0x1b8)
[   15.958332]      #func                  -14 ipipe_test_root+0x8 (preempt_schedule+0x24)
[   15.966310]      #func                  -15 ipipe_root_only+0x8 (sub_preempt_count+0x18)
[   15.974376]      #func                  -15 sub_preempt_count+0xc (_raw_spin_unlock+0x18)
[   15.982528]      #func                  -16 _raw_spin_unlock+0x8 (console_unlock+0x1b0)
[   15.990507]      #func                  -16 ipipe_root_only+0x8 (add_preempt_count+0x18)
[   15.998573]      #func                  -17 add_preempt_count+0xc (_raw_spin_lock_irqsave+0x20)
[   16.007246]      #func                  -17 ipipe_root_only+0x8 (ipipe_test_and_stall_root+0x10)
[   16.016005]      #func                  -18 ipipe_test_and_stall_root+0x8 (_raw_spin_lock_irqsave+0x14)
[   16.025372]      #func                  -18 _raw_spin_lock_irqsave+0x8 (console_unlock+0x8c)
[   16.033784]      #func                  -19 ipipe_test_root+0x8 (preempt_schedule+0x24)
[   16.041763]      #func                  -19 ipipe_root_only+0x8 (sub_preempt_count+0x18)
[   16.049829]      #func                  -20 sub_preempt_count+0xc (_raw_spin_unlock+0x18)
[   16.057981]      #func                  -20 _raw_spin_unlock+0x8 (call_console_drivers.constprop.15+0xf8)
[   16.067521]      #func                  -21 __rcu_read_unlock+0x8 (__atomic_notifier_call_chain+0x48)
[   16.076714]      #func                  -21 notifier_call_chain+0x8 (__atomic_notifier_call_chain+0x40)
[   16.086081]      #func                  -21 __rcu_read_lock+0x4 (__atomic_notifier_call_chain+0x24)
[   16.095100]      #func                  -22 __atomic_notifier_call_chain+0xc (atomic_notifier_call_chain+0x20)
[   16.105074]      #func                  -22 atomic_notifier_call_chain+0xc (notify_update+0x30)
[   16.113746]      #func                  -23 notify_update+0xc (vt_console_print+0x1cc)
[   16.121639]      #func                  -23 dummycon_dummy+0x4 (set_cursor+0x90)
[   16.129011]      #func                  -24 add_softcursor+0x8 (set_cursor+0x6c)
[   16.136383]      #func                  -24 set_cursor+0x8 (vt_console_print+0x1c4)
[   16.144014]      #func                  -25 __rcu_read_unlock+0x8 (__atomic_notifier_call_chain+0x48)
[   16.153208]      #func                  -25 notifier_call_chain+0x8 (__atomic_notifier_call_chain+0x40)
[   16.162574]      #func                  -26 __rcu_read_lock+0x4 (__atomic_notifier_call_chain+0x24)
[   16.171594]      #func                  -26 __atomic_notifier_call_chain+0xc (atomic_notifier_call_chain+0x20)
[   16.181567]      #func                  -27 atomic_notifier_call_chain+0xc (notify_write+0x28)
[   16.190153]      #func                  -27 notify_write+0xc (vt_console_print+0x260)
[   16.197959]      #func                  -28 __rcu_read_unlock+0x8 (__atomic_notifier_call_chain+0x48)
[   16.207152]      #func                  -28 notifier_call_chain+0x8 (__atomic_notifier_call_chain+0x40)
[   16.216518]      #func                  -29 __rcu_read_lock+0x4 (__atomic_notifier_call_chain+0x24)
[   16.225538]      #func                  -29 __atomic_notifier_call_chain+0xc (atomic_notifier_call_chain+0x20)
[   16.235512]      #func                  -29 atomic_notifier_call_chain+0xc (notify_write+0x28)
[   16.244098]      #func                  -30 notify_write+0xc (vt_console_print+0x230)
[   16.251903]      #func                  -33 dummycon_dummy+0x4 (scrup+0xe8)
[   16.258842]      #func                  -33 scrup+0xc (lf+0x6c)
[   16.264739]      #func                  -34 lf+0x8 (vt_console_print+0x230)
[   16.271677]      #func                  -34 dummycon_dummy+0x4 (hide_cursor+0x38)
[   16.279136]      #func                  -35 hide_cursor+0x8 (vt_console_print+0x2c0)
[   16.286854]      #func                  -36 ipipe_root_only+0x8 (add_preempt_count+0x18)
[   16.294920]      #func                  -36 add_preempt_count+0xc (_raw_spin_trylock+0x18)
[   16.303159]      #func                  -37 _raw_spin_trylock+0x8 (vt_console_print+0x48)
[   16.311311]      #func                  -37 vt_console_print+0xc (call_console_drivers.constprop.15+0xf8)
[   16.320851]      #func                  -38 ipipe_test_root+0x8 (debug_smp_processor_id+0x7c)
[   16.329351]      #func                  -39 s3c24xx_serial_console_putchar+0x8 (uart_console_write+0x5c)
[   16.338804]      #func                  -40 s3c24xx_serial_console_putchar+0x8 (uart_console_write+0x50)
[   16.348257]      #func                  -41 uart_console_write+0x8 (call_console_drivers.constprop.15+0xf8)
[   16.357971]      #func                  -41 s3c24xx_serial_console_write+0x4 (call_console_drivers.constprop.15+0xf8)
[   16.368552]      #func                  -42 ipipe_test_root+0x8 (debug_smp_processor_id+0x7c)
[   16.377051]      #func                  -42 call_console_drivers.constprop.15+0xc (console_unlock+0x288)
[   16.386504]      #func                  -43 ipipe_test_root+0x8 (preempt_schedule+0x24)
[   16.394483]      #func                  -43 ipipe_root_only+0x8 (sub_preempt_count+0x18)
[   16.402548]      #func                  -44 sub_preempt_count+0xc (_raw_spin_unlock+0x18)
[   16.410701]      #func                  -44 _raw_spin_unlock+0x8 (console_unlock+0x27c)
[   16.418680]      #func                  -45 ipipe_root_only+0x8 (add_preempt_count+0x18)
[   16.426745]      #func                  -45 add_preempt_count+0xc (_raw_spin_lock_irqsave+0x20)
[   16.435418]      #func                  -46 ipipe_root_only+0x8 (ipipe_test_and_stall_root+0x10)
[   16.444178]      #func                  -46 ipipe_test_and_stall_root+0x8 (_raw_spin_lock_irqsave+0x14)
[   16.453544]      #func                  -47 _raw_spin_lock_irqsave+0x8 (console_unlock+0x34)
[   16.461957]      #func                  -47 console_unlock+0xc (vprintk_emit+0x1bc)
[   16.469589]      #func                  -48 ipipe_test_root+0x8 (preempt_schedule+0x24)
[   16.477568]      #func                  -48 ipipe_root_only+0x8 (sub_preempt_count+0x18)
[   16.485633]      #func                  -49 sub_preempt_count+0xc (_raw_spin_unlock+0x18)
[   16.493786]      #func                  -49 _raw_spin_unlock+0x8 (vprintk_emit+0x1b8)
[   16.501591]      #func                  -50 ipipe_root_only+0x8 (sub_preempt_count+0x18)
[   16.509657]      #func                  -50 sub_preempt_count+0xc (_raw_spin_unlock_irqrestore+0x28)
[   16.518763]      #func                  -51 _raw_spin_unlock_irqrestore+0x8 (down_trylock+0x38)
[   16.527436]      #func                  -51 __ipipe_spin_unlock_debug+0x8 (down_trylock+0x2c)
[   16.535935]      #func                  -52 ipipe_root_only+0x8 (add_preempt_count+0x18)
[   16.544001]      #func                  -52 add_preempt_count+0xc (_raw_spin_lock_irqsave+0x20)
[   16.552673]      #func                  -53 ipipe_root_only+0x8 (ipipe_test_and_stall_root+0x10)
[   16.561433]      #func                  -53 ipipe_test_and_stall_root+0x8 (_raw_spin_lock_irqsave+0x14)
[   16.570799]      #func                  -54 _raw_spin_lock_irqsave+0x8 (down_trylock+0x14)
[   16.579038]      #func                  -54 down_trylock+0x8 (console_trylock+0x14)
[   16.586670]      #func                  -55 console_trylock+0x8 (vprintk_emit+0x16c)
[   16.594473] ---[ end trace fb39fe55ac26e870 ]---
[ ok done.
udevd[1728]: '/sbin/modprobe -b usb:v0424p9730d0100dcFFdsc00dpFFicFFisc00ipFFin00' [1739] terminated by signal 11 (Segmentation fault)


* Re: [Xenomai] Switchtest failures on ODROIDU3
  2014-10-03 22:45                                       ` GP Orcullo
@ 2014-10-03 22:48                                         ` Gilles Chanteperdrix
  2014-10-04 10:26                                           ` GP Orcullo
  0 siblings, 1 reply; 46+ messages in thread
From: Gilles Chanteperdrix @ 2014-10-03 22:48 UTC (permalink / raw)
  To: GP Orcullo; +Cc: Xenomai

On Sat, Oct 04, 2014 at 06:45:08AM +0800, GP Orcullo wrote:
> On Sat, Oct 4, 2014 at 3:14 AM, Gilles Chanteperdrix
> <gilles.chanteperdrix@xenomai.org> wrote:
> > On Fri, Oct 03, 2014 at 11:28:52PM +0800, GP Orcullo wrote:
> >> I've added BUG_ON(!hard_irqs_disabled()) to the code and got the
> >> kernel to oops at startup.
> >>
> >> Where shall I start looking for the offending code?
> >
> > The thing is, all the tracepoints you have are late after the
> > fault. A better strategy is to replace BUG_ON(!hard_irqs_disabled)
> > with:
> >
> > if (!hard_irqs_disabled()) {
> >    ipipe_trace_panic_freeze();
> >    ipipe_trace_panic_dump();
> >    BUG();
> > }
> >
>
> The results are almost similar.
>

> [   15.550769] Unable to handle kernel NULL pointer dereference at virtual address 0000000c

This is not caused by the code above; it is a different bug.

With the code above, you should see a line containing
"BUG" or something similar.

--
					    Gilles.


* Re: [Xenomai] Switchtest failures on ODROIDU3
  2014-10-03 22:48                                         ` Gilles Chanteperdrix
@ 2014-10-04 10:26                                           ` GP Orcullo
  2014-10-04 11:31                                             ` Gilles Chanteperdrix
  0 siblings, 1 reply; 46+ messages in thread
From: GP Orcullo @ 2014-10-04 10:26 UTC (permalink / raw)
  To: Gilles Chanteperdrix; +Cc: Xenomai

On Sat, Oct 4, 2014 at 6:48 AM, Gilles Chanteperdrix
<gilles.chanteperdrix@xenomai.org> wrote:
> On Sat, Oct 04, 2014 at 06:45:08AM +0800, GP Orcullo wrote:
>> On Sat, Oct 4, 2014 at 3:14 AM, Gilles Chanteperdrix
>> <gilles.chanteperdrix@xenomai.org> wrote:
>> > On Fri, Oct 03, 2014 at 11:28:52PM +0800, GP Orcullo wrote:
>> >> I've added BUG_ON(!hard_irqs_disabled()) to the code and got the
>> >> kernel to oops at startup.
>> >>
>> >> Where shall I start looking for the offending code?
>> >
>> > The thing is, all the tracepoints you have are late after the
>> > fault. A better strategy is to replace BUG_ON(!hard_irqs_disabled)
>> > with:
>> >
>> > if (!hard_irqs_disabled()) {
>> >    ipipe_trace_panic_freeze();
>> >    ipipe_trace_panic_dump();
>> >    BUG();
>> > }
>> >
>>
>> The results are almost similar.
>>
>
>> [   15.550769] Unable to handle kernel NULL pointer dereference at virtual address 0000000c
>

This was due to user error: I had mixed up the module versions.

> This is not caused by the code above. This is another bug.
>
> With the code above, you should see a line
> "BUG" or something.
>
> --
>                                             Gilles.

Getting this kernel to print any messages before it locks up is very
difficult. Is there another way of debugging it besides using JTAG?

-- 
GP Orcullo


* Re: [Xenomai] Switchtest failures on ODROIDU3
  2014-10-04 10:26                                           ` GP Orcullo
@ 2014-10-04 11:31                                             ` Gilles Chanteperdrix
  2014-10-05 22:00                                               ` GP Orcullo
  0 siblings, 1 reply; 46+ messages in thread
From: Gilles Chanteperdrix @ 2014-10-04 11:31 UTC (permalink / raw)
  To: GP Orcullo; +Cc: Xenomai

On Sat, Oct 04, 2014 at 06:26:33PM +0800, GP Orcullo wrote:
> On Sat, Oct 4, 2014 at 6:48 AM, Gilles Chanteperdrix
> <gilles.chanteperdrix@xenomai.org> wrote:
> > On Sat, Oct 04, 2014 at 06:45:08AM +0800, GP Orcullo wrote:
> >> On Sat, Oct 4, 2014 at 3:14 AM, Gilles Chanteperdrix
> >> <gilles.chanteperdrix@xenomai.org> wrote:
> >> > On Fri, Oct 03, 2014 at 11:28:52PM +0800, GP Orcullo wrote:
> >> >> I've added BUG_ON(!hard_irqs_disabled()) to the code and got the
> >> >> kernel to oops at startup.
> >> >>
> >> >> Where shall I start looking for the offending code?
> >> >
> >> > The thing is, all the tracepoints you have are late after the
> >> > fault. A better strategy is to replace BUG_ON(!hard_irqs_disabled)
> >> > with:
> >> >
> >> > if (!hard_irqs_disabled()) {
> >> >    ipipe_trace_panic_freeze();
> >> >    ipipe_trace_panic_dump();
> >> >    BUG();
> >> > }
> >> >
> >>
> >> The results are almost similar.
> >>
> >
> >> [   15.550769] Unable to handle kernel NULL pointer dereference at virtual address 0000000c
> >
>
> This is due to user error - I've mixed up the module versions...
>
> > This is not caused by the code above. This is another bug.
> >
> > With the code above, you should see a line
> > "BUG" or something.
> >
>
> Getting this kernel to print any messages before lockup is very
> difficult. Is there another way of debugging it beside using jtag?
>

I generally use printk or the I-pipe tracer in some creative way
(trigger a trace freeze on some condition, then BUG()). For instance,
you could check whether the Linux timer irq is still running by
calling printascii(".") from it once every HZ ticks. If it is not
running, check the same with the Xenomai timer irq, and finally with
the I-pipe timer ack function. On x86 we have the NMI watchdog to help
with lockups; maybe someone could implement similar functionality
with FIQ on ARM.

--
					    Gilles.


* Re: [Xenomai] Switchtest failures on ODROIDU3
  2014-10-04 11:31                                             ` Gilles Chanteperdrix
@ 2014-10-05 22:00                                               ` GP Orcullo
  2014-10-05 22:04                                                 ` Gilles Chanteperdrix
  0 siblings, 1 reply; 46+ messages in thread
From: GP Orcullo @ 2014-10-05 22:00 UTC (permalink / raw)
  To: Gilles Chanteperdrix; +Cc: Xenomai

On Sat, Oct 4, 2014 at 7:31 PM, Gilles Chanteperdrix
<gilles.chanteperdrix@xenomai.org> wrote:
> On Sat, Oct 04, 2014 at 06:26:33PM +0800, GP Orcullo wrote:
>> On Sat, Oct 4, 2014 at 6:48 AM, Gilles Chanteperdrix
>> <gilles.chanteperdrix@xenomai.org> wrote:
>> > On Sat, Oct 04, 2014 at 06:45:08AM +0800, GP Orcullo wrote:
>> >> On Sat, Oct 4, 2014 at 3:14 AM, Gilles Chanteperdrix
>> >> <gilles.chanteperdrix@xenomai.org> wrote:
>> >> > On Fri, Oct 03, 2014 at 11:28:52PM +0800, GP Orcullo wrote:
>> >> >> I've added BUG_ON(!hard_irqs_disabled()) to the code and got the
>> >> >> kernel to oops at startup.
>> >> >>
>> >> >> Where shall I start looking for the offending code?
>> >> >
>> >> > The thing is, all the tracepoints you have are late after the
>> >> > fault. A better strategy is to replace BUG_ON(!hard_irqs_disabled)
>> >> > with:
>> >> >
>> >> > if (!hard_irqs_disabled()) {
>> >> >    ipipe_trace_panic_freeze();
>> >> >    ipipe_trace_panic_dump();
>> >> >    BUG();
>> >> > }
>> >> >
>> >>
>> >> The results are almost similar.
>> >>
>> >
>> >> [   15.550769] Unable to handle kernel NULL pointer dereference at virtual address 0000000c
>> >
>>
>> This is due to user error - I've mixed up the module versions...
>>
>> > This is not caused by the code above. This is another bug.
>> >
>> > With the code above, you should see a line
>> > "BUG" or something.
>> >
>>
>> Getting this kernel to print any messages before lockup is very
>> difficult. Is there another way of debugging it besides using JTAG?
>>
>
> I generally use printk or the I-pipe tracer in some creative way
> (trigger a trace freeze on some condition, then BUG()). You could, for
> instance, check whether the Linux timer irq is still running by
> putting a printascii(".") there every HZ ticks. If it is not running,
> check the same with the Xenomai timer irq, then finally with the I-pipe
> timer ack function. On x86 we have the NMI watchdog to help with
> lockups; maybe someone could implement similar functionality with
> FIQ on ARM.
>
> --
>                                             Gilles.


The lockups are caused by CONFIG_PREEMPT: the board has been running
for more than 6 hrs since I disabled it.

Thanks for all your help!

-- 
GP Orcullo


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [Xenomai] Switchtest failures on ODROIDU3
  2014-10-05 22:00                                               ` GP Orcullo
@ 2014-10-05 22:04                                                 ` Gilles Chanteperdrix
  2014-10-05 22:24                                                   ` GP Orcullo
  0 siblings, 1 reply; 46+ messages in thread
From: Gilles Chanteperdrix @ 2014-10-05 22:04 UTC (permalink / raw)
  To: GP Orcullo; +Cc: Xenomai

On Mon, Oct 06, 2014 at 06:00:23AM +0800, GP Orcullo wrote:
> On Sat, Oct 4, 2014 at 7:31 PM, Gilles Chanteperdrix
> <gilles.chanteperdrix@xenomai.org> wrote:
> > On Sat, Oct 04, 2014 at 06:26:33PM +0800, GP Orcullo wrote:
> >> On Sat, Oct 4, 2014 at 6:48 AM, Gilles Chanteperdrix
> >> <gilles.chanteperdrix@xenomai.org> wrote:
> >> > On Sat, Oct 04, 2014 at 06:45:08AM +0800, GP Orcullo wrote:
> >> >> On Sat, Oct 4, 2014 at 3:14 AM, Gilles Chanteperdrix
> >> >> <gilles.chanteperdrix@xenomai.org> wrote:
> >> >> > On Fri, Oct 03, 2014 at 11:28:52PM +0800, GP Orcullo wrote:
> >> >> >> I've added BUG_ON(!hard_irqs_disabled()) to the code and got the
> >> >> >> kernel to oops at startup.
> >> >> >>
> >> >> >> Where shall I start looking for the offending code?
> >> >> >
> >> >> > The thing is, all the tracepoints you have are late after the
> >> >> > fault. A better strategy is to replace BUG_ON(!hard_irqs_disabled())
> >> >> > with:
> >> >> >
> >> >> > if (!hard_irqs_disabled()) {
> >> >> >    ipipe_trace_panic_freeze();
> >> >> >    ipipe_trace_panic_dump();
> >> >> >    BUG();
> >> >> > }
> >> >> >
> >> >>
> >> >> The results are almost similar.
> >> >>
> >> >
> >> >> [   15.550769] Unable to handle kernel NULL pointer dereference at virtual address 0000000c
> >> >
> >>
> >> This is due to user error - I've mixed up the module versions...
> >>
> >> > This is not caused by the code above. This is another bug.
> >> >
> >> > With the code above, you should see a line
> >> > "BUG" or something.
> >> >
> >>
> >> Getting this kernel to print any messages before lockup is very
> >> difficult. Is there another way of debugging it besides using JTAG?
> >>
> >
> > I generally use printk or the I-pipe tracer in some creative way
> > (trigger a trace freeze on some condition, then BUG()). You could, for
> > instance, check whether the Linux timer irq is still running by
> > putting a printascii(".") there every HZ ticks. If it is not running,
> > check the same with the Xenomai timer irq, then finally with the I-pipe
> > timer ack function. On x86 we have the NMI watchdog to help with
> > lockups; maybe someone could implement similar functionality with
> > FIQ on ARM.
> >
>
>
> The lockups are caused by CONFIG_PREEMPT: the board has been running
> for more than 6 hrs since I disabled it.
>
> Thanks for all your help!

Xenomai runs fine with CONFIG_PREEMPT. CONFIG_PREEMPT may become
an issue if you call "preempt_disable" or "preempt_enable", which use
current, in a place where you should not (in the context of a
kernel-space thread, for instance). This is what happens, for
example, when a plain Linux spinlock is used in real-time code.
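A simplified userspace model of this failure mode. It is not kernel code, and every name in it is hypothetical. Assumptions: each task's preemption counter is a plain field (on ARM the real one lives in thread_info, which the kernel reaches through `current`); with CONFIG_PREEMPT, preempt_enable() may enter the Linux scheduler when the count drops to zero with a reschedule pending; without CONFIG_PREEMPT (strictly, without CONFIG_PREEMPT_COUNT) preempt_disable()/preempt_enable() reduce to compiler barriers.

```c
#include <stdbool.h>

/* Hypothetical model of a task's preemption state. */
struct task_model {
	int preempt_count;
	bool need_resched;
};

static bool config_preempt = true;   /* models CONFIG_PREEMPT=y */
static int linux_schedule_calls;     /* times the modelled Linux scheduler ran */

static void model_preempt_disable(struct task_model *cur)
{
	if (!config_preempt)
		return;                      /* reduces to a compiler barrier */
	cur->preempt_count++;
}

static void model_preempt_enable(struct task_model *cur)
{
	if (!config_preempt)
		return;
	/* Dropping to zero with a reschedule pending enters the scheduler. */
	if (--cur->preempt_count == 0 && cur->need_resched)
		linux_schedule_calls++;
}

/* Real-time code taking a plain Linux spinlock: the lock/unlock pair
 * implies preempt_disable()/preempt_enable() on whatever task "current"
 * designates; in this model, the Linux task that was interrupted. */
static void rt_code_with_linux_spinlock(struct task_model *interrupted)
{
	model_preempt_disable(interrupted);
	/* ... critical section ... */
	model_preempt_enable(interrupted);
}
```

With `config_preempt` set, the modelled real-time code ends up invoking the Linux scheduler; with it cleared, nothing happens. This is why turning CONFIG_PREEMPT off can hide such a bug rather than fix it.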

But this should not happen.

--
					    Gilles.


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [Xenomai] Switchtest failures on ODROIDU3
  2014-10-05 22:04                                                 ` Gilles Chanteperdrix
@ 2014-10-05 22:24                                                   ` GP Orcullo
  2014-10-05 22:30                                                     ` Gilles Chanteperdrix
  0 siblings, 1 reply; 46+ messages in thread
From: GP Orcullo @ 2014-10-05 22:24 UTC (permalink / raw)
  To: Gilles Chanteperdrix; +Cc: Xenomai

On Mon, Oct 6, 2014 at 6:04 AM, Gilles Chanteperdrix
<gilles.chanteperdrix@xenomai.org> wrote:
> On Mon, Oct 06, 2014 at 06:00:23AM +0800, GP Orcullo wrote:
>> On Sat, Oct 4, 2014 at 7:31 PM, Gilles Chanteperdrix
>> <gilles.chanteperdrix@xenomai.org> wrote:
>> > On Sat, Oct 04, 2014 at 06:26:33PM +0800, GP Orcullo wrote:
>> >> On Sat, Oct 4, 2014 at 6:48 AM, Gilles Chanteperdrix
>> >> <gilles.chanteperdrix@xenomai.org> wrote:
>> >> > On Sat, Oct 04, 2014 at 06:45:08AM +0800, GP Orcullo wrote:
>> >> >> On Sat, Oct 4, 2014 at 3:14 AM, Gilles Chanteperdrix
>> >> >> <gilles.chanteperdrix@xenomai.org> wrote:
>> >> >> > On Fri, Oct 03, 2014 at 11:28:52PM +0800, GP Orcullo wrote:
>> >> >> >> I've added BUG_ON(!hard_irqs_disabled()) to the code and got the
>> >> >> >> kernel to oops at startup.
>> >> >> >>
>> >> >> >> Where shall I start looking for the offending code?
>> >> >> >
>> >> >> > The thing is, all the tracepoints you have are late after the
>> >> >> > fault. A better strategy is to replace BUG_ON(!hard_irqs_disabled())
>> >> >> > with:
>> >> >> >
>> >> >> > if (!hard_irqs_disabled()) {
>> >> >> >    ipipe_trace_panic_freeze();
>> >> >> >    ipipe_trace_panic_dump();
>> >> >> >    BUG();
>> >> >> > }
>> >> >> >
>> >> >>
>> >> >> The results are almost similar.
>> >> >>
>> >> >
>> >> >> [   15.550769] Unable to handle kernel NULL pointer dereference at virtual address 0000000c
>> >> >
>> >>
>> >> This is due to user error - I've mixed up the module versions...
>> >>
>> >> > This is not caused by the code above. This is another bug.
>> >> >
>> >> > With the code above, you should see a line
>> >> > "BUG" or something.
>> >> >
>> >>
>> >> Getting this kernel to print any messages before lockup is very
>> >> difficult. Is there another way of debugging it besides using JTAG?
>> >>
>> >
>> > I generally use printk or the I-pipe tracer in some creative way
>> > (trigger a trace freeze on some condition, then BUG()). You could, for
>> > instance, check whether the Linux timer irq is still running by
>> > putting a printascii(".") there every HZ ticks. If it is not running,
>> > check the same with the Xenomai timer irq, then finally with the I-pipe
>> > timer ack function. On x86 we have the NMI watchdog to help with
>> > lockups; maybe someone could implement similar functionality with
>> > FIQ on ARM.
>> >
>>
>>
>> The lockups are caused by CONFIG_PREEMPT: the board has been running
>> for more than 6 hrs since I disabled it.
>>
>> Thanks for all your help!
>
> Xenomai runs fine with CONFIG_PREEMPT. CONFIG_PREEMPT may become
> an issue if you call "preempt_disable" or "preempt_enable", which use
> current, in a place where you should not (in the context of a
> kernel-space thread, for instance). This is what happens, for
> example, when a plain Linux spinlock is used in real-time code.
>
> But this should not happen.
>
> --
>                                             Gilles.

Before disabling CONFIG_PREEMPT, I've been seeing this line:

[  123.982550] hrtimer: interrupt took 280933 ns

Is this normal?

-- 
GP Orcullo


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [Xenomai] Switchtest failures on ODROIDU3
  2014-10-05 22:24                                                   ` GP Orcullo
@ 2014-10-05 22:30                                                     ` Gilles Chanteperdrix
  2014-10-09 10:02                                                       ` GP Orcullo
  0 siblings, 1 reply; 46+ messages in thread
From: Gilles Chanteperdrix @ 2014-10-05 22:30 UTC (permalink / raw)
  To: GP Orcullo; +Cc: Xenomai

On Mon, Oct 06, 2014 at 06:24:53AM +0800, GP Orcullo wrote:
> On Mon, Oct 6, 2014 at 6:04 AM, Gilles Chanteperdrix
> <gilles.chanteperdrix@xenomai.org> wrote:
> > On Mon, Oct 06, 2014 at 06:00:23AM +0800, GP Orcullo wrote:
> >> The lockups are caused by CONFIG_PREEMPT: the board has been running
> >> for more than 6 hrs since I disabled it.
> >>
> >> Thanks for all your help!
> >
> > Xenomai runs fine with CONFIG_PREEMPT. CONFIG_PREEMPT may become
> > an issue if you call "preempt_disable" or "preempt_enable", which use
> > current, in a place where you should not (in the context of a
> > kernel-space thread, for instance). This is what happens, for
> > example, when a plain Linux spinlock is used in real-time code.
> >
> > But this should not happen.
> >
>
> Before disabling CONFIG_PREEMPT, I've been seeing this line:
>
> [  123.982550] hrtimer: interrupt took 280933 ns
>
> Is this normal?
>

It means that Linux was interrupted by Xenomai during its timer
interrupt, and that Xenomai interrupted it for 280us. This may
happen with switchtest if it has a really long chain of context
switches. If you want to check what happened, enable the I-pipe
tracer, and trigger a trace freeze right before this message.
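The suggestion to freeze the trace right before this message can be sketched as a userspace model. It is not the kernel's hrtimer code. The 100 us threshold, the timestamps passed in and `model_trace_freeze()` are assumptions. In a kernel build, the freeze would be the I-pipe tracer's `ipipe_trace_freeze()` call, placed just before the printk that emits the message, so that the frozen trace ends where the delay occurred.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical threshold above which a timer interrupt looks suspicious. */
#define SUSPECT_NS 100000ULL             /* 100 us */

static int trace_frozen;                 /* model of a frozen I-pipe trace */
static uint64_t frozen_value;

/* Stand-in for the I-pipe tracer's freeze call; this model only keeps
 * the first freeze. */
static void model_trace_freeze(uint64_t v)
{
	if (!trace_frozen) {
		trace_frozen = 1;
		frozen_value = v;
	}
}

/* Entry and exit timestamps of the Linux timer interrupt, in ns. */
static void check_timer_irq_duration(uint64_t entry_ns, uint64_t exit_ns)
{
	uint64_t delta = exit_ns - entry_ns;

	if (delta > SUSPECT_NS) {
		/* Freeze first, so the trace stops right where the delay happened. */
		model_trace_freeze(delta);
		printf("hrtimer: interrupt took %llu ns\n",
		       (unsigned long long)delta);
	}
}
```

Fed the 280933 ns delta from the log above, the model freezes the trace and then prints the same message. A 5 us run is ignored.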

--
					    Gilles.


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [Xenomai] Switchtest failures on ODROIDU3
  2014-10-05 22:30                                                     ` Gilles Chanteperdrix
@ 2014-10-09 10:02                                                       ` GP Orcullo
  2014-10-09 10:06                                                         ` Gilles Chanteperdrix
  0 siblings, 1 reply; 46+ messages in thread
From: GP Orcullo @ 2014-10-09 10:02 UTC (permalink / raw)
  To: Gilles Chanteperdrix; +Cc: Xenomai

On Mon, Oct 6, 2014 at 6:30 AM, Gilles Chanteperdrix
<gilles.chanteperdrix@xenomai.org> wrote:
>
> It means that Linux was interrupted by Xenomai during its timer
> interrupt, and that Xenomai interrupted it for 280us. This may
> happen with switchtest if it has a really long chain of context
> switches. If you want to check what happened, enable the I-pipe
> tracer, and trigger a trace freeze right before this message.
>
> --
>                                             Gilles.

One more piece to the puzzle: disabling CONFIG_IPIPE_DEBUG_INTERNAL
causes the system to lock up.

-- 
GP Orcullo


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [Xenomai] Switchtest failures on ODROIDU3
  2014-10-09 10:02                                                       ` GP Orcullo
@ 2014-10-09 10:06                                                         ` Gilles Chanteperdrix
  2014-10-09 10:12                                                           ` GP Orcullo
  0 siblings, 1 reply; 46+ messages in thread
From: Gilles Chanteperdrix @ 2014-10-09 10:06 UTC (permalink / raw)
  To: GP Orcullo; +Cc: Xenomai

On 10/09/2014 12:02 PM, GP Orcullo wrote:
> On Mon, Oct 6, 2014 at 6:30 AM, Gilles Chanteperdrix
> <gilles.chanteperdrix@xenomai.org> wrote:
>>
>> It means that Linux was interrupted by Xenomai during its timer
>> interrupt, and that Xenomai interrupted it for 280us. This may
>> happen with switchtest if it has a really long chain of context
>> switches. If you want to check what happened, enable the I-pipe
>> tracer, and trigger a trace freeze right before this message.
>>
>> --
>>                                             Gilles.
> 
> One more piece to the puzzle: disabling CONFIG_IPIPE_DEBUG_INTERNAL
> causes the system to lock up.
> 
How do you know this is related?

-- 
                                                                Gilles.


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [Xenomai] Switchtest failures on ODROIDU3
  2014-10-09 10:06                                                         ` Gilles Chanteperdrix
@ 2014-10-09 10:12                                                           ` GP Orcullo
  2014-10-09 10:16                                                             ` Gilles Chanteperdrix
  0 siblings, 1 reply; 46+ messages in thread
From: GP Orcullo @ 2014-10-09 10:12 UTC (permalink / raw)
  To: Gilles Chanteperdrix; +Cc: Xenomai

On Thu, Oct 9, 2014 at 6:06 PM, Gilles Chanteperdrix
<gilles.chanteperdrix@xenomai.org> wrote:
> On 10/09/2014 12:02 PM, GP Orcullo wrote:
>> On Mon, Oct 6, 2014 at 6:30 AM, Gilles Chanteperdrix
>> <gilles.chanteperdrix@xenomai.org> wrote:
>>>
>>> It means that Linux was interrupted by Xenomai during its timer
>>> interrupt, and that Xenomai interrupted it for 280us. This may
>>> happen with switchtest if it has a really long chain of context
>>> switches. If you want to check what happened, enable the I-pipe
>>> tracer, and trigger a trace freeze right before this message.
>>>
>>> --
>>>                                             Gilles.
>>
>> One more piece to the puzzle: disabling CONFIG_IPIPE_DEBUG_INTERNAL
>> causes the system to lock up.
>>
> How do you know this is related?
>
> --
>                                                                 Gilles.

Sorry, I quoted the wrong message.

If CONFIG_PREEMPT is disabled and CONFIG_IPIPE_DEBUG_INTERNAL is not
disabled, the system works fine.

-- 
GP Orcullo


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [Xenomai] Switchtest failures on ODROIDU3
  2014-10-09 10:12                                                           ` GP Orcullo
@ 2014-10-09 10:16                                                             ` Gilles Chanteperdrix
  2014-10-09 10:41                                                               ` Gilles Chanteperdrix
  2014-10-09 11:06                                                               ` GP Orcullo
  0 siblings, 2 replies; 46+ messages in thread
From: Gilles Chanteperdrix @ 2014-10-09 10:16 UTC (permalink / raw)
  To: GP Orcullo; +Cc: Xenomai

On 10/09/2014 12:12 PM, GP Orcullo wrote:
> On Thu, Oct 9, 2014 at 6:06 PM, Gilles Chanteperdrix
> <gilles.chanteperdrix@xenomai.org> wrote:
>> On 10/09/2014 12:02 PM, GP Orcullo wrote:
>>> On Mon, Oct 6, 2014 at 6:30 AM, Gilles Chanteperdrix
>>> <gilles.chanteperdrix@xenomai.org> wrote:
>>>>
>>>> It means that Linux was interrupted by Xenomai during its timer
>>>> interrupt, and that Xenomai interrupted it for 280us. This may
>>>> happen with switchtest if it has a really long chain of context
>>>> switches. If you want to check what happened, enable the I-pipe
>>>> tracer, and trigger a trace freeze right before this message.
>>>>
>>>> --
>>>>                                             Gilles.
>>>
>>> One more piece to the puzzle: disabling CONFIG_IPIPE_DEBUG_INTERNAL
>>> causes the system to lock up.
>>>
>> How do you know this is related?
>>
>> --
>>                                                                 Gilles.
> 
> Sorry, I quoted the wrong message.
> 
> If CONFIG_PREEMPT is disabled and CONFIG_IPIPE_DEBUG_INTERNAL is not
> disabled, the system works fine.
> 
So, there is a problem, likely in your port with CONFIG_PREEMPT, but
maybe in Xenomai (I need to check, because I am not so sure I tested
xeno-regression-test without CONFIG_PREEMPT).

And there is a problem in your port without CONFIG_IPIPE_DEBUG_INTERNAL.
This I do not need to check: I have tested Xenomai without this option
enabled.

So, my question is: how do you know the two issues are related?

-- 
                                                                Gilles.


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [Xenomai] Switchtest failures on ODROIDU3
  2014-10-09 10:16                                                             ` Gilles Chanteperdrix
@ 2014-10-09 10:41                                                               ` Gilles Chanteperdrix
  2014-10-09 11:06                                                               ` GP Orcullo
  1 sibling, 0 replies; 46+ messages in thread
From: Gilles Chanteperdrix @ 2014-10-09 10:41 UTC (permalink / raw)
  To: GP Orcullo; +Cc: Xenomai

On 10/09/2014 12:16 PM, Gilles Chanteperdrix wrote:
> On 10/09/2014 12:12 PM, GP Orcullo wrote:
>> On Thu, Oct 9, 2014 at 6:06 PM, Gilles Chanteperdrix
>> <gilles.chanteperdrix@xenomai.org> wrote:
>>> On 10/09/2014 12:02 PM, GP Orcullo wrote:
>>>> On Mon, Oct 6, 2014 at 6:30 AM, Gilles Chanteperdrix
>>>> <gilles.chanteperdrix@xenomai.org> wrote:
>>>>>
>>>>> It means that Linux was interrupted by Xenomai during its timer
>>>>> interrupt, and that Xenomai interrupted it for 280us. This may
>>>>> happen with switchtest if it has a really long chain of context
>>>>> switches. If you want to check what happened, enable the I-pipe
>>>>> tracer, and trigger a trace freeze right before this message.
>>>>>
>>>>> --
>>>>>                                             Gilles.
>>>>
>>>> One more piece to the puzzle: disabling CONFIG_IPIPE_DEBUG_INTERNAL
>>>> causes the system to lock up.
>>>>
>>> How do you know this is related?
>>>
>>> --
>>>                                                                 Gilles.
>>
>> Sorry, I quoted the wrong message.
>>
>> If CONFIG_PREEMPT is disabled and CONFIG_IPIPE_DEBUG_INTERNAL is not
>> disabled, the system works fine.
>>
> So, there is a problem, likely in your port with CONFIG_PREEMPT, but
> maybe in Xenomai (I need to check, because I am not so sure I tested
> xeno-regression-test without CONFIG_PREEMPT).

I meant: I am not sure I tested xeno-regression-test with CONFIG_PREEMPT.


-- 
                                                                Gilles.


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [Xenomai] Switchtest failures on ODROIDU3
  2014-10-09 10:16                                                             ` Gilles Chanteperdrix
  2014-10-09 10:41                                                               ` Gilles Chanteperdrix
@ 2014-10-09 11:06                                                               ` GP Orcullo
  2014-10-09 13:06                                                                 ` Gilles Chanteperdrix
  2014-10-09 15:14                                                                 ` Gilles Chanteperdrix
  1 sibling, 2 replies; 46+ messages in thread
From: GP Orcullo @ 2014-10-09 11:06 UTC (permalink / raw)
  To: Gilles Chanteperdrix; +Cc: Xenomai

On Oct 9, 2014 6:16 PM, "Gilles Chanteperdrix" <
gilles.chanteperdrix@xenomai.org> wrote:
>
> On 10/09/2014 12:12 PM, GP Orcullo wrote:
> > On Thu, Oct 9, 2014 at 6:06 PM, Gilles Chanteperdrix
> > <gilles.chanteperdrix@xenomai.org> wrote:
> >> On 10/09/2014 12:02 PM, GP Orcullo wrote:
> >>> On Mon, Oct 6, 2014 at 6:30 AM, Gilles Chanteperdrix
> >>> <gilles.chanteperdrix@xenomai.org> wrote:
> >>>>
> >>>> It means that Linux was interrupted by Xenomai during its timer
> >>>> interrupt, and that Xenomai interrupted it for 280us. This may
> >>>> happen with switchtest if it has a really long chain of context
> >>>> switches. If you want to check what happened, enable the I-pipe
> >>>> tracer, and trigger a trace freeze right before this message.
> >>>>
> >>>> --
> >>>>                                             Gilles.
> >>>
> >>> One more piece to the puzzle: disabling CONFIG_IPIPE_DEBUG_INTERNAL
> >>> causes the system to lock up.
> >>>
> >> How do you know this is related?
> >>
> >> --
> >>                                                                 Gilles.
> >
> > Sorry, I quoted the wrong message.
> >
> > If CONFIG_PREEMPT is disabled and CONFIG_IPIPE_DEBUG_INTERNAL is not
> > disabled, the system works fine.
> >
> So, there is a problem, likely in your port with CONFIG_PREEMPT, but
> maybe in Xenomai (I need to check, because I am not so sure I tested
> xeno-regression-test without CONFIG_PREEMPT).
>
> And there is a problem in your port without CONFIG_IPIPE_DEBUG_INTERNAL.
> This I do not need to check: I have tested Xenomai without this option
> enabled.
>
> So, my question is: how do you know the two issues are related?
>
> --
>                                                                 Gilles.

I don't know the answer.

I'm only looking at the effects and not the cause of the issue.

So, where shall I start digging? What's in CONFIG_IPIPE_DEBUG_INTERNAL
that would somehow suppress the problem?

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [Xenomai] Switchtest failures on ODROIDU3
  2014-10-09 11:06                                                               ` GP Orcullo
@ 2014-10-09 13:06                                                                 ` Gilles Chanteperdrix
  2014-10-09 15:14                                                                 ` Gilles Chanteperdrix
  1 sibling, 0 replies; 46+ messages in thread
From: Gilles Chanteperdrix @ 2014-10-09 13:06 UTC (permalink / raw)
  To: GP Orcullo; +Cc: Xenomai

On 10/09/2014 01:06 PM, GP Orcullo wrote:
> On Oct 9, 2014 6:16 PM, "Gilles Chanteperdrix" <
> gilles.chanteperdrix@xenomai.org> wrote:
>>
>> On 10/09/2014 12:12 PM, GP Orcullo wrote:
>>> On Thu, Oct 9, 2014 at 6:06 PM, Gilles Chanteperdrix
>>> <gilles.chanteperdrix@xenomai.org> wrote:
>>>> On 10/09/2014 12:02 PM, GP Orcullo wrote:
>>>>> On Mon, Oct 6, 2014 at 6:30 AM, Gilles Chanteperdrix
>>>>> <gilles.chanteperdrix@xenomai.org> wrote:
>>>>>>
>>>>>> It means that Linux was interrupted by Xenomai during its timer
>>>>>> interrupt, and that Xenomai interrupted it for 280us. This may
>>>>>> happen with switchtest if it has a really long chain of context
>>>>>> switches. If you want to check what happened, enable the I-pipe
>>>>>> tracer, and trigger a trace freeze right before this message.
>>>>>>
>>>>>> --
>>>>>>                                             Gilles.
>>>>>
>>>>> One more piece to the puzzle: disabling CONFIG_IPIPE_DEBUG_INTERNAL
>>>>> causes the system to lock up.
>>>>>
>>>> How do you know this is related?
>>>>
>>>> --
>>>>                                                                 Gilles.
>>>
>>> Sorry, I quoted the wrong message.
>>>
>>> If CONFIG_PREEMPT is disabled and CONFIG_IPIPE_DEBUG_INTERNAL is not
>>> disabled, the system works fine.
>>>
>> So, there is a problem, likely in your port with CONFIG_PREEMPT, but
>> maybe in Xenomai (I need to check, because I am not so sure I tested
>> xeno-regression-test without CONFIG_PREEMPT).
>>
>> And there is a problem in your port without CONFIG_IPIPE_DEBUG_INTERNAL.
>> This I do not need to check: I have tested Xenomai without this option
>> enabled.
>>
>> So, my question is: how do you know the two issues are related?
>>
>> --
>>                                                                 Gilles.
> 
> I don't know the answer.
> 
> I'm only looking at the effects and not the cause of the issue.
> 
> So, where shall I start digging? What's in CONFIG_IPIPE_DEBUG_INTERNAL
> that would somehow suppress the problem?
> 

Basically, you can grep for IPIPE_DEBUG_INTERNAL in the sources and try to
apply the diffs hunk by hunk. As a wild guess, I would start with the
one in arch/arm/asm/percpu.h.

-- 
                                                                Gilles.


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [Xenomai] Switchtest failures on ODROIDU3
  2014-10-09 11:06                                                               ` GP Orcullo
  2014-10-09 13:06                                                                 ` Gilles Chanteperdrix
@ 2014-10-09 15:14                                                                 ` Gilles Chanteperdrix
  2014-10-20  7:29                                                                   ` GP Orcullo
  1 sibling, 1 reply; 46+ messages in thread
From: Gilles Chanteperdrix @ 2014-10-09 15:14 UTC (permalink / raw)
  To: GP Orcullo; +Cc: Xenomai

On 10/09/2014 01:06 PM, GP Orcullo wrote:
> On Oct 9, 2014 6:16 PM, "Gilles Chanteperdrix" <
> gilles.chanteperdrix@xenomai.org> wrote:
>>
>> On 10/09/2014 12:12 PM, GP Orcullo wrote:
>>> On Thu, Oct 9, 2014 at 6:06 PM, Gilles Chanteperdrix
>>> <gilles.chanteperdrix@xenomai.org> wrote:
>>>> On 10/09/2014 12:02 PM, GP Orcullo wrote:
>>>>> On Mon, Oct 6, 2014 at 6:30 AM, Gilles Chanteperdrix
>>>>> <gilles.chanteperdrix@xenomai.org> wrote:
>>>>>>
>>>>>> It means that Linux was interrupted by Xenomai during its timer
>>>>>> interrupt, and that Xenomai interrupted it for 280us. This may
>>>>>> happen with switchtest if it has a really long chain of context
>>>>>> switches. If you want to check what happened, enable the I-pipe
>>>>>> tracer, and trigger a trace freeze right before this message.
>>>>>>
>>>>>> --
>>>>>>                                             Gilles.
>>>>>
>>>>> One more piece to the puzzle: disabling CONFIG_IPIPE_DEBUG_INTERNAL
>>>>> causes the system to lock up.
>>>>>
>>>> How do you know this is related?
>>>>
>>>> --
>>>>                                                                 Gilles.
>>>
>>> Sorry, I quoted the wrong message.
>>>
>>> If CONFIG_PREEMPT is disabled and CONFIG_IPIPE_DEBUG_INTERNAL is not
>>> disabled, the system works fine.
>>>
>> So, there is a problem, likely in your port with CONFIG_PREEMPT, but
>> maybe in Xenomai (I need to check, because I am not so sure I tested
>> xeno-regression-test without CONFIG_PREEMPT).
>>
>> And there is a problem in your port without CONFIG_IPIPE_DEBUG_INTERNAL.
>> This I do not need to check: I have tested Xenomai without this option
>> enabled.
>>
>> So, my question is: how do you know the two issues are related?
>>
>> --
>>                                                                 Gilles.
> 
> I don't know the answer.
> 
> I'm only looking at the effects and not the cause of the issue.
> 
> So, where shall I start digging? What's in CONFIG_IPIPE_DEBUG_INTERNAL
> that would somehow suppress the problem?
> 

Quite frankly, I would go the other way: check that every piece of code
which may be executed in real-time context does not call into any Linux
code. There is not much of it; it is all covered by the porting guide:
- the interrupt controller callbacks (note that the GIC handles some
SoC-specific callbacks, so if you have some, you need to check them);
- the chained interrupt demux handlers;
- the timer and tsc management functions;
- some workaround-specific code that hooks into the iowrite/writel
functions, such as the L2 cache synchronization on omap4.

And it seems that is all, so there should not be a lot of code to check.

-- 
                                                                Gilles.


^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [Xenomai] Switchtest failures on ODROIDU3
  2014-10-09 15:14                                                                 ` Gilles Chanteperdrix
@ 2014-10-20  7:29                                                                   ` GP Orcullo
  2014-10-20  7:33                                                                     ` Gilles Chanteperdrix
  2014-10-22  6:28                                                                     ` Gilles Chanteperdrix
  0 siblings, 2 replies; 46+ messages in thread
From: GP Orcullo @ 2014-10-20  7:29 UTC (permalink / raw)
  To: Gilles Chanteperdrix; +Cc: Xenomai

On Oct 9, 2014 11:14 PM, "Gilles Chanteperdrix" <
gilles.chanteperdrix@xenomai.org> wrote:
>
> On 10/09/2014 01:06 PM, GP Orcullo wrote:
> > On Oct 9, 2014 6:16 PM, "Gilles Chanteperdrix" <
> > gilles.chanteperdrix@xenomai.org> wrote:
> >>
> >> On 10/09/2014 12:12 PM, GP Orcullo wrote:
> >>> On Thu, Oct 9, 2014 at 6:06 PM, Gilles Chanteperdrix
> >>> <gilles.chanteperdrix@xenomai.org> wrote:
> >>>> On 10/09/2014 12:02 PM, GP Orcullo wrote:
> >>>>> On Mon, Oct 6, 2014 at 6:30 AM, Gilles Chanteperdrix
> >>>>> <gilles.chanteperdrix@xenomai.org> wrote:
> >>>>>>
> >>>>>> It means that Linux was interrupted by Xenomai during its timer
> >>>>>> interrupt, and that Xenomai interrupted it for 280us. This may
> >>>>>> happen with switchtest if it has a really long chain of context
> >>>>>> switches. If you want to check what happened, enable the I-pipe
> >>>>>> tracer, and trigger a trace freeze right before this message.
> >>>>>>
> >>>>>> --
> >>>>>>                                             Gilles.
> >>>>>
> >>>>> One more piece to the puzzle: disabling CONFIG_IPIPE_DEBUG_INTERNAL
> >>>>> causes the system to lock up.
> >>>>>
> >>>> How do you know this is related?
> >>>>
> >>>> --
> >>>>                                                                 Gilles.
> >>>
> >>> Sorry, I quoted the wrong message.
> >>>
> >>> If CONFIG_PREEMPT is disabled and CONFIG_IPIPE_DEBUG_INTERNAL is not
> >>> disabled, the system works fine.
> >>>
> >> So, there is a problem, likely in your port with CONFIG_PREEMPT, but
> >> maybe in Xenomai (I need to check, because I am not so sure I tested
> >> xeno-regression-test without CONFIG_PREEMPT).
> >>
> >> And there is a problem in your port without CONFIG_IPIPE_DEBUG_INTERNAL.
> >> This I do not need to check: I have tested Xenomai without this option
> >> enabled.
> >>
> >> So, my question is: how do you know the two issues are related?
> >>
> >> --
> >>                                                                 Gilles.
> >
> > I don't know the answer.
> >
> > I'm only looking at the effects and not the cause of the issue.
> >
> > So, where shall I start digging? What's in CONFIG_IPIPE_DEBUG_INTERNAL
> > that would somehow suppress the problem?
> >
>
> Quite frankly, I would go the other way: check that every piece of code
> which may be executed in real-time context does not call into any Linux
> code. There is not much of it; it is all covered by the porting guide:
> - the interrupt controller callbacks (note that the GIC handles some
> SoC-specific callbacks, so if you have some, you need to check them);
> - the chained interrupt demux handlers;
> - the timer and tsc management functions;
> - some workaround-specific code that hooks into the iowrite/writel
> functions, such as the L2 cache synchronization on omap4.
>
> And it seems that is all, so there should not be a lot of code to check.
>
> --
>

Hello Gilles,

Thanks for all the help.

The problem was traced to the tsc emulation: the counter somehow gets
messed up when the global timer is shared with Linux.

Using the global timer exclusively for the tsc emulation fixed all the
issues that have been encountered.
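The thread does not say exactly what Linux did to the shared global timer. The sketch below shows one plausible way a shared counter can corrupt a software TSC emulation. It is an illustration under assumptions, not the actual I-pipe code. It assumes a free-running 32-bit up-counter (the width is chosen for illustration only) that is extended to 64 bits by counting wraps and is sampled at least once per wrap period. `struct tsc_emul` and `tsc_read()` are hypothetical names.

```c
#include <stdint.h>

/* Hypothetical software TSC: extends a free-running 32-bit up-counter to
 * 64 bits by counting wraps. Assumes it is sampled at least once per wrap
 * period and that nobody else reloads or resets the counter. */
struct tsc_emul {
	uint32_t last_hw;   /* last raw counter value seen */
	uint64_t high;      /* accumulated upper 32 bits */
};

static uint64_t tsc_read(struct tsc_emul *t, uint32_t hw)
{
	/* A smaller raw value than last time is taken to mean a wrap. */
	if (hw < t->last_hw)
		t->high += 1ULL << 32;
	t->last_hw = hw;
	return t->high | hw;
}
```

A genuine wrap from 0xFFFFFF00 to 0x10 is handled correctly. However, if another user resets the counter from 0x1000 to 5, the emulation mistakes the reset for a wrap, and the emulated TSC jumps ahead by almost 2^32 ticks. Giving the timer exclusively to the TSC emulation removes such foreign writes.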

I'll post the updated ipipe patches soon.

Regards,

Gemi

^ permalink raw reply	[flat|nested] 46+ messages in thread

* Re: [Xenomai] Switchtest failures on ODROIDU3
  2014-10-20  7:29                                                                   ` GP Orcullo
@ 2014-10-20  7:33                                                                     ` Gilles Chanteperdrix
  2014-10-22  6:28                                                                     ` Gilles Chanteperdrix
  1 sibling, 0 replies; 46+ messages in thread
From: Gilles Chanteperdrix @ 2014-10-20  7:33 UTC (permalink / raw)
  To: GP Orcullo; +Cc: Xenomai

On Mon, Oct 20, 2014 at 03:29:26PM +0800, GP Orcullo wrote:
> On Oct 9, 2014 11:14 PM, "Gilles Chanteperdrix" <
> gilles.chanteperdrix@xenomai.org> wrote:
> >
> > On 10/09/2014 01:06 PM, GP Orcullo wrote:
> > > On Oct 9, 2014 6:16 PM, "Gilles Chanteperdrix" <
> > > gilles.chanteperdrix@xenomai.org> wrote:
> > >>
> > >> On 10/09/2014 12:12 PM, GP Orcullo wrote:
> > >>> On Thu, Oct 9, 2014 at 6:06 PM, Gilles Chanteperdrix
> > >>> <gilles.chanteperdrix@xenomai.org> wrote:
> > >>>> On 10/09/2014 12:02 PM, GP Orcullo wrote:
> > >>>>> On Mon, Oct 6, 2014 at 6:30 AM, Gilles Chanteperdrix
> > >>>>> <gilles.chanteperdrix@xenomai.org> wrote:
> > >>>>>>
> > >>>>>> It means that Linux was interrupted by Xenomai during its timer
> > >>>>>> interrupt, and that Xenomai interrupted it for 280us. This may
> > >>>>>> happen with switchtest if it has a really long chain of context
> > >>>>>> switches. If you want to check what happened, enable the I-pipe
> > >>>>>> tracer, and trigger a trace freeze right before this message.
> > >>>>>>
> > >>>>>
> > >>>>> One more piece to the puzzle: disabling CONFIG_IPIPE_DEBUG_INTERNAL
> > >>>>> causes the system to lock up.
> > >>>>>
> > >>>> How do you know this is related?
> > >>>>
> > >>>>
> > >>>> Gilles.
> > >>>
> > >>> Sorry, I quoted the wrong message.
> > >>>
> > >>> If CONFIG_PREEMPT is disabled and CONFIG_IPIPE_DEBUG_INTERNAL is not
> > >>> disabled, the system works fine.
> > >>>
> > >> So, there is a problem, likely in your port with CONFIG_PREEMPT, but
> > >> maybe in Xenomai (I need to check, because I am not so sure I tested
> > >> xeno-regression-test without CONFIG_PREEMPT).
> > >>
> > >> And there is a problem in your port without
> > >> CONFIG_IPIPE_DEBUG_INTERNAL.
> > >> This I do not need to check, I have tested Xenomai without this option
> > >> enabled.
> > >>
> > >> So, my question is: how do you know the two issues are related?
> > >>
> > >
> > > I don't know the answer.
> > >
> > > I'm only looking at the effects and not the cause of the issue.
> > >
> > > So, where shall I start digging? What's in
> > > CONFIG_IPIPE_DEBUG_INTERNAL that would somehow suppress the problem?
> > >
> >
> > Quite frankly, I would go the other way: check that every piece of code
> > which may be executed over real-time context does not use any Linux
> > code. That does not include a lot of code, actually all that is covered
> > by the porting guide:
> > - the interrupt controller callbacks (note that the GIC handles some SOC
> > specific callbacks, so if you have some, you need to check them)
> > - the chained interrupt demux handlers
> > - the timer and tsc management functions.
> > - some workaround specific code that hooks in the iowrite/writel
> > functions, such as the L2 cache synchronization on omap4.
> >
> > And it seems that is all, so, there should not be a lot of code to check.
> >
> >
> 
> Hello Gilles,
> 
> Thanks for all the help.
> 
> The problem was traced to the tsc emulation: the counter somehow gets
> messed up when the global timer is shared with Linux.

Thanks, this needs to be fixed properly, probably by integrating the
global-timer-based tsc emulation into the Linux global timer code. When
Xenomai started using the global timer, Linux did not use it.

> 
> Using the global timer exclusively for the tsc emulation fixed all the
> issues that have been encountered.
> 
> I'll post the updated ipipe patches soon.

Ok, thanks again. Please post them as patches against the I-pipe git tree.

Regards.

-- 
					    Gilles.



* Re: [Xenomai] Switchtest failures on ODROIDU3
  2014-10-20  7:29                                                                   ` GP Orcullo
  2014-10-20  7:33                                                                     ` Gilles Chanteperdrix
@ 2014-10-22  6:28                                                                     ` Gilles Chanteperdrix
  2014-10-29  1:23                                                                       ` GP Orcullo
  1 sibling, 1 reply; 46+ messages in thread
From: Gilles Chanteperdrix @ 2014-10-22  6:28 UTC (permalink / raw)
  To: GP Orcullo; +Cc: Xenomai

On Mon, Oct 20, 2014 at 03:29:26PM +0800, GP Orcullo wrote:
> On Oct 9, 2014 11:14 PM, "Gilles Chanteperdrix" <
> gilles.chanteperdrix@xenomai.org> wrote:
> >
> > On 10/09/2014 01:06 PM, GP Orcullo wrote:
> > > On Oct 9, 2014 6:16 PM, "Gilles Chanteperdrix" <
> > > gilles.chanteperdrix@xenomai.org> wrote:
> > >>
> > >> On 10/09/2014 12:12 PM, GP Orcullo wrote:
> > >>> On Thu, Oct 9, 2014 at 6:06 PM, Gilles Chanteperdrix
> > >>> <gilles.chanteperdrix@xenomai.org> wrote:
> > >>>> On 10/09/2014 12:02 PM, GP Orcullo wrote:
> > >>>>> On Mon, Oct 6, 2014 at 6:30 AM, Gilles Chanteperdrix
> > >>>>> <gilles.chanteperdrix@xenomai.org> wrote:
> > >>>>>>
> > >>>>>> It means that Linux was interrupted by Xenomai during its timer
> > >>>>>> interrupt, and that Xenomai interrupted it for 280us. This may
> > >>>>>> happen with switchtest if it has a really long chain of context
> > >>>>>> switches. If you want to check what happened, enable the I-pipe
> > >>>>>> tracer, and trigger a trace freeze right before this message.
> > >>>>>>
> > >>>>>
> > >>>>> One more piece to the puzzle: disabling CONFIG_IPIPE_DEBUG_INTERNAL
> > >>>>> causes the system to lock up.
> > >>>>>
> > >>>> How do you know this is related?
> > >>>>
> > >>>>
> > >>>> Gilles.
> > >>>
> > >>> Sorry, I quoted the wrong message.
> > >>>
> > >>> If CONFIG_PREEMPT is disabled and CONFIG_IPIPE_DEBUG_INTERNAL is not
> > >>> disabled, the system works fine.
> > >>>
> > >> So, there is a problem, likely in your port with CONFIG_PREEMPT, but
> > >> maybe in Xenomai (I need to check, because I am not so sure I tested
> > >> xeno-regression-test without CONFIG_PREEMPT).
> > >>
> > >> And there is a problem in your port without
> > >> CONFIG_IPIPE_DEBUG_INTERNAL.
> > >> This I do not need to check, I have tested Xenomai without this option
> > >> enabled.
> > >>
> > >> So, my question is: how do you know the two issues are related?
> > >>
> > >
> > > I don't know the answer.
> > >
> > > I'm only looking at the effects and not the cause of the issue.
> > >
> > > So, where shall I start digging? What's in
> > > CONFIG_IPIPE_DEBUG_INTERNAL that would somehow suppress the problem?
> > >
> >
> > Quite frankly, I would go the other way: check that every piece of code
> > which may be executed over real-time context does not use any Linux
> > code. That does not include a lot of code, actually all that is covered
> > by the porting guide:
> > - the interrupt controller callbacks (note that the GIC handles some SOC
> > specific callbacks, so if you have some, you need to check them)
> > - the chained interrupt demux handlers
> > - the timer and tsc management functions.
> > - some workaround specific code that hooks in the iowrite/writel
> > functions, such as the L2 cache synchronization on omap4.
> >
> > And it seems that is all, so, there should not be a lot of code to check.
> >
> >
> 
> Hello Gilles,
> 
> Thanks for all the help.
> 
> The problem was traced to the tsc emulation: the counter somehow gets
> messed up when the global timer is shared with Linux.

Hi,

could you try the following patch and tell me if it avoids the
issue?

Thanks.
Regards.

diff --git a/arch/arm/boot/dts/imx6qdl.dtsi b/arch/arm/boot/dts/imx6qdl.dtsi
index fb28b2e..150beb5 100644
--- a/arch/arm/boot/dts/imx6qdl.dtsi
+++ b/arch/arm/boot/dts/imx6qdl.dtsi
@@ -106,6 +106,12 @@
 			clocks = <&clks 15>;
 		};
 
+		timer@00a00200 {
+			compatible = "arm,cortex-a9-global-timer";
+			reg = <0x00a00200 0x20>;
+			clocks = <&clks 15>;
+		};
+
 		L2: l2-cache@00a02000 {
 			compatible = "arm,pl310-cache";
 			reg = <0x00a02000 0x1000>;
diff --git a/arch/arm/boot/dts/omap4.dtsi b/arch/arm/boot/dts/omap4.dtsi
index a914496..2cb8144 100644
--- a/arch/arm/boot/dts/omap4.dtsi
+++ b/arch/arm/boot/dts/omap4.dtsi
@@ -67,6 +67,12 @@
 		interrupts = <GIC_PPI 13 (GIC_CPU_MASK_RAW(3) | IRQ_TYPE_LEVEL_HIGH)>;
 	};
 
+	global_timer: timer@48240200 {
+		compatible = "arm,cortex-a9-global-timer";
+		reg = <0x48240200 0x20>;
+		clocks = <&mpu_periphclk>;
+	};
+
 	/*
 	 * The soc node represents the soc top level view. It is uses for IPs
 	 * that are not memory mapped in the MPU view or for the MPU itself.
diff --git a/arch/arm/kernel/smp_twd.c b/arch/arm/kernel/smp_twd.c
index f0b7c52..864651c 100644
--- a/arch/arm/kernel/smp_twd.c
+++ b/arch/arm/kernel/smp_twd.c
@@ -28,7 +28,6 @@
 #include <asm/smp_plat.h>
 #include <asm/smp_twd.h>
 #include <asm/cputype.h>
-#include <asm/ipipe.h>
 
 /* set up by the platform code */
 static void __iomem *twd_base;
@@ -51,42 +50,9 @@ static void twd_ack(void)
 	writel_relaxed(1, twd_base + TWD_TIMER_INTSTAT);
 }
 
-static struct __ipipe_tscinfo tsc_info;
-
 static void twd_get_clock(struct device_node *np);
 static void __cpuinit twd_calibrate_rate(void);
 
-static void __init gt_setup(unsigned long base_paddr, unsigned bits)
-{
-	if ((read_cpuid_id() & 0xf00000) == 0)
-		return;
-
-	gt_base = ioremap(base_paddr, SZ_256);
-	BUG_ON(!gt_base);
-
-	/* Start global timer */
-	__raw_writel(1, gt_base + 0x8);
-
-	tsc_info.type = IPIPE_TSC_TYPE_FREERUNNING;
-	tsc_info.freq = twd_timer_rate;
-	tsc_info.counter_vaddr = (unsigned long)gt_base;
-	tsc_info.u.counter_paddr = base_paddr;
-
-	switch(bits) {
-	case 64:
-		tsc_info.u.mask = 0xffffffffffffffffULL;
-		break;
-	case 32:
-		tsc_info.u.mask = 0xffffffff;
-		break;
-	default:
-		/* Only supported as a 32 bits or 64 bits */
-		BUG();
-	}
-
-	__ipipe_tsc_register(&tsc_info);
-}
-
 #ifdef CONFIG_IPIPE_DEBUG_INTERNAL
 
 static DEFINE_PER_CPU(int, irqs);
@@ -460,8 +426,6 @@ out_free:
 
 int __init twd_local_timer_register(struct twd_local_timer *tlt)
 {
-	int rc;
-
 	if (twd_base || twd_evt)
 		return -EBUSY;
 
@@ -472,13 +436,7 @@ int __init twd_local_timer_register(struct twd_local_timer *tlt)
 		return -ENOMEM;
 
 
-	rc = twd_local_timer_common_register(NULL);
-	if (rc == 0)
-#ifdef CONFIG_IPIPE
-		gt_setup(tlt->res[0].start - 0x400, 32);
-#endif
-
-	return rc;
+	return twd_local_timer_common_register(NULL);
 }
 
 #ifdef CONFIG_OF
@@ -503,16 +461,6 @@ static void __init twd_local_timer_of_register(struct device_node *np)
 
 
 	err = twd_local_timer_common_register(np);
-#ifdef CONFIG_IPIPE
-	if (err == 0) {
-		struct resource res;
-
-		if (of_address_to_resource(np, 0, &res))
-			res.start = 0;
-
-		gt_setup(res.start - 0x400, 32);
-	}
-#endif /* CONFIG_IPIPE */
 
 out:
 	WARN(err, "twd_local_timer_of_register failed (%d)\n", err);
diff --git a/arch/arm/mach-imx/Kconfig b/arch/arm/mach-imx/Kconfig
index 33567aa..b825a09 100644
--- a/arch/arm/mach-imx/Kconfig
+++ b/arch/arm/mach-imx/Kconfig
@@ -808,6 +808,7 @@ config SOC_IMX6Q
 	select PL310_ERRATA_727915 if CACHE_PL310
 	select PL310_ERRATA_769419 if CACHE_PL310
 	select PM_OPP if PM
+	select ARM_GLOBAL_TIMER if SMP && IPIPE
 
 	help
 	  This enables support for Freescale i.MX6 Quad processor.
diff --git a/arch/arm/mach-omap2/Kconfig b/arch/arm/mach-omap2/Kconfig
index 0af7ca0..63a5b42 100644
--- a/arch/arm/mach-omap2/Kconfig
+++ b/arch/arm/mach-omap2/Kconfig
@@ -45,6 +45,7 @@ config ARCH_OMAP4
 	select USB_ARCH_HAS_EHCI if USB_SUPPORT
 	select ARM_ERRATA_754322
 	select ARM_ERRATA_775420
+	select ARM_GLOBAL_TIMER if IPIPE && SMP
 
 config SOC_OMAP5
 	bool "TI OMAP5"
diff --git a/drivers/clocksource/arm_global_timer.c b/drivers/clocksource/arm_global_timer.c
index 0fc31d0..62e208b 100644
--- a/drivers/clocksource/arm_global_timer.c
+++ b/drivers/clocksource/arm_global_timer.c
@@ -24,6 +24,7 @@
 #include <linux/sched_clock.h>
 
 #include <asm/cputype.h>
+#include <asm/ipipe.h>
 
 #define GT_COUNTER0	0x00
 #define GT_COUNTER1	0x04
@@ -48,6 +49,7 @@
  * the units for all operations.
  */
 static void __iomem *gt_base;
+static unsigned long gt_pbase;
 static unsigned long gt_clk_rate;
 static int gt_ppi;
 static struct clock_event_device __percpu *gt_evt;
@@ -210,6 +212,16 @@ static u64 notrace gt_sched_clock_read(void)
 
 static void __init gt_clocksource_init(void)
 {
+#ifdef CONFIG_IPIPE
+	struct __ipipe_tscinfo tsc_info = {
+		.type = IPIPE_TSC_TYPE_FREERUNNING,
+		.freq = gt_clk_rate,
+		.counter_vaddr = (unsigned long)gt_base,
+		.u.counter_paddr = gt_pbase,
+		.u.mask = 0xffffffff,
+	};
+#endif
+
 	writel(0, gt_base + GT_CONTROL);
 	writel(0, gt_base + GT_COUNTER0);
 	writel(0, gt_base + GT_COUNTER1);
@@ -219,6 +231,9 @@ static void __init gt_clocksource_init(void)
 #ifdef CONFIG_CLKSRC_ARM_GLOBAL_TIMER_SCHED_CLOCK
 	sched_clock_register(gt_sched_clock_read, 64, gt_clk_rate);
 #endif
+#ifdef CONFIG_IPIPE
+	__ipipe_tsc_register(&tsc_info);
+#endif
 	clocksource_register_hz(&gt_clocksource, gt_clk_rate);
 }
 
@@ -242,8 +257,9 @@ static struct notifier_block gt_cpu_nb = {
 
 static void __init global_timer_of_register(struct device_node *np)
 {
+	int err = 0, install_clockevent = 1;
+	struct resource res;
 	struct clk *gt_clk;
-	int err = 0;
 
 	/*
 	 * In r2p0 the comparators for each processor with the global timer
@@ -252,13 +268,15 @@ static void __init global_timer_of_register(struct device_node *np)
 	 */
 	if ((read_cpuid_id() & 0xf0000f) < 0x200000) {
 		pr_warn("global-timer: non support for this cpu version.\n");
-		return;
+		install_clockevent = 0;
 	}
 
-	gt_ppi = irq_of_parse_and_map(np, 0);
-	if (!gt_ppi) {
-		pr_warn("global-timer: unable to parse irq\n");
-		return;
+	if (install_clockevent) {
+		gt_ppi = irq_of_parse_and_map(np, 0);
+		if (!gt_ppi) {
+			pr_warn("global-timer: unable to parse irq\n");
+			install_clockevent = 0;
+		}
 	}
 
 	gt_base = of_iomap(np, 0);
@@ -267,6 +285,11 @@ static void __init global_timer_of_register(struct device_node *np)
 		return;
 	}
 
+	if (of_address_to_resource(np, 0, &res))
+		res.start = 0;
+
+	gt_pbase = res.start;
+
 	gt_clk = of_clk_get(np, 0);
 	if (!IS_ERR(gt_clk)) {
 		err = clk_prepare_enable(gt_clk);
@@ -279,37 +302,39 @@ static void __init global_timer_of_register(struct device_node *np)
 	}
 
 	gt_clk_rate = clk_get_rate(gt_clk);
-	gt_evt = alloc_percpu(struct clock_event_device);
-	if (!gt_evt) {
-		pr_warn("global-timer: can't allocate memory\n");
-		err = -ENOMEM;
-		goto out_clk;
-	}
-
-	err = request_percpu_irq(gt_ppi, gt_clockevent_interrupt,
+	if (install_clockevent) {
+		gt_evt = alloc_percpu(struct clock_event_device);
+		if (!gt_evt) {
+			pr_warn("global-timer: can't allocate memory\n");
+			err = -ENOMEM;
+			goto out_clk;
+		}
+
+		err = request_percpu_irq(gt_ppi, gt_clockevent_interrupt,
 				 "gt", gt_evt);
-	if (err) {
-		pr_warn("global-timer: can't register interrupt %d (%d)\n",
-			gt_ppi, err);
-		goto out_free;
-	}
-
-	err = register_cpu_notifier(&gt_cpu_nb);
-	if (err) {
-		pr_warn("global-timer: unable to register cpu notifier.\n");
-		goto out_irq;
+		if (err) {
+			pr_warn("global-timer: can't register interrupt %d (%d)\n",
+				gt_ppi, err);
+			goto out_free;
+		}
+
+		err = register_cpu_notifier(&gt_cpu_nb);
+		if (err) {
+			pr_warn("global-timer: unable to register cpu notifier.\n");
+			free_percpu_irq(gt_ppi, gt_evt);
+		  out_free:
+			free_percpu(gt_evt);
+			install_clockevent = 0;
+		}
 	}
 
 	/* Immediately configure the timer on the boot CPU */
 	gt_clocksource_init();
-	gt_clockevents_init(this_cpu_ptr(gt_evt));
+	if (install_clockevent)
+		gt_clockevents_init(this_cpu_ptr(gt_evt));
 
 	return;
 
-out_irq:
-	free_percpu_irq(gt_ppi, gt_evt);
-out_free:
-	free_percpu(gt_evt);
 out_clk:
 	clk_disable_unprepare(gt_clk);
 out_unmap:

-- 
					    Gilles.



* Re: [Xenomai] Switchtest failures on ODROIDU3
  2014-10-22  6:28                                                                     ` Gilles Chanteperdrix
@ 2014-10-29  1:23                                                                       ` GP Orcullo
  2014-10-29  6:16                                                                         ` Gilles Chanteperdrix
  0 siblings, 1 reply; 46+ messages in thread
From: GP Orcullo @ 2014-10-29  1:23 UTC (permalink / raw)
  To: Gilles Chanteperdrix; +Cc: Xenomai

On Wed, Oct 22, 2014 at 2:28 PM, Gilles Chanteperdrix
<gilles.chanteperdrix@xenomai.org> wrote:
> On Mon, Oct 20, 2014 at 03:29:26PM +0800, GP Orcullo wrote:
>> On Oct 9, 2014 11:14 PM, "Gilles Chanteperdrix" <
>> gilles.chanteperdrix@xenomai.org> wrote:
>> >
>> > On 10/09/2014 01:06 PM, GP Orcullo wrote:
>> > > On Oct 9, 2014 6:16 PM, "Gilles Chanteperdrix" <
>> > > gilles.chanteperdrix@xenomai.org> wrote:
>> > >>
>> > >> On 10/09/2014 12:12 PM, GP Orcullo wrote:
>> > >>> On Thu, Oct 9, 2014 at 6:06 PM, Gilles Chanteperdrix
>> > >>> <gilles.chanteperdrix@xenomai.org> wrote:
>> > >>>> On 10/09/2014 12:02 PM, GP Orcullo wrote:
>> > >>>>> On Mon, Oct 6, 2014 at 6:30 AM, Gilles Chanteperdrix
>> > >>>>> <gilles.chanteperdrix@xenomai.org> wrote:
>> > >>>>>>
>> > >>>>>> It means that Linux was interrupted by Xenomai during its timer
>> > >>>>>> interrupt, and that Xenomai interrupted it for 280us. This may
>> > >>>>>> happen with switchtest if it has a really long chain of context
>> > >>>>>> switches. If you want to check what happened, enable the I-pipe
>> > >>>>>> tracer, and trigger a trace freeze right before this message.
>> > >>>>>>
>> > >>>>>
>> > >>>>> One more piece to the puzzle: disabling CONFIG_IPIPE_DEBUG_INTERNAL
>> > >>>>> causes the system to lock up.
>> > >>>>>
>> > >>>> How do you know this is related?
>> > >>>>
>> > >>>>
>> > >>>> Gilles.
>> > >>>
>> > >>> Sorry, I quoted the wrong message.
>> > >>>
>> > >>> If CONFIG_PREEMPT is disabled and CONFIG_IPIPE_DEBUG_INTERNAL is not
>> > >>> disabled, the system works fine.
>> > >>>
>> > >> So, there is a problem, likely in your port with CONFIG_PREEMPT, but
>> > >> maybe in Xenomai (I need to check, because I am not so sure I tested
>> > >> xeno-regression-test without CONFIG_PREEMPT).
>> > >>
>> > >> And there is a problem in your port without
>> > >> CONFIG_IPIPE_DEBUG_INTERNAL.
>> > >> This I do not need to check, I have tested Xenomai without this option
>> > >> enabled.
>> > >>
>> > >> So, my question is: how do you know the two issues are related?
>> > >>
>> > >
>> > > I don't know the answer.
>> > >
>> > > I'm only looking at the effects and not the cause of the issue.
>> > >
>> > > So, where shall I start digging? What's in
>> > > CONFIG_IPIPE_DEBUG_INTERNAL that would somehow suppress the problem?
>> > >
>> >
>> > Quite frankly, I would go the other way: check that every piece of code
>> > which may be executed over real-time context does not use any Linux
>> > code. That does not include a lot of code, actually all that is covered
>> > by the porting guide:
>> > - the interrupt controller callbacks (note that the GIC handles some SOC
>> > specific callbacks, so if you have some, you need to check them)
>> > - the chained interrupt demux handlers
>> > - the timer and tsc management functions.
>> > - some workaround specific code that hooks in the iowrite/writel
>> > functions, such as the L2 cache synchronization on omap4.
>> >
>> > And it seems that is all, so, there should not be a lot of code to check.
>> >
>> >
>>
>> Hello Gilles,
>>
>> Thanks for all the help.
>>
>> The problem was traced to the tsc emulation: the counter somehow gets
>> messed up when the global timer is shared with Linux.
>
> Hi,
>
> could you try the following patch and tell me if it avoids the
> issue?
>
> Thanks.
> Regards.
>
> [patch snipped]
>
> --
>                                             Gilles.

Exynos is not using the arm global timers.

I'm still experiencing some intermittent "rcu_preempt detected stalls".

Maybe I'm dealing with more than one issue here.

-- 
GP Orcullo



* Re: [Xenomai] Switchtest failures on ODROIDU3
  2014-10-29  1:23                                                                       ` GP Orcullo
@ 2014-10-29  6:16                                                                         ` Gilles Chanteperdrix
  2014-10-29  7:24                                                                           ` GP Orcullo
  0 siblings, 1 reply; 46+ messages in thread
From: Gilles Chanteperdrix @ 2014-10-29  6:16 UTC (permalink / raw)
  To: GP Orcullo; +Cc: Xenomai

On Wed, Oct 29, 2014 at 09:23:16AM +0800, GP Orcullo wrote:
> Exynos is not using the arm global timers.

I do not understand what you mean... You said you had problems
when the global timer was shared with Linux; this patch tries to
address that by not remapping the global timer for Xenomai, and
sharing the mapping defined by Linux. And now you say Exynos is not
using the global timers? You will have to explain a bit more than
that... I quote what you said:

> >> The problem was traced to the tsc emulation: the counter somehow gets
> >> messed up when the global timer is shared with Linux.


> 
> I'm still experiencing some intermittent "rcu_preempt detected stalls".
> 
> Maybe I'm dealing with more than one issue here.
> 

I would tend to prefer solving them one by one, and from where I
stand the global timer issue is not solved.

-- 
					    Gilles.



* Re: [Xenomai] Switchtest failures on ODROIDU3
  2014-10-29  6:16                                                                         ` Gilles Chanteperdrix
@ 2014-10-29  7:24                                                                           ` GP Orcullo
  2014-10-29  7:26                                                                             ` Gilles Chanteperdrix
  0 siblings, 1 reply; 46+ messages in thread
From: GP Orcullo @ 2014-10-29  7:24 UTC (permalink / raw)
  To: Gilles Chanteperdrix; +Cc: Xenomai

On Wed, Oct 29, 2014 at 2:16 PM, Gilles Chanteperdrix
<gilles.chanteperdrix@xenomai.org> wrote:
> On Wed, Oct 29, 2014 at 09:23:16AM +0800, GP Orcullo wrote:
>> Exynos is not using the arm global timers.
>
> I do not understand what you mean... You said you had problems
> when the global timer was shared with Linux; this patch tries to
> address that by not remapping the global timer for Xenomai, and
> instead sharing the mapping defined by Linux. And now you say Exynos
> is not using the global timers? You will have to explain a bit more
> than that... I quote what you said:
>
>> >> The problem was traced to the tsc emulation,  the counter somehow gets
>> >> messed up when the global timer is shared with Linux.
>
>

Sorry, I was referring to the global timer that Exynos uses: the MCT
(Multi-Core Timer).

>>
>> I'm still experiencing some intermittent "rcu_preempt detected stalls".
>>
>> Maybe I'm dealing with more than one issue here.
>>
>
> I would tend to prefer solving them one by one, and from where I
> stand the global timer issue is not solved.
>
> --
>                                             Gilles.

-- 
GP Orcullo



* Re: [Xenomai] Switchtest failures on ODROIDU3
  2014-10-29  7:24                                                                           ` GP Orcullo
@ 2014-10-29  7:26                                                                             ` Gilles Chanteperdrix
  2014-10-29  7:47                                                                               ` GP Orcullo
  0 siblings, 1 reply; 46+ messages in thread
From: Gilles Chanteperdrix @ 2014-10-29  7:26 UTC (permalink / raw)
  To: GP Orcullo; +Cc: Xenomai

On Wed, Oct 29, 2014 at 03:24:52PM +0800, GP Orcullo wrote:
> On Wed, Oct 29, 2014 at 2:16 PM, Gilles Chanteperdrix
> <gilles.chanteperdrix@xenomai.org> wrote:
> > On Wed, Oct 29, 2014 at 09:23:16AM +0800, GP Orcullo wrote:
> >> Exynos is not using the arm global timers.
> >
> > I do not understand what you mean... You said you had problems
> > when the global timer was shared with Linux; this patch tries to
> > address that by not remapping the global timer for Xenomai, and
> > instead sharing the mapping defined by Linux. And now you say Exynos
> > is not using the global timers? You will have to explain a bit more
> > than that... I quote what you said:
> >
> >> >> The problem was traced to the tsc emulation,  the counter somehow gets
> >> >> messed up when the global timer is shared with Linux.
> >
> >
> 
> Sorry, I was referring to the global timer that Exynos uses: the MCT
> (Multi-Core Timer).

OK, but Xenomai on Cortex-A9 enables the global timer as soon as the
local timer is enabled. How do you get around this? Do you not
enable the TWD code on Exynos?

-- 
					    Gilles.



* Re: [Xenomai] Switchtest failures on ODROIDU3
  2014-10-29  7:26                                                                             ` Gilles Chanteperdrix
@ 2014-10-29  7:47                                                                               ` GP Orcullo
  0 siblings, 0 replies; 46+ messages in thread
From: GP Orcullo @ 2014-10-29  7:47 UTC (permalink / raw)
  To: Gilles Chanteperdrix; +Cc: Xenomai

On Oct 29, 2014 3:26 PM, "Gilles Chanteperdrix" <gilles.chanteperdrix@xenomai.org> wrote:
>
> On Wed, Oct 29, 2014 at 03:24:52PM +0800, GP Orcullo wrote:
> > On Wed, Oct 29, 2014 at 2:16 PM, Gilles Chanteperdrix
> > <gilles.chanteperdrix@xenomai.org> wrote:
> > > On Wed, Oct 29, 2014 at 09:23:16AM +0800, GP Orcullo wrote:
> > >> Exynos is not using the arm global timers.
> > >
> > > I do not understand what you mean... You said you had problems
> > > when the global timer was shared with Linux; this patch tries to
> > > address that by not remapping the global timer for Xenomai, and
> > > instead sharing the mapping defined by Linux. And now you say Exynos
> > > is not using the global timers? You will have to explain a bit more
> > > than that... I quote what you said:
> > >
> > >> >> The problem was traced to the tsc emulation,  the counter somehow gets
> > >> >> messed up when the global timer is shared with Linux.
> > >
> > >
> >
> > Sorry, I was referring to the global timer that Exynos uses: the MCT
> > (Multi-Core Timer).
>
> OK, but Xenomai on Cortex-A9 enables the global timer as soon as the
> local timer is enabled. How do you get around this? Do you not
> enable the TWD code on Exynos?
>
> --
>                                             Gilles.

It doesn't have the standard A9 global timer. CONFIG_HAVE_ARM_TWD is not
used in any of the Exynos kernels.


end of thread

Thread overview: 46+ messages
2014-09-30  5:31 [Xenomai] Switchtest failures on ODROIDU3 GP Orcullo
2014-09-30 11:22 ` Gilles Chanteperdrix
2014-09-30 11:30 ` Gilles Chanteperdrix
2014-09-30 12:04   ` GP Orcullo
2014-09-30 12:16     ` Gilles Chanteperdrix
2014-09-30 23:32       ` GP Orcullo
2014-10-01  7:54         ` Gilles Chanteperdrix
2014-10-01  9:12           ` GP Orcullo
2014-10-01  9:20             ` Gilles Chanteperdrix
2014-10-02 13:27               ` GP Orcullo
2014-10-02 13:36                 ` Gilles Chanteperdrix
2014-10-02 15:52                   ` GP Orcullo
2014-10-02 17:13                     ` Gilles Chanteperdrix
2014-10-02 23:40                       ` GP Orcullo
2014-10-03  3:35                       ` GP Orcullo
2014-10-03  7:20                         ` Gilles Chanteperdrix
2014-10-03  8:45                           ` GP Orcullo
2014-10-03  8:57                             ` Gilles Chanteperdrix
2014-10-03 10:58                               ` GP Orcullo
2014-10-03 13:37                                 ` Gilles Chanteperdrix
2014-10-03 15:28                                   ` GP Orcullo
2014-10-03 19:14                                     ` Gilles Chanteperdrix
2014-10-03 22:45                                       ` GP Orcullo
2014-10-03 22:48                                         ` Gilles Chanteperdrix
2014-10-04 10:26                                           ` GP Orcullo
2014-10-04 11:31                                             ` Gilles Chanteperdrix
2014-10-05 22:00                                               ` GP Orcullo
2014-10-05 22:04                                                 ` Gilles Chanteperdrix
2014-10-05 22:24                                                   ` GP Orcullo
2014-10-05 22:30                                                     ` Gilles Chanteperdrix
2014-10-09 10:02                                                       ` GP Orcullo
2014-10-09 10:06                                                         ` Gilles Chanteperdrix
2014-10-09 10:12                                                           ` GP Orcullo
2014-10-09 10:16                                                             ` Gilles Chanteperdrix
2014-10-09 10:41                                                               ` Gilles Chanteperdrix
2014-10-09 11:06                                                               ` GP Orcullo
2014-10-09 13:06                                                                 ` Gilles Chanteperdrix
2014-10-09 15:14                                                                 ` Gilles Chanteperdrix
2014-10-20  7:29                                                                   ` GP Orcullo
2014-10-20  7:33                                                                     ` Gilles Chanteperdrix
2014-10-22  6:28                                                                     ` Gilles Chanteperdrix
2014-10-29  1:23                                                                       ` GP Orcullo
2014-10-29  6:16                                                                         ` Gilles Chanteperdrix
2014-10-29  7:24                                                                           ` GP Orcullo
2014-10-29  7:26                                                                             ` Gilles Chanteperdrix
2014-10-29  7:47                                                                               ` GP Orcullo
