* Possible kernel regression between 3.0.31-rt51 and 3.4.x on PPC64 ?
@ 2012-07-11 16:02 Arnaud B.
[not found] ` <4FFDBE71.2000705@am.sony.com>
0 siblings, 1 reply; 4+ messages in thread
From: Arnaud B. @ 2012-07-11 16:02 UTC (permalink / raw)
To: linux-rt-users
Hi all,
I have a *big* issue with my current setup, and perhaps someone has
an idea (or at least a bug report from me will be helpful :) ).
I'm working on a Freescale P5020 with a 64-bit kernel (ppc64) and
preempt-rt (kernel version 3.4.4-rt13).
I hit a problem and haven't found a solution yet.
I also tried another ppc64 board (my old Power Mac G5) and got the same issue!
Last things to note:
. It works OK with a 32-bit kernel on the P5020.
. Strangely, with a 3.0 kernel (3.0.31-rt51) it's OK
(yes, even with a 64-bit kernel).
So, here is the issue:
The kernel crashes randomly, often in udev. If it gets past udev, it can
run OK for a while, but eventually fails in the middle of LTP.
If I enable DEBUG_RT_MUTEXES in the kernel, it fails just after
registering the perf monitor.
I tried putting a raw spinlock in plist, as it seems to fail
somewhere when dealing with that. No success :(
At this point the rtmutex plist is trashed. The data is bad when it crashes,
as the pointer is not pointing to RAM anymore ;)
I tried adding calls to plist_check_head in a few places, and I always
ended up with a data access exception in init_lists (rtmutex.c). So at that
point the plist is already corrupted.
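For readers unfamiliar with what a check such as plist_check_head looks
for: a doubly linked list is only consistent if, for every node, next->prev
and prev->next both point back at that node. The sketch below is a simplified
Python model of that invariant, not the kernel code. It assumes the first two
64-bit words at a list address are the next and prev pointers, and it uses
the pointer values from the CONFIG_DEBUG_RT_MUTEXES xmon dump posted in the
follow-up message of this thread. The `memory` dict and the function name are
made up for the example.

```python
# Simplified model of a doubly linked list consistency check.
# Addresses are copied from the xmon dump in this thread; the layout (next
# pointer at offset 0, prev pointer at offset 8) is an assumption of this sketch.
memory = {
    0xc000000007900798: 0xc000000000bf1798,  # head.next (lock wait list)
    0xc0000000079007a0: 0xc000000000bf1798,  # head.prev
    0xc000000000bf1798: 0xc000000000bf1798,  # first.next
    0xc000000000bf17a0: 0xc000000000bf1798,  # first.prev
}


def broken_links(mem, node):
    """Return descriptions of the links around `node` that do not point back."""
    nxt, prv = mem[node], mem[node + 8]
    problems = []
    if mem[nxt + 8] != node:
        problems.append(f"next->prev is {mem[nxt + 8]:#x}, expected {node:#x}")
    if mem[prv] != node:
        problems.append(f"prev->next is {mem[prv]:#x}, expected {node:#x}")
    return problems


head = 0xc000000007900798
for line in broken_links(memory, head):
    print(line)
```

With these values both checks fail for the head: it points at
0xc000000000bf1798, but the links stored at that address point back at
0xc000000000bf1798 itself rather than at the head.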
So here is the call stack. Remember, it's always the same :)
cpu 0x0: Vector: 700 (Program Check) at [c0000000fb0e7830]
pc: c0000000000a2ba4: .__try_to_take_rt_mutex+0x74/0x1b0
lr: c0000000007d9f10: .rt_spin_lock_slowlock+0xa4/0x414
sp: c0000000fb0e7ab0
msr: 80029000
current = 0xc0000000fb0e20c0
paca = 0xc00000000fff9000 softe: 0 irq_happened: 0x01
pid = 3, comm = ksoftirqd/0
*kernel BUG at
/home/arnaud/ALU/KERNEL_34/HOME/build/linux/kernel/rtmutex_common.h:75!*
0:mon> t
[c0000000fb0e7b60] c0000000007d9f10 .rt_spin_lock_slowlock+0xa4/0x414
[c0000000fb0e7cb0] c0000000007da6c0 .rt_spin_lock+0x20/0x30
[c0000000fb0e7d30] c00000000004532c .__thread_do_softirq+0xc4/0x1b4
[c0000000fb0e7dc0] c0000000000454dc .run_ksoftirqd+0xc0/0x208
[c0000000fb0e7e70] c00000000006becc .kthread+0xb8/0xc4
If you need any more information, I have all these boards in front of me :)
TIA,
/Arnaud.
* Re: Possible kernel regression between 3.0.31-rt51 and 3.4.x on PPC64 ?
[not found] ` <4FFDBE71.2000705@am.sony.com>
@ 2012-07-12 13:11 ` Arnaud B.
[not found] ` <4FFF3877.6080007@am.sony.com>
0 siblings, 1 reply; 4+ messages in thread
From: Arnaud B. @ 2012-07-12 13:11 UTC (permalink / raw)
To: frank.rowand; +Cc: linux-rt-users
Here are 2 crash logs. One is from a normal boot, the other is with
CONFIG_DEBUG_RT_MUTEXES enabled.
For sure, w in rt_mutex_top_waiter (returned by plist_first_entry) is
corrupted. There is even ASCII in it from time to time :p
Without CONFIG_DEBUG_RT_MUTEXES:
w:0xc000000000beb700 w.lock:0x766500005f6e6c5f lock:0xc000000007900710 waitlist:0xc000000007900718
cpu 0x0: Vector: 700 (Program Check) at [c0000000fb0eb940]
pc: c00000000009d934: .wakeup_next_waiter+0x7c/0xe8
lr: c00000000009d920: .wakeup_next_waiter+0x68/0xe8
sp: c0000000fb0ebbc0
msr: 80029200
current = 0xc0000000fb0d8040
paca = 0xc00000000fff9000 softe: 0 irq_happened: 0x09
pid = 3, comm = ksoftirqd/0
kernel BUG at /home/arnaud/ALU/KERNEL_34/TRY_12JUIL_64/build/linux/kernel/rtmutex_common.h:83!
enter ? for help
[c0000000fb0ebc50] c000000000825d94 .rt_spin_lock_slowunlock+0xa4/0x11c
[c0000000fb0ebce0] c0000000008267b8 .rt_spin_unlock+0x54/0x64
[c0000000fb0ebd60] c0000000000451e4 .__thread_do_softirq+0x160/0x1b4
[c0000000fb0ebdf0] c0000000000452f8 .run_ksoftirqd+0xc0/0x208
[c0000000fb0ebea0] c00000000006baf0 .kthread+0xb8/0xc4
[c0000000fb0ebf90] c000000000015564 .original_kernel_thread+0x54/0x70
0:mon> d 0xc000000000beb700 <== There shouldn't be ASCII there :D
c000000000beb700 7070656400000000 72616e6765735b63 |pped....ranges[c|
c000000000beb710 6e745d2e66726f6d 203e3d2066726f6d |nt].from >= from|
c000000000beb720 000000005f6e6c5f 617263686976655f |...._nl_archive_|
c000000000beb730 7375626672656572 657300002f757372 |subfreeres../usr|
0:mon> t
[c0000000fb0ebc50] c000000000825d94 .rt_spin_lock_slowunlock+0xa4/0x11c
[c0000000fb0ebce0] c0000000008267b8 .rt_spin_unlock+0x54/0x64
[c0000000fb0ebd60] c0000000000451e4 .__thread_do_softirq+0x160/0x1b4
[c0000000fb0ebdf0] c0000000000452f8 .run_ksoftirqd+0xc0/0x208
[c0000000fb0ebea0] c00000000006baf0 .kthread+0xb8/0xc4
[c0000000fb0ebf90] c000000000015564 .original_kernel_thread+0x54/0x70
0:mon> e
cpu 0x0: Vector: 700 (Program Check) at [c0000000fb0eb940]
pc: c00000000009d934: .wakeup_next_waiter+0x7c/0xe8
lr: c00000000009d920: .wakeup_next_waiter+0x68/0xe8
sp: c0000000fb0ebbc0
msr: 80029200
current = 0xc0000000fb0d8040
paca = 0xc00000000fff9000 softe: 0 irq_happened: 0x09
pid = 3, comm = ksoftirqd/0
kernel BUG at /home/arnaud/ALU/KERNEL_34/TRY_12JUIL_64/build/linux/kernel/rtmutex_common.h:83!
0:mon> r
R00 = 0000000000000001 R16 = 0000000000000000
R01 = c0000000fb0ebbc0 R17 = 000000007fe31430
R02 = c000000000df3160 R18 = 0000000000000000
R03 = 0000000000000000 R19 = 000000007ff88068
R04 = c000000000a89f18 R20 = 000000007ff88068
R05 = 0000000000000002 R21 = 000000007ff88068
R06 = 0000000000000006 R22 = 0000000000000000
R07 = 05000affffffffff R23 = 0000000000000000
R08 = 0000000000000063 R24 = 0000000000000001
R09 = b665000058fe6b4e R25 = 0000000000000000
R10 = 000000000000000f R26 = c000000000c88dd0
R11 = 0000000000000001 R27 = c0000000fb0e8080
R12 = 0000000024044044 R28 = 0000000000000000
R13 = c00000000fff9000 R29 = c000000000beb700
R14 = 0000000000000000 R30 = c000000000d285b8
R15 = 000000007ffa1254 R31 = c000000007900710
pc = c00000000009d934 .wakeup_next_waiter+0x7c/0xe8
lr = c00000000009d920 .wakeup_next_waiter+0x68/0xe8
msr = 0000000080029200 cr = 48044048
ctr = c00000000001ebf4 xer = 0000000020000000 trap = 700
0:mon> S
msr = 0000000080021200 sprg0= c00000000fff9000
pvr = 0000000080240010 sprg1= c00000000fff9000
dec = 0000000000000000 sprg2= c00000000fff9180
sp = c0000000fb0eb0e0 sprg3= 0000000000000000
cpu 0x0: Vector: 700 (Program Check) at [c0000000fb0eae60]
pc: c000000000032a64: .cmds+0xcb4/0x1748
lr: c000000000032a60: .cmds+0xcb0/0x1748
sp: c0000000fb0eb0e0
msr: 80021200
current = 0xc0000000fb0d8040
paca = 0xc00000000fff9000 softe: 0 irq_happened: 0x09
pid = 3, comm = ksoftirqd/0
cpu 0x0: Exception 700 (Program Check) in xmon, returning to main loop
0:mon> la c00000000009d934
c00000000009d934: .wakeup_next_waiter+0x7c/0xe8
0:mon> di c00000000009d934:
c00000000009d934 0b000000 tdnei r0,0
c00000000009d938 e88d0388 ld r4,904(r13)
c00000000009d93c 387d0028 addi r3,r29,40
c00000000009d940 388406e0 addi r4,r4,1760
c00000000009d944 483ac9a9 bl c00000000044a2ec # .plist_del+0x0/0x94
c00000000009d948 60000000 nop
c00000000009d94c e93f0008 ld r9,8(r31)
c00000000009d950 381f0008 addi r0,r31,8
c00000000009d954 7d200278 xor r0,r9,r0
c00000000009d958 3120ffff addic r9,r0,-1
c00000000009d95c 7c090110 subfe r0,r9,r0
c00000000009d960 f81f0018 std r0,24(r31)
c00000000009d964 e86d0388 ld r3,904(r13)
c00000000009d968 386306d8 addi r3,r3,1752
c00000000009d96c 7f84e378 mr r4,r28
c00000000009d970 48789561 bl c000000000826ed0 # ._raw_spin_unlock_irqrestore+0x0/0xc8
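To make the "ASCII in a pointer" observation concrete, here is a small
Python sketch that renders 64-bit big-endian words the way the ASCII column
of the xmon dump above does. The function name is made up for this example;
the input values are copied from the log.

```python
def ascii_column(words):
    """Render 64-bit big-endian words as printable ASCII, '.' for the rest."""
    raw = b"".join(w.to_bytes(8, "big") for w in words)
    return "".join(chr(b) if 0x20 <= b < 0x7f else "." for b in raw)


# w.lock as printed just before the crash
print(ascii_column([0x766500005f6e6c5f]))
# first dump line at 0xc000000000beb700
print(ascii_column([0x7070656400000000, 0x72616e6765735b63]))
```

The first call prints "ve.._nl_" and the second "pped....ranges[c", so the
lock field holds text fragments instead of an address in the 0xc000...
range that the other kernel pointers in this log share.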
CONFIG_DEBUG_RT_MUTEXES:
[ 0.337296] ftrace: allocating 23126 entries in 137 pages
[ 0.355391] mpic: requesting IPIs...
[ 0.412420] e500 family performance monitor hardware support registered
w:0xc000000000bf1780 w.lock:0x (null) lock:0xc000000007900790 waitlist:0xc000000007900798
cpu 0x0: Vector: 700 (Program Check) at [c0000000fb0eb840]
pc: c00000000009d4e8: .__try_to_take_rt_mutex+0xa4/0x214
lr: c00000000009d4d4: .__try_to_take_rt_mutex+0x90/0x214
sp: c0000000fb0ebac0
msr: 80029200
current = 0xc0000000fb0d2040
paca = 0xc00000000fff9000 softe: 0 irq_happened: 0x01
pid = 3, comm = ksoftirqd/0
kernel BUG at /home/arnaud/ALU/KERNEL_34/TRY_12JUIL_64/build/linux/kernel/rtmutex_common.h:83!
enter ? for help
[c0000000fb0ebb70] c000000000826bc4 .rt_spin_lock_slowlock+0xa4/0x454
[c0000000fb0ebcc0] c0000000008270f8 .rt_spin_lock+0x20/0x30
[c0000000fb0ebd40] c000000000045154 .__thread_do_softirq+0xc4/0x1b4
[c0000000fb0ebdd0] c000000000045304 .run_ksoftirqd+0xc0/0x208
[c0000000fb0ebe80] c00000000006bb08 .kthread+0xb8/0xc4
[c0000000fb0ebf90] c000000000015564 .original_kernel_thread+0x54/0x70
0:mon> e
cpu 0x0: Vector: 700 (Program Check) at [c0000000fb0eb840]
pc: c00000000009d4e8: .__try_to_take_rt_mutex+0xa4/0x214
lr: c00000000009d4d4: .__try_to_take_rt_mutex+0x90/0x214
sp: c0000000fb0ebac0
msr: 80029200
current = 0xc0000000fb0d2040
paca = 0xc00000000fff9000 softe: 0 irq_happened: 0x01
pid = 3, comm = ksoftirqd/0
kernel BUG at /home/arnaud/ALU/KERNEL_34/TRY_12JUIL_64/build/linux/kernel/rtmutex_common.h:83!
0:mon> r
R00 = 0000000000000001 R16 = 0000000000000000
R01 = c0000000fb0ebac0 R17 = 000000007fe31430
R02 = c000000000dfc950 R18 = 0000000000000000
R03 = 0000000000000000 R19 = 000000007ff88068
R04 = c000000000a8b94d R20 = 000000007ff88068
R05 = 0000000000000002 R21 = 000000007ff88068
R06 = 0000000000000006 R22 = 0000000000000000
R07 = 05000affffffffff R23 = 0000000000000000
R08 = 0000000000000063 R24 = 0000000000000001
R09 = c00000000790078f R25 = 0000000000000001
R10 = 000000000000000f R26 = 0000000000000000
R11 = 0000000000000001 R27 = c000000000bf1780
R12 = 0000000024044042 R28 = c000000007900798
R13 = c00000000fff9000 R29 = c0000000fb0d2040
R14 = 0000000000000000 R30 = c000000000d31a88
R15 = 000000007ffa1254 R31 = c000000007900790
pc = c00000000009d4e8 .__try_to_take_rt_mutex+0xa4/0x214
lr = c00000000009d4d4 .__try_to_take_rt_mutex+0x90/0x214
msr = 0000000080029200 cr = 48044088
ctr = c00000000001ebf4 xer = 0000000020000000 trap = 700
0:mon> S
msr = 0000000080021200 sprg0= c00000000fff9000
pvr = 0000000080240010 sprg1= c00000000fff9000
dec = 0000000000000000 sprg2= c00000000fff9180
sp = c0000000fb0eafe0 sprg3= 0000000000000000
cpu 0x0: Vector: 700 (Program Check) at [c0000000fb0ead60]
pc: c000000000032a64: .cmds+0xcb4/0x1748
lr: c000000000032a60: .cmds+0xcb0/0x1748
sp: c0000000fb0eafe0
msr: 80021200
current = 0xc0000000fb0d2040
paca = 0xc00000000fff9000 softe: 0 irq_happened: 0x01
pid = 3, comm = ksoftirqd/0
cpu 0x0: Exception 700 (Program Check) in xmon, returning to main loop
0:mon> d 0xc000000000bf1780
c000000000bf1780 0000000000000000 0000000000000000 |................|
c000000000bf1790 0000000000000000 c000000000bf1798 |................|
c000000000bf17a0 c000000000bf1798 0000000000000000 |................|
c000000000bf17b0 0000000100000000 c000000000a849df |..............I.|
0:mon> d 0xc000000007900790
c000000007900790 8000000000000000 c000000000bf1798 |................|
c0000000079007a0 c000000000bf1798 0000000000000001 |................|
c0000000079007b0 0000000100000000 c000000000a849df |..............I.|
c0000000079007c0 c000000000a84ab3 0000019800000000 |......J.........|
0:mon> d 0xc000000007900798
c000000007900798 c000000000bf1798 c000000000bf1798 |................|
c0000000079007a8 0000000000000001 0000000100000000 |................|
c0000000079007b8 c000000000a849df c000000000a84ab3 |......I.......J.|
c0000000079007c8 0000019800000000 0000000000000000 |................|
0:mon> di c00000000009d4e8
c00000000009d4e8 0b000000 tdnei r0,0
c00000000009d4ec e93b0050 ld r9,80(r27)
c00000000009d4f0 7fbd4800 cmpd cr7,r29,r9
c00000000009d4f4 419e0028 beq cr7,c00000000009d51c # .__try_to_take_rt_mutex+0xd8/0x214
c00000000009d4f8 2fb90000 cmpdi cr7,r25,0
c00000000009d4fc 419e0010 beq cr7,c00000000009d50c # .__try_to_take_rt_mutex+0xc8/0x214
c00000000009d500 e81d003a lwa r0,56(r29)
c00000000009d504 2f800063 cmpwi cr7,r0,99
c00000000009d508 41bd0110 bgt cr7,c00000000009d618 # .__try_to_take_rt_mutex+0x1d4/0x214
c00000000009d50c 817d0038 lwz r11,56(r29)
c00000000009d510 80090038 lwz r0,56(r9)
c00000000009d514 7f8b0000 cmpw cr7,r11,r0
c00000000009d518 409c00f8 bge cr7,c00000000009d610 # .__try_to_take_rt_mutex+0x1cc/0x214
c00000000009d51c 2fba0000 cmpdi cr7,r26,0
c00000000009d520 409e0010 bne cr7,c00000000009d530 # .__try_to_take_rt_mutex+0xec/0x214
c00000000009d524 e81f0008 ld r0,8(r31)
0:mon> di c00000000009d4d4
c00000000009d4d4 60000000 nop
c00000000009d4d8 e81b0058 ld r0,88(r27)
c00000000009d4dc 7fe00278 xor r0,r31,r0
c00000000009d4e0 3120ffff addic r9,r0,-1
c00000000009d4e4 7c090110 subfe r0,r9,r0
c00000000009d4e8 0b000000 tdnei r0,0
c00000000009d4ec e93b0050 ld r9,80(r27)
c00000000009d4f0 7fbd4800 cmpd cr7,r29,r9
c00000000009d4f4 419e0028 beq cr7,c00000000009d51c # .__try_to_take_rt_mutex+0xd8/0x214
c00000000009d4f8 2fb90000 cmpdi cr7,r25,0
c00000000009d4fc 419e0010 beq cr7,c00000000009d50c # .__try_to_take_rt_mutex+0xc8/0x214
c00000000009d500 e81d003a lwa r0,56(r29)
c00000000009d504 2f800063 cmpwi cr7,r0,99
c00000000009d508 41bd0110 bgt cr7,c00000000009d618 # .__try_to_take_rt_mutex+0x1d4/0x214
c00000000009d50c 817d0038 lwz r11,56(r29)
c00000000009d510 80090038 lwz r0,56(r9)
0:mon> la c00000000009d4d4
c00000000009d4d4: .__try_to_take_rt_mutex+0x90/0x214
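One possible reading of the five instructions ending at the trap (ld, xor,
addic, subfe, tdnei) is that they compute "w->lock != lock" as 0 or 1 and
trap on 1. That reading is an interpretation of the disassembly, not
something confirmed against the source. The Python sketch below emulates the
arithmetic on 64-bit values so it can be compared with the register dump
(R00 = 1 at the trap). The function names are made up for the example.

```python
MASK64 = (1 << 64) - 1


def addic(ra, imm):
    """addic: RT = RA + imm; also returns the carry out of bit 0 (CA)."""
    total = (ra & MASK64) + (imm & MASK64)
    return total & MASK64, total >> 64


def subfe(ra, rb, ca):
    """subfe: RT = ~RA + RB + CA, truncated to 64 bits."""
    return (((~ra) & MASK64) + (rb & MASK64) + ca) & MASK64


def mismatch_flag(w_lock, lock):
    """Emulate xor/addic/subfe; the result is what tdnei compares with 0."""
    r0 = (lock ^ w_lock) & MASK64   # xor   r0,r31,r0
    r9, ca = addic(r0, -1)          # addic r9,r0,-1
    return subfe(r9, r0, ca)        # subfe r0,r9,r0


# Values from the log: w.lock printed as (null), lock = r31
print(mismatch_flag(0x0, 0xc000000007900790))
```

With w.lock = 0 and lock = 0xc000000007900790 the flag is 1, which matches
R00 in the register dump, so tdnei r0,0 traps. When both values are equal
the flag is 0 and execution would continue.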
2012/7/11 Frank Rowand <frank.rowand@am.sony.com>:
> On 07/11/12 09:02, Arnaud B. wrote:
>> Hi all,
>
> < snip >
>
>>
>> At this point the rtmutex plist is trashed. The data is bad when it crashes,
>> as the pointer is not pointing to RAM anymore ;)
>> I tried adding calls to plist_check_head in a few places, and I always
>> ended up with a data access exception in init_lists (rtmutex.c). So at that
>> point the plist is already corrupted.
>>
>>
>> So here is the call stack. Remember, it's always the same :)
>>
>> cpu 0x0: Vector: 700 (Program Check) at [c0000000fb0e7830]
>> pc: c0000000000a2ba4: .__try_to_take_rt_mutex+0x74/0x1b0
>> lr: c0000000007d9f10: .rt_spin_lock_slowlock+0xa4/0x414
>> sp: c0000000fb0e7ab0
>> msr: 80029000
>> current = 0xc0000000fb0e20c0
>> paca = 0xc00000000fff9000 softe: 0 irq_happened: 0x01
>> pid = 3, comm = ksoftirqd/0
>> *kernel BUG at
>> /home/arnaud/ALU/KERNEL_34/HOME/build/linux/kernel/rtmutex_common.h:75!*
>> 0:mon> t
>> [c0000000fb0e7b60] c0000000007d9f10 .rt_spin_lock_slowlock+0xa4/0x414
>> [c0000000fb0e7cb0] c0000000007da6c0 .rt_spin_lock+0x20/0x30
>> [c0000000fb0e7d30] c00000000004532c .__thread_do_softirq+0xc4/0x1b4
>> [c0000000fb0e7dc0] c0000000000454dc .run_ksoftirqd+0xc0/0x208
>> [c0000000fb0e7e70] c00000000006becc .kthread+0xb8/0xc4
>>
>> If you need any more information, I have all these boards in front of me :)
>
> Can you show a little bit more of the messages around the crash? For example,
> the register contents, and a disassembly of the instruction at pc and a few
> before that.
>
> -Frank
* Re: Possible kernel regression between 3.0.31-rt51 and 3.4.x on PPC64 ?
[not found] ` <4FFF3877.6080007@am.sony.com>
@ 2012-07-13 12:37 ` Arnaud B.
2012-07-20 13:30 ` Arnaud B.
0 siblings, 1 reply; 4+ messages in thread
From: Arnaud B. @ 2012-07-13 12:37 UTC (permalink / raw)
To: frank.rowand; +Cc: Rowand, Frank, linux-rt-users
Thanks anyway ;)
I made another try: the same test case fails on 3.2.22-rt35.
So the regression is between 3.0 and 3.2!
Are there any ppc64 users here?
/Arnaud.
2012/7/12 Frank Rowand <frank.rowand@am.sony.com>:
> On 07/12/12 06:11, Arnaud B. wrote:
>> Here are 2 crash logs. One is from a normal boot, the other is with
>> CONFIG_DEBUG_RT_MUTEXES enabled.
>> For sure, w in rt_mutex_top_waiter (returned by plist_first_entry) is
>> corrupted. There is even ASCII in it from time to time :p
>
> < snip >
>
>> 2012/7/11 Frank Rowand <frank.rowand@am.sony.com>:
>>> On 07/11/12 09:02, Arnaud B. wrote:
>>>> Hi all,
>
> < snip >
>
>>>> If you need any more information, I have all these boards in front of me :)
>>>
>>> Can you show a little bit more of the messages around the crash? For example,
>>> the register contents, and a disassembly of the instruction at pc and a few
>>> before that.
>>>
>>> -Frank
>
> Thanks for the extra info.
>
> I was looking at a plist corruption issue on a 2.6.29 rt kernel, but it does not
> look like your situation at all. So I don't have anything useful to contribute
> about your case.
>
> -Frank
* Re: Possible kernel regression between 3.0.31-rt51 and 3.4.x on PPC64 ?
2012-07-13 12:37 ` Arnaud B.
@ 2012-07-20 13:30 ` Arnaud B.
0 siblings, 0 replies; 4+ messages in thread
From: Arnaud B. @ 2012-07-20 13:30 UTC (permalink / raw)
To: frank.rowand; +Cc: Rowand, Frank, linux-rt-users
Hi All,
I don't know if it's useful or not, but it still fails if I boot
with "nosmp", while it's OK if I build a kernel without CONFIG_SMP.
It's also OK with CONFIG_SMP and CONFIG_PREEMPT_RTB.
So it's something related to PPC64 + SMP + PREEMPT_RT_FULL.
/Arnaud.