All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Arnaud B." <discipleze1@gmail.com>
To: frank.rowand@am.sony.com
Cc: "linux-rt-users@vger.kernel.org" <linux-rt-users@vger.kernel.org>
Subject: Re: Possible kernel regression between 3.0.31-rt51 and 3.4.x on PPC64 ?
Date: Thu, 12 Jul 2012 15:11:44 +0200	[thread overview]
Message-ID: <CAJVbuyo3Y6=n6c-wyNsZgxAM0dkwiPrP7oN6aOMP=ozRtFJSug@mail.gmail.com> (raw)
In-Reply-To: <4FFDBE71.2000705@am.sony.com>

Here are 2 crash logs. One is normal booting, the other is with
CONFIG_DEBUG_RT_MUTEXES choosen.
for sure w in rt_mutex_top_waiter, return by plist_first_entry is
corrupted. There is even ASCII in it from time to time :p

Without CONFIG_DEBUG_RT_MUTEXES:

w:0xc000000000beb700 w.lock:0x766500005f6e6c5f lock:0xc000000007900710
waitlist:0xc000000007900718
cpu 0x0: Vector: 700 (Program Check) at [c0000000fb0eb940]
    pc: c00000000009d934: .wakeup_next_waiter+0x7c/0xe8
    lr: c00000000009d920: .wakeup_next_waiter+0x68/0xe8
    sp: c0000000fb0ebbc0
   msr: 80029200
  current = 0xc0000000fb0d8040
  paca    = 0xc00000000fff9000   softe: 0        irq_happened: 0x09
    pid   = 3, comm = ksoftirqd/0
kernel BUG at /home/arnaud/ALU/KERNEL_34/TRY_12JUIL_64/build/linux/kernel/rtmutex_common.h:83!
enter ? for help
[c0000000fb0ebc50] c000000000825d94 .rt_spin_lock_slowunlock+0xa4/0x11c
[c0000000fb0ebce0] c0000000008267b8 .rt_spin_unlock+0x54/0x64
[c0000000fb0ebd60] c0000000000451e4 .__thread_do_softirq+0x160/0x1b4
[c0000000fb0ebdf0] c0000000000452f8 .run_ksoftirqd+0xc0/0x208
[c0000000fb0ebea0] c00000000006baf0 .kthread+0xb8/0xc4
[c0000000fb0ebf90] c000000000015564 .original_kernel_thread+0x54/0x70

0:mon> d 0xc000000000beb700 <== There shoundn't be ASCII there :D
c000000000beb700 7070656400000000 72616e6765735b63  |pped....ranges[c|
c000000000beb710 6e745d2e66726f6d 203e3d2066726f6d  |nt].from >= from|
c000000000beb720 000000005f6e6c5f 617263686976655f  |...._nl_archive_|
c000000000beb730 7375626672656572 657300002f757372  |subfreeres../usr|

0:mon> t
[c0000000fb0ebc50] c000000000825d94 .rt_spin_lock_slowunlock+0xa4/0x11c
[c0000000fb0ebce0] c0000000008267b8 .rt_spin_unlock+0x54/0x64
[c0000000fb0ebd60] c0000000000451e4 .__thread_do_softirq+0x160/0x1b4
[c0000000fb0ebdf0] c0000000000452f8 .run_ksoftirqd+0xc0/0x208
[c0000000fb0ebea0] c00000000006baf0 .kthread+0xb8/0xc4
[c0000000fb0ebf90] c000000000015564 .original_kernel_thread+0x54/0x70

0:mon> e
cpu 0x0: Vector: 700 (Program Check) at [c0000000fb0eb940]
    pc: c00000000009d934: .wakeup_next_waiter+0x7c/0xe8
    lr: c00000000009d920: .wakeup_next_waiter+0x68/0xe8
    sp: c0000000fb0ebbc0
   msr: 80029200
  current = 0xc0000000fb0d8040
  paca    = 0xc00000000fff9000   softe: 0        irq_happened: 0x09
    pid   = 3, comm = ksoftirqd/0
kernel BUG at /home/arnaud/ALU/KERNEL_34/TRY_12JUIL_64/build/linux/kernel/rtmutex_common.h:83!

0:mon> r
R00 = 0000000000000001   R16 = 0000000000000000
R01 = c0000000fb0ebbc0   R17 = 000000007fe31430
R02 = c000000000df3160   R18 = 0000000000000000
R03 = 0000000000000000   R19 = 000000007ff88068
R04 = c000000000a89f18   R20 = 000000007ff88068
R05 = 0000000000000002   R21 = 000000007ff88068
R06 = 0000000000000006   R22 = 0000000000000000
R07 = 05000affffffffff   R23 = 0000000000000000
R08 = 0000000000000063   R24 = 0000000000000001
R09 = b665000058fe6b4e   R25 = 0000000000000000
R10 = 000000000000000f   R26 = c000000000c88dd0
R11 = 0000000000000001   R27 = c0000000fb0e8080
R12 = 0000000024044044   R28 = 0000000000000000
R13 = c00000000fff9000   R29 = c000000000beb700
R14 = 0000000000000000   R30 = c000000000d285b8
R15 = 000000007ffa1254   R31 = c000000007900710
pc  = c00000000009d934 .wakeup_next_waiter+0x7c/0xe8
lr  = c00000000009d920 .wakeup_next_waiter+0x68/0xe8
msr = 0000000080029200   cr  = 48044048
ctr = c00000000001ebf4   xer = 0000000020000000   trap =  700

0:mon> S
msr  = 0000000080021200  sprg0= c00000000fff9000
pvr  = 0000000080240010  sprg1= c00000000fff9000
dec  = 0000000000000000  sprg2= c00000000fff9180
sp   = c0000000fb0eb0e0  sprg3= 0000000000000000
cpu 0x0: Vector: 700 (Program Check) at [c0000000fb0eae60]
    pc: c000000000032a64: .cmds+0xcb4/0x1748
    lr: c000000000032a60: .cmds+0xcb0/0x1748
    sp: c0000000fb0eb0e0
   msr: 80021200
  current = 0xc0000000fb0d8040
  paca    = 0xc00000000fff9000   softe: 0        irq_happened: 0x09
    pid   = 3, comm = ksoftirqd/0
cpu 0x0: Exception 700 (Program Check) in xmon, returning to main loop


0:mon> la c00000000009d934
c00000000009d934: .wakeup_next_waiter+0x7c/0xe8

0:mon> di c00000000009d934:
c00000000009d934  0b000000      tdnei   r0,0
c00000000009d938  e88d0388      ld      r4,904(r13)
c00000000009d93c  387d0028      addi    r3,r29,40
c00000000009d940  388406e0      addi    r4,r4,1760
c00000000009d944  483ac9a9      bl      c00000000044a2ec        #
.plist_del+0x0/0x94
c00000000009d948  60000000      nop
c00000000009d94c  e93f0008      ld      r9,8(r31)
c00000000009d950  381f0008      addi    r0,r31,8
c00000000009d954  7d200278      xor     r0,r9,r0
c00000000009d958  3120ffff      addic   r9,r0,-1
c00000000009d95c  7c090110      subfe   r0,r9,r0
c00000000009d960  f81f0018      std     r0,24(r31)
c00000000009d964  e86d0388      ld      r3,904(r13)
c00000000009d968  386306d8      addi    r3,r3,1752
c00000000009d96c  7f84e378      mr      r4,r28
c00000000009d970  48789561      bl      c000000000826ed0        #
._raw_spin_unlock_irqrestore+0x0/0xc8



 CONFIG_DEBUG_RT_MUTEXES:
[    0.337296] ftrace: allocating 23126 entries in 137 pages
[    0.355391] mpic: requesting IPIs...
[    0.412420] e500 family performance monitor hardware support registered
w:0xc000000000bf1780 w.lock:0x          (null) lock:0xc000000007900790
waitlist:0xc000000007900798
cpu 0x0: Vector: 700 (Program Check) at [c0000000fb0eb840]
    pc: c00000000009d4e8: .__try_to_take_rt_mutex+0xa4/0x214
    lr: c00000000009d4d4: .__try_to_take_rt_mutex+0x90/0x214
    sp: c0000000fb0ebac0
   msr: 80029200
  current = 0xc0000000fb0d2040
  paca    = 0xc00000000fff9000   softe: 0        irq_happened: 0x01
    pid   = 3, comm = ksoftirqd/0
kernel BUG at /home/arnaud/ALU/KERNEL_34/TRY_12JUIL_64/build/linux/kernel/rtmutex_common.h:83!
enter ? for help
[c0000000fb0ebb70] c000000000826bc4 .rt_spin_lock_slowlock+0xa4/0x454
[c0000000fb0ebcc0] c0000000008270f8 .rt_spin_lock+0x20/0x30
[c0000000fb0ebd40] c000000000045154 .__thread_do_softirq+0xc4/0x1b4
[c0000000fb0ebdd0] c000000000045304 .run_ksoftirqd+0xc0/0x208
[c0000000fb0ebe80] c00000000006bb08 .kthread+0xb8/0xc4
[c0000000fb0ebf90] c000000000015564 .original_kernel_thread+0x54/0x70

0:mon> e
cpu 0x0: Vector: 700 (Program Check) at [c0000000fb0eb840]
    pc: c00000000009d4e8: .__try_to_take_rt_mutex+0xa4/0x214
    lr: c00000000009d4d4: .__try_to_take_rt_mutex+0x90/0x214
    sp: c0000000fb0ebac0
   msr: 80029200
  current = 0xc0000000fb0d2040
  paca    = 0xc00000000fff9000   softe: 0        irq_happened: 0x01
    pid   = 3, comm = ksoftirqd/0
kernel BUG at /home/arnaud/ALU/KERNEL_34/TRY_12JUIL_64/build/linux/kernel/rtmutex_common.h:83!

0:mon> r
R00 = 0000000000000001   R16 = 0000000000000000
R01 = c0000000fb0ebac0   R17 = 000000007fe31430
R02 = c000000000dfc950   R18 = 0000000000000000
R03 = 0000000000000000   R19 = 000000007ff88068
R04 = c000000000a8b94d   R20 = 000000007ff88068
R05 = 0000000000000002   R21 = 000000007ff88068
R06 = 0000000000000006   R22 = 0000000000000000
R07 = 05000affffffffff   R23 = 0000000000000000
R08 = 0000000000000063   R24 = 0000000000000001
R09 = c00000000790078f   R25 = 0000000000000001
R10 = 000000000000000f   R26 = 0000000000000000
R11 = 0000000000000001   R27 = c000000000bf1780
R12 = 0000000024044042   R28 = c000000007900798
R13 = c00000000fff9000   R29 = c0000000fb0d2040
R14 = 0000000000000000   R30 = c000000000d31a88
R15 = 000000007ffa1254   R31 = c000000007900790
pc  = c00000000009d4e8 .__try_to_take_rt_mutex+0xa4/0x214
lr  = c00000000009d4d4 .__try_to_take_rt_mutex+0x90/0x214
msr = 0000000080029200   cr  = 48044088
ctr = c00000000001ebf4   xer = 0000000020000000   trap =  700

0:mon> S
msr  = 0000000080021200  sprg0= c00000000fff9000
pvr  = 0000000080240010  sprg1= c00000000fff9000
dec  = 0000000000000000  sprg2= c00000000fff9180
sp   = c0000000fb0eafe0  sprg3= 0000000000000000
cpu 0x0: Vector: 700 (Program Check) at [c0000000fb0ead60]
    pc: c000000000032a64: .cmds+0xcb4/0x1748
    lr: c000000000032a60: .cmds+0xcb0/0x1748
    sp: c0000000fb0eafe0
   msr: 80021200
  current = 0xc0000000fb0d2040
  paca    = 0xc00000000fff9000   softe: 0        irq_happened: 0x01
    pid   = 3, comm = ksoftirqd/0
cpu 0x0: Exception 700 (Program Check) in xmon, returning to main loop

0:mon> d 0xc000000000bf1780
c000000000bf1780 0000000000000000 0000000000000000  |................|
c000000000bf1790 0000000000000000 c000000000bf1798  |................|
c000000000bf17a0 c000000000bf1798 0000000000000000  |................|
c000000000bf17b0 0000000100000000 c000000000a849df  |..............I.|
0:mon> d 0xc000000007900790
c000000007900790 8000000000000000 c000000000bf1798  |................|
c0000000079007a0 c000000000bf1798 0000000000000001  |................|
c0000000079007b0 0000000100000000 c000000000a849df  |..............I.|
c0000000079007c0 c000000000a84ab3 0000019800000000  |......J.........|
0:mon> d 0xc000000007900798
c000000007900798 c000000000bf1798 c000000000bf1798  |................|
c0000000079007a8 0000000000000001 0000000100000000  |................|
c0000000079007b8 c000000000a849df c000000000a84ab3  |......I.......J.|
c0000000079007c8 0000019800000000 0000000000000000  |................|

0:mon> di c00000000009d4e8
c00000000009d4e8  0b000000      tdnei   r0,0
c00000000009d4ec  e93b0050      ld      r9,80(r27)
c00000000009d4f0  7fbd4800      cmpd    cr7,r29,r9
c00000000009d4f4  419e0028      beq     cr7,c00000000009d51c    #
.__try_to_take_rt_mutex+0xd8/0x214
c00000000009d4f8  2fb90000      cmpdi   cr7,r25,0
c00000000009d4fc  419e0010      beq     cr7,c00000000009d50c    #
.__try_to_take_rt_mutex+0xc8/0x214
c00000000009d500  e81d003a      lwa     r0,56(r29)
c00000000009d504  2f800063      cmpwi   cr7,r0,99
c00000000009d508  41bd0110      bgt     cr7,c00000000009d618    #
.__try_to_take_rt_mutex+0x1d4/0x214
c00000000009d50c  817d0038      lwz     r11,56(r29)
c00000000009d510  80090038      lwz     r0,56(r9)
c00000000009d514  7f8b0000      cmpw    cr7,r11,r0
c00000000009d518  409c00f8      bge     cr7,c00000000009d610    #
.__try_to_take_rt_mutex+0x1cc/0x214
c00000000009d51c  2fba0000      cmpdi   cr7,r26,0
c00000000009d520  409e0010      bne     cr7,c00000000009d530    #
.__try_to_take_rt_mutex+0xec/0x214
c00000000009d524  e81f0008      ld      r0,8(r31)

0:mon> di c00000000009d4d4
c00000000009d4d4  60000000      nop
c00000000009d4d8  e81b0058      ld      r0,88(r27)
c00000000009d4dc  7fe00278      xor     r0,r31,r0
c00000000009d4e0  3120ffff      addic   r9,r0,-1
c00000000009d4e4  7c090110      subfe   r0,r9,r0
c00000000009d4e8  0b000000      tdnei   r0,0
c00000000009d4ec  e93b0050      ld      r9,80(r27)
c00000000009d4f0  7fbd4800      cmpd    cr7,r29,r9
c00000000009d4f4  419e0028      beq     cr7,c00000000009d51c    #
.__try_to_take_rt_mutex+0xd8/0x214
c00000000009d4f8  2fb90000      cmpdi   cr7,r25,0
c00000000009d4fc  419e0010      beq     cr7,c00000000009d50c    #
.__try_to_take_rt_mutex+0xc8/0x214
c00000000009d500  e81d003a      lwa     r0,56(r29)
c00000000009d504  2f800063      cmpwi   cr7,r0,99
c00000000009d508  41bd0110      bgt     cr7,c00000000009d618    #
.__try_to_take_rt_mutex+0x1d4/0x214
c00000000009d50c  817d0038      lwz     r11,56(r29)
c00000000009d510  80090038      lwz     r0,56(r9)

0:mon> la c00000000009d4d4
c00000000009d4d4: .__try_to_take_rt_mutex+0x90/0x214

2012/7/11 Frank Rowand <frank.rowand@am.sony.com>:
> On 07/11/12 09:02, Arnaud B. wrote:
>> Hi all,
>
> < snip >
>
>>
>> At this point the rtmutex plist is trashed. Data are bad when it crash
>> as pointer is not pointing to RAM anymore ;)
>> I tried add  call to plist_check_head at some place, and eventually I
>> got always a data access in init_lists (rtmutex.c). So at this point
>> plist is already corrupted.
>>
>>
>> So here is the call stack. Remember, it's alway the same :)
>>
>> cpu 0x0: Vector: 700 (Program Check) at [c0000000fb0e7830]
>> pc: c0000000000a2ba4: .__try_to_take_rt_mutex+0x74/0x1b0
>> lr: c0000000007d9f10: .rt_spin_lock_slowlock+0xa4/0x414
>> sp: c0000000fb0e7ab0
>> msr: 80029000
>> current = 0xc0000000fb0e20c0
>> paca = 0xc00000000fff9000 softe: 0 irq_happened: 0x01
>> pid = 3, comm = ksoftirqd/0
>> *kernel BUG at
>> /home/arnaud/ALU/KERNEL_34/HOME/build/linux/kernel/rtmutex_common.h:75!*
>> 0:mon>  t
>> [c0000000fb0e7b60] c0000000007d9f10 .rt_spin_lock_slowlock+0xa4/0x414
>> [c0000000fb0e7cb0] c0000000007da6c0 .rt_spin_lock+0x20/0x30
>> [c0000000fb0e7d30] c00000000004532c .__thread_do_softirq+0xc4/0x1b4
>> [c0000000fb0e7dc0] c0000000000454dc .run_ksoftirqd+0xc0/0x208
>> [c0000000fb0e7e70] c00000000006becc .kthread+0xb8/0xc4
>>
>> If you need an more information, I got all this boards in front of me :)
>
> Can you show a little bit more of the messages aound the crash?  For example
> the register contents.  Disassembly of the instruction at pc, and a few before
> that.
>
> -Frank

  parent reply	other threads:[~2012-07-12 13:11 UTC|newest]

Thread overview: 4+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2012-07-11 16:02 Possible kernel regression between 3.0.31-rt51 and 3.4.x on PPC64 ? Arnaud B.
     [not found] ` <4FFDBE71.2000705@am.sony.com>
2012-07-12 13:11   ` Arnaud B. [this message]
     [not found]     ` <4FFF3877.6080007@am.sony.com>
2012-07-13 12:37       ` Arnaud B.
2012-07-20 13:30         ` Arnaud B.

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAJVbuyo3Y6=n6c-wyNsZgxAM0dkwiPrP7oN6aOMP=ozRtFJSug@mail.gmail.com' \
    --to=discipleze1@gmail.com \
    --cc=frank.rowand@am.sony.com \
    --cc=linux-rt-users@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.