* BUG: spinlock recursion (sys_chdir, user_path_at, do_path_lookup ...)
@ 2011-01-11 11:05 ` Uwe Kleine-König
  0 siblings, 0 replies; 29+ messages in thread
From: Uwe Kleine-König @ 2011-01-11 11:05 UTC (permalink / raw)
  To: linux-kernel, linux-arm-kernel; +Cc: kernel

Hello,

While testing yesterday's state of Linus' master branch
(a08948812b30653eb2c536ae613b635a989feb6f + some arch support including
Trond's latest nfsfix[1]), I hit the following reproducibly:

[    5.580000] BUG: spinlock recursion on CPU#0, init/1
[    5.580000]  lock: c7487e10, .magic: dead4ead, .owner: init/1, .owner_cpu: 0
[    5.590000] Backtrace: 
[    5.590000] [<c0037c2c>] (dump_backtrace+0x0/0x110) from [<c028240c>] (dump_stack+0x1c/0x20)
[    5.600000]  r7:c7487e10 r6:c0321368 r5:c7487e10 r4:c7848000
[    5.610000] [<c02823f0>] (dump_stack+0x0/0x20) from [<c01b516c>] (spin_bug+0x90/0xa4)
[    5.620000] [<c01b50dc>] (spin_bug+0x0/0xa4) from [<c01b52d4>] (do_raw_spin_lock+0x50/0x154)
[    5.620000]  r6:c7487e10 r5:c7487e10 r4:00000000
[    5.630000] [<c01b5284>] (do_raw_spin_lock+0x0/0x154) from [<c028524c>] (_raw_spin_lock_nested+0x40/0x48)
[    5.640000] [<c028520c>] (_raw_spin_lock_nested+0x0/0x48) from [<c00f436c>] (nameidata_dentry_drop_rcu+0x90/0x1a4)
[    5.650000]  r5:c7843efc r4:c7487dc0
[    5.650000] [<c00f42dc>] (nameidata_dentry_drop_rcu+0x0/0x1a4) from [<c00f44c0>] (d_revalidate+0x40/0x68)
[    5.660000] [<c00f4480>] (d_revalidate+0x0/0x68) from [<c00f6ed4>] (link_path_walk+0xb84/0xbf0)
[    5.670000]  r6:c7843efc r5:c7843efc r4:00000000
[    5.680000] [<c00f6350>] (link_path_walk+0x0/0xbf0) from [<c00f7164>] (do_path_lookup+0x48/0xd4)
[    5.680000] [<c00f711c>] (do_path_lookup+0x0/0xd4) from [<c00f7c08>] (user_path_at+0x64/0x9c)
[    5.690000] [<c00f7ba4>] (user_path_at+0x0/0x9c) from [<c00e9614>] (sys_chdir+0x2c/0x78)
[    5.700000]  r8:c0034108 r7:0000000c r6:be961ee4 r5:c7843f88 r4:00063015
[    5.710000] [<c00e95e8>] (sys_chdir+0x0/0x78) from [<c0033e80>] (ret_fast_syscall+0x0/0x44)
[    5.720000]  r5:be961ee4 r4:00063015
[   11.720000] BUG: spinlock lockup on CPU#0, init/1, c7487e10
[   11.730000] Backtrace: 
[   11.730000] [<c0037c2c>] (dump_backtrace+0x0/0x110) from [<c028240c>] (dump_stack+0x1c/0x20)
[   11.740000]  r7:c7842000 r6:c7487e10 r5:00000000 r4:00000000
[   11.740000] [<c02823f0>] (dump_stack+0x0/0x20) from [<c01b539c>] (do_raw_spin_lock+0x118/0x154)
[   11.750000] [<c01b5284>] (do_raw_spin_lock+0x0/0x154) from [<c028524c>] (_raw_spin_lock_nested+0x40/0x48)
[   11.760000] [<c028520c>] (_raw_spin_lock_nested+0x0/0x48) from [<c00f436c>] (nameidata_dentry_drop_rcu+0x90/0x1a4)
[   11.770000]  r5:c7843efc r4:c7487dc0
[   11.780000] [<c00f42dc>] (nameidata_dentry_drop_rcu+0x0/0x1a4) from [<c00f44c0>] (d_revalidate+0x40/0x68)
[   11.790000] [<c00f4480>] (d_revalidate+0x0/0x68) from [<c00f6ed4>] (link_path_walk+0xb84/0xbf0)
[   11.790000]  r6:c7843efc r5:c7843efc r4:00000000
[   11.800000] [<c00f6350>] (link_path_walk+0x0/0xbf0) from [<c00f7164>] (do_path_lookup+0x48/0xd4)
[   11.810000] [<c00f711c>] (do_path_lookup+0x0/0xd4) from [<c00f7c08>] (user_path_at+0x64/0x9c)
[   11.820000] [<c00f7ba4>] (user_path_at+0x0/0x9c) from [<c00e9614>] (sys_chdir+0x2c/0x78)
[   11.820000]  r8:c0034108 r7:0000000c r6:be961ee4 r5:c7843f88 r4:00063015
[   11.830000] [<c00e95e8>] (sys_chdir+0x0/0x78) from [<c0033e80>] (ret_fast_syscall+0x0/0x44)
[   11.840000]  r5:be961ee4 r4:00063015
[   75.280000] BUG: soft lockup - CPU#0 stuck for 64s! [init:1]
[   75.280000] Modules linked in:
[   75.280000] irq event stamp: 113662
[   75.280000] hardirqs last  enabled at (113662): [<c0285a7c>] _raw_spin_unlock_irqrestore+0x48/0x50
[   75.280000] hardirqs last disabled at (113661): [<c0285398>] _raw_spin_lock_irqsave+0x30/0x64
[   75.280000] softirqs last  enabled at (113509): [<c026447c>] rpc_wake_up_next+0x1b0/0x1c4
[   75.280000] softirqs last disabled at (113507): [<c02854f0>] _raw_spin_lock_bh+0x20/0x58
[   75.280000] 
[   75.280000] Pid: 1, comm:                 init
[   75.280000] CPU: 0    Not tainted  (2.6.37-04021-gb8b018c-dirty #41)
[   75.280000] PC is at do_raw_spin_lock+0xac/0x154
[   75.280000] LR is at do_raw_spin_lock+0xc0/0x154
[   75.280000] pc : [<c01b5330>]    lr : [<c01b5344>]    psr: 20000013
[   75.280000] sp : c7843dd0  ip : c7843cd4  fp : c7843e04
[   75.280000] r10: 06bd0000  r9 : 00000000  r8 : 00000000
[   75.280000] r7 : c7842000  r6 : c7487e10  r5 : 00000000  r4 : 03dd5aca
[   75.280000] r3 : 00000000  r2 : 00000001  r1 : c0285a74  r0 : 00000001
[   75.280000] Flags: nzCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment user
[   75.280000] Control: 0005317f  Table: 479a8000  DAC: 00000015
[   75.280000] [<c00356c4>] (show_regs+0x0/0x54) from [<c0089dac>] (watchdog_timer_fn+0x13c/0x1a4)
[   75.280000]  r4:c7842000
[   75.280000] [<c0089c70>] (watchdog_timer_fn+0x0/0x1a4) from [<c006cb58>] (__run_hrtimer+0x114/0x1f0)
[   75.280000] [<c006ca44>] (__run_hrtimer+0x0/0x1f0) from [<c006ced8>] (hrtimer_interrupt+0x154/0x338)
[   75.280000] [<c006cd84>] (hrtimer_interrupt+0x0/0x338) from [<c003e36c>] (mxs_timer_interrupt+0x28/0x34)
[   75.280000] [<c003e344>] (mxs_timer_interrupt+0x0/0x34) from [<c008a408>] (handle_IRQ_event+0x7c/0x1a8)
[   75.280000] [<c008a38c>] (handle_IRQ_event+0x0/0x1a8) from [<c008c948>] (handle_level_irq+0xc8/0x148)
[   75.280000] [<c008c880>] (handle_level_irq+0x0/0x148) from [<c002d320>] (asm_do_IRQ+0x80/0xa4)
[   75.280000]  r7:c7842000 r6:c7487e10 r5:00000000 r4:00000030
[   75.280000] [<c002d2a0>] (asm_do_IRQ+0x0/0xa4) from [<c0033ab8>] (__irq_svc+0x38/0x80)
[   75.280000] Exception stack(0xc7843d88 to 0xc7843dd0)
[   75.280000] 3d80:                   00000001 c0285a74 00000001 00000000 03dd5aca 00000000
[   75.280000] 3da0: c7487e10 c7842000 00000000 00000000 06bd0000 c7843e04 c7843cd4 c7843dd0
[   75.280000] 3dc0: c01b5344 c01b5330 20000013 ffffffff
[   75.280000]  r5:f5000000 r4:ffffffff
[   75.280000] [<c01b5284>] (do_raw_spin_lock+0x0/0x154) from [<c028524c>] (_raw_spin_lock_nested+0x40/0x48)
[   75.280000] [<c028520c>] (_raw_spin_lock_nested+0x0/0x48) from [<c00f436c>] (nameidata_dentry_drop_rcu+0x90/0x1a4)
[   75.280000]  r5:c7843efc r4:c7487dc0
[   75.280000] [<c00f42dc>] (nameidata_dentry_drop_rcu+0x0/0x1a4) from [<c00f44c0>] (d_revalidate+0x40/0x68)
[   75.280000] [<c00f4480>] (d_revalidate+0x0/0x68) from [<c00f6ed4>] (link_path_walk+0xb84/0xbf0)
[   75.280000]  r6:c7843efc r5:c7843efc r4:00000000
[   75.280000] [<c00f6350>] (link_path_walk+0x0/0xbf0) from [<c00f7164>] (do_path_lookup+0x48/0xd4)
[   75.280000] [<c00f711c>] (do_path_lookup+0x0/0xd4) from [<c00f7c08>] (user_path_at+0x64/0x9c)
[   75.280000] [<c00f7ba4>] (user_path_at+0x0/0x9c) from [<c00e9614>] (sys_chdir+0x2c/0x78)
[   75.280000]  r8:c0034108 r7:0000000c r6:be961ee4 r5:c7843f88 r4:00063015
[   75.280000] [<c00e95e8>] (sys_chdir+0x0/0x78) from [<c0033e80>] (ret_fast_syscall+0x0/0x44)
[   75.280000]  r5:be961ee4 r4:00063015

I started to bisect, but the very first test case already showed a
different error (my getty dying every few seconds).

Does this ring a bell for someone?

If you have questions don't hesitate to ask.

Hardware: mxs-based arm9

Best regards
Uwe

[1] http://mid.gmane.org/1294528551.4181.19.camel@heimdal.trondhjem.org

-- 
Pengutronix e.K.                           | Uwe Kleine-König            |
Industrial Linux Solutions                 | http://www.pengutronix.de/  |


* Re: BUG: spinlock recursion (sys_chdir, user_path_at, do_path_lookup ...)
  2011-01-11 11:05 ` Uwe Kleine-König
@ 2011-01-12  7:52   ` Uwe Kleine-König
  -1 siblings, 0 replies; 29+ messages in thread
From: Uwe Kleine-König @ 2011-01-12  7:52 UTC (permalink / raw)
  To: linux-kernel, linux-arm-kernel
  Cc: kernel, Nick Piggin, Soren Sandmann, Steven Rostedt,
	Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Peter Zijlstra,
	Arjan van de Ven, Frederic Weisbecker, Arnaldo Carvalho de Melo

Hello,

On Tue, Jan 11, 2011 at 12:05:39PM +0100, Uwe Kleine-König wrote:
> when testing yesterday's Linus' master branch
> (a08948812b30653eb2c536ae613b635a989feb6f + some arch support including
> Trond's latest nfsfix[1]) I hit the following reproducibly:
> 
> [    5.580000] BUG: spinlock recursion on CPU#0, init/1
> [    5.580000]  lock: c7487e10, .magic: dead4ead, .owner: init/1, .owner_cpu: 0
> [    5.590000] Backtrace: 
> [    5.590000] [<c0037c2c>] (dump_backtrace+0x0/0x110) from [<c028240c>] (dump_stack+0x1c/0x20)
> [    5.600000]  r7:c7487e10 r6:c0321368 r5:c7487e10 r4:c7848000
> [    5.610000] [<c02823f0>] (dump_stack+0x0/0x20) from [<c01b516c>] (spin_bug+0x90/0xa4)
> [    5.620000] [<c01b50dc>] (spin_bug+0x0/0xa4) from [<c01b52d4>] (do_raw_spin_lock+0x50/0x154)
> [    5.620000]  r6:c7487e10 r5:c7487e10 r4:00000000
> [    5.630000] [<c01b5284>] (do_raw_spin_lock+0x0/0x154) from [<c028524c>] (_raw_spin_lock_nested+0x40/0x48)
> [    5.640000] [<c028520c>] (_raw_spin_lock_nested+0x0/0x48) from [<c00f436c>] (nameidata_dentry_drop_rcu+0x90/0x1a4)
> [    5.650000]  r5:c7843efc r4:c7487dc0
> [    5.650000] [<c00f42dc>] (nameidata_dentry_drop_rcu+0x0/0x1a4) from [<c00f44c0>] (d_revalidate+0x40/0x68)
> [    5.660000] [<c00f4480>] (d_revalidate+0x0/0x68) from [<c00f6ed4>] (link_path_walk+0xb84/0xbf0)
> [    5.670000]  r6:c7843efc r5:c7843efc r4:00000000
> [    5.680000] [<c00f6350>] (link_path_walk+0x0/0xbf0) from [<c00f7164>] (do_path_lookup+0x48/0xd4)
> [    5.680000] [<c00f711c>] (do_path_lookup+0x0/0xd4) from [<c00f7c08>] (user_path_at+0x64/0x9c)
> [    5.690000] [<c00f7ba4>] (user_path_at+0x0/0x9c) from [<c00e9614>] (sys_chdir+0x2c/0x78)
> [    5.700000]  r8:c0034108 r7:0000000c r6:be961ee4 r5:c7843f88 r4:00063015
> [    5.710000] [<c00e95e8>] (sys_chdir+0x0/0x78) from [<c0033e80>] (ret_fast_syscall+0x0/0x44)
> [    5.720000]  r5:be961ee4 r4:00063015
> [   11.720000] BUG: spinlock lockup on CPU#0, init/1, c7487e10
> [   11.730000] Backtrace: 
> [   11.730000] [<c0037c2c>] (dump_backtrace+0x0/0x110) from [<c028240c>] (dump_stack+0x1c/0x20)
> [   11.740000]  r7:c7842000 r6:c7487e10 r5:00000000 r4:00000000
> [   11.740000] [<c02823f0>] (dump_stack+0x0/0x20) from [<c01b539c>] (do_raw_spin_lock+0x118/0x154)
> [   11.750000] [<c01b5284>] (do_raw_spin_lock+0x0/0x154) from [<c028524c>] (_raw_spin_lock_nested+0x40/0x48)
> [   11.760000] [<c028520c>] (_raw_spin_lock_nested+0x0/0x48) from [<c00f436c>] (nameidata_dentry_drop_rcu+0x90/0x1a4)
> [   11.770000]  r5:c7843efc r4:c7487dc0
> [   11.780000] [<c00f42dc>] (nameidata_dentry_drop_rcu+0x0/0x1a4) from [<c00f44c0>] (d_revalidate+0x40/0x68)
> [   11.790000] [<c00f4480>] (d_revalidate+0x0/0x68) from [<c00f6ed4>] (link_path_walk+0xb84/0xbf0)
> [   11.790000]  r6:c7843efc r5:c7843efc r4:00000000
> [   11.800000] [<c00f6350>] (link_path_walk+0x0/0xbf0) from [<c00f7164>] (do_path_lookup+0x48/0xd4)
> [   11.810000] [<c00f711c>] (do_path_lookup+0x0/0xd4) from [<c00f7c08>] (user_path_at+0x64/0x9c)
> [   11.820000] [<c00f7ba4>] (user_path_at+0x0/0x9c) from [<c00e9614>] (sys_chdir+0x2c/0x78)
> [   11.820000]  r8:c0034108 r7:0000000c r6:be961ee4 r5:c7843f88 r4:00063015
> [   11.830000] [<c00e95e8>] (sys_chdir+0x0/0x78) from [<c0033e80>] (ret_fast_syscall+0x0/0x44)
> [   11.840000]  r5:be961ee4 r4:00063015
> [   75.280000] BUG: soft lockup - CPU#0 stuck for 64s! [init:1]
> [   75.280000] Modules linked in:
> [   75.280000] irq event stamp: 113662
> [   75.280000] hardirqs last  enabled at (113662): [<c0285a7c>] _raw_spin_unlock_irqrestore+0x48/0x50
> [   75.280000] hardirqs last disabled at (113661): [<c0285398>] _raw_spin_lock_irqsave+0x30/0x64
> [   75.280000] softirqs last  enabled at (113509): [<c026447c>] rpc_wake_up_next+0x1b0/0x1c4
> [   75.280000] softirqs last disabled at (113507): [<c02854f0>] _raw_spin_lock_bh+0x20/0x58
> [   75.280000] 
> [   75.280000] Pid: 1, comm:                 init
> [   75.280000] CPU: 0    Not tainted  (2.6.37-04021-gb8b018c-dirty #41)
> [   75.280000] PC is at do_raw_spin_lock+0xac/0x154
> [   75.280000] LR is at do_raw_spin_lock+0xc0/0x154
> [   75.280000] pc : [<c01b5330>]    lr : [<c01b5344>]    psr: 20000013
> [   75.280000] sp : c7843dd0  ip : c7843cd4  fp : c7843e04
> [   75.280000] r10: 06bd0000  r9 : 00000000  r8 : 00000000
> [   75.280000] r7 : c7842000  r6 : c7487e10  r5 : 00000000  r4 : 03dd5aca
> [   75.280000] r3 : 00000000  r2 : 00000001  r1 : c0285a74  r0 : 00000001
> [   75.280000] Flags: nzCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment user
> [   75.280000] Control: 0005317f  Table: 479a8000  DAC: 00000015
> [   75.280000] [<c00356c4>] (show_regs+0x0/0x54) from [<c0089dac>] (watchdog_timer_fn+0x13c/0x1a4)
> [   75.280000]  r4:c7842000
> [   75.280000] [<c0089c70>] (watchdog_timer_fn+0x0/0x1a4) from [<c006cb58>] (__run_hrtimer+0x114/0x1f0)
> [   75.280000] [<c006ca44>] (__run_hrtimer+0x0/0x1f0) from [<c006ced8>] (hrtimer_interrupt+0x154/0x338)
> [   75.280000] [<c006cd84>] (hrtimer_interrupt+0x0/0x338) from [<c003e36c>] (mxs_timer_interrupt+0x28/0x34)
> [   75.280000] [<c003e344>] (mxs_timer_interrupt+0x0/0x34) from [<c008a408>] (handle_IRQ_event+0x7c/0x1a8)
> [   75.280000] [<c008a38c>] (handle_IRQ_event+0x0/0x1a8) from [<c008c948>] (handle_level_irq+0xc8/0x148)
> [   75.280000] [<c008c880>] (handle_level_irq+0x0/0x148) from [<c002d320>] (asm_do_IRQ+0x80/0xa4)
> [   75.280000]  r7:c7842000 r6:c7487e10 r5:00000000 r4:00000030
> [   75.280000] [<c002d2a0>] (asm_do_IRQ+0x0/0xa4) from [<c0033ab8>] (__irq_svc+0x38/0x80)
> [   75.280000] Exception stack(0xc7843d88 to 0xc7843dd0)
> [   75.280000] 3d80:                   00000001 c0285a74 00000001 00000000 03dd5aca 00000000
> [   75.280000] 3da0: c7487e10 c7842000 00000000 00000000 06bd0000 c7843e04 c7843cd4 c7843dd0
> [   75.280000] 3dc0: c01b5344 c01b5330 20000013 ffffffff
> [   75.280000]  r5:f5000000 r4:ffffffff
> [   75.280000] [<c01b5284>] (do_raw_spin_lock+0x0/0x154) from [<c028524c>] (_raw_spin_lock_nested+0x40/0x48)
> [   75.280000] [<c028520c>] (_raw_spin_lock_nested+0x0/0x48) from [<c00f436c>] (nameidata_dentry_drop_rcu+0x90/0x1a4)
> [   75.280000]  r5:c7843efc r4:c7487dc0
> [   75.280000] [<c00f42dc>] (nameidata_dentry_drop_rcu+0x0/0x1a4) from [<c00f44c0>] (d_revalidate+0x40/0x68)
> [   75.280000] [<c00f4480>] (d_revalidate+0x0/0x68) from [<c00f6ed4>] (link_path_walk+0xb84/0xbf0)
> [   75.280000]  r6:c7843efc r5:c7843efc r4:00000000
> [   75.280000] [<c00f6350>] (link_path_walk+0x0/0xbf0) from [<c00f7164>] (do_path_lookup+0x48/0xd4)
> [   75.280000] [<c00f711c>] (do_path_lookup+0x0/0xd4) from [<c00f7c08>] (user_path_at+0x64/0x9c)
> [   75.280000] [<c00f7ba4>] (user_path_at+0x0/0x9c) from [<c00e9614>] (sys_chdir+0x2c/0x78)
> [   75.280000]  r8:c0034108 r7:0000000c r6:be961ee4 r5:c7843f88 r4:00063015
> [   75.280000] [<c00e95e8>] (sys_chdir+0x0/0x78) from [<c0033e80>] (ret_fast_syscall+0x0/0x44)
> [   75.280000]  r5:be961ee4 r4:00063015
> 
> I started to bisect, but already the first test case showed a different
> error (my getty dying every few seconds).
I have now bisected this one; the first bad commit is

	9c0729d (x86: Eliminate bp argument from the stack tracing routines)

which made an x86-specific change to include/linux/stacktrace.h.

According to tglx the lockup above "is related to nicks scalability
stuff".  I haven't researched the offending commit yet.  Is that
necessary?

Best regards
Uwe

-- 
Pengutronix e.K.                           | Uwe Kleine-König            |
Industrial Linux Solutions                 | http://www.pengutronix.de/  |
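
Note on the bisection result: a bisection is a binary search that
assumes every commit before the first bad one tests good and every commit
after it tests bad. The C sketch below shows that search over a linear
list of test verdicts and how a single misleading verdict moves the
result to an unrelated commit. It is a simplified illustration assuming
linear history, not git's actual implementation, and first_bad_commit()
and the verdict arrays are hypothetical. It does not claim that this is
what happened in the bisection reported above.

```c
#include <assert.h>
#include <stddef.h>

/*
 * verdicts[i] is nonzero when commit i tested "bad"; index 0 is the
 * oldest commit. The caller must supply a known-good first entry and a
 * known-bad last entry, as with the good/bad endpoints of a bisection.
 */
size_t first_bad_commit(const int *verdicts, size_t n)
{
    size_t good;
    size_t bad;

    assert(verdicts != NULL && n >= 2);
    good = 0;
    bad = n - 1;
    assert(!verdicts[good] && verdicts[bad]);

    /* Halve the suspect range until good and bad are neighbours. */
    while (bad - good > 1) {
        size_t mid = good + (bad - good) / 2;

        if (verdicts[mid])
            bad = mid;
        else
            good = mid;
    }
    return bad;
}
```

With consistent verdicts the search returns the commit where the
behaviour changes. If one tested commit is judged wrongly (for example
because a different failure hides the one being hunted), the search still
terminates, but on a commit that has nothing to do with the bug.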


* Re: BUG: spinlock recursion (sys_chdir, user_path_at, do_path_lookup ...)
  2011-01-12  7:52   ` Uwe Kleine-König
@ 2011-01-12 10:57     ` Thomas Gleixner
  -1 siblings, 0 replies; 29+ messages in thread
From: Thomas Gleixner @ 2011-01-12 10:57 UTC (permalink / raw)
  To: Uwe Kleine-König
  Cc: linux-kernel, linux-arm-kernel, kernel, Nick Piggin,
	Soren Sandmann, Steven Rostedt, Ingo Molnar, H. Peter Anvin,
	Peter Zijlstra, Arjan van de Ven, Frederic Weisbecker,
	Arnaldo Carvalho de Melo

On Wed, 12 Jan 2011, Uwe Kleine-König wrote:
> > [   75.280000]  r5:be961ee4 r4:00063015
> > 
> > I started to bisect, but already the first test case showed a different
> > error (my getty dying every few seconds).
> I bisected this one now, the first bad commit is
> 
> 	9c0729d (x86: Eliminate bp argument from the stack tracing routines)
> 
> .  It made a x86 specific change to include/linux/stacktrace.h.

As I said on IRC already, that's complete nonsense. The commit changes
a function prototype which is only relevant for x86. So how should
that affect ARM?

> According to tglx the lockup above "is related to nicks scalability
> stuff".  I haven't yet researched the offending commit.  Is that
> necessary?

Only if you are interested in getting the problem fixed.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: BUG: spinlock recursion (sys_chdir, user_path_at, do_path_lookup ...)
  2011-01-12 10:57     ` Thomas Gleixner
@ 2011-01-12 12:03       ` Uwe Kleine-König
  -1 siblings, 0 replies; 29+ messages in thread
From: Uwe Kleine-König @ 2011-01-12 12:03 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: linux-kernel, linux-arm-kernel, kernel, Nick Piggin,
	Soren Sandmann, Steven Rostedt, Ingo Molnar, H. Peter Anvin,
	Peter Zijlstra, Arjan van de Ven, Frederic Weisbecker,
	Arnaldo Carvalho de Melo

On Wed, Jan 12, 2011 at 11:57:50AM +0100, Thomas Gleixner wrote:
> On Wed, 12 Jan 2011, Uwe Kleine-König wrote:
> > > [   75.280000]  r5:be961ee4 r4:00063015
> > > 
> > > I started to bisect, but already the first test case showed a different
> > > error (my getty dying every few seconds).
> > I bisected this one now, the first bad commit is
> > 
> > 	9c0729d (x86: Eliminate bp argument from the stack tracing routines)
> > 
> > .  It made an x86-specific change to include/linux/stacktrace.h.
> 
> As I said on IRC already, that's complete nonsense. The commit changes
> a function prototype which is only relevant for x86. So how should
> that affect ARM ?
hmm, the conversation that you probably mean is:

	22:26 < ukleinek> hmm, 9c0729dc8062bed96189bd14ac6d4920f3958743 is the first bad commit
	22:26 < tglx> lol
	22:26  * ukleinek goes to bed
	22:27 < ukleinek> then it can only be about include/linux/stacktrace.h
	22:27  * ukleinek goes to bed anyhow
	22:28 < rostedt> ukleinek: btw, you could do the bisect automated with ktest.pl :-)
	22:30 < tglx> ukleinek: right, a change to include/linux/stacktrace.h which is x86 specific
	22:33 < tglx> makes arm explode
	22:33 < tglx> rotfl

I admit I didn't look at what was changed there and I understood your
statement as "the change to include/linux/stacktrace.h was x86 specific
and so broke ARM".

I will look into it again after lunch.

> > According to tglx the lockup above "is related to nicks scalability
> > stuff".  I haven't yet researched the offending commit.  Is that
> > necessary?
> 
> Only if you are interested in getting the problem fixed.
OK, will do.

Best regards
Uwe

-- 
Pengutronix e.K.                           | Uwe Kleine-König            |
Industrial Linux Solutions                 | http://www.pengutronix.de/  |

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: BUG: spinlock recursion (sys_chdir, user_path_at, do_path_lookup ...)
  2011-01-12 12:03       ` Uwe Kleine-König
@ 2011-01-12 12:35         ` Russell King - ARM Linux
  -1 siblings, 0 replies; 29+ messages in thread
From: Russell King - ARM Linux @ 2011-01-12 12:35 UTC (permalink / raw)
  To: Uwe Kleine-König
  Cc: Thomas Gleixner, Nick Piggin, Frederic Weisbecker,
	Soren Sandmann, Steven Rostedt, linux-kernel,
	Arnaldo Carvalho de Melo, Ingo Molnar, kernel, H. Peter Anvin,
	Arjan van de Ven, linux-arm-kernel, Peter Zijlstra

On Wed, Jan 12, 2011 at 01:03:49PM +0100, Uwe Kleine-König wrote:
> On Wed, Jan 12, 2011 at 11:57:50AM +0100, Thomas Gleixner wrote:
> > On Wed, 12 Jan 2011, Uwe Kleine-König wrote:
> > > > [   75.280000]  r5:be961ee4 r4:00063015
> > > > 
> > > > I started to bisect, but already the first test case showed a different
> > > > error (my getty dying every few seconds).
> > > I bisected this one now, the first bad commit is
> > > 
> > > 	9c0729d (x86: Eliminate bp argument from the stack tracing routines)
> > > 
> > > .  It made an x86-specific change to include/linux/stacktrace.h.
> > 
> > As I said on IRC already, that's complete nonsense. The commit changes
> > a function prototype which is only relevant for x86. So how should
> > that affect ARM ?
> hmm, the conversation that you probably mean is:
> 
> 	22:26 < ukleinek> hmm, 9c0729dc8062bed96189bd14ac6d4920f3958743 is the first bad commit
> 	22:26 < tglx> lol
> 	22:26  * ukleinek goes to bed
> 	22:27 < ukleinek> then it can only be about include/linux/stacktrace.h
> 	22:27  * ukleinek goes to bed anyhow
> 	22:28 < rostedt> ukleinek: btw, you could do the bisect automated with ktest.pl :-)
> 	22:30 < tglx> ukleinek: right, a change to include/linux/stacktrace.h which is x86 specific
> 	22:33 < tglx> makes arm explode
> 	22:33 < tglx> rotfl
> 
> I admit I didn't look at what was changed there and I understood your
> statement as "the change to include/linux/stacktrace.h was x86 specific
> and so broke ARM".

This commit has nothing to do with ARM and couldn't possibly be
responsible for your breakage.  The diffstat for that commit is:

 arch/x86/include/asm/kdebug.h     |    2 +-
 arch/x86/include/asm/stacktrace.h |   33 ++++++++++++++++++++++++++++++---
 arch/x86/kernel/cpu/perf_event.c  |    2 +-
 arch/x86/kernel/dumpstack.c       |   12 ++++++------
 arch/x86/kernel/dumpstack_32.c    |   25 +++++++------------------
 arch/x86/kernel/dumpstack_64.c    |   24 +++++++-----------------
 arch/x86/kernel/process.c         |    3 +--
 arch/x86/kernel/stacktrace.c      |    8 ++++----
 arch/x86/mm/kmemcheck/error.c     |    2 +-
 arch/x86/oprofile/backtrace.c     |    2 +-
 include/linux/stacktrace.h        |    4 +++-
 11 files changed, 62 insertions(+), 55 deletions(-)

and the only file which could affect ARM is include/linux/stacktrace.h.
That change adds a declaration of struct pt_regs and then does:

-extern void save_stack_trace_bp(struct stack_trace *trace, unsigned long bp);
+extern void save_stack_trace_regs(struct stack_trace *trace,
+                                 struct pt_regs *regs);

ARM implements neither save_stack_trace_regs() nor save_stack_trace_bp(),
so if the compiler referenced these, you'd have a kernel which doesn't
link.  The only places where this symbol appears are:

arch/x86/kernel/stacktrace.c:void save_stack_trace_regs(struct stack_trace *trac
arch/x86/mm/kmemcheck/error.c:  save_stack_trace_regs(&e->trace, regs);
include/linux/stacktrace.h:extern void save_stack_trace_regs(struct stack_trace

So, if this is where your bisect decided was the problem, your bisect
was faulty.
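
The link argument above can be illustrated with a small standalone
sketch.  Everything named here (struct trace_demo, struct regs_demo,
save_trace_regs_demo(), record_trace_demo() and the
DEMO_ARCH_HAS_SAVE_REGS macro) is a hypothetical stand-in, not the
kernel's code: a prototype with no definition is harmless until some
compiled code actually calls it.

```c
/* Hypothetical stand-ins for the kernel types; forward declarations
 * suffice because the prototype below only takes pointers. */
struct trace_demo;
struct regs_demo;

/* Declared here but defined nowhere, like an arch hook that the
 * current architecture does not provide. */
extern void save_trace_regs_demo(struct trace_demo *trace,
                                 struct regs_demo *regs);

/* Returns 1 if the arch hook was called, 0 if it is not compiled in.
 * Without DEMO_ARCH_HAS_SAVE_REGS the call is removed by the
 * preprocessor, so no undefined symbol reaches the linker. */
int record_trace_demo(struct trace_demo *trace, struct regs_demo *regs)
{
#ifdef DEMO_ARCH_HAS_SAVE_REGS
    save_trace_regs_demo(trace, regs);
    return 1;
#else
    (void)trace;
    (void)regs;
    return 0;
#endif
}
```

Building this with -DDEMO_ARCH_HAS_SAVE_REGS and no definition of
save_trace_regs_demo() would be expected to fail at link time with an
undefined reference, which is the failure mode described above.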

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: BUG: spinlock recursion (sys_chdir, user_path_at, do_path_lookup ...)
  2011-01-12 12:35         ` Russell King - ARM Linux
@ 2011-01-12 12:48           ` Russell King - ARM Linux
  -1 siblings, 0 replies; 29+ messages in thread
From: Russell King - ARM Linux @ 2011-01-12 12:48 UTC (permalink / raw)
  To: Uwe Kleine-König
  Cc: Thomas Gleixner, Nick Piggin, Frederic Weisbecker,
	Soren Sandmann, Steven Rostedt, linux-kernel,
	Arnaldo Carvalho de Melo, Ingo Molnar, kernel, H. Peter Anvin,
	Arjan van de Ven, linux-arm-kernel, Peter Zijlstra

On Wed, Jan 12, 2011 at 12:35:08PM +0000, Russell King - ARM Linux wrote:
> ARM implements neither save_stack_trace_regs() nor save_stack_trace_bp(),
> so if the compiler referenced these, you'd have a kernel which doesn't
> link.  The only places where this symbol appears are:
> 
> arch/x86/kernel/stacktrace.c:void save_stack_trace_regs(struct stack_trace *trac
> arch/x86/mm/kmemcheck/error.c:  save_stack_trace_regs(&e->trace, regs);
> include/linux/stacktrace.h:extern void save_stack_trace_regs(struct stack_trace
> 
> So, if this is where your bisect decided was the problem, your bisect
> was faulty.

BTW, a useful thing to do after a bisect is to return to the point in
the history where you first noticed the regression (so Linus' tip,
your tip, or whatever).  Then try reverting the commit which git bisect
_thinks_ is the cause of your problem and re-test that.

If the problem is fixed, you have greater confidence that the commit is
the problem.

If it made no difference, then you know that something else (maybe in
combination) is causing the problem.

If you couldn't revert it because of other dependencies then you have
to rely on analysis (such as what I did) and maybe try again with a
slightly different strategy - maybe the problem only _occasionally_
occurs, making the 'git bisect good' points unreliable, so maybe you
need to do more testing when the problem doesn't immediately appear?
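
As a rough way to size "more testing", here is a minimal sketch under
a simplifying assumption: each boot reproduces the bug independently
with probability p, so n clean boots in a row on a bad commit happen
with probability (1 - p)^n.  The function name boots_needed(), the
parameters p and max_false_good, and the iteration cap are all
illustrative choices; p is something you would have to estimate from
your own test runs.

```c
/* Smallest number of consecutive clean boots n such that the chance
 * of a bad commit passing all of them, (1 - p)^n, is at most
 * max_false_good.  Returns -1 for invalid input or if the answer
 * exceeds the (arbitrary) cap of one million boots. */
int boots_needed(double p, double max_false_good)
{
    const int cap = 1000000;
    double miss = 1.0;   /* probability that every boot so far looked clean */
    int n = 0;

    if (p <= 0.0 || p > 1.0)
        return -1;
    if (max_false_good <= 0.0 || max_false_good >= 1.0)
        return -1;

    while (miss > max_false_good) {
        if (n >= cap)
            return -1;
        miss *= 1.0 - p;
        n++;
    }
    return n;
}
```

Under these assumptions a bug that shows up on half of all boots needs
five clean boots before a "good" verdict is wrong at most 5% of the
time: boots_needed(0.5, 0.05) returns 5.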

Lastly, it is worth bearing in mind that GCC is really finicky with its
optimization.  It may be hard to believe, but unrelated function
definitions in headers can (and do) affect the code generation in
completely unrelated functions causing them to be optimized
differently [*].  Maybe this applies to prototypes too?

So it _could_ be that the prototype change in include/linux/stacktrace.h
is tickling a GCC code generation bug.

* - ISTR, this behaviour was raised as a bug with GCC folk, which I
believe was closed down as wontfix as it's a result of the way the
optimizer works.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: BUG: spinlock recursion (sys_chdir, user_path_at, do_path_lookup ...)
  2011-01-12 12:48           ` Russell King - ARM Linux
@ 2011-01-12 12:56             ` Thomas Gleixner
  -1 siblings, 0 replies; 29+ messages in thread
From: Thomas Gleixner @ 2011-01-12 12:56 UTC (permalink / raw)
  To: Russell King - ARM Linux
  Cc: Uwe Kleine-König, Nick Piggin, Frederic Weisbecker,
	Soren Sandmann, Steven Rostedt, linux-kernel,
	Arnaldo Carvalho de Melo, Ingo Molnar, kernel, H. Peter Anvin,
	Arjan van de Ven, linux-arm-kernel, Peter Zijlstra

On Wed, 12 Jan 2011, Russell King - ARM Linux wrote:
> On Wed, Jan 12, 2011 at 12:35:08PM +0000, Russell King - ARM Linux wrote:
> > ARM implements neither save_stack_trace_regs() nor save_stack_trace_bp(),
> > so if the compiler referenced these, you'd have a kernel which doesn't
> > link.  The only places where this symbol appears are:
> > 
> > arch/x86/kernel/stacktrace.c:void save_stack_trace_regs(struct stack_trace *trac
> > arch/x86/mm/kmemcheck/error.c:  save_stack_trace_regs(&e->trace, regs);
> > include/linux/stacktrace.h:extern void save_stack_trace_regs(struct stack_trace
> > 
> > So, if this is where your bisect decided was the problem, your bisect
> > was faulty.
> 
> BTW, a useful thing to do after a bisect is to return to the point in
> the history where you first noticed the regression (so Linus' tip,
> your tip, or whatever).  Then try reverting the commit which git bisect
> _thinks_ is the cause of your problem and re-test that.
> 
> If the problem is fixed, you have greater confidence that the commit is
> the problem.
> 
> If it made no difference, then you know that something else (maybe in
> combination) is causing the problem.
> 
> If you couldn't revert it because of other dependencies then you have
> to rely on analysis (such as what I did) and maybe try again with a
> slightly different strategy - maybe the problem only _occasionally_
> occurs, making the 'git bisect good' points unreliable, so maybe you
> need to do more testing when the problem doesn't immediately appear?
> 
> Lastly, it is worth bearing in mind that GCC is really finicky with its
> optimization.  It may be hard to believe, but unrelated function
> definitions in headers can (and do) affect the code generation in
> completely unrelated functions causing them to be optimized
> differently [*].  Maybe this applies to prototypes too?

Yes, it does. Also adding an inline or define can change the
behaviour.

> So it _could_ be that the prototype change in include/linux/stacktrace.h
> is tickling a GCC code generation bug.
> 
> * - ISTR, this behaviour was raised as a bug with GCC folk, which I
> believe was closed down as wontfix as it's a result of the way the
> optimizer works.

Right, they just fixed the problem where this effect generated buggy
code on x86 in some cases.

Thanks,

	tglx


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: BUG: spinlock recursion (sys_chdir, user_path_at, do_path_lookup ...)
  2011-01-11 11:05 ` Uwe Kleine-König
@ 2011-01-18 16:59   ` Maciej Rutecki
  -1 siblings, 0 replies; 29+ messages in thread
From: Maciej Rutecki @ 2011-01-18 16:59 UTC (permalink / raw)
  To: Uwe Kleine-König; +Cc: linux-kernel, linux-arm-kernel, kernel

I created a Bugzilla entry at
https://bugzilla.kernel.org/show_bug.cgi?id=27032
for your bug report.  Please add your address to the CC list there, thanks!

On Tuesday, 11 January 2011 at 12:05:39 Uwe Kleine-König wrote:
> Hello,
> 
> when testing yesterday's Linus' master branch
> (a08948812b30653eb2c536ae613b635a989feb6f + some arch support including
> Trond's latest nfsfix[1]) I hit the following reproducibly:
> 
> [    5.580000] BUG: spinlock recursion on CPU#0, init/1
> [    5.580000]  lock: c7487e10, .magic: dead4ead, .owner: init/1, .owner_cpu: 0
> [    5.590000] Backtrace:
> [    5.590000] [<c0037c2c>] (dump_backtrace+0x0/0x110) from [<c028240c>] (dump_stack+0x1c/0x20)
> [    5.600000]  r7:c7487e10 r6:c0321368 r5:c7487e10 r4:c7848000
> [    5.610000] [<c02823f0>] (dump_stack+0x0/0x20) from [<c01b516c>] (spin_bug+0x90/0xa4)
> [    5.620000] [<c01b50dc>] (spin_bug+0x0/0xa4) from [<c01b52d4>] (do_raw_spin_lock+0x50/0x154)
> [    5.620000]  r6:c7487e10 r5:c7487e10 r4:00000000
> [    5.630000] [<c01b5284>] (do_raw_spin_lock+0x0/0x154) from [<c028524c>] (_raw_spin_lock_nested+0x40/0x48)
> [    5.640000] [<c028520c>] (_raw_spin_lock_nested+0x0/0x48) from [<c00f436c>] (nameidata_dentry_drop_rcu+0x90/0x1a4)
> [    5.650000]  r5:c7843efc r4:c7487dc0
> [    5.650000] [<c00f42dc>] (nameidata_dentry_drop_rcu+0x0/0x1a4) from [<c00f44c0>] (d_revalidate+0x40/0x68)
> [    5.660000] [<c00f4480>] (d_revalidate+0x0/0x68) from [<c00f6ed4>] (link_path_walk+0xb84/0xbf0)
> [    5.670000]  r6:c7843efc r5:c7843efc r4:00000000
> [    5.680000] [<c00f6350>] (link_path_walk+0x0/0xbf0) from [<c00f7164>] (do_path_lookup+0x48/0xd4)
> [    5.680000] [<c00f711c>] (do_path_lookup+0x0/0xd4) from [<c00f7c08>] (user_path_at+0x64/0x9c)
> [    5.690000] [<c00f7ba4>] (user_path_at+0x0/0x9c) from [<c00e9614>] (sys_chdir+0x2c/0x78)
> [    5.700000]  r8:c0034108 r7:0000000c r6:be961ee4 r5:c7843f88 r4:00063015
> [    5.710000] [<c00e95e8>] (sys_chdir+0x0/0x78) from [<c0033e80>] (ret_fast_syscall+0x0/0x44)
> [    5.720000]  r5:be961ee4 r4:00063015
> [   11.720000] BUG: spinlock lockup on CPU#0, init/1, c7487e10
> [   11.730000] Backtrace:
> [   11.730000] [<c0037c2c>] (dump_backtrace+0x0/0x110) from [<c028240c>] (dump_stack+0x1c/0x20)
> [   11.740000]  r7:c7842000 r6:c7487e10 r5:00000000 r4:00000000
> [   11.740000] [<c02823f0>] (dump_stack+0x0/0x20) from [<c01b539c>] (do_raw_spin_lock+0x118/0x154)
> [   11.750000] [<c01b5284>] (do_raw_spin_lock+0x0/0x154) from [<c028524c>] (_raw_spin_lock_nested+0x40/0x48)
> [   11.760000] [<c028520c>] (_raw_spin_lock_nested+0x0/0x48) from [<c00f436c>] (nameidata_dentry_drop_rcu+0x90/0x1a4)
> [   11.770000]  r5:c7843efc r4:c7487dc0
> [   11.780000] [<c00f42dc>] (nameidata_dentry_drop_rcu+0x0/0x1a4) from [<c00f44c0>] (d_revalidate+0x40/0x68)
> [   11.790000] [<c00f4480>] (d_revalidate+0x0/0x68) from [<c00f6ed4>] (link_path_walk+0xb84/0xbf0)
> [   11.790000]  r6:c7843efc r5:c7843efc r4:00000000
> [   11.800000] [<c00f6350>] (link_path_walk+0x0/0xbf0) from [<c00f7164>] (do_path_lookup+0x48/0xd4)
> [   11.810000] [<c00f711c>] (do_path_lookup+0x0/0xd4) from [<c00f7c08>] (user_path_at+0x64/0x9c)
> [   11.820000] [<c00f7ba4>] (user_path_at+0x0/0x9c) from [<c00e9614>] (sys_chdir+0x2c/0x78)
> [   11.820000]  r8:c0034108 r7:0000000c r6:be961ee4 r5:c7843f88 r4:00063015
> [   11.830000] [<c00e95e8>] (sys_chdir+0x0/0x78) from [<c0033e80>] (ret_fast_syscall+0x0/0x44)
> [   11.840000]  r5:be961ee4 r4:00063015
> [   75.280000] BUG: soft lockup - CPU#0 stuck for 64s! [init:1]
> [   75.280000] Modules linked in:
> [   75.280000] irq event stamp: 113662
> [   75.280000] hardirqs last  enabled at (113662): [<c0285a7c>] _raw_spin_unlock_irqrestore+0x48/0x50
> [   75.280000] hardirqs last disabled at (113661): [<c0285398>] _raw_spin_lock_irqsave+0x30/0x64
> [   75.280000] softirqs last  enabled at (113509): [<c026447c>] rpc_wake_up_next+0x1b0/0x1c4
> [   75.280000] softirqs last disabled at (113507): [<c02854f0>] _raw_spin_lock_bh+0x20/0x58
> [   75.280000]
> [   75.280000] Pid: 1, comm:                 init
> [   75.280000] CPU: 0    Not tainted  (2.6.37-04021-gb8b018c-dirty #41)
> [   75.280000] PC is at do_raw_spin_lock+0xac/0x154
> [   75.280000] LR is at do_raw_spin_lock+0xc0/0x154
> [   75.280000] pc : [<c01b5330>]    lr : [<c01b5344>]    psr: 20000013
> [   75.280000] sp : c7843dd0  ip : c7843cd4  fp : c7843e04
> [   75.280000] r10: 06bd0000  r9 : 00000000  r8 : 00000000
> [   75.280000] r7 : c7842000  r6 : c7487e10  r5 : 00000000  r4 : 03dd5aca
> [   75.280000] r3 : 00000000  r2 : 00000001  r1 : c0285a74  r0 : 00000001
> [   75.280000] Flags: nzCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment user
> [   75.280000] Control: 0005317f  Table: 479a8000  DAC: 00000015
> [   75.280000] [<c00356c4>] (show_regs+0x0/0x54) from [<c0089dac>] (watchdog_timer_fn+0x13c/0x1a4)
> [   75.280000]  r4:c7842000
> [   75.280000] [<c0089c70>] (watchdog_timer_fn+0x0/0x1a4) from [<c006cb58>] (__run_hrtimer+0x114/0x1f0)
> [   75.280000] [<c006ca44>] (__run_hrtimer+0x0/0x1f0) from [<c006ced8>] (hrtimer_interrupt+0x154/0x338)
> [   75.280000] [<c006cd84>] (hrtimer_interrupt+0x0/0x338) from [<c003e36c>] (mxs_timer_interrupt+0x28/0x34)
> [   75.280000] [<c003e344>] (mxs_timer_interrupt+0x0/0x34) from [<c008a408>] (handle_IRQ_event+0x7c/0x1a8)
> [   75.280000] [<c008a38c>] (handle_IRQ_event+0x0/0x1a8) from [<c008c948>] (handle_level_irq+0xc8/0x148)
> [   75.280000] [<c008c880>] (handle_level_irq+0x0/0x148) from [<c002d320>] (asm_do_IRQ+0x80/0xa4)
> [   75.280000]  r7:c7842000 r6:c7487e10 r5:00000000 r4:00000030
> [   75.280000] [<c002d2a0>] (asm_do_IRQ+0x0/0xa4) from [<c0033ab8>] (__irq_svc+0x38/0x80)
> [   75.280000] Exception stack(0xc7843d88 to 0xc7843dd0)
> [   75.280000] 3d80:                   00000001 c0285a74 00000001 00000000 03dd5aca 00000000
> [   75.280000] 3da0: c7487e10 c7842000 00000000 00000000 06bd0000 c7843e04 c7843cd4 c7843dd0
> [   75.280000] 3dc0: c01b5344 c01b5330 20000013 ffffffff
> [   75.280000]  r5:f5000000 r4:ffffffff
> [   75.280000] [<c01b5284>] (do_raw_spin_lock+0x0/0x154) from [<c028524c>] (_raw_spin_lock_nested+0x40/0x48)
> [   75.280000] [<c028520c>] (_raw_spin_lock_nested+0x0/0x48) from [<c00f436c>] (nameidata_dentry_drop_rcu+0x90/0x1a4)
> [   75.280000]  r5:c7843efc r4:c7487dc0
> [   75.280000] [<c00f42dc>] (nameidata_dentry_drop_rcu+0x0/0x1a4) from [<c00f44c0>] (d_revalidate+0x40/0x68)
> [   75.280000] [<c00f4480>] (d_revalidate+0x0/0x68) from [<c00f6ed4>] (link_path_walk+0xb84/0xbf0)
> [   75.280000]  r6:c7843efc r5:c7843efc r4:00000000
> [   75.280000] [<c00f6350>] (link_path_walk+0x0/0xbf0) from [<c00f7164>] (do_path_lookup+0x48/0xd4)
> [   75.280000] [<c00f711c>] (do_path_lookup+0x0/0xd4) from [<c00f7c08>] (user_path_at+0x64/0x9c)
> [   75.280000] [<c00f7ba4>] (user_path_at+0x0/0x9c) from [<c00e9614>] (sys_chdir+0x2c/0x78)
> [   75.280000]  r8:c0034108 r7:0000000c r6:be961ee4 r5:c7843f88 r4:00063015
> [   75.280000] [<c00e95e8>] (sys_chdir+0x0/0x78) from [<c0033e80>] (ret_fast_syscall+0x0/0x44)
> [   75.280000]  r5:be961ee4 r4:00063015
> 
> I started to bisect, but already the first test case showed a different
> error (my getty dying every few seconds).
> 
> Does this ring a bell for someone?
> 
> If you have questions don't hesitate to ask.
> 
> Hardware: mxs-based arm9
> 
> Best regards
> Uwe
> 
> [1] http://mid.gmane.org/1294528551.4181.19.camel@heimdal.trondhjem.org

-- 
Maciej Rutecki
http://www.maciek.unixy.pl

^ permalink raw reply	[flat|nested] 29+ messages in thread

* BUG: spinlock recursion (sys_chdir, user_path_at, do_path_lookup ...)
@ 2011-01-18 16:59   ` Maciej Rutecki
  0 siblings, 0 replies; 29+ messages in thread
From: Maciej Rutecki @ 2011-01-18 16:59 UTC (permalink / raw)
  To: linux-arm-kernel

I created a Bugzilla entry at
https://bugzilla.kernel.org/show_bug.cgi?id=27032
for your bug report.  Please add your address to the CC list there, thanks!

On Tuesday, 11 January 2011 at 12:05:39 Uwe Kleine-König wrote:
> Hello,
> 
> when testing yesterday's Linus' master branch
> (a08948812b30653eb2c536ae613b635a989feb6f + some arch support including
> Trond's latest nfsfix[1]) I hit the following reproducibly:
> 
> [    5.580000] BUG: spinlock recursion on CPU#0, init/1
> [    5.580000]  lock: c7487e10, .magic: dead4ead, .owner: init/1,
> .owner_cpu: 0 [    5.590000] Backtrace:
> [    5.590000] [<c0037c2c>] (dump_backtrace+0x0/0x110) from [<c028240c>]
> (dump_stack+0x1c/0x20) [    5.600000]  r7:c7487e10 r6:c0321368 r5:c7487e10
> r4:c7848000
> [    5.610000] [<c02823f0>] (dump_stack+0x0/0x20) from [<c01b516c>]
> (spin_bug+0x90/0xa4) [    5.620000] [<c01b50dc>] (spin_bug+0x0/0xa4) from
> [<c01b52d4>] (do_raw_spin_lock+0x50/0x154) [    5.620000]  r6:c7487e10
> r5:c7487e10 r4:00000000
> [    5.630000] [<c01b5284>] (do_raw_spin_lock+0x0/0x154) from [<c028524c>]
> (_raw_spin_lock_nested+0x40/0x48) [    5.640000] [<c028520c>]
> (_raw_spin_lock_nested+0x0/0x48) from [<c00f436c>]
> (nameidata_dentry_drop_rcu+0x90/0x1a4) [    5.650000]  r5:c7843efc
> r4:c7487dc0
> [    5.650000] [<c00f42dc>] (nameidata_dentry_drop_rcu+0x0/0x1a4) from
> [<c00f44c0>] (d_revalidate+0x40/0x68) [    5.660000] [<c00f4480>]
> (d_revalidate+0x0/0x68) from [<c00f6ed4>] (link_path_walk+0xb84/0xbf0) [  
>  5.670000]  r6:c7843efc r5:c7843efc r4:00000000
> [    5.680000] [<c00f6350>] (link_path_walk+0x0/0xbf0) from [<c00f7164>]
> (do_path_lookup+0x48/0xd4) [    5.680000] [<c00f711c>]
> (do_path_lookup+0x0/0xd4) from [<c00f7c08>] (user_path_at+0x64/0x9c) [   
> 5.690000] [<c00f7ba4>] (user_path_at+0x0/0x9c) from [<c00e9614>]
> (sys_chdir+0x2c/0x78) [    5.700000]  r8:c0034108 r7:0000000c r6:be961ee4
> r5:c7843f88 r4:00063015 [    5.710000] [<c00e95e8>] (sys_chdir+0x0/0x78)
> from [<c0033e80>] (ret_fast_syscall+0x0/0x44) [    5.720000]  r5:be961ee4
> r4:00063015
> [   11.720000] BUG: spinlock lockup on CPU#0, init/1, c7487e10
> [   11.730000] Backtrace:
> [   11.730000] [<c0037c2c>] (dump_backtrace+0x0/0x110) from [<c028240c>]
> (dump_stack+0x1c/0x20) [   11.740000]  r7:c7842000 r6:c7487e10 r5:00000000
> r4:00000000
> [   11.740000] [<c02823f0>] (dump_stack+0x0/0x20) from [<c01b539c>]
> (do_raw_spin_lock+0x118/0x154) [   11.750000] [<c01b5284>]
> (do_raw_spin_lock+0x0/0x154) from [<c028524c>]
> (_raw_spin_lock_nested+0x40/0x48) [   11.760000] [<c028520c>]
> (_raw_spin_lock_nested+0x0/0x48) from [<c00f436c>]
> (nameidata_dentry_drop_rcu+0x90/0x1a4) [   11.770000]  r5:c7843efc
> r4:c7487dc0
> [   11.780000] [<c00f42dc>] (nameidata_dentry_drop_rcu+0x0/0x1a4) from
> [<c00f44c0>] (d_revalidate+0x40/0x68) [   11.790000] [<c00f4480>]
> (d_revalidate+0x0/0x68) from [<c00f6ed4>] (link_path_walk+0xb84/0xbf0) [  
> 11.790000]  r6:c7843efc r5:c7843efc r4:00000000
> [   11.800000] [<c00f6350>] (link_path_walk+0x0/0xbf0) from [<c00f7164>]
> (do_path_lookup+0x48/0xd4) [   11.810000] [<c00f711c>]
> (do_path_lookup+0x0/0xd4) from [<c00f7c08>] (user_path_at+0x64/0x9c) [  
> 11.820000] [<c00f7ba4>] (user_path_at+0x0/0x9c) from [<c00e9614>]
> (sys_chdir+0x2c/0x78) [   11.820000]  r8:c0034108 r7:0000000c r6:be961ee4
> r5:c7843f88 r4:00063015 [   11.830000] [<c00e95e8>] (sys_chdir+0x0/0x78)
> from [<c0033e80>] (ret_fast_syscall+0x0/0x44) [   11.840000]  r5:be961ee4
> r4:00063015
> [   75.280000] BUG: soft lockup - CPU#0 stuck for 64s! [init:1]
> [   75.280000] Modules linked in:
> [   75.280000] irq event stamp: 113662
> [   75.280000] hardirqs last  enabled at (113662): [<c0285a7c>]
> _raw_spin_unlock_irqrestore+0x48/0x50 [   75.280000] hardirqs last
> disabled at (113661): [<c0285398>] _raw_spin_lock_irqsave+0x30/0x64 [  
> 75.280000] softirqs last  enabled at (113509): [<c026447c>]
> rpc_wake_up_next+0x1b0/0x1c4 [   75.280000] softirqs last disabled at
> (113507): [<c02854f0>] _raw_spin_lock_bh+0x20/0x58 [   75.280000]
> [   75.280000] Pid: 1, comm:                 init
> [   75.280000] CPU: 0    Not tainted  (2.6.37-04021-gb8b018c-dirty #41)
> [   75.280000] PC is at do_raw_spin_lock+0xac/0x154
> [   75.280000] LR is at do_raw_spin_lock+0xc0/0x154
> [   75.280000] pc : [<c01b5330>]    lr : [<c01b5344>]    psr: 20000013
> [   75.280000] sp : c7843dd0  ip : c7843cd4  fp : c7843e04
> [   75.280000] r10: 06bd0000  r9 : 00000000  r8 : 00000000
> [   75.280000] r7 : c7842000  r6 : c7487e10  r5 : 00000000  r4 : 03dd5aca
> [   75.280000] r3 : 00000000  r2 : 00000001  r1 : c0285a74  r0 : 00000001
> [   75.280000] Flags: nzCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment
> user [   75.280000] Control: 0005317f  Table: 479a8000  DAC: 00000015
> [   75.280000] [<c00356c4>] (show_regs+0x0/0x54) from [<c0089dac>]
> (watchdog_timer_fn+0x13c/0x1a4) [   75.280000]  r4:c7842000
> [   75.280000] [<c0089c70>] (watchdog_timer_fn+0x0/0x1a4) from [<c006cb58>]
> (__run_hrtimer+0x114/0x1f0) [   75.280000] [<c006ca44>]
> (__run_hrtimer+0x0/0x1f0) from [<c006ced8>]
> (hrtimer_interrupt+0x154/0x338) [   75.280000] [<c006cd84>]
> (hrtimer_interrupt+0x0/0x338) from [<c003e36c>]
> (mxs_timer_interrupt+0x28/0x34) [   75.280000] [<c003e344>]
> (mxs_timer_interrupt+0x0/0x34) from [<c008a408>]
> (handle_IRQ_event+0x7c/0x1a8) [   75.280000] [<c008a38c>]
> (handle_IRQ_event+0x0/0x1a8) from [<c008c948>]
> (handle_level_irq+0xc8/0x148) [   75.280000] [<c008c880>]
> (handle_level_irq+0x0/0x148) from [<c002d320>] (asm_do_IRQ+0x80/0xa4) [  
> 75.280000]  r7:c7842000 r6:c7487e10 r5:00000000 r4:00000030
> [   75.280000] [<c002d2a0>] (asm_do_IRQ+0x0/0xa4) from [<c0033ab8>]
> (__irq_svc+0x38/0x80) [   75.280000] Exception stack(0xc7843d88 to
> 0xc7843dd0)
> [   75.280000] 3d80:                   00000001 c0285a74 00000001 00000000
> 03dd5aca 00000000 [   75.280000] 3da0: c7487e10 c7842000 00000000 00000000
> 06bd0000 c7843e04 c7843cd4 c7843dd0 [   75.280000] 3dc0: c01b5344 c01b5330
> 20000013 ffffffff
> [   75.280000]  r5:f5000000 r4:ffffffff
> [   75.280000] [<c01b5284>] (do_raw_spin_lock+0x0/0x154) from [<c028524c>]
> (_raw_spin_lock_nested+0x40/0x48) [   75.280000] [<c028520c>]
> (_raw_spin_lock_nested+0x0/0x48) from [<c00f436c>]
> (nameidata_dentry_drop_rcu+0x90/0x1a4) [   75.280000]  r5:c7843efc
> r4:c7487dc0
> [   75.280000] [<c00f42dc>] (nameidata_dentry_drop_rcu+0x0/0x1a4) from
> [<c00f44c0>] (d_revalidate+0x40/0x68) [   75.280000] [<c00f4480>]
> (d_revalidate+0x0/0x68) from [<c00f6ed4>] (link_path_walk+0xb84/0xbf0) [  
> 75.280000]  r6:c7843efc r5:c7843efc r4:00000000
> [   75.280000] [<c00f6350>] (link_path_walk+0x0/0xbf0) from [<c00f7164>]
> (do_path_lookup+0x48/0xd4) [   75.280000] [<c00f711c>]
> (do_path_lookup+0x0/0xd4) from [<c00f7c08>] (user_path_at+0x64/0x9c) [  
> 75.280000] [<c00f7ba4>] (user_path_at+0x0/0x9c) from [<c00e9614>]
> (sys_chdir+0x2c/0x78) [   75.280000]  r8:c0034108 r7:0000000c r6:be961ee4
> r5:c7843f88 r4:00063015 [   75.280000] [<c00e95e8>] (sys_chdir+0x0/0x78)
> from [<c0033e80>] (ret_fast_syscall+0x0/0x44) [   75.280000]  r5:be961ee4
> r4:00063015
> 
> I started to bisect, but already the first test case showed a different
> error (my getty dying every few seconds).
> 
> Does this ring a bell for someone?
> 
> If you have questions don't hesitate to ask.
> 
> Hardware: mxs-based arm9
> 
> Best regards
> Uwe
> 
> [1] http://mid.gmane.org/1294528551.4181.19.camel at heimdal.trondhjem.org

-- 
Maciej Rutecki
http://www.maciek.unixy.pl

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: BUG: spinlock recursion (sys_chdir, user_path_at, do_path_lookup ...)
  2011-01-18 16:59   ` Maciej Rutecki
@ 2011-01-18 22:19     ` Nick Piggin
  -1 siblings, 0 replies; 29+ messages in thread
From: Nick Piggin @ 2011-01-18 22:19 UTC (permalink / raw)
  To: maciej.rutecki
  Cc: Uwe Kleine-König, linux-kernel, linux-arm-kernel, kernel

2011/1/19 Maciej Rutecki <maciej.rutecki@gmail.com>:
> I created a Bugzilla entry at
> https://bugzilla.kernel.org/show_bug.cgi?id=27032
> for your bug report, please add your address to the CC list in there, thanks!

Sorry I missed this mail. This bug should have been fixed several
days ago by commit 90dbb77ba48dddb87445d238e84cd137cf97dd98.

Thanks for the report; please report again if you see any problems.

* Re: BUG: spinlock recursion (sys_chdir, user_path_at, do_path_lookup ...)
  2011-01-18 22:19     ` Nick Piggin
@ 2011-01-19  7:43       ` Uwe Kleine-König
  -1 siblings, 0 replies; 29+ messages in thread
From: Uwe Kleine-König @ 2011-01-19  7:43 UTC (permalink / raw)
  To: Nick Piggin; +Cc: maciej.rutecki, kernel, linux-kernel, linux-arm-kernel

On Wed, Jan 19, 2011 at 09:19:59AM +1100, Nick Piggin wrote:
> 2011/1/19 Maciej Rutecki <maciej.rutecki@gmail.com>:
> > I created a Bugzilla entry at
> > https://bugzilla.kernel.org/show_bug.cgi?id=27032
> > for your bug report, please add your address to the CC list in there, thanks!
> 
> Sorry I missed this mail. This bug should have been fixed several
> days ago 90dbb77ba48dddb87445d238e84cd137cf97dd98
Yesterday I retried Linus' tree (which has commit 90dbb77ba) and the
spinlock recursion is gone.

Best regards
Uwe

-- 
Pengutronix e.K.                           | Uwe Kleine-König            |
Industrial Linux Solutions                 | http://www.pengutronix.de/  |

* BUG: spinlock recursion (sys_chdir, user_path_at, do_path_lookup ...)
  2011-01-13 11:21         ` Thomas Gleixner
@ 2011-01-13 11:37           ` Peter Zijlstra
  0 siblings, 0 replies; 29+ messages in thread
From: Peter Zijlstra @ 2011-01-13 11:37 UTC (permalink / raw)
  To: linux-arm-kernel

On Thu, 2011-01-13 at 12:21 +0100, Thomas Gleixner wrote:
> On Thu, 13 Jan 2011, Peter Zijlstra wrote:
> 
> > 
> > > On Wed, 2011-01-12 at 23:52 +0100, Thomas Gleixner wrote:
> > 
> > > > @peterz: Why does lockdep ignore the lock recursion in that
> > > >          spin_lock_nested() call?
> > 
> > So after some hints on IRC on where to look:
> > 
> > <tglx>         spin_lock(&parent->d_lock);
> > <tglx>         spin_lock_nested(&dentry->d_lock, DENTRY_D_LOCK_NESTED);
> > <tglx> if parent == dentry
> > 
> > That won't yell because you explicitly tell lockdep its ok, I know what
> > I'm doing.
> > 
> > Several lockdep annotations (including this one) allow you to annotate
> > real bugs away, hence you really need to be sure about things when you
> > make them.
> 
> Yeah, I suspected that, but checking whether the pointers are same
> would be nice as it would tell us right away where we fcked up :)
> 

Something like the below would indeed do that, but it makes the
lock_acquire path more expensive, since it will now have to iterate the
held lock stack every time.

(not actually tested)

---
 kernel/lockdep.c |   18 +++++++++++++++++-
 1 files changed, 17 insertions(+), 1 deletions(-)

diff --git a/kernel/lockdep.c b/kernel/lockdep.c
index 42ba65d..d053d9a 100644
--- a/kernel/lockdep.c
+++ b/kernel/lockdep.c
@@ -2740,11 +2740,12 @@ static int __lock_acquire(struct lockdep_map *lock, unsigned int subclass,
 {
 	struct task_struct *curr = current;
 	struct lock_class *class = NULL;
-	struct held_lock *hlock;
+	struct held_lock *hlock, *rhlock;
 	unsigned int depth, id;
 	int chain_head = 0;
 	int class_idx;
 	u64 chain_key;
+	int i;
 
 	if (!prove_locking)
 		check = 1;
@@ -2817,6 +2818,21 @@ static int __lock_acquire(struct lockdep_map *lock, unsigned int subclass,
 	hlock->holdtime_stamp = lockstat_clock();
 #endif
 
+	for (i = depth-1; i >= 0; i--) {
+		rhlock = curr->held_locks + i;
+		if (rhlock->instance == lock) {
+			if (debug_locks_off() || debug_locks_silent)
+				return 0;
+			printk("Lock recursion, trying to acquire:\n");
+			print_lock(hlock);
+			printk("while already holding:\n");
+			print_lock(rhlock);
+			printk("which is the same lock instance!\n");
+			dump_stack();
+			return 0;
+		}
+	}
+
 	if (check == 2 && !mark_irqflags(curr, hlock))
 		return 0;
 

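The idea behind the check can be sketched in user space as follows. This is an illustrative model, not lockdep: struct held_stack, find_held(), track_acquire() and MAX_HELD are hypothetical names. The scan from the newest to the oldest entry mirrors the loop in the patch, which is where the extra cost on every acquire comes from.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_HELD 48	/* arbitrary bound chosen for this sketch */

/* Hypothetical per-task record of currently held locks, oldest first. */
struct held_stack {
	const void *instance[MAX_HELD];
	int depth;
};

/*
 * Return the index of an already-held entry for 'lock', or -1 if there is
 * none. The scan runs from newest to oldest, so its cost grows with depth.
 */
int find_held(const struct held_stack *hs, const void *lock)
{
	int i;

	for (i = hs->depth - 1; i >= 0; i--)
		if (hs->instance[i] == lock)
			return i;
	return -1;
}

/*
 * Record an acquisition. Returns 0 on success, -1 if the same instance is
 * already held (recursion) or if the stack is full.
 */
int track_acquire(struct held_stack *hs, const void *lock)
{
	assert(hs->depth >= 0 && hs->depth <= MAX_HELD);
	if (find_held(hs, lock) >= 0)
		return -1;
	if (hs->depth == MAX_HELD)
		return -1;
	hs->instance[hs->depth++] = lock;
	return 0;
}
```

Acquiring two distinct objects succeeds; a second track_acquire() on an object that is already recorded returns -1 and leaves the depth unchanged.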
* BUG: spinlock recursion (sys_chdir, user_path_at, do_path_lookup ...)
  2011-01-13 11:17       ` Peter Zijlstra
@ 2011-01-13 11:21         ` Thomas Gleixner
  2011-01-13 11:37           ` Peter Zijlstra
  0 siblings, 1 reply; 29+ messages in thread
From: Thomas Gleixner @ 2011-01-13 11:21 UTC (permalink / raw)
  To: linux-arm-kernel

On Thu, 13 Jan 2011, Peter Zijlstra wrote:

> 
> > On Wed, 2011-01-12 at 23:52 +0100, Thomas Gleixner wrote:
> 
> > > @peterz: Why does lockdep ignore the lock recursion in that
> > >          spin_lock_nested() call?
> 
> So after some hints on IRC on where to look:
> 
> <tglx>         spin_lock(&parent->d_lock);
> <tglx>         spin_lock_nested(&dentry->d_lock, DENTRY_D_LOCK_NESTED);
> <tglx> if parent == dentry
> 
> That won't yell because you explicitly tell lockdep its ok, I know what
> I'm doing.
> 
> Several lockdep annotations (including this one) allow you to annotate
> real bugs away, hence you really need to be sure about things when you
> make them.

Yeah, I suspected that, but checking whether the pointers are the same
would be nice as it would tell us right away where we fcked up :)

* BUG: spinlock recursion (sys_chdir, user_path_at, do_path_lookup ...)
  2011-01-13 11:01     ` Peter Zijlstra
@ 2011-01-13 11:17       ` Peter Zijlstra
  2011-01-13 11:21         ` Thomas Gleixner
  0 siblings, 1 reply; 29+ messages in thread
From: Peter Zijlstra @ 2011-01-13 11:17 UTC (permalink / raw)
  To: linux-arm-kernel


> On Wed, 2011-01-12 at 23:52 +0100, Thomas Gleixner wrote:

> > @peterz: Why does lockdep ignore the lock recursion in that
> >          spin_lock_nested() call?

So after some hints on IRC on where to look:

<tglx>         spin_lock(&parent->d_lock);
<tglx>         spin_lock_nested(&dentry->d_lock, DENTRY_D_LOCK_NESTED);
<tglx> if parent == dentry

That won't yell because you explicitly tell lockdep "it's OK, I know what
I'm doing".

Several lockdep annotations (including this one) allow you to annotate
real bugs away, hence you really need to be sure about things when you
make them.
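A minimal user-space sketch of why that sequence recurses when parent and dentry are the same object. Everything here (struct dbg_lock, struct node, dbg_lock_acquire(), dbg_lock_release(), lock_parent_then_child()) is a made-up stand-in, not kernel code. It models only an owner check of the kind that produces a "spinlock recursion" report, and it is single-threaded, so contention between different owners is not modeled.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a debug spinlock: it only tracks an owner. */
struct dbg_lock {
	int owner;	/* 0 = unlocked, otherwise the id of the holder */
};

/* Hypothetical stand-in for a dentry: a lock and a parent pointer. */
struct node {
	struct dbg_lock lock;
	struct node *parent;
};

/*
 * Take the lock for task 'self'. Returns 0 on success, -1 if 'self'
 * already holds it, which is the case a real spinlock would spin on.
 */
int dbg_lock_acquire(struct dbg_lock *l, int self)
{
	assert(self != 0);
	if (l->owner == self)
		return -1;	/* recursion detected */
	l->owner = self;
	return 0;
}

void dbg_lock_release(struct dbg_lock *l)
{
	l->owner = 0;
}

/*
 * Lock the parent first, then the child, in the order of the snippet
 * above. Returns -1 on recursion and leaves nothing locked in that case.
 */
int lock_parent_then_child(struct node *child, int self)
{
	struct node *parent = child->parent;

	if (dbg_lock_acquire(&parent->lock, self) != 0)
		return -1;
	if (dbg_lock_acquire(&child->lock, self) != 0) {
		dbg_lock_release(&parent->lock);
		return -1;
	}
	return 0;
}
```

A node with a distinct parent gets both locks taken in order. A node whose parent pointer refers to itself fails on the second acquire, because both pointers name the same lock.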

* BUG: spinlock recursion (sys_chdir, user_path_at, do_path_lookup ...)
  2011-01-12 22:52   ` Thomas Gleixner
  2011-01-13  8:09     ` Uwe Kleine-König
@ 2011-01-13 11:01     ` Peter Zijlstra
  2011-01-13 11:17       ` Peter Zijlstra
  1 sibling, 1 reply; 29+ messages in thread
From: Peter Zijlstra @ 2011-01-13 11:01 UTC (permalink / raw)
  To: linux-arm-kernel

On Wed, 2011-01-12 at 23:52 +0100, Thomas Gleixner wrote:
> On Wed, 12 Jan 2011, Uwe Kleine-König wrote:
> > > Reverting: fs: rcu-walk aware d_revalidate method
> > > commit: 34286d6662308d82aed891852d04c7c3a2649b16
> > I found that one, too, in the meantime.  Currently debugging that with
> > tglx on irc.
> 
> The last finding is that parent and dentry in
> nameidata_dentry_drop_rcu() are the same, which explains the lock
> recursion nicely. 
> 
> @nick: Anything you want us to add to the debugging ?
> 
> @peterz: Why does lockdep ignore the lock recursion in that
>          spin_lock_nested() call?

$ git show 34286d6662308d82aed891852d04c7c3a2649b16 | grep spin_lock_nested | wc -l
0

Uhm, whot?!

* BUG: spinlock recursion (sys_chdir, user_path_at, do_path_lookup ...)
  2011-01-12 22:52   ` Thomas Gleixner
@ 2011-01-13  8:09     ` Uwe Kleine-König
  2011-01-13 11:01     ` Peter Zijlstra
  1 sibling, 0 replies; 29+ messages in thread
From: Uwe Kleine-König @ 2011-01-13  8:09 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Nick,

On Wed, Jan 12, 2011 at 11:52:01PM +0100, Thomas Gleixner wrote:
> On Wed, 12 Jan 2011, Uwe Kleine-König wrote:
> > > Reverting: fs: rcu-walk aware d_revalidate method
> > > commit: 34286d6662308d82aed891852d04c7c3a2649b16
> > I found that one, too, in the meantime.  Currently debugging that with
> > tglx on irc.
> 
> The last finding is that parent and dentry in
> nameidata_dentry_drop_rcu() are the same, which explains the lock
> recursion nicely. 
> 
> @nick: Anything you want us to add to the debugging ?
If it helps: the chdir is to / (probably while already being in /).
Maybe the problem is that "/." == "/.."?
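The "/." == "/.." observation is easy to check from user space. A minimal sketch, assuming a POSIX system; same_inode() is a hypothetical helper that compares the device and inode numbers reported by stat(2):

```c
#include <assert.h>
#include <stddef.h>
#include <sys/stat.h>

/*
 * Returns 1 if both paths resolve to the same inode on the same device,
 * 0 if they resolve to different objects, and -1 if either stat() fails.
 */
int same_inode(const char *a, const char *b)
{
	struct stat sa, sb;

	assert(a != NULL && b != NULL);
	if (stat(a, &sa) != 0 || stat(b, &sb) != 0)
		return -1;
	return sa.st_dev == sb.st_dev && sa.st_ino == sb.st_ino;
}
```

At the root directory, "." and ".." name the same directory, so same_inode("/.", "/..") is expected to report a match. That fits the finding that parent and dentry can be the same object there.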

Best regards
Uwe

-- 
Pengutronix e.K.                           | Uwe Kleine-König            |
Industrial Linux Solutions                 | http://www.pengutronix.de/  |

* BUG: spinlock recursion (sys_chdir, user_path_at, do_path_lookup ...)
  2011-01-12 21:02 ` Uwe Kleine-König
  2011-01-12 21:16   ` Ramirez Luna, Omar
@ 2011-01-12 22:52   ` Thomas Gleixner
  2011-01-13  8:09     ` Uwe Kleine-König
  2011-01-13 11:01     ` Peter Zijlstra
  1 sibling, 2 replies; 29+ messages in thread
From: Thomas Gleixner @ 2011-01-12 22:52 UTC (permalink / raw)
  To: linux-arm-kernel

On Wed, 12 Jan 2011, Uwe Kleine-König wrote:
> > Reverting: fs: rcu-walk aware d_revalidate method
> > commit: 34286d6662308d82aed891852d04c7c3a2649b16
> I found that one, too, in the meantime.  Currently debugging that with
> tglx on irc.

The last finding is that parent and dentry in
nameidata_dentry_drop_rcu() are the same, which explains the lock
recursion nicely. 

@nick: Anything you want us to add to the debugging ?

@peterz: Why does lockdep ignore the lock recursion in that
	 spin_lock_nested() call?

Thanks,

	tglx

* BUG: spinlock recursion (sys_chdir, user_path_at, do_path_lookup ...)
  2011-01-12 21:02 ` Uwe Kleine-König
@ 2011-01-12 21:16   ` Ramirez Luna, Omar
  2011-01-12 22:52   ` Thomas Gleixner
  1 sibling, 0 replies; 29+ messages in thread
From: Ramirez Luna, Omar @ 2011-01-12 21:16 UTC (permalink / raw)
  To: linux-arm-kernel

Hi Uwe,

2011/1/12 Uwe Kleine-König <u.kleine-koenig@pengutronix.de>:
> You're on ARM, too?

Yes, ARMv7 (zoom2).

>> > If it made no difference, then you know that something else (maybe in
>> > combination) is causing the problem.
>>
>> I tried to narrow it down using the dump and another thread mentioning
>> recent changes from "Nick" (might be Nick Piggin).
>>
>> Reverting: fs: rcu-walk aware d_revalidate method
>> commit: 34286d6662308d82aed891852d04c7c3a2649b16
> I found that one, too, in the meantime.  Currently debugging that with
> tglx on irc.

OK, let me know if I can help.

Regards,

Omar

* BUG: spinlock recursion (sys_chdir, user_path_at, do_path_lookup ...)
  2011-01-12 20:59 Ramirez Luna, Omar
@ 2011-01-12 21:02 ` Uwe Kleine-König
  2011-01-12 21:16   ` Ramirez Luna, Omar
  2011-01-12 22:52   ` Thomas Gleixner
  0 siblings, 2 replies; 29+ messages in thread
From: Uwe Kleine-König @ 2011-01-12 21:02 UTC (permalink / raw)
  To: linux-arm-kernel

Hello Omar,

On Wed, Jan 12, 2011 at 02:59:39PM -0600, Ramirez Luna, Omar wrote:
> I picked this mail from google so I might be missing some recipients.
> 
> On Wed, 12 Jan 2011, Russell King - ARM Linux wrote:
> > On Wed, Jan 12, 2011 at 12:35:08PM +0000, Russell King - ARM Linux wrote:
> > > ARM doesn't implement save_stack_trace_regs() nor save_stack_trace_bp()
> > > so if the compiler referenced these, you'd have a kernel which doesn't
> > > link.  The only places that this symbol appears is:
> > >
> > > arch/x86/kernel/stacktrace.c:void save_stack_trace_regs(struct stack_trace *trac
> > > arch/x86/mm/kmemcheck/error.c:  save_stack_trace_regs(&e->trace, regs);
> > > include/linux/stacktrace.h:extern void save_stack_trace_regs(struct stack_trace
> > >
> > > So, if this is where your bisect decided was the problem, your bisect
> > > was faulty.
> >
> > BTW, a useful thing to do after a bisect is to return to the point in
> > the history where you first noticed the regression (so Linus' tip,
> > your tip, or whatever).  Then try reverting the commit which git bisect
> > _thinks_ is the cause of your problem and re-test that.
> >
> > If the problem is fixed, you have greater confidence that the commit is
> > the problem.
> 
> Reverting this commit (9c0729dc8062bed96189bd14ac6d4920f3958743 )
> didn't improve in my case.
Yeah, my bisect was screwed because I somehow changed .config in the
middle ...

You're on ARM, too?

> > If it made no difference, then you know that something else (maybe in
> > combination) is causing the problem.
> 
> I tried to narrow it down using the dump and another thread mentioning
> recent changes from "Nick" (might be Nick Piggin).
> 
> Reverting: fs: rcu-walk aware d_revalidate method
> commit: 34286d6662308d82aed891852d04c7c3a2649b16
I found that one, too, in the meantime.  Currently debugging that with
tglx on irc.

Best regards
Uwe

-- 
Pengutronix e.K.                           | Uwe Kleine-König            |
Industrial Linux Solutions                 | http://www.pengutronix.de/  |

* BUG: spinlock recursion (sys_chdir, user_path_at, do_path_lookup ...)
@ 2011-01-12 20:59 Ramirez Luna, Omar
  2011-01-12 21:02 ` Uwe Kleine-König
  0 siblings, 1 reply; 29+ messages in thread
From: Ramirez Luna, Omar @ 2011-01-12 20:59 UTC (permalink / raw)
  To: linux-arm-kernel

Hi,

I picked this mail from google so I might be missing some recipients.

On Wed, 12 Jan 2011, Russell King - ARM Linux wrote:
> On Wed, Jan 12, 2011 at 12:35:08PM +0000, Russell King - ARM Linux wrote:
> > ARM doesn't implement save_stack_trace_regs() nor save_stack_trace_bp()
> > so if the compiler referenced these, you'd have a kernel which doesn't
> > link.  The only places that this symbol appears is:
> >
> > arch/x86/kernel/stacktrace.c:void save_stack_trace_regs(struct stack_trace *trac
> > arch/x86/mm/kmemcheck/error.c:  save_stack_trace_regs(&e->trace, regs);
> > include/linux/stacktrace.h:extern void save_stack_trace_regs(struct stack_trace
> >
> > So, if this is where your bisect decided was the problem, your bisect
> > was faulty.
>
> BTW, a useful thing to do after a bisect is to return to the point in
> the history where you first noticed the regression (so Linus' tip,
> your tip, or whatever).  Then try reverting the commit which git bisect
> _thinks_ is the cause of your problem and re-test that.
>
> If the problem is fixed, you have greater confidence that the commit is
> the problem.

Reverting this commit (9c0729dc8062bed96189bd14ac6d4920f3958743)
didn't improve things in my case.

> If it made no difference, then you know that something else (maybe in
> combination) is causing the problem.

I tried to narrow it down using the dump and another thread mentioning
recent changes from "Nick" (might be Nick Piggin).

Reverting: fs: rcu-walk aware d_revalidate method
commit: 34286d6662308d82aed891852d04c7c3a2649b16

Seems to get rid of the bug, hopefully it will give more information
to someone more experienced with this code (than me).

Regards,

Omar
