linuxppc-dev.lists.ozlabs.org archive mirror
* [Powerpc / eHEA] Circular dependency with 2.6.29-rc6
@ 2009-02-23  8:47 Sachin P. Sant
From: Sachin P. Sant @ 2009-02-23  8:47 UTC
  To: linuxppc-dev, netdev
  Cc: TKLEIN, Mel Gorman, Kamalesh Babulal, Jan-Bernd Themann

While booting 2.6.29-rc6 on a powerpc box, I came across this
circular locking dependency warning with the eHEA driver.

=======================================================
[ INFO: possible circular locking dependency detected ]
2.6.29-rc6 #2
-------------------------------------------------------
ip/2174 is trying to acquire lock:
 (&ehea_fw_handles.lock){--..}, at: [<d000000002a13e30>] .ehea_up+0x64/0x6e0 [ehea]

but task is already holding lock:
 (&port->port_lock){--..}, at: [<d000000002a1533c>] .ehea_open+0x3c/0xc4 [ehea]

which lock already depends on the new lock.


the existing dependency chain (in reverse order) is:

-> #2 (&port->port_lock){--..}:
       [<c0000000000a8590>] .__lock_acquire+0x7e0/0x8a8
       [<c0000000000a86ac>] .lock_acquire+0x54/0x80
       [<c0000000005d7564>] .mutex_lock_nested+0x190/0x46c
       [<d000000002a1533c>] .ehea_open+0x3c/0xc4 [ehea]
       [<c000000000537834>] .dev_open+0xf4/0x168
       [<c000000000535780>] .dev_change_flags+0xe4/0x1e8
       [<c000000000597bfc>] .devinet_ioctl+0x2c4/0x750
       [<c0000000005997a8>] .inet_ioctl+0xcc/0x11c
       [<c000000000523400>] .sock_ioctl+0x2f0/0x34c
       [<c0000000001380ec>] .vfs_ioctl+0x5c/0xf0
       [<c000000000138810>] .do_vfs_ioctl+0x690/0x70c
       [<c000000000138900>] .SyS_ioctl+0x74/0xb8
       [<c00000000016fb08>] .dev_ifsioc+0x210/0x4b8
       [<c00000000016ef18>] .compat_sys_ioctl+0x3f4/0x488
       [<c00000000000855c>] syscall_exit+0x0/0x40

-> #1 (rtnl_mutex){--..}:
       [<c0000000000a8590>] .__lock_acquire+0x7e0/0x8a8
       [<c0000000000a86ac>] .lock_acquire+0x54/0x80
       [<c0000000005d7564>] .mutex_lock_nested+0x190/0x46c
       [<c0000000005430a8>] .rtnl_lock+0x20/0x38
       [<c00000000053677c>] .register_netdev+0x1c/0x80
       [<d000000002a12714>] .ehea_setup_single_port+0x2c8/0x3d0 [ehea]
       [<d000000002a19da8>] .ehea_probe_adapter+0x288/0x394 [ehea]
       [<c00000000051f034>] .of_platform_device_probe+0x78/0x86c
       [<c00000000047faec>] .driver_probe_device+0x13c/0x200
       [<c00000000047fc44>] .__driver_attach+0x94/0xd8
       [<c00000000047eab4>] .bus_for_each_dev+0x80/0xd8
       [<c00000000047f850>] .driver_attach+0x28/0x40
       [<c00000000047f23c>] .bus_add_driver+0xd4/0x284
       [<c00000000047ff7c>] .driver_register+0xc4/0x198
       [<c00000000051eeec>] .of_register_driver+0x4c/0x60
       [<c000000000024da4>] .ibmebus_register_driver+0x30/0x4c
       [<d000000002a1a090>] .ehea_module_init+0x1dc/0x234c [ehea]
       [<c000000000009368>] .do_one_initcall+0x90/0x1b0
       [<c0000000000b2f24>] .SyS_init_module+0xc8/0x220
       [<c00000000000855c>] syscall_exit+0x0/0x40

-> #0 (&ehea_fw_handles.lock){--..}:
       [<c0000000000a8590>] .__lock_acquire+0x7e0/0x8a8
       [<c0000000000a86ac>] .lock_acquire+0x54/0x80
       [<c0000000005d7564>] .mutex_lock_nested+0x190/0x46c
       [<d000000002a13e30>] .ehea_up+0x64/0x6e0 [ehea]
       [<d000000002a15364>] .ehea_open+0x64/0xc4 [ehea]
       [<c000000000537834>] .dev_open+0xf4/0x168
       [<c000000000535780>] .dev_change_flags+0xe4/0x1e8
       [<c000000000597bfc>] .devinet_ioctl+0x2c4/0x750
       [<c0000000005997a8>] .inet_ioctl+0xcc/0x11c
       [<c000000000523400>] .sock_ioctl+0x2f0/0x34c
       [<c0000000001380ec>] .vfs_ioctl+0x5c/0xf0
       [<c000000000138810>] .do_vfs_ioctl+0x690/0x70c
       [<c000000000138900>] .SyS_ioctl+0x74/0xb8
       [<c00000000016fb08>] .dev_ifsioc+0x210/0x4b8
       [<c00000000016ef18>] .compat_sys_ioctl+0x3f4/0x488
       [<c00000000000855c>] syscall_exit+0x0/0x40

other info that might help us debug this:

2 locks held by ip/2174:
 #0:  (rtnl_mutex){--..}, at: [<c0000000005430a8>] .rtnl_lock+0x20/0x38
 #1:  (&port->port_lock){--..}, at: [<d000000002a1533c>] .ehea_open+0x3c/0xc4 [ehea]

stack backtrace:
Call Trace:
[c00000004246b070] [c00000000001154c] .show_stack+0x70/0x184 (unreliable)
[c00000004246b120] [c0000000000a6ee4] .print_circular_bug_tail+0xd8/0xfc
[c00000004246b1f0] [c0000000000a76ec] .validate_chain+0x7e4/0xea8
[c00000004246b2b0] [c0000000000a8590] .__lock_acquire+0x7e0/0x8a8
[c00000004246b3a0] [c0000000000a86ac] .lock_acquire+0x54/0x80
[c00000004246b430] [c0000000005d7564] .mutex_lock_nested+0x190/0x46c
[c00000004246b510] [d000000002a13e30] .ehea_up+0x64/0x6e0 [ehea]
[c00000004246b610] [d000000002a15364] .ehea_open+0x64/0xc4 [ehea]
[c00000004246b6b0] [c000000000537834] .dev_open+0xf4/0x168
[c00000004246b740] [c000000000535780] .dev_change_flags+0xe4/0x1e8
[c00000004246b7f0] [c000000000597bfc] .devinet_ioctl+0x2c4/0x750
[c00000004246b8f0] [c0000000005997a8] .inet_ioctl+0xcc/0x11c
[c00000004246b960] [c000000000523400] .sock_ioctl+0x2f0/0x34c
[c00000004246ba00] [c0000000001380ec] .vfs_ioctl+0x5c/0xf0
[c00000004246baa0] [c000000000138810] .do_vfs_ioctl+0x690/0x70c
[c00000004246bb80] [c000000000138900] .SyS_ioctl+0x74/0xb8
[c00000004246bc30] [c00000000016fb08] .dev_ifsioc+0x210/0x4b8
[c00000004246bd40] [c00000000016ef18] .compat_sys_ioctl+0x3f4/0x488
[c00000004246be30] [c00000000000855c] syscall_exit+0x0/0x40
ehea: eth2: Physical port up

Thanks
-Sachin

-- 

---------------------------------
Sachin Sant
IBM Linux Technology Center
India Systems and Technology Labs
Bangalore, India
---------------------------------
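The dependency chain in the report can be read as a small graph of lock classes. Entry #1 records rtnl_mutex being taken (rtnl_lock() in register_netdev()) while ehea_fw_handles.lock is held on the probe path; entry #2 records port->port_lock being taken in ehea_open() while dev_open() holds rtnl_mutex; the acquisition in progress (#0, ehea_up() under port->port_lock) would add port->port_lock -> ehea_fw_handles.lock and close a three-lock cycle. The C below is a toy user-space model written only to illustrate that reachability check; the enum and function names are made up and this is not the kernel's lockdep code.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* The three lock classes that appear in the report. */
enum lock_class { FW_HANDLES, RTNL, PORT_LOCK, NR_CLASSES };

/* dep[a][b] is true once some task took class b while holding class a. */
static bool dep[NR_CLASSES][NR_CLASSES];

static void add_dep(enum lock_class held, enum lock_class taken)
{
	assert(held < NR_CLASSES && taken < NR_CLASSES);
	dep[held][taken] = true;
}

/* Depth-first search over recorded dependencies: can 'from' reach 'to'? */
static bool reachable(enum lock_class from, enum lock_class to, bool *seen)
{
	if (from == to)
		return true;
	seen[from] = true;
	for (int next = 0; next < NR_CLASSES; next++)
		if (dep[from][next] && !seen[next] &&
		    reachable((enum lock_class)next, to, seen))
			return true;
	return false;
}

/* Recording held -> taken closes a cycle only if taken already reaches held. */
static bool closes_cycle(enum lock_class held, enum lock_class taken)
{
	bool seen[NR_CLASSES];

	memset(seen, 0, sizeof(seen));
	return reachable(taken, held, seen);
}

/* Entries #1 and #2 are already recorded; #0 is the acquisition in progress. */
static bool replay_report(void)
{
	memset(dep, 0, sizeof(dep));
	add_dep(FW_HANDLES, RTNL);   /* #1: probe path, register_netdev() -> rtnl_lock() */
	add_dep(RTNL, PORT_LOCK);    /* #2: dev_open() holds rtnl, ehea_open() takes port_lock */
	return closes_cycle(PORT_LOCK, FW_HANDLES);  /* #0: ehea_up() under port_lock */
}
```

replay_report() returns true: with the two recorded edges in place, the new port_lock -> fw_handles.lock edge is flagged. Removing either recorded edge makes the same check return false, which is why all three stack traces are printed.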

* Re: [Powerpc / eHEA] Circular dependency with 2.6.29-rc6
From: Jan-Bernd Themann @ 2009-02-25 15:05 UTC
  To: Sachin P. Sant
  Cc: TKLEIN, Jan-Bernd Themann, Mel Gorman, netdev, Kamalesh Babulal,
	linuxppc-dev, Ingo Molnar

Hi,

we have investigated this problem but have not understood its root cause
so far.
The things we observed:
- The warning is only shown when the ehea module is loaded while the
  machine is booting.
- If you load the module later (modprobe), no warnings are shown.
- The machine never actually hangs.

We interpret the warning like this:
- The mutex debug facility detects a dependency between port_lock and
  ehea_fw_handles.lock
- ehea_fw_handles.lock is an ehea global lock
- port->port_lock is a lock per network device
- When "open" is called for a registered network device, port->port_lock
  is taken first, then ehea_fw_handles.lock
- When "open" is left these locks are released in a proper way (inverse
  order)
- In addition: ehea_fw_handles.lock is held by the function
  "driver_probe_device" that registers all available network devices
  (register_netdev)
- When multiple network devices are registered, it is possible that
  "open" is called on an already registered network device while further
  netdevices are still being registered in "driver_probe_device".
  ---> "open" will take port->port_lock, but won't get ehea_fw_handles.lock
- However, ehea_fw_handles.lock is freed once all netdevices are registered.
- When the second netdevice is registered in "driver_probe_device", it
  will also try to get the port->port_lock (which in fact is a different
  one, as there is one per netdevice).
- Does the mutex debug mechanism distinguish between the different
  port->port_lock instances?

So far we don't see a locking problem here. Is it possible that the
mutex debug mechanism causes a false positive here?

Any help is highly appreciated.

Regards
Jan-Bernd

* Re: [Powerpc / eHEA] Circular dependency with 2.6.29-rc6
From: Peter Zijlstra @ 2009-02-25 15:50 UTC
  To: Jan-Bernd Themann
  Cc: TKLEIN, Jan-Bernd Themann, Mel Gorman, netdev, Kamalesh Babulal,
	linuxppc-dev, Ingo Molnar

On Wed, 2009-02-25 at 16:05 +0100, Jan-Bernd Themann wrote:

> - When "open" is called for a registered network device, port->port_lock
>   is taken first, then ehea_fw_handles.lock
> - When "open" is left these locks are released in a proper way (inverse
>   order)

So this has:

  port->port_lock
    ehea_fw_handles.lock

This would be the case that is generating the warning.

> - In addition: ehea_fw_handles.lock is held by the function
>   "driver_probe_device" that registers all available network devices
>   (register_netdev)
> - When multiple network devices are registered, it is possible that
>   "open" is called on an already registered network device while further
>   netdevices are still being registered in "driver_probe_device".
>   ---> "open" will take port->port_lock, but won't get ehea_fw_handles.lock

Right, so here you have 

  ehea_fw_handles.lock
    port->port_lock

Overlay these two cases and you have AB-BA deadlocks.

> - However, ehea_fw_handles.lock is freed once all netdevices are registered.
> - When the second netdevice is registered in "driver_probe_device", it
>   will also try to get the port->port_lock (which in fact is a different
>   one, as there is one per netdevice).
> - Does the mutex debug mechanism distinguish between the different
>   port->port_lock instances?

Not unless you tell it to.
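A rough model of what "not unless you tell it to" means: lockdep tracks lock classes rather than lock instances, and by default a mutex's class is keyed by its initialisation site (mutex_init() declares a static struct lock_class_key per call site), so every port's port_lock falls into the same class whichever port it belongs to. The hypothetical user-space C below (toy_class_key, toy_mutex and record_nested are made-up names; only direct AB-BA is checked) nests port 1's lock under fw_handles and fw_handles under port 0's lock, and shows the conflict is flagged when both ports share one key but not when each port has its own.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MAX_CLASSES 8

/* Stand-in for a lockdep class key: by default one per initialisation site. */
struct toy_class_key {
	int id;
};

/* A toy mutex only remembers which class it was initialised with. */
struct toy_mutex {
	const struct toy_class_key *key;
};

/* order[a][b] is true once a lock of class b was taken while holding class a. */
static bool order[MAX_CLASSES][MAX_CLASSES];

static void toy_mutex_init(struct toy_mutex *m, const struct toy_class_key *key)
{
	assert(key->id >= 0 && key->id < MAX_CLASSES);
	m->key = key;
}

/*
 * Record "taken while holding held" at class level and report whether the
 * opposite order was already seen (direct AB-BA only).
 */
static bool record_nested(const struct toy_mutex *held,
			  const struct toy_mutex *taken)
{
	int a = held->key->id;
	int b = taken->key->id;

	order[a][b] = true;
	return order[b][a];
}

static const struct toy_class_key fw_handles_key = { 0 };
/* Default behaviour: every port's port_lock shares one key. */
static const struct toy_class_key port_lock_key = { 1 };
/* Hypothetical alternative: a separate key per port. */
static const struct toy_class_key per_port_key[2] = { { 2 }, { 3 } };

/* Probe nests port 1's lock under fw_handles; open nests fw_handles under port 0's. */
static bool run_scenario(bool one_key_per_port)
{
	struct toy_mutex fw, port[2];

	memset(order, 0, sizeof(order));
	toy_mutex_init(&fw, &fw_handles_key);
	for (int i = 0; i < 2; i++)
		toy_mutex_init(&port[i], one_key_per_port ? &per_port_key[i]
							   : &port_lock_key);

	record_nested(&fw, &port[1]);          /* probe path, second port */
	return record_nested(&port[0], &fw);   /* open path, first port */
}
```

run_scenario(false) reports the conflict even though two different port instances are involved; run_scenario(true) does not. The per-port-key variant corresponds roughly to giving each port its own class (the kernel has lockdep_set_class() for that), but it only hides a real problem unless the two orders can never involve the same port, which is the question raised right after this.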

Are you really sure the port->port_lock in this AB-BA scenario are never
the same? The above explanation didn't convince me (also very hard to
read due to funny wrapping).

Suppose you do an open concurrently with a re-probe, which apparently
takes port->port_lock's of existing devices, in the above scenario that
deadlocks.

* Re: [Powerpc / eHEA] Circular dependency with 2.6.29-rc6
From: Jan-Bernd Themann @ 2009-02-25 17:07 UTC
  To: Peter Zijlstra
  Cc: TKLEIN, Jan-Bernd Themann, Mel Gorman, netdev, Kamalesh Babulal,
	linuxppc-dev, Ingo Molnar

Hi,

yes, sorry for the funny wrapping... and thanks for your quick answer!

Peter Zijlstra wrote:
> On Wed, 2009-02-25 at 16:05 +0100, Jan-Bernd Themann wrote:
>
>   
>> - When "open" is called for a registered network device, port->port_lock
>>   is taken first, then ehea_fw_handles.lock
>> - When "open" is left these locks are released in a proper way (inverse
>>   order)
>>     
>
> So this has:
>
>   port->port_lock
>     ehea_fw_handles.lock
>
> This would be the case that is generating the warning.
>
>   
>> - In addition: ehea_fw_handles.lock is held by the function
>>   "driver_probe_device" that registers all available network devices
>>   (register_netdev)
>> - When multiple network devices are registered, it is possible that
>>   "open" is called on an already registered network device while further
>>   netdevices are still being registered in "driver_probe_device".
>>   ---> "open" will take port->port_lock, but won't get ehea_fw_handles.lock
>>     
>
> Right, so here you have 
>
>   ehea_fw_handles.lock
>     port->port_lock
>
> Overlay these two cases and you have AB-BA deadlocks.
>
>   
The thing here is that I did not see that "open" is called from this
"probe" function; this probably happens indirectly, as each new device
causes a notifier chain to be called --> If I got it right, then a
userspace tool triggers the "open".
In that case the open would run in another task/thread, and thus when
the kernel preempts the task/thread, the probe function would continue
and free the lock.

Let's assume that it is actually possible that "open" is called in the
same context as "probe". Wouldn't that mean that we would necessarily hit
a deadlock (probe holds the lock all the time)? We have never observed a
deadlock so far.

Is there a way to find out if all these locks are actually taken in the
same context (kthread, tasklet...)?

>> - However, ehea_fw_handles.lock is freed once all netdevices are registered.
>> - When the second netdevice is registered in "driver_probe_device", it
>>   will also try to get the port->port_lock (which in fact is a different
>>   one, as there is one per netdevice).
>> - Does the mutex debug mechanism distinguish between the different
>>   port->port_lock instances?
>>     
>
> Not unless you tell it to.
>   
> Are you really sure the port->port_lock in this AB-BA scenario are never
> the same? The above explanation didn't convince me (also very hard to
> read due to funny wrapping).
>   
I'm not sure, especially as I just ran the same test with just one port
and we still get the warning. But having two instances of port accessing
the locks does not look like a problem to me, as they take and release
the locks properly (right order).

> Suppose you do an open concurrently with a re-probe, which apparently
> takes port->port_lock's of existing devices, in the above scenario that
> deadlocks.
>
>   

* Re: [Powerpc / eHEA] Circular dependency with 2.6.29-rc6
From: Peter Zijlstra @ 2009-02-25 18:24 UTC
  To: Jan-Bernd Themann
  Cc: TKLEIN, Jan-Bernd Themann, Mel Gorman, netdev, Kamalesh Babulal,
	linuxppc-dev, Ingo Molnar

On Wed, 2009-02-25 at 18:07 +0100, Jan-Bernd Themann wrote:
> Hi,
> 
> yes, sorry for the funny wrapping... and thanks for your quick answer!
> 
> Peter Zijlstra wrote:
> > On Wed, 2009-02-25 at 16:05 +0100, Jan-Bernd Themann wrote:
> >
> >   
> >> - When "open" is called for a registered network device, port->port_lock
> >>   is taken first, then ehea_fw_handles.lock
> >> - When "open" is left these locks are released in a proper way (inverse
> >>   order)
> >>     
> >
> > So this has:
> >
> >   port->port_lock
> >     ehea_fw_handles.lock
> >
> > This would be the case that is generating the warning.
> >
> >   
> >> - In addition: ehea_fw_handles.lock is held by the function
> >>   "driver_probe_device" that registers all available network devices
> >>   (register_netdev)
> >> - When multiple network devices are registered, it is possible that
> >>   "open" is called on an already registered network device while further
> >>   netdevices are still being registered in "driver_probe_device".
> >>   ---> "open" will take port->port_lock, but won't get ehea_fw_handles.lock
> >>     
> >
> > Right, so here you have 
> >
> >   ehea_fw_handles.lock
> >     port->port_lock
> >
> > Overlay these two cases and you have AB-BA deadlocks.
> >
> >   
> The thing here is that I did not see that "open" is called from this
> "probe" function; this probably happens indirectly, as each new device
> causes a notifier chain to be called --> If I got it right, then a
> userspace tool triggers the "open".
> In that case the open would run in another task/thread, and thus when
> the kernel preempts the task/thread, the probe function would continue
> and free the lock.
>
> Let's assume that it is actually possible that "open" is called in the
> same context as "probe". Wouldn't that mean that we would necessarily hit
> a deadlock (probe holds the lock all the time)? We have never observed a
> deadlock so far.

That's the brilliant bit about lockdep, it can observe potential
deadlocks without ever hitting them :-)

> Is there a way to find out if all these locks are actually taken in the
> same context (kthread, tasklet...)?

They don't need to happen in the same context, suppose a kthread (1)
does the probe and some user task (2) does the open:

    1 - probe                    2 - open

    lock(ehea_fw_handles.lock)

                                 lock(port->port_lock)

    lock(port->port_lock) <-- waiting for 2

                                 lock(ehea_fw_handles.lock) <-- waiting for 1


Which is the classic AB-BA deadlock scenario.
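The interleaving in the diagram can be replayed deterministically instead of with real threads, which would simply hang. In this hypothetical sketch (all names are illustrative and, like the diagram, it leaves rtnl_mutex out), task 0 is probe and task 1 is open, each lock has at most one owner, a task blocks when the lock it wants is owned by the other task, and a task drops all of its locks once it has taken its last one.

```c
#include <assert.h>
#include <stdbool.h>

enum { FW_HANDLES_LOCK, PORT_LOCK, NR_LOCKS };
enum { NO_OWNER = -1 };

/* A task is just the sequence of locks it acquires (nested, in order). */
struct task {
	const int *script;
	int len;
	int pc;   /* index of the next lock to acquire */
};

/*
 * Let task t try to take its next lock. Returns false if the lock is owned
 * by the other task (t must wait). Once t has taken its last lock, its
 * critical section is over and it releases everything it owns.
 */
static bool step(struct task *tasks, int t, int *owner)
{
	struct task *tk = &tasks[t];

	if (tk->pc == tk->len)
		return true;                  /* already finished */

	int lock = tk->script[tk->pc];
	assert(lock >= 0 && lock < NR_LOCKS);
	if (owner[lock] != NO_OWNER && owner[lock] != t)
		return false;                 /* blocked on the other task */

	owner[lock] = t;
	if (++tk->pc == tk->len)
		for (int l = 0; l < NR_LOCKS; l++)
			if (owner[l] == t)
				owner[l] = NO_OWNER;
	return true;
}

/* Replay a schedule of task ids; true if both tasks end up blocked. */
static bool replay(const int *schedule, int n)
{
	static const int probe_script[] = { FW_HANDLES_LOCK, PORT_LOCK };
	static const int open_script[] = { PORT_LOCK, FW_HANDLES_LOCK };
	struct task tasks[2] = {
		{ probe_script, 2, 0 },   /* task 0: probe */
		{ open_script, 2, 0 },    /* task 1: open */
	};
	int owner[NR_LOCKS] = { NO_OWNER, NO_OWNER };
	bool blocked[2] = { false, false };

	for (int i = 0; i < n; i++) {
		int t = schedule[i];
		assert(t == 0 || t == 1);
		blocked[t] = !step(tasks, t, owner);
		if (blocked[0] && blocked[1])
			return true;      /* each waits for a lock the other holds */
	}
	return false;
}
```

Replaying the schedule {0, 1, 0, 1} (probe, open, probe, open, as in the diagram) reports a deadlock, while the serialised schedule {0, 0, 1, 1} completes. Only one narrow ordering goes wrong, which is why it is rarely hit in practice.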

Hitting it will be very unlikely, as this probe thing is a very rare
event, but that doesn't mean it cannot happen.

Now, if you can guarantee that the probe and open port object are
_never_ the same one, then we can say this is a false positive and work
on teaching lockdep about that.

> >> - However, ehea_fw_handles.lock is freed once all netdevices are registered.
> >> - When the second netdevice is registered in "driver_probe_device", it
> >>   will also try to get the port->port_lock (which in fact is a different
> >>   one, as there is one per netdevice).
> >> - Does the mutex debug mechanism distinguish between the different
> >>   port->port_lock instances?
> >>     
> >
> > Not unless you tell it to.
> >   
> > Are you really sure the port->port_lock in this AB-BA scenario are never
> > the same? The above explanation didn't convince me (also very hard to
> > read due to funny wrapping).
> >   
> I'm not sure, especially as I just ran the same test with just one port
> and we still get the warning. But having two instances of port accessing
> the locks does not look like a problem to me, as they take and release
> the locks properly (right order).

The initial probe will establish the A->B order, the subsequent open
will attempt B->A at which point lockdep will warn.
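The same point in miniature: orderings are remembered across time, so a probe that runs to completion followed later by an open is enough for a warning, with a single port and no concurrency at all. This is a hypothetical toy tracker, not lockdep's code; it follows the fw_handles.lock/port_lock description used in this thread (the original report shows the probe-side dependency running through rtnl_mutex), keeps a held-lock stack for one task, and checks only direct reverse edges.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* One port, as in the single-port retest: two lock classes in total. */
enum { FW_HANDLES, PORT_LOCK, NR_CLASSES };

static bool after[NR_CLASSES][NR_CLASSES]; /* after[a][b]: b taken while a held */
static int held[NR_CLASSES];               /* the running task's held-lock stack */
static int depth;
static int warnings;

static void reset_tracker(void)
{
	memset(after, 0, sizeof(after));
	depth = 0;
	warnings = 0;
}

/* Acquire: record an edge from every held lock; warn if the reverse was seen. */
static void toy_lock(int cls)
{
	for (int i = 0; i < depth; i++) {
		if (after[cls][held[i]])
			warnings++;   /* held[i] -> cls contradicts an earlier cls -> held[i] */
		after[held[i]][cls] = true;
	}
	assert(depth < NR_CLASSES);
	held[depth++] = cls;
}

/* Release: strictly LIFO to keep the model small. */
static void toy_unlock(int cls)
{
	assert(depth > 0 && held[depth - 1] == cls);
	depth--;
}

/* Probe path as described in the thread: port_lock nested in fw_handles.lock. */
static void probe_path(void)
{
	toy_lock(FW_HANDLES);
	toy_lock(PORT_LOCK);
	toy_unlock(PORT_LOCK);
	toy_unlock(FW_HANDLES);
}

/* Open path: ehea_open() takes port_lock, then ehea_up() takes fw_handles.lock. */
static void open_path(void)
{
	toy_lock(PORT_LOCK);
	toy_lock(FW_HANDLES);
	toy_unlock(FW_HANDLES);
	toy_unlock(PORT_LOCK);
}

/* Probe finishes and releases everything before open even starts. */
static int sequential_probe_then_open(void)
{
	reset_tracker();
	probe_path();
	open_path();
	return warnings;
}

/* Repeated opens alone always use the same order and never warn. */
static int open_only(void)
{
	reset_tracker();
	open_path();
	open_path();
	return warnings;
}
```

sequential_probe_then_open() returns 1 and open_only() returns 0: the warning comes from the probe having recorded fw_handles.lock -> port_lock earlier, not from the two paths overlapping in time.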
