* deadlock in 2.6.18.2 related to bridging?
From: Ben Greear @ 2007-02-14  1:23 UTC
  To: NetDev; +Cc: Stephen Hemminger

I think I may have found a deadlock bug in 2.6.18.2.  This is
with my hacked kernel, but my binary module has not been loaded.

I have several bridges configured, including some containing
my redirect-device virtual devices and ethernet devices.

I believe the deadlock is this:

The work-queue process is calling this, and is blocked on
rtnl:

  [<c0337ede>] __mutex_lock_slowpath+0xbe/0x2a0
  [<c03380dc>] mutex_lock+0x1c/0x20
  [<c02dd1db>] __rtnl_lock+0x1b/0x40
  [<df909dc2>] port_carrier_check+0x22/0xa0 [bridge]
  [<c012d21b>] run_workqueue+0x7b/0x100
  [<c012d9cf>] worker_thread+0x10f/0x130
  [<c01304b5>] kthread+0xd5/0xe0
  [<c0101005>] kernel_thread_helper+0x5/0x10


But the 'ip' program already holds rtnl (acquired in devinet_ioctl)
and is trying to flush the work queue:

ip            D D9C34000  6600  2780   2775                     (NOTLB)
        d9c35e1c 00000046 deeebae8 d9c34000 c010327f 00000001 d9c34000 00000260
        deeeba80 00000001 d9c542b0 e548f009 0000001a 00020224 d9c543c0 0000007b
        0000007b 00335517 00000000 deeeba80 deeebae8 00000053 d9c35e44 c012d30b
Call Trace:
  [<c012d30b>] flush_cpu_workqueue+0x6b/0xb0
  [<c012d388>] flush_workqueue+0x38/0x50
  [<c012d3fd>] flush_scheduled_work+0xd/0x10
  [<df819665>] rtl8139_close+0x165/0x1a0 [8139too]
  [<c02d4bd4>] dev_close+0x54/0x70
  [<c02d3e31>] dev_change_flags+0x51/0x110
  [<c0314e90>] devinet_ioctl+0x4b0/0x6a0
  [<c031579b>] inet_ioctl+0x6b/0x80
  [<c02c9627>] sock_ioctl+0x77/0x250
  [<c017e1f8>] do_ioctl+0x28/0x80
  [<c017e2a7>] vfs_ioctl+0x57/0x2b0
  [<c017e539>] sys_ioctl+0x39/0x60
  [<c01031ad>] sysenter_past_esp+0x56/0x99
  [<b7fd5410>] 0xb7fd5410
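The cycle described above can be reproduced in userspace. The sketch below is an analogy under stated assumptions, not kernel code: a pthread mutex named rtnl stands in for the RTNL mutex, one worker thread stands in for the shared workqueue running port_carrier_check(), and flush_with_timeout() stands in for flush_scheduled_work(), with a timeout added so the demo reports the stall instead of hanging. All names are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical userspace stand-ins, not kernel APIs. */
static pthread_mutex_t rtnl = PTHREAD_MUTEX_INITIALIZER;   /* plays RTNL */
static pthread_mutex_t q_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t q_cond = PTHREAD_COND_INITIALIZER;
static bool work_done;

/* Plays port_carrier_check(): a queued work item that needs rtnl. */
static void *carrier_check(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&rtnl);   /* sleeps while the "ip" path holds rtnl */
    pthread_mutex_unlock(&rtnl);

    pthread_mutex_lock(&q_lock);
    work_done = true;
    pthread_cond_broadcast(&q_cond);
    pthread_mutex_unlock(&q_lock);
    return NULL;
}

/* Plays flush_scheduled_work(), but gives up after timeout_ms so the
 * demo reports the stall instead of hanging forever.
 * Returns true if the work item completed. */
static bool flush_with_timeout(int timeout_ms)
{
    struct timespec deadline;
    clock_gettime(CLOCK_REALTIME, &deadline);
    deadline.tv_sec += timeout_ms / 1000;
    deadline.tv_nsec += (long)(timeout_ms % 1000) * 1000000L;
    if (deadline.tv_nsec >= 1000000000L) {
        deadline.tv_sec += 1;
        deadline.tv_nsec -= 1000000000L;
    }

    pthread_mutex_lock(&q_lock);
    int rc = 0;
    while (!work_done && rc != ETIMEDOUT)
        rc = pthread_cond_timedwait(&q_cond, &q_lock, &deadline);
    bool done = work_done;
    pthread_mutex_unlock(&q_lock);
    return done;
}

/* Plays the ioctl path: optionally take rtnl (as devinet_ioctl does),
 * then flush (as rtl8139_close does). Returns 1 if the flush stalled. */
int flush_stalls(bool hold_rtnl)
{
    pthread_t t;
    work_done = false;

    if (hold_rtnl)
        pthread_mutex_lock(&rtnl);
    int rc = pthread_create(&t, NULL, carrier_check, NULL);
    assert(rc == 0);
    (void)rc;
    bool done = flush_with_timeout(1000);
    if (hold_rtnl)
        pthread_mutex_unlock(&rtnl);   /* in the held case, only the timeout gets us here */

    pthread_join(t, NULL);
    return done ? 0 : 1;
}
```

flush_stalls(false) returns 0 because the worker can take rtnl and finish. flush_stalls(true) returns 1 after the timeout: the worker cannot run until the flusher drops rtnl, and the flusher will not drop rtnl until the worker has run.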


Has this been fixed in later releases?

Thanks,
Ben


-- 
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc  http://www.candelatech.com



* Re: deadlock in 2.6.18.2 related to bridging?
From: Stephen Hemminger @ 2007-02-14 21:12 UTC
  To: Ben Greear; +Cc: NetDev

On Tue, 13 Feb 2007 17:23:05 -0800
Ben Greear <greearb@candelatech.com> wrote:

> The work-queue process is calling this, and is blocked on
> rtnl:
> 
>   [<c0337ede>] __mutex_lock_slowpath+0xbe/0x2a0
>   [<c03380dc>] mutex_lock+0x1c/0x20
>   [<c02dd1db>] __rtnl_lock+0x1b/0x40
>   [<df909dc2>] port_carrier_check+0x22/0xa0 [bridge]
>   [<c012d21b>] run_workqueue+0x7b/0x100
>   [<c012d9cf>] worker_thread+0x10f/0x130
>   [<c01304b5>] kthread+0xd5/0xe0
>   [<c0101005>] kernel_thread_helper+0x5/0x10

It is waiting for the current rtnl holder to finish (in this case the ioctl).

> 
> But, the 'ip' program already has rtnl (acquired in devinet_ioctl),
> and is trying to flush the work-queue:
> 
> ip            D D9C34000  6600  2780   2775                     (NOTLB)
>         d9c35e1c 00000046 deeebae8 d9c34000 c010327f 00000001 d9c34000 00000260
>         deeeba80 00000001 d9c542b0 e548f009 0000001a 00020224 d9c543c0 0000007b
>         0000007b 00335517 00000000 deeeba80 deeebae8 00000053 d9c35e44 c012d30b
> Call Trace:
>   [<c012d30b>] flush_cpu_workqueue+0x6b/0xb0
>   [<c012d388>] flush_workqueue+0x38/0x50
>   [<c012d3fd>] flush_scheduled_work+0xd/0x10
>   [<df819665>] rtl8139_close+0x165/0x1a0 [8139too]
>   [<c02d4bd4>] dev_close+0x54/0x70
>   [<c02d3e31>] dev_change_flags+0x51/0x110
>   [<c0314e90>] devinet_ioctl+0x4b0/0x6a0
>   [<c031579b>] inet_ioctl+0x6b/0x80
>   [<c02c9627>] sock_ioctl+0x77/0x250
>   [<c017e1f8>] do_ioctl+0x28/0x80
>   [<c017e2a7>] vfs_ioctl+0x57/0x2b0
>   [<c017e539>] sys_ioctl+0x39/0x60
>   [<c01031ad>] sysenter_past_esp+0x56/0x99
>   [<b7fd5410>] 0xb7fd5410

The bug is in the 8139too.c driver. It calls flush_scheduled_work()
with the RTNL mutex held, so any other queued work that takes RTNL
will get stuck.
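The diagnosis above suggests one general way to break such a cycle from the worker side: a work item on the shared queue should never sleep on RTNL. The sketch below models this in userspace under stated assumptions: a hypothetical single-threaded queue (queue_work(), flush_queue()) stands in for the shared workqueue, and a pthread mutex stands in for RTNL. The work item uses pthread_mutex_trylock() and re-queues itself when RTNL is busy. In kernel terms the analogue would be rtnl_trylock() plus rescheduling the work; this is a sketch of the pattern, not the fix that was applied to the bridge code or to 8139too.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical single-threaded model of a shared work queue. */
typedef void (*work_fn)(void);

#define QMAX 8
static work_fn queue[QMAX];
static size_t qlen;
static pthread_mutex_t rtnl = PTHREAD_MUTEX_INITIALIZER;  /* plays RTNL */
static int checks_done;

static void queue_work(work_fn fn)
{
    assert(qlen < QMAX);
    queue[qlen++] = fn;
}

/* Work item that never sleeps on rtnl: if rtnl is busy it re-queues
 * itself and returns, so the flush in progress can complete. */
static void carrier_check(void)
{
    if (pthread_mutex_trylock(&rtnl) != 0) {   /* EBUSY: rtnl is held */
        queue_work(carrier_check);
        return;
    }
    checks_done++;                             /* the real check would go here */
    pthread_mutex_unlock(&rtnl);
}

/* Plays a flush: runs only the items queued when it started;
 * anything re-queued during the flush stays pending. */
static void flush_queue(void)
{
    work_fn batch[QMAX];
    size_t n = qlen;
    for (size_t i = 0; i < n; i++)
        batch[i] = queue[i];
    qlen = 0;
    for (size_t i = 0; i < n; i++)
        batch[i]();
}

/* Flush once with rtnl held (as the ioctl path would), then once more
 * after releasing it, and report what happened. */
void flush_with_rtnl_then_retry(size_t *pending_after_first,
                                int *checks_after_second)
{
    qlen = 0;
    checks_done = 0;
    queue_work(carrier_check);

    pthread_mutex_lock(&rtnl);
    flush_queue();                      /* completes: the item deferred itself */
    *pending_after_first = qlen;
    pthread_mutex_unlock(&rtnl);

    flush_queue();                      /* the retry now gets rtnl */
    *checks_after_second = checks_done;
}
```

With rtnl held, the first flush completes and leaves one item pending; after rtnl is dropped, the second flush runs the check once. The trade-off is that the work is only deferred; in a real queue the retry would usually be delayed so the item does not spin while RTNL stays busy.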

> 
> Has this been fixed in later releases?

No, but a different race (with device removal) has been fixed.



-- 
Stephen Hemminger <shemminger@linux-foundation.org>


* Re: deadlock in 2.6.18.2 related to bridging?
From: Ben Greear @ 2007-02-14 21:26 UTC
  To: Stephen Hemminger; +Cc: NetDev

Stephen Hemminger wrote:

> The bug is in the 8139too.c driver. It calls flush_scheduled_work()
> with the RTNL mutex held, so any other queued work that takes RTNL
> will get stuck.

It looks like a fairly common problem, as tg3 has the same issue
(though it seems someone tried to hack around one particular case):

static int tg3_close(struct net_device *dev)
{
         struct tg3 *tp = netdev_priv(dev);

         /* Calling flush_scheduled_work() may deadlock because
          * linkwatch_event() may be on the workqueue and it will try to get
          * the rtnl_lock which we are holding.
          */
         while (tp->tg3_flags & TG3_FLAG_IN_RESET_TASK)
                 msleep(1);

         netif_stop_queue(dev);
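The tg3 workaround waits only for the driver's own reset task rather than draining the entire shared queue, so unrelated work such as linkwatch_event() does not have to finish first. A userspace sketch of that idea, assuming a hypothetical atomic flag in_reset_task in place of TG3_FLAG_IN_RESET_TASK and nanosleep() in place of msleep(1):

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical stand-in for TG3_FLAG_IN_RESET_TASK. */
static atomic_bool in_reset_task;

/* Plays the driver's own reset task running on the shared queue. */
static void *reset_task(void *arg)
{
    (void)arg;
    struct timespec d = { 0, 20 * 1000000L };   /* pretend work: 20 ms */
    nanosleep(&d, NULL);
    atomic_store(&in_reset_task, false);
    return NULL;
}

/* Plays the loop at the top of tg3_close(): wait only for our own task,
 * sleeping 1 ms per poll like msleep(1), instead of flushing the whole
 * queue. Returns the number of polls taken. */
static long wait_for_own_task(void)
{
    const struct timespec one_ms = { 0, 1000000L };
    long polls = 0;
    while (atomic_load(&in_reset_task)) {
        nanosleep(&one_ms, NULL);
        polls++;
    }
    return polls;
}

/* Hypothetical close path: mark the task running, start it, wait for it. */
long close_like_tg3(void)
{
    pthread_t t;
    atomic_store(&in_reset_task, true);
    int rc = pthread_create(&t, NULL, reset_task, NULL);
    assert(rc == 0);
    (void)rc;
    long polls = wait_for_own_task();
    pthread_join(t, NULL);
    return polls;
}
```

This only helps as long as the awaited task does not itself need the lock the closing path holds, and it covers a single work item, which matches the remark that only one particular case was worked around.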


e1000 appears clean, at least, but there are a lot of other
drivers that call flush_scheduled_work() (I didn't check whether
they hold rtnl when they do).
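Checking which drivers flush the shared queue while holding rtnl can also be approximated at run time with a debug check in front of the flush. The sketch below is hypothetical throughout: rtnl_lock_sim(), rtnl_unlock_sim() and flush_shared_queue_checked() are illustrative names, and a per-thread flag records whether the calling thread holds the RTNL stand-in.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical RTNL stand-in plus a per-thread "do I hold it?" flag. */
static pthread_mutex_t rtnl = PTHREAD_MUTEX_INITIALIZER;
static _Thread_local bool rtnl_held_here;

static void rtnl_lock_sim(void)
{
    pthread_mutex_lock(&rtnl);
    rtnl_held_here = true;
}

static void rtnl_unlock_sim(void)
{
    rtnl_held_here = false;
    pthread_mutex_unlock(&rtnl);
}

/* Hypothetical debug wrapper around a shared-queue flush: report the
 * rtnl/workqueue deadlock risk instead of blocking. */
static int flush_shared_queue_checked(void)
{
    if (rtnl_held_here)
        return -EDEADLK;
    /* ... the real flush would run here ... */
    return 0;
}

/* Calls the checked flush once with rtnl held and once after release. */
void audit_close_path(int *while_held, int *after_release)
{
    rtnl_lock_sim();
    *while_held = flush_shared_queue_checked();
    rtnl_unlock_sim();
    *after_release = flush_shared_queue_checked();
}
```

audit_close_path() gets -EDEADLK from the call made with the lock held and 0 from the call made after release. The flag is per-thread because only the flusher's own holding of RTNL closes the cycle; if some other thread holds RTNL, the flush merely waits until that thread releases it.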


Thanks,
Ben

-- 
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc  http://www.candelatech.com



* Re: deadlock in 2.6.18.2 related to bridging?
From: Francois Romieu @ 2007-02-14 23:37 UTC
  To: Ben Greear; +Cc: Stephen Hemminger, NetDev

Ben Greear <greearb@candelatech.com> :
[...]
> e1000 appears clean, at least, but there are a lot of other
> drivers that are calling that method (I didn't check to see
> if they might be holding rtnl when called.)

Not that many: only r8169, sis190, s2io and cassini (through change_mtu).

Bad week.

-- 
Ueimor
