* [lock validator] net/ipv4/fib_hash.c: illegal {enabled-softirqs} -> {used-in-softirq} usage?
@ 2006-01-27 0:18 Ingo Molnar
2006-01-27 1:41 ` Herbert Xu
0 siblings, 1 reply; 22+ messages in thread
From: Ingo Molnar @ 2006-01-27 0:18 UTC (permalink / raw)
To: David S. Miller; +Cc: linux-kernel
the lock validator i'm working on found another item:
============================
[ BUG: illegal lock usage! ]
----------------------------
illegal {enabled-softirqs} -> {used-in-softirq} usage.
hackbench/8407 [HC0[0]:SC1[2]:HE1:SE0] takes:
{&state[i].lock} [<c0bf7d08>] wrandom_flush+0x2a/0xb3
{enabled-softirqs} state was registered at:
[<c0befc33>] fn_hash_insert+0x565/0x5c3
hardirqs last enabled at: [<c0d29a36>] _spin_unlock_irq+0xd/0x10
softirqs last enabled at: [<c012d0a0>] irq_exit+0x36/0x38
other info that might help us debug this:
locks held by hackbench/8407: <none>
stack backtrace:
[<c0103e86>] show_trace+0xd/0xf
[<c0103e9d>] dump_stack+0x15/0x17
[<c013eb40>] print_usage_bug+0x16d/0x177
[<c013f130>] mark_lock+0xe3/0x248
[<c013f65d>] debug_lock_chain+0x3c8/0xb1d
[<c013fde3>] debug_lock_chain_spin+0x31/0x48
[<c0410c4e>] _raw_spin_lock+0x34/0x7f
[<c0d298fd>] _spin_lock+0x8/0xa
[<c0bf7d08>] wrandom_flush+0x2a/0xb3
[<c0bc3738>] rt_cache_flush+0x3f/0xd8
[<c0bc3843>] rt_secret_rebuild+0x11/0x26
[<c0131165>] run_timer_softirq+0x143/0x19e
[<c012d384>] __do_softirq+0x84/0xff
[<c0104d76>] do_softirq+0x52/0xbb
=======================
[<c012d0a0>] irq_exit+0x36/0x38
[<c0118809>] smp_apic_timer_interrupt+0x4e/0x51
[<c010376f>] apic_timer_interrupt+0x27/0x2c
[<c01670b2>] kmem_cache_alloc+0x28/0xa6
[<c0b80be6>] __alloc_skb+0x25/0xee
[<c0b7f144>] sock_alloc_send_skb+0x5d/0x192
[<c0c29bdf>] unix_stream_sendmsg+0x131/0x33f
[<c0b7e30a>] do_sock_write+0xbd/0xc6
[<c0b7e441>] sock_aio_write+0x56/0x64
[<c016a9a4>] do_sync_write+0xb1/0xe6
[<c016adf0>] vfs_write+0xbd/0x155
[<c016b657>] sys_write+0x3b/0x60
[<c0102c25>] syscall_call+0x7/0xb
the culprit seems to be:
(gdb) list *0xc0befc33
0xc0befc33 is in fn_hash_insert (net/ipv4/fib_hash.c:527).
522 list_add_tail(&new_fa->fa_list,
523 (fa ? &fa->fa_list : &f->fn_alias));
524 fib_hash_genid++;
525 write_unlock_bh(&fib_hash_lock);
526
527 if (new_f)
528 fz->fz_nent++;
529 rt_cache_flush(-1);
530
531 rtmsg_fib(RTM_NEWROUTE, key, new_fa, z, tb->tb_id, n, req);
(gdb)
which enabled softirqs with {&state[i].lock} still held. If the
rt_secret_rebuild() would hit that codepath at that point then it could
cause a deadlock, right?
Ingo
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [lock validator] net/ipv4/fib_hash.c: illegal {enabled-softirqs} -> {used-in-softirq} usage?
2006-01-27 0:18 [lock validator] net/ipv4/fib_hash.c: illegal {enabled-softirqs} -> {used-in-softirq} usage? Ingo Molnar
@ 2006-01-27 1:41 ` Herbert Xu
2006-01-28 15:22 ` [lock validator] inet6_destroy_sock(): soft-safe -> soft-unsafe lock dependency Ingo Molnar
0 siblings, 1 reply; 22+ messages in thread
From: Herbert Xu @ 2006-01-27 1:41 UTC (permalink / raw)
To: Ingo Molnar; +Cc: davem, linux-kernel
Ingo Molnar <mingo@elte.hu> wrote:
> the lock validator i'm working on found another item:
>
> ============================
> [ BUG: illegal lock usage! ]
> ----------------------------
> illegal {enabled-softirqs} -> {used-in-softirq} usage.
> hackbench/8407 [HC0[0]:SC1[2]:HE1:SE0] takes:
> {&state[i].lock} [<c0bf7d08>] wrandom_flush+0x2a/0xb3
> {enabled-softirqs} state was registered at:
> [<c0befc33>] fn_hash_insert+0x565/0x5c3
> hardirqs last enabled at: [<c0d29a36>] _spin_unlock_irq+0xd/0x10
> softirqs last enabled at: [<c012d0a0>] irq_exit+0x36/0x38
>
> other info that might help us debug this:
> locks held by hackbench/8407: <none>
>
> stack backtrace:
> [<c0103e86>] show_trace+0xd/0xf
> [<c0103e9d>] dump_stack+0x15/0x17
> [<c013eb40>] print_usage_bug+0x16d/0x177
> [<c013f130>] mark_lock+0xe3/0x248
> [<c013f65d>] debug_lock_chain+0x3c8/0xb1d
> [<c013fde3>] debug_lock_chain_spin+0x31/0x48
> [<c0410c4e>] _raw_spin_lock+0x34/0x7f
> [<c0d298fd>] _spin_lock+0x8/0xa
> [<c0bf7d08>] wrandom_flush+0x2a/0xb3
Good catch. In fact the other spin lock operation in the same file needs
BH protection as well since it is invoked by the output route path which
can be in user-context for local packets.
Cheers,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
^ permalink raw reply [flat|nested] 22+ messages in thread
* [lock validator] inet6_destroy_sock(): soft-safe -> soft-unsafe lock dependency
2006-01-27 1:41 ` Herbert Xu
@ 2006-01-28 15:22 ` Ingo Molnar
2006-01-28 15:44 ` Ingo Molnar
2006-01-31 10:27 ` Herbert Xu
0 siblings, 2 replies; 22+ messages in thread
From: Ingo Molnar @ 2006-01-28 15:22 UTC (permalink / raw)
To: Herbert Xu; +Cc: davem, linux-kernel
* Herbert Xu <herbert@gondor.apana.org.au> wrote:
> Good catch. In fact the other spin lock operation in the same file
> needs BH protection as well since it is invoked by the output route
> path which can be in user-context for local packets.
ok, cool.
Today i've added rwlocks to the coverage of the validator too, and it
triggers in the networking code straight away - see the output below. Is
this a genuine deadlock-site too? (Take this with a large grain of salt
- while it did find another genuine bug in floppy.c and it passes a
couple of testcases too, the rwlock validation feature is literally just
1 hour old.)
let me know if you need explanation about the output, or if you need
more info.
Ingo
===========================================================
[ BUG: soft-safe -> soft-unsafe lock dependency detected! ]
-----------------------------------------------------------
sshd/2853 [HC0[0]:SC0[1]:HE1:SE0] is trying to acquire:
(&newsk->sk_dst_lock){+-}, at: [<c048f385>] inet6_destroy_sock+0x25/0x100
and this task is already holding:
(&((sk)->sk_lock.slock)){+-}, at: [<c0464832>] tcp_close+0x142/0x650
which would create a new lock dependency:
(&((sk)->sk_lock.slock)){+-} -> (&newsk->sk_dst_lock){+-}
but this new dependency connects a softirq-safe lock:
(&((sk)->sk_lock.slock)){+-}, at: [<c0464832>] tcp_close+0x142/0x650
... which became softirq-safe at:
... [<c0102e27>] sysenter_past_esp+0x84/0x8d
to a softirq-unsafe lock:
(&newsk->sk_dst_lock){+-}, at: [<c048f385>] inet6_destroy_sock+0x25/0x100
... which became softirq-unsafe at:
... [<00000000>] 0x0
which could lead to deadlocks!
other info that might help us debug this:
1 locks held by sshd/2853:
#0: (&((sk)->sk_lock.slock)){+-}, at: [<c0464832>] tcp_close+0x142/0x650
the softirq-safe lock's dependencies:
-> (&((sk)->sk_lock.slock)){+-} {
used at: [<c0438429>] lock_sock+0x29/0xd0
in-softirq at: [<c047081b>] tcp_delack_timer+0x1b/0x200
hardirq-on at: [<c0102e27>] sysenter_past_esp+0x84/0x8d
}
... key at: [<c0438375>] sock_lock_init+0x15/0x30
... acquired at: [<c0464832>] tcp_close+0x142/0x650
-> (&sk->sk_callback_lock){+-} {
used at: [<c048c8b3>] unix_release_sock+0x43/0x2d0
hardirq-on at: [<c0162738>] kmem_cache_free+0x78/0xb0
}
... key at: [<c043963f>] sock_init_data+0x10f/0x1d0
... acquired at: [<c046484e>] tcp_close+0x15e/0x650
-> (cpa_lock){++} {
used at: [<c011474c>] change_page_attr+0x1c/0x2a0
in-hardirq at: [<c011474c>] change_page_attr+0x1c/0x2a0
in-softirq at: [<c011474c>] change_page_attr+0x1c/0x2a0
}
... key at: [<c0581950>] cpa_lock+0x10/0x40
... acquired at: [<c011474c>] change_page_attr+0x1c/0x2a0
-> (&dev->queue_lock){+-} {
used at: [<c044d44d>] qdisc_lock_tree+0x1d/0x20
in-softirq at: [<c0441ef4>] dev_queue_xmit+0x64/0x290
hardirq-on at: [<c0102ea7>] restore_nocheck+0x8/0xb
}
... key at: [<c044288b>] register_netdevice+0x4b/0x3a0
... acquired at: [<c0441ef4>] dev_queue_xmit+0x64/0x290
-> (&dev->xmit_lock){+-} {
used at: [<c0443e6c>] dev_mc_upload+0x1c/0x40
in-softirq at: [<c044d6bb>] dev_watchdog+0x1b/0xc0
hardirq-on at: [<c04d40fa>] __mutex_lock_slowpath+0x27a/0x4e0
}
... key at: [<c044289d>] register_netdevice+0x5d/0x3a0
... acquired at: [<c044e019>] dev_activate+0x69/0x110
-> (&tp->lock){++} {
used at: [<c0338c8a>] rtl8139_interrupt+0x2a/0x5e0
in-hardirq at: [<c0338c8a>] rtl8139_interrupt+0x2a/0x5e0
in-softirq at: [<c033839a>] rtl8139_start_xmit+0x7a/0x1a0
}
... key at: [<c03376bb>] rtl8139_init_one+0x2ab/0x960
... acquired at: [<c0337d92>] rtl8139_set_rx_mode+0x22/0x180
-> (&base->t_base.lock){++} {
used at: [<c01269f3>] lock_timer_base+0x23/0x50
in-hardirq at: [<c01269f3>] lock_timer_base+0x23/0x50
in-softirq at: [<c0127612>] run_timer_softirq+0x42/0x1f0
}
... key at: [<c1fdaef0>] 0xc1fdaef0
... acquired at: [<c01269f3>] lock_timer_base+0x23/0x50
-> (modlist_lock){++} {
used at: [<c0140289>] module_text_address+0x19/0x40
in-hardirq at: [<c0140289>] module_text_address+0x19/0x40
in-softirq at: [<c0140289>] module_text_address+0x19/0x40
}
... key at: [<c05842f0>] modlist_lock+0x10/0x40
... acquired at: [<c0140289>] module_text_address+0x19/0x40
-> (cpa_lock){++} {
used at: [<c011474c>] change_page_attr+0x1c/0x2a0
in-hardirq at: [<c011474c>] change_page_attr+0x1c/0x2a0
in-softirq at: [<c011474c>] change_page_attr+0x1c/0x2a0
}
... key at: [<c0581950>] cpa_lock+0x10/0x40
... acquired at: [<c011474c>] change_page_attr+0x1c/0x2a0
-> (&tcp_hashinfo.ehash[i].lock){+-} {
used at: [<c0460dbd>] inet_hash_connect+0x3dd/0x480
in-softirq at: [<c0460eed>] __inet_twsk_hashdance+0x8d/0x130
hardirq-on at: [<c0161b26>] kmem_cache_alloc+0x96/0xd0
}
... key at: [<c0696b8d>] tcp_init+0x2ad/0x300
... acquired at: [<c0472fa4>] tcp_v4_syn_recv_sock+0x184/0x340
-> (&tcp_hashinfo.bhash[i].lock){+-} {
used at: [<c046275a>] inet_csk_get_port+0xea/0x240
in-softirq at: [<c0460eaa>] __inet_twsk_hashdance+0x4a/0x130
hardirq-on at: [<c0102e27>] sysenter_past_esp+0x84/0x8d
}
... key at: [<c0696b44>] tcp_init+0x264/0x300
... acquired at: [<c0472ffe>] tcp_v4_syn_recv_sock+0x1de/0x340
-> (&parent->list_lock){++} {
used at: [<c0161bca>] cache_alloc_refill+0x6a/0x730
in-hardirq at: [<c016362b>] do_drain+0x2b/0x60
in-softirq at: [<c0162a46>] free_block+0x136/0x150
}
... key at: [<c01610ba>] kmem_list3_init+0x3a/0x50
... acquired at: [<c0161bca>] cache_alloc_refill+0x6a/0x730
-> (&cachep->spinlock){+-} {
used at: [<c0161ea2>] cache_alloc_refill+0x342/0x730
in-softirq at: [<c0161ea2>] cache_alloc_refill+0x342/0x730
}
... key at: [<c0163c71>] kmem_cache_create+0x2c1/0x780
... acquired at: [<c0161ea2>] cache_alloc_refill+0x342/0x730
-> (&parent->list_lock){++} {
used at: [<c0161bca>] cache_alloc_refill+0x6a/0x730
in-hardirq at: [<c016362b>] do_drain+0x2b/0x60
in-softirq at: [<c0162a46>] free_block+0x136/0x150
}
... key at: [<c01610ba>] kmem_list3_init+0x3a/0x50
... acquired at: [<c01636a1>] __cache_shrink+0x41/0x140
-> (cpa_lock){++} {
used at: [<c011474c>] change_page_attr+0x1c/0x2a0
in-hardirq at: [<c011474c>] change_page_attr+0x1c/0x2a0
in-softirq at: [<c011474c>] change_page_attr+0x1c/0x2a0
}
... key at: [<c0581950>] cpa_lock+0x10/0x40
... acquired at: [<c011474c>] change_page_attr+0x1c/0x2a0
-> (cpa_lock){++} {
used at: [<c011474c>] change_page_attr+0x1c/0x2a0
in-hardirq at: [<c011474c>] change_page_attr+0x1c/0x2a0
in-softirq at: [<c011474c>] change_page_attr+0x1c/0x2a0
}
... key at: [<c0581950>] cpa_lock+0x10/0x40
... acquired at: [<c011474c>] change_page_attr+0x1c/0x2a0
-> (&tcp_hashinfo.ehash[i].lock){+-} {
used at: [<c0460dbd>] inet_hash_connect+0x3dd/0x480
in-softirq at: [<c0460eed>] __inet_twsk_hashdance+0x8d/0x130
hardirq-on at: [<c0161b26>] kmem_cache_alloc+0x96/0xd0
}
... key at: [<c0696b8d>] tcp_init+0x2ad/0x300
... acquired at: [<c0460dbd>] inet_hash_connect+0x3dd/0x480
-> (&queue->syn_wait_lock){+-} {
used at: [<c043a7bc>] reqsk_queue_alloc+0x8c/0xd0
in-softirq at: [<c0461e88>] inet_csk_reqsk_queue_hash_add+0xd8/0x110
hardirq-on at: [<c04d5105>] _spin_unlock_irqrestore+0x25/0x30
}
... key at: [<c043a7a2>] reqsk_queue_alloc+0x72/0xd0
... acquired at: [<c0475bf2>] tcp_check_req+0x2c2/0x3f0
-> (&base->t_base.lock){++} {
used at: [<c01269f3>] lock_timer_base+0x23/0x50
in-hardirq at: [<c01269f3>] lock_timer_base+0x23/0x50
in-softirq at: [<c0127612>] run_timer_softirq+0x42/0x1f0
}
... key at: [<c1fd2ef0>] 0xc1fd2ef0
... acquired at: [<c01269f3>] lock_timer_base+0x23/0x50
-> (modlist_lock){++} {
used at: [<c0140289>] module_text_address+0x19/0x40
in-hardirq at: [<c0140289>] module_text_address+0x19/0x40
in-softirq at: [<c0140289>] module_text_address+0x19/0x40
}
... key at: [<c05842f0>] modlist_lock+0x10/0x40
... acquired at: [<c0140289>] module_text_address+0x19/0x40
-> (&q->lock){++} {
used at: [<c0132050>] add_wait_queue+0x20/0x50
in-hardirq at: [<c0114fab>] __wake_up+0x1b/0x50
in-softirq at: [<c0114fab>] __wake_up+0x1b/0x50
}
... key at: [<c0131e62>] init_waitqueue_head+0x12/0x20
... acquired at: [<c0114fab>] __wake_up+0x1b/0x50
-> (&rq->lock){++} {
used at: [<c01162a9>] init_idle+0x49/0x70
in-hardirq at: [<c0117fb6>] scheduler_tick+0xe6/0x380
in-softirq at: [<c01175ae>] try_to_wake_up+0x3e/0x410
}
... key at: [<c1fda4f0>] 0xc1fda4f0
... acquired at: [<c01175ae>] try_to_wake_up+0x3e/0x410
-> (&rq->lock){++} {
used at: [<c01162a9>] init_idle+0x49/0x70
in-hardirq at: [<c0117fb6>] scheduler_tick+0xe6/0x380
in-softirq at: [<c0117fe7>] scheduler_tick+0x117/0x380
}
... key at: [<c1fd24f0>] 0xc1fd24f0
... acquired at: [<c01175ae>] try_to_wake_up+0x3e/0x410
-> (&rq->lock){++} {
used at: [<c01162a9>] init_idle+0x49/0x70
in-hardirq at: [<c0117fb6>] scheduler_tick+0xe6/0x380
in-softirq at: [<c01175ae>] try_to_wake_up+0x3e/0x410
}
... key at: [<c1fda4f0>] 0xc1fda4f0
... acquired at: [<c0114e6c>] double_rq_lock+0x3c/0x50
-> (&parent->list_lock){++} {
used at: [<c0161bca>] cache_alloc_refill+0x6a/0x730
in-hardirq at: [<c016362b>] do_drain+0x2b/0x60
in-softirq at: [<c0162a46>] free_block+0x136/0x150
}
... key at: [<c01610ba>] kmem_list3_init+0x3a/0x50
... acquired at: [<c0161bca>] cache_alloc_refill+0x6a/0x730
-> (&cachep->spinlock){+-} {
used at: [<c0161ea2>] cache_alloc_refill+0x342/0x730
in-softirq at: [<c0161ea2>] cache_alloc_refill+0x342/0x730
}
... key at: [<c0163c71>] kmem_cache_create+0x2c1/0x780
... acquired at: [<c0161ea2>] cache_alloc_refill+0x342/0x730
-> (&parent->list_lock){++} {
used at: [<c0161bca>] cache_alloc_refill+0x6a/0x730
in-hardirq at: [<c016362b>] do_drain+0x2b/0x60
in-softirq at: [<c0162a46>] free_block+0x136/0x150
}
... key at: [<c01610ba>] kmem_list3_init+0x3a/0x50
... acquired at: [<c01636a1>] __cache_shrink+0x41/0x140
-> (cpa_lock){++} {
used at: [<c011474c>] change_page_attr+0x1c/0x2a0
in-hardirq at: [<c011474c>] change_page_attr+0x1c/0x2a0
in-softirq at: [<c011474c>] change_page_attr+0x1c/0x2a0
}
... key at: [<c0581950>] cpa_lock+0x10/0x40
... acquired at: [<c011474c>] change_page_attr+0x1c/0x2a0
-> (&newsk->sk_callback_lock){+-} {
used at: [<c047f9ad>] inet_accept+0x4d/0xc0
hardirq-on at: [<c0162738>] kmem_cache_free+0x78/0xb0
}
... key at: [<c04399fb>] sk_clone+0xeb/0x1c0
... acquired at: [<c046484e>] tcp_close+0x15e/0x650
the softirq-unsafe lock's dependencies:
-> (&newsk->sk_dst_lock){+-} {
used at: [<c045fda6>] ip_setsockopt+0x526/0xbf0
softirq-on at: [<c04384c1>] lock_sock+0xc1/0xd0
hardirq-on at: [<c0102e27>] sysenter_past_esp+0x84/0x8d
}
... key at: [<c04399e9>] sk_clone+0xd9/0x1c0
... acquired at: [<c048f385>] inet6_destroy_sock+0x25/0x100
stack backtrace:
[<c010432d>] show_trace+0xd/0x10
[<c0104347>] dump_stack+0x17/0x20
[<c01380f0>] check_mask+0x2b0/0x300
[<c013a109>] debug_lock_chain+0xce9/0x1090
[<c013a4ed>] debug_lock_chain_spin+0x3d/0x60
[<c02684f2>] _raw_write_lock+0x32/0x1a0
[<c04d50d8>] _write_lock+0x8/0x10
[<c048f385>] inet6_destroy_sock+0x25/0x100
[<c04aede2>] tcp_v6_destroy_sock+0x12/0x20
[<c046217a>] inet_csk_destroy_sock+0x4a/0x150
[<c0464b7f>] tcp_close+0x48f/0x650
[<c047ffa7>] inet_release+0x37/0x60
[<c048ed10>] inet6_release+0x30/0x40
[<c0436957>] sock_release+0x17/0xa0
[<c0436df1>] sock_close+0x21/0x40
[<c01674e3>] __fput+0xb3/0x170
[<c01675c7>] fput+0x27/0x50
[<c01647d7>] filp_close+0x47/0x80
[<c011e685>] put_files_struct+0x85/0xf0
[<c011f58e>] do_exit+0x14e/0x840
[<c011fcbd>] do_group_exit+0x3d/0xa0
[<c011fd2f>] sys_exit_group+0xf/0x20
[<c0102df7>] sysenter_past_esp+0x54/0x8d
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [lock validator] inet6_destroy_sock(): soft-safe -> soft-unsafe lock dependency
2006-01-28 15:22 ` [lock validator] inet6_destroy_sock(): soft-safe -> soft-unsafe lock dependency Ingo Molnar
@ 2006-01-28 15:44 ` Ingo Molnar
2006-01-31 10:27 ` Herbert Xu
1 sibling, 0 replies; 22+ messages in thread
From: Ingo Molnar @ 2006-01-28 15:44 UTC (permalink / raw)
To: Herbert Xu; +Cc: davem, linux-kernel
* Ingo Molnar <mingo@elte.hu> wrote:
> to a softirq-unsafe lock:
> (&newsk->sk_dst_lock){+-}, at: [<c048f385>] inet6_destroy_sock+0x25/0x100
> ... which became softirq-unsafe at:
> ... [<00000000>] 0x0
fyi, here is where sk_dst_lock became softirq-unsafe:
marked lock as {softirq-on}:
(&sk->sk_dst_lock){--}, at: [<c04b45d3>] ip6_datagram_connect+0x3b3/0x520
softirq was enabled at: c0497738
hardirq was enabled at: c0102e27
(&sk->sk_dst_lock){--}, at: [<c04b45d3>] ip6_datagram_connect+0x3b3/0x520
[<c010432d>] show_trace+0xd/0x10
[<c0104347>] dump_stack+0x17/0x20
[<c0139243>] mark_lock+0x173/0x3a0
[<c01399a5>] debug_lock_chain+0x535/0x1090
[<c013a53d>] debug_lock_chain_spin+0x3d/0x60
[<c0268542>] _raw_write_lock+0x32/0x1a0
[<c04d48c8>] _write_lock+0x8/0x10
[<c04b45d3>] ip6_datagram_connect+0x3b3/0x520
[<c04805c7>] inet_dgram_connect+0x37/0x80
[<c0436f0a>] sys_connect+0x5a/0x80
[<c0437414>] sys_socketcall+0x94/0x260
[<c0102df7>] sysenter_past_esp+0x54/0x8d
Ingo
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [lock validator] inet6_destroy_sock(): soft-safe -> soft-unsafe lock dependency
2006-01-28 15:22 ` [lock validator] inet6_destroy_sock(): soft-safe -> soft-unsafe lock dependency Ingo Molnar
2006-01-28 15:44 ` Ingo Molnar
@ 2006-01-31 10:27 ` Herbert Xu
2006-01-31 10:43 ` David S. Miller
2006-01-31 21:24 ` Ingo Molnar
1 sibling, 2 replies; 22+ messages in thread
From: Herbert Xu @ 2006-01-31 10:27 UTC (permalink / raw)
To: Ingo Molnar; +Cc: davem, linux-kernel
On Sat, Jan 28, 2006 at 04:22:04PM +0100, Ingo Molnar wrote:
>
> ===========================================================
> [ BUG: soft-safe -> soft-unsafe lock dependency detected! ]
> -----------------------------------------------------------
> sshd/2853 [HC0[0]:SC0[1]:HE1:SE0] is trying to acquire:
> (&newsk->sk_dst_lock){+-}, at: [<c048f385>] inet6_destroy_sock+0x25/0x100
>
> and this task is already holding:
> (&((sk)->sk_lock.slock)){+-}, at: [<c0464832>] tcp_close+0x142/0x650
> which would create a new lock dependency:
> (&((sk)->sk_lock.slock)){+-} -> (&newsk->sk_dst_lock){+-}
OK, I believe this is a false positive. However, Dave should
double-check this.
tcp_close is only called from process context. The rule for sk_dst_lock
is that it must also only be obtained in process context. On the other
hand, it is true that sk_lock can be obtained in softirq context.
In this particular case, sk_dst_lock is obtained by tcp_close with
softirqs disabled. This is not a problem in itself since we're not
trying to get sk_dst_lock from a real softirq context (as opposed to
process context with softirq disabled).
I believe this warning comes about because the validator creates a
dependency between sk_lock and sk_dst_lock. It then infers from this
dependency that in softirq contexts where sk_lock is obtained the code
may also attempt to obtain sk_dst_lock.
This inference is where the validator errs. sk_dst_lock is never
(or should never be, and as far as I can see none of the traces show
it to do so) obtained in a real softirq context.
Cheers,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [lock validator] inet6_destroy_sock(): soft-safe -> soft-unsafe lock dependency
2006-01-31 10:27 ` Herbert Xu
@ 2006-01-31 10:43 ` David S. Miller
2006-01-31 11:21 ` Ingo Molnar
2006-02-01 13:32 ` Ingo Molnar
2006-01-31 21:24 ` Ingo Molnar
1 sibling, 2 replies; 22+ messages in thread
From: David S. Miller @ 2006-01-31 10:43 UTC (permalink / raw)
To: herbert; +Cc: mingo, davem, linux-kernel
From: Herbert Xu <herbert@gondor.apana.org.au>
Date: Tue, 31 Jan 2006 21:27:58 +1100
> tcp_close is only called from process context. The rule for sk_dst_lock
> is that it must also only be obtained in process context. On the other
> hand, it is true that sk_lock can be obtained in softirq context.
>
> In this particular case, sk_dst_lock is obtained by tcp_close with
> softirqs disabled. This is not a problem in itself since we're not
> trying to get sk_dst_lock from a real softirq context (as opposed to
> process context with softirq disabled).
>
> I believe this warning comes about because the validator creates a
> dependency between sk_lock and sk_dst_lock. It then infers from this
> dependency that in softirq contexts where sk_lock is obtained the code
> may also attempt to obtain sk_dst_lock.
>
> This inference is where the validator errs. sk_dst_lock is never
> (or should never be, and as far as I can see none of the traces show
> it to do so) obtained in a real softirq context.
Herbert's analysis is correct. This unique locking strategy
is used by tcp_close() because at this point it knows that
every single reference to this socket in the system is gone
once it takes the socket lock with BH disabled.
And that known invariant is why this is correct, and the locking
validator has no way to figure this out.
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [lock validator] inet6_destroy_sock(): soft-safe -> soft-unsafe lock dependency
2006-01-31 10:43 ` David S. Miller
@ 2006-01-31 11:21 ` Ingo Molnar
2006-02-01 13:32 ` Ingo Molnar
1 sibling, 0 replies; 22+ messages in thread
From: Ingo Molnar @ 2006-01-31 11:21 UTC (permalink / raw)
To: David S. Miller; +Cc: herbert, davem, linux-kernel
* David S. Miller <davem@davemloft.net> wrote:
> From: Herbert Xu <herbert@gondor.apana.org.au>
> Date: Tue, 31 Jan 2006 21:27:58 +1100
>
> > tcp_close is only called from process context. The rule for sk_dst_lock
> > is that it must also only be obtained in process context. On the other
> > hand, it is true that sk_lock can be obtained in softirq context.
> >
> > In this particular case, sk_dst_lock is obtained by tcp_close with
> > softirqs disabled. This is not a problem in itself since we're not
> > trying to get sk_dst_lock from a real softirq context (as opposed to
> > process context with softirq disabled).
> >
> > I believe this warning comes about because the validator creates a
> > dependency between sk_lock and sk_dst_lock. It then infers from this
> > dependency that in softirq contexts where sk_lock is obtained the code
> > may also attempt to obtain sk_dst_lock.
> >
> > This inference is where the validator errs. sk_dst_lock is never
> > (or should never be, and as far as I can see none of the traces show
> > it to do so) obtained in a real softirq context.
>
> Herbert's analysis is correct. This unique locking strategy is used
> by tcp_close() because at this point it knows that every single
> reference to this socket in the system is gone once it takes the
> socket lock with BH disabled.
>
> And that known invariant is why this is correct, and the locking
> validator has no way to figure this out.
ok, thanks for the analysis! I'll fix this with an explicit hint to the
validator.
Ingo
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [lock validator] inet6_destroy_sock(): soft-safe -> soft-unsafe lock dependency
2006-01-31 10:27 ` Herbert Xu
2006-01-31 10:43 ` David S. Miller
@ 2006-01-31 21:24 ` Ingo Molnar
2006-01-31 22:06 ` Herbert Xu
2006-02-01 10:42 ` Herbert Xu
1 sibling, 2 replies; 22+ messages in thread
From: Ingo Molnar @ 2006-01-31 21:24 UTC (permalink / raw)
To: Herbert Xu; +Cc: davem, linux-kernel
* Herbert Xu <herbert@gondor.apana.org.au> wrote:
> This inference is where the validator errs. sk_dst_lock is never (or
> should never be, and as far as I can see none of the traces show it to
> do so) obtained in a real softirq context.
just to make sure - is the trace below a safe use of sk_dst_lock too?
Here sk_dst_lock seems to be taken in real softirq context.
Ingo
============================
[ BUG: illegal lock usage! ]
----------------------------
illegal {softirq-on} -> {in-softirq} usage.
sshd/2476 [HC0[0]:SC1[2]:HE1:SE0] takes:
(&sk->sk_dst_lock){-+}, at: [<c0499015>] inet6_destroy_sock+0x25/0x100
{softirq-on} state was registered at:
[<c04a1bd8>] ipv6_dev_get_saddr+0x138/0x640
hardirqs last enabled at: [<c04dea45>] _spin_unlock_irqrestore+0x25/0x30
softirqs last enabled at: [<c01231e5>] irq_exit+0x45/0x50
other info that might help us debug this:
3 locks held by sshd/2476:
#0: (&p->proc_lock){--}, at: [<c011f1a3>] release_task+0x23/0x150
#1: (&tp->rx_lock){-+}, at: [<c0342a15>] rtl8139_poll+0x45/0x4c0
#2: (&sk->sk_lock.slock/1){-+}, at: [<c047e8f6>] tcp_v4_rcv+0x726/0x9d0
stack backtrace:
[<c010437d>] show_trace+0xd/0x10
[<c0104397>] dump_stack+0x17/0x20
[<c0139538>] print_usage_bug+0x1d8/0x230
[<c01398a8>] mark_lock+0x318/0x350
[<c0139d73>] debug_lock_chain+0x493/0x1090
[<c013a9ad>] debug_lock_chain_spin+0x3d/0x60
[<c0269272>] _raw_write_lock+0x32/0x1a0
[<c04de9e8>] _write_lock+0x8/0x10
[<c0499015>] inet6_destroy_sock+0x25/0x100
[<c04b8672>] tcp_v6_destroy_sock+0x12/0x20
[<c046bbda>] inet_csk_destroy_sock+0x4a/0x150
[<c047625c>] tcp_rcv_state_process+0xd4c/0xdd0
[<c047d8e9>] tcp_v4_do_rcv+0xa9/0x340
[<c047eabb>] tcp_v4_rcv+0x8eb/0x9d0
[<c0462c76>] ip_local_deliver+0xa6/0x190
[<c04629f8>] ip_rcv+0x2f8/0x4d0
[<c044bcb6>] netif_receive_skb+0x1b6/0x2a0
[<c0342d3a>] rtl8139_poll+0x36a/0x4c0
[<c044a682>] net_rx_action+0xd2/0x1f0
[<c0123527>] __do_softirq+0x97/0x130
[<c01054d9>] do_softirq+0x69/0x100
=======================
[<c01231e5>] irq_exit+0x45/0x50
[<c01055c4>] do_IRQ+0x54/0x70
[<c01038a9>] common_interrupt+0x25/0x2c
[<c0120990>] do_wait+0x7d0/0xad0
[<c0120cc2>] sys_wait4+0x32/0x40
[<c0120cf5>] sys_waitpid+0x25/0x30
[<c0102e17>] sysenter_past_esp+0x54/0x8d
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [lock validator] inet6_destroy_sock(): soft-safe -> soft-unsafe lock dependency
2006-01-31 21:24 ` Ingo Molnar
@ 2006-01-31 22:06 ` Herbert Xu
2006-02-01 10:42 ` Herbert Xu
1 sibling, 0 replies; 22+ messages in thread
From: Herbert Xu @ 2006-01-31 22:06 UTC (permalink / raw)
To: Ingo Molnar; +Cc: davem, linux-kernel
On Tue, Jan 31, 2006 at 10:24:32PM +0100, Ingo Molnar wrote:
>
> just to make sure - is the trace below a safe use of sk_dst_lock too?
> Here sk_dst_lock seems to be taken in real softirq context.
You're right this does look suspicious. Let me take a closer look...
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [lock validator] inet6_destroy_sock(): soft-safe -> soft-unsafe lock dependency
2006-01-31 21:24 ` Ingo Molnar
2006-01-31 22:06 ` Herbert Xu
@ 2006-02-01 10:42 ` Herbert Xu
2006-02-01 11:13 ` Ingo Molnar
2006-02-03 1:01 ` David S. Miller
1 sibling, 2 replies; 22+ messages in thread
From: Herbert Xu @ 2006-02-01 10:42 UTC (permalink / raw)
To: Ingo Molnar; +Cc: davem, linux-kernel, netdev, YOSHIFUJI Hideaki
[-- Attachment #1: Type: text/plain, Size: 1072 bytes --]
On Tue, Jan 31, 2006 at 10:24:32PM +0100, Ingo Molnar wrote:
>
> [<c04de9e8>] _write_lock+0x8/0x10
> [<c0499015>] inet6_destroy_sock+0x25/0x100
> [<c04b8672>] tcp_v6_destroy_sock+0x12/0x20
> [<c046bbda>] inet_csk_destroy_sock+0x4a/0x150
> [<c047625c>] tcp_rcv_state_process+0xd4c/0xdd0
> [<c047d8e9>] tcp_v4_do_rcv+0xa9/0x340
> [<c047eabb>] tcp_v4_rcv+0x8eb/0x9d0
OK this is definitely broken. We should never touch the dst lock in
softirq context. Since inet6_destroy_sock may be called from that
context due to the asynchronous nature of sockets, we can't take the
lock there.
In fact this sk_dst_reset is totally redundant since all IPv6 sockets
use inet_sock_destruct as their socket destructor which always cleans
up the dst anyway. So the solution is to simply remove the call.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Cheers,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
[-- Attachment #2: inet6-destroy-sock-should-not-reset-dst --]
[-- Type: text/plain, Size: 369 bytes --]
diff --git a/net/ipv6/af_inet6.c b/net/ipv6/af_inet6.c
--- a/net/ipv6/af_inet6.c
+++ b/net/ipv6/af_inet6.c
@@ -369,12 +369,6 @@ int inet6_destroy_sock(struct sock *sk)
struct sk_buff *skb;
struct ipv6_txoptions *opt;
- /*
- * Release destination entry
- */
-
- sk_dst_reset(sk);
-
/* Release rx options */
if ((skb = xchg(&np->pktoptions, NULL)) != NULL)
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [lock validator] inet6_destroy_sock(): soft-safe -> soft-unsafe lock dependency
2006-02-01 10:42 ` Herbert Xu
@ 2006-02-01 11:13 ` Ingo Molnar
2006-02-03 1:01 ` David S. Miller
1 sibling, 0 replies; 22+ messages in thread
From: Ingo Molnar @ 2006-02-01 11:13 UTC (permalink / raw)
To: Herbert Xu; +Cc: davem, linux-kernel, netdev, YOSHIFUJI Hideaki
* Herbert Xu <herbert@gondor.apana.org.au> wrote:
> On Tue, Jan 31, 2006 at 10:24:32PM +0100, Ingo Molnar wrote:
> >
> > [<c04de9e8>] _write_lock+0x8/0x10
> > [<c0499015>] inet6_destroy_sock+0x25/0x100
> > [<c04b8672>] tcp_v6_destroy_sock+0x12/0x20
> > [<c046bbda>] inet_csk_destroy_sock+0x4a/0x150
> > [<c047625c>] tcp_rcv_state_process+0xd4c/0xdd0
> > [<c047d8e9>] tcp_v4_do_rcv+0xa9/0x340
> > [<c047eabb>] tcp_v4_rcv+0x8eb/0x9d0
>
> OK this is definitely broken. We should never touch the dst lock in
> softirq context. Since inet6_destroy_sock may be called from that
> context due to the asynchronous nature of sockets, we can't take the
> lock there.
>
> In fact this sk_dst_reset is totally redundant since all IPv6 sockets
> use inet_sock_destruct as their socket destructor which always cleans
> up the dst anyway. So the solution is to simply remove the call.
>
> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
cool - i've booted your patch and will have results later today [it's
looking good so far, after 10 minutes of uptime ;)]
btw., this codepath took some time to trigger, and i'm not sure why:
maybe because i dont have any true ipv6 traffic? (In fact i dont have
CONFIG_IPV6 enabled at all in this kernel config - so this codepath must
be an effect of ipv4/ipv6 unification?) I guess i should run the ipv6
testsuite to expose all the important codepaths to the validator? [Would
you have any suggestions for me how to do that quickly & easily?]
another thing: could you add the string 'lock validator' somewhere into
the changelog, so that we can grep for such things later on? One reason
for that is to strenghten my future arguments for mainline inclusion
;-), but there's another argument as well:
the lock validator finds _all_ hidden deadlocks [no matter how obscure,
interdependent or unlikely they are, as long as the affected codepath is
triggered at least once], and thus the resulting bug statistics and
characteristics will be an excellent (and one-time) opportunity to
objectively judge the absolute code quality (and defect rate) of the
Linux kernel, for an important category of hard-to-find bugs. (Maybe the
results can even be extrapolated to other, similarly hard-to-find bug
categories.)
in other words, the lock validator is building a runtime set of formal
"locking requirements", and is automatically proving (and maintaining
the proof) that all locking activity within the kernel meets those
requirements, mathematically. It thus enables us to achieve a hard
ceiling of 100% correctness that we can rarely achieve in complex code
as the kernel. We should minimize the changeset entropy introduced and
preserve this historic information.
Ingo
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [lock validator] inet6_destroy_sock(): soft-safe -> soft-unsafe lock dependency
2006-01-31 10:43 ` David S. Miller
2006-01-31 11:21 ` Ingo Molnar
@ 2006-02-01 13:32 ` Ingo Molnar
2006-02-01 20:26 ` Herbert Xu
1 sibling, 1 reply; 22+ messages in thread
From: Ingo Molnar @ 2006-02-01 13:32 UTC (permalink / raw)
To: David S. Miller; +Cc: herbert, davem, linux-kernel
* David S. Miller <davem@davemloft.net> wrote:
> From: Herbert Xu <herbert@gondor.apana.org.au>
> Date: Tue, 31 Jan 2006 21:27:58 +1100
>
> > tcp_close is only called from process context. The rule for sk_dst_lock
> > is that it must also only be obtained in process context. On the other
> > hand, it is true that sk_lock can be obtained in softirq context.
> >
> > In this particular case, sk_dst_lock is obtained by tcp_close with
> > softirqs disabled. This is not a problem in itself since we're not
> > trying to get sk_dst_lock from a real softirq context (as opposed to
> > process context with softirq disabled).
> >
> > I believe this warning comes about because the validator creates a
> > dependency between sk_lock and sk_dst_lock. It then infers from this
> > dependency that in softirq contexts where sk_lock is obtained the code
> > may also attempt to obtain sk_dst_lock.
> >
> > This inference is where the validator errs. sk_dst_lock is never
> > (or should never be, and as far as I can see none of the traces show
> > it to do so) obtained in a real softirq context.
>
> Herbert's analysis is correct. This unique locking strategy is used
> by tcp_close() because at this point it knows that every single
> reference to this socket in the system is gone once it takes the
> socket lock with BH disabled.
>
> And that known invariant is why this is correct, and the locking
> validator has no way to figure this out.
update: with all of Herbert's fixes i havent gotten these yet - so maybe
the validator was not producing a false positive, but perhaps the
inet6_destroy_sock()->sk_dst_reset() thing was causing the messages?
Ingo
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [lock validator] inet6_destroy_sock(): soft-safe -> soft-unsafe lock dependency
2006-02-01 13:32 ` Ingo Molnar
@ 2006-02-01 20:26 ` Herbert Xu
2006-02-02 7:46 ` Ingo Molnar
0 siblings, 1 reply; 22+ messages in thread
From: Herbert Xu @ 2006-02-01 20:26 UTC (permalink / raw)
To: Ingo Molnar; +Cc: David S. Miller, linux-kernel
On Wed, Feb 01, 2006 at 02:32:19PM +0100, Ingo Molnar wrote:
>
> update: with all of Herbert's fixes i havent gotten these yet - so maybe
> the validator was not producing a false positive, but perhaps the
> inet6_destroy_sock()->sk_dst_reset() thing was causing the messages?
Maybe. But in that case shouldn't the validator show that code-path?
Cheers,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [lock validator] inet6_destroy_sock(): soft-safe -> soft-unsafe lock dependency
2006-02-01 20:26 ` Herbert Xu
@ 2006-02-02 7:46 ` Ingo Molnar
2006-02-02 8:48 ` Herbert Xu
0 siblings, 1 reply; 22+ messages in thread
From: Ingo Molnar @ 2006-02-02 7:46 UTC (permalink / raw)
To: Herbert Xu; +Cc: David S. Miller, linux-kernel
* Herbert Xu <herbert@gondor.apana.org.au> wrote:
> On Wed, Feb 01, 2006 at 02:32:19PM +0100, Ingo Molnar wrote:
> >
> > update: with all of Herbert's fixes i havent gotten these yet - so maybe
> > the validator was not producing a false positive, but perhaps the
> > inet6_destroy_sock()->sk_dst_reset() thing was causing the messages?
>
> Maybe. But in that case shouldn't the validator show that code-path?
yeah, it should have. In any case, things are looking good so far with
your fixes. (Any suggestions wrt. how to trigger as many different
codepaths in the networking code as possible, to increase coverage of
locking scenarios mapped? I've tried LTP so far, and a few ad-hoc tests.
Perhaps there's some IP protocol tester i should try, which is known to
trigger lots of boundary conditions?)
Ingo
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [lock validator] inet6_destroy_sock(): soft-safe -> soft-unsafe lock dependency
2006-02-02 7:46 ` Ingo Molnar
@ 2006-02-02 8:48 ` Herbert Xu
2006-02-02 9:04 ` David S. Miller
2006-02-02 10:54 ` Ingo Molnar
0 siblings, 2 replies; 22+ messages in thread
From: Herbert Xu @ 2006-02-02 8:48 UTC (permalink / raw)
To: Ingo Molnar; +Cc: David S. Miller, linux-kernel
On Thu, Feb 02, 2006 at 08:46:27AM +0100, Ingo Molnar wrote:
>
> yeah, it should have. In any case, things are looking good so far with
> your fixes. (Any suggestions wrt. how to trigger as many different
> codepaths in the networking code as possible, to increase coverage of
> locking scenarios mapped? I've tried LTP so far, and a few ad-hoc tests.
> Perhaps there's some IP protocol tester i should try, which is known to
> trigger lots of boundary conditions?)
Two paths you might want to cover are netfilter and IPsec. Both of these
can be tested by simply adding netfilter rules and IPsec policies/SAs and
then running the same test you were using before.
You might also want to look at multicast but I don't know of any good
tests for that.
Oh and then there are VLANs, bridging, Ethernet bonding, qdisc's, etc.
Again these can be tested by setting up the relevant devices or rules
and running the usual TCP/UDP tests.
Cheers,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [lock validator] inet6_destroy_sock(): soft-safe -> soft-unsafe lock dependency
2006-02-02 8:48 ` Herbert Xu
@ 2006-02-02 9:04 ` David S. Miller
2006-02-02 10:54 ` Ingo Molnar
1 sibling, 0 replies; 22+ messages in thread
From: David S. Miller @ 2006-02-02 9:04 UTC (permalink / raw)
To: herbert; +Cc: mingo, linux-kernel
From: Herbert Xu <herbert@gondor.apana.org.au>
Date: Thu, 2 Feb 2006 19:48:24 +1100
> You might also want to look at multicast but I don't know of any good
> tests for that.
David Stevens at IBM might have some tests.
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [lock validator] inet6_destroy_sock(): soft-safe -> soft-unsafe lock dependency
2006-02-02 8:48 ` Herbert Xu
2006-02-02 9:04 ` David S. Miller
@ 2006-02-02 10:54 ` Ingo Molnar
2006-02-02 11:27 ` Ingo Molnar
2006-02-02 12:17 ` Herbert Xu
1 sibling, 2 replies; 22+ messages in thread
From: Ingo Molnar @ 2006-02-02 10:54 UTC (permalink / raw)
To: Herbert Xu; +Cc: David S. Miller, linux-kernel
hm, i got a new one:
============================================
[ BUG: circular locking deadlock detected! ]
--------------------------------------------
sshd/28997 is trying to acquire lock:
(&sk->sk_lock.slock){-+}, at: [<c0c6be28>] packet_rcv+0xbf/0x34b
but task is already holding lock:
(&dev->xmit_lock){-+}, at: [<c0bb04ec>] qdisc_restart+0x46/0x207
which lock already depends on the new lock,
which could lead to circular deadlocks!
the dependency chain (in reverse order) is:
-> #2 (&dev->xmit_lock){-+}: [<c0bb04ec>] qdisc_restart+0x46/0x207
-> #1 (&dev->queue_lock){-+}: [<c0b98137>] dev_queue_xmit+0xc3/0x21e
-> #0 (&sk->sk_lock.slock){-+}: [<c0c6be28>] packet_rcv+0xbf/0x34b
other info that might help us debug this:
1 locks held by sshd/28997:
#0: (&dev->xmit_lock){-+}, at: [<c0bb04ec>] qdisc_restart+0x46/0x207
stack backtrace:
[<c0103ecd>] show_trace+0xd/0xf
[<c0103ee4>] dump_stack+0x15/0x17
[<c014019e>] print_circular_bug_tail+0x42/0x4b
[<c0141716>] debug_lock_chain+0xb30/0xd58
[<c014196f>] debug_lock_chain_spin+0x31/0x48
[<c0415b8c>] _raw_spin_lock+0x34/0x7f
[<c0d3c12d>] _spin_lock+0x8/0xa
[<c0c6be28>] packet_rcv+0xbf/0x34b
[<c0b98052>] dev_queue_xmit_nit+0xbb/0xdd
[<c0bb05a5>] qdisc_restart+0xff/0x207
[<c0b9814e>] dev_queue_xmit+0xda/0x21e
[<c0bde286>] ip_output+0x237/0x2c1
[<c0bdd9a4>] ip_queue_xmit+0x394/0x417
[<c0beafb7>] tcp_transmit_skb+0x5d4/0x5f6
[<c0beca28>] __tcp_push_pending_frames+0x256/0x305
[<c0be43f1>] tcp_sendmsg+0x8ae/0x975
[<c0bfabd3>] inet_sendmsg+0x39/0x46
[<c0b8fd44>] do_sock_write+0xbd/0xc6
[<c0b8fe7b>] sock_aio_write+0x56/0x64
[<c016c922>] do_sync_write+0xb1/0xe6
[<c016cd71>] vfs_write+0xbd/0x155
[<c016d5db>] sys_write+0x3b/0x60
[<c0102c39>] syscall_call+0x7/0xb
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [lock validator] inet6_destroy_sock(): soft-safe -> soft-unsafe lock dependency
2006-02-02 10:54 ` Ingo Molnar
@ 2006-02-02 11:27 ` Ingo Molnar
2006-02-02 12:19 ` Herbert Xu
2006-02-02 12:17 ` Herbert Xu
1 sibling, 1 reply; 22+ messages in thread
From: Ingo Molnar @ 2006-02-02 11:27 UTC (permalink / raw)
To: Herbert Xu; +Cc: David S. Miller, linux-kernel
* Ingo Molnar <mingo@elte.hu> wrote:
> hm, i got a new one:
>
> ============================================
> [ BUG: circular locking deadlock detected! ]
> --------------------------------------------
> sshd/28997 is trying to acquire lock:
> (&sk->sk_lock.slock){-+}, at: [<c0c6be28>] packet_rcv+0xbf/0x34b
>
> but task is already holding lock:
> (&dev->xmit_lock){-+}, at: [<c0bb04ec>] qdisc_restart+0x46/0x207
>
> which lock already depends on the new lock,
> which could lead to circular deadlocks!
>
> the dependency chain (in reverse order) is:
> -> #2 (&dev->xmit_lock){-+}: [<c0bb04ec>] qdisc_restart+0x46/0x207
> -> #1 (&dev->queue_lock){-+}: [<c0b98137>] dev_queue_xmit+0xc3/0x21e
> -> #0 (&sk->sk_lock.slock){-+}: [<c0c6be28>] packet_rcv+0xbf/0x34b
i think this might be a false positive, caused by my (ill-advised)
change below? [i did the change to clean up an unlock ordering
assymetry, but apparently i thus also introduced the xmit_lock ->
queue_lock dependency.]
Ingo
Index: linux/net/sched/sch_generic.c
===================================================================
--- linux.orig/net/sched/sch_generic.c
+++ linux/net/sched/sch_generic.c
@@ -98,6 +98,8 @@ int qdisc_restart(struct net_device *dev
/* Dequeue packet */
if ((skb = q->dequeue(q)) != NULL) {
unsigned nolock = (dev->features & NETIF_F_LLTX);
+ /* release queue */
+ spin_unlock(&dev->queue_lock);
/*
* When the driver has LLTX set it does its own locking
* in start_xmit. No need to add additional overhead by
@@ -121,9 +123,11 @@ int qdisc_restart(struct net_device *dev
kfree_skb(skb);
if (net_ratelimit())
printk(KERN_DEBUG "Dead loop on netdevice %s, fix it urgently!\n", dev->name);
+ spin_lock(&dev->queue_lock);
return -1;
}
__get_cpu_var(netdev_rx_stat).cpu_collision++;
+ spin_lock(&dev->queue_lock);
goto requeue;
}
/* Remember that the driver is grabbed by us. */
@@ -131,8 +135,6 @@ int qdisc_restart(struct net_device *dev
}
{
- /* And release queue */
- spin_unlock(&dev->queue_lock);
if (!netif_queue_stopped(dev)) {
int ret;
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [lock validator] inet6_destroy_sock(): soft-safe -> soft-unsafe lock dependency
2006-02-02 10:54 ` Ingo Molnar
2006-02-02 11:27 ` Ingo Molnar
@ 2006-02-02 12:17 ` Herbert Xu
2006-02-02 13:54 ` Ingo Molnar
1 sibling, 1 reply; 22+ messages in thread
From: Herbert Xu @ 2006-02-02 12:17 UTC (permalink / raw)
To: Ingo Molnar; +Cc: David S. Miller, linux-kernel
On Thu, Feb 02, 2006 at 11:54:29AM +0100, Ingo Molnar wrote:
>
> hm, i got a new one:
>
> ============================================
> [ BUG: circular locking deadlock detected! ]
> --------------------------------------------
> sshd/28997 is trying to acquire lock:
> (&sk->sk_lock.slock){-+}, at: [<c0c6be28>] packet_rcv+0xbf/0x34b
>
> but task is already holding lock:
> (&dev->xmit_lock){-+}, at: [<c0bb04ec>] qdisc_restart+0x46/0x207
>
> which lock already depends on the new lock,
> which could lead to circular deadlocks!
>
> the dependency chain (in reverse order) is:
> -> #2 (&dev->xmit_lock){-+}: [<c0bb04ec>] qdisc_restart+0x46/0x207
> -> #1 (&dev->queue_lock){-+}: [<c0b98137>] dev_queue_xmit+0xc3/0x21e
> -> #0 (&sk->sk_lock.slock){-+}: [<c0c6be28>] packet_rcv+0xbf/0x34b
I believe this is a false positive and I think I can see where it went
wrong. The dependency between #0 and #1 is the broken premise.
The validator is probably putting all sk_lock's in the same basket.
That is, it's mixing up the socket locks for TCP, UDP as well as
AF_PACKET. While it is true that TCP and UDP's sk_lock may sit
outside queue_lock, AF_PACKET never transmits while holding its
sk_lock.
So the #0 => #1 dependency shouldn't exist. Can you get the validator
to print out the reasoning for the #0 => #1 dependency? That should
clarify the problem.
Cheers,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [lock validator] inet6_destroy_sock(): soft-safe -> soft-unsafe lock dependency
2006-02-02 11:27 ` Ingo Molnar
@ 2006-02-02 12:19 ` Herbert Xu
0 siblings, 0 replies; 22+ messages in thread
From: Herbert Xu @ 2006-02-02 12:19 UTC (permalink / raw)
To: Ingo Molnar; +Cc: David S. Miller, linux-kernel
On Thu, Feb 02, 2006 at 12:27:31PM +0100, Ingo Molnar wrote:
>
> i think this might be a false positive, caused by my (ill-advised)
> change below? [i did the change to clean up an unlock ordering
> assymetry, but apparently i thus also introduced the xmit_lock ->
> queue_lock dependency.]
As far as I can see your change can't cause the previous report,
unless the validator is treating the trylock in the same way as
a normal lock operation.
Cheers,
--
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [lock validator] inet6_destroy_sock(): soft-safe -> soft-unsafe lock dependency
2006-02-02 12:17 ` Herbert Xu
@ 2006-02-02 13:54 ` Ingo Molnar
0 siblings, 0 replies; 22+ messages in thread
From: Ingo Molnar @ 2006-02-02 13:54 UTC (permalink / raw)
To: Herbert Xu; +Cc: David S. Miller, linux-kernel
* Herbert Xu <herbert@gondor.apana.org.au> wrote:
> On Thu, Feb 02, 2006 at 11:54:29AM +0100, Ingo Molnar wrote:
> >
> > hm, i got a new one:
> >
> > ============================================
> > [ BUG: circular locking deadlock detected! ]
> > --------------------------------------------
> > sshd/28997 is trying to acquire lock:
> > (&sk->sk_lock.slock){-+}, at: [<c0c6be28>] packet_rcv+0xbf/0x34b
> >
> > but task is already holding lock:
> > (&dev->xmit_lock){-+}, at: [<c0bb04ec>] qdisc_restart+0x46/0x207
> >
> > which lock already depends on the new lock,
> > which could lead to circular deadlocks!
> >
> > the dependency chain (in reverse order) is:
> > -> #2 (&dev->xmit_lock){-+}: [<c0bb04ec>] qdisc_restart+0x46/0x207
> > -> #1 (&dev->queue_lock){-+}: [<c0b98137>] dev_queue_xmit+0xc3/0x21e
> > -> #0 (&sk->sk_lock.slock){-+}: [<c0c6be28>] packet_rcv+0xbf/0x34b
>
> I believe this is a false positive and I think I can see where it went
> wrong. The dependency between #0 and #1 is the broken premise.
>
> The validator is probably putting all sk_lock's in the same basket.
> That is, it's mixing up the socket locks for TCP, UDP as well as
> AF_PACKET. While it is true that TCP and UDP's sk_lock may sit
> outside queue_lock, AF_PACKET never transmits while holding its
> sk_lock.
you are right, the validator indeed treated all these protocols as one,
so this is a false positive. (there are two slock buckets already,
because there is some natural lock nesting between listen sockets and
ordinary sockets, but otherwise you are right that all net protocols
were indeed treated as one category.)
> So the #0 => #1 dependency shouldn't exist. Can you get the validator
> to print out the reasoning for the #0 => #1 dependency? That should
> clarify the problem.
the slock -> queue_lock dependency was first observed at:
-> (&sk->sk_lock.slock/1){-+}
-> &dev->queue_lock [<c0b98137>] dev_queue_xmit+0xc3/0x21e
so it comes from dev_queue_xmit(). No further information about where
that was done - but i suspect it was TCP's sendmsg. af_packet.c never
seems to be doing that.
The queue_lock -> xmit_lock dependency was first observed at:
-> (&dev->queue_lock){-+}
-> &dev->xmit_lock [<c0bb0827>] dev_activate+0xc8/0xe5
i think i'll solve this by splitting af_packet.c's slock into a separate
bucket (separate lock type). (Such exceptions and locking assymetries
can be expressed towards the validator in a pretty straightforward way,
by initializing those locks in a special way.)
Ingo
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [lock validator] inet6_destroy_sock(): soft-safe -> soft-unsafe lock dependency
2006-02-01 10:42 ` Herbert Xu
2006-02-01 11:13 ` Ingo Molnar
@ 2006-02-03 1:01 ` David S. Miller
1 sibling, 0 replies; 22+ messages in thread
From: David S. Miller @ 2006-02-03 1:01 UTC (permalink / raw)
To: herbert; +Cc: mingo, davem, linux-kernel, netdev, yoshfuji
From: Herbert Xu <herbert@gondor.apana.org.au>
Date: Wed, 1 Feb 2006 21:42:14 +1100
> OK this is definitely broken. We should never touch the dst lock in
> softirq context. Since inet6_destroy_sock may be called from that
> context due to the asynchronous nature of sockets, we can't take the
> lock there.
>
> In fact this sk_dst_reset is totally redundant since all IPv6 sockets
> use inet_sock_destruct as their socket destructor which always cleans
> up the dst anyway. So the solution is to simply remove the call.
>
> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Looks good, applied, thanks Herbert.
^ permalink raw reply [flat|nested] 22+ messages in thread
end of thread, other threads:[~2006-02-03 1:06 UTC | newest]
Thread overview: 22+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2006-01-27 0:18 [lock validator] net/ipv4/fib_hash.c: illegal {enabled-softirqs} -> {used-in-softirq} usage? Ingo Molnar
2006-01-27 1:41 ` Herbert Xu
2006-01-28 15:22 ` [lock validator] inet6_destroy_sock(): soft-safe -> soft-unsafe lock dependency Ingo Molnar
2006-01-28 15:44 ` Ingo Molnar
2006-01-31 10:27 ` Herbert Xu
2006-01-31 10:43 ` David S. Miller
2006-01-31 11:21 ` Ingo Molnar
2006-02-01 13:32 ` Ingo Molnar
2006-02-01 20:26 ` Herbert Xu
2006-02-02 7:46 ` Ingo Molnar
2006-02-02 8:48 ` Herbert Xu
2006-02-02 9:04 ` David S. Miller
2006-02-02 10:54 ` Ingo Molnar
2006-02-02 11:27 ` Ingo Molnar
2006-02-02 12:19 ` Herbert Xu
2006-02-02 12:17 ` Herbert Xu
2006-02-02 13:54 ` Ingo Molnar
2006-01-31 21:24 ` Ingo Molnar
2006-01-31 22:06 ` Herbert Xu
2006-02-01 10:42 ` Herbert Xu
2006-02-01 11:13 ` Ingo Molnar
2006-02-03 1:01 ` David S. Miller
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).