linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [lock validator] net/ipv4/fib_hash.c: illegal {enabled-softirqs} -> {used-in-softirq} usage?
@ 2006-01-27  0:18 Ingo Molnar
  2006-01-27  1:41 ` Herbert Xu
  0 siblings, 1 reply; 22+ messages in thread
From: Ingo Molnar @ 2006-01-27  0:18 UTC (permalink / raw)
  To: David S. Miller; +Cc: linux-kernel

the lock validator i'm working on found another item:

  ============================
  [ BUG: illegal lock usage! ]
  ----------------------------
  illegal {enabled-softirqs} -> {used-in-softirq} usage.
  hackbench/8407 [HC0[0]:SC1[2]:HE1:SE0] takes:
   {&state[i].lock} [<c0bf7d08>] wrandom_flush+0x2a/0xb3
  {enabled-softirqs} state was registered at:
   [<c0befc33>] fn_hash_insert+0x565/0x5c3
  hardirqs last enabled at: [<c0d29a36>] _spin_unlock_irq+0xd/0x10
  softirqs last enabled at: [<c012d0a0>] irq_exit+0x36/0x38
  
  other info that might help us debug this:
  locks held by hackbench/8407: <none>
  
  stack backtrace:
   [<c0103e86>] show_trace+0xd/0xf
   [<c0103e9d>] dump_stack+0x15/0x17
   [<c013eb40>] print_usage_bug+0x16d/0x177
   [<c013f130>] mark_lock+0xe3/0x248
   [<c013f65d>] debug_lock_chain+0x3c8/0xb1d
   [<c013fde3>] debug_lock_chain_spin+0x31/0x48
   [<c0410c4e>] _raw_spin_lock+0x34/0x7f
   [<c0d298fd>] _spin_lock+0x8/0xa
   [<c0bf7d08>] wrandom_flush+0x2a/0xb3
   [<c0bc3738>] rt_cache_flush+0x3f/0xd8
   [<c0bc3843>] rt_secret_rebuild+0x11/0x26
   [<c0131165>] run_timer_softirq+0x143/0x19e
   [<c012d384>] __do_softirq+0x84/0xff
   [<c0104d76>] do_softirq+0x52/0xbb
   =======================
   [<c012d0a0>] irq_exit+0x36/0x38
   [<c0118809>] smp_apic_timer_interrupt+0x4e/0x51
   [<c010376f>] apic_timer_interrupt+0x27/0x2c
   [<c01670b2>] kmem_cache_alloc+0x28/0xa6
   [<c0b80be6>] __alloc_skb+0x25/0xee
   [<c0b7f144>] sock_alloc_send_skb+0x5d/0x192
   [<c0c29bdf>] unix_stream_sendmsg+0x131/0x33f
   [<c0b7e30a>] do_sock_write+0xbd/0xc6
   [<c0b7e441>] sock_aio_write+0x56/0x64
   [<c016a9a4>] do_sync_write+0xb1/0xe6
   [<c016adf0>] vfs_write+0xbd/0x155
   [<c016b657>] sys_write+0x3b/0x60
   [<c0102c25>] syscall_call+0x7/0xb

the culprit seems to be:

(gdb) list *0xc0befc33
0xc0befc33 is in fn_hash_insert (net/ipv4/fib_hash.c:527).
522             list_add_tail(&new_fa->fa_list,
523                      (fa ? &fa->fa_list : &f->fn_alias));
524             fib_hash_genid++;
525             write_unlock_bh(&fib_hash_lock);
526
527             if (new_f)
528                     fz->fz_nent++;
529             rt_cache_flush(-1);
530
531             rtmsg_fib(RTM_NEWROUTE, key, new_fa, z, tb->tb_id, n, req);
(gdb)

which enabled softirqs with {&state[i].lock} still held. If the 
rt_secret_rebuild() would hit that codepath at that point then it could 
cause a deadlock, right?

	Ingo

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [lock validator] net/ipv4/fib_hash.c: illegal {enabled-softirqs} -> {used-in-softirq} usage?
  2006-01-27  0:18 [lock validator] net/ipv4/fib_hash.c: illegal {enabled-softirqs} -> {used-in-softirq} usage? Ingo Molnar
@ 2006-01-27  1:41 ` Herbert Xu
  2006-01-28 15:22   ` [lock validator] inet6_destroy_sock(): soft-safe -> soft-unsafe lock dependency Ingo Molnar
  0 siblings, 1 reply; 22+ messages in thread
From: Herbert Xu @ 2006-01-27  1:41 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: davem, linux-kernel

Ingo Molnar <mingo@elte.hu> wrote:
> the lock validator i'm working on found another item:
> 
>  ============================
>  [ BUG: illegal lock usage! ]
>  ----------------------------
>  illegal {enabled-softirqs} -> {used-in-softirq} usage.
>  hackbench/8407 [HC0[0]:SC1[2]:HE1:SE0] takes:
>   {&state[i].lock} [<c0bf7d08>] wrandom_flush+0x2a/0xb3
>  {enabled-softirqs} state was registered at:
>   [<c0befc33>] fn_hash_insert+0x565/0x5c3
>  hardirqs last enabled at: [<c0d29a36>] _spin_unlock_irq+0xd/0x10
>  softirqs last enabled at: [<c012d0a0>] irq_exit+0x36/0x38
>  
>  other info that might help us debug this:
>  locks held by hackbench/8407: <none>
>  
>  stack backtrace:
>   [<c0103e86>] show_trace+0xd/0xf
>   [<c0103e9d>] dump_stack+0x15/0x17
>   [<c013eb40>] print_usage_bug+0x16d/0x177
>   [<c013f130>] mark_lock+0xe3/0x248
>   [<c013f65d>] debug_lock_chain+0x3c8/0xb1d
>   [<c013fde3>] debug_lock_chain_spin+0x31/0x48
>   [<c0410c4e>] _raw_spin_lock+0x34/0x7f
>   [<c0d298fd>] _spin_lock+0x8/0xa
>   [<c0bf7d08>] wrandom_flush+0x2a/0xb3

Good catch.  In fact the other spin lock operation in the same file needs
BH protection as well since it is invoked by the output route path which
can be in user-context for local packets.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 22+ messages in thread

* [lock validator] inet6_destroy_sock(): soft-safe -> soft-unsafe lock dependency
  2006-01-27  1:41 ` Herbert Xu
@ 2006-01-28 15:22   ` Ingo Molnar
  2006-01-28 15:44     ` Ingo Molnar
  2006-01-31 10:27     ` Herbert Xu
  0 siblings, 2 replies; 22+ messages in thread
From: Ingo Molnar @ 2006-01-28 15:22 UTC (permalink / raw)
  To: Herbert Xu; +Cc: davem, linux-kernel


* Herbert Xu <herbert@gondor.apana.org.au> wrote:

> Good catch.  In fact the other spin lock operation in the same file 
> needs BH protection as well since it is invoked by the output route 
> path which can be in user-context for local packets.

ok, cool.

Today i've added rwlocks to the coverage of the validator too, and it 
triggers in the networking code straight away - see the output below. Is 
this a genuine deadlock-site too? (Take this with a large grain of salt 
- while it did find another genuine bug in floppy.c and it passes a 
couple of testcases too, the rwlock validation feature is literally just 
1 hour old.)

let me know if you need explanation about the output, or if you need 
more info.

	Ingo

 ===========================================================
 [ BUG: soft-safe -> soft-unsafe lock dependency detected! ]
 -----------------------------------------------------------
 sshd/2853 [HC0[0]:SC0[1]:HE1:SE0] is trying to acquire:
  (&newsk->sk_dst_lock){+-}, at: [<c048f385>] inet6_destroy_sock+0x25/0x100
 
 and this task is already holding:
  (&((sk)->sk_lock.slock)){+-}, at: [<c0464832>] tcp_close+0x142/0x650
 which would create a new lock dependency:
   (&((sk)->sk_lock.slock)){+-} -> (&newsk->sk_dst_lock){+-}
 
 but this new dependency connects a softirq-safe lock:
  (&((sk)->sk_lock.slock)){+-}, at: [<c0464832>] tcp_close+0x142/0x650
 ... which became softirq-safe at:
 ... [<c0102e27>] sysenter_past_esp+0x84/0x8d
 
 to a softirq-unsafe lock:
  (&newsk->sk_dst_lock){+-}, at: [<c048f385>] inet6_destroy_sock+0x25/0x100
 ... which became softirq-unsafe at:
 ... [<00000000>] 0x0
 
 which could lead to deadlocks!
 
 other info that might help us debug this:
 
 1 locks held by sshd/2853:
  #0:  (&((sk)->sk_lock.slock)){+-}, at: [<c0464832>] tcp_close+0x142/0x650
 
 the softirq-safe lock's dependencies:
 -> (&((sk)->sk_lock.slock)){+-} {
    used       at: [<c0438429>] lock_sock+0x29/0xd0
    in-softirq at: [<c047081b>] tcp_delack_timer+0x1b/0x200
    hardirq-on at: [<c0102e27>] sysenter_past_esp+0x84/0x8d
  }
  ... key      at: [<c0438375>] sock_lock_init+0x15/0x30
  ... acquired at: [<c0464832>] tcp_close+0x142/0x650
 
   -> (&sk->sk_callback_lock){+-} {
      used       at: [<c048c8b3>] unix_release_sock+0x43/0x2d0
      hardirq-on at: [<c0162738>] kmem_cache_free+0x78/0xb0
    }
    ... key      at: [<c043963f>] sock_init_data+0x10f/0x1d0
    ... acquired at: [<c046484e>] tcp_close+0x15e/0x650
 
   -> (cpa_lock){++} {
      used       at: [<c011474c>] change_page_attr+0x1c/0x2a0
      in-hardirq at: [<c011474c>] change_page_attr+0x1c/0x2a0
      in-softirq at: [<c011474c>] change_page_attr+0x1c/0x2a0
    }
    ... key      at: [<c0581950>] cpa_lock+0x10/0x40
    ... acquired at: [<c011474c>] change_page_attr+0x1c/0x2a0
 
   -> (&dev->queue_lock){+-} {
      used       at: [<c044d44d>] qdisc_lock_tree+0x1d/0x20
      in-softirq at: [<c0441ef4>] dev_queue_xmit+0x64/0x290
      hardirq-on at: [<c0102ea7>] restore_nocheck+0x8/0xb
    }
    ... key      at: [<c044288b>] register_netdevice+0x4b/0x3a0
    ... acquired at: [<c0441ef4>] dev_queue_xmit+0x64/0x290
 
     -> (&dev->xmit_lock){+-} {
        used       at: [<c0443e6c>] dev_mc_upload+0x1c/0x40
        in-softirq at: [<c044d6bb>] dev_watchdog+0x1b/0xc0
        hardirq-on at: [<c04d40fa>] __mutex_lock_slowpath+0x27a/0x4e0
      }
      ... key      at: [<c044289d>] register_netdevice+0x5d/0x3a0
      ... acquired at: [<c044e019>] dev_activate+0x69/0x110
 
       -> (&tp->lock){++} {
          used       at: [<c0338c8a>] rtl8139_interrupt+0x2a/0x5e0
          in-hardirq at: [<c0338c8a>] rtl8139_interrupt+0x2a/0x5e0
          in-softirq at: [<c033839a>] rtl8139_start_xmit+0x7a/0x1a0
        }
        ... key      at: [<c03376bb>] rtl8139_init_one+0x2ab/0x960
        ... acquired at: [<c0337d92>] rtl8139_set_rx_mode+0x22/0x180
 
       -> (&base->t_base.lock){++} {
          used       at: [<c01269f3>] lock_timer_base+0x23/0x50
          in-hardirq at: [<c01269f3>] lock_timer_base+0x23/0x50
          in-softirq at: [<c0127612>] run_timer_softirq+0x42/0x1f0
        }
        ... key      at: [<c1fdaef0>] 0xc1fdaef0
        ... acquired at: [<c01269f3>] lock_timer_base+0x23/0x50
 
       -> (modlist_lock){++} {
          used       at: [<c0140289>] module_text_address+0x19/0x40
          in-hardirq at: [<c0140289>] module_text_address+0x19/0x40
          in-softirq at: [<c0140289>] module_text_address+0x19/0x40
        }
        ... key      at: [<c05842f0>] modlist_lock+0x10/0x40
        ... acquired at: [<c0140289>] module_text_address+0x19/0x40
 
       -> (cpa_lock){++} {
          used       at: [<c011474c>] change_page_attr+0x1c/0x2a0
          in-hardirq at: [<c011474c>] change_page_attr+0x1c/0x2a0
          in-softirq at: [<c011474c>] change_page_attr+0x1c/0x2a0
        }
        ... key      at: [<c0581950>] cpa_lock+0x10/0x40
        ... acquired at: [<c011474c>] change_page_attr+0x1c/0x2a0
 
   -> (&tcp_hashinfo.ehash[i].lock){+-} {
      used       at: [<c0460dbd>] inet_hash_connect+0x3dd/0x480
      in-softirq at: [<c0460eed>] __inet_twsk_hashdance+0x8d/0x130
      hardirq-on at: [<c0161b26>] kmem_cache_alloc+0x96/0xd0
    }
    ... key      at: [<c0696b8d>] tcp_init+0x2ad/0x300
    ... acquired at: [<c0472fa4>] tcp_v4_syn_recv_sock+0x184/0x340
 
   -> (&tcp_hashinfo.bhash[i].lock){+-} {
      used       at: [<c046275a>] inet_csk_get_port+0xea/0x240
      in-softirq at: [<c0460eaa>] __inet_twsk_hashdance+0x4a/0x130
      hardirq-on at: [<c0102e27>] sysenter_past_esp+0x84/0x8d
    }
    ... key      at: [<c0696b44>] tcp_init+0x264/0x300
    ... acquired at: [<c0472ffe>] tcp_v4_syn_recv_sock+0x1de/0x340
 
     -> (&parent->list_lock){++} {
        used       at: [<c0161bca>] cache_alloc_refill+0x6a/0x730
        in-hardirq at: [<c016362b>] do_drain+0x2b/0x60
        in-softirq at: [<c0162a46>] free_block+0x136/0x150
      }
      ... key      at: [<c01610ba>] kmem_list3_init+0x3a/0x50
      ... acquired at: [<c0161bca>] cache_alloc_refill+0x6a/0x730
 
     -> (&cachep->spinlock){+-} {
        used       at: [<c0161ea2>] cache_alloc_refill+0x342/0x730
        in-softirq at: [<c0161ea2>] cache_alloc_refill+0x342/0x730
      }
      ... key      at: [<c0163c71>] kmem_cache_create+0x2c1/0x780
      ... acquired at: [<c0161ea2>] cache_alloc_refill+0x342/0x730
 
       -> (&parent->list_lock){++} {
          used       at: [<c0161bca>] cache_alloc_refill+0x6a/0x730
          in-hardirq at: [<c016362b>] do_drain+0x2b/0x60
          in-softirq at: [<c0162a46>] free_block+0x136/0x150
        }
        ... key      at: [<c01610ba>] kmem_list3_init+0x3a/0x50
        ... acquired at: [<c01636a1>] __cache_shrink+0x41/0x140
 
       -> (cpa_lock){++} {
          used       at: [<c011474c>] change_page_attr+0x1c/0x2a0
          in-hardirq at: [<c011474c>] change_page_attr+0x1c/0x2a0
          in-softirq at: [<c011474c>] change_page_attr+0x1c/0x2a0
        }
        ... key      at: [<c0581950>] cpa_lock+0x10/0x40
        ... acquired at: [<c011474c>] change_page_attr+0x1c/0x2a0
 
     -> (cpa_lock){++} {
        used       at: [<c011474c>] change_page_attr+0x1c/0x2a0
        in-hardirq at: [<c011474c>] change_page_attr+0x1c/0x2a0
        in-softirq at: [<c011474c>] change_page_attr+0x1c/0x2a0
      }
      ... key      at: [<c0581950>] cpa_lock+0x10/0x40
      ... acquired at: [<c011474c>] change_page_attr+0x1c/0x2a0
 
     -> (&tcp_hashinfo.ehash[i].lock){+-} {
        used       at: [<c0460dbd>] inet_hash_connect+0x3dd/0x480
        in-softirq at: [<c0460eed>] __inet_twsk_hashdance+0x8d/0x130
        hardirq-on at: [<c0161b26>] kmem_cache_alloc+0x96/0xd0
      }
      ... key      at: [<c0696b8d>] tcp_init+0x2ad/0x300
      ... acquired at: [<c0460dbd>] inet_hash_connect+0x3dd/0x480
 
   -> (&queue->syn_wait_lock){+-} {
      used       at: [<c043a7bc>] reqsk_queue_alloc+0x8c/0xd0
      in-softirq at: [<c0461e88>] inet_csk_reqsk_queue_hash_add+0xd8/0x110
      hardirq-on at: [<c04d5105>] _spin_unlock_irqrestore+0x25/0x30
    }
    ... key      at: [<c043a7a2>] reqsk_queue_alloc+0x72/0xd0
    ... acquired at: [<c0475bf2>] tcp_check_req+0x2c2/0x3f0
 
   -> (&base->t_base.lock){++} {
      used       at: [<c01269f3>] lock_timer_base+0x23/0x50
      in-hardirq at: [<c01269f3>] lock_timer_base+0x23/0x50
      in-softirq at: [<c0127612>] run_timer_softirq+0x42/0x1f0
    }
    ... key      at: [<c1fd2ef0>] 0xc1fd2ef0
    ... acquired at: [<c01269f3>] lock_timer_base+0x23/0x50
 
   -> (modlist_lock){++} {
      used       at: [<c0140289>] module_text_address+0x19/0x40
      in-hardirq at: [<c0140289>] module_text_address+0x19/0x40
      in-softirq at: [<c0140289>] module_text_address+0x19/0x40
    }
    ... key      at: [<c05842f0>] modlist_lock+0x10/0x40
    ... acquired at: [<c0140289>] module_text_address+0x19/0x40
 
   -> (&q->lock){++} {
      used       at: [<c0132050>] add_wait_queue+0x20/0x50
      in-hardirq at: [<c0114fab>] __wake_up+0x1b/0x50
      in-softirq at: [<c0114fab>] __wake_up+0x1b/0x50
    }
    ... key      at: [<c0131e62>] init_waitqueue_head+0x12/0x20
    ... acquired at: [<c0114fab>] __wake_up+0x1b/0x50
 
     -> (&rq->lock){++} {
        used       at: [<c01162a9>] init_idle+0x49/0x70
        in-hardirq at: [<c0117fb6>] scheduler_tick+0xe6/0x380
        in-softirq at: [<c01175ae>] try_to_wake_up+0x3e/0x410
      }
      ... key      at: [<c1fda4f0>] 0xc1fda4f0
      ... acquired at: [<c01175ae>] try_to_wake_up+0x3e/0x410
 
     -> (&rq->lock){++} {
        used       at: [<c01162a9>] init_idle+0x49/0x70
        in-hardirq at: [<c0117fb6>] scheduler_tick+0xe6/0x380
        in-softirq at: [<c0117fe7>] scheduler_tick+0x117/0x380
      }
      ... key      at: [<c1fd24f0>] 0xc1fd24f0
      ... acquired at: [<c01175ae>] try_to_wake_up+0x3e/0x410
 
       -> (&rq->lock){++} {
          used       at: [<c01162a9>] init_idle+0x49/0x70
          in-hardirq at: [<c0117fb6>] scheduler_tick+0xe6/0x380
          in-softirq at: [<c01175ae>] try_to_wake_up+0x3e/0x410
        }
        ... key      at: [<c1fda4f0>] 0xc1fda4f0
        ... acquired at: [<c0114e6c>] double_rq_lock+0x3c/0x50
 
   -> (&parent->list_lock){++} {
      used       at: [<c0161bca>] cache_alloc_refill+0x6a/0x730
      in-hardirq at: [<c016362b>] do_drain+0x2b/0x60
      in-softirq at: [<c0162a46>] free_block+0x136/0x150
    }
    ... key      at: [<c01610ba>] kmem_list3_init+0x3a/0x50
    ... acquired at: [<c0161bca>] cache_alloc_refill+0x6a/0x730
 
   -> (&cachep->spinlock){+-} {
      used       at: [<c0161ea2>] cache_alloc_refill+0x342/0x730
      in-softirq at: [<c0161ea2>] cache_alloc_refill+0x342/0x730
    }
    ... key      at: [<c0163c71>] kmem_cache_create+0x2c1/0x780
    ... acquired at: [<c0161ea2>] cache_alloc_refill+0x342/0x730
 
     -> (&parent->list_lock){++} {
        used       at: [<c0161bca>] cache_alloc_refill+0x6a/0x730
        in-hardirq at: [<c016362b>] do_drain+0x2b/0x60
        in-softirq at: [<c0162a46>] free_block+0x136/0x150
      }
      ... key      at: [<c01610ba>] kmem_list3_init+0x3a/0x50
      ... acquired at: [<c01636a1>] __cache_shrink+0x41/0x140
 
     -> (cpa_lock){++} {
        used       at: [<c011474c>] change_page_attr+0x1c/0x2a0
        in-hardirq at: [<c011474c>] change_page_attr+0x1c/0x2a0
        in-softirq at: [<c011474c>] change_page_attr+0x1c/0x2a0
      }
      ... key      at: [<c0581950>] cpa_lock+0x10/0x40
      ... acquired at: [<c011474c>] change_page_attr+0x1c/0x2a0
 
   -> (&newsk->sk_callback_lock){+-} {
      used       at: [<c047f9ad>] inet_accept+0x4d/0xc0
      hardirq-on at: [<c0162738>] kmem_cache_free+0x78/0xb0
    }
    ... key      at: [<c04399fb>] sk_clone+0xeb/0x1c0
    ... acquired at: [<c046484e>] tcp_close+0x15e/0x650
 
 
 the softirq-unsafe lock's dependencies:
 -> (&newsk->sk_dst_lock){+-} {
    used       at: [<c045fda6>] ip_setsockopt+0x526/0xbf0
    softirq-on at: [<c04384c1>] lock_sock+0xc1/0xd0
    hardirq-on at: [<c0102e27>] sysenter_past_esp+0x84/0x8d
  }
  ... key      at: [<c04399e9>] sk_clone+0xd9/0x1c0
  ... acquired at: [<c048f385>] inet6_destroy_sock+0x25/0x100 
 

 stack backtrace:
  [<c010432d>] show_trace+0xd/0x10
  [<c0104347>] dump_stack+0x17/0x20
  [<c01380f0>] check_mask+0x2b0/0x300
  [<c013a109>] debug_lock_chain+0xce9/0x1090
  [<c013a4ed>] debug_lock_chain_spin+0x3d/0x60
  [<c02684f2>] _raw_write_lock+0x32/0x1a0
  [<c04d50d8>] _write_lock+0x8/0x10
  [<c048f385>] inet6_destroy_sock+0x25/0x100
  [<c04aede2>] tcp_v6_destroy_sock+0x12/0x20
  [<c046217a>] inet_csk_destroy_sock+0x4a/0x150
  [<c0464b7f>] tcp_close+0x48f/0x650
  [<c047ffa7>] inet_release+0x37/0x60
  [<c048ed10>] inet6_release+0x30/0x40
  [<c0436957>] sock_release+0x17/0xa0
  [<c0436df1>] sock_close+0x21/0x40
  [<c01674e3>] __fput+0xb3/0x170
  [<c01675c7>] fput+0x27/0x50
  [<c01647d7>] filp_close+0x47/0x80
  [<c011e685>] put_files_struct+0x85/0xf0
  [<c011f58e>] do_exit+0x14e/0x840
  [<c011fcbd>] do_group_exit+0x3d/0xa0
  [<c011fd2f>] sys_exit_group+0xf/0x20
  [<c0102df7>] sysenter_past_esp+0x54/0x8d

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [lock validator] inet6_destroy_sock(): soft-safe -> soft-unsafe lock dependency
  2006-01-28 15:22   ` [lock validator] inet6_destroy_sock(): soft-safe -> soft-unsafe lock dependency Ingo Molnar
@ 2006-01-28 15:44     ` Ingo Molnar
  2006-01-31 10:27     ` Herbert Xu
  1 sibling, 0 replies; 22+ messages in thread
From: Ingo Molnar @ 2006-01-28 15:44 UTC (permalink / raw)
  To: Herbert Xu; +Cc: davem, linux-kernel


* Ingo Molnar <mingo@elte.hu> wrote:

>  to a softirq-unsafe lock:
>   (&newsk->sk_dst_lock){+-}, at: [<c048f385>] inet6_destroy_sock+0x25/0x100
>  ... which became softirq-unsafe at:
>  ... [<00000000>] 0x0

fyi, here is where sk_dst_lock became softirq-unsafe:

marked lock as {softirq-on}:
 (&sk->sk_dst_lock){--}, at: [<c04b45d3>] ip6_datagram_connect+0x3b3/0x520
softirq was enabled at: c0497738
hardirq was enabled at: c0102e27
 (&sk->sk_dst_lock){--}, at: [<c04b45d3>] ip6_datagram_connect+0x3b3/0x520
 [<c010432d>] show_trace+0xd/0x10
 [<c0104347>] dump_stack+0x17/0x20
 [<c0139243>] mark_lock+0x173/0x3a0
 [<c01399a5>] debug_lock_chain+0x535/0x1090
 [<c013a53d>] debug_lock_chain_spin+0x3d/0x60
 [<c0268542>] _raw_write_lock+0x32/0x1a0
 [<c04d48c8>] _write_lock+0x8/0x10
 [<c04b45d3>] ip6_datagram_connect+0x3b3/0x520
 [<c04805c7>] inet_dgram_connect+0x37/0x80
 [<c0436f0a>] sys_connect+0x5a/0x80
 [<c0437414>] sys_socketcall+0x94/0x260
 [<c0102df7>] sysenter_past_esp+0x54/0x8d

	Ingo

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [lock validator] inet6_destroy_sock(): soft-safe -> soft-unsafe lock dependency
  2006-01-28 15:22   ` [lock validator] inet6_destroy_sock(): soft-safe -> soft-unsafe lock dependency Ingo Molnar
  2006-01-28 15:44     ` Ingo Molnar
@ 2006-01-31 10:27     ` Herbert Xu
  2006-01-31 10:43       ` David S. Miller
  2006-01-31 21:24       ` Ingo Molnar
  1 sibling, 2 replies; 22+ messages in thread
From: Herbert Xu @ 2006-01-31 10:27 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: davem, linux-kernel

On Sat, Jan 28, 2006 at 04:22:04PM +0100, Ingo Molnar wrote:
> 
>  ===========================================================
>  [ BUG: soft-safe -> soft-unsafe lock dependency detected! ]
>  -----------------------------------------------------------
>  sshd/2853 [HC0[0]:SC0[1]:HE1:SE0] is trying to acquire:
>   (&newsk->sk_dst_lock){+-}, at: [<c048f385>] inet6_destroy_sock+0x25/0x100
>  
>  and this task is already holding:
>   (&((sk)->sk_lock.slock)){+-}, at: [<c0464832>] tcp_close+0x142/0x650
>  which would create a new lock dependency:
>    (&((sk)->sk_lock.slock)){+-} -> (&newsk->sk_dst_lock){+-}

OK, I believe this is a false positive.  However, Dave should
double-check this.

tcp_close is only called from process context.  The rule for sk_dst_lock
is that it must also only be obtained in process context.  On the other
hand, it is true that sk_lock can be obtained in softirq context.

In this particular case, sk_dst_lock is obtained by tcp_close with
softirqs disabled.  This is not a problem in itself since we're not
trying to get sk_dst_lock from a real softirq context (as opposed to
process context with softirq disabled).

I believe this warning comes about because the validator creates a
dependency between sk_lock and sk_dst_lock.  It then infers from this
dependency that in softirq contexts where sk_lock is obtained the code
may also attempt to obtain sk_dst_lock.

This inference is where the validator errs.  sk_dst_lock is never
(or should never be, and as far as I can see none of the traces show
it to do so) obtained in a real softirq context.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [lock validator] inet6_destroy_sock(): soft-safe -> soft-unsafe lock dependency
  2006-01-31 10:27     ` Herbert Xu
@ 2006-01-31 10:43       ` David S. Miller
  2006-01-31 11:21         ` Ingo Molnar
  2006-02-01 13:32         ` Ingo Molnar
  2006-01-31 21:24       ` Ingo Molnar
  1 sibling, 2 replies; 22+ messages in thread
From: David S. Miller @ 2006-01-31 10:43 UTC (permalink / raw)
  To: herbert; +Cc: mingo, davem, linux-kernel

From: Herbert Xu <herbert@gondor.apana.org.au>
Date: Tue, 31 Jan 2006 21:27:58 +1100

> tcp_close is only called from process context.  The rule for sk_dst_lock
> is that it must also only be obtained in process context.  On the other
> hand, it is true that sk_lock can be obtained in softirq context.
> 
> In this particular case, sk_dst_lock is obtained by tcp_close with
> softirqs disabled.  This is not a problem in itself since we're not
> trying to get sk_dst_lock from a real softirq context (as opposed to
> process context with softirq disabled).
> 
> I believe this warning comes about because the validator creates a
> dependency between sk_lock and sk_dst_lock.  It then infers from this
> dependency that in softirq contexts where sk_lock is obtained the code
> may also attempt to obtain sk_dst_lock.
> 
> This inference is where the validator errs.  sk_dst_lock is never
> (or should never be, and as far as I can see none of the traces show
> it to do so) obtained in a real softirq context.

Herbert's analysis is correct.  This unique locking strategy
is used by tcp_close() because at this point it knows that
every single reference to this socket in the system is gone
once it takes the socket lock with BH disabled.

And that known invariant is why this is correct, and the locking
validator has no way to figure this out.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [lock validator] inet6_destroy_sock(): soft-safe -> soft-unsafe lock dependency
  2006-01-31 10:43       ` David S. Miller
@ 2006-01-31 11:21         ` Ingo Molnar
  2006-02-01 13:32         ` Ingo Molnar
  1 sibling, 0 replies; 22+ messages in thread
From: Ingo Molnar @ 2006-01-31 11:21 UTC (permalink / raw)
  To: David S. Miller; +Cc: herbert, davem, linux-kernel


* David S. Miller <davem@davemloft.net> wrote:

> From: Herbert Xu <herbert@gondor.apana.org.au>
> Date: Tue, 31 Jan 2006 21:27:58 +1100
> 
> > tcp_close is only called from process context.  The rule for sk_dst_lock
> > is that it must also only be obtained in process context.  On the other
> > hand, it is true that sk_lock can be obtained in softirq context.
> > 
> > In this particular case, sk_dst_lock is obtained by tcp_close with
> > softirqs disabled.  This is not a problem in itself since we're not
> > trying to get sk_dst_lock from a real softirq context (as opposed to
> > process context with softirq disabled).
> > 
> > I believe this warning comes about because the validator creates a
> > dependency between sk_lock and sk_dst_lock.  It then infers from this
> > dependency that in softirq contexts where sk_lock is obtained the code
> > may also attempt to obtain sk_dst_lock.
> > 
> > This inference is where the validator errs.  sk_dst_lock is never
> > (or should never be, and as far as I can see none of the traces show
> > it to do so) obtained in a real softirq context.
> 
> Herbert's analysis is correct.  This unique locking strategy is used 
> by tcp_close() because at this point it knows that every single 
> reference to this socket in the system is gone once it takes the 
> socket lock with BH disabled.
> 
> And that known invariant is why this is correct, and the locking 
> validator has no way to figure this out.

ok, thanks for the analysis! I'll fix this with an explicit hint to the 
validator.

	Ingo

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [lock validator] inet6_destroy_sock(): soft-safe -> soft-unsafe lock dependency
  2006-01-31 10:27     ` Herbert Xu
  2006-01-31 10:43       ` David S. Miller
@ 2006-01-31 21:24       ` Ingo Molnar
  2006-01-31 22:06         ` Herbert Xu
  2006-02-01 10:42         ` Herbert Xu
  1 sibling, 2 replies; 22+ messages in thread
From: Ingo Molnar @ 2006-01-31 21:24 UTC (permalink / raw)
  To: Herbert Xu; +Cc: davem, linux-kernel


* Herbert Xu <herbert@gondor.apana.org.au> wrote:

> This inference is where the validator errs.  sk_dst_lock is never (or 
> should never be, and as far as I can see none of the traces show it to 
> do so) obtained in a real softirq context.

just to make sure - is the trace below a safe use of sk_dst_lock too?  
Here sk_dst_lock seems to be taken in real softirq context.

	Ingo

============================
[ BUG: illegal lock usage! ]
----------------------------
illegal {softirq-on} -> {in-softirq} usage.
sshd/2476 [HC0[0]:SC1[2]:HE1:SE0] takes:
 (&sk->sk_dst_lock){-+}, at: [<c0499015>] inet6_destroy_sock+0x25/0x100
{softirq-on} state was registered at:
 [<c04a1bd8>] ipv6_dev_get_saddr+0x138/0x640
hardirqs last enabled at: [<c04dea45>] _spin_unlock_irqrestore+0x25/0x30
softirqs last enabled at: [<c01231e5>] irq_exit+0x45/0x50

other info that might help us debug this:
3 locks held by sshd/2476:
 #0:  (&p->proc_lock){--}, at: [<c011f1a3>] release_task+0x23/0x150
 #1:  (&tp->rx_lock){-+}, at: [<c0342a15>] rtl8139_poll+0x45/0x4c0
 #2:  (&sk->sk_lock.slock/1){-+}, at: [<c047e8f6>] tcp_v4_rcv+0x726/0x9d0

stack backtrace:
 [<c010437d>] show_trace+0xd/0x10
 [<c0104397>] dump_stack+0x17/0x20
 [<c0139538>] print_usage_bug+0x1d8/0x230
 [<c01398a8>] mark_lock+0x318/0x350
 [<c0139d73>] debug_lock_chain+0x493/0x1090
 [<c013a9ad>] debug_lock_chain_spin+0x3d/0x60
 [<c0269272>] _raw_write_lock+0x32/0x1a0
 [<c04de9e8>] _write_lock+0x8/0x10
 [<c0499015>] inet6_destroy_sock+0x25/0x100
 [<c04b8672>] tcp_v6_destroy_sock+0x12/0x20
 [<c046bbda>] inet_csk_destroy_sock+0x4a/0x150
 [<c047625c>] tcp_rcv_state_process+0xd4c/0xdd0
 [<c047d8e9>] tcp_v4_do_rcv+0xa9/0x340
 [<c047eabb>] tcp_v4_rcv+0x8eb/0x9d0
 [<c0462c76>] ip_local_deliver+0xa6/0x190
 [<c04629f8>] ip_rcv+0x2f8/0x4d0
 [<c044bcb6>] netif_receive_skb+0x1b6/0x2a0
 [<c0342d3a>] rtl8139_poll+0x36a/0x4c0
 [<c044a682>] net_rx_action+0xd2/0x1f0
 [<c0123527>] __do_softirq+0x97/0x130
 [<c01054d9>] do_softirq+0x69/0x100
 =======================
 [<c01231e5>] irq_exit+0x45/0x50
 [<c01055c4>] do_IRQ+0x54/0x70
 [<c01038a9>] common_interrupt+0x25/0x2c
 [<c0120990>] do_wait+0x7d0/0xad0
 [<c0120cc2>] sys_wait4+0x32/0x40
 [<c0120cf5>] sys_waitpid+0x25/0x30
 [<c0102e17>] sysenter_past_esp+0x54/0x8d

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [lock validator] inet6_destroy_sock(): soft-safe -> soft-unsafe lock dependency
  2006-01-31 21:24       ` Ingo Molnar
@ 2006-01-31 22:06         ` Herbert Xu
  2006-02-01 10:42         ` Herbert Xu
  1 sibling, 0 replies; 22+ messages in thread
From: Herbert Xu @ 2006-01-31 22:06 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: davem, linux-kernel

On Tue, Jan 31, 2006 at 10:24:32PM +0100, Ingo Molnar wrote:
> 
> just to make sure - is the trace below a safe use of sk_dst_lock too?  
> Here sk_dst_lock seems to be taken in real softirq context.

You're right this does look suspicious.  Let me take a closer look...
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [lock validator] inet6_destroy_sock(): soft-safe -> soft-unsafe lock dependency
  2006-01-31 21:24       ` Ingo Molnar
  2006-01-31 22:06         ` Herbert Xu
@ 2006-02-01 10:42         ` Herbert Xu
  2006-02-01 11:13           ` Ingo Molnar
  2006-02-03  1:01           ` David S. Miller
  1 sibling, 2 replies; 22+ messages in thread
From: Herbert Xu @ 2006-02-01 10:42 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: davem, linux-kernel, netdev, YOSHIFUJI Hideaki

[-- Attachment #1: Type: text/plain, Size: 1072 bytes --]

On Tue, Jan 31, 2006 at 10:24:32PM +0100, Ingo Molnar wrote:
>
>  [<c04de9e8>] _write_lock+0x8/0x10
>  [<c0499015>] inet6_destroy_sock+0x25/0x100
>  [<c04b8672>] tcp_v6_destroy_sock+0x12/0x20
>  [<c046bbda>] inet_csk_destroy_sock+0x4a/0x150
>  [<c047625c>] tcp_rcv_state_process+0xd4c/0xdd0
>  [<c047d8e9>] tcp_v4_do_rcv+0xa9/0x340
>  [<c047eabb>] tcp_v4_rcv+0x8eb/0x9d0

OK this is definitely broken.  We should never touch the dst lock in
softirq context.  Since inet6_destroy_sock may be called from that
context due to the asynchronous nature of sockets, we can't take the
lock there.

In fact this sk_dst_reset is totally redundant since all IPv6 sockets
use inet_sock_destruct as their socket destructor which always cleans
up the dst anyway.  So the solution is to simply remove the call.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

[-- Attachment #2: inet6-destroy-sock-should-not-reset-dst --]
[-- Type: text/plain, Size: 369 bytes --]

diff --git a/net/ipv6/af_inet6.c b/net/ipv6/af_inet6.c
--- a/net/ipv6/af_inet6.c
+++ b/net/ipv6/af_inet6.c
@@ -369,12 +369,6 @@ int inet6_destroy_sock(struct sock *sk)
 	struct sk_buff *skb;
 	struct ipv6_txoptions *opt;
 
-	/*
-	 *	Release destination entry
-	 */
-
-	sk_dst_reset(sk);
-
 	/* Release rx options */
 
 	if ((skb = xchg(&np->pktoptions, NULL)) != NULL)

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [lock validator] inet6_destroy_sock(): soft-safe -> soft-unsafe lock dependency
  2006-02-01 10:42         ` Herbert Xu
@ 2006-02-01 11:13           ` Ingo Molnar
  2006-02-03  1:01           ` David S. Miller
  1 sibling, 0 replies; 22+ messages in thread
From: Ingo Molnar @ 2006-02-01 11:13 UTC (permalink / raw)
  To: Herbert Xu; +Cc: davem, linux-kernel, netdev, YOSHIFUJI Hideaki


* Herbert Xu <herbert@gondor.apana.org.au> wrote:

> On Tue, Jan 31, 2006 at 10:24:32PM +0100, Ingo Molnar wrote:
> >
> >  [<c04de9e8>] _write_lock+0x8/0x10
> >  [<c0499015>] inet6_destroy_sock+0x25/0x100
> >  [<c04b8672>] tcp_v6_destroy_sock+0x12/0x20
> >  [<c046bbda>] inet_csk_destroy_sock+0x4a/0x150
> >  [<c047625c>] tcp_rcv_state_process+0xd4c/0xdd0
> >  [<c047d8e9>] tcp_v4_do_rcv+0xa9/0x340
> >  [<c047eabb>] tcp_v4_rcv+0x8eb/0x9d0
> 
> OK this is definitely broken.  We should never touch the dst lock in 
> softirq context.  Since inet6_destroy_sock may be called from that 
> context due to the asynchronous nature of sockets, we can't take the 
> lock there.
> 
> In fact this sk_dst_reset is totally redundant since all IPv6 sockets 
> use inet_sock_destruct as their socket destructor which always cleans 
> up the dst anyway.  So the solution is to simply remove the call.
> 
> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

cool - i've booted your patch and will have results later today [it's 
looking good so far, after 10 minutes of uptime ;)]

btw., this codepath took some time to trigger, and i'm not sure why: 
maybe because i dont have any true ipv6 traffic? (In fact i dont have 
CONFIG_IPV6 enabled at all in this kernel config - so this codepath must 
be an effect of ipv4/ipv6 unification?) I guess i should run the ipv6 
testsuite to expose all the important codepaths to the validator? [Would 
you have any suggestions for me how to do that quickly & easily?]

another thing: could you add the string 'lock validator' somewhere into 
the changelog, so that we can grep for such things later on? One reason 
for that is to strenghten my future arguments for mainline inclusion 
;-), but there's another argument as well:

the lock validator finds _all_ hidden deadlocks [no matter how obscure, 
interdependent or unlikely they are, as long as the affected codepath is 
triggered at least once], and thus the resulting bug statistics and 
characteristics will be an excellent (and one-time) opportunity to 
objectively judge the absolute code quality (and defect rate) of the 
Linux kernel, for an important category of hard-to-find bugs. (Maybe the 
results can even be extrapolated to other, similarly hard-to-find bug 
categories.)

in other words, the lock validator is building a runtime set of formal 
"locking requirements", and is automatically proving (and maintaining 
the proof) that all locking activity within the kernel meets those 
requirements, mathematically. It thus enables us to achieve a hard 
ceiling of 100% correctness that we can rarely achieve in complex code 
as the kernel. We should minimize the changeset entropy introduced and 
preserve this historic information.

	Ingo

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [lock validator] inet6_destroy_sock(): soft-safe -> soft-unsafe lock dependency
  2006-01-31 10:43       ` David S. Miller
  2006-01-31 11:21         ` Ingo Molnar
@ 2006-02-01 13:32         ` Ingo Molnar
  2006-02-01 20:26           ` Herbert Xu
  1 sibling, 1 reply; 22+ messages in thread
From: Ingo Molnar @ 2006-02-01 13:32 UTC (permalink / raw)
  To: David S. Miller; +Cc: herbert, davem, linux-kernel


* David S. Miller <davem@davemloft.net> wrote:

> From: Herbert Xu <herbert@gondor.apana.org.au>
> Date: Tue, 31 Jan 2006 21:27:58 +1100
> 
> > tcp_close is only called from process context.  The rule for sk_dst_lock
> > is that it must also only be obtained in process context.  On the other
> > hand, it is true that sk_lock can be obtained in softirq context.
> > 
> > In this particular case, sk_dst_lock is obtained by tcp_close with
> > softirqs disabled.  This is not a problem in itself since we're not
> > trying to get sk_dst_lock from a real softirq context (as opposed to
> > process context with softirq disabled).
> > 
> > I believe this warning comes about because the validator creates a
> > dependency between sk_lock and sk_dst_lock.  It then infers from this
> > dependency that in softirq contexts where sk_lock is obtained the code
> > may also attempt to obtain sk_dst_lock.
> > 
> > This inference is where the validator errs.  sk_dst_lock is never
> > (or should never be, and as far as I can see none of the traces show
> > it to do so) obtained in a real softirq context.
> 
> Herbert's analysis is correct.  This unique locking strategy is used 
> by tcp_close() because at this point it knows that every single 
> reference to this socket in the system is gone once it takes the 
> socket lock with BH disabled.
> 
> And that known invariant is why this is correct, and the locking 
> validator has no way to figure this out.

update: with all of Herbert's fixes i havent gotten these yet - so maybe 
the validator was not producing a false positive, but perhaps the 
inet6_destroy_sock()->sk_dst_reset() thing was causing the messages?

	Ingo

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [lock validator] inet6_destroy_sock(): soft-safe -> soft-unsafe lock dependency
  2006-02-01 13:32         ` Ingo Molnar
@ 2006-02-01 20:26           ` Herbert Xu
  2006-02-02  7:46             ` Ingo Molnar
  0 siblings, 1 reply; 22+ messages in thread
From: Herbert Xu @ 2006-02-01 20:26 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: David S. Miller, linux-kernel

On Wed, Feb 01, 2006 at 02:32:19PM +0100, Ingo Molnar wrote:
> 
> update: with all of Herbert's fixes i havent gotten these yet - so maybe 
> the validator was not producing a false positive, but perhaps the 
> inet6_destroy_sock()->sk_dst_reset() thing was causing the messages?

Maybe.  But in that case shouldn't the validator show that code-path?

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [lock validator] inet6_destroy_sock(): soft-safe -> soft-unsafe lock dependency
  2006-02-01 20:26           ` Herbert Xu
@ 2006-02-02  7:46             ` Ingo Molnar
  2006-02-02  8:48               ` Herbert Xu
  0 siblings, 1 reply; 22+ messages in thread
From: Ingo Molnar @ 2006-02-02  7:46 UTC (permalink / raw)
  To: Herbert Xu; +Cc: David S. Miller, linux-kernel


* Herbert Xu <herbert@gondor.apana.org.au> wrote:

> On Wed, Feb 01, 2006 at 02:32:19PM +0100, Ingo Molnar wrote:
> > 
> > update: with all of Herbert's fixes i havent gotten these yet - so maybe 
> > the validator was not producing a false positive, but perhaps the 
> > inet6_destroy_sock()->sk_dst_reset() thing was causing the messages?
> 
> Maybe.  But in that case shouldn't the validator show that code-path?

yeah, it should have. In any case, things are looking good so far with 
your fixes. (Any suggestions wrt. how to trigger as many different 
codepaths in the networking code as possible, to increase coverage of 
locking scenarios mapped? I've tried LTP so far, and a few ad-hoc tests.  
Perhaps there's some IP protocol tester i should try, which is known to 
trigger lots of boundary conditions?)

	Ingo

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [lock validator] inet6_destroy_sock(): soft-safe -> soft-unsafe lock dependency
  2006-02-02  7:46             ` Ingo Molnar
@ 2006-02-02  8:48               ` Herbert Xu
  2006-02-02  9:04                 ` David S. Miller
  2006-02-02 10:54                 ` Ingo Molnar
  0 siblings, 2 replies; 22+ messages in thread
From: Herbert Xu @ 2006-02-02  8:48 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: David S. Miller, linux-kernel

On Thu, Feb 02, 2006 at 08:46:27AM +0100, Ingo Molnar wrote:
> 
> yeah, it should have. In any case, things are looking good so far with 
> your fixes. (Any suggestions wrt. how to trigger as many different 
> codepaths in the networking code as possible, to increase coverage of 
> locking scenarios mapped? I've tried LTP so far, and a few ad-hoc tests.  
> Perhaps there's some IP protocol tester i should try, which is known to 
> trigger lots of boundary conditions?)

Two paths you might want to cover are netfilter and IPsec.  Both of these
can be tested by simply adding netfilter rules and IPsec policies/SAs and
then running the same test you were using before.

You might also want to look at multicast but I don't know of any good
tests for that.

Oh and then there are VLANs, bridging, Ethernet bonding, qdisc's, etc.
Again these can be tested by setting up the relevant devices or rules
and running the usual TCP/UDP tests.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [lock validator] inet6_destroy_sock(): soft-safe -> soft-unsafe lock dependency
  2006-02-02  8:48               ` Herbert Xu
@ 2006-02-02  9:04                 ` David S. Miller
  2006-02-02 10:54                 ` Ingo Molnar
  1 sibling, 0 replies; 22+ messages in thread
From: David S. Miller @ 2006-02-02  9:04 UTC (permalink / raw)
  To: herbert; +Cc: mingo, linux-kernel

From: Herbert Xu <herbert@gondor.apana.org.au>
Date: Thu, 2 Feb 2006 19:48:24 +1100

> You might also want to look at multicast but I don't know of any good
> tests for that.

David Stevens at IBM might have some tests.


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [lock validator] inet6_destroy_sock(): soft-safe -> soft-unsafe lock dependency
  2006-02-02  8:48               ` Herbert Xu
  2006-02-02  9:04                 ` David S. Miller
@ 2006-02-02 10:54                 ` Ingo Molnar
  2006-02-02 11:27                   ` Ingo Molnar
  2006-02-02 12:17                   ` Herbert Xu
  1 sibling, 2 replies; 22+ messages in thread
From: Ingo Molnar @ 2006-02-02 10:54 UTC (permalink / raw)
  To: Herbert Xu; +Cc: David S. Miller, linux-kernel


hm, i got a new one:

============================================
[ BUG: circular locking deadlock detected! ]
--------------------------------------------
sshd/28997 is trying to acquire lock:
 (&sk->sk_lock.slock){-+}, at: [<c0c6be28>] packet_rcv+0xbf/0x34b

but task is already holding lock:
 (&dev->xmit_lock){-+}, at: [<c0bb04ec>] qdisc_restart+0x46/0x207

which lock already depends on the new lock,
which could lead to circular deadlocks!

the dependency chain (in reverse order) is:
-> #2 (&dev->xmit_lock){-+}: [<c0bb04ec>] qdisc_restart+0x46/0x207
-> #1 (&dev->queue_lock){-+}: [<c0b98137>] dev_queue_xmit+0xc3/0x21e
-> #0 (&sk->sk_lock.slock){-+}: [<c0c6be28>] packet_rcv+0xbf/0x34b

other info that might help us debug this:

1 locks held by sshd/28997:
 #0:  (&dev->xmit_lock){-+}, at: [<c0bb04ec>] qdisc_restart+0x46/0x207

stack backtrace:
 [<c0103ecd>] show_trace+0xd/0xf
 [<c0103ee4>] dump_stack+0x15/0x17
 [<c014019e>] print_circular_bug_tail+0x42/0x4b
 [<c0141716>] debug_lock_chain+0xb30/0xd58
 [<c014196f>] debug_lock_chain_spin+0x31/0x48
 [<c0415b8c>] _raw_spin_lock+0x34/0x7f
 [<c0d3c12d>] _spin_lock+0x8/0xa
 [<c0c6be28>] packet_rcv+0xbf/0x34b
 [<c0b98052>] dev_queue_xmit_nit+0xbb/0xdd
 [<c0bb05a5>] qdisc_restart+0xff/0x207
 [<c0b9814e>] dev_queue_xmit+0xda/0x21e
 [<c0bde286>] ip_output+0x237/0x2c1
 [<c0bdd9a4>] ip_queue_xmit+0x394/0x417
 [<c0beafb7>] tcp_transmit_skb+0x5d4/0x5f6
 [<c0beca28>] __tcp_push_pending_frames+0x256/0x305
 [<c0be43f1>] tcp_sendmsg+0x8ae/0x975
 [<c0bfabd3>] inet_sendmsg+0x39/0x46
 [<c0b8fd44>] do_sock_write+0xbd/0xc6
 [<c0b8fe7b>] sock_aio_write+0x56/0x64
 [<c016c922>] do_sync_write+0xb1/0xe6
 [<c016cd71>] vfs_write+0xbd/0x155
 [<c016d5db>] sys_write+0x3b/0x60
 [<c0102c39>] syscall_call+0x7/0xb

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [lock validator] inet6_destroy_sock(): soft-safe -> soft-unsafe lock dependency
  2006-02-02 10:54                 ` Ingo Molnar
@ 2006-02-02 11:27                   ` Ingo Molnar
  2006-02-02 12:19                     ` Herbert Xu
  2006-02-02 12:17                   ` Herbert Xu
  1 sibling, 1 reply; 22+ messages in thread
From: Ingo Molnar @ 2006-02-02 11:27 UTC (permalink / raw)
  To: Herbert Xu; +Cc: David S. Miller, linux-kernel


* Ingo Molnar <mingo@elte.hu> wrote:

> hm, i got a new one:
> 
> ============================================
> [ BUG: circular locking deadlock detected! ]
> --------------------------------------------
> sshd/28997 is trying to acquire lock:
>  (&sk->sk_lock.slock){-+}, at: [<c0c6be28>] packet_rcv+0xbf/0x34b
> 
> but task is already holding lock:
>  (&dev->xmit_lock){-+}, at: [<c0bb04ec>] qdisc_restart+0x46/0x207
> 
> which lock already depends on the new lock,
> which could lead to circular deadlocks!
> 
> the dependency chain (in reverse order) is:
> -> #2 (&dev->xmit_lock){-+}: [<c0bb04ec>] qdisc_restart+0x46/0x207
> -> #1 (&dev->queue_lock){-+}: [<c0b98137>] dev_queue_xmit+0xc3/0x21e
> -> #0 (&sk->sk_lock.slock){-+}: [<c0c6be28>] packet_rcv+0xbf/0x34b

i think this might be a false positive, caused by my (ill-advised) 
change below? [i did the change to clean up an unlock ordering 
assymetry, but apparently i thus also introduced the xmit_lock -> 
queue_lock dependency.]

	Ingo

Index: linux/net/sched/sch_generic.c
===================================================================
--- linux.orig/net/sched/sch_generic.c
+++ linux/net/sched/sch_generic.c
@@ -98,6 +98,8 @@ int qdisc_restart(struct net_device *dev
 	/* Dequeue packet */
 	if ((skb = q->dequeue(q)) != NULL) {
 		unsigned nolock = (dev->features & NETIF_F_LLTX);
+		/* release queue */
+		spin_unlock(&dev->queue_lock);
 		/*
 		 * When the driver has LLTX set it does its own locking
 		 * in start_xmit. No need to add additional overhead by
@@ -121,9 +123,11 @@ int qdisc_restart(struct net_device *dev
 					kfree_skb(skb);
 					if (net_ratelimit())
 						printk(KERN_DEBUG "Dead loop on netdevice %s, fix it urgently!\n", dev->name);
+					spin_lock(&dev->queue_lock);
 					return -1;
 				}
 				__get_cpu_var(netdev_rx_stat).cpu_collision++;
+				spin_lock(&dev->queue_lock);
 				goto requeue;
 			}
 			/* Remember that the driver is grabbed by us. */
@@ -131,8 +135,6 @@ int qdisc_restart(struct net_device *dev
 		}
 		
 		{
-			/* And release queue */
-			spin_unlock(&dev->queue_lock);
 
 			if (!netif_queue_stopped(dev)) {
 				int ret;

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [lock validator] inet6_destroy_sock(): soft-safe -> soft-unsafe lock dependency
  2006-02-02 10:54                 ` Ingo Molnar
  2006-02-02 11:27                   ` Ingo Molnar
@ 2006-02-02 12:17                   ` Herbert Xu
  2006-02-02 13:54                     ` Ingo Molnar
  1 sibling, 1 reply; 22+ messages in thread
From: Herbert Xu @ 2006-02-02 12:17 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: David S. Miller, linux-kernel

On Thu, Feb 02, 2006 at 11:54:29AM +0100, Ingo Molnar wrote:
> 
> hm, i got a new one:
> 
> ============================================
> [ BUG: circular locking deadlock detected! ]
> --------------------------------------------
> sshd/28997 is trying to acquire lock:
>  (&sk->sk_lock.slock){-+}, at: [<c0c6be28>] packet_rcv+0xbf/0x34b
> 
> but task is already holding lock:
>  (&dev->xmit_lock){-+}, at: [<c0bb04ec>] qdisc_restart+0x46/0x207
> 
> which lock already depends on the new lock,
> which could lead to circular deadlocks!
> 
> the dependency chain (in reverse order) is:
> -> #2 (&dev->xmit_lock){-+}: [<c0bb04ec>] qdisc_restart+0x46/0x207
> -> #1 (&dev->queue_lock){-+}: [<c0b98137>] dev_queue_xmit+0xc3/0x21e
> -> #0 (&sk->sk_lock.slock){-+}: [<c0c6be28>] packet_rcv+0xbf/0x34b

I believe this is a false positive and I think I can see where it went
wrong.  The dependency between #0 and #1 is the broken premise.

The validator is probably putting all sk_lock's in the same basket.
That is, it's mixing up the socket locks for TCP, UDP as well as
AF_PACKET.  While it is true that TCP and UDP's sk_lock may sit
outside queue_lock, AF_PACKET never transmits while holding its
sk_lock.

So the #0 => #1 dependency shouldn't exist.  Can you get the validator
to print out the reasoning for the #0 => #1 dependency? That should
clarify the problem.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [lock validator] inet6_destroy_sock(): soft-safe -> soft-unsafe lock dependency
  2006-02-02 11:27                   ` Ingo Molnar
@ 2006-02-02 12:19                     ` Herbert Xu
  0 siblings, 0 replies; 22+ messages in thread
From: Herbert Xu @ 2006-02-02 12:19 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: David S. Miller, linux-kernel

On Thu, Feb 02, 2006 at 12:27:31PM +0100, Ingo Molnar wrote:
> 
> i think this might be a false positive, caused by my (ill-advised) 
> change below? [i did the change to clean up an unlock ordering 
> assymetry, but apparently i thus also introduced the xmit_lock -> 
> queue_lock dependency.]

As far as I can see your change can't cause the previous report,
unless the validator is treating the trylock in the same way as
a normal lock operation.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [lock validator] inet6_destroy_sock(): soft-safe -> soft-unsafe lock dependency
  2006-02-02 12:17                   ` Herbert Xu
@ 2006-02-02 13:54                     ` Ingo Molnar
  0 siblings, 0 replies; 22+ messages in thread
From: Ingo Molnar @ 2006-02-02 13:54 UTC (permalink / raw)
  To: Herbert Xu; +Cc: David S. Miller, linux-kernel


* Herbert Xu <herbert@gondor.apana.org.au> wrote:

> On Thu, Feb 02, 2006 at 11:54:29AM +0100, Ingo Molnar wrote:
> > 
> > hm, i got a new one:
> > 
> > ============================================
> > [ BUG: circular locking deadlock detected! ]
> > --------------------------------------------
> > sshd/28997 is trying to acquire lock:
> >  (&sk->sk_lock.slock){-+}, at: [<c0c6be28>] packet_rcv+0xbf/0x34b
> > 
> > but task is already holding lock:
> >  (&dev->xmit_lock){-+}, at: [<c0bb04ec>] qdisc_restart+0x46/0x207
> > 
> > which lock already depends on the new lock,
> > which could lead to circular deadlocks!
> > 
> > the dependency chain (in reverse order) is:
> > -> #2 (&dev->xmit_lock){-+}: [<c0bb04ec>] qdisc_restart+0x46/0x207
> > -> #1 (&dev->queue_lock){-+}: [<c0b98137>] dev_queue_xmit+0xc3/0x21e
> > -> #0 (&sk->sk_lock.slock){-+}: [<c0c6be28>] packet_rcv+0xbf/0x34b
> 
> I believe this is a false positive and I think I can see where it went 
> wrong.  The dependency between #0 and #1 is the broken premise.
> 
> The validator is probably putting all sk_lock's in the same basket.  
> That is, it's mixing up the socket locks for TCP, UDP as well as 
> AF_PACKET.  While it is true that TCP and UDP's sk_lock may sit 
> outside queue_lock, AF_PACKET never transmits while holding its 
> sk_lock.

you are right, the validator indeed treated all these protocols as one, 
so this is a false positive. (there are two slock buckets already, 
because there is some natural lock nesting between listen sockets and 
ordinary sockets, but otherwise you are right that all net protocols 
were indeed treated as one category.)

> So the #0 => #1 dependency shouldn't exist.  Can you get the validator 
> to print out the reasoning for the #0 => #1 dependency? That should 
> clarify the problem.

the slock -> queue_lock dependency was first observed at:

-> (&sk->sk_lock.slock/1){-+}
    -> &dev->queue_lock          [<c0b98137>] dev_queue_xmit+0xc3/0x21e

so it comes from dev_queue_xmit(). No further information about where 
that was done - but i suspect it was TCP's sendmsg. af_packet.c never 
seems to be doing that.

The queue_lock -> xmit_lock dependency was first observed at:

-> (&dev->queue_lock){-+}
    -> &dev->xmit_lock           [<c0bb0827>] dev_activate+0xc8/0xe5

i think i'll solve this by splitting af_packet.c's slock into a separate 
bucket (separate lock type). (Such exceptions and locking assymetries 
can be expressed towards the validator in a pretty straightforward way, 
by initializing those locks in a special way.)

	Ingo

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [lock validator] inet6_destroy_sock(): soft-safe -> soft-unsafe lock dependency
  2006-02-01 10:42         ` Herbert Xu
  2006-02-01 11:13           ` Ingo Molnar
@ 2006-02-03  1:01           ` David S. Miller
  1 sibling, 0 replies; 22+ messages in thread
From: David S. Miller @ 2006-02-03  1:01 UTC (permalink / raw)
  To: herbert; +Cc: mingo, davem, linux-kernel, netdev, yoshfuji

From: Herbert Xu <herbert@gondor.apana.org.au>
Date: Wed, 1 Feb 2006 21:42:14 +1100

> OK this is definitely broken.  We should never touch the dst lock in
> softirq context.  Since inet6_destroy_sock may be called from that
> context due to the asynchronous nature of sockets, we can't take the
> lock there.
> 
> In fact this sk_dst_reset is totally redundant since all IPv6 sockets
> use inet_sock_destruct as their socket destructor which always cleans
> up the dst anyway.  So the solution is to simply remove the call.
> 
> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

Looks good, applied, thanks Herbert.

^ permalink raw reply	[flat|nested] 22+ messages in thread

end of thread, other threads:[~2006-02-03  1:06 UTC | newest]

Thread overview: 22+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2006-01-27  0:18 [lock validator] net/ipv4/fib_hash.c: illegal {enabled-softirqs} -> {used-in-softirq} usage? Ingo Molnar
2006-01-27  1:41 ` Herbert Xu
2006-01-28 15:22   ` [lock validator] inet6_destroy_sock(): soft-safe -> soft-unsafe lock dependency Ingo Molnar
2006-01-28 15:44     ` Ingo Molnar
2006-01-31 10:27     ` Herbert Xu
2006-01-31 10:43       ` David S. Miller
2006-01-31 11:21         ` Ingo Molnar
2006-02-01 13:32         ` Ingo Molnar
2006-02-01 20:26           ` Herbert Xu
2006-02-02  7:46             ` Ingo Molnar
2006-02-02  8:48               ` Herbert Xu
2006-02-02  9:04                 ` David S. Miller
2006-02-02 10:54                 ` Ingo Molnar
2006-02-02 11:27                   ` Ingo Molnar
2006-02-02 12:19                     ` Herbert Xu
2006-02-02 12:17                   ` Herbert Xu
2006-02-02 13:54                     ` Ingo Molnar
2006-01-31 21:24       ` Ingo Molnar
2006-01-31 22:06         ` Herbert Xu
2006-02-01 10:42         ` Herbert Xu
2006-02-01 11:13           ` Ingo Molnar
2006-02-03  1:01           ` David S. Miller

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).