* inconsistent lock/deadlock crash, vanilla 3.3.4, 32bit, tcp
@ 2012-04-29  7:41 Denys Fedoryshchenko
  2012-04-29  8:27 ` Eric Dumazet
  0 siblings, 1 reply; 10+ messages in thread
From: Denys Fedoryshchenko @ 2012-04-29  7:41 UTC (permalink / raw)
  To: netdev, davem, jmorris, yoshfuji, kaber, linux-kernel

I apologize for the late-night post and for the amount of noise left in the
report. I have cleaned it up now and am resending with the maintainers on CC.

The server's job is proxying, with a very high rate of new connections.
At peak time the deadlock can easily be reproduced within 10-15 minutes.

The deadlock occurred on almost all 3.3-stable versions (tried 3.3.3 -
3.3.4). It is not easy to try older kernels,
but if required I can try.
Also, because the SYN rate is very high, I usually see:
[   51.612987] TCP: Possible SYN flooding on port 8080. Sending 
cookies.  Check SNMP counters.

  [  762.903868]
  [  762.903880] =================================
  [  762.903890] [ INFO: inconsistent lock state ]
  [  762.903903] 3.3.4-build-0061 #8 Not tainted
  [  762.904133] ---------------------------------
  [  762.904344] inconsistent {IN-SOFTIRQ-W} -> {SOFTIRQ-ON-W} usage.
  [  762.904542] squid/1603 [HC0[0]:SC0[0]:HE1:SE1] takes:
  [  762.904542]  (key#3){+.?...}, at: [<c0232cc4>] 
__percpu_counter_sum+0xd/0x58
  [  762.904542] {IN-SOFTIRQ-W} state was registered at:
  [  762.904542]   [<c0158b84>] __lock_acquire+0x284/0xc26
  [  762.904542]   [<c01598e8>] lock_acquire+0x71/0x85
  [  762.904542]   [<c0349765>] _raw_spin_lock+0x33/0x40
  [  762.904542]   [<c0232c93>] __percpu_counter_add+0x58/0x7c
  [  762.904542]   [<c02cfde1>] sk_clone_lock+0x1e5/0x200
  [  762.904542]   [<c0303ee4>] inet_csk_clone_lock+0xe/0x78
  [  762.904542]   [<c0315778>] tcp_create_openreq_child+0x1b/0x404
  [  762.904542]   [<c031339c>] tcp_v4_syn_recv_sock+0x32/0x1c1
  [  762.904542]   [<c031615a>] tcp_check_req+0x1fd/0x2d7
  [  762.904542]   [<c0313f77>] tcp_v4_do_rcv+0xab/0x194
  [  762.904542]   [<c03153bb>] tcp_v4_rcv+0x3b3/0x5cc
  [  762.904542]   [<c02fc0c4>] ip_local_deliver_finish+0x13a/0x1e9
  [  762.904542]   [<c02fc539>] NF_HOOK.clone.11+0x46/0x4d
  [  762.904542]   [<c02fc652>] ip_local_deliver+0x41/0x45
  [  762.904542]   [<c02fc4d1>] ip_rcv_finish+0x31a/0x33c
  [  762.904542]   [<c02fc539>] NF_HOOK.clone.11+0x46/0x4d
  [  762.904542]   [<c02fc857>] ip_rcv+0x201/0x23e
  [  762.904542]   [<c02daa3a>] __netif_receive_skb+0x319/0x368
  [  762.904542]   [<c02dac07>] netif_receive_skb+0x4e/0x7d
  [  762.904542]   [<c02dacf6>] napi_skb_finish+0x1e/0x34
  [  762.904542]   [<c02db122>] napi_gro_receive+0x20/0x24
  [  762.904542]   [<f85d1743>] e1000_receive_skb+0x3f/0x45 [e1000e]
  [  762.904542]   [<f85d3464>] e1000_clean_rx_irq+0x1f9/0x284 [e1000e]
  [  762.904542]   [<f85d3926>] e1000_clean+0x62/0x1f4 [e1000e]
  [  762.904542]   [<c02db228>] net_rx_action+0x90/0x160
  [  762.904542]   [<c012a445>] __do_softirq+0x7b/0x118
  [  762.904542] irq event stamp: 156915469
  [  762.904542] hardirqs last  enabled at (156915469): [<c019b4f4>] 
__slab_alloc.clone.58.clone.63+0xc4/0x2de
  [  762.904542] hardirqs last disabled at (156915468): [<c019b452>] 
__slab_alloc.clone.58.clone.63+0x22/0x2de
  [  762.904542] softirqs last  enabled at (156915466): [<c02ce677>] 
lock_sock_nested+0x64/0x6c
  [  762.904542] softirqs last disabled at (156915464): [<c0349914>] 
_raw_spin_lock_bh+0xe/0x45
  [  762.904542]
  [  762.904542] other info that might help us debug this:
  [  762.904542]  Possible unsafe locking scenario:
  [  762.904542]
  [  762.904542]        CPU0
  [  762.904542]        ----
  [  762.904542]   lock(key#3);
  [  762.904542]   <Interrupt>
  [  762.904542]     lock(key#3);
  [  762.904542]
  [  762.904542]  *** DEADLOCK ***
  [  762.904542]
  [  762.904542] 1 lock held by squid/1603:
  [  762.904542]  #0:  (sk_lock-AF_INET){+.+.+.}, at: [<c03055c0>] 
lock_sock+0xa/0xc
  [  762.904542]
  [  762.904542] stack backtrace:
  [  762.904542] Pid: 1603, comm: squid Not tainted 3.3.4-build-0061 #8
  [  762.904542] Call Trace:
  [  762.904542]  [<c0347b73>] ? printk+0x18/0x1d
  [  762.904542]  [<c015873a>] valid_state+0x1f6/0x201
  [  762.904542]  [<c0158816>] mark_lock+0xd1/0x1bb
  [  762.904542]  [<c015876b>] ? mark_lock+0x26/0x1bb
  [  762.904542]  [<c015805d>] ? check_usage_forwards+0x77/0x77
  [  762.904542]  [<c0158bf8>] __lock_acquire+0x2f8/0xc26
  [  762.904542]  [<c0159b8e>] ? mark_held_locks+0x5d/0x7b
  [  762.904542]  [<c0159cf6>] ? trace_hardirqs_on+0xb/0xd
  [  762.904542]  [<c0158dd4>] ? __lock_acquire+0x4d4/0xc26
  [  762.904542]  [<c01598e8>] lock_acquire+0x71/0x85
  [  762.904542]  [<c0232cc4>] ? __percpu_counter_sum+0xd/0x58
  [  762.904542]  [<c0349765>] _raw_spin_lock+0x33/0x40
  [  762.904542]  [<c0232cc4>] ? __percpu_counter_sum+0xd/0x58
  [  762.904542]  [<c0232cc4>] __percpu_counter_sum+0xd/0x58
  [  762.904542]  [<c02cebc4>] __sk_mem_schedule+0xdd/0x1c7
  [  762.904542]  [<c02d178d>] ? __alloc_skb+0x76/0x100
  [  762.904542]  [<c0305e8e>] sk_wmem_schedule+0x21/0x2d
  [  762.904542]  [<c0306370>] sk_stream_alloc_skb+0x42/0xaa
  [  762.904542]  [<c0306567>] tcp_sendmsg+0x18f/0x68b
  [  762.904542]  [<c031f3dc>] ? ip_fast_csum+0x30/0x30
  [  762.904542]  [<c0320193>] inet_sendmsg+0x53/0x5a
  [  762.904542]  [<c02cb633>] sock_aio_write+0xd2/0xda
  [  762.904542]  [<c015876b>] ? mark_lock+0x26/0x1bb
  [  762.904542]  [<c01a1017>] do_sync_write+0x9f/0xd9
  [  762.904542]  [<c01a2111>] ? file_free_rcu+0x2f/0x2f
  [  762.904542]  [<c01a17a1>] vfs_write+0x8f/0xab
  [  762.904542]  [<c01a284d>] ? fget_light+0x75/0x7c
  [  762.904542]  [<c01a1900>] sys_write+0x3d/0x5e
  [  762.904542]  [<c0349ec9>] syscall_call+0x7/0xb
  [  762.904542]  [<c0340000>] ? rp_sidt+0x41/0x83


---
Network engineer
Denys Fedoryshchenko

Dora Highway - Center Cebaco - 2nd Floor
Beirut, Lebanon
Tel:	+961 1 247373
E-Mail: denys@visp.net.lb


* Re: inconsistent lock/deadlock crash, vanilla 3.3.4, 32bit, tcp
  2012-04-29  7:41 inconsistent lock/deadlock crash, vanilla 3.3.4, 32bit, tcp Denys Fedoryshchenko
@ 2012-04-29  8:27 ` Eric Dumazet
  2012-04-29  8:40   ` Eric Dumazet
  2012-04-29  8:46   ` Denys Fedoryshchenko
  0 siblings, 2 replies; 10+ messages in thread
From: Eric Dumazet @ 2012-04-29  8:27 UTC (permalink / raw)
  To: Denys Fedoryshchenko
  Cc: netdev, davem, jmorris, yoshfuji, kaber, linux-kernel, Glauber Costa

On Sun, 2012-04-29 at 10:41 +0300, Denys Fedoryshchenko wrote:
> I apologize for the late-night post and for the amount of noise left in the
> report. I have cleaned it up now and am resending with the maintainers on CC.
> 
> The server's job is proxying, with a very high rate of new connections.
> At peak time the deadlock can easily be reproduced within 10-15 minutes.
> 
> The deadlock occurred on almost all 3.3-stable versions (tried 3.3.3 -
> 3.3.4). It is not easy to try older kernels,
> but if required I can try.
> Also, because the SYN rate is very high, I usually see:
> [   51.612987] TCP: Possible SYN flooding on port 8080. Sending 
> cookies.  Check SNMP counters.
> 
>   [  762.903868]
>   [  762.903880] =================================
>   [  762.903890] [ INFO: inconsistent lock state ]
>   [  762.903903] 3.3.4-build-0061 #8 Not tainted
>   [  762.904133] ---------------------------------
>   [  762.904344] inconsistent {IN-SOFTIRQ-W} -> {SOFTIRQ-ON-W} usage.
>   [  762.904542] squid/1603 [HC0[0]:SC0[0]:HE1:SE1] takes:
>   [  762.904542]  (key#3){+.?...}, at: [<c0232cc4>] 
> __percpu_counter_sum+0xd/0x58
>   [  762.904542] {IN-SOFTIRQ-W} state was registered at:
>   [  762.904542]   [<c0158b84>] __lock_acquire+0x284/0xc26
>   [  762.904542]   [<c01598e8>] lock_acquire+0x71/0x85
>   [  762.904542]   [<c0349765>] _raw_spin_lock+0x33/0x40
>   [  762.904542]   [<c0232c93>] __percpu_counter_add+0x58/0x7c
>   [  762.904542]   [<c02cfde1>] sk_clone_lock+0x1e5/0x200
>   [  762.904542]   [<c0303ee4>] inet_csk_clone_lock+0xe/0x78
>   [  762.904542]   [<c0315778>] tcp_create_openreq_child+0x1b/0x404
>   [  762.904542]   [<c031339c>] tcp_v4_syn_recv_sock+0x32/0x1c1
>   [  762.904542]   [<c031615a>] tcp_check_req+0x1fd/0x2d7
>   [  762.904542]   [<c0313f77>] tcp_v4_do_rcv+0xab/0x194
>   [  762.904542]   [<c03153bb>] tcp_v4_rcv+0x3b3/0x5cc
>   [  762.904542]   [<c02fc0c4>] ip_local_deliver_finish+0x13a/0x1e9
>   [  762.904542]   [<c02fc539>] NF_HOOK.clone.11+0x46/0x4d
>   [  762.904542]   [<c02fc652>] ip_local_deliver+0x41/0x45
>   [  762.904542]   [<c02fc4d1>] ip_rcv_finish+0x31a/0x33c
>   [  762.904542]   [<c02fc539>] NF_HOOK.clone.11+0x46/0x4d
>   [  762.904542]   [<c02fc857>] ip_rcv+0x201/0x23e
>   [  762.904542]   [<c02daa3a>] __netif_receive_skb+0x319/0x368
>   [  762.904542]   [<c02dac07>] netif_receive_skb+0x4e/0x7d
>   [  762.904542]   [<c02dacf6>] napi_skb_finish+0x1e/0x34
>   [  762.904542]   [<c02db122>] napi_gro_receive+0x20/0x24
>   [  762.904542]   [<f85d1743>] e1000_receive_skb+0x3f/0x45 [e1000e]
>   [  762.904542]   [<f85d3464>] e1000_clean_rx_irq+0x1f9/0x284 [e1000e]
>   [  762.904542]   [<f85d3926>] e1000_clean+0x62/0x1f4 [e1000e]
>   [  762.904542]   [<c02db228>] net_rx_action+0x90/0x160
>   [  762.904542]   [<c012a445>] __do_softirq+0x7b/0x118
>   [  762.904542] irq event stamp: 156915469
>   [  762.904542] hardirqs last  enabled at (156915469): [<c019b4f4>] 
> __slab_alloc.clone.58.clone.63+0xc4/0x2de
>   [  762.904542] hardirqs last disabled at (156915468): [<c019b452>] 
> __slab_alloc.clone.58.clone.63+0x22/0x2de
>   [  762.904542] softirqs last  enabled at (156915466): [<c02ce677>] 
> lock_sock_nested+0x64/0x6c
>   [  762.904542] softirqs last disabled at (156915464): [<c0349914>] 
> _raw_spin_lock_bh+0xe/0x45
>   [  762.904542]
>   [  762.904542] other info that might help us debug this:
>   [  762.904542]  Possible unsafe locking scenario:
>   [  762.904542]
>   [  762.904542]        CPU0
>   [  762.904542]        ----
>   [  762.904542]   lock(key#3);
>   [  762.904542]   <Interrupt>
>   [  762.904542]     lock(key#3);
>   [  762.904542]
>   [  762.904542]  *** DEADLOCK ***
>   [  762.904542]
>   [  762.904542] 1 lock held by squid/1603:
>   [  762.904542]  #0:  (sk_lock-AF_INET){+.+.+.}, at: [<c03055c0>] 
> lock_sock+0xa/0xc
>   [  762.904542]
>   [  762.904542] stack backtrace:
>   [  762.904542] Pid: 1603, comm: squid Not tainted 3.3.4-build-0061 #8
>   [  762.904542] Call Trace:
>   [  762.904542]  [<c0347b73>] ? printk+0x18/0x1d
>   [  762.904542]  [<c015873a>] valid_state+0x1f6/0x201
>   [  762.904542]  [<c0158816>] mark_lock+0xd1/0x1bb
>   [  762.904542]  [<c015876b>] ? mark_lock+0x26/0x1bb
>   [  762.904542]  [<c015805d>] ? check_usage_forwards+0x77/0x77
>   [  762.904542]  [<c0158bf8>] __lock_acquire+0x2f8/0xc26
>   [  762.904542]  [<c0159b8e>] ? mark_held_locks+0x5d/0x7b
>   [  762.904542]  [<c0159cf6>] ? trace_hardirqs_on+0xb/0xd
>   [  762.904542]  [<c0158dd4>] ? __lock_acquire+0x4d4/0xc26
>   [  762.904542]  [<c01598e8>] lock_acquire+0x71/0x85
>   [  762.904542]  [<c0232cc4>] ? __percpu_counter_sum+0xd/0x58
>   [  762.904542]  [<c0349765>] _raw_spin_lock+0x33/0x40
>   [  762.904542]  [<c0232cc4>] ? __percpu_counter_sum+0xd/0x58
>   [  762.904542]  [<c0232cc4>] __percpu_counter_sum+0xd/0x58
>   [  762.904542]  [<c02cebc4>] __sk_mem_schedule+0xdd/0x1c7
>   [  762.904542]  [<c02d178d>] ? __alloc_skb+0x76/0x100
>   [  762.904542]  [<c0305e8e>] sk_wmem_schedule+0x21/0x2d
>   [  762.904542]  [<c0306370>] sk_stream_alloc_skb+0x42/0xaa
>   [  762.904542]  [<c0306567>] tcp_sendmsg+0x18f/0x68b
>   [  762.904542]  [<c031f3dc>] ? ip_fast_csum+0x30/0x30
>   [  762.904542]  [<c0320193>] inet_sendmsg+0x53/0x5a
>   [  762.904542]  [<c02cb633>] sock_aio_write+0xd2/0xda
>   [  762.904542]  [<c015876b>] ? mark_lock+0x26/0x1bb
>   [  762.904542]  [<c01a1017>] do_sync_write+0x9f/0xd9
>   [  762.904542]  [<c01a2111>] ? file_free_rcu+0x2f/0x2f
>   [  762.904542]  [<c01a17a1>] vfs_write+0x8f/0xab
>   [  762.904542]  [<c01a284d>] ? fget_light+0x75/0x7c
>   [  762.904542]  [<c01a1900>] sys_write+0x3d/0x5e
>   [  762.904542]  [<c0349ec9>] syscall_call+0x7/0xb
>   [  762.904542]  [<c0340000>] ? rp_sidt+0x41/0x83
> 


OK, so under memory pressure we can end up calling
sk_sockets_allocated_read_positive() with softirqs enabled, and lockdep
complains...

This bug was probably added in 2008, in commit 1748376b6626a
(net: Use a percpu_counter for sockets_allocated).

I'll have to backport the following patch; can you test it, please?

Thanks!

 net/core/sock.c |    2 ++
 1 file changed, 2 insertions(+)

diff --git a/net/core/sock.c b/net/core/sock.c
index b2e14c0..08fc929 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -1818,7 +1818,9 @@ int __sk_mem_schedule(struct sock *sk, int size, int kind)
 
 		if (!sk_under_memory_pressure(sk))
 			return 1;
+		local_bh_disable();
 		alloc = sk_sockets_allocated_read_positive(sk);
+		local_bh_enable();
 		if (sk_prot_mem_limits(sk, 2) > alloc *
 		    sk_mem_pages(sk->sk_wmem_queued +
 				 atomic_read(&sk->sk_rmem_alloc) +

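To make the report above concrete: this is a single-CPU self-deadlock
scenario. __percpu_counter_sum() takes the counter's spinlock with a plain
spin_lock(), while the TCP receive softirq can take the very same lock via
sk_clone_lock() -> __percpu_counter_add() (whenever the per-cpu batch
overflows). If that softirq fires on the CPU already holding the lock in
process context, it spins forever. Below is a minimal, simplified model of
the two paths - not the real lib/percpu_counter.c code, and the toy_* names
are made up - and of what the local_bh_disable()/local_bh_enable() pair in
the patch changes:

struct toy_percpu_counter {
	spinlock_t lock;
	long	   count;	/* global part; per-cpu deltas omitted */
};

/* Process context, softirqs still enabled (the __sk_mem_schedule() path): */
static long toy_counter_sum(struct toy_percpu_counter *fbc)
{
	long ret;

	spin_lock(&fbc->lock);		/* a softirq can interrupt us here... */
	ret = fbc->count;
	spin_unlock(&fbc->lock);
	return ret;
}

/* Softirq context (the sk_clone_lock() path): tries to take the same
 * lock on the same CPU -> deadlock. */
static void toy_counter_add(struct toy_percpu_counter *fbc, long amount)
{
	spin_lock(&fbc->lock);
	fbc->count += amount;
	spin_unlock(&fbc->lock);
}

/* What the patch does at the call site: keep softirqs off around the
 * unsafe sum, so the add path cannot run on this CPU while we hold
 * fbc->lock. */
static long toy_counter_sum_bh_safe(struct toy_percpu_counter *fbc)
{
	long ret;

	local_bh_disable();
	ret = toy_counter_sum(fbc);
	local_bh_enable();
	return ret;
}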

* Re: inconsistent lock/deadlock crash, vanilla 3.3.4, 32bit, tcp
  2012-04-29  8:27 ` Eric Dumazet
@ 2012-04-29  8:40   ` Eric Dumazet
  2012-04-29  8:44     ` Eric Dumazet
  2012-04-29  8:46   ` Denys Fedoryshchenko
  1 sibling, 1 reply; 10+ messages in thread
From: Eric Dumazet @ 2012-04-29  8:40 UTC (permalink / raw)
  To: Denys Fedoryshchenko
  Cc: netdev, davem, jmorris, yoshfuji, kaber, linux-kernel, Glauber Costa

On Sun, 2012-04-29 at 10:27 +0200, Eric Dumazet wrote:
> On Sun, 2012-04-29 at 10:41 +0300, Denys Fedoryshchenko wrote:
> > I apologize for the late-night post and for the amount of noise left in the
> > report. I have cleaned it up now and am resending with the maintainers on CC.
> > 
> > The server's job is proxying, with a very high rate of new connections.
> > At peak time the deadlock can easily be reproduced within 10-15 minutes.
> > 
> > The deadlock occurred on almost all 3.3-stable versions (tried 3.3.3 -
> > 3.3.4). It is not easy to try older kernels,
> > but if required I can try.
> > Also, because the SYN rate is very high, I usually see:
> > [   51.612987] TCP: Possible SYN flooding on port 8080. Sending 
> > cookies.  Check SNMP counters.
> > 
> >   [  762.903868]
> >   [  762.903880] =================================
> >   [  762.903890] [ INFO: inconsistent lock state ]
> >   [  762.903903] 3.3.4-build-0061 #8 Not tainted
> >   [  762.904133] ---------------------------------
> >   [  762.904344] inconsistent {IN-SOFTIRQ-W} -> {SOFTIRQ-ON-W} usage.
> >   [  762.904542] squid/1603 [HC0[0]:SC0[0]:HE1:SE1] takes:
> >   [  762.904542]  (key#3){+.?...}, at: [<c0232cc4>] 
> > __percpu_counter_sum+0xd/0x58
> >   [  762.904542] {IN-SOFTIRQ-W} state was registered at:
> >   [  762.904542]   [<c0158b84>] __lock_acquire+0x284/0xc26
> >   [  762.904542]   [<c01598e8>] lock_acquire+0x71/0x85
> >   [  762.904542]   [<c0349765>] _raw_spin_lock+0x33/0x40
> >   [  762.904542]   [<c0232c93>] __percpu_counter_add+0x58/0x7c
> >   [  762.904542]   [<c02cfde1>] sk_clone_lock+0x1e5/0x200
> >   [  762.904542]   [<c0303ee4>] inet_csk_clone_lock+0xe/0x78
> >   [  762.904542]   [<c0315778>] tcp_create_openreq_child+0x1b/0x404
> >   [  762.904542]   [<c031339c>] tcp_v4_syn_recv_sock+0x32/0x1c1
> >   [  762.904542]   [<c031615a>] tcp_check_req+0x1fd/0x2d7
> >   [  762.904542]   [<c0313f77>] tcp_v4_do_rcv+0xab/0x194
> >   [  762.904542]   [<c03153bb>] tcp_v4_rcv+0x3b3/0x5cc
> >   [  762.904542]   [<c02fc0c4>] ip_local_deliver_finish+0x13a/0x1e9
> >   [  762.904542]   [<c02fc539>] NF_HOOK.clone.11+0x46/0x4d
> >   [  762.904542]   [<c02fc652>] ip_local_deliver+0x41/0x45
> >   [  762.904542]   [<c02fc4d1>] ip_rcv_finish+0x31a/0x33c
> >   [  762.904542]   [<c02fc539>] NF_HOOK.clone.11+0x46/0x4d
> >   [  762.904542]   [<c02fc857>] ip_rcv+0x201/0x23e
> >   [  762.904542]   [<c02daa3a>] __netif_receive_skb+0x319/0x368
> >   [  762.904542]   [<c02dac07>] netif_receive_skb+0x4e/0x7d
> >   [  762.904542]   [<c02dacf6>] napi_skb_finish+0x1e/0x34
> >   [  762.904542]   [<c02db122>] napi_gro_receive+0x20/0x24
> >   [  762.904542]   [<f85d1743>] e1000_receive_skb+0x3f/0x45 [e1000e]
> >   [  762.904542]   [<f85d3464>] e1000_clean_rx_irq+0x1f9/0x284 [e1000e]
> >   [  762.904542]   [<f85d3926>] e1000_clean+0x62/0x1f4 [e1000e]
> >   [  762.904542]   [<c02db228>] net_rx_action+0x90/0x160
> >   [  762.904542]   [<c012a445>] __do_softirq+0x7b/0x118
> >   [  762.904542] irq event stamp: 156915469
> >   [  762.904542] hardirqs last  enabled at (156915469): [<c019b4f4>] 
> > __slab_alloc.clone.58.clone.63+0xc4/0x2de
> >   [  762.904542] hardirqs last disabled at (156915468): [<c019b452>] 
> > __slab_alloc.clone.58.clone.63+0x22/0x2de
> >   [  762.904542] softirqs last  enabled at (156915466): [<c02ce677>] 
> > lock_sock_nested+0x64/0x6c
> >   [  762.904542] softirqs last disabled at (156915464): [<c0349914>] 
> > _raw_spin_lock_bh+0xe/0x45
> >   [  762.904542]
> >   [  762.904542] other info that might help us debug this:
> >   [  762.904542]  Possible unsafe locking scenario:
> >   [  762.904542]
> >   [  762.904542]        CPU0
> >   [  762.904542]        ----
> >   [  762.904542]   lock(key#3);
> >   [  762.904542]   <Interrupt>
> >   [  762.904542]     lock(key#3);
> >   [  762.904542]
> >   [  762.904542]  *** DEADLOCK ***
> >   [  762.904542]
> >   [  762.904542] 1 lock held by squid/1603:
> >   [  762.904542]  #0:  (sk_lock-AF_INET){+.+.+.}, at: [<c03055c0>] 
> > lock_sock+0xa/0xc
> >   [  762.904542]
> >   [  762.904542] stack backtrace:
> >   [  762.904542] Pid: 1603, comm: squid Not tainted 3.3.4-build-0061 #8
> >   [  762.904542] Call Trace:
> >   [  762.904542]  [<c0347b73>] ? printk+0x18/0x1d
> >   [  762.904542]  [<c015873a>] valid_state+0x1f6/0x201
> >   [  762.904542]  [<c0158816>] mark_lock+0xd1/0x1bb
> >   [  762.904542]  [<c015876b>] ? mark_lock+0x26/0x1bb
> >   [  762.904542]  [<c015805d>] ? check_usage_forwards+0x77/0x77
> >   [  762.904542]  [<c0158bf8>] __lock_acquire+0x2f8/0xc26
> >   [  762.904542]  [<c0159b8e>] ? mark_held_locks+0x5d/0x7b
> >   [  762.904542]  [<c0159cf6>] ? trace_hardirqs_on+0xb/0xd
> >   [  762.904542]  [<c0158dd4>] ? __lock_acquire+0x4d4/0xc26
> >   [  762.904542]  [<c01598e8>] lock_acquire+0x71/0x85
> >   [  762.904542]  [<c0232cc4>] ? __percpu_counter_sum+0xd/0x58
> >   [  762.904542]  [<c0349765>] _raw_spin_lock+0x33/0x40
> >   [  762.904542]  [<c0232cc4>] ? __percpu_counter_sum+0xd/0x58
> >   [  762.904542]  [<c0232cc4>] __percpu_counter_sum+0xd/0x58
> >   [  762.904542]  [<c02cebc4>] __sk_mem_schedule+0xdd/0x1c7
> >   [  762.904542]  [<c02d178d>] ? __alloc_skb+0x76/0x100
> >   [  762.904542]  [<c0305e8e>] sk_wmem_schedule+0x21/0x2d
> >   [  762.904542]  [<c0306370>] sk_stream_alloc_skb+0x42/0xaa
> >   [  762.904542]  [<c0306567>] tcp_sendmsg+0x18f/0x68b
> >   [  762.904542]  [<c031f3dc>] ? ip_fast_csum+0x30/0x30
> >   [  762.904542]  [<c0320193>] inet_sendmsg+0x53/0x5a
> >   [  762.904542]  [<c02cb633>] sock_aio_write+0xd2/0xda
> >   [  762.904542]  [<c015876b>] ? mark_lock+0x26/0x1bb
> >   [  762.904542]  [<c01a1017>] do_sync_write+0x9f/0xd9
> >   [  762.904542]  [<c01a2111>] ? file_free_rcu+0x2f/0x2f
> >   [  762.904542]  [<c01a17a1>] vfs_write+0x8f/0xab
> >   [  762.904542]  [<c01a284d>] ? fget_light+0x75/0x7c
> >   [  762.904542]  [<c01a1900>] sys_write+0x3d/0x5e
> >   [  762.904542]  [<c0349ec9>] syscall_call+0x7/0xb
> >   [  762.904542]  [<c0340000>] ? rp_sidt+0x41/0x83
> > 
> 
> 
> OK, so under memory pressure we can end up calling
> sk_sockets_allocated_read_positive() with softirqs enabled, and lockdep
> complains...
> 
> This bug was probably added in 2008, in commit 1748376b6626a
> (net: Use a percpu_counter for sockets_allocated)

Hmm, no, that commit was fine.

The bug was in fact added by Glauber Costa in commit 180d8cd942ce336b2c869
(foundations of per-cgroup memory pressure controlling.),

because he replaced the safe percpu_counter_read_positive() call with the
unsafe percpu_counter_sum_positive() in
sk_sockets_allocated_read_positive().

But anyway, the patch I sent should fix the problem.

> 
> I'll have to backport the following patch, can you test it please ?
> 
> Thanks !
> 
>  net/core/sock.c |    2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/net/core/sock.c b/net/core/sock.c
> index b2e14c0..08fc929 100644
> --- a/net/core/sock.c
> +++ b/net/core/sock.c
> @@ -1818,7 +1818,9 @@ int __sk_mem_schedule(struct sock *sk, int size, int kind)
>  
>  		if (!sk_under_memory_pressure(sk))
>  			return 1;
> +		local_bh_disable();
>  		alloc = sk_sockets_allocated_read_positive(sk);
> +		local_bh_enable();
>  		if (sk_prot_mem_limits(sk, 2) > alloc *
>  		    sk_mem_pages(sk->sk_wmem_queued +
>  				 atomic_read(&sk->sk_rmem_alloc) +
> 

* Re: inconsistent lock/deadlock crash, vanilla 3.3.4, 32bit, tcp
  2012-04-29  8:40   ` Eric Dumazet
@ 2012-04-29  8:44     ` Eric Dumazet
  2012-04-30  4:45       ` Glauber Costa
  0 siblings, 1 reply; 10+ messages in thread
From: Eric Dumazet @ 2012-04-29  8:44 UTC (permalink / raw)
  To: Denys Fedoryshchenko
  Cc: netdev, davem, jmorris, yoshfuji, kaber, linux-kernel, Glauber Costa

On Sun, 2012-04-29 at 10:40 +0200, Eric Dumazet wrote:
> On Sun, 2012-04-29 at 10:27 +0200, Eric Dumazet wrote:
> > On Sun, 2012-04-29 at 10:41 +0300, Denys Fedoryshchenko wrote:
> > > I apologize for the late-night post and for the amount of noise left in the
> > > report. I have cleaned it up now and am resending with the maintainers on CC.
> > > 
> > > The server's job is proxying, with a very high rate of new connections.
> > > At peak time the deadlock can easily be reproduced within 10-15 minutes.
> > > 
> > > The deadlock occurred on almost all 3.3-stable versions (tried 3.3.3 -
> > > 3.3.4). It is not easy to try older kernels,
> > > but if required I can try.
> > > Also, because the SYN rate is very high, I usually see:
> > > [   51.612987] TCP: Possible SYN flooding on port 8080. Sending 
> > > cookies.  Check SNMP counters.
> > > 
> > >   [  762.903868]
> > >   [  762.903880] =================================
> > >   [  762.903890] [ INFO: inconsistent lock state ]
> > >   [  762.903903] 3.3.4-build-0061 #8 Not tainted
> > >   [  762.904133] ---------------------------------
> > >   [  762.904344] inconsistent {IN-SOFTIRQ-W} -> {SOFTIRQ-ON-W} usage.
> > >   [  762.904542] squid/1603 [HC0[0]:SC0[0]:HE1:SE1] takes:
> > >   [  762.904542]  (key#3){+.?...}, at: [<c0232cc4>] 
> > > __percpu_counter_sum+0xd/0x58
> > >   [  762.904542] {IN-SOFTIRQ-W} state was registered at:
> > >   [  762.904542]   [<c0158b84>] __lock_acquire+0x284/0xc26
> > >   [  762.904542]   [<c01598e8>] lock_acquire+0x71/0x85
> > >   [  762.904542]   [<c0349765>] _raw_spin_lock+0x33/0x40
> > >   [  762.904542]   [<c0232c93>] __percpu_counter_add+0x58/0x7c
> > >   [  762.904542]   [<c02cfde1>] sk_clone_lock+0x1e5/0x200
> > >   [  762.904542]   [<c0303ee4>] inet_csk_clone_lock+0xe/0x78
> > >   [  762.904542]   [<c0315778>] tcp_create_openreq_child+0x1b/0x404
> > >   [  762.904542]   [<c031339c>] tcp_v4_syn_recv_sock+0x32/0x1c1
> > >   [  762.904542]   [<c031615a>] tcp_check_req+0x1fd/0x2d7
> > >   [  762.904542]   [<c0313f77>] tcp_v4_do_rcv+0xab/0x194
> > >   [  762.904542]   [<c03153bb>] tcp_v4_rcv+0x3b3/0x5cc
> > >   [  762.904542]   [<c02fc0c4>] ip_local_deliver_finish+0x13a/0x1e9
> > >   [  762.904542]   [<c02fc539>] NF_HOOK.clone.11+0x46/0x4d
> > >   [  762.904542]   [<c02fc652>] ip_local_deliver+0x41/0x45
> > >   [  762.904542]   [<c02fc4d1>] ip_rcv_finish+0x31a/0x33c
> > >   [  762.904542]   [<c02fc539>] NF_HOOK.clone.11+0x46/0x4d
> > >   [  762.904542]   [<c02fc857>] ip_rcv+0x201/0x23e
> > >   [  762.904542]   [<c02daa3a>] __netif_receive_skb+0x319/0x368
> > >   [  762.904542]   [<c02dac07>] netif_receive_skb+0x4e/0x7d
> > >   [  762.904542]   [<c02dacf6>] napi_skb_finish+0x1e/0x34
> > >   [  762.904542]   [<c02db122>] napi_gro_receive+0x20/0x24
> > >   [  762.904542]   [<f85d1743>] e1000_receive_skb+0x3f/0x45 [e1000e]
> > >   [  762.904542]   [<f85d3464>] e1000_clean_rx_irq+0x1f9/0x284 [e1000e]
> > >   [  762.904542]   [<f85d3926>] e1000_clean+0x62/0x1f4 [e1000e]
> > >   [  762.904542]   [<c02db228>] net_rx_action+0x90/0x160
> > >   [  762.904542]   [<c012a445>] __do_softirq+0x7b/0x118
> > >   [  762.904542] irq event stamp: 156915469
> > >   [  762.904542] hardirqs last  enabled at (156915469): [<c019b4f4>] 
> > > __slab_alloc.clone.58.clone.63+0xc4/0x2de
> > >   [  762.904542] hardirqs last disabled at (156915468): [<c019b452>] 
> > > __slab_alloc.clone.58.clone.63+0x22/0x2de
> > >   [  762.904542] softirqs last  enabled at (156915466): [<c02ce677>] 
> > > lock_sock_nested+0x64/0x6c
> > >   [  762.904542] softirqs last disabled at (156915464): [<c0349914>] 
> > > _raw_spin_lock_bh+0xe/0x45
> > >   [  762.904542]
> > >   [  762.904542] other info that might help us debug this:
> > >   [  762.904542]  Possible unsafe locking scenario:
> > >   [  762.904542]
> > >   [  762.904542]        CPU0
> > >   [  762.904542]        ----
> > >   [  762.904542]   lock(key#3);
> > >   [  762.904542]   <Interrupt>
> > >   [  762.904542]     lock(key#3);
> > >   [  762.904542]
> > >   [  762.904542]  *** DEADLOCK ***
> > >   [  762.904542]
> > >   [  762.904542] 1 lock held by squid/1603:
> > >   [  762.904542]  #0:  (sk_lock-AF_INET){+.+.+.}, at: [<c03055c0>] 
> > > lock_sock+0xa/0xc
> > >   [  762.904542]
> > >   [  762.904542] stack backtrace:
> > >   [  762.904542] Pid: 1603, comm: squid Not tainted 3.3.4-build-0061 #8
> > >   [  762.904542] Call Trace:
> > >   [  762.904542]  [<c0347b73>] ? printk+0x18/0x1d
> > >   [  762.904542]  [<c015873a>] valid_state+0x1f6/0x201
> > >   [  762.904542]  [<c0158816>] mark_lock+0xd1/0x1bb
> > >   [  762.904542]  [<c015876b>] ? mark_lock+0x26/0x1bb
> > >   [  762.904542]  [<c015805d>] ? check_usage_forwards+0x77/0x77
> > >   [  762.904542]  [<c0158bf8>] __lock_acquire+0x2f8/0xc26
> > >   [  762.904542]  [<c0159b8e>] ? mark_held_locks+0x5d/0x7b
> > >   [  762.904542]  [<c0159cf6>] ? trace_hardirqs_on+0xb/0xd
> > >   [  762.904542]  [<c0158dd4>] ? __lock_acquire+0x4d4/0xc26
> > >   [  762.904542]  [<c01598e8>] lock_acquire+0x71/0x85
> > >   [  762.904542]  [<c0232cc4>] ? __percpu_counter_sum+0xd/0x58
> > >   [  762.904542]  [<c0349765>] _raw_spin_lock+0x33/0x40
> > >   [  762.904542]  [<c0232cc4>] ? __percpu_counter_sum+0xd/0x58
> > >   [  762.904542]  [<c0232cc4>] __percpu_counter_sum+0xd/0x58
> > >   [  762.904542]  [<c02cebc4>] __sk_mem_schedule+0xdd/0x1c7
> > >   [  762.904542]  [<c02d178d>] ? __alloc_skb+0x76/0x100
> > >   [  762.904542]  [<c0305e8e>] sk_wmem_schedule+0x21/0x2d
> > >   [  762.904542]  [<c0306370>] sk_stream_alloc_skb+0x42/0xaa
> > >   [  762.904542]  [<c0306567>] tcp_sendmsg+0x18f/0x68b
> > >   [  762.904542]  [<c031f3dc>] ? ip_fast_csum+0x30/0x30
> > >   [  762.904542]  [<c0320193>] inet_sendmsg+0x53/0x5a
> > >   [  762.904542]  [<c02cb633>] sock_aio_write+0xd2/0xda
> > >   [  762.904542]  [<c015876b>] ? mark_lock+0x26/0x1bb
> > >   [  762.904542]  [<c01a1017>] do_sync_write+0x9f/0xd9
> > >   [  762.904542]  [<c01a2111>] ? file_free_rcu+0x2f/0x2f
> > >   [  762.904542]  [<c01a17a1>] vfs_write+0x8f/0xab
> > >   [  762.904542]  [<c01a284d>] ? fget_light+0x75/0x7c
> > >   [  762.904542]  [<c01a1900>] sys_write+0x3d/0x5e
> > >   [  762.904542]  [<c0349ec9>] syscall_call+0x7/0xb
> > >   [  762.904542]  [<c0340000>] ? rp_sidt+0x41/0x83
> > > 
> > 
> > 
> > OK, so under memory pressure we can end up calling
> > sk_sockets_allocated_read_positive() with softirqs enabled, and lockdep
> > complains...
> > 
> > This bug was probably added in 2008, in commit 1748376b6626a
> > (net: Use a percpu_counter for sockets_allocated)
> 
> Hmm, no, that commit was fine.
> 
> The bug was in fact added by Glauber Costa in commit 180d8cd942ce336b2c869
> (foundations of per-cgroup memory pressure controlling.),
> 
> because he replaced the safe percpu_counter_read_positive() call with the
> unsafe percpu_counter_sum_positive() in
> sk_sockets_allocated_read_positive().
> 
> But anyway, the patch I sent should fix the problem.

Thinking about it again, I am not sure why Glauber made this change. He
probably made a typo or something.

Maybe we should revert and use the following patch instead, since we
don't need a precise sum at this point.

This also better matches the intent of "sk_sockets_allocated_read_positive".

diff --git a/include/net/sock.h b/include/net/sock.h
index 188532e..5a0a58a 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -1129,9 +1129,9 @@ sk_sockets_allocated_read_positive(struct sock *sk)
 	struct proto *prot = sk->sk_prot;
 
 	if (mem_cgroup_sockets_enabled && sk->sk_cgrp)
-		return percpu_counter_sum_positive(sk->sk_cgrp->sockets_allocated);
+		return percpu_counter_read_positive(sk->sk_cgrp->sockets_allocated);
 
-	return percpu_counter_sum_positive(prot->sockets_allocated);
+	return percpu_counter_read_positive(prot->sockets_allocated);
 }
 
 static inline int

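For reference, the difference between the two accessors is what makes this
revert safe. Below is a rough sketch of their semantics (simplified, field
names assumed from the 3.3-era struct percpu_counter; see
lib/percpu_counter.c for the real implementation, and the sketch_* names
are made up):

/* Approximate and lock-free: safe with softirqs enabled, but may be a
 * little stale because per-cpu deltas below the batch size are not
 * folded in. */
static inline s64 sketch_read_positive(struct percpu_counter *fbc)
{
	s64 val = fbc->count;

	return val > 0 ? val : 0;
}

/* Exact: folds in every CPU's delta under fbc->lock, which is exactly
 * the lock the softirq add path may take -> must not be used here with
 * softirqs enabled. */
static inline s64 sketch_sum_positive(struct percpu_counter *fbc)
{
	s64 ret;
	int cpu;

	raw_spin_lock(&fbc->lock);
	ret = fbc->count;
	for_each_online_cpu(cpu)
		ret += *per_cpu_ptr(fbc->counters, cpu);
	raw_spin_unlock(&fbc->lock);

	return ret > 0 ? ret : 0;
}

Since the memory-pressure check in __sk_mem_schedule() only needs an
order-of-magnitude figure, the stale lock-free read is good enough.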



* Re: inconsistent lock/deadlock crash, vanilla 3.3.4, 32bit, tcp
  2012-04-29  8:27 ` Eric Dumazet
  2012-04-29  8:40   ` Eric Dumazet
@ 2012-04-29  8:46   ` Denys Fedoryshchenko
  2012-04-29  9:21     ` Eric Dumazet
  1 sibling, 1 reply; 10+ messages in thread
From: Denys Fedoryshchenko @ 2012-04-29  8:46 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: netdev, davem, jmorris, yoshfuji, kaber, linux-kernel, Glauber Costa

On 2012-04-29 11:27, Eric Dumazet wrote:
>
>
> OK, so under memory pressure we can end up calling
> sk_sockets_allocated_read_positive() with softirqs enabled, and lockdep
> complains...
>
> This bug was probably added in 2008, in commit 1748376b6626a
> (net: Use a percpu_counter for sockets_allocated)
>
> I'll have to backport the following patch, can you test it please ?
>
> Thanks !
>
>  net/core/sock.c |    2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/net/core/sock.c b/net/core/sock.c
> index b2e14c0..08fc929 100644
> --- a/net/core/sock.c
> +++ b/net/core/sock.c
> @@ -1818,7 +1818,9 @@ int __sk_mem_schedule(struct sock *sk, int
> size, int kind)
>
>  		if (!sk_under_memory_pressure(sk))
>  			return 1;
> +		local_bh_disable();
>  		alloc = sk_sockets_allocated_read_positive(sk);
> +		local_bh_enable();
>  		if (sk_prot_mem_limits(sk, 2) > alloc *
>  		    sk_mem_pages(sk->sk_wmem_queued +
>  				 atomic_read(&sk->sk_rmem_alloc) +

I will try to deploy it and test as soon as someone power-cycles the
server on the other side.



* Re: inconsistent lock/deadlock crash, vanilla 3.3.4, 32bit, tcp
  2012-04-29  8:46   ` Denys Fedoryshchenko
@ 2012-04-29  9:21     ` Eric Dumazet
  2012-04-29 21:32       ` Neal Cardwell
  2012-04-30 17:38       ` David Miller
  0 siblings, 2 replies; 10+ messages in thread
From: Eric Dumazet @ 2012-04-29  9:21 UTC (permalink / raw)
  To: Denys Fedoryshchenko
  Cc: netdev, davem, jmorris, yoshfuji, kaber, linux-kernel, Glauber Costa

From: Eric Dumazet <edumazet@google.com>

On Sun, 2012-04-29 at 11:46 +0300, Denys Fedoryshchenko wrote:

> I will try to deploy it and test as soon as someone power-cycles the
> server on the other side.
> 

Prefer the second patch; it is the better one. I am submitting an official
patch right now.

Thanks!

[PATCH] net: fix sk_sockets_allocated_read_positive

Denys Fedoryshchenko reported frequent crashes on a proxy server and kindly
provided a lockdep report that explains it all :


  [  762.903868]
  [  762.903880] =================================
  [  762.903890] [ INFO: inconsistent lock state ]
  [  762.903903] 3.3.4-build-0061 #8 Not tainted
  [  762.904133] ---------------------------------
  [  762.904344] inconsistent {IN-SOFTIRQ-W} -> {SOFTIRQ-ON-W} usage.
  [  762.904542] squid/1603 [HC0[0]:SC0[0]:HE1:SE1] takes:
  [  762.904542]  (key#3){+.?...}, at: [<c0232cc4>] 
__percpu_counter_sum+0xd/0x58
  [  762.904542] {IN-SOFTIRQ-W} state was registered at:
  [  762.904542]   [<c0158b84>] __lock_acquire+0x284/0xc26
  [  762.904542]   [<c01598e8>] lock_acquire+0x71/0x85
  [  762.904542]   [<c0349765>] _raw_spin_lock+0x33/0x40
  [  762.904542]   [<c0232c93>] __percpu_counter_add+0x58/0x7c
  [  762.904542]   [<c02cfde1>] sk_clone_lock+0x1e5/0x200
  [  762.904542]   [<c0303ee4>] inet_csk_clone_lock+0xe/0x78
  [  762.904542]   [<c0315778>] tcp_create_openreq_child+0x1b/0x404
  [  762.904542]   [<c031339c>] tcp_v4_syn_recv_sock+0x32/0x1c1
  [  762.904542]   [<c031615a>] tcp_check_req+0x1fd/0x2d7
  [  762.904542]   [<c0313f77>] tcp_v4_do_rcv+0xab/0x194
  [  762.904542]   [<c03153bb>] tcp_v4_rcv+0x3b3/0x5cc
  [  762.904542]   [<c02fc0c4>] ip_local_deliver_finish+0x13a/0x1e9
  [  762.904542]   [<c02fc539>] NF_HOOK.clone.11+0x46/0x4d
  [  762.904542]   [<c02fc652>] ip_local_deliver+0x41/0x45
  [  762.904542]   [<c02fc4d1>] ip_rcv_finish+0x31a/0x33c
  [  762.904542]   [<c02fc539>] NF_HOOK.clone.11+0x46/0x4d
  [  762.904542]   [<c02fc857>] ip_rcv+0x201/0x23e
  [  762.904542]   [<c02daa3a>] __netif_receive_skb+0x319/0x368
  [  762.904542]   [<c02dac07>] netif_receive_skb+0x4e/0x7d
  [  762.904542]   [<c02dacf6>] napi_skb_finish+0x1e/0x34
  [  762.904542]   [<c02db122>] napi_gro_receive+0x20/0x24
  [  762.904542]   [<f85d1743>] e1000_receive_skb+0x3f/0x45 [e1000e]
  [  762.904542]   [<f85d3464>] e1000_clean_rx_irq+0x1f9/0x284 [e1000e]
  [  762.904542]   [<f85d3926>] e1000_clean+0x62/0x1f4 [e1000e]
  [  762.904542]   [<c02db228>] net_rx_action+0x90/0x160
  [  762.904542]   [<c012a445>] __do_softirq+0x7b/0x118
  [  762.904542] irq event stamp: 156915469
  [  762.904542] hardirqs last  enabled at (156915469): [<c019b4f4>] 
__slab_alloc.clone.58.clone.63+0xc4/0x2de
  [  762.904542] hardirqs last disabled at (156915468): [<c019b452>] 
__slab_alloc.clone.58.clone.63+0x22/0x2de
  [  762.904542] softirqs last  enabled at (156915466): [<c02ce677>] 
lock_sock_nested+0x64/0x6c
  [  762.904542] softirqs last disabled at (156915464): [<c0349914>] 
_raw_spin_lock_bh+0xe/0x45
  [  762.904542]
  [  762.904542] other info that might help us debug this:
  [  762.904542]  Possible unsafe locking scenario:
  [  762.904542]
  [  762.904542]        CPU0
  [  762.904542]        ----
  [  762.904542]   lock(key#3);
  [  762.904542]   <Interrupt>
  [  762.904542]     lock(key#3);
  [  762.904542]
  [  762.904542]  *** DEADLOCK ***
  [  762.904542]
  [  762.904542] 1 lock held by squid/1603:
  [  762.904542]  #0:  (sk_lock-AF_INET){+.+.+.}, at: [<c03055c0>] 
lock_sock+0xa/0xc
  [  762.904542]
  [  762.904542] stack backtrace:
  [  762.904542] Pid: 1603, comm: squid Not tainted 3.3.4-build-0061 #8
  [  762.904542] Call Trace:
  [  762.904542]  [<c0347b73>] ? printk+0x18/0x1d
  [  762.904542]  [<c015873a>] valid_state+0x1f6/0x201
  [  762.904542]  [<c0158816>] mark_lock+0xd1/0x1bb
  [  762.904542]  [<c015876b>] ? mark_lock+0x26/0x1bb
  [  762.904542]  [<c015805d>] ? check_usage_forwards+0x77/0x77
  [  762.904542]  [<c0158bf8>] __lock_acquire+0x2f8/0xc26
  [  762.904542]  [<c0159b8e>] ? mark_held_locks+0x5d/0x7b
  [  762.904542]  [<c0159cf6>] ? trace_hardirqs_on+0xb/0xd
  [  762.904542]  [<c0158dd4>] ? __lock_acquire+0x4d4/0xc26
  [  762.904542]  [<c01598e8>] lock_acquire+0x71/0x85
  [  762.904542]  [<c0232cc4>] ? __percpu_counter_sum+0xd/0x58
  [  762.904542]  [<c0349765>] _raw_spin_lock+0x33/0x40
  [  762.904542]  [<c0232cc4>] ? __percpu_counter_sum+0xd/0x58
  [  762.904542]  [<c0232cc4>] __percpu_counter_sum+0xd/0x58
  [  762.904542]  [<c02cebc4>] __sk_mem_schedule+0xdd/0x1c7
  [  762.904542]  [<c02d178d>] ? __alloc_skb+0x76/0x100
  [  762.904542]  [<c0305e8e>] sk_wmem_schedule+0x21/0x2d
  [  762.904542]  [<c0306370>] sk_stream_alloc_skb+0x42/0xaa
  [  762.904542]  [<c0306567>] tcp_sendmsg+0x18f/0x68b
  [  762.904542]  [<c031f3dc>] ? ip_fast_csum+0x30/0x30
  [  762.904542]  [<c0320193>] inet_sendmsg+0x53/0x5a
  [  762.904542]  [<c02cb633>] sock_aio_write+0xd2/0xda
  [  762.904542]  [<c015876b>] ? mark_lock+0x26/0x1bb
  [  762.904542]  [<c01a1017>] do_sync_write+0x9f/0xd9
  [  762.904542]  [<c01a2111>] ? file_free_rcu+0x2f/0x2f
  [  762.904542]  [<c01a17a1>] vfs_write+0x8f/0xab
  [  762.904542]  [<c01a284d>] ? fget_light+0x75/0x7c
  [  762.904542]  [<c01a1900>] sys_write+0x3d/0x5e
  [  762.904542]  [<c0349ec9>] syscall_call+0x7/0xb
  [  762.904542]  [<c0340000>] ? rp_sidt+0x41/0x83


The bug is that sk_sockets_allocated_read_positive() calls
percpu_counter_sum_positive() without BHs being disabled.

This bug was added in commit 180d8cd942ce33
(foundations of per-cgroup memory pressure controlling.), since the previous
code used percpu_counter_read_positive(), which is IRQ safe.

In __sk_mem_schedule() we don't need the precise count of allocated
sockets and can revert to the previous behavior.

Reported-by: Denys Fedoryshchenko <denys@visp.net.lb>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Glauber Costa <glommer@parallels.com>
---
 include/net/sock.h |    4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/include/net/sock.h b/include/net/sock.h
index 188532e..5a0a58a 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -1129,9 +1129,9 @@ sk_sockets_allocated_read_positive(struct sock *sk)
 	struct proto *prot = sk->sk_prot;
 
 	if (mem_cgroup_sockets_enabled && sk->sk_cgrp)
-		return percpu_counter_sum_positive(sk->sk_cgrp->sockets_allocated);
+		return percpu_counter_read_positive(sk->sk_cgrp->sockets_allocated);
 
-	return percpu_counter_sum_positive(prot->sockets_allocated);
+	return percpu_counter_read_positive(prot->sockets_allocated);
 }
 
 static inline int



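As an aside (illustrative only, not part of the patch above): if a caller
ever did need the exact count at this point, the sum would still have to be
taken with BHs disabled, along the lines of this hypothetical helper (no
such helper exists in the tree):

/* Hypothetical helper - NOT in the kernel - showing how an exact sum
 * would have to be taken safely from process context. */
static inline s64 sk_sockets_allocated_sum_safe(struct proto *prot)
{
	s64 ret;

	local_bh_disable();	/* keep the softirq add path off this CPU */
	ret = percpu_counter_sum_positive(prot->sockets_allocated);
	local_bh_enable();

	return ret;
}

That is essentially what Eric's first patch did at the call site; the patch
above instead avoids taking the lock altogether, because an approximate
value is sufficient for the memory-pressure check.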

* Re: inconsistent lock/deadlock crash, vanilla 3.3.4, 32bit, tcp
  2012-04-29  9:21     ` Eric Dumazet
@ 2012-04-29 21:32       ` Neal Cardwell
  2012-04-30 17:38       ` David Miller
  1 sibling, 0 replies; 10+ messages in thread
From: Neal Cardwell @ 2012-04-29 21:32 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Denys Fedoryshchenko, netdev, davem, jmorris, yoshfuji, kaber,
	linux-kernel, Glauber Costa

On Sun, Apr 29, 2012 at 5:21 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>
> [PATCH] net: fix sk_sockets_allocated_read_positive
>
> Denys Fedoryshchenko reported frequent crashes on a proxy server and kindly
> provided a lockdep report that explains it all :
>
>
>  [  762.903868]
>  [  762.903880] =================================
>  [  762.903890] [ INFO: inconsistent lock state ]
>  [  762.903903] 3.3.4-build-0061 #8 Not tainted
>  [  762.904133] ---------------------------------
>  [  762.904344] inconsistent {IN-SOFTIRQ-W} -> {SOFTIRQ-ON-W} usage.
>  [  762.904542] squid/1603 [HC0[0]:SC0[0]:HE1:SE1] takes:
>  [  762.904542]  (key#3){+.?...}, at: [<c0232cc4>]
> __percpu_counter_sum+0xd/0x58
>  [  762.904542] {IN-SOFTIRQ-W} state was registered at:
>  [  762.904542]   [<c0158b84>] __lock_acquire+0x284/0xc26
>  [  762.904542]   [<c01598e8>] lock_acquire+0x71/0x85
>  [  762.904542]   [<c0349765>] _raw_spin_lock+0x33/0x40
>  [  762.904542]   [<c0232c93>] __percpu_counter_add+0x58/0x7c
>  [  762.904542]   [<c02cfde1>] sk_clone_lock+0x1e5/0x200
>  [  762.904542]   [<c0303ee4>] inet_csk_clone_lock+0xe/0x78
>  [  762.904542]   [<c0315778>] tcp_create_openreq_child+0x1b/0x404
>  [  762.904542]   [<c031339c>] tcp_v4_syn_recv_sock+0x32/0x1c1
>  [  762.904542]   [<c031615a>] tcp_check_req+0x1fd/0x2d7
>  [  762.904542]   [<c0313f77>] tcp_v4_do_rcv+0xab/0x194
>  [  762.904542]   [<c03153bb>] tcp_v4_rcv+0x3b3/0x5cc
>  [  762.904542]   [<c02fc0c4>] ip_local_deliver_finish+0x13a/0x1e9
>  [  762.904542]   [<c02fc539>] NF_HOOK.clone.11+0x46/0x4d
>  [  762.904542]   [<c02fc652>] ip_local_deliver+0x41/0x45
>  [  762.904542]   [<c02fc4d1>] ip_rcv_finish+0x31a/0x33c
>  [  762.904542]   [<c02fc539>] NF_HOOK.clone.11+0x46/0x4d
>  [  762.904542]   [<c02fc857>] ip_rcv+0x201/0x23e
>  [  762.904542]   [<c02daa3a>] __netif_receive_skb+0x319/0x368
>  [  762.904542]   [<c02dac07>] netif_receive_skb+0x4e/0x7d
>  [  762.904542]   [<c02dacf6>] napi_skb_finish+0x1e/0x34
>  [  762.904542]   [<c02db122>] napi_gro_receive+0x20/0x24
>  [  762.904542]   [<f85d1743>] e1000_receive_skb+0x3f/0x45 [e1000e]
>  [  762.904542]   [<f85d3464>] e1000_clean_rx_irq+0x1f9/0x284 [e1000e]
>  [  762.904542]   [<f85d3926>] e1000_clean+0x62/0x1f4 [e1000e]
>  [  762.904542]   [<c02db228>] net_rx_action+0x90/0x160
>  [  762.904542]   [<c012a445>] __do_softirq+0x7b/0x118
>  [  762.904542] irq event stamp: 156915469
>  [  762.904542] hardirqs last  enabled at (156915469): [<c019b4f4>]
> __slab_alloc.clone.58.clone.63+0xc4/0x2de
>  [  762.904542] hardirqs last disabled at (156915468): [<c019b452>]
> __slab_alloc.clone.58.clone.63+0x22/0x2de
>  [  762.904542] softirqs last  enabled at (156915466): [<c02ce677>]
> lock_sock_nested+0x64/0x6c
>  [  762.904542] softirqs last disabled at (156915464): [<c0349914>]
> _raw_spin_lock_bh+0xe/0x45
>  [  762.904542]
>  [  762.904542] other info that might help us debug this:
>  [  762.904542]  Possible unsafe locking scenario:
>  [  762.904542]
>  [  762.904542]        CPU0
>  [  762.904542]        ----
>  [  762.904542]   lock(key#3);
>  [  762.904542]   <Interrupt>
>  [  762.904542]     lock(key#3);
>  [  762.904542]
>  [  762.904542]  *** DEADLOCK ***
>  [  762.904542]
>  [  762.904542] 1 lock held by squid/1603:
>  [  762.904542]  #0:  (sk_lock-AF_INET){+.+.+.}, at: [<c03055c0>]
> lock_sock+0xa/0xc
>  [  762.904542]
>  [  762.904542] stack backtrace:
>  [  762.904542] Pid: 1603, comm: squid Not tainted 3.3.4-build-0061 #8
>  [  762.904542] Call Trace:
>  [  762.904542]  [<c0347b73>] ? printk+0x18/0x1d
>  [  762.904542]  [<c015873a>] valid_state+0x1f6/0x201
>  [  762.904542]  [<c0158816>] mark_lock+0xd1/0x1bb
>  [  762.904542]  [<c015876b>] ? mark_lock+0x26/0x1bb
>  [  762.904542]  [<c015805d>] ? check_usage_forwards+0x77/0x77
>  [  762.904542]  [<c0158bf8>] __lock_acquire+0x2f8/0xc26
>  [  762.904542]  [<c0159b8e>] ? mark_held_locks+0x5d/0x7b
>  [  762.904542]  [<c0159cf6>] ? trace_hardirqs_on+0xb/0xd
>  [  762.904542]  [<c0158dd4>] ? __lock_acquire+0x4d4/0xc26
>  [  762.904542]  [<c01598e8>] lock_acquire+0x71/0x85
>  [  762.904542]  [<c0232cc4>] ? __percpu_counter_sum+0xd/0x58
>  [  762.904542]  [<c0349765>] _raw_spin_lock+0x33/0x40
>  [  762.904542]  [<c0232cc4>] ? __percpu_counter_sum+0xd/0x58
>  [  762.904542]  [<c0232cc4>] __percpu_counter_sum+0xd/0x58
>  [  762.904542]  [<c02cebc4>] __sk_mem_schedule+0xdd/0x1c7
>  [  762.904542]  [<c02d178d>] ? __alloc_skb+0x76/0x100
>  [  762.904542]  [<c0305e8e>] sk_wmem_schedule+0x21/0x2d
>  [  762.904542]  [<c0306370>] sk_stream_alloc_skb+0x42/0xaa
>  [  762.904542]  [<c0306567>] tcp_sendmsg+0x18f/0x68b
>  [  762.904542]  [<c031f3dc>] ? ip_fast_csum+0x30/0x30
>  [  762.904542]  [<c0320193>] inet_sendmsg+0x53/0x5a
>  [  762.904542]  [<c02cb633>] sock_aio_write+0xd2/0xda
>  [  762.904542]  [<c015876b>] ? mark_lock+0x26/0x1bb
>  [  762.904542]  [<c01a1017>] do_sync_write+0x9f/0xd9
>  [  762.904542]  [<c01a2111>] ? file_free_rcu+0x2f/0x2f
>  [  762.904542]  [<c01a17a1>] vfs_write+0x8f/0xab
>  [  762.904542]  [<c01a284d>] ? fget_light+0x75/0x7c
>  [  762.904542]  [<c01a1900>] sys_write+0x3d/0x5e
>  [  762.904542]  [<c0349ec9>] syscall_call+0x7/0xb
>  [  762.904542]  [<c0340000>] ? rp_sidt+0x41/0x83
>
>
> The bug is that sk_sockets_allocated_read_positive() calls
> percpu_counter_sum_positive() without BHs being disabled.
>
> This bug was added in commit 180d8cd942ce33
> (foundations of per-cgroup memory pressure controlling.), since the previous
> code used percpu_counter_read_positive(), which is IRQ safe.
>
> In __sk_mem_schedule() we don't need the precise count of allocated
> sockets and can revert to the previous behavior.
>
> Reported-by: Denys Fedoryshchenko <denys@visp.net.lb>
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> Cc: Glauber Costa <glommer@parallels.com>
> ---

Acked-by: Neal Cardwell <ncardwell@google.com>

neal


* Re: inconsistent lock/deadlock crash, vanilla 3.3.4, 32bit, tcp
  2012-04-29  8:44     ` Eric Dumazet
@ 2012-04-30  4:45       ` Glauber Costa
  2012-04-30 16:49         ` David Miller
  0 siblings, 1 reply; 10+ messages in thread
From: Glauber Costa @ 2012-04-30  4:45 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Denys Fedoryshchenko, netdev, davem, jmorris, yoshfuji, kaber,
	linux-kernel

On 04/29/2012 05:44 AM, Eric Dumazet wrote:
> On Sun, 2012-04-29 at 10:40 +0200, Eric Dumazet wrote:
>> On Sun, 2012-04-29 at 10:27 +0200, Eric Dumazet wrote:
>>> On Sun, 2012-04-29 at 10:41 +0300, Denys Fedoryshchenko wrote:
>>>> I apologize for the late-night post and for the amount of noise left in the
>>>> report. I have cleaned it up now and am resending with the maintainers on CC.
>>>>
>>>> The server's job is proxying, with a very high rate of new connections.
>>>> At peak time the deadlock can easily be reproduced within 10-15 minutes.
>>>>
>>>> The deadlock occurred on almost all 3.3-stable versions (tried 3.3.3 -
>>>> 3.3.4). It is not easy to try older kernels,
>>>> but if required I can try.
>>>> Also, because the SYN rate is very high, I usually see:
>>>> [   51.612987] TCP: Possible SYN flooding on port 8080. Sending
>>>> cookies.  Check SNMP counters.
>>>>
>>>>    [  762.903868]
>>>>    [  762.903880] =================================
>>>>    [  762.903890] [ INFO: inconsistent lock state ]
>>>>    [  762.903903] 3.3.4-build-0061 #8 Not tainted
>>>>    [  762.904133] ---------------------------------
>>>>    [  762.904344] inconsistent {IN-SOFTIRQ-W} ->  {SOFTIRQ-ON-W} usage.
>>>>    [  762.904542] squid/1603 [HC0[0]:SC0[0]:HE1:SE1] takes:
>>>>    [  762.904542]  (key#3){+.?...}, at: [<c0232cc4>]
>>>> __percpu_counter_sum+0xd/0x58
>>>>    [  762.904542] {IN-SOFTIRQ-W} state was registered at:
>>>>    [  762.904542]   [<c0158b84>] __lock_acquire+0x284/0xc26
>>>>    [  762.904542]   [<c01598e8>] lock_acquire+0x71/0x85
>>>>    [  762.904542]   [<c0349765>] _raw_spin_lock+0x33/0x40
>>>>    [  762.904542]   [<c0232c93>] __percpu_counter_add+0x58/0x7c
>>>>    [  762.904542]   [<c02cfde1>] sk_clone_lock+0x1e5/0x200
>>>>    [  762.904542]   [<c0303ee4>] inet_csk_clone_lock+0xe/0x78
>>>>    [  762.904542]   [<c0315778>] tcp_create_openreq_child+0x1b/0x404
>>>>    [  762.904542]   [<c031339c>] tcp_v4_syn_recv_sock+0x32/0x1c1
>>>>    [  762.904542]   [<c031615a>] tcp_check_req+0x1fd/0x2d7
>>>>    [  762.904542]   [<c0313f77>] tcp_v4_do_rcv+0xab/0x194
>>>>    [  762.904542]   [<c03153bb>] tcp_v4_rcv+0x3b3/0x5cc
>>>>    [  762.904542]   [<c02fc0c4>] ip_local_deliver_finish+0x13a/0x1e9
>>>>    [  762.904542]   [<c02fc539>] NF_HOOK.clone.11+0x46/0x4d
>>>>    [  762.904542]   [<c02fc652>] ip_local_deliver+0x41/0x45
>>>>    [  762.904542]   [<c02fc4d1>] ip_rcv_finish+0x31a/0x33c
>>>>    [  762.904542]   [<c02fc539>] NF_HOOK.clone.11+0x46/0x4d
>>>>    [  762.904542]   [<c02fc857>] ip_rcv+0x201/0x23e
>>>>    [  762.904542]   [<c02daa3a>] __netif_receive_skb+0x319/0x368
>>>>    [  762.904542]   [<c02dac07>] netif_receive_skb+0x4e/0x7d
>>>>    [  762.904542]   [<c02dacf6>] napi_skb_finish+0x1e/0x34
>>>>    [  762.904542]   [<c02db122>] napi_gro_receive+0x20/0x24
>>>>    [  762.904542]   [<f85d1743>] e1000_receive_skb+0x3f/0x45 [e1000e]
>>>>    [  762.904542]   [<f85d3464>] e1000_clean_rx_irq+0x1f9/0x284 [e1000e]
>>>>    [  762.904542]   [<f85d3926>] e1000_clean+0x62/0x1f4 [e1000e]
>>>>    [  762.904542]   [<c02db228>] net_rx_action+0x90/0x160
>>>>    [  762.904542]   [<c012a445>] __do_softirq+0x7b/0x118
>>>>    [  762.904542] irq event stamp: 156915469
>>>>    [  762.904542] hardirqs last  enabled at (156915469): [<c019b4f4>]
>>>> __slab_alloc.clone.58.clone.63+0xc4/0x2de
>>>>    [  762.904542] hardirqs last disabled at (156915468): [<c019b452>]
>>>> __slab_alloc.clone.58.clone.63+0x22/0x2de
>>>>    [  762.904542] softirqs last  enabled at (156915466): [<c02ce677>]
>>>> lock_sock_nested+0x64/0x6c
>>>>    [  762.904542] softirqs last disabled at (156915464): [<c0349914>]
>>>> _raw_spin_lock_bh+0xe/0x45
>>>>    [  762.904542]
>>>>    [  762.904542] other info that might help us debug this:
>>>>    [  762.904542]  Possible unsafe locking scenario:
>>>>    [  762.904542]
>>>>    [  762.904542]        CPU0
>>>>    [  762.904542]        ----
>>>>    [  762.904542]   lock(key#3);
>>>>    [  762.904542]<Interrupt>
>>>>    [  762.904542]     lock(key#3);
>>>>    [  762.904542]
>>>>    [  762.904542]  *** DEADLOCK ***
>>>>    [  762.904542]
>>>>    [  762.904542] 1 lock held by squid/1603:
>>>>    [  762.904542]  #0:  (sk_lock-AF_INET){+.+.+.}, at: [<c03055c0>]
>>>> lock_sock+0xa/0xc
>>>>    [  762.904542]
>>>>    [  762.904542] stack backtrace:
>>>>    [  762.904542] Pid: 1603, comm: squid Not tainted 3.3.4-build-0061 #8
>>>>    [  762.904542] Call Trace:
>>>>    [  762.904542]  [<c0347b73>] ? printk+0x18/0x1d
>>>>    [  762.904542]  [<c015873a>] valid_state+0x1f6/0x201
>>>>    [  762.904542]  [<c0158816>] mark_lock+0xd1/0x1bb
>>>>    [  762.904542]  [<c015876b>] ? mark_lock+0x26/0x1bb
>>>>    [  762.904542]  [<c015805d>] ? check_usage_forwards+0x77/0x77
>>>>    [  762.904542]  [<c0158bf8>] __lock_acquire+0x2f8/0xc26
>>>>    [  762.904542]  [<c0159b8e>] ? mark_held_locks+0x5d/0x7b
>>>>    [  762.904542]  [<c0159cf6>] ? trace_hardirqs_on+0xb/0xd
>>>>    [  762.904542]  [<c0158dd4>] ? __lock_acquire+0x4d4/0xc26
>>>>    [  762.904542]  [<c01598e8>] lock_acquire+0x71/0x85
>>>>    [  762.904542]  [<c0232cc4>] ? __percpu_counter_sum+0xd/0x58
>>>>    [  762.904542]  [<c0349765>] _raw_spin_lock+0x33/0x40
>>>>    [  762.904542]  [<c0232cc4>] ? __percpu_counter_sum+0xd/0x58
>>>>    [  762.904542]  [<c0232cc4>] __percpu_counter_sum+0xd/0x58
>>>>    [  762.904542]  [<c02cebc4>] __sk_mem_schedule+0xdd/0x1c7
>>>>    [  762.904542]  [<c02d178d>] ? __alloc_skb+0x76/0x100
>>>>    [  762.904542]  [<c0305e8e>] sk_wmem_schedule+0x21/0x2d
>>>>    [  762.904542]  [<c0306370>] sk_stream_alloc_skb+0x42/0xaa
>>>>    [  762.904542]  [<c0306567>] tcp_sendmsg+0x18f/0x68b
>>>>    [  762.904542]  [<c031f3dc>] ? ip_fast_csum+0x30/0x30
>>>>    [  762.904542]  [<c0320193>] inet_sendmsg+0x53/0x5a
>>>>    [  762.904542]  [<c02cb633>] sock_aio_write+0xd2/0xda
>>>>    [  762.904542]  [<c015876b>] ? mark_lock+0x26/0x1bb
>>>>    [  762.904542]  [<c01a1017>] do_sync_write+0x9f/0xd9
>>>>    [  762.904542]  [<c01a2111>] ? file_free_rcu+0x2f/0x2f
>>>>    [  762.904542]  [<c01a17a1>] vfs_write+0x8f/0xab
>>>>    [  762.904542]  [<c01a284d>] ? fget_light+0x75/0x7c
>>>>    [  762.904542]  [<c01a1900>] sys_write+0x3d/0x5e
>>>>    [  762.904542]  [<c0349ec9>] syscall_call+0x7/0xb
>>>>    [  762.904542]  [<c0340000>] ? rp_sidt+0x41/0x83
>>>>
>>>
>>>
>>> OK, so under memory pressure we can end up calling
>>> sk_sockets_allocated_read_positive() with softirqs enabled, and lockdep
>>> complains...
>>>
>>> This bug was probably added in 2008, in commit 1748376b6626a
>>> (net: Use a percpu_counter for sockets_allocated)
>>
>> Hmm, no, that commit was fine.
>>
>> The bug was in fact added by Glauber Costa in commit 180d8cd942ce336b2c869
>> (foundations of per-cgroup memory pressure controlling.),
>>
>> because he replaced the safe percpu_counter_read_positive() call with the
>> unsafe percpu_counter_sum_positive() in
>> sk_sockets_allocated_read_positive().
>>
>> But anyway, the patch I sent should fix the problem.
>
> Thinking about it again, I am not sure why Glauber made this change. He
> probably made a typo or something.
>

Indeed.

It wasn't my intent to change that; it was a mistake.

Very sorry.


* Re: inconsistent lock/deadlock crash, vanilla 3.3.4, 32bit, tcp
  2012-04-30  4:45       ` Glauber Costa
@ 2012-04-30 16:49         ` David Miller
  0 siblings, 0 replies; 10+ messages in thread
From: David Miller @ 2012-04-30 16:49 UTC (permalink / raw)
  To: glommer
  Cc: eric.dumazet, denys, netdev, jmorris, yoshfuji, kaber, linux-kernel

From: Glauber Costa <glommer@parallels.com>
Date: Mon, 30 Apr 2012 01:45:29 -0300

> It wasn't my intent to change that, it was a mistake.

I very much regret succumbing to the unreasonable pressure to apply
your original patches.

We've come to learn that they were full of errors and that they needed
to cook for several more months before being included.


* Re: inconsistent lock/deadlock crash, vanilla 3.3.4, 32bit, tcp
  2012-04-29  9:21     ` Eric Dumazet
  2012-04-29 21:32       ` Neal Cardwell
@ 2012-04-30 17:38       ` David Miller
  1 sibling, 0 replies; 10+ messages in thread
From: David Miller @ 2012-04-30 17:38 UTC (permalink / raw)
  To: eric.dumazet
  Cc: denys, netdev, jmorris, yoshfuji, kaber, linux-kernel, glommer

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Sun, 29 Apr 2012 11:21:56 +0200

> From: Eric Dumazet <edumazet@google.com>
> 
> [PATCH] net: fix sk_sockets_allocated_read_positive
> 
> Denys Fedoryshchenko reported frequent crashes on a proxy server and kindly
> provided a lockdep report that explains it all :
 ...
> The bug is that sk_sockets_allocated_read_positive() calls
> percpu_counter_sum_positive() without BHs being disabled.
> 
> This bug was added in commit 180d8cd942ce33
> (foundations of per-cgroup memory pressure controlling.), since the previous
> code used percpu_counter_read_positive(), which is IRQ safe.
> 
> In __sk_mem_schedule() we don't need the precise count of allocated
> sockets and can revert to the previous behavior.
> 
> Reported-by: Denys Fedoryshchenko <denys@visp.net.lb>
> Signed-off-by: Eric Dumazet <edumazet@google.com>

Applied, thanks.

