All of lore.kernel.org
 help / color / mirror / Atom feed
* [BUG] Deadlock on sk->sk_lock.slock
@ 2014-03-04  8:59 Lars Persson
  2014-03-04 14:16 ` Eric Dumazet
  0 siblings, 1 reply; 16+ messages in thread
From: Lars Persson @ 2014-03-04  8:59 UTC (permalink / raw)
  To: netdev

Hi

We can trigger an rcu stall by stressing the error handling in the CIFS file system. The test case induces network outages of varying length while data is written to the file system. Judging by the stack dump it might be a deadlock in the tcpv4 code. So far seen on stable 3.10.32.

My initial interpretation of the stack trace:
At frame 9 bh is enabled while holding the socket spinlock. This happens from __tcp_checksum_complete_user, which is inlined in tcp_rcv_established.

At frame 1 we try to grab the same lock from bh context with resulting deadlock.

-000 |M:0x0:0x802B6AF8(asm) <-- arch_spin_lock 
-001 |tcp_v4_rcv(skb = 0x8BD527A0) <-- sk = 0x8BE6B2A0
-002 |ip_local_deliver_finish(skb = 0x8BD527A0)
-003 |__netif_receive_skb_core(skb = 0x8BD527A0, ?)
-004 |netif_receive_skb(skb = 0x8BD527A0)
-005 |elk_poll(napi = 0x8C770500, budget = 64)
-006 |net_rx_action(?)
-007 |__do_softirq()
-008 |do_softirq()
-009 |local_bh_enable()
-010 |tcp_rcv_established(sk = 0x8BE6B2A0, skb = 0x87D3A9E0, th = 0x814EBE14, ?)
-011 |tcp_v4_do_rcv(sk = 0x8BE6B2A0, skb = 0x87D3A9E0)
-012 |tcp_delack_timer_handler(sk = 0x8BE6B2A0)
-013 |tcp_release_cb(sk = 0x8BE6B2A0)
-014 |release_sock(sk = 0x8BE6B2A0)
-015 |tcp_sendmsg(?, sk = 0x8BE6B2A0, ?, ?)
-016 |sock_sendmsg(sock = 0x8518C4C0, msg = 0x87D8DAA8, size = 4096)
-017 |kernel_sendmsg(?, ?, ?, ?, size = 4096)
-018 |smb_send_kvec(server = 0x87C4D400, iov = 0x87D8DB10, n_vec = 1, sent = 0x87D8DB1C)
-019 |smb_send_rqst(server = 0x87C4D400, rqst = 0x87D8DBA0)
-020 |cifs_call_async(server = 0x87C4D400, rqst = 0x87D8DBA0, receive = 0x0, callback = 0x80219514,
-021 |cifs_async_writev(wdata = 0x87FD6580)
-022 |cifs_writepages(mapping = 0x852096E4, wbc = 0x87D8DC88)
-023 |__writeback_single_inode(inode = 0x852095D0, wbc = 0x87D8DC88)
-024 |writeback_sb_inodes(sb = 0x87D6D800, wb = 0x87E4A9C0, work = 0x87D8DD88)
-025 |__writeback_inodes_wb(wb = 0x87E4A9C0, work = 0x87D8DD88)
-026 |wb_writeback(wb = 0x87E4A9C0, work = 0x87D8DD88)
-027 |wb_do_writeback(wb = 0x87E4A9C0, force_wait = 0)
-028 |bdi_writeback_workfn(work = 0x87E4A9CC)
-029 |process_one_work(worker = 0x8B045880, work = 0x87E4A9CC)
-030 |worker_thread(__worker = 0x8B045880)
-031 |kthread(_create = 0x87CADD90)
-032 |ret_from_kernel_thread(asm)

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [BUG] Deadlock on sk->sk_lock.slock
  2014-03-04  8:59 [BUG] Deadlock on sk->sk_lock.slock Lars Persson
@ 2014-03-04 14:16 ` Eric Dumazet
  2014-03-04 18:48   ` David Miller
  2014-03-05 15:01   ` [BUG] Deadlock on sk->sk_lock.slock Lars Persson
  0 siblings, 2 replies; 16+ messages in thread
From: Eric Dumazet @ 2014-03-04 14:16 UTC (permalink / raw)
  To: Lars Persson; +Cc: netdev

On Tue, 2014-03-04 at 09:59 +0100, Lars Persson wrote:
> Hi
> 
> We can trigger an rcu stall by stressing the error handling in the CIFS file system. The test case induces network outages of varying length while data is written to the file system. Judging by the stack dump it might be a deadlock in the tcpv4 code. So far seen on stable 3.10.32.
> 
> My initial interpretation of the stack trace:
> At frame 9 bh is enabled while holding the socket spinlock. This happens from __tcp_checksum_complete_user, which is inlined in tcp_rcv_established.
> 


Thanks a lot for this report.


> At frame 1 we try to grab the same lock from bh context with resulting deadlock.
> 
> -000 |M:0x0:0x802B6AF8(asm) <-- arch_spin_lock 
> -001 |tcp_v4_rcv(skb = 0x8BD527A0) <-- sk = 0x8BE6B2A0
> -002 |ip_local_deliver_finish(skb = 0x8BD527A0)
> -003 |__netif_receive_skb_core(skb = 0x8BD527A0, ?)
> -004 |netif_receive_skb(skb = 0x8BD527A0)
> -005 |elk_poll(napi = 0x8C770500, budget = 64)
> -006 |net_rx_action(?)
> -007 |__do_softirq()
> -008 |do_softirq()
> -009 |local_bh_enable()
> -010 |tcp_rcv_established(sk = 0x8BE6B2A0, skb = 0x87D3A9E0, th = 0x814EBE14, ?)
> -011 |tcp_v4_do_rcv(sk = 0x8BE6B2A0, skb = 0x87D3A9E0)
> -012 |tcp_delack_timer_handler(sk = 0x8BE6B2A0)
> -013 |tcp_release_cb(sk = 0x8BE6B2A0)
> -014 |release_sock(sk = 0x8BE6B2A0)
> -015 |tcp_sendmsg(?, sk = 0x8BE6B2A0, ?, ?)
> -016 |sock_sendmsg(sock = 0x8518C4C0, msg = 0x87D8DAA8, size = 4096)
> -017 |kernel_sendmsg(?, ?, ?, ?, size = 4096)
> -018 |smb_send_kvec(server = 0x87C4D400, iov = 0x87D8DB10, n_vec = 1, sent = 0x87D8DB1C)
> -019 |smb_send_rqst(server = 0x87C4D400, rqst = 0x87D8DBA0)
> -020 |cifs_call_async(server = 0x87C4D400, rqst = 0x87D8DBA0, receive = 0x0, callback = 0x80219514,
> -021 |cifs_async_writev(wdata = 0x87FD6580)
> -022 |cifs_writepages(mapping = 0x852096E4, wbc = 0x87D8DC88)
> -023 |__writeback_single_inode(inode = 0x852095D0, wbc = 0x87D8DC88)
> -024 |writeback_sb_inodes(sb = 0x87D6D800, wb = 0x87E4A9C0, work = 0x87D8DD88)
> -025 |__writeback_inodes_wb(wb = 0x87E4A9C0, work = 0x87D8DD88)
> -026 |wb_writeback(wb = 0x87E4A9C0, work = 0x87D8DD88)
> -027 |wb_do_writeback(wb = 0x87E4A9C0, force_wait = 0)
> -028 |bdi_writeback_workfn(work = 0x87E4A9CC)
> -029 |process_one_work(worker = 0x8B045880, work = 0x87E4A9CC)
> -030 |worker_thread(__worker = 0x8B045880)
> -031 |kthread(_create = 0x87CADD90)
> -032 |ret_from_kernel_thread(asm)

So the callchain is 

release_sock()
{
 spin_lock_bh(&sk->sk_lock.slock);

 sk->sk_prot->release_cb(sk); //

 sk->sk_lock.owned = 0;
}

Meaning we have the socket spinlock locked by us, and we disabled BH.

Problem is tcp_delack_timer_handler() assumes it was called from timer
handler and sock was not owned by user.

Bug was added in commit 6f458dfb409272082c9bfa412f77ff2fc21c626f
("tcp: improve latencies of timer triggered events")

and triggers if NIC lacks RX checksum support.


Could you try following hack/patch ?

diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 6e4809389cbf..ed17e1cacae2 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -4945,7 +4945,10 @@ static __sum16 __tcp_checksum_complete_user(struct sock *sk,
 {
 	__sum16 result;
 
-	if (sock_owned_by_user(sk)) {
+	/* If we are called from tcp_release_cb(),
+	 * we do not want to enable BH
+	 */
+	if (sk->sk_lock.owned == 1) {
 		local_bh_enable();
 		result = __tcp_checksum_complete(skb);
 		local_bh_disable();
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index aaa68f5b1055..588837d06c60 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -785,6 +785,8 @@ void tcp_release_cb(struct sock *sk)
 		__sock_put(sk);
 	}
 	if (flags & (1UL << TCP_DELACK_TIMER_DEFERRED)) {
+		/* Make sure __tcp_checksum_complete_user() wont enable BH */
+		sk->sk_lock.owned = 2;
 		tcp_delack_timer_handler(sk);
 		__sock_put(sk);
 	}

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [BUG] Deadlock on sk->sk_lock.slock
  2014-03-04 14:16 ` Eric Dumazet
@ 2014-03-04 18:48   ` David Miller
  2014-03-04 22:39     ` [PATCH] tcp: tcp_release_cb() should release socket ownership Eric Dumazet
  2014-03-05 15:01   ` [BUG] Deadlock on sk->sk_lock.slock Lars Persson
  1 sibling, 1 reply; 16+ messages in thread
From: David Miller @ 2014-03-04 18:48 UTC (permalink / raw)
  To: eric.dumazet; +Cc: lars.persson, netdev

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Tue, 04 Mar 2014 06:16:23 -0800

> Could you try following hack/patch ?

Eric, if you do fix it like this (I don't mind), please use mnenomics
for the various lock state values.

Thanks.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH] tcp: tcp_release_cb() should release socket ownership
  2014-03-04 18:48   ` David Miller
@ 2014-03-04 22:39     ` Eric Dumazet
  2014-03-06  1:59       ` David Miller
  2014-03-10 16:50       ` [PATCH resend] " Eric Dumazet
  0 siblings, 2 replies; 16+ messages in thread
From: Eric Dumazet @ 2014-03-04 22:39 UTC (permalink / raw)
  To: David Miller; +Cc: lars.persson, netdev

On Tue, 2014-03-04 at 13:48 -0500, David Miller wrote:
> From: Eric Dumazet <eric.dumazet@gmail.com>
> Date: Tue, 04 Mar 2014 06:16:23 -0800
> 
> > Could you try following hack/patch ?
> 
> Eric, if you do fix it like this (I don't mind), please use mnenomics
> for the various lock state values.

Sure this was kind of a hack...

I was thinking of following patch instead.

Note I would prefer waiting Lars tests.

Thanks !

[PATCH] tcp: tcp_release_cb() should release socket ownership

Lars Persson reported following deadlock :

-000 |M:0x0:0x802B6AF8(asm) <-- arch_spin_lock 
-001 |tcp_v4_rcv(skb = 0x8BD527A0) <-- sk = 0x8BE6B2A0
-002 |ip_local_deliver_finish(skb = 0x8BD527A0)
-003 |__netif_receive_skb_core(skb = 0x8BD527A0, ?)
-004 |netif_receive_skb(skb = 0x8BD527A0)
-005 |elk_poll(napi = 0x8C770500, budget = 64)
-006 |net_rx_action(?)
-007 |__do_softirq()
-008 |do_softirq()
-009 |local_bh_enable()
-010 |tcp_rcv_established(sk = 0x8BE6B2A0, skb = 0x87D3A9E0, th = 0x814EBE14, ?)
-011 |tcp_v4_do_rcv(sk = 0x8BE6B2A0, skb = 0x87D3A9E0)
-012 |tcp_delack_timer_handler(sk = 0x8BE6B2A0)
-013 |tcp_release_cb(sk = 0x8BE6B2A0)
-014 |release_sock(sk = 0x8BE6B2A0)
-015 |tcp_sendmsg(?, sk = 0x8BE6B2A0, ?, ?)
-016 |sock_sendmsg(sock = 0x8518C4C0, msg = 0x87D8DAA8, size = 4096)
-017 |kernel_sendmsg(?, ?, ?, ?, size = 4096)
-018 |smb_send_kvec(server = 0x87C4D400, iov = 0x87D8DB10, n_vec = 1, sent = 0x87D8DB1C)
-019 |smb_send_rqst(server = 0x87C4D400, rqst = 0x87D8DBA0)
-020 |cifs_call_async(server = 0x87C4D400, rqst = 0x87D8DBA0, receive = 0x0, callback = 0x80219514,
-021 |cifs_async_writev(wdata = 0x87FD6580)
-022 |cifs_writepages(mapping = 0x852096E4, wbc = 0x87D8DC88)
-023 |__writeback_single_inode(inode = 0x852095D0, wbc = 0x87D8DC88)
-024 |writeback_sb_inodes(sb = 0x87D6D800, wb = 0x87E4A9C0, work = 0x87D8DD88)
-025 |__writeback_inodes_wb(wb = 0x87E4A9C0, work = 0x87D8DD88)
-026 |wb_writeback(wb = 0x87E4A9C0, work = 0x87D8DD88)
-027 |wb_do_writeback(wb = 0x87E4A9C0, force_wait = 0)
-028 |bdi_writeback_workfn(work = 0x87E4A9CC)
-029 |process_one_work(worker = 0x8B045880, work = 0x87E4A9CC)
-030 |worker_thread(__worker = 0x8B045880)
-031 |kthread(_create = 0x87CADD90)
-032 |ret_from_kernel_thread(asm)

Bug occurs because __tcp_checksum_complete_user() enables BH, assuming
it is running from softirq context.

Lars trace involved a NIC without RX checksum support but other points
are problematic as well, like the prequeue stuff.

Problem is triggered by a timer, that found socket being owned by user.

tcp_release_cb() should call tcp_write_timer_handler() or
tcp_delack_timer_handler() in the appropriate context :

BH disabled and socket lock held, but 'owned' field cleared,
as if they were running from timer handlers.

Fixes: 6f458dfb4092 ("tcp: improve latencies of timer triggered events")
Reported-by: Lars Persson <lars.persson@axis.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 include/net/sock.h    |    5 +++++
 net/core/sock.c       |    5 ++++-
 net/ipv4/tcp_output.c |   11 +++++++++++
 3 files changed, 20 insertions(+), 1 deletion(-)

diff --git a/include/net/sock.h b/include/net/sock.h
index 5c3f7c3624aa..2100d5e8809a 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -1488,6 +1488,11 @@ static inline void sk_wmem_free_skb(struct sock *sk, struct sk_buff *skb)
  */
 #define sock_owned_by_user(sk)	((sk)->sk_lock.owned)
 
+static inline void sock_release_ownership(struct sock *sk)
+{
+	sk->sk_lock.owned = 0;
+}
+
 /*
  * Macro so as to not evaluate some arguments when
  * lockdep is not enabled.
diff --git a/net/core/sock.c b/net/core/sock.c
index 5b6a9431b017..c0fc6bdad1e3 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -2357,10 +2357,13 @@ void release_sock(struct sock *sk)
 	if (sk->sk_backlog.tail)
 		__release_sock(sk);
 
+	/* Warning : release_cb() might need to release sk ownership,
+	 * ie call sock_release_ownership(sk) before us.
+	 */
 	if (sk->sk_prot->release_cb)
 		sk->sk_prot->release_cb(sk);
 
-	sk->sk_lock.owned = 0;
+	sock_release_ownership(sk);
 	if (waitqueue_active(&sk->sk_lock.wq))
 		wake_up(&sk->sk_lock.wq);
 	spin_unlock_bh(&sk->sk_lock.slock);
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index f0eb4e337ec8..17a11e65e57f 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -767,6 +767,17 @@ void tcp_release_cb(struct sock *sk)
 	if (flags & (1UL << TCP_TSQ_DEFERRED))
 		tcp_tsq_handler(sk);
 
+	/* Here begins the tricky part :
+	 * We are called from release_sock() with :
+	 * 1) BH disabled
+	 * 2) sk_lock.slock spinlock held
+	 * 3) socket owned by us (sk->sk_lock.owned == 1)
+	 *
+	 * But following code is meant to be called from BH handlers,
+	 * so we should keep BH disabled, but early release socket ownership
+	 */
+	sock_release_ownership(sk);
+
 	if (flags & (1UL << TCP_WRITE_TIMER_DEFERRED)) {
 		tcp_write_timer_handler(sk);
 		__sock_put(sk);

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* RE: [BUG] Deadlock on sk->sk_lock.slock
  2014-03-04 14:16 ` Eric Dumazet
  2014-03-04 18:48   ` David Miller
@ 2014-03-05 15:01   ` Lars Persson
  2014-03-05 17:34     ` Eric Dumazet
  1 sibling, 1 reply; 16+ messages in thread
From: Lars Persson @ 2014-03-05 15:01 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: netdev


Thanks, this patch has survived a 24 hour test session. I will now start a test with the new sock_release_ownership patch.

- Lars

> -----Original Message-----
> From: Eric Dumazet [mailto:eric.dumazet@gmail.com]
> Sent: den 4 mars 2014 15:16
> To: Lars Persson
> Cc: netdev@vger.kernel.org
> Subject: Re: [BUG] Deadlock on sk->sk_lock.slock
> 
> On Tue, 2014-03-04 at 09:59 +0100, Lars Persson wrote:
> > Hi
> >
> > We can trigger an rcu stall by stressing the error handling in the
> CIFS file system. The test case induces network outages of varying
> length while data is written to the file system. Judging by the stack
> dump it might be a deadlock in the tcpv4 code. So far seen on stable
> 3.10.32.
> >
> > My initial interpretation of the stack trace:
> > At frame 9 bh is enabled while holding the socket spinlock. This
> happens from __tcp_checksum_complete_user, which is inlined in
> tcp_rcv_established.
> >
> 
> 
> Thanks a lot for this report.
> 
> 
> > At frame 1 we try to grab the same lock from bh context with
> resulting deadlock.
> >
> > -000 |M:0x0:0x802B6AF8(asm) <-- arch_spin_lock
> > -001 |tcp_v4_rcv(skb = 0x8BD527A0) <-- sk = 0x8BE6B2A0
> > -002 |ip_local_deliver_finish(skb = 0x8BD527A0)
> > -003 |__netif_receive_skb_core(skb = 0x8BD527A0, ?)
> > -004 |netif_receive_skb(skb = 0x8BD527A0)
> > -005 |elk_poll(napi = 0x8C770500, budget = 64)
> > -006 |net_rx_action(?)
> > -007 |__do_softirq()
> > -008 |do_softirq()
> > -009 |local_bh_enable()
> > -010 |tcp_rcv_established(sk = 0x8BE6B2A0, skb = 0x87D3A9E0, th =
> > 0x814EBE14, ?)
> > -011 |tcp_v4_do_rcv(sk = 0x8BE6B2A0, skb = 0x87D3A9E0)
> > -012 |tcp_delack_timer_handler(sk = 0x8BE6B2A0)
> > -013 |tcp_release_cb(sk = 0x8BE6B2A0)
> > -014 |release_sock(sk = 0x8BE6B2A0)
> > -015 |tcp_sendmsg(?, sk = 0x8BE6B2A0, ?, ?)
> > -016 |sock_sendmsg(sock = 0x8518C4C0, msg = 0x87D8DAA8, size = 4096)
> > -017 |kernel_sendmsg(?, ?, ?, ?, size = 4096)
> > -018 |smb_send_kvec(server = 0x87C4D400, iov = 0x87D8DB10, n_vec = 1,
> > sent = 0x87D8DB1C)
> > -019 |smb_send_rqst(server = 0x87C4D400, rqst = 0x87D8DBA0)
> > -020 |cifs_call_async(server = 0x87C4D400, rqst = 0x87D8DBA0, receive
> > = 0x0, callback = 0x80219514,
> > -021 |cifs_async_writev(wdata = 0x87FD6580)
> > -022 |cifs_writepages(mapping = 0x852096E4, wbc = 0x87D8DC88)
> > -023 |__writeback_single_inode(inode = 0x852095D0, wbc = 0x87D8DC88)
> > -024 |writeback_sb_inodes(sb = 0x87D6D800, wb = 0x87E4A9C0, work =
> > 0x87D8DD88)
> > -025 |__writeback_inodes_wb(wb = 0x87E4A9C0, work = 0x87D8DD88)
> > -026 |wb_writeback(wb = 0x87E4A9C0, work = 0x87D8DD88)
> > -027 |wb_do_writeback(wb = 0x87E4A9C0, force_wait = 0)
> > -028 |bdi_writeback_workfn(work = 0x87E4A9CC)
> > -029 |process_one_work(worker = 0x8B045880, work = 0x87E4A9CC)
> > -030 |worker_thread(__worker = 0x8B045880)
> > -031 |kthread(_create = 0x87CADD90)
> > -032 |ret_from_kernel_thread(asm)
> 
> So the callchain is
> 
> release_sock()
> {
>  spin_lock_bh(&sk->sk_lock.slock);
> 
>  sk->sk_prot->release_cb(sk); //
> 
>  sk->sk_lock.owned = 0;
> }
> 
> Meaning we have the socket spinlock locked by us, and we disabled BH.
> 
> Problem is tcp_delack_timer_handler() assumes it was called from timer
> handler and sock was not owned by user.
> 
> Bug was added in commit 6f458dfb409272082c9bfa412f77ff2fc21c626f
> ("tcp: improve latencies of timer triggered events")
> 
> and triggers if NIC lacks RX checksum support.
> 
> 
> Could you try following hack/patch ?
> 
> diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c index
> 6e4809389cbf..ed17e1cacae2 100644
> --- a/net/ipv4/tcp_input.c
> +++ b/net/ipv4/tcp_input.c
> @@ -4945,7 +4945,10 @@ static __sum16
> __tcp_checksum_complete_user(struct sock *sk,  {
>  	__sum16 result;
> 
> -	if (sock_owned_by_user(sk)) {
> +	/* If we are called from tcp_release_cb(),
> +	 * we do not want to enable BH
> +	 */
> +	if (sk->sk_lock.owned == 1) {
>  		local_bh_enable();
>  		result = __tcp_checksum_complete(skb);
>  		local_bh_disable();
> diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c index
> aaa68f5b1055..588837d06c60 100644
> --- a/net/ipv4/tcp_output.c
> +++ b/net/ipv4/tcp_output.c
> @@ -785,6 +785,8 @@ void tcp_release_cb(struct sock *sk)
>  		__sock_put(sk);
>  	}
>  	if (flags & (1UL << TCP_DELACK_TIMER_DEFERRED)) {
> +		/* Make sure __tcp_checksum_complete_user() wont enable BH */
> +		sk->sk_lock.owned = 2;
>  		tcp_delack_timer_handler(sk);
>  		__sock_put(sk);
>  	}
> 


^ permalink raw reply	[flat|nested] 16+ messages in thread

* RE: [BUG] Deadlock on sk->sk_lock.slock
  2014-03-05 15:01   ` [BUG] Deadlock on sk->sk_lock.slock Lars Persson
@ 2014-03-05 17:34     ` Eric Dumazet
  0 siblings, 0 replies; 16+ messages in thread
From: Eric Dumazet @ 2014-03-05 17:34 UTC (permalink / raw)
  To: Lars Persson; +Cc: netdev

On Wed, 2014-03-05 at 16:01 +0100, Lars Persson wrote:
> Thanks, this patch has survived a 24 hour test session. I will now
> start a test with the new sock_release_ownership patch.

Thanks a lot Lars !

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] tcp: tcp_release_cb() should release socket ownership
  2014-03-04 22:39     ` [PATCH] tcp: tcp_release_cb() should release socket ownership Eric Dumazet
@ 2014-03-06  1:59       ` David Miller
  2014-03-06  2:06         ` Eric Dumazet
  2014-03-10 16:50       ` [PATCH resend] " Eric Dumazet
  1 sibling, 1 reply; 16+ messages in thread
From: David Miller @ 2014-03-06  1:59 UTC (permalink / raw)
  To: eric.dumazet; +Cc: lars.persson, netdev

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Tue, 04 Mar 2014 14:39:12 -0800

> @@ -767,6 +767,17 @@ void tcp_release_cb(struct sock *sk)
>  	if (flags & (1UL << TCP_TSQ_DEFERRED))
>  		tcp_tsq_handler(sk);
>  
> +	/* Here begins the tricky part :
> +	 * We are called from release_sock() with :
> +	 * 1) BH disabled
> +	 * 2) sk_lock.slock spinlock held
> +	 * 3) socket owned by us (sk->sk_lock.owned == 1)
> +	 *
> +	 * But following code is meant to be called from BH handlers,
> +	 * so we should keep BH disabled, but early release socket ownership
> +	 */
> +	sock_release_ownership(sk);
> +

It really means that sk_lock.owned cannot ever be accessed without the
sk_lock spinlock held.

Most of this is easy to hand audit, except sock_owned_by_user() which
has call sites everywhere.

Consider adding a locking assertion to it.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] tcp: tcp_release_cb() should release socket ownership
  2014-03-06  1:59       ` David Miller
@ 2014-03-06  2:06         ` Eric Dumazet
  2014-03-06  2:15           ` David Miller
  0 siblings, 1 reply; 16+ messages in thread
From: Eric Dumazet @ 2014-03-06  2:06 UTC (permalink / raw)
  To: David Miller; +Cc: lars.persson, netdev

On Wed, 2014-03-05 at 20:59 -0500, David Miller wrote:

> It really means that sk_lock.owned cannot ever be accessed without the
> sk_lock spinlock held.
> 
> Most of this is easy to hand audit, except sock_owned_by_user() which
> has call sites everywhere.
> 
> Consider adding a locking assertion to it.

We can do that, but would it be a stable candidate ?

What about I send a followup for net-next ?

Thanks

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] tcp: tcp_release_cb() should release socket ownership
  2014-03-06  2:06         ` Eric Dumazet
@ 2014-03-06  2:15           ` David Miller
  2014-03-06  2:20             ` Eric Dumazet
  0 siblings, 1 reply; 16+ messages in thread
From: David Miller @ 2014-03-06  2:15 UTC (permalink / raw)
  To: eric.dumazet; +Cc: lars.persson, netdev

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Wed, 05 Mar 2014 18:06:41 -0800

> On Wed, 2014-03-05 at 20:59 -0500, David Miller wrote:
> 
>> It really means that sk_lock.owned cannot ever be accessed without the
>> sk_lock spinlock held.
>> 
>> Most of this is easy to hand audit, except sock_owned_by_user() which
>> has call sites everywhere.
>> 
>> Consider adding a locking assertion to it.
> 
> We can do that, but would it be a stable candidate ?
> 
> What about I send a followup for net-next ?

Targetting net-next for the assertion is fine.

Did you get test results back yet?

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH] tcp: tcp_release_cb() should release socket ownership
  2014-03-06  2:15           ` David Miller
@ 2014-03-06  2:20             ` Eric Dumazet
  2014-03-07 15:45               ` Lars Persson
  0 siblings, 1 reply; 16+ messages in thread
From: Eric Dumazet @ 2014-03-06  2:20 UTC (permalink / raw)
  To: David Miller; +Cc: lars.persson, netdev

On Wed, 2014-03-05 at 21:15 -0500, David Miller wrote:

> Targetting net-next for the assertion is fine.
> 
> Did you get test results back yet?

Lars gave the results this morning for the first patch.

He said he was now testing the official patch.

Lets wait a bit more ;)

^ permalink raw reply	[flat|nested] 16+ messages in thread

* RE: [PATCH] tcp: tcp_release_cb() should release socket ownership
  2014-03-06  2:20             ` Eric Dumazet
@ 2014-03-07 15:45               ` Lars Persson
  2014-03-07 16:01                 ` Eric Dumazet
  0 siblings, 1 reply; 16+ messages in thread
From: Lars Persson @ 2014-03-07 15:45 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: netdev

So far no deadlock on the socket lock is seen with this patch. I have seen two out-of-memory crashes which leaves the system deadlocked trying to allocate memory.

Stack traces and kernel log from one of the oom crashes is attached. I was thinking this may be caused by having the dirty_ratio set too high given our product's heavy use of memory. Currently re-testing with dirty_ratio reduced to 5%.


[11515.530000] Recording maint invoked oom-killer: gfp_mask=0x201da, order=0, oom_score_adj=0
[11515.540000] CPU: 0 PID: 1304 Comm: Recording maint Tainted: G           O 3.10.32 #10
[11515.550000] Stack : 80666e52 00000049 80660000 00000000 805563dc 8b1876f8 8054af80 805d2227
	  00000518 00000000 80656d40 8b1876f8 000007ce 00000002 00000000 8047d858
	  00000000 8003bb64 00000006 00000000 8054d024
 [11515.580000]  8b189a84 8b189a84 8054af80
	  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
	  00000000 00000000 00000000 00000000 00000000 00000000 00000000 8b189a18
	  ...
[11515.600000] Call Trace:
[11515.600000] [<8001d574>] show_stack+0x64/0x7c
 [11515.610000] [<8047f438>] dump_header.isra.15+0x80/0x234
[11515.610000] [<800dad58>] oom_kill_process+0x2b8/0x464
[11515.620000] [<800db480>] out_of_memory+0x330/0x35c
[11515.620000] [<800defb8>] __alloc_pages_nodemask+0x748/0x75c
[11515.630000] [<800d9ab0>] filemap_fault+0x1b8/0x4b0
 [11515.640000] [<800f8da8>] __do_fault+0x88/0x638
[11515.650000] [<800fc360>] handle_pte_fault+0xc4/0x1328
[11515.650000] [<800fd6ec>] handle_mm_fault+0x128/0x188
[11515.660000] [<800253ac>] do_page_fault+0x10c/0x520
[11515.660000] [<80017ee0>] ret_from_exception+0x0/0x10
[11515.670000] 
 [11515.670000] Mem-Info:
[11515.670000] Normal per-cpu:
[11515.680000] CPU    0: hi:   90, btch:  15 usd:  12
[11515.680000] CPU    1: hi:   90, btch:  15 usd:  88
[11515.690000] active_anon:8293 inactive_anon:12641 isolated_anon:0
[11515.690000]  active_file:9 inactive_file:247 isolated_file:0
[11515.690000]  unevictable:0 dirty:0 writeback:0 unstable:0
[11515.690000]  free:2034 slab_reclaimable:1371 slab_unreclaimable:20541
[11515.690000]  mapped:729 shmem:14591 pagetables:330 bounce:0
[11515.690000]  free_cma:0
 [11515.730000] Normal free:8136kB min:8192kB low:10240kB high:12288kB active_anon:33172kB inactive_anon:50564kB active_file:36kB inactive_file:1024kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:204800kB managed:190524kB mlocked:0kB dirty:0kB writeback:0kB mapped:2916kB shmem:58364kB slab_reclaimable:5484kB slab_unreclaimable:82164kB kernel_stack:1256kB pagetables:1320kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:340 all_unreclaimable? yes
 [11515.780000] lowmem_reserve[]: 0 0
[11515.780000] Normal: 32*4kB (UM) 0*8kB 0*16kB 0*32kB 1*64kB (R) 1*128kB (R) 1*256kB (R) 1*512kB (R) 1*1024kB (R) 1*2048kB (R) 1*4096kB (R) = 8256kB
[11515.800000] 14940 total pagecache pages
 [11515.820000] 51200 pages RAM
[11515.820000] 3486 pages reserved
[11515.830000] 263787 pages shared
[11515.830000] 44113 pages non-shared

Kernel thread cifsd:
-000 |__schedule()
-001 |__lock_sock(sk = 0x8BFD7920)
-002 |lock_sock_nested(sk = 0x8BFD7920, subclass = 0)
-003 |sk_wait_data(sk = 0x8BFD7920, timeo = 0x8AFD3C58)
-004 |tcp_recvmsg(?, sk = 0x8BFD7920, msg = 0x8AFD3DE0, len = 4, nonblock = 0, f
-005 |inet_recvmsg(iocb = 0x8AFD3CF8, ?, msg = 0x8AFD3DE0, size = 4, flags = 0)
-006 |sock_recvmsg(sock = 0x87C0A8E0, msg = 0x8AFD3DE0, size = 4, flags = 0)
-007 |kernel_recvmsg(?, ?, ?, ?, size = 4, flags = 0)
-008 |cifs_readv_from_socket(server = 0x8B072800, iov_orig = 0x8AFD3E40, nr_segs
-009 |cifs_read_from_socket(?, ?, ?)
-010 |cifs_demultiplex_thread(p = 0x8B072800)
-011 |kthread(_create = 0x8C7DFD40)
-012 |ret_from_kernel_thread(asm)
 --> |exception
 --- |end of frame

Kernel thread kworker/1:1
-000 |__schedule()
-001 |__cond_resched()
-002 |cond_resched()
-003 |shrink_slab(shrink = 0x8C6C799C, nr_pages_scanned = 32, lru_pages = 5)
-004 |try_to_free_pages(?, ?, ?, ?)
-005 |__alloc_pages_nodemask(?, order = 0, ?, nodemask = 0x0)
-006 |cache_alloc_refill(cachep = 0x8C414780, flags = 80, ?)
-007 |kmem_cache_alloc(cachep = 0x8C414780, flags = 80)
-008 |__alloc_skb(size = 1856, gfp_mask = 80, ?, ?)
-009 |sk_stream_alloc_skb(sk = 0x8BFD7920, size = 1648, ?)
-010 |tcp_sendmsg(?, sk = 0x8BFD7920, ?, ?)
-011 |sock_sendmsg(sock = 0x87C0A8E0, msg = 0x8C6C7CB8, size = 42)
-012 |kernel_sendmsg(?, ?, ?, ?, size = 42)
-013 |smb_send_kvec(server = 0x8B072800, iov = 0x8C6C7DC8, n_vec = 1, sent = 0x8C6C7D2C)
-014 |smb_send_rqst(server = 0x8B072800, rqst = 0x8C6C7DB0)
-015 |cifs_call_async(server = 0x8B072800, rqst = 0x8C6C7DB0, receive = 0x0, callback = 0x802194DC, cb
-016 |CIFSSMBEcho(server = 0x8B072800)
-017 |cifs_echo_request(work = 0x8B072A90)
-018 |process_one_work(worker = 0x8C6F5B80, work = 0x8B072A90)
-019 |worker_thread(__worker = 0x8C6F5B80)
-020 |kthread(_create = 0x8C47DD90)
-021 |ret_from_kernel_thread(asm)
 --> |exception
 --- |end of frame

sk = 0x8BFD7920 -> (
    __sk_common = (skc_addrpair = 8575039233117432000, skc_daddr = 16820416, skc_rcv_saddr = 1996531904, skc_hash = 670815510, skc_u16hashes = (54550, 10235), skc_portpair = 3106389249, skc_dport = 48385, skc_num = 47399, skc_f
    sk_lock = (
      slock = (
        rlock = (
          raw_lock = (
            lock = 0x00DC00DC,
            h = (serving_now = 220, ticket = 220)),
          magic = 0xDEAD4EAD,
          owner_cpu = 0xFFFFFFFF,
          owner = 0xFFFFFFFF,
          dep_map = (key = 0x80B87FC0, class_cache = (0x0, 0x0), name = 0x80568D0C)),
        __padding = (220, 0, 220, 0, 173, 78, 173, 222, 255, 255, 255, 255, 255, 255, 255, 255),
        dep_map = (key = 0x80B87FC0, class_cache = (0x0, 0x0), name = 0x80568D0C)),
      owned = 1,
      wq = (lock = (rlock = (raw_lock = (lock = 196611, h = (serving_now = 3, ticket = 3)), magic = 3735899821, owner_cpu = 4294967295, owner = 0xFFFFFFFF, dep_map = (key = 0x80B87FE0, class_cache = (0x0, 0x0), name = 0x80568CB
      dep_map = (key = 0x80B87FD0, class_cache = (0x0, 0x0), name = 0x80568D20)),



> -----Original Message-----
> From: Eric Dumazet [mailto:eric.dumazet@gmail.com]
> Sent: den 6 mars 2014 03:21
> To: David Miller
> Cc: Lars Persson; netdev@vger.kernel.org
> Subject: Re: [PATCH] tcp: tcp_release_cb() should release socket
> ownership
> 
> On Wed, 2014-03-05 at 21:15 -0500, David Miller wrote:
> 
> > Targetting net-next for the assertion is fine.
> >
> > Did you get test results back yet?
> 
> Lars gave the results this morning for the first patch.
> 
> He said he was now testing the official patch.
> 
> Lets wait a bit more ;)
> 
> 


^ permalink raw reply	[flat|nested] 16+ messages in thread

* RE: [PATCH] tcp: tcp_release_cb() should release socket ownership
  2014-03-07 15:45               ` Lars Persson
@ 2014-03-07 16:01                 ` Eric Dumazet
  2014-03-10 16:43                   ` Eric Dumazet
  0 siblings, 1 reply; 16+ messages in thread
From: Eric Dumazet @ 2014-03-07 16:01 UTC (permalink / raw)
  To: Lars Persson, David Miller; +Cc: netdev

On Fri, 2014-03-07 at 16:45 +0100, Lars Persson wrote:
> So far no deadlock on the socket lock is seen with this patch. I have seen two out-of-memory crashes which leaves the system deadlocked trying to allocate memory.
> 
> Stack traces and kernel log from one of the oom crashes is attached. I was thinking this may be caused by having the dirty_ratio set too high given our product's heavy use of memory. Currently re-testing with dirty_ratio reduced to 5%.
> 
> 
> [11515.530000] Recording maint invoked oom-killer: gfp_mask=0x201da, order=0, oom_score_adj=0
> [11515.540000] CPU: 0 PID: 1304 Comm: Recording maint Tainted: G           O 3.10.32 #10
> [11515.550000] Stack : 80666e52 00000049 80660000 00000000 805563dc 8b1876f8 8054af80 805d2227
> 	  00000518 00000000 80656d40 8b1876f8 000007ce 00000002 00000000 8047d858
> 	  00000000 8003bb64 00000006 00000000 8054d024
>  [11515.580000]  8b189a84 8b189a84 8054af80
> 	  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
> 	  00000000 00000000 00000000 00000000 00000000 00000000 00000000 8b189a18
> 	  ...


Well, this is 3.10.32, with local changes to dirty_ratio....

so I will simply ignore mm issues for the moment.

Thanks a lot Lars

Tested-by: Lars Persson <lars.persson@axis.com>

^ permalink raw reply	[flat|nested] 16+ messages in thread

* RE: [PATCH] tcp: tcp_release_cb() should release socket ownership
  2014-03-07 16:01                 ` Eric Dumazet
@ 2014-03-10 16:43                   ` Eric Dumazet
  2014-03-10 20:40                     ` David Miller
  0 siblings, 1 reply; 16+ messages in thread
From: Eric Dumazet @ 2014-03-10 16:43 UTC (permalink / raw)
  To: Lars Persson, David Miller; +Cc: netdev

On Fri, 2014-03-07 at 08:01 -0800, Eric Dumazet wrote:
> On Fri, 2014-03-07 at 16:45 +0100, Lars Persson wrote:
> > So far no deadlock on the socket lock is seen with this patch. I have seen two out-of-memory crashes which leaves the system deadlocked trying to allocate memory.
> > 
> > Stack traces and kernel log from one of the oom crashes is attached. I was thinking this may be caused by having the dirty_ratio set too high given our product's heavy use of memory. Currently re-testing with dirty_ratio reduced to 5%.
> > 
> > 
> > [11515.530000] Recording maint invoked oom-killer: gfp_mask=0x201da, order=0, oom_score_adj=0
> > [11515.540000] CPU: 0 PID: 1304 Comm: Recording maint Tainted: G           O 3.10.32 #10
> > [11515.550000] Stack : 80666e52 00000049 80660000 00000000 805563dc 8b1876f8 8054af80 805d2227
> > 	  00000518 00000000 80656d40 8b1876f8 000007ce 00000002 00000000 8047d858
> > 	  00000000 8003bb64 00000006 00000000 8054d024
> >  [11515.580000]  8b189a84 8b189a84 8054af80
> > 	  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
> > 	  00000000 00000000 00000000 00000000 00000000 00000000 00000000 8b189a18
> > 	  ...
> 
> 
> Well, this is 3.10.32, with local changes to dirty_ratio....
> 
> so I will simply ignore mm issues for the moment.
> 
> Thanks a lot Lars
> 
> Tested-by: Lars Persson <lars.persson@axis.com>

David, what happened with this patch ?

Should I re-submit it ?

^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH resend] tcp: tcp_release_cb() should release socket ownership
  2014-03-04 22:39     ` [PATCH] tcp: tcp_release_cb() should release socket ownership Eric Dumazet
  2014-03-06  1:59       ` David Miller
@ 2014-03-10 16:50       ` Eric Dumazet
  2014-03-11 20:46         ` David Miller
  1 sibling, 1 reply; 16+ messages in thread
From: Eric Dumazet @ 2014-03-10 16:50 UTC (permalink / raw)
  To: David Miller; +Cc: lars.persson, netdev

Lars Persson reported following deadlock :

-000 |M:0x0:0x802B6AF8(asm) <-- arch_spin_lock 
-001 |tcp_v4_rcv(skb = 0x8BD527A0) <-- sk = 0x8BE6B2A0
-002 |ip_local_deliver_finish(skb = 0x8BD527A0)
-003 |__netif_receive_skb_core(skb = 0x8BD527A0, ?)
-004 |netif_receive_skb(skb = 0x8BD527A0)
-005 |elk_poll(napi = 0x8C770500, budget = 64)
-006 |net_rx_action(?)
-007 |__do_softirq()
-008 |do_softirq()
-009 |local_bh_enable()
-010 |tcp_rcv_established(sk = 0x8BE6B2A0, skb = 0x87D3A9E0, th = 0x814EBE14, ?)
-011 |tcp_v4_do_rcv(sk = 0x8BE6B2A0, skb = 0x87D3A9E0)
-012 |tcp_delack_timer_handler(sk = 0x8BE6B2A0)
-013 |tcp_release_cb(sk = 0x8BE6B2A0)
-014 |release_sock(sk = 0x8BE6B2A0)
-015 |tcp_sendmsg(?, sk = 0x8BE6B2A0, ?, ?)
-016 |sock_sendmsg(sock = 0x8518C4C0, msg = 0x87D8DAA8, size = 4096)
-017 |kernel_sendmsg(?, ?, ?, ?, size = 4096)
-018 |smb_send_kvec()
-019 |smb_send_rqst(server = 0x87C4D400, rqst = 0x87D8DBA0)
-020 |cifs_call_async()
-021 |cifs_async_writev(wdata = 0x87FD6580)
-022 |cifs_writepages(mapping = 0x852096E4, wbc = 0x87D8DC88)
-023 |__writeback_single_inode(inode = 0x852095D0, wbc = 0x87D8DC88)
-024 |writeback_sb_inodes(sb = 0x87D6D800, wb = 0x87E4A9C0, work = 0x87D8DD88)
-025 |__writeback_inodes_wb(wb = 0x87E4A9C0, work = 0x87D8DD88)
-026 |wb_writeback(wb = 0x87E4A9C0, work = 0x87D8DD88)
-027 |wb_do_writeback(wb = 0x87E4A9C0, force_wait = 0)
-028 |bdi_writeback_workfn(work = 0x87E4A9CC)
-029 |process_one_work(worker = 0x8B045880, work = 0x87E4A9CC)
-030 |worker_thread(__worker = 0x8B045880)
-031 |kthread(_create = 0x87CADD90)
-032 |ret_from_kernel_thread(asm)

Bug occurs because __tcp_checksum_complete_user() enables BH, assuming
it is running from softirq context.

Lars trace involved a NIC without RX checksum support but other points
are problematic as well, like the prequeue stuff.

Problem is triggered by a timer, that found socket being owned by user.

tcp_release_cb() should call tcp_write_timer_handler() or
tcp_delack_timer_handler() in the appropriate context :

BH disabled and socket lock held, but 'owned' field cleared,
as if they were running from timer handlers.

Fixes: 6f458dfb4092 ("tcp: improve latencies of timer triggered events")
Reported-by: Lars Persson <lars.persson@axis.com>
Tested-by: Lars Persson <lars.persson@axis.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 include/net/sock.h    |    5 +++++
 net/core/sock.c       |    5 ++++-
 net/ipv4/tcp_output.c |   11 +++++++++++
 3 files changed, 20 insertions(+), 1 deletion(-)

diff --git a/include/net/sock.h b/include/net/sock.h
index 5c3f7c3624aa..2100d5e8809a 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -1488,6 +1488,11 @@ static inline void sk_wmem_free_skb(struct sock *sk, struct sk_buff *skb)
  */
 #define sock_owned_by_user(sk)	((sk)->sk_lock.owned)
 
+static inline void sock_release_ownership(struct sock *sk)
+{
+	sk->sk_lock.owned = 0;
+}
+
 /*
  * Macro so as to not evaluate some arguments when
  * lockdep is not enabled.
diff --git a/net/core/sock.c b/net/core/sock.c
index 5b6a9431b017..c0fc6bdad1e3 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -2357,10 +2357,13 @@ void release_sock(struct sock *sk)
 	if (sk->sk_backlog.tail)
 		__release_sock(sk);
 
+	/* Warning : release_cb() might need to release sk ownership,
+	 * ie call sock_release_ownership(sk) before us.
+	 */
 	if (sk->sk_prot->release_cb)
 		sk->sk_prot->release_cb(sk);
 
-	sk->sk_lock.owned = 0;
+	sock_release_ownership(sk);
 	if (waitqueue_active(&sk->sk_lock.wq))
 		wake_up(&sk->sk_lock.wq);
 	spin_unlock_bh(&sk->sk_lock.slock);
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index f0eb4e337ec8..17a11e65e57f 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -767,6 +767,17 @@ void tcp_release_cb(struct sock *sk)
 	if (flags & (1UL << TCP_TSQ_DEFERRED))
 		tcp_tsq_handler(sk);
 
+	/* Here begins the tricky part :
+	 * We are called from release_sock() with :
+	 * 1) BH disabled
+	 * 2) sk_lock.slock spinlock held
+	 * 3) socket owned by us (sk->sk_lock.owned == 1)
+	 *
+	 * But following code is meant to be called from BH handlers,
+	 * so we should keep BH disabled, but early release socket ownership
+	 */
+	sock_release_ownership(sk);
+
 	if (flags & (1UL << TCP_WRITE_TIMER_DEFERRED)) {
 		tcp_write_timer_handler(sk);
 		__sock_put(sk);

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [PATCH] tcp: tcp_release_cb() should release socket ownership
  2014-03-10 16:43                   ` Eric Dumazet
@ 2014-03-10 20:40                     ` David Miller
  0 siblings, 0 replies; 16+ messages in thread
From: David Miller @ 2014-03-10 20:40 UTC (permalink / raw)
  To: eric.dumazet; +Cc: lars.persson, netdev

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Mon, 10 Mar 2014 09:43:51 -0700

> On Fri, 2014-03-07 at 08:01 -0800, Eric Dumazet wrote:
>> On Fri, 2014-03-07 at 16:45 +0100, Lars Persson wrote:
>> > So far no deadlock on the socket lock is seen with this patch. I have seen two out-of-memory crashes which leaves the system deadlocked trying to allocate memory.
>> > 
>> > Stack traces and kernel log from one of the oom crashes is attached. I was thinking this may be caused by having the dirty_ratio set too high given our product's heavy use of memory. Currently re-testing with dirty_ratio reduced to 5%.
>> > 
>> > 
>> > [11515.530000] Recording maint invoked oom-killer: gfp_mask=0x201da, order=0, oom_score_adj=0
>> > [11515.540000] CPU: 0 PID: 1304 Comm: Recording maint Tainted: G           O 3.10.32 #10
>> > [11515.550000] Stack : 80666e52 00000049 80660000 00000000 805563dc 8b1876f8 8054af80 805d2227
>> > 	  00000518 00000000 80656d40 8b1876f8 000007ce 00000002 00000000 8047d858
>> > 	  00000000 8003bb64 00000006 00000000 8054d024
>> >  [11515.580000]  8b189a84 8b189a84 8054af80
>> > 	  00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
>> > 	  00000000 00000000 00000000 00000000 00000000 00000000 00000000 8b189a18
>> > 	  ...
>> 
>> 
>> Well, this is 3.10.32, with local changes to dirty_ratio....
>> 
>> so I will simply ignore mm issues for the moment.
>> 
>> Thanks a lot Lars
>> 
>> Tested-by: Lars Persson <lars.persson@axis.com>
> 
> David, what happened with this patch ?
> 
> Should I re-submit it ?

Yes, please do.  Thanks!

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH resend] tcp: tcp_release_cb() should release socket ownership
  2014-03-10 16:50       ` [PATCH resend] " Eric Dumazet
@ 2014-03-11 20:46         ` David Miller
  0 siblings, 0 replies; 16+ messages in thread
From: David Miller @ 2014-03-11 20:46 UTC (permalink / raw)
  To: eric.dumazet; +Cc: lars.persson, netdev

From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Mon, 10 Mar 2014 09:50:11 -0700

> Lars Persson reported following deadlock :
 ...
> Bug occurs because __tcp_checksum_complete_user() enables BH, assuming
> it is running from softirq context.
> 
> Lars trace involved a NIC without RX checksum support but other points
> are problematic as well, like the prequeue stuff.
> 
> Problem is triggered by a timer, that found socket being owned by user.
> 
> tcp_release_cb() should call tcp_write_timer_handler() or
> tcp_delack_timer_handler() in the appropriate context :
> 
> BH disabled and socket lock held, but 'owned' field cleared,
> as if they were running from timer handlers.
> 
> Fixes: 6f458dfb4092 ("tcp: improve latencies of timer triggered events")
> Reported-by: Lars Persson <lars.persson@axis.com>
> Tested-by: Lars Persson <lars.persson@axis.com>
> Signed-off-by: Eric Dumazet <edumazet@google.com>

Applied and queued up for -stable, thanks!

^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2014-03-11 20:46 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-03-04  8:59 [BUG] Deadlock on sk->sk_lock.slock Lars Persson
2014-03-04 14:16 ` Eric Dumazet
2014-03-04 18:48   ` David Miller
2014-03-04 22:39     ` [PATCH] tcp: tcp_release_cb() should release socket ownership Eric Dumazet
2014-03-06  1:59       ` David Miller
2014-03-06  2:06         ` Eric Dumazet
2014-03-06  2:15           ` David Miller
2014-03-06  2:20             ` Eric Dumazet
2014-03-07 15:45               ` Lars Persson
2014-03-07 16:01                 ` Eric Dumazet
2014-03-10 16:43                   ` Eric Dumazet
2014-03-10 20:40                     ` David Miller
2014-03-10 16:50       ` [PATCH resend] " Eric Dumazet
2014-03-11 20:46         ` David Miller
2014-03-05 15:01   ` [BUG] Deadlock on sk->sk_lock.slock Lars Persson
2014-03-05 17:34     ` Eric Dumazet

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.