netdev.vger.kernel.org archive mirror
* Fwd: Kernel Oops in __inet_twsk_kill()
       [not found] <FB8A4655DFD2B34DB16AE06DDDD6C0E231A6F030@SJEXCHMB12.corp.ad.broadcom.com>
@ 2014-11-05 16:00 ` Daniel Borkmann
  2014-11-05 18:00   ` Cong Wang
  0 siblings, 1 reply; 7+ messages in thread
From: Daniel Borkmann @ 2014-11-05 16:00 UTC (permalink / raw)
  To: charley.chu; +Cc: netdev

[ moving to netdev ]

-------- Original Message --------
Subject: Kernel Oops in __inet_twsk_kill()
Date: Tue, 4 Nov 2014 23:47:18 +0000
From: Charley (Hao Chuan) Chu <charley.chu@broadcom.com>
To: linux-kernel@vger.kernel.org <linux-kernel@vger.kernel.org>

We have a situation on our system: it brings the network interface up and down
every few seconds. Eventually this brings down the system; the kernel crashes on
the BUG_ON() in __inet_twsk_kill(). Our debug messages show the following call flow.

1) The time-wait socket is created by tcp_time_wait() when the socket enters the "TIME_WAIT" state.
     inet_twsk_alloc()      - refcnt = 0
     inet_twsk_hashdance()  - refcnt = 3
     inet_twsk_schedule()   - refcnt = 4
     inet_twsk_put()        - refcnt = 3
2) tcp_v4_timewait_ack() is called when a sync is received.
     inet_twsk_put()        - refcnt = 2      <== where we think the problem is
     Occasionally a second sync is received, so inet_twsk_put() is called twice - refcnt = 1
3) twdr_do_twkill_work() is called when the timer expires.
     call __inet_twsk_kill() - BUG_ON!!! as refcnt = 2 (supposed to be 3).
     call inet_twsk_put()

In the normal case, the call flow has only step 1 and step 3. Our understanding is
that the time-wait socket holds three references: ehash, bhash and the timer death
row. In step 2, none of them is touched. Can anyone here explain why inet_twsk_put()
is called in tcp_v4_timewait_ack()?
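
The counts in the call flow above can be replayed with a small userspace
model. This is only a sketch of the numbers given in this report, not kernel
code: struct tw_model and every tw_model_*() helper are hypothetical names,
and the "expects 3" check mirrors the report's expectation at step 3 rather
than the exact condition the kernel tests.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model: tracks only the reference count. */
struct tw_model {
	int refcnt;
};

/* Step 1 of the report: apply the count changes listed there, in order. */
static void tw_model_create(struct tw_model *tw)
{
	tw->refcnt = 0;		/* inet_twsk_alloc() */
	tw->refcnt += 3;	/* inet_twsk_hashdance() */
	tw->refcnt += 1;	/* inet_twsk_schedule() */
	tw->refcnt -= 1;	/* inet_twsk_put() */
	assert(tw->refcnt == 3);
}

/* Step 2 of the report: each unmatched put drops one reference. */
static void tw_model_put(struct tw_model *tw)
{
	tw->refcnt -= 1;
}

/* Step 3 of the report: the kill path is expected to see a count of 3. */
static bool tw_model_kill_ok(const struct tw_model *tw)
{
	return tw->refcnt == 3;
}

/* Returns the count seen at step 3 after `unmatched_puts` extra puts. */
static int tw_model_refcnt_at_kill(int unmatched_puts)
{
	struct tw_model tw;

	tw_model_create(&tw);
	while (unmatched_puts-- > 0)
		tw_model_put(&tw);
	return tw.refcnt;
}
```

With no extra put the model reaches step 3 with a count of 3; one or two
unmatched puts leave 2 or 1, which are the values reported at the crash. A put
that has a matching earlier get (for example, a reference taken when the
incoming packet's socket was looked up) would not show up in this model at all,
so the interesting question is which put has no matching get.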

Our system runs a 3.14 kernel.

Any help would be highly appreciated.

Charley Chu



* Re: Kernel Oops in __inet_twsk_kill()
  2014-11-05 16:00 ` Fwd: Kernel Oops in __inet_twsk_kill() Daniel Borkmann
@ 2014-11-05 18:00   ` Cong Wang
  2014-11-05 19:16     ` Charley (Hao Chuan) Chu
  0 siblings, 1 reply; 7+ messages in thread
From: Cong Wang @ 2014-11-05 18:00 UTC (permalink / raw)
  To: Daniel Borkmann; +Cc: charley.chu, netdev

On Wed, Nov 5, 2014 at 8:00 AM, Daniel Borkmann <borkmann@iogearbox.net> wrote:
> [ moving to netdev ]
>
> -------- Original Message --------
> Subject: Kernel Oops in __inet_twsk_kill()
> Date: Tue, 4 Nov 2014 23:47:18 +0000
> From: Charley (Hao Chuan) Chu <charley.chu@broadcom.com>
> To: linux-kernel@vger.kernel.org <linux-kernel@vger.kernel.org>
>
> We have situation on our system. It brings the network interface up and down
> every
> a few seconds. Eventually, it brings down the system - the kernel crashed
> due to BUG
> on in __inet_twsk_kill(). The debug message show following call flow.
>
> 1) time-wait socket is created by tcp_time_wait() when the socket gets into
> "TIME_WAIT" state.
>     inet_twsk_alloc()               - refcnt= 0
>     inet_twsk_hashdance()  - refcnt = 3
>     inet_twsk_schedule()      - refcnt = 4
>     inet_twsk_put()                 - refcnt = 3
> 2) tcp_v4_timewait_ack() is called when sync is received
>     inet_twsk_put()                  - refcnt= 2      <== where we thing the
> problem is
>     occasionally, second sync is received, so the inet_twsk_put is called
> twice - refcnt = 1
> 3) twdr_do_twkill_work() is called when timed out
>     call __inet_twsk_kill - BUG_ON!!! as refcnt=2 (supposed to be 3).
>     call inet_twsk_put()
>
> In a normal case, the callflow only has step 1 and step 3.  Our
> understanding is
> the time-wait socket has three references - ehash, bhash and timer death
> row. In
> step 2, none of them are touched. Can anyone here explain to us why the
> inet_twsk_put()
> is called in tcp_v4_timewait_ack()?
>

It has been there for a rather long time, but that doesn't mean it is
correct. Its caller calls inet_twsk_put() on the error path, so it smells
wrong to call it on the non-error path as well. But I haven't looked into this.


* RE: Kernel Oops in __inet_twsk_kill()
  2014-11-05 18:00   ` Cong Wang
@ 2014-11-05 19:16     ` Charley (Hao Chuan) Chu
  2014-11-05 19:39       ` David Miller
  0 siblings, 1 reply; 7+ messages in thread
From: Charley (Hao Chuan) Chu @ 2014-11-05 19:16 UTC (permalink / raw)
  To: Cong Wang, Daniel Borkmann; +Cc: netdev

Thanks Daniel and Cong,

The problem has been fixed. It was introduced by a third-party patch that decrements the refcnt of the timewait socket.

Charley


* Re: Kernel Oops in __inet_twsk_kill()
  2014-11-05 19:16     ` Charley (Hao Chuan) Chu
@ 2014-11-05 19:39       ` David Miller
  2014-11-05 20:35         ` Charley (Hao Chuan) Chu
  0 siblings, 1 reply; 7+ messages in thread
From: David Miller @ 2014-11-05 19:39 UTC (permalink / raw)
  To: charley.chu; +Cc: cwang, borkmann, netdev

From: "Charley (Hao Chuan) Chu" <charley.chu@broadcom.com>
Date: Wed, 5 Nov 2014 19:16:09 +0000

> The problem has been fixed. It is introduced by a third party patch,
> which decreases the refcnt of timewait socket.

This is something that really drives me nuts, seriously?

Never report problems to us when you have such patches applied.

It is an even larger crime to not mention this at all from the
beginning, because we might start to investigate and waste our
precious time doing so.


* RE: Kernel Oops in __inet_twsk_kill()
  2014-11-05 19:39       ` David Miller
@ 2014-11-05 20:35         ` Charley (Hao Chuan) Chu
  2015-07-10 16:36           ` Phani
  0 siblings, 1 reply; 7+ messages in thread
From: Charley (Hao Chuan) Chu @ 2014-11-05 20:35 UTC (permalink / raw)
  To: David Miller; +Cc: cwang, borkmann, netdev

Hi David,

Sorry, it was my mistake; I forgot to mention that we have a third-party patch applied.

Thanks,
Charley


* Re: Kernel Oops in __inet_twsk_kill()
  2014-11-05 20:35         ` Charley (Hao Chuan) Chu
@ 2015-07-10 16:36           ` Phani
  2015-08-03  2:19             ` charley
  0 siblings, 1 reply; 7+ messages in thread
From: Phani @ 2015-07-10 16:36 UTC (permalink / raw)
  To: netdev



> > The problem has been fixed. It is introduced by a third party patch,
> > which decreases the refcnt of timewait socket.

What is the fix?
Please share the Gerrit link.


* Re: Kernel Oops in __inet_twsk_kill()
  2015-07-10 16:36           ` Phani
@ 2015-08-03  2:19             ` charley
  0 siblings, 0 replies; 7+ messages in thread
From: charley @ 2015-08-03  2:19 UTC (permalink / raw)
  To: netdev

Phani <pgargey <at> yahoo.com> writes:

> > > The problem has been fixed. It is introduced by a third party patch,
> > > which decreases the refcnt of timewait socket.
>
> What is the fix?
> pl share the gerrit.

Here you go.

https://android.googlesource.com/kernel/common/+/android-3.14/net/netfilter/xt_qtaguid.c


diff --git a/net/netfilter/xt_qtaguid.c b/net/netfilter/xt_qtaguid.c
--- a/net/netfilter/xt_qtaguid.c
+++ b/net/netfilter/xt_qtaguid.c
@@ -1605,7 +1605,8 @@ static struct sock *qtaguid_find_sk(const struct sk_buff *skb,
 		 * "struct inet_timewait_sock" which is missing fields.
 		 */
 		if (sk->sk_state  == TCP_TIME_WAIT) {
-			sock_gen_put(sk);
+			if (sk != skb->sk)
+				sock_gen_put(sk);
 			sk = NULL;
 		}
 	}
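
The idea behind the hunk above, as far as it can be read from the diff, is
"only drop a reference that this code path took itself". A minimal userspace
sketch of that rule follows. It assumes the lookup may hand back the socket
already attached to the packet (skb->sk) without taking a new reference,
which this thread does not confirm; struct sock_model, struct skb_model and
all *_model helpers are hypothetical stand-ins, not kernel types or functions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for struct sock and struct sk_buff. */
struct sock_model {
	int refcnt;
};

struct skb_model {
	struct sock_model *sk;	/* socket already attached to the packet, if any */
};

static void sock_model_put(struct sock_model *sk)
{
	sk->refcnt--;
}

/*
 * Assumed lookup behaviour: a socket attached to the packet is borrowed
 * as-is; a socket found in a table gets one extra reference.
 */
static struct sock_model *find_sk_model(const struct skb_model *skb,
					struct sock_model *table_hit)
{
	if (skb->sk)
		return skb->sk;		/* borrowed: no reference taken */
	if (table_hit)
		table_hit->refcnt++;	/* looked up: reference taken */
	return table_hit;
}

/* Unpatched shape: always put, even when the socket was only borrowed. */
static void release_sk_unpatched(struct sock_model *sk)
{
	if (sk)
		sock_model_put(sk);
}

/* Patched shape: put only when the lookup took its own reference. */
static void release_sk_model(const struct skb_model *skb,
			     struct sock_model *sk)
{
	if (sk && sk != skb->sk)
		sock_model_put(sk);
}
```

Under that assumption, the unpatched release steals one reference from
whoever attached the socket to the packet each time it runs, which matches the
one-per-packet drop described at the start of this thread. The patched release
leaves a borrowed socket's count untouched and still balances the get when the
lookup took its own reference.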
