All of lore.kernel.org
 help / color / mirror / Atom feed
* Fw: Nasty Oops in 2.6.0-test6 bind/SO_REUSEADDR
@ 2003-10-08 20:33 David S. Miller
  2003-10-09 17:03 ` Dan Merillat
  2003-10-10  2:36 ` Arnaldo Carvalho de Melo
  0 siblings, 2 replies; 7+ messages in thread
From: David S. Miller @ 2003-10-08 20:33 UTC (permalink / raw)
  To: acme; +Cc: netdev

[-- Attachment #1: Type: text/plain, Size: 1880 bytes --]

Arnaldo, I think this is another piece of fallout
from the struct sock splitup you did ages ago.

I think it's dereferencing inet_sk(sk) for a time-wait
socket, so we probably need a TCP_TIME_WAIT test plus
some additional logic here?  Better check tcp_ipv6.c too.

Begin forwarded message:

Date: Wed, 8 Oct 2003 16:04:09 -0400
From: Dan Merillat <dmerillat@sequiam.com>
To: linux-kernel@vger.kernel.org
Subject: Nasty Oops in 2.6.0-test6 bind/SO_REUSEADDR



I can't provide a stacktrace because it hardlocks the system, but it's
trivial to reproduce.

Swap back and forth between apache2 and apache a few times, and it
hardlocks at bind.

>From what I copied down and backtraced we crash at tcp_v4_get_port +
0x378/390, which is in tcp_ipv4.c:194 (inline tcp_bind_conflict)

                struct inet_opt *inet2 = inet_sk(sk2);
                if (!inet2->rcv_saddr || !inet->rcv_saddr ||
                    inet2->rcv_saddr == inet->rcv_saddr)
                    break;

     468:       0f b6 40 49             movzbl 0x49(%eax),%eax
     46c:       83 e0 20                and    $0x20,%eax
     46f:       84 c0                   test   %al,%al

In fact, I believe the problem to be with SO_REUSEADDR.  It only
manifests if the port has gotten traffic and there's sockets in
TIME_WAIT.

I suppose a trivial test would be to bind to a port, connect to it,
disconnect, close the socket, create a socket with SO_REUSEADDR and
rebind to it.  Pow.

I can't get UML 2.6.0 working so I can't test very well, but it's a
helluva showstopper.

The strace of apache starting up when it crashed:

socket(PF_INET, SOCK_STREAM, IPPROTO_TCP) = 3
fcntl64(3, F_DUPFD, 15)                 = 20
close(3)                                = 0
setsockopt(20, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
setsockopt(20, SOL_SOCKET, SO_KEEPALIVE, [1], 4

(oopsed in bind so strace never saw it)

--Dan


[-- Attachment #2: 00000002.mimetmp --]
[-- Type: application/pgp-signature, Size: 156 bytes --]

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Fw: Nasty Oops in 2.6.0-test6 bind/SO_REUSEADDR
  2003-10-08 20:33 Fw: Nasty Oops in 2.6.0-test6 bind/SO_REUSEADDR David S. Miller
@ 2003-10-09 17:03 ` Dan Merillat
  2003-10-10  2:36 ` Arnaldo Carvalho de Melo
  1 sibling, 0 replies; 7+ messages in thread
From: Dan Merillat @ 2003-10-09 17:03 UTC (permalink / raw)
  To: netdev

[-- Attachment #1: Type: text/plain, Size: 1694 bytes --]

On Wed, 08 Oct 2003, David S. Miller wrote:

> Arnaldo, I think this is another piece of fallout
> from the struct sock splitup you did ages ago.
> 
> I think it's dereferencing inet_sk(sk) for a time-wait
> socket, so we probably need a TCP_TIME_WAIT test plus
> some additional logic here?  Better check tcp_ipv6.c too.

Found some more on this, it's been entered into the kernel bug-tracker

http://bugme.osdl.org/show_bug.cgi?id=1271

He managed to get an oops out of his:

Unable to handle kernel NULL pointer dereference at virtual address 00000049
 printing eip:
c030b346
*pde = 00000000
Oops: 0000 [#1]
CPU:    1
EIP:    0060:[<c030b346>]    Not tainted
EFLAGS: 00010246
EIP is at tcp_v4_get_port+0x3c6/0x3e0
eax: 00000000   ebx: f74ff380   ecx: f667ff40   edx: f667ff50
esi: 00000002   edi: 00002151   ebp: f66097c0   esp: f6b0be68
ds: 007b   es: 007b   ss: 0068
Process perl (pid: 3433, threadinfo=f6b0a000 task=f6b0d900)
Stack: 00000000 00000000 00000000 f66270d0 00000000 00000000 00000001 f6609908 
       00000000 00000000 00000000 00000001 f7c90a88 f66097c0 ffffffea f6609908 
       f6b0bee8 c031f215 f66097c0 00002151 c02d568d 00000003 21511818 f6612740 
Call Trace:
 [<c031f215>] inet_bind+0x1d5/0x300
 [<c02d568d>] move_addr_to_kernel+0x8d/0xa0
 [<c02d6d8b>] sys_bind+0x7b/0xb0
 [<c011c11c>] do_page_fault+0x23c/0x44f
 [<c02d59dc>] sockfd_lookup+0x1c/0x80
 [<c02d74d8>] sys_setsockopt+0x78/0xc0
 [<c02d7be8>] sys_socketcall+0xc8/0x2b0
 [<c01095d9>] sysenter_past_esp+0x52/0x71

Code: 0f b6 40 49 24 20 84 c0 75 97 eb 89 89 14 24 e8 06 51 e1 ff 
 <0>Kernel panic: Fatal exception in interrupt
In interrupt handler - not syncing



[-- Attachment #2: Type: application/pgp-signature, Size: 155 bytes --]

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Fw: Nasty Oops in 2.6.0-test6 bind/SO_REUSEADDR
  2003-10-08 20:33 Fw: Nasty Oops in 2.6.0-test6 bind/SO_REUSEADDR David S. Miller
  2003-10-09 17:03 ` Dan Merillat
@ 2003-10-10  2:36 ` Arnaldo Carvalho de Melo
  2003-10-10  3:22   ` [RFT] " Arnaldo Carvalho de Melo
  1 sibling, 1 reply; 7+ messages in thread
From: Arnaldo Carvalho de Melo @ 2003-10-10  2:36 UTC (permalink / raw)
  To: David S. Miller; +Cc: netdev, Dan Merillat

Em Wed, Oct 08, 2003 at 01:33:45PM -0700, David S. Miller escreveu:
> Arnaldo, I think this is another piece of fallout
> from the struct sock splitup you did ages ago.
> 
> I think it's dereferencing inet_sk(sk) for a time-wait
> socket, so we probably need a TCP_TIME_WAIT test plus
> some additional logic here?  Better check tcp_ipv6.c too.

Dan, could you please try with this patch?

- Arnaldo


===== net/ipv4/tcp_ipv4.c 1.69 vs edited =====
--- 1.69/net/ipv4/tcp_ipv4.c	Wed Oct  8 12:27:40 2003
+++ edited/net/ipv4/tcp_ipv4.c	Thu Oct  9 23:23:38 2003
@@ -186,7 +186,8 @@
 	int reuse = sk->sk_reuse;
 
 	sk_for_each_bound(sk2, node, &tb->owners) {
-		if (sk != sk2 &&
+		if (likely(sk->sk_state != TCP_TIME_WAIT) &&
+		    sk != sk2 &&
 		    !ipv6_only_sock(sk2) &&
 		    (!sk->sk_bound_dev_if ||
 		     !sk2->sk_bound_dev_if ||
===== net/ipv6/tcp_ipv6.c 1.74 vs edited =====
--- 1.74/net/ipv6/tcp_ipv6.c	Wed Oct  8 12:27:40 2003
+++ edited/net/ipv6/tcp_ipv6.c	Thu Oct  9 23:23:45 2003
@@ -101,7 +101,8 @@
 
 	/* We must walk the whole port owner list in this case. -DaveM */
 	sk_for_each_bound(sk2, node, &tb->owners) {
-		if (sk != sk2 &&
+		if (likely(sk->sk_state != TCP_TIME_WAIT) &&
+		    sk != sk2 &&
 		    (!sk->sk_bound_dev_if ||
 		     !sk2->sk_bound_dev_if ||
 		     sk->sk_bound_dev_if == sk2->sk_bound_dev_if) &&

^ permalink raw reply	[flat|nested] 7+ messages in thread

* [RFT] Re: Fw: Nasty Oops in 2.6.0-test6 bind/SO_REUSEADDR
  2003-10-10  2:36 ` Arnaldo Carvalho de Melo
@ 2003-10-10  3:22   ` Arnaldo Carvalho de Melo
  2003-10-10  4:14     ` David S. Miller
  0 siblings, 1 reply; 7+ messages in thread
From: Arnaldo Carvalho de Melo @ 2003-10-10  3:22 UTC (permalink / raw)
  To: David S. Miller; +Cc: netdev, Dan Merillat

Em Thu, Oct 09, 2003 at 11:36:44PM -0300, Arnaldo C. Melo escreveu:
> Em Wed, Oct 08, 2003 at 01:33:45PM -0700, David S. Miller escreveu:
> > Arnaldo, I think this is another piece of fallout
> > from the struct sock splitup you did ages ago.
> > 
> > I think it's dereferencing inet_sk(sk) for a time-wait
> > socket, so we probably need a TCP_TIME_WAIT test plus
> > some additional logic here?  Better check tcp_ipv6.c too.
> 
> Dan, could you please try with this patch?

Better, try with this one, it has an extra check that can catch other places
where we have a struct tcp_tw_bucket being used as a struct sock...

Tested here on a dual p100, no problems so far, i.e. using the kernel with the
patch below, not doing the apache tests.

- Arnaldo

===== include/linux/ip.h 1.10 vs edited =====
--- 1.10/include/linux/ip.h	Fri May 16 18:02:36 2003
+++ edited/include/linux/ip.h	Thu Oct  9 23:36:06 2003
@@ -157,7 +157,13 @@
 	struct inet_opt   inet;
 };
 
+#if 0
 #define inet_sk(__sk) (&((struct inet_sock *)__sk)->inet)
+#else
+#include <linux/tcp.h>
+#define inet_sk(__sk) ({ WARN_ON(__sk->sk_state == TCP_TIME_WAIT); \
+			 (&((struct inet_sock *)__sk)->inet); })
+#endif
 
 #endif
 
===== net/ipv4/tcp_ipv4.c 1.69 vs edited =====
--- 1.69/net/ipv4/tcp_ipv4.c	Wed Oct  8 12:27:40 2003
+++ edited/net/ipv4/tcp_ipv4.c	Thu Oct  9 23:23:38 2003
@@ -186,7 +186,8 @@
 	int reuse = sk->sk_reuse;
 
 	sk_for_each_bound(sk2, node, &tb->owners) {
-		if (sk != sk2 &&
+		if (likely(sk->sk_state != TCP_TIME_WAIT) &&
+		    sk != sk2 &&
 		    !ipv6_only_sock(sk2) &&
 		    (!sk->sk_bound_dev_if ||
 		     !sk2->sk_bound_dev_if ||
===== net/ipv6/tcp_ipv6.c 1.74 vs edited =====
--- 1.74/net/ipv6/tcp_ipv6.c	Wed Oct  8 12:27:40 2003
+++ edited/net/ipv6/tcp_ipv6.c	Thu Oct  9 23:23:45 2003
@@ -101,7 +101,8 @@
 
 	/* We must walk the whole port owner list in this case. -DaveM */
 	sk_for_each_bound(sk2, node, &tb->owners) {
-		if (sk != sk2 &&
+		if (likely(sk->sk_state != TCP_TIME_WAIT) &&
+		    sk != sk2 &&
 		    (!sk->sk_bound_dev_if ||
 		     !sk2->sk_bound_dev_if ||
 		     sk->sk_bound_dev_if == sk2->sk_bound_dev_if) &&

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [RFT] Re: Fw: Nasty Oops in 2.6.0-test6 bind/SO_REUSEADDR
  2003-10-10  3:22   ` [RFT] " Arnaldo Carvalho de Melo
@ 2003-10-10  4:14     ` David S. Miller
  2003-10-10 15:40       ` Arnaldo Carvalho de Melo
  0 siblings, 1 reply; 7+ messages in thread
From: David S. Miller @ 2003-10-10  4:14 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo; +Cc: netdev, dmerillat


Even if they make the OOPS go away, all of these patches are
not correct.

You have to consider the timewait sockets just like established
sockets in the bind conflict test.

Therefore in the most inner part of this chain of if's you need
to have something like:

	u32 rcv_saddr;

	if (sk->sk_state == TCP_TIME_WAIT) {
		struct tcp_tw_bucket *tw = tcptw_sk(sk2);
		rcv_saddr = tw->tw_rcv_saddr;
	} else {
		struct inet_sock *inet = inet_sk(sk2);
		rcv_saddr = inet->rcv_saddr;
	}
	if (!rcv_saddr || !sk->rcv_saddr ||
	    sk->rcv_saddr == rcv_saddr)
		... blah blah blah ...

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [RFT] Re: Fw: Nasty Oops in 2.6.0-test6 bind/SO_REUSEADDR
  2003-10-10  4:14     ` David S. Miller
@ 2003-10-10 15:40       ` Arnaldo Carvalho de Melo
  2003-10-11 19:56         ` David S. Miller
  0 siblings, 1 reply; 7+ messages in thread
From: Arnaldo Carvalho de Melo @ 2003-10-10 15:40 UTC (permalink / raw)
  To: David S. Miller; +Cc: netdev, dmerillat

Em Thu, Oct 09, 2003 at 09:14:37PM -0700, David S. Miller escreveu:
> 
> Even if they make the OOPS go away, all of these patches are
> not correct.
> 
> You have to consider the timewait sockets just like established
> sockets in the bind conflict test.
> 
> Therefore in the most inner part of this chain of if's you need
> to have something like:
> 
> 	u32 rcv_saddr;
> 
> 	if (sk->sk_state == TCP_TIME_WAIT) {
> 		struct tcp_tw_bucket *tw = tcptw_sk(sk2);
> 		rcv_saddr = tw->tw_rcv_saddr;
> 	} else {
> 		struct inet_sock *inet = inet_sk(sk2);
> 		rcv_saddr = inet->rcv_saddr;
> 	}
> 	if (!rcv_saddr || !sk->rcv_saddr ||
> 	    sk->rcv_saddr == rcv_saddr)
> 		... blah blah blah ...

Gotcha, I'm working on this now, creating inet_rcv_saddr and inet6_rcv_saddr
inline functions, and in the process I'm finding what seems like a bug in
ipv6_rcv_saddr_equal, look at this:

        if (sk2->sk_family == AF_INET6 &&
            !ipv6_addr_cmp(&np->rcv_saddr,
                           (sk2->sk_state != TCP_TIME_WAIT ?
                            &inet6_sk(sk2)->rcv_saddr :
                            &tcptw_sk(sk)->tw_v6_rcv_saddr)))
                                      ^^
                                      ^^
shouldn't the tcp_tw_sk(sk) be tcp_tw_sk(sk2)?

And in this function we have the guard against it being a tcp_tw_bucket, but
not in all places...

Anyway, I'll be posting patches today.

- Arnaldo

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [RFT] Re: Fw: Nasty Oops in 2.6.0-test6 bind/SO_REUSEADDR
  2003-10-10 15:40       ` Arnaldo Carvalho de Melo
@ 2003-10-11 19:56         ` David S. Miller
  0 siblings, 0 replies; 7+ messages in thread
From: David S. Miller @ 2003-10-11 19:56 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo; +Cc: netdev, dmerillat

On Fri, 10 Oct 2003 12:40:52 -0300
Arnaldo Carvalho de Melo <acme@conectiva.com.br> wrote:

> ipv6_rcv_saddr_equal, look at this:
> 
>         if (sk2->sk_family == AF_INET6 &&
>             !ipv6_addr_cmp(&np->rcv_saddr,
>                            (sk2->sk_state != TCP_TIME_WAIT ?
>                             &inet6_sk(sk2)->rcv_saddr :
>                             &tcptw_sk(sk)->tw_v6_rcv_saddr)))
>                                       ^^
>                                       ^^
> shouldn't the tcp_tw_sk(sk) be tcp_tw_sk(sk2)?

Yes.

> And in this function we have the guard against it being a tcp_tw_bucket, but
> not in all places...

It does guard in this spot, that's why it is checking the
sk_state value.

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2003-10-11 19:56 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2003-10-08 20:33 Fw: Nasty Oops in 2.6.0-test6 bind/SO_REUSEADDR David S. Miller
2003-10-09 17:03 ` Dan Merillat
2003-10-10  2:36 ` Arnaldo Carvalho de Melo
2003-10-10  3:22   ` [RFT] " Arnaldo Carvalho de Melo
2003-10-10  4:14     ` David S. Miller
2003-10-10 15:40       ` Arnaldo Carvalho de Melo
2003-10-11 19:56         ` David S. Miller

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.