* CLOSE_WAIT bug?
@ 2001-06-12 17:30 eg_nth
2001-06-13 2:07 ` David Schwartz
0 siblings, 1 reply; 5+ messages in thread
From: eg_nth @ 2001-06-12 17:30 UTC (permalink / raw)
To: linux-kernel; +Cc: dicky
[-- Attachment #1: Type: TEXT/PLAIN, Size: 1320 bytes --]
Hi all,
I suspect that there is a bug in both kernels 2.2.19 and 2.4.5.
The situation is as follows.
One server socket is created and listening, blocking in select();
once a client connects to that port, another thread on the server
side issues a close() on the new connection.
After the client closes the connection, the connection on the server
side stays stuck in CLOSE_WAIT forever, until the program is killed.
I have attached a program to trigger the bug.
The program is written based on a bugtraq post at this link:
http://archives.indenial.com/hypermail/bugtraq/1999/January1999/0015.html
This is the output of "netstat -anop":
tcp 1 0 127.0.0.1:52882 127.0.0.1:1031 CLOSE_WAIT - off (0.00/0/0)
tcp 1 0 127.0.0.1:52882 127.0.0.1:1030 CLOSE_WAIT - off (0.00/0/0)
You can see that there is no owner and the timer is off.
I encountered this in my server program, and the stuck CLOSE_WAIT
connections eat up all the resources, as they can never be released.
I have tried this on kernels 2.2.16, 2.2.19 and 2.4.5, using
gcc version 2.96 20000731 (Red Hat Linux 7.0); all of them have this problem.
I am new to kernel hacking. I don't know whether this is a bug or not.
Please correct me if I am doing something wrong and forgive my poor
description. :)
Thanks
Dicky
PS. Please CC: dicky@sinocdn.com when reply.
[-- Attachment #2: Type: TEXT/PLAIN, Size: 3157 bytes --]
// This program will kill a random port on a linux machine. The kernel will
// forever listen to that port and send the connections nowhere. Tested with
// Linux kernel 2.0.35 and libc-2.0.7. Requires LinuxThreads to compile,
// but removing LinuxThreads from your system will not solve the problem.
// The bug is triggered when a multithreaded program closes a socket from
// one thread while another thread is selecting on it. A subsequent abort
// leaves the socket in never-never land.
// Do not underestimate the risk of this exploit. While this program
// is mild, more vicious programs could lock large numbers of ports or
// replicate this same attack on an active connection with large
// send/receive buffers full of data. This could cause large increases
// in kernel memory consumption.
// Discovered by David J. Schwartz <davids@webmaster.com>
// Copyright (C) 1998, David J. Schwartz
// Note: This bug was not fixed in 2.0.36, as I was told it would be
// Compile with:
// gcc CLOSE_WAIT_test.c -lpthread -o CLOSE_WAIT_test
#include <pthread.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <stdlib.h>
#include <arpa/inet.h>
#include <errno.h>

volatile int s;
volatile int sock;
volatile int connected = 0;

void *Thread1(void *a)
{
    int p;
    struct sockaddr_in to;
    fd_set fd;

    s = socket(AF_INET, SOCK_STREAM, 0);
    if (s < 0)
        return NULL;
    memset(&to, 0, sizeof(to));
    srand(getpid());
    /* we pick a random port between 50000 and 59999 */
    p = (rand() % 10000) + 50000;
    printf("port = %d\n", p);
    fflush(stdout);
    to.sin_port = htons(p);
    to.sin_addr.s_addr = 0;
    to.sin_family = AF_INET;
    if (bind(s, (struct sockaddr *)&to, sizeof(to)) < 0)
        fprintf(stderr, "no bind\n");
    if (listen(s, 10) != 0)
        fprintf(stderr, "No Listen\n");
    /* now we are listening on that port */
    FD_ZERO(&fd);
    FD_SET(s, &fd);
    fprintf(stdout, "Listening, before select\n");
    fprintf(stdout, "Please connect to port %d now\n", p);
    select(s + 1, &fd, NULL, NULL, NULL);
    /* at this point we have selected on it as well */
    fprintf(stderr, "select returned!\n");
    if (FD_ISSET(s, &fd)) {
        fprintf(stdout, "socket is set\n");
        sock = accept(s, NULL, NULL);
        fprintf(stdout, "accepted\n");
        FD_SET(sock, &fd);
        fprintf(stdout, "FD_SET ok\n");
        connected = 1;
        fprintf(stdout, "\nListening, before select\n");
        select(sock + 1, &fd, NULL, NULL, NULL);
        fprintf(stdout, "select returned\n");
    } else {
        fprintf(stderr, "Error : fd not set\n");
        exit(1);
    }
    return NULL;
}

void *Thread2(void *a)
{
    fprintf(stdout, "Thread2 : before close the client socket\n");
    close(sock);
    fprintf(stdout, "Thread2 : after close the client socket\n\n\n");
    fprintf(stdout, "Please close the remote session and check the result\n");
    fflush(stderr);
    /* abort(); */
    return NULL;
}

int main(void)
{
    pthread_t j;

    pthread_create(&j, NULL, Thread1, NULL);
    while (connected == 0)
        usleep(1000); /* give the other thread time to finish */
    pthread_create(&j, NULL, Thread2, NULL);
    while (1)
        sleep(1);
}
^ permalink raw reply [flat|nested] 5+ messages in thread
* RE: CLOSE_WAIT bug?
2001-06-12 17:30 CLOSE_WAIT bug? eg_nth
@ 2001-06-13 2:07 ` David Schwartz
0 siblings, 0 replies; 5+ messages in thread
From: David Schwartz @ 2001-06-13 2:07 UTC (permalink / raw)
To: eg_nth, linux-kernel; +Cc: dicky
> One server socket is created and listening, blocking in select();
> once a client connects to that port, another thread on the server
> side issues a close() on the new connection.
> After the client closes the connection, the connection on the server
> side stays stuck in CLOSE_WAIT forever, until the program is killed.
This isn't something you should ever do. There is no way the kernel can
guarantee a sane reaction to this since the 'close' could occur _before_ you
even enter 'select'.
There is no atomic 'release mutex and select' function so you can never
know for sure whether the 'close' will occur before or after the other
thread enters 'select'. There's also the possibility that another thread
will open a new connection onto the same descriptor before the thread
blocked in 'select' gets a chance to notice that the descriptor is being
closed.
It's also not clear what the 'close' does in this case. An attempt to close
a descriptor is not supposed to close the underlying connection unless it
closes the last reference. It's not clear whether 'select' represents a
reference or not, and it's not clear what should happen if the descriptor
table changes before the 'select' thread gets woken up even if the 'close'
call schedules it.
One can argue that 'select' should return because a 'read' or 'write' to
the connection wouldn't block. But that's only true after 'select' returns.
While the endpoint is in use (and 'select' is using it), 'close' shouldn't
necessarily close the underlying connection. So this argument requires
bootstrapping.
For TCP, use 'shutdown'. Don't 'close' the descriptor until you are sure no
thread is using it. This is as serious an error as 'free'ing memory that
another thread is using.
So your code is buggy. So long as the kernel doesn't lose track of the
resources entirely, its behavior is (at least to me) acceptable. In fact, I
wish it would punish errors like this more severely, as this would reduce
the amount of code out there that has them.
DS
^ permalink raw reply [flat|nested] 5+ messages in thread
[parent not found: <Pine.LNX.4.21.0106130037330.10007-200000@sinocdn.com.suse.lists.linux.kernel>]
* Re: CLOSE_WAIT bug?
[not found] <Pine.LNX.4.21.0106130037330.10007-200000@sinocdn.com.suse.lists.linux.kernel>
@ 2001-06-13 8:55 ` Andi Kleen
0 siblings, 0 replies; 5+ messages in thread
From: Andi Kleen @ 2001-06-13 8:55 UTC (permalink / raw)
To: eg_nth; +Cc: linux-kernel
<eg_nth@sinocdn.com> writes:
> Hi all,
>
> I suspect that there is a bug in both kernels 2.2.19 and 2.4.5.
> The situation is as follows.
>
> One server socket is created and listening, blocking in select();
> once a client connects to that port, another thread on the server
> side issues a close() on the new connection.
> After the client closes the connection, the connection on the server
> side stays stuck in CLOSE_WAIT forever, until the program is killed.
It is a known problem that unfortunately cannot be easily fixed with the current VFS.
select has a reference to the file handle and it keeps it open until the select ends.
Do a shutdown() in the thread before the close, then the select should wake up.
-Andi
^ permalink raw reply [flat|nested] 5+ messages in thread
* CLOSE_WAIT bug?
@ 2002-04-01 9:59 Beng Asuncion
0 siblings, 0 replies; 5+ messages in thread
From: Beng Asuncion @ 2002-04-01 9:59 UTC (permalink / raw)
To: linux-kernel
Dear List,
We are using 2.4.17 kernel for our Production machines running JRun/Java
applications + Apache. We are encountering a lot of CLOSE_WAITs like the
following before our JRun applications die (port 53001 is the JRun
port):
tcp      1      0 127.0.0.1:47185  127.0.0.1:53001  CLOSE_WAIT  32649/httpd
tcp      1      0 127.0.0.1:47119  127.0.0.1:53001  CLOSE_WAIT  32641/httpd
tcp      1      0 127.0.0.1:47283  127.0.0.1:53001  CLOSE_WAIT  32634/httpd
tcp   7661      0 127.0.0.1:48012  127.0.0.1:53001  CLOSE_WAIT  1051/httpd
tcp      1      0 127.0.0.1:48166  127.0.0.1:53001  CLOSE_WAIT  32663/httpd
tcp      1      0 127.0.0.1:46625  127.0.0.1:53001  CLOSE_WAIT  32640/httpd
tcp      1      0 127.0.0.1:46727  127.0.0.1:53001  CLOSE_WAIT  32667/httpd
tcp      1      0 127.0.0.1:51625  127.0.0.1:53001  CLOSE_WAIT  632/httpd
tcp      1      0 127.0.0.1:51623  127.0.0.1:53001  CLOSE_WAIT  32654/httpd
tcp      1      0 127.0.0.1:51374  127.0.0.1:53001  CLOSE_WAIT  1023/httpd
tcp      1      0 127.0.0.1:52010  127.0.0.1:53001  CLOSE_WAIT  32669/httpd
tcp      1      0 127.0.0.1:52168  127.0.0.1:53001  CLOSE_WAIT  32661/httpd
tcp      1      0 127.0.0.1:51774  127.0.0.1:53001  CLOSE_WAIT  32642/httpd
tcp   4125      0 127.0.0.1:52635  127.0.0.1:53001  CLOSE_WAIT  632/httpd
tcp      1      0 127.0.0.1:52651  127.0.0.1:53001  CLOSE_WAIT  404/httpd
tcp      1      0 127.0.0.1:52676  127.0.0.1:53001  CLOSE_WAIT  32672/httpd
tcp      1      0 127.0.0.1:53046  127.0.0.1:53001  CLOSE_WAIT  410/httpd
tcp      1      0 127.0.0.1:52831  127.0.0.1:53001  CLOSE_WAIT  32643/httpd
tcp      1      0 127.0.0.1:52818  127.0.0.1:53001  CLOSE_WAIT  32660/httpd
tcp      1      0 127.0.0.1:49602  127.0.0.1:53001  CLOSE_WAIT  635/httpd
tcp      1      0 127.0.0.1:50028  127.0.0.1:53001  CLOSE_WAIT  631/httpd
tcp      1      0 127.0.0.1:50161  127.0.0.1:53001  CLOSE_WAIT  627/httpd
tcp      1      0 127.0.0.1:49896  127.0.0.1:53001  CLOSE_WAIT  633/httpd
tcp      1      0 127.0.0.1:50498  127.0.0.1:53001  CLOSE_WAIT  629/httpd
tcp      1      0 127.0.0.1:50496  127.0.0.1:53001  CLOSE_WAIT  32658/httpd
tcp      1      0 127.0.0.1:50947  127.0.0.1:53001  CLOSE_WAIT  372/httpd
tcp      1      0 127.0.0.1:50754  127.0.0.1:53001  CLOSE_WAIT  626/httpd
tcp      1      0 127.0.0.1:55640  127.0.0.1:53001  CLOSE_WAIT  32666/httpd
tcp   4718      0 127.0.0.1:55638  127.0.0.1:53001  CLOSE_WAIT  626/httpd
tcp      1      0 127.0.0.1:55791  127.0.0.1:53001  CLOSE_WAIT  4237/httpd
tcp      1      0 127.0.0.1:56074  127.0.0.1:53001  CLOSE_WAIT  3300/httpd
tcp      1      0 127.0.0.1:56100  127.0.0.1:53001  CLOSE_WAIT  32657/httpd
tcp      1      0 127.0.0.1:55893  127.0.0.1:53001  CLOSE_WAIT  32632/httpd
tcp      1      0 127.0.0.1:56621  127.0.0.1:53001  CLOSE_WAIT  403/httpd
tcp      1      0 127.0.0.1:56758  127.0.0.1:53001  CLOSE_WAIT  628/httpd
tcp   8596      0 127.0.0.1:56826  127.0.0.1:53001  CLOSE_WAIT  635/httpd
tcp      1      0 127.0.0.1:56438  127.0.0.1:53001  CLOSE_WAIT  32665/httpd
tcp      1      0 127.0.0.1:57266  127.0.0.1:53001  CLOSE_WAIT  4242/httpd
tcp   4718      0 127.0.0.1:56904  127.0.0.1:53001  CLOSE_WAIT  32659/httpd
tcp      1      0 127.0.0.1:56980  127.0.0.1:53001  CLOSE_WAIT  32671/httpd
tcp      1      0 127.0.0.1:57003  127.0.0.1:53001  CLOSE_WAIT  32664/httpd
tcp  18867      0 127.0.0.1:57052  127.0.0.1:53001  CLOSE_WAIT  32648/httpd
tcp      1      0 127.0.0.1:53407  127.0.0.1:53001  CLOSE_WAIT  32638/httpd
tcp      1      0 127.0.0.1:53402  127.0.0.1:53001  CLOSE_WAIT  32659/httpd
tcp      1      0 127.0.0.1:53439  127.0.0.1:53001  CLOSE_WAIT  32668/httpd
tcp      1      0 127.0.0.1:54162  127.0.0.1:53001  CLOSE_WAIT  32650/httpd
tcp  49403      0 127.0.0.1:54172  127.0.0.1:53001  CLOSE_WAIT  32666/httpd
tcp      1      0 127.0.0.1:53783  127.0.0.1:53001  CLOSE_WAIT  32662/httpd
tcp      1      0 127.0.0.1:54408  127.0.0.1:53001  CLOSE_WAIT  634/httpd
tcp      1      0 127.0.0.1:54451  127.0.0.1:53001  CLOSE_WAIT  32633/httpd
tcp  21285      0 127.0.0.1:55113  127.0.0.1:53001  CLOSE_WAIT  4241/httpd
tcp      1      0 127.0.0.1:57400  127.0.0.1:53001  CLOSE_WAIT  4236/httpd
tcp      1      0 127.0.0.1:57463  127.0.0.1:53001  CLOSE_WAIT  32644/httpd
tcp      1      0 127.0.0.1:57456  127.0.0.1:53001  CLOSE_WAIT  32670/httpd
tcp      1      0 127.0.0.1:57584  127.0.0.1:53001  CLOSE_WAIT  4233/httpd
CLOSE_WAIT_for_OTHERS
12
tcp      1  14600 192.168.66.178:80  203.123.134.34:21244   CLOSE_WAIT  4241/httpd
tcp      1  12847 192.168.66.178:80  205.188.199.187:5396   CLOSE_WAIT  32666/httpd
tcp      1  13600 192.168.66.178:80  205.188.193.33:29005   CLOSE_WAIT  4238/httpd
tcp      1  13006 192.168.66.178:80  205.188.199.187:10542  CLOSE_WAIT  635/httpd
tcp      1  12771 192.168.66.178:80  205.188.192.184:23366  CLOSE_WAIT  5330/httpd
tcp      1  12447 192.168.66.178:80  205.188.199.187:10090  CLOSE_WAIT  4243/httpd
tcp      1  12770 192.168.66.178:80  205.188.192.184:25710  CLOSE_WAIT  628/httpd
tcp      1  10720 192.168.66.178:80  193.110.211.66:12136   CLOSE_WAIT  632/httpd
tcp      1  14600 192.168.66.178:80  203.127.108.3:51958    CLOSE_WAIT  5331/httpd
tcp      1  14600 192.168.66.178:80  203.127.108.3:40387    CLOSE_WAIT  32648/httpd
tcp      1  14437 192.168.66.178:80  210.49.245.134:15251   CLOSE_WAIT  626/httpd
tcp      1  13590 192.168.66.178:80  210.180.96.11:6364     CLOSE_WAIT  5377/httpd
---------------------------------
The developers have hinted that it could be a problem in the kernel,
referring me to the link below:
http://www.cs.helsinki.fi/linux/linux-kernel/2001-23/0424.html
It seems that the link refers to kernels 2.2.19 and 2.4.5. Would
anyone be able to verify/confirm that this "BUG"(?) has already been
fixed in 2.4.17 and up? I followed the thread on the mailing list, and
it looked like the messages stopped at this response from Andi Kleen.
Any info to shed light on this would be very much appreciated.
Regards,
Beng
^ permalink raw reply [flat|nested] 5+ messages in thread
[parent not found: <3CA82F7F.312547B8@globalsources.com.suse.lists.linux.kernel>]
* Re: CLOSE_WAIT bug?
[not found] <3CA82F7F.312547B8@globalsources.com.suse.lists.linux.kernel>
@ 2002-04-01 17:05 ` Andi Kleen
0 siblings, 0 replies; 5+ messages in thread
From: Andi Kleen @ 2002-04-01 17:05 UTC (permalink / raw)
To: Beng Asuncion; +Cc: linux-kernel
Beng Asuncion <asmismn1@globalsources.com> writes:
> Dear List,
>
> We are using 2.4.17 kernel for our Production machines running JRun/Java
> applications + Apache. We are encountering a lot of CLOSE_WAITs like the
> following before our JRun applications die (port 53001 is the Jrun
> port):
You can do a very simple test: if you kill your application completely
(= killing all threads and processes that could keep a socket open)
and then wait a few minutes the CLOSE_WAITs should go away. If they do
it's not a kernel problem and you just need to fix the application
to close sockets properly.
-Andi
^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2002-04-01 17:05 UTC | newest]
Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2001-06-12 17:30 CLOSE_WAIT bug? eg_nth
2001-06-13 2:07 ` David Schwartz
[not found] <Pine.LNX.4.21.0106130037330.10007-200000@sinocdn.com.suse.lists.linux.kernel>
2001-06-13 8:55 ` Andi Kleen
2002-04-01 9:59 Beng Asuncion
[not found] <3CA82F7F.312547B8@globalsources.com.suse.lists.linux.kernel>
2002-04-01 17:05 ` Andi Kleen