From: "dada1" <dada1@cosmosbay.com>
To: "Andi Kleen" <ak@colin2.muc.de>
Cc: "Nakajima, Jun" <jun.nakajima@intel.com>,
"Andi Kleen" <ak@muc.de>, <linux-kernel@vger.kernel.org>,
<netdev@oss.sgi.com>
Subject: Re: Network buffer hang was Re: [PATCH] 2.6 workaround for Athlon/Opteron prefetch errata
Date: Thu, 11 Sep 2003 15:17:55 +0200
Message-ID: <0b2901c37867$1db399a0$890010ac@edumazet>
In-Reply-To: 20030911120956.GB7751@colin2.muc.de
[-- Attachment #1: Type: text/plain, Size: 6753 bytes --]
> > This is not a kernel crash. But total freeze as all memory is used by
> > network buffers, in no more than 10 seconds.
>
> Ok, but then you have to diagnose this freeze. I'm not sure why you
> think it must be this prefetch thingy. If the prefetch issue was
> hit then you would just get a normal segfault, not a kernel hang.
Well, the machine is a dual Athlon, and I use prefetchnta... that's all.
>
> e.g. you could write some kind of reduced test case for it and
> post it to the netdev mailing list (netdev@oss.sgi.com)
Thanks very much. I'm resending my original mail (with a small test program
attached), at the end of this one.
>
> I'm cc'ing it for you.
>
> > This application receive smalls TCP messages (about 30 bytes), but the
> > network stacks allocates 4KB buffers to store this little messages.
>
> Most drivers only allocate MTU size in their receive ring
> (normally 1.5K on ethernet). This is rounded to 2K by the memory
> allocator.
>
> But most drivers support a rx_copybreak parameter. When the received
> packet is smaller than rx_copybreak it is copied to a freshly allocated
> buffer with the right size.
I'm using the e1000 driver on linux-2.6; this driver doesn't use the
rx_copybreak trick.
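For readers unfamiliar with the rx_copybreak idea described above, here is a minimal userspace sketch of it. It is not driver code: `struct rx_frame` and `rx_copybreak()` are hypothetical names invented for illustration, and `malloc`/`free` stand in for the kernel's buffer allocation.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical userspace model of a received frame; not a kernel structure. */
struct rx_frame {
    unsigned char *data;   /* buffer holding the payload */
    size_t len;            /* bytes actually received */
    size_t capacity;       /* bytes allocated for the buffer */
};

/*
 * If the frame is shorter than `copybreak`, copy it into a right-sized
 * buffer and release the large receive buffer. Otherwise leave it alone.
 * Returns 0 on success, -1 if the small allocation fails (frame unchanged).
 */
int rx_copybreak(struct rx_frame *f, size_t copybreak)
{
    size_t want = f->len ? f->len : 1;  /* avoid a zero-byte allocation */
    unsigned char *small;

    if (f->len >= copybreak)
        return 0;                       /* large frame: keep original buffer */

    small = malloc(want);
    if (small == NULL)
        return -1;

    memcpy(small, f->data, f->len);
    free(f->data);                      /* the big buffer can be reused */
    f->data = small;
    f->capacity = want;
    return 0;
}
```

With a 2048-byte receive buffer holding a 30-byte message and a threshold of 256, the frame ends up in a 30-byte buffer; a 1000-byte frame would keep its original buffer.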
>
> In addition the 2.4 stack also supports garbage collection in the TCP
> receive buffers. This means even when a driver doesn't do the rx_copybreak
> trick and the receive queue of a socket fills up it will copy the data
> to fresh, right sized packets by itself.
>
> Another limit for this scenario is that the network stack has internal
> limits that supposed to avoid this. These are: each socket has a
> fixed receive buffer size and when more data arrives (including packet
> metadata and normal wastage) than the receive buffer allows then it is
> still dropped. In addition TCP has a global memory limit that also kicks
> in. And the network stack has a global queue limit that prevents
> too much data to be queued from the driver to the higher level
> parts (/proc/sys/net/core/netdev_max_backlog). Sometimes the queueing
> can also be controlled on the driver level with driver specific
> knobs.
>
cat /proc/sys/net/core/netdev_max_backlog
300
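The per-socket receive buffer limit mentioned above can be set from the application with the standard SO_RCVBUF socket option. The following is a small sketch; `set_rcvbuf()` is a hypothetical helper name, and the kernel is free to adjust the requested size, so the function returns whatever getsockopt() reports afterwards.

```c
#include <sys/types.h>
#include <sys/socket.h>

/*
 * Ask for a receive buffer of `bytes` on socket `fd`.
 * Returns the size the kernel reports afterwards, or -1 on error.
 */
int set_rcvbuf(int fd, int bytes)
{
    int actual = 0;
    socklen_t len = sizeof(actual);

    if (setsockopt(fd, SOL_SOCKET, SO_RCVBUF, &bytes, sizeof(bytes)) == -1)
        return -1;
    if (getsockopt(fd, SOL_SOCKET, SO_RCVBUF, &actual, &len) == -1)
        return -1;
    return actual;
}
```

This only bounds what a single socket may queue; with tens of thousands of sockets, as in the test program, the global limits still matter.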
> This all can be tuned by sysctls in /proc/sys. See
> Documentation/networking/ip-sysctl.txt for more details.
>
> Also the latest 2.6 kernel finally has a writable
> /proc/sys/vm/min_free_kbytes
> again. This controls the amount of memory kept free for interrupts.
> Increase that.
Hmm, I didn't know about this one...
cat /proc/sys/vm/min_free_kbytes
16384
>
> > I posted a test application some days ago about this problem and got no
> > answers/feedback.
>
> Did you post it to netdev? On linux-kernel such things get often
> lost in the noise.
>
> Also I would contact the driver maintainer, it could be really a driver
> Issue.
>
> -Andi
Here is a copy of the mail I sent on Sep 1st to linux-kernel & linux-net:
Hi all
I have an annoying problem with a network server (TCP sockets).
Under some stress conditions, LowMemory drops close to 0 and the whole
machine freezes.
When the sockets receive a lot of data and the server is busy, the TCP
stack can simply use too many buffers (in LowMem).
The TCP stack uses "size-4096" buffers to store the data, even if only one
byte arrives from the network.
I tried changing /proc/sys/net/ipv4/tcp_mem, without result.
# echo "1000 10000 15000" >/proc/sys/net/ipv4/tcp_mem
You can reproduce the problem with the test program attached.
# gcc -o crash crash.c
# ulimit -n 20000
# ./crash listen 8888 &
# ./crash call 127.0.0.1:8888 &
grep "size-4096 " /proc/slabinfo
size-4096 40015 40015 4096 1 1 : tunables 24 12 0 : slabdata 40015 40015 0
(that is 160 MB, far more than the limit given in
/proc/sys/net/ipv4/tcp_mem)
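The comparison above can be checked with a little arithmetic. The sketch below assumes a 4096-byte page and relies on ip-sysctl.txt documenting the tcp_mem values as page counts; `slab_bytes()` and `pages_to_bytes()` are hypothetical helper names.

```c
/* Bytes held by a slab cache: number of objects times object size. */
unsigned long long slab_bytes(unsigned long long objects,
                              unsigned long long objsize)
{
    return objects * objsize;
}

/* Convert a page count (as used by tcp_mem) to bytes for a given page size. */
unsigned long long pages_to_bytes(unsigned long long pages,
                                  unsigned long long page_size)
{
    return pages * page_size;
}
```

With the numbers from this mail, `slab_bytes(40015, 4096)` is 163901440 bytes (about 156 MiB), while the highest tcp_mem value, `pages_to_bytes(15000, 4096)`, is 61440000 bytes (about 59 MiB).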
grep TCP /proc/net/sockstat
TCP: inuse 39996 orphan 0 tw 0 alloc 39997 mem 79986
What is the unit of the 'mem' field? Unless it is 2 KB, the numbers are wrong.
How can I ask the kernel NOT to use more than 'X MB' to store TCP messages?
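To track these counters while the test runs, the TCP line of /proc/net/sockstat can be parsed with a few lines of C. This is only a sketch matching the line format shown above; `struct tcp_sockstat` and `parse_tcp_sockstat()` are hypothetical names, and the code deliberately makes no assumption about the unit of 'mem'.

```c
#include <stdio.h>

/* Fields of the "TCP:" line as printed in /proc/net/sockstat above. */
struct tcp_sockstat {
    long inuse, orphan, tw, alloc, mem;
};

/* Parse one "TCP: ..." line. Returns 0 on success, -1 on a format mismatch. */
int parse_tcp_sockstat(const char *line, struct tcp_sockstat *out)
{
    int n = sscanf(line,
                   "TCP: inuse %ld orphan %ld tw %ld alloc %ld mem %ld",
                   &out->inuse, &out->orphan, &out->tw,
                   &out->alloc, &out->mem);
    return n == 5 ? 0 : -1;
}
```

For the line above, this gives inuse = 39996 and mem = 79986, that is, almost exactly two 'mem' units per socket in use.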
Thanks
Eric Dumazet
/*
 * Program to freeze a Linux box by using all the LOWMEM.
 * A bug in the TCP stack may be the reason.
 * Use at your own risk!!
 */
/* Principles:
   A listener accepts incoming TCP sockets, writes 40 bytes, and does nothing
   with them (no reading).
   A writer establishes TCP sockets, sends some data (40 bytes), then does no
   more reading/writing.
*/
#include <stdio.h>
#include <stdlib.h>      /* exit(), atoi() */
#include <sys/socket.h>
#include <netinet/in.h>  /* struct sockaddr_in, htons() */
#include <netinet/tcp.h>
#include <arpa/inet.h>
#include <netdb.h>
#include <unistd.h>
#include <string.h>
/*
 * Usage:
 *   crash listen port
 *   crash call IP:port
 */
void usage(int code)
{
    fprintf(stderr, "Usages :\n");
    fprintf(stderr, " crash listen port\n");
    fprintf(stderr, " crash call IP:port\n");
    exit(code);
}
const char some_data[40] = "some data.... just some data";
void do_listener(const char *string)
{
    int port = atoi(string);
    struct sockaddr_in host, from;
    int fdlisten;
    unsigned int total;
    socklen_t fromlen;
    memset(&host, 0, sizeof(host));
    host.sin_family = AF_INET;
    host.sin_port = htons(port);
    fdlisten = socket(AF_INET, SOCK_STREAM, 0);
    if (fdlisten == -1) {
        perror("socket");
        return;
    }
    if (bind(fdlisten, (struct sockaddr *)&host, sizeof(host)) == -1) {
        perror("bind");
        return;
    }
    if (listen(fdlisten, 10) == -1) {
        perror("listen");
        return;
    }
    /* Accept until failure; accepted sockets are never read nor closed. */
    for (total = 0;; total++) {
        int nfd;
        fromlen = sizeof(from);
        nfd = accept(fdlisten, (struct sockaddr *)&from, &fromlen);
        if (nfd == -1) break;
        write(nfd, some_data, sizeof(some_data));
    }
    printf("total=%u\n", total);
    pause();
}
void do_caller(const char *string)
{
    /* The four address bytes are stored in memory order, which is
       already network byte order for s_addr. */
    union {
        int i;
        char c[4];
    } u;
    struct sockaddr_in dest;
    int a1, a2, a3, a4, port;
    unsigned int total;
    if (sscanf(string, "%d.%d.%d.%d:%d", &a1, &a2, &a3, &a4, &port) != 5)
        usage(2);
    u.c[0] = a1; u.c[1] = a2; u.c[2] = a3; u.c[3] = a4;
    /* Connect until failure; connected sockets are never read nor closed. */
    for (total = 0;; total++) {
        int fd;
        memset(&dest, 0, sizeof(dest));
        dest.sin_family = AF_INET;
        dest.sin_port = htons(port);
        dest.sin_addr.s_addr = u.i;
        fd = socket(AF_INET, SOCK_STREAM, 0);
        if (fd == -1) break;
        if (connect(fd, (struct sockaddr *)&dest, sizeof(dest)) == -1) {
            perror("connect");
            break;
        }
        write(fd, some_data, sizeof(some_data));
    }
    printf("total=%u\n", total);
    pause();
}
int main(int argc, char *argv[])
{
    int listener;
    int caller;
    if (argc != 3) {
        usage(1);
    }
    listener = !strcmp(argv[1], "listen");
    caller = !strcmp(argv[1], "call");
    if (listener) {
        do_listener(argv[2]);
    }
    else if (caller) {
        do_caller(argv[2]);
    }
    else usage(2);
    return 0;
}
/********************************************************************/