linux-kernel.vger.kernel.org archive mirror
From: "dada1" <dada1@cosmosbay.com>
To: "Andi Kleen" <ak@colin2.muc.de>
Cc: "Nakajima, Jun" <jun.nakajima@intel.com>,
	"Andi Kleen" <ak@muc.de>, <linux-kernel@vger.kernel.org>,
	<netdev@oss.sgi.com>
Subject: Re: Network buffer hang was Re: [PATCH] 2.6 workaround for Athlon/Opteron prefetch errata
Date: Thu, 11 Sep 2003 15:17:55 +0200
Message-ID: <0b2901c37867$1db399a0$890010ac@edumazet>
In-Reply-To: 20030911120956.GB7751@colin2.muc.de

[-- Attachment #1: Type: text/plain, Size: 6753 bytes --]

> > This is not a kernel crash. But total freeze as all memory is used by
> > network buffers, in no more than 10 seconds.
>
> Ok, but then you have to diagnose this freeze. I'm not sure why you
> think it must be this prefetch thingy. If the prefetch issue was
> hit then you would just get a normal segfault, not a kernel hang.

Well, the machine is a dual Athlon, and I use prefetchnta... that's all.

>
> e.g. you could write some kind of reduced test case for it and
> post it to the netdev mailing list (netdev@oss.sgi.com)

Thanks very much. I'm resending my original mail (with a small test program
attached), at the end of this one.

>
> I'm cc'ing it for you.
>
> > This application receives small TCP messages (about 30 bytes), but the
> > network stack allocates 4KB buffers to store these little messages.
>
> Most drivers only allocate MTU size in their receive ring
> (normally 1.5K on ethernet). This is rounded to 2K by the memory
> allocator.
>
> But most drivers support a rx_copybreak parameter. When the received
> packet is smaller than rx_copybreak it is copied to a freshly allocated
> buffer with the right size.
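
The copybreak idea described above can be sketched in userspace C. This is
only an analogy under stated assumptions: RX_COPYBREAK and copybreak() are
hypothetical names chosen for illustration, the threshold value is arbitrary,
and no real driver code is reproduced here.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical threshold; real drivers define their own rx_copybreak value. */
#define RX_COPYBREAK 256

/*
 * Userspace analogy of the copybreak trick: when the received payload is
 * smaller than the threshold, copy it into a freshly allocated buffer of
 * exactly len bytes, so the large receive-ring buffer can be reused at once.
 *
 * Returns the right-sized copy (caller frees it), or NULL when the packet
 * is empty, not below the threshold, or the allocation failed. In those
 * cases the caller would pass the original ring buffer up the stack.
 */
unsigned char *copybreak(const unsigned char *ring_buf, size_t len)
{
    unsigned char *small;

    if (len == 0 || len >= RX_COPYBREAK)
        return NULL;

    small = malloc(len);
    if (small == NULL)
        return NULL;

    memcpy(small, ring_buf, len);
    return small;
}
```

With a 30-byte message, copybreak() returns a 30-byte buffer instead of
pinning a 2K or 4K one; a full-size 1500-byte frame is left in place.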

I'm using the e1000 driver on linux-2.6; this driver doesn't use the
rx_copybreak trick.

>
> In addition the 2.4 stack also supports garbage collection in the TCP
> receive buffers. This means even when a driver doesn't do the rx_copybreak
> trick and the receive queue of a socket fills up it will copy the data
> to fresh, right sized packets by itself.
>
> Another limit for this scenario is that the network stack has internal
> limits that are supposed to avoid this. These are: each socket has a
> fixed receive buffer size and when more data arrives (including packet
> metadata and normal wastage) than the receive buffer allows then it is
> still dropped. In addition TCP has a global memory limit that also kicks
> in. And the network stack has a global queue limit that prevents
> too much data from being queued from the driver to the higher level
> parts (/proc/sys/net/core/netdev_max_backlog). Sometimes the queueing
> can also be controlled on the driver level with driver specific
> knobs.
>
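
The per-socket receive buffer limit mentioned above can be set from the
application with the standard SO_RCVBUF socket option. A minimal sketch,
assuming a helper named cap_rcvbuf() (hypothetical, not from the original
mail); it does not show that this limit avoids the hang reported here.

```c
#include <sys/socket.h>

/*
 * Ask the kernel for a receive buffer of `bytes` on socket `fd`, then read
 * back the size that is actually in effect. The kernel may adjust the
 * request (Linux reports a larger value than asked for, see socket(7)).
 *
 * Returns the effective size in bytes, or -1 on error.
 */
int cap_rcvbuf(int fd, int bytes)
{
    int effective = 0;
    socklen_t len = sizeof(effective);

    if (setsockopt(fd, SOL_SOCKET, SO_RCVBUF, &bytes, sizeof(bytes)) == -1)
        return -1;
    if (getsockopt(fd, SOL_SOCKET, SO_RCVBUF, &effective, &len) == -1)
        return -1;

    return effective;
}
```

In the test program below, such a call would go right after accept() in the
listener, before any data is queued on the new socket.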

cat /proc/sys/net/core/netdev_max_backlog
300

> This all can be tuned by sysctls in /proc/sys. See
> Documentation/networking/ip-sysctl.txt for more details.
>
> Also the latest 2.6 kernel finally has a writable
> /proc/sys/vm/min_free_kbytes again. This controls the amount of memory
> kept free for interrupts. Increase that.

Hmm, I didn't know about this one...

cat  /proc/sys/vm/min_free_kbytes
16384

>
> > I posted a test application some days ago about this problem and got no
> > answers/feedback.
>
> Did you post it to netdev?  On linux-kernel such things often get
> lost in the noise.
>
> Also I would contact the driver maintainer, it could really be a driver
> issue.
>
> -Andi

Here is a copy of the mail I sent on Sep 1st to linux-kernel & linux-net:

Hi all

I have an annoying problem with a network server (TCP sockets)

Under some stress conditions, LowMem drops close to 0, and the whole machine
freezes.

When the sockets receive a lot of data and the server is busy, the TCP
stack can simply use too many buffers (in LowMem).

The TCP stack uses "size-4096" buffers to store the data, even if only one
byte arrives from the network.

I tried to change /proc/sys/net/ipv4/tcp_mem, without results.
# echo "1000 10000 15000" >/proc/sys/net/ipv4/tcp_mem

You can reproduce the problem with the test program attached.

# gcc -o crash crash.c
# ulimit -n 20000
# ./crash listen  8888 &
# ./crash call 127.0.0.1:8888 &

grep "size-4096 " /proc/slabinfo
size-4096      40015  40015  4096  1  1 : tunables  24  12  0  : slabdata  40015  40015  0

(that is 160 MB, far more than the limit given in
/proc/sys/net/ipv4/tcp_mem)

grep TCP /proc/net/sockstat
TCP: inuse 39996 orphan 0 tw 0 alloc 39997  mem 79986

What is the unit of the 'mem' field? Unless it is 2 KB, the numbers are wrong.
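
The arithmetic behind that remark can be checked with a small sketch. It
parses the sockstat line quoted above, multiplies out the slabinfo figures,
and divides one by the other. The function names are hypothetical, and the
calculation assumes (as the mail does) that all size-4096 objects belong to
TCP; it does not establish what unit the kernel really uses for 'mem'.

```c
#include <stdio.h>

/*
 * Parse the "TCP:" line of /proc/net/sockstat in the layout quoted in this
 * mail. Stores the 'inuse' and 'mem' fields; returns 0 on success, -1 if
 * the line does not match that layout.
 */
int parse_sockstat_tcp(const char *line, long *inuse, long *mem)
{
    long orphan, tw, alloc;

    if (sscanf(line, "TCP: inuse %ld orphan %ld tw %ld alloc %ld mem %ld",
               inuse, &orphan, &tw, &alloc, mem) != 5)
        return -1;
    return 0;
}

/* Total bytes held by a slab cache: object count times object size. */
unsigned long long slab_bytes(unsigned long long objs,
                              unsigned long long objsize)
{
    return objs * objsize;
}

/*
 * Bytes per 'mem' unit that would make the sockstat figure agree with the
 * slab usage (integer division). Returns 0 if mem_units is not positive.
 */
unsigned long long implied_unit(unsigned long long bytes, long mem_units)
{
    if (mem_units <= 0)
        return 0;
    return bytes / (unsigned long long)mem_units;
}
```

With the figures above, slab_bytes(40015, 4096) is 163901440 bytes, and
implied_unit(163901440, 79986) is 2049, which is where the "2 KB" guess
comes from.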

How can I ask the kernel NOT to use more than 'X MB' to store TCP messages?

Thanks

Eric Dumazet

/*
 * Program to freeze a linux box, by using all the LOWMEM
 * A bug on the tcp stack may be the reason
 * Use at your own risk !!
 */

/* Principles :
   A listener accepts incoming TCP sockets, writes 40 bytes, and does nothing
   with them (no reading)
   A writer establishes TCP sockets, sends some data (40 bytes), no more
   reading/writing
 */
#include <stdio.h>
#include <stdlib.h>     /* exit(), atoi() */
# include <sys/socket.h>
# include <netinet/tcp.h>
# include <arpa/inet.h>
# include <netdb.h>
# include <unistd.h>
# include <string.h>

/*
 * Usage :
 *              crash listen port
 *              crash call IP:port
 */
void usage(int code)
{
fprintf(stderr, "Usages :\n") ;
fprintf(stderr, "    crash listen port\n") ;
fprintf(stderr, "    crash call IP:port\n") ;
exit(code) ;
}
const char some_data[40] = "some data.... just some data" ;

void do_listener(const char *string)
{
int port = atoi(string) ;
struct sockaddr_in host, from ;
int fdlisten ;
unsigned int total ;
socklen_t fromlen ;
  memset(&host,0, sizeof(host));
  host.sin_family = AF_INET;
  host.sin_port = htons(port);
  fdlisten = socket(AF_INET, SOCK_STREAM, 0) ;
  if (bind(fdlisten, (struct sockaddr *)&host, sizeof(host)) == -1) {
        perror("bind") ;
        return ;
        }
listen(fdlisten, 10) ;
for (total=0;;total++) {
        int nfd ;
        fromlen = sizeof(from) ;
        nfd = accept(fdlisten, (struct sockaddr *)&from, &fromlen) ;
        if (nfd == -1) break ;
        write(nfd, some_data, sizeof(some_data)) ;
        }
printf("total=%u\n", total) ;
pause() ;
}

void do_caller(const char *string)
{
union {
   int i ;
   char c[4] ;
        } u ;
struct sockaddr_in dest;
int a1, a2, a3, a4, port ;
unsigned int total ;
sscanf(string, "%d.%d.%d.%d:%d", &a1, &a2, &a3, &a4, &port) ;
u.c[0] = a1 ; u.c[1] = a2 ; u.c[2] = a3 ; u.c[3] = a4 ;
for (total=0;;total++) {
    int fd ;
        memset(&dest, 0, sizeof(dest)) ;
        dest.sin_family = AF_INET ;
        dest.sin_port = htons(port) ;
        dest.sin_addr.s_addr = u.i ;
    fd = socket(AF_INET, SOCK_STREAM, 0) ;
        if (fd == -1) break ;
        if (connect(fd, (struct sockaddr *)&dest, sizeof(dest)) == -1) {
                perror("connect") ;
                break ;
                }
        write(fd, some_data, sizeof(some_data)) ;
        }
printf("total=%u\n", total) ;
pause() ;
}

int main(int argc, char *argv[])
{
int listener ;
int caller ;
if (argc != 3) {
        usage(1);
        }
listener = !strcmp(argv[1], "listen") ;
caller = !strcmp(argv[1], "call") ;
if (listener) {
        do_listener(argv[2]) ;
        }
else if (caller) {
        do_caller(argv[2]) ;
        }
else usage(2) ;
return 0 ;
}
/********************************************************************/



[-- Attachment #2: crash.c --]
[-- Type: text/plain, Size: 2624 bytes --]

Thread overview: 8+ messages
     [not found] <uqD5.3BI.3@gated-at.bofh.it>
2003-09-11  4:14 ` [PATCH] 2.6 workaround for Athlon/Opteron prefetch errata Andi Kleen
2003-09-11  4:58   ` dada1
2003-09-11  5:11     ` Andi Kleen
2003-09-11  5:58       ` dada1
2003-09-11 12:09         ` Network buffer hang was " Andi Kleen
2003-09-11 13:17           ` dada1 [this message]
2003-09-12  1:46             ` Ben Greear
2003-09-12  1:41               ` David S. Miller
