linux-kernel.vger.kernel.org archive mirror
* 2.4.20-ac2 Memory Leak?
@ 2003-03-18 23:17 Gregory Stark
  2003-03-18 23:28 ` Martin Josefsson
  0 siblings, 1 reply; 6+ messages in thread
From: Gregory Stark @ 2003-03-18 23:17 UTC (permalink / raw)
  To: linux-kernel


My router box has a problem, it seems to be running out of memory. Programs
that worked fine earlier are now swapping like crazy. 

What confuses me is that if I add up all the RSS of the processes I get 5.9M,
a number drastically lower than the available RAM on the machine (24M) and
drastically lower than the amount of RAM "free" says is taken (22M).

It seems something in kernel space has taken a ton of memory out of play?
Or is my diagnosis wrong?


USER       PID %CPU %MEM   VSZ  RSS TTY      STAT START   TIME COMMAND
root         1  0.0  0.8  1216  196 ?        S    Mar09   0:06 init [2] 
root         2  0.0  0.0     0    0 ?        SW   Mar09   0:02 [keventd]
root         3  0.0  0.0     0    0 ?        SWN  Mar09   0:01 [ksoftirqd_CPU0]
root         4  0.0  0.0     0    0 ?        SW   Mar09   0:25 [kswapd]
root         5  0.0  0.0     0    0 ?        SW   Mar09   0:00 [bdflush]
root         6  0.0  0.0     0    0 ?        SW   Mar09   0:00 [kupdated]
root       111  0.0  0.0     0    0 ?        SW   Mar09   0:00 [eth0]
root       118  0.0  0.8  1408  192 ?        S    Mar09   0:08 dhclient -pf /var/run/dhclient.eth0.pid eth0
root       182  0.0  1.1  1284  248 ?        S    Mar09   0:18 /sbin/syslogd
root       185  0.0  0.7  1840  168 ?        S    Mar09   0:04 /sbin/klogd
root       205  0.0  0.6  1236  156 ?        S    Mar09   0:00 /usr/sbin/inetd
root       212  0.0  1.0  2688  240 ?        S    Mar09   0:00 /usr/sbin/sshd
root       224  0.0  0.6  2308  140 tty1     S    Mar09   0:03 -bash
stark      225  0.0  0.5  2276  116 tty2     S    Mar09   0:01 -bash
stark      226  0.0  0.6  2300  144 tty3     S    Mar09   0:06 -bash
root       227  0.0  0.5  1200  116 tty4     S    Mar09   0:00 /sbin/getty 38400 tty4
root       228  0.0  0.5  1200  116 tty5     S    Mar09   0:00 /sbin/getty 38400 tty5
root       229  0.0  0.5  1200  116 tty6     S    Mar09   0:00 /sbin/getty 38400 tty6
root       236  0.0  0.6  1736  144 tty1     S    Mar09   0:03 tail -100f /var/log/ppp.log
root       245  0.0  0.8  2000  184 ?        S    Mar09   1:09 pppd call provider
root       253  0.0  0.0     0    0 ?        SW   Mar09   0:00 [eth1]
root       303  0.0  0.7  2088  160 ?        S    Mar09   0:25 /usr/sbin/zhm arilinn.mit.edu neskaya.mit.edu
root       494  0.0  0.5  5772  128 ?        S    Mar09   0:00 /usr/sbin/sshd
stark      496  0.0  1.5  5904  356 ?        S    Mar09   1:47 /usr/sbin/sshd
stark      497  0.5  2.7  3412  620 ?        S    Mar09  69:08 zwgc -ttymode -nofork -f /home/stark/.zwgc.desc
root      1315  0.0  0.7  1440  160 ?        S    Mar10   0:07 /usr/sbin/dhcpd -q eth1
root     11109  0.0  1.9  2316  432 tty2     S    14:02   0:01 bash
root     11367  0.0  1.9  2316  432 tty2     S    18:07   0:00 bash
root     11368  0.0  3.3  2488  744 tty2     R    18:07   0:00 ps auxww



        total:    used:    free:  shared: buffers:  cached:
Mem:  22962176 22597632   364544        0   245760  3223552
Swap: 67489792  4120576 63369216
MemTotal:        22424 kB
MemFree:           356 kB
MemShared:           0 kB
Buffers:           240 kB
Cached:           2352 kB
SwapCached:        796 kB
Active:           3008 kB
Inact_dirty:       192 kB
Inact_clean:       340 kB
Inact_target:      708 kB
HighTotal:           0 kB
HighFree:            0 kB
LowTotal:        22424 kB
LowFree:           356 kB
SwapTotal:       65908 kB
SwapFree:        61884 kB
Committed_AS:     8552 kB
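
One rough way to check this diagnosis is to subtract everything that is
accounted for from MemTotal. The sketch below is only an illustration: the
function name unaccounted_kb is made up here, and it assumes the inputs are
the kB figures from /proc/meminfo plus a hand-summed RSS column from ps. RSS
counts shared pages once per process and overlaps with the page cache, so the
result is a conservative estimate rather than an exact figure.

```c
/*
 * Hypothetical helper (not part of any tool): estimate how much RAM is
 * explained neither by free memory, the buffer/page caches, nor by the
 * summed RSS of user processes. All values are in kB.
 */
long unaccounted_kb(long mem_total, long mem_free, long buffers,
                    long cached, long rss_sum)
{
    long rest = mem_total - mem_free - buffers - cached - rss_sum;

    /* A negative result only means the inputs overlap; report zero. */
    return rest > 0 ? rest : 0;
}
```

With the meminfo figures above (MemTotal 22424, MemFree 356, Buffers 240,
Cached 2352) and the 5.9M RSS total taken as roughly 6042 kB, this leaves
about 13400 kB that neither user processes nor the caches explain.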



Module                  Size  Used by    Not tainted
ipt_MASQUERADE          1784   0  (autoclean)
ipt_LOG                 3384   1  (autoclean)
ipt_state                536   1  (autoclean)
ipt_TCPMSS              2360   1  (autoclean)
ip_nat_ftp              3568   0  (unused)
iptable_nat            21016   2  [ipt_MASQUERADE ip_nat_ftp]
ip_conntrack_irc        3152   0  (unused)
ip_conntrack_ftp        4208   1  [ip_nat_ftp]
ip_conntrack           27776   4  [ipt_MASQUERADE ipt_state ip_nat_ftp iptable_nat ip_conntrack_irc ip_conntrack_ftp]
ethertap                3108   1 
iptable_filter          1644   1  (autoclean)
ip_tables              14296   8  [ipt_MASQUERADE ipt_LOG ipt_state ipt_TCPMSS iptable_nat iptable_filter]
pppoe                   7820   1 
pppox                   1176   1  [pppoe]
ppp_generic            19836   3  (autoclean) [pppoe pppox]
slhc                    5104   0  (autoclean) [ppp_generic]
8139too                14216   2 
rtc                     6588   0  (autoclean)


-- 
greg



* Re: 2.4.20-ac2 Memory Leak?
  2003-03-18 23:17 2.4.20-ac2 Memory Leak? Gregory Stark
@ 2003-03-18 23:28 ` Martin Josefsson
  2003-03-18 23:46   ` Greg Stark
  2003-03-19 14:55   ` Greg Stark
  0 siblings, 2 replies; 6+ messages in thread
From: Martin Josefsson @ 2003-03-18 23:28 UTC (permalink / raw)
  To: gsstark; +Cc: linux-kernel

[-- Attachment #1: Type: text/plain, Size: 888 bytes --]

On Wed, 2003-03-19 at 00:17, Gregory Stark wrote:
> My router box has a problem, it seems to be running out of memory. Programs
> that worked fine earlier are now swapping like crazy. 
> 
> What confuses me is that if I add up all the RSS of the processes I get 5.9M,
> a number drastically lower than the available RAM on the machine (24M) and
> drastically lower than the amount of RAM "free" says is taken (22M).
> 
> It seems something in kernel space has taken a ton of memory out of play?
> Or is my diagnosis wrong?

2.4.20 changed the linked list handling and ip_conntrack relied on the
old way. I've attached a patch that removes this assumption from
ip_conntrack.

This can be the source of your problems, connections can get very long
timeouts and stay in ip_conntrack.
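
To illustrate the assumption being removed, here is a small userspace model,
not kernel code. All names in it (model_conn, model_confirm and so on) are
hypothetical. It assumes the list-deletion helper clears the next pointer,
which is the situation the removed comment in clean_from_lists() warns about.
A "confirmed" check based on the list pointer then flips back to unconfirmed
after deletion, while a check based on a status bit does not.

```c
#include <stddef.h>

/* Hypothetical userspace model; these names are not kernel API. */
enum { MODEL_CONFIRMED_BIT = 3 };

struct model_conn {
    struct model_conn *next;  /* stands in for tuplehash[...].list.next */
    unsigned long status;     /* stands in for ct->status */
};

/* Insert into a list and record the confirmation in the status word. */
void model_confirm(struct model_conn *c, struct model_conn *head)
{
    c->next = head;
    c->status |= 1UL << MODEL_CONFIRMED_BIT;
}

/* Assumed deletion behaviour: unlinking clears the next pointer. */
void model_unlink_clearing(struct model_conn *c)
{
    c->next = NULL;
}

/* Old-style check: "confirmed" means the next pointer is non-NULL. */
int confirmed_by_list(const struct model_conn *c)
{
    return c->next != NULL;
}

/* New-style check: "confirmed" is an explicit bit that deletion leaves alone. */
int confirmed_by_bit(const struct model_conn *c)
{
    return (int)((c->status >> MODEL_CONFIRMED_BIT) & 1UL);
}
```

After model_confirm() followed by model_unlink_clearing(), confirmed_by_list()
returns 0 while confirmed_by_bit() still returns 1, which is the property the
IPS_CONFIRMED bit in the attached patch is meant to provide.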

-- 
/Martin

Never argue with an idiot. They drag you down to their level, then beat you with experience.

[-- Attachment #2: 10_confirm_fix.patch --]
[-- Type: text/x-patch, Size: 6426 bytes --]

diff -urN --exclude-from=diff.exclude linux-2.4.20-base/include/linux/netfilter_ipv4/ip_conntrack.h linux-2.4.20-del/include/linux/netfilter_ipv4/ip_conntrack.h
--- linux-2.4.20-base/include/linux/netfilter_ipv4/ip_conntrack.h	Fri Nov 29 00:53:15 2002
+++ linux-2.4.20-del/include/linux/netfilter_ipv4/ip_conntrack.h	Fri Feb 21 17:01:38 2003
@@ -6,6 +6,7 @@
 
 #include <linux/config.h>
 #include <linux/netfilter_ipv4/ip_conntrack_tuple.h>
+#include <linux/bitops.h>
 #include <asm/atomic.h>
 
 enum ip_conntrack_info
@@ -41,6 +42,10 @@
 	/* Conntrack should never be early-expired. */
 	IPS_ASSURED_BIT = 2,
 	IPS_ASSURED = (1 << IPS_ASSURED_BIT),
+
+	/* Connection is confirmed: originating packet has left box */
+	IPS_CONFIRMED_BIT = 3,
+	IPS_CONFIRMED = (1 << IPS_CONFIRMED_BIT),
 };
 
 #include <linux/netfilter_ipv4/ip_conntrack_tcp.h>
@@ -159,7 +164,7 @@
 	struct ip_conntrack_tuple_hash tuplehash[IP_CT_DIR_MAX];
 
 	/* Have we seen traffic both ways yet? (bitset) */
-	volatile unsigned long status;
+	unsigned long status;
 
 	/* Timer function; drops refcnt when it goes off. */
 	struct timer_list timeout;
@@ -254,7 +259,7 @@
 /* It's confirmed if it is, or has been in the hash table. */
 static inline int is_confirmed(struct ip_conntrack *ct)
 {
-	return ct->tuplehash[IP_CT_DIR_ORIGINAL].list.next != NULL;
+	return test_bit(IPS_CONFIRMED_BIT, &ct->status);
 }
 
 extern unsigned int ip_conntrack_htable_size;
diff -urN --exclude-from=diff.exclude linux-2.4.20-base/net/ipv4/netfilter/ip_conntrack_core.c linux-2.4.20-del/net/ipv4/netfilter/ip_conntrack_core.c
--- linux-2.4.20-base/net/ipv4/netfilter/ip_conntrack_core.c	Tue Feb 18 17:08:21 2003
+++ linux-2.4.20-del/net/ipv4/netfilter/ip_conntrack_core.c	Fri Feb 21 17:01:39 2003
@@ -292,9 +292,6 @@
 {
 	DEBUGP("clean_from_lists(%p)\n", ct);
 	MUST_BE_WRITE_LOCKED(&ip_conntrack_lock);
-	/* Remove from both hash lists: must not NULL out next ptrs,
-           otherwise we'll look unconfirmed.  Fortunately, LIST_DELETE
-           doesn't do this. --RR */
 	LIST_DELETE(&ip_conntrack_hash
 		    [hash_conntrack(&ct->tuplehash[IP_CT_DIR_ORIGINAL].tuple)],
 		    &ct->tuplehash[IP_CT_DIR_ORIGINAL]);
@@ -467,6 +464,7 @@
 		ct->timeout.expires += jiffies;
 		add_timer(&ct->timeout);
 		atomic_inc(&ct->ct_general.use);
+		set_bit(IPS_CONFIRMED_BIT, &ct->status);
 		WRITE_UNLOCK(&ip_conntrack_lock);
 		return NF_ACCEPT;
 	}
@@ -585,7 +583,7 @@
    connection.  Too bad: we're in trouble anyway. */
 static inline int unreplied(const struct ip_conntrack_tuple_hash *i)
 {
-	return !(i->ctrack->status & IPS_ASSURED);
+	return !(test_bit(IPS_ASSURED_BIT, &i->ctrack->status));
 }
 
 static int early_drop(struct list_head *chain)
@@ -720,7 +718,7 @@
 			conntrack, expected);
 		/* Welcome, Mr. Bond.  We've been expecting you... */
 		IP_NF_ASSERT(master_ct(conntrack));
-		conntrack->status = IPS_EXPECTED;
+		__set_bit(IPS_EXPECTED_BIT, &conntrack->status);
 		conntrack->master = expected;
 		expected->sibling = conntrack;
 		LIST_DELETE(&ip_conntrack_expect_list, expected);
@@ -768,11 +766,11 @@
 		*set_reply = 1;
 	} else {
 		/* Once we've had two way comms, always ESTABLISHED. */
-		if (h->ctrack->status & IPS_SEEN_REPLY) {
+		if (test_bit(IPS_SEEN_REPLY_BIT, &h->ctrack->status)) {
 			DEBUGP("ip_conntrack_in: normal packet for %p\n",
 			       h->ctrack);
 		        *ctinfo = IP_CT_ESTABLISHED;
-		} else if (h->ctrack->status & IPS_EXPECTED) {
+		} else if (test_bit(IPS_EXPECTED_BIT, &h->ctrack->status)) {
 			DEBUGP("ip_conntrack_in: related packet for %p\n",
 			       h->ctrack);
 			*ctinfo = IP_CT_RELATED;
diff -urN --exclude-from=diff.exclude linux-2.4.20-base/net/ipv4/netfilter/ip_conntrack_proto_tcp.c linux-2.4.20-del/net/ipv4/netfilter/ip_conntrack_proto_tcp.c
--- linux-2.4.20-base/net/ipv4/netfilter/ip_conntrack_proto_tcp.c	Tue Feb 18 17:07:26 2003
+++ linux-2.4.20-del/net/ipv4/netfilter/ip_conntrack_proto_tcp.c	Fri Feb 21 17:03:35 2003
@@ -192,7 +192,7 @@
 	   have an established connection: this is a fairly common
 	   problem case, so we can delete the conntrack
 	   immediately.  --RR */
-	if (!(conntrack->status & IPS_SEEN_REPLY) && tcph->rst) {
+	if (!test_bit(IPS_SEEN_REPLY_BIT, &conntrack->status) && tcph->rst) {
 		WRITE_UNLOCK(&tcp_lock);
 		if (del_timer(&conntrack->timeout))
 			conntrack->timeout.function((unsigned long)conntrack);
diff -urN --exclude-from=diff.exclude linux-2.4.20-base/net/ipv4/netfilter/ip_conntrack_proto_udp.c linux-2.4.20-del/net/ipv4/netfilter/ip_conntrack_proto_udp.c
--- linux-2.4.20-base/net/ipv4/netfilter/ip_conntrack_proto_udp.c	Fri Nov 29 00:53:15 2002
+++ linux-2.4.20-del/net/ipv4/netfilter/ip_conntrack_proto_udp.c	Fri Feb 21 17:01:39 2003
@@ -51,7 +51,7 @@
 {
 	/* If we've seen traffic both ways, this is some kind of UDP
 	   stream.  Extend timeout. */
-	if (conntrack->status & IPS_SEEN_REPLY) {
+	if (test_bit(IPS_SEEN_REPLY_BIT, &conntrack->status)) {
 		ip_ct_refresh(conntrack, UDP_STREAM_TIMEOUT);
 		/* Also, more likely to be important, and not a probe */
 		set_bit(IPS_ASSURED_BIT, &conntrack->status);
diff -urN --exclude-from=diff.exclude linux-2.4.20-base/net/ipv4/netfilter/ip_conntrack_standalone.c linux-2.4.20-del/net/ipv4/netfilter/ip_conntrack_standalone.c
--- linux-2.4.20-base/net/ipv4/netfilter/ip_conntrack_standalone.c	Fri Nov 29 00:53:15 2002
+++ linux-2.4.20-del/net/ipv4/netfilter/ip_conntrack_standalone.c	Fri Feb 21 21:10:37 2003
@@ -77,7 +77,7 @@
 }
 
 static unsigned int
-print_conntrack(char *buffer, const struct ip_conntrack *conntrack)
+print_conntrack(char *buffer, struct ip_conntrack *conntrack)
 {
 	unsigned int len;
 	struct ip_conntrack_protocol *proto
@@ -95,12 +95,12 @@
 	len += print_tuple(buffer + len,
 			   &conntrack->tuplehash[IP_CT_DIR_ORIGINAL].tuple,
 			   proto);
-	if (!(conntrack->status & IPS_SEEN_REPLY))
+	if (!(test_bit(IPS_SEEN_REPLY_BIT, &conntrack->status)))
 		len += sprintf(buffer + len, "[UNREPLIED] ");
 	len += print_tuple(buffer + len,
 			   &conntrack->tuplehash[IP_CT_DIR_REPLY].tuple,
 			   proto);
-	if (conntrack->status & IPS_ASSURED)
+	if (test_bit(IPS_ASSURED_BIT, &conntrack->status))
 		len += sprintf(buffer + len, "[ASSURED] ");
 	len += sprintf(buffer + len, "use=%u ",
 		       atomic_read(&conntrack->ct_general.use));


* Re: 2.4.20-ac2 Memory Leak?
  2003-03-18 23:28 ` Martin Josefsson
@ 2003-03-18 23:46   ` Greg Stark
  2003-03-18 23:53     ` Martin Josefsson
  2003-03-19 14:55   ` Greg Stark
  1 sibling, 1 reply; 6+ messages in thread
From: Greg Stark @ 2003-03-18 23:46 UTC (permalink / raw)
  To: Martin Josefsson; +Cc: gsstark, linux-kernel

Martin Josefsson <gandalf@wlug.westbo.se> writes:

> This can be the source of your problems, connections can get very long
> timeouts and stay in ip_conntrack.

Is there a way to list the connections and confirm this is the problem?
It seems it would require an awful lot of connections to consume megabytes of
memory.

Also, I've looked high and low and can't find this anywhere: how do I tune the
timeouts connections get? I have certain protocols that potentially receive
very little traffic and I want to make sure they don't time out.

--
greg



* Re: 2.4.20-ac2 Memory Leak?
  2003-03-18 23:46   ` Greg Stark
@ 2003-03-18 23:53     ` Martin Josefsson
  2003-03-19  0:05       ` Greg Stark
  0 siblings, 1 reply; 6+ messages in thread
From: Martin Josefsson @ 2003-03-18 23:53 UTC (permalink / raw)
  To: Greg Stark; +Cc: linux-kernel

On Wed, 2003-03-19 at 00:46, Greg Stark wrote:
> Martin Josefsson <gandalf@wlug.westbo.se> writes:
> 
> > This can be the source of your problems, connections can get very long
> > timeouts and stay in ip_conntrack.
> 
> Is there a way to list the connections and confirm this is the problem?
> It seems it would require an awful lot of connections to consume megabytes of
> memory.

cat /proc/net/ip_conntrack

The third field is the timeout in seconds.
The fourth field is the state of the connection, if it's TIME_WAIT with
a large timeout then it's the list handling bug. (iirc it was TIME_WAIT
that showed the problem...)
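
A check along these lines could be automated. The sketch below is a
hypothetical helper, not an existing tool: it assumes each line of
/proc/net/ip_conntrack starts with the protocol name, the protocol number,
the timeout in seconds and, for TCP, the state, as described above. The
threshold is an assumption the caller has to choose.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: return 1 if a conntrack line describes a TCP
 * connection in TIME_WAIT whose remaining timeout exceeds
 * max_expected_secs, and 0 otherwise (including unparsable lines).
 */
int is_suspect_time_wait(const char *line, long max_expected_secs)
{
    char proto[16], state[32];
    int protonum;
    long timeout;

    /* Fields: protocol name, protocol number, timeout, state. */
    if (sscanf(line, "%15s %d %ld %31s",
               proto, &protonum, &timeout, state) != 4)
        return 0;

    /* Only TCP entries carry a state in the fourth field. */
    if (strcmp(proto, "tcp") != 0)
        return 0;

    return strcmp(state, "TIME_WAIT") == 0 && timeout > max_expected_secs;
}
```

Feeding every line of the file through this function and counting the hits
gives a quick yes/no answer; a threshold of a few minutes is a guess at what
a healthy TIME_WAIT entry looks like, not a documented limit.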

> Also, I've looked high and low and can't find this anywhere: how do I tune the
> timeouts connections get? I have certain protocols that potentially receive
> very little traffic and I want to make sure they don't time out.

The default TCP timeout is 5 days for established connections.

There's a patch in patch-o-matic that enables you to tune the timeouts
without having to edit the source. Instructions on how to get
patch-o-matic are available at http://www.netfilter.org

-- 
/Martin

Never argue with an idiot. They drag you down to their level, then beat you with experience.


* Re: 2.4.20-ac2 Memory Leak?
  2003-03-18 23:53     ` Martin Josefsson
@ 2003-03-19  0:05       ` Greg Stark
  0 siblings, 0 replies; 6+ messages in thread
From: Greg Stark @ 2003-03-19  0:05 UTC (permalink / raw)
  To: Martin Josefsson; +Cc: Greg Stark, linux-kernel

Martin Josefsson <gandalf@wlug.westbo.se> writes:

> On Wed, 2003-03-19 at 00:46, Greg Stark wrote:
> > Martin Josefsson <gandalf@wlug.westbo.se> writes:
> > 
> > > This can be the source of your problems, connections can get very long
> > > timeouts and stay in ip_conntrack.
> > 
> > Is there a way to list the connections and confirm this is the problem?
> > It seems it would require an awful lot of connections to consume megabytes of
> > memory.
> 
> cat /proc/net/ip_conntrack
> 
> The third field is the timeout in seconds.
> The fourth field is the state of the connection, if it's TIME_WAIT with
> a large timeout then it's the list handling bug. (iirc it was TIME_WAIT
> that showed the problem...)

Nope, a total of 16 entries in ip_conntrack, one of which is in TIME_WAIT with
a timeout of 2m.



> The default TCP timeout is 5 days for established connections.
>
> There's a patch in patch-o-matic that enables you to tune the timeouts
> without having to edit the source. Instructions on how to get
> patch-o-matic are available at http://www.netfilter.org

Thanks.

-- 
greg



* Re: 2.4.20-ac2 Memory Leak?
  2003-03-18 23:28 ` Martin Josefsson
  2003-03-18 23:46   ` Greg Stark
@ 2003-03-19 14:55   ` Greg Stark
  1 sibling, 0 replies; 6+ messages in thread
From: Greg Stark @ 2003-03-19 14:55 UTC (permalink / raw)
  To: Martin Josefsson; +Cc: gsstark, linux-kernel


> On Wed, 2003-03-19 at 00:17, Gregory Stark wrote:
> > My router box has a problem, it seems to be running out of memory. Programs
> > that worked fine earlier are now swapping like crazy. 
> > 
> > What confuses me is that if I add up all the RSS of the processes I get 5.9M,
> > a number drastically lower than the available RAM on the machine (24M) and
> > drastically lower than the amount of RAM "free" says is taken (22M).
> > 
> > It seems something in kernel space has taken a ton of memory out of play?
> > Or is my diagnosis wrong?

One thing I find suspicious is that my ISP was down yesterday morning, causing
pppd to try repeatedly to connect. It seems it attempted to connect over 3,000
times while the ISP was down. Every connection failed at the PAP
authentication stage.

Perhaps there's a memory leak in the pppoe connection initiation process?
Even so that would be about 4k per pppoe connection attempt. Hm, a page?
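
The back-of-the-envelope figure can be written out as follows. This is only
arithmetic, not evidence of a leak: the helper name implied_leak_kb is
hypothetical, 3000 is the approximate attempt count mentioned above, and
4096 bytes assumes one 4 kB page lost per attempt.

```c
/*
 * Hypothetical helper: total memory (in kB) that would be lost if every
 * failed connection attempt leaked bytes_per_attempt bytes.
 */
long implied_leak_kb(long attempts, long bytes_per_attempt)
{
    return attempts * bytes_per_attempt / 1024;
}
```

implied_leak_kb(3000, 4096) comes to 12000 kB, a bit over half of the
22424 kB MemTotal reported earlier in the thread, so a one-page leak per
attempt would be in the right range to explain the missing memory.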

--
greg

