* 2.4.20-ac2 Memory Leak?
From: Gregory Stark @ 2003-03-18 23:17 UTC
To: linux-kernel

My router box has a problem, it seems to be running out of memory. Programs
that worked fine earlier are now swapping like crazy.

What confuses me is that if I add up all the RSS of the processes I get 5.9M,
a number drastically lower than the available RAM on the machine (24M) and
drastically lower than the amount of RAM "free" says is taken (22M).

It seems something in kernel space has taken a ton of memory out of play?
Or is my diagnosis wrong?

USER       PID %CPU %MEM   VSZ  RSS TTY      STAT START   TIME COMMAND
root         1  0.0  0.8  1216  196 ?        S    Mar09   0:06 init [2]
root         2  0.0  0.0     0    0 ?        SW   Mar09   0:02 [keventd]
root         3  0.0  0.0     0    0 ?        SWN  Mar09   0:01 [ksoftirqd_CPU0]
root         4  0.0  0.0     0    0 ?        SW   Mar09   0:25 [kswapd]
root         5  0.0  0.0     0    0 ?        SW   Mar09   0:00 [bdflush]
root         6  0.0  0.0     0    0 ?        SW   Mar09   0:00 [kupdated]
root       111  0.0  0.0     0    0 ?        SW   Mar09   0:00 [eth0]
root       118  0.0  0.8  1408  192 ?        S    Mar09   0:08 dhclient -pf /var/run/dhclient.eth0.pid eth0
root       182  0.0  1.1  1284  248 ?        S    Mar09   0:18 /sbin/syslogd
root       185  0.0  0.7  1840  168 ?        S    Mar09   0:04 /sbin/klogd
root       205  0.0  0.6  1236  156 ?        S    Mar09   0:00 /usr/sbin/inetd
root       212  0.0  1.0  2688  240 ?        S    Mar09   0:00 /usr/sbin/sshd
root       224  0.0  0.6  2308  140 tty1     S    Mar09   0:03 -bash
stark      225  0.0  0.5  2276  116 tty2     S    Mar09   0:01 -bash
stark      226  0.0  0.6  2300  144 tty3     S    Mar09   0:06 -bash
root       227  0.0  0.5  1200  116 tty4     S    Mar09   0:00 /sbin/getty 38400 tty4
root       228  0.0  0.5  1200  116 tty5     S    Mar09   0:00 /sbin/getty 38400 tty5
root       229  0.0  0.5  1200  116 tty6     S    Mar09   0:00 /sbin/getty 38400 tty6
root       236  0.0  0.6  1736  144 tty1     S    Mar09   0:03 tail -100f /var/log/ppp.log
root       245  0.0  0.8  2000  184 ?        S    Mar09   1:09 pppd call provider
root       253  0.0  0.0     0    0 ?        SW   Mar09   0:00 [eth1]
root       303  0.0  0.7  2088  160 ?        S    Mar09   0:25 /usr/sbin/zhm arilinn.mit.edu neskaya.mit.edu
root       494  0.0  0.5  5772  128 ?        S    Mar09   0:00 /usr/sbin/sshd
stark      496  0.0  1.5  5904  356 ?        S    Mar09   1:47 /usr/sbin/sshd
stark      497  0.5  2.7  3412  620 ?        S    Mar09  69:08 zwgc -ttymode -nofork -f /home/stark/.zwgc.desc
root      1315  0.0  0.7  1440  160 ?        S    Mar10   0:07 /usr/sbin/dhcpd -q eth1
root     11109  0.0  1.9  2316  432 tty2     S    14:02   0:01 bash
root     11367  0.0  1.9  2316  432 tty2     S    18:07   0:00 bash
root     11368  0.0  3.3  2488  744 tty2     R    18:07   0:00 ps auxww

        total:    used:    free:  shared: buffers:  cached:
Mem:  22962176 22597632   364544        0   245760  3223552
Swap: 67489792  4120576 63369216
MemTotal:        22424 kB
MemFree:           356 kB
MemShared:           0 kB
Buffers:           240 kB
Cached:           2352 kB
SwapCached:        796 kB
Active:           3008 kB
Inact_dirty:       192 kB
Inact_clean:       340 kB
Inact_target:      708 kB
HighTotal:           0 kB
HighFree:            0 kB
LowTotal:        22424 kB
LowFree:           356 kB
SwapTotal:       65908 kB
SwapFree:        61884 kB
Committed_AS:     8552 kB

Module                  Size  Used by    Not tainted
ipt_MASQUERADE          1784   0  (autoclean)
ipt_LOG                 3384   1  (autoclean)
ipt_state                536   1  (autoclean)
ipt_TCPMSS              2360   1  (autoclean)
ip_nat_ftp              3568   0  (unused)
iptable_nat            21016   2  [ipt_MASQUERADE ip_nat_ftp]
ip_conntrack_irc        3152   0  (unused)
ip_conntrack_ftp        4208   1  [ip_nat_ftp]
ip_conntrack           27776   4  [ipt_MASQUERADE ipt_state ip_nat_ftp iptable_nat ip_conntrack_irc ip_conntrack_ftp]
ethertap                3108   1
iptable_filter          1644   1  (autoclean)
ip_tables              14296   8  [ipt_MASQUERADE ipt_LOG ipt_state ipt_TCPMSS iptable_nat iptable_filter]
pppoe                   7820   1
pppox                   1176   1  [pppoe]
ppp_generic            19836   3  (autoclean) [pppoe pppox]
slhc                    5104   0  (autoclean) [ppp_generic]
8139too                14216   2
rtc                     6588   0  (autoclean)

-- 
greg
* Re: 2.4.20-ac2 Memory Leak?
From: Martin Josefsson @ 2003-03-18 23:28 UTC
To: gsstark
Cc: linux-kernel

[-- Attachment #1: Type: text/plain, Size: 888 bytes --]

On Wed, 2003-03-19 at 00:17, Gregory Stark wrote:
> My router box has a problem, it seems to be running out of memory. Programs
> that worked fine earlier are now swapping like crazy.
>
> What confuses me is that if I add up all the RSS of the processes I get 5.9M,
> a number drastically lower than the available RAM on the machine (24M) and
> drastically lower than the amount of RAM "free" says is taken (22M).
>
> It seems something in kernel space has taken a ton of memory out of play?
> Or is my diagnosis wrong?

2.4.20 changed the linked list handling and ip_conntrack relied on the
old way. I've attached a patch that removes this assumption from
ip_conntrack.

This can be the source of your problems, connections can get very long
timeouts and stay in ip_conntrack.

-- 
/Martin

Never argue with an idiot. They drag you down to their level, then beat you
with experience.

[-- Attachment #2: 10_confirm_fix.patch --]
[-- Type: text/x-patch, Size: 6426 bytes --]

diff -urN --exclude-from=diff.exclude linux-2.4.20-base/include/linux/netfilter_ipv4/ip_conntrack.h linux-2.4.20-del/include/linux/netfilter_ipv4/ip_conntrack.h
--- linux-2.4.20-base/include/linux/netfilter_ipv4/ip_conntrack.h	Fri Nov 29 00:53:15 2002
+++ linux-2.4.20-del/include/linux/netfilter_ipv4/ip_conntrack.h	Fri Feb 21 17:01:38 2003
@@ -6,6 +6,7 @@
 
 #include <linux/config.h>
 #include <linux/netfilter_ipv4/ip_conntrack_tuple.h>
+#include <linux/bitops.h>
 #include <asm/atomic.h>
 
 enum ip_conntrack_info
@@ -41,6 +42,10 @@
 	/* Conntrack should never be early-expired. */
 	IPS_ASSURED_BIT = 2,
 	IPS_ASSURED = (1 << IPS_ASSURED_BIT),
+
+	/* Connection is confirmed: originating packet has left box */
+	IPS_CONFIRMED_BIT = 3,
+	IPS_CONFIRMED = (1 << IPS_CONFIRMED_BIT),
 };
 
 #include <linux/netfilter_ipv4/ip_conntrack_tcp.h>
@@ -159,7 +164,7 @@
 	struct ip_conntrack_tuple_hash tuplehash[IP_CT_DIR_MAX];
 
 	/* Have we seen traffic both ways yet? (bitset) */
-	volatile unsigned long status;
+	unsigned long status;
 
 	/* Timer function; drops refcnt when it goes off. */
 	struct timer_list timeout;
@@ -254,7 +259,7 @@
 /* It's confirmed if it is, or has been in the hash table. */
 static inline int is_confirmed(struct ip_conntrack *ct)
 {
-	return ct->tuplehash[IP_CT_DIR_ORIGINAL].list.next != NULL;
+	return test_bit(IPS_CONFIRMED_BIT, &ct->status);
 }
 
 extern unsigned int ip_conntrack_htable_size;
diff -urN --exclude-from=diff.exclude linux-2.4.20-base/net/ipv4/netfilter/ip_conntrack_core.c linux-2.4.20-del/net/ipv4/netfilter/ip_conntrack_core.c
--- linux-2.4.20-base/net/ipv4/netfilter/ip_conntrack_core.c	Tue Feb 18 17:08:21 2003
+++ linux-2.4.20-del/net/ipv4/netfilter/ip_conntrack_core.c	Fri Feb 21 17:01:39 2003
@@ -292,9 +292,6 @@
 {
 	DEBUGP("clean_from_lists(%p)\n", ct);
 	MUST_BE_WRITE_LOCKED(&ip_conntrack_lock);
-	/* Remove from both hash lists: must not NULL out next ptrs,
-	   otherwise we'll look unconfirmed. Fortunately, LIST_DELETE
-	   doesn't do this. --RR */
 	LIST_DELETE(&ip_conntrack_hash
 		    [hash_conntrack(&ct->tuplehash[IP_CT_DIR_ORIGINAL].tuple)],
 		    &ct->tuplehash[IP_CT_DIR_ORIGINAL]);
@@ -467,6 +464,7 @@
 	ct->timeout.expires += jiffies;
 	add_timer(&ct->timeout);
 	atomic_inc(&ct->ct_general.use);
+	set_bit(IPS_CONFIRMED_BIT, &ct->status);
 	WRITE_UNLOCK(&ip_conntrack_lock);
 	return NF_ACCEPT;
 }
@@ -585,7 +583,7 @@
    connection. Too bad: we're in trouble anyway. */
 static inline int unreplied(const struct ip_conntrack_tuple_hash *i)
 {
-	return !(i->ctrack->status & IPS_ASSURED);
+	return !(test_bit(IPS_ASSURED_BIT, &i->ctrack->status));
 }
 
 static int early_drop(struct list_head *chain)
@@ -720,7 +718,7 @@
 			conntrack, expected);
 		/* Welcome, Mr. Bond. We've been expecting you... */
 		IP_NF_ASSERT(master_ct(conntrack));
-		conntrack->status = IPS_EXPECTED;
+		__set_bit(IPS_EXPECTED_BIT, &conntrack->status);
 		conntrack->master = expected;
 		expected->sibling = conntrack;
 		LIST_DELETE(&ip_conntrack_expect_list, expected);
@@ -768,11 +766,11 @@
 		*set_reply = 1;
 	} else {
 		/* Once we've had two way comms, always ESTABLISHED. */
-		if (h->ctrack->status & IPS_SEEN_REPLY) {
+		if (test_bit(IPS_SEEN_REPLY_BIT, &h->ctrack->status)) {
 			DEBUGP("ip_conntrack_in: normal packet for %p\n",
 			       h->ctrack);
 			*ctinfo = IP_CT_ESTABLISHED;
-		} else if (h->ctrack->status & IPS_EXPECTED) {
+		} else if (test_bit(IPS_EXPECTED_BIT, &h->ctrack->status)) {
 			DEBUGP("ip_conntrack_in: related packet for %p\n",
 			       h->ctrack);
 			*ctinfo = IP_CT_RELATED;
diff -urN --exclude-from=diff.exclude linux-2.4.20-base/net/ipv4/netfilter/ip_conntrack_proto_tcp.c linux-2.4.20-del/net/ipv4/netfilter/ip_conntrack_proto_tcp.c
--- linux-2.4.20-base/net/ipv4/netfilter/ip_conntrack_proto_tcp.c	Tue Feb 18 17:07:26 2003
+++ linux-2.4.20-del/net/ipv4/netfilter/ip_conntrack_proto_tcp.c	Fri Feb 21 17:03:35 2003
@@ -192,7 +192,7 @@
 	   have an established connection: this is a fairly common
 	   problem case, so we can delete the conntrack
 	   immediately. --RR */
-	if (!(conntrack->status & IPS_SEEN_REPLY) && tcph->rst) {
+	if (!test_bit(IPS_SEEN_REPLY_BIT, &conntrack->status) && tcph->rst) {
 		WRITE_UNLOCK(&tcp_lock);
 		if (del_timer(&conntrack->timeout))
 			conntrack->timeout.function((unsigned long)conntrack);
diff -urN --exclude-from=diff.exclude linux-2.4.20-base/net/ipv4/netfilter/ip_conntrack_proto_udp.c linux-2.4.20-del/net/ipv4/netfilter/ip_conntrack_proto_udp.c
--- linux-2.4.20-base/net/ipv4/netfilter/ip_conntrack_proto_udp.c	Fri Nov 29 00:53:15 2002
+++ linux-2.4.20-del/net/ipv4/netfilter/ip_conntrack_proto_udp.c	Fri Feb 21 17:01:39 2003
@@ -51,7 +51,7 @@
 {
 	/* If we've seen traffic both ways, this is some kind of UDP
 	   stream. Extend timeout. */
-	if (conntrack->status & IPS_SEEN_REPLY) {
+	if (test_bit(IPS_SEEN_REPLY_BIT, &conntrack->status)) {
 		ip_ct_refresh(conntrack, UDP_STREAM_TIMEOUT);
 		/* Also, more likely to be important, and not a probe */
 		set_bit(IPS_ASSURED_BIT, &conntrack->status);
diff -urN --exclude-from=diff.exclude linux-2.4.20-base/net/ipv4/netfilter/ip_conntrack_standalone.c linux-2.4.20-del/net/ipv4/netfilter/ip_conntrack_standalone.c
--- linux-2.4.20-base/net/ipv4/netfilter/ip_conntrack_standalone.c	Fri Nov 29 00:53:15 2002
+++ linux-2.4.20-del/net/ipv4/netfilter/ip_conntrack_standalone.c	Fri Feb 21 21:10:37 2003
@@ -77,7 +77,7 @@
 }
 
 static unsigned int
-print_conntrack(char *buffer, const struct ip_conntrack *conntrack)
+print_conntrack(char *buffer, struct ip_conntrack *conntrack)
 {
 	unsigned int len;
 	struct ip_conntrack_protocol *proto
@@ -95,12 +95,12 @@
 	len += print_tuple(buffer + len,
 			   &conntrack->tuplehash[IP_CT_DIR_ORIGINAL].tuple,
 			   proto);
-	if (!(conntrack->status & IPS_SEEN_REPLY))
+	if (!(test_bit(IPS_SEEN_REPLY_BIT, &conntrack->status)))
 		len += sprintf(buffer + len, "[UNREPLIED] ");
 	len += print_tuple(buffer + len,
 			   &conntrack->tuplehash[IP_CT_DIR_REPLY].tuple,
 			   proto);
-	if (conntrack->status & IPS_ASSURED)
+	if (test_bit(IPS_ASSURED_BIT, &conntrack->status))
 		len += sprintf(buffer + len, "[ASSURED] ");
 	len += sprintf(buffer + len, "use=%u ",
 		       atomic_read(&conntrack->ct_general.use));
* Re: 2.4.20-ac2 Memory Leak?
From: Greg Stark @ 2003-03-18 23:46 UTC
To: Martin Josefsson
Cc: gsstark, linux-kernel

Martin Josefsson <gandalf@wlug.westbo.se> writes:

> This can be the source of your problems, connections can get very long
> timeouts and stay in ip_conntrack.

Is there a way to list the connections and confirm this is the problem?
It seems it would require an awful lot of connections to consume megabytes of
memory.

Also, I've looked high and low and can't find this anywhere, how do I tune the
timeouts connections get? I have certain protocols that potentially receive
very little traffic and I want to make sure they don't time out.

-- 
greg
* Re: 2.4.20-ac2 Memory Leak?
From: Martin Josefsson @ 2003-03-18 23:53 UTC
To: Greg Stark
Cc: linux-kernel

On Wed, 2003-03-19 at 00:46, Greg Stark wrote:
> Martin Josefsson <gandalf@wlug.westbo.se> writes:
>
> > This can be the source of your problems, connections can get very long
> > timeouts and stay in ip_conntrack.
>
> Is there a way to list the connections and confirm this is the problem?
> It seems it would require an awful lot of connections to consume megabytes of
> memory.

cat /proc/net/ip_conntrack

The third field is the timeout in seconds.
The fourth field is the state of the connection, if it's TIME_WAIT with
a large timeout then it's the list handling bug. (iirc it was TIME_WAIT
that showed the problem...)

> Also, I've looked high and low and can't find this anywhere, how do I tune the
> timeouts connections get? I have certain protocols that potentially receive
> very little traffic and I want to make sure they don't time out.

The default tcp timeout is 5 days for established connections.

There's a patch in patch-o-matic that enables you to tune the timeouts
without having to edit the source. Instructions on how to get
patch-o-matic are available on http://www.netfilter.org

-- 
/Martin

Never argue with an idiot. They drag you down to their level, then beat you
with experience.
* Re: 2.4.20-ac2 Memory Leak?
From: Greg Stark @ 2003-03-19 0:05 UTC
To: Martin Josefsson
Cc: Greg Stark, linux-kernel

Martin Josefsson <gandalf@wlug.westbo.se> writes:

> On Wed, 2003-03-19 at 00:46, Greg Stark wrote:
> > Martin Josefsson <gandalf@wlug.westbo.se> writes:
> >
> > > This can be the source of your problems, connections can get very long
> > > timeouts and stay in ip_conntrack.
> >
> > Is there a way to list the connections and confirm this is the problem?
> > It seems it would require an awful lot of connections to consume megabytes of
> > memory.
>
> cat /proc/net/ip_conntrack
>
> The third field is the timeout in seconds.
> The fourth field is the state of the connection, if it's TIME_WAIT with
> a large timeout then it's the list handling bug. (iirc it was TIME_WAIT
> that showed the problem...)

Nope, a total of 16 entries in ip_conntrack, one of which is in TIME_WAIT with
a timeout of 2m.

> The default tcp timeout is 5 days for established connections.
>
> There's a patch in patch-o-matic that enables you to tune the timeouts
> without having to edit the source. Instructions on how to get
> patch-o-matic are available on http://www.netfilter.org

Thanks.

-- 
greg
* Re: 2.4.20-ac2 Memory Leak?
From: Greg Stark @ 2003-03-19 14:55 UTC
To: Martin Josefsson
Cc: gsstark, linux-kernel

> On Wed, 2003-03-19 at 00:17, Gregory Stark wrote:
> > My router box has a problem, it seems to be running out of memory. Programs
> > that worked fine earlier are now swapping like crazy.
> >
> > What confuses me is that if I add up all the RSS of the processes I get 5.9M,
> > a number drastically lower than the available RAM on the machine (24M) and
> > drastically lower than the amount of RAM "free" says is taken (22M).
> >
> > It seems something in kernel space has taken a ton of memory out of play?
> > Or is my diagnosis wrong?

One thing I find suspicious is that my ISP was down yesterday morning, causing
pppd to try repeatedly to connect. It seems it attempted to connect over 3,000
times while the ISP was down. Every connection failed at the PAP
authentication stage.

Perhaps there's a memory leak in the pppoe connection initiation process? Even
so that would be about 4k per pppoe connection attempt. Hm, a page?

-- 
greg