From mboxrd@z Thu Jan 1 00:00:00 1970
From: Eric Dumazet
Subject: Re: PROBLEM: Memory leak (at least with SLUB) from "secpath_dup" (xfrm) in 3.5+ kernels
Date: Sun, 21 Oct 2012 23:47:33 +0200
Message-ID: <1350856053.8609.217.camel@edumazet-glaptop>
References: <20121019205055.2b258d09@sacrilege>
 <20121019233632.26cf96d8@sacrilege>
 <20121020204958.4bc8e293@sacrilege>
 <20121021044540.12e8f4b7@sacrilege>
 <20121021062402.7c4c4cb8@sacrilege>
 <1350826183.13333.2243.camel@edumazet-glaptop>
 <20121021195701.7a5872e7@sacrilege>
 <20121022004332.7e3f3f29@sacrilege>
 <20121022015134.4de457b9@sacrilege>
Mime-Version: 1.0
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: 7bit
Cc: Paul Moore, netdev@vger.kernel.org, linux-mm@kvack.org
To: Mike Kazantsev
Return-path:
In-Reply-To: <20121022015134.4de457b9@sacrilege>
Sender: owner-linux-mm@kvack.org
List-Id: netdev.vger.kernel.org

On Mon, 2012-10-22 at 01:51 +0600, Mike Kazantsev wrote:
> On Mon, 22 Oct 2012 00:43:32 +0600
> Mike Kazantsev wrote:
>
> > > On Sun, 21 Oct 2012 15:29:43 +0200
> > > Eric Dumazet wrote:
> > >
> > > >
> > > > Did you try linux-3.7-rc2 (or linux-3.7-rc1) ?
> > > >
> >
> > I just built "torvalds/linux-2.6" (v3.7-rc2) and rebooted into it,
> > started same rsync-over-net test and got kmalloc-64 leaking (it went up
> > to tens of MiB until I stopped rsync, normally these are fixed at ~500
> > KiB).
> >
> > Unfortunately, I forgot to add slub_debug option and build kmemleak so
> > wasn't able to look at this case further, and when I rebooted with
> > these enabled/built, it was secpath_cache again.
> >
> > So previously noted "slabtop showed 'kmalloc-64' being the 99% offender
> > in the past, but with recent kernels (3.6.1), it has changed to
> > 'secpath_cache'" seem to be incorrect, as it seem to depend not on
> > kernel version, but some other factor.
> >
> > Guess I'll try to reboot a few more times to see if I can catch
> > kmalloc-64 leaking (instead of secpath_cache) again.
> >
>
> I haven't been able to catch the aforementioned condition, but noticed
> that with v3.7-rc2, "hex dump" part seem to vary in kmemleak
> traces, and contain all sorts of random stuff, for example:
>
>   unreferenced object 0xffff88002ae2de00 (size 56):
>     comm "softirq", pid 0, jiffies 4295006317 (age 213.066s)
>     hex dump (first 32 bytes):
>       01 00 00 00 01 00 00 00 20 9f f4 28 00 88 ff ff  ........ ..(....
>       2f 6f 72 67 2f 66 72 65 65 64 65 73 6b 74 6f 70  /org/freedesktop
>     backtrace:
>       [] kmemleak_alloc+0x21/0x3e
>       [] kmem_cache_alloc+0xa5/0xb1
>       [] secpath_dup+0x1b/0x5a
>       [] xfrm_input+0x64/0x484
>       [] xfrm6_rcv_spi+0x19/0x1b
>       [] xfrm6_rcv+0x20/0x22
>       [] ip6_input_finish+0x203/0x31b
>       [] ip6_input+0x1e/0x50
>       [] ip6_rcv_finish+0x65/0x69
>       [] ipv6_rcv+0x27f/0x2e0
>       [] __netif_receive_skb+0x5ba/0x65a
>       [] netif_receive_skb+0x47/0x78
>       [] napi_skb_finish+0x21/0x54
>       [] napi_gro_receive+0xfd/0x10a
>       [] rtl8169_poll+0x326/0x4fc
>       [] net_rx_action+0x9f/0x188
>
> Not sure if it's relevant though.
>
>

OK, so some layer seems to have a bug if the skb->head is exactly
allocated, instead of having extra tailroom (because of kmalloc-powerof2
alignment)

Or some layer overwrites past skb->cb[] array

If you try to move sp field in sk_buff, does it change something ?

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 6a2c34e..9b1438a 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -395,6 +395,9 @@ struct sk_buff {
 	struct sock		*sk;
 	struct net_device	*dev;
 
+#ifdef CONFIG_XFRM
+	struct	sec_path	*sp;
+#endif
 	/*
 	 * This is the control buffer. It is free to use for every
 	 * layer. Please put your private variables there. If you
@@ -404,9 +407,6 @@ struct sk_buff {
 	char			cb[48] __aligned(8);
 
 	unsigned long		_skb_refdst;
-#ifdef CONFIG_XFRM
-	struct	sec_path	*sp;
-#endif
 	unsigned int		len,
 				data_len;
 	__u16			mac_len,


Also try to increase tailroom in __netdev_alloc_skb()

diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 6e04b1f..972ee4f 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -427,7 +427,7 @@ struct sk_buff *__netdev_alloc_skb(struct net_device *dev,
 				   unsigned int length, gfp_t gfp_mask)
 {
 	struct sk_buff *skb = NULL;
-	unsigned int fragsz = SKB_DATA_ALIGN(length + NET_SKB_PAD) +
+	unsigned int fragsz = SKB_DATA_ALIGN(length + NET_SKB_PAD + 64) +
 			      SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
 
 	if (fragsz <= PAGE_SIZE && !(gfp_mask & (__GFP_WAIT | GFP_DMA))) {

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: email@kvack.org
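
Illustration (not part of the original message): the remark about an "exactly allocated" skb->head versus "extra tailroom (because of kmalloc-powerof2 alignment)" can be made concrete with a little userspace arithmetic. The sketch below is only a model under stated assumptions. CACHE_BYTES, PAD and SHINFO_SIZE are made-up stand-ins for SMP_CACHE_BYTES, NET_SKB_PAD and sizeof(struct skb_shared_info), whose real values depend on architecture and config. The helper names are hypothetical, and kmalloc size classes are modelled as plain powers of two.

```c
#include <assert.h>

/* Assumed values, for illustration only; real ones depend on arch/config. */
#define CACHE_BYTES 64u   /* stand-in for SMP_CACHE_BYTES */
#define PAD         64u   /* stand-in for NET_SKB_PAD */
#define SHINFO_SIZE 320u  /* stand-in for sizeof(struct skb_shared_info) */

/* Round x up to a multiple of CACHE_BYTES, in the spirit of SKB_DATA_ALIGN(). */
unsigned int data_align(unsigned int x)
{
	return (x + CACHE_BYTES - 1) & ~(CACHE_BYTES - 1);
}

/*
 * Size requested for the buffer, following the fragsz expression in the
 * second diff. 'extra' is 0 for the unpatched code and 64 with the test patch.
 */
unsigned int frag_size(unsigned int length, unsigned int extra)
{
	return data_align(length + PAD + extra) + data_align(SHINFO_SIZE);
}

/* Smallest power of two >= n: a simplified model of a kmalloc size class. */
unsigned int pow2_roundup(unsigned int n)
{
	unsigned int p = 1;

	/* Guard against the shift overflowing to 0 and looping forever. */
	assert(n > 0 && n <= 0x80000000u);
	while (p < n)
		p <<= 1;
	return p;
}

/*
 * Bytes that a power-of-two allocator hands out beyond the request.
 * An exact-size allocation has none of this slack, so a write past the
 * requested end lands in someone else's memory instead of in padding.
 */
unsigned int slack_bytes(unsigned int length, unsigned int extra)
{
	unsigned int exact = frag_size(length, extra);

	return pow2_roundup(exact) - exact;
}
```

With these assumed constants and a hypothetical length of 1536, frag_size(1536, 0) is 1920 and the power-of-two model rounds that to 2048, leaving 128 bytes of slack that an exact allocation does not have. frag_size(1536, 64) is 1984, i.e. the test patch asks for 64 more bytes explicitly, so a small overrun would stay inside the buffer and the leak pattern might change.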