From mboxrd@z Thu Jan 1 00:00:00 1970
From: =?unknown-8bit?q?Pawe=C5=82?= Staszewski
Date: Sun, 15 Oct 2017 00:03:19 +0200
Subject: [Intel-wired-lan] Linux 4.12+ memory leak on router with i40e NICs
In-Reply-To: <227d17ae-b040-07d0-3c57-e9acd1a3b5b4@itcare.pl>
References: <1507121766.30720.4.camel@cohaesio.com> <1507180753.20182.8.camel@cohaesio.com> <227d17ae-b040-07d0-3c57-e9acd1a3b5b4@itcare.pl>
Message-ID:
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
To: intel-wired-lan@osuosl.org
List-ID:

Forgot to add: these graphs are from tests with kernel 4.14-rc4-next.

On 2017-10-15 at 00:00, Paweł Staszewski wrote:
>
> Same problem here.
>
> The only difference is the change from Intel 82599 to X710, and now
> there is a memory leak.
>
> Memory over time with the ixgbe driver (same config, same kernel):
>
> Changed the NICs to X710 with the i40e driver (this is the only
> change), and memory over time:
>
> There is no process that is eating memory, so it looks like there is
> some problem with the i40e driver. That is not a surprise :) This
> driver is really buggy in many respects. Most of the tickets I opened
> on the e1000e SourceForge tracker have had no reply for a year or
> more, or if somebody does reply after a year, the ticket is closed a
> day later with a note about no activity :)
>
> On 2017-10-05 at 07:19, Anders K. Pedersen | Cohaesio wrote:
>> On Wed, 2017-10-04 at 08:32 -0700, Alexander Duyck wrote:
>>> On Wed, Oct 4, 2017 at 5:56 AM, Anders K. Pedersen | Cohaesio
>>> wrote:
>>>> Hello,
>>>>
>>>> After updating one of our Linux based routers to kernel 4.13 it
>>>> began leaking memory quite fast (about 1 GB every half hour). To
>>>> narrow it down, we tried various kernel versions and found that
>>>> 4.11.12 is okay, while 4.12 also leaks, so we did a bisection
>>>> between 4.11 and 4.12.
>>>>
>>>> The first bisection ended at
>>>> "[6964e53f55837b0c49ed60d36656d2e0ee4fc27b] i40e: fix handling of HW
>>>> ATR eviction", which fixes some flag handling that was broken by
>>>> 47994c119a36 "i40e: remove hw_disabled_flags in favor of using
>>>> separate flag bits", so I did a second bisection, where I added
>>>> 6964e53f5583 "i40e: fix handling of HW ATR eviction" to the steps
>>>> that had 47994c119a36 "i40e: remove hw_disabled_flags in favor of
>>>> using separate flag bits" in them.
>>>>
>>>> The second bisection ended at
>>>> "[0e626ff7ccbfc43c6cc4aeea611c40b899682382] i40e: Fix support for
>>>> flow director programming status", where I don't see any obvious
>>>> problems, so I'm hoping for some assistance.
>>>>
>>>> The router is a PowerEdge R730 server (Haswell based) with three
>>>> Intel NICs (all using the i40e driver):
>>>>
>>>> X710 quad port 10 GbE SFP+: eth0 eth1 eth2 eth3
>>>> X710 quad port 10 GbE SFP+: eth4 eth5 eth6 eth7
>>>> XL710 dual port 40 GbE QSFP+: eth8 eth9
>>>>
>>>> The NICs are aggregated with LACP with the team driver:
>>>>
>>>> team0: eth9 (40 GbE selected primary), and eth3, eth7 (10 GbE
>>>> non-selected backups)
>>>> team1: eth0, eth1, eth4, eth5 (all 10 GbE selected)
>>>>
>>>> team0 is used for internal networks and has one untagged and four
>>>> tagged VLAN interfaces, while team1 has an external uplink
>>>> connection without any VLANs.
>>>>
>>>> The router runs an eBGP session on team1 to one of our uplinks, and
>>>> iBGP via team0 to our other border routers. It also runs OSPF on
>>>> the internal VLANs on team0. One thing I've noticed is that when
>>>> OSPF is not announcing a default gateway to the internal networks,
>>>> so there is almost no traffic coming in on team0 and out on team1,
>>>> but still plenty of traffic coming in on team1 and out via team0,
>>>> there's no memory leak (or at least it is so small that we haven't
>>>> detected it).
>>>> But as soon as we configure OSPF to announce a default gateway to
>>>> the internal VLANs, so we get traffic from team0 to team1, the
>>>> leaking begins. Stopping the OSPF default gateway announcement
>>>> again also stops the leaking, but does not release already leaked
>>>> memory.
>>>>
>>>> So this leads me to suspect that the leaking is related to RX on
>>>> team0 (where XL710 eth9 is normally the only active interface) or
>>>> TX on team1 (X710 eth0, eth1, eth4, eth5). The first bad commit is
>>>> related to RX cleaning, which suggests RX on team0. Since we're
>>>> only seeing the leak for our outbound traffic, I suspect either a
>>>> difference between the X710 vs. XL710 NICs, or that the inbound
>>>> traffic is for relatively few destination addresses (only our own
>>>> systems) while the outbound traffic is for many different addresses
>>>> on the internet. But I'm just guessing here.
>>>>
>>>> I've tried kmemleak, but it only found a few kB of suspected memory
>>>> leaks (several of which disappeared again after a while).
>>>>
>>>> Below I've included more details - git bisect logs, ethtool -i,
>>>> dmesg, Kernel .config, and various memory related /proc files. Any
>>>> help or suggestions would be much appreciated, and please let me
>>>> know if more information is needed or there's something I should
>>>> try.
>>>>
>>>> Regards,
>>>> Anders K. Pedersen
>>>>
>>> Hi Anders,
>>>
>>> I think I see the problem and should have a patch submitted shortly
>>> to address it. From what I can tell it looks like the issue is that
>>> we weren't properly recycling the pages associated with descriptors
>>> that contained an Rx programming status. For now the workaround
>>> would be to try disabling ATR via the "ethtool --set-priv-flags"
>>> command. I should have a patch out in the next hour or so that you
>>> can try testing to verify if it addresses the issue.
>>>
>>> Thanks.
>>>
>>> - Alex
>> Thanks Alex,
>>
>> I will test the patch in our next service window on Tuesday morning.
>>
>> Regards,
>> Anders
>
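
For anyone trying to confirm the same symptom, a rough way to put a number on the leak (the original report mentions about 1 GB every half hour) is to sample MemAvailable from /proc/meminfo twice and scale the difference to an hourly rate. The sketch below is only an illustration and is not taken from this thread: the names parse_meminfo, growth_kb_per_hour and watch are made up here, and it assumes a kernel that exposes MemAvailable in /proc/meminfo.

```python
import time


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field name: value in kB}.

    Lines without a "kB" unit (for example HugePages_Total) are skipped.
    """
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue
        parts = rest.split()
        if len(parts) == 2 and parts[1] == "kB" and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def growth_kb_per_hour(first_kb, second_kb, seconds):
    """Change of a counter between two samples, scaled to kB per hour."""
    return (second_kb - first_kb) * 3600.0 / seconds


def watch(interval_s, path="/proc/meminfo"):
    """Take two MemAvailable samples interval_s apart and return kB/hour.

    A steadily negative result means available memory is shrinking.
    """
    with open(path) as f:
        before = parse_meminfo(f.read())["MemAvailable"]
    time.sleep(interval_s)
    with open(path) as f:
        after = parse_meminfo(f.read())["MemAvailable"]
    return growth_kb_per_hour(before, after, interval_s)
```

Calling watch(1800) on the affected router returns the change over half an hour as kB per hour. A persistently negative rate with no process growing to match it would be consistent with the report above, though page cache and normal workload changes also move MemAvailable, so several samples are more telling than one.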