From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S264377AbTLEUaE (ORCPT ); Fri, 5 Dec 2003 15:30:04 -0500
Received: (majordomo@vger.kernel.org) by vger.kernel.org
	id S264446AbTLEUaE (ORCPT ); Fri, 5 Dec 2003 15:30:04 -0500
Received: from pizda.ninka.net ([216.101.162.242]:25013 "EHLO pizda.ninka.net")
	by vger.kernel.org with ESMTP id S264377AbTLEU35 (ORCPT );
	Fri, 5 Dec 2003 15:29:57 -0500
Date: Fri, 5 Dec 2003 12:28:19 -0800
From: "David S. Miller"
To: Stephen Lee
Cc: scott.feldman@intel.com, laforge@netfilter.org,
	netfilter-devel@lists.netfilter.org, linux-kernel@vger.kernel.org
Subject: Re: Extremely slow network with e1000 & ip_conntrack
Message-Id: <20031205122819.25ac14ab.davem@redhat.com>
In-Reply-To: <20031204213030.2B75.MUKANSAI@emailplus.org>
References: <20031204213030.2B75.MUKANSAI@emailplus.org>
X-Mailer: Sylpheed version 0.9.7 (GTK+ 1.2.6; sparc-unknown-linux-gnu)
Mime-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Sender: linux-kernel-owner@vger.kernel.org
X-Mailing-List: linux-kernel@vger.kernel.org

On Thu, 04 Dec 2003 21:36:09 +0900
Stephen Lee wrote:

> "Feldman, Scott" wrote:
> >
> > Try turning off TSO by disabling this code or by using "ethtool -K tso
> > off" (need version 1.8).
>
> Yes, turning off TSO with ethtool fixed it (tested on 2.6.0-test11). At
> least we have a workaround now.

OK, I've found out what IP conntrack does that creates the problems.

In fact, it's a bug in conntrack, and it ends up corrupting the TSO
packet. This forces TSO to be disabled on that connection and all of
the data to be retransmitted. Then the data flows correctly, so TSO is
re-enabled, and on and on and on like this. Performance goes into the
toilet.

The culprit is ip_refrag() in
net/ipv4/netfilter/ip_conntrack_standalone.c, which does this:

	if ((*pskb)->len > dst_pmtu(&rt->u.dst)) {
		/* No hook can be after us, so this should be OK. */
		ip_fragment(*pskb, okfn);
		return NF_STOLEN;
	}

Which fragments TSO packets, oops :)

People can confirm this analysis by applying the patch below, enabling
TSO with conntrack loaded, and seeing whether the problem goes away.

Some auditing is definitely necessary wrt. TSO and netfilter. In
particular I am incredibly confident that we have issues in cases like
when the FTP netfilter modules mangle the data. Other areas for
inspection are the cases where TCP header bits are changed and thus
the checksum needs to be adjusted.

===== net/ipv4/netfilter/ip_conntrack_standalone.c 1.22 vs edited =====
--- 1.22/net/ipv4/netfilter/ip_conntrack_standalone.c	Thu Oct  2 23:21:19 2003
+++ edited/net/ipv4/netfilter/ip_conntrack_standalone.c	Fri Dec  5 12:25:22 2003
@@ -201,7 +201,8 @@
 	/* Local packets are never produced too large for their
 	   interface.  We degfragment them at LOCAL_OUT, however,
 	   so we have to refragment them here. */
-	if ((*pskb)->len > dst_pmtu(&rt->u.dst)) {
+	if ((*pskb)->len > dst_pmtu(&rt->u.dst) &&
+	    !skb_shinfo(*pskb)->tso_size) {
 		/* No hook can be after us, so this should be OK. */
 		ip_fragment(*pskb, okfn);
 		return NF_STOLEN;
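
For anyone who wants to reason about the changed condition outside the
kernel, here is a minimal userspace sketch of the refragmentation
decision before and after the patch. It is only a model under stated
assumptions: struct pkt_model and the refrag_*() helpers are
hypothetical names invented for illustration, and the struct carries
just the two values the check reads, (*pskb)->len and
skb_shinfo(*pskb)->tso_size. It is not kernel code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the two sk_buff properties the check reads. */
struct pkt_model {
	size_t len;      /* total length, like (*pskb)->len */
	size_t tso_size; /* nonzero for a TSO packet, like skb_shinfo(*pskb)->tso_size */
};

/* Condition before the patch: length alone decides. */
static bool refrag_before_patch(const struct pkt_model *p, size_t pmtu)
{
	return p->len > pmtu;
}

/* Condition after the patch: oversized TSO packets are left alone. */
static bool refrag_after_patch(const struct pkt_model *p, size_t pmtu)
{
	return p->len > pmtu && p->tso_size == 0;
}

/* Small usage example with made-up sizes and a 1500-byte path MTU. */
void refrag_demo(void)
{
	struct pkt_model tso_pkt = { 65000, 1448 }; /* large TSO packet */
	struct pkt_model big_pkt = { 3000, 0 };     /* defragmented, no TSO */

	assert(refrag_before_patch(&tso_pkt, 1500));  /* the bug: TSO packet fragmented */
	assert(!refrag_after_patch(&tso_pkt, 1500));  /* patched: passed through intact */
	assert(refrag_after_patch(&big_pkt, 1500));   /* non-TSO case is unchanged */
}
```

The only behavioural difference in this model is the oversized packet
with a nonzero tso_size; packets without TSO are refragmented exactly
as before.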