* RE: Extremely slow network with e1000 & ip_conntrack
@ 2003-12-04 6:51 Feldman, Scott
2003-12-04 12:36 ` Stephen Lee
0 siblings, 1 reply; 13+ messages in thread
From: Feldman, Scott @ 2003-12-04 6:51 UTC (permalink / raw)
To: Harald Welte, Stephen Lee; +Cc: netfilter-devel, linux-kernel
> > > I compiled lots of kernels :-( and narrowed it down to between
> > > 2.5.26 and 2.5.46.
> > >
> > > Kernel version   Chip      Problem?
> > > 2.4.22           82540EM   N
> > > 2.5.26           82540EM   N
> > > 2.5.46           82540EM   Y
> > > 2.6.0-test10     82540EM   Y
> > > 2.6.0-test11     82540EM   Y
> > > 2.6.0-test11     82547EI   N
> > > 2.4.22nptlsmp    82547EI   N
In e1000, check this out:
#ifdef NETIF_F_TSO
	if((adapter->hw.mac_type >= e1000_82544) &&
	   (adapter->hw.mac_type != e1000_82547))
		netdev->features |= NETIF_F_TSO;
#endif
Try turning off TSO by disabling this code or by using "ethtool -K tso
off" (need version 1.8).
-scott
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: Extremely slow network with e1000 & ip_conntrack
2003-12-04 6:51 Extremely slow network with e1000 & ip_conntrack Feldman, Scott
@ 2003-12-04 12:36 ` Stephen Lee
2003-12-04 18:24 ` David S. Miller
2003-12-05 20:28 ` David S. Miller
0 siblings, 2 replies; 13+ messages in thread
From: Stephen Lee @ 2003-12-04 12:36 UTC (permalink / raw)
To: Feldman, Scott
Cc: Harald Welte, netfilter-devel, linux-kernel, David S. Miller

"Feldman, Scott" <scott.feldman@intel.com> wrote:
>
> Try turning off TSO by disabling this code or by using "ethtool -K tso
> off" (need version 1.8).

Yes, turning off TSO with ethtool fixed it (tested on 2.6.0-test11). At
least we have a workaround now.

Thanks Scott, Harald and Dave.

Is it not supported by the hardware? Seems TSO could improve
performance a bit since the 1000/MT Desktop is starved for PCI bandwidth
at 32-bit/33MHz.

Thanks,
Stephen

^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: Extremely slow network with e1000 & ip_conntrack
2003-12-04 12:36 ` Stephen Lee
@ 2003-12-04 18:24 ` David S. Miller
2003-12-05 20:45 ` Stephen Lee
2003-12-05 20:28 ` David S. Miller
1 sibling, 1 reply; 13+ messages in thread
From: David S. Miller @ 2003-12-04 18:24 UTC (permalink / raw)
To: Stephen Lee, torvalds
Cc: scott.feldman, laforge, netfilter-devel, linux-kernel

On Thu, 04 Dec 2003 21:36:09 +0900
Stephen Lee <mukansai@emailplus.org> wrote:

> "Feldman, Scott" <scott.feldman@intel.com> wrote:
> >
> > Try turning off TSO by disabling this code or by using "ethtool -K tso
> > off" (need version 1.8).
>
> Yes, turning off TSO with ethtool fixed it (tested on 2.6.0-test11). At
> least we have a workaround now.
>
> Thanks Scott, Harald and Dave.
>
> Is it not supported by the hardware? Seems TSO could improve
> performance a bit since the 1000/MT Desktop is starved for PCI bandwidth
> at 32-bit/33MHz.

This workaround explains everything. The TSO packets have to be
"un-TSO'd" in order for netfilter to look at the packet and parse
the contents. This means copying all the data around, allocating
several networking buffers, etc.

^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: Extremely slow network with e1000 & ip_conntrack
2003-12-04 18:24 ` David S. Miller
@ 2003-12-05 20:45 ` Stephen Lee
0 siblings, 0 replies; 13+ messages in thread
From: Stephen Lee @ 2003-12-05 20:45 UTC (permalink / raw)
To: David S. Miller
Cc: torvalds, scott.feldman, laforge, netfilter-devel, linux-kernel

"David S. Miller" <davem@redhat.com> wrote:
> Stephen Lee <mukansai@emailplus.org> wrote:
> > Yes, turning off TSO with ethtool fixed it (tested on 2.6.0-test11). At
> > least we have a workaround now.
>
> This workaround explains everything. The TSO packets have to be
> "un-TSO'd" in order for netfilter to look at the packet and parse
> the contents. This means copying all the data around, allocating
> several networking buffers, etc.

Sorry if I am talking out of my ass, but can this be solved in one of
the following ways? (But it would seem 2, 3 or 4 are not 2.6 material).

(1) Turn off TSO altogether (Duh).

(2) Do what you suggest, but pass the original TSO packet to the
    ethernet chip. Still have to copy around, but save us some
    interrupts? (Could introduce subtle bugs if we un-TSO it
    differently from the hardware).

(3) Just pass the original packet to netfilter with a special flag and
    have netfilter "deduce" what the rest of the headers are.

(4) Similar idea, but totally separate the headers and the payload.
    (I think this was suggested somewhere for some other problem... I
    read it in one of the archives but can't find it now) Then we can
    just generate a set of "un-TSO'd" headers (with appropriate
    pointers into the big payload) for netfilter to look at, but leave
    the original for passing to hardware eventually.

Stephen

^ permalink raw reply [flat|nested] 13+ messages in thread
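Idea (4) in the message above, a set of per-segment headers that point
into one shared payload, might look something like the following
user-space sketch. It only illustrates the bookkeeping and is not kernel
code; struct seg_view and build_seg_views() are hypothetical names, and
a real implementation would also have to produce IP IDs, flags and
checksums the same way the hardware does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-segment view: describes one on-the-wire TCP segment
 * without copying any payload bytes. */
struct seg_view {
	uint32_t seq;	/* TCP sequence number of the first payload byte */
	size_t   off;	/* offset into the shared large payload */
	size_t   len;	/* payload bytes carried by this segment */
};

/* Describe the payload as MSS-sized slices.  Returns the number of
 * segments needed; at most 'max' views are written to out[]. */
static size_t build_seg_views(uint32_t seq0, size_t payload_len, size_t mss,
			      struct seg_view *out, size_t max)
{
	size_t n = 0, off = 0;

	assert(mss > 0);
	while (off < payload_len) {
		size_t len = payload_len - off;

		if (len > mss)
			len = mss;
		if (n < max) {
			out[n].seq = seq0 + (uint32_t)off;
			out[n].off = off;
			out[n].len = len;
		}
		n++;
		off += len;
	}
	return n;
}
```

With a 3000-byte payload and an MSS of 1460 this yields three views
(1460, 1460 and 80 bytes), while the payload itself stays in one piece
for the hardware.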
* Re: Extremely slow network with e1000 & ip_conntrack
2003-12-04 12:36 ` Stephen Lee
2003-12-04 18:24 ` David S. Miller
@ 2003-12-05 20:28 ` David S. Miller
2003-12-05 22:20 ` Stephen Lee
` (2 more replies)
1 sibling, 3 replies; 13+ messages in thread
From: David S. Miller @ 2003-12-05 20:28 UTC (permalink / raw)
To: Stephen Lee; +Cc: scott.feldman, laforge, netfilter-devel, linux-kernel

On Thu, 04 Dec 2003 21:36:09 +0900
Stephen Lee <mukansai@emailplus.org> wrote:

> "Feldman, Scott" <scott.feldman@intel.com> wrote:
> >
> > Try turning off TSO by disabling this code or by using "ethtool -K tso
> > off" (need version 1.8).
>
> Yes, turning off TSO with ethtool fixed it (tested on 2.6.0-test11). At
> least we have a workaround now.

OK, I've found out what IP conntrack does that creates the problems.

In fact, it's a bug in conntrack and this ends up corrupting the TSO
packet. This forces TSO-disabling on that connection, and
retransmission of all the data. Then the data flows correctly so TSO is
re-enabled, and on and on and on like this. Performance goes into the
toilet.

The culprit is net/ipv4/netfilter/ip_conntrack_standalone.c,
in ip_refrag(), it does this:

	if ((*pskb)->len > dst_pmtu(&rt->u.dst)) {
		/* No hook can be after us, so this should be OK. */
		ip_fragment(*pskb, okfn);
		return NF_STOLEN;
	}

Which fragments TSO packets, oops :)

People can confirm this analysis by applying the patch below, enabling
TSO with conntrack loaded, and see if the problem goes away.

Some auditing is definitely necessary wrt. TSO and netfilter. In particular
I am incredibly confident that we have issues in cases like when the FTP
netfilter modules mangle the data. Another area for inspection are the
cases where TCP header bits are changed and thus the checksum needs to
be adjusted.

===== net/ipv4/netfilter/ip_conntrack_standalone.c 1.22 vs edited =====
--- 1.22/net/ipv4/netfilter/ip_conntrack_standalone.c	Thu Oct  2 23:21:19 2003
+++ edited/net/ipv4/netfilter/ip_conntrack_standalone.c	Fri Dec  5 12:25:22 2003
@@ -201,7 +201,8 @@
 	/* Local packets are never produced too large for their
 	   interface.  We degfragment them at LOCAL_OUT, however,
 	   so we have to refragment them here. */
-	if ((*pskb)->len > dst_pmtu(&rt->u.dst)) {
+	if ((*pskb)->len > dst_pmtu(&rt->u.dst) &&
+	    !skb_shinfo(*pskb)->tso_size) {
 		/* No hook can be after us, so this should be OK. */
 		ip_fragment(*pskb, okfn);
 		return NF_STOLEN;

^ permalink raw reply [flat|nested] 13+ messages in thread
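The effect of the one-line change in the patch above can be checked in
isolation with a small user-space model. This is only a sketch of the
decision logic: struct pkt_info is a stand-in for the two skb fields the
test reads, not the kernel's struct sk_buff, and both function names are
made up for the illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for the fields ip_refrag() looks at. */
struct pkt_info {
	size_t       len;	/* total length, like (*pskb)->len */
	unsigned int tso_size;	/* non-zero for a TSO super-packet */
};

/* Old test: anything larger than the path MTU gets fragmented. */
static bool refrag_before_patch(const struct pkt_info *p, size_t pmtu)
{
	return p->len > pmtu;
}

/* Patched test: oversized packets are left alone when the NIC is
 * going to segment them itself. */
static bool refrag_after_patch(const struct pkt_info *p, size_t pmtu)
{
	return p->len > pmtu && p->tso_size == 0;
}
```

A 14640-byte TSO packet on a 1500-byte path is fragmented by the old
test and passed through untouched by the new one, while an oversized
non-TSO packet is fragmented in both cases.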
* Re: Extremely slow network with e1000 & ip_conntrack
2003-12-05 20:28 ` David S. Miller
@ 2003-12-05 22:20 ` Stephen Lee
2003-12-05 22:56 ` David S. Miller
2003-12-11 7:26 ` Harald Welte
2003-12-11 11:03 ` TSO and netfilter (Re: Extremely slow network with e1000 & ip_conntrack) Harald Welte
2 siblings, 1 reply; 13+ messages in thread
From: Stephen Lee @ 2003-12-05 22:20 UTC (permalink / raw)
To: David S. Miller
Cc: Stephen Lee, scott.feldman, laforge, netfilter-devel, linux-kernel

"David S. Miller" <davem@redhat.com> wrote:
>
> OK, I've found out what IP conntrack does that creates the problems.
> [...]
> People can confirm this analysis by applying the patch below, enabling
> TSO with conntrack loaded, and see if the problem goes away.

I tested it with both e1000 & tg3, TSO on, and it is working fine for me
using ftp and scp.

Thanks,
Stephen

^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: Extremely slow network with e1000 & ip_conntrack
2003-12-05 22:20 ` Stephen Lee
@ 2003-12-05 22:56 ` David S. Miller
0 siblings, 0 replies; 13+ messages in thread
From: David S. Miller @ 2003-12-05 22:56 UTC (permalink / raw)
To: Stephen Lee
Cc: mukansai, scott.feldman, laforge, netfilter-devel, linux-kernel

On Sat, 06 Dec 2003 07:20:09 +0900
Stephen Lee <mukansai@emailplus.org> wrote:

> "David S. Miller" <davem@redhat.com> wrote:
> >
> > OK, I've found out what IP conntrack does that creates the problems.
> > [...]
> > People can confirm this analysis by applying the patch below, enabling
> > TSO with conntrack loaded, and see if the problem goes away.
>
> I tested it with both e1000 & tg3, TSO on, and it is working fine for me
> using ftp and scp.

Great, I'll push the fix to Linus.

Thanks.

^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: Extremely slow network with e1000 & ip_conntrack
2003-12-05 20:28 ` David S. Miller
2003-12-05 22:20 ` Stephen Lee
@ 2003-12-11 7:26 ` Harald Welte
2003-12-11 8:25 ` Henrik Nordstrom
2003-12-11 11:03 ` TSO and netfilter (Re: Extremely slow network with e1000 & ip_conntrack) Harald Welte
2 siblings, 1 reply; 13+ messages in thread
From: Harald Welte @ 2003-12-11 7:26 UTC (permalink / raw)
To: David S. Miller; +Cc: Stephen Lee, scott.feldman, netfilter-devel, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 1284 bytes --]

On Fri, Dec 05, 2003 at 12:28:19PM -0800, David S. Miller wrote:

> The culprit is net/ipv4/netfilter/ip_conntrack_standalone.c,
> in ip_refrag(), it does this:
>

Sorry for getting back to you so late, but as indicated before, I was
offline while travelling during the last week.

Thanks for spotting and fixing the bug.

> Some auditing is definitely necessary wrt. TSO and netfilter. In particular
> I am incredibly confident that we have issues in cases like when the FTP
> netfilter modules mangle the data. Another area for inspection are the
> cases where TCP header bits are changed and thus the checksum needs to
> be adjusted.

yes, this is certainly a problem - but not with conntrack, only with
nat. So maybe we should add a safeguard, preventing
iptables_nat/ipchains/ipfwadm from being loaded when TSO on any
interface is enabled? Or at least print a warning in syslog?

--
- Harald Welte <laforge@netfilter.org>               http://www.netfilter.org/
============================================================================
  "Fragmentation is like classful addressing -- an interesting early
   architectural error that shows how much experimentation was going
   on while IP was being designed."                    -- Paul Vixie

[-- Attachment #2: Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: Extremely slow network with e1000 & ip_conntrack
2003-12-11 7:26 ` Harald Welte
@ 2003-12-11 8:25 ` Henrik Nordstrom
0 siblings, 0 replies; 13+ messages in thread
From: Henrik Nordstrom @ 2003-12-11 8:25 UTC (permalink / raw)
To: Harald Welte
Cc: David S. Miller, Stephen Lee, scott.feldman, netfilter-devel, linux-kernel

On Thu, 11 Dec 2003, Harald Welte wrote:

> yes, this is certainly a problem - but not with conntrack, only with
> nat. So maybe we should add a safeguard, preventing
> iptables_nat/ipchains/ipfwadm from being loaded when TSO on any
> interface is enabled? Or at least print a warning in syslog?

TSO can be enabled while NAT is running, so you had better do this in
the packet flow, or in a suitable notifier hook if there is one.

Most firewalls etc. load the ruleset before activating the interfaces,
i.e. before even loading the NIC drivers, so there are no interfaces to
look at when iptables_nat is loaded.

Regards
Henrik

^ permalink raw reply [flat|nested] 13+ messages in thread
* TSO and netfilter (Re: Extremely slow network with e1000 & ip_conntrack)
2003-12-05 20:28 ` David S. Miller
2003-12-05 22:20 ` Stephen Lee
2003-12-11 7:26 ` Harald Welte
@ 2003-12-11 11:03 ` Harald Welte
2003-12-12 1:41 ` David S. Miller
2 siblings, 1 reply; 13+ messages in thread
From: Harald Welte @ 2003-12-11 11:03 UTC (permalink / raw)
To: David S. Miller; +Cc: Stephen Lee, scott.feldman, netfilter-devel, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 2121 bytes --]

On Fri, Dec 05, 2003 at 12:28:19PM -0800, David S. Miller wrote:

> Some auditing is definitely necessary wrt. TSO and netfilter. In particular
> I am incredibly confident that we have issues in cases like when the FTP
> netfilter modules mangle the data.

I didn't have a look into how TSO is implemented until today. From my
naive point of view, I cannot think of any issues.

From a netfilter point of view, a TSO-enabled skb just looks like a
single large packet, right? I mean, the TSO-enabled skb still contains
a fully valid IP and TCP packet. If we do any changes to the IP header
or tcp header bits, or even to the payload of the packet, this happens
before the TSO-enabled driver and the network board start creating
multiple tcp/ip datagrams from this skb (by using the information
present in the netfilter-modified ip/tcp headers).

The only interesting case is in ip_output.c:ip_queue_xmit(), where
tso_size and tso_segs are calculated, before NF_IP_LOCAL_OUT is run.

But changing the content or the size of the tcp payload should not
affect those calculations.

A real problem would be resizing the TCP header (where th.doff is
affected). But I cannot think of any case where any of the current
netfilter/iptables/conntrack/nat code does that. Even in the past, when
we used to remove SACKPERM from the tcp header, we just NOP'ed it out
instead of resizing the header.

> Another area for inspection are the cases where TCP header bits are
> changed and thus the checksum needs to be adjusted.

Why is this a problem? The netfilter code has to adjust the checksum
anyway... or is the checksum calculation for TSO-enabled skb's
different?

Please enlighten me if I have missed something.

--
- Harald Welte <laforge@netfilter.org>               http://www.netfilter.org/
============================================================================
  "Fragmentation is like classful addressing -- an interesting early
   architectural error that shows how much experimentation was going
   on while IP was being designed."                    -- Paul Vixie

[-- Attachment #2: Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply [flat|nested] 13+ messages in thread
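As background for the checksum question above: when a single 16-bit
header word changes, the Internet checksum can be patched incrementally
instead of being recomputed over the whole header. The sketch below
uses the RFC 1624 update formula, HC' = ~(~HC + ~m + m'). It is a
generic user-space illustration and not the netfilter code; both
function names are made up, and the sample words in the usage note are
arbitrary.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Fold a 32-bit accumulator into 16 bits with end-around carry. */
static uint16_t csum_fold32(uint32_t sum)
{
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)sum;
}

/* Full one's-complement checksum over n 16-bit words (RFC 1071). */
static uint16_t csum_full(const uint16_t *words, size_t n)
{
	uint32_t sum = 0;
	size_t i;

	for (i = 0; i < n; i++)
		sum += words[i];
	return (uint16_t)~csum_fold32(sum);
}

/* Incremental update when one word changes from old_word to new_word
 * (RFC 1624, equation 3). */
static uint16_t csum_update16(uint16_t check, uint16_t old_word,
			      uint16_t new_word)
{
	uint32_t sum = (uint16_t)~check;

	sum += (uint16_t)~old_word;
	sum += new_word;
	return (uint16_t)~csum_fold32(sum);
}
```

For the words 0x4500 0x0054 0x1c46 0x4000 0x4006 the full checksum is
0x1e5f; changing the third word to 0xbeef gives 0x7bb5 whether it is
recomputed from scratch or patched with csum_update16().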
* Re: TSO and netfilter (Re: Extremely slow network with e1000 & ip_conntrack)
2003-12-11 11:03 ` TSO and netfilter (Re: Extremely slow network with e1000 & ip_conntrack) Harald Welte
@ 2003-12-12 1:41 ` David S. Miller
2003-12-12 7:01 ` Harald Welte
0 siblings, 1 reply; 13+ messages in thread
From: David S. Miller @ 2003-12-12 1:41 UTC (permalink / raw)
To: Harald Welte; +Cc: mukansai, scott.feldman, netfilter-devel, linux-kernel

On Thu, 11 Dec 2003 12:03:15 +0100
Harald Welte <laforge@netfilter.org> wrote:

> The only interesting case is in ip_output.c:ip_queue_xmit(), where
> tso_size and tso_segs are calculated, before NF_IP_LOCAL_OUT is run.
>
> But changing the content or the size of the tcp payload should not
> affect those calculations.

It changes at least tso_segs, since if you decrease or increase the
size of the payload the number of real TCP/IP packets the TSO engine
will end up spitting out could be different.

The one netfilter module I'm most concerned about is the one that
handles non-passive FTP, I remember that one did strange things with
the data stream, removed TCP options, and stuff like that.

> A real problem would be resizing the TCP header (where th.doff is
> affected). But I cannot think of any case where any of the current
> netfilter/iptables/conntrack/nat code does that.

As mentioned above, I thought the netfilter module handling non-passive
FTP stripped TCP options.

> Even in the past, when
> we used to remove SACKPERM from the tcp header, we just NOP'ed it out
> instead of resizing the header.

This may be what I was thinking about.

> > Another area for inspection are the cases where TCP header bits are
> > changed and thus the checksum needs to be adjusted.
>
> Why is this a problem? The netfilter code has to adjust the checksum
> anyway... or is the checksum calculation for TSO-enabled skb's
> different?

Currently all the TSO supporting drivers set the ip and tcp header
checksum values themselves as appropriate, so there are no worries in
this area.

^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: TSO and netfilter (Re: Extremely slow network with e1000 & ip_conntrack)
2003-12-12 1:41 ` David S. Miller
@ 2003-12-12 7:01 ` Harald Welte
2003-12-12 8:00 ` David S. Miller
0 siblings, 1 reply; 13+ messages in thread
From: Harald Welte @ 2003-12-12 7:01 UTC (permalink / raw)
To: David S. Miller; +Cc: mukansai, scott.feldman, netfilter-devel, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 2989 bytes --]

On Thu, Dec 11, 2003 at 05:41:36PM -0800, David S. Miller wrote:
> On Thu, 11 Dec 2003 12:03:15 +0100
> Harald Welte <laforge@netfilter.org> wrote:
>
> > The only interesting case is in ip_output.c:ip_queue_xmit(), where
> > tso_size and tso_segs are calculated, before NF_IP_LOCAL_OUT is run.
> >
> > But changing the content or the size of the tcp payload should not
> > affect those calculations.
>
> It changes at least tso_segs, since if you decrease or increase the
> size of the payload the number of real TCP/IP packets the TSO engine
> will end up spitting out could be different.

I see.

So what about the networking core exporting an [inline] function
that recalculates tso_segs and tso_size (like the 'Hack zone' code
fragment in ip_queue_xmit() right now), called skb_tso_recalc() or
whatever name you prefer.

Or even better (since I assume TSO can only happen with
locally-originated datagrams), why don't we move the tso_size/tso_segs
calculation to happen after the LOCAL_OUT netfilter hook? This way we
also get the ip_select_ident_more() right, which we couldn't easily
update from the proposed skb_tso_recalc() function.

yes, in that case we would need to have some fake code like
	if (skb->len > mtu && (sk->sk_route_caps&NETIF_F_TSO))
		skb_shinfo(skb)->tso_segs = 1;
in order to make the newly-created check for refragmentation in
conntrack still work. Alternatively, create some inline function that
gives a yes/no return if the skb would later become TSO or not.

> The one netfilter module I'm most concerned about is the one that
> handles non-passive FTP, I remember that one did strange things with
> the data stream, removed TCP options, and stuff like that.

There are no NAT helper modules as of now that touch the size of the
TCP header. We have some experimental stuff in patch-o-matic (like
IPV4OPTSSTRIP target), but nothing in the stock kernel. I will put a
review of those on our TODO list - but for the vanilla kernel there
shouldn't be a problem.

> > Even in the past, when we used to remove SACKPERM from the tcp
> > header, we just NOP'ed it out instead of resizing the header.
>
> This may be what I was thinking about.

We now don't do that anymore and mangle the SACK options accordingly.
In any case, there is nothing that changes the size of the tcp or udp
header.

> Currently all the TSO supporting drivers set the ip and tcp header
> checksum values themselves as appropriate, so there are no worries in
> this area.

good news.

Please get back to me with any comments you might have, thanks.

--
- Harald Welte <laforge@netfilter.org>               http://www.netfilter.org/
============================================================================
  "Fragmentation is like classful addressing -- an interesting early
   architectural error that shows how much experimentation was going
   on while IP was being designed."                    -- Paul Vixie

[-- Attachment #2: Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply [flat|nested] 13+ messages in thread
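The "yes/no" inline helper suggested in the message above could be
sketched like this. It is a hypothetical user-space model that reuses
the condition from the fake code in that message; would_become_tso() is
a made-up name and SKETCH_NETIF_F_TSO is a placeholder bit, not the
kernel's actual NETIF_F_TSO value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Placeholder feature bit for the sketch; the real NETIF_F_TSO value
 * comes from the kernel headers. */
#define SKETCH_NETIF_F_TSO 0x1u

/* True if a locally generated packet of skb_len bytes would later be
 * handed to the NIC as a TSO super-packet: it exceeds the MTU and the
 * route's device advertises TSO. */
static bool would_become_tso(size_t skb_len, size_t mtu,
			     unsigned int route_caps)
{
	return skb_len > mtu && (route_caps & SKETCH_NETIF_F_TSO) != 0;
}
```

A refragmentation check could then skip any packet for which the helper
returns true, without relying on tso_size having been filled in yet.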
* Re: TSO and netfilter (Re: Extremely slow network with e1000 & ip_conntrack)
2003-12-12 7:01 ` Harald Welte
@ 2003-12-12 8:00 ` David S. Miller
0 siblings, 0 replies; 13+ messages in thread
From: David S. Miller @ 2003-12-12 8:00 UTC (permalink / raw)
To: Harald Welte; +Cc: mukansai, scott.feldman, netfilter-devel, linux-kernel

On Fri, 12 Dec 2003 08:01:31 +0100
Harald Welte <laforge@netfilter.org> wrote:

> So what about the networking core exporting an [inline] function
> that recalculates tso_segs and tso_size (like the 'Hack zone' code
> fragment in ip_queue_xmit() right now), called skb_tso_recalc() or
> whatever name you prefer.

This might work.

> Or even better (since I assume TSO can only happen with
> locally-originated datagrams), why don't we move the tso_size/tso_segs
> calculation to happen after the LOCAL_OUT netfilter hook? This way we
> also get the ip_select_ident_more() right, which we couldn't easily
> update from the proposed skb_tso_recalc() function.
>
> yes, in that case we would need to have some fake code like
> 	if (skb->len > mtu && (sk->sk_route_caps&NETIF_F_TSO))
> 		skb_shinfo(skb)->tso_segs = 1;
> in order to make the newly-created check for refragmentation in
> conntrack still work. Alternatively, create some inline function that
> gives a yes/no return if the skb would later become TSO or not.

I don't know about this. The local-out hook always had a fully
functional finalized packet to work with, and I doubt we should change
that.

Also, dst_output() might invoke IPSEC encapsulators which absolutely
must have the final packet in hand when they run (f.e. you can't
choose the IP ID after encryption of the IP header).

Anyways, that leaves us with the helper function idea, does this
(untested) look like what you want?

--- include/linux/skbuff.h.~1~	Thu Dec 11 23:55:43 2003
+++ include/linux/skbuff.h	Thu Dec 11 23:57:45 2003
@@ -1155,6 +1155,17 @@
 #endif
 }
 
+static __inline__ void skb_tso_recalc(struct sk_buff *skb, struct dst_entry *dst)
+{
+	unsigned int hlen = ((skb->h.raw-skb->data)+(skb->h.th->doff<<2));
+	u32 mtu = dst_pmtu(dst);
+
+	skb_shinfo(skb)->tso_size = mtu - hlen;
+	skb_shinfo(skb)->tso_segs =
+		(skb->len - hlen + skb_shinfo(skb)->tso_size - 1) /
+		skb_shinfo(skb)->tso_size - 1;
+}
+
 #define skb_queue_walk(queue, skb) \
 		for (skb = (queue)->next, prefetch(skb->next); \
 		     (skb != (struct sk_buff *)(queue)); \

^ permalink raw reply [flat|nested] 13+ messages in thread
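The arithmetic in the proposed skb_tso_recalc() can be tried out in user
space with plain integers. The sketch below mirrors the two assignments
from the patch; struct tso_params and tso_recalc() are illustrative
names only, and the preconditions are assumptions taken from the
"skb->len > mtu" condition mentioned earlier in the thread. As written,
the formula yields one less than the number of tso_size-sized slices of
the payload.

```c
#include <assert.h>

/* Result of the recalculation, mirroring the two skb_shinfo() fields. */
struct tso_params {
	unsigned int tso_size;	/* payload bytes per on-the-wire segment */
	unsigned int tso_segs;	/* ceil(payload / tso_size) - 1 */
};

/* len:  total packet length including IP and TCP headers
 * hlen: IP header plus TCP header bytes
 * mtu:  path MTU
 * Assumes hlen < mtu < len, i.e. the packet really needs segmenting. */
static struct tso_params tso_recalc(unsigned int len, unsigned int hlen,
				    unsigned int mtu)
{
	struct tso_params p;

	assert(hlen < mtu && mtu < len);
	p.tso_size = mtu - hlen;
	p.tso_segs = (len - hlen + p.tso_size - 1) / p.tso_size - 1;
	return p;
}
```

With 40 bytes of headers and a 1500-byte MTU, a 14640-byte packet gives
tso_size 1460 and tso_segs 9. Growing the payload by a single byte
raises tso_segs to 10, which is the reason a payload-mangling netfilter
module would need to trigger a recalculation.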
end of thread, other threads:[~2003-12-12 8:03 UTC | newest]

Thread overview: 13+ messages
2003-12-04 6:51 Extremely slow network with e1000 & ip_conntrack Feldman, Scott
2003-12-04 12:36 ` Stephen Lee
2003-12-04 18:24   ` David S. Miller
2003-12-05 20:45     ` Stephen Lee
2003-12-05 20:28   ` David S. Miller
2003-12-05 22:20     ` Stephen Lee
2003-12-05 22:56       ` David S. Miller
2003-12-11 7:26     ` Harald Welte
2003-12-11 8:25       ` Henrik Nordstrom
2003-12-11 11:03     ` TSO and netfilter (Re: Extremely slow network with e1000 & ip_conntrack) Harald Welte
2003-12-12 1:41       ` David S. Miller
2003-12-12 7:01         ` Harald Welte
2003-12-12 8:00           ` David S. Miller