* Strange packet drops with heavy firewalling
From: Benny Amorsen @ 2010-04-09  9:56 UTC
To: netdev

I have a netfilter box which is dropping packets. ethtool -S counts
10-20 rx_discards per second on the interface.

The switch does not have flow control enabled; with flow control
enabled, the rx_discards turn into tx_on_sent, which ultimately causes
the same problem (the load is pretty constant, so the switch has to
drop the packets instead).

perf top shows something like:

    5201.00 -  6.7% : _spin_unlock_irqrestore
    4232.00 -  5.5% : finish_task_switch
    3597.00 -  4.6% : tg3_poll   [tg3]
    3257.00 -  4.2% : handle_IRQ_event
    2515.00 -  3.2% : tick_nohz_restart_sched_tick
    1947.00 -  2.5% : nf_ct_tuple_equal
    1927.00 -  2.5% : tg3_start_xmit   [tg3]
    1879.00 -  2.4% : kmem_cache_alloc_node
    1625.00 -  2.1% : tick_nohz_stop_sched_tick
    1619.00 -  2.1% : ipt_do_table
    1595.00 -  2.1% : ip_route_input
    1547.00 -  2.0% : kmem_cache_free
    1474.00 -  1.9% : __alloc_skb
    1424.00 -  1.8% : fget_light
    1391.00 -  1.8% : nf_iterate

The rule set is quite large (more than 4000 rules), but it is organized
so that each packet only has to traverse a few rules before being
accepted or rejected.

When the problem started we were using a different server, an old
two-socket 32-bit Xeon with hyperthreading. CPU usage often hit 100% on
one CPU with that server. After replacing it with a ProLiant DL160 G5
with a quad-core Xeon (without hyperthreading), CPU usage rarely
exceeds 10% on any CPU, but the packet loss persists.

We're using the built-in dual Broadcom NetXtreme BCM5722 Gigabit
Ethernet PCI Express NICs, and the kernel is
kernel-2.6.32.9-70.fc12.x86_64 from Fedora. The next step is probably
installing a better Ethernet card, perhaps an Intel 82576-based one, so
that we can get multiqueue support.

The traffic is about 300 Mbit/s (twice that if you count both in and
out, as Cisco does).

/Benny
* Re: Strange packet drops with heavy firewalling
From: Eric Dumazet @ 2010-04-09 11:47 UTC
To: Benny Amorsen; Cc: netdev

On Friday, 9 April 2010 at 11:56 +0200, Benny Amorsen wrote:
> I have a netfilter box which is dropping packets. ethtool -S counts
> 10-20 rx_discards per second on the interface.
>
> The switch does not have flow control enabled; with flow control
> enabled, the rx_discards turn into tx_on_sent, which ultimately causes
> the same problem (the load is pretty constant, so the switch has to
> drop the packets instead).
>
> perf top shows something like:
>
> [ perf output snipped ]
>
> The rule set is quite large (more than 4000 rules), but it is
> organized so that each packet only has to traverse a few rules before
> being accepted or rejected.
>
> When the problem started we were using a different server, an old
> two-socket 32-bit Xeon with hyperthreading. CPU usage often hit 100%
> on one CPU with that server. After replacing it with a ProLiant DL160
> G5 with a quad-core Xeon (without hyperthreading), CPU usage rarely
> exceeds 10% on any CPU, but the packet loss persists.

This might be micro-bursts; check the RX parameters with
'ethtool -g eth0' (increase the RX ring from 200 to 511 if you want
more buffers?).

> We're using the built-in dual Broadcom NetXtreme BCM5722 Gigabit
> Ethernet PCI Express NICs, and the kernel is
> kernel-2.6.32.9-70.fc12.x86_64 from Fedora. The next step is probably
> installing a better Ethernet card, perhaps an Intel 82576-based one,
> so that we can get multiqueue support.

Sure, but before that, could you check:

cat /proc/net/softnet_stat

cat /proc/interrupts
(check that eth0 IRQs are delivered to one CPU)

grep . /proc/sys/net/ipv4/netfilter/ip_conntrack_*
(you might need to increase ip_conntrack_buckets)

ethtool -c eth0
(you might change the coalescing parameters to reduce the number of
IRQs)

ethtool -g eth0
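Note: a quick way to confirm the 10-20 drops/second figure and to catch
bursts is to sample the counter once a second and print the delta. A
minimal sketch; it assumes the counter is named "rx_discards" in this
tg3 NIC's ethtool -S output, as in Benny's report:

    #!/bin/sh
    # Print the per-second increase of eth0's rx_discards counter.
    prev=$(ethtool -S eth0 | awk '/rx_discards:/ {print $2}')
    while sleep 1; do
        cur=$(ethtool -S eth0 | awk '/rx_discards:/ {print $2}')
        echo "$(date +%T) rx_discards/s: $((cur - prev))"
        prev=$cur
    done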
* Re: Strange packet drops with heavy firewalling
From: Benny Amorsen @ 2010-04-09 12:33 UTC
To: Eric Dumazet; Cc: netdev

Eric Dumazet <eric.dumazet@gmail.com> writes:

> This might be micro-bursts; check the RX parameters with
> 'ethtool -g eth0' (increase the RX ring from 200 to 511 if you want
> more buffers?).

I actually tried that already. (I didn't expect it to cause a traffic
interruption, but it did. Oh well.) It didn't make a difference, at
least not one I could detect from the number of packet drops and the
CPU utilization.

> cat /proc/net/softnet_stat

000002d9 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
42bc8143 00000000 0000024c 00000000 00000000 00000000 00000000 00000000 00000000
0000031b 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
1c5a35e9 00000000 000005f7 00000000 00000000 00000000 00000000 00000000 00000000

I am not quite sure how to interpret that...

> cat /proc/interrupts

 79:    1240 4050590849    1253       1263  PCI-MSI-edge  eth0
 80:      12          9      14 3613521843  PCI-MSI-edge  eth1

> (check that eth0 IRQs are delivered to one CPU)

Yes, CPU1 handles eth0 and CPU3 handles eth1.

> grep . /proc/sys/net/ipv4/netfilter/ip_conntrack_*

nf_conntrack_acct:1
nf_conntrack_buckets:8192
nf_conntrack_checksum:1
nf_conntrack_count:49311
nf_conntrack_events:1
nf_conntrack_events_retry_timeout:15
nf_conntrack_expect_max:2048
nf_conntrack_generic_timeout:600
nf_conntrack_icmp_timeout:30
nf_conntrack_log_invalid:1
nf_conntrack_max:1048576
nf_conntrack_tcp_be_liberal:0
nf_conntrack_tcp_loose:1
nf_conntrack_tcp_max_retrans:3
nf_conntrack_tcp_timeout_close:10
nf_conntrack_tcp_timeout_close_wait:60
nf_conntrack_tcp_timeout_established:432000
nf_conntrack_tcp_timeout_fin_wait:120
nf_conntrack_tcp_timeout_last_ack:30
nf_conntrack_tcp_timeout_max_retrans:300
nf_conntrack_tcp_timeout_syn_recv:60
nf_conntrack_tcp_timeout_syn_sent:120
nf_conntrack_tcp_timeout_time_wait:120
nf_conntrack_tcp_timeout_unacknowledged:300
nf_conntrack_udp_timeout:30
nf_conntrack_udp_timeout_stream:180

> (you might need to increase ip_conntrack_buckets)

You got me there. I had forgotten nf_conntrack.hashsize=1048576 and
nf_conntrack.expect_hashsize=32768 on the kernel command line. They
were on the hot standby firewall, but not on the primary one. I will do
a failover to the hot standby sometime during the weekend. It still
isn't possible to change this without a reboot, is it?

> ethtool -c eth0
> (you might change the coalescing parameters to reduce the number of
> IRQs)

Coalesce parameters for eth0:
Adaptive RX: off  TX: off
stats-block-usecs: 0
sample-interval: 0
pkt-rate-low: 0
pkt-rate-high: 0

rx-usecs: 20
rx-frames: 5
rx-usecs-irq: 0
rx-frames-irq: 5

tx-usecs: 72
tx-frames: 53
tx-usecs-irq: 0
tx-frames-irq: 5

rx-usecs-low: 0
rx-frame-low: 0
tx-usecs-low: 0
tx-frame-low: 0

rx-usecs-high: 0
rx-frame-high: 0
tx-usecs-high: 0
tx-frame-high: 0

I played quite a lot with the parameters, but it did not seem to make
any difference. I didn't try adaptive mode, though; the load is fairly
static, so it didn't seem appropriate.

> ethtool -g eth0

Ring parameters for eth0:
Pre-set maximums:
RX:             511
RX Mini:        0
RX Jumbo:       0
TX:             511
Current hardware settings:
RX:             200
RX Mini:        0
RX Jumbo:       0
TX:             511

Right now RX is 200, but when it was 511 it didn't seem to make a
difference.

Thank you very much for the help! I will report back on whether it was
the hash buckets.
/Benny
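Note: for anyone else staring at the same output: each row of
/proc/net/softnet_stat is one CPU, and every field is a hexadecimal
counter. On 2.6.32 the first column is packets processed, the second is
packets dropped for lack of backlog space, and the third is
time_squeeze, i.e. net_rx_action ran out of budget or time. In the
output above, the two busy rows (CPU1 and CPU3) show time_squeeze
values of 0x24c and 0x5f7, which fits the micro-burst theory. A decoder
sketch, using gawk's strtonum:

    # Column layout assumed as of 2.6.32:
    # col 1 = processed, col 2 = dropped, col 3 = time_squeeze
    gawk '{ printf "cpu%d: processed=%d dropped=%d time_squeeze=%d\n",
            NR-1, strtonum("0x" $1), strtonum("0x" $2), strtonum("0x" $3) }' \
         /proc/net/softnet_stat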
* Re: Strange packet drops with heavy firewalling
From: Eric Dumazet @ 2010-04-09 13:29 UTC
To: Benny Amorsen; Cc: netdev

On Friday, 9 April 2010 at 14:33 +0200, Benny Amorsen wrote:

> Thank you very much for the help! I will report back on whether it was
> the hash buckets.

OK. You could try:

ethtool -C eth0 tx-usecs 200 tx-frames 100 tx-frames-irq 100
ethtool -C eth1 tx-usecs 200 tx-frames 100 tx-frames-irq 100

(to reduce tx completion IRQs)

Before buying multiqueue devices, you could also try a net-next-2.6
kernel, because RPS (Receive Packet Steering) is in. In your setup this
might help a bit: it distributes the packets to all CPUs, with
appropriate cache handling.
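Note: for reference, RPS in net-next is configured per receive queue
through sysfs. A minimal sketch; the mask value is illustrative and
spreads eth0's single RX queue across CPUs 0-3:

    # This sysfs path only exists on kernels with RPS
    # (net-next at the time, not stock 2.6.32).
    echo f > /sys/class/net/eth0/queues/rx-0/rps_cpus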
* Re: Strange packet drops with heavy firewalling
From: Benny Amorsen @ 2010-04-12  6:20 UTC
To: Eric Dumazet; Cc: netdev

Eric Dumazet <eric.dumazet@gmail.com> writes:

> On Friday, 9 April 2010 at 14:33 +0200, Benny Amorsen wrote:
>
>> Thank you very much for the help! I will report back on whether it
>> was the hash buckets.
>
> OK. You could try:
>
> ethtool -C eth0 tx-usecs 200 tx-frames 100 tx-frames-irq 100
> ethtool -C eth1 tx-usecs 200 tx-frames 100 tx-frames-irq 100
>
> (to reduce tx completion IRQs)

Alas, even with the hash buckets I still have the same problem. Perhaps
slightly less severe, but it's still there. I implemented the other
changes you suggested as well, except for the ethtool -G. I may try to
switch to net-next if I can find an easy way to make an RPM out of it.

Thank you for the help!

/proc/sys/net/netfilter/nf_conntrack_acct:1
/proc/sys/net/netfilter/nf_conntrack_buckets:1048576
/proc/sys/net/netfilter/nf_conntrack_checksum:1
/proc/sys/net/netfilter/nf_conntrack_count:43430
/proc/sys/net/netfilter/nf_conntrack_events:1
/proc/sys/net/netfilter/nf_conntrack_events_retry_timeout:15
/proc/sys/net/netfilter/nf_conntrack_expect_max:2048
/proc/sys/net/netfilter/nf_conntrack_generic_timeout:600
/proc/sys/net/netfilter/nf_conntrack_icmp_timeout:30
/proc/sys/net/netfilter/nf_conntrack_log_invalid:1
/proc/sys/net/netfilter/nf_conntrack_max:1048576
/proc/sys/net/netfilter/nf_conntrack_tcp_be_liberal:0
/proc/sys/net/netfilter/nf_conntrack_tcp_loose:1
/proc/sys/net/netfilter/nf_conntrack_tcp_max_retrans:3
/proc/sys/net/netfilter/nf_conntrack_tcp_timeout_close:10
/proc/sys/net/netfilter/nf_conntrack_tcp_timeout_close_wait:60
/proc/sys/net/netfilter/nf_conntrack_tcp_timeout_established:432000
/proc/sys/net/netfilter/nf_conntrack_tcp_timeout_fin_wait:120
/proc/sys/net/netfilter/nf_conntrack_tcp_timeout_last_ack:30
/proc/sys/net/netfilter/nf_conntrack_tcp_timeout_max_retrans:300
/proc/sys/net/netfilter/nf_conntrack_tcp_timeout_syn_recv:60
/proc/sys/net/netfilter/nf_conntrack_tcp_timeout_syn_sent:120
/proc/sys/net/netfilter/nf_conntrack_tcp_timeout_time_wait:120
/proc/sys/net/netfilter/nf_conntrack_tcp_timeout_unacknowledged:300
/proc/sys/net/netfilter/nf_conntrack_udp_timeout:30
/proc/sys/net/netfilter/nf_conntrack_udp_timeout_stream:180

/Benny
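Note: on making an RPM out of net-next: the kernel tree can generate
one itself. A hedged sketch, assuming the rpm packaging targets in
scripts/package and an installed rpm-build; the repository URL is the
net-next tree of that era:

    git clone git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6.git
    cd net-next-2.6
    cp /boot/config-$(uname -r) .config   # start from the running Fedora config
    make oldconfig
    make binrpm-pkg   # binary kernel RPM, lands in your rpmbuild RPMS directory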
* Re: Strange packet drops with heavy firewalling
From: Benny Lyne Amorsen @ 2010-04-12 14:44 UTC
To: zhigang gong; Cc: netdev

On Monday, 12 April 2010 at 16:16 +0800, zhigang gong wrote:

> How did you get the per-CPU usage data, by oprofile? I'm just a little
> surprised by the result, as it shows your new core running 10x faster
> than your old core :).

Well, the old server had only two CPUs plus hyperthreading, and the
CPUs were Pentium 4-based. Add a slow memory bus to that and you have a
fairly slow system. It's almost 5 years old, so Moore's law says a 2**3
increase in the number of transistors... In about the same time frame,
Linux has gone from being able to fill 1 Gbit/s Ethernet to being able
to fill 10 Gbit/s Ethernet.

> What's the average packet size?

I asked the switch (I can't find a handy equivalent to ifstat which
counts packets instead of bytes). The 5-minute average packet sizes
seem to vary in the range of 450 to 550 bytes.

> If your packet size is 64 bytes, then the pps (packets per second)
> rate should be about 585 Kpps. As far as I know, this value is close
> to the best result when the standard Linux kernel processes network
> traffic with a normal 1 Gb Ethernet card (without multi-queue support)
> on an Intel box. If that is the case, buying a better Ethernet card
> with multi-queue support should be a good choice. Otherwise, it may
> not help.

I am far from that, at perhaps 1/10th of it. I do a lot more processing
on at least some of the packets, though (the ones starting new flows).

/Benny
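Note: on counting packets instead of bytes: the per-interface packet
counters are in /proc/net/dev, so a rough pps and average-size sampler
needs no extra tools. A sketch; the interface name is illustrative:

    #!/bin/sh
    # Rough rx packet rate and average packet size for eth0, once a second.
    # /proc/net/dev rx columns after the colon: bytes packets errs drop ...
    while :; do
        set -- $(awk '/eth0:/ { sub(/.*:/, ""); print $1, $2 }' /proc/net/dev)
        b=$1 p=$2
        if [ -n "$op" ] && [ "$p" -gt "$op" ]; then
            echo "rx: $((p - op)) pps, avg $(( (b - ob) / (p - op) )) bytes/packet"
        fi
        ob=$b op=$p
        sleep 1
    done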
* Re: Strange packet drops with heavy firewalling
From: Benny Amorsen @ 2010-04-12 17:06 UTC
To: zhigang gong; Cc: netdev

On Monday, 12 April 2010 at 23:33 +0800, zhigang gong wrote:

> Now I agree with Eric's analysis: there may be some bursts, for
> example a burst of first packets for a bunch of different new flows.
> What mode are you using the Ethernet driver in? I guess it's NAPI,
> right?

I presume so.

> And is your time-consuming workload handled in soft-irq context or in
> a user-space process?

Soft-irq; the box is doing pure iptables. The only time it does a
little bit of user-space work is when I use conntrackd, but killing
conntrackd does not affect the packet loss measurably.

I switched to an 82576-based card, and now I get:

    3341.00 -  4.9% : _spin_lock
    2506.00 -  3.7% : irq_entries_start
    2163.00 -  3.2% : _spin_lock_irqsave
    1616.00 -  2.4% : native_read_tsc
    1572.00 -  2.3% : igb_poll   [igb]
    1386.00 -  2.0% : get_partial_node
    1236.00 -  1.8% : igb_clean_tx_irq   [igb]
    1205.00 -  1.8% : igb_xmit_frame_adv   [igb]
    1170.00 -  1.7% : ipt_do_table
    1049.00 -  1.6% : fget_light
    1015.00 -  1.5% : tick_nohz_stop_sched_tick
     967.00 -  1.4% : fput
     945.00 -  1.4% : __slab_free
     919.00 -  1.4% : datagram_poll
     874.00 -  1.3% : dev_queue_xmit

And it seems the packet loss is gone!

# ethtool -S eth0 | fgrep drop
     tx_dropped: 0
     rx_queue_drop_packet_count: 0
     dropped_smbus: 0
     rx_queue_0_drops: 0
     rx_queue_1_drops: 0
     rx_queue_2_drops: 0
     rx_queue_3_drops: 0

I'm a bit surprised by this, though:

 99:      24 1306226       3       2  PCI-MSI-edge  eth1-tx-0
100:   15735 1648774       3       7  PCI-MSI-edge  eth1-tx-1
101:       8      11       9 1083022  PCI-MSI-edge  eth1-tx-2
102:       0       0       0       0  PCI-MSI-edge  eth1-tx-3
103:      18      15    6131 1095383  PCI-MSI-edge  eth1-rx-0
104:     217      32   46544 1335325  PCI-MSI-edge  eth1-rx-1
105:     154 1305595     218      16  PCI-MSI-edge  eth1-rx-2
106:      17      16    8229 1467509  PCI-MSI-edge  eth1-rx-3
107:       0       0       1       0  PCI-MSI-edge  eth1
108:       2      14      15 1003053  PCI-MSI-edge  eth0-tx-0
109:    8226 1668924     478     487  PCI-MSI-edge  eth0-tx-1
110:       3 1188874      17      12  PCI-MSI-edge  eth0-tx-2
111:       0       0       0       0  PCI-MSI-edge  eth0-tx-3
112:     203     185    5324 1015263  PCI-MSI-edge  eth0-rx-0
113:    4141 1600793     153     159  PCI-MSI-edge  eth0-rx-1
114:   16242 1210108     436    3124  PCI-MSI-edge  eth0-rx-2
115:     267    4173   19471 1321252  PCI-MSI-edge  eth0-rx-3
116:       0       1       0       0  PCI-MSI-edge  eth0

irqbalanced seems to have picked CPU1 and CPU3 for all the interrupts,
which to my mind should cause the same problem as before (where CPU1
and CPU3 were handling all packets). Yet the box clearly works much
better than before.

Anyway, this brings the saga to an end from my point of view. Thank you
very much for looking into this; you and Eric Dumazet have been
invaluable!

/Benny
* Re: Strange packet drops with heavy firewalling
From: Changli Gao @ 2010-04-12 23:18 UTC
To: Benny Amorsen; Cc: zhigang gong, netdev

On Tue, Apr 13, 2010 at 1:06 AM, Benny Amorsen <benny+usenet@amorsen.dk> wrote:
>
>  99:      24 1306226       3       2  PCI-MSI-edge  eth1-tx-0
> 100:   15735 1648774       3       7  PCI-MSI-edge  eth1-tx-1
> 101:       8      11       9 1083022  PCI-MSI-edge  eth1-tx-2
> 102:       0       0       0       0  PCI-MSI-edge  eth1-tx-3
> 103:      18      15    6131 1095383  PCI-MSI-edge  eth1-rx-0
> 104:     217      32   46544 1335325  PCI-MSI-edge  eth1-rx-1
> 105:     154 1305595     218      16  PCI-MSI-edge  eth1-rx-2
> 106:      17      16    8229 1467509  PCI-MSI-edge  eth1-rx-3
> 107:       0       0       1       0  PCI-MSI-edge  eth1
> 108:       2      14      15 1003053  PCI-MSI-edge  eth0-tx-0
> 109:    8226 1668924     478     487  PCI-MSI-edge  eth0-tx-1
> 110:       3 1188874      17      12  PCI-MSI-edge  eth0-tx-2
> 111:       0       0       0       0  PCI-MSI-edge  eth0-tx-3
> 112:     203     185    5324 1015263  PCI-MSI-edge  eth0-rx-0
> 113:    4141 1600793     153     159  PCI-MSI-edge  eth0-rx-1
> 114:   16242 1210108     436    3124  PCI-MSI-edge  eth0-rx-2
> 115:     267    4173   19471 1321252  PCI-MSI-edge  eth0-rx-3
> 116:       0       1       0       0  PCI-MSI-edge  eth0
>
> irqbalanced seems to have picked CPU1 and CPU3 for all the interrupts,
> which to my mind should cause the same problem as before (where CPU1
> and CPU3 were handling all packets). Yet the box clearly works much
> better than before.

irqbalanced? I don't think it can work properly. Try RPS in the netdev
and linux-next trees, and if the CPU load isn't even, try this patch:
http://patchwork.ozlabs.org/patch/49915/

--
Regards,
Changli Gao (xiaosuo@gmail.com)
* Re: Strange packet drops with heavy firewalling
From: Eric Dumazet @ 2010-04-13  5:56 UTC
To: Changli Gao; Cc: Benny Amorsen, zhigang gong, netdev

On Tuesday, 13 April 2010 at 07:18 +0800, Changli Gao wrote:
> On Tue, Apr 13, 2010 at 1:06 AM, Benny Amorsen <benny+usenet@amorsen.dk> wrote:
> >
> > [ /proc/interrupts table snipped; IRQs 99-116 are the per-queue
> > eth0/eth1 vectors shown above ]
> >
> > irqbalanced seems to have picked CPU1 and CPU3 for all the
> > interrupts, which to my mind should cause the same problem as before
> > (where CPU1 and CPU3 were handling all packets). Yet the box clearly
> > works much better than before.
>
> irqbalanced? I don't think it can work properly. Try RPS in the netdev
> and linux-next trees, and if the CPU load isn't even, try this patch:
> http://patchwork.ozlabs.org/patch/49915/

Don't try RPS on multiqueue devices! If the number of queues matches
the number of CPUs, it brings nothing but extra latency!

Benny, I am not sure your irqbalance is up to date with multiqueue
devices; you might need to disable it and manually set the affinity of
each interrupt:

echo 01 >/proc/irq/100/smp_affinity
echo 02 >/proc/irq/101/smp_affinity
echo 04 >/proc/irq/102/smp_affinity
echo 08 >/proc/irq/103/smp_affinity
echo 10 >/proc/irq/104/smp_affinity
echo 20 >/proc/irq/105/smp_affinity
echo 40 >/proc/irq/106/smp_affinity
echo 80 >/proc/irq/107/smp_affinity
echo 01 >/proc/irq/108/smp_affinity
echo 02 >/proc/irq/109/smp_affinity
echo 04 >/proc/irq/110/smp_affinity
echo 08 >/proc/irq/111/smp_affinity
echo 10 >/proc/irq/112/smp_affinity
echo 20 >/proc/irq/113/smp_affinity
echo 40 >/proc/irq/114/smp_affinity
echo 80 >/proc/irq/115/smp_affinity
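Note: since the IRQ numbers change across boots, the same pinning can
be scripted instead of hardcoded. A sketch that walks /proc/interrupts
and assigns each eth0/eth1 queue vector to CPUs 0-3 round-robin; the
masks and the CPU count are illustrative:

    #!/bin/sh
    # Pin each ethX rx/tx queue IRQ to one CPU, round-robin over CPUs 0-3.
    cpu=0
    grep -E 'eth[01]-(rx|tx)-' /proc/interrupts | while read line; do
        irq=${line%%:*}
        printf '%x' $((1 << cpu)) > /proc/irq/$irq/smp_affinity
        cpu=$(( (cpu + 1) % 4 ))
    done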
* Re: Strange packet drops with heavy firewalling
From: Benny Amorsen @ 2010-04-13  7:56 UTC
To: Eric Dumazet; Cc: Changli Gao, zhigang gong, netdev

Eric Dumazet <eric.dumazet@gmail.com> writes:

> Benny, I am not sure your irqbalance is up to date with multiqueue
> devices; you might need to disable it and manually set the affinity of
> each interrupt

True, that would probably help. Irqbalance might simply believe that
the load is so low that it isn't worth rebalancing; the CPUs are
spending more than 90% of their time idling.

I'll keep monitoring the server, and if it starts dropping packets
again or the load increases, I'll check whether irqbalanced does the
right thing; if not, I'll implement your suggestion.

Thank you very much!

/Benny
* Re: Strange packet drops with heavy firewalling
From: Benny Amorsen @ 2010-04-15 13:23 UTC
To: Eric Dumazet; Cc: Changli Gao, zhigang gong, netdev

Benny Amorsen <benny+usenet@amorsen.dk> writes:

> I'll keep monitoring the server, and if it starts dropping packets
> again or the load increases, I'll check whether irqbalanced does the
> right thing; if not, I'll implement your suggestion.

It did start dropping packets (although very few: a few packets dropped
at once perhaps every ten minutes). Irqbalanced didn't move the
interrupts.

Doing

echo 01 >/proc/irq/99/smp_affinity
echo 02 >/proc/irq/100/smp_affinity
echo 04 >/proc/irq/101/smp_affinity

and so on, as Eric Dumazet suggested, seems to have helped, but it has
not entirely solved the problem.

The problem now manifests itself this way in ethtool -S:

rx_no_buffer_count: 270
rx_queue_drop_packet_count: 270

I can't be sure that I'm not just getting hit by a 1 Gbit/s traffic
spike, of course, but it is a bit strange that a machine which can do
200 Mbit/s at 92% idle can't handle sub-second peaks close to
1 Gbit/s...

I wish ifstat could report errors so I could see what the traffic rate
was when the problem occurred...

/Benny
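Note: until ifstat grows error counters, the two can be correlated by
hand: timestamp the drop counters and the packet rate together. A
sketch; the counter names are as printed by this igb NIC:

    #!/bin/sh
    # Log rx packet rate and drop-counter deltas together, once a second.
    get() { ethtool -S eth0 | awk -v k="$1" '$1 == k":" { print $2 }'; }
    od=$(get rx_no_buffer_count); op=$(get rx_packets)
    while sleep 1; do
        d=$(get rx_no_buffer_count); p=$(get rx_packets)
        echo "$(date +%T) rx_pps=$((p - op)) new_drops=$((d - od))"
        od=$d; op=$p
    done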
* Re: Strange packet drops with heavy firewalling
From: Eric Dumazet @ 2010-04-15 13:42 UTC
To: Benny Amorsen; Cc: Changli Gao, zhigang gong, netdev

On Thursday, 15 April 2010 at 15:23 +0200, Benny Amorsen wrote:
> It did start dropping packets (although very few: a few packets
> dropped at once perhaps every ten minutes). Irqbalanced didn't move
> the interrupts.
>
> Doing
>
> echo 01 >/proc/irq/99/smp_affinity
> echo 02 >/proc/irq/100/smp_affinity
> echo 04 >/proc/irq/101/smp_affinity
>
> and so on, as Eric Dumazet suggested, seems to have helped, but it has
> not entirely solved the problem.
>
> The problem now manifests itself this way in ethtool -S:
>
> rx_no_buffer_count: 270
> rx_queue_drop_packet_count: 270
>
> I can't be sure that I'm not just getting hit by a 1 Gbit/s traffic
> spike, of course, but it is a bit strange that a machine which can do
> 200 Mbit/s at 92% idle can't handle sub-second peaks close to
> 1 Gbit/s...

Even with multiqueue, it's quite possible for one queue to get more
than one packet per microsecond, and the time to process a packet might
be greater than 1 us even on recent hardware. So a burst of 1000 small
packets with the same flow information hits one queue and one CPU, and
fills the RX ring. Losing these packets is OK; it's very likely an
attack :)

> I wish ifstat could report errors so I could see what the traffic rate
> was when the problem occurred...

Yes, that could be added, I guess.
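Note: to put numbers on the burst argument: at 1 Gbit/s, minimum-size
frames (64 bytes, plus 20 bytes of preamble and inter-frame gap, so 84
bytes on the wire) arrive every 672 ns, about 1.49 Mpps. A 511-entry RX
ring therefore absorbs only 511 * 672 ns, roughly 343 us, of such a
burst before dropping, so sub-millisecond IRQ or scheduling latency on
the servicing CPU is enough to explain occasional rx_no_buffer_count
increments at an otherwise modest average load.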
* Re: Strange packet drops with heavy firewalling
From: Paweł Staszewski @ 2010-04-13 12:33 UTC
To: Changli Gao; Cc: Benny Amorsen, zhigang gong, netdev

On 2010-04-13 01:18, Changli Gao wrote:
> On Tue, Apr 13, 2010 at 1:06 AM, Benny Amorsen <benny+usenet@amorsen.dk> wrote:
> >
> > [ /proc/interrupts table snipped ]
> >
> > irqbalanced seems to have picked CPU1 and CPU3 for all the
> > interrupts, which to my mind should cause the same problem as before
> > (where CPU1 and CPU3 were handling all packets). Yet the box clearly
> > works much better than before.
>
> irqbalanced? I don't think it can work properly. Try RPS in the netdev
> and linux-next trees, and if the CPU load isn't even, try this patch:
> http://patchwork.ozlabs.org/patch/49915/

Yes: without irqbalance, and with IRQ affinity set by hand, the router
works much better.

But I don't think RPS will help him. I made some tests with RPS and
affinity; the results are in the attached file.
The test router does traffic management (HFSC) for almost 9k users.

[-- Attachment #2: RPS_AFFINITY_TEST.txt --]
[-- Type: text/plain, Size: 5028 bytes --]

##############################################################################
eth0 -> CPU0
eth1 -> CPU5

RPS:
echo 00e0 > /sys/class/net/eth1/queues/rx-0/rps_cpus
echo 000e > /sys/class/net/eth0/queues/rx-0/rps_cpus

------------------------------------------------------------------------------
   PerfTop:  85205 irqs/sec  kernel:97.1% [100000 cycles],  (all, 8 CPUs)
------------------------------------------------------------------------------

    samples    pcnt   kernel function
    _______   _____   _______________

  214930.00 - 24.5% : _raw_spin_lock
   63844.00 -  7.3% : u32_classify
   48381.00 -  5.5% : e1000_clean
   47754.00 -  5.5% : rb_next
   37222.00 -  4.2% : e1000_intr_msi
   26295.00 -  3.0% : hfsc_enqueue
   17371.00 -  2.0% : rb_erase
   15290.00 -  1.7% : _raw_spin_lock_irqsave
   14958.00 -  1.7% : rb_insert_color
   14439.00 -  1.6% : update_vf
   14384.00 -  1.6% : e1000_xmit_frame
   14356.00 -  1.6% : hfsc_dequeue
   13804.00 -  1.6% : e1000_clean_tx_irq
   13413.00 -  1.5% : ipt_do_table
    9654.00 -  1.1% : ip_route_input

##############################################################################
eth0 -> CPU0
eth1 -> CPU5

NO RPS

------------------------------------------------------------------------------
   PerfTop:  33800 irqs/sec  kernel:96.9% [100000 cycles],  (all, 8 CPUs)
------------------------------------------------------------------------------

    samples    pcnt   kernel function
    _______   _____   _______________

   19361.00 - 11.2% : e1000_clean
   16424.00 -  9.5% : rb_next
   13060.00 -  7.5% : e1000_intr_msi
    7293.00 -  4.2% : u32_classify
    6875.00 -  4.0% : ipt_do_table
    5811.00 -  3.4% : _raw_spin_lock
    5754.00 -  3.3% : e1000_xmit_frame
    5671.00 -  3.3% : hfsc_dequeue
    4503.00 -  2.6% : __alloc_skb
    4156.00 -  2.4% : hfsc_enqueue
    4090.00 -  2.4% : e1000_clean_tx_irq
    3809.00 -  2.2% : e1000_clean_rx_irq
    3424.00 -  2.0% : update_vf
    3028.00 -  1.7% : rb_erase
    2714.00 -  1.6% : ip_route_input

##############################################################################
eth0 -> CPU0,CPU1,CPU2,CPU3 -> affinity: echo 0f > /proc/irq/30/smp_affinity
eth1 -> CPU4,CPU5,CPU6,CPU7 -> affinity: echo f0 > /proc/irq/31/smp_affinity

NO RPS

------------------------------------------------------------------------------
   PerfTop:  42362 irqs/sec  kernel:96.0% [100000 cycles],  (all, 8 CPUs)
------------------------------------------------------------------------------

    samples    pcnt   kernel function
    _______   _____   _______________

   33815.00 - 10.6% : rb_next
   21357.00 -  6.7% : u32_classify
   14525.00 -  4.6% : _raw_spin_lock
   14346.00 -  4.5% : e1000_clean
   12798.00 -  4.0% : hfsc_enqueue
   10526.00 -  3.3% : ipt_do_table
    9999.00 -  3.1% : hfsc_dequeue
    9976.00 -  3.1% : e1000_intr_msi
    9787.00 -  3.1% : rb_erase
    8259.00 -  2.6% : e1000_xmit_frame
    8015.00 -  2.5% : rb_insert_color
    7948.00 -  2.5% : update_vf
    6868.00 -  2.2% : e1000_clean_tx_irq
    6822.00 -  2.1% : e1000_clean_rx_irq
    6368.00 -  2.0% : __alloc_skb

##############################################################################
eth0 -> CPU0,CPU1,CPU2,CPU3 -> affinity: echo 0f > /proc/irq/30/smp_affinity
eth1 -> CPU4,CPU5,CPU6,CPU7 -> affinity: echo f0 > /proc/irq/31/smp_affinity

RPS:
echo 0f > /sys/class/net/eth0/queues/rx-0/rps_cpus
echo f0 > /sys/class/net/eth1/queues/rx-0/rps_cpus

------------------------------------------------------------------------------
   PerfTop:  81051 irqs/sec  kernel:96.9% [100000 cycles],  (all, 8 CPUs)
------------------------------------------------------------------------------

    samples    pcnt   kernel function
    _______   _____   _______________

  167110.00 - 22.3% : _raw_spin_lock
   58221.00 -  7.8% : u32_classify
   46379.00 -  6.2% : rb_next
   35189.00 -  4.7% : e1000_clean
   25614.00 -  3.4% : e1000_intr_msi
   24094.00 -  3.2% : hfsc_enqueue
   16231.00 -  2.2% : rb_erase
   14298.00 -  1.9% : rb_insert_color
   13751.00 -  1.8% : update_vf
   13712.00 -  1.8% : ipt_do_table
   13588.00 -  1.8% : hfsc_dequeue
   13335.00 -  1.8% : e1000_xmit_frame
   12449.00 -  1.7% : e1000_clean_tx_irq
   11510.00 -  1.5% : net_tx_action
   11428.00 -  1.5% : _raw_spin_lock_irqsave
* Re: Strange packet drops with heavy firewalling
From: Eric Dumazet @ 2010-04-13 12:53 UTC
To: Paweł Staszewski; Cc: Changli Gao, Benny Amorsen, zhigang gong, netdev

On Tuesday, 13 April 2010 at 14:33 +0200, Paweł Staszewski wrote:
> On 2010-04-13 01:18, Changli Gao wrote:
> > irqbalanced? I don't think it can work properly. Try RPS in the
> > netdev and linux-next trees, and if the CPU load isn't even, try
> > this patch: http://patchwork.ozlabs.org/patch/49915/
>
> Yes: without irqbalance, and with IRQ affinity set by hand, the router
> works much better.
>
> But I don't think RPS will help him. I made some tests with RPS and
> affinity; the results are in the attached file. The test router does
> traffic management (HFSC) for almost 9k users.

Thanks for sharing, Paweł.

But you are mixing apples and oranges here. Are you aware that HFSC and
other traffic shapers serialize access to their data structures? If
many CPUs try to access these structures in parallel, you get a lot of
cache line misses. HFSC is a real memory hog :(

Benny does have firewalling (which is highly parallelized these days;
iptables was well improved in this area), but no traffic control.

Anyway, Benny now has multiqueue devices, and therefore RPS will not
help him. I suggested RPS before his move to multiqueue, and multiqueue
is the most sensible way to improve things when no central lock is
used: every CPU can really work in parallel.
* Re: Strange packet drops with heavy firewalling
From: Paweł Staszewski @ 2010-04-13 13:39 UTC
To: Eric Dumazet; Cc: Changli Gao, Benny Amorsen, zhigang gong, netdev

On 2010-04-13 14:53, Eric Dumazet wrote:
> But you are mixing apples and oranges here. Are you aware that HFSC
> and other traffic shapers serialize access to their data structures?
> If many CPUs try to access these structures in parallel, you get a lot
> of cache line misses. HFSC is a real memory hog :(

Thanks, Eric, for explaining why RPS is useless for traffic-management
routers.

> Benny does have firewalling (which is highly parallelized these days;
> iptables was well improved in this area), but no traffic control.

Hmm, so maybe a better choice for traffic management is to use iptables
for filter classification instead of u32 filters, something like the
iptables CLASSIFY target?

> Anyway, Benny now has multiqueue devices, and therefore RPS will not
> help him. I suggested RPS before his move to multiqueue, and
> multiqueue is the most sensible way to improve things when no central
> lock is used: every CPU can really work in parallel.
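Note: for reference, the CLASSIFY target Paweł mentions sets
skb->priority directly from a netfilter rule, and HFSC (like most
classful qdiscs) honors skb->priority as a class ID, so flow matching
can move from a long u32 filter chain into iptables. A minimal sketch;
the address and class ID are illustrative:

    # Tag forwarded traffic for 10.0.0.5 into HFSC class 1:10 as it
    # leaves eth1 (CLASSIFY is used in the mangle table, typically in
    # POSTROUTING).
    iptables -t mangle -A POSTROUTING -o eth1 -d 10.0.0.5 \
             -j CLASSIFY --set-class 1:10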