* Strange packet drops with heavy firewalling
@ 2010-04-09  9:56 Benny Amorsen
  2010-04-09 11:47 ` Eric Dumazet
       [not found] ` <q2v40c9f5b21004120116p766df82dj88c6af4e4cad55f@mail.gmail.com>
  0 siblings, 2 replies; 15+ messages in thread
From: Benny Amorsen @ 2010-04-09  9:56 UTC (permalink / raw)
  To: netdev


I have a netfilter box which is dropping packets. ethtool -S counts
10-20 rx_discards per second on the interface.

The switch does not have flow control enabled; with flow control enabled,
the rx_discards turn into tx_xon_sent pause frames, which ultimately
causes the same problem (the load is pretty constant, so the switch has
to drop the packets instead).

perf top shows something like:
             5201.00 -  6.7% : _spin_unlock_irqrestore
             4232.00 -  5.5% : finish_task_switch
             3597.00 -  4.6% : tg3_poll	[tg3]
             3257.00 -  4.2% : handle_IRQ_event
             2515.00 -  3.2% : tick_nohz_restart_sched_tick
             1947.00 -  2.5% : nf_ct_tuple_equal
             1927.00 -  2.5% : tg3_start_xmit	[tg3]
             1879.00 -  2.4% : kmem_cache_alloc_node
             1625.00 -  2.1% : tick_nohz_stop_sched_tick
             1619.00 -  2.1% : ipt_do_table
             1595.00 -  2.1% : ip_route_input
             1547.00 -  2.0% : kmem_cache_free
             1474.00 -  1.9% : __alloc_skb
             1424.00 -  1.8% : fget_light
             1391.00 -  1.8% : nf_iterate

The rule set is quite large (more than 4000 rules), but organized so
that each packet only has to traverse a few rules before getting
accepted or rejected.
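The kind of organization described can be sketched like this (a hypothetical illustration; chain names and subnets are invented here, not taken from the actual ruleset): the FORWARD chain dispatches on source subnet into per-group user chains, so a packet traverses a handful of jump rules plus its own short chain instead of all 4000 rules.

```shell
# Hypothetical sketch: dispatch early into per-subnet chains so each
# packet only traverses a few rules (names and addresses invented).
iptables -N NET-10-0-0
iptables -N NET-10-0-1
iptables -A FORWARD -s 10.0.0.0/24 -j NET-10-0-0
iptables -A FORWARD -s 10.0.1.0/24 -j NET-10-0-1
# Rules inside one per-subnet chain; the packet sees only these.
iptables -A NET-10-0-0 -p tcp --dport 25 -j REJECT
iptables -A NET-10-0-0 -j ACCEPT
```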

When the problem started we were using a different server, an old
two-socket 32-bit Xeon with hyperthreading. CPU usage often hit 100% on
one CPU with that server. After replacing the server with a ProLiant
DL160 G5 with a quad-core Xeon (without hyperthreading) the CPU usage
rarely exceeds 10% on any CPU, but the packet loss persists.

We're using the built-in dual Broadcom NetXtreme BCM5722 Gigabit
Ethernet PCI Express NICs, and the kernel is
kernel-2.6.32.9-70.fc12.x86_64 from Fedora. The next step is probably
installing a better Ethernet card, perhaps an Intel 82576-based one, so
that we can get multiqueue support.

The traffic is about 300Mbps (twice that if you count both in and out,
like Cisco).


/Benny


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Strange packet drops with heavy firewalling
  2010-04-09  9:56 Strange packet drops with heavy firewalling Benny Amorsen
@ 2010-04-09 11:47 ` Eric Dumazet
  2010-04-09 12:33   ` Benny Amorsen
       [not found] ` <q2v40c9f5b21004120116p766df82dj88c6af4e4cad55f@mail.gmail.com>
  1 sibling, 1 reply; 15+ messages in thread
From: Eric Dumazet @ 2010-04-09 11:47 UTC (permalink / raw)
  To: Benny Amorsen; +Cc: netdev

Le vendredi 09 avril 2010 à 11:56 +0200, Benny Amorsen a écrit :
> I have a netfilter-box which is dropping packets. ethtool -S counts
> 10-20 rx_discards per second on the interface.
> 
> The switch does not have flow control enabled; with flow control enabled
> the rx_discards turn into tx_on_sent which ultimately cause the same
> problem (the load is pretty constant so the switch has to drop the
> packets instead).
> 
> perf top shows something like:
>              5201.00 -  6.7% : _spin_unlock_irqrestore
>              4232.00 -  5.5% : finish_task_switch
>              3597.00 -  4.6% : tg3_poll	[tg3]
>              3257.00 -  4.2% : handle_IRQ_event
>              2515.00 -  3.2% : tick_nohz_restart_sched_tick
>              1947.00 -  2.5% : nf_ct_tuple_equal
>              1927.00 -  2.5% : tg3_start_xmit	[tg3]
>              1879.00 -  2.4% : kmem_cache_alloc_node
>              1625.00 -  2.1% : tick_nohz_stop_sched_tick
>              1619.00 -  2.1% : ipt_do_table
>              1595.00 -  2.1% : ip_route_input
>              1547.00 -  2.0% : kmem_cache_free
>              1474.00 -  1.9% : __alloc_skb
>              1424.00 -  1.8% : fget_light
>              1391.00 -  1.8% : nf_iterate
> 
> The rule set is quite large (more than 4000 rules), but organized so
> that each packet only has to traverse a few rules before getting
> accepted or rejected.
> 
> When the problem started we were using a different server, an old
> two-socket 32-bit Xeon with hyperthreading. CPU usage often hit 100% on
> one CPU with that server. After replacing the server with a ProLiant
> DL160 G5 with a quad-core Xeon (without hyperthreading) the CPU usage
> rarely exceeds 10% on any CPU, but the packet loss persists.
> 

Might be micro-bursts; check 'ethtool -g eth0' RX parameters (increase
the RX ring from 200 to 511 if you want more buffers?)

> We're using the built-in dual Broadcom Corporation NetXtreme BCM5722 Gigabit
> Ethernet PCI Express nics, and the kernel is
> kernel-2.6.32.9-70.fc12.x86_64 from Fedora. Next step is probably
> installing a better ethernet card, perhaps an Intel 82576-based one, so
> that we can get multiqueue support.
> 

Sure, but before that, could you check:

cat /proc/net/softnet_stat
cat /proc/interrupts
(check that eth0 IRQs are delivered to one CPU)

grep . /proc/sys/net/ipv4/netfilter/ip_conntrack_*
(you might need to increase ip_conntrack_buckets)

ethtool -c eth0
(you might change the coalesce params to reduce the number of IRQs)

ethtool -g eth0




* Re: Strange packet drops with heavy firewalling
  2010-04-09 11:47 ` Eric Dumazet
@ 2010-04-09 12:33   ` Benny Amorsen
  2010-04-09 13:29     ` Eric Dumazet
  0 siblings, 1 reply; 15+ messages in thread
From: Benny Amorsen @ 2010-04-09 12:33 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: netdev

Eric Dumazet <eric.dumazet@gmail.com> writes:

> might be micro bursts, check 'ethtool -g eth0' RX parameters (increase
> RX ring from 200 to 511 if you want more buffers ?)

I tried that already actually. (I didn't expect it to cause traffic
interruption, but it did. Oh well.)

It didn't make a difference, at least not one I could detect from the
number of packet drops and the CPU utilization.

> cat /proc/net/softnet_stat

000002d9 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
42bc8143 00000000 0000024c 00000000 00000000 00000000 00000000 00000000 00000000
0000031b 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
1c5a35e9 00000000 000005f7 00000000 00000000 00000000 00000000 00000000 00000000

I am not quite sure how to interpret that...

> cat /proc/interrupts

  79:       1240 4050590849       1253       1263   PCI-MSI-edge      eth0
  80:         12          9         14 3613521843   PCI-MSI-edge      eth1

> (check eth0 IRQS are delivered to one cpu)

Yes CPU1 handles eth0 and CPU3 handles eth1.

> grep . /proc/sys/net/ipv4/netfilter/ip_conntrack_*

nf_conntrack_acct:1
nf_conntrack_buckets:8192
nf_conntrack_checksum:1
nf_conntrack_count:49311
nf_conntrack_events:1
nf_conntrack_events_retry_timeout:15
nf_conntrack_expect_max:2048
nf_conntrack_generic_timeout:600
nf_conntrack_icmp_timeout:30
nf_conntrack_log_invalid:1
nf_conntrack_max:1048576
nf_conntrack_tcp_be_liberal:0
nf_conntrack_tcp_loose:1
nf_conntrack_tcp_max_retrans:3
nf_conntrack_tcp_timeout_close:10
nf_conntrack_tcp_timeout_close_wait:60
nf_conntrack_tcp_timeout_established:432000
nf_conntrack_tcp_timeout_fin_wait:120
nf_conntrack_tcp_timeout_last_ack:30
nf_conntrack_tcp_timeout_max_retrans:300
nf_conntrack_tcp_timeout_syn_recv:60
nf_conntrack_tcp_timeout_syn_sent:120
nf_conntrack_tcp_timeout_time_wait:120
nf_conntrack_tcp_timeout_unacknowledged:300
nf_conntrack_udp_timeout:30
nf_conntrack_udp_timeout_stream:180

> (might need to increase ip_conntrack_buckets)

You got me there. I had forgotten nf_conntrack.hashsize=1048576
and nf_conntrack.expect_hashsize=32768 on the kernel command line. It
was on the hot standby firewall, but not on the primary one. I will do a
failover to the hot standby sometime during the weekend.

It still isn't possible to change without a reboot, is it?

> ethtool -c eth0
> (might change coalesce params to reduce number of irqs)

Coalesce parameters for eth0:
Adaptive RX: off  TX: off
stats-block-usecs: 0
sample-interval: 0
pkt-rate-low: 0
pkt-rate-high: 0

rx-usecs: 20
rx-frames: 5
rx-usecs-irq: 0
rx-frames-irq: 5

tx-usecs: 72
tx-frames: 53
tx-usecs-irq: 0
tx-frames-irq: 5

rx-usecs-low: 0
rx-frame-low: 0
tx-usecs-low: 0
tx-frame-low: 0

rx-usecs-high: 0
rx-frame-high: 0
tx-usecs-high: 0
tx-frame-high: 0

I played quite a lot with the parameters but it did not seem to make any
difference. I didn't try adaptive though, but the load is fairly static
so it didn't seem appropriate.

> ethtool -g eth0

Ring parameters for eth0:
Pre-set maximums:
RX:		511
RX Mini:	0
RX Jumbo:	0
TX:		511
Current hardware settings:
RX:		200
RX Mini:	0
RX Jumbo:	0
TX:		511

Right now RX is 200, but when it was 511 it didn't seem to make a
difference.

Thank you very much for the help! I will report back whether it was the
hash buckets.


/Benny


* Re: Strange packet drops with heavy firewalling
  2010-04-09 12:33   ` Benny Amorsen
@ 2010-04-09 13:29     ` Eric Dumazet
  2010-04-12  6:20       ` Benny Amorsen
  0 siblings, 1 reply; 15+ messages in thread
From: Eric Dumazet @ 2010-04-09 13:29 UTC (permalink / raw)
  To: Benny Amorsen; +Cc: netdev

Le vendredi 09 avril 2010 à 14:33 +0200, Benny Amorsen a écrit :

> Thank you very much for the help! I will report back whether it was the
> hash buckets.

OK

You could try:

ethtool -C eth0 tx-usecs 200 tx-frames 100 tx-frames-irq 100
ethtool -C eth1 tx-usecs 200 tx-frames 100 tx-frames-irq 100

(to reduce tx completion irqs)


Before buying multiqueue devices, you could also try the net-next-2.6
kernel, because RPS (Receive Packet Steering) is in.

In your setup, this might help a bit by distributing the packets to all
CPUs, with appropriate cache handling.





* Re: Strange packet drops with heavy firewalling
  2010-04-09 13:29     ` Eric Dumazet
@ 2010-04-12  6:20       ` Benny Amorsen
  0 siblings, 0 replies; 15+ messages in thread
From: Benny Amorsen @ 2010-04-12  6:20 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: netdev

Eric Dumazet <eric.dumazet@gmail.com> writes:

> Le vendredi 09 avril 2010 à 14:33 +0200, Benny Amorsen a écrit :
>
>> Thank you very much for the help! I will report back whether it was the
>> hash buckets.
>
> OK
>
> You could try :
>
> ethtool -C eth0 tx-usecs 200 tx-frames 100 tx-frames-irq 100
> ethtool -C eth1 tx-usecs 200 tx-frames 100 tx-frames-irq 100
>
> (to reduce tx completion irqs)

Alas, even with the hash buckets I still have the same problem. Perhaps
slightly less severe, but it's still there.

I implemented the other changes you suggested as well except for the
ethtool -G. I may try to switch to net-next if I can find an easy way to
make an RPM out of it.

Thank you for the help!

/proc/sys/net/netfilter/nf_conntrack_acct:1
/proc/sys/net/netfilter/nf_conntrack_buckets:1048576
/proc/sys/net/netfilter/nf_conntrack_checksum:1
/proc/sys/net/netfilter/nf_conntrack_count:43430
/proc/sys/net/netfilter/nf_conntrack_events:1
/proc/sys/net/netfilter/nf_conntrack_events_retry_timeout:15
/proc/sys/net/netfilter/nf_conntrack_expect_max:2048
/proc/sys/net/netfilter/nf_conntrack_generic_timeout:600
/proc/sys/net/netfilter/nf_conntrack_icmp_timeout:30
/proc/sys/net/netfilter/nf_conntrack_log_invalid:1
/proc/sys/net/netfilter/nf_conntrack_max:1048576
/proc/sys/net/netfilter/nf_conntrack_tcp_be_liberal:0
/proc/sys/net/netfilter/nf_conntrack_tcp_loose:1
/proc/sys/net/netfilter/nf_conntrack_tcp_max_retrans:3
/proc/sys/net/netfilter/nf_conntrack_tcp_timeout_close:10
/proc/sys/net/netfilter/nf_conntrack_tcp_timeout_close_wait:60
/proc/sys/net/netfilter/nf_conntrack_tcp_timeout_established:432000
/proc/sys/net/netfilter/nf_conntrack_tcp_timeout_fin_wait:120
/proc/sys/net/netfilter/nf_conntrack_tcp_timeout_last_ack:30
/proc/sys/net/netfilter/nf_conntrack_tcp_timeout_max_retrans:300
/proc/sys/net/netfilter/nf_conntrack_tcp_timeout_syn_recv:60
/proc/sys/net/netfilter/nf_conntrack_tcp_timeout_syn_sent:120
/proc/sys/net/netfilter/nf_conntrack_tcp_timeout_time_wait:120
/proc/sys/net/netfilter/nf_conntrack_tcp_timeout_unacknowledged:300
/proc/sys/net/netfilter/nf_conntrack_udp_timeout:30
/proc/sys/net/netfilter/nf_conntrack_udp_timeout_stream:180


/Benny


* Re: Strange packet drops with heavy firewalling
       [not found] ` <q2v40c9f5b21004120116p766df82dj88c6af4e4cad55f@mail.gmail.com>
@ 2010-04-12 14:44   ` Benny Lyne Amorsen
       [not found]     ` <p2x40c9f5b21004120833jd7a749cak6ea69cebd28f8352@mail.gmail.com>
  0 siblings, 1 reply; 15+ messages in thread
From: Benny Lyne Amorsen @ 2010-04-12 14:44 UTC (permalink / raw)
  To: zhigang gong; +Cc: netdev

man, 12 04 2010 kl. 16:16 +0800, skrev zhigang gong:

> How do you know the per CPU usage data, by oprofile? I'm just a little
> surprised with the result, as it shows your new core is running 10x
> faster than your old core :). 

Well, the old server had only two CPUs plus hyperthreading, and the
CPUs were Pentium 4-based. Add a slow memory bus to that and you have a
fairly slow system. It's almost 5 years old, so Moore's law says a 2**3
increase in the number of transistors...

In about the same time frame, Linux has gone from being able to fill
1Gbps Ethernet to being able to fill 10Gbps Ethernet.

> What's the average packet size?

I asked the switch (I can't find a handy equivalent to ifstat that
counts packets instead of bytes). The 5-minute average packet size seems
to vary in the range of 450 to 550 bytes.

> If your packet size is 64 bytes, then the pps(packet per second) rate
> should be about 585Kpps. As I know, this value is almost the best
> result when the standard linux kernel is processing the networking
> traffic with a normal 1Gb ethernet card (without multi-queue support)
> on a intel box. If it is the case, to buy a better ethernet card with
> multi-queue support should be a good choice. Otherwise, it may not
> help. 

I am far from that, perhaps 1/10th of that. I do a lot more processing
on at least some of the packets though (the ones starting new flows).
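The arithmetic checks out against the figures in this thread (300 Mbit/s of traffic, ~500-byte average packets, and the ~585 kpps single-queue ceiling quoted above):

```shell
# Back-of-envelope packet rate from the numbers discussed above.
rate_bps=$((300 * 1000 * 1000))   # ~300 Mbit/s of traffic
pkt_bits=$((500 * 8))             # ~500-byte average packet
pps=$((rate_bps / pkt_bits))
echo "approx ${pps} pps"
```

That lands around 75 kpps, on the order of one eighth of the quoted ~585 kpps ceiling, consistent with "perhaps 1/10th".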


/Benny




* Re: Strange packet drops with heavy firewalling
       [not found]     ` <p2x40c9f5b21004120833jd7a749cak6ea69cebd28f8352@mail.gmail.com>
@ 2010-04-12 17:06       ` Benny Amorsen
  2010-04-12 23:18         ` Changli Gao
  0 siblings, 1 reply; 15+ messages in thread
From: Benny Amorsen @ 2010-04-12 17:06 UTC (permalink / raw)
  To: zhigang gong; +Cc: netdev

man, 12 04 2010 kl. 23:33 +0800, skrev zhigang gong:

>  
> Now, I agree with Eric's analysis, there may be some bursts, for
> example a burst of a bunch of first packets for different new flows.
> What mode are you using the ethernet driver in? I guess it's NAPI,
> right?

I presume so.

>  And whether your time consumption workload is handled in soft-irq
> context or in a user space process? 

Soft-irq; the box is doing pure iptables. The only time it does a little
bit of user-space work is when I use conntrackd, but killing conntrackd
does not affect the packet loss measurably.

I switched to an 82576-based card, and now I get:

             3341.00 -  4.9% : _spin_lock
             2506.00 -  3.7% : irq_entries_start
             2163.00 -  3.2% : _spin_lock_irqsave
             1616.00 -  2.4% : native_read_tsc
             1572.00 -  2.3% : igb_poll	[igb]
             1386.00 -  2.0% : get_partial_node
             1236.00 -  1.8% : igb_clean_tx_irq	[igb]
             1205.00 -  1.8% : igb_xmit_frame_adv	[igb]
             1170.00 -  1.7% : ipt_do_table
             1049.00 -  1.6% : fget_light
             1015.00 -  1.5% : tick_nohz_stop_sched_tick
              967.00 -  1.4% : fput
              945.00 -  1.4% : __slab_free
              919.00 -  1.4% : datagram_poll
              874.00 -  1.3% : dev_queue_xmit

And it seems the packet loss is gone!

# ethtool -S eth0|fgrep drop
     tx_dropped: 0
     rx_queue_drop_packet_count: 0
     dropped_smbus: 0
     rx_queue_0_drops: 0
     rx_queue_1_drops: 0
     rx_queue_2_drops: 0
     rx_queue_3_drops: 0

I'm a bit surprised by this though:
 
  99:         24    1306226          3          2   PCI-MSI-edge      eth1-tx-0
 100:      15735    1648774          3          7   PCI-MSI-edge      eth1-tx-1
 101:          8         11          9    1083022   PCI-MSI-edge      eth1-tx-2
 102:          0          0          0          0   PCI-MSI-edge      eth1-tx-3
 103:         18         15       6131    1095383   PCI-MSI-edge      eth1-rx-0
 104:        217         32      46544    1335325   PCI-MSI-edge      eth1-rx-1
 105:        154    1305595        218         16   PCI-MSI-edge      eth1-rx-2
 106:         17         16       8229    1467509   PCI-MSI-edge      eth1-rx-3
 107:          0          0          1          0   PCI-MSI-edge      eth1
 108:          2         14         15    1003053   PCI-MSI-edge      eth0-tx-0
 109:       8226    1668924        478        487   PCI-MSI-edge      eth0-tx-1
 110:          3    1188874         17         12   PCI-MSI-edge      eth0-tx-2
 111:          0          0          0          0   PCI-MSI-edge      eth0-tx-3
 112:        203        185       5324    1015263   PCI-MSI-edge      eth0-rx-0
 113:       4141    1600793        153        159   PCI-MSI-edge      eth0-rx-1
 114:      16242    1210108        436       3124   PCI-MSI-edge      eth0-rx-2
 115:        267       4173      19471    1321252   PCI-MSI-edge      eth0-rx-3
 116:          0          1          0          0   PCI-MSI-edge      eth0


irqbalanced seems to have picked CPU1 and CPU3 for all the interrupts,
which to my mind should cause the same problem as before (where CPU1 and
CPU3 were handling all packets). Yet the box clearly works much better
than before.

Anyway, this brings the saga to an end from my point of view. Thank you
very much for looking into this, you and Eric Dumazet have been
invaluable!


/Benny





* Re: Strange packet drops with heavy firewalling
  2010-04-12 17:06       ` Benny Amorsen
@ 2010-04-12 23:18         ` Changli Gao
  2010-04-13  5:56           ` Eric Dumazet
  2010-04-13 12:33           ` Paweł Staszewski
  0 siblings, 2 replies; 15+ messages in thread
From: Changli Gao @ 2010-04-12 23:18 UTC (permalink / raw)
  To: Benny Amorsen; +Cc: zhigang gong, netdev

On Tue, Apr 13, 2010 at 1:06 AM, Benny Amorsen <benny+usenet@amorsen.dk> wrote:
>
>  99:         24    1306226          3          2   PCI-MSI-edge      eth1-tx-0
>  100:      15735    1648774          3          7   PCI-MSI-edge      eth1-tx-1
>  101:          8         11          9    1083022   PCI-MSI-edge      eth1-tx-2
>  102:          0          0          0          0   PCI-MSI-edge      eth1-tx-3
>  103:         18         15       6131    1095383   PCI-MSI-edge      eth1-rx-0
>  104:        217         32      46544    1335325   PCI-MSI-edge      eth1-rx-1
>  105:        154    1305595        218         16   PCI-MSI-edge      eth1-rx-2
>  106:         17         16       8229    1467509   PCI-MSI-edge      eth1-rx-3
>  107:          0          0          1          0   PCI-MSI-edge      eth1
>  108:          2         14         15    1003053   PCI-MSI-edge      eth0-tx-0
>  109:       8226    1668924        478        487   PCI-MSI-edge      eth0-tx-1
>  110:          3    1188874         17         12   PCI-MSI-edge      eth0-tx-2
>  111:          0          0          0          0   PCI-MSI-edge      eth0-tx-3
>  112:        203        185       5324    1015263   PCI-MSI-edge      eth0-rx-0
>  113:       4141    1600793        153        159   PCI-MSI-edge      eth0-rx-1
>  114:      16242    1210108        436       3124   PCI-MSI-edge      eth0-rx-2
>  115:        267       4173      19471    1321252   PCI-MSI-edge      eth0-rx-3
>  116:          0          1          0          0   PCI-MSI-edge      eth0
>
>
> irqbalanced seems to have picked CPU1 and CPU3 for all the interrupts,
> which to my mind should cause the same problem as before (where CPU1 and
> CPU3 was handling all packets). Yet the box clearly works much better
> than before.

irqbalanced? I don't think it can work properly. Try RPS from the netdev
and linux-next trees, and if the CPU load isn't even, try this patch:
http://patchwork.ozlabs.org/patch/49915/ .


-- 
Regards,
Changli Gao(xiaosuo@gmail.com)


* Re: Strange packet drops with heavy firewalling
  2010-04-12 23:18         ` Changli Gao
@ 2010-04-13  5:56           ` Eric Dumazet
  2010-04-13  7:56             ` Benny Amorsen
  2010-04-13 12:33           ` Paweł Staszewski
  1 sibling, 1 reply; 15+ messages in thread
From: Eric Dumazet @ 2010-04-13  5:56 UTC (permalink / raw)
  To: Changli Gao; +Cc: Benny Amorsen, zhigang gong, netdev

Le mardi 13 avril 2010 à 07:18 +0800, Changli Gao a écrit :
> On Tue, Apr 13, 2010 at 1:06 AM, Benny Amorsen <benny+usenet@amorsen.dk> wrote:
> >
> >  99:         24    1306226          3          2   PCI-MSI-edge      eth1-tx-0
> >  100:      15735    1648774          3          7   PCI-MSI-edge      eth1-tx-1
> >  101:          8         11          9    1083022   PCI-MSI-edge      eth1-tx-2
> >  102:          0          0          0          0   PCI-MSI-edge      eth1-tx-3
> >  103:         18         15       6131    1095383   PCI-MSI-edge      eth1-rx-0
> >  104:        217         32      46544    1335325   PCI-MSI-edge      eth1-rx-1
> >  105:        154    1305595        218         16   PCI-MSI-edge      eth1-rx-2
> >  106:         17         16       8229    1467509   PCI-MSI-edge      eth1-rx-3
> >  107:          0          0          1          0   PCI-MSI-edge      eth1
> >  108:          2         14         15    1003053   PCI-MSI-edge      eth0-tx-0
> >  109:       8226    1668924        478        487   PCI-MSI-edge      eth0-tx-1
> >  110:          3    1188874         17         12   PCI-MSI-edge      eth0-tx-2
> >  111:          0          0          0          0   PCI-MSI-edge      eth0-tx-3
> >  112:        203        185       5324    1015263   PCI-MSI-edge      eth0-rx-0
> >  113:       4141    1600793        153        159   PCI-MSI-edge      eth0-rx-1
> >  114:      16242    1210108        436       3124   PCI-MSI-edge      eth0-rx-2
> >  115:        267       4173      19471    1321252   PCI-MSI-edge      eth0-rx-3
> >  116:          0          1          0          0   PCI-MSI-edge      eth0
> >
> >
> > irqbalanced seems to have picked CPU1 and CPU3 for all the interrupts,
> > which to my mind should cause the same problem as before (where CPU1 and
> > CPU3 was handling all packets). Yet the box clearly works much better
> > than before.
> 
> irqbalanced? I don't think it can work properly. Try RPS in netdev and
> linux-next tree, and if cpu load isn't even, try this patch:
> http://patchwork.ozlabs.org/patch/49915/ .
> 
> 

Don't try RPS on multiqueue devices!

If the number of queues matches the number of CPUs, it brings nothing
but extra latency!

Benny, I am not sure your irqbalance is up to date with multiqueue
devices; you might need to disable it and manually set the affinity of
each interrupt:

echo 01 >/proc/irq/100/smp_affinity
echo 02 >/proc/irq/101/smp_affinity
echo 04 >/proc/irq/102/smp_affinity
echo 08 >/proc/irq/103/smp_affinity
echo 10 >/proc/irq/104/smp_affinity
echo 20 >/proc/irq/105/smp_affinity
echo 40 >/proc/irq/106/smp_affinity
echo 80 >/proc/irq/107/smp_affinity

echo 01 >/proc/irq/108/smp_affinity
echo 02 >/proc/irq/109/smp_affinity
echo 04 >/proc/irq/110/smp_affinity
echo 08 >/proc/irq/111/smp_affinity
echo 10 >/proc/irq/112/smp_affinity
echo 20 >/proc/irq/113/smp_affinity
echo 40 >/proc/irq/114/smp_affinity
echo 80 >/proc/irq/115/smp_affinity
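The per-IRQ echoes above can also be written as a loop; here as a dry run that only prints what it would write (the IRQ range and CPU count are assumptions, and on the quad-core box in this thread only masks 01-08 apply):

```shell
# Dry run of round-robin IRQ affinity: print the hex CPU bitmask each
# IRQ would get. To apply for real, write $mask to
# /proc/irq/$irq/smp_affinity instead of echoing the plan.
ncpus=4
cpu=0
for irq in $(seq 100 107); do
  mask=$(printf '%02x' $((1 << cpu)))
  echo "irq $irq -> smp_affinity $mask"
  cpu=$(( (cpu + 1) % ncpus ))
done
```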




* Re: Strange packet drops with heavy firewalling
  2010-04-13  5:56           ` Eric Dumazet
@ 2010-04-13  7:56             ` Benny Amorsen
  2010-04-15 13:23               ` Benny Amorsen
  0 siblings, 1 reply; 15+ messages in thread
From: Benny Amorsen @ 2010-04-13  7:56 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Changli Gao, zhigang gong, netdev

Eric Dumazet <eric.dumazet@gmail.com> writes:

> Benny, I am not sure your irqbalance is up2date with multiqueue devices,
> you might need to disable it and manually irqaffine each interrupt

True, that would probably help. Irqbalance might just believe that the
load is so low that it isn't worth rebalancing. The CPUs are spending
more than 90% of their time idling.

I'll keep monitoring the server, and if it starts dropping packets again
or the load increases, I'll check whether irqbalanced does the right
thing, and if not I'll implement your suggestion.

Thank you very much!


/Benny


* Re: Strange packet drops with heavy firewalling
  2010-04-12 23:18         ` Changli Gao
  2010-04-13  5:56           ` Eric Dumazet
@ 2010-04-13 12:33           ` Paweł Staszewski
  2010-04-13 12:53             ` Eric Dumazet
  1 sibling, 1 reply; 15+ messages in thread
From: Paweł Staszewski @ 2010-04-13 12:33 UTC (permalink / raw)
  To: Changli Gao; +Cc: Benny Amorsen, zhigang gong, netdev

[-- Attachment #1: Type: text/plain, Size: 2348 bytes --]

W dniu 2010-04-13 01:18, Changli Gao pisze:
> On Tue, Apr 13, 2010 at 1:06 AM, Benny Amorsen<benny+usenet@amorsen.dk>  wrote:
>    
>>   99:         24    1306226          3          2   PCI-MSI-edge      eth1-tx-0
>>   100:      15735    1648774          3          7   PCI-MSI-edge      eth1-tx-1
>>   101:          8         11          9    1083022   PCI-MSI-edge      eth1-tx-2
>>   102:          0          0          0          0   PCI-MSI-edge      eth1-tx-3
>>   103:         18         15       6131    1095383   PCI-MSI-edge      eth1-rx-0
>>   104:        217         32      46544    1335325   PCI-MSI-edge      eth1-rx-1
>>   105:        154    1305595        218         16   PCI-MSI-edge      eth1-rx-2
>>   106:         17         16       8229    1467509   PCI-MSI-edge      eth1-rx-3
>>   107:          0          0          1          0   PCI-MSI-edge      eth1
>>   108:          2         14         15    1003053   PCI-MSI-edge      eth0-tx-0
>>   109:       8226    1668924        478        487   PCI-MSI-edge      eth0-tx-1
>>   110:          3    1188874         17         12   PCI-MSI-edge      eth0-tx-2
>>   111:          0          0          0          0   PCI-MSI-edge      eth0-tx-3
>>   112:        203        185       5324    1015263   PCI-MSI-edge      eth0-rx-0
>>   113:       4141    1600793        153        159   PCI-MSI-edge      eth0-rx-1
>>   114:      16242    1210108        436       3124   PCI-MSI-edge      eth0-rx-2
>>   115:        267       4173      19471    1321252   PCI-MSI-edge      eth0-rx-3
>>   116:          0          1          0          0   PCI-MSI-edge      eth0
>>
>>
>> irqbalanced seems to have picked CPU1 and CPU3 for all the interrupts,
>> which to my mind should cause the same problem as before (where CPU1 and
>> CPU3 was handling all packets). Yet the box clearly works much better
>> than before.
>>      
> irqbalanced? I don't think it can work properly. Try RPS in netdev and
> linux-next tree, and if cpu load isn't even, try this patch:
> http://patchwork.ozlabs.org/patch/49915/ .
>
>
>    
Yes, without irqbalance and with the IRQ affinity set by hand the router
will work much better.

But I don't think that RPS will help him. I made some tests with RPS and
affinity; the results are in the attached file. The test router does
traffic management (HFSC) for almost 9k users.





[-- Attachment #2: RPS_AFFINITY_TEST.txt --]
[-- Type: text/plain, Size: 5028 bytes --]

##############################################################################
eth0 -> CPU0
eth1 -> CPU5
RPS:
echo 00e0 > /sys/class/net/eth1/queues/rx-0/rps_cpus
echo 000e > /sys/class/net/eth0/queues/rx-0/rps_cpus

------------------------------------------------------------------------------
   PerfTop:   85205 irqs/sec  kernel:97.1% [100000 cycles],  (all, 8 CPUs)
------------------------------------------------------------------------------

             samples    pcnt   kernel function
             _______   _____   _______________

           214930.00 - 24.5% : _raw_spin_lock
            63844.00 -  7.3% : u32_classify
            48381.00 -  5.5% : e1000_clean
            47754.00 -  5.5% : rb_next
            37222.00 -  4.2% : e1000_intr_msi
            26295.00 -  3.0% : hfsc_enqueue
            17371.00 -  2.0% : rb_erase
            15290.00 -  1.7% : _raw_spin_lock_irqsave
            14958.00 -  1.7% : rb_insert_color
            14439.00 -  1.6% : update_vf
            14384.00 -  1.6% : e1000_xmit_frame
            14356.00 -  1.6% : hfsc_dequeue
            13804.00 -  1.6% : e1000_clean_tx_irq
            13413.00 -  1.5% : ipt_do_table
             9654.00 -  1.1% : ip_route_input

##############################################################################
eth0 -> CPU0
eth1 -> CPU5
NO RPS

------------------------------------------------------------------------------
   PerfTop:   33800 irqs/sec  kernel:96.9% [100000 cycles],  (all, 8 CPUs)
------------------------------------------------------------------------------

             samples    pcnt   kernel function
             _______   _____   _______________

            19361.00 - 11.2% : e1000_clean
            16424.00 -  9.5% : rb_next
            13060.00 -  7.5% : e1000_intr_msi
             7293.00 -  4.2% : u32_classify
             6875.00 -  4.0% : ipt_do_table
             5811.00 -  3.4% : _raw_spin_lock
             5754.00 -  3.3% : e1000_xmit_frame
             5671.00 -  3.3% : hfsc_dequeue
             4503.00 -  2.6% : __alloc_skb
             4156.00 -  2.4% : hfsc_enqueue
             4090.00 -  2.4% : e1000_clean_tx_irq
             3809.00 -  2.2% : e1000_clean_rx_irq
             3424.00 -  2.0% : update_vf
             3028.00 -  1.7% : rb_erase
             2714.00 -  1.6% : ip_route_input

##############################################################################
eth0 -> CPU0,CPU1,CPU2,CPU3 -> affinity echo 0f > /proc/irq/30/smp_affinity
eth1 -> CPU4,CPU5,CPU6,CPU7 -> affinity echo f0 > /proc/irq/31/smp_affinity
NO RPS
------------------------------------------------------------------------------
   PerfTop:   42362 irqs/sec  kernel:96.0% [100000 cycles],  (all, 8 CPUs)
------------------------------------------------------------------------------

             samples    pcnt   kernel function
             _______   _____   _______________

            33815.00 - 10.6% : rb_next
            21357.00 -  6.7% : u32_classify
            14525.00 -  4.6% : _raw_spin_lock
            14346.00 -  4.5% : e1000_clean
            12798.00 -  4.0% : hfsc_enqueue
            10526.00 -  3.3% : ipt_do_table
             9999.00 -  3.1% : hfsc_dequeue
             9976.00 -  3.1% : e1000_intr_msi
             9787.00 -  3.1% : rb_erase
             8259.00 -  2.6% : e1000_xmit_frame
             8015.00 -  2.5% : rb_insert_color
             7948.00 -  2.5% : update_vf
             6868.00 -  2.2% : e1000_clean_tx_irq
             6822.00 -  2.1% : e1000_clean_rx_irq
             6368.00 -  2.0% : __alloc_skb

##############################################################################
eth0 -> CPU0,CPU1,CPU2,CPU3 -> affinity echo 0f > /proc/irq/30/smp_affinity
eth1 -> CPU4,CPU5,CPU6,CPU7 -> affinity echo f0 > /proc/irq/31/smp_affinity
RPS:
echo 0f > /sys/class/net/eth0/queues/rx-0/rps_cpus
echo f0 > /sys/class/net/eth1/queues/rx-0/rps_cpus
------------------------------------------------------------------------------
   PerfTop:   81051 irqs/sec  kernel:96.9% [100000 cycles],  (all, 8 CPUs)
------------------------------------------------------------------------------

             samples    pcnt   kernel function
             _______   _____   _______________

           167110.00 - 22.3% : _raw_spin_lock
            58221.00 -  7.8% : u32_classify
            46379.00 -  6.2% : rb_next
            35189.00 -  4.7% : e1000_clean
            25614.00 -  3.4% : e1000_intr_msi
            24094.00 -  3.2% : hfsc_enqueue
            16231.00 -  2.2% : rb_erase
            14298.00 -  1.9% : rb_insert_color
            13751.00 -  1.8% : update_vf
            13712.00 -  1.8% : ipt_do_table
            13588.00 -  1.8% : hfsc_dequeue
            13335.00 -  1.8% : e1000_xmit_frame
            12449.00 -  1.7% : e1000_clean_tx_irq
            11510.00 -  1.5% : net_tx_action
            11428.00 -  1.5% : _raw_spin_lock_irqsave


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: Strange packet drops with heavy firewalling
  2010-04-13 12:33           ` Paweł Staszewski
@ 2010-04-13 12:53             ` Eric Dumazet
  2010-04-13 13:39               ` Paweł Staszewski
  0 siblings, 1 reply; 15+ messages in thread
From: Eric Dumazet @ 2010-04-13 12:53 UTC (permalink / raw)
  To: Paweł Staszewski; +Cc: Changli Gao, Benny Amorsen, zhigang gong, netdev

On Tuesday 13 April 2010 at 14:33 +0200, Paweł Staszewski wrote:
> On 2010-04-13 01:18, Changli Gao wrote:
> > On Tue, Apr 13, 2010 at 1:06 AM, Benny Amorsen<benny+usenet@amorsen.dk>  wrote:
> >    
> >>   99:         24    1306226          3          2   PCI-MSI-edge      eth1-tx-0
> >>   100:      15735    1648774          3          7   PCI-MSI-edge      eth1-tx-1
> >>   101:          8         11          9    1083022   PCI-MSI-edge      eth1-tx-2
> >>   102:          0          0          0          0   PCI-MSI-edge      eth1-tx-3
> >>   103:         18         15       6131    1095383   PCI-MSI-edge      eth1-rx-0
> >>   104:        217         32      46544    1335325   PCI-MSI-edge      eth1-rx-1
> >>   105:        154    1305595        218         16   PCI-MSI-edge      eth1-rx-2
> >>   106:         17         16       8229    1467509   PCI-MSI-edge      eth1-rx-3
> >>   107:          0          0          1          0   PCI-MSI-edge      eth1
> >>   108:          2         14         15    1003053   PCI-MSI-edge      eth0-tx-0
> >>   109:       8226    1668924        478        487   PCI-MSI-edge      eth0-tx-1
> >>   110:          3    1188874         17         12   PCI-MSI-edge      eth0-tx-2
> >>   111:          0          0          0          0   PCI-MSI-edge      eth0-tx-3
> >>   112:        203        185       5324    1015263   PCI-MSI-edge      eth0-rx-0
> >>   113:       4141    1600793        153        159   PCI-MSI-edge      eth0-rx-1
> >>   114:      16242    1210108        436       3124   PCI-MSI-edge      eth0-rx-2
> >>   115:        267       4173      19471    1321252   PCI-MSI-edge      eth0-rx-3
> >>   116:          0          1          0          0   PCI-MSI-edge      eth0
> >>
> >>
> >> irqbalanced seems to have picked CPU1 and CPU3 for all the interrupts,
> >> which to my mind should cause the same problem as before (where CPU1 and
> >> CPU3 was handling all packets). Yet the box clearly works much better
> >> than before.
> >>      
> > irqbalanced? I don't think it can work properly. Try RPS in netdev and
> > linux-next tree, and if cpu load isn't even, try this patch:
> > http://patchwork.ozlabs.org/patch/49915/ .
> >
> >
> >    
> Yes, without irqbalance - and with IRQ affinity set by hand - the router
> will work much better.
> 
> But I don't think that RPS will help him. I made some tests with RPS
> and affinity - results are in the attached file.
> The test router does traffic management (HFSC) for almost 9k users.

Thanks for sharing Pawel.

But obviously you are mixing apples and oranges.

Are you aware that HFSC and other traffic shapers serialize access to
their data structures? If many CPUs try to access these structures in
parallel, you get a lot of cache-line misses. HFSC is a real memory hog :(

Benny does have firewalling (highly parallelized these days; iptables was
much improved in this area), but no traffic control.

Anyway, Benny now has multiqueue devices, and therefore RPS will not
help him. I suggested RPS before his move to multiqueue, and multiqueue
is the most sensible way to improve things when no central lock is
used: every CPU can really work in parallel.





* Re: Strange packet drops with heavy firewalling
  2010-04-13 12:53             ` Eric Dumazet
@ 2010-04-13 13:39               ` Paweł Staszewski
  0 siblings, 0 replies; 15+ messages in thread
From: Paweł Staszewski @ 2010-04-13 13:39 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Changli Gao, Benny Amorsen, zhigang gong, netdev

On 2010-04-13 14:53, Eric Dumazet wrote:
> On Tuesday 13 April 2010 at 14:33 +0200, Paweł Staszewski wrote:
>    
>> On 2010-04-13 01:18, Changli Gao wrote:
>>      
>>> On Tue, Apr 13, 2010 at 1:06 AM, Benny Amorsen<benny+usenet@amorsen.dk>   wrote:
>>>
>>>        
>>>>    99:         24    1306226          3          2   PCI-MSI-edge      eth1-tx-0
>>>>    100:      15735    1648774          3          7   PCI-MSI-edge      eth1-tx-1
>>>>    101:          8         11          9    1083022   PCI-MSI-edge      eth1-tx-2
>>>>    102:          0          0          0          0   PCI-MSI-edge      eth1-tx-3
>>>>    103:         18         15       6131    1095383   PCI-MSI-edge      eth1-rx-0
>>>>    104:        217         32      46544    1335325   PCI-MSI-edge      eth1-rx-1
>>>>    105:        154    1305595        218         16   PCI-MSI-edge      eth1-rx-2
>>>>    106:         17         16       8229    1467509   PCI-MSI-edge      eth1-rx-3
>>>>    107:          0          0          1          0   PCI-MSI-edge      eth1
>>>>    108:          2         14         15    1003053   PCI-MSI-edge      eth0-tx-0
>>>>    109:       8226    1668924        478        487   PCI-MSI-edge      eth0-tx-1
>>>>    110:          3    1188874         17         12   PCI-MSI-edge      eth0-tx-2
>>>>    111:          0          0          0          0   PCI-MSI-edge      eth0-tx-3
>>>>    112:        203        185       5324    1015263   PCI-MSI-edge      eth0-rx-0
>>>>    113:       4141    1600793        153        159   PCI-MSI-edge      eth0-rx-1
>>>>    114:      16242    1210108        436       3124   PCI-MSI-edge      eth0-rx-2
>>>>    115:        267       4173      19471    1321252   PCI-MSI-edge      eth0-rx-3
>>>>    116:          0          1          0          0   PCI-MSI-edge      eth0
>>>>
>>>>
>>>> irqbalanced seems to have picked CPU1 and CPU3 for all the interrupts,
>>>> which to my mind should cause the same problem as before (where CPU1 and
>>>> CPU3 was handling all packets). Yet the box clearly works much better
>>>> than before.
>>>>
>>>>          
>>> irqbalanced? I don't think it can work properly. Try RPS in netdev and
>>> linux-next tree, and if cpu load isn't even, try this patch:
>>> http://patchwork.ozlabs.org/patch/49915/ .
>>>
>>>
>>>
>>>        
>> Yes, without irqbalance - and with IRQ affinity set by hand - the router
>> will work much better.
>>
>> But I don't think that RPS will help him. I made some tests with RPS
>> and affinity - results are in the attached file.
>> The test router does traffic management (HFSC) for almost 9k users.
>>      
> Thanks for sharing Pawel.
>
> But obviously you are mixing apples and oranges.
>
>   Are you aware that HFSC and other traffic shapers serialize access to
> their data structures? If many CPUs try to access these structures in
> parallel, you get a lot of cache-line misses. HFSC is a real memory hog :(
>
>    
Thanks, Eric, for the explanation of why RPS is useless for
traffic-management routers.

> Benny does have firewalling (highly parallelized these days; iptables was
> much improved in this area), but no traffic control.
>
>    
Hmm, so maybe a better choice for traffic management is to use iptables
for "filter classification" instead of "u32 filters" - something like
the iptables CLASSIFY target.
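
A minimal sketch of that idea - the address 192.0.2.10 and classid 1:10
are placeholders for illustration, not values from this thread:

```shell
# Hypothetical: steer traffic into HFSC class 1:10 from iptables
# (CLASSIFY is valid in the mangle table's POSTROUTING chain).
iptables -t mangle -A POSTROUTING -o eth0 -d 192.0.2.10 \
         -j CLASSIFY --set-class 1:10

# The tc u32 equivalent, for comparison:
tc filter add dev eth0 parent 1: protocol ip prio 1 \
   u32 match ip dst 192.0.2.10 flowid 1:10
```

Both end up selecting the same class; the difference is where the
lookup cost is paid (netfilter rule traversal vs. the u32 hash tables).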

> Anyway, Benny now has multiqueue devices, and therefore RPS will not
> help him. I suggested RPS before his move to multiqueue, and multiqueue
> is the most sensible way to improve things when no central lock is
> used: every CPU can really work in parallel.
>
>
>



* Re: Strange packet drops with heavy firewalling
  2010-04-13  7:56             ` Benny Amorsen
@ 2010-04-15 13:23               ` Benny Amorsen
  2010-04-15 13:42                 ` Eric Dumazet
  0 siblings, 1 reply; 15+ messages in thread
From: Benny Amorsen @ 2010-04-15 13:23 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Changli Gao, zhigang gong, netdev

Benny Amorsen <benny+usenet@amorsen.dk> writes:

> I'll keep monitoring the server, and if it starts dropping packets again
> or load increases I'll check whether irqbalanced does the right thing,
> and if not I'll implement your suggestion.

It did start dropping packets (although very few, a few packets dropped
at once perhaps every ten minutes). Irqbalanced didn't move the
interrupts.

Doing

echo 01 >/proc/irq/99/smp_affinity
echo 02 >/proc/irq/100/smp_affinity
echo 04 >/proc/irq/101/smp_affinity

and so on, as Eric Dumazet suggested, seems to have helped but not
entirely solved the problem.
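
For reference, the "and so on" can be written as a small loop - a
sketch only, assuming the eth1 vectors still occupy IRQs 99-106 as in
the /proc/interrupts dump earlier in the thread; the write is guarded
so the script is harmless when run without root or those IRQs:

```shell
#!/bin/sh
# Affinity mask for a single CPU, in the hex format smp_affinity
# expects (CPU 4 -> "10").
mask_for_cpu() { printf '%x' $((1 << $1)); }

cpu=0
for irq in 99 100 101 102 103 104 105 106; do
    f=/proc/irq/$irq/smp_affinity
    # Only write if the file exists and is writable (i.e. we are root).
    [ -w "$f" ] && echo "$(mask_for_cpu $cpu)" > "$f"
    cpu=$(( (cpu + 1) % 8 ))    # one CPU per vector, wrapping at 8
done
```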

The problem now manifests itself this way in ethtool -S:
     rx_no_buffer_count: 270
     rx_queue_drop_packet_count: 270

I can't be sure that I'm not just getting hit by a 1Gbps traffic spike,
of course, but it is a bit strange that a machine which can do 200Mbps
at 92% idle can't handle subsecond peaks close to 1Gbps...

I wish ifstat could report errors so I could see what the traffic rate
was when the problem occurred...


/Benny


* Re: Strange packet drops with heavy firewalling
  2010-04-15 13:23               ` Benny Amorsen
@ 2010-04-15 13:42                 ` Eric Dumazet
  0 siblings, 0 replies; 15+ messages in thread
From: Eric Dumazet @ 2010-04-15 13:42 UTC (permalink / raw)
  To: Benny Amorsen; +Cc: Changli Gao, zhigang gong, netdev

On Thursday 15 April 2010 at 15:23 +0200, Benny Amorsen wrote:
> Benny Amorsen <benny+usenet@amorsen.dk> writes:
> 
> > I'll keep monitoring the server, and if it starts dropping packets again
> > or load increases I'll check whether irqbalanced does the right thing,
> > and if not I'll implement your suggestion.
> 
> It did start dropping packets (although very few, a few packets dropped
> at once perhaps every ten minutes). Irqbalanced didn't move the
> interrupts.
> 
> Doing
> 
> echo 01 >/proc/irq/99/smp_affinity
> echo 02 >/proc/irq/100/smp_affinity
> echo 04 >/proc/irq/101/smp_affinity
> 
> and so on, as Eric Dumazet suggested, seems to have helped but not
> entirely solved the problem.
> 
> The problem now manifests itself this way in ethtool -S:
>      rx_no_buffer_count: 270
>      rx_queue_drop_packet_count: 270
> 
> I can't be sure that I'm not just getting hit by a 1Gbps traffic spike,
> of course, but it is a bit strange that a machine which can do 200Mbps
> at 92% idle can't handle subsecond peaks close to 1Gbps...
> 

Even with multiqueue, it's quite possible one queue gets more than one
packet per microsecond, while the time to process a packet might be
greater than 1 us even on recent hardware. So a burst of 1000 small
packets with the same flow information hits one queue, one CPU, and
fills the RX ring.

Losing these packets is OK; it's very likely an attack :)
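
That point can be sanity-checked with rough numbers - the 1.5 us
per-packet cost is an assumption, not a measurement from Benny's box,
and 256 is the e1000 driver's default RX ring size:

```shell
#!/bin/sh
burst=1000      # small packets in the burst, all hashing to one queue
ring=256        # assumed RX ring size (e1000 default)
wire_ns=672     # 84 bytes per minimum frame on the wire at 1 Gb/s
work_ns=1500    # assumed per-packet processing cost (~1.5 us)

arrive_ns=$(( burst * wire_ns ))      # the whole burst lasts ~672 us
served=$(( arrive_ns / work_ns ))     # packets drained during the burst
backlog=$(( burst - served ))         # packets still queued at the end
echo "backlog=$backlog ring=$ring"    # backlog exceeds the ring: drops
```

With these numbers only ~448 packets drain while the burst arrives, so
the remaining ~552 exceed the 256-slot ring and the tail is dropped,
even though the machine is nearly idle on average.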

> I wish ifstat could report errors so I could see what the traffic rate
> was when the problem occurred...

yes, it could be added I guess.




end of thread, other threads:[~2010-04-15 13:43 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2010-04-09  9:56 Strange packet drops with heavy firewalling Benny Amorsen
2010-04-09 11:47 ` Eric Dumazet
2010-04-09 12:33   ` Benny Amorsen
2010-04-09 13:29     ` Eric Dumazet
2010-04-12  6:20       ` Benny Amorsen
     [not found] ` <q2v40c9f5b21004120116p766df82dj88c6af4e4cad55f@mail.gmail.com>
2010-04-12 14:44   ` Benny Lyne Amorsen
     [not found]     ` <p2x40c9f5b21004120833jd7a749cak6ea69cebd28f8352@mail.gmail.com>
2010-04-12 17:06       ` Benny Amorsen
2010-04-12 23:18         ` Changli Gao
2010-04-13  5:56           ` Eric Dumazet
2010-04-13  7:56             ` Benny Amorsen
2010-04-15 13:23               ` Benny Amorsen
2010-04-15 13:42                 ` Eric Dumazet
2010-04-13 12:33           ` Paweł Staszewski
2010-04-13 12:53             ` Eric Dumazet
2010-04-13 13:39               ` Paweł Staszewski
