* Linux kernel - 5.4.0+ (net-next from 27.11.2019) routing/network performance
@ 2019-11-29 22:00 Paweł Staszewski
2019-11-29 22:13 ` Paweł Staszewski
2019-12-01 16:05 ` David Ahern
0 siblings, 2 replies; 6+ messages in thread
From: Paweł Staszewski @ 2019-11-29 22:00 UTC (permalink / raw)
To: netdev
As always - each year I summarize network performance for routing applications, i.e. a Linux router on the native Linux kernel (without XDP/DPDK/VPP etc.) :)
HW setup:
Server (Supermicro SYS-1019P-WTR)
1x Intel 6146
2x Mellanox ConnectX-5 (100G) (installed in two different x16 PCIe
gen 3.1 slots)
6x 8GB DDR4 2666 (it really matters, because 100G is about 12.5 GB/s of
memory bandwidth in one direction)
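The bandwidth remark in parentheses is easy to sanity-check; a tiny sketch of the arithmetic (awk used only as a calculator here):

```shell
# The arithmetic behind the "12.5 GB/s one direction" remark above:
# a 100 Gbit/s link moves 100/8 gigabytes every second, per direction.
awk 'BEGIN {
  gbit = 100                           # link speed in Gbit/s
  printf "one direction: %.1f GB/s\n", gbit / 8
  printf "RX + TX:       %.1f GB/s\n", gbit / 8 * 2
}'
# -> one direction: 12.5 GB/s
# -> RX + TX:       25.0 GB/s
```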
And here it is:
perf top at 72 Gbit/s RX and 72 Gbit/s TX (at the same time)
PerfTop: 91202 irqs/sec kernel:99.7% exact: 100.0% [4000Hz
cycles:ppp], (all, 24 CPUs)
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
7.56% [kernel] [k] __dev_queue_xmit
5.27% [kernel] [k] build_skb
4.41% [kernel] [k] rr_transmit
4.17% [kernel] [k] fib_table_lookup
3.83% [kernel] [k] mlx5e_skb_from_cqe_mpwrq_linear
3.30% [kernel] [k] mlx5e_sq_xmit
3.14% [kernel] [k] __netif_receive_skb_core
2.48% [kernel] [k] netif_skb_features
2.36% [kernel] [k] _raw_spin_trylock
2.27% [kernel] [k] dev_hard_start_xmit
2.26% [kernel] [k] dev_gro_receive
2.20% [kernel] [k] mlx5e_handle_rx_cqe_mpwrq
1.92% [kernel] [k] mlx5_eq_comp_int
1.91% [kernel] [k] mlx5e_poll_tx_cq
1.74% [kernel] [k] tcp_gro_receive
1.68% [kernel] [k] memcpy_erms
1.64% [kernel] [k] kmem_cache_free_bulk
1.57% [kernel] [k] inet_gro_receive
1.55% [kernel] [k] netdev_pick_tx
1.52% [kernel] [k] ip_forward
1.45% [kernel] [k] team_xmit
1.40% [kernel] [k] vlan_do_receive
1.37% [kernel] [k] team_handle_frame
1.36% [kernel] [k] __build_skb
1.33% [kernel] [k] ipt_do_table
1.33% [kernel] [k] mlx5e_poll_rx_cq
1.28% [kernel] [k] ip_finish_output2
1.26% [kernel] [k] vlan_passthru_hard_header
1.20% [kernel] [k] netdev_core_pick_tx
0.93% [kernel] [k] ip_rcv_core.isra.22.constprop.27
0.87% [kernel] [k] validate_xmit_skb.isra.148
0.87% [kernel] [k] ip_route_input_rcu
0.78% [kernel] [k] kmem_cache_alloc
0.77% [kernel] [k] mlx5e_handle_rx_dim
0.71% [kernel] [k] iommu_need_mapping
0.69% [kernel] [k] tasklet_action_common.isra.21
0.66% [kernel] [k] mlx5e_xmit
0.65% [kernel] [k] mlx5e_post_rx_mpwqes
0.63% [kernel] [k] _raw_spin_lock
0.61% [kernel] [k] ip_sublist_rcv
0.57% [kernel] [k] skb_release_data
0.53% [kernel] [k] __local_bh_enable_ip
0.53% [kernel] [k] tcp4_gro_receive
0.51% [kernel] [k] pfifo_fast_dequeue
0.51% [kernel] [k] page_frag_free
0.50% [kernel] [k] kmem_cache_free
0.47% [kernel] [k] dma_direct_map_page
0.45% [kernel] [k] native_irq_return_iret
0.44% [kernel] [k] __slab_free.isra.89
0.43% [kernel] [k] skb_gro_receive
0.43% [kernel] [k] napi_gro_receive
0.43% [kernel] [k] __do_softirq
0.41% [kernel] [k] sch_direct_xmit
0.41% [kernel] [k] ip_rcv_finish_core.isra.19
0.40% [kernel] [k] skb_network_protocol
0.40% [kernel] [k] __get_xps_queue_idx
I'm using team (2x 100G LAG) - that is why there is some load here:
4.41% [kernel] [k] rr_transmit
No discards on interfaces:
ethtool -S enp179s0f0 | grep disc
rx_discards_phy: 0
tx_discards_phy: 0
ethtool -S enp179s0f1 | grep disc
rx_discards_phy: 0
tx_discards_phy: 0
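A small helper in the same spirit as the checks above - a hedged sketch that greps any `disc` counter out of `ethtool -S`-style output and fails if one is non-zero (`check_discards` is a name made up here, and the sample input stands in for a real NIC):

```shell
# Fail (exit 1) if any "disc" counter in `ethtool -S`-style input is non-zero.
check_discards() {
    grep disc | awk '$2 > 0 { bad = 1; print "non-zero:", $1, $2 }
                     END { exit bad }'
}

# Sample input standing in for: ethtool -S enp179s0f0
printf 'rx_discards_phy: 0\ntx_discards_phy: 0\n' | check_discards \
    && echo "no discards"
# -> no discards
```

On a live router you would pipe the real `ethtool -S <iface>` output through the same function, once per port.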
STREAM memory-bandwidth test while carrying 72G/72G traffic:
-------------------------------------------------------------
Function    Best Rate MB/s  Avg time   Min time   Max time
Copy:              38948.8  0.004368   0.004108   0.004533
Scale:             37914.6  0.004473   0.004220   0.004802
Add:               43134.6  0.005801   0.005564   0.006086
Triad:             42934.1  0.005696   0.005590   0.005901
-------------------------------------------------------------
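Putting the STREAM numbers next to the traffic level is instructive. A rough headroom sketch, under the simplifying (assumed) model that each forwarded byte crosses memory about twice - DMA-written on RX, DMA-read on TX:

```shell
# 72G RX + 72G TX on the wire vs. the measured STREAM copy rate above.
awk 'BEGIN {
  wire = 72 + 72                       # Gbit/s in flight, both directions
  mem  = wire / 8 * 2                  # GB/s of memory traffic (assumed 2x)
  printf "wire: %d Gbit/s -> ~%.0f GB/s memory traffic\n", wire, mem
  printf "STREAM copy 38.9 GB/s -> headroom ~%.0f GB/s\n", 38.9 - mem
}'
# -> wire: 144 Gbit/s -> ~36 GB/s memory traffic
# -> STREAM copy 38.9 GB/s -> headroom ~3 GB/s
```

Under that model there is only a few GB/s of slack, which would be consistent with RX hitting a wall in the low 80s of Gbit/s.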
And some links to screenshots
Softirqs
https://pasteboard.co/IIZkGrw.png
And bandwidth / cpu / pps graphs
https://pasteboard.co/IIZl6XP.png
Currently it looks like the biggest problem for 100G is cpu->mem->nic
bandwidth, or the NIC doorbell / page cache at RX processing - because
what I can see is that if I run iperf on this host I can TX a full 100G,
but I can't RX 100G when I flood this host from a packet generator (it
starts to drop packets at around 82 Gbit/s) - and this is not a pps
problem but a bandwidth problem.
For example, I can flood RX with 14 Mpps of 64b packets without NIC
discards, but I can't flood it with 1000b frames at the same pps -
because when it reaches 82 Gbit/s the NICs start to report discards.
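The 64b-vs-1000b observation above is pure arithmetic - the same 14 Mpps is an order of magnitude more bandwidth at large frames (sketch):

```shell
# Wire bandwidth of 14 Mpps at the two frame sizes mentioned above.
awk 'BEGIN {
  pps = 14e6
  n = split("64 1000", sizes, " ")
  for (i = 1; i <= n; i++)
    printf "%4d B @ 14 Mpps -> %6.1f Gbit/s\n", sizes[i], pps * sizes[i] * 8 / 1e9
}'
# ->   64 B @ 14 Mpps ->    7.2 Gbit/s
# -> 1000 B @ 14 Mpps ->  112.0 Gbit/s
```

So the 64b flood is only ~7 Gbit/s of bandwidth, while 1000b frames at the same rate would need 112 Gbit/s - well past the ~82 Gbit/s point where discards start.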
Thanks
--
Paweł Staszewski
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: Linux kernel - 5.4.0+ (net-next from 27.11.2019) routing/network performance
2019-11-29 22:00 Linux kernel - 5.4.0+ (net-next from 27.11.2019) routing/network performance Paweł Staszewski
@ 2019-11-29 22:13 ` Paweł Staszewski
2019-12-01 16:05 ` David Ahern
1 sibling, 0 replies; 6+ messages in thread
From: Paweł Staszewski @ 2019-11-29 22:13 UTC (permalink / raw)
To: netdev
On 29.11.2019 at 23:00, Paweł Staszewski wrote:
> As always - each year i need to summarize network performance for
> routing applications like linux router on native Linux kernel (without
> xdp/dpdk/vpp etc) :)
>
> HW setup:
>
> Server (Supermicro SYS-1019P-WTR)
>
> 1x Intel 6146
>
> 2x Mellanox connect-x 5 (100G) (installed in two different x16 pcie
> gen3.1 slots)
>
> 6x 8GB DDR4 2666 (it really matters cause 100G is about 12.5GB/s of
> memory bandwidth one direction)
>
>
> And here it is:
>
> perf top at 72Gbit.s RX and 72Gbit/s TX (at same time)
>
> PerfTop: 91202 irqs/sec kernel:99.7% exact: 100.0% [4000Hz
> cycles:ppp], (all, 24 CPUs)
> ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>
>
> 7.56% [kernel] [k] __dev_queue_xmit
> 5.27% [kernel] [k] build_skb
> 4.41% [kernel] [k] rr_transmit
> 4.17% [kernel] [k] fib_table_lookup
> 3.83% [kernel] [k] mlx5e_skb_from_cqe_mpwrq_linear
> 3.30% [kernel] [k] mlx5e_sq_xmit
> 3.14% [kernel] [k] __netif_receive_skb_core
> 2.48% [kernel] [k] netif_skb_features
> 2.36% [kernel] [k] _raw_spin_trylock
> 2.27% [kernel] [k] dev_hard_start_xmit
> 2.26% [kernel] [k] dev_gro_receive
> 2.20% [kernel] [k] mlx5e_handle_rx_cqe_mpwrq
> 1.92% [kernel] [k] mlx5_eq_comp_int
> 1.91% [kernel] [k] mlx5e_poll_tx_cq
> 1.74% [kernel] [k] tcp_gro_receive
> 1.68% [kernel] [k] memcpy_erms
> 1.64% [kernel] [k] kmem_cache_free_bulk
> 1.57% [kernel] [k] inet_gro_receive
> 1.55% [kernel] [k] netdev_pick_tx
> 1.52% [kernel] [k] ip_forward
> 1.45% [kernel] [k] team_xmit
> 1.40% [kernel] [k] vlan_do_receive
> 1.37% [kernel] [k] team_handle_frame
> 1.36% [kernel] [k] __build_skb
> 1.33% [kernel] [k] ipt_do_table
> 1.33% [kernel] [k] mlx5e_poll_rx_cq
> 1.28% [kernel] [k] ip_finish_output2
> 1.26% [kernel] [k] vlan_passthru_hard_header
> 1.20% [kernel] [k] netdev_core_pick_tx
> 0.93% [kernel] [k] ip_rcv_core.isra.22.constprop.27
> 0.87% [kernel] [k] validate_xmit_skb.isra.148
> 0.87% [kernel] [k] ip_route_input_rcu
> 0.78% [kernel] [k] kmem_cache_alloc
> 0.77% [kernel] [k] mlx5e_handle_rx_dim
> 0.71% [kernel] [k] iommu_need_mapping
> 0.69% [kernel] [k] tasklet_action_common.isra.21
> 0.66% [kernel] [k] mlx5e_xmit
> 0.65% [kernel] [k] mlx5e_post_rx_mpwqes
> 0.63% [kernel] [k] _raw_spin_lock
> 0.61% [kernel] [k] ip_sublist_rcv
> 0.57% [kernel] [k] skb_release_data
> 0.53% [kernel] [k] __local_bh_enable_ip
> 0.53% [kernel] [k] tcp4_gro_receive
> 0.51% [kernel] [k] pfifo_fast_dequeue
> 0.51% [kernel] [k] page_frag_free
> 0.50% [kernel] [k] kmem_cache_free
> 0.47% [kernel] [k] dma_direct_map_page
> 0.45% [kernel] [k] native_irq_return_iret
> 0.44% [kernel] [k] __slab_free.isra.89
> 0.43% [kernel] [k] skb_gro_receive
> 0.43% [kernel] [k] napi_gro_receive
> 0.43% [kernel] [k] __do_softirq
> 0.41% [kernel] [k] sch_direct_xmit
> 0.41% [kernel] [k] ip_rcv_finish_core.isra.19
> 0.40% [kernel] [k] skb_network_protocol
> 0.40% [kernel] [k] __get_xps_queue_idx
>
>
> Im useing team (2x 100G LAG)- that is why here is some load:
>
> 4.41% [kernel] [k] rr_transmit
>
>
>
> No discards on interfaces:
>
> ethtool -S enp179s0f0 | grep disc
> rx_discards_phy: 0
> tx_discards_phy: 0
>
> ethtool -S enp179s0f1 | grep disc
> rx_discards_phy: 0
> tx_discards_phy: 0
>
> io/stream test at 72G/72G traffic:
>
> -------------------------------------------------------------
> Function Best Rate MB/s Avg time Min time Max time
> Copy: 38948.8 0.004368 0.004108 0.004533
> Scale: 37914.6 0.004473 0.004220 0.004802
> Add: 43134.6 0.005801 0.005564 0.006086
> Triad: 42934.1 0.005696 0.005590 0.005901
> -------------------------------------------------------------
>
>
> And some links to screenshoots
>
> Softirqs
>
> https://pasteboard.co/IIZkGrw.png
>
> And bandwidth / cpu / pps grapsh
>
> https://pasteboard.co/IIZl6XP.png
>
>
> Currently it looks like the biggest problem for 100G is cpu->mem->nic
> bandwidth or nic doorbell / page cache at RX processing - cause what i
> can see is that if I run iperf on this host i can TX full 100G - but I
> cant RX 100G when i flood this host from some packet generator (it
> will start to drop packets at arount 82Gbit/s) - and this is not a
> problem with ppp but it is bandwidth problem.
>
> For example i can flood RX with 14Mpps or 64b packets without nic
> discards but i cant flood it with 1000b frames and same pps - cause
> when it reaches 82Gbit/s nic's start to report discards.
>
>
> Thanks
>
>
>
Forgot to add that this is a forwarding scenario - the router is routing
packets from one 100G interface to another 100G interface and vice versa
(full BGP feed x4 from 4 different upstreams) - 700k+ flows.
* Re: Linux kernel - 5.4.0+ (net-next from 27.11.2019) routing/network performance
2019-11-29 22:00 Linux kernel - 5.4.0+ (net-next from 27.11.2019) routing/network performance Paweł Staszewski
2019-11-29 22:13 ` Paweł Staszewski
@ 2019-12-01 16:05 ` David Ahern
2019-12-02 10:09 ` Paweł Staszewski
1 sibling, 1 reply; 6+ messages in thread
From: David Ahern @ 2019-12-01 16:05 UTC (permalink / raw)
To: Paweł Staszewski, netdev
On 11/29/19 4:00 PM, Paweł Staszewski wrote:
> As always - each year i need to summarize network performance for
> routing applications like linux router on native Linux kernel (without
> xdp/dpdk/vpp etc) :)
>
Do you keep past profiles? How does this profile (and traffic rates)
compare to older kernels - e.g., 5.0 or 4.19?
> HW setup:
>
> Server (Supermicro SYS-1019P-WTR)
>
> 1x Intel 6146
>
> 2x Mellanox connect-x 5 (100G) (installed in two different x16 pcie
> gen3.1 slots)
>
> 6x 8GB DDR4 2666 (it really matters cause 100G is about 12.5GB/s of
> memory bandwidth one direction)
>
>
> And here it is:
>
> perf top at 72Gbit.s RX and 72Gbit/s TX (at same time)
>
> PerfTop: 91202 irqs/sec kernel:99.7% exact: 100.0% [4000Hz
> cycles:ppp], (all, 24 CPUs)
> ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>
>
> 7.56% [kernel] [k] __dev_queue_xmit
> 5.27% [kernel] [k] build_skb
> 4.41% [kernel] [k] rr_transmit
> 4.17% [kernel] [k] fib_table_lookup
> 3.83% [kernel] [k] mlx5e_skb_from_cqe_mpwrq_linear
> 3.30% [kernel] [k] mlx5e_sq_xmit
> 3.14% [kernel] [k] __netif_receive_skb_core
> 2.48% [kernel] [k] netif_skb_features
> 2.36% [kernel] [k] _raw_spin_trylock
> 2.27% [kernel] [k] dev_hard_start_xmit
* Re: Linux kernel - 5.4.0+ (net-next from 27.11.2019) routing/network performance
2019-12-01 16:05 ` David Ahern
@ 2019-12-02 10:09 ` Paweł Staszewski
2019-12-02 10:53 ` Paolo Abeni
0 siblings, 1 reply; 6+ messages in thread
From: Paweł Staszewski @ 2019-12-02 10:09 UTC (permalink / raw)
To: David Ahern, netdev, Jesper Dangaard Brouer
On 01.12.2019 at 17:05, David Ahern wrote:
> On 11/29/19 4:00 PM, Paweł Staszewski wrote:
>> As always - each year i need to summarize network performance for
>> routing applications like linux router on native Linux kernel (without
>> xdp/dpdk/vpp etc) :)
>>
> Do you keep past profiles? How does this profile (and traffic rates)
> compare to older kernels - e.g., 5.0 or 4.19?
>
>
Yes - so for 4.19:
Max bandwidth was about 40-42 Gbit/s RX / 40-42 Gbit/s TX of
forwarded (routed) traffic.
And after the "order-0 pages" patches - max was 50 Gbit/s RX + 50 Gbit/s TX
(forwarding bandwidth max).
(The current kernel has almost doubled this.)
And also the old perf top (from kernel 4.19) - before the "order-0 pages" patch:
PerfTop: 108490 irqs/sec kernel:99.6% exact: 0.0% [4000Hz
cycles], (all, 56 CPUs)
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
26.78% [kernel] [k] queued_spin_lock_slowpath
9.09% [kernel] [k] mlx5e_skb_from_cqe_linear
4.94% [kernel] [k] mlx5e_sq_xmit
3.63% [kernel] [k] memcpy_erms
3.30% [kernel] [k] fib_table_lookup
3.26% [kernel] [k] build_skb
2.41% [kernel] [k] mlx5e_poll_tx_cq
2.11% [kernel] [k] get_page_from_freelist
1.51% [kernel] [k] vlan_do_receive
1.51% [kernel] [k] _raw_spin_lock
1.43% [kernel] [k] __dev_queue_xmit
1.41% [kernel] [k] dev_gro_receive
1.34% [kernel] [k] mlx5e_poll_rx_cq
1.26% [kernel] [k] tcp_gro_receive
1.21% [kernel] [k] free_one_page
1.13% [kernel] [k] swiotlb_map_page
1.13% [kernel] [k] mlx5e_post_rx_wqes
1.05% [kernel] [k] pfifo_fast_dequeue
1.05% [kernel] [k] mlx5e_handle_rx_cqe
1.03% [kernel] [k] ip_finish_output2
1.02% [kernel] [k] ipt_do_table
0.96% [kernel] [k] inet_gro_receive
0.91% [kernel] [k] mlx5_eq_int
0.88% [kernel] [k] __slab_free.isra.79
0.86% [kernel] [k] __build_skb
0.84% [kernel] [k] page_frag_free
0.76% [kernel] [k] skb_release_data
0.75% [kernel] [k] __netif_receive_skb_core
0.75% [kernel] [k] irq_entries_start
0.71% [kernel] [k] ip_route_input_rcu
0.65% [kernel] [k] vlan_dev_hard_start_xmit
0.56% [kernel] [k] ip_forward
0.56% [kernel] [k] __memcpy
0.52% [kernel] [k] kmem_cache_alloc
0.52% [kernel] [k] kmem_cache_free_bulk
0.49% [kernel] [k] mlx5e_page_release
0.47% [kernel] [k] netif_skb_features
0.47% [kernel] [k] mlx5e_build_rx_skb
0.47% [kernel] [k] dev_hard_start_xmit
0.43% [kernel] [k] __page_pool_put_page
0.43% [kernel] [k] __netif_schedule
0.43% [kernel] [k] mlx5e_xmit
0.41% [kernel] [k] __qdisc_run
0.41% [kernel] [k] validate_xmit_skb.isra.142
0.41% [kernel] [k] swiotlb_unmap_page
0.40% [kernel] [k] inet_lookup_ifaddr_rcu
0.34% [kernel] [k] ip_rcv_core.isra.20.constprop.25
0.34% [kernel] [k] tcp4_gro_receive
0.29% [kernel] [k] _raw_spin_lock_irqsave
0.29% [kernel] [k] napi_consume_skb
0.29% [kernel] [k] skb_gro_receive
0.29% [kernel] [k] ___slab_alloc.isra.80
0.27% [kernel] [k] eth_type_trans
0.26% [kernel] [k] __free_pages_ok
0.26% [kernel] [k] __get_xps_queue_idx
0.24% [kernel] [k] _raw_spin_trylock
0.23% [kernel] [k] __local_bh_enable_ip
0.22% [kernel] [k] pfifo_fast_enqueue
0.21% [kernel] [k] tasklet_action_common.isra.21
0.21% [kernel] [k] sch_direct_xmit
0.21% [kernel] [k] skb_network_protocol
0.21% [kernel] [k] kmem_cache_free
0.20% [kernel] [k] netdev_pick_tx
0.18% [kernel] [k] napi_gro_complete
0.18% [kernel] [k] __sched_text_start
0.18% [kernel] [k] mlx5e_xdp_handle
0.17% [kernel] [k] ip_finish_output
0.16% [kernel] [k] napi_gro_flush
0.16% [kernel] [k] vlan_passthru_hard_header
0.16% [kernel] [k] skb_segment
0.15% [kernel] [k] __alloc_pages_nodemask
0.15% [kernel] [k] mlx5e_features_check
0.15% [kernel] [k] mlx5e_napi_poll
0.15% [kernel] [k] napi_gro_receive
0.14% [kernel] [k] fib_validate_source
0.14% [kernel] [k] _raw_spin_lock_irq
0.14% [kernel] [k] inet_gro_complete
0.14% [kernel] [k] get_partial_node.isra.78
0.13% [kernel] [k] napi_complete_done
0.13% [kernel] [k] ip_rcv_finish_core.isra.17
0.13% [kernel] [k] cmd_exec
After the "order-0 pages" patch:
PerfTop: 104692 irqs/sec kernel:99.5% exact: 0.0% [4000Hz
cycles], (all, 56 CPUs)
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
9.06% [kernel] [k] mlx5e_skb_from_cqe_mpwrq_linear
6.43% [kernel] [k] tasklet_action_common.isra.21
5.68% [kernel] [k] fib_table_lookup
4.89% [kernel] [k] irq_entries_start
4.53% [kernel] [k] mlx5_eq_int
4.10% [kernel] [k] build_skb
3.39% [kernel] [k] mlx5e_poll_tx_cq
3.38% [kernel] [k] mlx5e_sq_xmit
2.73% [kernel] [k] mlx5e_poll_rx_cq
2.18% [kernel] [k] __dev_queue_xmit
2.13% [kernel] [k] vlan_do_receive
2.12% [kernel] [k] mlx5e_handle_rx_cqe_mpwrq
2.00% [kernel] [k] ip_finish_output2
1.87% [kernel] [k] mlx5e_post_rx_mpwqes
1.86% [kernel] [k] memcpy_erms
1.85% [kernel] [k] ipt_do_table
1.70% [kernel] [k] dev_gro_receive
1.39% [kernel] [k] __netif_receive_skb_core
1.31% [kernel] [k] inet_gro_receive
1.21% [kernel] [k] ip_route_input_rcu
1.21% [kernel] [k] tcp_gro_receive
1.13% [kernel] [k] _raw_spin_lock
1.08% [kernel] [k] __build_skb
1.06% [kernel] [k] kmem_cache_free_bulk
1.05% [kernel] [k] __softirqentry_text_start
1.03% [kernel] [k] vlan_dev_hard_start_xmit
0.98% [kernel] [k] pfifo_fast_dequeue
0.95% [kernel] [k] mlx5e_xmit
0.95% [kernel] [k] page_frag_free
0.88% [kernel] [k] ip_forward
0.81% [kernel] [k] dev_hard_start_xmit
0.78% [kernel] [k] rcu_irq_exit
0.77% [kernel] [k] netif_skb_features
0.72% [kernel] [k] napi_complete_done
0.72% [kernel] [k] kmem_cache_alloc
0.68% [kernel] [k] validate_xmit_skb.isra.142
0.66% [kernel] [k] ip_rcv_core.isra.20.constprop.25
0.58% [kernel] [k] swiotlb_map_page
0.57% [kernel] [k] __qdisc_run
0.56% [kernel] [k] tasklet_action
0.54% [kernel] [k] __get_xps_queue_idx
0.54% [kernel] [k] inet_lookup_ifaddr_rcu
0.50% [kernel] [k] tcp4_gro_receive
0.49% [kernel] [k] skb_release_data
0.47% [kernel] [k] eth_type_trans
0.40% [kernel] [k] sch_direct_xmit
0.40% [kernel] [k] net_rx_action
0.39% [kernel] [k] __local_bh_enable_ip
>> HW setup:
>>
>> Server (Supermicro SYS-1019P-WTR)
>>
>> 1x Intel 6146
>>
>> 2x Mellanox connect-x 5 (100G) (installed in two different x16 pcie
>> gen3.1 slots)
>>
>> 6x 8GB DDR4 2666 (it really matters cause 100G is about 12.5GB/s of
>> memory bandwidth one direction)
>>
>>
>> And here it is:
>>
>> perf top at 72Gbit.s RX and 72Gbit/s TX (at same time)
>>
>> PerfTop: 91202 irqs/sec kernel:99.7% exact: 100.0% [4000Hz
>> cycles:ppp], (all, 24 CPUs)
>> ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>>
>>
>> 7.56% [kernel] [k] __dev_queue_xmit
>> 5.27% [kernel] [k] build_skb
>> 4.41% [kernel] [k] rr_transmit
>> 4.17% [kernel] [k] fib_table_lookup
>> 3.83% [kernel] [k] mlx5e_skb_from_cqe_mpwrq_linear
>> 3.30% [kernel] [k] mlx5e_sq_xmit
>> 3.14% [kernel] [k] __netif_receive_skb_core
>> 2.48% [kernel] [k] netif_skb_features
>> 2.36% [kernel] [k] _raw_spin_trylock
>> 2.27% [kernel] [k] dev_hard_start_xmit
--
Paweł Staszewski
* Re: Linux kernel - 5.4.0+ (net-next from 27.11.2019) routing/network performance
2019-12-02 10:09 ` Paweł Staszewski
@ 2019-12-02 10:53 ` Paolo Abeni
2019-12-02 16:23 ` Paweł Staszewski
0 siblings, 1 reply; 6+ messages in thread
From: Paolo Abeni @ 2019-12-02 10:53 UTC (permalink / raw)
To: Paweł Staszewski, David Ahern, netdev, Jesper Dangaard Brouer
On Mon, 2019-12-02 at 11:09 +0100, Paweł Staszewski wrote:
> On 01.12.2019 at 17:05, David Ahern wrote:
> > On 11/29/19 4:00 PM, Paweł Staszewski wrote:
> > > As always - each year i need to summarize network performance for
> > > routing applications like linux router on native Linux kernel (without
> > > xdp/dpdk/vpp etc) :)
> > >
> > Do you keep past profiles? How does this profile (and traffic rates)
> > compare to older kernels - e.g., 5.0 or 4.19?
> >
> >
> Yes - so for 4.19:
>
> Max bandwidth was about 40-42Gbit/s RX / 40-42Gbit/s TX of
> forwarded(routed) traffic
>
> And after "order-0 pages" patches - max was 50Gbit/s RX + 50Gbit/s TX
> (forwarding - bandwidth max)
>
> (current kernel almost doubled this)
Looks like we are on the right track ;)
[...]
> After "order-0 pages" patch
>
> PerfTop: 104692 irqs/sec kernel:99.5% exact: 0.0% [4000Hz
> cycles], (all, 56 CPUs)
> ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>
>
> 9.06% [kernel] [k] mlx5e_skb_from_cqe_mpwrq_linear
> 6.43% [kernel] [k] tasklet_action_common.isra.21
> 5.68% [kernel] [k] fib_table_lookup
> 4.89% [kernel] [k] irq_entries_start
> 4.53% [kernel] [k] mlx5_eq_int
> 4.10% [kernel] [k] build_skb
> 3.39% [kernel] [k] mlx5e_poll_tx_cq
> 3.38% [kernel] [k] mlx5e_sq_xmit
> 2.73% [kernel] [k] mlx5e_poll_rx_cq
Compared to the current kernel perf figures, it looks like most of the
gains come from driver changes.
[... current perf figures follow ...]
> ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>
>
> 7.56% [kernel] [k] __dev_queue_xmit
This is a bit surprising to me. I guess this is due to
'__dev_queue_xmit()' being called twice per packet (team, NIC) and to
the retpoline overhead.
> 1.74% [kernel] [k] tcp_gro_receive
If the reference use-case has a quite large number of concurrent
flows, I guess you can try disabling GRO.
Cheers,
Paolo
* Re: Linux kernel - 5.4.0+ (net-next from 27.11.2019) routing/network performance
2019-12-02 10:53 ` Paolo Abeni
@ 2019-12-02 16:23 ` Paweł Staszewski
0 siblings, 0 replies; 6+ messages in thread
From: Paweł Staszewski @ 2019-12-02 16:23 UTC (permalink / raw)
To: Paolo Abeni, David Ahern, netdev, Jesper Dangaard Brouer
On 02.12.2019 at 11:53, Paolo Abeni wrote:
> On Mon, 2019-12-02 at 11:09 +0100, Paweł Staszewski wrote:
>> On 01.12.2019 at 17:05, David Ahern wrote:
>>> On 11/29/19 4:00 PM, Paweł Staszewski wrote:
>>>> As always - each year i need to summarize network performance for
>>>> routing applications like linux router on native Linux kernel (without
>>>> xdp/dpdk/vpp etc) :)
>>>>
>>> Do you keep past profiles? How does this profile (and traffic rates)
>>> compare to older kernels - e.g., 5.0 or 4.19?
>>>
>>>
>> Yes - so for 4.19:
>>
>> Max bandwidth was about 40-42Gbit/s RX / 40-42Gbit/s TX of
>> forwarded(routed) traffic
>>
>> And after "order-0 pages" patches - max was 50Gbit/s RX + 50Gbit/s TX
>> (forwarding - bandwidth max)
>>
>> (current kernel almost doubled this)
> Looks like we are on the good track ;)
>
> [...]
>> After "order-0 pages" patch
>>
>> PerfTop: 104692 irqs/sec kernel:99.5% exact: 0.0% [4000Hz
>> cycles], (all, 56 CPUs)
>> ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>>
>>
>> 9.06% [kernel] [k] mlx5e_skb_from_cqe_mpwrq_linear
>> 6.43% [kernel] [k] tasklet_action_common.isra.21
>> 5.68% [kernel] [k] fib_table_lookup
>> 4.89% [kernel] [k] irq_entries_start
>> 4.53% [kernel] [k] mlx5_eq_int
>> 4.10% [kernel] [k] build_skb
>> 3.39% [kernel] [k] mlx5e_poll_tx_cq
>> 3.38% [kernel] [k] mlx5e_sq_xmit
>> 2.73% [kernel] [k] mlx5e_poll_rx_cq
> Compared to the current kernel perf figures, it looks like most of the
> gains come from driver changes.
>
> [... current perf figures follow ...]
>> ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>>
>>
>> 7.56% [kernel] [k] __dev_queue_xmit
> This is a bit surprising to me. I guess this is due
> '__dev_queue_xmit()' being calling twice per packet (team, NIC) and due
> to the retpoline overhead.
>
>> 1.74% [kernel] [k] tcp_gro_receive
> If the reference use-case is with a quite large number of cuncurrent
> flows, I guess you can try disabling GRO
Disabling GRO with teamed interfaces is not good, because after disabling
GRO on the physical interfaces, CPU load is about 10% higher on all cores.
And an observation - with GRO enabled on the physical interfaces, team0 throughput:

 iface                    Rx                  Tx                Total
==============================================================================
 team0:      5952483.50 KB/s     6028436.50 KB/s     11980919.00 KB/s
----------------------------------------------------------------------------
And softnet stats:

CPU        total/sec  dropped/sec  squeezed/sec  collision/sec  rx_rps/sec  flow_limit/sec
CPU:00       1014977            0            35              0           0               0
CPU:01       1074461            0            30              0           0               0
CPU:02       1020460            0            34              0           0               0
CPU:03       1077624            0            34              0           0               0
CPU:04       1005102            0            32              0           0               0
CPU:05       1097107            0            46              0           0               0
CPU:06        997877            0            24              0           0               0
CPU:07       1056216            0            34              0           0               0
CPU:08        856567            0            34              0           0               0
CPU:09        862527            0            23              0           0               0
CPU:10        876107            0            34              0           0               0
CPU:11        759275            0            27              0           0               0
CPU:12        817307            0            27              0           0               0
CPU:13        868073            0            21              0           0               0
CPU:14        837783            0            34              0           0               0
CPU:15        817946            0            27              0           0               0
CPU:16        785500            0            25              0           0               0
CPU:17        851276            0            28              0           0               0
CPU:18        843888            0            29              0           0               0
CPU:19        924840            0            34              0           0               0
CPU:20        884879            0            37              0           0               0
CPU:21        841461            0            28              0           0               0
CPU:22        819436            0            32              0           0               0
CPU:23        872843            0            32              0           0               0
Summed:     21863531            0           740              0           0               0
With GRO disabled on the physical interfaces, team0 throughput:

 iface                    Rx                  Tx                Total
==============================================================================
 team0:      5952483.50 KB/s     6028436.50 KB/s     11980919.00 KB/s
----------------------------------------------------------------------------
And softnet stats:

CPU        total/sec  dropped/sec  squeezed/sec  collision/sec  rx_rps/sec  flow_limit/sec
CPU:00        625288            0            23              0           0               0
CPU:01        605239            0            24              0           0               0
CPU:02        644965            0            26              0           0               0
CPU:03        620264            0            30              0           0               0
CPU:04        603416            0            25              0           0               0
CPU:05        597838            0            23              0           0               0
CPU:06        580028            0            22              0           0               0
CPU:07        604274            0            23              0           0               0
CPU:08        556119            0            26              0           0               0
CPU:09        494997            0            23              0           0               0
CPU:10        514759            0            23              0           0               0
CPU:11        500333            0            22              0           0               0
CPU:12        497956            0            23              0           0               0
CPU:13        535194            0            14              0           0               0
CPU:14        504304            0            24              0           0               0
CPU:15        489015            0            18              0           0               0
CPU:16        487249            0            24              0           0               0
CPU:17        472023            0            23              0           0               0
CPU:18        539454            0            24              0           0               0
CPU:19        499901            0            19              0           0               0
CPU:20        479945            0            26              0           0               0
CPU:21        486800            0            29              0           0               0
CPU:22        466916            0            26              0           0               0
CPU:23        559730            0            34              0           0               0
Summed:     12966008            0           573              0           0               0
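For reference, the per-CPU tables above look like output from a tool reading /proc/net/softnet_stat; summing the first column (packets processed, in hex) yourself takes a few lines of bash - a sketch, with two sample lines standing in for the real file:

```shell
# Sum the first hex column (packets processed) of /proc/net/softnet_stat.
# The here-doc stands in for the real file; on a live box read from
# /proc/net/softnet_stat instead.
total=0
while read -r first _; do
    total=$(( total + 16#$first ))   # 16# : bash hex-to-decimal
done <<'EOF'
0000000a 00000000 00000001
00000014 00000000 00000002
EOF
echo "processed total: $total"
# -> processed total: 30
```

Note these counters are cumulative, so per-second rates like the tables above need two samples and a delta.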
Maybe without team it would be better.
>
> Cheers,
>
> Paolo
>
--
Paweł Staszewski
end of thread, other threads:[~2019-12-02 16:24 UTC | newest]
2019-11-29 22:00 Linux kernel - 5.4.0+ (net-next from 27.11.2019) routing/network performance Paweł Staszewski
2019-11-29 22:13 ` Paweł Staszewski
2019-12-01 16:05 ` David Ahern
2019-12-02 10:09 ` Paweł Staszewski
2019-12-02 10:53 ` Paolo Abeni
2019-12-02 16:23 ` Paweł Staszewski