From: "Paweł Staszewski" <pstaszewski@itcare.pl>
To: netdev@vger.kernel.org
Subject: Re: Linux kernel - 5.4.0+ (net-next from 27.11.2019) routing/network performance
Date: Fri, 29 Nov 2019 23:13:49 +0100
Message-ID: <55238597-f966-13aa-2dae-4fde19456254@itcare.pl>
In-Reply-To: <81ad4acf-c9b4-b2e8-d6b1-7e1245bce8a5@itcare.pl>


On 29.11.2019 at 23:00, Paweł Staszewski wrote:
> As always, each year I need to summarize network performance for 
> routing applications like a Linux router on the native Linux kernel 
> (without XDP/DPDK/VPP etc.) :)
>
> HW setup:
>
> Server (Supermicro SYS-1019P-WTR)
>
> 1x Intel Xeon Gold 6146
>
> 2x Mellanox ConnectX-5 (100G) (installed in two different x16 PCIe 
> gen3.1 slots)
>
> 6x 8GB DDR4-2666 (it really matters, because 100G is about 12.5GB/s of 
> memory bandwidth in one direction)
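> (Worked out: 100 Gbit/s / 8 = 12.5 GB/s; and since every forwarded byte 
> is DMA-written into memory on RX and DMA-read back out on TX, the memory 
> subsystem sees at least twice the wire rate.)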
>
>
> And here it is:
>
> perf top at 72Gbit/s RX and 72Gbit/s TX (at the same time)
>
>    PerfTop:   91202 irqs/sec  kernel:99.7%  exact: 100.0% [4000Hz 
> cycles:ppp],  (all, 24 CPUs)
> --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- 
>
>
>      7.56%  [kernel]       [k] __dev_queue_xmit
>      5.27%  [kernel]       [k] build_skb
>      4.41%  [kernel]       [k] rr_transmit
>      4.17%  [kernel]       [k] fib_table_lookup
>      3.83%  [kernel]       [k] mlx5e_skb_from_cqe_mpwrq_linear
>      3.30%  [kernel]       [k] mlx5e_sq_xmit
>      3.14%  [kernel]       [k] __netif_receive_skb_core
>      2.48%  [kernel]       [k] netif_skb_features
>      2.36%  [kernel]       [k] _raw_spin_trylock
>      2.27%  [kernel]       [k] dev_hard_start_xmit
>      2.26%  [kernel]       [k] dev_gro_receive
>      2.20%  [kernel]       [k] mlx5e_handle_rx_cqe_mpwrq
>      1.92%  [kernel]       [k] mlx5_eq_comp_int
>      1.91%  [kernel]       [k] mlx5e_poll_tx_cq
>      1.74%  [kernel]       [k] tcp_gro_receive
>      1.68%  [kernel]       [k] memcpy_erms
>      1.64%  [kernel]       [k] kmem_cache_free_bulk
>      1.57%  [kernel]       [k] inet_gro_receive
>      1.55%  [kernel]       [k] netdev_pick_tx
>      1.52%  [kernel]       [k] ip_forward
>      1.45%  [kernel]       [k] team_xmit
>      1.40%  [kernel]       [k] vlan_do_receive
>      1.37%  [kernel]       [k] team_handle_frame
>      1.36%  [kernel]       [k] __build_skb
>      1.33%  [kernel]       [k] ipt_do_table
>      1.33%  [kernel]       [k] mlx5e_poll_rx_cq
>      1.28%  [kernel]       [k] ip_finish_output2
>      1.26%  [kernel]       [k] vlan_passthru_hard_header
>      1.20%  [kernel]       [k] netdev_core_pick_tx
>      0.93%  [kernel]       [k] ip_rcv_core.isra.22.constprop.27
>      0.87%  [kernel]       [k] validate_xmit_skb.isra.148
>      0.87%  [kernel]       [k] ip_route_input_rcu
>      0.78%  [kernel]       [k] kmem_cache_alloc
>      0.77%  [kernel]       [k] mlx5e_handle_rx_dim
>      0.71%  [kernel]       [k] iommu_need_mapping
>      0.69%  [kernel]       [k] tasklet_action_common.isra.21
>      0.66%  [kernel]       [k] mlx5e_xmit
>      0.65%  [kernel]       [k] mlx5e_post_rx_mpwqes
>      0.63%  [kernel]       [k] _raw_spin_lock
>      0.61%  [kernel]       [k] ip_sublist_rcv
>      0.57%  [kernel]       [k] skb_release_data
>      0.53%  [kernel]       [k] __local_bh_enable_ip
>      0.53%  [kernel]       [k] tcp4_gro_receive
>      0.51%  [kernel]       [k] pfifo_fast_dequeue
>      0.51%  [kernel]       [k] page_frag_free
>      0.50%  [kernel]       [k] kmem_cache_free
>      0.47%  [kernel]       [k] dma_direct_map_page
>      0.45%  [kernel]       [k] native_irq_return_iret
>      0.44%  [kernel]       [k] __slab_free.isra.89
>      0.43%  [kernel]       [k] skb_gro_receive
>      0.43%  [kernel]       [k] napi_gro_receive
>      0.43%  [kernel]       [k] __do_softirq
>      0.41%  [kernel]       [k] sch_direct_xmit
>      0.41%  [kernel]       [k] ip_rcv_finish_core.isra.19
>      0.40%  [kernel]       [k] skb_network_protocol
>      0.40%  [kernel]       [k] __get_xps_queue_idx
>
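> For reference, a profile like the one above can be captured with 
> something like the following (the event and sampling rate are inferred 
> from the "[4000Hz cycles:ppp]" header; perf top samples all CPUs by 
> default):
>
> perf top -e cycles:ppp -F 4000
>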
>
> I'm using team (2x 100G LAG) - that is why there is some load here:
>
>      4.41%  [kernel]       [k] rr_transmit
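>
> A minimal sketch of how such a round-robin LAG might be set up with 
> teamd (the team device name and exact config are my assumption - the 
> original setup isn't shown here):
>
> teamd -t team0 -d -c '{"runner": {"name": "roundrobin"}}'
> ip link set enp179s0f0 down
> teamdctl team0 port add enp179s0f0
> ip link set enp179s0f1 down
> teamdctl team0 port add enp179s0f1
> ip link set team0 up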
>
>
>
> No discards on interfaces:
>
> ethtool -S enp179s0f0 | grep disc
>      rx_discards_phy: 0
>      tx_discards_phy: 0
>
> ethtool -S enp179s0f1 | grep disc
>      rx_discards_phy: 0
>      tx_discards_phy: 0
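>
> When chasing the drops described further below, it can also help to 
> watch the mlx5 out-of-buffer counter, which (assuming the driver exposes 
> it as rx_out_of_buffer) separates host-side buffer exhaustion from 
> wire-level discards:
>
> ethtool -S enp179s0f0 | grep -E 'discard|out_of_buffer'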
>
> STREAM memory bandwidth test while carrying 72G/72G traffic:
>
> -------------------------------------------------------------
> Function    Best Rate MB/s  Avg time     Min time     Max time
> Copy:           38948.8     0.004368     0.004108     0.004533
> Scale:          37914.6     0.004473     0.004220     0.004802
> Add:            43134.6     0.005801     0.005564     0.006086
> Triad:          42934.1     0.005696     0.005590     0.005901
> -------------------------------------------------------------
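>
> These results look like McCalpin's STREAM benchmark; a typical 
> build-and-run sketch (the array size and thread count are my 
> assumptions, not taken from the original run):
>
> gcc -O3 -fopenmp -DSTREAM_ARRAY_SIZE=80000000 stream.c -o stream
> OMP_NUM_THREADS=12 ./stream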
>
>
> And some links to screenshots
>
> Softirqs
>
> https://pasteboard.co/IIZkGrw.png
>
> And bandwidth / CPU / pps graphs
>
> https://pasteboard.co/IIZl6XP.png
>
>
> Currently it looks like the biggest problem for 100G is CPU->mem->NIC 
> bandwidth, or the NIC doorbell / page cache at RX processing - because 
> what I can see is that if I run iperf on this host I can TX a full 100G, 
> but I can't RX 100G when I flood this host from a packet generator (it 
> will start to drop packets at around 82Gbit/s) - and this is not a pps 
> problem but a bandwidth problem.
>
> For example, I can flood RX with 14Mpps of 64b packets without NIC 
> discards, but I can't flood it with 1000b frames at the same pps - 
> because when it reaches 82Gbit/s the NICs start to report discards.
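>
> Worked out: 14 Mpps x 64 B x 8 is only about 7.2 Gbit/s of frame data, 
> while at 1000 B frames 82 Gbit/s is roughly 10.25 Mpps - so the box 
> survives a higher packet rate with small frames and gives up at a lower 
> packet rate once byte throughput hits ~82 Gbit/s, pointing at bandwidth 
> rather than per-packet cost.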
>
>
> Thanks
>
>
>
Forgot to add: this is a forwarding scenario - the router is routing packets 
from one 100G interface to another 100G interface and vice versa (full 
BGP feed x4 from 4 different upstreams) - 700k+ flows.
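For scale, fib_table_lookup in the profile above is walking a FIB built 
from those four full feeds; a quick way to check its size on the router 
(illustrative command):

ip route show table main | wc -l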



