From: "Paweł Staszewski" <pstaszewski@itcare.pl>
To: Paolo Abeni <pabeni@redhat.com>, David Ahern <dsahern@gmail.com>,
	netdev@vger.kernel.org,
	Jesper Dangaard Brouer <brouer@redhat.com>
Subject: Re: Linux kernel - 5.4.0+ (net-next from 27.11.2019) routing/network performance
Date: Mon, 2 Dec 2019 17:23:27 +0100	[thread overview]
Message-ID: <38f3e518-f41d-8e03-a6b9-7337566b8e04@itcare.pl> (raw)
In-Reply-To: <9c5c6dc9b7eb78c257d67c85ed2a6e0998ec8907.camel@redhat.com>


On 02.12.2019 at 11:53, Paolo Abeni wrote:
> On Mon, 2019-12-02 at 11:09 +0100, Paweł Staszewski wrote:
>> On 01.12.2019 at 17:05, David Ahern wrote:
>>> On 11/29/19 4:00 PM, Paweł Staszewski wrote:
>>>> As always - each year I need to summarize network performance for
>>>> routing applications like a Linux router on a native Linux kernel
>>>> (without XDP/DPDK/VPP etc.) :)
>>>>
>>> Do you keep past profiles? How does this profile (and traffic rates)
>>> compare to older kernels - e.g., 5.0 or 4.19?
>>>
>>>
>> Yes - so for 4.19:
>>
>> Max bandwidth was about 40-42 Gbit/s RX / 40-42 Gbit/s TX of
>> forwarded (routed) traffic.
>>
>> And after the "order-0 pages" patches, the max was 50 Gbit/s RX +
>> 50 Gbit/s TX (forwarding bandwidth max).
>>
>> (The current kernel has almost doubled this.)
> Looks like we are on the right track ;)
>
> [...]
>> After the "order-0 pages" patches:
>>
>>      PerfTop:  104692 irqs/sec  kernel:99.5%  exact: 0.0%  [4000Hz cycles],  (all, 56 CPUs)
>> -----------------------------------------------------------------------------
>>
>>        9.06%  [kernel]       [k] mlx5e_skb_from_cqe_mpwrq_linear
>>        6.43%  [kernel]       [k] tasklet_action_common.isra.21
>>        5.68%  [kernel]       [k] fib_table_lookup
>>        4.89%  [kernel]       [k] irq_entries_start
>>        4.53%  [kernel]       [k] mlx5_eq_int
>>        4.10%  [kernel]       [k] build_skb
>>        3.39%  [kernel]       [k] mlx5e_poll_tx_cq
>>        3.38%  [kernel]       [k] mlx5e_sq_xmit
>>        2.73%  [kernel]       [k] mlx5e_poll_rx_cq
> Compared to the current kernel perf figures, it looks like most of the
> gains come from driver changes.
>
> [... current perf figures follow ...]
>> -----------------------------------------------------------------------------
>>
>>        7.56%  [kernel]       [k] __dev_queue_xmit
> This is a bit surprising to me. I guess this is due to
> '__dev_queue_xmit()' being called twice per packet (team, NIC) and to
> the retpoline overhead.
>
>>        1.74%  [kernel]       [k] tcp_gro_receive
> If the reference use-case involves a quite large number of concurrent
> flows, I guess you can try disabling GRO.

Disabling GRO with teamed interfaces is not good: after disabling GRO
on the physical interfaces, CPU load is about 10% higher on all cores.
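
(For anyone reproducing this: GRO can be toggled per physical interface
with ethtool. A sketch; "enp175s0f0" is a placeholder for the actual
team port names:

  # show the current offload state of a team port
  ethtool -k enp175s0f0 | grep generic-receive-offload
  # disable / re-enable GRO on that port
  ethtool -K enp175s0f0 gro off
  ethtool -K enp175s0f0 gro on
)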

One observation, comparing team0 throughput and packets per second with
GRO enabled vs disabled on the physical interfaces.

With GRO enabled on the physical interfaces:

   iface                   Rx                   Tx                Total
==============================================================================
   team0:      5952483.50 KB/s      6028436.50 KB/s      11980919.00 KB/s
------------------------------------------------------------------------------

And the softnet stats:

CPU        total/sec   dropped/sec   squeezed/sec   collision/sec   rx_rps/sec   flow_limit/sec
CPU:00       1014977             0             35               0            0                0
CPU:01       1074461             0             30               0            0                0
CPU:02       1020460             0             34               0            0                0
CPU:03       1077624             0             34               0            0                0
CPU:04       1005102             0             32               0            0                0
CPU:05       1097107             0             46               0            0                0
CPU:06        997877             0             24               0            0                0
CPU:07       1056216             0             34               0            0                0
CPU:08        856567             0             34               0            0                0
CPU:09        862527             0             23               0            0                0
CPU:10        876107             0             34               0            0                0
CPU:11        759275             0             27               0            0                0
CPU:12        817307             0             27               0            0                0
CPU:13        868073             0             21               0            0                0
CPU:14        837783             0             34               0            0                0
CPU:15        817946             0             27               0            0                0
CPU:16        785500             0             25               0            0                0
CPU:17        851276             0             28               0            0                0
CPU:18        843888             0             29               0            0                0
CPU:19        924840             0             34               0            0                0
CPU:20        884879             0             37               0            0                0
CPU:21        841461             0             28               0            0                0
CPU:22        819436             0             32               0            0                0
CPU:23        872843             0             32               0            0                0

Summed:      21863531             0            740               0            0                0


With GRO disabled on the physical interfaces:

   iface                   Rx                   Tx                Total
==============================================================================
   team0:      5952483.50 KB/s      6028436.50 KB/s      11980919.00 KB/s
------------------------------------------------------------------------------

And the softnet stats:

CPU        total/sec   dropped/sec   squeezed/sec   collision/sec   rx_rps/sec   flow_limit/sec
CPU:00        625288             0             23               0            0                0
CPU:01        605239             0             24               0            0                0
CPU:02        644965             0             26               0            0                0
CPU:03        620264             0             30               0            0                0
CPU:04        603416             0             25               0            0                0
CPU:05        597838             0             23               0            0                0
CPU:06        580028             0             22               0            0                0
CPU:07        604274             0             23               0            0                0
CPU:08        556119             0             26               0            0                0
CPU:09        494997             0             23               0            0                0
CPU:10        514759             0             23               0            0                0
CPU:11        500333             0             22               0            0                0
CPU:12        497956             0             23               0            0                0
CPU:13        535194             0             14               0            0                0
CPU:14        504304             0             24               0            0                0
CPU:15        489015             0             18               0            0                0
CPU:16        487249             0             24               0            0                0
CPU:17        472023             0             23               0            0                0
CPU:18        539454             0             24               0            0                0
CPU:19        499901             0             19               0            0                0
CPU:20        479945             0             26               0            0                0
CPU:21        486800             0             29               0            0                0
CPU:22        466916             0             26               0            0                0
CPU:23        559730             0             34               0            0                0

Summed:      12966008             0            573               0            0                0
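
(Per-second rates like the above can be derived by sampling
/proc/net/softnet_stat one second apart. A minimal sketch, assuming
GNU awk for strtonum() and the usual column layout where columns 1-3
are the processed/dropped/time_squeeze hex counters:

  # sum the 'processed' counter (column 1, hex) across all CPUs,
  # sample twice one second apart, print the per-second delta
  t1=$(gawk '{ s += strtonum("0x" $1) } END { print s }' /proc/net/softnet_stat)
  sleep 1
  t2=$(gawk '{ s += strtonum("0x" $1) } END { print s }' /proc/net/softnet_stat)
  echo "total/sec: $((t2 - t1))"
)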

Maybe it will be better without the team.
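
(A quick way to test that, as a sketch; "enp175s0f0" again stands in
for one of the actual team ports: pull the port out of team0 with
iproute2 and route over it directly, which also removes the team
driver's extra __dev_queue_xmit() pass on TX for that port:

  # detach one port from team0 and use it as a plain routed interface
  ip link set dev enp175s0f0 nomaster
  ip link set dev enp175s0f0 up
  ip addr add 192.0.2.1/24 dev enp175s0f0    # example address from TEST-NET-1
)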

>
> Cheers,
>
> Paolo
>
-- 
Paweł Staszewski


Thread overview: 6+ messages
2019-11-29 22:00 Linux kernel - 5.4.0+ (net-next from 27.11.2019) routing/network performance Paweł Staszewski
2019-11-29 22:13 ` Paweł Staszewski
2019-12-01 16:05 ` David Ahern
2019-12-02 10:09   ` Paweł Staszewski
2019-12-02 10:53     ` Paolo Abeni
2019-12-02 16:23       ` Paweł Staszewski [this message]
