* Performance impact in networking data path tests in Linux 5.5 Kernel
From: Rajender M @ 2020-02-25  5:46 UTC
  To: linux-kernel, netdev
  Cc: Vincent Guittot, Peter Zijlstra, David S. Miller, Steven Rostedt

As part of VMware's performance regression testing for Linux kernel
upstream releases, we compared the Linux 5.5 kernel against the Linux
5.4 kernel and noticed a 20% improvement in networking throughput at
the cost of a 30% increase in CPU utilization.

After bisecting between 5.4 and 5.5, we identified the root cause of
this behaviour as a scheduling change from Vincent Guittot:
2ab4092fc82d ("sched/fair: Spread out tasks evenly when not overloaded").
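
The bisect followed the standard git procedure; a minimal sketch of the
driving step is below, with the build/boot/retest stage stubbed out since
it is specific to our harness (build_and_retest is a hypothetical
placeholder, not our actual tooling):

    # bisect_sketch.py - schematic of the bisect between v5.4 and v5.5.
    import subprocess

    def git(*args):
        subprocess.run(["git", *args], check=True)

    def build_and_retest():
        """Placeholder: rebuild the kernel at the current bisect point,
        boot the VM, rerun the TCP_STREAM tests, and return True if the
        throughput/CPU change is present."""
        raise NotImplementedError

    git("bisect", "start", "v5.5", "v5.4")  # changed kernel, then baseline
    # repeat until git reports the first behaviour-changing commit:
    git("bisect", "bad" if build_and_retest() else "good")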

The impacted test cases are TCP_STREAM SEND and RECV, with both small
(8K socket, 256B message) and large (64K socket, 16K message) packet
sizes.
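
For reference, a minimal sketch of how such cases could be driven with
netperf is below. This is an illustration only; the exact invocation,
netserver address, and option set used in our tests are assumptions:

    # drive_tcp_stream.py - hypothetical driver for the two TCP_STREAM
    # cases above; assumes netperf is installed and a netserver is up.
    import subprocess

    NETSERVER = "192.168.0.10"  # placeholder address, not from the tests

    CASES = {
        "small": {"socket": "8K", "message": "256"},
        "large": {"socket": "64K", "message": "16K"},
    }

    for name, c in CASES.items():
        # -s/-S set local/remote socket buffer sizes, -m the message size
        cmd = ["netperf", "-H", NETSERVER, "-t", "TCP_STREAM", "--",
               "-s", c["socket"], "-S", c["socket"], "-m", c["message"]]
        out = subprocess.run(cmd, capture_output=True, text=True)
        print(f"{name}: {out.stdout.strip()}")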

We backed out Vincent's commit and reran our networking tests;
performance was then similar to the 5.4 kernel, and the improvements
in the networking tests were gone.

In our current network performance testing, we use an Intel 10G NIC to
evaluate all Linux kernel releases. To confirm that the impact is also
seen with a higher-bandwidth NIC, we repeated the same test cases with
an Intel 40G NIC and reproduced the same behaviour: a 25% improvement
in throughput with 10% more CPU consumption.

The overall results indicate that the new scheduler change delivers
much better network throughput at the cost of incremental CPU usage.
This can be seen as expected behaviour: the TCP streams are now spread
evenly across all the CPUs, which drives more network packets through
at the cost of additional CPU consumption.
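
As a purely schematic toy (not the kernel's code), the contrast between
packing tasks onto a few CPUs and spreading them evenly can be sketched
as below; the per-CPU capacity of two busy tasks is an assumption made
only for illustration:

    # placement_toy.py - schematic contrast between packing tasks onto a
    # few CPUs and spreading them evenly across all of them.
    def packed(n_tasks, n_cpus, per_cpu=2):
        """Fill each CPU up to per_cpu tasks before using the next one."""
        return [min(per_cpu, max(0, n_tasks - cpu * per_cpu))
                for cpu in range(n_cpus)]

    def spread(n_tasks, n_cpus):
        """Give every CPU an even share of the tasks (round-robin)."""
        return [n_tasks // n_cpus + (1 if cpu < n_tasks % n_cpus else 0)
                for cpu in range(n_cpus)]

    print("packed:", packed(6, 4))  # [2, 2, 2, 0] - one CPU left idle
    print("spread:", spread(6, 4))  # [2, 2, 1, 1] - all CPUs share load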


We also confirmed this theory by parsing the ESX stats for the 5.4 and
5.5 kernels in a 4-vCPU VM running 8 TCP streams, as shown below:

5.4 kernel:
  "2132149": {"id": 2132149, "used": 94.37, "ready": 0.01, "cstp": 0.00, "name": "vmx-vcpu-0:rhel7x64-0",
  "2132151": {"id": 2132151, "used": 0.13, "ready": 0.00, "cstp": 0.00, "name": "vmx-vcpu-1:rhel7x64-0",
  "2132152": {"id": 2132152, "used": 9.07, "ready": 0.03, "cstp": 0.00, "name": "vmx-vcpu-2:rhel7x64-0",
  "2132153": {"id": 2132153, "used": 34.77, "ready": 0.01, "cstp": 0.00, "name": "vmx-vcpu-3:rhel7x64-0",

5.5 kernel:
  "2132041": {"id": 2132041, "used": 55.70, "ready": 0.01, "cstp": 0.00, "name": "vmx-vcpu-0:rhel7x64-0",
  "2132043": {"id": 2132043, "used": 47.53, "ready": 0.01, "cstp": 0.00, "name": "vmx-vcpu-1:rhel7x64-0",
  "2132044": {"id": 2132044, "used": 77.81, "ready": 0.00, "cstp": 0.00, "name": "vmx-vcpu-2:rhel7x64-0",
  "2132045": {"id": 2132045, "used": 57.11, "ready": 0.02, "cstp": 0.00, "name": "vmx-vcpu-3:rhel7x64-0",

Note, "used %" in above stats for 5.5 kernel is evenly distributed across all vCPUs. 

On the whole, this change should be seen as a significant improvement for 
most customers.

Rajender M
Performance Engineering
VMware, Inc.



* Re: Performance impact in networking data path tests in Linux 5.5 Kernel
From: Vincent Guittot @ 2020-02-26  9:48 UTC
  To: Rajender M
  Cc: linux-kernel, netdev, Peter Zijlstra, David S. Miller, Steven Rostedt

Hi Rajender,

On Tue, 25 Feb 2020 at 06:46, Rajender M <manir@vmware.com> wrote:
>
> As part of VMware's performance regression testing for Linux kernel
> upstream releases, we compared the Linux 5.5 kernel against the Linux
> 5.4 kernel and noticed a 20% improvement in networking throughput at
> the cost of a 30% increase in CPU utilization.

Thanks for testing and sharing the results with us. It's always
interesting to get feedback from various test cases.


* Re: Performance impact in networking data path tests in Linux 5.5 Kernel
From: Rajender M @ 2020-02-26 11:45 UTC
  To: Vincent Guittot
  Cc: linux-kernel, netdev, Peter Zijlstra, David S. Miller, Steven Rostedt

Thanks for your response, Vincent.
Just curious to know if there is any room for optimizing
the additional CPU cost.


On 26/02/20, 3:18 PM, "Vincent Guittot" <vincent.guittot@linaro.org> wrote:

    Hi Rajender,
    
    On Tue, 25 Feb 2020 at 06:46, Rajender M <manir@vmware.com> wrote:
    >
    > As part of VMware's performance regression testing for Linux kernel
    > upstream releases, we compared the Linux 5.5 kernel against the Linux
    > 5.4 kernel and noticed a 20% improvement in networking throughput at
    > the cost of a 30% increase in CPU utilization.
    
    Thanks for testing and sharing the results with us. It's always
    interesting to get feedback from various test cases.

* Re: Performance impact in networking data path tests in Linux 5.5 Kernel
From: Vincent Guittot @ 2020-02-26 14:10 UTC
  To: Rajender M
  Cc: linux-kernel, netdev, Peter Zijlstra, David S. Miller, Steven Rostedt

On Wed, 26 Feb 2020 at 12:45, Rajender M <manir@vmware.com> wrote:
>
> Thanks for your response, Vincent.
> Just curious to know if there is any room for optimizing
> the additional CPU cost.

That's difficult to say; the additional cost is probably linked to how
much the CPU is involved in the data path. IIUC your results, there is
+30% CPU for +20% throughput on the 10G NIC, but only +10% CPU for
+25% throughput on the 40G NIC, which might do more of the work in
hardware and so need less from the CPU.
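
To make that comparison concrete, a quick back-of-the-envelope
calculation of throughput gained per extra point of CPU, using the
deltas reported earlier in this thread, looks like this:

    # efficiency.py - throughput gain per percentage point of extra CPU,
    # from the deltas reported for the two NICs in this thread.
    DELTAS = {
        "10G": {"throughput": 20, "cpu": 30},
        "40G": {"throughput": 25, "cpu": 10},
    }

    for nic, d in DELTAS.items():
        ratio = d["throughput"] / d["cpu"]
        print(f"{nic}: {ratio:.2f}% throughput per % CPU")
    # prints 0.67 for 10G and 2.50 for 40G - consistent with the 40G NIC
    # offloading more of the work to hardware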

