From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S932204AbbDMK4m (ORCPT ); Mon, 13 Apr 2015 06:56:42 -0400
Received: from mail-vn0-f42.google.com ([209.85.216.42]:35579 "EHLO
        mail-vn0-f42.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
        with ESMTP id S932140AbbDMK4a (ORCPT ); Mon, 13 Apr 2015 06:56:30 -0400
MIME-Version: 1.0
In-Reply-To:
References: <1428596218.25985.263.camel@edumazet-glaptop2.roam.corp.google.com>
Date: Mon, 13 Apr 2015 11:56:29 +0100
X-Google-Sender-Auth: ZTK2eZsCvR4_lkI_kzUykFnLguY
Message-ID:
Subject: Re: [Xen-devel] "tcp: refine TSO autosizing" causes performance regression on Xen
From: George Dunlap <George.Dunlap@eu.citrix.com>
To: Stefano Stabellini
Cc: Eric Dumazet, "xen-devel@lists.xensource.com", Wei Liu, Ian Campbell,
        netdev, Linux Kernel Mailing List, edumazet@google.com,
        linux-arm-kernel@lists.infradead.org, Christoffer Dall, David Vrabel,
        Jonathan Davies, Felipe Franciosi, Paul Durrant
Content-Type: text/plain; charset=UTF-8
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On Thu, Apr 9, 2015 at 5:36 PM, Stefano Stabellini wrote:
> On Thu, 9 Apr 2015, Eric Dumazet wrote:
>> On Thu, 2015-04-09 at 16:46 +0100, Stefano Stabellini wrote:
>> > Hi all,
>> >
>> > I found a performance regression when running netperf -t TCP_MAERTS from
>> > an external host to a Xen VM on ARM64: v3.19 and v4.0-rc4 running in the
>> > virtual machine are 30% slower than v3.18.
>> >
>> > Through bisection I found that the perf regression is caused by the
>> > presence of the following commit in the guest kernel:
>> >
>> >
>> > commit 605ad7f184b60cfaacbc038aa6c55ee68dee3c89
>> > Author: Eric Dumazet
>> > Date: Sun Dec 7 12:22:18 2014 -0800
>> >
>> > tcp: refine TSO autosizing

[snip]

>> This commit restored original TCP Small Queue behavior, which is the
>> first step to fight bufferbloat.
>>
>> Some network drivers are known to be problematic because of a delayed TX
>> completion.

[snip]

>> Try to tweak /proc/sys/net/ipv4/tcp_limit_output_bytes to see if it
>> makes a difference?
>
> A very big difference:
>
> echo 262144 > /proc/sys/net/ipv4/tcp_limit_output_bytes
> brings us much closer to the original performance, the slowdown is just
> 8%
>
> echo 1048576 > /proc/sys/net/ipv4/tcp_limit_output_bytes
> fills the gap entirely, same performance as before "refine TSO
> autosizing"
>
>
> What would be the next step here? Should I just document this as an
> important performance tweaking step for Xen, or is there something else
> we can do?

Is the problem perhaps that netback/netfront delays TX completion?

Would it be better to see if that can be addressed properly, so that
the original purpose of the patch (fighting bufferbloat) can be
achieved while not degrading performance for Xen?  Or at least, so
that people get decent performance out of the box without having to
tweak TCP parameters?

 -George
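
For anyone reproducing the comparison quoted above, here is a minimal shell sketch that applies a tcp_limit_output_bytes value for the duration of one command and then restores the previous setting, so a tuned value is not left behind in the guest. Only the sysctl path and the values tried (262144, 1048576) come from the thread. The function names and the TSQ_SYSCTL override variable are hypothetical, introduced here for illustration, and writing to the real path needs root.

```shell
#!/bin/sh
# Sketch only: the helper names and TSQ_SYSCTL are assumptions, not kernel tooling.
# TSQ_SYSCTL defaults to the sysctl named in the thread; overriding it lets the
# helpers be exercised against a scratch file without root.
: "${TSQ_SYSCTL:=/proc/sys/net/ipv4/tcp_limit_output_bytes}"

# Print the current limit in bytes.
get_tsq_limit() {
    cat "$TSQ_SYSCTL"
}

# Write a new limit; reject anything that is not a non-negative decimal integer.
set_tsq_limit() {
    case "$1" in
        ''|*[!0-9]*) echo "usage: set_tsq_limit BYTES" >&2; return 1 ;;
    esac
    printf '%s\n' "$1" > "$TSQ_SYSCTL"
}

# Usage: with_tsq_limit BYTES command [args...]
# Runs the command with the limit set to BYTES, then restores the old value
# and returns the command's exit status.
with_tsq_limit() {
    new_limit=$1
    shift
    old_limit=$(get_tsq_limit) || return 1
    set_tsq_limit "$new_limit" || return 1
    "$@"
    status=$?
    set_tsq_limit "$old_limit"
    return "$status"
}
```

Since the TCP_MAERTS run is driven from the external host, the wrapped command in the guest can be whatever waits for that run to finish, for example `with_tsq_limit 1048576 sleep 120`. This only automates the workaround; it does not address the question of whether netback/netfront delays TX completion.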