linux-kernel.vger.kernel.org archive mirror
From: Greg KH <gregkh@linuxfoundation.org>
To: linux-kernel@vger.kernel.org, stable@vger.kernel.org
Cc: torvalds@linux-foundation.org, akpm@linux-foundation.org,
	alan@lxorguk.ukuu.org.uk, Eric Dumazet <edumazet@google.com>,
	Neal Cardwell <ncardwell@google.com>,
	Tom Herbert <therbert@google.com>,
	Yuchung Cheng <ycheng@google.com>,
	"David S. Miller" <davem@davemloft.net>
Subject: [ 31/52] tcp: change tcp_adv_win_scale and tcp_rmem[2]
Date: Thu, 10 May 2012 10:32:03 -0700
Message-ID: <20120510173135.615265392@linuxfoundation.org>
In-Reply-To: <20120510173229.GA5678@kroah.com>

3.3-stable review patch.  If anyone has any objections, please let me know.

------------------


From: Eric Dumazet <edumazet@google.com>

[ Upstream commit b49960a05e32121d29316cfdf653894b88ac9190 ]

The default value of tcp_adv_win_scale is 2, meaning we expect a good
citizen skb to have an skb->len / skb->truesize ratio of 75% (3/4).

In 2.6 kernels we (mis)accounted a typical MSS=1460 frame as:
1536 + 64 + 256 = 1856 'estimated truesize', and 1856 * 3/4 = 1392.
So these skbs were not considered bloated, because 1460 >= 1392.

With the recent truesize fixes, the truesize of a typical MSS=1460 frame
is now the more precise:
2048 + 256 = 2304. But 2304 * 3/4 = 1728.
So these skbs are no longer good citizens, because 1460 < 1728.

(GRO can escape this problem because it builds skbs with too low a
truesize.)
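
The arithmetic above can be reproduced with a small user-space sketch.
This is not kernel code: expected_payload() and skb_is_good_citizen()
are hypothetical helper names chosen for illustration, and the sketch
only covers the tcp_adv_win_scale > 0 case discussed here.

```c
#include <stdbool.h>

/*
 * Payload a "good citizen" skb is expected to carry for a given truesize,
 * assuming scale > 0: truesize - truesize / 2^scale.
 * With scale = 2 this is 3/4 of truesize, with scale = 1 it is 1/2.
 * (Hypothetical helper, not a kernel function.)
 */
static unsigned int expected_payload(unsigned int truesize, int scale)
{
	return truesize - (truesize >> scale);
}

/* True when the skb carries at least the expected share of payload. */
static bool skb_is_good_citizen(unsigned int len, unsigned int truesize,
				int scale)
{
	return len >= expected_payload(truesize, scale);
}
```

With scale = 2, an MSS=1460 frame passes against the old 1856 estimate
(threshold 1392) but fails against the 2304 truesize (threshold 1728).
With scale = 1 the threshold for 2304 drops to 1152 and the frame passes
again.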

This also means TCP advertises too optimistic a window for a given
allocated rcvspace: when receiving frames, sk_rmem_alloc can hit the
sk_rcvbuf limit, and we call tcp_prune_queue()/tcp_collapse() too often,
especially when the application is slow to drain its receive queue or in
case of losses (netperf is fast, scp is slow). This is a major source of
latency.

We should adjust the len/truesize ratio to 50% instead of 75%.

This patch:

1) changes the tcp_adv_win_scale default from 2 to 1

2) increases the tcp_rmem[2] limit from 4MB to 6MB to take into account
better truesize tracking and to allow the autotuned TCP receive window
to reach the same value as before. Note that the same amount of kernel
memory is consumed compared to 2.6 kernels.
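
To see why the two changes cancel out for the maximum window, here is a
minimal user-space sketch of the window computation. It follows the
formula documented for tcp_adv_win_scale in ip-sysctl.txt; the function
name win_from_space() and its explicit adv_win_scale parameter are
assumptions made for this illustration, not the kernel's own interface.

```c
/*
 * Window advertised for a given amount of receive buffer space,
 * following the ip-sysctl.txt description of tcp_adv_win_scale:
 *   scale > 0:  space - space / 2^scale
 *   scale <= 0: space / 2^(-scale)
 * (Illustrative stand-in; the parameter replaces the global sysctl.)
 */
static int win_from_space(int space, int adv_win_scale)
{
	return adv_win_scale <= 0 ?
		(space >> (-adv_win_scale)) :
		space - (space >> adv_win_scale);
}
```

Under these assumptions, a 4MB buffer with scale 2 and a 6MB buffer with
scale 1 both yield a 3MB window, which matches the claim that autotuning
can reach the same window as before.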

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Cc: Tom Herbert <therbert@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 Documentation/networking/ip-sysctl.txt |    4 ++--
 net/ipv4/tcp.c                         |    9 +++++----
 net/ipv4/tcp_input.c                   |    2 +-
 3 files changed, 8 insertions(+), 7 deletions(-)

--- a/Documentation/networking/ip-sysctl.txt
+++ b/Documentation/networking/ip-sysctl.txt
@@ -147,7 +147,7 @@ tcp_adv_win_scale - INTEGER
 	(if tcp_adv_win_scale > 0) or bytes-bytes/2^(-tcp_adv_win_scale),
 	if it is <= 0.
 	Possible values are [-31, 31], inclusive.
-	Default: 2
+	Default: 1
 
 tcp_allowed_congestion_control - STRING
 	Show/set the congestion control choices available to non-privileged
@@ -410,7 +410,7 @@ tcp_rmem - vector of 3 INTEGERs: min, de
 	net.core.rmem_max.  Calling setsockopt() with SO_RCVBUF disables
 	automatic tuning of that socket's receive buffer size, in which
 	case this value is ignored.
-	Default: between 87380B and 4MB, depending on RAM size.
+	Default: between 87380B and 6MB, depending on RAM size.
 
 tcp_sack - BOOLEAN
 	Enable select acknowledgments (SACKS).
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -3240,7 +3240,7 @@ void __init tcp_init(void)
 {
 	struct sk_buff *skb = NULL;
 	unsigned long limit;
-	int max_share, cnt;
+	int max_rshare, max_wshare, cnt;
 	unsigned int i;
 	unsigned long jiffy = jiffies;
 
@@ -3300,15 +3300,16 @@ void __init tcp_init(void)
 	tcp_init_mem(&init_net);
 	/* Set per-socket limits to no more than 1/128 the pressure threshold */
 	limit = nr_free_buffer_pages() << (PAGE_SHIFT - 7);
-	max_share = min(4UL*1024*1024, limit);
+	max_wshare = min(4UL*1024*1024, limit);
+	max_rshare = min(6UL*1024*1024, limit);
 
 	sysctl_tcp_wmem[0] = SK_MEM_QUANTUM;
 	sysctl_tcp_wmem[1] = 16*1024;
-	sysctl_tcp_wmem[2] = max(64*1024, max_share);
+	sysctl_tcp_wmem[2] = max(64*1024, max_wshare);
 
 	sysctl_tcp_rmem[0] = SK_MEM_QUANTUM;
 	sysctl_tcp_rmem[1] = 87380;
-	sysctl_tcp_rmem[2] = max(87380, max_share);
+	sysctl_tcp_rmem[2] = max(87380, max_rshare);
 
 	printk(KERN_INFO "TCP: Hash tables configured "
 	       "(established %u bind %u)\n",
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -83,7 +83,7 @@ int sysctl_tcp_ecn __read_mostly = 2;
 EXPORT_SYMBOL(sysctl_tcp_ecn);
 int sysctl_tcp_dsack __read_mostly = 1;
 int sysctl_tcp_app_win __read_mostly = 31;
-int sysctl_tcp_adv_win_scale __read_mostly = 2;
+int sysctl_tcp_adv_win_scale __read_mostly = 1;
 EXPORT_SYMBOL(sysctl_tcp_adv_win_scale);
 
 int sysctl_tcp_stdurg __read_mostly;
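
For readers who want to check what tcp_rmem[2] default a given machine
would get after this change, the tcp_init() hunk above can be modelled
in user space as follows. This is only a sketch: rmem_max_default() is a
hypothetical name, and the number of free buffer pages and the page
shift are passed in as parameters instead of coming from
nr_free_buffer_pages() and PAGE_SHIFT.

```c
/*
 * Model of the patched tcp_init() logic for sysctl_tcp_rmem[2]:
 *   limit      = free_buffer_pages << (page_shift - 7)
 *   max_rshare = min(6MB, limit)
 *   rmem[2]    = max(87380, max_rshare)
 * (Hypothetical helper; inputs are assumptions supplied by the caller.)
 */
static unsigned long rmem_max_default(unsigned long free_buffer_pages,
				      unsigned int page_shift)
{
	unsigned long limit = free_buffer_pages << (page_shift - 7);
	unsigned long cap = 6UL * 1024 * 1024;
	unsigned long max_rshare = limit < cap ? limit : cap;

	return max_rshare > 87380UL ? max_rshare : 87380UL;
}
```

For example, with 4KB pages (page_shift 12), 262144 free buffer pages
give a limit of 8MB, so the result is capped at 6MB, while a very small
machine with 1024 free buffer pages falls back to the 87380-byte floor.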



