All of lore.kernel.org
 help / color / mirror / Atom feed
From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	stable@vger.kernel.org, Eric Dumazet <edumazet@google.com>,
	Soheil Hassas Yeganeh <soheil@google.com>,
	Neal Cardwell <ncardwell@google.com>,
	Yuchung Cheng <ycheng@google.com>,
	"David S. Miller" <davem@davemloft.net>,
	Sasha Levin <sashal@kernel.org>
Subject: [PATCH 5.10 44/86] tcp: tweak len/truesize ratio for coalesce candidates
Date: Mon, 29 Aug 2022 12:59:10 +0200	[thread overview]
Message-ID: <20220829105758.337074804@linuxfoundation.org> (raw)
In-Reply-To: <20220829105756.500128871@linuxfoundation.org>

From: Eric Dumazet <edumazet@google.com>

[ Upstream commit 240bfd134c592791fdceba1ce7fc3f973c33df2d ]

tcp_grow_window() is using skb->len/skb->truesize to increase tp->rcv_ssthresh
which has a direct impact on advertized window sizes.

We added TCP coalescing in linux-3.4 & linux-3.5:

Instead of storing skbs with one or two MSS in receive queue (or OFO queue),
we try to append segments together to reduce memory overhead.

High performance network drivers tend to cook skb with 3 parts :

1) sk_buff structure (256 bytes)
2) skb->head contains room to copy headers as needed, and skb_shared_info
3) page fragment(s) containing the ~1514 bytes frame (or more depending on MTU)

Once coalesced into a previous skb, 1) and 2) are freed.

We can therefore tweak the way we compute len/truesize ratio knowing
that skb->truesize is inflated by 1) and 2) soon to be freed.

This is done only for in-order skb, or skb coalesced into OFO queue.

The result is that low rate flows no longer pay the memory price of having
low GRO aggregation factor. Same result for drivers not using GRO.

This is critical to allow a big enough receiver window,
typically tcp_rmem[2] / 2.

We have been using this at Google for about 5 years, it is due time
to make it upstream.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Soheil Hassas Yeganeh <soheil@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 net/ipv4/tcp_input.c | 38 ++++++++++++++++++++++++++++++--------
 1 file changed, 30 insertions(+), 8 deletions(-)

diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index d35e88b5ffcbe..33a3fb04ac4df 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -454,11 +454,12 @@ static void tcp_sndbuf_expand(struct sock *sk)
  */
 
 /* Slow part of check#2. */
-static int __tcp_grow_window(const struct sock *sk, const struct sk_buff *skb)
+static int __tcp_grow_window(const struct sock *sk, const struct sk_buff *skb,
+			     unsigned int skbtruesize)
 {
 	struct tcp_sock *tp = tcp_sk(sk);
 	/* Optimize this! */
-	int truesize = tcp_win_from_space(sk, skb->truesize) >> 1;
+	int truesize = tcp_win_from_space(sk, skbtruesize) >> 1;
 	int window = tcp_win_from_space(sk, sock_net(sk)->ipv4.sysctl_tcp_rmem[2]) >> 1;
 
 	while (tp->rcv_ssthresh <= window) {
@@ -471,7 +472,27 @@ static int __tcp_grow_window(const struct sock *sk, const struct sk_buff *skb)
 	return 0;
 }
 
-static void tcp_grow_window(struct sock *sk, const struct sk_buff *skb)
+/* Even if skb appears to have a bad len/truesize ratio, TCP coalescing
+ * can play nice with us, as sk_buff and skb->head might be either
+ * freed or shared with up to MAX_SKB_FRAGS segments.
+ * Only give a boost to drivers using page frag(s) to hold the frame(s),
+ * and if no payload was pulled in skb->head before reaching us.
+ */
+static u32 truesize_adjust(bool adjust, const struct sk_buff *skb)
+{
+	u32 truesize = skb->truesize;
+
+	if (adjust && !skb_headlen(skb)) {
+		truesize -= SKB_TRUESIZE(skb_end_offset(skb));
+		/* paranoid check, some drivers might be buggy */
+		if (unlikely((int)truesize < (int)skb->len))
+			truesize = skb->truesize;
+	}
+	return truesize;
+}
+
+static void tcp_grow_window(struct sock *sk, const struct sk_buff *skb,
+			    bool adjust)
 {
 	struct tcp_sock *tp = tcp_sk(sk);
 	int room;
@@ -480,15 +501,16 @@ static void tcp_grow_window(struct sock *sk, const struct sk_buff *skb)
 
 	/* Check #1 */
 	if (room > 0 && !tcp_under_memory_pressure(sk)) {
+		unsigned int truesize = truesize_adjust(adjust, skb);
 		int incr;
 
 		/* Check #2. Increase window, if skb with such overhead
 		 * will fit to rcvbuf in future.
 		 */
-		if (tcp_win_from_space(sk, skb->truesize) <= skb->len)
+		if (tcp_win_from_space(sk, truesize) <= skb->len)
 			incr = 2 * tp->advmss;
 		else
-			incr = __tcp_grow_window(sk, skb);
+			incr = __tcp_grow_window(sk, skb, truesize);
 
 		if (incr) {
 			incr = max_t(int, incr, 2 * skb->len);
@@ -782,7 +804,7 @@ static void tcp_event_data_recv(struct sock *sk, struct sk_buff *skb)
 	tcp_ecn_check_ce(sk, skb);
 
 	if (skb->len >= 128)
-		tcp_grow_window(sk, skb);
+		tcp_grow_window(sk, skb, true);
 }
 
 /* Called to compute a smoothed rtt estimate. The data fed to this
@@ -4761,7 +4783,7 @@ static void tcp_data_queue_ofo(struct sock *sk, struct sk_buff *skb)
 		 * and trigger fast retransmit.
 		 */
 		if (tcp_is_sack(tp))
-			tcp_grow_window(sk, skb);
+			tcp_grow_window(sk, skb, true);
 		kfree_skb_partial(skb, fragstolen);
 		skb = NULL;
 		goto add_sack;
@@ -4849,7 +4871,7 @@ static void tcp_data_queue_ofo(struct sock *sk, struct sk_buff *skb)
 		 * and trigger fast retransmit.
 		 */
 		if (tcp_is_sack(tp))
-			tcp_grow_window(sk, skb);
+			tcp_grow_window(sk, skb, false);
 		skb_condense(skb);
 		skb_set_owner_r(skb, sk);
 	}
-- 
2.35.1




  parent reply	other threads:[~2022-08-29 11:16 UTC|newest]

Thread overview: 96+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-08-29 10:58 [PATCH 5.10 00/86] 5.10.140-rc1 review Greg Kroah-Hartman
2022-08-29 10:58 ` [PATCH 5.10 01/86] audit: fix potential double free on error path from fsnotify_add_inode_mark Greg Kroah-Hartman
2022-08-29 10:58 ` [PATCH 5.10 02/86] parisc: Fix exception handler for fldw and fstw instructions Greg Kroah-Hartman
2022-08-29 10:58 ` [PATCH 5.10 03/86] kernel/sys_ni: add compat entry for fadvise64_64 Greg Kroah-Hartman
2022-08-29 10:58 ` [PATCH 5.10 04/86] pinctrl: amd: Dont save/restore interrupt status and wake status bits Greg Kroah-Hartman
2022-08-29 10:58 ` [PATCH 5.10 05/86] xfs: prevent a WARN_ONCE() in xfs_ioc_attr_list() Greg Kroah-Hartman
2022-08-29 10:58 ` [PATCH 5.10 06/86] xfs: reject crazy array sizes being fed to XFS_IOC_GETBMAP* Greg Kroah-Hartman
2022-08-29 10:58 ` [PATCH 5.10 07/86] fs: remove __sync_filesystem Greg Kroah-Hartman
2022-08-29 10:58 ` [PATCH 5.10 08/86] vfs: make sync_filesystem return errors from ->sync_fs Greg Kroah-Hartman
2022-08-29 10:58 ` [PATCH 5.10 09/86] xfs: return errors in xfs_fs_sync_fs Greg Kroah-Hartman
2022-08-29 10:58 ` [PATCH 5.10 10/86] xfs: only bother with sync_filesystem during readonly remount Greg Kroah-Hartman
2022-08-29 10:58 ` [PATCH 5.10 11/86] kernel/sched: Remove dl_boosted flag comment Greg Kroah-Hartman
2022-08-29 10:58 ` [PATCH 5.10 12/86] xfrm: fix refcount leak in __xfrm_policy_check() Greg Kroah-Hartman
2022-08-29 10:58 ` [PATCH 5.10 13/86] xfrm: clone missing x->lastused in xfrm_do_migrate Greg Kroah-Hartman
2022-08-29 10:58 ` [PATCH 5.10 14/86] af_key: Do not call xfrm_probe_algs in parallel Greg Kroah-Hartman
2022-08-29 10:58 ` [PATCH 5.10 15/86] xfrm: policy: fix metadata dst->dev xmit null pointer dereference Greg Kroah-Hartman
2022-08-29 10:58 ` [PATCH 5.10 16/86] NFS: Dont allocate nfs_fattr on the stack in __nfs42_ssc_open() Greg Kroah-Hartman
2022-08-29 10:58 ` [PATCH 5.10 17/86] NFSv4.2 fix problems with __nfs42_ssc_open Greg Kroah-Hartman
2022-08-29 10:58 ` [PATCH 5.10 18/86] SUNRPC: RPC level errors should set task->tk_rpc_status Greg Kroah-Hartman
2022-08-29 10:58 ` [PATCH 5.10 19/86] mm/huge_memory.c: use helper function migration_entry_to_page() Greg Kroah-Hartman
2022-08-29 10:58 ` [PATCH 5.10 20/86] mm/smaps: dont access young/dirty bit if pte unpresent Greg Kroah-Hartman
2022-08-29 10:58 ` [PATCH 5.10 21/86] rose: check NULL rose_loopback_neigh->loopback Greg Kroah-Hartman
2022-08-29 10:58 ` [PATCH 5.10 22/86] nfc: pn533: Fix use-after-free bugs caused by pn532_cmd_timeout Greg Kroah-Hartman
2022-08-29 10:58 ` [PATCH 5.10 23/86] ice: xsk: Force rings to be sized to power of 2 Greg Kroah-Hartman
2022-08-29 10:58 ` [PATCH 5.10 24/86] ice: xsk: prohibit usage of non-balanced queue id Greg Kroah-Hartman
2022-08-29 10:58 ` [PATCH 5.10 25/86] net/mlx5e: Properly disable vlan strip on non-UL reps Greg Kroah-Hartman
2022-08-29 10:58 ` [PATCH 5.10 26/86] net: ipa: dont assume SMEM is page-aligned Greg Kroah-Hartman
2022-08-29 10:58 ` [PATCH 5.10 27/86] net: moxa: get rid of asymmetry in DMA mapping/unmapping Greg Kroah-Hartman
2022-08-29 10:58 ` [PATCH 5.10 28/86] bonding: 802.3ad: fix no transmission of LACPDUs Greg Kroah-Hartman
2022-08-29 10:58 ` [PATCH 5.10 29/86] net: ipvtap - add __init/__exit annotations to module init/exit funcs Greg Kroah-Hartman
2022-08-29 10:58 ` [PATCH 5.10 30/86] netfilter: ebtables: reject blobs that dont provide all entry points Greg Kroah-Hartman
2022-08-29 10:58 ` [PATCH 5.10 31/86] bnxt_en: fix NQ resource accounting during vf creation on 57500 chips Greg Kroah-Hartman
2022-08-29 10:58 ` [PATCH 5.10 32/86] netfilter: nft_payload: report ERANGE for too long offset and length Greg Kroah-Hartman
2022-08-29 10:58 ` [PATCH 5.10 33/86] netfilter: nft_payload: do not truncate csum_offset and csum_type Greg Kroah-Hartman
2022-08-29 10:59 ` [PATCH 5.10 34/86] netfilter: nf_tables: do not leave chain stats enabled on error Greg Kroah-Hartman
2022-08-29 10:59 ` [PATCH 5.10 35/86] netfilter: nft_osf: restrict osf to ipv4, ipv6 and inet families Greg Kroah-Hartman
2022-08-29 10:59 ` [PATCH 5.10 36/86] netfilter: nft_tunnel: restrict it to netdev family Greg Kroah-Hartman
2022-08-29 10:59 ` [PATCH 5.10 37/86] netfilter: nftables: remove redundant assignment of variable err Greg Kroah-Hartman
2022-08-29 10:59 ` [PATCH 5.10 38/86] netfilter: nf_tables: consolidate rule verdict trace call Greg Kroah-Hartman
2022-08-29 10:59 ` [PATCH 5.10 39/86] netfilter: nft_cmp: optimize comparison for 16-bytes Greg Kroah-Hartman
2022-08-29 10:59 ` [PATCH 5.10 40/86] netfilter: bitwise: improve error goto labels Greg Kroah-Hartman
2022-08-29 10:59 ` [PATCH 5.10 41/86] netfilter: nf_tables: upfront validation of data via nft_data_init() Greg Kroah-Hartman
2022-08-29 10:59 ` [PATCH 5.10 42/86] netfilter: nf_tables: disallow jump to implicit chain from set element Greg Kroah-Hartman
2022-08-29 10:59 ` [PATCH 5.10 43/86] netfilter: nf_tables: disallow binding to already bound chain Greg Kroah-Hartman
2022-08-29 10:59 ` Greg Kroah-Hartman [this message]
2022-08-29 10:59 ` [PATCH 5.10 45/86] net: Fix data-races around sysctl_[rw]mem(_offset)? Greg Kroah-Hartman
2022-08-29 10:59 ` [PATCH 5.10 46/86] net: Fix data-races around sysctl_[rw]mem_(max|default) Greg Kroah-Hartman
2022-08-29 10:59 ` [PATCH 5.10 47/86] net: Fix data-races around weight_p and dev_weight_[rt]x_bias Greg Kroah-Hartman
2022-08-29 10:59 ` [PATCH 5.10 48/86] net: Fix data-races around netdev_max_backlog Greg Kroah-Hartman
2022-08-29 10:59 ` [PATCH 5.10 49/86] net: Fix data-races around netdev_tstamp_prequeue Greg Kroah-Hartman
2022-08-29 10:59 ` [PATCH 5.10 50/86] ratelimit: Fix data-races in ___ratelimit() Greg Kroah-Hartman
2022-08-29 10:59 ` [PATCH 5.10 51/86] bpf: Folding omem_charge() into sk_storage_charge() Greg Kroah-Hartman
2022-08-29 10:59 ` [PATCH 5.10 52/86] net: Fix data-races around sysctl_optmem_max Greg Kroah-Hartman
2022-08-29 10:59 ` [PATCH 5.10 53/86] net: Fix a data-race around sysctl_tstamp_allow_data Greg Kroah-Hartman
2022-08-29 10:59 ` [PATCH 5.10 54/86] net: Fix a data-race around sysctl_net_busy_poll Greg Kroah-Hartman
2022-08-29 10:59 ` [PATCH 5.10 55/86] net: Fix a data-race around sysctl_net_busy_read Greg Kroah-Hartman
2022-08-29 10:59 ` [PATCH 5.10 56/86] net: Fix a data-race around netdev_budget Greg Kroah-Hartman
2022-08-29 10:59 ` [PATCH 5.10 57/86] net: Fix a data-race around netdev_budget_usecs Greg Kroah-Hartman
2022-08-29 10:59 ` [PATCH 5.10 58/86] net: Fix data-races around sysctl_fb_tunnels_only_for_init_net Greg Kroah-Hartman
2022-08-29 10:59 ` [PATCH 5.10 59/86] net: Fix data-races around sysctl_devconf_inherit_init_net Greg Kroah-Hartman
2022-08-29 10:59 ` [PATCH 5.10 60/86] net: Fix a data-race around sysctl_somaxconn Greg Kroah-Hartman
2022-08-29 10:59 ` [PATCH 5.10 61/86] ixgbe: stop resetting SYSTIME in ixgbe_ptp_start_cyclecounter Greg Kroah-Hartman
2022-08-29 10:59 ` [PATCH 5.10 62/86] rxrpc: Fix locking in rxrpcs sendmsg Greg Kroah-Hartman
2022-08-29 10:59 ` [PATCH 5.10 63/86] ionic: fix up issues with handling EAGAIN on FW cmds Greg Kroah-Hartman
2022-08-29 10:59 ` [PATCH 5.10 64/86] btrfs: fix silent failure when deleting root reference Greg Kroah-Hartman
2022-08-29 10:59 ` [PATCH 5.10 65/86] btrfs: replace: drop assert for suspended replace Greg Kroah-Hartman
2022-08-29 10:59 ` [PATCH 5.10 66/86] btrfs: add info when mount fails due to stale replace target Greg Kroah-Hartman
2022-08-29 10:59 ` [PATCH 5.10 67/86] btrfs: check if root is readonly while setting security xattr Greg Kroah-Hartman
2022-08-29 10:59 ` [PATCH 5.10 68/86] perf/x86/lbr: Enable the branch type for the Arch LBR by default Greg Kroah-Hartman
2022-08-29 10:59 ` [PATCH 5.10 69/86] x86/unwind/orc: Unwind ftrace trampolines with correct ORC entry Greg Kroah-Hartman
2022-08-29 10:59 ` [PATCH 5.10 70/86] x86/bugs: Add "unknown" reporting for MMIO Stale Data Greg Kroah-Hartman
2022-08-29 10:59 ` [PATCH 5.10 71/86] loop: Check for overflow while configuring loop Greg Kroah-Hartman
2022-08-29 10:59 ` [PATCH 5.10 72/86] asm-generic: sections: refactor memory_intersects Greg Kroah-Hartman
2022-08-29 10:59 ` [PATCH 5.10 73/86] s390: fix double free of GS and RI CBs on fork() failure Greg Kroah-Hartman
2022-08-29 10:59 ` [PATCH 5.10 74/86] ACPI: processor: Remove freq Qos request for all CPUs Greg Kroah-Hartman
2022-08-29 10:59 ` [PATCH 5.10 75/86] xen/privcmd: fix error exit of privcmd_ioctl_dm_op() Greg Kroah-Hartman
2022-08-29 10:59 ` [PATCH 5.10 76/86] mm/hugetlb: fix hugetlb not supporting softdirty tracking Greg Kroah-Hartman
2022-08-29 10:59 ` [PATCH 5.10 77/86] Revert "md-raid: destroy the bitmap after destroying the thread" Greg Kroah-Hartman
2022-08-29 10:59 ` [PATCH 5.10 78/86] md: call __md_stop_writes in md_stop Greg Kroah-Hartman
2022-08-29 10:59 ` [PATCH 5.10 79/86] arm64: Fix match_list for erratum 1286807 on Arm Cortex-A76 Greg Kroah-Hartman
2022-08-29 10:59 ` [PATCH 5.10 80/86] Documentation/ABI: Mention retbleed vulnerability info file for sysfs Greg Kroah-Hartman
2022-08-29 10:59 ` [PATCH 5.10 81/86] blk-mq: fix io hung due to missing commit_rqs Greg Kroah-Hartman
2022-08-29 10:59 ` [PATCH 5.10 82/86] perf python: Fix build when PYTHON_CONFIG is user supplied Greg Kroah-Hartman
2022-08-29 10:59 ` [PATCH 5.10 83/86] perf/x86/intel/uncore: Fix broken read_counter() for SNB IMC PMU Greg Kroah-Hartman
2022-08-29 10:59 ` [PATCH 5.10 84/86] scsi: ufs: core: Enable link lost interrupt Greg Kroah-Hartman
2022-08-29 10:59 ` [PATCH 5.10 85/86] scsi: storvsc: Remove WQ_MEM_RECLAIM from storvsc_error_wq Greg Kroah-Hartman
2022-08-29 10:59 ` [PATCH 5.10 86/86] bpf: Dont use tnum_range on array range checking for poke descriptors Greg Kroah-Hartman
2022-08-29 17:19 ` [PATCH 5.10 00/86] 5.10.140-rc1 review Florian Fainelli
2022-08-29 17:56 ` Slade Watkins
2022-08-29 18:41 ` Pavel Machek
2022-08-29 22:19 ` Shuah Khan
2022-08-30  0:47 ` Guenter Roeck
2022-08-30  2:16 ` Daniel Díaz
2022-08-30 10:21 ` Jon Hunter
2022-08-30 10:41 ` Sudip Mukherjee (Codethink)
2022-08-30 11:56 ` Rudi Heitbaum

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20220829105758.337074804@linuxfoundation.org \
    --to=gregkh@linuxfoundation.org \
    --cc=davem@davemloft.net \
    --cc=edumazet@google.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=ncardwell@google.com \
    --cc=sashal@kernel.org \
    --cc=soheil@google.com \
    --cc=stable@vger.kernel.org \
    --cc=ycheng@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.