linux-kernel.vger.kernel.org archive mirror
From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	stable@vger.kernel.org, Eric Dumazet <edumazet@google.com>,
	Paolo Abeni <pabeni@redhat.com>, Greg Thelen <gthelen@google.com>,
	Alexander Duyck <alexanderduyck@fb.com>,
	"Michael S. Tsirkin" <mst@redhat.com>,
	Jakub Kicinski <kuba@kernel.org>
Subject: [PATCH 4.19 16/22] net: avoid 32 x truesize under-estimation for tiny skbs
Date: Fri, 22 Jan 2021 15:12:34 +0100
Message-ID: <20210122135732.555755452@linuxfoundation.org>
In-Reply-To: <20210122135731.921636245@linuxfoundation.org>

From: Eric Dumazet <edumazet@google.com>

[ Upstream commit 3226b158e67cfaa677fd180152bfb28989cb2fac ]

Both virtio net and napi_get_frags() allocate skbs
with a very small skb->head.

While using page fragments instead of a kmalloc-backed skb->head might give
a small performance improvement in some cases, there is a huge risk of
underestimating memory usage.

For both GOOD_COPY_LEN and GRO_MAX_HEAD, we can fit at least 32 allocations
per page (an order-3 page on x86), or even 64 on PowerPC.
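
The arithmetic behind the "32 x" figure can be sketched as follows. This is
an illustrative userspace model, not kernel code: TINY_HEAD_BYTES and the
helper names are assumptions made for the example, and the model ignores the
sizeof(struct sk_buff) part of truesize. It assumes each tiny head occupies
at most 1024 bytes of a shared page-fragment block, and that in the worst
case every live skb is the last survivor of its own block.

```c
#include <stddef.h>

/* Assumed upper bound on one tiny skb->head allocation, in bytes.
 * Illustrative value chosen for this sketch. */
#define TINY_HEAD_BYTES 1024u

/* How many tiny heads can be carved out of one page-fragment block.
 * This is also the worst-case under-estimation factor: one surviving
 * head keeps the whole block from being freed. */
static unsigned int heads_per_block(unsigned int block_bytes)
{
	return block_bytes / TINY_HEAD_BYTES;
}

/* Bytes charged to the socket for live_skbs tiny heads (simplified). */
static size_t accounted_bytes(size_t live_skbs)
{
	return live_skbs * TINY_HEAD_BYTES;
}

/* Bytes actually pinned if every live skb sits in a different block. */
static size_t worst_case_pinned_bytes(size_t live_skbs,
				      unsigned int block_bytes)
{
	return live_skbs * block_bytes;
}
```

With a 32768-byte block (order-3 with 4 KB pages), heads_per_block() yields
32; with a 65536-byte block it yields 64, matching the figures above.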

We have been tracking OOM issues on GKE hosts hitting tcp_mem limits
but consuming far more memory for TCP buffers than instructed in tcp_mem[2].

Even if we force napi_alloc_skb() to only use order-0 pages, the issue
would still be there on arches with PAGE_SIZE >= 32768.

This patch makes sure that small skb heads are kmalloc-backed, so that
other objects in the slab page can be reused instead of being held as long
as skbs are sitting in socket queues.

Note that we might in the future use the sk_buff napi cache,
instead of going through a more expensive __alloc_skb().

Another idea would be to use separate page sizes depending
on the allocated length (to never have more than 4 frags per page).

I would like to thank Greg Thelen for his precious help on this matter;
analysing crash dumps is always a time-consuming task.

Fixes: fd11a83dd363 ("net: Pull out core bits of __netdev_alloc_skb and add __napi_alloc_skb")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Paolo Abeni <pabeni@redhat.com>
Cc: Greg Thelen <gthelen@google.com>
Reviewed-by: Alexander Duyck <alexanderduyck@fb.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Link: https://lore.kernel.org/r/20210113161819.1155526-1-eric.dumazet@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 net/core/skbuff.c |    9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -459,13 +459,17 @@ EXPORT_SYMBOL(__netdev_alloc_skb);
 struct sk_buff *__napi_alloc_skb(struct napi_struct *napi, unsigned int len,
 				 gfp_t gfp_mask)
 {
-	struct napi_alloc_cache *nc = this_cpu_ptr(&napi_alloc_cache);
+	struct napi_alloc_cache *nc;
 	struct sk_buff *skb;
 	void *data;
 
 	len += NET_SKB_PAD + NET_IP_ALIGN;
 
-	if ((len > SKB_WITH_OVERHEAD(PAGE_SIZE)) ||
+	/* If requested length is either too small or too big,
+	 * we use kmalloc() for skb->head allocation.
+	 */
+	if (len <= SKB_WITH_OVERHEAD(1024) ||
+	    len > SKB_WITH_OVERHEAD(PAGE_SIZE) ||
 	    (gfp_mask & (__GFP_DIRECT_RECLAIM | GFP_DMA))) {
 		skb = __alloc_skb(len, gfp_mask, SKB_ALLOC_RX, NUMA_NO_NODE);
 		if (!skb)
@@ -473,6 +477,7 @@ struct sk_buff *__napi_alloc_skb(struct
 		goto skb_success;
 	}
 
+	nc = this_cpu_ptr(&napi_alloc_cache);
 	len += SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
 	len = SKB_DATA_ALIGN(len);
 


