From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
To: linux-kernel@vger.kernel.org
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	stable@vger.kernel.org, Eric Dumazet <edumazet@google.com>,
	Paolo Abeni <pabeni@redhat.com>, Greg Thelen <gthelen@google.com>,
	Alexander Duyck <alexanderduyck@fb.com>,
	"Michael S. Tsirkin" <mst@redhat.com>,
	Jakub Kicinski <kuba@kernel.org>
Subject: [PATCH 4.14 43/50] net: avoid 32 x truesize under-estimation for tiny skbs
Date: Fri, 22 Jan 2021 15:12:24 +0100
Message-ID: <20210122135736.944716219@linuxfoundation.org>
In-Reply-To: <20210122135735.176469491@linuxfoundation.org>

From: Eric Dumazet <edumazet@google.com>

[ Upstream commit 3226b158e67cfaa677fd180152bfb28989cb2fac ]

Both virtio net and napi_get_frags() allocate skbs
with a very small skb->head.

While using page fragments instead of a kmalloc-backed skb->head might give
a small performance improvement in some cases, there is a huge risk of
underestimating memory usage.

For both GOOD_COPY_LEN and GRO_MAX_HEAD, we can fit at least 32 allocations
per page (order-3 page on x86), or even 64 on PowerPC.
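As a rough illustration of that arithmetic, here is a minimal userspace sketch. The helper names and the 1024-byte slice size are assumptions made for this example, not kernel code: if an order-3 fragment page (32768 bytes on x86 with 4 KB pages) is carved into 1024-byte heads, one queued skb is charged for a single slice but can keep the whole page alive.

```c
/* Hypothetical helpers, for illustration only (not kernel code). */

/* Worst-case ratio between memory pinned and memory charged when a single
 * skb head of slice_bytes keeps an entire fragment page of frag_bytes alive.
 */
static unsigned long worst_case_underestimate(unsigned long frag_bytes,
					      unsigned long slice_bytes)
{
	return frag_bytes / slice_bytes;
}

/* Worst-case bytes pinned when each of nr_skbs queued skbs is the last
 * user of a different fragment page.
 */
static unsigned long worst_case_pinned(unsigned long nr_skbs,
				       unsigned long frag_bytes)
{
	return nr_skbs * frag_bytes;
}
```

Under these assumptions, worst_case_underestimate(32768, 1024) gives 32 and worst_case_underestimate(65536, 1024) gives 64, matching the x86 and PowerPC figures quoted above.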

We have been tracking OOM issues on GKE hosts hitting tcp_mem limits
but consuming far more memory for TCP buffers than instructed in tcp_mem[2].

Even if we force napi_alloc_skb() to only use order-0 pages, the issue
would still be there on arches with PAGE_SIZE >= 32768.

This patch makes sure that small skb heads are kmalloc-backed, so that
other objects in the slab page can be reused instead of being held as long
as skbs are sitting in socket queues.
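The resulting size check can be modelled in userspace roughly as follows. This is only a sketch: ASSUMED_PAD and ASSUMED_SHINFO are placeholder values standing in for NET_SKB_PAD + NET_IP_ALIGN and SKB_DATA_ALIGN(sizeof(struct skb_shared_info)), which vary by architecture and configuration, and head_uses_kmalloc() is a hypothetical name.

```c
#include <stdbool.h>

/* Assumed values for illustration; the real ones depend on arch and config. */
#define ASSUMED_PAD	64u	/* stands in for NET_SKB_PAD + NET_IP_ALIGN */
#define ASSUMED_SHINFO	320u	/* stands in for the aligned skb_shared_info size */

/* Userspace stand-in for SKB_WITH_OVERHEAD(): bytes left for data in size. */
static unsigned int with_overhead(unsigned int size)
{
	return size - ASSUMED_SHINFO;
}

/* Returns true when the head would come from kmalloc (via __alloc_skb())
 * rather than from the per-CPU page fragment cache.
 */
static bool head_uses_kmalloc(unsigned int len, unsigned int page_size,
			      bool reclaim_or_dma)
{
	len += ASSUMED_PAD;

	return len <= with_overhead(1024) ||
	       len > with_overhead(page_size) ||
	       reclaim_or_dma;
}
```

With these assumed constants and a 4096-byte page, a 128-byte request takes the kmalloc path, while a 1500-byte request still uses a page fragment.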

Note that we might in the future use the sk_buff napi cache,
instead of going through a more expensive __alloc_skb().

Another idea would be to use separate page sizes depending
on the allocated length (to never have more than 4 frags per page).

I would like to thank Greg Thelen for his invaluable help on this matter;
analysing crash dumps is always a time-consuming task.

Fixes: fd11a83dd363 ("net: Pull out core bits of __netdev_alloc_skb and add __napi_alloc_skb")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Paolo Abeni <pabeni@redhat.com>
Cc: Greg Thelen <gthelen@google.com>
Reviewed-by: Alexander Duyck <alexanderduyck@fb.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Link: https://lore.kernel.org/r/20210113161819.1155526-1-eric.dumazet@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
 net/core/skbuff.c |    9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -459,13 +459,17 @@ EXPORT_SYMBOL(__netdev_alloc_skb);
 struct sk_buff *__napi_alloc_skb(struct napi_struct *napi, unsigned int len,
 				 gfp_t gfp_mask)
 {
-	struct napi_alloc_cache *nc = this_cpu_ptr(&napi_alloc_cache);
+	struct napi_alloc_cache *nc;
 	struct sk_buff *skb;
 	void *data;
 
 	len += NET_SKB_PAD + NET_IP_ALIGN;
 
-	if ((len > SKB_WITH_OVERHEAD(PAGE_SIZE)) ||
+	/* If requested length is either too small or too big,
+	 * we use kmalloc() for skb->head allocation.
+	 */
+	if (len <= SKB_WITH_OVERHEAD(1024) ||
+	    len > SKB_WITH_OVERHEAD(PAGE_SIZE) ||
 	    (gfp_mask & (__GFP_DIRECT_RECLAIM | GFP_DMA))) {
 		skb = __alloc_skb(len, gfp_mask, SKB_ALLOC_RX, NUMA_NO_NODE);
 		if (!skb)
@@ -473,6 +477,7 @@ struct sk_buff *__napi_alloc_skb(struct
 		goto skb_success;
 	}
 
+	nc = this_cpu_ptr(&napi_alloc_cache);
 	len += SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
 	len = SKB_DATA_ALIGN(len);
 



