From: Eric Dumazet <eric.dumazet@gmail.com>
To: "David S . Miller" <davem@davemloft.net>,
	Jakub Kicinski <kuba@kernel.org>
Cc: netdev <netdev@vger.kernel.org>,
	Eric Dumazet <edumazet@google.com>,
	Eric Dumazet <eric.dumazet@gmail.com>,
	Alexander Duyck <alexanderduyck@fb.com>,
	Paolo Abeni <pabeni@redhat.com>,
	"Michael S . Tsirkin" <mst@redhat.com>,
	Greg Thelen <gthelen@google.com>
Subject: [PATCH net] net: avoid 32 x truesize under-estimation for tiny skbs
Date: Wed, 13 Jan 2021 08:18:19 -0800
Message-ID: <20210113161819.1155526-1-eric.dumazet@gmail.com>

From: Eric Dumazet <edumazet@google.com>

Both virtio net and napi_get_frags() allocate skbs
with a very small skb->head.

While using page fragments instead of a kmalloc backed skb->head might give
a small performance improvement in some cases, there is a huge risk of
underestimating memory usage.

For both GOOD_COPY_LEN and GRO_MAX_HEAD, we can fit at least 32 allocations
per page (order-3 page on x86), or even 64 on PowerPC.
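
As a rough illustration of the arithmetic above, here is a minimal
user-space sketch. The 1024-byte fragment size and the 32768/65536-byte
backing page sizes are assumptions picked to match the "32" and "64"
figures quoted here; the real values depend on the kernel configuration,
and the function names are hypothetical, not kernel APIs.

```c
#include <stddef.h>

/* Number of equal-sized fragments that fit in one backing page.
 * Both arguments are illustrative byte counts, not kernel constants.
 */
size_t frags_per_page(size_t backing_bytes, size_t frag_bytes)
{
	return frag_bytes ? backing_bytes / frag_bytes : 0;
}

/* Worst case: a single live fragment pins the whole backing page while
 * only one fragment's worth of memory is accounted, so the ratio of
 * pinned to accounted memory equals the number of fragments per page.
 */
size_t worst_case_underestimate(size_t backing_bytes, size_t frag_bytes)
{
	return frags_per_page(backing_bytes, frag_bytes);
}
```

Under these assumptions, a 32768-byte backing page gives a factor of 32
and a 65536-byte one gives 64.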

We have been tracking OOM issues on GKE hosts hitting tcp_mem limits
but consuming far more memory for TCP buffers than instructed in tcp_mem[2].

Even if we force napi_alloc_skb() to only use order-0 pages, the issue
would still be there on arches with PAGE_SIZE >= 32768.

This patch makes sure that small skb heads are kmalloc backed, so that
other objects in the slab page can be reused instead of being held as long
as skbs are sitting in socket queues.
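
The size test can be sketched in user space as below. This is only an
approximation of the length check in the diff: the gfp_mask condition
is left out, `len` is taken to already include NET_SKB_PAD +
NET_IP_ALIGN, and the 320-byte overhead is an assumed stand-in for
SKB_DATA_ALIGN(sizeof(struct skb_shared_info)), which varies by
architecture and kernel version. All names here are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* Assumed value; the kernel derives it from struct skb_shared_info. */
#define ASSUMED_SHINFO_OVERHEAD 320u

/* User-space analogue of SKB_WITH_OVERHEAD(x) under the assumption above. */
size_t with_overhead(size_t x)
{
	return x - ASSUMED_SHINFO_OVERHEAD;
}

/* True when the head would come from kmalloc rather than a page
 * fragment: the request is either too small or too big.
 */
bool head_uses_kmalloc(size_t len, size_t page_size)
{
	return len <= with_overhead(1024) || len > with_overhead(page_size);
}
```

With a 4096-byte page and the assumed overhead, lengths up to 704 and
above 3776 take the kmalloc path; anything in between still uses a page
fragment.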

Note that we might in the future use the sk_buff napi cache,
instead of going through a more expensive __alloc_skb().

Another idea would be to use separate page sizes depending
on the allocated length (to never have more than 4 frags per page).

I would like to thank Greg Thelen for his precious help on this matter;
analysing crash dumps is always a time-consuming task.

Fixes: fd11a83dd363 ("net: Pull out core bits of __netdev_alloc_skb and add __napi_alloc_skb")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Alexander Duyck <alexanderduyck@fb.com>
Cc: Paolo Abeni <pabeni@redhat.com>
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Greg Thelen <gthelen@google.com>
---
 net/core/skbuff.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 7626a33cce590e530f36167bd096026916131897..3a8f55a43e6964344df464a27b9b1faa0eb804f3 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -501,13 +501,17 @@ EXPORT_SYMBOL(__netdev_alloc_skb);
 struct sk_buff *__napi_alloc_skb(struct napi_struct *napi, unsigned int len,
 				 gfp_t gfp_mask)
 {
-	struct napi_alloc_cache *nc = this_cpu_ptr(&napi_alloc_cache);
+	struct napi_alloc_cache *nc;
 	struct sk_buff *skb;
 	void *data;
 
 	len += NET_SKB_PAD + NET_IP_ALIGN;
 
-	if ((len > SKB_WITH_OVERHEAD(PAGE_SIZE)) ||
+	/* If requested length is either too small or too big,
+	 * we use kmalloc() for skb->head allocation.
+	 */
+	if (len <= SKB_WITH_OVERHEAD(1024) ||
+	    len > SKB_WITH_OVERHEAD(PAGE_SIZE) ||
 	    (gfp_mask & (__GFP_DIRECT_RECLAIM | GFP_DMA))) {
 		skb = __alloc_skb(len, gfp_mask, SKB_ALLOC_RX, NUMA_NO_NODE);
 		if (!skb)
@@ -515,6 +519,7 @@ struct sk_buff *__napi_alloc_skb(struct napi_struct *napi, unsigned int len,
 		goto skb_success;
 	}
 
+	nc = this_cpu_ptr(&napi_alloc_cache);
 	len += SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
 	len = SKB_DATA_ALIGN(len);
 
-- 
2.30.0.284.gd98b1dd5eaa7-goog

