Netdev Archive on lore.kernel.org
* [PATCH net] net: avoid 32 x truesize under-estimation for tiny skbs
@ 2021-01-13 16:18 Eric Dumazet
  2021-01-13 18:00 ` Alexander Duyck
                   ` (3 more replies)
  0 siblings, 4 replies; 7+ messages in thread
From: Eric Dumazet @ 2021-01-13 16:18 UTC (permalink / raw)
  To: David S . Miller, Jakub Kicinski
  Cc: netdev, Eric Dumazet, Eric Dumazet, Alexander Duyck, Paolo Abeni,
	Michael S . Tsirkin, Greg Thelen

From: Eric Dumazet <edumazet@google.com>

Both virtio net and napi_get_frags() allocate skbs
with a very small skb->head

While using page fragments instead of a kmalloc-backed skb->head might give
a small performance improvement in some cases, there is a huge risk of
under-estimating memory usage.

For both GOOD_COPY_LEN and GRO_MAX_HEAD, we can fit at least 32 allocations
per page (order-3 page on x86), or even 64 on PowerPC

We have been tracking OOM issues on GKE hosts hitting tcp_mem limits
but consuming far more memory for TCP buffers than instructed in tcp_mem[2]

Even if we force napi_alloc_skb() to only use order-0 pages, the issue
would still be there on arches with PAGE_SIZE >= 32768

This patch makes sure that small skb heads are kmalloc-backed, so that
other objects in the slab page can be reused instead of being held as long
as skbs are sitting in socket queues.

Note that we might in the future use the sk_buff napi cache,
instead of going through a more expensive __alloc_skb()

Another idea would be to use separate page sizes depending
on the allocated length (to never have more than 4 frags per page)

I would like to thank Greg Thelen for his precious help on this matter;
analysing crash dumps is always a time-consuming task.

Fixes: fd11a83dd363 ("net: Pull out core bits of __netdev_alloc_skb and add __napi_alloc_skb")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Alexander Duyck <alexanderduyck@fb.com>
Cc: Paolo Abeni <pabeni@redhat.com>
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Greg Thelen <gthelen@google.com>
---
 net/core/skbuff.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 7626a33cce590e530f36167bd096026916131897..3a8f55a43e6964344df464a27b9b1faa0eb804f3 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -501,13 +501,17 @@ EXPORT_SYMBOL(__netdev_alloc_skb);
 struct sk_buff *__napi_alloc_skb(struct napi_struct *napi, unsigned int len,
 				 gfp_t gfp_mask)
 {
-	struct napi_alloc_cache *nc = this_cpu_ptr(&napi_alloc_cache);
+	struct napi_alloc_cache *nc;
 	struct sk_buff *skb;
 	void *data;
 
 	len += NET_SKB_PAD + NET_IP_ALIGN;
 
-	if ((len > SKB_WITH_OVERHEAD(PAGE_SIZE)) ||
+	/* If requested length is either too small or too big,
+	 * we use kmalloc() for skb->head allocation.
+	 */
+	if (len <= SKB_WITH_OVERHEAD(1024) ||
+	    len > SKB_WITH_OVERHEAD(PAGE_SIZE) ||
 	    (gfp_mask & (__GFP_DIRECT_RECLAIM | GFP_DMA))) {
 		skb = __alloc_skb(len, gfp_mask, SKB_ALLOC_RX, NUMA_NO_NODE);
 		if (!skb)
@@ -515,6 +519,7 @@ struct sk_buff *__napi_alloc_skb(struct napi_struct *napi, unsigned int len,
 		goto skb_success;
 	}
 
+	nc = this_cpu_ptr(&napi_alloc_cache);
 	len += SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
 	len = SKB_DATA_ALIGN(len);
 
-- 
2.30.0.284.gd98b1dd5eaa7-goog


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH net] net: avoid 32 x truesize under-estimation for tiny skbs
  2021-01-13 16:18 [PATCH net] net: avoid 32 x truesize under-estimation for tiny skbs Eric Dumazet
@ 2021-01-13 18:00 ` Alexander Duyck
  2021-01-13 19:19 ` Michael S. Tsirkin
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 7+ messages in thread
From: Alexander Duyck @ 2021-01-13 18:00 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: David S . Miller, Jakub Kicinski, netdev, Eric Dumazet,
	Alexander Duyck, Paolo Abeni, Michael S . Tsirkin, Greg Thelen

On Wed, Jan 13, 2021 at 8:20 AM Eric Dumazet <eric.dumazet@gmail.com> wrote:
>
> From: Eric Dumazet <edumazet@google.com>
>
> Both virtio net and napi_get_frags() allocate skbs
> with a very small skb->head
>
> While using page fragments instead of a kmalloc backed skb->head might give
> a small performance improvement in some cases, there is a huge risk of
> under estimating memory usage.
>
> For both GOOD_COPY_LEN and GRO_MAX_HEAD, we can fit at least 32 allocations
> per page (order-3 page in x86), or even 64 on PowerPC
>
> We have been tracking OOM issues on GKE hosts hitting tcp_mem limits
> but consuming far more memory for TCP buffers than instructed in tcp_mem[2]
>
> Even if we force napi_alloc_skb() to only use order-0 pages, the issue
> would still be there on arches with PAGE_SIZE >= 32768
>
> This patch makes sure that small skb head are kmalloc backed, so that
> other objects in the slab page can be reused instead of being held as long
> as skbs are sitting in socket queues.
>
> Note that we might in the future use the sk_buff napi cache,
> instead of going through a more expensive __alloc_skb()
>
> Another idea would be to use separate page sizes depending
> on the allocated length (to never have more than 4 frags per page)
>
> I would like to thank Greg Thelen for his precious help on this matter,
> analysing crash dumps is always a time consuming task.
>
> Fixes: fd11a83dd363 ("net: Pull out core bits of __netdev_alloc_skb and add __napi_alloc_skb")
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> Cc: Alexander Duyck <alexanderduyck@fb.com>
> Cc: Paolo Abeni <pabeni@redhat.com>
> Cc: Michael S. Tsirkin <mst@redhat.com>
> Cc: Greg Thelen <gthelen@google.com>
> ---
>  net/core/skbuff.c | 9 +++++++--
>  1 file changed, 7 insertions(+), 2 deletions(-)
>
> diff --git a/net/core/skbuff.c b/net/core/skbuff.c
> index 7626a33cce590e530f36167bd096026916131897..3a8f55a43e6964344df464a27b9b1faa0eb804f3 100644
> --- a/net/core/skbuff.c
> +++ b/net/core/skbuff.c
> @@ -501,13 +501,17 @@ EXPORT_SYMBOL(__netdev_alloc_skb);
>  struct sk_buff *__napi_alloc_skb(struct napi_struct *napi, unsigned int len,
>                                  gfp_t gfp_mask)
>  {
> -       struct napi_alloc_cache *nc = this_cpu_ptr(&napi_alloc_cache);
> +       struct napi_alloc_cache *nc;
>         struct sk_buff *skb;
>         void *data;
>
>         len += NET_SKB_PAD + NET_IP_ALIGN;
>
> -       if ((len > SKB_WITH_OVERHEAD(PAGE_SIZE)) ||
> +       /* If requested length is either too small or too big,
> +        * we use kmalloc() for skb->head allocation.
> +        */
> +       if (len <= SKB_WITH_OVERHEAD(1024) ||
> +           len > SKB_WITH_OVERHEAD(PAGE_SIZE) ||
>             (gfp_mask & (__GFP_DIRECT_RECLAIM | GFP_DMA))) {
>                 skb = __alloc_skb(len, gfp_mask, SKB_ALLOC_RX, NUMA_NO_NODE);
>                 if (!skb)
> @@ -515,6 +519,7 @@ struct sk_buff *__napi_alloc_skb(struct napi_struct *napi, unsigned int len,
>                 goto skb_success;
>         }
>
> +       nc = this_cpu_ptr(&napi_alloc_cache);
>         len += SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
>         len = SKB_DATA_ALIGN(len);
>

The fix here looks good to me.
Reviewed-by: Alexander Duyck <alexanderduyck@fb.com>

I think at some point in the future we may need to follow up and do a
rework of a bunch of this code. One thing I am wondering is if we
should look at doing some sort of memory accounting per napi_struct.
Maybe it is something we could work on tying into the page pool work
that Jesper did earlier.


* Re: [PATCH net] net: avoid 32 x truesize under-estimation for tiny skbs
  2021-01-13 16:18 [PATCH net] net: avoid 32 x truesize under-estimation for tiny skbs Eric Dumazet
  2021-01-13 18:00 ` Alexander Duyck
@ 2021-01-13 19:19 ` Michael S. Tsirkin
  2021-01-13 22:23 ` David Laight
  2021-01-14 19:00 ` patchwork-bot+netdevbpf
  3 siblings, 0 replies; 7+ messages in thread
From: Michael S. Tsirkin @ 2021-01-13 19:19 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: David S . Miller, Jakub Kicinski, netdev, Eric Dumazet,
	Alexander Duyck, Paolo Abeni, Greg Thelen

On Wed, Jan 13, 2021 at 08:18:19AM -0800, Eric Dumazet wrote:
> From: Eric Dumazet <edumazet@google.com>
> 
> Both virtio net and napi_get_frags() allocate skbs
> with a very small skb->head
> 
> While using page fragments instead of a kmalloc backed skb->head might give
> a small performance improvement in some cases, there is a huge risk of
> under estimating memory usage.
> 
> For both GOOD_COPY_LEN and GRO_MAX_HEAD, we can fit at least 32 allocations
> per page (order-3 page in x86), or even 64 on PowerPC
> 
> We have been tracking OOM issues on GKE hosts hitting tcp_mem limits
> but consuming far more memory for TCP buffers than instructed in tcp_mem[2]
> 
> Even if we force napi_alloc_skb() to only use order-0 pages, the issue
> would still be there on arches with PAGE_SIZE >= 32768
> 
> This patch makes sure that small skb head are kmalloc backed, so that
> other objects in the slab page can be reused instead of being held as long
> as skbs are sitting in socket queues.
> 
> Note that we might in the future use the sk_buff napi cache,
> instead of going through a more expensive __alloc_skb()
> 
> Another idea would be to use separate page sizes depending
> on the allocated length (to never have more than 4 frags per page)
> 
> I would like to thank Greg Thelen for his precious help on this matter,
> analysing crash dumps is always a time consuming task.
> 
> Fixes: fd11a83dd363 ("net: Pull out core bits of __netdev_alloc_skb and add __napi_alloc_skb")
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> Cc: Alexander Duyck <alexanderduyck@fb.com>
> Cc: Paolo Abeni <pabeni@redhat.com>
> Cc: Michael S. Tsirkin <mst@redhat.com>
> Cc: Greg Thelen <gthelen@google.com>

Better than tweaking virtio code.

Acked-by: Michael S. Tsirkin <mst@redhat.com>

I do hope the sk_buff napi cache idea materializes in the future.

> ---
>  net/core/skbuff.c | 9 +++++++--
>  1 file changed, 7 insertions(+), 2 deletions(-)
> 
> diff --git a/net/core/skbuff.c b/net/core/skbuff.c
> index 7626a33cce590e530f36167bd096026916131897..3a8f55a43e6964344df464a27b9b1faa0eb804f3 100644
> --- a/net/core/skbuff.c
> +++ b/net/core/skbuff.c
> @@ -501,13 +501,17 @@ EXPORT_SYMBOL(__netdev_alloc_skb);
>  struct sk_buff *__napi_alloc_skb(struct napi_struct *napi, unsigned int len,
>  				 gfp_t gfp_mask)
>  {
> -	struct napi_alloc_cache *nc = this_cpu_ptr(&napi_alloc_cache);
> +	struct napi_alloc_cache *nc;
>  	struct sk_buff *skb;
>  	void *data;
>  
>  	len += NET_SKB_PAD + NET_IP_ALIGN;
>  
> -	if ((len > SKB_WITH_OVERHEAD(PAGE_SIZE)) ||
> +	/* If requested length is either too small or too big,
> +	 * we use kmalloc() for skb->head allocation.
> +	 */
> +	if (len <= SKB_WITH_OVERHEAD(1024) ||
> +	    len > SKB_WITH_OVERHEAD(PAGE_SIZE) ||
>  	    (gfp_mask & (__GFP_DIRECT_RECLAIM | GFP_DMA))) {
>  		skb = __alloc_skb(len, gfp_mask, SKB_ALLOC_RX, NUMA_NO_NODE);
>  		if (!skb)
> @@ -515,6 +519,7 @@ struct sk_buff *__napi_alloc_skb(struct napi_struct *napi, unsigned int len,
>  		goto skb_success;
>  	}
>  
> +	nc = this_cpu_ptr(&napi_alloc_cache);
>  	len += SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
>  	len = SKB_DATA_ALIGN(len);
>  
> -- 
> 2.30.0.284.gd98b1dd5eaa7-goog



* RE: [PATCH net] net: avoid 32 x truesize under-estimation for tiny skbs
  2021-01-13 16:18 [PATCH net] net: avoid 32 x truesize under-estimation for tiny skbs Eric Dumazet
  2021-01-13 18:00 ` Alexander Duyck
  2021-01-13 19:19 ` Michael S. Tsirkin
@ 2021-01-13 22:23 ` David Laight
  2021-01-14  5:16   ` Eric Dumazet
  2021-01-14 19:00 ` patchwork-bot+netdevbpf
  3 siblings, 1 reply; 7+ messages in thread
From: David Laight @ 2021-01-13 22:23 UTC (permalink / raw)
  To: 'Eric Dumazet', David S . Miller, Jakub Kicinski
  Cc: netdev, Eric Dumazet, Alexander Duyck, Paolo Abeni,
	Michael S . Tsirkin, Greg Thelen

From: Eric Dumazet
> Sent: 13 January 2021 16:18
> 
> From: Eric Dumazet <edumazet@google.com>
> 
> Both virtio net and napi_get_frags() allocate skbs
> with a very small skb->head
> 
> While using page fragments instead of a kmalloc backed skb->head might give
> a small performance improvement in some cases, there is a huge risk of
> under estimating memory usage.

There is (or was last time I looked) also a problem with
some of the USB ethernet drivers.

IIRC one of the ASXnnnnnn (???) USB3 ones allocates a 64k skb to pass
to the USB stack and then just lies about skb->truesize when passing
them into the network stack.
The USB hardware will merge TCP receives and put multiple ethernet
packets into a single USB message.
But single frames can end up in very big kernel memory buffers.

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)



* Re: [PATCH net] net: avoid 32 x truesize under-estimation for tiny skbs
  2021-01-13 22:23 ` David Laight
@ 2021-01-14  5:16   ` Eric Dumazet
  2021-01-14  9:29     ` David Laight
  0 siblings, 1 reply; 7+ messages in thread
From: Eric Dumazet @ 2021-01-14  5:16 UTC (permalink / raw)
  To: David Laight
  Cc: Eric Dumazet, David S . Miller, Jakub Kicinski, netdev,
	Alexander Duyck, Paolo Abeni, Michael S . Tsirkin, Greg Thelen

On Wed, Jan 13, 2021 at 11:23 PM David Laight <David.Laight@aculab.com> wrote:
>
> From: Eric Dumazet
> > Sent: 13 January 2021 16:18
> >
> > From: Eric Dumazet <edumazet@google.com>
> >
> > Both virtio net and napi_get_frags() allocate skbs
> > with a very small skb->head
> >
> > While using page fragments instead of a kmalloc backed skb->head might give
> > a small performance improvement in some cases, there is a huge risk of
> > under estimating memory usage.
>
> There is (or was last time I looked) also a problem with
> some of the USB ethernet drivers.
>
> IIRC one of the ASXnnnnnn (???) USB3 ones allocates 64k skb to pass
> to the USB stack and then just lies about skb->truesize when passing
> them into the network stack.

You sure? I think I have fixed this at some point

https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net.git/commit/?id=a9e0aca4b37885b5599e52211f098bd7f565e749

> The USB hardware will merge TCP receives and put multiple ethernet
> packets into a single USB message.
> But single frames can end up in very big kernel memory buffers.
>

Yeah, this is a known problem.

Since 2009 I have sent numerous patches addressing truesize issues;
your help will be welcome, especially if you own the hardware and can
test the patches.

git log --author dumazet --grep truesize --oneline --reverse
2b85a34e911bf483c27cfdd124aeb1605145dc80 net: No more expensive sock_hold()/sock_put() on each tx
d361fd599a991ff6c1d522a599c635b35d61ef30 net: sock_free() optimizations
daebbca3ab41031666ee27f991b223d2bc0415e9 qlcnic: dont set skb->truesize
8df8fd27123054b02007361bd5483775db84b4a8 qlcnic: dont set skb->truesize
7e96dc7045bff8758804b047c0dfb6868f182500 netxen: dont set skb->truesize
3d13008e7345fa7a79d8f6438150dc15d6ba6e9d ip: fix truesize mismatch in ip fragmentation
7a91b434e2bad554b709265db7603b1aa52dd92e net: update SOCK_MIN_RCVBUF
87fb4b7b533073eeeaed0b6bf7c2328995f6c075 net: more accurate skb truesize
bdb28a97f46b5307e6e9351de52a9dd03e711a2f be2net: fix truesize errors
a1f4e8bcbccf50cf1894c263af4d677d4f566533 bnx2: fix skb truesize underestimation
ed64b3cc11502f50e1401f12e33d021592800bca e1000: fix skb truesize underestimation
95b9c1dfb7b929f5f3b203ed95c28bdfd069d122 igb: fix skb truesize underestimation
98130646770db42cd14c44ba0d7f2d0eb8078820 ixgbe: fix skb truesize underestimation
98a045d7e4a59db0865a59eea2140fe36bc7c220 e1000e: fix skb truesize underestimation
7ae60b3f3b297b7f04025c93f1cb2275c3a1dfcd sky2: fix skb truesize underestimation
5935f81c595897d213afcf756e3e41af7c704f0e ftgmac100: fix skb truesize underestimation
5e6c355c47e75314fd2282d087616069d4093ecf vmxnet3: fix skb truesize underestimation
e7e5a4033f765e2a37095cd0a73261c99840f77e niu: fix skb truesize underestimation
96cd8951684adaa5fd72952adef532d0b42f70e1 ftmac100: fix skb truesize underestimation
9e903e085262ffbf1fc44a17ac06058aca03524a net: add skb frag size accessors
90278c9ffb8a92672d60a618a58a99e2370a98ac mlx4_en: fix skb truesize underestimation
7b8b59617ead5acc6ff041a9ad2ea1fe7a58094f igbvf: fix truesize underestimation
924a4c7d2e962b4e6a8e9ab3aabfd2bb29e5ada9 myri10ge: fix truesize underestimation
e1ac50f64691de9a095ac5d73cb8ac73d3d17dba bnx2x: fix skb truesize underestimation
4b727361f0bc7ee7378298941066d8aa15023ffb virtio_net: fix truesize underestimation
e52fcb2462ac484e6dd6e68869536609f0216938 bnx2x: uses build_skb() in receive path
dd2bc8e9c0685d8eaaaf06e65919e31d60478411 bnx2: switch to build_skb() infrastructure
9205fd9ccab8ef51ad771c1917eed7b2f2225d45 tg3: switch to build_skb() infrastructure
570e57bcbcc4df5581b1e9c806ab2b16e96ea7d3 atm: use SKB_TRUESIZE() in atm_guess_pdu2truesize()
f07d960df33c5aef8f513efce0fd201f962f94a1 tcp: avoid frag allocation for small frames
0fd7bac6b6157eed6cf0cb86a1e88ba29e57c033 net: relax rcvbuf limits
de8261c2fa364397ed872fad1244d75364689168 gro: fix truesize underestimation
19c6c8f58b5840fd4757233b4849f42687d2ef3a ppp: fix truesize underestimation
a9e0aca4b37885b5599e52211f098bd7f565e749 asix: asix_rx_fixup surgery to reduce skb truesizes
c8628155ece363487b57d33441ea0359018c0fa7 tcp: reduce out_of_order memory use
50269e19ad990e79eeda101fc6df80cffd5d4831 net: add a truesize parameter to skb_add_rx_frag()
21dcda6083a0573686acabca39b3f92ba032d333 f_phonet: fix skb truesize underestimation
094b5855bf37eae4b297bc47bb5bc5454f1f6fab cdc-phonet: fix skb truesize underestimation
da882c1f2ecadb0ed582628ec1585e36b137c0f0 tcp: sk_add_backlog() is too agressive for TCP
1402d366019fedaa2b024f2bac06b7cc9a8782e1 tcp: introduce tcp_try_coalesce
d3836f21b0af5513ef55701dd3f50b8c42e44c7a net: allow skb->head to be a page fragment
b49960a05e32121d29316cfdf653894b88ac9190 tcp: change tcp_adv_win_scale and tcp_rmem[2]
ed90542b0ce5415050c6fbfca324bccaafa69f2f iwlwifi: fix skb truesize underestimation
715dc1f342713816d1be1c37643a2c9e6ee181a7 net: Fix truesize accounting in skb_gro_receive()
3cc4949269e01f39443d0fcfffb5bc6b47878d45 ipv4: use skb coalescing in defragmentation
ec16439e173aaf56f62bd8e175e976fbd452497b ipv6: use skb coalescing in reassembly
313b037cf054ec908de92fb4c085403ffd7420d4 gianfar: fix potential sk_wmem_alloc imbalance
b28ba72665356438e3a6e3be365c3c3071496840 IPoIB: fix skb truesize underestimatiom
9936a7bbe56df432838fef658aea6bcfdd994021 igb: reduce Rx header size
87c084a980325d877dc7e388b8f2f26d5d3b4d01 l2tp: dont play with skb->truesize
6ff50cd55545d922f5c62776fe1feb38a9846168 tcp: gso: do not generate out of order packets
9eb5bf838d06aa6ddebe4aca6b5cedcf2eb53b86 net: sock: fix TCP_SKB_MIN_TRUESIZE
45fe142cefa864b685615bcb930159f6749c3667 iwl3945: better skb management in rx path
4e4f1fc226816905c937f9b29dabe351075dfe0f tcp: properly increase rcv_ssthresh for ofo packets
400dfd3ae899849b27d398ca7894e1b44430887f net: refactor sk_page_frag_refill()
0d08c42cf9a71530fef5ebcfe368f38f2dd0476f tcp: gso: fix truesize tracking
e33d0ba8047b049c9262fdb1fcafb93cb52ceceb net-gro: reset skb->truesize in napi_reuse_skb()
b2532eb9abd88384aa586169b54a3e53574f29f8 tcp: fix ooo_okay setting vs Small Queues
f2d9da1a8375cbe53df5b415d059429013a3a79f bna: fix skb->truesize underestimation
9878196578286c5ed494778ada01da094377a686 tcp: do not pace pure ack packets
0cef6a4c34b56a9a6894f2dad2fad4be789990e1 tcp: give prequeue mode some care
95b58430abe74f5e50970c57d27380bd5b8be324 fq_codel: add memory limitation per queue
008830bc321c0fc22c0db8d5b0b56f854ed90a5c net_sched: fq_codel: cache skb->truesize into skb->cb
c9c3321257e1b95be9b375f811fb250162af8d39 tcp: add tcp_add_backlog()
a297569fe00a8fae18547061d355c45ef191b483 net/udp: do not touch skb->peeked unless really needed
c8c8b127091b758f5768f906bcdeeb88bc9951ca udp: under rx pressure, try to condense skbs
c84d949057cab262b4d3110ead9a42a58c2958f7 udp: copy skb->truesize in the first cache line
158f323b9868b59967ad96957c4ca388161be321 net: adjust skb->truesize in pskb_expand_head()
48cac18ecf1de82f76259a54402c3adb7839ad01 ipv6: orphan skbs in reassembly unit
60c7f5ae5416a8491216bcccf6b3b3d842d69fa4 mlx4: removal of frag_sizes[]
b5a54d9a313645ec9607dc557b67d9325c28884c mlx4: use order-0 pages for RX
7162fb242cb8322beb558828fd26b33c3e9fc805 tcp: do not underestimate skb->truesize in tcp_trim_head()
c21b48cc1bbf2f5af3ef54ada559f7fadf8b508b net: adjust skb->truesize in ___pskb_trim()
d1f496fd8f34a40458d0eda6be0655926559e546 bpf: restore skb->sk before pskb_trim() call
f6ba8d33cfbb46df569972e64dbb5bb7e929bfd9 netem: fix skb_orphan_partial()
7ec318feeed10a64c0359ec4d10889cb4defa39a tcp: gso: avoid refcount_t warning from tcp_gso_segment()
72cd43ba64fc172a443410ce01645895850844c8 tcp: free batches of packets in tcp_prune_ofo_queue()
4672694bd4f1aebdab0ad763ae4716e89cb15221 ipv4: frags: handle possible skb truesize change
50ce163a72d817a99e8974222dcf2886d5deb1ae tcp: tcp_grow_window() needs to respect tcp_space()
d7cc399e1227e74e44f78847d9732a228b46cc91 tcp: properly reset skb->truesize for tx recycling
24adbc1676af4e134e709ddc7f34cf2adc2131e4 tcp: fix SO_RCVLOWAT hangs with fat skbs


* RE: [PATCH net] net: avoid 32 x truesize under-estimation for tiny skbs
  2021-01-14  5:16   ` Eric Dumazet
@ 2021-01-14  9:29     ` David Laight
  0 siblings, 0 replies; 7+ messages in thread
From: David Laight @ 2021-01-14  9:29 UTC (permalink / raw)
  To: 'Eric Dumazet'
  Cc: Eric Dumazet, David S . Miller, Jakub Kicinski, netdev,
	Alexander Duyck, Paolo Abeni, Michael S . Tsirkin, Greg Thelen

From: Eric Dumazet
> Sent: 14 January 2021 05:17
> 
> On Wed, Jan 13, 2021 at 11:23 PM David Laight <David.Laight@aculab.com> wrote:
> >
> > From: Eric Dumazet
> > > Sent: 13 January 2021 16:18
> > >
> > > From: Eric Dumazet <edumazet@google.com>
> > >
> > > Both virtio net and napi_get_frags() allocate skbs
> > > with a very small skb->head
> > >
> > > While using page fragments instead of a kmalloc backed skb->head might give
> > > a small performance improvement in some cases, there is a huge risk of
> > > under estimating memory usage.
> >
> > There is (or was last time I looked) also a problem with
> > some of the USB ethernet drivers.
> >
> > IIRC one of the ASXnnnnnn (???) USB3 ones allocates 64k skb to pass
> > to the USB stack and then just lies about skb->truesize when passing
> > them into the network stack.
> 
> You sure ? I think I have fixed this at some point
> 
> https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net.git/commit/?id=a9e0aca4b37885b5599e52211f098bd7f565e749

I might have forgotten that patch :-)
Or possibly only remembered it changing small packets.

> > The USB hardware will merge TCP receives and put multiple ethernet
> > packets into a single USB message.
> > But single frames can end up in very big kernel memory buffers.
> >
> 
> Yeah, this is a known problem.

The whole USB ethernet block is somewhat horrid and inefficient
especially for XHCI/USB3 - which could have high speed ethernet.
It really needs to either directly interface to the XHCI ring
(like a normal ethernet driver) or be given the sequence of
USB rx packets to split/join into ethernet frames.

However I don't have the time to make those changes.
When I was looking at that driver 'dayjob' was actually
trying to make it work.
They dropped that idea later.
I've not got the ethernet dongle any more either.

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)


* Re: [PATCH net] net: avoid 32 x truesize under-estimation for tiny skbs
  2021-01-13 16:18 [PATCH net] net: avoid 32 x truesize under-estimation for tiny skbs Eric Dumazet
                   ` (2 preceding siblings ...)
  2021-01-13 22:23 ` David Laight
@ 2021-01-14 19:00 ` patchwork-bot+netdevbpf
  3 siblings, 0 replies; 7+ messages in thread
From: patchwork-bot+netdevbpf @ 2021-01-14 19:00 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: davem, kuba, netdev, edumazet, alexanderduyck, pabeni, mst, gthelen

Hello:

This patch was applied to netdev/net.git (refs/heads/master):

On Wed, 13 Jan 2021 08:18:19 -0800 you wrote:
> From: Eric Dumazet <edumazet@google.com>
> 
> Both virtio net and napi_get_frags() allocate skbs
> with a very small skb->head
> 
> While using page fragments instead of a kmalloc backed skb->head might give
> a small performance improvement in some cases, there is a huge risk of
> under estimating memory usage.
> 
> [...]

Here is the summary with links:
  - [net] net: avoid 32 x truesize under-estimation for tiny skbs
    https://git.kernel.org/netdev/net/c/3226b158e67c

You are awesome, thank you!
--
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html




