Re: [PATCH net] net: avoid 32 x truesize under-estimation for tiny skbs

From: Paolo Abeni <pabeni@redhat.com>
To: Eric Dumazet <eric.dumazet@gmail.com>,
	"David S . Miller" <davem@davemloft.net>,
	Jakub Kicinski <kuba@kernel.org>
Cc: netdev <netdev@vger.kernel.org>,
	Eric Dumazet <edumazet@google.com>,
	Alexander Duyck <alexanderduyck@fb.com>,
	"Michael S . Tsirkin" <mst@redhat.com>,
	Greg Thelen <gthelen@google.com>
Subject: Re: [PATCH net] net: avoid 32 x truesize under-estimation for tiny skbs
Date: Wed, 07 Sep 2022 22:19:53 +0200
Message-ID: <bd79ede94805326cd63f105c84f1eaa4e75c8176.camel@redhat.com>
In-Reply-To: <20210113161819.1155526-1-eric.dumazet@gmail.com>

Hello,

reviving an old thread...
On Wed, 2021-01-13 at 08:18 -0800, Eric Dumazet wrote:
> While using page fragments instead of a kmalloc backed skb->head might give
> a small performance improvement in some cases, there is a huge risk of
> under estimating memory usage.

[...]

> Note that we might in the future use the sk_buff napi cache,
> instead of going through a more expensive __alloc_skb()
> 
> Another idea would be to use separate page sizes depending
> on the allocated length (to never have more than 4 frags per page)

I'm investigating a couple of performance regressions pointing to this
change, and I'd like to try the 2nd suggestion above.

If I read correctly, it means:
- extend the page_frag_cache alloc API to allow forcing max order==0
- add a 2nd page_frag_cache into napi_alloc_cache (say page_order0 or
page_small)
- in __napi_alloc_skb(), when len <= SKB_WITH_OVERHEAD(1024), use the
page_small cache with order 0 allocation.
(all of the above constrained to hosts with 4K pages)
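
A rough userspace model of the selection step in the list above, only to
make the idea concrete. This is not kernel code: model_pick_cache(), the
enum and the small_limit parameter are hypothetical names, and small_limit
stands in for SKB_WITH_OVERHEAD(1024).

```c
/* Userspace model only, not kernel code. All names are hypothetical. */
#include <assert.h>
#include <stddef.h>

#define MODEL_PAGE_4K 4096u	/* the proposal is limited to 4K-page hosts */

enum model_cache {
	MODEL_CACHE_SMALL,	/* 2nd cache, forced to order-0 pages */
	MODEL_CACHE_DEFAULT,	/* existing page_frag_cache behaviour */
};

/*
 * Pick the cache for a request of len bytes.
 * small_limit stands in for SKB_WITH_OVERHEAD(1024); it is passed in
 * because its value depends on sizeof(struct skb_shared_info).
 */
static enum model_cache model_pick_cache(size_t len, size_t small_limit,
					 size_t page_size)
{
	assert(page_size > 0);

	if (page_size == MODEL_PAGE_4K && len <= small_limit)
		return MODEL_CACHE_SMALL;

	return MODEL_CACHE_DEFAULT;
}
```

With an assumed small_limit of 704, a 200 byte request on a 4K-page host
would land in the small cache, while the same request on a 64K-page host
would keep using the default one.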

I'm not quite sure about the "never have more than 4 frags per page"
part.

What is outlined above will allow for 10 min-size frags in page_order0, as
(SKB_DATA_ALIGN(0) + SKB_DATA_ALIGN(sizeof(struct skb_shared_info))) == 384.
I'm not sure that anything will allocate such small frags.
With a more reasonable GRO_MAX_HEAD, there will be 6 frags per page.
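
The per-page counts above are plain integer division, sketched below in
userspace C. frags_per_page() is a hypothetical helper, and the 4096 byte
page is the 4K assumption from the list above; the frag size that
corresponds to GRO_MAX_HEAD is config dependent and is not computed here.

```c
#include <assert.h>
#include <stddef.h>

/* Number of whole fragments of frag_size bytes that fit in one page. */
static size_t frags_per_page(size_t page_size, size_t frag_size)
{
	assert(frag_size > 0);
	return page_size / frag_size;
}
```

frags_per_page(4096, 384) gives 10, and any frag size from 586 to 682
bytes gives 6.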

The maximum truesize underestimation in both cases will be lower than
what we can get with the current code in the worst case (almost 32x
AFAICS). 
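
To put rough numbers on that comparison, here is a small userspace sketch.
It assumes the worst case is a single live frag pinning its whole backing
page while only the frag size is accounted, and that the current
page_frag_cache may be backed by up to 32KB; worst_case_underestimate() is
a hypothetical helper, not an existing kernel function.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Worst-case ratio between the memory actually pinned (the whole backing
 * page) and the size accounted for a single frag carved out of it.
 */
static double worst_case_underestimate(size_t backing_bytes,
				       size_t frag_bytes)
{
	assert(frag_bytes > 0);
	return (double)backing_bytes / (double)frag_bytes;
}
```

Under those assumptions a 1024 byte frag on a 32KB backing page gives a
factor of 32, while a 384 byte frag on a 4K page stays below 11.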

Is the above scheme safe enough, or should the requested size be
artificially inflated to fit at most 4 allocations per page_order0?
Am I missing something else? Apart from omitting a good deal of testing in
the above list ;)

Thanks!

Paolo