* [PATCH v5 RFC 5/6] page_pool: update document about frag API
From: Yunsheng Lin @ 2023-06-29 12:02 UTC
To: davem, kuba, pabeni
Cc: netdev, linux-kernel, Yunsheng Lin, Lorenzo Bianconi,
Alexander Duyck, Liang Chen, Alexander Lobakin,
Jesper Dangaard Brouer, Ilias Apalodimas, Eric Dumazet,
Jonathan Corbet, Alexei Starovoitov, Daniel Borkmann,
John Fastabend, linux-doc, bpf
As more drivers begin to use the frag API, update the
documentation to help driver authors decide which API to use.

Also, there is a similar document in page_pool.h; remove it
to avoid duplication.
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
CC: Lorenzo Bianconi <lorenzo@kernel.org>
CC: Alexander Duyck <alexander.duyck@gmail.com>
CC: Liang Chen <liangchen.linux@gmail.com>
CC: Alexander Lobakin <aleksander.lobakin@intel.com>
---
Documentation/networking/page_pool.rst | 34 ++++++++++++++++++++++----
include/net/page_pool.h | 22 -----------------
2 files changed, 29 insertions(+), 27 deletions(-)
diff --git a/Documentation/networking/page_pool.rst b/Documentation/networking/page_pool.rst
index 873efd97f822..18b13d659c98 100644
--- a/Documentation/networking/page_pool.rst
+++ b/Documentation/networking/page_pool.rst
@@ -4,12 +4,27 @@
Page Pool API
=============
-The page_pool allocator is optimized for the XDP mode that uses one frame
-per-page, but it can fallback on the regular page allocator APIs.
+The page_pool allocator is optimized for recycling page or page frag used by skb
+packet and xdp frame.
-Basic use involves replacing alloc_pages() calls with the
-page_pool_alloc_pages() call. Drivers should use page_pool_dev_alloc_pages()
-replacing dev_alloc_pages().
+Basic use involves replacing napi_alloc_frag() and alloc_pages() calls with
+page_pool_cache_alloc() and page_pool_alloc(), which allocate memory with or
+without page splitting depending on the requested memory size.
+
+If the driver knows that it always requires full pages or its allocates are
+always smaller than half a page, it can use one of the more specific API calls:
+
+1. page_pool_alloc_pages(): allocate memory without page splitting when driver
+   knows that the memory it needs is always bigger than half of the page
+   allocated from page pool. There is no cache line dirtying for 'struct page'
+   when a page is recycled back to the page pool.
+
+2. page_pool_alloc_frag(): allocate memory with page splitting when driver knows
+   that the memory it needs is always smaller than or equal to half of the page
+   allocated from page pool. Page splitting enables memory saving and thus avoid
+   TLB/cache miss for data access, but there also is some cost to implement page
+   splitting, mainly some cache line dirtying/bouncing for 'struct page' and
+   atomic operation for page->pp_frag_count.
API keeps track of in-flight pages, in order to let API user know
when it is safe to free a page_pool object. Thus, API users
@@ -93,6 +108,15 @@ a page will cause no race conditions is enough.
 * page_pool_dev_alloc_pages(): Get a page from the page allocator or page_pool
   caches.
 
+* page_pool_dev_alloc_frag(): Get a page frag from the page allocator or
+  page_pool caches.
+
+* page_pool_dev_alloc(): Get a page or page frag from the page allocator or
+  page_pool caches.
+
+* page_pool_dev_cache_alloc(): Get a cache from the page allocator or page_pool
+  caches.
+
 * page_pool_get_dma_addr(): Retrieve the stored DMA address.
 
 * page_pool_get_dma_dir(): Retrieve the stored DMA direction.
diff --git a/include/net/page_pool.h b/include/net/page_pool.h
index e9fb95d62ed5..2b7db9992fc0 100644
--- a/include/net/page_pool.h
+++ b/include/net/page_pool.h
@@ -5,28 +5,6 @@
* Copyright (C) 2016 Red Hat, Inc.
*/
-/**
- * DOC: page_pool allocator
- *
- * This page_pool allocator is optimized for the XDP mode that
- * uses one-frame-per-page, but have fallbacks that act like the
- * regular page allocator APIs.
- *
- * Basic use involve replacing alloc_pages() calls with the
- * page_pool_alloc_pages() call. Drivers should likely use
- * page_pool_dev_alloc_pages() replacing dev_alloc_pages().
- *
- * API keeps track of in-flight pages, in-order to let API user know
- * when it is safe to dealloactor page_pool object. Thus, API users
- * must make sure to call page_pool_release_page() when a page is
- * "leaving" the page_pool. Or call page_pool_put_page() where
- * appropiate. For maintaining correct accounting.
- *
- * API user must only call page_pool_put_page() once on a page, as it
- * will either recycle the page, or in case of elevated refcnt, it
- * will release the DMA mapping and in-flight state accounting. We
- * hope to lift this requirement in the future.
- */
#ifndef _NET_PAGE_POOL_H
#define _NET_PAGE_POOL_H
--
2.33.0
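
To make the decision rule above concrete, here is a minimal sketch of a
receive-buffer refill helper written against the API proposed in this
series. It is illustrative only: struct my_rxq, rxq->pool and
MY_RX_BUF_LEN are hypothetical names, and the calls follow the
signatures used in this series (see the veth patch below) rather than
any merged kernel API.

/* Hypothetical refill helper: a sketch, not a tested driver.
 * Assumes rxq->pool is an initialized struct page_pool and
 * MY_RX_BUF_LEN is the fixed buffer size one descriptor needs.
 */
static void *my_rx_refill_one(struct my_rxq *rxq)
{
	struct page *page;
	unsigned int offset;

	if (MY_RX_BUF_LEN > PAGE_SIZE / 2) {
		/* Case 1 above: the buffer is always bigger than half a
		 * page, so take a whole page and avoid the
		 * page->pp_frag_count atomics and the 'struct page'
		 * cache line dirtying that come with page splitting.
		 */
		page = page_pool_dev_alloc_pages(rxq->pool);
		return page ? page_address(page) : NULL;
	}

	/* Case 2 above: the buffer always fits in half a page, so let
	 * the pool split one page into several frags and return the
	 * offset of the frag inside the page.
	 */
	page = page_pool_dev_alloc_frag(rxq->pool, &offset, MY_RX_BUF_LEN);
	return page ? page_address(page) + offset : NULL;
}

A driver whose buffer size varies at runtime would instead call
page_pool_dev_alloc() or page_pool_dev_cache_alloc() and let the pool
decide whether splitting pays off, which is what the veth conversion in
patch 6/6 does.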
* Re: [PATCH v5 RFC 5/6] page_pool: update document about frag API
From: Randy Dunlap @ 2023-06-29 20:30 UTC
To: Yunsheng Lin, davem, kuba, pabeni
Cc: netdev, linux-kernel, Lorenzo Bianconi, Alexander Duyck,
Liang Chen, Alexander Lobakin, Jesper Dangaard Brouer,
Ilias Apalodimas, Eric Dumazet, Jonathan Corbet,
Alexei Starovoitov, Daniel Borkmann, John Fastabend, linux-doc,
bpf
Hi--
On 6/29/23 05:02, Yunsheng Lin wrote:
> As more drivers begin to use the frag API, update the
> documentation to help driver authors decide which API to use.
>
> Also, there is a similar document in page_pool.h; remove it
> to avoid duplication.
>
> Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
> CC: Lorenzo Bianconi <lorenzo@kernel.org>
> CC: Alexander Duyck <alexander.duyck@gmail.com>
> CC: Liang Chen <liangchen.linux@gmail.com>
> CC: Alexander Lobakin <aleksander.lobakin@intel.com>
> ---
> Documentation/networking/page_pool.rst | 34 ++++++++++++++++++++++----
> include/net/page_pool.h | 22 -----------------
> 2 files changed, 29 insertions(+), 27 deletions(-)
>
> diff --git a/Documentation/networking/page_pool.rst b/Documentation/networking/page_pool.rst
> index 873efd97f822..18b13d659c98 100644
> --- a/Documentation/networking/page_pool.rst
> +++ b/Documentation/networking/page_pool.rst
> @@ -4,12 +4,27 @@
> Page Pool API
> =============
>
> -The page_pool allocator is optimized for the XDP mode that uses one frame
> -per-page, but it can fallback on the regular page allocator APIs.
> +The page_pool allocator is optimized for recycling page or page frag used by skb
> +packet and xdp frame.
That sentence could use some adjectives. Choose singular or plural:
> +The page_pool allocator is optimized for recycling a page or page frag used by an skb
> +packet or xdp frame.
or
> +The page_pool allocator is optimized for recycling pages or page frags used by skb
> +packets or xdp frames.
Now that I have written them, I prefer the latter one (plural). FWIW.
>
> -Basic use involves replacing alloc_pages() calls with the
> -page_pool_alloc_pages() call. Drivers should use page_pool_dev_alloc_pages()
> -replacing dev_alloc_pages().
> +Basic use involves replacing napi_alloc_frag() and alloc_pages() calls with
> +page_pool_cache_alloc() and page_pool_alloc(), which allocate memory with or
> +without page splitting depending on the requested memory size.
> +
> +If the driver knows that it always requires full pages or its allocates are
allocations are
> +always smaller than half a page, it can use one of the more specific API calls:
> +
> +1. page_pool_alloc_pages(): allocate memory without page splitting when driver
> +   knows that the memory it needs is always bigger than half of the page
> + allocated from page pool. There is no cache line dirtying for 'struct page'
> + when a page is recycled back to the page pool.
> +
> +2. page_pool_alloc_frag(): allocate memory with page splitting when driver knows
> +   that the memory it needs is always smaller than or equal to half of the page
> + allocated from page pool. Page splitting enables memory saving and thus avoid
and thus avoids
> + TLB/cache miss for data access, but there also is some cost to implement page
> + splitting, mainly some cache line dirtying/bouncing for 'struct page' and
> + atomic operation for page->pp_frag_count.
>
> API keeps track of in-flight pages, in order to let API user know
> when it is safe to free a page_pool object. Thus, API users
> @@ -93,6 +108,15 @@ a page will cause no race conditions is enough.
> * page_pool_dev_alloc_pages(): Get a page from the page allocator or page_pool
> caches.
>
> +* page_pool_dev_alloc_frag(): Get a page frag from the page allocator or
> + page_pool caches.
> +
> +* page_pool_dev_alloc(): Get a page or page frag from the page allocator or
> + page_pool caches.
> +
> +* page_pool_dev_cache_alloc(): Get a cache from the page allocator or page_pool
> + caches.
> +
> * page_pool_get_dma_addr(): Retrieve the stored DMA address.
>
> * page_pool_get_dma_dir(): Retrieve the stored DMA direction.
Thanks for adding the documentation.
--
~Randy
* [PATCH v5 RFC 6/6] net: veth: use newly added page pool API for veth with xdp
From: Yunsheng Lin @ 2023-06-29 12:02 UTC
To: davem, kuba, pabeni
Cc: netdev, linux-kernel, Yunsheng Lin, Lorenzo Bianconi,
Alexander Duyck, Liang Chen, Alexander Lobakin, Eric Dumazet,
Alexei Starovoitov, Daniel Borkmann, Jesper Dangaard Brouer,
John Fastabend, bpf
Use the page_pool[_cache]_alloc() API to allocate memory with the
least memory utilization and performance penalty.
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
CC: Lorenzo Bianconi <lorenzo@kernel.org>
CC: Alexander Duyck <alexander.duyck@gmail.com>
CC: Liang Chen <liangchen.linux@gmail.com>
CC: Alexander Lobakin <aleksander.lobakin@intel.com>
---
drivers/net/veth.c | 24 +++++++++++++++---------
1 file changed, 15 insertions(+), 9 deletions(-)
diff --git a/drivers/net/veth.c b/drivers/net/veth.c
index 614f3e3efab0..d3dc754ba7f8 100644
--- a/drivers/net/veth.c
+++ b/drivers/net/veth.c
@@ -736,10 +736,11 @@ static int veth_convert_skb_to_xdp_buff(struct veth_rq *rq,
if (skb_shared(skb) || skb_head_is_locked(skb) ||
skb_shinfo(skb)->nr_frags ||
skb_headroom(skb) < XDP_PACKET_HEADROOM) {
- u32 size, len, max_head_size, off;
+ u32 size, len, max_head_size, off, truesize, page_offset;
struct sk_buff *nskb;
struct page *page;
int i, head_off;
+ void *data;
/* We need a private copy of the skb and data buffers since
* the ebpf program can modify it. We segment the original skb
@@ -752,14 +753,17 @@ static int veth_convert_skb_to_xdp_buff(struct veth_rq *rq,
if (skb->len > PAGE_SIZE * MAX_SKB_FRAGS + max_head_size)
goto drop;
+ size = min_t(u32, skb->len, max_head_size);
+ truesize = size;
+
/* Allocate skb head */
- page = page_pool_dev_alloc_pages(rq->page_pool);
- if (!page)
+ data = page_pool_dev_cache_alloc(rq->page_pool, &truesize);
+ if (!data)
goto drop;
- nskb = napi_build_skb(page_address(page), PAGE_SIZE);
+ nskb = napi_build_skb(data, truesize);
if (!nskb) {
- page_pool_put_full_page(rq->page_pool, page, true);
+ page_pool_cache_free(rq->page_pool, data, true);
goto drop;
}
@@ -767,7 +771,6 @@ static int veth_convert_skb_to_xdp_buff(struct veth_rq *rq,
skb_copy_header(nskb, skb);
skb_mark_for_recycle(nskb);
- size = min_t(u32, skb->len, max_head_size);
if (skb_copy_bits(skb, 0, nskb->data, size)) {
consume_skb(nskb);
goto drop;
@@ -782,14 +785,17 @@ static int veth_convert_skb_to_xdp_buff(struct veth_rq *rq,
len = skb->len - off;
for (i = 0; i < MAX_SKB_FRAGS && off < skb->len; i++) {
- page = page_pool_dev_alloc_pages(rq->page_pool);
+ size = min_t(u32, len, PAGE_SIZE);
+ truesize = size;
+
+ page = page_pool_dev_alloc(rq->page_pool, &page_offset,
+ &truesize);
if (!page) {
consume_skb(nskb);
goto drop;
}
- size = min_t(u32, len, PAGE_SIZE);
- skb_add_rx_frag(nskb, i, page, 0, size, PAGE_SIZE);
+ skb_add_rx_frag(nskb, i, page, page_offset, size, truesize);
if (skb_copy_bits(skb, off, page_address(page),
size)) {
consume_skb(nskb);
--
2.33.0
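
The shape of this conversion generalizes: request up to a page, then use
the offset and truesize the pool hands back. Below is a condensed,
hypothetical sketch of the frag-copy loop above, assuming the
page_pool_dev_alloc() signature proposed in this series; the helper name
and error codes are invented for the sketch, and the caller is assumed
to free nskb on failure.

/* Condensed sketch of the loop above, outside the full veth context. */
static int copy_to_pp_frags(struct page_pool *pool, struct sk_buff *nskb,
			    const struct sk_buff *skb, u32 off)
{
	u32 len = skb->len - off;
	int i;

	for (i = 0; i < MAX_SKB_FRAGS && off < skb->len; i++) {
		u32 size = min_t(u32, len, PAGE_SIZE);
		u32 truesize = size;
		u32 page_offset;
		struct page *page;

		/* May return a whole page or a frag of one; truesize is
		 * updated to the amount the pool actually reserved.
		 */
		page = page_pool_dev_alloc(pool, &page_offset, &truesize);
		if (!page)
			return -ENOMEM;

		/* Record the truesize the pool charged so skb memory
		 * accounting stays accurate when a page is shared.
		 */
		skb_add_rx_frag(nskb, i, page, page_offset, size, truesize);

		/* Copy at the frag's offset: with splitting, the frag
		 * may start in the middle of a shared page.
		 */
		if (skb_copy_bits(skb, off, page_address(page) + page_offset,
				  size))
			return -EFAULT;

		off += size;
		len -= size;
	}

	return 0;
}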
* Re: [PATCH v5 RFC 0/6] introduce page_pool_alloc() API
From: Alexander Lobakin @ 2023-06-29 14:26 UTC
To: Yunsheng Lin
Cc: davem, kuba, pabeni, netdev, linux-kernel, Alexei Starovoitov,
Daniel Borkmann, Jesper Dangaard Brouer, John Fastabend,
Matthias Brugger, AngeloGioacchino Del Regno, bpf,
linux-arm-kernel, linux-mediatek
From: Yunsheng Lin <linyunsheng@huawei.com>
Date: Thu, 29 Jun 2023 20:02:20 +0800
> In [1] & [2] & [3], there are use cases for veth and virtio_net
> to use frag support in page pool to reduce memory usage, and they
> may request different frag sizes depending on the head/tail
> room space for xdp_frame/shinfo and mtu/packet size. When the
> requested frag size is large enough that a single page can not
> be split into more than one frag, using frag support only has a
> performance penalty because of the extra frag count handling
> for frag support.
>
> So this patchset provides a page pool API for the driver to
> allocate memory with the least memory utilization and performance
> penalty when it doesn't know the size of the memory it needs
> beforehand.
>
> 1. https://patchwork.kernel.org/project/netdevbpf/patch/d3ae6bd3537fbce379382ac6a42f67e22f27ece2.1683896626.git.lorenzo@kernel.org/
> 2. https://patchwork.kernel.org/project/netdevbpf/patch/20230526054621.18371-3-liangchen.linux@gmail.com/
> 3. https://github.com/alobakin/linux/tree/iavf-pp-frag
Thanks for sharing the link :D
>
> v5 RFC: add a new page_pool_cache_alloc() API, and other minor
> changes as discussed in v4. As there seem to be three
> consumers that might make use of the new API, repost
> it as an RFC and CC the relevant authors to see if the
> new API fits their needs.
Tested v5 against my latest tree, no regressions, perf is even a bit
better than it was. That might also have come from net-next pulling
in Linus' tree with a good bunch of PRs already merged, or from the
v4 -> v5 update.
Re consumers, I'm planning to send the RFC series with IAVF as a
consumer on Monday (and a couple generic Page Pool improvements today,
will see).
>
> V4: Fix a typo and add a patch to update the documentation about
> the frag API. PAGE_POOL_DMA_USE_PP_FRAG_COUNT is not renamed yet
> as we may need a different thread to discuss that.
>
> V3: Incorporate changes from the discussion with Alexander,
> mostly the inline wrapper, the PAGE_POOL_DMA_USE_PP_FRAG_COUNT
> change split into a separate patch, and comment changes.
> V2: Add a patch to remove the PP_FLAG_PAGE_FRAG flag and mention
> the virtio_net use case in the cover letter.
> V1: Drop the RFC tag and the page_pool_frag patch.
Thanks,
Olek
* Re: [PATCH v5 RFC 0/6] introduce page_pool_alloc() API
From: Yunsheng Lin @ 2023-06-30 11:57 UTC
To: Alexander Lobakin
Cc: davem, kuba, pabeni, netdev, linux-kernel, Alexei Starovoitov,
Daniel Borkmann, Jesper Dangaard Brouer, John Fastabend,
Matthias Brugger, AngeloGioacchino Del Regno, bpf,
linux-arm-kernel, linux-mediatek
On 2023/6/29 22:26, Alexander Lobakin wrote:
>> v5 RFC: add a new page_pool_cache_alloc() API, and other minor
>> changes as discussed in v4. As there seem to be three
>> consumers that might make use of the new API, repost
>> it as an RFC and CC the relevant authors to see if the
>> new API fits their needs.
>
> Tested v5 against my latest tree, no regressions, perf is even a bit
> better than it was. That might also have come from net-next pulling
> in Linus' tree with a good bunch of PRs already merged, or from the
> v4 -> v5 update.
v4 -> v5 is mostly about adding the page pool cache API, so I believe
the perf improvement is from the net-next pull. :)
>
> Re consumers, I'm planning to send the RFC series with IAVF as a
> consumer on Monday (and a couple generic Page Pool improvements today,
> will see).
Thanks.