netdev.vger.kernel.org archive mirror
* [PATCH net-next v7 0/6] introduce page_pool_alloc() related API
@ 2023-08-16 10:01 Yunsheng Lin
  2023-08-16 10:01 ` [PATCH net-next v7 1/6] page_pool: frag API support for 32-bit arch with 64-bit DMA Yunsheng Lin
                   ` (5 more replies)
  0 siblings, 6 replies; 32+ messages in thread
From: Yunsheng Lin @ 2023-08-16 10:01 UTC (permalink / raw)
  To: davem, kuba, pabeni
  Cc: netdev, linux-kernel, Yunsheng Lin, Alexei Starovoitov,
	Daniel Borkmann, Jesper Dangaard Brouer, John Fastabend,
	Matthias Brugger, AngeloGioacchino Del Regno, bpf,
	linux-arm-kernel, linux-mediatek

In [1] & [2] & [3], there are use cases for veth and virtio_net
to use the frag support in page pool to reduce memory usage, and
they may request different frag sizes depending on the head/tail
room needed for xdp_frame/shinfo and the MTU/packet size. When
the requested frag size is large enough that a single page cannot
be split into more than one frag, using the frag support only
incurs a performance penalty because of the extra frag count
handling.

So this patchset provides a page pool API for the driver to
allocate memory with the least memory utilization and performance
penalty when it doesn't know the size of the memory it needs
beforehand.

1. https://patchwork.kernel.org/project/netdevbpf/patch/d3ae6bd3537fbce379382ac6a42f67e22f27ece2.1683896626.git.lorenzo@kernel.org/
2. https://patchwork.kernel.org/project/netdevbpf/patch/20230526054621.18371-3-liangchen.linux@gmail.com/
3. https://github.com/alobakin/linux/tree/iavf-pp-frag
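
For reference, the driver-visible shape of the new helpers added in
patch 4 of this series is roughly as below (see that patch for the
exact definitions):

	/* Allocate a page or a page frag; '*size' is in/out: the
	 * requested size goes in, the actually allocated size comes
	 * back out so the driver can report an accurate truesize.
	 */
	struct page *page_pool_dev_alloc(struct page_pool *pool,
					 unsigned int *offset,
					 unsigned int *size);

	/* Same, but returning the virtual address directly. */
	void *page_pool_dev_cache_alloc(struct page_pool *pool,
					unsigned int *size);

	/* Free memory obtained from page_pool_cache_alloc(). */
	void page_pool_cache_free(struct page_pool *pool, void *data,
				  bool allow_direct);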

V7: Fix a compile error and a few typos, and use kernel-doc syntax.

V6: Add a PP_FLAG_PAGE_SPLIT_IN_DRIVER flag to fail the page_pool
    creation on 32-bit arches with 64-bit DMA when the driver tries
    to do the page splitting itself, adjust the requested size to
    include head/tail room in veth, and rebase on the latest
    net-next.

v5 RFC: Add a new page_pool_cache_alloc() API, and other minor
        changes as discussed in v4. As there seem to be three
        consumers that might make use of the new API, repost
        it as RFC and CC the relevant authors to see if the
        new API fits their needs.

V4: Fix a typo and add a patch to update the document about the
    frag API; PAGE_POOL_DMA_USE_PP_FRAG_COUNT is not renamed yet
    as we may need a different thread to discuss that.

V3: Incorporate changes from the discussion with Alexander,
    mostly the inline wrapper, splitting the
    PAGE_POOL_DMA_USE_PP_FRAG_COUNT change into a separate patch,
    and comment changes.
V2: Add a patch to remove the PP_FLAG_PAGE_FRAG flag and mention
    the virtio_net use case in the cover letter.
V1: Drop RFC tag and page_pool_frag patch.

Yunsheng Lin (6):
  page_pool: frag API support for 32-bit arch with 64-bit DMA
  page_pool: unify frag_count handling in page_pool_is_last_frag()
  page_pool: remove PP_FLAG_PAGE_FRAG
  page_pool: introduce page_pool[_cache]_alloc() API
  page_pool: update document about frag API
  net: veth: use newly added page pool API for veth with xdp

 Documentation/networking/page_pool.rst        |   4 +-
 drivers/net/ethernet/broadcom/bnxt/bnxt.c     |   2 -
 .../net/ethernet/hisilicon/hns3/hns3_enet.c   |   3 +-
 .../marvell/octeontx2/nic/otx2_common.c       |   2 +-
 .../net/ethernet/mellanox/mlx5/core/en_main.c |   3 +-
 drivers/net/veth.c                            |  25 +-
 drivers/net/wireless/mediatek/mt76/mac80211.c |   2 +-
 include/net/page_pool/helpers.h               | 219 ++++++++++++++++--
 include/net/page_pool/types.h                 |  38 +--
 net/core/page_pool.c                          |  30 +--
 net/core/skbuff.c                             |   2 +-
 11 files changed, 261 insertions(+), 69 deletions(-)

-- 
2.33.0



* [PATCH net-next v7 1/6] page_pool: frag API support for 32-bit arch with 64-bit DMA
  2023-08-16 10:01 [PATCH net-next v7 0/6] introduce page_pool_alloc() related API Yunsheng Lin
@ 2023-08-16 10:01 ` Yunsheng Lin
  2023-08-17 13:57   ` Ilias Apalodimas
  2023-08-16 10:01 ` [PATCH net-next v7 2/6] page_pool: unify frag_count handling in page_pool_is_last_frag() Yunsheng Lin
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 32+ messages in thread
From: Yunsheng Lin @ 2023-08-16 10:01 UTC (permalink / raw)
  To: davem, kuba, pabeni
  Cc: netdev, linux-kernel, Yunsheng Lin, Lorenzo Bianconi,
	Alexander Duyck, Liang Chen, Alexander Lobakin, Saeed Mahameed,
	Leon Romanovsky, Eric Dumazet, Jesper Dangaard Brouer,
	Ilias Apalodimas, linux-rdma

Currently page_pool_alloc_frag() is not supported on 32-bit
arches with 64-bit DMA because of the overlap between
pp_frag_count and dma_addr_upper in 'struct page' on those
arches, which seem to be quite common (see [1]); this means the
driver may need to handle it when using the frag API.

In order to simplify the driver's work when using the frag API,
this patch allows page_pool_alloc_frag() to call
page_pool_alloc_pages() to return full pages on those arches.

Add a PP_FLAG_PAGE_SPLIT_IN_DRIVER flag in order to fail the
page_pool creation on 32-bit arches with 64-bit DMA when the
driver tries to do the page splitting itself.

Note that this may aggravate the truesize underestimate problem
for skbs, as there is no page splitting for those pages. If the
driver needs an accurate truesize, it may calculate it according
to the frag size, the page order, and whether
PAGE_POOL_DMA_USE_PP_FRAG_COUNT is true. And we may provide a
helper for that if it turns out to be helpful (a sketch follows
the link below).

1. https://lore.kernel.org/all/20211117075652.58299-1-linyunsheng@huawei.com/
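
Until such a helper exists, a minimal sketch of the calculation,
assuming only the semantics described above (the helper name is
hypothetical):

	/* Hypothetical driver-side truesize estimate for memory
	 * returned by page_pool_alloc_frag(): on arches where
	 * PAGE_POOL_DMA_USE_PP_FRAG_COUNT is true the pool hands out
	 * a whole (possibly high-order) page, so the full size is
	 * consumed; otherwise only the cache-aligned frag size is.
	 */
	static unsigned int drv_frag_truesize(struct page_pool *pool,
					      unsigned int size)
	{
		if (PAGE_POOL_DMA_USE_PP_FRAG_COUNT)
			return PAGE_SIZE << pool->p.order;

		return ALIGN(size, dma_get_cache_alignment());
	}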

Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
CC: Lorenzo Bianconi <lorenzo@kernel.org>
CC: Alexander Duyck <alexander.duyck@gmail.com>
CC: Liang Chen <liangchen.linux@gmail.com>
CC: Alexander Lobakin <aleksander.lobakin@intel.com>
---
 .../net/ethernet/mellanox/mlx5/core/en_main.c |  3 +-
 include/net/page_pool/helpers.h               | 38 +++++++++++++++--
 include/net/page_pool/types.h                 | 42 ++++++++++++-------
 net/core/page_pool.c                          | 15 +++----
 4 files changed, 68 insertions(+), 30 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index bc9d5a5bea01..ec9c5a8cbda6 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -834,7 +834,8 @@ static int mlx5e_alloc_rq(struct mlx5e_params *params,
 		struct page_pool_params pp_params = { 0 };
 
 		pp_params.order     = 0;
-		pp_params.flags     = PP_FLAG_DMA_MAP | PP_FLAG_DMA_SYNC_DEV | PP_FLAG_PAGE_FRAG;
+		pp_params.flags     = PP_FLAG_DMA_MAP | PP_FLAG_DMA_SYNC_DEV |
+					PP_FLAG_PAGE_SPLIT_IN_DRIVER;
 		pp_params.pool_size = pool_size;
 		pp_params.nid       = node;
 		pp_params.dev       = rq->pdev;
diff --git a/include/net/page_pool/helpers.h b/include/net/page_pool/helpers.h
index 94231533a369..cb18de55f239 100644
--- a/include/net/page_pool/helpers.h
+++ b/include/net/page_pool/helpers.h
@@ -29,8 +29,12 @@
 #ifndef _NET_PAGE_POOL_HELPERS_H
 #define _NET_PAGE_POOL_HELPERS_H
 
+#include <linux/dma-mapping.h>
 #include <net/page_pool/types.h>
 
+#define PAGE_POOL_DMA_USE_PP_FRAG_COUNT	\
+		(sizeof(dma_addr_t) > sizeof(unsigned long))
+
 #ifdef CONFIG_PAGE_POOL_STATS
 int page_pool_ethtool_stats_get_count(void);
 u8 *page_pool_ethtool_stats_get_strings(u8 *data);
@@ -73,6 +77,29 @@ static inline struct page *page_pool_dev_alloc_pages(struct page_pool *pool)
 	return page_pool_alloc_pages(pool, gfp);
 }
 
+static inline struct page *page_pool_alloc_frag(struct page_pool *pool,
+						unsigned int *offset,
+						unsigned int size, gfp_t gfp)
+{
+	unsigned int max_size = PAGE_SIZE << pool->p.order;
+
+	size = ALIGN(size, dma_get_cache_alignment());
+
+	if (WARN_ON(size > max_size))
+		return NULL;
+
+	/* Don't allow page splitting and allocate one big frag
+	 * for 32-bit arch with 64-bit DMA, corresponding to
+	 * the checking in page_pool_is_last_frag().
+	 */
+	if (PAGE_POOL_DMA_USE_PP_FRAG_COUNT) {
+		*offset = 0;
+		return page_pool_alloc_pages(pool, gfp);
+	}
+
+	return __page_pool_alloc_frag(pool, offset, size, gfp);
+}
+
 static inline struct page *page_pool_dev_alloc_frag(struct page_pool *pool,
 						    unsigned int *offset,
 						    unsigned int size)
@@ -134,8 +161,14 @@ static inline long page_pool_defrag_page(struct page *page, long nr)
 static inline bool page_pool_is_last_frag(struct page_pool *pool,
 					  struct page *page)
 {
-	/* If fragments aren't enabled or count is 0 we were the last user */
+	/* We assume we are the last frag user that is still holding
+	 * on to the page if:
+	 * 1. Fragments aren't enabled.
+	 * 2. We are running in 32-bit arch with 64-bit DMA.
+	 * 3. page_pool_defrag_page() indicate we are the last user.
+	 */
 	return !(pool->p.flags & PP_FLAG_PAGE_FRAG) ||
+	       PAGE_POOL_DMA_USE_PP_FRAG_COUNT ||
 	       (page_pool_defrag_page(page, 1) == 0);
 }
 
@@ -197,9 +230,6 @@ static inline void page_pool_recycle_direct(struct page_pool *pool,
 	page_pool_put_full_page(pool, page, true);
 }
 
-#define PAGE_POOL_DMA_USE_PP_FRAG_COUNT	\
-		(sizeof(dma_addr_t) > sizeof(unsigned long))
-
 /**
  * page_pool_get_dma_addr() - Retrieve the stored DMA address.
  * @page:	page allocated from a page pool
diff --git a/include/net/page_pool/types.h b/include/net/page_pool/types.h
index 887e7946a597..079337c42aa6 100644
--- a/include/net/page_pool/types.h
+++ b/include/net/page_pool/types.h
@@ -6,21 +6,29 @@
 #include <linux/dma-direction.h>
 #include <linux/ptr_ring.h>
 
-#define PP_FLAG_DMA_MAP		BIT(0) /* Should page_pool do the DMA
-					* map/unmap
-					*/
-#define PP_FLAG_DMA_SYNC_DEV	BIT(1) /* If set all pages that the driver gets
-					* from page_pool will be
-					* DMA-synced-for-device according to
-					* the length provided by the device
-					* driver.
-					* Please note DMA-sync-for-CPU is still
-					* device driver responsibility
-					*/
-#define PP_FLAG_PAGE_FRAG	BIT(2) /* for page frag feature */
+/* Should page_pool do the DMA map/unmap */
+#define PP_FLAG_DMA_MAP			BIT(0)
+
+/* If set all pages that the driver gets from page_pool will be
+ * DMA-synced-for-device according to the length provided by the device driver.
+ * Please note DMA-sync-for-CPU is still device driver responsibility
+ */
+#define PP_FLAG_DMA_SYNC_DEV		BIT(1)
+
+/* for page frag feature */
+#define PP_FLAG_PAGE_FRAG		BIT(2)
+
+/* If set driver will do the page splitting itself. This is used to fail the
+ * page_pool creation because there is overlap issue between pp_frag_count and
+ * dma_addr_upper in 'struct page' for some arches with
+ * PAGE_POOL_DMA_USE_PP_FRAG_COUNT being true.
+ */
+#define PP_FLAG_PAGE_SPLIT_IN_DRIVER	BIT(3)
+
 #define PP_FLAG_ALL		(PP_FLAG_DMA_MAP |\
 				 PP_FLAG_DMA_SYNC_DEV |\
-				 PP_FLAG_PAGE_FRAG)
+				 PP_FLAG_PAGE_FRAG |\
+				 PP_FLAG_PAGE_SPLIT_IN_DRIVER)
 
 /*
  * Fast allocation side cache array/stack
@@ -45,7 +53,8 @@ struct pp_alloc_cache {
 
 /**
  * struct page_pool_params - page pool parameters
- * @flags:	PP_FLAG_DMA_MAP, PP_FLAG_DMA_SYNC_DEV, PP_FLAG_PAGE_FRAG
+ * @flags:	PP_FLAG_DMA_MAP, PP_FLAG_DMA_SYNC_DEV, PP_FLAG_PAGE_FRAG,
+ *		PP_FLAG_PAGE_SPLIT_IN_DRIVER
  * @order:	2^order pages on allocation
  * @pool_size:	size of the ptr_ring
  * @nid:	NUMA node id to allocate from pages from
@@ -183,8 +192,9 @@ struct page_pool {
 };
 
 struct page *page_pool_alloc_pages(struct page_pool *pool, gfp_t gfp);
-struct page *page_pool_alloc_frag(struct page_pool *pool, unsigned int *offset,
-				  unsigned int size, gfp_t gfp);
+struct page *__page_pool_alloc_frag(struct page_pool *pool,
+				    unsigned int *offset, unsigned int size,
+				    gfp_t gfp);
 struct page_pool *page_pool_create(const struct page_pool_params *params);
 
 struct xdp_mem_info;
diff --git a/net/core/page_pool.c b/net/core/page_pool.c
index 77cb75e63aca..7d5f0512aa13 100644
--- a/net/core/page_pool.c
+++ b/net/core/page_pool.c
@@ -14,7 +14,6 @@
 #include <net/xdp.h>
 
 #include <linux/dma-direction.h>
-#include <linux/dma-mapping.h>
 #include <linux/page-flags.h>
 #include <linux/mm.h> /* for put_page() */
 #include <linux/poison.h>
@@ -212,7 +211,7 @@ static int page_pool_init(struct page_pool *pool,
 	}
 
 	if (PAGE_POOL_DMA_USE_PP_FRAG_COUNT &&
-	    pool->p.flags & PP_FLAG_PAGE_FRAG)
+	    pool->p.flags & PP_FLAG_PAGE_SPLIT_IN_DRIVER)
 		return -EINVAL;
 
 #ifdef CONFIG_PAGE_POOL_STATS
@@ -737,18 +736,16 @@ static void page_pool_free_frag(struct page_pool *pool)
 	page_pool_return_page(pool, page);
 }
 
-struct page *page_pool_alloc_frag(struct page_pool *pool,
-				  unsigned int *offset,
-				  unsigned int size, gfp_t gfp)
+struct page *__page_pool_alloc_frag(struct page_pool *pool,
+				    unsigned int *offset,
+				    unsigned int size, gfp_t gfp)
 {
 	unsigned int max_size = PAGE_SIZE << pool->p.order;
 	struct page *page = pool->frag_page;
 
-	if (WARN_ON(!(pool->p.flags & PP_FLAG_PAGE_FRAG) ||
-		    size > max_size))
+	if (WARN_ON(!(pool->p.flags & PP_FLAG_PAGE_FRAG)))
 		return NULL;
 
-	size = ALIGN(size, dma_get_cache_alignment());
 	*offset = pool->frag_offset;
 
 	if (page && *offset + size > max_size) {
@@ -781,7 +778,7 @@ struct page *page_pool_alloc_frag(struct page_pool *pool,
 	alloc_stat_inc(pool, fast);
 	return page;
 }
-EXPORT_SYMBOL(page_pool_alloc_frag);
+EXPORT_SYMBOL(__page_pool_alloc_frag);
 
 static void page_pool_empty_ring(struct page_pool *pool)
 {
-- 
2.33.0



* [PATCH net-next v7 2/6] page_pool: unify frag_count handling in page_pool_is_last_frag()
  2023-08-16 10:01 [PATCH net-next v7 0/6] introduce page_pool_alloc() related API Yunsheng Lin
  2023-08-16 10:01 ` [PATCH net-next v7 1/6] page_pool: frag API support for 32-bit arch with 64-bit DMA Yunsheng Lin
@ 2023-08-16 10:01 ` Yunsheng Lin
  2023-08-16 11:30   ` Ilias Apalodimas
  2023-08-16 10:01 ` [PATCH net-next v7 3/6] page_pool: remove PP_FLAG_PAGE_FRAG Yunsheng Lin
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 32+ messages in thread
From: Yunsheng Lin @ 2023-08-16 10:01 UTC (permalink / raw)
  To: davem, kuba, pabeni
  Cc: netdev, linux-kernel, Yunsheng Lin, Lorenzo Bianconi,
	Alexander Duyck, Liang Chen, Alexander Lobakin,
	Jesper Dangaard Brouer, Ilias Apalodimas, Eric Dumazet

Currently when page_pool_create() is called with the
PP_FLAG_PAGE_FRAG flag, page_pool_alloc_pages() is only
allowed to be called under the below constraints:
1. page_pool_fragment_page() needs to be called to set up
   page->pp_frag_count immediately.
2. page_pool_defrag_page() often needs to be called to drain
   the page->pp_frag_count when no user will be holding on
   to that page any more.

Those constraints exist in order to support splitting a page
into multiple frags.

And those constraints have some overhead because of the
cache line dirtying/bouncing and the atomic update.

Those constraints are unavoidable for the case when we need a
page to be split into more than one frag, but there is also the
case where we want to avoid the above constraints and their
overhead when a page can't be split, as it can only hold one
big frag as requested by the user. Depending on the use case:
use case 1: allocate a page without page splitting.
use case 2: allocate a page with page splitting.
use case 3: allocate a page with or without page splitting
            depending on the frag size.

Currently page pool only provides the page_pool_alloc_pages()
and page_pool_alloc_frag() APIs to enable use cases 1 & 2
separately, so we cannot use a combination of 1 & 2 to enable
use case 3; that is not possible yet because of the
per-page_pool flag PP_FLAG_PAGE_FRAG.

So in order to allow allocating unsplit pages without the
overhead of split pages, while still allowing split pages to be
allocated, we need to remove the per-page_pool flag in
page_pool_is_last_frag(). As best as I can think of, there are
two methods, as below:
1. Add a per-page flag/bit to indicate whether a page is split
   or not, which means we might need to update that flag/bit
   every time the page is recycled, dirtying the cache line
   of 'struct page' for use case 1.
2. Unify the page->pp_frag_count handling for both split and
   unsplit pages by assuming all pages in the page pool are
   split into one big frag initially.

As page pool already supports use case 1 without dirtying the
cache line of 'struct page' whenever a page is recycled, we need
to support the above use case 3 with minimal overhead, especially
without adding any noticeable overhead for use case 1. Since we
are already doing an optimization by not updating pp_frag_count
in page_pool_defrag_page() for the last frag user, this patch
chooses the second method: unify the pp_frag_count handling
to support the above use case 3.
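
For reference, a minimal sketch of how a driver splitting a page
itself interacts with these helpers under the unified handling
(hypothetical driver code; the drv_-prefixed names are made up):

	/* Constraint 1: set up pp_frag_count right after allocation,
	 * one reference per frag user; this overwrites the initial
	 * count of 1 set by the pool.
	 */
	static void drv_split_page(struct page *page)
	{
		page_pool_fragment_page(page, 2);

		/* ... hand the two halves of the page to their users ... */
	}

	/* Constraint 2: each frag user drops its reference when done;
	 * page_pool_is_last_frag() lets the last one recycle the page
	 * back into the pool.
	 */
	static void drv_frag_user_done(struct page_pool *pool,
				       struct page *page)
	{
		page_pool_put_page(pool, page, -1, false);
	}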

Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
CC: Lorenzo Bianconi <lorenzo@kernel.org>
CC: Alexander Duyck <alexander.duyck@gmail.com>
CC: Liang Chen <liangchen.linux@gmail.com>
CC: Alexander Lobakin <aleksander.lobakin@intel.com>
---
 include/net/page_pool/helpers.h | 54 +++++++++++++++++++++++----------
 net/core/page_pool.c            | 10 +++++-
 2 files changed, 47 insertions(+), 17 deletions(-)

diff --git a/include/net/page_pool/helpers.h b/include/net/page_pool/helpers.h
index cb18de55f239..19e8ba056868 100644
--- a/include/net/page_pool/helpers.h
+++ b/include/net/page_pool/helpers.h
@@ -134,7 +134,8 @@ inline enum dma_data_direction page_pool_get_dma_dir(struct page_pool *pool)
  */
 static inline void page_pool_fragment_page(struct page *page, long nr)
 {
-	atomic_long_set(&page->pp_frag_count, nr);
+	if (!PAGE_POOL_DMA_USE_PP_FRAG_COUNT)
+		atomic_long_set(&page->pp_frag_count, nr);
 }
 
 static inline long page_pool_defrag_page(struct page *page, long nr)
@@ -142,33 +143,54 @@ static inline long page_pool_defrag_page(struct page *page, long nr)
 	long ret;
 
 	/* If nr == pp_frag_count then we have cleared all remaining
-	 * references to the page. No need to actually overwrite it, instead
-	 * we can leave this to be overwritten by the calling function.
+	 * references to the page:
+	 * 1. 'n == 1': no need to actually overwrite it.
+	 * 2. 'n != 1': overwrite it with one, which is the rare case
+	 *              for frag draining.
 	 *
-	 * The main advantage to doing this is that an atomic_read is
-	 * generally a much cheaper operation than an atomic update,
-	 * especially when dealing with a page that may be partitioned
-	 * into only 2 or 3 pieces.
+	 * The main advantage to doing this is that not only we avoid a
+	 * atomic update, as an atomic_read is generally a much cheaper
+	 * operation than an atomic update, especially when dealing with
+	 * a page that may be partitioned into only 2 or 3 pieces; but
+	 * also unify the frag and non-frag handling by ensuring all
+	 * pages have been split into one big frag initially, and only
+	 * overwrite it when the page is split into more than one frag.
 	 */
-	if (atomic_long_read(&page->pp_frag_count) == nr)
+	if (atomic_long_read(&page->pp_frag_count) == nr) {
+		/* As we have ensured nr is always one for constant case
+		 * using the BUILD_BUG_ON(), only need to handle the
+		 * non-constant case here for frag count draining, which
+		 * is a rare case.
+		 */
+		BUILD_BUG_ON(__builtin_constant_p(nr) && nr != 1);
+		if (!__builtin_constant_p(nr))
+			atomic_long_set(&page->pp_frag_count, 1);
+
 		return 0;
+	}
 
 	ret = atomic_long_sub_return(nr, &page->pp_frag_count);
 	WARN_ON(ret < 0);
+
+	/* We are the last user here too, reset frag count back to 1 to
+	 * ensure all pages have been split into one big frag initially,
+	 * this should be the rare case when the last two frag users call
+	 * page_pool_defrag_page() currently.
+	 */
+	if (unlikely(!ret))
+		atomic_long_set(&page->pp_frag_count, 1);
+
 	return ret;
 }
 
-static inline bool page_pool_is_last_frag(struct page_pool *pool,
-					  struct page *page)
+static inline bool page_pool_is_last_frag(struct page *page)
 {
 	/* We assume we are the last frag user that is still holding
 	 * on to the page if:
-	 * 1. Fragments aren't enabled.
-	 * 2. We are running in 32-bit arch with 64-bit DMA.
-	 * 3. page_pool_defrag_page() indicate we are the last user.
+	 * 1. We are running in 32-bit arch with 64-bit DMA.
+	 * 2. page_pool_defrag_page() indicate we are the last user.
 	 */
-	return !(pool->p.flags & PP_FLAG_PAGE_FRAG) ||
-	       PAGE_POOL_DMA_USE_PP_FRAG_COUNT ||
+	return PAGE_POOL_DMA_USE_PP_FRAG_COUNT ||
 	       (page_pool_defrag_page(page, 1) == 0);
 }
 
@@ -194,7 +216,7 @@ static inline void page_pool_put_page(struct page_pool *pool,
 	 * allow registering MEM_TYPE_PAGE_POOL, but shield linker.
 	 */
 #ifdef CONFIG_PAGE_POOL
-	if (!page_pool_is_last_frag(pool, page))
+	if (!page_pool_is_last_frag(page))
 		return;
 
 	page_pool_put_defragged_page(pool, page, dma_sync_size, allow_direct);
diff --git a/net/core/page_pool.c b/net/core/page_pool.c
index 7d5f0512aa13..386e6d791e90 100644
--- a/net/core/page_pool.c
+++ b/net/core/page_pool.c
@@ -371,6 +371,14 @@ static void page_pool_set_pp_info(struct page_pool *pool,
 {
 	page->pp = pool;
 	page->pp_magic |= PP_SIGNATURE;
+
+	/* Ensuring all pages have been split into one big frag initially:
+	 * page_pool_set_pp_info() is only called once for every page when it
+	 * is allocated from the page allocator and page_pool_fragment_page()
+	 * is dirtying the same cache line as the page->pp_magic above, so
+	 * the overhead is negligible.
+	 */
+	page_pool_fragment_page(page, 1);
 	if (pool->p.init_callback)
 		pool->p.init_callback(page, pool->p.init_arg);
 }
@@ -667,7 +675,7 @@ void page_pool_put_page_bulk(struct page_pool *pool, void **data,
 		struct page *page = virt_to_head_page(data[i]);
 
 		/* It is not the last user for the page frag case */
-		if (!page_pool_is_last_frag(pool, page))
+		if (!page_pool_is_last_frag(page))
 			continue;
 
 		page = __page_pool_put_page(pool, page, -1, false);
-- 
2.33.0



* [PATCH net-next v7 3/6] page_pool: remove PP_FLAG_PAGE_FRAG
  2023-08-16 10:01 [PATCH net-next v7 0/6] introduce page_pool_alloc() related API Yunsheng Lin
  2023-08-16 10:01 ` [PATCH net-next v7 1/6] page_pool: frag API support for 32-bit arch with 64-bit DMA Yunsheng Lin
  2023-08-16 10:01 ` [PATCH net-next v7 2/6] page_pool: unify frag_count handling in page_pool_is_last_frag() Yunsheng Lin
@ 2023-08-16 10:01 ` Yunsheng Lin
  2023-08-16 10:01 ` [PATCH net-next v7 4/6] page_pool: introduce page_pool[_cache]_alloc() API Yunsheng Lin
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 32+ messages in thread
From: Yunsheng Lin @ 2023-08-16 10:01 UTC (permalink / raw)
  To: davem, kuba, pabeni
  Cc: netdev, linux-kernel, Yunsheng Lin, Lorenzo Bianconi,
	Alexander Duyck, Liang Chen, Alexander Lobakin, Michael Chan,
	Eric Dumazet, Yisen Zhuang, Salil Mehta, Sunil Goutham,
	Geetha sowjanya, Subbaraya Sundeep, hariprasad, Felix Fietkau,
	Ryder Lee, Shayne Chen, Sean Wang, Kalle Valo, Matthias Brugger,
	AngeloGioacchino Del Regno, Jesper Dangaard Brouer,
	Ilias Apalodimas, linux-wireless, linux-arm-kernel,
	linux-mediatek

PP_FLAG_PAGE_FRAG is not really needed after the pp_frag_count
handling is unified and page_pool_alloc_frag() is supported on
32-bit arches with 64-bit DMA, so remove it.

Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
CC: Lorenzo Bianconi <lorenzo@kernel.org>
CC: Alexander Duyck <alexander.duyck@gmail.com>
CC: Liang Chen <liangchen.linux@gmail.com>
CC: Alexander Lobakin <aleksander.lobakin@intel.com>
---
 drivers/net/ethernet/broadcom/bnxt/bnxt.c                | 2 --
 drivers/net/ethernet/hisilicon/hns3/hns3_enet.c          | 3 +--
 drivers/net/ethernet/marvell/octeontx2/nic/otx2_common.c | 2 +-
 drivers/net/wireless/mediatek/mt76/mac80211.c            | 2 +-
 include/net/page_pool/types.h                            | 8 ++------
 net/core/page_pool.c                                     | 3 ---
 net/core/skbuff.c                                        | 2 +-
 7 files changed, 6 insertions(+), 16 deletions(-)

diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
index 7be917a8da48..60b699be0d9b 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c
@@ -3249,8 +3249,6 @@ static int bnxt_alloc_rx_page_pool(struct bnxt *bp,
 	pp.napi = &rxr->bnapi->napi;
 	pp.dev = &bp->pdev->dev;
 	pp.dma_dir = DMA_BIDIRECTIONAL;
-	if (PAGE_SIZE > BNXT_RX_PAGE_SIZE)
-		pp.flags |= PP_FLAG_PAGE_FRAG;
 
 	rxr->page_pool = page_pool_create(&pp);
 	if (IS_ERR(rxr->page_pool)) {
diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c b/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c
index eac2d0573241..ff0c219365f1 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c
@@ -4926,8 +4926,7 @@ static void hns3_put_ring_config(struct hns3_nic_priv *priv)
 static void hns3_alloc_page_pool(struct hns3_enet_ring *ring)
 {
 	struct page_pool_params pp_params = {
-		.flags = PP_FLAG_DMA_MAP | PP_FLAG_PAGE_FRAG |
-				PP_FLAG_DMA_SYNC_DEV,
+		.flags = PP_FLAG_DMA_MAP | PP_FLAG_DMA_SYNC_DEV,
 		.order = hns3_page_order(ring),
 		.pool_size = ring->desc_num * hns3_buf_size(ring) /
 				(PAGE_SIZE << hns3_page_order(ring)),
diff --git a/drivers/net/ethernet/marvell/octeontx2/nic/otx2_common.c b/drivers/net/ethernet/marvell/octeontx2/nic/otx2_common.c
index dce3cea00032..edc6acebf369 100644
--- a/drivers/net/ethernet/marvell/octeontx2/nic/otx2_common.c
+++ b/drivers/net/ethernet/marvell/octeontx2/nic/otx2_common.c
@@ -1433,7 +1433,7 @@ int otx2_pool_init(struct otx2_nic *pfvf, u16 pool_id,
 		return 0;
 	}
 
-	pp_params.flags = PP_FLAG_PAGE_FRAG | PP_FLAG_DMA_MAP;
+	pp_params.flags = PP_FLAG_DMA_MAP;
 	pp_params.pool_size = numptrs;
 	pp_params.nid = NUMA_NO_NODE;
 	pp_params.dev = pfvf->dev;
diff --git a/drivers/net/wireless/mediatek/mt76/mac80211.c b/drivers/net/wireless/mediatek/mt76/mac80211.c
index d158320bc15d..fe7cc67b7ee2 100644
--- a/drivers/net/wireless/mediatek/mt76/mac80211.c
+++ b/drivers/net/wireless/mediatek/mt76/mac80211.c
@@ -566,7 +566,7 @@ int mt76_create_page_pool(struct mt76_dev *dev, struct mt76_queue *q)
 {
 	struct page_pool_params pp_params = {
 		.order = 0,
-		.flags = PP_FLAG_PAGE_FRAG,
+		.flags = 0,
 		.nid = NUMA_NO_NODE,
 		.dev = dev->dma_dev,
 	};
diff --git a/include/net/page_pool/types.h b/include/net/page_pool/types.h
index 079337c42aa6..4775cf95edb7 100644
--- a/include/net/page_pool/types.h
+++ b/include/net/page_pool/types.h
@@ -15,19 +15,15 @@
  */
 #define PP_FLAG_DMA_SYNC_DEV		BIT(1)
 
-/* for page frag feature */
-#define PP_FLAG_PAGE_FRAG		BIT(2)
-
 /* If set driver will do the page splitting itself. This is used to fail the
  * page_pool creation because there is overlap issue between pp_frag_count and
  * dma_addr_upper in 'struct page' for some arches with
  * PAGE_POOL_DMA_USE_PP_FRAG_COUNT being true.
  */
-#define PP_FLAG_PAGE_SPLIT_IN_DRIVER	BIT(3)
+#define PP_FLAG_PAGE_SPLIT_IN_DRIVER	BIT(2)
 
 #define PP_FLAG_ALL		(PP_FLAG_DMA_MAP |\
 				 PP_FLAG_DMA_SYNC_DEV |\
-				 PP_FLAG_PAGE_FRAG |\
 				 PP_FLAG_PAGE_SPLIT_IN_DRIVER)
 
 /*
@@ -53,7 +49,7 @@ struct pp_alloc_cache {
 
 /**
  * struct page_pool_params - page pool parameters
- * @flags:	PP_FLAG_DMA_MAP, PP_FLAG_DMA_SYNC_DEV, PP_FLAG_PAGE_FRAG,
+ * @flags:	PP_FLAG_DMA_MAP, PP_FLAG_DMA_SYNC_DEV,
  *		PP_FLAG_PAGE_SPLIT_IN_DRIVER
  * @order:	2^order pages on allocation
  * @pool_size:	size of the ptr_ring
diff --git a/net/core/page_pool.c b/net/core/page_pool.c
index 386e6d791e90..d3b8efe98d5e 100644
--- a/net/core/page_pool.c
+++ b/net/core/page_pool.c
@@ -751,9 +751,6 @@ struct page *__page_pool_alloc_frag(struct page_pool *pool,
 	unsigned int max_size = PAGE_SIZE << pool->p.order;
 	struct page *page = pool->frag_page;
 
-	if (WARN_ON(!(pool->p.flags & PP_FLAG_PAGE_FRAG)))
-		return NULL;
-
 	*offset = pool->frag_offset;
 
 	if (page && *offset + size > max_size) {
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 33fdf04d4334..4b90b6ed10b2 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -5709,7 +5709,7 @@ bool skb_try_coalesce(struct sk_buff *to, struct sk_buff *from,
 	/* In general, avoid mixing page_pool and non-page_pool allocated
 	 * pages within the same SKB. Additionally avoid dealing with clones
 	 * with page_pool pages, in case the SKB is using page_pool fragment
-	 * references (PP_FLAG_PAGE_FRAG). Since we only take full page
+	 * references (page_pool_alloc_frag()). Since we only take full page
 	 * references for cloned SKBs at the moment that would result in
 	 * inconsistent reference counts.
 	 * In theory we could take full references if @from is cloned and
-- 
2.33.0



* [PATCH net-next v7 4/6] page_pool: introduce page_pool[_cache]_alloc() API
  2023-08-16 10:01 [PATCH net-next v7 0/6] introduce page_pool_alloc() related API Yunsheng Lin
                   ` (2 preceding siblings ...)
  2023-08-16 10:01 ` [PATCH net-next v7 3/6] page_pool: remove PP_FLAG_PAGE_FRAG Yunsheng Lin
@ 2023-08-16 10:01 ` Yunsheng Lin
  2023-08-16 10:01 ` [PATCH net-next v7 5/6] page_pool: update document about frag API Yunsheng Lin
  2023-08-16 10:01 ` [PATCH net-next v7 6/6] net: veth: use newly added page pool API for veth with xdp Yunsheng Lin
  5 siblings, 0 replies; 32+ messages in thread
From: Yunsheng Lin @ 2023-08-16 10:01 UTC (permalink / raw)
  To: davem, kuba, pabeni
  Cc: netdev, linux-kernel, Yunsheng Lin, Lorenzo Bianconi,
	Alexander Duyck, Liang Chen, Alexander Lobakin,
	Jesper Dangaard Brouer, Ilias Apalodimas, Eric Dumazet

Currently page pool supports the below use cases:
use case 1: allocate a page without page splitting using the
            page_pool_alloc_pages() API if the driver knows
            that the memory it needs is always bigger than
            half of the page allocated from the page pool.
use case 2: allocate a page frag with page splitting using the
            page_pool_alloc_frag() API if the driver knows
            that the memory it needs is always smaller than
            or equal to half of the page allocated from the
            page pool.

There are emerging use cases [1] & [2] that are a mix of the
above two cases: the driver doesn't know the size of the memory
it needs beforehand, so it may use something like the below to
allocate memory with the least memory utilization and performance
penalty:

if (size << 1 > max_size)
	page = page_pool_alloc_pages();
else
	page = page_pool_alloc_frag();

To avoid the driver open-coding something like the above, add the
page_pool[_cache]_alloc() API to support this use case, and
report the size of the memory actually allocated back to the
driver via '*size' in order to avoid exacerbating the truesize
underestimate problem.

Rename page_pool_free(), which is used in the destroy process,
to __page_pool_destroy() to avoid confusion with the newly added
API.

1. https://lore.kernel.org/all/d3ae6bd3537fbce379382ac6a42f67e22f27ece2.1683896626.git.lorenzo@kernel.org/
2. https://lore.kernel.org/all/20230526054621.18371-3-liangchen.linux@gmail.com/
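
For illustration, a minimal driver-side usage sketch of the new
API (hypothetical driver code; error handling elided):

	/* Hypothetical rx buffer refill: 'size' goes in as the
	 * requested length and comes back as the length actually
	 * granted, so the driver can report an accurate truesize.
	 */
	static void *drv_alloc_rx_buf(struct page_pool *pool,
				      unsigned int len,
				      unsigned int *truesize)
	{
		unsigned int size = len;
		void *va;

		/* The pool decides internally whether to split the
		 * page or hand out a whole one, depending on 'size'.
		 */
		va = page_pool_dev_cache_alloc(pool, &size);
		if (unlikely(!va))
			return NULL;

		*truesize = size;
		return va;
	}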

Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
CC: Lorenzo Bianconi <lorenzo@kernel.org>
CC: Alexander Duyck <alexander.duyck@gmail.com>
CC: Liang Chen <liangchen.linux@gmail.com>
CC: Alexander Lobakin <aleksander.lobakin@intel.com>
---
 include/net/page_pool/helpers.h | 70 +++++++++++++++++++++++++++++++++
 net/core/page_pool.c            |  4 +-
 2 files changed, 72 insertions(+), 2 deletions(-)

diff --git a/include/net/page_pool/helpers.h b/include/net/page_pool/helpers.h
index 19e8ba056868..b920224f6584 100644
--- a/include/net/page_pool/helpers.h
+++ b/include/net/page_pool/helpers.h
@@ -109,6 +109,70 @@ static inline struct page *page_pool_dev_alloc_frag(struct page_pool *pool,
 	return page_pool_alloc_frag(pool, offset, size, gfp);
 }
 
+static inline struct page *page_pool_alloc(struct page_pool *pool,
+					   unsigned int *offset,
+					   unsigned int *size, gfp_t gfp)
+{
+	unsigned int max_size = PAGE_SIZE << pool->p.order;
+	struct page *page;
+
+	*size = ALIGN(*size, dma_get_cache_alignment());
+
+	if (WARN_ON(*size > max_size))
+		return NULL;
+
+	if ((*size << 1) > max_size || PAGE_POOL_DMA_USE_PP_FRAG_COUNT) {
+		*size = max_size;
+		*offset = 0;
+		return page_pool_alloc_pages(pool, gfp);
+	}
+
+	page = __page_pool_alloc_frag(pool, offset, *size, gfp);
+	if (unlikely(!page))
+		return NULL;
+
+	/* There is very likely not enough space for another frag, so append the
+	 * remaining size to the current frag to avoid truesize underestimate
+	 * problem.
+	 */
+	if (pool->frag_offset + *size > max_size) {
+		*size = max_size - *offset;
+		pool->frag_offset = max_size;
+	}
+
+	return page;
+}
+
+static inline struct page *page_pool_dev_alloc(struct page_pool *pool,
+					       unsigned int *offset,
+					       unsigned int *size)
+{
+	gfp_t gfp = (GFP_ATOMIC | __GFP_NOWARN);
+
+	return page_pool_alloc(pool, offset, size, gfp);
+}
+
+static inline void *page_pool_cache_alloc(struct page_pool *pool,
+					  unsigned int *size, gfp_t gfp)
+{
+	unsigned int offset;
+	struct page *page;
+
+	page = page_pool_alloc(pool, &offset, size, gfp);
+	if (unlikely(!page))
+		return NULL;
+
+	return page_address(page) + offset;
+}
+
+static inline void *page_pool_dev_cache_alloc(struct page_pool *pool,
+					      unsigned int *size)
+{
+	gfp_t gfp = (GFP_ATOMIC | __GFP_NOWARN);
+
+	return page_pool_cache_alloc(pool, size, gfp);
+}
+
 /**
  * page_pool_get_dma_dir() - Retrieve the stored DMA direction.
  * @pool:	pool from which page was allocated
@@ -252,6 +316,12 @@ static inline void page_pool_recycle_direct(struct page_pool *pool,
 	page_pool_put_full_page(pool, page, true);
 }
 
+static inline void page_pool_cache_free(struct page_pool *pool, void *data,
+					bool allow_direct)
+{
+	page_pool_put_page(pool, virt_to_head_page(data), -1, allow_direct);
+}
+
 /**
  * page_pool_get_dma_addr() - Retrieve the stored DMA address.
  * @page:	page allocated from a page pool
diff --git a/net/core/page_pool.c b/net/core/page_pool.c
index d3b8efe98d5e..6f970fc7a0a7 100644
--- a/net/core/page_pool.c
+++ b/net/core/page_pool.c
@@ -800,7 +800,7 @@ static void page_pool_empty_ring(struct page_pool *pool)
 	}
 }
 
-static void page_pool_free(struct page_pool *pool)
+static void __page_pool_destroy(struct page_pool *pool)
 {
 	if (pool->disconnect)
 		pool->disconnect(pool);
@@ -851,7 +851,7 @@ static int page_pool_release(struct page_pool *pool)
 	page_pool_scrub(pool);
 	inflight = page_pool_inflight(pool);
 	if (!inflight)
-		page_pool_free(pool);
+		__page_pool_destroy(pool);
 
 	return inflight;
 }
-- 
2.33.0



* [PATCH net-next v7 5/6] page_pool: update document about frag API
  2023-08-16 10:01 [PATCH net-next v7 0/6] introduce page_pool_alloc() related API Yunsheng Lin
                   ` (3 preceding siblings ...)
  2023-08-16 10:01 ` [PATCH net-next v7 4/6] page_pool: introduce page_pool[_cache]_alloc() API Yunsheng Lin
@ 2023-08-16 10:01 ` Yunsheng Lin
  2023-08-16 10:01 ` [PATCH net-next v7 6/6] net: veth: use newly added page pool API for veth with xdp Yunsheng Lin
  5 siblings, 0 replies; 32+ messages in thread
From: Yunsheng Lin @ 2023-08-16 10:01 UTC (permalink / raw)
  To: davem, kuba, pabeni
  Cc: netdev, linux-kernel, Yunsheng Lin, Lorenzo Bianconi,
	Alexander Duyck, Liang Chen, Alexander Lobakin,
	Jesper Dangaard Brouer, Ilias Apalodimas, Eric Dumazet,
	Jonathan Corbet, Alexei Starovoitov, Daniel Borkmann,
	John Fastabend, linux-doc, bpf

As more drivers begin to use the frag API, update the
documentation about how driver authors should decide which
API to use.
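
Paraphrasing the kernel-doc added below, the decision boils down to:

	/* Which allocation API to use (paraphrase of the guidance
	 * added in this patch):
	 *   - needed size always  > half a page: page_pool_alloc_pages()
	 *   - needed size always <= half a page: page_pool_alloc_frag()
	 *   - size only known at runtime:        page_pool_alloc() /
	 *                                        page_pool_cache_alloc()
	 */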

Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
CC: Lorenzo Bianconi <lorenzo@kernel.org>
CC: Alexander Duyck <alexander.duyck@gmail.com>
CC: Liang Chen <liangchen.linux@gmail.com>
CC: Alexander Lobakin <aleksander.lobakin@intel.com>
---
 Documentation/networking/page_pool.rst |  4 +-
 include/net/page_pool/helpers.h        | 67 +++++++++++++++++++++++---
 2 files changed, 64 insertions(+), 7 deletions(-)

diff --git a/Documentation/networking/page_pool.rst b/Documentation/networking/page_pool.rst
index 215ebc92752c..0c0705994f51 100644
--- a/Documentation/networking/page_pool.rst
+++ b/Documentation/networking/page_pool.rst
@@ -58,7 +58,9 @@ a page will cause no race conditions is enough.
 
 .. kernel-doc:: include/net/page_pool/helpers.h
    :identifiers: page_pool_put_page page_pool_put_full_page
-		 page_pool_recycle_direct page_pool_dev_alloc_pages
+		 page_pool_recycle_direct page_pool_cache_free
+		 page_pool_dev_alloc_pages page_pool_dev_alloc_frag
+		 page_pool_dev_alloc page_pool_dev_cache_alloc
 		 page_pool_get_dma_addr page_pool_get_dma_dir
 
 .. kernel-doc:: net/core/page_pool.c
diff --git a/include/net/page_pool/helpers.h b/include/net/page_pool/helpers.h
index b920224f6584..4abca6c9d9f4 100644
--- a/include/net/page_pool/helpers.h
+++ b/include/net/page_pool/helpers.h
@@ -8,13 +8,28 @@
 /**
  * DOC: page_pool allocator
  *
- * The page_pool allocator is optimized for the XDP mode that
- * uses one frame per-page, but it can fallback on the
- * regular page allocator APIs.
+ * The page_pool allocator is optimized for recycling the pages or page frags
+ * used by skb packets and xdp frames.
  *
- * Basic use involves replacing alloc_pages() calls with the
- * page_pool_alloc_pages() call.  Drivers should use
- * page_pool_dev_alloc_pages() replacing dev_alloc_pages().
+ * Basic use involves replacing napi_alloc_frag() and alloc_pages() calls with
+ * page_pool_cache_alloc() and page_pool_alloc(), which allocate memory with or
+ * without page splitting depending on the requested memory size.
+ *
+ * If the driver knows that it always requires full pages or its allocations are
+ * always smaller than half a page, it can use one of the more specific API
+ * calls:
+ *
+ * 1. page_pool_alloc_pages(): allocate memory without page splitting when the
+ * driver knows that the memory it needs is always bigger than half of the page
+ * allocated from the page pool. There is no cache line dirtying for
+ * 'struct page' when a page is recycled back to the page pool.
+ *
+ * 2. page_pool_alloc_frag(): allocate memory with page splitting when the
+ * driver knows that the memory it needs is always smaller than or equal to
+ * half of the page allocated from the page pool. Page splitting enables memory
+ * saving and thus avoids TLB/cache misses for data access, but there is also
+ * some cost to implementing page splitting, mainly cache line dirtying/bouncing
+ * for 'struct page' and atomic operations on page->pp_frag_count.
  *
  * API keeps track of in-flight pages, in order to let API user know
  * when it is safe to free a page_pool object.  Thus, API users
@@ -100,6 +115,17 @@ static inline struct page *page_pool_alloc_frag(struct page_pool *pool,
 	return __page_pool_alloc_frag(pool, offset, size, gfp);
 }
 
+/**
+ * page_pool_dev_alloc_frag() - allocate a page frag.
+ * @pool: pool from which to allocate
+ * @offset: offset to the allocated page
+ * @size: requested size
+ *
+ * Get a page frag from the page allocator or page_pool caches.
+ *
+ * Return:
+ * Returns the allocated page frag, otherwise returns NULL.
+ */
 static inline struct page *page_pool_dev_alloc_frag(struct page_pool *pool,
 						    unsigned int *offset,
 						    unsigned int size)
@@ -143,6 +169,17 @@ static inline struct page *page_pool_alloc(struct page_pool *pool,
 	return page;
 }
 
+/**
+ * page_pool_dev_alloc() - allocate a page or a page frag.
+ * @pool: pool from which to allocate
+ * @offset: offset to the allocated page
+ * @size: in as the requested size, out as the allocated size
+ *
+ * Get a page or a page frag from the page allocator or page_pool caches.
+ *
+ * Return:
+ * Returns a page or a page frag, otherwise returns NULL.
+ */
 static inline struct page *page_pool_dev_alloc(struct page_pool *pool,
 					       unsigned int *offset,
 					       unsigned int *size)
@@ -165,6 +202,16 @@ static inline void *page_pool_cache_alloc(struct page_pool *pool,
 	return page_address(page) + offset;
 }
 
+/**
+ * page_pool_dev_cache_alloc() - allocate a cache.
+ * @pool: pool from which to allocate
+ * @size: in as the requested size, out as the allocated size
+ *
+ * Get a cache from the page allocator or page_pool caches.
+ *
+ * Return:
+ * Returns the addr of the allocated cache, otherwise returns NULL.
+ */
 static inline void *page_pool_dev_cache_alloc(struct page_pool *pool,
 					      unsigned int *size)
 {
@@ -316,6 +363,14 @@ static inline void page_pool_recycle_direct(struct page_pool *pool,
 	page_pool_put_full_page(pool, page, true);
 }
 
+/**
+ * page_pool_cache_free() - free a cache into the page_pool
+ * @pool: pool from which cache was allocated
+ * @data: addr of the cache to be freed
+ * @allow_direct: freed by the consumer, allow lockless caching
+ *
+ * Free a cache allocated from page_pool_dev_cache_alloc().
+ */
 static inline void page_pool_cache_free(struct page_pool *pool, void *data,
 					bool allow_direct)
 {
-- 
2.33.0



* [PATCH net-next v7 6/6] net: veth: use newly added page pool API for veth with xdp
  2023-08-16 10:01 [PATCH net-next v7 0/6] introduce page_pool_alloc() related API Yunsheng Lin
                   ` (4 preceding siblings ...)
  2023-08-16 10:01 ` [PATCH net-next v7 5/6] page_pool: update document about frag API Yunsheng Lin
@ 2023-08-16 10:01 ` Yunsheng Lin
  5 siblings, 0 replies; 32+ messages in thread
From: Yunsheng Lin @ 2023-08-16 10:01 UTC (permalink / raw)
  To: davem, kuba, pabeni
  Cc: netdev, linux-kernel, Yunsheng Lin, Lorenzo Bianconi,
	Alexander Duyck, Liang Chen, Alexander Lobakin, Eric Dumazet,
	Alexei Starovoitov, Daniel Borkmann, Jesper Dangaard Brouer,
	John Fastabend, bpf

Use the page_pool[_cache]_alloc() API to allocate memory with
the least memory utilization and performance penalty.

Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
CC: Lorenzo Bianconi <lorenzo@kernel.org>
CC: Alexander Duyck <alexander.duyck@gmail.com>
CC: Liang Chen <liangchen.linux@gmail.com>
CC: Alexander Lobakin <aleksander.lobakin@intel.com>
---
 drivers/net/veth.c | 25 ++++++++++++++++---------
 1 file changed, 16 insertions(+), 9 deletions(-)

diff --git a/drivers/net/veth.c b/drivers/net/veth.c
index 953f6d8f8db0..f9bb79d89a84 100644
--- a/drivers/net/veth.c
+++ b/drivers/net/veth.c
@@ -736,10 +736,11 @@ static int veth_convert_skb_to_xdp_buff(struct veth_rq *rq,
 	if (skb_shared(skb) || skb_head_is_locked(skb) ||
 	    skb_shinfo(skb)->nr_frags ||
 	    skb_headroom(skb) < XDP_PACKET_HEADROOM) {
-		u32 size, len, max_head_size, off;
+		u32 size, len, max_head_size, off, truesize, page_offset;
 		struct sk_buff *nskb;
 		struct page *page;
 		int i, head_off;
+		void *data;
 
 		/* We need a private copy of the skb and data buffers since
 		 * the ebpf program can modify it. We segment the original skb
@@ -752,14 +753,17 @@ static int veth_convert_skb_to_xdp_buff(struct veth_rq *rq,
 		if (skb->len > PAGE_SIZE * MAX_SKB_FRAGS + max_head_size)
 			goto drop;
 
+		size = min_t(u32, skb->len, max_head_size);
+		truesize = SKB_HEAD_ALIGN(size) + VETH_XDP_HEADROOM;
+
 		/* Allocate skb head */
-		page = page_pool_dev_alloc_pages(rq->page_pool);
-		if (!page)
+		data = page_pool_dev_cache_alloc(rq->page_pool, &truesize);
+		if (!data)
 			goto drop;
 
-		nskb = napi_build_skb(page_address(page), PAGE_SIZE);
+		nskb = napi_build_skb(data, truesize);
 		if (!nskb) {
-			page_pool_put_full_page(rq->page_pool, page, true);
+			page_pool_cache_free(rq->page_pool, data, true);
 			goto drop;
 		}
 
@@ -767,7 +771,6 @@ static int veth_convert_skb_to_xdp_buff(struct veth_rq *rq,
 		skb_copy_header(nskb, skb);
 		skb_mark_for_recycle(nskb);
 
-		size = min_t(u32, skb->len, max_head_size);
 		if (skb_copy_bits(skb, 0, nskb->data, size)) {
 			consume_skb(nskb);
 			goto drop;
@@ -782,14 +785,18 @@ static int veth_convert_skb_to_xdp_buff(struct veth_rq *rq,
 		len = skb->len - off;
 
 		for (i = 0; i < MAX_SKB_FRAGS && off < skb->len; i++) {
-			page = page_pool_dev_alloc_pages(rq->page_pool);
+			size = min_t(u32, len, PAGE_SIZE);
+			truesize = size;
+
+			page = page_pool_dev_alloc(rq->page_pool, &page_offset,
+						   &truesize);
 			if (!page) {
 				consume_skb(nskb);
 				goto drop;
 			}
 
-			size = min_t(u32, len, PAGE_SIZE);
-			skb_add_rx_frag(nskb, i, page, 0, size, PAGE_SIZE);
+			skb_add_rx_frag(nskb, i, page, page_offset, size,
+					truesize);
 			if (skb_copy_bits(skb, off, page_address(page),
 					  size)) {
 				consume_skb(nskb);
-- 
2.33.0



* Re: [PATCH net-next v7 2/6] page_pool: unify frag_count handling in page_pool_is_last_frag()
  2023-08-16 10:01 ` [PATCH net-next v7 2/6] page_pool: unify frag_count handling in page_pool_is_last_frag() Yunsheng Lin
@ 2023-08-16 11:30   ` Ilias Apalodimas
  2023-08-16 12:42     ` Yunsheng Lin
  0 siblings, 1 reply; 32+ messages in thread
From: Ilias Apalodimas @ 2023-08-16 11:30 UTC (permalink / raw)
  To: Yunsheng Lin
  Cc: davem, kuba, pabeni, netdev, linux-kernel, Lorenzo Bianconi,
	Alexander Duyck, Liang Chen, Alexander Lobakin,
	Jesper Dangaard Brouer, Eric Dumazet

On Wed, 16 Aug 2023 at 13:04, Yunsheng Lin <linyunsheng@huawei.com> wrote:
>
> Currently when page_pool_create() is called with
> PP_FLAG_PAGE_FRAG flag, page_pool_alloc_pages() is only
> allowed to be called under the below constraints:
> 1. page_pool_fragment_page() need to be called to setup
>    page->pp_frag_count immediately.
> 2. page_pool_defrag_page() often need to be called to drain
>    the page->pp_frag_count when there is no more user will
>    be holding on to that page.
>
> Those constraints exist in order to support a page to be
> split into multi frags.
>
> And those constraints have some overhead because of the
> cache line dirtying/bouncing and atomic update.
>
> Those constraints are unavoidable for case when we need a
> page to be split into more than one frag, but there is also
> case that we want to avoid the above constraints and their
> overhead when a page can't be split as it can only hold a big
> frag as requested by user, depending on different use cases:
> use case 1: allocate page without page splitting.
> use case 2: allocate page with page splitting.
> use case 3: allocate page with or without page splitting
>             depending on the frag size.
>
> Currently page pool only provide page_pool_alloc_pages() and
> page_pool_alloc_frag() API to enable the 1 & 2 separately,
> so we can not use a combination of 1 & 2 to enable 3, it is
> not possible yet because of the per page_pool flag
> PP_FLAG_PAGE_FRAG.
>

I really think we are inventing problems to solve here.
What would be more useful here would be an example with numbers.  Most
of what you mention is true, but what % of the packets would split a
page in a way that the remaining part can't be used is unknown.  Do you
have a use case in the hns3 driver?  Are there any numbers that justify
the change?

Thanks
/Ilias
> So in order to allow allocating unsplit page without the
> overhead of split page while still allow allocating split
> page we need to remove the per page_pool flag in
> page_pool_is_last_frag(), as best as I can think of, it seems
> there are two methods as below:
> 1. Add per page flag/bit to indicate a page is split or
>    not, which means we might need to update that flag/bit
>    everytime the page is recycled, dirtying the cache line
>    of 'struct page' for use case 1.
> 2. Unify the page->pp_frag_count handling for both split and
>    unsplit page by assuming all pages in the page pool is split
>    into a big frag initially.
>
> As page pool already supports use case 1 without dirtying the
> cache line of 'struct page' whenever a page is recyclable, we
> need to support the above use case 3 with minimal overhead,
> especially not adding any noticeable overhead for use case 1,
> and we are already doing an optimization by not updating
> pp_frag_count in page_pool_defrag_page() for the last frag
> user, this patch chooses to unify the pp_frag_count handling
> to support the above use case 3.
>
> Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
> CC: Lorenzo Bianconi <lorenzo@kernel.org>
> CC: Alexander Duyck <alexander.duyck@gmail.com>
> CC: Liang Chen <liangchen.linux@gmail.com>
> CC: Alexander Lobakin <aleksander.lobakin@intel.com>
> ---
>  include/net/page_pool/helpers.h | 54 +++++++++++++++++++++++----------
>  net/core/page_pool.c            | 10 +++++-
>  2 files changed, 47 insertions(+), 17 deletions(-)
>
> diff --git a/include/net/page_pool/helpers.h b/include/net/page_pool/helpers.h
> index cb18de55f239..19e8ba056868 100644
> --- a/include/net/page_pool/helpers.h
> +++ b/include/net/page_pool/helpers.h
> @@ -134,7 +134,8 @@ inline enum dma_data_direction page_pool_get_dma_dir(struct page_pool *pool)
>   */
>  static inline void page_pool_fragment_page(struct page *page, long nr)
>  {
> -       atomic_long_set(&page->pp_frag_count, nr);
> +       if (!PAGE_POOL_DMA_USE_PP_FRAG_COUNT)
> +               atomic_long_set(&page->pp_frag_count, nr);
>  }
>
>  static inline long page_pool_defrag_page(struct page *page, long nr)
> @@ -142,33 +143,54 @@ static inline long page_pool_defrag_page(struct page *page, long nr)
>         long ret;
>
>         /* If nr == pp_frag_count then we have cleared all remaining
> -        * references to the page. No need to actually overwrite it, instead
> -        * we can leave this to be overwritten by the calling function.
> +        * references to the page:
> +        * 1. 'n == 1': no need to actually overwrite it.
> +        * 2. 'n != 1': overwrite it with one, which is the rare case
> +        *              for frag draining.
>          *
> -        * The main advantage to doing this is that an atomic_read is
> -        * generally a much cheaper operation than an atomic update,
> -        * especially when dealing with a page that may be partitioned
> -        * into only 2 or 3 pieces.
> +        * The main advantage to doing this is that not only we avoid a
> +        * atomic update, as an atomic_read is generally a much cheaper
> +        * operation than an atomic update, especially when dealing with
> +        * a page that may be partitioned into only 2 or 3 pieces; but
> +        * also unify the frag and non-frag handling by ensuring all
> +        * pages have been split into one big frag initially, and only
> +        * overwrite it when the page is split into more than one frag.
>          */
> -       if (atomic_long_read(&page->pp_frag_count) == nr)
> +       if (atomic_long_read(&page->pp_frag_count) == nr) {
> +               /* As we have ensured nr is always one for constant case
> +                * using the BUILD_BUG_ON(), only need to handle the
> +                * non-constant case here for frag count draining, which
> +                * is a rare case.
> +                */
> +               BUILD_BUG_ON(__builtin_constant_p(nr) && nr != 1);
> +               if (!__builtin_constant_p(nr))
> +                       atomic_long_set(&page->pp_frag_count, 1);
> +
>                 return 0;
> +       }
>
>         ret = atomic_long_sub_return(nr, &page->pp_frag_count);
>         WARN_ON(ret < 0);
> +
> +       /* We are the last user here too, reset frag count back to 1 to
> +        * ensure all pages have been split into one big frag initially,
> +        * this should be the rare case when the last two frag users call
> +        * page_pool_defrag_page() currently.
> +        */
> +       if (unlikely(!ret))
> +               atomic_long_set(&page->pp_frag_count, 1);
> +
>         return ret;
>  }
>
> -static inline bool page_pool_is_last_frag(struct page_pool *pool,
> -                                         struct page *page)
> +static inline bool page_pool_is_last_frag(struct page *page)
>  {
>         /* We assume we are the last frag user that is still holding
>          * on to the page if:
> -        * 1. Fragments aren't enabled.
> -        * 2. We are running in 32-bit arch with 64-bit DMA.
> -        * 3. page_pool_defrag_page() indicate we are the last user.
> +        * 1. We are running in 32-bit arch with 64-bit DMA.
> +        * 2. page_pool_defrag_page() indicate we are the last user.
>          */
> -       return !(pool->p.flags & PP_FLAG_PAGE_FRAG) ||
> -              PAGE_POOL_DMA_USE_PP_FRAG_COUNT ||
> +       return PAGE_POOL_DMA_USE_PP_FRAG_COUNT ||
>                (page_pool_defrag_page(page, 1) == 0);
>  }
>
> @@ -194,7 +216,7 @@ static inline void page_pool_put_page(struct page_pool *pool,
>          * allow registering MEM_TYPE_PAGE_POOL, but shield linker.
>          */
>  #ifdef CONFIG_PAGE_POOL
> -       if (!page_pool_is_last_frag(pool, page))
> +       if (!page_pool_is_last_frag(page))
>                 return;
>
>         page_pool_put_defragged_page(pool, page, dma_sync_size, allow_direct);
> diff --git a/net/core/page_pool.c b/net/core/page_pool.c
> index 7d5f0512aa13..386e6d791e90 100644
> --- a/net/core/page_pool.c
> +++ b/net/core/page_pool.c
> @@ -371,6 +371,14 @@ static void page_pool_set_pp_info(struct page_pool *pool,
>  {
>         page->pp = pool;
>         page->pp_magic |= PP_SIGNATURE;
> +
> +       /* Ensuring all pages have been split into one big frag initially:
> +        * page_pool_set_pp_info() is only called once for every page when it
> +        * is allocated from the page allocator and page_pool_fragment_page()
> +        * is dirtying the same cache line as the page->pp_magic above, so
> +        * the overhead is negligible.
> +        */
> +       page_pool_fragment_page(page, 1);
>         if (pool->p.init_callback)
>                 pool->p.init_callback(page, pool->p.init_arg);
>  }
> @@ -667,7 +675,7 @@ void page_pool_put_page_bulk(struct page_pool *pool, void **data,
>                 struct page *page = virt_to_head_page(data[i]);
>
>                 /* It is not the last user for the page frag case */
> -               if (!page_pool_is_last_frag(pool, page))
> +               if (!page_pool_is_last_frag(page))
>                         continue;
>
>                 page = __page_pool_put_page(pool, page, -1, false);
> --
> 2.33.0
>


* Re: [PATCH net-next v7 2/6] page_pool: unify frag_count handling in page_pool_is_last_frag()
  2023-08-16 11:30   ` Ilias Apalodimas
@ 2023-08-16 12:42     ` Yunsheng Lin
  0 siblings, 0 replies; 32+ messages in thread
From: Yunsheng Lin @ 2023-08-16 12:42 UTC (permalink / raw)
  To: Ilias Apalodimas
  Cc: davem, kuba, pabeni, netdev, linux-kernel, Lorenzo Bianconi,
	Alexander Duyck, Liang Chen, Alexander Lobakin,
	Jesper Dangaard Brouer, Eric Dumazet, Maryam Tahhan

On 2023/8/16 19:30, Ilias Apalodimas wrote:
> On Wed, 16 Aug 2023 at 13:04, Yunsheng Lin <linyunsheng@huawei.com> wrote:
>>
>> Currently when page_pool_create() is called with
>> PP_FLAG_PAGE_FRAG flag, page_pool_alloc_pages() is only
>> allowed to be called under the below constraints:
>> 1. page_pool_fragment_page() need to be called to setup
>>    page->pp_frag_count immediately.
>> 2. page_pool_defrag_page() often need to be called to drain
>>    the page->pp_frag_count when there is no more user will
>>    be holding on to that page.
>>
>> Those constraints exist in order to support a page to be
>> split into multi frags.
>>
>> And those constraints have some overhead because of the
>> cache line dirtying/bouncing and atomic update.
>>
>> Those constraints are unavoidable for case when we need a
>> page to be split into more than one frag, but there is also
>> case that we want to avoid the above constraints and their
>> overhead when a page can't be split as it can only hold a big
>> frag as requested by user, depending on different use cases:
>> use case 1: allocate page without page splitting.
>> use case 2: allocate page with page splitting.
>> use case 3: allocate page with or without page splitting
>>             depending on the frag size.
>>
>> Currently page pool only provide page_pool_alloc_pages() and
>> page_pool_alloc_frag() API to enable the 1 & 2 separately,
>> so we can not use a combination of 1 & 2 to enable 3, it is
>> not possible yet because of the per page_pool flag
>> PP_FLAG_PAGE_FRAG.
>>
> 
> I really think we are inventing problems to solve here.
> What would be more useful here would be an example with numbers.  Most
> of what you mention is true, but what % of the packets would split a
> page in a way that the remaining part can't be used is unknown.  Do you
> have a usecase in the hns3 driver?  Are there any numbers that justify
> the change?

There is no usecase for the hns3 driver yet.

As mentioned in the cover letter, there are usecases for veth and virtio_net
to use both frag pages and non-frag pages in order to get the best performance
with the least memory, depending on the head/tail room space for
xdp_frame/shinfo and the mtu/packet size.

For the virtio_net case:
I am guessing that allocating a non-frag page has some advantage in some
cases and allocating a frag page has some advantage in other cases; that
is why there is a module param for switching between those two modes.
It would be good if Liang Chen could help to confirm the guessing with
some data.

For the veth case:
It would be good if Maryam (cc'd) could help to do some testing to justify
the change, as suggested by Jesper.

For the iavf case, Alexander Lobakin mentioned some data in the discussion
below:
https://patchwork.kernel.org/project/netdevbpf/patch/20230612130256.4572-5-linyunsheng@huawei.com/#25384716

> 


* Re: [PATCH net-next v7 1/6] page_pool: frag API support for 32-bit arch with 64-bit DMA
  2023-08-16 10:01 ` [PATCH net-next v7 1/6] page_pool: frag API support for 32-bit arch with 64-bit DMA Yunsheng Lin
@ 2023-08-17 13:57   ` Ilias Apalodimas
  2023-08-17 16:15     ` Jakub Kicinski
  0 siblings, 1 reply; 32+ messages in thread
From: Ilias Apalodimas @ 2023-08-17 13:57 UTC (permalink / raw)
  To: Yunsheng Lin
  Cc: davem, kuba, pabeni, netdev, linux-kernel, Lorenzo Bianconi,
	Alexander Duyck, Liang Chen, Alexander Lobakin, Saeed Mahameed,
	Leon Romanovsky, Eric Dumazet, Jesper Dangaard Brouer,
	linux-rdma

Hi Yunsheng,

On Wed, 16 Aug 2023 at 13:04, Yunsheng Lin <linyunsheng@huawei.com> wrote:
>
> Currently page_pool_alloc_frag() is not supported on 32-bit
> arches with 64-bit DMA because of the overlap between
> pp_frag_count and dma_addr_upper in 'struct page' on those
> arches. Such arches seem to be quite common, see [1], which
> means a driver may need to handle this case itself when using
> the frag API.
>
> In order to simplify the driver's work when using the frag API,
> this patch allows page_pool_alloc_frag() to call
> page_pool_alloc_pages() to return whole pages for those arches.
>
> Add a PP_FLAG_PAGE_SPLIT_IN_DRIVER flag in order to fail the
> page_pool creation for 32-bit arches with 64-bit DMA when the
> driver tries to do the page splitting itself.

Why should we care about this?  Even an architecture that's 32-bit and
has a 64bit DMA should be allowed to split the pages internally if it
decides to do so.  The trick that drivers usually do is elevate the
page refcnt and deal with that internally.
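
To make that concrete, a minimal sketch of such an internal split
(hypothetical driver names; it uses a driver-private counter for
brevity, where in-tree drivers typically bias page->_refcount instead):

#include <linux/refcount.h>
#include <net/page_pool/helpers.h>

struct my_rx_buf {                      /* hypothetical driver bookkeeping */
        struct page *page;
        refcount_t pieces;              /* outstanding halves of the page */
};

static int my_rx_buf_init(struct page_pool *pool, struct my_rx_buf *buf)
{
        buf->page = page_pool_dev_alloc_pages(pool);
        if (!buf->page)
                return -ENOMEM;

        refcount_set(&buf->pieces, 2);  /* hand out two PAGE_SIZE / 2 halves */
        return 0;
}

static void my_rx_buf_put(struct page_pool *pool, struct my_rx_buf *buf)
{
        /* Only the release of the last half returns the page to the pool. */
        if (refcount_dec_and_test(&buf->pieces))
                page_pool_put_full_page(pool, buf->page, false);
}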

Thanks
/Ilias
>
> Note that this may aggravate the truesize underestimation
> problem for skbs, as there is no page splitting for those
> pages. If a driver needs an accurate truesize, it may calculate
> it according to the frag size, the page order, and whether
> PAGE_POOL_DMA_USE_PP_FRAG_COUNT is true. We may provide a
> helper for that if it turns out to be useful.
>
> 1. https://lore.kernel.org/all/20211117075652.58299-1-linyunsheng@huawei.com/
>
> Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
> CC: Lorenzo Bianconi <lorenzo@kernel.org>
> CC: Alexander Duyck <alexander.duyck@gmail.com>
> CC: Liang Chen <liangchen.linux@gmail.com>
> CC: Alexander Lobakin <aleksander.lobakin@intel.com>
> ---
>  .../net/ethernet/mellanox/mlx5/core/en_main.c |  3 +-
>  include/net/page_pool/helpers.h               | 38 +++++++++++++++--
>  include/net/page_pool/types.h                 | 42 ++++++++++++-------
>  net/core/page_pool.c                          | 15 +++----
>  4 files changed, 68 insertions(+), 30 deletions(-)
>
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
> index bc9d5a5bea01..ec9c5a8cbda6 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
> @@ -834,7 +834,8 @@ static int mlx5e_alloc_rq(struct mlx5e_params *params,
>                 struct page_pool_params pp_params = { 0 };
>
>                 pp_params.order     = 0;
> -               pp_params.flags     = PP_FLAG_DMA_MAP | PP_FLAG_DMA_SYNC_DEV | PP_FLAG_PAGE_FRAG;
> +               pp_params.flags     = PP_FLAG_DMA_MAP | PP_FLAG_DMA_SYNC_DEV |
> +                                       PP_FLAG_PAGE_SPLIT_IN_DRIVER;
>                 pp_params.pool_size = pool_size;
>                 pp_params.nid       = node;
>                 pp_params.dev       = rq->pdev;
> diff --git a/include/net/page_pool/helpers.h b/include/net/page_pool/helpers.h
> index 94231533a369..cb18de55f239 100644
> --- a/include/net/page_pool/helpers.h
> +++ b/include/net/page_pool/helpers.h
> @@ -29,8 +29,12 @@
>  #ifndef _NET_PAGE_POOL_HELPERS_H
>  #define _NET_PAGE_POOL_HELPERS_H
>
> +#include <linux/dma-mapping.h>
>  #include <net/page_pool/types.h>
>
> +#define PAGE_POOL_DMA_USE_PP_FRAG_COUNT        \
> +               (sizeof(dma_addr_t) > sizeof(unsigned long))
> +
>  #ifdef CONFIG_PAGE_POOL_STATS
>  int page_pool_ethtool_stats_get_count(void);
>  u8 *page_pool_ethtool_stats_get_strings(u8 *data);
> @@ -73,6 +77,29 @@ static inline struct page *page_pool_dev_alloc_pages(struct page_pool *pool)
>         return page_pool_alloc_pages(pool, gfp);
>  }
>
> +static inline struct page *page_pool_alloc_frag(struct page_pool *pool,
> +                                               unsigned int *offset,
> +                                               unsigned int size, gfp_t gfp)
> +{
> +       unsigned int max_size = PAGE_SIZE << pool->p.order;
> +
> +       size = ALIGN(size, dma_get_cache_alignment());
> +
> +       if (WARN_ON(size > max_size))
> +               return NULL;
> +
> +       /* Don't allow page splitting and allocate one big frag
> +        * for 32-bit arch with 64-bit DMA, corresponding to
> +        * the checking in page_pool_is_last_frag().
> +        */
> +       if (PAGE_POOL_DMA_USE_PP_FRAG_COUNT) {
> +               *offset = 0;
> +               return page_pool_alloc_pages(pool, gfp);
> +       }
> +
> +       return __page_pool_alloc_frag(pool, offset, size, gfp);
> +}
> +
>  static inline struct page *page_pool_dev_alloc_frag(struct page_pool *pool,
>                                                     unsigned int *offset,
>                                                     unsigned int size)
> @@ -134,8 +161,14 @@ static inline long page_pool_defrag_page(struct page *page, long nr)
>  static inline bool page_pool_is_last_frag(struct page_pool *pool,
>                                           struct page *page)
>  {
> -       /* If fragments aren't enabled or count is 0 we were the last user */
> +       /* We assume we are the last frag user that is still holding
> +        * on to the page if:
> +        * 1. Fragments aren't enabled.
> +        * 2. We are running in 32-bit arch with 64-bit DMA.
> +        * 3. page_pool_defrag_page() indicate we are the last user.
> +        */
>         return !(pool->p.flags & PP_FLAG_PAGE_FRAG) ||
> +              PAGE_POOL_DMA_USE_PP_FRAG_COUNT ||
>                (page_pool_defrag_page(page, 1) == 0);
>  }
>
> @@ -197,9 +230,6 @@ static inline void page_pool_recycle_direct(struct page_pool *pool,
>         page_pool_put_full_page(pool, page, true);
>  }
>
> -#define PAGE_POOL_DMA_USE_PP_FRAG_COUNT        \
> -               (sizeof(dma_addr_t) > sizeof(unsigned long))
> -
>  /**
>   * page_pool_get_dma_addr() - Retrieve the stored DMA address.
>   * @page:      page allocated from a page pool
> diff --git a/include/net/page_pool/types.h b/include/net/page_pool/types.h
> index 887e7946a597..079337c42aa6 100644
> --- a/include/net/page_pool/types.h
> +++ b/include/net/page_pool/types.h
> @@ -6,21 +6,29 @@
>  #include <linux/dma-direction.h>
>  #include <linux/ptr_ring.h>
>
> -#define PP_FLAG_DMA_MAP                BIT(0) /* Should page_pool do the DMA
> -                                       * map/unmap
> -                                       */
> -#define PP_FLAG_DMA_SYNC_DEV   BIT(1) /* If set all pages that the driver gets
> -                                       * from page_pool will be
> -                                       * DMA-synced-for-device according to
> -                                       * the length provided by the device
> -                                       * driver.
> -                                       * Please note DMA-sync-for-CPU is still
> -                                       * device driver responsibility
> -                                       */
> -#define PP_FLAG_PAGE_FRAG      BIT(2) /* for page frag feature */
> +/* Should page_pool do the DMA map/unmap */
> +#define PP_FLAG_DMA_MAP                        BIT(0)
> +
> +/* If set all pages that the driver gets from page_pool will be
> + * DMA-synced-for-device according to the length provided by the device driver.
> + * Please note DMA-sync-for-CPU is still device driver responsibility
> + */
> +#define PP_FLAG_DMA_SYNC_DEV           BIT(1)
> +
> +/* for page frag feature */
> +#define PP_FLAG_PAGE_FRAG              BIT(2)
> +
> +/* If set driver will do the page splitting itself. This is used to fail the
> + * page_pool creation because there is overlap issue between pp_frag_count and
> + * dma_addr_upper in 'struct page' for some arches with
> + * PAGE_POOL_DMA_USE_PP_FRAG_COUNT being true.
> + */
> +#define PP_FLAG_PAGE_SPLIT_IN_DRIVER   BIT(3)
> +
>  #define PP_FLAG_ALL            (PP_FLAG_DMA_MAP |\
>                                  PP_FLAG_DMA_SYNC_DEV |\
> -                                PP_FLAG_PAGE_FRAG)
> +                                PP_FLAG_PAGE_FRAG |\
> +                                PP_FLAG_PAGE_SPLIT_IN_DRIVER)
>
>  /*
>   * Fast allocation side cache array/stack
> @@ -45,7 +53,8 @@ struct pp_alloc_cache {
>
>  /**
>   * struct page_pool_params - page pool parameters
> - * @flags:     PP_FLAG_DMA_MAP, PP_FLAG_DMA_SYNC_DEV, PP_FLAG_PAGE_FRAG
> + * @flags:     PP_FLAG_DMA_MAP, PP_FLAG_DMA_SYNC_DEV, PP_FLAG_PAGE_FRAG,
> + *             PP_FLAG_PAGE_SPLIT_IN_DRIVER
>   * @order:     2^order pages on allocation
>   * @pool_size: size of the ptr_ring
>   * @nid:       NUMA node id to allocate from pages from
> @@ -183,8 +192,9 @@ struct page_pool {
>  };
>
>  struct page *page_pool_alloc_pages(struct page_pool *pool, gfp_t gfp);
> -struct page *page_pool_alloc_frag(struct page_pool *pool, unsigned int *offset,
> -                                 unsigned int size, gfp_t gfp);
> +struct page *__page_pool_alloc_frag(struct page_pool *pool,
> +                                   unsigned int *offset, unsigned int size,
> +                                   gfp_t gfp);
>  struct page_pool *page_pool_create(const struct page_pool_params *params);
>
>  struct xdp_mem_info;
> diff --git a/net/core/page_pool.c b/net/core/page_pool.c
> index 77cb75e63aca..7d5f0512aa13 100644
> --- a/net/core/page_pool.c
> +++ b/net/core/page_pool.c
> @@ -14,7 +14,6 @@
>  #include <net/xdp.h>
>
>  #include <linux/dma-direction.h>
> -#include <linux/dma-mapping.h>
>  #include <linux/page-flags.h>
>  #include <linux/mm.h> /* for put_page() */
>  #include <linux/poison.h>
> @@ -212,7 +211,7 @@ static int page_pool_init(struct page_pool *pool,
>         }
>
>         if (PAGE_POOL_DMA_USE_PP_FRAG_COUNT &&
> -           pool->p.flags & PP_FLAG_PAGE_FRAG)
> +           pool->p.flags & PP_FLAG_PAGE_SPLIT_IN_DRIVER)
>                 return -EINVAL;
>
>  #ifdef CONFIG_PAGE_POOL_STATS
> @@ -737,18 +736,16 @@ static void page_pool_free_frag(struct page_pool *pool)
>         page_pool_return_page(pool, page);
>  }
>
> -struct page *page_pool_alloc_frag(struct page_pool *pool,
> -                                 unsigned int *offset,
> -                                 unsigned int size, gfp_t gfp)
> +struct page *__page_pool_alloc_frag(struct page_pool *pool,
> +                                   unsigned int *offset,
> +                                   unsigned int size, gfp_t gfp)
>  {
>         unsigned int max_size = PAGE_SIZE << pool->p.order;
>         struct page *page = pool->frag_page;
>
> -       if (WARN_ON(!(pool->p.flags & PP_FLAG_PAGE_FRAG) ||
> -                   size > max_size))
> +       if (WARN_ON(!(pool->p.flags & PP_FLAG_PAGE_FRAG)))
>                 return NULL;
>
> -       size = ALIGN(size, dma_get_cache_alignment());
>         *offset = pool->frag_offset;
>
>         if (page && *offset + size > max_size) {
> @@ -781,7 +778,7 @@ struct page *page_pool_alloc_frag(struct page_pool *pool,
>         alloc_stat_inc(pool, fast);
>         return page;
>  }
> -EXPORT_SYMBOL(page_pool_alloc_frag);
> +EXPORT_SYMBOL(__page_pool_alloc_frag);
>
>  static void page_pool_empty_ring(struct page_pool *pool)
>  {
> --
> 2.33.0
>
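
For illustration, a usage sketch of the wrapper quoted above, as seen
from a driver's rx refill path ("pool" and the 1536-byte size are
hypothetical):

        unsigned int offset;
        struct page *page;
        void *buf;

        /* On a 32-bit arch with 64-bit DMA this returns a whole page
         * with offset 0; elsewhere it may return a frag of a shared page.
         */
        page = page_pool_dev_alloc_frag(pool, &offset, 1536);
        if (!page)
                return -ENOMEM;

        buf = page_address(page) + offset;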

* Re: [PATCH net-next v7 1/6] page_pool: frag API support for 32-bit arch with 64-bit DMA
  2023-08-17 13:57   ` Ilias Apalodimas
@ 2023-08-17 16:15     ` Jakub Kicinski
  2023-08-17 16:59       ` Ilias Apalodimas
  0 siblings, 1 reply; 32+ messages in thread
From: Jakub Kicinski @ 2023-08-17 16:15 UTC (permalink / raw)
  To: Ilias Apalodimas, Mina Almasry
  Cc: Yunsheng Lin, davem, pabeni, netdev, linux-kernel,
	Lorenzo Bianconi, Alexander Duyck, Liang Chen, Alexander Lobakin,
	Saeed Mahameed, Leon Romanovsky, Eric Dumazet,
	Jesper Dangaard Brouer

On Thu, 17 Aug 2023 16:57:16 +0300 Ilias Apalodimas wrote:
> Why should we care about this?  Even an architecture that's 32-bit and
> has a 64bit DMA should be allowed to split the pages internally if it
> decides to do so.  The trick that drivers usually do is elevate the
> page refcnt and deal with that internally.

Can we assume the DMA mapping of page pool is page aligned? We should
be, right? That means we're storing 12 bits of 0 at the lower end.
So even with 32b of space we can easily store addresses for 32b+12b =>
16TB of memory. "Ought to be enough" to paraphrase Bill G, and the
problem is only in our heads?
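
(Worked out: a page-aligned address has its low PAGE_SHIFT bits zero,
so a u32 shifted by 12 covers 2^(32 + 12) = 2^44 bytes = 16 TiB,
assuming 4 KiB pages.)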

Before we go that way - Mina, are the dma-buf "chunks" you're working
with going to be fragment-able? Or rather can driver and/or core take
multiple references on a single buffer?

* Re: [PATCH net-next v7 1/6] page_pool: frag API support for 32-bit arch with 64-bit DMA
  2023-08-17 16:15     ` Jakub Kicinski
@ 2023-08-17 16:59       ` Ilias Apalodimas
  2023-08-17 23:57         ` Jakub Kicinski
  0 siblings, 1 reply; 32+ messages in thread
From: Ilias Apalodimas @ 2023-08-17 16:59 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Mina Almasry, Yunsheng Lin, davem, pabeni, netdev, linux-kernel,
	Lorenzo Bianconi, Alexander Duyck, Liang Chen, Alexander Lobakin,
	Saeed Mahameed, Leon Romanovsky, Eric Dumazet,
	Jesper Dangaard Brouer

Hi Jakub,

On Thu, 17 Aug 2023 at 19:15, Jakub Kicinski <kuba@kernel.org> wrote:
>
> On Thu, 17 Aug 2023 16:57:16 +0300 Ilias Apalodimas wrote:
> > Why should we care about this?  Even an architecture that's 32-bit and
> > has a 64bit DMA should be allowed to split the pages internally if it
> > decides to do so.  The trick that drivers usually do is elevate the
> > page refcnt and deal with that internally.
>
> Can we assume the DMA mapping of page pool is page aligned? We should
> be, right?

Yes

> That means we're storing 12 bits of 0 at the lower end.
> So even with 32b of space we can easily store addresses for 32b+12b =>
> 16TB of memory. "Ought to be enough" to paraphrase Bill G, and the
> problem is only in our heads?

Do you mean moving the pp_frag_count there?  I was questioning the
need to have PP_FLAG_PAGE_SPLIT_IN_DRIVER overall.  With Yunsheng's
patches such a platform would allocate a page, so why should we
prevent it from splitting it internally?

Thanks
/Ilias
>
> Before we go that way - Mina, are the dma-buf "chunks" you're working
> with going to be fragment-able? Or rather can driver and/or core take
> multiple references on a single buffer?

* Re: [PATCH net-next v7 1/6] page_pool: frag API support for 32-bit arch with 64-bit DMA
  2023-08-17 16:59       ` Ilias Apalodimas
@ 2023-08-17 23:57         ` Jakub Kicinski
  2023-08-18  6:12           ` Ilias Apalodimas
  0 siblings, 1 reply; 32+ messages in thread
From: Jakub Kicinski @ 2023-08-17 23:57 UTC (permalink / raw)
  To: Ilias Apalodimas
  Cc: Mina Almasry, Yunsheng Lin, davem, pabeni, netdev, linux-kernel,
	Lorenzo Bianconi, Alexander Duyck, Liang Chen, Alexander Lobakin,
	Saeed Mahameed, Leon Romanovsky, Eric Dumazet,
	Jesper Dangaard Brouer

On Thu, 17 Aug 2023 19:59:37 +0300 Ilias Apalodimas wrote:
> > Can we assume the DMA mapping of page pool is page aligned? We should
> > be, right?  
> 
> Yes
> 
> > That means we're storing 12 bits of 0 at the lower end.
> > So even with 32b of space we can easily store addresses for 32b+12b =>
> > 16TB of memory. "Ought to be enough" to paraphrase Bill G, and the
> > problem is only in our heads?  
> 
> Do you mean moving the pp_frag_count there? 

Right, IIUC we don't have enough space to fit dma_addr_t and the
refcount, but if we store the dma addr on a shifted u32 instead 
of using dma_addr_t explicitly - the refcount should fit?

> I was questioning the need to have PP_FLAG_PAGE_SPLIT_IN_DRIVER
> overall.  With Yunsheng's patches such a platform would allocate a
> page, so why should we prevent it from splitting it internally?

Splitting it is fine, the problem is that the refcount AKA
page->pp_frag_count** counts outstanding PP-aware references
and page->refcount counts PP-unaware references.

If we want to use page->refcount directly we'd need to unmap 
the page whenever drivers calls page_pool_defrag_page().
But the driver may assume the page is still mapped afterwards.

We can change the API to make this behavior explicit. Although
IMHO that's putting the burden of rare platforms on non-rare
platforms which we should avoid.

** I said it before and I will keep saying this until someone gets 
   angry at me - I really think we should rename this field because
   the association with frags is a coincidence.

* Re: [PATCH net-next v7 1/6] page_pool: frag API support for 32-bit arch with 64-bit DMA
  2023-08-17 23:57         ` Jakub Kicinski
@ 2023-08-18  6:12           ` Ilias Apalodimas
  2023-08-18  8:59             ` Yunsheng Lin
  2023-08-18 21:51             ` Jakub Kicinski
  0 siblings, 2 replies; 32+ messages in thread
From: Ilias Apalodimas @ 2023-08-18  6:12 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Mina Almasry, Yunsheng Lin, davem, pabeni, netdev, linux-kernel,
	Lorenzo Bianconi, Alexander Duyck, Liang Chen, Alexander Lobakin,
	Saeed Mahameed, Leon Romanovsky, Eric Dumazet,
	Jesper Dangaard Brouer

On Fri, 18 Aug 2023 at 02:57, Jakub Kicinski <kuba@kernel.org> wrote:
>
> On Thu, 17 Aug 2023 19:59:37 +0300 Ilias Apalodimas wrote:
> > > Can we assume the DMA mapping of page pool is page aligned? We should
> > > be, right?
> >
> > Yes
> >
> > > That means we're storing 12 bits of 0 at the lower end.
> > > So even with 32b of space we can easily store addresses for 32b+12b =>
> > > 16TB of memory. "Ought to be enough" to paraphrase Bill G, and the
> > > problem is only in our heads?
> >
> > Do you mean moving the pp_frag_count there?
>
> Right, IIUC we don't have enough space to fit dma_addr_t and the
> refcount, but if we store the dma addr on a shifted u32 instead
> of using dma_addr_t explicitly - the refcount should fit?

struct page looks like this:

unsigned long dma_addr;
union {
      unsigned long dma_addr_upper;
      atomic_long_t pp_frag_count;
};

So, on 32bit platforms with 64bit dma we can't support a frag count at all.
We could either use the lower 12 bits (and have support for 4096 frags
'only') or do what you suggest.
TBH I don't love any of these and since those platforms are rare (or
at least that's what I think), I prefer not supporting them at all.

>
> > I was questioning the need to have PP_FLAG_PAGE_SPLIT_IN_DRIVER
> > overall.  With Yunsheng's patches such a platform would allocate a
> > page, so why should we prevent it from splitting it internally?
>
> Splitting it is fine, the problem is that the refcount AKA
> page->pp_frag_count** counts outstanding PP-aware references
> and page->refcount counts PP-unaware references.
>
> If we want to use page->refcount directly we'd need to unmap
> the page whenever drivers calls page_pool_defrag_page().
> But the driver may assume the page is still mapped afterwards.

What I am suggesting here is to not add the new
PP_FLAG_PAGE_SPLIT_IN_DRIVER flag.  If a driver wants to split pages
internally, it should create a pool without
PP_FLAG_DMA_SYNC_DEV to begin with.  The only responsibility the
driver would have is to elevate the page refcnt so the page pool would
not try to free/recycle it.  Since it won't be able to allocate
fragments, we don't have to worry about the rest.
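
Something like this sketch of the pool creation being suggested ("dev"
and the pool size are hypothetical):

        struct page_pool_params pp_params = {
                .flags          = PP_FLAG_DMA_MAP, /* no PP_FLAG_DMA_SYNC_DEV */
                .order          = 0,
                .pool_size      = 256,
                .nid            = NUMA_NO_NODE,
                .dev            = dev,
                .dma_dir        = DMA_FROM_DEVICE,
        };
        struct page_pool *pool = page_pool_create(&pp_params);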

> We can change the API to make this behavior explicit. Although
> IMHO that's putting the burden of rare platforms on non-rare
> platforms which we should avoid.

Yep, agree here.

>
> ** I said it before and I will keep saying this until someone gets
>    angry at me - I really think we should rename this field because
>    the association with frags is a coincidence.

I had similar confusions every time I had to re-read our code, hence
git show 4d4266e3fd32.
Any suggestions?

Thanks
/Ilias

* Re: [PATCH net-next v7 1/6] page_pool: frag API support for 32-bit arch with 64-bit DMA
  2023-08-18  6:12           ` Ilias Apalodimas
@ 2023-08-18  8:59             ` Yunsheng Lin
  2023-08-18 21:51             ` Jakub Kicinski
  1 sibling, 0 replies; 32+ messages in thread
From: Yunsheng Lin @ 2023-08-18  8:59 UTC (permalink / raw)
  To: Ilias Apalodimas, Jakub Kicinski
  Cc: Mina Almasry, davem, pabeni, netdev, linux-kernel,
	Lorenzo Bianconi, Alexander Duyck, Liang Chen, Alexander Lobakin,
	Saeed Mahameed, Leon Romanovsky, Eric Dumazet,
	Jesper Dangaard Brouer

On 2023/8/18 14:12, Ilias Apalodimas wrote:
> On Fri, 18 Aug 2023 at 02:57, Jakub Kicinski <kuba@kernel.org> wrote:
>>
>> On Thu, 17 Aug 2023 19:59:37 +0300 Ilias Apalodimas wrote:
>>>> Can we assume the DMA mapping of page pool is page aligned? We should
>>>> be, right?
>>>
>>> Yes
>>>
>>>> That means we're storing 12 bits of 0 at the lower end.
>>>> So even with 32b of space we can easily store addresses for 32b+12b =>
>>>> 16TB of memory. "Ought to be enough" to paraphrase Bill G, and the
>>>> problem is only in our heads?
>>>
>>> Do you mean moving the pp_frag_count there?
>>
>> Right, IIUC we don't have enough space to fit dma_addr_t and the
>> refcount, but if we store the dma addr on a shifted u32 instead
>> of using dma_addr_t explicitly - the refcount should fit?
> 
> struct page looks like this:
> 
> unsigned long dma_addr;
> union {
>       unsigned long dma_addr_upper;
>       atomic_long_t pp_frag_count;
> };
> 
> So, on 32bit platforms with 64bit dma we can't support a frag count at all.
> We could either use the lower 12 bits (and have support for 4096 frags
> 'only') or do what you suggest.

Maybe we can rethink this and define a new memdesc for pages used by
the network stack. Having taken a glance at the Device Memory TCP v2
patchset from Mina, we may use the new memdesc to support more new
memory types, to decouple page_pool from the space limitation of
'struct page', and to support frag pages for 32-bit platforms with
64-bit DMA too.

> TBH I don't love any of these and since those platforms are rare (or
> at least that's what I think), I prefer not supporting them at all.
> 
>>
>>> I was questioning the need to have PP_FLAG_PAGE_SPLIT_IN_DRIVER
>>> overall.  With Yunsheng's patches such a platform would allocate a
>>> page, so why should we prevent it from splitting it internally?
>>
>> Splitting it is fine, the problem is that the refcount AKA
>> page->pp_frag_count** counts outstanding PP-aware references
>> and page->refcount counts PP-unaware references.
>>
>> If we want to use page->refcount directly we'd need to unmap
>> the page whenever drivers calls page_pool_defrag_page().
>> But the driver may assume the page is still mapped afterwards.
> 
> What I am suggesting here is to not add the new
> PP_FLAG_PAGE_SPLIT_IN_DRIVER flag.  If a driver wants to split pages
> internally it should create a pool without
> PP_FLAG_DMA_SYNC_DEV to begin with.  The only responsibility the

I am not sure why using PP_FLAG_DMA_SYNC_DEV conflicts with page
splitting, if the frag page is freed with dma_sync_size being -1 and
pool->p.max_len is set up correctly.

I was thinking about defining a page_pool_put_frag() which corresponds
to page_pool_dev_alloc_frag() and a page_pool_free() which corresponds
to page_pool_dev_alloc(), so that driver authors do not misuse the
dma_sync_size parameter of the page_pool_put_page() related API.
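
As a sketch, the proposed pairing (hypothetical, not in-tree) could be
as thin as:

static inline void page_pool_put_frag(struct page_pool *pool,
                                      struct page *page, bool allow_direct)
{
        /* dma_sync_size of -1 means "sync the whole pool->p.max_len" */
        page_pool_put_page(pool, page, -1, allow_direct);
}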

And PP_FLAG_PAGE_SPLIT_IN_DRIVER is used to fail the page_pool creation
when the driver is using page->pp_frag_count to split the page itself
on a 32-bit arch with 64-bit DMA.

> driver would have is to elevate the page refcnt so page pool would not
> try to free/recycle it.  Since it won't be able to allocate fragments
> we don't have to worry about the rest.
> 
>> We can change the API to make this behavior explicit. Although
>> IMHO that's putting the burden of rare platforms on non-rare
>> platforms which we should avoid.
> 
> Yep, agree here.
> 
>>
>> ** I said it before and I will keep saying this until someone gets
>>    angry at me - I really think we should rename this field because
>>    the association with frags is a coincidence.

In my understanding, the current naming is fine as it is, because
page->pp_frag_count indicates the number of frags a page is split
into, and for now there is only one user holding onto each frag.

But if more than one user can hold onto one frag, then we should
probably rename it to something like page->pp_ref_count, as Jakub
suggested before; for example, the patch below may enable more than
one user holding onto one frag.

https://patchwork.kernel.org/project/netdevbpf/patch/20230628121150.47778-1-liangchen.linux@gmail.com/

* Re: [PATCH net-next v7 1/6] page_pool: frag API support for 32-bit arch with 64-bit DMA
  2023-08-18  6:12           ` Ilias Apalodimas
  2023-08-18  8:59             ` Yunsheng Lin
@ 2023-08-18 21:51             ` Jakub Kicinski
  2023-08-21  8:38               ` Ilias Apalodimas
  2023-08-21 12:18               ` Yunsheng Lin
  1 sibling, 2 replies; 32+ messages in thread
From: Jakub Kicinski @ 2023-08-18 21:51 UTC (permalink / raw)
  To: Ilias Apalodimas
  Cc: Mina Almasry, Yunsheng Lin, davem, pabeni, netdev, linux-kernel,
	Lorenzo Bianconi, Alexander Duyck, Liang Chen, Alexander Lobakin,
	Saeed Mahameed, Leon Romanovsky, Eric Dumazet,
	Jesper Dangaard Brouer

On Fri, 18 Aug 2023 09:12:09 +0300 Ilias Apalodimas wrote:
> > Right, IIUC we don't have enough space to fit dma_addr_t and the
> > refcount, but if we store the dma addr on a shifted u32 instead
> > of using dma_addr_t explicitly - the refcount should fit?  
> 
> struct page looks like this:
> 
> unsigned long dma_addr;
> union {
>       unsigned long dma_addr_upper;
>       atomic_long_t pp_frag_count;
> };

I could be completely misunderstanding the problem.
Let me show you the diff of what I was thinking more or less.

diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 5e74ce4a28cd..58ffa8dc745f 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -126,11 +126,6 @@ struct page {
 			unsigned long _pp_mapping_pad;
 			unsigned long dma_addr;
 			union {
-				/**
-				 * dma_addr_upper: might require a 64-bit
-				 * value on 32-bit architectures.
-				 */
-				unsigned long dma_addr_upper;
 				/**
 				 * For frag page support, not supported in
 				 * 32-bit architectures with 64-bit DMA.
diff --git a/include/net/page_pool/helpers.h b/include/net/page_pool/helpers.h
index 94231533a369..6f87a0fa2178 100644
--- a/include/net/page_pool/helpers.h
+++ b/include/net/page_pool/helpers.h
@@ -212,16 +212,24 @@ static inline dma_addr_t page_pool_get_dma_addr(struct page *page)
 	dma_addr_t ret = page->dma_addr;
 
 	if (PAGE_POOL_DMA_USE_PP_FRAG_COUNT)
-		ret |= (dma_addr_t)page->dma_addr_upper << 16 << 16;
+		ret <<= PAGE_SHIFT;
 
 	return ret;
 }
 
-static inline void page_pool_set_dma_addr(struct page *page, dma_addr_t addr)
+static inline bool page_pool_set_dma_addr(struct page *page, dma_addr_t addr)
 {
+	bool failed = false;
+
 	page->dma_addr = addr;
-	if (PAGE_POOL_DMA_USE_PP_FRAG_COUNT)
-		page->dma_addr_upper = upper_32_bits(addr);
+	if (PAGE_POOL_DMA_USE_PP_FRAG_COUNT) {
+		page->dma_addr = addr >> PAGE_SHIFT;
+		/* We assume page alignment to shave off bottom bits,
+		 * if this "compression" doesn't work we need to drop.
+		 */
+		failed = addr != (dma_addr_t)page->dma_addr << PAGE_SHIFT;
+	}
+	return failed;
 }
 
 static inline bool page_pool_put(struct page_pool *pool)
diff --git a/net/core/page_pool.c b/net/core/page_pool.c
index 77cb75e63aca..9ea42e242a89 100644
--- a/net/core/page_pool.c
+++ b/net/core/page_pool.c
@@ -211,10 +211,6 @@ static int page_pool_init(struct page_pool *pool,
 		 */
 	}
 
-	if (PAGE_POOL_DMA_USE_PP_FRAG_COUNT &&
-	    pool->p.flags & PP_FLAG_PAGE_FRAG)
-		return -EINVAL;
-
 #ifdef CONFIG_PAGE_POOL_STATS
 	pool->recycle_stats = alloc_percpu(struct page_pool_recycle_stats);
 	if (!pool->recycle_stats)
@@ -359,12 +355,19 @@ static bool page_pool_dma_map(struct page_pool *pool, struct page *page)
 	if (dma_mapping_error(pool->p.dev, dma))
 		return false;
 
-	page_pool_set_dma_addr(page, dma);
+	if (page_pool_set_dma_addr(page, dma))
+		goto unmap_failed;
 
 	if (pool->p.flags & PP_FLAG_DMA_SYNC_DEV)
 		page_pool_dma_sync_for_device(pool, page, pool->p.max_len);
 
 	return true;
+
+unmap_failed:
+	dma_unmap_page_attrs(pool->p.dev, dma,
+			     PAGE_SIZE << pool->p.order, pool->p.dma_dir,
+			     DMA_ATTR_SKIP_CPU_SYNC | DMA_ATTR_WEAK_ORDERING);
+	return false;
 }
 
 static void page_pool_set_pp_info(struct page_pool *pool,
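
As a worked example with a hypothetical 36-bit address: addr =
0x923456000 is page aligned, so the shifted store holds 0x923456 and
shifting back recovers the address exactly, while a misaligned addr =
0x923456100 fails the round-trip check and the mapping is dropped.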

* Re: [PATCH net-next v7 1/6] page_pool: frag API support for 32-bit arch with 64-bit DMA
  2023-08-18 21:51             ` Jakub Kicinski
@ 2023-08-21  8:38               ` Ilias Apalodimas
  2023-08-21 11:15                 ` Ilias Apalodimas
  2023-08-21 12:18               ` Yunsheng Lin
  1 sibling, 1 reply; 32+ messages in thread
From: Ilias Apalodimas @ 2023-08-21  8:38 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Mina Almasry, Yunsheng Lin, davem, pabeni, netdev, linux-kernel,
	Lorenzo Bianconi, Alexander Duyck, Liang Chen, Alexander Lobakin,
	Saeed Mahameed, Leon Romanovsky, Eric Dumazet,
	Jesper Dangaard Brouer

Resending for the mailing list, apologies for the noise.


On Sat, 19 Aug 2023 at 00:51, Jakub Kicinski <kuba@kernel.org> wrote:
>
> On Fri, 18 Aug 2023 09:12:09 +0300 Ilias Apalodimas wrote:
> > > Right, IIUC we don't have enough space to fit dma_addr_t and the
> > > refcount, but if we store the dma addr on a shifted u32 instead
> > > of using dma_addr_t explicitly - the refcount should fit?
> >
> > struct page looks like this:
> >
> > unsigned long dma_addr;
> > union {
> >       unsigned long dma_addr_upper;
> >       atomic_long_t pp_frag_count;
> > };
>
> I could be completely misunderstanding the problem.

You aren't!

> Let me show you the diff of what I was thinking more or less.
>
> diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
> index 5e74ce4a28cd..58ffa8dc745f 100644
> --- a/include/linux/mm_types.h
> +++ b/include/linux/mm_types.h
> @@ -126,11 +126,6 @@ struct page {
>                         unsigned long _pp_mapping_pad;
>                         unsigned long dma_addr;
>                         union {
> -                               /**
> -                                * dma_addr_upper: might require a 64-bit
> -                                * value on 32-bit architectures.
> -                                */
> -                               unsigned long dma_addr_upper;
>                                 /**
>                                  * For frag page support, not supported in
>                                  * 32-bit architectures with 64-bit DMA.
> diff --git a/include/net/page_pool/helpers.h b/include/net/page_pool/helpers.h
> index 94231533a369..6f87a0fa2178 100644
> --- a/include/net/page_pool/helpers.h
> +++ b/include/net/page_pool/helpers.h
> @@ -212,16 +212,24 @@ static inline dma_addr_t page_pool_get_dma_addr(struct page *page)
>         dma_addr_t ret = page->dma_addr;
>
>         if (PAGE_POOL_DMA_USE_PP_FRAG_COUNT)
> -               ret |= (dma_addr_t)page->dma_addr_upper << 16 << 16;
> +               ret <<= PAGE_SHIFT;
>
>         return ret;
>  }
>
> -static inline void page_pool_set_dma_addr(struct page *page, dma_addr_t addr)
> +static inline bool page_pool_set_dma_addr(struct page *page, dma_addr_t addr)
>  {
> +       bool failed = false;
> +
>         page->dma_addr = addr;
> -       if (PAGE_POOL_DMA_USE_PP_FRAG_COUNT)
> -               page->dma_addr_upper = upper_32_bits(addr);
> +       if (PAGE_POOL_DMA_USE_PP_FRAG_COUNT) {
> +               page->dma_addr = addr >> PAGE_SHIFT;
> +               /* We assume page alignment to shave off bottom bits,
> +                * if this "compression" doesn't work we need to drop.
> +                */
> +               failed = addr != (dma_addr_t)page->dma_addr << PAGE_SHIFT;
> +       }
> +       return failed;
>  }
>
>  static inline bool page_pool_put(struct page_pool *pool)
> diff --git a/net/core/page_pool.c b/net/core/page_pool.c
> index 77cb75e63aca..9ea42e242a89 100644
> --- a/net/core/page_pool.c
> +++ b/net/core/page_pool.c
> @@ -211,10 +211,6 @@ static int page_pool_init(struct page_pool *pool,
>                  */
>         }
>
> -       if (PAGE_POOL_DMA_USE_PP_FRAG_COUNT &&
> -           pool->p.flags & PP_FLAG_PAGE_FRAG)
> -               return -EINVAL;
> -
>  #ifdef CONFIG_PAGE_POOL_STATS
>         pool->recycle_stats = alloc_percpu(struct page_pool_recycle_stats);
>         if (!pool->recycle_stats)
> @@ -359,12 +355,19 @@ static bool page_pool_dma_map(struct page_pool *pool, struct page *page)
>         if (dma_mapping_error(pool->p.dev, dma))
>                 return false;
>
> -       page_pool_set_dma_addr(page, dma);
> +       if (page_pool_set_dma_addr(page, dma))
> +               goto unmap_failed;
>
>         if (pool->p.flags & PP_FLAG_DMA_SYNC_DEV)
>                 page_pool_dma_sync_for_device(pool, page, pool->p.max_len);
>
>         return true;
> +
> +unmap_failed:
> +       dma_unmap_page_attrs(pool->p.dev, dma,
> +                            PAGE_SIZE << pool->p.order, pool->p.dma_dir,
> +                            DMA_ATTR_SKIP_CPU_SYNC | DMA_ATTR_WEAK_ORDERING);
> +       return false;
>  }

That seems reasonable and would work for pages > 4k as well. But is
16TB enough?  I am more familiar with embedded systems than with large
servers, which do tend to scale that high.
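
For what it's worth, the limit also scales with the page size, since
the shift above is PAGE_SHIFT rather than a fixed 12: e.g. with 64 KiB
pages, 2^(32 + 16) = 256 TiB.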

Regards
/Ilias
>
>  static void page_pool_set_pp_info(struct page_pool *pool,

* Re: [PATCH net-next v7 1/6] page_pool: frag API support for 32-bit arch with 64-bit DMA
  2023-08-21  8:38               ` Ilias Apalodimas
@ 2023-08-21 11:15                 ` Ilias Apalodimas
  0 siblings, 0 replies; 32+ messages in thread
From: Ilias Apalodimas @ 2023-08-21 11:15 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Mina Almasry, Yunsheng Lin, davem, pabeni, netdev, linux-kernel,
	Lorenzo Bianconi, Alexander Duyck, Liang Chen, Alexander Lobakin,
	Saeed Mahameed, Leon Romanovsky, Eric Dumazet,
	Jesper Dangaard Brouer

On Mon, 21 Aug 2023 at 11:38, Ilias Apalodimas
<ilias.apalodimas@linaro.org> wrote:
>
> Resending for the mailing list, apologies for the noise.
>
>
> On Sat, 19 Aug 2023 at 00:51, Jakub Kicinski <kuba@kernel.org> wrote:
> >
> > On Fri, 18 Aug 2023 09:12:09 +0300 Ilias Apalodimas wrote:
> > > > Right, IIUC we don't have enough space to fit dma_addr_t and the
> > > > refcount, but if we store the dma addr on a shifted u32 instead
> > > > of using dma_addr_t explicitly - the refcount should fit?
> > >
> > > struct page looks like this:
> > >
> > > unsigned long dma_addr;
> > > union {
> > >       unsigned long dma_addr_upper;
> > >       atomic_long_t pp_frag_count;
> > > };
> >
> > I could be completely misunderstanding the problem.
>
> You aren't!
>
> > Let me show you the diff of what I was thinking more or less.
> >
> > diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
> > index 5e74ce4a28cd..58ffa8dc745f 100644
> > --- a/include/linux/mm_types.h
> > +++ b/include/linux/mm_types.h
> > @@ -126,11 +126,6 @@ struct page {
> >                         unsigned long _pp_mapping_pad;
> >                         unsigned long dma_addr;
> >                         union {
> > -                               /**
> > -                                * dma_addr_upper: might require a 64-bit
> > -                                * value on 32-bit architectures.
> > -                                */
> > -                               unsigned long dma_addr_upper;
> >                                 /**
> >                                  * For frag page support, not supported in
> >                                  * 32-bit architectures with 64-bit DMA.
> > diff --git a/include/net/page_pool/helpers.h b/include/net/page_pool/helpers.h
> > index 94231533a369..6f87a0fa2178 100644
> > --- a/include/net/page_pool/helpers.h
> > +++ b/include/net/page_pool/helpers.h
> > @@ -212,16 +212,24 @@ static inline dma_addr_t page_pool_get_dma_addr(struct page *page)
> >         dma_addr_t ret = page->dma_addr;
> >
> >         if (PAGE_POOL_DMA_USE_PP_FRAG_COUNT)
> > -               ret |= (dma_addr_t)page->dma_addr_upper << 16 << 16;
> > +               ret <<= PAGE_SHIFT;
> >
> >         return ret;
> >  }
> >
> > -static inline void page_pool_set_dma_addr(struct page *page, dma_addr_t addr)
> > +static inline bool page_pool_set_dma_addr(struct page *page, dma_addr_t addr)
> >  {
> > +       bool failed = false;
> > +
> >         page->dma_addr = addr;
> > -       if (PAGE_POOL_DMA_USE_PP_FRAG_COUNT)
> > -               page->dma_addr_upper = upper_32_bits(addr);
> > +       if (PAGE_POOL_DMA_USE_PP_FRAG_COUNT) {
> > +               page->dma_addr = addr >> PAGE_SHIFT;
> > +               /* We assume page alignment to shave off bottom bits,
> > +                * if this "compression" doesn't work we need to drop.
> > +                */
> > +               failed = addr != (dma_addr_t)page->dma_addr << PAGE_SHIFT;
> > +       }
> > +       return failed;
> >  }
> >
> >  static inline bool page_pool_put(struct page_pool *pool)
> > diff --git a/net/core/page_pool.c b/net/core/page_pool.c
> > index 77cb75e63aca..9ea42e242a89 100644
> > --- a/net/core/page_pool.c
> > +++ b/net/core/page_pool.c
> > @@ -211,10 +211,6 @@ static int page_pool_init(struct page_pool *pool,
> >                  */
> >         }
> >
> > -       if (PAGE_POOL_DMA_USE_PP_FRAG_COUNT &&
> > -           pool->p.flags & PP_FLAG_PAGE_FRAG)
> > -               return -EINVAL;
> > -
> >  #ifdef CONFIG_PAGE_POOL_STATS
> >         pool->recycle_stats = alloc_percpu(struct page_pool_recycle_stats);
> >         if (!pool->recycle_stats)
> > @@ -359,12 +355,19 @@ static bool page_pool_dma_map(struct page_pool *pool, struct page *page)
> >         if (dma_mapping_error(pool->p.dev, dma))
> >                 return false;
> >
> > -       page_pool_set_dma_addr(page, dma);
> > +       if (page_pool_set_dma_addr(page, dma))
> > +               goto unmap_failed;
> >
> >         if (pool->p.flags & PP_FLAG_DMA_SYNC_DEV)
> >                 page_pool_dma_sync_for_device(pool, page, pool->p.max_len);
> >
> >         return true;
> > +
> > +unmap_failed:
> > +       dma_unmap_page_attrs(pool->p.dev, dma,
> > +                            PAGE_SIZE << pool->p.order, pool->p.dma_dir,
> > +                            DMA_ATTR_SKIP_CPU_SYNC | DMA_ATTR_WEAK_ORDERING);
> > +       return false;
> >  }
>
> That seems reasonable and would work for pages > 4k as well. But is
> 16TB enough?  I am more familiar with embedded systems than with large
> servers, which do tend to scale that high.

Right, never respond before coffee... I think this is reasonable overall.

Thanks
/Ilias
>
> Regards
> /Ilias
> >
> >  static void page_pool_set_pp_info(struct page_pool *pool,

* Re: [PATCH net-next v7 1/6] page_pool: frag API support for 32-bit arch with 64-bit DMA
  2023-08-18 21:51             ` Jakub Kicinski
  2023-08-21  8:38               ` Ilias Apalodimas
@ 2023-08-21 12:18               ` Yunsheng Lin
  2023-08-21 18:35                 ` Jakub Kicinski
  1 sibling, 1 reply; 32+ messages in thread
From: Yunsheng Lin @ 2023-08-21 12:18 UTC (permalink / raw)
  To: Jakub Kicinski, Ilias Apalodimas
  Cc: Mina Almasry, davem, pabeni, netdev, linux-kernel,
	Lorenzo Bianconi, Alexander Duyck, Liang Chen, Alexander Lobakin,
	Saeed Mahameed, Leon Romanovsky, Eric Dumazet,
	Jesper Dangaard Brouer

On 2023/8/19 5:51, Jakub Kicinski wrote:
> On Fri, 18 Aug 2023 09:12:09 +0300 Ilias Apalodimas wrote:
>>> Right, IIUC we don't have enough space to fit dma_addr_t and the
>>> refcount, but if we store the dma addr on a shifted u32 instead
>>> of using dma_addr_t explicitly - the refcount should fit?  
>>
>> struct page looks like this:
>>
>> unsigned long dma_addr;
>> union {
>>       unsigned long dma_addr_upper;
>>       atomic_long_t pp_frag_count;
>> };
> 
> I could be completely misunderstanding the problem.
> Let me show you the diff of what I was thinking more or less.
> 
> diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
> index 5e74ce4a28cd..58ffa8dc745f 100644
> --- a/include/linux/mm_types.h
> +++ b/include/linux/mm_types.h
> @@ -126,11 +126,6 @@ struct page {
>  			unsigned long _pp_mapping_pad;
>  			unsigned long dma_addr;
>  			union {
> -				/**
> -				 * dma_addr_upper: might require a 64-bit
> -				 * value on 32-bit architectures.
> -				 */
> -				unsigned long dma_addr_upper;
>  				/**
>  				 * For frag page support, not supported in
>  				 * 32-bit architectures with 64-bit DMA.
> diff --git a/include/net/page_pool/helpers.h b/include/net/page_pool/helpers.h
> index 94231533a369..6f87a0fa2178 100644
> --- a/include/net/page_pool/helpers.h
> +++ b/include/net/page_pool/helpers.h
> @@ -212,16 +212,24 @@ static inline dma_addr_t page_pool_get_dma_addr(struct page *page)
>  	dma_addr_t ret = page->dma_addr;
>  
>  	if (PAGE_POOL_DMA_USE_PP_FRAG_COUNT)
> -		ret |= (dma_addr_t)page->dma_addr_upper << 16 << 16;
> +		ret <<= PAGE_SHIFT;
>  
>  	return ret;
>  }
>  
> -static inline void page_pool_set_dma_addr(struct page *page, dma_addr_t addr)
> +static inline bool page_pool_set_dma_addr(struct page *page, dma_addr_t addr)
>  {
> +	bool failed = false;
> +
>  	page->dma_addr = addr;
> -	if (PAGE_POOL_DMA_USE_PP_FRAG_COUNT)
> -		page->dma_addr_upper = upper_32_bits(addr);
> +	if (PAGE_POOL_DMA_USE_PP_FRAG_COUNT) {
> +		page->dma_addr = addr >> PAGE_SHIFT;
> +		/* We assume page alignment to shave off bottom bits,
> +		 * if this "compression" doesn't work we need to drop.
> +		 */
> +		failed = addr != (dma_addr_t)page->dma_addr << PAGE_SHIFT;
> +	}
> +	return failed;
>  }
>  
>  static inline bool page_pool_put(struct page_pool *pool)
> diff --git a/net/core/page_pool.c b/net/core/page_pool.c
> index 77cb75e63aca..9ea42e242a89 100644
> --- a/net/core/page_pool.c
> +++ b/net/core/page_pool.c
> @@ -211,10 +211,6 @@ static int page_pool_init(struct page_pool *pool,
>  		 */
>  	}
>  
> -	if (PAGE_POOL_DMA_USE_PP_FRAG_COUNT &&
> -	    pool->p.flags & PP_FLAG_PAGE_FRAG)
> -		return -EINVAL;
> -
>  #ifdef CONFIG_PAGE_POOL_STATS
>  	pool->recycle_stats = alloc_percpu(struct page_pool_recycle_stats);
>  	if (!pool->recycle_stats)
> @@ -359,12 +355,19 @@ static bool page_pool_dma_map(struct page_pool *pool, struct page *page)
>  	if (dma_mapping_error(pool->p.dev, dma))
>  		return false;
>  
> -	page_pool_set_dma_addr(page, dma);
> +	if (page_pool_set_dma_addr(page, dma))
> +		goto unmap_failed;

What does the driver do when the above fails?
Does the driver still need to implement a fallback for a 32-bit arch
with DMA addresses wider than 32 + 12 bits?
If yes, it does not seem to be very helpful from the driver's point of
view, as the driver might still need to call the page allocator API
directly when the above fails.

>  
>  	if (pool->p.flags & PP_FLAG_DMA_SYNC_DEV)
>  		page_pool_dma_sync_for_device(pool, page, pool->p.max_len);
>  
>  	return true;
> +
> +unmap_failed:
> +	dma_unmap_page_attrs(pool->p.dev, dma,
> +			     PAGE_SIZE << pool->p.order, pool->p.dma_dir,
> +			     DMA_ATTR_SKIP_CPU_SYNC | DMA_ATTR_WEAK_ORDERING);
> +	return false;
>  }
>  
>  static void page_pool_set_pp_info(struct page_pool *pool,
> .
> 

* Re: [PATCH net-next v7 1/6] page_pool: frag API support for 32-bit arch with 64-bit DMA
  2023-08-21 12:18               ` Yunsheng Lin
@ 2023-08-21 18:35                 ` Jakub Kicinski
  2023-08-22  9:21                   ` Yunsheng Lin
  0 siblings, 1 reply; 32+ messages in thread
From: Jakub Kicinski @ 2023-08-21 18:35 UTC (permalink / raw)
  To: Yunsheng Lin
  Cc: Ilias Apalodimas, Mina Almasry, davem, pabeni, netdev,
	linux-kernel, Lorenzo Bianconi, Alexander Duyck, Liang Chen,
	Alexander Lobakin, Saeed Mahameed, Leon Romanovsky, Eric Dumazet,
	Jesper Dangaard Brouer

On Mon, 21 Aug 2023 20:18:55 +0800 Yunsheng Lin wrote:
> > -	page_pool_set_dma_addr(page, dma);
> > +	if (page_pool_set_dma_addr(page, dma))
> > +		goto unmap_failed;  
> 
> What does the driver do when the above fails?
> Does the driver still need to implement a fallback for 32 bit arch with
> dma addr with more than 32 + 12 bits?
> If yes, it does not seem to be very helpful from the driver's point of view
> as the driver might still need to call page allocator API directly when
> the above fails.

I'd expect the driver to do nothing; we are operating under
the assumption that "this will never happen". If it does,
the user should report it back to us. So maybe..

> >  	if (pool->p.flags & PP_FLAG_DMA_SYNC_DEV)
> >  		page_pool_dma_sync_for_device(pool, page, pool->p.max_len);
> >  
> >  	return true;
> > +
> > +unmap_failed:

.. we should also add a:

	WARN_ONCE(1, "misaligned DMA address, please report to netdev@");

here?

> > +	dma_unmap_page_attrs(pool->p.dev, dma,
> > +			     PAGE_SIZE << pool->p.order, pool->p.dma_dir,
> > +			     DMA_ATTR_SKIP_CPU_SYNC | DMA_ATTR_WEAK_ORDERING);
> > +	return false;

* Re: [PATCH net-next v7 1/6] page_pool: frag API support for 32-bit arch with 64-bit DMA
  2023-08-21 18:35                 ` Jakub Kicinski
@ 2023-08-22  9:21                   ` Yunsheng Lin
  2023-08-22 15:38                     ` Jakub Kicinski
  0 siblings, 1 reply; 32+ messages in thread
From: Yunsheng Lin @ 2023-08-22  9:21 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Ilias Apalodimas, Mina Almasry, davem, pabeni, netdev,
	linux-kernel, Lorenzo Bianconi, Alexander Duyck, Liang Chen,
	Alexander Lobakin, Saeed Mahameed, Leon Romanovsky, Eric Dumazet,
	Jesper Dangaard Brouer

On 2023/8/22 2:35, Jakub Kicinski wrote:
> On Mon, 21 Aug 2023 20:18:55 +0800 Yunsheng Lin wrote:
>>> -	page_pool_set_dma_addr(page, dma);
>>> +	if (page_pool_set_dma_addr(page, dma))
>>> +		goto unmap_failed;  
>>
>> What does the driver do when the above fails?
>> Does the driver still need to implement a fallback for 32 bit arch with
>> dma addr with more than 32 + 12 bits?
>> If yes, it does not seem to be very helpful from the driver's point of view
>> as the driver might still need to call page allocator API directly when
>> the above fails.
> 
> I'd expect the driver to do nothing, we are operating under 
> the assumption that "this will never happen". If it does 
> the user should report it back to us. So maybe..

Digging a little deeper:
From the CPU's point of view, up to 40 bits is supported with LPAE.
From the device's point of view, it seems up to 48 bits is supported
when the IOMMU is enabled, which would exceed the 32 + 12 = 44 bits a
shifted u32 can hold; see below:
https://elixir.free-electrons.com/linux/v6.5-rc7/source/drivers/iommu/Kconfig#L28

> 
>>>  	if (pool->p.flags & PP_FLAG_DMA_SYNC_DEV)
>>>  		page_pool_dma_sync_for_device(pool, page, pool->p.max_len);
>>>  
>>>  	return true;
>>> +
>>> +unmap_failed:
> 
> .. we should also add a:
> 
> 	WARN_ONCE(1, "misaligned DMA address, please report to netdev@");

As CONFIG_PHYS_ADDR_T_64BIT seems to be used widely in x86/arm/mips/powerpc,
I am not sure if we can really make the above assumption.

https://elixir.free-electrons.com/linux/v6.4-rc6/K/ident/CONFIG_PHYS_ADDR_T_64BIT

> 
> here?
> 
>>> +	dma_unmap_page_attrs(pool->p.dev, dma,
>>> +			     PAGE_SIZE << pool->p.order, pool->p.dma_dir,
>>> +			     DMA_ATTR_SKIP_CPU_SYNC | DMA_ATTR_WEAK_ORDERING);
>>> +	return false;
> .
> 

* Re: [PATCH net-next v7 1/6] page_pool: frag API support for 32-bit arch with 64-bit DMA
  2023-08-22  9:21                   ` Yunsheng Lin
@ 2023-08-22 15:38                     ` Jakub Kicinski
  2023-08-22 18:30                       ` Alexander Duyck
  2023-08-23  3:03                       ` Yunsheng Lin
  0 siblings, 2 replies; 32+ messages in thread
From: Jakub Kicinski @ 2023-08-22 15:38 UTC (permalink / raw)
  To: Yunsheng Lin, Alexander Duyck
  Cc: Ilias Apalodimas, Mina Almasry, davem, pabeni, netdev,
	linux-kernel, Lorenzo Bianconi, Liang Chen, Alexander Lobakin,
	Saeed Mahameed, Leon Romanovsky, Eric Dumazet,
	Jesper Dangaard Brouer

On Tue, 22 Aug 2023 17:21:35 +0800 Yunsheng Lin wrote:
> > .. we should also add a:
> > 
> > 	WARN_ONCE(1, "misaligned DMA address, please report to netdev@");  
> 
> > As CONFIG_PHYS_ADDR_T_64BIT seems to be used widely in x86/arm/mips/powerpc,
> I am not sure if we can really make the above assumption.
> 
> https://elixir.free-electrons.com/linux/v6.4-rc6/K/ident/CONFIG_PHYS_ADDR_T_64BIT

Huh, it's actually used a lot less than I anticipated!

None of the x86/arm/mips/powerpc systems matter IMHO - the only _real_
risk is something we don't know about returning non-aligned addresses.

Unless we know about specific problems, I'd suggest we take the simpler
path rather than complicating the design for systems which may not
exist.

Alex, do you know of any such cases? Some crazy swiotlb setting?
WDYT about this in general?

* Re: [PATCH net-next v7 1/6] page_pool: frag API support for 32-bit arch with 64-bit DMA
  2023-08-22 15:38                     ` Jakub Kicinski
@ 2023-08-22 18:30                       ` Alexander Duyck
  2023-08-22 18:58                         ` Alexander Duyck
  2023-08-23  3:03                       ` Yunsheng Lin
  1 sibling, 1 reply; 32+ messages in thread
From: Alexander Duyck @ 2023-08-22 18:30 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Yunsheng Lin, Ilias Apalodimas, Mina Almasry, davem, pabeni,
	netdev, linux-kernel, Lorenzo Bianconi, Liang Chen,
	Alexander Lobakin, Saeed Mahameed, Leon Romanovsky, Eric Dumazet,
	Jesper Dangaard Brouer

On Tue, Aug 22, 2023 at 8:38 AM Jakub Kicinski <kuba@kernel.org> wrote:
>
> On Tue, 22 Aug 2023 17:21:35 +0800 Yunsheng Lin wrote:
> > > .. we should also add a:
> > >
> > >     WARN_ONCE(1, "misaligned DMA address, please report to netdev@");
> >
> > > As CONFIG_PHYS_ADDR_T_64BIT seems to be used widely in x86/arm/mips/powerpc,
> > I am not sure if we can really make the above assumption.
> >
> > https://elixir.free-electrons.com/linux/v6.4-rc6/K/ident/CONFIG_PHYS_ADDR_T_64BIT
>
> Huh, it's actually used a lot less than I anticipated!
>
> None of the x86/arm/mips/powerpc systems matter IMHO - the only _real_
> risk is something we don't know about returning non-aligned addresses.
>
> > Unless we know about specific problems, I'd suggest we take the simpler
> path rather than complicating the design for systems which may not
> exist.
>
> Alex, do you know of any such cases? Some crazy swiotlb setting?
> WDYT about this in general?

There may be scenarios where, if bounce buffers are used, the page may
not be aligned. It all comes down to how
swiotlb_tbl_map_single(https://elixir.free-electrons.com/linux/v6.5-rc7/C/ident/swiotlb_tbl_map_single)
is called. In the IOMMU case it looks like they take the extra step of
passing an alignment value, but it looks like for the other two cases
they don't.

Changing that behavior wouldn't take much though. Basically we would
just need to look at the size and address, and if they are both page
aligned then we could specify a page alignment for the DMA mapping.
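
In pseudo-C, that idea would be something like the following
(illustrative only; the real swiotlb internals differ):

        unsigned int alloc_align_mask = 0;

        /* Request page alignment from swiotlb when both the original
         * address and the size are page aligned.
         */
        if (PAGE_ALIGNED(orig_addr) && PAGE_ALIGNED(size))
                alloc_align_mask = PAGE_SIZE - 1;

        /* ... then pass alloc_align_mask down to swiotlb_tbl_map_single() */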

* Re: [PATCH net-next v7 1/6] page_pool: frag API support for 32-bit arch with 64-bit DMA
  2023-08-22 18:30                       ` Alexander Duyck
@ 2023-08-22 18:58                         ` Alexander Duyck
  0 siblings, 0 replies; 32+ messages in thread
From: Alexander Duyck @ 2023-08-22 18:58 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Yunsheng Lin, Ilias Apalodimas, Mina Almasry, davem, pabeni,
	netdev, linux-kernel, Lorenzo Bianconi, Liang Chen,
	Alexander Lobakin, Saeed Mahameed, Leon Romanovsky, Eric Dumazet,
	Jesper Dangaard Brouer

On Tue, Aug 22, 2023 at 11:30 AM Alexander Duyck
<alexander.duyck@gmail.com> wrote:
>
> On Tue, Aug 22, 2023 at 8:38 AM Jakub Kicinski <kuba@kernel.org> wrote:
> >
> > On Tue, 22 Aug 2023 17:21:35 +0800 Yunsheng Lin wrote:
> > > > .. we should also add a:
> > > >
> > > >     WARN_ONCE(1, "misaligned DMA address, please report to netdev@");
> > >
> > > > As CONFIG_PHYS_ADDR_T_64BIT seems to be used widely in x86/arm/mips/powerpc,
> > > I am not sure if we can really make the above assumption.
> > >
> > > https://elixir.free-electrons.com/linux/v6.4-rc6/K/ident/CONFIG_PHYS_ADDR_T_64BIT
> >
> > Huh, it's actually used a lot less than I anticipated!
> >
> > None of the x86/arm/mips/powerpc systems matter IMHO - the only _real_
> > risk is something we don't know about returning non-aligned addresses.
> >
> > > Unless we know about specific problems, I'd suggest we take the simpler
> > path rather than complicating the design for systems which may not
> > exist.
> >
> > Alex, do you know of any such cases? Some crazy swiotlb setting?
> > WDYT about this in general?
>
> There may be scenarios where if bounce buffers are used the page may
> not be aligned. It all comes down to how
> swiotlb_tbl_map_single(https://elixir.free-electrons.com/linux/v6.5-rc7/C/ident/swiotlb_tbl_map_single)
> is called. In the IOMMU case it looks like they take the extra step of
> passing an alignment value, but it looks like for the other two cases
> they don't.
>
> Changing that behavior wouldn't take much though. Basically we would
> just need to do something like look at the size and address and if
> they are both page aligned then we could specify a page alignment for
> the DMA mapping.

Actually, I take that back. It looks like in the bounce case there is
already code that will look for PAGE_SIZE-aligned blocks if the
request is for PAGE_SIZE or larger. So there shouldn't be any cases
that I am aware of where a PAGE_SIZE request is not PAGE_SIZE-aligned
in DMA.

* Re: [PATCH net-next v7 1/6] page_pool: frag API support for 32-bit arch with 64-bit DMA
  2023-08-22 15:38                     ` Jakub Kicinski
  2023-08-22 18:30                       ` Alexander Duyck
@ 2023-08-23  3:03                       ` Yunsheng Lin
  2023-08-23 14:25                         ` Jakub Kicinski
  1 sibling, 1 reply; 32+ messages in thread
From: Yunsheng Lin @ 2023-08-23  3:03 UTC (permalink / raw)
  To: Jakub Kicinski, Alexander Duyck
  Cc: Ilias Apalodimas, Mina Almasry, davem, pabeni, netdev,
	linux-kernel, Lorenzo Bianconi, Liang Chen, Alexander Lobakin,
	Saeed Mahameed, Leon Romanovsky, Eric Dumazet,
	Jesper Dangaard Brouer

On 2023/8/22 23:38, Jakub Kicinski wrote:
> On Tue, 22 Aug 2023 17:21:35 +0800 Yunsheng Lin wrote:
>>> .. we should also add a:
>>>
>>> 	WARN_ONCE(1, "misaligned DMA address, please report to netdev@");  
>>
>> As CONFIG_PHYS_ADDR_T_64BIT seems to be used widely in x86/arm/mips/powerpc,
>> I am not sure if we can really make the above assumption.
>>
>> https://elixir.free-electrons.com/linux/v6.4-rc6/K/ident/CONFIG_PHYS_ADDR_T_64BIT
> 
> Huh, it's actually used a lot less than I anticipated!
> 
> None of the x86/arm/mips/powerpc systems matter IMHO - the only _real_

Is there any particular reason why you think that the above systems do
not really matter?

As we have made a similar wrong assumption about those arches before, I am
really trying to be more cautious about it.

I searched through the web; some seem to be claiming that "32-bit is DEAD".
I am not sure if there is a common agreement among the kernel community;
is there any previous discussion about that?

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH net-next v7 1/6] page_pool: frag API support for 32-bit arch with 64-bit DMA
  2023-08-23  3:03                       ` Yunsheng Lin
@ 2023-08-23 14:25                         ` Jakub Kicinski
  2023-08-23 18:00                           ` Alexander Duyck
  0 siblings, 1 reply; 32+ messages in thread
From: Jakub Kicinski @ 2023-08-23 14:25 UTC (permalink / raw)
  To: Yunsheng Lin
  Cc: Alexander Duyck, Ilias Apalodimas, Mina Almasry, davem, pabeni,
	netdev, linux-kernel, Lorenzo Bianconi, Liang Chen,
	Alexander Lobakin, Saeed Mahameed, Leon Romanovsky, Eric Dumazet,
	Jesper Dangaard Brouer

On Wed, 23 Aug 2023 11:03:31 +0800 Yunsheng Lin wrote:
> On 2023/8/22 23:38, Jakub Kicinski wrote:
> > On Tue, 22 Aug 2023 17:21:35 +0800 Yunsheng Lin wrote:  
> >> As CONFIG_PHYS_ADDR_T_64BIT seems to be used widely in x86/arm/mips/powerpc,
> >> I am not sure if we can really make the above assumption.
> >>
> >> https://elixir.free-electrons.com/linux/v6.4-rc6/K/ident/CONFIG_PHYS_ADDR_T_64BIT  
> > 
> > Huh, it's actually used a lot less than I anticipated!
> > 
> > None of the x86/arm/mips/powerpc systems matter IMHO - the only _real_  
> 
> Is there any particular reason why you think that the above systems do
> not really matter?

Not the systems themselves but the combination of a 32b arch with
an address space >16TB. All those arches have a 64b equivalent, so it
seems logical to use the 64b version for a system with a large address
space, if we're talking about a system which ends up running Linux.

> As we have made a similar wrong assumption about those arches before, I am
> really trying to be more cautious about it.
> 
> I searched through the web; some seem to be claiming that "32-bit is DEAD".
> I am not sure if there is a common agreement among the kernel community;
> is there any previous discussion about that?

My suspicion/claim is that 32 + PAGE_SHIFT should be enough bits for
any 32b platform.
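
In other words, storing the DMA address of a page-aligned mapping
shifted right by PAGE_SHIFT lets a 32-bit field cover 32 + PAGE_SHIFT
address bits, i.e. 44 bits with 4K pages. A minimal sketch of that
packing (helper names are made up, not from the patchset):

	static inline u32 pp_dma_addr_pack(dma_addr_t addr)
	{
		/* The low PAGE_SHIFT bits of a page-aligned mapping are
		 * zero, so dropping them loses no information.
		 */
		return (u32)(addr >> PAGE_SHIFT);
	}

	static inline dma_addr_t pp_dma_addr_unpack(u32 packed)
	{
		return (dma_addr_t)packed << PAGE_SHIFT;
	}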

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH net-next v7 1/6] page_pool: frag API support for 32-bit arch with 64-bit DMA
  2023-08-23 14:25                         ` Jakub Kicinski
@ 2023-08-23 18:00                           ` Alexander Duyck
  2023-08-25  9:40                             ` Yunsheng Lin
  0 siblings, 1 reply; 32+ messages in thread
From: Alexander Duyck @ 2023-08-23 18:00 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Yunsheng Lin, Ilias Apalodimas, Mina Almasry, davem, pabeni,
	netdev, linux-kernel, Lorenzo Bianconi, Liang Chen,
	Alexander Lobakin, Saeed Mahameed, Leon Romanovsky, Eric Dumazet,
	Jesper Dangaard Brouer

On Wed, Aug 23, 2023 at 7:25 AM Jakub Kicinski <kuba@kernel.org> wrote:
>
> On Wed, 23 Aug 2023 11:03:31 +0800 Yunsheng Lin wrote:
> > On 2023/8/22 23:38, Jakub Kicinski wrote:
> > > On Tue, 22 Aug 2023 17:21:35 +0800 Yunsheng Lin wrote:
> > >> As CONFIG_PHYS_ADDR_T_64BIT seems to be used widely in x86/arm/mips/powerpc,
> > >> I am not sure if we can really make the above assumption.
> > >>
> > >> https://elixir.free-electrons.com/linux/v6.4-rc6/K/ident/CONFIG_PHYS_ADDR_T_64BIT
> > >
> > > Huh, it's actually used a lot less than I anticipated!
> > >
> > > None of the x86/arm/mips/powerpc systems matter IMHO - the only _real_
> >
> > Is there any particular reason why you think that the above systems do
> > not really matter?
>
> Not the systems themselves but the combination of a 32b arch with
> an address space >16TB. All those arches have a 64b equivalent, so it
> seems logical to use the 64b version for a system with a large address
> space, if we're talking about a system which ends up running Linux.
>
> > As we have made a similar wrong assumption about those arches before, I am
> > really trying to be more cautious about it.
> >
> > I searched through the web; some seem to be claiming that "32-bit is DEAD".
> > I am not sure if there is a common agreement among the kernel community;
> > is there any previous discussion about that?
>
> My suspicion/claim is that 32 + PAGE_SHIFT should be enough bits for
> any 32b platform.

One additional thing we could consider would be to simply have
page_pool enforce a DMA mask for the device, to address any cases
where we might not be able to fit the address, even in the unlikely
event that somebody is running a 32b system with over 16 terabytes
of RAM. With that, the DMA subsystem would handle it for us and we
wouldn't have to worry so much about it.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH net-next v7 1/6] page_pool: frag API support for 32-bit arch with 64-bit DMA
  2023-08-23 18:00                           ` Alexander Duyck
@ 2023-08-25  9:40                             ` Yunsheng Lin
  2023-08-26  0:08                               ` Jakub Kicinski
  0 siblings, 1 reply; 32+ messages in thread
From: Yunsheng Lin @ 2023-08-25  9:40 UTC (permalink / raw)
  To: Alexander Duyck, Jakub Kicinski
  Cc: Ilias Apalodimas, Mina Almasry, davem, pabeni, netdev,
	linux-kernel, Lorenzo Bianconi, Liang Chen, Alexander Lobakin,
	Saeed Mahameed, Leon Romanovsky, Eric Dumazet,
	Jesper Dangaard Brouer

On 2023/8/24 2:00, Alexander Duyck wrote:
> On Wed, Aug 23, 2023 at 7:25 AM Jakub Kicinski <kuba@kernel.org> wrote:
>>
>> On Wed, 23 Aug 2023 11:03:31 +0800 Yunsheng Lin wrote:
>>> On 2023/8/22 23:38, Jakub Kicinski wrote:
>>>> On Tue, 22 Aug 2023 17:21:35 +0800 Yunsheng Lin wrote:
>>>>> As CONFIG_PHYS_ADDR_T_64BIT seems to be used widely in x86/arm/mips/powerpc,
>>>>> I am not sure if we can really make the above assumption.
>>>>>
>>>>> https://elixir.free-electrons.com/linux/v6.4-rc6/K/ident/CONFIG_PHYS_ADDR_T_64BIT
>>>>
>>>> Huh, it's actually used a lot less than I anticipated!
>>>>
>>>> None of the x86/arm/mips/powerpc systems matter IMHO - the only _real_
>>>
>>> Is there any particular reason why you think that the above systems do
>>> not really matter?
>>
>> Not the systems themselves but the combination of a 32b arch with
>> an address space >16TB. All those arches have a 64b equivalent, so it
>> seems logical to use the 64b version for a system with a large address
>> space, if we're talking about a system which ends up running Linux.
>>
>>> As we have made a similar wrong assumption about those arches before, I am
>>> really trying to be more cautious about it.
>>>
>>> I searched through the web; some seem to be claiming that "32-bit is DEAD".
>>> I am not sure if there is a common agreement among the kernel community;
>>> is there any previous discussion about that?
>>
>> My suspicion/claim is that 32 + PAGE_SHIFT should be enough bits for
>> any 32b platform.
> 
> One additional thing we could consider would be to simply have
> page_pool enforce a DMA mask for the device, to address any cases
> where we might not be able to fit the address, even in the unlikely
> event that somebody is running a 32b system with over 16 terabytes
> of RAM. With that, the DMA subsystem would handle it for us and we
> wouldn't have to worry so much about it.

It seems there is an API to acquire the DMA mask used by the device:
https://elixir.free-electrons.com/linux/v6.4-rc6/source/include/linux/dma-mapping.h#L434

Is it possible to use that to check if the DMA mask used by the device is
within the 32 + PAGE_SHIFT limit? If yes, we use Jakub's proposal to
reduce the DMA address bits; if no, we fail the page_pool creation.
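
A minimal sketch of that check, assuming it would sit in
page_pool_init() and that failing with -EOPNOTSUPP is acceptable (both
assumptions, not part of the posted series):

	/* Reject pools whose device may hand back DMA addresses wider
	 * than the 32 + PAGE_SHIFT bits a 32-bit field can represent.
	 */
	if (pool->p.flags & PP_FLAG_DMA_MAP &&
	    dma_get_mask(pool->p.dev) > DMA_BIT_MASK(32 + PAGE_SHIFT))
		return -EOPNOTSUPP;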

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH net-next v7 1/6] page_pool: frag API support for 32-bit arch with 64-bit DMA
  2023-08-25  9:40                             ` Yunsheng Lin
@ 2023-08-26  0:08                               ` Jakub Kicinski
  2023-08-28 14:50                                 ` Alexander Duyck
  0 siblings, 1 reply; 32+ messages in thread
From: Jakub Kicinski @ 2023-08-26  0:08 UTC (permalink / raw)
  To: Yunsheng Lin
  Cc: Alexander Duyck, Ilias Apalodimas, Mina Almasry, davem, pabeni,
	netdev, linux-kernel, Lorenzo Bianconi, Liang Chen,
	Alexander Lobakin, Saeed Mahameed, Leon Romanovsky, Eric Dumazet,
	Jesper Dangaard Brouer

On Fri, 25 Aug 2023 17:40:43 +0800 Yunsheng Lin wrote:
> > One additional thing we could consider would be to simply have
> > page_pool enforce a DMA mask for the device, to address any cases
> > where we might not be able to fit the address, even in the unlikely
> > event that somebody is running a 32b system with over 16 terabytes
> > of RAM. With that, the DMA subsystem would handle it for us and we
> > wouldn't have to worry so much about it.
> 
> It seems there is an API to acquire the DMA mask used by the device:
> https://elixir.free-electrons.com/linux/v6.4-rc6/source/include/linux/dma-mapping.h#L434
> 
> Is it possible to use that to check if the DMA mask used by the device is
> within the 32 + PAGE_SHIFT limit? If yes, we use Jakub's proposal to
> reduce the DMA address bits; if no, we fail the page_pool creation.

IMO you're making this unnecessarily complicated. We can set the masks
in page pool core or just handle the allocation failure like my patch
does and worry about the very unlikely case when someone reports actual
problems.
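
For context, "handle the allocation failure" means rejecting any page
whose mapped address cannot be stored, roughly along these lines (a
sketch of the idea, not the actual patch):

	dma_addr_t dma;

	dma = dma_map_page_attrs(pool->p.dev, page, 0,
				 PAGE_SIZE << pool->p.order,
				 pool->p.dma_dir, DMA_ATTR_SKIP_CPU_SYNC);
	if (dma_mapping_error(pool->p.dev, dma))
		return false;

	/* On a 32-bit arch with 64-bit DMA the address is stored shifted
	 * by PAGE_SHIFT, so refuse anything that would not survive the
	 * round trip.
	 */
	if (((dma_addr_t)(dma >> PAGE_SHIFT) << PAGE_SHIFT) != dma ||
	    upper_32_bits(dma >> PAGE_SHIFT)) {
		WARN_ONCE(1, "misaligned DMA address, please report to netdev@");
		dma_unmap_page_attrs(pool->p.dev, dma,
				     PAGE_SIZE << pool->p.order,
				     pool->p.dma_dir, DMA_ATTR_SKIP_CPU_SYNC);
		return false;
	}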

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH net-next v7 1/6] page_pool: frag API support for 32-bit arch with 64-bit DMA
  2023-08-26  0:08                               ` Jakub Kicinski
@ 2023-08-28 14:50                                 ` Alexander Duyck
  2023-08-28 15:38                                   ` Jakub Kicinski
  0 siblings, 1 reply; 32+ messages in thread
From: Alexander Duyck @ 2023-08-28 14:50 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Yunsheng Lin, Ilias Apalodimas, Mina Almasry, davem, pabeni,
	netdev, linux-kernel, Lorenzo Bianconi, Liang Chen,
	Alexander Lobakin, Saeed Mahameed, Leon Romanovsky, Eric Dumazet,
	Jesper Dangaard Brouer

On Fri, Aug 25, 2023 at 5:08 PM Jakub Kicinski <kuba@kernel.org> wrote:
>
> On Fri, 25 Aug 2023 17:40:43 +0800 Yunsheng Lin wrote:
> > > One additional thing we could consider would be to simply have
> > > page_pool enforce a DMA mask for the device, to address any cases
> > > where we might not be able to fit the address, even in the unlikely
> > > event that somebody is running a 32b system with over 16 terabytes
> > > of RAM. With that, the DMA subsystem would handle it for us and we
> > > wouldn't have to worry so much about it.
> >
> > It seems there is an API to acquire the DMA mask used by the device:
> > https://elixir.free-electrons.com/linux/v6.4-rc6/source/include/linux/dma-mapping.h#L434
> >
> > Is it possible to use that to check if the DMA mask used by the device is
> > within the 32 + PAGE_SHIFT limit? If yes, we use Jakub's proposal to
> > reduce the DMA address bits; if no, we fail the page_pool creation.
>
> IMO you're making this unnecessarily complicated. We can set the masks
> in page pool core or just handle the allocation failure like my patch
> does and worry about the very unlikely case when someone reports actual
> problems.

Actually we could keep it pretty simple. We just have to create a
#define using DMA_BIT_MASK for the width of DMA address the page pool
can store. We could name it something like PP_DMA_BIT_MASK. The
drivers would just have to use that to define their bit mask when
they call dma_set_mask_and_coherent. In that case the DMA API would
switch to bounce buffers automatically in cases where the page DMA
address would be out of bounds.

The other tweak we could look at would be to check
dma_get_required_mask and add a warning and/or fail page_pool
creation on systems where the page pool would not be able to handle
that mask when ANDed with the device DMA mask.

With those two changes the setup should be rock solid in terms of any
risks of the DMA address being out of bounds, and with minimal
performance impact as we would have verified all possibilities before
we even get into the hot path.
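
A rough, untested sketch of those two tweaks combined; PP_DMA_BIT_MASK
is the name proposed above, while PP_DMA_ADDR_BITS and the exact
placement of the checks are assumptions:

	/* Only meaningful where dma_addr_t is wider than the storage,
	 * i.e. a 32-bit arch with 64-bit DMA.
	 */
	#define PP_DMA_ADDR_BITS	(32 + PAGE_SHIFT)
	#define PP_DMA_BIT_MASK		DMA_BIT_MASK(PP_DMA_ADDR_BITS)

	/* Driver probe path: cap the device mask so the DMA API falls
	 * back to bounce buffers for anything out of range.
	 */
	err = dma_set_mask_and_coherent(dev, PP_DMA_BIT_MASK);
	if (err)
		return err;

	/* page_pool_init(): fail creation when the platform may need
	 * address bits the pool cannot represent.
	 */
	if (pool->p.flags & PP_FLAG_DMA_MAP &&
	    (dma_get_required_mask(pool->p.dev) &
	     dma_get_mask(pool->p.dev)) & ~PP_DMA_BIT_MASK)
		return -EINVAL;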

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH net-next v7 1/6] page_pool: frag API support for 32-bit arch with 64-bit DMA
  2023-08-28 14:50                                 ` Alexander Duyck
@ 2023-08-28 15:38                                   ` Jakub Kicinski
  2023-08-29 11:58                                     ` Yunsheng Lin
  0 siblings, 1 reply; 32+ messages in thread
From: Jakub Kicinski @ 2023-08-28 15:38 UTC (permalink / raw)
  To: Alexander Duyck
  Cc: Yunsheng Lin, Ilias Apalodimas, Mina Almasry, davem, pabeni,
	netdev, linux-kernel, Lorenzo Bianconi, Liang Chen,
	Alexander Lobakin, Saeed Mahameed, Leon Romanovsky, Eric Dumazet,
	Jesper Dangaard Brouer

On Mon, 28 Aug 2023 07:50:33 -0700 Alexander Duyck wrote:
> Actually we could keep it pretty simple. We just have to create a
> #define using DMA_BIT_MASK for the width of DMA address the page pool
> can store. We could name it something like PP_DMA_BIT_MASK. The
> drivers would just have to use that to define their bit mask when
> they call dma_set_mask_and_coherent. In that case the DMA API would
> switch to bounce buffers automatically in cases where the page DMA
> address would be out of bounds.
> 
> The other tweak we could look at would be to check
> dma_get_required_mask and add a warning and/or fail page_pool
> creation on systems where the page pool would not be able to handle
> that mask when ANDed with the device DMA mask.
> 
> With those two changes the setup should be rock solid in terms of any
> risks of the DMA address being out of bounds, and with minimal
> performance impact as we would have verified all possibilities before
> we even get into the hot path.

Sounds like a plan!

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH net-next v7 1/6] page_pool: frag API support for 32-bit arch with 64-bit DMA
  2023-08-28 15:38                                   ` Jakub Kicinski
@ 2023-08-29 11:58                                     ` Yunsheng Lin
  0 siblings, 0 replies; 32+ messages in thread
From: Yunsheng Lin @ 2023-08-29 11:58 UTC (permalink / raw)
  To: Jakub Kicinski, Alexander Duyck
  Cc: Ilias Apalodimas, Mina Almasry, davem, pabeni, netdev,
	linux-kernel, Lorenzo Bianconi, Liang Chen, Alexander Lobakin,
	Saeed Mahameed, Leon Romanovsky, Eric Dumazet,
	Jesper Dangaard Brouer

On 2023/8/28 23:38, Jakub Kicinski wrote:
> On Mon, 28 Aug 2023 07:50:33 -0700 Alexander Duyck wrote:
>> Actually we could keep it pretty simple. We just have to create a
>> #define using DMA_BIT_MASK for the width of DMA address the page pool
>> can store. We could name it something like PP_DMA_BIT_MASK. The
>> drivers would just have to use that to define their bit mask when
>> they call dma_set_mask_and_coherent. In that case the DMA API would
>> switch to bounce buffers automatically in cases where the page DMA
>> address would be out of bounds.
>>
>> The other tweak we could look at would be to check
>> dma_get_required_mask and add a warning and/or fail page_pool
>> creation on systems where the page pool would not be able to handle
>> that mask when ANDed with the device DMA mask.

Looking at all the arches that use CONFIG_PHYS_ADDR_T_64BIT:
https://elixir.free-electrons.com/linux/v6.4-rc6/K/ident/CONFIG_PHYS_ADDR_T_64BIT

arm: Large Physical Address Extension or LPAE, 40 bits of phys addr.
arc: Physical Address Extension or PAE, 40 bits of phys addr.
mips: eXtended Physical Addressing or XPA, 40 bits of phys addr.
powerpc: does not seem to have a name for the feature, and has 36
         bits of phys addr.
riscv: large physical address, 34 bits of phys addr.
x86: Physical Address Extension or PAE, 36 bits of phys addr.

It does seem that we are worrying too much: with 4K pages, 32 + PAGE_SHIFT
gives 44 bits, which covers even the 40-bit cases above. So I am going to
follow Jakub's suggestion, and print a warning in case the assumption turns
out to be wrong.

>>
>> With those two changes the setup should be rock solid in terms of any
>> risks of the DMA address being out of bounds, and with minimal
>> performance impact as we would have verified all possibilities before
>> we even get into the hot path.
> 
> Sounds like a plan!

^ permalink raw reply	[flat|nested] 32+ messages in thread

end of thread, other threads:[~2023-08-29 11:58 UTC | newest]

Thread overview: 32+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-08-16 10:01 [PATCH net-next v7 0/6] introduce page_pool_alloc() related API Yunsheng Lin
2023-08-16 10:01 ` [PATCH net-next v7 1/6] page_pool: frag API support for 32-bit arch with 64-bit DMA Yunsheng Lin
2023-08-17 13:57   ` Ilias Apalodimas
2023-08-17 16:15     ` Jakub Kicinski
2023-08-17 16:59       ` Ilias Apalodimas
2023-08-17 23:57         ` Jakub Kicinski
2023-08-18  6:12           ` Ilias Apalodimas
2023-08-18  8:59             ` Yunsheng Lin
2023-08-18 21:51             ` Jakub Kicinski
2023-08-21  8:38               ` Ilias Apalodimas
2023-08-21 11:15                 ` Ilias Apalodimas
2023-08-21 12:18               ` Yunsheng Lin
2023-08-21 18:35                 ` Jakub Kicinski
2023-08-22  9:21                   ` Yunsheng Lin
2023-08-22 15:38                     ` Jakub Kicinski
2023-08-22 18:30                       ` Alexander Duyck
2023-08-22 18:58                         ` Alexander Duyck
2023-08-23  3:03                       ` Yunsheng Lin
2023-08-23 14:25                         ` Jakub Kicinski
2023-08-23 18:00                           ` Alexander Duyck
2023-08-25  9:40                             ` Yunsheng Lin
2023-08-26  0:08                               ` Jakub Kicinski
2023-08-28 14:50                                 ` Alexander Duyck
2023-08-28 15:38                                   ` Jakub Kicinski
2023-08-29 11:58                                     ` Yunsheng Lin
2023-08-16 10:01 ` [PATCH net-next v7 2/6] page_pool: unify frag_count handling in page_pool_is_last_frag() Yunsheng Lin
2023-08-16 11:30   ` Ilias Apalodimas
2023-08-16 12:42     ` Yunsheng Lin
2023-08-16 10:01 ` [PATCH net-next v7 3/6] page_pool: remove PP_FLAG_PAGE_FRAG Yunsheng Lin
2023-08-16 10:01 ` [PATCH net-next v7 4/6] page_pool: introduce page_pool[_cache]_alloc() API Yunsheng Lin
2023-08-16 10:01 ` [PATCH net-next v7 5/6] page_pool: update document about frag API Yunsheng Lin
2023-08-16 10:01 ` [PATCH net-next v7 6/6] net: veth: use newly added page pool API for veth with xdp Yunsheng Lin

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).