* [PATCH RFC net-next 0/4] net: page_pool: a couple assorted optimizations
@ 2023-06-29 15:23 Alexander Lobakin
  2023-06-29 15:23 ` [PATCH RFC net-next 1/4] net: skbuff: don't include <net/page_pool.h> to <linux/skbuff.h> Alexander Lobakin
                   ` (4 more replies)
  0 siblings, 5 replies; 21+ messages in thread
From: Alexander Lobakin @ 2023-06-29 15:23 UTC (permalink / raw)
  To: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni
  Cc: Alexander Lobakin, Maciej Fijalkowski, Larysa Zaremba,
	Yunsheng Lin, Alexander Duyck, Jesper Dangaard Brouer,
	Ilias Apalodimas, netdev, linux-kernel

Here's a spin-off of the IAVF PP series[0], with 2 runtime (hotpath) and 1
compile-time optimizations. They're based and tested on top of the
hybrid PP allocation series[1], but don't require it to work and are
in general independent of it and of each other.

Per-patch breakdown:
 #1: was already on the lists, but this time it's done the other way, the
     one that Alex Duyck proposed during the review of the previous series.
     Slightly reduce the amount of C preprocessing by no longer including
     <net/page_pool.h> in <linux/skbuff.h> (which is included in half of
     the kernel sources). Especially useful with the abovementioned series
     applied, as it makes page_pool.h heavier;
 #2: don't call DMA sync externals when they won't do anything anyway, by
     applying some heuristics a bit earlier (when allocating a new page);
     this one was also already on the lists;
 #3: new, prereq for #4. Add a NAPI state flag which indicates
     napi->poll() is running right now, so that napi->list_owner points
     to the CPU where it's actually being run, not just scheduled;
 #4: new. In addition to recycling skb PP pages directly when @napi_safe
     is set, check for the flag from #3, which means the same thing when
     ->list_owner points to us (see the rough sketch below). This allows
     using direct recycling any time we're inside a NAPI polling loop or
     the GRO processing running right after it, covering way more cases
     than today.
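
In short, #3 + #4 boil down to roughly this condition in the skb page
recycling path (a simplified sketch; the actual patches below are the
authoritative version):

	/* Direct recycling is safe if we're on the CPU owning this NAPI
	 * and its poll loop (or the GRO flush right after it) is running.
	 */
	allow_direct = napi &&
		(napi_safe || test_bit(NAPI_STATE_RUNNING, &napi->state)) &&
		READ_ONCE(napi->list_owner) == smp_processor_id();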

(complete tree with [1] + this + [0] is available here: [2])

[0] https://lore.kernel.org/netdev/20230530150035.1943669-1-aleksander.lobakin@intel.com
[1] https://lore.kernel.org/netdev/20230629120226.14854-1-linyunsheng@huawei.com
[2] https://github.com/alobakin/linux/commits/iavf-pp-frag

Alexander Lobakin (4):
  net: skbuff: don't include <net/page_pool.h> to <linux/skbuff.h>
  net: page_pool: avoid calling no-op externals when possible
  net: add flag to indicate NAPI/GRO is running right now
  net: skbuff: always recycle PP pages directly when inside a NAPI loop

 drivers/net/ethernet/engleder/tsnep_main.c    |  1 +
 drivers/net/ethernet/freescale/fec_main.c     |  1 +
 .../marvell/octeontx2/nic/otx2_common.c       |  1 +
 .../ethernet/marvell/octeontx2/nic/otx2_pf.c  |  1 +
 .../ethernet/mellanox/mlx5/core/en/params.c   |  1 +
 .../net/ethernet/mellanox/mlx5/core/en/xdp.c  |  1 +
 drivers/net/wireless/mediatek/mt76/mt76.h     |  1 +
 include/linux/netdevice.h                     |  2 +
 include/linux/skbuff.h                        |  3 +-
 include/net/page_pool.h                       |  5 +-
 net/core/dev.c                                | 23 +++++--
 net/core/page_pool.c                          | 62 +++++++------------
 net/core/skbuff.c                             | 29 +++++++++
 13 files changed, 83 insertions(+), 48 deletions(-)

---
Really curious about #3. Implementing the idea correctly (this way or
another) potentially unblocks a lot more interesting stuff (besides #4).
-- 
2.41.0


^ permalink raw reply	[flat|nested] 21+ messages in thread

* [PATCH RFC net-next 1/4] net: skbuff: don't include <net/page_pool.h> to <linux/skbuff.h>
  2023-06-29 15:23 [PATCH RFC net-next 0/4] net: page_pool: a couple assorted optimizations Alexander Lobakin
@ 2023-06-29 15:23 ` Alexander Lobakin
  2023-06-29 16:55   ` Alexander H Duyck
  2023-06-29 15:23 ` [PATCH RFC net-next 2/4] net: page_pool: avoid calling no-op externals when possible Alexander Lobakin
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 21+ messages in thread
From: Alexander Lobakin @ 2023-06-29 15:23 UTC (permalink / raw)
  To: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni
  Cc: Alexander Lobakin, Maciej Fijalkowski, Larysa Zaremba,
	Yunsheng Lin, Alexander Duyck, Jesper Dangaard Brouer,
	Ilias Apalodimas, netdev, linux-kernel

Currently, touching <net/page_pool.h> triggers a rebuild of more than
half of the kernel. That's because it's included in <linux/skbuff.h>,
and each new include in page_pool.h adds more [useless] data for the
toolchain to process for each source file from that pile.

In commit 6a5bcd84e886 ("page_pool: Allow drivers to hint on SKB
recycling"), Matteo included it to be able to call a couple of functions
defined there. Then, in commit 57f05bc2ab24 ("page_pool: keep pp info
as long as page pool owns the page"), one of the calls was removed, so
only one is left. It's the call to page_pool_return_skb_page() in
napi_frag_unref(). The function is external and doesn't have any
dependencies. Having an include of the very niche page_pool.h only for
that looks like overkill.
As Alex noticed, the only thing that holds this function in page_pool.c
is %PP_SIGNATURE. By moving the check for the magic a couple of functions
up, the whole page_pool_return_skb_page() can be moved to skbuff.c.
The arguments for moving the check are the following:

1) It checks for a corner case that shouldn't ever happen when the code
   is sane. And execution speed doesn't matter on the exception path, thus
   doing more calls before bailing out doesn't make any difference.
2) There are 2 users of the internal __page_pool_put_page(), where this
   check is moved to: page_pool_put_defragged_page() and
   page_pool_put_page_bulk(). Both are exported and can be called from
   drivers, but previously only the skb recycling path of the former
   was protected with the magic check. So this also makes the code a bit
   more reliable.

After the check is moved, teleport page_pool_return_skb_page() to
skbuff.c, right next to its main consumer, skb_pp_recycle(). It's also
used in napi_frag_unref() -> {__,}skb_frag_unref(), so no `static`,
unfortunately. Maybe next time.
Now, after a few include fixes in the drivers, touching page_pool.h
only triggers rebuilding of the drivers using it and a couple of core
networking files.

Suggested-by: Jakub Kicinski <kuba@kernel.org> # make skbuff.h less heavy
Suggested-by: Alexander Duyck <alexanderduyck@fb.com> # move to skbuff.c
Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
---
 drivers/net/ethernet/engleder/tsnep_main.c    |  1 +
 drivers/net/ethernet/freescale/fec_main.c     |  1 +
 .../marvell/octeontx2/nic/otx2_common.c       |  1 +
 .../ethernet/marvell/octeontx2/nic/otx2_pf.c  |  1 +
 .../ethernet/mellanox/mlx5/core/en/params.c   |  1 +
 .../net/ethernet/mellanox/mlx5/core/en/xdp.c  |  1 +
 drivers/net/wireless/mediatek/mt76/mt76.h     |  1 +
 include/linux/skbuff.h                        |  3 +-
 include/net/page_pool.h                       |  2 -
 net/core/page_pool.c                          | 52 +++++--------------
 net/core/skbuff.c                             | 28 ++++++++++
 11 files changed, 50 insertions(+), 42 deletions(-)

diff --git a/drivers/net/ethernet/engleder/tsnep_main.c b/drivers/net/ethernet/engleder/tsnep_main.c
index 84751bb303a6..6222aaa5157f 100644
--- a/drivers/net/ethernet/engleder/tsnep_main.c
+++ b/drivers/net/ethernet/engleder/tsnep_main.c
@@ -28,6 +28,7 @@
 #include <linux/iopoll.h>
 #include <linux/bpf.h>
 #include <linux/bpf_trace.h>
+#include <net/page_pool.h>
 #include <net/xdp_sock_drv.h>
 
 #define TSNEP_RX_OFFSET (max(NET_SKB_PAD, XDP_PACKET_HEADROOM) + NET_IP_ALIGN)
diff --git a/drivers/net/ethernet/freescale/fec_main.c b/drivers/net/ethernet/freescale/fec_main.c
index 8fbe47703d47..99b3b4f79603 100644
--- a/drivers/net/ethernet/freescale/fec_main.c
+++ b/drivers/net/ethernet/freescale/fec_main.c
@@ -38,6 +38,7 @@
 #include <linux/in.h>
 #include <linux/ip.h>
 #include <net/ip.h>
+#include <net/page_pool.h>
 #include <net/selftests.h>
 #include <net/tso.h>
 #include <linux/tcp.h>
diff --git a/drivers/net/ethernet/marvell/octeontx2/nic/otx2_common.c b/drivers/net/ethernet/marvell/octeontx2/nic/otx2_common.c
index a5d03583bf79..d17a0ebc9036 100644
--- a/drivers/net/ethernet/marvell/octeontx2/nic/otx2_common.c
+++ b/drivers/net/ethernet/marvell/octeontx2/nic/otx2_common.c
@@ -7,6 +7,7 @@
 
 #include <linux/interrupt.h>
 #include <linux/pci.h>
+#include <net/page_pool.h>
 #include <net/tso.h>
 #include <linux/bitfield.h>
 
diff --git a/drivers/net/ethernet/marvell/octeontx2/nic/otx2_pf.c b/drivers/net/ethernet/marvell/octeontx2/nic/otx2_pf.c
index fe8ea4e531b7..7eca434a0550 100644
--- a/drivers/net/ethernet/marvell/octeontx2/nic/otx2_pf.c
+++ b/drivers/net/ethernet/marvell/octeontx2/nic/otx2_pf.c
@@ -16,6 +16,7 @@
 #include <linux/bpf.h>
 #include <linux/bpf_trace.h>
 #include <linux/bitfield.h>
+#include <net/page_pool.h>
 
 #include "otx2_reg.h"
 #include "otx2_common.h"
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/params.c b/drivers/net/ethernet/mellanox/mlx5/core/en/params.c
index 5ce28ff7685f..0f152f14165b 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/params.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/params.c
@@ -6,6 +6,7 @@
 #include "en/port.h"
 #include "en_accel/en_accel.h"
 #include "en_accel/ipsec.h"
+#include <net/page_pool.h>
 #include <net/xdp_sock_drv.h>
 
 static u8 mlx5e_mpwrq_min_page_shift(struct mlx5_core_dev *mdev)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c b/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c
index f0e6095809fa..1bd91bc09eb8 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c
@@ -35,6 +35,7 @@
 #include "en/xdp.h"
 #include "en/params.h"
 #include <linux/bitfield.h>
+#include <net/page_pool.h>
 
 int mlx5e_xdp_max_mtu(struct mlx5e_params *params, struct mlx5e_xsk_param *xsk)
 {
diff --git a/drivers/net/wireless/mediatek/mt76/mt76.h b/drivers/net/wireless/mediatek/mt76/mt76.h
index 6b07b8fafec2..95c16f11d156 100644
--- a/drivers/net/wireless/mediatek/mt76/mt76.h
+++ b/drivers/net/wireless/mediatek/mt76/mt76.h
@@ -15,6 +15,7 @@
 #include <linux/average.h>
 #include <linux/soc/mediatek/mtk_wed.h>
 #include <net/mac80211.h>
+#include <net/page_pool.h>
 #include "util.h"
 #include "testmode.h"
 
diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 91ed66952580..f76d172ed262 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -32,7 +32,6 @@
 #include <linux/if_packet.h>
 #include <linux/llist.h>
 #include <net/flow.h>
-#include <net/page_pool.h>
 #if IS_ENABLED(CONFIG_NF_CONNTRACK)
 #include <linux/netfilter/nf_conntrack_common.h>
 #endif
@@ -3423,6 +3422,8 @@ static inline void skb_frag_ref(struct sk_buff *skb, int f)
 	__skb_frag_ref(&skb_shinfo(skb)->frags[f]);
 }
 
+bool page_pool_return_skb_page(struct page *page, bool napi_safe);
+
 static inline void
 napi_frag_unref(skb_frag_t *frag, bool recycle, bool napi_safe)
 {
diff --git a/include/net/page_pool.h b/include/net/page_pool.h
index 2b7db9992fc0..829dc1f8ba6b 100644
--- a/include/net/page_pool.h
+++ b/include/net/page_pool.h
@@ -307,8 +307,6 @@ inline enum dma_data_direction page_pool_get_dma_dir(struct page_pool *pool)
 	return pool->p.dma_dir;
 }
 
-bool page_pool_return_skb_page(struct page *page, bool napi_safe);
-
 struct page_pool *page_pool_create(const struct page_pool_params *params);
 
 struct xdp_mem_info;
diff --git a/net/core/page_pool.c b/net/core/page_pool.c
index 985ccaffc06a..dff0b4fa2316 100644
--- a/net/core/page_pool.c
+++ b/net/core/page_pool.c
@@ -582,6 +582,19 @@ static __always_inline struct page *
 __page_pool_put_page(struct page_pool *pool, struct page *page,
 		     unsigned int dma_sync_size, bool allow_direct)
 {
+	/* Avoid recycling non-PP pages, give them back to the page allocator.
+	 * page->pp_magic is OR'ed with PP_SIGNATURE after the allocation
+	 * in order to preserve any existing bits, such as bit 0 for the
+	 * head page of compound page and bit 1 for pfmemalloc page, so
+	 * mask those bits for freeing side when doing below checking,
+	 * and page_is_pfmemalloc() is checked in __page_pool_put_page()
+	 * to avoid recycling the pfmemalloc page.
+	 */
+	if (unlikely((page->pp_magic & ~0x3UL) != PP_SIGNATURE)) {
+		put_page(page);
+		return NULL;
+	}
+
 	/* This allocator is optimized for the XDP mode that uses
 	 * one-frame-per-page, but have fallbacks that act like the
 	 * regular page allocator APIs.
@@ -913,42 +926,3 @@ void page_pool_update_nid(struct page_pool *pool, int new_nid)
 	}
 }
 EXPORT_SYMBOL(page_pool_update_nid);
-
-bool page_pool_return_skb_page(struct page *page, bool napi_safe)
-{
-	struct napi_struct *napi;
-	struct page_pool *pp;
-	bool allow_direct;
-
-	page = compound_head(page);
-
-	/* page->pp_magic is OR'ed with PP_SIGNATURE after the allocation
-	 * in order to preserve any existing bits, such as bit 0 for the
-	 * head page of compound page and bit 1 for pfmemalloc page, so
-	 * mask those bits for freeing side when doing below checking,
-	 * and page_is_pfmemalloc() is checked in __page_pool_put_page()
-	 * to avoid recycling the pfmemalloc page.
-	 */
-	if (unlikely((page->pp_magic & ~0x3UL) != PP_SIGNATURE))
-		return false;
-
-	pp = page->pp;
-
-	/* Allow direct recycle if we have reasons to believe that we are
-	 * in the same context as the consumer would run, so there's
-	 * no possible race.
-	 */
-	napi = READ_ONCE(pp->p.napi);
-	allow_direct = napi_safe && napi &&
-		READ_ONCE(napi->list_owner) == smp_processor_id();
-
-	/* Driver set this to memory recycling info. Reset it on recycle.
-	 * This will *not* work for NIC using a split-page memory model.
-	 * The page will be returned to the pool here regardless of the
-	 * 'flipped' fragment being in use or not.
-	 */
-	page_pool_put_full_page(pp, page, allow_direct);
-
-	return true;
-}
-EXPORT_SYMBOL(page_pool_return_skb_page);
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 7edabf17988a..4b7d00d5b5d7 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -879,6 +879,34 @@ static void skb_clone_fraglist(struct sk_buff *skb)
 		skb_get(list);
 }
 
+bool page_pool_return_skb_page(struct page *page, bool napi_safe)
+{
+	struct napi_struct *napi;
+	struct page_pool *pp;
+	bool allow_direct;
+
+	page = compound_head(page);
+	pp = page->pp;
+
+	/* Allow direct recycle if we have reasons to believe that we are
+	 * in the same context as the consumer would run, so there's
+	 * no possible race.
+	 */
+	napi = READ_ONCE(pp->p.napi);
+	allow_direct = napi_safe && napi &&
+		READ_ONCE(napi->list_owner) == smp_processor_id();
+
+	/* Driver set this to memory recycling info. Reset it on recycle.
+	 * This will *not* work for NIC using a split-page memory model.
+	 * The page will be returned to the pool here regardless of the
+	 * 'flipped' fragment being in use or not.
+	 */
+	page_pool_put_full_page(pp, page, allow_direct);
+
+	return true;
+}
+EXPORT_SYMBOL(page_pool_return_skb_page);
+
 static bool skb_pp_recycle(struct sk_buff *skb, void *data, bool napi_safe)
 {
 	if (!IS_ENABLED(CONFIG_PAGE_POOL) || !skb->pp_recycle)
-- 
2.41.0


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH RFC net-next 2/4] net: page_pool: avoid calling no-op externals when possible
  2023-06-29 15:23 [PATCH RFC net-next 0/4] net: page_pool: a couple assorted optimizations Alexander Lobakin
  2023-06-29 15:23 ` [PATCH RFC net-next 1/4] net: skbuff: don't include <net/page_pool.h> to <linux/skbuff.h> Alexander Lobakin
@ 2023-06-29 15:23 ` Alexander Lobakin
  2023-06-29 16:45   ` Alexander H Duyck
  2023-06-29 15:23 ` [PATCH RFC net-next 3/4] net: add flag to indicate NAPI/GRO is running right now Alexander Lobakin
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 21+ messages in thread
From: Alexander Lobakin @ 2023-06-29 15:23 UTC (permalink / raw)
  To: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni
  Cc: Alexander Lobakin, Maciej Fijalkowski, Larysa Zaremba,
	Yunsheng Lin, Alexander Duyck, Jesper Dangaard Brouer,
	Ilias Apalodimas, netdev, linux-kernel

It turned out that page_pool_put{,_full}_page() can burn quite a bunch of
cycles even on DMA-coherent platforms (like x86) with no active IOMMU or
swiotlb, just for the call ladder.
Indeed, it's

page_pool_put_page()
  page_pool_put_defragged_page()                  <- external
    __page_pool_put_page()
      page_pool_dma_sync_for_device()             <- non-inline
        dma_sync_single_range_for_device()
          dma_sync_single_for_device()            <- external
            dma_direct_sync_single_for_device()
              dev_is_dma_coherent()               <- exit

For the inline functions, there are no guarantees the compiler won't
uninline them (they're clearly not one-liners, and sometimes compilers
uninline even 2 + 2). The first external call is necessary, but the
remaining 2+ are done for nothing each time, plus a bunch of checks here
and there.
Since Page Pool mappings are long-term and for one "device + addr" pair
dma_need_sync() will always return the same value (basically, whether it
belongs to a swiotlb pool), addresses can be tested once right after
they're obtained and the result can be reused until the page is unmapped.
Define a new PP flag which means "do DMA syncs for device, but only
when needed", and turn it on by default when the driver asks to sync
pages. When a page is mapped, check whether it needs syncs, and if so,
switch that "sync when needed" back to "always do syncs" globally for
the whole pool (better safe than sorry). As long as a pool has no pages
requiring DMA syncs, this cuts off a good chunk of calls and checks.
On my x86_64, this gives from 2% to 5% performance benefit, with no
negative impact in cases when the IOMMU is on and the shortcut can't be
used.

Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
---
 include/net/page_pool.h |  3 +++
 net/core/page_pool.c    | 10 ++++++++++
 2 files changed, 13 insertions(+)

diff --git a/include/net/page_pool.h b/include/net/page_pool.h
index 829dc1f8ba6b..ff3772fab707 100644
--- a/include/net/page_pool.h
+++ b/include/net/page_pool.h
@@ -23,6 +23,9 @@
 					* Please note DMA-sync-for-CPU is still
 					* device driver responsibility
 					*/
+#define PP_FLAG_DMA_MAYBE_SYNC	BIT(2) /* Internal, should not be used in
+					* drivers
+					*/
 #define PP_FLAG_ALL		(PP_FLAG_DMA_MAP |\
 				 PP_FLAG_DMA_SYNC_DEV)
 
diff --git a/net/core/page_pool.c b/net/core/page_pool.c
index dff0b4fa2316..498e058140b3 100644
--- a/net/core/page_pool.c
+++ b/net/core/page_pool.c
@@ -197,6 +197,10 @@ static int page_pool_init(struct page_pool *pool,
 		/* pool->p.offset has to be set according to the address
 		 * offset used by the DMA engine to start copying rx data
 		 */
+
+		/* Try to avoid calling no-op syncs */
+		pool->p.flags |= PP_FLAG_DMA_MAYBE_SYNC;
+		pool->p.flags &= ~PP_FLAG_DMA_SYNC_DEV;
 	}
 
 #ifdef CONFIG_PAGE_POOL_STATS
@@ -341,6 +345,12 @@ static bool page_pool_dma_map(struct page_pool *pool, struct page *page)
 
 	page_pool_set_dma_addr(page, dma);
 
+	if ((pool->p.flags & PP_FLAG_DMA_MAYBE_SYNC) &&
+	    dma_need_sync(pool->p.dev, dma)) {
+		pool->p.flags |= PP_FLAG_DMA_SYNC_DEV;
+		pool->p.flags &= ~PP_FLAG_DMA_MAYBE_SYNC;
+	}
+
 	if (pool->p.flags & PP_FLAG_DMA_SYNC_DEV)
 		page_pool_dma_sync_for_device(pool, page, pool->p.max_len);
 
-- 
2.41.0


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH RFC net-next 3/4] net: add flag to indicate NAPI/GRO is running right now
  2023-06-29 15:23 [PATCH RFC net-next 0/4] net: page_pool: a couple assorted optimizations Alexander Lobakin
  2023-06-29 15:23 ` [PATCH RFC net-next 1/4] net: skbuff: don't include <net/page_pool.h> to <linux/skbuff.h> Alexander Lobakin
  2023-06-29 15:23 ` [PATCH RFC net-next 2/4] net: page_pool: avoid calling no-op externals when possible Alexander Lobakin
@ 2023-06-29 15:23 ` Alexander Lobakin
  2023-06-29 15:23 ` [PATCH RFC net-next 4/4] net: skbuff: always recycle PP pages directly when inside a NAPI loop Alexander Lobakin
  2023-07-02  0:01 ` [PATCH RFC net-next 0/4] net: page_pool: a couple assorted optimizations Jakub Kicinski
  4 siblings, 0 replies; 21+ messages in thread
From: Alexander Lobakin @ 2023-06-29 15:23 UTC (permalink / raw)
  To: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni
  Cc: Alexander Lobakin, Maciej Fijalkowski, Larysa Zaremba,
	Yunsheng Lin, Alexander Duyck, Jesper Dangaard Brouer,
	Ilias Apalodimas, netdev, linux-kernel

Currently, there's no easy way to check whether a NAPI polling cycle is
running and on which CPU, although this might come in very handy in
several cases.
Commit 8c48eea3adf3 ("page_pool: allow caching from safely localized
NAPI") added napi_struct::list_owner, BUT it's set when the NAPI is
*scheduled*. `->list_owner == smp_processor_id()` doesn't mean we're
inside the corresponding polling loop.
Introduce a new NAPI state flag, NAPI{,F}_STATE_RUNNING. Set it right
before calling ->poll() and clear it after all napi_gro_flush() and
gro_normal_list() are done. They run in the same context and, in
fact, are part of the receive routine.
When `packets == budget`, napi_complete_done() is not called, so in
that case it's safe to clear the flag after ->poll() ends. Otherwise,
however, napi_complete_done() can lead to interrupts being re-enabled
and the NAPI already being scheduled on another CPU. In that case,
clear the flag in napi_complete_done() itself.
The main use case for the flag is as follows:

	if (test_bit(NAPI_STATE_RUNNING, &n->state) &&
	    READ_ONCE(n->list_owner) == smp_processor_id())
		/* you're inside n->poll() or the following GRO
		 * processing context
		 */

IOW, when the condition is true, feel free to use resources protected
by this NAPI, such as page_pools covered by it, per-CPU NAPI caches, etc.
Just make sure interrupts are enabled.

Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
---
 include/linux/netdevice.h |  2 ++
 net/core/dev.c            | 23 +++++++++++++++++------
 2 files changed, 19 insertions(+), 6 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index b828c7a75be2..db3aea863ea9 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -389,6 +389,7 @@ enum {
 	NAPI_STATE_PREFER_BUSY_POLL,	/* prefer busy-polling over softirq processing*/
 	NAPI_STATE_THREADED,		/* The poll is performed inside its own thread*/
 	NAPI_STATE_SCHED_THREADED,	/* Napi is currently scheduled in threaded mode */
+	NAPI_STATE_RUNNING,		/* This NAPI or GRO is running on ::list_owner */
 };
 
 enum {
@@ -402,6 +403,7 @@ enum {
 	NAPIF_STATE_PREFER_BUSY_POLL	= BIT(NAPI_STATE_PREFER_BUSY_POLL),
 	NAPIF_STATE_THREADED		= BIT(NAPI_STATE_THREADED),
 	NAPIF_STATE_SCHED_THREADED	= BIT(NAPI_STATE_SCHED_THREADED),
+	NAPIF_STATE_RUNNING		= BIT(NAPI_STATE_RUNNING),
 };
 
 enum gro_result {
diff --git a/net/core/dev.c b/net/core/dev.c
index 69a3e544676c..7f0d23c9e25e 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -6039,7 +6039,8 @@ bool napi_complete_done(struct napi_struct *n, int work_done)
 
 		new = val & ~(NAPIF_STATE_MISSED | NAPIF_STATE_SCHED |
 			      NAPIF_STATE_SCHED_THREADED |
-			      NAPIF_STATE_PREFER_BUSY_POLL);
+			      NAPIF_STATE_PREFER_BUSY_POLL |
+			      NAPIF_STATE_RUNNING);
 
 		/* If STATE_MISSED was set, leave STATE_SCHED set,
 		 * because we will call napi->poll() one more time.
@@ -6128,6 +6129,7 @@ static void busy_poll_stop(struct napi_struct *napi, void *have_poll_lock, bool
 	/* All we really want here is to re-enable device interrupts.
 	 * Ideally, a new ndo_busy_poll_stop() could avoid another round.
 	 */
+	set_bit(NAPI_STATE_RUNNING, &napi->state);
 	rc = napi->poll(napi, budget);
 	/* We can't gro_normal_list() here, because napi->poll() might have
 	 * rearmed the napi (napi_complete_done()) in which case it could
@@ -6135,8 +6137,10 @@ static void busy_poll_stop(struct napi_struct *napi, void *have_poll_lock, bool
 	 */
 	trace_napi_poll(napi, rc, budget);
 	netpoll_poll_unlock(have_poll_lock);
-	if (rc == budget)
+	if (rc == budget) {
 		__busy_poll_stop(napi, skip_schedule);
+		clear_bit(NAPI_STATE_RUNNING, &napi->state);
+	}
 	local_bh_enable();
 }
 
@@ -6186,9 +6190,11 @@ void napi_busy_loop(unsigned int napi_id,
 			have_poll_lock = netpoll_poll_lock(napi);
 			napi_poll = napi->poll;
 		}
+		set_bit(NAPI_STATE_RUNNING, &napi->state);
 		work = napi_poll(napi, budget);
 		trace_napi_poll(napi, work, budget);
 		gro_normal_list(napi);
+		clear_bit(NAPI_STATE_RUNNING, &napi->state);
 count:
 		if (work > 0)
 			__NET_ADD_STATS(dev_net(napi->dev),
@@ -6457,6 +6463,7 @@ static int __napi_poll(struct napi_struct *n, bool *repoll)
 	 */
 	work = 0;
 	if (test_bit(NAPI_STATE_SCHED, &n->state)) {
+		set_bit(NAPI_STATE_RUNNING, &n->state);
 		work = n->poll(n, weight);
 		trace_napi_poll(n, work, weight);
 	}
@@ -6466,7 +6473,7 @@ static int __napi_poll(struct napi_struct *n, bool *repoll)
 				n->poll, work, weight);
 
 	if (likely(work < weight))
-		return work;
+		goto out;
 
 	/* Drivers must not modify the NAPI state if they
 	 * consume the entire weight.  In such cases this code
@@ -6475,7 +6482,7 @@ static int __napi_poll(struct napi_struct *n, bool *repoll)
 	 */
 	if (unlikely(napi_disable_pending(n))) {
 		napi_complete(n);
-		return work;
+		goto out;
 	}
 
 	/* The NAPI context has more processing work, but busy-polling
@@ -6488,7 +6495,7 @@ static int __napi_poll(struct napi_struct *n, bool *repoll)
 			 */
 			napi_schedule(n);
 		}
-		return work;
+		goto out;
 	}
 
 	if (n->gro_bitmask) {
@@ -6506,11 +6513,15 @@ static int __napi_poll(struct napi_struct *n, bool *repoll)
 	if (unlikely(!list_empty(&n->poll_list))) {
 		pr_warn_once("%s: Budget exhausted after napi rescheduled\n",
 			     n->dev ? n->dev->name : "backlog");
-		return work;
+		goto out;
 	}
 
 	*repoll = true;
 
+out:
+	if (READ_ONCE(n->list_owner) == smp_processor_id())
+		clear_bit(NAPI_STATE_RUNNING, &n->state);
+
 	return work;
 }
 
-- 
2.41.0


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [PATCH RFC net-next 4/4] net: skbuff: always recycle PP pages directly when inside a NAPI loop
  2023-06-29 15:23 [PATCH RFC net-next 0/4] net: page_pool: a couple assorted optimizations Alexander Lobakin
                   ` (2 preceding siblings ...)
  2023-06-29 15:23 ` [PATCH RFC net-next 3/4] net: add flag to indicate NAPI/GRO is running right now Alexander Lobakin
@ 2023-06-29 15:23 ` Alexander Lobakin
  2023-07-02  0:01 ` [PATCH RFC net-next 0/4] net: page_pool: a couple assorted optimizations Jakub Kicinski
  4 siblings, 0 replies; 21+ messages in thread
From: Alexander Lobakin @ 2023-06-29 15:23 UTC (permalink / raw)
  To: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni
  Cc: Alexander Lobakin, Maciej Fijalkowski, Larysa Zaremba,
	Yunsheng Lin, Alexander Duyck, Jesper Dangaard Brouer,
	Ilias Apalodimas, netdev, linux-kernel

Commit 8c48eea3adf3 ("page_pool: allow caching from safely localized
NAPI") allowed direct recycling of skb pages to their PP for some cases,
but unfortunately missed a couple of other major ones.
For example, %XDP_DROP in skb mode. The netstack just calls kfree_skb(),
which unconditionally passes `false` as @napi_safe. Thus, all pages go
through the ptr_ring and locks, although most of the time we're actually
inside the NAPI poll this PP is linked with, so it would be perfectly
safe to recycle pages directly.
Let's address that. If @napi_safe is true, we're fine, don't change
anything for this path. But if it's false, test the newly introduced
%NAPI_STATE_RUNNING. There's a good probability it will be set and, if
->list_owner is our current CPU, we're good to use direct recycling,
even though @napi_safe is false.
For the mentioned xdp-drop-skb-mode case, the improvement I got is
3-4% in Mpps. As for page_pool stats, recycle_ring is now 0 and the
alloc_slow counter doesn't change most of the time, which means the
MM layer is not even called to allocate any new pages.

Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
---
 net/core/skbuff.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 4b7d00d5b5d7..931c83d7b251 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -893,7 +893,8 @@ bool page_pool_return_skb_page(struct page *page, bool napi_safe)
 	 * no possible race.
 	 */
 	napi = READ_ONCE(pp->p.napi);
-	allow_direct = napi_safe && napi &&
+	allow_direct = napi &&
+		(napi_safe || test_bit(NAPI_STATE_RUNNING, &napi->state)) &&
 		READ_ONCE(napi->list_owner) == smp_processor_id();
 
 	/* Driver set this to memory recycling info. Reset it on recycle.
-- 
2.41.0


^ permalink raw reply related	[flat|nested] 21+ messages in thread

* Re: [PATCH RFC net-next 2/4] net: page_pool: avoid calling no-op externals when possible
  2023-06-29 15:23 ` [PATCH RFC net-next 2/4] net: page_pool: avoid calling no-op externals when possible Alexander Lobakin
@ 2023-06-29 16:45   ` Alexander H Duyck
  2023-06-30 12:29     ` Alexander Lobakin
  0 siblings, 1 reply; 21+ messages in thread
From: Alexander H Duyck @ 2023-06-29 16:45 UTC (permalink / raw)
  To: Alexander Lobakin, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni
  Cc: Maciej Fijalkowski, Larysa Zaremba, Yunsheng Lin,
	Alexander Duyck, Jesper Dangaard Brouer, Ilias Apalodimas,
	netdev, linux-kernel

On Thu, 2023-06-29 at 17:23 +0200, Alexander Lobakin wrote:
> Turned out page_pool_put{,_full}_page() can burn quite a bunch of cycles
> even when on DMA-coherent platforms (like x86) with no active IOMMU or
> swiotlb, just for the call ladder.
> Indeed, it's
> 
> page_pool_put_page()
>   page_pool_put_defragged_page()                  <- external
>     __page_pool_put_page()
>       page_pool_dma_sync_for_device()             <- non-inline
>         dma_sync_single_range_for_device()
>           dma_sync_single_for_device()            <- external
>             dma_direct_sync_single_for_device()
>               dev_is_dma_coherent()               <- exit
> 
> For the inline functions, no guarantees the compiler won't uninline them
> (they're clearly not one-liners and sometimes compilers uninline even
> 2 + 2). The first external call is necessary, but the rest 2+ are done
> for nothing each time, plus a bunch of checks here and there.
> Since Page Pool mappings are long-term and for one "device + addr" pair
> dma_need_sync() will always return the same value (basically, whether it
> belongs to an swiotlb pool), addresses can be tested once right after
> they're obtained and the result can be reused until the page is unmapped.
> Define new PP flag, which will mean "do DMA syncs for device, but only
> when needed" and turn it on by default when the driver asks to sync
> pages. When a page is mapped, check whether it needs syncs and if so,
> replace that "sync when needed" back to "always do syncs" globally for
> the whole pool (better safe than sorry). As long as a pool has no pages
> requiring DMA syncs, this cuts off a good piece of calls and checks.
> On my x86_64, this gives from 2% to 5% performance benefit with no
> negative impact for cases when IOMMU is on and the shortcut can't be
> used.
> 
> Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
> ---
>  include/net/page_pool.h |  3 +++
>  net/core/page_pool.c    | 10 ++++++++++
>  2 files changed, 13 insertions(+)
> 
> diff --git a/include/net/page_pool.h b/include/net/page_pool.h
> index 829dc1f8ba6b..ff3772fab707 100644
> --- a/include/net/page_pool.h
> +++ b/include/net/page_pool.h
> @@ -23,6 +23,9 @@
>  					* Please note DMA-sync-for-CPU is still
>  					* device driver responsibility
>  					*/
> +#define PP_FLAG_DMA_MAYBE_SYNC	BIT(2) /* Internal, should not be used in
> +					* drivers
> +					*/
>  #define PP_FLAG_ALL		(PP_FLAG_DMA_MAP |\
>  				 PP_FLAG_DMA_SYNC_DEV)
>  
> diff --git a/net/core/page_pool.c b/net/core/page_pool.c
> index dff0b4fa2316..498e058140b3 100644
> --- a/net/core/page_pool.c
> +++ b/net/core/page_pool.c
> @@ -197,6 +197,10 @@ static int page_pool_init(struct page_pool *pool,
>  		/* pool->p.offset has to be set according to the address
>  		 * offset used by the DMA engine to start copying rx data
>  		 */
> +
> +		/* Try to avoid calling no-op syncs */
> +		pool->p.flags |= PP_FLAG_DMA_MAYBE_SYNC;
> +		pool->p.flags &= ~PP_FLAG_DMA_SYNC_DEV;
>  	}
>  
>  #ifdef CONFIG_PAGE_POOL_STATS
> @@ -341,6 +345,12 @@ static bool page_pool_dma_map(struct page_pool *pool, struct page *page)
>  
>  	page_pool_set_dma_addr(page, dma);
>  
> +	if ((pool->p.flags & PP_FLAG_DMA_MAYBE_SYNC) &&
> +	    dma_need_sync(pool->p.dev, dma)) {
> +		pool->p.flags |= PP_FLAG_DMA_SYNC_DEV;
> +		pool->p.flags &= ~PP_FLAG_DMA_MAYBE_SYNC;
> +	}
> +
>  	if (pool->p.flags & PP_FLAG_DMA_SYNC_DEV)
>  		page_pool_dma_sync_for_device(pool, page, pool->p.max_len);
>  

I am pretty sure the logic is flawed here. The problem is that
dma_need_sync() depends on the DMA address being used. In the worst-case
scenario we could have a device that has something like a 32b DMA
address space on a system with over 4GB of memory. In such a case the
higher addresses would need to be synced because they will go off to a
swiotlb bounce buffer, while the lower addresses wouldn't.

If you were to store a flag like this, it would have to be generated per
page.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH RFC net-next 1/4] net: skbuff: don't include <net/page_pool.h> to <linux/skbuff.h>
  2023-06-29 15:23 ` [PATCH RFC net-next 1/4] net: skbuff: don't include <net/page_pool.h> to <linux/skbuff.h> Alexander Lobakin
@ 2023-06-29 16:55   ` Alexander H Duyck
  2023-06-30 12:39     ` Alexander Lobakin
  0 siblings, 1 reply; 21+ messages in thread
From: Alexander H Duyck @ 2023-06-29 16:55 UTC (permalink / raw)
  To: Alexander Lobakin, David S. Miller, Eric Dumazet, Jakub Kicinski,
	Paolo Abeni
  Cc: Maciej Fijalkowski, Larysa Zaremba, Yunsheng Lin,
	Alexander Duyck, Jesper Dangaard Brouer, Ilias Apalodimas,
	netdev, linux-kernel

On Thu, 2023-06-29 at 17:23 +0200, Alexander Lobakin wrote:
> Currently, touching <net/page_pool.h> triggers a rebuild of more than
> a half of the kernel. That's because it's included in <linux/skbuff.h>.
> And each new include to page_pool.h adds more [useless] data for the
> toolchain to process per each source file from that pile.
> 
> In commit 6a5bcd84e886 ("page_pool: Allow drivers to hint on SKB
> recycling"), Matteo included it to be able to call a couple functions
> defined there. Then, in commit 57f05bc2ab24 ("page_pool: keep pp info
> as long as page pool owns the page") one of the calls was removed, so
> only one left. It's the call to page_pool_return_skb_page() in
> napi_frag_unref(). The function is external and doesn't have any
> dependencies. Having include of very niche page_pool.h only for that
> looks like an overkill.
> As Alex noticed, the only thing that holds this function in page_pool.c
> is %PP_SIGNATURE. By moving the check for magic a couple functions up,
> the whole page_pool_return_skb_page() can be moved to skbuff.c.
> The arguments for moving the check are the following:
> 
> 1) It checks for a corner case that shouldn't ever happen when the code
>    is sane. And execution speed doesn't matter on exception path, thus
>    doing more calls before bailing out doesn't make any weather.
> 2) There are 2 users of the internal __page_pool_put_page(), where this
>    check is moved: page_pool_put_defragged_page() and
>    page_pool_put_page_bulk(). Both are exported and can be called from
>    the drivers, but previously only the skb recycling path of the former
>    was protected with the magic check. So this also makes the code a bit
>    more reliable.
> 
> After the check is moved, teleport page_pool_return_skb_page() to
> skbuff.c, just next to the main consumer, skb_pp_recycle(). It's used
> also in napi_frag_unref() -> {__,}skb_frag_unref(), so no `static`
> unfortunately. Maybe next time.
> Now, after a few include fixes in the drivers, touching page_pool.h
> only triggers rebuilding of the drivers using it and a couple core
> networking files.
> 
> Suggested-by: Jakub Kicinski <kuba@kernel.org> # make skbuff.h less heavy
> Suggested-by: Alexander Duyck <alexanderduyck@fb.com> # move to skbuff.c
> Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com>
> ---
>  drivers/net/ethernet/engleder/tsnep_main.c    |  1 +
>  drivers/net/ethernet/freescale/fec_main.c     |  1 +
>  .../marvell/octeontx2/nic/otx2_common.c       |  1 +
>  .../ethernet/marvell/octeontx2/nic/otx2_pf.c  |  1 +
>  .../ethernet/mellanox/mlx5/core/en/params.c   |  1 +
>  .../net/ethernet/mellanox/mlx5/core/en/xdp.c  |  1 +
>  drivers/net/wireless/mediatek/mt76/mt76.h     |  1 +
>  include/linux/skbuff.h                        |  3 +-
>  include/net/page_pool.h                       |  2 -
>  net/core/page_pool.c                          | 52 +++++--------------
>  net/core/skbuff.c                             | 28 ++++++++++
>  11 files changed, 50 insertions(+), 42 deletions(-)
> 
> diff --git a/drivers/net/ethernet/engleder/tsnep_main.c b/drivers/net/ethernet/engleder/tsnep_main.c
> index 84751bb303a6..6222aaa5157f 100644
> --- a/drivers/net/ethernet/engleder/tsnep_main.c
> +++ b/drivers/net/ethernet/engleder/tsnep_main.c
> @@ -28,6 +28,7 @@
>  #include <linux/iopoll.h>
>  #include <linux/bpf.h>
>  #include <linux/bpf_trace.h>
> +#include <net/page_pool.h>
>  #include <net/xdp_sock_drv.h>
>  
>  #define TSNEP_RX_OFFSET (max(NET_SKB_PAD, XDP_PACKET_HEADROOM) + NET_IP_ALIGN)
> diff --git a/drivers/net/ethernet/freescale/fec_main.c b/drivers/net/ethernet/freescale/fec_main.c
> index 8fbe47703d47..99b3b4f79603 100644
> --- a/drivers/net/ethernet/freescale/fec_main.c
> +++ b/drivers/net/ethernet/freescale/fec_main.c
> @@ -38,6 +38,7 @@
>  #include <linux/in.h>
>  #include <linux/ip.h>
>  #include <net/ip.h>
> +#include <net/page_pool.h>
>  #include <net/selftests.h>
>  #include <net/tso.h>
>  #include <linux/tcp.h>
> diff --git a/drivers/net/ethernet/marvell/octeontx2/nic/otx2_common.c b/drivers/net/ethernet/marvell/octeontx2/nic/otx2_common.c
> index a5d03583bf79..d17a0ebc9036 100644
> --- a/drivers/net/ethernet/marvell/octeontx2/nic/otx2_common.c
> +++ b/drivers/net/ethernet/marvell/octeontx2/nic/otx2_common.c
> @@ -7,6 +7,7 @@
>  
>  #include <linux/interrupt.h>
>  #include <linux/pci.h>
> +#include <net/page_pool.h>
>  #include <net/tso.h>
>  #include <linux/bitfield.h>
>  
> diff --git a/drivers/net/ethernet/marvell/octeontx2/nic/otx2_pf.c b/drivers/net/ethernet/marvell/octeontx2/nic/otx2_pf.c
> index fe8ea4e531b7..7eca434a0550 100644
> --- a/drivers/net/ethernet/marvell/octeontx2/nic/otx2_pf.c
> +++ b/drivers/net/ethernet/marvell/octeontx2/nic/otx2_pf.c
> @@ -16,6 +16,7 @@
>  #include <linux/bpf.h>
>  #include <linux/bpf_trace.h>
>  #include <linux/bitfield.h>
> +#include <net/page_pool.h>
>  
>  #include "otx2_reg.h"
>  #include "otx2_common.h"
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/params.c b/drivers/net/ethernet/mellanox/mlx5/core/en/params.c
> index 5ce28ff7685f..0f152f14165b 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/en/params.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/en/params.c
> @@ -6,6 +6,7 @@
>  #include "en/port.h"
>  #include "en_accel/en_accel.h"
>  #include "en_accel/ipsec.h"
> +#include <net/page_pool.h>
>  #include <net/xdp_sock_drv.h>
>  
>  static u8 mlx5e_mpwrq_min_page_shift(struct mlx5_core_dev *mdev)
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c b/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c
> index f0e6095809fa..1bd91bc09eb8 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c
> @@ -35,6 +35,7 @@
>  #include "en/xdp.h"
>  #include "en/params.h"
>  #include <linux/bitfield.h>
> +#include <net/page_pool.h>
>  
>  int mlx5e_xdp_max_mtu(struct mlx5e_params *params, struct mlx5e_xsk_param *xsk)
>  {
> diff --git a/drivers/net/wireless/mediatek/mt76/mt76.h b/drivers/net/wireless/mediatek/mt76/mt76.h
> index 6b07b8fafec2..95c16f11d156 100644
> --- a/drivers/net/wireless/mediatek/mt76/mt76.h
> +++ b/drivers/net/wireless/mediatek/mt76/mt76.h
> @@ -15,6 +15,7 @@
>  #include <linux/average.h>
>  #include <linux/soc/mediatek/mtk_wed.h>
>  #include <net/mac80211.h>
> +#include <net/page_pool.h>
>  #include "util.h"
>  #include "testmode.h"
>  
> diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
> index 91ed66952580..f76d172ed262 100644
> --- a/include/linux/skbuff.h
> +++ b/include/linux/skbuff.h
> @@ -32,7 +32,6 @@
>  #include <linux/if_packet.h>
>  #include <linux/llist.h>
>  #include <net/flow.h>
> -#include <net/page_pool.h>
>  #if IS_ENABLED(CONFIG_NF_CONNTRACK)
>  #include <linux/netfilter/nf_conntrack_common.h>
>  #endif
> @@ -3423,6 +3422,8 @@ static inline void skb_frag_ref(struct sk_buff *skb, int f)
>  	__skb_frag_ref(&skb_shinfo(skb)->frags[f]);
>  }
>  
> +bool page_pool_return_skb_page(struct page *page, bool napi_safe);
> +
>  static inline void
>  napi_frag_unref(skb_frag_t *frag, bool recycle, bool napi_safe)
>  {
> diff --git a/include/net/page_pool.h b/include/net/page_pool.h
> index 2b7db9992fc0..829dc1f8ba6b 100644
> --- a/include/net/page_pool.h
> +++ b/include/net/page_pool.h
> @@ -307,8 +307,6 @@ inline enum dma_data_direction page_pool_get_dma_dir(struct page_pool *pool)
>  	return pool->p.dma_dir;
>  }
>  
> -bool page_pool_return_skb_page(struct page *page, bool napi_safe);
> -
>  struct page_pool *page_pool_create(const struct page_pool_params *params);
>  
>  struct xdp_mem_info;
> diff --git a/net/core/page_pool.c b/net/core/page_pool.c
> index 985ccaffc06a..dff0b4fa2316 100644
> --- a/net/core/page_pool.c
> +++ b/net/core/page_pool.c
> @@ -582,6 +582,19 @@ static __always_inline struct page *
>  __page_pool_put_page(struct page_pool *pool, struct page *page,
>  		     unsigned int dma_sync_size, bool allow_direct)
>  {
> +	/* Avoid recycling non-PP pages, give them back to the page allocator.
> +	 * page->pp_magic is OR'ed with PP_SIGNATURE after the allocation
> +	 * in order to preserve any existing bits, such as bit 0 for the
> +	 * head page of compound page and bit 1 for pfmemalloc page, so
> +	 * mask those bits for freeing side when doing below checking,
> +	 * and page_is_pfmemalloc() is checked in __page_pool_put_page()
> +	 * to avoid recycling the pfmemalloc page.
> +	 */
> +	if (unlikely((page->pp_magic & ~0x3UL) != PP_SIGNATURE)) {
> +		put_page(page);
> +		return NULL;
> +	}
> +

Rather than moving this block of code down into here, you may just want
to look at creating an inline function that would act as an accessor for
retrieving the page pool for pages with the signature, and return NULL
for those without.
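
Something along these lines, just a quick untested sketch (the name is
made up), could act as that accessor:

	static inline struct page_pool *page_pool_get_pp(struct page *page)
	{
		page = compound_head(page);

		/* pp_magic is OR'ed with PP_SIGNATURE at allocation time;
		 * bits 0-1 may carry compound-head/pfmemalloc info, so mask
		 * them off before comparing.
		 */
		if ((page->pp_magic & ~0x3UL) != PP_SIGNATURE)
			return NULL;

		return page->pp;
	}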

>  	/* This allocator is optimized for the XDP mode that uses
>  	 * one-frame-per-page, but have fallbacks that act like the
>  	 * regular page allocator APIs.
> @@ -913,42 +926,3 @@ void page_pool_update_nid(struct page_pool *pool, int new_nid)
>  	}
>  }
>  EXPORT_SYMBOL(page_pool_update_nid);
> -
> -bool page_pool_return_skb_page(struct page *page, bool napi_safe)
> -{
> -	struct napi_struct *napi;
> -	struct page_pool *pp;
> -	bool allow_direct;
> -
> -	page = compound_head(page);
> -
> -	/* page->pp_magic is OR'ed with PP_SIGNATURE after the allocation
> -	 * in order to preserve any existing bits, such as bit 0 for the
> -	 * head page of compound page and bit 1 for pfmemalloc page, so
> -	 * mask those bits for freeing side when doing below checking,
> -	 * and page_is_pfmemalloc() is checked in __page_pool_put_page()
> -	 * to avoid recycling the pfmemalloc page.
> -	 */
> -	if (unlikely((page->pp_magic & ~0x3UL) != PP_SIGNATURE))
> -		return false;
> -
> -	pp = page->pp;
> -
> -	/* Allow direct recycle if we have reasons to believe that we are
> -	 * in the same context as the consumer would run, so there's
> -	 * no possible race.
> -	 */
> -	napi = READ_ONCE(pp->p.napi);
> -	allow_direct = napi_safe && napi &&
> -		READ_ONCE(napi->list_owner) == smp_processor_id();
> -
> -	/* Driver set this to memory recycling info. Reset it on recycle.
> -	 * This will *not* work for NIC using a split-page memory model.
> -	 * The page will be returned to the pool here regardless of the
> -	 * 'flipped' fragment being in use or not.
> -	 */
> -	page_pool_put_full_page(pp, page, allow_direct);
> -
> -	return true;
> -}
> -EXPORT_SYMBOL(page_pool_return_skb_page);
> diff --git a/net/core/skbuff.c b/net/core/skbuff.c
> index 7edabf17988a..4b7d00d5b5d7 100644
> --- a/net/core/skbuff.c
> +++ b/net/core/skbuff.c
> @@ -879,6 +879,34 @@ static void skb_clone_fraglist(struct sk_buff *skb)
>  		skb_get(list);
>  }
>  
> +bool page_pool_return_skb_page(struct page *page, bool napi_safe)
> +{
> +	struct napi_struct *napi;
> +	struct page_pool *pp;
> +	bool allow_direct;
> +
> +	page = compound_head(page);
> +	pp = page->pp;

So this is just assuming that any page we pass through is a page pool
page. The problem is there may be some other pointer stored here that
could cause issues.

I would suggest creating an accessor as mentioned above to verify it is
a page pool page before you access page->pp.

> +
> +	/* Allow direct recycle if we have reasons to believe that we are
> +	 * in the same context as the consumer would run, so there's
> +	 * no possible race.
> +	 */
> +	napi = READ_ONCE(pp->p.napi);
> +	allow_direct = napi_safe && napi &&
> +		READ_ONCE(napi->list_owner) == smp_processor_id();
> +
> +	/* Driver set this to memory recycling info. Reset it on recycle.
> +	 * This will *not* work for NIC using a split-page memory model.
> +	 * The page will be returned to the pool here regardless of the
> +	 * 'flipped' fragment being in use or not.
> +	 */
> +	page_pool_put_full_page(pp, page, allow_direct);
> +
> +	return true;
> +}
> +EXPORT_SYMBOL(page_pool_return_skb_page);
> +
>  static bool skb_pp_recycle(struct sk_buff *skb, void *data, bool napi_safe)
>  {
>  	if (!IS_ENABLED(CONFIG_PAGE_POOL) || !skb->pp_recycle)


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH RFC net-next 2/4] net: page_pool: avoid calling no-op externals when possible
  2023-06-29 16:45   ` Alexander H Duyck
@ 2023-06-30 12:29     ` Alexander Lobakin
  2023-06-30 14:44       ` Alexander Duyck
  0 siblings, 1 reply; 21+ messages in thread
From: Alexander Lobakin @ 2023-06-30 12:29 UTC (permalink / raw)
  To: Alexander H Duyck
  Cc: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Maciej Fijalkowski, Larysa Zaremba, Yunsheng Lin,
	Alexander Duyck, Jesper Dangaard Brouer, Ilias Apalodimas,
	netdev, linux-kernel

From: Alexander H Duyck <alexander.duyck@gmail.com>
Date: Thu, 29 Jun 2023 09:45:26 -0700

> On Thu, 2023-06-29 at 17:23 +0200, Alexander Lobakin wrote:
>> Turned out page_pool_put{,_full}_page() can burn quite a bunch of cycles
>> even when on DMA-coherent platforms (like x86) with no active IOMMU or
>> swiotlb, just for the call ladder.
>> Indeed, it's

[...]

>> @@ -341,6 +345,12 @@ static bool page_pool_dma_map(struct page_pool *pool, struct page *page)
>>  
>>  	page_pool_set_dma_addr(page, dma);
>>  
>> +	if ((pool->p.flags & PP_FLAG_DMA_MAYBE_SYNC) &&
>> +	    dma_need_sync(pool->p.dev, dma)) {
>> +		pool->p.flags |= PP_FLAG_DMA_SYNC_DEV;
>> +		pool->p.flags &= ~PP_FLAG_DMA_MAYBE_SYNC;
>> +	}
>> +
>>  	if (pool->p.flags & PP_FLAG_DMA_SYNC_DEV)
>>  		page_pool_dma_sync_for_device(pool, page, pool->p.max_len);
>>  
> 
> I am pretty sure the logic is flawed here. The problem is
> dma_needs_sync depends on the DMA address being used. In the worst case
> scenario we could have a device that has something like a 32b DMA
> address space on a system with over 4GB of memory. In such a case the
> higher addresses would need to be synced because they will go off to a
> swiotlb bounce buffer while the lower addresses wouldn't.
> 
> If you were to store a flag like this it would have to be generated per
> page.

I know when DMA might need syncing :D That's the point of this shortcut:
if at least one page needs syncing, I disable the shortcut for the whole
pool. It's a "better safe than sorry" approach.
Using a per-page flag involves more changes and might hurt some
scenarios/setups. For example, non-coherent systems, where you always
need to do syncs. The idea was to give some improvement when possible,
otherwise just fall back to what we have today.

Thanks,
Olek

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH RFC net-next 1/4] net: skbuff: don't include <net/page_pool.h> to <linux/skbuff.h>
  2023-06-29 16:55   ` Alexander H Duyck
@ 2023-06-30 12:39     ` Alexander Lobakin
  2023-06-30 15:11       ` Alexander Duyck
  0 siblings, 1 reply; 21+ messages in thread
From: Alexander Lobakin @ 2023-06-30 12:39 UTC (permalink / raw)
  To: Alexander H Duyck
  Cc: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Maciej Fijalkowski, Larysa Zaremba, Yunsheng Lin,
	Alexander Duyck, Jesper Dangaard Brouer, Ilias Apalodimas,
	netdev, linux-kernel

From: Alexander H Duyck <alexander.duyck@gmail.com>
Date: Thu, 29 Jun 2023 09:55:15 -0700

> On Thu, 2023-06-29 at 17:23 +0200, Alexander Lobakin wrote:
>> Currently, touching <net/page_pool.h> triggers a rebuild of more than
>> a half of the kernel. That's because it's included in <linux/skbuff.h>.
>> And each new include to page_pool.h adds more [useless] data for the
>> toolchain to process per each source file from that pile.

[...]

>> +bool page_pool_return_skb_page(struct page *page, bool napi_safe)
>> +{
>> +	struct napi_struct *napi;
>> +	struct page_pool *pp;
>> +	bool allow_direct;
>> +
>> +	page = compound_head(page);
>> +	pp = page->pp;
> 
> So this is just assuming that any page we pass thru is a page pool
> page. The problem is there may be some other pointer stored here that
> could cause issues.

But that is exactly what you suggested in the previous revision's
thread... Hey! :D

"I suspect we could look at pulling parts of it out as well. The
pp_magic check should always be succeeding unless we have pages getting
routed the wrong way somewhere. So maybe we should look at pulling it
out and moving it to another part of the path such as
__page_pool_put_page() and making it a bit more visible to catch those
cases".

Anyway, since some drivers still mix PP pages with non-PP ones (mt76
IIRC, maybe more), I feel the check for magic is still relevant. I just
believed you and forgot about that T.T

> 
> I would suggest creating an accessor as mentioned above to verify it is
> a page pool page before you access page->pp.
> 
>> +
>> +	/* Allow direct recycle if we have reasons to believe that we are
[...]

Thanks,
Olek

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH RFC net-next 2/4] net: page_pool: avoid calling no-op externals when possible
  2023-06-30 12:29     ` Alexander Lobakin
@ 2023-06-30 14:44       ` Alexander Duyck
  2023-06-30 15:34         ` Alexander Lobakin
  0 siblings, 1 reply; 21+ messages in thread
From: Alexander Duyck @ 2023-06-30 14:44 UTC (permalink / raw)
  To: Alexander Lobakin
  Cc: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Maciej Fijalkowski, Larysa Zaremba, Yunsheng Lin,
	Alexander Duyck, Jesper Dangaard Brouer, Ilias Apalodimas,
	netdev, linux-kernel

On Fri, Jun 30, 2023 at 5:30 AM Alexander Lobakin
<aleksander.lobakin@intel.com> wrote:
>
> From: Alexander H Duyck <alexander.duyck@gmail.com>
> Date: Thu, 29 Jun 2023 09:45:26 -0700
>
> > On Thu, 2023-06-29 at 17:23 +0200, Alexander Lobakin wrote:
> >> Turned out page_pool_put{,_full}_page() can burn quite a bunch of cycles
> >> even when on DMA-coherent platforms (like x86) with no active IOMMU or
> >> swiotlb, just for the call ladder.
> >> Indeed, it's
>
> [...]
>
> >> @@ -341,6 +345,12 @@ static bool page_pool_dma_map(struct page_pool *pool, struct page *page)
> >>
> >>      page_pool_set_dma_addr(page, dma);
> >>
> >> +    if ((pool->p.flags & PP_FLAG_DMA_MAYBE_SYNC) &&
> >> +        dma_need_sync(pool->p.dev, dma)) {
> >> +            pool->p.flags |= PP_FLAG_DMA_SYNC_DEV;
> >> +            pool->p.flags &= ~PP_FLAG_DMA_MAYBE_SYNC;
> >> +    }
> >> +
> >>      if (pool->p.flags & PP_FLAG_DMA_SYNC_DEV)
> >>              page_pool_dma_sync_for_device(pool, page, pool->p.max_len);
> >>
> >
> > I am pretty sure the logic is flawed here. The problem is
> > dma_needs_sync depends on the DMA address being used. In the worst case
> > scenario we could have a device that has something like a 32b DMA
> > address space on a system with over 4GB of memory. In such a case the
> > higher addresses would need to be synced because they will go off to a
> > swiotlb bounce buffer while the lower addresses wouldn't.
> >
> > If you were to store a flag like this it would have to be generated per
> > page.
>
> I know when DMA might need syncing :D That's the point of this shortcut:
> if at least one page needs syncing, I disable it for the whole pool.
> It's a "better safe than sorry".
> Using a per-page flag involves more changes and might hurt some
> scenarios/setups. For example, non-coherent systems, where you always
> need to do syncs. The idea was to give some improvement when possible,
> otherwise just fallback to what we have today.

I am not a fan of having the page pool force the syncing either. Last
I knew, the PP_FLAG_DMA_SYNC_DEV was meant to be set by the
driver, not by the page pool API itself. The big reason for that is
that the driver in many cases will have to take care of the DMA sync
itself instead of letting the allocator take care of it.

Basically we are just trading off the dma_need_sync cost versus the
page_pool_dma_sync_for_device cost. If we think it is a win to call
dma_need_sync for every frame, then maybe we should look at folding it
into page_pool_dma_sync_for_device itself, since that is the only
consumer of it, or just fold it into the PP_FLAG_DMA_SYNC_DEV if
statement after the flag check, rather than adding yet another flag
that will likely always be true for most devices. Otherwise you are
just adding overhead for the non-exception case and for devices that
don't bother setting PP_FLAG_DMA_SYNC_DEV.
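
Roughly something like this, just an untested sketch assuming the
current shape of page_pool_dma_sync_for_device():

	static void page_pool_dma_sync_for_device(struct page_pool *pool,
						  struct page *page,
						  unsigned int dma_sync_size)
	{
		dma_addr_t dma_addr = page_pool_get_dma_addr(page);

		/* Skip the sync entirely for addresses that never need it
		 * (direct mapping, no swiotlb bouncing).
		 */
		if (!dma_need_sync(pool->p.dev, dma_addr))
			return;

		dma_sync_size = min(dma_sync_size, pool->p.max_len);
		dma_sync_single_range_for_device(pool->p.dev, dma_addr,
						 pool->p.offset, dma_sync_size,
						 pool->p.dma_dir);
	}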

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH RFC net-next 1/4] net: skbuff: don't include <net/page_pool.h> to <linux/skbuff.h>
  2023-06-30 12:39     ` Alexander Lobakin
@ 2023-06-30 15:11       ` Alexander Duyck
  2023-06-30 16:05         ` Alexander Lobakin
  0 siblings, 1 reply; 21+ messages in thread
From: Alexander Duyck @ 2023-06-30 15:11 UTC (permalink / raw)
  To: Alexander Lobakin
  Cc: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Maciej Fijalkowski, Larysa Zaremba, Yunsheng Lin,
	Alexander Duyck, Jesper Dangaard Brouer, Ilias Apalodimas,
	netdev, linux-kernel

On Fri, Jun 30, 2023 at 5:39 AM Alexander Lobakin
<aleksander.lobakin@intel.com> wrote:
>
> From: Alexander H Duyck <alexander.duyck@gmail.com>
> Date: Thu, 29 Jun 2023 09:55:15 -0700
>
> > On Thu, 2023-06-29 at 17:23 +0200, Alexander Lobakin wrote:
> >> Currently, touching <net/page_pool.h> triggers a rebuild of more than
> >> a half of the kernel. That's because it's included in <linux/skbuff.h>.
> >> And each new include to page_pool.h adds more [useless] data for the
> >> toolchain to process per each source file from that pile.
>
> [...]
>
> >> +bool page_pool_return_skb_page(struct page *page, bool napi_safe)
> >> +{
> >> +    struct napi_struct *napi;
> >> +    struct page_pool *pp;
> >> +    bool allow_direct;
> >> +
> >> +    page = compound_head(page);
> >> +    pp = page->pp;
> >
> > So this is just assuming that any page we pass thru is a page pool
> > page. The problem is there may be some other pointer stored here that
> > could cause issues.
>
> But that is exactly what you suggested in the previous revision's
> thread... Hey! :D
>
> "I suspect we could look at pulling parts of it out as well. The
> pp_magic check should always be succeeding unless we have pages getting
> routed the wrong way somewhere. So maybe we should look at pulling it
> out and moving it to another part of the path such as
> __page_pool_put_page() and making it a bit more visible to catch those
> cases".

Yeah, I have had a few off days recently, amazing what happens when
you are running on only a few hours of sleep.. :-/

Anyway, as I was saying it might make sense to wrap the whole thing up
as a page pool accessor that would return NULL if the MAGIC value
isn't present. Alternatively one other possibility would be to look at
creating an inline function here that could check to see if the
skb_frag is page pool and then just keep that in sk_buff.h since it
looks like it should only need to rely on the page struct and
PP_SIGNATURE which is a poison.h value. With that napi_frag_unref
could avoid an unnecessary trip into the page_pool_return_skb_page
function entirely if it isn't a page pool page and we could look at
dropping the return value from page_pool_return_skb_page entirely.
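
A minimal sketch of such an inline check, relying only on struct page and
the PP_SIGNATURE poison value (the helper name is made up):

static inline bool napi_frag_is_pp_page(const skb_frag_t *frag)
{
	struct page *page = compound_head(skb_frag_page(frag));

	/* pp_magic is OR'ed with PP_SIGNATURE at alloc time; mask the low
	 * bits used for compound/pfmemalloc pages before comparing.
	 */
	return (page->pp_magic & ~0x3UL) == PP_SIGNATURE;
}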

> Anyway, since some drivers still mix PP pages with non-PP ones (mt76
> IIRC, maybe more), I feel the check for magic is still relevant. I just
> believed you and forgot about that T.T

Yeah, sorry about that, my bad. I was a bit too focused on the main
drivers we use and not thinking outside the box enough.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH RFC net-next 2/4] net: page_pool: avoid calling no-op externals when possible
  2023-06-30 14:44       ` Alexander Duyck
@ 2023-06-30 15:34         ` Alexander Lobakin
  2023-06-30 18:28           ` Alexander Duyck
  2023-07-03 20:32           ` Jakub Kicinski
  0 siblings, 2 replies; 21+ messages in thread
From: Alexander Lobakin @ 2023-06-30 15:34 UTC (permalink / raw)
  To: Alexander Duyck
  Cc: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Maciej Fijalkowski, Larysa Zaremba, Yunsheng Lin,
	Alexander Duyck, Jesper Dangaard Brouer, Ilias Apalodimas,
	netdev, linux-kernel

From: Alexander Duyck <alexander.duyck@gmail.com>
Date: Fri, 30 Jun 2023 07:44:45 -0700

> On Fri, Jun 30, 2023 at 5:30 AM Alexander Lobakin
> <aleksander.lobakin@intel.com> wrote:
>>
>> From: Alexander H Duyck <alexander.duyck@gmail.com>
>> Date: Thu, 29 Jun 2023 09:45:26 -0700
>>
>>> On Thu, 2023-06-29 at 17:23 +0200, Alexander Lobakin wrote:
>>>> Turned out page_pool_put{,_full}_page() can burn quite a bunch of cycles
>>>> even when on DMA-coherent platforms (like x86) with no active IOMMU or
>>>> swiotlb, just for the call ladder.
>>>> Indeed, it's
>>
>> [...]
>>
>>>> @@ -341,6 +345,12 @@ static bool page_pool_dma_map(struct page_pool *pool, struct page *page)
>>>>
>>>>      page_pool_set_dma_addr(page, dma);
>>>>
>>>> +    if ((pool->p.flags & PP_FLAG_DMA_MAYBE_SYNC) &&
>>>> +        dma_need_sync(pool->p.dev, dma)) {
>>>> +            pool->p.flags |= PP_FLAG_DMA_SYNC_DEV;
>>>> +            pool->p.flags &= ~PP_FLAG_DMA_MAYBE_SYNC;
>>>> +    }
>>>> +
>>>>      if (pool->p.flags & PP_FLAG_DMA_SYNC_DEV)
>>>>              page_pool_dma_sync_for_device(pool, page, pool->p.max_len);
>>>>
>>>
>>> I am pretty sure the logic is flawed here. The problem is
>>> dma_needs_sync depends on the DMA address being used. In the worst case
>>> scenario we could have a device that has something like a 32b DMA
>>> address space on a system with over 4GB of memory. In such a case the
>>> higher addresses would need to be synced because they will go off to a
>>> swiotlb bounce buffer while the lower addresses wouldn't.
>>>
>>> If you were to store a flag like this it would have to be generated per
>>> page.
>>
>> I know when DMA might need syncing :D That's the point of this shortcut:
>> if at least one page needs syncing, I disable it for the whole pool.
>> It's a "better safe than sorry".
>> Using a per-page flag involves more changes and might hurt some
>> scenarios/setups. For example, non-coherent systems, where you always
>> need to do syncs. The idea was to give some improvement when possible,
>> otherwise just fallback to what we have today.
> 
> I am not a fan of having the page pool force the syncing either. Last
> I knew I thought the PP_FLAG_DMA_SYNC_DEV was meant to be set by the

Please follow the logic of the patch.

1. The driver sets DMA_SYNC_DEV.
2. PP tries to shortcut and replaces it with MAYBE_SYNC.
3. If dma_need_sync() returns true for some page, it gets replaced back
   to DMA_SYNC_DEV, no further dma_need_sync() calls for that pool.

OR

1. The driver doesn't set DMA_SYNC_DEV.
2. PP doesn't turn on MAYBE_SYNC.
3. No dma_need_sync() tests.

Where does PP force syncs for drivers which don't need them?

> driver, not by the page pool API itself. The big reason for that being
> that the driver in many cases will have to take care of the DMA sync
> itself instead of letting the allocator take care of it.
> 
> Basically we are just trading off the dma_need_sync cost versus the
> page_pool_dma_sync_for_device cost. If we think it is a win to call

dma_need_sync() is called once per newly allocated and mapped page.
page_pool_dma_sync_for_device() is called each time you ask for a page.

On my setup and with patch #4, I have literally 0 allocations once a
ring is filled. This means dma_need_sync() is not called at all during
Rx, while sync_for_device() would be called all the time.
When pages go through ptr_ring, sometimes new allocations happen, but
still the number of times dma_need_sync() is called is thousands of times
lower.

> dma_need_sync for every frame then maybe we should look at folding it
> into page_pool_dma_sync_for_device itself since that is the only
> consumer of it, or just fold it into the PP_FLAG_DMA_SYNC_DEV if
> statement after the flag check rather than adding yet another flag
> that will likely always be true for most devices. Otherwise you are

What you suggest is either calling dma_need_sync() each time a page is
requested or introducing a flag to store it somewhere in struct page to
allow some optimization for the really uncommon cases where dma_need_sync()
might return different values due to swiotlb etc. Did I get it right?

> just adding overhead for the non-exception case and devices that don't
> bother setting PP_FLAG_DMA_SYNC_DEV.

Thanks,
Olek

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH RFC net-next 1/4] net: skbuff: don't include <net/page_pool.h> to <linux/skbuff.h>
  2023-06-30 15:11       ` Alexander Duyck
@ 2023-06-30 16:05         ` Alexander Lobakin
  0 siblings, 0 replies; 21+ messages in thread
From: Alexander Lobakin @ 2023-06-30 16:05 UTC (permalink / raw)
  To: Alexander Duyck
  Cc: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Maciej Fijalkowski, Larysa Zaremba, Yunsheng Lin,
	Alexander Duyck, Jesper Dangaard Brouer, Ilias Apalodimas,
	netdev, linux-kernel

From: Alexander Duyck <alexander.duyck@gmail.com>
Date: Fri, 30 Jun 2023 08:11:02 -0700

> On Fri, Jun 30, 2023 at 5:39 AM Alexander Lobakin
> <aleksander.lobakin@intel.com> wrote:
>>
>> From: Alexander H Duyck <alexander.duyck@gmail.com>
>> Date: Thu, 29 Jun 2023 09:55:15 -0700
>>
>>> On Thu, 2023-06-29 at 17:23 +0200, Alexander Lobakin wrote:
>>>> Currently, touching <net/page_pool.h> triggers a rebuild of more than
>>>> a half of the kernel. That's because it's included in <linux/skbuff.h>.
>>>> And each new include to page_pool.h adds more [useless] data for the
>>>> toolchain to process per each source file from that pile.
>>
>> [...]
>>
>>>> +bool page_pool_return_skb_page(struct page *page, bool napi_safe)
>>>> +{
>>>> +    struct napi_struct *napi;
>>>> +    struct page_pool *pp;
>>>> +    bool allow_direct;
>>>> +
>>>> +    page = compound_head(page);
>>>> +    pp = page->pp;
>>>
>>> So this is just assuming that any page we pass thru is a page pool
>>> page. The problem is there may be some other pointer stored here that
>>> could cause issues.
>>
>> But that is exactly what you suggested in the previous revision's
>> thread... Hey! :D
>>
>> "I suspect we could look at pulling parts of it out as well. The
>> pp_magic check should always be succeeding unless we have pages getting
>> routed the wrong way somewhere. So maybe we should look at pulling it
>> out and moving it to another part of the path such as
>> __page_pool_put_page() and making it a bit more visible to catch those
>> cases".
> 
> Yeah, I have had a few off days recently, amazing what happens when
> you are running on only a few hours of sleep.. :-/

Aaah I see :D

> 
> Anyway, as I was saying it might make sense to wrap the whole thing up
> as a page pool accessor that would return NULL if the MAGIC value
> isn't present. Alternatively one other possibility would be to look at
> creating an inline function here that could check to see if the
> skb_frag is page pool and then just keep that in sk_buff.h since it
> looks like it should only need to rely on the page struct and
> PP_SIGNATURE which is a poison.h value. With that napi_frag_unref
> could avoid an unnecessary trip into the page_pool_return_skb_page
> function entirely if it isn't a page pool page and we could look at
> dropping the return value from page_pool_return_skb_page entirely.

I like this one. It's a rather simple check and shouldn't inflate the code
size, but I'll recheck with bloat-o-meter just in case.

> 
>> Anyway, since some drivers still mix PP pages with non-PP ones (mt76
>> IIRC, maybe more), I feel the check for magic is still relevant. I just
>> believed you and forgot about that T.T
> 
> Yeah, sorry about that, my bad. I was a bit too focused on the main
> drivers we use and not thinking outside the box enough.

Not a prob!

Thanks,
Olek

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH RFC net-next 2/4] net: page_pool: avoid calling no-op externals when possible
  2023-06-30 15:34         ` Alexander Lobakin
@ 2023-06-30 18:28           ` Alexander Duyck
  2023-07-03 13:38             ` Alexander Lobakin
  2023-07-03 20:32           ` Jakub Kicinski
  1 sibling, 1 reply; 21+ messages in thread
From: Alexander Duyck @ 2023-06-30 18:28 UTC (permalink / raw)
  To: Alexander Lobakin
  Cc: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Maciej Fijalkowski, Larysa Zaremba, Yunsheng Lin,
	Alexander Duyck, Jesper Dangaard Brouer, Ilias Apalodimas,
	netdev, linux-kernel

On Fri, Jun 30, 2023 at 8:34 AM Alexander Lobakin
<aleksander.lobakin@intel.com> wrote:
>
> From: Alexander Duyck <alexander.duyck@gmail.com>
> Date: Fri, 30 Jun 2023 07:44:45 -0700
>
> > On Fri, Jun 30, 2023 at 5:30 AM Alexander Lobakin
> > <aleksander.lobakin@intel.com> wrote:
> >>
> >> From: Alexander H Duyck <alexander.duyck@gmail.com>
> >> Date: Thu, 29 Jun 2023 09:45:26 -0700
> >>
> >>> On Thu, 2023-06-29 at 17:23 +0200, Alexander Lobakin wrote:
> >>>> Turned out page_pool_put{,_full}_page() can burn quite a bunch of cycles
> >>>> even when on DMA-coherent platforms (like x86) with no active IOMMU or
> >>>> swiotlb, just for the call ladder.
> >>>> Indeed, it's
> >>
> >> [...]
> >>
> >>>> @@ -341,6 +345,12 @@ static bool page_pool_dma_map(struct page_pool *pool, struct page *page)
> >>>>
> >>>>      page_pool_set_dma_addr(page, dma);
> >>>>
> >>>> +    if ((pool->p.flags & PP_FLAG_DMA_MAYBE_SYNC) &&
> >>>> +        dma_need_sync(pool->p.dev, dma)) {
> >>>> +            pool->p.flags |= PP_FLAG_DMA_SYNC_DEV;
> >>>> +            pool->p.flags &= ~PP_FLAG_DMA_MAYBE_SYNC;
> >>>> +    }
> >>>> +
> >>>>      if (pool->p.flags & PP_FLAG_DMA_SYNC_DEV)
> >>>>              page_pool_dma_sync_for_device(pool, page, pool->p.max_len);
> >>>>
> >>>
> >>> I am pretty sure the logic is flawed here. The problem is
> >>> dma_needs_sync depends on the DMA address being used. In the worst case
> >>> scenario we could have a device that has something like a 32b DMA
> >>> address space on a system with over 4GB of memory. In such a case the
> >>> higher addresses would need to be synced because they will go off to a
> >>> swiotlb bounce buffer while the lower addresses wouldn't.
> >>>
> >>> If you were to store a flag like this it would have to be generated per
> >>> page.
> >>
> >> I know when DMA might need syncing :D That's the point of this shortcut:
> >> if at least one page needs syncing, I disable it for the whole pool.
> >> It's a "better safe than sorry".
> >> Using a per-page flag involves more changes and might hurt some
> >> scenarios/setups. For example, non-coherent systems, where you always
> >> need to do syncs. The idea was to give some improvement when possible,
> >> otherwise just fallback to what we have today.
> >
> > I am not a fan of having the page pool force the syncing either. Last
> > I knew I thought the PP_FLAG_DMA_SYNC_DEV was meant to be set by the
>
> Please follow the logic of the patch.
>
> 1. The driver sets DMA_SYNC_DEV.
> 2. PP tries to shortcut and replaces it with MAYBE_SYNC.
> 3. If dma_need_sync() returns true for some page, it gets replaced back
>    to DMA_SYNC_DEV, no further dma_need_sync() calls for that pool.
>
> OR
>
> 1. The driver doesn't set DMA_SYNC_DEV.
> 2. PP doesn't turn on MAYBE_SYNC.
> 3. No dma_need_sync() tests.
>
> Where does PP force syncs for drivers which don't need them?

You are right. I was looking at it out of context.

> > driver, not by the page pool API itself. The big reason for that being
> > that the driver in many cases will have to take care of the DMA sync
> > itself instead of letting the allocator take care of it.
> >
> > Basically we are just trading off the dma_need_sync cost versus the
> > page_pool_dma_sync_for_device cost. If we think it is a win to call
>
> dma_need_sync() is called once per newly allocated and mapped page.
> page_pool_dma_sync_for_device() is called each time you ask for a page.
>
> On my setup and with patch #4, I have literally 0 allocations once a
> ring is filled. This means dma_need_sync() is not called at all during
> Rx, while sync_for_device() would be called all the time.
> When pages go through ptr_ring, sometimes new allocations happen, but
> still the number of times dma_need_sync() is called is thousands of times
> lower.

I see, so you are using it as a screener for pages as they are added
to the pool. However, the first time somebody trips dma_need_sync,
everybody in the pool is going to get hit with the sync code.

> > dma_need_sync for every frame then maybe we should look at folding it
> > into page_pool_dma_sync_for_device itself since that is the only
> > consumer of it, or just fold it into the PP_FLAG_DMA_SYNC_DEV if
> > statement after the flag check rather than adding yet another flag
> > that will likely always be true for most devices. Otherwise you are
>
> What you suggest is either calling dma_need_sync() each time a page is
> requested or introducing a flag to store it somewhere in struct page to
> allow some optimization for the really uncommon cases where dma_need_sync()
> might return different values due to swiotlb etc. Did I get it right?

Yeah, my thought would be to have a flag in the page to indicate if it
will need the sync bits or not. Then you could look at exposing that
to the drivers as well so that they could cut down on their own
overhead. We could probably look at just embedding a flag in the lower
bits of the DMA address stored in the page since I suspect at a
minimum the resultant DMA address for a page would always be at least
aligned to a long if not a full page.
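
As a sketch of that idea (names invented, assuming bit 0 of the mapped
address is always clear):

#define PP_DMA_ADDR_NEED_SYNC	BIT(0)	/* hypothetical flag bit */

static void page_pool_set_dma_addr_and_flag(struct page *page, dma_addr_t dma,
					    bool need_sync)
{
	/* Stash the dma_need_sync() result in the otherwise-unused low bit. */
	page_pool_set_dma_addr(page, dma | (need_sync ? PP_DMA_ADDR_NEED_SYNC : 0));
}

static bool page_pool_dma_addr_need_sync(struct page *page)
{
	return page_pool_get_dma_addr(page) & PP_DMA_ADDR_NEED_SYNC;
}

page_pool_get_dma_addr() would then have to mask the bit back off before
the address is handed to the DMA API.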

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH RFC net-next 0/4] net: page_pool: a couple assorted optimizations
  2023-06-29 15:23 [PATCH RFC net-next 0/4] net: page_pool: a couple assorted optimizations Alexander Lobakin
                   ` (3 preceding siblings ...)
  2023-06-29 15:23 ` [PATCH RFC net-next 4/4] net: skbuff: always recycle PP pages directly when inside a NAPI loop Alexander Lobakin
@ 2023-07-02  0:01 ` Jakub Kicinski
  2023-07-03 13:50   ` Alexander Lobakin
  4 siblings, 1 reply; 21+ messages in thread
From: Jakub Kicinski @ 2023-07-02  0:01 UTC (permalink / raw)
  To: Alexander Lobakin
  Cc: David S. Miller, Eric Dumazet, Paolo Abeni, Maciej Fijalkowski,
	Larysa Zaremba, Yunsheng Lin, Alexander Duyck,
	Jesper Dangaard Brouer, Ilias Apalodimas, netdev, linux-kernel

On Thu, 29 Jun 2023 17:23:01 +0200 Alexander Lobakin wrote:
>  #3: new, prereq to #4. Add NAPI state flag, which would indicate
>      napi->poll() is running right now, so that napi->list_owner would
>      point to the CPU where it's being run, not just scheduled;
>  #4: new. In addition to recycling skb PP pages directly when @napi_safe
>      is set, check for the flag from #3, which will mean the same if
>      ->list_owner is pointing to us. This allows to use direct recycling  
>      anytime we're inside a NAPI polling loop or GRO stuff going right
>      after it, covering way more cases than is right now.

You know NAPI pretty well so I'm worried I'm missing something.
I don't think the new flag adds any value. NAPI does not have to 
be running, you can drop patch 3 and use in_softirq() instead of 
the new flag, AFAIU.

The reason I did not do that is that I wasn't sure if there is no
weird (netcons?) case where skb gets freed from an IRQ :(

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH RFC net-next 2/4] net: page_pool: avoid calling no-op externals when possible
  2023-06-30 18:28           ` Alexander Duyck
@ 2023-07-03 13:38             ` Alexander Lobakin
  0 siblings, 0 replies; 21+ messages in thread
From: Alexander Lobakin @ 2023-07-03 13:38 UTC (permalink / raw)
  To: Alexander Duyck
  Cc: David S. Miller, Eric Dumazet, Jakub Kicinski, Paolo Abeni,
	Maciej Fijalkowski, Larysa Zaremba, Yunsheng Lin,
	Alexander Duyck, Jesper Dangaard Brouer, Ilias Apalodimas,
	netdev, linux-kernel

From: Alexander Duyck <alexander.duyck@gmail.com>
Date: Fri, 30 Jun 2023 11:28:30 -0700

> On Fri, Jun 30, 2023 at 8:34 AM Alexander Lobakin
> <aleksander.lobakin@intel.com> wrote:

[...]

>> On my setup and with patch #4, I have literally 0 allocations once a
>> ring is filled. This means dma_need_sync() is not called at all during
>> Rx, while sync_for_device() would be called all the time.
>> When pages go through ptr_ring, sometimes new allocations happen, but
>> still the number of times dma_need_sync() is called is thousands of times
>> lower.
> 
> I see, so you are using it as a screener for pages as they are added
> to the pool. However, the first time somebody trips dma_need_sync,
> everybody in the pool is going to get hit with the sync code.

Right. "Better safe than sorry". If at least one page needs sync, let's
drop the shortcut. It won't make it worse than the mainline anyway.

> 
>>> dma_need_sync for every frame then maybe we should look at folding it
>>> into page_pool_dma_sync_for_device itself since that is the only
>>> consumer of it, or just fold it into the PP_FLAG_DMA_SYNC_DEV if
>>> statement after the flag check rather than adding yet another flag
>>> that will likely always be true for most devices. Otherwise you are
>>
>> What you suggest is either calling dma_need_sync() each time a page is
>> requested or introducing a flag to store it somewhere in struct page to
>> allow some optimization for the really uncommon cases where dma_need_sync()
>> might return different values due to swiotlb etc. Did I get it right?
> 
> Yeah, my thought would be to have a flag in the page to indicate if it
> will need the sync bits or not. Then you could look at exposing that
> to the drivers as well so that they could cut down on their own
> overhead. We could probably look at just embedding a flag in the lower
> bits of the DMA address stored in the page since I suspect at a
> minimum the resultant DMA address for a page would always be at least
> aligned to a long if not a full page.

As for drivers, I could add a wrapper like page_pool_need_sync() to test
for DMA_SYNC_DEV. I'm not sure it's worth it to check on a per-page basis.
Also, having that bit in struct page would force us to always fetch struct
page into a cacheline. Right now, if the sync is skipped, touching struct
page is also avoided, or at least postponed. page_address() (for
&xdp_buff or build_skb()) doesn't touch it.
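
The wrapper itself could be as simple as (sketch, assuming the semantics
from patch #2 where PP_FLAG_DMA_SYNC_DEV ends up set only when syncing is
really needed):

static inline bool page_pool_need_sync(const struct page_pool *pool)
{
	/* Lets a driver skip its own dma_sync_*() calls for this pool. */
	return pool->p.flags & PP_FLAG_DMA_SYNC_DEV;
}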

As for a possible implementation, I also thought of the lowest bit of the
DMA address. Its alignment probably can be lower than %PAGE_SIZE in some
cases (at least for non-PP pages), but not down to 1-byte granularity.
But again, we need some group decision on whether it's worth doing on a
per-page basis :)

Thanks,
Olek

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH RFC net-next 0/4] net: page_pool: a couple assorted optimizations
  2023-07-02  0:01 ` [PATCH RFC net-next 0/4] net: page_pool: a couple assorted optimizations Jakub Kicinski
@ 2023-07-03 13:50   ` Alexander Lobakin
  2023-07-03 18:57     ` Jakub Kicinski
  0 siblings, 1 reply; 21+ messages in thread
From: Alexander Lobakin @ 2023-07-03 13:50 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: David S. Miller, Eric Dumazet, Paolo Abeni, Maciej Fijalkowski,
	Larysa Zaremba, Yunsheng Lin, Alexander Duyck,
	Jesper Dangaard Brouer, Ilias Apalodimas, netdev, linux-kernel

From: Jakub Kicinski <kuba@kernel.org>
Date: Sat, 1 Jul 2023 17:01:55 -0700

> On Thu, 29 Jun 2023 17:23:01 +0200 Alexander Lobakin wrote:
>>  #3: new, prereq to #4. Add NAPI state flag, which would indicate
>>      napi->poll() is running right now, so that napi->list_owner would
>>      point to the CPU where it's being run, not just scheduled;
>>  #4: new. In addition to recycling skb PP pages directly when @napi_safe
>>      is set, check for the flag from #3, which will mean the same if
>>      ->list_owner is pointing to us. This allows to use direct recycling  
>>      anytime we're inside a NAPI polling loop or GRO stuff going right
>>      after it, covering way more cases than is right now.
> 
> You know NAPI pretty well so I'm worried I'm missing something.

I wouldn't say I know it well :D

> I don't think the new flag adds any value. NAPI does not have to 
> be running, you can drop patch 3 and use in_softirq() instead of 
> the new flag, AFAIU.

That's most likely true for the patch 4 case, but I wanted to add some
flag for wider usage.
For example, busy polling relies on whether ->poll() returned the whole
budget to decide whether interrupts were re-enabled, in order to avoid
possible concurrent access, but I wouldn't say it's precise enough.
napi_complete_done() doesn't always return true.
OTOH, the new flag or, more precisely, flag + list_owner combo would
tell for sure.
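
Roughly (a sketch; the flag name stands for whatever patch #3 ends up
adding, so treat it as hypothetical):

static inline bool napi_is_running_here(struct napi_struct *napi)
{
	/* True only on the CPU currently inside this NAPI's ->poll()
	 * (or the GRO work right after it).
	 */
	return test_bit(NAPI_STATE_RUNNING, &napi->state) &&
	       READ_ONCE(napi->list_owner) == smp_processor_id();
}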

> 
> The reason I did not do that is that I wasn't sure if there is no
> weird (netcons?) case where skb gets freed from an IRQ :(

Shouldn't they use dev_kfree_skb_any() or _irq()? Usage of plain
kfree_skb() is not allowed in the TH :s

Anyway, if the flag really makes no sense, I can replace it with
in_softirq(), it's my hobby to break weird drivers :D

Thanks,
Olek

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH RFC net-next 0/4] net: page_pool: a couple assorted optimizations
  2023-07-03 13:50   ` Alexander Lobakin
@ 2023-07-03 18:57     ` Jakub Kicinski
  2023-07-05 12:31       ` Alexander Lobakin
  0 siblings, 1 reply; 21+ messages in thread
From: Jakub Kicinski @ 2023-07-03 18:57 UTC (permalink / raw)
  To: Alexander Lobakin
  Cc: David S. Miller, Eric Dumazet, Paolo Abeni, Maciej Fijalkowski,
	Larysa Zaremba, Yunsheng Lin, Alexander Duyck,
	Jesper Dangaard Brouer, Ilias Apalodimas, netdev, linux-kernel

On Mon, 3 Jul 2023 15:50:55 +0200 Alexander Lobakin wrote:
> > The reason I did not do that is that I wasn't sure if there is no
> > weird (netcons?) case where skb gets freed from an IRQ :(  
> 
> Shouldn't they use dev_kfree_skb_any() or _irq()? Usage of plain
> kfree_skb() is not allowed in the TH :s

I haven't looked at the code so I could be lying but I thought that 
the only thing that can't run in hard IRQ context is the destructor,
so if the caller knows there's no destructor they can free the skb.

I'd ask you the inverse question. If the main use case is skb xdp
(which eh, uh, okay..) then why not make it use napi_consume_skb()?
I don't think skb XDP can run in hard IRQ context, can it?

> Anyway, if the flag really makes no sense, I can replace it with
> in_softirq(), it's my hobby to break weird drivers :D

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH RFC net-next 2/4] net: page_pool: avoid calling no-op externals when possible
  2023-06-30 15:34         ` Alexander Lobakin
  2023-06-30 18:28           ` Alexander Duyck
@ 2023-07-03 20:32           ` Jakub Kicinski
  2023-07-05 14:41             ` Alexander Lobakin
  1 sibling, 1 reply; 21+ messages in thread
From: Jakub Kicinski @ 2023-07-03 20:32 UTC (permalink / raw)
  To: Alexander Lobakin
  Cc: Alexander Duyck, David S. Miller, Eric Dumazet, Paolo Abeni,
	Maciej Fijalkowski, Larysa Zaremba, Yunsheng Lin,
	Alexander Duyck, Jesper Dangaard Brouer, Ilias Apalodimas,
	netdev, linux-kernel

On Fri, 30 Jun 2023 17:34:02 +0200 Alexander Lobakin wrote:
> > I am not a fan of having the page pool force the syncing either. Last
> > I knew I thought the PP_FLAG_DMA_SYNC_DEV was meant to be set by the  
> 
> Please follow the logic of the patch.
> 
> 1. The driver sets DMA_SYNC_DEV.
> 2. PP tries to shortcut and replaces it with MAYBE_SYNC.
> 3. If dma_need_sync() returns true for some page, it gets replaced back
>    to DMA_SYNC_DEV, no further dma_need_sync() calls for that pool.
> 
> OR
> 
> 1. The driver doesn't set DMA_SYNC_DEV.
> 2. PP doesn't turn on MAYBE_SYNC.
> 3. No dma_need_sync() tests.
> 
> Where does PP force syncs for drivers which don't need them?

I think both Alex and I got confused about what's going on here.

Could you reshuffle the code somehow to make it more obvious?
Rename the flag, perhaps put it in a different field than 
the driver-set PP flags?

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH RFC net-next 0/4] net: page_pool: a couple assorted optimizations
  2023-07-03 18:57     ` Jakub Kicinski
@ 2023-07-05 12:31       ` Alexander Lobakin
  0 siblings, 0 replies; 21+ messages in thread
From: Alexander Lobakin @ 2023-07-05 12:31 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: David S. Miller, Eric Dumazet, Paolo Abeni, Maciej Fijalkowski,
	Larysa Zaremba, Yunsheng Lin, Alexander Duyck,
	Jesper Dangaard Brouer, Ilias Apalodimas, netdev, linux-kernel

From: Jakub Kicinski <kuba@kernel.org>
Date: Mon, 3 Jul 2023 11:57:34 -0700

> On Mon, 3 Jul 2023 15:50:55 +0200 Alexander Lobakin wrote:
>>> The reason I did not do that is that I wasn't sure if there is no
>>> weird (netcons?) case where skb gets freed from an IRQ :(  
>>
>> Shouldn't they use dev_kfree_skb_any() or _irq()? Usage of plain
>> kfree_skb() is not allowed in the TH :s
> 
> I haven't looked at the code so I could be lying but I thought that 
> the only thing that can't run in hard IRQ context is the destructor,
> so if the caller knows there's no destructor they can free the skb.
> 
> I'd ask you the inverse question. If the main use case is skb xdp
> (which eh, uh, okay..) then why not make it use napi_consume_skb()?

Remember about Wi-Fi, DSA, and other poor citizens with no native XDP! :D
That was mostly a joke, but I thought of this, too. In the end, my
thought was "let's try making it cover more use cases" and I found this
approach. I'm not saying it's optimal or even much needed; that's
basically why I sent it as an RFC to discuss.

(e.g. I wanted to try speed up xdp_return_frame{,_bulk}() using it)

> I don't think skb XDP can run in hard IRQ context, can it?

skb XDP can't happen in the TH and I think we could assume it's safe to
use napi_consume_skb() there (with a fake non-zero budget :p).
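
I.e. simply (sketch):

	/* skb XDP path, never in hard IRQ context, so any non-zero
	 * budget is enough to let napi_consume_skb() use the NAPI cache.
	 */
	napi_consume_skb(skb, 1);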

> 
>> Anyway, if the flag really makes no sense, I can replace it with
>> in_softirq(), it's my hobby to break weird drivers :D

Thanks,
Olek

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH RFC net-next 2/4] net: page_pool: avoid calling no-op externals when possible
  2023-07-03 20:32           ` Jakub Kicinski
@ 2023-07-05 14:41             ` Alexander Lobakin
  0 siblings, 0 replies; 21+ messages in thread
From: Alexander Lobakin @ 2023-07-05 14:41 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Alexander Duyck, David S. Miller, Eric Dumazet, Paolo Abeni,
	Maciej Fijalkowski, Larysa Zaremba, Yunsheng Lin,
	Alexander Duyck, Jesper Dangaard Brouer, Ilias Apalodimas,
	netdev, linux-kernel

From: Jakub Kicinski <kuba@kernel.org>
Date: Mon, 3 Jul 2023 13:32:07 -0700

> On Fri, 30 Jun 2023 17:34:02 +0200 Alexander Lobakin wrote:
>>> I am not a fan of having the page pool force the syncing either. Last
>>> I knew I thought the PP_FLAG_DMA_SYNC_DEV was meant to be set by the  
>>
>> Please follow the logic of the patch.
>>
>> 1. The driver sets DMA_SYNC_DEV.
>> 2. PP tries to shortcut and replaces it with MAYBE_SYNC.
>> 3. If dma_need_sync() returns true for some page, it gets replaced back
>>    to DMA_SYNC_DEV, no further dma_need_sync() calls for that pool.
>>
>> OR
>>
>> 1. The driver doesn't set DMA_SYNC_DEV.
>> 2. PP doesn't turn on MAYBE_SYNC.
>> 3. No dma_need_sync() tests.
>>
>> Where does PP force syncs for drivers which don't need them?
> 
> I think both Alex and I got confused about what's going on here.
> 
> Could you reshuffle the code somehow to make it more obvious?
> Rename the flag, perhaps put it in a different field than 
> the driver-set PP flags?

PP currently doesn't have a field for internal flags or anything like
that, so I reused the existing one :s But you're probably right, that
would make it more obvious.

1. Driver sets PP_SYNC_DEV.
2. PP doesn't set its internal flag until dma_need_sync() returns true for
   some page.
3. PP-sync-for-dev checks for the internal flag.

Although it needs more lines to be changed :D
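
I.e. something along these lines (sketch; the internal field name is
invented):

	/* in page_pool_dma_map(), after a successful mapping: */
	if ((pool->p.flags & PP_FLAG_DMA_SYNC_DEV) &&
	    dma_need_sync(pool->p.dev, dma))
		pool->dma_sync = true;	/* internal, not a p.flags bit */

	if (pool->dma_sync)
		page_pool_dma_sync_for_device(pool, page, pool->p.max_len);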

Thanks,
Olek

^ permalink raw reply	[flat|nested] 21+ messages in thread

end of thread, other threads:[~2023-07-05 14:42 UTC | newest]

Thread overview: 21+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-06-29 15:23 [PATCH RFC net-next 0/4] net: page_pool: a couple assorted optimizations Alexander Lobakin
2023-06-29 15:23 ` [PATCH RFC net-next 1/4] net: skbuff: don't include <net/page_pool.h> to <linux/skbuff.h> Alexander Lobakin
2023-06-29 16:55   ` Alexander H Duyck
2023-06-30 12:39     ` Alexander Lobakin
2023-06-30 15:11       ` Alexander Duyck
2023-06-30 16:05         ` Alexander Lobakin
2023-06-29 15:23 ` [PATCH RFC net-next 2/4] net: page_pool: avoid calling no-op externals when possible Alexander Lobakin
2023-06-29 16:45   ` Alexander H Duyck
2023-06-30 12:29     ` Alexander Lobakin
2023-06-30 14:44       ` Alexander Duyck
2023-06-30 15:34         ` Alexander Lobakin
2023-06-30 18:28           ` Alexander Duyck
2023-07-03 13:38             ` Alexander Lobakin
2023-07-03 20:32           ` Jakub Kicinski
2023-07-05 14:41             ` Alexander Lobakin
2023-06-29 15:23 ` [PATCH RFC net-next 3/4] net: add flag to indicate NAPI/GRO is running right now Alexander Lobakin
2023-06-29 15:23 ` [PATCH RFC net-next 4/4] net: skbuff: always recycle PP pages directly when inside a NAPI loop Alexander Lobakin
2023-07-02  0:01 ` [PATCH RFC net-next 0/4] net: page_pool: a couple assorted optimizations Jakub Kicinski
2023-07-03 13:50   ` Alexander Lobakin
2023-07-03 18:57     ` Jakub Kicinski
2023-07-05 12:31       ` Alexander Lobakin

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.