* [PATCH net-next 0/4] page_pool: API for numa node change handling
@ 2019-10-22  4:44 Saeed Mahameed
  2019-10-22  4:44 ` [PATCH net-next 1/4] page_pool: Add API to update numa node Saeed Mahameed
                   ` (5 more replies)
  0 siblings, 6 replies; 13+ messages in thread
From: Saeed Mahameed @ 2019-10-22  4:44 UTC (permalink / raw)
  To: David S. Miller, Jesper Dangaard Brouer
  Cc: netdev, Jonathan Lemon, ilias.apalodimas, Saeed Mahameed

Hi Dave & Jesper,

This series extends the page pool API to allow page pool consumers to update
the page pool numa node on the fly. This is required since on some systems
rx ring irqs can migrate between numa nodes, due to the irq balancer or
user-defined scripts; the page pool currently has no way to know of such a
migration and will keep allocating and holding on to pages from the wrong
numa node, which hurts consumer performance.

1) Add API to update the numa node id of the page pool:
Consumers will call this API to update the page pool numa node id
(a minimal usage sketch follows this list).

2) Don't recycle non-reusable pages:
Page pool will check upon page return whether a page is suitable for
recycling or not:
 2.1) when it belongs to a different numa node.
 2.2) when it was allocated under memory pressure.

3) mlx5 will use the new API to update page pool numa id on demand.
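
As a minimal consumer-side usage sketch (modeled on how the mlx5 patch in
this series calls the API from its napi poll routine; the driver structure
and function names below are hypothetical, only the page_pool and numa
helpers are real):

#include <linux/netdevice.h>	/* struct napi_struct */
#include <linux/topology.h>	/* numa_mem_id() */
#include <net/page_pool.h>	/* page_pool_nid_changed() */

struct example_rq {			/* hypothetical driver RX queue */
	struct napi_struct napi;
	struct page_pool *page_pool;
};

/* Once per napi poll cycle, tell the pool which numa node we are actually
 * running on.  page_pool_nid_changed() is an inline fast-path check and
 * only calls page_pool_update_nid() when the node really changed.
 */
static int example_napi_poll(struct napi_struct *napi, int budget)
{
	struct example_rq *rq = container_of(napi, struct example_rq, napi);

	if (rq->page_pool)
		page_pool_nid_changed(rq->page_pool, numa_mem_id());

	/* ... regular RX processing, allocating from rq->page_pool ... */

	return 0;
}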

The series is joint work between me and Jonathan. We tested it, and it
proved worthwhile, avoiding page allocator bottlenecks and improving
packet rate and cpu utilization significantly for the scenarios
described above.

Performance testing:
XDP drop/tx rate and TCP single/multi stream, on mlx5 driver
while migrating rx ring irq from close to far numa:

mlx5 internal page cache was locally disabled to get pure page pool
results.

CPU: Intel(R) Xeon(R) CPU E5-2603 v4 @ 1.70GHz
NIC: Mellanox Technologies MT27700 Family [ConnectX-4] (100G)

XDP Drop/TX single core:
NUMA  | XDP  | Before    | After
---------------------------------------
Close | Drop | 11   Mpps | 10.9 Mpps
Far   | Drop | 4.4  Mpps | 5.8  Mpps

Close | TX   | 6.5 Mpps  | 6.5 Mpps
Far   | TX   | 4   Mpps  | 3.5  Mpps

Improvement is about 30% drop packet rate, 15% tx packet rate for numa
far test.
No degradation for numa close tests.

TCP single/multi cpu/stream:
NUMA  | #cpu | Before  | After
--------------------------------------
Close | 1    | 18 Gbps | 18 Gbps
Far   | 1    | 15 Gbps | 18 Gbps
Close | 12   | 80 Gbps | 80 Gbps
Far   | 12   | 68 Gbps | 80 Gbps

In all test cases we see improvement for the far numa case, and no
impact on the close numa case.

Thanks,
Saeed.

---

Jonathan Lemon (1):
  page_pool: Restructure __page_pool_put_page()

Saeed Mahameed (3):
  page_pool: Add API to update numa node
  page_pool: Don't recycle non-reusable pages
  net/mlx5e: Rx, Update page pool numa node when changed

 .../net/ethernet/mellanox/mlx5/core/en_rx.c   |  3 ++
 include/net/page_pool.h                       |  7 +++
 include/trace/events/page_pool.h              | 22 +++++++++
 net/core/page_pool.c                          | 46 +++++++++++++------
 4 files changed, 65 insertions(+), 13 deletions(-)

-- 
2.21.0



* [PATCH net-next 1/4] page_pool: Add API to update numa node
  2019-10-22  4:44 [PATCH net-next 0/4] page_pool: API for numa node change handling Saeed Mahameed
@ 2019-10-22  4:44 ` Saeed Mahameed
  2019-10-22  4:44 ` [PATCH net-next 2/4] page_pool: Don't recycle non-reusable pages Saeed Mahameed
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 13+ messages in thread
From: Saeed Mahameed @ 2019-10-22  4:44 UTC (permalink / raw)
  To: David S. Miller, Jesper Dangaard Brouer
  Cc: netdev, Jonathan Lemon, ilias.apalodimas, Saeed Mahameed

Add page_pool_update_nid() to be called by page pool consumers when they
detect numa node changes.

It will update the page pool nid value to start allocating from the new
effective numa node.

This mitigates the page pool allocating pages from the wrong numa node, i.e.
the node where the pool was originally allocated, and holding on to pages
that belong to a different numa node, which causes performance degradation.

For pages that are already in flight and may be returned to the pool by the
consumer, the next patch adds a per-page check to avoid recycling them back
to the pool and instead return them to the page allocator.
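
For context on why updating pool->p.nid is enough to redirect future
allocations, a condensed sketch of the pool's slow allocation path,
paraphrased from net/core/page_pool.c of this era (DMA mapping and
accounting elided; the exact structure may differ):

/* pool->p.nid is the node hint handed to the page allocator, so changing
 * it changes where new pages come from.
 */
static struct page *alloc_slow_sketch(struct page_pool *pool, gfp_t gfp)
{
	struct page *page;

	page = alloc_pages_node(pool->p.nid, gfp, pool->p.order);
	if (!page)
		return NULL;

	/* ... optional DMA mapping and pool accounting elided ... */
	return page;
}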

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com>
---
 include/net/page_pool.h          |  7 +++++++
 include/trace/events/page_pool.h | 22 ++++++++++++++++++++++
 net/core/page_pool.c             |  8 ++++++++
 3 files changed, 37 insertions(+)

diff --git a/include/net/page_pool.h b/include/net/page_pool.h
index 2cbcdbdec254..f46b78408e44 100644
--- a/include/net/page_pool.h
+++ b/include/net/page_pool.h
@@ -226,4 +226,11 @@ static inline bool page_pool_put(struct page_pool *pool)
 	return refcount_dec_and_test(&pool->user_cnt);
 }
 
+/* Caller must provide appropriate safe context, e.g. NAPI. */
+void page_pool_update_nid(struct page_pool *pool, int new_nid);
+static inline void page_pool_nid_changed(struct page_pool *pool, int new_nid)
+{
+	if (unlikely(pool->p.nid != new_nid))
+		page_pool_update_nid(pool, new_nid);
+}
 #endif /* _NET_PAGE_POOL_H */
diff --git a/include/trace/events/page_pool.h b/include/trace/events/page_pool.h
index 47b5ee880aa9..2507c5ff19e6 100644
--- a/include/trace/events/page_pool.h
+++ b/include/trace/events/page_pool.h
@@ -81,6 +81,28 @@ TRACE_EVENT(page_pool_state_hold,
 		  __entry->pool, __entry->page, __entry->hold)
 );
 
+TRACE_EVENT(page_pool_nid_update,
+
+	TP_PROTO(const struct page_pool *pool, int new_nid),
+
+	TP_ARGS(pool, new_nid),
+
+	TP_STRUCT__entry(
+		__field(const struct page_pool *, pool)
+		__field(int,			  pool_nid)
+		__field(int,			  new_nid)
+	),
+
+	TP_fast_assign(
+		__entry->pool		= pool;
+		__entry->pool_nid	= pool->p.nid;
+		__entry->new_nid	= new_nid;
+	),
+
+	TP_printk("page_pool=%p pool_nid=%d new_nid=%d",
+		  __entry->pool, __entry->pool_nid, __entry->new_nid)
+);
+
 #endif /* _TRACE_PAGE_POOL_H */
 
 /* This part must be outside protection */
diff --git a/net/core/page_pool.c b/net/core/page_pool.c
index 5bc65587f1c4..08ca9915c618 100644
--- a/net/core/page_pool.c
+++ b/net/core/page_pool.c
@@ -397,3 +397,11 @@ bool __page_pool_request_shutdown(struct page_pool *pool)
 	return __page_pool_safe_to_destroy(pool);
 }
 EXPORT_SYMBOL(__page_pool_request_shutdown);
+
+/* Caller must provide appropriate safe context, e.g. NAPI. */
+void page_pool_update_nid(struct page_pool *pool, int new_nid)
+{
+	trace_page_pool_nid_update(pool, new_nid);
+	pool->p.nid = new_nid;
+}
+EXPORT_SYMBOL(page_pool_update_nid);
-- 
2.21.0



* [PATCH net-next 2/4] page_pool: Don't recycle non-reusable pages
  2019-10-22  4:44 [PATCH net-next 0/4] page_pool: API for numa node change handling Saeed Mahameed
  2019-10-22  4:44 ` [PATCH net-next 1/4] page_pool: Add API to update numa node Saeed Mahameed
@ 2019-10-22  4:44 ` Saeed Mahameed
  2019-10-23 18:38   ` Jesper Dangaard Brouer
  2019-10-22  4:44 ` [PATCH net-next 3/4] page_pool: Restructure __page_pool_put_page() Saeed Mahameed
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 13+ messages in thread
From: Saeed Mahameed @ 2019-10-22  4:44 UTC (permalink / raw)
  To: David S. Miller, Jesper Dangaard Brouer
  Cc: netdev, Jonathan Lemon, ilias.apalodimas, Saeed Mahameed

A page is NOT reusable when at least one of the following is true:
1) allocated when system was under some pressure. (page_is_pfmemalloc)
2) belongs to a different NUMA node than pool->p.nid.

To update pool->p.nid users should call page_pool_update_nid().

Holding on to such pages in the pool will hurt the consumer performance
when the pool migrates to a different numa node.

Performance testing:
XDP drop/tx rate and TCP single/multi stream, on mlx5 driver
while migrating rx ring irq from close to far numa:

mlx5 internal page cache was locally disabled to get pure page pool
results.

CPU: Intel(R) Xeon(R) CPU E5-2603 v4 @ 1.70GHz
NIC: Mellanox Technologies MT27700 Family [ConnectX-4] (100G)

XDP Drop/TX single core:
NUMA  | XDP  | Before    | After
---------------------------------------
Close | Drop | 11   Mpps | 10.8 Mpps
Far   | Drop | 4.4  Mpps | 5.8  Mpps

Close | TX   | 6.5 Mpps  | 6.5 Mpps
Far   | TX   | 4   Mpps  | 3.5  Mpps

Improvement is about 30% drop packet rate, 15% tx packet rate for numa
far test.
No degradation for numa close tests.

TCP single/multi cpu/stream:
NUMA  | #cpu | Before  | After
--------------------------------------
Close | 1    | 18 Gbps | 18 Gbps
Far   | 1    | 15 Gbps | 18 Gbps
Close | 12   | 80 Gbps | 80 Gbps
Far   | 12   | 68 Gbps | 80 Gbps

In all test cases we see improvement for the far numa case, and no
impact on the close numa case.

The impact of adding a per-page check is negligible and shows no performance
degradation whatsoever. Functionality-wise it also seems more correct and more
robust for the page pool to verify whether pages should be recycled, since the
page pool can't guarantee where pages are coming from.
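
For readers looking at this patch in isolation, a condensed, annotated
sketch of where the new check sits in __page_pool_put_page() after this
change (the surrounding code is paraphrased from the existing file, as
also visible in the next patch's diff):

void __page_pool_put_page(struct page_pool *pool,
			  struct page *page, bool allow_direct)
{
	/* refcnt == 1 means page_pool owns the page and may recycle it,
	 * but only if it is also reusable (right numa node, not pfmemalloc).
	 */
	if (likely(page_ref_count(page) == 1 &&
		   pool_page_reusable(pool, page))) {
		if (allow_direct && in_serving_softirq())
			if (__page_pool_recycle_direct(page, pool))
				return;	/* kept in the lockless cache */

		if (!__page_pool_recycle_into_ring(pool, page))
			__page_pool_return_page(pool, page); /* ring full */
		return;
	}

	/* Non-reusable page or elevated refcnt: unmap the page and hand
	 * it back to the page allocator instead of recycling it.
	 */
	__page_pool_clean_page(pool, page);
	put_page(page);
}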

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com>
---
 net/core/page_pool.c | 14 +++++++++++++-
 1 file changed, 13 insertions(+), 1 deletion(-)

diff --git a/net/core/page_pool.c b/net/core/page_pool.c
index 08ca9915c618..8120aec999ce 100644
--- a/net/core/page_pool.c
+++ b/net/core/page_pool.c
@@ -283,6 +283,17 @@ static bool __page_pool_recycle_direct(struct page *page,
 	return true;
 }
 
+/* page is NOT reusable when:
+ * 1) allocated when system is under some pressure. (page_is_pfmemalloc)
+ * 2) belongs to a different NUMA node than pool->p.nid.
+ *
+ * To update pool->p.nid users must call page_pool_update_nid.
+ */
+static bool pool_page_reusable(struct page_pool *pool, struct page *page)
+{
+	return !page_is_pfmemalloc(page) && page_to_nid(page) == pool->p.nid;
+}
+
 void __page_pool_put_page(struct page_pool *pool,
 			  struct page *page, bool allow_direct)
 {
@@ -292,7 +303,8 @@ void __page_pool_put_page(struct page_pool *pool,
 	 *
 	 * refcnt == 1 means page_pool owns page, and can recycle it.
 	 */
-	if (likely(page_ref_count(page) == 1)) {
+	if (likely(page_ref_count(page) == 1 &&
+		   pool_page_reusable(pool, page))) {
 		/* Read barrier done in page_ref_count / READ_ONCE */
 
 		if (allow_direct && in_serving_softirq())
-- 
2.21.0



* [PATCH net-next 3/4] page_pool: Restructure __page_pool_put_page()
  2019-10-22  4:44 [PATCH net-next 0/4] page_pool: API for numa node change handling Saeed Mahameed
  2019-10-22  4:44 ` [PATCH net-next 1/4] page_pool: Add API to update numa node Saeed Mahameed
  2019-10-22  4:44 ` [PATCH net-next 2/4] page_pool: Don't recycle non-reusable pages Saeed Mahameed
@ 2019-10-22  4:44 ` Saeed Mahameed
  2019-10-23  8:45   ` Ilias Apalodimas
  2019-10-22  4:44 ` [PATCH net-next 4/4] net/mlx5e: Rx, Update page pool numa node when changed Saeed Mahameed
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 13+ messages in thread
From: Saeed Mahameed @ 2019-10-22  4:44 UTC (permalink / raw)
  To: David S. Miller, Jesper Dangaard Brouer
  Cc: netdev, Jonathan Lemon, ilias.apalodimas, Saeed Mahameed

From: Jonathan Lemon <jonathan.lemon@gmail.com>

1) Rename functions to reflect what they are actually doing.

2) Unify the condition to keep a page.

3) When page can't be kept in cache, fallback to releasing page to page
allocator in one place, instead of calling it from multiple conditions,
and reuse __page_pool_return_page().

Signed-off-by: Jonathan Lemon <jonathan.lemon@gmail.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
---
 net/core/page_pool.c | 38 +++++++++++++++++++-------------------
 1 file changed, 19 insertions(+), 19 deletions(-)

diff --git a/net/core/page_pool.c b/net/core/page_pool.c
index 8120aec999ce..65680aaa0818 100644
--- a/net/core/page_pool.c
+++ b/net/core/page_pool.c
@@ -258,6 +258,7 @@ static bool __page_pool_recycle_into_ring(struct page_pool *pool,
 				   struct page *page)
 {
 	int ret;
+
 	/* BH protection not needed if current is serving softirq */
 	if (in_serving_softirq())
 		ret = ptr_ring_produce(&pool->ring, page);
@@ -272,8 +273,8 @@ static bool __page_pool_recycle_into_ring(struct page_pool *pool,
  *
  * Caller must provide appropriate safe context.
  */
-static bool __page_pool_recycle_direct(struct page *page,
-				       struct page_pool *pool)
+static bool __page_pool_recycle_into_cache(struct page *page,
+					   struct page_pool *pool)
 {
 	if (unlikely(pool->alloc.count == PP_ALLOC_CACHE_SIZE))
 		return false;
@@ -283,15 +284,18 @@ static bool __page_pool_recycle_direct(struct page *page,
 	return true;
 }
 
-/* page is NOT reusable when:
- * 1) allocated when system is under some pressure. (page_is_pfmemalloc)
- * 2) belongs to a different NUMA node than pool->p.nid.
+/* Keep page in caches only if page:
+ * 1) wasn't allocated when system is under some pressure (page_is_pfmemalloc).
+ * 2) belongs to pool's numa node (pool->p.nid).
+ * 3) refcount is 1 (owned by page pool).
  *
  * To update pool->p.nid users must call page_pool_update_nid.
  */
-static bool pool_page_reusable(struct page_pool *pool, struct page *page)
+static bool page_pool_keep_page(struct page_pool *pool, struct page *page)
 {
-	return !page_is_pfmemalloc(page) && page_to_nid(page) == pool->p.nid;
+	return !page_is_pfmemalloc(page) &&
+	       page_to_nid(page) == pool->p.nid &&
+	       page_ref_count(page) == 1;
 }
 
 void __page_pool_put_page(struct page_pool *pool,
@@ -300,22 +304,19 @@ void __page_pool_put_page(struct page_pool *pool,
 	/* This allocator is optimized for the XDP mode that uses
 	 * one-frame-per-page, but have fallbacks that act like the
 	 * regular page allocator APIs.
-	 *
-	 * refcnt == 1 means page_pool owns page, and can recycle it.
 	 */
-	if (likely(page_ref_count(page) == 1 &&
-		   pool_page_reusable(pool, page))) {
+
+	if (likely(page_pool_keep_page(pool, page))) {
 		/* Read barrier done in page_ref_count / READ_ONCE */
 
 		if (allow_direct && in_serving_softirq())
-			if (__page_pool_recycle_direct(page, pool))
+			if (__page_pool_recycle_into_cache(page, pool))
 				return;
 
-		if (!__page_pool_recycle_into_ring(pool, page)) {
-			/* Cache full, fallback to free pages */
-			__page_pool_return_page(pool, page);
-		}
-		return;
+		if (__page_pool_recycle_into_ring(pool, page))
+			return;
+
+		/* Cache full, fallback to return pages */
 	}
 	/* Fallback/non-XDP mode: API user have elevated refcnt.
 	 *
@@ -330,8 +331,7 @@ void __page_pool_put_page(struct page_pool *pool,
 	 * doing refcnt based recycle tricks, meaning another process
 	 * will be invoking put_page.
 	 */
-	__page_pool_clean_page(pool, page);
-	put_page(page);
+	__page_pool_return_page(pool, page);
 }
 EXPORT_SYMBOL(__page_pool_put_page);
 
-- 
2.21.0



* [PATCH net-next 4/4] net/mlx5e: Rx, Update page pool numa node when changed
  2019-10-22  4:44 [PATCH net-next 0/4] page_pool: API for numa node change handling Saeed Mahameed
                   ` (2 preceding siblings ...)
  2019-10-22  4:44 ` [PATCH net-next 3/4] page_pool: Restructure __page_pool_put_page() Saeed Mahameed
@ 2019-10-22  4:44 ` Saeed Mahameed
  2019-10-22 16:33 ` [PATCH net-next 0/4] page_pool: API for numa node change handling Jonathan Lemon
  2019-10-23 15:20 ` Or Gerlitz
  5 siblings, 0 replies; 13+ messages in thread
From: Saeed Mahameed @ 2019-10-22  4:44 UTC (permalink / raw)
  To: David S. Miller, Jesper Dangaard Brouer
  Cc: netdev, Jonathan Lemon, ilias.apalodimas, Saeed Mahameed

Once per napi poll cycle, check whether the current numa node differs from
the page pool's numa id, and if so update it using page_pool_update_nid().

Alternatively, we could have registered an irq affinity change handler, but
page_pool_update_nid() must be called from napi context anyway, so the
handler wouldn't actually help.

Performance testing:
XDP drop/tx rate and TCP single/multi stream, on mlx5 driver
while migrating rx ring irq from close to far numa:

mlx5 internal page cache was locally disabled to get pure page pool
results.

CPU: Intel(R) Xeon(R) CPU E5-2603 v4 @ 1.70GHz
NIC: Mellanox Technologies MT27700 Family [ConnectX-4] (100G)

XDP Drop/TX single core:
NUMA  | XDP  | Before    | After
---------------------------------------
Close | Drop | 11   Mpps | 10.8 Mpps
Far   | Drop | 4.4  Mpps | 5.8  Mpps

Close | TX   | 6.5 Mpps  | 6.5 Mpps
Far   | TX   | 4   Mpps  | 3.5  Mpps

Improvement is about 30% drop packet rate, 15% tx packet rate for numa
far test.
No degradation for numa close tests.

TCP single/multi cpu/stream:
NUMA  | #cpu | Before  | After
--------------------------------------
Close | 1    | 18 Gbps | 18 Gbps
Far   | 1    | 15 Gbps | 18 Gbps
Close | 12   | 80 Gbps | 80 Gbps
Far   | 12   | 68 Gbps | 80 Gbps

In all test cases we see improvement for the far numa case, and no
impact on the close numa case.

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en_rx.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
index d6a547238de0..3b6d55797c0f 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
@@ -1386,6 +1386,9 @@ int mlx5e_poll_rx_cq(struct mlx5e_cq *cq, int budget)
 	if (unlikely(!test_bit(MLX5E_RQ_STATE_ENABLED, &rq->state)))
 		return 0;
 
+	if (rq->page_pool)
+		page_pool_nid_changed(rq->page_pool, numa_mem_id());
+
 	if (rq->cqd.left)
 		work_done += mlx5e_decompress_cqes_cont(rq, cqwq, 0, budget);
 
-- 
2.21.0



* Re: [PATCH net-next 0/4] page_pool: API for numa node change handling
  2019-10-22  4:44 [PATCH net-next 0/4] page_pool: API for numa node change handling Saeed Mahameed
                   ` (3 preceding siblings ...)
  2019-10-22  4:44 ` [PATCH net-next 4/4] net/mlx5e: Rx, Update page pool numa node when changed Saeed Mahameed
@ 2019-10-22 16:33 ` Jonathan Lemon
  2019-10-23 15:20 ` Or Gerlitz
  5 siblings, 0 replies; 13+ messages in thread
From: Jonathan Lemon @ 2019-10-22 16:33 UTC (permalink / raw)
  To: Saeed Mahameed
  Cc: David S. Miller, Jesper Dangaard Brouer, netdev, ilias.apalodimas

On 21 Oct 2019, at 21:44, Saeed Mahameed wrote:

> Hi Dave & Jesper,
>
> This series extends page pool API to allow page pool consumers to update
> page pool numa node on the fly. This is required since on some systems,
> rx rings irqs can migrate between numa nodes, due to irq balancer or user
> defined scripts, current page pool has no way to know of such migration
> and will keep allocating and holding on to pages from a wrong numa node,
> which is bad for the consumer performance.
>
> 1) Add API to update numa node id of the page pool
> Consumers will call this API to update the page pool numa node id.
>
> 2) Don't recycle non-reusable pages:
> Page pool will check upon page return whether a page is suitable for
> recycling or not.
>  2.1) when it belongs to a different numa node.
>  2.2) when it was allocated under memory pressure.
>
> 3) mlx5 will use the new API to update page pool numa id on demand.
>
> The series is a joint work between me and Jonathan, we tested it and it
> proved itself worthy to avoid page allocator bottlenecks and improve
> packet rate and cpu utilization significantly for the described
> scenarios above.
>
> Performance testing:
> XDP drop/tx rate and TCP single/multi stream, on mlx5 driver
> while migrating rx ring irq from close to far numa:
>
> mlx5 internal page cache was locally disabled to get pure page pool
> results.
>
> CPU: Intel(R) Xeon(R) CPU E5-2603 v4 @ 1.70GHz
> NIC: Mellanox Technologies MT27700 Family [ConnectX-4] (100G)
>
> XDP Drop/TX single core:
> NUMA  | XDP  | Before    | After
> ---------------------------------------
> Close | Drop | 11   Mpps | 10.9 Mpps
> Far   | Drop | 4.4  Mpps | 5.8  Mpps
>
> Close | TX   | 6.5 Mpps  | 6.5 Mpps
> Far   | TX   | 4   Mpps  | 3.5  Mpps
>
> Improvement is about 30% drop packet rate, 15% tx packet rate for numa
> far test.
> No degradation for numa close tests.
>
> TCP single/multi cpu/stream:
> NUMA  | #cpu | Before  | After
> --------------------------------------
> Close | 1    | 18 Gbps | 18 Gbps
> Far   | 1    | 15 Gbps | 18 Gbps
> Close | 12   | 80 Gbps | 80 Gbps
> Far   | 12   | 68 Gbps | 80 Gbps
>
> In all test cases we see improvement for the far numa case, and no
> impact on the close numa case.

These look good, thanks Saeed!
-- 
Jonathan



>
> Thanks,
> Saeed.
>
> ---
>
> Jonathan Lemon (1):
>   page_pool: Restructure __page_pool_put_page()
>
> Saeed Mahameed (3):
>   page_pool: Add API to update numa node
>   page_pool: Don't recycle non-reusable pages
>   net/mlx5e: Rx, Update page pool numa node when changed
>
>  .../net/ethernet/mellanox/mlx5/core/en_rx.c   |  3 ++
>  include/net/page_pool.h                       |  7 +++
>  include/trace/events/page_pool.h              | 22 +++++++++
>  net/core/page_pool.c                          | 46 +++++++++++++------
>  4 files changed, 65 insertions(+), 13 deletions(-)
>
> -- 
> 2.21.0


* Re: [PATCH net-next 3/4] page_pool: Restructure __page_pool_put_page()
  2019-10-22  4:44 ` [PATCH net-next 3/4] page_pool: Restructure __page_pool_put_page() Saeed Mahameed
@ 2019-10-23  8:45   ` Ilias Apalodimas
  2019-10-23 18:31     ` Jesper Dangaard Brouer
  0 siblings, 1 reply; 13+ messages in thread
From: Ilias Apalodimas @ 2019-10-23  8:45 UTC (permalink / raw)
  To: Saeed Mahameed
  Cc: David S. Miller, Jesper Dangaard Brouer, netdev, Jonathan Lemon

On Tue, Oct 22, 2019 at 04:44:24AM +0000, Saeed Mahameed wrote:
> From: Jonathan Lemon <jonathan.lemon@gmail.com>
> 
> 1) Rename functions to reflect what they are actually doing.
> 
> 2) Unify the condition to keep a page.
> 
> 3) When page can't be kept in cache, fallback to releasing page to page
> allocator in one place, instead of calling it from multiple conditions,
> and reuse __page_pool_return_page().
> 
> Signed-off-by: Jonathan Lemon <jonathan.lemon@gmail.com>
> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
> ---
>  net/core/page_pool.c | 38 +++++++++++++++++++-------------------
>  1 file changed, 19 insertions(+), 19 deletions(-)
> 
> diff --git a/net/core/page_pool.c b/net/core/page_pool.c
> index 8120aec999ce..65680aaa0818 100644
> --- a/net/core/page_pool.c
> +++ b/net/core/page_pool.c
> @@ -258,6 +258,7 @@ static bool __page_pool_recycle_into_ring(struct page_pool *pool,
>  				   struct page *page)
>  {
>  	int ret;
> +
>  	/* BH protection not needed if current is serving softirq */
>  	if (in_serving_softirq())
>  		ret = ptr_ring_produce(&pool->ring, page);
> @@ -272,8 +273,8 @@ static bool __page_pool_recycle_into_ring(struct page_pool *pool,
>   *
>   * Caller must provide appropriate safe context.
>   */
> -static bool __page_pool_recycle_direct(struct page *page,
> -				       struct page_pool *pool)
> +static bool __page_pool_recycle_into_cache(struct page *page,
> +					   struct page_pool *pool)
>  {
>  	if (unlikely(pool->alloc.count == PP_ALLOC_CACHE_SIZE))
>  		return false;
> @@ -283,15 +284,18 @@ static bool __page_pool_recycle_direct(struct page *page,
>  	return true;
>  }
>  
> -/* page is NOT reusable when:
> - * 1) allocated when system is under some pressure. (page_is_pfmemalloc)
> - * 2) belongs to a different NUMA node than pool->p.nid.
> +/* Keep page in caches only if page:
> + * 1) wasn't allocated when system is under some pressure (page_is_pfmemalloc).
> + * 2) belongs to pool's numa node (pool->p.nid).
> + * 3) refcount is 1 (owned by page pool).
>   *
>   * To update pool->p.nid users must call page_pool_update_nid.
>   */
> -static bool pool_page_reusable(struct page_pool *pool, struct page *page)
> +static bool page_pool_keep_page(struct page_pool *pool, struct page *page)
>  {
> -	return !page_is_pfmemalloc(page) && page_to_nid(page) == pool->p.nid;
> +	return !page_is_pfmemalloc(page) &&
> +	       page_to_nid(page) == pool->p.nid &&
> +	       page_ref_count(page) == 1;
>  }
>  
>  void __page_pool_put_page(struct page_pool *pool,
> @@ -300,22 +304,19 @@ void __page_pool_put_page(struct page_pool *pool,
>  	/* This allocator is optimized for the XDP mode that uses
>  	 * one-frame-per-page, but have fallbacks that act like the
>  	 * regular page allocator APIs.
> -	 *
> -	 * refcnt == 1 means page_pool owns page, and can recycle it.
>  	 */
> -	if (likely(page_ref_count(page) == 1 &&
> -		   pool_page_reusable(pool, page))) {
> +
> +	if (likely(page_pool_keep_page(pool, page))) {
>  		/* Read barrier done in page_ref_count / READ_ONCE */
>  
>  		if (allow_direct && in_serving_softirq())
> -			if (__page_pool_recycle_direct(page, pool))
> +			if (__page_pool_recycle_into_cache(page, pool))
>  				return;
>  
> -		if (!__page_pool_recycle_into_ring(pool, page)) {
> -			/* Cache full, fallback to free pages */
> -			__page_pool_return_page(pool, page);
> -		}
> -		return;
> +		if (__page_pool_recycle_into_ring(pool, page))
> +			return;
> +
> +		/* Cache full, fallback to return pages */
>  	}
>  	/* Fallback/non-XDP mode: API user have elevated refcnt.
>  	 *
> @@ -330,8 +331,7 @@ void __page_pool_put_page(struct page_pool *pool,
>  	 * doing refcnt based recycle tricks, meaning another process
>  	 * will be invoking put_page.
>  	 */
> -	__page_pool_clean_page(pool, page);
> -	put_page(page);
> +	__page_pool_return_page(pool, page);

I think Jesper had a reason for calling them separately instead of using
__page_pool_return_page() (which in fact does the same thing).

In the future he was planning on removing the __page_pool_clean_page() call
from there, since someone might call __page_pool_put_page() after
__page_pool_clean_page() has already been called.
Can we leave the calls there as-is?

>  }
>  EXPORT_SYMBOL(__page_pool_put_page);
>  
> -- 
> 2.21.0
> 

Thanks
/Ilias
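
For context on the exchange above: at this point in the tree the two forms
are equivalent.  A paraphrased sketch of __page_pool_return_page() from
net/core/page_pool.c of this era (the comments are descriptive additions):

/* Unmap the page from the pool and release it to the page allocator. */
static void __page_pool_return_page(struct page_pool *pool, struct page *page)
{
	__page_pool_clean_page(pool, page);	/* DMA unmap, drop pool state */
	put_page(page);				/* back to the page allocator */
}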


* Re: [PATCH net-next 0/4] page_pool: API for numa node change handling
  2019-10-22  4:44 [PATCH net-next 0/4] page_pool: API for numa node change handling Saeed Mahameed
                   ` (4 preceding siblings ...)
  2019-10-22 16:33 ` [PATCH net-next 0/4] page_pool: API for numa node change handling Jonathan Lemon
@ 2019-10-23 15:20 ` Or Gerlitz
  2019-10-23 19:10   ` Saeed Mahameed
  5 siblings, 1 reply; 13+ messages in thread
From: Or Gerlitz @ 2019-10-23 15:20 UTC (permalink / raw)
  To: Saeed Mahameed
  Cc: David S. Miller, Jesper Dangaard Brouer, netdev, Jonathan Lemon,
	ilias.apalodimas

On Tue, Oct 22, 2019 at 8:04 AM Saeed Mahameed <saeedm@mellanox.com> wrote:

> CPU: Intel(R) Xeon(R) CPU E5-2603 v4 @ 1.70GHz
> NIC: Mellanox Technologies MT27700 Family [ConnectX-4] (100G)
>
> XDP Drop/TX single core:
> NUMA  | XDP  | Before    | After
> ---------------------------------------
> Close | Drop | 11   Mpps | 10.9 Mpps
> Far   | Drop | 4.4  Mpps | 5.8  Mpps
>
> Close | TX   | 6.5 Mpps  | 6.5 Mpps
> Far   | TX   | 4   Mpps  | 3.5  Mpps
>
> Improvement is about 30% drop packet rate, 15% tx packet rate for numa far test.

There is a typo here: the TX far test results actually become worse. It
would be good to clarify/fix the cover letter.


* Re: [PATCH net-next 3/4] page_pool: Restructure __page_pool_put_page()
  2019-10-23  8:45   ` Ilias Apalodimas
@ 2019-10-23 18:31     ` Jesper Dangaard Brouer
  2019-10-23 19:09       ` Saeed Mahameed
  0 siblings, 1 reply; 13+ messages in thread
From: Jesper Dangaard Brouer @ 2019-10-23 18:31 UTC (permalink / raw)
  To: Ilias Apalodimas
  Cc: Saeed Mahameed, David S. Miller, netdev, Jonathan Lemon, brouer

On Wed, 23 Oct 2019 11:45:15 +0300
Ilias Apalodimas <ilias.apalodimas@linaro.org> wrote:

> On Tue, Oct 22, 2019 at 04:44:24AM +0000, Saeed Mahameed wrote:
> > From: Jonathan Lemon <jonathan.lemon@gmail.com>
> > 
> > 1) Rename functions to reflect what they are actually doing.
> > 
> > 2) Unify the condition to keep a page.
> > 
> > 3) When page can't be kept in cache, fallback to releasing page to page
> > allocator in one place, instead of calling it from multiple conditions,
> > and reuse __page_pool_return_page().
> > 
> > Signed-off-by: Jonathan Lemon <jonathan.lemon@gmail.com>
> > Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
> > ---
> >  net/core/page_pool.c | 38 +++++++++++++++++++-------------------
> >  1 file changed, 19 insertions(+), 19 deletions(-)
> > 
> > diff --git a/net/core/page_pool.c b/net/core/page_pool.c
> > index 8120aec999ce..65680aaa0818 100644
> > --- a/net/core/page_pool.c
> > +++ b/net/core/page_pool.c
> > @@ -258,6 +258,7 @@ static bool __page_pool_recycle_into_ring(struct page_pool *pool,
> >  				   struct page *page)
> >  {
> >  	int ret;
> > +
> >  	/* BH protection not needed if current is serving softirq */
> >  	if (in_serving_softirq())
> >  		ret = ptr_ring_produce(&pool->ring, page);
> > @@ -272,8 +273,8 @@ static bool __page_pool_recycle_into_ring(struct page_pool *pool,
> >   *
> >   * Caller must provide appropriate safe context.
> >   */
> > -static bool __page_pool_recycle_direct(struct page *page,
> > -				       struct page_pool *pool)
> > +static bool __page_pool_recycle_into_cache(struct page *page,
> > +					   struct page_pool *pool)
> >  {
> >  	if (unlikely(pool->alloc.count == PP_ALLOC_CACHE_SIZE))
> >  		return false;
> > @@ -283,15 +284,18 @@ static bool __page_pool_recycle_direct(struct page *page,
> >  	return true;
> >  }
> >  
> > -/* page is NOT reusable when:
> > - * 1) allocated when system is under some pressure. (page_is_pfmemalloc)
> > - * 2) belongs to a different NUMA node than pool->p.nid.
> > +/* Keep page in caches only if page:
> > + * 1) wasn't allocated when system is under some pressure (page_is_pfmemalloc).
> > + * 2) belongs to pool's numa node (pool->p.nid).
> > + * 3) refcount is 1 (owned by page pool).
> >   *
> >   * To update pool->p.nid users must call page_pool_update_nid.
> >   */
> > -static bool pool_page_reusable(struct page_pool *pool, struct page *page)
> > +static bool page_pool_keep_page(struct page_pool *pool, struct page *page)
> >  {
> > -	return !page_is_pfmemalloc(page) && page_to_nid(page) == pool->p.nid;
> > +	return !page_is_pfmemalloc(page) &&
> > +	       page_to_nid(page) == pool->p.nid &&
> > +	       page_ref_count(page) == 1;
> >  }
> >  
> >  void __page_pool_put_page(struct page_pool *pool,
> > @@ -300,22 +304,19 @@ void __page_pool_put_page(struct page_pool *pool,
> >  	/* This allocator is optimized for the XDP mode that uses
> >  	 * one-frame-per-page, but have fallbacks that act like the
> >  	 * regular page allocator APIs.
> > -	 *
> > -	 * refcnt == 1 means page_pool owns page, and can recycle it.
> >  	 */
> > -	if (likely(page_ref_count(page) == 1 &&
> > -		   pool_page_reusable(pool, page))) {
> > +
> > +	if (likely(page_pool_keep_page(pool, page))) {
> >  		/* Read barrier done in page_ref_count / READ_ONCE */
> >  
> >  		if (allow_direct && in_serving_softirq())
> > -			if (__page_pool_recycle_direct(page, pool))
> > +			if (__page_pool_recycle_into_cache(page, pool))
> >  				return;
> >  
> > -		if (!__page_pool_recycle_into_ring(pool, page)) {
> > -			/* Cache full, fallback to free pages */
> > -			__page_pool_return_page(pool, page);
> > -		}
> > -		return;
> > +		if (__page_pool_recycle_into_ring(pool, page))
> > +			return;
> > +
> > +		/* Cache full, fallback to return pages */
> >  	}
> >  	/* Fallback/non-XDP mode: API user have elevated refcnt.
> >  	 *
> > @@ -330,8 +331,7 @@ void __page_pool_put_page(struct page_pool *pool,
> >  	 * doing refcnt based recycle tricks, meaning another process
> >  	 * will be invoking put_page.
> >  	 */
> > -	__page_pool_clean_page(pool, page);
> > -	put_page(page);
> > +	__page_pool_return_page(pool, page);  
> 
> I think Jesper had a reason for calling them separately instead of 
> __page_pool_return_page + put_page() (which in fact does the same thing). 
> 
> In the future he was planning on removing the __page_pool_clean_page call from
> there, since someone might call __page_pool_put_page() after someone has called
> __page_pool_clean_page()

Yes.  We need to work on removing this  __page_pool_clean_page() call,
to fulfill the plans of SKB returning/recycling page_pool pages.

> Can we leave the calls there as-is?

Yes, please.

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer



* Re: [PATCH net-next 2/4] page_pool: Don't recycle non-reusable pages
  2019-10-22  4:44 ` [PATCH net-next 2/4] page_pool: Don't recycle non-reusable pages Saeed Mahameed
@ 2019-10-23 18:38   ` Jesper Dangaard Brouer
  2019-10-23 19:09     ` Saeed Mahameed
  0 siblings, 1 reply; 13+ messages in thread
From: Jesper Dangaard Brouer @ 2019-10-23 18:38 UTC (permalink / raw)
  To: Saeed Mahameed
  Cc: David S. Miller, netdev, Jonathan Lemon, ilias.apalodimas, brouer

On Tue, 22 Oct 2019 04:44:21 +0000
Saeed Mahameed <saeedm@mellanox.com> wrote:

> A page is NOT reusable when at least one of the following is true:
> 1) allocated when system was under some pressure. (page_is_pfmemalloc)
> 2) belongs to a different NUMA node than pool->p.nid.
> 
> To update pool->p.nid users should call page_pool_update_nid().
> 
> Holding on to such pages in the pool will hurt the consumer performance
> when the pool migrates to a different numa node.
> 
> Performance testing:
> XDP drop/tx rate and TCP single/multi stream, on mlx5 driver
> while migrating rx ring irq from close to far numa:
> 
> mlx5 internal page cache was locally disabled to get pure page pool
> results.
> 
> CPU: Intel(R) Xeon(R) CPU E5-2603 v4 @ 1.70GHz
> NIC: Mellanox Technologies MT27700 Family [ConnectX-4] (100G)
> 
> XDP Drop/TX single core:
> NUMA  | XDP  | Before    | After
> ---------------------------------------
> Close | Drop | 11   Mpps | 10.8 Mpps
> Far   | Drop | 4.4  Mpps | 5.8  Mpps
> 
> Close | TX   | 6.5 Mpps  | 6.5 Mpps
> Far   | TX   | 4   Mpps  | 3.5  Mpps
> 
> Improvement is about 30% drop packet rate, 15% tx packet rate for numa
> far test.
> No degradation for numa close tests.
> 
> TCP single/multi cpu/stream:
> NUMA  | #cpu | Before  | After
> --------------------------------------
> Close | 1    | 18 Gbps | 18 Gbps
> Far   | 1    | 15 Gbps | 18 Gbps
> Close | 12   | 80 Gbps | 80 Gbps
> Far   | 12   | 68 Gbps | 80 Gbps
> 
> In all test cases we see improvement for the far numa case, and no
> impact on the close numa case.
> 
> The impact of adding a check per page is very negligible, and shows no
> performance degradation whatsoever, also functionality wise it seems more
> correct and more robust for page pool to verify when pages should be
> recycled, since page pool can't guarantee where pages are coming from.
> 
> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
> Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com>
> ---
>  net/core/page_pool.c | 14 +++++++++++++-
>  1 file changed, 13 insertions(+), 1 deletion(-)
> 
> diff --git a/net/core/page_pool.c b/net/core/page_pool.c
> index 08ca9915c618..8120aec999ce 100644
> --- a/net/core/page_pool.c
> +++ b/net/core/page_pool.c
> @@ -283,6 +283,17 @@ static bool __page_pool_recycle_direct(struct page *page,
>  	return true;
>  }
>  
> +/* page is NOT reusable when:
> + * 1) allocated when system is under some pressure. (page_is_pfmemalloc)
> + * 2) belongs to a different NUMA node than pool->p.nid.
> + *
> + * To update pool->p.nid users must call page_pool_update_nid.
> + */
> +static bool pool_page_reusable(struct page_pool *pool, struct page *page)
> +{
> +	return !page_is_pfmemalloc(page) && page_to_nid(page) == pool->p.nid;

I think we have discussed this before. You are adding the
page_is_pfmemalloc(page) memory pressure test, even though the
allocation side of page_pool will not give us these kinds of pages.

I'm going to accept this anyway, as it is a good safeguard; recycling
such a page would be a very bad thing.  Performance wise, you have
shown it has almost zero impact, which I guess is because we are
already reading the struct page area here.
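
An illustrative sketch of that point (the helper below is not from the
series; the per-line notes on which struct page fields each accessor
reads reflect kernels of this era and may differ in detail):

#include <linux/mm.h>
#include <net/page_pool.h>

/* All three reads hit the same struct page, which the refcount check has
 * already brought into the cache, so the extra test is essentially free.
 */
static bool recycle_test_sketch(const struct page_pool *pool,
				struct page *page)
{
	return page_ref_count(page) == 1 &&	  /* page->_refcount          */
	       !page_is_pfmemalloc(page) &&	  /* pfmemalloc marker        */
	       page_to_nid(page) == pool->p.nid;  /* node bits in page->flags */
}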

> +}
> +
>  void __page_pool_put_page(struct page_pool *pool,
>  			  struct page *page, bool allow_direct)
>  {
> @@ -292,7 +303,8 @@ void __page_pool_put_page(struct page_pool *pool,
>  	 *
>  	 * refcnt == 1 means page_pool owns page, and can recycle it.
>  	 */
> -	if (likely(page_ref_count(page) == 1)) {
> +	if (likely(page_ref_count(page) == 1 &&
> +		   pool_page_reusable(pool, page))) {
>  		/* Read barrier done in page_ref_count / READ_ONCE */
>  
>  		if (allow_direct && in_serving_softirq())



-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer



* Re: [PATCH net-next 2/4] page_pool: Don't recycle non-reusable pages
  2019-10-23 18:38   ` Jesper Dangaard Brouer
@ 2019-10-23 19:09     ` Saeed Mahameed
  0 siblings, 0 replies; 13+ messages in thread
From: Saeed Mahameed @ 2019-10-23 19:09 UTC (permalink / raw)
  To: brouer; +Cc: jonathan.lemon, davem, netdev, ilias.apalodimas

On Wed, 2019-10-23 at 20:38 +0200, Jesper Dangaard Brouer wrote:
> On Tue, 22 Oct 2019 04:44:21 +0000
> Saeed Mahameed <saeedm@mellanox.com> wrote:
> 
> > A page is NOT reusable when at least one of the following is true:
> > 1) allocated when system was under some pressure.
> > (page_is_pfmemalloc)
> > 2) belongs to a different NUMA node than pool->p.nid.
> > 
> > To update pool->p.nid users should call page_pool_update_nid().
> > 
> > Holding on to such pages in the pool will hurt the consumer
> > performance
> > when the pool migrates to a different numa node.
> > 
> > Performance testing:
> > XDP drop/tx rate and TCP single/multi stream, on mlx5 driver
> > while migrating rx ring irq from close to far numa:
> > 
> > mlx5 internal page cache was locally disabled to get pure page pool
> > results.
> > 
> > CPU: Intel(R) Xeon(R) CPU E5-2603 v4 @ 1.70GHz
> > NIC: Mellanox Technologies MT27700 Family [ConnectX-4] (100G)
> > 
> > XDP Drop/TX single core:
> > NUMA  | XDP  | Before    | After
> > ---------------------------------------
> > Close | Drop | 11   Mpps | 10.8 Mpps
> > Far   | Drop | 4.4  Mpps | 5.8  Mpps
> > 
> > Close | TX   | 6.5 Mpps  | 6.5 Mpps
> > Far   | TX   | 4   Mpps  | 3.5  Mpps
> > 
> > Improvement is about 30% drop packet rate, 15% tx packet rate for
> > numa
> > far test.
> > No degradation for numa close tests.
> > 
> > TCP single/multi cpu/stream:
> > NUMA  | #cpu | Before  | After
> > --------------------------------------
> > Close | 1    | 18 Gbps | 18 Gbps
> > Far   | 1    | 15 Gbps | 18 Gbps
> > Close | 12   | 80 Gbps | 80 Gbps
> > Far   | 12   | 68 Gbps | 80 Gbps
> > 
> > In all test cases we see improvement for the far numa case, and no
> > impact on the close numa case.
> > 
> > The impact of adding a check per page is very negligible, and shows
> > no
> > performance degradation whatsoever, also functionality wise it
> > seems more
> > correct and more robust for page pool to verify when pages should
> > be
> > recycled, since page pool can't guarantee where pages are coming
> > from.
> > 
> > Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
> > Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com>
> > ---
> >  net/core/page_pool.c | 14 +++++++++++++-
> >  1 file changed, 13 insertions(+), 1 deletion(-)
> > 
> > diff --git a/net/core/page_pool.c b/net/core/page_pool.c
> > index 08ca9915c618..8120aec999ce 100644
> > --- a/net/core/page_pool.c
> > +++ b/net/core/page_pool.c
> > @@ -283,6 +283,17 @@ static bool __page_pool_recycle_direct(struct
> > page *page,
> >  	return true;
> >  }
> >  
> > +/* page is NOT reusable when:
> > + * 1) allocated when system is under some pressure.
> > (page_is_pfmemalloc)
> > + * 2) belongs to a different NUMA node than pool->p.nid.
> > + *
> > + * To update pool->p.nid users must call page_pool_update_nid.
> > + */
> > +static bool pool_page_reusable(struct page_pool *pool, struct page
> > *page)
> > +{
> > +	return !page_is_pfmemalloc(page) && page_to_nid(page) == pool-
> > >p.nid;
> 
> I think we have discussed this before. You are adding the
> page_is_pfmemalloc(page) memory pressure test, even-though the
> allocation side of page_pool will not give us these kind of pages.
> 
> I'm going to accept this anyway, as it is a good safeguard, as it is
> a
> very bad thing to recycle such a page.  Performance wise, you have
> showed it have almost zero impact, which I guess is because we are
> already reading the struct page area here.

Yes, that is the case, and since the page pool allows consumers to
return any page to the pool (even pages that weren't allocated using
the pool APIs), it seems necessary to do such checks.
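
An illustrative fragment of the situation described here (the driver-side
function name is hypothetical; only the page_pool call is from the real
API): a consumer may hand back pages of mixed origin or from a stale numa
node, so the pool makes the reuse decision itself at return time.

#include <net/page_pool.h>

/* Hypothetical RX completion path: whatever page the descriptor holds --
 * allocated before or after a nid update, or even a pfmemalloc page --
 * goes back through the pool, and pool_page_reusable() decides whether
 * it is recycled or released to the page allocator.
 */
static void example_rx_complete(struct page_pool *pool, struct page *page)
{
	page_pool_recycle_direct(pool, page);
}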




* Re: [PATCH net-next 3/4] page_pool: Restructure __page_pool_put_page()
  2019-10-23 18:31     ` Jesper Dangaard Brouer
@ 2019-10-23 19:09       ` Saeed Mahameed
  0 siblings, 0 replies; 13+ messages in thread
From: Saeed Mahameed @ 2019-10-23 19:09 UTC (permalink / raw)
  To: ilias.apalodimas, brouer; +Cc: jonathan.lemon, davem, netdev

On Wed, 2019-10-23 at 20:31 +0200, Jesper Dangaard Brouer wrote:
> On Wed, 23 Oct 2019 11:45:15 +0300
> Ilias Apalodimas <ilias.apalodimas@linaro.org> wrote:
> 
> > On Tue, Oct 22, 2019 at 04:44:24AM +0000, Saeed Mahameed wrote:
> > > From: Jonathan Lemon <jonathan.lemon@gmail.com>
> > > 
> > > 1) Rename functions to reflect what they are actually doing.
> > > 
> > > 2) Unify the condition to keep a page.
> > > 
> > > 3) When page can't be kept in cache, fallback to releasing page
> > > to page
> > > allocator in one place, instead of calling it from multiple
> > > conditions,
> > > and reuse __page_pool_return_page().
> > > 
> > > Signed-off-by: Jonathan Lemon <jonathan.lemon@gmail.com>
> > > Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
> > > ---
> > >  net/core/page_pool.c | 38 +++++++++++++++++++-------------------
> > >  1 file changed, 19 insertions(+), 19 deletions(-)
> > > 
> > > diff --git a/net/core/page_pool.c b/net/core/page_pool.c
> > > index 8120aec999ce..65680aaa0818 100644
> > > --- a/net/core/page_pool.c
> > > +++ b/net/core/page_pool.c
> > > @@ -258,6 +258,7 @@ static bool
> > > __page_pool_recycle_into_ring(struct page_pool *pool,
> > >  				   struct page *page)
> > >  {
> > >  	int ret;
> > > +
> > >  	/* BH protection not needed if current is serving softirq */
> > >  	if (in_serving_softirq())
> > >  		ret = ptr_ring_produce(&pool->ring, page);
> > > @@ -272,8 +273,8 @@ static bool
> > > __page_pool_recycle_into_ring(struct page_pool *pool,
> > >   *
> > >   * Caller must provide appropriate safe context.
> > >   */
> > > -static bool __page_pool_recycle_direct(struct page *page,
> > > -				       struct page_pool *pool)
> > > +static bool __page_pool_recycle_into_cache(struct page *page,
> > > +					   struct page_pool *pool)
> > >  {
> > >  	if (unlikely(pool->alloc.count == PP_ALLOC_CACHE_SIZE))
> > >  		return false;
> > > @@ -283,15 +284,18 @@ static bool
> > > __page_pool_recycle_direct(struct page *page,
> > >  	return true;
> > >  }
> > >  
> > > -/* page is NOT reusable when:
> > > - * 1) allocated when system is under some pressure.
> > > (page_is_pfmemalloc)
> > > - * 2) belongs to a different NUMA node than pool->p.nid.
> > > +/* Keep page in caches only if page:
> > > + * 1) wasn't allocated when system is under some pressure
> > > (page_is_pfmemalloc).
> > > + * 2) belongs to pool's numa node (pool->p.nid).
> > > + * 3) refcount is 1 (owned by page pool).
> > >   *
> > >   * To update pool->p.nid users must call page_pool_update_nid.
> > >   */
> > > -static bool pool_page_reusable(struct page_pool *pool, struct
> > > page *page)
> > > +static bool page_pool_keep_page(struct page_pool *pool, struct
> > > page *page)
> > >  {
> > > -	return !page_is_pfmemalloc(page) && page_to_nid(page) == pool-
> > > >p.nid;
> > > +	return !page_is_pfmemalloc(page) &&
> > > +	       page_to_nid(page) == pool->p.nid &&
> > > +	       page_ref_count(page) == 1;
> > >  }
> > >  
> > >  void __page_pool_put_page(struct page_pool *pool,
> > > @@ -300,22 +304,19 @@ void __page_pool_put_page(struct page_pool
> > > *pool,
> > >  	/* This allocator is optimized for the XDP mode that uses
> > >  	 * one-frame-per-page, but have fallbacks that act like the
> > >  	 * regular page allocator APIs.
> > > -	 *
> > > -	 * refcnt == 1 means page_pool owns page, and can recycle it.
> > >  	 */
> > > -	if (likely(page_ref_count(page) == 1 &&
> > > -		   pool_page_reusable(pool, page))) {
> > > +
> > > +	if (likely(page_pool_keep_page(pool, page))) {
> > >  		/* Read barrier done in page_ref_count / READ_ONCE */
> > >  
> > >  		if (allow_direct && in_serving_softirq())
> > > -			if (__page_pool_recycle_direct(page, pool))
> > > +			if (__page_pool_recycle_into_cache(page, pool))
> > >  				return;
> > >  
> > > -		if (!__page_pool_recycle_into_ring(pool, page)) {
> > > -			/* Cache full, fallback to free pages */
> > > -			__page_pool_return_page(pool, page);
> > > -		}
> > > -		return;
> > > +		if (__page_pool_recycle_into_ring(pool, page))
> > > +			return;
> > > +
> > > +		/* Cache full, fallback to return pages */
> > >  	}
> > >  	/* Fallback/non-XDP mode: API user have elevated refcnt.
> > >  	 *
> > > @@ -330,8 +331,7 @@ void __page_pool_put_page(struct page_pool
> > > *pool,
> > >  	 * doing refcnt based recycle tricks, meaning another process
> > >  	 * will be invoking put_page.
> > >  	 */
> > > -	__page_pool_clean_page(pool, page);
> > > -	put_page(page);
> > > +	__page_pool_return_page(pool, page);  
> > 
> > I think Jesper had a reason for calling them separately instead of 
> > __page_pool_return_page + put_page() (which in fact does the same
> > thing). 
> > 
> > In the future he was planning on removing the
> > __page_pool_clean_page call from
> > there, since someone might call __page_pool_put_page() after
> > someone has called
> > __page_pool_clean_page()
> 
> Yes.  We need to work on removing this  __page_pool_clean_page()
> call,
> to fulfill the plans of SKB returning/recycling page_pool pages.
> 
> > Can we leave the calls there as-is?
> 
> Yes, please.
> 

Sure, I will drop this patch for now.


* Re: [PATCH net-next 0/4] page_pool: API for numa node change handling
  2019-10-23 15:20 ` Or Gerlitz
@ 2019-10-23 19:10   ` Saeed Mahameed
  0 siblings, 0 replies; 13+ messages in thread
From: Saeed Mahameed @ 2019-10-23 19:10 UTC (permalink / raw)
  To: gerlitz.or; +Cc: jonathan.lemon, davem, netdev, brouer, ilias.apalodimas

On Wed, 2019-10-23 at 18:20 +0300, Or Gerlitz wrote:
> On Tue, Oct 22, 2019 at 8:04 AM Saeed Mahameed <saeedm@mellanox.com>
> wrote:
> 
> > CPU: Intel(R) Xeon(R) CPU E5-2603 v4 @ 1.70GHz
> > NIC: Mellanox Technologies MT27700 Family [ConnectX-4] (100G)
> > 
> > XDP Drop/TX single core:
> > NUMA  | XDP  | Before    | After
> > ---------------------------------------
> > Close | Drop | 11   Mpps | 10.9 Mpps
> > Far   | Drop | 4.4  Mpps | 5.8  Mpps
> > 
> > Close | TX   | 6.5 Mpps  | 6.5 Mpps
> > Far   | TX   | 4   Mpps  | 3.5  Mpps
> > 
> > Improvement is about 30% drop packet rate, 15% tx packet rate for
> > numa far test.
> 
> some typo here, the TX far test results become worse, would be good
> to
> clarify/fix the cover letter

Nice catch, will fix the documentation!



Thread overview: 13+ messages
2019-10-22  4:44 [PATCH net-next 0/4] page_pool: API for numa node change handling Saeed Mahameed
2019-10-22  4:44 ` [PATCH net-next 1/4] page_pool: Add API to update numa node Saeed Mahameed
2019-10-22  4:44 ` [PATCH net-next 2/4] page_pool: Don't recycle non-reusable pages Saeed Mahameed
2019-10-23 18:38   ` Jesper Dangaard Brouer
2019-10-23 19:09     ` Saeed Mahameed
2019-10-22  4:44 ` [PATCH net-next 3/4] page_pool: Restructure __page_pool_put_page() Saeed Mahameed
2019-10-23  8:45   ` Ilias Apalodimas
2019-10-23 18:31     ` Jesper Dangaard Brouer
2019-10-23 19:09       ` Saeed Mahameed
2019-10-22  4:44 ` [PATCH net-next 4/4] net/mlx5e: Rx, Update page pool numa node when changed Saeed Mahameed
2019-10-22 16:33 ` [PATCH net-next 0/4] page_pool: API for numa node change handling Jonathan Lemon
2019-10-23 15:20 ` Or Gerlitz
2019-10-23 19:10   ` Saeed Mahameed
