* [PATCH net-nex V2 0/3] page_pool: API for numa node change handling
@ 2019-10-23 19:36 Saeed Mahameed
  2019-10-23 19:36 ` [PATCH net-nex V2 1/3] page_pool: Add API to update numa node Saeed Mahameed
                   ` (3 more replies)
  0 siblings, 4 replies; 10+ messages in thread
From: Saeed Mahameed @ 2019-10-23 19:36 UTC (permalink / raw)
  To: David S. Miller, Jesper Dangaard Brouer
  Cc: netdev, Jonathan Lemon, ilias.apalodimas, Saeed Mahameed

Hi Dave & Jesper,

This series extends the page pool API to allow page pool consumers to
update the page pool numa node on the fly. This is required because, on
some systems, rx ring irqs can migrate between numa nodes due to the irq
balancer or user-defined scripts; the current page pool has no way to know
of such a migration and will keep allocating and holding on to pages from
the wrong numa node, which hurts consumer performance.

1) Add an API to update the numa node id of the page pool.
Consumers will call this API to update the page pool numa node id.

2) Don't recycle non-reusable pages:
The page pool will check upon page return whether a page is suitable for
recycling or not. A page is NOT recycled:
 2.1) when it belongs to a different numa node.
 2.2) when it was allocated under memory pressure.

3) mlx5 will use the new API to update the page pool numa id on demand
(a short usage sketch follows below).
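
A minimal usage sketch of 1) + 3), modeled on the mlx5 change in patch 3/3
(the driver struct and helpers below are made up for illustration; only
page_pool_nid_changed() and numa_mem_id() come from this series and the
kernel):

/* Illustrative NAPI poll fragment, not a real driver. */
static int my_rx_napi_poll(struct napi_struct *napi, int budget)
{
	struct my_rx_ring *ring = container_of(napi, struct my_rx_ring, napi);

	/* Once per poll cycle, tell the pool which numa node we now run on.
	 * page_pool_nid_changed() is a cheap inline compare and only calls
	 * page_pool_update_nid() when the node actually changed.
	 */
	if (ring->page_pool)
		page_pool_nid_changed(ring->page_pool, numa_mem_id());

	return my_rx_process_cqes(ring, budget);
}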

The series is joint work between me and Jonathan. We tested it, and it
proved itself worthy: it avoids page allocator bottlenecks and improves
packet rate and cpu utilization significantly in the scenarios described
above.

Performance testing:
XDP drop/tx rate and TCP single/multi stream on the mlx5 driver,
while migrating the rx ring irq from the close numa node to the far one:

The mlx5 internal page cache was locally disabled to get pure page pool
results.

CPU: Intel(R) Xeon(R) CPU E5-2603 v4 @ 1.70GHz
NIC: Mellanox Technologies MT27700 Family [ConnectX-4] (100G)

XDP Drop/TX single core:
NUMA  | XDP  | Before    | After
---------------------------------------
Close | Drop | 11   Mpps | 10.9 Mpps
Far   | Drop | 4.4  Mpps | 5.8  Mpps

Close | TX   | 6.5 Mpps  | 6.5 Mpps
Far   | TX   | 3.5 Mpps  | 4   Mpps

The improvement is about 30% in drop packet rate and 15% in tx packet rate
for the far numa test.
No degradation in the close numa tests.

TCP single/multi cpu/stream:
NUMA  | #cpu | Before  | After
--------------------------------------
Close | 1    | 18 Gbps | 18 Gbps
Far   | 1    | 15 Gbps | 18 Gbps
Close | 12   | 80 Gbps | 80 Gbps
Far   | 12   | 68 Gbps | 80 Gbps

In all test cases we see improvement for the far numa case, and no
impact on the close numa case.

v1->v2:
  - Drop last patch, as requested by Ilias and Jesper.
  - Fix documentation's performance numbers order.

Thanks,
Saeed.

---

Saeed Mahameed (3):
  page_pool: Add API to update numa node
  page_pool: Don't recycle non-reusable pages
  net/mlx5e: Rx, Update page pool numa node when changed

 .../net/ethernet/mellanox/mlx5/core/en_rx.c   |  3 +++
 include/net/page_pool.h                       |  7 ++++++
 include/trace/events/page_pool.h              | 22 +++++++++++++++++++
 net/core/page_pool.c                          | 22 ++++++++++++++++++-
 4 files changed, 53 insertions(+), 1 deletion(-)

-- 
2.21.0



* [PATCH net-nex V2 1/3] page_pool: Add API to update numa node
  2019-10-23 19:36 [PATCH net-nex V2 0/3] page_pool: API for numa node change handling Saeed Mahameed
@ 2019-10-23 19:36 ` Saeed Mahameed
  2019-10-24  4:50   ` Ilias Apalodimas
  2019-10-23 19:37 ` [PATCH net-nex V2 2/3] page_pool: Don't recycle non-reusable pages Saeed Mahameed
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 10+ messages in thread
From: Saeed Mahameed @ 2019-10-23 19:36 UTC (permalink / raw)
  To: David S. Miller, Jesper Dangaard Brouer
  Cc: netdev, Jonathan Lemon, ilias.apalodimas, Saeed Mahameed

Add page_pool_update_nid() to be called by page pool consumers when they
detect numa node changes.

It will update the page pool nid value to start allocating from the new
effective numa node.

This is to mitigate page pool allocating pages from a wrong numa node,
where the pool was originally allocated, and holding on to pages that
belong to a different numa node, which causes performance degradation.

For pages that are already being consumed and could be returned to the
pool by the consumer, in next patch we will add a check per page to avoid
recycling them back to the pool and return them to the page allocator.

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com>
---
 include/net/page_pool.h          |  7 +++++++
 include/trace/events/page_pool.h | 22 ++++++++++++++++++++++
 net/core/page_pool.c             |  8 ++++++++
 3 files changed, 37 insertions(+)

diff --git a/include/net/page_pool.h b/include/net/page_pool.h
index 2cbcdbdec254..f46b78408e44 100644
--- a/include/net/page_pool.h
+++ b/include/net/page_pool.h
@@ -226,4 +226,11 @@ static inline bool page_pool_put(struct page_pool *pool)
 	return refcount_dec_and_test(&pool->user_cnt);
 }
 
+/* Caller must provide appropriate safe context, e.g. NAPI. */
+void page_pool_update_nid(struct page_pool *pool, int new_nid);
+static inline void page_pool_nid_changed(struct page_pool *pool, int new_nid)
+{
+	if (unlikely(pool->p.nid != new_nid))
+		page_pool_update_nid(pool, new_nid);
+}
 #endif /* _NET_PAGE_POOL_H */
diff --git a/include/trace/events/page_pool.h b/include/trace/events/page_pool.h
index 47b5ee880aa9..b58b6a3a3e57 100644
--- a/include/trace/events/page_pool.h
+++ b/include/trace/events/page_pool.h
@@ -81,6 +81,28 @@ TRACE_EVENT(page_pool_state_hold,
 		  __entry->pool, __entry->page, __entry->hold)
 );
 
+TRACE_EVENT(page_pool_update_nid,
+
+	TP_PROTO(const struct page_pool *pool, int new_nid),
+
+	TP_ARGS(pool, new_nid),
+
+	TP_STRUCT__entry(
+		__field(const struct page_pool *, pool)
+		__field(int,			  pool_nid)
+		__field(int,			  new_nid)
+	),
+
+	TP_fast_assign(
+		__entry->pool		= pool;
+		__entry->pool_nid	= pool->p.nid;
+		__entry->new_nid	= new_nid;
+	),
+
+	TP_printk("page_pool=%p pool_nid=%d new_nid=%d",
+		  __entry->pool, __entry->pool_nid, __entry->new_nid)
+);
+
 #endif /* _TRACE_PAGE_POOL_H */
 
 /* This part must be outside protection */
diff --git a/net/core/page_pool.c b/net/core/page_pool.c
index 5bc65587f1c4..953af6d414fb 100644
--- a/net/core/page_pool.c
+++ b/net/core/page_pool.c
@@ -397,3 +397,11 @@ bool __page_pool_request_shutdown(struct page_pool *pool)
 	return __page_pool_safe_to_destroy(pool);
 }
 EXPORT_SYMBOL(__page_pool_request_shutdown);
+
+/* Caller must provide appropriate safe context, e.g. NAPI. */
+void page_pool_update_nid(struct page_pool *pool, int new_nid)
+{
+	trace_page_pool_update_nid(pool, new_nid);
+	pool->p.nid = new_nid;
+}
+EXPORT_SYMBOL(page_pool_update_nid);
-- 
2.21.0



* [PATCH net-nex V2 2/3] page_pool: Don't recycle non-reusable pages
  2019-10-23 19:36 [PATCH net-nex V2 0/3] page_pool: API for numa node change handling Saeed Mahameed
  2019-10-23 19:36 ` [PATCH net-nex V2 1/3] page_pool: Add API to update numa node Saeed Mahameed
@ 2019-10-23 19:37 ` Saeed Mahameed
  2019-10-24  5:00   ` Ilias Apalodimas
  2019-10-25 13:33   ` Jesper Dangaard Brouer
  2019-10-23 19:37 ` [PATCH net-nex V2 3/3] net/mlx5e: Rx, Update page pool numa node when changed Saeed Mahameed
  2019-10-28 18:31 ` [PATCH net-nex V2 0/3] page_pool: API for numa node change handling David Miller
  3 siblings, 2 replies; 10+ messages in thread
From: Saeed Mahameed @ 2019-10-23 19:37 UTC (permalink / raw)
  To: David S. Miller, Jesper Dangaard Brouer
  Cc: netdev, Jonathan Lemon, ilias.apalodimas, Saeed Mahameed

A page is NOT reusable when at least one of the following is true:
1) allocated when system was under some pressure. (page_is_pfmemalloc)
2) belongs to a different NUMA node than pool->p.nid.

To update pool->p.nid users should call page_pool_update_nid().

Holding on to such pages in the pool will hurt the consumer performance
when the pool migrates to a different numa node.

Performance testing:
XDP drop/tx rate and TCP single/multi stream, on mlx5 driver
while migrating rx ring irq from close to far numa:

mlx5 internal page cache was locally disabled to get pure page pool
results.

CPU: Intel(R) Xeon(R) CPU E5-2603 v4 @ 1.70GHz
NIC: Mellanox Technologies MT27700 Family [ConnectX-4] (100G)

XDP Drop/TX single core:
NUMA  | XDP  | Before    | After
---------------------------------------
Close | Drop | 11   Mpps | 10.9 Mpps
Far   | Drop | 4.4  Mpps | 5.8  Mpps

Close | TX   | 6.5 Mpps  | 6.5 Mpps
Far   | TX   | 3.5 Mpps  | 4  Mpps

Improvement is about 30% drop packet rate, 15% tx packet rate for numa
far test.
No degradation for numa close tests.

TCP single/multi cpu/stream:
NUMA  | #cpu | Before  | After
--------------------------------------
Close | 1    | 18 Gbps | 18 Gbps
Far   | 1    | 15 Gbps | 18 Gbps
Close | 12   | 80 Gbps | 80 Gbps
Far   | 12   | 68 Gbps | 80 Gbps

In all test cases we see improvement for the far numa case, and no
impact on the close numa case.

The impact of adding a check per page is very negligible, and shows no
performance degradation whatsoever, also functionality wise it seems more
correct and more robust for page pool to verify when pages should be
recycled, since page pool can't guarantee where pages are coming from.

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com>
---
 net/core/page_pool.c | 14 +++++++++++++-
 1 file changed, 13 insertions(+), 1 deletion(-)

diff --git a/net/core/page_pool.c b/net/core/page_pool.c
index 953af6d414fb..73e4173c4dce 100644
--- a/net/core/page_pool.c
+++ b/net/core/page_pool.c
@@ -283,6 +283,17 @@ static bool __page_pool_recycle_direct(struct page *page,
 	return true;
 }
 
+/* page is NOT reusable when:
+ * 1) allocated when system is under some pressure. (page_is_pfmemalloc)
+ * 2) belongs to a different NUMA node than pool->p.nid.
+ *
+ * To update pool->p.nid users must call page_pool_update_nid.
+ */
+static bool pool_page_reusable(struct page_pool *pool, struct page *page)
+{
+	return !page_is_pfmemalloc(page) && page_to_nid(page) == pool->p.nid;
+}
+
 void __page_pool_put_page(struct page_pool *pool,
 			  struct page *page, bool allow_direct)
 {
@@ -292,7 +303,8 @@ void __page_pool_put_page(struct page_pool *pool,
 	 *
 	 * refcnt == 1 means page_pool owns page, and can recycle it.
 	 */
-	if (likely(page_ref_count(page) == 1)) {
+	if (likely(page_ref_count(page) == 1 &&
+		   pool_page_reusable(pool, page))) {
 		/* Read barrier done in page_ref_count / READ_ONCE */
 
 		if (allow_direct && in_serving_softirq())
-- 
2.21.0



* [PATCH net-nex V2 3/3] net/mlx5e: Rx, Update page pool numa node when changed
  2019-10-23 19:36 [PATCH net-nex V2 0/3] page_pool: API for numa node change handling Saeed Mahameed
  2019-10-23 19:36 ` [PATCH net-nex V2 1/3] page_pool: Add API to update numa node Saeed Mahameed
  2019-10-23 19:37 ` [PATCH net-nex V2 2/3] page_pool: Don't recycle non-reusable pages Saeed Mahameed
@ 2019-10-23 19:37 ` Saeed Mahameed
  2019-10-28 18:31 ` [PATCH net-nex V2 0/3] page_pool: API for numa node change handling David Miller
  3 siblings, 0 replies; 10+ messages in thread
From: Saeed Mahameed @ 2019-10-23 19:37 UTC (permalink / raw)
  To: David S. Miller, Jesper Dangaard Brouer
  Cc: netdev, Jonathan Lemon, ilias.apalodimas, Saeed Mahameed

Once every napi poll cycle, check whether the current numa node differs
from the page pool's numa id, and if so update it using
page_pool_nid_changed()/page_pool_update_nid().

Alternatively, we could have registered an irq affinity change handler,
but page_pool_update_nid() must be called from napi context anyway, so
such a handler wouldn't actually help.

Performance testing:
XDP drop/tx rate and TCP single/multi stream, on mlx5 driver
while migrating rx ring irq from close to far numa:

mlx5 internal page cache was locally disabled to get pure page pool
results.

CPU: Intel(R) Xeon(R) CPU E5-2603 v4 @ 1.70GHz
NIC: Mellanox Technologies MT27700 Family [ConnectX-4] (100G)

XDP Drop/TX single core:
NUMA  | XDP  | Before    | After
---------------------------------------
Close | Drop | 11   Mpps | 10.9 Mpps
Far   | Drop | 4.4  Mpps | 5.8  Mpps

Close | TX   | 6.5 Mpps  | 6.5 Mpps
Far   | TX   | 3.5 Mpps  | 4  Mpps

Improvement is about 30% drop packet rate, 15% tx packet rate for numa
far test.
No degradation for numa close tests.

TCP single/multi cpu/stream:
NUMA  | #cpu | Before  | After
--------------------------------------
Close | 1    | 18 Gbps | 18 Gbps
Far   | 1    | 15 Gbps | 18 Gbps
Close | 12   | 80 Gbps | 80 Gbps
Far   | 12   | 68 Gbps | 80 Gbps

In all test cases we see improvement for the far numa case, and no
impact on the close numa case.

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en_rx.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
index d6a547238de0..3b6d55797c0f 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
@@ -1386,6 +1386,9 @@ int mlx5e_poll_rx_cq(struct mlx5e_cq *cq, int budget)
 	if (unlikely(!test_bit(MLX5E_RQ_STATE_ENABLED, &rq->state)))
 		return 0;
 
+	if (rq->page_pool)
+		page_pool_nid_changed(rq->page_pool, numa_mem_id());
+
 	if (rq->cqd.left)
 		work_done += mlx5e_decompress_cqes_cont(rq, cqwq, 0, budget);
 
-- 
2.21.0



* Re: [PATCH net-nex V2 1/3] page_pool: Add API to update numa node
  2019-10-23 19:36 ` [PATCH net-nex V2 1/3] page_pool: Add API to update numa node Saeed Mahameed
@ 2019-10-24  4:50   ` Ilias Apalodimas
  0 siblings, 0 replies; 10+ messages in thread
From: Ilias Apalodimas @ 2019-10-24  4:50 UTC (permalink / raw)
  To: Saeed Mahameed
  Cc: David S. Miller, Jesper Dangaard Brouer, netdev, Jonathan Lemon

On Wed, Oct 23, 2019 at 07:36:58PM +0000, Saeed Mahameed wrote:
> Add page_pool_update_nid() to be called by page pool consumers when they
> detect numa node changes.
> 
> It will update the page pool nid value to start allocating from the new
> effective numa node.
> 
> This is to mitigate page pool allocating pages from a wrong numa node,
> where the pool was originally allocated, and holding on to pages that
> belong to a different numa node, which causes performance degradation.
> 
> For pages that are already being consumed and could be returned to the
> pool by the consumer, in next patch we will add a check per page to avoid
> recycling them back to the pool and return them to the page allocator.
> 
> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
> Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com>
> ---
>  include/net/page_pool.h          |  7 +++++++
>  include/trace/events/page_pool.h | 22 ++++++++++++++++++++++
>  net/core/page_pool.c             |  8 ++++++++
>  3 files changed, 37 insertions(+)
> 
> diff --git a/include/net/page_pool.h b/include/net/page_pool.h
> index 2cbcdbdec254..f46b78408e44 100644
> --- a/include/net/page_pool.h
> +++ b/include/net/page_pool.h
> @@ -226,4 +226,11 @@ static inline bool page_pool_put(struct page_pool *pool)
>  	return refcount_dec_and_test(&pool->user_cnt);
>  }
>  
> +/* Caller must provide appropriate safe context, e.g. NAPI. */
> +void page_pool_update_nid(struct page_pool *pool, int new_nid);
> +static inline void page_pool_nid_changed(struct page_pool *pool, int new_nid)
> +{
> +	if (unlikely(pool->p.nid != new_nid))
> +		page_pool_update_nid(pool, new_nid);
> +}
>  #endif /* _NET_PAGE_POOL_H */
> diff --git a/include/trace/events/page_pool.h b/include/trace/events/page_pool.h
> index 47b5ee880aa9..b58b6a3a3e57 100644
> --- a/include/trace/events/page_pool.h
> +++ b/include/trace/events/page_pool.h
> @@ -81,6 +81,28 @@ TRACE_EVENT(page_pool_state_hold,
>  		  __entry->pool, __entry->page, __entry->hold)
>  );
>  
> +TRACE_EVENT(page_pool_update_nid,
> +
> +	TP_PROTO(const struct page_pool *pool, int new_nid),
> +
> +	TP_ARGS(pool, new_nid),
> +
> +	TP_STRUCT__entry(
> +		__field(const struct page_pool *, pool)
> +		__field(int,			  pool_nid)
> +		__field(int,			  new_nid)
> +	),
> +
> +	TP_fast_assign(
> +		__entry->pool		= pool;
> +		__entry->pool_nid	= pool->p.nid;
> +		__entry->new_nid	= new_nid;
> +	),
> +
> +	TP_printk("page_pool=%p pool_nid=%d new_nid=%d",
> +		  __entry->pool, __entry->pool_nid, __entry->new_nid)
> +);
> +
>  #endif /* _TRACE_PAGE_POOL_H */
>  
>  /* This part must be outside protection */
> diff --git a/net/core/page_pool.c b/net/core/page_pool.c
> index 5bc65587f1c4..953af6d414fb 100644
> --- a/net/core/page_pool.c
> +++ b/net/core/page_pool.c
> @@ -397,3 +397,11 @@ bool __page_pool_request_shutdown(struct page_pool *pool)
>  	return __page_pool_safe_to_destroy(pool);
>  }
>  EXPORT_SYMBOL(__page_pool_request_shutdown);
> +
> +/* Caller must provide appropriate safe context, e.g. NAPI. */
> +void page_pool_update_nid(struct page_pool *pool, int new_nid)
> +{
> +	trace_page_pool_update_nid(pool, new_nid);
> +	pool->p.nid = new_nid;
> +}
> +EXPORT_SYMBOL(page_pool_update_nid);
> -- 
> 2.21.0
> 

Reviewed-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>


* Re: [PATCH net-nex V2 2/3] page_pool: Don't recycle non-reusable pages
  2019-10-23 19:37 ` [PATCH net-nex V2 2/3] page_pool: Don't recycle non-reusable pages Saeed Mahameed
@ 2019-10-24  5:00   ` Ilias Apalodimas
  2019-10-25 13:33   ` Jesper Dangaard Brouer
  1 sibling, 0 replies; 10+ messages in thread
From: Ilias Apalodimas @ 2019-10-24  5:00 UTC (permalink / raw)
  To: Saeed Mahameed
  Cc: David S. Miller, Jesper Dangaard Brouer, netdev, Jonathan Lemon

On Wed, Oct 23, 2019 at 07:37:00PM +0000, Saeed Mahameed wrote:
> A page is NOT reusable when at least one of the following is true:
> 1) allocated when system was under some pressure. (page_is_pfmemalloc)
> 2) belongs to a different NUMA node than pool->p.nid.
> 
> To update pool->p.nid users should call page_pool_update_nid().
> 
> Holding on to such pages in the pool will hurt the consumer performance
> when the pool migrates to a different numa node.
> 
> Performance testing:
> XDP drop/tx rate and TCP single/multi stream, on mlx5 driver
> while migrating rx ring irq from close to far numa:
> 
> mlx5 internal page cache was locally disabled to get pure page pool
> results.
> 
> CPU: Intel(R) Xeon(R) CPU E5-2603 v4 @ 1.70GHz
> NIC: Mellanox Technologies MT27700 Family [ConnectX-4] (100G)
> 
> XDP Drop/TX single core:
> NUMA  | XDP  | Before    | After
> ---------------------------------------
> Close | Drop | 11   Mpps | 10.9 Mpps
> Far   | Drop | 4.4  Mpps | 5.8  Mpps
> 
> Close | TX   | 6.5 Mpps  | 6.5 Mpps
> Far   | TX   | 3.5 Mpps  | 4  Mpps
> 
> Improvement is about 30% drop packet rate, 15% tx packet rate for numa
> far test.
> No degradation for numa close tests.
> 
> TCP single/multi cpu/stream:
> NUMA  | #cpu | Before  | After
> --------------------------------------
> Close | 1    | 18 Gbps | 18 Gbps
> Far   | 1    | 15 Gbps | 18 Gbps
> Close | 12   | 80 Gbps | 80 Gbps
> Far   | 12   | 68 Gbps | 80 Gbps
> 
> In all test cases we see improvement for the far numa case, and no
> impact on the close numa case.
> 
> The impact of adding a check per page is very negligible, and shows no
> performance degradation whatsoever, also functionality wise it seems more
> correct and more robust for page pool to verify when pages should be
> recycled, since page pool can't guarantee where pages are coming from.
> 
> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
> Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com>
> ---
>  net/core/page_pool.c | 14 +++++++++++++-
>  1 file changed, 13 insertions(+), 1 deletion(-)
> 
> diff --git a/net/core/page_pool.c b/net/core/page_pool.c
> index 953af6d414fb..73e4173c4dce 100644
> --- a/net/core/page_pool.c
> +++ b/net/core/page_pool.c
> @@ -283,6 +283,17 @@ static bool __page_pool_recycle_direct(struct page *page,
>  	return true;
>  }
>  
> +/* page is NOT reusable when:
> + * 1) allocated when system is under some pressure. (page_is_pfmemalloc)
> + * 2) belongs to a different NUMA node than pool->p.nid.
> + *
> + * To update pool->p.nid users must call page_pool_update_nid.
> + */
> +static bool pool_page_reusable(struct page_pool *pool, struct page *page)
> +{
> +	return !page_is_pfmemalloc(page) && page_to_nid(page) == pool->p.nid;
> +}
> +
>  void __page_pool_put_page(struct page_pool *pool,
>  			  struct page *page, bool allow_direct)
>  {
> @@ -292,7 +303,8 @@ void __page_pool_put_page(struct page_pool *pool,
>  	 *
>  	 * refcnt == 1 means page_pool owns page, and can recycle it.
>  	 */
> -	if (likely(page_ref_count(page) == 1)) {
> +	if (likely(page_ref_count(page) == 1 &&
> +		   pool_page_reusable(pool, page))) {
>  		/* Read barrier done in page_ref_count / READ_ONCE */
>  
>  		if (allow_direct && in_serving_softirq())
> -- 
> 2.21.0
> 

Reviewed-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>


* Re: [PATCH net-nex V2 2/3] page_pool: Don't recycle non-reusable pages
  2019-10-23 19:37 ` [PATCH net-nex V2 2/3] page_pool: Don't recycle non-reusable pages Saeed Mahameed
  2019-10-24  5:00   ` Ilias Apalodimas
@ 2019-10-25 13:33   ` Jesper Dangaard Brouer
  2019-10-25 18:37     ` Jonathan Lemon
  2019-10-28 22:57     ` Saeed Mahameed
  1 sibling, 2 replies; 10+ messages in thread
From: Jesper Dangaard Brouer @ 2019-10-25 13:33 UTC (permalink / raw)
  To: Saeed Mahameed
  Cc: David S. Miller, netdev, Jonathan Lemon, ilias.apalodimas, brouer

On Wed, 23 Oct 2019 19:37:00 +0000
Saeed Mahameed <saeedm@mellanox.com> wrote:

> A page is NOT reusable when at least one of the following is true:
> 1) allocated when system was under some pressure. (page_is_pfmemalloc)
> 2) belongs to a different NUMA node than pool->p.nid.
> 
> To update pool->p.nid users should call page_pool_update_nid().
> 
> Holding on to such pages in the pool will hurt the consumer performance
> when the pool migrates to a different numa node.
> 
> Performance testing:
> XDP drop/tx rate and TCP single/multi stream, on mlx5 driver
> while migrating rx ring irq from close to far numa:
> 
> mlx5 internal page cache was locally disabled to get pure page pool
> results.

Could you show us the code that disables the local page cache?


> CPU: Intel(R) Xeon(R) CPU E5-2603 v4 @ 1.70GHz
> NIC: Mellanox Technologies MT27700 Family [ConnectX-4] (100G)
> 
> XDP Drop/TX single core:
> NUMA  | XDP  | Before    | After
> ---------------------------------------
> Close | Drop | 11   Mpps | 10.9 Mpps
> Far   | Drop | 4.4  Mpps | 5.8  Mpps
> 
> Close | TX   | 6.5 Mpps  | 6.5 Mpps
> Far   | TX   | 3.5 Mpps  | 4  Mpps
> 
> Improvement is about 30% drop packet rate, 15% tx packet rate for numa
> far test.
> No degradation for numa close tests.
> 
> TCP single/multi cpu/stream:
> NUMA  | #cpu | Before  | After
> --------------------------------------
> Close | 1    | 18 Gbps | 18 Gbps
> Far   | 1    | 15 Gbps | 18 Gbps
> Close | 12   | 80 Gbps | 80 Gbps
> Far   | 12   | 68 Gbps | 80 Gbps
> 
> In all test cases we see improvement for the far numa case, and no
> impact on the close numa case.
> 
> The impact of adding a check per page is very negligible, and shows no
> performance degradation whatsoever, also functionality wise it seems more
> correct and more robust for page pool to verify when pages should be
> recycled, since page pool can't guarantee where pages are coming from.
> 
> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
> Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com>
> ---
>  net/core/page_pool.c | 14 +++++++++++++-
>  1 file changed, 13 insertions(+), 1 deletion(-)
> 
> diff --git a/net/core/page_pool.c b/net/core/page_pool.c
> index 953af6d414fb..73e4173c4dce 100644
> --- a/net/core/page_pool.c
> +++ b/net/core/page_pool.c
> @@ -283,6 +283,17 @@ static bool __page_pool_recycle_direct(struct page *page,
>  	return true;
>  }
>  
> +/* page is NOT reusable when:
> + * 1) allocated when system is under some pressure. (page_is_pfmemalloc)
> + * 2) belongs to a different NUMA node than pool->p.nid.
> + *
> + * To update pool->p.nid users must call page_pool_update_nid.
> + */
> +static bool pool_page_reusable(struct page_pool *pool, struct page *page)
> +{
> +	return !page_is_pfmemalloc(page) && page_to_nid(page) == pool->p.nid;
> +}
> +
>  void __page_pool_put_page(struct page_pool *pool,
>  			  struct page *page, bool allow_direct)
>  {
> @@ -292,7 +303,8 @@ void __page_pool_put_page(struct page_pool *pool,
>  	 *
>  	 * refcnt == 1 means page_pool owns page, and can recycle it.
>  	 */
> -	if (likely(page_ref_count(page) == 1)) {
> +	if (likely(page_ref_count(page) == 1 &&
> +		   pool_page_reusable(pool, page))) {

I'm afraid that we are slowly chipping away the performance benefit
with these incremental changes, adding more checks. We have an extreme
performance use-case with XDP_DROP, where we want drivers to use this
code path to hit __page_pool_recycle_direct(), that is a simple array
update (protected under NAPI) into pool->alloc.cache[].

To preserve this hot-path, you could instead flush pool->alloc.cache[]
in the call page_pool_update_nid().  And move the pool_page_reusable()
check into __page_pool_recycle_into_ring().  (Below added the '>>' with
remaining code to make this easier to see)


>  		/* Read barrier done in page_ref_count / READ_ONCE */
>  
>  		if (allow_direct && in_serving_softirq())
>>			if (__page_pool_recycle_direct(page, pool))
>>				return;
>>
>>		if (!__page_pool_recycle_into_ring(pool, page)) {
>>			/* Cache full, fallback to free pages */
>>			__page_pool_return_page(pool, page);
>>		}
>>		return;
>>	}
>>	/* Fallback/non-XDP mode: API user have elevated refcnt.


-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer

For easier review:

/* Only allow direct recycling in special circumstances, into the
 * alloc side cache.  E.g. during RX-NAPI processing for XDP_DROP use-case.
 *
 * Caller must provide appropriate safe context.
 */
static bool __page_pool_recycle_direct(struct page *page,
				       struct page_pool *pool)
{
	if (unlikely(pool->alloc.count == PP_ALLOC_CACHE_SIZE))
		return false;

	/* Caller MUST have verified/know (page_ref_count(page) == 1) */
	pool->alloc.cache[pool->alloc.count++] = page;
	return true;
}
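
A rough sketch of the suggested alternative (illustrative only, not a tested
patch; it only uses the page_pool internals already shown in this thread:
pool->alloc.cache[], pool->alloc.count and __page_pool_return_page()):

void page_pool_update_nid(struct page_pool *pool, int new_nid)
{
	trace_page_pool_update_nid(pool, new_nid);
	pool->p.nid = new_nid;

	/* Flush the NAPI-protected alloc cache: pages cached under the old
	 * nid go back to the page allocator.  Combined with moving
	 * pool_page_reusable() into __page_pool_recycle_into_ring(), this
	 * would keep the per-page check out of the XDP_DROP fast path.
	 */
	while (pool->alloc.count) {
		struct page *page = pool->alloc.cache[--pool->alloc.count];

		__page_pool_return_page(pool, page);
	}
}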




* Re: [PATCH net-nex V2 2/3] page_pool: Don't recycle non-reusable pages
  2019-10-25 13:33   ` Jesper Dangaard Brouer
@ 2019-10-25 18:37     ` Jonathan Lemon
  2019-10-28 22:57     ` Saeed Mahameed
  1 sibling, 0 replies; 10+ messages in thread
From: Jonathan Lemon @ 2019-10-25 18:37 UTC (permalink / raw)
  To: Jesper Dangaard Brouer
  Cc: Saeed Mahameed, David S. Miller, netdev, ilias.apalodimas

On 25 Oct 2019, at 6:33, Jesper Dangaard Brouer wrote:

> On Wed, 23 Oct 2019 19:37:00 +0000
> Saeed Mahameed <saeedm@mellanox.com> wrote:
>
>> @@ -292,7 +303,8 @@ void __page_pool_put_page(struct page_pool *pool,
>>  	 *
>>  	 * refcnt == 1 means page_pool owns page, and can recycle it.
>>  	 */
>> -	if (likely(page_ref_count(page) == 1)) {
>> +	if (likely(page_ref_count(page) == 1 &&
>> +		   pool_page_reusable(pool, page))) {
>
> I'm afraid that we are slowly chipping away the performance benefit
> with these incremental changes, adding more checks. We have an extreme
> performance use-case with XDP_DROP, where we want drivers to use this
> code path to hit __page_pool_recycle_direct(), that is a simple array
> update (protected under NAPI) into pool->alloc.cache[].

The checks have to be paid for somewhere.  mlx4, mlx5, i40e, igb, etc.
already do these checks when determining whether to recycle a page or not.
The pfmemalloc check can't be moved into the flush path.  I'd rather have
a common point where this is performed, moving it out of each driver.
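
For illustration, the kind of per-driver check referred to here, modeled on
the pool_page_reusable() logic added in patch 2/3 (the function name is made
up; drivers compare against numa_mem_id() since they hold no pool nid):

static bool my_drv_page_is_reusable(struct page *page)
{
	return !page_is_pfmemalloc(page) &&
	       page_to_nid(page) == numa_mem_id();
}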
-- 
Jonathan


* Re: [PATCH net-nex V2 0/3] page_pool: API for numa node change handling
  2019-10-23 19:36 [PATCH net-nex V2 0/3] page_pool: API for numa node change handling Saeed Mahameed
                   ` (2 preceding siblings ...)
  2019-10-23 19:37 ` [PATCH net-nex V2 3/3] net/mlx5e: Rx, Update page pool numa node when changed Saeed Mahameed
@ 2019-10-28 18:31 ` David Miller
  3 siblings, 0 replies; 10+ messages in thread
From: David Miller @ 2019-10-28 18:31 UTC (permalink / raw)
  To: saeedm; +Cc: brouer, netdev, jonathan.lemon, ilias.apalodimas


Jesper has given you feedback asking you to show him the code which was
used to disable the mlx5e page cache during your testing of patch #2.

It's been 5 days since he asked for that simple thing, and I haven't
seen a reply yet.

Therefore I am tossing this series.

Please repost after you've worked things out with Jesper.

Thanks.


* Re: [PATCH net-nex V2 2/3] page_pool: Don't recycle non-reusable pages
  2019-10-25 13:33   ` Jesper Dangaard Brouer
  2019-10-25 18:37     ` Jonathan Lemon
@ 2019-10-28 22:57     ` Saeed Mahameed
  1 sibling, 0 replies; 10+ messages in thread
From: Saeed Mahameed @ 2019-10-28 22:57 UTC (permalink / raw)
  To: brouer; +Cc: jonathan.lemon, davem, netdev, ilias.apalodimas

On Fri, 2019-10-25 at 15:33 +0200, Jesper Dangaard Brouer wrote:
> On Wed, 23 Oct 2019 19:37:00 +0000
> Saeed Mahameed <saeedm@mellanox.com> wrote:
> 
> > A page is NOT reusable when at least one of the following is true:
> > 1) allocated when system was under some pressure.
> > (page_is_pfmemalloc)
> > 2) belongs to a different NUMA node than pool->p.nid.
> > 
> > To update pool->p.nid users should call page_pool_update_nid().
> > 
> > Holding on to such pages in the pool will hurt the consumer
> > performance
> > when the pool migrates to a different numa node.
> > 
> > Performance testing:
> > XDP drop/tx rate and TCP single/multi stream, on mlx5 driver
> > while migrating rx ring irq from close to far numa:
> > 
> > mlx5 internal page cache was locally disabled to get pure page pool
> > results.
> 
> Could you show us the code that disables the local page cache?
> 

Here you go, a very simple patch: it just avoids calling the mlx5 internal
cache API and avoids dma mapping/unmapping.

https://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux.git/commit/?h=topic/page-pool-numa&id=aa7345f62cad62cf19fbf2dee2a8b15de34b33e3

> 
> > CPU: Intel(R) Xeon(R) CPU E5-2603 v4 @ 1.70GHz
> > NIC: Mellanox Technologies MT27700 Family [ConnectX-4] (100G)
> > 
> > XDP Drop/TX single core:
> > NUMA  | XDP  | Before    | After
> > ---------------------------------------
> > Close | Drop | 11   Mpps | 10.9 Mpps
> > Far   | Drop | 4.4  Mpps | 5.8  Mpps
> > 
> > Close | TX   | 6.5 Mpps  | 6.5 Mpps
> > Far   | TX   | 3.5 Mpps  | 4  Mpps
> > 
> > Improvement is about 30% drop packet rate, 15% tx packet rate for
> > numa
> > far test.
> > No degradation for numa close tests.
> > 
> > TCP single/multi cpu/stream:
> > NUMA  | #cpu | Before  | After
> > --------------------------------------
> > Close | 1    | 18 Gbps | 18 Gbps
> > Far   | 1    | 15 Gbps | 18 Gbps
> > Close | 12   | 80 Gbps | 80 Gbps
> > Far   | 12   | 68 Gbps | 80 Gbps
> > 
> > In all test cases we see improvement for the far numa case, and no
> > impact on the close numa case.
> > 
> > The impact of adding a check per page is very negligible, and shows
> > no
> > performance degradation whatsoever, also functionality wise it
> > seems more
> > correct and more robust for page pool to verify when pages should
> > be
> > recycled, since page pool can't guarantee where pages are coming
> > from.
> > 
> > Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
> > Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com>
> > ---
> >  net/core/page_pool.c | 14 +++++++++++++-
> >  1 file changed, 13 insertions(+), 1 deletion(-)
> > 
> > diff --git a/net/core/page_pool.c b/net/core/page_pool.c
> > index 953af6d414fb..73e4173c4dce 100644
> > --- a/net/core/page_pool.c
> > +++ b/net/core/page_pool.c
> > @@ -283,6 +283,17 @@ static bool __page_pool_recycle_direct(struct
> > page *page,
> >  	return true;
> >  }
> >  
> > +/* page is NOT reusable when:
> > + * 1) allocated when system is under some pressure.
> > (page_is_pfmemalloc)
> > + * 2) belongs to a different NUMA node than pool->p.nid.
> > + *
> > + * To update pool->p.nid users must call page_pool_update_nid.
> > + */
> > +static bool pool_page_reusable(struct page_pool *pool, struct page
> > *page)
> > +{
> > +	return !page_is_pfmemalloc(page) && page_to_nid(page) == pool-
> > >p.nid;
> > +}
> > +
> >  void __page_pool_put_page(struct page_pool *pool,
> >  			  struct page *page, bool allow_direct)
> >  {
> > @@ -292,7 +303,8 @@ void __page_pool_put_page(struct page_pool
> > *pool,
> >  	 *
> >  	 * refcnt == 1 means page_pool owns page, and can recycle it.
> >  	 */
> > -	if (likely(page_ref_count(page) == 1)) {
> > +	if (likely(page_ref_count(page) == 1 &&
> > +		   pool_page_reusable(pool, page))) {
> 
> I'm afraid that we are slowly chipping away the performance benefit
> with these incremental changes, adding more checks. We have an
> extreme
> performance use-case with XDP_DROP, where we want drivers to use this
> code path to hit __page_pool_recycle_direct(), that is a simple array
> update (protected under NAPI) into pool->alloc.cache[].
> 
> To preserve this hot-path, you could instead flush pool-
> >alloc.cache[]
> in the call page_pool_update_nid().  And move the
> pool_page_reusable()
> check into __page_pool_recycle_into_ring().  (Below added the '>>'
> with
> remaining code to make this easier to see)
> 

A flush simply won't work: what about pages that are currently in use by
the driver HW and will be returned to the cache only after the flush? Who
will flush those pages? And flushing the HW queue would be too heavy!

As Jonathan pointed out, all drivers are doing this check today; all we
need is to simply move it into the page pool and take it out of the
drivers' responsibility.

For pfmemalloc, we could make sure the page pool never allows such pages
to be allocated in the first place, but this is very risky and can break
many "kernel in emergency" features.

I think what Jonathan and I did here is the best course of action.

> 
> >  		/* Read barrier done in page_ref_count / READ_ONCE */
> >  
> >  		if (allow_direct && in_serving_softirq())
> > > 			if (__page_pool_recycle_direct(page, pool))
> > > 				return;
> > > 
> > > 		if (!__page_pool_recycle_into_ring(pool, page)) {
> > > 			/* Cache full, fallback to free pages */
> > > 			__page_pool_return_page(pool, page);
> > > 		}
> > > 		return;
> > > 	}
> > > 	/* Fallback/non-XDP mode: API user have elevated refcnt.
> 
> 


End of thread, newest message: 2019-10-28 22:58 UTC

Thread overview: 10 messages
2019-10-23 19:36 [PATCH net-nex V2 0/3] page_pool: API for numa node change handling Saeed Mahameed
2019-10-23 19:36 ` [PATCH net-nex V2 1/3] page_pool: Add API to update numa node Saeed Mahameed
2019-10-24  4:50   ` Ilias Apalodimas
2019-10-23 19:37 ` [PATCH net-nex V2 2/3] page_pool: Don't recycle non-reusable pages Saeed Mahameed
2019-10-24  5:00   ` Ilias Apalodimas
2019-10-25 13:33   ` Jesper Dangaard Brouer
2019-10-25 18:37     ` Jonathan Lemon
2019-10-28 22:57     ` Saeed Mahameed
2019-10-23 19:37 ` [PATCH net-nex V2 3/3] net/mlx5e: Rx, Update page pool numa node when changed Saeed Mahameed
2019-10-28 18:31 ` [PATCH net-nex V2 0/3] page_pool: API for numa node change handling David Miller
