* [PATCH v5 net-next 0/3] add DMA-sync-for-device capability to page_pool API
@ 2019-11-20 14:54 Lorenzo Bianconi
  2019-11-20 14:54 ` [PATCH v5 net-next 1/3] net: mvneta: rely on page_pool_recycle_direct in mvneta_run_xdp Lorenzo Bianconi
                   ` (4 more replies)
  0 siblings, 5 replies; 14+ messages in thread
From: Lorenzo Bianconi @ 2019-11-20 14:54 UTC (permalink / raw)
  To: netdev
  Cc: davem, ilias.apalodimas, brouer, lorenzo.bianconi, mcroce,
	jonathan.lemon

Introduce the possibility to sync DMA memory for device in the page_pool API.
This feature allows syncing only the required DMA size instead of always the
full buffer (dma_sync_single_for_device can be very costly).
Please note DMA-sync-for-CPU is still the device driver's responsibility.
Relying on the page_pool DMA sync, the mvneta driver improves its XDP_DROP
rate by about 170Kpps:

- XDP_DROP DMA sync managed by mvneta driver:	~420Kpps
- XDP_DROP DMA sync managed by page_pool API:	~585Kpps

Do not change naming convention for the moment since the changes will hit other
drivers as well. I will address it in another series.
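
For reference, a minimal sketch of the driver-side setup that enables this,
mirroring what patch 3/3 does for mvneta (the offset/max_len values shown are
the mvneta-specific ones from that patch; other drivers will use their own rx
offset and buffer size):

	struct page_pool_params pp_params = {
		.order = 0,
		/* let page_pool map the pages and sync them for device */
		.flags = PP_FLAG_DMA_MAP | PP_FLAG_DMA_SYNC_DEV,
		.pool_size = size,
		.nid = cpu_to_node(0),
		.dev = pp->dev->dev.parent,
		.dma_dir = xdp_prog ? DMA_BIDIRECTIONAL : DMA_FROM_DEVICE,
		/* offset where the DMA engine starts writing rx data */
		.offset = pp->rx_offset_correction,
		/* upper bound on the length page_pool will sync for device */
		.max_len = MVNETA_MAX_RX_BUF_SIZE,
	};

	rxq->page_pool = page_pool_create(&pp_params);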

Changes since v4:
- do not allow the driver to set max_len to 0
- convert PP_FLAG_DMA_MAP/PP_FLAG_DMA_SYNC_DEV to BIT() macro

Changes since v3:
- move dma_sync_for_device before putting the page in ptr_ring in
  __page_pool_recycle_into_ring since ptr_ring can be consumed
  concurrently. Simplify the code moving dma_sync_for_device
  before running __page_pool_recycle_direct/__page_pool_recycle_into_ring

Changes since v2:
- rely on PP_FLAG_DMA_SYNC_DEV flag instead of dma_sync

Changes since v1:
- rename sync to dma_sync
- set dma_sync_size to 0xFFFFFFFF in page_pool_recycle_direct and
  page_pool_put_page routines
- Improve documentation

Lorenzo Bianconi (3):
  net: mvneta: rely on page_pool_recycle_direct in mvneta_run_xdp
  net: page_pool: add the possibility to sync DMA memory for device
  net: mvneta: get rid of huge dma sync in mvneta_rx_refill

 drivers/net/ethernet/marvell/mvneta.c | 24 +++++++++++-------
 include/net/page_pool.h               | 24 +++++++++++++-----
 net/core/page_pool.c                  | 36 +++++++++++++++++++++++++--
 3 files changed, 67 insertions(+), 17 deletions(-)

-- 
2.21.0


* [PATCH v5 net-next 1/3] net: mvneta: rely on page_pool_recycle_direct in mvneta_run_xdp
  2019-11-20 14:54 [PATCH v5 net-next 0/3] add DMA-sync-for-device capability to page_pool API Lorenzo Bianconi
@ 2019-11-20 14:54 ` Lorenzo Bianconi
  2019-11-20 15:45   ` Jesper Dangaard Brouer
  2019-11-20 14:54 ` [PATCH v5 net-next 2/3] net: page_pool: add the possibility to sync DMA memory for device Lorenzo Bianconi
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 14+ messages in thread
From: Lorenzo Bianconi @ 2019-11-20 14:54 UTC (permalink / raw)
  To: netdev
  Cc: davem, ilias.apalodimas, brouer, lorenzo.bianconi, mcroce,
	jonathan.lemon

Rely on page_pool_recycle_direct and not on xdp_return_buff in
mvneta_run_xdp. This is a preliminary patch to limit the dma sync length
to what is strictly necessary.

Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
---
 drivers/net/ethernet/marvell/mvneta.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/marvell/mvneta.c b/drivers/net/ethernet/marvell/mvneta.c
index 12e03b15f0ab..f7713c2c68e1 100644
--- a/drivers/net/ethernet/marvell/mvneta.c
+++ b/drivers/net/ethernet/marvell/mvneta.c
@@ -2097,7 +2097,8 @@ mvneta_run_xdp(struct mvneta_port *pp, struct mvneta_rx_queue *rxq,
 		err = xdp_do_redirect(pp->dev, xdp, prog);
 		if (err) {
 			ret = MVNETA_XDP_DROPPED;
-			xdp_return_buff(xdp);
+			page_pool_recycle_direct(rxq->page_pool,
+						 virt_to_head_page(xdp->data));
 		} else {
 			ret = MVNETA_XDP_REDIR;
 		}
@@ -2106,7 +2107,8 @@ mvneta_run_xdp(struct mvneta_port *pp, struct mvneta_rx_queue *rxq,
 	case XDP_TX:
 		ret = mvneta_xdp_xmit_back(pp, xdp);
 		if (ret != MVNETA_XDP_TX)
-			xdp_return_buff(xdp);
+			page_pool_recycle_direct(rxq->page_pool,
+						 virt_to_head_page(xdp->data));
 		break;
 	default:
 		bpf_warn_invalid_xdp_action(act);
-- 
2.21.0


* [PATCH v5 net-next 2/3] net: page_pool: add the possibility to sync DMA memory for device
  2019-11-20 14:54 [PATCH v5 net-next 0/3] add DMA-sync-for-device capability to page_pool API Lorenzo Bianconi
  2019-11-20 14:54 ` [PATCH v5 net-next 1/3] net: mvneta: rely on page_pool_recycle_direct in mvneta_run_xdp Lorenzo Bianconi
@ 2019-11-20 14:54 ` Lorenzo Bianconi
  2019-11-20 17:49   ` Jesper Dangaard Brouer
  2019-11-20 14:54 ` [PATCH v5 net-next 3/3] net: mvneta: get rid of huge dma sync in mvneta_rx_refill Lorenzo Bianconi
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 14+ messages in thread
From: Lorenzo Bianconi @ 2019-11-20 14:54 UTC (permalink / raw)
  To: netdev
  Cc: davem, ilias.apalodimas, brouer, lorenzo.bianconi, mcroce,
	jonathan.lemon

Introduce the following parameters in order to add the possibility to sync
DMA memory for device before putting allocated pages in the page_pool
caches:
- PP_FLAG_DMA_SYNC_DEV: if set in page_pool_params flags, all pages that
  the driver gets from page_pool will be DMA-synced-for-device according
  to the length provided by the device driver. Please note DMA-sync-for-CPU
  is still the device driver's responsibility
- offset: DMA address offset where the DMA engine starts copying rx data
- max_len: maximum DMA memory size page_pool is allowed to flush. This
  is currently used in the __page_pool_alloc_pages_slow routine when pages
  are allocated from the page allocator
These parameters are supposed to be set by device drivers.

This optimization reduces the length of the DMA-sync-for-device.
The optimization is valid because pages are initially
DMA-synced-for-device as defined via max_len. At RX time, the driver
will perform a DMA-sync-for-CPU on the memory for the packet length.
What is important is the memory occupied by the packet payload, because
this is the area the CPU is allowed to read and modify. As we don't track
cache-lines written into by the CPU, simply use the packet payload length
as dma_sync_size at page_pool recycle time. This also takes into account
any tail extension.
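
As an illustration of the recycle-time usage (a sketch based on the mvneta
changes in patch 3/3), an XDP path that knows how much of the buffer the CPU
may have touched can pass that length as dma_sync_size:

	/* sync-for-device only the headroom + payload the CPU may have
	 * written, not the whole buffer
	 */
	__page_pool_put_page(rxq->page_pool,
			     virt_to_head_page(xdp->data),
			     xdp->data_end - xdp->data_hard_start,
			     true);

Callers that do not know the length can keep passing -1 (as the existing
page_pool_put_page/page_pool_recycle_direct helpers do), in which case
page_pool falls back to syncing max_len.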

Tested-by: Matteo Croce <mcroce@redhat.com>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
---
 include/net/page_pool.h | 24 ++++++++++++++++++------
 net/core/page_pool.c    | 36 ++++++++++++++++++++++++++++++++++--
 2 files changed, 52 insertions(+), 8 deletions(-)

diff --git a/include/net/page_pool.h b/include/net/page_pool.h
index ace881c15dcb..49b27643dda4 100644
--- a/include/net/page_pool.h
+++ b/include/net/page_pool.h
@@ -34,8 +34,18 @@
 #include <linux/ptr_ring.h>
 #include <linux/dma-direction.h>
 
-#define PP_FLAG_DMA_MAP 1 /* Should page_pool do the DMA map/unmap */
-#define PP_FLAG_ALL	PP_FLAG_DMA_MAP
+#define PP_FLAG_DMA_MAP		BIT(0) /* Should page_pool do the DMA
+					* map/unmap
+					*/
+#define PP_FLAG_DMA_SYNC_DEV	BIT(1) /* If set all pages that the driver gets
+					* from page_pool will be
+					* DMA-synced-for-device according to
+					* the length provided by the device
+					* driver.
+					* Please note DMA-sync-for-CPU is still
+					* device driver responsibility
+					*/
+#define PP_FLAG_ALL		(PP_FLAG_DMA_MAP | PP_FLAG_DMA_SYNC_DEV)
 
 /*
  * Fast allocation side cache array/stack
@@ -65,6 +75,8 @@ struct page_pool_params {
 	int		nid;  /* Numa node id to allocate from pages from */
 	struct device	*dev; /* device, for DMA pre-mapping purposes */
 	enum dma_data_direction dma_dir; /* DMA mapping direction */
+	unsigned int	max_len; /* max DMA sync memory size */
+	unsigned int	offset;  /* DMA addr offset */
 };
 
 struct page_pool {
@@ -151,8 +163,8 @@ static inline void page_pool_use_xdp_mem(struct page_pool *pool,
 #endif
 
 /* Never call this directly, use helpers below */
-void __page_pool_put_page(struct page_pool *pool,
-			  struct page *page, bool allow_direct);
+void __page_pool_put_page(struct page_pool *pool, struct page *page,
+			  unsigned int dma_sync_size, bool allow_direct);
 
 static inline void page_pool_put_page(struct page_pool *pool,
 				      struct page *page, bool allow_direct)
@@ -161,14 +173,14 @@ static inline void page_pool_put_page(struct page_pool *pool,
 	 * allow registering MEM_TYPE_PAGE_POOL, but shield linker.
 	 */
 #ifdef CONFIG_PAGE_POOL
-	__page_pool_put_page(pool, page, allow_direct);
+	__page_pool_put_page(pool, page, -1, allow_direct);
 #endif
 }
 /* Very limited use-cases allow recycle direct */
 static inline void page_pool_recycle_direct(struct page_pool *pool,
 					    struct page *page)
 {
-	__page_pool_put_page(pool, page, true);
+	__page_pool_put_page(pool, page, -1, true);
 }
 
 /* Disconnects a page (from a page_pool).  API users can have a need
diff --git a/net/core/page_pool.c b/net/core/page_pool.c
index e28db2ef8e12..495454a9ff3e 100644
--- a/net/core/page_pool.c
+++ b/net/core/page_pool.c
@@ -47,6 +47,21 @@ static int page_pool_init(struct page_pool *pool,
 	    (pool->p.dma_dir != DMA_BIDIRECTIONAL))
 		return -EINVAL;
 
+	if (pool->p.flags & PP_FLAG_DMA_SYNC_DEV) {
+		/* In order to request DMA-sync-for-device the page
+		 * needs to be mapped
+		 */
+		if (!(pool->p.flags & PP_FLAG_DMA_MAP))
+			return -EINVAL;
+
+		if (!pool->p.max_len)
+			return -EINVAL;
+
+		/* pool->p.offset has to be set according to the address
+		 * offset used by the DMA engine to start copying rx data
+		 */
+	}
+
 	if (ptr_ring_init(&pool->ring, ring_qsize, GFP_KERNEL) < 0)
 		return -ENOMEM;
 
@@ -115,6 +130,16 @@ static struct page *__page_pool_get_cached(struct page_pool *pool)
 	return page;
 }
 
+static void page_pool_dma_sync_for_device(struct page_pool *pool,
+					  struct page *page,
+					  unsigned int dma_sync_size)
+{
+	dma_sync_size = min(dma_sync_size, pool->p.max_len);
+	dma_sync_single_range_for_device(pool->p.dev, page->dma_addr,
+					 pool->p.offset, dma_sync_size,
+					 pool->p.dma_dir);
+}
+
 /* slow path */
 noinline
 static struct page *__page_pool_alloc_pages_slow(struct page_pool *pool,
@@ -159,6 +184,9 @@ static struct page *__page_pool_alloc_pages_slow(struct page_pool *pool,
 	}
 	page->dma_addr = dma;
 
+	if (pool->p.flags & PP_FLAG_DMA_SYNC_DEV)
+		page_pool_dma_sync_for_device(pool, page, pool->p.max_len);
+
 skip_dma_map:
 	/* Track how many pages are held 'in-flight' */
 	pool->pages_state_hold_cnt++;
@@ -281,8 +309,8 @@ static bool __page_pool_recycle_direct(struct page *page,
 	return true;
 }
 
-void __page_pool_put_page(struct page_pool *pool,
-			  struct page *page, bool allow_direct)
+void __page_pool_put_page(struct page_pool *pool, struct page *page,
+			  unsigned int dma_sync_size, bool allow_direct)
 {
 	/* This allocator is optimized for the XDP mode that uses
 	 * one-frame-per-page, but have fallbacks that act like the
@@ -293,6 +321,10 @@ void __page_pool_put_page(struct page_pool *pool,
 	if (likely(page_ref_count(page) == 1)) {
 		/* Read barrier done in page_ref_count / READ_ONCE */
 
+		if (pool->p.flags & PP_FLAG_DMA_SYNC_DEV)
+			page_pool_dma_sync_for_device(pool, page,
+						      dma_sync_size);
+
 		if (allow_direct && in_serving_softirq())
 			if (__page_pool_recycle_direct(page, pool))
 				return;
-- 
2.21.0


* [PATCH v5 net-next 3/3] net: mvneta: get rid of huge dma sync in mvneta_rx_refill
  2019-11-20 14:54 [PATCH v5 net-next 0/3] add DMA-sync-for-device capability to page_pool API Lorenzo Bianconi
  2019-11-20 14:54 ` [PATCH v5 net-next 1/3] net: mvneta: rely on page_pool_recycle_direct in mvneta_run_xdp Lorenzo Bianconi
  2019-11-20 14:54 ` [PATCH v5 net-next 2/3] net: page_pool: add the possibility to sync DMA memory for device Lorenzo Bianconi
@ 2019-11-20 14:54 ` Lorenzo Bianconi
  2019-11-20 17:49   ` Jesper Dangaard Brouer
  2019-11-20 15:37 ` [PATCH v5 net-next 0/3] add DMA-sync-for-device capability to page_pool API Jesper Dangaard Brouer
  2019-11-20 20:34 ` David Miller
  4 siblings, 1 reply; 14+ messages in thread
From: Lorenzo Bianconi @ 2019-11-20 14:54 UTC (permalink / raw)
  To: netdev
  Cc: davem, ilias.apalodimas, brouer, lorenzo.bianconi, mcroce,
	jonathan.lemon

Get rid of the costly dma_sync_single_for_device in mvneta_rx_refill
since the driver can now let the page_pool API manage the needed DMA
sync with a proper size.

- XDP_DROP DMA sync managed by mvneta driver:	~420Kpps
- XDP_DROP DMA sync managed by page_pool API:	~585Kpps

Tested-by: Matteo Croce <mcroce@redhat.com>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
---
 drivers/net/ethernet/marvell/mvneta.c | 26 +++++++++++++++-----------
 1 file changed, 15 insertions(+), 11 deletions(-)

diff --git a/drivers/net/ethernet/marvell/mvneta.c b/drivers/net/ethernet/marvell/mvneta.c
index f7713c2c68e1..a06d109c9e80 100644
--- a/drivers/net/ethernet/marvell/mvneta.c
+++ b/drivers/net/ethernet/marvell/mvneta.c
@@ -1846,7 +1846,6 @@ static int mvneta_rx_refill(struct mvneta_port *pp,
 			    struct mvneta_rx_queue *rxq,
 			    gfp_t gfp_mask)
 {
-	enum dma_data_direction dma_dir;
 	dma_addr_t phys_addr;
 	struct page *page;
 
@@ -1856,9 +1855,6 @@ static int mvneta_rx_refill(struct mvneta_port *pp,
 		return -ENOMEM;
 
 	phys_addr = page_pool_get_dma_addr(page) + pp->rx_offset_correction;
-	dma_dir = page_pool_get_dma_dir(rxq->page_pool);
-	dma_sync_single_for_device(pp->dev->dev.parent, phys_addr,
-				   MVNETA_MAX_RX_BUF_SIZE, dma_dir);
 	mvneta_rx_desc_fill(rx_desc, phys_addr, page, rxq);
 
 	return 0;
@@ -2097,8 +2093,10 @@ mvneta_run_xdp(struct mvneta_port *pp, struct mvneta_rx_queue *rxq,
 		err = xdp_do_redirect(pp->dev, xdp, prog);
 		if (err) {
 			ret = MVNETA_XDP_DROPPED;
-			page_pool_recycle_direct(rxq->page_pool,
-						 virt_to_head_page(xdp->data));
+			__page_pool_put_page(rxq->page_pool,
+					virt_to_head_page(xdp->data),
+					xdp->data_end - xdp->data_hard_start,
+					true);
 		} else {
 			ret = MVNETA_XDP_REDIR;
 		}
@@ -2107,8 +2105,10 @@ mvneta_run_xdp(struct mvneta_port *pp, struct mvneta_rx_queue *rxq,
 	case XDP_TX:
 		ret = mvneta_xdp_xmit_back(pp, xdp);
 		if (ret != MVNETA_XDP_TX)
-			page_pool_recycle_direct(rxq->page_pool,
-						 virt_to_head_page(xdp->data));
+			__page_pool_put_page(rxq->page_pool,
+					virt_to_head_page(xdp->data),
+					xdp->data_end - xdp->data_hard_start,
+					true);
 		break;
 	default:
 		bpf_warn_invalid_xdp_action(act);
@@ -2117,8 +2117,10 @@ mvneta_run_xdp(struct mvneta_port *pp, struct mvneta_rx_queue *rxq,
 		trace_xdp_exception(pp->dev, prog, act);
 		/* fall through */
 	case XDP_DROP:
-		page_pool_recycle_direct(rxq->page_pool,
-					 virt_to_head_page(xdp->data));
+		__page_pool_put_page(rxq->page_pool,
+				     virt_to_head_page(xdp->data),
+				     xdp->data_end - xdp->data_hard_start,
+				     true);
 		ret = MVNETA_XDP_DROPPED;
 		break;
 	}
@@ -3067,11 +3069,13 @@ static int mvneta_create_page_pool(struct mvneta_port *pp,
 	struct bpf_prog *xdp_prog = READ_ONCE(pp->xdp_prog);
 	struct page_pool_params pp_params = {
 		.order = 0,
-		.flags = PP_FLAG_DMA_MAP,
+		.flags = PP_FLAG_DMA_MAP | PP_FLAG_DMA_SYNC_DEV,
 		.pool_size = size,
 		.nid = cpu_to_node(0),
 		.dev = pp->dev->dev.parent,
 		.dma_dir = xdp_prog ? DMA_BIDIRECTIONAL : DMA_FROM_DEVICE,
+		.offset = pp->rx_offset_correction,
+		.max_len = MVNETA_MAX_RX_BUF_SIZE,
 	};
 	int err;
 
-- 
2.21.0


* Re: [PATCH v5 net-next 0/3] add DMA-sync-for-device capability to page_pool API
  2019-11-20 14:54 [PATCH v5 net-next 0/3] add DMA-sync-for-device capability to page_pool API Lorenzo Bianconi
                   ` (2 preceding siblings ...)
  2019-11-20 14:54 ` [PATCH v5 net-next 3/3] net: mvneta: get rid of huge dma sync in mvneta_rx_refill Lorenzo Bianconi
@ 2019-11-20 15:37 ` Jesper Dangaard Brouer
  2019-11-20 15:45   ` Lorenzo Bianconi
  2019-11-20 20:34 ` David Miller
  4 siblings, 1 reply; 14+ messages in thread
From: Jesper Dangaard Brouer @ 2019-11-20 15:37 UTC (permalink / raw)
  To: Lorenzo Bianconi
  Cc: netdev, davem, ilias.apalodimas, lorenzo.bianconi, mcroce,
	jonathan.lemon, brouer

On Wed, 20 Nov 2019 16:54:16 +0200
Lorenzo Bianconi <lorenzo@kernel.org> wrote:

> Do not change naming convention for the moment since the changes will
> hit other drivers as well. I will address it in another series.

Yes, I agree, as I also said over IRC (freenode #xdp).

The length (dma_sync_size) API addition to __page_pool_put_page() is
for now confined to this driver (mvneta).  We can postpone the API-name
discussion, as you have promised here (and on IRC) that you will
"address it in another series".  (Guess, given timing, the followup
series and discussion will happen after the merge window...)

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer


* Re: [PATCH v5 net-next 1/3] net: mvneta: rely on page_pool_recycle_direct in mvneta_run_xdp
  2019-11-20 14:54 ` [PATCH v5 net-next 1/3] net: mvneta: rely on page_pool_recycle_direct in mvneta_run_xdp Lorenzo Bianconi
@ 2019-11-20 15:45   ` Jesper Dangaard Brouer
  0 siblings, 0 replies; 14+ messages in thread
From: Jesper Dangaard Brouer @ 2019-11-20 15:45 UTC (permalink / raw)
  To: Lorenzo Bianconi
  Cc: netdev, davem, ilias.apalodimas, lorenzo.bianconi, mcroce,
	jonathan.lemon, brouer

On Wed, 20 Nov 2019 16:54:17 +0200
Lorenzo Bianconi <lorenzo@kernel.org> wrote:

> Rely on page_pool_recycle_direct and not on xdp_return_buff in
> mvneta_run_xdp. This is a preliminary patch to limit the dma sync length
> to what is strictly necessary.
> 
> Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>

Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer


* Re: [PATCH v5 net-next 0/3] add DMA-sync-for-device capability to page_pool API
  2019-11-20 15:37 ` [PATCH v5 net-next 0/3] add DMA-sync-for-device capability to page_pool API Jesper Dangaard Brouer
@ 2019-11-20 15:45   ` Lorenzo Bianconi
  2019-11-20 18:05     ` Ilias Apalodimas
  0 siblings, 1 reply; 14+ messages in thread
From: Lorenzo Bianconi @ 2019-11-20 15:45 UTC (permalink / raw)
  To: Jesper Dangaard Brouer
  Cc: netdev, davem, ilias.apalodimas, lorenzo.bianconi, mcroce,
	jonathan.lemon

> On Wed, 20 Nov 2019 16:54:16 +0200
> Lorenzo Bianconi <lorenzo@kernel.org> wrote:
> 
> > Do not change naming convention for the moment since the changes will
> > hit other drivers as well. I will address it in another series.
> 
> Yes, I agree, as I also said over IRC (freenode #xdp).
> 
> The length (dma_sync_size) API addition to __page_pool_put_page() is
> for now confined to this driver (mvneta).  We can postpone the API-name
> discussion, as you have promised here (and on IRC) that you will
> "address it in another series".  (Guess, given timing, the followup
> series and discussion will happen after the merge window...)

Right, I will work on it after the next merge window.

Regards,
Lorenzo

> 
> -- 
> Best regards,
>   Jesper Dangaard Brouer
>   MSc.CS, Principal Kernel Engineer at Red Hat
>   LinkedIn: http://www.linkedin.com/in/brouer
> 


* Re: [PATCH v5 net-next 2/3] net: page_pool: add the possibility to sync DMA memory for device
  2019-11-20 14:54 ` [PATCH v5 net-next 2/3] net: page_pool: add the possibility to sync DMA memory for device Lorenzo Bianconi
@ 2019-11-20 17:49   ` Jesper Dangaard Brouer
  2019-11-20 18:00     ` Ilias Apalodimas
  2019-11-20 18:42     ` Jonathan Lemon
  0 siblings, 2 replies; 14+ messages in thread
From: Jesper Dangaard Brouer @ 2019-11-20 17:49 UTC (permalink / raw)
  To: Lorenzo Bianconi
  Cc: netdev, davem, ilias.apalodimas, lorenzo.bianconi, mcroce,
	jonathan.lemon, brouer

On Wed, 20 Nov 2019 16:54:18 +0200
Lorenzo Bianconi <lorenzo@kernel.org> wrote:

> Introduce the following parameters in order to add the possibility to sync
> DMA memory for device before putting allocated pages in the page_pool
> caches:
> - PP_FLAG_DMA_SYNC_DEV: if set in page_pool_params flags, all pages that
>   the driver gets from page_pool will be DMA-synced-for-device according
>   to the length provided by the device driver. Please note DMA-sync-for-CPU
>   is still the device driver's responsibility
> - offset: DMA address offset where the DMA engine starts copying rx data
> - max_len: maximum DMA memory size page_pool is allowed to flush. This
>   is currently used in the __page_pool_alloc_pages_slow routine when pages
>   are allocated from the page allocator
> These parameters are supposed to be set by device drivers.
> 
> This optimization reduces the length of the DMA-sync-for-device.
> The optimization is valid because pages are initially
> DMA-synced-for-device as defined via max_len. At RX time, the driver
> will perform a DMA-sync-for-CPU on the memory for the packet length.
> What is important is the memory occupied by the packet payload, because
> this is the area the CPU is allowed to read and modify. As we don't track
> cache-lines written into by the CPU, simply use the packet payload length
> as dma_sync_size at page_pool recycle time. This also takes into account
> any tail extension.
> 
> Tested-by: Matteo Croce <mcroce@redhat.com>
> Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
> ---

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>

[...]
> @@ -281,8 +309,8 @@ static bool __page_pool_recycle_direct(struct page *page,
>  	return true;
>  }
>  
> -void __page_pool_put_page(struct page_pool *pool,
> -			  struct page *page, bool allow_direct)
> +void __page_pool_put_page(struct page_pool *pool, struct page *page,
> +			  unsigned int dma_sync_size, bool allow_direct)
>  {
>  	/* This allocator is optimized for the XDP mode that uses
>  	 * one-frame-per-page, but have fallbacks that act like the
> @@ -293,6 +321,10 @@ void __page_pool_put_page(struct page_pool *pool,
>  	if (likely(page_ref_count(page) == 1)) {
>  		/* Read barrier done in page_ref_count / READ_ONCE */
>  
> +		if (pool->p.flags & PP_FLAG_DMA_SYNC_DEV)
> +			page_pool_dma_sync_for_device(pool, page,
> +						      dma_sync_size);
> +
>  		if (allow_direct && in_serving_softirq())
>  			if (__page_pool_recycle_direct(page, pool))
>  				return;

I am slightly concerned this touches the fast-path code. But at least on
Intel, I don't think this is measurable.  And for the ARM64 board it
was a huge win... thus I'll accept this.

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer


* Re: [PATCH v5 net-next 3/3] net: mvneta: get rid of huge dma sync in mvneta_rx_refill
  2019-11-20 14:54 ` [PATCH v5 net-next 3/3] net: mvneta: get rid of huge dma sync in mvneta_rx_refill Lorenzo Bianconi
@ 2019-11-20 17:49   ` Jesper Dangaard Brouer
  0 siblings, 0 replies; 14+ messages in thread
From: Jesper Dangaard Brouer @ 2019-11-20 17:49 UTC (permalink / raw)
  To: Lorenzo Bianconi
  Cc: netdev, davem, ilias.apalodimas, lorenzo.bianconi, mcroce,
	jonathan.lemon, brouer

On Wed, 20 Nov 2019 16:54:19 +0200
Lorenzo Bianconi <lorenzo@kernel.org> wrote:

> Get rid of the costly dma_sync_single_for_device in mvneta_rx_refill
> since the driver can now let the page_pool API manage the needed DMA
> sync with a proper size.
> 
> - XDP_DROP DMA sync managed by mvneta driver:	~420Kpps
> - XDP_DROP DMA sync managed by page_pool API:	~585Kpps
> 
> Tested-by: Matteo Croce <mcroce@redhat.com>

Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer


* Re: [PATCH v5 net-next 2/3] net: page_pool: add the possibility to sync DMA memory for device
  2019-11-20 17:49   ` Jesper Dangaard Brouer
@ 2019-11-20 18:00     ` Ilias Apalodimas
  2019-11-20 18:42     ` Jonathan Lemon
  1 sibling, 0 replies; 14+ messages in thread
From: Ilias Apalodimas @ 2019-11-20 18:00 UTC (permalink / raw)
  To: Jesper Dangaard Brouer
  Cc: Lorenzo Bianconi, netdev, davem, lorenzo.bianconi, mcroce,
	jonathan.lemon

> [...]
> > @@ -281,8 +309,8 @@ static bool __page_pool_recycle_direct(struct page *page,
> >  	return true;
> >  }
> >  
> > -void __page_pool_put_page(struct page_pool *pool,
> > -			  struct page *page, bool allow_direct)
> > +void __page_pool_put_page(struct page_pool *pool, struct page *page,
> > +			  unsigned int dma_sync_size, bool allow_direct)
> >  {
> >  	/* This allocator is optimized for the XDP mode that uses
> >  	 * one-frame-per-page, but have fallbacks that act like the
> > @@ -293,6 +321,10 @@ void __page_pool_put_page(struct page_pool *pool,
> >  	if (likely(page_ref_count(page) == 1)) {
> >  		/* Read barrier done in page_ref_count / READ_ONCE */
> >  
> > +		if (pool->p.flags & PP_FLAG_DMA_SYNC_DEV)
> > +			page_pool_dma_sync_for_device(pool, page,
> > +						      dma_sync_size);
> > +
> >  		if (allow_direct && in_serving_softirq())
> >  			if (__page_pool_recycle_direct(page, pool))
> >  				return;
> 
> I am slightly concerned this touches the fast-path code. But at least on
> Intel, I don't think this is measurable.  And for the ARM64 board it
> was a huge win... thus I'll accept this.

Acked-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>

* Re: [PATCH v5 net-next 0/3] add DMA-sync-for-device capability to page_pool API
  2019-11-20 15:45   ` Lorenzo Bianconi
@ 2019-11-20 18:05     ` Ilias Apalodimas
  0 siblings, 0 replies; 14+ messages in thread
From: Ilias Apalodimas @ 2019-11-20 18:05 UTC (permalink / raw)
  To: Lorenzo Bianconi
  Cc: Jesper Dangaard Brouer, netdev, davem, lorenzo.bianconi, mcroce,
	jonathan.lemon

On Wed, Nov 20, 2019 at 05:45:22PM +0200, Lorenzo Bianconi wrote:
> > On Wed, 20 Nov 2019 16:54:16 +0200
> > Lorenzo Bianconi <lorenzo@kernel.org> wrote:
> > 
> > > Do not change naming convention for the moment since the changes will
> > > hit other drivers as well. I will address it in another series.
> > 
> > Yes, I agree, as I also said over IRC (freenode #xdp).
> > 
> > The length (dma_sync_size) API addition to __page_pool_put_page() is
> > for now confined to this driver (mvneta).  We can postpone the API-name
> > discussion, as you have promised here (and on IRC) that you will
> > "address it in another series".  (Guess, given timing, the followup
> > series and discussion will happen after the merge window...)
> 
> Right, I will work on it after the next merge window.

As we discussed, we can go a step further: page_pool_put_page() and
page_pool_recycle_direct() can probably go away. Syncing with the len makes
more sense. As long as we document the 'allow_direct' flag sufficiently, we
can just rename __page_pool_put_page -> page_pool_put_page and keep one
function only.

In any case this patchset is fine for this merge window

for the series

Acked-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>

* Re: [PATCH v5 net-next 2/3] net: page_pool: add the possibility to sync DMA memory for device
  2019-11-20 17:49   ` Jesper Dangaard Brouer
  2019-11-20 18:00     ` Ilias Apalodimas
@ 2019-11-20 18:42     ` Jonathan Lemon
  2019-11-20 19:04       ` Jesper Dangaard Brouer
  1 sibling, 1 reply; 14+ messages in thread
From: Jonathan Lemon @ 2019-11-20 18:42 UTC (permalink / raw)
  To: Jesper Dangaard Brouer
  Cc: Lorenzo Bianconi, netdev, davem, ilias.apalodimas,
	lorenzo.bianconi, mcroce



On 20 Nov 2019, at 9:49, Jesper Dangaard Brouer wrote:

> On Wed, 20 Nov 2019 16:54:18 +0200
> Lorenzo Bianconi <lorenzo@kernel.org> wrote:
>
>> Introduce the following parameters in order to add the possibility to sync
>> DMA memory for device before putting allocated pages in the page_pool
>> caches:
>> - PP_FLAG_DMA_SYNC_DEV: if set in page_pool_params flags, all pages that
>>   the driver gets from page_pool will be DMA-synced-for-device according
>>   to the length provided by the device driver. Please note DMA-sync-for-CPU
>>   is still the device driver's responsibility
>> - offset: DMA address offset where the DMA engine starts copying rx data
>> - max_len: maximum DMA memory size page_pool is allowed to flush. This
>>   is currently used in the __page_pool_alloc_pages_slow routine when pages
>>   are allocated from the page allocator
>> These parameters are supposed to be set by device drivers.
>>
>> This optimization reduces the length of the DMA-sync-for-device.
>> The optimization is valid because pages are initially
>> DMA-synced-for-device as defined via max_len. At RX time, the driver
>> will perform a DMA-sync-for-CPU on the memory for the packet length.
>> What is important is the memory occupied by the packet payload, because
>> this is the area the CPU is allowed to read and modify. As we don't track
>> cache-lines written into by the CPU, simply use the packet payload length
>> as dma_sync_size at page_pool recycle time. This also takes into account
>> any tail extension.
>>
>> Tested-by: Matteo Croce <mcroce@redhat.com>
>> Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
>> ---
>
> Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
>
> [...]
>> @@ -281,8 +309,8 @@ static bool __page_pool_recycle_direct(struct 
>> page *page,
>>  	return true;
>>  }
>>
>> -void __page_pool_put_page(struct page_pool *pool,
>> -			  struct page *page, bool allow_direct)
>> +void __page_pool_put_page(struct page_pool *pool, struct page *page,
>> +			  unsigned int dma_sync_size, bool allow_direct)
>>  {
>>  	/* This allocator is optimized for the XDP mode that uses
>>  	 * one-frame-per-page, but have fallbacks that act like the
>> @@ -293,6 +321,10 @@ void __page_pool_put_page(struct page_pool 
>> *pool,
>>  	if (likely(page_ref_count(page) == 1)) {
>>  		/* Read barrier done in page_ref_count / READ_ONCE */
>>
>> +		if (pool->p.flags & PP_FLAG_DMA_SYNC_DEV)
>> +			page_pool_dma_sync_for_device(pool, page,
>> +						      dma_sync_size);
>> +
>>  		if (allow_direct && in_serving_softirq())
>>  			if (__page_pool_recycle_direct(page, pool))
>>  				return;
>
> I am slightly concerned this touches the fast-path code. But at least on
> Intel, I don't think this is measurable.  And for the ARM64 board it
> was a huge win... thus I'll accept this.

For the next series:

The "in_serving_softirq()" check shows up on profiling.  I'd
like to remove this and just have a "direct" flag, where the
caller takes the responsibility of the correct context.
-- 
Jonathan

* Re: [PATCH v5 net-next 2/3] net: page_pool: add the possibility to sync DMA memory for device
  2019-11-20 18:42     ` Jonathan Lemon
@ 2019-11-20 19:04       ` Jesper Dangaard Brouer
  0 siblings, 0 replies; 14+ messages in thread
From: Jesper Dangaard Brouer @ 2019-11-20 19:04 UTC (permalink / raw)
  To: Jonathan Lemon
  Cc: Lorenzo Bianconi, netdev, davem, ilias.apalodimas,
	lorenzo.bianconi, mcroce, brouer

On Wed, 20 Nov 2019 10:42:47 -0800
"Jonathan Lemon" <jonathan.lemon@gmail.com> wrote:

> On 20 Nov 2019, at 9:49, Jesper Dangaard Brouer wrote:
> 
> > On Wed, 20 Nov 2019 16:54:18 +0200
> > Lorenzo Bianconi <lorenzo@kernel.org> wrote:
> >  
> >> Introduce the following parameters in order to add the possibility to sync
> >> DMA memory for device before putting allocated pages in the page_pool
> >> caches:
> >> - PP_FLAG_DMA_SYNC_DEV: if set in page_pool_params flags, all pages that
> >>   the driver gets from page_pool will be DMA-synced-for-device according
> >>   to the length provided by the device driver. Please note DMA-sync-for-CPU
> >>   is still the device driver's responsibility
> >> - offset: DMA address offset where the DMA engine starts copying rx data
> >> - max_len: maximum DMA memory size page_pool is allowed to flush. This
> >>   is currently used in the __page_pool_alloc_pages_slow routine when pages
> >>   are allocated from the page allocator
> >> These parameters are supposed to be set by device drivers.
> >>
> >> This optimization reduces the length of the DMA-sync-for-device.
> >> The optimization is valid because pages are initially
> >> DMA-synced-for-device as defined via max_len. At RX time, the driver
> >> will perform a DMA-sync-for-CPU on the memory for the packet length.
> >> What is important is the memory occupied by the packet payload, because
> >> this is the area the CPU is allowed to read and modify. As we don't track
> >> cache-lines written into by the CPU, simply use the packet payload length
> >> as dma_sync_size at page_pool recycle time. This also takes into account
> >> any tail extension.
> >>
> >> Tested-by: Matteo Croce <mcroce@redhat.com>
> >> Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
> >> ---  
> >
> > Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
> >
> > [...]  
> >> @@ -281,8 +309,8 @@ static bool __page_pool_recycle_direct(struct 
> >> page *page,
> >>  	return true;
> >>  }
> >>
> >> -void __page_pool_put_page(struct page_pool *pool,
> >> -			  struct page *page, bool allow_direct)
> >> +void __page_pool_put_page(struct page_pool *pool, struct page *page,
> >> +			  unsigned int dma_sync_size, bool allow_direct)
> >>  {
> >>  	/* This allocator is optimized for the XDP mode that uses
> >>  	 * one-frame-per-page, but have fallbacks that act like the
> >> @@ -293,6 +321,10 @@ void __page_pool_put_page(struct page_pool 
> >> *pool,
> >>  	if (likely(page_ref_count(page) == 1)) {
> >>  		/* Read barrier done in page_ref_count / READ_ONCE */
> >>
> >> +		if (pool->p.flags & PP_FLAG_DMA_SYNC_DEV)
> >> +			page_pool_dma_sync_for_device(pool, page,
> >> +						      dma_sync_size);
> >> +
> >>  		if (allow_direct && in_serving_softirq())
> >>  			if (__page_pool_recycle_direct(page, pool))
> >>  				return;  
> >
> > I am slightly concerned this touches the fast-path code. But at least on
> > Intel, I don't think this is measurable.  And for the ARM64 board it
> > was a huge win... thus I'll accept this.
> 
> For the next series:
> 
> The "in_serving_softirq()" check shows up on profiling.  I'd
> like to remove this and just have a "direct" flag, where the
> caller takes the responsibility of the correct context.

As far as I can remember, this was added due to a bug in mlx5 shutdown
path... that needs to be fixed first.

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer


* Re: [PATCH v5 net-next 0/3] add DMA-sync-for-device capability to page_pool API
  2019-11-20 14:54 [PATCH v5 net-next 0/3] add DMA-sync-for-device capability to page_pool API Lorenzo Bianconi
                   ` (3 preceding siblings ...)
  2019-11-20 15:37 ` [PATCH v5 net-next 0/3] add DMA-sync-for-device capability to page_pool API Jesper Dangaard Brouer
@ 2019-11-20 20:34 ` David Miller
  4 siblings, 0 replies; 14+ messages in thread
From: David Miller @ 2019-11-20 20:34 UTC (permalink / raw)
  To: lorenzo
  Cc: netdev, ilias.apalodimas, brouer, lorenzo.bianconi, mcroce,
	jonathan.lemon

From: Lorenzo Bianconi <lorenzo@kernel.org>
Date: Wed, 20 Nov 2019 16:54:16 +0200

> Introduce the possibility to sync DMA memory for device in the page_pool API.
> This feature allows syncing only the required DMA size instead of always the
> full buffer (dma_sync_single_for_device can be very costly).
> Please note DMA-sync-for-CPU is still the device driver's responsibility.
> Relying on the page_pool DMA sync, the mvneta driver improves its XDP_DROP
> rate by about 170Kpps:
> 
> - XDP_DROP DMA sync managed by mvneta driver:	~420Kpps
> - XDP_DROP DMA sync managed by page_pool API:	~585Kpps
> 
> Do not change naming convention for the moment since the changes will hit other
> drivers as well. I will address it in another series.
 ...

Series applied, thanks.

