* [PATCH net-next 0/3] add DMA sync capability to page_pool API
@ 2019-11-10 12:09 Lorenzo Bianconi
  2019-11-10 12:09 ` [PATCH net-next 1/3] net: mvneta: rely on page_pool_recycle_direct in mvneta_run_xdp Lorenzo Bianconi
                   ` (2 more replies)
  0 siblings, 3 replies; 21+ messages in thread
From: Lorenzo Bianconi @ 2019-11-10 12:09 UTC (permalink / raw)
  To: netdev
  Cc: lorenzo.bianconi, davem, thomas.petazzoni, brouer,
	ilias.apalodimas, matteo.croce

Introduce the possibility to sync DMA memory in the page_pool API for
non-coherent devices. This feature allows syncing only the DMA area
actually used instead of always syncing the full buffer
(dma_sync_single_for_device can be very costly).
Relying on page_pool DMA sync, the mvneta driver improves its XDP_DROP
rate by about 180Kpps:

- XDP_DROP DMA sync managed by mvneta driver:	~420Kpps
- XDP_DROP DMA sync managed by page_pool API:	~595Kpps
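
For reference, the driver-side usage this series has in mind looks roughly
like the sketch below (field and helper names as introduced by the series;
the values are only an example, modelled on what mvneta ends up using):

	struct page_pool_params pp_params = {
		.order = 0,
		.flags = PP_FLAG_DMA_MAP,
		.pool_size = size,
		.nid = cpu_to_node(0),
		.dev = pp->dev->dev.parent,
		.dma_dir = DMA_FROM_DEVICE,
		/* new fields introduced by this series */
		.offset = pp->rx_offset_correction,
		.max_len = MVNETA_MAX_RX_BUF_SIZE,
		.sync = 1,
	};
	struct page_pool *pool = page_pool_create(&pp_params);

	/* at recycle time (here in the XDP rx path), pass the length the
	 * CPU may have touched so page_pool only syncs that much back to
	 * the device
	 */
	__page_pool_put_page(pool, virt_to_head_page(xdp->data),
			     xdp->data_end - xdp->data_hard_start, true);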

Lorenzo Bianconi (3):
  net: mvneta: rely on page_pool_recycle_direct in mvneta_run_xdp
  net: page_pool: add the possibility to sync dma memory for non-coherent
    devices
  net: mvneta: get rid of huge dma sync in mvneta_rx_refill

 drivers/net/ethernet/marvell/mvneta.c | 23 ++++++++++------
 include/net/page_pool.h               | 11 +++++---
 net/core/page_pool.c                  | 39 ++++++++++++++++++++++-----
 3 files changed, 55 insertions(+), 18 deletions(-)

-- 
2.21.0



* [PATCH net-next 1/3] net: mvneta: rely on page_pool_recycle_direct in mvneta_run_xdp
  2019-11-10 12:09 [PATCH net-next 0/3] add DMA sync capability to page_pool API Lorenzo Bianconi
@ 2019-11-10 12:09 ` Lorenzo Bianconi
  2019-11-10 12:09 ` [PATCH net-next 2/3] net: page_pool: add the possibility to sync DMA memory for non-coherent devices Lorenzo Bianconi
  2019-11-10 12:09 ` [PATCH net-next 3/3] net: mvneta: get rid of huge DMA sync in mvneta_rx_refill Lorenzo Bianconi
  2 siblings, 0 replies; 21+ messages in thread
From: Lorenzo Bianconi @ 2019-11-10 12:09 UTC (permalink / raw)
  To: netdev
  Cc: lorenzo.bianconi, davem, thomas.petazzoni, brouer,
	ilias.apalodimas, matteo.croce

Rely on page_pool_recycle_direct and not on xdp_return_buff in
mvneta_run_xdp. This is a preliminary patch to limit the DMA sync
length to the one strictly necessary.

Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
---
 drivers/net/ethernet/marvell/mvneta.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/marvell/mvneta.c b/drivers/net/ethernet/marvell/mvneta.c
index 274ac39c0f0f..ed93eecb7485 100644
--- a/drivers/net/ethernet/marvell/mvneta.c
+++ b/drivers/net/ethernet/marvell/mvneta.c
@@ -2097,7 +2097,8 @@ mvneta_run_xdp(struct mvneta_port *pp, struct mvneta_rx_queue *rxq,
 		err = xdp_do_redirect(pp->dev, xdp, prog);
 		if (err) {
 			ret = MVNETA_XDP_DROPPED;
-			xdp_return_buff(xdp);
+			page_pool_recycle_direct(rxq->page_pool,
+						 virt_to_head_page(xdp->data));
 		} else {
 			ret = MVNETA_XDP_REDIR;
 		}
@@ -2106,7 +2107,8 @@ mvneta_run_xdp(struct mvneta_port *pp, struct mvneta_rx_queue *rxq,
 	case XDP_TX:
 		ret = mvneta_xdp_xmit_back(pp, xdp);
 		if (ret != MVNETA_XDP_TX)
-			xdp_return_buff(xdp);
+			page_pool_recycle_direct(rxq->page_pool,
+						 virt_to_head_page(xdp->data));
 		break;
 	default:
 		bpf_warn_invalid_xdp_action(act);
-- 
2.21.0



* [PATCH net-next 2/3] net: page_pool: add the possibility to sync DMA memory for non-coherent devices
  2019-11-10 12:09 [PATCH net-next 0/3] add DMA sync capability to page_pool API Lorenzo Bianconi
  2019-11-10 12:09 ` [PATCH net-next 1/3] net: mvneta: rely on page_pool_recycle_direct in mvneta_run_xdp Lorenzo Bianconi
@ 2019-11-10 12:09 ` Lorenzo Bianconi
  2019-11-11 16:48   ` Jesper Dangaard Brouer
                     ` (2 more replies)
  2019-11-10 12:09 ` [PATCH net-next 3/3] net: mvneta: get rid of huge DMA sync in mvneta_rx_refill Lorenzo Bianconi
  2 siblings, 3 replies; 21+ messages in thread
From: Lorenzo Bianconi @ 2019-11-10 12:09 UTC (permalink / raw)
  To: netdev
  Cc: lorenzo.bianconi, davem, thomas.petazzoni, brouer,
	ilias.apalodimas, matteo.croce

Introduce the following parameters in order to add the possibility to sync
the DMA memory area before putting allocated buffers in the page_pool caches:
- sync: set to 1 if the device is non cache-coherent and needs to flush the
  DMA area
- offset: DMA address offset where the DMA engine starts copying rx data
- max_len: maximum DMA memory size page_pool is allowed to flush. This is
  currently used in the __page_pool_alloc_pages_slow routine when pages
  are allocated from the page allocator.
These parameters are supposed to be set by device drivers.
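
For clarity, the buffer layout these parameters describe is roughly the
following (illustrative diagram only):

	page->dma_addr
	+-----------------+----------------------------------------+-----+
	| offset          | area synced for the device             | ... |
	| (rx headroom,   | (up to max_len bytes, starting at      |     |
	|  not synced)    |  dma_addr + offset)                    |     |
	+-----------------+----------------------------------------+-----+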

Tested-by: Matteo Croce <mcroce@redhat.com>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
---
 include/net/page_pool.h | 11 +++++++----
 net/core/page_pool.c    | 39 +++++++++++++++++++++++++++++++++------
 2 files changed, 40 insertions(+), 10 deletions(-)

diff --git a/include/net/page_pool.h b/include/net/page_pool.h
index 2cbcdbdec254..defbfd90ab46 100644
--- a/include/net/page_pool.h
+++ b/include/net/page_pool.h
@@ -65,6 +65,9 @@ struct page_pool_params {
 	int		nid;  /* Numa node id to allocate from pages from */
 	struct device	*dev; /* device, for DMA pre-mapping purposes */
 	enum dma_data_direction dma_dir; /* DMA mapping direction */
+	unsigned int	max_len; /* max DMA sync memory size */
+	unsigned int	offset;  /* DMA addr offset */
+	u8 sync;
 };
 
 struct page_pool {
@@ -150,8 +153,8 @@ static inline void page_pool_destroy(struct page_pool *pool)
 }
 
 /* Never call this directly, use helpers below */
-void __page_pool_put_page(struct page_pool *pool,
-			  struct page *page, bool allow_direct);
+void __page_pool_put_page(struct page_pool *pool, struct page *page,
+			  unsigned int dma_sync_size, bool allow_direct);
 
 static inline void page_pool_put_page(struct page_pool *pool,
 				      struct page *page, bool allow_direct)
@@ -160,14 +163,14 @@ static inline void page_pool_put_page(struct page_pool *pool,
 	 * allow registering MEM_TYPE_PAGE_POOL, but shield linker.
 	 */
 #ifdef CONFIG_PAGE_POOL
-	__page_pool_put_page(pool, page, allow_direct);
+	__page_pool_put_page(pool, page, 0, allow_direct);
 #endif
 }
 /* Very limited use-cases allow recycle direct */
 static inline void page_pool_recycle_direct(struct page_pool *pool,
 					    struct page *page)
 {
-	__page_pool_put_page(pool, page, true);
+	__page_pool_put_page(pool, page, 0, true);
 }
 
 /* API user MUST have disconnected alloc-side (not allowed to call
diff --git a/net/core/page_pool.c b/net/core/page_pool.c
index 5bc65587f1c4..af9514c2d15b 100644
--- a/net/core/page_pool.c
+++ b/net/core/page_pool.c
@@ -112,6 +112,17 @@ static struct page *__page_pool_get_cached(struct page_pool *pool)
 	return page;
 }
 
+/* Used for non-coherent devices */
+static void page_pool_dma_sync_for_device(struct page_pool *pool,
+					  struct page *page,
+					  unsigned int dma_sync_size)
+{
+	dma_sync_size = min(dma_sync_size, pool->p.max_len);
+	dma_sync_single_range_for_device(pool->p.dev, page->dma_addr,
+					 pool->p.offset, dma_sync_size,
+					 pool->p.dma_dir);
+}
+
 /* slow path */
 noinline
 static struct page *__page_pool_alloc_pages_slow(struct page_pool *pool,
@@ -156,6 +167,10 @@ static struct page *__page_pool_alloc_pages_slow(struct page_pool *pool,
 	}
 	page->dma_addr = dma;
 
+	/* non-coherent devices - flush memory */
+	if (pool->p.sync)
+		page_pool_dma_sync_for_device(pool, page, pool->p.max_len);
+
 skip_dma_map:
 	/* Track how many pages are held 'in-flight' */
 	pool->pages_state_hold_cnt++;
@@ -255,7 +270,8 @@ static void __page_pool_return_page(struct page_pool *pool, struct page *page)
 }
 
 static bool __page_pool_recycle_into_ring(struct page_pool *pool,
-				   struct page *page)
+					  struct page *page,
+					  unsigned int dma_sync_size)
 {
 	int ret;
 	/* BH protection not needed if current is serving softirq */
@@ -264,6 +280,10 @@ static bool __page_pool_recycle_into_ring(struct page_pool *pool,
 	else
 		ret = ptr_ring_produce_bh(&pool->ring, page);
 
+	/* non-coherent devices - flush memory */
+	if (ret == 0 && pool->p.sync)
+		page_pool_dma_sync_for_device(pool, page, dma_sync_size);
+
 	return (ret == 0) ? true : false;
 }
 
@@ -273,18 +293,23 @@ static bool __page_pool_recycle_into_ring(struct page_pool *pool,
  * Caller must provide appropriate safe context.
  */
 static bool __page_pool_recycle_direct(struct page *page,
-				       struct page_pool *pool)
+				       struct page_pool *pool,
+				       unsigned int dma_sync_size)
 {
 	if (unlikely(pool->alloc.count == PP_ALLOC_CACHE_SIZE))
 		return false;
 
 	/* Caller MUST have verified/know (page_ref_count(page) == 1) */
 	pool->alloc.cache[pool->alloc.count++] = page;
+
+	/* non-coherent devices - flush memory */
+	if (pool->p.sync)
+		page_pool_dma_sync_for_device(pool, page, dma_sync_size);
 	return true;
 }
 
-void __page_pool_put_page(struct page_pool *pool,
-			  struct page *page, bool allow_direct)
+void __page_pool_put_page(struct page_pool *pool, struct page *page,
+			  unsigned int dma_sync_size, bool allow_direct)
 {
 	/* This allocator is optimized for the XDP mode that uses
 	 * one-frame-per-page, but have fallbacks that act like the
@@ -296,10 +321,12 @@ void __page_pool_put_page(struct page_pool *pool,
 		/* Read barrier done in page_ref_count / READ_ONCE */
 
 		if (allow_direct && in_serving_softirq())
-			if (__page_pool_recycle_direct(page, pool))
+			if (__page_pool_recycle_direct(page, pool,
+						       dma_sync_size))
 				return;
 
-		if (!__page_pool_recycle_into_ring(pool, page)) {
+		if (!__page_pool_recycle_into_ring(pool, page,
+						   dma_sync_size)) {
 			/* Cache full, fallback to free pages */
 			__page_pool_return_page(pool, page);
 		}
-- 
2.21.0



* [PATCH net-next 3/3] net: mvneta: get rid of huge DMA sync in mvneta_rx_refill
  2019-11-10 12:09 [PATCH net-next 0/3] add DMA sync capability to page_pool API Lorenzo Bianconi
  2019-11-10 12:09 ` [PATCH net-next 1/3] net: mvneta: rely on page_pool_recycle_direct in mvneta_run_xdp Lorenzo Bianconi
  2019-11-10 12:09 ` [PATCH net-next 2/3] net: page_pool: add the possibility to sync DMA memory for non-coherent devices Lorenzo Bianconi
@ 2019-11-10 12:09 ` Lorenzo Bianconi
  2019-11-14 18:14   ` Jonathan Lemon
  2 siblings, 1 reply; 21+ messages in thread
From: Lorenzo Bianconi @ 2019-11-10 12:09 UTC (permalink / raw)
  To: netdev
  Cc: lorenzo.bianconi, davem, thomas.petazzoni, brouer,
	ilias.apalodimas, matteo.croce

Get rid of the costly dma_sync_single_for_device in mvneta_rx_refill
since the driver can now let the page_pool API manage the needed DMA
sync with a proper size.

- XDP_DROP DMA sync managed by mvneta driver:	~420Kpps
- XDP_DROP DMA sync managed by page_pool API:	~595Kpps

Tested-by: Matteo Croce <mcroce@redhat.com>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
---
 drivers/net/ethernet/marvell/mvneta.c | 25 +++++++++++++++----------
 1 file changed, 15 insertions(+), 10 deletions(-)

diff --git a/drivers/net/ethernet/marvell/mvneta.c b/drivers/net/ethernet/marvell/mvneta.c
index ed93eecb7485..591d580c68b4 100644
--- a/drivers/net/ethernet/marvell/mvneta.c
+++ b/drivers/net/ethernet/marvell/mvneta.c
@@ -1846,7 +1846,6 @@ static int mvneta_rx_refill(struct mvneta_port *pp,
 			    struct mvneta_rx_queue *rxq,
 			    gfp_t gfp_mask)
 {
-	enum dma_data_direction dma_dir;
 	dma_addr_t phys_addr;
 	struct page *page;
 
@@ -1856,9 +1855,6 @@ static int mvneta_rx_refill(struct mvneta_port *pp,
 		return -ENOMEM;
 
 	phys_addr = page_pool_get_dma_addr(page) + pp->rx_offset_correction;
-	dma_dir = page_pool_get_dma_dir(rxq->page_pool);
-	dma_sync_single_for_device(pp->dev->dev.parent, phys_addr,
-				   MVNETA_MAX_RX_BUF_SIZE, dma_dir);
 	mvneta_rx_desc_fill(rx_desc, phys_addr, page, rxq);
 
 	return 0;
@@ -2097,8 +2093,10 @@ mvneta_run_xdp(struct mvneta_port *pp, struct mvneta_rx_queue *rxq,
 		err = xdp_do_redirect(pp->dev, xdp, prog);
 		if (err) {
 			ret = MVNETA_XDP_DROPPED;
-			page_pool_recycle_direct(rxq->page_pool,
-						 virt_to_head_page(xdp->data));
+			__page_pool_put_page(rxq->page_pool,
+					virt_to_head_page(xdp->data),
+					xdp->data_end - xdp->data_hard_start,
+					true);
 		} else {
 			ret = MVNETA_XDP_REDIR;
 		}
@@ -2107,8 +2105,10 @@ mvneta_run_xdp(struct mvneta_port *pp, struct mvneta_rx_queue *rxq,
 	case XDP_TX:
 		ret = mvneta_xdp_xmit_back(pp, xdp);
 		if (ret != MVNETA_XDP_TX)
-			page_pool_recycle_direct(rxq->page_pool,
-						 virt_to_head_page(xdp->data));
+			__page_pool_put_page(rxq->page_pool,
+					virt_to_head_page(xdp->data),
+					xdp->data_end - xdp->data_hard_start,
+					true);
 		break;
 	default:
 		bpf_warn_invalid_xdp_action(act);
@@ -2117,8 +2117,10 @@ mvneta_run_xdp(struct mvneta_port *pp, struct mvneta_rx_queue *rxq,
 		trace_xdp_exception(pp->dev, prog, act);
 		/* fall through */
 	case XDP_DROP:
-		page_pool_recycle_direct(rxq->page_pool,
-					 virt_to_head_page(xdp->data));
+		__page_pool_put_page(rxq->page_pool,
+				     virt_to_head_page(xdp->data),
+				     xdp->data_end - xdp->data_hard_start,
+				     true);
 		ret = MVNETA_XDP_DROPPED;
 		break;
 	}
@@ -3072,6 +3074,9 @@ static int mvneta_create_page_pool(struct mvneta_port *pp,
 		.nid = cpu_to_node(0),
 		.dev = pp->dev->dev.parent,
 		.dma_dir = xdp_prog ? DMA_BIDIRECTIONAL : DMA_FROM_DEVICE,
+		.offset = pp->rx_offset_correction,
+		.max_len = MVNETA_MAX_RX_BUF_SIZE,
+		.sync = 1,
 	};
 	int err;
 
-- 
2.21.0



* Re: [PATCH net-next 2/3] net: page_pool: add the possibility to sync DMA memory for non-coherent devices
  2019-11-10 12:09 ` [PATCH net-next 2/3] net: page_pool: add the possibility to sync DMA memory for non-coherent devices Lorenzo Bianconi
@ 2019-11-11 16:48   ` Jesper Dangaard Brouer
  2019-11-11 19:11     ` Lorenzo Bianconi
  2019-11-13  8:29   ` Jesper Dangaard Brouer
  2019-11-14 18:48   ` Jonathan Lemon
  2 siblings, 1 reply; 21+ messages in thread
From: Jesper Dangaard Brouer @ 2019-11-11 16:48 UTC (permalink / raw)
  To: Lorenzo Bianconi
  Cc: netdev, lorenzo.bianconi, davem, thomas.petazzoni,
	ilias.apalodimas, matteo.croce, brouer

On Sun, 10 Nov 2019 14:09:09 +0200
Lorenzo Bianconi <lorenzo@kernel.org> wrote:

> Introduce the following parameters in order to add the possibility to sync
> DMA memory area before putting allocated buffers in the page_pool caches:

> - sync: set to 1 if device is non cache-coherent and needs to flush DMA area

I don't agree that this is only for non cache-coherent devices.

This change is generally for all device drivers.  Via setting 'sync'
(which I prefer to rename 'dma_sync') the driver requests that page_pool
takes over doing the DMA-sync-for-device. (Very important: the
DMA-sync-for-CPU is still the driver's responsibility.)  Drivers can
benefit from removing their calls to dma_sync_single_for_device().

We need to define the meaning/semantics of this setting (my definition):
- This means that all pages the driver gets from page_pool will be
  DMA-synced-for-device.

> - offset: DMA address offset where the DMA engine starts copying rx data

> - max_len: maximum DMA memory size page_pool is allowed to flush. This
>   is currently used in __page_pool_alloc_pages_slow routine when pages
>   are allocated from page allocator

Implementation wise (you did as I suggested offlist), this does the
DMA-sync-for-device at return time, i.e. at page_pool_put_page() time,
because we (often) know the length that was/can be touched by the CPU.
Knowing this length is the key to the optimization.

I also think you/we need to explain why this optimization is correct;
my attempt:

This optimization reduces the length of the DMA-sync-for-device.  The
optimization is valid because the page is initially DMA-synced-for-device,
as defined via max_len.  At driver RX time, the driver will do a
DMA-sync-for-CPU on the memory for the packet length.  What is
important is the memory occupied by the packet payload, because this is
the memory the CPU is allowed to read and modify.  If the CPU has not
written into a cache line, then we know the CPU will not be flushing it,
thus it doesn't need a DMA-sync-for-device.  As we don't track which
cache lines were written into, simply use the full packet length as
dma_sync_size at page_pool recycle time.  This also takes into account
any tail-extend.
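
As a concrete illustration (made-up numbers, order-0 page, max_len = 1536):

	/*
	 * alloc (slow path):     page_pool syncs-for-device max_len = 1536 bytes
	 * RX of a 128 byte frame: driver syncs-for-CPU 128 bytes
	 * recycle:               driver calls __page_pool_put_page(pool, page, 128, ...)
	 *                        -> page_pool syncs-for-device min(128, 1536) = 128
	 *                           bytes instead of the full 1536
	 */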


> These parameters are supposed to be set by device drivers


 
> [...]



-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer



* Re: [PATCH net-next 2/3] net: page_pool: add the possibility to sync DMA memory for non-coherent devices
  2019-11-11 16:48   ` Jesper Dangaard Brouer
@ 2019-11-11 19:11     ` Lorenzo Bianconi
  0 siblings, 0 replies; 21+ messages in thread
From: Lorenzo Bianconi @ 2019-11-11 19:11 UTC (permalink / raw)
  To: Jesper Dangaard Brouer
  Cc: netdev, lorenzo.bianconi, davem, thomas.petazzoni,
	ilias.apalodimas, matteo.croce


> On Sun, 10 Nov 2019 14:09:09 +0200
> Lorenzo Bianconi <lorenzo@kernel.org> wrote:
> 
> > Introduce the following parameters in order to add the possibility to sync
> > DMA memory area before putting allocated buffers in the page_pool caches:
> 
> > - sync: set to 1 if device is non cache-coherent and needs to flush DMA area
> 

Hi Jesper,

thx for the review

> I don't agree that this is only for non cache-coherent devices.
> 
> This change is generally for all device drivers.  Via setting 'sync'
> (which I prefer to rename 'dma_sync') driver request that page_pool
> takes over doing DMA-sync-for-device. (Very important, DMA-sync-for-CPU
> is still drivers responsibility).  Drivers can benefit from removing
> their calls to dma_sync_single_for_device().
> 
> We need to define meaning/semantics of this setting (my definition):
> - This means that all pages that driver gets from page_pool, will be
>   DMA-synced-for-device.

ack, will fix it in v2

> 
> > - offset: DMA address offset where the DMA engine starts copying rx data
> 
> > - max_len: maximum DMA memory size page_pool is allowed to flush. This
> >   is currently used in __page_pool_alloc_pages_slow routine when pages
> >   are allocated from page allocator
> 
> Implementation wise (you did as I suggested offlist), and does the
> DMA-sync-for-device at return-time page_pool_put_page() time, because
> we (often) know the length that was/can touched by CPU.  This is key to
> the optimization, that we know this length.

Right, when refilling the cache we know the exact length that was/can be
touched by the CPU.

> 
> I also think you/we need to explain why this optimization is correct,
> my attempt: 
> 
> This optimization reduce the length of the DMA-sync-for-device.  The
> optimization is valid, because page is initially DMA-synced-for-device,
> as defined via max_len.  At driver RX time, the driver will do a
> DMA-sync-for-CPU on the memory for the packet length.  What is
> important is the memory occupied by packet payload, because this is the
> memory CPU is allowed to read and modify.  If CPU have not written into
> a cache-line, then we know that CPU will not be flushing this, thus it
> doesn't need a DMA-sync-for-device.  As we don't track cache-lines
> written into, simply use the full packet length as dma_sync_size, at
> page_pool recycle time.  This also take into account any tail-extend.

ack, will update it in v2

Regards,
Lorenzo

> [...]


* Re: [PATCH net-next 2/3] net: page_pool: add the possibility to sync DMA memory for non-coherent devices
  2019-11-10 12:09 ` [PATCH net-next 2/3] net: page_pool: add the possibility to sync DMA memory for non-coherent devices Lorenzo Bianconi
  2019-11-11 16:48   ` Jesper Dangaard Brouer
@ 2019-11-13  8:29   ` Jesper Dangaard Brouer
  2019-11-14 18:48   ` Jonathan Lemon
  2 siblings, 0 replies; 21+ messages in thread
From: Jesper Dangaard Brouer @ 2019-11-13  8:29 UTC (permalink / raw)
  To: Lorenzo Bianconi
  Cc: netdev, lorenzo.bianconi, davem, thomas.petazzoni,
	ilias.apalodimas, matteo.croce, brouer

On Sun, 10 Nov 2019 14:09:09 +0200
Lorenzo Bianconi <lorenzo@kernel.org> wrote:

> diff --git a/include/net/page_pool.h b/include/net/page_pool.h
> index 2cbcdbdec254..defbfd90ab46 100644
[...]
> @@ -150,8 +153,8 @@ static inline void page_pool_destroy(struct page_pool *pool)
>  }
>  
>  /* Never call this directly, use helpers below */
> -void __page_pool_put_page(struct page_pool *pool,
> -			  struct page *page, bool allow_direct);
> +void __page_pool_put_page(struct page_pool *pool, struct page *page,
> +			  unsigned int dma_sync_size, bool allow_direct);
>  
>  static inline void page_pool_put_page(struct page_pool *pool,
>  				      struct page *page, bool allow_direct)
> @@ -160,14 +163,14 @@ static inline void page_pool_put_page(struct page_pool *pool,
>  	 * allow registering MEM_TYPE_PAGE_POOL, but shield linker.
>  	 */
>  #ifdef CONFIG_PAGE_POOL
> -	__page_pool_put_page(pool, page, allow_direct);
> +	__page_pool_put_page(pool, page, 0, allow_direct);
>  #endif
>  }
>  /* Very limited use-cases allow recycle direct */
>  static inline void page_pool_recycle_direct(struct page_pool *pool,
>  					    struct page *page)
>  {
> -	__page_pool_put_page(pool, page, true);
> +	__page_pool_put_page(pool, page, 0, true);
>  }

We need to use a different "default" value than zero for 'dma_sync_size' in
the above calls.  I suggest either 0xFFFFFFFF or -1 (which as unsigned is
0xFFFFFFFF).

The point is that in case the caller doesn't know the length (that the CPU
has had access to), page_pool will need to sync with pool->p.max_len.

If choosing such a larger default value here, your code below takes
care of it via min(dma_sync_size, pool->p.max_len).
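
A minimal sketch of that suggestion (the macro name here is made up, only to
illustrate the idea):

	/* "caller doesn't know the length" marker; it gets clamped to
	 * pool->p.max_len by the min() in page_pool_dma_sync_for_device()
	 */
	#define PP_DMA_SYNC_WHOLE_AREA	(~0U)

	static inline void page_pool_recycle_direct(struct page_pool *pool,
						    struct page *page)
	{
		__page_pool_put_page(pool, page, PP_DMA_SYNC_WHOLE_AREA, true);
	}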

 
>  /* API user MUST have disconnected alloc-side (not allowed to call
> diff --git a/net/core/page_pool.c b/net/core/page_pool.c
> index 5bc65587f1c4..af9514c2d15b 100644
> --- a/net/core/page_pool.c
> +++ b/net/core/page_pool.c
> @@ -112,6 +112,17 @@ static struct page
> *__page_pool_get_cached(struct page_pool *pool) return page;
>  }
>  
> +/* Used for non-coherent devices */
> +static void page_pool_dma_sync_for_device(struct page_pool *pool,
> +					  struct page *page,
> +					  unsigned int dma_sync_size)
> +{
> +	dma_sync_size = min(dma_sync_size, pool->p.max_len);
> +	dma_sync_single_range_for_device(pool->p.dev, page->dma_addr,
> +					 pool->p.offset, dma_sync_size,
> +					 pool->p.dma_dir);
> +}


-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer



* Re: [PATCH net-next 3/3] net: mvneta: get rid of huge DMA sync in mvneta_rx_refill
  2019-11-10 12:09 ` [PATCH net-next 3/3] net: mvneta: get rid of huge DMA sync in mvneta_rx_refill Lorenzo Bianconi
@ 2019-11-14 18:14   ` Jonathan Lemon
  2019-11-14 18:18     ` Ilias Apalodimas
  0 siblings, 1 reply; 21+ messages in thread
From: Jonathan Lemon @ 2019-11-14 18:14 UTC (permalink / raw)
  To: Lorenzo Bianconi
  Cc: netdev, lorenzo.bianconi, davem, thomas.petazzoni, brouer,
	ilias.apalodimas, matteo.croce

On 10 Nov 2019, at 4:09, Lorenzo Bianconi wrote:

> Get rid of costly dma_sync_single_for_device in mvneta_rx_refill
> since now the driver can let page_pool API to manage needed DMA
> sync with a proper size.
>
> - XDP_DROP DMA sync managed by mvneta driver:	~420Kpps
> - XDP_DROP DMA sync managed by page_pool API:	~595Kpps
>
> Tested-by: Matteo Croce <mcroce@redhat.com>
> Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
> ---
>  drivers/net/ethernet/marvell/mvneta.c | 25 +++++++++++++++----------
>  1 file changed, 15 insertions(+), 10 deletions(-)
>
> diff --git a/drivers/net/ethernet/marvell/mvneta.c b/drivers/net/ethernet/marvell/mvneta.c
> index ed93eecb7485..591d580c68b4 100644
> --- a/drivers/net/ethernet/marvell/mvneta.c
> +++ b/drivers/net/ethernet/marvell/mvneta.c
> @@ -1846,7 +1846,6 @@ static int mvneta_rx_refill(struct mvneta_port *pp,
>  			    struct mvneta_rx_queue *rxq,
>  			    gfp_t gfp_mask)
>  {
> -	enum dma_data_direction dma_dir;
>  	dma_addr_t phys_addr;
>  	struct page *page;
>
> @@ -1856,9 +1855,6 @@ static int mvneta_rx_refill(struct mvneta_port *pp,
>  		return -ENOMEM;
>
>  	phys_addr = page_pool_get_dma_addr(page) + pp->rx_offset_correction;
> -	dma_dir = page_pool_get_dma_dir(rxq->page_pool);
> -	dma_sync_single_for_device(pp->dev->dev.parent, phys_addr,
> -				   MVNETA_MAX_RX_BUF_SIZE, dma_dir);
>  	mvneta_rx_desc_fill(rx_desc, phys_addr, page, rxq);
>
>  	return 0;
> @@ -2097,8 +2093,10 @@ mvneta_run_xdp(struct mvneta_port *pp, struct mvneta_rx_queue *rxq,
>  		err = xdp_do_redirect(pp->dev, xdp, prog);
>  		if (err) {
>  			ret = MVNETA_XDP_DROPPED;
> -			page_pool_recycle_direct(rxq->page_pool,
> -						 virt_to_head_page(xdp->data));
> +			__page_pool_put_page(rxq->page_pool,
> +					virt_to_head_page(xdp->data),
> +					xdp->data_end - xdp->data_hard_start,
> +					true);

I just have a clarifying question.  Here, the RX buffer was received and
then dma_sync'd to the CPU.  Now, it is going to be recycled for RX again;
does it actually need to be sync'd back to the device?

I'm asking since several of the other network drivers (mellanox, for
example) don't resync the buffer back to the device when recycling it for
reuse.
-- 
Jonathan


* Re: [PATCH net-next 3/3] net: mvneta: get rid of huge DMA sync in mvneta_rx_refill
  2019-11-14 18:14   ` Jonathan Lemon
@ 2019-11-14 18:18     ` Ilias Apalodimas
  0 siblings, 0 replies; 21+ messages in thread
From: Ilias Apalodimas @ 2019-11-14 18:18 UTC (permalink / raw)
  To: Jonathan Lemon
  Cc: Lorenzo Bianconi, netdev, lorenzo.bianconi, davem,
	thomas.petazzoni, brouer, matteo.croce

Hi Jonathan,

On Thu, Nov 14, 2019 at 10:14:19AM -0800, Jonathan Lemon wrote:
> On 10 Nov 2019, at 4:09, Lorenzo Bianconi wrote:
> 
> > Get rid of costly dma_sync_single_for_device in mvneta_rx_refill
> > since now the driver can let page_pool API to manage needed DMA
> > sync with a proper size.
> > 
> > - XDP_DROP DMA sync managed by mvneta driver:	~420Kpps
> > - XDP_DROP DMA sync managed by page_pool API:	~595Kpps
> > 
> > Tested-by: Matteo Croce <mcroce@redhat.com>
> > Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
> > ---
> >  drivers/net/ethernet/marvell/mvneta.c | 25 +++++++++++++++----------
> >  1 file changed, 15 insertions(+), 10 deletions(-)
> > 
> > diff --git a/drivers/net/ethernet/marvell/mvneta.c b/drivers/net/ethernet/marvell/mvneta.c
> > index ed93eecb7485..591d580c68b4 100644
> > --- a/drivers/net/ethernet/marvell/mvneta.c
> > +++ b/drivers/net/ethernet/marvell/mvneta.c
> > @@ -1846,7 +1846,6 @@ static int mvneta_rx_refill(struct mvneta_port *pp,
> >  			    struct mvneta_rx_queue *rxq,
> >  			    gfp_t gfp_mask)
> >  {
> > -	enum dma_data_direction dma_dir;
> >  	dma_addr_t phys_addr;
> >  	struct page *page;
> > 
> > @@ -1856,9 +1855,6 @@ static int mvneta_rx_refill(struct mvneta_port *pp,
> >  		return -ENOMEM;
> > 
> >  	phys_addr = page_pool_get_dma_addr(page) + pp->rx_offset_correction;
> > -	dma_dir = page_pool_get_dma_dir(rxq->page_pool);
> > -	dma_sync_single_for_device(pp->dev->dev.parent, phys_addr,
> > -				   MVNETA_MAX_RX_BUF_SIZE, dma_dir);
> >  	mvneta_rx_desc_fill(rx_desc, phys_addr, page, rxq);
> > 
> >  	return 0;
> > @@ -2097,8 +2093,10 @@ mvneta_run_xdp(struct mvneta_port *pp, struct mvneta_rx_queue *rxq,
> >  		err = xdp_do_redirect(pp->dev, xdp, prog);
> >  		if (err) {
> >  			ret = MVNETA_XDP_DROPPED;
> > -			page_pool_recycle_direct(rxq->page_pool,
> > -						 virt_to_head_page(xdp->data));
> > +			__page_pool_put_page(rxq->page_pool,
> > +					virt_to_head_page(xdp->data),
> > +					xdp->data_end - xdp->data_hard_start,
> > +					true);
> 
> I just have a clarifying question.  Here, the RX buffer was received and
> then dma_sync'd to the CPU.  Now, it is going to be recycled for RX again;
> does it actually need to be sync'd back to the device?
> 
> I'm asking since several of the other network drivers (mellanox, for
> example) don't resync the buffer back to the device when recycling it for
> reuse.

I think that if no one apart from the NIC touches the memory, you don't have
any pending cache writes you have to account for.
So since the buffer is completely under the driver's control, as long as you
can guarantee no one is going to write to it, you can hand it back to the
device without syncing (the BPF use case, where the bpf program changes the
packet, for example breaks this rule).

Thanks
/Ilias
> -- 
> Jonathan


* Re: [PATCH net-next 2/3] net: page_pool: add the possibility to sync DMA memory for non-coherent devices
  2019-11-10 12:09 ` [PATCH net-next 2/3] net: page_pool: add the possibility to sync DMA memory for non-coherent devices Lorenzo Bianconi
  2019-11-11 16:48   ` Jesper Dangaard Brouer
  2019-11-13  8:29   ` Jesper Dangaard Brouer
@ 2019-11-14 18:48   ` Jonathan Lemon
  2019-11-14 18:53     ` Ilias Apalodimas
  2 siblings, 1 reply; 21+ messages in thread
From: Jonathan Lemon @ 2019-11-14 18:48 UTC (permalink / raw)
  To: Lorenzo Bianconi
  Cc: netdev, lorenzo.bianconi, davem, thomas.petazzoni, brouer,
	ilias.apalodimas, matteo.croce

On 10 Nov 2019, at 4:09, Lorenzo Bianconi wrote:

> Introduce the following parameters in order to add the possibility to sync
> DMA memory area before putting allocated buffers in the page_pool caches:
> - sync: set to 1 if device is non cache-coherent and needs to flush DMA
>   area
> - offset: DMA address offset where the DMA engine starts copying rx data
> - max_len: maximum DMA memory size page_pool is allowed to flush. This
>   is currently used in __page_pool_alloc_pages_slow routine when pages
>   are allocated from page allocator
> These parameters are supposed to be set by device drivers
>
> Tested-by: Matteo Croce <mcroce@redhat.com>
> Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
> ---
>  include/net/page_pool.h | 11 +++++++----
>  net/core/page_pool.c    | 39 +++++++++++++++++++++++++++++++++------
>  2 files changed, 40 insertions(+), 10 deletions(-)
>
> diff --git a/include/net/page_pool.h b/include/net/page_pool.h
> index 2cbcdbdec254..defbfd90ab46 100644
> --- a/include/net/page_pool.h
> +++ b/include/net/page_pool.h
> @@ -65,6 +65,9 @@ struct page_pool_params {
>  	int		nid;  /* Numa node id to allocate from pages from */
>  	struct device	*dev; /* device, for DMA pre-mapping purposes */
>  	enum dma_data_direction dma_dir; /* DMA mapping direction */
> +	unsigned int	max_len; /* max DMA sync memory size */
> +	unsigned int	offset;  /* DMA addr offset */
> +	u8 sync;
>  };

How about using PP_FLAG_DMA_SYNC instead of another flag word?
(then it can also be gated on having DMA_MAP enabled)
-- 
Jonathan

> [...]


* Re: [PATCH net-next 2/3] net: page_pool: add the possibility to sync DMA memory for non-coherent devices
  2019-11-14 18:48   ` Jonathan Lemon
@ 2019-11-14 18:53     ` Ilias Apalodimas
  2019-11-14 20:27       ` Jonathan Lemon
  0 siblings, 1 reply; 21+ messages in thread
From: Ilias Apalodimas @ 2019-11-14 18:53 UTC (permalink / raw)
  To: Jonathan Lemon
  Cc: Lorenzo Bianconi, netdev, lorenzo.bianconi, davem,
	thomas.petazzoni, brouer, matteo.croce

[...]
> > index 2cbcdbdec254..defbfd90ab46 100644
> > --- a/include/net/page_pool.h
> > +++ b/include/net/page_pool.h
> > @@ -65,6 +65,9 @@ struct page_pool_params {
> >  	int		nid;  /* Numa node id to allocate from pages from */
> >  	struct device	*dev; /* device, for DMA pre-mapping purposes */
> >  	enum dma_data_direction dma_dir; /* DMA mapping direction */
> > +	unsigned int	max_len; /* max DMA sync memory size */
> > +	unsigned int	offset;  /* DMA addr offset */
> > +	u8 sync;
> >  };
> 
> How about using PP_FLAG_DMA_SYNC instead of another flag word?
> (then it can also be gated on having DMA_MAP enabled)

You mean instead of the u8?
As you pointed out in your other comment on the series, some cards don't sync
back to the device.
As the API tries to be generic, a u8 was chosen instead of a flag to cover
these use cases. So in time we'll change the semantics of this to 'always
sync', 'don't sync if it's an skb-only queue', etc.
The first case Lorenzo covered is syncing only the required length instead
of the full buffer.
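
As an illustration of what those semantics could look like later on (names
and values here are purely hypothetical, not part of the posted patch):

	enum pp_sync_mode {
		PP_SYNC_NONE = 0,	/* pool never syncs for device */
		PP_SYNC_ALWAYS,		/* sync every recycled buffer */
		PP_SYNC_SKIP_SKB_QUEUE,	/* skip sync on skb-only queues */
	};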


Thanks
/Ilias


* Re: [PATCH net-next 2/3] net: page_pool: add the possibility to sync DMA memory for non-coherent devices
  2019-11-14 18:53     ` Ilias Apalodimas
@ 2019-11-14 20:27       ` Jonathan Lemon
  2019-11-14 20:42         ` Ilias Apalodimas
  0 siblings, 1 reply; 21+ messages in thread
From: Jonathan Lemon @ 2019-11-14 20:27 UTC (permalink / raw)
  To: Ilias Apalodimas
  Cc: Lorenzo Bianconi, netdev, lorenzo.bianconi, davem,
	thomas.petazzoni, brouer, matteo.croce



On 14 Nov 2019, at 10:53, Ilias Apalodimas wrote:

> [...]
>>> index 2cbcdbdec254..defbfd90ab46 100644
>>> --- a/include/net/page_pool.h
>>> +++ b/include/net/page_pool.h
>>> @@ -65,6 +65,9 @@ struct page_pool_params {
>>>  	int		nid;  /* Numa node id to allocate from pages from */
>>>  	struct device	*dev; /* device, for DMA pre-mapping purposes */
>>>  	enum dma_data_direction dma_dir; /* DMA mapping direction */
>>> +	unsigned int	max_len; /* max DMA sync memory size */
>>> +	unsigned int	offset;  /* DMA addr offset */
>>> +	u8 sync;
>>>  };
>>
>> How about using PP_FLAG_DMA_SYNC instead of another flag word?
>> (then it can also be gated on having DMA_MAP enabled)
>
> You mean instead of the u8?
> As you pointed out on your V2 comment of the mail, some cards don't 
> sync back to
> device.
> As the API tries to be generic a u8 was choosen instead of a flag to 
> cover these
> use cases. So in time we'll change the semantics of this to 'always 
> sync', 'dont
> sync if it's an skb-only queue' etc.
> The first case Lorenzo covered is sync the required len only instead 
> of the full
> buffer

Yes, I meant instead of:
+		.sync = 1,

Something like:
         .flags = PP_FLAG_DMA_MAP | PP_FLAG_DMA_SYNC

Since .sync alone doesn't make sense if the page pool isn't performing any
DMA mapping, right?  Then existing drivers, if they're converted, can just
add the SYNC flag.

I did see the initial case where only the RX_BUF_SIZE (1536) is sync'd
instead of the full page.

Could you expand on your 'skb-only queue' comment?  I'm currently running
a variant of your patch where iommu mapped pages are attached to skb's and
sent up the stack, then reclaimed on release.  I imagine that with this
change, they would have the full RX_BUF_SIZE sync'd before returning to the
driver, since the upper layers could basically do anything with the buffer
area.
-- 
Jonathan


* Re: [PATCH net-next 2/3] net: page_pool: add the possibility to sync DMA memory for non-coherent devices
  2019-11-14 20:27       ` Jonathan Lemon
@ 2019-11-14 20:42         ` Ilias Apalodimas
  2019-11-14 21:04           ` Jonathan Lemon
  0 siblings, 1 reply; 21+ messages in thread
From: Ilias Apalodimas @ 2019-11-14 20:42 UTC (permalink / raw)
  To: Jonathan Lemon
  Cc: Lorenzo Bianconi, netdev, lorenzo.bianconi, davem,
	thomas.petazzoni, brouer, matteo.croce

Hi Jonathan,

On Thu, Nov 14, 2019 at 12:27:40PM -0800, Jonathan Lemon wrote:
> 
> 
> On 14 Nov 2019, at 10:53, Ilias Apalodimas wrote:
> 
> > [...]
> > > > index 2cbcdbdec254..defbfd90ab46 100644
> > > > --- a/include/net/page_pool.h
> > > > +++ b/include/net/page_pool.h
> > > > @@ -65,6 +65,9 @@ struct page_pool_params {
> > > >  	int		nid;  /* Numa node id to allocate from pages from */
> > > >  	struct device	*dev; /* device, for DMA pre-mapping purposes */
> > > >  	enum dma_data_direction dma_dir; /* DMA mapping direction */
> > > > +	unsigned int	max_len; /* max DMA sync memory size */
> > > > +	unsigned int	offset;  /* DMA addr offset */
> > > > +	u8 sync;
> > > >  };
> > > 
> > > How about using PP_FLAG_DMA_SYNC instead of another flag word?
> > > (then it can also be gated on having DMA_MAP enabled)
> > 
> > You mean instead of the u8?
> > As you pointed out on your V2 comment of the mail, some cards don't sync
> > back to
> > device.
> > As the API tries to be generic a u8 was choosen instead of a flag to
> > cover these
> > use cases. So in time we'll change the semantics of this to 'always
> > sync', 'dont
> > sync if it's an skb-only queue' etc.
> > The first case Lorenzo covered is sync the required len only instead of
> > the full
> > buffer
> 
> Yes, I meant instead of:
> +		.sync = 1,
> 
> Something like:
>         .flags = PP_FLAG_DMA_MAP | PP_FLAG_DMA_SYNC
> 
> Since .sync alone doesn't make sense if the page pool isn't performing any
> DMA mapping, right?  

Correct. If the sync happens regardless of the page pool mapping capabilities,
this will affect performance negatively as well (on non-coherent
architectures).

> Then existing drivers, if they're converted, can just
> add the SYNC flag.
> 
> I did see the initial case where only the RX_BUF_SIZE (1536) is sync'd
> instead of the full page.
> 
> Could you expand on your 'skb-only queue' comment?  I'm currently running
> a variant of your patch where iommu mapped pages are attached to skb's and
> sent up the stack, then reclaimed on release.  I imagine that with this
> change, they would have the full RX_BUF_SIZE sync'd before returning to the
> driver, since the upper layers could basically do anything with the buffer
> area.

The idea was that a page_pool lives per device queue. Usually some queues are
reserved for XDP only. Since eBPF progs can change the packet, we have to sync
for the device before we fill in the device descriptors.

For the skb-reserved queues, this depends on the 'anything'. If the rest of the
layers touch (or rather write) into that area, then we'll again have to sync.
If we know that the data has not been altered though, we can hand it back to
the device, skipping that sync, right?


Thanks
/Ilias
> -- 
> Jonathan


* Re: [PATCH net-next 2/3] net: page_pool: add the possibility to sync DMA memory for non-coherent devices
  2019-11-14 20:42         ` Ilias Apalodimas
@ 2019-11-14 21:04           ` Jonathan Lemon
  2019-11-14 21:43             ` Jesper Dangaard Brouer
  2019-11-15  7:17             ` Ilias Apalodimas
  0 siblings, 2 replies; 21+ messages in thread
From: Jonathan Lemon @ 2019-11-14 21:04 UTC (permalink / raw)
  To: Ilias Apalodimas
  Cc: Lorenzo Bianconi, netdev, lorenzo.bianconi, davem,
	thomas.petazzoni, brouer, matteo.croce

On 14 Nov 2019, at 12:42, Ilias Apalodimas wrote:

> Hi Jonathan,
>
> On Thu, Nov 14, 2019 at 12:27:40PM -0800, Jonathan Lemon wrote:
>>
>>
>> On 14 Nov 2019, at 10:53, Ilias Apalodimas wrote:
>>
>>> [...]
>>>>> index 2cbcdbdec254..defbfd90ab46 100644
>>>>> --- a/include/net/page_pool.h
>>>>> +++ b/include/net/page_pool.h
>>>>> @@ -65,6 +65,9 @@ struct page_pool_params {
>>>>>  	int		nid;  /* Numa node id to allocate from pages from */
>>>>>  	struct device	*dev; /* device, for DMA pre-mapping purposes */
>>>>>  	enum dma_data_direction dma_dir; /* DMA mapping direction */
>>>>> +	unsigned int	max_len; /* max DMA sync memory size */
>>>>> +	unsigned int	offset;  /* DMA addr offset */
>>>>> +	u8 sync;
>>>>>  };
>>>>
>>>> How about using PP_FLAG_DMA_SYNC instead of another flag word?
>>>> (then it can also be gated on having DMA_MAP enabled)
>>>
>>> You mean instead of the u8?
>>> As you pointed out on your V2 comment of the mail, some cards don't 
>>> sync
>>> back to
>>> device.
>>> As the API tries to be generic a u8 was choosen instead of a flag to
>>> cover these
>>> use cases. So in time we'll change the semantics of this to 'always
>>> sync', 'dont
>>> sync if it's an skb-only queue' etc.
>>> The first case Lorenzo covered is sync the required len only instead 
>>> of
>>> the full
>>> buffer
>>
>> Yes, I meant instead of:
>> +		.sync = 1,
>>
>> Something like:
>>         .flags = PP_FLAG_DMA_MAP | PP_FLAG_DMA_SYNC
>>
>> Since .sync alone doesn't make sense if the page pool isn't 
>> performing any
>> DMA mapping, right?
>
> Correct. If the sync happens regardless of the page pool mapping 
> capabilities,
> this will affect performance negatively as well (on non-coherent 
> architectures)
>
>> Then existing drivers, if they're converted, can just
>> add the SYNC flag.
>>
>> I did see the initial case where only the RX_BUF_SIZE (1536) is 
>> sync'd
>> instead of the full page.
>>
>> Could you expand on your 'skb-only queue' comment?  I'm currently 
>> running
>> a variant of your patch where iommu mapped pages are attached to 
>> skb's and
>> sent up the stack, then reclaimed on release.  I imagine that with 
>> this
>> change, they would have the full RX_BUF_SIZE sync'd before returning 
>> to the
>> driver, since the upper layers could basically do anything with the 
>> buffer
>> area.
>
> The idea was that page_pool lives per device queue. Usually some 
> queues are
> reserved for XDP only. Since eBPF progs can change the packet we have 
> to sync
> for the device, before we fill in the device descriptors.

And some devices (mlx4) run xdp on the normal RX queue, and if the
verdict is PASS, a skb is constructed and sent up the stack.


> For the skb reserved queues, this depends on the 'anything'. If the 
> rest of the
> layers touch (or rather write) into that area, then we'll again gave 
> to sync.
> If we know that the data has not been altered though, we can hand them 
> back to
> the device skipping that sync right?

Sure, but this is also true for eBPF programs.  How would the driver know
that the data has not been altered / compacted by the upper layers?
-- 
Jonathan

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH net-next 2/3] net: page_pool: add the possibility to sync DMA memory for non-coherent devices
  2019-11-14 21:04           ` Jonathan Lemon
@ 2019-11-14 21:43             ` Jesper Dangaard Brouer
  2019-11-15  7:05               ` Ilias Apalodimas
  2019-11-15  7:17             ` Ilias Apalodimas
  1 sibling, 1 reply; 21+ messages in thread
From: Jesper Dangaard Brouer @ 2019-11-14 21:43 UTC (permalink / raw)
  To: Jonathan Lemon
  Cc: Ilias Apalodimas, Lorenzo Bianconi, netdev, lorenzo.bianconi,
	davem, thomas.petazzoni, matteo.croce, brouer

On Thu, 14 Nov 2019 13:04:26 -0800
"Jonathan Lemon" <jonathan.lemon@gmail.com> wrote:

> On 14 Nov 2019, at 12:42, Ilias Apalodimas wrote:
> 
> > Hi Jonathan,
> >
> > On Thu, Nov 14, 2019 at 12:27:40PM -0800, Jonathan Lemon wrote:  
> >>
> >>
> >> On 14 Nov 2019, at 10:53, Ilias Apalodimas wrote:
> >>  
> >>> [...]  
> >>>>> index 2cbcdbdec254..defbfd90ab46 100644
> >>>>> --- a/include/net/page_pool.h
> >>>>> +++ b/include/net/page_pool.h
> >>>>> @@ -65,6 +65,9 @@ struct page_pool_params {
> >>>>>  	int		nid;  /* Numa node id to allocate from pages from */
> >>>>>  	struct device	*dev; /* device, for DMA pre-mapping purposes */
> >>>>>  	enum dma_data_direction dma_dir; /* DMA mapping direction */
> >>>>> +	unsigned int	max_len; /* max DMA sync memory size */
> >>>>> +	unsigned int	offset;  /* DMA addr offset */
> >>>>> +	u8 sync;
> >>>>>  };  
> >>>>
> >>>> How about using PP_FLAG_DMA_SYNC instead of another flag word?
> >>>> (then it can also be gated on having DMA_MAP enabled)  
> >>>
> >>> You mean instead of the u8?
> >>> As you pointed out on your V2 comment of the mail, some cards don't 
> >>> sync back to device.
> >>>
> >>> As the API tries to be generic a u8 was choosen instead of a flag
> >>> to cover these use cases. So in time we'll change the semantics of
> >>> this to 'always sync', 'dont sync if it's an skb-only queue' etc.
> >>>
> >>> The first case Lorenzo covered is sync the required len only instead 
> >>> of the full buffer  
> >>
> >> Yes, I meant instead of:
> >> +		.sync = 1,
> >>
> >> Something like:
> >>         .flags = PP_FLAG_DMA_MAP | PP_FLAG_DMA_SYNC
> >>

I actually agree and think we could use a flag. I suggest
PP_FLAG_DMA_SYNC_DEV to indicate that this is DMA-sync-for-device.

Ilias, notice the change I requested from Lorenzo: the dma_sync_size
default value is 0xFFFFFFFF (-1).  That makes dma_sync_size==0 a valid
value, which you can use in the cases where you know that nobody has
written into the data area.  This allows us to selectively skip the sync
in those cases.
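
Roughly, the semantics being described (a sketch only; the helper name is
invented here, the fields are the ones from Lorenzo's page_pool_params patch):

/* dma_sync_size == -1 (0xFFFFFFFF): sync up to pool->p.max_len.
 * dma_sync_size == 0: nobody wrote into the data area, skip the sync.
 */
static void maybe_sync_for_device(struct page_pool *pool, dma_addr_t dma,
				  unsigned int dma_sync_size)
{
	if (!dma_sync_size)
		return;

	dma_sync_size = min(dma_sync_size, pool->p.max_len);
	dma_sync_single_for_device(pool->p.dev, dma + pool->p.offset,
				   dma_sync_size, pool->p.dma_dir);
}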

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH net-next 2/3] net: page_pool: add the possibility to sync DMA memory for non-coherent devices
  2019-11-14 21:43             ` Jesper Dangaard Brouer
@ 2019-11-15  7:05               ` Ilias Apalodimas
  2019-11-15  7:49                 ` Lorenzo Bianconi
  0 siblings, 1 reply; 21+ messages in thread
From: Ilias Apalodimas @ 2019-11-15  7:05 UTC (permalink / raw)
  To: Jesper Dangaard Brouer
  Cc: Jonathan Lemon, Lorenzo Bianconi, netdev, lorenzo.bianconi,
	davem, thomas.petazzoni, matteo.croce

On Thu, Nov 14, 2019 at 10:43:09PM +0100, Jesper Dangaard Brouer wrote:
> On Thu, 14 Nov 2019 13:04:26 -0800
> "Jonathan Lemon" <jonathan.lemon@gmail.com> wrote:
> 
> > On 14 Nov 2019, at 12:42, Ilias Apalodimas wrote:
> > 
> > > Hi Jonathan,
> > >
> > > On Thu, Nov 14, 2019 at 12:27:40PM -0800, Jonathan Lemon wrote:  
> > >>
> > >>
> > >> On 14 Nov 2019, at 10:53, Ilias Apalodimas wrote:
> > >>  
> > >>> [...]  
> > >>>>> index 2cbcdbdec254..defbfd90ab46 100644
> > >>>>> --- a/include/net/page_pool.h
> > >>>>> +++ b/include/net/page_pool.h
> > >>>>> @@ -65,6 +65,9 @@ struct page_pool_params {
> > >>>>>  	int		nid;  /* Numa node id to allocate from pages from */
> > >>>>>  	struct device	*dev; /* device, for DMA pre-mapping purposes */
> > >>>>>  	enum dma_data_direction dma_dir; /* DMA mapping direction */
> > >>>>> +	unsigned int	max_len; /* max DMA sync memory size */
> > >>>>> +	unsigned int	offset;  /* DMA addr offset */
> > >>>>> +	u8 sync;
> > >>>>>  };  
> > >>>>
> > >>>> How about using PP_FLAG_DMA_SYNC instead of another flag word?
> > >>>> (then it can also be gated on having DMA_MAP enabled)  
> > >>>
> > >>> You mean instead of the u8?
> > >>> As you pointed out on your V2 comment of the mail, some cards don't 
> > >>> sync back to device.
> > >>>
> > >>> As the API tries to be generic a u8 was choosen instead of a flag
> > >>> to cover these use cases. So in time we'll change the semantics of
> > >>> this to 'always sync', 'dont sync if it's an skb-only queue' etc.
> > >>>
> > >>> The first case Lorenzo covered is sync the required len only instead 
> > >>> of the full buffer  
> > >>
> > >> Yes, I meant instead of:
> > >> +		.sync = 1,
> > >>
> > >> Something like:
> > >>         .flags = PP_FLAG_DMA_MAP | PP_FLAG_DMA_SYNC
> > >>
> 
> I actually agree and think we could use a flag. I suggest
> PP_FLAG_DMA_SYNC_DEV to indicate that this DMA-sync-for-device.
> 
> Ilias notice that the change I requested to Lorenzo, that dma_sync_size
> default value is 0xFFFFFFFF (-1).  That makes dma_sync_size==0 a valid
> value, which you can use in the cases, where you know that nobody have
> written into the data-area.  This allow us to selectively choose it for
> these cases.

Okay, then I guess the flag is a better fit for this.
The only difference would be that the sync semantics will be applied on a
'per packet' basis instead of a 'per pool' one, but that should be fine for
our cases.
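
To illustrate, a hypothetical call site (assuming the recycle helper grows a
dma_sync_size argument, which is not the current API):

	/* the driver, not the pool, decides at recycle time how much of this
	 * particular buffer needs syncing (0 == untouched, skip it)
	 */
	page_pool_put_page(rxq->page_pool, virt_to_head_page(xdp->data),
			   xdp->data_end - xdp->data_hard_start,
			   true /* allow_direct */);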

Cheers
/Ilias
> 
> -- 
> Best regards,
>   Jesper Dangaard Brouer
>   MSc.CS, Principal Kernel Engineer at Red Hat
>   LinkedIn: http://www.linkedin.com/in/brouer
> 

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH net-next 2/3] net: page_pool: add the possibility to sync DMA memory for non-coherent devices
  2019-11-14 21:04           ` Jonathan Lemon
  2019-11-14 21:43             ` Jesper Dangaard Brouer
@ 2019-11-15  7:17             ` Ilias Apalodimas
  1 sibling, 0 replies; 21+ messages in thread
From: Ilias Apalodimas @ 2019-11-15  7:17 UTC (permalink / raw)
  To: Jonathan Lemon
  Cc: Lorenzo Bianconi, netdev, lorenzo.bianconi, davem,
	thomas.petazzoni, brouer, matteo.croce

Hi Jonathan, 

> 
[...]
> > For the skb reserved queues, this depends on the 'anything'. If the rest
> > of the
> > layers touch (or rather write) into that area, then we'll again gave to
> > sync.
> > If we know that the data has not been altered though, we can hand them
> > back to
> > the device skipping that sync right?
> 
> Sure, but this is also true for eBPF programs.  How would the driver know
> that
> the data has not been altered / compacted by the upper layers?

I haven't looked that up in detail. I was just explaining the reasoning behind
picking a u8 instead of a flag. As Jesper pointed out, we can get the same
result by using len = 0, so I am fine with the proposed change.

Thanks
/Ilias
> -- 
> Jonathan

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH net-next 2/3] net: page_pool: add the possibility to sync DMA memory for non-coherent devices
  2019-11-15  7:05               ` Ilias Apalodimas
@ 2019-11-15  7:49                 ` Lorenzo Bianconi
  2019-11-15  8:03                   ` Ilias Apalodimas
  0 siblings, 1 reply; 21+ messages in thread
From: Lorenzo Bianconi @ 2019-11-15  7:49 UTC (permalink / raw)
  To: Ilias Apalodimas
  Cc: Jesper Dangaard Brouer, Jonathan Lemon, Lorenzo Bianconi, netdev,
	davem, thomas.petazzoni, matteo.croce

[-- Attachment #1: Type: text/plain, Size: 3105 bytes --]

> On Thu, Nov 14, 2019 at 10:43:09PM +0100, Jesper Dangaard Brouer wrote:
> > On Thu, 14 Nov 2019 13:04:26 -0800
> > "Jonathan Lemon" <jonathan.lemon@gmail.com> wrote:
> > 
> > > On 14 Nov 2019, at 12:42, Ilias Apalodimas wrote:
> > > 
> > > > Hi Jonathan,
> > > >
> > > > On Thu, Nov 14, 2019 at 12:27:40PM -0800, Jonathan Lemon wrote:  
> > > >>
> > > >>
> > > >> On 14 Nov 2019, at 10:53, Ilias Apalodimas wrote:
> > > >>  
> > > >>> [...]  
> > > >>>>> index 2cbcdbdec254..defbfd90ab46 100644
> > > >>>>> --- a/include/net/page_pool.h
> > > >>>>> +++ b/include/net/page_pool.h
> > > >>>>> @@ -65,6 +65,9 @@ struct page_pool_params {
> > > >>>>>  	int		nid;  /* Numa node id to allocate from pages from */
> > > >>>>>  	struct device	*dev; /* device, for DMA pre-mapping purposes */
> > > >>>>>  	enum dma_data_direction dma_dir; /* DMA mapping direction */
> > > >>>>> +	unsigned int	max_len; /* max DMA sync memory size */
> > > >>>>> +	unsigned int	offset;  /* DMA addr offset */
> > > >>>>> +	u8 sync;
> > > >>>>>  };  
> > > >>>>
> > > >>>> How about using PP_FLAG_DMA_SYNC instead of another flag word?
> > > >>>> (then it can also be gated on having DMA_MAP enabled)  
> > > >>>
> > > >>> You mean instead of the u8?
> > > >>> As you pointed out on your V2 comment of the mail, some cards don't 
> > > >>> sync back to device.
> > > >>>
> > > >>> As the API tries to be generic a u8 was choosen instead of a flag
> > > >>> to cover these use cases. So in time we'll change the semantics of
> > > >>> this to 'always sync', 'dont sync if it's an skb-only queue' etc.
> > > >>>
> > > >>> The first case Lorenzo covered is sync the required len only instead 
> > > >>> of the full buffer  
> > > >>
> > > >> Yes, I meant instead of:
> > > >> +		.sync = 1,
> > > >>
> > > >> Something like:
> > > >>         .flags = PP_FLAG_DMA_MAP | PP_FLAG_DMA_SYNC
> > > >>
> > 
> > I actually agree and think we could use a flag. I suggest
> > PP_FLAG_DMA_SYNC_DEV to indicate that this DMA-sync-for-device.
> > 
> > Ilias notice that the change I requested to Lorenzo, that dma_sync_size
> > default value is 0xFFFFFFFF (-1).  That makes dma_sync_size==0 a valid
> > value, which you can use in the cases, where you know that nobody have
> > written into the data-area.  This allow us to selectively choose it for
> > these cases.
> 
> Okay, then i guess the flag is a better fit for this.
> The only difference would be that the sync semantics will be done on 'per
> packet' basis,  instead of 'per pool', but that should be fine for our cases.

Ack, fine for me.
Do you think that, when checking for PP_FLAG_DMA_SYNC_DEV, we should also
verify PP_FLAG_DMA_MAP? Something like:

if ((pool->p.flags & (PP_FLAG_DMA_MAP | PP_FLAG_DMA_SYNC_DEV)) ==
    (PP_FLAG_DMA_MAP | PP_FLAG_DMA_SYNC_DEV))
	page_pool_dma_sync_for_device();

Regards,
Lorenzo

> 
> Cheers
> /Ilias
> > 
> > -- 
> > Best regards,
> >   Jesper Dangaard Brouer
> >   MSc.CS, Principal Kernel Engineer at Red Hat
> >   LinkedIn: http://www.linkedin.com/in/brouer
> > 
> 


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH net-next 2/3] net: page_pool: add the possibility to sync DMA memory for non-coherent devices
  2019-11-15  7:49                 ` Lorenzo Bianconi
@ 2019-11-15  8:03                   ` Ilias Apalodimas
  2019-11-15 16:47                     ` Jonathan Lemon
  0 siblings, 1 reply; 21+ messages in thread
From: Ilias Apalodimas @ 2019-11-15  8:03 UTC (permalink / raw)
  To: Lorenzo Bianconi
  Cc: Jesper Dangaard Brouer, Jonathan Lemon, Lorenzo Bianconi, netdev,
	davem, thomas.petazzoni, matteo.croce

Hi Lorenzo, 

> > > > >>>> How about using PP_FLAG_DMA_SYNC instead of another flag word?
> > > > >>>> (then it can also be gated on having DMA_MAP enabled)  
> > > > >>>
> > > > >>> You mean instead of the u8?
> > > > >>> As you pointed out on your V2 comment of the mail, some cards don't 
> > > > >>> sync back to device.
> > > > >>>
> > > > >>> As the API tries to be generic a u8 was choosen instead of a flag
> > > > >>> to cover these use cases. So in time we'll change the semantics of
> > > > >>> this to 'always sync', 'dont sync if it's an skb-only queue' etc.
> > > > >>>
> > > > >>> The first case Lorenzo covered is sync the required len only instead 
> > > > >>> of the full buffer  
> > > > >>
> > > > >> Yes, I meant instead of:
> > > > >> +		.sync = 1,
> > > > >>
> > > > >> Something like:
> > > > >>         .flags = PP_FLAG_DMA_MAP | PP_FLAG_DMA_SYNC
> > > > >>
> > > 
> > > I actually agree and think we could use a flag. I suggest
> > > PP_FLAG_DMA_SYNC_DEV to indicate that this DMA-sync-for-device.
> > > 
> > > Ilias notice that the change I requested to Lorenzo, that dma_sync_size
> > > default value is 0xFFFFFFFF (-1).  That makes dma_sync_size==0 a valid
> > > value, which you can use in the cases, where you know that nobody have
> > > written into the data-area.  This allow us to selectively choose it for
> > > these cases.
> > 
> > Okay, then i guess the flag is a better fit for this.
> > The only difference would be that the sync semantics will be done on 'per
> > packet' basis,  instead of 'per pool', but that should be fine for our cases.
> 
> Ack, fine for me.
> Do you think when checking for PP_FLAG_DMA_SYNC_DEV we should even verify
> PP_FLAG_DMA_MAP? Something like:
> 
> if ((pool->p.flags & (PP_FLAG_DMA_MAP | PP_FLAG_DMA_SYNC_DEV)) ==
>     (PP_FLAG_DMA_MAP | PP_FLAG_DMA_SYNC_DEV))
> 	page_pool_dma_sync_for_device();
> 
> Regards,
> Lorenzo

I think it's better to do the check once, at pool registration, and maybe
refuse to allocate the pool? Syncing without mapping doesn't really make sense.
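
Something along these lines in the pool setup path (a sketch, not the actual
page_pool_init() code):

	if (pool->p.flags & PP_FLAG_DMA_SYNC_DEV) {
		/* syncing for device only makes sense if the pool also owns
		 * the DMA mapping, and max_len is needed to bound the
		 * "sync everything" (-1) default
		 */
		if (!(pool->p.flags & PP_FLAG_DMA_MAP))
			return -EINVAL;

		if (!pool->p.max_len)
			return -EINVAL;
	}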

Cheers
/Ilias
> 
> > 
> > Cheers
> > /Ilias
> > > 
> > > -- 
> > > Best regards,
> > >   Jesper Dangaard Brouer
> > >   MSc.CS, Principal Kernel Engineer at Red Hat
> > >   LinkedIn: http://www.linkedin.com/in/brouer
> > > 
> > 



^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH net-next 2/3] net: page_pool: add the possibility to sync DMA memory for non-coherent devices
  2019-11-15  8:03                   ` Ilias Apalodimas
@ 2019-11-15 16:47                     ` Jonathan Lemon
  2019-11-15 16:53                       ` Lorenzo Bianconi
  0 siblings, 1 reply; 21+ messages in thread
From: Jonathan Lemon @ 2019-11-15 16:47 UTC (permalink / raw)
  To: Ilias Apalodimas
  Cc: Lorenzo Bianconi, Jesper Dangaard Brouer, Lorenzo Bianconi,
	netdev, davem, thomas.petazzoni, matteo.croce

On 15 Nov 2019, at 0:03, Ilias Apalodimas wrote:

> Hi Lorenzo,
>
>>>>>>>>> How about using PP_FLAG_DMA_SYNC instead of another flag word?
>>>>>>>>> (then it can also be gated on having DMA_MAP enabled)
>>>>>>>>
>>>>>>>> You mean instead of the u8?
>>>>>>>> As you pointed out on your V2 comment of the mail, some cards 
>>>>>>>> don't
>>>>>>>> sync back to device.
>>>>>>>>
>>>>>>>> As the API tries to be generic a u8 was choosen instead of a 
>>>>>>>> flag
>>>>>>>> to cover these use cases. So in time we'll change the semantics 
>>>>>>>> of
>>>>>>>> this to 'always sync', 'dont sync if it's an skb-only queue' 
>>>>>>>> etc.
>>>>>>>>
>>>>>>>> The first case Lorenzo covered is sync the required len only 
>>>>>>>> instead
>>>>>>>> of the full buffer
>>>>>>>
>>>>>>> Yes, I meant instead of:
>>>>>>> +		.sync = 1,
>>>>>>>
>>>>>>> Something like:
>>>>>>>         .flags = PP_FLAG_DMA_MAP | PP_FLAG_DMA_SYNC
>>>>>>>
>>>>
>>>> I actually agree and think we could use a flag. I suggest
>>>> PP_FLAG_DMA_SYNC_DEV to indicate that this DMA-sync-for-device.
>>>>
>>>> Ilias notice that the change I requested to Lorenzo, that 
>>>> dma_sync_size
>>>> default value is 0xFFFFFFFF (-1).  That makes dma_sync_size==0 a 
>>>> valid
>>>> value, which you can use in the cases, where you know that nobody 
>>>> have
>>>> written into the data-area.  This allow us to selectively choose it 
>>>> for
>>>> these cases.
>>>
>>> Okay, then i guess the flag is a better fit for this.
>>> The only difference would be that the sync semantics will be done on 
>>> 'per
>>> packet' basis,  instead of 'per pool', but that should be fine for 
>>> our cases.
>>
>> Ack, fine for me.
>> Do you think when checking for PP_FLAG_DMA_SYNC_DEV we should even 
>> verify
>> PP_FLAG_DMA_MAP? Something like:
>>
>> if ((pool->p.flags & (PP_FLAG_DMA_MAP | PP_FLAG_DMA_SYNC_DEV)) ==
>>     (PP_FLAG_DMA_MAP | PP_FLAG_DMA_SYNC_DEV))
>> 	page_pool_dma_sync_for_device();
>>
>> Regards,
>> Lorenzo
>
> I think it's better to do the check once on the pool registration and 
> maybe
> refuse to allocate the pool? Syncing without mapping doesn't really 
> make sense

+1.
-- 
Jonathan

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH net-next 2/3] net: page_pool: add the possibility to sync DMA memory for non-coherent devices
  2019-11-15 16:47                     ` Jonathan Lemon
@ 2019-11-15 16:53                       ` Lorenzo Bianconi
  0 siblings, 0 replies; 21+ messages in thread
From: Lorenzo Bianconi @ 2019-11-15 16:53 UTC (permalink / raw)
  To: Jonathan Lemon
  Cc: Ilias Apalodimas, Lorenzo Bianconi, Jesper Dangaard Brouer,
	netdev, davem, thomas.petazzoni, matteo.croce

[-- Attachment #1: Type: text/plain, Size: 906 bytes --]

[...]

> > > > 
> > > > Okay, then i guess the flag is a better fit for this.
> > > > The only difference would be that the sync semantics will be
> > > > done on 'per
> > > > packet' basis,  instead of 'per pool', but that should be fine
> > > > for our cases.
> > > 
> > > Ack, fine for me.
> > > Do you think when checking for PP_FLAG_DMA_SYNC_DEV we should even
> > > verify
> > > PP_FLAG_DMA_MAP? Something like:
> > > 
> > > if ((pool->p.flags & (PP_FLAG_DMA_MAP | PP_FLAG_DMA_SYNC_DEV)) ==
> > >     (PP_FLAG_DMA_MAP | PP_FLAG_DMA_SYNC_DEV))
> > > 	page_pool_dma_sync_for_device();
> > > 
> > > Regards,
> > > Lorenzo
> > 
> > I think it's better to do the check once on the pool registration and
> > maybe
> > refuse to allocate the pool? Syncing without mapping doesn't really make
> > sense
> 
> +1.

ack, will post v3 soon.

Regards,
Lorenzo

> -- 
> Jonathan


^ permalink raw reply	[flat|nested] 21+ messages in thread

end of thread, other threads:[~2019-11-15 16:53 UTC | newest]

Thread overview: 21+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-11-10 12:09 [PATCH net-next 0/3] add DMA sync capability to page_pool API Lorenzo Bianconi
2019-11-10 12:09 ` [PATCH net-next 1/3] net: mvneta: rely on page_pool_recycle_direct in mvneta_run_xdp Lorenzo Bianconi
2019-11-10 12:09 ` [PATCH net-next 2/3] net: page_pool: add the possibility to sync DMA memory for non-coherent devices Lorenzo Bianconi
2019-11-11 16:48   ` Jesper Dangaard Brouer
2019-11-11 19:11     ` Lorenzo Bianconi
2019-11-13  8:29   ` Jesper Dangaard Brouer
2019-11-14 18:48   ` Jonathan Lemon
2019-11-14 18:53     ` Ilias Apalodimas
2019-11-14 20:27       ` Jonathan Lemon
2019-11-14 20:42         ` Ilias Apalodimas
2019-11-14 21:04           ` Jonathan Lemon
2019-11-14 21:43             ` Jesper Dangaard Brouer
2019-11-15  7:05               ` Ilias Apalodimas
2019-11-15  7:49                 ` Lorenzo Bianconi
2019-11-15  8:03                   ` Ilias Apalodimas
2019-11-15 16:47                     ` Jonathan Lemon
2019-11-15 16:53                       ` Lorenzo Bianconi
2019-11-15  7:17             ` Ilias Apalodimas
2019-11-10 12:09 ` [PATCH net-next 3/3] net: mvneta: get rid of huge DMA sync in mvneta_rx_refill Lorenzo Bianconi
2019-11-14 18:14   ` Jonathan Lemon
2019-11-14 18:18     ` Ilias Apalodimas
