* [PATCH net-next 0/9] mlx4: order-0 allocations and page recycling
@ 2017-02-07  3:02 Eric Dumazet
  2017-02-07  3:02 ` [PATCH net-next 1/9] mlx4: use __skb_fill_page_desc() Eric Dumazet
                   ` (9 more replies)
  0 siblings, 10 replies; 26+ messages in thread
From: Eric Dumazet @ 2017-02-07  3:02 UTC (permalink / raw)
  To: David S . Miller
  Cc: netdev, Tariq Toukan, Martin KaFai Lau, Willem de Bruijn,
	Jesper Dangaard Brouer, Brenden Blanco, Alexei Starovoitov,
	Eric Dumazet, Eric Dumazet

As mentioned half a year ago, we had better switch the mlx4 driver to
order-0 allocations and page recycling.

This reduces vulnerability surface thanks to better skb->truesize tracking
and provides better performance in most cases.
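
For context, the target model can be sketched as follows (illustrative
pseudo-code, not the driver code itself): each RX slot owns a single
order-0 page, and every received fragment charges one known stride to
the skb:

	/* illustrative only: precise truesize accounting per fragment */
	skb->truesize += frag_info->frag_stride;	/* e.g. 2048 on 4K pages */

so the memory charged to the socket closely matches what the fragment
actually consumes from its page.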

Worth noting this patch series deletes more than 100 lines of code ;)

Eric Dumazet (9):
  mlx4: use __skb_fill_page_desc()
  mlx4: dma_dir is a mlx4_en_priv attribute
  mlx4: remove order field from mlx4_en_frag_info
  mlx4: get rid of frag_prefix_size
  mlx4: rx_headroom is a per port attribute
  mlx4: reduce rx ring page_cache size
  mlx4: removal of frag_sizes[]
  mlx4: use order-0 pages for RX
  mlx4: add page recycling in receive path

 drivers/net/ethernet/mellanox/mlx4/en_rx.c   | 350 +++++++++------------------
 drivers/net/ethernet/mellanox/mlx4/en_tx.c   |   4 +-
 drivers/net/ethernet/mellanox/mlx4/mlx4_en.h |  28 +--
 3 files changed, 129 insertions(+), 253 deletions(-)

-- 
2.11.0.483.g087da7b7c-goog


* [PATCH net-next 1/9] mlx4: use __skb_fill_page_desc()
  2017-02-07  3:02 [PATCH net-next 0/9] mlx4: order-0 allocations and page recycling Eric Dumazet
@ 2017-02-07  3:02 ` Eric Dumazet
  2017-02-07  3:02 ` [PATCH net-next 2/9] mlx4: dma_dir is a mlx4_en_priv attribute Eric Dumazet
                   ` (8 subsequent siblings)
  9 siblings, 0 replies; 26+ messages in thread
From: Eric Dumazet @ 2017-02-07  3:02 UTC (permalink / raw)
  To: David S . Miller
  Cc: netdev, Tariq Toukan, Martin KaFai Lau, Willem de Bruijn,
	Jesper Dangaard Brouer, Brenden Blanco, Alexei Starovoitov,
	Eric Dumazet, Eric Dumazet

Use the generic helper, or we might miss the fact that a page was
allocated from memory reserves.
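
For reference, the helper in include/linux/skbuff.h (of this era, shown
slightly abridged) propagates the pfmemalloc flag that the open-coded
frag setup missed:

static inline void __skb_fill_page_desc(struct sk_buff *skb, int i,
					struct page *page, int off, int size)
{
	skb_frag_t *frag = &skb_shinfo(skb)->frags[i];

	frag->page.p	  = page;
	frag->page_offset = off;
	skb_frag_size_set(frag, size);

	/* this is the part the open-coded variant skipped */
	if (page_is_pfmemalloc(compound_head(page)))
		skb->pfmemalloc = true;
}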

Fixes: dceeab0e5258 ("mlx4: support __GFP_MEMALLOC for rx")
Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 drivers/net/ethernet/mellanox/mlx4/en_rx.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx4/en_rx.c b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
index f15ddba3659aac38471059c6bcbf05071794..03f1713c94c7fa57e9eaaf87fe38a0a6d372 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_rx.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
@@ -601,10 +601,10 @@ static int mlx4_en_complete_rx_desc(struct mlx4_en_priv *priv,
 		dma_sync_single_for_cpu(priv->ddev, dma, frag_info->frag_size,
 					DMA_FROM_DEVICE);
 
-		/* Save page reference in skb */
-		__skb_frag_set_page(&skb_frags_rx[nr], frags[nr].page);
-		skb_frag_size_set(&skb_frags_rx[nr], frag_info->frag_size);
-		skb_frags_rx[nr].page_offset = frags[nr].page_offset;
+		__skb_fill_page_desc(skb, nr, frags[nr].page,
+				     frags[nr].page_offset,
+				     frag_info->frag_size);
+
 		skb->truesize += frag_info->frag_stride;
 		frags[nr].page = NULL;
 	}
-- 
2.11.0.483.g087da7b7c-goog


* [PATCH net-next 2/9] mlx4: dma_dir is a mlx4_en_priv attribute
  2017-02-07  3:02 [PATCH net-next 0/9] mlx4: order-0 allocations and page recycling Eric Dumazet
  2017-02-07  3:02 ` [PATCH net-next 1/9] mlx4: use __skb_fill_page_desc() Eric Dumazet
@ 2017-02-07  3:02 ` Eric Dumazet
  2017-02-07  3:02 ` [PATCH net-next 3/9] mlx4: remove order field from mlx4_en_frag_info Eric Dumazet
                   ` (7 subsequent siblings)
  9 siblings, 0 replies; 26+ messages in thread
From: Eric Dumazet @ 2017-02-07  3:02 UTC (permalink / raw)
  To: David S . Miller
  Cc: netdev, Tariq Toukan, Martin KaFai Lau, Willem de Bruijn,
	Jesper Dangaard Brouer, Brenden Blanco, Alexei Starovoitov,
	Eric Dumazet, Eric Dumazet

No need to duplicate it for all queues and frags.

num_frags and log_rx_info become u8 to save space;
u8 accesses are a bit faster than u16 anyway.
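
Note that storing an enum dma_data_direction in a u8 is safe: its only
values are DMA_BIDIRECTIONAL (0), DMA_TO_DEVICE (1), DMA_FROM_DEVICE (2)
and DMA_NONE (3). A hypothetical compile-time check (not part of this
patch) would be:

	BUILD_BUG_ON(DMA_NONE > U8_MAX);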

Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 drivers/net/ethernet/mellanox/mlx4/en_rx.c   | 16 ++++++++--------
 drivers/net/ethernet/mellanox/mlx4/en_tx.c   |  2 +-
 drivers/net/ethernet/mellanox/mlx4/mlx4_en.h |  6 +++---
 3 files changed, 12 insertions(+), 12 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx4/en_rx.c b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
index 03f1713c94c7fa57e9eaaf87fe38a0a6d372..9bb22eb5bfcc3037e92d06cca75d514dd52e 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_rx.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
@@ -72,7 +72,7 @@ static int mlx4_alloc_pages(struct mlx4_en_priv *priv,
 			return -ENOMEM;
 	}
 	dma = dma_map_page(priv->ddev, page, 0, PAGE_SIZE << order,
-			   frag_info->dma_dir);
+			   priv->dma_dir);
 	if (unlikely(dma_mapping_error(priv->ddev, dma))) {
 		put_page(page);
 		return -ENOMEM;
@@ -128,7 +128,7 @@ static int mlx4_en_alloc_frags(struct mlx4_en_priv *priv,
 		if (page_alloc[i].page != ring_alloc[i].page) {
 			dma_unmap_page(priv->ddev, page_alloc[i].dma,
 				page_alloc[i].page_size,
-				priv->frag_info[i].dma_dir);
+				priv->dma_dir);
 			page = page_alloc[i].page;
 			/* Revert changes done by mlx4_alloc_pages */
 			page_ref_sub(page, page_alloc[i].page_size /
@@ -149,7 +149,7 @@ static void mlx4_en_free_frag(struct mlx4_en_priv *priv,
 
 	if (next_frag_end > frags[i].page_size)
 		dma_unmap_page(priv->ddev, frags[i].dma, frags[i].page_size,
-			       frag_info->dma_dir);
+			       priv->dma_dir);
 
 	if (frags[i].page)
 		put_page(frags[i].page);
@@ -181,7 +181,7 @@ static int mlx4_en_init_allocator(struct mlx4_en_priv *priv,
 		page_alloc = &ring->page_alloc[i];
 		dma_unmap_page(priv->ddev, page_alloc->dma,
 			       page_alloc->page_size,
-			       priv->frag_info[i].dma_dir);
+			       priv->dma_dir);
 		page = page_alloc->page;
 		/* Revert changes done by mlx4_alloc_pages */
 		page_ref_sub(page, page_alloc->page_size /
@@ -206,7 +206,7 @@ static void mlx4_en_destroy_allocator(struct mlx4_en_priv *priv,
 		       i, page_count(page_alloc->page));
 
 		dma_unmap_page(priv->ddev, page_alloc->dma,
-				page_alloc->page_size, frag_info->dma_dir);
+				page_alloc->page_size, priv->dma_dir);
 		while (page_alloc->page_offset + frag_info->frag_stride <
 		       page_alloc->page_size) {
 			put_page(page_alloc->page);
@@ -567,7 +567,7 @@ void mlx4_en_deactivate_rx_ring(struct mlx4_en_priv *priv,
 		struct mlx4_en_rx_alloc *frame = &ring->page_cache.buf[i];
 
 		dma_unmap_page(priv->ddev, frame->dma, frame->page_size,
-			       priv->frag_info[0].dma_dir);
+			       priv->dma_dir);
 		put_page(frame->page);
 	}
 	ring->page_cache.index = 0;
@@ -1199,7 +1199,7 @@ void mlx4_en_calc_rx_buf(struct net_device *dev)
 		 * expense of more costly truesize accounting
 		 */
 		priv->frag_info[0].frag_stride = PAGE_SIZE;
-		priv->frag_info[0].dma_dir = PCI_DMA_BIDIRECTIONAL;
+		priv->dma_dir = PCI_DMA_BIDIRECTIONAL;
 		priv->frag_info[0].rx_headroom = XDP_PACKET_HEADROOM;
 		i = 1;
 	} else {
@@ -1214,11 +1214,11 @@ void mlx4_en_calc_rx_buf(struct net_device *dev)
 			priv->frag_info[i].frag_stride =
 				ALIGN(priv->frag_info[i].frag_size,
 				      SMP_CACHE_BYTES);
-			priv->frag_info[i].dma_dir = PCI_DMA_FROMDEVICE;
 			priv->frag_info[i].rx_headroom = 0;
 			buf_size += priv->frag_info[i].frag_size;
 			i++;
 		}
+		priv->dma_dir = PCI_DMA_FROMDEVICE;
 	}
 
 	priv->num_frags = i;
diff --git a/drivers/net/ethernet/mellanox/mlx4/en_tx.c b/drivers/net/ethernet/mellanox/mlx4/en_tx.c
index 3ed42199d3f1275f77560e92a430c0dde181..98bc67a7249b14f8857fe1fd6baa40ae3ec5 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_tx.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_tx.c
@@ -360,7 +360,7 @@ u32 mlx4_en_recycle_tx_desc(struct mlx4_en_priv *priv,
 
 	if (!mlx4_en_rx_recycle(ring->recycle_ring, &frame)) {
 		dma_unmap_page(priv->ddev, tx_info->map0_dma,
-			       PAGE_SIZE, priv->frag_info[0].dma_dir);
+			       PAGE_SIZE, priv->dma_dir);
 		put_page(tx_info->page);
 	}
 
diff --git a/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h b/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
index ba1c6cd0cc79590075f4420a930b613c9fde..549f88b9becd9f2dd96282a44f6d374f14a4 100644
--- a/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
+++ b/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
@@ -474,7 +474,6 @@ struct mlx4_en_frag_info {
 	u16 frag_size;
 	u16 frag_prefix_size;
 	u32 frag_stride;
-	enum dma_data_direction dma_dir;
 	u16 order;
 	u16 rx_headroom;
 };
@@ -584,8 +583,9 @@ struct mlx4_en_priv {
 	u32 rx_ring_num;
 	u32 rx_skb_size;
 	struct mlx4_en_frag_info frag_info[MLX4_EN_MAX_RX_FRAGS];
-	u16 num_frags;
-	u16 log_rx_info;
+	u8 num_frags;
+	u8 log_rx_info;
+	u8 dma_dir;
 
 	struct mlx4_en_tx_ring **tx_ring[MLX4_EN_NUM_TX_TYPES];
 	struct mlx4_en_rx_ring *rx_ring[MAX_RX_RINGS];
-- 
2.11.0.483.g087da7b7c-goog


* [PATCH net-next 3/9] mlx4: remove order field from mlx4_en_frag_info
  2017-02-07  3:02 [PATCH net-next 0/9] mlx4: order-0 allocations and page recycling Eric Dumazet
  2017-02-07  3:02 ` [PATCH net-next 1/9] mlx4: use __skb_fill_page_desc() Eric Dumazet
  2017-02-07  3:02 ` [PATCH net-next 2/9] mlx4: dma_dir is a mlx4_en_priv attribute Eric Dumazet
@ 2017-02-07  3:02 ` Eric Dumazet
  2017-02-07  3:02 ` [PATCH net-next 4/9] mlx4: get rid of frag_prefix_size Eric Dumazet
                   ` (6 subsequent siblings)
  9 siblings, 0 replies; 26+ messages in thread
From: Eric Dumazet @ 2017-02-07  3:02 UTC (permalink / raw)
  To: David S . Miller
  Cc: netdev, Tariq Toukan, Martin KaFai Lau, Willem de Bruijn,
	Jesper Dangaard Brouer, Brenden Blanco, Alexei Starovoitov,
	Eric Dumazet, Eric Dumazet

This is really a port attribute, no need to duplicate it per
RX queue and per frag.

Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 drivers/net/ethernet/mellanox/mlx4/en_rx.c   | 6 +++---
 drivers/net/ethernet/mellanox/mlx4/mlx4_en.h | 2 +-
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx4/en_rx.c b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
index 9bb22eb5bfcc3037e92d06cca75d514dd52e..f868cb330039f5730ab8f59eca451c3d5272 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_rx.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
@@ -59,7 +59,7 @@ static int mlx4_alloc_pages(struct mlx4_en_priv *priv,
 	struct page *page;
 	dma_addr_t dma;
 
-	for (order = frag_info->order; ;) {
+	for (order = priv->rx_page_order; ;) {
 		gfp_t gfp = _gfp;
 
 		if (order)
@@ -1192,7 +1192,7 @@ void mlx4_en_calc_rx_buf(struct net_device *dev)
 	 * This only works when num_frags == 1.
 	 */
 	if (priv->tx_ring_num[TX_XDP]) {
-		priv->frag_info[0].order = 0;
+		priv->rx_page_order = 0;
 		priv->frag_info[0].frag_size = eff_mtu;
 		priv->frag_info[0].frag_prefix_size = 0;
 		/* This will gain efficient xdp frame recycling at the
@@ -1206,7 +1206,6 @@ void mlx4_en_calc_rx_buf(struct net_device *dev)
 		int buf_size = 0;
 
 		while (buf_size < eff_mtu) {
-			priv->frag_info[i].order = MLX4_EN_ALLOC_PREFER_ORDER;
 			priv->frag_info[i].frag_size =
 				(eff_mtu > buf_size + frag_sizes[i]) ?
 					frag_sizes[i] : eff_mtu - buf_size;
@@ -1218,6 +1217,7 @@ void mlx4_en_calc_rx_buf(struct net_device *dev)
 			buf_size += priv->frag_info[i].frag_size;
 			i++;
 		}
+		priv->rx_page_order = MLX4_EN_ALLOC_PREFER_ORDER;
 		priv->dma_dir = PCI_DMA_FROMDEVICE;
 	}
 
diff --git a/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h b/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
index 549f88b9becd9f2dd96282a44f6d374f14a4..11898550f87c077f6687903790d329e4aa1e 100644
--- a/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
+++ b/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
@@ -474,7 +474,6 @@ struct mlx4_en_frag_info {
 	u16 frag_size;
 	u16 frag_prefix_size;
 	u32 frag_stride;
-	u16 order;
 	u16 rx_headroom;
 };
 
@@ -586,6 +585,7 @@ struct mlx4_en_priv {
 	u8 num_frags;
 	u8 log_rx_info;
 	u8 dma_dir;
+	u8 rx_page_order;
 
 	struct mlx4_en_tx_ring **tx_ring[MLX4_EN_NUM_TX_TYPES];
 	struct mlx4_en_rx_ring *rx_ring[MAX_RX_RINGS];
-- 
2.11.0.483.g087da7b7c-goog


* [PATCH net-next 4/9] mlx4: get rid of frag_prefix_size
  2017-02-07  3:02 [PATCH net-next 0/9] mlx4: order-0 allocations and page recycling Eric Dumazet
                   ` (2 preceding siblings ...)
  2017-02-07  3:02 ` [PATCH net-next 3/9] mlx4: remove order field from mlx4_en_frag_info Eric Dumazet
@ 2017-02-07  3:02 ` Eric Dumazet
  2017-02-09 12:28   ` Tariq Toukan
  2017-02-07  3:02 ` [PATCH net-next 5/9] mlx4: rx_headroom is a per port attribute Eric Dumazet
                   ` (5 subsequent siblings)
  9 siblings, 1 reply; 26+ messages in thread
From: Eric Dumazet @ 2017-02-07  3:02 UTC (permalink / raw)
  To: David S . Miller
  Cc: netdev, Tariq Toukan, Martin KaFai Lau, Willem de Bruijn,
	Jesper Dangaard Brouer, Brenden Blanco, Alexei Starovoitov,
	Eric Dumazet, Eric Dumazet

Using per-frag storage for frag_prefix_size is really silly.

mlx4_en_complete_rx_desc() has all needed info already.
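
The replacement logic, sketched (see the diff for the real code): size
each fragment on the fly from the remaining length instead of comparing
against pre-computed per-frag prefix offsets:

	/* sketch of the new loop, assuming length > 0 on entry */
	for (nr = 0;;) {
		frag_size = min_t(int, length, frag_info->frag_size);
		/* ... fill skb frag nr with frag_size bytes ... */
		nr++;
		length -= frag_size;
		if (!length)
			break;
		frag_info++;
	}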

Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 drivers/net/ethernet/mellanox/mlx4/en_rx.c   | 27 ++++++++++++---------------
 drivers/net/ethernet/mellanox/mlx4/mlx4_en.h |  3 +--
 2 files changed, 13 insertions(+), 17 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx4/en_rx.c b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
index f868cb330039f5730ab8f59eca451c3d5272..c6c64ac1e25931fc172beb5c718ec3a799f6 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_rx.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
@@ -585,15 +585,14 @@ static int mlx4_en_complete_rx_desc(struct mlx4_en_priv *priv,
 				    int length)
 {
 	struct skb_frag_struct *skb_frags_rx = skb_shinfo(skb)->frags;
-	struct mlx4_en_frag_info *frag_info;
-	int nr;
+	struct mlx4_en_frag_info *frag_info = priv->frag_info;
+	int nr, frag_size;
 	dma_addr_t dma;
 
 	/* Collect used fragments while replacing them in the HW descriptors */
-	for (nr = 0; nr < priv->num_frags; nr++) {
-		frag_info = &priv->frag_info[nr];
-		if (length <= frag_info->frag_prefix_size)
-			break;
+	for (nr = 0;;) {
+		frag_size = min_t(int, length, frag_info->frag_size);
+
 		if (unlikely(!frags[nr].page))
 			goto fail;
 
@@ -603,15 +602,16 @@ static int mlx4_en_complete_rx_desc(struct mlx4_en_priv *priv,
 
 		__skb_fill_page_desc(skb, nr, frags[nr].page,
 				     frags[nr].page_offset,
-				     frag_info->frag_size);
+				     frag_size);
 
 		skb->truesize += frag_info->frag_stride;
 		frags[nr].page = NULL;
+		nr++;
+		length -= frag_size;
+		if (!length)
+			break;
+		frag_info++;
 	}
-	/* Adjust size of last fragment to match actual length */
-	if (nr > 0)
-		skb_frag_size_set(&skb_frags_rx[nr - 1],
-			length - priv->frag_info[nr - 1].frag_prefix_size);
 	return nr;
 
 fail:
@@ -1194,7 +1194,6 @@ void mlx4_en_calc_rx_buf(struct net_device *dev)
 	if (priv->tx_ring_num[TX_XDP]) {
 		priv->rx_page_order = 0;
 		priv->frag_info[0].frag_size = eff_mtu;
-		priv->frag_info[0].frag_prefix_size = 0;
 		/* This will gain efficient xdp frame recycling at the
 		 * expense of more costly truesize accounting
 		 */
@@ -1209,7 +1208,6 @@ void mlx4_en_calc_rx_buf(struct net_device *dev)
 			priv->frag_info[i].frag_size =
 				(eff_mtu > buf_size + frag_sizes[i]) ?
 					frag_sizes[i] : eff_mtu - buf_size;
-			priv->frag_info[i].frag_prefix_size = buf_size;
 			priv->frag_info[i].frag_stride =
 				ALIGN(priv->frag_info[i].frag_size,
 				      SMP_CACHE_BYTES);
@@ -1229,10 +1227,9 @@ void mlx4_en_calc_rx_buf(struct net_device *dev)
 	       eff_mtu, priv->num_frags);
 	for (i = 0; i < priv->num_frags; i++) {
 		en_err(priv,
-		       "  frag:%d - size:%d prefix:%d stride:%d\n",
+		       "  frag:%d - size:%d stride:%d\n",
 		       i,
 		       priv->frag_info[i].frag_size,
-		       priv->frag_info[i].frag_prefix_size,
 		       priv->frag_info[i].frag_stride);
 	}
 }
diff --git a/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h b/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
index 11898550f87c077f6687903790d329e4aa1e..fe8ed4e85e9645679cc37d0d30284b523689 100644
--- a/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
+++ b/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
@@ -472,9 +472,8 @@ struct mlx4_en_mc_list {
 
 struct mlx4_en_frag_info {
 	u16 frag_size;
-	u16 frag_prefix_size;
-	u32 frag_stride;
 	u16 rx_headroom;
+	u32 frag_stride;
 };
 
 #ifdef CONFIG_MLX4_EN_DCB
-- 
2.11.0.483.g087da7b7c-goog


* [PATCH net-next 5/9] mlx4: rx_headroom is a per port attribute
  2017-02-07  3:02 [PATCH net-next 0/9] mlx4: order-0 allocations and page recycling Eric Dumazet
                   ` (3 preceding siblings ...)
  2017-02-07  3:02 ` [PATCH net-next 4/9] mlx4: get rid of frag_prefix_size Eric Dumazet
@ 2017-02-07  3:02 ` Eric Dumazet
  2017-02-07  3:02 ` [PATCH net-next 6/9] mlx4: reduce rx ring page_cache size Eric Dumazet
                   ` (4 subsequent siblings)
  9 siblings, 0 replies; 26+ messages in thread
From: Eric Dumazet @ 2017-02-07  3:02 UTC (permalink / raw)
  To: David S . Miller
  Cc: netdev, Tariq Toukan, Martin KaFai Lau, Willem de Bruijn,
	Jesper Dangaard Brouer, Brenden Blanco, Alexei Starovoitov,
	Eric Dumazet, Eric Dumazet

No need to duplicate it per RX queue / frag.

Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 drivers/net/ethernet/mellanox/mlx4/en_rx.c   | 6 +++---
 drivers/net/ethernet/mellanox/mlx4/mlx4_en.h | 2 +-
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx4/en_rx.c b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
index c6c64ac1e25931fc172beb5c718ec3a799f6..80bb3c15f7c169f7091eb4a8dc06804f98b6 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_rx.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
@@ -115,7 +115,7 @@ static int mlx4_en_alloc_frags(struct mlx4_en_priv *priv,
 
 	for (i = 0; i < priv->num_frags; i++) {
 		frags[i] = ring_alloc[i];
-		frags[i].page_offset += priv->frag_info[i].rx_headroom;
+		frags[i].page_offset += priv->rx_headroom;
 		rx_desc->data[i].addr = cpu_to_be64(frags[i].dma +
 						    frags[i].page_offset);
 		ring_alloc[i] = page_alloc[i];
@@ -1199,7 +1199,7 @@ void mlx4_en_calc_rx_buf(struct net_device *dev)
 		 */
 		priv->frag_info[0].frag_stride = PAGE_SIZE;
 		priv->dma_dir = PCI_DMA_BIDIRECTIONAL;
-		priv->frag_info[0].rx_headroom = XDP_PACKET_HEADROOM;
+		priv->rx_headroom = XDP_PACKET_HEADROOM;
 		i = 1;
 	} else {
 		int buf_size = 0;
@@ -1211,12 +1211,12 @@ void mlx4_en_calc_rx_buf(struct net_device *dev)
 			priv->frag_info[i].frag_stride =
 				ALIGN(priv->frag_info[i].frag_size,
 				      SMP_CACHE_BYTES);
-			priv->frag_info[i].rx_headroom = 0;
 			buf_size += priv->frag_info[i].frag_size;
 			i++;
 		}
 		priv->rx_page_order = MLX4_EN_ALLOC_PREFER_ORDER;
 		priv->dma_dir = PCI_DMA_FROMDEVICE;
+		priv->rx_headroom = 0;
 	}
 
 	priv->num_frags = i;
diff --git a/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h b/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
index fe8ed4e85e9645679cc37d0d30284b523689..5d65a60e93b7a2ae84312cd0f2d474a065d9 100644
--- a/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
+++ b/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
@@ -472,7 +472,6 @@ struct mlx4_en_mc_list {
 
 struct mlx4_en_frag_info {
 	u16 frag_size;
-	u16 rx_headroom;
 	u32 frag_stride;
 };
 
@@ -585,6 +584,7 @@ struct mlx4_en_priv {
 	u8 log_rx_info;
 	u8 dma_dir;
 	u8 rx_page_order;
+	u16 rx_headroom;
 
 	struct mlx4_en_tx_ring **tx_ring[MLX4_EN_NUM_TX_TYPES];
 	struct mlx4_en_rx_ring *rx_ring[MAX_RX_RINGS];
-- 
2.11.0.483.g087da7b7c-goog


* [PATCH net-next 6/9] mlx4: reduce rx ring page_cache size
  2017-02-07  3:02 [PATCH net-next 0/9] mlx4: order-0 allocations and page recycling Eric Dumazet
                   ` (4 preceding siblings ...)
  2017-02-07  3:02 ` [PATCH net-next 5/9] mlx4: rx_headroom is a per port attribute Eric Dumazet
@ 2017-02-07  3:02 ` Eric Dumazet
  2017-02-07  3:02 ` [PATCH net-next 7/9] mlx4: removal of frag_sizes[] Eric Dumazet
                   ` (3 subsequent siblings)
  9 siblings, 0 replies; 26+ messages in thread
From: Eric Dumazet @ 2017-02-07  3:02 UTC (permalink / raw)
  To: David S . Miller
  Cc: netdev, Tariq Toukan, Martin KaFai Lau, Willem de Bruijn,
	Jesper Dangaard Brouer, Brenden Blanco, Alexei Starovoitov,
	Eric Dumazet, Eric Dumazet

We only need to store the page and dma address.
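
Rough arithmetic, assuming 64-bit pointers and a 64-bit dma_addr_t:
MLX4_EN_CACHE_SIZE is 2 * NAPI_POLL_WEIGHT = 128 entries, a full
struct mlx4_en_rx_alloc (page + dma + page_offset + page_size) is
24 bytes, and a bare {page, dma} pair is 16 bytes, so the per-ring
cache shrinks from ~3 KB to 2 KB.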

Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 drivers/net/ethernet/mellanox/mlx4/en_rx.c   | 17 ++++++++++-------
 drivers/net/ethernet/mellanox/mlx4/en_tx.c   |  2 --
 drivers/net/ethernet/mellanox/mlx4/mlx4_en.h |  6 +++++-
 3 files changed, 15 insertions(+), 10 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx4/en_rx.c b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
index 80bb3c15f7c169f7091eb4a8dc06804f98b6..6c95694f6390aa9fbc5f941a97e305815949 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_rx.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
@@ -250,7 +250,10 @@ static int mlx4_en_prepare_rx_desc(struct mlx4_en_priv *priv,
 					(index << priv->log_rx_info);
 
 	if (ring->page_cache.index > 0) {
-		frags[0] = ring->page_cache.buf[--ring->page_cache.index];
+		ring->page_cache.index--;
+		frags[0].page = ring->page_cache.buf[ring->page_cache.index].page;
+		frags[0].dma  = ring->page_cache.buf[ring->page_cache.index].dma;
+		frags[0].page_offset = XDP_PACKET_HEADROOM;
 		rx_desc->data[0].addr = cpu_to_be64(frags[0].dma +
 						    frags[0].page_offset);
 		return 0;
@@ -534,7 +537,9 @@ bool mlx4_en_rx_recycle(struct mlx4_en_rx_ring *ring,
 	if (cache->index >= MLX4_EN_CACHE_SIZE)
 		return false;
 
-	cache->buf[cache->index++] = *frame;
+	cache->buf[cache->index].page = frame->page;
+	cache->buf[cache->index].dma = frame->dma;
+	cache->index++;
 	return true;
 }
 
@@ -564,11 +569,9 @@ void mlx4_en_deactivate_rx_ring(struct mlx4_en_priv *priv,
 	int i;
 
 	for (i = 0; i < ring->page_cache.index; i++) {
-		struct mlx4_en_rx_alloc *frame = &ring->page_cache.buf[i];
-
-		dma_unmap_page(priv->ddev, frame->dma, frame->page_size,
-			       priv->dma_dir);
-		put_page(frame->page);
+		dma_unmap_page(priv->ddev, ring->page_cache.buf[i].dma,
+			       PAGE_SIZE, priv->dma_dir);
+		put_page(ring->page_cache.buf[i].page);
 	}
 	ring->page_cache.index = 0;
 	mlx4_en_free_rx_buf(priv, ring);
diff --git a/drivers/net/ethernet/mellanox/mlx4/en_tx.c b/drivers/net/ethernet/mellanox/mlx4/en_tx.c
index 98bc67a7249b14f8857fe1fd6baa40ae3ec5..e0c5ffb3e3a6607456e1f191b0b8c8becfc7 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_tx.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_tx.c
@@ -354,8 +354,6 @@ u32 mlx4_en_recycle_tx_desc(struct mlx4_en_priv *priv,
 	struct mlx4_en_rx_alloc frame = {
 		.page = tx_info->page,
 		.dma = tx_info->map0_dma,
-		.page_offset = XDP_PACKET_HEADROOM,
-		.page_size = PAGE_SIZE,
 	};
 
 	if (!mlx4_en_rx_recycle(ring->recycle_ring, &frame)) {
diff --git a/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h b/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
index 5d65a60e93b7a2ae84312cd0f2d474a065d9..c9916b75b94bc9364b2cbe6da06a5ea385c6 100644
--- a/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
+++ b/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
@@ -267,9 +267,13 @@ struct mlx4_en_rx_alloc {
 };
 
 #define MLX4_EN_CACHE_SIZE (2 * NAPI_POLL_WEIGHT)
+
 struct mlx4_en_page_cache {
 	u32 index;
-	struct mlx4_en_rx_alloc buf[MLX4_EN_CACHE_SIZE];
+	struct {
+		struct page	*page;
+		dma_addr_t	dma;
+	} buf[MLX4_EN_CACHE_SIZE];
 };
 
 struct mlx4_en_priv;
-- 
2.11.0.483.g087da7b7c-goog


* [PATCH net-next 7/9] mlx4: removal of frag_sizes[]
  2017-02-07  3:02 [PATCH net-next 0/9] mlx4: order-0 allocations and page recycling Eric Dumazet
                   ` (5 preceding siblings ...)
  2017-02-07  3:02 ` [PATCH net-next 6/9] mlx4: reduce rx ring page_cache size Eric Dumazet
@ 2017-02-07  3:02 ` Eric Dumazet
  2017-02-07  3:02 ` [PATCH net-next 8/9] mlx4: use order-0 pages for RX Eric Dumazet
                   ` (2 subsequent siblings)
  9 siblings, 0 replies; 26+ messages in thread
From: Eric Dumazet @ 2017-02-07  3:02 UTC (permalink / raw)
  To: David S . Miller
  Cc: netdev, Tariq Toukan, Martin KaFai Lau, Willem de Bruijn,
	Jesper Dangaard Brouer, Brenden Blanco, Alexei Starovoitov,
	Eric Dumazet, Eric Dumazet

We will soon use order-0 pages, and frag truesize will more precisely
match real sizes.

In the new model, we prefer to use fragments of at most 2048 bytes, so
that we can use the page-recycle technique on PAGE_SIZE=4096 arches.

We will still pack as many frames as possible on arches with big
pages, like PowerPC.
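
For example, with eff_mtu = 9000 and MLX4_EN_MAX_RX_FRAGS = 4, the
first three fragments are capped at 2048 bytes and the last one takes
the remainder: 9000 - 3 * 2048 = 2856 bytes. On a 64 KB page (PowerPC),
a 2048-byte stride still lets 32 fragments be carved out of one page
before a new allocation is needed.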

Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 drivers/net/ethernet/mellanox/mlx4/en_rx.c   | 24 ++++++++++--------------
 drivers/net/ethernet/mellanox/mlx4/mlx4_en.h |  8 --------
 2 files changed, 10 insertions(+), 22 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx4/en_rx.c b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
index 6c95694f6390aa9fbc5f941a97e305815949..dd3bfcfea10c4545dfeda0f999449b13ca91 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_rx.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
@@ -1178,13 +1178,6 @@ int mlx4_en_poll_rx_cq(struct napi_struct *napi, int budget)
 	return done;
 }
 
-static const int frag_sizes[] = {
-	FRAG_SZ0,
-	FRAG_SZ1,
-	FRAG_SZ2,
-	FRAG_SZ3
-};
-
 void mlx4_en_calc_rx_buf(struct net_device *dev)
 {
 	struct mlx4_en_priv *priv = netdev_priv(dev);
@@ -1208,13 +1201,16 @@ void mlx4_en_calc_rx_buf(struct net_device *dev)
 		int buf_size = 0;
 
 		while (buf_size < eff_mtu) {
-			priv->frag_info[i].frag_size =
-				(eff_mtu > buf_size + frag_sizes[i]) ?
-					frag_sizes[i] : eff_mtu - buf_size;
-			priv->frag_info[i].frag_stride =
-				ALIGN(priv->frag_info[i].frag_size,
-				      SMP_CACHE_BYTES);
-			buf_size += priv->frag_info[i].frag_size;
+			int frag_size = eff_mtu - buf_size;
+
+			if (i < MLX4_EN_MAX_RX_FRAGS - 1)
+				frag_size = min(frag_size, 2048);
+
+			priv->frag_info[i].frag_size = frag_size;
+
+			priv->frag_info[i].frag_stride = ALIGN(frag_size,
+							       SMP_CACHE_BYTES);
+			buf_size += frag_size;
 			i++;
 		}
 		priv->rx_page_order = MLX4_EN_ALLOC_PREFER_ORDER;
diff --git a/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h b/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
index c9916b75b94bc9364b2cbe6da06a5ea385c6..a5bb0103ad8fd3b3f4b3d16099b7bf7ba01b 100644
--- a/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
+++ b/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
@@ -104,14 +104,6 @@
 
 #define MLX4_EN_ALLOC_PREFER_ORDER	PAGE_ALLOC_COSTLY_ORDER
 
-/* Receive fragment sizes; we use at most 3 fragments (for 9600 byte MTU
- * and 4K allocations) */
-enum {
-	FRAG_SZ0 = 1536 - NET_IP_ALIGN,
-	FRAG_SZ1 = 4096,
-	FRAG_SZ2 = 4096,
-	FRAG_SZ3 = MLX4_EN_ALLOC_SIZE
-};
 #define MLX4_EN_MAX_RX_FRAGS	4
 
 /* Maximum ring sizes */
-- 
2.11.0.483.g087da7b7c-goog


* [PATCH net-next 8/9] mlx4: use order-0 pages for RX
  2017-02-07  3:02 [PATCH net-next 0/9] mlx4: order-0 allocations and page recycling Eric Dumazet
                   ` (6 preceding siblings ...)
  2017-02-07  3:02 ` [PATCH net-next 7/9] mlx4: removal of frag_sizes[] Eric Dumazet
@ 2017-02-07  3:02 ` Eric Dumazet
  2017-02-07  3:02 ` [PATCH net-next 9/9] mlx4: add page recycling in receive path Eric Dumazet
  2017-02-07 15:50 ` [PATCH net-next 0/9] mlx4: order-0 allocations and page recycling Tariq Toukan
  9 siblings, 0 replies; 26+ messages in thread
From: Eric Dumazet @ 2017-02-07  3:02 UTC (permalink / raw)
  To: David S . Miller
  Cc: netdev, Tariq Toukan, Martin KaFai Lau, Willem de Bruijn,
	Jesper Dangaard Brouer, Brenden Blanco, Alexei Starovoitov,
	Eric Dumazet, Eric Dumazet

Use of order-3 pages is problematic in some cases.

This patch might add three kinds of regressions:

1) A CPU performance regression, but we will add page recycling
   later and performance should come back.

2) A TCP receiver could grow its receive window slightly more slowly,
   because the skb->len/skb->truesize ratio will decrease (see the
   worked example below).
   This is mostly OK; we prefer being conservative to not risk OOM,
   and can eventually tune TCP better in the future.
   This is consistent with other drivers using 2048 bytes per ethernet
   frame.

3) Because we allocate one page per RX slot, we consume more
   memory for the ring buffers. XDP already had this constraint anyway.
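
A rough worked example for 2): with a 1500 MTU (eff_mtu about 1522
bytes), the old code charged a ~1536-byte stride per frame to
skb->truesize, while the new layout packs two frames per 4K page and
charges 2048 bytes each, so skb->len/skb->truesize drops from roughly
0.99 to 0.74.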

Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 drivers/net/ethernet/mellanox/mlx4/en_rx.c   | 72 +++++++++++++---------------
 drivers/net/ethernet/mellanox/mlx4/mlx4_en.h |  4 --
 2 files changed, 33 insertions(+), 43 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx4/en_rx.c b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
index dd3bfcfea10c4545dfeda0f999449b13ca91..be4f3491a4fcb6ee0e9fe4e71abfd2bc5373 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_rx.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
@@ -53,38 +53,26 @@
 static int mlx4_alloc_pages(struct mlx4_en_priv *priv,
 			    struct mlx4_en_rx_alloc *page_alloc,
 			    const struct mlx4_en_frag_info *frag_info,
-			    gfp_t _gfp)
+			    gfp_t gfp)
 {
-	int order;
 	struct page *page;
 	dma_addr_t dma;
 
-	for (order = priv->rx_page_order; ;) {
-		gfp_t gfp = _gfp;
-
-		if (order)
-			gfp |= __GFP_COMP | __GFP_NOWARN | __GFP_NOMEMALLOC;
-		page = alloc_pages(gfp, order);
-		if (likely(page))
-			break;
-		if (--order < 0 ||
-		    ((PAGE_SIZE << order) < frag_info->frag_size))
-			return -ENOMEM;
-	}
-	dma = dma_map_page(priv->ddev, page, 0, PAGE_SIZE << order,
-			   priv->dma_dir);
+	page = alloc_page(gfp);
+	if (unlikely(!page))
+		return -ENOMEM;
+	dma = dma_map_page(priv->ddev, page, 0, PAGE_SIZE, priv->dma_dir);
 	if (unlikely(dma_mapping_error(priv->ddev, dma))) {
 		put_page(page);
 		return -ENOMEM;
 	}
-	page_alloc->page_size = PAGE_SIZE << order;
 	page_alloc->page = page;
 	page_alloc->dma = dma;
 	page_alloc->page_offset = 0;
 	/* Not doing get_page() for each frag is a big win
 	 * on asymetric workloads. Note we can not use atomic_set().
 	 */
-	page_ref_add(page, page_alloc->page_size / frag_info->frag_stride - 1);
+	page_ref_add(page, PAGE_SIZE / frag_info->frag_stride - 1);
 	return 0;
 }
 
@@ -105,7 +93,7 @@ static int mlx4_en_alloc_frags(struct mlx4_en_priv *priv,
 		page_alloc[i].page_offset += frag_info->frag_stride;
 
 		if (page_alloc[i].page_offset + frag_info->frag_stride <=
-		    ring_alloc[i].page_size)
+		    PAGE_SIZE)
 			continue;
 
 		if (unlikely(mlx4_alloc_pages(priv, &page_alloc[i],
@@ -127,11 +115,10 @@ static int mlx4_en_alloc_frags(struct mlx4_en_priv *priv,
 	while (i--) {
 		if (page_alloc[i].page != ring_alloc[i].page) {
 			dma_unmap_page(priv->ddev, page_alloc[i].dma,
-				page_alloc[i].page_size,
-				priv->dma_dir);
+				       PAGE_SIZE, priv->dma_dir);
 			page = page_alloc[i].page;
 			/* Revert changes done by mlx4_alloc_pages */
-			page_ref_sub(page, page_alloc[i].page_size /
+			page_ref_sub(page, PAGE_SIZE /
 					   priv->frag_info[i].frag_stride - 1);
 			put_page(page);
 		}
@@ -147,8 +134,8 @@ static void mlx4_en_free_frag(struct mlx4_en_priv *priv,
 	u32 next_frag_end = frags[i].page_offset + 2 * frag_info->frag_stride;
 
 
-	if (next_frag_end > frags[i].page_size)
-		dma_unmap_page(priv->ddev, frags[i].dma, frags[i].page_size,
+	if (next_frag_end > PAGE_SIZE)
+		dma_unmap_page(priv->ddev, frags[i].dma, PAGE_SIZE,
 			       priv->dma_dir);
 
 	if (frags[i].page)
@@ -168,9 +155,8 @@ static int mlx4_en_init_allocator(struct mlx4_en_priv *priv,
 				     frag_info, GFP_KERNEL | __GFP_COLD))
 			goto out;
 
-		en_dbg(DRV, priv, "  frag %d allocator: - size:%d frags:%d\n",
-		       i, ring->page_alloc[i].page_size,
-		       page_ref_count(ring->page_alloc[i].page));
+		en_dbg(DRV, priv, "  frag %d allocator: - frags:%d\n",
+		       i, page_ref_count(ring->page_alloc[i].page));
 	}
 	return 0;
 
@@ -180,11 +166,10 @@ static int mlx4_en_init_allocator(struct mlx4_en_priv *priv,
 
 		page_alloc = &ring->page_alloc[i];
 		dma_unmap_page(priv->ddev, page_alloc->dma,
-			       page_alloc->page_size,
-			       priv->dma_dir);
+			       PAGE_SIZE, priv->dma_dir);
 		page = page_alloc->page;
 		/* Revert changes done by mlx4_alloc_pages */
-		page_ref_sub(page, page_alloc->page_size /
+		page_ref_sub(page, PAGE_SIZE /
 				   priv->frag_info[i].frag_stride - 1);
 		put_page(page);
 		page_alloc->page = NULL;
@@ -206,9 +191,9 @@ static void mlx4_en_destroy_allocator(struct mlx4_en_priv *priv,
 		       i, page_count(page_alloc->page));
 
 		dma_unmap_page(priv->ddev, page_alloc->dma,
-				page_alloc->page_size, priv->dma_dir);
+			       PAGE_SIZE, priv->dma_dir);
 		while (page_alloc->page_offset + frag_info->frag_stride <
-		       page_alloc->page_size) {
+		       PAGE_SIZE) {
 			put_page(page_alloc->page);
 			page_alloc->page_offset += frag_info->frag_stride;
 		}
@@ -1188,7 +1173,6 @@ void mlx4_en_calc_rx_buf(struct net_device *dev)
 	 * This only works when num_frags == 1.
 	 */
 	if (priv->tx_ring_num[TX_XDP]) {
-		priv->rx_page_order = 0;
 		priv->frag_info[0].frag_size = eff_mtu;
 		/* This will gain efficient xdp frame recycling at the
 		 * expense of more costly truesize accounting
@@ -1198,22 +1182,32 @@ void mlx4_en_calc_rx_buf(struct net_device *dev)
 		priv->rx_headroom = XDP_PACKET_HEADROOM;
 		i = 1;
 	} else {
-		int buf_size = 0;
+		int frag_size_max = 2048, buf_size = 0;
+
+		/* should not happen, right ? */
+		if (eff_mtu > PAGE_SIZE + (MLX4_EN_MAX_RX_FRAGS - 1) * 2048)
+			frag_size_max = PAGE_SIZE;
 
 		while (buf_size < eff_mtu) {
-			int frag_size = eff_mtu - buf_size;
+			int frag_stride, frag_size = eff_mtu - buf_size;
+			int pad, nb;
 
 			if (i < MLX4_EN_MAX_RX_FRAGS - 1)
-				frag_size = min(frag_size, 2048);
+				frag_size = min(frag_size, frag_size_max);
 
 			priv->frag_info[i].frag_size = frag_size;
+			frag_stride = ALIGN(frag_size, SMP_CACHE_BYTES);
+			/* We can only pack two 1536-byte frames on one 4K page.
+			 * Each frame therefore consumes more bytes (truesize).
+			 */
+			nb = PAGE_SIZE / frag_stride;
+			pad = (PAGE_SIZE - nb * frag_stride) / nb;
+			pad &= ~(SMP_CACHE_BYTES - 1);
+			priv->frag_info[i].frag_stride = frag_stride + pad;
 
-			priv->frag_info[i].frag_stride = ALIGN(frag_size,
-							       SMP_CACHE_BYTES);
 			buf_size += frag_size;
 			i++;
 		}
-		priv->rx_page_order = MLX4_EN_ALLOC_PREFER_ORDER;
 		priv->dma_dir = PCI_DMA_FROMDEVICE;
 		priv->rx_headroom = 0;
 	}
diff --git a/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h b/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
index a5bb0103ad8fd3b3f4b3d16099b7bf7ba01b..4016086b13539c8bd848242a3a1788eff245 100644
--- a/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
+++ b/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
@@ -102,8 +102,6 @@
 /* Use the maximum between 16384 and a single page */
 #define MLX4_EN_ALLOC_SIZE	PAGE_ALIGN(16384)
 
-#define MLX4_EN_ALLOC_PREFER_ORDER	PAGE_ALLOC_COSTLY_ORDER
-
 #define MLX4_EN_MAX_RX_FRAGS	4
 
 /* Maximum ring sizes */
@@ -255,7 +253,6 @@ struct mlx4_en_rx_alloc {
 	struct page	*page;
 	dma_addr_t	dma;
 	u32		page_offset;
-	u32		page_size;
 };
 
 #define MLX4_EN_CACHE_SIZE (2 * NAPI_POLL_WEIGHT)
@@ -579,7 +576,6 @@ struct mlx4_en_priv {
 	u8 num_frags;
 	u8 log_rx_info;
 	u8 dma_dir;
-	u8 rx_page_order;
 	u16 rx_headroom;
 
 	struct mlx4_en_tx_ring **tx_ring[MLX4_EN_NUM_TX_TYPES];
-- 
2.11.0.483.g087da7b7c-goog


* [PATCH net-next 9/9] mlx4: add page recycling in receive path
  2017-02-07  3:02 [PATCH net-next 0/9] mlx4: order-0 allocations and page recycling Eric Dumazet
                   ` (7 preceding siblings ...)
  2017-02-07  3:02 ` [PATCH net-next 8/9] mlx4: use order-0 pages for RX Eric Dumazet
@ 2017-02-07  3:02 ` Eric Dumazet
  2017-02-07 16:20   ` Tariq Toukan
  2017-02-07 15:50 ` [PATCH net-next 0/9] mlx4: order-0 allocations and page recycling Tariq Toukan
  9 siblings, 1 reply; 26+ messages in thread
From: Eric Dumazet @ 2017-02-07  3:02 UTC (permalink / raw)
  To: David S . Miller
  Cc: netdev, Tariq Toukan, Martin KaFai Lau, Willem de Bruijn,
	Jesper Dangaard Brouer, Brenden Blanco, Alexei Starovoitov,
	Eric Dumazet, Eric Dumazet

Same technique as some Intel drivers, for arches where PAGE_SIZE = 4096.

In most cases, pages are reused because they were consumed
before we could loop around the RX ring.

This brings performance back, and is even better:
a single TCP flow reaches 30Gbit on my hosts.
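
The recycling decision (see mlx4_en_complete_rx_desc() in the diff
below) boils down to: for the half-page layout, flip the offset to the
other half of the page, and keep the page only if we hold the last
reference and the page did not come from memory reserves:

	frags[nr].page_offset ^= PAGE_SIZE / 2;
	release = page_count(page) != 1 || page_is_pfmemalloc(page);
	if (!release)
		page_ref_inc(page);	/* page stays in the RX ring */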

Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 drivers/net/ethernet/mellanox/mlx4/en_rx.c   | 238 ++++++++-------------------
 drivers/net/ethernet/mellanox/mlx4/mlx4_en.h |   1 -
 2 files changed, 68 insertions(+), 171 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlx4/en_rx.c b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
index be4f3491a4fcb6ee0e9fe4e71abfd2bc5373..6854a19087edbf0bc9bf29e20a82deaaf043 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_rx.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
@@ -50,10 +50,9 @@
 
 #include "mlx4_en.h"
 
-static int mlx4_alloc_pages(struct mlx4_en_priv *priv,
-			    struct mlx4_en_rx_alloc *page_alloc,
-			    const struct mlx4_en_frag_info *frag_info,
-			    gfp_t gfp)
+static int mlx4_alloc_page(const struct mlx4_en_priv *priv,
+			   struct mlx4_en_rx_alloc *frag,
+			   gfp_t gfp)
 {
 	struct page *page;
 	dma_addr_t dma;
@@ -66,142 +65,40 @@ static int mlx4_alloc_pages(struct mlx4_en_priv *priv,
 		put_page(page);
 		return -ENOMEM;
 	}
-	page_alloc->page = page;
-	page_alloc->dma = dma;
-	page_alloc->page_offset = 0;
-	/* Not doing get_page() for each frag is a big win
-	 * on asymetric workloads. Note we can not use atomic_set().
-	 */
-	page_ref_add(page, PAGE_SIZE / frag_info->frag_stride - 1);
+	frag->page = page;
+	frag->dma = dma;
+	frag->page_offset = priv->rx_headroom;
 	return 0;
 }
 
-static int mlx4_en_alloc_frags(struct mlx4_en_priv *priv,
+static int mlx4_en_alloc_frags(const struct mlx4_en_priv *priv,
 			       struct mlx4_en_rx_desc *rx_desc,
 			       struct mlx4_en_rx_alloc *frags,
-			       struct mlx4_en_rx_alloc *ring_alloc,
 			       gfp_t gfp)
 {
-	struct mlx4_en_rx_alloc page_alloc[MLX4_EN_MAX_RX_FRAGS];
-	const struct mlx4_en_frag_info *frag_info;
-	struct page *page;
 	int i;
 
-	for (i = 0; i < priv->num_frags; i++) {
-		frag_info = &priv->frag_info[i];
-		page_alloc[i] = ring_alloc[i];
-		page_alloc[i].page_offset += frag_info->frag_stride;
-
-		if (page_alloc[i].page_offset + frag_info->frag_stride <=
-		    PAGE_SIZE)
-			continue;
-
-		if (unlikely(mlx4_alloc_pages(priv, &page_alloc[i],
-					      frag_info, gfp)))
-			goto out;
-	}
-
-	for (i = 0; i < priv->num_frags; i++) {
-		frags[i] = ring_alloc[i];
-		frags[i].page_offset += priv->rx_headroom;
-		rx_desc->data[i].addr = cpu_to_be64(frags[i].dma +
-						    frags[i].page_offset);
-		ring_alloc[i] = page_alloc[i];
-	}
-
-	return 0;
-
-out:
-	while (i--) {
-		if (page_alloc[i].page != ring_alloc[i].page) {
-			dma_unmap_page(priv->ddev, page_alloc[i].dma,
-				       PAGE_SIZE, priv->dma_dir);
-			page = page_alloc[i].page;
-			/* Revert changes done by mlx4_alloc_pages */
-			page_ref_sub(page, PAGE_SIZE /
-					   priv->frag_info[i].frag_stride - 1);
-			put_page(page);
-		}
-	}
-	return -ENOMEM;
-}
-
-static void mlx4_en_free_frag(struct mlx4_en_priv *priv,
-			      struct mlx4_en_rx_alloc *frags,
-			      int i)
-{
-	const struct mlx4_en_frag_info *frag_info = &priv->frag_info[i];
-	u32 next_frag_end = frags[i].page_offset + 2 * frag_info->frag_stride;
-
-
-	if (next_frag_end > PAGE_SIZE)
-		dma_unmap_page(priv->ddev, frags[i].dma, PAGE_SIZE,
-			       priv->dma_dir);
-
-	if (frags[i].page)
-		put_page(frags[i].page);
-}
-
-static int mlx4_en_init_allocator(struct mlx4_en_priv *priv,
-				  struct mlx4_en_rx_ring *ring)
-{
-	int i;
-	struct mlx4_en_rx_alloc *page_alloc;
-
-	for (i = 0; i < priv->num_frags; i++) {
-		const struct mlx4_en_frag_info *frag_info = &priv->frag_info[i];
-
-		if (mlx4_alloc_pages(priv, &ring->page_alloc[i],
-				     frag_info, GFP_KERNEL | __GFP_COLD))
-			goto out;
-
-		en_dbg(DRV, priv, "  frag %d allocator: - frags:%d\n",
-		       i, page_ref_count(ring->page_alloc[i].page));
+	for (i = 0; i < priv->num_frags; i++, frags++) {
+		if (!frags->page && mlx4_alloc_page(priv, frags, gfp))
+			return -ENOMEM;
+		rx_desc->data[i].addr = cpu_to_be64(frags->dma +
+						    frags->page_offset);
 	}
 	return 0;
-
-out:
-	while (i--) {
-		struct page *page;
-
-		page_alloc = &ring->page_alloc[i];
-		dma_unmap_page(priv->ddev, page_alloc->dma,
-			       PAGE_SIZE, priv->dma_dir);
-		page = page_alloc->page;
-		/* Revert changes done by mlx4_alloc_pages */
-		page_ref_sub(page, PAGE_SIZE /
-				   priv->frag_info[i].frag_stride - 1);
-		put_page(page);
-		page_alloc->page = NULL;
-	}
-	return -ENOMEM;
 }
 
-static void mlx4_en_destroy_allocator(struct mlx4_en_priv *priv,
-				      struct mlx4_en_rx_ring *ring)
+static void mlx4_en_free_frag(const struct mlx4_en_priv *priv,
+			      struct mlx4_en_rx_alloc *frag)
 {
-	struct mlx4_en_rx_alloc *page_alloc;
-	int i;
-
-	for (i = 0; i < priv->num_frags; i++) {
-		const struct mlx4_en_frag_info *frag_info = &priv->frag_info[i];
-
-		page_alloc = &ring->page_alloc[i];
-		en_dbg(DRV, priv, "Freeing allocator:%d count:%d\n",
-		       i, page_count(page_alloc->page));
-
-		dma_unmap_page(priv->ddev, page_alloc->dma,
+	if (frag->page) {
+		dma_unmap_page(priv->ddev, frag->dma,
 			       PAGE_SIZE, priv->dma_dir);
-		while (page_alloc->page_offset + frag_info->frag_stride <
-		       PAGE_SIZE) {
-			put_page(page_alloc->page);
-			page_alloc->page_offset += frag_info->frag_stride;
-		}
-		page_alloc->page = NULL;
+		put_page(frag->page);
+		frag->page = NULL;
 	}
 }
 
-static void mlx4_en_init_rx_desc(struct mlx4_en_priv *priv,
+static void mlx4_en_init_rx_desc(const struct mlx4_en_priv *priv,
 				 struct mlx4_en_rx_ring *ring, int index)
 {
 	struct mlx4_en_rx_desc *rx_desc = ring->buf + ring->stride * index;
@@ -226,7 +123,7 @@ static void mlx4_en_init_rx_desc(struct mlx4_en_priv *priv,
 	}
 }
 
-static int mlx4_en_prepare_rx_desc(struct mlx4_en_priv *priv,
+static int mlx4_en_prepare_rx_desc(const struct mlx4_en_priv *priv,
 				   struct mlx4_en_rx_ring *ring, int index,
 				   gfp_t gfp)
 {
@@ -235,19 +132,21 @@ static int mlx4_en_prepare_rx_desc(struct mlx4_en_priv *priv,
 					(index << priv->log_rx_info);
 
 	if (ring->page_cache.index > 0) {
-		ring->page_cache.index--;
-		frags[0].page = ring->page_cache.buf[ring->page_cache.index].page;
-		frags[0].dma  = ring->page_cache.buf[ring->page_cache.index].dma;
+		if (frags[0].page) {
+			ring->page_cache.index--;
+			frags[0].page = ring->page_cache.buf[ring->page_cache.index].page;
+			frags[0].dma  = ring->page_cache.buf[ring->page_cache.index].dma;
+		}
 		frags[0].page_offset = XDP_PACKET_HEADROOM;
 		rx_desc->data[0].addr = cpu_to_be64(frags[0].dma +
 						    frags[0].page_offset);
 		return 0;
 	}
 
-	return mlx4_en_alloc_frags(priv, rx_desc, frags, ring->page_alloc, gfp);
+	return mlx4_en_alloc_frags(priv, rx_desc, frags, gfp);
 }
 
-static inline bool mlx4_en_is_ring_empty(struct mlx4_en_rx_ring *ring)
+static bool mlx4_en_is_ring_empty(const struct mlx4_en_rx_ring *ring)
 {
 	return ring->prod == ring->cons;
 }
@@ -257,7 +156,8 @@ static inline void mlx4_en_update_rx_prod_db(struct mlx4_en_rx_ring *ring)
 	*ring->wqres.db.db = cpu_to_be32(ring->prod & 0xffff);
 }
 
-static void mlx4_en_free_rx_desc(struct mlx4_en_priv *priv,
+/* slow path */
+static void mlx4_en_free_rx_desc(const struct mlx4_en_priv *priv,
 				 struct mlx4_en_rx_ring *ring,
 				 int index)
 {
@@ -267,7 +167,7 @@ static void mlx4_en_free_rx_desc(struct mlx4_en_priv *priv,
 	frags = ring->rx_info + (index << priv->log_rx_info);
 	for (nr = 0; nr < priv->num_frags; nr++) {
 		en_dbg(DRV, priv, "Freeing fragment:%d\n", nr);
-		mlx4_en_free_frag(priv, frags, nr);
+		mlx4_en_free_frag(priv, frags + nr);
 	}
 }
 
@@ -380,9 +280,9 @@ int mlx4_en_create_rx_ring(struct mlx4_en_priv *priv,
 
 	tmp = size * roundup_pow_of_two(MLX4_EN_MAX_RX_FRAGS *
 					sizeof(struct mlx4_en_rx_alloc));
-	ring->rx_info = vmalloc_node(tmp, node);
+	ring->rx_info = vzalloc_node(tmp, node);
 	if (!ring->rx_info) {
-		ring->rx_info = vmalloc(tmp);
+		ring->rx_info = vzalloc(tmp);
 		if (!ring->rx_info) {
 			err = -ENOMEM;
 			goto err_ring;
@@ -452,16 +352,6 @@ int mlx4_en_activate_rx_rings(struct mlx4_en_priv *priv)
 		/* Initialize all descriptors */
 		for (i = 0; i < ring->size; i++)
 			mlx4_en_init_rx_desc(priv, ring, i);
-
-		/* Initialize page allocators */
-		err = mlx4_en_init_allocator(priv, ring);
-		if (err) {
-			en_err(priv, "Failed initializing ring allocator\n");
-			if (ring->stride <= TXBB_SIZE)
-				ring->buf -= TXBB_SIZE;
-			ring_ind--;
-			goto err_allocator;
-		}
 	}
 	err = mlx4_en_fill_rx_buffers(priv);
 	if (err)
@@ -481,11 +371,9 @@ int mlx4_en_activate_rx_rings(struct mlx4_en_priv *priv)
 		mlx4_en_free_rx_buf(priv, priv->rx_ring[ring_ind]);
 
 	ring_ind = priv->rx_ring_num - 1;
-err_allocator:
 	while (ring_ind >= 0) {
 		if (priv->rx_ring[ring_ind]->stride <= TXBB_SIZE)
 			priv->rx_ring[ring_ind]->buf -= TXBB_SIZE;
-		mlx4_en_destroy_allocator(priv, priv->rx_ring[ring_ind]);
 		ring_ind--;
 	}
 	return err;
@@ -562,50 +450,64 @@ void mlx4_en_deactivate_rx_ring(struct mlx4_en_priv *priv,
 	mlx4_en_free_rx_buf(priv, ring);
 	if (ring->stride <= TXBB_SIZE)
 		ring->buf -= TXBB_SIZE;
-	mlx4_en_destroy_allocator(priv, ring);
 }
 
 
-static int mlx4_en_complete_rx_desc(struct mlx4_en_priv *priv,
-				    struct mlx4_en_rx_desc *rx_desc,
+static noinline int mlx4_en_complete_rx_desc(struct mlx4_en_priv *priv,
 				    struct mlx4_en_rx_alloc *frags,
 				    struct sk_buff *skb,
 				    int length)
 {
-	struct skb_frag_struct *skb_frags_rx = skb_shinfo(skb)->frags;
 	struct mlx4_en_frag_info *frag_info = priv->frag_info;
 	int nr, frag_size;
+	struct page *page;
 	dma_addr_t dma;
+	bool release;
+	unsigned int truesize = 0;
 
 	/* Collect used fragments while replacing them in the HW descriptors */
 	for (nr = 0;;) {
 		frag_size = min_t(int, length, frag_info->frag_size);
 
-		if (unlikely(!frags[nr].page))
+		page = frags[nr].page;
+		if (unlikely(!page))
 			goto fail;
 
-		dma = be64_to_cpu(rx_desc->data[nr].addr);
-		dma_sync_single_for_cpu(priv->ddev, dma, frag_info->frag_size,
-					DMA_FROM_DEVICE);
+		dma = frags[nr].dma;
+		dma_sync_single_range_for_cpu(priv->ddev, dma, frags[nr].page_offset,
+					      frag_info->frag_size, priv->dma_dir);
 
-		__skb_fill_page_desc(skb, nr, frags[nr].page,
-				     frags[nr].page_offset,
+		__skb_fill_page_desc(skb, nr, page, frags[nr].page_offset,
 				     frag_size);
 
-		skb->truesize += frag_info->frag_stride;
-		frags[nr].page = NULL;
+		truesize += frag_info->frag_stride;
+		if (frag_info->frag_stride == PAGE_SIZE / 2) {
+			frags[nr].page_offset ^= PAGE_SIZE / 2;
+			release = page_count(page) != 1 || page_is_pfmemalloc(page);
+		} else {
+			frags[nr].page_offset += frag_info->frag_stride;
+			release = frags[nr].page_offset + frag_info->frag_size > PAGE_SIZE;
+		}
+		if (release) {
+			dma_unmap_page(priv->ddev, dma, PAGE_SIZE, priv->dma_dir);
+			frags[nr].page = NULL;
+		} else {
+			page_ref_inc(page);
+		}
+
 		nr++;
 		length -= frag_size;
 		if (!length)
 			break;
 		frag_info++;
 	}
+	skb->truesize += truesize;
 	return nr;
 
 fail:
 	while (nr > 0) {
 		nr--;
-		__skb_frag_unref(&skb_frags_rx[nr]);
+		__skb_frag_unref(skb_shinfo(skb)->frags + nr);
 	}
 	return 0;
 }
@@ -636,7 +538,8 @@ static struct sk_buff *mlx4_en_rx_skb(struct mlx4_en_priv *priv,
 	if (length <= SMALL_PACKET_SIZE) {
 		/* We are copying all relevant data to the skb - temporarily
 		 * sync buffers for the copy */
-		dma = be64_to_cpu(rx_desc->data[0].addr);
+
+		dma = frags[0].dma + frags[0].page_offset;
 		dma_sync_single_for_cpu(priv->ddev, dma, length,
 					DMA_FROM_DEVICE);
 		skb_copy_to_linear_data(skb, va, length);
@@ -645,7 +548,7 @@ static struct sk_buff *mlx4_en_rx_skb(struct mlx4_en_priv *priv,
 		unsigned int pull_len;
 
 		/* Move relevant fragments to skb */
-		used_frags = mlx4_en_complete_rx_desc(priv, rx_desc, frags,
+		used_frags = mlx4_en_complete_rx_desc(priv, frags,
 							skb, length);
 		if (unlikely(!used_frags)) {
 			kfree_skb(skb);
@@ -913,8 +816,10 @@ int mlx4_en_process_rx_cq(struct net_device *dev, struct mlx4_en_cq *cq, int bud
 			case XDP_TX:
 				if (likely(!mlx4_en_xmit_frame(ring, frags, dev,
 							length, cq->ring,
-							&doorbell_pending)))
-					goto consumed;
+							&doorbell_pending))) {
+					frags[0].page = NULL;
+					goto next;
+				}
 				trace_xdp_exception(dev, xdp_prog, act);
 				goto xdp_drop_no_cnt; /* Drop on xmit failure */
 			default:
@@ -924,8 +829,6 @@ int mlx4_en_process_rx_cq(struct net_device *dev, struct mlx4_en_cq *cq, int bud
 			case XDP_DROP:
 				ring->xdp_drop++;
 xdp_drop_no_cnt:
-				if (likely(mlx4_en_rx_recycle(ring, frags)))
-					goto consumed;
 				goto next;
 			}
 		}
@@ -971,9 +874,8 @@ int mlx4_en_process_rx_cq(struct net_device *dev, struct mlx4_en_cq *cq, int bud
 			if (!gro_skb)
 				goto next;
 
-			nr = mlx4_en_complete_rx_desc(priv,
-				rx_desc, frags, gro_skb,
-				length);
+			nr = mlx4_en_complete_rx_desc(priv, frags, gro_skb,
+						      length);
 			if (!nr)
 				goto next;
 
@@ -1081,10 +983,6 @@ int mlx4_en_process_rx_cq(struct net_device *dev, struct mlx4_en_cq *cq, int bud
 
 		napi_gro_receive(&cq->napi, skb);
 next:
-		for (nr = 0; nr < priv->num_frags; nr++)
-			mlx4_en_free_frag(priv, frags, nr);
-
-consumed:
 		++cq->mcq.cons_index;
 		index = (cq->mcq.cons_index) & ring->size_mask;
 		cqe = mlx4_en_get_cqe(cq->buf, index, priv->cqe_size) + factor;
diff --git a/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h b/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
index 4016086b13539c8bd848242a3a1788eff245..4a6594325a10b238b8a4f01805493b5c6e8b 100644
--- a/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
+++ b/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
@@ -327,7 +327,6 @@ struct mlx4_en_rx_desc {
 
 struct mlx4_en_rx_ring {
 	struct mlx4_hwq_resources wqres;
-	struct mlx4_en_rx_alloc page_alloc[MLX4_EN_MAX_RX_FRAGS];
 	u32 size ;	/* number of Rx descs*/
 	u32 actual_size;
 	u32 size_mask;
-- 
2.11.0.483.g087da7b7c-goog


* Re: [PATCH net-next 0/9] mlx4: order-0 allocations and page recycling
  2017-02-07  3:02 [PATCH net-next 0/9] mlx4: order-0 allocations and page recycling Eric Dumazet
                   ` (8 preceding siblings ...)
  2017-02-07  3:02 ` [PATCH net-next 9/9] mlx4: add page recycling in receive path Eric Dumazet
@ 2017-02-07 15:50 ` Tariq Toukan
  2017-02-07 16:06   ` Eric Dumazet
  2017-02-08  9:02   ` Tariq Toukan
  9 siblings, 2 replies; 26+ messages in thread
From: Tariq Toukan @ 2017-02-07 15:50 UTC (permalink / raw)
  To: Eric Dumazet, David S . Miller
  Cc: netdev, Tariq Toukan, Martin KaFai Lau, Willem de Bruijn,
	Jesper Dangaard Brouer, Brenden Blanco, Alexei Starovoitov,
	Eric Dumazet

Hi Eric,

Thanks for your series.

On 07/02/2017 5:02 AM, Eric Dumazet wrote:
> As mentioned half a year ago, we better switch mlx4 driver to order-0
> allocations and page recycling.
>
> This reduces vulnerability surface thanks to better skb->truesize tracking
> and provides better performance in most cases.
The series makes a significant change in the RX data-path that requires
deeper checks, in addition to code review.
We applied your series and started running both our functional and
performance regression tests.
We will have results by tomorrow morning, and will analyze them during
the day. I'll update you then.
>
> Worth noting this patch series deletes more than 100 lines of code ;)
>
> Eric Dumazet (9):
>    mlx4: use __skb_fill_page_desc()
>    mlx4: dma_dir is a mlx4_en_priv attribute
>    mlx4: remove order field from mlx4_en_frag_info
>    mlx4: get rid of frag_prefix_size
>    mlx4: rx_headroom is a per port attribute
>    mlx4: reduce rx ring page_cache size
>    mlx4: removal of frag_sizes[]
>    mlx4: use order-0 pages for RX
>    mlx4: add page recycling in receive path
>
>   drivers/net/ethernet/mellanox/mlx4/en_rx.c   | 350 +++++++++------------------
>   drivers/net/ethernet/mellanox/mlx4/en_tx.c   |   4 +-
>   drivers/net/ethernet/mellanox/mlx4/mlx4_en.h |  28 +--
>   3 files changed, 129 insertions(+), 253 deletions(-)
>
Thanks,
Tariq


* Re: [PATCH net-next 0/9] mlx4: order-0 allocations and page recycling
  2017-02-07 15:50 ` [PATCH net-next 0/9] mlx4: order-0 allocations and page recycling Tariq Toukan
@ 2017-02-07 16:06   ` Eric Dumazet
  2017-02-07 16:26     ` Eric Dumazet
  2017-02-08  9:02   ` Tariq Toukan
  1 sibling, 1 reply; 26+ messages in thread
From: Eric Dumazet @ 2017-02-07 16:06 UTC (permalink / raw)
  To: Tariq Toukan
  Cc: Eric Dumazet, David S . Miller, netdev, Tariq Toukan,
	Martin KaFai Lau, Willem de Bruijn, Jesper Dangaard Brouer,
	Brenden Blanco, Alexei Starovoitov

On Tue, 2017-02-07 at 17:50 +0200, Tariq Toukan wrote:
> Hi Eric,
> 
> Thanks for your series.
> 
> On 07/02/2017 5:02 AM, Eric Dumazet wrote:
> > As mentioned half a year ago, we better switch mlx4 driver to order-0
> > allocations and page recycling.
> >
> > This reduces vulnerability surface thanks to better skb->truesize tracking
> > and provides better performance in most cases.
> The series makes significant change in the RX data-path, that requires 
> deeper checks, in addition to code review.
> We applied your series and started running both our functional and 
> performance regression.
> We will have results by tomorrow morning, and will analyze them during 
> the day. I'll update about that.


Thanks Tariq.

I have also removed the need to access rx_desc, saving one cache line
miss, and added two prefetches as well.

I will incorporate the following in the series.

30 -> 32 Gbits on a single TCP flow.

diff --git a/drivers/net/ethernet/mellanox/mlx4/en_rx.c b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
index 6854a19087edbf0bc9bf29e20a82deaaf043..3959db42b3d15657d4073a0d6391afd6a2a5 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_rx.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
@@ -483,7 +483,9 @@ static noinline int mlx4_en_complete_rx_desc(struct mlx4_en_priv *priv,
 		truesize += frag_info->frag_stride;
 		if (frag_info->frag_stride == PAGE_SIZE / 2) {
 			frags[nr].page_offset ^= PAGE_SIZE / 2;
-			release = page_count(page) != 1 || page_is_pfmemalloc(page);
+			release = page_count(page) != 1 ||
+				  page_is_pfmemalloc(page) ||
+				  page_to_nid(page) != numa_mem_id();
 		} else {
 			frags[nr].page_offset += frag_info->frag_stride;
 			release = frags[nr].page_offset + frag_info->frag_size > PAGE_SIZE;
@@ -514,12 +516,11 @@ static noinline int mlx4_en_complete_rx_desc(struct mlx4_en_priv *priv,
 
 
 static struct sk_buff *mlx4_en_rx_skb(struct mlx4_en_priv *priv,
-				      struct mlx4_en_rx_desc *rx_desc,
 				      struct mlx4_en_rx_alloc *frags,
+				      void *va,
 				      unsigned int length)
 {
 	struct sk_buff *skb;
-	void *va;
 	int used_frags;
 	dma_addr_t dma;
 
@@ -531,10 +532,6 @@ static struct sk_buff *mlx4_en_rx_skb(struct mlx4_en_priv *priv,
 	skb_reserve(skb, NET_IP_ALIGN);
 	skb->len = length;
 
-	/* Get pointer to first fragment so we could copy the headers into the
-	 * (linear part of the) skb */
-	va = page_address(frags[0].page) + frags[0].page_offset;
-
 	if (length <= SMALL_PACKET_SIZE) {
 		/* We are copying all relevant data to the skb - temporarily
 		 * sync buffers for the copy */
@@ -689,7 +686,6 @@ int mlx4_en_process_rx_cq(struct net_device *dev, struct mlx4_en_cq *cq, int bud
 	struct mlx4_cqe *cqe;
 	struct mlx4_en_rx_ring *ring = priv->rx_ring[cq->ring];
 	struct mlx4_en_rx_alloc *frags;
-	struct mlx4_en_rx_desc *rx_desc;
 	struct bpf_prog *xdp_prog;
 	int doorbell_pending;
 	struct sk_buff *skb;
@@ -722,14 +718,18 @@ int mlx4_en_process_rx_cq(struct net_device *dev, struct mlx4_en_cq *cq, int bud
 	/* Process all completed CQEs */
 	while (XNOR(cqe->owner_sr_opcode & MLX4_CQE_OWNER_MASK,
 		    cq->mcq.cons_index & cq->size)) {
+		void *va;
 
 		frags = ring->rx_info + (index << priv->log_rx_info);
-		rx_desc = ring->buf + (index << ring->log_stride);
 
 		/*
 		 * make sure we read the CQE after we read the ownership bit
 		 */
 		dma_rmb();
+		prefetch(frags[0].page);
+		va = page_address(frags[0].page) + frags[0].page_offset;
+
+		prefetch(va + 64);
 
 		/* Drop packet on bad receive or bad checksum */
 		if (unlikely((cqe->owner_sr_opcode & MLX4_CQE_OPCODE_MASK) ==
@@ -753,7 +753,7 @@ int mlx4_en_process_rx_cq(struct net_device *dev, struct mlx4_en_cq *cq, int bud
 			/* Get pointer to first fragment since we haven't
 			 * skb yet and cast it to ethhdr struct
 			 */
-			dma = be64_to_cpu(rx_desc->data[0].addr);
+			dma = frags[0].dma + frags[0].page_offset;
 			dma_sync_single_for_cpu(priv->ddev, dma, sizeof(*ethh),
 						DMA_FROM_DEVICE);
 			ethh = (struct ethhdr *)(page_address(frags[0].page) +
@@ -792,7 +792,7 @@ int mlx4_en_process_rx_cq(struct net_device *dev, struct mlx4_en_cq *cq, int bud
 			void *orig_data;
 			u32 act;
 
-			dma = be64_to_cpu(rx_desc->data[0].addr);
+			dma = frags[0].dma + frags[0].page_offset;
 			dma_sync_single_for_cpu(priv->ddev, dma,
 						priv->frag_info[0].frag_size,
 						DMA_FROM_DEVICE);
@@ -880,7 +880,6 @@ int mlx4_en_process_rx_cq(struct net_device *dev, struct mlx4_en_cq *cq, int bud
 				goto next;
 
 			if (ip_summed == CHECKSUM_COMPLETE) {
-				void *va = skb_frag_address(skb_shinfo(gro_skb)->frags);
 				if (check_csum(cqe, gro_skb, va,
 					       dev->features)) {
 					ip_summed = CHECKSUM_NONE;
@@ -932,7 +931,7 @@ int mlx4_en_process_rx_cq(struct net_device *dev, struct mlx4_en_cq *cq, int bud
 		}
 
 		/* GRO not possible, complete processing here */
-		skb = mlx4_en_rx_skb(priv, rx_desc, frags, length);
+		skb = mlx4_en_rx_skb(priv, frags, va, length);
 		if (unlikely(!skb)) {
 			ring->dropped++;
 			goto next;
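
In other words, the recycling test in the first hunk above keeps a page only
when we hold the sole reference, the page was not pulled from memory
reserves, and it sits on the local NUMA node. A minimal sketch of that
predicate in isolation (the helper name is illustrative, not part of the
posted patch):

static bool mlx4_page_is_recyclable(struct page *page)
{
	/* sole owner: nobody else can still be reading or writing the page */
	if (page_count(page) != 1)
		return false;
	/* pfmemalloc pages must be returned to the reserves promptly */
	if (page_is_pfmemalloc(page))
		return false;
	/* keep RX DMA local to this NUMA node */
	return page_to_nid(page) == numa_mem_id();
}

The 'release' flag computed in the hunk is simply the negation of this
predicate.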

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* Re: [PATCH net-next 9/9] mlx4: add page recycling in receive path
  2017-02-07  3:02 ` [PATCH net-next 9/9] mlx4: add page recycling in receive path Eric Dumazet
@ 2017-02-07 16:20   ` Tariq Toukan
  2017-02-07 16:34     ` Eric Dumazet
  0 siblings, 1 reply; 26+ messages in thread
From: Tariq Toukan @ 2017-02-07 16:20 UTC (permalink / raw)
  To: Eric Dumazet, David S . Miller
  Cc: netdev, Tariq Toukan, Martin KaFai Lau, Willem de Bruijn,
	Jesper Dangaard Brouer, Brenden Blanco, Alexei Starovoitov,
	Eric Dumazet



On 07/02/2017 5:02 AM, Eric Dumazet wrote:
> Same technique as some Intel drivers, for arches where PAGE_SIZE = 4096
>
> In most cases, pages are reused because they were consumed
> before we could loop around the RX ring.
This might not be true when multiple streams are handled in the same ring,
as the processing time in the stack will be longer.
We will test this scenario as well.
>
> This brings back performance, and is even better,
> a single TCP flow reaches 30Gbit on my hosts.
>
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> ---
>   drivers/net/ethernet/mellanox/mlx4/en_rx.c   | 238 ++++++++-------------------
>   drivers/net/ethernet/mellanox/mlx4/mlx4_en.h |   1 -
>   2 files changed, 68 insertions(+), 171 deletions(-)
>
>

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH net-next 0/9] mlx4: order-0 allocations and page recycling
  2017-02-07 16:06   ` Eric Dumazet
@ 2017-02-07 16:26     ` Eric Dumazet
  2017-02-07 16:28       ` Eric Dumazet
  2017-02-07 19:05       ` Alexei Starovoitov
  0 siblings, 2 replies; 26+ messages in thread
From: Eric Dumazet @ 2017-02-07 16:26 UTC (permalink / raw)
  To: Tariq Toukan
  Cc: Eric Dumazet, David S . Miller, netdev, Tariq Toukan,
	Martin KaFai Lau, Willem de Bruijn, Jesper Dangaard Brouer,
	Brenden Blanco, Alexei Starovoitov

On Tue, 2017-02-07 at 08:06 -0800, Eric Dumazet wrote:

>  		/*
>  		 * make sure we read the CQE after we read the ownership bit
>  		 */
>  		dma_rmb();
> +		prefetch(frags[0].page);

Note that I would like to instead do a prefetch(frags[1].page)

So I will probably change how ring->rx_info is allocated

wasting all that space and forcing vmalloc() is silly:

tmp = size * roundup_pow_of_two(MLX4_EN_MAX_RX_FRAGS *
                                sizeof(struct mlx4_en_rx_alloc));
ring->rx_info = vzalloc_node(tmp, node);

In most cases, using exactly 12 bytes per slot would allow better
packing. Only one CPU is using this area; no need to force strange
alignments for the sake of avoiding a multiply!
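
In sketch form, the exact-fit allocation hinted at here could look like
this (a sketch only, keeping vzalloc_node() and the names used by the
current driver):

	tmp = size * priv->num_frags * sizeof(struct mlx4_en_rx_alloc);
	ring->rx_info = vzalloc_node(tmp, node);

	/* lookup by multiply instead of a power-of-two shift: */
	frags = ring->rx_info + index * priv->num_frags *
				sizeof(struct mlx4_en_rx_alloc);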

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH net-next 0/9] mlx4: order-0 allocations and page recycling
  2017-02-07 16:26     ` Eric Dumazet
@ 2017-02-07 16:28       ` Eric Dumazet
  2017-02-07 19:05       ` Alexei Starovoitov
  1 sibling, 0 replies; 26+ messages in thread
From: Eric Dumazet @ 2017-02-07 16:28 UTC (permalink / raw)
  To: Tariq Toukan
  Cc: Eric Dumazet, David S . Miller, netdev, Tariq Toukan,
	Martin KaFai Lau, Willem de Bruijn, Jesper Dangaard Brouer,
	Brenden Blanco, Alexei Starovoitov

On Tue, 2017-02-07 at 08:26 -0800, Eric Dumazet wrote:

> In most cases, using exactly 12 bytes per slot would allow better

typo: 24 bytes on 64-bit arches.

(Instead of 128 bytes with the current implementation)

> packing. Only one CPU is using this area; no need to force strange
> alignments for the sake of avoiding a multiply!

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH net-next 9/9] mlx4: add page recycling in receive path
  2017-02-07 16:20   ` Tariq Toukan
@ 2017-02-07 16:34     ` Eric Dumazet
  2017-02-08 10:27       ` Tariq Toukan
  0 siblings, 1 reply; 26+ messages in thread
From: Eric Dumazet @ 2017-02-07 16:34 UTC (permalink / raw)
  To: Tariq Toukan
  Cc: David S . Miller, netdev, Tariq Toukan, Martin KaFai Lau,
	Willem de Bruijn, Jesper Dangaard Brouer, Brenden Blanco,
	Alexei Starovoitov, Eric Dumazet

On Tue, Feb 7, 2017 at 8:20 AM, Tariq Toukan <ttoukan.linux@gmail.com> wrote:
>
>
> On 07/02/2017 5:02 AM, Eric Dumazet wrote:
>>
>> Same technique as some Intel drivers, for arches where PAGE_SIZE = 4096
>>
>> In most cases, pages are reused because they were consumed
>> before we could loop around the RX ring.
>
> This might not be true when multiple streams are handled in the same ring,
> as the processing time in the stack will be longer.
> We will test this scenario as well.

Sure thing. I already did ;)

I had a local patch adding a new "ethtool -S ... | grep
rx_page_allocs" counter.

I could probably add it to the series. (I noticed that
rx_alloc_failed was a dead counter.)

Thanks.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH net-next 0/9] mlx4: order-0 allocations and page recycling
  2017-02-07 16:26     ` Eric Dumazet
  2017-02-07 16:28       ` Eric Dumazet
@ 2017-02-07 19:05       ` Alexei Starovoitov
  2017-02-07 19:18         ` Eric Dumazet
  1 sibling, 1 reply; 26+ messages in thread
From: Alexei Starovoitov @ 2017-02-07 19:05 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Tariq Toukan, Eric Dumazet, David S . Miller, netdev,
	Tariq Toukan, Martin KaFai Lau, Willem de Bruijn,
	Jesper Dangaard Brouer, Brenden Blanco, Alexei Starovoitov

On Tue, Feb 07, 2017 at 08:26:23AM -0800, Eric Dumazet wrote:
> On Tue, 2017-02-07 at 08:06 -0800, Eric Dumazet wrote:
>

Awesome that you've started working on this. I think it's the correct approach
and mlx5 should be cleaned up in a similar way.
Long term, we should be able to move all page alloc/free out of the drivers
completely.

> >  		/*
> >  		 * make sure we read the CQE after we read the ownership bit
> >  		 */
> >  		dma_rmb();
> > +		prefetch(frags[0].page);
> 
> Note that I would like to instead do a prefetch(frags[1].page)

yeah, these two look weird:
+               prefetch(frags[0].page);
+               va = page_address(frags[0].page) + frags[0].page_offset;

on most archs page_address() is just math (not a load from memory),
but the result != frags[0].page, so I'm missing what you are trying to prefetch.

prefetch(frags[1].page)
is even more confusing. What will it prefetch?

btw we had a patch that prefetched the 'va' of the next packet,
and it was very helpful. Like this:
   pref_index = (index + 1) & ring->size_mask;
   pref = ring->rx_info + (pref_index << priv->log_rx_info);
   prefetch(page_address(pref->page) + pref->page_offset);

but since you're redesigning ring->rx_info... not sure how it will fit.

> So I will probably change how ring->rx_info is allocated
> 
> wasting all that space and forcing vmalloc() is silly:
> 
> tmp = size * roundup_pow_of_two(MLX4_EN_MAX_RX_FRAGS *
>                                 sizeof(struct mlx4_en_rx_alloc));

I think you'd still need roundup_pow_of_two, otherwise the priv->log_rx_info
optimization won't work.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH net-next 0/9] mlx4: order-0 allocations and page recycling
  2017-02-07 19:05       ` Alexei Starovoitov
@ 2017-02-07 19:18         ` Eric Dumazet
  0 siblings, 0 replies; 26+ messages in thread
From: Eric Dumazet @ 2017-02-07 19:18 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: Eric Dumazet, Tariq Toukan, David S . Miller, netdev,
	Tariq Toukan, Martin KaFai Lau, Willem de Bruijn,
	Jesper Dangaard Brouer, Brenden Blanco, Alexei Starovoitov

On Tue, Feb 7, 2017 at 11:05 AM, Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
> On Tue, Feb 07, 2017 at 08:26:23AM -0800, Eric Dumazet wrote:
>> On Tue, 2017-02-07 at 08:06 -0800, Eric Dumazet wrote:
>>
>
> Awesome that you've started working on this. I think it's the correct approach
> and mlx5 should be cleaned up in a similar way.
> Long term, we should be able to move all page alloc/free out of the drivers
> completely.
>
>> >             /*
>> >              * make sure we read the CQE after we read the ownership bit
>> >              */
>> >             dma_rmb();
>> > +           prefetch(frags[0].page);
>>
>> Note that I would like to instead do a prefetch(frags[1].page)
>
> yeah, these two look weird:
> +               prefetch(frags[0].page);
> +               va = page_address(frags[0].page) + frags[0].page_offset;
>
> on most archs page_address() is just math (not a load from memory),
> but the result != frags[0].page, so I'm missing what you are trying to prefetch.
>
> prefetch(frags[1].page)
> is even more confusing. What will it prefetch?


The "struct page" of the following frame.

Remember we need:

                 release = page_count(page) != 1 ||
                                page_is_pfmemalloc(page) ||
                                page_to_nid(page) != numa_mem_id();

Then :

page_ref_inc(page);


My patch now does:

prefetch(frags[priv->num_frags].page);

(With the exact, unpadded slot layout, frags[priv->num_frags] is the first
frag of the next RX descriptor, i.e. the struct page of the following frame.)

>
> btw we had a patch that prefetched the 'va' of the next packet,
> and it was very helpful. Like this:

I preferred to prefetch the second cache line of this frame,
because TCP is mostly used with timestamps: 14 (Ethernet) + 20 (IPv4) +
32 (TCP with timestamps) = 66 bytes of headers, and more for IPv6 of course.


>    pref_index = (index + 1) & ring->size_mask;
>    pref = ring->rx_info + (pref_index << priv->log_rx_info);
>    prefetch(page_address(pref->page) + pref->page_offset);
>
> but since you're redesigning rxing->rx_info... not sure how will it fit.
>
>> So I will probably change how ring->rx_info is allocated
>>
>> wasting all that space and forcing vmalloc() is silly:
>>
>> tmp = size * roundup_pow_of_two(MLX4_EN_MAX_RX_FRAGS *
>>                                 sizeof(struct mlx4_en_rx_alloc));
>
> I think you'd still need roundup_pow_of_two, otherwise the priv->log_rx_info
> optimization won't work.

No more log_rx_info trick.

Simply: frags = priv->rx_info + (index * priv->rx_info_bytes_per_slot);

A multiply is damn fast these days compared to cache misses.

Using 24*<rx_ring_size> bytes is better than 32*<rx_ring_size>; our
L1/L2 caches are quite small.

Of course, this applies to the 'stress' mode, not the light mode where
we receive a single packet per IRQ.
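
For reference, the per-slot arithmetic behind the 24 vs 32 figures, with
the field layout assumed from this thread (sizes on 64-bit arches):

struct mlx4_en_rx_alloc {
	struct page	*page;		/*  8 bytes */
	dma_addr_t	dma;		/*  8 bytes */
	u32		page_offset;	/*  4 bytes + 4 bytes of padding */
};					/* 24 bytes total */

A power-of-two stride rounds each slot up to roundup_pow_of_two(24) = 32
bytes, which is what the log_rx_info shift needs; exact packing keeps the
working set at 24*<rx_ring_size> instead.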

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH net-next 0/9] mlx4: order-0 allocations and page recycling
  2017-02-07 15:50 ` [PATCH net-next 0/9] mlx4: order-0 allocations and page recycling Tariq Toukan
  2017-02-07 16:06   ` Eric Dumazet
@ 2017-02-08  9:02   ` Tariq Toukan
  2017-02-08 10:29     ` Tariq Toukan
  2017-02-08 15:52     ` Eric Dumazet
  1 sibling, 2 replies; 26+ messages in thread
From: Tariq Toukan @ 2017-02-08  9:02 UTC (permalink / raw)
  To: Eric Dumazet, David S . Miller
  Cc: netdev, Tariq Toukan, Martin KaFai Lau, Willem de Bruijn,
	Jesper Dangaard Brouer, Brenden Blanco, Alexei Starovoitov,
	Eric Dumazet



On 07/02/2017 5:50 PM, Tariq Toukan wrote:
> Hi Eric,
>
> Thanks for your series.
>
> On 07/02/2017 5:02 AM, Eric Dumazet wrote:
>> As mentioned half a year ago, we better switch mlx4 driver to order-0
>> allocations and page recycling.
>>
>> This reduces vulnerability surface thanks to better skb->truesize 
>> tracking
>> and provides better performance in most cases.
> The series makes a significant change in the RX data-path that requires 
> deeper checks, in addition to code review.
> We applied your series and started running both our functional and 
> performance regression tests.
> We will have results by tomorrow morning, and will analyze them during 
> the day. I'll keep you updated.
We hit a kernel panic when running traffic after configuring a large MTU 
(9000).
I will take a deeper look into this soon and will keep you updated.

[56136.982183] BUG: unable to handle kernel paging request at 000000022f9e7020
[56136.990426] IP: mlx4_en_complete_rx_desc+0x130/0x2e0 [mlx4_en]
[56136.995303] PGD 220b7c067
[56136.995304] PUD 0
[56136.997941]
[56137.001807] Oops: 0000 [#1] SMP
[56137.004540] Modules linked in: netconsole mlx4_ib mlx4_en(E) mlx4_core(E) nfsv3 nfs fscache rpcrdma ib_isert iscsi_target_mod ib_iser libiscsi scsi_transport_iscsi ib_srpt target_core_mod ib_srp scsi_transport_srp ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm dm_mirror dm_region_hash dm_log ib_cm dm_mod iw_cm ppdev parport_pc i2c_piix4 sg virtio_balloon parport pcspkr acpi_cpufreq nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables mlx5_ib sd_mod ata_generic pata_acpi ib_core mlx5_core cirrus drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm ata_piix floppy libata ptp e1000 crc32c_intel virtio_pci pps_core serio_raw virtio_ring i2c_core virtio [last unloaded: netconsole]
[56137.046028] CPU: 1 PID: 16 Comm: ksoftirqd/1 Tainted: G            E   4.10.0-rc6+ #26
[56137.051501] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
[56137.055817] task: ffff880236245200 task.stack: ffffc90000d04000
[56137.060154] RIP: 0010:mlx4_en_complete_rx_desc+0x130/0x2e0 [mlx4_en]
[56137.064712] RSP: 0018:ffffc90000d07c90 EFLAGS: 00010282
[56137.068646] RAX: 0000000000000003 RBX: 000000022f9e7000 RCX: ffff880234988880
[56137.073588] RDX: ffff8802349888e0 RSI: 0000000000000000 RDI: ffff880235dad0a0
[56137.078563] RBP: ffffc90000d07ce0 R08: 0000000000000000 R09: ffff8802225a08c0
[56137.083370] R10: ffff8802335c7800 R11: 0000000000000000 R12: ffffc90001da1048
[56137.088123] R13: 0000000000000b36 R14: ffff8802225af040 R15: 0000000000000b36
[56137.092837] FS:  0000000000000000(0000) GS:ffff88023fc40000(0000) knlGS:0000000000000000
[56137.098495] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[56137.102626] CR2: 000000022f9e7020 CR3: 000000023441f000 CR4: 00000000000006e0
[56137.107581] Call Trace:
[56137.109955]  mlx4_en_process_rx_cq+0x35c/0xda0 [mlx4_en]
[56137.113894]  ? mlx4_en_free_tx_desc+0x14e/0x350 [mlx4_en]
[56137.117992]  ? load_balance+0x1ac/0x900
[56137.121285]  mlx4_en_poll_rx_cq+0x30/0xa0 [mlx4_en]
[56137.125023]  net_rx_action+0x23d/0x3a0
[56137.128146]  __do_softirq+0xd1/0x2a2
[56137.131178]  run_ksoftirqd+0x29/0x50
[56137.134180]  smpboot_thread_fn+0x110/0x160
[56137.137530]  kthread+0x101/0x140
[56137.140330]  ? sort_range+0x30/0x30
[56137.143255]  ? kthread_park+0x90/0x90
[56137.146304]  ? __kthread_parkme+0x50/0x70
[56137.149466]  ret_from_fork+0x2c/0x40
[56137.152426] Code: c0 8b 45 cc 41 8b 8a cc 00 00 00 48 63 d0 49 03 8a d0 00 00 00 48 83 c2 03 48 c1 e2 04 48 01 ca 48 89 1a 44 89 5a 08 44 89 7a 0c <48> 8b 53 20 f6 c2 01 0f 85 76 01 00 00 48 89 da 48 83 7a 10 ff
[56137.164855] RIP: mlx4_en_complete_rx_desc+0x130/0x2e0 [mlx4_en] RSP: ffffc90000d07c90
[56137.170211] CR2: 000000022f9e7020
[56137.175430] ---[ end trace 6a259f16967a0cff ]---


>>
>> Worth noting this patch series deletes more than 100 lines of code ;)
>>
>> Eric Dumazet (9):
>>    mlx4: use __skb_fill_page_desc()
>>    mlx4: dma_dir is a mlx4_en_priv attribute
>>    mlx4: remove order field from mlx4_en_frag_info
>>    mlx4: get rid of frag_prefix_size
>>    mlx4: rx_headroom is a per port attribute
>>    mlx4: reduce rx ring page_cache size
>>    mlx4: removal of frag_sizes[]
>>    mlx4: use order-0 pages for RX
>>    mlx4: add page recycling in receive path
>>
>>   drivers/net/ethernet/mellanox/mlx4/en_rx.c   | 350 
>> +++++++++------------------
>>   drivers/net/ethernet/mellanox/mlx4/en_tx.c   |   4 +-
>>   drivers/net/ethernet/mellanox/mlx4/mlx4_en.h |  28 +--
>>   3 files changed, 129 insertions(+), 253 deletions(-)
>>
> Thanks,
> Tariq

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH net-next 9/9] mlx4: add page recycling in receive path
  2017-02-07 16:34     ` Eric Dumazet
@ 2017-02-08 10:27       ` Tariq Toukan
  0 siblings, 0 replies; 26+ messages in thread
From: Tariq Toukan @ 2017-02-08 10:27 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: David S . Miller, netdev, Tariq Toukan, Martin KaFai Lau,
	Willem de Bruijn, Jesper Dangaard Brouer, Brenden Blanco,
	Alexei Starovoitov, Eric Dumazet



On 07/02/2017 6:34 PM, Eric Dumazet wrote:
> On Tue, Feb 7, 2017 at 8:20 AM, Tariq Toukan <ttoukan.linux@gmail.com> wrote:
>>
>> On 07/02/2017 5:02 AM, Eric Dumazet wrote:
>>> Same technique as some Intel drivers, for arches where PAGE_SIZE = 4096
>>>
>>> In most cases, pages are reused because they were consumed
>>> before we could loop around the RX ring.
>> This might not be true when multiple streams are handled in the same ring,
>> as the processing time in the stack will be longer.
>> We will test this scenario as well.
> Sure thing. I already did ;)
>
> I had a local patch adding a new "ethtool -S ... | grep
> rx_page_allocs" counter.
>
> I could probably add it to the series.
I'd always like to have these kinds of counters. They make performance 
analysis much easier.
>    (I noticed that
> rx_alloc_failed was a dead counter.)
Yes indeed... I'll take care of it. Thanks!
>
> Thanks.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH net-next 0/9] mlx4: order-0 allocations and page recycling
  2017-02-08  9:02   ` Tariq Toukan
@ 2017-02-08 10:29     ` Tariq Toukan
  2017-02-08 15:52     ` Eric Dumazet
  1 sibling, 0 replies; 26+ messages in thread
From: Tariq Toukan @ 2017-02-08 10:29 UTC (permalink / raw)
  To: Eric Dumazet, David S . Miller
  Cc: netdev, Tariq Toukan, Martin KaFai Lau, Willem de Bruijn,
	Jesper Dangaard Brouer, Brenden Blanco, Alexei Starovoitov,
	Eric Dumazet


On 08/02/2017 11:02 AM, Tariq Toukan wrote:
>
>
> On 07/02/2017 5:50 PM, Tariq Toukan wrote:
>> Hi Eric,
>>
>> Thanks for your series.
>>
>> On 07/02/2017 5:02 AM, Eric Dumazet wrote:
>>> As mentioned half a year ago, we better switch mlx4 driver to order-0
>>> allocations and page recycling.
>>>
>>> This reduces vulnerability surface thanks to better skb->truesize 
>>> tracking
>>> and provides better performance in most cases.
>> The series makes a significant change in the RX data-path that 
>> requires deeper checks, in addition to code review.
>> We applied your series and started running both our functional and 
>> performance regression tests.
>> We will have results by tomorrow morning, and will analyze them 
>> during the day. I'll keep you updated.
> We hit a kernel panic when running traffic after configuring a large 
> MTU (9000).
> I will take a deeper look into this soon and will keep you updated.
This doesn't happen before applying patch 9/9:
mlx4: add page recycling in receive path
>
> [56136.982183] BUG: unable to handle kernel paging request at 000000022f9e7020
> [56136.990426] IP: mlx4_en_complete_rx_desc+0x130/0x2e0 [mlx4_en]
> [56136.995303] PGD 220b7c067
> [56136.995304] PUD 0
> [56136.997941]
> [56137.001807] Oops: 0000 [#1] SMP
> [56137.004540] Modules linked in: netconsole mlx4_ib mlx4_en(E) mlx4_core(E) nfsv3 nfs fscache rpcrdma ib_isert iscsi_target_mod ib_iser libiscsi scsi_transport_iscsi ib_srpt target_core_mod ib_srp scsi_transport_srp ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm dm_mirror dm_region_hash dm_log ib_cm dm_mod iw_cm ppdev parport_pc i2c_piix4 sg virtio_balloon parport pcspkr acpi_cpufreq nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables mlx5_ib sd_mod ata_generic pata_acpi ib_core mlx5_core cirrus drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm ata_piix floppy libata ptp e1000 crc32c_intel virtio_pci pps_core serio_raw virtio_ring i2c_core virtio [last unloaded: netconsole]
> [56137.046028] CPU: 1 PID: 16 Comm: ksoftirqd/1 Tainted: G            E   4.10.0-rc6+ #26
> [56137.051501] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
> [56137.055817] task: ffff880236245200 task.stack: ffffc90000d04000
> [56137.060154] RIP: 0010:mlx4_en_complete_rx_desc+0x130/0x2e0 [mlx4_en]
> [56137.064712] RSP: 0018:ffffc90000d07c90 EFLAGS: 00010282
> [56137.068646] RAX: 0000000000000003 RBX: 000000022f9e7000 RCX: ffff880234988880
> [56137.073588] RDX: ffff8802349888e0 RSI: 0000000000000000 RDI: ffff880235dad0a0
> [56137.078563] RBP: ffffc90000d07ce0 R08: 0000000000000000 R09: ffff8802225a08c0
> [56137.083370] R10: ffff8802335c7800 R11: 0000000000000000 R12: ffffc90001da1048
> [56137.088123] R13: 0000000000000b36 R14: ffff8802225af040 R15: 0000000000000b36
> [56137.092837] FS:  0000000000000000(0000) GS:ffff88023fc40000(0000) knlGS:0000000000000000
> [56137.098495] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [56137.102626] CR2: 000000022f9e7020 CR3: 000000023441f000 CR4: 00000000000006e0
> [56137.107581] Call Trace:
> [56137.109955]  mlx4_en_process_rx_cq+0x35c/0xda0 [mlx4_en]
> [56137.113894]  ? mlx4_en_free_tx_desc+0x14e/0x350 [mlx4_en]
> [56137.117992]  ? load_balance+0x1ac/0x900
> [56137.121285]  mlx4_en_poll_rx_cq+0x30/0xa0 [mlx4_en]
> [56137.125023]  net_rx_action+0x23d/0x3a0
> [56137.128146]  __do_softirq+0xd1/0x2a2
> [56137.131178]  run_ksoftirqd+0x29/0x50
> [56137.134180]  smpboot_thread_fn+0x110/0x160
> [56137.137530]  kthread+0x101/0x140
> [56137.140330]  ? sort_range+0x30/0x30
> [56137.143255]  ? kthread_park+0x90/0x90
> [56137.146304]  ? __kthread_parkme+0x50/0x70
> [56137.149466]  ret_from_fork+0x2c/0x40
> [56137.152426] Code: c0 8b 45 cc 41 8b 8a cc 00 00 00 48 63 d0 49 03 8a d0 00 00 00 48 83 c2 03 48 c1 e2 04 48 01 ca 48 89 1a 44 89 5a 08 44 89 7a 0c <48> 8b 53 20 f6 c2 01 0f 85 76 01 00 00 48 89 da 48 83 7a 10 ff
> [56137.164855] RIP: mlx4_en_complete_rx_desc+0x130/0x2e0 [mlx4_en] RSP: ffffc90000d07c90
> [56137.170211] CR2: 000000022f9e7020
> [56137.175430] ---[ end trace 6a259f16967a0cff ]---
>
>
>>>
>>> Worth noting this patch series deletes more than 100 lines of code ;)
>>>
>>> Eric Dumazet (9):
>>>    mlx4: use __skb_fill_page_desc()
>>>    mlx4: dma_dir is a mlx4_en_priv attribute
>>>    mlx4: remove order field from mlx4_en_frag_info
>>>    mlx4: get rid of frag_prefix_size
>>>    mlx4: rx_headroom is a per port attribute
>>>    mlx4: reduce rx ring page_cache size
>>>    mlx4: removal of frag_sizes[]
>>>    mlx4: use order-0 pages for RX
>>>    mlx4: add page recycling in receive path
>>>
>>>   drivers/net/ethernet/mellanox/mlx4/en_rx.c   | 350 
>>> +++++++++------------------
>>>   drivers/net/ethernet/mellanox/mlx4/en_tx.c   |   4 +-
>>>   drivers/net/ethernet/mellanox/mlx4/mlx4_en.h |  28 +--
>>>   3 files changed, 129 insertions(+), 253 deletions(-)
>>>
>> Thanks,
>> Tariq
>

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH net-next 0/9] mlx4: order-0 allocations and page recycling
  2017-02-08  9:02   ` Tariq Toukan
  2017-02-08 10:29     ` Tariq Toukan
@ 2017-02-08 15:52     ` Eric Dumazet
  2017-02-09 12:00       ` Tariq Toukan
  1 sibling, 1 reply; 26+ messages in thread
From: Eric Dumazet @ 2017-02-08 15:52 UTC (permalink / raw)
  To: Tariq Toukan
  Cc: Eric Dumazet, David S . Miller, netdev, Tariq Toukan,
	Martin KaFai Lau, Willem de Bruijn, Jesper Dangaard Brouer,
	Brenden Blanco, Alexei Starovoitov

On Wed, 2017-02-08 at 11:02 +0200, Tariq Toukan wrote:
> 
> On 07/02/2017 5:50 PM, Tariq Toukan wrote:
> > Hi Eric,
> >
> > Thanks for your series.
> >
> > On 07/02/2017 5:02 AM, Eric Dumazet wrote:
> >> As mentioned half a year ago, we better switch mlx4 driver to order-0
> >> allocations and page recycling.
> >>
> >> This reduces vulnerability surface thanks to better skb->truesize 
> >> tracking
> >> and provides better performance in most cases.
> > The series makes a significant change in the RX data-path that requires 
> > deeper checks, in addition to code review.
> > We applied your series and started running both our functional and 
> > performance regression tests.
> > We will have results by tomorrow morning, and will analyze them during 
> > the day. I'll keep you updated.
> We hit a kernel panic when running traffic after configuring a large MTU 
> (9000).
> I will take a deeper look into this soon and will keep you updated.

Hmm... I saw a typo for XDP, but not for the non-XDP path...

diff --git a/drivers/net/ethernet/mellanox/mlx4/en_rx.c b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
index 8f16ec8dfadd0f95646c498c14d53f7266a0..e572e175edfe0f7392b9833b5b3f867fd6db 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_rx.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
@@ -132,7 +132,7 @@ static int mlx4_en_prepare_rx_desc(const struct mlx4_en_priv *priv,
                                        (index << priv->log_rx_info);
 
        if (ring->page_cache.index > 0) {
-               if (frags[0].page) {
+               if (!frags[0].page) {
                        ring->page_cache.index--;
                        frags[0].page = ring->page_cache.buf[ring->page_cache.index].page;
                        frags[0].dma  = ring->page_cache.buf[ring->page_cache.index].dma;

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* Re: [PATCH net-next 0/9] mlx4: order-0 allocations and page recycling
  2017-02-08 15:52     ` Eric Dumazet
@ 2017-02-09 12:00       ` Tariq Toukan
  2017-02-09 13:31         ` Eric Dumazet
  0 siblings, 1 reply; 26+ messages in thread
From: Tariq Toukan @ 2017-02-09 12:00 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Eric Dumazet, David S . Miller, netdev, Tariq Toukan,
	Martin KaFai Lau, Willem de Bruijn, Jesper Dangaard Brouer,
	Brenden Blanco, Alexei Starovoitov



On 08/02/2017 5:52 PM, Eric Dumazet wrote:
> On Wed, 2017-02-08 at 11:02 +0200, Tariq Toukan wrote:
>> On 07/02/2017 5:50 PM, Tariq Toukan wrote:
>>> Hi Eric,
>>>
>>> Thanks for your series.
>>>
>>> On 07/02/2017 5:02 AM, Eric Dumazet wrote:
>>>> As mentioned half a year ago, we better switch mlx4 driver to order-0
>>>> allocations and page recycling.
>>>>
>>>> This reduces vulnerability surface thanks to better skb->truesize
>>>> tracking
>>>> and provides better performance in most cases.
>>> The series makes a significant change in the RX data-path that requires
>>> deeper checks, in addition to code review.
>>> We applied your series and started running both our functional and
>>> performance regression tests.
>>> We will have results by tomorrow morning, and will analyze them during
>>> the day. I'll keep you updated.
>> We hit a kernel panic when running traffic after configuring a large MTU
>> (9000).
>> I will take a deeper look into this soon and will keep you updated.
> Hmm... I saw a typo for XDP, but not for the non-XDP path...
>
> diff --git a/drivers/net/ethernet/mellanox/mlx4/en_rx.c b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
> index 8f16ec8dfadd0f95646c498c14d53f7266a0..e572e175edfe0f7392b9833b5b3f867fd6db 100644
> --- a/drivers/net/ethernet/mellanox/mlx4/en_rx.c
> +++ b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
> @@ -132,7 +132,7 @@ static int mlx4_en_prepare_rx_desc(const struct mlx4_en_priv *priv,
>                                          (index << priv->log_rx_info);
>   
>          if (ring->page_cache.index > 0) {
> -               if (frags[0].page) {
> +               if (!frags[0].page) {
>                          ring->page_cache.index--;
>                          frags[0].page = ring->page_cache.buf[ring->page_cache.index].page;
>                          frags[0].dma  = ring->page_cache.buf[ring->page_cache.index].dma;
>
>
Yes.

Here it is:
A large MTU enlarges priv->log_rx_info.

1115         priv->log_rx_info = ROUNDUP_LOG2(i * sizeof(struct mlx4_en_rx_alloc));

In the free_frag function, only frag->page is cleared (dma and offset 
are not!), leaving non-zero garbage that is later read as a page pointer.

   90 static void mlx4_en_free_frag(const struct mlx4_en_priv *priv,
   91                               struct mlx4_en_rx_alloc *frag)
   92 {
   93         if (frag->page) {
   94                 dma_unmap_page(priv->ddev, frag->dma,
   95                                PAGE_SIZE, priv->dma_dir);
   96                 put_page(frag->page);
   97         }

Later, on line 82, we are left with a garbage page pointer.

   74 static int mlx4_en_alloc_frags(const struct mlx4_en_priv *priv,
   75                                struct mlx4_en_rx_desc *rx_desc,
   76                                struct mlx4_en_rx_alloc *frags,
   77                                gfp_t gfp)
   78 {
   79         int i;
   80
   81         for (i = 0; i < priv->num_frags; i++, frags++) {
   82                 if (!frags->page && mlx4_alloc_page(priv, frags, gfp))
   83                         return -ENOMEM;
   84                 rx_desc->data[i].addr = cpu_to_be64(frags->dma +
   85 frags->page_offset);
   86         }
   87         return 0;
   88 }

It can be fixed with this:

diff --git a/drivers/net/ethernet/mellanox/mlx4/en_rx.c b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
index 6854a19087ed..d97ee69393f0 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_rx.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
@@ -94,8 +94,8 @@ static void mlx4_en_free_frag(const struct mlx4_en_priv *priv,
                 dma_unmap_page(priv->ddev, frag->dma,
                                PAGE_SIZE, priv->dma_dir);
                 put_page(frag->page);
-               frag->page = NULL;
         }
+       memset(frag, 0, sizeof(*frag));
  }

  static void mlx4_en_init_rx_desc(const struct mlx4_en_priv *priv,


Regards,
Tariq Toukan.

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* Re: [PATCH net-next 4/9] mlx4: get rid of frag_prefix_size
  2017-02-07  3:02 ` [PATCH net-next 4/9] mlx4: get rid of frag_prefix_size Eric Dumazet
@ 2017-02-09 12:28   ` Tariq Toukan
  2017-02-09 14:06     ` Eric Dumazet
  0 siblings, 1 reply; 26+ messages in thread
From: Tariq Toukan @ 2017-02-09 12:28 UTC (permalink / raw)
  To: Eric Dumazet, David S . Miller
  Cc: netdev, Tariq Toukan, Martin KaFai Lau, Willem de Bruijn,
	Jesper Dangaard Brouer, Brenden Blanco, Alexei Starovoitov,
	Eric Dumazet



On 07/02/2017 5:02 AM, Eric Dumazet wrote:
> Using per-frag storage for frag_prefix_size is really silly.
>
> mlx4_en_complete_rx_desc() has all needed info already.
>
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> ---
>   drivers/net/ethernet/mellanox/mlx4/en_rx.c   | 27 ++++++++++++---------------
>   drivers/net/ethernet/mellanox/mlx4/mlx4_en.h |  3 +--
>   2 files changed, 13 insertions(+), 17 deletions(-)
>
> diff --git a/drivers/net/ethernet/mellanox/mlx4/en_rx.c b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
> index f868cb330039f5730ab8f59eca451c3d5272..c6c64ac1e25931fc172beb5c718ec3a799f6 100644
> --- a/drivers/net/ethernet/mellanox/mlx4/en_rx.c
> +++ b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
> @@ -585,15 +585,14 @@ static int mlx4_en_complete_rx_desc(struct mlx4_en_priv *priv,
>   				    int length)
>   {
>   	struct skb_frag_struct *skb_frags_rx = skb_shinfo(skb)->frags;
> -	struct mlx4_en_frag_info *frag_info;
> -	int nr;
> +	struct mlx4_en_frag_info *frag_info = priv->frag_info;
> +	int nr, frag_size;
>   	dma_addr_t dma;
>   
>   	/* Collect used fragments while replacing them in the HW descriptors */
> -	for (nr = 0; nr < priv->num_frags; nr++) {
> -		frag_info = &priv->frag_info[nr];
> -		if (length <= frag_info->frag_prefix_size)
> -			break;
> +	for (nr = 0;;) {
> +		frag_size = min_t(int, length, frag_info->frag_size);
> +
>   		if (unlikely(!frags[nr].page))
>   			goto fail;
>   
> @@ -603,15 +602,16 @@ static int mlx4_en_complete_rx_desc(struct mlx4_en_priv *priv,
>   
>   		__skb_fill_page_desc(skb, nr, frags[nr].page,
>   				     frags[nr].page_offset,
> -				     frag_info->frag_size);
> +				     frag_size);
Same for dma_sync_single (two lines above).
>   
>   		skb->truesize += frag_info->frag_stride;
>   		frags[nr].page = NULL;
> +		nr++;
> +		length -= frag_size;
> +		if (!length)
> +			break;
> +		frag_info++;
>   	}
> -	/* Adjust size of last fragment to match actual length */
> -	if (nr > 0)
> -		skb_frag_size_set(&skb_frags_rx[nr - 1],
> -			length - priv->frag_info[nr - 1].frag_prefix_size);
>   	return nr;
>   
>   fail:
> @@ -1194,7 +1194,6 @@ void mlx4_en_calc_rx_buf(struct net_device *dev)
>   	if (priv->tx_ring_num[TX_XDP]) {
>   		priv->rx_page_order = 0;
>   		priv->frag_info[0].frag_size = eff_mtu;
> -		priv->frag_info[0].frag_prefix_size = 0;
>   		/* This will gain efficient xdp frame recycling at the
>   		 * expense of more costly truesize accounting
>   		 */
> @@ -1209,7 +1208,6 @@ void mlx4_en_calc_rx_buf(struct net_device *dev)
>   			priv->frag_info[i].frag_size =
>   				(eff_mtu > buf_size + frag_sizes[i]) ?
>   					frag_sizes[i] : eff_mtu - buf_size;
> -			priv->frag_info[i].frag_prefix_size = buf_size;
>   			priv->frag_info[i].frag_stride =
>   				ALIGN(priv->frag_info[i].frag_size,
>   				      SMP_CACHE_BYTES);
> @@ -1229,10 +1227,9 @@ void mlx4_en_calc_rx_buf(struct net_device *dev)
>   	       eff_mtu, priv->num_frags);
>   	for (i = 0; i < priv->num_frags; i++) {
>   		en_err(priv,
> -		       "  frag:%d - size:%d prefix:%d stride:%d\n",
> +		       "  frag:%d - size:%d stride:%d\n",
>   		       i,
>   		       priv->frag_info[i].frag_size,
> -		       priv->frag_info[i].frag_prefix_size,
>   		       priv->frag_info[i].frag_stride);
>   	}
>   }
> diff --git a/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h b/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
> index 11898550f87c077f6687903790d329e4aa1e..fe8ed4e85e9645679cc37d0d30284b523689 100644
> --- a/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
> +++ b/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
> @@ -472,9 +472,8 @@ struct mlx4_en_mc_list {
>   
>   struct mlx4_en_frag_info {
>   	u16 frag_size;
> -	u16 frag_prefix_size;
> -	u32 frag_stride;
>   	u16 rx_headroom;
> +	u32 frag_stride;
>   };
>   
>   #ifdef CONFIG_MLX4_EN_DCB

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH net-next 0/9] mlx4: order-0 allocations and page recycling
  2017-02-09 12:00       ` Tariq Toukan
@ 2017-02-09 13:31         ` Eric Dumazet
  0 siblings, 0 replies; 26+ messages in thread
From: Eric Dumazet @ 2017-02-09 13:31 UTC (permalink / raw)
  To: Tariq Toukan
  Cc: Eric Dumazet, David S . Miller, netdev, Tariq Toukan,
	Martin KaFai Lau, Willem de Bruijn, Jesper Dangaard Brouer,
	Brenden Blanco, Alexei Starovoitov

On Thu, Feb 9, 2017 at 4:00 AM, Tariq Toukan <ttoukan.linux@gmail.com> wrote:

> Yes.

>
> It can be fixed with this:
>
> diff --git a/drivers/net/ethernet/mellanox/mlx4/en_rx.c
> b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
> index 6854a19087ed..d97ee69393f0 100644
> --- a/drivers/net/ethernet/mellanox/mlx4/en_rx.c
> +++ b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
> @@ -94,8 +94,8 @@ static void mlx4_en_free_frag(const struct mlx4_en_priv
> *priv,
>                 dma_unmap_page(priv->ddev, frag->dma,
>                                PAGE_SIZE, priv->dma_dir);
>                 put_page(frag->page);
> -               frag->page = NULL;
>         }
> +       memset(frag, 0, sizeof(*frag));
>  }
>

Oh nice, and this is the slow path (mlx4_en_free_frag() won't be called
anymore in the fast path after my patches).

Thanks a lot, Tariq, for tracking this down.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH net-next 4/9] mlx4: get rid of frag_prefix_size
  2017-02-09 12:28   ` Tariq Toukan
@ 2017-02-09 14:06     ` Eric Dumazet
  0 siblings, 0 replies; 26+ messages in thread
From: Eric Dumazet @ 2017-02-09 14:06 UTC (permalink / raw)
  To: Tariq Toukan
  Cc: David S . Miller, netdev, Tariq Toukan, Martin KaFai Lau,
	Willem de Bruijn, Jesper Dangaard Brouer, Brenden Blanco,
	Alexei Starovoitov, Eric Dumazet

On Thu, Feb 9, 2017 at 4:28 AM, Tariq Toukan <ttoukan.linux@gmail.com> wrote:
>
>
> On 07/02/2017 5:02 AM, Eric Dumazet wrote:
>>

>> +       for (nr = 0;;) {
>> +               frag_size = min_t(int, length, frag_info->frag_size);
>> +
>>                 if (unlikely(!frags[nr].page))
>>                         goto fail;
>>   @@ -603,15 +602,16 @@ static int mlx4_en_complete_rx_desc(struct mlx4_en_priv *priv,
>>                 __skb_fill_page_desc(skb, nr, frags[nr].page,
>>                                      frags[nr].page_offset,
>> -                                    frag_info->frag_size);
>> +                                    frag_size);
>
> Same for dma_sync_single (two lines above).

Oh right, although I was not sure if this really matters on a part of a page.
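
Concretely, the follow-up would make the dma_sync_single_for_cpu() call
just above the __skb_fill_page_desc() use the clamped length as well,
i.e. (a sketch against the hunk quoted above):

		dma_sync_single_for_cpu(priv->ddev, dma, frag_size,
					DMA_FROM_DEVICE);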

^ permalink raw reply	[flat|nested] 26+ messages in thread

end of thread, other threads:[~2017-02-09 14:37 UTC | newest]

Thread overview: 26+ messages
2017-02-07  3:02 [PATCH net-next 0/9] mlx4: order-0 allocations and page recycling Eric Dumazet
2017-02-07  3:02 ` [PATCH net-next 1/9] mlx4: use __skb_fill_page_desc() Eric Dumazet
2017-02-07  3:02 ` [PATCH net-next 2/9] mlx4: dma_dir is a mlx4_en_priv attribute Eric Dumazet
2017-02-07  3:02 ` [PATCH net-next 3/9] mlx4: remove order field from mlx4_en_frag_info Eric Dumazet
2017-02-07  3:02 ` [PATCH net-next 4/9] mlx4: get rid of frag_prefix_size Eric Dumazet
2017-02-09 12:28   ` Tariq Toukan
2017-02-09 14:06     ` Eric Dumazet
2017-02-07  3:02 ` [PATCH net-next 5/9] mlx4: rx_headroom is a per port attribute Eric Dumazet
2017-02-07  3:02 ` [PATCH net-next 6/9] mlx4: reduce rx ring page_cache size Eric Dumazet
2017-02-07  3:02 ` [PATCH net-next 7/9] mlx4: removal of frag_sizes[] Eric Dumazet
2017-02-07  3:02 ` [PATCH net-next 8/9] mlx4: use order-0 pages for RX Eric Dumazet
2017-02-07  3:02 ` [PATCH net-next 9/9] mlx4: add page recycling in receive path Eric Dumazet
2017-02-07 16:20   ` Tariq Toukan
2017-02-07 16:34     ` Eric Dumazet
2017-02-08 10:27       ` Tariq Toukan
2017-02-07 15:50 ` [PATCH net-next 0/9] mlx4: order-0 allocations and page recycling Tariq Toukan
2017-02-07 16:06   ` Eric Dumazet
2017-02-07 16:26     ` Eric Dumazet
2017-02-07 16:28       ` Eric Dumazet
2017-02-07 19:05       ` Alexei Starovoitov
2017-02-07 19:18         ` Eric Dumazet
2017-02-08  9:02   ` Tariq Toukan
2017-02-08 10:29     ` Tariq Toukan
2017-02-08 15:52     ` Eric Dumazet
2017-02-09 12:00       ` Tariq Toukan
2017-02-09 13:31         ` Eric Dumazet
