* [net-next PATCH RFC 0/8] page_pool DMA handling and allow to recycles frames via SKB
@ 2018-12-06 23:25 Jesper Dangaard Brouer
  2018-12-06 23:25 ` [net-next PATCH RFC 1/8] page_pool: add helper functions for DMA Jesper Dangaard Brouer
                   ` (7 more replies)
  0 siblings, 8 replies; 30+ messages in thread
From: Jesper Dangaard Brouer @ 2018-12-06 23:25 UTC (permalink / raw)
  To: netdev, David S. Miller, Jesper Dangaard Brouer
  Cc: Toke Høiland-Jørgensen, ard.biesheuvel, Jason Wang,
	ilias.apalodimas, Björn Töpel, w, Saeed Mahameed,
	mykyta.iziumtsev, Daniel Borkmann, Alexei Starovoitov,
	Tariq Toukan

This RFC patchset shows the plans for allowing page_pool to handle and
maintain DMA map/unmap of the pages it serves to the driver.  For this
to work, a return hook in the network core is introduced, which also
involves extending sk_buff with the necessary information.

The overall purpose is to simplify drivers by providing a page
allocation API that does recycling, so that each driver doesn't have
to reinvent its own recycling scheme.  Using page_pool in a driver
does not require implementing XDP support, but it makes it trivially
easy to do so.  The recycling code leverages the XDP recycle APIs.
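
To illustrate, here is a minimal sketch of the intended driver-side
setup (a hypothetical helper; the parameter values mirror what the
mvneta patches below use):

#include <net/page_pool.h>

/* Hypothetical RX setup: with PP_FLAG_DMA_MAP the pool DMA-maps pages
 * on behalf of the driver, so the driver only reads the address back.
 */
static struct page_pool *rx_page_pool_setup(struct device *dev, int size)
{
	struct page_pool_params pp_params = {
		.order		= 0,
		.flags		= PP_FLAG_DMA_MAP,
		.pool_size	= size,
		.nid		= NUMA_NO_NODE,
		.dev		= dev,
		.dma_dir	= DMA_FROM_DEVICE,
	};

	return page_pool_create(&pp_params);	/* ERR_PTR() on failure */
}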

The Marvell mvneta driver is used in this patchset to demonstrate how
to use the API, and was tested on the EspressoBIN board.  We also have
patches enabling XDP for this driver, but they are not part of this
patchset, as we first want review of the general idea of the page_pool
return SKB hook.

A driver using page_pool and XDP-redirecting into CPUMAP or veth will
also take advantage of the new SKB return hook; currently mlx5 is the
only such driver.


The longer-term plan involves allowing other types of allocators to
use this return hook, particularly allowing zero-copy AF_XDP frames to
travel further into the netstack, if the userspace pages have been
restricted to read-only.

---

Ilias Apalodimas (4):
      page_pool: add helper functions for DMA
      net: mvneta: use page pool API for sw buffer manager
      net: core: add recycle capabilities on skbs via page_pool API
      net: mvneta: remove copybreak, prefetch and use build_skb

Jesper Dangaard Brouer (4):
      xdp: reduce size of struct xdp_mem_info
      mvneta: activate page recycling via skb using page_pool
      xdp: bpf: cpumap redirect must update skb->mem_info
      veth: xdp_frames redirected into veth need to transfer xdp_mem_info


 drivers/net/ethernet/marvell/Kconfig  |    1 
 drivers/net/ethernet/marvell/mvneta.c |  158 +++++++++++++++++----------------
 drivers/net/veth.c                    |    1 
 include/linux/skbuff.h                |    6 +
 include/net/page_pool.h               |    6 +
 include/net/xdp.h                     |    5 +
 kernel/bpf/cpumap.c                   |    2 
 net/core/page_pool.c                  |    7 +
 net/core/skbuff.c                     |    7 +
 net/core/xdp.c                        |   14 ++-
 10 files changed, 125 insertions(+), 82 deletions(-)


* [net-next PATCH RFC 1/8] page_pool: add helper functions for DMA
  2018-12-06 23:25 [net-next PATCH RFC 0/8] page_pool DMA handling and allow to recycles frames via SKB Jesper Dangaard Brouer
@ 2018-12-06 23:25 ` Jesper Dangaard Brouer
  2018-12-08  7:06   ` David Miller
  2018-12-06 23:25 ` [net-next PATCH RFC 2/8] net: mvneta: use page pool API for sw buffer manager Jesper Dangaard Brouer
                   ` (6 subsequent siblings)
  7 siblings, 1 reply; 30+ messages in thread
From: Jesper Dangaard Brouer @ 2018-12-06 23:25 UTC (permalink / raw)
  To: netdev, David S. Miller, Jesper Dangaard Brouer
  Cc: Toke Høiland-Jørgensen, ard.biesheuvel, Jason Wang,
	ilias.apalodimas, Björn Töpel, w, Saeed Mahameed,
	mykyta.iziumtsev, Daniel Borkmann, Alexei Starovoitov,
	Tariq Toukan

From: Ilias Apalodimas <ilias.apalodimas@linaro.org>

Add helper functions for retrieving the dma_addr_t stored in
page_private, and for unmapping DMA addresses mapped via the page_pool
API.

Signed-off-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
---
 include/net/page_pool.h |    6 ++++++
 net/core/page_pool.c    |    7 +++++++
 2 files changed, 13 insertions(+)

diff --git a/include/net/page_pool.h b/include/net/page_pool.h
index 694d055e01ef..439f9183d4cd 100644
--- a/include/net/page_pool.h
+++ b/include/net/page_pool.h
@@ -111,6 +111,8 @@ struct page_pool *page_pool_create(const struct page_pool_params *params);
 
 void page_pool_destroy(struct page_pool *pool);
 
+void page_pool_unmap_page(struct page_pool *pool, struct page *page);
+
 /* Never call this directly, use helpers below */
 void __page_pool_put_page(struct page_pool *pool,
 			  struct page *page, bool allow_direct);
@@ -141,4 +143,8 @@ static inline bool is_page_pool_compiled_in(void)
 #endif
 }
 
+static inline dma_addr_t page_pool_get_dma_addr(struct page *page)
+{
+	return page_private(page);
+}
 #endif /* _NET_PAGE_POOL_H */
diff --git a/net/core/page_pool.c b/net/core/page_pool.c
index 43a932cb609b..26e14a17a67c 100644
--- a/net/core/page_pool.c
+++ b/net/core/page_pool.c
@@ -184,6 +184,13 @@ static void __page_pool_clean_page(struct page_pool *pool,
 	set_page_private(page, 0);
 }
 
+/* unmap the page and clean our state */
+void page_pool_unmap_page(struct page_pool *pool, struct page *page)
+{
+	__page_pool_clean_page(pool, page);
+}
+EXPORT_SYMBOL(page_pool_unmap_page);
+
 /* Return a page to the page allocator, cleaning up our state */
 static void __page_pool_return_page(struct page_pool *pool, struct page *page)
 {
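
For reference, a short usage sketch of the two helpers (driver-side
pseudocode; rxq and rx_desc are hypothetical names, following the
mvneta patches later in this series):

	/* RX refill: pool created with PP_FLAG_DMA_MAP, page arrives mapped */
	page = page_pool_dev_alloc_pages(rxq->page_pool);
	if (!page)
		return -ENOMEM;
	rx_desc->buf_phys_addr = page_pool_get_dma_addr(page);

	/* Before a page escapes the pool (e.g. into the netstack, which
	 * has no return hook yet): unmap and clear the page_pool state,
	 * after which a normal put_page()/__free_page() is safe.
	 */
	page_pool_unmap_page(rxq->page_pool, page);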


* [net-next PATCH RFC 2/8] net: mvneta: use page pool API for sw buffer manager
  2018-12-06 23:25 [net-next PATCH RFC 0/8] page_pool DMA handling and allow to recycles frames via SKB Jesper Dangaard Brouer
  2018-12-06 23:25 ` [net-next PATCH RFC 1/8] page_pool: add helper functions for DMA Jesper Dangaard Brouer
@ 2018-12-06 23:25 ` Jesper Dangaard Brouer
  2018-12-06 23:25 ` [net-next PATCH RFC 3/8] xdp: reduce size of struct xdp_mem_info Jesper Dangaard Brouer
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 30+ messages in thread
From: Jesper Dangaard Brouer @ 2018-12-06 23:25 UTC (permalink / raw)
  To: netdev, David S. Miller, Jesper Dangaard Brouer
  Cc: Toke Høiland-Jørgensen, ard.biesheuvel, Jason Wang,
	ilias.apalodimas, Björn Töpel, w, Saeed Mahameed,
	mykyta.iziumtsev, Daniel Borkmann, Alexei Starovoitov,
	Tariq Toukan

From: Ilias Apalodimas <ilias.apalodimas@linaro.org>

Use the page_pool API for allocations and DMA handling instead of
__dev_alloc_page()/dma_map_page() and free_page()/dma_unmap_page().

The page_pool API offers buffer recycling capabilities for XDP, but
allocates one page per packet, unless the driver splits and manages
the allocated page.

Although XDP is not part of the driver yet, the current implementation
allocates one page per packet anyway, so there is no performance
penalty from using the API.

For now, pages are unmapped via page_pool_unmap_page() before packets
travel into the network stack, as the stack doesn't have a return hook
yet.  Since this call clears the page_pool state, it is safe to let
the page be returned to the normal page allocator.

Signed-off-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
---
 drivers/net/ethernet/marvell/Kconfig  |    1 +
 drivers/net/ethernet/marvell/mvneta.c |   56 ++++++++++++++++++++++++---------
 2 files changed, 41 insertions(+), 16 deletions(-)

diff --git a/drivers/net/ethernet/marvell/Kconfig b/drivers/net/ethernet/marvell/Kconfig
index 3238aa7f5dac..3325abe67465 100644
--- a/drivers/net/ethernet/marvell/Kconfig
+++ b/drivers/net/ethernet/marvell/Kconfig
@@ -60,6 +60,7 @@ config MVNETA
 	depends on ARCH_MVEBU || COMPILE_TEST
 	select MVMDIO
 	select PHYLINK
+	select PAGE_POOL
 	---help---
 	  This driver supports the network interface units in the
 	  Marvell ARMADA XP, ARMADA 370, ARMADA 38x and
diff --git a/drivers/net/ethernet/marvell/mvneta.c b/drivers/net/ethernet/marvell/mvneta.c
index 5bfd349bf41a..2354421fe96f 100644
--- a/drivers/net/ethernet/marvell/mvneta.c
+++ b/drivers/net/ethernet/marvell/mvneta.c
@@ -33,6 +33,7 @@
 #include <linux/skbuff.h>
 #include <net/hwbm.h>
 #include "mvneta_bm.h"
+#include <net/page_pool.h>
 #include <net/ip.h>
 #include <net/ipv6.h>
 #include <net/tso.h>
@@ -624,6 +625,9 @@ struct mvneta_rx_queue {
 	struct sk_buff *skb;
 	int left_size;
 
+	/* page pool */
+	struct page_pool *page_pool;
+
 	/* error counters */
 	u32 skb_alloc_err;
 	u32 refill_err;
@@ -1813,17 +1817,11 @@ static int mvneta_rx_refill(struct mvneta_port *pp,
 	dma_addr_t phys_addr;
 	struct page *page;
 
-	page = __dev_alloc_page(gfp_mask);
+	page = page_pool_dev_alloc_pages(rxq->page_pool);
 	if (!page)
 		return -ENOMEM;
 
-	/* map page for use */
-	phys_addr = dma_map_page(pp->dev->dev.parent, page, 0, PAGE_SIZE,
-				 DMA_FROM_DEVICE);
-	if (unlikely(dma_mapping_error(pp->dev->dev.parent, phys_addr))) {
-		__free_page(page);
-		return -ENOMEM;
-	}
+	phys_addr = page_pool_get_dma_addr(page);
 
 	phys_addr += pp->rx_offset_correction;
 	mvneta_rx_desc_fill(rx_desc, phys_addr, page, rxq);
@@ -1892,10 +1890,11 @@ static void mvneta_rxq_drop_pkts(struct mvneta_port *pp,
 		if (!data || !(rx_desc->buf_phys_addr))
 			continue;
 
-		dma_unmap_page(pp->dev->dev.parent, rx_desc->buf_phys_addr,
-			       PAGE_SIZE, DMA_FROM_DEVICE);
-		__free_page(data);
+		page_pool_put_page(rxq->page_pool, data, false);
 	}
+
+	if (rxq->page_pool)
+		page_pool_destroy(rxq->page_pool);
 }
 
 static inline
@@ -2010,8 +2009,7 @@ static int mvneta_rx_swbm(struct napi_struct *napi,
 				skb_add_rx_frag(rxq->skb, frag_num, page,
 						frag_offset, frag_size,
 						PAGE_SIZE);
-				dma_unmap_page(dev->dev.parent, phys_addr,
-					       PAGE_SIZE, DMA_FROM_DEVICE);
+				page_pool_unmap_page(rxq->page_pool, page);
 				rxq->left_size -= frag_size;
 			}
 		} else {
@@ -2041,8 +2039,7 @@ static int mvneta_rx_swbm(struct napi_struct *napi,
 						frag_offset, frag_size,
 						PAGE_SIZE);
 
-				dma_unmap_page(dev->dev.parent, phys_addr,
-					       PAGE_SIZE, DMA_FROM_DEVICE);
+				page_pool_unmap_page(rxq->page_pool, page);
 
 				rxq->left_size -= frag_size;
 			}
@@ -2828,11 +2825,37 @@ static int mvneta_poll(struct napi_struct *napi, int budget)
 	return rx_done;
 }
 
+static int mvneta_create_page_pool(struct mvneta_port *pp,
+				   struct mvneta_rx_queue *rxq, int num)
+{
+	struct page_pool_params pp_params = { 0 };
+	int err = 0;
+
+	pp_params.order = 0;
+	/* internal DMA mapping in page_pool */
+	pp_params.flags = PP_FLAG_DMA_MAP;
+	pp_params.pool_size = num;
+	pp_params.nid = NUMA_NO_NODE;
+	pp_params.dev = pp->dev->dev.parent;
+	pp_params.dma_dir = DMA_FROM_DEVICE;
+
+	rxq->page_pool = page_pool_create(&pp_params);
+	if (IS_ERR(rxq->page_pool)) {
+		err = PTR_ERR(rxq->page_pool);
+		rxq->page_pool = NULL;
+	}
+
+	return err;
+}
+
 /* Handle rxq fill: allocates rxq skbs; called when initializing a port */
 static int mvneta_rxq_fill(struct mvneta_port *pp, struct mvneta_rx_queue *rxq,
 			   int num)
 {
-	int i;
+	int i = 0;
+
+	if (mvneta_create_page_pool(pp, rxq, num))
+		goto out;
 
 	for (i = 0; i < num; i++) {
 		memset(rxq->descs + i, 0, sizeof(struct mvneta_rx_desc));
@@ -2848,6 +2871,7 @@ static int mvneta_rxq_fill(struct mvneta_port *pp, struct mvneta_rx_queue *rxq,
 	/* Add this number of RX descriptors as non occupied (ready to
 	 * get packets)
 	 */
+out:
 	mvneta_rxq_non_occup_desc_add(pp, rxq, i);
 
 	return i;
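
The teardown ordering the patch establishes, condensed from the
mvneta_rxq_drop_pkts() hunk above (loop bound illustrative): return
every outstanding page to the pool, then destroy the pool:

	for (i = 0; i < rxq->size; i++) {
		struct page *page = (struct page *)rxq->buf_virt_addr[i];

		if (page)
			page_pool_put_page(rxq->page_pool, page, false);
	}

	if (rxq->page_pool)
		page_pool_destroy(rxq->page_pool);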


* [net-next PATCH RFC 3/8] xdp: reduce size of struct xdp_mem_info
  2018-12-06 23:25 [net-next PATCH RFC 0/8] page_pool DMA handling and allow to recycles frames via SKB Jesper Dangaard Brouer
  2018-12-06 23:25 ` [net-next PATCH RFC 1/8] page_pool: add helper functions for DMA Jesper Dangaard Brouer
  2018-12-06 23:25 ` [net-next PATCH RFC 2/8] net: mvneta: use page pool API for sw buffer manager Jesper Dangaard Brouer
@ 2018-12-06 23:25 ` Jesper Dangaard Brouer
  2018-12-06 23:25 ` [net-next PATCH RFC 4/8] net: core: add recycle capabilities on skbs via page_pool API Jesper Dangaard Brouer
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 30+ messages in thread
From: Jesper Dangaard Brouer @ 2018-12-06 23:25 UTC (permalink / raw)
  To: netdev, David S. Miller, Jesper Dangaard Brouer
  Cc: Toke Høiland-Jørgensen, ard.biesheuvel, Jason Wang,
	ilias.apalodimas, Björn Töpel, w, Saeed Mahameed,
	mykyta.iziumtsev, Daniel Borkmann, Alexei Starovoitov,
	Tariq Toukan

It is possible to compress/reduce the size of struct xdp_mem_info.
This change reduces struct xdp_mem_info from 8 bytes to 4 bytes.

The member xdp_mem_info.id can be reduced to u16, as the mem_id_ht
rhashtable in net/core/xdp.c is already limited by MEM_ID_MAX=0xFFFE,
which safely fits in a u16.

The member xdp_mem_info.type could be reduced to less than u16, as it
stores the enum xdp_mem_type, but due to alignment it is only reduced
to u16.

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
---
 include/net/xdp.h |    4 ++--
 net/core/xdp.c    |    8 ++++----
 2 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/include/net/xdp.h b/include/net/xdp.h
index 0f25b3675c5c..5c33b9e0efab 100644
--- a/include/net/xdp.h
+++ b/include/net/xdp.h
@@ -46,8 +46,8 @@ enum xdp_mem_type {
 #define XDP_XMIT_FLAGS_MASK	XDP_XMIT_FLUSH
 
 struct xdp_mem_info {
-	u32 type; /* enum xdp_mem_type, but known size type */
-	u32 id;
+	u16 type; /* enum xdp_mem_type, but known size type */
+	u16 id;
 };
 
 struct page_pool;
diff --git a/net/core/xdp.c b/net/core/xdp.c
index 4b2b194f4f1f..e79526314864 100644
--- a/net/core/xdp.c
+++ b/net/core/xdp.c
@@ -42,11 +42,11 @@ struct xdp_mem_allocator {
 
 static u32 xdp_mem_id_hashfn(const void *data, u32 len, u32 seed)
 {
-	const u32 *k = data;
-	const u32 key = *k;
+	const u16 *k = data;
+	const u16 key = *k;
 
 	BUILD_BUG_ON(FIELD_SIZEOF(struct xdp_mem_allocator, mem.id)
-		     != sizeof(u32));
+		     != sizeof(u16));
 
 	/* Use cyclic increasing ID as direct hash key */
 	return key;
@@ -56,7 +56,7 @@ static int xdp_mem_id_cmp(struct rhashtable_compare_arg *arg,
 			  const void *ptr)
 {
 	const struct xdp_mem_allocator *xa = ptr;
-	u32 mem_id = *(u32 *)arg->key;
+	u16 mem_id = *(u16 *)arg->key;
 
 	return xa->mem.id != mem_id;
 }
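
A compile-time sanity check capturing the intent of the change (a
sketch, not part of the patch; MEM_ID_MAX is private to net/core/xdp.c,
so the check would live there):

static void __maybe_unused xdp_mem_info_size_check(void)
{
	BUILD_BUG_ON(sizeof(struct xdp_mem_info) != 4);	/* was 8 bytes */
	BUILD_BUG_ON(MEM_ID_MAX > U16_MAX);	/* id still fits in u16 */
}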


* [net-next PATCH RFC 4/8] net: core: add recycle capabilities on skbs via page_pool API
  2018-12-06 23:25 [net-next PATCH RFC 0/8] page_pool DMA handling and allow to recycles frames via SKB Jesper Dangaard Brouer
                   ` (2 preceding siblings ...)
  2018-12-06 23:25 ` [net-next PATCH RFC 3/8] xdp: reduce size of struct xdp_mem_info Jesper Dangaard Brouer
@ 2018-12-06 23:25 ` Jesper Dangaard Brouer
  2018-12-08  7:15   ` David Miller
  2018-12-08  9:57   ` [net-next, RFC, " Florian Westphal
  2018-12-06 23:25 ` [net-next PATCH RFC 5/8] net: mvneta: remove copybreak, prefetch and use build_skb Jesper Dangaard Brouer
                   ` (3 subsequent siblings)
  7 siblings, 2 replies; 30+ messages in thread
From: Jesper Dangaard Brouer @ 2018-12-06 23:25 UTC (permalink / raw)
  To: netdev, David S. Miller, Jesper Dangaard Brouer
  Cc: Toke Høiland-Jørgensen, ard.biesheuvel, Jason Wang,
	ilias.apalodimas, Björn Töpel, w, Saeed Mahameed,
	mykyta.iziumtsev, Daniel Borkmann, Alexei Starovoitov,
	Tariq Toukan

From: Ilias Apalodimas <ilias.apalodimas@linaro.org>

This patch is changing struct sk_buff, and is thus by definition
controversial.

Place a new member 'mem_info' of type struct xdp_mem_info just after
the members (flags) head_frag and pfmemalloc, and not in between
headers_start/end, to ensure skb_copy() and pskb_copy() work as-is.
Copying mem_info during skb_clone() is required.  This makes sure that
pages are correctly freed or recycled during the altered
skb_free_head() invocation.

The 'mem_info' name is chosen as this is not strictly tied to XDP,
although the XDP return infrastructure is used.  As a future plan, we
could introduce a __u8 flags member in xdp_mem_info and move the bits
head_frag and pfmemalloc into this area.

Signed-off-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
---
 include/linux/skbuff.h |    6 +++++-
 include/net/xdp.h      |    1 +
 net/core/skbuff.c      |    7 +++++++
 net/core/xdp.c         |    6 ++++++
 4 files changed, 19 insertions(+), 1 deletion(-)

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 7dcfb5591dc3..95dac0ba6947 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -40,6 +40,7 @@
 #include <linux/in6.h>
 #include <linux/if_packet.h>
 #include <net/flow.h>
+#include <net/xdp.h>
 
 /* The interface for checksum offload between the stack and networking drivers
  * is as follows...
@@ -744,6 +745,10 @@ struct sk_buff {
 				head_frag:1,
 				xmit_more:1,
 				pfmemalloc:1;
+	/* TODO: Future idea, extend mem_info with __u8 flags, and
+	 * move bits head_frag and pfmemalloc there.
+	 */
+	struct xdp_mem_info	mem_info;
 
 	/* fields enclosed in headers_start/headers_end are copied
 	 * using a single memcpy() in __copy_skb_header()
@@ -827,7 +832,6 @@ struct sk_buff {
 #ifdef CONFIG_NETWORK_SECMARK
 	__u32		secmark;
 #endif
-
 	union {
 		__u32		mark;
 		__u32		reserved_tailroom;
diff --git a/include/net/xdp.h b/include/net/xdp.h
index 5c33b9e0efab..4a0ca7a3d5e5 100644
--- a/include/net/xdp.h
+++ b/include/net/xdp.h
@@ -128,6 +128,7 @@ struct xdp_frame *convert_to_xdp_frame(struct xdp_buff *xdp)
 void xdp_return_frame(struct xdp_frame *xdpf);
 void xdp_return_frame_rx_napi(struct xdp_frame *xdpf);
 void xdp_return_buff(struct xdp_buff *xdp);
+void xdp_return_skb_page(void *data, struct xdp_mem_info *mem_info);
 
 int xdp_rxq_info_reg(struct xdp_rxq_info *xdp_rxq,
 		     struct net_device *dev, u32 queue_index);
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index b4ee5c8b928f..71aca186e44c 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -70,6 +70,7 @@
 #include <net/checksum.h>
 #include <net/ip6_checksum.h>
 #include <net/xfrm.h>
+#include <net/page_pool.h>
 
 #include <linux/uaccess.h>
 #include <trace/events/skb.h>
@@ -544,6 +545,11 @@ static void skb_free_head(struct sk_buff *skb)
 {
 	unsigned char *head = skb->head;
 
+	if (skb->mem_info.type == MEM_TYPE_PAGE_POOL) {
+		xdp_return_skb_page(head, &skb->mem_info);
+		return;
+	}
+
 	if (skb->head_frag)
 		skb_free_frag(head);
 	else
@@ -859,6 +865,7 @@ static struct sk_buff *__skb_clone(struct sk_buff *n, struct sk_buff *skb)
 	n->nohdr = 0;
 	n->peeked = 0;
 	C(pfmemalloc);
+	C(mem_info);
 	n->destructor = NULL;
 	C(tail);
 	C(end);
diff --git a/net/core/xdp.c b/net/core/xdp.c
index e79526314864..1703be4c2611 100644
--- a/net/core/xdp.c
+++ b/net/core/xdp.c
@@ -379,6 +379,12 @@ void xdp_return_buff(struct xdp_buff *xdp)
 }
 EXPORT_SYMBOL_GPL(xdp_return_buff);
 
+void xdp_return_skb_page(void *data, struct xdp_mem_info *mem_info)
+{
+	__xdp_return(data, mem_info, false, 0);
+}
+EXPORT_SYMBOL(xdp_return_skb_page);
+
 int xdp_attachment_query(struct xdp_attachment_info *info,
 			 struct netdev_bpf *bpf)
 {
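
For context, a hedged sketch of how a driver opts an SKB into this
free path (this is what the later mvneta patch in this series does):

	/* RX: head page came from a page_pool registered via
	 * xdp_rxq_info_reg_mem_model(..., MEM_TYPE_PAGE_POOL, pool)
	 */
	skb = build_skb(data, PAGE_SIZE);
	skb->mem_info = rxq->xdp_rxq.mem;  /* {MEM_TYPE_PAGE_POOL, id} */

	/* Later, kfree_skb() -> skb_free_head() sees MEM_TYPE_PAGE_POOL
	 * and calls xdp_return_skb_page(), recycling the page into the
	 * originating pool instead of freeing it to the page allocator.
	 */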


* [net-next PATCH RFC 5/8] net: mvneta: remove copybreak, prefetch and use build_skb
  2018-12-06 23:25 [net-next PATCH RFC 0/8] page_pool DMA handling and allow to recycles frames via SKB Jesper Dangaard Brouer
                   ` (3 preceding siblings ...)
  2018-12-06 23:25 ` [net-next PATCH RFC 4/8] net: core: add recycle capabilities on skbs via page_pool API Jesper Dangaard Brouer
@ 2018-12-06 23:25 ` Jesper Dangaard Brouer
  2018-12-06 23:25 ` [net-next PATCH RFC 6/8] mvneta: activate page recycling via skb using page_pool Jesper Dangaard Brouer
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 30+ messages in thread
From: Jesper Dangaard Brouer @ 2018-12-06 23:25 UTC (permalink / raw)
  To: netdev, David S. Miller, Jesper Dangaard Brouer
  Cc: Toke Høiland-Jørgensen, ard.biesheuvel, Jason Wang,
	ilias.apalodimas, Björn Töpel, w, Saeed Mahameed,
	mykyta.iziumtsev, Daniel Borkmann, Alexei Starovoitov,
	Tariq Toukan

From: Ilias Apalodimas <ilias.apalodimas@linaro.org>

The driver's memcpy for packets < 256B, and its recycling tricks, are
no longer needed, as the previous patch introduced buffer recycling
using the page_pool API (although recycling doesn't get fully
activated in this patch).  After this, switch to using build_skb().

This patch implicitly fixes a driver bug where the memory was copied
before being DMA-synced for the CPU, in the < 256B case (as this case
is removed).

We also remove the data prefetch completely.  The original driver had
the prefetch misplaced before any dma_sync operations took place.
Based on Jesper's analysis, even if the prefetch is placed after the
DMA sync ops, it ends up hurting performance.

Signed-off-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
---
 drivers/net/ethernet/marvell/mvneta.c |   81 +++++++++------------------------
 1 file changed, 22 insertions(+), 59 deletions(-)

diff --git a/drivers/net/ethernet/marvell/mvneta.c b/drivers/net/ethernet/marvell/mvneta.c
index 2354421fe96f..78f1fcdc1f00 100644
--- a/drivers/net/ethernet/marvell/mvneta.c
+++ b/drivers/net/ethernet/marvell/mvneta.c
@@ -643,7 +643,6 @@ static int txq_number = 8;
 static int rxq_def;
 
 static int rx_copybreak __read_mostly = 256;
-static int rx_header_size __read_mostly = 128;
 
 /* HW BM need that each port be identify by a unique ID */
 static int global_port_id;
@@ -1823,7 +1822,7 @@ static int mvneta_rx_refill(struct mvneta_port *pp,
 
 	phys_addr = page_pool_get_dma_addr(page);
 
-	phys_addr += pp->rx_offset_correction;
+	phys_addr += pp->rx_offset_correction + NET_SKB_PAD;
 	mvneta_rx_desc_fill(rx_desc, phys_addr, page, rxq);
 	return 0;
 }
@@ -1944,14 +1943,12 @@ static int mvneta_rx_swbm(struct napi_struct *napi,
 		struct page *page;
 		dma_addr_t phys_addr;
 		u32 rx_status, index;
-		int rx_bytes, skb_size, copy_size;
-		int frag_num, frag_size, frag_offset;
+		int frag_num, frag_size;
+		int rx_bytes;
 
 		index = rx_desc - rxq->descs;
 		page = (struct page *)rxq->buf_virt_addr[index];
 		data = page_address(page);
-		/* Prefetch header */
-		prefetch(data);
 
 		phys_addr = rx_desc->buf_phys_addr;
 		rx_status = rx_desc->status;
@@ -1969,49 +1966,25 @@ static int mvneta_rx_swbm(struct napi_struct *napi,
 			rx_bytes = rx_desc->data_size -
 				   (ETH_FCS_LEN + MVNETA_MH_SIZE);
 
-			/* Allocate small skb for each new packet */
-			skb_size = max(rx_copybreak, rx_header_size);
-			rxq->skb = netdev_alloc_skb_ip_align(dev, skb_size);
-			if (unlikely(!rxq->skb)) {
-				netdev_err(dev,
-					   "Can't allocate skb on queue %d\n",
-					   rxq->id);
-				dev->stats.rx_dropped++;
-				rxq->skb_alloc_err++;
-				continue;
-			}
-			copy_size = min(skb_size, rx_bytes);
-
-			/* Copy data from buffer to SKB, skip Marvell header */
-			memcpy(rxq->skb->data, data + MVNETA_MH_SIZE,
-			       copy_size);
-			skb_put(rxq->skb, copy_size);
-			rxq->left_size = rx_bytes - copy_size;
 
-			mvneta_rx_csum(pp, rx_status, rxq->skb);
-			if (rxq->left_size == 0) {
-				int size = copy_size + MVNETA_MH_SIZE;
-
-				dma_sync_single_range_for_cpu(dev->dev.parent,
-							      phys_addr, 0,
-							      size,
-							      DMA_FROM_DEVICE);
+			dma_sync_single_range_for_cpu(dev->dev.parent,
+						      phys_addr, 0,
+						      rx_bytes,
+						      DMA_FROM_DEVICE);
 
-				/* leave the descriptor and buffer untouched */
-			} else {
-				/* refill descriptor with new buffer later */
-				rx_desc->buf_phys_addr = 0;
+			rxq->skb = build_skb(data, PAGE_SIZE);
+			if (!rxq->skb)
+				break;
 
-				frag_num = 0;
-				frag_offset = copy_size + MVNETA_MH_SIZE;
-				frag_size = min(rxq->left_size,
-						(int)(PAGE_SIZE - frag_offset));
-				skb_add_rx_frag(rxq->skb, frag_num, page,
-						frag_offset, frag_size,
-						PAGE_SIZE);
-				page_pool_unmap_page(rxq->page_pool, page);
-				rxq->left_size -= frag_size;
-			}
+			rx_desc->buf_phys_addr = 0;
+			frag_num = 0;
+			skb_reserve(rxq->skb, MVNETA_MH_SIZE + NET_SKB_PAD);
+			skb_put(rxq->skb, rx_bytes < PAGE_SIZE ? rx_bytes :
+				PAGE_SIZE);
+			mvneta_rx_csum(pp, rx_status, rxq->skb);
+			page_pool_unmap_page(rxq->page_pool, page);
+			rxq->left_size = rx_bytes < PAGE_SIZE ? 0 : rx_bytes -
+				PAGE_SIZE;
 		} else {
 			/* Middle or Last descriptor */
 			if (unlikely(!rxq->skb)) {
@@ -2019,24 +1992,14 @@ static int mvneta_rx_swbm(struct napi_struct *napi,
 					 rx_status);
 				continue;
 			}
-			if (!rxq->left_size) {
-				/* last descriptor has only FCS */
-				/* and can be discarded */
-				dma_sync_single_range_for_cpu(dev->dev.parent,
-							      phys_addr, 0,
-							      ETH_FCS_LEN,
-							      DMA_FROM_DEVICE);
-				/* leave the descriptor and buffer untouched */
-			} else {
+			if (rxq->left_size) {
 				/* refill descriptor with new buffer later */
 				rx_desc->buf_phys_addr = 0;
 
 				frag_num = skb_shinfo(rxq->skb)->nr_frags;
-				frag_offset = 0;
-				frag_size = min(rxq->left_size,
-						(int)(PAGE_SIZE - frag_offset));
+				frag_size = min(rxq->left_size, (int)PAGE_SIZE);
 				skb_add_rx_frag(rxq->skb, frag_num, page,
-						frag_offset, frag_size,
+						0, frag_size,
 						PAGE_SIZE);
 
 				page_pool_unmap_page(rxq->page_pool, page);
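
The new first-descriptor path boils down to the following (condensed
from the hunk above; error handling omitted):

	dma_sync_single_range_for_cpu(dev->dev.parent, phys_addr, 0,
				      rx_bytes, DMA_FROM_DEVICE);

	rxq->skb = build_skb(data, PAGE_SIZE);	/* page becomes skb head */
	skb_reserve(rxq->skb, MVNETA_MH_SIZE + NET_SKB_PAD);
	skb_put(rxq->skb, min_t(int, rx_bytes, PAGE_SIZE));
	mvneta_rx_csum(pp, rx_status, rxq->skb);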


* [net-next PATCH RFC 6/8] mvneta: activate page recycling via skb using page_pool
  2018-12-06 23:25 [net-next PATCH RFC 0/8] page_pool DMA handling and allow to recycles frames via SKB Jesper Dangaard Brouer
                   ` (4 preceding siblings ...)
  2018-12-06 23:25 ` [net-next PATCH RFC 5/8] net: mvneta: remove copybreak, prefetch and use build_skb Jesper Dangaard Brouer
@ 2018-12-06 23:25 ` Jesper Dangaard Brouer
  2018-12-06 23:26 ` [net-next PATCH RFC 7/8] xdp: bpf: cpumap redirect must update skb->mem_info Jesper Dangaard Brouer
  2018-12-06 23:26 ` [net-next PATCH RFC 8/8] veth: xdp_frames redirected into veth need to transfer xdp_mem_info Jesper Dangaard Brouer
  7 siblings, 0 replies; 30+ messages in thread
From: Jesper Dangaard Brouer @ 2018-12-06 23:25 UTC (permalink / raw)
  To: netdev, David S. Miller, Jesper Dangaard Brouer
  Cc: Toke Høiland-Jørgensen, ard.biesheuvel, Jason Wang,
	ilias.apalodimas, Björn Töpel, w, Saeed Mahameed,
	mykyta.iziumtsev, Daniel Borkmann, Alexei Starovoitov,
	Tariq Toukan

Previous mvneta patches have already started to use page_pool, but
primarily for the RX page alloc side and for doing DMA map/unmap
handling.  Pages traveling through the netstack were unmapped and
returned through the normal page allocator.

It is now time to activate recycling of pages back into the pool.
This involves registering the page_pool with the XDP rxq memory model
API, even though the driver doesn't support XDP itself yet, and simply
updating the skb->mem_info field with info from the xdp_rxq_info.

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
---
 drivers/net/ethernet/marvell/mvneta.c |   29 +++++++++++++++++++++++++----
 1 file changed, 25 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/marvell/mvneta.c b/drivers/net/ethernet/marvell/mvneta.c
index 78f1fcdc1f00..449c19829d67 100644
--- a/drivers/net/ethernet/marvell/mvneta.c
+++ b/drivers/net/ethernet/marvell/mvneta.c
@@ -628,6 +628,9 @@ struct mvneta_rx_queue {
 	/* page pool */
 	struct page_pool *page_pool;
 
+	/* XDP */
+	struct xdp_rxq_info xdp_rxq;
+
 	/* error counters */
 	u32 skb_alloc_err;
 	u32 refill_err;
@@ -1892,6 +1895,9 @@ static void mvneta_rxq_drop_pkts(struct mvneta_port *pp,
 		page_pool_put_page(rxq->page_pool, data, false);
 	}
 
+	if (xdp_rxq_info_is_reg(&rxq->xdp_rxq))
+		xdp_rxq_info_unreg(&rxq->xdp_rxq);
+
 	if (rxq->page_pool)
 		page_pool_destroy(rxq->page_pool);
 }
@@ -1978,11 +1984,11 @@ static int mvneta_rx_swbm(struct napi_struct *napi,
 
 			rx_desc->buf_phys_addr = 0;
 			frag_num = 0;
+			rxq->skb->mem_info = rxq->xdp_rxq.mem;
 			skb_reserve(rxq->skb, MVNETA_MH_SIZE + NET_SKB_PAD);
 			skb_put(rxq->skb, rx_bytes < PAGE_SIZE ? rx_bytes :
 				PAGE_SIZE);
 			mvneta_rx_csum(pp, rx_status, rxq->skb);
-			page_pool_unmap_page(rxq->page_pool, page);
 			rxq->left_size = rx_bytes < PAGE_SIZE ? 0 : rx_bytes -
 				PAGE_SIZE;
 		} else {
@@ -2001,7 +2007,7 @@ static int mvneta_rx_swbm(struct napi_struct *napi,
 				skb_add_rx_frag(rxq->skb, frag_num, page,
 						0, frag_size,
 						PAGE_SIZE);
-
+				/* skb frags[] are not recycled, unmap now */
 				page_pool_unmap_page(rxq->page_pool, page);
 
 				rxq->left_size -= frag_size;
@@ -2815,10 +2821,25 @@ static int mvneta_create_page_pool(struct mvneta_port *pp,
 static int mvneta_rxq_fill(struct mvneta_port *pp, struct mvneta_rx_queue *rxq,
 			   int num)
 {
-	int i = 0;
+	int err, i = 0;
+
+	err = mvneta_create_page_pool(pp, rxq, num);
+	if (err)
+		goto out;
 
-	if (mvneta_create_page_pool(pp, rxq, num))
+	err = xdp_rxq_info_reg(&rxq->xdp_rxq, pp->dev, rxq->id);
+	if (err) {
+		page_pool_destroy(rxq->page_pool);
+		goto out;
+	}
+
+	err = xdp_rxq_info_reg_mem_model(&rxq->xdp_rxq, MEM_TYPE_PAGE_POOL,
+					 rxq->page_pool);
+	if (err) {
+		xdp_rxq_info_unreg(&rxq->xdp_rxq);
+		page_pool_destroy(rxq->page_pool);
 		goto out;
+	}
 
 	for (i = 0; i < num; i++) {
 		memset(rxq->descs + i, 0, sizeof(struct mvneta_rx_desc));
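
The registration sequence, factored here into a hypothetical helper to
make the ordering and error unwind explicit:

static int mvneta_rxq_mem_model_reg(struct mvneta_port *pp,
				    struct mvneta_rx_queue *rxq)
{
	int err;

	err = xdp_rxq_info_reg(&rxq->xdp_rxq, pp->dev, rxq->id);
	if (err)
		return err;

	err = xdp_rxq_info_reg_mem_model(&rxq->xdp_rxq, MEM_TYPE_PAGE_POOL,
					 rxq->page_pool);
	if (err)
		xdp_rxq_info_unreg(&rxq->xdp_rxq);	/* unwind reg */

	return err;
}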


* [net-next PATCH RFC 7/8] xdp: bpf: cpumap redirect must update skb->mem_info
  2018-12-06 23:25 [net-next PATCH RFC 0/8] page_pool DMA handling and allow to recycles frames via SKB Jesper Dangaard Brouer
                   ` (5 preceding siblings ...)
  2018-12-06 23:25 ` [net-next PATCH RFC 6/8] mvneta: activate page recycling via skb using page_pool Jesper Dangaard Brouer
@ 2018-12-06 23:26 ` Jesper Dangaard Brouer
  2018-12-06 23:26 ` [net-next PATCH RFC 8/8] veth: xdp_frames redirected into veth need to transfer xdp_mem_info Jesper Dangaard Brouer
  7 siblings, 0 replies; 30+ messages in thread
From: Jesper Dangaard Brouer @ 2018-12-06 23:26 UTC (permalink / raw)
  To: netdev, David S. Miller, Jesper Dangaard Brouer
  Cc: Toke Høiland-Jørgensen, ard.biesheuvel, Jason Wang,
	ilias.apalodimas, Björn Töpel, w, Saeed Mahameed,
	mykyta.iziumtsev, Daniel Borkmann, Alexei Starovoitov,
	Tariq Toukan

XDP-redirect to CPUMAP is about creating the SKB outside the driver
(and on another CPU) from the xdp_frame info.  Transfer the xdp_frame
mem info to the new SKB's mem_info field.

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
---
 kernel/bpf/cpumap.c |    2 ++
 1 file changed, 2 insertions(+)

diff --git a/kernel/bpf/cpumap.c b/kernel/bpf/cpumap.c
index 24aac0d0f412..e3e05b6ccc42 100644
--- a/kernel/bpf/cpumap.c
+++ b/kernel/bpf/cpumap.c
@@ -199,6 +199,8 @@ static struct sk_buff *cpu_map_build_skb(struct bpf_cpu_map_entry *rcpu,
 	/* Essential SKB info: protocol and skb->dev */
 	skb->protocol = eth_type_trans(skb, xdpf->dev_rx);
 
+	skb->mem_info = xdpf->mem;
+
 	/* Optional SKB info, currently missing:
 	 * - HW checksum info		(skb->ip_summed)
 	 * - HW RX hash			(skb_set_hash)


* [net-next PATCH RFC 8/8] veth: xdp_frames redirected into veth need to transfer xdp_mem_info
  2018-12-06 23:25 [net-next PATCH RFC 0/8] page_pool DMA handling and allow to recycles frames via SKB Jesper Dangaard Brouer
                   ` (6 preceding siblings ...)
  2018-12-06 23:26 ` [net-next PATCH RFC 7/8] xdp: bpf: cpumap redirect must update skb->mem_info Jesper Dangaard Brouer
@ 2018-12-06 23:26 ` Jesper Dangaard Brouer
  7 siblings, 0 replies; 30+ messages in thread
From: Jesper Dangaard Brouer @ 2018-12-06 23:26 UTC (permalink / raw)
  To: netdev, David S. Miller, Jesper Dangaard Brouer
  Cc: Toke Høiland-Jørgensen, ard.biesheuvel, Jason Wang,
	ilias.apalodimas, Björn Töpel, w, Saeed Mahameed,
	mykyta.iziumtsev, Daniel Borkmann, Alexei Starovoitov,
	Tariq Toukan

XDP frames redirected into a veth device that chooses XDP_PASS end up
creating an SKB from the xdp_frame.  The xdp_frame's mem info needs to
be transferred into the SKB.

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
---
 drivers/net/veth.c |    1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/net/veth.c b/drivers/net/veth.c
index f412ea1cef18..925d300402ca 100644
--- a/drivers/net/veth.c
+++ b/drivers/net/veth.c
@@ -555,6 +555,7 @@ static struct sk_buff *veth_xdp_rcv_one(struct veth_rq *rq,
 		goto err;
 	}
 
+	skb->mem_info = frame->mem;
 	xdp_scrub_frame(frame);
 	skb->protocol = eth_type_trans(skb, rq->dev);
 err:


* Re: [net-next PATCH RFC 1/8] page_pool: add helper functions for DMA
  2018-12-06 23:25 ` [net-next PATCH RFC 1/8] page_pool: add helper functions for DMA Jesper Dangaard Brouer
@ 2018-12-08  7:06   ` David Miller
  2018-12-08  7:55     ` Ilias Apalodimas
  0 siblings, 1 reply; 30+ messages in thread
From: David Miller @ 2018-12-08  7:06 UTC (permalink / raw)
  To: brouer
  Cc: netdev, toke, ard.biesheuvel, jasowang, ilias.apalodimas,
	bjorn.topel, w, saeedm, mykyta.iziumtsev, borkmann,
	alexei.starovoitov, tariqt

From: Jesper Dangaard Brouer <brouer@redhat.com>
Date: Fri, 07 Dec 2018 00:25:32 +0100

> From: Ilias Apalodimas <ilias.apalodimas@linaro.org>
> 
> Add helper functions for retreiving dma_addr_t stored in page_private and
> unmapping dma addresses, mapped via the page_pool API.
> 
> Signed-off-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
> Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>

This isn't going to work on 32-bit platforms where dma_addr_t is a u64,
because the page private is unsigned long.

Grep for PHYS_ADDR_T_64BIT under arch/ to see the vast majority of the
cases where this happens, then ARCH_DMA_ADDR_T_64BIT.
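
To make the objection concrete, here is the helper from patch 1 with
the problem annotated (annotation added for this write-up, not part of
the patch):

static inline dma_addr_t page_pool_get_dma_addr(struct page *page)
{
	/* page_private() returns unsigned long: 32 bits on 32-bit
	 * arches, while dma_addr_t is 64 bits when
	 * CONFIG_ARCH_DMA_ADDR_T_64BIT=y (e.g. ARM LPAE).  The upper
	 * 32 bits of the DMA address were already lost at the
	 * set_page_private() store.
	 */
	return page_private(page);
}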


* Re: [net-next PATCH RFC 4/8] net: core: add recycle capabilities on skbs via page_pool API
  2018-12-06 23:25 ` [net-next PATCH RFC 4/8] net: core: add recycle capabilities on skbs via page_pool API Jesper Dangaard Brouer
@ 2018-12-08  7:15   ` David Miller
  2018-12-08  7:54     ` Ilias Apalodimas
  2018-12-08  9:57   ` [net-next, RFC, " Florian Westphal
  1 sibling, 1 reply; 30+ messages in thread
From: David Miller @ 2018-12-08  7:15 UTC (permalink / raw)
  To: brouer
  Cc: netdev, toke, ard.biesheuvel, jasowang, ilias.apalodimas,
	bjorn.topel, w, saeedm, mykyta.iziumtsev, borkmann,
	alexei.starovoitov, tariqt

From: Jesper Dangaard Brouer <brouer@redhat.com>
Date: Fri, 07 Dec 2018 00:25:47 +0100

> @@ -744,6 +745,10 @@ struct sk_buff {
>  				head_frag:1,
>  				xmit_more:1,
>  				pfmemalloc:1;
> +	/* TODO: Future idea, extend mem_info with __u8 flags, and
> +	 * move bits head_frag and pfmemalloc there.
> +	 */
> +	struct xdp_mem_info	mem_info;

This is 4 bytes right?

I guess I can live with this.

Please do some microbenchmarks to make sure this doesn't show any
obvious regressions.

Thanks.


* Re: [net-next PATCH RFC 4/8] net: core: add recycle capabilities on skbs via page_pool API
  2018-12-08  7:15   ` David Miller
@ 2018-12-08  7:54     ` Ilias Apalodimas
  0 siblings, 0 replies; 30+ messages in thread
From: Ilias Apalodimas @ 2018-12-08  7:54 UTC (permalink / raw)
  To: David Miller
  Cc: brouer, netdev, toke, ard.biesheuvel, jasowang, bjorn.topel, w,
	saeedm, mykyta.iziumtsev, borkmann, alexei.starovoitov, tariqt

On Fri, Dec 07, 2018 at 11:15:14PM -0800, David Miller wrote:
> From: Jesper Dangaard Brouer <brouer@redhat.com>
> Date: Fri, 07 Dec 2018 00:25:47 +0100
> 
> > @@ -744,6 +745,10 @@ struct sk_buff {
> >  				head_frag:1,
> >  				xmit_more:1,
> >  				pfmemalloc:1;
> > +	/* TODO: Future idea, extend mem_info with __u8 flags, and
> > +	 * move bits head_frag and pfmemalloc there.
> > +	 */
> > +	struct xdp_mem_info	mem_info;
> 
> This is 4 bytes right?

With this patchset yes

> 
> I guess I can live with this.

Great news!

> 
> Please do some microbenchmarks to make sure this doesn't show any
> obvious regressions.

Will do

Thanks
/Ilias


* Re: [net-next PATCH RFC 1/8] page_pool: add helper functions for DMA
  2018-12-08  7:06   ` David Miller
@ 2018-12-08  7:55     ` Ilias Apalodimas
  0 siblings, 0 replies; 30+ messages in thread
From: Ilias Apalodimas @ 2018-12-08  7:55 UTC (permalink / raw)
  To: David Miller
  Cc: brouer, netdev, toke, ard.biesheuvel, jasowang, bjorn.topel, w,
	saeedm, mykyta.iziumtsev, borkmann, alexei.starovoitov, tariqt

On Fri, Dec 07, 2018 at 11:06:55PM -0800, David Miller wrote:

> This isn't going to work on 32-bit platforms where dma_addr_t is a u64,
> because the page private is unsigned long.
> 
> Grep for PHY_ADDR_T_64BIT under arch/ to see the vast majority of the
> cases where this happens, then ARCH_DMA_ADDR_T_64BIT.

Noted, thanks for the heads up.

Thanks
/Ilias


* Re: [net-next, RFC, 4/8] net: core: add recycle capabilities on skbs via page_pool API
  2018-12-06 23:25 ` [net-next PATCH RFC 4/8] net: core: add recycle capabilities on skbs via page_pool API Jesper Dangaard Brouer
  2018-12-08  7:15   ` David Miller
@ 2018-12-08  9:57   ` Florian Westphal
  2018-12-08 11:36     ` Jesper Dangaard Brouer
  2018-12-08 12:29     ` Eric Dumazet
  1 sibling, 2 replies; 30+ messages in thread
From: Florian Westphal @ 2018-12-08  9:57 UTC (permalink / raw)
  To: Jesper Dangaard Brouer
  Cc: netdev, David S. Miller, Toke Høiland-Jørgensen,
	ard.biesheuvel, Jason Wang, ilias.apalodimas,
	Björn Töpel, w, Saeed Mahameed, mykyta.iziumtsev,
	Daniel Borkmann, Alexei Starovoitov, Tariq Toukan

Jesper Dangaard Brouer <brouer@redhat.com> wrote:
> From: Ilias Apalodimas <ilias.apalodimas@linaro.org>
> 
> This patch is changing struct sk_buff, and is thus per-definition
> controversial.
> 
> Place a new member 'mem_info' of type struct xdp_mem_info, just after
> members (flags) head_frag and pfmemalloc, And not in between
> headers_start/end to ensure skb_copy() and pskb_copy() work as-is.
> Copying mem_info during skb_clone() is required.  This makes sure that
> pages are correctly freed or recycled during the altered
> skb_free_head() invocation.

I read this to mean that this 'info' isn't accessed/needed until skb
is freed.  Any reason it's not added at the end?

This would avoid moving other fields that are probably accessed
more frequently during processing.


* Re: [net-next, RFC, 4/8] net: core: add recycle capabilities on skbs via page_pool API
  2018-12-08  9:57   ` [net-next, RFC, " Florian Westphal
@ 2018-12-08 11:36     ` Jesper Dangaard Brouer
  2018-12-08 20:10       ` David Miller
  2018-12-08 12:29     ` Eric Dumazet
  1 sibling, 1 reply; 30+ messages in thread
From: Jesper Dangaard Brouer @ 2018-12-08 11:36 UTC (permalink / raw)
  To: Florian Westphal
  Cc: netdev, David S. Miller, Toke Høiland-Jørgensen,
	ard.biesheuvel, Jason Wang, ilias.apalodimas,
	BjörnTöpel, w, Saeed Mahameed, mykyta.iziumtsev,
	Daniel Borkmann, Alexei Starovoitov, Tariq Toukan, brouer

On Sat, 8 Dec 2018 10:57:58 +0100
Florian Westphal <fw@strlen.de> wrote:

> Jesper Dangaard Brouer <brouer@redhat.com> wrote:
> > From: Ilias Apalodimas <ilias.apalodimas@linaro.org>
> > 
> > This patch is changing struct sk_buff, and is thus per-definition
> > controversial.
> > 
> > Place a new member 'mem_info' of type struct xdp_mem_info, just after
> > members (flags) head_frag and pfmemalloc, And not in between
> > headers_start/end to ensure skb_copy() and pskb_copy() work as-is.
> > Copying mem_info during skb_clone() is required.  This makes sure that
> > pages are correctly freed or recycled during the altered
> > skb_free_head() invocation.  
> 
> I read this to mean that this 'info' isn't accessed/needed until skb
> is freed.  Any reason its not added at the end?
>
> This would avoid moving other fields that are probably accessed
> more frequently during processing.

It is placed here because it is close to skb->head_frag, which is
also used when the SKB is freed.  This is the reason, both because I
think we can move the skb->head_frag bit into mem_info, and due to
cacheline accesses (e.g. skb->cloned and the destructor pointer are
also read, and they are on that cache-line).

The annoying part is actually that, depending on the kernel config
options CONFIG_XFRM, CONFIG_NF_CONNTRACK and CONFIG_BRIDGE_NETFILTER,
there may or may not be a cache-line split, where mem_info gets moved
into the next cacheline.

I do like the idea of moving it to the end, iff we also move
skb->head_frag into mem_info, as skb->head (on the end cacheline) is
also accessed during free.  But given we still need to read the
cache-line containing skb->{cloned,destructor}, I don't think it will
be a win.  I think the skb->cloned value is read during the lifetime
of the SKB.

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer


* Re: [net-next, RFC, 4/8] net: core: add recycle capabilities on skbs via page_pool API
  2018-12-08  9:57   ` [net-next, RFC, " Florian Westphal
  2018-12-08 11:36     ` Jesper Dangaard Brouer
@ 2018-12-08 12:29     ` Eric Dumazet
  2018-12-08 12:34       ` Eric Dumazet
                         ` (2 more replies)
  1 sibling, 3 replies; 30+ messages in thread
From: Eric Dumazet @ 2018-12-08 12:29 UTC (permalink / raw)
  To: Florian Westphal, Jesper Dangaard Brouer
  Cc: netdev, David S. Miller, Toke Høiland-Jørgensen,
	ard.biesheuvel, Jason Wang, ilias.apalodimas,
	Björn Töpel, w, Saeed Mahameed, mykyta.iziumtsev,
	Daniel Borkmann, Alexei Starovoitov, Tariq Toukan



On 12/08/2018 01:57 AM, Florian Westphal wrote:
> Jesper Dangaard Brouer <brouer@redhat.com> wrote:
>> From: Ilias Apalodimas <ilias.apalodimas@linaro.org>
>>
>> This patch is changing struct sk_buff, and is thus per-definition
>> controversial.
>>
>> Place a new member 'mem_info' of type struct xdp_mem_info, just after
>> members (flags) head_frag and pfmemalloc, And not in between
>> headers_start/end to ensure skb_copy() and pskb_copy() work as-is.
>> Copying mem_info during skb_clone() is required.  This makes sure that
>> pages are correctly freed or recycled during the altered
>> skb_free_head() invocation.
> 
> I read this to mean that this 'info' isn't accessed/needed until skb
> is freed.  Any reason its not added at the end?
> 
> This would avoid moving other fields that are probably accessed
> more frequently during processing.
> 

But I do not get why the patch is needed.

Adding extra cost for each skb destruction is costly.

I thought XDP was all about _not_ having skbs.

Please let's do not slow down the non XDP stack only to make XDP more appealing.


* Re: [net-next, RFC, 4/8] net: core: add recycle capabilities on skbs via page_pool API
  2018-12-08 12:29     ` Eric Dumazet
@ 2018-12-08 12:34       ` Eric Dumazet
  2018-12-08 13:45       ` Jesper Dangaard Brouer
  2018-12-08 14:57       ` Ilias Apalodimas
  2 siblings, 0 replies; 30+ messages in thread
From: Eric Dumazet @ 2018-12-08 12:34 UTC (permalink / raw)
  To: Florian Westphal, Jesper Dangaard Brouer
  Cc: netdev, David S. Miller, Toke Høiland-Jørgensen,
	ard.biesheuvel, Jason Wang, ilias.apalodimas,
	Björn Töpel, w, Saeed Mahameed, mykyta.iziumtsev,
	Daniel Borkmann, Alexei Starovoitov, Tariq Toukan



On 12/08/2018 04:29 AM, Eric Dumazet wrote:
> 

> But I do not get why the patch is needed.
> 
> Adding extra cost for each skb destruction is costly.
> 
> I though XDP was all about _not_ having skbs.
> 
> Please let's do not slow down the non XDP stack only to make XDP more appealing.
> 

I have a similar concern with napi_consume_skb():

This added a cost for locally generated TCP traffic, since most TX packets are fast clones.


* Re: [net-next, RFC, 4/8] net: core: add recycle capabilities on skbs via page_pool API
  2018-12-08 12:29     ` Eric Dumazet
  2018-12-08 12:34       ` Eric Dumazet
@ 2018-12-08 13:45       ` Jesper Dangaard Brouer
  2018-12-08 14:57       ` Ilias Apalodimas
  2 siblings, 0 replies; 30+ messages in thread
From: Jesper Dangaard Brouer @ 2018-12-08 13:45 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Florian Westphal, netdev, David S. Miller,
	Toke Høiland-Jørgensen, ard.biesheuvel, Jason Wang,
	ilias.apalodimas, Björn Töpel, w, Saeed Mahameed,
	mykyta.iziumtsev, Daniel Borkmann, Alexei Starovoitov,
	Tariq Toukan, brouer

On Sat, 8 Dec 2018 04:29:17 -0800
Eric Dumazet <eric.dumazet@gmail.com> wrote:

> On 12/08/2018 01:57 AM, Florian Westphal wrote:
> > Jesper Dangaard Brouer <brouer@redhat.com> wrote:  
> >> From: Ilias Apalodimas <ilias.apalodimas@linaro.org>
> >>
> >> This patch is changing struct sk_buff, and is thus per-definition
> >> controversial.
> >>
> >> Place a new member 'mem_info' of type struct xdp_mem_info, just after
> >> members (flags) head_frag and pfmemalloc, And not in between
> >> headers_start/end to ensure skb_copy() and pskb_copy() work as-is.
> >> Copying mem_info during skb_clone() is required.  This makes sure that
> >> pages are correctly freed or recycled during the altered
> >> skb_free_head() invocation.  
> > 
> > I read this to mean that this 'info' isn't accessed/needed until skb
> > is freed.  Any reason its not added at the end?
> > 
> > This would avoid moving other fields that are probably accessed
> > more frequently during processing.
> >   
> 
> But I do not get why the patch is needed.
> 
> Adding extra cost for each skb destruction is costly.
> 
> I though XDP was all about _not_ having skbs.
> 
> Please let's do not slow down the non XDP stack only to make XDP more
> appealing.

In general this work is about better cooperation between XDP and the
netstack.  The patch is needed because the page_pool keeps pages
DMA-mapped and needs a return hook.  I doubt that the extra
compare-and-branch will affect your use-case on super-scalar CPUs.  We
(Ilias and I) are actually testing this stuff on low-end ARM64 in-order
execution CPUs, which is actually nice, as the performance effects of
our code changes are not hidden by speculative execution units.  I'm
enforcing (and Ilias agrees) that we do benchmark-driven development.
I actually invite people to monitor our progress here[1].  So, trust
me, I am as concerned as you about any performance regression, and am
vigilantly measuring this stuff.  (This is more than you can say about
a lot of the other stuff that gets accepted on this list.)


[1] https://github.com/xdp-project/xdp-project/blob/master/areas/arm64/
-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer


* Re: [net-next, RFC, 4/8] net: core: add recycle capabilities on skbs via page_pool API
  2018-12-08 12:29     ` Eric Dumazet
  2018-12-08 12:34       ` Eric Dumazet
  2018-12-08 13:45       ` Jesper Dangaard Brouer
@ 2018-12-08 14:57       ` Ilias Apalodimas
  2018-12-08 17:07         ` Andrew Lunn
                           ` (2 more replies)
  2 siblings, 3 replies; 30+ messages in thread
From: Ilias Apalodimas @ 2018-12-08 14:57 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Florian Westphal, Jesper Dangaard Brouer, netdev,
	David S. Miller, Toke Høiland-Jørgensen,
	ard.biesheuvel, Jason Wang, Björn Töpel, w,
	Saeed Mahameed, mykyta.iziumtsev, Daniel Borkmann,
	Alexei Starovoitov, Tariq Toukan

Hi Eric, 
> >> This patch is changing struct sk_buff, and is thus per-definition
> >> controversial.
> >>
> >> Place a new member 'mem_info' of type struct xdp_mem_info, just after
> >> members (flags) head_frag and pfmemalloc, And not in between
> >> headers_start/end to ensure skb_copy() and pskb_copy() work as-is.
> >> Copying mem_info during skb_clone() is required.  This makes sure that
> >> pages are correctly freed or recycled during the altered
> >> skb_free_head() invocation.
> > 
> > I read this to mean that this 'info' isn't accessed/needed until skb
> > is freed.  Any reason its not added at the end?
> > 
> > This would avoid moving other fields that are probably accessed
> > more frequently during processing.
> > 
> 
> But I do not get why the patch is needed.
> 
> Adding extra cost for each skb destruction is costly.
> 
> I though XDP was all about _not_ having skbs.

You hit the only point I don't personally like in the code: XDP info in
the skb.  Considering the benefits, I am convinced this is OK, and
here's why:

> Please let's do not slow down the non XDP stack only to make XDP more appealing.

We are not trying to do that; on the contrary, there is nothing in the
patchset that speeds up XDP at the expense of anything else.  The
patchset speeds up the mvneta driver on the default network stack.
The only change that was needed was to adapt the driver to using the
page_pool API.  The speed improvements we are seeing on specific
workloads (i.e. 256B < packet < 400B) are almost 3x.

Lots of high-speed drivers are doing similar recycling tricks
themselves (and there's no common code; everyone is doing something
similar, though).  All we are trying to do is provide a unified API to
make that easier for the rest.  Another advantage is that if some
drivers switch to the API, adding XDP functionality to them is pretty
trivial.

Moreover, our tests are only performed on systems without an SMMU, or
with the SMMU disabled.  On a platform I have access to, enabling and
disabling the SMMU has some performance impact.  By keeping the
buffers mapped, we believe this impact will be substantially less
(I'll come back with results once I have them on this).

I do understand your point, but in my head the potential advantages
outweigh it by a lot.

I do have other concerns about the patchset, though.  Like how much
memory is it 'ok' to keep mapped, keeping in mind we are using the
streaming DMA API?  Are we going to affect anyone else negatively by
doing so?

Thanks
/Ilias


* Re: [net-next, RFC, 4/8] net: core: add recycle capabilities on skbs via page_pool API
  2018-12-08 14:57       ` Ilias Apalodimas
@ 2018-12-08 17:07         ` Andrew Lunn
  2018-12-08 19:26         ` Eric Dumazet
  2018-12-08 20:21         ` David Miller
  2 siblings, 0 replies; 30+ messages in thread
From: Andrew Lunn @ 2018-12-08 17:07 UTC (permalink / raw)
  To: Ilias Apalodimas
  Cc: Eric Dumazet, Florian Westphal, Jesper Dangaard Brouer, netdev,
	David S. Miller, Toke Høiland-Jørgensen,
	ard.biesheuvel, Jason Wang, Björn Töpel, w,
	Saeed Mahameed, mykyta.iziumtsev, Daniel Borkmann,
	Alexei Starovoitov, Tariq Toukan

> I got other concerns on the patchset though. Like how much memory is
> it 'ok' to keep mapped keeping in mind we are using the streaming
> DMA API. Are we going to affect anyone else negatively by doing so ?

For mvneta, you can expect the target to have between 512 MB and
3 GB.  You can take a look at the DT files to get a better feel for
this.  All of it should be low enough that it is DMA'able.

These SoCs are often used for WiFi access points and NAS boxes.  So I
guess the big memory user on these boxes is the block cache for a NAS
device.  So if you want to see the impact of the driver using more
memory, see if disk performance is affected.

	Andrew


* Re: [net-next, RFC, 4/8] net: core: add recycle capabilities on skbs via page_pool API
  2018-12-08 14:57       ` Ilias Apalodimas
  2018-12-08 17:07         ` Andrew Lunn
@ 2018-12-08 19:26         ` Eric Dumazet
  2018-12-08 20:11           ` Jesper Dangaard Brouer
  2018-12-08 20:21         ` David Miller
  2 siblings, 1 reply; 30+ messages in thread
From: Eric Dumazet @ 2018-12-08 19:26 UTC (permalink / raw)
  To: Ilias Apalodimas, Eric Dumazet
  Cc: Florian Westphal, Jesper Dangaard Brouer, netdev,
	David S. Miller, Toke Høiland-Jørgensen,
	ard.biesheuvel, Jason Wang, Björn Töpel, w,
	Saeed Mahameed, mykyta.iziumtsev, Daniel Borkmann,
	Alexei Starovoitov, Tariq Toukan



On 12/08/2018 06:57 AM, Ilias Apalodimas wrote:
> Hi Eric, 
>>>> This patch is changing struct sk_buff, and is thus per-definition
>>>> controversial.
>>>>
>>>> Place a new member 'mem_info' of type struct xdp_mem_info, just after
>>>> members (flags) head_frag and pfmemalloc, And not in between
>>>> headers_start/end to ensure skb_copy() and pskb_copy() work as-is.
>>>> Copying mem_info during skb_clone() is required.  This makes sure that
>>>> pages are correctly freed or recycled during the altered
>>>> skb_free_head() invocation.
>>>
>>> I read this to mean that this 'info' isn't accessed/needed until skb
>>> is freed.  Any reason its not added at the end?
>>>
>>> This would avoid moving other fields that are probably accessed
>>> more frequently during processing.
>>>
>>
>> But I do not get why the patch is needed.
>>
>> Adding extra cost for each skb destruction is costly.
>>
>> I though XDP was all about _not_ having skbs.
> 
> You hit the only point i don't personally like in the code, xdp info in the 
> skb. Considering the benefits i am convinced this is ok and here's why:
> 
>> Please let's do not slow down the non XDP stack only to make XDP more appealing.
> 
> We are not trying to do that, on the contrary. The patchset has nothing towards
> speeding XDP and slowing down anything else. The patchset speeds up the 
> mvneta driver on the default network stack. The only change that was needed was
> to adapt the driver to using the page_pool API. The speed improvements we are
> seeing on specific workloads (i.e 256b < packet < 400b) are almost 3x.
> 
> Lots of high speed drivers are doing similar recycling tricks themselves (and
> there's no common code, everyone is doing something similar though). All we are
> trying to do is provide a unified API to make that easier for the rest. Another
> advantage is that if the some drivers switch to the API, adding XDP
> functionality on them is pretty trivial.
> 
> Moreover our tests are only performed on systems without or with SMMU disabled.
> On a platform i have access to, enabling and disabling the SMMU has some
> performance impact. By keeping the buffers mapped we believe this impact
> will be substantially less (i'll come back with results once i have them on
> this).
> 
> I do understand your point, but the potential advantages on my head
> overwight that by a lot.
> 
> I got other concerns on the patchset though. Like how much memory is it 'ok' to
> keep mapped keeping in mind we are using the streaming DMA API. Are we going to
> affect anyone else negatively by doing so ?
> 


I want to make sure you guys thought about splice() stuff, and skb_try_coalesce(),
and GRO, and skb cloning, and ...

I do not see how an essential property of page fragments could be attached
to sk_buff without adding extra code all over the place.

Page recycling in drivers is fine, but it should operate on pages that the driver owns.

Messing with skb->head seems not needed for performance, since skb->head can be regular
slab-allocated.


* Re: [net-next, RFC, 4/8] net: core: add recycle capabilities on skbs via page_pool API
  2018-12-08 11:36     ` Jesper Dangaard Brouer
@ 2018-12-08 20:10       ` David Miller
  0 siblings, 0 replies; 30+ messages in thread
From: David Miller @ 2018-12-08 20:10 UTC (permalink / raw)
  To: brouer
  Cc: fw, netdev, toke, ard.biesheuvel, jasowang, ilias.apalodimas,
	bjorn.topel, w, saeedm, mykyta.iziumtsev, borkmann,
	alexei.starovoitov, tariqt

From: Jesper Dangaard Brouer <brouer@redhat.com>
Date: Sat, 8 Dec 2018 12:36:10 +0100

> The annoying part is actually that, depending on the kernel config
> options CONFIG_XFRM, CONFIG_NF_CONNTRACK and CONFIG_BRIDGE_NETFILTER,
> there may be a cache-line split where mem_info gets moved into the
> next cacheline.

Note that Florian Westphal's work (trying to help MP-TCP) would
eliminate this variability.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [net-next, RFC, 4/8] net: core: add recycle capabilities on skbs via page_pool API
  2018-12-08 19:26         ` Eric Dumazet
@ 2018-12-08 20:11           ` Jesper Dangaard Brouer
  2018-12-08 20:14             ` Ilias Apalodimas
  0 siblings, 1 reply; 30+ messages in thread
From: Jesper Dangaard Brouer @ 2018-12-08 20:11 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Ilias Apalodimas, Florian Westphal, netdev, David S. Miller,
	Toke Høiland-Jørgensen, ard.biesheuvel, Jason Wang,
	BjörnTöpel, w, Saeed Mahameed, mykyta.iziumtsev,
	Daniel Borkmann, Alexei Starovoitov, Tariq Toukan, brouer

On Sat, 8 Dec 2018 11:26:56 -0800
Eric Dumazet <eric.dumazet@gmail.com> wrote:

> On 12/08/2018 06:57 AM, Ilias Apalodimas wrote:
> > Hi Eric,   
> >>>> This patch is changing struct sk_buff, and is thus per-definition
> >>>> controversial.
> >>>>
> >>>> Place a new member 'mem_info' of type struct xdp_mem_info, just after
> >>>> members (flags) head_frag and pfmemalloc, and not in between
> >>>> headers_start/end to ensure skb_copy() and pskb_copy() work as-is.
> >>>> Copying mem_info during skb_clone() is required.  This makes sure that
> >>>> pages are correctly freed or recycled during the altered
> >>>> skb_free_head() invocation.  
> >>>
> >>> I read this to mean that this 'info' isn't accessed/needed until skb
> >>> is freed.  Any reason it's not added at the end?
> >>>
> >>> This would avoid moving other fields that are probably accessed
> >>> more frequently during processing.
> >>>  
> >>
> >> But I do not get why the patch is needed.
> >>
> >> Adding extra work to each skb destruction is costly.
> >>
> >> I thought XDP was all about _not_ having skbs.
> > 
> > You hit the only point I don't personally like in the code, the XDP info in the
> > skb. Considering the benefits, I am convinced this is OK, and here's why:
> >   
> >> Please, let's not slow down the non-XDP stack just to make XDP more appealing.
> > 
> > We are not trying to do that; on the contrary, the patchset does nothing to
> > speed up XDP by slowing down anything else. The patchset speeds up the
> > mvneta driver on the default network stack. The only change that was needed
> > was to adapt the driver to use the page_pool API. The speed improvements we
> > are seeing on specific workloads (i.e. 256b < packet < 400b) are almost 3x.
> > 
> > Lots of high-speed drivers are doing similar recycling tricks themselves (and
> > there's no common code; everyone is doing something similar, though). All we
> > are trying to do is provide a unified API to make that easier for the rest.
> > Another advantage is that if some drivers switch to the API, adding XDP
> > functionality to them is pretty trivial.
> > 
> > Moreover, our tests were only performed on systems without an SMMU, or with
> > the SMMU disabled. On a platform I have access to, enabling or disabling the
> > SMMU has some performance impact. By keeping the buffers mapped, we believe
> > this impact will be substantially less (I'll come back with results once I
> > have them).
> > 
> > I do understand your point, but in my head the potential advantages
> > outweigh that cost by a lot.
> > 
> > I have other concerns about the patchset, though, like how much memory it
> > is 'ok' to keep mapped, given that we are using the streaming DMA API. Are
> > we going to affect anyone else negatively by doing so?
> >   
>  
> I want to make sure you guys thought about splice() stuff, and
> skb_try_coalesce(), and GRO, and skb cloning, and ...

Thanks for the pointers. Ilias, we need to double-check the skb_try_coalesce()
code path, as it does look like we don't handle this correctly.

> I do not see how an essential property of page fragments could be
> attached to the sk_buff without adding extra code all over the place.
> 
> Page recycling in drivers is fine, but it should operate on pages
> that the driver owns.
> 

I think we have addressed this.  We are only recycling pages that the
driver owns.  In the case of fragments, we release the DMA mapping and
don't recycle the page; instead, the normal code path is taken (which is
what is missing in the skb_try_coalesce() case).

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer

^ permalink raw reply	[flat|nested] 30+ messages in thread
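
A minimal sketch of the ownership rule described above, with invented
helper names (the my_pool_* functions and pool type are hypothetical;
only page_ref_count() and put_page() are real kernel APIs): recycle only
when the pool holds the sole reference to the page, otherwise drop the
DMA mapping and let the normal free path run.

        #include <linux/mm.h>

        struct my_page_pool;                    /* hypothetical pool type */
        void my_pool_recycle(struct my_page_pool *pool, struct page *page);
        void my_pool_unmap(struct my_page_pool *pool, struct page *page);

        static void rx_buf_release(struct my_page_pool *pool, struct page *page)
        {
                if (page_ref_count(page) == 1) {
                        /* Sole owner: keep the DMA mapping and put the
                         * page straight back into the pool for reuse.
                         */
                        my_pool_recycle(pool, page);
                } else {
                        /* Shared page (clone, coalesce, splice, zerocopy):
                         * release our DMA mapping now and let the last
                         * put_page() free it through the normal path.
                         */
                        my_pool_unmap(pool, page);
                        put_page(page);
                }
        }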

* Re: [net-next, RFC, 4/8] net: core: add recycle capabilities on skbs via page_pool API
  2018-12-08 20:11           ` Jesper Dangaard Brouer
@ 2018-12-08 20:14             ` Ilias Apalodimas
  2018-12-08 21:06               ` Willy Tarreau
  2018-12-08 22:36               ` Eric Dumazet
  0 siblings, 2 replies; 30+ messages in thread
From: Ilias Apalodimas @ 2018-12-08 20:14 UTC (permalink / raw)
  To: Jesper Dangaard Brouer
  Cc: Eric Dumazet, Florian Westphal, netdev, David S. Miller,
	Toke Høiland-Jørgensen, ard.biesheuvel, Jason Wang,
	BjörnTöpel, w, Saeed Mahameed, mykyta.iziumtsev,
	Daniel Borkmann, Alexei Starovoitov, Tariq Toukan

On Sat, Dec 08, 2018 at 09:11:53PM +0100, Jesper Dangaard Brouer wrote:
> >  
> > I want to make sure you guys thought about splice() stuff, and
> > skb_try_coalesce(), and GRO, and skb cloning, and ...
> 
> Thanks for the pointers. Ilias, we need to double-check the skb_try_coalesce()
> code path, as it does look like we don't handle this correctly.
> 

Noted. We did check skb cloning, but indeed not this one.

/Ilias

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [net-next, RFC, 4/8] net: core: add recycle capabilities on skbs via page_pool API
  2018-12-08 14:57       ` Ilias Apalodimas
  2018-12-08 17:07         ` Andrew Lunn
  2018-12-08 19:26         ` Eric Dumazet
@ 2018-12-08 20:21         ` David Miller
  2018-12-08 20:29           ` Ilias Apalodimas
  2018-12-10  9:51           ` Saeed Mahameed
  2 siblings, 2 replies; 30+ messages in thread
From: David Miller @ 2018-12-08 20:21 UTC (permalink / raw)
  To: ilias.apalodimas
  Cc: eric.dumazet, fw, brouer, netdev, toke, ard.biesheuvel, jasowang,
	bjorn.topel, w, saeedm, mykyta.iziumtsev, borkmann,
	alexei.starovoitov, tariqt

From: Ilias Apalodimas <ilias.apalodimas@linaro.org>
Date: Sat, 8 Dec 2018 16:57:28 +0200

> The patchset speeds up the mvneta driver on the default network
> stack. The only change that was needed was to adapt the driver to
> use the page_pool API. The speed improvements we are seeing on
> specific workloads (i.e. 256b < packet < 400b) are almost 3x.
> 
> Lots of high-speed drivers are doing similar recycling tricks themselves (and
> there's no common code; everyone is doing something similar, though). All we
> are trying to do is provide a unified API to make that easier for the rest.
> Another advantage is that if some drivers switch to the API, adding XDP
> functionality to them is pretty trivial.

Yeah this is a very important point moving forward.

Jesse Brandeburg brought the following up to me at LPC and I'd like to
develop it further.

Right now we tell driver authors to write a new driver as SKB-based,
and once they've done all of that work we tell them to basically
shoe-horn XDP support into that somewhat different framework.

Instead, the model should be the other way around, because with a raw,
metadata-free set of data buffers we can always construct an SKB or
pass it to XDP.

So drivers should be targeting some raw data buffer kind of interface
which takes care of all of this stuff.  If the buffers get wrapped
into an SKB and get pushed into the traditional networking stack, the
driver shouldn't know or care.  Likewise if it ends up being processed
with XDP, it should not need to know or care.

All of those details should be behind a common layer.  Then we can
control:

1) Buffer handling, recycling, "fast paths"

2) Statistics

3) XDP feature sets

We can consolidate behavior and semantics across all of the drivers
if we do this.  No more talk about "supporting all XDP features",
and the inconsistencies we have because of that.

The whole common statistics discussion could be resolved with this
common layer as well.

We'd be able to control and properly optimize everything.

^ permalink raw reply	[flat|nested] 30+ messages in thread
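
To make the raw-buffer idea above concrete, one possible shape for such
an interface.  All names here are invented; no such API exists in the
kernel, this is purely a sketch of the proposal:

        /* Invented names -- a sketch of the proposal, not an existing API. */
        struct rx_frame {
                void            *data;          /* start of packet data */
                unsigned int    len;            /* packet length */
                unsigned int    headroom;       /* space reserved before the data */
                struct page     *page;          /* backing page, owned by the pool */
        };

        /*
         * The driver hands every received frame to the core.  The core
         * runs XDP if a program is attached and only builds an skb for
         * frames that pass (or when no program is loaded); the driver
         * never knows or cares which path was taken.  Buffer recycling,
         * statistics and XDP feature handling all live behind this call.
         */
        int netif_receive_frame(struct net_device *dev, struct rx_frame *frame);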

* Re: [net-next, RFC, 4/8] net: core: add recycle capabilities on skbs via page_pool API
  2018-12-08 20:21         ` David Miller
@ 2018-12-08 20:29           ` Ilias Apalodimas
  2018-12-10  9:51           ` Saeed Mahameed
  1 sibling, 0 replies; 30+ messages in thread
From: Ilias Apalodimas @ 2018-12-08 20:29 UTC (permalink / raw)
  To: David Miller
  Cc: eric.dumazet, fw, brouer, netdev, toke, ard.biesheuvel, jasowang,
	bjorn.topel, w, saeedm, mykyta.iziumtsev, borkmann,
	alexei.starovoitov, tariqt

On Sat, Dec 08, 2018 at 12:21:10PM -0800, David Miller wrote:
> From: Ilias Apalodimas <ilias.apalodimas@linaro.org>
> Date: Sat, 8 Dec 2018 16:57:28 +0200
> 
> > The patchset speeds up the mvneta driver on the default network
> > stack. The only change that was needed was to adapt the driver to
> > use the page_pool API. The speed improvements we are seeing on
> > specific workloads (i.e. 256b < packet < 400b) are almost 3x.
> > 
> > Lots of high-speed drivers are doing similar recycling tricks themselves (and
> > there's no common code; everyone is doing something similar, though). All we
> > are trying to do is provide a unified API to make that easier for the rest.
> > Another advantage is that if some drivers switch to the API, adding XDP
> > functionality to them is pretty trivial.
> 
> Yeah this is a very important point moving forward.
> 
> Jesse Brandeburg brought the following up to me at LPC and I'd like to
> develop it further.
> 
> Right now we tell driver authors to write a new driver as SKB-based,
> and once they've done all of that work we tell them to basically
> shoe-horn XDP support into that somewhat different framework.
>
> Instead, the model should be the other way around, because with a raw,
> metadata-free set of data buffers we can always construct an SKB or
> pass it to XDP.

Yeah, exactly, and it gets even worse. If the driver writer doesn't go through
the 'proper' path, i.e. allocate buffers and use build_skb(), you end up having
to rewrite DMA/memory management for the normal stack. So it's more than
'shoe-horning' XDP in; it's re-writing and re-testing the whole thing.
The API also offers (configurable) DMA mapping capabilities, so you remove
potential nasty bugs there as well.

> 
> So drivers should be targeting some raw data buffer kind of interface
> which takes care of all of this stuff.  If the buffers get wrapped
> into an SKB and get pushed into the traditional networking stack, the
> driver shouldn't know or care.  Likewise if it ends up being processed
> with XDP, it should not need to know or care.
> 
> All of those details should be behind a common layer.  Then we can
> control:
> 
> 1) Buffer handling, recycling, "fast paths"
> 
> 2) Statistics
> 
> 3) XDP feature sets
> 
> We can consolidate behavior and semantics across all of the drivers
> if we do this.  No more talk about "supporting all XDP features",
> and the inconsistencies we have because of that.
> 
> The whole common statistics discussion could be resolved with this
> common layer as well.
> 
> We'd be able to control and properly optimize everything.


/Ilias

^ permalink raw reply	[flat|nested] 30+ messages in thread
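
For illustration, the 'proper' RX path referred to above, in rough form.
build_skb(), skb_reserve(), skb_put() and XDP_PASS are real kernel APIs;
the my_* queue type and helpers are invented for this sketch.  XDP sees
the raw buffer first, and only XDP_PASS frames pay the skb cost:

        #include <linux/skbuff.h>
        #include <linux/bpf.h>

        struct my_rxq {                         /* hypothetical per-queue state */
                unsigned int buf_truesize;      /* full buffer size, incl. shared info */
                unsigned int headroom;          /* offset of packet data in the buffer */
        };

        unsigned int my_run_xdp(struct my_rxq *q, void *data, int len);

        static struct sk_buff *rx_one_frame(struct my_rxq *q, void *data, int len)
        {
                struct sk_buff *skb;

                /* Run XDP on the raw buffer, before any skb exists. */
                switch (my_run_xdp(q, data, len)) {
                case XDP_PASS:
                        break;                  /* continue to the normal stack */
                default:
                        return NULL;            /* dropped/redirected; buffer recycled */
                }

                /* Only passed frames get wrapped in an skb. */
                skb = build_skb(data, q->buf_truesize);
                if (!skb)
                        return NULL;
                skb_reserve(skb, q->headroom);
                skb_put(skb, len);
                return skb;
        }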

* Re: [net-next, RFC, 4/8] net: core: add recycle capabilities on skbs via page_pool API
  2018-12-08 20:14             ` Ilias Apalodimas
@ 2018-12-08 21:06               ` Willy Tarreau
  2018-12-10  7:54                 ` Ilias Apalodimas
  2018-12-08 22:36               ` Eric Dumazet
  1 sibling, 1 reply; 30+ messages in thread
From: Willy Tarreau @ 2018-12-08 21:06 UTC (permalink / raw)
  To: Ilias Apalodimas
  Cc: Jesper Dangaard Brouer, Eric Dumazet, Florian Westphal, netdev,
	David S. Miller, Toke Høiland-Jørgensen,
	ard.biesheuvel, Jason Wang, BjörnTöpel, Saeed Mahameed,
	mykyta.iziumtsev, Daniel Borkmann, Alexei Starovoitov,
	Tariq Toukan

On Sat, Dec 08, 2018 at 10:14:47PM +0200, Ilias Apalodimas wrote:
> On Sat, Dec 08, 2018 at 09:11:53PM +0100, Jesper Dangaard Brouer wrote:
> > >  
> > > I want to make sure you guys thought about splice() stuff, and
> > > skb_try_coalesce(), and GRO, and skb cloning, and ...
> > 
> > Thanks for the pointers. Ilias, we need to double-check the skb_try_coalesce()
> > code path, as it does look like we don't handle this correctly.
> > 
> 
> Noted. We did check skb cloning, but indeed not this one.

I'll try to run some tests on my clearfog later, but for now I'm still
100% busy until haproxy 1.9 gets released.

Cheers,
Willy

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [net-next, RFC, 4/8] net: core: add recycle capabilities on skbs via page_pool API
  2018-12-08 20:14             ` Ilias Apalodimas
  2018-12-08 21:06               ` Willy Tarreau
@ 2018-12-08 22:36               ` Eric Dumazet
  1 sibling, 0 replies; 30+ messages in thread
From: Eric Dumazet @ 2018-12-08 22:36 UTC (permalink / raw)
  To: Ilias Apalodimas, Jesper Dangaard Brouer
  Cc: Florian Westphal, netdev, David S. Miller,
	Toke Høiland-Jørgensen, ard.biesheuvel, Jason Wang,
	BjörnTöpel, w, Saeed Mahameed, mykyta.iziumtsev,
	Daniel Borkmann, Alexei Starovoitov, Tariq Toukan



On 12/08/2018 12:14 PM, Ilias Apalodimas wrote:
> On Sat, Dec 08, 2018 at 09:11:53PM +0100, Jesper Dangaard Brouer wrote:
>>>  
>>> I want to make sure you guys thought about splice() stuff, and
>>> skb_try_coalesce(), and GRO, and skb cloning, and ...
>>
>> Thanks for the pointers. Ilias, we need to double-check the skb_try_coalesce()
>> code path, as it does look like we don't handle this correctly.
>>
> 
> Noted. We did check skb cloning, but indeed not this one.

There is also TCP RX zero-copy to consider (tcp_zerocopy_receive()), but more
generally all paths that can grab pages from skbs.

^ permalink raw reply	[flat|nested] 30+ messages in thread
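
An illustration of why the paths Eric lists matter for recycling:
anything that takes an extra reference on a page backing an skb
fragment makes the page shared, and a recycler that reuses it anyway
would overwrite data the other holder still sees.  skb_shinfo(),
skb_frag_page() and get_page() are real helpers; the function itself
is a contrived example:

        #include <linux/skbuff.h>

        static struct page *grab_frag_page(struct sk_buff *skb, int i)
        {
                struct page *page = skb_frag_page(&skb_shinfo(skb)->frags[i]);

                /* After this, page_ref_count(page) > 1: the page must not
                 * be recycled as a fresh RX buffer until this reference
                 * (held by a splice pipe, a coalesced skb, or a zerocopy
                 * mapping) is dropped.
                 */
                get_page(page);
                return page;
        }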

* Re: [net-next, RFC, 4/8] net: core: add recycle capabilities on skbs via page_pool API
  2018-12-08 21:06               ` Willy Tarreau
@ 2018-12-10  7:54                 ` Ilias Apalodimas
  0 siblings, 0 replies; 30+ messages in thread
From: Ilias Apalodimas @ 2018-12-10  7:54 UTC (permalink / raw)
  To: Willy Tarreau
  Cc: Jesper Dangaard Brouer, Eric Dumazet, Florian Westphal, netdev,
	David S. Miller, Toke Høiland-Jørgensen,
	ard.biesheuvel, Jason Wang, BjörnTöpel, Saeed Mahameed,
	mykyta.iziumtsev, Daniel Borkmann, Alexei Starovoitov,
	Tariq Toukan

Hi Willy,
> 
> I'll try to run some tests on my clearfog later, but for now I'm still
> 100% busy until haproxy 1.9 gets released.

Great. Keep in mind there are multiple hardware variants this driver sits on.
If your clearfog hardware uses the software buffer manager, then you can test.
We haven't made any changes to the 'hardware buffer manager' portion of the
code, though.

Test-wise, you should see similar numbers for 64b-256b packets, since the
driver already recycled buffers internally for packets of that size. The tests
we did with 256b-512b packets showed significant increases.


Thanks!
/Ilias

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [net-next, RFC, 4/8] net: core: add recycle capabilities on skbs via page_pool API
  2018-12-08 20:21         ` David Miller
  2018-12-08 20:29           ` Ilias Apalodimas
@ 2018-12-10  9:51           ` Saeed Mahameed
  1 sibling, 0 replies; 30+ messages in thread
From: Saeed Mahameed @ 2018-12-10  9:51 UTC (permalink / raw)
  To: David S. Miller
  Cc: ilias.apalodimas, Eric Dumazet, fw, Jesper Dangaard Brouer,
	Linux Netdev List, toke, ard.biesheuvel, jasowang, bjorn.topel,
	w, Saeed Mahameed, mykyta.iziumtsev, Daniel Borkmann,
	Alexei Starovoitov, Tariq Toukan

On Sat, Dec 8, 2018 at 12:21 PM David Miller <davem@davemloft.net> wrote:
>
> From: Ilias Apalodimas <ilias.apalodimas@linaro.org>
> Date: Sat, 8 Dec 2018 16:57:28 +0200
>
> > The patchset speeds up the mvneta driver on the default network
> > stack. The only change that was needed was to adapt the driver to
> > use the page_pool API. The speed improvements we are seeing on
> > specific workloads (i.e. 256b < packet < 400b) are almost 3x.
> >
> > Lots of high-speed drivers are doing similar recycling tricks themselves (and
> > there's no common code; everyone is doing something similar, though). All we
> > are trying to do is provide a unified API to make that easier for the rest.
> > Another advantage is that if some drivers switch to the API, adding XDP
> > functionality to them is pretty trivial.
>
> Yeah this is a very important point moving forward.
>
> Jesse Brandeburg brought the following up to me at LPC and I'd like to
> develop it further.
>
> Right now we tell driver authors to write a new driver as SKB-based,
> and once they've done all of that work we tell them to basically
> shoe-horn XDP support into that somewhat different framework.
>
> Instead, the model should be the other way around, because with a raw,
> metadata-free set of data buffers we can always construct an SKB or
> pass it to XDP.
>

Yep. One concern is how to integrate the BTF approach to let the stack
translate hw-specific metadata from that raw buffer when populating
stack-generated skbs. We will need a middle format that both the driver
and the stack can understand; userspace XDP programs will get the format
from the driver and compile themselves against it, but we can't do this
in the kernel!

> So drivers should be targeting some raw data buffer kind of interface
> which takes care of all of this stuff.  If the buffers get wrapped
> into an SKB and get pushed into the traditional networking stack, the
> driver shouldn't know or care.  Likewise if it ends up being processed
> with XDP, it should not need to know or care.
>

We need XDP-path-exclusive features, like this patchset's page pool, to
have an advantage over the conventional way. Otherwise it is always
faster and more appealing to build skbs at the driver level.

We also need to consider legacy hardware that might not be able to
fully support such raw buffers (for example, the ability to preserve
per-packet headroom).

> All of those details should be behind a common layer.  Then we can
> control:
>
> 1) Buffer handling, recycling, "fast paths"
>
> 2) Statistics
>
> 3) XDP feature sets
>
> We can consolidate behavior and semantics across all of the drivers
> if we do this.  No more talk about "supporting all XDP features",
> and the inconsistencies we have because of that.
>
> The whole common statistics discussion could be resolved with this
> common layer as well.
>
> We'd be able to control and properly optimize everything.

^ permalink raw reply	[flat|nested] 30+ messages in thread
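
To make the BTF point above concrete, a purely hypothetical example of
the hw-specific metadata a driver might place in the buffer headroom
and describe via BTF, so that XDP programs and a stack-side translation
layer can both parse it.  No driver defines this struct; the layout is
driver-specific and would be exported via BTF rather than a shared
header:

        #include <linux/types.h>

        /* Hypothetical hw-specific RX metadata, described via BTF. */
        struct mydrv_rx_metadata {
                __u64   hw_timestamp;   /* raw NIC RX timestamp */
                __u32   rx_hash;        /* RSS hash computed by the NIC */
                __u16   vlan_tci;       /* VLAN tag stripped by the NIC */
                __u8    csum_level;     /* how many checksums the NIC validated */
                __u8    flags;          /* e.g. timestamp valid, L4 csum ok */
        };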

end of thread

Thread overview: 30+ messages
2018-12-06 23:25 [net-next PATCH RFC 0/8] page_pool DMA handling and allow to recycles frames via SKB Jesper Dangaard Brouer
2018-12-06 23:25 ` [net-next PATCH RFC 1/8] page_pool: add helper functions for DMA Jesper Dangaard Brouer
2018-12-08  7:06   ` David Miller
2018-12-08  7:55     ` Ilias Apalodimas
2018-12-06 23:25 ` [net-next PATCH RFC 2/8] net: mvneta: use page pool API for sw buffer manager Jesper Dangaard Brouer
2018-12-06 23:25 ` [net-next PATCH RFC 3/8] xdp: reduce size of struct xdp_mem_info Jesper Dangaard Brouer
2018-12-06 23:25 ` [net-next PATCH RFC 4/8] net: core: add recycle capabilities on skbs via page_pool API Jesper Dangaard Brouer
2018-12-08  7:15   ` David Miller
2018-12-08  7:54     ` Ilias Apalodimas
2018-12-08  9:57   ` [net-next, RFC, " Florian Westphal
2018-12-08 11:36     ` Jesper Dangaard Brouer
2018-12-08 20:10       ` David Miller
2018-12-08 12:29     ` Eric Dumazet
2018-12-08 12:34       ` Eric Dumazet
2018-12-08 13:45       ` Jesper Dangaard Brouer
2018-12-08 14:57       ` Ilias Apalodimas
2018-12-08 17:07         ` Andrew Lunn
2018-12-08 19:26         ` Eric Dumazet
2018-12-08 20:11           ` Jesper Dangaard Brouer
2018-12-08 20:14             ` Ilias Apalodimas
2018-12-08 21:06               ` Willy Tarreau
2018-12-10  7:54                 ` Ilias Apalodimas
2018-12-08 22:36               ` Eric Dumazet
2018-12-08 20:21         ` David Miller
2018-12-08 20:29           ` Ilias Apalodimas
2018-12-10  9:51           ` Saeed Mahameed
2018-12-06 23:25 ` [net-next PATCH RFC 5/8] net: mvneta: remove copybreak, prefetch and use build_skb Jesper Dangaard Brouer
2018-12-06 23:25 ` [net-next PATCH RFC 6/8] mvneta: activate page recycling via skb using page_pool Jesper Dangaard Brouer
2018-12-06 23:26 ` [net-next PATCH RFC 7/8] xdp: bpf: cpumap redirect must update skb->mem_info Jesper Dangaard Brouer
2018-12-06 23:26 ` [net-next PATCH RFC 8/8] veth: xdp_frames redirected into veth need to transfer xdp_mem_info Jesper Dangaard Brouer
