bpf.vger.kernel.org archive mirror
* [PATCH v8 bpf-next 00/14] mvneta: introduce XDP multi-buffer support
@ 2021-04-08 12:50 Lorenzo Bianconi
From: Lorenzo Bianconi @ 2021-04-08 12:50 UTC (permalink / raw)
  To: bpf, netdev
  Cc: lorenzo.bianconi, davem, kuba, ast, daniel, shayagr, sameehj,
	john.fastabend, dsahern, brouer, echaudro, jasowang,
	alexander.duyck, saeed, maciej.fijalkowski

This series introduces XDP multi-buffer support. The mvneta driver is
the first to support these new "non-linear" xdp_{buff,frame}. Reviewers,
please focus on how these new types of xdp_{buff,frame} packets
traverse the different layers, and on the layout design. The BPF helpers
are deliberately kept simple, as we do not want to expose the internal
layout, so that it can still be changed later.

For now, to keep the design simple and to maintain performance, the XDP
BPF program (still) only has access to the first buffer. Payload access
across multiple buffers is left for a later patchset, and this patchset
should still allow for such future extensions. The goal is to lift the
MTU restriction that comes with XDP while maintaining the same
performance as before.

The main idea for the new multi-buffer layout is to reuse the same
layout used for non-linear SKBs. We introduce an "xdp_shared_info" data
structure at the end of the first buffer to link together subsequent buffers.
xdp_shared_info aliases skb_shared_info, allowing most of the frags to be
kept in the same cache line (whereas with skb_shared_info only the first
fragment is placed in the first "shared_info" cache line). Moreover, we
introduce some xdp_shared_info helpers aligned with the skb_frag* ones.
Converting an xdp_frame to an SKB and delivering it to the network stack is
shown in patch 07/14. When building the SKB, the xdp_shared_info structure
is converted into a skb_shared_info one.

A multi-buffer bit (mb) has been introduced in the xdp_{buff,frame}
structures to notify the bpf/network layer whether this is an xdp
multi-buffer frame (mb = 1) or not (mb = 0).
The mb bit will be set by an xdp multi-buffer capable driver only for
non-linear frames, maintaining the ability to receive linear frames
without any extra cost, since the xdp_shared_info structure at the end
of the first buffer is initialized only if mb is set.

Typical use cases for this series are:
- Jumbo-frames
- Packet header split (please see Google’s use-case @ NetDevConf 0x14, [0])
- TSO

A new frame_length field has been introduced in the XDP ctx in order to
notify the eBPF layer about the total frame size (linear + paged parts).

The bpf_xdp_adjust_tail and bpf_xdp_copy helpers have been modified to
take xdp multi-buff frames into account.

More info about the main idea behind this approach can be found here [1][2].

Changes since v7:
- rebase on top of bpf-next
- fix sparse warnings
- improve comments for frame_length in include/net/xdp.h

Changes since v6:
- the main difference with respect to previous versions is the new approach
  proposed by Eelco to pass the full length of the packet to the eBPF layer
  in the XDP context
- reintroduce multi-buff support to eBPF kselftests
- reintroduce multi-buff support to bpf_xdp_adjust_tail helper
- introduce multi-buffer support to bpf_xdp_copy helper
- rebase on top of bpf-next

Changes since v5:
- rebase on top of bpf-next
- initialize mb bit in xdp_init_buff() and drop per-driver initialization
- drop xdp->mb initialization in xdp_convert_zc_to_xdp_frame()
- postpone introduction of frame_length field in XDP ctx to another series
- minor changes

Changes since v4:
- rebase on top of bpf-next
- introduce xdp_shared_info to build xdp multi-buff instead of using the
  skb_shared_info struct
- introduce frame_length in xdp ctx
- drop previous bpf helpers
- fix bpf_xdp_adjust_tail for xdp multi-buff
- introduce xdp multi-buff self-tests for bpf_xdp_adjust_tail
- fix xdp_return_frame_bulk for xdp multi-buff

Changes since v3:
- rebase on top of bpf-next
- add patch 10/13 to copy back paged data from a xdp multi-buff frame to
  userspace buffer for xdp multi-buff selftests

Changes since v2:
- add throughput measurements
- drop bpf_xdp_adjust_mb_header bpf helper
- introduce selftest for xdp multibuffer
- addressed comments on bpf_xdp_get_frags_count
- introduce xdp multi-buff support to cpumaps

Changes since v1:
- Fix use-after-free in xdp_return_{buff/frame}
- Introduce bpf helpers
- Introduce xdp_mb sample program
- access skb_shared_info->nr_frags only on the last fragment

Changes since RFC:
- squash multi-buffer bit initialization in a single patch
- add mvneta non-linear XDP buff support for tx side

[0] https://netdevconf.info/0x14/session.html?talk-the-path-to-tcp-4k-mtu-and-rx-zerocopy
[1] https://github.com/xdp-project/xdp-project/blob/master/areas/core/xdp-multi-buffer01-design.org
[2] https://netdevconf.info/0x14/session.html?tutorial-add-XDP-support-to-a-NIC-driver (XDP multi-buffers section)

Eelco Chaudron (4):
  bpf: add multi-buff support to the bpf_xdp_adjust_tail() API
  bpf: add multi-buffer support to xdp copy helpers
  bpf: add new frame_length field to the XDP ctx
  bpf: update xdp_adjust_tail selftest to include multi-buffer

Lorenzo Bianconi (10):
  xdp: introduce mb in xdp_buff/xdp_frame
  xdp: add xdp_shared_info data structure
  net: mvneta: update mb bit before passing the xdp buffer to eBPF layer
  xdp: add multi-buff support to xdp_return_{buff/frame}
  net: mvneta: add multi buffer support to XDP_TX
  net: mvneta: enable jumbo frames for XDP
  net: xdp: add multi-buff support to xdp_build_skb_from_frame
  bpf: move user_size out of bpf_test_init
  bpf: introduce multibuff support to bpf_prog_test_run_xdp()
  bpf: test_run: add xdp_shared_info pointer in bpf_test_finish
    signature

 drivers/net/ethernet/marvell/mvneta.c         | 182 ++++++++++--------
 include/linux/filter.h                        |   7 +
 include/net/xdp.h                             | 105 +++++++++-
 include/uapi/linux/bpf.h                      |   1 +
 net/bpf/test_run.c                            | 109 +++++++++--
 net/core/filter.c                             | 134 ++++++++++++-
 net/core/xdp.c                                | 103 +++++++++-
 tools/include/uapi/linux/bpf.h                |   1 +
 .../bpf/prog_tests/xdp_adjust_tail.c          | 105 ++++++++++
 .../selftests/bpf/prog_tests/xdp_bpf2bpf.c    | 127 ++++++++----
 .../bpf/progs/test_xdp_adjust_tail_grow.c     |  17 +-
 .../bpf/progs/test_xdp_adjust_tail_shrink.c   |  32 ++-
 .../selftests/bpf/progs/test_xdp_bpf2bpf.c    |   3 +-
 13 files changed, 767 insertions(+), 159 deletions(-)

-- 
2.30.2



* [PATCH v8 bpf-next 01/14] xdp: introduce mb in xdp_buff/xdp_frame
From: Lorenzo Bianconi @ 2021-04-08 12:50 UTC (permalink / raw)
  To: bpf, netdev
  Cc: lorenzo.bianconi, davem, kuba, ast, daniel, shayagr, sameehj,
	john.fastabend, dsahern, brouer, echaudro, jasowang,
	alexander.duyck, saeed, maciej.fijalkowski

Introduce a multi-buffer bit (mb) in the xdp_buff/xdp_frame data structures
in order to specify whether this is a linear buffer (mb = 0) or a
multi-buffer frame (mb = 1). In the latter case the shared_info area at the
end of the first buffer will be properly initialized to link together
subsequent buffers.

Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
---
 include/net/xdp.h | 11 +++++++++--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/include/net/xdp.h b/include/net/xdp.h
index a5bc214a49d9..842580a61563 100644
--- a/include/net/xdp.h
+++ b/include/net/xdp.h
@@ -73,7 +73,10 @@ struct xdp_buff {
 	void *data_hard_start;
 	struct xdp_rxq_info *rxq;
 	struct xdp_txq_info *txq;
-	u32 frame_sz; /* frame size to deduce data_hard_end/reserved tailroom*/
+	u32 frame_sz:31; /* frame size to deduce data_hard_end/reserved
+			  * tailroom
+			  */
+	u32 mb:1; /* xdp non-linear buffer */
 };
 
 static __always_inline void
@@ -81,6 +84,7 @@ xdp_init_buff(struct xdp_buff *xdp, u32 frame_sz, struct xdp_rxq_info *rxq)
 {
 	xdp->frame_sz = frame_sz;
 	xdp->rxq = rxq;
+	xdp->mb = 0;
 }
 
 static __always_inline void
@@ -116,7 +120,8 @@ struct xdp_frame {
 	u16 len;
 	u16 headroom;
 	u32 metasize:8;
-	u32 frame_sz:24;
+	u32 frame_sz:23;
+	u32 mb:1; /* xdp non-linear frame */
 	/* Lifetime of xdp_rxq_info is limited to NAPI/enqueue time,
 	 * while mem info is valid on remote CPU.
 	 */
@@ -179,6 +184,7 @@ void xdp_convert_frame_to_buff(struct xdp_frame *frame, struct xdp_buff *xdp)
 	xdp->data_end = frame->data + frame->len;
 	xdp->data_meta = frame->data - frame->metasize;
 	xdp->frame_sz = frame->frame_sz;
+	xdp->mb = frame->mb;
 }
 
 static inline
@@ -205,6 +211,7 @@ int xdp_update_frame_from_buff(struct xdp_buff *xdp,
 	xdp_frame->headroom = headroom - sizeof(*xdp_frame);
 	xdp_frame->metasize = metasize;
 	xdp_frame->frame_sz = xdp->frame_sz;
+	xdp_frame->mb = xdp->mb;
 
 	return 0;
 }
-- 
2.30.2



* [PATCH v8 bpf-next 02/14] xdp: add xdp_shared_info data structure
From: Lorenzo Bianconi @ 2021-04-08 12:50 UTC (permalink / raw)
  To: bpf, netdev
  Cc: lorenzo.bianconi, davem, kuba, ast, daniel, shayagr, sameehj,
	john.fastabend, dsahern, brouer, echaudro, jasowang,
	alexander.duyck, saeed, maciej.fijalkowski

Introduce an xdp_shared_info data structure to contain info about
"non-linear" xdp frames. xdp_shared_info will alias skb_shared_info,
allowing most of the frags to be kept in the same cache line.
Introduce some xdp_shared_info helpers aligned with the skb_frag* ones.

Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
---
 drivers/net/ethernet/marvell/mvneta.c | 62 +++++++++++++++------------
 include/net/xdp.h                     | 55 ++++++++++++++++++++++--
 2 files changed, 85 insertions(+), 32 deletions(-)

diff --git a/drivers/net/ethernet/marvell/mvneta.c b/drivers/net/ethernet/marvell/mvneta.c
index f20dfd1d7a6b..a52e132fd2cf 100644
--- a/drivers/net/ethernet/marvell/mvneta.c
+++ b/drivers/net/ethernet/marvell/mvneta.c
@@ -2036,14 +2036,17 @@ int mvneta_rx_refill_queue(struct mvneta_port *pp, struct mvneta_rx_queue *rxq)
 
 static void
 mvneta_xdp_put_buff(struct mvneta_port *pp, struct mvneta_rx_queue *rxq,
-		    struct xdp_buff *xdp, struct skb_shared_info *sinfo,
+		    struct xdp_buff *xdp, struct xdp_shared_info *xdp_sinfo,
 		    int sync_len)
 {
 	int i;
 
-	for (i = 0; i < sinfo->nr_frags; i++)
+	for (i = 0; i < xdp_sinfo->nr_frags; i++) {
+		skb_frag_t *frag = &xdp_sinfo->frags[i];
+
 		page_pool_put_full_page(rxq->page_pool,
-					skb_frag_page(&sinfo->frags[i]), true);
+					xdp_get_frag_page(frag), true);
+	}
 	page_pool_put_page(rxq->page_pool, virt_to_head_page(xdp->data),
 			   sync_len, true);
 }
@@ -2181,7 +2184,7 @@ mvneta_run_xdp(struct mvneta_port *pp, struct mvneta_rx_queue *rxq,
 	       struct bpf_prog *prog, struct xdp_buff *xdp,
 	       u32 frame_sz, struct mvneta_stats *stats)
 {
-	struct skb_shared_info *sinfo = xdp_get_shared_info_from_buff(xdp);
+	struct xdp_shared_info *xdp_sinfo = xdp_get_shared_info_from_buff(xdp);
 	unsigned int len, data_len, sync;
 	u32 ret, act;
 
@@ -2202,7 +2205,7 @@ mvneta_run_xdp(struct mvneta_port *pp, struct mvneta_rx_queue *rxq,
 
 		err = xdp_do_redirect(pp->dev, xdp, prog);
 		if (unlikely(err)) {
-			mvneta_xdp_put_buff(pp, rxq, xdp, sinfo, sync);
+			mvneta_xdp_put_buff(pp, rxq, xdp, xdp_sinfo, sync);
 			ret = MVNETA_XDP_DROPPED;
 		} else {
 			ret = MVNETA_XDP_REDIR;
@@ -2213,7 +2216,7 @@ mvneta_run_xdp(struct mvneta_port *pp, struct mvneta_rx_queue *rxq,
 	case XDP_TX:
 		ret = mvneta_xdp_xmit_back(pp, xdp);
 		if (ret != MVNETA_XDP_TX)
-			mvneta_xdp_put_buff(pp, rxq, xdp, sinfo, sync);
+			mvneta_xdp_put_buff(pp, rxq, xdp, xdp_sinfo, sync);
 		break;
 	default:
 		bpf_warn_invalid_xdp_action(act);
@@ -2222,7 +2225,7 @@ mvneta_run_xdp(struct mvneta_port *pp, struct mvneta_rx_queue *rxq,
 		trace_xdp_exception(pp->dev, prog, act);
 		fallthrough;
 	case XDP_DROP:
-		mvneta_xdp_put_buff(pp, rxq, xdp, sinfo, sync);
+		mvneta_xdp_put_buff(pp, rxq, xdp, xdp_sinfo, sync);
 		ret = MVNETA_XDP_DROPPED;
 		stats->xdp_drop++;
 		break;
@@ -2243,9 +2246,9 @@ mvneta_swbm_rx_frame(struct mvneta_port *pp,
 {
 	unsigned char *data = page_address(page);
 	int data_len = -MVNETA_MH_SIZE, len;
+	struct xdp_shared_info *xdp_sinfo;
 	struct net_device *dev = pp->dev;
 	enum dma_data_direction dma_dir;
-	struct skb_shared_info *sinfo;
 
 	if (*size > MVNETA_MAX_RX_BUF_SIZE) {
 		len = MVNETA_MAX_RX_BUF_SIZE;
@@ -2268,8 +2271,8 @@ mvneta_swbm_rx_frame(struct mvneta_port *pp,
 	xdp_prepare_buff(xdp, data, pp->rx_offset_correction + MVNETA_MH_SIZE,
 			 data_len, false);
 
-	sinfo = xdp_get_shared_info_from_buff(xdp);
-	sinfo->nr_frags = 0;
+	xdp_sinfo = xdp_get_shared_info_from_buff(xdp);
+	xdp_sinfo->nr_frags = 0;
 }
 
 static void
@@ -2277,7 +2280,7 @@ mvneta_swbm_add_rx_fragment(struct mvneta_port *pp,
 			    struct mvneta_rx_desc *rx_desc,
 			    struct mvneta_rx_queue *rxq,
 			    struct xdp_buff *xdp, int *size,
-			    struct skb_shared_info *xdp_sinfo,
+			    struct xdp_shared_info *xdp_sinfo,
 			    struct page *page)
 {
 	struct net_device *dev = pp->dev;
@@ -2300,13 +2303,13 @@ mvneta_swbm_add_rx_fragment(struct mvneta_port *pp,
 	if (data_len > 0 && xdp_sinfo->nr_frags < MAX_SKB_FRAGS) {
 		skb_frag_t *frag = &xdp_sinfo->frags[xdp_sinfo->nr_frags++];
 
-		skb_frag_off_set(frag, pp->rx_offset_correction);
-		skb_frag_size_set(frag, data_len);
-		__skb_frag_set_page(frag, page);
+		xdp_set_frag_offset(frag, pp->rx_offset_correction);
+		xdp_set_frag_size(frag, data_len);
+		xdp_set_frag_page(frag, page);
 
 		/* last fragment */
 		if (len == *size) {
-			struct skb_shared_info *sinfo;
+			struct xdp_shared_info *sinfo;
 
 			sinfo = xdp_get_shared_info_from_buff(xdp);
 			sinfo->nr_frags = xdp_sinfo->nr_frags;
@@ -2323,10 +2326,13 @@ static struct sk_buff *
 mvneta_swbm_build_skb(struct mvneta_port *pp, struct mvneta_rx_queue *rxq,
 		      struct xdp_buff *xdp, u32 desc_status)
 {
-	struct skb_shared_info *sinfo = xdp_get_shared_info_from_buff(xdp);
-	int i, num_frags = sinfo->nr_frags;
+	struct xdp_shared_info *xdp_sinfo = xdp_get_shared_info_from_buff(xdp);
+	int i, num_frags = xdp_sinfo->nr_frags;
+	skb_frag_t frag_list[MAX_SKB_FRAGS];
 	struct sk_buff *skb;
 
+	memcpy(frag_list, xdp_sinfo->frags, sizeof(skb_frag_t) * num_frags);
+
 	skb = build_skb(xdp->data_hard_start, PAGE_SIZE);
 	if (!skb)
 		return ERR_PTR(-ENOMEM);
@@ -2338,12 +2344,12 @@ mvneta_swbm_build_skb(struct mvneta_port *pp, struct mvneta_rx_queue *rxq,
 	mvneta_rx_csum(pp, desc_status, skb);
 
 	for (i = 0; i < num_frags; i++) {
-		skb_frag_t *frag = &sinfo->frags[i];
+		struct page *page = xdp_get_frag_page(&frag_list[i]);
 
 		skb_add_rx_frag(skb, skb_shinfo(skb)->nr_frags,
-				skb_frag_page(frag), skb_frag_off(frag),
-				skb_frag_size(frag), PAGE_SIZE);
-		page_pool_release_page(rxq->page_pool, skb_frag_page(frag));
+				page, xdp_get_frag_offset(&frag_list[i]),
+				xdp_get_frag_size(&frag_list[i]), PAGE_SIZE);
+		page_pool_release_page(rxq->page_pool, page);
 	}
 
 	return skb;
@@ -2356,7 +2362,7 @@ static int mvneta_rx_swbm(struct napi_struct *napi,
 {
 	int rx_proc = 0, rx_todo, refill, size = 0;
 	struct net_device *dev = pp->dev;
-	struct skb_shared_info sinfo;
+	struct xdp_shared_info xdp_sinfo;
 	struct mvneta_stats ps = {};
 	struct bpf_prog *xdp_prog;
 	u32 desc_status, frame_sz;
@@ -2365,7 +2371,7 @@ static int mvneta_rx_swbm(struct napi_struct *napi,
 	xdp_init_buff(&xdp_buf, PAGE_SIZE, &rxq->xdp_rxq);
 	xdp_buf.data_hard_start = NULL;
 
-	sinfo.nr_frags = 0;
+	xdp_sinfo.nr_frags = 0;
 
 	/* Get number of received packets */
 	rx_todo = mvneta_rxq_busy_desc_num_get(pp, rxq);
@@ -2409,7 +2415,7 @@ static int mvneta_rx_swbm(struct napi_struct *napi,
 			}
 
 			mvneta_swbm_add_rx_fragment(pp, rx_desc, rxq, &xdp_buf,
-						    &size, &sinfo, page);
+						    &size, &xdp_sinfo, page);
 		} /* Middle or Last descriptor */
 
 		if (!(rx_status & MVNETA_RXD_LAST_DESC))
@@ -2417,7 +2423,7 @@ static int mvneta_rx_swbm(struct napi_struct *napi,
 			continue;
 
 		if (size) {
-			mvneta_xdp_put_buff(pp, rxq, &xdp_buf, &sinfo, -1);
+			mvneta_xdp_put_buff(pp, rxq, &xdp_buf, &xdp_sinfo, -1);
 			goto next;
 		}
 
@@ -2429,7 +2435,7 @@ static int mvneta_rx_swbm(struct napi_struct *napi,
 		if (IS_ERR(skb)) {
 			struct mvneta_pcpu_stats *stats = this_cpu_ptr(pp->stats);
 
-			mvneta_xdp_put_buff(pp, rxq, &xdp_buf, &sinfo, -1);
+			mvneta_xdp_put_buff(pp, rxq, &xdp_buf, &xdp_sinfo, -1);
 
 			u64_stats_update_begin(&stats->syncp);
 			stats->es.skb_alloc_error++;
@@ -2446,12 +2452,12 @@ static int mvneta_rx_swbm(struct napi_struct *napi,
 		napi_gro_receive(napi, skb);
 next:
 		xdp_buf.data_hard_start = NULL;
-		sinfo.nr_frags = 0;
+		xdp_sinfo.nr_frags = 0;
 	}
 	rcu_read_unlock();
 
 	if (xdp_buf.data_hard_start)
-		mvneta_xdp_put_buff(pp, rxq, &xdp_buf, &sinfo, -1);
+		mvneta_xdp_put_buff(pp, rxq, &xdp_buf, &xdp_sinfo, -1);
 
 	if (ps.xdp_redirect)
 		xdp_do_flush_map();
diff --git a/include/net/xdp.h b/include/net/xdp.h
index 842580a61563..02aea7696d15 100644
--- a/include/net/xdp.h
+++ b/include/net/xdp.h
@@ -109,10 +109,54 @@ xdp_prepare_buff(struct xdp_buff *xdp, unsigned char *hard_start,
 	((xdp)->data_hard_start + (xdp)->frame_sz -	\
 	 SKB_DATA_ALIGN(sizeof(struct skb_shared_info)))
 
-static inline struct skb_shared_info *
+struct xdp_shared_info {
+	u16 nr_frags;
+	u16 data_length; /* paged area length */
+	skb_frag_t frags[MAX_SKB_FRAGS];
+};
+
+static inline struct xdp_shared_info *
 xdp_get_shared_info_from_buff(struct xdp_buff *xdp)
 {
-	return (struct skb_shared_info *)xdp_data_hard_end(xdp);
+	BUILD_BUG_ON(sizeof(struct xdp_shared_info) >
+		     sizeof(struct skb_shared_info));
+	return (struct xdp_shared_info *)xdp_data_hard_end(xdp);
+}
+
+static inline struct page *xdp_get_frag_page(const skb_frag_t *frag)
+{
+	return frag->bv_page;
+}
+
+static inline unsigned int xdp_get_frag_offset(const skb_frag_t *frag)
+{
+	return frag->bv_offset;
+}
+
+static inline unsigned int xdp_get_frag_size(const skb_frag_t *frag)
+{
+	return frag->bv_len;
+}
+
+static inline void *xdp_get_frag_address(const skb_frag_t *frag)
+{
+	return page_address(xdp_get_frag_page(frag)) +
+	       xdp_get_frag_offset(frag);
+}
+
+static inline void xdp_set_frag_page(skb_frag_t *frag, struct page *page)
+{
+	frag->bv_page = page;
+}
+
+static inline void xdp_set_frag_offset(skb_frag_t *frag, u32 offset)
+{
+	frag->bv_offset = offset;
+}
+
+static inline void xdp_set_frag_size(skb_frag_t *frag, u32 size)
+{
+	frag->bv_len = size;
 }
 
 struct xdp_frame {
@@ -142,12 +186,15 @@ static __always_inline void xdp_frame_bulk_init(struct xdp_frame_bulk *bq)
 	bq->xa = NULL;
 }
 
-static inline struct skb_shared_info *
+static inline struct xdp_shared_info *
 xdp_get_shared_info_from_frame(struct xdp_frame *frame)
 {
 	void *data_hard_start = frame->data - frame->headroom - sizeof(*frame);
 
-	return (struct skb_shared_info *)(data_hard_start + frame->frame_sz -
+	/* xdp_shared_info struct must be aligned to skb_shared_info
+	 * area in buffer tailroom
+	 */
+	return (struct xdp_shared_info *)(data_hard_start + frame->frame_sz -
 				SKB_DATA_ALIGN(sizeof(struct skb_shared_info)));
 }
 
-- 
2.30.2



* [PATCH v8 bpf-next 03/14] net: mvneta: update mb bit before passing the xdp buffer to eBPF layer
From: Lorenzo Bianconi @ 2021-04-08 12:50 UTC (permalink / raw)
  To: bpf, netdev
  Cc: lorenzo.bianconi, davem, kuba, ast, daniel, shayagr, sameehj,
	john.fastabend, dsahern, brouer, echaudro, jasowang,
	alexander.duyck, saeed, maciej.fijalkowski

Update the multi-buffer bit (mb) in xdp_buff to notify the XDP/eBPF layer
and XDP remote drivers whether this is a "non-linear" XDP buffer. Access
the xdp_shared_info only if the xdp_buff mb bit is set.

Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
---
 drivers/net/ethernet/marvell/mvneta.c | 26 ++++++++++++++++++++------
 1 file changed, 20 insertions(+), 6 deletions(-)

diff --git a/drivers/net/ethernet/marvell/mvneta.c b/drivers/net/ethernet/marvell/mvneta.c
index a52e132fd2cf..94e29cce693a 100644
--- a/drivers/net/ethernet/marvell/mvneta.c
+++ b/drivers/net/ethernet/marvell/mvneta.c
@@ -2041,12 +2041,16 @@ mvneta_xdp_put_buff(struct mvneta_port *pp, struct mvneta_rx_queue *rxq,
 {
 	int i;
 
+	if (likely(!xdp->mb))
+		goto out;
+
 	for (i = 0; i < xdp_sinfo->nr_frags; i++) {
 		skb_frag_t *frag = &xdp_sinfo->frags[i];
 
 		page_pool_put_full_page(rxq->page_pool,
 					xdp_get_frag_page(frag), true);
 	}
+out:
 	page_pool_put_page(rxq->page_pool, virt_to_head_page(xdp->data),
 			   sync_len, true);
 }
@@ -2246,7 +2250,6 @@ mvneta_swbm_rx_frame(struct mvneta_port *pp,
 {
 	unsigned char *data = page_address(page);
 	int data_len = -MVNETA_MH_SIZE, len;
-	struct xdp_shared_info *xdp_sinfo;
 	struct net_device *dev = pp->dev;
 	enum dma_data_direction dma_dir;
 
@@ -2270,9 +2273,6 @@ mvneta_swbm_rx_frame(struct mvneta_port *pp,
 	prefetch(data);
 	xdp_prepare_buff(xdp, data, pp->rx_offset_correction + MVNETA_MH_SIZE,
 			 data_len, false);
-
-	xdp_sinfo = xdp_get_shared_info_from_buff(xdp);
-	xdp_sinfo->nr_frags = 0;
 }
 
 static void
@@ -2307,12 +2307,18 @@ mvneta_swbm_add_rx_fragment(struct mvneta_port *pp,
 		xdp_set_frag_size(frag, data_len);
 		xdp_set_frag_page(frag, page);
 
+		if (!xdp->mb) {
+			xdp_sinfo->data_length = *size;
+			xdp->mb = 1;
+		}
 		/* last fragment */
 		if (len == *size) {
 			struct xdp_shared_info *sinfo;
 
 			sinfo = xdp_get_shared_info_from_buff(xdp);
 			sinfo->nr_frags = xdp_sinfo->nr_frags;
+			sinfo->data_length = xdp_sinfo->data_length;
+
 			memcpy(sinfo->frags, xdp_sinfo->frags,
 			       sinfo->nr_frags * sizeof(skb_frag_t));
 		}
@@ -2327,11 +2333,15 @@ mvneta_swbm_build_skb(struct mvneta_port *pp, struct mvneta_rx_queue *rxq,
 		      struct xdp_buff *xdp, u32 desc_status)
 {
 	struct xdp_shared_info *xdp_sinfo = xdp_get_shared_info_from_buff(xdp);
-	int i, num_frags = xdp_sinfo->nr_frags;
 	skb_frag_t frag_list[MAX_SKB_FRAGS];
+	int i, num_frags = 0;
 	struct sk_buff *skb;
 
-	memcpy(frag_list, xdp_sinfo->frags, sizeof(skb_frag_t) * num_frags);
+	if (unlikely(xdp->mb)) {
+		num_frags = xdp_sinfo->nr_frags;
+		memcpy(frag_list, xdp_sinfo->frags,
+		       sizeof(skb_frag_t) * num_frags);
+	}
 
 	skb = build_skb(xdp->data_hard_start, PAGE_SIZE);
 	if (!skb)
@@ -2343,6 +2353,9 @@ mvneta_swbm_build_skb(struct mvneta_port *pp, struct mvneta_rx_queue *rxq,
 	skb_put(skb, xdp->data_end - xdp->data);
 	mvneta_rx_csum(pp, desc_status, skb);
 
+	if (likely(!xdp->mb))
+		return skb;
+
 	for (i = 0; i < num_frags; i++) {
 		struct page *page = xdp_get_frag_page(&frag_list[i]);
 
@@ -2404,6 +2417,7 @@ static int mvneta_rx_swbm(struct napi_struct *napi,
 			frame_sz = size - ETH_FCS_LEN;
 			desc_status = rx_status;
 
+			xdp_buf.mb = 0;
 			mvneta_swbm_rx_frame(pp, rx_desc, rxq, &xdp_buf,
 					     &size, page);
 		} else {
-- 
2.30.2



* [PATCH v8 bpf-next 04/14] xdp: add multi-buff support to xdp_return_{buff/frame}
From: Lorenzo Bianconi @ 2021-04-08 12:50 UTC (permalink / raw)
  To: bpf, netdev
  Cc: lorenzo.bianconi, davem, kuba, ast, daniel, shayagr, sameehj,
	john.fastabend, dsahern, brouer, echaudro, jasowang,
	alexander.duyck, saeed, maciej.fijalkowski

Take into account whether the received xdp_buff/xdp_frame is non-linear
when recycling/returning the frame memory to the allocator or into the
xdp_frame_bulk.
Introduce xdp_return_num_frags_from_buff() to return a given number of
fragments from an xdp multi-buff, starting from the tail.

Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
---
 include/net/xdp.h | 19 ++++++++++--
 net/core/xdp.c    | 76 ++++++++++++++++++++++++++++++++++++++++++++++-
 2 files changed, 92 insertions(+), 3 deletions(-)

diff --git a/include/net/xdp.h b/include/net/xdp.h
index 02aea7696d15..c8eb7cf4ebed 100644
--- a/include/net/xdp.h
+++ b/include/net/xdp.h
@@ -289,6 +289,7 @@ void xdp_return_buff(struct xdp_buff *xdp);
 void xdp_flush_frame_bulk(struct xdp_frame_bulk *bq);
 void xdp_return_frame_bulk(struct xdp_frame *xdpf,
 			   struct xdp_frame_bulk *bq);
+void xdp_return_num_frags_from_buff(struct xdp_buff *xdp, u16 num_frags);
 
 /* When sending xdp_frame into the network stack, then there is no
  * return point callback, which is needed to release e.g. DMA-mapping
@@ -299,10 +300,24 @@ void __xdp_release_frame(void *data, struct xdp_mem_info *mem);
 static inline void xdp_release_frame(struct xdp_frame *xdpf)
 {
 	struct xdp_mem_info *mem = &xdpf->mem;
+	struct xdp_shared_info *xdp_sinfo;
+	int i;
 
 	/* Curr only page_pool needs this */
-	if (mem->type == MEM_TYPE_PAGE_POOL)
-		__xdp_release_frame(xdpf->data, mem);
+	if (mem->type != MEM_TYPE_PAGE_POOL)
+		return;
+
+	if (likely(!xdpf->mb))
+		goto out;
+
+	xdp_sinfo = xdp_get_shared_info_from_frame(xdpf);
+	for (i = 0; i < xdp_sinfo->nr_frags; i++) {
+		struct page *page = xdp_get_frag_page(&xdp_sinfo->frags[i]);
+
+		__xdp_release_frame(page_address(page), mem);
+	}
+out:
+	__xdp_release_frame(xdpf->data, mem);
 }
 
 int xdp_rxq_info_reg(struct xdp_rxq_info *xdp_rxq,
diff --git a/net/core/xdp.c b/net/core/xdp.c
index 05354976c1fc..430f516259d9 100644
--- a/net/core/xdp.c
+++ b/net/core/xdp.c
@@ -374,12 +374,38 @@ static void __xdp_return(void *data, struct xdp_mem_info *mem, bool napi_direct,
 
 void xdp_return_frame(struct xdp_frame *xdpf)
 {
+	struct xdp_shared_info *xdp_sinfo;
+	int i;
+
+	if (likely(!xdpf->mb))
+		goto out;
+
+	xdp_sinfo = xdp_get_shared_info_from_frame(xdpf);
+	for (i = 0; i < xdp_sinfo->nr_frags; i++) {
+		struct page *page = xdp_get_frag_page(&xdp_sinfo->frags[i]);
+
+		__xdp_return(page_address(page), &xdpf->mem, false, NULL);
+	}
+out:
 	__xdp_return(xdpf->data, &xdpf->mem, false, NULL);
 }
 EXPORT_SYMBOL_GPL(xdp_return_frame);
 
 void xdp_return_frame_rx_napi(struct xdp_frame *xdpf)
 {
+	struct xdp_shared_info *xdp_sinfo;
+	int i;
+
+	if (likely(!xdpf->mb))
+		goto out;
+
+	xdp_sinfo = xdp_get_shared_info_from_frame(xdpf);
+	for (i = 0; i < xdp_sinfo->nr_frags; i++) {
+		struct page *page = xdp_get_frag_page(&xdp_sinfo->frags[i]);
+
+		__xdp_return(page_address(page), &xdpf->mem, true, NULL);
+	}
+out:
 	__xdp_return(xdpf->data, &xdpf->mem, true, NULL);
 }
 EXPORT_SYMBOL_GPL(xdp_return_frame_rx_napi);
@@ -415,7 +441,7 @@ void xdp_return_frame_bulk(struct xdp_frame *xdpf,
 	struct xdp_mem_allocator *xa;
 
 	if (mem->type != MEM_TYPE_PAGE_POOL) {
-		__xdp_return(xdpf->data, &xdpf->mem, false, NULL);
+		xdp_return_frame(xdpf);
 		return;
 	}
 
@@ -434,15 +460,63 @@ void xdp_return_frame_bulk(struct xdp_frame *xdpf,
 		bq->xa = rhashtable_lookup(mem_id_ht, &mem->id, mem_id_rht_params);
 	}
 
+	if (unlikely(xdpf->mb)) {
+		struct xdp_shared_info *xdp_sinfo;
+		int i;
+
+		xdp_sinfo = xdp_get_shared_info_from_frame(xdpf);
+		for (i = 0; i < xdp_sinfo->nr_frags; i++) {
+			skb_frag_t *frag = &xdp_sinfo->frags[i];
+
+			bq->q[bq->count++] = xdp_get_frag_address(frag);
+			if (bq->count == XDP_BULK_QUEUE_SIZE)
+				xdp_flush_frame_bulk(bq);
+		}
+	}
 	bq->q[bq->count++] = xdpf->data;
 }
 EXPORT_SYMBOL_GPL(xdp_return_frame_bulk);
 
 void xdp_return_buff(struct xdp_buff *xdp)
 {
+	struct xdp_shared_info *xdp_sinfo;
+	int i;
+
+	if (likely(!xdp->mb))
+		goto out;
+
+	xdp_sinfo = xdp_get_shared_info_from_buff(xdp);
+	for (i = 0; i < xdp_sinfo->nr_frags; i++) {
+		struct page *page = xdp_get_frag_page(&xdp_sinfo->frags[i]);
+
+		__xdp_return(page_address(page), &xdp->rxq->mem, true, xdp);
+	}
+out:
 	__xdp_return(xdp->data, &xdp->rxq->mem, true, xdp);
 }
 
+void xdp_return_num_frags_from_buff(struct xdp_buff *xdp, u16 num_frags)
+{
+	struct xdp_shared_info *xdp_sinfo;
+	int i;
+
+	if (unlikely(!xdp->mb))
+		return;
+
+	xdp_sinfo = xdp_get_shared_info_from_buff(xdp);
+	num_frags = min_t(u16, num_frags, xdp_sinfo->nr_frags);
+	for (i = 1; i <= num_frags; i++) {
+		skb_frag_t *frag = &xdp_sinfo->frags[xdp_sinfo->nr_frags - i];
+		struct page *page = xdp_get_frag_page(frag);
+
+		xdp_sinfo->data_length -= xdp_get_frag_size(frag);
+		__xdp_return(page_address(page), &xdp->rxq->mem, false, NULL);
+	}
+	xdp_sinfo->nr_frags -= num_frags;
+	xdp->mb = !!xdp_sinfo->nr_frags;
+}
+EXPORT_SYMBOL_GPL(xdp_return_num_frags_from_buff);
+
 /* Only called for MEM_TYPE_PAGE_POOL see xdp.h */
 void __xdp_release_frame(void *data, struct xdp_mem_info *mem)
 {
-- 
2.30.2



* [PATCH v8 bpf-next 05/14] net: mvneta: add multi buffer support to XDP_TX
From: Lorenzo Bianconi @ 2021-04-08 12:50 UTC (permalink / raw)
  To: bpf, netdev
  Cc: lorenzo.bianconi, davem, kuba, ast, daniel, shayagr, sameehj,
	john.fastabend, dsahern, brouer, echaudro, jasowang,
	alexander.duyck, saeed, maciej.fijalkowski

Introduce the capability to map a non-linear xdp buffer in
mvneta_xdp_submit_frame() for XDP_TX and XDP_REDIRECT.

Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
---
 drivers/net/ethernet/marvell/mvneta.c | 94 +++++++++++++++++----------
 1 file changed, 58 insertions(+), 36 deletions(-)

diff --git a/drivers/net/ethernet/marvell/mvneta.c b/drivers/net/ethernet/marvell/mvneta.c
index 94e29cce693a..e95d8df0fcdb 100644
--- a/drivers/net/ethernet/marvell/mvneta.c
+++ b/drivers/net/ethernet/marvell/mvneta.c
@@ -1860,8 +1860,8 @@ static void mvneta_txq_bufs_free(struct mvneta_port *pp,
 			bytes_compl += buf->skb->len;
 			pkts_compl++;
 			dev_kfree_skb_any(buf->skb);
-		} else if (buf->type == MVNETA_TYPE_XDP_TX ||
-			   buf->type == MVNETA_TYPE_XDP_NDO) {
+		} else if ((buf->type == MVNETA_TYPE_XDP_TX ||
+			    buf->type == MVNETA_TYPE_XDP_NDO) && buf->xdpf) {
 			if (napi && buf->type == MVNETA_TYPE_XDP_TX)
 				xdp_return_frame_rx_napi(buf->xdpf);
 			else
@@ -2057,45 +2057,67 @@ mvneta_xdp_put_buff(struct mvneta_port *pp, struct mvneta_rx_queue *rxq,
 
 static int
 mvneta_xdp_submit_frame(struct mvneta_port *pp, struct mvneta_tx_queue *txq,
-			struct xdp_frame *xdpf, bool dma_map)
+			struct xdp_frame *xdpf, int *nxmit_byte, bool dma_map)
 {
-	struct mvneta_tx_desc *tx_desc;
-	struct mvneta_tx_buf *buf;
-	dma_addr_t dma_addr;
+	struct mvneta_tx_desc *tx_desc = NULL;
+	struct xdp_shared_info *xdp_sinfo;
+	struct page *page;
+	int i, num_frames;
+
+	xdp_sinfo = xdp_get_shared_info_from_frame(xdpf);
+	num_frames = xdpf->mb ? xdp_sinfo->nr_frags + 1 : 1;
 
-	if (txq->count >= txq->tx_stop_threshold)
+	if (txq->count + num_frames >= txq->size)
 		return MVNETA_XDP_DROPPED;
 
-	tx_desc = mvneta_txq_next_desc_get(txq);
+	for (i = 0; i < num_frames; i++) {
+		struct mvneta_tx_buf *buf = &txq->buf[txq->txq_put_index];
+		skb_frag_t *frag = i ? &xdp_sinfo->frags[i - 1] : NULL;
+		int len = i ? xdp_get_frag_size(frag) : xdpf->len;
+		dma_addr_t dma_addr;
 
-	buf = &txq->buf[txq->txq_put_index];
-	if (dma_map) {
-		/* ndo_xdp_xmit */
-		dma_addr = dma_map_single(pp->dev->dev.parent, xdpf->data,
-					  xdpf->len, DMA_TO_DEVICE);
-		if (dma_mapping_error(pp->dev->dev.parent, dma_addr)) {
-			mvneta_txq_desc_put(txq);
-			return MVNETA_XDP_DROPPED;
+		tx_desc = mvneta_txq_next_desc_get(txq);
+		if (dma_map) {
+			/* ndo_xdp_xmit */
+			void *data;
+
+			data = frag ? xdp_get_frag_address(frag) : xdpf->data;
+			dma_addr = dma_map_single(pp->dev->dev.parent, data,
+						  len, DMA_TO_DEVICE);
+			if (dma_mapping_error(pp->dev->dev.parent, dma_addr)) {
+				for (; i >= 0; i--)
+					mvneta_txq_desc_put(txq);
+				return MVNETA_XDP_DROPPED;
+			}
+			buf->type = MVNETA_TYPE_XDP_NDO;
+		} else {
+			page = frag ? xdp_get_frag_page(frag)
+				    : virt_to_page(xdpf->data);
+			dma_addr = page_pool_get_dma_addr(page);
+			if (frag)
+				dma_addr += xdp_get_frag_offset(frag);
+			else
+				dma_addr += sizeof(*xdpf) + xdpf->headroom;
+			dma_sync_single_for_device(pp->dev->dev.parent,
+						   dma_addr, len,
+						   DMA_BIDIRECTIONAL);
+			buf->type = MVNETA_TYPE_XDP_TX;
 		}
-		buf->type = MVNETA_TYPE_XDP_NDO;
-	} else {
-		struct page *page = virt_to_page(xdpf->data);
+		buf->xdpf = i ? NULL : xdpf;
 
-		dma_addr = page_pool_get_dma_addr(page) +
-			   sizeof(*xdpf) + xdpf->headroom;
-		dma_sync_single_for_device(pp->dev->dev.parent, dma_addr,
-					   xdpf->len, DMA_BIDIRECTIONAL);
-		buf->type = MVNETA_TYPE_XDP_TX;
+		tx_desc->command = !i ? MVNETA_TXD_F_DESC : 0;
+		tx_desc->buf_phys_addr = dma_addr;
+		tx_desc->data_size = len;
+		*nxmit_byte += len;
+
+		mvneta_txq_inc_put(txq);
 	}
-	buf->xdpf = xdpf;
 
-	tx_desc->command = MVNETA_TXD_FLZ_DESC;
-	tx_desc->buf_phys_addr = dma_addr;
-	tx_desc->data_size = xdpf->len;
+	/* last descriptor */
+	tx_desc->command |= MVNETA_TXD_L_DESC | MVNETA_TXD_Z_PAD;
 
-	mvneta_txq_inc_put(txq);
-	txq->pending++;
-	txq->count++;
+	txq->pending += num_frames;
+	txq->count += num_frames;
 
 	return MVNETA_XDP_TX;
 }
@@ -2106,8 +2128,8 @@ mvneta_xdp_xmit_back(struct mvneta_port *pp, struct xdp_buff *xdp)
 	struct mvneta_pcpu_stats *stats = this_cpu_ptr(pp->stats);
 	struct mvneta_tx_queue *txq;
 	struct netdev_queue *nq;
+	int cpu, nxmit_byte = 0;
 	struct xdp_frame *xdpf;
-	int cpu;
 	u32 ret;
 
 	xdpf = xdp_convert_buff_to_frame(xdp);
@@ -2119,10 +2141,10 @@ mvneta_xdp_xmit_back(struct mvneta_port *pp, struct xdp_buff *xdp)
 	nq = netdev_get_tx_queue(pp->dev, txq->id);
 
 	__netif_tx_lock(nq, cpu);
-	ret = mvneta_xdp_submit_frame(pp, txq, xdpf, false);
+	ret = mvneta_xdp_submit_frame(pp, txq, xdpf, &nxmit_byte, false);
 	if (ret == MVNETA_XDP_TX) {
 		u64_stats_update_begin(&stats->syncp);
-		stats->es.ps.tx_bytes += xdpf->len;
+		stats->es.ps.tx_bytes += nxmit_byte;
 		stats->es.ps.tx_packets++;
 		stats->es.ps.xdp_tx++;
 		u64_stats_update_end(&stats->syncp);
@@ -2161,11 +2183,11 @@ mvneta_xdp_xmit(struct net_device *dev, int num_frame,
 
 	__netif_tx_lock(nq, cpu);
 	for (i = 0; i < num_frame; i++) {
-		ret = mvneta_xdp_submit_frame(pp, txq, frames[i], true);
+		ret = mvneta_xdp_submit_frame(pp, txq, frames[i], &nxmit_byte,
+					      true);
 		if (ret != MVNETA_XDP_TX)
 			break;
 
-		nxmit_byte += frames[i]->len;
 		nxmit++;
 	}
 
-- 
2.30.2



* [PATCH v8 bpf-next 06/14] net: mvneta: enable jumbo frames for XDP
  2021-04-08 12:50 [PATCH v8 bpf-next 00/14] mvneta: introduce XDP multi-buffer support Lorenzo Bianconi
                   ` (4 preceding siblings ...)
  2021-04-08 12:50 ` [PATCH v8 bpf-next 05/14] net: mvneta: add multi-buffer support to XDP_TX Lorenzo Bianconi
@ 2021-04-08 12:50 ` Lorenzo Bianconi
  2021-04-08 12:50 ` [PATCH v8 bpf-next 07/14] net: xdp: add multi-buff support to xdp_build_skb_from_frame Lorenzo Bianconi
                   ` (9 subsequent siblings)
  15 siblings, 0 replies; 57+ messages in thread
From: Lorenzo Bianconi @ 2021-04-08 12:50 UTC (permalink / raw)
  To: bpf, netdev
  Cc: lorenzo.bianconi, davem, kuba, ast, daniel, shayagr, sameehj,
	john.fastabend, dsahern, brouer, echaudro, jasowang,
	alexander.duyck, saeed, maciej.fijalkowski

Enable the capability to receive jumbo frames even when the interface is
running in XDP mode.

Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
---
 drivers/net/ethernet/marvell/mvneta.c | 10 ----------
 1 file changed, 10 deletions(-)

diff --git a/drivers/net/ethernet/marvell/mvneta.c b/drivers/net/ethernet/marvell/mvneta.c
index e95d8df0fcdb..8489a7522453 100644
--- a/drivers/net/ethernet/marvell/mvneta.c
+++ b/drivers/net/ethernet/marvell/mvneta.c
@@ -3771,11 +3771,6 @@ static int mvneta_change_mtu(struct net_device *dev, int mtu)
 		mtu = ALIGN(MVNETA_RX_PKT_SIZE(mtu), 8);
 	}
 
-	if (pp->xdp_prog && mtu > MVNETA_MAX_RX_BUF_SIZE) {
-		netdev_info(dev, "Illegal MTU value %d for XDP mode\n", mtu);
-		return -EINVAL;
-	}
-
 	dev->mtu = mtu;
 
 	if (!netif_running(dev)) {
@@ -4477,11 +4472,6 @@ static int mvneta_xdp_setup(struct net_device *dev, struct bpf_prog *prog,
 	struct mvneta_port *pp = netdev_priv(dev);
 	struct bpf_prog *old_prog;
 
-	if (prog && dev->mtu > MVNETA_MAX_RX_BUF_SIZE) {
-		NL_SET_ERR_MSG_MOD(extack, "MTU too large for XDP");
-		return -EOPNOTSUPP;
-	}
-
 	if (pp->bm_priv) {
 		NL_SET_ERR_MSG_MOD(extack,
 				   "Hardware Buffer Management not supported on XDP");
-- 
2.30.2



* [PATCH v8 bpf-next 07/14] net: xdp: add multi-buff support to xdp_build_skb_from_frame
  2021-04-08 12:50 [PATCH v8 bpf-next 00/14] mvneta: introduce XDP multi-buffer support Lorenzo Bianconi
                   ` (5 preceding siblings ...)
  2021-04-08 12:50 ` [PATCH v8 bpf-next 06/14] net: mvneta: enable jumbo frames for XDP Lorenzo Bianconi
@ 2021-04-08 12:50 ` Lorenzo Bianconi
  2021-04-08 12:51 ` [PATCH v8 bpf-next 08/14] bpf: add multi-buff support to the bpf_xdp_adjust_tail() API Lorenzo Bianconi
                   ` (8 subsequent siblings)
  15 siblings, 0 replies; 57+ messages in thread
From: Lorenzo Bianconi @ 2021-04-08 12:50 UTC (permalink / raw)
  To: bpf, netdev
  Cc: lorenzo.bianconi, davem, kuba, ast, daniel, shayagr, sameehj,
	john.fastabend, dsahern, brouer, echaudro, jasowang,
	alexander.duyck, saeed, maciej.fijalkowski

Introduce xdp multi-buff support to the __xdp_build_skb_from_frame() and
xdp_build_skb_from_frame() utility routines.

Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
---
 net/core/xdp.c | 26 ++++++++++++++++++++++++++
 1 file changed, 26 insertions(+)

diff --git a/net/core/xdp.c b/net/core/xdp.c
index 430f516259d9..7388bc6d680b 100644
--- a/net/core/xdp.c
+++ b/net/core/xdp.c
@@ -603,9 +603,21 @@ struct sk_buff *__xdp_build_skb_from_frame(struct xdp_frame *xdpf,
 					   struct sk_buff *skb,
 					   struct net_device *dev)
 {
+	skb_frag_t frag_list[MAX_SKB_FRAGS];
 	unsigned int headroom, frame_size;
+	int i, num_frags = 0;
 	void *hard_start;
 
+	/* XDP multi-buff frame */
+	if (unlikely(xdpf->mb)) {
+		struct xdp_shared_info *xdp_sinfo;
+
+		xdp_sinfo = xdp_get_shared_info_from_frame(xdpf);
+		num_frags = xdp_sinfo->nr_frags;
+		memcpy(frag_list, xdp_sinfo->frags,
+		       sizeof(skb_frag_t) * num_frags);
+	}
+
 	/* Part of headroom was reserved to xdpf */
 	headroom = sizeof(*xdpf) + xdpf->headroom;
 
@@ -624,6 +636,20 @@ struct sk_buff *__xdp_build_skb_from_frame(struct xdp_frame *xdpf,
 	if (xdpf->metasize)
 		skb_metadata_set(skb, xdpf->metasize);
 
+	/* Single-buff XDP frame */
+	if (likely(!num_frags))
+		goto out;
+
+	for (i = 0; i < num_frags; i++) {
+		struct page *page = xdp_get_frag_page(&frag_list[i]);
+
+		skb_add_rx_frag(skb, skb_shinfo(skb)->nr_frags,
+				page, xdp_get_frag_offset(&frag_list[i]),
+				xdp_get_frag_size(&frag_list[i]),
+				xdpf->frame_sz);
+	}
+
+out:
 	/* Essential SKB info: protocol and skb->dev */
 	skb->protocol = eth_type_trans(skb, dev);
 
-- 
2.30.2



* [PATCH v8 bpf-next 08/14] bpf: add multi-buff support to the bpf_xdp_adjust_tail() API
  2021-04-08 12:50 [PATCH v8 bpf-next 00/14] mvneta: introduce XDP multi-buffer support Lorenzo Bianconi
                   ` (6 preceding siblings ...)
  2021-04-08 12:50 ` [PATCH v8 bpf-next 07/14] net: xdp: add multi-buff support to xdp_build_skb_from_frame Lorenzo Bianconi
@ 2021-04-08 12:51 ` Lorenzo Bianconi
  2021-04-08 19:15   ` Vladimir Oltean
  2021-04-08 12:51 ` [PATCH v8 bpf-next 09/14] bpf: add multi-buffer support to xdp copy helpers Lorenzo Bianconi
                   ` (7 subsequent siblings)
  15 siblings, 1 reply; 57+ messages in thread
From: Lorenzo Bianconi @ 2021-04-08 12:51 UTC (permalink / raw)
  To: bpf, netdev
  Cc: lorenzo.bianconi, davem, kuba, ast, daniel, shayagr, sameehj,
	john.fastabend, dsahern, brouer, echaudro, jasowang,
	alexander.duyck, saeed, maciej.fijalkowski

From: Eelco Chaudron <echaudro@redhat.com>

This change adds support for tail growing and shrinking for XDP multi-buff.

Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
---
 include/net/xdp.h |  5 ++++
 net/core/filter.c | 63 +++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 68 insertions(+)

diff --git a/include/net/xdp.h b/include/net/xdp.h
index c8eb7cf4ebed..55751cf2badf 100644
--- a/include/net/xdp.h
+++ b/include/net/xdp.h
@@ -159,6 +159,11 @@ static inline void xdp_set_frag_size(skb_frag_t *frag, u32 size)
 	frag->bv_len = size;
 }
 
+static inline unsigned int xdp_get_frag_tailroom(const skb_frag_t *frag)
+{
+	return PAGE_SIZE - xdp_get_frag_size(frag) - xdp_get_frag_offset(frag);
+}
+
 struct xdp_frame {
 	void *data;
 	u16 len;
diff --git a/net/core/filter.c b/net/core/filter.c
index cae56d08a670..c4eb1392f88e 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -3855,11 +3855,74 @@ static const struct bpf_func_proto bpf_xdp_adjust_head_proto = {
 	.arg2_type	= ARG_ANYTHING,
 };
 
+static int bpf_xdp_mb_adjust_tail(struct xdp_buff *xdp, int offset)
+{
+	struct xdp_shared_info *xdp_sinfo = xdp_get_shared_info_from_buff(xdp);
+
+	if (unlikely(xdp_sinfo->nr_frags == 0))
+		return -EINVAL;
+
+	if (offset >= 0) {
+		skb_frag_t *frag = &xdp_sinfo->frags[xdp_sinfo->nr_frags - 1];
+		int size;
+
+		if (unlikely(offset > xdp_get_frag_tailroom(frag)))
+			return -EINVAL;
+
+		size = xdp_get_frag_size(frag);
+		memset(xdp_get_frag_address(frag) + size, 0, offset);
+		xdp_set_frag_size(frag, size + offset);
+		xdp_sinfo->data_length += offset;
+	} else {
+		int i, frags_to_free = 0;
+
+		offset = abs(offset);
+
+		if (unlikely(offset > ((int)(xdp->data_end - xdp->data) +
+				       xdp_sinfo->data_length -
+				       ETH_HLEN)))
+			return -EINVAL;
+
+		for (i = xdp_sinfo->nr_frags - 1; i >= 0 && offset > 0; i--) {
+			skb_frag_t *frag = &xdp_sinfo->frags[i];
+			int size = xdp_get_frag_size(frag);
+			int shrink = min_t(int, offset, size);
+
+			offset -= shrink;
+			if (likely(size - shrink > 0)) {
+				/* When updating the final fragment we have
+				 * to adjust the data_length in line.
+				 */
+				xdp_sinfo->data_length -= shrink;
+				xdp_set_frag_size(frag, size - shrink);
+				break;
+			}
+
+			/* When we free the fragments,
+			 * xdp_return_num_frags_from_buff() will take care
+			 * of updating the xdp_shared_info data_length.
+			 */
+			frags_to_free++;
+		}
+
+		if (unlikely(frags_to_free))
+			xdp_return_num_frags_from_buff(xdp, frags_to_free);
+
+		if (unlikely(offset > 0))
+			xdp->data_end -= offset;
+	}
+
+	return 0;
+}
+
 BPF_CALL_2(bpf_xdp_adjust_tail, struct xdp_buff *, xdp, int, offset)
 {
 	void *data_hard_end = xdp_data_hard_end(xdp); /* use xdp->frame_sz */
 	void *data_end = xdp->data_end + offset;
 
+	if (unlikely(xdp->mb))
+		return bpf_xdp_mb_adjust_tail(xdp, offset);
+
 	/* Notice that xdp_data_hard_end have reserved some tailroom */
 	if (unlikely(data_end > data_hard_end))
 		return -EINVAL;
-- 
2.30.2



* [PATCH v8 bpf-next 09/14] bpf: add multi-buffer support to xdp copy helpers
  2021-04-08 12:50 [PATCH v8 bpf-next 00/14] mvneta: introduce XDP multi-buffer support Lorenzo Bianconi
                   ` (7 preceding siblings ...)
  2021-04-08 12:51 ` [PATCH v8 bpf-next 08/14] bpf: add multi-buff support to the bpf_xdp_adjust_tail() API Lorenzo Bianconi
@ 2021-04-08 12:51 ` Lorenzo Bianconi
  2021-04-08 20:57   ` Vladimir Oltean
  2021-04-08 21:04   ` Vladimir Oltean
  2021-04-08 12:51 ` [PATCH v8 bpf-next 10/14] bpf: add new frame_length field to the XDP ctx Lorenzo Bianconi
                   ` (6 subsequent siblings)
  15 siblings, 2 replies; 57+ messages in thread
From: Lorenzo Bianconi @ 2021-04-08 12:51 UTC (permalink / raw)
  To: bpf, netdev
  Cc: lorenzo.bianconi, davem, kuba, ast, daniel, shayagr, sameehj,
	john.fastabend, dsahern, brouer, echaudro, jasowang,
	alexander.duyck, saeed, maciej.fijalkowski

From: Eelco Chaudron <echaudro@redhat.com>

This patch adds support for multi-buffer for the following helpers:
  - bpf_xdp_output()
  - bpf_perf_event_output()

Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
---
 net/core/filter.c                             |  63 ++++++++-
 .../selftests/bpf/prog_tests/xdp_bpf2bpf.c    | 127 ++++++++++++------
 .../selftests/bpf/progs/test_xdp_bpf2bpf.c    |   3 +-
 3 files changed, 149 insertions(+), 44 deletions(-)

diff --git a/net/core/filter.c b/net/core/filter.c
index c4eb1392f88e..c00f52ab2532 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -4549,10 +4549,56 @@ static const struct bpf_func_proto bpf_sk_ancestor_cgroup_id_proto = {
 };
 #endif
 
-static unsigned long bpf_xdp_copy(void *dst_buff, const void *src_buff,
+static unsigned long bpf_xdp_copy(void *dst_buff, const void *ctx,
 				  unsigned long off, unsigned long len)
 {
-	memcpy(dst_buff, src_buff + off, len);
+	struct xdp_buff *xdp = (struct xdp_buff *)ctx;
+	struct xdp_shared_info *xdp_sinfo;
+	unsigned long base_len;
+
+	if (likely(!xdp->mb)) {
+		memcpy(dst_buff, xdp->data + off, len);
+		return 0;
+	}
+
+	base_len = xdp->data_end - xdp->data;
+	xdp_sinfo = xdp_get_shared_info_from_buff(xdp);
+	do {
+		const void *src_buff = NULL;
+		unsigned long copy_len = 0;
+
+		if (off < base_len) {
+			src_buff = xdp->data + off;
+			copy_len = min(len, base_len - off);
+		} else {
+			unsigned long frag_off_total = base_len;
+			int i;
+
+			for (i = 0; i < xdp_sinfo->nr_frags; i++) {
+				skb_frag_t *frag = &xdp_sinfo->frags[i];
+				unsigned long frag_len, frag_off;
+
+				frag_len = xdp_get_frag_size(frag);
+				frag_off = off - frag_off_total;
+				if (frag_off < frag_len) {
+					src_buff = xdp_get_frag_address(frag) +
+						   frag_off;
+					copy_len = min(len,
+						       frag_len - frag_off);
+					break;
+				}
+				frag_off_total += frag_len;
+			}
+		}
+		if (!src_buff)
+			break;
+
+		memcpy(dst_buff, src_buff, copy_len);
+		off += copy_len;
+		len -= copy_len;
+		dst_buff += copy_len;
+	} while (len);
+
 	return 0;
 }
 
@@ -4564,10 +4610,19 @@ BPF_CALL_5(bpf_xdp_event_output, struct xdp_buff *, xdp, struct bpf_map *, map,
 	if (unlikely(flags & ~(BPF_F_CTXLEN_MASK | BPF_F_INDEX_MASK)))
 		return -EINVAL;
 	if (unlikely(!xdp ||
-		     xdp_size > (unsigned long)(xdp->data_end - xdp->data)))
+		     (likely(!xdp->mb) &&
+		      xdp_size > (unsigned long)(xdp->data_end - xdp->data))))
 		return -EFAULT;
+	if (unlikely(xdp->mb)) {
+		struct xdp_shared_info *xdp_sinfo;
+
+		xdp_sinfo = xdp_get_shared_info_from_buff(xdp);
+		if (unlikely(xdp_size > ((int)(xdp->data_end - xdp->data) +
+					 xdp_sinfo->data_length)))
+			return -EFAULT;
+	}
 
-	return bpf_event_output(map, flags, meta, meta_size, xdp->data,
+	return bpf_event_output(map, flags, meta, meta_size, xdp,
 				xdp_size, bpf_xdp_copy);
 }
 
diff --git a/tools/testing/selftests/bpf/prog_tests/xdp_bpf2bpf.c b/tools/testing/selftests/bpf/prog_tests/xdp_bpf2bpf.c
index 2c6c570b21f8..355e64526f3f 100644
--- a/tools/testing/selftests/bpf/prog_tests/xdp_bpf2bpf.c
+++ b/tools/testing/selftests/bpf/prog_tests/xdp_bpf2bpf.c
@@ -10,11 +10,20 @@ struct meta {
 	int pkt_len;
 };
 
+struct test_ctx_s {
+	bool passed;
+	int pkt_size;
+};
+
+struct test_ctx_s test_ctx;
+
 static void on_sample(void *ctx, int cpu, void *data, __u32 size)
 {
-	int duration = 0;
 	struct meta *meta = (struct meta *)data;
 	struct ipv4_packet *trace_pkt_v4 = data + sizeof(*meta);
+	unsigned char *raw_pkt = data + sizeof(*meta);
+	struct test_ctx_s *tst_ctx = ctx;
+	int duration = 0;
 
 	if (CHECK(size < sizeof(pkt_v4) + sizeof(*meta),
 		  "check_size", "size %u < %zu\n",
@@ -25,25 +34,90 @@ static void on_sample(void *ctx, int cpu, void *data, __u32 size)
 		  "meta->ifindex = %d\n", meta->ifindex))
 		return;
 
-	if (CHECK(meta->pkt_len != sizeof(pkt_v4), "check_meta_pkt_len",
-		  "meta->pkt_len = %zd\n", sizeof(pkt_v4)))
+	if (CHECK(meta->pkt_len != tst_ctx->pkt_size, "check_meta_pkt_len",
+		  "meta->pkt_len = %d\n", tst_ctx->pkt_size))
 		return;
 
 	if (CHECK(memcmp(trace_pkt_v4, &pkt_v4, sizeof(pkt_v4)),
 		  "check_packet_content", "content not the same\n"))
 		return;
 
-	*(bool *)ctx = true;
+	if (meta->pkt_len > sizeof(pkt_v4)) {
+		for (int i = 0; i < (meta->pkt_len - sizeof(pkt_v4)); i++) {
+			if (raw_pkt[i + sizeof(pkt_v4)] != (unsigned char)i) {
+				CHECK(true, "check_packet_content",
+				      "byte %zu does not match %u != %u\n",
+				      i + sizeof(pkt_v4),
+				      raw_pkt[i + sizeof(pkt_v4)],
+				      (unsigned char)i);
+				break;
+			}
+		}
+	}
+
+	tst_ctx->passed = true;
 }
 
-void test_xdp_bpf2bpf(void)
+static int run_xdp_bpf2bpf_pkt_size(int pkt_fd, struct perf_buffer *pb,
+				    struct test_xdp_bpf2bpf *ftrace_skel,
+				    int pkt_size)
 {
 	__u32 duration = 0, retval, size;
-	char buf[128];
+	unsigned char buf_in[9000];
+	unsigned char buf[9000];
+	int err;
+
+	if (pkt_size > sizeof(buf_in) || pkt_size < sizeof(pkt_v4))
+		return -EINVAL;
+
+	test_ctx.passed = false;
+	test_ctx.pkt_size = pkt_size;
+
+	memcpy(buf_in, &pkt_v4, sizeof(pkt_v4));
+	if (pkt_size > sizeof(pkt_v4)) {
+		for (int i = 0; i < (pkt_size - sizeof(pkt_v4)); i++)
+			buf_in[i + sizeof(pkt_v4)] = i;
+	}
+
+	/* Run test program */
+	err = bpf_prog_test_run(pkt_fd, 1, buf_in, pkt_size,
+				buf, &size, &retval, &duration);
+
+	if (CHECK(err || retval != XDP_PASS || size != pkt_size,
+		  "ipv4", "err %d errno %d retval %d size %d\n",
+		  err, errno, retval, size))
+		return -1;
+
+	/* Make sure bpf_xdp_output() was triggered and it sent the expected
+	 * data to the perf ring buffer.
+	 */
+	err = perf_buffer__poll(pb, 100);
+	if (CHECK(err <= 0, "perf_buffer__poll", "err %d\n", err))
+		return -1;
+
+	if (CHECK_FAIL(!test_ctx.passed))
+		return -1;
+
+	/* Verify test results */
+	if (CHECK(ftrace_skel->bss->test_result_fentry != if_nametoindex("lo"),
+		  "result", "fentry failed err %llu\n",
+		  ftrace_skel->bss->test_result_fentry))
+		return -1;
+
+	if (CHECK(ftrace_skel->bss->test_result_fexit != XDP_PASS, "result",
+		  "fexit failed err %llu\n",
+		  ftrace_skel->bss->test_result_fexit))
+		return -1;
+
+	return 0;
+}
+
+void test_xdp_bpf2bpf(void)
+{
 	int err, pkt_fd, map_fd;
-	bool passed = false;
-	struct iphdr *iph = (void *)buf + sizeof(struct ethhdr);
-	struct iptnl_info value4 = {.family = AF_INET};
+	__u32 duration = 0;
+	int pkt_sizes[] = {sizeof(pkt_v4), 1024, 4100, 8200};
+	struct iptnl_info value4 = {.family = AF_INET6};
 	struct test_xdp *pkt_skel = NULL;
 	struct test_xdp_bpf2bpf *ftrace_skel = NULL;
 	struct vip key4 = {.protocol = 6, .family = AF_INET};
@@ -87,40 +161,15 @@ void test_xdp_bpf2bpf(void)
 
 	/* Set up perf buffer */
 	pb_opts.sample_cb = on_sample;
-	pb_opts.ctx = &passed;
+	pb_opts.ctx = &test_ctx;
 	pb = perf_buffer__new(bpf_map__fd(ftrace_skel->maps.perf_buf_map),
-			      1, &pb_opts);
+			      8, &pb_opts);
 	if (CHECK(IS_ERR(pb), "perf_buf__new", "err %ld\n", PTR_ERR(pb)))
 		goto out;
 
-	/* Run test program */
-	err = bpf_prog_test_run(pkt_fd, 1, &pkt_v4, sizeof(pkt_v4),
-				buf, &size, &retval, &duration);
-
-	if (CHECK(err || retval != XDP_TX || size != 74 ||
-		  iph->protocol != IPPROTO_IPIP, "ipv4",
-		  "err %d errno %d retval %d size %d\n",
-		  err, errno, retval, size))
-		goto out;
-
-	/* Make sure bpf_xdp_output() was triggered and it sent the expected
-	 * data to the perf ring buffer.
-	 */
-	err = perf_buffer__poll(pb, 100);
-	if (CHECK(err < 0, "perf_buffer__poll", "err %d\n", err))
-		goto out;
-
-	CHECK_FAIL(!passed);
-
-	/* Verify test results */
-	if (CHECK(ftrace_skel->bss->test_result_fentry != if_nametoindex("lo"),
-		  "result", "fentry failed err %llu\n",
-		  ftrace_skel->bss->test_result_fentry))
-		goto out;
-
-	CHECK(ftrace_skel->bss->test_result_fexit != XDP_TX, "result",
-	      "fexit failed err %llu\n", ftrace_skel->bss->test_result_fexit);
-
+	for (int i = 0; i < ARRAY_SIZE(pkt_sizes); i++)
+		run_xdp_bpf2bpf_pkt_size(pkt_fd, pb, ftrace_skel,
+					 pkt_sizes[i]);
 out:
 	if (pb)
 		perf_buffer__free(pb);
diff --git a/tools/testing/selftests/bpf/progs/test_xdp_bpf2bpf.c b/tools/testing/selftests/bpf/progs/test_xdp_bpf2bpf.c
index a038e827f850..d5a5f603d252 100644
--- a/tools/testing/selftests/bpf/progs/test_xdp_bpf2bpf.c
+++ b/tools/testing/selftests/bpf/progs/test_xdp_bpf2bpf.c
@@ -27,6 +27,7 @@ struct xdp_buff {
 	void *data_hard_start;
 	unsigned long handle;
 	struct xdp_rxq_info *rxq;
+	__u32 frame_length;
 } __attribute__((preserve_access_index));
 
 struct meta {
@@ -49,7 +50,7 @@ int BPF_PROG(trace_on_entry, struct xdp_buff *xdp)
 	void *data = (void *)(long)xdp->data;
 
 	meta.ifindex = xdp->rxq->dev->ifindex;
-	meta.pkt_len = data_end - data;
+	meta.pkt_len = xdp->frame_length;
 	bpf_xdp_output(xdp, &perf_buf_map,
 		       ((__u64) meta.pkt_len << 32) |
 		       BPF_F_CURRENT_CPU,
-- 
2.30.2



* [PATCH v8 bpf-next 10/14] bpf: add new frame_length field to the XDP ctx
  2021-04-08 12:50 [PATCH v8 bpf-next 00/14] mvneta: introduce XDP multi-buffer support Lorenzo Bianconi
                   ` (8 preceding siblings ...)
  2021-04-08 12:51 ` [PATCH v8 bpf-next 09/14] bpf: add multi-buffer support to xdp copy helpers Lorenzo Bianconi
@ 2021-04-08 12:51 ` Lorenzo Bianconi
  2021-04-08 12:51 ` [PATCH v8 bpf-next 11/14] bpf: move user_size out of bpf_test_init Lorenzo Bianconi
                   ` (5 subsequent siblings)
  15 siblings, 0 replies; 57+ messages in thread
From: Lorenzo Bianconi @ 2021-04-08 12:51 UTC (permalink / raw)
  To: bpf, netdev
  Cc: lorenzo.bianconi, davem, kuba, ast, daniel, shayagr, sameehj,
	john.fastabend, dsahern, brouer, echaudro, jasowang,
	alexander.duyck, saeed, maciej.fijalkowski

From: Eelco Chaudron <echaudro@redhat.com>

This patch adds a new field to the XDP context called frame_length,
which holds the full length of the packet, including fragments when
present.

eBPF programs can determine if fragments are present using something
like:

  if (ctx->data_end - ctx->data < ctx->frame_length) {
    /* Fragments exist. */
  }

Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
---
 include/linux/filter.h         |  7 +++++++
 include/net/xdp.h              | 15 +++++++++++++++
 include/uapi/linux/bpf.h       |  1 +
 net/core/filter.c              |  8 ++++++++
 net/core/xdp.c                 |  1 +
 tools/include/uapi/linux/bpf.h |  1 +
 6 files changed, 33 insertions(+)

diff --git a/include/linux/filter.h b/include/linux/filter.h
index 9a09547bc7ba..d378a448f673 100644
--- a/include/linux/filter.h
+++ b/include/linux/filter.h
@@ -768,6 +768,13 @@ static __always_inline u32 bpf_prog_run_xdp(const struct bpf_prog *prog,
 	 * already takes rcu_read_lock() when fetching the program, so
 	 * it's not necessary here anymore.
 	 */
+	xdp->frame_length = xdp->data_end - xdp->data;
+	if (unlikely(xdp->mb)) {
+		struct xdp_shared_info *xdp_sinfo;
+
+		xdp_sinfo = xdp_get_shared_info_from_buff(xdp);
+		xdp->frame_length += xdp_sinfo->data_length;
+	}
 	return __BPF_PROG_RUN(prog, xdp, BPF_DISPATCHER_FUNC(xdp));
 }
 
diff --git a/include/net/xdp.h b/include/net/xdp.h
index 55751cf2badf..e41022894770 100644
--- a/include/net/xdp.h
+++ b/include/net/xdp.h
@@ -77,6 +77,13 @@ struct xdp_buff {
 			  * tailroom
 			  */
 	u32 mb:1; /* xdp non-linear buffer */
+	u32 frame_length; /* Total frame length across all buffers. Only needs
+			   * to be updated by helper functions, as it will be
+			   * initialized at XDP program start. This field only
+			   * needs 17-bits (128kB). In case the remaining bits
+			   * need to be re-purposed, please make sure the
+			   * xdp_convert_ctx_access() function gets updated.
+			   */
 };
 
 static __always_inline void
@@ -237,6 +244,14 @@ void xdp_convert_frame_to_buff(struct xdp_frame *frame, struct xdp_buff *xdp)
 	xdp->data_meta = frame->data - frame->metasize;
 	xdp->frame_sz = frame->frame_sz;
 	xdp->mb = frame->mb;
+	xdp->frame_length = frame->len;
+
+	if (unlikely(xdp->mb)) {
+		struct xdp_shared_info *xdp_sinfo;
+
+		xdp_sinfo = xdp_get_shared_info_from_buff(xdp);
+		xdp->frame_length += xdp_sinfo->data_length;
+	}
 }
 
 static inline
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 49371eba98ba..643ef5979d42 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -5224,6 +5224,7 @@ struct xdp_md {
 	__u32 rx_queue_index;  /* rxq->queue_index  */
 
 	__u32 egress_ifindex;  /* txq->dev->ifindex */
+	__u32 frame_length;
 };
 
 /* DEVMAP map-value layout
diff --git a/net/core/filter.c b/net/core/filter.c
index c00f52ab2532..8f8613745f0e 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -3873,6 +3873,7 @@ static int bpf_xdp_mb_adjust_tail(struct xdp_buff *xdp, int offset)
 		memset(xdp_get_frag_address(frag) + size, 0, offset);
 		xdp_set_frag_size(frag, size + offset);
 		xdp_sinfo->data_length += offset;
+		xdp->frame_length += offset;
 	} else {
 		int i, frags_to_free = 0;
 
@@ -3894,6 +3895,7 @@ static int bpf_xdp_mb_adjust_tail(struct xdp_buff *xdp, int offset)
 				 * to adjust the data_length in line.
 				 */
 				xdp_sinfo->data_length -= shrink;
+				xdp->frame_length -= shrink;
 				xdp_set_frag_size(frag, size - shrink);
 				break;
 			}
@@ -9137,6 +9139,12 @@ static u32 xdp_convert_ctx_access(enum bpf_access_type type,
 		*insn++ = BPF_LDX_MEM(BPF_W, si->dst_reg, si->dst_reg,
 				      offsetof(struct net_device, ifindex));
 		break;
+	case offsetof(struct xdp_md, frame_length):
+		*insn++ = BPF_LDX_MEM(BPF_FIELD_SIZEOF(struct xdp_buff,
+						       frame_length),
+				      si->dst_reg, si->src_reg,
+				      offsetof(struct xdp_buff, frame_length));
+		break;
 	}
 
 	return insn - insn_buf;
diff --git a/net/core/xdp.c b/net/core/xdp.c
index 7388bc6d680b..fb7d0724a5b6 100644
--- a/net/core/xdp.c
+++ b/net/core/xdp.c
@@ -510,6 +510,7 @@ void xdp_return_num_frags_from_buff(struct xdp_buff *xdp, u16 num_frags)
 		struct page *page = xdp_get_frag_page(frag);
 
 		xdp_sinfo->data_length -= xdp_get_frag_size(frag);
+		xdp->frame_length -= xdp_get_frag_size(frag);
 		__xdp_return(page_address(page), &xdp->rxq->mem, false, NULL);
 	}
 	xdp_sinfo->nr_frags -= num_frags;
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index 69902603012c..5c2a497bfcf1 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -5218,6 +5218,7 @@ struct xdp_md {
 	__u32 rx_queue_index;  /* rxq->queue_index  */
 
 	__u32 egress_ifindex;  /* txq->dev->ifindex */
+	__u32 frame_length;
 };
 
 /* DEVMAP map-value layout
-- 
2.30.2



* [PATCH v8 bpf-next 11/14] bpf: move user_size out of bpf_test_init
  2021-04-08 12:50 [PATCH v8 bpf-next 00/14] mvneta: introduce XDP multi-buffer support Lorenzo Bianconi
                   ` (9 preceding siblings ...)
  2021-04-08 12:51 ` [PATCH v8 bpf-next 10/14] bpf: add new frame_length field to the XDP ctx Lorenzo Bianconi
@ 2021-04-08 12:51 ` Lorenzo Bianconi
  2021-04-08 12:51 ` [PATCH v8 bpf-next 12/14] bpf: introduce multibuff support to bpf_prog_test_run_xdp() Lorenzo Bianconi
                   ` (4 subsequent siblings)
  15 siblings, 0 replies; 57+ messages in thread
From: Lorenzo Bianconi @ 2021-04-08 12:51 UTC (permalink / raw)
  To: bpf, netdev
  Cc: lorenzo.bianconi, davem, kuba, ast, daniel, shayagr, sameehj,
	john.fastabend, dsahern, brouer, echaudro, jasowang,
	alexander.duyck, saeed, maciej.fijalkowski

Move user_size out of bpf_test_init() and pass it explicitly through the
routine signature instead of reading kattr->test.data_size_in internally.
This is a preliminary patch to introduce the xdp multi-buff selftests.

Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
---
 net/bpf/test_run.c | 13 +++++++------
 1 file changed, 7 insertions(+), 6 deletions(-)

diff --git a/net/bpf/test_run.c b/net/bpf/test_run.c
index a5d72c48fb66..1acd94377822 100644
--- a/net/bpf/test_run.c
+++ b/net/bpf/test_run.c
@@ -245,11 +245,10 @@ bool bpf_prog_test_check_kfunc_call(u32 kfunc_id)
 	return btf_id_set_contains(&test_sk_kfunc_ids, kfunc_id);
 }
 
-static void *bpf_test_init(const union bpf_attr *kattr, u32 size,
-			   u32 headroom, u32 tailroom)
+static void *bpf_test_init(const union bpf_attr *kattr, u32 user_size,
+			   u32 size, u32 headroom, u32 tailroom)
 {
 	void __user *data_in = u64_to_user_ptr(kattr->test.data_in);
-	u32 user_size = kattr->test.data_size_in;
 	void *data;
 
 	if (size < ETH_HLEN || size > PAGE_SIZE - headroom - tailroom)
@@ -570,7 +569,8 @@ int bpf_prog_test_run_skb(struct bpf_prog *prog, const union bpf_attr *kattr,
 	if (kattr->test.flags || kattr->test.cpu)
 		return -EINVAL;
 
-	data = bpf_test_init(kattr, size, NET_SKB_PAD + NET_IP_ALIGN,
+	data = bpf_test_init(kattr, kattr->test.data_size_in,
+			     size, NET_SKB_PAD + NET_IP_ALIGN,
 			     SKB_DATA_ALIGN(sizeof(struct skb_shared_info)));
 	if (IS_ERR(data))
 		return PTR_ERR(data);
@@ -707,7 +707,8 @@ int bpf_prog_test_run_xdp(struct bpf_prog *prog, const union bpf_attr *kattr,
 	/* XDP have extra tailroom as (most) drivers use full page */
 	max_data_sz = 4096 - headroom - tailroom;
 
-	data = bpf_test_init(kattr, max_data_sz, headroom, tailroom);
+	data = bpf_test_init(kattr, kattr->test.data_size_in,
+			     max_data_sz, headroom, tailroom);
 	if (IS_ERR(data))
 		return PTR_ERR(data);
 
@@ -769,7 +770,7 @@ int bpf_prog_test_run_flow_dissector(struct bpf_prog *prog,
 	if (size < ETH_HLEN)
 		return -EINVAL;
 
-	data = bpf_test_init(kattr, size, 0, 0);
+	data = bpf_test_init(kattr, kattr->test.data_size_in, size, 0, 0);
 	if (IS_ERR(data))
 		return PTR_ERR(data);
 
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 57+ messages in thread

* [PATCH v8 bpf-next 12/14] bpf: introduce multibuff support to bpf_prog_test_run_xdp()
  2021-04-08 12:50 [PATCH v8 bpf-next 00/14] mvneta: introduce XDP multi-buffer support Lorenzo Bianconi
                   ` (10 preceding siblings ...)
  2021-04-08 12:51 ` [PATCH v8 bpf-next 11/14] bpf: move user_size out of bpf_test_init Lorenzo Bianconi
@ 2021-04-08 12:51 ` Lorenzo Bianconi
  2021-04-08 12:51 ` [PATCH v8 bpf-next 13/14] bpf: test_run: add xdp_shared_info pointer in bpf_test_finish signature Lorenzo Bianconi
                   ` (3 subsequent siblings)
  15 siblings, 0 replies; 57+ messages in thread
From: Lorenzo Bianconi @ 2021-04-08 12:51 UTC (permalink / raw)
  To: bpf, netdev
  Cc: lorenzo.bianconi, davem, kuba, ast, daniel, shayagr, sameehj,
	john.fastabend, dsahern, brouer, echaudro, jasowang,
	alexander.duyck, saeed, maciej.fijalkowski

Introduce the capability to allocate an XDP multi-buff in the
bpf_prog_test_run_xdp routine. This is a preliminary patch to introduce
the selftests for the new XDP multi-buff eBPF helpers.

Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
---
 net/bpf/test_run.c | 52 +++++++++++++++++++++++++++++++++++++++-------
 1 file changed, 44 insertions(+), 8 deletions(-)

diff --git a/net/bpf/test_run.c b/net/bpf/test_run.c
index 1acd94377822..bb953b2e6501 100644
--- a/net/bpf/test_run.c
+++ b/net/bpf/test_run.c
@@ -692,23 +692,22 @@ int bpf_prog_test_run_xdp(struct bpf_prog *prog, const union bpf_attr *kattr,
 {
 	u32 tailroom = SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
 	u32 headroom = XDP_PACKET_HEADROOM;
-	u32 size = kattr->test.data_size_in;
+	struct xdp_shared_info *xdp_sinfo;
 	u32 repeat = kattr->test.repeat;
 	struct netdev_rx_queue *rxqueue;
 	struct xdp_buff xdp = {};
+	u32 max_data_sz, size;
 	u32 retval, duration;
-	u32 max_data_sz;
+	int i, ret;
 	void *data;
-	int ret;
 
 	if (kattr->test.ctx_in || kattr->test.ctx_out)
 		return -EINVAL;
 
-	/* XDP have extra tailroom as (most) drivers use full page */
 	max_data_sz = 4096 - headroom - tailroom;
+	size = min_t(u32, kattr->test.data_size_in, max_data_sz);
 
-	data = bpf_test_init(kattr, kattr->test.data_size_in,
-			     max_data_sz, headroom, tailroom);
+	data = bpf_test_init(kattr, size, max_data_sz, headroom, tailroom);
 	if (IS_ERR(data))
 		return PTR_ERR(data);
 
@@ -717,16 +716,53 @@ int bpf_prog_test_run_xdp(struct bpf_prog *prog, const union bpf_attr *kattr,
 		      &rxqueue->xdp_rxq);
 	xdp_prepare_buff(&xdp, data, headroom, size, true);
 
+	xdp_sinfo = xdp_get_shared_info_from_buff(&xdp);
+	if (unlikely(kattr->test.data_size_in > size)) {
+		void __user *data_in = u64_to_user_ptr(kattr->test.data_in);
+
+		while (size < kattr->test.data_size_in) {
+			struct page *page;
+			skb_frag_t *frag;
+			int data_len;
+
+			page = alloc_page(GFP_KERNEL);
+			if (!page) {
+				ret = -ENOMEM;
+				goto out;
+			}
+
+			frag = &xdp_sinfo->frags[xdp_sinfo->nr_frags++];
+			xdp_set_frag_page(frag, page);
+
+			data_len = min_t(int, kattr->test.data_size_in - size,
+					 PAGE_SIZE);
+			xdp_set_frag_size(frag, data_len);
+
+			if (copy_from_user(page_address(page), data_in + size,
+					   data_len)) {
+				ret = -EFAULT;
+				goto out;
+			}
+			xdp_sinfo->data_length += data_len;
+			size += data_len;
+		}
+		xdp.mb = 1;
+	}
+
 	bpf_prog_change_xdp(NULL, prog);
 	ret = bpf_test_run(prog, &xdp, repeat, &retval, &duration, true);
 	if (ret)
 		goto out;
-	if (xdp.data != data + headroom || xdp.data_end != xdp.data + size)
-		size = xdp.data_end - xdp.data;
+
+	size = xdp.data_end - xdp.data + xdp_sinfo->data_length;
 	ret = bpf_test_finish(kattr, uattr, xdp.data, size, retval, duration);
+
 out:
 	bpf_prog_change_xdp(prog, NULL);
+	for (i = 0; i < xdp_sinfo->nr_frags; i++)
+		__free_page(xdp_get_frag_page(&xdp_sinfo->frags[i]));
 	kfree(data);
+
 	return ret;
 }
 
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 57+ messages in thread

* [PATCH v8 bpf-next 13/14] bpf: test_run: add xdp_shared_info pointer in bpf_test_finish signature
  2021-04-08 12:50 [PATCH v8 bpf-next 00/14] mvneta: introduce XDP multi-buffer support Lorenzo Bianconi
                   ` (11 preceding siblings ...)
  2021-04-08 12:51 ` [PATCH v8 bpf-next 12/14] bpf: introduce multibuff support to bpf_prog_test_run_xdp() Lorenzo Bianconi
@ 2021-04-08 12:51 ` Lorenzo Bianconi
  2021-04-08 12:51 ` [PATCH v8 bpf-next 14/14] bpf: update xdp_adjust_tail selftest to include multi-buffer Lorenzo Bianconi
                   ` (2 subsequent siblings)
  15 siblings, 0 replies; 57+ messages in thread
From: Lorenzo Bianconi @ 2021-04-08 12:51 UTC (permalink / raw)
  To: bpf, netdev
  Cc: lorenzo.bianconi, davem, kuba, ast, daniel, shayagr, sameehj,
	john.fastabend, dsahern, brouer, echaudro, jasowang,
	alexander.duyck, saeed, maciej.fijalkowski

Introduce an xdp_shared_info pointer in the bpf_test_finish signature in
order to copy back paged data from an XDP multi-buff frame to the
userspace buffer.

Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
---
 net/bpf/test_run.c | 48 ++++++++++++++++++++++++++++++++++++++--------
 1 file changed, 40 insertions(+), 8 deletions(-)

diff --git a/net/bpf/test_run.c b/net/bpf/test_run.c
index bb953b2e6501..65c944ebc2da 100644
--- a/net/bpf/test_run.c
+++ b/net/bpf/test_run.c
@@ -128,7 +128,8 @@ static int bpf_test_run(struct bpf_prog *prog, void *ctx, u32 repeat,
 
 static int bpf_test_finish(const union bpf_attr *kattr,
 			   union bpf_attr __user *uattr, const void *data,
-			   u32 size, u32 retval, u32 duration)
+			   struct xdp_shared_info *xdp_sinfo, u32 size,
+			   u32 retval, u32 duration)
 {
 	void __user *data_out = u64_to_user_ptr(kattr->test.data_out);
 	int err = -EFAULT;
@@ -143,8 +144,37 @@ static int bpf_test_finish(const union bpf_attr *kattr,
 		err = -ENOSPC;
 	}
 
-	if (data_out && copy_to_user(data_out, data, copy_size))
-		goto out;
+	if (data_out) {
+		int len = xdp_sinfo ? copy_size - xdp_sinfo->data_length
+				    : copy_size;
+
+		if (copy_to_user(data_out, data, len))
+			goto out;
+
+		if (xdp_sinfo) {
+			int i, offset = len, data_len;
+
+			for (i = 0; i < xdp_sinfo->nr_frags; i++) {
+				skb_frag_t *frag = &xdp_sinfo->frags[i];
+
+				if (offset >= copy_size) {
+					err = -ENOSPC;
+					break;
+				}
+
+				data_len = min_t(int, copy_size - offset,
+						 xdp_get_frag_size(frag));
+
+				if (copy_to_user(data_out + offset,
+						 xdp_get_frag_address(frag),
+						 data_len))
+					goto out;
+
+				offset += data_len;
+			}
+		}
+	}
+
 	if (copy_to_user(&uattr->test.data_size_out, &size, sizeof(size)))
 		goto out;
 	if (copy_to_user(&uattr->test.retval, &retval, sizeof(retval)))
@@ -673,7 +703,8 @@ int bpf_prog_test_run_skb(struct bpf_prog *prog, const union bpf_attr *kattr,
 	/* bpf program can never convert linear skb to non-linear */
 	if (WARN_ON_ONCE(skb_is_nonlinear(skb)))
 		size = skb_headlen(skb);
-	ret = bpf_test_finish(kattr, uattr, skb->data, size, retval, duration);
+	ret = bpf_test_finish(kattr, uattr, skb->data, NULL, size, retval,
+			      duration);
 	if (!ret)
 		ret = bpf_ctx_finish(kattr, uattr, ctx,
 				     sizeof(struct __sk_buff));
@@ -755,7 +786,8 @@ int bpf_prog_test_run_xdp(struct bpf_prog *prog, const union bpf_attr *kattr,
 		goto out;
 
 	size = xdp.data_end - xdp.data + xdp_sinfo->data_length;
-	ret = bpf_test_finish(kattr, uattr, xdp.data, size, retval, duration);
+	ret = bpf_test_finish(kattr, uattr, xdp.data, xdp_sinfo, size, retval,
+			      duration);
 
 out:
 	bpf_prog_change_xdp(prog, NULL);
@@ -841,8 +873,8 @@ int bpf_prog_test_run_flow_dissector(struct bpf_prog *prog,
 	if (ret < 0)
 		goto out;
 
-	ret = bpf_test_finish(kattr, uattr, &flow_keys, sizeof(flow_keys),
-			      retval, duration);
+	ret = bpf_test_finish(kattr, uattr, &flow_keys, NULL,
+			      sizeof(flow_keys), retval, duration);
 	if (!ret)
 		ret = bpf_ctx_finish(kattr, uattr, user_ctx,
 				     sizeof(struct bpf_flow_keys));
@@ -946,7 +978,7 @@ int bpf_prog_test_run_sk_lookup(struct bpf_prog *prog, const union bpf_attr *kat
 		user_ctx->cookie = sock_gen_cookie(ctx.selected_sk);
 	}
 
-	ret = bpf_test_finish(kattr, uattr, NULL, 0, retval, duration);
+	ret = bpf_test_finish(kattr, uattr, NULL, NULL, 0, retval, duration);
 	if (!ret)
 		ret = bpf_ctx_finish(kattr, uattr, user_ctx, sizeof(*user_ctx));
 
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 57+ messages in thread

* [PATCH v8 bpf-next 14/14] bpf: update xdp_adjust_tail selftest to include multi-buffer
  2021-04-08 12:50 [PATCH v8 bpf-next 00/14] mvneta: introduce XDP multi-buffer support Lorenzo Bianconi
                   ` (12 preceding siblings ...)
  2021-04-08 12:51 ` [PATCH v8 bpf-next 13/14] bpf: test_run: add xdp_shared_info pointer in bpf_test_finish signature Lorenzo Bianconi
@ 2021-04-08 12:51 ` Lorenzo Bianconi
  2021-04-09  0:56 ` [PATCH v8 bpf-next 00/14] mvneta: introduce XDP multi-buffer support John Fastabend
  2021-04-16 14:27 ` Magnus Karlsson
  15 siblings, 0 replies; 57+ messages in thread
From: Lorenzo Bianconi @ 2021-04-08 12:51 UTC (permalink / raw)
  To: bpf, netdev
  Cc: lorenzo.bianconi, davem, kuba, ast, daniel, shayagr, sameehj,
	john.fastabend, dsahern, brouer, echaudro, jasowang,
	alexander.duyck, saeed, maciej.fijalkowski

From: Eelco Chaudron <echaudro@redhat.com>

This change adds test cases for the multi-buffer scenarios when shrinking
and growing.

Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
---
 .../bpf/prog_tests/xdp_adjust_tail.c          | 105 ++++++++++++++++++
 .../bpf/progs/test_xdp_adjust_tail_grow.c     |  17 +--
 .../bpf/progs/test_xdp_adjust_tail_shrink.c   |  32 +++++-
 3 files changed, 143 insertions(+), 11 deletions(-)

diff --git a/tools/testing/selftests/bpf/prog_tests/xdp_adjust_tail.c b/tools/testing/selftests/bpf/prog_tests/xdp_adjust_tail.c
index d5c98f2cb12f..b936beaba797 100644
--- a/tools/testing/selftests/bpf/prog_tests/xdp_adjust_tail.c
+++ b/tools/testing/selftests/bpf/prog_tests/xdp_adjust_tail.c
@@ -130,6 +130,107 @@ void test_xdp_adjust_tail_grow2(void)
 	bpf_object__close(obj);
 }
 
+void test_xdp_adjust_mb_tail_shrink(void)
+{
+	const char *file = "./test_xdp_adjust_tail_shrink.o";
+	__u32 duration, retval, size, exp_size;
+	struct bpf_object *obj;
+	static char buf[9000];
+	int err, prog_fd;
+
+	/* For the individual test cases, the first byte in the packet
+	 * indicates which test will be run.
+	 */
+
+	err = bpf_prog_load(file, BPF_PROG_TYPE_XDP, &obj, &prog_fd);
+	if (CHECK_FAIL(err))
+		return;
+
+	/* Test case removing 10 bytes from last frag, NOT freeing it */
+	buf[0] = 0;
+	exp_size = sizeof(buf) - 10;
+	err = bpf_prog_test_run(prog_fd, 1, buf, sizeof(buf),
+				buf, &size, &retval, &duration);
+
+	CHECK(err || retval != XDP_TX || size != exp_size,
+	      "9k-10b", "err %d errno %d retval %d[%d] size %d[%u]\n",
+	      err, errno, retval, XDP_TX, size, exp_size);
+
+	/* Test case removing one of two pages, assuming 4K pages */
+	buf[0] = 1;
+	exp_size = sizeof(buf) - 4100;
+	err = bpf_prog_test_run(prog_fd, 1, buf, sizeof(buf),
+				buf, &size, &retval, &duration);
+
+	CHECK(err || retval != XDP_TX || size != exp_size,
+	      "9k-1p", "err %d errno %d retval %d[%d] size %d[%u]\n",
+	      err, errno, retval, XDP_TX, size, exp_size);
+
+	/* Test case removing two pages resulting in a non mb xdp_buff */
+	buf[0] = 2;
+	exp_size = sizeof(buf) - 8200;
+	err = bpf_prog_test_run(prog_fd, 1, buf, sizeof(buf),
+				buf, &size, &retval, &duration);
+
+	CHECK(err || retval != XDP_TX || size != exp_size,
+	      "9k-2p", "err %d errno %d retval %d[%d] size %d[%u]\n",
+	      err, errno, retval, XDP_TX, size, exp_size);
+
+	bpf_object__close(obj);
+}
+
+void test_xdp_adjust_mb_tail_grow(void)
+{
+	const char *file = "./test_xdp_adjust_tail_grow.o";
+	__u32 duration, retval, size, exp_size;
+	static char buf[16384];
+	struct bpf_object *obj;
+	int err, i, prog_fd;
+
+	err = bpf_prog_load(file, BPF_PROG_TYPE_XDP, &obj, &prog_fd);
+	if (CHECK_FAIL(err))
+		return;
+
+	/* Test case add 10 bytes to last frag */
+	memset(buf, 1, sizeof(buf));
+	size = 9000;
+	exp_size = size + 10;
+	err = bpf_prog_test_run(prog_fd, 1, buf, size,
+				buf, &size, &retval, &duration);
+
+	CHECK(err || retval != XDP_TX || size != exp_size,
+	      "9k+10b", "err %d retval %d[%d] size %d[%u]\n",
+	      err, retval, XDP_TX, size, exp_size);
+
+	for (i = 0; i < 9000; i++)
+		CHECK(buf[i] != 1, "9k+10b-old",
+		      "Old data not all ok, offset %i is failing [%u]!\n",
+		      i, buf[i]);
+
+	for (i = 9000; i < 9010; i++)
+		CHECK(buf[i] != 0, "9k+10b-new",
+		      "New data not all ok, offset %i is failing [%u]!\n",
+		      i, buf[i]);
+
+	for (i = 9010; i < sizeof(buf); i++)
+		CHECK(buf[i] != 1, "9k+10b-untouched",
+		      "Unused data not all ok, offset %i is failing [%u]!\n",
+		      i, buf[i]);
+
+	/* Test a too large grow */
+	memset(buf, 1, sizeof(buf));
+	size = 9001;
+	exp_size = size;
+	err = bpf_prog_test_run(prog_fd, 1, buf, size,
+				buf, &size, &retval, &duration);
+
+	CHECK(err || retval != XDP_DROP || size != exp_size,
+	      "9k+4k", "err %d retval %d[%d] size %d[%u]\n",
+	      err, retval, XDP_DROP, size, exp_size);
+
+	bpf_object__close(obj);
+}
+
 void test_xdp_adjust_tail(void)
 {
 	if (test__start_subtest("xdp_adjust_tail_shrink"))
@@ -138,4 +239,8 @@ void test_xdp_adjust_tail(void)
 		test_xdp_adjust_tail_grow();
 	if (test__start_subtest("xdp_adjust_tail_grow2"))
 		test_xdp_adjust_tail_grow2();
+	if (test__start_subtest("xdp_adjust_mb_tail_shrink"))
+		test_xdp_adjust_mb_tail_shrink();
+	if (test__start_subtest("xdp_adjust_mb_tail_grow"))
+		test_xdp_adjust_mb_tail_grow();
 }
diff --git a/tools/testing/selftests/bpf/progs/test_xdp_adjust_tail_grow.c b/tools/testing/selftests/bpf/progs/test_xdp_adjust_tail_grow.c
index 3d66599eee2e..f8394d625ced 100644
--- a/tools/testing/selftests/bpf/progs/test_xdp_adjust_tail_grow.c
+++ b/tools/testing/selftests/bpf/progs/test_xdp_adjust_tail_grow.c
@@ -7,20 +7,23 @@ int _xdp_adjust_tail_grow(struct xdp_md *xdp)
 {
 	void *data_end = (void *)(long)xdp->data_end;
 	void *data = (void *)(long)xdp->data;
-	unsigned int data_len;
 	int offset = 0;
 
 	/* Data length determine test case */
-	data_len = data_end - data;
 
-	if (data_len == 54) { /* sizeof(pkt_v4) */
+	if (xdp->frame_length == 54) { /* sizeof(pkt_v4) */
 		offset = 4096; /* test too large offset */
-	} else if (data_len == 74) { /* sizeof(pkt_v6) */
+	} else if (xdp->frame_length == 74) { /* sizeof(pkt_v6) */
 		offset = 40;
-	} else if (data_len == 64) {
+	} else if (xdp->frame_length == 64) {
 		offset = 128;
-	} else if (data_len == 128) {
-		offset = 4096 - 256 - 320 - data_len; /* Max tail grow 3520 */
+	} else if (xdp->frame_length == 128) {
+		/* Max tail grow 3520 */
+		offset = 4096 - 256 - 320 - xdp->frame_length;
+	} else if (xdp->frame_length == 9000) {
+		offset = 10;
+	} else if (xdp->frame_length == 9001) {
+		offset = 4096;
 	} else {
 		return XDP_ABORTED; /* No matching test */
 	}
diff --git a/tools/testing/selftests/bpf/progs/test_xdp_adjust_tail_shrink.c b/tools/testing/selftests/bpf/progs/test_xdp_adjust_tail_shrink.c
index 22065a9cfb25..689450414d29 100644
--- a/tools/testing/selftests/bpf/progs/test_xdp_adjust_tail_shrink.c
+++ b/tools/testing/selftests/bpf/progs/test_xdp_adjust_tail_shrink.c
@@ -14,14 +14,38 @@ int _version SEC("version") = 1;
 SEC("xdp_adjust_tail_shrink")
 int _xdp_adjust_tail_shrink(struct xdp_md *xdp)
 {
-	void *data_end = (void *)(long)xdp->data_end;
-	void *data = (void *)(long)xdp->data;
+	__u8 *data_end = (void *)(long)xdp->data_end;
+	__u8 *data = (void *)(long)xdp->data;
 	int offset = 0;
 
-	if (data_end - data == 54) /* sizeof(pkt_v4) */
+	switch (xdp->frame_length) {
+	case 54:
+		/* sizeof(pkt_v4) */
 		offset = 256; /* shrink too much */
-	else
+		break;
+	case 9000:
+		/* Multi-buffer test cases */
+		if (data + 1 > data_end)
+			return XDP_DROP;
+
+		switch (data[0]) {
+		case 0:
+			offset = 10;
+			break;
+		case 1:
+			offset = 4100;
+			break;
+		case 2:
+			offset = 8200;
+			break;
+		default:
+			return XDP_DROP;
+		}
+		break;
+	default:
 		offset = 20;
+		break;
+	}
 	if (bpf_xdp_adjust_tail(xdp, 0 - offset))
 		return XDP_DROP;
 	return XDP_TX;
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 57+ messages in thread

* Re: [PATCH v8 bpf-next 02/14] xdp: add xdp_shared_info data structure
  2021-04-08 12:50 ` [PATCH v8 bpf-next 02/14] xdp: add xdp_shared_info data structure Lorenzo Bianconi
@ 2021-04-08 13:39   ` Vladimir Oltean
  2021-04-08 14:26     ` Lorenzo Bianconi
  2021-04-08 18:06   ` kernel test robot
  1 sibling, 1 reply; 57+ messages in thread
From: Vladimir Oltean @ 2021-04-08 13:39 UTC (permalink / raw)
  To: Lorenzo Bianconi
  Cc: bpf, netdev, lorenzo.bianconi, davem, kuba, ast, daniel, shayagr,
	sameehj, john.fastabend, dsahern, brouer, echaudro, jasowang,
	alexander.duyck, saeed, maciej.fijalkowski

Hi Lorenzo,

On Thu, Apr 08, 2021 at 02:50:54PM +0200, Lorenzo Bianconi wrote:
> Introduce xdp_shared_info data structure to contain info about
> "non-linear" xdp frame. xdp_shared_info will alias skb_shared_info
> allowing to keep most of the frags in the same cache-line.
> Introduce some xdp_shared_info helpers aligned to skb_frag* ones
> 
> Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
> ---

Would you mind updating all drivers that use skb_shared_info, such as
enetc, and not just mvneta? At the moment I get some build warnings:

drivers/net/ethernet/freescale/enetc/enetc.c: In function ‘enetc_xdp_frame_to_xdp_tx_swbd’:
drivers/net/ethernet/freescale/enetc/enetc.c:888:9: error: assignment to ‘struct skb_shared_info *’ from incompatible pointer type ‘struct xdp_shared_info *’ [-Werror=incompatible-pointer-types]
  888 |  shinfo = xdp_get_shared_info_from_frame(xdp_frame);
      |         ^
drivers/net/ethernet/freescale/enetc/enetc.c: In function ‘enetc_map_rx_buff_to_xdp’:
drivers/net/ethernet/freescale/enetc/enetc.c:975:9: error: assignment to ‘struct skb_shared_info *’ from incompatible pointer type ‘struct xdp_shared_info *’ [-Werror=incompatible-pointer-types]
  975 |  shinfo = xdp_get_shared_info_from_buff(xdp_buff);
      |         ^
drivers/net/ethernet/freescale/enetc/enetc.c: In function ‘enetc_add_rx_buff_to_xdp’:
drivers/net/ethernet/freescale/enetc/enetc.c:982:35: error: initialization of ‘struct skb_shared_info *’ from incompatible pointer type ‘struct xdp_shared_info *’ [-Werror=incompatible-pointer-types]
  982 |  struct skb_shared_info *shinfo = xdp_get_shared_info_from_buff(xdp_buff);
      |                                   ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v8 bpf-next 02/14] xdp: add xdp_shared_info data structure
  2021-04-08 13:39   ` Vladimir Oltean
@ 2021-04-08 14:26     ` Lorenzo Bianconi
  0 siblings, 0 replies; 57+ messages in thread
From: Lorenzo Bianconi @ 2021-04-08 14:26 UTC (permalink / raw)
  To: Vladimir Oltean
  Cc: Lorenzo Bianconi, bpf, netdev, davem, kuba, ast, daniel, shayagr,
	sameehj, john.fastabend, dsahern, brouer, echaudro, jasowang,
	alexander.duyck, saeed, maciej.fijalkowski


> Hi Lorenzo,
> 
> On Thu, Apr 08, 2021 at 02:50:54PM +0200, Lorenzo Bianconi wrote:
> > Introduce xdp_shared_info data structure to contain info about
> > "non-linear" xdp frame. xdp_shared_info will alias skb_shared_info
> > allowing to keep most of the frags in the same cache-line.
> > Introduce some xdp_shared_info helpers aligned to skb_frag* ones
> > 
> > Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
> > ---
> 
> Would you mind updating all drivers that use skb_shared_info, such as
> enetc, and not just mvneta? At the moment I get some build warnings:
> 
> drivers/net/ethernet/freescale/enetc/enetc.c: In function ‘enetc_xdp_frame_to_xdp_tx_swbd’:
> drivers/net/ethernet/freescale/enetc/enetc.c:888:9: error: assignment to ‘struct skb_shared_info *’ from incompatible pointer type ‘struct xdp_shared_info *’ [-Werror=incompatible-pointer-types]
>   888 |  shinfo = xdp_get_shared_info_from_frame(xdp_frame);
>       |         ^
> drivers/net/ethernet/freescale/enetc/enetc.c: In function ‘enetc_map_rx_buff_to_xdp’:
> drivers/net/ethernet/freescale/enetc/enetc.c:975:9: error: assignment to ‘struct skb_shared_info *’ from incompatible pointer type ‘struct xdp_shared_info *’ [-Werror=incompatible-pointer-types]
>   975 |  shinfo = xdp_get_shared_info_from_buff(xdp_buff);
>       |         ^
> drivers/net/ethernet/freescale/enetc/enetc.c: In function ‘enetc_add_rx_buff_to_xdp’:
> drivers/net/ethernet/freescale/enetc/enetc.c:982:35: error: initialization of ‘struct skb_shared_info *’ from incompatible pointer type ‘struct xdp_shared_info *’ [-Werror=incompatible-pointer-types]
>   982 |  struct skb_shared_info *shinfo = xdp_get_shared_info_from_buff(xdp_buff);
>       |                                   ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> 

Hi Vladimir,

Ack, I will fix it in v9; enetc was not compiled on my machine, thanks for pointing
this out :)

Regards,
Lorenzo


^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v8 bpf-next 02/14] xdp: add xdp_shared_info data structure
  2021-04-08 12:50 ` [PATCH v8 bpf-next 02/14] xdp: add xdp_shared_info data structure Lorenzo Bianconi
  2021-04-08 13:39   ` Vladimir Oltean
@ 2021-04-08 18:06   ` kernel test robot
  1 sibling, 0 replies; 57+ messages in thread
From: kernel test robot @ 2021-04-08 18:06 UTC (permalink / raw)
  To: Lorenzo Bianconi, bpf, netdev
  Cc: kbuild-all, lorenzo.bianconi, davem, kuba, ast, daniel, shayagr,
	sameehj, john.fastabend


Hi Lorenzo,

I love your patch! Yet something to improve:

[auto build test ERROR on bpf-next/master]

url:    https://github.com/0day-ci/linux/commits/Lorenzo-Bianconi/mvneta-introduce-XDP-multi-buffer-support/20210408-205429
base:   https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git master
config: ia64-allmodconfig (attached as .config)
compiler: ia64-linux-gcc (GCC) 9.3.0
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # https://github.com/0day-ci/linux/commit/3652b59c1a912ad4f2d609e074eeb332f44ba4d7
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review Lorenzo-Bianconi/mvneta-introduce-XDP-multi-buffer-support/20210408-205429
        git checkout 3652b59c1a912ad4f2d609e074eeb332f44ba4d7
        # save the attached .config to linux build tree
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-9.3.0 make.cross ARCH=ia64 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All errors (new ones prefixed by >>):

   drivers/net/ethernet/freescale/enetc/enetc.c: In function 'enetc_xdp_frame_to_xdp_tx_swbd':
>> drivers/net/ethernet/freescale/enetc/enetc.c:888:9: error: assignment to 'struct skb_shared_info *' from incompatible pointer type 'struct xdp_shared_info *' [-Werror=incompatible-pointer-types]
     888 |  shinfo = xdp_get_shared_info_from_frame(xdp_frame);
         |         ^
   drivers/net/ethernet/freescale/enetc/enetc.c: In function 'enetc_map_rx_buff_to_xdp':
   drivers/net/ethernet/freescale/enetc/enetc.c:975:9: error: assignment to 'struct skb_shared_info *' from incompatible pointer type 'struct xdp_shared_info *' [-Werror=incompatible-pointer-types]
     975 |  shinfo = xdp_get_shared_info_from_buff(xdp_buff);
         |         ^
   drivers/net/ethernet/freescale/enetc/enetc.c: In function 'enetc_add_rx_buff_to_xdp':
>> drivers/net/ethernet/freescale/enetc/enetc.c:982:35: error: initialization of 'struct skb_shared_info *' from incompatible pointer type 'struct xdp_shared_info *' [-Werror=incompatible-pointer-types]
     982 |  struct skb_shared_info *shinfo = xdp_get_shared_info_from_buff(xdp_buff);
         |                                   ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   cc1: some warnings being treated as errors


vim +888 drivers/net/ethernet/freescale/enetc/enetc.c

7ed2bc80074ed4 Vladimir Oltean 2021-03-31  858  
9d2b68cc108db2 Vladimir Oltean 2021-03-31  859  static int enetc_xdp_frame_to_xdp_tx_swbd(struct enetc_bdr *tx_ring,
9d2b68cc108db2 Vladimir Oltean 2021-03-31  860  					  struct enetc_tx_swbd *xdp_tx_arr,
9d2b68cc108db2 Vladimir Oltean 2021-03-31  861  					  struct xdp_frame *xdp_frame)
9d2b68cc108db2 Vladimir Oltean 2021-03-31  862  {
9d2b68cc108db2 Vladimir Oltean 2021-03-31  863  	struct enetc_tx_swbd *xdp_tx_swbd = &xdp_tx_arr[0];
9d2b68cc108db2 Vladimir Oltean 2021-03-31  864  	struct skb_shared_info *shinfo;
9d2b68cc108db2 Vladimir Oltean 2021-03-31  865  	void *data = xdp_frame->data;
9d2b68cc108db2 Vladimir Oltean 2021-03-31  866  	int len = xdp_frame->len;
9d2b68cc108db2 Vladimir Oltean 2021-03-31  867  	skb_frag_t *frag;
9d2b68cc108db2 Vladimir Oltean 2021-03-31  868  	dma_addr_t dma;
9d2b68cc108db2 Vladimir Oltean 2021-03-31  869  	unsigned int f;
9d2b68cc108db2 Vladimir Oltean 2021-03-31  870  	int n = 0;
9d2b68cc108db2 Vladimir Oltean 2021-03-31  871  
9d2b68cc108db2 Vladimir Oltean 2021-03-31  872  	dma = dma_map_single(tx_ring->dev, data, len, DMA_TO_DEVICE);
9d2b68cc108db2 Vladimir Oltean 2021-03-31  873  	if (unlikely(dma_mapping_error(tx_ring->dev, dma))) {
9d2b68cc108db2 Vladimir Oltean 2021-03-31  874  		netdev_err(tx_ring->ndev, "DMA map error\n");
9d2b68cc108db2 Vladimir Oltean 2021-03-31  875  		return -1;
9d2b68cc108db2 Vladimir Oltean 2021-03-31  876  	}
9d2b68cc108db2 Vladimir Oltean 2021-03-31  877  
9d2b68cc108db2 Vladimir Oltean 2021-03-31  878  	xdp_tx_swbd->dma = dma;
9d2b68cc108db2 Vladimir Oltean 2021-03-31  879  	xdp_tx_swbd->dir = DMA_TO_DEVICE;
9d2b68cc108db2 Vladimir Oltean 2021-03-31  880  	xdp_tx_swbd->len = len;
9d2b68cc108db2 Vladimir Oltean 2021-03-31  881  	xdp_tx_swbd->is_xdp_redirect = true;
9d2b68cc108db2 Vladimir Oltean 2021-03-31  882  	xdp_tx_swbd->is_eof = false;
9d2b68cc108db2 Vladimir Oltean 2021-03-31  883  	xdp_tx_swbd->xdp_frame = NULL;
9d2b68cc108db2 Vladimir Oltean 2021-03-31  884  
9d2b68cc108db2 Vladimir Oltean 2021-03-31  885  	n++;
9d2b68cc108db2 Vladimir Oltean 2021-03-31  886  	xdp_tx_swbd = &xdp_tx_arr[n];
9d2b68cc108db2 Vladimir Oltean 2021-03-31  887  
9d2b68cc108db2 Vladimir Oltean 2021-03-31 @888  	shinfo = xdp_get_shared_info_from_frame(xdp_frame);
9d2b68cc108db2 Vladimir Oltean 2021-03-31  889  
9d2b68cc108db2 Vladimir Oltean 2021-03-31  890  	for (f = 0, frag = &shinfo->frags[0]; f < shinfo->nr_frags;
9d2b68cc108db2 Vladimir Oltean 2021-03-31  891  	     f++, frag++) {
9d2b68cc108db2 Vladimir Oltean 2021-03-31  892  		data = skb_frag_address(frag);
9d2b68cc108db2 Vladimir Oltean 2021-03-31  893  		len = skb_frag_size(frag);
9d2b68cc108db2 Vladimir Oltean 2021-03-31  894  
9d2b68cc108db2 Vladimir Oltean 2021-03-31  895  		dma = dma_map_single(tx_ring->dev, data, len, DMA_TO_DEVICE);
9d2b68cc108db2 Vladimir Oltean 2021-03-31  896  		if (unlikely(dma_mapping_error(tx_ring->dev, dma))) {
9d2b68cc108db2 Vladimir Oltean 2021-03-31  897  			/* Undo the DMA mapping for all fragments */
9d2b68cc108db2 Vladimir Oltean 2021-03-31  898  			while (n-- >= 0)
9d2b68cc108db2 Vladimir Oltean 2021-03-31  899  				enetc_unmap_tx_buff(tx_ring, &xdp_tx_arr[n]);
9d2b68cc108db2 Vladimir Oltean 2021-03-31  900  
9d2b68cc108db2 Vladimir Oltean 2021-03-31  901  			netdev_err(tx_ring->ndev, "DMA map error\n");
9d2b68cc108db2 Vladimir Oltean 2021-03-31  902  			return -1;
9d2b68cc108db2 Vladimir Oltean 2021-03-31  903  		}
9d2b68cc108db2 Vladimir Oltean 2021-03-31  904  
9d2b68cc108db2 Vladimir Oltean 2021-03-31  905  		xdp_tx_swbd->dma = dma;
9d2b68cc108db2 Vladimir Oltean 2021-03-31  906  		xdp_tx_swbd->dir = DMA_TO_DEVICE;
9d2b68cc108db2 Vladimir Oltean 2021-03-31  907  		xdp_tx_swbd->len = len;
9d2b68cc108db2 Vladimir Oltean 2021-03-31  908  		xdp_tx_swbd->is_xdp_redirect = true;
9d2b68cc108db2 Vladimir Oltean 2021-03-31  909  		xdp_tx_swbd->is_eof = false;
9d2b68cc108db2 Vladimir Oltean 2021-03-31  910  		xdp_tx_swbd->xdp_frame = NULL;
9d2b68cc108db2 Vladimir Oltean 2021-03-31  911  
9d2b68cc108db2 Vladimir Oltean 2021-03-31  912  		n++;
9d2b68cc108db2 Vladimir Oltean 2021-03-31  913  		xdp_tx_swbd = &xdp_tx_arr[n];
9d2b68cc108db2 Vladimir Oltean 2021-03-31  914  	}
9d2b68cc108db2 Vladimir Oltean 2021-03-31  915  
9d2b68cc108db2 Vladimir Oltean 2021-03-31  916  	xdp_tx_arr[n - 1].is_eof = true;
9d2b68cc108db2 Vladimir Oltean 2021-03-31  917  	xdp_tx_arr[n - 1].xdp_frame = xdp_frame;
9d2b68cc108db2 Vladimir Oltean 2021-03-31  918  
9d2b68cc108db2 Vladimir Oltean 2021-03-31  919  	return n;
9d2b68cc108db2 Vladimir Oltean 2021-03-31  920  }
9d2b68cc108db2 Vladimir Oltean 2021-03-31  921  
9d2b68cc108db2 Vladimir Oltean 2021-03-31  922  int enetc_xdp_xmit(struct net_device *ndev, int num_frames,
9d2b68cc108db2 Vladimir Oltean 2021-03-31  923  		   struct xdp_frame **frames, u32 flags)
9d2b68cc108db2 Vladimir Oltean 2021-03-31  924  {
9d2b68cc108db2 Vladimir Oltean 2021-03-31  925  	struct enetc_tx_swbd xdp_redirect_arr[ENETC_MAX_SKB_FRAGS] = {0};
9d2b68cc108db2 Vladimir Oltean 2021-03-31  926  	struct enetc_ndev_priv *priv = netdev_priv(ndev);
9d2b68cc108db2 Vladimir Oltean 2021-03-31  927  	struct enetc_bdr *tx_ring;
9d2b68cc108db2 Vladimir Oltean 2021-03-31  928  	int xdp_tx_bd_cnt, i, k;
9d2b68cc108db2 Vladimir Oltean 2021-03-31  929  	int xdp_tx_frm_cnt = 0;
9d2b68cc108db2 Vladimir Oltean 2021-03-31  930  
9d2b68cc108db2 Vladimir Oltean 2021-03-31  931  	tx_ring = priv->tx_ring[smp_processor_id()];
9d2b68cc108db2 Vladimir Oltean 2021-03-31  932  
9d2b68cc108db2 Vladimir Oltean 2021-03-31  933  	prefetchw(ENETC_TXBD(*tx_ring, tx_ring->next_to_use));
9d2b68cc108db2 Vladimir Oltean 2021-03-31  934  
9d2b68cc108db2 Vladimir Oltean 2021-03-31  935  	for (k = 0; k < num_frames; k++) {
9d2b68cc108db2 Vladimir Oltean 2021-03-31  936  		xdp_tx_bd_cnt = enetc_xdp_frame_to_xdp_tx_swbd(tx_ring,
9d2b68cc108db2 Vladimir Oltean 2021-03-31  937  							       xdp_redirect_arr,
9d2b68cc108db2 Vladimir Oltean 2021-03-31  938  							       frames[k]);
9d2b68cc108db2 Vladimir Oltean 2021-03-31  939  		if (unlikely(xdp_tx_bd_cnt < 0))
9d2b68cc108db2 Vladimir Oltean 2021-03-31  940  			break;
9d2b68cc108db2 Vladimir Oltean 2021-03-31  941  
9d2b68cc108db2 Vladimir Oltean 2021-03-31  942  		if (unlikely(!enetc_xdp_tx(tx_ring, xdp_redirect_arr,
9d2b68cc108db2 Vladimir Oltean 2021-03-31  943  					   xdp_tx_bd_cnt))) {
9d2b68cc108db2 Vladimir Oltean 2021-03-31  944  			for (i = 0; i < xdp_tx_bd_cnt; i++)
9d2b68cc108db2 Vladimir Oltean 2021-03-31  945  				enetc_unmap_tx_buff(tx_ring,
9d2b68cc108db2 Vladimir Oltean 2021-03-31  946  						    &xdp_redirect_arr[i]);
9d2b68cc108db2 Vladimir Oltean 2021-03-31  947  			tx_ring->stats.xdp_tx_drops++;
9d2b68cc108db2 Vladimir Oltean 2021-03-31  948  			break;
9d2b68cc108db2 Vladimir Oltean 2021-03-31  949  		}
9d2b68cc108db2 Vladimir Oltean 2021-03-31  950  
9d2b68cc108db2 Vladimir Oltean 2021-03-31  951  		xdp_tx_frm_cnt++;
9d2b68cc108db2 Vladimir Oltean 2021-03-31  952  	}
9d2b68cc108db2 Vladimir Oltean 2021-03-31  953  
9d2b68cc108db2 Vladimir Oltean 2021-03-31  954  	if (unlikely((flags & XDP_XMIT_FLUSH) || k != xdp_tx_frm_cnt))
9d2b68cc108db2 Vladimir Oltean 2021-03-31  955  		enetc_update_tx_ring_tail(tx_ring);
9d2b68cc108db2 Vladimir Oltean 2021-03-31  956  
9d2b68cc108db2 Vladimir Oltean 2021-03-31  957  	tx_ring->stats.xdp_tx += xdp_tx_frm_cnt;
9d2b68cc108db2 Vladimir Oltean 2021-03-31  958  
9d2b68cc108db2 Vladimir Oltean 2021-03-31  959  	return xdp_tx_frm_cnt;
9d2b68cc108db2 Vladimir Oltean 2021-03-31  960  }
9d2b68cc108db2 Vladimir Oltean 2021-03-31  961  
d1b15102dd16ad Vladimir Oltean 2021-03-31  962  static void enetc_map_rx_buff_to_xdp(struct enetc_bdr *rx_ring, int i,
d1b15102dd16ad Vladimir Oltean 2021-03-31  963  				     struct xdp_buff *xdp_buff, u16 size)
d1b15102dd16ad Vladimir Oltean 2021-03-31  964  {
d1b15102dd16ad Vladimir Oltean 2021-03-31  965  	struct enetc_rx_swbd *rx_swbd = enetc_get_rx_buff(rx_ring, i, size);
d1b15102dd16ad Vladimir Oltean 2021-03-31  966  	void *hard_start = page_address(rx_swbd->page) + rx_swbd->page_offset;
d1b15102dd16ad Vladimir Oltean 2021-03-31  967  	struct skb_shared_info *shinfo;
d1b15102dd16ad Vladimir Oltean 2021-03-31  968  
7ed2bc80074ed4 Vladimir Oltean 2021-03-31  969  	/* To be used for XDP_TX */
7ed2bc80074ed4 Vladimir Oltean 2021-03-31  970  	rx_swbd->len = size;
7ed2bc80074ed4 Vladimir Oltean 2021-03-31  971  
d1b15102dd16ad Vladimir Oltean 2021-03-31  972  	xdp_prepare_buff(xdp_buff, hard_start - rx_ring->buffer_offset,
d1b15102dd16ad Vladimir Oltean 2021-03-31  973  			 rx_ring->buffer_offset, size, false);
d1b15102dd16ad Vladimir Oltean 2021-03-31  974  
d1b15102dd16ad Vladimir Oltean 2021-03-31  975  	shinfo = xdp_get_shared_info_from_buff(xdp_buff);
d1b15102dd16ad Vladimir Oltean 2021-03-31  976  	shinfo->nr_frags = 0;
d1b15102dd16ad Vladimir Oltean 2021-03-31  977  }
d1b15102dd16ad Vladimir Oltean 2021-03-31  978  
d1b15102dd16ad Vladimir Oltean 2021-03-31  979  static void enetc_add_rx_buff_to_xdp(struct enetc_bdr *rx_ring, int i,
d1b15102dd16ad Vladimir Oltean 2021-03-31  980  				     u16 size, struct xdp_buff *xdp_buff)
d1b15102dd16ad Vladimir Oltean 2021-03-31  981  {
d1b15102dd16ad Vladimir Oltean 2021-03-31 @982  	struct skb_shared_info *shinfo = xdp_get_shared_info_from_buff(xdp_buff);
d1b15102dd16ad Vladimir Oltean 2021-03-31  983  	struct enetc_rx_swbd *rx_swbd = enetc_get_rx_buff(rx_ring, i, size);
d1b15102dd16ad Vladimir Oltean 2021-03-31  984  	skb_frag_t *frag = &shinfo->frags[shinfo->nr_frags];
d1b15102dd16ad Vladimir Oltean 2021-03-31  985  
7ed2bc80074ed4 Vladimir Oltean 2021-03-31  986  	/* To be used for XDP_TX */
7ed2bc80074ed4 Vladimir Oltean 2021-03-31  987  	rx_swbd->len = size;
7ed2bc80074ed4 Vladimir Oltean 2021-03-31  988  
d1b15102dd16ad Vladimir Oltean 2021-03-31  989  	skb_frag_off_set(frag, rx_swbd->page_offset);
d1b15102dd16ad Vladimir Oltean 2021-03-31  990  	skb_frag_size_set(frag, size);
d1b15102dd16ad Vladimir Oltean 2021-03-31  991  	__skb_frag_set_page(frag, rx_swbd->page);
d1b15102dd16ad Vladimir Oltean 2021-03-31  992  
d1b15102dd16ad Vladimir Oltean 2021-03-31  993  	shinfo->nr_frags++;
d1b15102dd16ad Vladimir Oltean 2021-03-31  994  }
d1b15102dd16ad Vladimir Oltean 2021-03-31  995  

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 63872 bytes --]

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v8 bpf-next 01/14] xdp: introduce mb in xdp_buff/xdp_frame
  2021-04-08 12:50 ` [PATCH v8 bpf-next 01/14] xdp: introduce mb in xdp_buff/xdp_frame Lorenzo Bianconi
@ 2021-04-08 18:17   ` Vladimir Oltean
  2021-04-09 16:03     ` Lorenzo Bianconi
  2021-04-29 13:36   ` Jesper Dangaard Brouer
  1 sibling, 1 reply; 57+ messages in thread
From: Vladimir Oltean @ 2021-04-08 18:17 UTC (permalink / raw)
  To: Lorenzo Bianconi
  Cc: bpf, netdev, lorenzo.bianconi, davem, kuba, ast, daniel, shayagr,
	sameehj, john.fastabend, dsahern, brouer, echaudro, jasowang,
	alexander.duyck, saeed, maciej.fijalkowski

On Thu, Apr 08, 2021 at 02:50:53PM +0200, Lorenzo Bianconi wrote:
> Introduce multi-buffer bit (mb) in xdp_frame/xdp_buffer data structure
> in order to specify if this is a linear buffer (mb = 0) or a multi-buffer
> frame (mb = 1). In the latter case the shared_info area at the end of the
> first buffer will be properly initialized to link together subsequent
> buffers.
> 
> Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
> ---
>  include/net/xdp.h | 11 +++++++++--
>  1 file changed, 9 insertions(+), 2 deletions(-)
> 
> diff --git a/include/net/xdp.h b/include/net/xdp.h
> index a5bc214a49d9..842580a61563 100644
> --- a/include/net/xdp.h
> +++ b/include/net/xdp.h
> @@ -73,7 +73,10 @@ struct xdp_buff {
>  	void *data_hard_start;
>  	struct xdp_rxq_info *rxq;
>  	struct xdp_txq_info *txq;
> -	u32 frame_sz; /* frame size to deduce data_hard_end/reserved tailroom*/
> +	u32 frame_sz:31; /* frame size to deduce data_hard_end/reserved
> +			  * tailroom
> +			  */

This comment would have fit just fine on one line:

	/* frame size to deduce data_hard_end/reserved tailroom */

> +	u32 mb:1; /* xdp non-linear buffer */
>  };
>  
>  static __always_inline void
> @@ -81,6 +84,7 @@ xdp_init_buff(struct xdp_buff *xdp, u32 frame_sz, struct xdp_rxq_info *rxq)
>  {
>  	xdp->frame_sz = frame_sz;
>  	xdp->rxq = rxq;
> +	xdp->mb = 0;
>  }
>  
>  static __always_inline void
> @@ -116,7 +120,8 @@ struct xdp_frame {
>  	u16 len;
>  	u16 headroom;
>  	u32 metasize:8;
> -	u32 frame_sz:24;
> +	u32 frame_sz:23;
> +	u32 mb:1; /* xdp non-linear frame */
>  	/* Lifetime of xdp_rxq_info is limited to NAPI/enqueue time,
>  	 * while mem info is valid on remote CPU.
>  	 */
> @@ -179,6 +184,7 @@ void xdp_convert_frame_to_buff(struct xdp_frame *frame, struct xdp_buff *xdp)
>  	xdp->data_end = frame->data + frame->len;
>  	xdp->data_meta = frame->data - frame->metasize;
>  	xdp->frame_sz = frame->frame_sz;
> +	xdp->mb = frame->mb;
>  }
>  
>  static inline
> @@ -205,6 +211,7 @@ int xdp_update_frame_from_buff(struct xdp_buff *xdp,
>  	xdp_frame->headroom = headroom - sizeof(*xdp_frame);
>  	xdp_frame->metasize = metasize;
>  	xdp_frame->frame_sz = xdp->frame_sz;
> +	xdp_frame->mb = xdp->mb;
>  
>  	return 0;
>  }
> -- 
> 2.30.2
> 


* Re: [PATCH v8 bpf-next 03/14] net: mvneta: update mb bit before passing the xdp buffer to eBPF layer
  2021-04-08 12:50 ` [PATCH v8 bpf-next 03/14] net: mvneta: update mb bit before passing the xdp buffer to eBPF layer Lorenzo Bianconi
@ 2021-04-08 18:19   ` Vladimir Oltean
  2021-04-09 16:24     ` Lorenzo Bianconi
  0 siblings, 1 reply; 57+ messages in thread
From: Vladimir Oltean @ 2021-04-08 18:19 UTC (permalink / raw)
  To: Lorenzo Bianconi
  Cc: bpf, netdev, lorenzo.bianconi, davem, kuba, ast, daniel, shayagr,
	sameehj, john.fastabend, dsahern, brouer, echaudro, jasowang,
	alexander.duyck, saeed, maciej.fijalkowski

On Thu, Apr 08, 2021 at 02:50:55PM +0200, Lorenzo Bianconi wrote:
> Update multi-buffer bit (mb) in xdp_buff to notify XDP/eBPF layer and
> XDP remote drivers if this is a "non-linear" XDP buffer. Access
> xdp_shared_info only if xdp_buff mb is set.
> 
> Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
> ---
>  drivers/net/ethernet/marvell/mvneta.c | 26 ++++++++++++++++++++------
>  1 file changed, 20 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/net/ethernet/marvell/mvneta.c b/drivers/net/ethernet/marvell/mvneta.c
> index a52e132fd2cf..94e29cce693a 100644
> --- a/drivers/net/ethernet/marvell/mvneta.c
> +++ b/drivers/net/ethernet/marvell/mvneta.c
> @@ -2041,12 +2041,16 @@ mvneta_xdp_put_buff(struct mvneta_port *pp, struct mvneta_rx_queue *rxq,
>  {
>  	int i;
>  
> +	if (likely(!xdp->mb))
> +		goto out;
> +

Is there any particular reason for this extra check?

>  	for (i = 0; i < xdp_sinfo->nr_frags; i++) {
>  		skb_frag_t *frag = &xdp_sinfo->frags[i];
>  
>  		page_pool_put_full_page(rxq->page_pool,
>  					xdp_get_frag_page(frag), true);
>  	}
> +out:
>  	page_pool_put_page(rxq->page_pool, virt_to_head_page(xdp->data),
>  			   sync_len, true);
>  }
> @@ -2246,7 +2250,6 @@ mvneta_swbm_rx_frame(struct mvneta_port *pp,
>  {
>  	unsigned char *data = page_address(page);
>  	int data_len = -MVNETA_MH_SIZE, len;
> -	struct xdp_shared_info *xdp_sinfo;
>  	struct net_device *dev = pp->dev;
>  	enum dma_data_direction dma_dir;
>  
> @@ -2270,9 +2273,6 @@ mvneta_swbm_rx_frame(struct mvneta_port *pp,
>  	prefetch(data);
>  	xdp_prepare_buff(xdp, data, pp->rx_offset_correction + MVNETA_MH_SIZE,
>  			 data_len, false);
> -
> -	xdp_sinfo = xdp_get_shared_info_from_buff(xdp);
> -	xdp_sinfo->nr_frags = 0;
>  }
>  
>  static void
> @@ -2307,12 +2307,18 @@ mvneta_swbm_add_rx_fragment(struct mvneta_port *pp,
>  		xdp_set_frag_size(frag, data_len);
>  		xdp_set_frag_page(frag, page);
>  
> +		if (!xdp->mb) {
> +			xdp_sinfo->data_length = *size;
> +			xdp->mb = 1;
> +		}
>  		/* last fragment */
>  		if (len == *size) {
>  			struct xdp_shared_info *sinfo;
>  
>  			sinfo = xdp_get_shared_info_from_buff(xdp);
>  			sinfo->nr_frags = xdp_sinfo->nr_frags;
> +			sinfo->data_length = xdp_sinfo->data_length;
> +
>  			memcpy(sinfo->frags, xdp_sinfo->frags,
>  			       sinfo->nr_frags * sizeof(skb_frag_t));
>  		}
> @@ -2327,11 +2333,15 @@ mvneta_swbm_build_skb(struct mvneta_port *pp, struct mvneta_rx_queue *rxq,
>  		      struct xdp_buff *xdp, u32 desc_status)
>  {
>  	struct xdp_shared_info *xdp_sinfo = xdp_get_shared_info_from_buff(xdp);
> -	int i, num_frags = xdp_sinfo->nr_frags;
>  	skb_frag_t frag_list[MAX_SKB_FRAGS];
> +	int i, num_frags = 0;
>  	struct sk_buff *skb;
>  
> -	memcpy(frag_list, xdp_sinfo->frags, sizeof(skb_frag_t) * num_frags);
> +	if (unlikely(xdp->mb)) {
> +		num_frags = xdp_sinfo->nr_frags;
> +		memcpy(frag_list, xdp_sinfo->frags,
> +		       sizeof(skb_frag_t) * num_frags);
> +	}
>  
>  	skb = build_skb(xdp->data_hard_start, PAGE_SIZE);
>  	if (!skb)
> @@ -2343,6 +2353,9 @@ mvneta_swbm_build_skb(struct mvneta_port *pp, struct mvneta_rx_queue *rxq,
>  	skb_put(skb, xdp->data_end - xdp->data);
>  	mvneta_rx_csum(pp, desc_status, skb);
>  
> +	if (likely(!xdp->mb))
> +		return skb;
> +
>  	for (i = 0; i < num_frags; i++) {
>  		struct page *page = xdp_get_frag_page(&frag_list[i]);
>  
> @@ -2404,6 +2417,7 @@ static int mvneta_rx_swbm(struct napi_struct *napi,
>  			frame_sz = size - ETH_FCS_LEN;
>  			desc_status = rx_status;
>  
> +			xdp_buf.mb = 0;
>  			mvneta_swbm_rx_frame(pp, rx_desc, rxq, &xdp_buf,
>  					     &size, page);
>  		} else {
> -- 
> 2.30.2
> 



* Re: [PATCH v8 bpf-next 04/14] xdp: add multi-buff support to xdp_return_{buff/frame}
  2021-04-08 12:50 ` [PATCH v8 bpf-next 04/14] xdp: add multi-buff support to xdp_return_{buff/frame} Lorenzo Bianconi
@ 2021-04-08 18:30   ` Vladimir Oltean
  2021-04-09 16:28     ` Lorenzo Bianconi
  0 siblings, 1 reply; 57+ messages in thread
From: Vladimir Oltean @ 2021-04-08 18:30 UTC (permalink / raw)
  To: Lorenzo Bianconi
  Cc: bpf, netdev, lorenzo.bianconi, davem, kuba, ast, daniel, shayagr,
	sameehj, john.fastabend, dsahern, brouer, echaudro, jasowang,
	alexander.duyck, saeed, maciej.fijalkowski

On Thu, Apr 08, 2021 at 02:50:56PM +0200, Lorenzo Bianconi wrote:
> Take into account if the received xdp_buff/xdp_frame is non-linear
> recycling/returning the frame memory to the allocator or into
> xdp_frame_bulk.
> Introduce xdp_return_num_frags_from_buff to return a given number of
> fragments from a xdp multi-buff starting from the tail.
> 
> Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
> ---
>  include/net/xdp.h | 19 ++++++++++--
>  net/core/xdp.c    | 76 ++++++++++++++++++++++++++++++++++++++++++++++-
>  2 files changed, 92 insertions(+), 3 deletions(-)
> 
> diff --git a/include/net/xdp.h b/include/net/xdp.h
> index 02aea7696d15..c8eb7cf4ebed 100644
> --- a/include/net/xdp.h
> +++ b/include/net/xdp.h
> @@ -289,6 +289,7 @@ void xdp_return_buff(struct xdp_buff *xdp);
>  void xdp_flush_frame_bulk(struct xdp_frame_bulk *bq);
>  void xdp_return_frame_bulk(struct xdp_frame *xdpf,
>  			   struct xdp_frame_bulk *bq);
> +void xdp_return_num_frags_from_buff(struct xdp_buff *xdp, u16 num_frags);
>  
>  /* When sending xdp_frame into the network stack, then there is no
>   * return point callback, which is needed to release e.g. DMA-mapping
> @@ -299,10 +300,24 @@ void __xdp_release_frame(void *data, struct xdp_mem_info *mem);
>  static inline void xdp_release_frame(struct xdp_frame *xdpf)
>  {
>  	struct xdp_mem_info *mem = &xdpf->mem;
> +	struct xdp_shared_info *xdp_sinfo;
> +	int i;
>  
>  	/* Curr only page_pool needs this */
> -	if (mem->type == MEM_TYPE_PAGE_POOL)
> -		__xdp_release_frame(xdpf->data, mem);
> +	if (mem->type != MEM_TYPE_PAGE_POOL)
> +		return;
> +
> +	if (likely(!xdpf->mb))
> +		goto out;
> +
> +	xdp_sinfo = xdp_get_shared_info_from_frame(xdpf);
> +	for (i = 0; i < xdp_sinfo->nr_frags; i++) {
> +		struct page *page = xdp_get_frag_page(&xdp_sinfo->frags[i]);
> +
> +		__xdp_release_frame(page_address(page), mem);
> +	}
> +out:
> +	__xdp_release_frame(xdpf->data, mem);
>  }
>  
>  int xdp_rxq_info_reg(struct xdp_rxq_info *xdp_rxq,
> diff --git a/net/core/xdp.c b/net/core/xdp.c
> index 05354976c1fc..430f516259d9 100644
> --- a/net/core/xdp.c
> +++ b/net/core/xdp.c
> @@ -374,12 +374,38 @@ static void __xdp_return(void *data, struct xdp_mem_info *mem, bool napi_direct,
>  
>  void xdp_return_frame(struct xdp_frame *xdpf)
>  {
> +	struct xdp_shared_info *xdp_sinfo;
> +	int i;
> +
> +	if (likely(!xdpf->mb))
> +		goto out;
> +
> +	xdp_sinfo = xdp_get_shared_info_from_frame(xdpf);
> +	for (i = 0; i < xdp_sinfo->nr_frags; i++) {
> +		struct page *page = xdp_get_frag_page(&xdp_sinfo->frags[i]);
> +
> +		__xdp_return(page_address(page), &xdpf->mem, false, NULL);
> +	}
> +out:
>  	__xdp_return(xdpf->data, &xdpf->mem, false, NULL);
>  }
>  EXPORT_SYMBOL_GPL(xdp_return_frame);
>  
>  void xdp_return_frame_rx_napi(struct xdp_frame *xdpf)
>  {
> +	struct xdp_shared_info *xdp_sinfo;
> +	int i;
> +
> +	if (likely(!xdpf->mb))
> +		goto out;
> +
> +	xdp_sinfo = xdp_get_shared_info_from_frame(xdpf);
> +	for (i = 0; i < xdp_sinfo->nr_frags; i++) {
> +		struct page *page = xdp_get_frag_page(&xdp_sinfo->frags[i]);
> +
> +		__xdp_return(page_address(page), &xdpf->mem, true, NULL);
> +	}
> +out:
>  	__xdp_return(xdpf->data, &xdpf->mem, true, NULL);
>  }
>  EXPORT_SYMBOL_GPL(xdp_return_frame_rx_napi);
> @@ -415,7 +441,7 @@ void xdp_return_frame_bulk(struct xdp_frame *xdpf,
>  	struct xdp_mem_allocator *xa;
>  
>  	if (mem->type != MEM_TYPE_PAGE_POOL) {
> -		__xdp_return(xdpf->data, &xdpf->mem, false, NULL);
> +		xdp_return_frame(xdpf);
>  		return;
>  	}
>  
> @@ -434,15 +460,63 @@ void xdp_return_frame_bulk(struct xdp_frame *xdpf,
>  		bq->xa = rhashtable_lookup(mem_id_ht, &mem->id, mem_id_rht_params);
>  	}
>  
> +	if (unlikely(xdpf->mb)) {
> +		struct xdp_shared_info *xdp_sinfo;
> +		int i;
> +
> +		xdp_sinfo = xdp_get_shared_info_from_frame(xdpf);
> +		for (i = 0; i < xdp_sinfo->nr_frags; i++) {
> +			skb_frag_t *frag = &xdp_sinfo->frags[i];
> +
> +			bq->q[bq->count++] = xdp_get_frag_address(frag);
> +			if (bq->count == XDP_BULK_QUEUE_SIZE)
> +				xdp_flush_frame_bulk(bq);
> +		}
> +	}
>  	bq->q[bq->count++] = xdpf->data;
>  }
>  EXPORT_SYMBOL_GPL(xdp_return_frame_bulk);
>  
>  void xdp_return_buff(struct xdp_buff *xdp)
>  {
> +	struct xdp_shared_info *xdp_sinfo;
> +	int i;
> +
> +	if (likely(!xdp->mb))
> +		goto out;
> +
> +	xdp_sinfo = xdp_get_shared_info_from_buff(xdp);
> +	for (i = 0; i < xdp_sinfo->nr_frags; i++) {
> +		struct page *page = xdp_get_frag_page(&xdp_sinfo->frags[i]);
> +
> +		__xdp_return(page_address(page), &xdp->rxq->mem, true, xdp);
> +	}
> +out:
>  	__xdp_return(xdp->data, &xdp->rxq->mem, true, xdp);
>  }
>  
> +void xdp_return_num_frags_from_buff(struct xdp_buff *xdp, u16 num_frags)
> +{
> +	struct xdp_shared_info *xdp_sinfo;
> +	int i;
> +
> +	if (unlikely(!xdp->mb))
> +		return;
> +
> +	xdp_sinfo = xdp_get_shared_info_from_buff(xdp);
> +	num_frags = min_t(u16, num_frags, xdp_sinfo->nr_frags);
> +	for (i = 1; i <= num_frags; i++) {
> +		skb_frag_t *frag = &xdp_sinfo->frags[xdp_sinfo->nr_frags - i];
> +		struct page *page = xdp_get_frag_page(frag);
> +
> +		xdp_sinfo->data_length -= xdp_get_frag_size(frag);
> +		__xdp_return(page_address(page), &xdp->rxq->mem, false, NULL);
> +	}
> +	xdp_sinfo->nr_frags -= num_frags;
> +	xdp->mb = !!xdp_sinfo->nr_frags;
> +}
> +EXPORT_SYMBOL_GPL(xdp_return_num_frags_from_buff);
> +
>  /* Only called for MEM_TYPE_PAGE_POOL see xdp.h */
>  void __xdp_release_frame(void *data, struct xdp_mem_info *mem)
>  {

None of this really benefits in any way from having the extra "mb" bit,
does it? I get the impression it would work just the same way without it.
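One answer implied by patch 01 (quoted earlier in the thread): the shared_info area at the end of the first buffer is only "properly initialized" when mb = 1, so for a linear frame its contents, including nr_frags, may be garbage. The bit is what makes the linear fast path safe to take without reading it. A minimal userspace sketch of that guard (all struct and function names here are illustrative stand-ins, not kernel code):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Illustrative stand-ins for the kernel structures under discussion. */
struct frag { void *page; };

struct shared_info {
	unsigned int nr_frags;   /* NOT initialised for linear frames */
	struct frag frags[16];
};

struct frame {
	bool mb;                 /* set only when fragments were attached */
	struct shared_info *shinfo;
};

/* Count how many buffers a return path would walk.  The mb bit lets the
 * linear fast path skip touching shinfo entirely; without it, nr_frags
 * would be read from tailroom that was never written for linear frames.
 */
static int buffers_to_return(const struct frame *f)
{
	int n = 1;               /* the head buffer is always returned */

	if (!f->mb)
		return n;        /* never dereference shinfo here */

	n += f->shinfo->nr_frags;
	return n;
}
```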


* Re: [PATCH v8 bpf-next 05/14] net: mvneta: add multi buffer support to XDP_TX
  2021-04-08 12:50 ` [PATCH v8 bpf-next 05/14] net: mvneta: add multi buffer support to XDP_TX Lorenzo Bianconi
@ 2021-04-08 18:40   ` Vladimir Oltean
  2021-04-09 16:36     ` Lorenzo Bianconi
  0 siblings, 1 reply; 57+ messages in thread
From: Vladimir Oltean @ 2021-04-08 18:40 UTC (permalink / raw)
  To: Lorenzo Bianconi
  Cc: bpf, netdev, lorenzo.bianconi, davem, kuba, ast, daniel, shayagr,
	sameehj, john.fastabend, dsahern, brouer, echaudro, jasowang,
	alexander.duyck, saeed, maciej.fijalkowski

On Thu, Apr 08, 2021 at 02:50:57PM +0200, Lorenzo Bianconi wrote:
> Introduce the capability to map non-linear xdp buffer running
> mvneta_xdp_submit_frame() for XDP_TX and XDP_REDIRECT
> 
> Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
> ---
>  drivers/net/ethernet/marvell/mvneta.c | 94 +++++++++++++++++----------
>  1 file changed, 58 insertions(+), 36 deletions(-)
> 
> diff --git a/drivers/net/ethernet/marvell/mvneta.c b/drivers/net/ethernet/marvell/mvneta.c
> index 94e29cce693a..e95d8df0fcdb 100644
> --- a/drivers/net/ethernet/marvell/mvneta.c
> +++ b/drivers/net/ethernet/marvell/mvneta.c
> @@ -1860,8 +1860,8 @@ static void mvneta_txq_bufs_free(struct mvneta_port *pp,
>  			bytes_compl += buf->skb->len;
>  			pkts_compl++;
>  			dev_kfree_skb_any(buf->skb);
> -		} else if (buf->type == MVNETA_TYPE_XDP_TX ||
> -			   buf->type == MVNETA_TYPE_XDP_NDO) {
> +		} else if ((buf->type == MVNETA_TYPE_XDP_TX ||
> +			    buf->type == MVNETA_TYPE_XDP_NDO) && buf->xdpf) {
>  			if (napi && buf->type == MVNETA_TYPE_XDP_TX)
>  				xdp_return_frame_rx_napi(buf->xdpf);
>  			else
> @@ -2057,45 +2057,67 @@ mvneta_xdp_put_buff(struct mvneta_port *pp, struct mvneta_rx_queue *rxq,
>  
>  static int
>  mvneta_xdp_submit_frame(struct mvneta_port *pp, struct mvneta_tx_queue *txq,
> -			struct xdp_frame *xdpf, bool dma_map)
> +			struct xdp_frame *xdpf, int *nxmit_byte, bool dma_map)
>  {
> -	struct mvneta_tx_desc *tx_desc;
> -	struct mvneta_tx_buf *buf;
> -	dma_addr_t dma_addr;
> +	struct mvneta_tx_desc *tx_desc = NULL;
> +	struct xdp_shared_info *xdp_sinfo;
> +	struct page *page;
> +	int i, num_frames;
> +
> +	xdp_sinfo = xdp_get_shared_info_from_frame(xdpf);
> +	num_frames = xdpf->mb ? xdp_sinfo->nr_frags + 1 : 1;
>  
> -	if (txq->count >= txq->tx_stop_threshold)
> +	if (txq->count + num_frames >= txq->size)
>  		return MVNETA_XDP_DROPPED;
>  
> -	tx_desc = mvneta_txq_next_desc_get(txq);
> +	for (i = 0; i < num_frames; i++) {

I get the feeling this is more like num_bufs than num_frames.

> +		struct mvneta_tx_buf *buf = &txq->buf[txq->txq_put_index];
> +		skb_frag_t *frag = i ? &xdp_sinfo->frags[i - 1] : NULL;
> +		int len = i ? xdp_get_frag_size(frag) : xdpf->len;
> +		dma_addr_t dma_addr;
>  
> -	buf = &txq->buf[txq->txq_put_index];
> -	if (dma_map) {
> -		/* ndo_xdp_xmit */
> -		dma_addr = dma_map_single(pp->dev->dev.parent, xdpf->data,
> -					  xdpf->len, DMA_TO_DEVICE);
> -		if (dma_mapping_error(pp->dev->dev.parent, dma_addr)) {
> -			mvneta_txq_desc_put(txq);
> -			return MVNETA_XDP_DROPPED;
> +		tx_desc = mvneta_txq_next_desc_get(txq);
> +		if (dma_map) {
> +			/* ndo_xdp_xmit */
> +			void *data;
> +
> +			data = frag ? xdp_get_frag_address(frag) : xdpf->data;
> +			dma_addr = dma_map_single(pp->dev->dev.parent, data,
> +						  len, DMA_TO_DEVICE);
> +			if (dma_mapping_error(pp->dev->dev.parent, dma_addr)) {
> +				for (; i >= 0; i--)
> +					mvneta_txq_desc_put(txq);

Don't you need to unmap the previous buffers too?
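The unwind being asked about could look roughly like this userspace model (illustrative names only, no real DMA): on a mapping failure at buffer i, release every descriptor taken so far and also unmap every buffer that was successfully mapped before the failure.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of a multi-descriptor submit with a full unwind.  All
 * names are illustrative, not the driver's.
 */
#define MAX_BUFS 8

static int mapped[MAX_BUFS];     /* 1 while a buffer holds a mapping */
static int descs_in_use;

static bool toy_dma_map(int i, bool fail)
{
	if (fail)
		return false;
	mapped[i] = 1;
	return true;
}

static void toy_dma_unmap(int i)
{
	mapped[i] = 0;
}

/* Returns 0 on success, -1 after a clean unwind. */
static int submit(int num_bufs, int fail_at)
{
	int i;

	for (i = 0; i < num_bufs; i++) {
		descs_in_use++;
		if (!toy_dma_map(i, i == fail_at)) {
			/* Unwind: drop every descriptor taken so far AND
			 * unmap every buffer mapped before the failure
			 * (the unmap is a no-op for i == fail_at itself).
			 */
			while (i >= 0) {
				toy_dma_unmap(i);
				descs_in_use--;
				i--;
			}
			return -1;
		}
	}
	return 0;
}
```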

> +				return MVNETA_XDP_DROPPED;
> +			}
> +			buf->type = MVNETA_TYPE_XDP_NDO;
> +		} else {
> +			page = frag ? xdp_get_frag_page(frag)
> +				    : virt_to_page(xdpf->data);
> +			dma_addr = page_pool_get_dma_addr(page);
> +			if (frag)
> +				dma_addr += xdp_get_frag_offset(frag);
> +			else
> +				dma_addr += sizeof(*xdpf) + xdpf->headroom;
> +			dma_sync_single_for_device(pp->dev->dev.parent,
> +						   dma_addr, len,
> +						   DMA_BIDIRECTIONAL);
> +			buf->type = MVNETA_TYPE_XDP_TX;
>  		}
> -		buf->type = MVNETA_TYPE_XDP_NDO;
> -	} else {
> -		struct page *page = virt_to_page(xdpf->data);
> +		buf->xdpf = i ? NULL : xdpf;
>  
> -		dma_addr = page_pool_get_dma_addr(page) +
> -			   sizeof(*xdpf) + xdpf->headroom;
> -		dma_sync_single_for_device(pp->dev->dev.parent, dma_addr,
> -					   xdpf->len, DMA_BIDIRECTIONAL);
> -		buf->type = MVNETA_TYPE_XDP_TX;
> +		tx_desc->command = !i ? MVNETA_TXD_F_DESC : 0;
> +		tx_desc->buf_phys_addr = dma_addr;
> +		tx_desc->data_size = len;
> +		*nxmit_byte += len;
> +
> +		mvneta_txq_inc_put(txq);
>  	}
> -	buf->xdpf = xdpf;
>  
> -	tx_desc->command = MVNETA_TXD_FLZ_DESC;
> -	tx_desc->buf_phys_addr = dma_addr;
> -	tx_desc->data_size = xdpf->len;
> +	/*last descriptor */
> +	tx_desc->command |= MVNETA_TXD_L_DESC | MVNETA_TXD_Z_PAD;
>  
> -	mvneta_txq_inc_put(txq);
> -	txq->pending++;
> -	txq->count++;
> +	txq->pending += num_frames;
> +	txq->count += num_frames;
>  
>  	return MVNETA_XDP_TX;
>  }
> @@ -2106,8 +2128,8 @@ mvneta_xdp_xmit_back(struct mvneta_port *pp, struct xdp_buff *xdp)
>  	struct mvneta_pcpu_stats *stats = this_cpu_ptr(pp->stats);
>  	struct mvneta_tx_queue *txq;
>  	struct netdev_queue *nq;
> +	int cpu, nxmit_byte = 0;
>  	struct xdp_frame *xdpf;
> -	int cpu;
>  	u32 ret;
>  
>  	xdpf = xdp_convert_buff_to_frame(xdp);
> @@ -2119,10 +2141,10 @@ mvneta_xdp_xmit_back(struct mvneta_port *pp, struct xdp_buff *xdp)
>  	nq = netdev_get_tx_queue(pp->dev, txq->id);
>  
>  	__netif_tx_lock(nq, cpu);
> -	ret = mvneta_xdp_submit_frame(pp, txq, xdpf, false);
> +	ret = mvneta_xdp_submit_frame(pp, txq, xdpf, &nxmit_byte, false);
>  	if (ret == MVNETA_XDP_TX) {
>  		u64_stats_update_begin(&stats->syncp);
> -		stats->es.ps.tx_bytes += xdpf->len;
> +		stats->es.ps.tx_bytes += nxmit_byte;
>  		stats->es.ps.tx_packets++;
>  		stats->es.ps.xdp_tx++;
>  		u64_stats_update_end(&stats->syncp);
> @@ -2161,11 +2183,11 @@ mvneta_xdp_xmit(struct net_device *dev, int num_frame,
>  
>  	__netif_tx_lock(nq, cpu);
>  	for (i = 0; i < num_frame; i++) {
> -		ret = mvneta_xdp_submit_frame(pp, txq, frames[i], true);
> +		ret = mvneta_xdp_submit_frame(pp, txq, frames[i], &nxmit_byte,
> +					      true);
>  		if (ret != MVNETA_XDP_TX)
>  			break;
>  
> -		nxmit_byte += frames[i]->len;
>  		nxmit++;
>  	}
>  
> -- 
> 2.30.2
> 


* Re: [PATCH v8 bpf-next 08/14] bpf: add multi-buff support to the bpf_xdp_adjust_tail() API
  2021-04-08 12:51 ` [PATCH v8 bpf-next 08/14] bpf: add multi-buff support to the bpf_xdp_adjust_tail() API Lorenzo Bianconi
@ 2021-04-08 19:15   ` Vladimir Oltean
  2021-04-08 20:54     ` Vladimir Oltean
  0 siblings, 1 reply; 57+ messages in thread
From: Vladimir Oltean @ 2021-04-08 19:15 UTC (permalink / raw)
  To: Lorenzo Bianconi
  Cc: bpf, netdev, lorenzo.bianconi, davem, kuba, ast, daniel, shayagr,
	sameehj, john.fastabend, dsahern, brouer, echaudro, jasowang,
	alexander.duyck, saeed, maciej.fijalkowski

On Thu, Apr 08, 2021 at 02:51:00PM +0200, Lorenzo Bianconi wrote:
> From: Eelco Chaudron <echaudro@redhat.com>
> 
> This change adds support for tail growing and shrinking for XDP multi-buff.
> 
> Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
> Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
> ---
>  include/net/xdp.h |  5 ++++
>  net/core/filter.c | 63 +++++++++++++++++++++++++++++++++++++++++++++++
>  2 files changed, 68 insertions(+)
> 
> diff --git a/include/net/xdp.h b/include/net/xdp.h
> index c8eb7cf4ebed..55751cf2badf 100644
> --- a/include/net/xdp.h
> +++ b/include/net/xdp.h
> @@ -159,6 +159,11 @@ static inline void xdp_set_frag_size(skb_frag_t *frag, u32 size)
>  	frag->bv_len = size;
>  }
>  
> +static inline unsigned int xdp_get_frag_tailroom(const skb_frag_t *frag)
> +{
> +	return PAGE_SIZE - xdp_get_frag_size(frag) - xdp_get_frag_offset(frag);
> +}
> +

This is an interesting requirement. Must an XDP frame fragment be a full
PAGE_SIZE? enetc does not fulfill it, and I suspect that none of the
drivers with a "shared page" memory model will.
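The concern can be made concrete with a truesize-parameterised variant of the computation (illustrative only, not the proposed API): for a shared-page driver handing out 2048-byte buffers on a 4K page, the PAGE_SIZE formula over-reports the available tailroom.

```c
#include <assert.h>

/* Tailroom left in a fragment's buffer, given the buffer's true size
 * rather than a hard-coded PAGE_SIZE.  Numbers in the test below are
 * illustrative.
 */
static unsigned int frag_tailroom(unsigned int truesize,
				  unsigned int frag_off,
				  unsigned int frag_size)
{
	return truesize - frag_size - frag_off;
}
```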

>  struct xdp_frame {
>  	void *data;
>  	u16 len;
> diff --git a/net/core/filter.c b/net/core/filter.c
> index cae56d08a670..c4eb1392f88e 100644
> --- a/net/core/filter.c
> +++ b/net/core/filter.c
> @@ -3855,11 +3855,74 @@ static const struct bpf_func_proto bpf_xdp_adjust_head_proto = {
>  	.arg2_type	= ARG_ANYTHING,
>  };
>  
> +static int bpf_xdp_mb_adjust_tail(struct xdp_buff *xdp, int offset)
> +{
> +	struct xdp_shared_info *xdp_sinfo = xdp_get_shared_info_from_buff(xdp);
> +
> +	if (unlikely(xdp_sinfo->nr_frags == 0))
> +		return -EINVAL;

This function is only called when xdp->mb is set, yet we still check for
nr_frags == 0. Is that combination actually possible?

> +	if (offset >= 0) {
> +		skb_frag_t *frag = &xdp_sinfo->frags[xdp_sinfo->nr_frags - 1];
> +		int size;
> +
> +		if (unlikely(offset > xdp_get_frag_tailroom(frag)))
> +			return -EINVAL;
> +
> +		size = xdp_get_frag_size(frag);
> +		memset(xdp_get_frag_address(frag) + size, 0, offset);
> +		xdp_set_frag_size(frag, size + offset);
> +		xdp_sinfo->data_length += offset;
> +	} else {
> +		int i, frags_to_free = 0;
> +
> +		offset = abs(offset);
> +
> +		if (unlikely(offset > ((int)(xdp->data_end - xdp->data) +
> +				       xdp_sinfo->data_length -
> +				       ETH_HLEN)))

I think code alignment should be to xdp->data_end, not to (int).

Also: should we have some sort of helper for calculating the total
length of an xdp_frame (head + frags)? Maybe it's just me, but I find it
slightly confusing that xdp_sinfo->data_length does not account for
everything.

> +			return -EINVAL;
> +
> +		for (i = xdp_sinfo->nr_frags - 1; i >= 0 && offset > 0; i--) {
> +			skb_frag_t *frag = &xdp_sinfo->frags[i];
> +			int size = xdp_get_frag_size(frag);
> +			int shrink = min_t(int, offset, size);
> +
> +			offset -= shrink;
> +			if (likely(size - shrink > 0)) {
> +				/* When updating the final fragment we have
> +				 * to adjust the data_length in line.
> +				 */
> +				xdp_sinfo->data_length -= shrink;
> +				xdp_set_frag_size(frag, size - shrink);
> +				break;
> +			}
> +
> +			/* When we free the fragments,
> +			 * xdp_return_frags_from_buff() will take care
> +			 * of updating the xdp share info data_length.

s/xdp share info data_length/data_length from xdp_shared_info/

> +			 */
> +			frags_to_free++;
> +		}
> +
> +		if (unlikely(frags_to_free))
> +			xdp_return_num_frags_from_buff(xdp, frags_to_free);
> +
> +		if (unlikely(offset > 0))
> +			xdp->data_end -= offset;
> +	}
> +
> +	return 0;
> +}
> +
>  BPF_CALL_2(bpf_xdp_adjust_tail, struct xdp_buff *, xdp, int, offset)
>  {
>  	void *data_hard_end = xdp_data_hard_end(xdp); /* use xdp->frame_sz */
>  	void *data_end = xdp->data_end + offset;
>  
> +	if (unlikely(xdp->mb))
> +		return bpf_xdp_mb_adjust_tail(xdp, offset);
> +
>  	/* Notice that xdp_data_hard_end have reserved some tailroom */
>  	if (unlikely(data_end > data_hard_end))
>  		return -EINVAL;
> -- 
> 2.30.2
> 
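A sketch of the kind of helper suggested above, summing the head length and the fragment data length in one place (illustrative types only; in the series this information later surfaces as xdp_buff :: frame_length in patch 10):

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative stand-ins for the structures under discussion. */
struct toy_shared_info {
	unsigned int data_length;     /* sum of fragment sizes only */
};

struct toy_xdp_buff {
	unsigned char *data;
	unsigned char *data_end;
	bool mb;
	struct toy_shared_info *sinfo;
};

/* Total frame length: linear head plus, for multi-buffer frames, the
 * fragment data accounted in the shared info.
 */
static unsigned int toy_xdp_buff_len(const struct toy_xdp_buff *xdp)
{
	unsigned int len = xdp->data_end - xdp->data;   /* head */

	if (xdp->mb)
		len += xdp->sinfo->data_length;         /* fragments */
	return len;
}
```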



* Re: [PATCH v8 bpf-next 08/14] bpf: add multi-buff support to the bpf_xdp_adjust_tail() API
  2021-04-08 19:15   ` Vladimir Oltean
@ 2021-04-08 20:54     ` Vladimir Oltean
  2021-04-09 18:13       ` Lorenzo Bianconi
  0 siblings, 1 reply; 57+ messages in thread
From: Vladimir Oltean @ 2021-04-08 20:54 UTC (permalink / raw)
  To: Lorenzo Bianconi
  Cc: bpf, netdev, lorenzo.bianconi, davem, kuba, ast, daniel, shayagr,
	sameehj, john.fastabend, dsahern, brouer, echaudro, jasowang,
	alexander.duyck, saeed, maciej.fijalkowski

On Thu, Apr 08, 2021 at 10:15:47PM +0300, Vladimir Oltean wrote:
> > +		if (unlikely(offset > ((int)(xdp->data_end - xdp->data) +
> > +				       xdp_sinfo->data_length -
> > +				       ETH_HLEN)))
> 
> Also: should we have some sort of helper for calculating the total
> length of an xdp_frame (head + frags)? Maybe it's just me, but I find it
> slightly confusing that xdp_sinfo->data_length does not account for
> everything.

I see now that xdp_buff :: frame_length is added in patch 10. It seems
strange not to use it wherever possible. Could patch 10 be moved before
patch 8?


* Re: [PATCH v8 bpf-next 09/14] bpd: add multi-buffer support to xdp copy helpers
  2021-04-08 12:51 ` [PATCH v8 bpf-next 09/14] bpd: add multi-buffer support to xdp copy helpers Lorenzo Bianconi
@ 2021-04-08 20:57   ` Vladimir Oltean
  2021-04-09 18:19     ` Lorenzo Bianconi
  2021-04-08 21:04   ` Vladimir Oltean
  1 sibling, 1 reply; 57+ messages in thread
From: Vladimir Oltean @ 2021-04-08 20:57 UTC (permalink / raw)
  To: Lorenzo Bianconi
  Cc: bpf, netdev, lorenzo.bianconi, davem, kuba, ast, daniel, shayagr,
	sameehj, john.fastabend, dsahern, brouer, echaudro, jasowang,
	alexander.duyck, saeed, maciej.fijalkowski

On Thu, Apr 08, 2021 at 02:51:01PM +0200, Lorenzo Bianconi wrote:
> From: Eelco Chaudron <echaudro@redhat.com>
> 
> This patch adds support for multi-buffer for the following helpers:
>   - bpf_xdp_output()
>   - bpf_perf_event_output()
> 
> Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
> Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
> ---
> diff --git a/tools/testing/selftests/bpf/progs/test_xdp_bpf2bpf.c b/tools/testing/selftests/bpf/progs/test_xdp_bpf2bpf.c
> index a038e827f850..d5a5f603d252 100644
> --- a/tools/testing/selftests/bpf/progs/test_xdp_bpf2bpf.c
> +++ b/tools/testing/selftests/bpf/progs/test_xdp_bpf2bpf.c
> @@ -27,6 +27,7 @@ struct xdp_buff {
>  	void *data_hard_start;
>  	unsigned long handle;
>  	struct xdp_rxq_info *rxq;
> +	__u32 frame_length;

This patch will not work without patch 10, so could you change the order?

>  } __attribute__((preserve_access_index));
>  
>  struct meta {
> @@ -49,7 +50,7 @@ int BPF_PROG(trace_on_entry, struct xdp_buff *xdp)
>  	void *data = (void *)(long)xdp->data;
>  
>  	meta.ifindex = xdp->rxq->dev->ifindex;
> -	meta.pkt_len = data_end - data;
> +	meta.pkt_len = xdp->frame_length;
>  	bpf_xdp_output(xdp, &perf_buf_map,
>  		       ((__u64) meta.pkt_len << 32) |
>  		       BPF_F_CURRENT_CPU,
> -- 
> 2.30.2
> 


* Re: [PATCH v8 bpf-next 09/14] bpd: add multi-buffer support to xdp copy helpers
  2021-04-08 12:51 ` [PATCH v8 bpf-next 09/14] bpd: add multi-buffer support to xdp copy helpers Lorenzo Bianconi
  2021-04-08 20:57   ` Vladimir Oltean
@ 2021-04-08 21:04   ` Vladimir Oltean
  2021-04-14  8:08     ` Eelco Chaudron
  1 sibling, 1 reply; 57+ messages in thread
From: Vladimir Oltean @ 2021-04-08 21:04 UTC (permalink / raw)
  To: Lorenzo Bianconi
  Cc: bpf, netdev, lorenzo.bianconi, davem, kuba, ast, daniel, shayagr,
	sameehj, john.fastabend, dsahern, brouer, echaudro, jasowang,
	alexander.duyck, saeed, maciej.fijalkowski

On Thu, Apr 08, 2021 at 02:51:01PM +0200, Lorenzo Bianconi wrote:
> From: Eelco Chaudron <echaudro@redhat.com>
> 
> This patch adds support for multi-buffer for the following helpers:
>   - bpf_xdp_output()
>   - bpf_perf_event_output()
> 
> Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
> Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
> ---

Also there is a typo in the commit message: bpd -> bpf.

>  net/core/filter.c                             |  63 ++++++++-
>  .../selftests/bpf/prog_tests/xdp_bpf2bpf.c    | 127 ++++++++++++------
>  .../selftests/bpf/progs/test_xdp_bpf2bpf.c    |   3 +-
>  3 files changed, 149 insertions(+), 44 deletions(-)
> 
> diff --git a/net/core/filter.c b/net/core/filter.c
> index c4eb1392f88e..c00f52ab2532 100644
> --- a/net/core/filter.c
> +++ b/net/core/filter.c
> @@ -4549,10 +4549,56 @@ static const struct bpf_func_proto bpf_sk_ancestor_cgroup_id_proto = {
>  };
>  #endif
>  
> -static unsigned long bpf_xdp_copy(void *dst_buff, const void *src_buff,
> +static unsigned long bpf_xdp_copy(void *dst_buff, const void *ctx,
>  				  unsigned long off, unsigned long len)
>  {
> -	memcpy(dst_buff, src_buff + off, len);
> +	struct xdp_buff *xdp = (struct xdp_buff *)ctx;

There is no need to cast a void pointer in C; the implicit conversion is enough.
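i.e. the assignment can simply be `struct xdp_buff *xdp = ctx;`. A minimal standalone illustration (the struct here is a mock, not the kernel one):

```c
#include <assert.h>

/* Mock of struct xdp_buff, just to show the implicit conversion. */
struct xdp_buff { int mb; };

static int read_mb(void *ctx)
{
	struct xdp_buff *xdp = ctx;	/* void * converts implicitly in C */

	return xdp->mb;
}
```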

> +	struct xdp_shared_info *xdp_sinfo;
> +	unsigned long base_len;
> +
> +	if (likely(!xdp->mb)) {
> +		memcpy(dst_buff, xdp->data + off, len);
> +		return 0;
> +	}
> +
> +	base_len = xdp->data_end - xdp->data;

Would a static inline int xdp_buff_head_len() be useful?
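Something along these lines, perhaps — the helper name and signature are only a suggestion, sketched against a mock of the two fields involved so it stands alone:

```c
#include <assert.h>

/* Mock of the two xdp_buff fields involved; the kernel struct has more. */
struct xdp_buff {
	void *data;
	void *data_end;
};

/* Hypothetical helper replacing the open-coded xdp->data_end - xdp->data. */
static inline int xdp_buff_head_len(const struct xdp_buff *xdp)
{
	return (int)((unsigned char *)xdp->data_end -
		     (unsigned char *)xdp->data);
}
```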

> +	xdp_sinfo = xdp_get_shared_info_from_buff(xdp);
> +	do {
> +		const void *src_buff = NULL;
> +		unsigned long copy_len = 0;
> +
> +		if (off < base_len) {
> +			src_buff = xdp->data + off;
> +			copy_len = min(len, base_len - off);
> +		} else {
> +			unsigned long frag_off_total = base_len;
> +			int i;
> +
> +			for (i = 0; i < xdp_sinfo->nr_frags; i++) {
> +				skb_frag_t *frag = &xdp_sinfo->frags[i];
> +				unsigned long frag_len, frag_off;
> +
> +				frag_len = xdp_get_frag_size(frag);
> +				frag_off = off - frag_off_total;
> +				if (frag_off < frag_len) {
> +					src_buff = xdp_get_frag_address(frag) +
> +						   frag_off;
> +					copy_len = min(len,
> +						       frag_len - frag_off);
> +					break;
> +				}
> +				frag_off_total += frag_len;
> +			}
> +		}
> +		if (!src_buff)
> +			break;
> +
> +		memcpy(dst_buff, src_buff, copy_len);
> +		off += copy_len;
> +		len -= copy_len;
> +		dst_buff += copy_len;
> +	} while (len);
> +
>  	return 0;
>  }


* RE: [PATCH v8 bpf-next 00/14] mvneta: introduce XDP multi-buffer support
  2021-04-08 12:50 [PATCH v8 bpf-next 00/14] mvneta: introduce XDP multi-buffer support Lorenzo Bianconi
                   ` (13 preceding siblings ...)
  2021-04-08 12:51 ` [PATCH v8 bpf-next 14/14] bpf: update xdp_adjust_tail selftest to include multi-buffer Lorenzo Bianconi
@ 2021-04-09  0:56 ` John Fastabend
  2021-04-09 20:16   ` Lorenzo Bianconi
  2021-04-13 15:16   ` Eelco Chaudron
  2021-04-16 14:27 ` Magnus Karlsson
  15 siblings, 2 replies; 57+ messages in thread
From: John Fastabend @ 2021-04-09  0:56 UTC (permalink / raw)
  To: Lorenzo Bianconi, bpf, netdev
  Cc: lorenzo.bianconi, davem, kuba, ast, daniel, shayagr, sameehj,
	john.fastabend, dsahern, brouer, echaudro, jasowang,
	alexander.duyck, saeed, maciej.fijalkowski

Lorenzo Bianconi wrote:
> This series introduces XDP multi-buffer support. The mvneta driver is
> the first to support these new "non-linear" xdp_{buff,frame}. Reviewers
> please focus on how these new types of xdp_{buff,frame} packets
> traverse the different layers and the layout design. It is on purpose
> that BPF-helpers are kept simple, as we don't want to expose the
> internal layout to allow later changes.
> 
> For now, to keep the design simple and to maintain performance, the XDP
> BPF-prog (still) only has access to the first buffer. It is left for
> later (another patchset) to add payload access across multiple buffers.
> This patchset should still allow for these future extensions. The goal
> is to lift the XDP MTU restriction that comes with XDP, but maintain
> same performance as before.
> 
> The main idea for the new multi-buffer layout is to reuse the same
> layout used for non-linear SKB. We introduced a "xdp_shared_info" data
> structure at the end of the first buffer to link together subsequent buffers.
> xdp_shared_info will alias skb_shared_info allowing to keep most of the frags
> in the same cache-line (while with skb_shared_info only the first fragment will
> be placed in the first "shared_info" cache-line). Moreover we introduced some
> xdp_shared_info helpers aligned to skb_frag* ones.
> Converting xdp_frame to SKB and deliver it to the network stack is shown in
> patch 07/14. Building the SKB, the xdp_shared_info structure will be converted
> in a skb_shared_info one.
> 
> A multi-buffer bit (mb) has been introduced in xdp_{buff,frame} structure
> to notify the bpf/network layer if this is a xdp multi-buffer frame (mb = 1)
> or not (mb = 0).
> The mb bit will be set by a xdp multi-buffer capable driver only for
> non-linear frames maintaining the capability to receive linear frames
> without any extra cost since the xdp_shared_info structure at the end
> of the first buffer will be initialized only if mb is set.
> 
> Typical use cases for this series are:
> - Jumbo-frames
> - Packet header split (please see Google's use-case @ NetDevConf 0x14, [0])
> - TSO
> 
> A new frame_length field has been introduced in the XDP ctx in order to notify the
> eBPF layer about the total frame size (linear + paged parts).
> 
> bpf_xdp_adjust_tail and bpf_xdp_copy helpers have been modified to take into
> account xdp multi-buff frames.

I just read the commit messages for v8 so far, but I'm still wondering how
to handle use cases where we want to put extra bytes at the end of the
packet, or really anywhere in the general case. We can extend the tail with the
above, but is there any way to then write into that extra space?

I think most use cases will only want headers, so we can likely make it
a callout to a helper. Could we add something like xdp_get_bytes(start, end)
to pull in the bytes?

My dumb pseudoprogram being something like,

  trailer[16] = {0,1,2,3,4,5,6,7,8,9,a,b,c,d,e,f}
  trailer_size = 16;
  old_end = xdp->length;
  new_end = xdp->length + trailer_size;

  err = bpf_xdp_adjust_tail(xdp, trailer_size)
  if (err) return err;

  err = xdp_get_bytes(xdp, old_end, new_end);
  if (err) return err;

  memcpy(xdp->data, trailer, trailer_size);

Do you think that could work if we code up xdp_get_bytes()? Does the driver
have enough context to adjust xdp to map to my get_bytes() call? I think
so but we should check.
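A userspace model of the semantics I have in mind — everything here is hypothetical (the name, signature, and behaviour are the proposal, not an existing kernel API): walk the linear head plus the frags and copy the requested range out, much like the bpf_xdp_copy() loop in patch 09.

```c
#include <assert.h>
#include <string.h>

/* Userspace mock: a frame made of a linear head plus fragments. */
struct frag { unsigned char *addr; unsigned int len; };

struct mock_xdp {
	unsigned char *data;		/* linear head */
	unsigned int head_len;
	struct frag frags[4];
	unsigned int nr_frags;
};

/* Hypothetical xdp_get_bytes(): copy frame bytes [start, end) into dst,
 * walking head + frags. Returns 0 on success, -1 if the range exceeds
 * the frame.
 */
static int xdp_get_bytes(struct mock_xdp *xdp, unsigned int start,
			 unsigned int end, unsigned char *dst)
{
	unsigned int off = start, buf_start, i;

	/* linear head first */
	if (off < xdp->head_len) {
		unsigned int lim = end < xdp->head_len ? end : xdp->head_len;
		unsigned int n = lim - off;

		memcpy(dst, xdp->data + off, n);
		dst += n;
		off += n;
	}

	/* then the paged area, frag by frag */
	buf_start = xdp->head_len;
	for (i = 0; i < xdp->nr_frags && off < end; i++) {
		struct frag *f = &xdp->frags[i];
		unsigned int buf_end = buf_start + f->len;

		if (off < buf_end) {
			unsigned int lim = end < buf_end ? end : buf_end;
			unsigned int n = lim - off;

			memcpy(dst, f->addr + (off - buf_start), n);
			dst += n;
			off += n;
		}
		buf_start = buf_end;
	}
	return off == end ? 0 : -1;
}
```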

> 
> More info about the main idea behind this approach can be found here [1][2].

Thanks for working on this!


* Re: [PATCH v8 bpf-next 01/14] xdp: introduce mb in xdp_buff/xdp_frame
  2021-04-08 18:17   ` Vladimir Oltean
@ 2021-04-09 16:03     ` Lorenzo Bianconi
  0 siblings, 0 replies; 57+ messages in thread
From: Lorenzo Bianconi @ 2021-04-09 16:03 UTC (permalink / raw)
  To: Vladimir Oltean
  Cc: bpf, netdev, lorenzo.bianconi, davem, kuba, ast, daniel, shayagr,
	sameehj, john.fastabend, dsahern, brouer, echaudro, jasowang,
	alexander.duyck, saeed, maciej.fijalkowski

[-- Attachment #1: Type: text/plain, Size: 2506 bytes --]

> On Thu, Apr 08, 2021 at 02:50:53PM +0200, Lorenzo Bianconi wrote:
> > Introduce multi-buffer bit (mb) in xdp_frame/xdp_buffer data structure
> > in order to specify if this is a linear buffer (mb = 0) or a multi-buffer
> > frame (mb = 1). In the latter case the shared_info area at the end of the
> > first buffer will be properly initialized to link together subsequent
> > buffers.
> > 
> > Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
> > ---
> >  include/net/xdp.h | 11 +++++++++--
> >  1 file changed, 9 insertions(+), 2 deletions(-)
> > 
> > diff --git a/include/net/xdp.h b/include/net/xdp.h
> > index a5bc214a49d9..842580a61563 100644
> > --- a/include/net/xdp.h
> > +++ b/include/net/xdp.h
> > @@ -73,7 +73,10 @@ struct xdp_buff {
> >  	void *data_hard_start;
> >  	struct xdp_rxq_info *rxq;
> >  	struct xdp_txq_info *txq;
> > -	u32 frame_sz; /* frame size to deduce data_hard_end/reserved tailroom*/
> > +	u32 frame_sz:31; /* frame size to deduce data_hard_end/reserved
> > +			  * tailroom
> > +			  */
> 
> This comment would have fit just fine on one line:
> 
> 	/* frame size to deduce data_hard_end/reserved tailroom */

ack, thx I will fix it in v9

Regards,
Lorenzo

> 
> > +	u32 mb:1; /* xdp non-linear buffer */
> >  };
> >  
> >  static __always_inline void
> > @@ -81,6 +84,7 @@ xdp_init_buff(struct xdp_buff *xdp, u32 frame_sz, struct xdp_rxq_info *rxq)
> >  {
> >  	xdp->frame_sz = frame_sz;
> >  	xdp->rxq = rxq;
> > +	xdp->mb = 0;
> >  }
> >  
> >  static __always_inline void
> > @@ -116,7 +120,8 @@ struct xdp_frame {
> >  	u16 len;
> >  	u16 headroom;
> >  	u32 metasize:8;
> > -	u32 frame_sz:24;
> > +	u32 frame_sz:23;
> > +	u32 mb:1; /* xdp non-linear frame */
> >  	/* Lifetime of xdp_rxq_info is limited to NAPI/enqueue time,
> >  	 * while mem info is valid on remote CPU.
> >  	 */
> > @@ -179,6 +184,7 @@ void xdp_convert_frame_to_buff(struct xdp_frame *frame, struct xdp_buff *xdp)
> >  	xdp->data_end = frame->data + frame->len;
> >  	xdp->data_meta = frame->data - frame->metasize;
> >  	xdp->frame_sz = frame->frame_sz;
> > +	xdp->mb = frame->mb;
> >  }
> >  
> >  static inline
> > @@ -205,6 +211,7 @@ int xdp_update_frame_from_buff(struct xdp_buff *xdp,
> >  	xdp_frame->headroom = headroom - sizeof(*xdp_frame);
> >  	xdp_frame->metasize = metasize;
> >  	xdp_frame->frame_sz = xdp->frame_sz;
> > +	xdp_frame->mb = xdp->mb;
> >  
> >  	return 0;
> >  }
> > -- 
> > 2.30.2
> > 

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 228 bytes --]


* Re: [PATCH v8 bpf-next 03/14] net: mvneta: update mb bit before passing the xdp buffer to eBPF layer
  2021-04-08 18:19   ` Vladimir Oltean
@ 2021-04-09 16:24     ` Lorenzo Bianconi
  0 siblings, 0 replies; 57+ messages in thread
From: Lorenzo Bianconi @ 2021-04-09 16:24 UTC (permalink / raw)
  To: Vladimir Oltean
  Cc: bpf, netdev, lorenzo.bianconi, davem, kuba, ast, daniel, shayagr,
	sameehj, john.fastabend, dsahern, brouer, echaudro, jasowang,
	alexander.duyck, saeed, maciej.fijalkowski

[-- Attachment #1: Type: text/plain, Size: 4066 bytes --]

> On Thu, Apr 08, 2021 at 02:50:55PM +0200, Lorenzo Bianconi wrote:
> > Update multi-buffer bit (mb) in xdp_buff to notify XDP/eBPF layer and
> > XDP remote drivers if this is a "non-linear" XDP buffer. Access
> > xdp_shared_info only if xdp_buff mb is set.
> > 
> > Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
> > ---
> >  drivers/net/ethernet/marvell/mvneta.c | 26 ++++++++++++++++++++------
> >  1 file changed, 20 insertions(+), 6 deletions(-)
> > 
> > diff --git a/drivers/net/ethernet/marvell/mvneta.c b/drivers/net/ethernet/marvell/mvneta.c
> > index a52e132fd2cf..94e29cce693a 100644
> > --- a/drivers/net/ethernet/marvell/mvneta.c
> > +++ b/drivers/net/ethernet/marvell/mvneta.c
> > @@ -2041,12 +2041,16 @@ mvneta_xdp_put_buff(struct mvneta_port *pp, struct mvneta_rx_queue *rxq,
> >  {
> >  	int i;
> >  
> > +	if (likely(!xdp->mb))
> > +		goto out;
> > +
> 
> Is there any particular reason for this extra check?

xdp_sinfo->frags[] is initialized only if xdp->mb is set.

Regards,
Lorenzo

> 
> >  	for (i = 0; i < xdp_sinfo->nr_frags; i++) {
> >  		skb_frag_t *frag = &xdp_sinfo->frags[i];
> >  
> >  		page_pool_put_full_page(rxq->page_pool,
> >  					xdp_get_frag_page(frag), true);
> >  	}
> > +out:
> >  	page_pool_put_page(rxq->page_pool, virt_to_head_page(xdp->data),
> >  			   sync_len, true);
> >  }
> > @@ -2246,7 +2250,6 @@ mvneta_swbm_rx_frame(struct mvneta_port *pp,
> >  {
> >  	unsigned char *data = page_address(page);
> >  	int data_len = -MVNETA_MH_SIZE, len;
> > -	struct xdp_shared_info *xdp_sinfo;
> >  	struct net_device *dev = pp->dev;
> >  	enum dma_data_direction dma_dir;
> >  
> > @@ -2270,9 +2273,6 @@ mvneta_swbm_rx_frame(struct mvneta_port *pp,
> >  	prefetch(data);
> >  	xdp_prepare_buff(xdp, data, pp->rx_offset_correction + MVNETA_MH_SIZE,
> >  			 data_len, false);
> > -
> > -	xdp_sinfo = xdp_get_shared_info_from_buff(xdp);
> > -	xdp_sinfo->nr_frags = 0;
> >  }
> >  
> >  static void
> > @@ -2307,12 +2307,18 @@ mvneta_swbm_add_rx_fragment(struct mvneta_port *pp,
> >  		xdp_set_frag_size(frag, data_len);
> >  		xdp_set_frag_page(frag, page);
> >  
> > +		if (!xdp->mb) {
> > +			xdp_sinfo->data_length = *size;
> > +			xdp->mb = 1;
> > +		}
> >  		/* last fragment */
> >  		if (len == *size) {
> >  			struct xdp_shared_info *sinfo;
> >  
> >  			sinfo = xdp_get_shared_info_from_buff(xdp);
> >  			sinfo->nr_frags = xdp_sinfo->nr_frags;
> > +			sinfo->data_length = xdp_sinfo->data_length;
> > +
> >  			memcpy(sinfo->frags, xdp_sinfo->frags,
> >  			       sinfo->nr_frags * sizeof(skb_frag_t));
> >  		}
> > @@ -2327,11 +2333,15 @@ mvneta_swbm_build_skb(struct mvneta_port *pp, struct mvneta_rx_queue *rxq,
> >  		      struct xdp_buff *xdp, u32 desc_status)
> >  {
> >  	struct xdp_shared_info *xdp_sinfo = xdp_get_shared_info_from_buff(xdp);
> > -	int i, num_frags = xdp_sinfo->nr_frags;
> >  	skb_frag_t frag_list[MAX_SKB_FRAGS];
> > +	int i, num_frags = 0;
> >  	struct sk_buff *skb;
> >  
> > -	memcpy(frag_list, xdp_sinfo->frags, sizeof(skb_frag_t) * num_frags);
> > +	if (unlikely(xdp->mb)) {
> > +		num_frags = xdp_sinfo->nr_frags;
> > +		memcpy(frag_list, xdp_sinfo->frags,
> > +		       sizeof(skb_frag_t) * num_frags);
> > +	}
> >  
> >  	skb = build_skb(xdp->data_hard_start, PAGE_SIZE);
> >  	if (!skb)
> > @@ -2343,6 +2353,9 @@ mvneta_swbm_build_skb(struct mvneta_port *pp, struct mvneta_rx_queue *rxq,
> >  	skb_put(skb, xdp->data_end - xdp->data);
> >  	mvneta_rx_csum(pp, desc_status, skb);
> >  
> > +	if (likely(!xdp->mb))
> > +		return skb;
> > +
> >  	for (i = 0; i < num_frags; i++) {
> >  		struct page *page = xdp_get_frag_page(&frag_list[i]);
> >  
> > @@ -2404,6 +2417,7 @@ static int mvneta_rx_swbm(struct napi_struct *napi,
> >  			frame_sz = size - ETH_FCS_LEN;
> >  			desc_status = rx_status;
> >  
> > +			xdp_buf.mb = 0;
> >  			mvneta_swbm_rx_frame(pp, rx_desc, rxq, &xdp_buf,
> >  					     &size, page);
> >  		} else {
> > -- 
> > 2.30.2
> > 
> 

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 228 bytes --]


* Re: [PATCH v8 bpf-next 04/14] xdp: add multi-buff support to xdp_return_{buff/frame}
  2021-04-08 18:30   ` Vladimir Oltean
@ 2021-04-09 16:28     ` Lorenzo Bianconi
  0 siblings, 0 replies; 57+ messages in thread
From: Lorenzo Bianconi @ 2021-04-09 16:28 UTC (permalink / raw)
  To: Vladimir Oltean
  Cc: bpf, netdev, lorenzo.bianconi, davem, kuba, ast, daniel, shayagr,
	sameehj, john.fastabend, dsahern, brouer, echaudro, jasowang,
	alexander.duyck, saeed, maciej.fijalkowski

[-- Attachment #1: Type: text/plain, Size: 6362 bytes --]

> On Thu, Apr 08, 2021 at 02:50:56PM +0200, Lorenzo Bianconi wrote:
> > Take into account if the received xdp_buff/xdp_frame is non-linear
> > recycling/returning the frame memory to the allocator or into
> > xdp_frame_bulk.
> > Introduce xdp_return_num_frags_from_buff to return a given number of
> > fragments from a xdp multi-buff starting from the tail.
> > 
> > Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
> > ---
> >  include/net/xdp.h | 19 ++++++++++--
> >  net/core/xdp.c    | 76 ++++++++++++++++++++++++++++++++++++++++++++++-
> >  2 files changed, 92 insertions(+), 3 deletions(-)
> > 
> > diff --git a/include/net/xdp.h b/include/net/xdp.h
> > index 02aea7696d15..c8eb7cf4ebed 100644
> > --- a/include/net/xdp.h
> > +++ b/include/net/xdp.h
> > @@ -289,6 +289,7 @@ void xdp_return_buff(struct xdp_buff *xdp);
> >  void xdp_flush_frame_bulk(struct xdp_frame_bulk *bq);
> >  void xdp_return_frame_bulk(struct xdp_frame *xdpf,
> >  			   struct xdp_frame_bulk *bq);
> > +void xdp_return_num_frags_from_buff(struct xdp_buff *xdp, u16 num_frags);
> >  
> >  /* When sending xdp_frame into the network stack, then there is no
> >   * return point callback, which is needed to release e.g. DMA-mapping
> > @@ -299,10 +300,24 @@ void __xdp_release_frame(void *data, struct xdp_mem_info *mem);
> >  static inline void xdp_release_frame(struct xdp_frame *xdpf)
> >  {
> >  	struct xdp_mem_info *mem = &xdpf->mem;
> > +	struct xdp_shared_info *xdp_sinfo;
> > +	int i;
> >  
> >  	/* Curr only page_pool needs this */
> > -	if (mem->type == MEM_TYPE_PAGE_POOL)
> > -		__xdp_release_frame(xdpf->data, mem);
> > +	if (mem->type != MEM_TYPE_PAGE_POOL)
> > +		return;
> > +
> > +	if (likely(!xdpf->mb))
> > +		goto out;
> > +
> > +	xdp_sinfo = xdp_get_shared_info_from_frame(xdpf);
> > +	for (i = 0; i < xdp_sinfo->nr_frags; i++) {
> > +		struct page *page = xdp_get_frag_page(&xdp_sinfo->frags[i]);
> > +
> > +		__xdp_release_frame(page_address(page), mem);
> > +	}
> > +out:
> > +	__xdp_release_frame(xdpf->data, mem);
> >  }
> >  
> >  int xdp_rxq_info_reg(struct xdp_rxq_info *xdp_rxq,
> > diff --git a/net/core/xdp.c b/net/core/xdp.c
> > index 05354976c1fc..430f516259d9 100644
> > --- a/net/core/xdp.c
> > +++ b/net/core/xdp.c
> > @@ -374,12 +374,38 @@ static void __xdp_return(void *data, struct xdp_mem_info *mem, bool napi_direct,
> >  
> >  void xdp_return_frame(struct xdp_frame *xdpf)
> >  {
> > +	struct xdp_shared_info *xdp_sinfo;
> > +	int i;
> > +
> > +	if (likely(!xdpf->mb))
> > +		goto out;
> > +
> > +	xdp_sinfo = xdp_get_shared_info_from_frame(xdpf);
> > +	for (i = 0; i < xdp_sinfo->nr_frags; i++) {
> > +		struct page *page = xdp_get_frag_page(&xdp_sinfo->frags[i]);
> > +
> > +		__xdp_return(page_address(page), &xdpf->mem, false, NULL);
> > +	}
> > +out:
> >  	__xdp_return(xdpf->data, &xdpf->mem, false, NULL);
> >  }
> >  EXPORT_SYMBOL_GPL(xdp_return_frame);
> >  
> >  void xdp_return_frame_rx_napi(struct xdp_frame *xdpf)
> >  {
> > +	struct xdp_shared_info *xdp_sinfo;
> > +	int i;
> > +
> > +	if (likely(!xdpf->mb))
> > +		goto out;
> > +
> > +	xdp_sinfo = xdp_get_shared_info_from_frame(xdpf);
> > +	for (i = 0; i < xdp_sinfo->nr_frags; i++) {
> > +		struct page *page = xdp_get_frag_page(&xdp_sinfo->frags[i]);
> > +
> > +		__xdp_return(page_address(page), &xdpf->mem, true, NULL);
> > +	}
> > +out:
> >  	__xdp_return(xdpf->data, &xdpf->mem, true, NULL);
> >  }
> >  EXPORT_SYMBOL_GPL(xdp_return_frame_rx_napi);
> > @@ -415,7 +441,7 @@ void xdp_return_frame_bulk(struct xdp_frame *xdpf,
> >  	struct xdp_mem_allocator *xa;
> >  
> >  	if (mem->type != MEM_TYPE_PAGE_POOL) {
> > -		__xdp_return(xdpf->data, &xdpf->mem, false, NULL);
> > +		xdp_return_frame(xdpf);
> >  		return;
> >  	}
> >  
> > @@ -434,15 +460,63 @@ void xdp_return_frame_bulk(struct xdp_frame *xdpf,
> >  		bq->xa = rhashtable_lookup(mem_id_ht, &mem->id, mem_id_rht_params);
> >  	}
> >  
> > +	if (unlikely(xdpf->mb)) {
> > +		struct xdp_shared_info *xdp_sinfo;
> > +		int i;
> > +
> > +		xdp_sinfo = xdp_get_shared_info_from_frame(xdpf);
> > +		for (i = 0; i < xdp_sinfo->nr_frags; i++) {
> > +			skb_frag_t *frag = &xdp_sinfo->frags[i];
> > +
> > +			bq->q[bq->count++] = xdp_get_frag_address(frag);
> > +			if (bq->count == XDP_BULK_QUEUE_SIZE)
> > +				xdp_flush_frame_bulk(bq);
> > +		}
> > +	}
> >  	bq->q[bq->count++] = xdpf->data;
> >  }
> >  EXPORT_SYMBOL_GPL(xdp_return_frame_bulk);
> >  
> >  void xdp_return_buff(struct xdp_buff *xdp)
> >  {
> > +	struct xdp_shared_info *xdp_sinfo;
> > +	int i;
> > +
> > +	if (likely(!xdp->mb))
> > +		goto out;
> > +
> > +	xdp_sinfo = xdp_get_shared_info_from_buff(xdp);
> > +	for (i = 0; i < xdp_sinfo->nr_frags; i++) {
> > +		struct page *page = xdp_get_frag_page(&xdp_sinfo->frags[i]);
> > +
> > +		__xdp_return(page_address(page), &xdp->rxq->mem, true, xdp);
> > +	}
> > +out:
> >  	__xdp_return(xdp->data, &xdp->rxq->mem, true, xdp);
> >  }
> >  
> > +void xdp_return_num_frags_from_buff(struct xdp_buff *xdp, u16 num_frags)
> > +{
> > +	struct xdp_shared_info *xdp_sinfo;
> > +	int i;
> > +
> > +	if (unlikely(!xdp->mb))
> > +		return;
> > +
> > +	xdp_sinfo = xdp_get_shared_info_from_buff(xdp);
> > +	num_frags = min_t(u16, num_frags, xdp_sinfo->nr_frags);
> > +	for (i = 1; i <= num_frags; i++) {
> > +		skb_frag_t *frag = &xdp_sinfo->frags[xdp_sinfo->nr_frags - i];
> > +		struct page *page = xdp_get_frag_page(frag);
> > +
> > +		xdp_sinfo->data_length -= xdp_get_frag_size(frag);
> > +		__xdp_return(page_address(page), &xdp->rxq->mem, false, NULL);
> > +	}
> > +	xdp_sinfo->nr_frags -= num_frags;
> > +	xdp->mb = !!xdp_sinfo->nr_frags;
> > +}
> > +EXPORT_SYMBOL_GPL(xdp_return_num_frags_from_buff);
> > +
> >  /* Only called for MEM_TYPE_PAGE_POOL see xdp.h */
> >  void __xdp_release_frame(void *data, struct xdp_mem_info *mem)
> >  {
> 
> None of this really benefits in any way from having the extra "mb" bit,
> does it? I get the impression it would work just the same way without it.

The paged part of the xdp_buff is initialized only if xdp->mb is set. The reason
is to avoid hurting performance in the most common single-buffer use case. We
always need to check xdp->mb or xdpf->mb before accessing the paged area.
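The pattern, in userspace-mock form (a sketch only — struct and fields are simplified stand-ins for the kernel ones):

```c
#include <assert.h>

/* Simplified mock: nr_frags is only meaningful when mb is set. */
struct mock_sinfo { unsigned int nr_frags; };

struct mock_xdp {
	unsigned int mb;		/* xdp non-linear buffer */
	struct mock_sinfo sinfo;	/* uninitialized unless mb == 1 */
};

/* Guard pattern used throughout the series: never touch the shared_info
 * area unless the mb bit says it was initialized.
 */
static unsigned int frame_nr_frags(const struct mock_xdp *xdp)
{
	if (!xdp->mb)
		return 0;
	return xdp->sinfo.nr_frags;
}
```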

Regards,
Lorenzo

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 228 bytes --]


* Re: [PATCH v8 bpf-next 05/14] net: mvneta: add multi buffer support to XDP_TX
  2021-04-08 18:40   ` Vladimir Oltean
@ 2021-04-09 16:36     ` Lorenzo Bianconi
  0 siblings, 0 replies; 57+ messages in thread
From: Lorenzo Bianconi @ 2021-04-09 16:36 UTC (permalink / raw)
  To: Vladimir Oltean
  Cc: bpf, netdev, lorenzo.bianconi, davem, kuba, ast, daniel, shayagr,
	sameehj, john.fastabend, dsahern, brouer, echaudro, jasowang,
	alexander.duyck, saeed, maciej.fijalkowski

[-- Attachment #1: Type: text/plain, Size: 6415 bytes --]

> On Thu, Apr 08, 2021 at 02:50:57PM +0200, Lorenzo Bianconi wrote:
> > Introduce the capability to map non-linear xdp buffer running
> > mvneta_xdp_submit_frame() for XDP_TX and XDP_REDIRECT
> > 
> > Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
> > ---
> >  drivers/net/ethernet/marvell/mvneta.c | 94 +++++++++++++++++----------
> >  1 file changed, 58 insertions(+), 36 deletions(-)
> > 
> > diff --git a/drivers/net/ethernet/marvell/mvneta.c b/drivers/net/ethernet/marvell/mvneta.c
> > index 94e29cce693a..e95d8df0fcdb 100644
> > --- a/drivers/net/ethernet/marvell/mvneta.c
> > +++ b/drivers/net/ethernet/marvell/mvneta.c
> > @@ -1860,8 +1860,8 @@ static void mvneta_txq_bufs_free(struct mvneta_port *pp,
> >  			bytes_compl += buf->skb->len;
> >  			pkts_compl++;
> >  			dev_kfree_skb_any(buf->skb);
> > -		} else if (buf->type == MVNETA_TYPE_XDP_TX ||
> > -			   buf->type == MVNETA_TYPE_XDP_NDO) {
> > +		} else if ((buf->type == MVNETA_TYPE_XDP_TX ||
> > +			    buf->type == MVNETA_TYPE_XDP_NDO) && buf->xdpf) {
> >  			if (napi && buf->type == MVNETA_TYPE_XDP_TX)
> >  				xdp_return_frame_rx_napi(buf->xdpf);
> >  			else
> > @@ -2057,45 +2057,67 @@ mvneta_xdp_put_buff(struct mvneta_port *pp, struct mvneta_rx_queue *rxq,
> >  
> >  static int
> >  mvneta_xdp_submit_frame(struct mvneta_port *pp, struct mvneta_tx_queue *txq,
> > -			struct xdp_frame *xdpf, bool dma_map)
> > +			struct xdp_frame *xdpf, int *nxmit_byte, bool dma_map)
> >  {
> > -	struct mvneta_tx_desc *tx_desc;
> > -	struct mvneta_tx_buf *buf;
> > -	dma_addr_t dma_addr;
> > +	struct mvneta_tx_desc *tx_desc = NULL;
> > +	struct xdp_shared_info *xdp_sinfo;
> > +	struct page *page;
> > +	int i, num_frames;
> > +
> > +	xdp_sinfo = xdp_get_shared_info_from_frame(xdpf);
> > +	num_frames = xdpf->mb ? xdp_sinfo->nr_frags + 1 : 1;
> >  
> > -	if (txq->count >= txq->tx_stop_threshold)
> > +	if (txq->count + num_frames >= txq->size)
> >  		return MVNETA_XDP_DROPPED;
> >  
> > -	tx_desc = mvneta_txq_next_desc_get(txq);
> > +	for (i = 0; i < num_frames; i++) {
> 
> I get the feeling this is more like num_bufs than num_frames.

naming is the hardest part :)

> 
> > +		struct mvneta_tx_buf *buf = &txq->buf[txq->txq_put_index];
> > +		skb_frag_t *frag = i ? &xdp_sinfo->frags[i - 1] : NULL;
> > +		int len = i ? xdp_get_frag_size(frag) : xdpf->len;
> > +		dma_addr_t dma_addr;
> >  
> > -	buf = &txq->buf[txq->txq_put_index];
> > -	if (dma_map) {
> > -		/* ndo_xdp_xmit */
> > -		dma_addr = dma_map_single(pp->dev->dev.parent, xdpf->data,
> > -					  xdpf->len, DMA_TO_DEVICE);
> > -		if (dma_mapping_error(pp->dev->dev.parent, dma_addr)) {
> > -			mvneta_txq_desc_put(txq);
> > -			return MVNETA_XDP_DROPPED;
> > +		tx_desc = mvneta_txq_next_desc_get(txq);
> > +		if (dma_map) {
> > +			/* ndo_xdp_xmit */
> > +			void *data;
> > +
> > +			data = frag ? xdp_get_frag_address(frag) : xdpf->data;
> > +			dma_addr = dma_map_single(pp->dev->dev.parent, data,
> > +						  len, DMA_TO_DEVICE);
> > +			if (dma_mapping_error(pp->dev->dev.parent, dma_addr)) {
> > +				for (; i >= 0; i--)
> > +					mvneta_txq_desc_put(txq);
> 
> Don't you need to unmap the previous buffers too?

ack, right; since these buffers do not belong to the pool, I will fix it.

Regards,
Lorenzo

> 
> > +				return MVNETA_XDP_DROPPED;
> > +			}
> > +			buf->type = MVNETA_TYPE_XDP_NDO;
> > +		} else {
> > +			page = frag ? xdp_get_frag_page(frag)
> > +				    : virt_to_page(xdpf->data);
> > +			dma_addr = page_pool_get_dma_addr(page);
> > +			if (frag)
> > +				dma_addr += xdp_get_frag_offset(frag);
> > +			else
> > +				dma_addr += sizeof(*xdpf) + xdpf->headroom;
> > +			dma_sync_single_for_device(pp->dev->dev.parent,
> > +						   dma_addr, len,
> > +						   DMA_BIDIRECTIONAL);
> > +			buf->type = MVNETA_TYPE_XDP_TX;
> >  		}
> > -		buf->type = MVNETA_TYPE_XDP_NDO;
> > -	} else {
> > -		struct page *page = virt_to_page(xdpf->data);
> > +		buf->xdpf = i ? NULL : xdpf;
> >  
> > -		dma_addr = page_pool_get_dma_addr(page) +
> > -			   sizeof(*xdpf) + xdpf->headroom;
> > -		dma_sync_single_for_device(pp->dev->dev.parent, dma_addr,
> > -					   xdpf->len, DMA_BIDIRECTIONAL);
> > -		buf->type = MVNETA_TYPE_XDP_TX;
> > +		tx_desc->command = !i ? MVNETA_TXD_F_DESC : 0;
> > +		tx_desc->buf_phys_addr = dma_addr;
> > +		tx_desc->data_size = len;
> > +		*nxmit_byte += len;
> > +
> > +		mvneta_txq_inc_put(txq);
> >  	}
> > -	buf->xdpf = xdpf;
> >  
> > -	tx_desc->command = MVNETA_TXD_FLZ_DESC;
> > -	tx_desc->buf_phys_addr = dma_addr;
> > -	tx_desc->data_size = xdpf->len;
> > +	/*last descriptor */
> > +	tx_desc->command |= MVNETA_TXD_L_DESC | MVNETA_TXD_Z_PAD;
> >  
> > -	mvneta_txq_inc_put(txq);
> > -	txq->pending++;
> > -	txq->count++;
> > +	txq->pending += num_frames;
> > +	txq->count += num_frames;
> >  
> >  	return MVNETA_XDP_TX;
> >  }
> > @@ -2106,8 +2128,8 @@ mvneta_xdp_xmit_back(struct mvneta_port *pp, struct xdp_buff *xdp)
> >  	struct mvneta_pcpu_stats *stats = this_cpu_ptr(pp->stats);
> >  	struct mvneta_tx_queue *txq;
> >  	struct netdev_queue *nq;
> > +	int cpu, nxmit_byte = 0;
> >  	struct xdp_frame *xdpf;
> > -	int cpu;
> >  	u32 ret;
> >  
> >  	xdpf = xdp_convert_buff_to_frame(xdp);
> > @@ -2119,10 +2141,10 @@ mvneta_xdp_xmit_back(struct mvneta_port *pp, struct xdp_buff *xdp)
> >  	nq = netdev_get_tx_queue(pp->dev, txq->id);
> >  
> >  	__netif_tx_lock(nq, cpu);
> > -	ret = mvneta_xdp_submit_frame(pp, txq, xdpf, false);
> > +	ret = mvneta_xdp_submit_frame(pp, txq, xdpf, &nxmit_byte, false);
> >  	if (ret == MVNETA_XDP_TX) {
> >  		u64_stats_update_begin(&stats->syncp);
> > -		stats->es.ps.tx_bytes += xdpf->len;
> > +		stats->es.ps.tx_bytes += nxmit_byte;
> >  		stats->es.ps.tx_packets++;
> >  		stats->es.ps.xdp_tx++;
> >  		u64_stats_update_end(&stats->syncp);
> > @@ -2161,11 +2183,11 @@ mvneta_xdp_xmit(struct net_device *dev, int num_frame,
> >  
> >  	__netif_tx_lock(nq, cpu);
> >  	for (i = 0; i < num_frame; i++) {
> > -		ret = mvneta_xdp_submit_frame(pp, txq, frames[i], true);
> > +		ret = mvneta_xdp_submit_frame(pp, txq, frames[i], &nxmit_byte,
> > +					      true);
> >  		if (ret != MVNETA_XDP_TX)
> >  			break;
> >  
> > -		nxmit_byte += frames[i]->len;
> >  		nxmit++;
> >  	}
> >  
> > -- 
> > 2.30.2
> > 

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 228 bytes --]


* Re: [PATCH v8 bpf-next 08/14] bpf: add multi-buff support to the bpf_xdp_adjust_tail() API
  2021-04-08 20:54     ` Vladimir Oltean
@ 2021-04-09 18:13       ` Lorenzo Bianconi
  0 siblings, 0 replies; 57+ messages in thread
From: Lorenzo Bianconi @ 2021-04-09 18:13 UTC (permalink / raw)
  To: Vladimir Oltean
  Cc: bpf, netdev, lorenzo.bianconi, davem, kuba, ast, daniel, shayagr,
	sameehj, john.fastabend, dsahern, brouer, echaudro, jasowang,
	alexander.duyck, saeed, maciej.fijalkowski

[-- Attachment #1: Type: text/plain, Size: 691 bytes --]

> On Thu, Apr 08, 2021 at 10:15:47PM +0300, Vladimir Oltean wrote:
> > > +		if (unlikely(offset > ((int)(xdp->data_end - xdp->data) +
> > > +				       xdp_sinfo->data_length -
> > > +				       ETH_HLEN)))
> > 
> > Also: should we have some sort of helper for calculating the total
> > length of an xdp_frame (head + frags)? Maybe it's just me, but I find it
> > slightly confusing that xdp_sinfo->data_length does not account for
> > everything.
> 
> I see now that xdp_buff :: frame_length is added in patch 10. It is a
> bit strange to not use it wherever you can? Could patch 10 be moved
> before patch 8?

yes, I agree we can change the patch order

Regards,
Lorenzo

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 228 bytes --]


* Re: [PATCH v8 bpf-next 09/14] bpd: add multi-buffer support to xdp copy helpers
  2021-04-08 20:57   ` Vladimir Oltean
@ 2021-04-09 18:19     ` Lorenzo Bianconi
  0 siblings, 0 replies; 57+ messages in thread
From: Lorenzo Bianconi @ 2021-04-09 18:19 UTC (permalink / raw)
  To: Vladimir Oltean
  Cc: bpf, netdev, lorenzo.bianconi, davem, kuba, ast, daniel, shayagr,
	sameehj, john.fastabend, dsahern, brouer, echaudro, jasowang,
	alexander.duyck, saeed, maciej.fijalkowski

[-- Attachment #1: Type: text/plain, Size: 1456 bytes --]

> On Thu, Apr 08, 2021 at 02:51:01PM +0200, Lorenzo Bianconi wrote:
> > From: Eelco Chaudron <echaudro@redhat.com>
> > 
> > This patch adds support for multi-buffer for the following helpers:
> >   - bpf_xdp_output()
> >   - bpf_perf_event_output()
> > 
> > Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
> > Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
> > ---
> > diff --git a/tools/testing/selftests/bpf/progs/test_xdp_bpf2bpf.c b/tools/testing/selftests/bpf/progs/test_xdp_bpf2bpf.c
> > index a038e827f850..d5a5f603d252 100644
> > --- a/tools/testing/selftests/bpf/progs/test_xdp_bpf2bpf.c
> > +++ b/tools/testing/selftests/bpf/progs/test_xdp_bpf2bpf.c
> > @@ -27,6 +27,7 @@ struct xdp_buff {
> >  	void *data_hard_start;
> >  	unsigned long handle;
> >  	struct xdp_rxq_info *rxq;
> > +	__u32 frame_length;
> 
> This patch will not work without patch 10, so could you change the order?

ack, I will fix it in v9

Regards,
Lorenzo

> 
> >  } __attribute__((preserve_access_index));
> >  
> >  struct meta {
> > @@ -49,7 +50,7 @@ int BPF_PROG(trace_on_entry, struct xdp_buff *xdp)
> >  	void *data = (void *)(long)xdp->data;
> >  
> >  	meta.ifindex = xdp->rxq->dev->ifindex;
> > -	meta.pkt_len = data_end - data;
> > +	meta.pkt_len = xdp->frame_length;
> >  	bpf_xdp_output(xdp, &perf_buf_map,
> >  		       ((__u64) meta.pkt_len << 32) |
> >  		       BPF_F_CURRENT_CPU,
> > -- 
> > 2.30.2
> > 

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 228 bytes --]


* Re: [PATCH v8 bpf-next 00/14] mvneta: introduce XDP multi-buffer support
  2021-04-09  0:56 ` [PATCH v8 bpf-next 00/14] mvneta: introduce XDP multi-buffer support John Fastabend
@ 2021-04-09 20:16   ` Lorenzo Bianconi
  2021-04-13 15:16   ` Eelco Chaudron
  1 sibling, 0 replies; 57+ messages in thread
From: Lorenzo Bianconi @ 2021-04-09 20:16 UTC (permalink / raw)
  To: John Fastabend
  Cc: Lorenzo Bianconi, bpf, netdev, davem, kuba, ast, daniel, shayagr,
	sameehj, dsahern, brouer, echaudro, jasowang, alexander.duyck,
	saeed, maciej.fijalkowski

[-- Attachment #1: Type: text/plain, Size: 1898 bytes --]

> Lorenzo Bianconi wrote:

[...]

> 
> I just read the commit messages for v8 so far. But, I'm still wondering how
> to handle use cases where we want to put extra bytes at the end of the
> packet, or really anywhere in the general case. We can extend the tail with
> the above, but is there any way to then write into that extra space?
> 
> I think most use cases will only want headers so we can likely make it 
> a callout to a helper. Could we add something like, xdp_get_bytes(start, end)
> to pull in the bytes?
> 
> My dumb pseudoprogram being something like,
> 
>   trailer[16] = {0,1,2,3,4,5,6,7,8,9,a,b,c,d,e,f}
>   trailer_size = 16;
>   old_end = xdp->length;
>   new_end = xdp->length + trailer_size;
> 
>   err = bpf_xdp_adjust_tail(xdp, trailer_size)
>   if (err) return err;
> 
>   err = xdp_get_bytes(xdp, old_end, new_end);
>   if (err) return err;
> 
>   memcpy(xdp->data, trailer, trailer_size);
> 
> Do you think that could work if we code up xdp_get_bytes()? Does the driver
> have enough context to adjust xdp to map to my get_bytes() call? I think
> so but we should check.

Hi John,

can you please give more details about how xdp_get_bytes() is expected to work?
IIUC the trailer will be pulled to the beginning of the frame after updating the
xdp_buff with the xdp_get_bytes() helper, correct?
If so, I guess it is doable; it is just a matter of reserving more space when
mapping the buffer on the DMA engine with respect to XDP_PACKET_HEADROOM. If the
frame does not fit in a single buffer, it will be split over multiple buffers.
If you are referring to adding a trailer at the end of the buffer, I guess that
is doable as well by introducing a bpf helper.
I guess both solutions are orthogonal to this series.

Regards,
Lorenzo

> 
> > 
> > More info about the main idea behind this approach can be found here [1][2].
> 
> Thanks for working on this!
> 

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 228 bytes --]

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v8 bpf-next 00/14] mvneta: introduce XDP multi-buffer support
  2021-04-09  0:56 ` [PATCH v8 bpf-next 00/14] mvneta: introduce XDP multi-buffer support John Fastabend
  2021-04-09 20:16   ` Lorenzo Bianconi
@ 2021-04-13 15:16   ` Eelco Chaudron
  1 sibling, 0 replies; 57+ messages in thread
From: Eelco Chaudron @ 2021-04-13 15:16 UTC (permalink / raw)
  To: John Fastabend
  Cc: Lorenzo Bianconi, bpf, netdev, lorenzo.bianconi, davem, kuba,
	ast, daniel, shayagr, sameehj, dsahern, brouer, jasowang,
	alexander.duyck, saeed, maciej.fijalkowski



On 9 Apr 2021, at 2:56, John Fastabend wrote:

> Lorenzo Bianconi wrote:
>> This series introduce XDP multi-buffer support. The mvneta driver is
>> the first to support these new "non-linear" xdp_{buff,frame}. 
>> Reviewers
>> please focus on how these new types of xdp_{buff,frame} packets
>> traverse the different layers and the layout design. It is on purpose
>> that BPF-helpers are kept simple, as we don't want to expose the
>> internal layout to allow later changes.
>>
>> For now, to keep the design simple and to maintain performance, the 
>> XDP
>> BPF-prog (still) only have access to the first-buffer. It is left for
>> later (another patchset) to add payload access across multiple 
>> buffers.
>> This patchset should still allow for these future extensions. The 
>> goal
>> is to lift the XDP MTU restriction that comes with XDP, but maintain
>> same performance as before.
>>
>> The main idea for the new multi-buffer layout is to reuse the same
>> layout used for non-linear SKB. We introduced a "xdp_shared_info" 
>> data
>> structure at the end of the first buffer to link together subsequent 
>> buffers.
>> xdp_shared_info will alias skb_shared_info allowing to keep most of 
>> the frags
>> in the same cache-line (while with skb_shared_info only the first 
>> fragment will
>> be placed in the first "shared_info" cache-line). Moreover we 
>> introduced some
>> xdp_shared_info helpers aligned to skb_frag* ones.
>> Converting xdp_frame to SKB and deliver it to the network stack is 
>> shown in
>> patch 07/14. Building the SKB, the xdp_shared_info structure will be 
>> converted
>> in a skb_shared_info one.
>>
>> A multi-buffer bit (mb) has been introduced in xdp_{buff,frame} 
>> structure
>> to notify the bpf/network layer if this is a xdp multi-buffer frame 
>> (mb = 1)
>> or not (mb = 0).
>> The mb bit will be set by a xdp multi-buffer capable driver only for
>> non-linear frames maintaining the capability to receive linear frames
>> without any extra cost since the xdp_shared_info structure at the end
>> of the first buffer will be initialized only if mb is set.
>>
>> Typical use cases for this series are:
>> - Jumbo-frames
>> - Packet header split (please see Google's use-case @ NetDevConf 0x14, [0])
>> - TSO
>>
>> A new frame_length field has been introduced in XDP ctx in order to notify
>> the eBPF layer about the total frame size (linear + paged parts).
>>
>> bpf_xdp_adjust_tail and bpf_xdp_copy helpers have been modified to 
>> take into
>> account xdp multi-buff frames.
>
> I just read the commit messages for v8 so far. But, I'm still wondering how
> to handle use cases where we want to put extra bytes at the end of the
> packet, or really anywhere in the general case. We can extend the tail with
> the above, but is there any way to then write into that extra space?
>
> I think most use cases will only want headers so we can likely make it
> a callout to a helper. Could we add something like, xdp_get_bytes(start, end)
> to pull in the bytes?
>
> My dumb pseudoprogram being something like,
>
>   trailer[16] = {0,1,2,3,4,5,6,7,8,9,a,b,c,d,e,f}
>   trailer_size = 16;
>   old_end = xdp->length;
>   new_end = xdp->length + trailer_size;
>
>   err = bpf_xdp_adjust_tail(xdp, trailer_size)
>   if (err) return err;
>
>   err = xdp_get_bytes(xdp, old_end, new_end);
>   if (err) return err;
>
>   memcpy(xdp->data, trailer, trailer_size);
>
> Do you think that could work if we code up xdp_get_bytes()? Does the driver
> have enough context to adjust xdp to map to my get_bytes() call? I think
> so but we should check.
>

I was thinking of doing something like the below, but I have no cycles 
to work on it:

void *bpf_xdp_access_bytes(struct xdp_buff *xdp_md, u32 offset, int *len, void *buffer)
      Description
              This function returns a pointer to the packet data, which can be
              accessed linearly for a maximum of *len* bytes.

              *offset* marks the starting point in the packet for which you
              would like to get a data pointer.

              *len* points to an initialized integer which tells the helper
              how many bytes from *offset* you would like to access. Supplying
              a value of 0 or less tells the helper to report back how many
              bytes are available linearly from the offset (in this case the
              value of *buffer* is ignored). On return, the helper updates
              this value with the length available to access linearly at the
              address returned.

              *buffer* points to an optional buffer which MUST be the same
              size as *\*len* and is used to copy in the data if it is not
              available linearly.

      Return
              Returns a pointer to the packet data requested, accessible with
              a maximum length of *\*len*. NULL is returned on failure.

              Note that if a *buffer* is supplied and the data is not
              available linearly, the content is copied. In this case a
              pointer to *buffer* is returned.


int bpf_xdp_store_bytes(struct xdp_buff *xdp_md, u32 offset, const void *from, u32 len)
      Description
              Store *len* bytes from address *from* into the packet associated
              with *xdp_md*, at *offset*. This function takes care of copying
              data to multi-buffer XDP packets.

              A call to this helper may change the underlying packet buffer.
              Therefore, at load time, all checks on pointers previously done
              by the verifier are invalidated and must be performed again, if
              the helper is used in combination with direct packet access.

      Return
              0 on success, or a negative error in case of failure.

>>
>> More info about the main idea behind this approach can be found here 
>> [1][2].
>
> Thanks for working on this!


^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v8 bpf-next 09/14] bpd: add multi-buffer support to xdp copy helpers
  2021-04-08 21:04   ` Vladimir Oltean
@ 2021-04-14  8:08     ` Eelco Chaudron
  0 siblings, 0 replies; 57+ messages in thread
From: Eelco Chaudron @ 2021-04-14  8:08 UTC (permalink / raw)
  To: Vladimir Oltean, Lorenzo Bianconi
  Cc: bpf, netdev, lorenzo.bianconi, davem, kuba, ast, daniel, shayagr,
	sameehj, john.fastabend, dsahern, brouer, jasowang,
	alexander.duyck, saeed, maciej.fijalkowski



On 8 Apr 2021, at 23:04, Vladimir Oltean wrote:

> On Thu, Apr 08, 2021 at 02:51:01PM +0200, Lorenzo Bianconi wrote:
>> From: Eelco Chaudron <echaudro@redhat.com>
>>
>> This patch adds support for multi-buffer for the following helpers:
>>   - bpf_xdp_output()
>>   - bpf_perf_event_output()
>>
>> Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
>> Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
>> ---
>
> Also there is a typo in the commit message: bpd -> bpf.

ACK, will fix in next version.

>>  net/core/filter.c                             |  63 ++++++++-
>>  .../selftests/bpf/prog_tests/xdp_bpf2bpf.c    | 127 
>> ++++++++++++------
>>  .../selftests/bpf/progs/test_xdp_bpf2bpf.c    |   3 +-
>>  3 files changed, 149 insertions(+), 44 deletions(-)
>>
>> diff --git a/net/core/filter.c b/net/core/filter.c
>> index c4eb1392f88e..c00f52ab2532 100644
>> --- a/net/core/filter.c
>> +++ b/net/core/filter.c
>> @@ -4549,10 +4549,56 @@ static const struct bpf_func_proto 
>> bpf_sk_ancestor_cgroup_id_proto = {
>>  };
>>  #endif
>>
>> -static unsigned long bpf_xdp_copy(void *dst_buff, const void 
>> *src_buff,
>> +static unsigned long bpf_xdp_copy(void *dst_buff, const void *ctx,
>>  				  unsigned long off, unsigned long len)
>>  {
>> -	memcpy(dst_buff, src_buff + off, len);
>> +	struct xdp_buff *xdp = (struct xdp_buff *)ctx;
>
> There is no need to cast a void pointer in C.

I added this as the void pointer is a const. However, looking at it 
again, we should probably change xdp_get_shared_info_from_buff() to also 
take a const pointer, i.e.:

    static inline struct xdp_shared_info *
   -xdp_get_shared_info_from_buff(struct xdp_buff *xdp)
   +xdp_get_shared_info_from_buff(const struct xdp_buff *xdp)
    {
           BUILD_BUG_ON(sizeof(struct xdp_shared_info) >
                        sizeof(struct skb_shared_info));

What do you think Lorenzo?

>> +	struct xdp_shared_info *xdp_sinfo;
>> +	unsigned long base_len;
>> +
>> +	if (likely(!xdp->mb)) {
>> +		memcpy(dst_buff, xdp->data + off, len);
>> +		return 0;
>> +	}
>> +
>> +	base_len = xdp->data_end - xdp->data;
>
> Would a static inline int xdp_buff_head_len() be useful?

Guess everybody is using xdp->data_end - xdp->data in their code. We could
add a static inline and change all the code, but I don’t think we should do
it as part of this patchset. I would also call it something like
xdp_buff_data_len()?

>> +	xdp_sinfo = xdp_get_shared_info_from_buff(xdp);
>> +	do {
>> +		const void *src_buff = NULL;
>> +		unsigned long copy_len = 0;
>> +
>> +		if (off < base_len) {
>> +			src_buff = xdp->data + off;
>> +			copy_len = min(len, base_len - off);
>> +		} else {
>> +			unsigned long frag_off_total = base_len;
>> +			int i;
>> +
>> +			for (i = 0; i < xdp_sinfo->nr_frags; i++) {
>> +				skb_frag_t *frag = &xdp_sinfo->frags[i];
>> +				unsigned long frag_len, frag_off;
>> +
>> +				frag_len = xdp_get_frag_size(frag);
>> +				frag_off = off - frag_off_total;
>> +				if (frag_off < frag_len) {
>> +					src_buff = xdp_get_frag_address(frag) +
>> +						   frag_off;
>> +					copy_len = min(len,
>> +						       frag_len - frag_off);
>> +					break;
>> +				}
>> +				frag_off_total += frag_len;
>> +			}
>> +		}
>> +		if (!src_buff)
>> +			break;
>> +
>> +		memcpy(dst_buff, src_buff, copy_len);
>> +		off += copy_len;
>> +		len -= copy_len;
>> +		dst_buff += copy_len;
>> +	} while (len);
>> +
>>  	return 0;
>>  }


^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v8 bpf-next 00/14] mvneta: introduce XDP multi-buffer support
  2021-04-08 12:50 [PATCH v8 bpf-next 00/14] mvneta: introduce XDP multi-buffer support Lorenzo Bianconi
                   ` (14 preceding siblings ...)
  2021-04-09  0:56 ` [PATCH v8 bpf-next 00/14] mvneta: introduce XDP multi-buffer support John Fastabend
@ 2021-04-16 14:27 ` Magnus Karlsson
  2021-04-16 21:29   ` Lorenzo Bianconi
                     ` (2 more replies)
  15 siblings, 3 replies; 57+ messages in thread
From: Magnus Karlsson @ 2021-04-16 14:27 UTC (permalink / raw)
  To: Lorenzo Bianconi
  Cc: bpf, Network Development, lorenzo.bianconi, David S. Miller,
	Jakub Kicinski, Alexei Starovoitov, Daniel Borkmann, shayagr,
	sameehj, John Fastabend, David Ahern, Jesper Dangaard Brouer,
	Eelco Chaudron, Jason Wang, Alexander Duyck, Saeed Mahameed,
	Fijalkowski, Maciej, Tirthendu

On Thu, Apr 8, 2021 at 2:51 PM Lorenzo Bianconi <lorenzo@kernel.org> wrote:
>
> This series introduce XDP multi-buffer support. The mvneta driver is
> the first to support these new "non-linear" xdp_{buff,frame}. Reviewers
> please focus on how these new types of xdp_{buff,frame} packets
> traverse the different layers and the layout design. It is on purpose
> that BPF-helpers are kept simple, as we don't want to expose the
> internal layout to allow later changes.
>
> For now, to keep the design simple and to maintain performance, the XDP
> BPF-prog (still) only have access to the first-buffer. It is left for
> later (another patchset) to add payload access across multiple buffers.
> This patchset should still allow for these future extensions. The goal
> is to lift the XDP MTU restriction that comes with XDP, but maintain
> same performance as before.
>
> The main idea for the new multi-buffer layout is to reuse the same
> layout used for non-linear SKB. We introduced a "xdp_shared_info" data
> structure at the end of the first buffer to link together subsequent buffers.
> xdp_shared_info will alias skb_shared_info allowing to keep most of the frags
> in the same cache-line (while with skb_shared_info only the first fragment will
> be placed in the first "shared_info" cache-line). Moreover we introduced some
> xdp_shared_info helpers aligned to skb_frag* ones.
> Converting xdp_frame to SKB and deliver it to the network stack is shown in
> patch 07/14. Building the SKB, the xdp_shared_info structure will be converted
> in a skb_shared_info one.
>
> A multi-buffer bit (mb) has been introduced in xdp_{buff,frame} structure
> to notify the bpf/network layer if this is a xdp multi-buffer frame (mb = 1)
> or not (mb = 0).
> The mb bit will be set by a xdp multi-buffer capable driver only for
> non-linear frames maintaining the capability to receive linear frames
> without any extra cost since the xdp_shared_info structure at the end
> of the first buffer will be initialized only if mb is set.
>
> Typical use cases for this series are:
> - Jumbo-frames
> - Packet header split (please see Google’s use-case @ NetDevConf 0x14, [0])
> - TSO
>
> A new frame_length field has been introduced in XDP ctx in order to notify the
> eBPF layer about the total frame size (linear + paged parts).
>
> bpf_xdp_adjust_tail and bpf_xdp_copy helpers have been modified to take into
> account xdp multi-buff frames.
>
> More info about the main idea behind this approach can be found here [1][2].
>
> Changes since v7:
> - rebase on top of bpf-next
> - fix sparse warnings
> - improve comments for frame_length in include/net/xdp.h
>
> Changes since v6:
> - the main difference with respect to previous versions is the new approach proposed
>   by Eelco to pass full length of the packet to eBPF layer in XDP context
> - reintroduce multi-buff support to eBPF kself-tests
> - reintroduce multi-buff support to bpf_xdp_adjust_tail helper
> - introduce multi-buffer support to bpf_xdp_copy helper
> - rebase on top of bpf-next
>
> Changes since v5:
> - rebase on top of bpf-next
> - initialize mb bit in xdp_init_buff() and drop per-driver initialization
> - drop xdp->mb initialization in xdp_convert_zc_to_xdp_frame()
> - postpone introduction of frame_length field in XDP ctx to another series
> - minor changes
>
> Changes since v4:
> - rebase ontop of bpf-next
> - introduce xdp_shared_info to build xdp multi-buff instead of using the
>   skb_shared_info struct
> - introduce frame_length in xdp ctx
> - drop previous bpf helpers
> - fix bpf_xdp_adjust_tail for xdp multi-buff
> - introduce xdp multi-buff self-tests for bpf_xdp_adjust_tail
> - fix xdp_return_frame_bulk for xdp multi-buff
>
> Changes since v3:
> - rebase ontop of bpf-next
> - add patch 10/13 to copy back paged data from a xdp multi-buff frame to
>   userspace buffer for xdp multi-buff selftests
>
> Changes since v2:
> - add throughput measurements
> - drop bpf_xdp_adjust_mb_header bpf helper
> - introduce selftest for xdp multibuffer
> - addressed comments on bpf_xdp_get_frags_count
> - introduce xdp multi-buff support to cpumaps
>
> Changes since v1:
> - Fix use-after-free in xdp_return_{buff/frame}
> - Introduce bpf helpers
> - Introduce xdp_mb sample program
> - access skb_shared_info->nr_frags only on the last fragment
>
> Changes since RFC:
> - squash multi-buffer bit initialization in a single patch
> - add mvneta non-linear XDP buff support for tx side
>
> [0] https://netdevconf.info/0x14/session.html?talk-the-path-to-tcp-4k-mtu-and-rx-zerocopy
> [1] https://github.com/xdp-project/xdp-project/blob/master/areas/core/xdp-multi-buffer01-design.org
> [2] https://netdevconf.info/0x14/session.html?tutorial-add-XDP-support-to-a-NIC-driver (XDPmulti-buffers section)

Took your patches for a test run with the AF_XDP sample xdpsock on an
i40e card and the throughput degradation is between 2 to 6% depending
on the setup and microbenchmark within xdpsock that is executed. And
this is without sending any multi frame packets. Just single frame
ones. Tirtha made changes to the i40e driver to support this new
interface so that is being included in the measurements.

What performance do you see with the mvneta card? How much are we
willing to pay for this feature when it is not being used or can we in
some way selectively turn it on only when needed?

Thanks: Magnus

> Eelco Chaudron (4):
>   bpf: add multi-buff support to the bpf_xdp_adjust_tail() API
>   bpd: add multi-buffer support to xdp copy helpers
>   bpf: add new frame_length field to the XDP ctx
>   bpf: update xdp_adjust_tail selftest to include multi-buffer
>
> Lorenzo Bianconi (10):
>   xdp: introduce mb in xdp_buff/xdp_frame
>   xdp: add xdp_shared_info data structure
>   net: mvneta: update mb bit before passing the xdp buffer to eBPF layer
>   xdp: add multi-buff support to xdp_return_{buff/frame}
>   net: mvneta: add multi buffer support to XDP_TX
>   net: mvneta: enable jumbo frames for XDP
>   net: xdp: add multi-buff support to xdp_build_skb_from_fram
>   bpf: move user_size out of bpf_test_init
>   bpf: introduce multibuff support to bpf_prog_test_run_xdp()
>   bpf: test_run: add xdp_shared_info pointer in bpf_test_finish
>     signature
>
>  drivers/net/ethernet/marvell/mvneta.c         | 182 ++++++++++--------
>  include/linux/filter.h                        |   7 +
>  include/net/xdp.h                             | 105 +++++++++-
>  include/uapi/linux/bpf.h                      |   1 +
>  net/bpf/test_run.c                            | 109 +++++++++--
>  net/core/filter.c                             | 134 ++++++++++++-
>  net/core/xdp.c                                | 103 +++++++++-
>  tools/include/uapi/linux/bpf.h                |   1 +
>  .../bpf/prog_tests/xdp_adjust_tail.c          | 105 ++++++++++
>  .../selftests/bpf/prog_tests/xdp_bpf2bpf.c    | 127 ++++++++----
>  .../bpf/progs/test_xdp_adjust_tail_grow.c     |  17 +-
>  .../bpf/progs/test_xdp_adjust_tail_shrink.c   |  32 ++-
>  .../selftests/bpf/progs/test_xdp_bpf2bpf.c    |   3 +-
>  13 files changed, 767 insertions(+), 159 deletions(-)
>
> --
> 2.30.2
>

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v8 bpf-next 00/14] mvneta: introduce XDP multi-buffer support
  2021-04-16 14:27 ` Magnus Karlsson
@ 2021-04-16 21:29   ` Lorenzo Bianconi
  2021-04-16 23:00     ` Daniel Borkmann
  2021-04-18 16:18   ` Jesper Dangaard Brouer
  2021-04-27 18:28   ` [PATCH v8 bpf-next 00/14] mvneta: introduce XDP multi-buffer support Lorenzo Bianconi
  2 siblings, 1 reply; 57+ messages in thread
From: Lorenzo Bianconi @ 2021-04-16 21:29 UTC (permalink / raw)
  To: Magnus Karlsson
  Cc: Lorenzo Bianconi, bpf, Network Development, David S. Miller,
	Jakub Kicinski, Alexei Starovoitov, Daniel Borkmann, shayagr,
	sameehj, John Fastabend, David Ahern, Jesper Dangaard Brouer,
	Eelco Chaudron, Jason Wang, Alexander Duyck, Saeed Mahameed,
	Fijalkowski, Maciej, Tirthendu

[-- Attachment #1: Type: text/plain, Size: 2945 bytes --]

> 
> Took your patches for a test run with the AF_XDP sample xdpsock on an
> i40e card and the throughput degradation is between 2 to 6% depending
> on the setup and microbenchmark within xdpsock that is executed. And
> this is without sending any multi frame packets. Just single frame
> ones. Tirtha made changes to the i40e driver to support this new
> interface so that is being included in the measurements.

Hi Magnus,

thx for working on it. Assuming the fragmented part is only initialized/accessed
if mb is set (so for multi frame packets), I would not expect any throughput
degradation in the single frame scenario. Can you please share the i40e
support added by Tirtha?

> 
> What performance do you see with the mvneta card? How much are we
> willing to pay for this feature when it is not being used or can we in
> some way selectively turn it on only when needed?

IIRC I did not see a noticeable throughput degradation on mvneta, but I will
re-run the tests on an updated bpf-next tree.

Regards,
Lorenzo

> 
> Thanks: Magnus
> 
> > Eelco Chaudron (4):
> >   bpf: add multi-buff support to the bpf_xdp_adjust_tail() API
> >   bpd: add multi-buffer support to xdp copy helpers
> >   bpf: add new frame_length field to the XDP ctx
> >   bpf: update xdp_adjust_tail selftest to include multi-buffer
> >
> > Lorenzo Bianconi (10):
> >   xdp: introduce mb in xdp_buff/xdp_frame
> >   xdp: add xdp_shared_info data structure
> >   net: mvneta: update mb bit before passing the xdp buffer to eBPF layer
> >   xdp: add multi-buff support to xdp_return_{buff/frame}
> >   net: mvneta: add multi buffer support to XDP_TX
> >   net: mvneta: enable jumbo frames for XDP
> >   net: xdp: add multi-buff support to xdp_build_skb_from_fram
> >   bpf: move user_size out of bpf_test_init
> >   bpf: introduce multibuff support to bpf_prog_test_run_xdp()
> >   bpf: test_run: add xdp_shared_info pointer in bpf_test_finish
> >     signature
> >
> >  drivers/net/ethernet/marvell/mvneta.c         | 182 ++++++++++--------
> >  include/linux/filter.h                        |   7 +
> >  include/net/xdp.h                             | 105 +++++++++-
> >  include/uapi/linux/bpf.h                      |   1 +
> >  net/bpf/test_run.c                            | 109 +++++++++--
> >  net/core/filter.c                             | 134 ++++++++++++-
> >  net/core/xdp.c                                | 103 +++++++++-
> >  tools/include/uapi/linux/bpf.h                |   1 +
> >  .../bpf/prog_tests/xdp_adjust_tail.c          | 105 ++++++++++
> >  .../selftests/bpf/prog_tests/xdp_bpf2bpf.c    | 127 ++++++++----
> >  .../bpf/progs/test_xdp_adjust_tail_grow.c     |  17 +-
> >  .../bpf/progs/test_xdp_adjust_tail_shrink.c   |  32 ++-
> >  .../selftests/bpf/progs/test_xdp_bpf2bpf.c    |   3 +-
> >  13 files changed, 767 insertions(+), 159 deletions(-)
> >
> > --
> > 2.30.2
> >
> 

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 228 bytes --]

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v8 bpf-next 00/14] mvneta: introduce XDP multi-buffer support
  2021-04-16 21:29   ` Lorenzo Bianconi
@ 2021-04-16 23:00     ` Daniel Borkmann
  0 siblings, 0 replies; 57+ messages in thread
From: Daniel Borkmann @ 2021-04-16 23:00 UTC (permalink / raw)
  To: Lorenzo Bianconi, Magnus Karlsson
  Cc: Lorenzo Bianconi, bpf, Network Development, David S. Miller,
	Jakub Kicinski, Alexei Starovoitov, shayagr, sameehj,
	John Fastabend, David Ahern, Jesper Dangaard Brouer,
	Eelco Chaudron, Jason Wang, Alexander Duyck, Saeed Mahameed,
	Fijalkowski, Maciej, Tirthendu

On 4/16/21 11:29 PM, Lorenzo Bianconi wrote:
>>
>> Took your patches for a test run with the AF_XDP sample xdpsock on an
>> i40e card and the throughput degradation is between 2 to 6% depending
>> on the setup and microbenchmark within xdpsock that is executed. And
>> this is without sending any multi frame packets. Just single frame
>> ones. Tirtha made changes to the i40e driver to support this new
>> interface so that is being included in the measurements.
> 
> thx for working on it. Assuming the fragmented part is only initialized/accessed
> if mb is set (so for multi frame packets), I would not expect any throughput
> degradation in the single frame scenario. Can you please share the i40e
> support added by Tirtha?

Thanks Tirtha & Magnus for adding and testing mb support for i40e, and sharing those
data points; a degradation between 2-6% when mb is not used would definitely not be
acceptable. Would be great to root-cause and debug this further with Lorenzo, there
really should be close to /zero/ additional overhead to avoid regressing existing
performance sensitive workloads like load balancers, etc once they upgrade their
kernels/drivers.

>> What performance do you see with the mvneta card? How much are we
>> willing to pay for this feature when it is not being used or can we in
>> some way selectively turn it on only when needed?
> 
> IIRC I did not see a noticeable throughput degradation on mvneta, but I will
> re-run the tests on an updated bpf-next tree.

But compared to i40e, mvneta is also only 1-2.5 Gbps so potentially less visible,
right [0]? Either way, it's definitely good to get more data points from benchmarking
given this was lacking before for higher speed NICs in particular.

Thanks everyone,
Daniel

   [0] https://doc.dpdk.org/guides/nics/mvneta.html

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v8 bpf-next 00/14] mvneta: introduce XDP multi-buffer support
  2021-04-16 14:27 ` Magnus Karlsson
  2021-04-16 21:29   ` Lorenzo Bianconi
@ 2021-04-18 16:18   ` Jesper Dangaard Brouer
  2021-04-19  6:20     ` Magnus Karlsson
  2021-04-27 18:28   ` [PATCH v8 bpf-next 00/14] mvneta: introduce XDP multi-buffer support Lorenzo Bianconi
  2 siblings, 1 reply; 57+ messages in thread
From: Jesper Dangaard Brouer @ 2021-04-18 16:18 UTC (permalink / raw)
  To: Magnus Karlsson
  Cc: Lorenzo Bianconi, bpf, Network Development, lorenzo.bianconi,
	David S. Miller, Jakub Kicinski, Alexei Starovoitov,
	Daniel Borkmann, shayagr, sameehj, John Fastabend, David Ahern,
	Eelco Chaudron, Jason Wang, Alexander Duyck, Saeed Mahameed,
	Fijalkowski, Maciej, Tirthendu, brouer

On Fri, 16 Apr 2021 16:27:18 +0200
Magnus Karlsson <magnus.karlsson@gmail.com> wrote:

> On Thu, Apr 8, 2021 at 2:51 PM Lorenzo Bianconi <lorenzo@kernel.org> wrote:
> >
> > This series introduce XDP multi-buffer support. The mvneta driver is
> > the first to support these new "non-linear" xdp_{buff,frame}. Reviewers
> > please focus on how these new types of xdp_{buff,frame} packets
> > traverse the different layers and the layout design. It is on purpose
> > that BPF-helpers are kept simple, as we don't want to expose the
> > internal layout to allow later changes.
> >
> > For now, to keep the design simple and to maintain performance, the XDP
> > BPF-prog (still) only have access to the first-buffer. It is left for
> > later (another patchset) to add payload access across multiple buffers.
> > This patchset should still allow for these future extensions. The goal
> > is to lift the XDP MTU restriction that comes with XDP, but maintain
> > same performance as before.
[...]
> >
> > [0] https://netdevconf.info/0x14/session.html?talk-the-path-to-tcp-4k-mtu-and-rx-zerocopy
> > [1] https://github.com/xdp-project/xdp-project/blob/master/areas/core/xdp-multi-buffer01-design.org
> > [2] https://netdevconf.info/0x14/session.html?tutorial-add-XDP-support-to-a-NIC-driver (XDPmulti-buffers section)  
> 
> Took your patches for a test run with the AF_XDP sample xdpsock on an
> i40e card and the throughput degradation is between 2 to 6% depending
> on the setup and microbenchmark within xdpsock that is executed. And
> this is without sending any multi frame packets. Just single frame
> ones. Tirtha made changes to the i40e driver to support this new
> interface so that is being included in the measurements.

Could you please share Tirtha's i40e support patch with me?

I would like to reproduce these results in my testlab, in-order to
figure out where the throughput degradation comes from.

> What performance do you see with the mvneta card? How much are we
> willing to pay for this feature when it is not being used or can we in
> some way selectively turn it on only when needed?

Well, as Daniel says, performance-wise we require close to /zero/
additional overhead, especially as you state this happens when sending
a single frame, which is a base case that we must not slow down.

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer


^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v8 bpf-next 00/14] mvneta: introduce XDP multi-buffer support
  2021-04-18 16:18   ` Jesper Dangaard Brouer
@ 2021-04-19  6:20     ` Magnus Karlsson
  2021-04-19  6:55       ` Lorenzo Bianconi
  0 siblings, 1 reply; 57+ messages in thread
From: Magnus Karlsson @ 2021-04-19  6:20 UTC (permalink / raw)
  To: Jesper Dangaard Brouer
  Cc: Lorenzo Bianconi, bpf, Network Development, lorenzo.bianconi,
	David S. Miller, Jakub Kicinski, Alexei Starovoitov,
	Daniel Borkmann, shayagr, sameehj, John Fastabend, David Ahern,
	Eelco Chaudron, Jason Wang, Alexander Duyck, Saeed Mahameed,
	Fijalkowski, Maciej, Tirthendu

On Sun, Apr 18, 2021 at 6:18 PM Jesper Dangaard Brouer
<brouer@redhat.com> wrote:
>
> On Fri, 16 Apr 2021 16:27:18 +0200
> Magnus Karlsson <magnus.karlsson@gmail.com> wrote:
>
> > On Thu, Apr 8, 2021 at 2:51 PM Lorenzo Bianconi <lorenzo@kernel.org> wrote:
> > >
> > > This series introduce XDP multi-buffer support. The mvneta driver is
> > > the first to support these new "non-linear" xdp_{buff,frame}. Reviewers
> > > please focus on how these new types of xdp_{buff,frame} packets
> > > traverse the different layers and the layout design. It is on purpose
> > > that BPF-helpers are kept simple, as we don't want to expose the
> > > internal layout to allow later changes.
> > >
> > > For now, to keep the design simple and to maintain performance, the XDP
> > > BPF-prog (still) only have access to the first-buffer. It is left for
> > > later (another patchset) to add payload access across multiple buffers.
> > > This patchset should still allow for these future extensions. The goal
> > > is to lift the XDP MTU restriction that comes with XDP, but maintain
> > > same performance as before.
> [...]
> > >
> > > [0] https://netdevconf.info/0x14/session.html?talk-the-path-to-tcp-4k-mtu-and-rx-zerocopy
> > > [1] https://github.com/xdp-project/xdp-project/blob/master/areas/core/xdp-multi-buffer01-design.org
> > > [2] https://netdevconf.info/0x14/session.html?tutorial-add-XDP-support-to-a-NIC-driver (XDPmulti-buffers section)
> >
> > Took your patches for a test run with the AF_XDP sample xdpsock on an
> > i40e card and the throughput degradation is between 2 to 6% depending
> > on the setup and microbenchmark within xdpsock that is executed. And
> > this is without sending any multi frame packets. Just single frame
> > ones. Tirtha made changes to the i40e driver to support this new
> > interface so that is being included in the measurements.
>
> Could you please share Tirtha's i40e support patch with me?

We will post them on the list as an RFC. Tirtha also added AF_XDP
multi-frame support on top of Lorenzo's patches, so we will send that
one out as well. I will also rerun my experiments, properly document
them, and send them out just to be sure that I did not make any mistake.

Just note that I would really like for the multi-frame support to get
in. I have lost count of how many people have asked for it to be
added to XDP and AF_XDP. So please check our implementation and
improve it so we can get the overhead down to where we want it to be.

Thanks: Magnus

> I would like to reproduce these results in my testlab, in-order to
> figure out where the throughput degradation comes from.
>
> > What performance do you see with the mvneta card? How much are we
> > willing to pay for this feature when it is not being used or can we in
> > some way selectively turn it on only when needed?
>
> Well, as Daniel says performance wise we require close to /zero/
> additional overhead, especially as you state this happens when sending
> a single frame, which is a base case that we must not slowdown.
>
> --
> Best regards,
>   Jesper Dangaard Brouer
>   MSc.CS, Principal Kernel Engineer at Red Hat
>   LinkedIn: http://www.linkedin.com/in/brouer
>


* Re: [PATCH v8 bpf-next 00/14] mvneta: introduce XDP multi-buffer support
  2021-04-19  6:20     ` Magnus Karlsson
@ 2021-04-19  6:55       ` Lorenzo Bianconi
  2021-04-20 13:49         ` Magnus Karlsson
  0 siblings, 1 reply; 57+ messages in thread
From: Lorenzo Bianconi @ 2021-04-19  6:55 UTC (permalink / raw)
  To: Magnus Karlsson
  Cc: Jesper Dangaard Brouer, Lorenzo Bianconi, bpf,
	Network Development, David S. Miller, Jakub Kicinski,
	Alexei Starovoitov, Daniel Borkmann, shayagr, sameehj,
	John Fastabend, David Ahern, Eelco Chaudron, Jason Wang,
	Alexander Duyck, Saeed Mahameed, Fijalkowski, Maciej, Tirthendu


> On Sun, Apr 18, 2021 at 6:18 PM Jesper Dangaard Brouer
> <brouer@redhat.com> wrote:
> >
> > On Fri, 16 Apr 2021 16:27:18 +0200
> > Magnus Karlsson <magnus.karlsson@gmail.com> wrote:
> >
> > > On Thu, Apr 8, 2021 at 2:51 PM Lorenzo Bianconi <lorenzo@kernel.org> wrote:
> > > >
> > > > This series introduce XDP multi-buffer support. The mvneta driver is
> > > > the first to support these new "non-linear" xdp_{buff,frame}. Reviewers
> > > > please focus on how these new types of xdp_{buff,frame} packets
> > > > traverse the different layers and the layout design. It is on purpose
> > > > that BPF-helpers are kept simple, as we don't want to expose the
> > > > internal layout to allow later changes.
> > > >
> > > > For now, to keep the design simple and to maintain performance, the XDP
> > > > BPF-prog (still) only have access to the first-buffer. It is left for
> > > > later (another patchset) to add payload access across multiple buffers.
> > > > This patchset should still allow for these future extensions. The goal
> > > > is to lift the XDP MTU restriction that comes with XDP, but maintain
> > > > same performance as before.
> > [...]
> > > >
> > > > [0] https://netdevconf.info/0x14/session.html?talk-the-path-to-tcp-4k-mtu-and-rx-zerocopy
> > > > [1] https://github.com/xdp-project/xdp-project/blob/master/areas/core/xdp-multi-buffer01-design.org
> > > > [2] https://netdevconf.info/0x14/session.html?tutorial-add-XDP-support-to-a-NIC-driver (XDPmulti-buffers section)
> > >
> > > Took your patches for a test run with the AF_XDP sample xdpsock on an
> > > i40e card and the throughput degradation is between 2 to 6% depending
> > > on the setup and microbenchmark within xdpsock that is executed. And
> > > this is without sending any multi frame packets. Just single frame
> > > ones. Tirtha made changes to the i40e driver to support this new
> > > interface so that is being included in the measurements.
> >
> > Could you please share Tirtha's i40e support patch with me?
> 
> We will post them on the list as an RFC. Tirtha also added AF_XDP
> multi-frame support on top of Lorenzo's patches so we will send that
> one out as well. Will also rerun my experiments, properly document
> them and send out just to be sure that I did not make any mistake.

ack, very cool, thx

> 
> Just note that I would really like for the multi-frame support to get
> in. I have lost count on how many people that have asked for it to be
> added to XDP and AF_XDP. So please check our implementation and
> improve it so we can get the overhead down to where we want it to be.

sure, I will do.

Regards,
Lorenzo

> 
> Thanks: Magnus
> 
> > I would like to reproduce these results in my testlab, in-order to
> > figure out where the throughput degradation comes from.
> >
> > > What performance do you see with the mvneta card? How much are we
> > > willing to pay for this feature when it is not being used or can we in
> > > some way selectively turn it on only when needed?
> >
> > Well, as Daniel says performance wise we require close to /zero/
> > additional overhead, especially as you state this happens when sending
> > a single frame, which is a base case that we must not slowdown.
> >
> > --
> > Best regards,
> >   Jesper Dangaard Brouer
> >   MSc.CS, Principal Kernel Engineer at Red Hat
> >   LinkedIn: http://www.linkedin.com/in/brouer
> >
> 



* Re: [PATCH v8 bpf-next 00/14] mvneta: introduce XDP multi-buffer support
  2021-04-19  6:55       ` Lorenzo Bianconi
@ 2021-04-20 13:49         ` Magnus Karlsson
  2021-04-21 12:47           ` Jesper Dangaard Brouer
  0 siblings, 1 reply; 57+ messages in thread
From: Magnus Karlsson @ 2021-04-20 13:49 UTC (permalink / raw)
  To: Lorenzo Bianconi
  Cc: Jesper Dangaard Brouer, Lorenzo Bianconi, bpf,
	Network Development, David S. Miller, Jakub Kicinski,
	Alexei Starovoitov, Daniel Borkmann, shayagr, sameehj,
	John Fastabend, David Ahern, Eelco Chaudron, Jason Wang,
	Alexander Duyck, Saeed Mahameed, Fijalkowski, Maciej, Tirthendu

On Mon, Apr 19, 2021 at 8:56 AM Lorenzo Bianconi
<lorenzo.bianconi@redhat.com> wrote:
>
> > On Sun, Apr 18, 2021 at 6:18 PM Jesper Dangaard Brouer
> > <brouer@redhat.com> wrote:
> > >
> > > On Fri, 16 Apr 2021 16:27:18 +0200
> > > Magnus Karlsson <magnus.karlsson@gmail.com> wrote:
> > >
> > > > On Thu, Apr 8, 2021 at 2:51 PM Lorenzo Bianconi <lorenzo@kernel.org> wrote:
> > > > >
> > > > > This series introduce XDP multi-buffer support. The mvneta driver is
> > > > > the first to support these new "non-linear" xdp_{buff,frame}. Reviewers
> > > > > please focus on how these new types of xdp_{buff,frame} packets
> > > > > traverse the different layers and the layout design. It is on purpose
> > > > > that BPF-helpers are kept simple, as we don't want to expose the
> > > > > internal layout to allow later changes.
> > > > >
> > > > > For now, to keep the design simple and to maintain performance, the XDP
> > > > > BPF-prog (still) only have access to the first-buffer. It is left for
> > > > > later (another patchset) to add payload access across multiple buffers.
> > > > > This patchset should still allow for these future extensions. The goal
> > > > > is to lift the XDP MTU restriction that comes with XDP, but maintain
> > > > > same performance as before.
> > > [...]
> > > > >
> > > > > [0] https://netdevconf.info/0x14/session.html?talk-the-path-to-tcp-4k-mtu-and-rx-zerocopy
> > > > > [1] https://github.com/xdp-project/xdp-project/blob/master/areas/core/xdp-multi-buffer01-design.org
> > > > > [2] https://netdevconf.info/0x14/session.html?tutorial-add-XDP-support-to-a-NIC-driver (XDPmulti-buffers section)
> > > >
> > > > Took your patches for a test run with the AF_XDP sample xdpsock on an
> > > > i40e card and the throughput degradation is between 2 to 6% depending
> > > > on the setup and microbenchmark within xdpsock that is executed. And
> > > > this is without sending any multi frame packets. Just single frame
> > > > ones. Tirtha made changes to the i40e driver to support this new
> > > > interface so that is being included in the measurements.
> > >
> > > Could you please share Tirtha's i40e support patch with me?
> >
> > We will post them on the list as an RFC. Tirtha also added AF_XDP
> > multi-frame support on top of Lorenzo's patches so we will send that
> > one out as well. Will also rerun my experiments, properly document
> > them and send out just to be sure that I did not make any mistake.
>
> ack, very cool, thx

I have now run a new set of experiments on a Cascade Lake server at
2.1 GHz with turbo boost disabled. Two NICs: i40e and ice. The
baseline is commit 5c507329000e ("libbpf: Clarify flags in ringbuf
helpers") and Lorenzo's and Eelco's path set is their v8. First some
runs with xdpsock (i.e. AF_XDP) in both 2-core mode (app on one core
and the driver on another) and 1-core mode using busy_poll.

xdpsock rxdrop throughput change with the multi-buffer patches without
any driver changes:
1-core i40e: -0.5 to 0%   2-cores i40e: -0.5%
1-core ice: -2%   2-cores ice: -1 to -0.5%

xdp_rxq_info -a XDP_DROP
i40e: -4%   ice: +8%

xdp_rxq_info -a XDP_TX
i40e: -10%   ice: +9%

The XDP results with xdp_rxq_info are just weird! I reran them three
times, rebuilt and rebooted in between and I always get the same
results. And I also checked that I am running on the correct NUMA node
and so on. But I have a hard time believing them. Nearly +10% and -10%
difference. Too much in my book. Jesper, could you please run the same
and see what you get? The xdpsock numbers are more in the ballpark of
what I would expect.

Tirtha and I found some optimizations in the i40e
multi-frame/multi-buffer support that we have implemented. Will test
those next, post the results and share the code.

> >
> > Just note that I would really like for the multi-frame support to get
> > in. I have lost count on how many people that have asked for it to be
> > added to XDP and AF_XDP. So please check our implementation and
> > improve it so we can get the overhead down to where we want it to be.
>
> sure, I will do.
>
> Regards,
> Lorenzo
>
> >
> > Thanks: Magnus
> >
> > > I would like to reproduce these results in my testlab, in-order to
> > > figure out where the throughput degradation comes from.
> > >
> > > > What performance do you see with the mvneta card? How much are we
> > > > willing to pay for this feature when it is not being used or can we in
> > > > some way selectively turn it on only when needed?
> > >
> > > Well, as Daniel says performance wise we require close to /zero/
> > > additional overhead, especially as you state this happens when sending
> > > a single frame, which is a base case that we must not slowdown.
> > >
> > > --
> > > Best regards,
> > >   Jesper Dangaard Brouer
> > >   MSc.CS, Principal Kernel Engineer at Red Hat
> > >   LinkedIn: http://www.linkedin.com/in/brouer
> > >
> >


* Re: [PATCH v8 bpf-next 00/14] mvneta: introduce XDP multi-buffer support
  2021-04-20 13:49         ` Magnus Karlsson
@ 2021-04-21 12:47           ` Jesper Dangaard Brouer
  2021-04-21 14:12             ` Magnus Karlsson
  0 siblings, 1 reply; 57+ messages in thread
From: Jesper Dangaard Brouer @ 2021-04-21 12:47 UTC (permalink / raw)
  To: Magnus Karlsson
  Cc: Lorenzo Bianconi, Lorenzo Bianconi, bpf, Network Development,
	David S. Miller, Jakub Kicinski, Alexei Starovoitov,
	Daniel Borkmann, shayagr, sameehj, John Fastabend, David Ahern,
	Eelco Chaudron, Jason Wang, Alexander Duyck, Saeed Mahameed,
	Fijalkowski, Maciej, Tirthendu, brouer

On Tue, 20 Apr 2021 15:49:44 +0200
Magnus Karlsson <magnus.karlsson@gmail.com> wrote:

> On Mon, Apr 19, 2021 at 8:56 AM Lorenzo Bianconi
> <lorenzo.bianconi@redhat.com> wrote:
> >  
> > > On Sun, Apr 18, 2021 at 6:18 PM Jesper Dangaard Brouer
> > > <brouer@redhat.com> wrote:  
> > > >
> > > > On Fri, 16 Apr 2021 16:27:18 +0200
> > > > Magnus Karlsson <magnus.karlsson@gmail.com> wrote:
> > > >  
> > > > > On Thu, Apr 8, 2021 at 2:51 PM Lorenzo Bianconi <lorenzo@kernel.org> wrote:  
> > > > > >
> > > > > > This series introduce XDP multi-buffer support. The mvneta driver is
> > > > > > the first to support these new "non-linear" xdp_{buff,frame}. Reviewers
> > > > > > please focus on how these new types of xdp_{buff,frame} packets
> > > > > > traverse the different layers and the layout design. It is on purpose
> > > > > > that BPF-helpers are kept simple, as we don't want to expose the
> > > > > > internal layout to allow later changes.
> > > > > >
> > > > > > For now, to keep the design simple and to maintain performance, the XDP
> > > > > > BPF-prog (still) only have access to the first-buffer. It is left for
> > > > > > later (another patchset) to add payload access across multiple buffers.
> > > > > > This patchset should still allow for these future extensions. The goal
> > > > > > is to lift the XDP MTU restriction that comes with XDP, but maintain
> > > > > > same performance as before.  
> > > > [...]  
> > > > > >
> > > > > > [0] https://netdevconf.info/0x14/session.html?talk-the-path-to-tcp-4k-mtu-and-rx-zerocopy
> > > > > > [1] https://github.com/xdp-project/xdp-project/blob/master/areas/core/xdp-multi-buffer01-design.org
> > > > > > [2] https://netdevconf.info/0x14/session.html?tutorial-add-XDP-support-to-a-NIC-driver (XDPmulti-buffers section)  
> > > > >
> > > > > Took your patches for a test run with the AF_XDP sample xdpsock on an
> > > > > i40e card and the throughput degradation is between 2 to 6% depending
> > > > > on the setup and microbenchmark within xdpsock that is executed. And
> > > > > this is without sending any multi frame packets. Just single frame
> > > > > ones. Tirtha made changes to the i40e driver to support this new
> > > > > interface so that is being included in the measurements.  
> > > >
> > > > Could you please share Tirtha's i40e support patch with me?  
> > >
> > > We will post them on the list as an RFC. Tirtha also added AF_XDP
> > > multi-frame support on top of Lorenzo's patches so we will send that
> > > one out as well. Will also rerun my experiments, properly document
> > > them and send out just to be sure that I did not make any mistake.  
> >
> > ack, very cool, thx  
> 
> I have now run a new set of experiments on a Cascade Lake server at
> 2.1 GHz with turbo boost disabled. Two NICs: i40e and ice. The
> baseline is commit 5c507329000e ("libbpf: Clarify flags in ringbuf
> helpers") and Lorenzo's and Eelco's path set is their v8. First some
> runs with xdpsock (i.e. AF_XDP) in both 2-core mode (app on one core
> and the driver on another) and 1-core mode using busy_poll.
> 
> xdpsock rxdrop throughput change with the multi-buffer patches without
> any driver changes:
> 1-core i40e: -0.5 to 0%   2-cores i40e: -0.5%
> 1-core ice: -2%   2-cores ice: -1 to -0.5%
> 
> xdp_rxq_info -a XDP_DROP
> i40e: -4%   ice: +8%
> 
> xdp_rxq_info -a XDP_TX
> i40e: -10%   ice: +9%
> 
> The XDP results with xdp_rxq_info are just weird! I reran them three
> times, rebuilt and rebooted in between and I always get the same
> results. And I also checked that I am running on the correct NUMA node
> and so on. But I have a hard time believing them. Nearly +10% and -10%
> difference. Too much in my book. Jesper, could you please run the same
> and see what you get? 

We of course have to find the root cause of the +-10%, but let me drill
into what the 10% represents time/cycle wise.  Using a percentage
difference is usually a really good idea as it implies a comparative
measure (something I always request people to do, as a single
performance number means nothing by itself).

For zoom-in benchmarks like these, where the amount of code executed
is very small, the effect of removing or adding code can affect the
measurement a lot.

I can only do the tests for i40e, as I don't have ice hardware (but
Intel is working on fixing that ;-)).

 xdp_rxq_info -a XDP_DROP
  i40e: 33,417,775 pps

 CPU is 100% used, so we can calculate nanosec used per packet:
  29.92 nanosec (1/33417775*10^9)
  2.1 GHz CPU =  approx 63 CPU-cycles

 You lost -4% performance in this case.  This corresponds to:
  -1.2 nanosec (29.92*0.04) slower
  (This could be cost of single func call overhead = 1.3 ns)
  
My measurement for XDP_TX:

 xdp_rxq_info -a XDP_TX
  28,278,722 pps
  35.36 ns (1/28278722*10^9)

 You lost -10% performance in this case:
  -3.54 nanosec (35.36*0.10) slower

In XDP context 3.54 nanosec is a lot, as you can see it is 10% in this
zoom-in benchmark.  We have to look at the details.
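
The arithmetic above can be sketched in a few lines (a small helper I am
adding for illustration; it only reproduces the pps-to-nanoseconds and
pps-to-cycles conversion used in this discussion, assuming one fully
busy core at a fixed clock):

```python
# Convert a packets-per-second measurement into a per-packet time
# budget and an approximate CPU-cycle count, assuming one 100%-busy
# core with turbo boost disabled (as in the measurements above).

def per_packet_cost(pps: float, cpu_ghz: float = 2.1) -> tuple[float, float]:
    """Return (nanoseconds per packet, CPU cycles per packet)."""
    ns = 1.0 / pps * 1e9   # seconds/packet -> nanoseconds/packet
    cycles = ns * cpu_ghz  # GHz is cycles per nanosecond
    return ns, cycles

# XDP_DROP on i40e: 33,417,775 pps -> ~29.92 ns, ~63 cycles per packet
ns_drop, cyc_drop = per_packet_cost(33_417_775)

# A -4% regression on that baseline costs roughly 29.92 * 0.04 ~= 1.2 ns,
# which is in the same ballpark as a single function call (~1.3 ns).
regression_ns = ns_drop * 0.04
```

The same helper gives ~35.36 ns/packet for the 28,278,722 pps XDP_TX
case, so a -10% regression there is ~3.54 ns per packet.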

One detail/issue with i40e doing XDP_TX is that I cannot verify that
packets are actually transmitted... not via the exception tracepoint, not
via netstats, not via ethtool_stats.pl.  Maybe all the packets are
getting (silently) dropped in my tests...!?!


> The xdpsock numbers are more in the ballpark of
> what I would expect.
>
> Tirtha and I found some optimizations in the i40e
> multi-frame/multi-buffer support that we have implemented. Will test
> those next, post the results and share the code.
> 
> > >
> > > Just note that I would really like for the multi-frame support to get
> > > in. I have lost count on how many people that have asked for it to be
> > > added to XDP and AF_XDP. So please check our implementation and
> > > improve it so we can get the overhead down to where we want it to be.  
> >
> > sure, I will do.
> >
> > Regards,
> > Lorenzo
> >  
> > >
> > > Thanks: Magnus
> > >  
> > > > I would like to reproduce these results in my testlab, in-order to
> > > > figure out where the throughput degradation comes from.
> > > >  
> > > > > What performance do you see with the mvneta card? How much are we
> > > > > willing to pay for this feature when it is not being used or can we in
> > > > > some way selectively turn it on only when needed?  
> > > >
> > > > Well, as Daniel says performance wise we require close to /zero/
> > > > additional overhead, especially as you state this happens when sending
> > > > a single frame, which is a base case that we must not slowdown.
> > > >
> > > > --
> > > > Best regards,
> > > >   Jesper Dangaard Brouer

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer


Running XDP on dev:i40e2 (ifindex:6) action:XDP_DROP options:read
XDP stats       CPU     pps         issue-pps  
XDP-RX CPU      2       33,417,775  0          
XDP-RX CPU      total   33,417,775 

RXQ stats       RXQ:CPU pps         issue-pps  
rx_queue_index    2:2   33,417,775  0          
rx_queue_index    2:sum 33,417,775 


Running XDP on dev:i40e2 (ifindex:6) action:XDP_TX options:swapmac
XDP stats       CPU     pps         issue-pps  
XDP-RX CPU      2       28,278,722  0          
XDP-RX CPU      total   28,278,722 

RXQ stats       RXQ:CPU pps         issue-pps  
rx_queue_index    2:2   28,278,726  0          
rx_queue_index    2:sum 28,278,726 





* Re: [PATCH v8 bpf-next 00/14] mvneta: introduce XDP multi-buffer support
  2021-04-21 12:47           ` Jesper Dangaard Brouer
@ 2021-04-21 14:12             ` Magnus Karlsson
  2021-04-21 15:39               ` Jesper Dangaard Brouer
  0 siblings, 1 reply; 57+ messages in thread
From: Magnus Karlsson @ 2021-04-21 14:12 UTC (permalink / raw)
  To: Jesper Dangaard Brouer
  Cc: Lorenzo Bianconi, Lorenzo Bianconi, bpf, Network Development,
	David S. Miller, Jakub Kicinski, Alexei Starovoitov,
	Daniel Borkmann, shayagr, sameehj, John Fastabend, David Ahern,
	Eelco Chaudron, Jason Wang, Alexander Duyck, Saeed Mahameed,
	Fijalkowski, Maciej, Tirthendu

On Wed, Apr 21, 2021 at 2:48 PM Jesper Dangaard Brouer
<brouer@redhat.com> wrote:
>
> On Tue, 20 Apr 2021 15:49:44 +0200
> Magnus Karlsson <magnus.karlsson@gmail.com> wrote:
>
> > On Mon, Apr 19, 2021 at 8:56 AM Lorenzo Bianconi
> > <lorenzo.bianconi@redhat.com> wrote:
> > >
> > > > On Sun, Apr 18, 2021 at 6:18 PM Jesper Dangaard Brouer
> > > > <brouer@redhat.com> wrote:
> > > > >
> > > > > On Fri, 16 Apr 2021 16:27:18 +0200
> > > > > Magnus Karlsson <magnus.karlsson@gmail.com> wrote:
> > > > >
> > > > > > On Thu, Apr 8, 2021 at 2:51 PM Lorenzo Bianconi <lorenzo@kernel.org> wrote:
> > > > > > >
> > > > > > > This series introduce XDP multi-buffer support. The mvneta driver is
> > > > > > > the first to support these new "non-linear" xdp_{buff,frame}. Reviewers
> > > > > > > please focus on how these new types of xdp_{buff,frame} packets
> > > > > > > traverse the different layers and the layout design. It is on purpose
> > > > > > > that BPF-helpers are kept simple, as we don't want to expose the
> > > > > > > internal layout to allow later changes.
> > > > > > >
> > > > > > > For now, to keep the design simple and to maintain performance, the XDP
> > > > > > > BPF-prog (still) only have access to the first-buffer. It is left for
> > > > > > > later (another patchset) to add payload access across multiple buffers.
> > > > > > > This patchset should still allow for these future extensions. The goal
> > > > > > > is to lift the XDP MTU restriction that comes with XDP, but maintain
> > > > > > > same performance as before.
> > > > > [...]
> > > > > > >
> > > > > > > [0] https://netdevconf.info/0x14/session.html?talk-the-path-to-tcp-4k-mtu-and-rx-zerocopy
> > > > > > > [1] https://github.com/xdp-project/xdp-project/blob/master/areas/core/xdp-multi-buffer01-design.org
> > > > > > > [2] https://netdevconf.info/0x14/session.html?tutorial-add-XDP-support-to-a-NIC-driver (XDPmulti-buffers section)
> > > > > >
> > > > > > Took your patches for a test run with the AF_XDP sample xdpsock on an
> > > > > > i40e card and the throughput degradation is between 2 to 6% depending
> > > > > > on the setup and microbenchmark within xdpsock that is executed. And
> > > > > > this is without sending any multi frame packets. Just single frame
> > > > > > ones. Tirtha made changes to the i40e driver to support this new
> > > > > > interface so that is being included in the measurements.
> > > > >
> > > > > Could you please share Tirtha's i40e support patch with me?
> > > >
> > > > We will post them on the list as an RFC. Tirtha also added AF_XDP
> > > > multi-frame support on top of Lorenzo's patches so we will send that
> > > > one out as well. Will also rerun my experiments, properly document
> > > > them and send out just to be sure that I did not make any mistake.
> > >
> > > ack, very cool, thx
> >
> > I have now run a new set of experiments on a Cascade Lake server at
> > 2.1 GHz with turbo boost disabled. Two NICs: i40e and ice. The
> > baseline is commit 5c507329000e ("libbpf: Clarify flags in ringbuf
> > helpers") and Lorenzo's and Eelco's path set is their v8. First some
> > runs with xdpsock (i.e. AF_XDP) in both 2-core mode (app on one core
> > and the driver on another) and 1-core mode using busy_poll.
> >
> > xdpsock rxdrop throughput change with the multi-buffer patches without
> > any driver changes:
> > 1-core i40e: -0.5 to 0%   2-cores i40e: -0.5%
> > 1-core ice: -2%   2-cores ice: -1 to -0.5%
> >
> > xdp_rxq_info -a XDP_DROP
> > i40e: -4%   ice: +8%
> >
> > xdp_rxq_info -a XDP_TX
> > i40e: -10%   ice: +9%
> >
> > The XDP results with xdp_rxq_info are just weird! I reran them three
> > times, rebuilt and rebooted in between and I always get the same
> > results. And I also checked that I am running on the correct NUMA node
> > and so on. But I have a hard time believing them. Nearly +10% and -10%
> > difference. Too much in my book. Jesper, could you please run the same
> > and see what you get?
>
> We of-cause have to find the root-cause of the +-10%, but let me drill
> into what the 10% represent time/cycle wise.  Using a percentage
> difference is usually a really good idea as it implies a comparative
> measure (something I always request people to do, as a single
> performance number means nothing by itself).
>
> For a zoom-in-benchmarks like these where the amount of code executed
> is very small, the effect of removing or adding code can effect the
> measurement a lot.
>
> I can only do the tests for i40e, as I don't have ice hardware (but
> Intel is working on fixing that ;-)).
>
>  xdp_rxq_info -a XDP_DROP
>   i40e: 33,417,775 pps

Here I only get around 21 Mpps

>  CPU is 100% used, so we can calculate nanosec used per packet:
>   29.92 nanosec (1/33417775*10^9)
>   2.1 GHz CPU =  approx 63 CPU-cycles
>
>  You lost -4% performance in this case.  This correspond to:
>   -1.2 nanosec (29.92*0.04) slower
>   (This could be cost of single func call overhead = 1.3 ns)
>
> My measurement for XDP_TX:
>
>  xdp_rxq_info -a XDP_TX
>   28,278,722 pps
>   35.36 ns (1/28278722*10^9)

And here, much lower, at around 8 Mpps. But I do see correct packets
coming back on the cable for i40e but not for ice! There is likely a
bug in the XDP_TX logic for ice. That might explain the weird results
I am getting. Will investigate.

But why do I get only a fraction of your performance? XDP_TX touches
the packet so I would expect it to be far less than what you get, but
more than I get. What CPU core do you run on? It actually looks like
your packet data gets prefetched successfully. If it had not, you
would have gotten an access to LLC which is much more expensive than
the drop you are seeing. If I run on the wrong NUMA node, I get 4
Mpps, so it is not that.

One interesting thing is that I get better results using the zero-copy
path in the driver. I start xdp_rxq_info in XDP_DROP mode, then tie an
AF_XDP socket to the queue id the XDP program gets its traffic from. The
AF_XDP program will get no traffic in this case, but it will force the
driver to use the zero-copy path for its XDP processing. In this case I
get this:

-0.5% for XDP_DROP and +-0% for XDP_TX for i40e.
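
For reference, the procedure described above could look roughly like
this with the in-tree samples/bpf tools; the interface name and queue
id below are made up for illustration, and the exact option spelling
depends on the kernel version in use:

```shell
# Load the XDP benchmark program on the interface (samples/bpf).
./xdp_rxq_info --dev ens3f0 --action XDP_DROP &

# Bind an otherwise-idle AF_XDP rxdrop socket in zero-copy mode to the
# same queue the test traffic hits. It receives nothing (the XDP
# program drops everything first), but its presence switches the
# driver's XDP processing for that queue onto the zero-copy path.
./xdpsock -i ens3f0 -q 2 -r -z
```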

>  You lost -10% performance in this case:
>   -3.54 nanosec (35.36*0.10) slower
>
> In XDP context 3.54 nanosec is a lot, as you can see it is 10% in this
> zoom-in benchmark.  We have to look at the details.
>
> One detail/issue with i40e doing XDP_TX, is that I cannot verify that
> packets are actually transmitted... not via exception tracepoint, not
> via netstats, not via ethtool_stats.pl.  Maybe all the packets are
> getting (silently) drop in my tests...!?!
>
>
> > The xdpsock numbers are more in the ballpark of
> > what I would expect.
> >
> > Tirtha and I found some optimizations in the i40e
> > multi-frame/multi-buffer support that we have implemented. Will test
> > those next, post the results and share the code.
> >
> > > >
> > > > Just note that I would really like for the multi-frame support to get
> > > > in. I have lost count on how many people that have asked for it to be
> > > > added to XDP and AF_XDP. So please check our implementation and
> > > > improve it so we can get the overhead down to where we want it to be.
> > >
> > > sure, I will do.
> > >
> > > Regards,
> > > Lorenzo
> > >
> > > >
> > > > Thanks: Magnus
> > > >
> > > > > I would like to reproduce these results in my testlab, in-order to
> > > > > figure out where the throughput degradation comes from.
> > > > >
> > > > > > What performance do you see with the mvneta card? How much are we
> > > > > > willing to pay for this feature when it is not being used or can we in
> > > > > > some way selectively turn it on only when needed?
> > > > >
> > > > > Well, as Daniel says performance wise we require close to /zero/
> > > > > additional overhead, especially as you state this happens when sending
> > > > > a single frame, which is a base case that we must not slowdown.
> > > > >
> > > > > --
> > > > > Best regards,
> > > > >   Jesper Dangaard Brouer
>
> --
> Best regards,
>   Jesper Dangaard Brouer
>   MSc.CS, Principal Kernel Engineer at Red Hat
>   LinkedIn: http://www.linkedin.com/in/brouer
>
>
> Running XDP on dev:i40e2 (ifindex:6) action:XDP_DROP options:read
> XDP stats       CPU     pps         issue-pps
> XDP-RX CPU      2       33,417,775  0
> XDP-RX CPU      total   33,417,775
>
> RXQ stats       RXQ:CPU pps         issue-pps
> rx_queue_index    2:2   33,417,775  0
> rx_queue_index    2:sum 33,417,775
>
>
> Running XDP on dev:i40e2 (ifindex:6) action:XDP_TX options:swapmac
> XDP stats       CPU     pps         issue-pps
> XDP-RX CPU      2       28,278,722  0
> XDP-RX CPU      total   28,278,722
>
> RXQ stats       RXQ:CPU pps         issue-pps
> rx_queue_index    2:2   28,278,726  0
> rx_queue_index    2:sum 28,278,726
>
>
>


* Re: [PATCH v8 bpf-next 00/14] mvneta: introduce XDP multi-buffer support
  2021-04-21 14:12             ` Magnus Karlsson
@ 2021-04-21 15:39               ` Jesper Dangaard Brouer
  2021-04-22 10:24                 ` Magnus Karlsson
  0 siblings, 1 reply; 57+ messages in thread
From: Jesper Dangaard Brouer @ 2021-04-21 15:39 UTC (permalink / raw)
  To: Magnus Karlsson
  Cc: Lorenzo Bianconi, Lorenzo Bianconi, bpf, Network Development,
	David S. Miller, Jakub Kicinski, Alexei Starovoitov,
	Daniel Borkmann, shayagr, sameehj, John Fastabend, David Ahern,
	Eelco Chaudron, Jason Wang, Alexander Duyck, Saeed Mahameed,
	Fijalkowski, Maciej, Tirthendu, brouer

On Wed, 21 Apr 2021 16:12:32 +0200
Magnus Karlsson <magnus.karlsson@gmail.com> wrote:

> On Wed, Apr 21, 2021 at 2:48 PM Jesper Dangaard Brouer
> <brouer@redhat.com> wrote:
> >
> > On Tue, 20 Apr 2021 15:49:44 +0200
> > Magnus Karlsson <magnus.karlsson@gmail.com> wrote:
> >  
> > > On Mon, Apr 19, 2021 at 8:56 AM Lorenzo Bianconi
> > > <lorenzo.bianconi@redhat.com> wrote:  
> > > >  
> > > > > On Sun, Apr 18, 2021 at 6:18 PM Jesper Dangaard Brouer
> > > > > <brouer@redhat.com> wrote:  
> > > > > >
> > > > > > On Fri, 16 Apr 2021 16:27:18 +0200
> > > > > > Magnus Karlsson <magnus.karlsson@gmail.com> wrote:
> > > > > >  
> > > > > > > On Thu, Apr 8, 2021 at 2:51 PM Lorenzo Bianconi <lorenzo@kernel.org> wrote:  
> > > > > > > >
> > > > > > > > This series introduce XDP multi-buffer support. The mvneta driver is
> > > > > > > > the first to support these new "non-linear" xdp_{buff,frame}. Reviewers
> > > > > > > > please focus on how these new types of xdp_{buff,frame} packets
> > > > > > > > traverse the different layers and the layout design. It is on purpose
> > > > > > > > that BPF-helpers are kept simple, as we don't want to expose the
> > > > > > > > internal layout to allow later changes.
> > > > > > > >
> > > > > > > > For now, to keep the design simple and to maintain performance, the XDP
> > > > > > > > BPF-prog (still) only have access to the first-buffer. It is left for
> > > > > > > > later (another patchset) to add payload access across multiple buffers.
> > > > > > > > This patchset should still allow for these future extensions. The goal
> > > > > > > > is to lift the XDP MTU restriction that comes with XDP, but maintain
> > > > > > > > same performance as before.  
> > > > > > [...]  
> > > > > > > >
> > > > > > > > [0] https://netdevconf.info/0x14/session.html?talk-the-path-to-tcp-4k-mtu-and-rx-zerocopy
> > > > > > > > [1] https://github.com/xdp-project/xdp-project/blob/master/areas/core/xdp-multi-buffer01-design.org
> > > > > > > > [2] https://netdevconf.info/0x14/session.html?tutorial-add-XDP-support-to-a-NIC-driver (XDPmulti-buffers section)  
> > > > > > >
> > > > > > > Took your patches for a test run with the AF_XDP sample xdpsock on an
> > > > > > > i40e card and the throughput degradation is between 2 to 6% depending
> > > > > > > on the setup and microbenchmark within xdpsock that is executed. And
> > > > > > > this is without sending any multi frame packets. Just single frame
> > > > > > > ones. Tirtha made changes to the i40e driver to support this new
> > > > > > > interface so that is being included in the measurements.  
> > > > > >
> > > > > > Could you please share Tirtha's i40e support patch with me?  
> > > > >
> > > > > We will post them on the list as an RFC. Tirtha also added AF_XDP
> > > > > multi-frame support on top of Lorenzo's patches so we will send that
> > > > > one out as well. Will also rerun my experiments, properly document
> > > > > them and send out just to be sure that I did not make any mistake.  
> > > >
> > > > ack, very cool, thx  
> > >
> > > I have now run a new set of experiments on a Cascade Lake server at
> > > 2.1 GHz with turbo boost disabled. Two NICs: i40e and ice. The
> > > baseline is commit 5c507329000e ("libbpf: Clarify flags in ringbuf
> > > helpers") and Lorenzo's and Eelco's patch set is their v8. First some
> > > runs with xdpsock (i.e. AF_XDP) in both 2-core mode (app on one core
> > > and the driver on another) and 1-core mode using busy_poll.
> > >
> > > xdpsock rxdrop throughput change with the multi-buffer patches without
> > > any driver changes:
> > > 1-core i40e: -0.5 to 0%   2-cores i40e: -0.5%
> > > 1-core ice: -2%   2-cores ice: -1 to -0.5%
> > >
> > > xdp_rxq_info -a XDP_DROP
> > > i40e: -4%   ice: +8%
> > >
> > > xdp_rxq_info -a XDP_TX
> > > i40e: -10%   ice: +9%
> > >
> > > The XDP results with xdp_rxq_info are just weird! I reran them three
> > > times, rebuilt and rebooted in between and I always get the same
> > > results. And I also checked that I am running on the correct NUMA node
> > > and so on. But I have a hard time believing them. Nearly +10% and -10%
> > > difference. Too much in my book. Jesper, could you please run the same
> > > and see what you get?  
> >
> > We of course have to find the root-cause of the +-10%, but let me drill
> > into what the 10% represents time/cycle wise.  Using a percentage
> > difference is usually a really good idea as it implies a comparative
> > measure (something I always request people to do, as a single
> > performance number means nothing by itself).
> >
> > For zoom-in benchmarks like these, where the amount of code executed
> > is very small, the effect of removing or adding code can affect the
> > measurement a lot.
> >
> > I can only do the tests for i40e, as I don't have ice hardware (but
> > Intel is working on fixing that ;-)).
> >
> >  xdp_rxq_info -a XDP_DROP
> >   i40e: 33,417,775 pps  
> 
> Here I only get around 21 Mpps
> 
> >  CPU is 100% used, so we can calculate nanosec used per packet:
> >   29.92 nanosec (1/33417775*10^9)
> >   2.1 GHz CPU =  approx 63 CPU-cycles
> >
> >  You lost -4% performance in this case.  This correspond to:
> >   -1.2 nanosec (29.92*0.04) slower
> >   (This could be cost of single func call overhead = 1.3 ns)
> >
> > My measurement for XDP_TX:
> >
> >  xdp_rxq_info -a XDP_TX
> >   28,278,722 pps
> >   35.36 ns (1/28278722*10^9)  
> 
> And here, much lower at around 8 Mpps. But I do see correct packets
> coming back on the cable for i40e but not for ice! There is likely a
> bug there in the XDP_TX logic for ice. Might explain the weird results
> I am getting. Will investigate.
> 
> But why do I get only a fraction of your performance? XDP_TX touches
> the packet so I would expect it to be far less than what you get, but
> more than I get. 

I clearly have a bug in the i40e driver.  As I wrote later, I don't see
any packets transmitted for XDP_TX.  Hmm, I am using Mel Gorman's tree,
which doesn't contain the i40e/ice/ixgbe bug we fixed earlier.

The call to xdp_convert_buff_to_frame() fails, but (see below) that
error is simply converted to I40E_XDP_CONSUMED.  Thus, not even the
'trace_xdp_exception' tracepoint will be able to troubleshoot this.
You/Intel should consider making XDP_TX errors detectable (this will
also happen if the TX ring doesn't have room).

 int i40e_xmit_xdp_tx_ring(struct xdp_buff *xdp, struct i40e_ring *xdp_ring)
 {
	struct xdp_frame *xdpf = xdp_convert_buff_to_frame(xdp);

	if (unlikely(!xdpf))
		return I40E_XDP_CONSUMED;

	return i40e_xmit_xdp_ring(xdpf, xdp_ring);
 }


> What CPU core do you run on? 

Intel(R) Xeon(R) CPU E5-1650 v4 @ 3.60GHz

> It actually looks like
> your packet data gets prefetched successfully. If it had not, you
> would have gotten an access to LLC which is much more expensive than
> the drop you are seeing. If I run on the wrong NUMA node, I get 4
> Mpps, so it is not that.
> 
> One interesting thing is that I get better results using the zero-copy
> path in the driver. I start xdp_rxq_drop then tie an AF_XDP socket to
> the queue id the XDP program gets its traffic from. The AF_XDP program
> will get no traffic in this case, but it will force the driver to use
> the zero-copy path for its XDP processing. In this case I get this:
> 
> -0.5% for XDP_DROP and +-0% for XDP_TX for i40e.
> 
> >  You lost -10% performance in this case:
> >   -3.54 nanosec (35.36*0.10) slower
> >
> > In XDP context 3.54 nanosec is a lot, as you can see it is 10% in this
> > zoom-in benchmark.  We have to look at the details.
> >
> > One detail/issue with i40e doing XDP_TX, is that I cannot verify that
> > packets are actually transmitted... not via exception tracepoint, not
> > via netstats, not via ethtool_stats.pl.  Maybe all the packets are
> > getting (silently) drop in my tests...!?!
> >
> >  
> > > The xdpsock numbers are more in the ballpark of
> > > what I would expect.
> > >
> > > Tirtha and I found some optimizations in the i40e
> > > multi-frame/multi-buffer support that we have implemented. Will test
> > > those next, post the results and share the code.
> > >  
> > > > >
> > > > > Just note that I would really like for the multi-frame support to get
> > > > > in. I have lost count on how many people that have asked for it to be
> > > > > added to XDP and AF_XDP. So please check our implementation and
> > > > > improve it so we can get the overhead down to where we want it to be.  
> > > >
> > > > sure, I will do.
> > > >
> > > > Regards,
> > > > Lorenzo
> > > >  
> > > > >
> > > > > Thanks: Magnus
> > > > >  
> > > > > > I would like to reproduce these results in my testlab, in-order to
> > > > > > figure out where the throughput degradation comes from.
> > > > > >  
> > > > > > > What performance do you see with the mvneta card? How much are we
> > > > > > > willing to pay for this feature when it is not being used or can we in
> > > > > > > some way selectively turn it on only when needed?  
> > > > > >
> > > > > > Well, as Daniel says performance wise we require close to /zero/
> > > > > > additional overhead, especially as you state this happens when sending
> > > > > > a single frame, which is a base case that we must not slowdown.
> > > > > >
> > > > > > --
> > > > > > Best regards,
> > > > > >   Jesper Dangaard Brouer  
> >
> > --
> > Best regards,
> >   Jesper Dangaard Brouer
> >   MSc.CS, Principal Kernel Engineer at Red Hat
> >   LinkedIn: http://www.linkedin.com/in/brouer
> >
> >
> > Running XDP on dev:i40e2 (ifindex:6) action:XDP_DROP options:read
> > XDP stats       CPU     pps         issue-pps
> > XDP-RX CPU      2       33,417,775  0
> > XDP-RX CPU      total   33,417,775
> >
> > RXQ stats       RXQ:CPU pps         issue-pps
> > rx_queue_index    2:2   33,417,775  0
> > rx_queue_index    2:sum 33,417,775
> >
> >
> > Running XDP on dev:i40e2 (ifindex:6) action:XDP_TX options:swapmac
> > XDP stats       CPU     pps         issue-pps
> > XDP-RX CPU      2       28,278,722  0
> > XDP-RX CPU      total   28,278,722
> >
> > RXQ stats       RXQ:CPU pps         issue-pps
> > rx_queue_index    2:2   28,278,726  0
> > rx_queue_index    2:sum 28,278,726
> >
> >
> >  
> 



-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer



* Re: [PATCH v8 bpf-next 00/14] mvneta: introduce XDP multi-buffer support
  2021-04-21 15:39               ` Jesper Dangaard Brouer
@ 2021-04-22 10:24                 ` Magnus Karlsson
  2021-04-22 14:42                   ` Jesper Dangaard Brouer
  0 siblings, 1 reply; 57+ messages in thread
From: Magnus Karlsson @ 2021-04-22 10:24 UTC (permalink / raw)
  To: Jesper Dangaard Brouer
  Cc: Lorenzo Bianconi, Lorenzo Bianconi, bpf, Network Development,
	David S. Miller, Jakub Kicinski, Alexei Starovoitov,
	Daniel Borkmann, shayagr, sameehj, John Fastabend, David Ahern,
	Eelco Chaudron, Jason Wang, Alexander Duyck, Saeed Mahameed,
	Fijalkowski, Maciej, Tirthendu

On Wed, Apr 21, 2021 at 5:39 PM Jesper Dangaard Brouer
<brouer@redhat.com> wrote:
>
> On Wed, 21 Apr 2021 16:12:32 +0200
> Magnus Karlsson <magnus.karlsson@gmail.com> wrote:
>
> > On Wed, Apr 21, 2021 at 2:48 PM Jesper Dangaard Brouer
> > <brouer@redhat.com> wrote:
> > >
> > > On Tue, 20 Apr 2021 15:49:44 +0200
> > > Magnus Karlsson <magnus.karlsson@gmail.com> wrote:
> > >
> > > > On Mon, Apr 19, 2021 at 8:56 AM Lorenzo Bianconi
> > > > <lorenzo.bianconi@redhat.com> wrote:
> > > > >
> > > > > > On Sun, Apr 18, 2021 at 6:18 PM Jesper Dangaard Brouer
> > > > > > <brouer@redhat.com> wrote:
> > > > > > >
> > > > > > > On Fri, 16 Apr 2021 16:27:18 +0200
> > > > > > > Magnus Karlsson <magnus.karlsson@gmail.com> wrote:
> > > > > > >
> > > > > > > > On Thu, Apr 8, 2021 at 2:51 PM Lorenzo Bianconi <lorenzo@kernel.org> wrote:
> > > > > > > > >
> > > > > > > > > This series introduce XDP multi-buffer support. The mvneta driver is
> > > > > > > > > the first to support these new "non-linear" xdp_{buff,frame}. Reviewers
> > > > > > > > > please focus on how these new types of xdp_{buff,frame} packets
> > > > > > > > > traverse the different layers and the layout design. It is on purpose
> > > > > > > > > that BPF-helpers are kept simple, as we don't want to expose the
> > > > > > > > > internal layout to allow later changes.
> > > > > > > > >
> > > > > > > > > For now, to keep the design simple and to maintain performance, the XDP
> > > > > > > > > BPF-prog (still) only have access to the first-buffer. It is left for
> > > > > > > > > later (another patchset) to add payload access across multiple buffers.
> > > > > > > > > This patchset should still allow for these future extensions. The goal
> > > > > > > > > is to lift the XDP MTU restriction that comes with XDP, but maintain
> > > > > > > > > same performance as before.
> > > > > > > [...]
> > > > > > > > >
> > > > > > > > > [0] https://netdevconf.info/0x14/session.html?talk-the-path-to-tcp-4k-mtu-and-rx-zerocopy
> > > > > > > > > [1] https://github.com/xdp-project/xdp-project/blob/master/areas/core/xdp-multi-buffer01-design.org
> > > > > > > > > [2] https://netdevconf.info/0x14/session.html?tutorial-add-XDP-support-to-a-NIC-driver (XDPmulti-buffers section)
> > > > > > > >
> > > > > > > > Took your patches for a test run with the AF_XDP sample xdpsock on an
> > > > > > > > i40e card and the throughput degradation is between 2 to 6% depending
> > > > > > > > on the setup and microbenchmark within xdpsock that is executed. And
> > > > > > > > this is without sending any multi frame packets. Just single frame
> > > > > > > > ones. Tirtha made changes to the i40e driver to support this new
> > > > > > > > interface so that is being included in the measurements.
> > > > > > >
> > > > > > > Could you please share Tirtha's i40e support patch with me?
> > > > > >
> > > > > > We will post them on the list as an RFC. Tirtha also added AF_XDP
> > > > > > multi-frame support on top of Lorenzo's patches so we will send that
> > > > > > one out as well. Will also rerun my experiments, properly document
> > > > > > them and send out just to be sure that I did not make any mistake.
> > > > >
> > > > > ack, very cool, thx
> > > >
> > > > I have now run a new set of experiments on a Cascade Lake server at
> > > > 2.1 GHz with turbo boost disabled. Two NICs: i40e and ice. The
> > > > baseline is commit 5c507329000e ("libbpf: Clarify flags in ringbuf
> > > > helpers") and Lorenzo's and Eelco's patch set is their v8. First some
> > > > runs with xdpsock (i.e. AF_XDP) in both 2-core mode (app on one core
> > > > and the driver on another) and 1-core mode using busy_poll.
> > > >
> > > > xdpsock rxdrop throughput change with the multi-buffer patches without
> > > > any driver changes:
> > > > 1-core i40e: -0.5 to 0%   2-cores i40e: -0.5%
> > > > 1-core ice: -2%   2-cores ice: -1 to -0.5%
> > > >
> > > > xdp_rxq_info -a XDP_DROP
> > > > i40e: -4%   ice: +8%
> > > >
> > > > xdp_rxq_info -a XDP_TX
> > > > i40e: -10%   ice: +9%
> > > >
> > > > The XDP results with xdp_rxq_info are just weird! I reran them three
> > > > times, rebuilt and rebooted in between and I always get the same
> > > > results. And I also checked that I am running on the correct NUMA node
> > > > and so on. But I have a hard time believing them. Nearly +10% and -10%
> > > > difference. Too much in my book. Jesper, could you please run the same
> > > > and see what you get?
> > >
> > > We of course have to find the root-cause of the +-10%, but let me drill
> > > into what the 10% represents time/cycle wise.  Using a percentage
> > > difference is usually a really good idea as it implies a comparative
> > > measure (something I always request people to do, as a single
> > > performance number means nothing by itself).
> > >
> > > For zoom-in benchmarks like these, where the amount of code executed
> > > is very small, the effect of removing or adding code can affect the
> > > measurement a lot.
> > >
> > > I can only do the tests for i40e, as I don't have ice hardware (but
> > > Intel is working on fixing that ;-)).
> > >
> > >  xdp_rxq_info -a XDP_DROP
> > >   i40e: 33,417,775 pps
> >
> > Here I only get around 21 Mpps
> >
> > >  CPU is 100% used, so we can calculate nanosec used per packet:
> > >   29.92 nanosec (1/33417775*10^9)
> > >   2.1 GHz CPU =  approx 63 CPU-cycles
> > >
> > >  You lost -4% performance in this case.  This correspond to:
> > >   -1.2 nanosec (29.92*0.04) slower
> > >   (This could be cost of single func call overhead = 1.3 ns)
> > >
> > > My measurement for XDP_TX:
> > >
> > >  xdp_rxq_info -a XDP_TX
> > >   28,278,722 pps
> > >   35.36 ns (1/28278722*10^9)
> >
> > And here, much lower at around 8 Mpps. But I do see correct packets
> > coming back on the cable for i40e but not for ice! There is likely a
> > bug there in the XDP_TX logic for ice. Might explain the weird results
> > I am getting. Will investigate.
> >
> > But why do I get only a fraction of your performance? XDP_TX touches
> > the packet so I would expect it to be far less than what you get, but
> > more than I get.
>
> I clearly have a bug in the i40e driver.  As I wrote later, I don't see
> any packets transmitted for XDP_TX.  Hmm, I am using Mel Gorman's tree,
> which doesn't contain the i40e/ice/ixgbe bug we fixed earlier.
>
> The call to xdp_convert_buff_to_frame() fails, but (see below) that
> error is simply converted to I40E_XDP_CONSUMED.  Thus, not even the
> 'trace_xdp_exception' will be able to troubleshoot this.  You/Intel
> should consider making XDP_TX errors detectable (this will also happen
> if TX ring don't have room).

This is not good. Will submit a fix. Thanks for reporting, Jesper.

>  int i40e_xmit_xdp_tx_ring(struct xdp_buff *xdp, struct i40e_ring *xdp_ring)
>  {
>         struct xdp_frame *xdpf = xdp_convert_buff_to_frame(xdp);
>
>         if (unlikely(!xdpf))
>                 return I40E_XDP_CONSUMED;
>
>         return i40e_xmit_xdp_ring(xdpf, xdp_ring);
>  }
>
>
> > What CPU core do you run on?
>
> Intel(R) Xeon(R) CPU E5-1650 v4 @ 3.60GHz

So significantly higher clocked than my system, which explains your high numbers.

> > It actually looks like
> > your packet data gets prefetched successfully. If it had not, you
> > would have gotten an access to LLC which is much more expensive than
> > the drop you are seeing. If I run on the wrong NUMA node, I get 4
> > Mpps, so it is not that.
> >
> > One interesting thing is that I get better results using the zero-copy
> > path in the driver. I start xdp_rxq_drop then tie an AF_XDP socket to
> > the queue id the XDP program gets its traffic from. The AF_XDP program
> > will get no traffic in this case, but it will force the driver to use
> > the zero-copy path for its XDP processing. In this case I get this:
> >
> > -0.5% for XDP_DROP and +-0% for XDP_TX for i40e.
> >
> > >  You lost -10% performance in this case:
> > >   -3.54 nanosec (35.36*0.10) slower
> > >
> > > In XDP context 3.54 nanosec is a lot, as you can see it is 10% in this
> > > zoom-in benchmark.  We have to look at the details.
> > >
> > > One detail/issue with i40e doing XDP_TX, is that I cannot verify that
> > > packets are actually transmitted... not via exception tracepoint, not
> > > via netstats, not via ethtool_stats.pl.  Maybe all the packets are
> > > getting (silently) drop in my tests...!?!
> > >
> > >
> > > > The xdpsock numbers are more in the ballpark of
> > > > what I would expect.
> > > >
> > > > Tirtha and I found some optimizations in the i40e
> > > > multi-frame/multi-buffer support that we have implemented. Will test
> > > > those next, post the results and share the code.
> > > >
> > > > > >
> > > > > > Just note that I would really like for the multi-frame support to get
> > > > > > in. I have lost count on how many people that have asked for it to be
> > > > > > added to XDP and AF_XDP. So please check our implementation and
> > > > > > improve it so we can get the overhead down to where we want it to be.
> > > > >
> > > > > sure, I will do.
> > > > >
> > > > > Regards,
> > > > > Lorenzo
> > > > >
> > > > > >
> > > > > > Thanks: Magnus
> > > > > >
> > > > > > > I would like to reproduce these results in my testlab, in-order to
> > > > > > > figure out where the throughput degradation comes from.
> > > > > > >
> > > > > > > > What performance do you see with the mvneta card? How much are we
> > > > > > > > willing to pay for this feature when it is not being used or can we in
> > > > > > > > some way selectively turn it on only when needed?
> > > > > > >
> > > > > > > Well, as Daniel says performance wise we require close to /zero/
> > > > > > > additional overhead, especially as you state this happens when sending
> > > > > > > a single frame, which is a base case that we must not slowdown.
> > > > > > >
> > > > > > > --
> > > > > > > Best regards,
> > > > > > >   Jesper Dangaard Brouer
> > >
> > > --
> > > Best regards,
> > >   Jesper Dangaard Brouer
> > >   MSc.CS, Principal Kernel Engineer at Red Hat
> > >   LinkedIn: http://www.linkedin.com/in/brouer
> > >
> > >
> > > Running XDP on dev:i40e2 (ifindex:6) action:XDP_DROP options:read
> > > XDP stats       CPU     pps         issue-pps
> > > XDP-RX CPU      2       33,417,775  0
> > > XDP-RX CPU      total   33,417,775
> > >
> > > RXQ stats       RXQ:CPU pps         issue-pps
> > > rx_queue_index    2:2   33,417,775  0
> > > rx_queue_index    2:sum 33,417,775
> > >
> > >
> > > Running XDP on dev:i40e2 (ifindex:6) action:XDP_TX options:swapmac
> > > XDP stats       CPU     pps         issue-pps
> > > XDP-RX CPU      2       28,278,722  0
> > > XDP-RX CPU      total   28,278,722
> > >
> > > RXQ stats       RXQ:CPU pps         issue-pps
> > > rx_queue_index    2:2   28,278,726  0
> > > rx_queue_index    2:sum 28,278,726
> > >
> > >
> > >
> >
>
>
>
> --
> Best regards,
>   Jesper Dangaard Brouer
>   MSc.CS, Principal Kernel Engineer at Red Hat
>   LinkedIn: http://www.linkedin.com/in/brouer
>


* Re: [PATCH v8 bpf-next 00/14] mvneta: introduce XDP multi-buffer support
  2021-04-22 10:24                 ` Magnus Karlsson
@ 2021-04-22 14:42                   ` Jesper Dangaard Brouer
  2021-04-22 15:05                     ` Crash for i40e on net-next (was: [PATCH v8 bpf-next 00/14] mvneta: introduce XDP multi-buffer support) Jesper Dangaard Brouer
  0 siblings, 1 reply; 57+ messages in thread
From: Jesper Dangaard Brouer @ 2021-04-22 14:42 UTC (permalink / raw)
  To: Magnus Karlsson
  Cc: Lorenzo Bianconi, bpf, Network Development, David S. Miller,
	Jakub Kicinski, Alexei Starovoitov, Daniel Borkmann,
	John Fastabend, David Ahern, Eelco Chaudron, Jason Wang,
	Alexander Duyck, Saeed Mahameed, Fijalkowski, Maciej, Tirthendu,
	brouer


On Thu, 22 Apr 2021 12:24:32 +0200
Magnus Karlsson <magnus.karlsson@gmail.com> wrote:

> On Wed, Apr 21, 2021 at 5:39 PM Jesper Dangaard Brouer
> <brouer@redhat.com> wrote:
> >
> > On Wed, 21 Apr 2021 16:12:32 +0200
> > Magnus Karlsson <magnus.karlsson@gmail.com> wrote:
> >  
[...]
> > > more than I get.  
> >
> > I clearly have a bug in the i40e driver.  As I wrote later, I don't see
> > any packets transmitted for XDP_TX.  Hmm, I am using Mel Gorman's tree,
> > which contains the i40e/ice/ixgbe bug we fixed earlier.

Something is wrong with i40e. I changed git-tree to net-next (at
commit 5d869070569a) and XDP seems to have stopped working on i40e :-(

$ uname -a
Linux broadwell 5.12.0-rc7-net-next+ #600 SMP PREEMPT Thu Apr 22 15:13:15 CEST 2021 x86_64 x86_64 x86_64 GNU/Linux

When I load any XDP prog almost no packets are let through:

 [kernel-bpf-samples]$ sudo ./xdp1 i40e2
 libbpf: elf: skipping unrecognized data section(16) .eh_frame
 libbpf: elf: skipping relo section(17) .rel.eh_frame for section(16) .eh_frame
 proto 17:          1 pkt/s
 proto 0:          0 pkt/s
 proto 17:          0 pkt/s
 proto 0:          0 pkt/s
 proto 17:          1 pkt/s

On the same system my mlx5 NIC works fine:

 [kernel-bpf-samples]$ sudo ./xdp1 mlx5p1
 libbpf: elf: skipping unrecognized data section(16) .eh_frame
 libbpf: elf: skipping relo section(17) .rel.eh_frame for section(16) .eh_frame
 proto 17:   10984608 pkt/s
 proto 17:   24374656 pkt/s
 proto 17:   24339904 pkt/s



> > The call to xdp_convert_buff_to_frame() fails, but (see below) that
> > error is simply converted to I40E_XDP_CONSUMED.  Thus, not even the
> > 'trace_xdp_exception' will be able to troubleshoot this.  You/Intel
> > should consider making XDP_TX errors detectable (this will also happen
> > if TX ring don't have room).  
> 
> This is not good. Will submit a fix. Thanks for reporting Jesper.

Usually I use this command to troubleshoot:
  sudo ./xdp_monitor --stats

But as the i40e driver doesn't call 'trace_xdp_exception', I don't
capture any errors this way.

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer



* Crash for i40e on net-next (was: [PATCH v8 bpf-next 00/14] mvneta: introduce XDP multi-buffer support)
  2021-04-22 14:42                   ` Jesper Dangaard Brouer
@ 2021-04-22 15:05                     ` Jesper Dangaard Brouer
  2021-04-23  5:28                       ` Magnus Karlsson
  0 siblings, 1 reply; 57+ messages in thread
From: Jesper Dangaard Brouer @ 2021-04-22 15:05 UTC (permalink / raw)
  To: Magnus Karlsson
  Cc: Lorenzo Bianconi, bpf, Network Development, David S. Miller,
	Jakub Kicinski, Alexei Starovoitov, Daniel Borkmann,
	John Fastabend, David Ahern, Eelco Chaudron, Jason Wang,
	Alexander Duyck, Saeed Mahameed, Fijalkowski, Maciej, Tirthendu,
	brouer

On Thu, 22 Apr 2021 16:42:23 +0200
Jesper Dangaard Brouer <brouer@redhat.com> wrote:

> On Thu, 22 Apr 2021 12:24:32 +0200
> Magnus Karlsson <magnus.karlsson@gmail.com> wrote:
> 
> > On Wed, Apr 21, 2021 at 5:39 PM Jesper Dangaard Brouer
> > <brouer@redhat.com> wrote:  
> > >
> > > On Wed, 21 Apr 2021 16:12:32 +0200
> > > Magnus Karlsson <magnus.karlsson@gmail.com> wrote:
> > >    
> [...]
> > > > more than I get.    
> > >
> > > I clearly have a bug in the i40e driver.  As I wrote later, I don't see
> > > any packets transmitted for XDP_TX.  Hmm, I am using Mel Gorman's tree,
> > > which contains the i40e/ice/ixgbe bug we fixed earlier.  
> 
> Something is wrong with i40e, I changed git-tree to net-next (at
> commit 5d869070569a) and XDP seems to have stopped working on i40e :-(

Renamed subj as this is without this patchset applied.

> $ uname -a
> Linux broadwell 5.12.0-rc7-net-next+ #600 SMP PREEMPT Thu Apr 22 15:13:15 CEST 2021 x86_64 x86_64 x86_64 GNU/Linux
> 
> When I load any XDP prog almost no packets are let through:
> 
>  [kernel-bpf-samples]$ sudo ./xdp1 i40e2
>  libbpf: elf: skipping unrecognized data section(16) .eh_frame
>  libbpf: elf: skipping relo section(17) .rel.eh_frame for section(16) .eh_frame
>  proto 17:          1 pkt/s
>  proto 0:          0 pkt/s
>  proto 17:          0 pkt/s
>  proto 0:          0 pkt/s
>  proto 17:          1 pkt/s

Trying out xdp_redirect:

 [kernel-bpf-samples]$ sudo ./xdp_redirect i40e2 i40e2
 input: 7 output: 7
 libbpf: elf: skipping unrecognized data section(20) .eh_frame
 libbpf: elf: skipping relo section(21) .rel.eh_frame for section(20) .eh_frame
 libbpf: Kernel error message: XDP program already attached
 WARN: link set xdp fd failed on 7
 ifindex 7:       7357 pkt/s
 ifindex 7:       7909 pkt/s
 ifindex 7:       7909 pkt/s
 ifindex 7:       7909 pkt/s
 ifindex 7:       7909 pkt/s
 ifindex 7:       7909 pkt/s
 ifindex 7:       6357 pkt/s

And then it crashes (see below) at page_frag_free+0x31, which calls
virt_to_head_page() with a wrong address (I guess).  This is called by
i40e_clean_tx_irq+0xc9.

 $ ./scripts/faddr2line drivers/net/ethernet/intel/i40e/i40e.o i40e_clean_tx_irq+0xc9
 i40e_clean_tx_irq+0xc9/0x440:
 i40e_clean_tx_irq at /home/jbrouer/git/kernel/net-next/drivers/net/ethernet/intel/i40e/i40e_txrx.c:976

Which is:

 		/* unmap skb header data */
 Line:976	dma_unmap_single(tx_ring->dev,
				 dma_unmap_addr(tx_buf, dma),
				 dma_unmap_len(tx_buf, len),
				 DMA_TO_DEVICE);


[  935.781751] BUG: unable to handle page fault for address: ffffebde00000008
[  935.788630] #PF: supervisor read access in kernel mode
[  935.793766] #PF: error_code(0x0000) - not-present page
[  935.798906] PGD 0 P4D 0 
[  935.801445] Oops: 0000 [#1] PREEMPT SMP PTI
[  935.805632] CPU: 4 PID: 113 Comm: kworker/u12:9 Not tainted 5.12.0-rc7-net-next+ #600
[  935.813460] Hardware name: Supermicro Super Server/X10SRi-F, BIOS 2.0a 08/01/2016
[  935.820937] Workqueue: events_unbound call_usermodehelper_exec_work
[  935.827214] RIP: 0010:page_frag_free+0x31/0x70
[  935.831656] Code: 00 00 80 48 01 c7 72 55 48 b8 00 00 00 80 7f 77 00 00 48 01 c7 48 b8 00 00 00 00 00 ea ff ff 48 c1 ef 0c 48 c1 e7 06 48 01 c7 <48> 8b 47 08 48 8d 50 ff a8 01 48 0f 45 fa f0 ff 4f 34 74 01 c3 48
[  935.850406] RSP: 0018:ffffc900001c0e50 EFLAGS: 00010286
[  935.855629] RAX: ffffea0000000000 RBX: ffff88813b258180 RCX: 0000000000000000
[  935.862764] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffebde00000000
[  935.869895] RBP: ffff88813b258180 R08: ffffc900001c0f38 R09: 0000000000000180
[  935.877028] R10: 00000000fffffe18 R11: ffffc900001c0ff8 R12: ffff88813bc403c0
[  935.884160] R13: 000000000000003c R14: 00000000fffffe18 R15: ffff88813b15b180
[  935.891295] FS:  0000000000000000(0000) GS:ffff88887fd00000(0000) knlGS:0000000000000000
[  935.899380] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  935.905126] CR2: ffffebde00000008 CR3: 000000087e810002 CR4: 00000000003706e0
[  935.912259] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  935.919391] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[  935.926526] Call Trace:
[  935.928980]  <IRQ>
[  935.930999]  i40e_clean_tx_irq+0xc9/0x440 [i40e]
[  935.935653]  i40e_napi_poll+0x101/0x410 [i40e]
[  935.940116]  __napi_poll+0x2a/0x140
[  935.943607]  net_rx_action+0x215/0x2d0
[  935.947358]  ? i40e_msix_clean_rings+0x3f/0x50 [i40e]
[  935.952449]  __do_softirq+0xe3/0x2df
[  935.956028]  irq_exit_rcu+0xa7/0xb0
[  935.959519]  common_interrupt+0x83/0xa0
[  935.963358]  </IRQ>
[  935.965465]  asm_common_interrupt+0x1e/0x40
[  935.969651] RIP: 0010:clear_page_erms+0x7/0x10
[  935.974096] Code: 48 89 47 18 48 89 47 20 48 89 47 28 48 89 47 30 48 89 47 38 48 8d 7f 40 75 d9 90 c3 0f 1f 80 00 00 00 00 b9 00 10 00 00 31 c0 <f3> aa c3 cc cc cc cc cc cc 0f 1f 44 00 00 48 85 ff 0f 84 f2 00 00
[  935.992845] RSP: 0018:ffffc900003bfbc8 EFLAGS: 00010246
[  935.998069] RAX: 0000000000000000 RBX: 0000000000000901 RCX: 0000000000000340
[  936.005202] RDX: 0000000000002dc2 RSI: 0000000000000000 RDI: ffff88813b4d8cc0
[  936.012334] RBP: ffffea0004ed3600 R08: ffff88887c625f00 R09: ffffea0004ed3600
[  936.019467] R10: ffff888000000000 R11: 0000160000000000 R12: ffffea0004ed3640
[  936.026600] R13: 0000000000000000 R14: 0000000000005c39 R15: ffffc900003bfc50
[  936.033738]  prep_new_page+0x88/0xe0
[  936.037313]  get_page_from_freelist+0x2c6/0x3d0
[  936.041847]  __alloc_pages_nodemask+0x137/0x2e0
[  936.046380]  __vmalloc_node_range+0x14f/0x270
[  936.050738]  copy_process+0x39d/0x1ad0
[  936.054491]  ? kernel_clone+0x8b/0x3c0
[  936.058244]  kernel_clone+0x8b/0x3c0
[  936.061822]  ? dequeue_entity+0xc0/0x270
[  936.065748]  kernel_thread+0x47/0x50
[  936.069329]  ? umh_complete+0x40/0x40
[  936.072992]  call_usermodehelper_exec_work+0x2f/0x90
[  936.077960]  process_one_work+0x1ad/0x380
[  936.081974]  worker_thread+0x50/0x390
[  936.085638]  ? process_one_work+0x380/0x380
[  936.089824]  kthread+0x116/0x150
[  936.093057]  ? kthread_park+0x90/0x90
[  936.096722]  ret_from_fork+0x22/0x30
[  936.100307] Modules linked in: veth nf_defrag_ipv6 nf_defrag_ipv4 tun bridge stp llc rpcrdma sunrpc rdma_ucm ib_umad coretemp rdma_cm ib_ipoib kvm_intel iw_cm ib_cm kvm mlx5_ib i40iw irqbypass ib_uverbs rapl intel_cstate intel_uncore ib_core pcspkr i2c_i801 bfq i2c_smbus acpi_ipmi wmi ipmi_si ipmi_devintf ipmi_msghandler acpi_pad sch_fq_codel sd_mod t10_pi ixgbe igb igc mdio mlx5_core i40e mlxfw nfp psample i2c_algo_bit ptp i2c_core pps_core hid_generic [last unloaded: bpfilter]
[  936.142697] CR2: ffffebde00000008
[  936.146015] ---[ end trace 1bffa979f2cccd16 ]---
[  936.156720] RIP: 0010:page_frag_free+0x31/0x70
[  936.161170] Code: 00 00 80 48 01 c7 72 55 48 b8 00 00 00 80 7f 77 00 00 48 01 c7 48 b8 00 00 00 00 00 ea ff ff 48 c1 ef 0c 48 c1 e7 06 48 01 c7 <48> 8b 47 08 48 8d 50 ff a8 01 48 0f 45 fa f0 ff 4f 34 74 01 c3 48
[  936.179919] RSP: 0018:ffffc900001c0e50 EFLAGS: 00010286
[  936.185140] RAX: ffffea0000000000 RBX: ffff88813b258180 RCX: 0000000000000000
[  936.192275] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffebde00000000
[  936.199407] RBP: ffff88813b258180 R08: ffffc900001c0f38 R09: 0000000000000180
[  936.206541] R10: 00000000fffffe18 R11: ffffc900001c0ff8 R12: ffff88813bc403c0
[  936.213673] R13: 000000000000003c R14: 00000000fffffe18 R15: ffff88813b15b180
[  936.220804] FS:  0000000000000000(0000) GS:ffff88887fd00000(0000) knlGS:0000000000000000
[  936.228893] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  936.234638] CR2: ffffebde00000008 CR3: 000000087e810002 CR4: 00000000003706e0
[  936.241771] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  936.248903] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[  936.256036] Kernel panic - not syncing: Fatal exception in interrupt
[  936.262401] Kernel Offset: disabled
[  936.271867] ---[ end Kernel panic - not syncing: Fatal exception in interrupt ]---


-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer


^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: Crash for i40e on net-next (was: [PATCH v8 bpf-next 00/14] mvneta: introduce XDP multi-buffer support)
  2021-04-22 15:05                     ` Crash for i40e on net-next (was: [PATCH v8 bpf-next 00/14] mvneta: introduce XDP multi-buffer support) Jesper Dangaard Brouer
@ 2021-04-23  5:28                       ` Magnus Karlsson
  2021-04-23 16:43                         ` Alexander Duyck
  0 siblings, 1 reply; 57+ messages in thread
From: Magnus Karlsson @ 2021-04-23  5:28 UTC (permalink / raw)
  To: Jesper Dangaard Brouer
  Cc: Lorenzo Bianconi, bpf, Network Development, David S. Miller,
	Jakub Kicinski, Alexei Starovoitov, Daniel Borkmann,
	John Fastabend, David Ahern, Eelco Chaudron, Jason Wang,
	Alexander Duyck, Saeed Mahameed, Fijalkowski, Maciej, Tirthendu

On Thu, Apr 22, 2021 at 5:05 PM Jesper Dangaard Brouer
<brouer@redhat.com> wrote:
>
> On Thu, 22 Apr 2021 16:42:23 +0200
> Jesper Dangaard Brouer <brouer@redhat.com> wrote:
>
> > On Thu, 22 Apr 2021 12:24:32 +0200
> > Magnus Karlsson <magnus.karlsson@gmail.com> wrote:
> >
> > > On Wed, Apr 21, 2021 at 5:39 PM Jesper Dangaard Brouer
> > > <brouer@redhat.com> wrote:
> > > >
> > > > On Wed, 21 Apr 2021 16:12:32 +0200
> > > > Magnus Karlsson <magnus.karlsson@gmail.com> wrote:
> > > >
> > [...]
> > > > > more than I get.
> > > >
> > > > I clearly have a bug in the i40e driver.  As I wrote later, I don't see
> > > > any packets transmitted for XDP_TX.  Hmm, I using Mel Gorman's tree,
> > > > which contains the i40e/ice/ixgbe bug we fixed earlier.
> >
> > Something is wrong with i40e, I changed git-tree to net-next (at
> > commit 5d869070569a) and XDP seems to have stopped working on i40e :-(

I found this out too when switching to the net tree yesterday to work on
the proper packet-drop tracing you spotted/requested. The commit below
completely broke XDP support on i40e (unless you run with a zero-copy
AF_XDP socket, because that path still works). I am working on a fix
that does not just revert the patch, but fixes the original problem
without breaking XDP. I will post it and the tracing fixes as soon as I
can.

commit 12738ac4754ec92a6a45bf3677d8da780a1412b3
Author: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com>
Date:   Fri Mar 26 19:43:40 2021 +0100

    i40e: Fix sparse errors in i40e_txrx.c

    Remove error handling through pointers. Instead use plain int
    to return value from i40e_run_xdp(...).

    Previously:
    - sparse errors were produced during compilation:
    i40e_txrx.c:2338 i40e_run_xdp() error: (-2147483647) too low for ERR_PTR
    i40e_txrx.c:2558 i40e_clean_rx_irq() error: 'skb' dereferencing
possible ERR_PTR()

    - sk_buff* was used to return value, but it has never had valid
    pointer to sk_buff. Returned value was always int handled as
    a pointer.

    Fixes: 0c8493d90b6b ("i40e: add XDP support for pass and drop actions")
    Fixes: 2e6893123830 ("i40e: split XDP_TX tail and XDP_REDIRECT map
flushing")
    Signed-off-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
    Signed-off-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com>
    Tested-by: Dave Switzer <david.switzer@intel.com>
    Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>


> Renamed subj as this is without this patchset applied.
>
> > $ uname -a
> > Linux broadwell 5.12.0-rc7-net-next+ #600 SMP PREEMPT Thu Apr 22 15:13:15 CEST 2021 x86_64 x86_64 x86_64 GNU/Linux
> >
> > When I load any XDP prog almost no packets are let through:
> >
> >  [kernel-bpf-samples]$ sudo ./xdp1 i40e2
> >  libbpf: elf: skipping unrecognized data section(16) .eh_frame
> >  libbpf: elf: skipping relo section(17) .rel.eh_frame for section(16) .eh_frame
> >  proto 17:          1 pkt/s
> >  proto 0:          0 pkt/s
> >  proto 17:          0 pkt/s
> >  proto 0:          0 pkt/s
> >  proto 17:          1 pkt/s
>
> Trying out xdp_redirect:
>
>  [kernel-bpf-samples]$ sudo ./xdp_redirect i40e2 i40e2
>  input: 7 output: 7
>  libbpf: elf: skipping unrecognized data section(20) .eh_frame
>  libbpf: elf: skipping relo section(21) .rel.eh_frame for section(20) .eh_frame
>  libbpf: Kernel error message: XDP program already attached
>  WARN: link set xdp fd failed on 7
>  ifindex 7:       7357 pkt/s
>  ifindex 7:       7909 pkt/s
>  ifindex 7:       7909 pkt/s
>  ifindex 7:       7909 pkt/s
>  ifindex 7:       7909 pkt/s
>  ifindex 7:       7909 pkt/s
>  ifindex 7:       6357 pkt/s
>
> And then it crash (see below) at page_frag_free+0x31 which calls
> virt_to_head_page() with a wrong addr (I guess).  This is called by
> i40e_clean_tx_irq+0xc9.

I did not see a crash myself, just 4 Kpps. But the rings and DMA
mappings got completely mangled by the patch above, so it could be the
same cause.

>  $ ./scripts/faddr2line drivers/net/ethernet/intel/i40e/i40e.o i40e_clean_tx_irq+0xc9
>  i40e_clean_tx_irq+0xc9/0x440:
>  i40e_clean_tx_irq at /home/jbrouer/git/kernel/net-next/drivers/net/ethernet/intel/i40e/i40e_txrx.c:976
>
> Which is:
>
>                 /* unmap skb header data */
>  Line:976       dma_unmap_single(tx_ring->dev,
>                                  dma_unmap_addr(tx_buf, dma),
>                                  dma_unmap_len(tx_buf, len),
>                                  DMA_TO_DEVICE);
>
>
> [  935.781751] BUG: unable to handle page fault for address: ffffebde00000008
> [  935.788630] #PF: supervisor read access in kernel mode
> [  935.793766] #PF: error_code(0x0000) - not-present page
> [  935.798906] PGD 0 P4D 0
> [  935.801445] Oops: 0000 [#1] PREEMPT SMP PTI
> [  935.805632] CPU: 4 PID: 113 Comm: kworker/u12:9 Not tainted 5.12.0-rc7-net-next+ #600
> [  935.813460] Hardware name: Supermicro Super Server/X10SRi-F, BIOS 2.0a 08/01/2016
> [  935.820937] Workqueue: events_unbound call_usermodehelper_exec_work
> [  935.827214] RIP: 0010:page_frag_free+0x31/0x70
> [  935.831656] Code: 00 00 80 48 01 c7 72 55 48 b8 00 00 00 80 7f 77 00 00 48 01 c7 48 b8 00 00 00 00 00 ea ff ff 48 c1 ef 0c 48 c1 e7 06 48 01 c7 <48> 8b 47 08 48 8d 50 ff a8 01 48 0f 45 fa f0 ff 4f 34 74 01 c3 48
> [  935.850406] RSP: 0018:ffffc900001c0e50 EFLAGS: 00010286
> [  935.855629] RAX: ffffea0000000000 RBX: ffff88813b258180 RCX: 0000000000000000
> [  935.862764] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffebde00000000
> [  935.869895] RBP: ffff88813b258180 R08: ffffc900001c0f38 R09: 0000000000000180
> [  935.877028] R10: 00000000fffffe18 R11: ffffc900001c0ff8 R12: ffff88813bc403c0
> [  935.884160] R13: 000000000000003c R14: 00000000fffffe18 R15: ffff88813b15b180
> [  935.891295] FS:  0000000000000000(0000) GS:ffff88887fd00000(0000) knlGS:0000000000000000
> [  935.899380] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [  935.905126] CR2: ffffebde00000008 CR3: 000000087e810002 CR4: 00000000003706e0
> [  935.912259] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [  935.919391] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [  935.926526] Call Trace:
> [  935.928980]  <IRQ>
> [  935.930999]  i40e_clean_tx_irq+0xc9/0x440 [i40e]
> [  935.935653]  i40e_napi_poll+0x101/0x410 [i40e]
> [  935.940116]  __napi_poll+0x2a/0x140
> [  935.943607]  net_rx_action+0x215/0x2d0
> [  935.947358]  ? i40e_msix_clean_rings+0x3f/0x50 [i40e]
> [  935.952449]  __do_softirq+0xe3/0x2df
> [  935.956028]  irq_exit_rcu+0xa7/0xb0
> [  935.959519]  common_interrupt+0x83/0xa0
> [  935.963358]  </IRQ>
> [  935.965465]  asm_common_interrupt+0x1e/0x40
> [  935.969651] RIP: 0010:clear_page_erms+0x7/0x10
> [  935.974096] Code: 48 89 47 18 48 89 47 20 48 89 47 28 48 89 47 30 48 89 47 38 48 8d 7f 40 75 d9 90 c3 0f 1f 80 00 00 00 00 b9 00 10 00 00 31 c0 <f3> aa c3 cc cc cc cc cc cc 0f 1f 44 00 00 48 85 ff 0f 84 f2 00 00
> [  935.992845] RSP: 0018:ffffc900003bfbc8 EFLAGS: 00010246
> [  935.998069] RAX: 0000000000000000 RBX: 0000000000000901 RCX: 0000000000000340
> [  936.005202] RDX: 0000000000002dc2 RSI: 0000000000000000 RDI: ffff88813b4d8cc0
> [  936.012334] RBP: ffffea0004ed3600 R08: ffff88887c625f00 R09: ffffea0004ed3600
> [  936.019467] R10: ffff888000000000 R11: 0000160000000000 R12: ffffea0004ed3640
> [  936.026600] R13: 0000000000000000 R14: 0000000000005c39 R15: ffffc900003bfc50
> [  936.033738]  prep_new_page+0x88/0xe0
> [  936.037313]  get_page_from_freelist+0x2c6/0x3d0
> [  936.041847]  __alloc_pages_nodemask+0x137/0x2e0
> [  936.046380]  __vmalloc_node_range+0x14f/0x270
> [  936.050738]  copy_process+0x39d/0x1ad0
> [  936.054491]  ? kernel_clone+0x8b/0x3c0
> [  936.058244]  kernel_clone+0x8b/0x3c0
> [  936.061822]  ? dequeue_entity+0xc0/0x270
> [  936.065748]  kernel_thread+0x47/0x50
> [  936.069329]  ? umh_complete+0x40/0x40
> [  936.072992]  call_usermodehelper_exec_work+0x2f/0x90
> [  936.077960]  process_one_work+0x1ad/0x380
> [  936.081974]  worker_thread+0x50/0x390
> [  936.085638]  ? process_one_work+0x380/0x380
> [  936.089824]  kthread+0x116/0x150
> [  936.093057]  ? kthread_park+0x90/0x90
> [  936.096722]  ret_from_fork+0x22/0x30
> [  936.100307] Modules linked in: veth nf_defrag_ipv6 nf_defrag_ipv4 tun bridge stp llc rpcrdma sunrpc rdma_ucm ib_umad coretemp rdma_cm ib_ipoib kvm_intel iw_cm ib_cm kvm mlx5_ib i40iw irqbypass ib_uverbs rapl intel_cstate intel_uncore ib_core pcspkr i2c_i801 bfq i2c_smbus acpi_ipmi wmi ipmi_si ipmi_devintf ipmi_msghandler acpi_pad sch_fq_codel sd_mod t10_pi ixgbe igb igc mdio mlx5_core i40e mlxfw nfp psample i2c_algo_bit ptp i2c_core pps_core hid_generic [last unloaded: bpfilter]
> [  936.142697] CR2: ffffebde00000008
> [  936.146015] ---[ end trace 1bffa979f2cccd16 ]---
> [  936.156720] RIP: 0010:page_frag_free+0x31/0x70
> [  936.161170] Code: 00 00 80 48 01 c7 72 55 48 b8 00 00 00 80 7f 77 00 00 48 01 c7 48 b8 00 00 00 00 00 ea ff ff 48 c1 ef 0c 48 c1 e7 06 48 01 c7 <48> 8b 47 08 48 8d 50 ff a8 01 48 0f 45 fa f0 ff 4f 34 74 01 c3 48
> [  936.179919] RSP: 0018:ffffc900001c0e50 EFLAGS: 00010286
> [  936.185140] RAX: ffffea0000000000 RBX: ffff88813b258180 RCX: 0000000000000000
> [  936.192275] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffebde00000000
> [  936.199407] RBP: ffff88813b258180 R08: ffffc900001c0f38 R09: 0000000000000180
> [  936.206541] R10: 00000000fffffe18 R11: ffffc900001c0ff8 R12: ffff88813bc403c0
> [  936.213673] R13: 000000000000003c R14: 00000000fffffe18 R15: ffff88813b15b180
> [  936.220804] FS:  0000000000000000(0000) GS:ffff88887fd00000(0000) knlGS:0000000000000000
> [  936.228893] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [  936.234638] CR2: ffffebde00000008 CR3: 000000087e810002 CR4: 00000000003706e0
> [  936.241771] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [  936.248903] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [  936.256036] Kernel panic - not syncing: Fatal exception in interrupt
> [  936.262401] Kernel Offset: disabled
> [  936.271867] ---[ end Kernel panic - not syncing: Fatal exception in interrupt ]---
>
>
> --
> Best regards,
>   Jesper Dangaard Brouer
>   MSc.CS, Principal Kernel Engineer at Red Hat
>   LinkedIn: http://www.linkedin.com/in/brouer
>

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: Crash for i40e on net-next (was: [PATCH v8 bpf-next 00/14] mvneta: introduce XDP multi-buffer support)
  2021-04-23  5:28                       ` Magnus Karlsson
@ 2021-04-23 16:43                         ` Alexander Duyck
  2021-04-25  9:45                           ` Magnus Karlsson
  0 siblings, 1 reply; 57+ messages in thread
From: Alexander Duyck @ 2021-04-23 16:43 UTC (permalink / raw)
  To: Magnus Karlsson
  Cc: Jesper Dangaard Brouer, Lorenzo Bianconi, bpf,
	Network Development, David S. Miller, Jakub Kicinski,
	Alexei Starovoitov, Daniel Borkmann, John Fastabend, David Ahern,
	Eelco Chaudron, Jason Wang, Saeed Mahameed, Fijalkowski, Maciej,
	Tirthendu

On Thu, Apr 22, 2021 at 10:28 PM Magnus Karlsson
<magnus.karlsson@gmail.com> wrote:
>
> On Thu, Apr 22, 2021 at 5:05 PM Jesper Dangaard Brouer
> <brouer@redhat.com> wrote:
> >
> > On Thu, 22 Apr 2021 16:42:23 +0200
> > Jesper Dangaard Brouer <brouer@redhat.com> wrote:
> >
> > > On Thu, 22 Apr 2021 12:24:32 +0200
> > > Magnus Karlsson <magnus.karlsson@gmail.com> wrote:
> > >
> > > > On Wed, Apr 21, 2021 at 5:39 PM Jesper Dangaard Brouer
> > > > <brouer@redhat.com> wrote:
> > > > >
> > > > > On Wed, 21 Apr 2021 16:12:32 +0200
> > > > > Magnus Karlsson <magnus.karlsson@gmail.com> wrote:
> > > > >
> > > [...]
> > > > > > more than I get.
> > > > >
> > > > > I clearly have a bug in the i40e driver.  As I wrote later, I don't see
> > > > > any packets transmitted for XDP_TX.  Hmm, I using Mel Gorman's tree,
> > > > > which contains the i40e/ice/ixgbe bug we fixed earlier.
> > >
> > > Something is wrong with i40e, I changed git-tree to net-next (at
> > > commit 5d869070569a) and XDP seems to have stopped working on i40e :-(
>
> Found this out too when switching to the net tree yesterday to work on
> proper packet drop tracing as you spotted/requested yesterday. The
> commit below completely broke XDP support on i40e (if you do not run
> with a zero-copy AF_XDP socket because that path still works). I am
> working on a fix that does not just revert the patch, but fixes the
> original problem without breaking XDP. Will post it and the tracing
> fixes as soon as I can.
>
> commit 12738ac4754ec92a6a45bf3677d8da780a1412b3
> Author: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com>
> Date:   Fri Mar 26 19:43:40 2021 +0100
>
>     i40e: Fix sparse errors in i40e_txrx.c
>
>     Remove error handling through pointers. Instead use plain int
>     to return value from i40e_run_xdp(...).
>
>     Previously:
>     - sparse errors were produced during compilation:
>     i40e_txrx.c:2338 i40e_run_xdp() error: (-2147483647) too low for ERR_PTR
>     i40e_txrx.c:2558 i40e_clean_rx_irq() error: 'skb' dereferencing
> possible ERR_PTR()
>
>     - sk_buff* was used to return value, but it has never had valid
>     pointer to sk_buff. Returned value was always int handled as
>     a pointer.
>
>     Fixes: 0c8493d90b6b ("i40e: add XDP support for pass and drop actions")
>     Fixes: 2e6893123830 ("i40e: split XDP_TX tail and XDP_REDIRECT map
> flushing")
>     Signed-off-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
>     Signed-off-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com>
>     Tested-by: Dave Switzer <david.switzer@intel.com>
>     Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>

Yeah, this patch would horribly break things, especially in the
multi-buffer case. The idea behind using the skb pointer to indicate
the error is that it is persistent until we hit the EOP descriptor.
With that removed you end up mangling the entire list of frames since
it will start trying to process the next frame in the middle of a
packet.
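To make the failure mode concrete, here is a minimal, hypothetical sketch of
the pattern Alexander describes: the skb pointer doubles as an ERR_PTR-style
error cookie that stays set across every descriptor of a multi-buffer packet
until the EOP (end-of-packet) descriptor, so the clean loop knows to keep
skipping fragments. All names here (XDP_CONSUMED, clean_rx_packet, rx_desc)
are invented for illustration and are not the real i40e structures; replacing
the sticky pointer with a per-descriptor int is what loses this state.

```c
#include <assert.h>
#include <stddef.h>

struct sk_buff { int len; };

/* ERR_PTR-style sentinel: an impossible pointer value used as an error
 * cookie instead of a valid sk_buff pointer. */
#define XDP_CONSUMED ((struct sk_buff *)(unsigned long)-1)

struct rx_desc {
	int eop;	/* 1 on the last descriptor of a packet */
};

/* Process one packet spread over 'n' descriptors. '*skb' persists across
 * loop iterations; once it holds the error sentinel, every remaining
 * fragment is skipped rather than parsed as the start of a new packet. */
static int clean_rx_packet(const struct rx_desc *descs, int n,
			   struct sk_buff **skb)
{
	int handled = 0;

	for (int i = 0; i < n; i++) {
		if (*skb != XDP_CONSUMED && i == 0)
			*skb = XDP_CONSUMED;	/* XDP verdict on first frag */
		/* Error is sticky: later fragments fall through here. */
		handled++;
		if (descs[i].eop) {
			*skb = NULL;	/* reset only at the packet boundary */
			break;
		}
	}
	return handled;
}
```

With a plain int return instead of the persistent pointer, the loop would
restart packet processing on the second fragment, i.e. in the middle of a
packet, which is exactly the mangling described above.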

>
> > Renamed subj as this is without this patchset applied.
> >
> > > $ uname -a
> > > Linux broadwell 5.12.0-rc7-net-next+ #600 SMP PREEMPT Thu Apr 22 15:13:15 CEST 2021 x86_64 x86_64 x86_64 GNU/Linux
> > >
> > > When I load any XDP prog almost no packets are let through:
> > >
> > >  [kernel-bpf-samples]$ sudo ./xdp1 i40e2
> > >  libbpf: elf: skipping unrecognized data section(16) .eh_frame
> > >  libbpf: elf: skipping relo section(17) .rel.eh_frame for section(16) .eh_frame
> > >  proto 17:          1 pkt/s
> > >  proto 0:          0 pkt/s
> > >  proto 17:          0 pkt/s
> > >  proto 0:          0 pkt/s
> > >  proto 17:          1 pkt/s
> >
> > Trying out xdp_redirect:
> >
> >  [kernel-bpf-samples]$ sudo ./xdp_redirect i40e2 i40e2
> >  input: 7 output: 7
> >  libbpf: elf: skipping unrecognized data section(20) .eh_frame
> >  libbpf: elf: skipping relo section(21) .rel.eh_frame for section(20) .eh_frame
> >  libbpf: Kernel error message: XDP program already attached
> >  WARN: link set xdp fd failed on 7
> >  ifindex 7:       7357 pkt/s
> >  ifindex 7:       7909 pkt/s
> >  ifindex 7:       7909 pkt/s
> >  ifindex 7:       7909 pkt/s
> >  ifindex 7:       7909 pkt/s
> >  ifindex 7:       7909 pkt/s
> >  ifindex 7:       6357 pkt/s
> >
> > And then it crash (see below) at page_frag_free+0x31 which calls
> > virt_to_head_page() with a wrong addr (I guess).  This is called by
> > i40e_clean_tx_irq+0xc9.
>
> Did not see a crash myself, just 4 Kpps. But the rings and DMA
> mappings got completely mangled by the patch above, so could be the
> same cause.

Are you running with jumbo frames enabled? I would think this change
would really blow things up in the jumbo enabled case.

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: Crash for i40e on net-next (was: [PATCH v8 bpf-next 00/14] mvneta: introduce XDP multi-buffer support)
  2021-04-23 16:43                         ` Alexander Duyck
@ 2021-04-25  9:45                           ` Magnus Karlsson
  0 siblings, 0 replies; 57+ messages in thread
From: Magnus Karlsson @ 2021-04-25  9:45 UTC (permalink / raw)
  To: Alexander Duyck
  Cc: Jesper Dangaard Brouer, Lorenzo Bianconi, bpf,
	Network Development, David S. Miller, Jakub Kicinski,
	Alexei Starovoitov, Daniel Borkmann, John Fastabend, David Ahern,
	Eelco Chaudron, Jason Wang, Saeed Mahameed, Fijalkowski, Maciej,
	Tirthendu

On Fri, Apr 23, 2021 at 6:43 PM Alexander Duyck
<alexander.duyck@gmail.com> wrote:
>
> On Thu, Apr 22, 2021 at 10:28 PM Magnus Karlsson
> <magnus.karlsson@gmail.com> wrote:
> >
> > On Thu, Apr 22, 2021 at 5:05 PM Jesper Dangaard Brouer
> > <brouer@redhat.com> wrote:
> > >
> > > On Thu, 22 Apr 2021 16:42:23 +0200
> > > Jesper Dangaard Brouer <brouer@redhat.com> wrote:
> > >
> > > > On Thu, 22 Apr 2021 12:24:32 +0200
> > > > Magnus Karlsson <magnus.karlsson@gmail.com> wrote:
> > > >
> > > > > On Wed, Apr 21, 2021 at 5:39 PM Jesper Dangaard Brouer
> > > > > <brouer@redhat.com> wrote:
> > > > > >
> > > > > > On Wed, 21 Apr 2021 16:12:32 +0200
> > > > > > Magnus Karlsson <magnus.karlsson@gmail.com> wrote:
> > > > > >
> > > > [...]
> > > > > > > more than I get.
> > > > > >
> > > > > > I clearly have a bug in the i40e driver.  As I wrote later, I don't see
> > > > > > any packets transmitted for XDP_TX.  Hmm, I using Mel Gorman's tree,
> > > > > > which contains the i40e/ice/ixgbe bug we fixed earlier.
> > > >
> > > > Something is wrong with i40e, I changed git-tree to net-next (at
> > > > commit 5d869070569a) and XDP seems to have stopped working on i40e :-(
> >
> > Found this out too when switching to the net tree yesterday to work on
> > proper packet drop tracing as you spotted/requested yesterday. The
> > commit below completely broke XDP support on i40e (if you do not run
> > with a zero-copy AF_XDP socket because that path still works). I am
> > working on a fix that does not just revert the patch, but fixes the
> > original problem without breaking XDP. Will post it and the tracing
> > fixes as soon as I can.
> >
> > commit 12738ac4754ec92a6a45bf3677d8da780a1412b3
> > Author: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com>
> > Date:   Fri Mar 26 19:43:40 2021 +0100
> >
> >     i40e: Fix sparse errors in i40e_txrx.c
> >
> >     Remove error handling through pointers. Instead use plain int
> >     to return value from i40e_run_xdp(...).
> >
> >     Previously:
> >     - sparse errors were produced during compilation:
> >     i40e_txrx.c:2338 i40e_run_xdp() error: (-2147483647) too low for ERR_PTR
> >     i40e_txrx.c:2558 i40e_clean_rx_irq() error: 'skb' dereferencing
> > possible ERR_PTR()
> >
> >     - sk_buff* was used to return value, but it has never had valid
> >     pointer to sk_buff. Returned value was always int handled as
> >     a pointer.
> >
> >     Fixes: 0c8493d90b6b ("i40e: add XDP support for pass and drop actions")
> >     Fixes: 2e6893123830 ("i40e: split XDP_TX tail and XDP_REDIRECT map
> > flushing")
> >     Signed-off-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
> >     Signed-off-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com>
> >     Tested-by: Dave Switzer <david.switzer@intel.com>
> >     Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
>
> Yeah, this patch would horribly break things, especially in the
> multi-buffer case. The idea behind using the skb pointer to indicate
> the error is that it is persistent until we hit the EOP descriptor.
> With that removed you end up mangling the entire list of frames since
> it will start trying to process the next frame in the middle of a
> packet.
>
> >
> > > Renamed subj as this is without this patchset applied.
> > >
> > > > $ uname -a
> > > > Linux broadwell 5.12.0-rc7-net-next+ #600 SMP PREEMPT Thu Apr 22 15:13:15 CEST 2021 x86_64 x86_64 x86_64 GNU/Linux
> > > >
> > > > When I load any XDP prog almost no packets are let through:
> > > >
> > > >  [kernel-bpf-samples]$ sudo ./xdp1 i40e2
> > > >  libbpf: elf: skipping unrecognized data section(16) .eh_frame
> > > >  libbpf: elf: skipping relo section(17) .rel.eh_frame for section(16) .eh_frame
> > > >  proto 17:          1 pkt/s
> > > >  proto 0:          0 pkt/s
> > > >  proto 17:          0 pkt/s
> > > >  proto 0:          0 pkt/s
> > > >  proto 17:          1 pkt/s
> > >
> > > Trying out xdp_redirect:
> > >
> > >  [kernel-bpf-samples]$ sudo ./xdp_redirect i40e2 i40e2
> > >  input: 7 output: 7
> > >  libbpf: elf: skipping unrecognized data section(20) .eh_frame
> > >  libbpf: elf: skipping relo section(21) .rel.eh_frame for section(20) .eh_frame
> > >  libbpf: Kernel error message: XDP program already attached
> > >  WARN: link set xdp fd failed on 7
> > >  ifindex 7:       7357 pkt/s
> > >  ifindex 7:       7909 pkt/s
> > >  ifindex 7:       7909 pkt/s
> > >  ifindex 7:       7909 pkt/s
> > >  ifindex 7:       7909 pkt/s
> > >  ifindex 7:       7909 pkt/s
> > >  ifindex 7:       6357 pkt/s
> > >
> > > And then it crash (see below) at page_frag_free+0x31 which calls
> > > virt_to_head_page() with a wrong addr (I guess).  This is called by
> > > i40e_clean_tx_irq+0xc9.
> >
> > Did not see a crash myself, just 4 Kpps. But the rings and DMA
> > mappings got completely mangled by the patch above, so could be the
> > same cause.
>
> Are you running with jumbo frames enabled? I would think this change
> would really blow things up in the jumbo enabled case.

I did not. Just using XDP_DROP or XDP_TX would crash the system just fine.

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v8 bpf-next 00/14] mvneta: introduce XDP multi-buffer support
  2021-04-16 14:27 ` Magnus Karlsson
  2021-04-16 21:29   ` Lorenzo Bianconi
  2021-04-18 16:18   ` Jesper Dangaard Brouer
@ 2021-04-27 18:28   ` Lorenzo Bianconi
  2021-04-28  7:41     ` Magnus Karlsson
  2 siblings, 1 reply; 57+ messages in thread
From: Lorenzo Bianconi @ 2021-04-27 18:28 UTC (permalink / raw)
  To: Magnus Karlsson
  Cc: bpf, Network Development, lorenzo.bianconi, David S. Miller,
	Jakub Kicinski, Alexei Starovoitov, Daniel Borkmann, shayagr,
	sameehj, John Fastabend, David Ahern, Jesper Dangaard Brouer,
	Eelco Chaudron, Jason Wang, Alexander Duyck, Saeed Mahameed,
	Fijalkowski, Maciej, Tirthendu


[...]

> Took your patches for a test run with the AF_XDP sample xdpsock on an
> i40e card and the throughput degradation is between 2 to 6% depending
> on the setup and microbenchmark within xdpsock that is executed. And
> this is without sending any multi frame packets. Just single frame
> ones. Tirtha made changes to the i40e driver to support this new
> interface so that is being included in the measurements.
> 
> What performance do you see with the mvneta card? How much are we
> willing to pay for this feature when it is not being used or can we in
> some way selectively turn it on only when needed?

Hi Magnus,

Today I carried out some comparison tests between bpf-next and bpf-next +
the xdp_multibuff series on mvneta, running the xdp_rxq_info sample. The
results are basically aligned:

bpf-next:
- xdp drop ~ 665Kpps
- xdp_tx   ~ 291Kpps
- xdp_pass ~ 118Kpps

bpf-next + xdp_multibuff:
- xdp drop ~ 672Kpps
- xdp_tx   ~ 288Kpps
- xdp_pass ~ 118Kpps

I am not sure whether the results are affected by the low-power CPU; I will
run some tests on an ixgbe card.

Regards,
Lorenzo

> 
> Thanks: Magnus
> 
> > Eelco Chaudron (4):
> >   bpf: add multi-buff support to the bpf_xdp_adjust_tail() API
> >   bpd: add multi-buffer support to xdp copy helpers
> >   bpf: add new frame_length field to the XDP ctx
> >   bpf: update xdp_adjust_tail selftest to include multi-buffer
> >
> > Lorenzo Bianconi (10):
> >   xdp: introduce mb in xdp_buff/xdp_frame
> >   xdp: add xdp_shared_info data structure
> >   net: mvneta: update mb bit before passing the xdp buffer to eBPF layer
> >   xdp: add multi-buff support to xdp_return_{buff/frame}
> >   net: mvneta: add multi buffer support to XDP_TX
> >   net: mvneta: enable jumbo frames for XDP
> >   net: xdp: add multi-buff support to xdp_build_skb_from_fram
> >   bpf: move user_size out of bpf_test_init
> >   bpf: introduce multibuff support to bpf_prog_test_run_xdp()
> >   bpf: test_run: add xdp_shared_info pointer in bpf_test_finish
> >     signature
> >
> >  drivers/net/ethernet/marvell/mvneta.c         | 182 ++++++++++--------
> >  include/linux/filter.h                        |   7 +
> >  include/net/xdp.h                             | 105 +++++++++-
> >  include/uapi/linux/bpf.h                      |   1 +
> >  net/bpf/test_run.c                            | 109 +++++++++--
> >  net/core/filter.c                             | 134 ++++++++++++-
> >  net/core/xdp.c                                | 103 +++++++++-
> >  tools/include/uapi/linux/bpf.h                |   1 +
> >  .../bpf/prog_tests/xdp_adjust_tail.c          | 105 ++++++++++
> >  .../selftests/bpf/prog_tests/xdp_bpf2bpf.c    | 127 ++++++++----
> >  .../bpf/progs/test_xdp_adjust_tail_grow.c     |  17 +-
> >  .../bpf/progs/test_xdp_adjust_tail_shrink.c   |  32 ++-
> >  .../selftests/bpf/progs/test_xdp_bpf2bpf.c    |   3 +-
> >  13 files changed, 767 insertions(+), 159 deletions(-)
> >
> > --
> > 2.30.2
> >


^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v8 bpf-next 00/14] mvneta: introduce XDP multi-buffer support
  2021-04-27 18:28   ` [PATCH v8 bpf-next 00/14] mvneta: introduce XDP multi-buffer support Lorenzo Bianconi
@ 2021-04-28  7:41     ` Magnus Karlsson
  2021-04-29 12:49       ` Jesper Dangaard Brouer
  0 siblings, 1 reply; 57+ messages in thread
From: Magnus Karlsson @ 2021-04-28  7:41 UTC (permalink / raw)
  To: Lorenzo Bianconi
  Cc: bpf, Network Development, Lorenzo Bianconi, David S. Miller,
	Jakub Kicinski, Alexei Starovoitov, Daniel Borkmann, shayagr,
	sameehj, John Fastabend, David Ahern, Jesper Dangaard Brouer,
	Eelco Chaudron, Jason Wang, Alexander Duyck, Saeed Mahameed,
	Fijalkowski, Maciej, Tirthendu

On Tue, Apr 27, 2021 at 8:28 PM Lorenzo Bianconi <lorenzo@kernel.org> wrote:
>
> [...]
>
> > Took your patches for a test run with the AF_XDP sample xdpsock on an
> > i40e card and the throughput degradation is between 2 to 6% depending
> > on the setup and microbenchmark within xdpsock that is executed. And
> > this is without sending any multi frame packets. Just single frame
> > ones. Tirtha made changes to the i40e driver to support this new
> > interface so that is being included in the measurements.
> >
> > What performance do you see with the mvneta card? How much are we
> > willing to pay for this feature when it is not being used or can we in
> > some way selectively turn it on only when needed?
>
> Hi Magnus,
>
> Today I carried out some comparison tests between bpf-next and bpf-next +
> xdp_multibuff series on mvneta running xdp_rxq_info sample. Results are
> basically aligned:
>
> bpf-next:
> - xdp drop ~ 665Kpps
> - xdp_tx   ~ 291Kpps
> - xdp_pass ~ 118Kpps
>
> bpf-next + xdp_multibuff:
> - xdp drop ~ 672Kpps
> - xdp_tx   ~ 288Kpps
> - xdp_pass ~ 118Kpps
>
> I am not sure if results are affected by the low power CPU, I will run some
> tests on ixgbe card.

Thanks Lorenzo. I made some new runs, this time with i40e driver
changes as a new data point. Same baseline as before but with patches
[1] and [2] applied. Note
that if you use net or net-next and i40e, you need patch [3] too.

The i40e multi-buffer support will be posted on the mailing list as a
separate RFC patch so you can reproduce and review.

Note that the calculations are performed on non-truncated numbers, so 2 ns
might be 5 cycles on my 2.1 GHz machine, since 2.49 ns * 2.1 GHz =
5.229 cycles ~ 5 cycles. xdpsock is run in zero-copy mode, so it uses
the zero-copy driver data path, in contrast with xdp_rxq_info, which uses
the regular driver data path. I only ran the busy-poll 1-core case this
time. Reported numbers are the average over 3 runs.
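The per-packet cost arithmetic above can be sketched as two small helpers:
a throughput delta in packets/s converts to an extra-nanoseconds-per-packet
figure, which converts to cycles at a given clock frequency. The 2.1 GHz
and 2.49 ns figures come from the text; the pps numbers in the usage note
are illustrative only, not measured values.

```c
#include <assert.h>

/* Extra nanoseconds spent per packet when throughput drops from
 * before_pps to after_pps (1e9 ns per second / packets per second). */
static double pps_delta_to_ns(double before_pps, double after_pps)
{
	return 1e9 / after_pps - 1e9 / before_pps;
}

/* Convert a per-packet time delta to CPU cycles at a given clock. */
static double ns_to_cycles(double ns, double ghz)
{
	return ns * ghz;	/* cycles = time * clock frequency */
}
```

For example, 2.49 ns at 2.1 GHz gives 5.229 cycles, which truncates to the
"+5 cycles" style of figure reported below; a hypothetical drop from 25 Mpps
to 24 Mpps would cost about 1.67 ns per packet.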

multi-buffer patches without any driver changes:

xdpsock rxdrop 1-core:
i40e: -4.5% in throughput / +3 ns / +6 cycles
ice: -1.5% / +1 ns / +2 cycles

xdp_rxq_info -a XDP_DROP
i40e: -2.5% / +2 ns / +3 cycles
ice: +6% / -3 ns / -7 cycles

xdp_rxq_info -a XDP_TX
i40e: -10% / +15 ns / +32 cycles
ice: -9% / +14 ns / +29 cycles

multi-buffer patches + i40e driver changes from Tirtha:

xdpsock rxdrop 1-core:
i40e: -3% / +2 ns / +3 cycles

xdp_rxq_info -a XDP_DROP
i40e: -7.5% / +5 ns / +9 cycles

xdp_rxq_info -a XDP_TX
i40e: -10% / +15 ns / +32 cycles

It would be great if someone could rerun a similar set of experiments on
i40e or ice and report back.

[1] https://lists.osuosl.org/pipermail/intel-wired-lan/Week-of-Mon-20210419/024106.html
[2] https://lists.osuosl.org/pipermail/intel-wired-lan/Week-of-Mon-20210426/024135.html
[3] https://lists.osuosl.org/pipermail/intel-wired-lan/Week-of-Mon-20210426/024129.html

> Regards,
> Lorenzo
>
> >
> > Thanks: Magnus
> >
> > > Eelco Chaudron (4):
> > >   bpf: add multi-buff support to the bpf_xdp_adjust_tail() API
> > >   bpd: add multi-buffer support to xdp copy helpers
> > >   bpf: add new frame_length field to the XDP ctx
> > >   bpf: update xdp_adjust_tail selftest to include multi-buffer
> > >
> > > Lorenzo Bianconi (10):
> > >   xdp: introduce mb in xdp_buff/xdp_frame
> > >   xdp: add xdp_shared_info data structure
> > >   net: mvneta: update mb bit before passing the xdp buffer to eBPF layer
> > >   xdp: add multi-buff support to xdp_return_{buff/frame}
> > >   net: mvneta: add multi buffer support to XDP_TX
> > >   net: mvneta: enable jumbo frames for XDP
> > >   net: xdp: add multi-buff support to xdp_build_skb_from_fram
> > >   bpf: move user_size out of bpf_test_init
> > >   bpf: introduce multibuff support to bpf_prog_test_run_xdp()
> > >   bpf: test_run: add xdp_shared_info pointer in bpf_test_finish
> > >     signature
> > >
> > >  drivers/net/ethernet/marvell/mvneta.c         | 182 ++++++++++--------
> > >  include/linux/filter.h                        |   7 +
> > >  include/net/xdp.h                             | 105 +++++++++-
> > >  include/uapi/linux/bpf.h                      |   1 +
> > >  net/bpf/test_run.c                            | 109 +++++++++--
> > >  net/core/filter.c                             | 134 ++++++++++++-
> > >  net/core/xdp.c                                | 103 +++++++++-
> > >  tools/include/uapi/linux/bpf.h                |   1 +
> > >  .../bpf/prog_tests/xdp_adjust_tail.c          | 105 ++++++++++
> > >  .../selftests/bpf/prog_tests/xdp_bpf2bpf.c    | 127 ++++++++----
> > >  .../bpf/progs/test_xdp_adjust_tail_grow.c     |  17 +-
> > >  .../bpf/progs/test_xdp_adjust_tail_shrink.c   |  32 ++-
> > >  .../selftests/bpf/progs/test_xdp_bpf2bpf.c    |   3 +-
> > >  13 files changed, 767 insertions(+), 159 deletions(-)
> > >
> > > --
> > > 2.30.2
> > >

^ permalink raw reply	[flat|nested] 57+ messages in thread

* Re: [PATCH v8 bpf-next 00/14] mvneta: introduce XDP multi-buffer support
  2021-04-28  7:41     ` Magnus Karlsson
@ 2021-04-29 12:49       ` Jesper Dangaard Brouer
  0 siblings, 0 replies; 57+ messages in thread
From: Jesper Dangaard Brouer @ 2021-04-29 12:49 UTC (permalink / raw)
  To: Magnus Karlsson
  Cc: Lorenzo Bianconi, bpf, Network Development, Lorenzo Bianconi,
	David S. Miller, Jakub Kicinski, Alexei Starovoitov,
	Daniel Borkmann, shayagr, sameehj, John Fastabend, David Ahern,
	Eelco Chaudron, Jason Wang, Alexander Duyck, Saeed Mahameed,
	Fijalkowski, Maciej, Tirthendu, brouer

On Wed, 28 Apr 2021 09:41:52 +0200
Magnus Karlsson <magnus.karlsson@gmail.com> wrote:

> On Tue, Apr 27, 2021 at 8:28 PM Lorenzo Bianconi <lorenzo@kernel.org> wrote:
> >
> > [...]
> >  
> > > Took your patches for a test run with the AF_XDP sample xdpsock on an
> > > i40e card and the throughput degradation is between 2 to 6% depending
> > > on the setup and microbenchmark within xdpsock that is executed. And
> > > this is without sending any multi frame packets. Just single frame
> > > ones. Tirtha made changes to the i40e driver to support this new
> > > interface so that is being included in the measurements.
> > >
> > > What performance do you see with the mvneta card? How much are we
> > > willing to pay for this feature when it is not being used or can we in
> > > some way selectively turn it on only when needed?  
> >
> > Hi Magnus,
> >
> > Today I carried out some comparison tests between bpf-next and bpf-next +
> > xdp_multibuff series on mvneta running xdp_rxq_info sample. Results are
> > basically aligned:
> >
> > bpf-next:
> > - xdp drop ~ 665Kpps
> > - xdp_tx   ~ 291Kpps
> > - xdp_pass ~ 118Kpps
> >
> > bpf-next + xdp_multibuff:
> > - xdp drop ~ 672Kpps
> > - xdp_tx   ~ 288Kpps
> > - xdp_pass ~ 118Kpps
> >
> > I am not sure if results are affected by the low power CPU, I will run some
> > tests on ixgbe card.  
> 
> Thanks Lorenzo. I made some new runs, this time with i40e driver
> changes as a new data point. Same baseline as before but with patches
> [1] and [2] applied. Note
> that if you use net or net-next and i40e, you need patch [3] too.
> 
> The i40e multi-buffer support will be posted on the mailing list as a
> separate RFC patch so you can reproduce and review.
> 
> Note, calculations are performed on non-truncated numbers. So 2 ns
> might be 5 cycles on my 2.1 GHz machine since 2.49 ns * 2.1 GHz =
> 5.229 cycles ~ 5 cycles. xdpsock is run in zero-copy mode so it uses
> the zero-copy driver data path in contrast with xdp_rxq_info that uses
> the regular driver data path. Only ran the busy-poll 1-core case this
> time. Reported numbers are the average over 3 runs.

Yes, for i40e the xdpsock zero-copy test uses another code path; this
is something we need to keep in mind.

Also remember that we designed the central xdp_do_redirect() call to
delay creation of the xdp_frame.  This is something AF_XDP ZC takes
advantage of.
Thus, the cost of the xdp_buff to xdp_frame conversion is not covered
in the tests below, and I expect this patchset to increase that cost...
(UPDATE: the XDP_TX test below actually does the xdp_frame conversion)
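The difference between the two paths can be sketched with a minimal
userspace mock-up (simplified structs and function names, not the actual
kernel code from include/net/xdp.h):

```c
#include <assert.h>
#include <stdbool.h>

/* Minimal mock-ups (simplified fields, not the real kernel layout)
 * showing where the xdp_buff -> xdp_frame conversion cost lands. */
struct xdp_buff  { void *data; unsigned int frame_sz; bool mb; };
struct xdp_frame { void *data; unsigned int frame_sz; bool mb; };

static int conversions; /* counts how often the conversion cost is paid */

/* Stand-in for xdp_convert_buff_to_frame(): with the multi-buffer
 * series applied, the mb bit has to be copied over as well. */
static struct xdp_frame *convert_buff_to_frame(struct xdp_buff *xdp,
                                               struct xdp_frame *slot)
{
        conversions++;
        slot->data = xdp->data;
        slot->frame_sz = xdp->frame_sz;
        slot->mb = xdp->mb;
        return slot;
}

/* XDP_TX in a driver like i40e converts immediately on the verdict... */
static void handle_xdp_tx(struct xdp_buff *xdp, struct xdp_frame *slot)
{
        convert_buff_to_frame(xdp, slot);
}

/* ...while the redirect path defers the decision; an AF_XDP zero-copy
 * target consumes the xdp_buff directly and never builds an xdp_frame. */
static void handle_redirect_af_xdp_zc(struct xdp_buff *xdp)
{
        (void)xdp; /* conversion deliberately skipped */
}
```

So a benchmark that only exercises the AF_XDP zero-copy redirect path
never pays the (now slightly larger) conversion cost, while XDP_TX pays
it on every packet.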


> multi-buffer patches without any driver changes:

Thank you *SO* much Magnus for these superb tests.  I absolutely love
how comprehensive your test results are.  Thank you for catching the
performance regression in this patchset. (I for one know how time
consuming these kinds of tests are; I appreciate your effort, a lot!)

> xdpsock rxdrop 1-core:
> i40e: -4.5% in throughput / +3 ns / +6 cycles
> ice: -1.5% / +1 ns / +2 cycles
> 
> xdp_rxq_info -a XDP_DROP
> i40e: -2.5% / +2 ns / +3 cycles
> ice: +6% / -3 ns / -7 cycles
> 
> xdp_rxq_info -a XDP_TX
> i40e: -10% / +15 ns / +32 cycles
> ice: -9% / +14 ns / +29 cycles

This is a clear performance regression.

Looking closer at the driver, i40e_xmit_xdp_tx_ring() actually performs
an xdp_frame conversion by calling xdp_convert_buff_to_frame(xdp).

FYI: we have started an offlist thread (and an IRC discussion with
Lorenzo) on finding the root cause.  The current lead is that, as
Alexei wisely pointed out in earlier patches, struct bitfield access
is not efficient...

As I expect we will soon need bits for an HW RX checksum indication,
and an indication of whether the metadata contains a BTF-described
area, I've asked Lorenzo to consider this and look into introducing a
flags member. (Then we just have to figure out how to make the flags
access efficient.)
 

> multi-buffer patches + i40e driver changes from Tirtha:
> 
> xdpsock rxdrop 1-core:
> i40e: -3% / +2 ns / +3 cycles
> 
> xdp_rxq_info -a XDP_DROP
> i40e: -7.5% / +5 ns / +9 cycles
> 
> xdp_rxq_info -a XDP_TX
> i40e: -10% / +15 ns / +32 cycles
> 
> Would be great if someone could rerun a similar set of experiments on
> i40e or ice, then report.
 
> [1] https://lists.osuosl.org/pipermail/intel-wired-lan/Week-of-Mon-20210419/024106.html
> [2] https://lists.osuosl.org/pipermail/intel-wired-lan/Week-of-Mon-20210426/024135.html
> [3] https://lists.osuosl.org/pipermail/intel-wired-lan/Week-of-Mon-20210426/024129.html

I'm very happy that you/we all are paying attention to keeping XDP
performance intact, as small 'paper-cuts' like +32 cycles do affect
XDP in the long run. Happy performance testing everybody :-)

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer



* Re: [PATCH v8 bpf-next 01/14] xdp: introduce mb in xdp_buff/xdp_frame
  2021-04-08 12:50 ` [PATCH v8 bpf-next 01/14] xdp: introduce mb in xdp_buff/xdp_frame Lorenzo Bianconi
  2021-04-08 18:17   ` Vladimir Oltean
@ 2021-04-29 13:36   ` Jesper Dangaard Brouer
  2021-04-29 13:54     ` Lorenzo Bianconi
  1 sibling, 1 reply; 57+ messages in thread
From: Jesper Dangaard Brouer @ 2021-04-29 13:36 UTC (permalink / raw)
  To: Lorenzo Bianconi
  Cc: bpf, netdev, lorenzo.bianconi, davem, kuba, ast, daniel, shayagr,
	sameehj, john.fastabend, dsahern, echaudro, jasowang,
	alexander.duyck, saeed, maciej.fijalkowski, brouer

On Thu,  8 Apr 2021 14:50:53 +0200
Lorenzo Bianconi <lorenzo@kernel.org> wrote:

> Introduce multi-buffer bit (mb) in xdp_frame/xdp_buffer data structure
> in order to specify if this is a linear buffer (mb = 0) or a multi-buffer
> frame (mb = 1). In the latter case the shared_info area at the end of the
> first buffer will be properly initialized to link together subsequent
> buffers.
> 
> Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
> ---
>  include/net/xdp.h | 11 +++++++++--
>  1 file changed, 9 insertions(+), 2 deletions(-)
> 
> diff --git a/include/net/xdp.h b/include/net/xdp.h
> index a5bc214a49d9..842580a61563 100644
> --- a/include/net/xdp.h
> +++ b/include/net/xdp.h
> @@ -73,7 +73,10 @@ struct xdp_buff {
>  	void *data_hard_start;
>  	struct xdp_rxq_info *rxq;
>  	struct xdp_txq_info *txq;
> -	u32 frame_sz; /* frame size to deduce data_hard_end/reserved tailroom*/
> +	u32 frame_sz:31; /* frame size to deduce data_hard_end/reserved
> +			  * tailroom
> +			  */
> +	u32 mb:1; /* xdp non-linear buffer */
>  };
>  
>  static __always_inline void
> @@ -81,6 +84,7 @@ xdp_init_buff(struct xdp_buff *xdp, u32 frame_sz, struct xdp_rxq_info *rxq)
>  {
>  	xdp->frame_sz = frame_sz;
>  	xdp->rxq = rxq;
> +	xdp->mb = 0;
>  }
>  
>  static __always_inline void
> @@ -116,7 +120,8 @@ struct xdp_frame {
>  	u16 len;
>  	u16 headroom;
>  	u32 metasize:8;
> -	u32 frame_sz:24;
> +	u32 frame_sz:23;
> +	u32 mb:1; /* xdp non-linear frame */
>  	/* Lifetime of xdp_rxq_info is limited to NAPI/enqueue time,
>  	 * while mem info is valid on remote CPU.
>  	 */

So, it seems that these bitfields are the root cause of the
performance regression.  Credit to Alexei, who wisely pointed this
out[1] already in V2 ;-)

[1] https://lore.kernel.org/netdev/20200904010705.jm6dnuyj3oq4cpjd@ast-mbp.dhcp.thefacebook.com/
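The concern can be illustrated with a hypothetical userspace mock-up of
the v8 layout (not the kernel struct itself): metasize, frame_sz and mb
all share one 32-bit word, so writing any of them compiles to a
read-modify-write (load, mask, shift, or, store) rather than a plain
store, and helpers like xdp_update_frame_from_buff() now touch that
word several times per packet.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical mock-up of the v8 xdp_frame bit layout; every field
 * below shares the same 32-bit storage unit. */
struct frame_v8 {
        uint32_t metasize:8;
        uint32_t frame_sz:23;   /* was :24 before the mb bit was stolen */
        uint32_t mb:1;
};
```

The semantics stay correct, as the assertions below show; the cost is
purely in the generated code, and frame_sz additionally loses one bit
of range (max 2^23 - 1 instead of 2^24 - 1).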


> @@ -179,6 +184,7 @@ void xdp_convert_frame_to_buff(struct xdp_frame *frame, struct xdp_buff *xdp)
>  	xdp->data_end = frame->data + frame->len;
>  	xdp->data_meta = frame->data - frame->metasize;
>  	xdp->frame_sz = frame->frame_sz;
> +	xdp->mb = frame->mb;
>  }
>  
>  static inline
> @@ -205,6 +211,7 @@ int xdp_update_frame_from_buff(struct xdp_buff *xdp,
>  	xdp_frame->headroom = headroom - sizeof(*xdp_frame);
>  	xdp_frame->metasize = metasize;
>  	xdp_frame->frame_sz = xdp->frame_sz;
> +	xdp_frame->mb = xdp->mb;
>  
>  	return 0;
>  }

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer



* Re: [PATCH v8 bpf-next 01/14] xdp: introduce mb in xdp_buff/xdp_frame
  2021-04-29 13:36   ` Jesper Dangaard Brouer
@ 2021-04-29 13:54     ` Lorenzo Bianconi
  0 siblings, 0 replies; 57+ messages in thread
From: Lorenzo Bianconi @ 2021-04-29 13:54 UTC (permalink / raw)
  To: Jesper Dangaard Brouer
  Cc: bpf, netdev, lorenzo.bianconi, davem, kuba, ast, daniel, shayagr,
	sameehj, john.fastabend, dsahern, echaudro, jasowang,
	alexander.duyck, saeed, maciej.fijalkowski

> >  static __always_inline void
> > @@ -116,7 +120,8 @@ struct xdp_frame {
> >  	u16 len;
> >  	u16 headroom;
> >  	u32 metasize:8;
> > -	u32 frame_sz:24;
> > +	u32 frame_sz:23;
> > +	u32 mb:1; /* xdp non-linear frame */
> >  	/* Lifetime of xdp_rxq_info is limited to NAPI/enqueue time,
> >  	 * while mem info is valid on remote CPU.
> >  	 */
> 
> So, it seems that these bitfield's are the root-cause of the
> performance regression.  Credit to Alexei whom wisely already point
> this out[1] in V2 ;-)
> 
> [1] https://lore.kernel.org/netdev/20200904010705.jm6dnuyj3oq4cpjd@ast-mbp.dhcp.thefacebook.com/

yes, shame on me.. yesterday, while debugging the issue reported by Magnus,
I recalled the email from Alexei.
In the approach I am currently testing (not posted upstream yet) I reduced
the size of xdp_mem_info, as proposed by Jesper in [0], and added a flags
field to xdp_frame/xdp_buff that we can use for multiple features (e.g.
multi-buffer or hw-csum hints). Doing so, running the xdp_rxq_info sample
on an ixgbe 10Gbps NIC I do not see any performance regression for xdp_tx
or xdp_drop. The same results have been reported off-list by Magnus on i40e
(we have a 1% regression on the xdp_sock tests, iiuc).
I will continue working on this.

Regards,
Lorenzo

[0] https://patchwork.kernel.org/project/netdevbpf/patch/20210409223801.104657-2-mcroce@linux.microsoft.com/
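A rough sketch of that flags-based layout (the names below are
illustrative, not from the posted series): frame_sz keeps its full 24
bits, and a separate u32 flags word can be set with a single OR and
tested with a single AND, with room left for future feature bits.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical sketch of the flags-field approach under discussion;
 * struct and macro names are made up for illustration. */
#define XDP_FLAGS_MULTI_BUFF    (1u << 0)
#define XDP_FLAGS_HW_CSUM       (1u << 1)   /* speculative future bit */

struct xdp_frame_sketch {
        uint16_t len;
        uint16_t headroom;
        uint32_t metasize:8;
        uint32_t frame_sz:24;   /* full 24 bits retained */
        uint32_t flags;         /* plain OR to set, AND to test/clear */
};

/* Testing a flag is one AND on a dedicated word, no bitfield extract. */
static inline int frame_is_mb(const struct xdp_frame_sketch *f)
{
        return !!(f->flags & XDP_FLAGS_MULTI_BUFF);
}
```
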

> 
> 
> > @@ -179,6 +184,7 @@ void xdp_convert_frame_to_buff(struct xdp_frame *frame, struct xdp_buff *xdp)
> >  	xdp->data_end = frame->data + frame->len;
> >  	xdp->data_meta = frame->data - frame->metasize;
> >  	xdp->frame_sz = frame->frame_sz;
> > +	xdp->mb = frame->mb;
> >  }
> >  
> >  static inline
> > @@ -205,6 +211,7 @@ int xdp_update_frame_from_buff(struct xdp_buff *xdp,
> >  	xdp_frame->headroom = headroom - sizeof(*xdp_frame);
> >  	xdp_frame->metasize = metasize;
> >  	xdp_frame->frame_sz = xdp->frame_sz;
> > +	xdp_frame->mb = xdp->mb;
> >  
> >  	return 0;
> >  }
> 
> -- 
> Best regards,
>   Jesper Dangaard Brouer
>   MSc.CS, Principal Kernel Engineer at Red Hat
>   LinkedIn: http://www.linkedin.com/in/brouer
> 



end of thread, other threads:[~2021-04-29 13:54 UTC | newest]

Thread overview: 57+ messages
2021-04-08 12:50 [PATCH v8 bpf-next 00/14] mvneta: introduce XDP multi-buffer support Lorenzo Bianconi
2021-04-08 12:50 ` [PATCH v8 bpf-next 01/14] xdp: introduce mb in xdp_buff/xdp_frame Lorenzo Bianconi
2021-04-08 18:17   ` Vladimir Oltean
2021-04-09 16:03     ` Lorenzo Bianconi
2021-04-29 13:36   ` Jesper Dangaard Brouer
2021-04-29 13:54     ` Lorenzo Bianconi
2021-04-08 12:50 ` [PATCH v8 bpf-next 02/14] xdp: add xdp_shared_info data structure Lorenzo Bianconi
2021-04-08 13:39   ` Vladimir Oltean
2021-04-08 14:26     ` Lorenzo Bianconi
2021-04-08 18:06   ` kernel test robot
2021-04-08 12:50 ` [PATCH v8 bpf-next 03/14] net: mvneta: update mb bit before passing the xdp buffer to eBPF layer Lorenzo Bianconi
2021-04-08 18:19   ` Vladimir Oltean
2021-04-09 16:24     ` Lorenzo Bianconi
2021-04-08 12:50 ` [PATCH v8 bpf-next 04/14] xdp: add multi-buff support to xdp_return_{buff/frame} Lorenzo Bianconi
2021-04-08 18:30   ` Vladimir Oltean
2021-04-09 16:28     ` Lorenzo Bianconi
2021-04-08 12:50 ` [PATCH v8 bpf-next 05/14] net: mvneta: add multi buffer support to XDP_TX Lorenzo Bianconi
2021-04-08 18:40   ` Vladimir Oltean
2021-04-09 16:36     ` Lorenzo Bianconi
2021-04-08 12:50 ` [PATCH v8 bpf-next 06/14] net: mvneta: enable jumbo frames for XDP Lorenzo Bianconi
2021-04-08 12:50 ` [PATCH v8 bpf-next 07/14] net: xdp: add multi-buff support to xdp_build_skb_from_frame Lorenzo Bianconi
2021-04-08 12:51 ` [PATCH v8 bpf-next 08/14] bpf: add multi-buff support to the bpf_xdp_adjust_tail() API Lorenzo Bianconi
2021-04-08 19:15   ` Vladimir Oltean
2021-04-08 20:54     ` Vladimir Oltean
2021-04-09 18:13       ` Lorenzo Bianconi
2021-04-08 12:51 ` [PATCH v8 bpf-next 09/14] bpf: add multi-buffer support to xdp copy helpers Lorenzo Bianconi
2021-04-08 20:57   ` Vladimir Oltean
2021-04-09 18:19     ` Lorenzo Bianconi
2021-04-08 21:04   ` Vladimir Oltean
2021-04-14  8:08     ` Eelco Chaudron
2021-04-08 12:51 ` [PATCH v8 bpf-next 10/14] bpf: add new frame_length field to the XDP ctx Lorenzo Bianconi
2021-04-08 12:51 ` [PATCH v8 bpf-next 11/14] bpf: move user_size out of bpf_test_init Lorenzo Bianconi
2021-04-08 12:51 ` [PATCH v8 bpf-next 12/14] bpf: introduce multibuff support to bpf_prog_test_run_xdp() Lorenzo Bianconi
2021-04-08 12:51 ` [PATCH v8 bpf-next 13/14] bpf: test_run: add xdp_shared_info pointer in bpf_test_finish signature Lorenzo Bianconi
2021-04-08 12:51 ` [PATCH v8 bpf-next 14/14] bpf: update xdp_adjust_tail selftest to include multi-buffer Lorenzo Bianconi
2021-04-09  0:56 ` [PATCH v8 bpf-next 00/14] mvneta: introduce XDP multi-buffer support John Fastabend
2021-04-09 20:16   ` Lorenzo Bianconi
2021-04-13 15:16   ` Eelco Chaudron
2021-04-16 14:27 ` Magnus Karlsson
2021-04-16 21:29   ` Lorenzo Bianconi
2021-04-16 23:00     ` Daniel Borkmann
2021-04-18 16:18   ` Jesper Dangaard Brouer
2021-04-19  6:20     ` Magnus Karlsson
2021-04-19  6:55       ` Lorenzo Bianconi
2021-04-20 13:49         ` Magnus Karlsson
2021-04-21 12:47           ` Jesper Dangaard Brouer
2021-04-21 14:12             ` Magnus Karlsson
2021-04-21 15:39               ` Jesper Dangaard Brouer
2021-04-22 10:24                 ` Magnus Karlsson
2021-04-22 14:42                   ` Jesper Dangaard Brouer
2021-04-22 15:05                     ` Crash for i40e on net-next (was: [PATCH v8 bpf-next 00/14] mvneta: introduce XDP multi-buffer support) Jesper Dangaard Brouer
2021-04-23  5:28                       ` Magnus Karlsson
2021-04-23 16:43                         ` Alexander Duyck
2021-04-25  9:45                           ` Magnus Karlsson
2021-04-27 18:28   ` [PATCH v8 bpf-next 00/14] mvneta: introduce XDP multi-buffer support Lorenzo Bianconi
2021-04-28  7:41     ` Magnus Karlsson
2021-04-29 12:49       ` Jesper Dangaard Brouer
