* [PATCH v9 bpf-next 00/14] mvneta: introduce XDP multi-buffer support
@ 2021-06-14 12:49 Lorenzo Bianconi
  2021-06-14 12:49 ` [PATCH v9 bpf-next 01/14] net: skbuff: add data_len field to skb_shared_info Lorenzo Bianconi
                   ` (14 more replies)
  0 siblings, 15 replies; 53+ messages in thread
From: Lorenzo Bianconi @ 2021-06-14 12:49 UTC (permalink / raw)
  To: bpf, netdev
  Cc: lorenzo.bianconi, davem, kuba, ast, daniel, shayagr, sameehj,
	john.fastabend, dsahern, brouer, echaudro, jasowang,
	alexander.duyck, saeed, maciej.fijalkowski, magnus.karlsson,
	tirthendu.sarkar


This series introduces XDP multi-buffer support. The mvneta driver is
the first to support these new "non-linear" xdp_{buff,frame}. Reviewers,
please focus on how these new types of xdp_{buff,frame} packets
traverse the different layers and on the layout design. The BPF helpers
are deliberately kept simple: the internal layout is not exposed, so it
can still be changed later.

For now, to keep the design simple and to maintain performance, the XDP
BPF program (still) only has access to the first buffer. Payload access
across multiple buffers is left for later (another patchset), but this
patchset should still allow for such future extensions. The goal is to
lift the MTU restriction that comes with XDP while maintaining the same
performance as before.

The main idea for the new multi-buffer layout is to reuse the same
layout used for non-linear SKBs. This relies on the skb_shared_info
struct at the end of the first buffer to link together subsequent
buffers. Keeping the layout compatible with SKBs also eases and speeds
up creating an SKB from an xdp_{buff,frame}.
Converting an xdp_frame to an SKB and delivering it to the network stack
is shown in patch 07/14 (e.g. for cpumaps).
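
As a quick orientation for reviewers, the paged area is reached from the
first buffer via the existing xdp_get_shared_info_from_buff() helper;
a minimal sketch (not part of this series), given a struct xdp_buff *xdp:

	/* shared_info lives in the tailroom of the first buffer */
	struct skb_shared_info *sinfo = xdp_get_shared_info_from_buff(xdp);

	/* with the mb bit set (see below), sinfo->nr_frags and
	 * sinfo->frags[] describe the subsequent buffers, exactly like
	 * for a non-linear SKB
	 */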

A multi-buffer bit (mb) has been introduced in the flags field of the
xdp_{buff,frame} structures to notify the BPF/network layer whether this
is an xdp multi-buffer frame (mb = 1) or not (mb = 0).
The mb bit will be set by an xdp multi-buffer capable driver only for
non-linear frames, so linear frames can still be received without any
extra cost: the skb_shared_info structure at the end of the first buffer
will be initialized only if mb is set.
Moreover, the flags field in xdp_{buff,frame} will be reused for xdp rx
csum offloading in a future series.
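
A minimal sketch of the expected driver-side discipline, using the
helpers added in patch 02/14 and modelled on the mvneta changes in
patches 03-05/14 (given a struct xdp_buff *xdp being built on RX):

	xdp_buff_clear_mb(xdp);		/* frames are linear by default */

	/* when appending a fragment to the first buffer */
	if (!xdp_buff_is_mb(xdp))
		xdp_buff_set_mb(xdp);

	/* consumers touch the shared_info area only if mb is set */
	if (xdp_buff_is_mb(xdp))
		sinfo = xdp_get_shared_info_from_buff(xdp);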

Typical use cases for this series are:
- Jumbo-frames
- Packet header split (please see Google’s use-case @ NetDevConf 0x14, [0])
- TSO

A new bpf helper (bpf_xdp_get_buff_len) has been introduced in order to
report the total frame size (linear + paged parts) to the eBPF layer.

The bpf_xdp_adjust_tail and bpf_xdp_copy helpers have been modified to
take xdp multi-buff frames into account.
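
As an illustration of the reworked tail adjustment, an XDP program can
now shrink (or grow) the tail of a multi-buff frame as well; a hedged
sketch (program and section names are made up, the helper itself is the
existing bpf_xdp_adjust_tail):

	#include <linux/bpf.h>
	#include <bpf/bpf_helpers.h>

	SEC("xdp")
	int xdp_trim_tail(struct xdp_md *ctx)
	{
		/* a negative delta may now release whole tail fragments */
		if (bpf_xdp_adjust_tail(ctx, -512) < 0)
			return XDP_DROP;
		return XDP_PASS;
	}

	char _license[] SEC("license") = "GPL";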

More info about the main idea behind this approach can be found here [1][2].

Changes since v8:
- add proper dma unmapping if XDP_TX fails on mvneta for an xdp multi-buff
- switch back to skb_shared_info implementation from previous xdp_shared_info
  one
- avoid using a bitfield in xdp_buff/xdp_frame since it introduces performance
  regressions. Tested now on a 10G NIC (ixgbe) to verify there are no performance
  penalties for the regular codebase
- add bpf_xdp_get_buff_len helper and remove frame_length field in xdp ctx
- add data_len field in skb_shared_info struct

Changes since v7:
- rebase on top of bpf-next
- fix sparse warnings
- improve comments for frame_length in include/net/xdp.h

Changes since v6:
- the main difference with respect to previous versions is the new approach
  proposed by Eelco to pass the full length of the packet to the eBPF layer in
  the XDP context
- reintroduce multi-buff support to eBPF self-tests
- reintroduce multi-buff support to bpf_xdp_adjust_tail helper
- introduce multi-buffer support to bpf_xdp_copy helper
- rebase on top of bpf-next

Changes since v5:
- rebase on top of bpf-next
- initialize mb bit in xdp_init_buff() and drop per-driver initialization
- drop xdp->mb initialization in xdp_convert_zc_to_xdp_frame()
- postpone introduction of frame_length field in XDP ctx to another series
- minor changes

Changes since v4:
- rebase on top of bpf-next
- introduce xdp_shared_info to build xdp multi-buff instead of using the
  skb_shared_info struct
- introduce frame_length in xdp ctx
- drop previous bpf helpers
- fix bpf_xdp_adjust_tail for xdp multi-buff
- introduce xdp multi-buff self-tests for bpf_xdp_adjust_tail
- fix xdp_return_frame_bulk for xdp multi-buff

Changes since v3:
- rebase on top of bpf-next
- add patch 10/13 to copy back paged data from an xdp multi-buff frame to
  userspace buffer for xdp multi-buff selftests

Changes since v2:
- add throughput measurements
- drop bpf_xdp_adjust_mb_header bpf helper
- introduce selftest for xdp multibuffer
- addressed comments on bpf_xdp_get_frags_count
- introduce xdp multi-buff support to cpumaps

Changes since v1:
- Fix use-after-free in xdp_return_{buff/frame}
- Introduce bpf helpers
- Introduce xdp_mb sample program
- access skb_shared_info->nr_frags only on the last fragment

Changes since RFC:
- squash multi-buffer bit initialization in a single patch
- add mvneta non-linear XDP buff support for tx side

[0] https://netdevconf.info/0x14/session.html?talk-the-path-to-tcp-4k-mtu-and-rx-zerocopy
[1] https://github.com/xdp-project/xdp-project/blob/master/areas/core/xdp-multi-buffer01-design.org
[2] https://netdevconf.info/0x14/session.html?tutorial-add-XDP-support-to-a-NIC-driver (XDPmulti-buffers section)

Eelco Chaudron (3):
  bpf: add multi-buff support to the bpf_xdp_adjust_tail() API
  bpf: add multi-buffer support to xdp copy helpers
  bpf: update xdp_adjust_tail selftest to include multi-buffer

Lorenzo Bianconi (11):
  net: skbuff: add data_len field to skb_shared_info
  xdp: introduce flags field in xdp_buff/xdp_frame
  net: mvneta: update mb bit before passing the xdp buffer to eBPF layer
  xdp: add multi-buff support to xdp_return_{buff/frame}
  net: mvneta: add multi buffer support to XDP_TX
  net: mvneta: enable jumbo frames for XDP
  net: xdp: add multi-buff support to xdp_build_skb_from_frame
  bpf: introduce bpf_xdp_get_buff_len helper
  bpf: move user_size out of bpf_test_init
  bpf: introduce multibuff support to bpf_prog_test_run_xdp()
  bpf: test_run: add xdp_shared_info pointer in bpf_test_finish
    signature

 drivers/net/ethernet/marvell/mvneta.c         | 143 ++++++++++------
 include/linux/skbuff.h                        |   5 +-
 include/net/xdp.h                             |  56 ++++++-
 include/uapi/linux/bpf.h                      |   7 +
 kernel/trace/bpf_trace.c                      |   3 +
 net/bpf/test_run.c                            | 108 +++++++++---
 net/core/filter.c                             | 157 +++++++++++++++++-
 net/core/xdp.c                                |  72 +++++++-
 tools/include/uapi/linux/bpf.h                |   7 +
 .../bpf/prog_tests/xdp_adjust_tail.c          | 105 ++++++++++++
 .../selftests/bpf/prog_tests/xdp_bpf2bpf.c    | 127 +++++++++-----
 .../bpf/progs/test_xdp_adjust_tail_grow.c     |  10 +-
 .../bpf/progs/test_xdp_adjust_tail_shrink.c   |  32 +++-
 .../selftests/bpf/progs/test_xdp_bpf2bpf.c    |   2 +-
 14 files changed, 705 insertions(+), 129 deletions(-)

-- 
2.31.1



* [PATCH v9 bpf-next 01/14] net: skbuff: add data_len field to skb_shared_info
  2021-06-14 12:49 [PATCH v9 bpf-next 00/14] mvneta: introduce XDP multi-buffer support Lorenzo Bianconi
@ 2021-06-14 12:49 ` Lorenzo Bianconi
  2021-06-28 19:58   ` Alexander Duyck
  2021-06-14 12:49 ` [PATCH v9 bpf-next 02/14] xdp: introduce flags field in xdp_buff/xdp_frame Lorenzo Bianconi
                   ` (13 subsequent siblings)
  14 siblings, 1 reply; 53+ messages in thread
From: Lorenzo Bianconi @ 2021-06-14 12:49 UTC (permalink / raw)
  To: bpf, netdev
  Cc: lorenzo.bianconi, davem, kuba, ast, daniel, shayagr, sameehj,
	john.fastabend, dsahern, brouer, echaudro, jasowang,
	alexander.duyck, saeed, maciej.fijalkowski, magnus.karlsson,
	tirthendu.sarkar

The data_len field will be used for the paged frame length of an
xdp_buff/xdp_frame. This is a preliminary patch to properly support
xdp multi-buff.
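
For context, once this lands the total frame length of a multi-buff
xdp_buff can be derived as the later bpf_xdp_get_buff_len helper
(patch 09/14) does; a sketch using the helpers added in patch 02/14:

	u32 len = xdp->data_end - xdp->data;	/* linear part */

	if (xdp_buff_is_mb(xdp))
		len += xdp_get_shared_info_from_buff(xdp)->data_len;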

Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
---
 include/linux/skbuff.h | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index dbf820a50a39..332ec56c200d 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -522,7 +522,10 @@ struct skb_shared_info {
 	struct sk_buff	*frag_list;
 	struct skb_shared_hwtstamps hwtstamps;
 	unsigned int	gso_type;
-	u32		tskey;
+	union {
+		u32	tskey;
+		u32	data_len;
+	};
 
 	/*
 	 * Warning : all fields before dataref are cleared in __alloc_skb()
-- 
2.31.1



* [PATCH v9 bpf-next 02/14] xdp: introduce flags field in xdp_buff/xdp_frame
  2021-06-14 12:49 [PATCH v9 bpf-next 00/14] mvneta: introduce XDP multi-buffer support Lorenzo Bianconi
  2021-06-14 12:49 ` [PATCH v9 bpf-next 01/14] net: skbuff: add data_len field to skb_shared_info Lorenzo Bianconi
@ 2021-06-14 12:49 ` Lorenzo Bianconi
  2021-06-28 20:14   ` Alexander Duyck
  2021-06-14 12:49 ` [PATCH v9 bpf-next 03/14] net: mvneta: update mb bit before passing the xdp buffer to eBPF layer Lorenzo Bianconi
                   ` (12 subsequent siblings)
  14 siblings, 1 reply; 53+ messages in thread
From: Lorenzo Bianconi @ 2021-06-14 12:49 UTC (permalink / raw)
  To: bpf, netdev
  Cc: lorenzo.bianconi, davem, kuba, ast, daniel, shayagr, sameehj,
	john.fastabend, dsahern, brouer, echaudro, jasowang,
	alexander.duyck, saeed, maciej.fijalkowski, magnus.karlsson,
	tirthendu.sarkar

Introduce a flags field in the xdp_frame/xdp_buff data structures to
describe additional buffer features. At the moment the only supported
buffer feature is the multi-buffer bit (mb), which is used to specify
whether this is a linear buffer (mb = 0) or a multi-buffer frame
(mb = 1). In the latter case the shared_info area at the end of the
first buffer will be properly initialized to link together subsequent
buffers.
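
Since xdp_update_frame_from_buff()/xdp_convert_frame_to_buff() below
carry the flags over, the mb bit also survives the xdp_buff -> xdp_frame
conversion; e.g. a driver's ndo_xdp_xmit() path can then use a check
along these lines (sketch):

	if (xdp_frame_is_mb(xdpf))
		nr_frags = xdp_get_shared_info_from_frame(xdpf)->nr_frags;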

Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
---
 include/net/xdp.h | 31 ++++++++++++++++++++++++++++++-
 1 file changed, 30 insertions(+), 1 deletion(-)

diff --git a/include/net/xdp.h b/include/net/xdp.h
index 5533f0ab2afc..de11d981fa44 100644
--- a/include/net/xdp.h
+++ b/include/net/xdp.h
@@ -66,6 +66,10 @@ struct xdp_txq_info {
 	struct net_device *dev;
 };
 
+enum xdp_buff_flags {
+	XDP_FLAGS_MULTI_BUFF	= BIT(0), /* non-linear xdp buff */
+};
+
 struct xdp_buff {
 	void *data;
 	void *data_end;
@@ -74,13 +78,30 @@ struct xdp_buff {
 	struct xdp_rxq_info *rxq;
 	struct xdp_txq_info *txq;
 	u32 frame_sz; /* frame size to deduce data_hard_end/reserved tailroom*/
+	u16 flags; /* supported values defined in xdp_flags */
 };
 
+static __always_inline bool xdp_buff_is_mb(struct xdp_buff *xdp)
+{
+	return !!(xdp->flags & XDP_FLAGS_MULTI_BUFF);
+}
+
+static __always_inline void xdp_buff_set_mb(struct xdp_buff *xdp)
+{
+	xdp->flags |= XDP_FLAGS_MULTI_BUFF;
+}
+
+static __always_inline void xdp_buff_clear_mb(struct xdp_buff *xdp)
+{
+	xdp->flags &= ~XDP_FLAGS_MULTI_BUFF;
+}
+
 static __always_inline void
 xdp_init_buff(struct xdp_buff *xdp, u32 frame_sz, struct xdp_rxq_info *rxq)
 {
 	xdp->frame_sz = frame_sz;
 	xdp->rxq = rxq;
+	xdp->flags = 0;
 }
 
 static __always_inline void
@@ -117,13 +138,19 @@ struct xdp_frame {
 	u16 headroom;
 	u32 metasize:8;
 	u32 frame_sz:24;
+	struct net_device *dev_rx; /* used by cpumap */
 	/* Lifetime of xdp_rxq_info is limited to NAPI/enqueue time,
 	 * while mem info is valid on remote CPU.
 	 */
 	struct xdp_mem_info mem;
-	struct net_device *dev_rx; /* used by cpumap */
+	u16 flags; /* supported values defined in xdp_flags */
 };
 
+static __always_inline bool xdp_frame_is_mb(struct xdp_frame *frame)
+{
+	return !!(frame->flags & XDP_FLAGS_MULTI_BUFF);
+}
+
 #define XDP_BULK_QUEUE_SIZE	16
 struct xdp_frame_bulk {
 	int count;
@@ -180,6 +207,7 @@ void xdp_convert_frame_to_buff(struct xdp_frame *frame, struct xdp_buff *xdp)
 	xdp->data_end = frame->data + frame->len;
 	xdp->data_meta = frame->data - frame->metasize;
 	xdp->frame_sz = frame->frame_sz;
+	xdp->flags = frame->flags;
 }
 
 static inline
@@ -206,6 +234,7 @@ int xdp_update_frame_from_buff(struct xdp_buff *xdp,
 	xdp_frame->headroom = headroom - sizeof(*xdp_frame);
 	xdp_frame->metasize = metasize;
 	xdp_frame->frame_sz = xdp->frame_sz;
+	xdp_frame->flags = xdp->flags;
 
 	return 0;
 }
-- 
2.31.1



* [PATCH v9 bpf-next 03/14] net: mvneta: update mb bit before passing the xdp buffer to eBPF layer
  2021-06-14 12:49 [PATCH v9 bpf-next 00/14] mvneta: introduce XDP multi-buffer support Lorenzo Bianconi
  2021-06-14 12:49 ` [PATCH v9 bpf-next 01/14] net: skbuff: add data_len field to skb_shared_info Lorenzo Bianconi
  2021-06-14 12:49 ` [PATCH v9 bpf-next 02/14] xdp: introduce flags field in xdp_buff/xdp_frame Lorenzo Bianconi
@ 2021-06-14 12:49 ` Lorenzo Bianconi
  2021-06-14 12:49 ` [PATCH v9 bpf-next 04/14] xdp: add multi-buff support to xdp_return_{buff/frame} Lorenzo Bianconi
                   ` (11 subsequent siblings)
  14 siblings, 0 replies; 53+ messages in thread
From: Lorenzo Bianconi @ 2021-06-14 12:49 UTC (permalink / raw)
  To: bpf, netdev
  Cc: lorenzo.bianconi, davem, kuba, ast, daniel, shayagr, sameehj,
	john.fastabend, dsahern, brouer, echaudro, jasowang,
	alexander.duyck, saeed, maciej.fijalkowski, magnus.karlsson,
	tirthendu.sarkar

Update the multi-buffer bit (mb) in the xdp_buff to notify the XDP/eBPF
layer and XDP remote drivers if this is a "non-linear" XDP buffer.
Access the skb_shared_info area only if the xdp_buff mb bit is set.

Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
---
 drivers/net/ethernet/marvell/mvneta.c | 21 ++++++++++++++++-----
 1 file changed, 16 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ethernet/marvell/mvneta.c b/drivers/net/ethernet/marvell/mvneta.c
index 7d5cd9bc6c99..f8993f7488b9 100644
--- a/drivers/net/ethernet/marvell/mvneta.c
+++ b/drivers/net/ethernet/marvell/mvneta.c
@@ -2041,9 +2041,14 @@ mvneta_xdp_put_buff(struct mvneta_port *pp, struct mvneta_rx_queue *rxq,
 {
 	int i;
 
+	if (likely(!xdp_buff_is_mb(xdp)))
+		goto out;
+
 	for (i = 0; i < sinfo->nr_frags; i++)
 		page_pool_put_full_page(rxq->page_pool,
 					skb_frag_page(&sinfo->frags[i]), true);
+
+out:
 	page_pool_put_page(rxq->page_pool, virt_to_head_page(xdp->data),
 			   sync_len, true);
 }
@@ -2245,7 +2250,6 @@ mvneta_swbm_rx_frame(struct mvneta_port *pp,
 	int data_len = -MVNETA_MH_SIZE, len;
 	struct net_device *dev = pp->dev;
 	enum dma_data_direction dma_dir;
-	struct skb_shared_info *sinfo;
 
 	if (*size > MVNETA_MAX_RX_BUF_SIZE) {
 		len = MVNETA_MAX_RX_BUF_SIZE;
@@ -2267,9 +2271,6 @@ mvneta_swbm_rx_frame(struct mvneta_port *pp,
 	prefetch(data);
 	xdp_prepare_buff(xdp, data, pp->rx_offset_correction + MVNETA_MH_SIZE,
 			 data_len, false);
-
-	sinfo = xdp_get_shared_info_from_buff(xdp);
-	sinfo->nr_frags = 0;
 }
 
 static void
@@ -2304,12 +2305,18 @@ mvneta_swbm_add_rx_fragment(struct mvneta_port *pp,
 		skb_frag_size_set(frag, data_len);
 		__skb_frag_set_page(frag, page);
 
+		if (!xdp_buff_is_mb(xdp)) {
+			xdp_sinfo->data_len = *size;
+			xdp_buff_set_mb(xdp);
+		}
 		/* last fragment */
 		if (len == *size) {
 			struct skb_shared_info *sinfo;
 
 			sinfo = xdp_get_shared_info_from_buff(xdp);
 			sinfo->nr_frags = xdp_sinfo->nr_frags;
+			sinfo->data_len = xdp_sinfo->data_len;
+
 			memcpy(sinfo->frags, xdp_sinfo->frags,
 			       sinfo->nr_frags * sizeof(skb_frag_t));
 		}
@@ -2324,9 +2331,12 @@ mvneta_swbm_build_skb(struct mvneta_port *pp, struct mvneta_rx_queue *rxq,
 		      struct xdp_buff *xdp, u32 desc_status)
 {
 	struct skb_shared_info *sinfo = xdp_get_shared_info_from_buff(xdp);
-	int i, num_frags = sinfo->nr_frags;
+	int i, num_frags = 0;
 	struct sk_buff *skb;
 
+	if (unlikely(xdp_buff_is_mb(xdp)))
+		num_frags = sinfo->nr_frags;
+
 	skb = build_skb(xdp->data_hard_start, PAGE_SIZE);
 	if (!skb)
 		return ERR_PTR(-ENOMEM);
@@ -2398,6 +2408,7 @@ static int mvneta_rx_swbm(struct napi_struct *napi,
 			frame_sz = size - ETH_FCS_LEN;
 			desc_status = rx_status;
 
+			xdp_buff_clear_mb(&xdp_buf);
 			mvneta_swbm_rx_frame(pp, rx_desc, rxq, &xdp_buf,
 					     &size, page);
 		} else {
-- 
2.31.1



* [PATCH v9 bpf-next 04/14] xdp: add multi-buff support to xdp_return_{buff/frame}
  2021-06-14 12:49 [PATCH v9 bpf-next 00/14] mvneta: introduce XDP multi-buffer support Lorenzo Bianconi
                   ` (2 preceding siblings ...)
  2021-06-14 12:49 ` [PATCH v9 bpf-next 03/14] net: mvneta: update mb bit before passing the xdp buffer to eBPF layer Lorenzo Bianconi
@ 2021-06-14 12:49 ` Lorenzo Bianconi
  2021-06-14 12:49 ` [PATCH v9 bpf-next 05/14] net: mvneta: add multi buffer support to XDP_TX Lorenzo Bianconi
                   ` (10 subsequent siblings)
  14 siblings, 0 replies; 53+ messages in thread
From: Lorenzo Bianconi @ 2021-06-14 12:49 UTC (permalink / raw)
  To: bpf, netdev
  Cc: lorenzo.bianconi, davem, kuba, ast, daniel, shayagr, sameehj,
	john.fastabend, dsahern, brouer, echaudro, jasowang,
	alexander.duyck, saeed, maciej.fijalkowski, magnus.karlsson,
	tirthendu.sarkar

Take into account whether the received xdp_buff/xdp_frame is non-linear
when recycling/returning the frame memory to the allocator or into an
xdp_frame_bulk.
Introduce xdp_return_num_frags_from_buff to return a given number of
fragments from an xdp multi-buff starting from the tail.
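
In practice this keeps the driver-side drop/completion paths unchanged;
e.g. on an XDP_DROP a driver can keep calling (sketch):

	/* releases sinfo->frags[] first when the mb bit is set, then the
	 * head buffer, all via __xdp_return()
	 */
	xdp_return_buff(xdp);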

Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
---
 include/net/xdp.h | 18 ++++++++++++++--
 net/core/xdp.c    | 54 ++++++++++++++++++++++++++++++++++++++++++++++-
 2 files changed, 69 insertions(+), 3 deletions(-)

diff --git a/include/net/xdp.h b/include/net/xdp.h
index de11d981fa44..935a6f83115f 100644
--- a/include/net/xdp.h
+++ b/include/net/xdp.h
@@ -275,10 +275,24 @@ void __xdp_release_frame(void *data, struct xdp_mem_info *mem);
 static inline void xdp_release_frame(struct xdp_frame *xdpf)
 {
 	struct xdp_mem_info *mem = &xdpf->mem;
+	struct skb_shared_info *sinfo;
+	int i;
 
 	/* Curr only page_pool needs this */
-	if (mem->type == MEM_TYPE_PAGE_POOL)
-		__xdp_release_frame(xdpf->data, mem);
+	if (mem->type != MEM_TYPE_PAGE_POOL)
+		return;
+
+	if (likely(!xdp_frame_is_mb(xdpf)))
+		goto out;
+
+	sinfo = xdp_get_shared_info_from_frame(xdpf);
+	for (i = 0; i < sinfo->nr_frags; i++) {
+		struct page *page = skb_frag_page(&sinfo->frags[i]);
+
+		__xdp_release_frame(page_address(page), mem);
+	}
+out:
+	__xdp_release_frame(xdpf->data, mem);
 }
 
 int xdp_rxq_info_reg(struct xdp_rxq_info *xdp_rxq,
diff --git a/net/core/xdp.c b/net/core/xdp.c
index 725d20f1b100..f61c63115c95 100644
--- a/net/core/xdp.c
+++ b/net/core/xdp.c
@@ -375,12 +375,38 @@ static void __xdp_return(void *data, struct xdp_mem_info *mem, bool napi_direct,
 
 void xdp_return_frame(struct xdp_frame *xdpf)
 {
+	struct skb_shared_info *sinfo;
+	int i;
+
+	if (likely(!xdp_frame_is_mb(xdpf)))
+		goto out;
+
+	sinfo = xdp_get_shared_info_from_frame(xdpf);
+	for (i = 0; i < sinfo->nr_frags; i++) {
+		struct page *page = skb_frag_page(&sinfo->frags[i]);
+
+		__xdp_return(page_address(page), &xdpf->mem, false, NULL);
+	}
+out:
 	__xdp_return(xdpf->data, &xdpf->mem, false, NULL);
 }
 EXPORT_SYMBOL_GPL(xdp_return_frame);
 
 void xdp_return_frame_rx_napi(struct xdp_frame *xdpf)
 {
+	struct skb_shared_info *sinfo;
+	int i;
+
+	if (likely(!xdp_frame_is_mb(xdpf)))
+		goto out;
+
+	sinfo = xdp_get_shared_info_from_frame(xdpf);
+	for (i = 0; i < sinfo->nr_frags; i++) {
+		struct page *page = skb_frag_page(&sinfo->frags[i]);
+
+		__xdp_return(page_address(page), &xdpf->mem, true, NULL);
+	}
+out:
 	__xdp_return(xdpf->data, &xdpf->mem, true, NULL);
 }
 EXPORT_SYMBOL_GPL(xdp_return_frame_rx_napi);
@@ -416,7 +442,7 @@ void xdp_return_frame_bulk(struct xdp_frame *xdpf,
 	struct xdp_mem_allocator *xa;
 
 	if (mem->type != MEM_TYPE_PAGE_POOL) {
-		__xdp_return(xdpf->data, &xdpf->mem, false, NULL);
+		xdp_return_frame(xdpf);
 		return;
 	}
 
@@ -435,12 +461,38 @@ void xdp_return_frame_bulk(struct xdp_frame *xdpf,
 		bq->xa = rhashtable_lookup(mem_id_ht, &mem->id, mem_id_rht_params);
 	}
 
+	if (unlikely(xdp_frame_is_mb(xdpf))) {
+		struct skb_shared_info *sinfo;
+		int i;
+
+		sinfo = xdp_get_shared_info_from_frame(xdpf);
+		for (i = 0; i < sinfo->nr_frags; i++) {
+			skb_frag_t *frag = &sinfo->frags[i];
+
+			bq->q[bq->count++] = skb_frag_address(frag);
+			if (bq->count == XDP_BULK_QUEUE_SIZE)
+				xdp_flush_frame_bulk(bq);
+		}
+	}
 	bq->q[bq->count++] = xdpf->data;
 }
 EXPORT_SYMBOL_GPL(xdp_return_frame_bulk);
 
 void xdp_return_buff(struct xdp_buff *xdp)
 {
+	struct skb_shared_info *sinfo;
+	int i;
+
+	if (likely(!xdp_buff_is_mb(xdp)))
+		goto out;
+
+	sinfo = xdp_get_shared_info_from_buff(xdp);
+	for (i = 0; i < sinfo->nr_frags; i++) {
+		struct page *page = skb_frag_page(&sinfo->frags[i]);
+
+		__xdp_return(page_address(page), &xdp->rxq->mem, true, xdp);
+	}
+out:
 	__xdp_return(xdp->data, &xdp->rxq->mem, true, xdp);
 }
 
-- 
2.31.1



* [PATCH v9 bpf-next 05/14] net: mvneta: add multi buffer support to XDP_TX
  2021-06-14 12:49 [PATCH v9 bpf-next 00/14] mvneta: introduce XDP multi-buffer support Lorenzo Bianconi
                   ` (3 preceding siblings ...)
  2021-06-14 12:49 ` [PATCH v9 bpf-next 04/14] xdp: add multi-buff support to xdp_return_{buff/frame} Lorenzo Bianconi
@ 2021-06-14 12:49 ` Lorenzo Bianconi
  2021-06-14 12:49 ` [PATCH v9 bpf-next 06/14] net: mvneta: enable jumbo frames for XDP Lorenzo Bianconi
                   ` (9 subsequent siblings)
  14 siblings, 0 replies; 53+ messages in thread
From: Lorenzo Bianconi @ 2021-06-14 12:49 UTC (permalink / raw)
  To: bpf, netdev
  Cc: lorenzo.bianconi, davem, kuba, ast, daniel, shayagr, sameehj,
	john.fastabend, dsahern, brouer, echaudro, jasowang,
	alexander.duyck, saeed, maciej.fijalkowski, magnus.karlsson,
	tirthendu.sarkar

Introduce the capability to map a non-linear xdp buffer in
mvneta_xdp_submit_frame() for XDP_TX and XDP_REDIRECT.
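
For reviewers, the core of the change below condensed into a sketch: one
tx descriptor per buffer, the first one for the linear part and the
remaining ones for the fragments:

	int num_frames = 1;

	if (unlikely(xdp_frame_is_mb(xdpf)))
		num_frames += sinfo->nr_frags;

	for (i = 0; i < num_frames; i++) {
		skb_frag_t *frag = i ? &sinfo->frags[i - 1] : NULL;
		int len = frag ? skb_frag_size(frag) : xdpf->len;
		void *data = frag ? skb_frag_address(frag) : xdpf->data;

		/* map/sync data+len into the i-th tx descriptor here */
	}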

Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
---
 drivers/net/ethernet/marvell/mvneta.c | 112 +++++++++++++++++---------
 1 file changed, 76 insertions(+), 36 deletions(-)

diff --git a/drivers/net/ethernet/marvell/mvneta.c b/drivers/net/ethernet/marvell/mvneta.c
index f8993f7488b9..ab57be38ba03 100644
--- a/drivers/net/ethernet/marvell/mvneta.c
+++ b/drivers/net/ethernet/marvell/mvneta.c
@@ -1860,8 +1860,8 @@ static void mvneta_txq_bufs_free(struct mvneta_port *pp,
 			bytes_compl += buf->skb->len;
 			pkts_compl++;
 			dev_kfree_skb_any(buf->skb);
-		} else if (buf->type == MVNETA_TYPE_XDP_TX ||
-			   buf->type == MVNETA_TYPE_XDP_NDO) {
+		} else if ((buf->type == MVNETA_TYPE_XDP_TX ||
+			    buf->type == MVNETA_TYPE_XDP_NDO) && buf->xdpf) {
 			if (napi && buf->type == MVNETA_TYPE_XDP_TX)
 				xdp_return_frame_rx_napi(buf->xdpf);
 			else
@@ -2055,47 +2055,87 @@ mvneta_xdp_put_buff(struct mvneta_port *pp, struct mvneta_rx_queue *rxq,
 
 static int
 mvneta_xdp_submit_frame(struct mvneta_port *pp, struct mvneta_tx_queue *txq,
-			struct xdp_frame *xdpf, bool dma_map)
+			struct xdp_frame *xdpf, int *nxmit_byte, bool dma_map)
 {
-	struct mvneta_tx_desc *tx_desc;
-	struct mvneta_tx_buf *buf;
-	dma_addr_t dma_addr;
+	struct skb_shared_info *sinfo = xdp_get_shared_info_from_frame(xdpf);
+	struct device *dev = pp->dev->dev.parent;
+	struct mvneta_tx_desc *tx_desc = NULL;
+	int i, num_frames = 1;
+	struct page *page;
+
+	if (unlikely(xdp_frame_is_mb(xdpf)))
+		num_frames += sinfo->nr_frags;
 
-	if (txq->count >= txq->tx_stop_threshold)
+	if (txq->count + num_frames >= txq->size)
 		return MVNETA_XDP_DROPPED;
 
-	tx_desc = mvneta_txq_next_desc_get(txq);
+	for (i = 0; i < num_frames; i++) {
+		struct mvneta_tx_buf *buf = &txq->buf[txq->txq_put_index];
+		skb_frag_t *frag = NULL;
+		int len = xdpf->len;
+		dma_addr_t dma_addr;
 
-	buf = &txq->buf[txq->txq_put_index];
-	if (dma_map) {
-		/* ndo_xdp_xmit */
-		dma_addr = dma_map_single(pp->dev->dev.parent, xdpf->data,
-					  xdpf->len, DMA_TO_DEVICE);
-		if (dma_mapping_error(pp->dev->dev.parent, dma_addr)) {
-			mvneta_txq_desc_put(txq);
-			return MVNETA_XDP_DROPPED;
+		if (unlikely(i)) { /* paged area */
+			frag = &sinfo->frags[i - 1];
+			len = skb_frag_size(frag);
 		}
-		buf->type = MVNETA_TYPE_XDP_NDO;
-	} else {
-		struct page *page = virt_to_page(xdpf->data);
 
-		dma_addr = page_pool_get_dma_addr(page) +
-			   sizeof(*xdpf) + xdpf->headroom;
-		dma_sync_single_for_device(pp->dev->dev.parent, dma_addr,
-					   xdpf->len, DMA_BIDIRECTIONAL);
-		buf->type = MVNETA_TYPE_XDP_TX;
+		tx_desc = mvneta_txq_next_desc_get(txq);
+		if (dma_map) {
+			/* ndo_xdp_xmit */
+			void *data;
+
+			data = unlikely(frag) ? skb_frag_address(frag)
+					      : xdpf->data;
+			dma_addr = dma_map_single(dev, data, len,
+						  DMA_TO_DEVICE);
+			if (dma_mapping_error(dev, dma_addr)) {
+				mvneta_txq_desc_put(txq);
+				goto unmap;
+			}
+
+			buf->type = MVNETA_TYPE_XDP_NDO;
+		} else {
+			page = unlikely(frag) ? skb_frag_page(frag)
+					      : virt_to_page(xdpf->data);
+			dma_addr = page_pool_get_dma_addr(page);
+			if (unlikely(frag))
+				dma_addr += skb_frag_off(frag);
+			else
+				dma_addr += sizeof(*xdpf) + xdpf->headroom;
+			dma_sync_single_for_device(dev, dma_addr, len,
+						   DMA_BIDIRECTIONAL);
+			buf->type = MVNETA_TYPE_XDP_TX;
+		}
+		buf->xdpf = unlikely(i) ? NULL : xdpf;
+
+		tx_desc->command = unlikely(i) ? 0 : MVNETA_TXD_F_DESC;
+		tx_desc->buf_phys_addr = dma_addr;
+		tx_desc->data_size = len;
+		*nxmit_byte += len;
+
+		mvneta_txq_inc_put(txq);
 	}
-	buf->xdpf = xdpf;
 
-	tx_desc->command = MVNETA_TXD_FLZ_DESC;
-	tx_desc->buf_phys_addr = dma_addr;
-	tx_desc->data_size = xdpf->len;
+	/*last descriptor */
+	if (likely(tx_desc))
+		tx_desc->command |= MVNETA_TXD_L_DESC | MVNETA_TXD_Z_PAD;
 
-	mvneta_txq_inc_put(txq);
-	txq->pending++;
-	txq->count++;
+	txq->pending += num_frames;
+	txq->count += num_frames;
 
 	return MVNETA_XDP_TX;
+
+unmap:
+	for (i--; i >= 0; i--) {
+		mvneta_txq_desc_put(txq);
+		tx_desc = txq->descs + txq->next_desc_to_proc;
+		dma_unmap_single(dev, tx_desc->buf_phys_addr,
+				 tx_desc->data_size,
+				 DMA_TO_DEVICE);
+	}
+
+	return MVNETA_XDP_DROPPED;
 }
 
 static int
@@ -2104,8 +2144,8 @@ mvneta_xdp_xmit_back(struct mvneta_port *pp, struct xdp_buff *xdp)
 	struct mvneta_pcpu_stats *stats = this_cpu_ptr(pp->stats);
 	struct mvneta_tx_queue *txq;
 	struct netdev_queue *nq;
+	int cpu, nxmit_byte = 0;
 	struct xdp_frame *xdpf;
-	int cpu;
 	u32 ret;
 
 	xdpf = xdp_convert_buff_to_frame(xdp);
@@ -2117,10 +2157,10 @@ mvneta_xdp_xmit_back(struct mvneta_port *pp, struct xdp_buff *xdp)
 	nq = netdev_get_tx_queue(pp->dev, txq->id);
 
 	__netif_tx_lock(nq, cpu);
-	ret = mvneta_xdp_submit_frame(pp, txq, xdpf, false);
+	ret = mvneta_xdp_submit_frame(pp, txq, xdpf, &nxmit_byte, false);
 	if (ret == MVNETA_XDP_TX) {
 		u64_stats_update_begin(&stats->syncp);
-		stats->es.ps.tx_bytes += xdpf->len;
+		stats->es.ps.tx_bytes += nxmit_byte;
 		stats->es.ps.tx_packets++;
 		stats->es.ps.xdp_tx++;
 		u64_stats_update_end(&stats->syncp);
@@ -2159,11 +2199,11 @@ mvneta_xdp_xmit(struct net_device *dev, int num_frame,
 
 	__netif_tx_lock(nq, cpu);
 	for (i = 0; i < num_frame; i++) {
-		ret = mvneta_xdp_submit_frame(pp, txq, frames[i], true);
+		ret = mvneta_xdp_submit_frame(pp, txq, frames[i], &nxmit_byte,
+					      true);
 		if (ret != MVNETA_XDP_TX)
 			break;
 
-		nxmit_byte += frames[i]->len;
 		nxmit++;
 	}
 
-- 
2.31.1



* [PATCH v9 bpf-next 06/14] net: mvneta: enable jumbo frames for XDP
  2021-06-14 12:49 [PATCH v9 bpf-next 00/14] mvneta: introduce XDP multi-buffer support Lorenzo Bianconi
                   ` (4 preceding siblings ...)
  2021-06-14 12:49 ` [PATCH v9 bpf-next 05/14] net: mvneta: add multi buffer support to XDP_TX Lorenzo Bianconi
@ 2021-06-14 12:49 ` Lorenzo Bianconi
  2021-06-14 12:49 ` [PATCH v9 bpf-next 07/14] net: xdp: add multi-buff support to xdp_build_skb_from_frame Lorenzo Bianconi
                   ` (8 subsequent siblings)
  14 siblings, 0 replies; 53+ messages in thread
From: Lorenzo Bianconi @ 2021-06-14 12:49 UTC (permalink / raw)
  To: bpf, netdev
  Cc: lorenzo.bianconi, davem, kuba, ast, daniel, shayagr, sameehj,
	john.fastabend, dsahern, brouer, echaudro, jasowang,
	alexander.duyck, saeed, maciej.fijalkowski, magnus.karlsson,
	tirthendu.sarkar

Enable the capability to receive jumbo frames even if the interface is
running in XDP mode

Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
---
 drivers/net/ethernet/marvell/mvneta.c | 10 ----------
 1 file changed, 10 deletions(-)

diff --git a/drivers/net/ethernet/marvell/mvneta.c b/drivers/net/ethernet/marvell/mvneta.c
index ab57be38ba03..505bec673ff7 100644
--- a/drivers/net/ethernet/marvell/mvneta.c
+++ b/drivers/net/ethernet/marvell/mvneta.c
@@ -3780,11 +3780,6 @@ static int mvneta_change_mtu(struct net_device *dev, int mtu)
 		mtu = ALIGN(MVNETA_RX_PKT_SIZE(mtu), 8);
 	}
 
-	if (pp->xdp_prog && mtu > MVNETA_MAX_RX_BUF_SIZE) {
-		netdev_info(dev, "Illegal MTU value %d for XDP mode\n", mtu);
-		return -EINVAL;
-	}
-
 	dev->mtu = mtu;
 
 	if (!netif_running(dev)) {
@@ -4486,11 +4481,6 @@ static int mvneta_xdp_setup(struct net_device *dev, struct bpf_prog *prog,
 	struct mvneta_port *pp = netdev_priv(dev);
 	struct bpf_prog *old_prog;
 
-	if (prog && dev->mtu > MVNETA_MAX_RX_BUF_SIZE) {
-		NL_SET_ERR_MSG_MOD(extack, "MTU too large for XDP");
-		return -EOPNOTSUPP;
-	}
-
 	if (pp->bm_priv) {
 		NL_SET_ERR_MSG_MOD(extack,
 				   "Hardware Buffer Management not supported on XDP");
-- 
2.31.1



* [PATCH v9 bpf-next 07/14] net: xdp: add multi-buff support to xdp_build_skb_from_frame
  2021-06-14 12:49 [PATCH v9 bpf-next 00/14] mvneta: introduce XDP multi-buffer support Lorenzo Bianconi
                   ` (5 preceding siblings ...)
  2021-06-14 12:49 ` [PATCH v9 bpf-next 06/14] net: mvneta: enable jumbo frames for XDP Lorenzo Bianconi
@ 2021-06-14 12:49 ` Lorenzo Bianconi
  2021-06-28 21:05   ` Alexander Duyck
  2021-06-14 12:49 ` [PATCH v9 bpf-next 08/14] bpf: add multi-buff support to the bpf_xdp_adjust_tail() API Lorenzo Bianconi
                   ` (7 subsequent siblings)
  14 siblings, 1 reply; 53+ messages in thread
From: Lorenzo Bianconi @ 2021-06-14 12:49 UTC (permalink / raw)
  To: bpf, netdev
  Cc: lorenzo.bianconi, davem, kuba, ast, daniel, shayagr, sameehj,
	john.fastabend, dsahern, brouer, echaudro, jasowang,
	alexander.duyck, saeed, maciej.fijalkowski, magnus.karlsson,
	tirthendu.sarkar

Introduce xdp multi-buff support to
__xdp_build_skb_from_frame/xdp_build_skb_from_frame
utility routines.
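
For reference, this is the generic conversion path mentioned in the
cover letter (e.g. for cpumaps); callers are unchanged, only the paged
part is now carried over into the skb. A hedged usage sketch:

	skb = xdp_build_skb_from_frame(xdpf, dev);
	if (!skb)
		xdp_return_frame(xdpf);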

Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
---
 net/core/xdp.c | 13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/net/core/xdp.c b/net/core/xdp.c
index f61c63115c95..71bedf6049a1 100644
--- a/net/core/xdp.c
+++ b/net/core/xdp.c
@@ -582,9 +582,15 @@ struct sk_buff *__xdp_build_skb_from_frame(struct xdp_frame *xdpf,
 					   struct sk_buff *skb,
 					   struct net_device *dev)
 {
+	struct skb_shared_info *sinfo = xdp_get_shared_info_from_frame(xdpf);
 	unsigned int headroom, frame_size;
+	int i, num_frags = 0;
 	void *hard_start;
 
+	/* xdp multi-buff frame */
+	if (unlikely(xdp_frame_is_mb(xdpf)))
+		num_frags = sinfo->nr_frags;
+
 	/* Part of headroom was reserved to xdpf */
 	headroom = sizeof(*xdpf) + xdpf->headroom;
 
@@ -603,6 +609,13 @@ struct sk_buff *__xdp_build_skb_from_frame(struct xdp_frame *xdpf,
 	if (xdpf->metasize)
 		skb_metadata_set(skb, xdpf->metasize);
 
+	for (i = 0; i < num_frags; i++)
+		skb_add_rx_frag(skb, skb_shinfo(skb)->nr_frags,
+				skb_frag_page(&sinfo->frags[i]),
+				skb_frag_off(&sinfo->frags[i]),
+				skb_frag_size(&sinfo->frags[i]),
+				xdpf->frame_sz);
+
 	/* Essential SKB info: protocol and skb->dev */
 	skb->protocol = eth_type_trans(skb, dev);
 
-- 
2.31.1



* [PATCH v9 bpf-next 08/14] bpf: add multi-buff support to the bpf_xdp_adjust_tail() API
  2021-06-14 12:49 [PATCH v9 bpf-next 00/14] mvneta: introduce XDP multi-buffer support Lorenzo Bianconi
                   ` (6 preceding siblings ...)
  2021-06-14 12:49 ` [PATCH v9 bpf-next 07/14] net: xdp: add multi-buff support to xdp_build_skb_from_frame Lorenzo Bianconi
@ 2021-06-14 12:49 ` Lorenzo Bianconi
  2021-06-22 23:37   ` John Fastabend
  2021-06-14 12:49 ` [PATCH v9 bpf-next 09/14] bpf: introduce bpf_xdp_get_buff_len helper Lorenzo Bianconi
                   ` (6 subsequent siblings)
  14 siblings, 1 reply; 53+ messages in thread
From: Lorenzo Bianconi @ 2021-06-14 12:49 UTC (permalink / raw)
  To: bpf, netdev
  Cc: lorenzo.bianconi, davem, kuba, ast, daniel, shayagr, sameehj,
	john.fastabend, dsahern, brouer, echaudro, jasowang,
	alexander.duyck, saeed, maciej.fijalkowski, magnus.karlsson,
	tirthendu.sarkar

From: Eelco Chaudron <echaudro@redhat.com>

This change adds support for tail growing and shrinking for XDP multi-buff.

Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
---
 include/net/xdp.h |  7 ++++++
 net/core/filter.c | 62 +++++++++++++++++++++++++++++++++++++++++++++++
 net/core/xdp.c    |  5 ++--
 3 files changed, 72 insertions(+), 2 deletions(-)

diff --git a/include/net/xdp.h b/include/net/xdp.h
index 935a6f83115f..3525801c6ed5 100644
--- a/include/net/xdp.h
+++ b/include/net/xdp.h
@@ -132,6 +132,11 @@ xdp_get_shared_info_from_buff(struct xdp_buff *xdp)
 	return (struct skb_shared_info *)xdp_data_hard_end(xdp);
 }
 
+static inline unsigned int xdp_get_frag_tailroom(const skb_frag_t *frag)
+{
+	return PAGE_SIZE - skb_frag_size(frag) - skb_frag_off(frag);
+}
+
 struct xdp_frame {
 	void *data;
 	u16 len;
@@ -259,6 +264,8 @@ struct xdp_frame *xdp_convert_buff_to_frame(struct xdp_buff *xdp)
 	return xdp_frame;
 }
 
+void __xdp_return(void *data, struct xdp_mem_info *mem, bool napi_direct,
+		  struct xdp_buff *xdp);
 void xdp_return_frame(struct xdp_frame *xdpf);
 void xdp_return_frame_rx_napi(struct xdp_frame *xdpf);
 void xdp_return_buff(struct xdp_buff *xdp);
diff --git a/net/core/filter.c b/net/core/filter.c
index caa88955562e..05f574a3d690 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -3859,11 +3859,73 @@ static const struct bpf_func_proto bpf_xdp_adjust_head_proto = {
 	.arg2_type	= ARG_ANYTHING,
 };
 
+static int bpf_xdp_mb_adjust_tail(struct xdp_buff *xdp, int offset)
+{
+	struct skb_shared_info *sinfo;
+
+	if (unlikely(!xdp_buff_is_mb(xdp)))
+		return -EINVAL;
+
+	sinfo = xdp_get_shared_info_from_buff(xdp);
+	if (offset >= 0) {
+		skb_frag_t *frag = &sinfo->frags[sinfo->nr_frags - 1];
+		int size;
+
+		if (unlikely(offset > xdp_get_frag_tailroom(frag)))
+			return -EINVAL;
+
+		size = skb_frag_size(frag);
+		memset(skb_frag_address(frag) + size, 0, offset);
+		skb_frag_size_set(frag, size + offset);
+		sinfo->data_len += offset;
+	} else {
+		int i, n_frags_free = 0, len_free = 0;
+
+		offset = abs(offset);
+		if (unlikely(offset > ((int)(xdp->data_end - xdp->data) +
+				       sinfo->data_len - ETH_HLEN)))
+			return -EINVAL;
+
+		for (i = sinfo->nr_frags - 1; i >= 0 && offset > 0; i--) {
+			skb_frag_t *frag = &sinfo->frags[i];
+			int size = skb_frag_size(frag);
+			int shrink = min_t(int, offset, size);
+
+			len_free += shrink;
+			offset -= shrink;
+
+			if (unlikely(size == shrink)) {
+				struct page *page = skb_frag_page(frag);
+
+				__xdp_return(page_address(page), &xdp->rxq->mem,
+					     false, NULL);
+				n_frags_free++;
+			} else {
+				skb_frag_size_set(frag, size - shrink);
+				break;
+			}
+		}
+		sinfo->nr_frags -= n_frags_free;
+		sinfo->data_len -= len_free;
+
+		if (unlikely(!sinfo->nr_frags))
+			xdp_buff_clear_mb(xdp);
+
+		if (unlikely(offset > 0))
+			xdp->data_end -= offset;
+	}
+
+	return 0;
+}
+
 BPF_CALL_2(bpf_xdp_adjust_tail, struct xdp_buff *, xdp, int, offset)
 {
 	void *data_hard_end = xdp_data_hard_end(xdp); /* use xdp->frame_sz */
 	void *data_end = xdp->data_end + offset;
 
+	if (unlikely(xdp_buff_is_mb(xdp)))
+		return bpf_xdp_mb_adjust_tail(xdp, offset);
+
 	/* Notice that xdp_data_hard_end have reserved some tailroom */
 	if (unlikely(data_end > data_hard_end))
 		return -EINVAL;
diff --git a/net/core/xdp.c b/net/core/xdp.c
index 71bedf6049a1..ffd70d3e9e5d 100644
--- a/net/core/xdp.c
+++ b/net/core/xdp.c
@@ -338,8 +338,8 @@ EXPORT_SYMBOL_GPL(xdp_rxq_info_reg_mem_model);
  * is used for those calls sites.  Thus, allowing for faster recycling
  * of xdp_frames/pages in those cases.
  */
-static void __xdp_return(void *data, struct xdp_mem_info *mem, bool napi_direct,
-			 struct xdp_buff *xdp)
+void __xdp_return(void *data, struct xdp_mem_info *mem, bool napi_direct,
+		  struct xdp_buff *xdp)
 {
 	struct xdp_mem_allocator *xa;
 	struct page *page;
@@ -372,6 +372,7 @@ static void __xdp_return(void *data, struct xdp_mem_info *mem, bool napi_direct,
 		break;
 	}
 }
+EXPORT_SYMBOL_GPL(__xdp_return);
 
 void xdp_return_frame(struct xdp_frame *xdpf)
 {
-- 
2.31.1



* [PATCH v9 bpf-next 09/14] bpf: introduce bpf_xdp_get_buff_len helper
  2021-06-14 12:49 [PATCH v9 bpf-next 00/14] mvneta: introduce XDP multi-buffer support Lorenzo Bianconi
                   ` (7 preceding siblings ...)
  2021-06-14 12:49 ` [PATCH v9 bpf-next 08/14] bpf: add multi-buff support to the bpf_xdp_adjust_tail() API Lorenzo Bianconi
@ 2021-06-14 12:49 ` Lorenzo Bianconi
  2021-06-14 12:49 ` [PATCH v9 bpf-next 10/14] bpf: add multi-buffer support to xdp copy helpers Lorenzo Bianconi
                   ` (5 subsequent siblings)
  14 siblings, 0 replies; 53+ messages in thread
From: Lorenzo Bianconi @ 2021-06-14 12:49 UTC (permalink / raw)
  To: bpf, netdev
  Cc: lorenzo.bianconi, davem, kuba, ast, daniel, shayagr, sameehj,
	john.fastabend, dsahern, brouer, echaudro, jasowang,
	alexander.duyck, saeed, maciej.fijalkowski, magnus.karlsson,
	tirthendu.sarkar

Introduce the bpf_xdp_get_buff_len helper in order to return the total
xdp buffer size (linear and paged areas).
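
A hedged example of the intended usage from an XDP program (assuming the
helper declaration generated from the UAPI comment in this patch; the
program name and the 3000-byte threshold are made up):

	SEC("xdp")
	int xdp_check_len(struct xdp_md *ctx)
	{
		/* total length, including fragments beyond data_end */
		int len = bpf_xdp_get_buff_len(ctx);

		if (len > 3000)
			return XDP_DROP;
		return XDP_PASS;
	}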

Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
---
 include/uapi/linux/bpf.h       |  7 +++++++
 net/core/filter.c              | 23 +++++++++++++++++++++++
 tools/include/uapi/linux/bpf.h |  7 +++++++
 3 files changed, 37 insertions(+)

diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 2c1ba70abbf1..0116c0c569fe 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -4778,6 +4778,12 @@ union bpf_attr {
  * 		Execute close syscall for given FD.
  * 	Return
  * 		A syscall result.
+ *
+ * int bpf_xdp_get_buff_len(struct xdp_buff *xdp_md)
+ *	Description
+ *		Get the total size of a given xdp buff (linear and paged area)
+ *	Return
+ *		The total size of a given xdp buffer.
  */
 #define __BPF_FUNC_MAPPER(FN)		\
 	FN(unspec),			\
@@ -4949,6 +4955,7 @@ union bpf_attr {
 	FN(sys_bpf),			\
 	FN(btf_find_by_name_kind),	\
 	FN(sys_close),			\
+	FN(xdp_get_buff_len),		\
 	/* */
 
 /* integer value in 'imm' field of BPF_CALL instruction selects which helper
diff --git a/net/core/filter.c b/net/core/filter.c
index 05f574a3d690..b0855f2d4726 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -3918,6 +3918,27 @@ static int bpf_xdp_mb_adjust_tail(struct xdp_buff *xdp, int offset)
 	return 0;
 }
 
+BPF_CALL_1(bpf_xdp_get_buff_len, struct  xdp_buff*, xdp)
+{
+	int data_len = 0;
+
+	if (unlikely(xdp_buff_is_mb(xdp))) {
+		struct skb_shared_info *sinfo;
+
+		sinfo = xdp_get_shared_info_from_buff(xdp);
+		data_len = sinfo->data_len;
+	}
+
+	return (xdp->data_end - xdp->data) + data_len;
+}
+
+const struct bpf_func_proto bpf_xdp_get_buff_len_proto = {
+	.func		= bpf_xdp_get_buff_len,
+	.gpl_only	= false,
+	.ret_type	= RET_INTEGER,
+	.arg1_type	= ARG_PTR_TO_CTX,
+};
+
 BPF_CALL_2(bpf_xdp_adjust_tail, struct xdp_buff *, xdp, int, offset)
 {
 	void *data_hard_end = xdp_data_hard_end(xdp); /* use xdp->frame_sz */
@@ -7454,6 +7475,8 @@ xdp_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
 		return &bpf_xdp_redirect_map_proto;
 	case BPF_FUNC_xdp_adjust_tail:
 		return &bpf_xdp_adjust_tail_proto;
+	case BPF_FUNC_xdp_get_buff_len:
+		return &bpf_xdp_get_buff_len_proto;
 	case BPF_FUNC_fib_lookup:
 		return &bpf_xdp_fib_lookup_proto;
 	case BPF_FUNC_check_mtu:
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index 2c1ba70abbf1..0116c0c569fe 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -4778,6 +4778,12 @@ union bpf_attr {
  * 		Execute close syscall for given FD.
  * 	Return
  * 		A syscall result.
+ *
+ * int bpf_xdp_get_buff_len(struct xdp_buff *xdp_md)
+ *	Description
+ *		Get the total size of a given xdp buff (linear and paged area)
+ *	Return
+ *		The total size of a given xdp buffer.
  */
 #define __BPF_FUNC_MAPPER(FN)		\
 	FN(unspec),			\
@@ -4949,6 +4955,7 @@ union bpf_attr {
 	FN(sys_bpf),			\
 	FN(btf_find_by_name_kind),	\
 	FN(sys_close),			\
+	FN(xdp_get_buff_len),		\
 	/* */
 
 /* integer value in 'imm' field of BPF_CALL instruction selects which helper
-- 
2.31.1



* [PATCH v9 bpf-next 10/14] bpf: add multi-buffer support to xdp copy helpers
  2021-06-14 12:49 [PATCH v9 bpf-next 00/14] mvneta: introduce XDP multi-buffer support Lorenzo Bianconi
                   ` (8 preceding siblings ...)
  2021-06-14 12:49 ` [PATCH v9 bpf-next 09/14] bpf: introduce bpf_xdp_get_buff_len helper Lorenzo Bianconi
@ 2021-06-14 12:49 ` Lorenzo Bianconi
  2021-06-22 23:49   ` John Fastabend
  2021-06-14 12:49 ` [PATCH v9 bpf-next 11/14] bpf: move user_size out of bpf_test_init Lorenzo Bianconi
                   ` (4 subsequent siblings)
  14 siblings, 1 reply; 53+ messages in thread
From: Lorenzo Bianconi @ 2021-06-14 12:49 UTC (permalink / raw)
  To: bpf, netdev
  Cc: lorenzo.bianconi, davem, kuba, ast, daniel, shayagr, sameehj,
	john.fastabend, dsahern, brouer, echaudro, jasowang,
	alexander.duyck, saeed, maciej.fijalkowski, magnus.karlsson,
	tirthendu.sarkar

From: Eelco Chaudron <echaudro@redhat.com>

This patch adds support for multi-buffer for the following helpers:
  - bpf_xdp_output()
  - bpf_perf_event_output()

Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
---
 kernel/trace/bpf_trace.c                      |   3 +
 net/core/filter.c                             |  72 +++++++++-
 .../selftests/bpf/prog_tests/xdp_bpf2bpf.c    | 127 ++++++++++++------
 .../selftests/bpf/progs/test_xdp_bpf2bpf.c    |   2 +-
 4 files changed, 160 insertions(+), 44 deletions(-)

diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
index d2d7cf6cfe83..ee926ec64f78 100644
--- a/kernel/trace/bpf_trace.c
+++ b/kernel/trace/bpf_trace.c
@@ -1365,6 +1365,7 @@ static const struct bpf_func_proto bpf_perf_event_output_proto_raw_tp = {
 
 extern const struct bpf_func_proto bpf_skb_output_proto;
 extern const struct bpf_func_proto bpf_xdp_output_proto;
+extern const struct bpf_func_proto bpf_xdp_get_buff_len_trace_proto;
 
 BPF_CALL_3(bpf_get_stackid_raw_tp, struct bpf_raw_tracepoint_args *, args,
 	   struct bpf_map *, map, u64, flags)
@@ -1460,6 +1461,8 @@ tracing_prog_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
 		return &bpf_sock_from_file_proto;
 	case BPF_FUNC_get_socket_cookie:
 		return &bpf_get_socket_ptr_cookie_proto;
+	case BPF_FUNC_xdp_get_buff_len:
+		return &bpf_xdp_get_buff_len_trace_proto;
 #endif
 	case BPF_FUNC_seq_printf:
 		return prog->expected_attach_type == BPF_TRACE_ITER ?
diff --git a/net/core/filter.c b/net/core/filter.c
index b0855f2d4726..f7211b7908a9 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -3939,6 +3939,15 @@ const struct bpf_func_proto bpf_xdp_get_buff_len_proto = {
 	.arg1_type	= ARG_PTR_TO_CTX,
 };
 
+BTF_ID_LIST_SINGLE(bpf_xdp_get_buff_len_bpf_ids, struct, xdp_buff)
+
+const struct bpf_func_proto bpf_xdp_get_buff_len_trace_proto = {
+	.func		= bpf_xdp_get_buff_len,
+	.gpl_only	= false,
+	.arg1_type	= ARG_PTR_TO_BTF_ID,
+	.arg1_btf_id	= &bpf_xdp_get_buff_len_bpf_ids[0],
+};
+
 BPF_CALL_2(bpf_xdp_adjust_tail, struct xdp_buff *, xdp, int, offset)
 {
 	void *data_hard_end = xdp_data_hard_end(xdp); /* use xdp->frame_sz */
@@ -4606,10 +4615,56 @@ static const struct bpf_func_proto bpf_sk_ancestor_cgroup_id_proto = {
 };
 #endif
 
-static unsigned long bpf_xdp_copy(void *dst_buff, const void *src_buff,
+static unsigned long bpf_xdp_copy(void *dst_buff, const void *ctx,
 				  unsigned long off, unsigned long len)
 {
-	memcpy(dst_buff, src_buff + off, len);
+	struct xdp_buff *xdp = (struct xdp_buff *)ctx;
+	struct skb_shared_info *sinfo;
+	unsigned long base_len;
+
+	if (likely(!xdp_buff_is_mb(xdp))) {
+		memcpy(dst_buff, xdp->data + off, len);
+		return 0;
+	}
+
+	base_len = xdp->data_end - xdp->data;
+	sinfo = xdp_get_shared_info_from_buff(xdp);
+	do {
+		const void *src_buff = NULL;
+		unsigned long copy_len = 0;
+
+		if (off < base_len) {
+			src_buff = xdp->data + off;
+			copy_len = min(len, base_len - off);
+		} else {
+			unsigned long frag_off_total = base_len;
+			int i;
+
+			for (i = 0; i < sinfo->nr_frags; i++) {
+				skb_frag_t *frag = &sinfo->frags[i];
+				unsigned long frag_len, frag_off;
+
+				frag_len = skb_frag_size(frag);
+				frag_off = off - frag_off_total;
+				if (frag_off < frag_len) {
+					src_buff = skb_frag_address(frag) +
+						   frag_off;
+					copy_len = min(len,
+						       frag_len - frag_off);
+					break;
+				}
+				frag_off_total += frag_len;
+			}
+		}
+		if (!src_buff)
+			break;
+
+		memcpy(dst_buff, src_buff, copy_len);
+		off += copy_len;
+		len -= copy_len;
+		dst_buff += copy_len;
+	} while (len);
+
 	return 0;
 }
 
@@ -4621,10 +4676,19 @@ BPF_CALL_5(bpf_xdp_event_output, struct xdp_buff *, xdp, struct bpf_map *, map,
 	if (unlikely(flags & ~(BPF_F_CTXLEN_MASK | BPF_F_INDEX_MASK)))
 		return -EINVAL;
 	if (unlikely(!xdp ||
-		     xdp_size > (unsigned long)(xdp->data_end - xdp->data)))
+		     (likely(!xdp_buff_is_mb(xdp)) &&
+		      xdp_size > (unsigned long)(xdp->data_end - xdp->data))))
 		return -EFAULT;
+	if (unlikely(xdp_buff_is_mb(xdp))) {
+		struct skb_shared_info *sinfo;
+
+		sinfo = xdp_get_shared_info_from_buff(xdp);
+		if (unlikely(xdp_size > ((int)(xdp->data_end - xdp->data) +
+					 sinfo->data_len)))
+			return -EFAULT;
+	}
 
-	return bpf_event_output(map, flags, meta, meta_size, xdp->data,
+	return bpf_event_output(map, flags, meta, meta_size, xdp,
 				xdp_size, bpf_xdp_copy);
 }
 
diff --git a/tools/testing/selftests/bpf/prog_tests/xdp_bpf2bpf.c b/tools/testing/selftests/bpf/prog_tests/xdp_bpf2bpf.c
index 3bd5904b4db5..cc9be5912be8 100644
--- a/tools/testing/selftests/bpf/prog_tests/xdp_bpf2bpf.c
+++ b/tools/testing/selftests/bpf/prog_tests/xdp_bpf2bpf.c
@@ -10,11 +10,20 @@ struct meta {
 	int pkt_len;
 };
 
+struct test_ctx_s {
+	bool passed;
+	int pkt_size;
+};
+
+struct test_ctx_s test_ctx;
+
 static void on_sample(void *ctx, int cpu, void *data, __u32 size)
 {
-	int duration = 0;
 	struct meta *meta = (struct meta *)data;
 	struct ipv4_packet *trace_pkt_v4 = data + sizeof(*meta);
+	unsigned char *raw_pkt = data + sizeof(*meta);
+	struct test_ctx_s *tst_ctx = ctx;
+	int duration = 0;
 
 	if (CHECK(size < sizeof(pkt_v4) + sizeof(*meta),
 		  "check_size", "size %u < %zu\n",
@@ -25,25 +34,90 @@ static void on_sample(void *ctx, int cpu, void *data, __u32 size)
 		  "meta->ifindex = %d\n", meta->ifindex))
 		return;
 
-	if (CHECK(meta->pkt_len != sizeof(pkt_v4), "check_meta_pkt_len",
-		  "meta->pkt_len = %zd\n", sizeof(pkt_v4)))
+	if (CHECK(meta->pkt_len != tst_ctx->pkt_size, "check_meta_pkt_len",
+		  "meta->pkt_len = %d\n", tst_ctx->pkt_size))
 		return;
 
 	if (CHECK(memcmp(trace_pkt_v4, &pkt_v4, sizeof(pkt_v4)),
 		  "check_packet_content", "content not the same\n"))
 		return;
 
-	*(bool *)ctx = true;
+	if (meta->pkt_len > sizeof(pkt_v4)) {
+		for (int i = 0; i < (meta->pkt_len - sizeof(pkt_v4)); i++) {
+			if (raw_pkt[i + sizeof(pkt_v4)] != (unsigned char)i) {
+				CHECK(true, "check_packet_content",
+				      "byte %zu does not match %u != %u\n",
+				      i + sizeof(pkt_v4),
+				      raw_pkt[i + sizeof(pkt_v4)],
+				      (unsigned char)i);
+				break;
+			}
+		}
+	}
+
+	tst_ctx->passed = true;
 }
 
-void test_xdp_bpf2bpf(void)
+static int run_xdp_bpf2bpf_pkt_size(int pkt_fd, struct perf_buffer *pb,
+				    struct test_xdp_bpf2bpf *ftrace_skel,
+				    int pkt_size)
 {
 	__u32 duration = 0, retval, size;
-	char buf[128];
+	unsigned char buf_in[9000];
+	unsigned char buf[9000];
+	int err;
+
+	if (pkt_size > sizeof(buf_in) || pkt_size < sizeof(pkt_v4))
+		return -EINVAL;
+
+	test_ctx.passed = false;
+	test_ctx.pkt_size = pkt_size;
+
+	memcpy(buf_in, &pkt_v4, sizeof(pkt_v4));
+	if (pkt_size > sizeof(pkt_v4)) {
+		for (int i = 0; i < (pkt_size - sizeof(pkt_v4)); i++)
+			buf_in[i + sizeof(pkt_v4)] = i;
+	}
+
+	/* Run test program */
+	err = bpf_prog_test_run(pkt_fd, 1, buf_in, pkt_size,
+				buf, &size, &retval, &duration);
+
+	if (CHECK(err || retval != XDP_PASS || size != pkt_size,
+		  "ipv4", "err %d errno %d retval %d size %d\n",
+		  err, errno, retval, size))
+		return -1;
+
+	/* Make sure bpf_xdp_output() was triggered and it sent the expected
+	 * data to the perf ring buffer.
+	 */
+	err = perf_buffer__poll(pb, 100);
+	if (CHECK(err <= 0, "perf_buffer__poll", "err %d\n", err))
+		return -1;
+
+	if (CHECK_FAIL(!test_ctx.passed))
+		return -1;
+
+	/* Verify test results */
+	if (CHECK(ftrace_skel->bss->test_result_fentry != if_nametoindex("lo"),
+		  "result", "fentry failed err %llu\n",
+		  ftrace_skel->bss->test_result_fentry))
+		return -1;
+
+	if (CHECK(ftrace_skel->bss->test_result_fexit != XDP_PASS, "result",
+		  "fexit failed err %llu\n",
+		  ftrace_skel->bss->test_result_fexit))
+		return -1;
+
+	return 0;
+}
+
+void test_xdp_bpf2bpf(void)
+{
 	int err, pkt_fd, map_fd;
-	bool passed = false;
-	struct iphdr *iph = (void *)buf + sizeof(struct ethhdr);
-	struct iptnl_info value4 = {.family = AF_INET};
+	__u32 duration = 0;
+	int pkt_sizes[] = {sizeof(pkt_v4), 1024, 4100, 8200};
+	struct iptnl_info value4 = {.family = AF_INET6};
 	struct test_xdp *pkt_skel = NULL;
 	struct test_xdp_bpf2bpf *ftrace_skel = NULL;
 	struct vip key4 = {.protocol = 6, .family = AF_INET};
@@ -87,40 +161,15 @@ void test_xdp_bpf2bpf(void)
 
 	/* Set up perf buffer */
 	pb_opts.sample_cb = on_sample;
-	pb_opts.ctx = &passed;
+	pb_opts.ctx = &test_ctx;
 	pb = perf_buffer__new(bpf_map__fd(ftrace_skel->maps.perf_buf_map),
-			      1, &pb_opts);
+			      8, &pb_opts);
 	if (!ASSERT_OK_PTR(pb, "perf_buf__new"))
 		goto out;
 
-	/* Run test program */
-	err = bpf_prog_test_run(pkt_fd, 1, &pkt_v4, sizeof(pkt_v4),
-				buf, &size, &retval, &duration);
-
-	if (CHECK(err || retval != XDP_TX || size != 74 ||
-		  iph->protocol != IPPROTO_IPIP, "ipv4",
-		  "err %d errno %d retval %d size %d\n",
-		  err, errno, retval, size))
-		goto out;
-
-	/* Make sure bpf_xdp_output() was triggered and it sent the expected
-	 * data to the perf ring buffer.
-	 */
-	err = perf_buffer__poll(pb, 100);
-	if (CHECK(err < 0, "perf_buffer__poll", "err %d\n", err))
-		goto out;
-
-	CHECK_FAIL(!passed);
-
-	/* Verify test results */
-	if (CHECK(ftrace_skel->bss->test_result_fentry != if_nametoindex("lo"),
-		  "result", "fentry failed err %llu\n",
-		  ftrace_skel->bss->test_result_fentry))
-		goto out;
-
-	CHECK(ftrace_skel->bss->test_result_fexit != XDP_TX, "result",
-	      "fexit failed err %llu\n", ftrace_skel->bss->test_result_fexit);
-
+	for (int i = 0; i < ARRAY_SIZE(pkt_sizes); i++)
+		run_xdp_bpf2bpf_pkt_size(pkt_fd, pb, ftrace_skel,
+					 pkt_sizes[i]);
 out:
 	if (pb)
 		perf_buffer__free(pb);
diff --git a/tools/testing/selftests/bpf/progs/test_xdp_bpf2bpf.c b/tools/testing/selftests/bpf/progs/test_xdp_bpf2bpf.c
index a038e827f850..902b54190377 100644
--- a/tools/testing/selftests/bpf/progs/test_xdp_bpf2bpf.c
+++ b/tools/testing/selftests/bpf/progs/test_xdp_bpf2bpf.c
@@ -49,7 +49,7 @@ int BPF_PROG(trace_on_entry, struct xdp_buff *xdp)
 	void *data = (void *)(long)xdp->data;
 
 	meta.ifindex = xdp->rxq->dev->ifindex;
-	meta.pkt_len = data_end - data;
+	meta.pkt_len = bpf_xdp_get_buff_len((struct xdp_md *)xdp);
 	bpf_xdp_output(xdp, &perf_buf_map,
 		       ((__u64) meta.pkt_len << 32) |
 		       BPF_F_CURRENT_CPU,
-- 
2.31.1



* [PATCH v9 bpf-next 11/14] bpf: move user_size out of bpf_test_init
  2021-06-14 12:49 [PATCH v9 bpf-next 00/14] mvneta: introduce XDP multi-buffer support Lorenzo Bianconi
                   ` (9 preceding siblings ...)
  2021-06-14 12:49 ` [PATCH v9 bpf-next 10/14] bpf: add multi-buffer support to xdp copy helpers Lorenzo Bianconi
@ 2021-06-14 12:49 ` Lorenzo Bianconi
  2021-06-14 12:49 ` [PATCH v9 bpf-next 12/14] bpf: introduce multibuff support to bpf_prog_test_run_xdp() Lorenzo Bianconi
                   ` (3 subsequent siblings)
  14 siblings, 0 replies; 53+ messages in thread
From: Lorenzo Bianconi @ 2021-06-14 12:49 UTC (permalink / raw)
  To: bpf, netdev
  Cc: lorenzo.bianconi, davem, kuba, ast, daniel, shayagr, sameehj,
	john.fastabend, dsahern, brouer, echaudro, jasowang,
	alexander.duyck, saeed, maciej.fijalkowski, magnus.karlsson,
	tirthendu.sarkar

Pass the user buffer size (data_size_in) explicitly through the
bpf_test_init routine signature. This is a preliminary patch to
introduce the xdp multi-buff selftests.

Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
---
 net/bpf/test_run.c | 13 +++++++------
 1 file changed, 7 insertions(+), 6 deletions(-)

diff --git a/net/bpf/test_run.c b/net/bpf/test_run.c
index aa47af349ba8..d3252a234678 100644
--- a/net/bpf/test_run.c
+++ b/net/bpf/test_run.c
@@ -245,11 +245,10 @@ bool bpf_prog_test_check_kfunc_call(u32 kfunc_id)
 	return btf_id_set_contains(&test_sk_kfunc_ids, kfunc_id);
 }
 
-static void *bpf_test_init(const union bpf_attr *kattr, u32 size,
-			   u32 headroom, u32 tailroom)
+static void *bpf_test_init(const union bpf_attr *kattr, u32 user_size,
+			   u32 size, u32 headroom, u32 tailroom)
 {
 	void __user *data_in = u64_to_user_ptr(kattr->test.data_in);
-	u32 user_size = kattr->test.data_size_in;
 	void *data;
 
 	if (size < ETH_HLEN || size > PAGE_SIZE - headroom - tailroom)
@@ -570,7 +569,8 @@ int bpf_prog_test_run_skb(struct bpf_prog *prog, const union bpf_attr *kattr,
 	if (kattr->test.flags || kattr->test.cpu)
 		return -EINVAL;
 
-	data = bpf_test_init(kattr, size, NET_SKB_PAD + NET_IP_ALIGN,
+	data = bpf_test_init(kattr, kattr->test.data_size_in,
+			     size, NET_SKB_PAD + NET_IP_ALIGN,
 			     SKB_DATA_ALIGN(sizeof(struct skb_shared_info)));
 	if (IS_ERR(data))
 		return PTR_ERR(data);
@@ -707,7 +707,8 @@ int bpf_prog_test_run_xdp(struct bpf_prog *prog, const union bpf_attr *kattr,
 	/* XDP have extra tailroom as (most) drivers use full page */
 	max_data_sz = 4096 - headroom - tailroom;
 
-	data = bpf_test_init(kattr, max_data_sz, headroom, tailroom);
+	data = bpf_test_init(kattr, kattr->test.data_size_in,
+			     max_data_sz, headroom, tailroom);
 	if (IS_ERR(data))
 		return PTR_ERR(data);
 
@@ -769,7 +770,7 @@ int bpf_prog_test_run_flow_dissector(struct bpf_prog *prog,
 	if (size < ETH_HLEN)
 		return -EINVAL;
 
-	data = bpf_test_init(kattr, size, 0, 0);
+	data = bpf_test_init(kattr, kattr->test.data_size_in, size, 0, 0);
 	if (IS_ERR(data))
 		return PTR_ERR(data);
 
-- 
2.31.1



* [PATCH v9 bpf-next 12/14] bpf: introduce multibuff support to bpf_prog_test_run_xdp()
  2021-06-14 12:49 [PATCH v9 bpf-next 00/14] mvneta: introduce XDP multi-buffer support Lorenzo Bianconi
                   ` (10 preceding siblings ...)
  2021-06-14 12:49 ` [PATCH v9 bpf-next 11/14] bpf: move user_size out of bpf_test_init Lorenzo Bianconi
@ 2021-06-14 12:49 ` Lorenzo Bianconi
  2021-06-14 12:49 ` [PATCH v9 bpf-next 13/14] bpf: test_run: add xdp_shared_info pointer in bpf_test_finish signature Lorenzo Bianconi
                   ` (2 subsequent siblings)
  14 siblings, 0 replies; 53+ messages in thread
From: Lorenzo Bianconi @ 2021-06-14 12:49 UTC (permalink / raw)
  To: bpf, netdev
  Cc: lorenzo.bianconi, davem, kuba, ast, daniel, shayagr, sameehj,
	john.fastabend, dsahern, brouer, echaudro, jasowang,
	alexander.duyck, saeed, maciej.fijalkowski, magnus.karlsson,
	tirthendu.sarkar

Introduce the capability to allocate an xdp multi-buff in the
bpf_prog_test_run_xdp routine. This is a preliminary patch to introduce
the selftests for the new xdp multi-buff eBPF helpers.

Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
---
 net/bpf/test_run.c | 52 +++++++++++++++++++++++++++++++++++++++-------
 1 file changed, 44 insertions(+), 8 deletions(-)

diff --git a/net/bpf/test_run.c b/net/bpf/test_run.c
index d3252a234678..5e2c8478ae18 100644
--- a/net/bpf/test_run.c
+++ b/net/bpf/test_run.c
@@ -692,23 +692,22 @@ int bpf_prog_test_run_xdp(struct bpf_prog *prog, const union bpf_attr *kattr,
 {
 	u32 tailroom = SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
 	u32 headroom = XDP_PACKET_HEADROOM;
-	u32 size = kattr->test.data_size_in;
 	u32 repeat = kattr->test.repeat;
 	struct netdev_rx_queue *rxqueue;
+	struct skb_shared_info *sinfo;
 	struct xdp_buff xdp = {};
+	u32 max_data_sz, size;
 	u32 retval, duration;
-	u32 max_data_sz;
+	int i, ret;
 	void *data;
-	int ret;
 
 	if (kattr->test.ctx_in || kattr->test.ctx_out)
 		return -EINVAL;
 
-	/* XDP have extra tailroom as (most) drivers use full page */
 	max_data_sz = 4096 - headroom - tailroom;
+	size = min_t(u32, kattr->test.data_size_in, max_data_sz);
 
-	data = bpf_test_init(kattr, kattr->test.data_size_in,
-			     max_data_sz, headroom, tailroom);
+	data = bpf_test_init(kattr, size, max_data_sz, headroom, tailroom);
 	if (IS_ERR(data))
 		return PTR_ERR(data);
 
@@ -717,16 +716,53 @@ int bpf_prog_test_run_xdp(struct bpf_prog *prog, const union bpf_attr *kattr,
 		      &rxqueue->xdp_rxq);
 	xdp_prepare_buff(&xdp, data, headroom, size, true);
 
+	sinfo = xdp_get_shared_info_from_buff(&xdp);
+	if (unlikely(kattr->test.data_size_in > size)) {
+		void __user *data_in = u64_to_user_ptr(kattr->test.data_in);
+
+		while (size < kattr->test.data_size_in) {
+			struct page *page;
+			skb_frag_t *frag;
+			int data_len;
+
+			page = alloc_page(GFP_KERNEL);
+			if (!page) {
+				ret = -ENOMEM;
+				goto out;
+			}
+
+			frag = &sinfo->frags[sinfo->nr_frags++];
+			__skb_frag_set_page(frag, page);
+
+			data_len = min_t(int, kattr->test.data_size_in - size,
+					 PAGE_SIZE);
+			skb_frag_size_set(frag, data_len);
+
+			if (copy_from_user(page_address(page), data_in + size,
+					   data_len)) {
+				ret = -EFAULT;
+				goto out;
+			}
+			sinfo->data_len += data_len;
+			size += data_len;
+		}
+		xdp_buff_set_mb(&xdp);
+	}
+
 	bpf_prog_change_xdp(NULL, prog);
 	ret = bpf_test_run(prog, &xdp, repeat, &retval, &duration, true);
 	if (ret)
 		goto out;
-	if (xdp.data != data + headroom || xdp.data_end != xdp.data + size)
-		size = xdp.data_end - xdp.data;
+
+	size = xdp.data_end - xdp.data + sinfo->data_len;
 	ret = bpf_test_finish(kattr, uattr, xdp.data, size, retval, duration);
+
 out:
 	bpf_prog_change_xdp(prog, NULL);
+	for (i = 0; i < sinfo->nr_frags; i++)
+		__free_page(skb_frag_page(&sinfo->frags[i]));
 	kfree(data);
+
 	return ret;
 }
 
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 53+ messages in thread

* [PATCH v9 bpf-next 13/14] bpf: test_run: add xdp_shared_info pointer in bpf_test_finish signature
  2021-06-14 12:49 [PATCH v9 bpf-next 00/14] mvneta: introduce XDP multi-buffer support Lorenzo Bianconi
                   ` (11 preceding siblings ...)
  2021-06-14 12:49 ` [PATCH v9 bpf-next 12/14] bpf: introduce multibuff support to bpf_prog_test_run_xdp() Lorenzo Bianconi
@ 2021-06-14 12:49 ` Lorenzo Bianconi
  2021-06-14 12:49 ` [PATCH v9 bpf-next 14/14] bpf: update xdp_adjust_tail selftest to include multi-buffer Lorenzo Bianconi
  2021-06-22 23:18 ` [PATCH v9 bpf-next 00/14] mvneta: introduce XDP multi-buffer support John Fastabend
  14 siblings, 0 replies; 53+ messages in thread
From: Lorenzo Bianconi @ 2021-06-14 12:49 UTC (permalink / raw)
  To: bpf, netdev
  Cc: lorenzo.bianconi, davem, kuba, ast, daniel, shayagr, sameehj,
	john.fastabend, dsahern, brouer, echaudro, jasowang,
	alexander.duyck, saeed, maciej.fijalkowski, magnus.karlsson,
	tirthendu.sarkar

Introduce an xdp_shared_info pointer in the bpf_test_finish signature in
order to copy back paged data from an xdp multi-buff frame to the
userspace buffer.

Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
---
 net/bpf/test_run.c | 47 ++++++++++++++++++++++++++++++++++++++--------
 1 file changed, 39 insertions(+), 8 deletions(-)

diff --git a/net/bpf/test_run.c b/net/bpf/test_run.c
index 5e2c8478ae18..e6b15e3d2086 100644
--- a/net/bpf/test_run.c
+++ b/net/bpf/test_run.c
@@ -128,7 +128,8 @@ static int bpf_test_run(struct bpf_prog *prog, void *ctx, u32 repeat,
 
 static int bpf_test_finish(const union bpf_attr *kattr,
 			   union bpf_attr __user *uattr, const void *data,
-			   u32 size, u32 retval, u32 duration)
+			   struct skb_shared_info *sinfo, u32 size,
+			   u32 retval, u32 duration)
 {
 	void __user *data_out = u64_to_user_ptr(kattr->test.data_out);
 	int err = -EFAULT;
@@ -143,8 +144,36 @@ static int bpf_test_finish(const union bpf_attr *kattr,
 		err = -ENOSPC;
 	}
 
-	if (data_out && copy_to_user(data_out, data, copy_size))
-		goto out;
+	if (data_out) {
+		int len = sinfo ? copy_size - sinfo->data_len : copy_size;
+
+		if (copy_to_user(data_out, data, len))
+			goto out;
+
+		if (sinfo) {
+			int i, offset = len, data_len;
+
+			for (i = 0; i < sinfo->nr_frags; i++) {
+				skb_frag_t *frag = &sinfo->frags[i];
+
+				if (offset >= copy_size) {
+					err = -ENOSPC;
+					break;
+				}
+
+				data_len = min_t(int, copy_size - offset,
+						 skb_frag_size(frag));
+
+				if (copy_to_user(data_out + offset,
+						 skb_frag_address(frag),
+						 data_len))
+					goto out;
+
+				offset += data_len;
+			}
+		}
+	}
+
 	if (copy_to_user(&uattr->test.data_size_out, &size, sizeof(size)))
 		goto out;
 	if (copy_to_user(&uattr->test.retval, &retval, sizeof(retval)))
@@ -673,7 +702,8 @@ int bpf_prog_test_run_skb(struct bpf_prog *prog, const union bpf_attr *kattr,
 	/* bpf program can never convert linear skb to non-linear */
 	if (WARN_ON_ONCE(skb_is_nonlinear(skb)))
 		size = skb_headlen(skb);
-	ret = bpf_test_finish(kattr, uattr, skb->data, size, retval, duration);
+	ret = bpf_test_finish(kattr, uattr, skb->data, NULL, size, retval,
+			      duration);
 	if (!ret)
 		ret = bpf_ctx_finish(kattr, uattr, ctx,
 				     sizeof(struct __sk_buff));
@@ -755,7 +785,8 @@ int bpf_prog_test_run_xdp(struct bpf_prog *prog, const union bpf_attr *kattr,
 		goto out;
 
 	size = xdp.data_end - xdp.data + sinfo->data_len;
-	ret = bpf_test_finish(kattr, uattr, xdp.data, size, retval, duration);
+	ret = bpf_test_finish(kattr, uattr, xdp.data, sinfo, size, retval,
+			      duration);
 
 out:
 	bpf_prog_change_xdp(prog, NULL);
@@ -841,8 +872,8 @@ int bpf_prog_test_run_flow_dissector(struct bpf_prog *prog,
 	if (ret < 0)
 		goto out;
 
-	ret = bpf_test_finish(kattr, uattr, &flow_keys, sizeof(flow_keys),
-			      retval, duration);
+	ret = bpf_test_finish(kattr, uattr, &flow_keys, NULL,
+			      sizeof(flow_keys), retval, duration);
 	if (!ret)
 		ret = bpf_ctx_finish(kattr, uattr, user_ctx,
 				     sizeof(struct bpf_flow_keys));
@@ -946,7 +977,7 @@ int bpf_prog_test_run_sk_lookup(struct bpf_prog *prog, const union bpf_attr *kat
 		user_ctx->cookie = sock_gen_cookie(ctx.selected_sk);
 	}
 
-	ret = bpf_test_finish(kattr, uattr, NULL, 0, retval, duration);
+	ret = bpf_test_finish(kattr, uattr, NULL, NULL, 0, retval, duration);
 	if (!ret)
 		ret = bpf_ctx_finish(kattr, uattr, user_ctx, sizeof(*user_ctx));
 
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 53+ messages in thread

* [PATCH v9 bpf-next 14/14] bpf: update xdp_adjust_tail selftest to include multi-buffer
  2021-06-14 12:49 [PATCH v9 bpf-next 00/14] mvneta: introduce XDP multi-buffer support Lorenzo Bianconi
                   ` (12 preceding siblings ...)
  2021-06-14 12:49 ` [PATCH v9 bpf-next 13/14] bpf: test_run: add xdp_shared_info pointer in bpf_test_finish signature Lorenzo Bianconi
@ 2021-06-14 12:49 ` Lorenzo Bianconi
  2021-06-22 23:18 ` [PATCH v9 bpf-next 00/14] mvneta: introduce XDP multi-buffer support John Fastabend
  14 siblings, 0 replies; 53+ messages in thread
From: Lorenzo Bianconi @ 2021-06-14 12:49 UTC (permalink / raw)
  To: bpf, netdev
  Cc: lorenzo.bianconi, davem, kuba, ast, daniel, shayagr, sameehj,
	john.fastabend, dsahern, brouer, echaudro, jasowang,
	alexander.duyck, saeed, maciej.fijalkowski, magnus.karlsson,
	tirthendu.sarkar

From: Eelco Chaudron <echaudro@redhat.com>

This change adds test cases for the multi-buffer scenarios when shrinking
and growing.

Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
---
 .../bpf/prog_tests/xdp_adjust_tail.c          | 105 ++++++++++++++++++
 .../bpf/progs/test_xdp_adjust_tail_grow.c     |  10 +-
 .../bpf/progs/test_xdp_adjust_tail_shrink.c   |  32 +++++-
 3 files changed, 140 insertions(+), 7 deletions(-)

diff --git a/tools/testing/selftests/bpf/prog_tests/xdp_adjust_tail.c b/tools/testing/selftests/bpf/prog_tests/xdp_adjust_tail.c
index d5c98f2cb12f..b936beaba797 100644
--- a/tools/testing/selftests/bpf/prog_tests/xdp_adjust_tail.c
+++ b/tools/testing/selftests/bpf/prog_tests/xdp_adjust_tail.c
@@ -130,6 +130,107 @@ void test_xdp_adjust_tail_grow2(void)
 	bpf_object__close(obj);
 }
 
+void test_xdp_adjust_mb_tail_shrink(void)
+{
+	const char *file = "./test_xdp_adjust_tail_shrink.o";
+	__u32 duration, retval, size, exp_size;
+	struct bpf_object *obj;
+	static char buf[9000];
+	int err, prog_fd;
+
+	/* For the individual test cases, the first byte in the packet
+	 * indicates which test will be run.
+	 */
+
+	err = bpf_prog_load(file, BPF_PROG_TYPE_XDP, &obj, &prog_fd);
+	if (CHECK_FAIL(err))
+		return;
+
+	/* Test case removing 10 bytes from last frag, NOT freeing it */
+	buf[0] = 0;
+	exp_size = sizeof(buf) - 10;
+	err = bpf_prog_test_run(prog_fd, 1, buf, sizeof(buf),
+				buf, &size, &retval, &duration);
+
+	CHECK(err || retval != XDP_TX || size != exp_size,
+	      "9k-10b", "err %d errno %d retval %d[%d] size %d[%u]\n",
+	      err, errno, retval, XDP_TX, size, exp_size);
+
+	/* Test case removing one of two pages, assuming 4K pages */
+	buf[0] = 1;
+	exp_size = sizeof(buf) - 4100;
+	err = bpf_prog_test_run(prog_fd, 1, buf, sizeof(buf),
+				buf, &size, &retval, &duration);
+
+	CHECK(err || retval != XDP_TX || size != exp_size,
+	      "9k-1p", "err %d errno %d retval %d[%d] size %d[%u]\n",
+	      err, errno, retval, XDP_TX, size, exp_size);
+
+	/* Test case removing two pages resulting in a non mb xdp_buff */
+	buf[0] = 2;
+	exp_size = sizeof(buf) - 8200;
+	err = bpf_prog_test_run(prog_fd, 1, buf, sizeof(buf),
+				buf, &size, &retval, &duration);
+
+	CHECK(err || retval != XDP_TX || size != exp_size,
+	      "9k-2p", "err %d errno %d retval %d[%d] size %d[%u]\n",
+	      err, errno, retval, XDP_TX, size, exp_size);
+
+	bpf_object__close(obj);
+}
+
+void test_xdp_adjust_mb_tail_grow(void)
+{
+	const char *file = "./test_xdp_adjust_tail_grow.o";
+	__u32 duration, retval, size, exp_size;
+	static char buf[16384];
+	struct bpf_object *obj;
+	int err, i, prog_fd;
+
+	err = bpf_prog_load(file, BPF_PROG_TYPE_XDP, &obj, &prog_fd);
+	if (CHECK_FAIL(err))
+		return;
+
+	/* Test case add 10 bytes to last frag */
+	memset(buf, 1, sizeof(buf));
+	size = 9000;
+	exp_size = size + 10;
+	err = bpf_prog_test_run(prog_fd, 1, buf, size,
+				buf, &size, &retval, &duration);
+
+	CHECK(err || retval != XDP_TX || size != exp_size,
+	      "9k+10b", "err %d retval %d[%d] size %d[%u]\n",
+	      err, retval, XDP_TX, size, exp_size);
+
+	for (i = 0; i < 9000; i++)
+		CHECK(buf[i] != 1, "9k+10b-old",
+		      "Old data not all ok, offset %i is failing [%u]!\n",
+		      i, buf[i]);
+
+	for (i = 9000; i < 9010; i++)
+		CHECK(buf[i] != 0, "9k+10b-new",
+		      "New data not all ok, offset %i is failing [%u]!\n",
+		      i, buf[i]);
+
+	for (i = 9010; i < sizeof(buf); i++)
+		CHECK(buf[i] != 1, "9k+10b-untouched",
+		      "Unused data not all ok, offset %i is failing [%u]!\n",
+		      i, buf[i]);
+
+	/* Test a too large grow */
+	memset(buf, 1, sizeof(buf));
+	size = 9001;
+	exp_size = size;
+	err = bpf_prog_test_run(prog_fd, 1, buf, size,
+				buf, &size, &retval, &duration);
+
+	CHECK(err || retval != XDP_DROP || size != exp_size,
+	      "9k+10b", "err %d retval %d[%d] size %d[%u]\n",
+	      err, retval, XDP_DROP, size, exp_size);
+
+	bpf_object__close(obj);
+}
+
 void test_xdp_adjust_tail(void)
 {
 	if (test__start_subtest("xdp_adjust_tail_shrink"))
@@ -138,4 +239,8 @@ void test_xdp_adjust_tail(void)
 		test_xdp_adjust_tail_grow();
 	if (test__start_subtest("xdp_adjust_tail_grow2"))
 		test_xdp_adjust_tail_grow2();
+	if (test__start_subtest("xdp_adjust_mb_tail_shrink"))
+		test_xdp_adjust_mb_tail_shrink();
+	if (test__start_subtest("xdp_adjust_mb_tail_grow"))
+		test_xdp_adjust_mb_tail_grow();
 }
diff --git a/tools/testing/selftests/bpf/progs/test_xdp_adjust_tail_grow.c b/tools/testing/selftests/bpf/progs/test_xdp_adjust_tail_grow.c
index 3d66599eee2e..3d43defb0e00 100644
--- a/tools/testing/selftests/bpf/progs/test_xdp_adjust_tail_grow.c
+++ b/tools/testing/selftests/bpf/progs/test_xdp_adjust_tail_grow.c
@@ -7,11 +7,10 @@ int _xdp_adjust_tail_grow(struct xdp_md *xdp)
 {
 	void *data_end = (void *)(long)xdp->data_end;
 	void *data = (void *)(long)xdp->data;
-	unsigned int data_len;
+	int data_len = bpf_xdp_get_buff_len(xdp);
 	int offset = 0;
 
 	/* Data length determine test case */
-	data_len = data_end - data;
 
 	if (data_len == 54) { /* sizeof(pkt_v4) */
 		offset = 4096; /* test too large offset */
@@ -20,7 +19,12 @@ int _xdp_adjust_tail_grow(struct xdp_md *xdp)
 	} else if (data_len == 64) {
 		offset = 128;
 	} else if (data_len == 128) {
-		offset = 4096 - 256 - 320 - data_len; /* Max tail grow 3520 */
+		/* Max tail grow 3520 */
+		offset = 4096 - 256 - 320 - data_len;
+	} else if (data_len == 9000) {
+		offset = 10;
+	} else if (data_len == 9001) {
+		offset = 4096;
 	} else {
 		return XDP_ABORTED; /* No matching test */
 	}
diff --git a/tools/testing/selftests/bpf/progs/test_xdp_adjust_tail_shrink.c b/tools/testing/selftests/bpf/progs/test_xdp_adjust_tail_shrink.c
index 22065a9cfb25..64177597ac29 100644
--- a/tools/testing/selftests/bpf/progs/test_xdp_adjust_tail_shrink.c
+++ b/tools/testing/selftests/bpf/progs/test_xdp_adjust_tail_shrink.c
@@ -14,14 +14,38 @@ int _version SEC("version") = 1;
 SEC("xdp_adjust_tail_shrink")
 int _xdp_adjust_tail_shrink(struct xdp_md *xdp)
 {
-	void *data_end = (void *)(long)xdp->data_end;
-	void *data = (void *)(long)xdp->data;
+	__u8 *data_end = (void *)(long)xdp->data_end;
+	__u8 *data = (void *)(long)xdp->data;
 	int offset = 0;
 
-	if (data_end - data == 54) /* sizeof(pkt_v4) */
+	switch (bpf_xdp_get_buff_len(xdp)) {
+	case 54:
+		/* sizeof(pkt_v4) */
 		offset = 256; /* shrink too much */
-	else
+		break;
+	case 9000:
+		/* Multi-buffer test cases */
+		if (data + 1 > data_end)
+			return XDP_DROP;
+
+		switch (data[0]) {
+		case 0:
+			offset = 10;
+			break;
+		case 1:
+			offset = 4100;
+			break;
+		case 2:
+			offset = 8200;
+			break;
+		default:
+			return XDP_DROP;
+		}
+		break;
+	default:
 		offset = 20;
+		break;
+	}
 	if (bpf_xdp_adjust_tail(xdp, 0 - offset))
 		return XDP_DROP;
 	return XDP_TX;
-- 
2.31.1


^ permalink raw reply related	[flat|nested] 53+ messages in thread

* RE: [PATCH v9 bpf-next 00/14] mvneta: introduce XDP multi-buffer support
  2021-06-14 12:49 [PATCH v9 bpf-next 00/14] mvneta: introduce XDP multi-buffer support Lorenzo Bianconi
                   ` (13 preceding siblings ...)
  2021-06-14 12:49 ` [PATCH v9 bpf-next 14/14] bpf: update xdp_adjust_tail selftest to include multi-buffer Lorenzo Bianconi
@ 2021-06-22 23:18 ` John Fastabend
  2021-06-23  3:41   ` David Ahern
  2021-07-01  7:56   ` Magnus Karlsson
  14 siblings, 2 replies; 53+ messages in thread
From: John Fastabend @ 2021-06-22 23:18 UTC (permalink / raw)
  To: Lorenzo Bianconi, bpf, netdev
  Cc: lorenzo.bianconi, davem, kuba, ast, daniel, shayagr, sameehj,
	john.fastabend, dsahern, brouer, echaudro, jasowang,
	alexander.duyck, saeed, maciej.fijalkowski, magnus.karlsson,
	tirthendu.sarkar

Lorenzo Bianconi wrote:
> This series introduce XDP multi-buffer support. The mvneta driver is
> the first to support these new "non-linear" xdp_{buff,frame}. Reviewers
> please focus on how these new types of xdp_{buff,frame} packets
> traverse the different layers and the layout design. It is on purpose
> that BPF-helpers are kept simple, as we don't want to expose the
> internal layout to allow later changes.
> 
> For now, to keep the design simple and to maintain performance, the XDP
> BPF-prog (still) only have access to the first-buffer. It is left for
> later (another patchset) to add payload access across multiple buffers.
> This patchset should still allow for these future extensions. The goal
> is to lift the XDP MTU restriction that comes with XDP, but maintain
> same performance as before.

At this point I don't think we can have a partial implementation. At
the moment we have packet capture applications and protocol parsers
running in production. If we allow this to go in staged we are going
to break those applications that make the fundamental assumption they
have access to all the data in the packet.

There will be no way to fix it when it happens. The teams running the
applications won't necessarily be able to change the network MTU. Right
now it simply doesn't work, hard stop; that is better than something that
sort of works some of the time. And if we end up in a situation where some
drivers support partial access and others support full access, the support
matrix gets even worse.

I think we need to get full support and access to all bytes. I believe
I said this earlier, but now we've deployed apps that really do need
access to the payloads, so it's not a theoretical concern anymore, but
rather a real one based on deployed BPF programs.

> 
> The main idea for the new multi-buffer layout is to reuse the same
> layout used for non-linear SKB. This rely on the "skb_shared_info"
> struct at the end of the first buffer to link together subsequent
> buffers. Keeping the layout compatible with SKBs is also done to ease
> and speedup creating an SKB from an xdp_{buff,frame}.
> Converting xdp_frame to SKB and deliver it to the network stack is shown
> in patch 07/14 (e.g. cpumaps).
> 
> A multi-buffer bit (mb) has been introduced in the flags field of xdp_{buff,frame}
> structure to notify the bpf/network layer if this is a xdp multi-buffer frame
> (mb = 1) or not (mb = 0).
> The mb bit will be set by a xdp multi-buffer capable driver only for
> non-linear frames maintaining the capability to receive linear frames
> without any extra cost since the skb_shared_info structure at the end
> of the first buffer will be initialized only if mb is set.
> Moreover the flags field in xdp_{buff,frame} will be reused even for
> xdp rx csum offloading in future series.
> 
> Typical use cases for this series are:
> - Jumbo-frames
> - Packet header split (please see Google’s use-case @ NetDevConf 0x14, [0])
> - TSO
> 
> A new bpf helper (bpf_xdp_get_buff_len) has been introduce in order to notify
> the eBPF layer about the total frame size (linear + paged parts).

Is it possible to make currently working programs continue to work?
For a simple packet capture example, a program might capture the
entire packet of bytes '(data_end - data)'. With the above implementation
the program will continue to run, but will no longer be capturing
all the bytes... so it's a silent failure. Otherwise I'll need to
backport fixes into my BPF programs and releases to ensure they
don't walk onto a new kernel with multi-buffer support enabled.
It's not ideal.
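
For illustration, a minimal sketch of that contrast, assuming the
bpf_xdp_get_buff_len() helper added in this series (the program, section
and variable names here are purely illustrative):

  #include <linux/bpf.h>
  #include <bpf/bpf_helpers.h>

  /* Sketch only: shows what an mb-unaware capture program sees vs. the
   * total frame length reported by the new helper.
   */
  SEC("xdp")
  int xdp_capture_len(struct xdp_md *ctx)
  {
  	void *data = (void *)(long)ctx->data;
  	void *data_end = (void *)(long)ctx->data_end;
  	int linear_len = data_end - data;            /* first buffer only */
  	int total_len = bpf_xdp_get_buff_len(ctx);   /* linear + paged parts */

  	if (total_len > linear_len) {
  		/* multi-buffer frame: a pre-mb capture program copying
  		 * linear_len bytes silently misses the paged data
  		 */
  	}
  	return XDP_PASS;
  }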

> 
> bpf_xdp_adjust_tail and bpf_xdp_copy helpers have been modified to take into
> account xdp multi-buff frames.
> 
> More info about the main idea behind this approach can be found here [1][2].

Will read [1],[2].

Where did the perf data for the 40Gbps NIC go? I think we want that
done again on this series with at least 40Gbps NICs and better
yet 100Gbps drivers. If it's addressed in a patch commit message I'll
see it; I'm reading the series now.

> 
> Changes since v8:
> - add proper dma unmapping if XDP_TX fails on mvneta for a xdp multi-buff
> - switch back to skb_shared_info implementation from previous xdp_shared_info
>   one
> - avoid using a bietfield in xdp_buff/xdp_frame since it introduces performance
>   regressions. Tested now on 10G NIC (ixgbe) to verify there are no performance
>   penalties for regular codebase
> - add bpf_xdp_get_buff_len helper and remove frame_length field in xdp ctx
> - add data_len field in skb_shared_info struct
> 
> Changes since v7:
> - rebase on top of bpf-next
> - fix sparse warnings
> - improve comments for frame_length in include/net/xdp.h
> 
> Changes since v6:
> - the main difference respect to previous versions is the new approach proposed
>   by Eelco to pass full length of the packet to eBPF layer in XDP context
> - reintroduce multi-buff support to eBPF kself-tests
> - reintroduce multi-buff support to bpf_xdp_adjust_tail helper
> - introduce multi-buffer support to bpf_xdp_copy helper
> - rebase on top of bpf-next
> 
> Changes since v5:
> - rebase on top of bpf-next
> - initialize mb bit in xdp_init_buff() and drop per-driver initialization
> - drop xdp->mb initialization in xdp_convert_zc_to_xdp_frame()
> - postpone introduction of frame_length field in XDP ctx to another series
> - minor changes
> 
> Changes since v4:
> - rebase ontop of bpf-next
> - introduce xdp_shared_info to build xdp multi-buff instead of using the
>   skb_shared_info struct
> - introduce frame_length in xdp ctx
> - drop previous bpf helpers
> - fix bpf_xdp_adjust_tail for xdp multi-buff
> - introduce xdp multi-buff self-tests for bpf_xdp_adjust_tail
> - fix xdp_return_frame_bulk for xdp multi-buff
> 
> Changes since v3:
> - rebase ontop of bpf-next
> - add patch 10/13 to copy back paged data from a xdp multi-buff frame to
>   userspace buffer for xdp multi-buff selftests
> 
> Changes since v2:
> - add throughput measurements
> - drop bpf_xdp_adjust_mb_header bpf helper
> - introduce selftest for xdp multibuffer
> - addressed comments on bpf_xdp_get_frags_count
> - introduce xdp multi-buff support to cpumaps
> 
> Changes since v1:
> - Fix use-after-free in xdp_return_{buff/frame}
> - Introduce bpf helpers
> - Introduce xdp_mb sample program
> - access skb_shared_info->nr_frags only on the last fragment
> 
> Changes since RFC:
> - squash multi-buffer bit initialization in a single patch
> - add mvneta non-linear XDP buff support for tx side
> 
> [0] https://netdevconf.info/0x14/session.html?talk-the-path-to-tcp-4k-mtu-and-rx-zerocopy
> [1] https://github.com/xdp-project/xdp-project/blob/master/areas/core/xdp-multi-buffer01-design.org
> [2] https://netdevconf.info/0x14/session.html?tutorial-add-XDP-support-to-a-NIC-driver (XDPmulti-buffers section)
> 
> Eelco Chaudron (3):
>   bpf: add multi-buff support to the bpf_xdp_adjust_tail() API
>   bpf: add multi-buffer support to xdp copy helpers
>   bpf: update xdp_adjust_tail selftest to include multi-buffer
> 
> Lorenzo Bianconi (11):
>   net: skbuff: add data_len field to skb_shared_info
>   xdp: introduce flags field in xdp_buff/xdp_frame
>   net: mvneta: update mb bit before passing the xdp buffer to eBPF layer
>   xdp: add multi-buff support to xdp_return_{buff/frame}
>   net: mvneta: add multi buffer support to XDP_TX
>   net: mvneta: enable jumbo frames for XDP
>   net: xdp: add multi-buff support to xdp_build_skb_from_frame
>   bpf: introduce bpf_xdp_get_buff_len helper
>   bpf: move user_size out of bpf_test_init
>   bpf: introduce multibuff support to bpf_prog_test_run_xdp()
>   bpf: test_run: add xdp_shared_info pointer in bpf_test_finish
>     signature
> 
>  drivers/net/ethernet/marvell/mvneta.c         | 143 ++++++++++------
>  include/linux/skbuff.h                        |   5 +-
>  include/net/xdp.h                             |  56 ++++++-
>  include/uapi/linux/bpf.h                      |   7 +
>  kernel/trace/bpf_trace.c                      |   3 +
>  net/bpf/test_run.c                            | 108 +++++++++---
>  net/core/filter.c                             | 157 +++++++++++++++++-
>  net/core/xdp.c                                |  72 +++++++-
>  tools/include/uapi/linux/bpf.h                |   7 +
>  .../bpf/prog_tests/xdp_adjust_tail.c          | 105 ++++++++++++
>  .../selftests/bpf/prog_tests/xdp_bpf2bpf.c    | 127 +++++++++-----
>  .../bpf/progs/test_xdp_adjust_tail_grow.c     |  10 +-
>  .../bpf/progs/test_xdp_adjust_tail_shrink.c   |  32 +++-
>  .../selftests/bpf/progs/test_xdp_bpf2bpf.c    |   2 +-
>  14 files changed, 705 insertions(+), 129 deletions(-)
> 
> -- 
> 2.31.1
> 



^ permalink raw reply	[flat|nested] 53+ messages in thread

* RE: [PATCH v9 bpf-next 08/14] bpf: add multi-buff support to the bpf_xdp_adjust_tail() API
  2021-06-14 12:49 ` [PATCH v9 bpf-next 08/14] bpf: add multi-buff support to the bpf_xdp_adjust_tail() API Lorenzo Bianconi
@ 2021-06-22 23:37   ` John Fastabend
  2021-06-24  9:26     ` Eelco Chaudron
  0 siblings, 1 reply; 53+ messages in thread
From: John Fastabend @ 2021-06-22 23:37 UTC (permalink / raw)
  To: Lorenzo Bianconi, bpf, netdev
  Cc: lorenzo.bianconi, davem, kuba, ast, daniel, shayagr, sameehj,
	john.fastabend, dsahern, brouer, echaudro, jasowang,
	alexander.duyck, saeed, maciej.fijalkowski, magnus.karlsson,
	tirthendu.sarkar

Lorenzo Bianconi wrote:
> From: Eelco Chaudron <echaudro@redhat.com>
> 
> This change adds support for tail growing and shrinking for XDP multi-buff.
> 

It would be nice if the commit message gave us some details on how the
growing/shrinking works in the multi-buff support.

> Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
> Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
> ---
>  include/net/xdp.h |  7 ++++++
>  net/core/filter.c | 62 +++++++++++++++++++++++++++++++++++++++++++++++
>  net/core/xdp.c    |  5 ++--
>  3 files changed, 72 insertions(+), 2 deletions(-)
> 
> diff --git a/include/net/xdp.h b/include/net/xdp.h
> index 935a6f83115f..3525801c6ed5 100644
> --- a/include/net/xdp.h
> +++ b/include/net/xdp.h
> @@ -132,6 +132,11 @@ xdp_get_shared_info_from_buff(struct xdp_buff *xdp)
>  	return (struct skb_shared_info *)xdp_data_hard_end(xdp);
>  }
>  
> +static inline unsigned int xdp_get_frag_tailroom(const skb_frag_t *frag)
> +{
> +	return PAGE_SIZE - skb_frag_size(frag) - skb_frag_off(frag);
> +}
> +
>  struct xdp_frame {
>  	void *data;
>  	u16 len;
> @@ -259,6 +264,8 @@ struct xdp_frame *xdp_convert_buff_to_frame(struct xdp_buff *xdp)
>  	return xdp_frame;
>  }
>  
> +void __xdp_return(void *data, struct xdp_mem_info *mem, bool napi_direct,
> +		  struct xdp_buff *xdp);
>  void xdp_return_frame(struct xdp_frame *xdpf);
>  void xdp_return_frame_rx_napi(struct xdp_frame *xdpf);
>  void xdp_return_buff(struct xdp_buff *xdp);
> diff --git a/net/core/filter.c b/net/core/filter.c
> index caa88955562e..05f574a3d690 100644
> --- a/net/core/filter.c
> +++ b/net/core/filter.c
> @@ -3859,11 +3859,73 @@ static const struct bpf_func_proto bpf_xdp_adjust_head_proto = {
>  	.arg2_type	= ARG_ANYTHING,
>  };
>  
> +static int bpf_xdp_mb_adjust_tail(struct xdp_buff *xdp, int offset)
> +{
> +	struct skb_shared_info *sinfo;
> +
> +	if (unlikely(!xdp_buff_is_mb(xdp)))
> +		return -EINVAL;
> +
> +	sinfo = xdp_get_shared_info_from_buff(xdp);
> +	if (offset >= 0) {
> +		skb_frag_t *frag = &sinfo->frags[sinfo->nr_frags - 1];
> +		int size;
> +
> +		if (unlikely(offset > xdp_get_frag_tailroom(frag)))
> +			return -EINVAL;
> +
> +		size = skb_frag_size(frag);
> +		memset(skb_frag_address(frag) + size, 0, offset);
> +		skb_frag_size_set(frag, size + offset);
> +		sinfo->data_len += offset;

Can you add some comment on how this works? So today I call
bpf_xdp_adjust_tail() to add some trailer to my packet.
This looks like it adds tailroom to the last frag? But, then
how do I insert my trailer? I don't think we can without the
extra multi-buffer access support right.

Also data_end will be unchanged yet it will return 0 so my
current programs will likely be a bit confused by this.

> +	} else {
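
For reference, the pre-multi-buffer trailer pattern being described is
roughly the sketch below (it assumes a frame of a known PKT_LEN, as the
xdp_adjust_tail selftests do; names and values are illustrative). With
the mb path above, the grow lands in the last frag and data_end does not
move, so the bounds check fails and the trailer cannot be written without
extra frag access helpers:

  #include <linux/bpf.h>
  #include <bpf/bpf_helpers.h>

  #define PKT_LEN 128	/* assumed, known packet length */

  SEC("xdp")
  int xdp_add_trailer(struct xdp_md *ctx)
  {
  	void *data, *data_end;
  	__u32 *trailer;

  	if (bpf_xdp_adjust_tail(ctx, sizeof(*trailer)))
  		return XDP_DROP;

  	/* pointers must be reloaded after the tail adjustment */
  	data = (void *)(long)ctx->data;
  	data_end = (void *)(long)ctx->data_end;

  	trailer = data + PKT_LEN;
  	if ((void *)(trailer + 1) > data_end)
  		return XDP_DROP;	/* this is what happens on an mb frame */

  	*trailer = 0xcafef00d;		/* trailer lands at the old tail */
  	return XDP_PASS;
  }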

^ permalink raw reply	[flat|nested] 53+ messages in thread

* RE: [PATCH v9 bpf-next 10/14] bpf: add multi-buffer support to xdp copy helpers
  2021-06-14 12:49 ` [PATCH v9 bpf-next 10/14] bpf: add multi-buffer support to xdp copy helpers Lorenzo Bianconi
@ 2021-06-22 23:49   ` John Fastabend
  2021-06-24  9:42     ` Eelco Chaudron
  0 siblings, 1 reply; 53+ messages in thread
From: John Fastabend @ 2021-06-22 23:49 UTC (permalink / raw)
  To: Lorenzo Bianconi, bpf, netdev
  Cc: lorenzo.bianconi, davem, kuba, ast, daniel, shayagr, sameehj,
	john.fastabend, dsahern, brouer, echaudro, jasowang,
	alexander.duyck, saeed, maciej.fijalkowski, magnus.karlsson,
	tirthendu.sarkar

Lorenzo Bianconi wrote:
> From: Eelco Chaudron <echaudro@redhat.com>
> 
> This patch adds support for multi-buffer for the following helpers:
>   - bpf_xdp_output()
>   - bpf_perf_event_output()
> 
> Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
> Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
> ---

Ah ok so at least xdp_output will work with all bytes. But this is
getting close to having access into the frags so I think doing
the last bit shouldn't be too hard?

>  kernel/trace/bpf_trace.c                      |   3 +
>  net/core/filter.c                             |  72 +++++++++-
>  .../selftests/bpf/prog_tests/xdp_bpf2bpf.c    | 127 ++++++++++++------
>  .../selftests/bpf/progs/test_xdp_bpf2bpf.c    |   2 +-
>  4 files changed, 160 insertions(+), 44 deletions(-)
> 
> diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
> index d2d7cf6cfe83..ee926ec64f78 100644
> --- a/kernel/trace/bpf_trace.c
> +++ b/kernel/trace/bpf_trace.c
> @@ -1365,6 +1365,7 @@ static const struct bpf_func_proto bpf_perf_event_output_proto_raw_tp = {
>  
>  extern const struct bpf_func_proto bpf_skb_output_proto;
>  extern const struct bpf_func_proto bpf_xdp_output_proto;
> +extern const struct bpf_func_proto bpf_xdp_get_buff_len_trace_proto;
>  
>  BPF_CALL_3(bpf_get_stackid_raw_tp, struct bpf_raw_tracepoint_args *, args,
>  	   struct bpf_map *, map, u64, flags)
> @@ -1460,6 +1461,8 @@ tracing_prog_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
>  		return &bpf_sock_from_file_proto;
>  	case BPF_FUNC_get_socket_cookie:
>  		return &bpf_get_socket_ptr_cookie_proto;
> +	case BPF_FUNC_xdp_get_buff_len:
> +		return &bpf_xdp_get_buff_len_trace_proto;
>  #endif
>  	case BPF_FUNC_seq_printf:
>  		return prog->expected_attach_type == BPF_TRACE_ITER ?
> diff --git a/net/core/filter.c b/net/core/filter.c
> index b0855f2d4726..f7211b7908a9 100644
> --- a/net/core/filter.c
> +++ b/net/core/filter.c
> @@ -3939,6 +3939,15 @@ const struct bpf_func_proto bpf_xdp_get_buff_len_proto = {
>  	.arg1_type	= ARG_PTR_TO_CTX,
>  };
>  
> +BTF_ID_LIST_SINGLE(bpf_xdp_get_buff_len_bpf_ids, struct, xdp_buff)
> +
> +const struct bpf_func_proto bpf_xdp_get_buff_len_trace_proto = {
> +	.func		= bpf_xdp_get_buff_len,
> +	.gpl_only	= false,
> +	.arg1_type	= ARG_PTR_TO_BTF_ID,
> +	.arg1_btf_id	= &bpf_xdp_get_buff_len_bpf_ids[0],
> +};
> +
>  BPF_CALL_2(bpf_xdp_adjust_tail, struct xdp_buff *, xdp, int, offset)
>  {
>  	void *data_hard_end = xdp_data_hard_end(xdp); /* use xdp->frame_sz */
> @@ -4606,10 +4615,56 @@ static const struct bpf_func_proto bpf_sk_ancestor_cgroup_id_proto = {
>  };
>  #endif
>  
> -static unsigned long bpf_xdp_copy(void *dst_buff, const void *src_buff,
> +static unsigned long bpf_xdp_copy(void *dst_buff, const void *ctx,
>  				  unsigned long off, unsigned long len)
>  {
> -	memcpy(dst_buff, src_buff + off, len);
> +	struct xdp_buff *xdp = (struct xdp_buff *)ctx;
> +	struct skb_shared_info *sinfo;
> +	unsigned long base_len;
> +
> +	if (likely(!xdp_buff_is_mb(xdp))) {
> +		memcpy(dst_buff, xdp->data + off, len);
> +		return 0;
> +	}
> +
> +	base_len = xdp->data_end - xdp->data;
> +	sinfo = xdp_get_shared_info_from_buff(xdp);
> +	do {
> +		const void *src_buff = NULL;
> +		unsigned long copy_len = 0;
> +
> +		if (off < base_len) {
> +			src_buff = xdp->data + off;
> +			copy_len = min(len, base_len - off);
> +		} else {
> +			unsigned long frag_off_total = base_len;
> +			int i;
> +
> +			for (i = 0; i < sinfo->nr_frags; i++) {
> +				skb_frag_t *frag = &sinfo->frags[i];
> +				unsigned long frag_len, frag_off;
> +
> +				frag_len = skb_frag_size(frag);
> +				frag_off = off - frag_off_total;
> +				if (frag_off < frag_len) {
> +					src_buff = skb_frag_address(frag) +
> +						   frag_off;
> +					copy_len = min(len,
> +						       frag_len - frag_off);
> +					break;
> +				}
> +				frag_off_total += frag_len;
> +			}
> +		}
> +		if (!src_buff)
> +			break;
> +
> +		memcpy(dst_buff, src_buff, copy_len);
> +		off += copy_len;
> +		len -= copy_len;
> +		dst_buff += copy_len;

This block reads odd to me because it requires looping over the frags
multiple times? Why not something like this,

  if (off < base_len) {
   src_buff = xdp->data + off
   copy_len = min...
   memcpy(dst_buff, src_buff, copy_len)
   off += copylen
   len -= copylen
   dst_buff += copylen;
  }

  for (i = 0; i , nr_frags; i++) {
     frag = ...
     ...
     if frag_off < fraglen
        ...
        memcpy()
        update(off, len, dst_buff)
  }


Maybe use a helper to set off,len and dst_buff if worried about the
duplication. Seems cleaner than walking through 0..n-1 frags for
each copy.
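
An untested sketch of that single-pass shape, against the structures used
in this patch (linear part first, then each frag visited exactly once):

  static unsigned long bpf_xdp_copy(void *dst_buff, const void *ctx,
  				    unsigned long off, unsigned long len)
  {
  	struct xdp_buff *xdp = (struct xdp_buff *)ctx;
  	unsigned long base_len = xdp->data_end - xdp->data;
  	unsigned long frag_start = base_len;
  	struct skb_shared_info *sinfo;
  	int i;

  	/* linear part first */
  	if (off < base_len) {
  		unsigned long copy_len = min(len, base_len - off);

  		memcpy(dst_buff, xdp->data + off, copy_len);
  		off += copy_len;
  		len -= copy_len;
  		dst_buff += copy_len;
  	}

  	if (!len || likely(!xdp_buff_is_mb(xdp)))
  		return 0;

  	/* then walk the frags exactly once */
  	sinfo = xdp_get_shared_info_from_buff(xdp);
  	for (i = 0; i < sinfo->nr_frags && len; i++) {
  		skb_frag_t *frag = &sinfo->frags[i];
  		unsigned long frag_len = skb_frag_size(frag);

  		if (off < frag_start + frag_len) {
  			unsigned long frag_off = off - frag_start;
  			unsigned long copy_len = min(len, frag_len - frag_off);

  			memcpy(dst_buff, skb_frag_address(frag) + frag_off,
  			       copy_len);
  			off += copy_len;
  			len -= copy_len;
  			dst_buff += copy_len;
  		}
  		frag_start += frag_len;
  	}

  	return 0;
  }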

> +	} while (len);
> +
>  	return 0;
>  }

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH v9 bpf-next 00/14] mvneta: introduce XDP multi-buffer support
  2021-06-22 23:18 ` [PATCH v9 bpf-next 00/14] mvneta: introduce XDP multi-buffer support John Fastabend
@ 2021-06-23  3:41   ` David Ahern
  2021-06-23  5:48     ` John Fastabend
  2021-07-01  7:56   ` Magnus Karlsson
  1 sibling, 1 reply; 53+ messages in thread
From: David Ahern @ 2021-06-23  3:41 UTC (permalink / raw)
  To: John Fastabend, Lorenzo Bianconi, bpf, netdev
  Cc: lorenzo.bianconi, davem, kuba, ast, daniel, shayagr, sameehj,
	dsahern, brouer, echaudro, jasowang, alexander.duyck, saeed,
	maciej.fijalkowski, magnus.karlsson, tirthendu.sarkar

On 6/22/21 5:18 PM, John Fastabend wrote:
> At this point I don't think we can have a partial implementation. At
> the moment we have packet capture applications and protocol parsers
> running in production. If we allow this to go in staged we are going
> to break those applications that make the fundamental assumption they
> have access to all the data in the packet.

What about cases like netgpu where headers are accessible but data is
not (e.g., gpu memory)? If the API indicates limited buffer access, is
that sufficient?

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH v9 bpf-next 00/14] mvneta: introduce XDP multi-buffer support
  2021-06-23  3:41   ` David Ahern
@ 2021-06-23  5:48     ` John Fastabend
  2021-06-23 14:40       ` David Ahern
  0 siblings, 1 reply; 53+ messages in thread
From: John Fastabend @ 2021-06-23  5:48 UTC (permalink / raw)
  To: David Ahern, John Fastabend, Lorenzo Bianconi, bpf, netdev
  Cc: lorenzo.bianconi, davem, kuba, ast, daniel, shayagr, sameehj,
	dsahern, brouer, echaudro, jasowang, alexander.duyck, saeed,
	maciej.fijalkowski, magnus.karlsson, tirthendu.sarkar

David Ahern wrote:
> On 6/22/21 5:18 PM, John Fastabend wrote:
> > At this point I don't think we can have a partial implementation. At
> > the moment we have packet capture applications and protocol parsers
> > running in production. If we allow this to go in staged we are going
> > to break those applications that make the fundamental assumption they
> > have access to all the data in the packet.
> 
> What about cases like netgpu where headers are accessible but data is
> not (e.g., gpu memory)? If the API indicates limited buffer access, is
> that sufficient?

I never consider netgpus and I guess I don't fully understand the
architecture to say. But, I would try to argue that an XDP API
should allow XDP to reach into the payload of these GPU packets as well.
Of course it might be slow.

I'm not really convinced just indicating it's a limited buffer is enough.
I think we want to be able to read/write any byte in the packet. I see
two ways to do it,

  /* xdp_pull_data moves data and data_end pointers into the frag
   * containing the byte offset start.
   *
   * returns negative value on error otherwise returns offset of
   * data pointer into payload.
   */
  int xdp_pull_data(int start)

This would be a helper call to push the xdp->data{_end} pointers into
the correct frag and then normal verification should work. From my
side this works because I can always find the next frag by starting
at 'xdp_pull_data(xdp->data_end+1)'. And by returning offset we can
always figure out where we are in the payload. This is the easiest
thing I could come up with. And hopefully for _most_ cases the bytes
we need are in the initial data. Also I don't see how extending tail
works without something like this.
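
To make the intended usage a bit more concrete, walking all fragments
with such a helper could look roughly like the sketch below. To be clear,
xdp_pull_data() here is only the hypothetical helper proposed above (with
the loose semantics described in the comment), not something that exists
today:

  /* Sketch only: xdp_pull_data() is the proposed, hypothetical helper. */
  int off = 0, i;

  for (i = 0; i < 16; i++) {	/* bounded for the verifier */
  	void *data, *data_end;

  	off = xdp_pull_data(off);	/* data/data_end now cover the frag holding 'off' */
  	if (off < 0)
  		break;			/* error or past the end of the frame */

  	data = (void *)(long)ctx->data;
  	data_end = (void *)(long)ctx->data_end;

  	/* parse or copy the bytes in [data, data_end) as usual ... */

  	off += data_end - data;		/* first byte of the next frag */
  }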

My other thought, but requires some verifier work would be to extend
'struct xdp_md' with a frags[] pointer.

 struct xdp_md {
   __u32 data;
   __u32 data_end;
   __u32 data_meta;
   /* metadata stuff */
  struct _xdp_md frags[] 
  __u32 frags_end;
 }

Then a XDP program could read access a frag like so,

  if (i < xdp->frags_end) {
     frag = xdp->frags[i];
     if (offset + hdr_size < frag->data_end)
         memcpy(dst, frag->data[offset], hdr_size);
  }

The nice bit about above is you avoid the call, but maybe it doesn't
matter if you are already looking into frags pps is probably not at
64B sizes anyways.

My main concern here is we hit a case where the driver doesn't pull in
the bytes we need and then we are stuck without a workaround. The helper
looks fairly straightforward to me; could we try that?

Also I thought we had another driver in the works? Any ideas where
that went...

Last, I'll add thanks for working on this everyone.

.John

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH v9 bpf-next 00/14] mvneta: introduce XDP multi-buffer support
  2021-06-23  5:48     ` John Fastabend
@ 2021-06-23 14:40       ` David Ahern
  2021-06-24 14:22         ` John Fastabend
  0 siblings, 1 reply; 53+ messages in thread
From: David Ahern @ 2021-06-23 14:40 UTC (permalink / raw)
  To: John Fastabend, Lorenzo Bianconi, bpf, netdev
  Cc: lorenzo.bianconi, davem, kuba, ast, daniel, shayagr, sameehj,
	dsahern, brouer, echaudro, jasowang, alexander.duyck, saeed,
	maciej.fijalkowski, magnus.karlsson, tirthendu.sarkar

On 6/22/21 11:48 PM, John Fastabend wrote:
> David Ahern wrote:
>> On 6/22/21 5:18 PM, John Fastabend wrote:
>>> At this point I don't think we can have a partial implementation. At
>>> the moment we have packet capture applications and protocol parsers
>>> running in production. If we allow this to go in staged we are going
>>> to break those applications that make the fundamental assumption they
>>> have access to all the data in the packet.
>>
>> What about cases like netgpu where headers are accessible but data is
>> not (e.g., gpu memory)? If the API indicates limited buffer access, is
>> that sufficient?
> 
> I never consider netgpus and I guess I don't fully understand the
> architecture to say. But, I would try to argue that an XDP API
> should allow XDP to reach into the payload of these GPU packets as well.
> Of course it might be slow.

AIUI S/W on the host can not access gpu memory, so that is not a
possibility at all.

Another use case is DDP and ZC. Mellanox has a proposal for NVME (with
intentions to extend to iscsi) to do direct data placement. This is
really just an example of zerocopy (and netgpu has morphed into zctap
with current prototype working for host memory) which will become more
prominent. XDP programs accessing memory already mapped to user space
will be racy.

To me these proposals suggest a trend and one that XDP APIs should be
ready to handle - like indicating limited access or specifying length
that can be accessed.


> 
> I'm not really convinced just indicating it's a limited buffer is enough.
> I think we want to be able to read/write any byte in the packet. I see
> two ways to do it,
> 
>   /* xdp_pull_data moves data and data_end pointers into the frag
>    * containing the byte offset start.
>    *
>    * returns negative value on error otherwise returns offset of
>    * data pointer into payload.
>    */
>   int xdp_pull_data(int start)
> 
> This would be a helper call to push the xdp->data{_end} pointers into
> the correct frag and then normal verification should work. From my
> side this works because I can always find the next frag by starting
> at 'xdp_pull_data(xdp->data_end+1)'. And by returning offset we can
> always figure out where we are in the payload. This is the easiest
> thing I could come up with. And hopefully for _most_ cases the bytes
> we need are in the initial data. Also I don't see how extending tail
> works without something like this.
> 
> My other thought, but requires some verifier work would be to extend
> 'struct xdp_md' with a frags[] pointer.
> 
>  struct xdp_md {
>    __u32 data;
>    __u32 data_end;
>    __u32 data_meta;
>    /* metadata stuff */
>   struct _xdp_md frags[] 
>   __u32 frags_end;
>  }
> 
> Then a XDP program could read access a frag like so,
> 
>   if (i < xdp->frags_end) {
>      frag = xdp->frags[i];
>      if (offset + hdr_size < frag->data_end)
>          memcpy(dst, frag->data[offset], hdr_size);
>   }
> 
> The nice bit about above is you avoid the call, but maybe it doesn't
> matter if you are already looking into frags pps is probably not at
> 64B sizes anyways.
> 
> My main concern here is we hit a case where the driver doesn't pull in
> the bytes we need and then we are stuck without a workaround. The helper
> looks fairly straightforward to me; could we try that?
> 
> Also I thought we had another driver in the works? Any ideas where
> that went...
> 
> Last, I'll add thanks for working on this everyone.
> 
> .John
> 


^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH v9 bpf-next 08/14] bpf: add multi-buff support to the bpf_xdp_adjust_tail() API
  2021-06-22 23:37   ` John Fastabend
@ 2021-06-24  9:26     ` Eelco Chaudron
  2021-06-24 14:24       ` John Fastabend
  0 siblings, 1 reply; 53+ messages in thread
From: Eelco Chaudron @ 2021-06-24  9:26 UTC (permalink / raw)
  To: John Fastabend
  Cc: Lorenzo Bianconi, bpf, netdev, lorenzo.bianconi, davem, kuba,
	ast, daniel, shayagr, sameehj, dsahern, brouer, jasowang,
	alexander.duyck, saeed, maciej.fijalkowski, magnus.karlsson,
	tirthendu.sarkar



On 23 Jun 2021, at 1:37, John Fastabend wrote:

> Lorenzo Bianconi wrote:
>> From: Eelco Chaudron <echaudro@redhat.com>
>>
>> This change adds support for tail growing and shrinking for XDP multi-buff.
>>
>
> It would be nice if the commit message gave us some details on how the
> growing/shrinking works in the multi-buff support.

Will add this to the next rev.

>> Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
>> Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
>> ---
>>  include/net/xdp.h |  7 ++++++
>>  net/core/filter.c | 62 +++++++++++++++++++++++++++++++++++++++++++++++
>>  net/core/xdp.c    |  5 ++--
>>  3 files changed, 72 insertions(+), 2 deletions(-)
>>
>> diff --git a/include/net/xdp.h b/include/net/xdp.h
>> index 935a6f83115f..3525801c6ed5 100644
>> --- a/include/net/xdp.h
>> +++ b/include/net/xdp.h
>> @@ -132,6 +132,11 @@ xdp_get_shared_info_from_buff(struct xdp_buff *xdp)
>>  	return (struct skb_shared_info *)xdp_data_hard_end(xdp);
>>  }
>>
>> +static inline unsigned int xdp_get_frag_tailroom(const skb_frag_t *frag)
>> +{
>> +	return PAGE_SIZE - skb_frag_size(frag) - skb_frag_off(frag);
>> +}
>> +
>>  struct xdp_frame {
>>  	void *data;
>>  	u16 len;
>> @@ -259,6 +264,8 @@ struct xdp_frame *xdp_convert_buff_to_frame(struct xdp_buff *xdp)
>>  	return xdp_frame;
>>  }
>>
>> +void __xdp_return(void *data, struct xdp_mem_info *mem, bool napi_direct,
>> +		  struct xdp_buff *xdp);
>>  void xdp_return_frame(struct xdp_frame *xdpf);
>>  void xdp_return_frame_rx_napi(struct xdp_frame *xdpf);
>>  void xdp_return_buff(struct xdp_buff *xdp);
>> diff --git a/net/core/filter.c b/net/core/filter.c
>> index caa88955562e..05f574a3d690 100644
>> --- a/net/core/filter.c
>> +++ b/net/core/filter.c
>> @@ -3859,11 +3859,73 @@ static const struct bpf_func_proto bpf_xdp_adjust_head_proto = {
>>  	.arg2_type	= ARG_ANYTHING,
>>  };
>>
>> +static int bpf_xdp_mb_adjust_tail(struct xdp_buff *xdp, int offset)
>> +{
>> +	struct skb_shared_info *sinfo;
>> +
>> +	if (unlikely(!xdp_buff_is_mb(xdp)))
>> +		return -EINVAL;
>> +
>> +	sinfo = xdp_get_shared_info_from_buff(xdp);
>> +	if (offset >= 0) {
>> +		skb_frag_t *frag = &sinfo->frags[sinfo->nr_frags - 1];
>> +		int size;
>> +
>> +		if (unlikely(offset > xdp_get_frag_tailroom(frag)))
>> +			return -EINVAL;
>> +
>> +		size = skb_frag_size(frag);
>> +		memset(skb_frag_address(frag) + size, 0, offset);
>> +		skb_frag_size_set(frag, size + offset);
>> +		sinfo->data_len += offset;
>
> Can you add some comment on how this works? So today I call
> bpf_xdp_adjust_tail() to add some trailer to my packet.
> This looks like it adds tailroom to the last frag? But, then
> how do I insert my trailer? I don't think we can without the
> extra multi-buffer access support right.

You are right, we need some kind of multi-buffer access helpers.

> Also data_end will be unchanged yet it will return 0 so my
> current programs will likely be a bit confused by this.

Guess this is the tricky part, applications need to be multi-buffer aware. If current applications rely on bpf_xdp_adjust_tail(+) to determine maximum frame length this approach might not work. In this case, we might need an additional helper to do tail expansion with multi buffer support.

But then the question arises: how would an mb-unaware application behave in general when an mb packet is supplied? It would definitely not determine the correct packet length.

>> +	} else {


^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH v9 bpf-next 10/14] bpf: add multi-buffer support to xdp copy helpers
  2021-06-22 23:49   ` John Fastabend
@ 2021-06-24  9:42     ` Eelco Chaudron
  2021-06-24 14:28       ` John Fastabend
  0 siblings, 1 reply; 53+ messages in thread
From: Eelco Chaudron @ 2021-06-24  9:42 UTC (permalink / raw)
  To: John Fastabend
  Cc: Lorenzo Bianconi, bpf, netdev, lorenzo.bianconi, davem, kuba,
	ast, daniel, shayagr, sameehj, dsahern, brouer, jasowang,
	alexander.duyck, saeed, maciej.fijalkowski, magnus.karlsson,
	tirthendu.sarkar



On 23 Jun 2021, at 1:49, John Fastabend wrote:

> Lorenzo Bianconi wrote:
>> From: Eelco Chaudron <echaudro@redhat.com>
>>
>> This patch adds support for multi-buffer for the following helpers:
>>   - bpf_xdp_output()
>>   - bpf_perf_event_output()
>>
>> Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
>> Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
>> ---
>
> Ah ok so at least xdp_output will work with all bytes. But this is
> getting close to having access into the frags so I think doing
> the last bit shouldn't be too hard?


Guess you are talking about multi-buffer access in the XDP program?

I did suggest an API a while back, https://lore.kernel.org/bpf/FD3E6E08-DE78-4FBA-96F6-646C93E88631@redhat.com/ but I had/have no time to work on it. I guess the difficult part is convincing the verifier to allow the data to be accessed.

>
>>  kernel/trace/bpf_trace.c                      |   3 +
>>  net/core/filter.c                             |  72 +++++++++-
>>  .../selftests/bpf/prog_tests/xdp_bpf2bpf.c    | 127 ++++++++++++------
>>  .../selftests/bpf/progs/test_xdp_bpf2bpf.c    |   2 +-
>>  4 files changed, 160 insertions(+), 44 deletions(-)
>>
>> diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
>> index d2d7cf6cfe83..ee926ec64f78 100644
>> --- a/kernel/trace/bpf_trace.c
>> +++ b/kernel/trace/bpf_trace.c
>> @@ -1365,6 +1365,7 @@ static const struct bpf_func_proto bpf_perf_event_output_proto_raw_tp = {
>>
>>  extern const struct bpf_func_proto bpf_skb_output_proto;
>>  extern const struct bpf_func_proto bpf_xdp_output_proto;
>> +extern const struct bpf_func_proto bpf_xdp_get_buff_len_trace_proto;
>>
>>  BPF_CALL_3(bpf_get_stackid_raw_tp, struct bpf_raw_tracepoint_args *, args,
>>  	   struct bpf_map *, map, u64, flags)
>> @@ -1460,6 +1461,8 @@ tracing_prog_func_proto(enum bpf_func_id func_id, const struct bpf_prog *prog)
>>  		return &bpf_sock_from_file_proto;
>>  	case BPF_FUNC_get_socket_cookie:
>>  		return &bpf_get_socket_ptr_cookie_proto;
>> +	case BPF_FUNC_xdp_get_buff_len:
>> +		return &bpf_xdp_get_buff_len_trace_proto;
>>  #endif
>>  	case BPF_FUNC_seq_printf:
>>  		return prog->expected_attach_type == BPF_TRACE_ITER ?
>> diff --git a/net/core/filter.c b/net/core/filter.c
>> index b0855f2d4726..f7211b7908a9 100644
>> --- a/net/core/filter.c
>> +++ b/net/core/filter.c
>> @@ -3939,6 +3939,15 @@ const struct bpf_func_proto bpf_xdp_get_buff_len_proto = {
>>  	.arg1_type	= ARG_PTR_TO_CTX,
>>  };
>>
>> +BTF_ID_LIST_SINGLE(bpf_xdp_get_buff_len_bpf_ids, struct, xdp_buff)
>> +
>> +const struct bpf_func_proto bpf_xdp_get_buff_len_trace_proto = {
>> +	.func		= bpf_xdp_get_buff_len,
>> +	.gpl_only	= false,
>> +	.arg1_type	= ARG_PTR_TO_BTF_ID,
>> +	.arg1_btf_id	= &bpf_xdp_get_buff_len_bpf_ids[0],
>> +};
>> +
>>  BPF_CALL_2(bpf_xdp_adjust_tail, struct xdp_buff *, xdp, int, offset)
>>  {
>>  	void *data_hard_end = xdp_data_hard_end(xdp); /* use xdp->frame_sz */
>> @@ -4606,10 +4615,56 @@ static const struct bpf_func_proto bpf_sk_ancestor_cgroup_id_proto = {
>>  };
>>  #endif
>>
>> -static unsigned long bpf_xdp_copy(void *dst_buff, const void *src_buff,
>> +static unsigned long bpf_xdp_copy(void *dst_buff, const void *ctx,
>>  				  unsigned long off, unsigned long len)
>>  {
>> -	memcpy(dst_buff, src_buff + off, len);
>> +	struct xdp_buff *xdp = (struct xdp_buff *)ctx;
>> +	struct skb_shared_info *sinfo;
>> +	unsigned long base_len;
>> +
>> +	if (likely(!xdp_buff_is_mb(xdp))) {
>> +		memcpy(dst_buff, xdp->data + off, len);
>> +		return 0;
>> +	}
>> +
>> +	base_len = xdp->data_end - xdp->data;
>> +	sinfo = xdp_get_shared_info_from_buff(xdp);
>> +	do {
>> +		const void *src_buff = NULL;
>> +		unsigned long copy_len = 0;
>> +
>> +		if (off < base_len) {
>> +			src_buff = xdp->data + off;
>> +			copy_len = min(len, base_len - off);
>> +		} else {
>> +			unsigned long frag_off_total = base_len;
>> +			int i;
>> +
>> +			for (i = 0; i < sinfo->nr_frags; i++) {
>> +				skb_frag_t *frag = &sinfo->frags[i];
>> +				unsigned long frag_len, frag_off;
>> +
>> +				frag_len = skb_frag_size(frag);
>> +				frag_off = off - frag_off_total;
>> +				if (frag_off < frag_len) {
>> +					src_buff = skb_frag_address(frag) +
>> +						   frag_off;
>> +					copy_len = min(len,
>> +						       frag_len - frag_off);
>> +					break;
>> +				}
>> +				frag_off_total += frag_len;
>> +			}
>> +		}
>> +		if (!src_buff)
>> +			break;
>> +
>> +		memcpy(dst_buff, src_buff, copy_len);
>> +		off += copy_len;
>> +		len -= copy_len;
>> +		dst_buff += copy_len;
>
> This block reads odd to me because it requires looping over the frags
> multiple times? Why not something like this,
>
>   if (off < base_len) {
>    src_buff = xdp->data + off
>    copy_len = min...
>    memcpy(dst_buff, src_buff, copy_len)
>    off += copylen
>    len -= copylen
>    dst_buff += copylen;
>   }
>
>   for (i = 0; i , nr_frags; i++) {
>      frag = ...
>      ...
>      if frag_off < fraglen
>         ...
>         memcpy()
>         update(off, len, dst_buff)
>   }
>
>
> Maybe use a helper to set off,len and dst_buff if worried about the
> duplication. Seems cleaner than walking through 0..n-1 frags for
> each copy.

You are right, it looks odd; I will rewrite this in the next iteration.

>> +	} while (len);
>> +
>>  	return 0;
>>  }


^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH v9 bpf-next 00/14] mvneta: introduce XDP multi-buffer support
  2021-06-23 14:40       ` David Ahern
@ 2021-06-24 14:22         ` John Fastabend
  0 siblings, 0 replies; 53+ messages in thread
From: John Fastabend @ 2021-06-24 14:22 UTC (permalink / raw)
  To: David Ahern, John Fastabend, Lorenzo Bianconi, bpf, netdev
  Cc: lorenzo.bianconi, davem, kuba, ast, daniel, shayagr, sameehj,
	dsahern, brouer, echaudro, jasowang, alexander.duyck, saeed,
	maciej.fijalkowski, magnus.karlsson, tirthendu.sarkar

David Ahern wrote:
> On 6/22/21 11:48 PM, John Fastabend wrote:
> > David Ahern wrote:
> >> On 6/22/21 5:18 PM, John Fastabend wrote:
> >>> At this point I don't think we can have a partial implementation. At
> >>> the moment we have packet capture applications and protocol parsers
> >>> running in production. If we allow this to go in staged we are going
> >>> to break those applications that make the fundamental assumption they
> >>> have access to all the data in the packet.
> >>
> >> What about cases like netgpu where headers are accessible but data is
> >> not (e.g., gpu memory)? If the API indicates limited buffer access, is
> >> that sufficient?
> > 
> > I never consider netgpus and I guess I don't fully understand the
> > architecture to say. But, I would try to argue that an XDP API
> > should allow XDP to reach into the payload of these GPU packets as well.
> > Of course it might be slow.
> 
> AIUI S/W on the host can not access gpu memory, so that is not a
> possibility at all.

interesting.

> 
> Another use case is DDP and ZC. Mellanox has a proposal for NVME (with
> intentions to extend to iscsi) to do direct data placement. This is
> really just an example of zerocopy (and netgpu has morphed into zctap
> with current prototype working for host memory) which will become more
> prominent. XDP programs accessing memory already mapped to user space
> will be racy.

It's racy in the sense that, if the application is reading data before
the driver flips some bit to tell the application new data is available,
XDP could write old data or read application-changed data? I think
it would still "work", same as AF_XDP? If you allow DDP then you lose
the ability to do L7 security as far as I can tell. But that's a general
comment, not specific to XDP.

> 
> To me these proposals suggest a trend and one that XDP APIs should be
> ready to handle - like indicating limited access or specifying length
> that can be accessed.

I still think the only case is this net-gpu, which we don't have in
the kernel at the moment, right? I think a bit or size or ... would make
sense if we had this hardware. And then for the other DDP/ZC case
the system owner would need to know what they are doing when they
turn on DDP or whatever.

.John

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH v9 bpf-next 08/14] bpf: add multi-buff support to the bpf_xdp_adjust_tail() API
  2021-06-24  9:26     ` Eelco Chaudron
@ 2021-06-24 14:24       ` John Fastabend
  2021-06-24 15:16         ` Zvi Effron
                           ` (2 more replies)
  0 siblings, 3 replies; 53+ messages in thread
From: John Fastabend @ 2021-06-24 14:24 UTC (permalink / raw)
  To: Eelco Chaudron, John Fastabend
  Cc: Lorenzo Bianconi, bpf, netdev, lorenzo.bianconi, davem, kuba,
	ast, daniel, shayagr, sameehj, dsahern, brouer, jasowang,
	alexander.duyck, saeed, maciej.fijalkowski, magnus.karlsson,
	tirthendu.sarkar

Eelco Chaudron wrote:
> 
> 
> On 23 Jun 2021, at 1:37, John Fastabend wrote:
> 
> > Lorenzo Bianconi wrote:
> >> From: Eelco Chaudron <echaudro@redhat.com>
> >>
> >> This change adds support for tail growing and shrinking for XDP multi-buff.
> >>
> >
> > It would be nice if the commit message gave us some details on how the
> > growing/shrinking works in the multi-buff support.
> 
> Will add this to the next rev.
> 
> >> Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
> >> Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
> >> ---
> >>  include/net/xdp.h |  7 ++++++
> >>  net/core/filter.c | 62 +++++++++++++++++++++++++++++++++++++++++++++++
> >>  net/core/xdp.c    |  5 ++--
> >>  3 files changed, 72 insertions(+), 2 deletions(-)
> >>
> >> diff --git a/include/net/xdp.h b/include/net/xdp.h
> >> index 935a6f83115f..3525801c6ed5 100644
> >> --- a/include/net/xdp.h
> >> +++ b/include/net/xdp.h
> >> @@ -132,6 +132,11 @@ xdp_get_shared_info_from_buff(struct xdp_buff *xdp)
> >>  	return (struct skb_shared_info *)xdp_data_hard_end(xdp);
> >>  }
> >>
> >> +static inline unsigned int xdp_get_frag_tailroom(const skb_frag_t *frag)
> >> +{
> >> +	return PAGE_SIZE - skb_frag_size(frag) - skb_frag_off(frag);
> >> +}
> >> +
> >>  struct xdp_frame {
> >>  	void *data;
> >>  	u16 len;
> >> @@ -259,6 +264,8 @@ struct xdp_frame *xdp_convert_buff_to_frame(struct xdp_buff *xdp)
> >>  	return xdp_frame;
> >>  }
> >>
> >> +void __xdp_return(void *data, struct xdp_mem_info *mem, bool napi_direct,
> >> +		  struct xdp_buff *xdp);
> >>  void xdp_return_frame(struct xdp_frame *xdpf);
> >>  void xdp_return_frame_rx_napi(struct xdp_frame *xdpf);
> >>  void xdp_return_buff(struct xdp_buff *xdp);
> >> diff --git a/net/core/filter.c b/net/core/filter.c
> >> index caa88955562e..05f574a3d690 100644
> >> --- a/net/core/filter.c
> >> +++ b/net/core/filter.c
> >> @@ -3859,11 +3859,73 @@ static const struct bpf_func_proto bpf_xdp_adjust_head_proto = {
> >>  	.arg2_type	= ARG_ANYTHING,
> >>  };
> >>
> >> +static int bpf_xdp_mb_adjust_tail(struct xdp_buff *xdp, int offset)
> >> +{
> >> +	struct skb_shared_info *sinfo;
> >> +
> >> +	if (unlikely(!xdp_buff_is_mb(xdp)))
> >> +		return -EINVAL;
> >> +
> >> +	sinfo = xdp_get_shared_info_from_buff(xdp);
> >> +	if (offset >= 0) {
> >> +		skb_frag_t *frag = &sinfo->frags[sinfo->nr_frags - 1];
> >> +		int size;
> >> +
> >> +		if (unlikely(offset > xdp_get_frag_tailroom(frag)))
> >> +			return -EINVAL;
> >> +
> >> +		size = skb_frag_size(frag);
> >> +		memset(skb_frag_address(frag) + size, 0, offset);
> >> +		skb_frag_size_set(frag, size + offset);
> >> +		sinfo->data_len += offset;
> >
> > Can you add some comment on how this works? So today I call
> > bpf_xdp_adjust_tail() to add some trailer to my packet.
> > This looks like it adds tailroom to the last frag? But, then
> > how do I insert my trailer? I don't think we can without the
> > extra multi-buffer access support right.
> 
> You are right, we need some kind of multi-buffer access helpers.
> 
> > Also data_end will be unchanged yet it will return 0 so my
> > current programs will likely be a bit confused by this.
> 
> Guess this is the tricky part: applications need to be multi-buffer aware. If current applications rely on bpf_xdp_adjust_tail(+) to determine the maximum frame length, this approach might not work. In this case, we might need an additional helper to do tail expansion with multi-buffer support.
> 
> But then the question arises: how would an mb-unaware application behave in general when an mb packet is supplied? It would definitely not determine the correct packet length.

Right that was my conclusion as well. Existing programs might
have subtle side effects if they start running on multibuffer
drivers as is. I don't have any good ideas though on how
to handle this.
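
For illustration, here is a minimal sketch (not from the series) of the
kind of program that gets surprised: it grows the tail and then re-derives
its packet pointers. On a linear frame the new data_end reflects the
growth; with the mb path above, the extra bytes land in the last fragment,
so the check below fails even though the helper returned 0.

#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

SEC("xdp")
int xdp_grow_tail(struct xdp_md *ctx)
{
	void *data_end = (void *)(long)ctx->data_end;
	void *data = (void *)(long)ctx->data;
	__u64 old_len = data_end - data;

	if (bpf_xdp_adjust_tail(ctx, 4))
		return XDP_DROP;

	/* the helper invalidates the packet pointers, re-load them */
	data_end = (void *)(long)ctx->data_end;
	data = (void *)(long)ctx->data;

	/* linear frame: length grew by 4; mb frame: unchanged here */
	if ((__u64)(data_end - data) != old_len + 4)
		return XDP_ABORTED;

	return XDP_PASS;
}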

> 
> >> +	} else {
> 



^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH v9 bpf-next 10/14] bpf: add multi-buffer support to xdp copy helpers
  2021-06-24  9:42     ` Eelco Chaudron
@ 2021-06-24 14:28       ` John Fastabend
  2021-06-25  8:25         ` Eelco Chaudron
                           ` (2 more replies)
  0 siblings, 3 replies; 53+ messages in thread
From: John Fastabend @ 2021-06-24 14:28 UTC (permalink / raw)
  To: Eelco Chaudron, John Fastabend
  Cc: Lorenzo Bianconi, bpf, netdev, lorenzo.bianconi, davem, kuba,
	ast, daniel, shayagr, sameehj, dsahern, brouer, jasowang,
	alexander.duyck, saeed, maciej.fijalkowski, magnus.karlsson,
	tirthendu.sarkar

Eelco Chaudron wrote:
> 
> 
> On 23 Jun 2021, at 1:49, John Fastabend wrote:
> 
> > Lorenzo Bianconi wrote:
> >> From: Eelco Chaudron <echaudro@redhat.com>
> >>
> >> This patch adds support for multi-buffer for the following helpers:
> >>   - bpf_xdp_output()
> >>   - bpf_perf_event_output()
> >>
> >> Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
> >> Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
> >> ---
> >
> > Ah ok so at least xdp_output will work with all bytes. But this is
> > getting close to having access into the frags so I think doing
> > the last bit shouldn't be too hard?
> 
> 
> Guess you are talking about multi-buffer access in the XDP program?
> 
> I did suggest an API a while back, https://lore.kernel.org/bpf/FD3E6E08-DE78-4FBA-96F6-646C93E88631@redhat.com/, but I had/have no time to work on it. Guess the difficult part is to convince the verifier to allow the data to be accessed.

Ah great I think we had the same idea I called it xdp_pull_data()
though.

What's the complication, though? It looks like it can be done by simply
moving the data and data_end pointers around and then marking them
invalidated. This way the verifier knows the program needs to
rewrite them. I can probably look into it more next week.

From my first glance it looks relatively straightforward to do
now. I really would like to avoid yet another iteration of
program features I have to discover and somehow work around,
so it would help if we can get the helper into this series. If you
really don't have time I can probably take a look early next week
at an RFC for something like the above helper.
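
To make that concrete, here is a purely hypothetical sketch of how such a
helper could be used from a program; neither the helper name, its number,
nor the signature exist in this series, it only illustrates the shape
described above (point data/data_end at a fragment and invalidate the old
pointers):

#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

/* hypothetical helper declaration, made-up helper number */
static long (*bpf_xdp_pull_data)(struct xdp_md *ctx, __u32 frag) = (void *)199;

SEC("xdp")
int xdp_walk_frags(struct xdp_md *ctx)
{
	__u32 frag;

	for (frag = 0; frag < 17; frag++) {	/* MAX_SKB_FRAGS-ish bound */
		void *data, *data_end;

		/* move the packet window onto fragment 'frag'; the old
		 * data/data_end pointers are invalidated by the call */
		if (bpf_xdp_pull_data(ctx, frag) < 0)
			break;			/* no such fragment */

		data = (void *)(long)ctx->data;
		data_end = (void *)(long)ctx->data_end;
		if (data + 1 > data_end)
			continue;
		/* ... parse this fragment ... */
	}

	return XDP_PASS;
}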


.John

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH v9 bpf-next 08/14] bpf: add multi-buff support to the bpf_xdp_adjust_tail() API
  2021-06-24 14:24       ` John Fastabend
@ 2021-06-24 15:16         ` Zvi Effron
  2021-06-29 13:19         ` Lorenzo Bianconi
  2021-07-06 21:44         ` Backwards compatibility for XDP multi-buff (was: Re: [PATCH v9 bpf-next 08/14] bpf: add multi-buff support to the bpf_xdp_adjust_tail() API) Toke Høiland-Jørgensen
  2 siblings, 0 replies; 53+ messages in thread
From: Zvi Effron @ 2021-06-24 15:16 UTC (permalink / raw)
  To: John Fastabend
  Cc: Eelco Chaudron, Lorenzo Bianconi, bpf, netdev, lorenzo.bianconi,
	David Miller, Jakub Kicinski, Alexei Starovoitov,
	Daniel Borkmann, shayagr, sameehj, dsahern,
	Jesper Dangaard Brouer, jasowang, alexander.duyck, saeed,
	Maciej Fijalkowski, Karlsson, Magnus, tirthendu.sarkar

On Thu, Jun 24, 2021 at 7:24 AM John Fastabend <john.fastabend@gmail.com> wrote:
>
> Eelco Chaudron wrote:
> >
> >
> > On 23 Jun 2021, at 1:37, John Fastabend wrote:
> >
> > > Lorenzo Bianconi wrote:
> > >> From: Eelco Chaudron <echaudro@redhat.com>
> > >>
> > >> This change adds support for tail growing and shrinking for XDP multi-buff.
> > >>
> > >
> > > It would be nice if the commit message gave us some details on how the
> > > growing/shrinking works in the multi-buff support.
> >
> > Will add this to the next rev.
> >
> > >> Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
> > >> Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
> > >> ---
> > >>  include/net/xdp.h |  7 ++++++
> > >>  net/core/filter.c | 62 +++++++++++++++++++++++++++++++++++++++++++++++
> > >>  net/core/xdp.c    |  5 ++--
> > >>  3 files changed, 72 insertions(+), 2 deletions(-)
> > >>
> > >> diff --git a/include/net/xdp.h b/include/net/xdp.h
> > >> index 935a6f83115f..3525801c6ed5 100644
> > >> --- a/include/net/xdp.h
> > >> +++ b/include/net/xdp.h
> > >> @@ -132,6 +132,11 @@ xdp_get_shared_info_from_buff(struct xdp_buff *xdp)
> > >>    return (struct skb_shared_info *)xdp_data_hard_end(xdp);
> > >>  }
> > >>
> > >> +static inline unsigned int xdp_get_frag_tailroom(const skb_frag_t *frag)
> > >> +{
> > >> +  return PAGE_SIZE - skb_frag_size(frag) - skb_frag_off(frag);
> > >> +}
> > >> +
> > >>  struct xdp_frame {
> > >>    void *data;
> > >>    u16 len;
> > >> @@ -259,6 +264,8 @@ struct xdp_frame *xdp_convert_buff_to_frame(struct xdp_buff *xdp)
> > >>    return xdp_frame;
> > >>  }
> > >>
> > >> +void __xdp_return(void *data, struct xdp_mem_info *mem, bool napi_direct,
> > >> +            struct xdp_buff *xdp);
> > >>  void xdp_return_frame(struct xdp_frame *xdpf);
> > >>  void xdp_return_frame_rx_napi(struct xdp_frame *xdpf);
> > >>  void xdp_return_buff(struct xdp_buff *xdp);
> > >> diff --git a/net/core/filter.c b/net/core/filter.c
> > >> index caa88955562e..05f574a3d690 100644
> > >> --- a/net/core/filter.c
> > >> +++ b/net/core/filter.c
> > >> @@ -3859,11 +3859,73 @@ static const struct bpf_func_proto bpf_xdp_adjust_head_proto = {
> > >>    .arg2_type      = ARG_ANYTHING,
> > >>  };
> > >>
> > >> +static int bpf_xdp_mb_adjust_tail(struct xdp_buff *xdp, int offset)
> > >> +{
> > >> +  struct skb_shared_info *sinfo;
> > >> +
> > >> +  if (unlikely(!xdp_buff_is_mb(xdp)))
> > >> +          return -EINVAL;
> > >> +
> > >> +  sinfo = xdp_get_shared_info_from_buff(xdp);
> > >> +  if (offset >= 0) {
> > >> +          skb_frag_t *frag = &sinfo->frags[sinfo->nr_frags - 1];
> > >> +          int size;
> > >> +
> > >> +          if (unlikely(offset > xdp_get_frag_tailroom(frag)))
> > >> +                  return -EINVAL;
> > >> +
> > >> +          size = skb_frag_size(frag);
> > >> +          memset(skb_frag_address(frag) + size, 0, offset);
> > >> +          skb_frag_size_set(frag, size + offset);
> > >> +          sinfo->data_len += offset;
> > >
> > > Can you add some comment on how this works? So today I call
> > > bpf_xdp_adjust_tail() to add some trailer to my packet.
> > > This looks like it adds tailroom to the last frag? But, then
> > > how do I insert my trailer? I don't think we can without the
> > > extra multi-buffer access support right.
> >
> > You are right, we need some kind of multi-buffer access helpers.
> >
> > > Also data_end will be unchanged yet it will return 0 so my
> > > current programs will likely be a bit confused by this.
> >
> > Guess this is the tricky part, applications need to be multi-buffer aware. If current applications rely on bpf_xdp_adjust_tail(+) to determine maximum frame length this approach might not work. In this case, we might need an additional helper to do tail expansion with multi buffer support.
> >
> > But then the question arrives how would mb unaware application behave in general when an mb packet is supplied?? It would definitely not determine the correct packet length.
>
> Right that was my conclusion as well. Existing programs might
> have subtle side effects if they start running on multibuffer
> drivers as is. I don't have any good ideas though on how
> to handle this.
>

Would it be possible to detect multibuffer awareness of a program at load
(or attach) time, perhaps by looking for the use of the new multibuffer
helpers? That might make it possible to reject a non-multibuffer aware
program on multibuffer drivers (or maybe even put the driver into a
non-multibuffer mode at attach time), or at the very least issue a
warning?

> >
> > >> +  } else {
> >
>
>

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH v9 bpf-next 10/14] bpf: add multi-buffer support to xdp copy helpers
  2021-06-24 14:28       ` John Fastabend
@ 2021-06-25  8:25         ` Eelco Chaudron
  2021-06-29 13:23         ` Lorenzo Bianconi
  2021-07-06 10:15         ` Eelco Chaudron
  2 siblings, 0 replies; 53+ messages in thread
From: Eelco Chaudron @ 2021-06-25  8:25 UTC (permalink / raw)
  To: John Fastabend
  Cc: Lorenzo Bianconi, bpf, netdev, lorenzo.bianconi, davem, kuba,
	ast, daniel, shayagr, sameehj, dsahern, brouer, jasowang,
	alexander.duyck, saeed, maciej.fijalkowski, magnus.karlsson,
	tirthendu.sarkar



On 24 Jun 2021, at 16:28, John Fastabend wrote:

> Eelco Chaudron wrote:
>>
>>
>> On 23 Jun 2021, at 1:49, John Fastabend wrote:
>>
>>> Lorenzo Bianconi wrote:
>>>> From: Eelco Chaudron <echaudro@redhat.com>
>>>>
>>>> This patch adds support for multi-buffer for the following helpers:
>>>>   - bpf_xdp_output()
>>>>   - bpf_perf_event_output()
>>>>
>>>> Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
>>>> Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
>>>> ---
>>>
>>> Ah ok so at least xdp_output will work with all bytes. But this is
>>> getting close to having access into the frags so I think doing
>>> the last bit shouldn't be too hard?
>>
>>
>> Guess you are talking about multi-buffer access in the XDP program?
>>
>> I did suggest an API a while back, https://lore.kernel.org/bpf/FD3E6E08-DE78-4FBA-96F6-646C93E88631@redhat.com/ but I had/have not time to work on it. Guess the difficult part is to convince the verifier to allow the data to be accessed.
>
> Ah great I think we had the same idea I called it xdp_pull_data()
> though.
>
> Whats the complication though it looks like it can be done by simply
> moving the data and data_end pointers around then marking them
> invalidated. This way the verifier knows the program needs to
> rewrite them. I can probably look more into next week.
>
> From my first glance it looks relatively straight forward to do
> now. I really would like to avoid yet another iteration of
> programs features I have to discover and somehow work around
> if we can get the helper into this series. If you really don't
> have time I can probably take a look early next week on an
> RFC for something like above helper.

I’m on a small side project for the next 2 to 3 weeks, and after that, I have some PTO, so if you have time for an RFC, that will speed up this patchset.

Thanks,

Eelco


> .John


^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH v9 bpf-next 01/14] net: skbuff: add data_len field to skb_shared_info
  2021-06-14 12:49 ` [PATCH v9 bpf-next 01/14] net: skbuff: add data_len field to skb_shared_info Lorenzo Bianconi
@ 2021-06-28 19:58   ` Alexander Duyck
  2021-06-29 12:44     ` Lorenzo Bianconi
  0 siblings, 1 reply; 53+ messages in thread
From: Alexander Duyck @ 2021-06-28 19:58 UTC (permalink / raw)
  To: Lorenzo Bianconi
  Cc: bpf, Netdev, lorenzo.bianconi, David Miller, Jakub Kicinski,
	Alexei Starovoitov, Daniel Borkmann, shayagr, Jubran, Samih,
	John Fastabend, David Ahern, Jesper Dangaard Brouer,
	Eelco Chaudron, Jason Wang, Saeed Mahameed, Maciej Fijalkowski,
	Karlsson, Magnus, Tirthendu

On Mon, Jun 14, 2021 at 5:50 AM Lorenzo Bianconi <lorenzo@kernel.org> wrote:
>
> data_len field will be used for paged frame len for xdp_buff/xdp_frame.
> This is a preliminary patch to properly support xdp-multibuff
>
> Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
> ---
>  include/linux/skbuff.h | 5 ++++-
>  1 file changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
> index dbf820a50a39..332ec56c200d 100644
> --- a/include/linux/skbuff.h
> +++ b/include/linux/skbuff.h
> @@ -522,7 +522,10 @@ struct skb_shared_info {
>         struct sk_buff  *frag_list;
>         struct skb_shared_hwtstamps hwtstamps;
>         unsigned int    gso_type;
> -       u32             tskey;
> +       union {
> +               u32     tskey;
> +               u32     data_len;
> +       };
>

Rather than using the tskey field, why not repurpose the gso_size field?
I would think that in the XDP paths the gso fields would be unused,
since LRO and HW_GRO would be incompatible with XDP anyway.

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH v9 bpf-next 02/14] xdp: introduce flags field in xdp_buff/xdp_frame
  2021-06-14 12:49 ` [PATCH v9 bpf-next 02/14] xdp: introduce flags field in xdp_buff/xdp_frame Lorenzo Bianconi
@ 2021-06-28 20:14   ` Alexander Duyck
  2021-06-29 12:43     ` Lorenzo Bianconi
  0 siblings, 1 reply; 53+ messages in thread
From: Alexander Duyck @ 2021-06-28 20:14 UTC (permalink / raw)
  To: Lorenzo Bianconi
  Cc: bpf, Netdev, lorenzo.bianconi, David Miller, Jakub Kicinski,
	Alexei Starovoitov, Daniel Borkmann, shayagr, Jubran, Samih,
	John Fastabend, David Ahern, Jesper Dangaard Brouer,
	Eelco Chaudron, Jason Wang, Saeed Mahameed, Maciej Fijalkowski,
	Karlsson, Magnus, Tirthendu

On Mon, Jun 14, 2021 at 5:50 AM Lorenzo Bianconi <lorenzo@kernel.org> wrote:
>
> Introduce flags field in xdp_frame/xdp_buffer data structure
> to define additional buffer features. At the moment the only
> supported buffer feature is multi-buffer bit (mb). Multi-buffer bit
> is used to specify if this is a linear buffer (mb = 0) or a multi-buffer
> frame (mb = 1). In the latter case the shared_info area at the end of
> the first buffer will be properly initialized to link together
> subsequent buffers.
>
> Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>

Instead of passing this between buffers and frames, I wonder if this
wouldn't be better placed in something like the xdp_mem_info
structure, since this is something that would be specific to how the
device is handling memory anyway. You could probably split the type
field into a 16b type and a 16b flags field, then add your bit where 0
is linear/legacy and 1 is scatter-gather/multi-buffer.
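
For reference, a sketch of the layout suggested here (an assumption drawn
from this paragraph, not code from the series; today xdp_mem_info is two
u32 fields, and the flag name below is made up):

struct xdp_mem_info {
	u16 type;		/* enum xdp_mem_type, shrunk from u32 */
	u16 flags;		/* bit 0: 0 = linear/legacy, 1 = multi-buffer */
	u32 id;
};

#define XDP_MEM_F_MB	BIT(0)	/* hypothetical flag name */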

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH v9 bpf-next 07/14] net: xdp: add multi-buff support to xdp_build_skb_from_frame
  2021-06-14 12:49 ` [PATCH v9 bpf-next 07/14] net: xdp: add multi-buff support to xdp_build_skb_from_frame Lorenzo Bianconi
@ 2021-06-28 21:05   ` Alexander Duyck
  2021-06-29 18:34     ` Lorenzo Bianconi
  0 siblings, 1 reply; 53+ messages in thread
From: Alexander Duyck @ 2021-06-28 21:05 UTC (permalink / raw)
  To: Lorenzo Bianconi
  Cc: bpf, Netdev, lorenzo.bianconi, David Miller, Jakub Kicinski,
	Alexei Starovoitov, Daniel Borkmann, shayagr, Jubran, Samih,
	John Fastabend, David Ahern, Jesper Dangaard Brouer,
	Eelco Chaudron, Jason Wang, Saeed Mahameed, Maciej Fijalkowski,
	Karlsson, Magnus, Tirthendu

On Mon, Jun 14, 2021 at 5:51 AM Lorenzo Bianconi <lorenzo@kernel.org> wrote:
>
> Introduce xdp multi-buff support to
> __xdp_build_skb_from_frame/xdp_build_skb_from_frame
> utility routines.
>
> Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
> ---
>  net/core/xdp.c | 13 +++++++++++++
>  1 file changed, 13 insertions(+)
>
> diff --git a/net/core/xdp.c b/net/core/xdp.c
> index f61c63115c95..71bedf6049a1 100644
> --- a/net/core/xdp.c
> +++ b/net/core/xdp.c
> @@ -582,9 +582,15 @@ struct sk_buff *__xdp_build_skb_from_frame(struct xdp_frame *xdpf,
>                                            struct sk_buff *skb,
>                                            struct net_device *dev)
>  {
> +       struct skb_shared_info *sinfo = xdp_get_shared_info_from_frame(xdpf);
>         unsigned int headroom, frame_size;
> +       int i, num_frags = 0;
>         void *hard_start;
>
> +       /* xdp multi-buff frame */
> +       if (unlikely(xdp_frame_is_mb(xdpf)))
> +               num_frags = sinfo->nr_frags;
> +
>         /* Part of headroom was reserved to xdpf */
>         headroom = sizeof(*xdpf) + xdpf->headroom;
>
> @@ -603,6 +609,13 @@ struct sk_buff *__xdp_build_skb_from_frame(struct xdp_frame *xdpf,
>         if (xdpf->metasize)
>                 skb_metadata_set(skb, xdpf->metasize);
>
> +       for (i = 0; i < num_frags; i++)
> +               skb_add_rx_frag(skb, skb_shinfo(skb)->nr_frags,
> +                               skb_frag_page(&sinfo->frags[i]),
> +                               skb_frag_off(&sinfo->frags[i]),
> +                               skb_frag_size(&sinfo->frags[i]),
> +                               xdpf->frame_sz);
> +

So this is assuming the header frame and all of the frags are using
the same size. Rather than reading the frags out and then writing them
back, why not just directly rewrite the nr_frags, add the total size
to skb->len and skb->data_len, and then update the truesize?

Actually, I think you might need to store the truesize somewhere in
addition to the data_len that you were storing in the shared info.
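
A rough sketch of that alternative (an assumption based on the suggestion
above, not the posted patch): since the skb is built around the same
buffer, it already owns the shared info holding the frags, so only the skb
accounting would need updating:

	if (unlikely(xdp_frame_is_mb(xdpf))) {
		/* frags[] and nr_frags already live in the shared info the
		 * skb was just built around; nothing to re-add, only the
		 * byte accounting is missing */
		skb->len += sinfo->data_len;
		skb->data_len += sinfo->data_len;
		/* assumes all frags share xdpf->frame_sz, as noted above */
		skb->truesize += sinfo->nr_frags * xdpf->frame_sz;
	}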

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH v9 bpf-next 02/14] xdp: introduce flags field in xdp_buff/xdp_frame
  2021-06-28 20:14   ` Alexander Duyck
@ 2021-06-29 12:43     ` Lorenzo Bianconi
  2021-06-29 13:07       ` Alexander Duyck
  0 siblings, 1 reply; 53+ messages in thread
From: Lorenzo Bianconi @ 2021-06-29 12:43 UTC (permalink / raw)
  To: Alexander Duyck
  Cc: Lorenzo Bianconi, bpf, Netdev, David Miller, Jakub Kicinski,
	Alexei Starovoitov, Daniel Borkmann, shayagr, Jubran, Samih,
	John Fastabend, David Ahern, Jesper Dangaard Brouer,
	Eelco Chaudron, Jason Wang, Saeed Mahameed, Maciej Fijalkowski,
	Karlsson, Magnus, Tirthendu

> On Mon, Jun 14, 2021 at 5:50 AM Lorenzo Bianconi <lorenzo@kernel.org> wrote:
> >
> > Introduce flags field in xdp_frame/xdp_buffer data structure
> > to define additional buffer features. At the moment the only
> > supported buffer feature is multi-buffer bit (mb). Multi-buffer bit
> > is used to specify if this is a linear buffer (mb = 0) or a multi-buffer
> > frame (mb = 1). In the latter case the shared_info area at the end of
> > the first buffer will be properly initialized to link together
> > subsequent buffers.
> >
> > Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
> 
> Instead of passing this between buffers and frames I wonder if this
> wouldn't be better to place in something like the xdp_mem_info
> structure since this is something that would be specific to how the
> device is handling memory anyway. You could probably split the type
> field into a 16b type and a 16b flags field. Then add your bit where 0
> is linear/legacy and 1 is scatter-gather/multi-buffer.
> 

ack, this should be fine, but I put the flags field in xdp_buff/xdp_frame
in order to reuse it for some xdp hw-hints (e.g. rx checksum type).
We can put it in xdp_mem_info too, but I guess it would be less intuitive;
what do you think?
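
For illustration, the kind of flags layout implied here could look like the
sketch below; only the mb bit exists in this series (xdp_buff_is_mb() is
used by patch 08 above), while the checksum bit and both flag names are
assumptions:

enum xdp_buff_flags {
	XDP_FLAGS_MULTI_BUFF	= BIT(0),	/* non-linear xdp buff/frame */
	XDP_FLAGS_HINT_RX_CSUM	= BIT(1),	/* hypothetical future hw-hint */
};

static __always_inline bool xdp_buff_is_mb(struct xdp_buff *xdp)
{
	return !!(xdp->flags & XDP_FLAGS_MULTI_BUFF);
}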

Regards,
Lorenzo

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH v9 bpf-next 01/14] net: skbuff: add data_len field to skb_shared_info
  2021-06-28 19:58   ` Alexander Duyck
@ 2021-06-29 12:44     ` Lorenzo Bianconi
  2021-06-29 17:08       ` Jakub Kicinski
  0 siblings, 1 reply; 53+ messages in thread
From: Lorenzo Bianconi @ 2021-06-29 12:44 UTC (permalink / raw)
  To: Alexander Duyck
  Cc: Lorenzo Bianconi, bpf, Netdev, David Miller, Jakub Kicinski,
	Alexei Starovoitov, Daniel Borkmann, shayagr, Jubran, Samih,
	John Fastabend, David Ahern, Jesper Dangaard Brouer,
	Eelco Chaudron, Jason Wang, Saeed Mahameed, Maciej Fijalkowski,
	Karlsson, Magnus, Tirthendu

> On Mon, Jun 14, 2021 at 5:50 AM Lorenzo Bianconi <lorenzo@kernel.org> wrote:
> >
> > data_len field will be used for paged frame len for xdp_buff/xdp_frame.
> > This is a preliminary patch to properly support xdp-multibuff
> >
> > Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
> > ---
> >  include/linux/skbuff.h | 5 ++++-
> >  1 file changed, 4 insertions(+), 1 deletion(-)
> >
> > diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
> > index dbf820a50a39..332ec56c200d 100644
> > --- a/include/linux/skbuff.h
> > +++ b/include/linux/skbuff.h
> > @@ -522,7 +522,10 @@ struct skb_shared_info {
> >         struct sk_buff  *frag_list;
> >         struct skb_shared_hwtstamps hwtstamps;
> >         unsigned int    gso_type;
> > -       u32             tskey;
> > +       union {
> > +               u32     tskey;
> > +               u32     data_len;
> > +       };
> >
> 
> Rather than use the tskey field why not repurpose the gso_size field?
> I would think in the XDP paths that the gso fields would be unused
> since LRO and HW_GRO would be incompatible with XDP anyway.
> 

ack, I agree. I will fix it in v10.

Regards,
Lorenzo

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH v9 bpf-next 02/14] xdp: introduce flags field in xdp_buff/xdp_frame
  2021-06-29 12:43     ` Lorenzo Bianconi
@ 2021-06-29 13:07       ` Alexander Duyck
  2021-06-29 13:25         ` Lorenzo Bianconi
  2021-07-05 15:52         ` Lorenzo Bianconi
  0 siblings, 2 replies; 53+ messages in thread
From: Alexander Duyck @ 2021-06-29 13:07 UTC (permalink / raw)
  To: Lorenzo Bianconi
  Cc: Lorenzo Bianconi, bpf, Netdev, David Miller, Jakub Kicinski,
	Alexei Starovoitov, Daniel Borkmann, shayagr, Jubran, Samih,
	John Fastabend, David Ahern, Jesper Dangaard Brouer,
	Eelco Chaudron, Jason Wang, Saeed Mahameed, Maciej Fijalkowski,
	Karlsson, Magnus, Tirthendu

On Tue, Jun 29, 2021 at 5:43 AM Lorenzo Bianconi
<lorenzo.bianconi@redhat.com> wrote:
>
> > On Mon, Jun 14, 2021 at 5:50 AM Lorenzo Bianconi <lorenzo@kernel.org> wrote:
> > >
> > > Introduce flags field in xdp_frame/xdp_buffer data structure
> > > to define additional buffer features. At the moment the only
> > > supported buffer feature is multi-buffer bit (mb). Multi-buffer bit
> > > is used to specify if this is a linear buffer (mb = 0) or a multi-buffer
> > > frame (mb = 1). In the latter case the shared_info area at the end of
> > > the first buffer will be properly initialized to link together
> > > subsequent buffers.
> > >
> > > Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
> >
> > Instead of passing this between buffers and frames I wonder if this
> > wouldn't be better to place in something like the xdp_mem_info
> > structure since this is something that would be specific to how the
> > device is handling memory anyway. You could probably split the type
> > field into a 16b type and a 16b flags field. Then add your bit where 0
> > is linear/legacy and 1 is scatter-gather/multi-buffer.
> >
>
> ack, this should be fine but I put the flag field in xdp_buff/xdp_frame
> in order to reuse it for some xdp hw-hints (e.g rx checksum type).
> We can put it in xdp_mem_info too but I guess it would be less intuitive, what
> do you think?

I think it makes the most sense in xdp_mem_info. It already tells us
what to expect in some respect in regards to memory layout as it tells
us if we are dealing with shared pages or whole pages and how to
recycle them. I would think that applies almost identically to
scatter-gather XDP the same way.

As far as the addition of flags there is still time for that later as
we still have the 32b of unused space after frame_sz.

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH v9 bpf-next 08/14] bpf: add multi-buff support to the bpf_xdp_adjust_tail() API
  2021-06-24 14:24       ` John Fastabend
  2021-06-24 15:16         ` Zvi Effron
@ 2021-06-29 13:19         ` Lorenzo Bianconi
  2021-06-29 13:27           ` Toke Høiland-Jørgensen
  2021-07-06 21:44         ` Backwards compatibility for XDP multi-buff (was: Re: [PATCH v9 bpf-next 08/14] bpf: add multi-buff support to the bpf_xdp_adjust_tail() API) Toke Høiland-Jørgensen
  2 siblings, 1 reply; 53+ messages in thread
From: Lorenzo Bianconi @ 2021-06-29 13:19 UTC (permalink / raw)
  To: John Fastabend
  Cc: Eelco Chaudron, Lorenzo Bianconi, bpf, netdev, davem, kuba, ast,
	daniel, shayagr, sameehj, dsahern, brouer, jasowang,
	alexander.duyck, saeed, maciej.fijalkowski, magnus.karlsson,
	tirthendu.sarkar

> Eelco Chaudron wrote:
> > 
> > 
> > On 23 Jun 2021, at 1:37, John Fastabend wrote:
> > 
> > > Lorenzo Bianconi wrote:
> > >> From: Eelco Chaudron <echaudro@redhat.com>
> > >>
> > >> This change adds support for tail growing and shrinking for XDP multi-buff.
> > >>
> > >
> > > It would be nice if the commit message gave us some details on how the
> > > growing/shrinking works in the multi-buff support.
[...]
> > Guess this is the tricky part, applications need to be multi-buffer aware. If current applications rely on bpf_xdp_adjust_tail(+) to determine maximum frame length this approach might not work. In this case, we might need an additional helper to do tail expansion with multi buffer support.
> > 
> > But then the question arrives how would mb unaware application behave in general when an mb packet is supplied?? It would definitely not determine the correct packet length.
> 
> Right that was my conclusion as well. Existing programs might
> have subtle side effects if they start running on multibuffer
> drivers as is. I don't have any good ideas though on how
> to handle this.

what about checking the program capabilities at load time (e.g. with a
special program type) and disabling the mb feature if the bpf program is
not mb-aware? (e.g. forbid setting the MTU greater than 1500B in xdp mode).
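
For illustration only, the driver-side version of that restriction could
look roughly like the sketch below; everything in it is hypothetical (in
particular prog->aux->xdp_mb_aware does not exist in this series):

static int mvneta_xdp_setup(struct net_device *dev, struct bpf_prog *prog,
			    struct netlink_ext_ack *extack)
{
	/* hypothetical: reject a large MTU unless the program declared
	 * itself multi-buffer aware at load time */
	if (prog && !prog->aux->xdp_mb_aware && dev->mtu > ETH_DATA_LEN) {
		NL_SET_ERR_MSG_MOD(extack,
				   "MTU too large for an mb-unaware XDP program");
		return -EOPNOTSUPP;
	}

	/* ... existing setup ... */
	return 0;
}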

Regards,
Lorenzo

> 
> > 
> > >> +	} else {
> > 
> 
> 

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH v9 bpf-next 10/14] bpf: add multi-buffer support to xdp copy helpers
  2021-06-24 14:28       ` John Fastabend
  2021-06-25  8:25         ` Eelco Chaudron
@ 2021-06-29 13:23         ` Lorenzo Bianconi
  2021-07-06 10:15         ` Eelco Chaudron
  2 siblings, 0 replies; 53+ messages in thread
From: Lorenzo Bianconi @ 2021-06-29 13:23 UTC (permalink / raw)
  To: John Fastabend
  Cc: Eelco Chaudron, Lorenzo Bianconi, bpf, netdev, davem, kuba, ast,
	daniel, shayagr, sameehj, dsahern, brouer, jasowang,
	alexander.duyck, saeed, maciej.fijalkowski, magnus.karlsson,
	tirthendu.sarkar

> > 
> > 
> > On 23 Jun 2021, at 1:49, John Fastabend wrote:
> > 
> > > Lorenzo Bianconi wrote:
> > >> From: Eelco Chaudron <echaudro@redhat.com>
> > >>
> > >> This patch adds support for multi-buffer for the following helpers:
> > >>   - bpf_xdp_output()
> > >>   - bpf_perf_event_output()
> > >>
> > >> Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
> > >> Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
> > >> ---
> > >
> > > Ah ok so at least xdp_output will work with all bytes. But this is
> > > getting close to having access into the frags so I think doing
> > > the last bit shouldn't be too hard?
> > 
> > 
> > Guess you are talking about multi-buffer access in the XDP program?
> > 
> > I did suggest an API a while back, https://lore.kernel.org/bpf/FD3E6E08-DE78-4FBA-96F6-646C93E88631@redhat.com/ but I had/have not time to work on it. Guess the difficult part is to convince the verifier to allow the data to be accessed.
> 
> Ah great I think we had the same idea I called it xdp_pull_data()
> though.
> 
> Whats the complication though it looks like it can be done by simply
> moving the data and data_end pointers around then marking them
> invalidated. This way the verifier knows the program needs to
> rewrite them. I can probably look more into next week.
> 
> From my first glance it looks relatively straight forward to do
> now. I really would like to avoid yet another iteration of
> programs features I have to discover and somehow work around
> if we can get the helper into this series. If you really don't
> have time I can probably take a look early next week on an
> RFC for something like above helper.

cool, thx :)
What about discussing APIs during the BPF mtg upstream on Thursday (probably
not next one since most of the people will be in PTO)? I will work on some docs.

Regards,
Lorenzo

> 
> 
> .John
> 

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH v9 bpf-next 02/14] xdp: introduce flags field in xdp_buff/xdp_frame
  2021-06-29 13:07       ` Alexander Duyck
@ 2021-06-29 13:25         ` Lorenzo Bianconi
  2021-07-05 15:52         ` Lorenzo Bianconi
  1 sibling, 0 replies; 53+ messages in thread
From: Lorenzo Bianconi @ 2021-06-29 13:25 UTC (permalink / raw)
  To: Alexander Duyck
  Cc: Lorenzo Bianconi, bpf, Netdev, David Miller, Jakub Kicinski,
	Alexei Starovoitov, Daniel Borkmann, shayagr, Jubran, Samih,
	John Fastabend, David Ahern, Jesper Dangaard Brouer,
	Eelco Chaudron, Jason Wang, Saeed Mahameed, Maciej Fijalkowski,
	Karlsson, Magnus, Tirthendu

> On Tue, Jun 29, 2021 at 5:43 AM Lorenzo Bianconi
> <lorenzo.bianconi@redhat.com> wrote:
> >
> > > On Mon, Jun 14, 2021 at 5:50 AM Lorenzo Bianconi <lorenzo@kernel.org> wrote:
> > > >
> > > > Introduce flags field in xdp_frame/xdp_buffer data structure
> > > > to define additional buffer features. At the moment the only
> > > > supported buffer feature is multi-buffer bit (mb). Multi-buffer bit
> > > > is used to specify if this is a linear buffer (mb = 0) or a multi-buffer
> > > > frame (mb = 1). In the latter case the shared_info area at the end of
> > > > the first buffer will be properly initialized to link together
> > > > subsequent buffers.
> > > >
> > > > Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
> > >
> > > Instead of passing this between buffers and frames I wonder if this
> > > wouldn't be better to place in something like the xdp_mem_info
> > > structure since this is something that would be specific to how the
> > > device is handling memory anyway. You could probably split the type
> > > field into a 16b type and a 16b flags field. Then add your bit where 0
> > > is linear/legacy and 1 is scatter-gather/multi-buffer.
> > >
> >
> > ack, this should be fine but I put the flag field in xdp_buff/xdp_frame
> > in order to reuse it for some xdp hw-hints (e.g rx checksum type).
> > We can put it in xdp_mem_info too but I guess it would be less intuitive, what
> > do you think?
> 
> I think it makes the most sense in xdp_mem_info. It already tells us
> what to expect in some respect in regards to memory layout as it tells
> us if we are dealing with shared pages or whole pages and how to
> recycle them. I would think that applies almost identically to
> scatter-gather XDP the same way.
> 
> As far as the addition of flags there is still time for that later as
> we still have the 32b of unused space after frame_sz.

ack, I am fine with it. If everybody agrees, I will fix it in v10.

Regards,
Lorenzo

> 

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH v9 bpf-next 08/14] bpf: add multi-buff support to the bpf_xdp_adjust_tail() API
  2021-06-29 13:19         ` Lorenzo Bianconi
@ 2021-06-29 13:27           ` Toke Høiland-Jørgensen
  0 siblings, 0 replies; 53+ messages in thread
From: Toke Høiland-Jørgensen @ 2021-06-29 13:27 UTC (permalink / raw)
  To: Lorenzo Bianconi, John Fastabend
  Cc: Eelco Chaudron, Lorenzo Bianconi, bpf, netdev, davem, kuba, ast,
	daniel, shayagr, sameehj, dsahern, brouer, jasowang,
	alexander.duyck, saeed, maciej.fijalkowski, magnus.karlsson,
	tirthendu.sarkar

Lorenzo Bianconi <lorenzo.bianconi@redhat.com> writes:

>> Eelco Chaudron wrote:
>> > 
>> > 
>> > On 23 Jun 2021, at 1:37, John Fastabend wrote:
>> > 
>> > > Lorenzo Bianconi wrote:
>> > >> From: Eelco Chaudron <echaudro@redhat.com>
>> > >>
>> > >> This change adds support for tail growing and shrinking for XDP multi-buff.
>> > >>
>> > >
>> > > It would be nice if the commit message gave us some details on how the
>> > > growing/shrinking works in the multi-buff support.
> [...]
>> > Guess this is the tricky part, applications need to be multi-buffer aware. If current applications rely on bpf_xdp_adjust_tail(+) to determine maximum frame length this approach might not work. In this case, we might need an additional helper to do tail expansion with multi buffer support.
>> > 
>> > But then the question arrives how would mb unaware application behave in general when an mb packet is supplied?? It would definitely not determine the correct packet length.
>> 
>> Right that was my conclusion as well. Existing programs might
>> have subtle side effects if they start running on multibuffer
>> drivers as is. I don't have any good ideas though on how
>> to handle this.
>
> what about checking the program capabilities at load time (e.g. with a
> special program type) and disable mb feature if the bpf program is not
> mb-aware? (e.g. forbid to set the MTU greater than 1500B in xdp mode).

So what happens when that legacy program runs on a veth and gets an
mb-enabled frame redirected into it? :)

-Toke


^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH v9 bpf-next 01/14] net: skbuff: add data_len field to skb_shared_info
  2021-06-29 12:44     ` Lorenzo Bianconi
@ 2021-06-29 17:08       ` Jakub Kicinski
  2021-06-29 18:18         ` Alexander Duyck
  0 siblings, 1 reply; 53+ messages in thread
From: Jakub Kicinski @ 2021-06-29 17:08 UTC (permalink / raw)
  To: Lorenzo Bianconi
  Cc: Alexander Duyck, Lorenzo Bianconi, bpf, Netdev, David Miller,
	Alexei Starovoitov, Daniel Borkmann, shayagr, Jubran, Samih,
	John Fastabend, David Ahern, Jesper Dangaard Brouer,
	Eelco Chaudron, Jason Wang, Saeed Mahameed, Maciej Fijalkowski,
	Karlsson, Magnus, Tirthendu

On Tue, 29 Jun 2021 14:44:56 +0200 Lorenzo Bianconi wrote:
> > On Mon, Jun 14, 2021 at 5:50 AM Lorenzo Bianconi <lorenzo@kernel.org> wrote:  
> > >
> > > data_len field will be used for paged frame len for xdp_buff/xdp_frame.
> > > This is a preliminary patch to properly support xdp-multibuff
> > >
> > > Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
> > > ---
> > >  include/linux/skbuff.h | 5 ++++-
> > >  1 file changed, 4 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
> > > index dbf820a50a39..332ec56c200d 100644
> > > --- a/include/linux/skbuff.h
> > > +++ b/include/linux/skbuff.h
> > > @@ -522,7 +522,10 @@ struct skb_shared_info {
> > >         struct sk_buff  *frag_list;
> > >         struct skb_shared_hwtstamps hwtstamps;
> > >         unsigned int    gso_type;
> > > -       u32             tskey;
> > > +       union {
> > > +               u32     tskey;
> > > +               u32     data_len;
> > > +       };
> > >  
> > 
> > Rather than use the tskey field why not repurpose the gso_size field?
> > I would think in the XDP paths that the gso fields would be unused
> > since LRO and HW_GRO would be incompatible with XDP anyway.
> 
> ack, I agree. I will fix it in v10.

Why is XDP mb incompatible with LRO? I thought that was one of the use
cases (mentioned by Willem IIRC).

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH v9 bpf-next 01/14] net: skbuff: add data_len field to skb_shared_info
  2021-06-29 17:08       ` Jakub Kicinski
@ 2021-06-29 18:18         ` Alexander Duyck
  2021-06-29 18:37           ` Jakub Kicinski
  0 siblings, 1 reply; 53+ messages in thread
From: Alexander Duyck @ 2021-06-29 18:18 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Lorenzo Bianconi, Lorenzo Bianconi, bpf, Netdev, David Miller,
	Alexei Starovoitov, Daniel Borkmann, shayagr, Jubran, Samih,
	John Fastabend, David Ahern, Jesper Dangaard Brouer,
	Eelco Chaudron, Jason Wang, Saeed Mahameed, Maciej Fijalkowski,
	Karlsson, Magnus, Tirthendu

On Tue, Jun 29, 2021 at 10:08 AM Jakub Kicinski <kuba@kernel.org> wrote:
>
> On Tue, 29 Jun 2021 14:44:56 +0200 Lorenzo Bianconi wrote:
> > > On Mon, Jun 14, 2021 at 5:50 AM Lorenzo Bianconi <lorenzo@kernel.org> wrote:
> > > >
> > > > data_len field will be used for paged frame len for xdp_buff/xdp_frame.
> > > > This is a preliminary patch to properly support xdp-multibuff
> > > >
> > > > Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
> > > > ---
> > > >  include/linux/skbuff.h | 5 ++++-
> > > >  1 file changed, 4 insertions(+), 1 deletion(-)
> > > >
> > > > diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
> > > > index dbf820a50a39..332ec56c200d 100644
> > > > --- a/include/linux/skbuff.h
> > > > +++ b/include/linux/skbuff.h
> > > > @@ -522,7 +522,10 @@ struct skb_shared_info {
> > > >         struct sk_buff  *frag_list;
> > > >         struct skb_shared_hwtstamps hwtstamps;
> > > >         unsigned int    gso_type;
> > > > -       u32             tskey;
> > > > +       union {
> > > > +               u32     tskey;
> > > > +               u32     data_len;
> > > > +       };
> > > >
> > >
> > > Rather than use the tskey field why not repurpose the gso_size field?
> > > I would think in the XDP paths that the gso fields would be unused
> > > since LRO and HW_GRO would be incompatible with XDP anyway.
> >
> > ack, I agree. I will fix it in v10.
>
> Why is XDP mb incompatible with LRO? I thought that was one of the use
> cases (mentioned by Willem IIRC).

XDP is meant to be a per-packet operation with support for TX and
REDIRECT, and LRO isn't routable. So we could put together a large LRO
frame, but we wouldn't be able to break it apart again. If we allow
that then we are going to need a ton more exception handling added to
the XDP paths.

As far as GSO it would require setting many more fields in order to
actually make it offloadable by any hardware. My preference would be
to make use of gso_segs and gso_size to store the truesize and datalen
of the pages. That way we keep all of the data fields used in the
shared info in the first 8 bytes assuming we don't end up having to
actually use multiple buffers.
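
As a sketch of that field reuse (an assumption, not code from the series;
note gso_size and gso_segs are 16-bit fields in skb_shared_info, so this
presumes both values fit or the fields get widened):

static inline unsigned int xdp_sinfo_paged_len(const struct skb_shared_info *sinfo)
{
	return sinfo->gso_size;		/* reused as paged data length */
}

static inline unsigned int xdp_sinfo_paged_truesize(const struct skb_shared_info *sinfo)
{
	return sinfo->gso_segs;		/* reused as paged truesize */
}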

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH v9 bpf-next 07/14] net: xdp: add multi-buff support to xdp_build_skb_from_frame
  2021-06-28 21:05   ` Alexander Duyck
@ 2021-06-29 18:34     ` Lorenzo Bianconi
  0 siblings, 0 replies; 53+ messages in thread
From: Lorenzo Bianconi @ 2021-06-29 18:34 UTC (permalink / raw)
  To: Alexander Duyck
  Cc: Lorenzo Bianconi, bpf, Netdev, David Miller, Jakub Kicinski,
	Alexei Starovoitov, Daniel Borkmann, shayagr, Jubran, Samih,
	John Fastabend, David Ahern, Jesper Dangaard Brouer,
	Eelco Chaudron, Jason Wang, Saeed Mahameed, Maciej Fijalkowski,
	Karlsson, Magnus, Tirthendu

> On Mon, Jun 14, 2021 at 5:51 AM Lorenzo Bianconi <lorenzo@kernel.org> wrote:
> >
> > Introduce xdp multi-buff support to
> > __xdp_build_skb_from_frame/xdp_build_skb_from_frame
> > utility routines.
> >
> > Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
> > ---
> >  net/core/xdp.c | 13 +++++++++++++
> >  1 file changed, 13 insertions(+)
> >
> > diff --git a/net/core/xdp.c b/net/core/xdp.c
> > index f61c63115c95..71bedf6049a1 100644
> > --- a/net/core/xdp.c
> > +++ b/net/core/xdp.c
> > @@ -582,9 +582,15 @@ struct sk_buff *__xdp_build_skb_from_frame(struct xdp_frame *xdpf,
> >                                            struct sk_buff *skb,
> >                                            struct net_device *dev)
> >  {
> > +       struct skb_shared_info *sinfo = xdp_get_shared_info_from_frame(xdpf);
> >         unsigned int headroom, frame_size;
> > +       int i, num_frags = 0;
> >         void *hard_start;
> >
> > +       /* xdp multi-buff frame */
> > +       if (unlikely(xdp_frame_is_mb(xdpf)))
> > +               num_frags = sinfo->nr_frags;
> > +
> >         /* Part of headroom was reserved to xdpf */
> >         headroom = sizeof(*xdpf) + xdpf->headroom;
> >
> > @@ -603,6 +609,13 @@ struct sk_buff *__xdp_build_skb_from_frame(struct xdp_frame *xdpf,
> >         if (xdpf->metasize)
> >                 skb_metadata_set(skb, xdpf->metasize);
> >
> > +       for (i = 0; i < num_frags; i++)
> > +               skb_add_rx_frag(skb, skb_shinfo(skb)->nr_frags,
> > +                               skb_frag_page(&sinfo->frags[i]),
> > +                               skb_frag_off(&sinfo->frags[i]),
> > +                               skb_frag_size(&sinfo->frags[i]),
> > +                               xdpf->frame_sz);
> > +
> 
> So this is assuming the header frame and all of the frags are using
> the same size. Rather than reading the frags out and then writing them
> back, why not just directly rewrite the nr_frags, add the total size
> to skb->len and skb->data_len, and then update the truesize?

ack, thx. I will look into it.

Regards,
Lorenzo

> 
> Actually, I think you might need to store the truesize somewhere in
> addition to the data_len that you were storing in the shared info.
> 

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH v9 bpf-next 01/14] net: skbuff: add data_len field to skb_shared_info
  2021-06-29 18:18         ` Alexander Duyck
@ 2021-06-29 18:37           ` Jakub Kicinski
  2021-06-29 19:11             ` Jesper Dangaard Brouer
  0 siblings, 1 reply; 53+ messages in thread
From: Jakub Kicinski @ 2021-06-29 18:37 UTC (permalink / raw)
  To: Alexander Duyck
  Cc: Lorenzo Bianconi, Lorenzo Bianconi, bpf, Netdev, David Miller,
	Alexei Starovoitov, Daniel Borkmann, shayagr, Jubran, Samih,
	John Fastabend, David Ahern, Jesper Dangaard Brouer,
	Eelco Chaudron, Jason Wang, Saeed Mahameed, Maciej Fijalkowski,
	Karlsson, Magnus, Tirthendu

On Tue, 29 Jun 2021 11:18:38 -0700 Alexander Duyck wrote:
> On Tue, Jun 29, 2021 at 10:08 AM Jakub Kicinski <kuba@kernel.org> wrote:
> > > ack, I agree. I will fix it in v10.  
> >
> > Why is XDP mb incompatible with LRO? I thought that was one of the use
> > cases (mentioned by Willem IIRC).  
> 
> XDP is meant to be a per packet operation with support for TX and
> REDIRECT, and LRO isn't routable. So we could put together a large LRO
> frame but we wouldn't be able to break it apart again. If we allow
> that then we are going to need a ton more exception handling added to
> the XDP paths.
> 
> As far as GSO it would require setting many more fields in order to
> actually make it offloadable by any hardware.

It would require more work, but TSO seems to be explicitly stated 
as what the series builds towards (in the cover letter). It's fine
to make choices we'd need to redo later, I guess, I'm just trying
to understand the why.

> My preference would be
> to make use of gso_segs and gso_size to store the truesize and datalen
> of the pages. That way we keep all of the data fields used in the
> shared info in the first 8 bytes assuming we don't end up having to
> actually use multiple buffers.

Is 8B significant? We expect the compiler to load 8B and then slice it
out? Can the CPU do that? We're not expecting sinfo to be misaligned
(e.g. placed directly after xdp_buff), right?

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH v9 bpf-next 01/14] net: skbuff: add data_len field to skb_shared_info
  2021-06-29 18:37           ` Jakub Kicinski
@ 2021-06-29 19:11             ` Jesper Dangaard Brouer
  2021-06-29 19:18               ` Lorenzo Bianconi
  0 siblings, 1 reply; 53+ messages in thread
From: Jesper Dangaard Brouer @ 2021-06-29 19:11 UTC (permalink / raw)
  To: Jakub Kicinski, Alexander Duyck
  Cc: Lorenzo Bianconi, Lorenzo Bianconi, bpf, Netdev, David Miller,
	Alexei Starovoitov, Daniel Borkmann, shayagr, Jubran, Samih,
	John Fastabend, David Ahern, Jesper Dangaard Brouer,
	Eelco Chaudron, Jason Wang, Saeed Mahameed, Maciej Fijalkowski,
	Karlsson, Magnus, Tirthendu


On 29/06/2021 20.37, Jakub Kicinski wrote:
> On Tue, 29 Jun 2021 11:18:38 -0700 Alexander Duyck wrote:
>> On Tue, Jun 29, 2021 at 10:08 AM Jakub Kicinski <kuba@kernel.org> wrote:
>>>> ack, I agree. I will fix it in v10.
>>> Why is XDP mb incompatible with LRO? I thought that was one of the use
>>> cases (mentioned by Willem IIRC).
>> XDP is meant to be a per packet operation with support for TX and
>> REDIRECT, and LRO isn't routable. So we could put together a large LRO
>> frame but we wouldn't be able to break it apart again. If we allow
>> that then we are going to need a ton more exception handling added to
>> the XDP paths.
>>
>> As far as GSO it would require setting many more fields in order to
>> actually make it offloadable by any hardware.
> It would require more work, but TSO seems to be explicitly stated
> as what the series builds towards (in the cover letter). It's fine
> to make choices we'd need to redo later, I guess, I'm just trying
> to understand the why.

This is also my understanding: LRO and TSO are what this patchset is
working towards.

Sorry, I don't agree with or understand this requested change.



^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH v9 bpf-next 01/14] net: skbuff: add data_len field to skb_shared_info
  2021-06-29 19:11             ` Jesper Dangaard Brouer
@ 2021-06-29 19:18               ` Lorenzo Bianconi
  2021-06-29 20:45                 ` Alexander Duyck
  0 siblings, 1 reply; 53+ messages in thread
From: Lorenzo Bianconi @ 2021-06-29 19:18 UTC (permalink / raw)
  To: Jesper Dangaard Brouer
  Cc: Jakub Kicinski, Alexander Duyck, Lorenzo Bianconi, bpf, Netdev,
	David Miller, Alexei Starovoitov, Daniel Borkmann, shayagr,
	Jubran, Samih, John Fastabend, David Ahern,
	Jesper Dangaard Brouer, Eelco Chaudron, Jason Wang,
	Saeed Mahameed, Maciej Fijalkowski, Karlsson, Magnus, Tirthendu

> 
> On 29/06/2021 20.37, Jakub Kicinski wrote:
> > On Tue, 29 Jun 2021 11:18:38 -0700 Alexander Duyck wrote:
> > > On Tue, Jun 29, 2021 at 10:08 AM Jakub Kicinski <kuba@kernel.org> wrote:
> > > > > ack, I agree. I will fix it in v10.
> > > > Why is XDP mb incompatible with LRO? I thought that was one of the use
> > > > cases (mentioned by Willem IIRC).
> > > XDP is meant to be a per packet operation with support for TX and
> > > REDIRECT, and LRO isn't routable. So we could put together a large LRO
> > > frame but we wouldn't be able to break it apart again. If we allow
> > > that then we are going to need a ton more exception handling added to
> > > the XDP paths.
> > > 
> > > As far as GSO it would require setting many more fields in order to
> > > actually make it offloadable by any hardware.
> > It would require more work, but TSO seems to be explicitly stated
> > as what the series builds towards (in the cover letter). It's fine
> > to make choices we'd need to redo later, I guess, I'm just trying
> > to understand the why.
> 
> This is also my understanding that LRO and TSO is what this patchset is
> working towards.
> 
> Sorry, I don't agree or understand this requested change.
> 
> 

My understanding here is to use gso_size to store the paged length of the
xdp multi-buffer. When converting the xdp_frame to an skb we will need
to overwrite it to support gro/lro. Is my understanding correct?

Regards,
Lorenzo

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH v9 bpf-next 01/14] net: skbuff: add data_len field to skb_shared_info
  2021-06-29 19:18               ` Lorenzo Bianconi
@ 2021-06-29 20:45                 ` Alexander Duyck
  0 siblings, 0 replies; 53+ messages in thread
From: Alexander Duyck @ 2021-06-29 20:45 UTC (permalink / raw)
  To: Lorenzo Bianconi
  Cc: Jesper Dangaard Brouer, Jakub Kicinski, Lorenzo Bianconi, bpf,
	Netdev, David Miller, Alexei Starovoitov, Daniel Borkmann,
	shayagr, Jubran, Samih, John Fastabend, David Ahern,
	Jesper Dangaard Brouer, Eelco Chaudron, Jason Wang,
	Saeed Mahameed, Maciej Fijalkowski, Karlsson, Magnus, Tirthendu

On Tue, Jun 29, 2021 at 12:18 PM Lorenzo Bianconi
<lorenzo.bianconi@redhat.com> wrote:
>
> >
> > On 29/06/2021 20.37, Jakub Kicinski wrote:
> > > On Tue, 29 Jun 2021 11:18:38 -0700 Alexander Duyck wrote:
> > > > On Tue, Jun 29, 2021 at 10:08 AM Jakub Kicinski <kuba@kernel.org> wrote:
> > > > > > ack, I agree. I will fix it in v10.
> > > > > Why is XDP mb incompatible with LRO? I thought that was one of the use
> > > > > cases (mentioned by Willem IIRC).
> > > > XDP is meant to be a per packet operation with support for TX and
> > > > REDIRECT, and LRO isn't routable. So we could put together a large LRO
> > > > frame but we wouldn't be able to break it apart again. If we allow
> > > > that then we are going to need a ton more exception handling added to
> > > > the XDP paths.
> > > >
> > > > As far as GSO it would require setting many more fields in order to
> > > > actually make it offloadable by any hardware.
> > > It would require more work, but TSO seems to be explicitly stated
> > > as what the series builds towards (in the cover letter). It's fine
> > > to make choices we'd need to redo later, I guess, I'm just trying
> > > to understand the why.
> >
> > This is also my understanding that LRO and TSO is what this patchset is
> > working towards.
> >
> > Sorry, I don't agree or understand this requested change.
> >
> >
>
> My understanding here is to use gso_size to store paged length of the
> xdp multi-buffer. When converting the xdp_frame to a skb we will need
> to overwrite it to support gro/lro. Is my understanding correct?

Yes, I was thinking just of the xdp_buff, not the xdp_frame. My focus
for right now is mostly around the Rx side of things, xdp_buff to skb,
and around the XDP_TX path. If we want to drop/move where we keep the
data length when doing the conversion I would be fine with that.

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH v9 bpf-next 00/14] mvneta: introduce XDP multi-buffer support
  2021-06-22 23:18 ` [PATCH v9 bpf-next 00/14] mvneta: introduce XDP multi-buffer support John Fastabend
  2021-06-23  3:41   ` David Ahern
@ 2021-07-01  7:56   ` Magnus Karlsson
  1 sibling, 0 replies; 53+ messages in thread
From: Magnus Karlsson @ 2021-07-01  7:56 UTC (permalink / raw)
  To: John Fastabend
  Cc: Lorenzo Bianconi, bpf, Network Development, Lorenzo Bianconi,
	David S. Miller, Jakub Kicinski, Alexei Starovoitov,
	Daniel Borkmann, shayagr, sameehj, David Ahern,
	Jesper Dangaard Brouer, Eelco Chaudron, Jason Wang,
	Alexander Duyck, Saeed Mahameed, Fijalkowski, Maciej, Karlsson,
	Magnus, Tirthendu

On Wed, Jun 23, 2021 at 1:19 AM John Fastabend <john.fastabend@gmail.com> wrote:
>
> Lorenzo Bianconi wrote:
> > This series introduce XDP multi-buffer support. The mvneta driver is
> > the first to support these new "non-linear" xdp_{buff,frame}. Reviewers
> > please focus on how these new types of xdp_{buff,frame} packets
> > traverse the different layers and the layout design. It is on purpose
> > that BPF-helpers are kept simple, as we don't want to expose the
> > internal layout to allow later changes.
> >
> > For now, to keep the design simple and to maintain performance, the XDP
> > BPF-prog (still) only have access to the first-buffer. It is left for
> > later (another patchset) to add payload access across multiple buffers.
> > This patchset should still allow for these future extensions. The goal
> > is to lift the XDP MTU restriction that comes with XDP, but maintain
> > same performance as before.
>
> At this point I don't think we can have a partial implementation. At
> the moment we have packet capture applications and protocol parsers
> running in production. If we allow this to go in staged we are going
> to break those applications that make the fundamental assumption they
> have access to all the data in the packet.
>
> There will be no way to fix it when it happens. The teams running the
> applications won't necessarily be able to change the network MTU. Now
> it doesn't work, hard stop. This is better than it sort of working some
> of the time. And if we get into a situation where some drivers support
> partial access and others support full access, the support matrix gets worse.
>
> I think we need to get full support and access to all bytes. I believe
> I said this earlier, but now we've deployed apps that really do need
> access to the payloads so its not a theoritical concern anymore, but
> rather a real one based on deployed BPF programs.
>
> >
> > The main idea for the new multi-buffer layout is to reuse the same
> > layout used for non-linear SKB. This rely on the "skb_shared_info"
> > struct at the end of the first buffer to link together subsequent
> > buffers. Keeping the layout compatible with SKBs is also done to ease
> > and speedup creating an SKB from an xdp_{buff,frame}.
> > Converting xdp_frame to SKB and deliver it to the network stack is shown
> > in patch 07/14 (e.g. cpumaps).
> >
> > A multi-buffer bit (mb) has been introduced in the flags field of xdp_{buff,frame}
> > structure to notify the bpf/network layer if this is a xdp multi-buffer frame
> > (mb = 1) or not (mb = 0).
> > The mb bit will be set by a xdp multi-buffer capable driver only for
> > non-linear frames maintaining the capability to receive linear frames
> > without any extra cost since the skb_shared_info structure at the end
> > of the first buffer will be initialized only if mb is set.
> > Moreover the flags field in xdp_{buff,frame} will be reused even for
> > xdp rx csum offloading in future series.
> >
> > Typical use cases for this series are:
> > - Jumbo-frames
> > - Packet header split (please see Google's use-case @ NetDevConf 0x14, [0])
> > - TSO
> >
> > A new bpf helper (bpf_xdp_get_buff_len) has been introduce in order to notify
> > the eBPF layer about the total frame size (linear + paged parts).
>
> Is it possible to make currently working programs continue to work?
> For a simple packet capture example, a program might capture the
> entire packet of bytes '(data_end - data_start)'. With the above
> implementation the program will continue to run, but will no longer be
> capturing all the bytes... so it's a silent failure. Otherwise I'll need
> to backport fixes into my BPF programs and releases to ensure they
> don't walk onto a new kernel with multi-buffer support enabled.
> It's not ideal.
>
> >
> > bpf_xdp_adjust_tail and bpf_xdp_copy helpers have been modified to take into
> > account xdp multi-buff frames.
> >
> > More info about the main idea behind this approach can be found here [1][2].
>
> Will read [1],[2].
>
> Where did the perf data for the 40gbps NIC go? I think we want that
> done again on this series with at least 40gbps NICs, and better
> yet 100gbps drivers. If it's addressed in a patch commit message,
> I'm reading the series now.

Here is the perf data for a 40 gbps i40e on my 2.1 GHz Cascade Lake server.

                     xdpsock -r         XDP_DROP   XDP_TX
Lorenzo's patches:   -2%/+1.5 cycles    -3%/+3     +2%/-6 (yes, it gets better!)
+ i40e support:      -5.5%/+5           -8%/+9     -9%/+31

It seems that it is the driver support itself that hurts now. The
overhead of the base support has decreased substantially over time,
which is good.

> >
> > Changes since v8:
> > - add proper dma unmapping if XDP_TX fails on mvneta for a xdp multi-buff
> > - switch back to skb_shared_info implementation from previous xdp_shared_info
> >   one
> > - avoid using a bietfield in xdp_buff/xdp_frame since it introduces performance
> >   regressions. Tested now on 10G NIC (ixgbe) to verify there are no performance
> >   penalties for regular codebase
> > - add bpf_xdp_get_buff_len helper and remove frame_length field in xdp ctx
> > - add data_len field in skb_shared_info struct
> >
> > Changes since v7:
> > - rebase on top of bpf-next
> > - fix sparse warnings
> > - improve comments for frame_length in include/net/xdp.h
> >
> > Changes since v6:
> > - the main difference respect to previous versions is the new approach proposed
> >   by Eelco to pass full length of the packet to eBPF layer in XDP context
> > - reintroduce multi-buff support to eBPF kself-tests
> > - reintroduce multi-buff support to bpf_xdp_adjust_tail helper
> > - introduce multi-buffer support to bpf_xdp_copy helper
> > - rebase on top of bpf-next
> >
> > Changes since v5:
> > - rebase on top of bpf-next
> > - initialize mb bit in xdp_init_buff() and drop per-driver initialization
> > - drop xdp->mb initialization in xdp_convert_zc_to_xdp_frame()
> > - postpone introduction of frame_length field in XDP ctx to another series
> > - minor changes
> >
> > Changes since v4:
> > - rebase ontop of bpf-next
> > - introduce xdp_shared_info to build xdp multi-buff instead of using the
> >   skb_shared_info struct
> > - introduce frame_length in xdp ctx
> > - drop previous bpf helpers
> > - fix bpf_xdp_adjust_tail for xdp multi-buff
> > - introduce xdp multi-buff self-tests for bpf_xdp_adjust_tail
> > - fix xdp_return_frame_bulk for xdp multi-buff
> >
> > Changes since v3:
> > - rebase ontop of bpf-next
> > - add patch 10/13 to copy back paged data from a xdp multi-buff frame to
> >   userspace buffer for xdp multi-buff selftests
> >
> > Changes since v2:
> > - add throughput measurements
> > - drop bpf_xdp_adjust_mb_header bpf helper
> > - introduce selftest for xdp multibuffer
> > - addressed comments on bpf_xdp_get_frags_count
> > - introduce xdp multi-buff support to cpumaps
> >
> > Changes since v1:
> > - Fix use-after-free in xdp_return_{buff/frame}
> > - Introduce bpf helpers
> > - Introduce xdp_mb sample program
> > - access skb_shared_info->nr_frags only on the last fragment
> >
> > Changes since RFC:
> > - squash multi-buffer bit initialization in a single patch
> > - add mvneta non-linear XDP buff support for tx side
> >
> > [0] https://netdevconf.info/0x14/session.html?talk-the-path-to-tcp-4k-mtu-and-rx-zerocopy
> > [1] https://github.com/xdp-project/xdp-project/blob/master/areas/core/xdp-multi-buffer01-design.org
> > [2] https://netdevconf.info/0x14/session.html?tutorial-add-XDP-support-to-a-NIC-driver (XDPmulti-buffers section)
> >
> > Eelco Chaudron (3):
> >   bpf: add multi-buff support to the bpf_xdp_adjust_tail() API
> >   bpf: add multi-buffer support to xdp copy helpers
> >   bpf: update xdp_adjust_tail selftest to include multi-buffer
> >
> > Lorenzo Bianconi (11):
> >   net: skbuff: add data_len field to skb_shared_info
> >   xdp: introduce flags field in xdp_buff/xdp_frame
> >   net: mvneta: update mb bit before passing the xdp buffer to eBPF layer
> >   xdp: add multi-buff support to xdp_return_{buff/frame}
> >   net: mvneta: add multi buffer support to XDP_TX
> >   net: mvneta: enable jumbo frames for XDP
> >   net: xdp: add multi-buff support to xdp_build_skb_from_frame
> >   bpf: introduce bpf_xdp_get_buff_len helper
> >   bpf: move user_size out of bpf_test_init
> >   bpf: introduce multibuff support to bpf_prog_test_run_xdp()
> >   bpf: test_run: add xdp_shared_info pointer in bpf_test_finish
> >     signature
> >
> >  drivers/net/ethernet/marvell/mvneta.c         | 143 ++++++++++------
> >  include/linux/skbuff.h                        |   5 +-
> >  include/net/xdp.h                             |  56 ++++++-
> >  include/uapi/linux/bpf.h                      |   7 +
> >  kernel/trace/bpf_trace.c                      |   3 +
> >  net/bpf/test_run.c                            | 108 +++++++++---
> >  net/core/filter.c                             | 157 +++++++++++++++++-
> >  net/core/xdp.c                                |  72 +++++++-
> >  tools/include/uapi/linux/bpf.h                |   7 +
> >  .../bpf/prog_tests/xdp_adjust_tail.c          | 105 ++++++++++++
> >  .../selftests/bpf/prog_tests/xdp_bpf2bpf.c    | 127 +++++++++-----
> >  .../bpf/progs/test_xdp_adjust_tail_grow.c     |  10 +-
> >  .../bpf/progs/test_xdp_adjust_tail_shrink.c   |  32 +++-
> >  .../selftests/bpf/progs/test_xdp_bpf2bpf.c    |   2 +-
> >  14 files changed, 705 insertions(+), 129 deletions(-)
> >
> > --
> > 2.31.1
> >
>
>

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH v9 bpf-next 02/14] xdp: introduce flags field in xdp_buff/xdp_frame
  2021-06-29 13:07       ` Alexander Duyck
  2021-06-29 13:25         ` Lorenzo Bianconi
@ 2021-07-05 15:52         ` Lorenzo Bianconi
  2021-07-05 21:35           ` Alexander Duyck
  1 sibling, 1 reply; 53+ messages in thread
From: Lorenzo Bianconi @ 2021-07-05 15:52 UTC (permalink / raw)
  To: Alexander Duyck
  Cc: Lorenzo Bianconi, bpf, Netdev, David Miller, Jakub Kicinski,
	Alexei Starovoitov, Daniel Borkmann, shayagr, Jubran, Samih,
	John Fastabend, David Ahern, Jesper Dangaard Brouer,
	Eelco Chaudron, Jason Wang, Saeed Mahameed, Maciej Fijalkowski,
	Karlsson, Magnus, Tirthendu

[-- Attachment #1: Type: text/plain, Size: 2432 bytes --]

> On Tue, Jun 29, 2021 at 5:43 AM Lorenzo Bianconi
> <lorenzo.bianconi@redhat.com> wrote:
> >
> > > On Mon, Jun 14, 2021 at 5:50 AM Lorenzo Bianconi <lorenzo@kernel.org> wrote:
> > > >
> > > > Introduce flags field in xdp_frame/xdp_buffer data structure
> > > > to define additional buffer features. At the moment the only
> > > > supported buffer feature is multi-buffer bit (mb). Multi-buffer bit
> > > > is used to specify if this is a linear buffer (mb = 0) or a multi-buffer
> > > > frame (mb = 1). In the latter case the shared_info area at the end of
> > > > the first buffer will be properly initialized to link together
> > > > subsequent buffers.
> > > >
> > > > Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
> > >
> > > Instead of passing this between buffers and frames I wonder if this
> > > wouldn't be better to place in something like the xdp_mem_info
> > > structure since this is something that would be specific to how the
> > > device is handling memory anyway. You could probably split the type
> > > field into a 16b type and a 16b flags field. Then add your bit where 0
> > > is linear/legacy and 1 is scatter-gather/multi-buffer.
> > >
> >
> > ack, this should be fine but I put the flag field in xdp_buff/xdp_frame
> > in order to reuse it for some xdp hw-hints (e.g rx checksum type).
> > We can put it in xdp_mem_info too but I guess it would be less intuitive, what
> > do you think?
> 
> I think it makes the most sense in xdp_mem_info. It already tells us
> what to expect in some respect in regards to memory layout as it tells
> us if we are dealing with shared pages or whole pages and how to
> recycle them. I would think that applies almost identically to
> scatter-gather XDP the same way.

Hi Alex,

Reviewing the code to address this comment, I think I spotted a corner case
where we can't use this approach. Whenever we run dev_map_bpf_prog_run()
we lose the mb info when converting xdp_frame to xdp_buff, since
xdp_convert_frame_to_buff() does not copy it and we have no xdp_rxq_info there.
Do you think we should add an rxq_info there, similar to what we did for cpumap?
I think it is better to keep the previous approach since it seems cleaner and
reusable in the future. What do you think?
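
To make the corner case concrete, this is roughly the conversion path I am
referring to (a heavily simplified sketch, not the actual patch: the struct
layouts are trimmed down and the flags field is the one added in patch 02/14):

/* Trimmed-down structures, only the fields relevant to this discussion. */
struct xdp_frame_sketch {
	void *data;
	u16 len;
	u16 headroom;
	u32 metasize;
	u32 frame_sz;
	u32 flags;			/* mb bit lives here in this series */
};

struct xdp_buff_sketch {
	void *data, *data_end, *data_meta, *data_hard_start;
	struct xdp_rxq_info *rxq;	/* not populated in dev_map_bpf_prog_run() */
	u32 frame_sz;
	u32 flags;
};

static void convert_frame_to_buff_sketch(struct xdp_frame_sketch *frame,
					 struct xdp_buff_sketch *xdp)
{
	xdp->data_hard_start = frame->data - frame->headroom;
	xdp->data = frame->data;
	xdp->data_end = frame->data + frame->len;
	xdp->data_meta = frame->data - frame->metasize;
	xdp->frame_sz = frame->frame_sz;
	/* With per-frame flags the mb bit simply travels along: */
	xdp->flags = frame->flags;
	/* With the bit hidden in xdp_mem_info there is nothing to copy from
	 * here instead, because no xdp_rxq_info (and so no xdp_mem_info) is
	 * attached on this path.
	 */
}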

Regards,
Lorenzo

> 
> As far as the addition of flags there is still time for that later as
> we still have the 32b of unused space after frame_sz.
> 

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 228 bytes --]

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH v9 bpf-next 02/14] xdp: introduce flags field in xdp_buff/xdp_frame
  2021-07-05 15:52         ` Lorenzo Bianconi
@ 2021-07-05 21:35           ` Alexander Duyck
  2021-07-06 11:53             ` Lorenzo Bianconi
  0 siblings, 1 reply; 53+ messages in thread
From: Alexander Duyck @ 2021-07-05 21:35 UTC (permalink / raw)
  To: Lorenzo Bianconi
  Cc: Lorenzo Bianconi, bpf, Netdev, David Miller, Jakub Kicinski,
	Alexei Starovoitov, Daniel Borkmann, shayagr, Jubran, Samih,
	John Fastabend, David Ahern, Jesper Dangaard Brouer,
	Eelco Chaudron, Jason Wang, Saeed Mahameed, Maciej Fijalkowski,
	Karlsson, Magnus, Tirthendu

On Mon, Jul 5, 2021 at 8:52 AM Lorenzo Bianconi
<lorenzo.bianconi@redhat.com> wrote:
>
> > On Tue, Jun 29, 2021 at 5:43 AM Lorenzo Bianconi
> > <lorenzo.bianconi@redhat.com> wrote:
> > >
> > > > On Mon, Jun 14, 2021 at 5:50 AM Lorenzo Bianconi <lorenzo@kernel.org> wrote:
> > > > >
> > > > > Introduce flags field in xdp_frame/xdp_buffer data structure
> > > > > to define additional buffer features. At the moment the only
> > > > > supported buffer feature is multi-buffer bit (mb). Multi-buffer bit
> > > > > is used to specify if this is a linear buffer (mb = 0) or a multi-buffer
> > > > > frame (mb = 1). In the latter case the shared_info area at the end of
> > > > > the first buffer will be properly initialized to link together
> > > > > subsequent buffers.
> > > > >
> > > > > Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
> > > >
> > > > Instead of passing this between buffers and frames I wonder if this
> > > > wouldn't be better to place in something like the xdp_mem_info
> > > > structure since this is something that would be specific to how the
> > > > device is handling memory anyway. You could probably split the type
> > > > field into a 16b type and a 16b flags field. Then add your bit where 0
> > > > is linear/legacy and 1 is scatter-gather/multi-buffer.
> > > >
> > >
> > > ack, this should be fine but I put the flag field in xdp_buff/xdp_frame
> > > in order to reuse it for some xdp hw-hints (e.g rx checksum type).
> > > We can put it in xdp_mem_info too but I guess it would be less intuitive, what
> > > do you think?
> >
> > I think it makes the most sense in xdp_mem_info. It already tells us
> > what to expect in some respect in regards to memory layout as it tells
> > us if we are dealing with shared pages or whole pages and how to
> > recycle them. I would think that applies almost identically to
> > scatter-gather XDP the same way.
>
> Hi Alex,
>
> Reviewing the code to address this comment I think I spotted a corner case
> where we can't use this approach. Whenever we run dev_map_bpf_prog_run()
> we lose mb info converting xdp_frame to xdp_buff since
> xdp_convert_frame_to_buff() does not copy it and we have no xdp_rxq_info there.
> Do you think we should add a rxq_info there similar to what we did for cpumap?
> I think it is better to keep the previous approach since it seems cleaner and
> reusable in the future. What do you think?
>

Hi Lorenzo,

What about doing something like breaking up the type value in
xdp_mem_info? The fact is that having it as an enum doesn't get us much,
since we have a 32b type field but are only storing 4 possible values
there currently.

The way I see it, scatter-gather is just another memory model
attribute rather than being something entirely new. It makes as much
sense to have a bit there for MEM_TYPE_PAGE_SG as it does for
MEM_TYPE_PAGE_SHARED. I would consider splitting the type field
into two 16b fields: one field that describes the source pool, which is
currently either an allocated page (ORDER0, SHARED), page_pool
(PAGE_POOL), or an XSK pool (XSK_BUFF_POOL), and then two flag bits for
the type, shared and/or scatter-gather.
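
A minimal sketch of the layout I have in mind, purely for illustration
(none of these names exist today, and the current in-tree struct is just a
u32 type plus a u32 id):

/* Hypothetical split of xdp_mem_info; field names, widths and bit
 * assignments are illustrative only.
 */
struct xdp_mem_info {
	u16 pool;	/* source pool: page allocator, page_pool or XSK pool */
	u16 flags;	/* memory model attributes, see below */
	u32 id;
};

#define XDP_MEM_F_SHARED	(1U << 0)	/* page may be shared, like
						 * today's MEM_TYPE_PAGE_SHARED
						 */
#define XDP_MEM_F_SG		(1U << 1)	/* buffer may carry fragments,
						 * i.e. scatter-gather
						 */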

Also, looking over the code I don't see any reason why current
ORDER0/SHARED couldn't be merged as the free paths are essentially
identical since the MEM_TYPE_PAGE_SHARED path would function perfectly
fine to free MEM_TYPE_PAGE_ORDER0 pages.

Thanks,

- Alex

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH v9 bpf-next 10/14] bpf: add multi-buffer support to xdp copy helpers
  2021-06-24 14:28       ` John Fastabend
  2021-06-25  8:25         ` Eelco Chaudron
  2021-06-29 13:23         ` Lorenzo Bianconi
@ 2021-07-06 10:15         ` Eelco Chaudron
  2 siblings, 0 replies; 53+ messages in thread
From: Eelco Chaudron @ 2021-07-06 10:15 UTC (permalink / raw)
  To: John Fastabend
  Cc: Lorenzo Bianconi, bpf, netdev, lorenzo.bianconi, davem, kuba,
	ast, daniel, shayagr, sameehj, dsahern, brouer, jasowang,
	alexander.duyck, saeed, maciej.fijalkowski, magnus.karlsson,
	tirthendu.sarkar



On 24 Jun 2021, at 16:28, John Fastabend wrote:

> Eelco Chaudron wrote:
>>
>>
>> On 23 Jun 2021, at 1:49, John Fastabend wrote:
>>
>>> Lorenzo Bianconi wrote:
>>>> From: Eelco Chaudron <echaudro@redhat.com>
>>>>
>>>> This patch adds support for multi-buffer for the following helpers:
>>>>   - bpf_xdp_output()
>>>>   - bpf_perf_event_output()
>>>>
>>>> Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
>>>> Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
>>>> ---
>>>
>>> Ah ok so at least xdp_output will work with all bytes. But this is
>>> getting close to having access into the frags so I think doing
>>> the last bit shouldn't be too hard?
>>
>>
>> Guess you are talking about multi-buffer access in the XDP program?
>>
>> I did suggest an API a while back, https://lore.kernel.org/bpf/FD3E6E08-DE78-4FBA-96F6-646C93E88631@redhat.com/ but I had/have no time to work on it. Guess the difficult part is to convince the verifier to allow the data to be accessed.
>
> Ah great, I think we had the same idea; I called it xdp_pull_data()
> though.
>
> What's the complication though? It looks like it can be done by simply
> moving the data and data_end pointers around and then marking them
> invalidated. This way the verifier knows the program needs to
> rewrite them. I can probably look into it more next week.


Sorry for the late response, but I did do a POC a while back with changing the data and data_end pointers, and this worked. The problem that got raised at the time was that it was not hiding the implementation, i.e. you had to pass in the fragment number, and so you needed to know how many fragments existed and the size of each one.

With the API suggested in the email linked above, I was trying to avoid this. But it needs a lot of work in the verifier, I guess.
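
Just to make that part of the discussion concrete, from the program side the
helper we are talking about would look something like the below (name,
signature, helper id and semantics are all hypothetical, nothing here exists
today):

#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

/* Hypothetical helper: make 'len' bytes starting at 'offset' available in
 * the linear area, hiding fragment boundaries from the program. data and
 * data_end are invalidated by the call, so the verifier forces them to be
 * re-read, similar to bpf_xdp_adjust_head()/bpf_xdp_adjust_tail() today.
 */
static long (*bpf_xdp_pull_data)(struct xdp_md *xdp, __u32 offset,
				 __u32 len) = (void *)200; /* made-up id */

SEC("xdp")
int parse_deep_headers(struct xdp_md *ctx)
{
	void *data, *data_end;

	if (bpf_xdp_pull_data(ctx, 0, 256) < 0)
		return XDP_DROP;

	data = (void *)(long)ctx->data;
	data_end = (void *)(long)ctx->data_end;
	if (data + 256 > data_end)
		return XDP_PASS;

	/* ... parse up to 256 bytes of headers without caring whether the
	 * frame is multi-buffer or not ...
	 */
	return XDP_PASS;
}

char _license[] SEC("license") = "GPL";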

> From my first glance it looks relatively straight forward to do
> now. I really would like to avoid yet another iteration of
> programs features I have to discover and somehow work around
> if we can get the helper into this series. If you really don't
> have time I can probably take a look early next week on an
> RFC for something like above helper.

Thanks for looking at it.


Cheers,


Eelco


^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH v9 bpf-next 02/14] xdp: introduce flags field in xdp_buff/xdp_frame
  2021-07-05 21:35           ` Alexander Duyck
@ 2021-07-06 11:53             ` Lorenzo Bianconi
  2021-07-06 14:04               ` Alexander Duyck
  0 siblings, 1 reply; 53+ messages in thread
From: Lorenzo Bianconi @ 2021-07-06 11:53 UTC (permalink / raw)
  To: Alexander Duyck
  Cc: Lorenzo Bianconi, bpf, Netdev, David Miller, Jakub Kicinski,
	Alexei Starovoitov, Daniel Borkmann, shayagr, Jubran, Samih,
	John Fastabend, David Ahern, Jesper Dangaard Brouer,
	Eelco Chaudron, Jason Wang, Saeed Mahameed, Maciej Fijalkowski,
	Karlsson, Magnus, Tirthendu

[-- Attachment #1: Type: text/plain, Size: 2339 bytes --]

> On Mon, Jul 5, 2021 at 8:52 AM Lorenzo Bianconi
> <lorenzo.bianconi@redhat.com> wrote:
> >
> > > On Tue, Jun 29, 2021 at 5:43 AM Lorenzo Bianconi
> > > <lorenzo.bianconi@redhat.com> wrote:

[...]
> 
> Hi Lorenzo,
> 
> What about doing something like breaking up the type value in
> xdp_mem_info? The fact is having it as an enum doesn't get us much
> since we have a 32b type field but are only storing 4 possible values
> there currently
> 
> The way I see it, scatter-gather is just another memory model
> attribute rather than being something entirely new. It makes as much
> sense to have a bit there for MEM_TYPE_PAGE_SG as it does for
> MEM_TYPE_PAGE_SHARED. I would consider either splitting the type field
> into two 16b fields. For example you might have one field that
> describes the source pool which is currently either allocated page
> (ORDER0, SHARED), page_pool (PAGE_POOL), or XSK pool (XSK_BUFF_POOL),
> and then two flags for type with there being either shared and/or
> scatter-gather.

Hi Alex,

I am fine with reducing the xdp_mem_info size, defining the type field as u16
instead of u32, but I think mb is a per-xdp_buff/xdp_frame property, since at
runtime we can receive a tiny single-page xdp_buff/xdp_frame and a "jumbo"
xdp_buff/xdp_frame composed of multiple pages. According to the documentation
available in include/net/xdp.h, xdp_rxq_info (where xdp_mem_info is contained
for xdp_buff) is "associated with the driver level RX-ring queues and it is
information that is specific to how the driver have configured a given RX-ring
queue", so I guess it is a little bit counterintuitive to add this info there.
Moreover, we have the "issue" for devmap in dev_map_bpf_prog_run() when we
perform XDP_REDIRECT with the approach you proposed, and lastly we can reuse
this new flags field for XDP hw-hints support.
What about reducing xdp_mem_info and adding the flags field to
xdp_buff/xdp_frame, in order to avoid increasing the xdp_buff/xdp_frame size?
Am I missing something?
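
To show what I mean with the runtime mix, a driver RX path with this series
ends up doing something along these lines (heavily simplified pseudo-driver
code; XDP_FLAGS_MB is just an indicative name, not the actual define):

/* Same RX queue, so the same xdp_rxq_info/xdp_mem_info for every frame. */
static void rx_one_frame_sketch(struct xdp_buff *xdp, struct xdp_rxq_info *rxq,
				unsigned int nr_buffers)
{
	xdp_init_buff(xdp, PAGE_SIZE, rxq);	/* flags (and so mb) start at 0 */
	/* ... xdp_prepare_buff() for the first buffer ... */

	if (nr_buffers > 1) {
		/* Only jumbo frames touch the skb_shared_info at the end of
		 * the first buffer and set the mb bit.
		 */
		xdp->flags |= XDP_FLAGS_MB;
		/* ... link the remaining buffers as frags ... */
	}

	/* Two frames from the same queue can differ in mb, which is why the
	 * bit belongs in xdp_buff/xdp_frame rather than in the queue-level
	 * xdp_mem_info.
	 */
}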

Regards,
Lorenzo

> 
> Also, looking over the code I don't see any reason why current
> ORDER0/SHARED couldn't be merged as the free paths are essentially
> identical since the MEM_TYPE_PAGE_SHARED path would function perfectly
> fine to free MEM_TYPE_PAGE_ORDER0 pages.
> 
> Thanks,
> 
> - Alex
> 

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 228 bytes --]

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH v9 bpf-next 02/14] xdp: introduce flags field in xdp_buff/xdp_frame
  2021-07-06 11:53             ` Lorenzo Bianconi
@ 2021-07-06 14:04               ` Alexander Duyck
  2021-07-06 17:47                 ` Lorenzo Bianconi
  0 siblings, 1 reply; 53+ messages in thread
From: Alexander Duyck @ 2021-07-06 14:04 UTC (permalink / raw)
  To: Lorenzo Bianconi
  Cc: Lorenzo Bianconi, bpf, Netdev, David Miller, Jakub Kicinski,
	Alexei Starovoitov, Daniel Borkmann, shayagr, Jubran, Samih,
	John Fastabend, David Ahern, Jesper Dangaard Brouer,
	Eelco Chaudron, Jason Wang, Saeed Mahameed, Maciej Fijalkowski,
	Karlsson, Magnus, Tirthendu

On Tue, Jul 6, 2021 at 4:53 AM Lorenzo Bianconi
<lorenzo.bianconi@redhat.com> wrote:
>
> > On Mon, Jul 5, 2021 at 8:52 AM Lorenzo Bianconi
> > <lorenzo.bianconi@redhat.com> wrote:
> > >
> > > > On Tue, Jun 29, 2021 at 5:43 AM Lorenzo Bianconi
> > > > <lorenzo.bianconi@redhat.com> wrote:
>
> [...]
> >
> > Hi Lorenzo,
> >
> > What about doing something like breaking up the type value in
> > xdp_mem_info? The fact is having it as an enum doesn't get us much
> > since we have a 32b type field but are only storing 4 possible values
> > there currently
> >
> > The way I see it, scatter-gather is just another memory model
> > attribute rather than being something entirely new. It makes as much
> > sense to have a bit there for MEM_TYPE_PAGE_SG as it does for
> > MEM_TYPE_PAGE_SHARED. I would consider either splitting the type field
> > into two 16b fields. For example you might have one field that
> > describes the source pool which is currently either allocated page
> > (ORDER0, SHARED), page_pool (PAGE_POOL), or XSK pool (XSK_BUFF_POOL),
> > and then two flags for type with there being either shared and/or
> > scatter-gather.
>
> Hi Alex,
>
> I am fine reducing the xdp_mem_info size defining type field as u16 instead of
> u32 but I think mb is a per-xdp_buff/xdp_frame property since at runtime we can
> receive a tiny single page xdp_buff/xdp_frame and a "jumbo" xdp_buff/xdp_frame
> composed of multiple pages. According to the documentation available in
> include/net/xdp.h, xdp_rxq_info (where xdp_mem_info is contained for xdp_buff)
> is "associated with the driver level RX-ring queues and it is information that
> is specific to how the driver have configured a given RX-ring queue" so I guess
> it is a little bit counterintuitive to add this info there.

It isn't really all that counterintuitive. However it does put the
onus on the driver to be consistent about things. So even a
single-buffer xdp_buff would technically have to be a scatter-gather
buff, but it would have no fragments in it. So the requirement would
be to initialize the frags and data_len fields to 0 for all xdp_buff
structures.

> Moreover we have the "issue" for devmap in dev_map_bpf_prog_run() when we
> perform XDP_REDIRECT with the approach you proposed and last we can reuse this
> new flags field for XDP hw-hints support.
> What about reducing xdp_mem_info and add the flags field in xdp_buff/xdp_frame
> in order to avoid increasing the xdp_buff/xdp_frame size? Am I missing
> something?

The problem is there isn't a mem_info field in the xdp_buff. It is in
the Rx queue info structure.

Thanks,

- Alex

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [PATCH v9 bpf-next 02/14] xdp: introduce flags field in xdp_buff/xdp_frame
  2021-07-06 14:04               ` Alexander Duyck
@ 2021-07-06 17:47                 ` Lorenzo Bianconi
  0 siblings, 0 replies; 53+ messages in thread
From: Lorenzo Bianconi @ 2021-07-06 17:47 UTC (permalink / raw)
  To: Alexander Duyck
  Cc: Lorenzo Bianconi, bpf, Netdev, David Miller, Jakub Kicinski,
	Alexei Starovoitov, Daniel Borkmann, shayagr, Jubran, Samih,
	John Fastabend, David Ahern, Jesper Dangaard Brouer,
	Eelco Chaudron, Jason Wang, Saeed Mahameed, Maciej Fijalkowski,
	Karlsson, Magnus, Tirthendu

[-- Attachment #1: Type: text/plain, Size: 3181 bytes --]

> On Tue, Jul 6, 2021 at 4:53 AM Lorenzo Bianconi
> <lorenzo.bianconi@redhat.com> wrote:
> >
> > > On Mon, Jul 5, 2021 at 8:52 AM Lorenzo Bianconi
> > > <lorenzo.bianconi@redhat.com> wrote:
> > > >
> > > > > On Tue, Jun 29, 2021 at 5:43 AM Lorenzo Bianconi
> > > > > <lorenzo.bianconi@redhat.com> wrote:
> >
> > [...]
> > >
> > > Hi Lorenzo,
> > >
> > > What about doing something like breaking up the type value in
> > > xdp_mem_info? The fact is having it as an enum doesn't get us much
> > > since we have a 32b type field but are only storing 4 possible values
> > > there currently
> > >
> > > The way I see it, scatter-gather is just another memory model
> > > attribute rather than being something entirely new. It makes as much
> > > sense to have a bit there for MEM_TYPE_PAGE_SG as it does for
> > > MEM_TYPE_PAGE_SHARED. I would consider either splitting the type field
> > > into two 16b fields. For example you might have one field that
> > > describes the source pool which is currently either allocated page
> > > (ORDER0, SHARED), page_pool (PAGE_POOL), or XSK pool (XSK_BUFF_POOL),
> > > and then two flags for type with there being either shared and/or
> > > scatter-gather.
> >
> > Hi Alex,
> >
> > I am fine reducing the xdp_mem_info size defining type field as u16 instead of
> > u32 but I think mb is a per-xdp_buff/xdp_frame property since at runtime we can
> > receive a tiny single page xdp_buff/xdp_frame and a "jumbo" xdp_buff/xdp_frame
> > composed of multiple pages. According to the documentation available in
> > include/net/xdp.h, xdp_rxq_info (where xdp_mem_info is contained for xdp_buff)
> > is "associated with the driver level RX-ring queues and it is information that
> > is specific to how the driver have configured a given RX-ring queue" so I guess
> > it is a little bit counterintuitive to add this info there.
> 
> It isn't really all that counterintuitive. However it does put the
> onus on the driver to be consistent about things. So even a
> single-buffer xdp_buff would technically have to be a scatter-gather
> buff, but it would have no fragments in it. So the requirement would
> be to initialize the frags and data_len fields to 0 for all xdp_buff
> structures.

nr_frags and data_len are currently defined in the skb_shared_info at the end
of the xdp_buff, so I guess initializing them to 0 will trigger a cache miss
(in fact we introduced the mb bit just to avoid this initialization and not
introduce penalties for the legacy single-buffer use-case). Do you mean to
have these fields in the xdp_buff/xdp_frame structure itself?
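
In other words, with the mb bit the consumer side can stay on the fast path
for single-buffer frames, roughly like this (a sketch only; the helper and
flag names follow the spirit of the series rather than the exact patches):

static unsigned int xdp_total_len_sketch(struct xdp_buff *xdp)
{
	unsigned int len = xdp->data_end - xdp->data;
	struct skb_shared_info *sinfo;

	if (!(xdp->flags & XDP_FLAGS_MB))	/* indicative flag name */
		return len;	/* single-buffer: shared_info is never read,
				 * so no extra cache line is touched
				 */

	sinfo = xdp_get_shared_info_from_buff(xdp);
	return len + sinfo->data_len;	/* data_len as added in patch 01/14 */
}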

> 
> > Moreover we have the "issue" for devmap in dev_map_bpf_prog_run() when we
> > perform XDP_REDIRECT with the approach you proposed and last we can reuse this
> > new flags field for XDP hw-hints support.
> > What about reducing xdp_mem_info and add the flags field in xdp_buff/xdp_frame
> > in order to avoid increasing the xdp_buff/xdp_frame size? Am I missing
> > something?
> 
> The problem is there isn't a mem_info field in the xdp_buff. It is in
> the Rx queue info structure.

yes, this is what I meant :)

Regards,
Lorenzo

> 
> Thanks,
> 
> - Alex
> 

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 228 bytes --]

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Backwards compatibility for XDP multi-buff (was: Re: [PATCH v9 bpf-next 08/14] bpf: add multi-buff support to the bpf_xdp_adjust_tail() API)
  2021-06-24 14:24       ` John Fastabend
  2021-06-24 15:16         ` Zvi Effron
  2021-06-29 13:19         ` Lorenzo Bianconi
@ 2021-07-06 21:44         ` Toke Høiland-Jørgensen
  2 siblings, 0 replies; 53+ messages in thread
From: Toke Høiland-Jørgensen @ 2021-07-06 21:44 UTC (permalink / raw)
  To: John Fastabend, Eelco Chaudron, John Fastabend
  Cc: Lorenzo Bianconi, bpf, netdev, lorenzo.bianconi, davem, kuba,
	ast, daniel, shayagr, dsahern, brouer, jasowang, alexander.duyck,
	saeed, maciej.fijalkowski, magnus.karlsson, tirthendu.sarkar

Changing the subject to address this point specifically:

> Right that was my conclusion as well. Existing programs might have
> subtle side effects if they start running on multibuffer drivers as
> is. I don't have any good ideas though on how to handle this.

So I had a chat about this with Lorenzo, Eelco and Jesper today, and
promised I'd summarise our discussion to you all, so this is my attempt
at that. Please excuse the long email, I'm just trying to be
comprehensive :)

So first off, a problem description: If an existing XDP program is
exposed to an xdp_buff that is really a multi-buffer, it may end up with
subtle and hard-to-debug bugs: If it's parsing the packet it'll only see
part of the payload and not be aware of that fact, and if it's
calculating the packet length, that will also be wrong (only
counting the first fragment).
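
To make the length case concrete, a program doing something like the below
keeps loading and running; it just silently reports a too-small length on an
mb frame (minimal sketch; bpf_xdp_get_buff_len() is the helper this series
adds to expose the full linear + paged length):

#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

SEC("xdp")
int measure_pkt(struct xdp_md *ctx)
{
	void *data = (void *)(long)ctx->data;
	void *data_end = (void *)(long)ctx->data_end;

	/* On a multi-buffer frame this only covers the first buffer, so the
	 * value is silently wrong; an mb-aware program would need to call
	 * bpf_xdp_get_buff_len(ctx) instead.
	 */
	__u64 pkt_len = data_end - data;

	bpf_printk("pkt len %llu", pkt_len);
	return XDP_PASS;
}

char _license[] SEC("license") = "GPL";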

So what to do about this? First of all, to do anything about it, XDP
programs need to be able to declare themselves "multi-buffer aware" (but
see point 1 below). We could try to auto-detect it in the verifier by
which helpers the program is using, but since existing programs could be
perfectly happy to just keep running, it probably needs to be something
the program communicates explicitly. One option is to use the
expected_attach_type to encode this; programs can then declare it in the
source by section name, or the userspace loader can set the type for
existing programs if needed.

With this, the kernel will know if a given XDP program is multi-buff
aware and can decide what to do with that information. For this we came
up with basically three options:

1. Do nothing. This would make it up to users / sysadmins to avoid
   anything breaking by manually making sure to not enable multi-buffer
   support while loading any XDP programs that will malfunction if
   presented with an mb frame. This will probably break in interesting
   ways, but it's nice and simple from an implementation PoV. With this
   we don't need the declaration discussed above either.

2. Add a check at runtime and drop the frames if they are mb-enabled and
   the program doesn't understand it. This is relatively simple to
   implement, but it also makes for difficult-to-understand issues (why
   are my packets suddenly being dropped?), and it will incur runtime
   overhead.

3. Reject loading of programs that are not MB-aware when running in an
   MB-enabled mode. This would make things break in more obvious ways,
   and still allow a userspace loader to declare a program "MB-aware" to
   force it to run if necessary. The problem then becomes at what level
   to block this?

   Doing this at the driver level is not enough: while a particular
   driver knows if it's running in multi-buff mode, we can't know for
   sure if a particular XDP program is multi-buff aware at attach time:
   it could be tail-calling other programs, or redirecting packets to
   another interface where it will be processed by a non-MB aware
   program.

   So another option is to make it a global toggle: e.g., create a new
   sysctl to enable multi-buffer. If this is set, reject loading any XDP
   program that doesn't support multi-buffer mode, and if it's unset,
   disable multi-buffer mode in all drivers. This will make it explicit
   when the multi-buffer mode is used, and prevent any accidental subtle
   malfunction of existing XDP programs. The drawback is that it's a
   mode switch, so more configuration complexity.

None of these options are ideal, of course, but I hope the above
explanation at least makes sense. If anyone has any better ideas (or can
spot any flaws in the reasoning above) please don't hesitate to let us
know!
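
And just to make the "declare themselves multi-buffer aware" part above
concrete, the opt-in could be as thin as a section-name convention that the
loader maps to a new expected_attach_type, something like the below (entirely
hypothetical names, nothing here exists today):

#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

/* Hypothetical: an "xdp.mb" section tells libbpf to set a new
 * expected_attach_type (say BPF_XDP_MB), which the kernel can then use to
 * accept or reject the program depending on which of the options above is
 * chosen.
 */
SEC("xdp.mb")
int xdp_prog_mb_aware(struct xdp_md *ctx)
{
	/* written against the mb-aware helpers, e.g. bpf_xdp_get_buff_len() */
	return XDP_PASS;
}

char _license[] SEC("license") = "GPL";

/* Or, for an existing program, the userspace loader could set it explicitly:
 *
 *	bpf_program__set_expected_attach_type(prog, BPF_XDP_MB);
 */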

-Toke


^ permalink raw reply	[flat|nested] 53+ messages in thread

end of thread, other threads:[~2021-07-06 21:44 UTC | newest]

Thread overview: 53+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-06-14 12:49 [PATCH v9 bpf-next 00/14] mvneta: introduce XDP multi-buffer support Lorenzo Bianconi
2021-06-14 12:49 ` [PATCH v9 bpf-next 01/14] net: skbuff: add data_len field to skb_shared_info Lorenzo Bianconi
2021-06-28 19:58   ` Alexander Duyck
2021-06-29 12:44     ` Lorenzo Bianconi
2021-06-29 17:08       ` Jakub Kicinski
2021-06-29 18:18         ` Alexander Duyck
2021-06-29 18:37           ` Jakub Kicinski
2021-06-29 19:11             ` Jesper Dangaard Brouer
2021-06-29 19:18               ` Lorenzo Bianconi
2021-06-29 20:45                 ` Alexander Duyck
2021-06-14 12:49 ` [PATCH v9 bpf-next 02/14] xdp: introduce flags field in xdp_buff/xdp_frame Lorenzo Bianconi
2021-06-28 20:14   ` Alexander Duyck
2021-06-29 12:43     ` Lorenzo Bianconi
2021-06-29 13:07       ` Alexander Duyck
2021-06-29 13:25         ` Lorenzo Bianconi
2021-07-05 15:52         ` Lorenzo Bianconi
2021-07-05 21:35           ` Alexander Duyck
2021-07-06 11:53             ` Lorenzo Bianconi
2021-07-06 14:04               ` Alexander Duyck
2021-07-06 17:47                 ` Lorenzo Bianconi
2021-06-14 12:49 ` [PATCH v9 bpf-next 03/14] net: mvneta: update mb bit before passing the xdp buffer to eBPF layer Lorenzo Bianconi
2021-06-14 12:49 ` [PATCH v9 bpf-next 04/14] xdp: add multi-buff support to xdp_return_{buff/frame} Lorenzo Bianconi
2021-06-14 12:49 ` [PATCH v9 bpf-next 05/14] net: mvneta: add multi buffer support to XDP_TX Lorenzo Bianconi
2021-06-14 12:49 ` [PATCH v9 bpf-next 06/14] net: mvneta: enable jumbo frames for XDP Lorenzo Bianconi
2021-06-14 12:49 ` [PATCH v9 bpf-next 07/14] net: xdp: add multi-buff support to xdp_build_skb_from_frame Lorenzo Bianconi
2021-06-28 21:05   ` Alexander Duyck
2021-06-29 18:34     ` Lorenzo Bianconi
2021-06-14 12:49 ` [PATCH v9 bpf-next 08/14] bpf: add multi-buff support to the bpf_xdp_adjust_tail() API Lorenzo Bianconi
2021-06-22 23:37   ` John Fastabend
2021-06-24  9:26     ` Eelco Chaudron
2021-06-24 14:24       ` John Fastabend
2021-06-24 15:16         ` Zvi Effron
2021-06-29 13:19         ` Lorenzo Bianconi
2021-06-29 13:27           ` Toke Høiland-Jørgensen
2021-07-06 21:44         ` Backwards compatibility for XDP multi-buff (was: Re: [PATCH v9 bpf-next 08/14] bpf: add multi-buff support to the bpf_xdp_adjust_tail() API) Toke Høiland-Jørgensen
2021-06-14 12:49 ` [PATCH v9 bpf-next 09/14] bpf: introduce bpf_xdp_get_buff_len helper Lorenzo Bianconi
2021-06-14 12:49 ` [PATCH v9 bpf-next 10/14] bpf: add multi-buffer support to xdp copy helpers Lorenzo Bianconi
2021-06-22 23:49   ` John Fastabend
2021-06-24  9:42     ` Eelco Chaudron
2021-06-24 14:28       ` John Fastabend
2021-06-25  8:25         ` Eelco Chaudron
2021-06-29 13:23         ` Lorenzo Bianconi
2021-07-06 10:15         ` Eelco Chaudron
2021-06-14 12:49 ` [PATCH v9 bpf-next 11/14] bpf: move user_size out of bpf_test_init Lorenzo Bianconi
2021-06-14 12:49 ` [PATCH v9 bpf-next 12/14] bpf: introduce multibuff support to bpf_prog_test_run_xdp() Lorenzo Bianconi
2021-06-14 12:49 ` [PATCH v9 bpf-next 13/14] bpf: test_run: add xdp_shared_info pointer in bpf_test_finish signature Lorenzo Bianconi
2021-06-14 12:49 ` [PATCH v9 bpf-next 14/14] bpf: update xdp_adjust_tail selftest to include multi-buffer Lorenzo Bianconi
2021-06-22 23:18 ` [PATCH v9 bpf-next 00/14] mvneta: introduce XDP multi-buffer support John Fastabend
2021-06-23  3:41   ` David Ahern
2021-06-23  5:48     ` John Fastabend
2021-06-23 14:40       ` David Ahern
2021-06-24 14:22         ` John Fastabend
2021-07-01  7:56   ` Magnus Karlsson
