bpf.vger.kernel.org archive mirror
* [PATCH net-next 01/33] xdp: add frame size to xdp_buff
From: Jesper Dangaard Brouer @ 2020-04-22 16:07 UTC
  To: sameehj
  Cc: Jesper Dangaard Brouer, netdev, bpf, zorik, akiyano, gtzalik,
	Toke Høiland-Jørgensen, Daniel Borkmann,
	Alexei Starovoitov, John Fastabend, Alexander Duyck,
	Jeff Kirsher, David Ahern, Willem de Bruijn, Ilias Apalodimas,
	Lorenzo Bianconi, Saeed Mahameed, steffen.klassert

XDP has evolved to support several frame sizes, but xdp_buff was not
updated with this information. The frame size (frame_sz) member of
xdp_buff is introduced so the real size of the memory area the frame
is delivered in is known.

While introducing this, also make it clear that some tailroom is
reserved/required when creating SKBs using build_skb().

It would also have been an option to introduce a pointer to
data_hard_end (with a reserved offset). The advantage of frame_sz is
that (like rxq) drivers only need to setup/assign this value once per
NAPI cycle. Due to XDP-generic (and some drivers) it's not possible
to store frame_sz inside xdp_rxq_info, because it varies per packet,
as it can depend on packet length.

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
---
 include/net/xdp.h |   13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/include/net/xdp.h b/include/net/xdp.h
index 40c6d3398458..1ccf7df98bee 100644
--- a/include/net/xdp.h
+++ b/include/net/xdp.h
@@ -6,6 +6,8 @@
 #ifndef __LINUX_NET_XDP_H__
 #define __LINUX_NET_XDP_H__
 
+#include <linux/skbuff.h> /* skb_shared_info */
+
 /**
  * DOC: XDP RX-queue information
  *
@@ -70,8 +72,19 @@ struct xdp_buff {
 	void *data_hard_start;
 	unsigned long handle;
 	struct xdp_rxq_info *rxq;
+	u32 frame_sz; /* frame size to deduce data_hard_end/reserved tailroom */
 };
 
+/* Reserve memory area at end of data area.
+ *
+ * This macro reserves tailroom in the XDP buffer by limiting the
+ * XDP/BPF data access to data_hard_end.  Notice same area (and size)
+ * is used for XDP_PASS, when constructing the SKB via build_skb().
+ */
+#define xdp_data_hard_end(xdp)				\
+	((xdp)->data_hard_start + (xdp)->frame_sz -	\
+	 SKB_DATA_ALIGN(sizeof(struct skb_shared_info)))
+
 struct xdp_frame {
 	void *data;
 	u16 len;
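
A minimal usage sketch of the new macro (a hypothetical driver check,
not part of this patch):

	/* Refuse to grow the tail into the reserved skb_shared_info area */
	static bool xdp_tail_can_grow(struct xdp_buff *xdp, unsigned int grow)
	{
		return xdp->data_end + grow <= xdp_data_hard_end(xdp);
	}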




* [PATCH net-next 02/33] bnxt: add XDP frame size to driver
From: Jesper Dangaard Brouer @ 2020-04-22 16:07 UTC
  To: sameehj
  Cc: Michael Chan, Andy Gospodarek, Andy Gospodarek,
	Jesper Dangaard Brouer, netdev, bpf, zorik, akiyano, gtzalik,
	Toke Høiland-Jørgensen, Daniel Borkmann,
	Alexei Starovoitov, John Fastabend, Alexander Duyck,
	Jeff Kirsher, David Ahern, Willem de Bruijn, Ilias Apalodimas,
	Lorenzo Bianconi, Saeed Mahameed, steffen.klassert

This driver uses full PAGE_SIZE pages when XDP is enabled.

When XDP is in use, the driver uses __bnxt_alloc_rx_page(), which
DMA-maps the full page. Thus, xdp_adjust_tail grow is DMA compliant
for the XDP_TX action, which does a DMA-sync.

Cc: Michael Chan <michael.chan@broadcom.com>
Cc: Andy Gospodarek <andrew.gospodarek@broadcom.com>
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Reviewed-by: Andy Gospodarek <gospo@broadcom.com>
---
 drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c |    1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c b/drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c
index c6f6f2033880..5e3b4a3b69ea 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c
@@ -138,6 +138,7 @@ bool bnxt_rx_xdp(struct bnxt *bp, struct bnxt_rx_ring_info *rxr, u16 cons,
 	xdp_set_data_meta_invalid(&xdp);
 	xdp.data_end = *data_ptr + *len;
 	xdp.rxq = &rxr->xdp_rxq;
+	xdp.frame_sz = PAGE_SIZE; /* BNXT_RX_PAGE_MODE(bp) when XDP enabled */
 	orig_data = xdp.data;
 
 	rcu_read_lock();




* [PATCH net-next 03/33] sfc: add XDP frame size
From: Jesper Dangaard Brouer @ 2020-04-22 16:07 UTC
  To: sameehj
  Cc: Jesper Dangaard Brouer, netdev, bpf, zorik, akiyano, gtzalik,
	Toke Høiland-Jørgensen, Daniel Borkmann,
	Alexei Starovoitov, John Fastabend, Alexander Duyck,
	Jeff Kirsher, David Ahern, Willem de Bruijn, Ilias Apalodimas,
	Lorenzo Bianconi, Saeed Mahameed, steffen.klassert

This driver uses RX page-split when possible. It was recently fixed
in commit 86e85bf6981c ("sfc: fix XDP-redirect in this driver") to
add the needed tailroom for XDP-redirect.

After the fix, efx->rx_page_buf_step is the frame size, with enough
headroom and tailroom for XDP-redirect.

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
---
 drivers/net/ethernet/sfc/rx.c |    1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/net/ethernet/sfc/rx.c b/drivers/net/ethernet/sfc/rx.c
index 260352d97d9d..68c47a8c71df 100644
--- a/drivers/net/ethernet/sfc/rx.c
+++ b/drivers/net/ethernet/sfc/rx.c
@@ -308,6 +308,7 @@ static bool efx_do_xdp(struct efx_nic *efx, struct efx_channel *channel,
 	xdp_set_data_meta_invalid(&xdp);
 	xdp.data_end = xdp.data + rx_buf->len;
 	xdp.rxq = &rx_queue->xdp_rxq_info;
+	xdp.frame_sz = efx->rx_page_buf_step;
 
 	xdp_act = bpf_prog_run_xdp(xdp_prog, &xdp);
 	rcu_read_unlock();




* [PATCH net-next 04/33] mvneta: add XDP frame size to driver
From: Jesper Dangaard Brouer @ 2020-04-22 16:07 UTC
  To: sameehj
  Cc: thomas.petazzoni, Lorenzo Bianconi, Jesper Dangaard Brouer,
	netdev, bpf, zorik, akiyano, gtzalik,
	Toke Høiland-Jørgensen, Daniel Borkmann,
	Alexei Starovoitov, John Fastabend, Alexander Duyck,
	Jeff Kirsher, David Ahern, Willem de Bruijn, Ilias Apalodimas,
	Lorenzo Bianconi, Saeed Mahameed, steffen.klassert

This marvell driver mvneta uses PAGE_SIZE frames, which makes it
really easy to convert. The driver now updates rxq and frame_sz
once per NAPI call.

This driver takes advantage of the page_pool PP_FLAG_DMA_SYNC_DEV
flag, which can help reduce the number of cache-lines that need to
be flushed when doing DMA sync for_device. Because xdp_adjust_tail
can grow the area accessible to (and possibly written by) the CPU,
the max sync length *after* bpf_prog_run_xdp() needs to be taken
into account.

For the XDP_TX action the driver is smart and does a DMA-sync. When
growing the tail this is still safe, because page_pool has DMA-mapped
the entire page.
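
A minimal sketch of the sync-length pattern (illustrative names;
rx_offset stands in for pp->rx_offset_correction, see the diff below
for the real code):

	unsigned int len, sync;

	len  = xdp->data_end - xdp->data_hard_start - rx_offset;
	act  = bpf_prog_run_xdp(prog, xdp);
	/* Recompute: bpf_xdp_adjust_tail() may have moved data_end */
	sync = xdp->data_end - xdp->data_hard_start - rx_offset;
	sync = max(sync, len);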

Cc: thomas.petazzoni@bootlin.com
Acked-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
---
 drivers/net/ethernet/marvell/mvneta.c |   25 +++++++++++++++----------
 1 file changed, 15 insertions(+), 10 deletions(-)

diff --git a/drivers/net/ethernet/marvell/mvneta.c b/drivers/net/ethernet/marvell/mvneta.c
index 51889770958d..37947949345c 100644
--- a/drivers/net/ethernet/marvell/mvneta.c
+++ b/drivers/net/ethernet/marvell/mvneta.c
@@ -2148,12 +2148,17 @@ mvneta_run_xdp(struct mvneta_port *pp, struct mvneta_rx_queue *rxq,
 	       struct bpf_prog *prog, struct xdp_buff *xdp,
 	       struct mvneta_stats *stats)
 {
-	unsigned int len;
+	unsigned int len, sync;
+	struct page *page;
 	u32 ret, act;
 
 	len = xdp->data_end - xdp->data_hard_start - pp->rx_offset_correction;
 	act = bpf_prog_run_xdp(prog, xdp);
 
+	/* Due to xdp_adjust_tail: DMA sync for_device covers max len CPU touched */
+	sync = xdp->data_end - xdp->data_hard_start - pp->rx_offset_correction;
+	sync = max(sync, len);
+
 	switch (act) {
 	case XDP_PASS:
 		stats->xdp_pass++;
@@ -2164,9 +2169,8 @@ mvneta_run_xdp(struct mvneta_port *pp, struct mvneta_rx_queue *rxq,
 		err = xdp_do_redirect(pp->dev, xdp, prog);
 		if (unlikely(err)) {
 			ret = MVNETA_XDP_DROPPED;
-			page_pool_put_page(rxq->page_pool,
-					   virt_to_head_page(xdp->data), len,
-					   true);
+			page = virt_to_head_page(xdp->data);
+			page_pool_put_page(rxq->page_pool, page, sync, true);
 		} else {
 			ret = MVNETA_XDP_REDIR;
 			stats->xdp_redirect++;
@@ -2175,10 +2179,10 @@ mvneta_run_xdp(struct mvneta_port *pp, struct mvneta_rx_queue *rxq,
 	}
 	case XDP_TX:
 		ret = mvneta_xdp_xmit_back(pp, xdp);
-		if (ret != MVNETA_XDP_TX)
-			page_pool_put_page(rxq->page_pool,
-					   virt_to_head_page(xdp->data), len,
-					   true);
+		if (ret != MVNETA_XDP_TX) {
+			page = virt_to_head_page(xdp->data);
+			page_pool_put_page(rxq->page_pool, page, sync, true);
+		}
 		break;
 	default:
 		bpf_warn_invalid_xdp_action(act);
@@ -2187,8 +2191,8 @@ mvneta_run_xdp(struct mvneta_port *pp, struct mvneta_rx_queue *rxq,
 		trace_xdp_exception(pp->dev, prog, act);
 		/* fall through */
 	case XDP_DROP:
-		page_pool_put_page(rxq->page_pool,
-				   virt_to_head_page(xdp->data), len, true);
+		page = virt_to_head_page(xdp->data);
+		page_pool_put_page(rxq->page_pool, page, sync, true);
 		ret = MVNETA_XDP_DROPPED;
 		stats->xdp_drop++;
 		break;
@@ -2320,6 +2324,7 @@ static int mvneta_rx_swbm(struct napi_struct *napi,
 	rcu_read_lock();
 	xdp_prog = READ_ONCE(pp->xdp_prog);
 	xdp_buf.rxq = &rxq->xdp_rxq;
+	xdp_buf.frame_sz = PAGE_SIZE;
 
 	/* Fairness NAPI loop */
 	while (rx_proc < budget && rx_proc < rx_todo) {




* [PATCH net-next 05/33] net: netsec: Add support for XDP frame size
From: Jesper Dangaard Brouer @ 2020-04-22 16:07 UTC
  To: sameehj
  Cc: Ilias Apalodimas, Lorenzo Bianconi, Jesper Dangaard Brouer,
	netdev, bpf, zorik, akiyano, gtzalik,
	Toke Høiland-Jørgensen, Daniel Borkmann,
	Alexei Starovoitov, John Fastabend, Alexander Duyck,
	Jeff Kirsher, David Ahern, Willem de Bruijn, Ilias Apalodimas,
	Lorenzo Bianconi, Saeed Mahameed, steffen.klassert

From: Ilias Apalodimas <ilias.apalodimas@linaro.org>

This driver takes advantage of the page_pool PP_FLAG_DMA_SYNC_DEV
flag, which can help reduce the number of cache-lines that need to
be flushed when doing DMA sync for_device. Because xdp_adjust_tail
can grow the area accessible to (and possibly written by) the CPU,
the max sync length *after* bpf_prog_run_xdp() needs to be taken
into account.

For the XDP_TX action the driver is smart and does a DMA-sync. When
growing the tail this is still safe, because page_pool has DMA-mapped
the entire page.

Signed-off-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: Lorenzo Bianconi <lorenzo@kernel.org>
---
 drivers/net/ethernet/socionext/netsec.c |   30 ++++++++++++++++++------------
 1 file changed, 18 insertions(+), 12 deletions(-)

diff --git a/drivers/net/ethernet/socionext/netsec.c b/drivers/net/ethernet/socionext/netsec.c
index a5a0fb60193a..e1f4be4b3d69 100644
--- a/drivers/net/ethernet/socionext/netsec.c
+++ b/drivers/net/ethernet/socionext/netsec.c
@@ -884,23 +884,28 @@ static u32 netsec_run_xdp(struct netsec_priv *priv, struct bpf_prog *prog,
 			  struct xdp_buff *xdp)
 {
 	struct netsec_desc_ring *dring = &priv->desc_ring[NETSEC_RING_RX];
-	unsigned int len = xdp->data_end - xdp->data;
+	unsigned int sync, len = xdp->data_end - xdp->data;
 	u32 ret = NETSEC_XDP_PASS;
+	struct page *page;
 	int err;
 	u32 act;
 
 	act = bpf_prog_run_xdp(prog, xdp);
 
+	/* Due to xdp_adjust_tail: DMA sync for_device covers max len CPU touched */
+	sync = xdp->data_end - xdp->data_hard_start - NETSEC_RXBUF_HEADROOM;
+	sync = max(sync, len);
+
 	switch (act) {
 	case XDP_PASS:
 		ret = NETSEC_XDP_PASS;
 		break;
 	case XDP_TX:
 		ret = netsec_xdp_xmit_back(priv, xdp);
-		if (ret != NETSEC_XDP_TX)
-			page_pool_put_page(dring->page_pool,
-					   virt_to_head_page(xdp->data), len,
-					   true);
+		if (ret != NETSEC_XDP_TX) {
+			page = virt_to_head_page(xdp->data);
+			page_pool_put_page(dring->page_pool, page, sync, true);
+		}
 		break;
 	case XDP_REDIRECT:
 		err = xdp_do_redirect(priv->ndev, xdp, prog);
@@ -908,9 +913,8 @@ static u32 netsec_run_xdp(struct netsec_priv *priv, struct bpf_prog *prog,
 			ret = NETSEC_XDP_REDIR;
 		} else {
 			ret = NETSEC_XDP_CONSUMED;
-			page_pool_put_page(dring->page_pool,
-					   virt_to_head_page(xdp->data), len,
-					   true);
+			page = virt_to_head_page(xdp->data);
+			page_pool_put_page(dring->page_pool, page, sync, true);
 		}
 		break;
 	default:
@@ -921,8 +925,8 @@ static u32 netsec_run_xdp(struct netsec_priv *priv, struct bpf_prog *prog,
 		/* fall through -- handle aborts by dropping packet */
 	case XDP_DROP:
 		ret = NETSEC_XDP_CONSUMED;
-		page_pool_put_page(dring->page_pool,
-				   virt_to_head_page(xdp->data), len, true);
+		page = virt_to_head_page(xdp->data);
+		page_pool_put_page(dring->page_pool, page, sync, true);
 		break;
 	}
 
@@ -936,10 +940,14 @@ static int netsec_process_rx(struct netsec_priv *priv, int budget)
 	struct netsec_rx_pkt_info rx_info;
 	enum dma_data_direction dma_dir;
 	struct bpf_prog *xdp_prog;
+	struct xdp_buff xdp;
 	u16 xdp_xmit = 0;
 	u32 xdp_act = 0;
 	int done = 0;
 
+	xdp.rxq = &dring->xdp_rxq;
+	xdp.frame_sz = PAGE_SIZE;
+
 	rcu_read_lock();
 	xdp_prog = READ_ONCE(priv->xdp_prog);
 	dma_dir = page_pool_get_dma_dir(dring->page_pool);
@@ -953,7 +961,6 @@ static int netsec_process_rx(struct netsec_priv *priv, int budget)
 		struct sk_buff *skb = NULL;
 		u16 pkt_len, desc_len;
 		dma_addr_t dma_handle;
-		struct xdp_buff xdp;
 		void *buf_addr;
 
 		if (de->attr & (1U << NETSEC_RX_PKT_OWN_FIELD)) {
@@ -1002,7 +1009,6 @@ static int netsec_process_rx(struct netsec_priv *priv, int budget)
 		xdp.data = desc->addr + NETSEC_RXBUF_HEADROOM;
 		xdp_set_data_meta_invalid(&xdp);
 		xdp.data_end = xdp.data + pkt_len;
-		xdp.rxq = &dring->xdp_rxq;
 
 		if (xdp_prog) {
 			xdp_result = netsec_run_xdp(priv, xdp_prog, &xdp);




* [PATCH net-next 06/33] net: XDP-generic determining XDP frame size
From: Jesper Dangaard Brouer @ 2020-04-22 16:07 UTC
  To: sameehj
  Cc: Jesper Dangaard Brouer, netdev, bpf, zorik, akiyano, gtzalik,
	Toke Høiland-Jørgensen, Daniel Borkmann,
	Alexei Starovoitov, John Fastabend, Alexander Duyck,
	Jeff Kirsher, David Ahern, Willem de Bruijn, Ilias Apalodimas,
	Lorenzo Bianconi, Saeed Mahameed, steffen.klassert

The SKB "head" pointer points to the data area that contains
skb_shared_info, which can be found via skb_end_pointer(). Given
xdp->data_hard_start has been established (basically pointing to
skb->head), the frame size is the span between skb_end_pointer() and
data_hard_start, plus the size reserved for skb_shared_info.
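
As a minimal sketch (a hypothetical helper, not part of this patch),
the calculation done in netif_receive_generic_xdp() below is
equivalent to:

	/* Frame size backing an SKB "head", incl. shared_info tailroom */
	static unsigned int skb_xdp_frame_sz(const struct sk_buff *skb)
	{
		return skb_end_pointer(skb) - skb->head +
		       SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
	}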

Change the bpf_xdp_adjust_tail offset adjustment of skb->len to be a
positive number on grow and a negative number on shrink, as this
seems more natural when reading the code.

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
---
 net/core/dev.c |   14 ++++++++------
 1 file changed, 8 insertions(+), 6 deletions(-)

diff --git a/net/core/dev.c b/net/core/dev.c
index fb61522b1ce1..8d827d3e9f3b 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -4549,6 +4549,11 @@ static u32 netif_receive_generic_xdp(struct sk_buff *skb,
 	xdp->data_meta = xdp->data;
 	xdp->data_end = xdp->data + hlen;
 	xdp->data_hard_start = skb->data - skb_headroom(skb);
+
+	/* SKB "head" area always has tailroom for skb_shared_info */
+	xdp->frame_sz  = (void *)skb_end_pointer(skb) - xdp->data_hard_start;
+	xdp->frame_sz += SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
+
 	orig_data_end = xdp->data_end;
 	orig_data = xdp->data;
 	eth = (struct ethhdr *)xdp->data;
@@ -4572,14 +4577,11 @@ static u32 netif_receive_generic_xdp(struct sk_buff *skb,
 		skb_reset_network_header(skb);
 	}
 
-	/* check if bpf_xdp_adjust_tail was used. it can only "shrink"
-	 * pckt.
-	 */
-	off = orig_data_end - xdp->data_end;
+	/* check if bpf_xdp_adjust_tail was used */
+	off = xdp->data_end - orig_data_end;
 	if (off != 0) {
 		skb_set_tail_pointer(skb, xdp->data_end - xdp->data);
-		skb->len -= off;
-
+		skb->len += off; /* positive on grow, negative on shrink */
 	}
 
 	/* check if XDP changed eth hdr such SKB needs update */




* [PATCH net-next 07/33] xdp: xdp_frame add member frame_sz and handle in convert_to_xdp_frame
From: Jesper Dangaard Brouer @ 2020-04-22 16:07 UTC
  To: sameehj
  Cc: Jesper Dangaard Brouer, netdev, bpf, zorik, akiyano, gtzalik,
	Toke Høiland-Jørgensen, Daniel Borkmann,
	Alexei Starovoitov, John Fastabend, Alexander Duyck,
	Jeff Kirsher, David Ahern, Willem de Bruijn, Ilias Apalodimas,
	Lorenzo Bianconi, Saeed Mahameed, steffen.klassert

Use the hole in struct xdp_frame when adding the member frame_sz,
which keeps sizeof(struct xdp_frame) the same (32 bytes).
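
A layout sketch (assuming 64-bit; the offsets are my annotation, not
part of this patch):

	/* struct xdp_frame stays 32 bytes: metasize:8 + frame_sz:24
	 * share the u32 at offset 12 that previously held the u16
	 * metasize plus a 2-byte padding hole before the mem member.
	 */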

Drivers ixgbe and sfc had bug cases where the necessary/expected
tailroom was not reserved. This can lead to some hard-to-catch memory
corruption issues. With the driver's frame_sz available, this can be
detected when the packet end (xdp->data_end) exceeds the
xdp_data_hard_end pointer, which accounts for the reserved tailroom.

When detecting this driver issue, simply fail the conversion with
NULL, which results in feedback to the driver (failing
xdp_do_redirect()) causing the driver to drop the packet. Given the
lack of consistent XDP stats, this can be hard to troubleshoot. And
given this is a driver bug, we want to generate some more noise in
the form of a WARN stack dump (to ID the driver code that inlined
convert_to_xdp_frame).

Inlining the WARN macro is problematic, because it adds an asm
instruction (on Intel CPUs, ud2) that influences instruction cache
prefetching. Thus, introduce xdp_warn and the macro XDP_WARN, to
avoid this and at the same time make identifying the function and
line of this inlined function easier.

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
---
 include/net/xdp.h |   14 +++++++++++++-
 net/core/xdp.c    |    7 +++++++
 2 files changed, 20 insertions(+), 1 deletion(-)

diff --git a/include/net/xdp.h b/include/net/xdp.h
index 99f4374f6214..55a885aa4e53 100644
--- a/include/net/xdp.h
+++ b/include/net/xdp.h
@@ -93,7 +93,8 @@ struct xdp_frame {
 	void *data;
 	u16 len;
 	u16 headroom;
-	u16 metasize;
+	u32 metasize:8;
+	u32 frame_sz:24;
 	/* Lifetime of xdp_rxq_info is limited to NAPI/enqueue time,
 	 * while mem info is valid on remote CPU.
 	 */
@@ -108,6 +109,10 @@ static inline void xdp_scrub_frame(struct xdp_frame *frame)
 	frame->dev_rx = NULL;
 }
 
+/* Avoids inlining WARN macro in fast-path */
+void xdp_warn(const char *msg, const char *func, const int line);
+#define XDP_WARN(msg) xdp_warn(msg, __func__, __LINE__)
+
 struct xdp_frame *xdp_convert_zc_to_xdp_frame(struct xdp_buff *xdp);
 
 /* Convert xdp_buff to xdp_frame */
@@ -128,6 +133,12 @@ struct xdp_frame *convert_to_xdp_frame(struct xdp_buff *xdp)
 	if (unlikely((headroom - metasize) < sizeof(*xdp_frame)))
 		return NULL;
 
+	/* Catch if driver didn't reserve tailroom for skb_shared_info */
+	if (unlikely(xdp->data_end > xdp_data_hard_end(xdp))) {
+		XDP_WARN("Driver BUG: missing reserved tailroom");
+		return NULL;
+	}
+
 	/* Store info in top of packet */
 	xdp_frame = xdp->data_hard_start;
 
@@ -135,6 +146,7 @@ struct xdp_frame *convert_to_xdp_frame(struct xdp_buff *xdp)
 	xdp_frame->len  = xdp->data_end - xdp->data;
 	xdp_frame->headroom = headroom - sizeof(*xdp_frame);
 	xdp_frame->metasize = metasize;
+	xdp_frame->frame_sz = xdp->frame_sz;
 
 	/* rxq only valid until napi_schedule ends, convert to xdp_mem_info */
 	xdp_frame->mem = xdp->rxq->mem;
diff --git a/net/core/xdp.c b/net/core/xdp.c
index 4c7ea85486af..4bc3026ae218 100644
--- a/net/core/xdp.c
+++ b/net/core/xdp.c
@@ -11,6 +11,7 @@
 #include <linux/slab.h>
 #include <linux/idr.h>
 #include <linux/rhashtable.h>
+#include <linux/bug.h>
 #include <net/page_pool.h>
 
 #include <net/xdp.h>
@@ -496,3 +497,9 @@ struct xdp_frame *xdp_convert_zc_to_xdp_frame(struct xdp_buff *xdp)
 	return xdpf;
 }
 EXPORT_SYMBOL_GPL(xdp_convert_zc_to_xdp_frame);
+
+/* Used by XDP_WARN macro, to avoid inlining WARN() in fast-path */
+void xdp_warn(const char *msg, const char *func, const int line) {
+	WARN(1, "XDP_WARN: %s(line:%d): %s\n", func, line, msg);
+}
+EXPORT_SYMBOL_GPL(xdp_warn);




* [PATCH net-next 08/33] xdp: cpumap redirect use frame_sz and increase skb_tailroom
From: Jesper Dangaard Brouer @ 2020-04-22 16:08 UTC
  To: sameehj
  Cc: Jesper Dangaard Brouer, netdev, bpf, zorik, akiyano, gtzalik,
	Toke Høiland-Jørgensen, Daniel Borkmann,
	Alexei Starovoitov, John Fastabend, Alexander Duyck,
	Jeff Kirsher, David Ahern, Willem de Bruijn, Ilias Apalodimas,
	Lorenzo Bianconi, Saeed Mahameed, steffen.klassert

Knowing the memory size backing the packet/xdp_frame data area, and
knowing it already has reserved room for skb_shared_info, simplifies
using build_skb significantly.

With this change we no longer lie about the SKB truesize, but more
importantly a significantly larger skb_tailroom is now provided, e.g.
when drivers use a full PAGE_SIZE. This extra tailroom (in the linear
area) can be used by the network stack when coalescing SKBs (e.g. in
skb_try_coalesce, see TCP cases where tcp_queue_rcv() can 'eat' the
skb).
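
As a worked equation (my illustration, not from the patch), the
tailroom an SKB built via build_skb_around() ends up with is:

	tailroom = frame_sz
		 - SKB_DATA_ALIGN(sizeof(struct skb_shared_info))
		 - (hard_start_headroom + xdpf->len)

so a PAGE_SIZE frame_sz leaves everything not used by headroom,
packet data and skb_shared_info as usable tailroom.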

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
---
 kernel/bpf/cpumap.c |   21 +++------------------
 1 file changed, 3 insertions(+), 18 deletions(-)

diff --git a/kernel/bpf/cpumap.c b/kernel/bpf/cpumap.c
index 70f71b154fa5..9c777ac4d4bd 100644
--- a/kernel/bpf/cpumap.c
+++ b/kernel/bpf/cpumap.c
@@ -162,25 +162,10 @@ static struct sk_buff *cpu_map_build_skb(struct bpf_cpu_map_entry *rcpu,
 	/* Part of headroom was reserved to xdpf */
 	hard_start_headroom = sizeof(struct xdp_frame) +  xdpf->headroom;
 
-	/* build_skb need to place skb_shared_info after SKB end, and
-	 * also want to know the memory "truesize".  Thus, need to
-	 * know the memory frame size backing xdp_buff.
-	 *
-	 * XDP was designed to have PAGE_SIZE frames, but this
-	 * assumption is not longer true with ixgbe and i40e.  It
-	 * would be preferred to set frame_size to 2048 or 4096
-	 * depending on the driver.
-	 *   frame_size = 2048;
-	 *   frame_len  = frame_size - sizeof(*xdp_frame);
-	 *
-	 * Instead, with info avail, skb_shared_info in placed after
-	 * packet len.  This, unfortunately fakes the truesize.
-	 * Another disadvantage of this approach, the skb_shared_info
-	 * is not at a fixed memory location, with mixed length
-	 * packets, which is bad for cache-line hotness.
+	/* Memory size backing xdp_frame data already has reserved
+	 * room for build_skb to place skb_shared_info in tailroom.
 	 */
-	frame_size = SKB_DATA_ALIGN(xdpf->len + hard_start_headroom) +
-		SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
+	frame_size = xdpf->frame_sz;
 
 	pkt_data_start = xdpf->data - hard_start_headroom;
 	skb = build_skb_around(skb, pkt_data_start, frame_size);




* [PATCH net-next 09/33] veth: adjust hard_start offset on redirect XDP frames
From: Jesper Dangaard Brouer @ 2020-04-22 16:08 UTC
  To: sameehj
  Cc: Toshiaki Makita, Mao Wenan, Toshiaki Makita,
	Jesper Dangaard Brouer, netdev, bpf, zorik, akiyano, gtzalik,
	Toke Høiland-Jørgensen, Daniel Borkmann,
	Alexei Starovoitov, John Fastabend, Alexander Duyck,
	Jeff Kirsher, David Ahern, Willem de Bruijn, Ilias Apalodimas,
	Lorenzo Bianconi, Saeed Mahameed, steffen.klassert

When native XDP redirects into a veth device, the frame arrives in
the xdp_frame structure. It is then processed in veth_xdp_rcv_one(),
which can run a new XDP bpf_prog on the packet. Doing so requires
converting xdp_frame to xdp_buff, but the tricky part is that the
xdp_frame memory area is located in the top (data_hard_start) memory
area that xdp_buff will point into.

The current code tried to protect the xdp_frame area by assigning
xdp_buff.data_hard_start past this memory. This results in 32 bytes
less headroom to expand into via the BPF-helper bpf_xdp_adjust_head().

This protect step is actually not needed, because the BPF-helper
bpf_xdp_adjust_head() already reserves this area and doesn't allow
the BPF-prog to expand into it. Thus, it is safe to point
data_hard_start directly at the xdp_frame memory area.
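
A layout sketch of the buffer (my illustration):

	data_hard_start                               data
	|<- sizeof(struct xdp_frame) ->|<- headroom ->|<- frame->len ->|

bpf_xdp_adjust_head() refuses to move xdp->data below
data_hard_start + sizeof(struct xdp_frame), so the xdp_frame area
stays protected even without the extra offset.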

Cc: Toshiaki Makita <toshiaki.makita1@gmail.com>
Fixes: 9fc8d518d9d5 ("veth: Handle xdp_frames in xdp napi ring")
Reported-by: Mao Wenan <maowenan@huawei.com>
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: Toshiaki Makita <toshiaki.makita1@gmail.com>
---
 drivers/net/veth.c |    8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/net/veth.c b/drivers/net/veth.c
index aece0e5eec8c..d5691bb84448 100644
--- a/drivers/net/veth.c
+++ b/drivers/net/veth.c
@@ -564,13 +564,15 @@ static struct sk_buff *veth_xdp_rcv_one(struct veth_rq *rq,
 					struct veth_stats *stats)
 {
 	void *hard_start = frame->data - frame->headroom;
-	void *head = hard_start - sizeof(struct xdp_frame);
 	int len = frame->len, delta = 0;
 	struct xdp_frame orig_frame;
 	struct bpf_prog *xdp_prog;
 	unsigned int headroom;
 	struct sk_buff *skb;
 
+	/* bpf_xdp_adjust_head() assures BPF cannot access xdp_frame area */
+	hard_start -= sizeof(struct xdp_frame);
+
 	rcu_read_lock();
 	xdp_prog = rcu_dereference(rq->xdp_prog);
 	if (likely(xdp_prog)) {
@@ -592,7 +594,6 @@ static struct sk_buff *veth_xdp_rcv_one(struct veth_rq *rq,
 			break;
 		case XDP_TX:
 			orig_frame = *frame;
-			xdp.data_hard_start = head;
 			xdp.rxq->mem = frame->mem;
 			if (unlikely(veth_xdp_tx(rq, &xdp, bq) < 0)) {
 				trace_xdp_exception(rq->dev, xdp_prog, act);
@@ -605,7 +606,6 @@ static struct sk_buff *veth_xdp_rcv_one(struct veth_rq *rq,
 			goto xdp_xmit;
 		case XDP_REDIRECT:
 			orig_frame = *frame;
-			xdp.data_hard_start = head;
 			xdp.rxq->mem = frame->mem;
 			if (xdp_do_redirect(rq->dev, &xdp, xdp_prog)) {
 				frame = &orig_frame;
@@ -629,7 +629,7 @@ static struct sk_buff *veth_xdp_rcv_one(struct veth_rq *rq,
 	rcu_read_unlock();
 
 	headroom = sizeof(struct xdp_frame) + frame->headroom - delta;
-	skb = veth_build_skb(head, headroom, len, 0);
+	skb = veth_build_skb(hard_start, headroom, len, 0);
 	if (!skb) {
 		xdp_return_frame(frame);
 		stats->rx_drops++;




* [PATCH net-next 10/33] veth: xdp using frame_sz in veth driver
From: Jesper Dangaard Brouer @ 2020-04-22 16:08 UTC
  To: sameehj
  Cc: Toshiaki Makita, Lorenzo Bianconi, Jesper Dangaard Brouer,
	netdev, bpf, zorik, akiyano, gtzalik,
	Toke Høiland-Jørgensen, Daniel Borkmann,
	Alexei Starovoitov, John Fastabend, Alexander Duyck,
	Jeff Kirsher, David Ahern, Willem de Bruijn, Ilias Apalodimas,
	Lorenzo Bianconi, Saeed Mahameed, steffen.klassert

The veth driver can run XDP in "native" mode in its own NAPI
handler, and since commit 9fc8d518d9d5 ("veth: Handle xdp_frames in
xdp napi ring") packets can come in two forms, either xdp_frame or
skb, calling respectively veth_xdp_rcv_one() or veth_xdp_rcv_skb().

For packets to arrive in xdp_frame format, they will have been
redirected from an XDP native driver. In case of XDP_PASS or no
XDP-prog attached, the veth driver will allocate and create an SKB.

The current code in the veth_xdp_rcv_one() xdp_frame case had to
guess the frame truesize of the incoming xdp_frame when using
veth_build_skb(). With xdp_frame->frame_sz this is no longer
necessary.

Calculating frame_sz in the veth_xdp_rcv_skb() skb case is done
similarly to the XDP-generic handling code in net/core/dev.c.

Cc: Toshiaki Makita <toshiaki.makita1@gmail.com>
Reviewed-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
---
 drivers/net/veth.c |   22 +++++++++++++---------
 1 file changed, 13 insertions(+), 9 deletions(-)

diff --git a/drivers/net/veth.c b/drivers/net/veth.c
index d5691bb84448..b586d2fa5551 100644
--- a/drivers/net/veth.c
+++ b/drivers/net/veth.c
@@ -405,10 +405,6 @@ static struct sk_buff *veth_build_skb(void *head, int headroom, int len,
 {
 	struct sk_buff *skb;
 
-	if (!buflen) {
-		buflen = SKB_DATA_ALIGN(headroom + len) +
-			 SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
-	}
 	skb = build_skb(head, buflen);
 	if (!skb)
 		return NULL;
@@ -583,6 +579,7 @@ static struct sk_buff *veth_xdp_rcv_one(struct veth_rq *rq,
 		xdp.data = frame->data;
 		xdp.data_end = frame->data + frame->len;
 		xdp.data_meta = frame->data - frame->metasize;
+		xdp.frame_sz = frame->frame_sz;
 		xdp.rxq = &rq->xdp_rxq;
 
 		act = bpf_prog_run_xdp(xdp_prog, &xdp);
@@ -629,7 +626,7 @@ static struct sk_buff *veth_xdp_rcv_one(struct veth_rq *rq,
 	rcu_read_unlock();
 
 	headroom = sizeof(struct xdp_frame) + frame->headroom - delta;
-	skb = veth_build_skb(hard_start, headroom, len, 0);
+	skb = veth_build_skb(hard_start, headroom, len, frame->frame_sz);
 	if (!skb) {
 		xdp_return_frame(frame);
 		stats->rx_drops++;
@@ -695,9 +692,8 @@ static struct sk_buff *veth_xdp_rcv_skb(struct veth_rq *rq,
 			goto drop;
 		}
 
-		nskb = veth_build_skb(head,
-				      VETH_XDP_HEADROOM + mac_len, skb->len,
-				      PAGE_SIZE);
+		nskb = veth_build_skb(head, VETH_XDP_HEADROOM + mac_len,
+				      skb->len, PAGE_SIZE);
 		if (!nskb) {
 			page_frag_free(head);
 			goto drop;
@@ -715,6 +711,11 @@ static struct sk_buff *veth_xdp_rcv_skb(struct veth_rq *rq,
 	xdp.data_end = xdp.data + pktlen;
 	xdp.data_meta = xdp.data;
 	xdp.rxq = &rq->xdp_rxq;
+
+	/* SKB "head" area always has tailroom for skb_shared_info */
+	xdp.frame_sz = (void *)skb_end_pointer(skb) - xdp.data_hard_start;
+	xdp.frame_sz += SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
+
 	orig_data = xdp.data;
 	orig_data_end = xdp.data_end;
 
@@ -758,6 +759,7 @@ static struct sk_buff *veth_xdp_rcv_skb(struct veth_rq *rq,
 	}
 	rcu_read_unlock();
 
+	/* check if bpf_xdp_adjust_head was used */
 	delta = orig_data - xdp.data;
 	off = mac_len + delta;
 	if (off > 0)
@@ -765,9 +767,11 @@ static struct sk_buff *veth_xdp_rcv_skb(struct veth_rq *rq,
 	else if (off < 0)
 		__skb_pull(skb, -off);
 	skb->mac_header -= delta;
+
+	/* check if bpf_xdp_adjust_tail was used */
 	off = xdp.data_end - orig_data_end;
 	if (off != 0)
-		__skb_put(skb, off);
+		__skb_put(skb, off); /* positive on grow, negative on shrink */
 	skb->protocol = eth_type_trans(skb, rq->dev);
 
 	metalen = xdp.data - xdp.data_meta;




* [PATCH net-next 11/33] dpaa2-eth: add XDP frame size
From: Jesper Dangaard Brouer @ 2020-04-22 16:08 UTC
  To: sameehj
  Cc: Ioana Radulescu, Jesper Dangaard Brouer, netdev, bpf, zorik,
	akiyano, gtzalik, Toke Høiland-Jørgensen,
	Daniel Borkmann, Alexei Starovoitov, John Fastabend,
	Alexander Duyck, Jeff Kirsher, David Ahern, Willem de Bruijn,
	Ilias Apalodimas, Lorenzo Bianconi, Saeed Mahameed,
	steffen.klassert

The dpaa2-eth driver reserves some headroom used for hardware and
software annotation area in RX/TX buffers. Thus, xdp.data_hard_start
doesn't start at a page boundary.

When XDP is configured, the area reserved via dpaa2_fd_get_offset(fd)
is 448 bytes, of which XDP has reserved 256 bytes. As frame_sz is
calculated as an offset from xdp_buff.data_hard_start, it needs an
adjustment from the full PAGE_SIZE == DPAA2_ETH_RX_BUF_RAW_SIZE.
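
Using the numbers above (my arithmetic, assuming PAGE_SIZE == 4096
and XDP_PACKET_HEADROOM == 256):

	frame_sz = DPAA2_ETH_RX_BUF_RAW_SIZE
		 - (dpaa2_fd_get_offset(fd) - XDP_PACKET_HEADROOM)
		 = 4096 - (448 - 256) = 3904 bytes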

When doing XDP_REDIRECT, the driver doesn't need this reserved
headroom any longer and allows xdp_do_redirect() to use it. This is
an advantage for the driver's own ndo_xdp_xmit, as it uses part of
this headroom for itself. The patch also adjusts frame_sz in this
case.

The driver cannot support XDP data_meta, because it uses the headroom
just before xdp.data for struct dpaa2_eth_swa (DPAA2_ETH_SWA_SIZE=64)
when transmitting the packet. When transmitting an xdp_frame in
dpaa2_eth_xdp_xmit_frame() (called via ndo_xdp_xmit), it uses this
area to store a pointer to the xdp_frame and dma_size, which are used
in TX completion (free_tx_fd) to return the frame via
xdp_return_frame().

Cc: Ioana Radulescu <ruxandra.radulescu@nxp.com>
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
---
 drivers/net/ethernet/freescale/dpaa2/dpaa2-eth.c |    7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/drivers/net/ethernet/freescale/dpaa2/dpaa2-eth.c b/drivers/net/ethernet/freescale/dpaa2/dpaa2-eth.c
index b6c46639aa4c..b5c0225942b5 100644
--- a/drivers/net/ethernet/freescale/dpaa2/dpaa2-eth.c
+++ b/drivers/net/ethernet/freescale/dpaa2/dpaa2-eth.c
@@ -302,6 +302,9 @@ static u32 run_xdp(struct dpaa2_eth_priv *priv,
 	xdp_set_data_meta_invalid(&xdp);
 	xdp.rxq = &ch->xdp_rxq;
 
+	xdp.frame_sz = DPAA2_ETH_RX_BUF_RAW_SIZE -
+		(dpaa2_fd_get_offset(fd) - XDP_PACKET_HEADROOM);
+
 	xdp_act = bpf_prog_run_xdp(xdp_prog, &xdp);
 
 	/* xdp.data pointer may have changed */
@@ -337,7 +340,11 @@ static u32 run_xdp(struct dpaa2_eth_priv *priv,
 		dma_unmap_page(priv->net_dev->dev.parent, addr,
 			       DPAA2_ETH_RX_BUF_SIZE, DMA_BIDIRECTIONAL);
 		ch->buf_count--;
+
+		/* Allow redirect use of full headroom */
 		xdp.data_hard_start = vaddr;
+		xdp.frame_sz = DPAA2_ETH_RX_BUF_RAW_SIZE;
+
 		err = xdp_do_redirect(priv->net_dev, &xdp, xdp_prog);
 		if (unlikely(err))
 			ch->stats.xdp_drop++;




* [PATCH net-next 12/33] hv_netvsc: add XDP frame size to driver
From: Jesper Dangaard Brouer @ 2020-04-22 16:08 UTC
  To: sameehj
  Cc: Wei Liu, K. Y. Srinivasan, Haiyang Zhang, Stephen Hemminger,
	Jesper Dangaard Brouer, netdev, bpf, zorik, akiyano, gtzalik,
	Toke Høiland-Jørgensen, Daniel Borkmann,
	Alexei Starovoitov, John Fastabend, Alexander Duyck,
	Jeff Kirsher, David Ahern, Willem de Bruijn, Ilias Apalodimas,
	Lorenzo Bianconi, Saeed Mahameed, steffen.klassert

The hyperv NIC driver's XDP implementation is rather disappointing,
as enabling XDP on this driver will be a slowdown, given it allocates
a new page for each packet and copies over the payload before
invoking the XDP BPF-prog.

The positive thing is that it is easy to determine xdp.frame_sz.

The XDP implementation for hv_netvsc transparently passes xdp_prog
to the associated VF NIC. Many of the Azure VMs are using SRIOV, so
the majority of the data is actually processed directly on the VF
driver's XDP path. So the overhead of the synthetic data path
(hv_netvsc) is minimal.

When XDP is enabled on this driver, XDP_PASS and XDP_TX will create
the SKB via build_skb (based on the newly allocated page). Now, using
the XDP frame_sz, this will provide more skb_tailroom, which the
netstack can use for SKB coalescing (e.g. tcp_try_coalesce ->
skb_try_coalesce).

Cc: Wei Liu <wei.liu@kernel.org>
Cc: "K. Y. Srinivasan" <kys@microsoft.com>
Cc: Haiyang Zhang <haiyangz@microsoft.com>
Cc: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
---
 drivers/net/hyperv/netvsc_bpf.c |    1 +
 drivers/net/hyperv/netvsc_drv.c |    2 +-
 2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/net/hyperv/netvsc_bpf.c b/drivers/net/hyperv/netvsc_bpf.c
index b86611041db6..1e0c024b0a93 100644
--- a/drivers/net/hyperv/netvsc_bpf.c
+++ b/drivers/net/hyperv/netvsc_bpf.c
@@ -49,6 +49,7 @@ u32 netvsc_run_xdp(struct net_device *ndev, struct netvsc_channel *nvchan,
 	xdp_set_data_meta_invalid(xdp);
 	xdp->data_end = xdp->data + len;
 	xdp->rxq = &nvchan->xdp_rxq;
+	xdp->frame_sz = PAGE_SIZE;
 	xdp->handle = 0;
 
 	memcpy(xdp->data, data, len);
diff --git a/drivers/net/hyperv/netvsc_drv.c b/drivers/net/hyperv/netvsc_drv.c
index d8e86bdbfba1..651344fea0a5 100644
--- a/drivers/net/hyperv/netvsc_drv.c
+++ b/drivers/net/hyperv/netvsc_drv.c
@@ -794,7 +794,7 @@ static struct sk_buff *netvsc_alloc_recv_skb(struct net_device *net,
 	if (xbuf) {
 		unsigned int hdroom = xdp->data - xdp->data_hard_start;
 		unsigned int xlen = xdp->data_end - xdp->data;
-		unsigned int frag_size = netvsc_xdp_fraglen(hdroom + xlen);
+		unsigned int frag_size = xdp->frame_sz;
 
 		skb = build_skb(xbuf, frag_size);
 




* [PATCH net-next 13/33] qlogic/qede: add XDP frame size to driver
From: Jesper Dangaard Brouer @ 2020-04-22 16:08 UTC
  To: sameehj
  Cc: Ariel Elior, GR-everest-linux-l2, Jesper Dangaard Brouer, netdev,
	bpf, zorik, akiyano, gtzalik, Toke Høiland-Jørgensen,
	Daniel Borkmann, Alexei Starovoitov, John Fastabend,
	Alexander Duyck, Jeff Kirsher, David Ahern, Willem de Bruijn,
	Ilias Apalodimas, Lorenzo Bianconi, Saeed Mahameed,
	steffen.klassert

The qede driver uses a full page when XDP is enabled. The driver's
value in rx_buf_seg_size (struct qede_rx_queue) will be PAGE_SIZE
when an XDP bpf_prog is attached.

Cc: Ariel Elior <aelior@marvell.com>
Cc: GR-everest-linux-l2@marvell.com
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
---
 drivers/net/ethernet/qlogic/qede/qede_fp.c   |    1 +
 drivers/net/ethernet/qlogic/qede/qede_main.c |    2 +-
 2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/qlogic/qede/qede_fp.c b/drivers/net/ethernet/qlogic/qede/qede_fp.c
index c6c20776b474..7598ebe0962a 100644
--- a/drivers/net/ethernet/qlogic/qede/qede_fp.c
+++ b/drivers/net/ethernet/qlogic/qede/qede_fp.c
@@ -1066,6 +1066,7 @@ static bool qede_rx_xdp(struct qede_dev *edev,
 	xdp_set_data_meta_invalid(&xdp);
 	xdp.data_end = xdp.data + *len;
 	xdp.rxq = &rxq->xdp_rxq;
+	xdp.frame_sz = rxq->rx_buf_seg_size; /* PAGE_SIZE when XDP enabled */
 
 	/* Queues always have a full reset currently, so for the time
 	 * being until there's atomic program replace just mark read
diff --git a/drivers/net/ethernet/qlogic/qede/qede_main.c b/drivers/net/ethernet/qlogic/qede/qede_main.c
index 34fa3917eb33..39b404e8088f 100644
--- a/drivers/net/ethernet/qlogic/qede/qede_main.c
+++ b/drivers/net/ethernet/qlogic/qede/qede_main.c
@@ -1398,7 +1398,7 @@ static int qede_alloc_mem_rxq(struct qede_dev *edev, struct qede_rx_queue *rxq)
 	if (rxq->rx_buf_size + size > PAGE_SIZE)
 		rxq->rx_buf_size = PAGE_SIZE - size;
 
-	/* Segment size to spilt a page in multiple equal parts ,
+	/* Segment size to split a page in multiple equal parts,
 	 * unless XDP is used in which case we'd use the entire page.
 	 */
 	if (!edev->xdp_prog) {




* [PATCH net-next 14/33] net: ethernet: ti: add XDP frame size to driver cpsw
From: Jesper Dangaard Brouer @ 2020-04-22 16:08 UTC
  To: sameehj
  Cc: Grygorii Strashko, Ilias Apalodimas, Jesper Dangaard Brouer,
	netdev, bpf, zorik, akiyano, gtzalik,
	Toke Høiland-Jørgensen, Daniel Borkmann,
	Alexei Starovoitov, John Fastabend, Alexander Duyck,
	Jeff Kirsher, David Ahern, Willem de Bruijn, Ilias Apalodimas,
	Lorenzo Bianconi, Saeed Mahameed, steffen.klassert

The driver code cpsw.c and cpsw_new.c both use page_pool with
default order-0 pages for their RX pages.

Cc: Grygorii Strashko <grygorii.strashko@ti.com>
Cc: Ilias Apalodimas <ilias.apalodimas@linaro.org>
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
---
 drivers/net/ethernet/ti/cpsw.c     |    1 +
 drivers/net/ethernet/ti/cpsw_new.c |    1 +
 2 files changed, 2 insertions(+)

diff --git a/drivers/net/ethernet/ti/cpsw.c b/drivers/net/ethernet/ti/cpsw.c
index c2c5bf87da01..58e346ea9898 100644
--- a/drivers/net/ethernet/ti/cpsw.c
+++ b/drivers/net/ethernet/ti/cpsw.c
@@ -406,6 +406,7 @@ static void cpsw_rx_handler(void *token, int len, int status)
 
 		xdp.data_hard_start = pa;
 		xdp.rxq = &priv->xdp_rxq[ch];
+		xdp.frame_sz = PAGE_SIZE;
 
 		port = priv->emac_port + cpsw->data.dual_emac;
 		ret = cpsw_run_xdp(priv, ch, &xdp, page, port);
diff --git a/drivers/net/ethernet/ti/cpsw_new.c b/drivers/net/ethernet/ti/cpsw_new.c
index 9209e613257d..08e1c5b8f00e 100644
--- a/drivers/net/ethernet/ti/cpsw_new.c
+++ b/drivers/net/ethernet/ti/cpsw_new.c
@@ -348,6 +348,7 @@ static void cpsw_rx_handler(void *token, int len, int status)
 
 		xdp.data_hard_start = pa;
 		xdp.rxq = &priv->xdp_rxq[ch];
+		xdp.frame_sz = PAGE_SIZE;
 
 		ret = cpsw_run_xdp(priv, ch, &xdp, page, priv->emac_port);
 		if (ret != CPSW_XDP_PASS)




* [PATCH net-next 15/33] ena: add XDP frame size to amazon NIC driver
From: Jesper Dangaard Brouer @ 2020-04-22 16:08 UTC
  To: sameehj
  Cc: Arthur Kiyanovski, Jesper Dangaard Brouer, netdev, bpf, zorik,
	akiyano, gtzalik, Toke Høiland-Jørgensen,
	Daniel Borkmann, Alexei Starovoitov, John Fastabend,
	Alexander Duyck, Jeff Kirsher, David Ahern, Willem de Bruijn,
	Ilias Apalodimas, Lorenzo Bianconi, Saeed Mahameed,
	steffen.klassert

The frame size ENA_PAGE_SIZE is limited to 16K on systems with a
PAGE_SIZE larger than 16K. Change ENA_XDP_MAX_MTU to also take the
reserved tailroom into account.
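
As a worked example (my arithmetic, assuming a 4K ENA_PAGE_SIZE,
XDP_PACKET_HEADROOM == 256 and a 320-byte aligned skb_shared_info
on 64-bit):

	ENA_XDP_MAX_MTU = 4096 - 14 (ETH_HLEN) - 4 (ETH_FCS_LEN)
			- 4 (VLAN_HLEN) - 256 - 320 = 3498 bytes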

Cc: Arthur Kiyanovski <akiyano@amazon.com>
Acked-by: Sameeh Jubran <sameehj@amazon.com>
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
---
 drivers/net/ethernet/amazon/ena/ena_netdev.c |    1 +
 drivers/net/ethernet/amazon/ena/ena_netdev.h |    5 +++--
 2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/amazon/ena/ena_netdev.c b/drivers/net/ethernet/amazon/ena/ena_netdev.c
index 2cc765df8da3..0fd7db1769f8 100644
--- a/drivers/net/ethernet/amazon/ena/ena_netdev.c
+++ b/drivers/net/ethernet/amazon/ena/ena_netdev.c
@@ -1606,6 +1606,7 @@ static int ena_clean_rx_irq(struct ena_ring *rx_ring, struct napi_struct *napi,
 		  "%s qid %d\n", __func__, rx_ring->qid);
 	res_budget = budget;
 	xdp.rxq = &rx_ring->xdp_rxq;
+	xdp.frame_sz = ENA_PAGE_SIZE;
 
 	do {
 		xdp_verdict = XDP_PASS;
diff --git a/drivers/net/ethernet/amazon/ena/ena_netdev.h b/drivers/net/ethernet/amazon/ena/ena_netdev.h
index 97dfd0c67e84..dd00127dfe9f 100644
--- a/drivers/net/ethernet/amazon/ena/ena_netdev.h
+++ b/drivers/net/ethernet/amazon/ena/ena_netdev.h
@@ -151,8 +151,9 @@
  * The buffer size we share with the device is defined to be ENA_PAGE_SIZE
  */
 
-#define ENA_XDP_MAX_MTU (ENA_PAGE_SIZE - ETH_HLEN - ETH_FCS_LEN - \
-				VLAN_HLEN - XDP_PACKET_HEADROOM)
+#define ENA_XDP_MAX_MTU (ENA_PAGE_SIZE - ETH_HLEN - ETH_FCS_LEN -	\
+			 VLAN_HLEN - XDP_PACKET_HEADROOM -		\
+			 SKB_DATA_ALIGN(sizeof(struct skb_shared_info)))
 
 #define ENA_IS_XDP_INDEX(adapter, index) (((index) >= (adapter)->xdp_first_ring) && \
 	((index) < (adapter)->xdp_first_ring + (adapter)->xdp_num_queues))




* [PATCH net-next 16/33] mlx4: add XDP frame size and adjust max XDP MTU
From: Jesper Dangaard Brouer @ 2020-04-22 16:08 UTC
  To: sameehj
  Cc: Tariq Toukan, Saeed Mahameed, Tariq Toukan,
	Jesper Dangaard Brouer, netdev, bpf, zorik, akiyano, gtzalik,
	Toke Høiland-Jørgensen, Daniel Borkmann,
	Alexei Starovoitov, John Fastabend, Alexander Duyck,
	Jeff Kirsher, David Ahern, Willem de Bruijn, Ilias Apalodimas,
	Lorenzo Bianconi, Saeed Mahameed, steffen.klassert

The mlx4 driver's size of the memory backing the RX packet is stored
in frag_stride. For XDP mode this will be PAGE_SIZE (normally 4096).
For normal mode frag_stride is 2048.

Also adjust MLX4_EN_MAX_XDP_MTU to take the tailroom into account.

Cc: Tariq Toukan <tariqt@mellanox.com>
Cc: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx4/en_netdev.c |    3 ++-
 drivers/net/ethernet/mellanox/mlx4/en_rx.c     |    1 +
 2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
index 43dcbd8214c6..5bd3cd37d50f 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
@@ -51,7 +51,8 @@
 #include "en_port.h"
 
 #define MLX4_EN_MAX_XDP_MTU ((int)(PAGE_SIZE - ETH_HLEN - (2 * VLAN_HLEN) - \
-				   XDP_PACKET_HEADROOM))
+				XDP_PACKET_HEADROOM -			    \
+				SKB_DATA_ALIGN(sizeof(struct skb_shared_info))))
 
 int mlx4_en_setup_tc(struct net_device *dev, u8 up)
 {
diff --git a/drivers/net/ethernet/mellanox/mlx4/en_rx.c b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
index db3552f2d087..231f08c0276c 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_rx.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
@@ -683,6 +683,7 @@ int mlx4_en_process_rx_cq(struct net_device *dev, struct mlx4_en_cq *cq, int bud
 	rcu_read_lock();
 	xdp_prog = rcu_dereference(ring->xdp_prog);
 	xdp.rxq = &ring->xdp_rxq;
+	xdp.frame_sz = priv->frag_info[0].frag_stride;
 	doorbell_pending = 0;
 
 	/* We assume a 1:1 mapping between CQEs and Rx descriptors, so Rx




* [PATCH net-next 17/33] net: thunderx: add XDP frame size
From: Jesper Dangaard Brouer @ 2020-04-22 16:08 UTC
  To: sameehj
  Cc: Sunil Goutham, Robert Richter, Jesper Dangaard Brouer, netdev,
	bpf, zorik, akiyano, gtzalik, Toke Høiland-Jørgensen,
	Daniel Borkmann, Alexei Starovoitov, John Fastabend,
	Alexander Duyck, Jeff Kirsher, David Ahern, Willem de Bruijn,
	Ilias Apalodimas, Lorenzo Bianconi, Saeed Mahameed,
	steffen.klassert

To help reviewers, these are the defines related to RCV_FRAG_LEN:

 #define DMA_BUFFER_LEN	1536 /* In multiples of 128bytes */
 #define RCV_FRAG_LEN	(SKB_DATA_ALIGN(DMA_BUFFER_LEN + NET_SKB_PAD) + \
			 SKB_DATA_ALIGN(sizeof(struct skb_shared_info)))
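
Plugging in typical values (my arithmetic, assuming NET_SKB_PAD == 64,
SMP_CACHE_BYTES == 64 and a 320-byte aligned skb_shared_info on
64-bit):

	RCV_FRAG_LEN = SKB_DATA_ALIGN(1536 + 64) + 320 = 1600 + 320 = 1920
	xdp.frame_sz = RCV_FRAG_LEN + XDP_PACKET_HEADROOM = 1920 + 256 = 2176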

Cc: Sunil Goutham <sgoutham@marvell.com>
Cc: Robert Richter <rrichter@marvell.com>
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
---
 drivers/net/ethernet/cavium/thunder/nicvf_main.c |    1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/net/ethernet/cavium/thunder/nicvf_main.c b/drivers/net/ethernet/cavium/thunder/nicvf_main.c
index b4b33368698f..2ba0ce115e63 100644
--- a/drivers/net/ethernet/cavium/thunder/nicvf_main.c
+++ b/drivers/net/ethernet/cavium/thunder/nicvf_main.c
@@ -552,6 +552,7 @@ static inline bool nicvf_xdp_rx(struct nicvf *nic, struct bpf_prog *prog,
 	xdp_set_data_meta_invalid(&xdp);
 	xdp.data_end = xdp.data + len;
 	xdp.rxq = &rq->xdp_rxq;
+	xdp.frame_sz = RCV_FRAG_LEN + XDP_PACKET_HEADROOM;
 	orig_data = xdp.data;
 
 	rcu_read_lock();




* [PATCH net-next 18/33] nfp: add XDP frame size to netronome driver
From: Jesper Dangaard Brouer @ 2020-04-22 16:08 UTC
  To: sameehj
  Cc: Jakub Kicinski, Jesper Dangaard Brouer, netdev, bpf, zorik,
	akiyano, gtzalik, Toke Høiland-Jørgensen,
	Daniel Borkmann, Alexei Starovoitov, John Fastabend,
	Alexander Duyck, Jeff Kirsher, David Ahern, Willem de Bruijn,
	Ilias Apalodimas, Lorenzo Bianconi, Saeed Mahameed,
	steffen.klassert

The netronome nfp driver uses PAGE_SIZE when xdp_prog is set, but
xdp.data_hard_start begins at offset NFP_NET_RX_BUF_HEADROOM.
Thus, adjust for this when setting xdp.frame_sz, as it counts
from data_hard_start.

When doing XDP_TX this driver is smart and, instead of a full
DMA-map, does a DMA-sync with the packet length. As xdp_adjust_tail
can now grow the packet length, add checks to make sure that the
grow size is within the DMA-mapped size.

Cc: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
---
 .../net/ethernet/netronome/nfp/nfp_net_common.c    |    6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/drivers/net/ethernet/netronome/nfp/nfp_net_common.c b/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
index 9bfb3b077bc1..0e0cc3d58bdc 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
+++ b/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
@@ -1741,10 +1741,15 @@ nfp_net_tx_xdp_buf(struct nfp_net_dp *dp, struct nfp_net_rx_ring *rx_ring,
 		   struct nfp_net_rx_buf *rxbuf, unsigned int dma_off,
 		   unsigned int pkt_len, bool *completed)
 {
+	unsigned int dma_map_sz = dp->fl_bufsz - NFP_NET_RX_BUF_NON_DATA;
 	struct nfp_net_tx_buf *txbuf;
 	struct nfp_net_tx_desc *txd;
 	int wr_idx;
 
+	/* Reject if xdp_adjust_tail grow packet beyond DMA area */
+	if (pkt_len + dma_off > dma_map_sz)
+		return false;
+
 	if (unlikely(nfp_net_tx_full(tx_ring, 1))) {
 		if (!*completed) {
 			nfp_net_xdp_complete(tx_ring);
@@ -1817,6 +1822,7 @@ static int nfp_net_rx(struct nfp_net_rx_ring *rx_ring, int budget)
 	rcu_read_lock();
 	xdp_prog = READ_ONCE(dp->xdp_prog);
 	true_bufsz = xdp_prog ? PAGE_SIZE : dp->fl_bufsz;
+	xdp.frame_sz = PAGE_SIZE - NFP_NET_RX_BUF_HEADROOM;
 	xdp.rxq = &rx_ring->xdp_rxq;
 	tx_ring = r_vec->xdp_ring;
 



^ permalink raw reply related	[flat|nested] 66+ messages in thread

* [PATCH net-next 19/33] tun: add XDP frame size
       [not found] <158757160439.1370371.13213378122947426220.stgit@firesoul>
                   ` (17 preceding siblings ...)
  2020-04-22 16:08 ` [PATCH net-next 18/33] nfp: add XDP frame size to netronome driver Jesper Dangaard Brouer
@ 2020-04-22 16:08 ` Jesper Dangaard Brouer
  2020-04-27  5:51   ` Jason Wang
  2020-05-06 20:30   ` Michael S. Tsirkin
  2020-04-22 16:09 ` [PATCH net-next 20/33] vhost_net: also populate " Jesper Dangaard Brouer
                   ` (13 subsequent siblings)
  32 siblings, 2 replies; 66+ messages in thread
From: Jesper Dangaard Brouer @ 2020-04-22 16:08 UTC (permalink / raw)
  To: sameehj
  Cc: Jason Wang, Jesper Dangaard Brouer, netdev, bpf, zorik, akiyano,
	gtzalik, Toke Høiland-Jørgensen, Daniel Borkmann,
	Alexei Starovoitov, John Fastabend, Alexander Duyck,
	Jeff Kirsher, David Ahern, Willem de Bruijn, Ilias Apalodimas,
	Lorenzo Bianconi, Saeed Mahameed, steffen.klassert

The tun driver has two code paths for running XDP (bpf_prog_run_xdp).
In both cases 'buflen' contains enough tailroom for skb_shared_info.
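
For reference, tun_build_skb() derives 'buflen' along these lines
(sketch based on the driver code; details may differ):

 buflen = SKB_DATA_ALIGN(len + pad) +
	  SKB_DATA_ALIGN(sizeof(struct skb_shared_info));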

Cc: Jason Wang <jasowang@redhat.com>
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
---
 drivers/net/tun.c |    2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index 44889eba1dbc..c54f967e2c66 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -1671,6 +1671,7 @@ static struct sk_buff *tun_build_skb(struct tun_struct *tun,
 		xdp_set_data_meta_invalid(&xdp);
 		xdp.data_end = xdp.data + len;
 		xdp.rxq = &tfile->xdp_rxq;
+		xdp.frame_sz = buflen;
 
 		act = bpf_prog_run_xdp(xdp_prog, &xdp);
 		if (act == XDP_REDIRECT || act == XDP_TX) {
@@ -2411,6 +2412,7 @@ static int tun_xdp_one(struct tun_struct *tun,
 		}
 		xdp_set_data_meta_invalid(xdp);
 		xdp->rxq = &tfile->xdp_rxq;
+		xdp->frame_sz = buflen;
 
 		act = bpf_prog_run_xdp(xdp_prog, xdp);
 		err = tun_xdp_act(tun, xdp_prog, xdp, act);



^ permalink raw reply related	[flat|nested] 66+ messages in thread

* [PATCH net-next 20/33] vhost_net: also populate XDP frame size
       [not found] <158757160439.1370371.13213378122947426220.stgit@firesoul>
                   ` (18 preceding siblings ...)
  2020-04-22 16:08 ` [PATCH net-next 19/33] tun: add XDP frame size Jesper Dangaard Brouer
@ 2020-04-22 16:09 ` Jesper Dangaard Brouer
  2020-04-27  5:50   ` Jason Wang
  2020-04-22 16:09 ` [PATCH net-next 21/33] virtio_net: add XDP frame size in two code paths Jesper Dangaard Brouer
                   ` (12 subsequent siblings)
  32 siblings, 1 reply; 66+ messages in thread
From: Jesper Dangaard Brouer @ 2020-04-22 16:09 UTC (permalink / raw)
  To: sameehj
  Cc: Jason Wang, Jesper Dangaard Brouer, netdev, bpf, zorik, akiyano,
	gtzalik, Toke Høiland-Jørgensen, Daniel Borkmann,
	Alexei Starovoitov, John Fastabend, Alexander Duyck,
	Jeff Kirsher, David Ahern, Willem de Bruijn, Ilias Apalodimas,
	Lorenzo Bianconi, Saeed Mahameed, steffen.klassert

In vhost_net_build_xdp() the 'buf' that gets queued via an xdp_buff
has an embedded struct tun_xdp_hdr (located at xdp->data_hard_start)
which contains the buffer length 'buflen' (with tailroom for
skb_shared_info). Storing this buflen in xdp->frame_sz as well does
not obsolete struct tun_xdp_hdr, as the header also contains a struct
virtio_net_hdr with other information.
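
For reference, the embedded header looks like this (from
include/linux/if_tun.h at the time of writing):

 struct tun_xdp_hdr {
	int buflen;
	struct virtio_net_hdr gso;
 };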

Cc: Jason Wang <jasowang@redhat.com>
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
---
 drivers/vhost/net.c |    1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index 87469d67ede8..69af007e22f4 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -745,6 +745,7 @@ static int vhost_net_build_xdp(struct vhost_net_virtqueue *nvq,
 	xdp->data = buf + pad;
 	xdp->data_end = xdp->data + len;
 	hdr->buflen = buflen;
+	xdp->frame_sz = buflen;
 
 	--net->refcnt_bias;
 	alloc_frag->offset += buflen;



^ permalink raw reply related	[flat|nested] 66+ messages in thread

* [PATCH net-next 21/33] virtio_net: add XDP frame size in two code paths
       [not found] <158757160439.1370371.13213378122947426220.stgit@firesoul>
                   ` (19 preceding siblings ...)
  2020-04-22 16:09 ` [PATCH net-next 20/33] vhost_net: also populate " Jesper Dangaard Brouer
@ 2020-04-22 16:09 ` Jesper Dangaard Brouer
  2020-04-27  7:21   ` Jason Wang
  2020-04-22 16:09 ` [PATCH net-next 22/33] ixgbe: fix XDP redirect on archs with PAGE_SIZE above 4K Jesper Dangaard Brouer
                   ` (11 subsequent siblings)
  32 siblings, 1 reply; 66+ messages in thread
From: Jesper Dangaard Brouer @ 2020-04-22 16:09 UTC (permalink / raw)
  To: sameehj
  Cc: Jason Wang, Jesper Dangaard Brouer, netdev, bpf, zorik, akiyano,
	gtzalik, Toke Høiland-Jørgensen, Daniel Borkmann,
	Alexei Starovoitov, John Fastabend, Alexander Duyck,
	Jeff Kirsher, David Ahern, Willem de Bruijn, Ilias Apalodimas,
	Lorenzo Bianconi, Saeed Mahameed, steffen.klassert

The virtio_net driver is running inside the guest-OS. There are two
XDP receive code-paths in virtio_net, namely receive_small() and
receive_mergeable(). The receive_big() function does not support XDP.

In receive_small() the frame size is available in buflen. The buffers
backing these frames are allocated in add_recvbuf_small() with the
same size, except for the headroom, but the tailroom has reserved room
for skb_shared_info. The headroom is encoded as a value in the ctx
pointer.

In receive_mergeable() the frame size is more dynamic. There are two
basic cases: (1) the buffer size is based on an exponentially weighted
moving average (see DECLARE_EWMA) of the packet length. Or (2) in case
virtnet_get_headroom() has any headroom, the buffer size is
PAGE_SIZE. The ctx pointer is this time used for encoding two values:
the buffer length "truesize" and the headroom. In case (1), if the rx
buffer size is underestimated, the packet will have been split over
more buffers (num_buf info in the virtio_net_hdr_mrg_rxbuf placed at
the top of the buffer area). If that happens the XDP path does an
xdp_linearize_page operation.
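
Distilled from the diff below, the frame_sz selection in
receive_mergeable() boils down to (sketch):

 frame_sz = headroom ? PAGE_SIZE : truesize;
 if (unlikely(num_buf > 1)) {
	xdp_page = xdp_linearize_page(rq, &num_buf, page, offset,
				      VIRTIO_XDP_HEADROOM, &len);
	frame_sz = PAGE_SIZE;	/* linearized copy uses a full page */
 }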

Cc: Jason Wang <jasowang@redhat.com>
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
---
 drivers/net/virtio_net.c |   15 ++++++++++++---
 1 file changed, 12 insertions(+), 3 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 11f722460513..1df3676da185 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -689,6 +689,7 @@ static struct sk_buff *receive_small(struct net_device *dev,
 		xdp.data_end = xdp.data + len;
 		xdp.data_meta = xdp.data;
 		xdp.rxq = &rq->xdp_rxq;
+		xdp.frame_sz = buflen;
 		orig_data = xdp.data;
 		act = bpf_prog_run_xdp(xdp_prog, &xdp);
 		stats->xdp_packets++;
@@ -797,10 +798,11 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
 	int offset = buf - page_address(page);
 	struct sk_buff *head_skb, *curr_skb;
 	struct bpf_prog *xdp_prog;
-	unsigned int truesize;
+	unsigned int truesize = mergeable_ctx_to_truesize(ctx);
 	unsigned int headroom = mergeable_ctx_to_headroom(ctx);
-	int err;
 	unsigned int metasize = 0;
+	unsigned int frame_sz;
+	int err;
 
 	head_skb = NULL;
 	stats->bytes += len - vi->hdr_len;
@@ -821,6 +823,11 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
 		if (unlikely(hdr->hdr.gso_type))
 			goto err_xdp;
 
+		/* Buffers with headroom use PAGE_SIZE as alloc size,
+		 * see add_recvbuf_mergeable() + get_mergeable_buf_len()
+		 */
+		frame_sz = headroom ? PAGE_SIZE : truesize;
+
 		/* This happens when rx buffer size is underestimated
 		 * or headroom is not enough because of the buffer
 		 * was refilled before XDP is set. This should only
@@ -834,6 +841,8 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
 						      page, offset,
 						      VIRTIO_XDP_HEADROOM,
 						      &len);
+			frame_sz = PAGE_SIZE;
+
 			if (!xdp_page)
 				goto err_xdp;
 			offset = VIRTIO_XDP_HEADROOM;
@@ -850,6 +859,7 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
 		xdp.data_end = xdp.data + (len - vi->hdr_len);
 		xdp.data_meta = xdp.data;
 		xdp.rxq = &rq->xdp_rxq;
+		xdp.frame_sz = frame_sz;
 
 		act = bpf_prog_run_xdp(xdp_prog, &xdp);
 		stats->xdp_packets++;
@@ -924,7 +934,6 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
 	}
 	rcu_read_unlock();
 
-	truesize = mergeable_ctx_to_truesize(ctx);
 	if (unlikely(len > truesize)) {
 		pr_debug("%s: rx error: len %u exceeds truesize %lu\n",
 			 dev->name, len, (unsigned long)ctx);



^ permalink raw reply related	[flat|nested] 66+ messages in thread

* [PATCH net-next 22/33] ixgbe: fix XDP redirect on archs with PAGE_SIZE above 4K
       [not found] <158757160439.1370371.13213378122947426220.stgit@firesoul>
                   ` (20 preceding siblings ...)
  2020-04-22 16:09 ` [PATCH net-next 21/33] virtio_net: add XDP frame size in two code paths Jesper Dangaard Brouer
@ 2020-04-22 16:09 ` Jesper Dangaard Brouer
  2020-04-22 16:09 ` [PATCH net-next 23/33] ixgbe: add XDP frame size to driver Jesper Dangaard Brouer
                   ` (10 subsequent siblings)
  32 siblings, 0 replies; 66+ messages in thread
From: Jesper Dangaard Brouer @ 2020-04-22 16:09 UTC (permalink / raw)
  To: sameehj
  Cc: Jeff Kirsher, Jesper Dangaard Brouer, netdev, bpf, zorik,
	akiyano, gtzalik, Toke Høiland-Jørgensen,
	Daniel Borkmann, Alexei Starovoitov, John Fastabend,
	Alexander Duyck, Jeff Kirsher, David Ahern, Willem de Bruijn,
	Ilias Apalodimas, Lorenzo Bianconi, Saeed Mahameed,
	steffen.klassert

The ixgbe driver has another memory model when compiled on archs with
PAGE_SIZE above 4096 bytes. In this mode it doesn't split the page in
two halves, but instead increments rx_buffer->page_offset by the
truesize of the packet (which includes headroom and tailroom for
skb_shared_info).

This is done correctly in ixgbe_build_skb(), but ixgbe_rx_buffer_flip(),
which is currently only called on XDP_TX and XDP_REDIRECT, forgets to
add the tailroom for skb_shared_info. This breaks XDP_REDIRECT for
veth and cpumap.  Fix by adding the size of the skb_shared_info
tailroom.
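
In formula form, the change is (sketch):

 /* Before -- tailroom for skb_shared_info missing: */
 truesize = SKB_DATA_ALIGN(IXGBE_SKB_PAD + size);
 /* After -- matches what ixgbe_build_skb() consumes: */
 truesize = SKB_DATA_ALIGN(IXGBE_SKB_PAD + size) +
	    SKB_DATA_ALIGN(sizeof(struct skb_shared_info));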

Maintainers notice: this fix has been queued to Jeff.

Fixes: 6453073987ba ("ixgbe: add initial support for xdp redirect")
Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
---
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c |    3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index 718931d951bc..ea6834bae04c 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -2254,7 +2254,8 @@ static void ixgbe_rx_buffer_flip(struct ixgbe_ring *rx_ring,
 	rx_buffer->page_offset ^= truesize;
 #else
 	unsigned int truesize = ring_uses_build_skb(rx_ring) ?
-				SKB_DATA_ALIGN(IXGBE_SKB_PAD + size) :
+				SKB_DATA_ALIGN(IXGBE_SKB_PAD + size) +
+				SKB_DATA_ALIGN(sizeof(struct skb_shared_info)) :
 				SKB_DATA_ALIGN(size);
 
 	rx_buffer->page_offset += truesize;



^ permalink raw reply related	[flat|nested] 66+ messages in thread

* [PATCH net-next 23/33] ixgbe: add XDP frame size to driver
       [not found] <158757160439.1370371.13213378122947426220.stgit@firesoul>
                   ` (21 preceding siblings ...)
  2020-04-22 16:09 ` [PATCH net-next 22/33] ixgbe: fix XDP redirect on archs with PAGE_SIZE above 4K Jesper Dangaard Brouer
@ 2020-04-22 16:09 ` Jesper Dangaard Brouer
  2020-04-27 19:51   ` Daniel Borkmann
  2020-04-22 16:09 ` [PATCH net-next 24/33] ixgbevf: add XDP frame size to VF driver Jesper Dangaard Brouer
                   ` (9 subsequent siblings)
  32 siblings, 1 reply; 66+ messages in thread
From: Jesper Dangaard Brouer @ 2020-04-22 16:09 UTC (permalink / raw)
  To: sameehj
  Cc: intel-wired-lan, Jeff Kirsher, Alexander Duyck,
	Jesper Dangaard Brouer, netdev, bpf, zorik, akiyano, gtzalik,
	Toke Høiland-Jørgensen, Daniel Borkmann,
	Alexei Starovoitov, John Fastabend, Alexander Duyck,
	Jeff Kirsher, David Ahern, Willem de Bruijn, Ilias Apalodimas,
	Lorenzo Bianconi, Saeed Mahameed, steffen.klassert

This driver uses different memory models depending on PAGE_SIZE at
compile time. For PAGE_SIZE 4K it uses page splitting, meaning that
for a normal MTU the frame size is 2048 bytes (and headroom 192 bytes).
For larger MTUs the driver still uses page splitting, by allocating
order-1 pages (8192 bytes) for RX frames. For PAGE_SIZE larger than
4K, the driver instead advances its rx_buffer->page_offset by the
frame size "truesize".

For XDP frame size calculations, this means that in the PAGE_SIZE
larger than 4K mode the frame_sz changes on a per-packet basis. For
the page split 4K PAGE_SIZE mode, xdp.frame_sz is more constant and
can be updated once outside the main NAPI loop.
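
As a sketch, the page-split 4K layout per RX buffer looks like this
(numbers taken from the description above):

 /* frame_sz = ixgbe_rx_pg_size(rx_ring) / 2 = 2048 bytes, covering:
  *   192 bytes headroom (usable by XDP)
  * + packet data (up to ~1536 bytes for normal MTU)
  * + tailroom reserved for skb_shared_info (build_skb)
  */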

The default setting in the driver uses build_skb(), which provides
the necessary headroom and tailroom for XDP-redirect in the RX-frame
(in both modes).

There is one complication, which is legacy-rx mode (configurable via
ethtool priv-flags). There is zero headroom in this mode, while
headroom is a requirement for XDP-redirect to work. The conversion to
xdp_frame (convert_to_xdp_frame) will detect this insufficient space,
and the xdp_do_redirect() call will fail. This is deemed acceptable,
as it allows other XDP actions to still work in legacy-mode. In
legacy-mode with larger PAGE_SIZE, due to lacking tailroom, we also
accept that xdp_adjust_tail shrink doesn't work.

Cc: intel-wired-lan@lists.osuosl.org
Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Cc: Alexander Duyck <alexander.duyck@gmail.com>
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
---
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c |   34 +++++++++++++++++++------
 1 file changed, 26 insertions(+), 8 deletions(-)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index ea6834bae04c..eab5934b04f5 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -2244,20 +2244,30 @@ static struct sk_buff *ixgbe_run_xdp(struct ixgbe_adapter *adapter,
 	return ERR_PTR(-result);
 }
 
+static unsigned int ixgbe_rx_frame_truesize(struct ixgbe_ring *rx_ring,
+					    unsigned int size)
+{
+	unsigned int truesize;
+
+#if (PAGE_SIZE < 8192)
+	truesize = ixgbe_rx_pg_size(rx_ring) / 2; /* Must be power-of-2 */
+#else
+	truesize = ring_uses_build_skb(rx_ring) ?
+		SKB_DATA_ALIGN(IXGBE_SKB_PAD + size) +
+		SKB_DATA_ALIGN(sizeof(struct skb_shared_info)) :
+		SKB_DATA_ALIGN(size);
+#endif
+	return truesize;
+}
+
 static void ixgbe_rx_buffer_flip(struct ixgbe_ring *rx_ring,
 				 struct ixgbe_rx_buffer *rx_buffer,
 				 unsigned int size)
 {
+	unsigned int truesize = ixgbe_rx_frame_truesize(rx_ring, size);
 #if (PAGE_SIZE < 8192)
-	unsigned int truesize = ixgbe_rx_pg_size(rx_ring) / 2;
-
 	rx_buffer->page_offset ^= truesize;
 #else
-	unsigned int truesize = ring_uses_build_skb(rx_ring) ?
-				SKB_DATA_ALIGN(IXGBE_SKB_PAD + size) +
-				SKB_DATA_ALIGN(sizeof(struct skb_shared_info)) :
-				SKB_DATA_ALIGN(size);
-
 	rx_buffer->page_offset += truesize;
 #endif
 }
@@ -2291,6 +2301,11 @@ static int ixgbe_clean_rx_irq(struct ixgbe_q_vector *q_vector,
 
 	xdp.rxq = &rx_ring->xdp_rxq;
 
+	/* Frame size depend on rx_ring setup when PAGE_SIZE=4K */
+#if (PAGE_SIZE < 8192)
+	xdp.frame_sz = ixgbe_rx_frame_truesize(rx_ring, 0);
+#endif
+
 	while (likely(total_rx_packets < budget)) {
 		union ixgbe_adv_rx_desc *rx_desc;
 		struct ixgbe_rx_buffer *rx_buffer;
@@ -2324,7 +2339,10 @@ static int ixgbe_clean_rx_irq(struct ixgbe_q_vector *q_vector,
 			xdp.data_hard_start = xdp.data -
 					      ixgbe_rx_offset(rx_ring);
 			xdp.data_end = xdp.data + size;
-
+#if (PAGE_SIZE > 4096)
+			/* At larger PAGE_SIZE, frame_sz depend on len size */
+			xdp.frame_sz = ixgbe_rx_frame_truesize(rx_ring, size);
+#endif
 			skb = ixgbe_run_xdp(adapter, rx_ring, &xdp);
 		}
 



^ permalink raw reply related	[flat|nested] 66+ messages in thread

* [PATCH net-next 24/33] ixgbevf: add XDP frame size to VF driver
       [not found] <158757160439.1370371.13213378122947426220.stgit@firesoul>
                   ` (22 preceding siblings ...)
  2020-04-22 16:09 ` [PATCH net-next 23/33] ixgbe: add XDP frame size to driver Jesper Dangaard Brouer
@ 2020-04-22 16:09 ` Jesper Dangaard Brouer
  2020-04-22 16:09 ` [PATCH net-next 25/33] i40e: add XDP frame size to driver Jesper Dangaard Brouer
                   ` (8 subsequent siblings)
  32 siblings, 0 replies; 66+ messages in thread
From: Jesper Dangaard Brouer @ 2020-04-22 16:09 UTC (permalink / raw)
  To: sameehj
  Cc: intel-wired-lan, Jeff Kirsher, Alexander Duyck,
	Jesper Dangaard Brouer, netdev, bpf, zorik, akiyano, gtzalik,
	Toke Høiland-Jørgensen, Daniel Borkmann,
	Alexei Starovoitov, John Fastabend, Alexander Duyck,
	Jeff Kirsher, David Ahern, Willem de Bruijn, Ilias Apalodimas,
	Lorenzo Bianconi, Saeed Mahameed, steffen.klassert

This patch mirrors the changes to ixgbe in the previous patch.

This VF driver doesn't support XDP_REDIRECT, but correct tailroom is
still necessary for the BPF-helper xdp_adjust_tail.  In legacy-mode
with larger PAGE_SIZE, due to lacking tailroom, we accept that
xdp_adjust_tail shrink doesn't work.

Cc: intel-wired-lan@lists.osuosl.org
Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Cc: Alexander Duyck <alexander.duyck@gmail.com>
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
---
 drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c |   34 +++++++++++++++++----
 1 file changed, 27 insertions(+), 7 deletions(-)

diff --git a/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c b/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
index 4622c4ea2e46..62bc3e3b5b9c 100644
--- a/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
+++ b/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
@@ -1095,19 +1095,31 @@ static struct sk_buff *ixgbevf_run_xdp(struct ixgbevf_adapter *adapter,
 	return ERR_PTR(-result);
 }
 
+static unsigned int ixgbevf_rx_frame_truesize(struct ixgbevf_ring *rx_ring,
+					      unsigned int size)
+{
+	unsigned int truesize;
+
+#if (PAGE_SIZE < 8192)
+	truesize = ixgbevf_rx_pg_size(rx_ring) / 2; /* Must be power-of-2 */
+#else
+	truesize = ring_uses_build_skb(rx_ring) ?
+		SKB_DATA_ALIGN(IXGBEVF_SKB_PAD + size) +
+		SKB_DATA_ALIGN(sizeof(struct skb_shared_info)) :
+		SKB_DATA_ALIGN(size);
+#endif
+	return truesize;
+}
+
 static void ixgbevf_rx_buffer_flip(struct ixgbevf_ring *rx_ring,
 				   struct ixgbevf_rx_buffer *rx_buffer,
 				   unsigned int size)
 {
-#if (PAGE_SIZE < 8192)
-	unsigned int truesize = ixgbevf_rx_pg_size(rx_ring) / 2;
+	unsigned int truesize = ixgbevf_rx_frame_truesize(rx_ring, size);
 
+#if (PAGE_SIZE < 8192)
 	rx_buffer->page_offset ^= truesize;
 #else
-	unsigned int truesize = ring_uses_build_skb(rx_ring) ?
-				SKB_DATA_ALIGN(IXGBEVF_SKB_PAD + size) :
-				SKB_DATA_ALIGN(size);
-
 	rx_buffer->page_offset += truesize;
 #endif
 }
@@ -1125,6 +1137,11 @@ static int ixgbevf_clean_rx_irq(struct ixgbevf_q_vector *q_vector,
 
 	xdp.rxq = &rx_ring->xdp_rxq;
 
+	/* Frame size depend on rx_ring setup when PAGE_SIZE=4K */
+#if (PAGE_SIZE < 8192)
+	xdp.frame_sz = ixgbevf_rx_frame_truesize(rx_ring, 0);
+#endif
+
 	while (likely(total_rx_packets < budget)) {
 		struct ixgbevf_rx_buffer *rx_buffer;
 		union ixgbe_adv_rx_desc *rx_desc;
@@ -1157,7 +1174,10 @@ static int ixgbevf_clean_rx_irq(struct ixgbevf_q_vector *q_vector,
 			xdp.data_hard_start = xdp.data -
 					      ixgbevf_rx_offset(rx_ring);
 			xdp.data_end = xdp.data + size;
-
+#if (PAGE_SIZE > 4096)
+			/* At larger PAGE_SIZE, frame_sz depend on len size */
+			xdp.frame_sz = ixgbevf_rx_frame_truesize(rx_ring, size);
+#endif
 			skb = ixgbevf_run_xdp(adapter, rx_ring, &xdp);
 		}
 



^ permalink raw reply related	[flat|nested] 66+ messages in thread

* [PATCH net-next 25/33] i40e: add XDP frame size to driver
       [not found] <158757160439.1370371.13213378122947426220.stgit@firesoul>
                   ` (23 preceding siblings ...)
  2020-04-22 16:09 ` [PATCH net-next 24/33] ixgbevf: add XDP frame size to VF driver Jesper Dangaard Brouer
@ 2020-04-22 16:09 ` Jesper Dangaard Brouer
  2020-04-22 16:09 ` [PATCH net-next 26/33] ice: " Jesper Dangaard Brouer
                   ` (7 subsequent siblings)
  32 siblings, 0 replies; 66+ messages in thread
From: Jesper Dangaard Brouer @ 2020-04-22 16:09 UTC (permalink / raw)
  To: sameehj
  Cc: intel-wired-lan, Jeff Kirsher, Alexander Duyck,
	Jesper Dangaard Brouer, netdev, bpf, zorik, akiyano, gtzalik,
	Toke Høiland-Jørgensen, Daniel Borkmann,
	Alexei Starovoitov, John Fastabend, Alexander Duyck,
	Jeff Kirsher, David Ahern, Willem de Bruijn, Ilias Apalodimas,
	Lorenzo Bianconi, Saeed Mahameed, steffen.klassert

This driver uses different memory models depending on PAGE_SIZE at
compile time. For PAGE_SIZE 4K it uses page splitting, meaning that
for a normal MTU the frame size is 2048 bytes (and headroom 192 bytes).
For larger MTUs the driver still uses page splitting, by allocating
order-1 pages (8192 bytes) for RX frames. For PAGE_SIZE larger than
4K, the driver instead advances its rx_buffer->page_offset by the
frame size "truesize".

For XDP frame size calculations, this means that in the PAGE_SIZE
larger than 4K mode the frame_sz changes on a per-packet basis. For
the page split 4K PAGE_SIZE mode, xdp.frame_sz is more constant and
can be updated once outside the main NAPI loop.

The default setting in the driver uses build_skb(), which provides
the necessary headroom and tailroom for XDP-redirect in the RX-frame
(in both modes).

There is one complication, which is legacy-rx mode (configurable via
ethtool priv-flags). There is zero headroom in this mode, while
headroom is a requirement for XDP-redirect to work. The conversion to
xdp_frame (convert_to_xdp_frame) will detect this insufficient space,
and the xdp_do_redirect() call will fail. This is deemed acceptable,
as it allows other XDP actions to still work in legacy-mode. In
legacy-mode with larger PAGE_SIZE, due to lacking tailroom, we also
accept that xdp_adjust_tail shrink doesn't work.

Cc: intel-wired-lan@lists.osuosl.org
Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Cc: Alexander Duyck <alexander.duyck@gmail.com>
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
---
 drivers/net/ethernet/intel/i40e/i40e_txrx.c |   30 +++++++++++++++++++++++----
 1 file changed, 25 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_txrx.c b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
index b8496037ef7f..a3772beffe02 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_txrx.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
@@ -1507,6 +1507,22 @@ static inline unsigned int i40e_rx_offset(struct i40e_ring *rx_ring)
 	return ring_uses_build_skb(rx_ring) ? I40E_SKB_PAD : 0;
 }
 
+static unsigned int i40e_rx_frame_truesize(struct i40e_ring *rx_ring,
+					   unsigned int size)
+{
+	unsigned int truesize;
+
+#if (PAGE_SIZE < 8192)
+	truesize = i40e_rx_pg_size(rx_ring) / 2; /* Must be power-of-2 */
+#else
+	truesize = i40e_rx_offset(rx_ring) ?
+		SKB_DATA_ALIGN(size + i40e_rx_offset(rx_ring)) +
+		SKB_DATA_ALIGN(sizeof(struct skb_shared_info)) :
+		SKB_DATA_ALIGN(size);
+#endif
+	return truesize;
+}
+
 /**
  * i40e_alloc_mapped_page - recycle or make a new page
  * @rx_ring: ring to use
@@ -2246,13 +2262,11 @@ static void i40e_rx_buffer_flip(struct i40e_ring *rx_ring,
 				struct i40e_rx_buffer *rx_buffer,
 				unsigned int size)
 {
-#if (PAGE_SIZE < 8192)
-	unsigned int truesize = i40e_rx_pg_size(rx_ring) / 2;
+	unsigned int truesize = i40e_rx_frame_truesize(rx_ring, size);
 
+#if (PAGE_SIZE < 8192)
 	rx_buffer->page_offset ^= truesize;
 #else
-	unsigned int truesize = SKB_DATA_ALIGN(i40e_rx_offset(rx_ring) + size);
-
 	rx_buffer->page_offset += truesize;
 #endif
 }
@@ -2335,6 +2349,9 @@ static int i40e_clean_rx_irq(struct i40e_ring *rx_ring, int budget)
 	bool failure = false;
 	struct xdp_buff xdp;
 
+#if (PAGE_SIZE < 8192)
+	xdp.frame_sz = i40e_rx_frame_truesize(rx_ring, 0);
+#endif
 	xdp.rxq = &rx_ring->xdp_rxq;
 
 	while (likely(total_rx_packets < (unsigned int)budget)) {
@@ -2389,7 +2406,10 @@ static int i40e_clean_rx_irq(struct i40e_ring *rx_ring, int budget)
 			xdp.data_hard_start = xdp.data -
 					      i40e_rx_offset(rx_ring);
 			xdp.data_end = xdp.data + size;
-
+#if (PAGE_SIZE > 4096)
+			/* At larger PAGE_SIZE, frame_sz depend on len size */
+			xdp.frame_sz = i40e_rx_frame_truesize(rx_ring, size);
+#endif
 			skb = i40e_run_xdp(rx_ring, &xdp);
 		}
 



^ permalink raw reply related	[flat|nested] 66+ messages in thread

* [PATCH net-next 26/33] ice: add XDP frame size to driver
       [not found] <158757160439.1370371.13213378122947426220.stgit@firesoul>
                   ` (24 preceding siblings ...)
  2020-04-22 16:09 ` [PATCH net-next 25/33] i40e: add XDP frame size to driver Jesper Dangaard Brouer
@ 2020-04-22 16:09 ` Jesper Dangaard Brouer
  2020-04-22 16:09 ` [PATCH net-next 27/33] xdp: for Intel AF_XDP drivers add XDP frame_sz Jesper Dangaard Brouer
                   ` (6 subsequent siblings)
  32 siblings, 0 replies; 66+ messages in thread
From: Jesper Dangaard Brouer @ 2020-04-22 16:09 UTC (permalink / raw)
  To: sameehj
  Cc: intel-wired-lan, Jeff Kirsher, Alexander Duyck,
	Jesper Dangaard Brouer, netdev, bpf, zorik, akiyano, gtzalik,
	Toke Høiland-Jørgensen, Daniel Borkmann,
	Alexei Starovoitov, John Fastabend, Alexander Duyck,
	Jeff Kirsher, David Ahern, Willem de Bruijn, Ilias Apalodimas,
	Lorenzo Bianconi, Saeed Mahameed, steffen.klassert

This driver uses different memory models depending on PAGE_SIZE at
compile time. For PAGE_SIZE 4K it uses page splitting, meaning that
for a normal MTU the frame size is 2048 bytes (and headroom 192 bytes).
For larger MTUs the driver still uses page splitting, by allocating
order-1 pages (8192 bytes) for RX frames. For PAGE_SIZE larger than
4K, the driver instead advances its rx_buffer->page_offset by the
frame size "truesize".

For XDP frame size calculations, this means that in the PAGE_SIZE
larger than 4K mode the frame_sz changes on a per-packet basis. For
the page split 4K PAGE_SIZE mode, xdp.frame_sz is more constant and
can be updated once outside the main NAPI loop.

The default setting in the driver uses build_skb(), which provides
the necessary headroom and tailroom for XDP-redirect in the RX-frame
(in both modes).

There is one complication, which is legacy-rx mode (configurable via
ethtool priv-flags). There is zero headroom in this mode, while
headroom is a requirement for XDP-redirect to work. The conversion to
xdp_frame (convert_to_xdp_frame) will detect this insufficient space,
and the xdp_do_redirect() call will fail. This is deemed acceptable,
as it allows other XDP actions to still work in legacy-mode. In
legacy-mode with larger PAGE_SIZE, due to lacking tailroom, we also
accept that xdp_adjust_tail shrink doesn't work.

Cc: intel-wired-lan@lists.osuosl.org
Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Cc: Alexander Duyck <alexander.duyck@gmail.com>
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
---
 drivers/net/ethernet/intel/ice/ice_txrx.c |   34 +++++++++++++++++++++--------
 1 file changed, 25 insertions(+), 9 deletions(-)

diff --git a/drivers/net/ethernet/intel/ice/ice_txrx.c b/drivers/net/ethernet/intel/ice/ice_txrx.c
index f67e8362958c..69b21b436f9a 100644
--- a/drivers/net/ethernet/intel/ice/ice_txrx.c
+++ b/drivers/net/ethernet/intel/ice/ice_txrx.c
@@ -423,6 +423,22 @@ static unsigned int ice_rx_offset(struct ice_ring *rx_ring)
 	return 0;
 }
 
+static unsigned int ice_rx_frame_truesize(struct ice_ring *rx_ring,
+					  unsigned int size)
+{
+	unsigned int truesize;
+
+#if (PAGE_SIZE < 8192)
+	truesize = ice_rx_pg_size(rx_ring) / 2; /* Must be power-of-2 */
+#else
+	truesize = ice_rx_offset(rx_ring) ?
+		SKB_DATA_ALIGN(ice_rx_offset(rx_ring) + size) +
+		SKB_DATA_ALIGN(sizeof(struct skb_shared_info)) :
+		SKB_DATA_ALIGN(size);
+#endif
+	return truesize;
+}
+
 /**
  * ice_run_xdp - Executes an XDP program on initialized xdp_buff
  * @rx_ring: Rx ring
@@ -991,6 +1007,10 @@ static int ice_clean_rx_irq(struct ice_ring *rx_ring, int budget)
 	bool failure;
 
 	xdp.rxq = &rx_ring->xdp_rxq;
+	/* Frame size depend on rx_ring setup when PAGE_SIZE=4K */
+#if (PAGE_SIZE < 8192)
+	xdp.frame_sz = ice_rx_frame_truesize(rx_ring, 0);
+#endif
 
 	/* start the loop to process Rx packets bounded by 'budget' */
 	while (likely(total_rx_pkts < (unsigned int)budget)) {
@@ -1038,6 +1058,10 @@ static int ice_clean_rx_irq(struct ice_ring *rx_ring, int budget)
 		xdp.data_hard_start = xdp.data - ice_rx_offset(rx_ring);
 		xdp.data_meta = xdp.data;
 		xdp.data_end = xdp.data + size;
+#if (PAGE_SIZE > 4096)
+		/* At larger PAGE_SIZE, frame_sz depend on len size */
+		xdp.frame_sz = ice_rx_frame_truesize(rx_ring, size);
+#endif
 
 		rcu_read_lock();
 		xdp_prog = READ_ONCE(rx_ring->xdp_prog);
@@ -1051,16 +1075,8 @@ static int ice_clean_rx_irq(struct ice_ring *rx_ring, int budget)
 		if (!xdp_res)
 			goto construct_skb;
 		if (xdp_res & (ICE_XDP_TX | ICE_XDP_REDIR)) {
-			unsigned int truesize;
-
-#if (PAGE_SIZE < 8192)
-			truesize = ice_rx_pg_size(rx_ring) / 2;
-#else
-			truesize = SKB_DATA_ALIGN(ice_rx_offset(rx_ring) +
-						  size);
-#endif
 			xdp_xmit |= xdp_res;
-			ice_rx_buf_adjust_pg_offset(rx_buf, truesize);
+			ice_rx_buf_adjust_pg_offset(rx_buf, xdp.frame_sz);
 		} else {
 			rx_buf->pagecnt_bias++;
 		}



^ permalink raw reply related	[flat|nested] 66+ messages in thread

* [PATCH net-next 27/33] xdp: for Intel AF_XDP drivers add XDP frame_sz
       [not found] <158757160439.1370371.13213378122947426220.stgit@firesoul>
                   ` (25 preceding siblings ...)
  2020-04-22 16:09 ` [PATCH net-next 26/33] ice: " Jesper Dangaard Brouer
@ 2020-04-22 16:09 ` Jesper Dangaard Brouer
  2020-04-22 16:09 ` [PATCH net-next 28/33] mlx5: rx queue setup time determine frame_sz for XDP Jesper Dangaard Brouer
                   ` (5 subsequent siblings)
  32 siblings, 0 replies; 66+ messages in thread
From: Jesper Dangaard Brouer @ 2020-04-22 16:09 UTC (permalink / raw)
  To: sameehj
  Cc: intel-wired-lan, Björn Töpel, Magnus Karlsson,
	Björn Töpel, Jesper Dangaard Brouer, netdev, bpf,
	zorik, akiyano, gtzalik, Toke Høiland-Jørgensen,
	Daniel Borkmann, Alexei Starovoitov, John Fastabend,
	Alexander Duyck, Jeff Kirsher, David Ahern, Willem de Bruijn,
	Ilias Apalodimas, Lorenzo Bianconi, Saeed Mahameed,
	steffen.klassert

Intel drivers implement native AF_XDP zerocopy in separate C-files
that have their own invocation of bpf_prog_run_xdp(). The setup of
the xdp_buff is also handled separately from the normal code path.

This patch updates the XDP frame_sz for the AF_XDP zerocopy drivers
i40e, ice and ixgbe, as the code changes needed are very similar.
Introduce a helper function xsk_umem_xdp_frame_sz() for calculating
the frame size.
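
Assuming chunk_size_nohr is the chunk size minus the configured
headroom (per the umem setup code), the new helper effectively yields
the full UMEM chunk size:

 xsk_umem_xdp_frame_sz(umem)
   == (chunk_size - headroom) + headroom
   == chunk_size		/* e.g. 2048 with 2K chunks */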

Cc: intel-wired-lan@lists.osuosl.org
Cc: Björn Töpel <bjorn.topel@intel.com>
Cc: Magnus Karlsson <magnus.karlsson@intel.com>
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: Björn Töpel <bjorn.topel@intel.com>
---
 drivers/net/ethernet/intel/i40e/i40e_xsk.c   |    2 ++
 drivers/net/ethernet/intel/ice/ice_xsk.c     |    2 ++
 drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c |    2 ++
 include/net/xdp_sock.h                       |   11 +++++++++++
 4 files changed, 17 insertions(+)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_xsk.c b/drivers/net/ethernet/intel/i40e/i40e_xsk.c
index 0b7d29192b2c..2b9184aead5f 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_xsk.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_xsk.c
@@ -531,12 +531,14 @@ int i40e_clean_rx_irq_zc(struct i40e_ring *rx_ring, int budget)
 {
 	unsigned int total_rx_bytes = 0, total_rx_packets = 0;
 	u16 cleaned_count = I40E_DESC_UNUSED(rx_ring);
+	struct xdp_umem *umem = rx_ring->xsk_umem;
 	unsigned int xdp_res, xdp_xmit = 0;
 	bool failure = false;
 	struct sk_buff *skb;
 	struct xdp_buff xdp;
 
 	xdp.rxq = &rx_ring->xdp_rxq;
+	xdp.frame_sz = xsk_umem_xdp_frame_sz(umem);
 
 	while (likely(total_rx_packets < (unsigned int)budget)) {
 		struct i40e_rx_buffer *bi;
diff --git a/drivers/net/ethernet/intel/ice/ice_xsk.c b/drivers/net/ethernet/intel/ice/ice_xsk.c
index 8279db15e870..23e5515d4527 100644
--- a/drivers/net/ethernet/intel/ice/ice_xsk.c
+++ b/drivers/net/ethernet/intel/ice/ice_xsk.c
@@ -840,11 +840,13 @@ int ice_clean_rx_irq_zc(struct ice_ring *rx_ring, int budget)
 {
 	unsigned int total_rx_bytes = 0, total_rx_packets = 0;
 	u16 cleaned_count = ICE_DESC_UNUSED(rx_ring);
+	struct xdp_umem *umem = rx_ring->xsk_umem;
 	unsigned int xdp_xmit = 0;
 	bool failure = false;
 	struct xdp_buff xdp;
 
 	xdp.rxq = &rx_ring->xdp_rxq;
+	xdp.frame_sz = xsk_umem_xdp_frame_sz(umem);
 
 	while (likely(total_rx_packets < (unsigned int)budget)) {
 		union ice_32b_rx_flex_desc *rx_desc;
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c
index 74b540ebb3dc..a656ee9a1fae 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c
@@ -431,12 +431,14 @@ int ixgbe_clean_rx_irq_zc(struct ixgbe_q_vector *q_vector,
 	unsigned int total_rx_bytes = 0, total_rx_packets = 0;
 	struct ixgbe_adapter *adapter = q_vector->adapter;
 	u16 cleaned_count = ixgbe_desc_unused(rx_ring);
+	struct xdp_umem *umem = rx_ring->xsk_umem;
 	unsigned int xdp_res, xdp_xmit = 0;
 	bool failure = false;
 	struct sk_buff *skb;
 	struct xdp_buff xdp;
 
 	xdp.rxq = &rx_ring->xdp_rxq;
+	xdp.frame_sz = xsk_umem_xdp_frame_sz(umem);
 
 	while (likely(total_rx_packets < budget)) {
 		union ixgbe_adv_rx_desc *rx_desc;
diff --git a/include/net/xdp_sock.h b/include/net/xdp_sock.h
index e86ec48ef627..1cd1ec3cea97 100644
--- a/include/net/xdp_sock.h
+++ b/include/net/xdp_sock.h
@@ -237,6 +237,12 @@ static inline u64 xsk_umem_adjust_offset(struct xdp_umem *umem, u64 address,
 	else
 		return address + offset;
 }
+
+static inline u32 xsk_umem_xdp_frame_sz(struct xdp_umem *umem)
+{
+	return umem->chunk_size_nohr + umem->headroom;
+}
+
 #else
 static inline int xsk_generic_rcv(struct xdp_sock *xs, struct xdp_buff *xdp)
 {
@@ -367,6 +373,11 @@ static inline u64 xsk_umem_adjust_offset(struct xdp_umem *umem, u64 handle,
 	return 0;
 }
 
+static inline u32 xsk_umem_xdp_frame_sz(struct xdp_umem *umem)
+{
+	return 0;
+}
+
 static inline int __xsk_map_redirect(struct xdp_sock *xs, struct xdp_buff *xdp)
 {
 	return -EOPNOTSUPP;



^ permalink raw reply related	[flat|nested] 66+ messages in thread

* [PATCH net-next 28/33] mlx5: rx queue setup time determine frame_sz for XDP
       [not found] <158757160439.1370371.13213378122947426220.stgit@firesoul>
                   ` (26 preceding siblings ...)
  2020-04-22 16:09 ` [PATCH net-next 27/33] xdp: for Intel AF_XDP drivers add XDP frame_sz Jesper Dangaard Brouer
@ 2020-04-22 16:09 ` Jesper Dangaard Brouer
  2020-04-25  0:58   ` Alexei Starovoitov
  2020-04-27 20:22   ` Jesper Dangaard Brouer
  2020-04-22 16:09 ` [PATCH net-next 29/33] xdp: allow bpf_xdp_adjust_tail() to grow packet size Jesper Dangaard Brouer
                   ` (4 subsequent siblings)
  32 siblings, 2 replies; 66+ messages in thread
From: Jesper Dangaard Brouer @ 2020-04-22 16:09 UTC (permalink / raw)
  To: sameehj
  Cc: Tariq Toukan, Saeed Mahameed, Jesper Dangaard Brouer, netdev,
	bpf, zorik, akiyano, gtzalik, Toke Høiland-Jørgensen,
	Daniel Borkmann, Alexei Starovoitov, John Fastabend,
	Alexander Duyck, Jeff Kirsher, David Ahern, Willem de Bruijn,
	Ilias Apalodimas, Lorenzo Bianconi, Saeed Mahameed,
	steffen.klassert

The mlx5 driver has multiple memory models, which also change
according to whether an XDP bpf_prog is attached.

The 'rx_striding_rq' setting is adjusted via ethtool priv-flags e.g.:
 # ethtool --set-priv-flags mlx5p2 rx_striding_rq off

In the general case with 4K PAGE_SIZE and regular MTU packets, the
frame_sz is 2048, or 4096 when XDP is enabled, in both modes.

The info on the given frame size is stored differently depending on
the RQ-mode, and is encoded in a union (wqe/mpwqe) in struct mlx5e_rq.
In rx striding mode rq->mpwqe.log_stride_sz is either 11 or 12, which
corresponds to 2048 or 4096 (MLX5_WQ_TYPE_LINKED_LIST_STRIDING_RQ).
In non-striding mode (MLX5_WQ_TYPE_CYCLIC) the frag_stride is stored
in rq->wqe.info.arr[0].frag_stride, for the first fragment, which is
what the XDP case cares about.
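
In short, the RQ-mode assignments in the patch below decode to
(sketch):

 /* Striding-RQ mode: */
 frame_sz = 1 << rq->mpwqe.log_stride_sz;	/* 2048 or 4096 */
 /* Non-striding (cyclic) mode, first fragment only: */
 frame_sz = rq->wqe.info.arr[0].frag_stride;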

To reduce the effect on the fast-path, this patch determines the
frame_sz at setup time, to avoid determining the memory model at
runtime. The variable is named first_frame_sz to make it clear that
this is only the frame size of the first fragment.

This mlx5 driver does a DMA-sync on the XDP_TX action, but grow is
safe as it has done a DMA-map of the entire PAGE_SIZE. The driver also
already does an XDP length check against sq->hw_mtu on the possible
XDP xmit paths mlx5e_xmit_xdp_frame() + mlx5e_xmit_xdp_frame_mpwqe().

Cc: Tariq Toukan <tariqt@mellanox.com>
Cc: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en.h      |    1 +
 drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c  |    1 +
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c |    6 ++++++
 3 files changed, 8 insertions(+)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h b/drivers/net/ethernet/mellanox/mlx5/core/en.h
index 12a61bf82c14..5fa5fa891856 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h
@@ -651,6 +651,7 @@ struct mlx5e_rq {
 	struct {
 		u16            umem_headroom;
 		u16            headroom;
+		u32            first_frame_sz;
 		u8             map_dir;   /* dma map direction */
 	} buff;
 
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c b/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c
index f049e0ac308a..b63abaf51253 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c
@@ -137,6 +137,7 @@ bool mlx5e_xdp_handle(struct mlx5e_rq *rq, struct mlx5e_dma_info *di,
 	if (xsk)
 		xdp.handle = di->xsk.handle;
 	xdp.rxq = &rq->xdp_rxq;
+	xdp.frame_sz = rq->buff.first_frame_sz;
 
 	act = bpf_prog_run_xdp(prog, &xdp);
 	if (xsk) {
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index e057822898f8..200cfd61cc54 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -462,6 +462,8 @@ static int mlx5e_alloc_rq(struct mlx5e_channel *c,
 		rq->mpwqe.num_strides =
 			BIT(mlx5e_mpwqe_get_log_num_strides(mdev, params, xsk));
 
+		rq->buff.first_frame_sz = (1 << rq->mpwqe.log_stride_sz);
+
 		err = mlx5e_create_rq_umr_mkey(mdev, rq);
 		if (err)
 			goto err_rq_wq_destroy;
@@ -485,6 +487,8 @@ static int mlx5e_alloc_rq(struct mlx5e_channel *c,
 			num_xsk_frames = wq_sz << rq->wqe.info.log_num_frags;
 
 		rq->wqe.info = rqp->frags_info;
+		rq->buff.first_frame_sz = rq->wqe.info.arr[0].frag_stride;
+
 		rq->wqe.frags =
 			kvzalloc_node(array_size(sizeof(*rq->wqe.frags),
 					(wq_sz << rq->wqe.info.log_num_frags)),
@@ -522,6 +526,8 @@ static int mlx5e_alloc_rq(struct mlx5e_channel *c,
 	}
 
 	if (xsk) {
+		rq->buff.first_frame_sz = xsk_umem_xdp_frame_sz(umem);
+
 		err = mlx5e_xsk_resize_reuseq(umem, num_xsk_frames);
 		if (unlikely(err)) {
 			mlx5_core_err(mdev, "Unable to allocate the Reuse Ring for %u frames\n",



^ permalink raw reply related	[flat|nested] 66+ messages in thread

* [PATCH net-next 29/33] xdp: allow bpf_xdp_adjust_tail() to grow packet size
       [not found] <158757160439.1370371.13213378122947426220.stgit@firesoul>
                   ` (27 preceding siblings ...)
  2020-04-22 16:09 ` [PATCH net-next 28/33] mlx5: rx queue setup time determine frame_sz for XDP Jesper Dangaard Brouer
@ 2020-04-22 16:09 ` Jesper Dangaard Brouer
  2020-04-24 14:09   ` Toke Høiland-Jørgensen
  2020-04-27 19:01   ` Daniel Borkmann
  2020-04-22 16:09 ` [PATCH net-next 30/33] xdp: clear grow memory in bpf_xdp_adjust_tail() Jesper Dangaard Brouer
                   ` (3 subsequent siblings)
  32 siblings, 2 replies; 66+ messages in thread
From: Jesper Dangaard Brouer @ 2020-04-22 16:09 UTC (permalink / raw)
  To: sameehj
  Cc: Jesper Dangaard Brouer, netdev, bpf, zorik, akiyano, gtzalik,
	Toke Høiland-Jørgensen, Daniel Borkmann,
	Alexei Starovoitov, John Fastabend, Alexander Duyck,
	Jeff Kirsher, David Ahern, Willem de Bruijn, Ilias Apalodimas,
	Lorenzo Bianconi, Saeed Mahameed, steffen.klassert

Finally, after all drivers have a frame size, allow the BPF-helper
bpf_xdp_adjust_tail() to grow or extend the packet size at the frame
tail.

Remember that the helper/macro xdp_data_hard_end has reserved some
tailroom.  Thus, this helper makes sure that the BPF-prog doesn't get
access to this tailroom area.
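
A minimal sketch of a BPF program exercising the new grow capability
(hypothetical example, not part of this patch):

 #include <linux/bpf.h>
 #include <bpf/bpf_helpers.h>

 SEC("xdp")
 int xdp_grow_tail(struct xdp_md *ctx)
 {
	/* Grow tail by 16 bytes; fails (-EINVAL) if this would reach
	 * into the tailroom reserved via xdp_data_hard_end.
	 */
	if (bpf_xdp_adjust_tail(ctx, 16))
		return XDP_DROP;
	return XDP_PASS;
 }

 char _license[] SEC("license") = "GPL";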

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
---
 include/uapi/linux/bpf.h |    4 ++--
 net/core/filter.c        |   15 +++++++++++++--
 2 files changed, 15 insertions(+), 4 deletions(-)

diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 2e29a671d67e..0e5abe991ca3 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -1969,8 +1969,8 @@ union bpf_attr {
  * int bpf_xdp_adjust_tail(struct xdp_buff *xdp_md, int delta)
  * 	Description
  * 		Adjust (move) *xdp_md*\ **->data_end** by *delta* bytes. It is
- * 		only possible to shrink the packet as of this writing,
- * 		therefore *delta* must be a negative integer.
+ * 		possible to both shrink and grow the packet tail.
+ * 		Shrink done via *delta* being a negative integer.
  *
  * 		A call to this helper is susceptible to change the underlying
  * 		packet buffer. Therefore, at load time, all checks on pointers
diff --git a/net/core/filter.c b/net/core/filter.c
index 7d6ceaa54d21..5e9c387f74eb 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -3422,12 +3422,23 @@ static const struct bpf_func_proto bpf_xdp_adjust_head_proto = {
 
 BPF_CALL_2(bpf_xdp_adjust_tail, struct xdp_buff *, xdp, int, offset)
 {
+	void *data_hard_end = xdp_data_hard_end(xdp);
 	void *data_end = xdp->data_end + offset;
 
-	/* only shrinking is allowed for now. */
-	if (unlikely(offset >= 0))
+	/* Notice that xdp_data_hard_end have reserved some tailroom */
+	if (unlikely(data_end > data_hard_end))
 		return -EINVAL;
 
+	/* ALL drivers MUST init xdp->frame_sz, some chicken checks below */
+	if (unlikely(xdp->frame_sz < (xdp->data_end - xdp->data_hard_start))) {
+		WARN(1, "Too small xdp->frame_sz = %d\n", xdp->frame_sz);
+		return -EINVAL;
+	}
+	if (unlikely(xdp->frame_sz > PAGE_SIZE)) {
+		WARN(1, "Too BIG xdp->frame_sz = %d\n", xdp->frame_sz);
+		return -EINVAL;
+	}
+
 	if (unlikely(data_end < xdp->data + ETH_HLEN))
 		return -EINVAL;
 



^ permalink raw reply related	[flat|nested] 66+ messages in thread

* [PATCH net-next 30/33] xdp: clear grow memory in bpf_xdp_adjust_tail()
       [not found] <158757160439.1370371.13213378122947426220.stgit@firesoul>
                   ` (28 preceding siblings ...)
  2020-04-22 16:09 ` [PATCH net-next 29/33] xdp: allow bpf_xdp_adjust_tail() to grow packet size Jesper Dangaard Brouer
@ 2020-04-22 16:09 ` Jesper Dangaard Brouer
  2020-04-24 14:09   ` Toke Høiland-Jørgensen
  2020-04-27  5:26   ` John Fastabend
  2020-04-22 16:09 ` [PATCH net-next 31/33] bpf: add xdp.frame_sz in bpf_prog_test_run_xdp() Jesper Dangaard Brouer
                   ` (2 subsequent siblings)
  32 siblings, 2 replies; 66+ messages in thread
From: Jesper Dangaard Brouer @ 2020-04-22 16:09 UTC (permalink / raw)
  To: sameehj
  Cc: Jesper Dangaard Brouer, netdev, bpf, zorik, akiyano, gtzalik,
	Toke Høiland-Jørgensen, Daniel Borkmann,
	Alexei Starovoitov, John Fastabend, Alexander Duyck,
	Jeff Kirsher, David Ahern, Willem de Bruijn, Ilias Apalodimas,
	Lorenzo Bianconi, Saeed Mahameed, steffen.klassert

Clear the tail memory when a grow happens, because it is too easy
to write an XDP_PASS program that extends the tail, which would
expose this memory to users that can run tcpdump.

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
---
 net/core/filter.c |    4 ++++
 1 file changed, 4 insertions(+)

diff --git a/net/core/filter.c b/net/core/filter.c
index 5e9c387f74eb..889d96a690c2 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -3442,6 +3442,10 @@ BPF_CALL_2(bpf_xdp_adjust_tail, struct xdp_buff *, xdp, int, offset)
 	if (unlikely(data_end < xdp->data + ETH_HLEN))
 		return -EINVAL;
 
+	/* Clear memory area on grow, can contain uninit kernel memory */
+	if (offset > 0)
+		memset(xdp->data_end, 0, offset);
+
 	xdp->data_end = data_end;
 
 	return 0;



^ permalink raw reply related	[flat|nested] 66+ messages in thread

* [PATCH net-next 31/33] bpf: add xdp.frame_sz in bpf_prog_test_run_xdp().
       [not found] <158757160439.1370371.13213378122947426220.stgit@firesoul>
                   ` (29 preceding siblings ...)
  2020-04-22 16:09 ` [PATCH net-next 30/33] xdp: clear grow memory in bpf_xdp_adjust_tail() Jesper Dangaard Brouer
@ 2020-04-22 16:09 ` Jesper Dangaard Brouer
  2020-04-22 16:10 ` [PATCH net-next 32/33] selftests/bpf: adjust BPF selftest for xdp_adjust_tail Jesper Dangaard Brouer
  2020-04-22 16:10 ` [PATCH net-next 33/33] selftests/bpf: xdp_adjust_tail add grow tail tests Jesper Dangaard Brouer
  32 siblings, 0 replies; 66+ messages in thread
From: Jesper Dangaard Brouer @ 2020-04-22 16:09 UTC (permalink / raw)
  To: sameehj
  Cc: Jesper Dangaard Brouer, netdev, bpf, zorik, akiyano, gtzalik,
	Toke Høiland-Jørgensen, Daniel Borkmann,
	Alexei Starovoitov, John Fastabend, Alexander Duyck,
	Jeff Kirsher, David Ahern, Willem de Bruijn, Ilias Apalodimas,
	Lorenzo Bianconi, Saeed Mahameed, steffen.klassert

Update the memory requirements when adding xdp.frame_sz in the BPF
test_run function bpf_prog_test_run_xdp(), which e.g. is used by the
XDP selftests.

Specifically add the expected reserved tailroom, but also allocate a
larger memory area to reflect that XDP frames usually come in this
format. Limit the provided packet data size to 4096 minus headroom
and tailroom, as this also reflects a common 3520 byte MTU limit
with XDP.
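
With common x86_64 values (the skb_shared_info size varies by config),
the arithmetic works out to:

 headroom    = XDP_PACKET_HEADROOM                            /* 256 */
 tailroom    = SKB_DATA_ALIGN(sizeof(struct skb_shared_info)) /* 320 */
 max_data_sz = 4096 - headroom - tailroom                     /* 3520 */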

Note that bpf_test_init already uses a memory allocation method that
clears memory.  Thus, this already guards against leaking uninit
kernel memory.

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
---
 net/bpf/test_run.c |   16 ++++++++++++----
 1 file changed, 12 insertions(+), 4 deletions(-)

diff --git a/net/bpf/test_run.c b/net/bpf/test_run.c
index 29dbdd4c29f6..30ba7d38941d 100644
--- a/net/bpf/test_run.c
+++ b/net/bpf/test_run.c
@@ -470,25 +470,34 @@ int bpf_prog_test_run_skb(struct bpf_prog *prog, const union bpf_attr *kattr,
 int bpf_prog_test_run_xdp(struct bpf_prog *prog, const union bpf_attr *kattr,
 			  union bpf_attr __user *uattr)
 {
+	u32 tailroom = SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
+	u32 headroom = XDP_PACKET_HEADROOM;
 	u32 size = kattr->test.data_size_in;
 	u32 repeat = kattr->test.repeat;
 	struct netdev_rx_queue *rxqueue;
 	struct xdp_buff xdp = {};
 	u32 retval, duration;
+	u32 max_data_sz;
 	void *data;
 	int ret;
 
 	if (kattr->test.ctx_in || kattr->test.ctx_out)
 		return -EINVAL;
 
-	data = bpf_test_init(kattr, size, XDP_PACKET_HEADROOM + NET_IP_ALIGN, 0);
+	/* XDP have extra tailroom as (most) drivers use full page */
+	max_data_sz = 4096 - headroom - tailroom;
+	if (size > max_data_sz)
+		return -EINVAL;
+
+	data = bpf_test_init(kattr, max_data_sz, headroom, tailroom);
 	if (IS_ERR(data))
 		return PTR_ERR(data);
 
 	xdp.data_hard_start = data;
-	xdp.data = data + XDP_PACKET_HEADROOM + NET_IP_ALIGN;
+	xdp.data = data + headroom;
 	xdp.data_meta = xdp.data;
 	xdp.data_end = xdp.data + size;
+	xdp.frame_sz = headroom + max_data_sz + tailroom;
 
 	rxqueue = __netif_get_rx_queue(current->nsproxy->net_ns->loopback_dev, 0);
 	xdp.rxq = &rxqueue->xdp_rxq;
@@ -496,8 +505,7 @@ int bpf_prog_test_run_xdp(struct bpf_prog *prog, const union bpf_attr *kattr,
 	ret = bpf_test_run(prog, &xdp, repeat, &retval, &duration, true);
 	if (ret)
 		goto out;
-	if (xdp.data != data + XDP_PACKET_HEADROOM + NET_IP_ALIGN ||
-	    xdp.data_end != xdp.data + size)
+	if (xdp.data != data + headroom || xdp.data_end != xdp.data + size)
 		size = xdp.data_end - xdp.data;
 	ret = bpf_test_finish(kattr, uattr, xdp.data, size, retval, duration);
 out:



^ permalink raw reply related	[flat|nested] 66+ messages in thread

* [PATCH net-next 32/33] selftests/bpf: adjust BPF selftest for xdp_adjust_tail
       [not found] <158757160439.1370371.13213378122947426220.stgit@firesoul>
                   ` (30 preceding siblings ...)
  2020-04-22 16:09 ` [PATCH net-next 31/33] bpf: add xdp.frame_sz in bpf_prog_test_run_xdp() Jesper Dangaard Brouer
@ 2020-04-22 16:10 ` Jesper Dangaard Brouer
  2020-04-22 16:10 ` [PATCH net-next 33/33] selftests/bpf: xdp_adjust_tail add grow tail tests Jesper Dangaard Brouer
  32 siblings, 0 replies; 66+ messages in thread
From: Jesper Dangaard Brouer @ 2020-04-22 16:10 UTC (permalink / raw)
  To: sameehj
  Cc: Jesper Dangaard Brouer, netdev, bpf, zorik, akiyano, gtzalik,
	Toke Høiland-Jørgensen, Daniel Borkmann,
	Alexei Starovoitov, John Fastabend, Alexander Duyck,
	Jeff Kirsher, David Ahern, Willem de Bruijn, Ilias Apalodimas,
	Lorenzo Bianconi, Saeed Mahameed, steffen.klassert

The current selftest for the BPF-helper xdp_adjust_tail only shrinks
the tail. Make it clearer that this is a shrink test case.

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
---
 .../selftests/bpf/prog_tests/xdp_adjust_tail.c     |    9 +++++-
 .../testing/selftests/bpf/progs/test_adjust_tail.c |   30 --------------------
 .../bpf/progs/test_xdp_adjust_tail_shrink.c        |   30 ++++++++++++++++++++
 3 files changed, 37 insertions(+), 32 deletions(-)
 delete mode 100644 tools/testing/selftests/bpf/progs/test_adjust_tail.c
 create mode 100644 tools/testing/selftests/bpf/progs/test_xdp_adjust_tail_shrink.c

diff --git a/tools/testing/selftests/bpf/prog_tests/xdp_adjust_tail.c b/tools/testing/selftests/bpf/prog_tests/xdp_adjust_tail.c
index 3744196d7cba..d258f979d5ef 100644
--- a/tools/testing/selftests/bpf/prog_tests/xdp_adjust_tail.c
+++ b/tools/testing/selftests/bpf/prog_tests/xdp_adjust_tail.c
@@ -1,9 +1,9 @@
 // SPDX-License-Identifier: GPL-2.0
 #include <test_progs.h>
 
-void test_xdp_adjust_tail(void)
+void test_xdp_adjust_tail_shrink(void)
 {
-	const char *file = "./test_adjust_tail.o";
+	const char *file = "./test_xdp_adjust_tail_shrink.o";
 	struct bpf_object *obj;
 	char buf[128];
 	__u32 duration, retval, size;
@@ -27,3 +27,8 @@ void test_xdp_adjust_tail(void)
 	      err, errno, retval, size);
 	bpf_object__close(obj);
 }
+
+void test_xdp_adjust_tail(void)
+{
+	test_xdp_adjust_tail_shrink();
+}
diff --git a/tools/testing/selftests/bpf/progs/test_adjust_tail.c b/tools/testing/selftests/bpf/progs/test_adjust_tail.c
deleted file mode 100644
index b7fc85769bdc..000000000000
--- a/tools/testing/selftests/bpf/progs/test_adjust_tail.c
+++ /dev/null
@@ -1,30 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0
- * Copyright (c) 2018 Facebook
- *
- * This program is free software; you can redistribute it and/or
- * modify it under the terms of version 2 of the GNU General Public
- * License as published by the Free Software Foundation.
- */
-#include <linux/bpf.h>
-#include <linux/if_ether.h>
-#include <bpf/bpf_helpers.h>
-
-int _version SEC("version") = 1;
-
-SEC("xdp_adjust_tail")
-int _xdp_adjust_tail(struct xdp_md *xdp)
-{
-	void *data_end = (void *)(long)xdp->data_end;
-	void *data = (void *)(long)xdp->data;
-	int offset = 0;
-
-	if (data_end - data == 54)
-		offset = 256;
-	else
-		offset = 20;
-	if (bpf_xdp_adjust_tail(xdp, 0 - offset))
-		return XDP_DROP;
-	return XDP_TX;
-}
-
-char _license[] SEC("license") = "GPL";
diff --git a/tools/testing/selftests/bpf/progs/test_xdp_adjust_tail_shrink.c b/tools/testing/selftests/bpf/progs/test_xdp_adjust_tail_shrink.c
new file mode 100644
index 000000000000..c8a7c17b54f4
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/test_xdp_adjust_tail_shrink.c
@@ -0,0 +1,30 @@
+/* SPDX-License-Identifier: GPL-2.0
+ * Copyright (c) 2018 Facebook
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of version 2 of the GNU General Public
+ * License as published by the Free Software Foundation.
+ */
+#include <linux/bpf.h>
+#include <linux/if_ether.h>
+#include <bpf/bpf_helpers.h>
+
+int _version SEC("version") = 1;
+
+SEC("xdp_adjust_tail_shrink")
+int _xdp_adjust_tail_shrink(struct xdp_md *xdp)
+{
+	void *data_end = (void *)(long)xdp->data_end;
+	void *data = (void *)(long)xdp->data;
+	int offset = 0;
+
+	if (data_end - data == 54) /* sizeof(pkt_v4) */
+		offset = 256; /* shrink too much */
+	else
+		offset = 20;
+	if (bpf_xdp_adjust_tail(xdp, 0 - offset))
+		return XDP_DROP;
+	return XDP_TX;
+}
+
+char _license[] SEC("license") = "GPL";



^ permalink raw reply related	[flat|nested] 66+ messages in thread

* [PATCH net-next 33/33] selftests/bpf: xdp_adjust_tail add grow tail tests
       [not found] <158757160439.1370371.13213378122947426220.stgit@firesoul>
                   ` (31 preceding siblings ...)
  2020-04-22 16:10 ` [PATCH net-next 32/33] selftests/bpf: adjust BPF selftest for xdp_adjust_tail Jesper Dangaard Brouer
@ 2020-04-22 16:10 ` Jesper Dangaard Brouer
  32 siblings, 0 replies; 66+ messages in thread
From: Jesper Dangaard Brouer @ 2020-04-22 16:10 UTC (permalink / raw)
  To: sameehj
  Cc: Jesper Dangaard Brouer, netdev, bpf, zorik, akiyano, gtzalik,
	Toke Høiland-Jørgensen, Daniel Borkmann,
	Alexei Starovoitov, John Fastabend, Alexander Duyck,
	Jeff Kirsher, David Ahern, Willem de Bruijn, Ilias Apalodimas,
	Lorenzo Bianconi, Saeed Mahameed, steffen.klassert

Extend the BPF selftest xdp_adjust_tail with grow tail tests, which
are added as subtests. The first grow test stays in the same form as
the original shrink test. The second grow test uses the newer
bpf_prog_test_run_xattr() call, and does extra checking of the data
contents.

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
---
 .../selftests/bpf/prog_tests/xdp_adjust_tail.c     |  116 +++++++++++++++++++-
 .../bpf/progs/test_xdp_adjust_tail_grow.c          |   33 ++++++
 2 files changed, 144 insertions(+), 5 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/progs/test_xdp_adjust_tail_grow.c

diff --git a/tools/testing/selftests/bpf/prog_tests/xdp_adjust_tail.c b/tools/testing/selftests/bpf/prog_tests/xdp_adjust_tail.c
index d258f979d5ef..1498627af6e8 100644
--- a/tools/testing/selftests/bpf/prog_tests/xdp_adjust_tail.c
+++ b/tools/testing/selftests/bpf/prog_tests/xdp_adjust_tail.c
@@ -4,10 +4,10 @@
 void test_xdp_adjust_tail_shrink(void)
 {
 	const char *file = "./test_xdp_adjust_tail_shrink.o";
+	__u32 duration, retval, size, expect_sz;
 	struct bpf_object *obj;
-	char buf[128];
-	__u32 duration, retval, size;
 	int err, prog_fd;
+	char buf[128];
 
 	err = bpf_prog_load(file, BPF_PROG_TYPE_XDP, &obj, &prog_fd);
 	if (CHECK_FAIL(err))
@@ -20,15 +20,121 @@ void test_xdp_adjust_tail_shrink(void)
 	      "ipv4", "err %d errno %d retval %d size %d\n",
 	      err, errno, retval, size);
 
+	expect_sz = sizeof(pkt_v6) - 20;  /* Test shrink with 20 bytes */
 	err = bpf_prog_test_run(prog_fd, 1, &pkt_v6, sizeof(pkt_v6),
 				buf, &size, &retval, &duration);
-	CHECK(err || retval != XDP_TX || size != 54,
-	      "ipv6", "err %d errno %d retval %d size %d\n",
+	CHECK(err || retval != XDP_TX || size != expect_sz,
+	      "ipv6", "err %d errno %d retval %d size %d expect-size %d\n",
+	      err, errno, retval, size, expect_sz);
+	bpf_object__close(obj);
+}
+
+void test_xdp_adjust_tail_grow(void)
+{
+	const char *file = "./test_xdp_adjust_tail_grow.o";
+	struct bpf_object *obj;
+	char buf[4096]; /* avoid segfault: large buf to hold grow results */
+	__u32 duration, retval, size, expect_sz;
+	int err, prog_fd;
+
+	err = bpf_prog_load(file, BPF_PROG_TYPE_XDP, &obj, &prog_fd);
+	if (CHECK_FAIL(err))
+		return;
+
+	err = bpf_prog_test_run(prog_fd, 1, &pkt_v4, sizeof(pkt_v4),
+				buf, &size, &retval, &duration);
+	CHECK(err || retval != XDP_DROP,
+	      "ipv4", "err %d errno %d retval %d size %d\n",
 	      err, errno, retval, size);
+
+	expect_sz = sizeof(pkt_v6) + 40; /* Test grow with 40 bytes */
+	err = bpf_prog_test_run(prog_fd, 1, &pkt_v6, sizeof(pkt_v6) /* 74 */,
+				buf, &size, &retval, &duration);
+	CHECK(err || retval != XDP_TX || size != expect_sz,
+	      "ipv6", "err %d errno %d retval %d size %d expect-size %d\n",
+	      err, errno, retval, size, expect_sz);
+
+	bpf_object__close(obj);
+}
+
+void test_xdp_adjust_tail_grow2(void)
+{
+	const char *file = "./test_xdp_adjust_tail_grow.o";
+	char buf[4096]; /* avoid segfault: large buf to hold grow results */
+	int tailroom = 320; /* SKB_DATA_ALIGN(sizeof(struct skb_shared_info)) */
+	struct bpf_object *obj;
+	int err, cnt, i;
+	int max_grow;
+
+	struct bpf_prog_test_run_attr tattr = {
+		.repeat 	= 1,
+		.data_in	= &buf,
+		.data_out	= &buf,
+		.data_size_in	= 0, /* Per test */
+		.data_size_out	= 0, /* Per test */
+	};
+
+	err = bpf_prog_load(file, BPF_PROG_TYPE_XDP, &obj, &tattr.prog_fd);
+	if (CHECK_ATTR(err, "load", "err %d errno %d\n", err, errno))
+		return;
+
+	/* Test case-64 */
+	memset(buf, 1, sizeof(buf));
+	tattr.data_size_in  =  64; /* Determine test case via pkt size */
+	tattr.data_size_out = 128; /* Limit copy_size */
+	/* Kernel side allocates a packet memory area that is zero initialized */
+	err = bpf_prog_test_run_xattr(&tattr);
+
+	CHECK_ATTR(errno != ENOSPC /* Due to copy_size limit in bpf_test_finish */
+		   || tattr.retval != XDP_TX
+		   || tattr.data_size_out != 192, /* Expected grow size */
+		   "case-64",
+		   "err %d errno %d retval %d size %d\n",
+		   err, errno, tattr.retval, tattr.data_size_out);
+
+	/* Extra checks for data contents */
+	CHECK_ATTR(tattr.data_size_out != 192
+		   || buf[0]   != 1 ||  buf[63]  != 1  /*  0-63  memset to 1 */
+		   || buf[64]  != 0 ||  buf[127] != 0  /* 64-127 memset to 0 */
+		   || buf[128] != 1 ||  buf[191] != 1, /*128-191 memset to 1 */
+		   "case-64-data",
+		   "err %d errno %d retval %d size %d\n",
+		   err, errno, tattr.retval, tattr.data_size_out);
+
+	/* Test case-128 */
+	memset(buf, 2, sizeof(buf));
+	tattr.data_size_in  = 128; /* Determine test case via pkt size */
+	tattr.data_size_out = sizeof(buf);   /* Copy everything */
+	err = bpf_prog_test_run_xattr(&tattr);
+
+	max_grow = 4096 - XDP_PACKET_HEADROOM -	tailroom; /* 3520 */
+	CHECK_ATTR(err
+		   || tattr.retval != XDP_TX
+		   || tattr.data_size_out != max_grow, /* Expect max grow size */
+		   "case-128",
+		   "err %d errno %d retval %d size %d expect-size %d\n",
+		   err, errno, tattr.retval, tattr.data_size_out, max_grow);
+
+	/* Extra checks for data contents: Count grow size, will contain zeros */
+	for (i = 0, cnt = 0; i < sizeof(buf); i++) {
+		if (buf[i] == 0)
+			cnt++;
+	}
+	CHECK_ATTR((cnt != (max_grow - tattr.data_size_in)) /* Grow increase */
+		   || tattr.data_size_out != max_grow, /* Total grow size */
+		   "case-128-data",
+		   "err %d errno %d retval %d size %d grow-size %d\n",
+		   err, errno, tattr.retval, tattr.data_size_out, cnt);
+
 	bpf_object__close(obj);
 }
 
 void test_xdp_adjust_tail(void)
 {
-	test_xdp_adjust_tail_shrink();
+	if (test__start_subtest("xdp_adjust_tail_shrink"))
+		test_xdp_adjust_tail_shrink();
+	if (test__start_subtest("xdp_adjust_tail_grow"))
+		test_xdp_adjust_tail_grow();
+	if (test__start_subtest("xdp_adjust_tail_grow2"))
+		test_xdp_adjust_tail_grow2();
 }
diff --git a/tools/testing/selftests/bpf/progs/test_xdp_adjust_tail_grow.c b/tools/testing/selftests/bpf/progs/test_xdp_adjust_tail_grow.c
new file mode 100644
index 000000000000..3d66599eee2e
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/test_xdp_adjust_tail_grow.c
@@ -0,0 +1,33 @@
+// SPDX-License-Identifier: GPL-2.0
+#include <linux/bpf.h>
+#include <bpf/bpf_helpers.h>
+
+SEC("xdp_adjust_tail_grow")
+int _xdp_adjust_tail_grow(struct xdp_md *xdp)
+{
+	void *data_end = (void *)(long)xdp->data_end;
+	void *data = (void *)(long)xdp->data;
+	unsigned int data_len;
+	int offset = 0;
+
+	/* Data length determines the test case */
+	data_len = data_end - data;
+
+	if (data_len == 54) { /* sizeof(pkt_v4) */
+		offset = 4096; /* test too large offset */
+	} else if (data_len == 74) { /* sizeof(pkt_v6) */
+		offset = 40;
+	} else if (data_len == 64) {
+		offset = 128;
+	} else if (data_len == 128) {
+		offset = 4096 - 256 - 320 - data_len; /* Max tail grow 3520 */
+	} else {
+		return XDP_ABORTED; /* No matching test */
+	}
+
+	if (bpf_xdp_adjust_tail(xdp, offset))
+		return XDP_DROP;
+	return XDP_TX;
+}
+
+char _license[] SEC("license") = "GPL";



^ permalink raw reply related	[flat|nested] 66+ messages in thread
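
For reference, the max-grow arithmetic in case-128 above works out as follows
(assuming x86_64, where SKB_DATA_ALIGN(sizeof(struct skb_shared_info)) comes
to the 320 bytes the test hardcodes as tailroom):

	max_grow = frame_sz - XDP_PACKET_HEADROOM - tailroom
	         = 4096 - 256 - 320
	         = 3520

This matches both the BPF-prog side (offset = 4096 - 256 - 320 - data_len)
and the userspace check against tattr.data_size_out.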

* RE: [PATCH net-next 12/33] hv_netvsc: add XDP frame size to driver
  2020-04-22 16:08 ` [PATCH net-next 12/33] hv_netvsc: add XDP frame size to driver Jesper Dangaard Brouer
@ 2020-04-22 16:57   ` Haiyang Zhang
  0 siblings, 0 replies; 66+ messages in thread
From: Haiyang Zhang @ 2020-04-22 16:57 UTC (permalink / raw)
  To: Jesper Dangaard Brouer, sameehj
  Cc: Wei Liu, KY Srinivasan, Stephen Hemminger, netdev, bpf, zorik,
	akiyano, gtzalik, Toke Høiland-Jørgensen,
	Daniel Borkmann, Alexei Starovoitov, John Fastabend,
	Alexander Duyck, Jeff Kirsher, David Ahern, Willem de Bruijn,
	Ilias Apalodimas, Lorenzo Bianconi, Saeed Mahameed,
	steffen.klassert



> -----Original Message-----
> From: Jesper Dangaard Brouer <brouer@redhat.com>
> Sent: Wednesday, April 22, 2020 12:08 PM
> To: sameehj@amazon.com
> Cc: Wei Liu <wei.liu@kernel.org>; KY Srinivasan <kys@microsoft.com>;
> Haiyang Zhang <haiyangz@microsoft.com>; Stephen Hemminger
> <sthemmin@microsoft.com>; Jesper Dangaard Brouer
> <brouer@redhat.com>; netdev@vger.kernel.org; bpf@vger.kernel.org;
> zorik@amazon.com; akiyano@amazon.com; gtzalik@amazon.com; Toke
> Høiland-Jørgensen <toke@redhat.com>; Daniel Borkmann
> <borkmann@iogearbox.net>; Alexei Starovoitov
> <alexei.starovoitov@gmail.com>; John Fastabend
> <john.fastabend@gmail.com>; Alexander Duyck
> <alexander.duyck@gmail.com>; Jeff Kirsher <jeffrey.t.kirsher@intel.com>;
> David Ahern <dsahern@gmail.com>; Willem de Bruijn
> <willemdebruijn.kernel@gmail.com>; Ilias Apalodimas
> <ilias.apalodimas@linaro.org>; Lorenzo Bianconi <lorenzo@kernel.org>;
> Saeed Mahameed <saeedm@mellanox.com>;
> steffen.klassert@secunet.com
> Subject: [PATCH net-next 12/33] hv_netvsc: add XDP frame size to driver
> 
> The hyperv NIC drivers XDP implementation is rather disappointing as it will
> be a slowdown to enable XDP on this driver, given it will allocate a new page
> for each packet and copy over the payload, before invoking the XDP BPF-
> prog.

This statement is not accurate -- the data path of the netvsc driver does memory
allocation and copy even without XDP, so it's not "a slowdown to enable XDP".

Thanks,
- Haiyang


^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH net-next 14/33] net: ethernet: ti: add XDP frame size to driver cpsw
  2020-04-22 16:08 ` [PATCH net-next 14/33] net: ethernet: ti: add XDP frame size to driver cpsw Jesper Dangaard Brouer
@ 2020-04-22 20:28   ` Grygorii Strashko
  0 siblings, 0 replies; 66+ messages in thread
From: Grygorii Strashko @ 2020-04-22 20:28 UTC (permalink / raw)
  To: Jesper Dangaard Brouer, sameehj
  Cc: Ilias Apalodimas, netdev, bpf, zorik, akiyano, gtzalik,
	Toke Høiland-Jørgensen, Daniel Borkmann,
	Alexei Starovoitov, John Fastabend, Alexander Duyck,
	Jeff Kirsher, David Ahern, Willem de Bruijn, Lorenzo Bianconi,
	Saeed Mahameed, steffen.klassert



On 22/04/2020 19:08, Jesper Dangaard Brouer wrote:
> The driver code cpsw.c and cpsw_new.c both use page_pool
> with default order-0 pages or their RX-pages.
> 
> Cc: Grygorii Strashko <grygorii.strashko@ti.com>
> Cc: Ilias Apalodimas <ilias.apalodimas@linaro.org>
> Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
> ---
>   drivers/net/ethernet/ti/cpsw.c     |    1 +
>   drivers/net/ethernet/ti/cpsw_new.c |    1 +
>   2 files changed, 2 insertions(+)
> 
> diff --git a/drivers/net/ethernet/ti/cpsw.c b/drivers/net/ethernet/ti/cpsw.c
> index c2c5bf87da01..58e346ea9898 100644
> --- a/drivers/net/ethernet/ti/cpsw.c
> +++ b/drivers/net/ethernet/ti/cpsw.c
> @@ -406,6 +406,7 @@ static void cpsw_rx_handler(void *token, int len, int status)
>   
>   		xdp.data_hard_start = pa;
>   		xdp.rxq = &priv->xdp_rxq[ch];
> +		xdp.frame_sz = PAGE_SIZE;
>   
>   		port = priv->emac_port + cpsw->data.dual_emac;
>   		ret = cpsw_run_xdp(priv, ch, &xdp, page, port);
> diff --git a/drivers/net/ethernet/ti/cpsw_new.c b/drivers/net/ethernet/ti/cpsw_new.c
> index 9209e613257d..08e1c5b8f00e 100644
> --- a/drivers/net/ethernet/ti/cpsw_new.c
> +++ b/drivers/net/ethernet/ti/cpsw_new.c
> @@ -348,6 +348,7 @@ static void cpsw_rx_handler(void *token, int len, int status)
>   
>   		xdp.data_hard_start = pa;
>   		xdp.rxq = &priv->xdp_rxq[ch];
> +		xdp.frame_sz = PAGE_SIZE;
>   
>   		ret = cpsw_run_xdp(priv, ch, &xdp, page, priv->emac_port);
>   		if (ret != CPSW_XDP_PASS)
> 
> 

Reviewed-by: Grygorii Strashko <grygorii.strashko@ti.com>

-- 
Best regards,
grygorii

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH net-next 18/33] nfp: add XDP frame size to netronome driver
  2020-04-22 16:08 ` [PATCH net-next 18/33] nfp: add XDP frame size to netronome driver Jesper Dangaard Brouer
@ 2020-04-23  2:43   ` Jakub Kicinski
  0 siblings, 0 replies; 66+ messages in thread
From: Jakub Kicinski @ 2020-04-23  2:43 UTC (permalink / raw)
  To: Jesper Dangaard Brouer
  Cc: sameehj, netdev, bpf, zorik, akiyano, gtzalik,
	Toke Høiland-Jørgensen, Daniel Borkmann,
	Alexei Starovoitov, John Fastabend, Alexander Duyck,
	Jeff Kirsher, David Ahern, Willem de Bruijn, Ilias Apalodimas,
	Lorenzo Bianconi, Saeed Mahameed, steffen.klassert

On Wed, 22 Apr 2020 18:08:52 +0200 Jesper Dangaard Brouer wrote:
> The netronome nfp driver uses PAGE_SIZE when xdp_prog is set, but
> xdp.data_hard_start begins at offset NFP_NET_RX_BUF_HEADROOM.
> Thus, adjust for this when setting xdp.frame_sz, as it counts
> from data_hard_start.
> 
> When doing XDP_TX this driver is smart and instead of a full DMA-map
> does a DMA-sync with the packet length. As xdp_adjust_tail can now
> grow the packet length, add checks to make sure that the grow size
> is within the DMA-mapped size.
> 
> Cc: Jakub Kicinski <kuba@kernel.org>
> Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>

Reviewed-by: Jakub Kicinski <kuba@kernel.org>

^ permalink raw reply	[flat|nested] 66+ messages in thread
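
In code form, the frame_sz adjustment described in the commit message amounts
to something like this (a sketch; the exact placement in the nfp RX path is an
assumption):

	/* frame_sz counts from data_hard_start, which in this driver
	 * starts NFP_NET_RX_BUF_HEADROOM bytes into the page, so that
	 * offset must be subtracted from PAGE_SIZE. */
	xdp.frame_sz = PAGE_SIZE - NFP_NET_RX_BUF_HEADROOM;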

* Re: [PATCH net-next 01/33] xdp: add frame size to xdp_buff
  2020-04-22 16:07 ` [PATCH net-next 01/33] xdp: add frame size to xdp_buff Jesper Dangaard Brouer
@ 2020-04-24 14:00   ` Toke Høiland-Jørgensen
  2020-04-28 16:06     ` Jesper Dangaard Brouer
  0 siblings, 1 reply; 66+ messages in thread
From: Toke Høiland-Jørgensen @ 2020-04-24 14:00 UTC (permalink / raw)
  To: Jesper Dangaard Brouer, sameehj
  Cc: Jesper Dangaard Brouer, netdev, bpf, zorik, akiyano, gtzalik,
	Daniel Borkmann, Alexei Starovoitov, John Fastabend,
	Alexander Duyck, Jeff Kirsher, David Ahern, Willem de Bruijn,
	Ilias Apalodimas, Lorenzo Bianconi, Saeed Mahameed,
	steffen.klassert

Jesper Dangaard Brouer <brouer@redhat.com> writes:

> XDP have evolved to support several frame sizes, but xdp_buff was not
> updated with this information. The frame size (frame_sz) member of
> xdp_buff is introduced to know the real size of the memory the frame is
> delivered in.
>
> When introducing this also make it clear that some tailroom is
> reserved/required when creating SKBs using build_skb().
>
> It would also have been an option to introduce a pointer to
> data_hard_end (with reserved offset). The advantage with frame_sz is
> that (like rxq) drivers only need to setup/assign this value once per
> NAPI cycle. Due to XDP-generic (and some drivers) it's not possible to
> store frame_sz inside xdp_rxq_info, because it's varies per packet as it
> can be based/depend on packet length.
>
> Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>

With one possible nit below:

Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>

> ---
>  include/net/xdp.h |   13 +++++++++++++
>  1 file changed, 13 insertions(+)
>
> diff --git a/include/net/xdp.h b/include/net/xdp.h
> index 40c6d3398458..1ccf7df98bee 100644
> --- a/include/net/xdp.h
> +++ b/include/net/xdp.h
> @@ -6,6 +6,8 @@
>  #ifndef __LINUX_NET_XDP_H__
>  #define __LINUX_NET_XDP_H__
>  
> +#include <linux/skbuff.h> /* skb_shared_info */
> +
>  /**
>   * DOC: XDP RX-queue information
>   *
> @@ -70,8 +72,19 @@ struct xdp_buff {
>  	void *data_hard_start;
>  	unsigned long handle;
>  	struct xdp_rxq_info *rxq;
> +	u32 frame_sz; /* frame size to deduct data_hard_end/reserved tailroom*/

I think maybe you want to s/deduct/deduce/ here?

-Toke


^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH net-next 06/33] net: XDP-generic determining XDP frame size
  2020-04-22 16:07 ` [PATCH net-next 06/33] net: XDP-generic determining " Jesper Dangaard Brouer
@ 2020-04-24 14:03   ` Toke Høiland-Jørgensen
  0 siblings, 0 replies; 66+ messages in thread
From: Toke Høiland-Jørgensen @ 2020-04-24 14:03 UTC (permalink / raw)
  To: Jesper Dangaard Brouer, sameehj
  Cc: Jesper Dangaard Brouer, netdev, bpf, zorik, akiyano, gtzalik,
	Daniel Borkmann, Alexei Starovoitov, John Fastabend,
	Alexander Duyck, Jeff Kirsher, David Ahern, Willem de Bruijn,
	Ilias Apalodimas, Lorenzo Bianconi, Saeed Mahameed,
	steffen.klassert

Jesper Dangaard Brouer <brouer@redhat.com> writes:

> The SKB "head" pointer points to the data area that contains
> skb_shared_info, which can be found via skb_end_pointer(). Given
> xdp->data_hard_start has been established (basically pointing to
> skb->head), the frame size is the span between skb_end_pointer() and
> data_hard_start, plus the size reserved for skb_shared_info.
>
> Change the bpf_xdp_adjust_tail offset adjustment of skb->len to be a
> positive offset number on grow, and a negative number on shrink, as
> this seems more natural when reading the code.
>
> Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>

Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>


^ permalink raw reply	[flat|nested] 66+ messages in thread
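
A sketch of the calculation described above, mirroring the commit message (the
exact statement placement in the XDP-generic path is an assumption):

	/* Frame size spans from data_hard_start (skb->head) to
	 * skb_end_pointer(), plus the tailroom reserved for the
	 * skb_shared_info that build_skb() will need on XDP_PASS. */
	xdp->frame_sz = (void *)skb_end_pointer(skb) - xdp->data_hard_start;
	xdp->frame_sz += SKB_DATA_ALIGN(sizeof(struct skb_shared_info));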

* Re: [PATCH net-next 07/33] xdp: xdp_frame add member frame_sz and handle in convert_to_xdp_frame
  2020-04-22 16:07 ` [PATCH net-next 07/33] xdp: xdp_frame add member frame_sz and handle in convert_to_xdp_frame Jesper Dangaard Brouer
@ 2020-04-24 14:04   ` Toke Høiland-Jørgensen
  2020-04-25  3:24   ` Toshiaki Makita
  1 sibling, 0 replies; 66+ messages in thread
From: Toke Høiland-Jørgensen @ 2020-04-24 14:04 UTC (permalink / raw)
  To: Jesper Dangaard Brouer, sameehj
  Cc: Jesper Dangaard Brouer, netdev, bpf, zorik, akiyano, gtzalik,
	Daniel Borkmann, Alexei Starovoitov, John Fastabend,
	Alexander Duyck, Jeff Kirsher, David Ahern, Willem de Bruijn,
	Ilias Apalodimas, Lorenzo Bianconi, Saeed Mahameed,
	steffen.klassert

Jesper Dangaard Brouer <brouer@redhat.com> writes:

> Use a hole in struct xdp_frame when adding member frame_sz, which
> keeps the sizeof the struct the same (32 bytes).
>
> Drivers ixgbe and sfc had bug cases where the necessary/expected
> tailroom was not reserved. This can lead to some hard to catch memory
> corruption issues. With the driver's frame_sz this can be detected when
> the packet length/end via xdp->data_end exceeds the xdp_data_hard_end
> pointer, which accounts for the reserved tailroom.
>
> When detecting this driver issue, simply fail the conversion with NULL,
> which results in feedback to the driver (failing xdp_do_redirect())
> causing the driver to drop the packet. Given the lack of consistent XDP
> stats, this can be hard to troubleshoot. And given this is a driver
> bug, we want to generate some more noise in the form of a WARN stack
> dump (to ID the driver code that inlined convert_to_xdp_frame).
>
> Inlining the WARN macro is problematic, because it adds an asm
> instruction (ud2 on Intel CPUs) that influences instruction cache
> prefetching. Thus, introduce xdp_warn and the macro XDP_WARN, to avoid
> this and at the same time make identifying the function and line of
> this inlined function easier.
>
> Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>

Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>


^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH net-next 08/33] xdp: cpumap redirect use frame_sz and increase skb_tailroom
  2020-04-22 16:08 ` [PATCH net-next 08/33] xdp: cpumap redirect use frame_sz and increase skb_tailroom Jesper Dangaard Brouer
@ 2020-04-24 14:04   ` Toke Høiland-Jørgensen
  0 siblings, 0 replies; 66+ messages in thread
From: Toke Høiland-Jørgensen @ 2020-04-24 14:04 UTC (permalink / raw)
  To: Jesper Dangaard Brouer, sameehj
  Cc: Jesper Dangaard Brouer, netdev, bpf, zorik, akiyano, gtzalik,
	Daniel Borkmann, Alexei Starovoitov, John Fastabend,
	Alexander Duyck, Jeff Kirsher, David Ahern, Willem de Bruijn,
	Ilias Apalodimas, Lorenzo Bianconi, Saeed Mahameed,
	steffen.klassert

Jesper Dangaard Brouer <brouer@redhat.com> writes:

> Knowing the memory size backing the packet/xdp_frame data area, and
> knowing it already has reserved room for skb_shared_info, simplifies
> using build_skb significantly.
>
> With this change we no longer lie about the SKB truesize, but more
> importantly a significantly larger skb_tailroom is now provided, e.g.
> when drivers use a full PAGE_SIZE. This extra tailroom (in the linear
> area) can be used by the network stack when coalescing SKBs (e.g. in
> skb_try_coalesce, see TCP cases where tcp_queue_rcv() can 'eat' the skb).
>
> Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>

Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>


^ permalink raw reply	[flat|nested] 66+ messages in thread
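
A rough sketch of how frame_sz simplifies the SKB construction described above
(the buffer layout follows convert_to_xdp_frame(); the exact cpumap code is an
assumption):

	/* Sketch: build an SKB around an xdp_frame using frame_sz */
	static struct sk_buff *cpu_map_build_skb_sketch(struct xdp_frame *xdpf)
	{
		/* xdp_frame sits at the very top of the buffer */
		void *hard_start = xdpf->data - xdpf->headroom - sizeof(*xdpf);
		struct sk_buff *skb;

		/* frame_sz already accounts for the skb_shared_info
		 * tailroom, so truesize no longer needs to be guessed. */
		skb = build_skb(hard_start, xdpf->frame_sz);
		if (unlikely(!skb))
			return NULL;

		skb_reserve(skb, sizeof(*xdpf) + xdpf->headroom);
		skb_put(skb, xdpf->len);
		return skb;
	}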

* Re: [PATCH net-next 09/33] veth: adjust hard_start offset on redirect XDP frames
  2020-04-22 16:08 ` [PATCH net-next 09/33] veth: adjust hard_start offset on redirect XDP frames Jesper Dangaard Brouer
@ 2020-04-24 14:05   ` Toke Høiland-Jørgensen
  0 siblings, 0 replies; 66+ messages in thread
From: Toke Høiland-Jørgensen @ 2020-04-24 14:05 UTC (permalink / raw)
  To: Jesper Dangaard Brouer, sameehj
  Cc: Toshiaki Makita, Mao Wenan, Toshiaki Makita,
	Jesper Dangaard Brouer, netdev, bpf, zorik, akiyano, gtzalik,
	Daniel Borkmann, Alexei Starovoitov, John Fastabend,
	Alexander Duyck, Jeff Kirsher, David Ahern, Willem de Bruijn,
	Ilias Apalodimas, Lorenzo Bianconi, Saeed Mahameed,
	steffen.klassert

Jesper Dangaard Brouer <brouer@redhat.com> writes:

> When doing native XDP redirect into a veth device, the frame arrives in
> the xdp_frame structure. It is then processed in veth_xdp_rcv_one(),
> which can run a new XDP bpf_prog on the packet. Doing so requires
> converting xdp_frame to xdp_buff, but the tricky part is that the
> xdp_frame memory area is located in the top (data_hard_start) memory
> area that xdp_buff will point into.
>
> The current code tried to protect the xdp_frame area, by assigning
> xdp_buff.data_hard_start past this memory. This results in 32 bytes
> less headroom to expand into via the BPF-helper bpf_xdp_adjust_head().
>
> This protection step is actually not needed, because the BPF-helper
> bpf_xdp_adjust_head() already reserves this area, and doesn't allow the
> BPF-prog to expand into it. Thus, it is safe to point data_hard_start
> directly at the xdp_frame memory area.
>
> Cc: Toshiaki Makita <toshiaki.makita1@gmail.com>
> Fixes: 9fc8d518d9d5 ("veth: Handle xdp_frames in xdp napi ring")
> Reported-by: Mao Wenan <maowenan@huawei.com>
> Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
> Acked-by: Toshiaki Makita <toshiaki.makita1@gmail.com>

Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>


^ permalink raw reply	[flat|nested] 66+ messages in thread
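
In code form, the change described above amounts to roughly this (a sketch,
not the literal diff):

	/* The xdp_frame sits at the top of the buffer, so data_hard_start
	 * can point directly at it; bpf_xdp_adjust_head() already refuses
	 * to let the BPF-prog expand into the sizeof(struct xdp_frame)
	 * area, so no extra protective offset is needed. */
	xdp.data_hard_start = (void *)frame;
	xdp.data = frame->data;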

* Re: [PATCH net-next 10/33] veth: xdp using frame_sz in veth driver
  2020-04-22 16:08 ` [PATCH net-next 10/33] veth: xdp using frame_sz in veth driver Jesper Dangaard Brouer
@ 2020-04-24 14:07   ` Toke Høiland-Jørgensen
  2020-04-25  3:10   ` Toshiaki Makita
  1 sibling, 0 replies; 66+ messages in thread
From: Toke Høiland-Jørgensen @ 2020-04-24 14:07 UTC (permalink / raw)
  To: Jesper Dangaard Brouer, sameehj
  Cc: Toshiaki Makita, Lorenzo Bianconi, Jesper Dangaard Brouer,
	netdev, bpf, zorik, akiyano, gtzalik, Daniel Borkmann,
	Alexei Starovoitov, John Fastabend, Alexander Duyck,
	Jeff Kirsher, David Ahern, Willem de Bruijn, Ilias Apalodimas,
	Lorenzo Bianconi, Saeed Mahameed, steffen.klassert

Jesper Dangaard Brouer <brouer@redhat.com> writes:

> The veth driver can run XDP in "native" mode in its own NAPI
> handler, and since commit 9fc8d518d9d5 ("veth: Handle xdp_frames in
> xdp napi ring") packets can come in two forms, either xdp_frame or
> skb, calling respectively veth_xdp_rcv_one() or veth_xdp_rcv_skb().
>
> For packets to arrive in xdp_frame format, they will have been
> redirected from an XDP native driver. In case of XDP_PASS or no
> XDP-prog attached, the veth driver will allocate and create an SKB.
>
> The current code in the veth_xdp_rcv_one() xdp_frame case had to guess
> the frame truesize of the incoming xdp_frame, when using
> veth_build_skb(). With xdp_frame->frame_sz this is no longer
> necessary.
>
> Calculating the frame_sz in the veth_xdp_rcv_skb() skb case is done
> similarly to the XDP-generic handling code in net/core/dev.c.
>
> Cc: Toshiaki Makita <toshiaki.makita1@gmail.com>
> Reviewed-by: Lorenzo Bianconi <lorenzo@kernel.org>
> Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>

Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>


^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH net-next 29/33] xdp: allow bpf_xdp_adjust_tail() to grow packet size
  2020-04-22 16:09 ` [PATCH net-next 29/33] xdp: allow bpf_xdp_adjust_tail() to grow packet size Jesper Dangaard Brouer
@ 2020-04-24 14:09   ` Toke Høiland-Jørgensen
  2020-04-27 19:01   ` Daniel Borkmann
  1 sibling, 0 replies; 66+ messages in thread
From: Toke Høiland-Jørgensen @ 2020-04-24 14:09 UTC (permalink / raw)
  To: Jesper Dangaard Brouer, sameehj
  Cc: Jesper Dangaard Brouer, netdev, bpf, zorik, akiyano, gtzalik,
	Daniel Borkmann, Alexei Starovoitov, John Fastabend,
	Alexander Duyck, Jeff Kirsher, David Ahern, Willem de Bruijn,
	Ilias Apalodimas, Lorenzo Bianconi, Saeed Mahameed,
	steffen.klassert

Jesper Dangaard Brouer <brouer@redhat.com> writes:

> Finally, after all drivers have a frame size, allow the BPF-helper
> bpf_xdp_adjust_tail() to grow or extend the packet size at the frame
> tail.
>
> Remember that the helper/macro xdp_data_hard_end has reserved some
> tailroom.  Thus, this helper makes sure that the BPF-prog doesn't have
> access to this tailroom area.
>
> Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>

Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>


^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH net-next 30/33] xdp: clear grow memory in bpf_xdp_adjust_tail()
  2020-04-22 16:09 ` [PATCH net-next 30/33] xdp: clear grow memory in bpf_xdp_adjust_tail() Jesper Dangaard Brouer
@ 2020-04-24 14:09   ` Toke Høiland-Jørgensen
  2020-04-27  5:26   ` John Fastabend
  1 sibling, 0 replies; 66+ messages in thread
From: Toke Høiland-Jørgensen @ 2020-04-24 14:09 UTC (permalink / raw)
  To: Jesper Dangaard Brouer, sameehj
  Cc: Jesper Dangaard Brouer, netdev, bpf, zorik, akiyano, gtzalik,
	Daniel Borkmann, Alexei Starovoitov, John Fastabend,
	Alexander Duyck, Jeff Kirsher, David Ahern, Willem de Bruijn,
	Ilias Apalodimas, Lorenzo Bianconi, Saeed Mahameed,
	steffen.klassert

Jesper Dangaard Brouer <brouer@redhat.com> writes:

> Clear the memory of the tail when a grow happens, because it is too
> easy to write an XDP_PASS program that extends the tail, which exposes
> this memory to users that can run tcpdump.
>
> Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>

Probably a good precaution :)

Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>


^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH net-next 28/33] mlx5: rx queue setup time determine frame_sz for XDP
  2020-04-22 16:09 ` [PATCH net-next 28/33] mlx5: rx queue setup time determine frame_sz for XDP Jesper Dangaard Brouer
@ 2020-04-25  0:58   ` Alexei Starovoitov
  2020-04-27 20:22   ` Jesper Dangaard Brouer
  1 sibling, 0 replies; 66+ messages in thread
From: Alexei Starovoitov @ 2020-04-25  0:58 UTC (permalink / raw)
  To: Jesper Dangaard Brouer
  Cc: sameehj, Tariq Toukan, Saeed Mahameed, Network Development, bpf,
	zorik, Arthur Kiyanovski, gtzalik,
	Toke Høiland-Jørgensen, Daniel Borkmann,
	John Fastabend, Alexander Duyck, Jeff Kirsher, David Ahern,
	Willem de Bruijn, Ilias Apalodimas, Lorenzo Bianconi,
	Steffen Klassert

On Wed, Apr 22, 2020 at 9:09 AM Jesper Dangaard Brouer
<brouer@redhat.com> wrote:
>
> The mlx5 driver has multiple memory models, which are also changed
> according to whether an XDP bpf_prog is attached.
>
> The 'rx_striding_rq' setting is adjusted via ethtool priv-flags e.g.:
>  # ethtool --set-priv-flags mlx5p2 rx_striding_rq off
>
> In the general case with 4K page_size and a regular MTU packet, the
> frame_sz is 2048, and 4096 when XDP is enabled, in both modes.
>
> The info on the given frame size is stored differently depending on the
> RQ-mode and encoded in a union in struct mlx5e_rq union wqe/mpwqe.
> In rx striding mode rq->mpwqe.log_stride_sz is either 11 or 12, which
> corresponds to 2048 or 4096 (MLX5_WQ_TYPE_LINKED_LIST_STRIDING_RQ).
> In non-striding mode (MLX5_WQ_TYPE_CYCLIC) the frag_stride is stored
> in rq->wqe.info.arr[0].frag_stride, for the first fragment, which is
> what the XDP case cares about.
>
> To reduce the effect on the fast-path, this patch determines the
> frame_sz at setup time, to avoid determining the memory model at
> runtime. The variable is named first_frame_sz to make it clear that
> this is only the frame size of the first fragment.
>
> The mlx5 driver does a DMA-sync on the XDP_TX action, but grow is safe
> as it has done a DMA-map on the entire PAGE_SIZE. The driver also
> already does an XDP length check against sq->hw_mtu on the possible
> XDP xmit paths mlx5e_xmit_xdp_frame() + mlx5e_xmit_xdp_frame_mpwqe().
>
> Cc: Tariq Toukan <tariqt@mellanox.com>
> Cc: Saeed Mahameed <saeedm@mellanox.com>
> Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
> ---
>  drivers/net/ethernet/mellanox/mlx5/core/en.h      |    1 +
>  drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c  |    1 +
>  drivers/net/ethernet/mellanox/mlx5/core/en_main.c |    6 ++++++
>  3 files changed, 8 insertions(+)


Hey mellanox folks,

you had an active discussion regarding the mlx5 changes earlier.
Were your concerns resolved?
If so, could you please ack.

^ permalink raw reply	[flat|nested] 66+ messages in thread
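
A condensed sketch of the setup-time logic the commit message describes (the
surrounding function and exact field spelling are assumptions based on the
description):

	/* Determine the first fragment's frame size once at RQ setup
	 * time, instead of deriving the memory model per packet. */
	if (rq->wq_type == MLX5_WQ_TYPE_LINKED_LIST_STRIDING_RQ)
		/* log_stride_sz is 11 or 12, i.e. 2048 or 4096 */
		rq->first_frame_sz = BIT(rq->mpwqe.log_stride_sz);
	else /* MLX5_WQ_TYPE_CYCLIC */
		rq->first_frame_sz = rq->wqe.info.arr[0].frag_stride;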

* Re: [PATCH net-next 10/33] veth: xdp using frame_sz in veth driver
  2020-04-22 16:08 ` [PATCH net-next 10/33] veth: xdp using frame_sz in veth driver Jesper Dangaard Brouer
  2020-04-24 14:07   ` Toke Høiland-Jørgensen
@ 2020-04-25  3:10   ` Toshiaki Makita
  1 sibling, 0 replies; 66+ messages in thread
From: Toshiaki Makita @ 2020-04-25  3:10 UTC (permalink / raw)
  To: Jesper Dangaard Brouer
  Cc: sameehj, Lorenzo Bianconi, netdev, bpf, zorik, akiyano, gtzalik,
	Toke Høiland-Jørgensen, Daniel Borkmann,
	Alexei Starovoitov, John Fastabend, Alexander Duyck,
	Jeff Kirsher, David Ahern, Willem de Bruijn, Ilias Apalodimas,
	Saeed Mahameed, steffen.klassert

On 2020/04/23 1:08, Jesper Dangaard Brouer wrote:
> The veth driver can run XDP in "native" mode in its own NAPI
> handler, and since commit 9fc8d518d9d5 ("veth: Handle xdp_frames in
> xdp napi ring") packets can come in two forms, either xdp_frame or
> skb, calling respectively veth_xdp_rcv_one() or veth_xdp_rcv_skb().
> 
> For packets to arrive in xdp_frame format, they will have been
> redirected from an XDP native driver. In case of XDP_PASS or no
> XDP-prog attached, the veth driver will allocate and create an SKB.
> 
> The current code in the veth_xdp_rcv_one() xdp_frame case had to guess
> the frame truesize of the incoming xdp_frame, when using
> veth_build_skb(). With xdp_frame->frame_sz this is no longer
> necessary.
> 
> Calculating the frame_sz in the veth_xdp_rcv_skb() skb case is done
> similarly to the XDP-generic handling code in net/core/dev.c.
> 
> Cc: Toshiaki Makita <toshiaki.makita1@gmail.com>
> Reviewed-by: Lorenzo Bianconi <lorenzo@kernel.org>
> Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>

Acked-by: Toshiaki Makita <toshiaki.makita1@gmail.com>

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH net-next 07/33] xdp: xdp_frame add member frame_sz and handle in convert_to_xdp_frame
  2020-04-22 16:07 ` [PATCH net-next 07/33] xdp: xdp_frame add member frame_sz and handle in convert_to_xdp_frame Jesper Dangaard Brouer
  2020-04-24 14:04   ` Toke Høiland-Jørgensen
@ 2020-04-25  3:24   ` Toshiaki Makita
  2020-04-27 15:20     ` Jesper Dangaard Brouer
  1 sibling, 1 reply; 66+ messages in thread
From: Toshiaki Makita @ 2020-04-25  3:24 UTC (permalink / raw)
  To: Jesper Dangaard Brouer
  Cc: sameehj, netdev, bpf, zorik, akiyano, gtzalik,
	Toke Høiland-Jørgensen, Daniel Borkmann,
	Alexei Starovoitov, John Fastabend, Alexander Duyck,
	Jeff Kirsher, David Ahern, Willem de Bruijn, Ilias Apalodimas,
	Lorenzo Bianconi, Saeed Mahameed, steffen.klassert

On 2020/04/23 1:07, Jesper Dangaard Brouer wrote:
> Use a hole in struct xdp_frame when adding member frame_sz, which
> keeps the sizeof the struct the same (32 bytes).
> 
> Drivers ixgbe and sfc had bug cases where the necessary/expected
> tailroom was not reserved. This can lead to some hard to catch memory
> corruption issues. With the driver's frame_sz this can be detected when
> the packet length/end via xdp->data_end exceeds the xdp_data_hard_end
> pointer, which accounts for the reserved tailroom.
> 
> When detecting this driver issue, simply fail the conversion with NULL,
> which results in feedback to the driver (failing xdp_do_redirect())
> causing the driver to drop the packet. Given the lack of consistent XDP
> stats, this can be hard to troubleshoot. And given this is a driver
> bug, we want to generate some more noise in the form of a WARN stack
> dump (to ID the driver code that inlined convert_to_xdp_frame).
> 
> Inlining the WARN macro is problematic, because it adds an asm
> instruction (ud2 on Intel CPUs) that influences instruction cache
> prefetching. Thus, introduce xdp_warn and the macro XDP_WARN, to avoid
> this and at the same time make identifying the function and line of
> this inlined function easier.
> 
> Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
> ---
>   include/net/xdp.h |   14 +++++++++++++-
>   net/core/xdp.c    |    7 +++++++
>   2 files changed, 20 insertions(+), 1 deletion(-)
> 
> diff --git a/include/net/xdp.h b/include/net/xdp.h
> index 99f4374f6214..55a885aa4e53 100644
> --- a/include/net/xdp.h
> +++ b/include/net/xdp.h
> @@ -93,7 +93,8 @@ struct xdp_frame {
>   	void *data;
>   	u16 len;
>   	u16 headroom;
> -	u16 metasize;
> +	u32 metasize:8;
> +	u32 frame_sz:24;
>   	/* Lifetime of xdp_rxq_info is limited to NAPI/enqueue time,
>   	 * while mem info is valid on remote CPU.
>   	 */
> @@ -108,6 +109,10 @@ static inline void xdp_scrub_frame(struct xdp_frame *frame)
>   	frame->dev_rx = NULL;
>   }
>   
> +/* Avoids inlining WARN macro in fast-path */
> +void xdp_warn(const char* msg, const char* func, const int line);
> +#define XDP_WARN(msg) xdp_warn(msg, __func__, __LINE__)

Shouldn't this have a WARN_ONCE()-like mechanism?
A buggy driver may generate a massive amount of dump messages...

Toshiaki Makita

> +
>   struct xdp_frame *xdp_convert_zc_to_xdp_frame(struct xdp_buff *xdp);
>   
>   /* Convert xdp_buff to xdp_frame */
> @@ -128,6 +133,12 @@ struct xdp_frame *convert_to_xdp_frame(struct xdp_buff *xdp)
>   	if (unlikely((headroom - metasize) < sizeof(*xdp_frame)))
>   		return NULL;
>   
> +	/* Catch if driver didn't reserve tailroom for skb_shared_info */
> +	if (unlikely(xdp->data_end > xdp_data_hard_end(xdp))) {
> +		XDP_WARN("Driver BUG: missing reserved tailroom");
> +		return NULL;
> +	}
> +
>   	/* Store info in top of packet */
>   	xdp_frame = xdp->data_hard_start;
>   
> @@ -135,6 +146,7 @@ struct xdp_frame *convert_to_xdp_frame(struct xdp_buff *xdp)
>   	xdp_frame->len  = xdp->data_end - xdp->data;
>   	xdp_frame->headroom = headroom - sizeof(*xdp_frame);
>   	xdp_frame->metasize = metasize;
> +	xdp_frame->frame_sz = xdp->frame_sz;
>   
>   	/* rxq only valid until napi_schedule ends, convert to xdp_mem_info */
>   	xdp_frame->mem = xdp->rxq->mem;
> diff --git a/net/core/xdp.c b/net/core/xdp.c
> index 4c7ea85486af..4bc3026ae218 100644
> --- a/net/core/xdp.c
> +++ b/net/core/xdp.c
> @@ -11,6 +11,7 @@
>   #include <linux/slab.h>
>   #include <linux/idr.h>
>   #include <linux/rhashtable.h>
> +#include <linux/bug.h>
>   #include <net/page_pool.h>
>   
>   #include <net/xdp.h>
> @@ -496,3 +497,9 @@ struct xdp_frame *xdp_convert_zc_to_xdp_frame(struct xdp_buff *xdp)
>   	return xdpf;
>   }
>   EXPORT_SYMBOL_GPL(xdp_convert_zc_to_xdp_frame);
> +
> +/* Used by XDP_WARN macro, to avoid inlining WARN() in fast-path */
> +void xdp_warn(const char* msg, const char* func, const int line) {
> +	WARN(1, "XDP_WARN: %s(line:%d): %s\n", func, line, msg);
> +};
> +EXPORT_SYMBOL_GPL(xdp_warn);
> 
> 

^ permalink raw reply	[flat|nested] 66+ messages in thread
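
For comparison, a once-only variant along the lines suggested above could look
like this (a sketch only; as seen later in the thread, the series deliberately
keeps the loud per-event WARN):

	/* Hypothetical rate-limited alternative to xdp_warn(). Note that
	 * with WARN_ONCE() placed out-of-line it fires once globally, not
	 * once per call-site, so only the first offending func/line pair
	 * would ever be printed. */
	void xdp_warn_once(const char *msg, const char *func, const int line)
	{
		WARN_ONCE(1, "XDP_WARN: %s(line:%d): %s\n", func, line, msg);
	}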

* RE: [PATCH net-next 30/33] xdp: clear grow memory in bpf_xdp_adjust_tail()
  2020-04-22 16:09 ` [PATCH net-next 30/33] xdp: clear grow memory in bpf_xdp_adjust_tail() Jesper Dangaard Brouer
  2020-04-24 14:09   ` Toke Høiland-Jørgensen
@ 2020-04-27  5:26   ` John Fastabend
  2020-04-28 14:50     ` Jesper Dangaard Brouer
  1 sibling, 1 reply; 66+ messages in thread
From: John Fastabend @ 2020-04-27  5:26 UTC (permalink / raw)
  To: Jesper Dangaard Brouer, sameehj
  Cc: Jesper Dangaard Brouer, netdev, bpf, zorik, akiyano, gtzalik,
	Toke Høiland-Jørgensen, Daniel Borkmann,
	Alexei Starovoitov, John Fastabend, Alexander Duyck,
	Jeff Kirsher, David Ahern, Willem de Bruijn, Ilias Apalodimas,
	Lorenzo Bianconi, Saeed Mahameed, steffen.klassert

Jesper Dangaard Brouer wrote:
> Clear the memory of the tail when a grow happens, because it is too
> easy to write an XDP_PASS program that extends the tail, which exposes
> this memory to users that can run tcpdump.
> 
> Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
> ---

Hi Jesper, thanks for the series. Any idea what the cost of doing
this is? If you have some data I would be curious to see a baseline
measurement, a grow without memset, then a grow with memset.
I'm guessing this can be relatively expensive?

>  net/core/filter.c |    4 ++++
>  1 file changed, 4 insertions(+)
> 
> diff --git a/net/core/filter.c b/net/core/filter.c
> index 5e9c387f74eb..889d96a690c2 100644
> --- a/net/core/filter.c
> +++ b/net/core/filter.c
> @@ -3442,6 +3442,10 @@ BPF_CALL_2(bpf_xdp_adjust_tail, struct xdp_buff *, xdp, int, offset)
>  	if (unlikely(data_end < xdp->data + ETH_HLEN))
>  		return -EINVAL;
>  
> +	/* Clear memory area on grow, can contain uninit kernel memory */
> +	if (offset > 0)
> +		memset(xdp->data_end, 0, offset);
> +
>  	xdp->data_end = data_end;
>  
>  	return 0;
> 
> 



^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH net-next 20/33] vhost_net: also populate XDP frame size
  2020-04-22 16:09 ` [PATCH net-next 20/33] vhost_net: also populate " Jesper Dangaard Brouer
@ 2020-04-27  5:50   ` Jason Wang
  2020-04-30  9:54     ` Jesper Dangaard Brouer
  0 siblings, 1 reply; 66+ messages in thread
From: Jason Wang @ 2020-04-27  5:50 UTC (permalink / raw)
  To: Jesper Dangaard Brouer, sameehj
  Cc: netdev, bpf, zorik, akiyano, gtzalik,
	Toke Høiland-Jørgensen, Daniel Borkmann,
	Alexei Starovoitov, John Fastabend, Alexander Duyck,
	Jeff Kirsher, David Ahern, Willem de Bruijn, Ilias Apalodimas,
	Lorenzo Bianconi, Saeed Mahameed, steffen.klassert


On 2020/4/23 上午12:09, Jesper Dangaard Brouer wrote:
> In vhost_net_build_xdp() the 'buf' that gets queued via an xdp_buff
> has an embedded struct tun_xdp_hdr (located at xdp->data_hard_start)
> which contains the buffer length 'buflen' (with tailroom for
> skb_shared_info). Also storing this buflen in xdp->frame_sz does not
> obsolete struct tun_xdp_hdr, as it also contains a struct
> virtio_net_hdr with other information.
>
> Cc: Jason Wang <jasowang@redhat.com>
> Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
> ---
>   drivers/vhost/net.c |    1 +
>   1 file changed, 1 insertion(+)
>
> diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
> index 87469d67ede8..69af007e22f4 100644
> --- a/drivers/vhost/net.c
> +++ b/drivers/vhost/net.c
> @@ -745,6 +745,7 @@ static int vhost_net_build_xdp(struct vhost_net_virtqueue *nvq,
>   	xdp->data = buf + pad;
>   	xdp->data_end = xdp->data + len;
>   	hdr->buflen = buflen;
> +	xdp->frame_sz = buflen;
>   
>   	--net->refcnt_bias;
>   	alloc_frag->offset += buflen;


tun_xdp_one() will use hdr->buflen as the frame_sz (patch 19), so it
looks to me like there's no need to do this?

Thanks


>
>


^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH net-next 19/33] tun: add XDP frame size
  2020-04-22 16:08 ` [PATCH net-next 19/33] tun: add XDP frame size Jesper Dangaard Brouer
@ 2020-04-27  5:51   ` Jason Wang
  2020-05-06 20:30   ` Michael S. Tsirkin
  1 sibling, 0 replies; 66+ messages in thread
From: Jason Wang @ 2020-04-27  5:51 UTC (permalink / raw)
  To: Jesper Dangaard Brouer, sameehj
  Cc: netdev, bpf, zorik, akiyano, gtzalik,
	Toke Høiland-Jørgensen, Daniel Borkmann,
	Alexei Starovoitov, John Fastabend, Alexander Duyck,
	Jeff Kirsher, David Ahern, Willem de Bruijn, Ilias Apalodimas,
	Lorenzo Bianconi, Saeed Mahameed, steffen.klassert


On 2020/4/23 上午12:08, Jesper Dangaard Brouer wrote:
> The tun driver has two code paths for running XDP (bpf_prog_run_xdp).
> In both cases 'buflen' contains enough tailroom for skb_shared_info.
>
> Cc: Jason Wang <jasowang@redhat.com>
> Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>


Acked-by: Jason Wang <jasowang@redhat.com>


> ---
>   drivers/net/tun.c |    2 ++
>   1 file changed, 2 insertions(+)
>
> diff --git a/drivers/net/tun.c b/drivers/net/tun.c
> index 44889eba1dbc..c54f967e2c66 100644
> --- a/drivers/net/tun.c
> +++ b/drivers/net/tun.c
> @@ -1671,6 +1671,7 @@ static struct sk_buff *tun_build_skb(struct tun_struct *tun,
>   		xdp_set_data_meta_invalid(&xdp);
>   		xdp.data_end = xdp.data + len;
>   		xdp.rxq = &tfile->xdp_rxq;
> +		xdp.frame_sz = buflen;
>   
>   		act = bpf_prog_run_xdp(xdp_prog, &xdp);
>   		if (act == XDP_REDIRECT || act == XDP_TX) {
> @@ -2411,6 +2412,7 @@ static int tun_xdp_one(struct tun_struct *tun,
>   		}
>   		xdp_set_data_meta_invalid(xdp);
>   		xdp->rxq = &tfile->xdp_rxq;
> +		xdp->frame_sz = buflen;
>   
>   		act = bpf_prog_run_xdp(xdp_prog, xdp);
>   		err = tun_xdp_act(tun, xdp_prog, xdp, act);
>
>


^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH net-next 21/33] virtio_net: add XDP frame size in two code paths
  2020-04-22 16:09 ` [PATCH net-next 21/33] virtio_net: add XDP frame size in two code paths Jesper Dangaard Brouer
@ 2020-04-27  7:21   ` Jason Wang
  2020-04-27 14:32     ` Jesper Dangaard Brouer
  0 siblings, 1 reply; 66+ messages in thread
From: Jason Wang @ 2020-04-27  7:21 UTC (permalink / raw)
  To: Jesper Dangaard Brouer, sameehj
  Cc: netdev, bpf, zorik, akiyano, gtzalik,
	Toke Høiland-Jørgensen, Daniel Borkmann,
	Alexei Starovoitov, John Fastabend, Alexander Duyck,
	Jeff Kirsher, David Ahern, Willem de Bruijn, Ilias Apalodimas,
	Lorenzo Bianconi, Saeed Mahameed, steffen.klassert


On 2020/4/23 上午12:09, Jesper Dangaard Brouer wrote:
> The virtio_net driver is running inside the guest-OS. There are two
> XDP receive code-paths in virtio_net, namely receive_small() and
> receive_mergeable(). The receive_big() function does not support XDP.
>
> In receive_small() the frame size is available in buflen. The buffers
> backing these frames are allocated in add_recvbuf_small() with the same
> size, except for the headroom, but the tailroom has reserved room for
> skb_shared_info. The headroom is encoded in the ctx pointer as a value.
>
> In receive_mergeable() the frame size is more dynamic. There are two
> basic cases: (1) the buffer size is based on an exponentially weighted
> moving average (see DECLARE_EWMA) of the packet length. Or (2) in case
> virtnet_get_headroom() has any headroom, the buffer size is
> PAGE_SIZE. The ctx pointer is this time used for encoding two values;
> the buffer len "truesize" and the headroom. In case (1) if the rx buffer
> size is underestimated, the packet will have been split over more
> buffers (num_buf info in virtio_net_hdr_mrg_rxbuf placed in the top of
> the buffer area). If that happens the XDP path does an
> xdp_linearize_page operation.
>
> Cc: Jason Wang <jasowang@redhat.com>
> Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
> ---
>   drivers/net/virtio_net.c |   15 ++++++++++++---
>   1 file changed, 12 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index 11f722460513..1df3676da185 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -689,6 +689,7 @@ static struct sk_buff *receive_small(struct net_device *dev,
>   		xdp.data_end = xdp.data + len;
>   		xdp.data_meta = xdp.data;
>   		xdp.rxq = &rq->xdp_rxq;
> +		xdp.frame_sz = buflen;
>   		orig_data = xdp.data;
>   		act = bpf_prog_run_xdp(xdp_prog, &xdp);
>   		stats->xdp_packets++;
> @@ -797,10 +798,11 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
>   	int offset = buf - page_address(page);
>   	struct sk_buff *head_skb, *curr_skb;
>   	struct bpf_prog *xdp_prog;
> -	unsigned int truesize;
> +	unsigned int truesize = mergeable_ctx_to_truesize(ctx);
>   	unsigned int headroom = mergeable_ctx_to_headroom(ctx);
> -	int err;
>   	unsigned int metasize = 0;
> +	unsigned int frame_sz;
> +	int err;
>   
>   	head_skb = NULL;
>   	stats->bytes += len - vi->hdr_len;
> @@ -821,6 +823,11 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
>   		if (unlikely(hdr->hdr.gso_type))
>   			goto err_xdp;
>   
> +		/* Buffers with headroom use PAGE_SIZE as alloc size,
> +		 * see add_recvbuf_mergeable() + get_mergeable_buf_len()
> +		 */
> +		frame_sz = headroom ? PAGE_SIZE : truesize;
> +
>   		/* This happens when rx buffer size is underestimated
>   		 * or headroom is not enough because of the buffer
>   		 * was refilled before XDP is set. This should only
> @@ -834,6 +841,8 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
>   						      page, offset,
>   						      VIRTIO_XDP_HEADROOM,
>   						      &len);
> +			frame_sz = PAGE_SIZE;


Should this be PAGE_SIZE -  SKB_DATA_ALIGN(sizeof(struct skb_shared_info))?


> +
>   			if (!xdp_page)
>   				goto err_xdp;
>   			offset = VIRTIO_XDP_HEADROOM;
> @@ -850,6 +859,7 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
>   		xdp.data_end = xdp.data + (len - vi->hdr_len);
>   		xdp.data_meta = xdp.data;
>   		xdp.rxq = &rq->xdp_rxq;
> +		xdp.frame_sz = frame_sz;


Maybe we can check this easily with

xdp.frame_sz = (xdp_page == page) ? truesize : ...

Thanks


>   
>   		act = bpf_prog_run_xdp(xdp_prog, &xdp);
>   		stats->xdp_packets++;
> @@ -924,7 +934,6 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
>   	}
>   	rcu_read_unlock();
>   
> -	truesize = mergeable_ctx_to_truesize(ctx);
>   	if (unlikely(len > truesize)) {
>   		pr_debug("%s: rx error: len %u exceeds truesize %lu\n",
>   			 dev->name, len, (unsigned long)ctx);
>
>


^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH net-next 21/33] virtio_net: add XDP frame size in two code paths
  2020-04-27  7:21   ` Jason Wang
@ 2020-04-27 14:32     ` Jesper Dangaard Brouer
  2020-04-28  9:50       ` Jason Wang
  0 siblings, 1 reply; 66+ messages in thread
From: Jesper Dangaard Brouer @ 2020-04-27 14:32 UTC (permalink / raw)
  To: Jason Wang
  Cc: sameehj, netdev, bpf, zorik, akiyano, gtzalik,
	Toke Høiland-Jørgensen, Daniel Borkmann,
	Alexei Starovoitov, John Fastabend, Alexander Duyck,
	Jeff Kirsher, David Ahern, Willem de Bruijn, Ilias Apalodimas,
	Lorenzo Bianconi, Saeed Mahameed, steffen.klassert, brouer

On Mon, 27 Apr 2020 15:21:02 +0800
Jason Wang <jasowang@redhat.com> wrote:

> On 2020/4/23 上午12:09, Jesper Dangaard Brouer wrote:
> > The virtio_net driver is running inside the guest-OS. There are two
> > XDP receive code-paths in virtio_net, namely receive_small() and
> > receive_mergeable(). The receive_big() function does not support XDP.
> >
> > In receive_small() the frame size is available in buflen. The buffers
> > backing these frames are allocated in add_recvbuf_small() with the same
> > size, except for the headroom, but the tailroom has reserved room for
> > skb_shared_info. The headroom is encoded in the ctx pointer as a value.
> >
> > In receive_mergeable() the frame size is more dynamic. There are two
> > basic cases: (1) the buffer size is based on an exponentially weighted
> > moving average (see DECLARE_EWMA) of the packet length. Or (2) in case
> > virtnet_get_headroom() has any headroom, the buffer size is
> > PAGE_SIZE. The ctx pointer is this time used for encoding two values;
> > the buffer len "truesize" and the headroom. In case (1) if the rx buffer
> > size is underestimated, the packet will have been split over more
> > buffers (num_buf info in virtio_net_hdr_mrg_rxbuf placed in the top of
> > the buffer area). If that happens the XDP path does an
> > xdp_linearize_page operation.
> >
> > Cc: Jason Wang <jasowang@redhat.com>
> > Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
> > ---
> >   drivers/net/virtio_net.c |   15 ++++++++++++---
> >   1 file changed, 12 insertions(+), 3 deletions(-)
> >
> > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > index 11f722460513..1df3676da185 100644
> > --- a/drivers/net/virtio_net.c
> > +++ b/drivers/net/virtio_net.c
> > @@ -689,6 +689,7 @@ static struct sk_buff *receive_small(struct net_device *dev,
> >   		xdp.data_end = xdp.data + len;
> >   		xdp.data_meta = xdp.data;
> >   		xdp.rxq = &rq->xdp_rxq;
> > +		xdp.frame_sz = buflen;
> >   		orig_data = xdp.data;
> >   		act = bpf_prog_run_xdp(xdp_prog, &xdp);
> >   		stats->xdp_packets++;
> > @@ -797,10 +798,11 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
> >   	int offset = buf - page_address(page);
> >   	struct sk_buff *head_skb, *curr_skb;
> >   	struct bpf_prog *xdp_prog;
> > -	unsigned int truesize;
> > +	unsigned int truesize = mergeable_ctx_to_truesize(ctx);
> >   	unsigned int headroom = mergeable_ctx_to_headroom(ctx);
> > -	int err;
> >   	unsigned int metasize = 0;
> > +	unsigned int frame_sz;
> > +	int err;
> >   
> >   	head_skb = NULL;
> >   	stats->bytes += len - vi->hdr_len;
> > @@ -821,6 +823,11 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
> >   		if (unlikely(hdr->hdr.gso_type))
> >   			goto err_xdp;
> >   
> > +		/* Buffers with headroom use PAGE_SIZE as alloc size,
> > +		 * see add_recvbuf_mergeable() + get_mergeable_buf_len()
> > +		 */
> > +		frame_sz = headroom ? PAGE_SIZE : truesize;
> > +
> >   		/* This happens when rx buffer size is underestimated
> >   		 * or headroom is not enough because of the buffer
> >   		 * was refilled before XDP is set. This should only
> > @@ -834,6 +841,8 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
> >   						      page, offset,
> >   						      VIRTIO_XDP_HEADROOM,
> >   						      &len);
> > +			frame_sz = PAGE_SIZE;  
> 
> 
> Should this be PAGE_SIZE -  SKB_DATA_ALIGN(sizeof(struct skb_shared_info))?

No, frame_sz includes the SKB_DATA_ALIGN(sizeof(struct skb_shared_info)) length.

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer


^ permalink raw reply	[flat|nested] 66+ messages in thread
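
Spelled out, the convention restated above is that frame_sz covers the whole
buffer counted from data_hard_start:

	frame_sz = headroom + data_len + tailroom
	/* where tailroom >= SKB_DATA_ALIGN(sizeof(struct skb_shared_info)) */

and the xdp_data_hard_end() macro subtracts that reservation again before the
BPF-prog may touch the area.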

* Re: [PATCH net-next 07/33] xdp: xdp_frame add member frame_sz and handle in convert_to_xdp_frame
  2020-04-25  3:24   ` Toshiaki Makita
@ 2020-04-27 15:20     ` Jesper Dangaard Brouer
  0 siblings, 0 replies; 66+ messages in thread
From: Jesper Dangaard Brouer @ 2020-04-27 15:20 UTC (permalink / raw)
  To: Toshiaki Makita
  Cc: sameehj, netdev, bpf, zorik, akiyano, gtzalik,
	Toke Høiland-Jørgensen, Daniel Borkmann,
	Alexei Starovoitov, John Fastabend, Alexander Duyck,
	Jeff Kirsher, David Ahern, Willem de Bruijn, Ilias Apalodimas,
	Lorenzo Bianconi, Saeed Mahameed, steffen.klassert, brouer

On Sat, 25 Apr 2020 12:24:07 +0900
Toshiaki Makita <toshiaki.makita1@gmail.com> wrote:

> > +/* Avoids inlining WARN macro in fast-path */
> > +void xdp_warn(const char* msg, const char* func, const int line);
> > +#define XDP_WARN(msg) xdp_warn(msg, __func__, __LINE__)  
> 
> Shouldn't this have WARN_ONCE()-like mechanism?
> A buggy driver may generate massive amount of dump messages...

Well, in this use-case I think I want it to be loud.  I usually miss those
WARN_ONCE messages, and while extending and testing drivers, it was
an advantage that it was loud, as it caught some of my own bugs.

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer


^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH net-next 29/33] xdp: allow bpf_xdp_adjust_tail() to grow packet size
  2020-04-22 16:09 ` [PATCH net-next 29/33] xdp: allow bpf_xdp_adjust_tail() to grow packet size Jesper Dangaard Brouer
  2020-04-24 14:09   ` Toke Høiland-Jørgensen
@ 2020-04-27 19:01   ` Daniel Borkmann
  2020-04-28 16:37     ` Jesper Dangaard Brouer
  1 sibling, 1 reply; 66+ messages in thread
From: Daniel Borkmann @ 2020-04-27 19:01 UTC (permalink / raw)
  To: Jesper Dangaard Brouer, sameehj
  Cc: netdev, bpf, zorik, akiyano, gtzalik,
	Toke Høiland-Jørgensen, Daniel Borkmann,
	Alexei Starovoitov, John Fastabend, Alexander Duyck,
	Jeff Kirsher, David Ahern, Willem de Bruijn, Ilias Apalodimas,
	Lorenzo Bianconi, Saeed Mahameed, steffen.klassert

On 4/22/20 6:09 PM, Jesper Dangaard Brouer wrote:
> Finally, after all drivers have a frame size, allow the BPF-helper
> bpf_xdp_adjust_tail() to grow or extend the packet size at the frame
> tail.
> 
> Remember that the helper/macro xdp_data_hard_end has reserved some
> tailroom.  Thus, this helper makes sure that the BPF-prog doesn't have
> access to this tailroom area.
> 
> Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
> ---
>   include/uapi/linux/bpf.h |    4 ++--
>   net/core/filter.c        |   15 +++++++++++++--
>   2 files changed, 15 insertions(+), 4 deletions(-)
> 
> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> index 2e29a671d67e..0e5abe991ca3 100644
> --- a/include/uapi/linux/bpf.h
> +++ b/include/uapi/linux/bpf.h
> @@ -1969,8 +1969,8 @@ union bpf_attr {
>    * int bpf_xdp_adjust_tail(struct xdp_buff *xdp_md, int delta)
>    * 	Description
>    * 		Adjust (move) *xdp_md*\ **->data_end** by *delta* bytes. It is
> - * 		only possible to shrink the packet as of this writing,
> - * 		therefore *delta* must be a negative integer.
> + * 		possible to both shrink and grow the packet tail.
> + * 		Shrink done via *delta* being a negative integer.
>    *
>    * 		A call to this helper is susceptible to change the underlying
>    * 		packet buffer. Therefore, at load time, all checks on pointers
> diff --git a/net/core/filter.c b/net/core/filter.c
> index 7d6ceaa54d21..5e9c387f74eb 100644
> --- a/net/core/filter.c
> +++ b/net/core/filter.c
> @@ -3422,12 +3422,23 @@ static const struct bpf_func_proto bpf_xdp_adjust_head_proto = {
>   
>   BPF_CALL_2(bpf_xdp_adjust_tail, struct xdp_buff *, xdp, int, offset)
>   {
> +	void *data_hard_end = xdp_data_hard_end(xdp);
>   	void *data_end = xdp->data_end + offset;
>   
> -	/* only shrinking is allowed for now. */
> -	if (unlikely(offset >= 0))
> +	/* Notice that xdp_data_hard_end have reserved some tailroom */
> +	if (unlikely(data_end > data_hard_end))
>   		return -EINVAL;
>   
> +	/* ALL drivers MUST init xdp->frame_sz, some chicken checks below */
> +	if (unlikely(xdp->frame_sz < (xdp->data_end - xdp->data_hard_start))) {
> +		WARN(1, "Too small xdp->frame_sz = %d\n", xdp->frame_sz);
> +		return -EINVAL;
> +	}
> +	if (unlikely(xdp->frame_sz > PAGE_SIZE)) {
> +		WARN(1, "Too BIG xdp->frame_sz = %d\n", xdp->frame_sz);
> +		return -EINVAL;
> +	}

I don't think we can add the WARN()s here. If there is a bug in the driver in
this area and someone deploys an XDP-based application (otherwise known to work
well elsewhere) on top of this, then an attacker can basically remotely DoS the
machine with malicious packets that end up triggering these WARN()s over and
over.

If you are worried that not all your driver changes are correct, maybe only add
those that you were able to actually test yourself or that have been acked, and
otherwise pre-init the frame_sz to a known invalid value so this helper would
only allow shrinking for them here (as it does today)?

Thanks,
Daniel

>   	if (unlikely(data_end < xdp->data + ETH_HLEN))
>   		return -EINVAL;
>   
> 
> 


^ permalink raw reply	[flat|nested] 66+ messages in thread
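
One way to keep the chicken checks while avoiding an unbounded WARN per
malicious packet, along the lines Daniel suggests (a sketch, assuming the
checks stay in the helper):

	/* Sanity-check the driver-supplied frame_sz, but warn only once
	 * so crafted packets cannot flood the kernel log. */
	if (unlikely(xdp->frame_sz < (u32)(xdp->data_end - xdp->data_hard_start) ||
		     xdp->frame_sz > PAGE_SIZE)) {
		WARN_ONCE(1, "Bogus xdp->frame_sz = %d\n", xdp->frame_sz);
		return -EINVAL;
	}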

* Re: [PATCH net-next 23/33] ixgbe: add XDP frame size to driver
  2020-04-22 16:09 ` [PATCH net-next 23/33] ixgbe: add XDP frame size to driver Jesper Dangaard Brouer
@ 2020-04-27 19:51   ` Daniel Borkmann
  0 siblings, 0 replies; 66+ messages in thread
From: Daniel Borkmann @ 2020-04-27 19:51 UTC (permalink / raw)
  To: Jesper Dangaard Brouer, sameehj
  Cc: intel-wired-lan, Jeff Kirsher, Alexander Duyck, netdev, bpf,
	zorik, akiyano, gtzalik, Toke Høiland-Jørgensen,
	Daniel Borkmann, Alexei Starovoitov, John Fastabend, David Ahern,
	Willem de Bruijn, Ilias Apalodimas, Lorenzo Bianconi,
	Saeed Mahameed, steffen.klassert

On 4/22/20 6:09 PM, Jesper Dangaard Brouer wrote:
> This driver uses different memory models depending on PAGE_SIZE at
> compile time. For PAGE_SIZE 4K it uses page splitting, meaning for
> normal MTU the frame size is 2048 bytes (and the headroom 192 bytes).
> For larger MTUs the driver still uses page splitting, by allocating
> order-1 pages (8192 bytes) for RX frames. For PAGE_SIZE larger than
> 4K, the driver instead advances its rx_buffer->page_offset by the
> frame size "truesize".
> 
> For XDP frame size calculations, this means that in the PAGE_SIZE
> larger than 4K mode the frame_sz changes on a per-packet basis. For the
> page split 4K PAGE_SIZE mode, xdp.frame_sz is more constant and can be
> updated once outside the main NAPI loop.
> 
> The default setting in the driver uses build_skb(), which provides
> the necessary headroom and tailroom for XDP-redirect in the RX-frame
> (in both modes).
> 
> There is one complication, which is legacy-rx mode (configurable via
> ethtool priv-flags). There is zero headroom in this mode, while
> headroom is a requirement for XDP-redirect to work. The conversion to
> xdp_frame (convert_to_xdp_frame) will detect this insufficient space,
> and the xdp_do_redirect() call will fail. This is deemed acceptable, as
> it allows other XDP actions to still work in legacy-mode. In
> legacy-mode + larger PAGE_SIZE, due to lacking tailroom, we also
> accept that xdp_adjust_tail shrink doesn't work.
> 
> Cc: intel-wired-lan@lists.osuosl.org
> Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
> Cc: Alexander Duyck <alexander.duyck@gmail.com>
> Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>

Alexander/Jeff, in case the ixgbe/i40e/ice changes look good to you,
please ack.

Thanks,
Daniel

^ permalink raw reply	[flat|nested] 66+ messages in thread
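
A sketch of the two modes described above (the helper name
ixgbe_rx_frame_truesize() is assumed for illustration):

	#if (PAGE_SIZE < 8192)
		/* 4K pages: page-split mode, frame_sz is constant and can
		 * be set once before entering the NAPI RX loop. */
		xdp.frame_sz = ixgbe_rx_frame_truesize(rx_ring, 0);
	#else
		/* Larger pages: truesize, and thus frame_sz, depends on the
		 * packet size, so it is set per packet inside the loop. */
		xdp.frame_sz = ixgbe_rx_frame_truesize(rx_ring, size);
	#endif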

* Re: [PATCH net-next 28/33] mlx5: rx queue setup time determine frame_sz for XDP
  2020-04-22 16:09 ` [PATCH net-next 28/33] mlx5: rx queue setup time determine frame_sz for XDP Jesper Dangaard Brouer
  2020-04-25  0:58   ` Alexei Starovoitov
@ 2020-04-27 20:22   ` Jesper Dangaard Brouer
  1 sibling, 0 replies; 66+ messages in thread
From: Jesper Dangaard Brouer @ 2020-04-27 20:22 UTC (permalink / raw)
  To: sameehj
  Cc: Tariq Toukan, Saeed Mahameed, netdev, bpf, zorik, akiyano,
	gtzalik, Toke Høiland-Jørgensen, Daniel Borkmann,
	Alexei Starovoitov, John Fastabend, Alexander Duyck,
	Jeff Kirsher, David Ahern, Willem de Bruijn, Ilias Apalodimas,
	Lorenzo Bianconi, steffen.klassert, brouer

On Wed, 22 Apr 2020 18:09:43 +0200
Jesper Dangaard Brouer <brouer@redhat.com> wrote:

> The mlx5 driver have multiple memory models, which are also changed
> according to whether a XDP bpf_prog is attached.
> 
> The 'rx_striding_rq' setting is adjusted via ethtool priv-flags e.g.:
>  # ethtool --set-priv-flags mlx5p2 rx_striding_rq off
> 
> On the general case with 4K page_size and regular MTU packet, then
> the frame_sz is 2048 and 4096 when XDP is enabled, in both modes.
> 
> The info on the given frame size is stored differently depending on the
> RQ-mode and encoded in a union in struct mlx5e_rq union wqe/mpwqe.
> In rx striding mode rq->mpwqe.log_stride_sz is either 11 or 12, which
> corresponds to 2048 or 4096 (MLX5_WQ_TYPE_LINKED_LIST_STRIDING_RQ).
> In non-striding mode (MLX5_WQ_TYPE_CYCLIC) the frag_stride is stored
> in rq->wqe.info.arr[0].frag_stride, for the first fragment, which is
> what the XDP case cares about.
> 
> To reduce the effect on the fast-path, this patch determines the
> frame_sz at setup time, to avoid determining the memory model at
> runtime. The variable is named first_frame_sz to make it clear that
> this is only the frame size of the first fragment.
> 
> The mlx5 driver does a DMA-sync on the XDP_TX action, but grow is
> safe as it has done a DMA-map on the entire PAGE_SIZE. The driver
> also already does an XDP length check against sq->hw_mtu on the
> possible XDP xmit paths mlx5e_xmit_xdp_frame() + mlx5e_xmit_xdp_frame_mpwqe().
> 
> Cc: Tariq Toukan <tariqt@mellanox.com>
> Cc: Saeed Mahameed <saeedm@mellanox.com>
> Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
> ---
>  drivers/net/ethernet/mellanox/mlx5/core/en.h      |    1 +
>  drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c  |    1 +
>  drivers/net/ethernet/mellanox/mlx5/core/en_main.c |    6 ++++++
>  3 files changed, 8 insertions(+)

I found a bug in this patch that can lead to a BUG in skb_panic() in
the XDP_PASS case when growing the tail. (Hint: this is why I fixed
the output in [1].) I already have a fix, but it implies I will send
a V2 tomorrow.

I'll pick up all the ACKs manually tomorrow, before I resubmit.

[1] https://lore.kernel.org/netdev/158800546361.1962096.4535216438507756179.stgit@firesoul/
-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer
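
For illustration, a standalone sketch of the setup-time frame_sz
decision described in the commit message above. The enum and function
are hypothetical stand-ins, not mlx5 code; the log-stride values 11/12
and the frag_stride source come from the commit text:

#include <stdio.h>

enum wq_type { WQ_STRIDING_RQ, WQ_CYCLIC };

static unsigned int first_frame_sz(enum wq_type type,
				   unsigned int log_stride_sz,
				   unsigned int frag_stride)
{
	/* striding RQ: stride size is 2^log_stride_sz (11 or 12) */
	if (type == WQ_STRIDING_RQ)
		return 1u << log_stride_sz;
	/* cyclic RQ: XDP only cares about the first fragment's stride */
	return frag_stride;
}

int main(void)
{
	printf("striding, log 11: %u\n", first_frame_sz(WQ_STRIDING_RQ, 11, 0));
	printf("striding, log 12: %u\n", first_frame_sz(WQ_STRIDING_RQ, 12, 0));
	printf("cyclic:           %u\n", first_frame_sz(WQ_CYCLIC, 0, 2048));
	return 0;
}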


^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH net-next 21/33] virtio_net: add XDP frame size in two code paths
  2020-04-27 14:32     ` Jesper Dangaard Brouer
@ 2020-04-28  9:50       ` Jason Wang
  2020-04-30 10:14         ` Jesper Dangaard Brouer
  2020-05-06  6:38         ` Jason Wang
  0 siblings, 2 replies; 66+ messages in thread
From: Jason Wang @ 2020-04-28  9:50 UTC (permalink / raw)
  To: Jesper Dangaard Brouer
  Cc: sameehj, netdev, bpf, zorik, akiyano, gtzalik,
	Toke Høiland-Jørgensen, Daniel Borkmann,
	Alexei Starovoitov, John Fastabend, Alexander Duyck,
	Jeff Kirsher, David Ahern, Willem de Bruijn, Ilias Apalodimas,
	Lorenzo Bianconi, Saeed Mahameed, steffen.klassert

[-- Attachment #1: Type: text/plain, Size: 3971 bytes --]


On 2020/4/27 10:32 PM, Jesper Dangaard Brouer wrote:
> On Mon, 27 Apr 2020 15:21:02 +0800
> Jason Wang <jasowang@redhat.com> wrote:
>
>> On 2020/4/23 12:09 AM, Jesper Dangaard Brouer wrote:
>>> The virtio_net driver is running inside the guest-OS. There are two
>>> XDP receive code-paths in virtio_net, namely receive_small() and
>>> receive_mergeable(). The receive_big() function does not support XDP.
>>>
>>> In receive_small() the frame size is available in buflen. The buffers
>>> backing these frames are allocated in add_recvbuf_small() with the
>>> same size, except for the headroom, but the tailroom has reserved
>>> room for skb_shared_info. The headroom is encoded in the ctx pointer
>>> as a value.
>>>
>>> In receive_mergeable() the frame size is more dynamic. There are two
>>> basic cases: (1) the buffer size is based on an exponentially weighted
>>> moving average (see DECLARE_EWMA) of the packet length. Or (2) in case
>>> virtnet_get_headroom() has any headroom, then the buffer size is
>>> PAGE_SIZE. The ctx pointer is this time used for encoding two values:
>>> the buffer len "truesize" and the headroom. In case (1), if the rx
>>> buffer size is underestimated, the packet will have been split over
>>> more buffers (the num_buf info in virtio_net_hdr_mrg_rxbuf placed at
>>> the top of the buffer area). If that happens the XDP path does an
>>> xdp_linearize_page operation.
>>>
>>> Cc: Jason Wang <jasowang@redhat.com>
>>> Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
>>> ---
>>>    drivers/net/virtio_net.c |   15 ++++++++++++---
>>>    1 file changed, 12 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
>>> index 11f722460513..1df3676da185 100644
>>> --- a/drivers/net/virtio_net.c
>>> +++ b/drivers/net/virtio_net.c
>>> @@ -689,6 +689,7 @@ static struct sk_buff *receive_small(struct net_device *dev,
>>>    		xdp.data_end = xdp.data + len;
>>>    		xdp.data_meta = xdp.data;
>>>    		xdp.rxq = &rq->xdp_rxq;
>>> +		xdp.frame_sz = buflen;
>>>    		orig_data = xdp.data;
>>>    		act = bpf_prog_run_xdp(xdp_prog, &xdp);
>>>    		stats->xdp_packets++;
>>> @@ -797,10 +798,11 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
>>>    	int offset = buf - page_address(page);
>>>    	struct sk_buff *head_skb, *curr_skb;
>>>    	struct bpf_prog *xdp_prog;
>>> -	unsigned int truesize;
>>> +	unsigned int truesize = mergeable_ctx_to_truesize(ctx);
>>>    	unsigned int headroom = mergeable_ctx_to_headroom(ctx);
>>> -	int err;
>>>    	unsigned int metasize = 0;
>>> +	unsigned int frame_sz;
>>> +	int err;
>>>    
>>>    	head_skb = NULL;
>>>    	stats->bytes += len - vi->hdr_len;
>>> @@ -821,6 +823,11 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
>>>    		if (unlikely(hdr->hdr.gso_type))
>>>    			goto err_xdp;
>>>    
>>> +		/* Buffers with headroom use PAGE_SIZE as alloc size,
>>> +		 * see add_recvbuf_mergeable() + get_mergeable_buf_len()
>>> +		 */
>>> +		frame_sz = headroom ? PAGE_SIZE : truesize;
>>> +
>>>    		/* This happens when rx buffer size is underestimated
>>>    		 * or headroom is not enough because of the buffer
>>>    		 * was refilled before XDP is set. This should only
>>> @@ -834,6 +841,8 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
>>>    						      page, offset,
>>>    						      VIRTIO_XDP_HEADROOM,
>>>    						      &len);
>>> +			frame_sz = PAGE_SIZE;
>> Should this be PAGE_SIZE - SKB_DATA_ALIGN(sizeof(struct skb_shared_info))?
> No, frame_sz includes the SKB_DATA_ALIGN(sizeof(struct skb_shared_info)) length.


Ok, considering the mergeable buffer path depends on the truesize which
is encoded in ctx.

It looks like the calculation in add_recvbuf_mergeable() is wrong; we
need to count both headroom and tailroom there.

We probably need the attached 2 patches to fix this.

(untested, will test it tomorrow).

Thanks



[-- Attachment #2: 0002-virtio-net-fix-the-XDP-truesize-calculation-for-merg.patch --]
[-- Type: text/x-patch, Size: 1825 bytes --]

From c2778eb8ee4b7558bccb53f2fc7f1b0aaf1fcb58 Mon Sep 17 00:00:00 2001
From: Jason Wang <jasowang@redhat.com>
Date: Tue, 28 Apr 2020 11:37:39 +0800
Subject: [PATCH 2/2] virtio-net: fix the XDP truesize calculation for
 mergeable buffers

We should not exclude headroom and tailroom when XDP is set. So this
patch fixes this by initializing the truesize from PAGE_SIZE when XDP
is set.

Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 drivers/net/virtio_net.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 9bdaf2425e6e..f9ba5275e447 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -1172,7 +1172,7 @@ static int add_recvbuf_mergeable(struct virtnet_info *vi,
 	char *buf;
 	void *ctx;
 	int err;
-	unsigned int len, hole;
+	unsigned int len, hole, truesize;
 
 	/* Extra tailroom is needed to satisfy XDP's assumption. This
 	 * means rx frags coalescing won't work, but consider we've
@@ -1182,6 +1182,7 @@ static int add_recvbuf_mergeable(struct virtnet_info *vi,
 	if (unlikely(!skb_page_frag_refill(len + room, alloc_frag, gfp)))
 		return -ENOMEM;
 
+	truesize = headroom ? PAGE_SIZE : len;
 	buf = (char *)page_address(alloc_frag->page) + alloc_frag->offset;
 	buf += headroom; /* advance address leaving hole at front of pkt */
 	get_page(alloc_frag->page);
@@ -1193,11 +1194,12 @@ static int add_recvbuf_mergeable(struct virtnet_info *vi,
 		 * the current buffer.
 		 */
 		len += hole;
+		truesize += hole;
 		alloc_frag->offset += hole;
 	}
 
 	sg_init_one(rq->sg, buf, len);
-	ctx = mergeable_len_to_ctx(len, headroom);
+	ctx = mergeable_len_to_ctx(truesize, headroom);
 	err = virtqueue_add_inbuf_ctx(rq->vq, rq->sg, 1, buf, ctx, gfp);
 	if (err < 0)
 		put_page(virt_to_head_page(buf));
-- 
2.20.1


[-- Attachment #3: 0001-virtio-net-don-t-reserve-space-for-vnet-header-for-X.patch --]
[-- Type: text/x-patch, Size: 1738 bytes --]

From 307ac87e823fde059be3bb5a7bdd3ffd3b18521d Mon Sep 17 00:00:00 2001
From: Jason Wang <jasowang@redhat.com>
Date: Tue, 28 Apr 2020 11:31:47 +0800
Subject: [PATCH 1/2] virtio-net: don't reserve space for vnet header for XDP

We tried to reserve space for the vnet header before
xdp.data_hard_start. But this is useless, since the packet could be
modified by XDP, which may invalidate the information stored in the
header, and there's no way for XDP to know about the existence of the
vnet header currently.

So let's just not reserve space for the vnet header in this case.

Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 drivers/net/virtio_net.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 2fe7a3188282..9bdaf2425e6e 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -681,8 +681,8 @@ static struct sk_buff *receive_small(struct net_device *dev,
 			page = xdp_page;
 		}
 
-		xdp.data_hard_start = buf + VIRTNET_RX_PAD + vi->hdr_len;
-		xdp.data = xdp.data_hard_start + xdp_headroom;
+		xdp.data_hard_start = buf + VIRTNET_RX_PAD;
+		xdp.data = xdp.data_hard_start + xdp_headroom + vi->hdr_len;
 		xdp_set_data_meta_invalid(&xdp);
 		xdp.data_end = xdp.data + len;
 		xdp.rxq = &rq->xdp_rxq;
@@ -837,7 +837,7 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
 		 * the descriptor on if we get an XDP_TX return code.
 		 */
 		data = page_address(xdp_page) + offset;
-		xdp.data_hard_start = data - VIRTIO_XDP_HEADROOM + vi->hdr_len;
+		xdp.data_hard_start = data - VIRTIO_XDP_HEADROOM;
 		xdp.data = data + vi->hdr_len;
 		xdp_set_data_meta_invalid(&xdp);
 		xdp.data_end = xdp.data + (len - vi->hdr_len);
-- 
2.20.1
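
For readers following the truesize discussion above: a standalone
sketch of the ctx encoding idea, packing truesize and headroom into one
pointer-sized value. The shift value is an assumption for illustration;
see mergeable_len_to_ctx() in drivers/net/virtio_net.c for the real
encoding:

#include <assert.h>
#include <stdint.h>
#include <stdio.h>

#define HDR_SHIFT 22	/* assumed headroom shift */

static void *len_to_ctx(unsigned int truesize, unsigned int headroom)
{
	return (void *)(((uintptr_t)headroom << HDR_SHIFT) | truesize);
}

static unsigned int ctx_to_truesize(void *ctx)
{
	return (uintptr_t)ctx & ((1u << HDR_SHIFT) - 1);
}

static unsigned int ctx_to_headroom(void *ctx)
{
	return (uintptr_t)ctx >> HDR_SHIFT;
}

int main(void)
{
	void *ctx = len_to_ctx(2048, 256);

	assert(ctx_to_truesize(ctx) == 2048);
	assert(ctx_to_headroom(ctx) == 256);
	printf("truesize=%u headroom=%u\n",
	       ctx_to_truesize(ctx), ctx_to_headroom(ctx));
	return 0;
}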


^ permalink raw reply related	[flat|nested] 66+ messages in thread

* Re: [PATCH net-next 30/33] xdp: clear grow memory in bpf_xdp_adjust_tail()
  2020-04-27  5:26   ` John Fastabend
@ 2020-04-28 14:50     ` Jesper Dangaard Brouer
  0 siblings, 0 replies; 66+ messages in thread
From: Jesper Dangaard Brouer @ 2020-04-28 14:50 UTC (permalink / raw)
  To: John Fastabend
  Cc: sameehj, netdev, bpf, zorik, akiyano, gtzalik,
	Toke Høiland-Jørgensen, Daniel Borkmann,
	Alexei Starovoitov, Alexander Duyck, Jeff Kirsher, David Ahern,
	Willem de Bruijn, Ilias Apalodimas, Lorenzo Bianconi,
	Saeed Mahameed, steffen.klassert, brouer

On Sun, 26 Apr 2020 22:26:54 -0700
John Fastabend <john.fastabend@gmail.com> wrote:

> Jesper Dangaard Brouer wrote:
> > Clear the memory of the tail when a grow happens, because it is too
> > easy to write an XDP_PASS program that extends the tail, which
> > exposes this memory to users that can run tcpdump.
> > 
> > Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
> > ---  
> 
> Hi Jesper, thanks for the series. Any idea what the cost of doing
> this is? If you have some data I would be curious to know a
> baseline measurement, a grow without memset, then a grow with memset.
> I'm guessing this can be relatively expensive?

I have a "time_bench" memset kernel module[1] that I use to understand
that is the best-case/minimum overhead with a hot-cache.  But in this
case, the memory will be in L3-cache (at least on Intel with DDIO).

For legitimate use-cases, the BPF-programmer will write her tail data
into this memory area anyhow.  Thus, I'm not convinced this will be a
performance issue for real use-cases.  When we have a real use-case that
need this tail extend and does XDP_TX, I say we can revisit this.


[1] https://github.com/netoptimizer/prototype-kernel/blob/master/kernel/lib/time_bench_memset.c
-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer
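
A rough userspace sketch of the kind of baseline John asks about:
timing memset() of an assumed 512-byte tail grow. This is not the
time_bench module from [1], and userspace numbers will differ from
kernel-context measurements:

#include <stdio.h>
#include <string.h>
#include <time.h>

int main(void)
{
	static char buf[4096];
	struct timespec t0, t1;
	long iters = 1000000, i;
	double ns;

	clock_gettime(CLOCK_MONOTONIC, &t0);
	for (i = 0; i < iters; i++) {
		memset(buf, 0, 512);	/* assumed grow size */
		/* compiler barrier so the memset is not optimized away */
		asm volatile("" : : "r"(buf) : "memory");
	}
	clock_gettime(CLOCK_MONOTONIC, &t1);

	ns = (t1.tv_sec - t0.tv_sec) * 1e9 + (t1.tv_nsec - t0.tv_nsec);
	printf("~%.1f ns per 512B memset (hot cache)\n", ns / iters);
	return 0;
}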


^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH net-next 01/33] xdp: add frame size to xdp_buff
  2020-04-24 14:00   ` Toke Høiland-Jørgensen
@ 2020-04-28 16:06     ` Jesper Dangaard Brouer
  0 siblings, 0 replies; 66+ messages in thread
From: Jesper Dangaard Brouer @ 2020-04-28 16:06 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen
  Cc: sameehj, netdev, bpf, Daniel Borkmann, Alexei Starovoitov,
	John Fastabend, David Ahern, Willem de Bruijn, Ilias Apalodimas

On Fri, 24 Apr 2020 16:00:53 +0200
Toke Høiland-Jørgensen <toke@redhat.com> wrote:

> Jesper Dangaard Brouer <brouer@redhat.com> writes:
> 
> > XDP has evolved to support several frame sizes, but xdp_buff was not
> > updated with this information. The frame size (frame_sz) member of
> > xdp_buff is introduced to know the real size of the memory the frame is
> > delivered in.
> >
> > When introducing this, also make it clear that some tailroom is
> > reserved/required when creating SKBs using build_skb().
> >
> > It would also have been an option to introduce a pointer to
> > data_hard_end (with a reserved offset). The advantage with frame_sz is
> > that (like rxq) drivers only need to setup/assign this value once per
> > NAPI cycle. Due to XDP-generic (and some drivers) it's not possible to
> > store frame_sz inside xdp_rxq_info, because it varies per packet, as it
> > can depend on the packet length.
> >
> > Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>  
> 
> With one possible nit below:
> 
> Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>

thx

> > ---
> >  include/net/xdp.h |   13 +++++++++++++
> >  1 file changed, 13 insertions(+)
> >
> > diff --git a/include/net/xdp.h b/include/net/xdp.h
> > index 40c6d3398458..1ccf7df98bee 100644
> > --- a/include/net/xdp.h
> > +++ b/include/net/xdp.h
> > @@ -6,6 +6,8 @@
> >  #ifndef __LINUX_NET_XDP_H__
> >  #define __LINUX_NET_XDP_H__
> >  
> > +#include <linux/skbuff.h> /* skb_shared_info */
> > +
> >  /**
> >   * DOC: XDP RX-queue information
> >   *
> > @@ -70,8 +72,19 @@ struct xdp_buff {
> >  	void *data_hard_start;
> >  	unsigned long handle;
> >  	struct xdp_rxq_info *rxq;
> > +	u32 frame_sz; /* frame size to deduct data_hard_end/reserved tailroom*/  
> 
> I think maybe you want to s/deduct/deduce/ here?

Okay, queued for V2.

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer
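
To make the reserved-tailroom arithmetic concrete, a standalone sketch
of the xdp_data_hard_end calculation. The 320-byte value is only an
assumed example of SKB_DATA_ALIGN(sizeof(struct skb_shared_info)),
which varies by architecture and config:

#include <stdio.h>

#define FRAME_SZ	4096
#define SHINFO_SZ	320	/* assumed aligned skb_shared_info size */

int main(void)
{
	char frame[FRAME_SZ];
	char *data_hard_start = frame;
	char *data_hard_end = data_hard_start + FRAME_SZ - SHINFO_SZ;

	printf("XDP-usable bytes:  %ld\n",
	       (long)(data_hard_end - data_hard_start));
	printf("reserved tailroom: %ld\n",
	       (long)(frame + FRAME_SZ - data_hard_end));
	return 0;
}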


^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH net-next 29/33] xdp: allow bpf_xdp_adjust_tail() to grow packet size
  2020-04-27 19:01   ` Daniel Borkmann
@ 2020-04-28 16:37     ` Jesper Dangaard Brouer
  2020-04-28 19:36       ` Daniel Borkmann
  0 siblings, 1 reply; 66+ messages in thread
From: Jesper Dangaard Brouer @ 2020-04-28 16:37 UTC (permalink / raw)
  To: Daniel Borkmann
  Cc: sameehj, netdev, bpf, zorik, akiyano, gtzalik,
	Toke Høiland-Jørgensen, Daniel Borkmann,
	Alexei Starovoitov, John Fastabend, Alexander Duyck,
	Jeff Kirsher, David Ahern, Willem de Bruijn, Ilias Apalodimas,
	Lorenzo Bianconi, Saeed Mahameed, steffen.klassert, brouer

On Mon, 27 Apr 2020 21:01:14 +0200
Daniel Borkmann <daniel@iogearbox.net> wrote:

> On 4/22/20 6:09 PM, Jesper Dangaard Brouer wrote:
> > Finally, after all drivers have a frame size, allow the BPF-helper
> > bpf_xdp_adjust_tail() to grow or extend the packet size at the frame
> > tail.
> > 
> > Remember that the helper/macro xdp_data_hard_end has reserved some
> > tailroom.  Thus, this helper makes sure that the BPF-prog doesn't
> > have access to this tailroom area.
> > 
> > Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
> > ---
> >   include/uapi/linux/bpf.h |    4 ++--
> >   net/core/filter.c        |   15 +++++++++++++--
> >   2 files changed, 15 insertions(+), 4 deletions(-)
> > 
[...]
> > diff --git a/net/core/filter.c b/net/core/filter.c
> > index 7d6ceaa54d21..5e9c387f74eb 100644
> > --- a/net/core/filter.c
> > +++ b/net/core/filter.c
> > @@ -3422,12 +3422,23 @@ static const struct bpf_func_proto bpf_xdp_adjust_head_proto = {
> >   
> >   BPF_CALL_2(bpf_xdp_adjust_tail, struct xdp_buff *, xdp, int, offset)
> >   {
> > +	void *data_hard_end = xdp_data_hard_end(xdp);
> >   	void *data_end = xdp->data_end + offset;
> >   
> > -	/* only shrinking is allowed for now. */
> > -	if (unlikely(offset >= 0))
> > +	/* Notice that xdp_data_hard_end have reserved some tailroom */
> > +	if (unlikely(data_end > data_hard_end))
> >   		return -EINVAL;
> >   
> > +	/* ALL drivers MUST init xdp->frame_sz, some chicken checks below */
> > +	if (unlikely(xdp->frame_sz < (xdp->data_end - xdp->data_hard_start))) {
> > +		WARN(1, "Too small xdp->frame_sz = %d\n", xdp->frame_sz);
> > +		return -EINVAL;
> > +	}

I will remove this "too small" check, as it is useless, given it will
already get caught by the above check.


> > +	if (unlikely(xdp->frame_sz > PAGE_SIZE)) {
> > +		WARN(1, "Too BIG xdp->frame_sz = %d\n", xdp->frame_sz);
> > +		return -EINVAL;
> > +	}  
> 
> I don't think we can add the WARN()s here. If there is a bug in the
> driver in this area and someone deploys an XDP-based application
> (otherwise known to work well elsewhere) on top of this, then an
> attacker can basically remote DoS the machine with malicious packets
> that end up triggering these WARN()s over and over.

Good point.  I've changed this to WARN_ONCE(), but I'm still
considering removing it completely...

> If you are worried that not all your driver changes are correct,
> maybe only add those that you were able to actually test yourself or
> that have been acked, and otherwise pre-init the frame_sz to a known
> invalid value so this helper would only allow shrinking for them in
> here (as today)?

Hmm... no, I really want to require ALL drivers to set a valid value,
because otherwise we will have the "data_meta" feature situation, where
a lot of drivers still don't support it.

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer
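
A standalone sketch of the bound being enforced in this discussion: a
tail grow must not move data_end past data_hard_end, which already
excludes the reserved tailroom. The constants are assumed example
values, and this mirrors the quoted diff's logic rather than the
kernel code itself:

#include <stdio.h>

#define FRAME_SZ 4096
#define TAILROOM  320	/* assumed SKB_DATA_ALIGN(sizeof(struct skb_shared_info)) */

static int adjust_tail_ok(unsigned int data_end_off, int offset)
{
	unsigned int hard_end_off = FRAME_SZ - TAILROOM;

	return data_end_off + offset <= hard_end_off;
}

int main(void)
{
	printf("grow 100 from 3000: %s\n",
	       adjust_tail_ok(3000, 100) ? "ok" : "-EINVAL");
	printf("grow 900 from 3000: %s\n",
	       adjust_tail_ok(3000, 900) ? "ok" : "-EINVAL");
	return 0;
}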


^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH net-next 29/33] xdp: allow bpf_xdp_adjust_tail() to grow packet size
  2020-04-28 16:37     ` Jesper Dangaard Brouer
@ 2020-04-28 19:36       ` Daniel Borkmann
  0 siblings, 0 replies; 66+ messages in thread
From: Daniel Borkmann @ 2020-04-28 19:36 UTC (permalink / raw)
  To: Jesper Dangaard Brouer
  Cc: sameehj, netdev, bpf, zorik, akiyano, gtzalik,
	Toke Høiland-Jørgensen, Daniel Borkmann,
	Alexei Starovoitov, John Fastabend, Alexander Duyck,
	Jeff Kirsher, David Ahern, Willem de Bruijn, Ilias Apalodimas,
	Lorenzo Bianconi, Saeed Mahameed, steffen.klassert

On 4/28/20 6:37 PM, Jesper Dangaard Brouer wrote:
> On Mon, 27 Apr 2020 21:01:14 +0200
> Daniel Borkmann <daniel@iogearbox.net> wrote:
>> On 4/22/20 6:09 PM, Jesper Dangaard Brouer wrote:
>>> Finally, after all drivers have a frame size, allow the BPF-helper
>>> bpf_xdp_adjust_tail() to grow or extend the packet size at the frame
>>> tail.
>>>
>>> Remember that the helper/macro xdp_data_hard_end has reserved some
>>> tailroom.  Thus, this helper makes sure that the BPF-prog doesn't
>>> have access to this tailroom area.
>>>
>>> Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
>>> ---
>>>    include/uapi/linux/bpf.h |    4 ++--
>>>    net/core/filter.c        |   15 +++++++++++++--
>>>    2 files changed, 15 insertions(+), 4 deletions(-)
>>>
> [...]
>>> diff --git a/net/core/filter.c b/net/core/filter.c
>>> index 7d6ceaa54d21..5e9c387f74eb 100644
>>> --- a/net/core/filter.c
>>> +++ b/net/core/filter.c
>>> @@ -3422,12 +3422,23 @@ static const struct bpf_func_proto bpf_xdp_adjust_head_proto = {
>>>    
>>>    BPF_CALL_2(bpf_xdp_adjust_tail, struct xdp_buff *, xdp, int, offset)
>>>    {
>>> +	void *data_hard_end = xdp_data_hard_end(xdp);
>>>    	void *data_end = xdp->data_end + offset;
>>>    
>>> -	/* only shrinking is allowed for now. */
>>> -	if (unlikely(offset >= 0))
>>> +	/* Notice that xdp_data_hard_end have reserved some tailroom */
>>> +	if (unlikely(data_end > data_hard_end))
>>>    		return -EINVAL;
>>>    
>>> +	/* ALL drivers MUST init xdp->frame_sz, some chicken checks below */
>>> +	if (unlikely(xdp->frame_sz < (xdp->data_end - xdp->data_hard_start))) {
>>> +		WARN(1, "Too small xdp->frame_sz = %d\n", xdp->frame_sz);
>>> +		return -EINVAL;
>>> +	}
> 
> I will remove this "too small" check, as it is useless, given it will
> already get caught by the above check.
> 
>>> +	if (unlikely(xdp->frame_sz > PAGE_SIZE)) {
>>> +		WARN(1, "Too BIG xdp->frame_sz = %d\n", xdp->frame_sz);
>>> +		return -EINVAL;
>>> +	}
>>
>> I don't think we can add the WARN()s here. If there is a bug in the
>> driver in this area and someone deploys an XDP-based application
>> (otherwise known to work well elsewhere) on top of this, then an
>> attacker can basically remote DoS the machine with malicious packets
>> that end up triggering these WARN()s over and over.
> 
> Good point.  I've changed this to WARN_ONCE(), but I'm still
> considering removing it completely...
> 
>> If you are worried that not all your driver changes are correct,
>> maybe only add those that you were able to actually test yourself or
>> that have been acked, and otherwise pre-init the frame_sz to a known
>> invalid value so this helper would only allow shrinking for them in
>> here (as today)?
> 
> Hmm... no, I really want to require ALL drivers to set a valid value,
> because otherwise we will have the "data_meta" feature situation, where
> a lot of drivers still don't support it.

Ok, makes sense, it's probably better that way. I do have a data_meta
series for a few more drivers to push out soon to make sure there's more
coverage as we're using it in Cilium.
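
To show what the new grow capability looks like from the BPF side, a
minimal hypothetical XDP program (not from this series; assumes
libbpf's bpf_helpers.h and compilation with clang -target bpf):

#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

SEC("xdp")
int grow_tail(struct xdp_md *ctx)
{
	/* A positive delta grows the tail; the helper returns an error
	 * if the grow would run into the reserved tailroom. */
	if (bpf_xdp_adjust_tail(ctx, 4) < 0)
		return XDP_PASS;	/* not enough tailroom */

	/* A real program would now write into the 4 new tail bytes,
	 * bounds-checked against ctx->data_end after the call. */
	return XDP_PASS;
}

char _license[] SEC("license") = "GPL";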

^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH net-next 20/33] vhost_net: also populate XDP frame size
  2020-04-27  5:50   ` Jason Wang
@ 2020-04-30  9:54     ` Jesper Dangaard Brouer
  2020-05-06  6:43       ` Jason Wang
  0 siblings, 1 reply; 66+ messages in thread
From: Jesper Dangaard Brouer @ 2020-04-30  9:54 UTC (permalink / raw)
  To: Jason Wang
  Cc: sameehj, netdev, bpf, zorik, akiyano, gtzalik,
	Toke Høiland-Jørgensen, Daniel Borkmann,
	Alexei Starovoitov, John Fastabend, Alexander Duyck,
	Jeff Kirsher, David Ahern, Willem de Bruijn, Ilias Apalodimas,
	Lorenzo Bianconi, Saeed Mahameed, steffen.klassert, brouer

On Mon, 27 Apr 2020 13:50:15 +0800
Jason Wang <jasowang@redhat.com> wrote:

> On 2020/4/23 12:09 AM, Jesper Dangaard Brouer wrote:
> > In vhost_net_build_xdp() the 'buf' that gets queued via an xdp_buff
> > has embedded a struct tun_xdp_hdr (located at xdp->data_hard_start)
> > which contains the buffer length 'buflen' (with tailroom for
> > skb_shared_info). Also, storing this buflen in xdp->frame_sz does not
> > obsolete struct tun_xdp_hdr, as it also contains a struct
> > virtio_net_hdr with other information.
> >
> > Cc: Jason Wang <jasowang@redhat.com>
> > Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
> > ---
> >   drivers/vhost/net.c |    1 +
> >   1 file changed, 1 insertion(+)
> >
> > diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
> > index 87469d67ede8..69af007e22f4 100644
> > --- a/drivers/vhost/net.c
> > +++ b/drivers/vhost/net.c
> > @@ -745,6 +745,7 @@ static int vhost_net_build_xdp(struct vhost_net_virtqueue *nvq,
> >   	xdp->data = buf + pad;
> >   	xdp->data_end = xdp->data + len;
> >   	hdr->buflen = buflen;
> > +	xdp->frame_sz = buflen;
> >   
> >   	--net->refcnt_bias;
> >   	alloc_frag->offset += buflen;  
> 
> 
> tun_xdp_one() will use hdr->buflen as the frame_sz (patch 19), so it
> looks to me there's no need to do this?

I was thinking to go the "other way", meaning let tun_xdp_one() use
xdp->frame_sz, which gets set here.  This would allow us to refactor
the code, and drop struct tun_xdp_hdr, as (see pahole below) it only
carries 'buflen' and the remaining part comes from struct
virtio_net_hdr, which could be used directly instead.

As this will be a code refactor, I would prefer we do it after this
patch series is agreed upon.

$ pahole -C tun_xdp_hdr drivers/net/tap.o
struct tun_xdp_hdr {
	int                        buflen;               /*     0     4 */
	struct virtio_net_hdr gso;                       /*     4    10 */

	/* size: 16, cachelines: 1, members: 2 */
	/* padding: 2 */
	/* last cacheline: 16 bytes */
};

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer
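
A standalone check of the 16-byte layout pahole reports above. The
struct virtio_net_hdr fields are re-declared locally so this compiles
in userspace; on common ABIs the int member forces 4-byte alignment,
rounding 4 + 10 bytes up to 16:

#include <stdio.h>
#include <stdint.h>

struct virtio_net_hdr {		/* local copy for illustration */
	uint8_t  flags;
	uint8_t  gso_type;
	uint16_t hdr_len;
	uint16_t gso_size;
	uint16_t csum_start;
	uint16_t csum_offset;
};

struct tun_xdp_hdr {
	int buflen;
	struct virtio_net_hdr gso;
};

int main(void)
{
	printf("sizeof(struct tun_xdp_hdr) = %zu\n",
	       sizeof(struct tun_xdp_hdr));	/* expect 16 */
	return 0;
}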


^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH net-next 21/33] virtio_net: add XDP frame size in two code paths
  2020-04-28  9:50       ` Jason Wang
@ 2020-04-30 10:14         ` Jesper Dangaard Brouer
  2020-05-06  6:38         ` Jason Wang
  1 sibling, 0 replies; 66+ messages in thread
From: Jesper Dangaard Brouer @ 2020-04-30 10:14 UTC (permalink / raw)
  To: Jason Wang
  Cc: sameehj, netdev, bpf, zorik, akiyano, gtzalik,
	Toke Høiland-Jørgensen, Daniel Borkmann,
	Alexei Starovoitov, John Fastabend, Alexander Duyck,
	Jeff Kirsher, David Ahern, Willem de Bruijn, Ilias Apalodimas,
	Lorenzo Bianconi, Saeed Mahameed, steffen.klassert, brouer

On Tue, 28 Apr 2020 17:50:11 +0800
Jason Wang <jasowang@redhat.com> wrote:

> We tried to reserve space for the vnet header before
> xdp.data_hard_start. But this is useless, since the packet could be
> modified by XDP, which may invalidate the information stored in the
> header, and there's no way for XDP to know about the existence of the
> vnet header currently.

I think this is wrong.  We can reserve space for the vnet header before
xdp.data_hard_start, and it will be safe and cannot be modified by XDP,
as the BPF program cannot access data before xdp.data_hard_start.
 
> So let's just not reserve space for the vnet header in this case.
> 
> Signed-off-by: Jason Wang <jasowang@redhat.com>
> ---
>  drivers/net/virtio_net.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index 2fe7a3188282..9bdaf2425e6e 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -681,8 +681,8 @@ static struct sk_buff *receive_small(struct net_device *dev,
>  			page = xdp_page;
>  		}
>  
> -		xdp.data_hard_start = buf + VIRTNET_RX_PAD + vi->hdr_len;
> -		xdp.data = xdp.data_hard_start + xdp_headroom;
> +		xdp.data_hard_start = buf + VIRTNET_RX_PAD;
> +		xdp.data = xdp.data_hard_start + xdp_headroom + vi->hdr_len;

I think this is wrong.  You are exposing the vi header info, which will
be overwritten when the code creates an xdp_frame.  I find it very
fragile: later in the code the vi header info is copied, but only if
xdp_prog is not loaded, so in principle it's okay.  But when someone
later figures out that we want to look at this area, we will be in
trouble (and I expect this will be needed when we work towards
multi-buffer XDP).

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer
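
To visualize the two receive_small() layouts under debate, a standalone
comparison sketch; the pad and headroom constants are assumptions, and
hdr_len stands in for the 10-byte virtio_net_hdr without mergeable
buffers:

#include <stdio.h>

#define RX_PAD		16	/* assumed VIRTNET_RX_PAD */
#define HDR_LEN		10	/* assumed vi->hdr_len */
#define XDP_HEADROOM	256

int main(void)
{
	char buf[4096];

	/* Layout A: vnet header hidden before data_hard_start */
	char *hard_a = buf + RX_PAD + HDR_LEN;
	char *data_a = hard_a + XDP_HEADROOM;

	/* Layout B (patch 1 above): header inside the XDP-visible area */
	char *hard_b = buf + RX_PAD;
	char *data_b = hard_b + XDP_HEADROOM + HDR_LEN;

	printf("packet start identical: %s\n",
	       data_a == data_b ? "yes" : "no");
	printf("BPF-reachable headroom: A=%ld B=%ld\n",
	       (long)(data_a - hard_a), (long)(data_b - hard_b));
	return 0;
}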


^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH net-next 21/33] virtio_net: add XDP frame size in two code paths
  2020-04-28  9:50       ` Jason Wang
  2020-04-30 10:14         ` Jesper Dangaard Brouer
@ 2020-05-06  6:38         ` Jason Wang
  1 sibling, 0 replies; 66+ messages in thread
From: Jason Wang @ 2020-05-06  6:38 UTC (permalink / raw)
  To: Jesper Dangaard Brouer
  Cc: sameehj, netdev, bpf, zorik, akiyano, gtzalik,
	Toke Høiland-Jørgensen, Daniel Borkmann,
	Alexei Starovoitov, John Fastabend, Alexander Duyck,
	Jeff Kirsher, David Ahern, Willem de Bruijn, Ilias Apalodimas,
	Lorenzo Bianconi, Saeed Mahameed, steffen.klassert


On 2020/4/28 5:50 PM, Jason Wang wrote:
>
> On 2020/4/27 10:32 PM, Jesper Dangaard Brouer wrote:
>> On Mon, 27 Apr 2020 15:21:02 +0800
>> Jason Wang <jasowang@redhat.com> wrote:
>>
>>> On 2020/4/23 12:09 AM, Jesper Dangaard Brouer wrote:
>>>> The virtio_net driver is running inside the guest-OS. There are two
>>>> XDP receive code-paths in virtio_net, namely receive_small() and
>>>> receive_mergeable(). The receive_big() function does not support XDP.
>>>>
>>>> In receive_small() the frame size is available in buflen. The buffers
>>>> backing these frames are allocated in add_recvbuf_small() with the
>>>> same size, except for the headroom, but the tailroom has reserved
>>>> room for skb_shared_info. The headroom is encoded in the ctx pointer
>>>> as a value.
>>>>
>>>> In receive_mergeable() the frame size is more dynamic. There are two
>>>> basic cases: (1) the buffer size is based on an exponentially weighted
>>>> moving average (see DECLARE_EWMA) of the packet length. Or (2) in case
>>>> virtnet_get_headroom() has any headroom, then the buffer size is
>>>> PAGE_SIZE. The ctx pointer is this time used for encoding two values:
>>>> the buffer len "truesize" and the headroom. In case (1), if the rx
>>>> buffer size is underestimated, the packet will have been split over
>>>> more buffers (the num_buf info in virtio_net_hdr_mrg_rxbuf placed at
>>>> the top of the buffer area). If that happens the XDP path does an
>>>> xdp_linearize_page operation.
>>>>
>>>> Cc: Jason Wang <jasowang@redhat.com>
>>>> Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
>>>> ---
>>>>    drivers/net/virtio_net.c |   15 ++++++++++++---
>>>>    1 file changed, 12 insertions(+), 3 deletions(-)
>>>>
>>>> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
>>>> index 11f722460513..1df3676da185 100644
>>>> --- a/drivers/net/virtio_net.c
>>>> +++ b/drivers/net/virtio_net.c
>>>> @@ -689,6 +689,7 @@ static struct sk_buff *receive_small(struct net_device *dev,
>>>>  		xdp.data_end = xdp.data + len;
>>>>  		xdp.data_meta = xdp.data;
>>>>  		xdp.rxq = &rq->xdp_rxq;
>>>> +		xdp.frame_sz = buflen;
>>>>  		orig_data = xdp.data;
>>>>  		act = bpf_prog_run_xdp(xdp_prog, &xdp);
>>>>  		stats->xdp_packets++;
>>>> @@ -797,10 +798,11 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
>>>>  	int offset = buf - page_address(page);
>>>>  	struct sk_buff *head_skb, *curr_skb;
>>>>  	struct bpf_prog *xdp_prog;
>>>> -	unsigned int truesize;
>>>> +	unsigned int truesize = mergeable_ctx_to_truesize(ctx);
>>>>  	unsigned int headroom = mergeable_ctx_to_headroom(ctx);
>>>> -	int err;
>>>>  	unsigned int metasize = 0;
>>>> +	unsigned int frame_sz;
>>>> +	int err;
>>>> 
>>>>  	head_skb = NULL;
>>>>  	stats->bytes += len - vi->hdr_len;
>>>> @@ -821,6 +823,11 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
>>>>  		if (unlikely(hdr->hdr.gso_type))
>>>>  			goto err_xdp;
>>>> 
>>>> +		/* Buffers with headroom use PAGE_SIZE as alloc size,
>>>> +		 * see add_recvbuf_mergeable() + get_mergeable_buf_len()
>>>> +		 */
>>>> +		frame_sz = headroom ? PAGE_SIZE : truesize;
>>>> +
>>>>  		/* This happens when rx buffer size is underestimated
>>>>  		 * or headroom is not enough because of the buffer
>>>>  		 * was refilled before XDP is set. This should only
>>>> @@ -834,6 +841,8 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
>>>>  						      page, offset,
>>>>  						      VIRTIO_XDP_HEADROOM,
>>>>  						      &len);
>>>> +			frame_sz = PAGE_SIZE;
>>> Should this be PAGE_SIZE - SKB_DATA_ALIGN(sizeof(struct skb_shared_info))?
>> No, frame_sz includes the SKB_DATA_ALIGN(sizeof(struct skb_shared_info)) length.
>
>
> Ok, considering the mergeable buffer path depends on the truesize
> which is encoded in ctx.
>
> It looks like the calculation in add_recvbuf_mergeable() is wrong; we
> need to count both headroom and tailroom there.
>
> We probably need the attached 2 patches to fix this.
>
> (untested, will test it tomorrow).


Sorry for the late reply. I gave it a test and posted the attached two
patches (with minor tweaks).

It looks to me they are required for this patch to work, since
data_hard_start excludes the vnet hdr len without the attached patches,
which means PAGE_SIZE could not be used as frame_sz.

Thanks


>
> Thanks


^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH net-next 20/33] vhost_net: also populate XDP frame size
  2020-04-30  9:54     ` Jesper Dangaard Brouer
@ 2020-05-06  6:43       ` Jason Wang
  0 siblings, 0 replies; 66+ messages in thread
From: Jason Wang @ 2020-05-06  6:43 UTC (permalink / raw)
  To: Jesper Dangaard Brouer
  Cc: sameehj, netdev, bpf, zorik, akiyano, gtzalik,
	Toke Høiland-Jørgensen, Daniel Borkmann,
	Alexei Starovoitov, John Fastabend, Alexander Duyck,
	Jeff Kirsher, David Ahern, Willem de Bruijn, Ilias Apalodimas,
	Lorenzo Bianconi, Saeed Mahameed, steffen.klassert


On 2020/4/30 5:54 PM, Jesper Dangaard Brouer wrote:
> On Mon, 27 Apr 2020 13:50:15 +0800
> Jason Wang <jasowang@redhat.com> wrote:
>
>> On 2020/4/23 12:09 AM, Jesper Dangaard Brouer wrote:
>>> In vhost_net_build_xdp() the 'buf' that gets queued via an xdp_buff
>>> has embedded a struct tun_xdp_hdr (located at xdp->data_hard_start)
>>> which contains the buffer length 'buflen' (with tailroom for
>>> skb_shared_info). Also, storing this buflen in xdp->frame_sz does not
>>> obsolete struct tun_xdp_hdr, as it also contains a struct
>>> virtio_net_hdr with other information.
>>>
>>> Cc: Jason Wang <jasowang@redhat.com>
>>> Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
>>> ---
>>>    drivers/vhost/net.c |    1 +
>>>    1 file changed, 1 insertion(+)
>>>
>>> diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
>>> index 87469d67ede8..69af007e22f4 100644
>>> --- a/drivers/vhost/net.c
>>> +++ b/drivers/vhost/net.c
>>> @@ -745,6 +745,7 @@ static int vhost_net_build_xdp(struct vhost_net_virtqueue *nvq,
>>>    	xdp->data = buf + pad;
>>>    	xdp->data_end = xdp->data + len;
>>>    	hdr->buflen = buflen;
>>> +	xdp->frame_sz = buflen;
>>>    
>>>    	--net->refcnt_bias;
>>>    	alloc_frag->offset += buflen;
>>
>> tun_xdp_one() will use hdr->buflen as the frame_sz (patch 19), so it
>> looks to me there's no need to do this?
> I was thinking to go the "other way", meaning let tun_xdp_one() use
> xdp->frame_sz, which gets set here.  This would allow us to refactor
> the code, and drop struct tun_xdp_hdr, as (see pahole below) it only
> carries 'buflen' and the remaining part comes from struct
> virtio_net_hdr, which could be used directly instead.
>
> As this will be a code refactor, I would prefer we do it after this
> patch series is agreed upon.
>
> $ pahole -C tun_xdp_hdr drivers/net/tap.o
> struct tun_xdp_hdr {
> 	int                        buflen;               /*     0     4 */
> 	struct virtio_net_hdr gso;                       /*     4    10 */
>
> 	/* size: 16, cachelines: 1, members: 2 */
> 	/* padding: 2 */
> 	/* last cacheline: 16 bytes */
> };


Ok I get this.

Thanks



^ permalink raw reply	[flat|nested] 66+ messages in thread

* Re: [PATCH net-next 19/33] tun: add XDP frame size
  2020-04-22 16:08 ` [PATCH net-next 19/33] tun: add XDP frame size Jesper Dangaard Brouer
  2020-04-27  5:51   ` Jason Wang
@ 2020-05-06 20:30   ` Michael S. Tsirkin
  1 sibling, 0 replies; 66+ messages in thread
From: Michael S. Tsirkin @ 2020-05-06 20:30 UTC (permalink / raw)
  To: Jesper Dangaard Brouer
  Cc: sameehj, Jason Wang, netdev, bpf, zorik, akiyano, gtzalik,
	Toke Høiland-Jørgensen, Daniel Borkmann,
	Alexei Starovoitov, John Fastabend, Alexander Duyck,
	Jeff Kirsher, David Ahern, Willem de Bruijn, Ilias Apalodimas,
	Lorenzo Bianconi, Saeed Mahameed, steffen.klassert

On Wed, Apr 22, 2020 at 06:08:57PM +0200, Jesper Dangaard Brouer wrote:
> The tun driver has two code paths for running XDP (bpf_prog_run_xdp).
> In both cases 'buflen' contains enough tailroom for skb_shared_info.
> 
> Cc: Jason Wang <jasowang@redhat.com>
> Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>

Acked-by: Michael S. Tsirkin <mst@redhat.com>

> ---
>  drivers/net/tun.c |    2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/drivers/net/tun.c b/drivers/net/tun.c
> index 44889eba1dbc..c54f967e2c66 100644
> --- a/drivers/net/tun.c
> +++ b/drivers/net/tun.c
> @@ -1671,6 +1671,7 @@ static struct sk_buff *tun_build_skb(struct tun_struct *tun,
>  		xdp_set_data_meta_invalid(&xdp);
>  		xdp.data_end = xdp.data + len;
>  		xdp.rxq = &tfile->xdp_rxq;
> +		xdp.frame_sz = buflen;
>  
>  		act = bpf_prog_run_xdp(xdp_prog, &xdp);
>  		if (act == XDP_REDIRECT || act == XDP_TX) {
> @@ -2411,6 +2412,7 @@ static int tun_xdp_one(struct tun_struct *tun,
>  		}
>  		xdp_set_data_meta_invalid(xdp);
>  		xdp->rxq = &tfile->xdp_rxq;
> +		xdp->frame_sz = buflen;
>  
>  		act = bpf_prog_run_xdp(xdp_prog, xdp);
>  		err = tun_xdp_act(tun, xdp_prog, xdp, act);
> 
> 


^ permalink raw reply	[flat|nested] 66+ messages in thread

end of thread, other threads:[~2020-05-06 20:31 UTC | newest]

Thread overview: 66+ messages
     [not found] <158757160439.1370371.13213378122947426220.stgit@firesoul>
2020-04-22 16:07 ` [PATCH net-next 01/33] xdp: add frame size to xdp_buff Jesper Dangaard Brouer
2020-04-24 14:00   ` Toke Høiland-Jørgensen
2020-04-28 16:06     ` Jesper Dangaard Brouer
2020-04-22 16:07 ` [PATCH net-next 02/33] bnxt: add XDP frame size to driver Jesper Dangaard Brouer
2020-04-22 16:07 ` [PATCH net-next 03/33] sfc: add XDP frame size Jesper Dangaard Brouer
2020-04-22 16:07 ` [PATCH net-next 04/33] mvneta: add XDP frame size to driver Jesper Dangaard Brouer
2020-04-22 16:07 ` [PATCH net-next 05/33] net: netsec: Add support for XDP frame size Jesper Dangaard Brouer
2020-04-22 16:07 ` [PATCH net-next 06/33] net: XDP-generic determining " Jesper Dangaard Brouer
2020-04-24 14:03   ` Toke Høiland-Jørgensen
2020-04-22 16:07 ` [PATCH net-next 07/33] xdp: xdp_frame add member frame_sz and handle in convert_to_xdp_frame Jesper Dangaard Brouer
2020-04-24 14:04   ` Toke Høiland-Jørgensen
2020-04-25  3:24   ` Toshiaki Makita
2020-04-27 15:20     ` Jesper Dangaard Brouer
2020-04-22 16:08 ` [PATCH net-next 08/33] xdp: cpumap redirect use frame_sz and increase skb_tailroom Jesper Dangaard Brouer
2020-04-24 14:04   ` Toke Høiland-Jørgensen
2020-04-22 16:08 ` [PATCH net-next 09/33] veth: adjust hard_start offset on redirect XDP frames Jesper Dangaard Brouer
2020-04-24 14:05   ` Toke Høiland-Jørgensen
2020-04-22 16:08 ` [PATCH net-next 10/33] veth: xdp using frame_sz in veth driver Jesper Dangaard Brouer
2020-04-24 14:07   ` Toke Høiland-Jørgensen
2020-04-25  3:10   ` Toshiaki Makita
2020-04-22 16:08 ` [PATCH net-next 11/33] dpaa2-eth: add XDP frame size Jesper Dangaard Brouer
2020-04-22 16:08 ` [PATCH net-next 12/33] hv_netvsc: add XDP frame size to driver Jesper Dangaard Brouer
2020-04-22 16:57   ` Haiyang Zhang
2020-04-22 16:08 ` [PATCH net-next 13/33] qlogic/qede: " Jesper Dangaard Brouer
2020-04-22 16:08 ` [PATCH net-next 14/33] net: ethernet: ti: add XDP frame size to driver cpsw Jesper Dangaard Brouer
2020-04-22 20:28   ` Grygorii Strashko
2020-04-22 16:08 ` [PATCH net-next 15/33] ena: add XDP frame size to amazon NIC driver Jesper Dangaard Brouer
2020-04-22 16:08 ` [PATCH net-next 16/33] mlx4: add XDP frame size and adjust max XDP MTU Jesper Dangaard Brouer
2020-04-22 16:08 ` [PATCH net-next 17/33] net: thunderx: add XDP frame size Jesper Dangaard Brouer
2020-04-22 16:08 ` [PATCH net-next 18/33] nfp: add XDP frame size to netronome driver Jesper Dangaard Brouer
2020-04-23  2:43   ` Jakub Kicinski
2020-04-22 16:08 ` [PATCH net-next 19/33] tun: add XDP frame size Jesper Dangaard Brouer
2020-04-27  5:51   ` Jason Wang
2020-05-06 20:30   ` Michael S. Tsirkin
2020-04-22 16:09 ` [PATCH net-next 20/33] vhost_net: also populate " Jesper Dangaard Brouer
2020-04-27  5:50   ` Jason Wang
2020-04-30  9:54     ` Jesper Dangaard Brouer
2020-05-06  6:43       ` Jason Wang
2020-04-22 16:09 ` [PATCH net-next 21/33] virtio_net: add XDP frame size in two code paths Jesper Dangaard Brouer
2020-04-27  7:21   ` Jason Wang
2020-04-27 14:32     ` Jesper Dangaard Brouer
2020-04-28  9:50       ` Jason Wang
2020-04-30 10:14         ` Jesper Dangaard Brouer
2020-05-06  6:38         ` Jason Wang
2020-04-22 16:09 ` [PATCH net-next 22/33] ixgbe: fix XDP redirect on archs with PAGE_SIZE above 4K Jesper Dangaard Brouer
2020-04-22 16:09 ` [PATCH net-next 23/33] ixgbe: add XDP frame size to driver Jesper Dangaard Brouer
2020-04-27 19:51   ` Daniel Borkmann
2020-04-22 16:09 ` [PATCH net-next 24/33] ixgbevf: add XDP frame size to VF driver Jesper Dangaard Brouer
2020-04-22 16:09 ` [PATCH net-next 25/33] i40e: add XDP frame size to driver Jesper Dangaard Brouer
2020-04-22 16:09 ` [PATCH net-next 26/33] ice: " Jesper Dangaard Brouer
2020-04-22 16:09 ` [PATCH net-next 27/33] xdp: for Intel AF_XDP drivers add XDP frame_sz Jesper Dangaard Brouer
2020-04-22 16:09 ` [PATCH net-next 28/33] mlx5: rx queue setup time determine frame_sz for XDP Jesper Dangaard Brouer
2020-04-25  0:58   ` Alexei Starovoitov
2020-04-27 20:22   ` Jesper Dangaard Brouer
2020-04-22 16:09 ` [PATCH net-next 29/33] xdp: allow bpf_xdp_adjust_tail() to grow packet size Jesper Dangaard Brouer
2020-04-24 14:09   ` Toke Høiland-Jørgensen
2020-04-27 19:01   ` Daniel Borkmann
2020-04-28 16:37     ` Jesper Dangaard Brouer
2020-04-28 19:36       ` Daniel Borkmann
2020-04-22 16:09 ` [PATCH net-next 30/33] xdp: clear grow memory in bpf_xdp_adjust_tail() Jesper Dangaard Brouer
2020-04-24 14:09   ` Toke Høiland-Jørgensen
2020-04-27  5:26   ` John Fastabend
2020-04-28 14:50     ` Jesper Dangaard Brouer
2020-04-22 16:09 ` [PATCH net-next 31/33] bpf: add xdp.frame_sz in bpf_prog_test_run_xdp() Jesper Dangaard Brouer
2020-04-22 16:10 ` [PATCH net-next 32/33] selftests/bpf: adjust BPF selftest for xdp_adjust_tail Jesper Dangaard Brouer
2020-04-22 16:10 ` [PATCH net-next 33/33] selftests/bpf: xdp_adjust_tail add grow tail tests Jesper Dangaard Brouer
