netdev.vger.kernel.org archive mirror
* [PATCH net-next v2 01/33] xdp: add frame size to xdp_buff
       [not found] <158824557985.2172139.4173570969543904434.stgit@firesoul>
@ 2020-04-30 11:20 ` Jesper Dangaard Brouer
  2020-04-30 11:20 ` [PATCH net-next v2 02/33] bnxt: add XDP frame size to driver Jesper Dangaard Brouer
                   ` (31 subsequent siblings)
  32 siblings, 0 replies; 47+ messages in thread
From: Jesper Dangaard Brouer @ 2020-04-30 11:20 UTC
  To: sameehj
  Cc: Toke Høiland-Jørgensen, Jesper Dangaard Brouer, netdev,
	bpf, zorik, akiyano, gtzalik, Toke Høiland-Jørgensen,
	Daniel Borkmann, Alexei Starovoitov, John Fastabend,
	Alexander Duyck, Jeff Kirsher, David Ahern, Willem de Bruijn,
	Ilias Apalodimas, Lorenzo Bianconi, Saeed Mahameed,
	steffen.klassert

XDP has evolved to support several frame sizes, but xdp_buff was not
updated with this information. This patch introduces the frame size
(frame_sz) member of xdp_buff, which records the real size of the
memory the frame is delivered in.

While introducing this, also make it clear that some tailroom is
reserved/required when creating SKBs using build_skb().

It would also have been an option to introduce a pointer to
data_hard_end (with reserved offset). The advantage of frame_sz is
that (like rxq) drivers only need to set up/assign this value once per
NAPI cycle. Because of XDP-generic (and some drivers), it is not
possible to store frame_sz inside xdp_rxq_info, because it varies per
packet, as it can depend on the packet length.
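A minimal userspace sketch of the arithmetic behind xdp_data_hard_end(), assuming a 64-byte cache line and a 320-byte struct skb_shared_info (both architecture-dependent). The model_* names are hypothetical stand-ins, not kernel APIs:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed model values, not taken from the patch: 64-byte cache lines
 * and a 320-byte struct skb_shared_info (x86_64-like). */
#define MODEL_CACHE_BYTES 64u
#define MODEL_DATA_ALIGN(x) \
	(((x) + (MODEL_CACHE_BYTES - 1)) & ~(MODEL_CACHE_BYTES - 1))
#define MODEL_SHINFO_SIZE 320u

/* Cut-down, hypothetical stand-in for struct xdp_buff. */
struct model_xdp_buff {
	unsigned char *data;
	unsigned char *data_end;
	unsigned char *data_hard_start;
	uint32_t frame_sz;
};

/* Same arithmetic as xdp_data_hard_end(): the frame end minus the aligned
 * tailroom that build_skb() needs for skb_shared_info. */
static unsigned char *model_data_hard_end(const struct model_xdp_buff *xdp)
{
	assert(xdp->frame_sz >= MODEL_DATA_ALIGN(MODEL_SHINFO_SIZE));
	return xdp->data_hard_start + xdp->frame_sz -
	       MODEL_DATA_ALIGN(MODEL_SHINFO_SIZE);
}
```

With a 4096-byte frame, XDP may use everything up to 320 bytes before the end of the page; the last 320 bytes stay reserved for build_skb().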

V2: nitpick: deduct -> deduce

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
---
 include/net/xdp.h |   13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/include/net/xdp.h b/include/net/xdp.h
index 3cc6d5d84aa4..a764af4ae0ea 100644
--- a/include/net/xdp.h
+++ b/include/net/xdp.h
@@ -6,6 +6,8 @@
 #ifndef __LINUX_NET_XDP_H__
 #define __LINUX_NET_XDP_H__
 
+#include <linux/skbuff.h> /* skb_shared_info */
+
 /**
  * DOC: XDP RX-queue information
  *
@@ -70,8 +72,19 @@ struct xdp_buff {
 	void *data_hard_start;
 	unsigned long handle;
 	struct xdp_rxq_info *rxq;
+	u32 frame_sz; /* frame size to deduce data_hard_end/reserved tailroom*/
 };
 
+/* Reserve memory area at end-of data area.
+ *
+ * This macro reserves tailroom in the XDP buffer by limiting the
+ * XDP/BPF data access to data_hard_end.  Notice same area (and size)
+ * is used for XDP_PASS, when constructing the SKB via build_skb().
+ */
+#define xdp_data_hard_end(xdp)				\
+	((xdp)->data_hard_start + (xdp)->frame_sz -	\
+	 SKB_DATA_ALIGN(sizeof(struct skb_shared_info)))
+
 struct xdp_frame {
 	void *data;
 	u16 len;

* [PATCH net-next v2 02/33] bnxt: add XDP frame size to driver
       [not found] <158824557985.2172139.4173570969543904434.stgit@firesoul>
  2020-04-30 11:20 ` [PATCH net-next v2 01/33] xdp: add frame size to xdp_buff Jesper Dangaard Brouer
@ 2020-04-30 11:20 ` Jesper Dangaard Brouer
  2020-04-30 11:20 ` [PATCH net-next v2 03/33] sfc: add XDP frame size Jesper Dangaard Brouer
                   ` (30 subsequent siblings)
  32 siblings, 0 replies; 47+ messages in thread
From: Jesper Dangaard Brouer @ 2020-04-30 11:20 UTC
  To: sameehj
  Cc: Michael Chan, Andy Gospodarek, Andy Gospodarek,
	Jesper Dangaard Brouer, netdev, bpf, zorik, akiyano, gtzalik,
	Toke Høiland-Jørgensen, Daniel Borkmann,
	Alexei Starovoitov, John Fastabend, Alexander Duyck,
	Jeff Kirsher, David Ahern, Willem de Bruijn, Ilias Apalodimas,
	Lorenzo Bianconi, Saeed Mahameed, steffen.klassert

This driver uses full PAGE_SIZE pages when XDP is enabled.

When XDP is enabled, the driver uses __bnxt_alloc_rx_page(), which
DMA-maps the full page. Thus, growing the tail via xdp_adjust_tail is
DMA-compliant for the XDP_TX action, which does a DMA-sync.

Cc: Michael Chan <michael.chan@broadcom.com>
Cc: Andy Gospodarek <andrew.gospodarek@broadcom.com>
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Reviewed-by: Andy Gospodarek <gospo@broadcom.com>
---
 drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c |    1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c b/drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c
index c6f6f2033880..5e3b4a3b69ea 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c
@@ -138,6 +138,7 @@ bool bnxt_rx_xdp(struct bnxt *bp, struct bnxt_rx_ring_info *rxr, u16 cons,
 	xdp_set_data_meta_invalid(&xdp);
 	xdp.data_end = *data_ptr + *len;
 	xdp.rxq = &rxr->xdp_rxq;
+	xdp.frame_sz = PAGE_SIZE; /* BNXT_RX_PAGE_MODE(bp) when XDP enabled */
 	orig_data = xdp.data;
 
 	rcu_read_lock();

* [PATCH net-next v2 03/33] sfc: add XDP frame size
       [not found] <158824557985.2172139.4173570969543904434.stgit@firesoul>
  2020-04-30 11:20 ` [PATCH net-next v2 01/33] xdp: add frame size to xdp_buff Jesper Dangaard Brouer
  2020-04-30 11:20 ` [PATCH net-next v2 02/33] bnxt: add XDP frame size to driver Jesper Dangaard Brouer
@ 2020-04-30 11:20 ` Jesper Dangaard Brouer
  2020-04-30 11:20 ` [PATCH net-next v2 04/33] mvneta: add XDP frame size to driver Jesper Dangaard Brouer
                   ` (29 subsequent siblings)
  32 siblings, 0 replies; 47+ messages in thread
From: Jesper Dangaard Brouer @ 2020-04-30 11:20 UTC
  To: sameehj
  Cc: Jesper Dangaard Brouer, netdev, bpf, zorik, akiyano, gtzalik,
	Toke Høiland-Jørgensen, Daniel Borkmann,
	Alexei Starovoitov, John Fastabend, Alexander Duyck,
	Jeff Kirsher, David Ahern, Willem de Bruijn, Ilias Apalodimas,
	Lorenzo Bianconi, Saeed Mahameed, steffen.klassert

This driver uses RX page-split when possible. It was recently fixed
in commit 86e85bf6981c ("sfc: fix XDP-redirect in this driver") to
add the needed tailroom for XDP-redirect.

After the fix, efx->rx_page_buf_step is the frame size, with enough
headroom and tailroom for XDP-redirect.
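A hypothetical page-split model (not sfc's actual code) showing why using the per-buffer step as frame_sz is safe: if buffers are laid out back to back, each `step` bytes long, the XDP hard end of one buffer always stays inside that buffer. The 4096-byte page, the 320-byte aligned skb_shared_info and the 2048-byte step used below are illustrative assumptions:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical page-split model: RX buffers sit back to back in a page,
 * each `step` bytes long (the role efx->rx_page_buf_step plays in sfc).
 * Sizes are illustrative assumptions. */
#define MODEL_PAGE_SIZE 4096u
#define MODEL_SHINFO_TAILROOM 320u /* assumed aligned skb_shared_info size */

/* Start offset of buffer `i` within the page. */
static uint32_t model_buf_start(uint32_t step, uint32_t i)
{
	return i * step;
}

/* With frame_sz = step, the XDP hard end of buffer i lies inside that
 * buffer, so growing its tail can never touch buffer i + 1. */
static uint32_t model_buf_hard_end(uint32_t step, uint32_t i)
{
	assert(step > MODEL_SHINFO_TAILROOM);
	return model_buf_start(step, i) + step - MODEL_SHINFO_TAILROOM;
}
```

For a 2048-byte step, buffer 0 may be used up to offset 1728, well before buffer 1 starts at offset 2048.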

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
---
 drivers/net/ethernet/sfc/rx.c |    1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/net/ethernet/sfc/rx.c b/drivers/net/ethernet/sfc/rx.c
index 260352d97d9d..68c47a8c71df 100644
--- a/drivers/net/ethernet/sfc/rx.c
+++ b/drivers/net/ethernet/sfc/rx.c
@@ -308,6 +308,7 @@ static bool efx_do_xdp(struct efx_nic *efx, struct efx_channel *channel,
 	xdp_set_data_meta_invalid(&xdp);
 	xdp.data_end = xdp.data + rx_buf->len;
 	xdp.rxq = &rx_queue->xdp_rxq_info;
+	xdp.frame_sz = efx->rx_page_buf_step;
 
 	xdp_act = bpf_prog_run_xdp(xdp_prog, &xdp);
 	rcu_read_unlock();

* [PATCH net-next v2 04/33] mvneta: add XDP frame size to driver
       [not found] <158824557985.2172139.4173570969543904434.stgit@firesoul>
                   ` (2 preceding siblings ...)
  2020-04-30 11:20 ` [PATCH net-next v2 03/33] sfc: add XDP frame size Jesper Dangaard Brouer
@ 2020-04-30 11:20 ` Jesper Dangaard Brouer
  2020-04-30 11:20 ` [PATCH net-next v2 05/33] net: netsec: Add support for XDP frame size Jesper Dangaard Brouer
                   ` (28 subsequent siblings)
  32 siblings, 0 replies; 47+ messages in thread
From: Jesper Dangaard Brouer @ 2020-04-30 11:20 UTC
  To: sameehj
  Cc: thomas.petazzoni, Lorenzo Bianconi, Jesper Dangaard Brouer,
	netdev, bpf, zorik, akiyano, gtzalik,
	Toke Høiland-Jørgensen, Daniel Borkmann,
	Alexei Starovoitov, John Fastabend, Alexander Duyck,
	Jeff Kirsher, David Ahern, Willem de Bruijn, Ilias Apalodimas,
	Lorenzo Bianconi, Saeed Mahameed, steffen.klassert

The Marvell driver mvneta uses PAGE_SIZE frames, which makes it
easy to convert. The driver updates rxq, and now frame_sz, once per
NAPI call.

This driver takes advantage of the page_pool flag PP_FLAG_DMA_SYNC_DEV,
which can help reduce the number of cache-lines that need to be flushed
when doing DMA sync for_device. Because xdp_adjust_tail can grow the
area accessible to the CPU (which it can possibly write into), the max
sync length *after* bpf_prog_run_xdp() needs to be taken into account.

For the XDP_TX action, the driver is smart and does a DMA-sync. When
growing the tail this is still safe, because page_pool has DMA-mapped
the entire page.
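A minimal sketch of the sync-length rule in the mvneta hunk, using integer offsets from data_hard_start instead of pointers. The name model_dma_sync_len() and the example sizes are hypothetical. The rx_offset parameter stands in for pp->rx_offset_correction:

```c
#include <assert.h>

/* len is measured before the XDP program runs, sync after it, since the
 * program may grow or shrink the tail via bpf_xdp_adjust_tail().
 * Offsets are relative to data_hard_start. */
static unsigned int model_dma_sync_len(unsigned int data_end_before,
				       unsigned int data_end_after,
				       unsigned int rx_offset)
{
	unsigned int len, sync;

	assert(data_end_before >= rx_offset && data_end_after >= rx_offset);
	len = data_end_before - rx_offset;
	sync = data_end_after - rx_offset;

	/* Cover whichever end is larger: the program may have written into
	 * the original packet before shrinking it, or into a grown tail. */
	return sync > len ? sync : len;
}
```

Taking the maximum keeps the for_device sync correct in both directions. The sync still stays within the page that page_pool DMA-mapped.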

Cc: thomas.petazzoni@bootlin.com
Acked-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
---
 drivers/net/ethernet/marvell/mvneta.c |   25 +++++++++++++++----------
 1 file changed, 15 insertions(+), 10 deletions(-)

diff --git a/drivers/net/ethernet/marvell/mvneta.c b/drivers/net/ethernet/marvell/mvneta.c
index 51889770958d..37947949345c 100644
--- a/drivers/net/ethernet/marvell/mvneta.c
+++ b/drivers/net/ethernet/marvell/mvneta.c
@@ -2148,12 +2148,17 @@ mvneta_run_xdp(struct mvneta_port *pp, struct mvneta_rx_queue *rxq,
 	       struct bpf_prog *prog, struct xdp_buff *xdp,
 	       struct mvneta_stats *stats)
 {
-	unsigned int len;
+	unsigned int len, sync;
+	struct page *page;
 	u32 ret, act;
 
 	len = xdp->data_end - xdp->data_hard_start - pp->rx_offset_correction;
 	act = bpf_prog_run_xdp(prog, xdp);
 
+	/* Due xdp_adjust_tail: DMA sync for_device cover max len CPU touch */
+	sync = xdp->data_end - xdp->data_hard_start - pp->rx_offset_correction;
+	sync = max(sync, len);
+
 	switch (act) {
 	case XDP_PASS:
 		stats->xdp_pass++;
@@ -2164,9 +2169,8 @@ mvneta_run_xdp(struct mvneta_port *pp, struct mvneta_rx_queue *rxq,
 		err = xdp_do_redirect(pp->dev, xdp, prog);
 		if (unlikely(err)) {
 			ret = MVNETA_XDP_DROPPED;
-			page_pool_put_page(rxq->page_pool,
-					   virt_to_head_page(xdp->data), len,
-					   true);
+			page = virt_to_head_page(xdp->data);
+			page_pool_put_page(rxq->page_pool, page, sync, true);
 		} else {
 			ret = MVNETA_XDP_REDIR;
 			stats->xdp_redirect++;
@@ -2175,10 +2179,10 @@ mvneta_run_xdp(struct mvneta_port *pp, struct mvneta_rx_queue *rxq,
 	}
 	case XDP_TX:
 		ret = mvneta_xdp_xmit_back(pp, xdp);
-		if (ret != MVNETA_XDP_TX)
-			page_pool_put_page(rxq->page_pool,
-					   virt_to_head_page(xdp->data), len,
-					   true);
+		if (ret != MVNETA_XDP_TX) {
+			page = virt_to_head_page(xdp->data);
+			page_pool_put_page(rxq->page_pool, page, sync, true);
+		}
 		break;
 	default:
 		bpf_warn_invalid_xdp_action(act);
@@ -2187,8 +2191,8 @@ mvneta_run_xdp(struct mvneta_port *pp, struct mvneta_rx_queue *rxq,
 		trace_xdp_exception(pp->dev, prog, act);
 		/* fall through */
 	case XDP_DROP:
-		page_pool_put_page(rxq->page_pool,
-				   virt_to_head_page(xdp->data), len, true);
+		page = virt_to_head_page(xdp->data);
+		page_pool_put_page(rxq->page_pool, page, sync, true);
 		ret = MVNETA_XDP_DROPPED;
 		stats->xdp_drop++;
 		break;
@@ -2320,6 +2324,7 @@ static int mvneta_rx_swbm(struct napi_struct *napi,
 	rcu_read_lock();
 	xdp_prog = READ_ONCE(pp->xdp_prog);
 	xdp_buf.rxq = &rxq->xdp_rxq;
+	xdp_buf.frame_sz = PAGE_SIZE;
 
 	/* Fairness NAPI loop */
 	while (rx_proc < budget && rx_proc < rx_todo) {

* [PATCH net-next v2 05/33] net: netsec: Add support for XDP frame size
       [not found] <158824557985.2172139.4173570969543904434.stgit@firesoul>
                   ` (3 preceding siblings ...)
  2020-04-30 11:20 ` [PATCH net-next v2 04/33] mvneta: add XDP frame size to driver Jesper Dangaard Brouer
@ 2020-04-30 11:20 ` Jesper Dangaard Brouer
  2020-04-30 11:20 ` [PATCH net-next v2 06/33] net: XDP-generic determining " Jesper Dangaard Brouer
                   ` (27 subsequent siblings)
  32 siblings, 0 replies; 47+ messages in thread
From: Jesper Dangaard Brouer @ 2020-04-30 11:20 UTC
  To: sameehj
  Cc: Ilias Apalodimas, Lorenzo Bianconi, Jesper Dangaard Brouer,
	netdev, bpf, zorik, akiyano, gtzalik,
	Toke Høiland-Jørgensen, Daniel Borkmann,
	Alexei Starovoitov, John Fastabend, Alexander Duyck,
	Jeff Kirsher, David Ahern, Willem de Bruijn, Ilias Apalodimas,
	Lorenzo Bianconi, Saeed Mahameed, steffen.klassert

From: Ilias Apalodimas <ilias.apalodimas@linaro.org>

This driver takes advantage of the page_pool flag PP_FLAG_DMA_SYNC_DEV,
which can help reduce the number of cache-lines that need to be flushed
when doing DMA sync for_device. Because xdp_adjust_tail can grow the
area accessible to the CPU (which it can possibly write into), the max
sync length *after* bpf_prog_run_xdp() needs to be taken into account.

For the XDP_TX action, the driver is smart and does a DMA-sync. When
growing the tail this is still safe, because page_pool has DMA-mapped
the entire page.

Signed-off-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: Lorenzo Bianconi <lorenzo@kernel.org>
---
 drivers/net/ethernet/socionext/netsec.c |   30 ++++++++++++++++++------------
 1 file changed, 18 insertions(+), 12 deletions(-)

diff --git a/drivers/net/ethernet/socionext/netsec.c b/drivers/net/ethernet/socionext/netsec.c
index a5a0fb60193a..e1f4be4b3d69 100644
--- a/drivers/net/ethernet/socionext/netsec.c
+++ b/drivers/net/ethernet/socionext/netsec.c
@@ -884,23 +884,28 @@ static u32 netsec_run_xdp(struct netsec_priv *priv, struct bpf_prog *prog,
 			  struct xdp_buff *xdp)
 {
 	struct netsec_desc_ring *dring = &priv->desc_ring[NETSEC_RING_RX];
-	unsigned int len = xdp->data_end - xdp->data;
+	unsigned int sync, len = xdp->data_end - xdp->data;
 	u32 ret = NETSEC_XDP_PASS;
+	struct page *page;
 	int err;
 	u32 act;
 
 	act = bpf_prog_run_xdp(prog, xdp);
 
+	/* Due xdp_adjust_tail: DMA sync for_device cover max len CPU touch */
+	sync = xdp->data_end - xdp->data_hard_start - NETSEC_RXBUF_HEADROOM;
+	sync = max(sync, len);
+
 	switch (act) {
 	case XDP_PASS:
 		ret = NETSEC_XDP_PASS;
 		break;
 	case XDP_TX:
 		ret = netsec_xdp_xmit_back(priv, xdp);
-		if (ret != NETSEC_XDP_TX)
-			page_pool_put_page(dring->page_pool,
-					   virt_to_head_page(xdp->data), len,
-					   true);
+		if (ret != NETSEC_XDP_TX) {
+			page = virt_to_head_page(xdp->data);
+			page_pool_put_page(dring->page_pool, page, sync, true);
+		}
 		break;
 	case XDP_REDIRECT:
 		err = xdp_do_redirect(priv->ndev, xdp, prog);
@@ -908,9 +913,8 @@ static u32 netsec_run_xdp(struct netsec_priv *priv, struct bpf_prog *prog,
 			ret = NETSEC_XDP_REDIR;
 		} else {
 			ret = NETSEC_XDP_CONSUMED;
-			page_pool_put_page(dring->page_pool,
-					   virt_to_head_page(xdp->data), len,
-					   true);
+			page = virt_to_head_page(xdp->data);
+			page_pool_put_page(dring->page_pool, page, sync, true);
 		}
 		break;
 	default:
@@ -921,8 +925,8 @@ static u32 netsec_run_xdp(struct netsec_priv *priv, struct bpf_prog *prog,
 		/* fall through -- handle aborts by dropping packet */
 	case XDP_DROP:
 		ret = NETSEC_XDP_CONSUMED;
-		page_pool_put_page(dring->page_pool,
-				   virt_to_head_page(xdp->data), len, true);
+		page = virt_to_head_page(xdp->data);
+		page_pool_put_page(dring->page_pool, page, sync, true);
 		break;
 	}
 
@@ -936,10 +940,14 @@ static int netsec_process_rx(struct netsec_priv *priv, int budget)
 	struct netsec_rx_pkt_info rx_info;
 	enum dma_data_direction dma_dir;
 	struct bpf_prog *xdp_prog;
+	struct xdp_buff xdp;
 	u16 xdp_xmit = 0;
 	u32 xdp_act = 0;
 	int done = 0;
 
+	xdp.rxq = &dring->xdp_rxq;
+	xdp.frame_sz = PAGE_SIZE;
+
 	rcu_read_lock();
 	xdp_prog = READ_ONCE(priv->xdp_prog);
 	dma_dir = page_pool_get_dma_dir(dring->page_pool);
@@ -953,7 +961,6 @@ static int netsec_process_rx(struct netsec_priv *priv, int budget)
 		struct sk_buff *skb = NULL;
 		u16 pkt_len, desc_len;
 		dma_addr_t dma_handle;
-		struct xdp_buff xdp;
 		void *buf_addr;
 
 		if (de->attr & (1U << NETSEC_RX_PKT_OWN_FIELD)) {
@@ -1002,7 +1009,6 @@ static int netsec_process_rx(struct netsec_priv *priv, int budget)
 		xdp.data = desc->addr + NETSEC_RXBUF_HEADROOM;
 		xdp_set_data_meta_invalid(&xdp);
 		xdp.data_end = xdp.data + pkt_len;
-		xdp.rxq = &dring->xdp_rxq;
 
 		if (xdp_prog) {
 			xdp_result = netsec_run_xdp(priv, xdp_prog, &xdp);

* [PATCH net-next v2 06/33] net: XDP-generic determining XDP frame size
       [not found] <158824557985.2172139.4173570969543904434.stgit@firesoul>
                   ` (4 preceding siblings ...)
  2020-04-30 11:20 ` [PATCH net-next v2 05/33] net: netsec: Add support for XDP frame size Jesper Dangaard Brouer
@ 2020-04-30 11:20 ` Jesper Dangaard Brouer
  2020-04-30 11:20 ` [PATCH net-next v2 07/33] xdp: xdp_frame add member frame_sz and handle in convert_to_xdp_frame Jesper Dangaard Brouer
                   ` (26 subsequent siblings)
  32 siblings, 0 replies; 47+ messages in thread
From: Jesper Dangaard Brouer @ 2020-04-30 11:20 UTC
  To: sameehj
  Cc: Toke Høiland-Jørgensen, Jesper Dangaard Brouer, netdev,
	bpf, zorik, akiyano, gtzalik, Toke Høiland-Jørgensen,
	Daniel Borkmann, Alexei Starovoitov, John Fastabend,
	Alexander Duyck, Jeff Kirsher, David Ahern, Willem de Bruijn,
	Ilias Apalodimas, Lorenzo Bianconi, Saeed Mahameed,
	steffen.klassert

The SKB "head" pointer points to the data area that also contains
skb_shared_info, which can be found via skb_end_pointer(). Given that
xdp->data_hard_start has been established (basically pointing to
skb->head), the frame size is the distance from data_hard_start to
skb_end_pointer(), plus the size reserved for skb_shared_info.

Change the bpf_xdp_adjust_tail offset adjustment of skb->len to be a
positive offset on grow and a negative offset on shrink, as this seems
more natural when reading the code.
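A userspace sketch of both changes in this patch, using offsets from skb->head (which equals data_hard_start here). end_off stands in for skb_end_pointer(). The aligned skb_shared_info size is passed in because it is architecture-dependent. The model_* names are hypothetical:

```c
#include <assert.h>
#include <stdint.h>

/* skb_shared_info lives right after skb_end_pointer(), so the frame
 * spans [head, end + aligned shared info). */
static uint32_t model_generic_frame_sz(uint32_t end_off, uint32_t shinfo_aligned)
{
	return end_off + shinfo_aligned;
}

/* skb->len update after the program ran: off is positive on grow and
 * negative on shrink, mirroring "skb->len += off". */
static unsigned int model_update_len(unsigned int len, int orig_data_end_off,
				     int new_data_end_off)
{
	int off = new_data_end_off - orig_data_end_off;

	assert(off >= 0 || (unsigned int)-off <= len);
	return len + off; /* unsigned wrap-around yields len - |off| on shrink */
}
```

For example, an skb whose end pointer sits 1728 bytes past head gets frame_sz 2048 with a 320-byte aligned shared info. A program that shrinks the tail by 50 bytes reduces skb->len by 50.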

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
---
 net/core/dev.c |   14 ++++++++------
 1 file changed, 8 insertions(+), 6 deletions(-)

diff --git a/net/core/dev.c b/net/core/dev.c
index afff16849c26..b364e6f3a37a 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -4549,6 +4549,11 @@ static u32 netif_receive_generic_xdp(struct sk_buff *skb,
 	xdp->data_meta = xdp->data;
 	xdp->data_end = xdp->data + hlen;
 	xdp->data_hard_start = skb->data - skb_headroom(skb);
+
+	/* SKB "head" area always have tailroom for skb_shared_info */
+	xdp->frame_sz  = (void *)skb_end_pointer(skb) - xdp->data_hard_start;
+	xdp->frame_sz += SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
+
 	orig_data_end = xdp->data_end;
 	orig_data = xdp->data;
 	eth = (struct ethhdr *)xdp->data;
@@ -4572,14 +4577,11 @@ static u32 netif_receive_generic_xdp(struct sk_buff *skb,
 		skb_reset_network_header(skb);
 	}
 
-	/* check if bpf_xdp_adjust_tail was used. it can only "shrink"
-	 * pckt.
-	 */
-	off = orig_data_end - xdp->data_end;
+	/* check if bpf_xdp_adjust_tail was used */
+	off = xdp->data_end - orig_data_end;
 	if (off != 0) {
 		skb_set_tail_pointer(skb, xdp->data_end - xdp->data);
-		skb->len -= off;
-
+		skb->len += off; /* positive on grow, negative on shrink */
 	}
 
 	/* check if XDP changed eth hdr such SKB needs update */

* [PATCH net-next v2 07/33] xdp: xdp_frame add member frame_sz and handle in convert_to_xdp_frame
       [not found] <158824557985.2172139.4173570969543904434.stgit@firesoul>
                   ` (5 preceding siblings ...)
  2020-04-30 11:20 ` [PATCH net-next v2 06/33] net: XDP-generic determining " Jesper Dangaard Brouer
@ 2020-04-30 11:20 ` Jesper Dangaard Brouer
  2020-04-30 11:21 ` [PATCH net-next v2 08/33] xdp: cpumap redirect use frame_sz and increase skb_tailroom Jesper Dangaard Brouer
                   ` (25 subsequent siblings)
  32 siblings, 0 replies; 47+ messages in thread
From: Jesper Dangaard Brouer @ 2020-04-30 11:20 UTC
  To: sameehj
  Cc: Toke Høiland-Jørgensen, Jesper Dangaard Brouer, netdev,
	bpf, zorik, akiyano, gtzalik, Toke Høiland-Jørgensen,
	Daniel Borkmann, Alexei Starovoitov, John Fastabend,
	Alexander Duyck, Jeff Kirsher, David Ahern, Willem de Bruijn,
	Ilias Apalodimas, Lorenzo Bianconi, Saeed Mahameed,
	steffen.klassert

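Use a hole in struct xdp_frame when adding the member frame_sz, which
keeps the struct size the same (32 bytes). To fit, metasize is narrowed
from u16 to an 8-bit bitfield, and frame_sz takes the remaining 24 bits.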
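A userspace mirror of the old and new layouts that makes the reused padding visible. It assumes an LP64 ABI such as x86_64 with GCC or Clang. It models struct xdp_mem_info as two u32 fields and dev_rx as an opaque pointer. The model_* names are hypothetical:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct model_xdp_mem_info {
	uint32_t type;
	uint32_t id;
};

/* New layout: two bitfields share the 4 bytes after headroom. */
struct model_xdp_frame {
	void *data;
	uint16_t len;
	uint16_t headroom;
	uint32_t metasize:8;   /* was a u16 */
	uint32_t frame_sz:24;  /* fills the former 2-byte padding hole */
	struct model_xdp_mem_info mem;
	void *dev_rx;
};

/* Old layout: u16 metasize followed by 2 bytes of padding before mem. */
struct model_xdp_frame_old {
	void *data;
	uint16_t len;
	uint16_t headroom;
	uint16_t metasize;
	struct model_xdp_mem_info mem;
	void *dev_rx;
};

#define MODEL_METASIZE_MAX ((1u << 8) - 1)
#define MODEL_FRAME_SZ_MAX ((1u << 24) - 1)

#if defined(__LP64__)
_Static_assert(sizeof(struct model_xdp_frame) ==
	       sizeof(struct model_xdp_frame_old), "hole reused, size unchanged");
#endif

/* frame_sz is 24 bits wide, so frames must stay below 16 MiB. */
static void model_set_frame_sz(struct model_xdp_frame *f, uint32_t sz)
{
	assert(sz <= MODEL_FRAME_SZ_MAX);
	f->frame_sz = sz;
}
```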

The ixgbe and sfc drivers had bug cases where the necessary/expected
tailroom was not reserved. This can lead to hard-to-catch memory
corruption issues. With the driver's frame_sz available, this can be
detected when the packet end (xdp->data_end) exceeds the
xdp_data_hard_end pointer, which accounts for the reserved tailroom.

When detecting this driver issue, simply fail the conversion with NULL,
which results in feedback to the driver (a failing xdp_do_redirect()),
causing the driver to drop the packet. Given the lack of consistent XDP
stats, this can be hard to troubleshoot. And given that this is a driver
bug, we want to generate some more noise in the form of a WARN stack
dump (to identify the driver code that inlined convert_to_xdp_frame).
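A minimal sketch of the new check in convert_to_xdp_frame(), assuming a 320-byte aligned skb_shared_info. It returns -1 where the kernel function returns NULL. The struct and function names are hypothetical:

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_SHINFO_TAILROOM 320u /* assumed aligned skb_shared_info size */

/* Cut-down, hypothetical stand-in for struct xdp_buff. */
struct model_xdp_buff {
	unsigned char *data;
	unsigned char *data_end;
	unsigned char *data_hard_start;
	uint32_t frame_sz;
};

/* Returns 0 on success, or -1 when the packet end reaches into the
 * tailroom reserved for skb_shared_info (a driver bug). */
static int model_check_tailroom(const struct model_xdp_buff *xdp)
{
	const unsigned char *hard_end;

	assert(xdp->frame_sz >= MODEL_SHINFO_TAILROOM);
	hard_end = xdp->data_hard_start + xdp->frame_sz - MODEL_SHINFO_TAILROOM;

	if (xdp->data_end > hard_end)
		return -1; /* caller fails the redirect and drops the packet */
	return 0;
}
```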

Inlining the WARN macro is problematic, because it adds an asm
instruction (ud2 on Intel CPUs) that influences instruction cache
prefetching. Thus, introduce xdp_warn and the macro XDP_WARN to avoid
this, and at the same time make it easier to identify the function and
line of the inlined caller.
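A single-file userspace sketch of the XDP_WARN idea. The kernel keeps the warning out of line by defining xdp_warn() in net/core/xdp.c instead of the header. In one file, this sketch uses the GCC/Clang noinline and cold attributes for the same effect. The counter is a hypothetical stand-in for the WARN() stack dump:

```c
#include <assert.h>
#include <stdio.h>

static unsigned int model_warn_count;

/* Out-of-line slow path: the fast path only pays for a call. */
__attribute__((noinline, cold))
static void model_xdp_warn(const char *msg, const char *func, int line)
{
	assert(msg != NULL);
	model_warn_count++;
	fprintf(stderr, "XDP_WARN: %s(line:%d): %s\n", func, line, msg);
}

/* Captures the caller's function and line, like XDP_WARN(). */
#define MODEL_XDP_WARN(msg) model_xdp_warn(msg, __func__, __LINE__)

/* Fast-path function with an unlikely error branch. */
static int model_convert(int data_end, int hard_end)
{
	if (__builtin_expect(data_end > hard_end, 0)) {
		MODEL_XDP_WARN("Driver BUG: missing reserved tailroom");
		return -1;
	}
	return 0;
}
```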

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
---
 include/net/xdp.h |   14 +++++++++++++-
 net/core/xdp.c    |    7 +++++++
 2 files changed, 20 insertions(+), 1 deletion(-)

diff --git a/include/net/xdp.h b/include/net/xdp.h
index a764af4ae0ea..1366466868e4 100644
--- a/include/net/xdp.h
+++ b/include/net/xdp.h
@@ -89,7 +89,8 @@ struct xdp_frame {
 	void *data;
 	u16 len;
 	u16 headroom;
-	u16 metasize;
+	u32 metasize:8;
+	u32 frame_sz:24;
 	/* Lifetime of xdp_rxq_info is limited to NAPI/enqueue time,
 	 * while mem info is valid on remote CPU.
 	 */
@@ -104,6 +105,10 @@ static inline void xdp_scrub_frame(struct xdp_frame *frame)
 	frame->dev_rx = NULL;
 }
 
+/* Avoids inlining WARN macro in fast-path */
+void xdp_warn(const char* msg, const char* func, const int line);
+#define XDP_WARN(msg) xdp_warn(msg, __func__, __LINE__)
+
 struct xdp_frame *xdp_convert_zc_to_xdp_frame(struct xdp_buff *xdp);
 
 /* Convert xdp_buff to xdp_frame */
@@ -124,6 +129,12 @@ struct xdp_frame *convert_to_xdp_frame(struct xdp_buff *xdp)
 	if (unlikely((headroom - metasize) < sizeof(*xdp_frame)))
 		return NULL;
 
+	/* Catch if driver didn't reserve tailroom for skb_shared_info */
+	if (unlikely(xdp->data_end > xdp_data_hard_end(xdp))) {
+		XDP_WARN("Driver BUG: missing reserved tailroom");
+		return NULL;
+	}
+
 	/* Store info in top of packet */
 	xdp_frame = xdp->data_hard_start;
 
@@ -131,6 +142,7 @@ struct xdp_frame *convert_to_xdp_frame(struct xdp_buff *xdp)
 	xdp_frame->len  = xdp->data_end - xdp->data;
 	xdp_frame->headroom = headroom - sizeof(*xdp_frame);
 	xdp_frame->metasize = metasize;
+	xdp_frame->frame_sz = xdp->frame_sz;
 
 	/* rxq only valid until napi_schedule ends, convert to xdp_mem_info */
 	xdp_frame->mem = xdp->rxq->mem;
diff --git a/net/core/xdp.c b/net/core/xdp.c
index 4c7ea85486af..4bc3026ae218 100644
--- a/net/core/xdp.c
+++ b/net/core/xdp.c
@@ -11,6 +11,7 @@
 #include <linux/slab.h>
 #include <linux/idr.h>
 #include <linux/rhashtable.h>
+#include <linux/bug.h>
 #include <net/page_pool.h>
 
 #include <net/xdp.h>
@@ -496,3 +497,9 @@ struct xdp_frame *xdp_convert_zc_to_xdp_frame(struct xdp_buff *xdp)
 	return xdpf;
 }
 EXPORT_SYMBOL_GPL(xdp_convert_zc_to_xdp_frame);
+
+/* Used by XDP_WARN macro, to avoid inlining WARN() in fast-path */
+void xdp_warn(const char* msg, const char* func, const int line) {
+	WARN(1, "XDP_WARN: %s(line:%d): %s\n", func, line, msg);
+};
+EXPORT_SYMBOL_GPL(xdp_warn);

* [PATCH net-next v2 08/33] xdp: cpumap redirect use frame_sz and increase skb_tailroom
       [not found] <158824557985.2172139.4173570969543904434.stgit@firesoul>
                   ` (6 preceding siblings ...)
  2020-04-30 11:20 ` [PATCH net-next v2 07/33] xdp: xdp_frame add member frame_sz and handle in convert_to_xdp_frame Jesper Dangaard Brouer
@ 2020-04-30 11:21 ` Jesper Dangaard Brouer
  2020-04-30 11:21 ` [PATCH net-next v2 09/33] veth: adjust hard_start offset on redirect XDP frames Jesper Dangaard Brouer
                   ` (24 subsequent siblings)
  32 siblings, 0 replies; 47+ messages in thread
From: Jesper Dangaard Brouer @ 2020-04-30 11:21 UTC
  To: sameehj
  Cc: Toke Høiland-Jørgensen, Jesper Dangaard Brouer, netdev,
	bpf, zorik, akiyano, gtzalik, Toke Høiland-Jørgensen,
	Daniel Borkmann, Alexei Starovoitov, John Fastabend,
	Alexander Duyck, Jeff Kirsher, David Ahern, Willem de Bruijn,
	Ilias Apalodimas, Lorenzo Bianconi, Saeed Mahameed,
	steffen.klassert

Knowing the memory size backing the packet/xdp_frame data area, and
knowing that it already has reserved room for skb_shared_info,
simplifies using build_skb significantly.

With this change we no longer lie about the SKB truesize, but, more
importantly, a significantly larger skb_tailroom is now provided,
e.g. when drivers use a full PAGE_SIZE. This extra tailroom (in the
linear area) can be used by the network stack when coalescing SKBs
(e.g. in skb_try_coalesce, see the TCP cases where tcp_queue_rcv() can
'eat' an skb).
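A worked comparison of the old cpumap size estimate against passing frame_sz to build_skb. It assumes 64-byte cache lines, a 320-byte aligned skb_shared_info, 256 bytes of headroom from data_hard_start and a 1500-byte packet. The helper names are hypothetical:

```c
#include <assert.h>
#include <stdint.h>

/* Assumed model constants; neither value holds on every architecture. */
#define MODEL_CACHE_BYTES 64u
#define MODEL_ALIGN(x) (((x) + MODEL_CACHE_BYTES - 1) & ~(MODEL_CACHE_BYTES - 1))
#define MODEL_SHINFO 320u

/* Old cpumap estimate: buffer size derived from packet length + headroom. */
static uint32_t model_old_frame_size(uint32_t len, uint32_t headroom)
{
	return MODEL_ALIGN(len + headroom) + MODEL_ALIGN(MODEL_SHINFO);
}

/* Linear tailroom left after build_skb() for a given buffer size:
 * everything between the packet end and skb_shared_info. */
static uint32_t model_skb_tailroom(uint32_t frame_size, uint32_t len,
				   uint32_t headroom)
{
	uint32_t used = headroom + len + MODEL_ALIGN(MODEL_SHINFO);

	assert(frame_size >= used);
	return frame_size - used;
}
```

Under these assumptions, the old estimate is 2112 bytes and leaves only 36 bytes of tailroom. Using a 4096-byte frame_sz leaves 2020 bytes that the stack can use when coalescing.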

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
---
 kernel/bpf/cpumap.c |   21 +++------------------
 1 file changed, 3 insertions(+), 18 deletions(-)

diff --git a/kernel/bpf/cpumap.c b/kernel/bpf/cpumap.c
index 3fe0b006d2d2..a71790dab12d 100644
--- a/kernel/bpf/cpumap.c
+++ b/kernel/bpf/cpumap.c
@@ -162,25 +162,10 @@ static struct sk_buff *cpu_map_build_skb(struct bpf_cpu_map_entry *rcpu,
 	/* Part of headroom was reserved to xdpf */
 	hard_start_headroom = sizeof(struct xdp_frame) +  xdpf->headroom;
 
-	/* build_skb need to place skb_shared_info after SKB end, and
-	 * also want to know the memory "truesize".  Thus, need to
-	 * know the memory frame size backing xdp_buff.
-	 *
-	 * XDP was designed to have PAGE_SIZE frames, but this
-	 * assumption is not longer true with ixgbe and i40e.  It
-	 * would be preferred to set frame_size to 2048 or 4096
-	 * depending on the driver.
-	 *   frame_size = 2048;
-	 *   frame_len  = frame_size - sizeof(*xdp_frame);
-	 *
-	 * Instead, with info avail, skb_shared_info in placed after
-	 * packet len.  This, unfortunately fakes the truesize.
-	 * Another disadvantage of this approach, the skb_shared_info
-	 * is not at a fixed memory location, with mixed length
-	 * packets, which is bad for cache-line hotness.
+	/* Memory size backing xdp_frame data already have reserved
+	 * room for build_skb to place skb_shared_info in tailroom.
 	 */
-	frame_size = SKB_DATA_ALIGN(xdpf->len + hard_start_headroom) +
-		SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
+	frame_size = xdpf->frame_sz;
 
 	pkt_data_start = xdpf->data - hard_start_headroom;
 	skb = build_skb_around(skb, pkt_data_start, frame_size);

* [PATCH net-next v2 09/33] veth: adjust hard_start offset on redirect XDP frames
       [not found] <158824557985.2172139.4173570969543904434.stgit@firesoul>
                   ` (7 preceding siblings ...)
  2020-04-30 11:21 ` [PATCH net-next v2 08/33] xdp: cpumap redirect use frame_sz and increase skb_tailroom Jesper Dangaard Brouer
@ 2020-04-30 11:21 ` Jesper Dangaard Brouer
  2020-04-30 11:21 ` [PATCH net-next v2 10/33] veth: xdp using frame_sz in veth driver Jesper Dangaard Brouer
                   ` (23 subsequent siblings)
  32 siblings, 0 replies; 47+ messages in thread
From: Jesper Dangaard Brouer @ 2020-04-30 11:21 UTC
  To: sameehj
  Cc: Toshiaki Makita, Mao Wenan, Toshiaki Makita,
	Toke Høiland-Jørgensen, Jesper Dangaard Brouer, netdev,
	bpf, zorik, akiyano, gtzalik, Toke Høiland-Jørgensen,
	Daniel Borkmann, Alexei Starovoitov, John Fastabend,
	Alexander Duyck, Jeff Kirsher, David Ahern, Willem de Bruijn,
	Ilias Apalodimas, Lorenzo Bianconi, Saeed Mahameed,
	steffen.klassert

When native XDP redirects into a veth device, the frame arrives in the
xdp_frame structure. It is then processed in veth_xdp_rcv_one(),
which can run a new XDP bpf_prog on the packet. Doing so requires
converting xdp_frame to xdp_buff, but the tricky part is that the
xdp_frame memory area is located at the top (data_hard_start) of the
memory area that xdp_buff will point into.

The current code tried to protect the xdp_frame area by assigning
xdp_buff.data_hard_start past this memory. This results in 32 bytes
less headroom to expand into via the BPF-helper bpf_xdp_adjust_head().

This protection step is actually not needed, because the BPF-helper
bpf_xdp_adjust_head() already reserves this area and does not allow
the BPF-prog to expand into it. Thus, it is safe to point
data_hard_start directly at the xdp_frame memory area.
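A small model of the headroom a BPF program can reach. bpf_xdp_adjust_head() refuses to move data below data_hard_start + sizeof(struct xdp_frame), which this sketch assumes is 32 bytes (64-bit). Offsets are relative to the buffer start, where the xdp_frame struct is stored. The helper name is hypothetical:

```c
#include <assert.h>

#define MODEL_XDP_FRAME_SIZE 32 /* assumed sizeof(struct xdp_frame) */

/* frame_headroom is xdp_frame->headroom: the bytes between the end of the
 * struct and frame->data. hard_start_off is where xdp_buff.data_hard_start
 * points. Returns how far the program can grow the head. */
static int model_usable_headroom(int hard_start_off, int frame_headroom)
{
	int data_off = MODEL_XDP_FRAME_SIZE + frame_headroom;
	int min_data_off = hard_start_off + MODEL_XDP_FRAME_SIZE;

	assert(data_off >= min_data_off);
	return data_off - min_data_off;
}
```

With 256 bytes of frame headroom, pointing data_hard_start past the struct (offset 32) leaves 224 usable bytes. Pointing it at the struct itself (offset 0) restores the full 256 bytes, and the helper still keeps the program out of the xdp_frame area.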

Cc: Toshiaki Makita <toshiaki.makita1@gmail.com>
Fixes: 9fc8d518d9d5 ("veth: Handle xdp_frames in xdp napi ring")
Reported-by: Mao Wenan <maowenan@huawei.com>
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: Toshiaki Makita <toshiaki.makita1@gmail.com>
Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
---
 drivers/net/veth.c |    8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/net/veth.c b/drivers/net/veth.c
index aece0e5eec8c..d5691bb84448 100644
--- a/drivers/net/veth.c
+++ b/drivers/net/veth.c
@@ -564,13 +564,15 @@ static struct sk_buff *veth_xdp_rcv_one(struct veth_rq *rq,
 					struct veth_stats *stats)
 {
 	void *hard_start = frame->data - frame->headroom;
-	void *head = hard_start - sizeof(struct xdp_frame);
 	int len = frame->len, delta = 0;
 	struct xdp_frame orig_frame;
 	struct bpf_prog *xdp_prog;
 	unsigned int headroom;
 	struct sk_buff *skb;
 
+	/* bpf_xdp_adjust_head() assures BPF cannot access xdp_frame area */
+	hard_start -= sizeof(struct xdp_frame);
+
 	rcu_read_lock();
 	xdp_prog = rcu_dereference(rq->xdp_prog);
 	if (likely(xdp_prog)) {
@@ -592,7 +594,6 @@ static struct sk_buff *veth_xdp_rcv_one(struct veth_rq *rq,
 			break;
 		case XDP_TX:
 			orig_frame = *frame;
-			xdp.data_hard_start = head;
 			xdp.rxq->mem = frame->mem;
 			if (unlikely(veth_xdp_tx(rq, &xdp, bq) < 0)) {
 				trace_xdp_exception(rq->dev, xdp_prog, act);
@@ -605,7 +606,6 @@ static struct sk_buff *veth_xdp_rcv_one(struct veth_rq *rq,
 			goto xdp_xmit;
 		case XDP_REDIRECT:
 			orig_frame = *frame;
-			xdp.data_hard_start = head;
 			xdp.rxq->mem = frame->mem;
 			if (xdp_do_redirect(rq->dev, &xdp, xdp_prog)) {
 				frame = &orig_frame;
@@ -629,7 +629,7 @@ static struct sk_buff *veth_xdp_rcv_one(struct veth_rq *rq,
 	rcu_read_unlock();
 
 	headroom = sizeof(struct xdp_frame) + frame->headroom - delta;
-	skb = veth_build_skb(head, headroom, len, 0);
+	skb = veth_build_skb(hard_start, headroom, len, 0);
 	if (!skb) {
 		xdp_return_frame(frame);
 		stats->rx_drops++;

* [PATCH net-next v2 10/33] veth: xdp using frame_sz in veth driver
       [not found] <158824557985.2172139.4173570969543904434.stgit@firesoul>
                   ` (8 preceding siblings ...)
  2020-04-30 11:21 ` [PATCH net-next v2 09/33] veth: adjust hard_start offset on redirect XDP frames Jesper Dangaard Brouer
@ 2020-04-30 11:21 ` Jesper Dangaard Brouer
  2020-04-30 11:21 ` [PATCH net-next v2 11/33] dpaa2-eth: add XDP frame size Jesper Dangaard Brouer
                   ` (22 subsequent siblings)
  32 siblings, 0 replies; 47+ messages in thread
From: Jesper Dangaard Brouer @ 2020-04-30 11:21 UTC
  To: sameehj
  Cc: Toshiaki Makita, Lorenzo Bianconi,
	Toke Høiland-Jørgensen, Toshiaki Makita,
	Jesper Dangaard Brouer, netdev, bpf, zorik, akiyano, gtzalik,
	Toke Høiland-Jørgensen, Daniel Borkmann,
	Alexei Starovoitov, John Fastabend, Alexander Duyck,
	Jeff Kirsher, David Ahern, Willem de Bruijn, Ilias Apalodimas,
	Lorenzo Bianconi, Saeed Mahameed, steffen.klassert

The veth driver can run XDP in "native" mode in its own NAPI
handler, and since commit 9fc8d518d9d5 ("veth: Handle xdp_frames in
xdp napi ring") packets can come in two forms, either xdp_frame or
skb, handled by veth_xdp_rcv_one() and veth_xdp_rcv_skb() respectively.

For packets to arrive in xdp_frame format, they will have been
redirected from an XDP native driver. In case of XDP_PASS or no
XDP-prog attached, the veth driver will allocate and create an SKB.

In the xdp_frame case, veth_xdp_rcv_one() previously had to guess
the truesize of the incoming xdp_frame when calling
veth_build_skb(). With xdp_frame->frame_sz this is no longer
necessary.

In the skb case, veth_xdp_rcv_skb() calculates frame_sz similarly
to the XDP-generic handling code in net/core/dev.c.
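
A minimal userspace sketch of the skb-case arithmetic, assuming 64-byte
cache lines and a 320-byte struct skb_shared_info (typical for x86_64
at the time, but both values are assumptions). Pointers are modeled as
byte offsets into the SKB head buffer, and skb_head_frame_sz() is a
hypothetical helper, not kernel code.

```c
#include <assert.h>

#define SMP_CACHE_BYTES 64u  /* assumption: 64-byte cache lines */
#define SHINFO_SIZE     320u /* stand-in for sizeof(struct skb_shared_info) */

/* Round x up to a cache-line multiple, like the kernel's SKB_DATA_ALIGN(). */
#define SKB_DATA_ALIGN(x) (((x) + (SMP_CACHE_BYTES - 1)) & ~(SMP_CACHE_BYTES - 1))

/*
 * Hypothetical helper. hard_start_off models xdp.data_hard_start and
 * end_off models skb_end_pointer(skb), which is where struct
 * skb_shared_info begins. The frame covers everything from
 * data_hard_start to the end pointer, plus the aligned skb_shared_info
 * area that always follows it.
 */
static unsigned int skb_head_frame_sz(unsigned int hard_start_off,
				      unsigned int end_off)
{
	assert(end_off >= hard_start_off);
	return (end_off - hard_start_off) + SKB_DATA_ALIGN(SHINFO_SIZE);
}
```

The xdp_frame case needs no such calculation, since the size travels
with the frame in xdp_frame->frame_sz and is passed straight to
veth_build_skb().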

Cc: Toshiaki Makita <toshiaki.makita1@gmail.com>
Reviewed-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
Acked-by: Toshiaki Makita <toshiaki.makita1@gmail.com>
---
 drivers/net/veth.c |   22 +++++++++++++---------
 1 file changed, 13 insertions(+), 9 deletions(-)

diff --git a/drivers/net/veth.c b/drivers/net/veth.c
index d5691bb84448..b586d2fa5551 100644
--- a/drivers/net/veth.c
+++ b/drivers/net/veth.c
@@ -405,10 +405,6 @@ static struct sk_buff *veth_build_skb(void *head, int headroom, int len,
 {
 	struct sk_buff *skb;
 
-	if (!buflen) {
-		buflen = SKB_DATA_ALIGN(headroom + len) +
-			 SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
-	}
 	skb = build_skb(head, buflen);
 	if (!skb)
 		return NULL;
@@ -583,6 +579,7 @@ static struct sk_buff *veth_xdp_rcv_one(struct veth_rq *rq,
 		xdp.data = frame->data;
 		xdp.data_end = frame->data + frame->len;
 		xdp.data_meta = frame->data - frame->metasize;
+		xdp.frame_sz = frame->frame_sz;
 		xdp.rxq = &rq->xdp_rxq;
 
 		act = bpf_prog_run_xdp(xdp_prog, &xdp);
@@ -629,7 +626,7 @@ static struct sk_buff *veth_xdp_rcv_one(struct veth_rq *rq,
 	rcu_read_unlock();
 
 	headroom = sizeof(struct xdp_frame) + frame->headroom - delta;
-	skb = veth_build_skb(hard_start, headroom, len, 0);
+	skb = veth_build_skb(hard_start, headroom, len, frame->frame_sz);
 	if (!skb) {
 		xdp_return_frame(frame);
 		stats->rx_drops++;
@@ -695,9 +692,8 @@ static struct sk_buff *veth_xdp_rcv_skb(struct veth_rq *rq,
 			goto drop;
 		}
 
-		nskb = veth_build_skb(head,
-				      VETH_XDP_HEADROOM + mac_len, skb->len,
-				      PAGE_SIZE);
+		nskb = veth_build_skb(head, VETH_XDP_HEADROOM + mac_len,
+				      skb->len, PAGE_SIZE);
 		if (!nskb) {
 			page_frag_free(head);
 			goto drop;
@@ -715,6 +711,11 @@ static struct sk_buff *veth_xdp_rcv_skb(struct veth_rq *rq,
 	xdp.data_end = xdp.data + pktlen;
 	xdp.data_meta = xdp.data;
 	xdp.rxq = &rq->xdp_rxq;
+
+	/* SKB "head" area always have tailroom for skb_shared_info */
+	xdp.frame_sz = (void *)skb_end_pointer(skb) - xdp.data_hard_start;
+	xdp.frame_sz += SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
+
 	orig_data = xdp.data;
 	orig_data_end = xdp.data_end;
 
@@ -758,6 +759,7 @@ static struct sk_buff *veth_xdp_rcv_skb(struct veth_rq *rq,
 	}
 	rcu_read_unlock();
 
+	/* check if bpf_xdp_adjust_head was used */
 	delta = orig_data - xdp.data;
 	off = mac_len + delta;
 	if (off > 0)
@@ -765,9 +767,11 @@ static struct sk_buff *veth_xdp_rcv_skb(struct veth_rq *rq,
 	else if (off < 0)
 		__skb_pull(skb, -off);
 	skb->mac_header -= delta;
+
+	/* check if bpf_xdp_adjust_tail was used */
 	off = xdp.data_end - orig_data_end;
 	if (off != 0)
-		__skb_put(skb, off);
+		__skb_put(skb, off); /* positive on grow, negative on shrink */
 	skb->protocol = eth_type_trans(skb, rq->dev);
 
 	metalen = xdp.data - xdp.data_meta;




* [PATCH net-next v2 11/33] dpaa2-eth: add XDP frame size
       [not found] <158824557985.2172139.4173570969543904434.stgit@firesoul>
                   ` (9 preceding siblings ...)
  2020-04-30 11:21 ` [PATCH net-next v2 10/33] veth: xdp using frame_sz in veth driver Jesper Dangaard Brouer
@ 2020-04-30 11:21 ` Jesper Dangaard Brouer
  2020-04-30 11:21 ` [PATCH net-next v2 12/33] hv_netvsc: add XDP frame size to driver Jesper Dangaard Brouer
                   ` (21 subsequent siblings)
  32 siblings, 0 replies; 47+ messages in thread
From: Jesper Dangaard Brouer @ 2020-04-30 11:21 UTC (permalink / raw)
  To: sameehj
  Cc: Ioana Radulescu, Jesper Dangaard Brouer, netdev, bpf, zorik,
	akiyano, gtzalik, Toke Høiland-Jørgensen,
	Daniel Borkmann, Alexei Starovoitov, John Fastabend,
	Alexander Duyck, Jeff Kirsher, David Ahern, Willem de Bruijn,
	Ilias Apalodimas, Lorenzo Bianconi, Saeed Mahameed,
	steffen.klassert

The dpaa2-eth driver reserves some headroom for the hardware and
software annotation areas in RX/TX buffers. Thus, xdp.data_hard_start
doesn't start at a page boundary.

When XDP is configured, the area reserved via dpaa2_fd_get_offset(fd)
is 448 bytes, of which XDP has reserved 256 bytes. As frame_sz is
measured from xdp_buff.data_hard_start, it must be adjusted down from
the full buffer size, PAGE_SIZE == DPAA2_ETH_RX_BUF_RAW_SIZE.

When doing XDP_REDIRECT, the driver no longer needs this reserved
headroom and allows xdp_do_redirect() to use it. This is an advantage
for the driver's own ndo_xdp_xmit, as it uses part of this headroom for
itself. The patch also adjusts frame_sz in this case.
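
A worked sketch of the two frame_sz values set in run_xdp(), assuming
DPAA2_ETH_RX_BUF_RAW_SIZE equals a 4 KiB PAGE_SIZE and using the
448-byte FD offset and 256-byte XDP_PACKET_HEADROOM quoted above. The
helper names are hypothetical; only the arithmetic mirrors the hunk.

```c
#include <assert.h>

#define PAGE_SIZE_ASSUMED         4096u /* assumption: 4 KiB pages */
#define DPAA2_ETH_RX_BUF_RAW_SIZE PAGE_SIZE_ASSUMED
#define XDP_PACKET_HEADROOM       256u

/*
 * Normal XDP run: data_hard_start sits XDP_PACKET_HEADROOM bytes before
 * xdp.data, i.e. (fd_offset - XDP_PACKET_HEADROOM) bytes into the raw
 * buffer. That leading part is not counted in frame_sz.
 */
static unsigned int dpaa2_frame_sz_run(unsigned int fd_offset)
{
	assert(fd_offset >= XDP_PACKET_HEADROOM);
	return DPAA2_ETH_RX_BUF_RAW_SIZE - (fd_offset - XDP_PACKET_HEADROOM);
}

/* XDP_REDIRECT: data_hard_start is moved back to the buffer start (vaddr). */
static unsigned int dpaa2_frame_sz_redirect(void)
{
	return DPAA2_ETH_RX_BUF_RAW_SIZE;
}
```

Under these assumptions the redirect path regains the 192 bytes
(448 - 256) that lie in front of the XDP headroom.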

The driver cannot support XDP data_meta, because it uses the headroom
just before xdp.data for struct dpaa2_eth_swa (DPAA2_ETH_SWA_SIZE=64)
when transmitting the packet. When transmitting an xdp_frame in
dpaa2_eth_xdp_xmit_frame (called via ndo_xdp_xmit), it uses this area
to store a pointer to the xdp_frame and the dma_size, which are used in
TX completion (free_tx_fd) to return the frame via xdp_return_frame().

Cc: Ioana Radulescu <ruxandra.radulescu@nxp.com>
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
---
 drivers/net/ethernet/freescale/dpaa2/dpaa2-eth.c |    7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/drivers/net/ethernet/freescale/dpaa2/dpaa2-eth.c b/drivers/net/ethernet/freescale/dpaa2/dpaa2-eth.c
index 8ec435ba7d27..a517b5190c8c 100644
--- a/drivers/net/ethernet/freescale/dpaa2/dpaa2-eth.c
+++ b/drivers/net/ethernet/freescale/dpaa2/dpaa2-eth.c
@@ -302,6 +302,9 @@ static u32 run_xdp(struct dpaa2_eth_priv *priv,
 	xdp_set_data_meta_invalid(&xdp);
 	xdp.rxq = &ch->xdp_rxq;
 
+	xdp.frame_sz = DPAA2_ETH_RX_BUF_RAW_SIZE -
+		(dpaa2_fd_get_offset(fd) - XDP_PACKET_HEADROOM);
+
 	xdp_act = bpf_prog_run_xdp(xdp_prog, &xdp);
 
 	/* xdp.data pointer may have changed */
@@ -337,7 +340,11 @@ static u32 run_xdp(struct dpaa2_eth_priv *priv,
 		dma_unmap_page(priv->net_dev->dev.parent, addr,
 			       DPAA2_ETH_RX_BUF_SIZE, DMA_BIDIRECTIONAL);
 		ch->buf_count--;
+
+		/* Allow redirect use of full headroom */
 		xdp.data_hard_start = vaddr;
+		xdp.frame_sz = DPAA2_ETH_RX_BUF_RAW_SIZE;
+
 		err = xdp_do_redirect(priv->net_dev, &xdp, xdp_prog);
 		if (unlikely(err))
 			ch->stats.xdp_drop++;




* [PATCH net-next v2 12/33] hv_netvsc: add XDP frame size to driver
       [not found] <158824557985.2172139.4173570969543904434.stgit@firesoul>
                   ` (10 preceding siblings ...)
  2020-04-30 11:21 ` [PATCH net-next v2 11/33] dpaa2-eth: add XDP frame size Jesper Dangaard Brouer
@ 2020-04-30 11:21 ` Jesper Dangaard Brouer
  2020-04-30 14:20   ` Haiyang Zhang
  2020-04-30 11:21 ` [PATCH net-next v2 13/33] qlogic/qede: " Jesper Dangaard Brouer
                   ` (20 subsequent siblings)
  32 siblings, 1 reply; 47+ messages in thread
From: Jesper Dangaard Brouer @ 2020-04-30 11:21 UTC (permalink / raw)
  To: sameehj
  Cc: Wei Liu, K. Y. Srinivasan, Haiyang Zhang, Stephen Hemminger,
	Jesper Dangaard Brouer, netdev, bpf, zorik, akiyano, gtzalik,
	Toke Høiland-Jørgensen, Daniel Borkmann,
	Alexei Starovoitov, John Fastabend, Alexander Duyck,
	Jeff Kirsher, David Ahern, Willem de Bruijn, Ilias Apalodimas,
	Lorenzo Bianconi, Saeed Mahameed, steffen.klassert

The hyperv NIC driver's XDP implementation is rather disappointing:
enabling XDP on this driver causes a slowdown, because it allocates a
new page for each packet and copies over the payload before invoking
the XDP BPF-prog.

The positive thing is that it's easy to determine xdp.frame_sz.

The XDP implementation for hv_netvsc transparently passes xdp_prog
to the associated VF NIC. Many Azure VMs use SRIOV, so the majority
of the data is actually processed directly on the VF driver's XDP
path, and the overhead of the synthetic data path (hv_netvsc) is
minimal.

When XDP is enabled on this driver, XDP_PASS and XDP_TX will create the
SKB via build_skb() (based on the newly allocated page). Using the XDP
frame_sz now provides more skb_tailroom, which the network stack can
use for SKB coalescing (e.g. tcp_try_coalesce -> skb_try_coalesce).
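
A rough sketch of where the extra tailroom comes from: build_skb()
reserves SKB_DATA_ALIGN(sizeof(struct skb_shared_info)) at the end of
the fragment it is given, so whatever is left after the headroom and
the packet becomes skb_tailroom. It assumes a 4 KiB PAGE_SIZE, 64-byte
cache lines and a 320-byte skb_shared_info; the helper is hypothetical,
not driver code.

```c
#include <assert.h>

#define PAGE_SIZE_ASSUMED 4096u /* assumption: 4 KiB pages */
#define SMP_CACHE_BYTES   64u   /* assumption: 64-byte cache lines */
#define SHINFO_SIZE       320u  /* stand-in for sizeof(struct skb_shared_info) */
#define SKB_DATA_ALIGN(x) (((x) + (SMP_CACHE_BYTES - 1)) & ~(SMP_CACHE_BYTES - 1))

/*
 * Tailroom of an SKB built over a fragment of frag_size bytes, with the
 * packet starting hdroom bytes in and being xlen bytes long.
 */
static unsigned int tailroom_after_build_skb(unsigned int frag_size,
					     unsigned int hdroom,
					     unsigned int xlen)
{
	/* build_skb() keeps the aligned skb_shared_info area at the end */
	unsigned int data_area = frag_size - SKB_DATA_ALIGN(SHINFO_SIZE);

	assert(hdroom + xlen <= data_area);
	return data_area - hdroom - xlen;
}
```

Passing the whole page (xdp->frame_sz) rather than a size derived from
headroom plus packet length is what leaves this tailroom visible to
the stack.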

Cc: Wei Liu <wei.liu@kernel.org>
Cc: "K. Y. Srinivasan" <kys@microsoft.com>
Cc: Haiyang Zhang <haiyangz@microsoft.com>
Cc: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
---
 drivers/net/hyperv/netvsc_bpf.c |    1 +
 drivers/net/hyperv/netvsc_drv.c |    2 +-
 2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/net/hyperv/netvsc_bpf.c b/drivers/net/hyperv/netvsc_bpf.c
index b86611041db6..1e0c024b0a93 100644
--- a/drivers/net/hyperv/netvsc_bpf.c
+++ b/drivers/net/hyperv/netvsc_bpf.c
@@ -49,6 +49,7 @@ u32 netvsc_run_xdp(struct net_device *ndev, struct netvsc_channel *nvchan,
 	xdp_set_data_meta_invalid(xdp);
 	xdp->data_end = xdp->data + len;
 	xdp->rxq = &nvchan->xdp_rxq;
+	xdp->frame_sz = PAGE_SIZE;
 	xdp->handle = 0;
 
 	memcpy(xdp->data, data, len);
diff --git a/drivers/net/hyperv/netvsc_drv.c b/drivers/net/hyperv/netvsc_drv.c
index d8e86bdbfba1..651344fea0a5 100644
--- a/drivers/net/hyperv/netvsc_drv.c
+++ b/drivers/net/hyperv/netvsc_drv.c
@@ -794,7 +794,7 @@ static struct sk_buff *netvsc_alloc_recv_skb(struct net_device *net,
 	if (xbuf) {
 		unsigned int hdroom = xdp->data - xdp->data_hard_start;
 		unsigned int xlen = xdp->data_end - xdp->data;
-		unsigned int frag_size = netvsc_xdp_fraglen(hdroom + xlen);
+		unsigned int frag_size = xdp->frame_sz;
 
 		skb = build_skb(xbuf, frag_size);
 




* [PATCH net-next v2 13/33] qlogic/qede: add XDP frame size to driver
       [not found] <158824557985.2172139.4173570969543904434.stgit@firesoul>
                   ` (11 preceding siblings ...)
  2020-04-30 11:21 ` [PATCH net-next v2 12/33] hv_netvsc: add XDP frame size to driver Jesper Dangaard Brouer
@ 2020-04-30 11:21 ` Jesper Dangaard Brouer
  2020-04-30 11:21 ` [PATCH net-next v2 14/33] net: ethernet: ti: add XDP frame size to driver cpsw Jesper Dangaard Brouer
                   ` (19 subsequent siblings)
  32 siblings, 0 replies; 47+ messages in thread
From: Jesper Dangaard Brouer @ 2020-04-30 11:21 UTC (permalink / raw)
  To: sameehj
  Cc: Ariel Elior, GR-everest-linux-l2, Jesper Dangaard Brouer, netdev,
	bpf, zorik, akiyano, gtzalik, Toke Høiland-Jørgensen,
	Daniel Borkmann, Alexei Starovoitov, John Fastabend,
	Alexander Duyck, Jeff Kirsher, David Ahern, Willem de Bruijn,
	Ilias Apalodimas, Lorenzo Bianconi, Saeed Mahameed,
	steffen.klassert

The qede driver uses a full page when XDP is enabled. The driver's
value in rx_buf_seg_size (struct qede_rx_queue) will be PAGE_SIZE when
an XDP bpf_prog is attached.

Cc: Ariel Elior <aelior@marvell.com>
Cc: GR-everest-linux-l2@marvell.com
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
---
 drivers/net/ethernet/qlogic/qede/qede_fp.c   |    1 +
 drivers/net/ethernet/qlogic/qede/qede_main.c |    2 +-
 2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/qlogic/qede/qede_fp.c b/drivers/net/ethernet/qlogic/qede/qede_fp.c
index c6c20776b474..7598ebe0962a 100644
--- a/drivers/net/ethernet/qlogic/qede/qede_fp.c
+++ b/drivers/net/ethernet/qlogic/qede/qede_fp.c
@@ -1066,6 +1066,7 @@ static bool qede_rx_xdp(struct qede_dev *edev,
 	xdp_set_data_meta_invalid(&xdp);
 	xdp.data_end = xdp.data + *len;
 	xdp.rxq = &rxq->xdp_rxq;
+	xdp.frame_sz = rxq->rx_buf_seg_size; /* PAGE_SIZE when XDP enabled */
 
 	/* Queues always have a full reset currently, so for the time
 	 * being until there's atomic program replace just mark read
diff --git a/drivers/net/ethernet/qlogic/qede/qede_main.c b/drivers/net/ethernet/qlogic/qede/qede_main.c
index 9b456198cb50..7e359c2bf2dc 100644
--- a/drivers/net/ethernet/qlogic/qede/qede_main.c
+++ b/drivers/net/ethernet/qlogic/qede/qede_main.c
@@ -1418,7 +1418,7 @@ static int qede_alloc_mem_rxq(struct qede_dev *edev, struct qede_rx_queue *rxq)
 	if (rxq->rx_buf_size + size > PAGE_SIZE)
 		rxq->rx_buf_size = PAGE_SIZE - size;
 
-	/* Segment size to spilt a page in multiple equal parts ,
+	/* Segment size to split a page in multiple equal parts,
 	 * unless XDP is used in which case we'd use the entire page.
 	 */
 	if (!edev->xdp_prog) {




* [PATCH net-next v2 14/33] net: ethernet: ti: add XDP frame size to driver cpsw
       [not found] <158824557985.2172139.4173570969543904434.stgit@firesoul>
                   ` (12 preceding siblings ...)
  2020-04-30 11:21 ` [PATCH net-next v2 13/33] qlogic/qede: " Jesper Dangaard Brouer
@ 2020-04-30 11:21 ` Jesper Dangaard Brouer
  2020-04-30 11:21 ` [PATCH net-next v2 15/33] ena: add XDP frame size to amazon NIC driver Jesper Dangaard Brouer
                   ` (18 subsequent siblings)
  32 siblings, 0 replies; 47+ messages in thread
From: Jesper Dangaard Brouer @ 2020-04-30 11:21 UTC (permalink / raw)
  To: sameehj
  Cc: Grygorii Strashko, Ilias Apalodimas, Grygorii Strashko,
	Jesper Dangaard Brouer, netdev, bpf, zorik, akiyano, gtzalik,
	Toke Høiland-Jørgensen, Daniel Borkmann,
	Alexei Starovoitov, John Fastabend, Alexander Duyck,
	Jeff Kirsher, David Ahern, Willem de Bruijn, Ilias Apalodimas,
	Lorenzo Bianconi, Saeed Mahameed, steffen.klassert

The driver code in cpsw.c and cpsw_new.c both use page_pool
with default order-0 pages for their RX pages.

Cc: Grygorii Strashko <grygorii.strashko@ti.com>
Cc: Ilias Apalodimas <ilias.apalodimas@linaro.org>
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Reviewed-by: Grygorii Strashko <grygorii.strashko@ti.com>
---
 drivers/net/ethernet/ti/cpsw.c     |    1 +
 drivers/net/ethernet/ti/cpsw_new.c |    1 +
 2 files changed, 2 insertions(+)

diff --git a/drivers/net/ethernet/ti/cpsw.c b/drivers/net/ethernet/ti/cpsw.c
index 09f98fa2fb4e..ce0645ada6e7 100644
--- a/drivers/net/ethernet/ti/cpsw.c
+++ b/drivers/net/ethernet/ti/cpsw.c
@@ -406,6 +406,7 @@ static void cpsw_rx_handler(void *token, int len, int status)
 
 		xdp.data_hard_start = pa;
 		xdp.rxq = &priv->xdp_rxq[ch];
+		xdp.frame_sz = PAGE_SIZE;
 
 		port = priv->emac_port + cpsw->data.dual_emac;
 		ret = cpsw_run_xdp(priv, ch, &xdp, page, port);
diff --git a/drivers/net/ethernet/ti/cpsw_new.c b/drivers/net/ethernet/ti/cpsw_new.c
index 33c8dd686206..f196fb7cbdd4 100644
--- a/drivers/net/ethernet/ti/cpsw_new.c
+++ b/drivers/net/ethernet/ti/cpsw_new.c
@@ -348,6 +348,7 @@ static void cpsw_rx_handler(void *token, int len, int status)
 
 		xdp.data_hard_start = pa;
 		xdp.rxq = &priv->xdp_rxq[ch];
+		xdp.frame_sz = PAGE_SIZE;
 
 		ret = cpsw_run_xdp(priv, ch, &xdp, page, priv->emac_port);
 		if (ret != CPSW_XDP_PASS)




* [PATCH net-next v2 15/33] ena: add XDP frame size to amazon NIC driver
       [not found] <158824557985.2172139.4173570969543904434.stgit@firesoul>
                   ` (13 preceding siblings ...)
  2020-04-30 11:21 ` [PATCH net-next v2 14/33] net: ethernet: ti: add XDP frame size to driver cpsw Jesper Dangaard Brouer
@ 2020-04-30 11:21 ` Jesper Dangaard Brouer
  2020-04-30 11:21 ` [PATCH net-next v2 16/33] mlx4: add XDP frame size and adjust max XDP MTU Jesper Dangaard Brouer
                   ` (17 subsequent siblings)
  32 siblings, 0 replies; 47+ messages in thread
From: Jesper Dangaard Brouer @ 2020-04-30 11:21 UTC (permalink / raw)
  To: sameehj
  Cc: Arthur Kiyanovski, Jesper Dangaard Brouer, netdev, bpf, zorik,
	akiyano, gtzalik, Toke Høiland-Jørgensen,
	Daniel Borkmann, Alexei Starovoitov, John Fastabend,
	Alexander Duyck, Jeff Kirsher, David Ahern, Willem de Bruijn,
	Ilias Apalodimas, Lorenzo Bianconi, Saeed Mahameed,
	steffen.klassert

The frame size ENA_PAGE_SIZE is limited to 16K on systems with a
PAGE_SIZE larger than 16K. Change ENA_XDP_MAX_MTU to also take into
account the reserved tailroom.
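
A sketch of the old and new ENA_XDP_MAX_MTU arithmetic, assuming
ENA_PAGE_SIZE is 4096 (a 4 KiB PAGE_SIZE host), 64-byte cache lines and
a 320-byte skb_shared_info. ETH_HLEN, ETH_FCS_LEN, VLAN_HLEN and
XDP_PACKET_HEADROOM use their usual kernel values; ENA_XDP_MAX_MTU_OLD
is a hypothetical name for the pre-patch formula.

```c
#include <assert.h>

#define ENA_PAGE_SIZE       4096u /* assumption: 4 KiB host pages */
#define ETH_HLEN            14u
#define ETH_FCS_LEN         4u
#define VLAN_HLEN           4u
#define XDP_PACKET_HEADROOM 256u
#define SMP_CACHE_BYTES     64u   /* assumption: 64-byte cache lines */
#define SHINFO_SIZE         320u  /* stand-in for sizeof(struct skb_shared_info) */
#define SKB_DATA_ALIGN(x) (((x) + (SMP_CACHE_BYTES - 1)) & ~(SMP_CACHE_BYTES - 1))

/* Pre-patch limit: only the L2 headers and the XDP headroom are subtracted. */
#define ENA_XDP_MAX_MTU_OLD (ENA_PAGE_SIZE - ETH_HLEN - ETH_FCS_LEN - \
			     VLAN_HLEN - XDP_PACKET_HEADROOM)

/* Post-patch limit: the skb_shared_info tailroom is subtracted as well. */
#define ENA_XDP_MAX_MTU (ENA_PAGE_SIZE - ETH_HLEN - ETH_FCS_LEN - \
			 VLAN_HLEN - XDP_PACKET_HEADROOM - \
			 SKB_DATA_ALIGN(SHINFO_SIZE))
```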

Cc: Arthur Kiyanovski <akiyano@amazon.com>
Acked-by: Sameeh Jubran <sameehj@amazon.com>
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
---
 drivers/net/ethernet/amazon/ena/ena_netdev.c |    1 +
 drivers/net/ethernet/amazon/ena/ena_netdev.h |    5 +++--
 2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/amazon/ena/ena_netdev.c b/drivers/net/ethernet/amazon/ena/ena_netdev.c
index 2cc765df8da3..0fd7db1769f8 100644
--- a/drivers/net/ethernet/amazon/ena/ena_netdev.c
+++ b/drivers/net/ethernet/amazon/ena/ena_netdev.c
@@ -1606,6 +1606,7 @@ static int ena_clean_rx_irq(struct ena_ring *rx_ring, struct napi_struct *napi,
 		  "%s qid %d\n", __func__, rx_ring->qid);
 	res_budget = budget;
 	xdp.rxq = &rx_ring->xdp_rxq;
+	xdp.frame_sz = ENA_PAGE_SIZE;
 
 	do {
 		xdp_verdict = XDP_PASS;
diff --git a/drivers/net/ethernet/amazon/ena/ena_netdev.h b/drivers/net/ethernet/amazon/ena/ena_netdev.h
index 97dfd0c67e84..dd00127dfe9f 100644
--- a/drivers/net/ethernet/amazon/ena/ena_netdev.h
+++ b/drivers/net/ethernet/amazon/ena/ena_netdev.h
@@ -151,8 +151,9 @@
  * The buffer size we share with the device is defined to be ENA_PAGE_SIZE
  */
 
-#define ENA_XDP_MAX_MTU (ENA_PAGE_SIZE - ETH_HLEN - ETH_FCS_LEN - \
-				VLAN_HLEN - XDP_PACKET_HEADROOM)
+#define ENA_XDP_MAX_MTU (ENA_PAGE_SIZE - ETH_HLEN - ETH_FCS_LEN -	\
+			 VLAN_HLEN - XDP_PACKET_HEADROOM -		\
+			 SKB_DATA_ALIGN(sizeof(struct skb_shared_info)))
 
 #define ENA_IS_XDP_INDEX(adapter, index) (((index) >= (adapter)->xdp_first_ring) && \
 	((index) < (adapter)->xdp_first_ring + (adapter)->xdp_num_queues))




* [PATCH net-next v2 16/33] mlx4: add XDP frame size and adjust max XDP MTU
       [not found] <158824557985.2172139.4173570969543904434.stgit@firesoul>
                   ` (14 preceding siblings ...)
  2020-04-30 11:21 ` [PATCH net-next v2 15/33] ena: add XDP frame size to amazon NIC driver Jesper Dangaard Brouer
@ 2020-04-30 11:21 ` Jesper Dangaard Brouer
  2020-04-30 11:21 ` [PATCH net-next v2 17/33] net: thunderx: add XDP frame size Jesper Dangaard Brouer
                   ` (16 subsequent siblings)
  32 siblings, 0 replies; 47+ messages in thread
From: Jesper Dangaard Brouer @ 2020-04-30 11:21 UTC (permalink / raw)
  To: sameehj
  Cc: Tariq Toukan, Saeed Mahameed, Tariq Toukan,
	Jesper Dangaard Brouer, netdev, bpf, zorik, akiyano, gtzalik,
	Toke Høiland-Jørgensen, Daniel Borkmann,
	Alexei Starovoitov, John Fastabend, Alexander Duyck,
	Jeff Kirsher, David Ahern, Willem de Bruijn, Ilias Apalodimas,
	Lorenzo Bianconi, Saeed Mahameed, steffen.klassert

The mlx4 driver stores the size of the memory backing an RX packet in
frag_stride. In XDP mode this is PAGE_SIZE (normally 4096); in normal
mode frag_stride is 2048.

Also adjust MLX4_EN_MAX_XDP_MTU to take tailroom into account.

Cc: Tariq Toukan <tariqt@mellanox.com>
Cc: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx4/en_netdev.c |    3 ++-
 drivers/net/ethernet/mellanox/mlx4/en_rx.c     |    1 +
 2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
index 43dcbd8214c6..5bd3cd37d50f 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
@@ -51,7 +51,8 @@
 #include "en_port.h"
 
 #define MLX4_EN_MAX_XDP_MTU ((int)(PAGE_SIZE - ETH_HLEN - (2 * VLAN_HLEN) - \
-				   XDP_PACKET_HEADROOM))
+				XDP_PACKET_HEADROOM -			    \
+				SKB_DATA_ALIGN(sizeof(struct skb_shared_info))))
 
 int mlx4_en_setup_tc(struct net_device *dev, u8 up)
 {
diff --git a/drivers/net/ethernet/mellanox/mlx4/en_rx.c b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
index 787139219813..8a10285b0e10 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_rx.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
@@ -683,6 +683,7 @@ int mlx4_en_process_rx_cq(struct net_device *dev, struct mlx4_en_cq *cq, int bud
 	rcu_read_lock();
 	xdp_prog = rcu_dereference(ring->xdp_prog);
 	xdp.rxq = &ring->xdp_rxq;
+	xdp.frame_sz = priv->frag_info[0].frag_stride;
 	doorbell_pending = 0;
 
 	/* We assume a 1:1 mapping between CQEs and Rx descriptors, so Rx




* [PATCH net-next v2 17/33] net: thunderx: add XDP frame size
       [not found] <158824557985.2172139.4173570969543904434.stgit@firesoul>
                   ` (15 preceding siblings ...)
  2020-04-30 11:21 ` [PATCH net-next v2 16/33] mlx4: add XDP frame size and adjust max XDP MTU Jesper Dangaard Brouer
@ 2020-04-30 11:21 ` Jesper Dangaard Brouer
  2020-04-30 11:21 ` [PATCH net-next v2 18/33] nfp: add XDP frame size to netronome driver Jesper Dangaard Brouer
                   ` (15 subsequent siblings)
  32 siblings, 0 replies; 47+ messages in thread
From: Jesper Dangaard Brouer @ 2020-04-30 11:21 UTC (permalink / raw)
  To: sameehj
  Cc: Sunil Goutham, Robert Richter, Jesper Dangaard Brouer, netdev,
	bpf, zorik, akiyano, gtzalik, Toke Høiland-Jørgensen,
	Daniel Borkmann, Alexei Starovoitov, John Fastabend,
	Alexander Duyck, Jeff Kirsher, David Ahern, Willem de Bruijn,
	Ilias Apalodimas, Lorenzo Bianconi, Saeed Mahameed,
	steffen.klassert

To help reviewers, these are the defines related to RCV_FRAG_LEN:

 #define DMA_BUFFER_LEN	1536 /* In multiples of 128bytes */
 #define RCV_FRAG_LEN	(SKB_DATA_ALIGN(DMA_BUFFER_LEN + NET_SKB_PAD) + \
			 SKB_DATA_ALIGN(sizeof(struct skb_shared_info)))
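
Evaluating those defines under stated assumptions makes the new
frame_sz concrete. The sketch assumes 64-byte cache lines, NET_SKB_PAD
of 64 and a 320-byte skb_shared_info; the real values depend on the
architecture. NICVF_XDP_FRAME_SZ is a hypothetical name for the value
assigned to xdp.frame_sz in nicvf_xdp_rx().

```c
#include <assert.h>

#define SMP_CACHE_BYTES     64u  /* assumption: 64-byte cache lines */
#define NET_SKB_PAD         64u  /* assumption: max(32, L1_CACHE_BYTES) */
#define SHINFO_SIZE         320u /* stand-in for sizeof(struct skb_shared_info) */
#define XDP_PACKET_HEADROOM 256u
#define SKB_DATA_ALIGN(x) (((x) + (SMP_CACHE_BYTES - 1)) & ~(SMP_CACHE_BYTES - 1))

#define DMA_BUFFER_LEN 1536u /* In multiples of 128 bytes */
#define RCV_FRAG_LEN   (SKB_DATA_ALIGN(DMA_BUFFER_LEN + NET_SKB_PAD) + \
			SKB_DATA_ALIGN(SHINFO_SIZE))

/* Receive fragment plus the XDP headroom in front of it. */
#define NICVF_XDP_FRAME_SZ (RCV_FRAG_LEN + XDP_PACKET_HEADROOM)
```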

Cc: Sunil Goutham <sgoutham@marvell.com>
Cc: Robert Richter <rrichter@marvell.com>
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
---
 drivers/net/ethernet/cavium/thunder/nicvf_main.c |    1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/net/ethernet/cavium/thunder/nicvf_main.c b/drivers/net/ethernet/cavium/thunder/nicvf_main.c
index b4b33368698f..2ba0ce115e63 100644
--- a/drivers/net/ethernet/cavium/thunder/nicvf_main.c
+++ b/drivers/net/ethernet/cavium/thunder/nicvf_main.c
@@ -552,6 +552,7 @@ static inline bool nicvf_xdp_rx(struct nicvf *nic, struct bpf_prog *prog,
 	xdp_set_data_meta_invalid(&xdp);
 	xdp.data_end = xdp.data + len;
 	xdp.rxq = &rq->xdp_rxq;
+	xdp.frame_sz = RCV_FRAG_LEN + XDP_PACKET_HEADROOM;
 	orig_data = xdp.data;
 
 	rcu_read_lock();




* [PATCH net-next v2 18/33] nfp: add XDP frame size to netronome driver
       [not found] <158824557985.2172139.4173570969543904434.stgit@firesoul>
                   ` (16 preceding siblings ...)
  2020-04-30 11:21 ` [PATCH net-next v2 17/33] net: thunderx: add XDP frame size Jesper Dangaard Brouer
@ 2020-04-30 11:21 ` Jesper Dangaard Brouer
  2020-04-30 11:21 ` [PATCH net-next v2 19/33] tun: add XDP frame size Jesper Dangaard Brouer
                   ` (14 subsequent siblings)
  32 siblings, 0 replies; 47+ messages in thread
From: Jesper Dangaard Brouer @ 2020-04-30 11:21 UTC (permalink / raw)
  To: sameehj
  Cc: Jakub Kicinski, Jakub Kicinski, Jesper Dangaard Brouer, netdev,
	bpf, zorik, akiyano, gtzalik, Toke Høiland-Jørgensen,
	Daniel Borkmann, Alexei Starovoitov, John Fastabend,
	Alexander Duyck, Jeff Kirsher, David Ahern, Willem de Bruijn,
	Ilias Apalodimas, Lorenzo Bianconi, Saeed Mahameed,
	steffen.klassert

The netronome nfp driver uses PAGE_SIZE when xdp_prog is set, but
xdp.data_hard_start begins at offset NFP_NET_RX_BUF_HEADROOM.
Thus, adjust for this when setting xdp.frame_sz, as it counts
from data_hard_start.

When doing XDP_TX, this driver is smart: instead of a full DMA map,
it does a DMA sync on the packet length only. As xdp_adjust_tail can
now grow the packet length, add checks to make sure that the grown
size is within the DMA-mapped size.
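
A minimal sketch of the bound check added to nfp_net_tx_xdp_buf():
after bpf_xdp_adjust_tail() may have grown the packet, its offset plus
length must still fall inside the DMA-mapped area, otherwise the DMA
sync would touch memory outside the mapping. Here dma_map_sz stands for
dp->fl_bufsz - NFP_NET_RX_BUF_NON_DATA, the helper name is
hypothetical, and the numbers in the checks are illustrative rather
than real driver values.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Return true if a packet of pkt_len bytes starting dma_off bytes into
 * the RX buffer still lies within the dma_map_sz bytes that were mapped.
 */
static bool xdp_tx_fits_dma_area(unsigned int pkt_len, unsigned int dma_off,
				 unsigned int dma_map_sz)
{
	return pkt_len + dma_off <= dma_map_sz;
}
```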

Cc: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
---
 .../net/ethernet/netronome/nfp/nfp_net_common.c    |    6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/drivers/net/ethernet/netronome/nfp/nfp_net_common.c b/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
index 9bfb3b077bc1..0e0cc3d58bdc 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
+++ b/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
@@ -1741,10 +1741,15 @@ nfp_net_tx_xdp_buf(struct nfp_net_dp *dp, struct nfp_net_rx_ring *rx_ring,
 		   struct nfp_net_rx_buf *rxbuf, unsigned int dma_off,
 		   unsigned int pkt_len, bool *completed)
 {
+	unsigned int dma_map_sz = dp->fl_bufsz - NFP_NET_RX_BUF_NON_DATA;
 	struct nfp_net_tx_buf *txbuf;
 	struct nfp_net_tx_desc *txd;
 	int wr_idx;
 
+	/* Reject if xdp_adjust_tail grow packet beyond DMA area */
+	if (pkt_len + dma_off > dma_map_sz)
+		return false;
+
 	if (unlikely(nfp_net_tx_full(tx_ring, 1))) {
 		if (!*completed) {
 			nfp_net_xdp_complete(tx_ring);
@@ -1817,6 +1822,7 @@ static int nfp_net_rx(struct nfp_net_rx_ring *rx_ring, int budget)
 	rcu_read_lock();
 	xdp_prog = READ_ONCE(dp->xdp_prog);
 	true_bufsz = xdp_prog ? PAGE_SIZE : dp->fl_bufsz;
+	xdp.frame_sz = PAGE_SIZE - NFP_NET_RX_BUF_HEADROOM;
 	xdp.rxq = &rx_ring->xdp_rxq;
 	tx_ring = r_vec->xdp_ring;
 




* [PATCH net-next v2 19/33] tun: add XDP frame size
       [not found] <158824557985.2172139.4173570969543904434.stgit@firesoul>
                   ` (17 preceding siblings ...)
  2020-04-30 11:21 ` [PATCH net-next v2 18/33] nfp: add XDP frame size to netronome driver Jesper Dangaard Brouer
@ 2020-04-30 11:21 ` Jesper Dangaard Brouer
  2020-05-06 20:32   ` Michael S. Tsirkin
  2020-04-30 11:22 ` [PATCH net-next v2 20/33] vhost_net: also populate " Jesper Dangaard Brouer
                   ` (13 subsequent siblings)
  32 siblings, 1 reply; 47+ messages in thread
From: Jesper Dangaard Brouer @ 2020-04-30 11:21 UTC (permalink / raw)
  To: sameehj
  Cc: Jason Wang, Jason Wang, Jesper Dangaard Brouer, netdev, bpf,
	zorik, akiyano, gtzalik, Toke Høiland-Jørgensen,
	Daniel Borkmann, Alexei Starovoitov, John Fastabend,
	Alexander Duyck, Jeff Kirsher, David Ahern, Willem de Bruijn,
	Ilias Apalodimas, Lorenzo Bianconi, Saeed Mahameed,
	steffen.klassert

The tun driver has two code paths for running XDP (bpf_prog_run_xdp).
In both cases 'buflen' contains enough tailroom for skb_shared_info.

Cc: Jason Wang <jasowang@redhat.com>
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
---
 drivers/net/tun.c |    2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index 44889eba1dbc..c54f967e2c66 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -1671,6 +1671,7 @@ static struct sk_buff *tun_build_skb(struct tun_struct *tun,
 		xdp_set_data_meta_invalid(&xdp);
 		xdp.data_end = xdp.data + len;
 		xdp.rxq = &tfile->xdp_rxq;
+		xdp.frame_sz = buflen;
 
 		act = bpf_prog_run_xdp(xdp_prog, &xdp);
 		if (act == XDP_REDIRECT || act == XDP_TX) {
@@ -2411,6 +2412,7 @@ static int tun_xdp_one(struct tun_struct *tun,
 		}
 		xdp_set_data_meta_invalid(xdp);
 		xdp->rxq = &tfile->xdp_rxq;
+		xdp->frame_sz = buflen;
 
 		act = bpf_prog_run_xdp(xdp_prog, xdp);
 		err = tun_xdp_act(tun, xdp_prog, xdp, act);




* [PATCH net-next v2 20/33] vhost_net: also populate XDP frame size
       [not found] <158824557985.2172139.4173570969543904434.stgit@firesoul>
                   ` (18 preceding siblings ...)
  2020-04-30 11:21 ` [PATCH net-next v2 19/33] tun: add XDP frame size Jesper Dangaard Brouer
@ 2020-04-30 11:22 ` Jesper Dangaard Brouer
  2020-05-06  6:41   ` Jason Wang
  2020-05-06 20:33   ` Michael S. Tsirkin
  2020-04-30 11:22 ` [PATCH net-next v2 21/33] virtio_net: add XDP frame size in two code paths Jesper Dangaard Brouer
                   ` (12 subsequent siblings)
  32 siblings, 2 replies; 47+ messages in thread
From: Jesper Dangaard Brouer @ 2020-04-30 11:22 UTC (permalink / raw)
  To: sameehj
  Cc: Jason Wang, Jesper Dangaard Brouer, netdev, bpf, zorik, akiyano,
	gtzalik, Toke Høiland-Jørgensen, Daniel Borkmann,
	Alexei Starovoitov, John Fastabend, Alexander Duyck,
	Jeff Kirsher, David Ahern, Willem de Bruijn, Ilias Apalodimas,
	Lorenzo Bianconi, Saeed Mahameed, steffen.klassert

In vhost_net_build_xdp() the 'buf' that gets queued via an xdp_buff
has an embedded struct tun_xdp_hdr (located at xdp->data_hard_start),
which contains the buffer length 'buflen' (with tailroom for
skb_shared_info). Also storing this buflen in xdp->frame_sz does not
make struct tun_xdp_hdr obsolete, as it also contains a struct
virtio_net_hdr with other information.
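
A sketch of how a buflen can already include skb_shared_info tailroom,
which is why it can be reused unchanged as frame_sz. It follows the
SKB_DATA_ALIGN(headroom + len) + SKB_DATA_ALIGN(sizeof(struct
skb_shared_info)) pattern seen in the fallback removed from
veth_build_skb() earlier in this series; that vhost_net_build_xdp()
sizes its buffer the same way, with pad covering the space in front of
the packet, is an assumption here. xdp_buflen() is hypothetical, and
the cache-line and skb_shared_info sizes are assumed.

```c
#include <assert.h>

#define SMP_CACHE_BYTES 64u  /* assumption: 64-byte cache lines */
#define SHINFO_SIZE     320u /* stand-in for sizeof(struct skb_shared_info) */
#define SKB_DATA_ALIGN(x) (((x) + (SMP_CACHE_BYTES - 1)) & ~(SMP_CACHE_BYTES - 1))

/*
 * Hypothetical helper: bytes to carve out for a packet of len bytes
 * placed pad bytes into the buffer, plus aligned tailroom for
 * skb_shared_info. The result is what would be stored in both
 * hdr->buflen and xdp->frame_sz.
 */
static unsigned int xdp_buflen(unsigned int pad, unsigned int len)
{
	return SKB_DATA_ALIGN(pad + len) + SKB_DATA_ALIGN(SHINFO_SIZE);
}
```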

Cc: Jason Wang <jasowang@redhat.com>
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
---
 drivers/vhost/net.c |    1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index 2927f02cc7e1..516519dcc8ff 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -747,6 +747,7 @@ static int vhost_net_build_xdp(struct vhost_net_virtqueue *nvq,
 	xdp->data = buf + pad;
 	xdp->data_end = xdp->data + len;
 	hdr->buflen = buflen;
+	xdp->frame_sz = buflen;
 
 	--net->refcnt_bias;
 	alloc_frag->offset += buflen;




* [PATCH net-next v2 21/33] virtio_net: add XDP frame size in two code paths
       [not found] <158824557985.2172139.4173570969543904434.stgit@firesoul>
                   ` (19 preceding siblings ...)
  2020-04-30 11:22 ` [PATCH net-next v2 20/33] vhost_net: also populate " Jesper Dangaard Brouer
@ 2020-04-30 11:22 ` Jesper Dangaard Brouer
  2020-05-06 20:34   ` Michael S. Tsirkin
  2020-04-30 11:22 ` [PATCH net-next v2 22/33] ixgbe: fix XDP redirect on archs with PAGE_SIZE above 4K Jesper Dangaard Brouer
                   ` (11 subsequent siblings)
  32 siblings, 1 reply; 47+ messages in thread
From: Jesper Dangaard Brouer @ 2020-04-30 11:22 UTC (permalink / raw)
  To: sameehj
  Cc: Jason Wang, Jesper Dangaard Brouer, netdev, bpf, zorik, akiyano,
	gtzalik, Toke Høiland-Jørgensen, Daniel Borkmann,
	Alexei Starovoitov, John Fastabend, Alexander Duyck,
	Jeff Kirsher, David Ahern, Willem de Bruijn, Ilias Apalodimas,
	Lorenzo Bianconi, Saeed Mahameed, steffen.klassert

The virtio_net driver runs inside the guest OS. There are two
XDP receive code paths in virtio_net, namely receive_small() and
receive_mergeable(). The receive_big() function does not support XDP.

In receive_small() the frame size is available in buflen. The buffers
backing these frames are allocated in add_recvbuf_small() with the
same size, except for the headroom, and the tailroom has reserved room
for skb_shared_info. The headroom is encoded in the ctx pointer as a
value.

In receive_mergeable() the frame size is more dynamic. There are two
basic cases: (1) the buffer size is based on an exponentially weighted
moving average (see DECLARE_EWMA) of the packet length, or (2) if
virtnet_get_headroom() returns any headroom, the buffer size is
PAGE_SIZE. This time the ctx pointer is used for encoding two values:
the buffer length "truesize" and the headroom. In case (1), if the rx
buffer size is underestimated, the packet will have been split over
more buffers (num_buf info in virtio_net_hdr_mrg_rxbuf, placed at the
top of the buffer area). If that happens, the XDP path does an
xdp_linearize_page operation.
Cc: Jason Wang <jasowang@redhat.com>
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
---
 drivers/net/virtio_net.c |   15 ++++++++++++---
 1 file changed, 12 insertions(+), 3 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 11f722460513..1df3676da185 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -689,6 +689,7 @@ static struct sk_buff *receive_small(struct net_device *dev,
 		xdp.data_end = xdp.data + len;
 		xdp.data_meta = xdp.data;
 		xdp.rxq = &rq->xdp_rxq;
+		xdp.frame_sz = buflen;
 		orig_data = xdp.data;
 		act = bpf_prog_run_xdp(xdp_prog, &xdp);
 		stats->xdp_packets++;
@@ -797,10 +798,11 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
 	int offset = buf - page_address(page);
 	struct sk_buff *head_skb, *curr_skb;
 	struct bpf_prog *xdp_prog;
-	unsigned int truesize;
+	unsigned int truesize = mergeable_ctx_to_truesize(ctx);
 	unsigned int headroom = mergeable_ctx_to_headroom(ctx);
-	int err;
 	unsigned int metasize = 0;
+	unsigned int frame_sz;
+	int err;
 
 	head_skb = NULL;
 	stats->bytes += len - vi->hdr_len;
@@ -821,6 +823,11 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
 		if (unlikely(hdr->hdr.gso_type))
 			goto err_xdp;
 
+		/* Buffers with headroom use PAGE_SIZE as alloc size,
+		 * see add_recvbuf_mergeable() + get_mergeable_buf_len()
+		 */
+		frame_sz = headroom ? PAGE_SIZE : truesize;
+
 		/* This happens when rx buffer size is underestimated
 		 * or headroom is not enough because of the buffer
 		 * was refilled before XDP is set. This should only
@@ -834,6 +841,8 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
 						      page, offset,
 						      VIRTIO_XDP_HEADROOM,
 						      &len);
+			frame_sz = PAGE_SIZE;
+
 			if (!xdp_page)
 				goto err_xdp;
 			offset = VIRTIO_XDP_HEADROOM;
@@ -850,6 +859,7 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
 		xdp.data_end = xdp.data + (len - vi->hdr_len);
 		xdp.data_meta = xdp.data;
 		xdp.rxq = &rq->xdp_rxq;
+		xdp.frame_sz = frame_sz;
 
 		act = bpf_prog_run_xdp(xdp_prog, &xdp);
 		stats->xdp_packets++;
@@ -924,7 +934,6 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
 	}
 	rcu_read_unlock();
 
-	truesize = mergeable_ctx_to_truesize(ctx);
 	if (unlikely(len > truesize)) {
 		pr_debug("%s: rx error: len %u exceeds truesize %lu\n",
 			 dev->name, len, (unsigned long)ctx);




* [PATCH net-next v2 22/33] ixgbe: fix XDP redirect on archs with PAGE_SIZE above 4K
       [not found] <158824557985.2172139.4173570969543904434.stgit@firesoul>
                   ` (20 preceding siblings ...)
  2020-04-30 11:22 ` [PATCH net-next v2 21/33] virtio_net: add XDP frame size in two code paths Jesper Dangaard Brouer
@ 2020-04-30 11:22 ` Jesper Dangaard Brouer
  2020-04-30 11:22 ` [PATCH net-next v2 23/33] ixgbe: add XDP frame size to driver Jesper Dangaard Brouer
                   ` (10 subsequent siblings)
  32 siblings, 0 replies; 47+ messages in thread
From: Jesper Dangaard Brouer @ 2020-04-30 11:22 UTC (permalink / raw)
  To: sameehj
  Cc: Jeff Kirsher, Jesper Dangaard Brouer, netdev, bpf, zorik,
	akiyano, gtzalik, Toke Høiland-Jørgensen,
	Daniel Borkmann, Alexei Starovoitov, John Fastabend,
	Alexander Duyck, Jeff Kirsher, David Ahern, Willem de Bruijn,
	Ilias Apalodimas, Lorenzo Bianconi, Saeed Mahameed,
	steffen.klassert

The ixgbe driver has another memory model when compiled on archs with
PAGE_SIZE above 4096 bytes. In this mode it doesn't split the page in
two halves, but instead increments rx_buffer->page_offset by the
truesize of the packet (which includes headroom and tailroom for
skb_shared_info).

This is done correctly in ixgbe_build_skb(), but ixgbe_rx_buffer_flip(),
which is currently only called on XDP_TX and XDP_REDIRECT, forgets to
add the tailroom for skb_shared_info. This breaks XDP_REDIRECT for veth
and cpumap. Fix by adding the size of the skb_shared_info tailroom.

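To make the missing tailroom concrete, this C sketch compares the old and fixed page_offset step for PAGE_SIZE above 4K. The constants (64-byte cache line, 64-byte pad, 320-byte skb_shared_info) are hypothetical stand-ins for SMP_CACHE_BYTES, IXGBE_SKB_PAD and sizeof(struct skb_shared_info), and the sketch_ helpers are not driver functions.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical values; real ones depend on arch and config. */
#define SKETCH_CACHE_BYTES 64u
#define SKETCH_SHINFO_SIZE 320u /* stand-in for sizeof(struct skb_shared_info) */
#define SKETCH_SKB_PAD 64u      /* stand-in for IXGBE_SKB_PAD */

/* Round up to a multiple of the cache line, like SKB_DATA_ALIGN(). */
static unsigned int sketch_data_align(unsigned int x)
{
	return (x + SKETCH_CACHE_BYTES - 1) & ~(SKETCH_CACHE_BYTES - 1);
}

/* Old flip step for PAGE_SIZE > 4K: no skb_shared_info tailroom. */
static unsigned int sketch_truesize_old(bool build_skb, unsigned int size)
{
	return build_skb ? sketch_data_align(SKETCH_SKB_PAD + size)
			 : sketch_data_align(size);
}

/* Fixed flip step: also skip the aligned skb_shared_info tailroom,
 * matching the layout used when building the SKB. */
static unsigned int sketch_truesize_fixed(bool build_skb, unsigned int size)
{
	return build_skb ? sketch_data_align(SKETCH_SKB_PAD + size) +
			   sketch_data_align(SKETCH_SHINFO_SIZE)
			 : sketch_data_align(size);
}
```

Under these assumptions a 1500-byte packet advances the offset by 1600 bytes with the old code and by 1920 with the fix; the 320-byte difference is the area where skb_shared_info is expected, which the next packet would otherwise start in.
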
Maintainers notice: This fix has been queued to Jeff.

Fixes: 6453073987ba ("ixgbe: add initial support for xdp redirect")
Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
---
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c |    3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index 718931d951bc..ea6834bae04c 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -2254,7 +2254,8 @@ static void ixgbe_rx_buffer_flip(struct ixgbe_ring *rx_ring,
 	rx_buffer->page_offset ^= truesize;
 #else
 	unsigned int truesize = ring_uses_build_skb(rx_ring) ?
-				SKB_DATA_ALIGN(IXGBE_SKB_PAD + size) :
+				SKB_DATA_ALIGN(IXGBE_SKB_PAD + size) +
+				SKB_DATA_ALIGN(sizeof(struct skb_shared_info)) :
 				SKB_DATA_ALIGN(size);
 
 	rx_buffer->page_offset += truesize;




* [PATCH net-next v2 23/33] ixgbe: add XDP frame size to driver
       [not found] <158824557985.2172139.4173570969543904434.stgit@firesoul>
                   ` (21 preceding siblings ...)
  2020-04-30 11:22 ` [PATCH net-next v2 22/33] ixgbe: fix XDP redirect on archs with PAGE_SIZE above 4K Jesper Dangaard Brouer
@ 2020-04-30 11:22 ` Jesper Dangaard Brouer
  2020-04-30 11:22 ` [PATCH net-next v2 24/33] ixgbevf: add XDP frame size to VF driver Jesper Dangaard Brouer
                   ` (9 subsequent siblings)
  32 siblings, 0 replies; 47+ messages in thread
From: Jesper Dangaard Brouer @ 2020-04-30 11:22 UTC (permalink / raw)
  To: sameehj
  Cc: intel-wired-lan, Jeff Kirsher, Alexander Duyck,
	Jesper Dangaard Brouer, netdev, bpf, zorik, akiyano, gtzalik,
	Toke Høiland-Jørgensen, Daniel Borkmann,
	Alexei Starovoitov, John Fastabend, Alexander Duyck,
	Jeff Kirsher, David Ahern, Willem de Bruijn, Ilias Apalodimas,
	Lorenzo Bianconi, Saeed Mahameed, steffen.klassert

This driver uses different memory models depending on PAGE_SIZE at
compile time. For PAGE_SIZE 4K it uses page splitting, meaning that for
a normal MTU the frame size is 2048 bytes (and the headroom 192 bytes).
For larger MTUs the driver still uses page splitting, by allocating
order-1 pages (8192 bytes) for RX frames. For PAGE_SIZE larger than 4K,
the driver instead advances its rx_buffer->page_offset by the frame
size "truesize".

For XDP frame size calculations, this means that in PAGE_SIZE larger
than 4K mode the frame_sz changes on a per-packet basis. In the
page-split 4K PAGE_SIZE mode, xdp.frame_sz is constant for the ring
and can be set once outside the main NAPI loop.

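The two memory models can be sketched with plain integers standing in for rx_buffer->page_offset and the ring page size. The sketch_ helpers are hypothetical; they only illustrate why frame_sz is a ring constant with page splitting but must be recomputed per packet when the offset advances by truesize.

```c
#include <assert.h>

/* Page-split mode (PAGE_SIZE < 8192): each frame owns half of the rx
 * page, so frame_sz is a per-ring constant that can be set once before
 * the NAPI loop. The half must be a power of two for the XOR flip. */
static unsigned int sketch_split_truesize(unsigned int rx_pg_size)
{
	unsigned int half = rx_pg_size / 2;

	assert(half != 0 && (half & (half - 1)) == 0);
	return half;
}

/* Page-split mode: toggle between the two halves of the page. */
static unsigned int sketch_flip_offset(unsigned int page_offset,
				       unsigned int truesize)
{
	return page_offset ^ truesize;
}

/* Larger PAGE_SIZE: walk forward through the page by each packet's own
 * truesize, so frame_sz has to be recomputed for every packet. */
static unsigned int sketch_advance_offset(unsigned int page_offset,
					  unsigned int truesize)
{
	return page_offset + truesize;
}
```

With a 4K page, sketch_split_truesize() returns 2048 (the normal-MTU frame size), and with an order-1 8192-byte page it returns 4096; flipping offset 0 by 2048 gives 2048, and flipping again returns to 0.
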
The default setting in the driver uses build_skb(), which provides
the necessary headroom and tailroom for XDP-redirect in the RX-frame
(in both modes).

There is one complication, which is legacy-rx mode (configurable via
ethtool priv-flags). This mode has zero headroom, but headroom is a
requirement for XDP-redirect to work. The conversion to xdp_frame
(convert_to_xdp_frame) will detect this insufficient space, and the
xdp_do_redirect() call will fail. This is deemed acceptable, as it
allows other XDP actions to still work in legacy-mode. In
legacy-mode + larger PAGE_SIZE, due to lacking tailroom, we also
accept that xdp_adjust_tail shrink doesn't work.

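A simplified version of the headroom test that makes the legacy-rx redirect fail might look like this. It assumes, as in convert_to_xdp_frame(), that struct xdp_frame is stored at data_hard_start in front of any metadata; the 40-byte frame size and the sketch_ name are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for sizeof(struct xdp_frame); the real value
 * depends on kernel version and architecture. */
#define SKETCH_XDP_FRAME_SIZE 40u

/* struct xdp_frame is written at data_hard_start, so the headroom not
 * used by metadata must be large enough to hold it. Returns false when
 * the conversion, and therefore the redirect, would fail. */
static bool sketch_headroom_fits_frame(size_t headroom, size_t metasize)
{
	if (metasize > headroom)
		return false;
	return headroom - metasize >= SKETCH_XDP_FRAME_SIZE;
}
```

With zero headroom (legacy-rx) the check fails, while the default 192 bytes of headroom passes as long as the metadata leaves at least 40 bytes free in this model.
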
Cc: intel-wired-lan@lists.osuosl.org
Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Cc: Alexander Duyck <alexander.duyck@gmail.com>
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
---
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c |   34 +++++++++++++++++++------
 1 file changed, 26 insertions(+), 8 deletions(-)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index ea6834bae04c..eab5934b04f5 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -2244,20 +2244,30 @@ static struct sk_buff *ixgbe_run_xdp(struct ixgbe_adapter *adapter,
 	return ERR_PTR(-result);
 }
 
+static unsigned int ixgbe_rx_frame_truesize(struct ixgbe_ring *rx_ring,
+					    unsigned int size)
+{
+	unsigned int truesize;
+
+#if (PAGE_SIZE < 8192)
+	truesize = ixgbe_rx_pg_size(rx_ring) / 2; /* Must be power-of-2 */
+#else
+	truesize = ring_uses_build_skb(rx_ring) ?
+		SKB_DATA_ALIGN(IXGBE_SKB_PAD + size) +
+		SKB_DATA_ALIGN(sizeof(struct skb_shared_info)) :
+		SKB_DATA_ALIGN(size);
+#endif
+	return truesize;
+}
+
 static void ixgbe_rx_buffer_flip(struct ixgbe_ring *rx_ring,
 				 struct ixgbe_rx_buffer *rx_buffer,
 				 unsigned int size)
 {
+	unsigned int truesize = ixgbe_rx_frame_truesize(rx_ring, size);
 #if (PAGE_SIZE < 8192)
-	unsigned int truesize = ixgbe_rx_pg_size(rx_ring) / 2;
-
 	rx_buffer->page_offset ^= truesize;
 #else
-	unsigned int truesize = ring_uses_build_skb(rx_ring) ?
-				SKB_DATA_ALIGN(IXGBE_SKB_PAD + size) +
-				SKB_DATA_ALIGN(sizeof(struct skb_shared_info)) :
-				SKB_DATA_ALIGN(size);
-
 	rx_buffer->page_offset += truesize;
 #endif
 }
@@ -2291,6 +2301,11 @@ static int ixgbe_clean_rx_irq(struct ixgbe_q_vector *q_vector,
 
 	xdp.rxq = &rx_ring->xdp_rxq;
 
+	/* Frame size depends on rx_ring setup when PAGE_SIZE=4K */
+#if (PAGE_SIZE < 8192)
+	xdp.frame_sz = ixgbe_rx_frame_truesize(rx_ring, 0);
+#endif
+
 	while (likely(total_rx_packets < budget)) {
 		union ixgbe_adv_rx_desc *rx_desc;
 		struct ixgbe_rx_buffer *rx_buffer;
@@ -2324,7 +2339,10 @@ static int ixgbe_clean_rx_irq(struct ixgbe_q_vector *q_vector,
 			xdp.data_hard_start = xdp.data -
 					      ixgbe_rx_offset(rx_ring);
 			xdp.data_end = xdp.data + size;
-
+#if (PAGE_SIZE > 4096)
+			/* At larger PAGE_SIZE, frame_sz depends on len size */
+			xdp.frame_sz = ixgbe_rx_frame_truesize(rx_ring, size);
+#endif
 			skb = ixgbe_run_xdp(adapter, rx_ring, &xdp);
 		}
 




* [PATCH net-next v2 24/33] ixgbevf: add XDP frame size to VF driver
       [not found] <158824557985.2172139.4173570969543904434.stgit@firesoul>
                   ` (22 preceding siblings ...)
  2020-04-30 11:22 ` [PATCH net-next v2 23/33] ixgbe: add XDP frame size to driver Jesper Dangaard Brouer
@ 2020-04-30 11:22 ` Jesper Dangaard Brouer
  2020-04-30 11:22 ` [PATCH net-next v2 25/33] i40e: add XDP frame size to driver Jesper Dangaard Brouer
                   ` (8 subsequent siblings)
  32 siblings, 0 replies; 47+ messages in thread
From: Jesper Dangaard Brouer @ 2020-04-30 11:22 UTC (permalink / raw)
  To: sameehj
  Cc: intel-wired-lan, Jeff Kirsher, Alexander Duyck,
	Jesper Dangaard Brouer, netdev, bpf, zorik, akiyano, gtzalik,
	Toke Høiland-Jørgensen, Daniel Borkmann,
	Alexei Starovoitov, John Fastabend, Alexander Duyck,
	Jeff Kirsher, David Ahern, Willem de Bruijn, Ilias Apalodimas,
	Lorenzo Bianconi, Saeed Mahameed, steffen.klassert

This patch mirrors the changes to ixgbe in the previous patch.

This VF driver doesn't support XDP_REDIRECT, but correct tailroom is
still necessary for the BPF-helper xdp_adjust_tail. In legacy-mode +
larger PAGE_SIZE, due to lacking tailroom, we accept that
xdp_adjust_tail shrink doesn't work.

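Why frame_sz matters for xdp_adjust_tail even without redirect can be sketched as a bound check: the usable end of the buffer is frame_sz minus the reserved skb_shared_info tailroom. This is a minimal model using byte offsets from data_hard_start; the 320-byte tailroom, the helper name and the exact checks are assumptions modelled on the grow/shrink limits this series introduces, not the kernel helper itself.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins; real values depend on arch and config. */
#define SKETCH_SHINFO_TAILROOM 320 /* SKB_DATA_ALIGN(sizeof(struct skb_shared_info)) */
#define SKETCH_ETH_HLEN 14

/* Offsets are bytes from data_hard_start. The usable end of the frame
 * is frame_sz minus the reserved tailroom; the new data_end must stay
 * at or below it and still leave room for an Ethernet header. */
static bool sketch_tail_adjust_ok(unsigned int frame_sz, unsigned int data,
				  unsigned int data_end, int delta)
{
	long hard_end = (long)frame_sz - SKETCH_SHINFO_TAILROOM;
	long new_end = (long)data_end + delta;

	if (new_end > hard_end)               /* grows into reserved tailroom */
		return false;
	if (new_end < (long)data + SKETCH_ETH_HLEN) /* too short to be a frame */
		return false;
	return true;
}
```

For example, legacy-rx on a larger PAGE_SIZE gives frame_sz = SKB_DATA_ALIGN(size) with no skb_shared_info tailroom; with frame_sz 1024 and data_end 1000, the computed end (704) already lies below data_end, so sketch_tail_adjust_ok(1024, 0, 1000, -100) rejects even a 100-byte shrink.
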
Cc: intel-wired-lan@lists.osuosl.org
Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Cc: Alexander Duyck <alexander.duyck@gmail.com>
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
---
 drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c |   34 +++++++++++++++++----
 1 file changed, 27 insertions(+), 7 deletions(-)

diff --git a/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c b/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
index 4622c4ea2e46..62bc3e3b5b9c 100644
--- a/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
+++ b/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
@@ -1095,19 +1095,31 @@ static struct sk_buff *ixgbevf_run_xdp(struct ixgbevf_adapter *adapter,
 	return ERR_PTR(-result);
 }
 
+static unsigned int ixgbevf_rx_frame_truesize(struct ixgbevf_ring *rx_ring,
+					      unsigned int size)
+{
+	unsigned int truesize;
+
+#if (PAGE_SIZE < 8192)
+	truesize = ixgbevf_rx_pg_size(rx_ring) / 2; /* Must be power-of-2 */
+#else
+	truesize = ring_uses_build_skb(rx_ring) ?
+		SKB_DATA_ALIGN(IXGBEVF_SKB_PAD + size) +
+		SKB_DATA_ALIGN(sizeof(struct skb_shared_info)) :
+		SKB_DATA_ALIGN(size);
+#endif
+	return truesize;
+}
+
 static void ixgbevf_rx_buffer_flip(struct ixgbevf_ring *rx_ring,
 				   struct ixgbevf_rx_buffer *rx_buffer,
 				   unsigned int size)
 {
-#if (PAGE_SIZE < 8192)
-	unsigned int truesize = ixgbevf_rx_pg_size(rx_ring) / 2;
+	unsigned int truesize = ixgbevf_rx_frame_truesize(rx_ring, size);
 
+#if (PAGE_SIZE < 8192)
 	rx_buffer->page_offset ^= truesize;
 #else
-	unsigned int truesize = ring_uses_build_skb(rx_ring) ?
-				SKB_DATA_ALIGN(IXGBEVF_SKB_PAD + size) :
-				SKB_DATA_ALIGN(size);
-
 	rx_buffer->page_offset += truesize;
 #endif
 }
@@ -1125,6 +1137,11 @@ static int ixgbevf_clean_rx_irq(struct ixgbevf_q_vector *q_vector,
 
 	xdp.rxq = &rx_ring->xdp_rxq;
 
+	/* Frame size depends on rx_ring setup when PAGE_SIZE=4K */
+#if (PAGE_SIZE < 8192)
+	xdp.frame_sz = ixgbevf_rx_frame_truesize(rx_ring, 0);
+#endif
+
 	while (likely(total_rx_packets < budget)) {
 		struct ixgbevf_rx_buffer *rx_buffer;
 		union ixgbe_adv_rx_desc *rx_desc;
@@ -1157,7 +1174,10 @@ static int ixgbevf_clean_rx_irq(struct ixgbevf_q_vector *q_vector,
 			xdp.data_hard_start = xdp.data -
 					      ixgbevf_rx_offset(rx_ring);
 			xdp.data_end = xdp.data + size;
-
+#if (PAGE_SIZE > 4096)
+			/* At larger PAGE_SIZE, frame_sz depends on len size */
+			xdp.frame_sz = ixgbevf_rx_frame_truesize(rx_ring, size);
+#endif
 			skb = ixgbevf_run_xdp(adapter, rx_ring, &xdp);
 		}
 




* [PATCH net-next v2 25/33] i40e: add XDP frame size to driver
       [not found] <158824557985.2172139.4173570969543904434.stgit@firesoul>
                   ` (23 preceding siblings ...)
  2020-04-30 11:22 ` [PATCH net-next v2 24/33] ixgbevf: add XDP frame size to VF driver Jesper Dangaard Brouer
@ 2020-04-30 11:22 ` Jesper Dangaard Brouer
  2020-04-30 11:22 ` [PATCH net-next v2 26/33] ice: " Jesper Dangaard Brouer
                   ` (7 subsequent siblings)
  32 siblings, 0 replies; 47+ messages in thread
From: Jesper Dangaard Brouer @ 2020-04-30 11:22 UTC (permalink / raw)
  To: sameehj
  Cc: intel-wired-lan, Jeff Kirsher, Alexander Duyck,
	Jesper Dangaard Brouer, netdev, bpf, zorik, akiyano, gtzalik,
	Toke Høiland-Jørgensen, Daniel Borkmann,
	Alexei Starovoitov, John Fastabend, Alexander Duyck,
	Jeff Kirsher, David Ahern, Willem de Bruijn, Ilias Apalodimas,
	Lorenzo Bianconi, Saeed Mahameed, steffen.klassert

This driver uses different memory models depending on PAGE_SIZE at
compile time. For PAGE_SIZE 4K it uses page splitting, meaning that for
a normal MTU the frame size is 2048 bytes (and the headroom 192 bytes).
For larger MTUs the driver still uses page splitting, by allocating
order-1 pages (8192 bytes) for RX frames. For PAGE_SIZE larger than 4K,
the driver instead advances its rx_buffer->page_offset by the frame
size "truesize".

For XDP frame size calculations, this means that in PAGE_SIZE larger
than 4K mode the frame_sz changes on a per-packet basis. In the
page-split 4K PAGE_SIZE mode, xdp.frame_sz is constant for the ring
and can be set once outside the main NAPI loop.

The default setting in the driver uses build_skb(), which provides
the necessary headroom and tailroom for XDP-redirect in the RX-frame
(in both modes).

There is one complication, which is legacy-rx mode (configurable via
ethtool priv-flags). This mode has zero headroom, but headroom is a
requirement for XDP-redirect to work. The conversion to xdp_frame
(convert_to_xdp_frame) will detect this insufficient space, and the
xdp_do_redirect() call will fail. This is deemed acceptable, as it
allows other XDP actions to still work in legacy-mode. In
legacy-mode + larger PAGE_SIZE, due to lacking tailroom, we also
accept that xdp_adjust_tail shrink doesn't work.

Cc: intel-wired-lan@lists.osuosl.org
Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Cc: Alexander Duyck <alexander.duyck@gmail.com>
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
---
 drivers/net/ethernet/intel/i40e/i40e_txrx.c |   30 +++++++++++++++++++++++----
 1 file changed, 25 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_txrx.c b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
index b8496037ef7f..a3772beffe02 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_txrx.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
@@ -1507,6 +1507,22 @@ static inline unsigned int i40e_rx_offset(struct i40e_ring *rx_ring)
 	return ring_uses_build_skb(rx_ring) ? I40E_SKB_PAD : 0;
 }
 
+static unsigned int i40e_rx_frame_truesize(struct i40e_ring *rx_ring,
+					   unsigned int size)
+{
+	unsigned int truesize;
+
+#if (PAGE_SIZE < 8192)
+	truesize = i40e_rx_pg_size(rx_ring) / 2; /* Must be power-of-2 */
+#else
+	truesize = i40e_rx_offset(rx_ring) ?
+		SKB_DATA_ALIGN(size + i40e_rx_offset(rx_ring)) +
+		SKB_DATA_ALIGN(sizeof(struct skb_shared_info)) :
+		SKB_DATA_ALIGN(size);
+#endif
+	return truesize;
+}
+
 /**
  * i40e_alloc_mapped_page - recycle or make a new page
  * @rx_ring: ring to use
@@ -2246,13 +2262,11 @@ static void i40e_rx_buffer_flip(struct i40e_ring *rx_ring,
 				struct i40e_rx_buffer *rx_buffer,
 				unsigned int size)
 {
-#if (PAGE_SIZE < 8192)
-	unsigned int truesize = i40e_rx_pg_size(rx_ring) / 2;
+	unsigned int truesize = i40e_rx_frame_truesize(rx_ring, size);
 
+#if (PAGE_SIZE < 8192)
 	rx_buffer->page_offset ^= truesize;
 #else
-	unsigned int truesize = SKB_DATA_ALIGN(i40e_rx_offset(rx_ring) + size);
-
 	rx_buffer->page_offset += truesize;
 #endif
 }
@@ -2335,6 +2349,9 @@ static int i40e_clean_rx_irq(struct i40e_ring *rx_ring, int budget)
 	bool failure = false;
 	struct xdp_buff xdp;
 
+#if (PAGE_SIZE < 8192)
+	xdp.frame_sz = i40e_rx_frame_truesize(rx_ring, 0);
+#endif
 	xdp.rxq = &rx_ring->xdp_rxq;
 
 	while (likely(total_rx_packets < (unsigned int)budget)) {
@@ -2389,7 +2406,10 @@ static int i40e_clean_rx_irq(struct i40e_ring *rx_ring, int budget)
 			xdp.data_hard_start = xdp.data -
 					      i40e_rx_offset(rx_ring);
 			xdp.data_end = xdp.data + size;
-
+#if (PAGE_SIZE > 4096)
+			/* At larger PAGE_SIZE, frame_sz depends on len size */
+			xdp.frame_sz = i40e_rx_frame_truesize(rx_ring, size);
+#endif
 			skb = i40e_run_xdp(rx_ring, &xdp);
 		}
 




* [PATCH net-next v2 26/33] ice: add XDP frame size to driver
       [not found] <158824557985.2172139.4173570969543904434.stgit@firesoul>
                   ` (24 preceding siblings ...)
  2020-04-30 11:22 ` [PATCH net-next v2 25/33] i40e: add XDP frame size to driver Jesper Dangaard Brouer
@ 2020-04-30 11:22 ` Jesper Dangaard Brouer
  2020-04-30 11:22 ` [PATCH net-next v2 27/33] xdp: for Intel AF_XDP drivers add XDP frame_sz Jesper Dangaard Brouer
                   ` (6 subsequent siblings)
  32 siblings, 0 replies; 47+ messages in thread
From: Jesper Dangaard Brouer @ 2020-04-30 11:22 UTC (permalink / raw)
  To: sameehj
  Cc: intel-wired-lan, Jeff Kirsher, Alexander Duyck,
	Jesper Dangaard Brouer, netdev, bpf, zorik, akiyano, gtzalik,
	Toke Høiland-Jørgensen, Daniel Borkmann,
	Alexei Starovoitov, John Fastabend, Alexander Duyck,
	Jeff Kirsher, David Ahern, Willem de Bruijn, Ilias Apalodimas,
	Lorenzo Bianconi, Saeed Mahameed, steffen.klassert

This driver uses different memory models depending on PAGE_SIZE at
compile time. For PAGE_SIZE 4K it uses page splitting, meaning that for
a normal MTU the frame size is 2048 bytes (and the headroom 192 bytes).
For larger MTUs the driver still uses page splitting, by allocating
order-1 pages (8192 bytes) for RX frames. For PAGE_SIZE larger than 4K,
the driver instead advances its rx_buffer->page_offset by the frame
size "truesize".

For XDP frame size calculations, this means that in PAGE_SIZE larger
than 4K mode the frame_sz changes on a per-packet basis. In the
page-split 4K PAGE_SIZE mode, xdp.frame_sz is constant for the ring
and can be set once outside the main NAPI loop.

The default setting in the driver uses build_skb(), which provides
the necessary headroom and tailroom for XDP-redirect in the RX-frame
(in both modes).

There is one complication, which is legacy-rx mode (configurable via
ethtool priv-flags). This mode has zero headroom, but headroom is a
requirement for XDP-redirect to work. The conversion to xdp_frame
(convert_to_xdp_frame) will detect this insufficient space, and the
xdp_do_redirect() call will fail. This is deemed acceptable, as it
allows other XDP actions to still work in legacy-mode. In
legacy-mode + larger PAGE_SIZE, due to lacking tailroom, we also
accept that xdp_adjust_tail shrink doesn't work.

Cc: intel-wired-lan@lists.osuosl.org
Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Cc: Alexander Duyck <alexander.duyck@gmail.com>
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
---
 drivers/net/ethernet/intel/ice/ice_txrx.c |   34 +++++++++++++++++++++--------
 1 file changed, 25 insertions(+), 9 deletions(-)

diff --git a/drivers/net/ethernet/intel/ice/ice_txrx.c b/drivers/net/ethernet/intel/ice/ice_txrx.c
index f67e8362958c..69b21b436f9a 100644
--- a/drivers/net/ethernet/intel/ice/ice_txrx.c
+++ b/drivers/net/ethernet/intel/ice/ice_txrx.c
@@ -423,6 +423,22 @@ static unsigned int ice_rx_offset(struct ice_ring *rx_ring)
 	return 0;
 }
 
+static unsigned int ice_rx_frame_truesize(struct ice_ring *rx_ring,
+					  unsigned int size)
+{
+	unsigned int truesize;
+
+#if (PAGE_SIZE < 8192)
+	truesize = ice_rx_pg_size(rx_ring) / 2; /* Must be power-of-2 */
+#else
+	truesize = ice_rx_offset(rx_ring) ?
+		SKB_DATA_ALIGN(ice_rx_offset(rx_ring) + size) +
+		SKB_DATA_ALIGN(sizeof(struct skb_shared_info)) :
+		SKB_DATA_ALIGN(size);
+#endif
+	return truesize;
+}
+
 /**
  * ice_run_xdp - Executes an XDP program on initialized xdp_buff
  * @rx_ring: Rx ring
@@ -991,6 +1007,10 @@ static int ice_clean_rx_irq(struct ice_ring *rx_ring, int budget)
 	bool failure;
 
 	xdp.rxq = &rx_ring->xdp_rxq;
+	/* Frame size depends on rx_ring setup when PAGE_SIZE=4K */
+#if (PAGE_SIZE < 8192)
+	xdp.frame_sz = ice_rx_frame_truesize(rx_ring, 0);
+#endif
 
 	/* start the loop to process Rx packets bounded by 'budget' */
 	while (likely(total_rx_pkts < (unsigned int)budget)) {
@@ -1038,6 +1058,10 @@ static int ice_clean_rx_irq(struct ice_ring *rx_ring, int budget)
 		xdp.data_hard_start = xdp.data - ice_rx_offset(rx_ring);
 		xdp.data_meta = xdp.data;
 		xdp.data_end = xdp.data + size;
+#if (PAGE_SIZE > 4096)
+		/* At larger PAGE_SIZE, frame_sz depends on len size */
+		xdp.frame_sz = ice_rx_frame_truesize(rx_ring, size);
+#endif
 
 		rcu_read_lock();
 		xdp_prog = READ_ONCE(rx_ring->xdp_prog);
@@ -1051,16 +1075,8 @@ static int ice_clean_rx_irq(struct ice_ring *rx_ring, int budget)
 		if (!xdp_res)
 			goto construct_skb;
 		if (xdp_res & (ICE_XDP_TX | ICE_XDP_REDIR)) {
-			unsigned int truesize;
-
-#if (PAGE_SIZE < 8192)
-			truesize = ice_rx_pg_size(rx_ring) / 2;
-#else
-			truesize = SKB_DATA_ALIGN(ice_rx_offset(rx_ring) +
-						  size);
-#endif
 			xdp_xmit |= xdp_res;
-			ice_rx_buf_adjust_pg_offset(rx_buf, truesize);
+			ice_rx_buf_adjust_pg_offset(rx_buf, xdp.frame_sz);
 		} else {
 			rx_buf->pagecnt_bias++;
 		}




* [PATCH net-next v2 27/33] xdp: for Intel AF_XDP drivers add XDP frame_sz
       [not found] <158824557985.2172139.4173570969543904434.stgit@firesoul>
                   ` (25 preceding siblings ...)
  2020-04-30 11:22 ` [PATCH net-next v2 26/33] ice: " Jesper Dangaard Brouer
@ 2020-04-30 11:22 ` Jesper Dangaard Brouer
  2020-04-30 11:22 ` [PATCH net-next v2 28/33] mlx5: rx queue setup time determine frame_sz for XDP Jesper Dangaard Brouer
                   ` (5 subsequent siblings)
  32 siblings, 0 replies; 47+ messages in thread
From: Jesper Dangaard Brouer @ 2020-04-30 11:22 UTC (permalink / raw)
  To: sameehj
  Cc: intel-wired-lan, Björn Töpel, Magnus Karlsson,
	Björn Töpel, Jesper Dangaard Brouer, netdev, bpf,
	zorik, akiyano, gtzalik, Toke Høiland-Jørgensen,
	Daniel Borkmann, Alexei Starovoitov, John Fastabend,
	Alexander Duyck, Jeff Kirsher, David Ahern, Willem de Bruijn,
	Ilias Apalodimas, Lorenzo Bianconi, Saeed Mahameed,
	steffen.klassert

Intel drivers implement native AF_XDP zerocopy in separate C-files
that have their own invocation of bpf_prog_run_xdp(). The setup of
xdp_buff is also handled separately from the normal code path.

This patch updates XDP frame_sz for the AF_XDP zerocopy drivers i40e,
ice and ixgbe, as the code changes needed are very similar. Introduce
a helper function xsk_umem_xdp_frame_sz() for calculating the frame
size.

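The helper's arithmetic is simple enough to check in isolation. This sketch mirrors the two umem fields it reads in a hypothetical struct; the field meanings in the comments and the 1792/256 split in the example are illustrative assumptions, not a statement of how the kernel fills them in.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mirror of the two struct xdp_umem fields the helper
 * reads; not the kernel's definition. */
struct sketch_umem {
	uint32_t chunk_size_nohr; /* assumed: chunk size without the umem headroom */
	uint32_t headroom;        /* assumed: umem headroom set by user space */
};

/* Same arithmetic as xsk_umem_xdp_frame_sz() in the patch: headroom
 * plus the rest of the chunk. */
static uint32_t sketch_umem_xdp_frame_sz(const struct sketch_umem *umem)
{
	assert(umem != NULL);
	return umem->chunk_size_nohr + umem->headroom;
}
```

For example, a 1792-byte chunk remainder with 256 bytes of headroom yields a frame_sz of 2048; because it depends only on the umem, the drivers can assign it once before their rx loop.
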
Cc: intel-wired-lan@lists.osuosl.org
Cc: Björn Töpel <bjorn.topel@intel.com>
Cc: Magnus Karlsson <magnus.karlsson@intel.com>
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: Björn Töpel <bjorn.topel@intel.com>
---
 drivers/net/ethernet/intel/i40e/i40e_xsk.c   |    2 ++
 drivers/net/ethernet/intel/ice/ice_xsk.c     |    2 ++
 drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c |    2 ++
 include/net/xdp_sock.h                       |   11 +++++++++++
 4 files changed, 17 insertions(+)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_xsk.c b/drivers/net/ethernet/intel/i40e/i40e_xsk.c
index 0b7d29192b2c..2b9184aead5f 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_xsk.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_xsk.c
@@ -531,12 +531,14 @@ int i40e_clean_rx_irq_zc(struct i40e_ring *rx_ring, int budget)
 {
 	unsigned int total_rx_bytes = 0, total_rx_packets = 0;
 	u16 cleaned_count = I40E_DESC_UNUSED(rx_ring);
+	struct xdp_umem *umem = rx_ring->xsk_umem;
 	unsigned int xdp_res, xdp_xmit = 0;
 	bool failure = false;
 	struct sk_buff *skb;
 	struct xdp_buff xdp;
 
 	xdp.rxq = &rx_ring->xdp_rxq;
+	xdp.frame_sz = xsk_umem_xdp_frame_sz(umem);
 
 	while (likely(total_rx_packets < (unsigned int)budget)) {
 		struct i40e_rx_buffer *bi;
diff --git a/drivers/net/ethernet/intel/ice/ice_xsk.c b/drivers/net/ethernet/intel/ice/ice_xsk.c
index 8279db15e870..23e5515d4527 100644
--- a/drivers/net/ethernet/intel/ice/ice_xsk.c
+++ b/drivers/net/ethernet/intel/ice/ice_xsk.c
@@ -840,11 +840,13 @@ int ice_clean_rx_irq_zc(struct ice_ring *rx_ring, int budget)
 {
 	unsigned int total_rx_bytes = 0, total_rx_packets = 0;
 	u16 cleaned_count = ICE_DESC_UNUSED(rx_ring);
+	struct xdp_umem *umem = rx_ring->xsk_umem;
 	unsigned int xdp_xmit = 0;
 	bool failure = false;
 	struct xdp_buff xdp;
 
 	xdp.rxq = &rx_ring->xdp_rxq;
+	xdp.frame_sz = xsk_umem_xdp_frame_sz(umem);
 
 	while (likely(total_rx_packets < (unsigned int)budget)) {
 		union ice_32b_rx_flex_desc *rx_desc;
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c
index 74b540ebb3dc..a656ee9a1fae 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c
@@ -431,12 +431,14 @@ int ixgbe_clean_rx_irq_zc(struct ixgbe_q_vector *q_vector,
 	unsigned int total_rx_bytes = 0, total_rx_packets = 0;
 	struct ixgbe_adapter *adapter = q_vector->adapter;
 	u16 cleaned_count = ixgbe_desc_unused(rx_ring);
+	struct xdp_umem *umem = rx_ring->xsk_umem;
 	unsigned int xdp_res, xdp_xmit = 0;
 	bool failure = false;
 	struct sk_buff *skb;
 	struct xdp_buff xdp;
 
 	xdp.rxq = &rx_ring->xdp_rxq;
+	xdp.frame_sz = xsk_umem_xdp_frame_sz(umem);
 
 	while (likely(total_rx_packets < budget)) {
 		union ixgbe_adv_rx_desc *rx_desc;
diff --git a/include/net/xdp_sock.h b/include/net/xdp_sock.h
index e86ec48ef627..1cd1ec3cea97 100644
--- a/include/net/xdp_sock.h
+++ b/include/net/xdp_sock.h
@@ -237,6 +237,12 @@ static inline u64 xsk_umem_adjust_offset(struct xdp_umem *umem, u64 address,
 	else
 		return address + offset;
 }
+
+static inline u32 xsk_umem_xdp_frame_sz(struct xdp_umem *umem)
+{
+	return umem->chunk_size_nohr + umem->headroom;
+}
+
 #else
 static inline int xsk_generic_rcv(struct xdp_sock *xs, struct xdp_buff *xdp)
 {
@@ -367,6 +373,11 @@ static inline u64 xsk_umem_adjust_offset(struct xdp_umem *umem, u64 handle,
 	return 0;
 }
 
+static inline u32 xsk_umem_xdp_frame_sz(struct xdp_umem *umem)
+{
+	return 0;
+}
+
 static inline int __xsk_map_redirect(struct xdp_sock *xs, struct xdp_buff *xdp)
 {
 	return -EOPNOTSUPP;




* [PATCH net-next v2 28/33] mlx5: rx queue setup time determine frame_sz for XDP
       [not found] <158824557985.2172139.4173570969543904434.stgit@firesoul>
                   ` (26 preceding siblings ...)
  2020-04-30 11:22 ` [PATCH net-next v2 27/33] xdp: for Intel AF_XDP drivers add XDP frame_sz Jesper Dangaard Brouer
@ 2020-04-30 11:22 ` Jesper Dangaard Brouer
  2020-04-30 17:07   ` Tariq Toukan
  2020-04-30 11:22 ` [PATCH net-next v2 29/33] xdp: allow bpf_xdp_adjust_tail() to grow packet size Jesper Dangaard Brouer
                   ` (4 subsequent siblings)
  32 siblings, 1 reply; 47+ messages in thread
From: Jesper Dangaard Brouer @ 2020-04-30 11:22 UTC (permalink / raw)
  To: sameehj
  Cc: Tariq Toukan, Saeed Mahameed, Jesper Dangaard Brouer, netdev,
	bpf, zorik, akiyano, gtzalik, Toke Høiland-Jørgensen,
	Daniel Borkmann, Alexei Starovoitov, John Fastabend,
	Alexander Duyck, Jeff Kirsher, David Ahern, Willem de Bruijn,
	Ilias Apalodimas, Lorenzo Bianconi, Saeed Mahameed,
	steffen.klassert

The mlx5 driver has multiple memory models, which also change
depending on whether an XDP bpf_prog is attached.

The 'rx_striding_rq' setting is adjusted via ethtool priv-flags, e.g.:
 # ethtool --set-priv-flags mlx5p2 rx_striding_rq off

In the general case with 4K page_size and regular MTU packets, the
frame_sz is 2048, and 4096 when XDP is enabled, in both modes.

The info on the given frame size is stored differently depending on
the RQ-mode, encoded in the wqe/mpwqe union in struct mlx5e_rq. In rx
striding mode (MLX5_WQ_TYPE_LINKED_LIST_STRIDING_RQ)
rq->mpwqe.log_stride_sz is either 11 or 12, which corresponds to 2048
or 4096. In non-striding mode (MLX5_WQ_TYPE_CYCLIC) the frag_stride is
stored in rq->wqe.info.arr[0].frag_stride, for the first fragment,
which is what the XDP case cares about.

To reduce the effect on the fast-path, this patch determines the
frame_sz at setup time, to avoid determining the memory model at
runtime. The variable is named first_frame_sz to make it clear that
this is only the frame size of the first fragment.

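A minimal sketch of resolving the frame size once at setup, under the assumption that only the RQ type, log_stride_sz and the first fragment's stride matter; the enum and function names are hypothetical stand-ins for the mlx5 types.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names standing in for MLX5_WQ_TYPE_LINKED_LIST_STRIDING_RQ
 * and MLX5_WQ_TYPE_CYCLIC. */
enum sketch_wq_type { SKETCH_STRIDING_RQ, SKETCH_CYCLIC };

/* Resolve the first fragment's frame size once, at rx queue setup, so
 * the XDP fast path only copies a stored value. */
static uint32_t sketch_first_frame_sz(enum sketch_wq_type type,
				      uint8_t log_stride_sz,
				      uint32_t first_frag_stride)
{
	if (type == SKETCH_STRIDING_RQ) {
		assert(log_stride_sz < 32);
		return 1u << log_stride_sz;   /* 11 -> 2048, 12 -> 4096 */
	}
	return first_frag_stride;             /* cf. wqe.info.arr[0].frag_stride */
}
```

For example, sketch_first_frame_sz(SKETCH_STRIDING_RQ, 11, 0) gives 2048; the per-packet XDP path then only assigns the stored value to xdp.frame_sz.
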
This mlx5 driver does a DMA-sync on XDP_TX action, but grow is safe
as it has done a DMA-map on the entire PAGE_SIZE. The driver also
already does an XDP length check against sq->hw_mtu on the possible
XDP xmit paths mlx5e_xmit_xdp_frame() + mlx5e_xmit_xdp_frame_mpwqe().

V2: Fix that frag_size needs to be recalculated before creating the SKB.

Cc: Tariq Toukan <tariqt@mellanox.com>
Cc: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en.h      |    1 +
 drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c  |    1 +
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c |    6 ++++++
 drivers/net/ethernet/mellanox/mlx5/core/en_rx.c   |    2 ++
 4 files changed, 10 insertions(+)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h b/drivers/net/ethernet/mellanox/mlx5/core/en.h
index 23701c0e36ec..ba6a0ee297c6 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h
@@ -652,6 +652,7 @@ struct mlx5e_rq {
 	struct {
 		u16            umem_headroom;
 		u16            headroom;
+		u32            first_frame_sz;
 		u8             map_dir;   /* dma map direction */
 	} buff;
 
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c b/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c
index f049e0ac308a..b63abaf51253 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c
@@ -137,6 +137,7 @@ bool mlx5e_xdp_handle(struct mlx5e_rq *rq, struct mlx5e_dma_info *di,
 	if (xsk)
 		xdp.handle = di->xsk.handle;
 	xdp.rxq = &rq->xdp_rxq;
+	xdp.frame_sz = rq->buff.first_frame_sz;
 
 	act = bpf_prog_run_xdp(prog, &xdp);
 	if (xsk) {
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index 47396f1b02f4..1d04ed3feead 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -462,6 +462,8 @@ static int mlx5e_alloc_rq(struct mlx5e_channel *c,
 		rq->mpwqe.num_strides =
 			BIT(mlx5e_mpwqe_get_log_num_strides(mdev, params, xsk));
 
+		rq->buff.first_frame_sz = (1 << rq->mpwqe.log_stride_sz);
+
 		err = mlx5e_create_rq_umr_mkey(mdev, rq);
 		if (err)
 			goto err_rq_wq_destroy;
@@ -485,6 +487,8 @@ static int mlx5e_alloc_rq(struct mlx5e_channel *c,
 			num_xsk_frames = wq_sz << rq->wqe.info.log_num_frags;
 
 		rq->wqe.info = rqp->frags_info;
+		rq->buff.first_frame_sz = rq->wqe.info.arr[0].frag_stride;
+
 		rq->wqe.frags =
 			kvzalloc_node(array_size(sizeof(*rq->wqe.frags),
 					(wq_sz << rq->wqe.info.log_num_frags)),
@@ -522,6 +526,8 @@ static int mlx5e_alloc_rq(struct mlx5e_channel *c,
 	}
 
 	if (xsk) {
+		rq->buff.first_frame_sz = xsk_umem_xdp_frame_sz(umem);
+
 		err = mlx5e_xsk_resize_reuseq(umem, num_xsk_frames);
 		if (unlikely(err)) {
 			mlx5_core_err(mdev, "Unable to allocate the Reuse Ring for %u frames\n",
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
index e2beb89c1832..04671ed977a5 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
@@ -1084,6 +1084,7 @@ mlx5e_skb_from_cqe_linear(struct mlx5e_rq *rq, struct mlx5_cqe64 *cqe,
 	if (consumed)
 		return NULL; /* page/packet was consumed by XDP */
 
+	frag_size = MLX5_SKB_FRAG_SZ(rx_headroom + cqe_bcnt);
 	skb = mlx5e_build_linear_skb(rq, va, frag_size, rx_headroom, cqe_bcnt);
 	if (unlikely(!skb))
 		return NULL;
@@ -1385,6 +1386,7 @@ mlx5e_skb_from_cqe_mpwrq_linear(struct mlx5e_rq *rq, struct mlx5e_mpw_info *wi,
 		return NULL; /* page/packet was consumed by XDP */
 	}
 
+	frag_size = MLX5_SKB_FRAG_SZ(rx_headroom + cqe_bcnt32);
 	skb = mlx5e_build_linear_skb(rq, va, frag_size, rx_headroom, cqe_bcnt32);
 	if (unlikely(!skb))
 		return NULL;




* [PATCH net-next v2 29/33] xdp: allow bpf_xdp_adjust_tail() to grow packet size
       [not found] <158824557985.2172139.4173570969543904434.stgit@firesoul>
                   ` (27 preceding siblings ...)
  2020-04-30 11:22 ` [PATCH net-next v2 28/33] mlx5: rx queue setup time determine frame_sz for XDP Jesper Dangaard Brouer
@ 2020-04-30 11:22 ` Jesper Dangaard Brouer
  2020-04-30 11:22 ` [PATCH net-next v2 30/33] xdp: clear grow memory in bpf_xdp_adjust_tail() Jesper Dangaard Brouer
                   ` (3 subsequent siblings)
  32 siblings, 0 replies; 47+ messages in thread
From: Jesper Dangaard Brouer @ 2020-04-30 11:22 UTC (permalink / raw)
  To: sameehj
  Cc: Jesper Dangaard Brouer, netdev, bpf, zorik, akiyano, gtzalik,
	Toke Høiland-Jørgensen, Daniel Borkmann,
	Alexei Starovoitov, John Fastabend, Alexander Duyck,
	Jeff Kirsher, David Ahern, Willem de Bruijn, Ilias Apalodimas,
	Lorenzo Bianconi, Saeed Mahameed, steffen.klassert

Finally, now that all drivers provide a frame size, allow the BPF helper
bpf_xdp_adjust_tail() to grow or extend the packet size at the frame tail.

Remember that the helper/macro xdp_data_hard_end has reserved some
tailroom. Thus, this helper makes sure that the BPF program does not
have access to this tailroom area.
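The checks can be sketched as a userspace model that uses byte offsets instead of pointers. It follows the same order as the patched bpf_xdp_adjust_tail(): the hard-end check first, then the frame_sz sanity check, then the minimum Ethernet header length. The 320-byte tailroom stands in for SKB_DATA_ALIGN(sizeof(struct skb_shared_info)) and is an assumption (it is the figure the selftests in this series use). The struct and function names are hypothetical.

```c
#include <errno.h>
#include <stdint.h>

#define MODEL_ETH_HLEN   14
#define MODEL_PAGE_SIZE  4096
#define MODEL_TAILROOM   320 /* assumed SKB_DATA_ALIGN(sizeof(struct skb_shared_info)) */

/* Hypothetical offset-based model of struct xdp_buff: every position is
 * a byte offset from data_hard_start. */
struct model_xdp {
	int64_t  data;     /* start of packet data */
	int64_t  data_end; /* one past the last packet byte */
	uint32_t frame_sz; /* size of the memory the frame lives in */
};

/* Like xdp_data_hard_end(): frame end minus the reserved tailroom. */
static int64_t model_hard_end(const struct model_xdp *x)
{
	return (int64_t)x->frame_sz - MODEL_TAILROOM;
}

/* Mirrors the order of checks in bpf_xdp_adjust_tail() after this patch. */
static int model_adjust_tail(struct model_xdp *x, int offset)
{
	int64_t new_end = x->data_end + offset;

	if (new_end > model_hard_end(x))
		return -EINVAL; /* would run into the reserved tailroom */
	if (x->frame_sz > MODEL_PAGE_SIZE)
		return -EINVAL; /* the "chicken check" (WARN_ONCE in the kernel) */
	if (new_end < x->data + MODEL_ETH_HLEN)
		return -EINVAL; /* shrunk below an Ethernet header */
	x->data_end = new_end;
	return 0;
}
```

With a 4096-byte frame and 256 bytes of headroom, the packet can grow to at most 4096 - 256 - 320 = 3520 bytes. A driver that leaves frame_sz at zero ends up with a negative hard end, so any grow is rejected.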

V2: Remove one chicken check and use WARN_ONCE for the other.

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
---
 include/uapi/linux/bpf.h |    4 ++--
 net/core/filter.c        |   11 +++++++++--
 2 files changed, 11 insertions(+), 4 deletions(-)

diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 7bbf1b65be10..621a64c3cd75 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -1969,8 +1969,8 @@ union bpf_attr {
  * int bpf_xdp_adjust_tail(struct xdp_buff *xdp_md, int delta)
  * 	Description
  * 		Adjust (move) *xdp_md*\ **->data_end** by *delta* bytes. It is
- * 		only possible to shrink the packet as of this writing,
- * 		therefore *delta* must be a negative integer.
+ * 		possible to both shrink and grow the packet tail.
+ * 		Shrink done via *delta* being a negative integer.
  *
  * 		A call to this helper is susceptible to change the underlying
  * 		packet buffer. Therefore, at load time, all checks on pointers
diff --git a/net/core/filter.c b/net/core/filter.c
index 7d6ceaa54d21..40e749d57cc1 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -3422,12 +3422,19 @@ static const struct bpf_func_proto bpf_xdp_adjust_head_proto = {
 
 BPF_CALL_2(bpf_xdp_adjust_tail, struct xdp_buff *, xdp, int, offset)
 {
+	void *data_hard_end = xdp_data_hard_end(xdp); /* use xdp->frame_sz */
 	void *data_end = xdp->data_end + offset;
 
-	/* only shrinking is allowed for now. */
-	if (unlikely(offset >= 0))
+	/* Notice that xdp_data_hard_end have reserved some tailroom */
+	if (unlikely(data_end > data_hard_end))
 		return -EINVAL;
 
+	/* ALL drivers MUST init xdp->frame_sz, chicken check below */
+	if (unlikely(xdp->frame_sz > PAGE_SIZE)) {
+		WARN_ONCE(1, "Too BIG xdp->frame_sz = %d\n", xdp->frame_sz);
+		return -EINVAL;
+	}
+
 	if (unlikely(data_end < xdp->data + ETH_HLEN))
 		return -EINVAL;
 




* [PATCH net-next v2 30/33] xdp: clear grow memory in bpf_xdp_adjust_tail()
       [not found] <158824557985.2172139.4173570969543904434.stgit@firesoul>
                   ` (28 preceding siblings ...)
  2020-04-30 11:22 ` [PATCH net-next v2 29/33] xdp: allow bpf_xdp_adjust_tail() to grow packet size Jesper Dangaard Brouer
@ 2020-04-30 11:22 ` Jesper Dangaard Brouer
  2020-04-30 11:22 ` [PATCH net-next v2 31/33] bpf: add xdp.frame_sz in bpf_prog_test_run_xdp() Jesper Dangaard Brouer
                   ` (2 subsequent siblings)
  32 siblings, 0 replies; 47+ messages in thread
From: Jesper Dangaard Brouer @ 2020-04-30 11:22 UTC (permalink / raw)
  To: sameehj
  Cc: Toke Høiland-Jørgensen, Jesper Dangaard Brouer, netdev,
	bpf, zorik, akiyano, gtzalik, Toke Høiland-Jørgensen,
	Daniel Borkmann, Alexei Starovoitov, John Fastabend,
	Alexander Duyck, Jeff Kirsher, David Ahern, Willem de Bruijn,
	Ilias Apalodimas, Lorenzo Bianconi, Saeed Mahameed,
	steffen.klassert

Clear the tail memory when a grow happens. It is too easy to write an
XDP_PASS program that extends the tail, which would expose this memory
to users who can run tcpdump.
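The effect of the clearing can be sketched in plain C, assuming the bounds were already validated as bpf_xdp_adjust_tail() does. On a grow, the bytes between the old and the new data_end are zeroed, so stale memory never becomes part of the packet. A shrink only changes the length. The function name is hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical model of the tail update after the bounds checks passed:
 * on grow, zero the bytes between the old and the new end before exposing
 * them; on shrink, only the length changes. */
static void model_adjust_tail_clear(unsigned char *data, size_t *len, int offset)
{
	if (offset > 0)
		memset(data + *len, 0, (size_t)offset); /* old data_end onwards */
	*len = (size_t)((ptrdiff_t)*len + offset);
}
```

Clearing only the newly exposed range keeps the cost proportional to the grow size rather than to the whole frame.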

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
---
 net/core/filter.c |    4 ++++
 1 file changed, 4 insertions(+)

diff --git a/net/core/filter.c b/net/core/filter.c
index 40e749d57cc1..7af583648c8d 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -3438,6 +3438,10 @@ BPF_CALL_2(bpf_xdp_adjust_tail, struct xdp_buff *, xdp, int, offset)
 	if (unlikely(data_end < xdp->data + ETH_HLEN))
 		return -EINVAL;
 
+	/* Clear memory area on grow, can contain uninit kernel memory */
+	if (offset > 0)
+		memset(xdp->data_end, 0, offset);
+
 	xdp->data_end = data_end;
 
 	return 0;




* [PATCH net-next v2 31/33] bpf: add xdp.frame_sz in bpf_prog_test_run_xdp().
       [not found] <158824557985.2172139.4173570969543904434.stgit@firesoul>
                   ` (29 preceding siblings ...)
  2020-04-30 11:22 ` [PATCH net-next v2 30/33] xdp: clear grow memory in bpf_xdp_adjust_tail() Jesper Dangaard Brouer
@ 2020-04-30 11:22 ` Jesper Dangaard Brouer
  2020-04-30 11:23 ` [PATCH net-next v2 32/33] selftests/bpf: adjust BPF selftest for xdp_adjust_tail Jesper Dangaard Brouer
  2020-04-30 11:23 ` [PATCH net-next v2 33/33] selftests/bpf: xdp_adjust_tail add grow tail tests Jesper Dangaard Brouer
  32 siblings, 0 replies; 47+ messages in thread
From: Jesper Dangaard Brouer @ 2020-04-30 11:22 UTC (permalink / raw)
  To: sameehj
  Cc: Jesper Dangaard Brouer, netdev, bpf, zorik, akiyano, gtzalik,
	Toke Høiland-Jørgensen, Daniel Borkmann,
	Alexei Starovoitov, John Fastabend, Alexander Duyck,
	Jeff Kirsher, David Ahern, Willem de Bruijn, Ilias Apalodimas,
	Lorenzo Bianconi, Saeed Mahameed, steffen.klassert

Update the memory requirements when adding xdp.frame_sz to the BPF
test_run function bpf_prog_test_run_xdp(), which is used by e.g. the XDP
selftests.

Specifically, add the expected reserved tailroom, but also allocate a
larger memory area to reflect that XDP frames usually come in this
format. Limit the provided packet data size to 4096 minus (headroom +
tailroom), as this also reflects the common 3520-byte MTU limit with XDP.

Note that bpf_test_init already uses a memory allocation method that
clears memory. Thus, this already guards against leaking uninitialized
kernel memory.
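The sizing arithmetic can be sketched as follows. It assumes XDP_PACKET_HEADROOM is 256 and that the aligned skb_shared_info takes 320 bytes (an assumption; the real value is computed with SKB_DATA_ALIGN() and depends on the kernel configuration). The struct and function names are hypothetical.

```c
#include <errno.h>
#include <stdint.h>

#define MODEL_XDP_PACKET_HEADROOM 256u  /* value of XDP_PACKET_HEADROOM */
#define MODEL_TAILROOM            320u  /* assumed aligned skb_shared_info size */
#define MODEL_TEST_FRAME          4096u /* frame size the test run emulates */

struct model_test_layout {
	uint32_t max_data_sz; /* largest packet the test may pass in */
	uint32_t frame_sz;    /* value stored in xdp.frame_sz */
};

/* Hypothetical model of the sizing logic in bpf_prog_test_run_xdp(). */
static int model_test_run_layout(uint32_t size_in, struct model_test_layout *out)
{
	uint32_t max = MODEL_TEST_FRAME - MODEL_XDP_PACKET_HEADROOM - MODEL_TAILROOM;

	if (size_in > max)
		return -EINVAL; /* input packet does not fit the emulated frame */
	out->max_data_sz = max;
	out->frame_sz = MODEL_XDP_PACKET_HEADROOM + max + MODEL_TAILROOM;
	return 0;
}
```

Under these assumptions, max_data_sz is 3520 and frame_sz adds back up to exactly 4096, the full page most drivers use.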

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
---
 net/bpf/test_run.c |   16 ++++++++++++----
 1 file changed, 12 insertions(+), 4 deletions(-)

diff --git a/net/bpf/test_run.c b/net/bpf/test_run.c
index 29dbdd4c29f6..30ba7d38941d 100644
--- a/net/bpf/test_run.c
+++ b/net/bpf/test_run.c
@@ -470,25 +470,34 @@ int bpf_prog_test_run_skb(struct bpf_prog *prog, const union bpf_attr *kattr,
 int bpf_prog_test_run_xdp(struct bpf_prog *prog, const union bpf_attr *kattr,
 			  union bpf_attr __user *uattr)
 {
+	u32 tailroom = SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
+	u32 headroom = XDP_PACKET_HEADROOM;
 	u32 size = kattr->test.data_size_in;
 	u32 repeat = kattr->test.repeat;
 	struct netdev_rx_queue *rxqueue;
 	struct xdp_buff xdp = {};
 	u32 retval, duration;
+	u32 max_data_sz;
 	void *data;
 	int ret;
 
 	if (kattr->test.ctx_in || kattr->test.ctx_out)
 		return -EINVAL;
 
-	data = bpf_test_init(kattr, size, XDP_PACKET_HEADROOM + NET_IP_ALIGN, 0);
+	/* XDP have extra tailroom as (most) drivers use full page */
+	max_data_sz = 4096 - headroom - tailroom;
+	if (size > max_data_sz)
+		return -EINVAL;
+
+	data = bpf_test_init(kattr, max_data_sz, headroom, tailroom);
 	if (IS_ERR(data))
 		return PTR_ERR(data);
 
 	xdp.data_hard_start = data;
-	xdp.data = data + XDP_PACKET_HEADROOM + NET_IP_ALIGN;
+	xdp.data = data + headroom;
 	xdp.data_meta = xdp.data;
 	xdp.data_end = xdp.data + size;
+	xdp.frame_sz = headroom + max_data_sz + tailroom;
 
 	rxqueue = __netif_get_rx_queue(current->nsproxy->net_ns->loopback_dev, 0);
 	xdp.rxq = &rxqueue->xdp_rxq;
@@ -496,8 +505,7 @@ int bpf_prog_test_run_xdp(struct bpf_prog *prog, const union bpf_attr *kattr,
 	ret = bpf_test_run(prog, &xdp, repeat, &retval, &duration, true);
 	if (ret)
 		goto out;
-	if (xdp.data != data + XDP_PACKET_HEADROOM + NET_IP_ALIGN ||
-	    xdp.data_end != xdp.data + size)
+	if (xdp.data != data + headroom || xdp.data_end != xdp.data + size)
 		size = xdp.data_end - xdp.data;
 	ret = bpf_test_finish(kattr, uattr, xdp.data, size, retval, duration);
 out:




* [PATCH net-next v2 32/33] selftests/bpf: adjust BPF selftest for xdp_adjust_tail
       [not found] <158824557985.2172139.4173570969543904434.stgit@firesoul>
                   ` (30 preceding siblings ...)
  2020-04-30 11:22 ` [PATCH net-next v2 31/33] bpf: add xdp.frame_sz in bpf_prog_test_run_xdp() Jesper Dangaard Brouer
@ 2020-04-30 11:23 ` Jesper Dangaard Brouer
  2020-04-30 11:23 ` [PATCH net-next v2 33/33] selftests/bpf: xdp_adjust_tail add grow tail tests Jesper Dangaard Brouer
  32 siblings, 0 replies; 47+ messages in thread
From: Jesper Dangaard Brouer @ 2020-04-30 11:23 UTC (permalink / raw)
  To: sameehj
  Cc: Jesper Dangaard Brouer, netdev, bpf, zorik, akiyano, gtzalik,
	Toke Høiland-Jørgensen, Daniel Borkmann,
	Alexei Starovoitov, John Fastabend, Alexander Duyck,
	Jeff Kirsher, David Ahern, Willem de Bruijn, Ilias Apalodimas,
	Lorenzo Bianconi, Saeed Mahameed, steffen.klassert

The current selftest for the BPF helper xdp_adjust_tail only shrinks the
tail. Make it clearer that this is a shrink test case.

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
---
 .../selftests/bpf/prog_tests/xdp_adjust_tail.c     |    9 +++++-
 .../testing/selftests/bpf/progs/test_adjust_tail.c |   30 --------------------
 .../bpf/progs/test_xdp_adjust_tail_shrink.c        |   30 ++++++++++++++++++++
 3 files changed, 37 insertions(+), 32 deletions(-)
 delete mode 100644 tools/testing/selftests/bpf/progs/test_adjust_tail.c
 create mode 100644 tools/testing/selftests/bpf/progs/test_xdp_adjust_tail_shrink.c

diff --git a/tools/testing/selftests/bpf/prog_tests/xdp_adjust_tail.c b/tools/testing/selftests/bpf/prog_tests/xdp_adjust_tail.c
index 3744196d7cba..d258f979d5ef 100644
--- a/tools/testing/selftests/bpf/prog_tests/xdp_adjust_tail.c
+++ b/tools/testing/selftests/bpf/prog_tests/xdp_adjust_tail.c
@@ -1,9 +1,9 @@
 // SPDX-License-Identifier: GPL-2.0
 #include <test_progs.h>
 
-void test_xdp_adjust_tail(void)
+void test_xdp_adjust_tail_shrink(void)
 {
-	const char *file = "./test_adjust_tail.o";
+	const char *file = "./test_xdp_adjust_tail_shrink.o";
 	struct bpf_object *obj;
 	char buf[128];
 	__u32 duration, retval, size;
@@ -27,3 +27,8 @@ void test_xdp_adjust_tail(void)
 	      err, errno, retval, size);
 	bpf_object__close(obj);
 }
+
+void test_xdp_adjust_tail(void)
+{
+	test_xdp_adjust_tail_shrink();
+}
diff --git a/tools/testing/selftests/bpf/progs/test_adjust_tail.c b/tools/testing/selftests/bpf/progs/test_adjust_tail.c
deleted file mode 100644
index b7fc85769bdc..000000000000
--- a/tools/testing/selftests/bpf/progs/test_adjust_tail.c
+++ /dev/null
@@ -1,30 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0
- * Copyright (c) 2018 Facebook
- *
- * This program is free software; you can redistribute it and/or
- * modify it under the terms of version 2 of the GNU General Public
- * License as published by the Free Software Foundation.
- */
-#include <linux/bpf.h>
-#include <linux/if_ether.h>
-#include <bpf/bpf_helpers.h>
-
-int _version SEC("version") = 1;
-
-SEC("xdp_adjust_tail")
-int _xdp_adjust_tail(struct xdp_md *xdp)
-{
-	void *data_end = (void *)(long)xdp->data_end;
-	void *data = (void *)(long)xdp->data;
-	int offset = 0;
-
-	if (data_end - data == 54)
-		offset = 256;
-	else
-		offset = 20;
-	if (bpf_xdp_adjust_tail(xdp, 0 - offset))
-		return XDP_DROP;
-	return XDP_TX;
-}
-
-char _license[] SEC("license") = "GPL";
diff --git a/tools/testing/selftests/bpf/progs/test_xdp_adjust_tail_shrink.c b/tools/testing/selftests/bpf/progs/test_xdp_adjust_tail_shrink.c
new file mode 100644
index 000000000000..c8a7c17b54f4
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/test_xdp_adjust_tail_shrink.c
@@ -0,0 +1,30 @@
+/* SPDX-License-Identifier: GPL-2.0
+ * Copyright (c) 2018 Facebook
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of version 2 of the GNU General Public
+ * License as published by the Free Software Foundation.
+ */
+#include <linux/bpf.h>
+#include <linux/if_ether.h>
+#include <bpf/bpf_helpers.h>
+
+int _version SEC("version") = 1;
+
+SEC("xdp_adjust_tail_shrink")
+int _xdp_adjust_tail_shrink(struct xdp_md *xdp)
+{
+	void *data_end = (void *)(long)xdp->data_end;
+	void *data = (void *)(long)xdp->data;
+	int offset = 0;
+
+	if (data_end - data == 54) /* sizeof(pkt_v4) */
+		offset = 256; /* shrink too much */
+	else
+		offset = 20;
+	if (bpf_xdp_adjust_tail(xdp, 0 - offset))
+		return XDP_DROP;
+	return XDP_TX;
+}
+
+char _license[] SEC("license") = "GPL";




* [PATCH net-next v2 33/33] selftests/bpf: xdp_adjust_tail add grow tail tests
       [not found] <158824557985.2172139.4173570969543904434.stgit@firesoul>
                   ` (31 preceding siblings ...)
  2020-04-30 11:23 ` [PATCH net-next v2 32/33] selftests/bpf: adjust BPF selftest for xdp_adjust_tail Jesper Dangaard Brouer
@ 2020-04-30 11:23 ` Jesper Dangaard Brouer
  32 siblings, 0 replies; 47+ messages in thread
From: Jesper Dangaard Brouer @ 2020-04-30 11:23 UTC (permalink / raw)
  To: sameehj
  Cc: Jesper Dangaard Brouer, netdev, bpf, zorik, akiyano, gtzalik,
	Toke Høiland-Jørgensen, Daniel Borkmann,
	Alexei Starovoitov, John Fastabend, Alexander Duyck,
	Jeff Kirsher, David Ahern, Willem de Bruijn, Ilias Apalodimas,
	Lorenzo Bianconi, Saeed Mahameed, steffen.klassert

Extend the BPF selftest xdp_adjust_tail with grow-tail tests, which are
added as subtests. The first grow test keeps the same form as the
original shrink test. The second grow test uses the newer
bpf_prog_test_run_xattr() calls and does extra checking of the data
contents.
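The expected results of the grow tests can be predicted with a small model. It combines the offset selection in test_xdp_adjust_tail_grow.c with the helper's bounds check. It assumes the test_run layout from the previous patch: 256 bytes of headroom, 320 bytes of tailroom and a 4096-byte frame. The enum and function names are hypothetical.

```c
#include <stdint.h>

enum model_verdict { MODEL_XDP_ABORTED, MODEL_XDP_DROP, MODEL_XDP_TX };

/* Offset selection, following test_xdp_adjust_tail_grow.c. */
static int model_grow_offset(unsigned int data_len, int *offset)
{
	switch (data_len) {
	case 54:  *offset = 4096; return 0; /* pkt_v4: too large offset */
	case 74:  *offset = 40;   return 0; /* pkt_v6 */
	case 64:  *offset = 128;  return 0;
	case 128: *offset = 4096 - 256 - 320 - (int)data_len; return 0;
	default:  return -1;                /* no matching test */
	}
}

/* Predict the verdict and resulting length (assumed 4096/256/320 layout). */
static enum model_verdict model_grow_verdict(unsigned int len, unsigned int *out_len)
{
	const unsigned int max_len = 4096 - 256 - 320; /* 3520 */
	int offset;

	if (model_grow_offset(len, &offset))
		return MODEL_XDP_ABORTED;
	if ((long)len + offset > (long)max_len)
		return MODEL_XDP_DROP; /* helper returned -EINVAL */
	*out_len = len + (unsigned int)offset;
	return MODEL_XDP_TX;
}
```

The predicted lengths match the sizes the subtests check. They are 114 for pkt_v6 (74 + 40), 192 for case-64 and 3520 for case-128, while pkt_v4 is dropped.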

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
---
 .../selftests/bpf/prog_tests/xdp_adjust_tail.c     |  116 +++++++++++++++++++-
 .../bpf/progs/test_xdp_adjust_tail_grow.c          |   33 ++++++
 2 files changed, 144 insertions(+), 5 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/progs/test_xdp_adjust_tail_grow.c

diff --git a/tools/testing/selftests/bpf/prog_tests/xdp_adjust_tail.c b/tools/testing/selftests/bpf/prog_tests/xdp_adjust_tail.c
index d258f979d5ef..1498627af6e8 100644
--- a/tools/testing/selftests/bpf/prog_tests/xdp_adjust_tail.c
+++ b/tools/testing/selftests/bpf/prog_tests/xdp_adjust_tail.c
@@ -4,10 +4,10 @@
 void test_xdp_adjust_tail_shrink(void)
 {
 	const char *file = "./test_xdp_adjust_tail_shrink.o";
+	__u32 duration, retval, size, expect_sz;
 	struct bpf_object *obj;
-	char buf[128];
-	__u32 duration, retval, size;
 	int err, prog_fd;
+	char buf[128];
 
 	err = bpf_prog_load(file, BPF_PROG_TYPE_XDP, &obj, &prog_fd);
 	if (CHECK_FAIL(err))
@@ -20,15 +20,121 @@ void test_xdp_adjust_tail_shrink(void)
 	      "ipv4", "err %d errno %d retval %d size %d\n",
 	      err, errno, retval, size);
 
+	expect_sz = sizeof(pkt_v6) - 20;  /* Test shrink with 20 bytes */
 	err = bpf_prog_test_run(prog_fd, 1, &pkt_v6, sizeof(pkt_v6),
 				buf, &size, &retval, &duration);
-	CHECK(err || retval != XDP_TX || size != 54,
-	      "ipv6", "err %d errno %d retval %d size %d\n",
+	CHECK(err || retval != XDP_TX || size != expect_sz,
+	      "ipv6", "err %d errno %d retval %d size %d expect-size %d\n",
+	      err, errno, retval, size, expect_sz);
+	bpf_object__close(obj);
+}
+
+void test_xdp_adjust_tail_grow(void)
+{
+	const char *file = "./test_xdp_adjust_tail_grow.o";
+	struct bpf_object *obj;
+	char buf[4096]; /* avoid segfault: large buf to hold grow results */
+	__u32 duration, retval, size, expect_sz;
+	int err, prog_fd;
+
+	err = bpf_prog_load(file, BPF_PROG_TYPE_XDP, &obj, &prog_fd);
+	if (CHECK_FAIL(err))
+		return;
+
+	err = bpf_prog_test_run(prog_fd, 1, &pkt_v4, sizeof(pkt_v4),
+				buf, &size, &retval, &duration);
+	CHECK(err || retval != XDP_DROP,
+	      "ipv4", "err %d errno %d retval %d size %d\n",
 	      err, errno, retval, size);
+
+	expect_sz = sizeof(pkt_v6) + 40; /* Test grow with 40 bytes */
+	err = bpf_prog_test_run(prog_fd, 1, &pkt_v6, sizeof(pkt_v6) /* 74 */,
+				buf, &size, &retval, &duration);
+	CHECK(err || retval != XDP_TX || size != expect_sz,
+	      "ipv6", "err %d errno %d retval %d size %d expect-size %d\n",
+	      err, errno, retval, size, expect_sz);
+
+	bpf_object__close(obj);
+}
+
+void test_xdp_adjust_tail_grow2(void)
+{
+	const char *file = "./test_xdp_adjust_tail_grow.o";
+	char buf[4096]; /* avoid segfault: large buf to hold grow results */
+	int tailroom = 320; /* SKB_DATA_ALIGN(sizeof(struct skb_shared_info)) */
+	struct bpf_object *obj;
+	int err, cnt, i;
+	int max_grow;
+
+	struct bpf_prog_test_run_attr tattr = {
+		.repeat 	= 1,
+		.data_in	= &buf,
+		.data_out	= &buf,
+		.data_size_in	= 0, /* Per test */
+		.data_size_out	= 0, /* Per test */
+	};
+
+	err = bpf_prog_load(file, BPF_PROG_TYPE_XDP, &obj, &tattr.prog_fd);
+	if (CHECK_ATTR(err, "load", "err %d errno %d\n", err, errno))
+		return;
+
+	/* Test case-64 */
+	memset(buf, 1, sizeof(buf));
+	tattr.data_size_in  =  64; /* Determine test case via pkt size */
+	tattr.data_size_out = 128; /* Limit copy_size */
+	/* Kernel side alloc packet memory area that is zero init */
+	err = bpf_prog_test_run_xattr(&tattr);
+
+	CHECK_ATTR(errno != ENOSPC /* Due limit copy_size in bpf_test_finish */
+		   || tattr.retval != XDP_TX
+		   || tattr.data_size_out != 192, /* Expected grow size */
+		   "case-64",
+		   "err %d errno %d retval %d size %d\n",
+		   err, errno, tattr.retval, tattr.data_size_out);
+
+	/* Extra checks for data contents */
+	CHECK_ATTR(tattr.data_size_out != 192
+		   || buf[0]   != 1 ||  buf[63]  != 1  /*  0-63  memset to 1 */
+		   || buf[64]  != 0 ||  buf[127] != 0  /* 64-127 memset to 0 */
+		   || buf[128] != 1 ||  buf[191] != 1, /*128-191 memset to 1 */
+		   "case-64-data",
+		   "err %d errno %d retval %d size %d\n",
+		   err, errno, tattr.retval, tattr.data_size_out);
+
+	/* Test case-128 */
+	memset(buf, 2, sizeof(buf));
+	tattr.data_size_in  = 128; /* Determine test case via pkt size */
+	tattr.data_size_out = sizeof(buf);   /* Copy everything */
+	err = bpf_prog_test_run_xattr(&tattr);
+
+	max_grow = 4096 - XDP_PACKET_HEADROOM -	tailroom; /* 3520 */
+	CHECK_ATTR(err
+		   || tattr.retval != XDP_TX
+		   || tattr.data_size_out != max_grow, /* Expect max grow size */
+		   "case-128",
+		   "err %d errno %d retval %d size %d expect-size %d\n",
+		   err, errno, tattr.retval, tattr.data_size_out, max_grow);
+
+	/* Extra checks for data contents: Count grow size, will contain zeros */
+	for (i = 0, cnt = 0; i < sizeof(buf); i++) {
+		if (buf[i] == 0)
+			cnt++;
+	}
+	CHECK_ATTR((cnt != (max_grow - tattr.data_size_in)) /* Grow increase */
+		   || tattr.data_size_out != max_grow, /* Total grow size */
+		   "case-128-data",
+		   "err %d errno %d retval %d size %d grow-size %d\n",
+		   err, errno, tattr.retval, tattr.data_size_out, cnt);
+
 	bpf_object__close(obj);
 }
 
 void test_xdp_adjust_tail(void)
 {
-	test_xdp_adjust_tail_shrink();
+	if (test__start_subtest("xdp_adjust_tail_shrink"))
+		test_xdp_adjust_tail_shrink();
+	if (test__start_subtest("xdp_adjust_tail_grow"))
+		test_xdp_adjust_tail_grow();
+	if (test__start_subtest("xdp_adjust_tail_grow2"))
+		test_xdp_adjust_tail_grow2();
 }
diff --git a/tools/testing/selftests/bpf/progs/test_xdp_adjust_tail_grow.c b/tools/testing/selftests/bpf/progs/test_xdp_adjust_tail_grow.c
new file mode 100644
index 000000000000..3d66599eee2e
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/test_xdp_adjust_tail_grow.c
@@ -0,0 +1,33 @@
+// SPDX-License-Identifier: GPL-2.0
+#include <linux/bpf.h>
+#include <bpf/bpf_helpers.h>
+
+SEC("xdp_adjust_tail_grow")
+int _xdp_adjust_tail_grow(struct xdp_md *xdp)
+{
+	void *data_end = (void *)(long)xdp->data_end;
+	void *data = (void *)(long)xdp->data;
+	unsigned int data_len;
+	int offset = 0;
+
+	/* Data length determine test case */
+	data_len = data_end - data;
+
+	if (data_len == 54) { /* sizeof(pkt_v4) */
+		offset = 4096; /* test too large offset */
+	} else if (data_len == 74) { /* sizeof(pkt_v6) */
+		offset = 40;
+	} else if (data_len == 64) {
+		offset = 128;
+	} else if (data_len == 128) {
+		offset = 4096 - 256 - 320 - data_len; /* Max tail grow 3520 */
+	} else {
+		return XDP_ABORTED; /* No matching test */
+	}
+
+	if (bpf_xdp_adjust_tail(xdp, offset))
+		return XDP_DROP;
+	return XDP_TX;
+}
+
+char _license[] SEC("license") = "GPL";




* RE: [PATCH net-next v2 12/33] hv_netvsc: add XDP frame size to driver
  2020-04-30 11:21 ` [PATCH net-next v2 12/33] hv_netvsc: add XDP frame size to driver Jesper Dangaard Brouer
@ 2020-04-30 14:20   ` Haiyang Zhang
  2020-05-01 14:47     ` Jesper Dangaard Brouer
  0 siblings, 1 reply; 47+ messages in thread
From: Haiyang Zhang @ 2020-04-30 14:20 UTC (permalink / raw)
  To: Jesper Dangaard Brouer, sameehj
  Cc: Wei Liu, KY Srinivasan, Stephen Hemminger, netdev, bpf, zorik,
	akiyano, gtzalik, Toke Høiland-Jørgensen,
	Daniel Borkmann, Alexei Starovoitov, John Fastabend,
	Alexander Duyck, Jeff Kirsher, David Ahern, Willem de Bruijn,
	Ilias Apalodimas, Lorenzo Bianconi, Saeed Mahameed,
	steffen.klassert



> -----Original Message-----
> From: Jesper Dangaard Brouer <brouer@redhat.com>
> Sent: Thursday, April 30, 2020 7:21 AM
> To: sameehj@amazon.com
> Cc: Wei Liu <wei.liu@kernel.org>; KY Srinivasan <kys@microsoft.com>;
> Haiyang Zhang <haiyangz@microsoft.com>; Stephen Hemminger
> <sthemmin@microsoft.com>; Jesper Dangaard Brouer
> <brouer@redhat.com>; netdev@vger.kernel.org; bpf@vger.kernel.org;
> zorik@amazon.com; akiyano@amazon.com; gtzalik@amazon.com; Toke
> Høiland-Jørgensen <toke@redhat.com>; Daniel Borkmann
> <borkmann@iogearbox.net>; Alexei Starovoitov
> <alexei.starovoitov@gmail.com>; John Fastabend
> <john.fastabend@gmail.com>; Alexander Duyck
> <alexander.duyck@gmail.com>; Jeff Kirsher <jeffrey.t.kirsher@intel.com>;
> David Ahern <dsahern@gmail.com>; Willem de Bruijn
> <willemdebruijn.kernel@gmail.com>; Ilias Apalodimas
> <ilias.apalodimas@linaro.org>; Lorenzo Bianconi <lorenzo@kernel.org>;
> Saeed Mahameed <saeedm@mellanox.com>;
> steffen.klassert@secunet.com
> Subject: [PATCH net-next v2 12/33] hv_netvsc: add XDP frame size to driver
> 
> The hyperv NIC drivers XDP implementation is rather disappointing as it will
> be a slowdown to enable XDP on this driver, given it will allocate a new page
> for each packet and copy over the payload, before invoking the XDP BPF-
> prog.
This needs correction. As I said previously, this statement is not accurate:
the data path of the netvsc driver does memory allocation and copying even
without XDP, so it's not "a slowdown to enable XDP".

Thanks,
- Haiyang


* Re: [PATCH net-next v2 28/33] mlx5: rx queue setup time determine frame_sz for XDP
  2020-04-30 11:22 ` [PATCH net-next v2 28/33] mlx5: rx queue setup time determine frame_sz for XDP Jesper Dangaard Brouer
@ 2020-04-30 17:07   ` Tariq Toukan
  2020-04-30 17:12     ` Tariq Toukan
  2020-05-01 13:01     ` Jesper Dangaard Brouer
  0 siblings, 2 replies; 47+ messages in thread
From: Tariq Toukan @ 2020-04-30 17:07 UTC (permalink / raw)
  To: Jesper Dangaard Brouer, sameehj
  Cc: Tariq Toukan, Saeed Mahameed, netdev, bpf, zorik, akiyano,
	gtzalik, Toke Høiland-Jørgensen, Daniel Borkmann,
	Alexei Starovoitov, John Fastabend, Alexander Duyck,
	Jeff Kirsher, David Ahern, Willem de Bruijn, Ilias Apalodimas,
	Lorenzo Bianconi, steffen.klassert



On 4/30/2020 2:22 PM, Jesper Dangaard Brouer wrote:
> The mlx5 driver have multiple memory models, which are also changed
> according to whether a XDP bpf_prog is attached.
> 
> The 'rx_striding_rq' setting is adjusted via ethtool priv-flags e.g.:
>   # ethtool --set-priv-flags mlx5p2 rx_striding_rq off
> 
> On the general case with 4K page_size and regular MTU packet, then
> the frame_sz is 2048 and 4096 when XDP is enabled, in both modes.
> 
> The info on the given frame size is stored differently depending on the
> RQ-mode and encoded in a union in struct mlx5e_rq union wqe/mpwqe.
> In rx striding mode rq->mpwqe.log_stride_sz is either 11 or 12, which
> corresponds to 2048 or 4096 (MLX5_WQ_TYPE_LINKED_LIST_STRIDING_RQ).
> In non-striding mode (MLX5_WQ_TYPE_CYCLIC) the frag_stride is stored
> in rq->wqe.info.arr[0].frag_stride, for the first fragment, which is
> what the XDP case cares about.
> 
> To reduce effect on fast-path, this patch determine the frame_sz at
> setup time, to avoid determining the memory model runtime. Variable
> is named first_frame_sz to make it clear that this is only the frame
> size of the first fragment.
> 
> This mlx5 driver does a DMA-sync on XDP_TX action, but grow is safe
> as it have done a DMA-map on the entire PAGE_SIZE. The driver also
> already does a XDP length check against sq->hw_mtu on the possible
> XDP xmit paths mlx5e_xmit_xdp_frame() + mlx5e_xmit_xdp_frame_mpwqe().
> 
> V2: Fix that frag_size need to be recalc before creating SKB.
> 
> Cc: Tariq Toukan <tariqt@mellanox.com>
> Cc: Saeed Mahameed <saeedm@mellanox.com>
> Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
> ---
>   drivers/net/ethernet/mellanox/mlx5/core/en.h      |    1 +
>   drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c  |    1 +
>   drivers/net/ethernet/mellanox/mlx5/core/en_main.c |    6 ++++++
>   drivers/net/ethernet/mellanox/mlx5/core/en_rx.c   |    2 ++
>   4 files changed, 10 insertions(+)
> 
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h b/drivers/net/ethernet/mellanox/mlx5/core/en.h
> index 23701c0e36ec..ba6a0ee297c6 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/en.h
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h
> @@ -652,6 +652,7 @@ struct mlx5e_rq {
>   	struct {
>   		u16            umem_headroom;
>   		u16            headroom;
> +		u32            first_frame_sz;
>   		u8             map_dir;   /* dma map direction */
>   	} buff;
>   
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c b/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c
> index f049e0ac308a..b63abaf51253 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c
> @@ -137,6 +137,7 @@ bool mlx5e_xdp_handle(struct mlx5e_rq *rq, struct mlx5e_dma_info *di,
>   	if (xsk)
>   		xdp.handle = di->xsk.handle;
>   	xdp.rxq = &rq->xdp_rxq;
> +	xdp.frame_sz = rq->buff.first_frame_sz;
>   
>   	act = bpf_prog_run_xdp(prog, &xdp);
>   	if (xsk) {
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
> index 47396f1b02f4..1d04ed3feead 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
> @@ -462,6 +462,8 @@ static int mlx5e_alloc_rq(struct mlx5e_channel *c,
>   		rq->mpwqe.num_strides =
>   			BIT(mlx5e_mpwqe_get_log_num_strides(mdev, params, xsk));
>   
> +		rq->buff.first_frame_sz = (1 << rq->mpwqe.log_stride_sz);
> +
>   		err = mlx5e_create_rq_umr_mkey(mdev, rq);
>   		if (err)
>   			goto err_rq_wq_destroy;
> @@ -485,6 +487,8 @@ static int mlx5e_alloc_rq(struct mlx5e_channel *c,
>   			num_xsk_frames = wq_sz << rq->wqe.info.log_num_frags;
>   
>   		rq->wqe.info = rqp->frags_info;
> +		rq->buff.first_frame_sz = rq->wqe.info.arr[0].frag_stride;
> +
>   		rq->wqe.frags =
>   			kvzalloc_node(array_size(sizeof(*rq->wqe.frags),
>   					(wq_sz << rq->wqe.info.log_num_frags)),
> @@ -522,6 +526,8 @@ static int mlx5e_alloc_rq(struct mlx5e_channel *c,
>   	}
>   
>   	if (xsk) {
> +		rq->buff.first_frame_sz = xsk_umem_xdp_frame_sz(umem);
> +
>   		err = mlx5e_xsk_resize_reuseq(umem, num_xsk_frames);
>   		if (unlikely(err)) {
>   			mlx5_core_err(mdev, "Unable to allocate the Reuse Ring for %u frames\n",
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
> index e2beb89c1832..04671ed977a5 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
> @@ -1084,6 +1084,7 @@ mlx5e_skb_from_cqe_linear(struct mlx5e_rq *rq, struct mlx5_cqe64 *cqe,
>   	if (consumed)
>   		return NULL; /* page/packet was consumed by XDP */
>   
> +	frag_size = MLX5_SKB_FRAG_SZ(rx_headroom + cqe_bcnt);

This is a re-calculation of frag_size, using the exact same expression used
earlier in this function, but with a newer value of rx_headroom.
This wasn't part of the previous patchset. I understand the need.

However, this code repetition looks weird and non-optimal to me. I think 
we can come up with something better.

>   	skb = mlx5e_build_linear_skb(rq, va, frag_size, rx_headroom, cqe_bcnt);
>   	if (unlikely(!skb))
>   		return NULL;
> @@ -1385,6 +1386,7 @@ mlx5e_skb_from_cqe_mpwrq_linear(struct mlx5e_rq *rq, struct mlx5e_mpw_info *wi,
>   		return NULL; /* page/packet was consumed by XDP */
>   	}
>   
> +	frag_size = MLX5_SKB_FRAG_SZ(rx_headroom + cqe_bcnt32);

Same here.

>   	skb = mlx5e_build_linear_skb(rq, va, frag_size, rx_headroom, cqe_bcnt32);
>   	if (unlikely(!skb))
>   		return NULL;
> 
> 

My suggestion is:
Pass &frag_size to mlx5e_xdp_handle(), and update it within it, just 
next to the update of rx_headroom.
All the needed information is there: the new rx_headroom, and cqe_bcnt.
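A minimal standalone sketch of this suggestion follows. It assumes MLX5_SKB_FRAG_SZ() aligns the linear length and adds an aligned skb_shared_info, and that the XDP program may change both the headroom and the packet length. The 64-byte cache line, the 320-byte skb_shared_info, and the helper name xdp_handle_sketch() are hypothetical stand-ins, not the driver's real values or API.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for kernel values; real sizes depend on config. */
#define SMP_CACHE_BYTES 64u
#define SHINFO_SIZE     320u  /* assumed sizeof(struct skb_shared_info) */
#define SKB_DATA_ALIGN(x) (((x) + (SMP_CACHE_BYTES - 1)) & ~(SMP_CACHE_BYTES - 1))
/* Assumed shape of MLX5_SKB_FRAG_SZ(): aligned data plus aligned shinfo. */
#define FRAG_SZ(len) (SKB_DATA_ALIGN(len) + SKB_DATA_ALIGN(SHINFO_SIZE))

/* Hypothetical XDP hook: the program may move the head forward
 * (headroom_delta) and grow or shrink the tail (tail_delta). frag_size is
 * refreshed right next to rx_headroom, so the caller never sees a stale
 * value when it later builds the SKB. */
static bool xdp_handle_sketch(uint16_t *rx_headroom, uint32_t *len,
			      uint32_t *frag_size, int headroom_delta,
			      int tail_delta)
{
	*rx_headroom += headroom_delta;               /* head moved forward */
	*len = *len - headroom_delta + tail_delta;    /* data_end may move too */
	*frag_size = FRAG_SZ(*rx_headroom + *len);
	return false; /* not consumed: caller goes on to build the SKB */
}
```

For example, starting from headroom 256 and length 1000 (frag_size 1600 under these assumptions), a program that advances the head by 14 bytes and appends 100 bytes leaves headroom 270, length 1086 and frag_size 1728, so the value computed before running XDP would be too small.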

Thanks,
Tariq


* Re: [PATCH net-next v2 28/33] mlx5: rx queue setup time determine frame_sz for XDP
  2020-04-30 17:07   ` Tariq Toukan
@ 2020-04-30 17:12     ` Tariq Toukan
  2020-05-01 12:32       ` Jesper Dangaard Brouer
  2020-05-01 13:01     ` Jesper Dangaard Brouer
  1 sibling, 1 reply; 47+ messages in thread
From: Tariq Toukan @ 2020-04-30 17:12 UTC (permalink / raw)
  To: Jesper Dangaard Brouer, sameehj
  Cc: Saeed Mahameed, netdev, bpf, zorik, akiyano, gtzalik,
	Toke Høiland-Jørgensen, Daniel Borkmann,
	Alexei Starovoitov, John Fastabend, Alexander Duyck,
	Jeff Kirsher, David Ahern, Willem de Bruijn, Ilias Apalodimas,
	Lorenzo Bianconi, steffen.klassert



On 4/30/2020 8:07 PM, Tariq Toukan wrote:
> 
> 
> On 4/30/2020 2:22 PM, Jesper Dangaard Brouer wrote:
>> The mlx5 driver have multiple memory models, which are also changed
>> according to whether a XDP bpf_prog is attached.
>>
>> The 'rx_striding_rq' setting is adjusted via ethtool priv-flags e.g.:
>>   # ethtool --set-priv-flags mlx5p2 rx_striding_rq off
>>
>> On the general case with 4K page_size and regular MTU packet, then
>> the frame_sz is 2048 and 4096 when XDP is enabled, in both modes.
>>
>> The info on the given frame size is stored differently depending on the
>> RQ-mode and encoded in a union in struct mlx5e_rq union wqe/mpwqe.
>> In rx striding mode rq->mpwqe.log_stride_sz is either 11 or 12, which
>> corresponds to 2048 or 4096 (MLX5_WQ_TYPE_LINKED_LIST_STRIDING_RQ).
>> In non-striding mode (MLX5_WQ_TYPE_CYCLIC) the frag_stride is stored
>> in rq->wqe.info.arr[0].frag_stride, for the first fragment, which is
>> what the XDP case cares about.
>>
>> To reduce effect on fast-path, this patch determine the frame_sz at
>> setup time, to avoid determining the memory model runtime. Variable
>> is named first_frame_sz to make it clear that this is only the frame
>> size of the first fragment.
>>
>> This mlx5 driver does a DMA-sync on XDP_TX action, but grow is safe
>> as it have done a DMA-map on the entire PAGE_SIZE. The driver also
>> already does a XDP length check against sq->hw_mtu on the possible
>> XDP xmit paths mlx5e_xmit_xdp_frame() + mlx5e_xmit_xdp_frame_mpwqe().
>>
>> V2: Fix that frag_size need to be recalc before creating SKB.
>>
>> Cc: Tariq Toukan <tariqt@mellanox.com>
>> Cc: Saeed Mahameed <saeedm@mellanox.com>
>> Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
>> ---
>>   drivers/net/ethernet/mellanox/mlx5/core/en.h      |    1 +
>>   drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c  |    1 +
>>   drivers/net/ethernet/mellanox/mlx5/core/en_main.c |    6 ++++++
>>   drivers/net/ethernet/mellanox/mlx5/core/en_rx.c   |    2 ++
>>   4 files changed, 10 insertions(+)
>>
>> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h 
>> b/drivers/net/ethernet/mellanox/mlx5/core/en.h
>> index 23701c0e36ec..ba6a0ee297c6 100644
>> --- a/drivers/net/ethernet/mellanox/mlx5/core/en.h
>> +++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h
>> @@ -652,6 +652,7 @@ struct mlx5e_rq {
>>       struct {
>>           u16            umem_headroom;
>>           u16            headroom;
>> +        u32            first_frame_sz;

I also think that a better name would be: frame0_sz, or frag0_sz.

Thanks.


* Re: [PATCH net-next v2 28/33] mlx5: rx queue setup time determine frame_sz for XDP
  2020-04-30 17:12     ` Tariq Toukan
@ 2020-05-01 12:32       ` Jesper Dangaard Brouer
  2020-05-08 10:49         ` Jesper Dangaard Brouer
  0 siblings, 1 reply; 47+ messages in thread
From: Jesper Dangaard Brouer @ 2020-05-01 12:32 UTC (permalink / raw)
  To: Tariq Toukan
  Cc: sameehj, Saeed Mahameed, netdev, bpf, zorik, akiyano, gtzalik,
	Toke Høiland-Jørgensen, Daniel Borkmann,
	Alexei Starovoitov, John Fastabend, Alexander Duyck,
	Jeff Kirsher, David Ahern, Willem de Bruijn, Ilias Apalodimas,
	Lorenzo Bianconi, steffen.klassert, brouer

On Thu, 30 Apr 2020 20:12:11 +0300
Tariq Toukan <tariqt@mellanox.com> wrote:

> >> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h 
> >> b/drivers/net/ethernet/mellanox/mlx5/core/en.h
> >> index 23701c0e36ec..ba6a0ee297c6 100644
> >> --- a/drivers/net/ethernet/mellanox/mlx5/core/en.h
> >> +++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h
> >> @@ -652,6 +652,7 @@ struct mlx5e_rq {
> >>       struct {
> >>           u16            umem_headroom;
> >>           u16            headroom;
> >> +        u32            first_frame_sz;  
> 
> I also think that a better name would be: frame0_sz, or frag0_sz.

You do realize that the name "first_frame_sz" was your suggestion last
time... Now you give me two options; can you please select one of them
so I can update the patch for V3 accordingly?

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer



* Re: [PATCH net-next v2 28/33] mlx5: rx queue setup time determine frame_sz for XDP
  2020-04-30 17:07   ` Tariq Toukan
  2020-04-30 17:12     ` Tariq Toukan
@ 2020-05-01 13:01     ` Jesper Dangaard Brouer
  1 sibling, 0 replies; 47+ messages in thread
From: Jesper Dangaard Brouer @ 2020-05-01 13:01 UTC (permalink / raw)
  To: Tariq Toukan
  Cc: sameehj, Saeed Mahameed, netdev, bpf, zorik, akiyano, gtzalik,
	Toke Høiland-Jørgensen, Daniel Borkmann,
	Alexei Starovoitov, John Fastabend, Alexander Duyck,
	Jeff Kirsher, David Ahern, Willem de Bruijn, Ilias Apalodimas,
	Lorenzo Bianconi, steffen.klassert, brouer

On Thu, 30 Apr 2020 20:07:43 +0300
Tariq Toukan <tariqt@mellanox.com> wrote:

> On 4/30/2020 2:22 PM, Jesper Dangaard Brouer wrote:
> > The mlx5 driver have multiple memory models, which are also changed
> > according to whether a XDP bpf_prog is attached.
> > 
> > The 'rx_striding_rq' setting is adjusted via ethtool priv-flags e.g.:
> >   # ethtool --set-priv-flags mlx5p2 rx_striding_rq off
> > 
> > On the general case with 4K page_size and regular MTU packet, then
> > the frame_sz is 2048 and 4096 when XDP is enabled, in both modes.
> > 
> > The info on the given frame size is stored differently depending on the
> > RQ-mode and encoded in a union in struct mlx5e_rq union wqe/mpwqe.
> > In rx striding mode rq->mpwqe.log_stride_sz is either 11 or 12, which
> > corresponds to 2048 or 4096 (MLX5_WQ_TYPE_LINKED_LIST_STRIDING_RQ).
> > In non-striding mode (MLX5_WQ_TYPE_CYCLIC) the frag_stride is stored
> > in rq->wqe.info.arr[0].frag_stride, for the first fragment, which is
> > what the XDP case cares about.
> > 
> > To reduce effect on fast-path, this patch determine the frame_sz at
> > setup time, to avoid determining the memory model runtime. Variable
> > is named first_frame_sz to make it clear that this is only the frame
> > size of the first fragment.
> > 
> > This mlx5 driver does a DMA-sync on XDP_TX action, but grow is safe
> > as it have done a DMA-map on the entire PAGE_SIZE. The driver also
> > already does a XDP length check against sq->hw_mtu on the possible
> > XDP xmit paths mlx5e_xmit_xdp_frame() + mlx5e_xmit_xdp_frame_mpwqe().
> > 
> > V2: Fix that frag_size need to be recalc before creating SKB.
> > 
> > Cc: Tariq Toukan <tariqt@mellanox.com>
> > Cc: Saeed Mahameed <saeedm@mellanox.com>
> > Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
> > ---
> >   drivers/net/ethernet/mellanox/mlx5/core/en.h      |    1 +
> >   drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c  |    1 +
> >   drivers/net/ethernet/mellanox/mlx5/core/en_main.c |    6 ++++++
> >   drivers/net/ethernet/mellanox/mlx5/core/en_rx.c   |    2 ++
> >   4 files changed, 10 insertions(+)
> > 
[... cut ...]
> > diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
> > index e2beb89c1832..04671ed977a5 100644
> > --- a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
> > +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
> > @@ -1084,6 +1084,7 @@ mlx5e_skb_from_cqe_linear(struct mlx5e_rq *rq, struct mlx5_cqe64 *cqe,
> >   	if (consumed)
> >   		return NULL; /* page/packet was consumed by XDP */
> >   
> > +	frag_size = MLX5_SKB_FRAG_SZ(rx_headroom + cqe_bcnt);  
> 
> This is a re-calculation of frag_size, using the exact same command used 
> earlier in this function, but with a newer value of rx_headroom.
> This wasn't part of the previous patchset. I understand the need.

Yes, the kernel will crash without this change.

> However, this code repetition looks weird and non-optimal to me. I think 
> we can come up with something better.
> 
> >   	skb = mlx5e_build_linear_skb(rq, va, frag_size, rx_headroom, cqe_bcnt);
> >   	if (unlikely(!skb))
> >   		return NULL;
> > @@ -1385,6 +1386,7 @@ mlx5e_skb_from_cqe_mpwrq_linear(struct mlx5e_rq *rq, struct mlx5e_mpw_info *wi,
> >   		return NULL; /* page/packet was consumed by XDP */
> >   	}
> >   
> > +	frag_size = MLX5_SKB_FRAG_SZ(rx_headroom + cqe_bcnt32);  
> 
> Same here.
> 
> >   	skb = mlx5e_build_linear_skb(rq, va, frag_size, rx_headroom, cqe_bcnt32);
> >   	if (unlikely(!skb))
> >   		return NULL;
> > 
> >   
> 
> My suggestion is:
> Pass &frag_size to mlx5e_xdp_handle(), and update it within it, just 
> next to the update of rx_headroom.
> All the needed information is there: the new rx_headroom, and cqe_bcnt.

First of all, passing yet another argument to mlx5e_xdp_handle() also
looks weird, and is on the brink of becoming a performance issue: on
x86_64 at most 6 integer/pointer arguments can be passed in registers
before the rest get pushed on the stack. Adding this would be the 7th
argument.

Second, the MLX5_SKB_FRAG_SZ() calculation is also weird, because it
does not provide any tailroom in the packet beyond what is reserved for
skb_shared_info. I guess it is for supporting another memory mode, as
when XDP is activated there is plenty of tailroom.

I thought about increasing the frag_size, but I chose not to, because
then this patch would change the driver behavior beyond adding frame_sz
for XDP.
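To make the tailroom point concrete, the sketch below compares how much room is left past the packet when the buffer is sized the way MLX5_SKB_FRAG_SZ() sizes it versus the full XDP frame_sz. It assumes the macro is SKB_DATA_ALIGN(len) + SKB_DATA_ALIGN(sizeof(struct skb_shared_info)) and that build_skb() keeps skb_shared_info at the end of the buffer; the 64-byte cache line, 320-byte skb_shared_info and 4096-byte frame are hypothetical numbers, and the helpers are illustrative, not driver code.

```c
#include <stdint.h>

#define SMP_CACHE_BYTES 64u   /* assumed cache-line size */
#define SHINFO_SIZE     320u  /* assumed sizeof(struct skb_shared_info) */
#define SKB_DATA_ALIGN(x) (((x) + (SMP_CACHE_BYTES - 1)) & ~(SMP_CACHE_BYTES - 1))

/* Buffer size as MLX5_SKB_FRAG_SZ() would compute it (assumed definition). */
static uint32_t frag_sz(uint32_t headroom, uint32_t len)
{
	return SKB_DATA_ALIGN(headroom + len) + SKB_DATA_ALIGN(SHINFO_SIZE);
}

/* Bytes available for growing the packet tail inside a buffer of buf_sz
 * bytes, given that the aligned skb_shared_info occupies the very end. */
static uint32_t tailroom(uint32_t buf_sz, uint32_t headroom, uint32_t len)
{
	uint32_t used = headroom + len + SKB_DATA_ALIGN(SHINFO_SIZE);

	return buf_sz > used ? buf_sz - used : 0;
}
```

With these assumed numbers, a buffer sized by frag_sz(256, 1000) = 1600 bytes leaves only 24 bytes of alignment slack as tailroom, while a 4096-byte XDP frame leaves 2520 bytes.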

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer



/* Current prototype: already six arguments (rq, di, va, rx_headroom, len, xsk) */
bool mlx5e_xdp_handle(struct mlx5e_rq *rq, struct mlx5e_dma_info *di,
		      void *va, u16 *rx_headroom, u32 *len, bool xsk)
{
	[...]
}



* Re: [PATCH net-next v2 12/33] hv_netvsc: add XDP frame size to driver
  2020-04-30 14:20   ` Haiyang Zhang
@ 2020-05-01 14:47     ` Jesper Dangaard Brouer
  0 siblings, 0 replies; 47+ messages in thread
From: Jesper Dangaard Brouer @ 2020-05-01 14:47 UTC (permalink / raw)
  To: Haiyang Zhang
  Cc: sameehj, Wei Liu, KY Srinivasan, Stephen Hemminger, netdev, bpf,
	zorik, akiyano, gtzalik, Toke Høiland-Jørgensen,
	Daniel Borkmann, Alexei Starovoitov, John Fastabend,
	Alexander Duyck, Jeff Kirsher, David Ahern, Willem de Bruijn,
	Ilias Apalodimas, Lorenzo Bianconi, Saeed Mahameed,
	steffen.klassert, brouer

On Thu, 30 Apr 2020 14:20:20 +0000
Haiyang Zhang <haiyangz@microsoft.com> wrote:
> > -----Original Message-----
> > From: Jesper Dangaard Brouer <brouer@redhat.com>
> > 
> > The hyperv NIC drivers XDP implementation is rather disappointing as it will
> > be a slowdown to enable XDP on this driver, given it will allocate a new page
> > for each packet and copy over the payload, before invoking the XDP BPF-
> > prog.  
>
> This needs correction. As I said previously -- 
> This statement is not accurate -- The data path of netvsc driver does memory 
> allocation and copy even without XDP, so it's not "a slowdown to enable XDP".

Okay, I have changed the paragraph text to:

 The hyperv NIC driver does memory allocation and copy even without XDP.
 In XDP mode it will allocate a new page for each packet and copy over
 the payload, before invoking the XDP BPF-prog.

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer



* Re: [PATCH net-next v2 20/33] vhost_net: also populate XDP frame size
  2020-04-30 11:22 ` [PATCH net-next v2 20/33] vhost_net: also populate " Jesper Dangaard Brouer
@ 2020-05-06  6:41   ` Jason Wang
  2020-05-06  6:49     ` Jason Wang
  2020-05-06 20:33   ` Michael S. Tsirkin
  1 sibling, 1 reply; 47+ messages in thread
From: Jason Wang @ 2020-05-06  6:41 UTC (permalink / raw)
  To: Jesper Dangaard Brouer, sameehj
  Cc: netdev, bpf, zorik, akiyano, gtzalik,
	Toke Høiland-Jørgensen, Daniel Borkmann,
	Alexei Starovoitov, John Fastabend, Alexander Duyck,
	Jeff Kirsher, David Ahern, Willem de Bruijn, Ilias Apalodimas,
	Lorenzo Bianconi, Saeed Mahameed, steffen.klassert


On 2020/4/30 下午7:22, Jesper Dangaard Brouer wrote:
> In vhost_net_build_xdp() the 'buf' that gets queued via an xdp_buff
> have embedded a struct tun_xdp_hdr (located at xdp->data_hard_start)
> which contains the buffer length 'buflen' (with tailroom for
> skb_shared_info). Also storing this buflen in xdp->frame_sz, does not
> obsolete struct tun_xdp_hdr, as it also contains a struct
> virtio_net_hdr with other information.
>
> Cc: Jason Wang <jasowang@redhat.com>
> Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
> ---
>   drivers/vhost/net.c |    1 +
>   1 file changed, 1 insertion(+)
>
> diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
> index 2927f02cc7e1..516519dcc8ff 100644
> --- a/drivers/vhost/net.c
> +++ b/drivers/vhost/net.c
> @@ -747,6 +747,7 @@ static int vhost_net_build_xdp(struct vhost_net_virtqueue *nvq,
>   	xdp->data = buf + pad;
>   	xdp->data_end = xdp->data + len;
>   	hdr->buflen = buflen;
> +	xdp->frame_sz = buflen;
>   
>   	--net->refcnt_bias;
>   	alloc_frag->offset += buflen;


Hi Jesper:

As I said in v1, tun will do this for us (patch 19) via hdr->buflen. So 
it looks to me this is not necessary?

Thanks

>
>



* Re: [PATCH net-next v2 20/33] vhost_net: also populate XDP frame size
  2020-05-06  6:41   ` Jason Wang
@ 2020-05-06  6:49     ` Jason Wang
  0 siblings, 0 replies; 47+ messages in thread
From: Jason Wang @ 2020-05-06  6:49 UTC (permalink / raw)
  To: Jesper Dangaard Brouer, sameehj
  Cc: netdev, bpf, zorik, akiyano, gtzalik,
	Toke Høiland-Jørgensen, Daniel Borkmann,
	Alexei Starovoitov, John Fastabend, Alexander Duyck,
	Jeff Kirsher, David Ahern, Willem de Bruijn, Ilias Apalodimas,
	Lorenzo Bianconi, Saeed Mahameed, steffen.klassert


On 2020/5/6 下午2:41, Jason Wang wrote:
>
> On 2020/4/30 下午7:22, Jesper Dangaard Brouer wrote:
>> In vhost_net_build_xdp() the 'buf' that gets queued via an xdp_buff
>> have embedded a struct tun_xdp_hdr (located at xdp->data_hard_start)
>> which contains the buffer length 'buflen' (with tailroom for
>> skb_shared_info). Also storing this buflen in xdp->frame_sz, does not
>> obsolete struct tun_xdp_hdr, as it also contains a struct
>> virtio_net_hdr with other information.
>>
>> Cc: Jason Wang <jasowang@redhat.com>
>> Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
>> ---
>>   drivers/vhost/net.c |    1 +
>>   1 file changed, 1 insertion(+)
>>
>> diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
>> index 2927f02cc7e1..516519dcc8ff 100644
>> --- a/drivers/vhost/net.c
>> +++ b/drivers/vhost/net.c
>> @@ -747,6 +747,7 @@ static int vhost_net_build_xdp(struct 
>> vhost_net_virtqueue *nvq,
>>       xdp->data = buf + pad;
>>       xdp->data_end = xdp->data + len;
>>       hdr->buflen = buflen;
>> +    xdp->frame_sz = buflen;
>>         --net->refcnt_bias;
>>       alloc_frag->offset += buflen;
>
>
> Hi Jesper:
>
> As I said in v1, tun will do this for us (patch 19) via hdr->buflen. 
> So it looks to me this is not necessary?
>
> Thanks 


I missed your reply. So

Acked-by: Jason Wang <jasowang@redhat.com>




* Re: [PATCH net-next v2 19/33] tun: add XDP frame size
  2020-04-30 11:21 ` [PATCH net-next v2 19/33] tun: add XDP frame size Jesper Dangaard Brouer
@ 2020-05-06 20:32   ` Michael S. Tsirkin
  0 siblings, 0 replies; 47+ messages in thread
From: Michael S. Tsirkin @ 2020-05-06 20:32 UTC (permalink / raw)
  To: Jesper Dangaard Brouer
  Cc: sameehj, Jason Wang, netdev, bpf, zorik, akiyano, gtzalik,
	Toke Høiland-Jørgensen, Daniel Borkmann,
	Alexei Starovoitov, John Fastabend, Alexander Duyck,
	Jeff Kirsher, David Ahern, Willem de Bruijn, Ilias Apalodimas,
	Lorenzo Bianconi, Saeed Mahameed, steffen.klassert

On Thu, Apr 30, 2020 at 01:21:58PM +0200, Jesper Dangaard Brouer wrote:
> The tun driver have two code paths for running XDP (bpf_prog_run_xdp).
> In both cases 'buflen' contains enough tailroom for skb_shared_info.
> 
> Cc: Jason Wang <jasowang@redhat.com>
> Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
> Acked-by: Jason Wang <jasowang@redhat.com>

Acked-by: Michael S. Tsirkin <mst@redhat.com>

> ---
>  drivers/net/tun.c |    2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/drivers/net/tun.c b/drivers/net/tun.c
> index 44889eba1dbc..c54f967e2c66 100644
> --- a/drivers/net/tun.c
> +++ b/drivers/net/tun.c
> @@ -1671,6 +1671,7 @@ static struct sk_buff *tun_build_skb(struct tun_struct *tun,
>  		xdp_set_data_meta_invalid(&xdp);
>  		xdp.data_end = xdp.data + len;
>  		xdp.rxq = &tfile->xdp_rxq;
> +		xdp.frame_sz = buflen;
>  
>  		act = bpf_prog_run_xdp(xdp_prog, &xdp);
>  		if (act == XDP_REDIRECT || act == XDP_TX) {
> @@ -2411,6 +2412,7 @@ static int tun_xdp_one(struct tun_struct *tun,
>  		}
>  		xdp_set_data_meta_invalid(xdp);
>  		xdp->rxq = &tfile->xdp_rxq;
> +		xdp->frame_sz = buflen;
>  
>  		act = bpf_prog_run_xdp(xdp_prog, xdp);
>  		err = tun_xdp_act(tun, xdp_prog, xdp, act);
> 
> 



* Re: [PATCH net-next v2 20/33] vhost_net: also populate XDP frame size
  2020-04-30 11:22 ` [PATCH net-next v2 20/33] vhost_net: also populate " Jesper Dangaard Brouer
  2020-05-06  6:41   ` Jason Wang
@ 2020-05-06 20:33   ` Michael S. Tsirkin
  1 sibling, 0 replies; 47+ messages in thread
From: Michael S. Tsirkin @ 2020-05-06 20:33 UTC (permalink / raw)
  To: Jesper Dangaard Brouer
  Cc: sameehj, Jason Wang, netdev, bpf, zorik, akiyano, gtzalik,
	Toke Høiland-Jørgensen, Daniel Borkmann,
	Alexei Starovoitov, John Fastabend, Alexander Duyck,
	Jeff Kirsher, David Ahern, Willem de Bruijn, Ilias Apalodimas,
	Lorenzo Bianconi, Saeed Mahameed, steffen.klassert

On Thu, Apr 30, 2020 at 01:22:03PM +0200, Jesper Dangaard Brouer wrote:
> In vhost_net_build_xdp() the 'buf' that gets queued via an xdp_buff
> have embedded a struct tun_xdp_hdr (located at xdp->data_hard_start)
> which contains the buffer length 'buflen' (with tailroom for
> skb_shared_info). Also storing this buflen in xdp->frame_sz, does not
> obsolete struct tun_xdp_hdr, as it also contains a struct
> virtio_net_hdr with other information.
> 
> Cc: Jason Wang <jasowang@redhat.com>
> Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>

Acked-by: Michael S. Tsirkin <mst@redhat.com>

> ---
>  drivers/vhost/net.c |    1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
> index 2927f02cc7e1..516519dcc8ff 100644
> --- a/drivers/vhost/net.c
> +++ b/drivers/vhost/net.c
> @@ -747,6 +747,7 @@ static int vhost_net_build_xdp(struct vhost_net_virtqueue *nvq,
>  	xdp->data = buf + pad;
>  	xdp->data_end = xdp->data + len;
>  	hdr->buflen = buflen;
> +	xdp->frame_sz = buflen;
>  
>  	--net->refcnt_bias;
>  	alloc_frag->offset += buflen;
> 
> 



* Re: [PATCH net-next v2 21/33] virtio_net: add XDP frame size in two code paths
  2020-04-30 11:22 ` [PATCH net-next v2 21/33] virtio_net: add XDP frame size in two code paths Jesper Dangaard Brouer
@ 2020-05-06 20:34   ` Michael S. Tsirkin
  2020-05-08  2:05     ` Jason Wang
  0 siblings, 1 reply; 47+ messages in thread
From: Michael S. Tsirkin @ 2020-05-06 20:34 UTC (permalink / raw)
  To: Jesper Dangaard Brouer
  Cc: sameehj, Jason Wang, netdev, bpf, zorik, akiyano, gtzalik,
	Toke Høiland-Jørgensen, Daniel Borkmann,
	Alexei Starovoitov, John Fastabend, Alexander Duyck,
	Jeff Kirsher, David Ahern, Willem de Bruijn, Ilias Apalodimas,
	Lorenzo Bianconi, Saeed Mahameed, steffen.klassert

On Thu, Apr 30, 2020 at 01:22:08PM +0200, Jesper Dangaard Brouer wrote:
> The virtio_net driver is running inside the guest-OS. There are two
> XDP receive code-paths in virtio_net, namely receive_small() and
> receive_mergeable(). The receive_big() function does not support XDP.
> 
> In receive_small() the frame size is available in buflen. The buffer
> backing these frames are allocated in add_recvbuf_small() with same
> size, except for the headroom, but tailroom have reserved room for
> skb_shared_info. The headroom is encoded in ctx pointer as a value.
> 
> In receive_mergeable() the frame size is more dynamic. There are two
> basic cases: (1) buffer size is based on a exponentially weighted
> moving average (see DECLARE_EWMA) of packet length. Or (2) in case
> virtnet_get_headroom() have any headroom then buffer size is
> PAGE_SIZE. The ctx pointer is this time used for encoding two values;
> the buffer len "truesize" and headroom. In case (1) if the rx buffer
> size is underestimated, the packet will have been split over more
> buffers (num_buf info in virtio_net_hdr_mrg_rxbuf placed in top of
> buffer area). If that happens the XDP path does a xdp_linearize_page
> operation.
> 
> Cc: Jason Wang <jasowang@redhat.com>
> Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>

Acked-by: Michael S. Tsirkin <mst@redhat.com>

> ---
>  drivers/net/virtio_net.c |   15 ++++++++++++---
>  1 file changed, 12 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index 11f722460513..1df3676da185 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -689,6 +689,7 @@ static struct sk_buff *receive_small(struct net_device *dev,
>  		xdp.data_end = xdp.data + len;
>  		xdp.data_meta = xdp.data;
>  		xdp.rxq = &rq->xdp_rxq;
> +		xdp.frame_sz = buflen;
>  		orig_data = xdp.data;
>  		act = bpf_prog_run_xdp(xdp_prog, &xdp);
>  		stats->xdp_packets++;
> @@ -797,10 +798,11 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
>  	int offset = buf - page_address(page);
>  	struct sk_buff *head_skb, *curr_skb;
>  	struct bpf_prog *xdp_prog;
> -	unsigned int truesize;
> +	unsigned int truesize = mergeable_ctx_to_truesize(ctx);
>  	unsigned int headroom = mergeable_ctx_to_headroom(ctx);
> -	int err;
>  	unsigned int metasize = 0;
> +	unsigned int frame_sz;
> +	int err;
>  
>  	head_skb = NULL;
>  	stats->bytes += len - vi->hdr_len;
> @@ -821,6 +823,11 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
>  		if (unlikely(hdr->hdr.gso_type))
>  			goto err_xdp;
>  
> +		/* Buffers with headroom use PAGE_SIZE as alloc size,
> +		 * see add_recvbuf_mergeable() + get_mergeable_buf_len()
> +		 */
> +		frame_sz = headroom ? PAGE_SIZE : truesize;
> +
>  		/* This happens when rx buffer size is underestimated
>  		 * or headroom is not enough because of the buffer
>  		 * was refilled before XDP is set. This should only
> @@ -834,6 +841,8 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
>  						      page, offset,
>  						      VIRTIO_XDP_HEADROOM,
>  						      &len);
> +			frame_sz = PAGE_SIZE;
> +
>  			if (!xdp_page)
>  				goto err_xdp;
>  			offset = VIRTIO_XDP_HEADROOM;
> @@ -850,6 +859,7 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
>  		xdp.data_end = xdp.data + (len - vi->hdr_len);
>  		xdp.data_meta = xdp.data;
>  		xdp.rxq = &rq->xdp_rxq;
> +		xdp.frame_sz = frame_sz;
>  
>  		act = bpf_prog_run_xdp(xdp_prog, &xdp);
>  		stats->xdp_packets++;
> @@ -924,7 +934,6 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
>  	}
>  	rcu_read_unlock();
>  
> -	truesize = mergeable_ctx_to_truesize(ctx);
>  	if (unlikely(len > truesize)) {
>  		pr_debug("%s: rx error: len %u exceeds truesize %lu\n",
>  			 dev->name, len, (unsigned long)ctx);
> 
> 



* Re: [PATCH net-next v2 21/33] virtio_net: add XDP frame size in two code paths
  2020-05-06 20:34   ` Michael S. Tsirkin
@ 2020-05-08  2:05     ` Jason Wang
  2020-05-08  7:21       ` Jesper Dangaard Brouer
  0 siblings, 1 reply; 47+ messages in thread
From: Jason Wang @ 2020-05-08  2:05 UTC (permalink / raw)
  To: Michael S. Tsirkin, Jesper Dangaard Brouer
  Cc: sameehj, netdev, bpf, zorik, akiyano, gtzalik,
	Toke Høiland-Jørgensen, Daniel Borkmann,
	Alexei Starovoitov, John Fastabend, Alexander Duyck,
	Jeff Kirsher, David Ahern, Willem de Bruijn, Ilias Apalodimas,
	Lorenzo Bianconi, Saeed Mahameed, steffen.klassert


On 2020/5/7 上午4:34, Michael S. Tsirkin wrote:
> On Thu, Apr 30, 2020 at 01:22:08PM +0200, Jesper Dangaard Brouer wrote:
>> The virtio_net driver is running inside the guest-OS. There are two
>> XDP receive code-paths in virtio_net, namely receive_small() and
>> receive_mergeable(). The receive_big() function does not support XDP.
>>
>> In receive_small() the frame size is available in buflen. The buffer
>> backing these frames are allocated in add_recvbuf_small() with same
>> size, except for the headroom, but tailroom have reserved room for
>> skb_shared_info. The headroom is encoded in ctx pointer as a value.
>>
>> In receive_mergeable() the frame size is more dynamic. There are two
>> basic cases: (1) buffer size is based on a exponentially weighted
>> moving average (see DECLARE_EWMA) of packet length. Or (2) in case
>> virtnet_get_headroom() have any headroom then buffer size is
>> PAGE_SIZE. The ctx pointer is this time used for encoding two values;
>> the buffer len "truesize" and headroom. In case (1) if the rx buffer
>> size is underestimated, the packet will have been split over more
>> buffers (num_buf info in virtio_net_hdr_mrg_rxbuf placed in top of
>> buffer area). If that happens the XDP path does a xdp_linearize_page
>> operation.
>>
>> Cc: Jason Wang<jasowang@redhat.com>
>> Signed-off-by: Jesper Dangaard Brouer<brouer@redhat.com>
> Acked-by: Michael S. Tsirkin<mst@redhat.com>


Note that we do:

         xdp.data_hard_start = data - VIRTIO_XDP_HEADROOM + vi->hdr_len;

So using PAGE_SIZE here is probably not correct.

Thanks

>



* Re: [PATCH net-next v2 21/33] virtio_net: add XDP frame size in two code paths
  2020-05-08  2:05     ` Jason Wang
@ 2020-05-08  7:21       ` Jesper Dangaard Brouer
  0 siblings, 0 replies; 47+ messages in thread
From: Jesper Dangaard Brouer @ 2020-05-08  7:21 UTC (permalink / raw)
  To: Jason Wang
  Cc: Michael S. Tsirkin, sameehj, netdev, bpf, zorik, akiyano,
	gtzalik, Toke Høiland-Jørgensen, Daniel Borkmann,
	Alexei Starovoitov, John Fastabend, Alexander Duyck,
	Jeff Kirsher, David Ahern, Willem de Bruijn, Ilias Apalodimas,
	Lorenzo Bianconi, Saeed Mahameed, steffen.klassert, brouer

On Fri, 8 May 2020 10:05:46 +0800
Jason Wang <jasowang@redhat.com> wrote:

> On 2020/5/7 上午4:34, Michael S. Tsirkin wrote:
> > On Thu, Apr 30, 2020 at 01:22:08PM +0200, Jesper Dangaard Brouer wrote:  
> >> The virtio_net driver is running inside the guest-OS. There are two
> >> XDP receive code-paths in virtio_net, namely receive_small() and
> >> receive_mergeable(). The receive_big() function does not support XDP.
> >>
> >> In receive_small() the frame size is available in buflen. The buffer
> >> backing these frames are allocated in add_recvbuf_small() with same
> >> size, except for the headroom, but tailroom have reserved room for
> >> skb_shared_info. The headroom is encoded in ctx pointer as a value.
> >>
> >> In receive_mergeable() the frame size is more dynamic. There are two
> >> basic cases: (1) buffer size is based on a exponentially weighted
> >> moving average (see DECLARE_EWMA) of packet length. Or (2) in case
> >> virtnet_get_headroom() have any headroom then buffer size is
> >> PAGE_SIZE. The ctx pointer is this time used for encoding two values;
> >> the buffer len "truesize" and headroom. In case (1) if the rx buffer
> >> size is underestimated, the packet will have been split over more
> >> buffers (num_buf info in virtio_net_hdr_mrg_rxbuf placed in top of
> >> buffer area). If that happens the XDP path does a xdp_linearize_page
> >> operation.
> >>
> >> Cc: Jason Wang<jasowang@redhat.com>
> >> Signed-off-by: Jesper Dangaard Brouer<brouer@redhat.com>  
> > Acked-by: Michael S. Tsirkin<mst@redhat.com>  
> 
> 
> Note that we do:
> 
>          xdp.data_hard_start = data - VIRTIO_XDP_HEADROOM + vi->hdr_len;
> 
> So using PAGE_SIZE here is probably not correct.

Yes, you are correct.  I will fix this up in V3.  We need to
adjust/reduce xdp.frame_sz with these offsets, as frame_sz is an offset
size from xdp.data_hard_start.
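A minimal sketch of that adjustment, assuming all that is known is where the allocated buffer starts and how large it is. If, as in the linearize path, the buffer begins VIRTIO_XDP_HEADROOM bytes before data, then the quoted assignment puts data_hard_start vi->hdr_len bytes past the buffer start, and that offset must come off frame_sz. The helper name and parameters are illustrative only; the actual V3 change may compute the value differently.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: frame_sz is counted from data_hard_start, not from
 * the start of the allocation. If data_hard_start sits 'offset' bytes into
 * a buffer of alloc_sz bytes, only the remainder belongs to the frame. */
static uint32_t frame_sz_from_hard_start(const unsigned char *buf_start,
					 uint32_t alloc_sz,
					 const unsigned char *data_hard_start)
{
	size_t offset = (size_t)(data_hard_start - buf_start);

	return offset < alloc_sz ? alloc_sz - (uint32_t)offset : 0;
}
```

For a 4096-byte page with data_hard_start 12 bytes in (an illustrative hdr_len), this yields 4084 rather than PAGE_SIZE.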

Thanks for pointing this out again; I will fix it.
-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer



* Re: [PATCH net-next v2 28/33] mlx5: rx queue setup time determine frame_sz for XDP
  2020-05-01 12:32       ` Jesper Dangaard Brouer
@ 2020-05-08 10:49         ` Jesper Dangaard Brouer
  0 siblings, 0 replies; 47+ messages in thread
From: Jesper Dangaard Brouer @ 2020-05-08 10:49 UTC (permalink / raw)
  To: Tariq Toukan
  Cc: sameehj, Saeed Mahameed, netdev, bpf, zorik, akiyano, gtzalik,
	Toke Høiland-Jørgensen, Daniel Borkmann,
	Alexei Starovoitov, John Fastabend, Alexander Duyck,
	Jeff Kirsher, David Ahern, Willem de Bruijn, Ilias Apalodimas,
	Lorenzo Bianconi, brouer, Tariq Toukan

On Fri, 1 May 2020 14:32:32 +0200
Jesper Dangaard Brouer <brouer@redhat.com> wrote:

> On Thu, 30 Apr 2020 20:12:11 +0300
> Tariq Toukan <tariqt@mellanox.com> wrote:
> 
> > >> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h 
> > >> b/drivers/net/ethernet/mellanox/mlx5/core/en.h
> > >> index 23701c0e36ec..ba6a0ee297c6 100644
> > >> --- a/drivers/net/ethernet/mellanox/mlx5/core/en.h
> > >> +++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h
> > >> @@ -652,6 +652,7 @@ struct mlx5e_rq {
> > >>       struct {
> > >>           u16            umem_headroom;
> > >>           u16            headroom;
> > >> +        u32            first_frame_sz;    
> > 
> > I also think that a better name would be: frame0_sz, or frag0_sz.  
> 
> You do realize that the name "first_frame_sz" was your suggestion last
> time... Now you give me two options, can please select one of them so I
> can update the patch for a V3 with that?

As I've not gotten any feedback from you, I'm going to choose your
first suggestion "frame0_sz" and update the patch with that...
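With the field renamed, the setup-time selection from the patch (the three assignments in mlx5e_alloc_rq() quoted earlier in the thread) can be modeled roughly as follows. This is a simplified, hypothetical model: struct rq_sketch, its fields and rq_setup_frame0_sz() are illustrative names, not mlx5 driver code.

```c
#include <stdint.h>

/* Hypothetical, simplified model of the RQ memory models in the patch. */
enum rq_mode { RQ_STRIDING, RQ_CYCLIC };

struct rq_sketch {
	enum rq_mode mode;
	uint8_t log_stride_sz;  /* striding RQ: 11 -> 2048, 12 -> 4096 */
	uint32_t frag0_stride;  /* cyclic RQ: stride of fragment 0 */
	uint32_t xsk_frame_sz;  /* AF_XDP UMEM frame size, 0 if no XSK */
	uint32_t frame0_sz;     /* result, computed once at setup time */
};

/* Decide the XDP frame size once, so the fast path only reads frame0_sz
 * instead of inspecting the memory model per packet. */
static void rq_setup_frame0_sz(struct rq_sketch *rq)
{
	if (rq->mode == RQ_STRIDING)
		rq->frame0_sz = 1u << rq->log_stride_sz;
	else
		rq->frame0_sz = rq->frag0_stride;

	/* The XSK assignment comes last in the patch, so it overrides. */
	if (rq->xsk_frame_sz)
		rq->frame0_sz = rq->xsk_frame_sz;
}
```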

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer



end of thread, other threads:[~2020-05-08 10:50 UTC | newest]

Thread overview: 47+ messages
     [not found] <158824557985.2172139.4173570969543904434.stgit@firesoul>
2020-04-30 11:20 ` [PATCH net-next v2 01/33] xdp: add frame size to xdp_buff Jesper Dangaard Brouer
2020-04-30 11:20 ` [PATCH net-next v2 02/33] bnxt: add XDP frame size to driver Jesper Dangaard Brouer
2020-04-30 11:20 ` [PATCH net-next v2 03/33] sfc: add XDP frame size Jesper Dangaard Brouer
2020-04-30 11:20 ` [PATCH net-next v2 04/33] mvneta: add XDP frame size to driver Jesper Dangaard Brouer
2020-04-30 11:20 ` [PATCH net-next v2 05/33] net: netsec: Add support for XDP frame size Jesper Dangaard Brouer
2020-04-30 11:20 ` [PATCH net-next v2 06/33] net: XDP-generic determining " Jesper Dangaard Brouer
2020-04-30 11:20 ` [PATCH net-next v2 07/33] xdp: xdp_frame add member frame_sz and handle in convert_to_xdp_frame Jesper Dangaard Brouer
2020-04-30 11:21 ` [PATCH net-next v2 08/33] xdp: cpumap redirect use frame_sz and increase skb_tailroom Jesper Dangaard Brouer
2020-04-30 11:21 ` [PATCH net-next v2 09/33] veth: adjust hard_start offset on redirect XDP frames Jesper Dangaard Brouer
2020-04-30 11:21 ` [PATCH net-next v2 10/33] veth: xdp using frame_sz in veth driver Jesper Dangaard Brouer
2020-04-30 11:21 ` [PATCH net-next v2 11/33] dpaa2-eth: add XDP frame size Jesper Dangaard Brouer
2020-04-30 11:21 ` [PATCH net-next v2 12/33] hv_netvsc: add XDP frame size to driver Jesper Dangaard Brouer
2020-04-30 14:20   ` Haiyang Zhang
2020-05-01 14:47     ` Jesper Dangaard Brouer
2020-04-30 11:21 ` [PATCH net-next v2 13/33] qlogic/qede: " Jesper Dangaard Brouer
2020-04-30 11:21 ` [PATCH net-next v2 14/33] net: ethernet: ti: add XDP frame size to driver cpsw Jesper Dangaard Brouer
2020-04-30 11:21 ` [PATCH net-next v2 15/33] ena: add XDP frame size to amazon NIC driver Jesper Dangaard Brouer
2020-04-30 11:21 ` [PATCH net-next v2 16/33] mlx4: add XDP frame size and adjust max XDP MTU Jesper Dangaard Brouer
2020-04-30 11:21 ` [PATCH net-next v2 17/33] net: thunderx: add XDP frame size Jesper Dangaard Brouer
2020-04-30 11:21 ` [PATCH net-next v2 18/33] nfp: add XDP frame size to netronome driver Jesper Dangaard Brouer
2020-04-30 11:21 ` [PATCH net-next v2 19/33] tun: add XDP frame size Jesper Dangaard Brouer
2020-05-06 20:32   ` Michael S. Tsirkin
2020-04-30 11:22 ` [PATCH net-next v2 20/33] vhost_net: also populate " Jesper Dangaard Brouer
2020-05-06  6:41   ` Jason Wang
2020-05-06  6:49     ` Jason Wang
2020-05-06 20:33   ` Michael S. Tsirkin
2020-04-30 11:22 ` [PATCH net-next v2 21/33] virtio_net: add XDP frame size in two code paths Jesper Dangaard Brouer
2020-05-06 20:34   ` Michael S. Tsirkin
2020-05-08  2:05     ` Jason Wang
2020-05-08  7:21       ` Jesper Dangaard Brouer
2020-04-30 11:22 ` [PATCH net-next v2 22/33] ixgbe: fix XDP redirect on archs with PAGE_SIZE above 4K Jesper Dangaard Brouer
2020-04-30 11:22 ` [PATCH net-next v2 23/33] ixgbe: add XDP frame size to driver Jesper Dangaard Brouer
2020-04-30 11:22 ` [PATCH net-next v2 24/33] ixgbevf: add XDP frame size to VF driver Jesper Dangaard Brouer
2020-04-30 11:22 ` [PATCH net-next v2 25/33] i40e: add XDP frame size to driver Jesper Dangaard Brouer
2020-04-30 11:22 ` [PATCH net-next v2 26/33] ice: " Jesper Dangaard Brouer
2020-04-30 11:22 ` [PATCH net-next v2 27/33] xdp: for Intel AF_XDP drivers add XDP frame_sz Jesper Dangaard Brouer
2020-04-30 11:22 ` [PATCH net-next v2 28/33] mlx5: rx queue setup time determine frame_sz for XDP Jesper Dangaard Brouer
2020-04-30 17:07   ` Tariq Toukan
2020-04-30 17:12     ` Tariq Toukan
2020-05-01 12:32       ` Jesper Dangaard Brouer
2020-05-08 10:49         ` Jesper Dangaard Brouer
2020-05-01 13:01     ` Jesper Dangaard Brouer
2020-04-30 11:22 ` [PATCH net-next v2 29/33] xdp: allow bpf_xdp_adjust_tail() to grow packet size Jesper Dangaard Brouer
2020-04-30 11:22 ` [PATCH net-next v2 30/33] xdp: clear grow memory in bpf_xdp_adjust_tail() Jesper Dangaard Brouer
2020-04-30 11:22 ` [PATCH net-next v2 31/33] bpf: add xdp.frame_sz in bpf_prog_test_run_xdp() Jesper Dangaard Brouer
2020-04-30 11:23 ` [PATCH net-next v2 32/33] selftests/bpf: adjust BPF selftest for xdp_adjust_tail Jesper Dangaard Brouer
2020-04-30 11:23 ` [PATCH net-next v2 33/33] selftests/bpf: xdp_adjust_tail add grow tail tests Jesper Dangaard Brouer
