All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH net-next v3 00/33] XDP extend with knowledge of frame size
@ 2020-05-08 11:08 Jesper Dangaard Brouer
  2020-05-08 11:08 ` [PATCH net-next v3 01/33] xdp: add frame size to xdp_buff Jesper Dangaard Brouer
                   ` (32 more replies)
  0 siblings, 33 replies; 38+ messages in thread
From: Jesper Dangaard Brouer @ 2020-05-08 11:08 UTC (permalink / raw)
  To: sameehj
  Cc: Jesper Dangaard Brouer, netdev, bpf,
	Toke Høiland-Jørgensen, Daniel Borkmann,
	Alexei Starovoitov, David S. Miller, John Fastabend,
	Alexander Duyck, Jeff Kirsher, David Ahern, Ilias Apalodimas,
	Lorenzo Bianconi, Saeed Mahameed, Tariq Toukan

(Send for net-next due to all the driver updates)

V3:
- Fix issue on virtio_net patch spotted by Jason Wang
- Adjust name for variable in mlx5 patch
- Collected more ACKs

V2:
- Fix bug in mlx5 for XDP_PASS case
- Collected nitpicks and ACKs from mailing list

V1:
- Fix bug in dpaa2

XDP have evolved to support several frame sizes, but xdp_buff was not
updated with this information. This have caused the side-effect that
XDP frame data hard end is unknown. This have limited the BPF-helper
bpf_xdp_adjust_tail to only shrink the packet. This patchset address
this and add packet tail extend/grow.

The purpose of the patchset is ALSO to reserve a memory area that can be
used for storing extra information, specifically for extending XDP with
multi-buffer support. One proposal is to use same layout as
skb_shared_info, which is why this area is currently 320 bytes.

When converting xdp_frame to SKB (veth and cpumap), the full tailroom
area can now be used and SKB truesize is now correct. For most
drivers this result in a much larger tailroom in SKB "head" data
area. The network stack can now take advantage of this when doing SKB
coalescing. Thus, a good driver test is to use xdp_redirect_cpu from
samples/bpf/ and do some TCP stream testing.

Use-cases for tail grow/extend:
(1) IPsec / XFRM needs a tail extend[1][2].
(2) DNS-cache responses in XDP.
(3) HAProxy ALOHA would need it to convert to XDP.
(4) Add tail info e.g. timestamp and collect via tcpdump

[1] http://vger.kernel.org/netconf2019_files/xfrm_xdp.pdf
[2] http://vger.kernel.org/netconf2019.html

---

Ilias Apalodimas (1):
      net: netsec: Add support for XDP frame size

Jesper Dangaard Brouer (32):
      xdp: add frame size to xdp_buff
      bnxt: add XDP frame size to driver
      sfc: add XDP frame size
      mvneta: add XDP frame size to driver
      net: XDP-generic determining XDP frame size
      xdp: xdp_frame add member frame_sz and handle in convert_to_xdp_frame
      xdp: cpumap redirect use frame_sz and increase skb_tailroom
      veth: adjust hard_start offset on redirect XDP frames
      veth: xdp using frame_sz in veth driver
      dpaa2-eth: add XDP frame size
      hv_netvsc: add XDP frame size to driver
      qlogic/qede: add XDP frame size to driver
      net: ethernet: ti: add XDP frame size to driver cpsw
      ena: add XDP frame size to amazon NIC driver
      mlx4: add XDP frame size and adjust max XDP MTU
      net: thunderx: add XDP frame size
      nfp: add XDP frame size to netronome driver
      tun: add XDP frame size
      vhost_net: also populate XDP frame size
      virtio_net: add XDP frame size in two code paths
      ixgbe: fix XDP redirect on archs with PAGE_SIZE above 4K
      ixgbe: add XDP frame size to driver
      ixgbevf: add XDP frame size to VF driver
      i40e: add XDP frame size to driver
      ice: add XDP frame size to driver
      xdp: for Intel AF_XDP drivers add XDP frame_sz
      mlx5: rx queue setup time determine frame_sz for XDP
      xdp: allow bpf_xdp_adjust_tail() to grow packet size
      xdp: clear grow memory in bpf_xdp_adjust_tail()
      bpf: add xdp.frame_sz in bpf_prog_test_run_xdp().
      selftests/bpf: adjust BPF selftest for xdp_adjust_tail
      selftests/bpf: xdp_adjust_tail add grow tail tests


 drivers/net/ethernet/amazon/ena/ena_netdev.c       |    1 
 drivers/net/ethernet/amazon/ena/ena_netdev.h       |    5 -
 drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c      |    1 
 drivers/net/ethernet/cavium/thunder/nicvf_main.c   |    1 
 drivers/net/ethernet/freescale/dpaa2/dpaa2-eth.c   |    7 +
 drivers/net/ethernet/intel/i40e/i40e_txrx.c        |   30 ++++-
 drivers/net/ethernet/intel/i40e/i40e_xsk.c         |    2 
 drivers/net/ethernet/intel/ice/ice_txrx.c          |   34 ++++--
 drivers/net/ethernet/intel/ice/ice_xsk.c           |    2 
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c      |   33 ++++-
 drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c       |    2 
 drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c  |   34 ++++--
 drivers/net/ethernet/marvell/mvneta.c              |   25 ++--
 drivers/net/ethernet/mellanox/mlx4/en_netdev.c     |    3 
 drivers/net/ethernet/mellanox/mlx4/en_rx.c         |    1 
 drivers/net/ethernet/mellanox/mlx5/core/en.h       |    1 
 drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c   |    1 
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c  |    6 +
 drivers/net/ethernet/mellanox/mlx5/core/en_rx.c    |    2 
 .../net/ethernet/netronome/nfp/nfp_net_common.c    |    6 +
 drivers/net/ethernet/qlogic/qede/qede_fp.c         |    1 
 drivers/net/ethernet/qlogic/qede/qede_main.c       |    2 
 drivers/net/ethernet/sfc/rx.c                      |    1 
 drivers/net/ethernet/socionext/netsec.c            |   30 +++--
 drivers/net/ethernet/ti/cpsw.c                     |    1 
 drivers/net/ethernet/ti/cpsw_new.c                 |    1 
 drivers/net/hyperv/netvsc_bpf.c                    |    1 
 drivers/net/hyperv/netvsc_drv.c                    |    2 
 drivers/net/tun.c                                  |    2 
 drivers/net/veth.c                                 |   28 +++--
 drivers/net/virtio_net.c                           |   15 ++
 drivers/vhost/net.c                                |    1 
 include/net/xdp.h                                  |   27 ++++
 include/net/xdp_sock.h                             |   11 ++
 include/uapi/linux/bpf.h                           |    4 -
 kernel/bpf/cpumap.c                                |   21 ---
 net/bpf/test_run.c                                 |   16 ++-
 net/core/dev.c                                     |   14 +-
 net/core/filter.c                                  |   15 ++
 net/core/xdp.c                                     |    7 +
 .../selftests/bpf/prog_tests/xdp_adjust_tail.c     |  123 +++++++++++++++++++-
 .../testing/selftests/bpf/progs/test_adjust_tail.c |   30 -----
 .../bpf/progs/test_xdp_adjust_tail_grow.c          |   33 +++++
 .../bpf/progs/test_xdp_adjust_tail_shrink.c        |   30 +++++
 44 files changed, 474 insertions(+), 139 deletions(-)
 delete mode 100644 tools/testing/selftests/bpf/progs/test_adjust_tail.c
 create mode 100644 tools/testing/selftests/bpf/progs/test_xdp_adjust_tail_grow.c
 create mode 100644 tools/testing/selftests/bpf/progs/test_xdp_adjust_tail_shrink.c

--


^ permalink raw reply	[flat|nested] 38+ messages in thread

* [PATCH net-next v3 01/33] xdp: add frame size to xdp_buff
  2020-05-08 11:08 [PATCH net-next v3 00/33] XDP extend with knowledge of frame size Jesper Dangaard Brouer
@ 2020-05-08 11:08 ` Jesper Dangaard Brouer
  2020-05-08 11:08 ` [PATCH net-next v3 02/33] bnxt: add XDP frame size to driver Jesper Dangaard Brouer
                   ` (31 subsequent siblings)
  32 siblings, 0 replies; 38+ messages in thread
From: Jesper Dangaard Brouer @ 2020-05-08 11:08 UTC (permalink / raw)
  To: sameehj
  Cc: Jesper Dangaard Brouer, netdev, bpf,
	Toke Høiland-Jørgensen, Daniel Borkmann,
	Alexei Starovoitov, David S. Miller, John Fastabend,
	Alexander Duyck, Jeff Kirsher, David Ahern, Ilias Apalodimas,
	Lorenzo Bianconi, Saeed Mahameed, Tariq Toukan

XDP have evolved to support several frame sizes, but xdp_buff was not
updated with this information. The frame size (frame_sz) member of
xdp_buff is introduced to know the real size of the memory the frame is
delivered in.

When introducing this also make it clear that some tailroom is
reserved/required when creating SKBs using build_skb().

It would also have been an option to introduce a pointer to
data_hard_end (with reserved offset). The advantage with frame_sz is
that (like rxq) drivers only need to setup/assign this value once per
NAPI cycle. Due to XDP-generic (and some drivers) it's not possible to
store frame_sz inside xdp_rxq_info, because it's varies per packet as it
can be based/depend on packet length.

V2: nitpick: deduct -> deduce

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
---
 include/net/xdp.h |   13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/include/net/xdp.h b/include/net/xdp.h
index 3cc6d5d84aa4..a764af4ae0ea 100644
--- a/include/net/xdp.h
+++ b/include/net/xdp.h
@@ -6,6 +6,8 @@
 #ifndef __LINUX_NET_XDP_H__
 #define __LINUX_NET_XDP_H__
 
+#include <linux/skbuff.h> /* skb_shared_info */
+
 /**
  * DOC: XDP RX-queue information
  *
@@ -70,8 +72,19 @@ struct xdp_buff {
 	void *data_hard_start;
 	unsigned long handle;
 	struct xdp_rxq_info *rxq;
+	u32 frame_sz; /* frame size to deduce data_hard_end/reserved tailroom*/
 };
 
+/* Reserve memory area at end-of data area.
+ *
+ * This macro reserves tailroom in the XDP buffer by limiting the
+ * XDP/BPF data access to data_hard_end.  Notice same area (and size)
+ * is used for XDP_PASS, when constructing the SKB via build_skb().
+ */
+#define xdp_data_hard_end(xdp)				\
+	((xdp)->data_hard_start + (xdp)->frame_sz -	\
+	 SKB_DATA_ALIGN(sizeof(struct skb_shared_info)))
+
 struct xdp_frame {
 	void *data;
 	u16 len;



^ permalink raw reply	[flat|nested] 38+ messages in thread

* [PATCH net-next v3 02/33] bnxt: add XDP frame size to driver
  2020-05-08 11:08 [PATCH net-next v3 00/33] XDP extend with knowledge of frame size Jesper Dangaard Brouer
  2020-05-08 11:08 ` [PATCH net-next v3 01/33] xdp: add frame size to xdp_buff Jesper Dangaard Brouer
@ 2020-05-08 11:08 ` Jesper Dangaard Brouer
  2020-05-08 11:09 ` [PATCH net-next v3 03/33] sfc: add XDP frame size Jesper Dangaard Brouer
                   ` (30 subsequent siblings)
  32 siblings, 0 replies; 38+ messages in thread
From: Jesper Dangaard Brouer @ 2020-05-08 11:08 UTC (permalink / raw)
  To: sameehj
  Cc: Jesper Dangaard Brouer, netdev, bpf,
	Toke Høiland-Jørgensen, Daniel Borkmann,
	Alexei Starovoitov, David S. Miller, John Fastabend,
	Alexander Duyck, Jeff Kirsher, David Ahern, Ilias Apalodimas,
	Lorenzo Bianconi, Saeed Mahameed, Tariq Toukan

This driver uses full PAGE_SIZE pages when XDP is enabled.

In case of XDP uses driver uses __bnxt_alloc_rx_page which does full
page DMA-map. Thus, xdp_adjust_tail grow is DMA compliant for XDP_TX
action that does DMA-sync.

Cc: Michael Chan <michael.chan@broadcom.com>
Cc: Andy Gospodarek <andrew.gospodarek@broadcom.com>
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Reviewed-by: Andy Gospodarek <gospo@broadcom.com>
---
 drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c |    1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c b/drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c
index c6f6f2033880..5e3b4a3b69ea 100644
--- a/drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c
+++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c
@@ -138,6 +138,7 @@ bool bnxt_rx_xdp(struct bnxt *bp, struct bnxt_rx_ring_info *rxr, u16 cons,
 	xdp_set_data_meta_invalid(&xdp);
 	xdp.data_end = *data_ptr + *len;
 	xdp.rxq = &rxr->xdp_rxq;
+	xdp.frame_sz = PAGE_SIZE; /* BNXT_RX_PAGE_MODE(bp) when XDP enabled */
 	orig_data = xdp.data;
 
 	rcu_read_lock();



^ permalink raw reply	[flat|nested] 38+ messages in thread

* [PATCH net-next v3 03/33] sfc: add XDP frame size
  2020-05-08 11:08 [PATCH net-next v3 00/33] XDP extend with knowledge of frame size Jesper Dangaard Brouer
  2020-05-08 11:08 ` [PATCH net-next v3 01/33] xdp: add frame size to xdp_buff Jesper Dangaard Brouer
  2020-05-08 11:08 ` [PATCH net-next v3 02/33] bnxt: add XDP frame size to driver Jesper Dangaard Brouer
@ 2020-05-08 11:09 ` Jesper Dangaard Brouer
  2020-05-08 11:09 ` [PATCH net-next v3 04/33] mvneta: add XDP frame size to driver Jesper Dangaard Brouer
                   ` (29 subsequent siblings)
  32 siblings, 0 replies; 38+ messages in thread
From: Jesper Dangaard Brouer @ 2020-05-08 11:09 UTC (permalink / raw)
  To: sameehj
  Cc: Jesper Dangaard Brouer, netdev, bpf,
	Toke Høiland-Jørgensen, Daniel Borkmann,
	Alexei Starovoitov, David S. Miller, John Fastabend,
	Alexander Duyck, Jeff Kirsher, David Ahern, Ilias Apalodimas,
	Lorenzo Bianconi, Saeed Mahameed, Tariq Toukan

This driver uses RX page-split when possible. It was recently fixed
in commit 86e85bf6981c ("sfc: fix XDP-redirect in this driver") to
add needed tailroom for XDP-redirect.

After the fix efx->rx_page_buf_step is the frame size, with enough
head and tail-room for XDP-redirect.

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
---
 drivers/net/ethernet/sfc/rx.c |    1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/net/ethernet/sfc/rx.c b/drivers/net/ethernet/sfc/rx.c
index 260352d97d9d..68c47a8c71df 100644
--- a/drivers/net/ethernet/sfc/rx.c
+++ b/drivers/net/ethernet/sfc/rx.c
@@ -308,6 +308,7 @@ static bool efx_do_xdp(struct efx_nic *efx, struct efx_channel *channel,
 	xdp_set_data_meta_invalid(&xdp);
 	xdp.data_end = xdp.data + rx_buf->len;
 	xdp.rxq = &rx_queue->xdp_rxq_info;
+	xdp.frame_sz = efx->rx_page_buf_step;
 
 	xdp_act = bpf_prog_run_xdp(xdp_prog, &xdp);
 	rcu_read_unlock();



^ permalink raw reply	[flat|nested] 38+ messages in thread

* [PATCH net-next v3 04/33] mvneta: add XDP frame size to driver
  2020-05-08 11:08 [PATCH net-next v3 00/33] XDP extend with knowledge of frame size Jesper Dangaard Brouer
                   ` (2 preceding siblings ...)
  2020-05-08 11:09 ` [PATCH net-next v3 03/33] sfc: add XDP frame size Jesper Dangaard Brouer
@ 2020-05-08 11:09 ` Jesper Dangaard Brouer
  2020-05-08 11:09 ` [PATCH net-next v3 05/33] net: netsec: Add support for XDP frame size Jesper Dangaard Brouer
                   ` (28 subsequent siblings)
  32 siblings, 0 replies; 38+ messages in thread
From: Jesper Dangaard Brouer @ 2020-05-08 11:09 UTC (permalink / raw)
  To: sameehj
  Cc: Jesper Dangaard Brouer, netdev, bpf,
	Toke Høiland-Jørgensen, Daniel Borkmann,
	Alexei Starovoitov, David S. Miller, John Fastabend,
	Alexander Duyck, Jeff Kirsher, David Ahern, Ilias Apalodimas,
	Lorenzo Bianconi, Saeed Mahameed, Tariq Toukan

This marvell driver mvneta uses PAGE_SIZE frames, which makes it
really easy to convert.  Driver updates rxq and now frame_sz
once per NAPI call.

This driver takes advantage of page_pool PP_FLAG_DMA_SYNC_DEV that
can help reduce the number of cache-lines that need to be flushed
when doing DMA sync for_device. Due to xdp_adjust_tail can grow the
area accessible to the by the CPU (can possibly write into), then max
sync length *after* bpf_prog_run_xdp() needs to be taken into account.

For XDP_TX action the driver is smart and does DMA-sync. When growing
tail this is still safe, because page_pool have DMA-mapped the entire
page size.

Cc: thomas.petazzoni@bootlin.com
Acked-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
---
 drivers/net/ethernet/marvell/mvneta.c |   25 +++++++++++++++----------
 1 file changed, 15 insertions(+), 10 deletions(-)

diff --git a/drivers/net/ethernet/marvell/mvneta.c b/drivers/net/ethernet/marvell/mvneta.c
index 51889770958d..37947949345c 100644
--- a/drivers/net/ethernet/marvell/mvneta.c
+++ b/drivers/net/ethernet/marvell/mvneta.c
@@ -2148,12 +2148,17 @@ mvneta_run_xdp(struct mvneta_port *pp, struct mvneta_rx_queue *rxq,
 	       struct bpf_prog *prog, struct xdp_buff *xdp,
 	       struct mvneta_stats *stats)
 {
-	unsigned int len;
+	unsigned int len, sync;
+	struct page *page;
 	u32 ret, act;
 
 	len = xdp->data_end - xdp->data_hard_start - pp->rx_offset_correction;
 	act = bpf_prog_run_xdp(prog, xdp);
 
+	/* Due xdp_adjust_tail: DMA sync for_device cover max len CPU touch */
+	sync = xdp->data_end - xdp->data_hard_start - pp->rx_offset_correction;
+	sync = max(sync, len);
+
 	switch (act) {
 	case XDP_PASS:
 		stats->xdp_pass++;
@@ -2164,9 +2169,8 @@ mvneta_run_xdp(struct mvneta_port *pp, struct mvneta_rx_queue *rxq,
 		err = xdp_do_redirect(pp->dev, xdp, prog);
 		if (unlikely(err)) {
 			ret = MVNETA_XDP_DROPPED;
-			page_pool_put_page(rxq->page_pool,
-					   virt_to_head_page(xdp->data), len,
-					   true);
+			page = virt_to_head_page(xdp->data);
+			page_pool_put_page(rxq->page_pool, page, sync, true);
 		} else {
 			ret = MVNETA_XDP_REDIR;
 			stats->xdp_redirect++;
@@ -2175,10 +2179,10 @@ mvneta_run_xdp(struct mvneta_port *pp, struct mvneta_rx_queue *rxq,
 	}
 	case XDP_TX:
 		ret = mvneta_xdp_xmit_back(pp, xdp);
-		if (ret != MVNETA_XDP_TX)
-			page_pool_put_page(rxq->page_pool,
-					   virt_to_head_page(xdp->data), len,
-					   true);
+		if (ret != MVNETA_XDP_TX) {
+			page = virt_to_head_page(xdp->data);
+			page_pool_put_page(rxq->page_pool, page, sync, true);
+		}
 		break;
 	default:
 		bpf_warn_invalid_xdp_action(act);
@@ -2187,8 +2191,8 @@ mvneta_run_xdp(struct mvneta_port *pp, struct mvneta_rx_queue *rxq,
 		trace_xdp_exception(pp->dev, prog, act);
 		/* fall through */
 	case XDP_DROP:
-		page_pool_put_page(rxq->page_pool,
-				   virt_to_head_page(xdp->data), len, true);
+		page = virt_to_head_page(xdp->data);
+		page_pool_put_page(rxq->page_pool, page, sync, true);
 		ret = MVNETA_XDP_DROPPED;
 		stats->xdp_drop++;
 		break;
@@ -2320,6 +2324,7 @@ static int mvneta_rx_swbm(struct napi_struct *napi,
 	rcu_read_lock();
 	xdp_prog = READ_ONCE(pp->xdp_prog);
 	xdp_buf.rxq = &rxq->xdp_rxq;
+	xdp_buf.frame_sz = PAGE_SIZE;
 
 	/* Fairness NAPI loop */
 	while (rx_proc < budget && rx_proc < rx_todo) {



^ permalink raw reply	[flat|nested] 38+ messages in thread

* [PATCH net-next v3 05/33] net: netsec: Add support for XDP frame size
  2020-05-08 11:08 [PATCH net-next v3 00/33] XDP extend with knowledge of frame size Jesper Dangaard Brouer
                   ` (3 preceding siblings ...)
  2020-05-08 11:09 ` [PATCH net-next v3 04/33] mvneta: add XDP frame size to driver Jesper Dangaard Brouer
@ 2020-05-08 11:09 ` Jesper Dangaard Brouer
  2020-05-08 11:09 ` [PATCH net-next v3 06/33] net: XDP-generic determining " Jesper Dangaard Brouer
                   ` (27 subsequent siblings)
  32 siblings, 0 replies; 38+ messages in thread
From: Jesper Dangaard Brouer @ 2020-05-08 11:09 UTC (permalink / raw)
  To: sameehj
  Cc: Jesper Dangaard Brouer, netdev, bpf,
	Toke Høiland-Jørgensen, Daniel Borkmann,
	Alexei Starovoitov, David S. Miller, John Fastabend,
	Alexander Duyck, Jeff Kirsher, David Ahern, Ilias Apalodimas,
	Lorenzo Bianconi, Saeed Mahameed, Tariq Toukan

From: Ilias Apalodimas <ilias.apalodimas@linaro.org>

This driver takes advantage of page_pool PP_FLAG_DMA_SYNC_DEV that
can help reduce the number of cache-lines that need to be flushed
when doing DMA sync for_device. Due to xdp_adjust_tail can grow the
area accessible to the by the CPU (can possibly write into), then max
sync length *after* bpf_prog_run_xdp() needs to be taken into account.

For XDP_TX action the driver is smart and does DMA-sync. When growing
tail this is still safe, because page_pool have DMA-mapped the entire
page size.

Signed-off-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: Lorenzo Bianconi <lorenzo@kernel.org>
---
 drivers/net/ethernet/socionext/netsec.c |   30 ++++++++++++++++++------------
 1 file changed, 18 insertions(+), 12 deletions(-)

diff --git a/drivers/net/ethernet/socionext/netsec.c b/drivers/net/ethernet/socionext/netsec.c
index a5a0fb60193a..e1f4be4b3d69 100644
--- a/drivers/net/ethernet/socionext/netsec.c
+++ b/drivers/net/ethernet/socionext/netsec.c
@@ -884,23 +884,28 @@ static u32 netsec_run_xdp(struct netsec_priv *priv, struct bpf_prog *prog,
 			  struct xdp_buff *xdp)
 {
 	struct netsec_desc_ring *dring = &priv->desc_ring[NETSEC_RING_RX];
-	unsigned int len = xdp->data_end - xdp->data;
+	unsigned int sync, len = xdp->data_end - xdp->data;
 	u32 ret = NETSEC_XDP_PASS;
+	struct page *page;
 	int err;
 	u32 act;
 
 	act = bpf_prog_run_xdp(prog, xdp);
 
+	/* Due xdp_adjust_tail: DMA sync for_device cover max len CPU touch */
+	sync = xdp->data_end - xdp->data_hard_start - NETSEC_RXBUF_HEADROOM;
+	sync = max(sync, len);
+
 	switch (act) {
 	case XDP_PASS:
 		ret = NETSEC_XDP_PASS;
 		break;
 	case XDP_TX:
 		ret = netsec_xdp_xmit_back(priv, xdp);
-		if (ret != NETSEC_XDP_TX)
-			page_pool_put_page(dring->page_pool,
-					   virt_to_head_page(xdp->data), len,
-					   true);
+		if (ret != NETSEC_XDP_TX) {
+			page = virt_to_head_page(xdp->data);
+			page_pool_put_page(dring->page_pool, page, sync, true);
+		}
 		break;
 	case XDP_REDIRECT:
 		err = xdp_do_redirect(priv->ndev, xdp, prog);
@@ -908,9 +913,8 @@ static u32 netsec_run_xdp(struct netsec_priv *priv, struct bpf_prog *prog,
 			ret = NETSEC_XDP_REDIR;
 		} else {
 			ret = NETSEC_XDP_CONSUMED;
-			page_pool_put_page(dring->page_pool,
-					   virt_to_head_page(xdp->data), len,
-					   true);
+			page = virt_to_head_page(xdp->data);
+			page_pool_put_page(dring->page_pool, page, sync, true);
 		}
 		break;
 	default:
@@ -921,8 +925,8 @@ static u32 netsec_run_xdp(struct netsec_priv *priv, struct bpf_prog *prog,
 		/* fall through -- handle aborts by dropping packet */
 	case XDP_DROP:
 		ret = NETSEC_XDP_CONSUMED;
-		page_pool_put_page(dring->page_pool,
-				   virt_to_head_page(xdp->data), len, true);
+		page = virt_to_head_page(xdp->data);
+		page_pool_put_page(dring->page_pool, page, sync, true);
 		break;
 	}
 
@@ -936,10 +940,14 @@ static int netsec_process_rx(struct netsec_priv *priv, int budget)
 	struct netsec_rx_pkt_info rx_info;
 	enum dma_data_direction dma_dir;
 	struct bpf_prog *xdp_prog;
+	struct xdp_buff xdp;
 	u16 xdp_xmit = 0;
 	u32 xdp_act = 0;
 	int done = 0;
 
+	xdp.rxq = &dring->xdp_rxq;
+	xdp.frame_sz = PAGE_SIZE;
+
 	rcu_read_lock();
 	xdp_prog = READ_ONCE(priv->xdp_prog);
 	dma_dir = page_pool_get_dma_dir(dring->page_pool);
@@ -953,7 +961,6 @@ static int netsec_process_rx(struct netsec_priv *priv, int budget)
 		struct sk_buff *skb = NULL;
 		u16 pkt_len, desc_len;
 		dma_addr_t dma_handle;
-		struct xdp_buff xdp;
 		void *buf_addr;
 
 		if (de->attr & (1U << NETSEC_RX_PKT_OWN_FIELD)) {
@@ -1002,7 +1009,6 @@ static int netsec_process_rx(struct netsec_priv *priv, int budget)
 		xdp.data = desc->addr + NETSEC_RXBUF_HEADROOM;
 		xdp_set_data_meta_invalid(&xdp);
 		xdp.data_end = xdp.data + pkt_len;
-		xdp.rxq = &dring->xdp_rxq;
 
 		if (xdp_prog) {
 			xdp_result = netsec_run_xdp(priv, xdp_prog, &xdp);



^ permalink raw reply	[flat|nested] 38+ messages in thread

* [PATCH net-next v3 06/33] net: XDP-generic determining XDP frame size
  2020-05-08 11:08 [PATCH net-next v3 00/33] XDP extend with knowledge of frame size Jesper Dangaard Brouer
                   ` (4 preceding siblings ...)
  2020-05-08 11:09 ` [PATCH net-next v3 05/33] net: netsec: Add support for XDP frame size Jesper Dangaard Brouer
@ 2020-05-08 11:09 ` Jesper Dangaard Brouer
  2020-05-08 11:09 ` [PATCH net-next v3 07/33] xdp: xdp_frame add member frame_sz and handle in convert_to_xdp_frame Jesper Dangaard Brouer
                   ` (26 subsequent siblings)
  32 siblings, 0 replies; 38+ messages in thread
From: Jesper Dangaard Brouer @ 2020-05-08 11:09 UTC (permalink / raw)
  To: sameehj
  Cc: Jesper Dangaard Brouer, netdev, bpf,
	Toke Høiland-Jørgensen, Daniel Borkmann,
	Alexei Starovoitov, David S. Miller, John Fastabend,
	Alexander Duyck, Jeff Kirsher, David Ahern, Ilias Apalodimas,
	Lorenzo Bianconi, Saeed Mahameed, Tariq Toukan

The SKB "head" pointer points to the data area that contains
skb_shared_info, that can be found via skb_end_pointer(). Given
xdp->data_hard_start have been established (basically pointing to
skb->head), frame size is between skb_end_pointer() and data_hard_start,
plus the size reserved to skb_shared_info.

Change the bpf_xdp_adjust_tail offset adjust of skb->len, to be a positive
offset number on grow, and negative number on shrink.  As this seems more
natural when reading the code.

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
---
 net/core/dev.c |   14 ++++++++------
 1 file changed, 8 insertions(+), 6 deletions(-)

diff --git a/net/core/dev.c b/net/core/dev.c
index f8d83922a6af..c347c36f919a 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -4617,6 +4617,11 @@ static u32 netif_receive_generic_xdp(struct sk_buff *skb,
 	xdp->data_meta = xdp->data;
 	xdp->data_end = xdp->data + hlen;
 	xdp->data_hard_start = skb->data - skb_headroom(skb);
+
+	/* SKB "head" area always have tailroom for skb_shared_info */
+	xdp->frame_sz  = (void *)skb_end_pointer(skb) - xdp->data_hard_start;
+	xdp->frame_sz += SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
+
 	orig_data_end = xdp->data_end;
 	orig_data = xdp->data;
 	eth = (struct ethhdr *)xdp->data;
@@ -4640,14 +4645,11 @@ static u32 netif_receive_generic_xdp(struct sk_buff *skb,
 		skb_reset_network_header(skb);
 	}
 
-	/* check if bpf_xdp_adjust_tail was used. it can only "shrink"
-	 * pckt.
-	 */
-	off = orig_data_end - xdp->data_end;
+	/* check if bpf_xdp_adjust_tail was used */
+	off = xdp->data_end - orig_data_end;
 	if (off != 0) {
 		skb_set_tail_pointer(skb, xdp->data_end - xdp->data);
-		skb->len -= off;
-
+		skb->len += off; /* positive on grow, negative on shrink */
 	}
 
 	/* check if XDP changed eth hdr such SKB needs update */



^ permalink raw reply	[flat|nested] 38+ messages in thread

* [PATCH net-next v3 07/33] xdp: xdp_frame add member frame_sz and handle in convert_to_xdp_frame
  2020-05-08 11:08 [PATCH net-next v3 00/33] XDP extend with knowledge of frame size Jesper Dangaard Brouer
                   ` (5 preceding siblings ...)
  2020-05-08 11:09 ` [PATCH net-next v3 06/33] net: XDP-generic determining " Jesper Dangaard Brouer
@ 2020-05-08 11:09 ` Jesper Dangaard Brouer
  2020-05-08 16:00   ` Jakub Kicinski
  2020-05-08 11:09 ` [PATCH net-next v3 08/33] xdp: cpumap redirect use frame_sz and increase skb_tailroom Jesper Dangaard Brouer
                   ` (25 subsequent siblings)
  32 siblings, 1 reply; 38+ messages in thread
From: Jesper Dangaard Brouer @ 2020-05-08 11:09 UTC (permalink / raw)
  To: sameehj
  Cc: Jesper Dangaard Brouer, netdev, bpf,
	Toke Høiland-Jørgensen, Daniel Borkmann,
	Alexei Starovoitov, David S. Miller, John Fastabend,
	Alexander Duyck, Jeff Kirsher, David Ahern, Ilias Apalodimas,
	Lorenzo Bianconi, Saeed Mahameed, Tariq Toukan

Use hole in struct xdp_frame, when adding member frame_sz, which keeps
same sizeof struct (32 bytes)

Drivers ixgbe and sfc had bug cases where the necessary/expected
tailroom was not reserved. This can lead to some hard to catch memory
corruption issues. Having the drivers frame_sz this can be detected when
packet length/end via xdp->data_end exceed the xdp_data_hard_end
pointer, which accounts for the reserved the tailroom.

When detecting this driver issue, simply fail the conversion with NULL,
which results in feedback to driver (failing xdp_do_redirect()) causing
driver to drop packet. Given the lack of consistent XDP stats, this can
be hard to troubleshoot. And given this is a driver bug, we want to
generate some more noise in form of a WARN stack dump (to ID the driver
code that inlined convert_to_xdp_frame).

Inlining the WARN macro is problematic, because it adds an asm
instruction (on Intel CPUs ud2) what influence instruction cache
prefetching. Thus, introduce xdp_warn and macro XDP_WARN, to avoid this
and at the same time make identifying the function and line of this
inlined function easier.

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
---
 include/net/xdp.h |   14 +++++++++++++-
 net/core/xdp.c    |    7 +++++++
 2 files changed, 20 insertions(+), 1 deletion(-)

diff --git a/include/net/xdp.h b/include/net/xdp.h
index a764af4ae0ea..1366466868e4 100644
--- a/include/net/xdp.h
+++ b/include/net/xdp.h
@@ -89,7 +89,8 @@ struct xdp_frame {
 	void *data;
 	u16 len;
 	u16 headroom;
-	u16 metasize;
+	u32 metasize:8;
+	u32 frame_sz:24;
 	/* Lifetime of xdp_rxq_info is limited to NAPI/enqueue time,
 	 * while mem info is valid on remote CPU.
 	 */
@@ -104,6 +105,10 @@ static inline void xdp_scrub_frame(struct xdp_frame *frame)
 	frame->dev_rx = NULL;
 }
 
+/* Avoids inlining WARN macro in fast-path */
+void xdp_warn(const char* msg, const char* func, const int line);
+#define XDP_WARN(msg) xdp_warn(msg, __func__, __LINE__)
+
 struct xdp_frame *xdp_convert_zc_to_xdp_frame(struct xdp_buff *xdp);
 
 /* Convert xdp_buff to xdp_frame */
@@ -124,6 +129,12 @@ struct xdp_frame *convert_to_xdp_frame(struct xdp_buff *xdp)
 	if (unlikely((headroom - metasize) < sizeof(*xdp_frame)))
 		return NULL;
 
+	/* Catch if driver didn't reserve tailroom for skb_shared_info */
+	if (unlikely(xdp->data_end > xdp_data_hard_end(xdp))) {
+		XDP_WARN("Driver BUG: missing reserved tailroom");
+		return NULL;
+	}
+
 	/* Store info in top of packet */
 	xdp_frame = xdp->data_hard_start;
 
@@ -131,6 +142,7 @@ struct xdp_frame *convert_to_xdp_frame(struct xdp_buff *xdp)
 	xdp_frame->len  = xdp->data_end - xdp->data;
 	xdp_frame->headroom = headroom - sizeof(*xdp_frame);
 	xdp_frame->metasize = metasize;
+	xdp_frame->frame_sz = xdp->frame_sz;
 
 	/* rxq only valid until napi_schedule ends, convert to xdp_mem_info */
 	xdp_frame->mem = xdp->rxq->mem;
diff --git a/net/core/xdp.c b/net/core/xdp.c
index 4c7ea85486af..4bc3026ae218 100644
--- a/net/core/xdp.c
+++ b/net/core/xdp.c
@@ -11,6 +11,7 @@
 #include <linux/slab.h>
 #include <linux/idr.h>
 #include <linux/rhashtable.h>
+#include <linux/bug.h>
 #include <net/page_pool.h>
 
 #include <net/xdp.h>
@@ -496,3 +497,9 @@ struct xdp_frame *xdp_convert_zc_to_xdp_frame(struct xdp_buff *xdp)
 	return xdpf;
 }
 EXPORT_SYMBOL_GPL(xdp_convert_zc_to_xdp_frame);
+
+/* Used by XDP_WARN macro, to avoid inlining WARN() in fast-path */
+void xdp_warn(const char* msg, const char* func, const int line) {
+	WARN(1, "XDP_WARN: %s(line:%d): %s\n", func, line, msg);
+};
+EXPORT_SYMBOL_GPL(xdp_warn);



^ permalink raw reply	[flat|nested] 38+ messages in thread

* [PATCH net-next v3 08/33] xdp: cpumap redirect use frame_sz and increase skb_tailroom
  2020-05-08 11:08 [PATCH net-next v3 00/33] XDP extend with knowledge of frame size Jesper Dangaard Brouer
                   ` (6 preceding siblings ...)
  2020-05-08 11:09 ` [PATCH net-next v3 07/33] xdp: xdp_frame add member frame_sz and handle in convert_to_xdp_frame Jesper Dangaard Brouer
@ 2020-05-08 11:09 ` Jesper Dangaard Brouer
  2020-05-08 11:09 ` [PATCH net-next v3 09/33] veth: adjust hard_start offset on redirect XDP frames Jesper Dangaard Brouer
                   ` (24 subsequent siblings)
  32 siblings, 0 replies; 38+ messages in thread
From: Jesper Dangaard Brouer @ 2020-05-08 11:09 UTC (permalink / raw)
  To: sameehj
  Cc: Jesper Dangaard Brouer, netdev, bpf,
	Toke Høiland-Jørgensen, Daniel Borkmann,
	Alexei Starovoitov, David S. Miller, John Fastabend,
	Alexander Duyck, Jeff Kirsher, David Ahern, Ilias Apalodimas,
	Lorenzo Bianconi, Saeed Mahameed, Tariq Toukan

Knowing the memory size backing the packet/xdp_frame data area, and
knowing it already have reserved room for skb_shared_info, simplifies
using build_skb significantly.

With this change we no-longer lie about the SKB truesize, but more
importantly a significant larger skb_tailroom is now provided, e.g. when
drivers uses a full PAGE_SIZE. This extra tailroom (in linear area) can be
used by the network stack when coalescing SKBs (e.g. in skb_try_coalesce,
see TCP cases where tcp_queue_rcv() can 'eat' skb).

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
---
 kernel/bpf/cpumap.c |   21 +++------------------
 1 file changed, 3 insertions(+), 18 deletions(-)

diff --git a/kernel/bpf/cpumap.c b/kernel/bpf/cpumap.c
index 3fe0b006d2d2..a71790dab12d 100644
--- a/kernel/bpf/cpumap.c
+++ b/kernel/bpf/cpumap.c
@@ -162,25 +162,10 @@ static struct sk_buff *cpu_map_build_skb(struct bpf_cpu_map_entry *rcpu,
 	/* Part of headroom was reserved to xdpf */
 	hard_start_headroom = sizeof(struct xdp_frame) +  xdpf->headroom;
 
-	/* build_skb need to place skb_shared_info after SKB end, and
-	 * also want to know the memory "truesize".  Thus, need to
-	 * know the memory frame size backing xdp_buff.
-	 *
-	 * XDP was designed to have PAGE_SIZE frames, but this
-	 * assumption is not longer true with ixgbe and i40e.  It
-	 * would be preferred to set frame_size to 2048 or 4096
-	 * depending on the driver.
-	 *   frame_size = 2048;
-	 *   frame_len  = frame_size - sizeof(*xdp_frame);
-	 *
-	 * Instead, with info avail, skb_shared_info in placed after
-	 * packet len.  This, unfortunately fakes the truesize.
-	 * Another disadvantage of this approach, the skb_shared_info
-	 * is not at a fixed memory location, with mixed length
-	 * packets, which is bad for cache-line hotness.
+	/* Memory size backing xdp_frame data already have reserved
+	 * room for build_skb to place skb_shared_info in tailroom.
 	 */
-	frame_size = SKB_DATA_ALIGN(xdpf->len + hard_start_headroom) +
-		SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
+	frame_size = xdpf->frame_sz;
 
 	pkt_data_start = xdpf->data - hard_start_headroom;
 	skb = build_skb_around(skb, pkt_data_start, frame_size);



^ permalink raw reply	[flat|nested] 38+ messages in thread

* [PATCH net-next v3 09/33] veth: adjust hard_start offset on redirect XDP frames
  2020-05-08 11:08 [PATCH net-next v3 00/33] XDP extend with knowledge of frame size Jesper Dangaard Brouer
                   ` (7 preceding siblings ...)
  2020-05-08 11:09 ` [PATCH net-next v3 08/33] xdp: cpumap redirect use frame_sz and increase skb_tailroom Jesper Dangaard Brouer
@ 2020-05-08 11:09 ` Jesper Dangaard Brouer
  2020-05-08 11:09 ` [PATCH net-next v3 10/33] veth: xdp using frame_sz in veth driver Jesper Dangaard Brouer
                   ` (23 subsequent siblings)
  32 siblings, 0 replies; 38+ messages in thread
From: Jesper Dangaard Brouer @ 2020-05-08 11:09 UTC (permalink / raw)
  To: sameehj
  Cc: Jesper Dangaard Brouer, netdev, bpf,
	Toke Høiland-Jørgensen, Daniel Borkmann,
	Alexei Starovoitov, David S. Miller, John Fastabend,
	Alexander Duyck, Jeff Kirsher, David Ahern, Ilias Apalodimas,
	Lorenzo Bianconi, Saeed Mahameed, Tariq Toukan

When native XDP redirect into a veth device, the frame arrives in the
xdp_frame structure. It is then processed in veth_xdp_rcv_one(),
which can run a new XDP bpf_prog on the packet. Doing so requires
converting xdp_frame to xdp_buff, but the tricky part is that
xdp_frame memory area is located in the top (data_hard_start) memory
area that xdp_buff will point into.

The current code tried to protect the xdp_frame area, by assigning
xdp_buff.data_hard_start past this memory. This results in 32 bytes
less headroom to expand into via BPF-helper bpf_xdp_adjust_head().

This protect step is actually not needed, because BPF-helper
bpf_xdp_adjust_head() already reserve this area, and don't allow
BPF-prog to expand into it. Thus, it is safe to point data_hard_start
directly at xdp_frame memory area.

Cc: Toshiaki Makita <toshiaki.makita1@gmail.com>
Fixes: 9fc8d518d9d5 ("veth: Handle xdp_frames in xdp napi ring")
Reported-by: Mao Wenan <maowenan@huawei.com>
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: Toshiaki Makita <toshiaki.makita1@gmail.com>
Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
---
 drivers/net/veth.c |    8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/net/veth.c b/drivers/net/veth.c
index aece0e5eec8c..d5691bb84448 100644
--- a/drivers/net/veth.c
+++ b/drivers/net/veth.c
@@ -564,13 +564,15 @@ static struct sk_buff *veth_xdp_rcv_one(struct veth_rq *rq,
 					struct veth_stats *stats)
 {
 	void *hard_start = frame->data - frame->headroom;
-	void *head = hard_start - sizeof(struct xdp_frame);
 	int len = frame->len, delta = 0;
 	struct xdp_frame orig_frame;
 	struct bpf_prog *xdp_prog;
 	unsigned int headroom;
 	struct sk_buff *skb;
 
+	/* bpf_xdp_adjust_head() assures BPF cannot access xdp_frame area */
+	hard_start -= sizeof(struct xdp_frame);
+
 	rcu_read_lock();
 	xdp_prog = rcu_dereference(rq->xdp_prog);
 	if (likely(xdp_prog)) {
@@ -592,7 +594,6 @@ static struct sk_buff *veth_xdp_rcv_one(struct veth_rq *rq,
 			break;
 		case XDP_TX:
 			orig_frame = *frame;
-			xdp.data_hard_start = head;
 			xdp.rxq->mem = frame->mem;
 			if (unlikely(veth_xdp_tx(rq, &xdp, bq) < 0)) {
 				trace_xdp_exception(rq->dev, xdp_prog, act);
@@ -605,7 +606,6 @@ static struct sk_buff *veth_xdp_rcv_one(struct veth_rq *rq,
 			goto xdp_xmit;
 		case XDP_REDIRECT:
 			orig_frame = *frame;
-			xdp.data_hard_start = head;
 			xdp.rxq->mem = frame->mem;
 			if (xdp_do_redirect(rq->dev, &xdp, xdp_prog)) {
 				frame = &orig_frame;
@@ -629,7 +629,7 @@ static struct sk_buff *veth_xdp_rcv_one(struct veth_rq *rq,
 	rcu_read_unlock();
 
 	headroom = sizeof(struct xdp_frame) + frame->headroom - delta;
-	skb = veth_build_skb(head, headroom, len, 0);
+	skb = veth_build_skb(hard_start, headroom, len, 0);
 	if (!skb) {
 		xdp_return_frame(frame);
 		stats->rx_drops++;



^ permalink raw reply	[flat|nested] 38+ messages in thread

* [PATCH net-next v3 10/33] veth: xdp using frame_sz in veth driver
  2020-05-08 11:08 [PATCH net-next v3 00/33] XDP extend with knowledge of frame size Jesper Dangaard Brouer
                   ` (8 preceding siblings ...)
  2020-05-08 11:09 ` [PATCH net-next v3 09/33] veth: adjust hard_start offset on redirect XDP frames Jesper Dangaard Brouer
@ 2020-05-08 11:09 ` Jesper Dangaard Brouer
  2020-05-08 11:09 ` [PATCH net-next v3 11/33] dpaa2-eth: add XDP frame size Jesper Dangaard Brouer
                   ` (22 subsequent siblings)
  32 siblings, 0 replies; 38+ messages in thread
From: Jesper Dangaard Brouer @ 2020-05-08 11:09 UTC (permalink / raw)
  To: sameehj
  Cc: Jesper Dangaard Brouer, netdev, bpf,
	Toke Høiland-Jørgensen, Daniel Borkmann,
	Alexei Starovoitov, David S. Miller, John Fastabend,
	Alexander Duyck, Jeff Kirsher, David Ahern, Ilias Apalodimas,
	Lorenzo Bianconi, Saeed Mahameed, Tariq Toukan

The veth driver can run XDP in "native" mode in it's own NAPI
handler, and since commit 9fc8d518d9d5 ("veth: Handle xdp_frames in
xdp napi ring") packets can come in two forms either xdp_frame or
skb, calling respectively veth_xdp_rcv_one() or veth_xdp_rcv_skb().

For packets to arrive in xdp_frame format, they will have been
redirected from an XDP native driver. In case of XDP_PASS or no
XDP-prog attached, the veth driver will allocate and create an SKB.

The current code in veth_xdp_rcv_one() xdp_frame case, had to guess
the frame truesize of the incoming xdp_frame, when using
veth_build_skb(). With xdp_frame->frame_sz this is not longer
necessary.

Calculating the frame_sz in veth_xdp_rcv_skb() skb case, is done
similar to the XDP-generic handling code in net/core/dev.c.

Cc: Toshiaki Makita <toshiaki.makita1@gmail.com>
Reviewed-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
Acked-by: Toshiaki Makita <toshiaki.makita1@gmail.com>
---
 drivers/net/veth.c |   22 +++++++++++++---------
 1 file changed, 13 insertions(+), 9 deletions(-)

diff --git a/drivers/net/veth.c b/drivers/net/veth.c
index d5691bb84448..b586d2fa5551 100644
--- a/drivers/net/veth.c
+++ b/drivers/net/veth.c
@@ -405,10 +405,6 @@ static struct sk_buff *veth_build_skb(void *head, int headroom, int len,
 {
 	struct sk_buff *skb;
 
-	if (!buflen) {
-		buflen = SKB_DATA_ALIGN(headroom + len) +
-			 SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
-	}
 	skb = build_skb(head, buflen);
 	if (!skb)
 		return NULL;
@@ -583,6 +579,7 @@ static struct sk_buff *veth_xdp_rcv_one(struct veth_rq *rq,
 		xdp.data = frame->data;
 		xdp.data_end = frame->data + frame->len;
 		xdp.data_meta = frame->data - frame->metasize;
+		xdp.frame_sz = frame->frame_sz;
 		xdp.rxq = &rq->xdp_rxq;
 
 		act = bpf_prog_run_xdp(xdp_prog, &xdp);
@@ -629,7 +626,7 @@ static struct sk_buff *veth_xdp_rcv_one(struct veth_rq *rq,
 	rcu_read_unlock();
 
 	headroom = sizeof(struct xdp_frame) + frame->headroom - delta;
-	skb = veth_build_skb(hard_start, headroom, len, 0);
+	skb = veth_build_skb(hard_start, headroom, len, frame->frame_sz);
 	if (!skb) {
 		xdp_return_frame(frame);
 		stats->rx_drops++;
@@ -695,9 +692,8 @@ static struct sk_buff *veth_xdp_rcv_skb(struct veth_rq *rq,
 			goto drop;
 		}
 
-		nskb = veth_build_skb(head,
-				      VETH_XDP_HEADROOM + mac_len, skb->len,
-				      PAGE_SIZE);
+		nskb = veth_build_skb(head, VETH_XDP_HEADROOM + mac_len,
+				      skb->len, PAGE_SIZE);
 		if (!nskb) {
 			page_frag_free(head);
 			goto drop;
@@ -715,6 +711,11 @@ static struct sk_buff *veth_xdp_rcv_skb(struct veth_rq *rq,
 	xdp.data_end = xdp.data + pktlen;
 	xdp.data_meta = xdp.data;
 	xdp.rxq = &rq->xdp_rxq;
+
+	/* SKB "head" area always have tailroom for skb_shared_info */
+	xdp.frame_sz = (void *)skb_end_pointer(skb) - xdp.data_hard_start;
+	xdp.frame_sz += SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
+
 	orig_data = xdp.data;
 	orig_data_end = xdp.data_end;
 
@@ -758,6 +759,7 @@ static struct sk_buff *veth_xdp_rcv_skb(struct veth_rq *rq,
 	}
 	rcu_read_unlock();
 
+	/* check if bpf_xdp_adjust_head was used */
 	delta = orig_data - xdp.data;
 	off = mac_len + delta;
 	if (off > 0)
@@ -765,9 +767,11 @@ static struct sk_buff *veth_xdp_rcv_skb(struct veth_rq *rq,
 	else if (off < 0)
 		__skb_pull(skb, -off);
 	skb->mac_header -= delta;
+
+	/* check if bpf_xdp_adjust_tail was used */
 	off = xdp.data_end - orig_data_end;
 	if (off != 0)
-		__skb_put(skb, off);
+		__skb_put(skb, off); /* positive on grow, negative on shrink */
 	skb->protocol = eth_type_trans(skb, rq->dev);
 
 	metalen = xdp.data - xdp.data_meta;



^ permalink raw reply	[flat|nested] 38+ messages in thread

* [PATCH net-next v3 11/33] dpaa2-eth: add XDP frame size
  2020-05-08 11:08 [PATCH net-next v3 00/33] XDP extend with knowledge of frame size Jesper Dangaard Brouer
                   ` (9 preceding siblings ...)
  2020-05-08 11:09 ` [PATCH net-next v3 10/33] veth: xdp using frame_sz in veth driver Jesper Dangaard Brouer
@ 2020-05-08 11:09 ` Jesper Dangaard Brouer
  2020-05-08 11:09 ` [PATCH net-next v3 12/33] hv_netvsc: add XDP frame size to driver Jesper Dangaard Brouer
                   ` (21 subsequent siblings)
  32 siblings, 0 replies; 38+ messages in thread
From: Jesper Dangaard Brouer @ 2020-05-08 11:09 UTC (permalink / raw)
  To: sameehj
  Cc: Jesper Dangaard Brouer, netdev, bpf,
	Toke Høiland-Jørgensen, Daniel Borkmann,
	Alexei Starovoitov, David S. Miller, John Fastabend,
	Alexander Duyck, Jeff Kirsher, David Ahern, Ilias Apalodimas,
	Lorenzo Bianconi, Saeed Mahameed, Tariq Toukan

The dpaa2-eth driver reserve some headroom used for hardware and
software annotation area in RX/TX buffers. Thus, xdp.data_hard_start
doesn't start at page boundary.

When XDP is configured the area reserved via dpaa2_fd_get_offset(fd) is
448 bytes of which XDP have reserved 256 bytes. As frame_sz is
calculated as an offset from xdp_buff.data_hard_start, an adjust from
the full PAGE_SIZE == DPAA2_ETH_RX_BUF_RAW_SIZE.

When doing XDP_REDIRECT, the driver doesn't need this reserved headroom
any-longer and allows xdp_do_redirect() to use it. This is an advantage
for the drivers own ndo-xdp_xmit, as it uses part of this headroom for
itself.  Patch also adjust frame_sz in this case.

The driver cannot support XDP data_meta, because it uses the headroom
just before xdp.data for struct dpaa2_eth_swa (DPAA2_ETH_SWA_SIZE=64),
when transmitting the packet. When transmitting a xdp_frame in
dpaa2_eth_xdp_xmit_frame (call via ndo_xdp_xmit) is uses this area to
store a pointer to xdp_frame and dma_size, which is used in TX
completion (free_tx_fd) to return frame via xdp_return_frame().

Cc: Ioana Radulescu <ruxandra.radulescu@nxp.com>
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
---
 drivers/net/ethernet/freescale/dpaa2/dpaa2-eth.c |    7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/drivers/net/ethernet/freescale/dpaa2/dpaa2-eth.c b/drivers/net/ethernet/freescale/dpaa2/dpaa2-eth.c
index 0f3e842a4fd6..8c8d95aa1dfd 100644
--- a/drivers/net/ethernet/freescale/dpaa2/dpaa2-eth.c
+++ b/drivers/net/ethernet/freescale/dpaa2/dpaa2-eth.c
@@ -331,6 +331,9 @@ static u32 run_xdp(struct dpaa2_eth_priv *priv,
 	xdp_set_data_meta_invalid(&xdp);
 	xdp.rxq = &ch->xdp_rxq;
 
+	xdp.frame_sz = DPAA2_ETH_RX_BUF_RAW_SIZE -
+		(dpaa2_fd_get_offset(fd) - XDP_PACKET_HEADROOM);
+
 	xdp_act = bpf_prog_run_xdp(xdp_prog, &xdp);
 
 	/* xdp.data pointer may have changed */
@@ -366,7 +369,11 @@ static u32 run_xdp(struct dpaa2_eth_priv *priv,
 		dma_unmap_page(priv->net_dev->dev.parent, addr,
 			       DPAA2_ETH_RX_BUF_SIZE, DMA_BIDIRECTIONAL);
 		ch->buf_count--;
+
+		/* Allow redirect use of full headroom */
 		xdp.data_hard_start = vaddr;
+		xdp.frame_sz = DPAA2_ETH_RX_BUF_RAW_SIZE;
+
 		err = xdp_do_redirect(priv->net_dev, &xdp, xdp_prog);
 		if (unlikely(err))
 			ch->stats.xdp_drop++;



^ permalink raw reply	[flat|nested] 38+ messages in thread

* [PATCH net-next v3 12/33] hv_netvsc: add XDP frame size to driver
  2020-05-08 11:08 [PATCH net-next v3 00/33] XDP extend with knowledge of frame size Jesper Dangaard Brouer
                   ` (10 preceding siblings ...)
  2020-05-08 11:09 ` [PATCH net-next v3 11/33] dpaa2-eth: add XDP frame size Jesper Dangaard Brouer
@ 2020-05-08 11:09 ` Jesper Dangaard Brouer
  2020-05-08 11:09 ` [PATCH net-next v3 13/33] qlogic/qede: " Jesper Dangaard Brouer
                   ` (20 subsequent siblings)
  32 siblings, 0 replies; 38+ messages in thread
From: Jesper Dangaard Brouer @ 2020-05-08 11:09 UTC (permalink / raw)
  To: sameehj
  Cc: Jesper Dangaard Brouer, netdev, bpf,
	Toke Høiland-Jørgensen, Daniel Borkmann,
	Alexei Starovoitov, David S. Miller, John Fastabend,
	Alexander Duyck, Jeff Kirsher, David Ahern, Ilias Apalodimas,
	Lorenzo Bianconi, Saeed Mahameed, Tariq Toukan

The hyperv NIC driver does memory allocation and copy even without XDP.
In XDP mode it will allocate a new page for each packet and copy over
the payload, before invoking the XDP BPF-prog.

The positive thing it that its easy to determine the xdp.frame_sz.

The XDP implementation for hv_netvsc transparently passes xdp_prog
to the associated VF NIC. Many of the Azure VMs are using SRIOV, so
majority of the data are actually processed directly on the VF driver's XDP
path. So the overhead of the synthetic data path (hv_netvsc) is minimal.

Then XDP is enabled on this driver, XDP_PASS and XDP_TX will create the
SKB via build_skb (based on the newly allocated page). Now using XDP
frame_sz this will provide more skb_tailroom, which netstack can use for
SKB coalescing (e.g tcp_try_coalesce -> skb_try_coalesce).

V3: Adjust patch desc to be more positive.

Cc: Wei Liu <wei.liu@kernel.org>
Cc: "K. Y. Srinivasan" <kys@microsoft.com>
Cc: Haiyang Zhang <haiyangz@microsoft.com>
Cc: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
---
 drivers/net/hyperv/netvsc_bpf.c |    1 +
 drivers/net/hyperv/netvsc_drv.c |    2 +-
 2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/net/hyperv/netvsc_bpf.c b/drivers/net/hyperv/netvsc_bpf.c
index b86611041db6..1e0c024b0a93 100644
--- a/drivers/net/hyperv/netvsc_bpf.c
+++ b/drivers/net/hyperv/netvsc_bpf.c
@@ -49,6 +49,7 @@ u32 netvsc_run_xdp(struct net_device *ndev, struct netvsc_channel *nvchan,
 	xdp_set_data_meta_invalid(xdp);
 	xdp->data_end = xdp->data + len;
 	xdp->rxq = &nvchan->xdp_rxq;
+	xdp->frame_sz = PAGE_SIZE;
 	xdp->handle = 0;
 
 	memcpy(xdp->data, data, len);
diff --git a/drivers/net/hyperv/netvsc_drv.c b/drivers/net/hyperv/netvsc_drv.c
index 5de57fc3ec60..6267f706e8ee 100644
--- a/drivers/net/hyperv/netvsc_drv.c
+++ b/drivers/net/hyperv/netvsc_drv.c
@@ -795,7 +795,7 @@ static struct sk_buff *netvsc_alloc_recv_skb(struct net_device *net,
 	if (xbuf) {
 		unsigned int hdroom = xdp->data - xdp->data_hard_start;
 		unsigned int xlen = xdp->data_end - xdp->data;
-		unsigned int frag_size = netvsc_xdp_fraglen(hdroom + xlen);
+		unsigned int frag_size = xdp->frame_sz;
 
 		skb = build_skb(xbuf, frag_size);
 



^ permalink raw reply	[flat|nested] 38+ messages in thread

* [PATCH net-next v3 13/33] qlogic/qede: add XDP frame size to driver
  2020-05-08 11:08 [PATCH net-next v3 00/33] XDP extend with knowledge of frame size Jesper Dangaard Brouer
                   ` (11 preceding siblings ...)
  2020-05-08 11:09 ` [PATCH net-next v3 12/33] hv_netvsc: add XDP frame size to driver Jesper Dangaard Brouer
@ 2020-05-08 11:09 ` Jesper Dangaard Brouer
  2020-05-08 11:09 ` [PATCH net-next v3 14/33] net: ethernet: ti: add XDP frame size to driver cpsw Jesper Dangaard Brouer
                   ` (19 subsequent siblings)
  32 siblings, 0 replies; 38+ messages in thread
From: Jesper Dangaard Brouer @ 2020-05-08 11:09 UTC (permalink / raw)
  To: sameehj
  Cc: Jesper Dangaard Brouer, netdev, bpf,
	Toke Høiland-Jørgensen, Daniel Borkmann,
	Alexei Starovoitov, David S. Miller, John Fastabend,
	Alexander Duyck, Jeff Kirsher, David Ahern, Ilias Apalodimas,
	Lorenzo Bianconi, Saeed Mahameed, Tariq Toukan

The driver qede uses a full page, when XDP is enabled. The drivers value
in rx_buf_seg_size (struct qede_rx_queue) will be PAGE_SIZE when an
XDP bpf_prog is attached.

Cc: Ariel Elior <aelior@marvell.com>
Cc: GR-everest-linux-l2@marvell.com
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
---
 drivers/net/ethernet/qlogic/qede/qede_fp.c   |    1 +
 drivers/net/ethernet/qlogic/qede/qede_main.c |    2 +-
 2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/qlogic/qede/qede_fp.c b/drivers/net/ethernet/qlogic/qede/qede_fp.c
index c6c20776b474..7598ebe0962a 100644
--- a/drivers/net/ethernet/qlogic/qede/qede_fp.c
+++ b/drivers/net/ethernet/qlogic/qede/qede_fp.c
@@ -1066,6 +1066,7 @@ static bool qede_rx_xdp(struct qede_dev *edev,
 	xdp_set_data_meta_invalid(&xdp);
 	xdp.data_end = xdp.data + *len;
 	xdp.rxq = &rxq->xdp_rxq;
+	xdp.frame_sz = rxq->rx_buf_seg_size; /* PAGE_SIZE when XDP enabled */
 
 	/* Queues always have a full reset currently, so for the time
 	 * being until there's atomic program replace just mark read
diff --git a/drivers/net/ethernet/qlogic/qede/qede_main.c b/drivers/net/ethernet/qlogic/qede/qede_main.c
index 256506024b88..2bc153d5ce03 100644
--- a/drivers/net/ethernet/qlogic/qede/qede_main.c
+++ b/drivers/net/ethernet/qlogic/qede/qede_main.c
@@ -1418,7 +1418,7 @@ static int qede_alloc_mem_rxq(struct qede_dev *edev, struct qede_rx_queue *rxq)
 	if (rxq->rx_buf_size + size > PAGE_SIZE)
 		rxq->rx_buf_size = PAGE_SIZE - size;
 
-	/* Segment size to spilt a page in multiple equal parts ,
+	/* Segment size to split a page in multiple equal parts,
 	 * unless XDP is used in which case we'd use the entire page.
 	 */
 	if (!edev->xdp_prog) {



^ permalink raw reply	[flat|nested] 38+ messages in thread

* [PATCH net-next v3 14/33] net: ethernet: ti: add XDP frame size to driver cpsw
  2020-05-08 11:08 [PATCH net-next v3 00/33] XDP extend with knowledge of frame size Jesper Dangaard Brouer
                   ` (12 preceding siblings ...)
  2020-05-08 11:09 ` [PATCH net-next v3 13/33] qlogic/qede: " Jesper Dangaard Brouer
@ 2020-05-08 11:09 ` Jesper Dangaard Brouer
  2020-05-08 11:10 ` [PATCH net-next v3 15/33] ena: add XDP frame size to amazon NIC driver Jesper Dangaard Brouer
                   ` (18 subsequent siblings)
  32 siblings, 0 replies; 38+ messages in thread
From: Jesper Dangaard Brouer @ 2020-05-08 11:09 UTC (permalink / raw)
  To: sameehj
  Cc: Jesper Dangaard Brouer, netdev, bpf,
	Toke Høiland-Jørgensen, Daniel Borkmann,
	Alexei Starovoitov, David S. Miller, John Fastabend,
	Alexander Duyck, Jeff Kirsher, David Ahern, Ilias Apalodimas,
	Lorenzo Bianconi, Saeed Mahameed, Tariq Toukan

The driver code cpsw.c and cpsw_new.c both use page_pool
with default order-0 pages or their RX-pages.

Cc: Grygorii Strashko <grygorii.strashko@ti.com>
Cc: Ilias Apalodimas <ilias.apalodimas@linaro.org>
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Reviewed-by: Grygorii Strashko <grygorii.strashko@ti.com>
---
 drivers/net/ethernet/ti/cpsw.c     |    1 +
 drivers/net/ethernet/ti/cpsw_new.c |    1 +
 2 files changed, 2 insertions(+)

diff --git a/drivers/net/ethernet/ti/cpsw.c b/drivers/net/ethernet/ti/cpsw.c
index 09f98fa2fb4e..ce0645ada6e7 100644
--- a/drivers/net/ethernet/ti/cpsw.c
+++ b/drivers/net/ethernet/ti/cpsw.c
@@ -406,6 +406,7 @@ static void cpsw_rx_handler(void *token, int len, int status)
 
 		xdp.data_hard_start = pa;
 		xdp.rxq = &priv->xdp_rxq[ch];
+		xdp.frame_sz = PAGE_SIZE;
 
 		port = priv->emac_port + cpsw->data.dual_emac;
 		ret = cpsw_run_xdp(priv, ch, &xdp, page, port);
diff --git a/drivers/net/ethernet/ti/cpsw_new.c b/drivers/net/ethernet/ti/cpsw_new.c
index dce49311d3d3..1247d35d42ef 100644
--- a/drivers/net/ethernet/ti/cpsw_new.c
+++ b/drivers/net/ethernet/ti/cpsw_new.c
@@ -348,6 +348,7 @@ static void cpsw_rx_handler(void *token, int len, int status)
 
 		xdp.data_hard_start = pa;
 		xdp.rxq = &priv->xdp_rxq[ch];
+		xdp.frame_sz = PAGE_SIZE;
 
 		ret = cpsw_run_xdp(priv, ch, &xdp, page, priv->emac_port);
 		if (ret != CPSW_XDP_PASS)



^ permalink raw reply	[flat|nested] 38+ messages in thread

* [PATCH net-next v3 15/33] ena: add XDP frame size to amazon NIC driver
  2020-05-08 11:08 [PATCH net-next v3 00/33] XDP extend with knowledge of frame size Jesper Dangaard Brouer
                   ` (13 preceding siblings ...)
  2020-05-08 11:09 ` [PATCH net-next v3 14/33] net: ethernet: ti: add XDP frame size to driver cpsw Jesper Dangaard Brouer
@ 2020-05-08 11:10 ` Jesper Dangaard Brouer
  2020-05-08 11:10 ` [PATCH net-next v3 16/33] mlx4: add XDP frame size and adjust max XDP MTU Jesper Dangaard Brouer
                   ` (17 subsequent siblings)
  32 siblings, 0 replies; 38+ messages in thread
From: Jesper Dangaard Brouer @ 2020-05-08 11:10 UTC (permalink / raw)
  To: sameehj
  Cc: Jesper Dangaard Brouer, netdev, bpf,
	Toke Høiland-Jørgensen, Daniel Borkmann,
	Alexei Starovoitov, David S. Miller, John Fastabend,
	Alexander Duyck, Jeff Kirsher, David Ahern, Ilias Apalodimas,
	Lorenzo Bianconi, Saeed Mahameed, Tariq Toukan

Frame size ENA_PAGE_SIZE is limited to 16K on systems with larger
PAGE_SIZE than 16K. Change ENA_XDP_MAX_MTU to also take into account
the reserved tailroom.

Cc: Arthur Kiyanovski <akiyano@amazon.com>
Acked-by: Sameeh Jubran <sameehj@amazon.com>
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
---
 drivers/net/ethernet/amazon/ena/ena_netdev.c |    1 +
 drivers/net/ethernet/amazon/ena/ena_netdev.h |    5 +++--
 2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/amazon/ena/ena_netdev.c b/drivers/net/ethernet/amazon/ena/ena_netdev.c
index 2818965427e9..85b87ed02dd5 100644
--- a/drivers/net/ethernet/amazon/ena/ena_netdev.c
+++ b/drivers/net/ethernet/amazon/ena/ena_netdev.c
@@ -1606,6 +1606,7 @@ static int ena_clean_rx_irq(struct ena_ring *rx_ring, struct napi_struct *napi,
 		  "%s qid %d\n", __func__, rx_ring->qid);
 	res_budget = budget;
 	xdp.rxq = &rx_ring->xdp_rxq;
+	xdp.frame_sz = ENA_PAGE_SIZE;
 
 	do {
 		xdp_verdict = XDP_PASS;
diff --git a/drivers/net/ethernet/amazon/ena/ena_netdev.h b/drivers/net/ethernet/amazon/ena/ena_netdev.h
index 7df67bf09b93..680099afcccf 100644
--- a/drivers/net/ethernet/amazon/ena/ena_netdev.h
+++ b/drivers/net/ethernet/amazon/ena/ena_netdev.h
@@ -151,8 +151,9 @@
  * The buffer size we share with the device is defined to be ENA_PAGE_SIZE
  */
 
-#define ENA_XDP_MAX_MTU (ENA_PAGE_SIZE - ETH_HLEN - ETH_FCS_LEN - \
-				VLAN_HLEN - XDP_PACKET_HEADROOM)
+#define ENA_XDP_MAX_MTU (ENA_PAGE_SIZE - ETH_HLEN - ETH_FCS_LEN -	\
+			 VLAN_HLEN - XDP_PACKET_HEADROOM -		\
+			 SKB_DATA_ALIGN(sizeof(struct skb_shared_info)))
 
 #define ENA_IS_XDP_INDEX(adapter, index) (((index) >= (adapter)->xdp_first_ring) && \
 	((index) < (adapter)->xdp_first_ring + (adapter)->xdp_num_queues))



^ permalink raw reply	[flat|nested] 38+ messages in thread

* [PATCH net-next v3 16/33] mlx4: add XDP frame size and adjust max XDP MTU
  2020-05-08 11:08 [PATCH net-next v3 00/33] XDP extend with knowledge of frame size Jesper Dangaard Brouer
                   ` (14 preceding siblings ...)
  2020-05-08 11:10 ` [PATCH net-next v3 15/33] ena: add XDP frame size to amazon NIC driver Jesper Dangaard Brouer
@ 2020-05-08 11:10 ` Jesper Dangaard Brouer
  2020-05-08 11:10 ` [PATCH net-next v3 17/33] net: thunderx: add XDP frame size Jesper Dangaard Brouer
                   ` (16 subsequent siblings)
  32 siblings, 0 replies; 38+ messages in thread
From: Jesper Dangaard Brouer @ 2020-05-08 11:10 UTC (permalink / raw)
  To: sameehj
  Cc: Jesper Dangaard Brouer, netdev, bpf,
	Toke Høiland-Jørgensen, Daniel Borkmann,
	Alexei Starovoitov, David S. Miller, John Fastabend,
	Alexander Duyck, Jeff Kirsher, David Ahern, Ilias Apalodimas,
	Lorenzo Bianconi, Saeed Mahameed, Tariq Toukan

The mlx4 drivers size of memory backing the RX packet is stored in
frag_stride. For XDP mode this will be PAGE_SIZE (normally 4096).
For normal mode frag_stride is 2048.

Also adjust MLX4_EN_MAX_XDP_MTU to take tailroom into account.

Cc: Tariq Toukan <tariqt@mellanox.com>
Cc: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlx4/en_netdev.c |    3 ++-
 drivers/net/ethernet/mellanox/mlx4/en_rx.c     |    1 +
 2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
index 43dcbd8214c6..5bd3cd37d50f 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
@@ -51,7 +51,8 @@
 #include "en_port.h"
 
 #define MLX4_EN_MAX_XDP_MTU ((int)(PAGE_SIZE - ETH_HLEN - (2 * VLAN_HLEN) - \
-				   XDP_PACKET_HEADROOM))
+				XDP_PACKET_HEADROOM -			    \
+				SKB_DATA_ALIGN(sizeof(struct skb_shared_info))))
 
 int mlx4_en_setup_tc(struct net_device *dev, u8 up)
 {
diff --git a/drivers/net/ethernet/mellanox/mlx4/en_rx.c b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
index 787139219813..8a10285b0e10 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_rx.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_rx.c
@@ -683,6 +683,7 @@ int mlx4_en_process_rx_cq(struct net_device *dev, struct mlx4_en_cq *cq, int bud
 	rcu_read_lock();
 	xdp_prog = rcu_dereference(ring->xdp_prog);
 	xdp.rxq = &ring->xdp_rxq;
+	xdp.frame_sz = priv->frag_info[0].frag_stride;
 	doorbell_pending = 0;
 
 	/* We assume a 1:1 mapping between CQEs and Rx descriptors, so Rx



^ permalink raw reply	[flat|nested] 38+ messages in thread

* [PATCH net-next v3 17/33] net: thunderx: add XDP frame size
  2020-05-08 11:08 [PATCH net-next v3 00/33] XDP extend with knowledge of frame size Jesper Dangaard Brouer
                   ` (15 preceding siblings ...)
  2020-05-08 11:10 ` [PATCH net-next v3 16/33] mlx4: add XDP frame size and adjust max XDP MTU Jesper Dangaard Brouer
@ 2020-05-08 11:10 ` Jesper Dangaard Brouer
  2020-05-08 11:10 ` [PATCH net-next v3 18/33] nfp: add XDP frame size to netronome driver Jesper Dangaard Brouer
                   ` (15 subsequent siblings)
  32 siblings, 0 replies; 38+ messages in thread
From: Jesper Dangaard Brouer @ 2020-05-08 11:10 UTC (permalink / raw)
  To: sameehj
  Cc: Jesper Dangaard Brouer, netdev, bpf,
	Toke Høiland-Jørgensen, Daniel Borkmann,
	Alexei Starovoitov, David S. Miller, John Fastabend,
	Alexander Duyck, Jeff Kirsher, David Ahern, Ilias Apalodimas,
	Lorenzo Bianconi, Saeed Mahameed, Tariq Toukan

To help reviewers these are the defines related to RCV_FRAG_LEN

 #define DMA_BUFFER_LEN	1536 /* In multiples of 128bytes */
 #define RCV_FRAG_LEN	(SKB_DATA_ALIGN(DMA_BUFFER_LEN + NET_SKB_PAD) + \
			 SKB_DATA_ALIGN(sizeof(struct skb_shared_info)))

Cc: Sunil Goutham <sgoutham@marvell.com>
Cc: Robert Richter <rrichter@marvell.com>
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
---
 drivers/net/ethernet/cavium/thunder/nicvf_main.c |    1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/net/ethernet/cavium/thunder/nicvf_main.c b/drivers/net/ethernet/cavium/thunder/nicvf_main.c
index b4b33368698f..2ba0ce115e63 100644
--- a/drivers/net/ethernet/cavium/thunder/nicvf_main.c
+++ b/drivers/net/ethernet/cavium/thunder/nicvf_main.c
@@ -552,6 +552,7 @@ static inline bool nicvf_xdp_rx(struct nicvf *nic, struct bpf_prog *prog,
 	xdp_set_data_meta_invalid(&xdp);
 	xdp.data_end = xdp.data + len;
 	xdp.rxq = &rq->xdp_rxq;
+	xdp.frame_sz = RCV_FRAG_LEN + XDP_PACKET_HEADROOM;
 	orig_data = xdp.data;
 
 	rcu_read_lock();



^ permalink raw reply	[flat|nested] 38+ messages in thread

* [PATCH net-next v3 18/33] nfp: add XDP frame size to netronome driver
  2020-05-08 11:08 [PATCH net-next v3 00/33] XDP extend with knowledge of frame size Jesper Dangaard Brouer
                   ` (16 preceding siblings ...)
  2020-05-08 11:10 ` [PATCH net-next v3 17/33] net: thunderx: add XDP frame size Jesper Dangaard Brouer
@ 2020-05-08 11:10 ` Jesper Dangaard Brouer
  2020-05-08 11:10 ` [PATCH net-next v3 19/33] tun: add XDP frame size Jesper Dangaard Brouer
                   ` (14 subsequent siblings)
  32 siblings, 0 replies; 38+ messages in thread
From: Jesper Dangaard Brouer @ 2020-05-08 11:10 UTC (permalink / raw)
  To: sameehj
  Cc: Jesper Dangaard Brouer, netdev, bpf,
	Toke Høiland-Jørgensen, Daniel Borkmann,
	Alexei Starovoitov, David S. Miller, John Fastabend,
	Alexander Duyck, Jeff Kirsher, David Ahern, Ilias Apalodimas,
	Lorenzo Bianconi, Saeed Mahameed, Tariq Toukan

The netronome nfp driver use PAGE_SIZE when xdp_prog is set, but
xdp.data_hard_start begins at offset NFP_NET_RX_BUF_HEADROOM.
Thus, adjust for this when setting xdp.frame_sz, as it counts
from data_hard_start.

When doing XDP_TX this driver is smart and instead of a full DMA-map
does a DMA-sync on with packet length. As xdp_adjust_tail can now
grow packet length, add checks to make sure that grow size is within
the DMA-mapped size.

Cc: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Reviewed-by: Jakub Kicinski <kuba@kernel.org>
---
 .../net/ethernet/netronome/nfp/nfp_net_common.c    |    6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/drivers/net/ethernet/netronome/nfp/nfp_net_common.c b/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
index 9bfb3b077bc1..0e0cc3d58bdc 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
+++ b/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
@@ -1741,10 +1741,15 @@ nfp_net_tx_xdp_buf(struct nfp_net_dp *dp, struct nfp_net_rx_ring *rx_ring,
 		   struct nfp_net_rx_buf *rxbuf, unsigned int dma_off,
 		   unsigned int pkt_len, bool *completed)
 {
+	unsigned int dma_map_sz = dp->fl_bufsz - NFP_NET_RX_BUF_NON_DATA;
 	struct nfp_net_tx_buf *txbuf;
 	struct nfp_net_tx_desc *txd;
 	int wr_idx;
 
+	/* Reject if xdp_adjust_tail grow packet beyond DMA area */
+	if (pkt_len + dma_off > dma_map_sz)
+		return false;
+
 	if (unlikely(nfp_net_tx_full(tx_ring, 1))) {
 		if (!*completed) {
 			nfp_net_xdp_complete(tx_ring);
@@ -1817,6 +1822,7 @@ static int nfp_net_rx(struct nfp_net_rx_ring *rx_ring, int budget)
 	rcu_read_lock();
 	xdp_prog = READ_ONCE(dp->xdp_prog);
 	true_bufsz = xdp_prog ? PAGE_SIZE : dp->fl_bufsz;
+	xdp.frame_sz = PAGE_SIZE - NFP_NET_RX_BUF_HEADROOM;
 	xdp.rxq = &rx_ring->xdp_rxq;
 	tx_ring = r_vec->xdp_ring;
 



^ permalink raw reply	[flat|nested] 38+ messages in thread

* [PATCH net-next v3 19/33] tun: add XDP frame size
  2020-05-08 11:08 [PATCH net-next v3 00/33] XDP extend with knowledge of frame size Jesper Dangaard Brouer
                   ` (17 preceding siblings ...)
  2020-05-08 11:10 ` [PATCH net-next v3 18/33] nfp: add XDP frame size to netronome driver Jesper Dangaard Brouer
@ 2020-05-08 11:10 ` Jesper Dangaard Brouer
  2020-05-08 11:10 ` [PATCH net-next v3 20/33] vhost_net: also populate " Jesper Dangaard Brouer
                   ` (13 subsequent siblings)
  32 siblings, 0 replies; 38+ messages in thread
From: Jesper Dangaard Brouer @ 2020-05-08 11:10 UTC (permalink / raw)
  To: sameehj
  Cc: Jesper Dangaard Brouer, netdev, bpf,
	Toke Høiland-Jørgensen, Daniel Borkmann,
	Alexei Starovoitov, David S. Miller, John Fastabend,
	Alexander Duyck, Jeff Kirsher, David Ahern, Ilias Apalodimas,
	Lorenzo Bianconi, Saeed Mahameed, Tariq Toukan

The tun driver have two code paths for running XDP (bpf_prog_run_xdp).
In both cases 'buflen' contains enough tailroom for skb_shared_info.

Cc: Jason Wang <jasowang@redhat.com>
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
---
 drivers/net/tun.c |    2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index 44889eba1dbc..c54f967e2c66 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -1671,6 +1671,7 @@ static struct sk_buff *tun_build_skb(struct tun_struct *tun,
 		xdp_set_data_meta_invalid(&xdp);
 		xdp.data_end = xdp.data + len;
 		xdp.rxq = &tfile->xdp_rxq;
+		xdp.frame_sz = buflen;
 
 		act = bpf_prog_run_xdp(xdp_prog, &xdp);
 		if (act == XDP_REDIRECT || act == XDP_TX) {
@@ -2411,6 +2412,7 @@ static int tun_xdp_one(struct tun_struct *tun,
 		}
 		xdp_set_data_meta_invalid(xdp);
 		xdp->rxq = &tfile->xdp_rxq;
+		xdp->frame_sz = buflen;
 
 		act = bpf_prog_run_xdp(xdp_prog, xdp);
 		err = tun_xdp_act(tun, xdp_prog, xdp, act);



^ permalink raw reply	[flat|nested] 38+ messages in thread

* [PATCH net-next v3 20/33] vhost_net: also populate XDP frame size
  2020-05-08 11:08 [PATCH net-next v3 00/33] XDP extend with knowledge of frame size Jesper Dangaard Brouer
                   ` (18 preceding siblings ...)
  2020-05-08 11:10 ` [PATCH net-next v3 19/33] tun: add XDP frame size Jesper Dangaard Brouer
@ 2020-05-08 11:10 ` Jesper Dangaard Brouer
  2020-05-08 11:10 ` [PATCH net-next v3 21/33] virtio_net: add XDP frame size in two code paths Jesper Dangaard Brouer
                   ` (12 subsequent siblings)
  32 siblings, 0 replies; 38+ messages in thread
From: Jesper Dangaard Brouer @ 2020-05-08 11:10 UTC (permalink / raw)
  To: sameehj
  Cc: Jesper Dangaard Brouer, netdev, bpf,
	Toke Høiland-Jørgensen, Daniel Borkmann,
	Alexei Starovoitov, David S. Miller, John Fastabend,
	Alexander Duyck, Jeff Kirsher, David Ahern, Ilias Apalodimas,
	Lorenzo Bianconi, Saeed Mahameed, Tariq Toukan

In vhost_net_build_xdp() the 'buf' that gets queued via an xdp_buff
have embedded a struct tun_xdp_hdr (located at xdp->data_hard_start)
which contains the buffer length 'buflen' (with tailroom for
skb_shared_info). Also storing this buflen in xdp->frame_sz, does not
obsolete struct tun_xdp_hdr, as it also contains a struct
virtio_net_hdr with other information.

Cc: Jason Wang <jasowang@redhat.com>
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
---
 drivers/vhost/net.c |    1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index 2927f02cc7e1..516519dcc8ff 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -747,6 +747,7 @@ static int vhost_net_build_xdp(struct vhost_net_virtqueue *nvq,
 	xdp->data = buf + pad;
 	xdp->data_end = xdp->data + len;
 	hdr->buflen = buflen;
+	xdp->frame_sz = buflen;
 
 	--net->refcnt_bias;
 	alloc_frag->offset += buflen;



^ permalink raw reply	[flat|nested] 38+ messages in thread

* [PATCH net-next v3 21/33] virtio_net: add XDP frame size in two code paths
  2020-05-08 11:08 [PATCH net-next v3 00/33] XDP extend with knowledge of frame size Jesper Dangaard Brouer
                   ` (19 preceding siblings ...)
  2020-05-08 11:10 ` [PATCH net-next v3 20/33] vhost_net: also populate " Jesper Dangaard Brouer
@ 2020-05-08 11:10 ` Jesper Dangaard Brouer
  2020-05-09  2:15   ` Jason Wang
  2020-05-08 11:10 ` [PATCH net-next v3 22/33] ixgbe: fix XDP redirect on archs with PAGE_SIZE above 4K Jesper Dangaard Brouer
                   ` (11 subsequent siblings)
  32 siblings, 1 reply; 38+ messages in thread
From: Jesper Dangaard Brouer @ 2020-05-08 11:10 UTC (permalink / raw)
  To: sameehj
  Cc: Jesper Dangaard Brouer, netdev, bpf,
	Toke Høiland-Jørgensen, Daniel Borkmann,
	Alexei Starovoitov, David S. Miller, John Fastabend,
	Alexander Duyck, Jeff Kirsher, David Ahern, Ilias Apalodimas,
	Lorenzo Bianconi, Saeed Mahameed, Tariq Toukan

The virtio_net driver is running inside the guest-OS. There are two
XDP receive code-paths in virtio_net, namely receive_small() and
receive_mergeable(). The receive_big() function does not support XDP.

In receive_small() the frame size is available in buflen. The buffer
backing these frames are allocated in add_recvbuf_small() with same
size, except for the headroom, but tailroom have reserved room for
skb_shared_info. The headroom is encoded in ctx pointer as a value.

In receive_mergeable() the frame size is more dynamic. There are two
basic cases: (1) buffer size is based on a exponentially weighted
moving average (see DECLARE_EWMA) of packet length. Or (2) in case
virtnet_get_headroom() have any headroom then buffer size is
PAGE_SIZE. The ctx pointer is this time used for encoding two values;
the buffer len "truesize" and headroom. In case (1) if the rx buffer
size is underestimated, the packet will have been split over more
buffers (num_buf info in virtio_net_hdr_mrg_rxbuf placed in top of
buffer area). If that happens the XDP path does a xdp_linearize_page
operation.

V3: Adjust frame_sz in receive_mergeable() case, spotted by Jason Wang.

The code is really hard to follow, so some hints to reviewers.
The receive_mergeable() case gets frames that were allocated in
add_recvbuf_mergeable() which uses headroom=virtnet_get_headroom(),
and 'buf' ptr is advanced this headroom.  The headroom can only
be 0 or VIRTIO_XDP_HEADROOM, as virtnet_get_headroom is really
simple:

  static unsigned int virtnet_get_headroom(struct virtnet_info *vi)
  {
	return vi->xdp_queue_pairs ? VIRTIO_XDP_HEADROOM : 0;
  }

As frame_sz is an offset size from xdp.data_hard_start, reviewers
should notice how this is calculated in receive_mergeable():

  int offset = buf - page_address(page);
  [...]
  data = page_address(xdp_page) + offset;
  xdp.data_hard_start = data - VIRTIO_XDP_HEADROOM + vi->hdr_len;

The calculated offset will always be VIRTIO_XDP_HEADROOM when
reaching this code.  Thus, xdp.data_hard_start will be page-start
address plus vi->hdr_len.  Given this xdp.frame_sz need to be
reduced with vi->hdr_len size.

IMHO a followup patch should cleanup this code to make it easier
to maintain and understand, but it is outside the scope of this
patchset.

Cc: Jason Wang <jasowang@redhat.com>
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
---
 drivers/net/virtio_net.c |   15 ++++++++++++---
 1 file changed, 12 insertions(+), 3 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 11f722460513..9e1b5d748586 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -689,6 +689,7 @@ static struct sk_buff *receive_small(struct net_device *dev,
 		xdp.data_end = xdp.data + len;
 		xdp.data_meta = xdp.data;
 		xdp.rxq = &rq->xdp_rxq;
+		xdp.frame_sz = buflen;
 		orig_data = xdp.data;
 		act = bpf_prog_run_xdp(xdp_prog, &xdp);
 		stats->xdp_packets++;
@@ -797,10 +798,11 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
 	int offset = buf - page_address(page);
 	struct sk_buff *head_skb, *curr_skb;
 	struct bpf_prog *xdp_prog;
-	unsigned int truesize;
+	unsigned int truesize = mergeable_ctx_to_truesize(ctx);
 	unsigned int headroom = mergeable_ctx_to_headroom(ctx);
-	int err;
 	unsigned int metasize = 0;
+	unsigned int frame_sz;
+	int err;
 
 	head_skb = NULL;
 	stats->bytes += len - vi->hdr_len;
@@ -821,6 +823,11 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
 		if (unlikely(hdr->hdr.gso_type))
 			goto err_xdp;
 
+		/* Buffers with headroom use PAGE_SIZE as alloc size,
+		 * see add_recvbuf_mergeable() + get_mergeable_buf_len()
+		 */
+		frame_sz = headroom ? PAGE_SIZE : truesize;
+
 		/* This happens when rx buffer size is underestimated
 		 * or headroom is not enough because of the buffer
 		 * was refilled before XDP is set. This should only
@@ -834,6 +841,8 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
 						      page, offset,
 						      VIRTIO_XDP_HEADROOM,
 						      &len);
+			frame_sz = PAGE_SIZE;
+
 			if (!xdp_page)
 				goto err_xdp;
 			offset = VIRTIO_XDP_HEADROOM;
@@ -850,6 +859,7 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
 		xdp.data_end = xdp.data + (len - vi->hdr_len);
 		xdp.data_meta = xdp.data;
 		xdp.rxq = &rq->xdp_rxq;
+		xdp.frame_sz = frame_sz - vi->hdr_len;
 
 		act = bpf_prog_run_xdp(xdp_prog, &xdp);
 		stats->xdp_packets++;
@@ -924,7 +934,6 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
 	}
 	rcu_read_unlock();
 
-	truesize = mergeable_ctx_to_truesize(ctx);
 	if (unlikely(len > truesize)) {
 		pr_debug("%s: rx error: len %u exceeds truesize %lu\n",
 			 dev->name, len, (unsigned long)ctx);



^ permalink raw reply	[flat|nested] 38+ messages in thread

* [PATCH net-next v3 22/33] ixgbe: fix XDP redirect on archs with PAGE_SIZE above 4K
  2020-05-08 11:08 [PATCH net-next v3 00/33] XDP extend with knowledge of frame size Jesper Dangaard Brouer
                   ` (20 preceding siblings ...)
  2020-05-08 11:10 ` [PATCH net-next v3 21/33] virtio_net: add XDP frame size in two code paths Jesper Dangaard Brouer
@ 2020-05-08 11:10 ` Jesper Dangaard Brouer
  2020-05-08 11:10 ` [PATCH net-next v3 23/33] ixgbe: add XDP frame size to driver Jesper Dangaard Brouer
                   ` (10 subsequent siblings)
  32 siblings, 0 replies; 38+ messages in thread
From: Jesper Dangaard Brouer @ 2020-05-08 11:10 UTC (permalink / raw)
  To: sameehj
  Cc: Jesper Dangaard Brouer, netdev, bpf,
	Toke Høiland-Jørgensen, Daniel Borkmann,
	Alexei Starovoitov, David S. Miller, John Fastabend,
	Alexander Duyck, Jeff Kirsher, David Ahern, Ilias Apalodimas,
	Lorenzo Bianconi, Saeed Mahameed, Tariq Toukan

The ixgbe driver have another memory model when compiled on archs with
PAGE_SIZE above 4096 bytes. In this mode it doesn't split the page in
two halves, but instead increment rx_buffer->page_offset by truesize of
packet (which include headroom and tailroom for skb_shared_info).

This is done correctly in ixgbe_build_skb(), but in ixgbe_rx_buffer_flip
which is currently only called on XDP_TX and XDP_REDIRECT, it forgets
to add the tailroom for skb_shared_info. This breaks XDP_REDIRECT, for
veth and cpumap.  Fix by adding size of skb_shared_info tailroom.

Maintainers notice: This fix have been queued to Jeff.

Fixes: 6453073987ba ("ixgbe: add initial support for xdp redirect")
Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
---
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c |    3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index 718931d951bc..ea6834bae04c 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -2254,7 +2254,8 @@ static void ixgbe_rx_buffer_flip(struct ixgbe_ring *rx_ring,
 	rx_buffer->page_offset ^= truesize;
 #else
 	unsigned int truesize = ring_uses_build_skb(rx_ring) ?
-				SKB_DATA_ALIGN(IXGBE_SKB_PAD + size) :
+				SKB_DATA_ALIGN(IXGBE_SKB_PAD + size) +
+				SKB_DATA_ALIGN(sizeof(struct skb_shared_info)) :
 				SKB_DATA_ALIGN(size);
 
 	rx_buffer->page_offset += truesize;



^ permalink raw reply	[flat|nested] 38+ messages in thread

* [PATCH net-next v3 23/33] ixgbe: add XDP frame size to driver
  2020-05-08 11:08 [PATCH net-next v3 00/33] XDP extend with knowledge of frame size Jesper Dangaard Brouer
                   ` (21 preceding siblings ...)
  2020-05-08 11:10 ` [PATCH net-next v3 22/33] ixgbe: fix XDP redirect on archs with PAGE_SIZE above 4K Jesper Dangaard Brouer
@ 2020-05-08 11:10 ` Jesper Dangaard Brouer
  2020-05-08 11:10 ` [PATCH net-next v3 24/33] ixgbevf: add XDP frame size to VF driver Jesper Dangaard Brouer
                   ` (9 subsequent siblings)
  32 siblings, 0 replies; 38+ messages in thread
From: Jesper Dangaard Brouer @ 2020-05-08 11:10 UTC (permalink / raw)
  To: sameehj
  Cc: Jesper Dangaard Brouer, netdev, bpf,
	Toke Høiland-Jørgensen, Daniel Borkmann,
	Alexei Starovoitov, David S. Miller, John Fastabend,
	Alexander Duyck, Jeff Kirsher, David Ahern, Ilias Apalodimas,
	Lorenzo Bianconi, Saeed Mahameed, Tariq Toukan

This driver uses different memory models depending on PAGE_SIZE at
compile time. For PAGE_SIZE 4K it uses page splitting, meaning for
normal MTU frame size is 2048 bytes (and headroom 192 bytes). For
larger MTUs the driver still use page splitting, by allocating
order-1 pages (8192 bytes) for RX frames. For PAGE_SIZE larger than
4K, driver instead advance its rx_buffer->page_offset with the frame
size "truesize".

For XDP frame size calculations, this mean that in PAGE_SIZE larger
than 4K mode the frame_sz change on a per packet basis. For the page
split 4K PAGE_SIZE mode, xdp.frame_sz is more constant and can be
updated once outside the main NAPI loop.

The default setting in the driver uses build_skb(), which provides
the necessary headroom and tailroom for XDP-redirect in RX-frame
(in both modes).

There is one complication, which is legacy-rx mode (configurable via
ethtool priv-flags). There are zero headroom in this mode, which is a
requirement for XDP-redirect to work. The conversion to xdp_frame
(convert_to_xdp_frame) will detect this insufficient space, and
xdp_do_redirect() call will fail. This is deemed acceptable, as it
allows other XDP actions to still work in legacy-mode. In
legacy-mode + larger PAGE_SIZE due to lacking tailroom, we also
accept that xdp_adjust_tail shrink doesn't work.

Cc: intel-wired-lan@lists.osuosl.org
Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Cc: Alexander Duyck <alexander.duyck@gmail.com>
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
---
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c |   34 +++++++++++++++++++------
 1 file changed, 26 insertions(+), 8 deletions(-)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index ea6834bae04c..eab5934b04f5 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -2244,20 +2244,30 @@ static struct sk_buff *ixgbe_run_xdp(struct ixgbe_adapter *adapter,
 	return ERR_PTR(-result);
 }
 
+static unsigned int ixgbe_rx_frame_truesize(struct ixgbe_ring *rx_ring,
+					    unsigned int size)
+{
+	unsigned int truesize;
+
+#if (PAGE_SIZE < 8192)
+	truesize = ixgbe_rx_pg_size(rx_ring) / 2; /* Must be power-of-2 */
+#else
+	truesize = ring_uses_build_skb(rx_ring) ?
+		SKB_DATA_ALIGN(IXGBE_SKB_PAD + size) +
+		SKB_DATA_ALIGN(sizeof(struct skb_shared_info)) :
+		SKB_DATA_ALIGN(size);
+#endif
+	return truesize;
+}
+
 static void ixgbe_rx_buffer_flip(struct ixgbe_ring *rx_ring,
 				 struct ixgbe_rx_buffer *rx_buffer,
 				 unsigned int size)
 {
+	unsigned int truesize = ixgbe_rx_frame_truesize(rx_ring, size);
 #if (PAGE_SIZE < 8192)
-	unsigned int truesize = ixgbe_rx_pg_size(rx_ring) / 2;
-
 	rx_buffer->page_offset ^= truesize;
 #else
-	unsigned int truesize = ring_uses_build_skb(rx_ring) ?
-				SKB_DATA_ALIGN(IXGBE_SKB_PAD + size) +
-				SKB_DATA_ALIGN(sizeof(struct skb_shared_info)) :
-				SKB_DATA_ALIGN(size);
-
 	rx_buffer->page_offset += truesize;
 #endif
 }
@@ -2291,6 +2301,11 @@ static int ixgbe_clean_rx_irq(struct ixgbe_q_vector *q_vector,
 
 	xdp.rxq = &rx_ring->xdp_rxq;
 
+	/* Frame size depend on rx_ring setup when PAGE_SIZE=4K */
+#if (PAGE_SIZE < 8192)
+	xdp.frame_sz = ixgbe_rx_frame_truesize(rx_ring, 0);
+#endif
+
 	while (likely(total_rx_packets < budget)) {
 		union ixgbe_adv_rx_desc *rx_desc;
 		struct ixgbe_rx_buffer *rx_buffer;
@@ -2324,7 +2339,10 @@ static int ixgbe_clean_rx_irq(struct ixgbe_q_vector *q_vector,
 			xdp.data_hard_start = xdp.data -
 					      ixgbe_rx_offset(rx_ring);
 			xdp.data_end = xdp.data + size;
-
+#if (PAGE_SIZE > 4096)
+			/* At larger PAGE_SIZE, frame_sz depend on len size */
+			xdp.frame_sz = ixgbe_rx_frame_truesize(rx_ring, size);
+#endif
 			skb = ixgbe_run_xdp(adapter, rx_ring, &xdp);
 		}
 



^ permalink raw reply	[flat|nested] 38+ messages in thread

* [PATCH net-next v3 24/33] ixgbevf: add XDP frame size to VF driver
  2020-05-08 11:08 [PATCH net-next v3 00/33] XDP extend with knowledge of frame size Jesper Dangaard Brouer
                   ` (22 preceding siblings ...)
  2020-05-08 11:10 ` [PATCH net-next v3 23/33] ixgbe: add XDP frame size to driver Jesper Dangaard Brouer
@ 2020-05-08 11:10 ` Jesper Dangaard Brouer
  2020-05-08 16:01   ` Jakub Kicinski
  2020-05-08 11:10 ` [PATCH net-next v3 25/33] i40e: add XDP frame size to driver Jesper Dangaard Brouer
                   ` (8 subsequent siblings)
  32 siblings, 1 reply; 38+ messages in thread
From: Jesper Dangaard Brouer @ 2020-05-08 11:10 UTC (permalink / raw)
  To: sameehj
  Cc: Jesper Dangaard Brouer, netdev, bpf,
	Toke Høiland-Jørgensen, Daniel Borkmann,
	Alexei Starovoitov, David S. Miller, John Fastabend,
	Alexander Duyck, Jeff Kirsher, David Ahern, Ilias Apalodimas,
	Lorenzo Bianconi, Saeed Mahameed, Tariq Toukan

This patch mirrors the changes to ixgbe in previous patch.

This VF driver doesn't support XDP_REDIRECT, but correct tailroom is
still necessary for BPF-helper xdp_adjust_tail.  In legacy-mode +
larger PAGE_SIZE, due to lacking tailroom, we accept that
xdp_adjust_tail shrink doesn't work.

Cc: intel-wired-lan@lists.osuosl.org
Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Cc: Alexander Duyck <alexander.duyck@gmail.com>
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
---
 drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c |   34 +++++++++++++++++----
 1 file changed, 27 insertions(+), 7 deletions(-)

diff --git a/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c b/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
index 4622c4ea2e46..62bc3e3b5b9c 100644
--- a/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
+++ b/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
@@ -1095,19 +1095,31 @@ static struct sk_buff *ixgbevf_run_xdp(struct ixgbevf_adapter *adapter,
 	return ERR_PTR(-result);
 }
 
+static unsigned int ixgbevf_rx_frame_truesize(struct ixgbevf_ring *rx_ring,
+					      unsigned int size)
+{
+	unsigned int truesize;
+
+#if (PAGE_SIZE < 8192)
+	truesize = ixgbevf_rx_pg_size(rx_ring) / 2; /* Must be power-of-2 */
+#else
+	truesize = ring_uses_build_skb(rx_ring) ?
+		SKB_DATA_ALIGN(IXGBEVF_SKB_PAD + size) +
+		SKB_DATA_ALIGN(sizeof(struct skb_shared_info)) :
+		SKB_DATA_ALIGN(size);
+#endif
+       return truesize;
+}
+
 static void ixgbevf_rx_buffer_flip(struct ixgbevf_ring *rx_ring,
 				   struct ixgbevf_rx_buffer *rx_buffer,
 				   unsigned int size)
 {
-#if (PAGE_SIZE < 8192)
-	unsigned int truesize = ixgbevf_rx_pg_size(rx_ring) / 2;
+	unsigned int truesize = ixgbevf_rx_frame_truesize(rx_ring, size);
 
+#if (PAGE_SIZE < 8192)
 	rx_buffer->page_offset ^= truesize;
 #else
-	unsigned int truesize = ring_uses_build_skb(rx_ring) ?
-				SKB_DATA_ALIGN(IXGBEVF_SKB_PAD + size) :
-				SKB_DATA_ALIGN(size);
-
 	rx_buffer->page_offset += truesize;
 #endif
 }
@@ -1125,6 +1137,11 @@ static int ixgbevf_clean_rx_irq(struct ixgbevf_q_vector *q_vector,
 
 	xdp.rxq = &rx_ring->xdp_rxq;
 
+	/* Frame size depend on rx_ring setup when PAGE_SIZE=4K */
+#if (PAGE_SIZE < 8192)
+	xdp.frame_sz = ixgbevf_rx_frame_truesize(rx_ring, 0);
+#endif
+
 	while (likely(total_rx_packets < budget)) {
 		struct ixgbevf_rx_buffer *rx_buffer;
 		union ixgbe_adv_rx_desc *rx_desc;
@@ -1157,7 +1174,10 @@ static int ixgbevf_clean_rx_irq(struct ixgbevf_q_vector *q_vector,
 			xdp.data_hard_start = xdp.data -
 					      ixgbevf_rx_offset(rx_ring);
 			xdp.data_end = xdp.data + size;
-
+#if (PAGE_SIZE > 4096)
+			/* At larger PAGE_SIZE, frame_sz depend on len size */
+			xdp.frame_sz = ixgbevf_rx_frame_truesize(rx_ring, size);
+#endif
 			skb = ixgbevf_run_xdp(adapter, rx_ring, &xdp);
 		}
 



^ permalink raw reply	[flat|nested] 38+ messages in thread

* [PATCH net-next v3 25/33] i40e: add XDP frame size to driver
  2020-05-08 11:08 [PATCH net-next v3 00/33] XDP extend with knowledge of frame size Jesper Dangaard Brouer
                   ` (23 preceding siblings ...)
  2020-05-08 11:10 ` [PATCH net-next v3 24/33] ixgbevf: add XDP frame size to VF driver Jesper Dangaard Brouer
@ 2020-05-08 11:10 ` Jesper Dangaard Brouer
  2020-05-08 11:10 ` [PATCH net-next v3 26/33] ice: " Jesper Dangaard Brouer
                   ` (7 subsequent siblings)
  32 siblings, 0 replies; 38+ messages in thread
From: Jesper Dangaard Brouer @ 2020-05-08 11:10 UTC (permalink / raw)
  To: sameehj
  Cc: Jesper Dangaard Brouer, netdev, bpf,
	Toke Høiland-Jørgensen, Daniel Borkmann,
	Alexei Starovoitov, David S. Miller, John Fastabend,
	Alexander Duyck, Jeff Kirsher, David Ahern, Ilias Apalodimas,
	Lorenzo Bianconi, Saeed Mahameed, Tariq Toukan

This driver uses different memory models depending on PAGE_SIZE at
compile time. For PAGE_SIZE 4K it uses page splitting, meaning for
normal MTU frame size is 2048 bytes (and headroom 192 bytes). For
larger MTUs the driver still use page splitting, by allocating
order-1 pages (8192 bytes) for RX frames. For PAGE_SIZE larger than
4K, driver instead advance its rx_buffer->page_offset with the frame
size "truesize".

For XDP frame size calculations, this mean that in PAGE_SIZE larger
than 4K mode the frame_sz change on a per packet basis. For the page
split 4K PAGE_SIZE mode, xdp.frame_sz is more constant and can be
updated once outside the main NAPI loop.

The default setting in the driver uses build_skb(), which provides
the necessary headroom and tailroom for XDP-redirect in RX-frame
(in both modes).

There is one complication, which is legacy-rx mode (configurable via
ethtool priv-flags). There are zero headroom in this mode, which is a
requirement for XDP-redirect to work. The conversion to xdp_frame
(convert_to_xdp_frame) will detect this insufficient space, and
xdp_do_redirect() call will fail. This is deemed acceptable, as it
allows other XDP actions to still work in legacy-mode. In
legacy-mode + larger PAGE_SIZE due to lacking tailroom, we also
accept that xdp_adjust_tail shrink doesn't work.

Cc: intel-wired-lan@lists.osuosl.org
Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Cc: Alexander Duyck <alexander.duyck@gmail.com>
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
---
 drivers/net/ethernet/intel/i40e/i40e_txrx.c |   30 +++++++++++++++++++++++----
 1 file changed, 25 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_txrx.c b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
index b8496037ef7f..a3772beffe02 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_txrx.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
@@ -1507,6 +1507,22 @@ static inline unsigned int i40e_rx_offset(struct i40e_ring *rx_ring)
 	return ring_uses_build_skb(rx_ring) ? I40E_SKB_PAD : 0;
 }
 
+static unsigned int i40e_rx_frame_truesize(struct i40e_ring *rx_ring,
+					   unsigned int size)
+{
+	unsigned int truesize;
+
+#if (PAGE_SIZE < 8192)
+	truesize = i40e_rx_pg_size(rx_ring) / 2; /* Must be power-of-2 */
+#else
+	truesize = i40e_rx_offset(rx_ring) ?
+		SKB_DATA_ALIGN(size + i40e_rx_offset(rx_ring)) +
+		SKB_DATA_ALIGN(sizeof(struct skb_shared_info)) :
+		SKB_DATA_ALIGN(size);
+#endif
+	return truesize;
+}
+
 /**
  * i40e_alloc_mapped_page - recycle or make a new page
  * @rx_ring: ring to use
@@ -2246,13 +2262,11 @@ static void i40e_rx_buffer_flip(struct i40e_ring *rx_ring,
 				struct i40e_rx_buffer *rx_buffer,
 				unsigned int size)
 {
-#if (PAGE_SIZE < 8192)
-	unsigned int truesize = i40e_rx_pg_size(rx_ring) / 2;
+	unsigned int truesize = i40e_rx_frame_truesize(rx_ring, size);
 
+#if (PAGE_SIZE < 8192)
 	rx_buffer->page_offset ^= truesize;
 #else
-	unsigned int truesize = SKB_DATA_ALIGN(i40e_rx_offset(rx_ring) + size);
-
 	rx_buffer->page_offset += truesize;
 #endif
 }
@@ -2335,6 +2349,9 @@ static int i40e_clean_rx_irq(struct i40e_ring *rx_ring, int budget)
 	bool failure = false;
 	struct xdp_buff xdp;
 
+#if (PAGE_SIZE < 8192)
+	xdp.frame_sz = i40e_rx_frame_truesize(rx_ring, 0);
+#endif
 	xdp.rxq = &rx_ring->xdp_rxq;
 
 	while (likely(total_rx_packets < (unsigned int)budget)) {
@@ -2389,7 +2406,10 @@ static int i40e_clean_rx_irq(struct i40e_ring *rx_ring, int budget)
 			xdp.data_hard_start = xdp.data -
 					      i40e_rx_offset(rx_ring);
 			xdp.data_end = xdp.data + size;
-
+#if (PAGE_SIZE > 4096)
+			/* At larger PAGE_SIZE, frame_sz depend on len size */
+			xdp.frame_sz = i40e_rx_frame_truesize(rx_ring, size);
+#endif
 			skb = i40e_run_xdp(rx_ring, &xdp);
 		}
 



^ permalink raw reply	[flat|nested] 38+ messages in thread

* [PATCH net-next v3 26/33] ice: add XDP frame size to driver
  2020-05-08 11:08 [PATCH net-next v3 00/33] XDP extend with knowledge of frame size Jesper Dangaard Brouer
                   ` (24 preceding siblings ...)
  2020-05-08 11:10 ` [PATCH net-next v3 25/33] i40e: add XDP frame size to driver Jesper Dangaard Brouer
@ 2020-05-08 11:10 ` Jesper Dangaard Brouer
  2020-05-08 11:11 ` [PATCH net-next v3 27/33] xdp: for Intel AF_XDP drivers add XDP frame_sz Jesper Dangaard Brouer
                   ` (6 subsequent siblings)
  32 siblings, 0 replies; 38+ messages in thread
From: Jesper Dangaard Brouer @ 2020-05-08 11:10 UTC (permalink / raw)
  To: sameehj
  Cc: Jesper Dangaard Brouer, netdev, bpf,
	Toke Høiland-Jørgensen, Daniel Borkmann,
	Alexei Starovoitov, David S. Miller, John Fastabend,
	Alexander Duyck, Jeff Kirsher, David Ahern, Ilias Apalodimas,
	Lorenzo Bianconi, Saeed Mahameed, Tariq Toukan

This driver uses different memory models depending on PAGE_SIZE at
compile time. For PAGE_SIZE 4K it uses page splitting, meaning for
normal MTU frame size is 2048 bytes (and headroom 192 bytes). For
larger MTUs the driver still use page splitting, by allocating
order-1 pages (8192 bytes) for RX frames. For PAGE_SIZE larger than
4K, driver instead advance its rx_buffer->page_offset with the frame
size "truesize".

For XDP frame size calculations, this mean that in PAGE_SIZE larger
than 4K mode the frame_sz change on a per packet basis. For the page
split 4K PAGE_SIZE mode, xdp.frame_sz is more constant and can be
updated once outside the main NAPI loop.

The default setting in the driver uses build_skb(), which provides
the necessary headroom and tailroom for XDP-redirect in RX-frame
(in both modes).

There is one complication, which is legacy-rx mode (configurable via
ethtool priv-flags). There are zero headroom in this mode, which is a
requirement for XDP-redirect to work. The conversion to xdp_frame
(convert_to_xdp_frame) will detect this insufficient space, and
xdp_do_redirect() call will fail. This is deemed acceptable, as it
allows other XDP actions to still work in legacy-mode. In
legacy-mode + larger PAGE_SIZE due to lacking tailroom, we also
accept that xdp_adjust_tail shrink doesn't work.

Cc: intel-wired-lan@lists.osuosl.org
Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Cc: Alexander Duyck <alexander.duyck@gmail.com>
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
---
 drivers/net/ethernet/intel/ice/ice_txrx.c |   34 +++++++++++++++++++++--------
 1 file changed, 25 insertions(+), 9 deletions(-)

diff --git a/drivers/net/ethernet/intel/ice/ice_txrx.c b/drivers/net/ethernet/intel/ice/ice_txrx.c
index f67e8362958c..69b21b436f9a 100644
--- a/drivers/net/ethernet/intel/ice/ice_txrx.c
+++ b/drivers/net/ethernet/intel/ice/ice_txrx.c
@@ -423,6 +423,22 @@ static unsigned int ice_rx_offset(struct ice_ring *rx_ring)
 	return 0;
 }
 
+static unsigned int ice_rx_frame_truesize(struct ice_ring *rx_ring,
+					  unsigned int size)
+{
+	unsigned int truesize;
+
+#if (PAGE_SIZE < 8192)
+	truesize = ice_rx_pg_size(rx_ring) / 2; /* Must be power-of-2 */
+#else
+	truesize = ice_rx_offset(rx_ring) ?
+		SKB_DATA_ALIGN(ice_rx_offset(rx_ring) + size) +
+		SKB_DATA_ALIGN(sizeof(struct skb_shared_info)) :
+		SKB_DATA_ALIGN(size);
+#endif
+	return truesize;
+}
+
 /**
  * ice_run_xdp - Executes an XDP program on initialized xdp_buff
  * @rx_ring: Rx ring
@@ -991,6 +1007,10 @@ static int ice_clean_rx_irq(struct ice_ring *rx_ring, int budget)
 	bool failure;
 
 	xdp.rxq = &rx_ring->xdp_rxq;
+	/* Frame size depend on rx_ring setup when PAGE_SIZE=4K */
+#if (PAGE_SIZE < 8192)
+	xdp.frame_sz = ice_rx_frame_truesize(rx_ring, 0);
+#endif
 
 	/* start the loop to process Rx packets bounded by 'budget' */
 	while (likely(total_rx_pkts < (unsigned int)budget)) {
@@ -1038,6 +1058,10 @@ static int ice_clean_rx_irq(struct ice_ring *rx_ring, int budget)
 		xdp.data_hard_start = xdp.data - ice_rx_offset(rx_ring);
 		xdp.data_meta = xdp.data;
 		xdp.data_end = xdp.data + size;
+#if (PAGE_SIZE > 4096)
+		/* At larger PAGE_SIZE, frame_sz depend on len size */
+		xdp.frame_sz = ice_rx_frame_truesize(rx_ring, size);
+#endif
 
 		rcu_read_lock();
 		xdp_prog = READ_ONCE(rx_ring->xdp_prog);
@@ -1051,16 +1075,8 @@ static int ice_clean_rx_irq(struct ice_ring *rx_ring, int budget)
 		if (!xdp_res)
 			goto construct_skb;
 		if (xdp_res & (ICE_XDP_TX | ICE_XDP_REDIR)) {
-			unsigned int truesize;
-
-#if (PAGE_SIZE < 8192)
-			truesize = ice_rx_pg_size(rx_ring) / 2;
-#else
-			truesize = SKB_DATA_ALIGN(ice_rx_offset(rx_ring) +
-						  size);
-#endif
 			xdp_xmit |= xdp_res;
-			ice_rx_buf_adjust_pg_offset(rx_buf, truesize);
+			ice_rx_buf_adjust_pg_offset(rx_buf, xdp.frame_sz);
 		} else {
 			rx_buf->pagecnt_bias++;
 		}



^ permalink raw reply	[flat|nested] 38+ messages in thread

* [PATCH net-next v3 27/33] xdp: for Intel AF_XDP drivers add XDP frame_sz
  2020-05-08 11:08 [PATCH net-next v3 00/33] XDP extend with knowledge of frame size Jesper Dangaard Brouer
                   ` (25 preceding siblings ...)
  2020-05-08 11:10 ` [PATCH net-next v3 26/33] ice: " Jesper Dangaard Brouer
@ 2020-05-08 11:11 ` Jesper Dangaard Brouer
  2020-05-08 11:11 ` [PATCH net-next v3 28/33] mlx5: rx queue setup time determine frame_sz for XDP Jesper Dangaard Brouer
                   ` (5 subsequent siblings)
  32 siblings, 0 replies; 38+ messages in thread
From: Jesper Dangaard Brouer @ 2020-05-08 11:11 UTC (permalink / raw)
  To: sameehj
  Cc: Jesper Dangaard Brouer, netdev, bpf,
	Toke Høiland-Jørgensen, Daniel Borkmann,
	Alexei Starovoitov, David S. Miller, John Fastabend,
	Alexander Duyck, Jeff Kirsher, David Ahern, Ilias Apalodimas,
	Lorenzo Bianconi, Saeed Mahameed, Tariq Toukan

Intel drivers implement native AF_XDP zerocopy in separate C-files,
that have its own invocation of bpf_prog_run_xdp(). The setup of
xdp_buff is also handled in separately from normal code path.

This patch update XDP frame_sz for AF_XDP zerocopy drivers i40e, ice
and ixgbe, as the code changes needed are very similar.  Introduce a
helper function xsk_umem_xdp_frame_sz() for calculating frame size.

Cc: intel-wired-lan@lists.osuosl.org
Cc: Björn Töpel <bjorn.topel@intel.com>
Cc: Magnus Karlsson <magnus.karlsson@intel.com>
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: Björn Töpel <bjorn.topel@intel.com>
---
 drivers/net/ethernet/intel/i40e/i40e_xsk.c   |    2 ++
 drivers/net/ethernet/intel/ice/ice_xsk.c     |    2 ++
 drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c |    2 ++
 include/net/xdp_sock.h                       |   11 +++++++++++
 4 files changed, 17 insertions(+)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_xsk.c b/drivers/net/ethernet/intel/i40e/i40e_xsk.c
index 0b7d29192b2c..2b9184aead5f 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_xsk.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_xsk.c
@@ -531,12 +531,14 @@ int i40e_clean_rx_irq_zc(struct i40e_ring *rx_ring, int budget)
 {
 	unsigned int total_rx_bytes = 0, total_rx_packets = 0;
 	u16 cleaned_count = I40E_DESC_UNUSED(rx_ring);
+	struct xdp_umem *umem = rx_ring->xsk_umem;
 	unsigned int xdp_res, xdp_xmit = 0;
 	bool failure = false;
 	struct sk_buff *skb;
 	struct xdp_buff xdp;
 
 	xdp.rxq = &rx_ring->xdp_rxq;
+	xdp.frame_sz = xsk_umem_xdp_frame_sz(umem);
 
 	while (likely(total_rx_packets < (unsigned int)budget)) {
 		struct i40e_rx_buffer *bi;
diff --git a/drivers/net/ethernet/intel/ice/ice_xsk.c b/drivers/net/ethernet/intel/ice/ice_xsk.c
index 8279db15e870..23e5515d4527 100644
--- a/drivers/net/ethernet/intel/ice/ice_xsk.c
+++ b/drivers/net/ethernet/intel/ice/ice_xsk.c
@@ -840,11 +840,13 @@ int ice_clean_rx_irq_zc(struct ice_ring *rx_ring, int budget)
 {
 	unsigned int total_rx_bytes = 0, total_rx_packets = 0;
 	u16 cleaned_count = ICE_DESC_UNUSED(rx_ring);
+	struct xdp_umem *umem = rx_ring->xsk_umem;
 	unsigned int xdp_xmit = 0;
 	bool failure = false;
 	struct xdp_buff xdp;
 
 	xdp.rxq = &rx_ring->xdp_rxq;
+	xdp.frame_sz = xsk_umem_xdp_frame_sz(umem);
 
 	while (likely(total_rx_packets < (unsigned int)budget)) {
 		union ice_32b_rx_flex_desc *rx_desc;
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c
index 74b540ebb3dc..a656ee9a1fae 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_xsk.c
@@ -431,12 +431,14 @@ int ixgbe_clean_rx_irq_zc(struct ixgbe_q_vector *q_vector,
 	unsigned int total_rx_bytes = 0, total_rx_packets = 0;
 	struct ixgbe_adapter *adapter = q_vector->adapter;
 	u16 cleaned_count = ixgbe_desc_unused(rx_ring);
+	struct xdp_umem *umem = rx_ring->xsk_umem;
 	unsigned int xdp_res, xdp_xmit = 0;
 	bool failure = false;
 	struct sk_buff *skb;
 	struct xdp_buff xdp;
 
 	xdp.rxq = &rx_ring->xdp_rxq;
+	xdp.frame_sz = xsk_umem_xdp_frame_sz(umem);
 
 	while (likely(total_rx_packets < budget)) {
 		union ixgbe_adv_rx_desc *rx_desc;
diff --git a/include/net/xdp_sock.h b/include/net/xdp_sock.h
index e86ec48ef627..1cd1ec3cea97 100644
--- a/include/net/xdp_sock.h
+++ b/include/net/xdp_sock.h
@@ -237,6 +237,12 @@ static inline u64 xsk_umem_adjust_offset(struct xdp_umem *umem, u64 address,
 	else
 		return address + offset;
 }
+
+static inline u32 xsk_umem_xdp_frame_sz(struct xdp_umem *umem)
+{
+	return umem->chunk_size_nohr + umem->headroom;
+}
+
 #else
 static inline int xsk_generic_rcv(struct xdp_sock *xs, struct xdp_buff *xdp)
 {
@@ -367,6 +373,11 @@ static inline u64 xsk_umem_adjust_offset(struct xdp_umem *umem, u64 handle,
 	return 0;
 }
 
+static inline u32 xsk_umem_xdp_frame_sz(struct xdp_umem *umem)
+{
+	return 0;
+}
+
 static inline int __xsk_map_redirect(struct xdp_sock *xs, struct xdp_buff *xdp)
 {
 	return -EOPNOTSUPP;



^ permalink raw reply	[flat|nested] 38+ messages in thread

* [PATCH net-next v3 28/33] mlx5: rx queue setup time determine frame_sz for XDP
  2020-05-08 11:08 [PATCH net-next v3 00/33] XDP extend with knowledge of frame size Jesper Dangaard Brouer
                   ` (26 preceding siblings ...)
  2020-05-08 11:11 ` [PATCH net-next v3 27/33] xdp: for Intel AF_XDP drivers add XDP frame_sz Jesper Dangaard Brouer
@ 2020-05-08 11:11 ` Jesper Dangaard Brouer
  2020-05-10  6:31   ` Tariq Toukan
  2020-05-08 11:11 ` [PATCH net-next v3 29/33] xdp: allow bpf_xdp_adjust_tail() to grow packet size Jesper Dangaard Brouer
                   ` (4 subsequent siblings)
  32 siblings, 1 reply; 38+ messages in thread
From: Jesper Dangaard Brouer @ 2020-05-08 11:11 UTC (permalink / raw)
  To: sameehj
  Cc: Jesper Dangaard Brouer, netdev, bpf,
	Toke Høiland-Jørgensen, Daniel Borkmann,
	Alexei Starovoitov, David S. Miller, John Fastabend,
	Alexander Duyck, Jeff Kirsher, David Ahern, Ilias Apalodimas,
	Lorenzo Bianconi, Saeed Mahameed, Tariq Toukan

The mlx5 driver have multiple memory models, which are also changed
according to whether a XDP bpf_prog is attached.

The 'rx_striding_rq' setting is adjusted via ethtool priv-flags e.g.:
 # ethtool --set-priv-flags mlx5p2 rx_striding_rq off

On the general case with 4K page_size and regular MTU packet, then
the frame_sz is 2048 and 4096 when XDP is enabled, in both modes.

The info on the given frame size is stored differently depending on the
RQ-mode and encoded in a union in struct mlx5e_rq union wqe/mpwqe.
In rx striding mode rq->mpwqe.log_stride_sz is either 11 or 12, which
corresponds to 2048 or 4096 (MLX5_WQ_TYPE_LINKED_LIST_STRIDING_RQ).
In non-striding mode (MLX5_WQ_TYPE_CYCLIC) the frag_stride is stored
in rq->wqe.info.arr[0].frag_stride, for the first fragment, which is
what the XDP case cares about.

To reduce effect on fast-path, this patch determine the frame_sz at
setup time, to avoid determining the memory model runtime. Variable
is named first_frame_sz to make it clear that this is only the frame
size of the first fragment.

This mlx5 driver does a DMA-sync on XDP_TX action, but grow is safe
as it have done a DMA-map on the entire PAGE_SIZE. The driver also
already does a XDP length check against sq->hw_mtu on the possible
XDP xmit paths mlx5e_xmit_xdp_frame() + mlx5e_xmit_xdp_frame_mpwqe().

V2: Fix that frag_size need to be recalc before creating SKB.

Cc: Tariq Toukan <tariqt@mellanox.com>
Cc: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en.h      |    1 +
 drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c  |    1 +
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c |    6 ++++++
 drivers/net/ethernet/mellanox/mlx5/core/en_rx.c   |    2 ++
 4 files changed, 10 insertions(+)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h b/drivers/net/ethernet/mellanox/mlx5/core/en.h
index 0864b76ca2c0..8e75a9f9b114 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h
@@ -652,6 +652,7 @@ struct mlx5e_rq {
 	struct {
 		u16            umem_headroom;
 		u16            headroom;
+		u32            frame0_sz;
 		u8             map_dir;   /* dma map direction */
 	} buff;
 
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c b/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c
index c4a7fb4ecd14..761c8979bd41 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/xdp.c
@@ -137,6 +137,7 @@ bool mlx5e_xdp_handle(struct mlx5e_rq *rq, struct mlx5e_dma_info *di,
 	if (xsk)
 		xdp.handle = di->xsk.handle;
 	xdp.rxq = &rq->xdp_rxq;
+	xdp.frame_sz = rq->buff.frame0_sz;
 
 	act = bpf_prog_run_xdp(prog, &xdp);
 	if (xsk) {
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index 048a4f8601a8..c0f1912bcb65 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -462,6 +462,8 @@ static int mlx5e_alloc_rq(struct mlx5e_channel *c,
 		rq->mpwqe.num_strides =
 			BIT(mlx5e_mpwqe_get_log_num_strides(mdev, params, xsk));
 
+		rq->buff.frame0_sz = (1 << rq->mpwqe.log_stride_sz);
+
 		err = mlx5e_create_rq_umr_mkey(mdev, rq);
 		if (err)
 			goto err_rq_wq_destroy;
@@ -485,6 +487,8 @@ static int mlx5e_alloc_rq(struct mlx5e_channel *c,
 			num_xsk_frames = wq_sz << rq->wqe.info.log_num_frags;
 
 		rq->wqe.info = rqp->frags_info;
+		rq->buff.frame0_sz = rq->wqe.info.arr[0].frag_stride;
+
 		rq->wqe.frags =
 			kvzalloc_node(array_size(sizeof(*rq->wqe.frags),
 					(wq_sz << rq->wqe.info.log_num_frags)),
@@ -522,6 +526,8 @@ static int mlx5e_alloc_rq(struct mlx5e_channel *c,
 	}
 
 	if (xsk) {
+		rq->buff.frame0_sz = xsk_umem_xdp_frame_sz(umem);
+
 		err = mlx5e_xsk_resize_reuseq(umem, num_xsk_frames);
 		if (unlikely(err)) {
 			mlx5_core_err(mdev, "Unable to allocate the Reuse Ring for %u frames\n",
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
index d9a5a669b84d..97c2fda8f697 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_rx.c
@@ -1064,6 +1064,7 @@ mlx5e_skb_from_cqe_linear(struct mlx5e_rq *rq, struct mlx5_cqe64 *cqe,
 	if (consumed)
 		return NULL; /* page/packet was consumed by XDP */
 
+	frag_size = MLX5_SKB_FRAG_SZ(rx_headroom + cqe_bcnt);
 	skb = mlx5e_build_linear_skb(rq, va, frag_size, rx_headroom, cqe_bcnt);
 	if (unlikely(!skb))
 		return NULL;
@@ -1365,6 +1366,7 @@ mlx5e_skb_from_cqe_mpwrq_linear(struct mlx5e_rq *rq, struct mlx5e_mpw_info *wi,
 		return NULL; /* page/packet was consumed by XDP */
 	}
 
+	frag_size = MLX5_SKB_FRAG_SZ(rx_headroom + cqe_bcnt32);
 	skb = mlx5e_build_linear_skb(rq, va, frag_size, rx_headroom, cqe_bcnt32);
 	if (unlikely(!skb))
 		return NULL;



^ permalink raw reply	[flat|nested] 38+ messages in thread

* [PATCH net-next v3 29/33] xdp: allow bpf_xdp_adjust_tail() to grow packet size
  2020-05-08 11:08 [PATCH net-next v3 00/33] XDP extend with knowledge of frame size Jesper Dangaard Brouer
                   ` (27 preceding siblings ...)
  2020-05-08 11:11 ` [PATCH net-next v3 28/33] mlx5: rx queue setup time determine frame_sz for XDP Jesper Dangaard Brouer
@ 2020-05-08 11:11 ` Jesper Dangaard Brouer
  2020-05-08 11:11 ` [PATCH net-next v3 30/33] xdp: clear grow memory in bpf_xdp_adjust_tail() Jesper Dangaard Brouer
                   ` (3 subsequent siblings)
  32 siblings, 0 replies; 38+ messages in thread
From: Jesper Dangaard Brouer @ 2020-05-08 11:11 UTC (permalink / raw)
  To: sameehj
  Cc: Jesper Dangaard Brouer, netdev, bpf,
	Toke Høiland-Jørgensen, Daniel Borkmann,
	Alexei Starovoitov, David S. Miller, John Fastabend,
	Alexander Duyck, Jeff Kirsher, David Ahern, Ilias Apalodimas,
	Lorenzo Bianconi, Saeed Mahameed, Tariq Toukan

Finally, after all drivers have a frame size, allow BPF-helper
bpf_xdp_adjust_tail() to grow or extend packet size at frame tail.

Remember that helper/macro xdp_data_hard_end have reserved some
tailroom.  Thus, this helper makes sure that the BPF-prog don't have
access to this tailroom area.

V2: Remove one chicken check and use WARN_ONCE for other

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
---
 include/uapi/linux/bpf.h |    4 ++--
 net/core/filter.c        |   11 +++++++++--
 2 files changed, 11 insertions(+), 4 deletions(-)

diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 101b0c8a3784..3e44853388d8 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -2004,8 +2004,8 @@ union bpf_attr {
  * int bpf_xdp_adjust_tail(struct xdp_buff *xdp_md, int delta)
  * 	Description
  * 		Adjust (move) *xdp_md*\ **->data_end** by *delta* bytes. It is
- * 		only possible to shrink the packet as of this writing,
- * 		therefore *delta* must be a negative integer.
+ * 		possible to both shrink and grow the packet tail.
+ * 		Shrink done via *delta* being a negative integer.
  *
  * 		A call to this helper is susceptible to change the underlying
  * 		packet buffer. Therefore, at load time, all checks on pointers
diff --git a/net/core/filter.c b/net/core/filter.c
index dfaf5df13722..ec3ab2e2d800 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -3411,12 +3411,19 @@ static const struct bpf_func_proto bpf_xdp_adjust_head_proto = {
 
 BPF_CALL_2(bpf_xdp_adjust_tail, struct xdp_buff *, xdp, int, offset)
 {
+	void *data_hard_end = xdp_data_hard_end(xdp); /* use xdp->frame_sz */
 	void *data_end = xdp->data_end + offset;
 
-	/* only shrinking is allowed for now. */
-	if (unlikely(offset >= 0))
+	/* Notice that xdp_data_hard_end have reserved some tailroom */
+	if (unlikely(data_end > data_hard_end))
 		return -EINVAL;
 
+	/* ALL drivers MUST init xdp->frame_sz, chicken check below */
+	if (unlikely(xdp->frame_sz > PAGE_SIZE)) {
+		WARN_ONCE(1, "Too BIG xdp->frame_sz = %d\n", xdp->frame_sz);
+		return -EINVAL;
+	}
+
 	if (unlikely(data_end < xdp->data + ETH_HLEN))
 		return -EINVAL;
 



^ permalink raw reply	[flat|nested] 38+ messages in thread

* [PATCH net-next v3 30/33] xdp: clear grow memory in bpf_xdp_adjust_tail()
  2020-05-08 11:08 [PATCH net-next v3 00/33] XDP extend with knowledge of frame size Jesper Dangaard Brouer
                   ` (28 preceding siblings ...)
  2020-05-08 11:11 ` [PATCH net-next v3 29/33] xdp: allow bpf_xdp_adjust_tail() to grow packet size Jesper Dangaard Brouer
@ 2020-05-08 11:11 ` Jesper Dangaard Brouer
  2020-05-08 11:11 ` [PATCH net-next v3 31/33] bpf: add xdp.frame_sz in bpf_prog_test_run_xdp() Jesper Dangaard Brouer
                   ` (2 subsequent siblings)
  32 siblings, 0 replies; 38+ messages in thread
From: Jesper Dangaard Brouer @ 2020-05-08 11:11 UTC (permalink / raw)
  To: sameehj
  Cc: Jesper Dangaard Brouer, netdev, bpf,
	Toke Høiland-Jørgensen, Daniel Borkmann,
	Alexei Starovoitov, David S. Miller, John Fastabend,
	Alexander Duyck, Jeff Kirsher, David Ahern, Ilias Apalodimas,
	Lorenzo Bianconi, Saeed Mahameed, Tariq Toukan

Clearing memory of tail when grow happens, because it is too easy
to write a XDP_PASS program that extend the tail, which expose
this memory to users that can run tcpdump.

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
---
 net/core/filter.c |    4 ++++
 1 file changed, 4 insertions(+)

diff --git a/net/core/filter.c b/net/core/filter.c
index ec3ab2e2d800..691235208e0b 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -3427,6 +3427,10 @@ BPF_CALL_2(bpf_xdp_adjust_tail, struct xdp_buff *, xdp, int, offset)
 	if (unlikely(data_end < xdp->data + ETH_HLEN))
 		return -EINVAL;
 
+	/* Clear memory area on grow, can contain uninit kernel memory */
+	if (offset > 0)
+		memset(xdp->data_end, 0, offset);
+
 	xdp->data_end = data_end;
 
 	return 0;



^ permalink raw reply	[flat|nested] 38+ messages in thread

* [PATCH net-next v3 31/33] bpf: add xdp.frame_sz in bpf_prog_test_run_xdp().
  2020-05-08 11:08 [PATCH net-next v3 00/33] XDP extend with knowledge of frame size Jesper Dangaard Brouer
                   ` (29 preceding siblings ...)
  2020-05-08 11:11 ` [PATCH net-next v3 30/33] xdp: clear grow memory in bpf_xdp_adjust_tail() Jesper Dangaard Brouer
@ 2020-05-08 11:11 ` Jesper Dangaard Brouer
  2020-05-08 11:11 ` [PATCH net-next v3 32/33] selftests/bpf: adjust BPF selftest for xdp_adjust_tail Jesper Dangaard Brouer
  2020-05-08 11:11 ` [PATCH net-next v3 33/33] selftests/bpf: xdp_adjust_tail add grow tail tests Jesper Dangaard Brouer
  32 siblings, 0 replies; 38+ messages in thread
From: Jesper Dangaard Brouer @ 2020-05-08 11:11 UTC (permalink / raw)
  To: sameehj
  Cc: Jesper Dangaard Brouer, netdev, bpf,
	Toke Høiland-Jørgensen, Daniel Borkmann,
	Alexei Starovoitov, David S. Miller, John Fastabend,
	Alexander Duyck, Jeff Kirsher, David Ahern, Ilias Apalodimas,
	Lorenzo Bianconi, Saeed Mahameed, Tariq Toukan

Update the memory requirements, when adding xdp.frame_sz in BPF test_run
function bpf_prog_test_run_xdp() which e.g. is used by XDP selftests.

Specifically add the expected reserved tailroom, but also allocated a
larger memory area to reflect that XDP frames usually comes in this
format. Limit the provided packet data size to 4096 minus headroom +
tailroom, as this also reflect a common 3520 bytes MTU limit with XDP.

Note that bpf_test_init already use a memory allocation method that clears
memory.  Thus, this already guards against leaking uninit kernel memory.

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
---
 net/bpf/test_run.c |   16 ++++++++++++----
 1 file changed, 12 insertions(+), 4 deletions(-)

diff --git a/net/bpf/test_run.c b/net/bpf/test_run.c
index 29dbdd4c29f6..30ba7d38941d 100644
--- a/net/bpf/test_run.c
+++ b/net/bpf/test_run.c
@@ -470,25 +470,34 @@ int bpf_prog_test_run_skb(struct bpf_prog *prog, const union bpf_attr *kattr,
 int bpf_prog_test_run_xdp(struct bpf_prog *prog, const union bpf_attr *kattr,
 			  union bpf_attr __user *uattr)
 {
+	u32 tailroom = SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
+	u32 headroom = XDP_PACKET_HEADROOM;
 	u32 size = kattr->test.data_size_in;
 	u32 repeat = kattr->test.repeat;
 	struct netdev_rx_queue *rxqueue;
 	struct xdp_buff xdp = {};
 	u32 retval, duration;
+	u32 max_data_sz;
 	void *data;
 	int ret;
 
 	if (kattr->test.ctx_in || kattr->test.ctx_out)
 		return -EINVAL;
 
-	data = bpf_test_init(kattr, size, XDP_PACKET_HEADROOM + NET_IP_ALIGN, 0);
+	/* XDP have extra tailroom as (most) drivers use full page */
+	max_data_sz = 4096 - headroom - tailroom;
+	if (size > max_data_sz)
+		return -EINVAL;
+
+	data = bpf_test_init(kattr, max_data_sz, headroom, tailroom);
 	if (IS_ERR(data))
 		return PTR_ERR(data);
 
 	xdp.data_hard_start = data;
-	xdp.data = data + XDP_PACKET_HEADROOM + NET_IP_ALIGN;
+	xdp.data = data + headroom;
 	xdp.data_meta = xdp.data;
 	xdp.data_end = xdp.data + size;
+	xdp.frame_sz = headroom + max_data_sz + tailroom;
 
 	rxqueue = __netif_get_rx_queue(current->nsproxy->net_ns->loopback_dev, 0);
 	xdp.rxq = &rxqueue->xdp_rxq;
@@ -496,8 +505,7 @@ int bpf_prog_test_run_xdp(struct bpf_prog *prog, const union bpf_attr *kattr,
 	ret = bpf_test_run(prog, &xdp, repeat, &retval, &duration, true);
 	if (ret)
 		goto out;
-	if (xdp.data != data + XDP_PACKET_HEADROOM + NET_IP_ALIGN ||
-	    xdp.data_end != xdp.data + size)
+	if (xdp.data != data + headroom || xdp.data_end != xdp.data + size)
 		size = xdp.data_end - xdp.data;
 	ret = bpf_test_finish(kattr, uattr, xdp.data, size, retval, duration);
 out:



^ permalink raw reply	[flat|nested] 38+ messages in thread

* [PATCH net-next v3 32/33] selftests/bpf: adjust BPF selftest for xdp_adjust_tail
  2020-05-08 11:08 [PATCH net-next v3 00/33] XDP extend with knowledge of frame size Jesper Dangaard Brouer
                   ` (30 preceding siblings ...)
  2020-05-08 11:11 ` [PATCH net-next v3 31/33] bpf: add xdp.frame_sz in bpf_prog_test_run_xdp() Jesper Dangaard Brouer
@ 2020-05-08 11:11 ` Jesper Dangaard Brouer
  2020-05-08 11:11 ` [PATCH net-next v3 33/33] selftests/bpf: xdp_adjust_tail add grow tail tests Jesper Dangaard Brouer
  32 siblings, 0 replies; 38+ messages in thread
From: Jesper Dangaard Brouer @ 2020-05-08 11:11 UTC (permalink / raw)
  To: sameehj
  Cc: Jesper Dangaard Brouer, netdev, bpf,
	Toke Høiland-Jørgensen, Daniel Borkmann,
	Alexei Starovoitov, David S. Miller, John Fastabend,
	Alexander Duyck, Jeff Kirsher, David Ahern, Ilias Apalodimas,
	Lorenzo Bianconi, Saeed Mahameed, Tariq Toukan

Current selftest for BPF-helper xdp_adjust_tail only shrink tail.
Make it more clear that this is a shrink test case.

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
---
 .../selftests/bpf/prog_tests/xdp_adjust_tail.c     |    9 +++++-
 .../testing/selftests/bpf/progs/test_adjust_tail.c |   30 --------------------
 .../bpf/progs/test_xdp_adjust_tail_shrink.c        |   30 ++++++++++++++++++++
 3 files changed, 37 insertions(+), 32 deletions(-)
 delete mode 100644 tools/testing/selftests/bpf/progs/test_adjust_tail.c
 create mode 100644 tools/testing/selftests/bpf/progs/test_xdp_adjust_tail_shrink.c

diff --git a/tools/testing/selftests/bpf/prog_tests/xdp_adjust_tail.c b/tools/testing/selftests/bpf/prog_tests/xdp_adjust_tail.c
index 3744196d7cba..d258f979d5ef 100644
--- a/tools/testing/selftests/bpf/prog_tests/xdp_adjust_tail.c
+++ b/tools/testing/selftests/bpf/prog_tests/xdp_adjust_tail.c
@@ -1,9 +1,9 @@
 // SPDX-License-Identifier: GPL-2.0
 #include <test_progs.h>
 
-void test_xdp_adjust_tail(void)
+void test_xdp_adjust_tail_shrink(void)
 {
-	const char *file = "./test_adjust_tail.o";
+	const char *file = "./test_xdp_adjust_tail_shrink.o";
 	struct bpf_object *obj;
 	char buf[128];
 	__u32 duration, retval, size;
@@ -27,3 +27,8 @@ void test_xdp_adjust_tail(void)
 	      err, errno, retval, size);
 	bpf_object__close(obj);
 }
+
+void test_xdp_adjust_tail(void)
+{
+	test_xdp_adjust_tail_shrink();
+}
diff --git a/tools/testing/selftests/bpf/progs/test_adjust_tail.c b/tools/testing/selftests/bpf/progs/test_adjust_tail.c
deleted file mode 100644
index b7fc85769bdc..000000000000
--- a/tools/testing/selftests/bpf/progs/test_adjust_tail.c
+++ /dev/null
@@ -1,30 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0
- * Copyright (c) 2018 Facebook
- *
- * This program is free software; you can redistribute it and/or
- * modify it under the terms of version 2 of the GNU General Public
- * License as published by the Free Software Foundation.
- */
-#include <linux/bpf.h>
-#include <linux/if_ether.h>
-#include <bpf/bpf_helpers.h>
-
-int _version SEC("version") = 1;
-
-SEC("xdp_adjust_tail")
-int _xdp_adjust_tail(struct xdp_md *xdp)
-{
-	void *data_end = (void *)(long)xdp->data_end;
-	void *data = (void *)(long)xdp->data;
-	int offset = 0;
-
-	if (data_end - data == 54)
-		offset = 256;
-	else
-		offset = 20;
-	if (bpf_xdp_adjust_tail(xdp, 0 - offset))
-		return XDP_DROP;
-	return XDP_TX;
-}
-
-char _license[] SEC("license") = "GPL";
diff --git a/tools/testing/selftests/bpf/progs/test_xdp_adjust_tail_shrink.c b/tools/testing/selftests/bpf/progs/test_xdp_adjust_tail_shrink.c
new file mode 100644
index 000000000000..c8a7c17b54f4
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/test_xdp_adjust_tail_shrink.c
@@ -0,0 +1,30 @@
+/* SPDX-License-Identifier: GPL-2.0
+ * Copyright (c) 2018 Facebook
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of version 2 of the GNU General Public
+ * License as published by the Free Software Foundation.
+ */
+#include <linux/bpf.h>
+#include <linux/if_ether.h>
+#include <bpf/bpf_helpers.h>
+
+int _version SEC("version") = 1;
+
+SEC("xdp_adjust_tail_shrink")
+int _xdp_adjust_tail_shrink(struct xdp_md *xdp)
+{
+	void *data_end = (void *)(long)xdp->data_end;
+	void *data = (void *)(long)xdp->data;
+	int offset = 0;
+
+	if (data_end - data == 54) /* sizeof(pkt_v4) */
+		offset = 256; /* shrink too much */
+	else
+		offset = 20;
+	if (bpf_xdp_adjust_tail(xdp, 0 - offset))
+		return XDP_DROP;
+	return XDP_TX;
+}
+
+char _license[] SEC("license") = "GPL";



^ permalink raw reply	[flat|nested] 38+ messages in thread

* [PATCH net-next v3 33/33] selftests/bpf: xdp_adjust_tail add grow tail tests
  2020-05-08 11:08 [PATCH net-next v3 00/33] XDP extend with knowledge of frame size Jesper Dangaard Brouer
                   ` (31 preceding siblings ...)
  2020-05-08 11:11 ` [PATCH net-next v3 32/33] selftests/bpf: adjust BPF selftest for xdp_adjust_tail Jesper Dangaard Brouer
@ 2020-05-08 11:11 ` Jesper Dangaard Brouer
  32 siblings, 0 replies; 38+ messages in thread
From: Jesper Dangaard Brouer @ 2020-05-08 11:11 UTC (permalink / raw)
  To: sameehj
  Cc: Jesper Dangaard Brouer, netdev, bpf,
	Toke Høiland-Jørgensen, Daniel Borkmann,
	Alexei Starovoitov, David S. Miller, John Fastabend,
	Alexander Duyck, Jeff Kirsher, David Ahern, Ilias Apalodimas,
	Lorenzo Bianconi, Saeed Mahameed, Tariq Toukan

Extend BPF selftest xdp_adjust_tail with grow tail tests, which is added
as subtest's. The first grow test stays in same form as original shrink
test. The second grow test use the newer bpf_prog_test_run_xattr() calls,
and does extra checking of data contents.

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
---
 .../selftests/bpf/prog_tests/xdp_adjust_tail.c     |  116 +++++++++++++++++++-
 .../bpf/progs/test_xdp_adjust_tail_grow.c          |   33 ++++++
 2 files changed, 144 insertions(+), 5 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/progs/test_xdp_adjust_tail_grow.c

diff --git a/tools/testing/selftests/bpf/prog_tests/xdp_adjust_tail.c b/tools/testing/selftests/bpf/prog_tests/xdp_adjust_tail.c
index d258f979d5ef..1498627af6e8 100644
--- a/tools/testing/selftests/bpf/prog_tests/xdp_adjust_tail.c
+++ b/tools/testing/selftests/bpf/prog_tests/xdp_adjust_tail.c
@@ -4,10 +4,10 @@
 void test_xdp_adjust_tail_shrink(void)
 {
 	const char *file = "./test_xdp_adjust_tail_shrink.o";
+	__u32 duration, retval, size, expect_sz;
 	struct bpf_object *obj;
-	char buf[128];
-	__u32 duration, retval, size;
 	int err, prog_fd;
+	char buf[128];
 
 	err = bpf_prog_load(file, BPF_PROG_TYPE_XDP, &obj, &prog_fd);
 	if (CHECK_FAIL(err))
@@ -20,15 +20,121 @@ void test_xdp_adjust_tail_shrink(void)
 	      "ipv4", "err %d errno %d retval %d size %d\n",
 	      err, errno, retval, size);
 
+	expect_sz = sizeof(pkt_v6) - 20;  /* Test shrink with 20 bytes */
 	err = bpf_prog_test_run(prog_fd, 1, &pkt_v6, sizeof(pkt_v6),
 				buf, &size, &retval, &duration);
-	CHECK(err || retval != XDP_TX || size != 54,
-	      "ipv6", "err %d errno %d retval %d size %d\n",
+	CHECK(err || retval != XDP_TX || size != expect_sz,
+	      "ipv6", "err %d errno %d retval %d size %d expect-size %d\n",
+	      err, errno, retval, size, expect_sz);
+	bpf_object__close(obj);
+}
+
+void test_xdp_adjust_tail_grow(void)
+{
+	const char *file = "./test_xdp_adjust_tail_grow.o";
+	struct bpf_object *obj;
+	char buf[4096]; /* avoid segfault: large buf to hold grow results */
+	__u32 duration, retval, size, expect_sz;
+	int err, prog_fd;
+
+	err = bpf_prog_load(file, BPF_PROG_TYPE_XDP, &obj, &prog_fd);
+	if (CHECK_FAIL(err))
+		return;
+
+	err = bpf_prog_test_run(prog_fd, 1, &pkt_v4, sizeof(pkt_v4),
+				buf, &size, &retval, &duration);
+	CHECK(err || retval != XDP_DROP,
+	      "ipv4", "err %d errno %d retval %d size %d\n",
 	      err, errno, retval, size);
+
+	expect_sz = sizeof(pkt_v6) + 40; /* Test grow with 40 bytes */
+	err = bpf_prog_test_run(prog_fd, 1, &pkt_v6, sizeof(pkt_v6) /* 74 */,
+				buf, &size, &retval, &duration);
+	CHECK(err || retval != XDP_TX || size != expect_sz,
+	      "ipv6", "err %d errno %d retval %d size %d expect-size %d\n",
+	      err, errno, retval, size, expect_sz);
+
+	bpf_object__close(obj);
+}
+
+void test_xdp_adjust_tail_grow2(void)
+{
+	const char *file = "./test_xdp_adjust_tail_grow.o";
+	char buf[4096]; /* avoid segfault: large buf to hold grow results */
+	int tailroom = 320; /* SKB_DATA_ALIGN(sizeof(struct skb_shared_info)) */;
+	struct bpf_object *obj;
+	int err, cnt, i;
+	int max_grow;
+
+	struct bpf_prog_test_run_attr tattr = {
+		.repeat 	= 1,
+		.data_in	= &buf,
+		.data_out	= &buf,
+		.data_size_in	= 0, /* Per test */
+		.data_size_out	= 0, /* Per test */
+	};
+
+	err = bpf_prog_load(file, BPF_PROG_TYPE_XDP, &obj, &tattr.prog_fd);
+	if (CHECK_ATTR(err, "load", "err %d errno %d\n", err, errno))
+		return;
+
+	/* Test case-64 */
+	memset(buf, 1, sizeof(buf));
+	tattr.data_size_in  =  64; /* Determine test case via pkt size */
+	tattr.data_size_out = 128; /* Limit copy_size */
+	/* Kernel side alloc packet memory area that is zero init */
+	err = bpf_prog_test_run_xattr(&tattr);
+
+	CHECK_ATTR(errno != ENOSPC /* Due limit copy_size in bpf_test_finish */
+		   || tattr.retval != XDP_TX
+		   || tattr.data_size_out != 192, /* Expected grow size */
+		   "case-64",
+		   "err %d errno %d retval %d size %d\n",
+		   err, errno, tattr.retval, tattr.data_size_out);
+
+	/* Extra checks for data contents */
+	CHECK_ATTR(tattr.data_size_out != 192
+		   || buf[0]   != 1 ||  buf[63]  != 1  /*  0-63  memset to 1 */
+		   || buf[64]  != 0 ||  buf[127] != 0  /* 64-127 memset to 0 */
+		   || buf[128] != 1 ||  buf[191] != 1, /*128-191 memset to 1 */
+		   "case-64-data",
+		   "err %d errno %d retval %d size %d\n",
+		   err, errno, tattr.retval, tattr.data_size_out);
+
+	/* Test case-128 */
+	memset(buf, 2, sizeof(buf));
+	tattr.data_size_in  = 128; /* Determine test case via pkt size */
+	tattr.data_size_out = sizeof(buf);   /* Copy everything */
+	err = bpf_prog_test_run_xattr(&tattr);
+
+	max_grow = 4096 - XDP_PACKET_HEADROOM -	tailroom; /* 3520 */
+	CHECK_ATTR(err
+		   || tattr.retval != XDP_TX
+		   || tattr.data_size_out != max_grow, /* Expect max grow size */
+		   "case-128",
+		   "err %d errno %d retval %d size %d expect-size %d\n",
+		   err, errno, tattr.retval, tattr.data_size_out, max_grow);
+
+	/* Extra checks for data contents: Count grow size, will contain zeros */
+	for (i = 0, cnt = 0; i < sizeof(buf); i++) {
+		if (buf[i] == 0)
+			cnt++;
+	}
+	CHECK_ATTR((cnt != (max_grow - tattr.data_size_in)) /* Grow increase */
+		   || tattr.data_size_out != max_grow, /* Total grow size */
+		   "case-128-data",
+		   "err %d errno %d retval %d size %d grow-size %d\n",
+		   err, errno, tattr.retval, tattr.data_size_out, cnt);
+
 	bpf_object__close(obj);
 }
 
 void test_xdp_adjust_tail(void)
 {
-	test_xdp_adjust_tail_shrink();
+	if (test__start_subtest("xdp_adjust_tail_shrink"))
+		test_xdp_adjust_tail_shrink();
+	if (test__start_subtest("xdp_adjust_tail_grow"))
+		test_xdp_adjust_tail_grow();
+	if (test__start_subtest("xdp_adjust_tail_grow2"))
+		test_xdp_adjust_tail_grow2();
 }
diff --git a/tools/testing/selftests/bpf/progs/test_xdp_adjust_tail_grow.c b/tools/testing/selftests/bpf/progs/test_xdp_adjust_tail_grow.c
new file mode 100644
index 000000000000..3d66599eee2e
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/test_xdp_adjust_tail_grow.c
@@ -0,0 +1,33 @@
+// SPDX-License-Identifier: GPL-2.0
+#include <linux/bpf.h>
+#include <bpf/bpf_helpers.h>
+
+SEC("xdp_adjust_tail_grow")
+int _xdp_adjust_tail_grow(struct xdp_md *xdp)
+{
+	void *data_end = (void *)(long)xdp->data_end;
+	void *data = (void *)(long)xdp->data;
+	unsigned int data_len;
+	int offset = 0;
+
+	/* Data length determine test case */
+	data_len = data_end - data;
+
+	if (data_len == 54) { /* sizeof(pkt_v4) */
+		offset = 4096; /* test too large offset */
+	} else if (data_len == 74) { /* sizeof(pkt_v6) */
+		offset = 40;
+	} else if (data_len == 64) {
+		offset = 128;
+	} else if (data_len == 128) {
+		offset = 4096 - 256 - 320 - data_len; /* Max tail grow 3520 */
+	} else {
+		return XDP_ABORTED; /* No matching test */
+	}
+
+	if (bpf_xdp_adjust_tail(xdp, offset))
+		return XDP_DROP;
+	return XDP_TX;
+}
+
+char _license[] SEC("license") = "GPL";



^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH net-next v3 07/33] xdp: xdp_frame add member frame_sz and handle in convert_to_xdp_frame
  2020-05-08 11:09 ` [PATCH net-next v3 07/33] xdp: xdp_frame add member frame_sz and handle in convert_to_xdp_frame Jesper Dangaard Brouer
@ 2020-05-08 16:00   ` Jakub Kicinski
  0 siblings, 0 replies; 38+ messages in thread
From: Jakub Kicinski @ 2020-05-08 16:00 UTC (permalink / raw)
  To: Jesper Dangaard Brouer
  Cc: sameehj, netdev, bpf, Toke Høiland-Jørgensen,
	Daniel Borkmann, Alexei Starovoitov, David S. Miller,
	John Fastabend, Alexander Duyck, Jeff Kirsher, David Ahern,
	Ilias Apalodimas, Lorenzo Bianconi, Saeed Mahameed, Tariq Toukan

On Fri, 08 May 2020 13:09:21 +0200 Jesper Dangaard Brouer wrote:
> +/* Used by XDP_WARN macro, to avoid inlining WARN() in fast-path */
> +void xdp_warn(const char* msg, const char* func, const int line) {
> +	WARN(1, "XDP_WARN: %s(line:%d): %s\n", func, line, msg);
> +};
> +EXPORT_SYMBOL_GPL(xdp_warn);

ERROR: "foo* bar" should be "foo *bar"
#59: FILE: include/net/xdp.h:109:
+void xdp_warn(const char* msg, const char* func, const int line);

ERROR: "foo* bar" should be "foo *bar"
#59: FILE: include/net/xdp.h:109:
+void xdp_warn(const char* msg, const char* func, const int line);

ERROR: "foo* bar" should be "foo *bar"
#104: FILE: net/core/xdp.c:502:
+void xdp_warn(const char* msg, const char* func, const int line) {

ERROR: "foo* bar" should be "foo *bar"
#104: FILE: net/core/xdp.c:502:
+void xdp_warn(const char* msg, const char* func, const int line) {

ERROR: open brace '{' following function definitions go on the next line
#104: FILE: net/core/xdp.c:502:
+void xdp_warn(const char* msg, const char* func, const int line) {

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH net-next v3 24/33] ixgbevf: add XDP frame size to VF driver
  2020-05-08 11:10 ` [PATCH net-next v3 24/33] ixgbevf: add XDP frame size to VF driver Jesper Dangaard Brouer
@ 2020-05-08 16:01   ` Jakub Kicinski
  0 siblings, 0 replies; 38+ messages in thread
From: Jakub Kicinski @ 2020-05-08 16:01 UTC (permalink / raw)
  To: Jesper Dangaard Brouer
  Cc: sameehj, netdev, bpf, Toke Høiland-Jørgensen,
	Daniel Borkmann, Alexei Starovoitov, David S. Miller,
	John Fastabend, Alexander Duyck, Jeff Kirsher, David Ahern,
	Ilias Apalodimas, Lorenzo Bianconi, Saeed Mahameed, Tariq Toukan

On Fri, 08 May 2020 13:10:48 +0200 Jesper Dangaard Brouer wrote:
> +static unsigned int ixgbevf_rx_frame_truesize(struct ixgbevf_ring *rx_ring,
> +					      unsigned int size)
> +{
> +	unsigned int truesize;
> +
> +#if (PAGE_SIZE < 8192)
> +	truesize = ixgbevf_rx_pg_size(rx_ring) / 2; /* Must be power-of-2 */
> +#else
> +	truesize = ring_uses_build_skb(rx_ring) ?
> +		SKB_DATA_ALIGN(IXGBEVF_SKB_PAD + size) +
> +		SKB_DATA_ALIGN(sizeof(struct skb_shared_info)) :
> +		SKB_DATA_ALIGN(size);
> +#endif
> +       return truesize;
> +}

WARNING: please, no spaces at the start of a line
#43: FILE: drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c:1111:
+       return truesize;$

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH net-next v3 21/33] virtio_net: add XDP frame size in two code paths
  2020-05-08 11:10 ` [PATCH net-next v3 21/33] virtio_net: add XDP frame size in two code paths Jesper Dangaard Brouer
@ 2020-05-09  2:15   ` Jason Wang
  0 siblings, 0 replies; 38+ messages in thread
From: Jason Wang @ 2020-05-09  2:15 UTC (permalink / raw)
  To: Jesper Dangaard Brouer, sameehj
  Cc: netdev, bpf, Toke Høiland-Jørgensen, Daniel Borkmann,
	Alexei Starovoitov, David S. Miller, John Fastabend,
	Alexander Duyck, Jeff Kirsher, David Ahern, Ilias Apalodimas,
	Lorenzo Bianconi, Saeed Mahameed, Tariq Toukan


On 2020/5/8 下午7:10, Jesper Dangaard Brouer wrote:
> The virtio_net driver is running inside the guest-OS. There are two
> XDP receive code-paths in virtio_net, namely receive_small() and
> receive_mergeable(). The receive_big() function does not support XDP.
>
> In receive_small() the frame size is available in buflen. The buffer
> backing these frames are allocated in add_recvbuf_small() with same
> size, except for the headroom, but tailroom have reserved room for
> skb_shared_info. The headroom is encoded in ctx pointer as a value.
>
> In receive_mergeable() the frame size is more dynamic. There are two
> basic cases: (1) buffer size is based on a exponentially weighted
> moving average (see DECLARE_EWMA) of packet length. Or (2) in case
> virtnet_get_headroom() have any headroom then buffer size is
> PAGE_SIZE. The ctx pointer is this time used for encoding two values;
> the buffer len "truesize" and headroom. In case (1) if the rx buffer
> size is underestimated, the packet will have been split over more
> buffers (num_buf info in virtio_net_hdr_mrg_rxbuf placed in top of
> buffer area). If that happens the XDP path does a xdp_linearize_page
> operation.
>
> V3: Adjust frame_sz in receive_mergeable() case, spotted by Jason Wang.
>
> The code is really hard to follow,


Yes, I plan to rework to make it more easier for reviewers.


>   so some hints to reviewers.
> The receive_mergeable() case gets frames that were allocated in
> add_recvbuf_mergeable() which uses headroom=virtnet_get_headroom(),
> and 'buf' ptr is advanced this headroom.  The headroom can only
> be 0 or VIRTIO_XDP_HEADROOM, as virtnet_get_headroom is really
> simple:
>
>    static unsigned int virtnet_get_headroom(struct virtnet_info *vi)
>    {
> 	return vi->xdp_queue_pairs ? VIRTIO_XDP_HEADROOM : 0;
>    }
>
> As frame_sz is an offset size from xdp.data_hard_start, reviewers
> should notice how this is calculated in receive_mergeable():
>
>    int offset = buf - page_address(page);
>    [...]
>    data = page_address(xdp_page) + offset;
>    xdp.data_hard_start = data - VIRTIO_XDP_HEADROOM + vi->hdr_len;
>
> The calculated offset will always be VIRTIO_XDP_HEADROOM when
> reaching this code.  Thus, xdp.data_hard_start will be page-start
> address plus vi->hdr_len.  Given this xdp.frame_sz need to be
> reduced with vi->hdr_len size.
>
> IMHO a followup patch should cleanup this code to make it easier
> to maintain and understand, but it is outside the scope of this
> patchset.
>
> Cc: Jason Wang <jasowang@redhat.com>
> Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
> Acked-by: Michael S. Tsirkin <mst@redhat.com>


Acked-by: Jason Wang <jasowang@redhat.com>

Thanks


> ---
>   drivers/net/virtio_net.c |   15 ++++++++++++---
>   1 file changed, 12 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index 11f722460513..9e1b5d748586 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -689,6 +689,7 @@ static struct sk_buff *receive_small(struct net_device *dev,
>   		xdp.data_end = xdp.data + len;
>   		xdp.data_meta = xdp.data;
>   		xdp.rxq = &rq->xdp_rxq;
> +		xdp.frame_sz = buflen;
>   		orig_data = xdp.data;
>   		act = bpf_prog_run_xdp(xdp_prog, &xdp);
>   		stats->xdp_packets++;
> @@ -797,10 +798,11 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
>   	int offset = buf - page_address(page);
>   	struct sk_buff *head_skb, *curr_skb;
>   	struct bpf_prog *xdp_prog;
> -	unsigned int truesize;
> +	unsigned int truesize = mergeable_ctx_to_truesize(ctx);
>   	unsigned int headroom = mergeable_ctx_to_headroom(ctx);
> -	int err;
>   	unsigned int metasize = 0;
> +	unsigned int frame_sz;
> +	int err;
>   
>   	head_skb = NULL;
>   	stats->bytes += len - vi->hdr_len;
> @@ -821,6 +823,11 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
>   		if (unlikely(hdr->hdr.gso_type))
>   			goto err_xdp;
>   
> +		/* Buffers with headroom use PAGE_SIZE as alloc size,
> +		 * see add_recvbuf_mergeable() + get_mergeable_buf_len()
> +		 */
> +		frame_sz = headroom ? PAGE_SIZE : truesize;
> +
>   		/* This happens when rx buffer size is underestimated
>   		 * or headroom is not enough because of the buffer
>   		 * was refilled before XDP is set. This should only
> @@ -834,6 +841,8 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
>   						      page, offset,
>   						      VIRTIO_XDP_HEADROOM,
>   						      &len);
> +			frame_sz = PAGE_SIZE;
> +
>   			if (!xdp_page)
>   				goto err_xdp;
>   			offset = VIRTIO_XDP_HEADROOM;
> @@ -850,6 +859,7 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
>   		xdp.data_end = xdp.data + (len - vi->hdr_len);
>   		xdp.data_meta = xdp.data;
>   		xdp.rxq = &rq->xdp_rxq;
> +		xdp.frame_sz = frame_sz - vi->hdr_len;
>   
>   		act = bpf_prog_run_xdp(xdp_prog, &xdp);
>   		stats->xdp_packets++;
> @@ -924,7 +934,6 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
>   	}
>   	rcu_read_unlock();
>   
> -	truesize = mergeable_ctx_to_truesize(ctx);
>   	if (unlikely(len > truesize)) {
>   		pr_debug("%s: rx error: len %u exceeds truesize %lu\n",
>   			 dev->name, len, (unsigned long)ctx);
>
>


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH net-next v3 28/33] mlx5: rx queue setup time determine frame_sz for XDP
  2020-05-08 11:11 ` [PATCH net-next v3 28/33] mlx5: rx queue setup time determine frame_sz for XDP Jesper Dangaard Brouer
@ 2020-05-10  6:31   ` Tariq Toukan
  0 siblings, 0 replies; 38+ messages in thread
From: Tariq Toukan @ 2020-05-10  6:31 UTC (permalink / raw)
  To: Jesper Dangaard Brouer, sameehj
  Cc: netdev, bpf, Toke Høiland-Jørgensen, Daniel Borkmann,
	Alexei Starovoitov, David S. Miller, John Fastabend,
	Alexander Duyck, Jeff Kirsher, David Ahern, Ilias Apalodimas,
	Lorenzo Bianconi, Saeed Mahameed



On 5/8/2020 2:11 PM, Jesper Dangaard Brouer wrote:
> The mlx5 driver have multiple memory models, which are also changed
> according to whether a XDP bpf_prog is attached.
> 
> The 'rx_striding_rq' setting is adjusted via ethtool priv-flags e.g.:
>   # ethtool --set-priv-flags mlx5p2 rx_striding_rq off
> 
> On the general case with 4K page_size and regular MTU packet, then
> the frame_sz is 2048 and 4096 when XDP is enabled, in both modes.
> 
> The info on the given frame size is stored differently depending on the
> RQ-mode and encoded in a union in struct mlx5e_rq union wqe/mpwqe.
> In rx striding mode rq->mpwqe.log_stride_sz is either 11 or 12, which
> corresponds to 2048 or 4096 (MLX5_WQ_TYPE_LINKED_LIST_STRIDING_RQ).
> In non-striding mode (MLX5_WQ_TYPE_CYCLIC) the frag_stride is stored
> in rq->wqe.info.arr[0].frag_stride, for the first fragment, which is
> what the XDP case cares about.
> 
> To reduce effect on fast-path, this patch determine the frame_sz at
> setup time, to avoid determining the memory model runtime. Variable
> is named first_frame_sz to make it clear that this is only the frame
> size of the first fragment.
> 
> This mlx5 driver does a DMA-sync on XDP_TX action, but grow is safe
> as it have done a DMA-map on the entire PAGE_SIZE. The driver also
> already does a XDP length check against sq->hw_mtu on the possible
> XDP xmit paths mlx5e_xmit_xdp_frame() + mlx5e_xmit_xdp_frame_mpwqe().
> 
> V2: Fix that frag_size need to be recalc before creating SKB.
> 
> Cc: Tariq Toukan <tariqt@mellanox.com>
> Cc: Saeed Mahameed <saeedm@mellanox.com>
> Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
> ---

Thanks Jesper.
Acked-by: Tariq Toukan <tariqt@mellanox.com>

^ permalink raw reply	[flat|nested] 38+ messages in thread

end of thread, other threads:[~2020-05-10  6:31 UTC | newest]

Thread overview: 38+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-05-08 11:08 [PATCH net-next v3 00/33] XDP extend with knowledge of frame size Jesper Dangaard Brouer
2020-05-08 11:08 ` [PATCH net-next v3 01/33] xdp: add frame size to xdp_buff Jesper Dangaard Brouer
2020-05-08 11:08 ` [PATCH net-next v3 02/33] bnxt: add XDP frame size to driver Jesper Dangaard Brouer
2020-05-08 11:09 ` [PATCH net-next v3 03/33] sfc: add XDP frame size Jesper Dangaard Brouer
2020-05-08 11:09 ` [PATCH net-next v3 04/33] mvneta: add XDP frame size to driver Jesper Dangaard Brouer
2020-05-08 11:09 ` [PATCH net-next v3 05/33] net: netsec: Add support for XDP frame size Jesper Dangaard Brouer
2020-05-08 11:09 ` [PATCH net-next v3 06/33] net: XDP-generic determining " Jesper Dangaard Brouer
2020-05-08 11:09 ` [PATCH net-next v3 07/33] xdp: xdp_frame add member frame_sz and handle in convert_to_xdp_frame Jesper Dangaard Brouer
2020-05-08 16:00   ` Jakub Kicinski
2020-05-08 11:09 ` [PATCH net-next v3 08/33] xdp: cpumap redirect use frame_sz and increase skb_tailroom Jesper Dangaard Brouer
2020-05-08 11:09 ` [PATCH net-next v3 09/33] veth: adjust hard_start offset on redirect XDP frames Jesper Dangaard Brouer
2020-05-08 11:09 ` [PATCH net-next v3 10/33] veth: xdp using frame_sz in veth driver Jesper Dangaard Brouer
2020-05-08 11:09 ` [PATCH net-next v3 11/33] dpaa2-eth: add XDP frame size Jesper Dangaard Brouer
2020-05-08 11:09 ` [PATCH net-next v3 12/33] hv_netvsc: add XDP frame size to driver Jesper Dangaard Brouer
2020-05-08 11:09 ` [PATCH net-next v3 13/33] qlogic/qede: " Jesper Dangaard Brouer
2020-05-08 11:09 ` [PATCH net-next v3 14/33] net: ethernet: ti: add XDP frame size to driver cpsw Jesper Dangaard Brouer
2020-05-08 11:10 ` [PATCH net-next v3 15/33] ena: add XDP frame size to amazon NIC driver Jesper Dangaard Brouer
2020-05-08 11:10 ` [PATCH net-next v3 16/33] mlx4: add XDP frame size and adjust max XDP MTU Jesper Dangaard Brouer
2020-05-08 11:10 ` [PATCH net-next v3 17/33] net: thunderx: add XDP frame size Jesper Dangaard Brouer
2020-05-08 11:10 ` [PATCH net-next v3 18/33] nfp: add XDP frame size to netronome driver Jesper Dangaard Brouer
2020-05-08 11:10 ` [PATCH net-next v3 19/33] tun: add XDP frame size Jesper Dangaard Brouer
2020-05-08 11:10 ` [PATCH net-next v3 20/33] vhost_net: also populate " Jesper Dangaard Brouer
2020-05-08 11:10 ` [PATCH net-next v3 21/33] virtio_net: add XDP frame size in two code paths Jesper Dangaard Brouer
2020-05-09  2:15   ` Jason Wang
2020-05-08 11:10 ` [PATCH net-next v3 22/33] ixgbe: fix XDP redirect on archs with PAGE_SIZE above 4K Jesper Dangaard Brouer
2020-05-08 11:10 ` [PATCH net-next v3 23/33] ixgbe: add XDP frame size to driver Jesper Dangaard Brouer
2020-05-08 11:10 ` [PATCH net-next v3 24/33] ixgbevf: add XDP frame size to VF driver Jesper Dangaard Brouer
2020-05-08 16:01   ` Jakub Kicinski
2020-05-08 11:10 ` [PATCH net-next v3 25/33] i40e: add XDP frame size to driver Jesper Dangaard Brouer
2020-05-08 11:10 ` [PATCH net-next v3 26/33] ice: " Jesper Dangaard Brouer
2020-05-08 11:11 ` [PATCH net-next v3 27/33] xdp: for Intel AF_XDP drivers add XDP frame_sz Jesper Dangaard Brouer
2020-05-08 11:11 ` [PATCH net-next v3 28/33] mlx5: rx queue setup time determine frame_sz for XDP Jesper Dangaard Brouer
2020-05-10  6:31   ` Tariq Toukan
2020-05-08 11:11 ` [PATCH net-next v3 29/33] xdp: allow bpf_xdp_adjust_tail() to grow packet size Jesper Dangaard Brouer
2020-05-08 11:11 ` [PATCH net-next v3 30/33] xdp: clear grow memory in bpf_xdp_adjust_tail() Jesper Dangaard Brouer
2020-05-08 11:11 ` [PATCH net-next v3 31/33] bpf: add xdp.frame_sz in bpf_prog_test_run_xdp() Jesper Dangaard Brouer
2020-05-08 11:11 ` [PATCH net-next v3 32/33] selftests/bpf: adjust BPF selftest for xdp_adjust_tail Jesper Dangaard Brouer
2020-05-08 11:11 ` [PATCH net-next v3 33/33] selftests/bpf: xdp_adjust_tail add grow tail tests Jesper Dangaard Brouer

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.