* [PATCH RFC-v5 bpf-next 00/12] Add support for XDP in egress path
@ 2020-04-13 17:17 David Ahern
  2020-04-13 17:17 ` [PATCH RFC-v5 bpf-next 01/12] net: Add XDP setup and query commands for Tx programs David Ahern
                   ` (12 more replies)
  0 siblings, 13 replies; 27+ messages in thread
From: David Ahern @ 2020-04-13 17:17 UTC (permalink / raw)
  To: netdev
  Cc: davem, kuba, prashantbhole.linux, jasowang, brouer, toke,
	toshiaki.makita1, daniel, john.fastabend, ast, kafai,
	songliubraving, yhs, andriin, dsahern

From: David Ahern <dsahern@gmail.com>

This series adds support for XDP in the egress path by introducing
a new XDP attachment type, BPF_XDP_EGRESS, and adding a UAPI to
if_link.h for attaching the program to a netdevice and reporting
the program. BPF programs can be run on all packets in the Tx path,
whether skbs or redirected xdp frames. The intent is to emulate the
current Rx path for XDP as much as possible to maintain consistency
and symmetry in the two paths with their APIs.

This is a missing primitive for XDP. It allows solutions to be built
from small, targeted programs properly distributed in the networking
path, for example an egress firewall/ACL/traffic verification, or
packet manipulation and encapping of an entire ethernet frame, whether
it is locally generated traffic, forwarded via the slow path (i.e.,
full stack processing), or an xdp redirected frame.

Nothing about running a program in the Tx path requires driver-specific
resources the way the Rx path does. Thus, programs can be run in core
code and attached to the net_device struct, similar to skb mode. The
existing XDP_FLAGS_*_MODE flags are not relevant at the moment, so none
can be set in the attach. XDP_FLAGS_HW_MODE can be used in the future
(e.g., the work on offloading programs from a VM).

The locations chosen to run the egress program (__netdev_start_xmit
before the call to ndo_start_xmit, and bq_xmit_all before invoking
ndo_xdp_xmit) allow follow-on patch sets to handle tx queueing and
setting the queue index if multi-queue, with consistency in handling
both packet formats.

A few of the patches trace back to work done on offloading programs
from a VM by Jason Wang and Prashant Bhole [1].

Automated tests will be added to tools/testing/selftests/bpf in the
next revision.

v5:
- updated cover letter
- moved running of ebpf program from ndo_{start,xdp}_xmit to core
  code. Accordingly, dropped all tun and vhost related changes.
- added egress support to bpftool

v4:
- updated cover letter
- patches related to code movement between tuntap, headers and vhost
  are dropped; the previous RFC ran the XDP program in vhost context,
  whereas this set runs it before queueing to vhost. As part of this,
  moved invocation of the egress program to tun_net_xmit and
  tun_xdp_xmit.
- renamed do_xdp_generic to do_xdp_generic_rx to emphasize that it is
  called in the Rx path; added rx argument to do_xdp_generic_core since
  it is used for both directions and needs to know which queue values
  to set in xdp_buff

v3:
- reworked the patches - splitting patch 1 from RFC v2 into 3, combining
  patch 2 from RFC v2 into the first 3, combining patches 6 and 7 from
  RFC v2 into 1 since both did a trivial rename and export. Reordered
  the patches such that kernel changes are first followed by libbpf and
  an enhancement to a sample.

- moved small xdp related helper functions from tun.c to tun.h to make
  tun_ptr_free usable from the tap code. This is needed to handle the
  case of tap builtin and tun built as a module.

- pkt_ptrs added to `struct tun_file` and passed to tun_consume_packets
  rather than declaring pkts as an array on the stack.

v2:
- New XDP attachment type: Jesper, Toke and Alexei discussed whether
  to introduce a new program type. Since this set adds a way to attach
  a regular XDP program to the tx path, as per Alexei's suggestion, a
  new attachment type BPF_XDP_EGRESS is introduced.

- libbpf API changes:
  Alexei had suggested an _opts() style of API extension. With that in
  mind, two new libbpf APIs are introduced which are equivalent to the
  existing APIs. The new ones can be extended easily. Please see the
  individual patches for details. The xdp1 sample program is modified
  to use the new APIs.

- tun: Some patches from the previous set are removed as they are
  irrelevant in this series. They will be introduced later.

[1]: https://netdevconf.info/0x13/session.html?xdp-offload-with-virtio-net

David Ahern (12):
  net: Add XDP setup and query commands for Tx programs
  net: Add BPF_XDP_EGRESS as a bpf_attach_type
  xdp: Add xdp_txq_info to xdp_buff
  net: Add IFLA_XDP_EGRESS for XDP programs in the egress path
  net: core: rename netif_receive_generic_xdp to do_generic_xdp_core
  net: core: Rename do_xdp_generic to do_xdp_generic_rx
  dev: set egress XDP program
  dev: Support xdp in the Tx path for packets as an skb
  dev: Support xdp in the Tx path for xdp_frames
  libbpf: Add egress XDP support
  bpftool: Add support for XDP egress
  samples/bpf: add XDP egress support to xdp1

 drivers/net/tun.c                  |   4 +-
 include/linux/netdevice.h          |  23 ++-
 include/net/xdp.h                  |   5 +
 include/uapi/linux/bpf.h           |   3 +
 include/uapi/linux/if_link.h       |   3 +
 kernel/bpf/devmap.c                |  19 ++-
 net/core/dev.c                     | 258 +++++++++++++++++++++++------
 net/core/filter.c                  |  23 +++
 net/core/rtnetlink.c               |  96 ++++++++++-
 samples/bpf/xdp1_user.c            |  38 ++++-
 tools/bpf/bpftool/main.h           |   2 +-
 tools/bpf/bpftool/net.c            |  49 +++++-
 tools/bpf/bpftool/netlink_dumper.c |  12 +-
 tools/include/uapi/linux/bpf.h     |   3 +
 tools/include/uapi/linux/if_link.h |   3 +
 tools/lib/bpf/libbpf.c             |   2 +
 tools/lib/bpf/libbpf.h             |   9 +-
 tools/lib/bpf/libbpf.map           |   2 +
 tools/lib/bpf/netlink.c            |  63 ++++++-
 19 files changed, 528 insertions(+), 89 deletions(-)

-- 
2.21.1 (Apple Git-122.3)



* [PATCH RFC-v5 bpf-next 01/12] net: Add XDP setup and query commands for Tx programs
  2020-04-13 17:17 [PATCH RFC-v5 bpf-next 00/12] Add support for XDP in egress path David Ahern
@ 2020-04-13 17:17 ` David Ahern
  2020-04-13 17:17 ` [PATCH RFC-v5 bpf-next 02/12] net: Add BPF_XDP_EGRESS as a bpf_attach_type David Ahern
                   ` (11 subsequent siblings)
  12 siblings, 0 replies; 27+ messages in thread
From: David Ahern @ 2020-04-13 17:17 UTC (permalink / raw)
  To: netdev
  Cc: davem, kuba, prashantbhole.linux, jasowang, brouer, toke,
	toshiaki.makita1, daniel, john.fastabend, ast, kafai,
	songliubraving, yhs, andriin, dsahern, David Ahern

From: David Ahern <dahern@digitalocean.com>

Add new netdev commands, XDP_SETUP_PROG_EGRESS and
XDP_QUERY_PROG_EGRESS, to set up and query egress programs.

Update dev_change_xdp_fd and dev_xdp_install to take an egress argument.
If egress bool is set, then use XDP_SETUP_PROG_EGRESS in dev_xdp_install
and XDP_QUERY_PROG_EGRESS in dev_change_xdp_fd.

Update dev_xdp_uninstall to query for XDP_QUERY_PROG_EGRESS and if a
program is installed call dev_xdp_install with the egress argument set
to true.

This enables existing infrastructure to be used for XDP programs in
the egress path.
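
As a reading aid, the command selection described above can be modeled
in userspace roughly as follows. This is only a sketch: the enum and
the XDP_FLAGS_HW_MODE value are local stand-ins for the kernel
definitions, and setup_cmd() and query_cmd() are hypothetical helper
names that do not exist in the kernel.

```c
#include <stdbool.h>
#include <stdint.h>

/* Local stand-in for the kernel flag; this is a userspace model only. */
#define XDP_FLAGS_HW_MODE (1U << 3)

/* Subset of enum bpf_netdev_command after this patch. */
enum bpf_netdev_command {
	XDP_SETUP_PROG,
	XDP_SETUP_PROG_HW,
	XDP_SETUP_PROG_EGRESS,
	XDP_QUERY_PROG,
	XDP_QUERY_PROG_HW,
	XDP_QUERY_PROG_EGRESS,
};

/* Hypothetical helper modeling dev_xdp_install(): HW mode is tested first. */
enum bpf_netdev_command setup_cmd(uint32_t flags, bool egress)
{
	if (flags & XDP_FLAGS_HW_MODE)
		return XDP_SETUP_PROG_HW;
	return egress ? XDP_SETUP_PROG_EGRESS : XDP_SETUP_PROG;
}

/* Hypothetical helper modeling dev_change_xdp_fd(): egress is tested first. */
enum bpf_netdev_command query_cmd(uint32_t flags, bool egress)
{
	bool offload = flags & XDP_FLAGS_HW_MODE;

	if (egress)
		return XDP_QUERY_PROG_EGRESS;
	return offload ? XDP_QUERY_PROG_HW : XDP_QUERY_PROG;
}
```

Note that the two selections test their conditions in a different
order, which only matters if the egress argument and XDP_FLAGS_HW_MODE
are ever set together.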

Signed-off-by: David Ahern <dahern@digitalocean.com>
Co-developed-by: Prashant Bhole <prashantbhole.linux@gmail.com>
Signed-off-by: Prashant Bhole <prashantbhole.linux@gmail.com>
---
 include/linux/netdevice.h |  4 +++-
 net/core/dev.c            | 34 +++++++++++++++++++++++-----------
 net/core/rtnetlink.c      |  2 +-
 3 files changed, 27 insertions(+), 13 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 130a668049ab..d0bb9e09660a 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -871,8 +871,10 @@ enum bpf_netdev_command {
 	 */
 	XDP_SETUP_PROG,
 	XDP_SETUP_PROG_HW,
+	XDP_SETUP_PROG_EGRESS,
 	XDP_QUERY_PROG,
 	XDP_QUERY_PROG_HW,
+	XDP_QUERY_PROG_EGRESS,
 	/* BPF program for offload callbacks, invoked at program load time. */
 	BPF_OFFLOAD_MAP_ALLOC,
 	BPF_OFFLOAD_MAP_FREE,
@@ -3777,7 +3779,7 @@ struct sk_buff *dev_hard_start_xmit(struct sk_buff *skb, struct net_device *dev,
 
 typedef int (*bpf_op_t)(struct net_device *dev, struct netdev_bpf *bpf);
 int dev_change_xdp_fd(struct net_device *dev, struct netlink_ext_ack *extack,
-		      int fd, int expected_fd, u32 flags);
+		      int fd, int expected_fd, u32 flags, bool egress);
 u32 __dev_xdp_query(struct net_device *dev, bpf_op_t xdp_op,
 		    enum bpf_netdev_command cmd);
 int xdp_umem_query(struct net_device *dev, u16 queue_id);
diff --git a/net/core/dev.c b/net/core/dev.c
index 3ebfecc7b112..06e0872ecdae 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -8589,7 +8589,7 @@ u32 __dev_xdp_query(struct net_device *dev, bpf_op_t bpf_op,
 
 static int dev_xdp_install(struct net_device *dev, bpf_op_t bpf_op,
 			   struct netlink_ext_ack *extack, u32 flags,
-			   struct bpf_prog *prog)
+			   struct bpf_prog *prog, bool egress)
 {
 	bool non_hw = !(flags & XDP_FLAGS_HW_MODE);
 	struct bpf_prog *prev_prog = NULL;
@@ -8597,8 +8597,10 @@ static int dev_xdp_install(struct net_device *dev, bpf_op_t bpf_op,
 	int err;
 
 	if (non_hw) {
-		prev_prog = bpf_prog_by_id(__dev_xdp_query(dev, bpf_op,
-							   XDP_QUERY_PROG));
+		enum bpf_netdev_command cmd;
+
+		cmd = egress ? XDP_QUERY_PROG_EGRESS : XDP_QUERY_PROG;
+		prev_prog = bpf_prog_by_id(__dev_xdp_query(dev, bpf_op, cmd));
 		if (IS_ERR(prev_prog))
 			prev_prog = NULL;
 	}
@@ -8607,7 +8609,7 @@ static int dev_xdp_install(struct net_device *dev, bpf_op_t bpf_op,
 	if (flags & XDP_FLAGS_HW_MODE)
 		xdp.command = XDP_SETUP_PROG_HW;
 	else
-		xdp.command = XDP_SETUP_PROG;
+		xdp.command = egress ? XDP_SETUP_PROG_EGRESS : XDP_SETUP_PROG;
 	xdp.extack = extack;
 	xdp.flags = flags;
 	xdp.prog = prog;
@@ -8628,7 +8630,12 @@ static void dev_xdp_uninstall(struct net_device *dev)
 	bpf_op_t ndo_bpf;
 
 	/* Remove generic XDP */
-	WARN_ON(dev_xdp_install(dev, generic_xdp_install, NULL, 0, NULL));
+	WARN_ON(dev_xdp_install(dev, generic_xdp_install, NULL, 0, NULL,
+				false));
+
+	/* Remove XDP egress */
+	WARN_ON(dev_xdp_install(dev, generic_xdp_install, NULL, 0, NULL,
+				true));
 
 	/* Remove from the driver */
 	ndo_bpf = dev->netdev_ops->ndo_bpf;
@@ -8640,14 +8647,14 @@ static void dev_xdp_uninstall(struct net_device *dev)
 	WARN_ON(ndo_bpf(dev, &xdp));
 	if (xdp.prog_id)
 		WARN_ON(dev_xdp_install(dev, ndo_bpf, NULL, xdp.prog_flags,
-					NULL));
+					NULL, false));
 
 	/* Remove HW offload */
 	memset(&xdp, 0, sizeof(xdp));
 	xdp.command = XDP_QUERY_PROG_HW;
 	if (!ndo_bpf(dev, &xdp) && xdp.prog_id)
 		WARN_ON(dev_xdp_install(dev, ndo_bpf, NULL, xdp.prog_flags,
-					NULL));
+					NULL, false));
 }
 
 /**
@@ -8661,7 +8668,7 @@ static void dev_xdp_uninstall(struct net_device *dev)
  *	Set or clear a bpf program for a device
  */
 int dev_change_xdp_fd(struct net_device *dev, struct netlink_ext_ack *extack,
-		      int fd, int expected_fd, u32 flags)
+		      int fd, int expected_fd, u32 flags, bool egress)
 {
 	const struct net_device_ops *ops = dev->netdev_ops;
 	enum bpf_netdev_command query;
@@ -8674,7 +8681,11 @@ int dev_change_xdp_fd(struct net_device *dev, struct netlink_ext_ack *extack,
 	ASSERT_RTNL();
 
 	offload = flags & XDP_FLAGS_HW_MODE;
-	query = offload ? XDP_QUERY_PROG_HW : XDP_QUERY_PROG;
+	if (egress)
+		query = XDP_QUERY_PROG_EGRESS;
+	else
+		query = offload ? XDP_QUERY_PROG_HW : XDP_QUERY_PROG;
+
 
 	bpf_op = bpf_chk = ops->ndo_bpf;
 	if (!bpf_op && (flags & (XDP_FLAGS_DRV_MODE | XDP_FLAGS_HW_MODE))) {
@@ -8704,7 +8715,8 @@ int dev_change_xdp_fd(struct net_device *dev, struct netlink_ext_ack *extack,
 		}
 	}
 	if (fd >= 0) {
-		if (!offload && __dev_xdp_query(dev, bpf_chk, XDP_QUERY_PROG)) {
+		if (!offload && !egress &&
+		    __dev_xdp_query(dev, bpf_chk, XDP_QUERY_PROG)) {
 			NL_SET_ERR_MSG(extack, "native and generic XDP can't be active at the same time");
 			return -EEXIST;
 		}
@@ -8736,7 +8748,7 @@ int dev_change_xdp_fd(struct net_device *dev, struct netlink_ext_ack *extack,
 		prog = NULL;
 	}
 
-	err = dev_xdp_install(dev, bpf_op, extack, flags, prog);
+	err = dev_xdp_install(dev, bpf_op, extack, flags, prog, egress);
 	if (err < 0 && prog)
 		bpf_prog_put(prog);
 
diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index 97e47c292333..dc44af16226a 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -2515,7 +2515,7 @@ static int do_setlink_xdp(struct net_device *dev, struct nlattr *tb,
 
 		err = dev_change_xdp_fd(dev, extack,
 					nla_get_s32(xdp[IFLA_XDP_FD]),
-					expected_fd, xdp_flags);
+					expected_fd, xdp_flags, false);
 		if (err)
 			return err;
 
-- 
2.21.1 (Apple Git-122.3)



* [PATCH RFC-v5 bpf-next 02/12] net: Add BPF_XDP_EGRESS as a bpf_attach_type
  2020-04-13 17:17 [PATCH RFC-v5 bpf-next 00/12] Add support for XDP in egress path David Ahern
  2020-04-13 17:17 ` [PATCH RFC-v5 bpf-next 01/12] net: Add XDP setup and query commands for Tx programs David Ahern
@ 2020-04-13 17:17 ` David Ahern
  2020-04-16 14:01   ` Toke Høiland-Jørgensen
  2020-04-13 17:17 ` [PATCH RFC-v5 bpf-next 03/12] xdp: Add xdp_txq_info to xdp_buff David Ahern
                   ` (10 subsequent siblings)
  12 siblings, 1 reply; 27+ messages in thread
From: David Ahern @ 2020-04-13 17:17 UTC (permalink / raw)
  To: netdev
  Cc: davem, kuba, prashantbhole.linux, jasowang, brouer, toke,
	toshiaki.makita1, daniel, john.fastabend, ast, kafai,
	songliubraving, yhs, andriin, dsahern, David Ahern

From: David Ahern <dahern@digitalocean.com>

Add a new bpf_attach_type, BPF_XDP_EGRESS, for BPF programs attached
at the XDP layer, but in the egress path.

Since the egress path does not have ingress_ifindex and rx_queue_index
set, update xdp_is_valid_access to block access to these entries in
the xdp context when a program is attached to the egress path.

Update dev_change_xdp_fd to verify expected_attach_type for a program
is BPF_XDP_EGRESS if egress argument is set.

The next patch adds support for the egress ifindex.

Signed-off-by: Prashant Bhole <prashantbhole.linux@gmail.com>
Signed-off-by: David Ahern <dahern@digitalocean.com>
---
 include/uapi/linux/bpf.h       | 1 +
 net/core/dev.c                 | 6 ++++++
 net/core/filter.c              | 8 ++++++++
 tools/include/uapi/linux/bpf.h | 1 +
 4 files changed, 16 insertions(+)

diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 2e29a671d67e..a9d384998e8b 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -215,6 +215,7 @@ enum bpf_attach_type {
 	BPF_TRACE_FEXIT,
 	BPF_MODIFY_RETURN,
 	BPF_LSM_MAC,
+	BPF_XDP_EGRESS,
 	__MAX_BPF_ATTACH_TYPE
 };
 
diff --git a/net/core/dev.c b/net/core/dev.c
index 06e0872ecdae..e763b6cea8ff 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -8731,6 +8731,12 @@ int dev_change_xdp_fd(struct net_device *dev, struct netlink_ext_ack *extack,
 		if (IS_ERR(prog))
 			return PTR_ERR(prog);
 
+		if (egress && prog->expected_attach_type != BPF_XDP_EGRESS) {
+			NL_SET_ERR_MSG(extack, "XDP program in Tx path must use BPF_XDP_EGRESS attach type");
+			bpf_prog_put(prog);
+			return -EINVAL;
+		}
+
 		if (!offload && bpf_prog_is_dev_bound(prog->aux)) {
 			NL_SET_ERR_MSG(extack, "using device-bound program without HW_MODE flag is not supported");
 			bpf_prog_put(prog);
diff --git a/net/core/filter.c b/net/core/filter.c
index 7628b947dbc3..c4e0e044722f 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -6935,6 +6935,14 @@ static bool xdp_is_valid_access(int off, int size,
 				const struct bpf_prog *prog,
 				struct bpf_insn_access_aux *info)
 {
+	if (prog->expected_attach_type == BPF_XDP_EGRESS) {
+		switch (off) {
+		case offsetof(struct xdp_md, ingress_ifindex):
+		case offsetof(struct xdp_md, rx_queue_index):
+			return false;
+		}
+	}
+
 	if (type == BPF_WRITE) {
 		if (bpf_prog_is_dev_bound(prog->aux)) {
 			switch (off) {
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index 2e29a671d67e..a9d384998e8b 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -215,6 +215,7 @@ enum bpf_attach_type {
 	BPF_TRACE_FEXIT,
 	BPF_MODIFY_RETURN,
 	BPF_LSM_MAC,
+	BPF_XDP_EGRESS,
 	__MAX_BPF_ATTACH_TYPE
 };
 
-- 
2.21.1 (Apple Git-122.3)



* [PATCH RFC-v5 bpf-next 03/12] xdp: Add xdp_txq_info to xdp_buff
  2020-04-13 17:17 [PATCH RFC-v5 bpf-next 00/12] Add support for XDP in egress path David Ahern
  2020-04-13 17:17 ` [PATCH RFC-v5 bpf-next 01/12] net: Add XDP setup and query commands for Tx programs David Ahern
  2020-04-13 17:17 ` [PATCH RFC-v5 bpf-next 02/12] net: Add BPF_XDP_EGRESS as a bpf_attach_type David Ahern
@ 2020-04-13 17:17 ` David Ahern
  2020-04-13 17:17 ` [PATCH RFC-v5 bpf-next 04/12] net: Add IFLA_XDP_EGRESS for XDP programs in the egress path David Ahern
                   ` (9 subsequent siblings)
  12 siblings, 0 replies; 27+ messages in thread
From: David Ahern @ 2020-04-13 17:17 UTC (permalink / raw)
  To: netdev
  Cc: davem, kuba, prashantbhole.linux, jasowang, brouer, toke,
	toshiaki.makita1, daniel, john.fastabend, ast, kafai,
	songliubraving, yhs, andriin, dsahern, David Ahern

From: David Ahern <dahern@digitalocean.com>

Add xdp_txq_info as the Tx counterpart to xdp_rxq_info. At the
moment only the device is added. Other fields (queue_index)
can be added as use cases arise.

From a UAPI perspective, add egress_ifindex to xdp context.

Update the verifier to reject accesses to egress_ifindex by
Rx programs.
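
The combined effect of this patch and the previous one on context
access can be sketched in userspace as below. This is only a model
under stated assumptions: struct xdp_md_model is a local copy of the
xdp_md layout for illustration, and xdp_md_direction_ok() is a
hypothetical name covering just the direction check, not the rest of
xdp_is_valid_access.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Local copy of the xdp_md layout after this patch (illustration only). */
struct xdp_md_model {
	uint32_t data;
	uint32_t data_end;
	uint32_t data_meta;
	uint32_t ingress_ifindex;
	uint32_t rx_queue_index;
	uint32_t egress_ifindex;
};

/* Returns true if a program attached in the given direction may access
 * the context field at byte offset 'off'. Only the direction check is
 * modeled here; the real callback applies further checks afterwards.
 */
bool xdp_md_direction_ok(size_t off, bool egress_prog)
{
	if (egress_prog) {
		/* Egress programs have no Rx queue information. */
		switch (off) {
		case offsetof(struct xdp_md_model, ingress_ifindex):
		case offsetof(struct xdp_md_model, rx_queue_index):
			return false;
		}
	} else {
		/* Rx programs have no Tx queue information. */
		switch (off) {
		case offsetof(struct xdp_md_model, egress_ifindex):
			return false;
		}
	}
	return true;
}
```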

Signed-off-by: David Ahern <dahern@digitalocean.com>
---
 include/net/xdp.h              |  5 +++++
 include/uapi/linux/bpf.h       |  2 ++
 net/core/filter.c              | 15 +++++++++++++++
 tools/include/uapi/linux/bpf.h |  2 ++
 4 files changed, 24 insertions(+)

diff --git a/include/net/xdp.h b/include/net/xdp.h
index 40c6d3398458..5584b9db86fe 100644
--- a/include/net/xdp.h
+++ b/include/net/xdp.h
@@ -63,6 +63,10 @@ struct xdp_rxq_info {
 	struct xdp_mem_info mem;
 } ____cacheline_aligned; /* perf critical, avoid false-sharing */
 
+struct xdp_txq_info {
+	struct net_device *dev;
+};
+
 struct xdp_buff {
 	void *data;
 	void *data_end;
@@ -70,6 +74,7 @@ struct xdp_buff {
 	void *data_hard_start;
 	unsigned long handle;
 	struct xdp_rxq_info *rxq;
+	struct xdp_txq_info *txq;
 };
 
 struct xdp_frame {
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index a9d384998e8b..35e3aab97dd4 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -3487,6 +3487,8 @@ struct xdp_md {
 	/* Below access go through struct xdp_rxq_info */
 	__u32 ingress_ifindex; /* rxq->dev->ifindex */
 	__u32 rx_queue_index;  /* rxq->queue_index  */
+
+	__u32 egress_ifindex;  /* txq->dev->ifindex */
 };
 
 enum sk_action {
diff --git a/net/core/filter.c b/net/core/filter.c
index c4e0e044722f..d8839d07acad 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -6941,6 +6941,11 @@ static bool xdp_is_valid_access(int off, int size,
 		case offsetof(struct xdp_md, rx_queue_index):
 			return false;
 		}
+	} else {
+		switch (off) {
+		case offsetof(struct xdp_md, egress_ifindex):
+			return false;
+		}
 	}
 
 	if (type == BPF_WRITE) {
@@ -7890,6 +7895,16 @@ static u32 xdp_convert_ctx_access(enum bpf_access_type type,
 				      offsetof(struct xdp_rxq_info,
 					       queue_index));
 		break;
+	case offsetof(struct xdp_md, egress_ifindex):
+		*insn++ = BPF_LDX_MEM(BPF_FIELD_SIZEOF(struct xdp_buff, txq),
+				      si->dst_reg, si->src_reg,
+				      offsetof(struct xdp_buff, txq));
+		*insn++ = BPF_LDX_MEM(BPF_FIELD_SIZEOF(struct xdp_txq_info, dev),
+				      si->dst_reg, si->dst_reg,
+				      offsetof(struct xdp_txq_info, dev));
+		*insn++ = BPF_LDX_MEM(BPF_W, si->dst_reg, si->dst_reg,
+				      offsetof(struct net_device, ifindex));
+		break;
 	}
 
 	return insn - insn_buf;
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index a9d384998e8b..35e3aab97dd4 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -3487,6 +3487,8 @@ struct xdp_md {
 	/* Below access go through struct xdp_rxq_info */
 	__u32 ingress_ifindex; /* rxq->dev->ifindex */
 	__u32 rx_queue_index;  /* rxq->queue_index  */
+
+	__u32 egress_ifindex;  /* txq->dev->ifindex */
 };
 
 enum sk_action {
-- 
2.21.1 (Apple Git-122.3)



* [PATCH RFC-v5 bpf-next 04/12] net: Add IFLA_XDP_EGRESS for XDP programs in the egress path
  2020-04-13 17:17 [PATCH RFC-v5 bpf-next 00/12] Add support for XDP in egress path David Ahern
                   ` (2 preceding siblings ...)
  2020-04-13 17:17 ` [PATCH RFC-v5 bpf-next 03/12] xdp: Add xdp_txq_info to xdp_buff David Ahern
@ 2020-04-13 17:17 ` David Ahern
  2020-04-13 17:17 ` [PATCH RFC-v5 bpf-next 05/12] net: core: rename netif_receive_generic_xdp to do_generic_xdp_core David Ahern
                   ` (8 subsequent siblings)
  12 siblings, 0 replies; 27+ messages in thread
From: David Ahern @ 2020-04-13 17:17 UTC (permalink / raw)
  To: netdev
  Cc: davem, kuba, prashantbhole.linux, jasowang, brouer, toke,
	toshiaki.makita1, daniel, john.fastabend, ast, kafai,
	songliubraving, yhs, andriin, dsahern, David Ahern

From: David Ahern <dahern@digitalocean.com>

Running programs in the egress path, on skbs or xdp_frames, does not
require driver-specific resources the way the Rx path does. Accordingly,
the programs can be run in core code, so add xdp_egress_prog to
net_device to hold a reference to an attached program.

For UAPI, add IFLA_XDP_EGRESS to if_link.h to specify egress programs,
add a new attach flag, XDP_ATTACHED_EGRESS_CORE, to denote the
attach point is at the core level (as opposed to driver or hardware)
and add IFLA_XDP_EGRESS_CORE_PROG_ID for reporting the program id.

Add egress argument to do_setlink_xdp to denote processing of
IFLA_XDP_EGRESS versus IFLA_XDP, and add a check that none of the
existing modes (SKB, DRV or HW) are set since those modes are not
valid. The expectation is that XDP_FLAGS_HW_MODE will be used later
(e.g., offloading guest programs).

Add rtnl_xdp_egress_fill and helpers as the egress counterpart to the
existing rtnl_xdp_fill.
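
The flag checks that do_setlink_xdp applies after this patch can be
modeled in userspace roughly as follows. This is a sketch, not the
kernel code: the flag values are restated locally from if_link.h, and
check_xdp_flags() and popcount32() are hypothetical names, the latter
standing in for the kernel's hweight32().

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Flag values restated locally from include/uapi/linux/if_link.h. */
#define XDP_FLAGS_UPDATE_IF_NOEXIST	(1U << 0)
#define XDP_FLAGS_SKB_MODE		(1U << 1)
#define XDP_FLAGS_DRV_MODE		(1U << 2)
#define XDP_FLAGS_HW_MODE		(1U << 3)
#define XDP_FLAGS_REPLACE		(1U << 4)
#define XDP_FLAGS_MODES	(XDP_FLAGS_SKB_MODE | XDP_FLAGS_DRV_MODE | \
			 XDP_FLAGS_HW_MODE)
#define XDP_FLAGS_MASK	(XDP_FLAGS_UPDATE_IF_NOEXIST | XDP_FLAGS_MODES | \
			 XDP_FLAGS_REPLACE)

/* Portable bit count, standing in for hweight32(). */
static unsigned int popcount32(uint32_t v)
{
	unsigned int n = 0;

	for (; v; v &= v - 1)
		n++;
	return n;
}

/* Hypothetical helper: returns 0 if the flags are acceptable, -EINVAL if not. */
int check_xdp_flags(uint32_t flags, bool egress)
{
	/* New in this patch: no mode flag may be set for egress. */
	if (egress && (flags & XDP_FLAGS_MODES))
		return -EINVAL;
	/* Existing checks: no unknown bits, at most one mode. */
	if (flags & ~XDP_FLAGS_MASK)
		return -EINVAL;
	if (popcount32(flags & XDP_FLAGS_MODES) > 1)
		return -EINVAL;
	return 0;
}
```

With this model, an egress attach that passes XDP_FLAGS_SKB_MODE is
rejected, while the same flag on the Rx side is accepted.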

Signed-off-by: David Ahern <dahern@digitalocean.com>
---
 include/linux/netdevice.h          |  1 +
 include/uapi/linux/if_link.h       |  3 +
 net/core/rtnetlink.c               | 96 ++++++++++++++++++++++++++++--
 tools/include/uapi/linux/if_link.h |  3 +
 4 files changed, 99 insertions(+), 4 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index d0bb9e09660a..3133247681fd 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -1995,6 +1995,7 @@ struct net_device {
 	unsigned int		real_num_rx_queues;
 
 	struct bpf_prog __rcu	*xdp_prog;
+	struct bpf_prog __rcu	*xdp_egress_prog;
 	unsigned long		gro_flush_timeout;
 	rx_handler_func_t __rcu	*rx_handler;
 	void __rcu		*rx_handler_data;
diff --git a/include/uapi/linux/if_link.h b/include/uapi/linux/if_link.h
index 127c704eeba9..b3c6cb2f0f0a 100644
--- a/include/uapi/linux/if_link.h
+++ b/include/uapi/linux/if_link.h
@@ -170,6 +170,7 @@ enum {
 	IFLA_PROP_LIST,
 	IFLA_ALT_IFNAME, /* Alternative ifname */
 	IFLA_PERM_ADDRESS,
+	IFLA_XDP_EGRESS, /* nested attribute with 1 or more IFLA_XDP_ attrs */
 	__IFLA_MAX
 };
 
@@ -988,6 +989,7 @@ enum {
 	XDP_ATTACHED_SKB,
 	XDP_ATTACHED_HW,
 	XDP_ATTACHED_MULTI,
+	XDP_ATTACHED_EGRESS_CORE,
 };
 
 enum {
@@ -1000,6 +1002,7 @@ enum {
 	IFLA_XDP_SKB_PROG_ID,
 	IFLA_XDP_HW_PROG_ID,
 	IFLA_XDP_EXPECTED_FD,
+	IFLA_XDP_EGRESS_CORE_PROG_ID,
 	__IFLA_XDP_MAX,
 };
 
diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index dc44af16226a..e9bc5cee06c8 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -1030,7 +1030,7 @@ static noinline size_t if_nlmsg_size(const struct net_device *dev,
 	       + nla_total_size(MAX_PHYS_ITEM_ID_LEN) /* IFLA_PHYS_PORT_ID */
 	       + nla_total_size(MAX_PHYS_ITEM_ID_LEN) /* IFLA_PHYS_SWITCH_ID */
 	       + nla_total_size(IFNAMSIZ) /* IFLA_PHYS_PORT_NAME */
-	       + rtnl_xdp_size() /* IFLA_XDP */
+	       + rtnl_xdp_size() * 2 /* IFLA_XDP and IFLA_XDP_EGRESS */
 	       + nla_total_size(4)  /* IFLA_EVENT */
 	       + nla_total_size(4)  /* IFLA_NEW_NETNSID */
 	       + nla_total_size(4)  /* IFLA_NEW_IFINDEX */
@@ -1395,6 +1395,42 @@ static int rtnl_fill_link_ifmap(struct sk_buff *skb, struct net_device *dev)
 	return 0;
 }
 
+static u32 rtnl_xdp_egress_prog(struct net_device *dev)
+{
+	const struct bpf_prog *prog;
+
+	ASSERT_RTNL();
+
+	prog = rtnl_dereference(dev->xdp_egress_prog);
+	if (!prog)
+		return 0;
+	return prog->aux->id;
+}
+
+static int rtnl_xdp_egress_report(struct sk_buff *skb, struct net_device *dev,
+				  u32 *prog_id, u8 *mode, u8 tgt_mode, u32 attr,
+				  u32 (*get_prog_id)(struct net_device *dev))
+{
+	u32 curr_id;
+	int err;
+
+	curr_id = get_prog_id(dev);
+	if (!curr_id)
+		return 0;
+
+	*prog_id = curr_id;
+	err = nla_put_u32(skb, attr, curr_id);
+	if (err)
+		return err;
+
+	if (*mode != XDP_ATTACHED_NONE)
+		*mode = XDP_ATTACHED_MULTI;
+	else
+		*mode = tgt_mode;
+
+	return 0;
+}
+
 static u32 rtnl_xdp_prog_skb(struct net_device *dev)
 {
 	const struct bpf_prog *generic_xdp_prog;
@@ -1486,6 +1522,42 @@ static int rtnl_xdp_fill(struct sk_buff *skb, struct net_device *dev)
 	return err;
 }
 
+static int rtnl_xdp_egress_fill(struct sk_buff *skb, struct net_device *dev)
+{
+	u8 mode = XDP_ATTACHED_NONE;
+	struct nlattr *xdp;
+	u32 prog_id = 0;
+	int err;
+
+	xdp = nla_nest_start_noflag(skb, IFLA_XDP_EGRESS);
+	if (!xdp)
+		return -EMSGSIZE;
+
+	err = rtnl_xdp_egress_report(skb, dev, &prog_id, &mode,
+				     XDP_ATTACHED_EGRESS_CORE,
+				     IFLA_XDP_EGRESS_CORE_PROG_ID,
+				     rtnl_xdp_egress_prog);
+	if (err)
+		goto err_cancel;
+
+	err = nla_put_u8(skb, IFLA_XDP_ATTACHED, mode);
+	if (err)
+		goto err_cancel;
+
+	if (prog_id && mode != XDP_ATTACHED_MULTI) {
+		err = nla_put_u32(skb, IFLA_XDP_PROG_ID, prog_id);
+		if (err)
+			goto err_cancel;
+	}
+
+	nla_nest_end(skb, xdp);
+	return 0;
+
+err_cancel:
+	nla_nest_cancel(skb, xdp);
+	return err;
+}
+
 static u32 rtnl_get_event(unsigned long event)
 {
 	u32 rtnl_event_type = IFLA_EVENT_NONE;
@@ -1743,6 +1815,9 @@ static int rtnl_fill_ifinfo(struct sk_buff *skb,
 	if (rtnl_xdp_fill(skb, dev))
 		goto nla_put_failure;
 
+	if (rtnl_xdp_egress_fill(skb, dev))
+		goto nla_put_failure;
+
 	if (dev->rtnl_link_ops || rtnl_have_link_slave_info(dev)) {
 		if (rtnl_link_fill(skb, dev) < 0)
 			goto nla_put_failure;
@@ -1827,6 +1902,7 @@ static const struct nla_policy ifla_policy[IFLA_MAX+1] = {
 	[IFLA_ALT_IFNAME]	= { .type = NLA_STRING,
 				    .len = ALTIFNAMSIZ - 1 },
 	[IFLA_PERM_ADDRESS]	= { .type = NLA_REJECT },
+	[IFLA_XDP_EGRESS]	= { .type = NLA_NESTED },
 };
 
 static const struct nla_policy ifla_info_policy[IFLA_INFO_MAX+1] = {
@@ -2482,7 +2558,8 @@ static int do_set_master(struct net_device *dev, int ifindex,
 #define DO_SETLINK_NOTIFY	0x03
 
 static int do_setlink_xdp(struct net_device *dev, struct nlattr *tb,
-			  int *status, struct netlink_ext_ack *extack)
+			  int *status, bool egress,
+			  struct netlink_ext_ack *extack)
 {
 	struct nlattr *xdp[IFLA_XDP_MAX + 1];
 	u32 xdp_flags = 0;
@@ -2498,6 +2575,10 @@ static int do_setlink_xdp(struct net_device *dev, struct nlattr *tb,
 
 	if (xdp[IFLA_XDP_FLAGS]) {
 		xdp_flags = nla_get_u32(xdp[IFLA_XDP_FLAGS]);
+		if (egress && xdp_flags & XDP_FLAGS_MODES) {
+			NL_SET_ERR_MSG(extack, "XDP_FLAGS_MODES not valid for egress");
+			goto out_einval;
+		}
 		if (xdp_flags & ~XDP_FLAGS_MASK)
 			goto out_einval;
 		if (hweight32(xdp_flags & XDP_FLAGS_MODES) > 1)
@@ -2515,7 +2596,7 @@ static int do_setlink_xdp(struct net_device *dev, struct nlattr *tb,
 
 		err = dev_change_xdp_fd(dev, extack,
 					nla_get_s32(xdp[IFLA_XDP_FD]),
-					expected_fd, xdp_flags, false);
+					expected_fd, xdp_flags, egress);
 		if (err)
 			return err;
 
@@ -2821,7 +2902,14 @@ static int do_setlink(const struct sk_buff *skb,
 	}
 
 	if (tb[IFLA_XDP]) {
-		err = do_setlink_xdp(dev, tb[IFLA_XDP], &status, extack);
+		err = do_setlink_xdp(dev, tb[IFLA_XDP], &status, false, extack);
+		if (err)
+			goto errout;
+	}
+
+	if (tb[IFLA_XDP_EGRESS]) {
+		err = do_setlink_xdp(dev, tb[IFLA_XDP_EGRESS], &status, true,
+				     extack);
 		if (err)
 			goto errout;
 	}
diff --git a/tools/include/uapi/linux/if_link.h b/tools/include/uapi/linux/if_link.h
index ca6665ea758a..f9e665aa836a 100644
--- a/tools/include/uapi/linux/if_link.h
+++ b/tools/include/uapi/linux/if_link.h
@@ -170,6 +170,7 @@ enum {
 	IFLA_PROP_LIST,
 	IFLA_ALT_IFNAME, /* Alternative ifname */
 	IFLA_PERM_ADDRESS,
+	IFLA_XDP_EGRESS, /* nested attribute with 1 or more IFLA_XDP_ attrs */
 	__IFLA_MAX
 };
 
@@ -976,6 +977,7 @@ enum {
 	XDP_ATTACHED_SKB,
 	XDP_ATTACHED_HW,
 	XDP_ATTACHED_MULTI,
+	XDP_ATTACHED_EGRESS_CORE,
 };
 
 enum {
@@ -988,6 +990,7 @@ enum {
 	IFLA_XDP_SKB_PROG_ID,
 	IFLA_XDP_HW_PROG_ID,
 	IFLA_XDP_EXPECTED_FD,
+	IFLA_XDP_EGRESS_CORE_PROG_ID,
 	__IFLA_XDP_MAX,
 };
 
-- 
2.21.1 (Apple Git-122.3)



* [PATCH RFC-v5 bpf-next 05/12] net: core: rename netif_receive_generic_xdp to do_generic_xdp_core
  2020-04-13 17:17 [PATCH RFC-v5 bpf-next 00/12] Add support for XDP in egress path David Ahern
                   ` (3 preceding siblings ...)
  2020-04-13 17:17 ` [PATCH RFC-v5 bpf-next 04/12] net: Add IFLA_XDP_EGRESS for XDP programs in the egress path David Ahern
@ 2020-04-13 17:17 ` David Ahern
  2020-04-16 14:00   ` Toke Høiland-Jørgensen
  2020-04-13 17:17 ` [PATCH RFC-v5 bpf-next 06/12] net: core: Rename do_xdp_generic to do_xdp_generic_rx David Ahern
                   ` (7 subsequent siblings)
  12 siblings, 1 reply; 27+ messages in thread
From: David Ahern @ 2020-04-13 17:17 UTC (permalink / raw)
  To: netdev
  Cc: davem, kuba, prashantbhole.linux, jasowang, brouer, toke,
	toshiaki.makita1, daniel, john.fastabend, ast, kafai,
	songliubraving, yhs, andriin, dsahern, David Ahern

From: David Ahern <dahern@digitalocean.com>

In the skb generic path, we need a way to run an XDP program on an skb
while allowing customized handling of the XDP actions.
netif_receive_generic_xdp is a better fit for such cases than
do_xdp_generic.

This patch prepares netif_receive_generic_xdp() to be used as general
purpose function for running xdp programs on skbs by renaming it to
do_xdp_generic_core, moving skb_is_redirected and rxq settings as well
as XDP return code checks to the callers.

This allows this core function to be used from both Rx and Tx paths
with rxq and txq set based on context.
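
The split of responsibilities can be illustrated with a small userspace
model. Everything here is a simplified assumption for illustration:
pkt_model and ctx_model are hypothetical stand-ins for sk_buff and
xdp_buff, run_core() stands in for do_xdp_generic_core, and rx_path()
shows only the pieces this patch moves to the caller (the redirect
check, the rxq setup and the drop handling), not the full verdict
handling in do_xdp_generic.

```c
#include <stdbool.h>

/* XDP verdict values, as in include/uapi/linux/bpf.h. */
enum xdp_action { XDP_ABORTED = 0, XDP_DROP, XDP_PASS, XDP_TX, XDP_REDIRECT };

/* Hypothetical stand-ins for sk_buff and xdp_buff. */
struct pkt_model { bool redirected; bool freed; };
struct ctx_model { int rxq_index; };

typedef unsigned int (*prog_fn)(const struct ctx_model *ctx);

/* Two trivial "programs" used to exercise the model. */
unsigned int prog_drop(const struct ctx_model *ctx) { (void)ctx; return XDP_DROP; }
unsigned int prog_pass(const struct ctx_model *ctx) { (void)ctx; return XDP_PASS; }

/* Core: run the program and hand back its verdict, with no policy. */
unsigned int run_core(struct pkt_model *pkt, struct ctx_model *ctx, prog_fn prog)
{
	(void)pkt;	/* the real core also fixes up headroom and linearizes */
	return prog(ctx);
}

/* Rx caller: owns the redirect check, queue setup and drop handling. */
unsigned int rx_path(struct pkt_model *pkt, prog_fn prog)
{
	struct ctx_model ctx = { .rxq_index = 0 };
	unsigned int act;

	if (pkt->redirected)	/* reinjected packets skip generic XDP */
		return XDP_PASS;

	act = run_core(pkt, &ctx, prog);
	switch (act) {
	case XDP_PASS:
	case XDP_TX:
	case XDP_REDIRECT:
		break;
	default:		/* unknown verdicts are treated as a drop */
	case XDP_ABORTED:
	case XDP_DROP:
		pkt->freed = true;	/* models kfree_skb() */
		break;
	}
	return act;
}
```

A Tx-side caller would follow the same shape, filling in Tx queue
information instead and applying whatever verdict policy suits egress.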

Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Prashant Bhole <prashantbhole.linux@gmail.com>
Signed-off-by: David Ahern <dahern@digitalocean.com>
---
 net/core/dev.c | 52 ++++++++++++++++++++++++--------------------------
 1 file changed, 25 insertions(+), 27 deletions(-)

diff --git a/net/core/dev.c b/net/core/dev.c
index e763b6cea8ff..4f0c4fee1125 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -4500,25 +4500,17 @@ static struct netdev_rx_queue *netif_get_rxqueue(struct sk_buff *skb)
 	return rxqueue;
 }
 
-static u32 netif_receive_generic_xdp(struct sk_buff *skb,
-				     struct xdp_buff *xdp,
-				     struct bpf_prog *xdp_prog)
+static u32 do_xdp_generic_core(struct sk_buff *skb, struct xdp_buff *xdp,
+			       struct bpf_prog *xdp_prog)
 {
-	struct netdev_rx_queue *rxqueue;
 	void *orig_data, *orig_data_end;
-	u32 metalen, act = XDP_DROP;
 	__be16 orig_eth_type;
 	struct ethhdr *eth;
+	u32 metalen, act;
 	bool orig_bcast;
 	int hlen, off;
 	u32 mac_len;
 
-	/* Reinjected packets coming from act_mirred or similar should
-	 * not get XDP generic processing.
-	 */
-	if (skb_is_redirected(skb))
-		return XDP_PASS;
-
 	/* XDP packets must be linear and must have sufficient headroom
 	 * of XDP_PACKET_HEADROOM bytes. This is the guarantee that also
 	 * native XDP provides, thus we need to do it here as well.
@@ -4534,9 +4526,9 @@ static u32 netif_receive_generic_xdp(struct sk_buff *skb,
 		if (pskb_expand_head(skb,
 				     hroom > 0 ? ALIGN(hroom, NET_SKB_PAD) : 0,
 				     troom > 0 ? troom + 128 : 0, GFP_ATOMIC))
-			goto do_drop;
+			return XDP_DROP;
 		if (skb_linearize(skb))
-			goto do_drop;
+			return XDP_DROP;
 	}
 
 	/* The XDP program wants to see the packet starting at the MAC
@@ -4554,9 +4546,6 @@ static u32 netif_receive_generic_xdp(struct sk_buff *skb,
 	orig_bcast = is_multicast_ether_addr_64bits(eth->h_dest);
 	orig_eth_type = eth->h_proto;
 
-	rxqueue = netif_get_rxqueue(skb);
-	xdp->rxq = &rxqueue->xdp_rxq;
-
 	act = bpf_prog_run_xdp(xdp_prog, xdp);
 
 	/* check if bpf_xdp_adjust_head was used */
@@ -4599,16 +4588,6 @@ static u32 netif_receive_generic_xdp(struct sk_buff *skb,
 		if (metalen)
 			skb_metadata_set(skb, metalen);
 		break;
-	default:
-		bpf_warn_invalid_xdp_action(act);
-		/* fall through */
-	case XDP_ABORTED:
-		trace_xdp_exception(skb->dev, xdp_prog, act);
-		/* fall through */
-	case XDP_DROP:
-	do_drop:
-		kfree_skb(skb);
-		break;
 	}
 
 	return act;
@@ -4643,12 +4622,22 @@ static DEFINE_STATIC_KEY_FALSE(generic_xdp_needed_key);
 
 int do_xdp_generic(struct bpf_prog *xdp_prog, struct sk_buff *skb)
 {
+	/* Reinjected packets coming from act_mirred or similar should
+	 * not get XDP generic processing.
+	 */
+	if (skb_is_redirected(skb))
+		return XDP_PASS;
+
 	if (xdp_prog) {
+		struct netdev_rx_queue *rxqueue;
 		struct xdp_buff xdp;
 		u32 act;
 		int err;
 
-		act = netif_receive_generic_xdp(skb, &xdp, xdp_prog);
+		rxqueue = netif_get_rxqueue(skb);
+		xdp.rxq = &rxqueue->xdp_rxq;
+
+		act = do_xdp_generic_core(skb, &xdp, xdp_prog);
 		if (act != XDP_PASS) {
 			switch (act) {
 			case XDP_REDIRECT:
@@ -4660,6 +4649,15 @@ int do_xdp_generic(struct bpf_prog *xdp_prog, struct sk_buff *skb)
 			case XDP_TX:
 				generic_xdp_tx(skb, xdp_prog);
 				break;
+			default:
+				bpf_warn_invalid_xdp_action(act);
+				/* fall through */
+			case XDP_ABORTED:
+				trace_xdp_exception(skb->dev, xdp_prog, act);
+				/* fall through */
+			case XDP_DROP:
+				kfree_skb(skb);
+				break;
 			}
 			return XDP_DROP;
 		}
-- 
2.21.1 (Apple Git-122.3)



* [PATCH RFC-v5 bpf-next 06/12] net: core: Rename do_xdp_generic to do_xdp_generic_rx
  2020-04-13 17:17 [PATCH RFC-v5 bpf-next 00/12] Add support for XDP in egress path David Ahern
                   ` (4 preceding siblings ...)
  2020-04-13 17:17 ` [PATCH RFC-v5 bpf-next 05/12] net: core: rename netif_receive_generic_xdp to do_generic_xdp_core David Ahern
@ 2020-04-13 17:17 ` David Ahern
  2020-04-13 17:17 ` [PATCH RFC-v5 bpf-next 07/12] dev: set egress XDP program David Ahern
                   ` (6 subsequent siblings)
  12 siblings, 0 replies; 27+ messages in thread
From: David Ahern @ 2020-04-13 17:17 UTC (permalink / raw)
  To: netdev
  Cc: davem, kuba, prashantbhole.linux, jasowang, brouer, toke,
	toshiaki.makita1, daniel, john.fastabend, ast, kafai,
	songliubraving, yhs, andriin, dsahern, David Ahern

From: David Ahern <dahern@digitalocean.com>

Rename do_xdp_generic to do_xdp_generic_rx to emphasize its use in the
Rx path.

Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Prashant Bhole <prashantbhole.linux@gmail.com>
Signed-off-by: David Ahern <dahern@digitalocean.com>
---
 drivers/net/tun.c         | 4 ++--
 include/linux/netdevice.h | 2 +-
 net/core/dev.c            | 7 ++++---
 3 files changed, 7 insertions(+), 6 deletions(-)

diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index 228fe449dc6d..20d94bf79cf7 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -1895,7 +1895,7 @@ static ssize_t tun_get_user(struct tun_struct *tun, struct tun_file *tfile,
 		rcu_read_lock();
 		xdp_prog = rcu_dereference(tun->xdp_prog);
 		if (xdp_prog) {
-			ret = do_xdp_generic(xdp_prog, skb);
+			ret = do_xdp_generic_rx(xdp_prog, skb);
 			if (ret != XDP_PASS) {
 				rcu_read_unlock();
 				local_bh_enable();
@@ -2459,7 +2459,7 @@ static int tun_xdp_one(struct tun_struct *tun,
 	skb_probe_transport_header(skb);
 
 	if (skb_xdp) {
-		err = do_xdp_generic(xdp_prog, skb);
+		err = do_xdp_generic_rx(xdp_prog, skb);
 		if (err != XDP_PASS)
 			goto out;
 	}
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 3133247681fd..2649f2b36858 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -3711,7 +3711,7 @@ static inline void dev_consume_skb_any(struct sk_buff *skb)
 }
 
 void generic_xdp_tx(struct sk_buff *skb, struct bpf_prog *xdp_prog);
-int do_xdp_generic(struct bpf_prog *xdp_prog, struct sk_buff *skb);
+int do_xdp_generic_rx(struct bpf_prog *xdp_prog, struct sk_buff *skb);
 int netif_rx(struct sk_buff *skb);
 int netif_rx_ni(struct sk_buff *skb);
 int netif_receive_skb(struct sk_buff *skb);
diff --git a/net/core/dev.c b/net/core/dev.c
index 4f0c4fee1125..6da613fb6623 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -4620,7 +4620,7 @@ void generic_xdp_tx(struct sk_buff *skb, struct bpf_prog *xdp_prog)
 
 static DEFINE_STATIC_KEY_FALSE(generic_xdp_needed_key);
 
-int do_xdp_generic(struct bpf_prog *xdp_prog, struct sk_buff *skb)
+int do_xdp_generic_rx(struct bpf_prog *xdp_prog, struct sk_buff *skb)
 {
 	/* Reinjected packets coming from act_mirred or similar should
 	 * not get XDP generic processing.
@@ -4667,7 +4667,7 @@ int do_xdp_generic(struct bpf_prog *xdp_prog, struct sk_buff *skb)
 	kfree_skb(skb);
 	return XDP_DROP;
 }
-EXPORT_SYMBOL_GPL(do_xdp_generic);
+EXPORT_SYMBOL_GPL(do_xdp_generic_rx);
 
 static int netif_rx_internal(struct sk_buff *skb)
 {
@@ -5017,7 +5017,8 @@ static int __netif_receive_skb_core(struct sk_buff *skb, bool pfmemalloc,
 		int ret2;
 
 		preempt_disable();
-		ret2 = do_xdp_generic(rcu_dereference(skb->dev->xdp_prog), skb);
+		ret2 = do_xdp_generic_rx(rcu_dereference(skb->dev->xdp_prog),
+					 skb);
 		preempt_enable();
 
 		if (ret2 != XDP_PASS)
-- 
2.21.1 (Apple Git-122.3)



* [PATCH RFC-v5 bpf-next 07/12] dev: set egress XDP program
  2020-04-13 17:17 [PATCH RFC-v5 bpf-next 00/12] Add support for XDP in egress path David Ahern
                   ` (5 preceding siblings ...)
  2020-04-13 17:17 ` [PATCH RFC-v5 bpf-next 06/12] net: core: Rename do_xdp_generic to do_xdp_generic_rx David Ahern
@ 2020-04-13 17:17 ` David Ahern
  2020-04-13 17:17 ` [PATCH RFC-v5 bpf-next 08/12] dev: Support xdp in the Tx path for packets as an skb David Ahern
                   ` (5 subsequent siblings)
  12 siblings, 0 replies; 27+ messages in thread
From: David Ahern @ 2020-04-13 17:17 UTC (permalink / raw)
  To: netdev
  Cc: davem, kuba, prashantbhole.linux, jasowang, brouer, toke,
	toshiaki.makita1, daniel, john.fastabend, ast, kafai,
	songliubraving, yhs, andriin, dsahern, David Ahern

From: David Ahern <dahern@digitalocean.com>

This patch adds a way to set a Tx path XDP program on a net_device
by handling XDP_SETUP_PROG_EGRESS and XDP_QUERY_PROG_EGRESS in the
generic_xdp_install handler. A new static key is added to signal
when an egress program has been installed.
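
The accounting behind the static key can be modelled in userspace as
below. This is a sketch under stated assumptions, not the kernel
implementation: a plain integer stands in for the static key's enable
count, and struct device, struct prog and install_egress() are
hypothetical names. It shows that the count only moves on a
none-to-program or program-to-none transition, so replacing one
program with another leaves it unchanged.

```c
#include <assert.h>
#include <stddef.h>

struct prog { int id; };
struct device { struct prog *egress_prog; };

/* Stand-in for the static key: number of devices with a program. */
static int egress_needed;

/* Install, replace or remove (new_prog == NULL) an egress program. */
static void install_egress(struct device *dev, struct prog *new_prog)
{
	struct prog *old;

	assert(dev);
	old = dev->egress_prog;
	dev->egress_prog = new_prog;

	if (old && !new_prog)
		egress_needed--;	/* last program on this device removed */
	else if (new_prog && !old)
		egress_needed++;	/* first program on this device */
	/* replace (old && new_prog): count is unchanged */
}

/* The Tx hook is only worth taking when some device has a program. */
static int egress_hook_active(void)
{
	return egress_needed > 0;
}
```

Counting per device rather than per install is what allows the Tx
fast path to skip the hook entirely once the last program is removed.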

Signed-off-by: David Ahern <dahern@digitalocean.com>
---
 include/linux/netdevice.h |  2 ++
 net/core/dev.c            | 43 +++++++++++++++++++++++++++------------
 2 files changed, 32 insertions(+), 13 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 2649f2b36858..0c89996a6bec 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -750,6 +750,8 @@ struct netdev_rx_queue {
 #endif
 } ____cacheline_aligned_in_smp;
 
+extern struct static_key_false xdp_egress_needed_key;
+
 /*
  * RX queue sysfs structures and functions.
  */
diff --git a/net/core/dev.c b/net/core/dev.c
index 6da613fb6623..c879c291244a 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -4619,6 +4619,7 @@ void generic_xdp_tx(struct sk_buff *skb, struct bpf_prog *xdp_prog)
 }
 
 static DEFINE_STATIC_KEY_FALSE(generic_xdp_needed_key);
+DEFINE_STATIC_KEY_FALSE(xdp_egress_needed_key);
 
 int do_xdp_generic_rx(struct bpf_prog *xdp_prog, struct sk_buff *skb)
 {
@@ -5334,12 +5335,12 @@ static void __netif_receive_skb_list(struct list_head *head)
 
 static int generic_xdp_install(struct net_device *dev, struct netdev_bpf *xdp)
 {
-	struct bpf_prog *old = rtnl_dereference(dev->xdp_prog);
-	struct bpf_prog *new = xdp->prog;
+	struct bpf_prog *old, *new = xdp->prog;
 	int ret = 0;
 
 	switch (xdp->command) {
 	case XDP_SETUP_PROG:
+		old = rtnl_dereference(dev->xdp_prog);
 		rcu_assign_pointer(dev->xdp_prog, new);
 		if (old)
 			bpf_prog_put(old);
@@ -5352,11 +5353,25 @@ static int generic_xdp_install(struct net_device *dev, struct netdev_bpf *xdp)
 			dev_disable_gro_hw(dev);
 		}
 		break;
+	case XDP_SETUP_PROG_EGRESS:
+		old = rtnl_dereference(dev->xdp_egress_prog);
+		rcu_assign_pointer(dev->xdp_egress_prog, new);
+		if (old)
+			bpf_prog_put(old);
 
+		if (old && !new)
+			static_branch_dec(&xdp_egress_needed_key);
+		else if (new && !old)
+			static_branch_inc(&xdp_egress_needed_key);
+		break;
 	case XDP_QUERY_PROG:
+		old = rtnl_dereference(dev->xdp_prog);
+		xdp->prog_id = old ? old->aux->id : 0;
+		break;
+	case XDP_QUERY_PROG_EGRESS:
+		old = rtnl_dereference(dev->xdp_egress_prog);
 		xdp->prog_id = old ? old->aux->id : 0;
 		break;
-
 	default:
 		ret = -EINVAL;
 		break;
@@ -8680,21 +8695,23 @@ int dev_change_xdp_fd(struct net_device *dev, struct netlink_ext_ack *extack,
 	ASSERT_RTNL();
 
 	offload = flags & XDP_FLAGS_HW_MODE;
-	if (egress)
+	if (egress) {
 		query = XDP_QUERY_PROG_EGRESS;
-	else
+		bpf_op = bpf_chk = generic_xdp_install;
+	} else {
 		query = offload ? XDP_QUERY_PROG_HW : XDP_QUERY_PROG;
 
 
-	bpf_op = bpf_chk = ops->ndo_bpf;
-	if (!bpf_op && (flags & (XDP_FLAGS_DRV_MODE | XDP_FLAGS_HW_MODE))) {
-		NL_SET_ERR_MSG(extack, "underlying driver does not support XDP in native mode");
-		return -EOPNOTSUPP;
+		bpf_op = bpf_chk = ops->ndo_bpf;
+		if (!bpf_op && (flags & (XDP_FLAGS_DRV_MODE | XDP_FLAGS_HW_MODE))) {
+			NL_SET_ERR_MSG(extack, "underlying driver does not support XDP in native mode");
+			return -EOPNOTSUPP;
+		}
+		if (!bpf_op || (flags & XDP_FLAGS_SKB_MODE))
+			bpf_op = generic_xdp_install;
+		if (bpf_op == bpf_chk)
+			bpf_chk = generic_xdp_install;
 	}
-	if (!bpf_op || (flags & XDP_FLAGS_SKB_MODE))
-		bpf_op = generic_xdp_install;
-	if (bpf_op == bpf_chk)
-		bpf_chk = generic_xdp_install;
 
 	prog_id = __dev_xdp_query(dev, bpf_op, query);
 	if (flags & XDP_FLAGS_REPLACE) {
-- 
2.21.1 (Apple Git-122.3)



* [PATCH RFC-v5 bpf-next 08/12] dev: Support xdp in the Tx path for packets as an skb
  2020-04-13 17:17 [PATCH RFC-v5 bpf-next 00/12] Add support for XDP in egress path David Ahern
                   ` (6 preceding siblings ...)
  2020-04-13 17:17 ` [PATCH RFC-v5 bpf-next 07/12] dev: set egress XDP program David Ahern
@ 2020-04-13 17:17 ` David Ahern
  2020-04-13 17:17 ` [PATCH RFC-v5 bpf-next 09/12] dev: Support xdp in the Tx path for xdp_frames David Ahern
                   ` (4 subsequent siblings)
  12 siblings, 0 replies; 27+ messages in thread
From: David Ahern @ 2020-04-13 17:17 UTC (permalink / raw)
  To: netdev
  Cc: davem, kuba, prashantbhole.linux, jasowang, brouer, toke,
	toshiaki.makita1, daniel, john.fastabend, ast, kafai,
	songliubraving, yhs, andriin, dsahern, David Ahern

From: David Ahern <dahern@digitalocean.com>

Add support to run a Tx path program on packets about to hit the
ndo_start_xmit function for a device. Only XDP_DROP and XDP_PASS
are supported for now; any other return code is treated as a drop.
Conceptually, XDP_REDIRECT for this path can work the same as it
does for the Rx path, but that support is left for a follow-on
series.
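
As a rough illustration of that policy, the sketch below normalizes a
raw program return code into PASS or DROP and keeps two counters. It
is a userspace model only: enum act, struct tx_stats and
egress_verdict() are hypothetical names, with the enum values chosen
to follow the order of the XDP_* actions.

```c
#include <assert.h>

/* Hypothetical mirror of the XDP action codes, in the same order. */
enum act { ACT_ABORTED, ACT_DROP, ACT_PASS, ACT_TX, ACT_REDIRECT };

struct tx_stats {
	unsigned long dropped;		/* every packet that is not sent */
	unsigned long exceptions;	/* unsupported or aborted verdicts */
};

/* Map a raw return code to the only two outcomes the Tx path allows. */
static enum act egress_verdict(unsigned int raw, struct tx_stats *st)
{
	assert(st);

	switch (raw) {
	case ACT_PASS:
		return ACT_PASS;
	case ACT_DROP:
		st->dropped++;
		return ACT_DROP;
	default:
		/* ABORTED, TX, REDIRECT and unknown codes all end here */
		st->exceptions++;
		st->dropped++;
		return ACT_DROP;
	}
}
```

A program that returns TX or REDIRECT on egress therefore loses the
packet and shows up in the exception count rather than being silently
passed.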

Signed-off-by: David Ahern <dahern@digitalocean.com>
---
 include/linux/netdevice.h | 11 +++++++++
 net/core/dev.c            | 52 ++++++++++++++++++++++++++++++++++++++-
 2 files changed, 62 insertions(+), 1 deletion(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 0c89996a6bec..39e1b42c042f 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -3714,6 +3714,7 @@ static inline void dev_consume_skb_any(struct sk_buff *skb)
 
 void generic_xdp_tx(struct sk_buff *skb, struct bpf_prog *xdp_prog);
 int do_xdp_generic_rx(struct bpf_prog *xdp_prog, struct sk_buff *skb);
+u32 do_xdp_egress_skb(struct net_device *dev, struct sk_buff *skb);
 int netif_rx(struct sk_buff *skb);
 int netif_rx_ni(struct sk_buff *skb);
 int netif_receive_skb(struct sk_buff *skb);
@@ -4534,6 +4535,16 @@ static inline netdev_tx_t __netdev_start_xmit(const struct net_device_ops *ops,
 					      struct sk_buff *skb, struct net_device *dev,
 					      bool more)
 {
+	if (static_branch_unlikely(&xdp_egress_needed_key)) {
+		u32 act;
+
+		rcu_read_lock();
+		act = do_xdp_egress_skb(dev, skb);
+		rcu_read_unlock();
+		if (act == XDP_DROP)
+			return NET_XMIT_DROP;
+	}
+
 	__this_cpu_write(softnet_data.xmit.more, more);
 	return ops->ndo_start_xmit(skb, dev);
 }
diff --git a/net/core/dev.c b/net/core/dev.c
index c879c291244a..1bbaeb8842ed 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -4619,7 +4619,6 @@ void generic_xdp_tx(struct sk_buff *skb, struct bpf_prog *xdp_prog)
 }
 
 static DEFINE_STATIC_KEY_FALSE(generic_xdp_needed_key);
-DEFINE_STATIC_KEY_FALSE(xdp_egress_needed_key);
 
 int do_xdp_generic_rx(struct bpf_prog *xdp_prog, struct sk_buff *skb)
 {
@@ -4670,6 +4669,57 @@ int do_xdp_generic_rx(struct bpf_prog *xdp_prog, struct sk_buff *skb)
 }
 EXPORT_SYMBOL_GPL(do_xdp_generic_rx);
 
+DEFINE_STATIC_KEY_FALSE(xdp_egress_needed_key);
+EXPORT_SYMBOL_GPL(xdp_egress_needed_key);
+
+static u32 handle_xdp_egress_act(u32 act, struct net_device *dev,
+				 struct bpf_prog *xdp_prog)
+{
+	switch (act) {
+	case XDP_DROP:
+		/* fall through */
+	case XDP_PASS:
+		break;
+	case XDP_TX:
+		/* fall through */
+	case XDP_REDIRECT:
+		/* fall through */
+	default:
+		bpf_warn_invalid_xdp_action(act);
+		/* fall through */
+	case XDP_ABORTED:
+		trace_xdp_exception(dev, xdp_prog, act);
+		act = XDP_DROP;
+		break;
+	}
+
+	return act;
+}
+
+u32 do_xdp_egress_skb(struct net_device *dev, struct sk_buff *skb)
+{
+	struct bpf_prog *xdp_prog;
+	u32 act = XDP_PASS;
+
+	xdp_prog = rcu_dereference(dev->xdp_egress_prog);
+	if (xdp_prog) {
+		struct xdp_txq_info txq = { .dev = dev };
+		struct xdp_buff xdp;
+
+		xdp.txq = &txq;
+		act = do_xdp_generic_core(skb, &xdp, xdp_prog);
+		act = handle_xdp_egress_act(act, dev, xdp_prog);
+		if (act == XDP_DROP) {
+			atomic_long_inc(&dev->tx_dropped);
+			skb_tx_error(skb);
+			kfree_skb(skb);
+		}
+	}
+
+	return act;
+}
+EXPORT_SYMBOL_GPL(do_xdp_egress_skb);
+
 static int netif_rx_internal(struct sk_buff *skb)
 {
 	int ret;
-- 
2.21.1 (Apple Git-122.3)



* [PATCH RFC-v5 bpf-next 09/12] dev: Support xdp in the Tx path for xdp_frames
  2020-04-13 17:17 [PATCH RFC-v5 bpf-next 00/12] Add support for XDP in egress path David Ahern
                   ` (7 preceding siblings ...)
  2020-04-13 17:17 ` [PATCH RFC-v5 bpf-next 08/12] dev: Support xdp in the Tx path for packets as an skb David Ahern
@ 2020-04-13 17:17 ` David Ahern
  2020-04-16 14:02   ` Toke Høiland-Jørgensen
  2020-04-17  8:30   ` Jesper Dangaard Brouer
  2020-04-13 17:17 ` [PATCH RFC-v5 bpf-next 10/12] libbpf: Add egress XDP support David Ahern
                   ` (3 subsequent siblings)
  12 siblings, 2 replies; 27+ messages in thread
From: David Ahern @ 2020-04-13 17:17 UTC (permalink / raw)
  To: netdev
  Cc: davem, kuba, prashantbhole.linux, jasowang, brouer, toke,
	toshiaki.makita1, daniel, john.fastabend, ast, kafai,
	songliubraving, yhs, andriin, dsahern, David Ahern

From: David Ahern <dahern@digitalocean.com>

Add support to run a Tx path program on xdp_frames by adding a hook to
bq_xmit_all before xdp_frames are passed to ndo_xdp_xmit for the device.

If an xdp_frame is dropped by the program, it is removed from the
xdp_frames array with subsequent entries moved up.
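
The in-place compaction can be sketched in userspace as follows. The
struct frame type, the keep_fn predicate and compact_frames() are
hypothetical; the sketch only demonstrates the two-index walk in which
surviving entries are moved up, dropped ones are released, and the new
count is returned.

```c
#include <assert.h>
#include <stddef.h>

struct frame { int len; int released; };

/* Returns non-zero if the frame should be transmitted. */
typedef int (*keep_fn)(const struct frame *f);

/* Compact frames[] in place, preserving order; returns the new count. */
static unsigned int compact_frames(struct frame **frames,
				   unsigned int count, keep_fn keep)
{
	unsigned int i, j = 0;

	assert(frames && keep);

	for (i = 0; i < count; i++) {
		struct frame *f = frames[i];

		if (!keep(f)) {
			f->released = 1;	/* stand-in for returning the frame */
			continue;
		}
		frames[j++] = f;	/* j <= i, so nothing unread is overwritten */
	}

	return j;
}

/* Sample predicate: drop anything shorter than 64 bytes. */
static int keep_large(const struct frame *f)
{
	return f->len >= 64;
}
```

Because the write index never passes the read index, the array can be
compacted without a scratch buffer, and the caller can hand the first
entries straight to the transmit function.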

Signed-off-by: David Ahern <dahern@digitalocean.com>
---
 include/linux/netdevice.h |  3 ++
 kernel/bpf/devmap.c       | 19 ++++++++---
 net/core/dev.c            | 70 +++++++++++++++++++++++++++++++++++++++
 3 files changed, 87 insertions(+), 5 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 39e1b42c042f..d75e31ac2751 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -3715,6 +3715,9 @@ static inline void dev_consume_skb_any(struct sk_buff *skb)
 void generic_xdp_tx(struct sk_buff *skb, struct bpf_prog *xdp_prog);
 int do_xdp_generic_rx(struct bpf_prog *xdp_prog, struct sk_buff *skb);
 u32 do_xdp_egress_skb(struct net_device *dev, struct sk_buff *skb);
+unsigned int do_xdp_egress_frame(struct net_device *dev,
+				 struct xdp_frame **frames,
+				 unsigned int *pcount);
 int netif_rx(struct sk_buff *skb);
 int netif_rx_ni(struct sk_buff *skb);
 int netif_receive_skb(struct sk_buff *skb);
diff --git a/kernel/bpf/devmap.c b/kernel/bpf/devmap.c
index 58bdca5d978a..bedecd07d898 100644
--- a/kernel/bpf/devmap.c
+++ b/kernel/bpf/devmap.c
@@ -322,24 +322,33 @@ static int bq_xmit_all(struct xdp_dev_bulk_queue *bq, u32 flags)
 {
 	struct net_device *dev = bq->dev;
 	int sent = 0, drops = 0, err = 0;
+	unsigned int count = bq->count;
 	int i;
 
-	if (unlikely(!bq->count))
+	if (unlikely(!count))
 		return 0;
 
-	for (i = 0; i < bq->count; i++) {
+	for (i = 0; i < count; i++) {
 		struct xdp_frame *xdpf = bq->q[i];
 
 		prefetch(xdpf);
 	}
 
-	sent = dev->netdev_ops->ndo_xdp_xmit(dev, bq->count, bq->q, flags);
+	if (static_branch_unlikely(&xdp_egress_needed_key)) {
+		count = do_xdp_egress_frame(dev, bq->q, &count);
+		drops += bq->count - count;
+		/* all frames consumed by the xdp program? */
+		if (!count)
+			goto out;
+	}
+
+	sent = dev->netdev_ops->ndo_xdp_xmit(dev, count, bq->q, flags);
 	if (sent < 0) {
 		err = sent;
 		sent = 0;
 		goto error;
 	}
-	drops = bq->count - sent;
+	drops += count - sent;
 out:
 	bq->count = 0;
 
@@ -351,7 +360,7 @@ static int bq_xmit_all(struct xdp_dev_bulk_queue *bq, u32 flags)
 	/* If ndo_xdp_xmit fails with an errno, no frames have been
 	 * xmit'ed and it's our responsibility to them free all.
 	 */
-	for (i = 0; i < bq->count; i++) {
+	for (i = 0; i < count; i++) {
 		struct xdp_frame *xdpf = bq->q[i];
 
 		xdp_return_frame_rx_napi(xdpf);
diff --git a/net/core/dev.c b/net/core/dev.c
index 1bbaeb8842ed..f23dc6043329 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -4720,6 +4720,76 @@ u32 do_xdp_egress_skb(struct net_device *dev, struct sk_buff *skb)
 }
 EXPORT_SYMBOL_GPL(do_xdp_egress_skb);
 
+static u32 __xdp_egress_frame(struct net_device *dev,
+			      struct bpf_prog *xdp_prog,
+			      struct xdp_frame *xdp_frame,
+			      struct xdp_txq_info *txq)
+{
+	struct xdp_buff xdp;
+	u32 act;
+
+	xdp.data_hard_start = xdp_frame->data - xdp_frame->headroom;
+	xdp.data = xdp_frame->data;
+	xdp.data_end = xdp.data + xdp_frame->len;
+	xdp_set_data_meta_invalid(&xdp);
+	xdp.txq = txq;
+
+	act = bpf_prog_run_xdp(xdp_prog, &xdp);
+	act = handle_xdp_egress_act(act, dev, xdp_prog);
+
+	/* if not dropping frame, readjust pointers in case
+	 * program made changes to the buffer
+	 */
+	if (act != XDP_DROP) {
+		int headroom = xdp.data - xdp.data_hard_start;
+		int metasize = xdp.data - xdp.data_meta;
+
+		metasize = metasize > 0 ? metasize : 0;
+		if (unlikely((headroom - metasize) < sizeof(*xdp_frame)))
+			return XDP_DROP;
+
+		xdp_frame = xdp.data_hard_start;
+		xdp_frame->data = xdp.data;
+		xdp_frame->len  = xdp.data_end - xdp.data;
+		xdp_frame->headroom = headroom - sizeof(*xdp_frame);
+		xdp_frame->metasize = metasize;
+		/* xdp_frame->mem is unchanged */
+	}
+
+	return act;
+}
+
+unsigned int do_xdp_egress_frame(struct net_device *dev,
+				 struct xdp_frame **frames,
+				 unsigned int *pcount)
+{
+	struct bpf_prog *xdp_prog;
+	unsigned int count = *pcount;
+
+	xdp_prog = rcu_dereference(dev->xdp_egress_prog);
+	if (xdp_prog) {
+		struct xdp_txq_info txq = { .dev = dev };
+		unsigned int i, j;
+		u32 act;
+
+		for (i = 0, j = 0; i < count; i++) {
+			struct xdp_frame *frame = frames[i];
+
+			act = __xdp_egress_frame(dev, xdp_prog, frame, &txq);
+			if (act == XDP_DROP) {
+				xdp_return_frame_rx_napi(frame);
+				continue;
+			}
+
+			frames[j] = frame;
+			j++;
+		}
+		count = j;
+	}
+
+	return count;
+}
+
 static int netif_rx_internal(struct sk_buff *skb)
 {
 	int ret;
-- 
2.21.1 (Apple Git-122.3)



* [PATCH RFC-v5 bpf-next 10/12] libbpf: Add egress XDP support
  2020-04-13 17:17 [PATCH RFC-v5 bpf-next 00/12] Add support for XDP in egress path David Ahern
                   ` (8 preceding siblings ...)
  2020-04-13 17:17 ` [PATCH RFC-v5 bpf-next 09/12] dev: Support xdp in the Tx path for xdp_frames David Ahern
@ 2020-04-13 17:17 ` David Ahern
  2020-04-13 17:18 ` [PATCH RFC-v5 bpf-next 11/12] bpftool: Add support for XDP egress David Ahern
                   ` (2 subsequent siblings)
  12 siblings, 0 replies; 27+ messages in thread
From: David Ahern @ 2020-04-13 17:17 UTC (permalink / raw)
  To: netdev
  Cc: davem, kuba, prashantbhole.linux, jasowang, brouer, toke,
	toshiaki.makita1, daniel, john.fastabend, ast, kafai,
	songliubraving, yhs, andriin, dsahern, David Ahern

From: David Ahern <dahern@digitalocean.com>

Add egress XDP support to libbpf.

A new section name hint, xdp_egress, is added to set the expected
attach type at program load. Programs can use xdp_egress as the prefix
in the SEC statement to load the program with the BPF_XDP_EGRESS
attach type set.

An egress field is added to bpf_xdp_set_link_opts to specify the
egress type for use with bpf_set_link_xdp_fd_opts. The library checks
the field and sets nla_type to IFLA_XDP_EGRESS when it is set.

Add egress versions of the bpf_get_link_xdp* info and id APIs, with
the core code refactored to handle both the Rx and Tx paths.
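
Appending a field to a size-prefixed options struct stays compatible
with older callers because the field is only read when the caller's sz
covers it. The sketch below models that check in userspace; struct
set_link_opts, FIELD_END, HAS_FIELD and wants_egress() are hypothetical
names that imitate the pattern and are not the libbpf definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical options struct; new fields are only ever appended. */
struct set_link_opts {
	size_t sz;		/* size of the struct as the caller knows it */
	unsigned int old_fd;
	unsigned char egress;	/* newly appended field */
};

/* Offset of the first byte after a field. */
#define FIELD_END(type, field) \
	(offsetof(type, field) + sizeof(((type *)0)->field))

/* True if the caller's struct is large enough to contain the field. */
#define HAS_FIELD(opts, field) \
	((opts) && (opts)->sz >= FIELD_END(struct set_link_opts, field))

/* Read egress only if the caller was built with a struct that has it. */
static int wants_egress(const struct set_link_opts *opts)
{
	/* sz must at least cover itself for the struct to be usable */
	assert(!opts || opts->sz >= sizeof(opts->sz));

	return HAS_FIELD(opts, egress) ? opts->egress != 0 : 0;
}
```

A caller compiled against the older layout passes a smaller sz, so the
bytes where egress would live are never inspected and the request
falls back to the non-egress path.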

Signed-off-by: Prashant Bhole <prashantbhole.linux@gmail.com>
Co-developed-by: David Ahern <dahern@digitalocean.com>
Signed-off-by: David Ahern <dahern@digitalocean.com>
---
 tools/lib/bpf/libbpf.c   |  2 ++
 tools/lib/bpf/libbpf.h   |  9 +++++-
 tools/lib/bpf/libbpf.map |  2 ++
 tools/lib/bpf/netlink.c  | 63 +++++++++++++++++++++++++++++++++++-----
 4 files changed, 67 insertions(+), 9 deletions(-)

diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
index ff9174282a8c..463f55e6e82f 100644
--- a/tools/lib/bpf/libbpf.c
+++ b/tools/lib/bpf/libbpf.c
@@ -6322,6 +6322,8 @@ static const struct bpf_sec_def section_defs[] = {
 		.is_attach_btf = true,
 		.expected_attach_type = BPF_LSM_MAC,
 		.attach_fn = attach_lsm),
+	BPF_EAPROG_SEC("xdp_egress",		BPF_PROG_TYPE_XDP,
+						BPF_XDP_EGRESS),
 	BPF_PROG_SEC("xdp",			BPF_PROG_TYPE_XDP),
 	BPF_PROG_SEC("perf_event",		BPF_PROG_TYPE_PERF_EVENT),
 	BPF_PROG_SEC("lwt_in",			BPF_PROG_TYPE_LWT_IN),
diff --git a/tools/lib/bpf/libbpf.h b/tools/lib/bpf/libbpf.h
index 44df1d3e7287..2de56b9c6397 100644
--- a/tools/lib/bpf/libbpf.h
+++ b/tools/lib/bpf/libbpf.h
@@ -453,14 +453,16 @@ struct xdp_link_info {
 	__u32 drv_prog_id;
 	__u32 hw_prog_id;
 	__u32 skb_prog_id;
+	__u32 egress_core_prog_id;
 	__u8 attach_mode;
 };
 
 struct bpf_xdp_set_link_opts {
 	size_t sz;
 	__u32 old_fd;
+	__u8  egress;
 };
-#define bpf_xdp_set_link_opts__last_field old_fd
+#define bpf_xdp_set_link_opts__last_field egress
 
 LIBBPF_API int bpf_set_link_xdp_fd(int ifindex, int fd, __u32 flags);
 LIBBPF_API int bpf_set_link_xdp_fd_opts(int ifindex, int fd, __u32 flags,
@@ -468,6 +470,11 @@ LIBBPF_API int bpf_set_link_xdp_fd_opts(int ifindex, int fd, __u32 flags,
 LIBBPF_API int bpf_get_link_xdp_id(int ifindex, __u32 *prog_id, __u32 flags);
 LIBBPF_API int bpf_get_link_xdp_info(int ifindex, struct xdp_link_info *info,
 				     size_t info_size, __u32 flags);
+LIBBPF_API int bpf_get_link_xdp_egress_id(int ifindex, __u32 *prog_id,
+					  __u32 flags);
+LIBBPF_API int bpf_get_link_xdp_egress_info(int ifindex,
+					    struct xdp_link_info *info,
+					    size_t info_size, __u32 flags);
 
 struct perf_buffer;
 
diff --git a/tools/lib/bpf/libbpf.map b/tools/lib/bpf/libbpf.map
index bb8831605b25..51576c8a02fe 100644
--- a/tools/lib/bpf/libbpf.map
+++ b/tools/lib/bpf/libbpf.map
@@ -253,4 +253,6 @@ LIBBPF_0.0.8 {
 		bpf_program__set_attach_target;
 		bpf_program__set_lsm;
 		bpf_set_link_xdp_fd_opts;
+		bpf_get_link_xdp_egress_id;
+		bpf_get_link_xdp_egress_info;
 } LIBBPF_0.0.7;
diff --git a/tools/lib/bpf/netlink.c b/tools/lib/bpf/netlink.c
index 18b5319025e1..31c4252a6908 100644
--- a/tools/lib/bpf/netlink.c
+++ b/tools/lib/bpf/netlink.c
@@ -28,6 +28,7 @@ typedef int (*__dump_nlmsg_t)(struct nlmsghdr *nlmsg, libbpf_dump_nlmsg_t,
 struct xdp_id_md {
 	int ifindex;
 	__u32 flags;
+	__u16 nla_type;
 	struct xdp_link_info info;
 };
 
@@ -133,7 +134,7 @@ static int bpf_netlink_recv(int sock, __u32 nl_pid, int seq,
 }
 
 static int __bpf_set_link_xdp_fd_replace(int ifindex, int fd, int old_fd,
-					 __u32 flags)
+					 __u32 flags, __u16 nla_type)
 {
 	int sock, seq = 0, ret;
 	struct nlattr *nla, *nla_xdp;
@@ -160,7 +161,7 @@ static int __bpf_set_link_xdp_fd_replace(int ifindex, int fd, int old_fd,
 	/* started nested attribute for XDP */
 	nla = (struct nlattr *)(((char *)&req)
 				+ NLMSG_ALIGN(req.nh.nlmsg_len));
-	nla->nla_type = NLA_F_NESTED | IFLA_XDP;
+	nla->nla_type = NLA_F_NESTED | nla_type;
 	nla->nla_len = NLA_HDRLEN;
 
 	/* add XDP fd */
@@ -203,6 +204,7 @@ static int __bpf_set_link_xdp_fd_replace(int ifindex, int fd, int old_fd,
 int bpf_set_link_xdp_fd_opts(int ifindex, int fd, __u32 flags,
 			     const struct bpf_xdp_set_link_opts *opts)
 {
+	__u16 nla_type = IFLA_XDP;
 	int old_fd = -1;
 
 	if (!OPTS_VALID(opts, bpf_xdp_set_link_opts))
@@ -213,14 +215,22 @@ int bpf_set_link_xdp_fd_opts(int ifindex, int fd, __u32 flags,
 		flags |= XDP_FLAGS_REPLACE;
 	}
 
+	if (OPTS_HAS(opts, egress)) {
+		__u8 egress = OPTS_GET(opts, egress, 0);
+
+		if (egress)
+			nla_type = IFLA_XDP_EGRESS;
+	}
+
 	return __bpf_set_link_xdp_fd_replace(ifindex, fd,
 					     old_fd,
-					     flags);
+					     flags,
+					     nla_type);
 }
 
 int bpf_set_link_xdp_fd(int ifindex, int fd, __u32 flags)
 {
-	return __bpf_set_link_xdp_fd_replace(ifindex, fd, 0, flags);
+	return __bpf_set_link_xdp_fd_replace(ifindex, fd, 0, flags, IFLA_XDP);
 }
 
 static int __dump_link_nlmsg(struct nlmsghdr *nlh,
@@ -243,15 +253,16 @@ static int get_xdp_info(void *cookie, void *msg, struct nlattr **tb)
 	struct nlattr *xdp_tb[IFLA_XDP_MAX + 1];
 	struct xdp_id_md *xdp_id = cookie;
 	struct ifinfomsg *ifinfo = msg;
+	__u16 atype = xdp_id->nla_type;
 	int ret;
 
 	if (xdp_id->ifindex && xdp_id->ifindex != ifinfo->ifi_index)
 		return 0;
 
-	if (!tb[IFLA_XDP])
+	if (!tb[atype])
 		return 0;
 
-	ret = libbpf_nla_parse_nested(xdp_tb, IFLA_XDP_MAX, tb[IFLA_XDP], NULL);
+	ret = libbpf_nla_parse_nested(xdp_tb, IFLA_XDP_MAX, tb[atype], NULL);
 	if (ret)
 		return ret;
 
@@ -280,11 +291,16 @@ static int get_xdp_info(void *cookie, void *msg, struct nlattr **tb)
 		xdp_id->info.hw_prog_id = libbpf_nla_getattr_u32(
 			xdp_tb[IFLA_XDP_HW_PROG_ID]);
 
+	if (xdp_tb[IFLA_XDP_EGRESS_CORE_PROG_ID])
+		xdp_id->info.egress_core_prog_id = libbpf_nla_getattr_u32(
+			xdp_tb[IFLA_XDP_EGRESS_CORE_PROG_ID]);
+
 	return 0;
 }
 
-int bpf_get_link_xdp_info(int ifindex, struct xdp_link_info *info,
-			  size_t info_size, __u32 flags)
+static int __bpf_get_link_xdp_info(int ifindex, struct xdp_link_info *info,
+				   size_t info_size, __u32 flags,
+				   __u16 nla_type)
 {
 	struct xdp_id_md xdp_id = {};
 	int sock, ret;
@@ -306,6 +322,7 @@ int bpf_get_link_xdp_info(int ifindex, struct xdp_link_info *info,
 
 	xdp_id.ifindex = ifindex;
 	xdp_id.flags = flags;
+	xdp_id.nla_type = nla_type;
 
 	ret = libbpf_nl_get_link(sock, nl_pid, get_xdp_info, &xdp_id);
 	if (!ret) {
@@ -319,6 +336,20 @@ int bpf_get_link_xdp_info(int ifindex, struct xdp_link_info *info,
 	return ret;
 }
 
+int bpf_get_link_xdp_info(int ifindex, struct xdp_link_info *info,
+			  size_t info_size, __u32 flags)
+{
+	return __bpf_get_link_xdp_info(ifindex, info, info_size, flags,
+				       IFLA_XDP);
+}
+
+int bpf_get_link_xdp_egress_info(int ifindex, struct xdp_link_info *info,
+				 size_t info_size, __u32 flags)
+{
+	return __bpf_get_link_xdp_info(ifindex, info, info_size, flags,
+				       IFLA_XDP_EGRESS);
+}
+
 static __u32 get_xdp_id(struct xdp_link_info *info, __u32 flags)
 {
 	if (info->attach_mode != XDP_ATTACHED_MULTI)
@@ -345,6 +376,22 @@ int bpf_get_link_xdp_id(int ifindex, __u32 *prog_id, __u32 flags)
 	return ret;
 }
 
+int bpf_get_link_xdp_egress_id(int ifindex, __u32 *prog_id, __u32 flags)
+{
+	struct xdp_link_info info;
+	int ret;
+
+	/* egress path does not support SKB, DRV or HW mode */
+	if (flags & XDP_FLAGS_MODES)
+		return -EINVAL;
+
+	ret = bpf_get_link_xdp_egress_info(ifindex, &info, sizeof(info), flags);
+	if (!ret)
+		*prog_id = get_xdp_id(&info, flags);
+
+	return ret;
+}
+
 int libbpf_nl_get_link(int sock, unsigned int nl_pid,
 		       libbpf_dump_nlmsg_t dump_link_nlmsg, void *cookie)
 {
-- 
2.21.1 (Apple Git-122.3)



* [PATCH RFC-v5 bpf-next 11/12] bpftool: Add support for XDP egress
  2020-04-13 17:17 [PATCH RFC-v5 bpf-next 00/12] Add support for XDP in egress path David Ahern
                   ` (9 preceding siblings ...)
  2020-04-13 17:17 ` [PATCH RFC-v5 bpf-next 10/12] libbpf: Add egress XDP support David Ahern
@ 2020-04-13 17:18 ` David Ahern
  2020-04-13 17:18 ` [PATCH RFC-v5 bpf-next 12/12] samples/bpf: add XDP egress support to xdp1 David Ahern
  2020-04-16 13:59 ` [PATCH RFC-v5 bpf-next 00/12] Add support for XDP in egress path Toke Høiland-Jørgensen
  12 siblings, 0 replies; 27+ messages in thread
From: David Ahern @ 2020-04-13 17:18 UTC (permalink / raw)
  To: netdev
  Cc: davem, kuba, prashantbhole.linux, jasowang, brouer, toke,
	toshiaki.makita1, daniel, john.fastabend, ast, kafai,
	songliubraving, yhs, andriin, dsahern, David Ahern

From: David Ahern <dahern@digitalocean.com>

Add NET_ATTACH_TYPE_XDP_EGRESS and update attach_type_strings to
allow a user to specify 'xdp_egress' as the attach or detach point.

libbpf handles egress config via bpf_set_link_xdp_fd_opts, so
update do_attach_detach_xdp to use it. Specifically, the new API
requires old_fd to be set based on any currently loaded program,
so use bpf_get_link_xdp_id and bpf_get_link_xdp_egress_id to get
an fd to any existing program.

Update 'net show' command to dump egress programs.

Signed-off-by: David Ahern <dahern@digitalocean.com>
---
 tools/bpf/bpftool/main.h           |  2 +-
 tools/bpf/bpftool/net.c            | 49 +++++++++++++++++++++++++++---
 tools/bpf/bpftool/netlink_dumper.c | 12 ++++++--
 3 files changed, 55 insertions(+), 8 deletions(-)

diff --git a/tools/bpf/bpftool/main.h b/tools/bpf/bpftool/main.h
index 86f14ce26fd7..cbc0cc2257eb 100644
--- a/tools/bpf/bpftool/main.h
+++ b/tools/bpf/bpftool/main.h
@@ -230,7 +230,7 @@ void btf_dump_linfo_json(const struct btf *btf,
 struct nlattr;
 struct ifinfomsg;
 struct tcmsg;
-int do_xdp_dump(struct ifinfomsg *ifinfo, struct nlattr **tb);
+int do_xdp_dump(struct ifinfomsg *ifinfo, struct nlattr **tb, bool egress);
 int do_filter_dump(struct tcmsg *ifinfo, struct nlattr **tb, const char *kind,
 		   const char *devname, int ifindex);
 
diff --git a/tools/bpf/bpftool/net.c b/tools/bpf/bpftool/net.c
index c5e3895b7c8b..1789dcd33d4e 100644
--- a/tools/bpf/bpftool/net.c
+++ b/tools/bpf/bpftool/net.c
@@ -32,6 +32,7 @@ struct bpf_netdev_t {
 	int	used_len;
 	int	array_len;
 	int	filter_idx;
+	bool	egress;
 };
 
 struct tc_kind_handle {
@@ -61,6 +62,7 @@ enum net_attach_type {
 	NET_ATTACH_TYPE_XDP_GENERIC,
 	NET_ATTACH_TYPE_XDP_DRIVER,
 	NET_ATTACH_TYPE_XDP_OFFLOAD,
+	NET_ATTACH_TYPE_XDP_EGRESS,
 };
 
 static const char * const attach_type_strings[] = {
@@ -68,6 +70,7 @@ static const char * const attach_type_strings[] = {
 	[NET_ATTACH_TYPE_XDP_GENERIC]	= "xdpgeneric",
 	[NET_ATTACH_TYPE_XDP_DRIVER]	= "xdpdrv",
 	[NET_ATTACH_TYPE_XDP_OFFLOAD]	= "xdpoffload",
+	[NET_ATTACH_TYPE_XDP_EGRESS]	= "xdp_egress",
 };
 
 const size_t net_attach_type_size = ARRAY_SIZE(attach_type_strings);
@@ -111,7 +114,7 @@ static int dump_link_nlmsg(void *cookie, void *msg, struct nlattr **tb)
 			 : "");
 	netinfo->used_len++;
 
-	return do_xdp_dump(ifinfo, tb);
+	return do_xdp_dump(ifinfo, tb, netinfo->egress);
 }
 
 static int dump_class_qdisc_nlmsg(void *cookie, void *msg, struct nlattr **tb)
@@ -276,10 +279,19 @@ static int net_parse_dev(int *argc, char ***argv)
 static int do_attach_detach_xdp(int progfd, enum net_attach_type attach_type,
 				int ifindex, bool overwrite)
 {
-	__u32 flags = 0;
+	struct bpf_xdp_set_link_opts opts;
+	__u32 flags = 0, id = 0;
+	int rc, old_fd = -1;
+
+	memset(&opts, 0, sizeof(opts));
+	opts.sz = sizeof(opts);
 
 	if (!overwrite)
 		flags = XDP_FLAGS_UPDATE_IF_NOEXIST;
+
+	if (attach_type == NET_ATTACH_TYPE_XDP_EGRESS)
+		opts.egress = 1;
+
 	if (attach_type == NET_ATTACH_TYPE_XDP_GENERIC)
 		flags |= XDP_FLAGS_SKB_MODE;
 	if (attach_type == NET_ATTACH_TYPE_XDP_DRIVER)
@@ -287,7 +299,27 @@ static int do_attach_detach_xdp(int progfd, enum net_attach_type attach_type,
 	if (attach_type == NET_ATTACH_TYPE_XDP_OFFLOAD)
 		flags |= XDP_FLAGS_HW_MODE;
 
-	return bpf_set_link_xdp_fd(ifindex, progfd, flags);
+	if (opts.egress)
+		rc = bpf_get_link_xdp_egress_id(ifindex, &id, flags);
+	else
+		rc = bpf_get_link_xdp_id(ifindex, &id, flags);
+
+	if (rc) {
+		p_err("Failed to get existing prog id for device");
+		return rc;
+	}
+
+	if (id)
+		old_fd = bpf_prog_get_fd_by_id(id);
+
+	opts.old_fd = old_fd;
+
+	rc = bpf_set_link_xdp_fd_opts(ifindex, progfd, flags, &opts);
+
+	if (old_fd != -1)
+		close(old_fd);
+
+	return rc;
 }
 
 static int do_attach(int argc, char **argv)
@@ -411,6 +443,7 @@ static int do_show(int argc, char **argv)
 	dev_array.used_len = 0;
 	dev_array.array_len = 0;
 	dev_array.filter_idx = filter_idx;
+	dev_array.egress = 0;
 
 	if (json_output)
 		jsonw_start_array(json_wtr);
@@ -419,6 +452,14 @@ static int do_show(int argc, char **argv)
 	ret = libbpf_nl_get_link(sock, nl_pid, dump_link_nlmsg, &dev_array);
 	NET_END_ARRAY("\n");
 
+	if (!ret) {
+		dev_array.egress = true;
+		NET_START_ARRAY("xdp_egress", "%s:\n");
+		ret = libbpf_nl_get_link(sock, nl_pid, dump_link_nlmsg,
+					 &dev_array);
+		NET_END_ARRAY("\n");
+	}
+
 	if (!ret) {
 		NET_START_ARRAY("tc", "%s:\n");
 		for (i = 0; i < dev_array.used_len; i++) {
@@ -464,7 +505,7 @@ static int do_help(int argc, char **argv)
 		"       %s %s help\n"
 		"\n"
 		"       " HELP_SPEC_PROGRAM "\n"
-		"       ATTACH_TYPE := { xdp | xdpgeneric | xdpdrv | xdpoffload }\n"
+		"       ATTACH_TYPE := { xdp | xdpgeneric | xdpdrv | xdpoffload | xdp_egress}\n"
 		"\n"
 		"Note: Only xdp and tc attachments are supported now.\n"
 		"      For progs attached to cgroups, use \"bpftool cgroup\"\n"
diff --git a/tools/bpf/bpftool/netlink_dumper.c b/tools/bpf/bpftool/netlink_dumper.c
index 5f65140b003b..e4a2b6f8e50b 100644
--- a/tools/bpf/bpftool/netlink_dumper.c
+++ b/tools/bpf/bpftool/netlink_dumper.c
@@ -55,6 +55,7 @@ static int do_xdp_dump_one(struct nlattr *attr, unsigned int ifindex,
 		xdp_dump_prog_id(tb, IFLA_XDP_SKB_PROG_ID, "generic", true);
 		xdp_dump_prog_id(tb, IFLA_XDP_DRV_PROG_ID, "driver", true);
 		xdp_dump_prog_id(tb, IFLA_XDP_HW_PROG_ID, "offload", true);
+		xdp_dump_prog_id(tb, IFLA_XDP_EGRESS_CORE_PROG_ID, "core", true);
 		if (json_output)
 			jsonw_end_array(json_wtr);
 	} else if (mode == XDP_ATTACHED_DRV) {
@@ -63,18 +64,23 @@ static int do_xdp_dump_one(struct nlattr *attr, unsigned int ifindex,
 		xdp_dump_prog_id(tb, IFLA_XDP_PROG_ID, "generic", false);
 	} else if (mode == XDP_ATTACHED_HW) {
 		xdp_dump_prog_id(tb, IFLA_XDP_PROG_ID, "offload", false);
+	} else if (mode == XDP_ATTACHED_EGRESS_CORE) {
+		xdp_dump_prog_id(tb, IFLA_XDP_EGRESS_CORE_PROG_ID, "core",
+				 false);
 	}
 
 	NET_END_OBJECT_FINAL;
 	return 0;
 }
 
-int do_xdp_dump(struct ifinfomsg *ifinfo, struct nlattr **tb)
+int do_xdp_dump(struct ifinfomsg *ifinfo, struct nlattr **tb, bool egress)
 {
-	if (!tb[IFLA_XDP])
+	__u16 atype = egress ? IFLA_XDP_EGRESS : IFLA_XDP;
+
+	if (!tb[atype])
 		return 0;
 
-	return do_xdp_dump_one(tb[IFLA_XDP], ifinfo->ifi_index,
+	return do_xdp_dump_one(tb[atype], ifinfo->ifi_index,
 			       libbpf_nla_getattr_str(tb[IFLA_IFNAME]));
 }
 
-- 
2.21.1 (Apple Git-122.3)
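
Note on the bpftool change above: the following is a minimal sketch, not taken from the patch, of how an ATTACH_TYPE keyword could map onto the mode flags and the egress option that do_attach_detach_xdp builds. Every sk_-prefixed name is a hypothetical stand-in. The flag bits mirror XDP_FLAGS_* from linux/if_link.h so that the sketch compiles without kernel or libbpf headers.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Local stand-ins for the XDP_FLAGS_* bits in linux/if_link.h. */
#define SK_FLAG_UPDATE_IF_NOEXIST	(1U << 0)
#define SK_FLAG_SKB_MODE		(1U << 1)
#define SK_FLAG_DRV_MODE		(1U << 2)
#define SK_FLAG_HW_MODE			(1U << 3)

/* Hypothetical mirror of bpftool's enum net_attach_type. */
enum sk_attach_type {
	SK_ATTACH_XDP,
	SK_ATTACH_XDP_GENERIC,
	SK_ATTACH_XDP_DRIVER,
	SK_ATTACH_XDP_OFFLOAD,
	SK_ATTACH_XDP_EGRESS,
	SK_ATTACH_MAX,
};

static const char * const sk_attach_strings[SK_ATTACH_MAX] = {
	[SK_ATTACH_XDP]		= "xdp",
	[SK_ATTACH_XDP_GENERIC]	= "xdpgeneric",
	[SK_ATTACH_XDP_DRIVER]	= "xdpdrv",
	[SK_ATTACH_XDP_OFFLOAD]	= "xdpoffload",
	[SK_ATTACH_XDP_EGRESS]	= "xdp_egress",
};

/* Return the attach type named by str, or SK_ATTACH_MAX if unknown. */
enum sk_attach_type sk_parse_attach_type(const char *str)
{
	size_t i;

	for (i = 0; i < SK_ATTACH_MAX; i++) {
		if (strcmp(sk_attach_strings[i], str) == 0)
			return (enum sk_attach_type)i;
	}
	return SK_ATTACH_MAX;
}

/* What the attach call needs: netlink flags plus the egress selector. */
struct sk_attach_request {
	unsigned int flags;
	bool egress;
};

struct sk_attach_request sk_build_request(enum sk_attach_type type,
					  bool overwrite)
{
	struct sk_attach_request req = { 0, false };

	if (!overwrite)
		req.flags |= SK_FLAG_UPDATE_IF_NOEXIST;

	switch (type) {
	case SK_ATTACH_XDP_GENERIC:
		req.flags |= SK_FLAG_SKB_MODE;
		break;
	case SK_ATTACH_XDP_DRIVER:
		req.flags |= SK_FLAG_DRV_MODE;
		break;
	case SK_ATTACH_XDP_OFFLOAD:
		req.flags |= SK_FLAG_HW_MODE;
		break;
	case SK_ATTACH_XDP_EGRESS:
		/* No mode bit: egress is selected through the option. */
		req.egress = true;
		break;
	default:
		break;
	}
	return req;
}
```

For example, sk_build_request(sk_parse_attach_type("xdp_egress"), false) yields only the update-if-noexist bit plus egress = true, which matches the cover letter's statement that no XDP_FLAGS_*_MODE bit is set for an egress attach.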



* [PATCH RFC-v5 bpf-next 12/12] samples/bpf: add XDP egress support to xdp1
  2020-04-13 17:17 [PATCH RFC-v5 bpf-next 00/12] Add support for XDP in egress path David Ahern
                   ` (10 preceding siblings ...)
  2020-04-13 17:18 ` [PATCH RFC-v5 bpf-next 11/12] bpftool: Add support for XDP egress David Ahern
@ 2020-04-13 17:18 ` David Ahern
  2020-04-15 15:01   ` Alexei Starovoitov
  2020-04-16 13:59 ` [PATCH RFC-v5 bpf-next 00/12] Add support for XDP in egress path Toke Høiland-Jørgensen
  12 siblings, 1 reply; 27+ messages in thread
From: David Ahern @ 2020-04-13 17:18 UTC (permalink / raw)
  To: netdev
  Cc: davem, kuba, prashantbhole.linux, jasowang, brouer, toke,
	toshiaki.makita1, daniel, john.fastabend, ast, kafai,
	songliubraving, yhs, andriin, dsahern, David Ahern

From: David Ahern <dahern@digitalocean.com>

xdp1 and xdp2 now accept -E flag to set XDP program in the egress
path.

Signed-off-by: Prashant Bhole <prashantbhole.linux@gmail.com>
Signed-off-by: David Ahern <dahern@digitalocean.com>
---
 samples/bpf/xdp1_user.c | 38 +++++++++++++++++++++++++++++++-------
 1 file changed, 31 insertions(+), 7 deletions(-)

diff --git a/samples/bpf/xdp1_user.c b/samples/bpf/xdp1_user.c
index c447ad9e3a1d..1e74020203ca 100644
--- a/samples/bpf/xdp1_user.c
+++ b/samples/bpf/xdp1_user.c
@@ -21,18 +21,32 @@
 static int ifindex;
 static __u32 xdp_flags = XDP_FLAGS_UPDATE_IF_NOEXIST;
 static __u32 prog_id;
+static bool egress;
 
 static void int_exit(int sig)
 {
+	struct bpf_xdp_set_link_opts opts = { .sz = sizeof(opts) };
 	__u32 curr_prog_id = 0;
+	int err, old_fd;
 
-	if (bpf_get_link_xdp_id(ifindex, &curr_prog_id, xdp_flags)) {
+	if (egress)
+		err = bpf_get_link_xdp_egress_id(ifindex, &curr_prog_id,
+						 xdp_flags);
+	else
+		err = bpf_get_link_xdp_id(ifindex, &curr_prog_id, xdp_flags);
+	if (err) {
 		printf("bpf_get_link_xdp_id failed\n");
 		exit(1);
 	}
-	if (prog_id == curr_prog_id)
-		bpf_set_link_xdp_fd(ifindex, -1, xdp_flags);
-	else if (!curr_prog_id)
+	if (prog_id == curr_prog_id) {
+		if (egress)
+			opts.egress = 1;
+
+		old_fd = bpf_prog_get_fd_by_id(prog_id);
+		opts.old_fd = old_fd;
+		bpf_set_link_xdp_fd_opts(ifindex, -1, xdp_flags, &opts);
+		close(old_fd);
+	} else if (!curr_prog_id)
 		printf("couldn't find a prog id on a given interface\n");
 	else
 		printf("program on interface changed, not removing\n");
@@ -73,19 +87,21 @@ static void usage(const char *prog)
 		"OPTS:\n"
 		"    -S    use skb-mode\n"
 		"    -N    enforce native mode\n"
-		"    -F    force loading prog\n",
+		"    -F    force loading prog\n"
+		"    -E	   egress path program\n",
 		prog);
 }
 
 int main(int argc, char **argv)
 {
+	struct bpf_xdp_set_link_opts opts = { .sz = sizeof(opts) };
 	struct rlimit r = {RLIM_INFINITY, RLIM_INFINITY};
 	struct bpf_prog_load_attr prog_load_attr = {
 		.prog_type	= BPF_PROG_TYPE_XDP,
 	};
 	struct bpf_prog_info info = {};
 	__u32 info_len = sizeof(info);
-	const char *optstr = "FSN";
+	const char *optstr = "FSNE";
 	int prog_fd, map_fd, opt;
 	struct bpf_object *obj;
 	struct bpf_map *map;
@@ -103,6 +119,9 @@ int main(int argc, char **argv)
 		case 'F':
 			xdp_flags &= ~XDP_FLAGS_UPDATE_IF_NOEXIST;
 			break;
+		case 'E':
+			egress = true;
+			break;
 		default:
 			usage(basename(argv[0]));
 			return 1;
@@ -130,6 +149,10 @@ int main(int argc, char **argv)
 
 	snprintf(filename, sizeof(filename), "%s_kern.o", argv[0]);
 	prog_load_attr.file = filename;
+	if (egress) {
+		opts.egress = 1;
+		prog_load_attr.expected_attach_type = BPF_XDP_EGRESS;
+	}
 
 	if (bpf_prog_load_xattr(&prog_load_attr, &obj, &prog_fd))
 		return 1;
@@ -149,7 +172,8 @@ int main(int argc, char **argv)
 	signal(SIGINT, int_exit);
 	signal(SIGTERM, int_exit);
 
-	if (bpf_set_link_xdp_fd(ifindex, prog_fd, xdp_flags) < 0) {
+	err = bpf_set_link_xdp_fd_opts(ifindex, prog_fd, xdp_flags, &opts);
+	if (err < 0) {
 		printf("link set xdp fd failed\n");
 		return 1;
 	}
-- 
2.21.1 (Apple Git-122.3)



* Re: [PATCH RFC-v5 bpf-next 12/12] samples/bpf: add XDP egress support to xdp1
  2020-04-13 17:18 ` [PATCH RFC-v5 bpf-next 12/12] samples/bpf: add XDP egress support to xdp1 David Ahern
@ 2020-04-15 15:01   ` Alexei Starovoitov
  0 siblings, 0 replies; 27+ messages in thread
From: Alexei Starovoitov @ 2020-04-15 15:01 UTC (permalink / raw)
  To: David Ahern
  Cc: netdev, davem, kuba, prashantbhole.linux, jasowang, brouer, toke,
	toshiaki.makita1, daniel, john.fastabend, ast, kafai,
	songliubraving, yhs, andriin, dsahern, David Ahern

On Mon, Apr 13, 2020 at 11:18:01AM -0600, David Ahern wrote:
> From: David Ahern <dahern@digitalocean.com>
> 
> xdp1 and xdp2 now accept -E flag to set XDP program in the egress
> path.
> 
> Signed-off-by: Prashant Bhole <prashantbhole.linux@gmail.com>
> Signed-off-by: David Ahern <dahern@digitalocean.com>
> ---
>  samples/bpf/xdp1_user.c | 38 +++++++++++++++++++++++++++++++-------
>  1 file changed, 31 insertions(+), 7 deletions(-)

samples are great to see, but not enough. selftests/bpf is _mandatory_
for any new feature.


* Re: [PATCH RFC-v5 bpf-next 00/12] Add support for XDP in egress path
  2020-04-13 17:17 [PATCH RFC-v5 bpf-next 00/12] Add support for XDP in egress path David Ahern
                   ` (11 preceding siblings ...)
  2020-04-13 17:18 ` [PATCH RFC-v5 bpf-next 12/12] samples/bpf: add XDP egress support to xdp1 David Ahern
@ 2020-04-16 13:59 ` Toke Høiland-Jørgensen
  2020-04-16 23:55   ` David Ahern
  12 siblings, 1 reply; 27+ messages in thread
From: Toke Høiland-Jørgensen @ 2020-04-16 13:59 UTC (permalink / raw)
  To: David Ahern, netdev
  Cc: davem, kuba, prashantbhole.linux, jasowang, brouer,
	toshiaki.makita1, daniel, john.fastabend, ast, kafai,
	songliubraving, yhs, andriin, dsahern

David Ahern <dsahern@kernel.org> writes:

> From: David Ahern <dsahern@gmail.com>
>
> This series adds support for XDP in the egress path by introducing
> a new XDP attachment type, BPF_XDP_EGRESS, and adding a UAPI to
> if_link.h for attaching the program to a netdevice and reporting
> the program. bpf programs can be run on all packets in the Tx path -
> skbs or redirected xdp frames. The intent is to emulate the current
> RX path for XDP as much as possible to maintain consistency and
> symmetry in the 2 paths with their APIs.
>
> This is a missing primitive for XDP allowing solutions to build small,
> targeted programs properly distributed in the networking path allowing,
> for example, an egress firewall/ACL/traffic verification or packet
> manipulation and encapping an entire ethernet frame whether it is
> locally generated traffic, forwarded via the slow path (ie., full
> stack processing) or xdp redirected frames.
>
> Nothing about running a program in the Tx path requires driver specific
> resources like the Rx path has. Thus, programs can be run in core
> code and attached to the net_device struct similar to skb mode. The
> existing XDP_FLAGS_*_MODE are not relevant at the moment, so none can
> be set in the attach. XDP_FLAGS_HW_MODE can be used in the future
> (e.g., the work on offloading programs from a VM).
>
> The locations chosen to run the egress program - __netdev_start_xmit
> before the call to ndo_start_xmit and bq_xmit_all before invoking
> ndo_xdp_xmit - allow follow on patch sets to handle tx queueing and
> setting the queue index if multi-queue with consistency in handling
> both packet formats.

I like the choice of hook points. It is interesting that it implies that
there will not be a separate "XDP generic" hook on egress. And it's
certainly a benefit to not have to change all the drivers. So that's
good :)

I also think it'll be possible to get the information we want (such as
TXQ fill level) at the places you put the hooks. For the skb case
through struct netdev_queue and BQL, and for REDIRECT presumably with
Magnus' queue abstraction once that lands. So overall I think we're
getting there :)

I'll add a few more comments for each patch...

-Toke



* Re: [PATCH RFC-v5 bpf-next 05/12] net: core: rename netif_receive_generic_xdp to do_generic_xdp_core
  2020-04-13 17:17 ` [PATCH RFC-v5 bpf-next 05/12] net: core: rename netif_receive_generic_xdp to do_generic_xdp_core David Ahern
@ 2020-04-16 14:00   ` Toke Høiland-Jørgensen
  0 siblings, 0 replies; 27+ messages in thread
From: Toke Høiland-Jørgensen @ 2020-04-16 14:00 UTC (permalink / raw)
  To: David Ahern, netdev
  Cc: davem, kuba, prashantbhole.linux, jasowang, brouer,
	toshiaki.makita1, daniel, john.fastabend, ast, kafai,
	songliubraving, yhs, andriin, dsahern, David Ahern

David Ahern <dsahern@kernel.org> writes:

> From: David Ahern <dahern@digitalocean.com>
>
> In skb generic path, we need a way to run XDP program on skb but
> to have customized handling of XDP actions. netif_receive_generic_xdp
> will be more helpful in such cases than do_xdp_generic.
>
> This patch prepares netif_receive_generic_xdp() to be used as general
> purpose function for running xdp programs on skbs by renaming it to
> do_xdp_generic_core, moving skb_is_redirected and rxq settings as well
> as XDP return code checks to the callers.
>
> This allows this core function to be used from both Rx and Tx paths
> with rxq and txq set based on context.
>
> Signed-off-by: Jason Wang <jasowang@redhat.com>
> Signed-off-by: Prashant Bhole <prashantbhole.linux@gmail.com>
> Signed-off-by: David Ahern <dahern@digitalocean.com>
> ---
>  net/core/dev.c | 52 ++++++++++++++++++++++++--------------------------
>  1 file changed, 25 insertions(+), 27 deletions(-)
>
> diff --git a/net/core/dev.c b/net/core/dev.c
> index e763b6cea8ff..4f0c4fee1125 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -4500,25 +4500,17 @@ static struct netdev_rx_queue *netif_get_rxqueue(struct sk_buff *skb)
>  	return rxqueue;
>  }
>  
> -static u32 netif_receive_generic_xdp(struct sk_buff *skb,
> -				     struct xdp_buff *xdp,
> -				     struct bpf_prog *xdp_prog)
> +static u32 do_xdp_generic_core(struct sk_buff *skb, struct xdp_buff *xdp,
> +			       struct bpf_prog *xdp_prog)
>  {
> -	struct netdev_rx_queue *rxqueue;
>  	void *orig_data, *orig_data_end;
> -	u32 metalen, act = XDP_DROP;
>  	__be16 orig_eth_type;
>  	struct ethhdr *eth;
> +	u32 metalen, act;
>  	bool orig_bcast;
>  	int hlen, off;
>  	u32 mac_len;
>  
> -	/* Reinjected packets coming from act_mirred or similar should
> -	 * not get XDP generic processing.
> -	 */
> -	if (skb_is_redirected(skb))
> -		return XDP_PASS;
> -
>  	/* XDP packets must be linear and must have sufficient headroom
>  	 * of XDP_PACKET_HEADROOM bytes. This is the guarantee that also
>  	 * native XDP provides, thus we need to do it here as well.
> @@ -4534,9 +4526,9 @@ static u32 netif_receive_generic_xdp(struct sk_buff *skb,
>  		if (pskb_expand_head(skb,
>  				     hroom > 0 ? ALIGN(hroom, NET_SKB_PAD) : 0,
>  				     troom > 0 ? troom + 128 : 0, GFP_ATOMIC))
> -			goto do_drop;
> +			return XDP_DROP;
>  		if (skb_linearize(skb))
> -			goto do_drop;
> +			return XDP_DROP;
>  	}
>  
>  	/* The XDP program wants to see the packet starting at the MAC
> @@ -4554,9 +4546,6 @@ static u32 netif_receive_generic_xdp(struct sk_buff *skb,
>  	orig_bcast = is_multicast_ether_addr_64bits(eth->h_dest);
>  	orig_eth_type = eth->h_proto;
>  
> -	rxqueue = netif_get_rxqueue(skb);
> -	xdp->rxq = &rxqueue->xdp_rxq;
> -
>  	act = bpf_prog_run_xdp(xdp_prog, xdp);
>  
>  	/* check if bpf_xdp_adjust_head was used */
> @@ -4599,16 +4588,6 @@ static u32 netif_receive_generic_xdp(struct sk_buff *skb,
>  		if (metalen)
>  			skb_metadata_set(skb, metalen);
>  		break;
> -	default:
> -		bpf_warn_invalid_xdp_action(act);
> -		/* fall through */
> -	case XDP_ABORTED:
> -		trace_xdp_exception(skb->dev, xdp_prog, act);
> -		/* fall through */
> -	case XDP_DROP:
> -	do_drop:
> -		kfree_skb(skb);
> -		break;
>  	}
>  
>  	return act;
> @@ -4643,12 +4622,22 @@ static DEFINE_STATIC_KEY_FALSE(generic_xdp_needed_key);
>  
>  int do_xdp_generic(struct bpf_prog *xdp_prog, struct sk_buff *skb)
>  {
> +	/* Reinjected packets coming from act_mirred or similar should
> +	 * not get XDP generic processing.
> +	 */

My immediate thought when reading this was "wait, we're doing TX now, is
this still true?". And then I saw the next patch where you're renaming
the function; so maybe switch those two patches, or merge them?

-Toke



* Re: [PATCH RFC-v5 bpf-next 02/12] net: Add BPF_XDP_EGRESS as a bpf_attach_type
  2020-04-13 17:17 ` [PATCH RFC-v5 bpf-next 02/12] net: Add BPF_XDP_EGRESS as a bpf_attach_type David Ahern
@ 2020-04-16 14:01   ` Toke Høiland-Jørgensen
  2020-04-16 16:35     ` David Ahern
  0 siblings, 1 reply; 27+ messages in thread
From: Toke Høiland-Jørgensen @ 2020-04-16 14:01 UTC (permalink / raw)
  To: David Ahern, netdev
  Cc: davem, kuba, prashantbhole.linux, jasowang, brouer,
	toshiaki.makita1, daniel, john.fastabend, ast, kafai,
	songliubraving, yhs, andriin, dsahern, David Ahern

David Ahern <dsahern@kernel.org> writes:

> From: David Ahern <dahern@digitalocean.com>
>
> Add new bpf_attach_type, BPF_XDP_EGRESS, for BPF programs attached
> at the XDP layer, but the egress path.
>
> Since egress path will not have ingress_ifindex and rx_queue_index
> set, update xdp_is_valid_access to block access to these entries in
> the xdp context when a program is attached to egress path.
>
> Update dev_change_xdp_fd to verify expected_attach_type for a program
> is BPF_XDP_EGRESS if egress argument is set.
>
> The next patch adds support for the egress ifindex.
>
> Signed-off-by: Prashant Bhole <prashantbhole.linux@gmail.com>
> Signed-off-by: David Ahern <dahern@digitalocean.com>
> ---
>  include/uapi/linux/bpf.h       | 1 +
>  net/core/dev.c                 | 6 ++++++
>  net/core/filter.c              | 8 ++++++++
>  tools/include/uapi/linux/bpf.h | 1 +
>  4 files changed, 16 insertions(+)
>
> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> index 2e29a671d67e..a9d384998e8b 100644
> --- a/include/uapi/linux/bpf.h
> +++ b/include/uapi/linux/bpf.h
> @@ -215,6 +215,7 @@ enum bpf_attach_type {
>  	BPF_TRACE_FEXIT,
>  	BPF_MODIFY_RETURN,
>  	BPF_LSM_MAC,
> +	BPF_XDP_EGRESS,
>  	__MAX_BPF_ATTACH_TYPE
>  };
>  
> diff --git a/net/core/dev.c b/net/core/dev.c
> index 06e0872ecdae..e763b6cea8ff 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -8731,6 +8731,12 @@ int dev_change_xdp_fd(struct net_device *dev, struct netlink_ext_ack *extack,
>  		if (IS_ERR(prog))
>  			return PTR_ERR(prog);
>  
> +		if (egress && prog->expected_attach_type != BPF_XDP_EGRESS) {
> +			NL_SET_ERR_MSG(extack, "XDP program in Tx path must use BPF_XDP_EGRESS attach type");
> +			bpf_prog_put(prog);
> +			return -EINVAL;
> +		}
> +
>  		if (!offload && bpf_prog_is_dev_bound(prog->aux)) {
>  			NL_SET_ERR_MSG(extack, "using device-bound program without HW_MODE flag is not supported");
>  			bpf_prog_put(prog);
> diff --git a/net/core/filter.c b/net/core/filter.c
> index 7628b947dbc3..c4e0e044722f 100644
> --- a/net/core/filter.c
> +++ b/net/core/filter.c
> @@ -6935,6 +6935,14 @@ static bool xdp_is_valid_access(int off, int size,
>  				const struct bpf_prog *prog,
>  				struct bpf_insn_access_aux *info)
>  {
> +	if (prog->expected_attach_type == BPF_XDP_EGRESS) {
> +		switch (off) {
> +		case offsetof(struct xdp_md, ingress_ifindex):
> +		case offsetof(struct xdp_md, rx_queue_index):
> +			return false;
> +		}
> +	}
> +

How will this be handled for freplace programs - will they also
"inherit" the expected_attach_type of the programs they attach to?

-Toke
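
Note on the access check quoted above: the following is a minimal sketch of the offset-based gating, assuming a mock context struct and a mock attach-type enum. sk_xdp_md, sk_attach and sk_xdp_is_valid_access are hypothetical names, and the generic size and bounds checks are simplified placeholders for what the verifier callback does after the egress-specific switch.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Mock of the program-visible XDP context (not the real struct xdp_md). */
struct sk_xdp_md {
	uint32_t data;
	uint32_t data_end;
	uint32_t data_meta;
	uint32_t ingress_ifindex;
	uint32_t rx_queue_index;
};

enum sk_attach {
	SK_XDP_INGRESS,
	SK_XDP_EGRESS,
};

/* Decide whether a context load at byte offset off of size bytes is allowed. */
bool sk_xdp_is_valid_access(size_t off, size_t size, enum sk_attach attach)
{
	/* Rx-only fields carry no meaning in the Tx path, so reject them. */
	if (attach == SK_XDP_EGRESS) {
		switch (off) {
		case offsetof(struct sk_xdp_md, ingress_ifindex):
		case offsetof(struct sk_xdp_md, rx_queue_index):
			return false;
		}
	}

	/* Simplified generic checks: whole, aligned, in-bounds u32 loads. */
	if (size != sizeof(uint32_t))
		return false;
	if (off % sizeof(uint32_t))
		return false;
	if (off + size > sizeof(struct sk_xdp_md))
		return false;

	return true;
}
```

Because the decision is keyed on the attach type recorded at load time, the same program bytes can pass verification for ingress and fail for egress, which is why the attach path also has to confirm that the recorded type matches the hook.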



* Re: [PATCH RFC-v5 bpf-next 09/12] dev: Support xdp in the Tx path for xdp_frames
  2020-04-13 17:17 ` [PATCH RFC-v5 bpf-next 09/12] dev: Support xdp in the Tx path for xdp_frames David Ahern
@ 2020-04-16 14:02   ` Toke Høiland-Jørgensen
  2020-04-16 23:50     ` David Ahern
  2020-04-17  8:30   ` Jesper Dangaard Brouer
  1 sibling, 1 reply; 27+ messages in thread
From: Toke Høiland-Jørgensen @ 2020-04-16 14:02 UTC (permalink / raw)
  To: David Ahern, netdev
  Cc: davem, kuba, prashantbhole.linux, jasowang, brouer,
	toshiaki.makita1, daniel, john.fastabend, ast, kafai,
	songliubraving, yhs, andriin, dsahern, David Ahern

David Ahern <dsahern@kernel.org> writes:

> From: David Ahern <dahern@digitalocean.com>
>
> Add support to run Tx path program on xdp_frames by adding a hook to
> bq_xmit_all before xdp_frames are passed to ndo_xdp_xmit for the device.
>
> If an xdp_frame is dropped by the program, it is removed from the
> xdp_frames array with subsequent entries moved up.
>
> Signed-off-by: David Ahern <dahern@digitalocean.com>
> ---
>  include/linux/netdevice.h |  3 ++
>  kernel/bpf/devmap.c       | 19 ++++++++---
>  net/core/dev.c            | 70 +++++++++++++++++++++++++++++++++++++++
>  3 files changed, 87 insertions(+), 5 deletions(-)
>
> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> index 39e1b42c042f..d75e31ac2751 100644
> --- a/include/linux/netdevice.h
> +++ b/include/linux/netdevice.h
> @@ -3715,6 +3715,9 @@ static inline void dev_consume_skb_any(struct sk_buff *skb)
>  void generic_xdp_tx(struct sk_buff *skb, struct bpf_prog *xdp_prog);
>  int do_xdp_generic_rx(struct bpf_prog *xdp_prog, struct sk_buff *skb);
>  u32 do_xdp_egress_skb(struct net_device *dev, struct sk_buff *skb);
> +unsigned int do_xdp_egress_frame(struct net_device *dev,
> +				 struct xdp_frame **frames,
> +				 unsigned int *pcount);
>  int netif_rx(struct sk_buff *skb);
>  int netif_rx_ni(struct sk_buff *skb);
>  int netif_receive_skb(struct sk_buff *skb);
> diff --git a/kernel/bpf/devmap.c b/kernel/bpf/devmap.c
> index 58bdca5d978a..bedecd07d898 100644
> --- a/kernel/bpf/devmap.c
> +++ b/kernel/bpf/devmap.c
> @@ -322,24 +322,33 @@ static int bq_xmit_all(struct xdp_dev_bulk_queue *bq, u32 flags)
>  {
>  	struct net_device *dev = bq->dev;
>  	int sent = 0, drops = 0, err = 0;
> +	unsigned int count = bq->count;
>  	int i;
>  
> -	if (unlikely(!bq->count))
> +	if (unlikely(!count))
>  		return 0;
>  
> -	for (i = 0; i < bq->count; i++) {
> +	for (i = 0; i < count; i++) {
>  		struct xdp_frame *xdpf = bq->q[i];
>  
>  		prefetch(xdpf);
>  	}
>  
> -	sent = dev->netdev_ops->ndo_xdp_xmit(dev, bq->count, bq->q, flags);
> +	if (static_branch_unlikely(&xdp_egress_needed_key)) {
> +		count = do_xdp_egress_frame(dev, bq->q, &count);

nit: seems a bit odd to pass the pointer to count, then reassign it with
the return value?

> +		drops += bq->count - count;
> +		/* all frames consumed by the xdp program? */
> +		if (!count)
> +			goto out;
> +	}
> +
> +	sent = dev->netdev_ops->ndo_xdp_xmit(dev, count, bq->q, flags);
>  	if (sent < 0) {
>  		err = sent;
>  		sent = 0;
>  		goto error;
>  	}
> -	drops = bq->count - sent;
> +	drops += count - sent;
>  out:
>  	bq->count = 0;
>  
> @@ -351,7 +360,7 @@ static int bq_xmit_all(struct xdp_dev_bulk_queue *bq, u32 flags)
>  	/* If ndo_xdp_xmit fails with an errno, no frames have been
>  	 * xmit'ed and it's our responsibility to them free all.
>  	 */
> -	for (i = 0; i < bq->count; i++) {
> +	for (i = 0; i < count; i++) {
>  		struct xdp_frame *xdpf = bq->q[i];
>  
>  		xdp_return_frame_rx_napi(xdpf);
> diff --git a/net/core/dev.c b/net/core/dev.c
> index 1bbaeb8842ed..f23dc6043329 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -4720,6 +4720,76 @@ u32 do_xdp_egress_skb(struct net_device *dev, struct sk_buff *skb)
>  }
>  EXPORT_SYMBOL_GPL(do_xdp_egress_skb);
>  
> +static u32 __xdp_egress_frame(struct net_device *dev,
> +			      struct bpf_prog *xdp_prog,
> +			      struct xdp_frame *xdp_frame,
> +			      struct xdp_txq_info *txq)
> +{
> +	struct xdp_buff xdp;
> +	u32 act;
> +
> +	xdp.data_hard_start = xdp_frame->data - xdp_frame->headroom;
> +	xdp.data = xdp_frame->data;
> +	xdp.data_end = xdp.data + xdp_frame->len;
> +	xdp_set_data_meta_invalid(&xdp);

Why invalidate the metadata? On the contrary we'd want metadata from the
RX side to survive, wouldn't we?

> +	xdp.txq = txq;
> +
> +	act = bpf_prog_run_xdp(xdp_prog, &xdp);
> +	act = handle_xdp_egress_act(act, dev, xdp_prog);
> +
> +	/* if not dropping frame, readjust pointers in case
> +	 * program made changes to the buffer
> +	 */
> +	if (act != XDP_DROP) {
> +		int headroom = xdp.data - xdp.data_hard_start;
> +		int metasize = xdp.data - xdp.data_meta;
> +
> +		metasize = metasize > 0 ? metasize : 0;
> +		if (unlikely((headroom - metasize) < sizeof(*xdp_frame)))
> +			return XDP_DROP;
> +
> +		xdp_frame = xdp.data_hard_start;
> +		xdp_frame->data = xdp.data;
> +		xdp_frame->len  = xdp.data_end - xdp.data;
> +		xdp_frame->headroom = headroom - sizeof(*xdp_frame);
> +		xdp_frame->metasize = metasize;
> +		/* xdp_frame->mem is unchanged */
> +	}
> +
> +	return act;
> +}
> +
> +unsigned int do_xdp_egress_frame(struct net_device *dev,
> +				 struct xdp_frame **frames,
> +				 unsigned int *pcount)
> +{
> +	struct bpf_prog *xdp_prog;
> +	unsigned int count = *pcount;
> +
> +	xdp_prog = rcu_dereference(dev->xdp_egress_prog);
> +	if (xdp_prog) {
> +		struct xdp_txq_info txq = { .dev = dev };

Do you have any thoughts on how to populate this for the redirect case?
I guess using Magnus' HWQ abstraction when that lands? Or did you have
something different in mind?

> +		unsigned int i, j;
> +		u32 act;
> +
> +		for (i = 0, j = 0; i < count; i++) {
> +			struct xdp_frame *frame = frames[i];
> +
> +			act = __xdp_egress_frame(dev, xdp_prog, frame, &txq);
> +			if (act == XDP_DROP) {
> +				xdp_return_frame_rx_napi(frame);
> +				continue;
> +			}
> +
> +			frames[j] = frame;
> +			j++;
> +		}
> +		count = j;
> +	}
> +
> +	return count;
> +}
> +
>  static int netif_rx_internal(struct sk_buff *skb)
>  {
>  	int ret;
> -- 
> 2.21.1 (Apple Git-122.3)



* Re: [PATCH RFC-v5 bpf-next 02/12] net: Add BPF_XDP_EGRESS as a bpf_attach_type
  2020-04-16 14:01   ` Toke Høiland-Jørgensen
@ 2020-04-16 16:35     ` David Ahern
  2020-04-17  9:23       ` Toke Høiland-Jørgensen
  0 siblings, 1 reply; 27+ messages in thread
From: David Ahern @ 2020-04-16 16:35 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen, David Ahern, netdev
  Cc: davem, kuba, prashantbhole.linux, jasowang, brouer,
	toshiaki.makita1, daniel, john.fastabend, ast, kafai,
	songliubraving, yhs, andriin, David Ahern

On 4/16/20 8:01 AM, Toke Høiland-Jørgensen wrote:
> David Ahern <dsahern@kernel.org> writes:
> 
>> From: David Ahern <dahern@digitalocean.com>
>>
>> Add new bpf_attach_type, BPF_XDP_EGRESS, for BPF programs attached
>> at the XDP layer, but the egress path.
>>
>> Since egress path will not have ingress_ifindex and rx_queue_index
>> set, update xdp_is_valid_access to block access to these entries in
>> the xdp context when a program is attached to egress path.
>>
>> Update dev_change_xdp_fd to verify expected_attach_type for a program
>> is BPF_XDP_EGRESS if egress argument is set.
>>
>> The next patch adds support for the egress ifindex.
>>
>> Signed-off-by: Prashant Bhole <prashantbhole.linux@gmail.com>
>> Signed-off-by: David Ahern <dahern@digitalocean.com>
>> ---
>>  include/uapi/linux/bpf.h       | 1 +
>>  net/core/dev.c                 | 6 ++++++
>>  net/core/filter.c              | 8 ++++++++
>>  tools/include/uapi/linux/bpf.h | 1 +
>>  4 files changed, 16 insertions(+)
>>
>> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
>> index 2e29a671d67e..a9d384998e8b 100644
>> --- a/include/uapi/linux/bpf.h
>> +++ b/include/uapi/linux/bpf.h
>> @@ -215,6 +215,7 @@ enum bpf_attach_type {
>>  	BPF_TRACE_FEXIT,
>>  	BPF_MODIFY_RETURN,
>>  	BPF_LSM_MAC,
>> +	BPF_XDP_EGRESS,
>>  	__MAX_BPF_ATTACH_TYPE
>>  };
>>  
>> diff --git a/net/core/dev.c b/net/core/dev.c
>> index 06e0872ecdae..e763b6cea8ff 100644
>> --- a/net/core/dev.c
>> +++ b/net/core/dev.c
>> @@ -8731,6 +8731,12 @@ int dev_change_xdp_fd(struct net_device *dev, struct netlink_ext_ack *extack,
>>  		if (IS_ERR(prog))
>>  			return PTR_ERR(prog);
>>  
>> +		if (egress && prog->expected_attach_type != BPF_XDP_EGRESS) {
>> +			NL_SET_ERR_MSG(extack, "XDP program in Tx path must use BPF_XDP_EGRESS attach type");
>> +			bpf_prog_put(prog);
>> +			return -EINVAL;
>> +		}
>> +
>>  		if (!offload && bpf_prog_is_dev_bound(prog->aux)) {
>>  			NL_SET_ERR_MSG(extack, "using device-bound program without HW_MODE flag is not supported");
>>  			bpf_prog_put(prog);
>> diff --git a/net/core/filter.c b/net/core/filter.c
>> index 7628b947dbc3..c4e0e044722f 100644
>> --- a/net/core/filter.c
>> +++ b/net/core/filter.c
>> @@ -6935,6 +6935,14 @@ static bool xdp_is_valid_access(int off, int size,
>>  				const struct bpf_prog *prog,
>>  				struct bpf_insn_access_aux *info)
>>  {
>> +	if (prog->expected_attach_type == BPF_XDP_EGRESS) {
>> +		switch (off) {
>> +		case offsetof(struct xdp_md, ingress_ifindex):
>> +		case offsetof(struct xdp_md, rx_queue_index):
>> +			return false;
>> +		}
>> +	}
>> +
> 
> How will this be handled for freplace programs - will they also
> "inherit" the expected_attach_type of the programs they attach to?
> 

not sure I understand your point. This is not the first program type to
have an expected_attach_type; it should work the same way others do -
e.g., cgroup program types.


* Re: [PATCH RFC-v5 bpf-next 09/12] dev: Support xdp in the Tx path for xdp_frames
  2020-04-16 14:02   ` Toke Høiland-Jørgensen
@ 2020-04-16 23:50     ` David Ahern
  2020-04-17  9:25       ` Toke Høiland-Jørgensen
  0 siblings, 1 reply; 27+ messages in thread
From: David Ahern @ 2020-04-16 23:50 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen, David Ahern, netdev
  Cc: davem, kuba, prashantbhole.linux, jasowang, brouer,
	toshiaki.makita1, daniel, john.fastabend, ast, kafai,
	songliubraving, yhs, andriin, David Ahern

On 4/16/20 8:02 AM, Toke Høiland-Jørgensen wrote:
>> diff --git a/kernel/bpf/devmap.c b/kernel/bpf/devmap.c
>> index 58bdca5d978a..bedecd07d898 100644
>> --- a/kernel/bpf/devmap.c
>> +++ b/kernel/bpf/devmap.c
>> @@ -322,24 +322,33 @@ static int bq_xmit_all(struct xdp_dev_bulk_queue *bq, u32 flags)
>>  {
>>  	struct net_device *dev = bq->dev;
>>  	int sent = 0, drops = 0, err = 0;
>> +	unsigned int count = bq->count;
>>  	int i;
>>  
>> -	if (unlikely(!bq->count))
>> +	if (unlikely(!count))
>>  		return 0;
>>  
>> -	for (i = 0; i < bq->count; i++) {
>> +	for (i = 0; i < count; i++) {
>>  		struct xdp_frame *xdpf = bq->q[i];
>>  
>>  		prefetch(xdpf);
>>  	}
>>  
>> -	sent = dev->netdev_ops->ndo_xdp_xmit(dev, bq->count, bq->q, flags);
>> +	if (static_branch_unlikely(&xdp_egress_needed_key)) {
>> +		count = do_xdp_egress_frame(dev, bq->q, &count);
> 
> nit: seems a bit odd to pass the pointer to count, then reassign it with
> the return value?

thanks for noticing that. leftover from the evolution of this. changed to
		count = do_xdp_egress_frame(dev, bq->q, count);


>> diff --git a/net/core/dev.c b/net/core/dev.c
>> index 1bbaeb8842ed..f23dc6043329 100644
>> --- a/net/core/dev.c
>> +++ b/net/core/dev.c
>> @@ -4720,6 +4720,76 @@ u32 do_xdp_egress_skb(struct net_device *dev, struct sk_buff *skb)
>>  }
>>  EXPORT_SYMBOL_GPL(do_xdp_egress_skb);
>>  
>> +static u32 __xdp_egress_frame(struct net_device *dev,
>> +			      struct bpf_prog *xdp_prog,
>> +			      struct xdp_frame *xdp_frame,
>> +			      struct xdp_txq_info *txq)
>> +{
>> +	struct xdp_buff xdp;
>> +	u32 act;
>> +
>> +	xdp.data_hard_start = xdp_frame->data - xdp_frame->headroom;
>> +	xdp.data = xdp_frame->data;
>> +	xdp.data_end = xdp.data + xdp_frame->len;
>> +	xdp_set_data_meta_invalid(&xdp);
> 
> Why invalidate the metadata? On the contrary we'd want metadata from the
> RX side to survive, wouldn't we?

Right, replaced with:
	xdp.data_meta = xdp.data - xdp_frame->metasize;

> 
>> +	xdp.txq = txq;
>> +
>> +	act = bpf_prog_run_xdp(xdp_prog, &xdp);
>> +	act = handle_xdp_egress_act(act, dev, xdp_prog);
>> +
>> +	/* if not dropping frame, readjust pointers in case
>> +	 * program made changes to the buffer
>> +	 */
>> +	if (act != XDP_DROP) {
>> +		int headroom = xdp.data - xdp.data_hard_start;
>> +		int metasize = xdp.data - xdp.data_meta;
>> +
>> +		metasize = metasize > 0 ? metasize : 0;
>> +		if (unlikely((headroom - metasize) < sizeof(*xdp_frame)))
>> +			return XDP_DROP;
>> +
>> +		xdp_frame = xdp.data_hard_start;
>> +		xdp_frame->data = xdp.data;
>> +		xdp_frame->len  = xdp.data_end - xdp.data;
>> +		xdp_frame->headroom = headroom - sizeof(*xdp_frame);
>> +		xdp_frame->metasize = metasize;
>> +		/* xdp_frame->mem is unchanged */
>> +	}
>> +
>> +	return act;
>> +}
>> +
>> +unsigned int do_xdp_egress_frame(struct net_device *dev,
>> +				 struct xdp_frame **frames,
>> +				 unsigned int *pcount)
>> +{
>> +	struct bpf_prog *xdp_prog;
>> +	unsigned int count = *pcount;
>> +
>> +	xdp_prog = rcu_dereference(dev->xdp_egress_prog);
>> +	if (xdp_prog) {
>> +		struct xdp_txq_info txq = { .dev = dev };
> 
> Do you have any thoughts on how to populate this for the redirect case?

Not sure I understand. This is the redirect case, i.e., on Rx a program
is run, XDP_REDIRECT is returned and the packet is queued. Once the
queue fills or a flush is done, bq_xmit_all is called to send the frames.

> I guess using Magnus' HWQ abstraction when that lands? Or did you have
> something different in mind?
> 


* Re: [PATCH RFC-v5 bpf-next 00/12] Add support for XDP in egress path
  2020-04-16 13:59 ` [PATCH RFC-v5 bpf-next 00/12] Add support for XDP in egress path Toke Høiland-Jørgensen
@ 2020-04-16 23:55   ` David Ahern
  2020-04-17  9:28     ` Toke Høiland-Jørgensen
  0 siblings, 1 reply; 27+ messages in thread
From: David Ahern @ 2020-04-16 23:55 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen, David Ahern, netdev
  Cc: davem, kuba, prashantbhole.linux, jasowang, brouer,
	toshiaki.makita1, daniel, john.fastabend, ast, kafai,
	songliubraving, yhs, andriin

On 4/16/20 7:59 AM, Toke Høiland-Jørgensen wrote:
> 
> I like the choice of hook points. It is interesting that it implies that
> there will not be a separate "XDP generic" hook on egress. And it's
> certainly a benefit to not have to change all the drivers. So that's
> good :)
> 
> I also think it'll be possible to get the information we want (such as
> TXQ fill level) at the places you put the hooks. For the skb case
> through struct netdev_queue and BQL, and for REDIRECT presumably with
> Magnus' queue abstraction once that lands. So overall I think we're
> getting there :)
> 
> I'll add a few more comments for each patch...
> 

thanks for reviewing.

FYI, I somehow left out a refactoring patch when generating the patches to
send out. It moves the existing tb[IFLA_XDP] handling to a helper
that can be reused for tb[IFLA_XDP_EGRESS]:

https://github.com/dsahern/linux/commit/71011b5cf6f8c1bca28a6afe5a92be59152a8219


* Re: [PATCH RFC-v5 bpf-next 09/12] dev: Support xdp in the Tx path for xdp_frames
  2020-04-13 17:17 ` [PATCH RFC-v5 bpf-next 09/12] dev: Support xdp in the Tx path for xdp_frames David Ahern
  2020-04-16 14:02   ` Toke Høiland-Jørgensen
@ 2020-04-17  8:30   ` Jesper Dangaard Brouer
  2020-04-17 21:29     ` David Ahern
  1 sibling, 1 reply; 27+ messages in thread
From: Jesper Dangaard Brouer @ 2020-04-17  8:30 UTC (permalink / raw)
  To: David Ahern
  Cc: netdev, davem, kuba, prashantbhole.linux, jasowang, toke,
	toshiaki.makita1, daniel, john.fastabend, ast, kafai,
	songliubraving, yhs, andriin, dsahern, David Ahern, brouer

On Mon, 13 Apr 2020 11:17:58 -0600
David Ahern <dsahern@kernel.org> wrote:

> From: David Ahern <dahern@digitalocean.com>
> 
> Add support to run Tx path program on xdp_frames by adding a hook to
> bq_xmit_all before xdp_frames are passed to ndo_xdp_xmit for the device.
> 
> If an xdp_frame is dropped by the program, it is removed from the
> xdp_frames array with subsequent entries moved up.
> 
> Signed-off-by: David Ahern <dahern@digitalocean.com>
> ---
>  include/linux/netdevice.h |  3 ++
>  kernel/bpf/devmap.c       | 19 ++++++++---
>  net/core/dev.c            | 70 +++++++++++++++++++++++++++++++++++++++
>  3 files changed, 87 insertions(+), 5 deletions(-)
> 
[...]
> diff --git a/net/core/dev.c b/net/core/dev.c
> index 1bbaeb8842ed..f23dc6043329 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -4720,6 +4720,76 @@ u32 do_xdp_egress_skb(struct net_device *dev, struct sk_buff *skb)
>  }
>  EXPORT_SYMBOL_GPL(do_xdp_egress_skb);
>  
> +static u32 __xdp_egress_frame(struct net_device *dev,
> +			      struct bpf_prog *xdp_prog,
> +			      struct xdp_frame *xdp_frame,
> +			      struct xdp_txq_info *txq)
> +{
> +	struct xdp_buff xdp;
> +	u32 act;
> +
> +	xdp.data_hard_start = xdp_frame->data - xdp_frame->headroom;

You also need: minus sizeof(*xdp_frame).

The BPF helper bpf_xdp_adjust_head will not allow the BPF program to
access the memory area that is used for xdp_frame, so it is still safe.


> +	xdp.data = xdp_frame->data;
> +	xdp.data_end = xdp.data + xdp_frame->len;
> +	xdp_set_data_meta_invalid(&xdp);
> +	xdp.txq = txq;

I think this will be the 3rd place we convert an xdp_frame to an xdp_buff;
perhaps we should introduce a helper function.

> +	act = bpf_prog_run_xdp(xdp_prog, &xdp);
> +	act = handle_xdp_egress_act(act, dev, xdp_prog);
> +
> +	/* if not dropping frame, readjust pointers in case
> +	 * program made changes to the buffer
> +	 */
> +	if (act != XDP_DROP) {
> +		int headroom = xdp.data - xdp.data_hard_start;
> +		int metasize = xdp.data - xdp.data_meta;
> +
> +		metasize = metasize > 0 ? metasize : 0;
> +		if (unlikely((headroom - metasize) < sizeof(*xdp_frame)))
> +			return XDP_DROP;
> +
> +		xdp_frame = xdp.data_hard_start;

Is this needed?

> +		xdp_frame->data = xdp.data;
> +		xdp_frame->len  = xdp.data_end - xdp.data;
> +		xdp_frame->headroom = headroom - sizeof(*xdp_frame);
> +		xdp_frame->metasize = metasize;
> +		/* xdp_frame->mem is unchanged */

This looks very similar to convert_to_xdp_frame.
Maybe we need a central update_xdp_frame(xdp_buff) call?


Untested code-up:

diff --git a/include/net/xdp.h b/include/net/xdp.h
index 40c6d3398458..180800c4e7d1 100644
--- a/include/net/xdp.h
+++ b/include/net/xdp.h
@@ -93,6 +93,24 @@ static inline void xdp_scrub_frame(struct xdp_frame *frame)
 
 struct xdp_frame *xdp_convert_zc_to_xdp_frame(struct xdp_buff *xdp);
 
+static inline
+bool update_xdp_frame(struct xdp_buff *xdp, struct xdp_frame *xdp_frame)
+{
+       /* Assure headroom is available for storing info */
+       int headroom = xdp->data - xdp->data_hard_start;
+       int metasize = xdp->data - xdp->data_meta;
+       metasize = metasize > 0 ? metasize : 0;
+       if (unlikely((headroom - metasize) < sizeof(*xdp_frame)))
+               return false;
+
+       xdp_frame->data = xdp->data;
+       xdp_frame->len  = xdp->data_end - xdp->data;
+       xdp_frame->headroom = headroom - sizeof(*xdp_frame);
+       xdp_frame->metasize = metasize;
+
+       return true;
+}
+
 /* Convert xdp_buff to xdp_frame */
 static inline
 struct xdp_frame *convert_to_xdp_frame(struct xdp_buff *xdp)
@@ -104,20 +122,11 @@ struct xdp_frame *convert_to_xdp_frame(struct xdp_buff *xdp)
        if (xdp->rxq->mem.type == MEM_TYPE_ZERO_COPY)
                return xdp_convert_zc_to_xdp_frame(xdp);
 
-       /* Assure headroom is available for storing info */
-       headroom = xdp->data - xdp->data_hard_start;
-       metasize = xdp->data - xdp->data_meta;
-       metasize = metasize > 0 ? metasize : 0;
-       if (unlikely((headroom - metasize) < sizeof(*xdp_frame)))
-               return NULL;
-
        /* Store info in top of packet */
        xdp_frame = xdp->data_hard_start;
 
-       xdp_frame->data = xdp->data;
-       xdp_frame->len  = xdp->data_end - xdp->data;
-       xdp_frame->headroom = headroom - sizeof(*xdp_frame);
-       xdp_frame->metasize = metasize;
+       if (unlikely(!update_xdp_frame(xdp, xdp_frame)))
+           return NULL;
 
        /* rxq only valid until napi_schedule ends, convert to xdp_mem_info */
        xdp_frame->mem = xdp->rxq->mem;

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer



* Re: [PATCH RFC-v5 bpf-next 02/12] net: Add BPF_XDP_EGRESS as a bpf_attach_type
  2020-04-16 16:35     ` David Ahern
@ 2020-04-17  9:23       ` Toke Høiland-Jørgensen
  0 siblings, 0 replies; 27+ messages in thread
From: Toke Høiland-Jørgensen @ 2020-04-17  9:23 UTC (permalink / raw)
  To: David Ahern, David Ahern, netdev
  Cc: davem, kuba, prashantbhole.linux, jasowang, brouer,
	toshiaki.makita1, daniel, john.fastabend, ast, kafai,
	songliubraving, yhs, andriin, David Ahern

David Ahern <dsahern@gmail.com> writes:

> On 4/16/20 8:01 AM, Toke Høiland-Jørgensen wrote:
>> David Ahern <dsahern@kernel.org> writes:
>> 
>>> From: David Ahern <dahern@digitalocean.com>
>>>
>>> Add new bpf_attach_type, BPF_XDP_EGRESS, for BPF programs attached
>>> at the XDP layer, but the egress path.
>>>
>>> Since egress path will not have ingress_ifindex and rx_queue_index
>>> set, update xdp_is_valid_access to block access to these entries in
>>> the xdp context when a program is attached to egress path.
>>>
>>> Update dev_change_xdp_fd to verify expected_attach_type for a program
>>> is BPF_XDP_EGRESS if egress argument is set.
>>>
>>> The next patch adds support for the egress ifindex.
>>>
>>> Signed-off-by: Prashant Bhole <prashantbhole.linux@gmail.com>
>>> Signed-off-by: David Ahern <dahern@digitalocean.com>
>>> ---
>>>  include/uapi/linux/bpf.h       | 1 +
>>>  net/core/dev.c                 | 6 ++++++
>>>  net/core/filter.c              | 8 ++++++++
>>>  tools/include/uapi/linux/bpf.h | 1 +
>>>  4 files changed, 16 insertions(+)
>>>
>>> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
>>> index 2e29a671d67e..a9d384998e8b 100644
>>> --- a/include/uapi/linux/bpf.h
>>> +++ b/include/uapi/linux/bpf.h
>>> @@ -215,6 +215,7 @@ enum bpf_attach_type {
>>>  	BPF_TRACE_FEXIT,
>>>  	BPF_MODIFY_RETURN,
>>>  	BPF_LSM_MAC,
>>> +	BPF_XDP_EGRESS,
>>>  	__MAX_BPF_ATTACH_TYPE
>>>  };
>>>  
>>> diff --git a/net/core/dev.c b/net/core/dev.c
>>> index 06e0872ecdae..e763b6cea8ff 100644
>>> --- a/net/core/dev.c
>>> +++ b/net/core/dev.c
>>> @@ -8731,6 +8731,12 @@ int dev_change_xdp_fd(struct net_device *dev, struct netlink_ext_ack *extack,
>>>  		if (IS_ERR(prog))
>>>  			return PTR_ERR(prog);
>>>  
>>> +		if (egress && prog->expected_attach_type != BPF_XDP_EGRESS) {
>>> +			NL_SET_ERR_MSG(extack, "XDP program in Tx path must use BPF_XDP_EGRESS attach type");
>>> +			bpf_prog_put(prog);
>>> +			return -EINVAL;
>>> +		}
>>> +
>>>  		if (!offload && bpf_prog_is_dev_bound(prog->aux)) {
>>>  			NL_SET_ERR_MSG(extack, "using device-bound program without HW_MODE flag is not supported");
>>>  			bpf_prog_put(prog);
>>> diff --git a/net/core/filter.c b/net/core/filter.c
>>> index 7628b947dbc3..c4e0e044722f 100644
>>> --- a/net/core/filter.c
>>> +++ b/net/core/filter.c
>>> @@ -6935,6 +6935,14 @@ static bool xdp_is_valid_access(int off, int size,
>>>  				const struct bpf_prog *prog,
>>>  				struct bpf_insn_access_aux *info)
>>>  {
>>> +	if (prog->expected_attach_type == BPF_XDP_EGRESS) {
>>> +		switch (off) {
>>> +		case offsetof(struct xdp_md, ingress_ifindex):
>>> +		case offsetof(struct xdp_md, rx_queue_index):
>>> +			return false;
>>> +		}
>>> +	}
>>> +
>> 
>> How will this be handled for freplace programs - will they also
>> "inherit" the expected_attach_type of the programs they attach to?
>> 
>
> not sure I understand your point. This is not the first program type to
> have an expected_attach_type; it should work the same way others do -
> e.g., cgroup program types.

When attaching an freplace prog, the verifier will update the verifier
ops it uses with those of the target XDP program (in
check_attach_btf_id()). This is what makes the freplace program get
verified as if it were itself an XDP program.

However, the freplace program itself cannot use expected_attach_type
(see the check in bpf_tracing_prog_attach()). So I don't think this check
based on prog->expected_attach_type is going to work for freplace
programs?
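
To make the concern concrete, here is a toy model (not kernel code) of
an access check keyed on expected_attach_type. The types struct ctx and
struct prog, the enum values and model_is_valid_access are all
hypothetical stand-ins; the only assumption carried over from the above
is that an freplace program reaches the check with its own
expected_attach_type left at zero.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only; these are not the kernel definitions. */
enum attach_type { ATTACH_NONE = 0, ATTACH_XDP_EGRESS };

struct ctx {			/* stand-in for struct xdp_md */
	unsigned int data;
	unsigned int data_end;
	unsigned int ingress_ifindex;
	unsigned int rx_queue_index;
};

struct prog {
	enum attach_type expected_attach_type;
};

/* Mirrors the shape of the check added to xdp_is_valid_access(). */
static bool model_is_valid_access(const struct prog *prog, size_t off)
{
	assert(prog != NULL);

	if (prog->expected_attach_type == ATTACH_XDP_EGRESS) {
		switch (off) {
		case offsetof(struct ctx, ingress_ifindex):
		case offsetof(struct ctx, rx_queue_index):
			return false;	/* Rx-only fields are blocked on egress */
		}
	}

	return off < sizeof(struct ctx);
}
```

In this model a program with ATTACH_XDP_EGRESS is refused access to
rx_queue_index, while one with ATTACH_NONE is allowed, which is the
case I am worried about for a replacement of an egress program.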

-Toke



* Re: [PATCH RFC-v5 bpf-next 09/12] dev: Support xdp in the Tx path for xdp_frames
  2020-04-16 23:50     ` David Ahern
@ 2020-04-17  9:25       ` Toke Høiland-Jørgensen
  2020-04-17 20:06         ` David Ahern
  0 siblings, 1 reply; 27+ messages in thread
From: Toke Høiland-Jørgensen @ 2020-04-17  9:25 UTC (permalink / raw)
  To: David Ahern, David Ahern, netdev
  Cc: davem, kuba, prashantbhole.linux, jasowang, brouer,
	toshiaki.makita1, daniel, john.fastabend, ast, kafai,
	songliubraving, yhs, andriin, David Ahern

David Ahern <dsahern@gmail.com> writes:

> On 4/16/20 8:02 AM, Toke Høiland-Jørgensen wrote:
>>> diff --git a/kernel/bpf/devmap.c b/kernel/bpf/devmap.c
>>> index 58bdca5d978a..bedecd07d898 100644
>>> --- a/kernel/bpf/devmap.c
>>> +++ b/kernel/bpf/devmap.c
>>> @@ -322,24 +322,33 @@ static int bq_xmit_all(struct xdp_dev_bulk_queue *bq, u32 flags)
>>>  {
>>>  	struct net_device *dev = bq->dev;
>>>  	int sent = 0, drops = 0, err = 0;
>>> +	unsigned int count = bq->count;
>>>  	int i;
>>>  
>>> -	if (unlikely(!bq->count))
>>> +	if (unlikely(!count))
>>>  		return 0;
>>>  
>>> -	for (i = 0; i < bq->count; i++) {
>>> +	for (i = 0; i < count; i++) {
>>>  		struct xdp_frame *xdpf = bq->q[i];
>>>  
>>>  		prefetch(xdpf);
>>>  	}
>>>  
>>> -	sent = dev->netdev_ops->ndo_xdp_xmit(dev, bq->count, bq->q, flags);
>>> +	if (static_branch_unlikely(&xdp_egress_needed_key)) {
>>> +		count = do_xdp_egress_frame(dev, bq->q, &count);
>> 
>> nit: seems a bit odd to pass the pointer to count, then reassign it with
>> the return value?
>
> thanks for noticing that. leftover from the evolution of this. changed to
> 		count = do_xdp_egress_frame(dev, bq->q, count);

Thought it might be. Great!

>>> diff --git a/net/core/dev.c b/net/core/dev.c
>>> index 1bbaeb8842ed..f23dc6043329 100644
>>> --- a/net/core/dev.c
>>> +++ b/net/core/dev.c
>>> @@ -4720,6 +4720,76 @@ u32 do_xdp_egress_skb(struct net_device *dev, struct sk_buff *skb)
>>>  }
>>>  EXPORT_SYMBOL_GPL(do_xdp_egress_skb);
>>>  
>>> +static u32 __xdp_egress_frame(struct net_device *dev,
>>> +			      struct bpf_prog *xdp_prog,
>>> +			      struct xdp_frame *xdp_frame,
>>> +			      struct xdp_txq_info *txq)
>>> +{
>>> +	struct xdp_buff xdp;
>>> +	u32 act;
>>> +
>>> +	xdp.data_hard_start = xdp_frame->data - xdp_frame->headroom;
>>> +	xdp.data = xdp_frame->data;
>>> +	xdp.data_end = xdp.data + xdp_frame->len;
>>> +	xdp_set_data_meta_invalid(&xdp);
>> 
>> Why invalidate the metadata? On the contrary we'd want metadata from the
>> RX side to survive, wouldn't we?
>
> Right, replaced with:
> 	xdp.data_meta = xdp.data - xdp_frame->metasize;

OK.

>> 
>>> +	xdp.txq = txq;
>>> +
>>> +	act = bpf_prog_run_xdp(xdp_prog, &xdp);
>>> +	act = handle_xdp_egress_act(act, dev, xdp_prog);
>>> +
>>> +	/* if not dropping frame, readjust pointers in case
>>> +	 * program made changes to the buffer
>>> +	 */
>>> +	if (act != XDP_DROP) {
>>> +		int headroom = xdp.data - xdp.data_hard_start;
>>> +		int metasize = xdp.data - xdp.data_meta;
>>> +
>>> +		metasize = metasize > 0 ? metasize : 0;
>>> +		if (unlikely((headroom - metasize) < sizeof(*xdp_frame)))
>>> +			return XDP_DROP;
>>> +
>>> +		xdp_frame = xdp.data_hard_start;
>>> +		xdp_frame->data = xdp.data;
>>> +		xdp_frame->len  = xdp.data_end - xdp.data;
>>> +		xdp_frame->headroom = headroom - sizeof(*xdp_frame);
>>> +		xdp_frame->metasize = metasize;
>>> +		/* xdp_frame->mem is unchanged */
>>> +	}
>>> +
>>> +	return act;
>>> +}
>>> +
>>> +unsigned int do_xdp_egress_frame(struct net_device *dev,
>>> +				 struct xdp_frame **frames,
>>> +				 unsigned int *pcount)
>>> +{
>>> +	struct bpf_prog *xdp_prog;
>>> +	unsigned int count = *pcount;
>>> +
>>> +	xdp_prog = rcu_dereference(dev->xdp_egress_prog);
>>> +	if (xdp_prog) {
>>> +		struct xdp_txq_info txq = { .dev = dev };
>> 
>> Do you have any thoughts on how to populate this for the redirect case?
>
> Not sure I understand. This is the redirect case, i.e., on Rx a program
> is run, XDP_REDIRECT is returned and the packet is queued. Once the
> queue fills or a flush is done, bq_xmit_all is called to send the
> frames.

I just meant that eventually we'd want to populate xdp_txq_info with a
TX HWQ index (and possibly other stuff), right? So how do you figure
we'd get that information at this call site?

-Toke



* Re: [PATCH RFC-v5 bpf-next 00/12] Add support for XDP in egress path
  2020-04-16 23:55   ` David Ahern
@ 2020-04-17  9:28     ` Toke Høiland-Jørgensen
  0 siblings, 0 replies; 27+ messages in thread
From: Toke Høiland-Jørgensen @ 2020-04-17  9:28 UTC (permalink / raw)
  To: David Ahern, David Ahern, netdev
  Cc: davem, kuba, prashantbhole.linux, jasowang, brouer,
	toshiaki.makita1, daniel, john.fastabend, ast, kafai,
	songliubraving, yhs, andriin

David Ahern <dsahern@gmail.com> writes:

> On 4/16/20 7:59 AM, Toke Høiland-Jørgensen wrote:
>> 
>> I like the choice of hook points. It is interesting that it implies that
>> there will not be a separate "XDP generic" hook on egress. And it's
>> certainly a benefit to not have to change all the drivers. So that's
>> good :)
>> 
>> I also think it'll be possible to get the information we want (such as
>> TXQ fill level) at the places you put the hooks. For the skb case
>> through struct netdev_queue and BQL, and for REDIRECT presumably with
>> Magnus' queue abstraction once that lands. So overall I think we're
>> getting there :)
>> 
>> I'll add a few more comments for each patch...
>> 
>
> thanks for reviewing.
>
> FYI, somehow I left out a refactoring patch when generating patches to
> send out. Basically moves existing tb[IFLA_XDP] handling to a helper
> that can be reused for tb[IFLA_XDP_EGRESS]
>
> https://github.com/dsahern/linux/commit/71011b5cf6f8c1bca28a6afe5a92be59152a8219

Ah yes, makes sense. I skipped over the netlink patches fairly quickly,
so didn't notice this was missing. I guess this also answers the
question "what about netlink policy for the new nested attribute", right? :)

-Toke



* Re: [PATCH RFC-v5 bpf-next 09/12] dev: Support xdp in the Tx path for xdp_frames
  2020-04-17  9:25       ` Toke Høiland-Jørgensen
@ 2020-04-17 20:06         ` David Ahern
  0 siblings, 0 replies; 27+ messages in thread
From: David Ahern @ 2020-04-17 20:06 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen, David Ahern, netdev
  Cc: davem, kuba, prashantbhole.linux, jasowang, brouer,
	toshiaki.makita1, daniel, john.fastabend, ast, kafai,
	songliubraving, yhs, andriin, David Ahern

On 4/17/20 3:25 AM, Toke Høiland-Jørgensen wrote:
>> Not sure I understand. This is the redirect case, i.e., on Rx a program
>> is run, XDP_REDIRECT is returned and the packet is queued. Once the
>> queue fills or a flush is done, bq_xmit_all is called to send the
>> frames.
> I just meant that eventually we'd want to populate xdp_txq_info with a
> TX HWQ index (and possibly other stuff), right? So how do you figure
> we'd get that information at this call site?

The same way it is done for skbs:

1. Add queue_mapping to struct xdp_frame

2. Update ndo_select_queue for xdp_frames

net_device_ops has ndo_select_queue which can be extended to handle
xdp_frames with a reasonable level of work (e.g., lowest bit in the
pointer is a flag signaling skb or xdp_frame). Right now, all queue
selection for xdp frames is buried in ndo_xdp_xmit and is cpu id based.
Move that code to ndo_select_queue (or make an xdp variant).

3. Refactor netdev_core_pick_tx. Move the guts of netdev_core_pick_tx -
the queue id selection - to a separate helper.

4. bq_xmit_all calls the new helper to set the tx queue for all frames.

5. Pass the queue index to the egress bpf program and make it writable
for steering.

6. ndo_xdp_xmit implementations use the index from the xdp_frame just
like ndo_start_xmit uses the mapping from the skb.

Just a bit of code movement and refactoring.
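
As a rough illustration of the low-bit flag idea in step 2, here is a
standalone sketch. Everything in it is hypothetical (struct pkt_skb,
struct pkt_frame, PKT_IS_FRAME, tag_skb, tag_frame, select_queue); it
only shows how one entry point could take either packet kind, not how
ndo_select_queue would actually be changed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins; neither struct matches the kernel layout. */
struct pkt_skb   { unsigned int queue_mapping; };
struct pkt_frame { unsigned int queue_mapping; };

/* Low bit set means the pointer refers to a pkt_frame. */
#define PKT_IS_FRAME ((uintptr_t)1)

static uintptr_t tag_skb(struct pkt_skb *skb)
{
	/* The scheme relies on pointers being at least 2-byte aligned. */
	assert(((uintptr_t)skb & PKT_IS_FRAME) == 0);
	return (uintptr_t)skb;
}

static uintptr_t tag_frame(struct pkt_frame *f)
{
	assert(((uintptr_t)f & PKT_IS_FRAME) == 0);
	return (uintptr_t)f | PKT_IS_FRAME;
}

static bool is_frame(uintptr_t p)
{
	return (p & PKT_IS_FRAME) != 0;
}

/* One queue-selection entry point that serves both packet kinds. */
static unsigned int select_queue(uintptr_t p, unsigned int num_queues)
{
	unsigned int hint;

	if (is_frame(p))
		hint = ((struct pkt_frame *)(p & ~PKT_IS_FRAME))->queue_mapping;
	else
		hint = ((struct pkt_skb *)p)->queue_mapping;

	return hint % num_queues;
}
```

A separate xdp variant of the callback would avoid the tagging
entirely, at the cost of one more op per driver.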


* Re: [PATCH RFC-v5 bpf-next 09/12] dev: Support xdp in the Tx path for xdp_frames
  2020-04-17  8:30   ` Jesper Dangaard Brouer
@ 2020-04-17 21:29     ` David Ahern
  0 siblings, 0 replies; 27+ messages in thread
From: David Ahern @ 2020-04-17 21:29 UTC (permalink / raw)
  To: Jesper Dangaard Brouer, David Ahern
  Cc: netdev, davem, kuba, prashantbhole.linux, jasowang, toke,
	toshiaki.makita1, daniel, john.fastabend, ast, kafai,
	songliubraving, yhs, andriin, David Ahern

On 4/17/20 2:30 AM, Jesper Dangaard Brouer wrote:
>> diff --git a/net/core/dev.c b/net/core/dev.c
>> index 1bbaeb8842ed..f23dc6043329 100644
>> --- a/net/core/dev.c
>> +++ b/net/core/dev.c
>> @@ -4720,6 +4720,76 @@ u32 do_xdp_egress_skb(struct net_device *dev, struct sk_buff *skb)
>>  }
>>  EXPORT_SYMBOL_GPL(do_xdp_egress_skb);
>>  
>> +static u32 __xdp_egress_frame(struct net_device *dev,
>> +			      struct bpf_prog *xdp_prog,
>> +			      struct xdp_frame *xdp_frame,
>> +			      struct xdp_txq_info *txq)
>> +{
>> +	struct xdp_buff xdp;
>> +	u32 act;
>> +
>> +	xdp.data_hard_start = xdp_frame->data - xdp_frame->headroom;
> 
> You also need: minus sizeof(*xdp_frame).

Updated. thanks.
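
For reference, a standalone sketch of the pointer arithmetic with the
extra sizeof() term. struct frame and struct buff are simplified,
hypothetical stand-ins for xdp_frame and xdp_buff, and frame_to_buff is
not the name of anything in the series; the sketch assumes, as in
convert_to_xdp_frame, that the frame struct sits at the very start of
the buffer and that headroom counts only the bytes after it.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for struct xdp_frame, stored at the buffer start. */
struct frame {
	void *data;
	unsigned short len;
	unsigned short headroom;	/* bytes between this struct and data */
	unsigned short metasize;
};

/* Simplified stand-in for struct xdp_buff. */
struct buff {
	void *data;
	void *data_end;
	void *data_meta;
	void *data_hard_start;
};

static void frame_to_buff(const struct frame *f, struct buff *b)
{
	char *data = f->data;

	assert(f->metasize <= f->headroom);

	/* headroom excludes the frame struct, so subtract its size too */
	b->data_hard_start = data - f->headroom - sizeof(*f);
	b->data = data;
	b->data_end = data + f->len;
	b->data_meta = data - f->metasize;
}
```

With the extra term, data_hard_start lands back on the frame struct
itself, which is what the headroom check after the program run expects.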

> 
> The BPF helper bpf_xdp_adjust_head will not allow the BPF program to
> access the memory area that is used for xdp_frame, so it is still safe.
> 
> 
>> +	xdp.data = xdp_frame->data;
>> +	xdp.data_end = xdp.data + xdp_frame->len;
>> +	xdp_set_data_meta_invalid(&xdp);
>> +	xdp.txq = txq;
> 
> I think this will be the 3rd place we convert xdp-frame to xdp_buff,
> perhaps we should introduce a helper function call.

> 
>> +	act = bpf_prog_run_xdp(xdp_prog, &xdp);
>> +	act = handle_xdp_egress_act(act, dev, xdp_prog);
>> +
>> +	/* if not dropping frame, readjust pointers in case
>> +	 * program made changes to the buffer
>> +	 */
>> +	if (act != XDP_DROP) {
>> +		int headroom = xdp.data - xdp.data_hard_start;
>> +		int metasize = xdp.data - xdp.data_meta;
>> +
>> +		metasize = metasize > 0 ? metasize : 0;
>> +		if (unlikely((headroom - metasize) < sizeof(*xdp_frame)))
>> +			return XDP_DROP;
>> +
>> +		xdp_frame = xdp.data_hard_start;
> 
> Is this needed?

removed.

> 
>> +		xdp_frame->data = xdp.data;
>> +		xdp_frame->len  = xdp.data_end - xdp.data;
>> +		xdp_frame->headroom = headroom - sizeof(*xdp_frame);
>> +		xdp_frame->metasize = metasize;
>> +		/* xdp_frame->mem is unchanged */
> 
> This looks very similar to convert_to_xdp_frame.

Yes, except for the rxq references, since rxq is not set on this path.

> Maybe we need an central update_xdp_frame(xdp_buff) call?
> 
> 
> Untested code-up:
> 

sure, that should work.


Thread overview: 27+ messages
2020-04-13 17:17 [PATCH RFC-v5 bpf-next 00/12] Add support for XDP in egress path David Ahern
2020-04-13 17:17 ` [PATCH RFC-v5 bpf-next 01/12] net: Add XDP setup and query commands for Tx programs David Ahern
2020-04-13 17:17 ` [PATCH RFC-v5 bpf-next 02/12] net: Add BPF_XDP_EGRESS as a bpf_attach_type David Ahern
2020-04-16 14:01   ` Toke Høiland-Jørgensen
2020-04-16 16:35     ` David Ahern
2020-04-17  9:23       ` Toke Høiland-Jørgensen
2020-04-13 17:17 ` [PATCH RFC-v5 bpf-next 03/12] xdp: Add xdp_txq_info to xdp_buff David Ahern
2020-04-13 17:17 ` [PATCH RFC-v5 bpf-next 04/12] net: Add IFLA_XDP_EGRESS for XDP programs in the egress path David Ahern
2020-04-13 17:17 ` [PATCH RFC-v5 bpf-next 05/12] net: core: rename netif_receive_generic_xdp to do_generic_xdp_core David Ahern
2020-04-16 14:00   ` Toke Høiland-Jørgensen
2020-04-13 17:17 ` [PATCH RFC-v5 bpf-next 06/12] net: core: Rename do_xdp_generic to do_xdp_generic_rx David Ahern
2020-04-13 17:17 ` [PATCH RFC-v5 bpf-next 07/12] dev: set egress XDP program David Ahern
2020-04-13 17:17 ` [PATCH RFC-v5 bpf-next 08/12] dev: Support xdp in the Tx path for packets as an skb David Ahern
2020-04-13 17:17 ` [PATCH RFC-v5 bpf-next 09/12] dev: Support xdp in the Tx path for xdp_frames David Ahern
2020-04-16 14:02   ` Toke Høiland-Jørgensen
2020-04-16 23:50     ` David Ahern
2020-04-17  9:25       ` Toke Høiland-Jørgensen
2020-04-17 20:06         ` David Ahern
2020-04-17  8:30   ` Jesper Dangaard Brouer
2020-04-17 21:29     ` David Ahern
2020-04-13 17:17 ` [PATCH RFC-v5 bpf-next 10/12] libbpf: Add egress XDP support David Ahern
2020-04-13 17:18 ` [PATCH RFC-v5 bpf-next 11/12] bpftool: Add support for XDP egress David Ahern
2020-04-13 17:18 ` [PATCH RFC-v5 bpf-next 12/12] samples/bpf: add XDP egress support to xdp1 David Ahern
2020-04-15 15:01   ` Alexei Starovoitov
2020-04-16 13:59 ` [PATCH RFC-v5 bpf-next 00/12] Add support for XDP in egress path Toke Høiland-Jørgensen
2020-04-16 23:55   ` David Ahern
2020-04-17  9:28     ` Toke Høiland-Jørgensen
