All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH RFC v4 bpf-next 00/11] Add support for XDP in egress path
@ 2020-02-27  3:20 David Ahern
  2020-02-27  3:20 ` [PATCH RFC v4 bpf-next 01/11] net: Add XDP setup and query commands for Tx programs David Ahern
                   ` (11 more replies)
  0 siblings, 12 replies; 38+ messages in thread
From: David Ahern @ 2020-02-27  3:20 UTC (permalink / raw)
  To: netdev
  Cc: davem, kuba, prashantbhole.linux, jasowang, brouer, toke, mst,
	toshiaki.makita1, daniel, john.fastabend, ast, kafai,
	songliubraving, yhs, andriin, dsahern

From: David Ahern <dsahern@gmail.com>

This series adds support for XDP in the egress path by introducing
a new XDP attachment type, BPF_XDP_EGRESS, and adding an if_link API
for attaching the program to a netdevice and reporting the program.
The intent is to emulate the current RX path for XDP as much as
possible to maintain consistency and symmetry in the 2 paths with
their APIs and when the programs are run: at first touch in the Rx
path and last touch in the Tx path.

The intent is to be able to run bpf programs on all packets regardless
of how they got to the xmit function of the netdevice - as an skb or a
redirected xdp frame. This is a missing primitive for XDP allowing
solutions to build small, targeted programs properly distributed in the
networking path allowing for example an egress firewall / ACL / traffic
verification or packet manipulation and encapping an entire ethernet
frame whether it is locally generated traffic, forwarded via the slow
path (ie., full stack processing) or xdp redirected frames. 

This set adds initial support for XDP in the egress path to the tun
driver, mostly for XDP_DROP (e.g., ACL on the ingress path to the
VM). XDP_TX on Rx means send the packet out the device it arrived
on; given that, XDP_Tx for the Tx path is treated as equivalent to
XDP_PASS - ie., continue on the Tx path. XDP_REDIRECT in the Tx path
can be handled in a follow set; conceptually it is no different than
the Rx path - the frame is handed off to another device though loops
do need to be detected.

The expectation is that the current mode levels (skb, driver and hardware)
will also apply to the egress path. The current patch set focuses on the
driver mode with tun. Attempting to tag the EGRESS path as yet another
mode is inconsistent on a number of levels - from the current usage of
XDP_FLAGS to options passed to the verifier for restricting xdp_md
accesses. Using the API as listed above maintains consistency with all
existing code.

As for libbpf, we believe that an API that mirrors and is consistent
with the existing xdp functions for the ingress path is the right way
to go for the egress path. Meaning, the libbpf API is replicated with
_egress in the name, but the libbpf implementation shares common code
to the degree possible.

Some of the patches in this set were originally part of Jason
Wang's work "XDP offload with virtio-net" [1]. The work overlaps
work done by me, so we decided to consolidate the work into the
egress path first with the offload option expected to be a follow
on.

At this point we are recognizing enough differences in the use case
that we are diverging again. For instance, there is agreement that
running the offloaded program should be done in vhost process context
making the "cost" attributable to the VM (vhost or qemu), but it is
more important for the host managed programs to be in the primary
packet path. As an example, packets should not be queueud to the vhost
ring buffer until *after* the XDP program for the Tx path has been run
otherwise there is the potential for the ring buffer to fill with
packets that the XDP program intends to drop (e.g., host managed ACL).

v4:
- updated cover letter
- patches related to code movement between tuntap, headers and vhost
  are dropped; previous RFC ran the XDP program in vhost context vs
  this set which runs them before queueing to vhost. As a part of this
  moved invocation of egress program to tun_net_xmit and tun_xdp_xmit.
- renamed do_xdp_generic to do_xdp_generic_rx to emphasize is called
  in the Rx path; added rx argument to do_xdp_generic_core since it
  is used for both directions and needs to know which queue values to
  set in xdp_buff

v3:
- reworked the patches - splitting patch 1 from RFC v2 into 3, combining
  patch 2 from RFC v2 into the first 3, combining patches 6 and 7 from
  RFC v2 into 1 since both did a trivial rename and export. Reordered
  the patches such that kernel changes are first followed by libbpf and
  an enhancement to a sample.

- moved small xdp related helper functions from tun.c to tun.h to make
  tun_ptr_free usable from the tap code. This is needed to handle the
  case of tap builtin and tun built as a module.

- pkt_ptrs added to `struct tun_file` and passed to tun_consume_packets
  rather than declaring pkts as an array on the stack.

v2:
- New XDP attachment type: Jesper, Toke and Alexei discussed whether
  to introduce a new program type. Since this set adds a way to attach
  regular XDP program to the tx path, as per Alexei's suggestion, a
  new attachment type BPF_XDP_EGRESS is introduced.

- libbpf API changes:
  Alexei had suggested _opts() style of API extension. Considering it
  two new libbpf APIs are introduced which are equivalent to existing
  APIs. New ones can be extended easily. Please see individual patches
  for details. xdp1 sample program is modified to use new APIs.

- tun: Some patches from previous set are removed as they are
  irrelevant in this series. They will in introduced later.

[1]: https://netdevconf.info/0x13/session.html?xdp-offload-with-virtio-net


David Ahern (9):
  net: Add XDP setup and query commands for Tx programs
  xdp: Add xdp_txq_info to xdp_buff
  net: Add IFLA_XDP_EGRESS for XDP programs in the egress path
  net: core: rename netif_receive_generic_xdp to do_generic_xdp_core
  net: core: Rename do_xdp_generic to do_xdp_generic_rx and export
  tun: set egress XDP program
  tun: Support xdp in the Tx path for skb
  tun: Support xdp in the Tx path for xdp_frames
  libbpf: Add egress XDP support

Prashant Bhole (2):
  net: Add BPF_XDP_EGRESS as a bpf_attach_type
  samples/bpf: xdp1, add egress XDP support

 drivers/net/tun.c                  | 156 ++++++++++++++++++++++++++---
 include/linux/netdevice.h          |   8 +-
 include/net/xdp.h                  |   5 +
 include/uapi/linux/bpf.h           |   7 +-
 include/uapi/linux/if_link.h       |   1 +
 net/core/dev.c                     |  59 +++++++----
 net/core/filter.c                  |  34 +++++--
 net/core/rtnetlink.c               | 114 ++++++++++++++++++++-
 samples/bpf/xdp1_user.c            |  30 +++++-
 tools/include/uapi/linux/bpf.h     |   1 +
 tools/include/uapi/linux/if_link.h |   1 +
 tools/lib/bpf/libbpf.c             |   2 +
 tools/lib/bpf/libbpf.h             |   6 ++
 tools/lib/bpf/libbpf.map           |   3 +
 tools/lib/bpf/netlink.c            |  52 ++++++++--
 15 files changed, 418 insertions(+), 61 deletions(-)

-- 
2.21.1 (Apple Git-122.3)


^ permalink raw reply	[flat|nested] 38+ messages in thread

* [PATCH RFC v4 bpf-next 01/11] net: Add XDP setup and query commands for Tx programs
  2020-02-27  3:20 [PATCH RFC v4 bpf-next 00/11] Add support for XDP in egress path David Ahern
@ 2020-02-27  3:20 ` David Ahern
  2020-02-27  3:20 ` [PATCH RFC v4 bpf-next 02/11] net: Add BPF_XDP_EGRESS as a bpf_attach_type David Ahern
                   ` (10 subsequent siblings)
  11 siblings, 0 replies; 38+ messages in thread
From: David Ahern @ 2020-02-27  3:20 UTC (permalink / raw)
  To: netdev
  Cc: davem, kuba, prashantbhole.linux, jasowang, brouer, toke, mst,
	toshiaki.makita1, daniel, john.fastabend, ast, kafai,
	songliubraving, yhs, andriin, dsahern, David Ahern

From: David Ahern <dahern@digitalocean.com>

Add new netdev commands, XDP_SETUP_PROG_EGRESS and
XDP_QUERY_PROG_EGRESS.

Update dev_change_xdp_fd and dev_xdp_install to take an egress argument.
If egress bool is set, then use XDP_SETUP_PROG_EGRESS in dev_xdp_install
and XDP_QUERY_PROG_EGRESS in dev_change_xdp_fd.

Update dev_xdp_uninstall to query for XDP_QUERY_PROG_EGRESS and if a
program is installed call dev_xdp_install with the egress argument set
to true.

This enables existing infrastructure to be used for XDP programs in
the egress path.

Signed-off-by: David Ahern <dahern@digitalocean.com>
Co-developed-by: Prashant Bhole <prashantbhole.linux@gmail.com>
Signed-off-by: Prashant Bhole <prashantbhole.linux@gmail.com>
---
 include/linux/netdevice.h |  4 +++-
 net/core/dev.c            | 30 +++++++++++++++++++++---------
 net/core/rtnetlink.c      |  2 +-
 3 files changed, 25 insertions(+), 11 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 6c3f7032e8d9..d0dd0706ece4 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -868,8 +868,10 @@ enum bpf_netdev_command {
 	 */
 	XDP_SETUP_PROG,
 	XDP_SETUP_PROG_HW,
+	XDP_SETUP_PROG_EGRESS,
 	XDP_QUERY_PROG,
 	XDP_QUERY_PROG_HW,
+	XDP_QUERY_PROG_EGRESS,
 	/* BPF program for offload callbacks, invoked at program load time. */
 	BPF_OFFLOAD_MAP_ALLOC,
 	BPF_OFFLOAD_MAP_FREE,
@@ -3767,7 +3769,7 @@ struct sk_buff *dev_hard_start_xmit(struct sk_buff *skb, struct net_device *dev,
 
 typedef int (*bpf_op_t)(struct net_device *dev, struct netdev_bpf *bpf);
 int dev_change_xdp_fd(struct net_device *dev, struct netlink_ext_ack *extack,
-		      int fd, u32 flags);
+		      int fd, u32 flags, bool egress);
 u32 __dev_xdp_query(struct net_device *dev, bpf_op_t xdp_op,
 		    enum bpf_netdev_command cmd);
 int xdp_umem_query(struct net_device *dev, u16 queue_id);
diff --git a/net/core/dev.c b/net/core/dev.c
index e10bd680dc03..27fc96ea0fdb 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -8587,7 +8587,7 @@ u32 __dev_xdp_query(struct net_device *dev, bpf_op_t bpf_op,
 
 static int dev_xdp_install(struct net_device *dev, bpf_op_t bpf_op,
 			   struct netlink_ext_ack *extack, u32 flags,
-			   struct bpf_prog *prog)
+			   struct bpf_prog *prog, bool egress)
 {
 	bool non_hw = !(flags & XDP_FLAGS_HW_MODE);
 	struct bpf_prog *prev_prog = NULL;
@@ -8605,7 +8605,7 @@ static int dev_xdp_install(struct net_device *dev, bpf_op_t bpf_op,
 	if (flags & XDP_FLAGS_HW_MODE)
 		xdp.command = XDP_SETUP_PROG_HW;
 	else
-		xdp.command = XDP_SETUP_PROG;
+		xdp.command = egress ? XDP_SETUP_PROG_EGRESS : XDP_SETUP_PROG;
 	xdp.extack = extack;
 	xdp.flags = flags;
 	xdp.prog = prog;
@@ -8626,7 +8626,8 @@ static void dev_xdp_uninstall(struct net_device *dev)
 	bpf_op_t ndo_bpf;
 
 	/* Remove generic XDP */
-	WARN_ON(dev_xdp_install(dev, generic_xdp_install, NULL, 0, NULL));
+	WARN_ON(dev_xdp_install(dev, generic_xdp_install, NULL, 0, NULL,
+				false));
 
 	/* Remove from the driver */
 	ndo_bpf = dev->netdev_ops->ndo_bpf;
@@ -8638,14 +8639,20 @@ static void dev_xdp_uninstall(struct net_device *dev)
 	WARN_ON(ndo_bpf(dev, &xdp));
 	if (xdp.prog_id)
 		WARN_ON(dev_xdp_install(dev, ndo_bpf, NULL, xdp.prog_flags,
-					NULL));
+					NULL, false));
 
 	/* Remove HW offload */
 	memset(&xdp, 0, sizeof(xdp));
 	xdp.command = XDP_QUERY_PROG_HW;
 	if (!ndo_bpf(dev, &xdp) && xdp.prog_id)
 		WARN_ON(dev_xdp_install(dev, ndo_bpf, NULL, xdp.prog_flags,
-					NULL));
+					NULL, false));
+	/* Remove egress program */
+	memset(&xdp, 0, sizeof(xdp));
+	xdp.command = XDP_QUERY_PROG_EGRESS;
+	if (!ndo_bpf(dev, &xdp) && xdp.prog_id)
+		WARN_ON(dev_xdp_install(dev, ndo_bpf, NULL, xdp.prog_flags,
+					NULL, true));
 }
 
 /**
@@ -8658,7 +8665,7 @@ static void dev_xdp_uninstall(struct net_device *dev)
  *	Set or clear a bpf program for a device
  */
 int dev_change_xdp_fd(struct net_device *dev, struct netlink_ext_ack *extack,
-		      int fd, u32 flags)
+		      int fd, u32 flags, bool egress)
 {
 	const struct net_device_ops *ops = dev->netdev_ops;
 	enum bpf_netdev_command query;
@@ -8670,7 +8677,11 @@ int dev_change_xdp_fd(struct net_device *dev, struct netlink_ext_ack *extack,
 	ASSERT_RTNL();
 
 	offload = flags & XDP_FLAGS_HW_MODE;
-	query = offload ? XDP_QUERY_PROG_HW : XDP_QUERY_PROG;
+	if (egress)
+		query = XDP_QUERY_PROG_EGRESS;
+	else
+		query = offload ? XDP_QUERY_PROG_HW : XDP_QUERY_PROG;
+
 
 	bpf_op = bpf_chk = ops->ndo_bpf;
 	if (!bpf_op && (flags & (XDP_FLAGS_DRV_MODE | XDP_FLAGS_HW_MODE))) {
@@ -8685,7 +8696,8 @@ int dev_change_xdp_fd(struct net_device *dev, struct netlink_ext_ack *extack,
 	if (fd >= 0) {
 		u32 prog_id;
 
-		if (!offload && __dev_xdp_query(dev, bpf_chk, XDP_QUERY_PROG)) {
+		if (!offload && !egress &&
+		    __dev_xdp_query(dev, bpf_chk, XDP_QUERY_PROG)) {
 			NL_SET_ERR_MSG(extack, "native and generic XDP can't be active at the same time");
 			return -EEXIST;
 		}
@@ -8717,7 +8729,7 @@ int dev_change_xdp_fd(struct net_device *dev, struct netlink_ext_ack *extack,
 			return 0;
 	}
 
-	err = dev_xdp_install(dev, bpf_op, extack, flags, prog);
+	err = dev_xdp_install(dev, bpf_op, extack, flags, prog, egress);
 	if (err < 0 && prog)
 		bpf_prog_put(prog);
 
diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index 14e6ea21c378..a488ab995f3e 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -2801,7 +2801,7 @@ static int do_setlink(const struct sk_buff *skb,
 		if (xdp[IFLA_XDP_FD]) {
 			err = dev_change_xdp_fd(dev, extack,
 						nla_get_s32(xdp[IFLA_XDP_FD]),
-						xdp_flags);
+						xdp_flags, false);
 			if (err)
 				goto errout;
 			status |= DO_SETLINK_NOTIFY;
-- 
2.21.1 (Apple Git-122.3)


^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH RFC v4 bpf-next 02/11] net: Add BPF_XDP_EGRESS as a bpf_attach_type
  2020-02-27  3:20 [PATCH RFC v4 bpf-next 00/11] Add support for XDP in egress path David Ahern
  2020-02-27  3:20 ` [PATCH RFC v4 bpf-next 01/11] net: Add XDP setup and query commands for Tx programs David Ahern
@ 2020-02-27  3:20 ` David Ahern
  2020-02-27  3:20 ` [PATCH RFC v4 bpf-next 03/11] xdp: Add xdp_txq_info to xdp_buff David Ahern
                   ` (9 subsequent siblings)
  11 siblings, 0 replies; 38+ messages in thread
From: David Ahern @ 2020-02-27  3:20 UTC (permalink / raw)
  To: netdev
  Cc: davem, kuba, prashantbhole.linux, jasowang, brouer, toke, mst,
	toshiaki.makita1, daniel, john.fastabend, ast, kafai,
	songliubraving, yhs, andriin, dsahern, David Ahern

From: Prashant Bhole <prashantbhole.linux@gmail.com>

Add new bpf_attach_type, BPF_XDP_EGRESS, for BPF programs attached
at the XDP layer, but the egress path.

Since egress path does not have rx_queue_index set, update
xdp_is_valid_access to block access to this entry in the xdp
context when a program is attached to egress path.

The next patch adds support for the egress ifindex.

Signed-off-by: Prashant Bhole <prashantbhole.linux@gmail.com>
Signed-off-by: David Ahern <dahern@digitalocean.com>
---
 include/uapi/linux/bpf.h       | 1 +
 net/core/filter.c              | 7 +++++++
 tools/include/uapi/linux/bpf.h | 1 +
 3 files changed, 9 insertions(+)

diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 906e9f2752db..7850f8683b81 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -210,6 +210,7 @@ enum bpf_attach_type {
 	BPF_TRACE_RAW_TP,
 	BPF_TRACE_FENTRY,
 	BPF_TRACE_FEXIT,
+	BPF_XDP_EGRESS,
 	__MAX_BPF_ATTACH_TYPE
 };
 
diff --git a/net/core/filter.c b/net/core/filter.c
index 925b23de218b..c7cc98c55621 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -6795,6 +6795,13 @@ static bool xdp_is_valid_access(int off, int size,
 				const struct bpf_prog *prog,
 				struct bpf_insn_access_aux *info)
 {
+	if (prog->expected_attach_type == BPF_XDP_EGRESS) {
+		switch (off) {
+		case offsetof(struct xdp_md, rx_queue_index):
+			return false;
+		}
+	}
+
 	if (type == BPF_WRITE) {
 		if (bpf_prog_is_dev_bound(prog->aux)) {
 			switch (off) {
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index 906e9f2752db..7850f8683b81 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -210,6 +210,7 @@ enum bpf_attach_type {
 	BPF_TRACE_RAW_TP,
 	BPF_TRACE_FENTRY,
 	BPF_TRACE_FEXIT,
+	BPF_XDP_EGRESS,
 	__MAX_BPF_ATTACH_TYPE
 };
 
-- 
2.21.1 (Apple Git-122.3)


^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH RFC v4 bpf-next 03/11] xdp: Add xdp_txq_info to xdp_buff
  2020-02-27  3:20 [PATCH RFC v4 bpf-next 00/11] Add support for XDP in egress path David Ahern
  2020-02-27  3:20 ` [PATCH RFC v4 bpf-next 01/11] net: Add XDP setup and query commands for Tx programs David Ahern
  2020-02-27  3:20 ` [PATCH RFC v4 bpf-next 02/11] net: Add BPF_XDP_EGRESS as a bpf_attach_type David Ahern
@ 2020-02-27  3:20 ` David Ahern
  2020-02-27  8:00   ` Jesper Dangaard Brouer
  2020-02-27  3:20 ` [PATCH RFC v4 bpf-next 04/11] net: Add IFLA_XDP_EGRESS for XDP programs in the egress path David Ahern
                   ` (8 subsequent siblings)
  11 siblings, 1 reply; 38+ messages in thread
From: David Ahern @ 2020-02-27  3:20 UTC (permalink / raw)
  To: netdev
  Cc: davem, kuba, prashantbhole.linux, jasowang, brouer, toke, mst,
	toshiaki.makita1, daniel, john.fastabend, ast, kafai,
	songliubraving, yhs, andriin, dsahern, David Ahern

From: David Ahern <dahern@digitalocean.com>

Add xdp_txq_info as the Tx counterpart to xdp_rxq_info. At the
moment only the device is added. Other fields (queue_index)
can be added as use cases arise.

From a UAPI perspective, egress_ifindex is a union with ingress_ifindex
since only one applies based on where the program is attached.

Signed-off-by: David Ahern <dahern@digitalocean.com>
---
 include/net/xdp.h        |  5 +++++
 include/uapi/linux/bpf.h |  6 ++++--
 net/core/filter.c        | 27 +++++++++++++++++++--------
 3 files changed, 28 insertions(+), 10 deletions(-)

diff --git a/include/net/xdp.h b/include/net/xdp.h
index 40c6d3398458..5584b9db86fe 100644
--- a/include/net/xdp.h
+++ b/include/net/xdp.h
@@ -63,6 +63,10 @@ struct xdp_rxq_info {
 	struct xdp_mem_info mem;
 } ____cacheline_aligned; /* perf critical, avoid false-sharing */
 
+struct xdp_txq_info {
+	struct net_device *dev;
+};
+
 struct xdp_buff {
 	void *data;
 	void *data_end;
@@ -70,6 +74,7 @@ struct xdp_buff {
 	void *data_hard_start;
 	unsigned long handle;
 	struct xdp_rxq_info *rxq;
+	struct xdp_txq_info *txq;
 };
 
 struct xdp_frame {
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index 7850f8683b81..5e3f8aefad41 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -3334,8 +3334,10 @@ struct xdp_md {
 	__u32 data;
 	__u32 data_end;
 	__u32 data_meta;
-	/* Below access go through struct xdp_rxq_info */
-	__u32 ingress_ifindex; /* rxq->dev->ifindex */
+	union {
+		__u32 ingress_ifindex; /* rxq->dev->ifindex */
+		__u32 egress_ifindex;  /* txq->dev->ifindex */
+	};
 	__u32 rx_queue_index;  /* rxq->queue_index  */
 };
 
diff --git a/net/core/filter.c b/net/core/filter.c
index c7cc98c55621..d1c65dccd671 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -7716,14 +7716,25 @@ static u32 xdp_convert_ctx_access(enum bpf_access_type type,
 				      offsetof(struct xdp_buff, data_end));
 		break;
 	case offsetof(struct xdp_md, ingress_ifindex):
-		*insn++ = BPF_LDX_MEM(BPF_FIELD_SIZEOF(struct xdp_buff, rxq),
-				      si->dst_reg, si->src_reg,
-				      offsetof(struct xdp_buff, rxq));
-		*insn++ = BPF_LDX_MEM(BPF_FIELD_SIZEOF(struct xdp_rxq_info, dev),
-				      si->dst_reg, si->dst_reg,
-				      offsetof(struct xdp_rxq_info, dev));
-		*insn++ = BPF_LDX_MEM(BPF_W, si->dst_reg, si->dst_reg,
-				      offsetof(struct net_device, ifindex));
+		if (prog->expected_attach_type == BPF_XDP_EGRESS) {
+			*insn++ = BPF_LDX_MEM(BPF_FIELD_SIZEOF(struct xdp_buff, txq),
+					      si->dst_reg, si->src_reg,
+					      offsetof(struct xdp_buff, txq));
+			*insn++ = BPF_LDX_MEM(BPF_FIELD_SIZEOF(struct xdp_txq_info, dev),
+					      si->dst_reg, si->dst_reg,
+					      offsetof(struct xdp_txq_info, dev));
+			*insn++ = BPF_LDX_MEM(BPF_W, si->dst_reg, si->dst_reg,
+					      offsetof(struct net_device, ifindex));
+		} else {
+			*insn++ = BPF_LDX_MEM(BPF_FIELD_SIZEOF(struct xdp_buff, rxq),
+					      si->dst_reg, si->src_reg,
+					      offsetof(struct xdp_buff, rxq));
+			*insn++ = BPF_LDX_MEM(BPF_FIELD_SIZEOF(struct xdp_rxq_info, dev),
+					      si->dst_reg, si->dst_reg,
+					      offsetof(struct xdp_rxq_info, dev));
+			*insn++ = BPF_LDX_MEM(BPF_W, si->dst_reg, si->dst_reg,
+					      offsetof(struct net_device, ifindex));
+		}
 		break;
 	case offsetof(struct xdp_md, rx_queue_index):
 		*insn++ = BPF_LDX_MEM(BPF_FIELD_SIZEOF(struct xdp_buff, rxq),
-- 
2.21.1 (Apple Git-122.3)


^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH RFC v4 bpf-next 04/11] net: Add IFLA_XDP_EGRESS for XDP programs in the egress path
  2020-02-27  3:20 [PATCH RFC v4 bpf-next 00/11] Add support for XDP in egress path David Ahern
                   ` (2 preceding siblings ...)
  2020-02-27  3:20 ` [PATCH RFC v4 bpf-next 03/11] xdp: Add xdp_txq_info to xdp_buff David Ahern
@ 2020-02-27  3:20 ` David Ahern
  2020-02-27  3:20 ` [PATCH RFC v4 bpf-next 05/11] net: core: rename netif_receive_generic_xdp to do_generic_xdp_core David Ahern
                   ` (7 subsequent siblings)
  11 siblings, 0 replies; 38+ messages in thread
From: David Ahern @ 2020-02-27  3:20 UTC (permalink / raw)
  To: netdev
  Cc: davem, kuba, prashantbhole.linux, jasowang, brouer, toke, mst,
	toshiaki.makita1, daniel, john.fastabend, ast, kafai,
	songliubraving, yhs, andriin, dsahern, David Ahern

From: David Ahern <dahern@digitalocean.com>

Add IFLA_XDP_EGRESS to if_link.h uapi to handle an XDP program attached
to the Tx path of a device. Add rtnl_xdp_egress_fill and helpers as
the egress counterpart to the existing rtnl_xdp_fill. The expectation
is that going forward egress path will acquire the various levels of
attach - generic, driver and hardware.

Signed-off-by: David Ahern <dahern@digitalocean.com>
Co-developed-by: Prashant Bhole <prashantbhole.linux@gmail.com>
Signed-off-by: Prashant Bhole <prashantbhole.linux@gmail.com>
---
 include/uapi/linux/if_link.h       |   1 +
 net/core/dev.c                     |   6 ++
 net/core/rtnetlink.c               | 112 ++++++++++++++++++++++++++++-
 tools/include/uapi/linux/if_link.h |   1 +
 4 files changed, 119 insertions(+), 1 deletion(-)

diff --git a/include/uapi/linux/if_link.h b/include/uapi/linux/if_link.h
index 024af2d1d0af..7f2246d8122f 100644
--- a/include/uapi/linux/if_link.h
+++ b/include/uapi/linux/if_link.h
@@ -170,6 +170,7 @@ enum {
 	IFLA_PROP_LIST,
 	IFLA_ALT_IFNAME, /* Alternative ifname */
 	IFLA_PERM_ADDRESS,
+	IFLA_XDP_EGRESS, /* nested attribute with 1 or more IFLA_XDP_ attrs */
 	__IFLA_MAX
 };
 
diff --git a/net/core/dev.c b/net/core/dev.c
index 27fc96ea0fdb..15c511c5c7d5 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -8713,6 +8713,12 @@ int dev_change_xdp_fd(struct net_device *dev, struct netlink_ext_ack *extack,
 		if (IS_ERR(prog))
 			return PTR_ERR(prog);
 
+		if (egress && prog->expected_attach_type != BPF_XDP_EGRESS) {
+			NL_SET_ERR_MSG(extack, "XDP program in Tx path must use BPF_XDP_EGRESS attach type");
+			bpf_prog_put(prog);
+			return -EINVAL;
+		}
+
 		if (!offload && bpf_prog_is_dev_bound(prog->aux)) {
 			NL_SET_ERR_MSG(extack, "using device-bound program without HW_MODE flag is not supported");
 			bpf_prog_put(prog);
diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index a488ab995f3e..e5a87b4041c4 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -1030,7 +1030,7 @@ static noinline size_t if_nlmsg_size(const struct net_device *dev,
 	       + nla_total_size(MAX_PHYS_ITEM_ID_LEN) /* IFLA_PHYS_PORT_ID */
 	       + nla_total_size(MAX_PHYS_ITEM_ID_LEN) /* IFLA_PHYS_SWITCH_ID */
 	       + nla_total_size(IFNAMSIZ) /* IFLA_PHYS_PORT_NAME */
-	       + rtnl_xdp_size() /* IFLA_XDP */
+	       + rtnl_xdp_size() * 2 /* IFLA_XDP and IFLA_XDP_EGRESS */
 	       + nla_total_size(4)  /* IFLA_EVENT */
 	       + nla_total_size(4)  /* IFLA_NEW_NETNSID */
 	       + nla_total_size(4)  /* IFLA_NEW_IFINDEX */
@@ -1395,6 +1395,36 @@ static int rtnl_fill_link_ifmap(struct sk_buff *skb, struct net_device *dev)
 	return 0;
 }
 
+static u32 rtnl_xdp_egress_prog_drv(struct net_device *dev)
+{
+	return __dev_xdp_query(dev, dev->netdev_ops->ndo_bpf,
+			       XDP_QUERY_PROG_EGRESS);
+}
+
+static int rtnl_xdp_egress_report(struct sk_buff *skb, struct net_device *dev,
+				  u32 *prog_id, u8 *mode, u8 tgt_mode, u32 attr,
+				  u32 (*get_prog_id)(struct net_device *dev))
+{
+	u32 curr_id;
+	int err;
+
+	curr_id = get_prog_id(dev);
+	if (!curr_id)
+		return 0;
+
+	*prog_id = curr_id;
+	err = nla_put_u32(skb, attr, curr_id);
+	if (err)
+		return err;
+
+	if (*mode != XDP_ATTACHED_NONE)
+		*mode = XDP_ATTACHED_MULTI;
+	else
+		*mode = tgt_mode;
+
+	return 0;
+}
+
 static u32 rtnl_xdp_prog_skb(struct net_device *dev)
 {
 	const struct bpf_prog *generic_xdp_prog;
@@ -1486,6 +1516,41 @@ static int rtnl_xdp_fill(struct sk_buff *skb, struct net_device *dev)
 	return err;
 }
 
+static int rtnl_xdp_egress_fill(struct sk_buff *skb, struct net_device *dev)
+{
+	u8 mode = XDP_ATTACHED_NONE;
+	struct nlattr *xdp;
+	u32 prog_id = 0;
+	int err;
+
+	xdp = nla_nest_start_noflag(skb, IFLA_XDP_EGRESS);
+	if (!xdp)
+		return -EMSGSIZE;
+
+	err = rtnl_xdp_egress_report(skb, dev, &prog_id, &mode,
+				     XDP_ATTACHED_DRV, IFLA_XDP_DRV_PROG_ID,
+				     rtnl_xdp_egress_prog_drv);
+	if (err)
+		goto err_cancel;
+
+	err = nla_put_u8(skb, IFLA_XDP_ATTACHED, mode);
+	if (err)
+		goto err_cancel;
+
+	if (prog_id && mode != XDP_ATTACHED_MULTI) {
+		err = nla_put_u32(skb, IFLA_XDP_PROG_ID, prog_id);
+		if (err)
+			goto err_cancel;
+	}
+
+	nla_nest_end(skb, xdp);
+	return 0;
+
+err_cancel:
+	nla_nest_cancel(skb, xdp);
+	return err;
+}
+
 static u32 rtnl_get_event(unsigned long event)
 {
 	u32 rtnl_event_type = IFLA_EVENT_NONE;
@@ -1743,6 +1808,9 @@ static int rtnl_fill_ifinfo(struct sk_buff *skb,
 	if (rtnl_xdp_fill(skb, dev))
 		goto nla_put_failure;
 
+	if (rtnl_xdp_egress_fill(skb, dev))
+		goto nla_put_failure;
+
 	if (dev->rtnl_link_ops || rtnl_have_link_slave_info(dev)) {
 		if (rtnl_link_fill(skb, dev) < 0)
 			goto nla_put_failure;
@@ -1827,6 +1895,7 @@ static const struct nla_policy ifla_policy[IFLA_MAX+1] = {
 	[IFLA_ALT_IFNAME]	= { .type = NLA_STRING,
 				    .len = ALTIFNAMSIZ - 1 },
 	[IFLA_PERM_ADDRESS]	= { .type = NLA_REJECT },
+	[IFLA_XDP_EGRESS]	= { .type = NLA_NESTED },
 };
 
 static const struct nla_policy ifla_info_policy[IFLA_INFO_MAX+1] = {
@@ -2808,6 +2877,47 @@ static int do_setlink(const struct sk_buff *skb,
 		}
 	}
 
+	if (tb[IFLA_XDP_EGRESS]) {
+		struct nlattr *xdp[IFLA_XDP_MAX + 1];
+		u32 xdp_flags = 0;
+
+		err = nla_parse_nested_deprecated(xdp, IFLA_XDP_MAX,
+						  tb[IFLA_XDP_EGRESS],
+						  ifla_xdp_policy, NULL);
+		if (err < 0)
+			goto errout;
+
+		if (xdp[IFLA_XDP_ATTACHED] || xdp[IFLA_XDP_PROG_ID]) {
+			err = -EINVAL;
+			goto errout;
+		}
+
+		if (xdp[IFLA_XDP_FLAGS]) {
+			xdp_flags = nla_get_u32(xdp[IFLA_XDP_FLAGS]);
+			if (xdp_flags & XDP_FLAGS_HW_MODE) {
+				err = -EINVAL;
+				goto errout;
+			}
+			if (xdp_flags & ~XDP_FLAGS_MASK) {
+				err = -EINVAL;
+				goto errout;
+			}
+			if (hweight32(xdp_flags & XDP_FLAGS_MODES) > 1) {
+				err = -EINVAL;
+				goto errout;
+			}
+		}
+
+		if (xdp[IFLA_XDP_FD]) {
+			err = dev_change_xdp_fd(dev, extack,
+						nla_get_s32(xdp[IFLA_XDP_FD]),
+						xdp_flags, true);
+			if (err)
+				goto errout;
+			status |= DO_SETLINK_NOTIFY;
+		}
+	}
+
 errout:
 	if (status & DO_SETLINK_MODIFIED) {
 		if ((status & DO_SETLINK_NOTIFY) == DO_SETLINK_NOTIFY)
diff --git a/tools/include/uapi/linux/if_link.h b/tools/include/uapi/linux/if_link.h
index 024af2d1d0af..7f2246d8122f 100644
--- a/tools/include/uapi/linux/if_link.h
+++ b/tools/include/uapi/linux/if_link.h
@@ -170,6 +170,7 @@ enum {
 	IFLA_PROP_LIST,
 	IFLA_ALT_IFNAME, /* Alternative ifname */
 	IFLA_PERM_ADDRESS,
+	IFLA_XDP_EGRESS, /* nested attribute with 1 or more IFLA_XDP_ attrs */
 	__IFLA_MAX
 };
 
-- 
2.21.1 (Apple Git-122.3)


^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH RFC v4 bpf-next 05/11] net: core: rename netif_receive_generic_xdp to do_generic_xdp_core
  2020-02-27  3:20 [PATCH RFC v4 bpf-next 00/11] Add support for XDP in egress path David Ahern
                   ` (3 preceding siblings ...)
  2020-02-27  3:20 ` [PATCH RFC v4 bpf-next 04/11] net: Add IFLA_XDP_EGRESS for XDP programs in the egress path David Ahern
@ 2020-02-27  3:20 ` David Ahern
  2020-02-27  3:20 ` [PATCH RFC v4 bpf-next 06/11] net: core: Rename do_xdp_generic to do_xdp_generic_rx and export David Ahern
                   ` (6 subsequent siblings)
  11 siblings, 0 replies; 38+ messages in thread
From: David Ahern @ 2020-02-27  3:20 UTC (permalink / raw)
  To: netdev
  Cc: davem, kuba, prashantbhole.linux, jasowang, brouer, toke, mst,
	toshiaki.makita1, daniel, john.fastabend, ast, kafai,
	songliubraving, yhs, andriin, dsahern, David Ahern

From: David Ahern <dahern@digitalocean.com>

In skb generic path, we need a way to run XDP program on skb but
to have customized handling of XDP actions. netif_receive_generic_xdp
will be more helpful in such cases than do_xdp_generic.

This patch prepares netif_receive_generic_xdp() to be used as general
purpose function by renaming it and exporting as a general purpose
function. It will just run XDP program on skb but will not handle XDP
actions.

Move setting xdp->rxq to the caller. This allows this core function
to be used from both Rx and Tx paths with rxq and txq set as needed.

Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Prashant Bhole <prashantbhole.linux@gmail.com>
Signed-off-by: David Ahern <dahern@digitalocean.com>
---
 include/linux/netdevice.h |  2 ++
 net/core/dev.c            | 16 ++++++++--------
 2 files changed, 10 insertions(+), 8 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index d0dd0706ece4..bc58c489e959 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -3701,6 +3701,8 @@ static inline void dev_consume_skb_any(struct sk_buff *skb)
 
 void generic_xdp_tx(struct sk_buff *skb, struct bpf_prog *xdp_prog);
 int do_xdp_generic(struct bpf_prog *xdp_prog, struct sk_buff *skb);
+u32 do_xdp_generic_core(struct sk_buff *skb, struct xdp_buff *xdp,
+			struct bpf_prog *xdp_prog);
 int netif_rx(struct sk_buff *skb);
 int netif_rx_ni(struct sk_buff *skb);
 int netif_receive_skb(struct sk_buff *skb);
diff --git a/net/core/dev.c b/net/core/dev.c
index 15c511c5c7d5..bfa7a64c4e68 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -4498,11 +4498,9 @@ static struct netdev_rx_queue *netif_get_rxqueue(struct sk_buff *skb)
 	return rxqueue;
 }
 
-static u32 netif_receive_generic_xdp(struct sk_buff *skb,
-				     struct xdp_buff *xdp,
-				     struct bpf_prog *xdp_prog)
+u32 do_xdp_generic_core(struct sk_buff *skb, struct xdp_buff *xdp,
+			struct bpf_prog *xdp_prog)
 {
-	struct netdev_rx_queue *rxqueue;
 	void *orig_data, *orig_data_end;
 	u32 metalen, act = XDP_DROP;
 	__be16 orig_eth_type;
@@ -4552,9 +4550,6 @@ static u32 netif_receive_generic_xdp(struct sk_buff *skb,
 	orig_bcast = is_multicast_ether_addr_64bits(eth->h_dest);
 	orig_eth_type = eth->h_proto;
 
-	rxqueue = netif_get_rxqueue(skb);
-	xdp->rxq = &rxqueue->xdp_rxq;
-
 	act = bpf_prog_run_xdp(xdp_prog, xdp);
 
 	/* check if bpf_xdp_adjust_head was used */
@@ -4611,6 +4606,7 @@ static u32 netif_receive_generic_xdp(struct sk_buff *skb,
 
 	return act;
 }
+EXPORT_SYMBOL_GPL(do_xdp_generic_core);
 
 /* When doing generic XDP we have to bypass the qdisc layer and the
  * network taps in order to match in-driver-XDP behavior.
@@ -4643,11 +4639,15 @@ static DEFINE_STATIC_KEY_FALSE(generic_xdp_needed_key);
 int do_xdp_generic(struct bpf_prog *xdp_prog, struct sk_buff *skb)
 {
 	if (xdp_prog) {
+		struct netdev_rx_queue *rxqueue;
 		struct xdp_buff xdp;
 		u32 act;
 		int err;
 
-		act = netif_receive_generic_xdp(skb, &xdp, xdp_prog);
+		rxqueue = netif_get_rxqueue(skb);
+		xdp.rxq = &rxqueue->xdp_rxq;
+
+		act = do_xdp_generic_core(skb, &xdp, xdp_prog);
 		if (act != XDP_PASS) {
 			switch (act) {
 			case XDP_REDIRECT:
-- 
2.21.1 (Apple Git-122.3)


^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH RFC v4 bpf-next 06/11] net: core: Rename do_xdp_generic to do_xdp_generic_rx and export
  2020-02-27  3:20 [PATCH RFC v4 bpf-next 00/11] Add support for XDP in egress path David Ahern
                   ` (4 preceding siblings ...)
  2020-02-27  3:20 ` [PATCH RFC v4 bpf-next 05/11] net: core: rename netif_receive_generic_xdp to do_generic_xdp_core David Ahern
@ 2020-02-27  3:20 ` David Ahern
  2020-02-27  3:20 ` [PATCH RFC v4 bpf-next 07/11] tun: set egress XDP program David Ahern
                   ` (5 subsequent siblings)
  11 siblings, 0 replies; 38+ messages in thread
From: David Ahern @ 2020-02-27  3:20 UTC (permalink / raw)
  To: netdev
  Cc: davem, kuba, prashantbhole.linux, jasowang, brouer, toke, mst,
	toshiaki.makita1, daniel, john.fastabend, ast, kafai,
	songliubraving, yhs, andriin, dsahern, David Ahern

From: David Ahern <dahern@digitalocean.com>

Rename do_xdp_generic to do_xdp_generic_rx to emphasize its use in the
Rx path, and export it as a general purpose function. It will just run
XDP program on skb but will not handle XDP actions.

Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Prashant Bhole <prashantbhole.linux@gmail.com>
Signed-off-by: David Ahern <dahern@digitalocean.com>
---
 drivers/net/tun.c         | 4 ++--
 include/linux/netdevice.h | 2 +-
 net/core/dev.c            | 7 ++++---
 3 files changed, 7 insertions(+), 6 deletions(-)

diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index 650c937ed56b..7cc5a1acaef2 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -1932,7 +1932,7 @@ static ssize_t tun_get_user(struct tun_struct *tun, struct tun_file *tfile,
 		rcu_read_lock();
 		xdp_prog = rcu_dereference(tun->xdp_prog);
 		if (xdp_prog) {
-			ret = do_xdp_generic(xdp_prog, skb);
+			ret = do_xdp_generic_rx(xdp_prog, skb);
 			if (ret != XDP_PASS) {
 				rcu_read_unlock();
 				local_bh_enable();
@@ -2498,7 +2498,7 @@ static int tun_xdp_one(struct tun_struct *tun,
 	skb_probe_transport_header(skb);
 
 	if (skb_xdp) {
-		err = do_xdp_generic(xdp_prog, skb);
+		err = do_xdp_generic_rx(xdp_prog, skb);
 		if (err != XDP_PASS)
 			goto out;
 	}
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index bc58c489e959..c3549006b8fc 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -3700,7 +3700,7 @@ static inline void dev_consume_skb_any(struct sk_buff *skb)
 }
 
 void generic_xdp_tx(struct sk_buff *skb, struct bpf_prog *xdp_prog);
-int do_xdp_generic(struct bpf_prog *xdp_prog, struct sk_buff *skb);
+int do_xdp_generic_rx(struct bpf_prog *xdp_prog, struct sk_buff *skb);
 u32 do_xdp_generic_core(struct sk_buff *skb, struct xdp_buff *xdp,
 			struct bpf_prog *xdp_prog);
 int netif_rx(struct sk_buff *skb);
diff --git a/net/core/dev.c b/net/core/dev.c
index bfa7a64c4e68..c943fc7b8054 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -4636,7 +4636,7 @@ EXPORT_SYMBOL_GPL(generic_xdp_tx);
 
 static DEFINE_STATIC_KEY_FALSE(generic_xdp_needed_key);
 
-int do_xdp_generic(struct bpf_prog *xdp_prog, struct sk_buff *skb)
+int do_xdp_generic_rx(struct bpf_prog *xdp_prog, struct sk_buff *skb)
 {
 	if (xdp_prog) {
 		struct netdev_rx_queue *rxqueue;
@@ -4668,7 +4668,7 @@ int do_xdp_generic(struct bpf_prog *xdp_prog, struct sk_buff *skb)
 	kfree_skb(skb);
 	return XDP_DROP;
 }
-EXPORT_SYMBOL_GPL(do_xdp_generic);
+EXPORT_SYMBOL_GPL(do_xdp_generic_rx);
 
 static int netif_rx_internal(struct sk_buff *skb)
 {
@@ -5017,7 +5017,8 @@ static int __netif_receive_skb_core(struct sk_buff *skb, bool pfmemalloc,
 		int ret2;
 
 		preempt_disable();
-		ret2 = do_xdp_generic(rcu_dereference(skb->dev->xdp_prog), skb);
+		ret2 = do_xdp_generic_rx(rcu_dereference(skb->dev->xdp_prog),
+					 skb);
 		preempt_enable();
 
 		if (ret2 != XDP_PASS)
-- 
2.21.1 (Apple Git-122.3)


^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH RFC v4 bpf-next 07/11] tun: set egress XDP program
  2020-02-27  3:20 [PATCH RFC v4 bpf-next 00/11] Add support for XDP in egress path David Ahern
                   ` (5 preceding siblings ...)
  2020-02-27  3:20 ` [PATCH RFC v4 bpf-next 06/11] net: core: Rename do_xdp_generic to do_xdp_generic_rx and export David Ahern
@ 2020-02-27  3:20 ` David Ahern
  2020-03-02  3:32   ` Jason Wang
  2020-02-27  3:20 ` [PATCH RFC v4 bpf-next 08/11] tun: Support xdp in the Tx path for skb David Ahern
                   ` (4 subsequent siblings)
  11 siblings, 1 reply; 38+ messages in thread
From: David Ahern @ 2020-02-27  3:20 UTC (permalink / raw)
  To: netdev
  Cc: davem, kuba, prashantbhole.linux, jasowang, brouer, toke, mst,
	toshiaki.makita1, daniel, john.fastabend, ast, kafai,
	songliubraving, yhs, andriin, dsahern, David Ahern

From: David Ahern <dahern@digitalocean.com>

This patch adds a way to set tx path XDP program in tun driver
by handling XDP_SETUP_PROG_EGRESS and XDP_QUERY_PROG_EGRESS in
ndo_bpf handler.

Signed-off-by: David Ahern <dahern@digitalocean.com>
Signed-off-by: Prashant Bhole <prashantbhole.linux@gmail.com>
---
 drivers/net/tun.c | 34 ++++++++++++++++++++++++++--------
 1 file changed, 26 insertions(+), 8 deletions(-)

diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index 7cc5a1acaef2..6aae398b904b 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -239,6 +239,7 @@ struct tun_struct {
 	u32 rx_batched;
 	struct tun_pcpu_stats __percpu *pcpu_stats;
 	struct bpf_prog __rcu *xdp_prog;
+	struct bpf_prog __rcu *xdp_egress_prog;
 	struct tun_prog __rcu *steering_prog;
 	struct tun_prog __rcu *filter_prog;
 	struct ethtool_link_ksettings link_ksettings;
@@ -1189,15 +1190,21 @@ tun_net_get_stats64(struct net_device *dev, struct rtnl_link_stats64 *stats)
 }
 
 static int tun_xdp_set(struct net_device *dev, struct bpf_prog *prog,
-		       struct netlink_ext_ack *extack)
+		       bool egress, struct netlink_ext_ack *extack)
 {
 	struct tun_struct *tun = netdev_priv(dev);
 	struct tun_file *tfile;
 	struct bpf_prog *old_prog;
 	int i;
 
-	old_prog = rtnl_dereference(tun->xdp_prog);
-	rcu_assign_pointer(tun->xdp_prog, prog);
+	if (egress) {
+		old_prog = rtnl_dereference(tun->xdp_egress_prog);
+		rcu_assign_pointer(tun->xdp_egress_prog, prog);
+	} else {
+		old_prog = rtnl_dereference(tun->xdp_prog);
+		rcu_assign_pointer(tun->xdp_prog, prog);
+	}
+
 	if (old_prog)
 		bpf_prog_put(old_prog);
 
@@ -1218,12 +1225,16 @@ static int tun_xdp_set(struct net_device *dev, struct bpf_prog *prog,
 	return 0;
 }
 
-static u32 tun_xdp_query(struct net_device *dev)
+static u32 tun_xdp_query(struct net_device *dev, bool egress)
 {
 	struct tun_struct *tun = netdev_priv(dev);
 	const struct bpf_prog *xdp_prog;
 
-	xdp_prog = rtnl_dereference(tun->xdp_prog);
+	if (egress)
+		xdp_prog = rtnl_dereference(tun->xdp_egress_prog);
+	else
+		xdp_prog = rtnl_dereference(tun->xdp_prog);
+
 	if (xdp_prog)
 		return xdp_prog->aux->id;
 
@@ -1234,13 +1245,20 @@ static int tun_xdp(struct net_device *dev, struct netdev_bpf *xdp)
 {
 	switch (xdp->command) {
 	case XDP_SETUP_PROG:
-		return tun_xdp_set(dev, xdp->prog, xdp->extack);
+		return tun_xdp_set(dev, xdp->prog, false, xdp->extack);
+	case XDP_SETUP_PROG_EGRESS:
+		return tun_xdp_set(dev, xdp->prog, true, xdp->extack);
 	case XDP_QUERY_PROG:
-		xdp->prog_id = tun_xdp_query(dev);
-		return 0;
+		xdp->prog_id = tun_xdp_query(dev, false);
+		break;
+	case XDP_QUERY_PROG_EGRESS:
+		xdp->prog_id = tun_xdp_query(dev, true);
+		break;
 	default:
 		return -EINVAL;
 	}
+
+	return 0;
 }
 
 static int tun_net_change_carrier(struct net_device *dev, bool new_carrier)
-- 
2.21.1 (Apple Git-122.3)


^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH RFC v4 bpf-next 08/11] tun: Support xdp in the Tx path for skb
  2020-02-27  3:20 [PATCH RFC v4 bpf-next 00/11] Add support for XDP in egress path David Ahern
                   ` (6 preceding siblings ...)
  2020-02-27  3:20 ` [PATCH RFC v4 bpf-next 07/11] tun: set egress XDP program David Ahern
@ 2020-02-27  3:20 ` David Ahern
  2020-03-02  3:28   ` Jason Wang
  2020-03-03 10:46   ` Jesper Dangaard Brouer
  2020-02-27  3:20 ` [PATCH RFC v4 bpf-next 09/11] tun: Support xdp in the Tx path for xdp_frames David Ahern
                   ` (3 subsequent siblings)
  11 siblings, 2 replies; 38+ messages in thread
From: David Ahern @ 2020-02-27  3:20 UTC (permalink / raw)
  To: netdev
  Cc: davem, kuba, prashantbhole.linux, jasowang, brouer, toke, mst,
	toshiaki.makita1, daniel, john.fastabend, ast, kafai,
	songliubraving, yhs, andriin, dsahern, David Ahern

From: David Ahern <dahern@digitalocean.com>

Add support to run Tx path program on packets arriving at a tun
device as an skb.

XDP_TX return code means move the packet to the Tx path of the device.
For a program run in the Tx / egress path, XDP_TX is essentially the
same as "continue on" which is XDP_PASS.

Conceptually, XDP_REDIRECT for this path can work the same as it
does for the Rx path, but that return code is left for a follow
on series.

Signed-off-by: Prashant Bhole <prashantbhole.linux@gmail.com>
Signed-off-by: David Ahern <dahern@digitalocean.com>
---
 drivers/net/tun.c | 69 ++++++++++++++++++++++++++++++++++++++++++++---
 1 file changed, 66 insertions(+), 3 deletions(-)

diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index 6aae398b904b..dcae6521a39d 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -1059,6 +1059,63 @@ static unsigned int run_ebpf_filter(struct tun_struct *tun,
 	return len;
 }
 
+static struct sk_buff *tun_prepare_xdp_skb(struct sk_buff *skb)
+{
+	if (skb_shared(skb) || skb_cloned(skb)) {
+		struct sk_buff *nskb;
+
+		nskb = skb_copy(skb, GFP_ATOMIC);
+		consume_skb(skb);
+		return nskb;
+	}
+
+	return skb;
+}
+
+static u32 tun_do_xdp_tx_generic(struct tun_struct *tun,
+				 struct net_device *dev,
+				 struct sk_buff *skb)
+{
+	struct bpf_prog *xdp_prog;
+	u32 act = XDP_PASS;
+
+	xdp_prog = rcu_dereference(tun->xdp_egress_prog);
+	if (xdp_prog) {
+		struct xdp_txq_info txq = { .dev = dev };
+		struct xdp_buff xdp;
+
+		skb = tun_prepare_xdp_skb(skb);
+		if (!skb) {
+			act = XDP_DROP;
+			goto out;
+		}
+
+		xdp.txq = &txq;
+
+		act = do_xdp_generic_core(skb, &xdp, xdp_prog);
+		switch (act) {
+		case XDP_TX:    /* for Tx path, XDP_TX == XDP_PASS */
+			act = XDP_PASS;
+			break;
+		case XDP_PASS:
+			break;
+		case XDP_REDIRECT:
+			/* fall through */
+		default:
+			bpf_warn_invalid_xdp_action(act);
+			/* fall through */
+		case XDP_ABORTED:
+			trace_xdp_exception(tun->dev, xdp_prog, act);
+			/* fall through */
+		case XDP_DROP:
+			break;
+		}
+	}
+
+out:
+	return act;
+}
+
 /* Net device start xmit */
 static netdev_tx_t tun_net_xmit(struct sk_buff *skb, struct net_device *dev)
 {
@@ -1066,6 +1123,7 @@ static netdev_tx_t tun_net_xmit(struct sk_buff *skb, struct net_device *dev)
 	int txq = skb->queue_mapping;
 	struct tun_file *tfile;
 	int len = skb->len;
+	u32 act;
 
 	rcu_read_lock();
 	tfile = rcu_dereference(tun->tfiles[txq]);
@@ -1107,9 +1165,13 @@ static netdev_tx_t tun_net_xmit(struct sk_buff *skb, struct net_device *dev)
 
 	nf_reset_ct(skb);
 
-	if (ptr_ring_produce(&tfile->tx_ring, skb))
+	act = tun_do_xdp_tx_generic(tun, dev, skb);
+	if (act != XDP_PASS)
 		goto drop;
 
+	if (ptr_ring_produce(&tfile->tx_ring, skb))
+		goto err_out;
+
 	/* Notify and wake up reader process */
 	if (tfile->flags & TUN_FASYNC)
 		kill_fasync(&tfile->fasync, SIGIO, POLL_IN);
@@ -1118,10 +1180,11 @@ static netdev_tx_t tun_net_xmit(struct sk_buff *skb, struct net_device *dev)
 	rcu_read_unlock();
 	return NETDEV_TX_OK;
 
-drop:
-	this_cpu_inc(tun->pcpu_stats->tx_dropped);
+err_out:
 	skb_tx_error(skb);
 	kfree_skb(skb);
+drop:
+	this_cpu_inc(tun->pcpu_stats->tx_dropped);
 	rcu_read_unlock();
 	return NET_XMIT_DROP;
 }
-- 
2.21.1 (Apple Git-122.3)


^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH RFC v4 bpf-next 09/11] tun: Support xdp in the Tx path for xdp_frames
  2020-02-27  3:20 [PATCH RFC v4 bpf-next 00/11] Add support for XDP in egress path David Ahern
                   ` (7 preceding siblings ...)
  2020-02-27  3:20 ` [PATCH RFC v4 bpf-next 08/11] tun: Support xdp in the Tx path for skb David Ahern
@ 2020-02-27  3:20 ` David Ahern
  2020-03-02 18:30   ` Alexei Starovoitov
  2020-03-03 10:40   ` Jesper Dangaard Brouer
  2020-02-27  3:20 ` [PATCH RFC v4 bpf-next 10/11] libbpf: Add egress XDP support David Ahern
                   ` (2 subsequent siblings)
  11 siblings, 2 replies; 38+ messages in thread
From: David Ahern @ 2020-02-27  3:20 UTC (permalink / raw)
  To: netdev
  Cc: davem, kuba, prashantbhole.linux, jasowang, brouer, toke, mst,
	toshiaki.makita1, daniel, john.fastabend, ast, kafai,
	songliubraving, yhs, andriin, dsahern, David Ahern

From: David Ahern <dahern@digitalocean.com>

Add support to run Tx path program on packets arriving at a tun
device via XDP redirect.

XDP_TX return code means move the packet to the Tx path of the device.
For a program run in the Tx / egress path, XDP_TX is essentially the
same as "continue on" which is XDP_PASS.

Conceptually, XDP_REDIRECT for this path can work the same as it
does for the Rx path, but that return code is left for a follow
on series.

Signed-off-by: Prashant Bhole <prashantbhole.linux@gmail.com>
Signed-off-by: David Ahern <dahern@digitalocean.com>
---
 drivers/net/tun.c | 49 +++++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 47 insertions(+), 2 deletions(-)

diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index dcae6521a39d..d3fc7e921c85 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -1359,10 +1359,50 @@ static void __tun_xdp_flush_tfile(struct tun_file *tfile)
 	tfile->socket.sk->sk_data_ready(tfile->socket.sk);
 }
 
+static u32 tun_do_xdp_tx(struct tun_struct *tun, struct tun_file *tfile,
+			 struct xdp_frame *frame, struct xdp_txq_info *txq)
+{
+	struct bpf_prog *xdp_prog;
+	u32 act = XDP_PASS;
+
+	xdp_prog = rcu_dereference(tun->xdp_egress_prog);
+	if (xdp_prog) {
+		struct xdp_buff xdp;
+
+		xdp.data_hard_start = frame->data - frame->headroom;
+		xdp.data = frame->data;
+		xdp.data_end = xdp.data + frame->len;
+		xdp_set_data_meta_invalid(&xdp);
+		xdp.txq = txq;
+
+		act = bpf_prog_run_xdp(xdp_prog, &xdp);
+		switch (act) {
+		case XDP_TX:    /* for Tx path, XDP_TX == XDP_PASS */
+			act = XDP_PASS;
+			break;
+		case XDP_PASS:
+			break;
+		case XDP_REDIRECT:
+			/* fall through */
+		default:
+			bpf_warn_invalid_xdp_action(act);
+			/* fall through */
+		case XDP_ABORTED:
+			trace_xdp_exception(tun->dev, xdp_prog, act);
+			/* fall through */
+		case XDP_DROP:
+			break;
+		}
+	}
+
+	return act;
+}
+
 static int tun_xdp_xmit(struct net_device *dev, int n,
 			struct xdp_frame **frames, u32 flags)
 {
 	struct tun_struct *tun = netdev_priv(dev);
+	struct xdp_txq_info txq = { .dev = dev };
 	struct tun_file *tfile;
 	u32 numqueues;
 	int drops = 0;
@@ -1389,12 +1429,17 @@ static int tun_xdp_xmit(struct net_device *dev, int n,
 	spin_lock(&tfile->tx_ring.producer_lock);
 	for (i = 0; i < n; i++) {
 		struct xdp_frame *xdp = frames[i];
+		void *frame;
+
+		if (tun_do_xdp_tx(tun, tfile, xdp, &txq) != XDP_PASS)
+			goto drop;
+
 		/* Encode the XDP flag into lowest bit for consumer to differ
 		 * XDP buffer from sk_buff.
 		 */
-		void *frame = tun_xdp_to_ptr(xdp);
-
+		frame = tun_xdp_to_ptr(xdp);
 		if (__ptr_ring_produce(&tfile->tx_ring, frame)) {
+drop:
 			this_cpu_inc(tun->pcpu_stats->tx_dropped);
 			xdp_return_frame_rx_napi(xdp);
 			drops++;
-- 
2.21.1 (Apple Git-122.3)


^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH RFC v4 bpf-next 10/11] libbpf: Add egress XDP support
  2020-02-27  3:20 [PATCH RFC v4 bpf-next 00/11] Add support for XDP in egress path David Ahern
                   ` (8 preceding siblings ...)
  2020-02-27  3:20 ` [PATCH RFC v4 bpf-next 09/11] tun: Support xdp in the Tx path for xdp_frames David Ahern
@ 2020-02-27  3:20 ` David Ahern
  2020-02-27  3:20 ` [PATCH RFC v4 bpf-next 11/11] samples/bpf: xdp1, add " David Ahern
  2020-02-27 11:55 ` [PATCH RFC v4 bpf-next 00/11] Add support for XDP in egress path Toke Høiland-Jørgensen
  11 siblings, 0 replies; 38+ messages in thread
From: David Ahern @ 2020-02-27  3:20 UTC (permalink / raw)
  To: netdev
  Cc: davem, kuba, prashantbhole.linux, jasowang, brouer, toke, mst,
	toshiaki.makita1, daniel, john.fastabend, ast, kafai,
	songliubraving, yhs, andriin, dsahern, David Ahern

From: David Ahern <dahern@digitalocean.com>

Patch adds egress XDP support in libbpf.

First, new section name hint, xdp_egress, is added to set expected attach
type at program load. Programs can use xdp_egress as the prefix in
the SEC statement to load the program with the BPF_XDP_EGRESS
attach type set.

Second, new APIs are added that parallel the existing xdp ones which
can be changed:
        bpf_set_link_xdp_egress_fd - attach program at fd to device as
                                     xdp egress
        bpf_get_link_xdp_egress_id - get id for xdp egress program
        bpf_get_link_xdp_egress_info - get info for xdp egress program

Internally, the libbpf code is re-factored to be common for both
XDP use cases with a new egress argument to specify which netlink
attribute to use.

Signed-off-by: Prashant Bhole <prashantbhole.linux@gmail.com>
Co-developed-by: David Ahern <dahern@digitalocean.com>
Signed-off-by: David Ahern <dahern@digitalocean.com>
---
 tools/lib/bpf/libbpf.c   |  2 ++
 tools/lib/bpf/libbpf.h   |  6 +++++
 tools/lib/bpf/libbpf.map |  3 +++
 tools/lib/bpf/netlink.c  | 52 ++++++++++++++++++++++++++++++++++------
 4 files changed, 56 insertions(+), 7 deletions(-)

diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
index 996162801f7a..d90f7f034aad 100644
--- a/tools/lib/bpf/libbpf.c
+++ b/tools/lib/bpf/libbpf.c
@@ -6295,6 +6295,8 @@ static const struct bpf_sec_def section_defs[] = {
 	SEC_DEF("freplace/", EXT,
 		.is_attach_btf = true,
 		.attach_fn = attach_trace),
+	BPF_EAPROG_SEC("xdp_egress",		BPF_PROG_TYPE_XDP,
+						BPF_XDP_EGRESS),
 	BPF_PROG_SEC("xdp",			BPF_PROG_TYPE_XDP),
 	BPF_PROG_SEC("perf_event",		BPF_PROG_TYPE_PERF_EVENT),
 	BPF_PROG_SEC("lwt_in",			BPF_PROG_TYPE_LWT_IN),
diff --git a/tools/lib/bpf/libbpf.h b/tools/lib/bpf/libbpf.h
index 02fc58a21a7f..6d4071215b06 100644
--- a/tools/lib/bpf/libbpf.h
+++ b/tools/lib/bpf/libbpf.h
@@ -443,6 +443,12 @@ LIBBPF_API int bpf_set_link_xdp_fd(int ifindex, int fd, __u32 flags);
 LIBBPF_API int bpf_get_link_xdp_id(int ifindex, __u32 *prog_id, __u32 flags);
 LIBBPF_API int bpf_get_link_xdp_info(int ifindex, struct xdp_link_info *info,
 				     size_t info_size, __u32 flags);
+LIBBPF_API int bpf_set_link_xdp_egress_fd(int ifindex, int fd, __u32 flags);
+LIBBPF_API int bpf_get_link_xdp_egress_id(int ifindex, __u32 *prog_id,
+					  __u32 flags);
+LIBBPF_API int bpf_get_link_xdp_egress_info(int ifindex,
+					    struct xdp_link_info *info,
+					    size_t info_size, __u32 flags);
 
 struct perf_buffer;
 
diff --git a/tools/lib/bpf/libbpf.map b/tools/lib/bpf/libbpf.map
index 7b014c8cdece..7441b2bd267a 100644
--- a/tools/lib/bpf/libbpf.map
+++ b/tools/lib/bpf/libbpf.map
@@ -239,4 +239,7 @@ LIBBPF_0.0.7 {
 LIBBPF_0.0.8 {
 	global:
 		bpf_program__set_attach_target;
+		bpf_set_link_xdp_egress_fd;
+		bpf_get_link_xdp_egress_id;
+		bpf_get_link_xdp_egress_info;
 } LIBBPF_0.0.7;
diff --git a/tools/lib/bpf/netlink.c b/tools/lib/bpf/netlink.c
index 431bd25c6cdb..3c53c5dff122 100644
--- a/tools/lib/bpf/netlink.c
+++ b/tools/lib/bpf/netlink.c
@@ -28,6 +28,7 @@ typedef int (*__dump_nlmsg_t)(struct nlmsghdr *nlmsg, libbpf_dump_nlmsg_t,
 struct xdp_id_md {
 	int ifindex;
 	__u32 flags;
+	bool egress;
 	struct xdp_link_info info;
 };
 
@@ -132,7 +133,7 @@ static int bpf_netlink_recv(int sock, __u32 nl_pid, int seq,
 	return ret;
 }
 
-int bpf_set_link_xdp_fd(int ifindex, int fd, __u32 flags)
+static int __bpf_set_link_xdp_fd(int ifindex, int fd, __u32 flags, bool egress)
 {
 	int sock, seq = 0, ret;
 	struct nlattr *nla, *nla_xdp;
@@ -159,7 +160,7 @@ int bpf_set_link_xdp_fd(int ifindex, int fd, __u32 flags)
 	/* started nested attribute for XDP */
 	nla = (struct nlattr *)(((char *)&req)
 				+ NLMSG_ALIGN(req.nh.nlmsg_len));
-	nla->nla_type = NLA_F_NESTED | IFLA_XDP;
+	nla->nla_type = NLA_F_NESTED | (egress ? IFLA_XDP_EGRESS : IFLA_XDP);
 	nla->nla_len = NLA_HDRLEN;
 
 	/* add XDP fd */
@@ -191,6 +192,16 @@ int bpf_set_link_xdp_fd(int ifindex, int fd, __u32 flags)
 	return ret;
 }
 
+int bpf_set_link_xdp_fd(int ifindex, int fd, __u32 flags)
+{
+	return __bpf_set_link_xdp_fd(ifindex, fd, flags, false);
+}
+
+int bpf_set_link_xdp_egress_fd(int ifindex, int fd, __u32 flags)
+{
+	return __bpf_set_link_xdp_fd(ifindex, fd, flags, true);
+}
+
 static int __dump_link_nlmsg(struct nlmsghdr *nlh,
 			     libbpf_dump_nlmsg_t dump_link_nlmsg, void *cookie)
 {
@@ -211,15 +222,17 @@ static int get_xdp_info(void *cookie, void *msg, struct nlattr **tb)
 	struct nlattr *xdp_tb[IFLA_XDP_MAX + 1];
 	struct xdp_id_md *xdp_id = cookie;
 	struct ifinfomsg *ifinfo = msg;
+	unsigned int atype;
 	int ret;
 
 	if (xdp_id->ifindex && xdp_id->ifindex != ifinfo->ifi_index)
 		return 0;
 
-	if (!tb[IFLA_XDP])
+	atype = xdp_id->egress ? IFLA_XDP_EGRESS : IFLA_XDP;
+	if (!tb[atype])
 		return 0;
 
-	ret = libbpf_nla_parse_nested(xdp_tb, IFLA_XDP_MAX, tb[IFLA_XDP], NULL);
+	ret = libbpf_nla_parse_nested(xdp_tb, IFLA_XDP_MAX, tb[atype], NULL);
 	if (ret)
 		return ret;
 
@@ -251,10 +264,10 @@ static int get_xdp_info(void *cookie, void *msg, struct nlattr **tb)
 	return 0;
 }
 
-int bpf_get_link_xdp_info(int ifindex, struct xdp_link_info *info,
-			  size_t info_size, __u32 flags)
+static int __bpf_get_link_xdp_info(int ifindex, struct xdp_link_info *info,
+				   size_t info_size, __u32 flags, bool egress)
 {
-	struct xdp_id_md xdp_id = {};
+	struct xdp_id_md xdp_id;
 	int sock, ret;
 	__u32 nl_pid;
 	__u32 mask;
@@ -274,6 +287,7 @@ int bpf_get_link_xdp_info(int ifindex, struct xdp_link_info *info,
 
 	xdp_id.ifindex = ifindex;
 	xdp_id.flags = flags;
+	xdp_id.egress = egress;
 
 	ret = libbpf_nl_get_link(sock, nl_pid, get_xdp_info, &xdp_id);
 	if (!ret) {
@@ -287,6 +301,18 @@ int bpf_get_link_xdp_info(int ifindex, struct xdp_link_info *info,
 	return ret;
 }
 
+int bpf_get_link_xdp_info(int ifindex, struct xdp_link_info *info,
+			  size_t info_size, __u32 flags)
+{
+	return __bpf_get_link_xdp_info(ifindex, info, info_size, flags, false);
+}
+
+int bpf_get_link_xdp_egress_info(int ifindex, struct xdp_link_info *info,
+				 size_t info_size, __u32 flags)
+{
+	return __bpf_get_link_xdp_info(ifindex, info, info_size, flags, true);
+}
+
 static __u32 get_xdp_id(struct xdp_link_info *info, __u32 flags)
 {
 	if (info->attach_mode != XDP_ATTACHED_MULTI)
@@ -313,6 +339,18 @@ int bpf_get_link_xdp_id(int ifindex, __u32 *prog_id, __u32 flags)
 	return ret;
 }
 
+int bpf_get_link_xdp_egress_id(int ifindex, __u32 *prog_id, __u32 flags)
+{
+	struct xdp_link_info info;
+	int ret;
+
+	ret = bpf_get_link_xdp_egress_info(ifindex, &info, sizeof(info), flags);
+	if (!ret)
+		*prog_id = get_xdp_id(&info, flags);
+
+	return ret;
+}
+
 int libbpf_nl_get_link(int sock, unsigned int nl_pid,
 		       libbpf_dump_nlmsg_t dump_link_nlmsg, void *cookie)
 {
-- 
2.21.1 (Apple Git-122.3)


^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH RFC v4 bpf-next 11/11] samples/bpf: xdp1, add egress XDP support
  2020-02-27  3:20 [PATCH RFC v4 bpf-next 00/11] Add support for XDP in egress path David Ahern
                   ` (9 preceding siblings ...)
  2020-02-27  3:20 ` [PATCH RFC v4 bpf-next 10/11] libbpf: Add egress XDP support David Ahern
@ 2020-02-27  3:20 ` David Ahern
  2020-02-27 11:55 ` [PATCH RFC v4 bpf-next 00/11] Add support for XDP in egress path Toke Høiland-Jørgensen
  11 siblings, 0 replies; 38+ messages in thread
From: David Ahern @ 2020-02-27  3:20 UTC (permalink / raw)
  To: netdev
  Cc: davem, kuba, prashantbhole.linux, jasowang, brouer, toke, mst,
	toshiaki.makita1, daniel, john.fastabend, ast, kafai,
	songliubraving, yhs, andriin, dsahern

From: Prashant Bhole <prashantbhole.linux@gmail.com>

xdp1 and xdp2 now accept -E flag to set XDP program in the egress
path.

Signed-off-by: Prashant Bhole <prashantbhole.linux@gmail.com>
---
 samples/bpf/xdp1_user.c | 30 +++++++++++++++++++++++++-----
 1 file changed, 25 insertions(+), 5 deletions(-)

diff --git a/samples/bpf/xdp1_user.c b/samples/bpf/xdp1_user.c
index c447ad9e3a1d..72c5bedbd030 100644
--- a/samples/bpf/xdp1_user.c
+++ b/samples/bpf/xdp1_user.c
@@ -21,17 +21,27 @@
 static int ifindex;
 static __u32 xdp_flags = XDP_FLAGS_UPDATE_IF_NOEXIST;
 static __u32 prog_id;
+static bool egress;
 
 static void int_exit(int sig)
 {
 	__u32 curr_prog_id = 0;
+	int err;
 
-	if (bpf_get_link_xdp_id(ifindex, &curr_prog_id, xdp_flags)) {
+	if (egress)
+		err = bpf_get_link_xdp_egress_id(ifindex, &curr_prog_id,
+						 xdp_flags);
+	else
+		err = bpf_get_link_xdp_id(ifindex, &curr_prog_id, xdp_flags);
+	if (err) {
 		printf("bpf_get_link_xdp_id failed\n");
 		exit(1);
 	}
 	if (prog_id == curr_prog_id)
-		bpf_set_link_xdp_fd(ifindex, -1, xdp_flags);
+		if (egress)
+			bpf_set_link_xdp_egress_fd(ifindex, -1, xdp_flags);
+		else
+			bpf_set_link_xdp_fd(ifindex, -1, xdp_flags);
 	else if (!curr_prog_id)
 		printf("couldn't find a prog id on a given interface\n");
 	else
@@ -73,7 +83,8 @@ static void usage(const char *prog)
 		"OPTS:\n"
 		"    -S    use skb-mode\n"
 		"    -N    enforce native mode\n"
-		"    -F    force loading prog\n",
+		"    -F    force loading prog\n"
+		"    -E	   egress path program\n",
 		prog);
 }
 
@@ -85,7 +96,7 @@ int main(int argc, char **argv)
 	};
 	struct bpf_prog_info info = {};
 	__u32 info_len = sizeof(info);
-	const char *optstr = "FSN";
+	const char *optstr = "FSNE";
 	int prog_fd, map_fd, opt;
 	struct bpf_object *obj;
 	struct bpf_map *map;
@@ -103,6 +114,9 @@ int main(int argc, char **argv)
 		case 'F':
 			xdp_flags &= ~XDP_FLAGS_UPDATE_IF_NOEXIST;
 			break;
+		case 'E':
+			egress = true;
+			break;
 		default:
 			usage(basename(argv[0]));
 			return 1;
@@ -130,6 +144,8 @@ int main(int argc, char **argv)
 
 	snprintf(filename, sizeof(filename), "%s_kern.o", argv[0]);
 	prog_load_attr.file = filename;
+	if (egress)
+		prog_load_attr.expected_attach_type = BPF_XDP_EGRESS;
 
 	if (bpf_prog_load_xattr(&prog_load_attr, &obj, &prog_fd))
 		return 1;
@@ -149,7 +165,11 @@ int main(int argc, char **argv)
 	signal(SIGINT, int_exit);
 	signal(SIGTERM, int_exit);
 
-	if (bpf_set_link_xdp_fd(ifindex, prog_fd, xdp_flags) < 0) {
+	if (egress)
+		err = bpf_set_link_xdp_egress_fd(ifindex, prog_fd, xdp_flags);
+	else
+		err = bpf_set_link_xdp_fd(ifindex, prog_fd, xdp_flags);
+	if (err < 0) {
 		printf("link set xdp fd failed\n");
 		return 1;
 	}
-- 
2.21.1 (Apple Git-122.3)


^ permalink raw reply related	[flat|nested] 38+ messages in thread

* Re: [PATCH RFC v4 bpf-next 03/11] xdp: Add xdp_txq_info to xdp_buff
  2020-02-27  3:20 ` [PATCH RFC v4 bpf-next 03/11] xdp: Add xdp_txq_info to xdp_buff David Ahern
@ 2020-02-27  8:00   ` Jesper Dangaard Brouer
  2020-02-27 11:58     ` Toke Høiland-Jørgensen
  2020-02-27 20:44     ` David Ahern
  0 siblings, 2 replies; 38+ messages in thread
From: Jesper Dangaard Brouer @ 2020-02-27  8:00 UTC (permalink / raw)
  To: David Ahern
  Cc: netdev, davem, kuba, prashantbhole.linux, jasowang, toke, mst,
	toshiaki.makita1, daniel, john.fastabend, ast, kafai,
	songliubraving, yhs, andriin, dsahern, David Ahern, brouer

On Wed, 26 Feb 2020 20:20:05 -0700
David Ahern <dsahern@kernel.org> wrote:

> From: David Ahern <dahern@digitalocean.com>
> 
> Add xdp_txq_info as the Tx counterpart to xdp_rxq_info. At the
> moment only the device is added. Other fields (queue_index)
> can be added as use cases arise.
> 
> From a UAPI perspective, egress_ifindex is a union with ingress_ifindex
> since only one applies based on where the program is attached.
> 
> Signed-off-by: David Ahern <dahern@digitalocean.com>
> ---
>  include/net/xdp.h        |  5 +++++
>  include/uapi/linux/bpf.h |  6 ++++--
>  net/core/filter.c        | 27 +++++++++++++++++++--------
>  3 files changed, 28 insertions(+), 10 deletions(-)
> 
> diff --git a/include/net/xdp.h b/include/net/xdp.h
> index 40c6d3398458..5584b9db86fe 100644
> --- a/include/net/xdp.h
> +++ b/include/net/xdp.h
> @@ -63,6 +63,10 @@ struct xdp_rxq_info {
>  	struct xdp_mem_info mem;
>  } ____cacheline_aligned; /* perf critical, avoid false-sharing */
>  
> +struct xdp_txq_info {
> +	struct net_device *dev;
> +};
> +
>  struct xdp_buff {
>  	void *data;
>  	void *data_end;
> @@ -70,6 +74,7 @@ struct xdp_buff {
>  	void *data_hard_start;
>  	unsigned long handle;
>  	struct xdp_rxq_info *rxq;
> +	struct xdp_txq_info *txq;
>  };
>  
>  struct xdp_frame {
> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> index 7850f8683b81..5e3f8aefad41 100644
> --- a/include/uapi/linux/bpf.h
> +++ b/include/uapi/linux/bpf.h
> @@ -3334,8 +3334,10 @@ struct xdp_md {
>  	__u32 data;
>  	__u32 data_end;
>  	__u32 data_meta;
> -	/* Below access go through struct xdp_rxq_info */
> -	__u32 ingress_ifindex; /* rxq->dev->ifindex */
> +	union {
> +		__u32 ingress_ifindex; /* rxq->dev->ifindex */
> +		__u32 egress_ifindex;  /* txq->dev->ifindex */
> +	};

Are we sure it is wise to "union share" (struct) xdp_md as the
XDP-context in the XDP programs, with different expected_attach_type?
As this allows the XDP-programmer to code an EGRESS program that access
ctx->ingress_ifindex, this will under the hood be translated to
ctx->egress_ifindex, because from the compilers-PoV this will just be an
offset.

We are setting up the XDP-programmer for a long debugging session, as
she will be expecting to read 'ingress_ifindex', but will be getting
'egress_ifindex'.  (As the compiler cannot warn her, and it is also
correct seen from the verifier).


>  	__u32 rx_queue_index;  /* rxq->queue_index  */

So, the TX program can still read 'rx_queue_index', is this wise?
(It should be easy to catch below and reject).


>  };
>  
> diff --git a/net/core/filter.c b/net/core/filter.c
> index c7cc98c55621..d1c65dccd671 100644
> --- a/net/core/filter.c
> +++ b/net/core/filter.c
> @@ -7716,14 +7716,25 @@ static u32 xdp_convert_ctx_access(enum bpf_access_type type,
>  				      offsetof(struct xdp_buff, data_end));
>  		break;
>  	case offsetof(struct xdp_md, ingress_ifindex):
> -		*insn++ = BPF_LDX_MEM(BPF_FIELD_SIZEOF(struct xdp_buff, rxq),
> -				      si->dst_reg, si->src_reg,
> -				      offsetof(struct xdp_buff, rxq));
> -		*insn++ = BPF_LDX_MEM(BPF_FIELD_SIZEOF(struct xdp_rxq_info, dev),
> -				      si->dst_reg, si->dst_reg,
> -				      offsetof(struct xdp_rxq_info, dev));
> -		*insn++ = BPF_LDX_MEM(BPF_W, si->dst_reg, si->dst_reg,
> -				      offsetof(struct net_device, ifindex));
> +		if (prog->expected_attach_type == BPF_XDP_EGRESS) {
> +			*insn++ = BPF_LDX_MEM(BPF_FIELD_SIZEOF(struct xdp_buff, txq),
> +					      si->dst_reg, si->src_reg,
> +					      offsetof(struct xdp_buff, txq));
> +			*insn++ = BPF_LDX_MEM(BPF_FIELD_SIZEOF(struct xdp_txq_info, dev),
> +					      si->dst_reg, si->dst_reg,
> +					      offsetof(struct xdp_txq_info, dev));
> +			*insn++ = BPF_LDX_MEM(BPF_W, si->dst_reg, si->dst_reg,
> +					      offsetof(struct net_device, ifindex));
> +		} else {
> +			*insn++ = BPF_LDX_MEM(BPF_FIELD_SIZEOF(struct xdp_buff, rxq),
> +					      si->dst_reg, si->src_reg,
> +					      offsetof(struct xdp_buff, rxq));
> +			*insn++ = BPF_LDX_MEM(BPF_FIELD_SIZEOF(struct xdp_rxq_info, dev),
> +					      si->dst_reg, si->dst_reg,
> +					      offsetof(struct xdp_rxq_info, dev));
> +			*insn++ = BPF_LDX_MEM(BPF_W, si->dst_reg, si->dst_reg,
> +					      offsetof(struct net_device, ifindex));
> +		}
>  		break;
>  	case offsetof(struct xdp_md, rx_queue_index):
>  		*insn++ = BPF_LDX_MEM(BPF_FIELD_SIZEOF(struct xdp_buff, rxq),

We can catch and disallow access to rx_queue_index from expected_attach_type
BPF_XDP_EGRESS, here.  But then we are adding more code to handle/separate
egress from normal RX/ingress.

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH RFC v4 bpf-next 00/11] Add support for XDP in egress path
  2020-02-27  3:20 [PATCH RFC v4 bpf-next 00/11] Add support for XDP in egress path David Ahern
                   ` (10 preceding siblings ...)
  2020-02-27  3:20 ` [PATCH RFC v4 bpf-next 11/11] samples/bpf: xdp1, add " David Ahern
@ 2020-02-27 11:55 ` Toke Høiland-Jørgensen
  2020-02-27 16:22   ` Alexei Starovoitov
  11 siblings, 1 reply; 38+ messages in thread
From: Toke Høiland-Jørgensen @ 2020-02-27 11:55 UTC (permalink / raw)
  To: David Ahern, netdev
  Cc: davem, kuba, prashantbhole.linux, jasowang, brouer, mst,
	toshiaki.makita1, daniel, john.fastabend, ast, kafai,
	songliubraving, yhs, andriin, dsahern

David Ahern <dsahern@kernel.org> writes:

> From: David Ahern <dsahern@gmail.com>
>
> This series adds support for XDP in the egress path by introducing
> a new XDP attachment type, BPF_XDP_EGRESS, and adding an if_link API
> for attaching the program to a netdevice and reporting the program.
> The intent is to emulate the current RX path for XDP as much as
> possible to maintain consistency and symmetry in the 2 paths with
> their APIs and when the programs are run: at first touch in the Rx
> path and last touch in the Tx path.
>
> The intent is to be able to run bpf programs on all packets regardless
> of how they got to the xmit function of the netdevice - as an skb or a
> redirected xdp frame. This is a missing primitive for XDP allowing
> solutions to build small, targeted programs properly distributed in the
> networking path allowing for example an egress firewall / ACL / traffic
> verification or packet manipulation and encapping an entire ethernet
> frame whether it is locally generated traffic, forwarded via the slow
> path (ie., full stack processing) or xdp redirected frames.

I'm totally on board with these goals!

As for this:

> Attempting to tag the EGRESS path as yet another mode is inconsistent
> on a number of levels - from the current usage of XDP_FLAGS to options
> passed to the verifier for restricting xdp_md accesses. Using the API
> as listed above maintains consistency with all existing code.

You *are* effectively tagging the EGRESS path as another mode: You are
restricting which fields of the context object the program can access,
and you're restricting where the program can be attached. I am pretty
sure we will end up accumulating more differences, either in more
metadata that is only available in one mode (we've already discussed
exposing TX qlen on egress programs), or even helpers that only make
sense in one mode.

So it doesn't make sense to discuss whether egress programs are a
distinct type from ingress programs: They clearly are. What we are
discussing is how to encode this type difference. You are proposing to
encode it using expected_attach_type as a subtype identifier instead of
using a new type number. There is already precedence for this with the
tracing programs, and I do think it makes sense - ingress and egress XDP
programs are clearly related, just as (e.g.) fentry/fexit/freplace
programs are.

However, my issue with this encoding is that it is write-only: You can't
inspect a BPF program already loaded into the kernel and tell which type
it is. So my proposal would be to make it explicit: Expose the
expected_attach_type as a new field in bpf_prog_info so userspace can
query it, and clearly document it as, essentially, a program subtype
that can significantly affect how a program is treated by the kernel.

-Toke


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH RFC v4 bpf-next 03/11] xdp: Add xdp_txq_info to xdp_buff
  2020-02-27  8:00   ` Jesper Dangaard Brouer
@ 2020-02-27 11:58     ` Toke Høiland-Jørgensen
  2020-02-28  3:01       ` David Ahern
  2020-02-27 20:44     ` David Ahern
  1 sibling, 1 reply; 38+ messages in thread
From: Toke Høiland-Jørgensen @ 2020-02-27 11:58 UTC (permalink / raw)
  To: Jesper Dangaard Brouer, David Ahern
  Cc: netdev, davem, kuba, prashantbhole.linux, jasowang, mst,
	toshiaki.makita1, daniel, john.fastabend, ast, kafai,
	songliubraving, yhs, andriin, dsahern, David Ahern, brouer

Jesper Dangaard Brouer <brouer@redhat.com> writes:

> On Wed, 26 Feb 2020 20:20:05 -0700
> David Ahern <dsahern@kernel.org> wrote:
>
>> From: David Ahern <dahern@digitalocean.com>
>> 
>> Add xdp_txq_info as the Tx counterpart to xdp_rxq_info. At the
>> moment only the device is added. Other fields (queue_index)
>> can be added as use cases arise.
>> 
>> From a UAPI perspective, egress_ifindex is a union with ingress_ifindex
>> since only one applies based on where the program is attached.
>> 
>> Signed-off-by: David Ahern <dahern@digitalocean.com>
>> ---
>>  include/net/xdp.h        |  5 +++++
>>  include/uapi/linux/bpf.h |  6 ++++--
>>  net/core/filter.c        | 27 +++++++++++++++++++--------
>>  3 files changed, 28 insertions(+), 10 deletions(-)
>> 
>> diff --git a/include/net/xdp.h b/include/net/xdp.h
>> index 40c6d3398458..5584b9db86fe 100644
>> --- a/include/net/xdp.h
>> +++ b/include/net/xdp.h
>> @@ -63,6 +63,10 @@ struct xdp_rxq_info {
>>  	struct xdp_mem_info mem;
>>  } ____cacheline_aligned; /* perf critical, avoid false-sharing */
>>  
>> +struct xdp_txq_info {
>> +	struct net_device *dev;
>> +};
>> +
>>  struct xdp_buff {
>>  	void *data;
>>  	void *data_end;
>> @@ -70,6 +74,7 @@ struct xdp_buff {
>>  	void *data_hard_start;
>>  	unsigned long handle;
>>  	struct xdp_rxq_info *rxq;
>> +	struct xdp_txq_info *txq;
>>  };
>>  
>>  struct xdp_frame {
>> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
>> index 7850f8683b81..5e3f8aefad41 100644
>> --- a/include/uapi/linux/bpf.h
>> +++ b/include/uapi/linux/bpf.h
>> @@ -3334,8 +3334,10 @@ struct xdp_md {
>>  	__u32 data;
>>  	__u32 data_end;
>>  	__u32 data_meta;
>> -	/* Below access go through struct xdp_rxq_info */
>> -	__u32 ingress_ifindex; /* rxq->dev->ifindex */
>> +	union {
>> +		__u32 ingress_ifindex; /* rxq->dev->ifindex */
>> +		__u32 egress_ifindex;  /* txq->dev->ifindex */
>> +	};
>
> Are we sure it is wise to "union share" (struct) xdp_md as the
> XDP-context in the XDP programs, with different expected_attach_type?
> As this allows the XDP-programmer to code an EGRESS program that access
> ctx->ingress_ifindex, this will under the hood be translated to
> ctx->egress_ifindex, because from the compilers-PoV this will just be an
> offset.
>
> We are setting up the XDP-programmer for a long debugging session, as
> she will be expecting to read 'ingress_ifindex', but will be getting
> 'egress_ifindex'.  (As the compiler cannot warn her, and it is also
> correct seen from the verifier).

+1 on this; also, an egress program may want to actually know which
ingress iface the packet was first received on. So why not just keep
both fields? Since ifindex 0 is invalid anyway, the field could just be
0 when it isn't known (e.g., egress ifindex on RX, or ingress ifindex if
it comes from the stack)?

>>  	__u32 rx_queue_index;  /* rxq->queue_index  */
>
> So, the TX program can still read 'rx_queue_index', is this wise?

Why shouldn't it be able to (as well as ingress ifindex)?

-Toke


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH RFC v4 bpf-next 00/11] Add support for XDP in egress path
  2020-02-27 11:55 ` [PATCH RFC v4 bpf-next 00/11] Add support for XDP in egress path Toke Høiland-Jørgensen
@ 2020-02-27 16:22   ` Alexei Starovoitov
  2020-02-27 17:06     ` Toke Høiland-Jørgensen
  0 siblings, 1 reply; 38+ messages in thread
From: Alexei Starovoitov @ 2020-02-27 16:22 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen
  Cc: David Ahern, Network Development, David S. Miller,
	Jakub Kicinski, Prashant Bhole, Jason Wang,
	Jesper Dangaard Brouer, Michael S. Tsirkin, Toshiaki Makita,
	Daniel Borkmann, John Fastabend, Alexei Starovoitov,
	Martin KaFai Lau, Song Liu, Yonghong Song, Andrii Nakryiko,
	David Ahern

On Thu, Feb 27, 2020 at 3:55 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>
> However, my issue with this encoding is that it is write-only: You can't
> inspect a BPF program already loaded into the kernel and tell which type
> it is. So my proposal would be to make it explicit: Expose the
> expected_attach_type as a new field in bpf_prog_info so userspace can
> query it, and clearly document it as, essentially, a program subtype
> that can significantly affect how a program is treated by the kernel.

You had the same request for "freplace" target prog.
My answer to both is still the same:
Please take a look at drgn and the script that Andrey posted.
All this information is trivial to extract from the kernel
without introducing new uapi.

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH RFC v4 bpf-next 00/11] Add support for XDP in egress path
  2020-02-27 16:22   ` Alexei Starovoitov
@ 2020-02-27 17:06     ` Toke Høiland-Jørgensen
  2020-02-27 18:37       ` Alexei Starovoitov
  0 siblings, 1 reply; 38+ messages in thread
From: Toke Høiland-Jørgensen @ 2020-02-27 17:06 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: David Ahern, Network Development, David S. Miller,
	Jakub Kicinski, Prashant Bhole, Jason Wang,
	Jesper Dangaard Brouer, Michael S. Tsirkin, Toshiaki Makita,
	Daniel Borkmann, John Fastabend, Alexei Starovoitov,
	Martin KaFai Lau, Song Liu, Yonghong Song, Andrii Nakryiko,
	David Ahern

Alexei Starovoitov <alexei.starovoitov@gmail.com> writes:

> On Thu, Feb 27, 2020 at 3:55 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>>
>> However, my issue with this encoding is that it is write-only: You can't
>> inspect a BPF program already loaded into the kernel and tell which type
>> it is. So my proposal would be to make it explicit: Expose the
>> expected_attach_type as a new field in bpf_prog_info so userspace can
>> query it, and clearly document it as, essentially, a program subtype
>> that can significantly affect how a program is treated by the kernel.
>
> You had the same request for "freplace" target prog.

Yes, and for the same reason; I'm being consistent here :)

> My answer to both is still the same:
> Please take a look at drgn and the script that Andrey posted.
> All this information is trivial to extract from the kernel
> without introducing new uapi.

I'm sorry, but having this kind of write-only data structure is a
horrible API design; and saying "you can just parse the internal kernel
data structures to see what is going on" is a cop-out. The whole point
of having a stable UAPI is to make it possible for people to write
applications that make use of kernel features with an expectation that
those applications will keep working. XDP is a networking feature;
people building networking applications shouldn't have to chase internal
kernel APIs just to keep their packet processing programs working.

Besides, it's already UAPI - there's a setter for it! If we introduce
this new egress program type that is still going to be a de facto new
program type with different semantics than the RX program, whether we
try to hide the fact or not. Even if we end up completely changing the
internal data structures, we're still going to have to support someone 
loading a program with type==XDP and attach_type == XDP_EGRESS. So why
can't we allow the user to query the state of a previously loaded
program as well?

-Toke


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH RFC v4 bpf-next 00/11] Add support for XDP in egress path
  2020-02-27 17:06     ` Toke Høiland-Jørgensen
@ 2020-02-27 18:37       ` Alexei Starovoitov
  0 siblings, 0 replies; 38+ messages in thread
From: Alexei Starovoitov @ 2020-02-27 18:37 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen
  Cc: David Ahern, Network Development, David S. Miller,
	Jakub Kicinski, Prashant Bhole, Jason Wang,
	Jesper Dangaard Brouer, Michael S. Tsirkin, Toshiaki Makita,
	Daniel Borkmann, John Fastabend, Alexei Starovoitov,
	Martin KaFai Lau, Song Liu, Yonghong Song, Andrii Nakryiko,
	David Ahern

On Thu, Feb 27, 2020 at 06:06:57PM +0100, Toke Høiland-Jørgensen wrote:
> Alexei Starovoitov <alexei.starovoitov@gmail.com> writes:
> 
> > On Thu, Feb 27, 2020 at 3:55 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
> >>
> >> However, my issue with this encoding is that it is write-only: You can't
> >> inspect a BPF program already loaded into the kernel and tell which type
> >> it is. So my proposal would be to make it explicit: Expose the
> >> expected_attach_type as a new field in bpf_prog_info so userspace can
> >> query it, and clearly document it as, essentially, a program subtype
> >> that can significantly affect how a program is treated by the kernel.
> >
> > You had the same request for "freplace" target prog.
> 
> Yes, and for the same reason; I'm being consistent here :)
> 
> > My answer to both is still the same:
> > Please take a look at drgn and the script that Andrey posted.
> > All this information is trivial to extract from the kernel
> > without introducing new uapi.
> 
> I'm sorry, but having this kind of write-only data structure is a
> horrible API design; and saying "you can just parse the internal kernel
> data structures to see what is going on" is a cop-out. The whole point
> of having a stable UAPI is to make it possible for people to write
> applications that make use of kernel features with an expectation that
> those applications will keep working. XDP is a networking feature;
> people building networking applications shouldn't have to chase internal
> kernel APIs just to keep their packet processing programs working.

You're mistaking human needs for tooling needs.
Humans want to see all sorts of internal things and they can tweak drgn
script to look at them on the live kernel and examine kdump's vmcore
with the same drgn script.
Tooling needs uapi to work on a live system.
There are 26 attach_types that are used by various cgroup based progs,
sockmap, lirc, flow_dissector and tracing. Many of these attach types
are used in production and not a single time _tools_ had a need to
retrieve that field from the kernel.
Show me the tool that needs to read expected_attach_type back and
then we can discuss how to expose it in uapi.

There is another side of this. struct bpf_prog->pages field currently
is not exposed in uapi and I've seen tools doing
cat /proc/pid/fdinfo/fd|grep "memlock:"
just to retrive 'pages' field. That's the case where 'struct bpf_prog_info'
should be extended.

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH RFC v4 bpf-next 03/11] xdp: Add xdp_txq_info to xdp_buff
  2020-02-27  8:00   ` Jesper Dangaard Brouer
  2020-02-27 11:58     ` Toke Høiland-Jørgensen
@ 2020-02-27 20:44     ` David Ahern
  2020-02-28 10:07       ` Toke Høiland-Jørgensen
  1 sibling, 1 reply; 38+ messages in thread
From: David Ahern @ 2020-02-27 20:44 UTC (permalink / raw)
  To: Jesper Dangaard Brouer
  Cc: netdev, davem, kuba, prashantbhole.linux, jasowang, toke, mst,
	toshiaki.makita1, daniel, john.fastabend, ast, kafai,
	songliubraving, yhs, andriin, David Ahern

On 2/27/20 1:00 AM, Jesper Dangaard Brouer wrote:
>> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
>> index 7850f8683b81..5e3f8aefad41 100644
>> --- a/include/uapi/linux/bpf.h
>> +++ b/include/uapi/linux/bpf.h
>> @@ -3334,8 +3334,10 @@ struct xdp_md {
>>  	__u32 data;
>>  	__u32 data_end;
>>  	__u32 data_meta;
>> -	/* Below access go through struct xdp_rxq_info */
>> -	__u32 ingress_ifindex; /* rxq->dev->ifindex */
>> +	union {
>> +		__u32 ingress_ifindex; /* rxq->dev->ifindex */
>> +		__u32 egress_ifindex;  /* txq->dev->ifindex */
>> +	};
> 
> Are we sure it is wise to "union share" (struct) xdp_md as the
> XDP-context in the XDP programs, with different expected_attach_type?
> As this allows the XDP-programmer to code an EGRESS program that access
> ctx->ingress_ifindex, this will under the hood be translated to
> ctx->egress_ifindex, because from the compilers-PoV this will just be an
> offset.
> 
> We are setting up the XDP-programmer for a long debugging session, as
> she will be expecting to read 'ingress_ifindex', but will be getting
> 'egress_ifindex'.  (As the compiler cannot warn her, and it is also
> correct seen from the verifier).

It both cases it means the device handling the packet. ingress_ifindex
== device handling the Rx, egress_ifindex == device handling the Tx.
Really, it is syntactic sugar for program writers. It would have been
better had xdp_md only called it ifindex from the beginning.

> 
> 
>>  	__u32 rx_queue_index;  /* rxq->queue_index  */
> 
> So, the TX program can still read 'rx_queue_index', is this wise?
> (It should be easy to catch below and reject).

See patch 2.

In time I expect rx_queue_index to be a union with tx_queue_index for
the same reasons as the ifindex.

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH RFC v4 bpf-next 03/11] xdp: Add xdp_txq_info to xdp_buff
  2020-02-27 11:58     ` Toke Høiland-Jørgensen
@ 2020-02-28  3:01       ` David Ahern
  2020-02-28 10:10         ` Toke Høiland-Jørgensen
  0 siblings, 1 reply; 38+ messages in thread
From: David Ahern @ 2020-02-28  3:01 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen, Jesper Dangaard Brouer, David Ahern
  Cc: netdev, davem, kuba, prashantbhole.linux, jasowang, mst,
	toshiaki.makita1, daniel, john.fastabend, ast, kafai,
	songliubraving, yhs, andriin, dsahern

On 2/27/20 4:58 AM, Toke Høiland-Jørgensen wrote:
>  also, an egress program may want to actually know which
> ingress iface the packet was first received on. So why not just keep
> both fields? Since ifindex 0 is invalid anyway, the field could just be
> 0 when it isn't known (e.g., egress ifindex on RX, or ingress ifindex if
> it comes from the stack)?

Today, the ingress device is lost in the conversion from xdp_buff to
xdp_frame. The plumbing needed to keep that information is beyond the
scope of this set.

I am open to making the UAPI separate entries if there is a real reason
for it. Do you have a specific use case? I am not aware of any situation
where a packet queued up for Tx on a device would want to know the
ingress device. At that point it is kind of irrelevant; the packet is
about to hit the "wire". Further, it would only apply to XDP_redirected
frames which could be only a limited set in the ways that a packet can
end up at a device for Tx. I suspect the flow is more relevant than the
device. When you factor on other details - e.g., bonds, vlans - the
ingress device is not the full story. Perhaps the metadata area is more
appropriate than the overhead of managing that information in the kernel
code.

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH RFC v4 bpf-next 03/11] xdp: Add xdp_txq_info to xdp_buff
  2020-02-27 20:44     ` David Ahern
@ 2020-02-28 10:07       ` Toke Høiland-Jørgensen
  2020-02-28 10:41         ` Jesper Dangaard Brouer
  0 siblings, 1 reply; 38+ messages in thread
From: Toke Høiland-Jørgensen @ 2020-02-28 10:07 UTC (permalink / raw)
  To: David Ahern, Jesper Dangaard Brouer
  Cc: netdev, davem, kuba, prashantbhole.linux, jasowang, mst,
	toshiaki.makita1, daniel, john.fastabend, ast, kafai,
	songliubraving, yhs, andriin, David Ahern

David Ahern <dahern@digitalocean.com> writes:

> On 2/27/20 1:00 AM, Jesper Dangaard Brouer wrote:
>>> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
>>> index 7850f8683b81..5e3f8aefad41 100644
>>> --- a/include/uapi/linux/bpf.h
>>> +++ b/include/uapi/linux/bpf.h
>>> @@ -3334,8 +3334,10 @@ struct xdp_md {
>>>  	__u32 data;
>>>  	__u32 data_end;
>>>  	__u32 data_meta;
>>> -	/* Below access go through struct xdp_rxq_info */
>>> -	__u32 ingress_ifindex; /* rxq->dev->ifindex */
>>> +	union {
>>> +		__u32 ingress_ifindex; /* rxq->dev->ifindex */
>>> +		__u32 egress_ifindex;  /* txq->dev->ifindex */
>>> +	};
>> 
>> Are we sure it is wise to "union share" (struct) xdp_md as the
>> XDP-context in the XDP programs, with different expected_attach_type?
>> As this allows the XDP-programmer to code an EGRESS program that access
>> ctx->ingress_ifindex, this will under the hood be translated to
>> ctx->egress_ifindex, because from the compilers-PoV this will just be an
>> offset.
>> 
>> We are setting up the XDP-programmer for a long debugging session, as
>> she will be expecting to read 'ingress_ifindex', but will be getting
>> 'egress_ifindex'.  (As the compiler cannot warn her, and it is also
>> correct seen from the verifier).
>
> It both cases it means the device handling the packet. ingress_ifindex
> == device handling the Rx, egress_ifindex == device handling the Tx.
> Really, it is syntactic sugar for program writers. It would have been
> better had xdp_md only called it ifindex from the beginning.

Telling users that they are doing it wrong is not going to make their
debugging session any less frustrating :)

If we keep rx_ifindex a separate field we can unambiguously reject a TX
program that tries to access it, *and* we keep the option of allowing
access to it later if it does turn out to be useful. IMO that is worth
the four extra bytes.

-Toke


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH RFC v4 bpf-next 03/11] xdp: Add xdp_txq_info to xdp_buff
  2020-02-28  3:01       ` David Ahern
@ 2020-02-28 10:10         ` Toke Høiland-Jørgensen
  0 siblings, 0 replies; 38+ messages in thread
From: Toke Høiland-Jørgensen @ 2020-02-28 10:10 UTC (permalink / raw)
  To: David Ahern, Jesper Dangaard Brouer, David Ahern
  Cc: netdev, davem, kuba, prashantbhole.linux, jasowang, mst,
	toshiaki.makita1, daniel, john.fastabend, ast, kafai,
	songliubraving, yhs, andriin, dsahern

David Ahern <dahern@digitalocean.com> writes:

> On 2/27/20 4:58 AM, Toke Høiland-Jørgensen wrote:
>>  also, an egress program may want to actually know which
>> ingress iface the packet was first received on. So why not just keep
>> both fields? Since ifindex 0 is invalid anyway, the field could just be
>> 0 when it isn't known (e.g., egress ifindex on RX, or ingress ifindex if
>> it comes from the stack)?
>
> Today, the ingress device is lost in the conversion from xdp_buff to
> xdp_frame. The plumbing needed to keep that information is beyond the
> scope of this set.
>
> I am open to making the UAPI separate entries if there is a real reason
> for it. Do you have a specific use case?

I was thinking it could be a nice shorthand for whether a packet comes
from the local stack or was forwarded (0 == stack). But no, I don't have
a concrete application where this is useful. However, if we define it as
a union we lose the ability to change our mind. Together with the
debugability issue I just replied with to your other email, I think it
would be better to expend the four bytes keep them as separate fields,
but still restrict access to the RX ifindex for now.

-Toke


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH RFC v4 bpf-next 03/11] xdp: Add xdp_txq_info to xdp_buff
  2020-02-28 10:07       ` Toke Høiland-Jørgensen
@ 2020-02-28 10:41         ` Jesper Dangaard Brouer
  0 siblings, 0 replies; 38+ messages in thread
From: Jesper Dangaard Brouer @ 2020-02-28 10:41 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen
  Cc: David Ahern, netdev, davem, kuba, prashantbhole.linux, jasowang,
	mst, toshiaki.makita1, daniel, john.fastabend, ast, kafai,
	songliubraving, yhs, andriin, David Ahern, brouer

On Fri, 28 Feb 2020 11:07:23 +0100
Toke Høiland-Jørgensen <toke@redhat.com> wrote:

> David Ahern <dahern@digitalocean.com> writes:
> 
> > On 2/27/20 1:00 AM, Jesper Dangaard Brouer wrote:  
> >>> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> >>> index 7850f8683b81..5e3f8aefad41 100644
> >>> --- a/include/uapi/linux/bpf.h
> >>> +++ b/include/uapi/linux/bpf.h
> >>> @@ -3334,8 +3334,10 @@ struct xdp_md {
> >>>  	__u32 data;
> >>>  	__u32 data_end;
> >>>  	__u32 data_meta;
> >>> -	/* Below access go through struct xdp_rxq_info */
> >>> -	__u32 ingress_ifindex; /* rxq->dev->ifindex */
> >>> +	union {
> >>> +		__u32 ingress_ifindex; /* rxq->dev->ifindex */
> >>> +		__u32 egress_ifindex;  /* txq->dev->ifindex */
> >>> +	};  
> >> 
> >> Are we sure it is wise to "union share" (struct) xdp_md as the
> >> XDP-context in the XDP programs, with different expected_attach_type?
> >> As this allows the XDP-programmer to code an EGRESS program that access
> >> ctx->ingress_ifindex, this will under the hood be translated to
> >> ctx->egress_ifindex, because from the compilers-PoV this will just be an
> >> offset.
> >> 
> >> We are setting up the XDP-programmer for a long debugging session, as
> >> she will be expecting to read 'ingress_ifindex', but will be getting
> >> 'egress_ifindex'.  (As the compiler cannot warn her, and it is also
> >> correct seen from the verifier).  
> >
> > It both cases it means the device handling the packet. ingress_ifindex
> > == device handling the Rx, egress_ifindex == device handling the Tx.
> > Really, it is syntactic sugar for program writers. It would have been
> > better had xdp_md only called it ifindex from the beginning.  
> 
> Telling users that they are doing it wrong is not going to make their
> debugging session any less frustrating :)
> 
> If we keep rx_ifindex a separate field we can unambiguously reject a TX
> program that tries to access it, *and* we keep the option of allowing
> access to it later if it does turn out to be useful. IMO that is worth
> the four extra bytes.

I agree. We need unambiguously to help the program writer.

This is the wrong kind of 'syntactic sugar'.  If you want a straight
'ifindex', that translates to the running_ifindex, when you need to add
a new member 'ifindex' that does this rewriting based on attach type.

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH RFC v4 bpf-next 08/11] tun: Support xdp in the Tx path for skb
  2020-02-27  3:20 ` [PATCH RFC v4 bpf-next 08/11] tun: Support xdp in the Tx path for skb David Ahern
@ 2020-03-02  3:28   ` Jason Wang
  2020-03-02  3:41     ` David Ahern
  2020-03-03 10:46   ` Jesper Dangaard Brouer
  1 sibling, 1 reply; 38+ messages in thread
From: Jason Wang @ 2020-03-02  3:28 UTC (permalink / raw)
  To: David Ahern, netdev
  Cc: davem, kuba, prashantbhole.linux, brouer, toke, mst,
	toshiaki.makita1, daniel, john.fastabend, ast, kafai,
	songliubraving, yhs, andriin, dsahern, David Ahern


On 2020/2/27 上午11:20, David Ahern wrote:
> From: David Ahern <dahern@digitalocean.com>
>
> Add support to run Tx path program on packets arriving at a tun
> device as an skb.
>
> XDP_TX return code means move the packet to the Tx path of the device.
> For a program run in the Tx / egress path, XDP_TX is essentially the
> same as "continue on" which is XDP_PASS.
>
> Conceptually, XDP_REDIRECT for this path can work the same as it
> does for the Rx path, but that return code is left for a follow
> on series.
>
> Signed-off-by: Prashant Bhole <prashantbhole.linux@gmail.com>
> Signed-off-by: David Ahern <dahern@digitalocean.com>
> ---
>   drivers/net/tun.c | 69 ++++++++++++++++++++++++++++++++++++++++++++---
>   1 file changed, 66 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/net/tun.c b/drivers/net/tun.c
> index 6aae398b904b..dcae6521a39d 100644
> --- a/drivers/net/tun.c
> +++ b/drivers/net/tun.c
> @@ -1059,6 +1059,63 @@ static unsigned int run_ebpf_filter(struct tun_struct *tun,
>   	return len;
>   }
>   
> +static struct sk_buff *tun_prepare_xdp_skb(struct sk_buff *skb)
> +{
> +	if (skb_shared(skb) || skb_cloned(skb)) {
> +		struct sk_buff *nskb;
> +
> +		nskb = skb_copy(skb, GFP_ATOMIC);
> +		consume_skb(skb);
> +		return nskb;
> +	}
> +
> +	return skb;
> +}
> +
> +static u32 tun_do_xdp_tx_generic(struct tun_struct *tun,
> +				 struct net_device *dev,
> +				 struct sk_buff *skb)
> +{
> +	struct bpf_prog *xdp_prog;
> +	u32 act = XDP_PASS;
> +
> +	xdp_prog = rcu_dereference(tun->xdp_egress_prog);
> +	if (xdp_prog) {
> +		struct xdp_txq_info txq = { .dev = dev };
> +		struct xdp_buff xdp;
> +
> +		skb = tun_prepare_xdp_skb(skb);
> +		if (!skb) {
> +			act = XDP_DROP;
> +			goto out;
> +		}
> +
> +		xdp.txq = &txq;
> +
> +		act = do_xdp_generic_core(skb, &xdp, xdp_prog);
> +		switch (act) {
> +		case XDP_TX:    /* for Tx path, XDP_TX == XDP_PASS */
> +			act = XDP_PASS;
> +			break;


Jute a note here, I agree for TX XDP it may be better to do this.

But for offloaded program we need different semantic. Or we can deal 
this with attach types?

Thanks


> +		case XDP_PASS:
> +			break;
> +		case XDP_REDIRECT:
> +			/* fall through */
> +		default:
> +			bpf_warn_invalid_xdp_action(act);
> +			/* fall through */
> +		case XDP_ABORTED:
> +			trace_xdp_exception(tun->dev, xdp_prog, act);
> +			/* fall through */
> +		case XDP_DROP:
> +			break;
> +		}
> +	}
> +
> +out:
> +	return act;
> +}
> +
>   /* Net device start xmit */
>   static netdev_tx_t tun_net_xmit(struct sk_buff *skb, struct net_device *dev)
>   {
> @@ -1066,6 +1123,7 @@ static netdev_tx_t tun_net_xmit(struct sk_buff *skb, struct net_device *dev)
>   	int txq = skb->queue_mapping;
>   	struct tun_file *tfile;
>   	int len = skb->len;
> +	u32 act;
>   
>   	rcu_read_lock();
>   	tfile = rcu_dereference(tun->tfiles[txq]);
> @@ -1107,9 +1165,13 @@ static netdev_tx_t tun_net_xmit(struct sk_buff *skb, struct net_device *dev)
>   
>   	nf_reset_ct(skb);
>   
> -	if (ptr_ring_produce(&tfile->tx_ring, skb))
> +	act = tun_do_xdp_tx_generic(tun, dev, skb);
> +	if (act != XDP_PASS)
>   		goto drop;
>   
> +	if (ptr_ring_produce(&tfile->tx_ring, skb))
> +		goto err_out;
> +
>   	/* Notify and wake up reader process */
>   	if (tfile->flags & TUN_FASYNC)
>   		kill_fasync(&tfile->fasync, SIGIO, POLL_IN);
> @@ -1118,10 +1180,11 @@ static netdev_tx_t tun_net_xmit(struct sk_buff *skb, struct net_device *dev)
>   	rcu_read_unlock();
>   	return NETDEV_TX_OK;
>   
> -drop:
> -	this_cpu_inc(tun->pcpu_stats->tx_dropped);
> +err_out:
>   	skb_tx_error(skb);
>   	kfree_skb(skb);
> +drop:
> +	this_cpu_inc(tun->pcpu_stats->tx_dropped);
>   	rcu_read_unlock();
>   	return NET_XMIT_DROP;
>   }


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH RFC v4 bpf-next 07/11] tun: set egress XDP program
  2020-02-27  3:20 ` [PATCH RFC v4 bpf-next 07/11] tun: set egress XDP program David Ahern
@ 2020-03-02  3:32   ` Jason Wang
  2020-03-02  3:52     ` David Ahern
  2020-03-10  2:18     ` David Ahern
  0 siblings, 2 replies; 38+ messages in thread
From: Jason Wang @ 2020-03-02  3:32 UTC (permalink / raw)
  To: David Ahern, netdev
  Cc: davem, kuba, prashantbhole.linux, brouer, toke, mst,
	toshiaki.makita1, daniel, john.fastabend, ast, kafai,
	songliubraving, yhs, andriin, dsahern, David Ahern


On 2020/2/27 上午11:20, David Ahern wrote:
> From: David Ahern <dahern@digitalocean.com>
>
> This patch adds a way to set tx path XDP program in tun driver
> by handling XDP_SETUP_PROG_EGRESS and XDP_QUERY_PROG_EGRESS in
> ndo_bpf handler.
>
> Signed-off-by: David Ahern <dahern@digitalocean.com>
> Signed-off-by: Prashant Bhole <prashantbhole.linux@gmail.com>
> ---
>   drivers/net/tun.c | 34 ++++++++++++++++++++++++++--------
>   1 file changed, 26 insertions(+), 8 deletions(-)
>
> diff --git a/drivers/net/tun.c b/drivers/net/tun.c
> index 7cc5a1acaef2..6aae398b904b 100644
> --- a/drivers/net/tun.c
> +++ b/drivers/net/tun.c
> @@ -239,6 +239,7 @@ struct tun_struct {
>   	u32 rx_batched;
>   	struct tun_pcpu_stats __percpu *pcpu_stats;
>   	struct bpf_prog __rcu *xdp_prog;
> +	struct bpf_prog __rcu *xdp_egress_prog;
>   	struct tun_prog __rcu *steering_prog;
>   	struct tun_prog __rcu *filter_prog;
>   	struct ethtool_link_ksettings link_ksettings;
> @@ -1189,15 +1190,21 @@ tun_net_get_stats64(struct net_device *dev, struct rtnl_link_stats64 *stats)
>   }
>   
>   static int tun_xdp_set(struct net_device *dev, struct bpf_prog *prog,
> -		       struct netlink_ext_ack *extack)


How about accept a bpf_prog ** here, then there's no need to check 
egress when deferencing the pointer?


> +		       bool egress, struct netlink_ext_ack *extack)
>   {
>   	struct tun_struct *tun = netdev_priv(dev);
>   	struct tun_file *tfile;
>   	struct bpf_prog *old_prog;
>   	int i;
>   
> -	old_prog = rtnl_dereference(tun->xdp_prog);
> -	rcu_assign_pointer(tun->xdp_prog, prog);
> +	if (egress) {
> +		old_prog = rtnl_dereference(tun->xdp_egress_prog);
> +		rcu_assign_pointer(tun->xdp_egress_prog, prog);
> +	} else {
> +		old_prog = rtnl_dereference(tun->xdp_prog);
> +		rcu_assign_pointer(tun->xdp_prog, prog);
> +	}
> +
>   	if (old_prog)
>   		bpf_prog_put(old_prog);


Btw, for egress type of XDP there's no need to toggle SOCK_XDP flag 
which is only needed for RX XDP.

Thanks


>   
> @@ -1218,12 +1225,16 @@ static int tun_xdp_set(struct net_device *dev, struct bpf_prog *prog,
>   	return 0;
>   }
>   
> -static u32 tun_xdp_query(struct net_device *dev)
> +static u32 tun_xdp_query(struct net_device *dev, bool egress)
>   {
>   	struct tun_struct *tun = netdev_priv(dev);
>   	const struct bpf_prog *xdp_prog;
>   
> -	xdp_prog = rtnl_dereference(tun->xdp_prog);
> +	if (egress)
> +		xdp_prog = rtnl_dereference(tun->xdp_egress_prog);
> +	else
> +		xdp_prog = rtnl_dereference(tun->xdp_prog);
> +
>   	if (xdp_prog)
>   		return xdp_prog->aux->id;
>   
> @@ -1234,13 +1245,20 @@ static int tun_xdp(struct net_device *dev, struct netdev_bpf *xdp)
>   {
>   	switch (xdp->command) {
>   	case XDP_SETUP_PROG:
> -		return tun_xdp_set(dev, xdp->prog, xdp->extack);
> +		return tun_xdp_set(dev, xdp->prog, false, xdp->extack);
> +	case XDP_SETUP_PROG_EGRESS:
> +		return tun_xdp_set(dev, xdp->prog, true, xdp->extack);
>   	case XDP_QUERY_PROG:
> -		xdp->prog_id = tun_xdp_query(dev);
> -		return 0;
> +		xdp->prog_id = tun_xdp_query(dev, false);
> +		break;
> +	case XDP_QUERY_PROG_EGRESS:
> +		xdp->prog_id = tun_xdp_query(dev, true);
> +		break;
>   	default:
>   		return -EINVAL;
>   	}
> +
> +	return 0;
>   }
>   
>   static int tun_net_change_carrier(struct net_device *dev, bool new_carrier)


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH RFC v4 bpf-next 08/11] tun: Support xdp in the Tx path for skb
  2020-03-02  3:28   ` Jason Wang
@ 2020-03-02  3:41     ` David Ahern
  0 siblings, 0 replies; 38+ messages in thread
From: David Ahern @ 2020-03-02  3:41 UTC (permalink / raw)
  To: Jason Wang, David Ahern, netdev
  Cc: davem, kuba, prashantbhole.linux, brouer, toke, mst,
	toshiaki.makita1, daniel, john.fastabend, ast, kafai,
	songliubraving, yhs, andriin, David Ahern

On 3/1/20 8:28 PM, Jason Wang wrote:
>> +static u32 tun_do_xdp_tx_generic(struct tun_struct *tun,
>> +                 struct net_device *dev,
>> +                 struct sk_buff *skb)
>> +{
>> +    struct bpf_prog *xdp_prog;
>> +    u32 act = XDP_PASS;
>> +
>> +    xdp_prog = rcu_dereference(tun->xdp_egress_prog);
>> +    if (xdp_prog) {
>> +        struct xdp_txq_info txq = { .dev = dev };
>> +        struct xdp_buff xdp;
>> +
>> +        skb = tun_prepare_xdp_skb(skb);
>> +        if (!skb) {
>> +            act = XDP_DROP;
>> +            goto out;
>> +        }
>> +
>> +        xdp.txq = &txq;
>> +
>> +        act = do_xdp_generic_core(skb, &xdp, xdp_prog);
>> +        switch (act) {
>> +        case XDP_TX:    /* for Tx path, XDP_TX == XDP_PASS */
>> +            act = XDP_PASS;
>> +            break;
> 
> 
> Jute a note here, I agree for TX XDP it may be better to do this.
> 
> But for offloaded program we need different semantic. Or we can deal
> this with attach types?
> 

This path is for XDP_FLAGS_DRV_MODE. Offloaded programs
(XDP_FLAGS_HW_MODE) will be run in vhost context.

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH RFC v4 bpf-next 07/11] tun: set egress XDP program
  2020-03-02  3:32   ` Jason Wang
@ 2020-03-02  3:52     ` David Ahern
  2020-03-10  2:18     ` David Ahern
  1 sibling, 0 replies; 38+ messages in thread
From: David Ahern @ 2020-03-02  3:52 UTC (permalink / raw)
  To: Jason Wang, David Ahern, netdev
  Cc: davem, kuba, prashantbhole.linux, brouer, toke, mst,
	toshiaki.makita1, daniel, john.fastabend, ast, kafai,
	songliubraving, yhs, andriin, David Ahern

On 3/1/20 8:32 PM, Jason Wang wrote:
>> @@ -1189,15 +1190,21 @@ tun_net_get_stats64(struct net_device *dev,
>> struct rtnl_link_stats64 *stats)
>>   }
>>     static int tun_xdp_set(struct net_device *dev, struct bpf_prog *prog,
>> -               struct netlink_ext_ack *extack)
> 
> 
> How about accept a bpf_prog ** here, then there's no need to check
> egress when deferencing the pointer?


sure, that works. Will change.

> 
> 
>> +               bool egress, struct netlink_ext_ack *extack)
>>   {
>>       struct tun_struct *tun = netdev_priv(dev);
>>       struct tun_file *tfile;
>>       struct bpf_prog *old_prog;
>>       int i;
>>   -    old_prog = rtnl_dereference(tun->xdp_prog);
>> -    rcu_assign_pointer(tun->xdp_prog, prog);
>> +    if (egress) {
>> +        old_prog = rtnl_dereference(tun->xdp_egress_prog);
>> +        rcu_assign_pointer(tun->xdp_egress_prog, prog);
>> +    } else {
>> +        old_prog = rtnl_dereference(tun->xdp_prog);
>> +        rcu_assign_pointer(tun->xdp_prog, prog);
>> +    }
>> +
>>       if (old_prog)
>>           bpf_prog_put(old_prog);
> 
> 
> Btw, for egress type of XDP there's no need to toggle SOCK_XDP flag
> which is only needed for RX XDP.

Right. We removed the manipulation of that flag a few iterations ago
based on a review comment.



^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH RFC v4 bpf-next 09/11] tun: Support xdp in the Tx path for xdp_frames
  2020-02-27  3:20 ` [PATCH RFC v4 bpf-next 09/11] tun: Support xdp in the Tx path for xdp_frames David Ahern
@ 2020-03-02 18:30   ` Alexei Starovoitov
  2020-03-03  4:27     ` David Ahern
  2020-03-03 10:40   ` Jesper Dangaard Brouer
  1 sibling, 1 reply; 38+ messages in thread
From: Alexei Starovoitov @ 2020-03-02 18:30 UTC (permalink / raw)
  To: David Ahern
  Cc: netdev, davem, kuba, prashantbhole.linux, jasowang, brouer, toke,
	mst, toshiaki.makita1, daniel, john.fastabend, ast, kafai,
	songliubraving, yhs, andriin, dsahern, David Ahern

On Wed, Feb 26, 2020 at 08:20:11PM -0700, David Ahern wrote:
> +
> +		act = bpf_prog_run_xdp(xdp_prog, &xdp);
> +		switch (act) {
> +		case XDP_TX:    /* for Tx path, XDP_TX == XDP_PASS */
> +			act = XDP_PASS;
> +			break;
> +		case XDP_PASS:
> +			break;
> +		case XDP_REDIRECT:
> +			/* fall through */
> +		default:
> +			bpf_warn_invalid_xdp_action(act);
> +			/* fall through */
> +		case XDP_ABORTED:
> +			trace_xdp_exception(tun->dev, xdp_prog, act);
> +			/* fall through */
> +		case XDP_DROP:
> +			break;
> +		}

patch 8 has very similar switch. Can you share the code?

I'm worried that XDP_TX is a silent alias to XDP_PASS.
What were the reasons to go with this approach?
imo it's less error prone and extensible to warn on XDP_TX.
Which will mean that both XDP_TX and XDP_REDICT are not supported for egress atm.

Patches 8 and 9 cover tun only. I'd like to see egress hook to be implemented
in at least one physical NIC. Pick any hw. Something that handles real frames.
Adding this hook to virtual NIC is easy, but it doesn't demonstrate design
trade-offs one would need to think through by adding egress hook to physical
nic. That's why I think it's mandatory to have it as part of the patch set.

Patch 11 exposes egress to samples/bpf. It's nice, but without selftests it's
no go. All new features must be exercised as part of selftests/bpf.

re: patch 3. I agree with Toke and Jesper that union in uapi is unnecessary.
Just drop it. xdp_md is a virtual data structure. It's not allocated anywhere.
There is no need to save non-existent memory.

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH RFC v4 bpf-next 09/11] tun: Support xdp in the Tx path for xdp_frames
  2020-03-02 18:30   ` Alexei Starovoitov
@ 2020-03-03  4:27     ` David Ahern
  2020-03-03  9:08       ` Jesper Dangaard Brouer
  2020-03-03 18:16       ` Alexei Starovoitov
  0 siblings, 2 replies; 38+ messages in thread
From: David Ahern @ 2020-03-03  4:27 UTC (permalink / raw)
  To: Alexei Starovoitov, David Ahern
  Cc: netdev, davem, kuba, prashantbhole.linux, jasowang, brouer, toke,
	mst, toshiaki.makita1, daniel, john.fastabend, ast, kafai,
	songliubraving, yhs, andriin, dsahern

On 3/2/20 11:30 AM, Alexei Starovoitov wrote:
> On Wed, Feb 26, 2020 at 08:20:11PM -0700, David Ahern wrote:
>> +
>> +		act = bpf_prog_run_xdp(xdp_prog, &xdp);
>> +		switch (act) {
>> +		case XDP_TX:    /* for Tx path, XDP_TX == XDP_PASS */
>> +			act = XDP_PASS;
>> +			break;
>> +		case XDP_PASS:
>> +			break;
>> +		case XDP_REDIRECT:
>> +			/* fall through */
>> +		default:
>> +			bpf_warn_invalid_xdp_action(act);
>> +			/* fall through */
>> +		case XDP_ABORTED:
>> +			trace_xdp_exception(tun->dev, xdp_prog, act);
>> +			/* fall through */
>> +		case XDP_DROP:
>> +			break;
>> +		}
> 
> patch 8 has very similar switch. Can you share the code?

Most likely; I'll take a look.

> 
> I'm worried that XDP_TX is a silent alias to XDP_PASS.
> What were the reasons to go with this approach?

As I stated in the cover letter:

"XDP_TX on Rx means send the packet out the device it arrived
on; given that, XDP_Tx for the Tx path is treated as equivalent to
XDP_PASS - ie., continue on the Tx path."

> imo it's less error prone and extensible to warn on XDP_TX.
> Which will mean that both XDP_TX and XDP_REDICT are not supported for egress atm.

I personally don't care either way; I was going with the simplest
concept from a user perspective.

> 
> Patches 8 and 9 cover tun only. I'd like to see egress hook to be implemented
> in at least one physical NIC. Pick any hw. Something that handles real frames.
> Adding this hook to virtual NIC is easy, but it doesn't demonstrate design
> trade-offs one would need to think through by adding egress hook to physical
> nic. That's why I think it's mandatory to have it as part of the patch set.
> 
> Patch 11 exposes egress to samples/bpf. It's nice, but without selftests it's
> no go. All new features must be exercised as part of selftests/bpf.

Patches that exercise the rtnetlink uapi are fairly easy to do on single
node; anything traffic related requires multiple nodes or namespace
level capabilities.  Unless I am missing something that is why all
current XDP tests ride on top of veth; veth changes are not part of this
set.

So to be clear you are saying that all new XDP features require patches
to a h/w nic, veth and whatever the author really cares about before new
features like this go in?

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH RFC v4 bpf-next 09/11] tun: Support xdp in the Tx path for xdp_frames
  2020-03-03  4:27     ` David Ahern
@ 2020-03-03  9:08       ` Jesper Dangaard Brouer
  2020-03-03 18:16       ` Alexei Starovoitov
  1 sibling, 0 replies; 38+ messages in thread
From: Jesper Dangaard Brouer @ 2020-03-03  9:08 UTC (permalink / raw)
  To: David Ahern
  Cc: Alexei Starovoitov, David Ahern, netdev, davem, kuba,
	prashantbhole.linux, jasowang, toke, mst, toshiaki.makita1,
	daniel, john.fastabend, ast, kafai, songliubraving, yhs, andriin,
	dsahern, brouer

On Mon, 2 Mar 2020 21:27:08 -0700
David Ahern <dahern@digitalocean.com> wrote:

> On 3/2/20 11:30 AM, Alexei Starovoitov wrote:
> > On Wed, Feb 26, 2020 at 08:20:11PM -0700, David Ahern wrote:  
> >> +
> >> +		act = bpf_prog_run_xdp(xdp_prog, &xdp);
> >> +		switch (act) {
> >> +		case XDP_TX:    /* for Tx path, XDP_TX == XDP_PASS */
> >> +			act = XDP_PASS;
> >> +			break;
> >> +		case XDP_PASS:
> >> +			break;
> >> +		case XDP_REDIRECT:
> >> +			/* fall through */
> >> +		default:
> >> +			bpf_warn_invalid_xdp_action(act);
> >> +			/* fall through */
> >> +		case XDP_ABORTED:
> >> +			trace_xdp_exception(tun->dev, xdp_prog, act);
> >> +			/* fall through */
> >> +		case XDP_DROP:
> >> +			break;
> >> +		}  
> > 
> > patch 8 has very similar switch. Can you share the code?  
> 
> Most likely; I'll take a look.
> 
> > 
> > I'm worried that XDP_TX is a silent alias to XDP_PASS.
> > What were the reasons to go with this approach?  
> 
> As I stated in the cover letter:
> 
> "XDP_TX on Rx means send the packet out the device it arrived
> on; given that, XDP_Tx for the Tx path is treated as equivalent to
> XDP_PASS - ie., continue on the Tx path."

I'm not really buying this.  IHMO XDP_PASS should mean continue on the
Tx path, as this is a tx/egress XDP hook.  I don't see a reason to
basically "remove" the XDP_TX return code at this point.

> > imo it's less error prone and extensible to warn on XDP_TX.
> > Which will mean that both XDP_TX and XDP_REDICT are not supported
> > for egress atm.  

I agree.

I more see "XDP_TX" as a mirror facility... maybe there is a use-case
for bouncing packets back in the TX/Egress hook? That is future work,
but not reason disable the option now.


> I personally don't care either way; I was going with the simplest
> concept from a user perspective.
> 
> > 
> > Patches 8 and 9 cover tun only. I'd like to see egress hook to be
> > implemented in at least one physical NIC. Pick any hw. Something
> > that handles real frames. Adding this hook to virtual NIC is easy,
> > but it doesn't demonstrate design trade-offs one would need to
> > think through by adding egress hook to physical nic. That's why I
> > think it's mandatory to have it as part of the patch set.
> > 
> > Patch 11 exposes egress to samples/bpf. It's nice, but without
> > selftests it's no go. All new features must be exercised as part of
> > selftests/bpf.  
> 
> Patches that exercise the rtnetlink uapi are fairly easy to do on
> single node; anything traffic related requires multiple nodes or
> namespace level capabilities.  Unless I am missing something that is
> why all current XDP tests ride on top of veth; veth changes are not
> part of this set.
> 
> So to be clear you are saying that all new XDP features require
> patches to a h/w nic, veth and whatever the author really cares about
> before new features like this go in?

I would say yes. XDP is founded for physical HW NICs, and we need to
show it makes sense for physical HW NICs.

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH RFC v4 bpf-next 09/11] tun: Support xdp in the Tx path for xdp_frames
  2020-02-27  3:20 ` [PATCH RFC v4 bpf-next 09/11] tun: Support xdp in the Tx path for xdp_frames David Ahern
  2020-03-02 18:30   ` Alexei Starovoitov
@ 2020-03-03 10:40   ` Jesper Dangaard Brouer
  2020-03-10  3:06     ` David Ahern
  1 sibling, 1 reply; 38+ messages in thread
From: Jesper Dangaard Brouer @ 2020-03-03 10:40 UTC (permalink / raw)
  To: David Ahern
  Cc: netdev, davem, kuba, prashantbhole.linux, jasowang, toke, mst,
	toshiaki.makita1, daniel, john.fastabend, ast, kafai,
	songliubraving, yhs, andriin, dsahern, David Ahern, brouer

On Wed, 26 Feb 2020 20:20:11 -0700
David Ahern <dsahern@kernel.org> wrote:

> From: David Ahern <dahern@digitalocean.com>
> 
> Add support to run Tx path program on packets arriving at a tun
> device via XDP redirect.
> 
> XDP_TX return code means move the packet to the Tx path of the device.
> For a program run in the Tx / egress path, XDP_TX is essentially the
> same as "continue on" which is XDP_PASS.
> 
> Conceptually, XDP_REDIRECT for this path can work the same as it
> does for the Rx path, but that return code is left for a follow
> on series.
> 
> Signed-off-by: Prashant Bhole <prashantbhole.linux@gmail.com>
> Signed-off-by: David Ahern <dahern@digitalocean.com>
> ---
>  drivers/net/tun.c | 49 +++++++++++++++++++++++++++++++++++++++++++++--
>  1 file changed, 47 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/net/tun.c b/drivers/net/tun.c
> index dcae6521a39d..d3fc7e921c85 100644
> --- a/drivers/net/tun.c
> +++ b/drivers/net/tun.c
> @@ -1359,10 +1359,50 @@ static void __tun_xdp_flush_tfile(struct tun_file *tfile)
>  	tfile->socket.sk->sk_data_ready(tfile->socket.sk);
>  }
>  
> +static u32 tun_do_xdp_tx(struct tun_struct *tun, struct tun_file *tfile,
> +			 struct xdp_frame *frame, struct xdp_txq_info *txq)
> +{
> +	struct bpf_prog *xdp_prog;
> +	u32 act = XDP_PASS;
> +
> +	xdp_prog = rcu_dereference(tun->xdp_egress_prog);
> +	if (xdp_prog) {
> +		struct xdp_buff xdp;
> +
> +		xdp.data_hard_start = frame->data - frame->headroom;

This is correct, only because frame->headroom have been reduced with
sizeof(*xdp_frame), as we want to avoid that the BPF-prog have access
to xdp_frame memory.  Remember that memory storing xdp_frame in located
in the top of the payload/page.


> +		xdp.data = frame->data;
> +		xdp.data_end = xdp.data + frame->len;
> +		xdp_set_data_meta_invalid(&xdp);
> +		xdp.txq = txq;
> +
> +		act = bpf_prog_run_xdp(xdp_prog, &xdp);

The BPF-prog can change/adjust headroom and tailroom (tail only shrink,
but I'm working on extending this).  Thus, you need to adjust the
xdp_frame accordingly afterwards.

(The main use-case is pushing on a header, right?)

> +		switch (act) {
> +		case XDP_TX:    /* for Tx path, XDP_TX == XDP_PASS */
> +			act = XDP_PASS;
> +			break;
> +		case XDP_PASS:
> +			break;
> +		case XDP_REDIRECT:
> +			/* fall through */
> +		default:
> +			bpf_warn_invalid_xdp_action(act);
> +			/* fall through */
> +		case XDP_ABORTED:
> +			trace_xdp_exception(tun->dev, xdp_prog, act);
> +			/* fall through */
> +		case XDP_DROP:
> +			break;
> +		}
> +	}
> +
> +	return act;
> +}
> +
>  static int tun_xdp_xmit(struct net_device *dev, int n,
>  			struct xdp_frame **frames, u32 flags)
>  {
>  	struct tun_struct *tun = netdev_priv(dev);
> +	struct xdp_txq_info txq = { .dev = dev };
>  	struct tun_file *tfile;
>  	u32 numqueues;
>  	int drops = 0;
> @@ -1389,12 +1429,17 @@ static int tun_xdp_xmit(struct net_device *dev, int n,
>  	spin_lock(&tfile->tx_ring.producer_lock);
>  	for (i = 0; i < n; i++) {
>  		struct xdp_frame *xdp = frames[i];
> +		void *frame;
> +
> +		if (tun_do_xdp_tx(tun, tfile, xdp, &txq) != XDP_PASS)
> +			goto drop;
> +
>  		/* Encode the XDP flag into lowest bit for consumer to differ
>  		 * XDP buffer from sk_buff.
>  		 */
> -		void *frame = tun_xdp_to_ptr(xdp);
> -
> +		frame = tun_xdp_to_ptr(xdp);
>  		if (__ptr_ring_produce(&tfile->tx_ring, frame)) {
> +drop:
>  			this_cpu_inc(tun->pcpu_stats->tx_dropped);
>  			xdp_return_frame_rx_napi(xdp);
>  			drops++;



-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH RFC v4 bpf-next 08/11] tun: Support xdp in the Tx path for skb
  2020-02-27  3:20 ` [PATCH RFC v4 bpf-next 08/11] tun: Support xdp in the Tx path for skb David Ahern
  2020-03-02  3:28   ` Jason Wang
@ 2020-03-03 10:46   ` Jesper Dangaard Brouer
  2020-03-03 15:36     ` David Ahern
  1 sibling, 1 reply; 38+ messages in thread
From: Jesper Dangaard Brouer @ 2020-03-03 10:46 UTC (permalink / raw)
  To: David Ahern
  Cc: netdev, davem, kuba, prashantbhole.linux, jasowang, toke, mst,
	toshiaki.makita1, daniel, john.fastabend, ast, kafai,
	songliubraving, yhs, andriin, dsahern, David Ahern, brouer

On Wed, 26 Feb 2020 20:20:10 -0700
David Ahern <dsahern@kernel.org> wrote:

> +static u32 tun_do_xdp_tx_generic(struct tun_struct *tun,
> +				 struct net_device *dev,
> +				 struct sk_buff *skb)
> +{
> +	struct bpf_prog *xdp_prog;
> +	u32 act = XDP_PASS;
> +
> +	xdp_prog = rcu_dereference(tun->xdp_egress_prog);
> +	if (xdp_prog) {
> +		struct xdp_txq_info txq = { .dev = dev };
> +		struct xdp_buff xdp;
> +
> +		skb = tun_prepare_xdp_skb(skb);
> +		if (!skb) {
> +			act = XDP_DROP;
> +			goto out;
> +		}
> +
> +		xdp.txq = &txq;
> +
> +		act = do_xdp_generic_core(skb, &xdp, xdp_prog);
> +		switch (act) {
> +		case XDP_TX:    /* for Tx path, XDP_TX == XDP_PASS */
> +			act = XDP_PASS;
> +			break;
> +		case XDP_PASS:
> +			break;
> +		case XDP_REDIRECT:
> +			/* fall through */
> +		default:
> +			bpf_warn_invalid_xdp_action(act);
> +			/* fall through */
> +		case XDP_ABORTED:
> +			trace_xdp_exception(tun->dev, xdp_prog, act);

Hmm, don't we need to extend the trace_xdp_exception() to give users a
hint that this happened on the TX/egress path?

> +			/* fall through */
> +		case XDP_DROP:
> +			break;
> +		}
> +	}
> +
> +out:
> +	return act;
> +}


-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH RFC v4 bpf-next 08/11] tun: Support xdp in the Tx path for skb
  2020-03-03 10:46   ` Jesper Dangaard Brouer
@ 2020-03-03 15:36     ` David Ahern
  0 siblings, 0 replies; 38+ messages in thread
From: David Ahern @ 2020-03-03 15:36 UTC (permalink / raw)
  To: Jesper Dangaard Brouer, David Ahern
  Cc: netdev, davem, kuba, prashantbhole.linux, jasowang, toke, mst,
	toshiaki.makita1, daniel, john.fastabend, ast, kafai,
	songliubraving, yhs, andriin, dsahern

On 3/3/20 3:46 AM, Jesper Dangaard Brouer wrote:
> On Wed, 26 Feb 2020 20:20:10 -0700
> David Ahern <dsahern@kernel.org> wrote:
> 
>> +static u32 tun_do_xdp_tx_generic(struct tun_struct *tun,
>> +				 struct net_device *dev,
>> +				 struct sk_buff *skb)
>> +{
>> +	struct bpf_prog *xdp_prog;
>> +	u32 act = XDP_PASS;
>> +
>> +	xdp_prog = rcu_dereference(tun->xdp_egress_prog);
>> +	if (xdp_prog) {
>> +		struct xdp_txq_info txq = { .dev = dev };
>> +		struct xdp_buff xdp;
>> +
>> +		skb = tun_prepare_xdp_skb(skb);
>> +		if (!skb) {
>> +			act = XDP_DROP;
>> +			goto out;
>> +		}
>> +
>> +		xdp.txq = &txq;
>> +
>> +		act = do_xdp_generic_core(skb, &xdp, xdp_prog);
>> +		switch (act) {
>> +		case XDP_TX:    /* for Tx path, XDP_TX == XDP_PASS */
>> +			act = XDP_PASS;
>> +			break;
>> +		case XDP_PASS:
>> +			break;
>> +		case XDP_REDIRECT:
>> +			/* fall through */
>> +		default:
>> +			bpf_warn_invalid_xdp_action(act);
>> +			/* fall through */
>> +		case XDP_ABORTED:
>> +			trace_xdp_exception(tun->dev, xdp_prog, act);
> 
> Hmm, don't we need to extend the trace_xdp_exception() to give users a
> hint that this happened on the TX/egress path?

tracepoint has the program id, unsupported action and device. Seems like
the program id is sufficient. I do need to update libbpf to account for
the attach type.

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH RFC v4 bpf-next 09/11] tun: Support xdp in the Tx path for xdp_frames
  2020-03-03  4:27     ` David Ahern
  2020-03-03  9:08       ` Jesper Dangaard Brouer
@ 2020-03-03 18:16       ` Alexei Starovoitov
  1 sibling, 0 replies; 38+ messages in thread
From: Alexei Starovoitov @ 2020-03-03 18:16 UTC (permalink / raw)
  To: David Ahern
  Cc: David Ahern, netdev, davem, kuba, prashantbhole.linux, jasowang,
	brouer, toke, mst, toshiaki.makita1, daniel, john.fastabend, ast,
	kafai, songliubraving, yhs, andriin, dsahern

On Mon, Mar 02, 2020 at 09:27:08PM -0700, David Ahern wrote:
> > 
> > I'm worried that XDP_TX is a silent alias to XDP_PASS.
> > What were the reasons to go with this approach?
> 
> As I stated in the cover letter:
> 
> "XDP_TX on Rx means send the packet out the device it arrived
> on; given that, XDP_Tx for the Tx path is treated as equivalent to
> XDP_PASS - ie., continue on the Tx path."

I saw that, but it states the behavior and doesn't answer my "why" question.

> > imo it's less error prone and extensible to warn on XDP_TX.
> > Which will mean that both XDP_TX and XDP_REDICT are not supported for egress atm.
> 
> I personally don't care either way; I was going with the simplest
> concept from a user perspective.

That's not a good sign when uapi is designed as "dont care either way".

> > 
> > Patches 8 and 9 cover tun only. I'd like to see egress hook to be implemented
> > in at least one physical NIC. Pick any hw. Something that handles real frames.
> > Adding this hook to virtual NIC is easy, but it doesn't demonstrate design
> > trade-offs one would need to think through by adding egress hook to physical
> > nic. That's why I think it's mandatory to have it as part of the patch set.
> > 
> > Patch 11 exposes egress to samples/bpf. It's nice, but without selftests it's
> > no go. All new features must be exercised as part of selftests/bpf.
> 
> Patches that exercise the rtnetlink uapi are fairly easy to do on single
> node; anything traffic related requires multiple nodes or namespace
> level capabilities.  Unless I am missing something that is why all
> current XDP tests ride on top of veth; veth changes are not part of this
> set.
> 
> So to be clear you are saying that all new XDP features require patches
> to a h/w nic, veth and whatever the author really cares about before new
> features like this go in?

I didn't say 'veth'. I really meant 'physical nic'.
The patch set implies that XDP_EGRESS is a generic concept and applicable to
physical and virtual netdevs. There is an implementation for tun. But reading
between the lines I don't see that api was thought through on the physical nic.
Hence I'm requesting to see the patches that implement it. When you'll try to
add xdp_egress to a physical nic I suspect there will be challenges that will
force changes to xdp_egress api and I want that to happen before uapi lands in
the tree and becomes frozen.

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH RFC v4 bpf-next 07/11] tun: set egress XDP program
  2020-03-02  3:32   ` Jason Wang
  2020-03-02  3:52     ` David Ahern
@ 2020-03-10  2:18     ` David Ahern
  1 sibling, 0 replies; 38+ messages in thread
From: David Ahern @ 2020-03-10  2:18 UTC (permalink / raw)
  To: Jason Wang, David Ahern, netdev
  Cc: davem, kuba, prashantbhole.linux, brouer, toke, mst,
	toshiaki.makita1, daniel, john.fastabend, ast, kafai,
	songliubraving, yhs, andriin, David Ahern

On 3/1/20 8:32 PM, Jason Wang wrote:
>> @@ -1189,15 +1190,21 @@ tun_net_get_stats64(struct net_device *dev,
>> struct rtnl_link_stats64 *stats)
>>   }
>>     static int tun_xdp_set(struct net_device *dev, struct bpf_prog *prog,
>> -               struct netlink_ext_ack *extack)
> 
> 
> How about accept a bpf_prog ** here, then there's no need to check
> egress when deferencing the pointer?
> 
> 

The egress arg is better. xdp_prog vs xdp_egress_prog is an element of
'struct tun'. tun is currently not needed in tun_xdp and is part of
tun_xdp_set already.

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH RFC v4 bpf-next 09/11] tun: Support xdp in the Tx path for xdp_frames
  2020-03-03 10:40   ` Jesper Dangaard Brouer
@ 2020-03-10  3:06     ` David Ahern
  2020-03-10  3:44       ` David Ahern
  0 siblings, 1 reply; 38+ messages in thread
From: David Ahern @ 2020-03-10  3:06 UTC (permalink / raw)
  To: Jesper Dangaard Brouer, David Ahern
  Cc: netdev, davem, kuba, prashantbhole.linux, jasowang, toke, mst,
	toshiaki.makita1, daniel, john.fastabend, ast, kafai,
	songliubraving, yhs, andriin, dsahern

On 3/3/20 3:40 AM, Jesper Dangaard Brouer wrote:
>> diff --git a/drivers/net/tun.c b/drivers/net/tun.c
>> index dcae6521a39d..d3fc7e921c85 100644
>> --- a/drivers/net/tun.c
>> +++ b/drivers/net/tun.c
>> @@ -1359,10 +1359,50 @@ static void __tun_xdp_flush_tfile(struct tun_file *tfile)
>>  	tfile->socket.sk->sk_data_ready(tfile->socket.sk);
>>  }
>>  
>> +static u32 tun_do_xdp_tx(struct tun_struct *tun, struct tun_file *tfile,
>> +			 struct xdp_frame *frame, struct xdp_txq_info *txq)
>> +{
>> +	struct bpf_prog *xdp_prog;
>> +	u32 act = XDP_PASS;
>> +
>> +	xdp_prog = rcu_dereference(tun->xdp_egress_prog);
>> +	if (xdp_prog) {
>> +		struct xdp_buff xdp;
>> +
>> +		xdp.data_hard_start = frame->data - frame->headroom;
> 
> This is correct, only because frame->headroom have been reduced with
> sizeof(*xdp_frame), as we want to avoid that the BPF-prog have access
> to xdp_frame memory.  Remember that memory storing xdp_frame in located
> in the top of the payload/page.
> 
> 
>> +		xdp.data = frame->data;
>> +		xdp.data_end = xdp.data + frame->len;
>> +		xdp_set_data_meta_invalid(&xdp);
>> +		xdp.txq = txq;
>> +
>> +		act = bpf_prog_run_xdp(xdp_prog, &xdp);
> 
> The BPF-prog can change/adjust headroom and tailroom (tail only shrink,
> but I'm working on extending this).  Thus, you need to adjust the
> xdp_frame accordingly afterwards.

Why do I need to make any adjustments beyond what is done by
bpf_xdp_adjust_head and bpf_xdp_adjust_tail?

The frame is on its way out, so the stack will not see the frame after
any head or tail changes. (REDIRECT is not supported, only PASS or DROP)

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH RFC v4 bpf-next 09/11] tun: Support xdp in the Tx path for xdp_frames
  2020-03-10  3:06     ` David Ahern
@ 2020-03-10  3:44       ` David Ahern
  2020-03-10  9:03         ` Jesper Dangaard Brouer
  0 siblings, 1 reply; 38+ messages in thread
From: David Ahern @ 2020-03-10  3:44 UTC (permalink / raw)
  To: Jesper Dangaard Brouer, David Ahern
  Cc: netdev, davem, kuba, prashantbhole.linux, jasowang, toke, mst,
	toshiaki.makita1, daniel, john.fastabend, ast, kafai,
	songliubraving, yhs, andriin, dsahern

On 3/9/20 9:06 PM, David Ahern wrote:
> Why do I need to make any adjustments beyond what is done by
> bpf_xdp_adjust_head and bpf_xdp_adjust_tail?

never mind. forgot the switch from xdp_frame to xdp_buff there so need
to go back to frame.

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH RFC v4 bpf-next 09/11] tun: Support xdp in the Tx path for xdp_frames
  2020-03-10  3:44       ` David Ahern
@ 2020-03-10  9:03         ` Jesper Dangaard Brouer
  0 siblings, 0 replies; 38+ messages in thread
From: Jesper Dangaard Brouer @ 2020-03-10  9:03 UTC (permalink / raw)
  To: David Ahern
  Cc: David Ahern, netdev, davem, kuba, prashantbhole.linux, jasowang,
	toke, mst, toshiaki.makita1, daniel, john.fastabend, ast, kafai,
	songliubraving, yhs, andriin, dsahern, brouer

On Mon, 9 Mar 2020 21:44:22 -0600
David Ahern <dahern@digitalocean.com> wrote:

> On 3/9/20 9:06 PM, David Ahern wrote:
> > Why do I need to make any adjustments beyond what is done by
> > bpf_xdp_adjust_head and bpf_xdp_adjust_tail?  
> 
> never mind. forgot the switch from xdp_frame to xdp_buff there so need
> to go back to frame.

Yes exactly, glad you realized this yourself ;-)

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer


^ permalink raw reply	[flat|nested] 38+ messages in thread

end of thread, other threads:[~2020-03-10  9:03 UTC | newest]

Thread overview: 38+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-02-27  3:20 [PATCH RFC v4 bpf-next 00/11] Add support for XDP in egress path David Ahern
2020-02-27  3:20 ` [PATCH RFC v4 bpf-next 01/11] net: Add XDP setup and query commands for Tx programs David Ahern
2020-02-27  3:20 ` [PATCH RFC v4 bpf-next 02/11] net: Add BPF_XDP_EGRESS as a bpf_attach_type David Ahern
2020-02-27  3:20 ` [PATCH RFC v4 bpf-next 03/11] xdp: Add xdp_txq_info to xdp_buff David Ahern
2020-02-27  8:00   ` Jesper Dangaard Brouer
2020-02-27 11:58     ` Toke Høiland-Jørgensen
2020-02-28  3:01       ` David Ahern
2020-02-28 10:10         ` Toke Høiland-Jørgensen
2020-02-27 20:44     ` David Ahern
2020-02-28 10:07       ` Toke Høiland-Jørgensen
2020-02-28 10:41         ` Jesper Dangaard Brouer
2020-02-27  3:20 ` [PATCH RFC v4 bpf-next 04/11] net: Add IFLA_XDP_EGRESS for XDP programs in the egress path David Ahern
2020-02-27  3:20 ` [PATCH RFC v4 bpf-next 05/11] net: core: rename netif_receive_generic_xdp to do_generic_xdp_core David Ahern
2020-02-27  3:20 ` [PATCH RFC v4 bpf-next 06/11] net: core: Rename do_xdp_generic to do_xdp_generic_rx and export David Ahern
2020-02-27  3:20 ` [PATCH RFC v4 bpf-next 07/11] tun: set egress XDP program David Ahern
2020-03-02  3:32   ` Jason Wang
2020-03-02  3:52     ` David Ahern
2020-03-10  2:18     ` David Ahern
2020-02-27  3:20 ` [PATCH RFC v4 bpf-next 08/11] tun: Support xdp in the Tx path for skb David Ahern
2020-03-02  3:28   ` Jason Wang
2020-03-02  3:41     ` David Ahern
2020-03-03 10:46   ` Jesper Dangaard Brouer
2020-03-03 15:36     ` David Ahern
2020-02-27  3:20 ` [PATCH RFC v4 bpf-next 09/11] tun: Support xdp in the Tx path for xdp_frames David Ahern
2020-03-02 18:30   ` Alexei Starovoitov
2020-03-03  4:27     ` David Ahern
2020-03-03  9:08       ` Jesper Dangaard Brouer
2020-03-03 18:16       ` Alexei Starovoitov
2020-03-03 10:40   ` Jesper Dangaard Brouer
2020-03-10  3:06     ` David Ahern
2020-03-10  3:44       ` David Ahern
2020-03-10  9:03         ` Jesper Dangaard Brouer
2020-02-27  3:20 ` [PATCH RFC v4 bpf-next 10/11] libbpf: Add egress XDP support David Ahern
2020-02-27  3:20 ` [PATCH RFC v4 bpf-next 11/11] samples/bpf: xdp1, add " David Ahern
2020-02-27 11:55 ` [PATCH RFC v4 bpf-next 00/11] Add support for XDP in egress path Toke Høiland-Jørgensen
2020-02-27 16:22   ` Alexei Starovoitov
2020-02-27 17:06     ` Toke Høiland-Jørgensen
2020-02-27 18:37       ` Alexei Starovoitov

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.