* [RFC v2 net-next 00/12] XDP in tx path
@ 2019-12-26  2:31 Prashant Bhole
  2019-12-26  2:31 ` [RFC v2 net-next 01/12] net: introduce BPF_XDP_EGRESS attach type for XDP Prashant Bhole
                   ` (12 more replies)
  0 siblings, 13 replies; 22+ messages in thread
From: Prashant Bhole @ 2019-12-26  2:31 UTC (permalink / raw)
  To: David S . Miller, Michael S . Tsirkin, Alexei Starovoitov,
	Daniel Borkmann, Jesper Dangaard Brouer
  Cc: Prashant Bhole, Jason Wang, David Ahern, Jakub Kicinski,
	John Fastabend, Toshiaki Makita, Martin KaFai Lau, Song Liu,
	Yonghong Song, Andrii Nakryiko, netdev

v2:
- New XDP attach type: Jesper, Toke and Alexei discussed whether to
  introduce a new program type. Since this set adds a way to attach a
  regular XDP program to the tx path, a new attach type BPF_XDP_EGRESS
  is introduced, as per Alexei's suggestion.

- libbpf API changes:
  Alexei had suggested an _opts() style of API extension. Accordingly,
  two new libbpf APIs are introduced which are equivalent to the
  existing APIs but can be extended easily. Please see the individual
  patches for details. The xdp1 sample program is modified to use the
  new APIs.

- tun: Some patches from the previous set are removed as they are
  irrelevant in this series. They will be introduced later.


This series introduces a new XDP attach type, BPF_XDP_EGRESS, to run
an XDP program in the tx path. The idea is to emulate the rx path XDP
of the peer interface. Such programs will not have access to rxq info.

This RFC also includes its usage in the tun driver; that part can be
posted separately later. Another possible user of this feature is the
veth driver, to improve container networking where a veth pair links
the host and the container: the host can enforce an ACL by attaching
a tx path XDP program to the veth interface, as sketched below.
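
As an illustration, a minimal ACL-style tx path program could look
like the sketch below. This is an untested sketch: it only assumes
the BPF_XDP_EGRESS attach type added in patch 1, and the includes,
section name and helpers follow the usual samples/bpf conventions.

#include <linux/bpf.h>
#include <linux/if_ether.h>
#include <linux/ip.h>
#include <linux/in.h>
#include "bpf_helpers.h"

/* Drop all UDP packets leaving the interface, pass everything else */
SEC("xdp_egress")
int xdp_tx_acl(struct xdp_md *ctx)
{
	void *data = (void *)(long)ctx->data;
	void *data_end = (void *)(long)ctx->data_end;
	struct ethhdr *eth = data;
	struct iphdr *iph = data + sizeof(*eth);

	if (data + sizeof(*eth) + sizeof(*iph) > data_end)
		return XDP_PASS;
	if (eth->h_proto != __constant_htons(ETH_P_IP))
		return XDP_PASS;
	if (iph->protocol == IPPROTO_UDP)
		return XDP_DROP;
	return XDP_PASS;
}

char _license[] SEC("license") = "GPL";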

This work was originally part of Jason Wang's "XDP offload with
virtio-net" [1]. In order to simplify that work we decided to split
it up and introduce tx path XDP separately in this set.

A performance improvement can be seen when an XDP program is attached
to the tun tx path, as opposed to the virtio-net rx path in the guest.

* Case 1: When packets are XDP_REDIRECT'ed towards tun.

                     virtio-net rx XDP      tun tx XDP
  xdp1(XDP_DROP)        2.57 Mpps           12.90 Mpps
  xdp2(XDP_TX)          1.53 Mpps            7.15 Mpps

* Case 2: When packets pass through a bridge towards tun

                     virtio-net rx XDP      tun tx XDP
  xdp1(XDP_DROP)        0.99 Mpps           1.00 Mpps
  xdp2(XDP_TX)          1.19 Mpps           0.97 Mpps

Since this set modifies tun and vhost_net, below are the netperf
performance numbers.

    Netperf_test       Before      After   Difference
  UDP_STREAM 18byte     90.14       88.77    -1.51%
  UDP_STREAM 1472byte   6955        6658     -4.27%
  TCP_STREAM            9409        9402     -0.07%
  UDP_RR                12658       13030    +2.93%
  TCP_RR                12711       12831    +0.94%

XDP_REDIRECT will be handled later because we need to come up with a
proper way to handle it in the tx path.

Patches 1-5 are related to adding tx path XDP support.
Patches 6-12 implement tx path XDP in tun driver.

[1]: https://netdevconf.info/0x13/session.html?xdp-offload-with-virtio-net



David Ahern (2):
  net: introduce BPF_XDP_EGRESS attach type for XDP
  tun: set tx path XDP program

Jason Wang (2):
  net: core: rename netif_receive_generic_xdp() to do_xdp_generic_core()
  net: core: export do_xdp_generic_core()

Prashant Bhole (8):
  tools: sync kernel uapi/linux/if_link.h header
  libbpf: api for getting/setting link xdp options
  libbpf: set xdp program in tx path
  samples/bpf: xdp1, add XDP tx support
  tuntap: check tun_msg_ctl type at necessary places
  vhost_net: use tap recvmsg api to access ptr ring
  tuntap: remove usage of ptr ring in vhost_net
  tun: run XDP program in tx path

 drivers/net/tap.c                  |  42 +++---
 drivers/net/tun.c                  | 220 ++++++++++++++++++++++++++---
 drivers/vhost/net.c                |  77 +++++-----
 include/linux/if_tap.h             |   5 -
 include/linux/if_tun.h             |  23 ++-
 include/linux/netdevice.h          |   6 +-
 include/uapi/linux/bpf.h           |   1 +
 include/uapi/linux/if_link.h       |   1 +
 net/core/dev.c                     |  42 ++++--
 net/core/filter.c                  |   8 ++
 net/core/rtnetlink.c               | 112 ++++++++++++++-
 samples/bpf/xdp1_user.c            |  42 ++++--
 tools/include/uapi/linux/bpf.h     |   1 +
 tools/include/uapi/linux/if_link.h |   2 +
 tools/lib/bpf/libbpf.h             |  40 ++++++
 tools/lib/bpf/libbpf.map           |   2 +
 tools/lib/bpf/netlink.c            | 113 +++++++++++++--
 17 files changed, 613 insertions(+), 124 deletions(-)

-- 
2.21.0



* [RFC v2 net-next 01/12] net: introduce BPF_XDP_EGRESS attach type for XDP
  2019-12-26  2:31 [RFC v2 net-next 00/12] XDP in tx path Prashant Bhole
@ 2019-12-26  2:31 ` Prashant Bhole
  2019-12-27 14:27   ` Jesper Dangaard Brouer
  2019-12-26  2:31 ` [RFC v2 net-next 02/12] tools: sync kernel uapi/linux/if_link.h header Prashant Bhole
                   ` (11 subsequent siblings)
  12 siblings, 1 reply; 22+ messages in thread
From: Prashant Bhole @ 2019-12-26  2:31 UTC (permalink / raw)
  To: David S . Miller, Michael S . Tsirkin, Alexei Starovoitov,
	Daniel Borkmann, Jesper Dangaard Brouer
  Cc: David Ahern, Jason Wang, David Ahern, Jakub Kicinski,
	John Fastabend, Toshiaki Makita, Martin KaFai Lau, Song Liu,
	Yonghong Song, Andrii Nakryiko, netdev, Prashant Bhole

From: David Ahern <dahern@digitalocean.com>

There is a need to run an XDP program in the tx path such that it
emulates rx path XDP of the peer interface.

Possible use cases:
- virtio-net XDP offload, where the virtio-net driver implements an
  offload feature such that it sends the XDP program to QEMU, and
  QEMU then runs the program in the tx path of the tap device.

- Container networking, where a veth pair links the host and the
  container. The host can enforce an ACL by attaching a tx path XDP
  program to the veth interface.

This patch introduces a new bpf attach type, BPF_XDP_EGRESS. Only
programs with this attach type are allowed to run in the tx path,
because we need to prevent programs from accessing rxq info when they
run there. The verifier rejects programs that have this attach type
and try to access rxq info, as in the example below.
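
For example, a program like the following (the usual SEC() and
bpf_helpers.h boilerplate is assumed) fails to load when its
expected_attach_type is BPF_XDP_EGRESS, since rx_queue_index is not a
valid context access in the tx path:

SEC("xdp_egress")
int xdp_tx_bad(struct xdp_md *ctx)
{
	/* rejected by the verifier for BPF_XDP_EGRESS programs */
	if (ctx->rx_queue_index == 0)
		return XDP_DROP;
	return XDP_PASS;
}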

The patch also introduces a new netlink attribute, IFLA_XDP_TX, which
can be used to set an XDP program in the tx path and to get
information about such programs.

Drivers that want to support tx path XDP need to handle the
XDP_SETUP_PROG_TX and XDP_QUERY_PROG_TX commands in their ndo_bpf
handler, roughly as in the sketch below.
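
As a rough sketch (not part of this patch; the foo_* names are made
up and locking/RCU details are omitted), a driver's ndo_bpf could
handle the new commands like this:

static int foo_ndo_bpf(struct net_device *dev, struct netdev_bpf *bpf)
{
	struct foo_priv *priv = netdev_priv(dev);

	switch (bpf->command) {
	case XDP_SETUP_PROG_TX:
		/* swap in the new tx path program, release the old one */
		return foo_xdp_set_tx_prog(priv, bpf->prog, bpf->extack);
	case XDP_QUERY_PROG_TX:
		bpf->prog_id = priv->xdp_tx_prog ?
			       priv->xdp_tx_prog->aux->id : 0;
		return 0;
	default:
		return -EINVAL;
	}
}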

Signed-off-by: David Ahern <dahern@digitalocean.com>
Co-developed-by: Prashant Bhole <prashantbhole.linux@gmail.com>
Signed-off-by: Prashant Bhole <prashantbhole.linux@gmail.com>
---
 include/linux/netdevice.h      |   4 +-
 include/uapi/linux/bpf.h       |   1 +
 include/uapi/linux/if_link.h   |   1 +
 net/core/dev.c                 |  34 +++++++---
 net/core/filter.c              |   8 +++
 net/core/rtnetlink.c           | 112 ++++++++++++++++++++++++++++++++-
 tools/include/uapi/linux/bpf.h |   1 +
 7 files changed, 150 insertions(+), 11 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 469a297b58c0..ac3e88d86581 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -865,8 +865,10 @@ enum bpf_netdev_command {
 	 */
 	XDP_SETUP_PROG,
 	XDP_SETUP_PROG_HW,
+	XDP_SETUP_PROG_TX,
 	XDP_QUERY_PROG,
 	XDP_QUERY_PROG_HW,
+	XDP_QUERY_PROG_TX,
 	/* BPF program for offload callbacks, invoked at program load time. */
 	BPF_OFFLOAD_MAP_ALLOC,
 	BPF_OFFLOAD_MAP_FREE,
@@ -3725,7 +3727,7 @@ struct sk_buff *dev_hard_start_xmit(struct sk_buff *skb, struct net_device *dev,
 
 typedef int (*bpf_op_t)(struct net_device *dev, struct netdev_bpf *bpf);
 int dev_change_xdp_fd(struct net_device *dev, struct netlink_ext_ack *extack,
-		      int fd, u32 flags);
+		      int fd, u32 flags, bool tx);
 u32 __dev_xdp_query(struct net_device *dev, bpf_op_t xdp_op,
 		    enum bpf_netdev_command cmd);
 int xdp_umem_query(struct net_device *dev, u16 queue_id);
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
index dbbcf0b02970..23c1841c8086 100644
--- a/include/uapi/linux/bpf.h
+++ b/include/uapi/linux/bpf.h
@@ -203,6 +203,7 @@ enum bpf_attach_type {
 	BPF_TRACE_RAW_TP,
 	BPF_TRACE_FENTRY,
 	BPF_TRACE_FEXIT,
+	BPF_XDP_EGRESS,
 	__MAX_BPF_ATTACH_TYPE
 };
 
diff --git a/include/uapi/linux/if_link.h b/include/uapi/linux/if_link.h
index 1d69f637c5d6..be97c9787140 100644
--- a/include/uapi/linux/if_link.h
+++ b/include/uapi/linux/if_link.h
@@ -170,6 +170,7 @@ enum {
 	IFLA_PROP_LIST,
 	IFLA_ALT_IFNAME, /* Alternative ifname */
 	IFLA_PERM_ADDRESS,
+	IFLA_XDP_TX,
 	__IFLA_MAX
 };
 
diff --git a/net/core/dev.c b/net/core/dev.c
index 0ad39c87b7fd..ae66fd791737 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -8540,7 +8540,7 @@ u32 __dev_xdp_query(struct net_device *dev, bpf_op_t bpf_op,
 
 static int dev_xdp_install(struct net_device *dev, bpf_op_t bpf_op,
 			   struct netlink_ext_ack *extack, u32 flags,
-			   struct bpf_prog *prog)
+			   struct bpf_prog *prog, bool tx)
 {
 	struct netdev_bpf xdp;
 
@@ -8548,7 +8548,8 @@ static int dev_xdp_install(struct net_device *dev, bpf_op_t bpf_op,
 	if (flags & XDP_FLAGS_HW_MODE)
 		xdp.command = XDP_SETUP_PROG_HW;
 	else
-		xdp.command = XDP_SETUP_PROG;
+		xdp.command = tx ? XDP_SETUP_PROG_TX : XDP_SETUP_PROG;
+
 	xdp.extack = extack;
 	xdp.flags = flags;
 	xdp.prog = prog;
@@ -8562,7 +8563,8 @@ static void dev_xdp_uninstall(struct net_device *dev)
 	bpf_op_t ndo_bpf;
 
 	/* Remove generic XDP */
-	WARN_ON(dev_xdp_install(dev, generic_xdp_install, NULL, 0, NULL));
+	WARN_ON(dev_xdp_install(dev, generic_xdp_install, NULL, 0, NULL,
+				false));
 
 	/* Remove from the driver */
 	ndo_bpf = dev->netdev_ops->ndo_bpf;
@@ -8574,14 +8576,21 @@ static void dev_xdp_uninstall(struct net_device *dev)
 	WARN_ON(ndo_bpf(dev, &xdp));
 	if (xdp.prog_id)
 		WARN_ON(dev_xdp_install(dev, ndo_bpf, NULL, xdp.prog_flags,
-					NULL));
+					NULL, false));
 
 	/* Remove HW offload */
 	memset(&xdp, 0, sizeof(xdp));
 	xdp.command = XDP_QUERY_PROG_HW;
 	if (!ndo_bpf(dev, &xdp) && xdp.prog_id)
 		WARN_ON(dev_xdp_install(dev, ndo_bpf, NULL, xdp.prog_flags,
-					NULL));
+					NULL, false));
+
+	/* Remove tx path XDP */
+	memset(&xdp, 0, sizeof(xdp));
+	xdp.command = XDP_QUERY_PROG_TX;
+	if (!ndo_bpf(dev, &xdp) && xdp.prog_id)
+		WARN_ON(dev_xdp_install(dev, ndo_bpf, NULL, xdp.prog_flags,
+					NULL, true));
 }
 
 /**
@@ -8594,7 +8603,7 @@ static void dev_xdp_uninstall(struct net_device *dev)
  *	Set or clear a bpf program for a device
  */
 int dev_change_xdp_fd(struct net_device *dev, struct netlink_ext_ack *extack,
-		      int fd, u32 flags)
+		      int fd, u32 flags, bool tx)
 {
 	const struct net_device_ops *ops = dev->netdev_ops;
 	enum bpf_netdev_command query;
@@ -8606,7 +8615,10 @@ int dev_change_xdp_fd(struct net_device *dev, struct netlink_ext_ack *extack,
 	ASSERT_RTNL();
 
 	offload = flags & XDP_FLAGS_HW_MODE;
-	query = offload ? XDP_QUERY_PROG_HW : XDP_QUERY_PROG;
+	if (tx)
+		query = XDP_QUERY_PROG_TX;
+	else
+		query = offload ? XDP_QUERY_PROG_HW : XDP_QUERY_PROG;
 
 	bpf_op = bpf_chk = ops->ndo_bpf;
 	if (!bpf_op && (flags & (XDP_FLAGS_DRV_MODE | XDP_FLAGS_HW_MODE))) {
@@ -8621,7 +8633,8 @@ int dev_change_xdp_fd(struct net_device *dev, struct netlink_ext_ack *extack,
 	if (fd >= 0) {
 		u32 prog_id;
 
-		if (!offload && __dev_xdp_query(dev, bpf_chk, XDP_QUERY_PROG)) {
+		if (!offload && !tx &&
+		    __dev_xdp_query(dev, bpf_chk, XDP_QUERY_PROG)) {
 			NL_SET_ERR_MSG(extack, "native and generic XDP can't be active at the same time");
 			return -EEXIST;
 		}
@@ -8637,6 +8650,9 @@ int dev_change_xdp_fd(struct net_device *dev, struct netlink_ext_ack *extack,
 		if (IS_ERR(prog))
 			return PTR_ERR(prog);
 
+		if (tx && prog->expected_attach_type != BPF_XDP_EGRESS)
+			return -EINVAL;
+
 		if (!offload && bpf_prog_is_dev_bound(prog->aux)) {
 			NL_SET_ERR_MSG(extack, "using device-bound program without HW_MODE flag is not supported");
 			bpf_prog_put(prog);
@@ -8653,7 +8669,7 @@ int dev_change_xdp_fd(struct net_device *dev, struct netlink_ext_ack *extack,
 			return 0;
 	}
 
-	err = dev_xdp_install(dev, bpf_op, extack, flags, prog);
+	err = dev_xdp_install(dev, bpf_op, extack, flags, prog, tx);
 	if (err < 0 && prog)
 		bpf_prog_put(prog);
 
diff --git a/net/core/filter.c b/net/core/filter.c
index 28b3c258188c..aaf04ff297c7 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -6896,6 +6896,14 @@ static bool xdp_is_valid_access(int off, int size,
 		return false;
 	}
 
+	if (prog->expected_attach_type == BPF_XDP_EGRESS) {
+		switch (off) {
+		case offsetof(struct xdp_md, rx_queue_index):
+		case offsetof(struct xdp_md, ingress_ifindex):
+			return false;
+		}
+	}
+
 	switch (off) {
 	case offsetof(struct xdp_md, data):
 		info->reg_type = PTR_TO_PACKET;
diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index 20bc406f3871..9dc4b2547f62 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -1395,6 +1395,36 @@ static int rtnl_fill_link_ifmap(struct sk_buff *skb, struct net_device *dev)
 	return 0;
 }
 
+static u32 rtnl_xdp_tx_prog_drv(struct net_device *dev)
+{
+	return __dev_xdp_query(dev, dev->netdev_ops->ndo_bpf,
+			       XDP_QUERY_PROG_TX);
+}
+
+static int rtnl_xdp_tx_report_one(struct sk_buff *skb, struct net_device *dev,
+				  u32 *prog_id, u8 *mode, u8 tgt_mode, u32 attr,
+				  u32 (*get_prog_id)(struct net_device *dev))
+{
+	u32 curr_id;
+	int err;
+
+	curr_id = get_prog_id(dev);
+	if (!curr_id)
+		return 0;
+
+	*prog_id = curr_id;
+	err = nla_put_u32(skb, attr, curr_id);
+	if (err)
+		return err;
+
+	if (*mode != XDP_ATTACHED_NONE)
+		*mode = XDP_ATTACHED_MULTI;
+	else
+		*mode = tgt_mode;
+
+	return 0;
+}
+
 static u32 rtnl_xdp_prog_skb(struct net_device *dev)
 {
 	const struct bpf_prog *generic_xdp_prog;
@@ -1486,6 +1516,41 @@ static int rtnl_xdp_fill(struct sk_buff *skb, struct net_device *dev)
 	return err;
 }
 
+static int rtnl_xdp_tx_fill(struct sk_buff *skb, struct net_device *dev)
+{
+	u8 mode = XDP_ATTACHED_NONE;
+	struct nlattr *xdp;
+	u32 prog_id = 0;
+	int err;
+
+	xdp = nla_nest_start_noflag(skb, IFLA_XDP_TX);
+	if (!xdp)
+		return -EMSGSIZE;
+
+	err = rtnl_xdp_tx_report_one(skb, dev, &prog_id, &mode,
+				     XDP_ATTACHED_DRV, IFLA_XDP_DRV_PROG_ID,
+				     rtnl_xdp_tx_prog_drv);
+	if (err)
+		goto err_cancel;
+
+	err = nla_put_u8(skb, IFLA_XDP_ATTACHED, mode);
+	if (err)
+		goto err_cancel;
+
+	if (prog_id && mode != XDP_ATTACHED_MULTI) {
+		err = nla_put_u32(skb, IFLA_XDP_PROG_ID, prog_id);
+		if (err)
+			goto err_cancel;
+	}
+
+	nla_nest_end(skb, xdp);
+	return 0;
+
+err_cancel:
+	nla_nest_cancel(skb, xdp);
+	return err;
+}
+
 static u32 rtnl_get_event(unsigned long event)
 {
 	u32 rtnl_event_type = IFLA_EVENT_NONE;
@@ -1743,6 +1808,9 @@ static int rtnl_fill_ifinfo(struct sk_buff *skb,
 	if (rtnl_xdp_fill(skb, dev))
 		goto nla_put_failure;
 
+	if (rtnl_xdp_tx_fill(skb, dev))
+		goto nla_put_failure;
+
 	if (dev->rtnl_link_ops || rtnl_have_link_slave_info(dev)) {
 		if (rtnl_link_fill(skb, dev) < 0)
 			goto nla_put_failure;
@@ -1827,6 +1895,7 @@ static const struct nla_policy ifla_policy[IFLA_MAX+1] = {
 	[IFLA_ALT_IFNAME]	= { .type = NLA_STRING,
 				    .len = ALTIFNAMSIZ - 1 },
 	[IFLA_PERM_ADDRESS]	= { .type = NLA_REJECT },
+	[IFLA_XDP_TX]		= { .type = NLA_NESTED },
 };
 
 static const struct nla_policy ifla_info_policy[IFLA_INFO_MAX+1] = {
@@ -2801,7 +2870,48 @@ static int do_setlink(const struct sk_buff *skb,
 		if (xdp[IFLA_XDP_FD]) {
 			err = dev_change_xdp_fd(dev, extack,
 						nla_get_s32(xdp[IFLA_XDP_FD]),
-						xdp_flags);
+						xdp_flags, false);
+			if (err)
+				goto errout;
+			status |= DO_SETLINK_NOTIFY;
+		}
+	}
+
+	if (tb[IFLA_XDP_TX]) {
+		struct nlattr *xdp[IFLA_XDP_MAX + 1];
+		u32 xdp_flags = 0;
+
+		err = nla_parse_nested_deprecated(xdp, IFLA_XDP_MAX,
+						  tb[IFLA_XDP_TX],
+						  ifla_xdp_policy, NULL);
+		if (err < 0)
+			goto errout;
+
+		if (xdp[IFLA_XDP_ATTACHED] || xdp[IFLA_XDP_PROG_ID]) {
+			err = -EINVAL;
+			goto errout;
+		}
+
+		if (xdp[IFLA_XDP_FLAGS]) {
+			xdp_flags = nla_get_u32(xdp[IFLA_XDP_FLAGS]);
+			if (xdp_flags & XDP_FLAGS_HW_MODE) {
+				err = -EINVAL;
+				goto errout;
+			}
+			if (xdp_flags & ~XDP_FLAGS_MASK) {
+				err = -EINVAL;
+				goto errout;
+			}
+			if (hweight32(xdp_flags & XDP_FLAGS_MODES) > 1) {
+				err = -EINVAL;
+				goto errout;
+			}
+		}
+
+		if (xdp[IFLA_XDP_FD]) {
+			err = dev_change_xdp_fd(dev, extack,
+						nla_get_s32(xdp[IFLA_XDP_FD]),
+						xdp_flags, true);
 			if (err)
 				goto errout;
 			status |= DO_SETLINK_NOTIFY;
diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h
index dbbcf0b02970..23c1841c8086 100644
--- a/tools/include/uapi/linux/bpf.h
+++ b/tools/include/uapi/linux/bpf.h
@@ -203,6 +203,7 @@ enum bpf_attach_type {
 	BPF_TRACE_RAW_TP,
 	BPF_TRACE_FENTRY,
 	BPF_TRACE_FEXIT,
+	BPF_XDP_EGRESS,
 	__MAX_BPF_ATTACH_TYPE
 };
 
-- 
2.21.0



* [RFC v2 net-next 02/12] tools: sync kernel uapi/linux/if_link.h header
  2019-12-26  2:31 [RFC v2 net-next 00/12] XDP in tx path Prashant Bhole
  2019-12-26  2:31 ` [RFC v2 net-next 01/12] net: introduce BPF_XDP_EGRESS attach type for XDP Prashant Bhole
@ 2019-12-26  2:31 ` Prashant Bhole
  2019-12-26  2:31 ` [RFC v2 net-next 03/12] libbpf: api for getting/setting link xdp options Prashant Bhole
                   ` (10 subsequent siblings)
  12 siblings, 0 replies; 22+ messages in thread
From: Prashant Bhole @ 2019-12-26  2:31 UTC (permalink / raw)
  To: David S . Miller, Michael S . Tsirkin, Alexei Starovoitov,
	Daniel Borkmann, Jesper Dangaard Brouer
  Cc: Prashant Bhole, Jason Wang, David Ahern, Jakub Kicinski,
	John Fastabend, Toshiaki Makita, Martin KaFai Lau, Song Liu,
	Yonghong Song, Andrii Nakryiko, netdev

The tools copy of if_link.h was out of sync with the kernel header,
which was also recently updated to add the IFLA_XDP_TX attribute.
Sync it.

Signed-off-by: Prashant Bhole <prashantbhole.linux@gmail.com>
---
 tools/include/uapi/linux/if_link.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/tools/include/uapi/linux/if_link.h b/tools/include/uapi/linux/if_link.h
index 8aec8769d944..be97c9787140 100644
--- a/tools/include/uapi/linux/if_link.h
+++ b/tools/include/uapi/linux/if_link.h
@@ -169,6 +169,8 @@ enum {
 	IFLA_MAX_MTU,
 	IFLA_PROP_LIST,
 	IFLA_ALT_IFNAME, /* Alternative ifname */
+	IFLA_PERM_ADDRESS,
+	IFLA_XDP_TX,
 	__IFLA_MAX
 };
 
-- 
2.21.0



* [RFC v2 net-next 03/12] libbpf: api for getting/setting link xdp options
  2019-12-26  2:31 [RFC v2 net-next 00/12] XDP in tx path Prashant Bhole
  2019-12-26  2:31 ` [RFC v2 net-next 01/12] net: introduce BPF_XDP_EGRESS attach type for XDP Prashant Bhole
  2019-12-26  2:31 ` [RFC v2 net-next 02/12] tools: sync kernel uapi/linux/if_link.h header Prashant Bhole
@ 2019-12-26  2:31 ` Prashant Bhole
  2019-12-30  4:49   ` Andrii Nakryiko
  2019-12-26  2:31 ` [RFC v2 net-next 04/12] libbpf: set xdp program in tx path Prashant Bhole
                   ` (9 subsequent siblings)
  12 siblings, 1 reply; 22+ messages in thread
From: Prashant Bhole @ 2019-12-26  2:31 UTC (permalink / raw)
  To: David S . Miller, Michael S . Tsirkin, Alexei Starovoitov,
	Daniel Borkmann, Jesper Dangaard Brouer
  Cc: Prashant Bhole, Jason Wang, David Ahern, Jakub Kicinski,
	John Fastabend, Toshiaki Makita, Martin KaFai Lau, Song Liu,
	Yonghong Song, Andrii Nakryiko, netdev

This patch introduces and uses new APIs:

struct bpf_link_xdp_opts {
        struct xdp_link_info *link_info;
        size_t link_info_sz;
        __u32 flags;
        __u32 prog_id;
        int prog_fd;
};

enum bpf_link_cmd {
	BPF_LINK_GET_XDP_INFO,
	BPF_LINK_GET_XDP_ID,
	BPF_LINK_SET_XDP_FD,
};

int bpf_get_link_opts(int ifindex, struct bpf_link_xdp_opts *opts,
		      enum bpf_link_cmd cmd);
int bpf_set_link_opts(int ifindex, struct bpf_link_xdp_opts *opts,
		      enum bpf_link_cmd cmd);

The operations performed by these two functions are equivalent to
those of the existing APIs:

BPF_LINK_GET_XDP_ID   is equivalent to bpf_get_link_xdp_id()
BPF_LINK_SET_XDP_FD   is equivalent to bpf_set_link_xdp_fd()
BPF_LINK_GET_XDP_INFO is equivalent to bpf_get_link_xdp_info()

It will be easy to extend this API by adding members to struct
bpf_link_xdp_opts and adding new operations. The next patch extends
this API to set an XDP program in the tx path.
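
For example, attaching a program and reading back its id with the new
API looks roughly like this (a fragment; error handling trimmed):

	struct bpf_link_xdp_opts opts = {};

	opts.prog_fd = prog_fd;
	opts.flags = xdp_flags;
	err = bpf_set_link_opts(ifindex, &opts, BPF_LINK_SET_XDP_FD);

	memset(&opts, 0, sizeof(opts));
	opts.flags = xdp_flags;
	err = bpf_get_link_opts(ifindex, &opts, BPF_LINK_GET_XDP_ID);
	if (!err)
		printf("attached prog id: %u\n", opts.prog_id);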

Signed-off-by: Prashant Bhole <prashantbhole.linux@gmail.com>
---
 tools/lib/bpf/libbpf.h   | 36 +++++++++++++++++++
 tools/lib/bpf/libbpf.map |  2 ++
 tools/lib/bpf/netlink.c  | 77 ++++++++++++++++++++++++++++++++++++----
 3 files changed, 109 insertions(+), 6 deletions(-)

diff --git a/tools/lib/bpf/libbpf.h b/tools/lib/bpf/libbpf.h
index 0dbf4bfba0c4..8178fd5a1e8f 100644
--- a/tools/lib/bpf/libbpf.h
+++ b/tools/lib/bpf/libbpf.h
@@ -443,10 +443,46 @@ struct xdp_link_info {
 	__u8 attach_mode;
 };
 
+struct bpf_link_xdp_opts {
+	struct xdp_link_info *link_info;
+	size_t link_info_sz;
+	__u32 flags;
+	__u32 prog_id;
+	int prog_fd;
+};
+
+/*
+ * enum values below are set of commands to get and set/get XDP related
+ * attributes to a link. These are used along with struct bpf_link_xdp_opts.
+ *
+ * BPF_LINK_GET_XDP_INFO uses fields:
+ *	- link_info
+ *	- link_info_sz
+ *	- flags
+ *
+ * BPF_LINK_GET_XDP_ID uses fields:
+ *	- flags
+ *
+ * BPF_LINK_SET_XDP_FD uses fields:
+ *	- prog_fd
+ *	- flags
+ */
+enum bpf_link_cmd {
+	BPF_LINK_GET_XDP_INFO,
+	BPF_LINK_GET_XDP_ID,
+	BPF_LINK_SET_XDP_FD,
+};
+
 LIBBPF_API int bpf_set_link_xdp_fd(int ifindex, int fd, __u32 flags);
 LIBBPF_API int bpf_get_link_xdp_id(int ifindex, __u32 *prog_id, __u32 flags);
 LIBBPF_API int bpf_get_link_xdp_info(int ifindex, struct xdp_link_info *info,
 				     size_t info_size, __u32 flags);
+LIBBPF_API __u32 bpf_get_link_xdp_info_id(struct xdp_link_info *info,
+					  __u32 flags);
+LIBBPF_API int bpf_get_link_opts(int ifindex, struct bpf_link_xdp_opts *opts,
+				 enum bpf_link_cmd cmd);
+LIBBPF_API int bpf_set_link_opts(int ifindex, struct bpf_link_xdp_opts *opts,
+				 enum bpf_link_cmd cmd);
 
 struct perf_buffer;
 
diff --git a/tools/lib/bpf/libbpf.map b/tools/lib/bpf/libbpf.map
index 8ddc2c40e482..332522fb5853 100644
--- a/tools/lib/bpf/libbpf.map
+++ b/tools/lib/bpf/libbpf.map
@@ -207,4 +207,6 @@ LIBBPF_0.0.6 {
 		bpf_program__size;
 		btf__find_by_name_kind;
 		libbpf_find_vmlinux_btf_id;
+		bpf_set_link_opts;
+		bpf_get_link_opts;
 } LIBBPF_0.0.5;
diff --git a/tools/lib/bpf/netlink.c b/tools/lib/bpf/netlink.c
index 5065c1aa1061..1274b540a9ad 100644
--- a/tools/lib/bpf/netlink.c
+++ b/tools/lib/bpf/netlink.c
@@ -129,8 +129,10 @@ static int bpf_netlink_recv(int sock, __u32 nl_pid, int seq,
 	return ret;
 }
 
-int bpf_set_link_xdp_fd(int ifindex, int fd, __u32 flags)
+static int __bpf_set_link_xdp_fd(int ifindex, struct bpf_link_xdp_opts *opts)
 {
+	int fd = opts->prog_fd;
+	__u32 flags = opts->flags;
 	int sock, seq = 0, ret;
 	struct nlattr *nla, *nla_xdp;
 	struct {
@@ -188,6 +190,16 @@ int bpf_set_link_xdp_fd(int ifindex, int fd, __u32 flags)
 	return ret;
 }
 
+int bpf_set_link_xdp_fd(int ifindex, int fd, __u32 flags)
+{
+	struct bpf_link_xdp_opts opts = {};
+
+	opts.prog_fd = fd;
+	opts.flags = flags;
+
+	return bpf_set_link_opts(ifindex, &opts, BPF_LINK_SET_XDP_FD);
+}
+
 static int __dump_link_nlmsg(struct nlmsghdr *nlh,
 			     libbpf_dump_nlmsg_t dump_link_nlmsg, void *cookie)
 {
@@ -248,10 +260,12 @@ static int get_xdp_info(void *cookie, void *msg, struct nlattr **tb)
 	return 0;
 }
 
-int bpf_get_link_xdp_info(int ifindex, struct xdp_link_info *info,
-			  size_t info_size, __u32 flags)
+static int __bpf_get_link_xdp_info(int ifindex, struct bpf_link_xdp_opts *opts)
 {
+	struct xdp_link_info *info = opts->link_info;
+	size_t info_size = opts->link_info_sz;
 	struct xdp_id_md xdp_id = {};
+	__u32 flags = opts->flags;
 	int sock, ret;
 	__u32 nl_pid;
 	__u32 mask;
@@ -284,6 +298,18 @@ int bpf_get_link_xdp_info(int ifindex, struct xdp_link_info *info,
 	return ret;
 }
 
+int bpf_get_link_xdp_info(int ifindex, struct xdp_link_info *info,
+			  size_t info_size, __u32 flags)
+{
+	struct bpf_link_xdp_opts opts = {};
+
+	opts.link_info = info;
+	opts.link_info_sz = info_size;
+	opts.flags = flags;
+
+	return bpf_get_link_opts(ifindex, &opts, BPF_LINK_GET_XDP_INFO);
+}
+
 static __u32 get_xdp_id(struct xdp_link_info *info, __u32 flags)
 {
 	if (info->attach_mode != XDP_ATTACHED_MULTI)
@@ -300,12 +326,13 @@ static __u32 get_xdp_id(struct xdp_link_info *info, __u32 flags)
 
 int bpf_get_link_xdp_id(int ifindex, __u32 *prog_id, __u32 flags)
 {
-	struct xdp_link_info info;
+	struct bpf_link_xdp_opts opts = {};
 	int ret;
 
-	ret = bpf_get_link_xdp_info(ifindex, &info, sizeof(info), flags);
+	opts.flags = flags;
+	ret = bpf_get_link_opts(ifindex, &opts, BPF_LINK_GET_XDP_ID);
 	if (!ret)
-		*prog_id = get_xdp_id(&info, flags);
+		*prog_id = opts.prog_id;
 
 	return ret;
 }
@@ -449,3 +476,41 @@ int libbpf_nl_get_filter(int sock, unsigned int nl_pid, int ifindex, int handle,
 	return bpf_netlink_recv(sock, nl_pid, seq, __dump_filter_nlmsg,
 				dump_filter_nlmsg, cookie);
 }
+
+int bpf_set_link_opts(int ifindex, struct bpf_link_xdp_opts *opts,
+		      enum bpf_link_cmd cmd)
+{
+	switch (cmd) {
+	case BPF_LINK_SET_XDP_FD:
+		return __bpf_set_link_xdp_fd(ifindex, opts);
+	case BPF_LINK_GET_XDP_INFO:
+	case BPF_LINK_GET_XDP_ID:
+	default:
+		return -EINVAL;
+	}
+}
+
+int bpf_get_link_opts(int ifindex, struct bpf_link_xdp_opts *opts,
+		      enum bpf_link_cmd cmd)
+{
+	switch (cmd) {
+	case BPF_LINK_GET_XDP_INFO:
+		return __bpf_get_link_xdp_info(ifindex, opts);
+	case BPF_LINK_GET_XDP_ID: {
+		struct bpf_link_xdp_opts tmp_opts = {};
+		struct xdp_link_info link_info = {};
+		int ret;
+
+		tmp_opts.flags = opts->flags;
+		tmp_opts.link_info = &link_info;
+		tmp_opts.link_info_sz = sizeof(link_info);
+		ret = __bpf_get_link_xdp_info(ifindex, &tmp_opts);
+		if (!ret)
+			opts->prog_id = get_xdp_id(&link_info, opts->flags);
+		return ret;
+	}
+	case BPF_LINK_SET_XDP_FD:
+	default:
+		return -EINVAL;
+	}
+}
-- 
2.21.0



* [RFC v2 net-next 04/12] libbpf: set xdp program in tx path
  2019-12-26  2:31 [RFC v2 net-next 00/12] XDP in tx path Prashant Bhole
                   ` (2 preceding siblings ...)
  2019-12-26  2:31 ` [RFC v2 net-next 03/12] libbpf: api for getting/setting link xdp options Prashant Bhole
@ 2019-12-26  2:31 ` Prashant Bhole
  2019-12-26  2:31 ` [RFC v2 net-next 05/12] samples/bpf: xdp1, add XDP tx support Prashant Bhole
                   ` (8 subsequent siblings)
  12 siblings, 0 replies; 22+ messages in thread
From: Prashant Bhole @ 2019-12-26  2:31 UTC (permalink / raw)
  To: David S . Miller, Michael S . Tsirkin, Alexei Starovoitov,
	Daniel Borkmann, Jesper Dangaard Brouer
  Cc: Prashant Bhole, Jason Wang, David Ahern, Jakub Kicinski,
	John Fastabend, Toshiaki Makita, Martin KaFai Lau, Song Liu,
	Yonghong Song, Andrii Nakryiko, netdev

The existing libbpf APIs to set/get XDP attributes of a link were
written for the rx path. This patch extends the new APIs introduced
in the last patch: setting the tx_path field in struct
bpf_link_xdp_opts attaches the program in the tx path, as in the
example below.
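
For example (a fragment; prog_fd must refer to a program loaded with
expected_attach_type BPF_XDP_EGRESS):

	struct bpf_link_xdp_opts opts = {};

	opts.prog_fd = prog_fd;
	opts.flags = xdp_flags;
	opts.tx_path = true;
	err = bpf_set_link_opts(ifindex, &opts, BPF_LINK_SET_XDP_FD);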

Signed-off-by: Prashant Bhole <prashantbhole.linux@gmail.com>
---
 tools/lib/bpf/libbpf.h  |  4 ++++
 tools/lib/bpf/netlink.c | 36 ++++++++++++++++++++++++++++++------
 2 files changed, 34 insertions(+), 6 deletions(-)

diff --git a/tools/lib/bpf/libbpf.h b/tools/lib/bpf/libbpf.h
index 8178fd5a1e8f..c073d0eb3bf5 100644
--- a/tools/lib/bpf/libbpf.h
+++ b/tools/lib/bpf/libbpf.h
@@ -449,6 +449,7 @@ struct bpf_link_xdp_opts {
 	__u32 flags;
 	__u32 prog_id;
 	int prog_fd;
+	bool tx_path;
 };
 
 /*
@@ -459,13 +460,16 @@ struct bpf_link_xdp_opts {
  *	- link_info
  *	- link_info_sz
  *	- flags
+ *	- tx_path
  *
  * BPF_LINK_GET_XDP_ID uses fields:
  *	- flags
+ *	- tx_path
  *
  * BPF_LINK_SET_XDP_FD uses fields:
  *	- prog_fd
  *	- flags
+ *	- tx_path
  */
 enum bpf_link_cmd {
 	BPF_LINK_GET_XDP_INFO,
diff --git a/tools/lib/bpf/netlink.c b/tools/lib/bpf/netlink.c
index 1274b540a9ad..c839495e8f03 100644
--- a/tools/lib/bpf/netlink.c
+++ b/tools/lib/bpf/netlink.c
@@ -133,6 +133,7 @@ static int __bpf_set_link_xdp_fd(int ifindex, struct bpf_link_xdp_opts *opts)
 {
 	int fd = opts->prog_fd;
 	__u32 flags = opts->flags;
+	bool tx = opts->tx_path;
 	int sock, seq = 0, ret;
 	struct nlattr *nla, *nla_xdp;
 	struct {
@@ -158,7 +159,8 @@ static int __bpf_set_link_xdp_fd(int ifindex, struct bpf_link_xdp_opts *opts)
 	/* started nested attribute for XDP */
 	nla = (struct nlattr *)(((char *)&req)
 				+ NLMSG_ALIGN(req.nh.nlmsg_len));
-	nla->nla_type = NLA_F_NESTED | IFLA_XDP;
+	nla->nla_type = NLA_F_NESTED;
+	nla->nla_type |= tx ? IFLA_XDP_TX : IFLA_XDP;
 	nla->nla_len = NLA_HDRLEN;
 
 	/* add XDP fd */
@@ -196,6 +198,7 @@ int bpf_set_link_xdp_fd(int ifindex, int fd, __u32 flags)
 
 	opts.prog_fd = fd;
 	opts.flags = flags;
+	/* opts.tx_path is already 0 */
 
 	return bpf_set_link_opts(ifindex, &opts, BPF_LINK_SET_XDP_FD);
 }
@@ -215,20 +218,20 @@ static int __dump_link_nlmsg(struct nlmsghdr *nlh,
 	return dump_link_nlmsg(cookie, ifi, tb);
 }
 
-static int get_xdp_info(void *cookie, void *msg, struct nlattr **tb)
+static int __get_xdp_info(void *cookie, void *msg, struct nlattr **tb, bool tx)
 {
 	struct nlattr *xdp_tb[IFLA_XDP_MAX + 1];
 	struct xdp_id_md *xdp_id = cookie;
 	struct ifinfomsg *ifinfo = msg;
+	struct nlattr *attr;
 	int ret;
 
 	if (xdp_id->ifindex && xdp_id->ifindex != ifinfo->ifi_index)
 		return 0;
 
-	if (!tb[IFLA_XDP])
-		return 0;
+	attr = tx ? tb[IFLA_XDP_TX] : tb[IFLA_XDP];
 
-	ret = libbpf_nla_parse_nested(xdp_tb, IFLA_XDP_MAX, tb[IFLA_XDP], NULL);
+	ret = libbpf_nla_parse_nested(xdp_tb, IFLA_XDP_MAX, attr, NULL);
 	if (ret)
 		return ret;
 
@@ -260,12 +263,27 @@ static int get_xdp_info(void *cookie, void *msg, struct nlattr **tb)
 	return 0;
 }
 
+static int get_xdp_tx_info(void *cookie, void *msg, struct nlattr **tb)
+{
+	if (!tb[IFLA_XDP_TX])
+		return 0;
+	return __get_xdp_info(cookie, msg, tb, true);
+}
+
+static int get_xdp_info(void *cookie, void *msg, struct nlattr **tb)
+{
+	if (!tb[IFLA_XDP])
+		return 0;
+	return __get_xdp_info(cookie, msg, tb, false);
+}
+
 static int __bpf_get_link_xdp_info(int ifindex, struct bpf_link_xdp_opts *opts)
 {
 	struct xdp_link_info *info = opts->link_info;
 	size_t info_size = opts->link_info_sz;
 	struct xdp_id_md xdp_id = {};
 	__u32 flags = opts->flags;
+	bool tx = opts->tx_path;
 	int sock, ret;
 	__u32 nl_pid;
 	__u32 mask;
@@ -286,7 +304,11 @@ static int __bpf_get_link_xdp_info(int ifindex, struct bpf_link_xdp_opts *opts)
 	xdp_id.ifindex = ifindex;
 	xdp_id.flags = flags;
 
-	ret = libbpf_nl_get_link(sock, nl_pid, get_xdp_info, &xdp_id);
+	if (tx)
+		ret = libbpf_nl_get_link(sock, nl_pid, get_xdp_tx_info,
+					 &xdp_id);
+	else
+		ret = libbpf_nl_get_link(sock, nl_pid, get_xdp_info, &xdp_id);
 	if (!ret) {
 		size_t sz = min(info_size, sizeof(xdp_id.info));
 
@@ -306,6 +328,7 @@ int bpf_get_link_xdp_info(int ifindex, struct xdp_link_info *info,
 	opts.link_info = info;
 	opts.link_info_sz = info_size;
 	opts.flags = flags;
+	/* opts.tx_path is already 0 */
 
 	return bpf_get_link_opts(ifindex, &opts, BPF_LINK_GET_XDP_INFO);
 }
@@ -502,6 +525,7 @@ int bpf_get_link_opts(int ifindex, struct bpf_link_xdp_opts *opts,
 		int ret;
 
 		tmp_opts.flags = opts->flags;
+		tmp_opts.tx_path = opts->tx_path;
 		tmp_opts.link_info = &link_info;
 		tmp_opts.link_info_sz = sizeof(link_info);
 		ret = __bpf_get_link_xdp_info(ifindex, &tmp_opts);
-- 
2.21.0



* [RFC v2 net-next 05/12] samples/bpf: xdp1, add XDP tx support
  2019-12-26  2:31 [RFC v2 net-next 00/12] XDP in tx path Prashant Bhole
                   ` (3 preceding siblings ...)
  2019-12-26  2:31 ` [RFC v2 net-next 04/12] libbpf: set xdp program in tx path Prashant Bhole
@ 2019-12-26  2:31 ` Prashant Bhole
  2019-12-26  2:31 ` [RFC v2 net-next 06/12] net: core: rename netif_receive_generic_xdp() to do_xdp_generic_core() Prashant Bhole
                   ` (7 subsequent siblings)
  12 siblings, 0 replies; 22+ messages in thread
From: Prashant Bhole @ 2019-12-26  2:31 UTC (permalink / raw)
  To: David S . Miller, Michael S . Tsirkin, Alexei Starovoitov,
	Daniel Borkmann, Jesper Dangaard Brouer
  Cc: Prashant Bhole, Jason Wang, David Ahern, Jakub Kicinski,
	John Fastabend, Toshiaki Makita, Martin KaFai Lau, Song Liu,
	Yonghong Song, Andrii Nakryiko, netdev

xdp1 and xdp2 now accept a -T flag to attach the XDP program in the
tx path. The user program uses the new bpf_get_link_opts() and
bpf_set_link_opts() APIs.
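
For example, assuming an interface eth0:

	# ./xdp1 -T eth0

attaches the xdp1 program in the tx path; without -T the behavior is
unchanged.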

Signed-off-by: Prashant Bhole <prashantbhole.linux@gmail.com>
---
 samples/bpf/xdp1_user.c | 42 +++++++++++++++++++++++++++++++----------
 1 file changed, 32 insertions(+), 10 deletions(-)

diff --git a/samples/bpf/xdp1_user.c b/samples/bpf/xdp1_user.c
index 3e553eed95a7..d8e27d4f785c 100644
--- a/samples/bpf/xdp1_user.c
+++ b/samples/bpf/xdp1_user.c
@@ -21,21 +21,31 @@
 static int ifindex;
 static __u32 xdp_flags = XDP_FLAGS_UPDATE_IF_NOEXIST;
 static __u32 prog_id;
+static bool tx_path;
 
 static void int_exit(int sig)
 {
-	__u32 curr_prog_id = 0;
+	struct bpf_link_xdp_opts xdp_opts = {};
+	int err;
+
+	xdp_opts.flags = xdp_flags;
+	xdp_opts.tx_path = tx_path;
 
-	if (bpf_get_link_xdp_id(ifindex, &curr_prog_id, xdp_flags)) {
-		printf("bpf_get_link_xdp_id failed\n");
+	err = bpf_get_link_opts(ifindex, &xdp_opts, BPF_LINK_GET_XDP_ID);
+	if (err) {
+		printf("getting xdp program id failed\n");
 		exit(1);
 	}
-	if (prog_id == curr_prog_id)
-		bpf_set_link_xdp_fd(ifindex, -1, xdp_flags);
-	else if (!curr_prog_id)
+
+	if (prog_id == xdp_opts.prog_id) {
+		xdp_opts.prog_fd = -1;
+		err = bpf_set_link_opts(ifindex, &xdp_opts,
+					BPF_LINK_SET_XDP_FD);
+	} else if (!xdp_opts.prog_id) {
 		printf("couldn't find a prog id on a given interface\n");
-	else
+	} else {
 		printf("program on interface changed, not removing\n");
+	}
 	exit(0);
 }
 
@@ -73,7 +83,8 @@ static void usage(const char *prog)
 		"OPTS:\n"
 		"    -S    use skb-mode\n"
 		"    -N    enforce native mode\n"
-		"    -F    force loading prog\n",
+		"    -F    force loading prog\n"
+		"    -T    TX path prog\n",
 		prog);
 }
 
@@ -83,9 +94,10 @@ int main(int argc, char **argv)
 	struct bpf_prog_load_attr prog_load_attr = {
 		.prog_type	= BPF_PROG_TYPE_XDP,
 	};
+	struct bpf_link_xdp_opts xdp_opts = {0};
 	struct bpf_prog_info info = {};
 	__u32 info_len = sizeof(info);
-	const char *optstr = "FSN";
+	const char *optstr = "FSNT";
 	int prog_fd, map_fd, opt;
 	struct bpf_object *obj;
 	struct bpf_map *map;
@@ -103,6 +115,9 @@ int main(int argc, char **argv)
 		case 'F':
 			xdp_flags &= ~XDP_FLAGS_UPDATE_IF_NOEXIST;
 			break;
+		case 'T':
+			tx_path = true;
+			break;
 		default:
 			usage(basename(argv[0]));
 			return 1;
@@ -127,6 +142,8 @@ int main(int argc, char **argv)
 
 	snprintf(filename, sizeof(filename), "%s_kern.o", argv[0]);
 	prog_load_attr.file = filename;
+	if (tx_path)
+		prog_load_attr.expected_attach_type = BPF_XDP_EGRESS;
 
 	if (bpf_prog_load_xattr(&prog_load_attr, &obj, &prog_fd))
 		return 1;
@@ -146,7 +163,12 @@ int main(int argc, char **argv)
 	signal(SIGINT, int_exit);
 	signal(SIGTERM, int_exit);
 
-	if (bpf_set_link_xdp_fd(ifindex, prog_fd, xdp_flags) < 0) {
+	xdp_opts.prog_fd = prog_fd;
+	xdp_opts.flags = xdp_flags;
+	xdp_opts.tx_path = tx_path;
+
+	err = bpf_set_link_opts(ifindex, &xdp_opts, BPF_LINK_SET_XDP_FD);
+	if (err < 0) {
 		printf("link set xdp fd failed\n");
 		return 1;
 	}
-- 
2.21.0



* [RFC v2 net-next 06/12] net: core: rename netif_receive_generic_xdp() to do_xdp_generic_core()
  2019-12-26  2:31 [RFC v2 net-next 00/12] XDP in tx path Prashant Bhole
                   ` (4 preceding siblings ...)
  2019-12-26  2:31 ` [RFC v2 net-next 05/12] samples/bpf: xdp1, add XDP tx support Prashant Bhole
@ 2019-12-26  2:31 ` Prashant Bhole
  2019-12-26  2:31 ` [RFC v2 net-next 07/12] net: core: export do_xdp_generic_core() Prashant Bhole
                   ` (6 subsequent siblings)
  12 siblings, 0 replies; 22+ messages in thread
From: Prashant Bhole @ 2019-12-26  2:31 UTC (permalink / raw)
  To: David S . Miller, Michael S . Tsirkin, Alexei Starovoitov,
	Daniel Borkmann, Jesper Dangaard Brouer
  Cc: Jason Wang, David Ahern, Jakub Kicinski, John Fastabend,
	Toshiaki Makita, Martin KaFai Lau, Song Liu, Yonghong Song,
	Andrii Nakryiko, netdev, Prashant Bhole

From: Jason Wang <jasowang@redhat.com>

In the generic skb XDP path, we need a way to run an XDP program on
an skb while handling the XDP actions in a customized way.
netif_receive_generic_xdp() is more suitable for such cases than
do_xdp_generic().

This patch prepares netif_receive_generic_xdp() to be used as a
general purpose function by renaming it to do_xdp_generic_core().

Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Prashant Bhole <prashantbhole.linux@gmail.com>
---
 net/core/dev.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/net/core/dev.c b/net/core/dev.c
index ae66fd791737..b05c2d639dcc 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -4461,9 +4461,9 @@ static struct netdev_rx_queue *netif_get_rxqueue(struct sk_buff *skb)
 	return rxqueue;
 }
 
-static u32 netif_receive_generic_xdp(struct sk_buff *skb,
-				     struct xdp_buff *xdp,
-				     struct bpf_prog *xdp_prog)
+static u32 do_xdp_generic_core(struct sk_buff *skb,
+			       struct xdp_buff *xdp,
+			       struct bpf_prog *xdp_prog)
 {
 	struct netdev_rx_queue *rxqueue;
 	void *orig_data, *orig_data_end;
@@ -4610,7 +4610,7 @@ int do_xdp_generic(struct bpf_prog *xdp_prog, struct sk_buff *skb)
 		u32 act;
 		int err;
 
-		act = netif_receive_generic_xdp(skb, &xdp, xdp_prog);
+		act = do_xdp_generic_core(skb, &xdp, xdp_prog);
 		if (act != XDP_PASS) {
 			switch (act) {
 			case XDP_REDIRECT:
-- 
2.21.0



* [RFC v2 net-next 07/12] net: core: export do_xdp_generic_core()
  2019-12-26  2:31 [RFC v2 net-next 00/12] XDP in tx path Prashant Bhole
                   ` (5 preceding siblings ...)
  2019-12-26  2:31 ` [RFC v2 net-next 06/12] net: core: rename netif_receive_generic_xdp() to do_xdp_generic_core() Prashant Bhole
@ 2019-12-26  2:31 ` Prashant Bhole
  2019-12-26  2:31 ` [RFC v2 net-next 08/12] tuntap: check tun_msg_ctl type at necessary places Prashant Bhole
                   ` (5 subsequent siblings)
  12 siblings, 0 replies; 22+ messages in thread
From: Prashant Bhole @ 2019-12-26  2:31 UTC (permalink / raw)
  To: David S . Miller, Michael S . Tsirkin, Alexei Starovoitov,
	Daniel Borkmann, Jesper Dangaard Brouer
  Cc: Jason Wang, David Ahern, Jakub Kicinski, John Fastabend,
	Toshiaki Makita, Martin KaFai Lau, Song Liu, Yonghong Song,
	Andrii Nakryiko, netdev, Prashant Bhole

From: Jason Wang <jasowang@redhat.com>

Let's export do_xdp_generic_core() as a general purpose function. It
just runs an XDP program on an skb and leaves the handling of XDP
actions to the caller, roughly as in the sketch below.
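
A caller is then expected to act on the returned verdict itself,
roughly like this sketch (similar to what the tun tx path does later
in this series; the xdp_buff is filled in by do_xdp_generic_core()):

	struct xdp_buff xdp;
	u32 act;

	act = do_xdp_generic_core(skb, &xdp, xdp_prog);
	switch (act) {
	case XDP_PASS:
		break;			/* continue normal processing */
	case XDP_TX:
	case XDP_REDIRECT:
		/* driver specific handling */
		break;
	default:
		bpf_warn_invalid_xdp_action(act);
		/* fall through */
	case XDP_ABORTED:
	case XDP_DROP:
		kfree_skb(skb);
		break;
	}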

Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Prashant Bhole <prashantbhole.linux@gmail.com>
---
 include/linux/netdevice.h | 2 ++
 net/core/dev.c            | 6 +++---
 2 files changed, 5 insertions(+), 3 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index ac3e88d86581..51b58e47e521 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -3661,6 +3661,8 @@ static inline void dev_consume_skb_any(struct sk_buff *skb)
 
 void generic_xdp_tx(struct sk_buff *skb, struct bpf_prog *xdp_prog);
 int do_xdp_generic(struct bpf_prog *xdp_prog, struct sk_buff *skb);
+u32 do_xdp_generic_core(struct sk_buff *skb, struct xdp_buff *xdp,
+			struct bpf_prog *xdp_prog);
 int netif_rx(struct sk_buff *skb);
 int netif_rx_ni(struct sk_buff *skb);
 int netif_receive_skb(struct sk_buff *skb);
diff --git a/net/core/dev.c b/net/core/dev.c
index b05c2d639dcc..db36dd288015 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -4461,9 +4461,8 @@ static struct netdev_rx_queue *netif_get_rxqueue(struct sk_buff *skb)
 	return rxqueue;
 }
 
-static u32 do_xdp_generic_core(struct sk_buff *skb,
-			       struct xdp_buff *xdp,
-			       struct bpf_prog *xdp_prog)
+u32 do_xdp_generic_core(struct sk_buff *skb, struct xdp_buff *xdp,
+			struct bpf_prog *xdp_prog)
 {
 	struct netdev_rx_queue *rxqueue;
 	void *orig_data, *orig_data_end;
@@ -4574,6 +4573,7 @@ static u32 do_xdp_generic_core(struct sk_buff *skb,
 
 	return act;
 }
+EXPORT_SYMBOL_GPL(do_xdp_generic_core);
 
 /* When doing generic XDP we have to bypass the qdisc layer and the
  * network taps in order to match in-driver-XDP behavior.
-- 
2.21.0



* [RFC v2 net-next 08/12] tuntap: check tun_msg_ctl type at necessary places
  2019-12-26  2:31 [RFC v2 net-next 00/12] XDP in tx path Prashant Bhole
                   ` (6 preceding siblings ...)
  2019-12-26  2:31 ` [RFC v2 net-next 07/12] net: core: export do_xdp_generic_core() Prashant Bhole
@ 2019-12-26  2:31 ` Prashant Bhole
  2019-12-26  2:31 ` [RFC v2 net-next 09/12] vhost_net: use tap recvmsg api to access ptr ring Prashant Bhole
                   ` (4 subsequent siblings)
  12 siblings, 0 replies; 22+ messages in thread
From: Prashant Bhole @ 2019-12-26  2:31 UTC (permalink / raw)
  To: David S . Miller, Michael S . Tsirkin, Alexei Starovoitov,
	Daniel Borkmann, Jesper Dangaard Brouer
  Cc: Prashant Bhole, Jason Wang, David Ahern, Jakub Kicinski,
	John Fastabend, Toshiaki Makita, Martin KaFai Lau, Song Liu,
	Yonghong Song, Andrii Nakryiko, netdev

tun_msg_ctl is used by vhost_net to communicate with tuntap. We will
introduce more types soon. As a preparation, this patch adds
conditions to check the tun_msg_ctl type at the necessary places.

Signed-off-by: Prashant Bhole <prashantbhole.linux@gmail.com>
---
 drivers/net/tap.c | 7 +++++--
 drivers/net/tun.c | 6 +++++-
 2 files changed, 10 insertions(+), 3 deletions(-)

diff --git a/drivers/net/tap.c b/drivers/net/tap.c
index a6d63665ad03..a0a5dc18109a 100644
--- a/drivers/net/tap.c
+++ b/drivers/net/tap.c
@@ -1203,6 +1203,7 @@ static int tap_sendmsg(struct socket *sock, struct msghdr *m,
 	struct tap_queue *q = container_of(sock, struct tap_queue, sock);
 	struct tun_msg_ctl *ctl = m->msg_control;
 	struct xdp_buff *xdp;
+	void *ptr = NULL;
 	int i;
 
 	if (ctl && (ctl->type == TUN_MSG_PTR)) {
@@ -1213,8 +1214,10 @@ static int tap_sendmsg(struct socket *sock, struct msghdr *m,
 		return 0;
 	}
 
-	return tap_get_user(q, ctl ? ctl->ptr : NULL, &m->msg_iter,
-			    m->msg_flags & MSG_DONTWAIT);
+	if (ctl && ctl->type == TUN_MSG_UBUF)
+		ptr = ctl->ptr;
+
+	return tap_get_user(q, ptr, &m->msg_iter, m->msg_flags & MSG_DONTWAIT);
 }
 
 static int tap_recvmsg(struct socket *sock, struct msghdr *m,
diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index 683d371e6e82..1e436d9ec4e1 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -2529,6 +2529,7 @@ static int tun_sendmsg(struct socket *sock, struct msghdr *m, size_t total_len)
 	struct tun_struct *tun = tun_get(tfile);
 	struct tun_msg_ctl *ctl = m->msg_control;
 	struct xdp_buff *xdp;
+	void *ptr = NULL;
 
 	if (!tun)
 		return -EBADFD;
@@ -2560,7 +2561,10 @@ static int tun_sendmsg(struct socket *sock, struct msghdr *m, size_t total_len)
 		goto out;
 	}
 
-	ret = tun_get_user(tun, tfile, ctl ? ctl->ptr : NULL, &m->msg_iter,
+	if (ctl && ctl->type == TUN_MSG_UBUF)
+		ptr = ctl->ptr;
+
+	ret = tun_get_user(tun, tfile, ptr, &m->msg_iter,
 			   m->msg_flags & MSG_DONTWAIT,
 			   m->msg_flags & MSG_MORE);
 out:
-- 
2.21.0



* [RFC v2 net-next 09/12] vhost_net: use tap recvmsg api to access ptr ring
  2019-12-26  2:31 [RFC v2 net-next 00/12] XDP in tx path Prashant Bhole
                   ` (7 preceding siblings ...)
  2019-12-26  2:31 ` [RFC v2 net-next 08/12] tuntap: check tun_msg_ctl type at necessary places Prashant Bhole
@ 2019-12-26  2:31 ` Prashant Bhole
  2019-12-26  9:05   ` kbuild test robot
  2019-12-26  2:31 ` [RFC v2 net-next 10/12] tuntap: remove usage of ptr ring in vhost_net Prashant Bhole
                   ` (3 subsequent siblings)
  12 siblings, 1 reply; 22+ messages in thread
From: Prashant Bhole @ 2019-12-26  2:31 UTC (permalink / raw)
  To: David S . Miller, Michael S . Tsirkin, Alexei Starovoitov,
	Daniel Borkmann, Jesper Dangaard Brouer
  Cc: Prashant Bhole, Jason Wang, David Ahern, Jakub Kicinski,
	John Fastabend, Toshiaki Makita, Martin KaFai Lau, Song Liu,
	Yonghong Song, Andrii Nakryiko, netdev

Currently vhost_net directly accesses the ptr ring of the tap driver
to fetch Rx packet pointers. To avoid this, this patch modifies the
tap driver's recvmsg API to do the additional task of fetching Rx
packet pointers.

A special struct tun_msg_ctl is already passed via msg_control for
tun Rx XDP batching. This patch extends tun_msg_ctl usage to send
sub-commands to the recvmsg API. tun_recvmsg now handles commands to
consume and unconsume packet pointers from the ptr ring.

This will be useful for implementing tx path XDP in the tun driver,
where the XDP program will process the packet before it is passed to
vhost_net.
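
The resulting contract between vhost_net and tuntap looks roughly
like this (a sketch mirroring the code below; ptr_array and
batch_size are illustrative names):

	struct tun_msg_ctl ctl = {
		.type = TUN_MSG_CONSUME_PKTS,
		.ptr  = (void *)ptr_array,
		.num  = batch_size,
	};
	struct msghdr msg = {
		.msg_control = &ctl,
	};

	/* fetches up to batch_size pointers from the ring into
	 * ptr_array; returns how many were consumed
	 */
	n = sock->ops->recvmsg(sock, &msg, 0, 0);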

Signed-off-by: Prashant Bhole <prashantbhole.linux@gmail.com>
---
 drivers/net/tap.c      | 22 ++++++++++++++++++-
 drivers/net/tun.c      | 24 ++++++++++++++++++++-
 drivers/vhost/net.c    | 48 +++++++++++++++++++++++++++++++-----------
 include/linux/if_tun.h | 18 ++++++++++++++++
 4 files changed, 98 insertions(+), 14 deletions(-)

diff --git a/drivers/net/tap.c b/drivers/net/tap.c
index a0a5dc18109a..a5ce44db11a3 100644
--- a/drivers/net/tap.c
+++ b/drivers/net/tap.c
@@ -1224,8 +1224,28 @@ static int tap_recvmsg(struct socket *sock, struct msghdr *m,
 		       size_t total_len, int flags)
 {
 	struct tap_queue *q = container_of(sock, struct tap_queue, sock);
-	struct sk_buff *skb = m->msg_control;
+	struct tun_msg_ctl *ctl = m->msg_control;
+	struct sk_buff *skb = NULL;
 	int ret;
+
+	if (ctl) {
+		switch (ctl->type) {
+		case TUN_MSG_PKT:
+			skb = ctl->ptr;
+			break;
+		case TUN_MSG_CONSUME_PKTS:
+			return ptr_ring_consume_batched(&q->ring,
+							ctl->ptr,
+							ctl->num);
+		case TUN_MSG_UNCONSUME_PKTS:
+			ptr_ring_unconsume(&q->ring, ctl->ptr, ctl->num,
+					   tun_ptr_free);
+			return 0;
+		default:
+			return -EINVAL;
+		}
+	}
+
 	if (flags & ~(MSG_DONTWAIT|MSG_TRUNC)) {
 		kfree_skb(skb);
 		return -EINVAL;
diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index 1e436d9ec4e1..4f28f2387435 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -2577,7 +2577,8 @@ static int tun_recvmsg(struct socket *sock, struct msghdr *m, size_t total_len,
 {
 	struct tun_file *tfile = container_of(sock, struct tun_file, socket);
 	struct tun_struct *tun = tun_get(tfile);
-	void *ptr = m->msg_control;
+	struct tun_msg_ctl *ctl = m->msg_control;
+	void *ptr = NULL;
 	int ret;
 
 	if (!tun) {
@@ -2585,6 +2586,27 @@ static int tun_recvmsg(struct socket *sock, struct msghdr *m, size_t total_len,
 		goto out_free;
 	}
 
+	if (ctl) {
+		switch (ctl->type) {
+		case TUN_MSG_PKT:
+			ptr = ctl->ptr;
+			break;
+		case TUN_MSG_CONSUME_PKTS:
+			ret = ptr_ring_consume_batched(&tfile->tx_ring,
+						       ctl->ptr,
+						       ctl->num);
+			goto out;
+		case TUN_MSG_UNCONSUME_PKTS:
+			ptr_ring_unconsume(&tfile->tx_ring, ctl->ptr,
+					   ctl->num, tun_ptr_free);
+			ret = 0;
+			goto out;
+		default:
+			ret = -EINVAL;
+			goto out_put_tun;
+		}
+	}
+
 	if (flags & ~(MSG_DONTWAIT|MSG_TRUNC|MSG_ERRQUEUE)) {
 		ret = -EINVAL;
 		goto out_put_tun;
diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index e158159671fa..482548d00105 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -175,24 +175,44 @@ static void *vhost_net_buf_consume(struct vhost_net_buf *rxq)
 
 static int vhost_net_buf_produce(struct vhost_net_virtqueue *nvq)
 {
+	struct vhost_virtqueue *vq = &nvq->vq;
+	struct socket *sock = vq->private_data;
 	struct vhost_net_buf *rxq = &nvq->rxq;
+	struct tun_msg_ctl ctl = {
+		.type = TUN_MSG_CONSUME_PKTS,
+		.ptr = (void *) rxq->queue,
+		.num = VHOST_NET_BATCH,
+	};
+	struct msghdr msg = {
+		.msg_control = &ctl,
+	};
 
 	rxq->head = 0;
-	rxq->tail = ptr_ring_consume_batched(nvq->rx_ring, rxq->queue,
-					      VHOST_NET_BATCH);
+	rxq->tail = sock->ops->recvmsg(sock, &msg, 0, 0);
+	if (WARN_ON_ONCE(rxq->tail < 0))
+		rxq->tail = 0;
+
 	return rxq->tail;
 }
 
 static void vhost_net_buf_unproduce(struct vhost_net_virtqueue *nvq)
 {
+	struct vhost_virtqueue *vq = &nvq->vq;
+	struct socket *sock = vq->private_data;
 	struct vhost_net_buf *rxq = &nvq->rxq;
+	struct tun_msg_ctl ctl = {
+		.type = TUN_MSG_UNCONSUME_PKTS,
+		.ptr = (void *) (rxq->queue + rxq->head),
+		.num = vhost_net_buf_get_size(rxq),
+	};
+	struct msghdr msg = {
+		.msg_control = &ctl,
+	};
 
-	if (nvq->rx_ring && !vhost_net_buf_is_empty(rxq)) {
-		ptr_ring_unconsume(nvq->rx_ring, rxq->queue + rxq->head,
-				   vhost_net_buf_get_size(rxq),
-				   tun_ptr_free);
-		rxq->head = rxq->tail = 0;
-	}
+	if (!vhost_net_buf_is_empty(rxq))
+		sock->ops->recvmsg(sock, &msg, 0, 0);
+
+	rxq->head = rxq->tail = 0;
 }
 
 static int vhost_net_buf_peek_len(void *ptr)
@@ -1109,6 +1129,7 @@ static void handle_rx(struct vhost_net *net)
 		.flags = 0,
 		.gso_type = VIRTIO_NET_HDR_GSO_NONE
 	};
+	struct tun_msg_ctl ctl;
 	size_t total_len = 0;
 	int err, mergeable;
 	s16 headcount;
@@ -1166,8 +1187,11 @@ static void handle_rx(struct vhost_net *net)
 			goto out;
 		}
 		busyloop_intr = false;
-		if (nvq->rx_ring)
-			msg.msg_control = vhost_net_buf_consume(&nvq->rxq);
+		if (nvq->rx_ring) {
+			ctl.type = TUN_MSG_PKT;
+			ctl.ptr = vhost_net_buf_consume(&nvq->rxq);
+			msg.msg_control = &ctl;
+		}
 		/* On overrun, truncate and discard */
 		if (unlikely(headcount > UIO_MAXIOV)) {
 			iov_iter_init(&msg.msg_iter, READ, vq->iov, 1, 1);
@@ -1346,8 +1370,8 @@ static struct socket *vhost_net_stop_vq(struct vhost_net *n,
 	mutex_lock(&vq->mutex);
 	sock = vq->private_data;
 	vhost_net_disable_vq(n, vq);
-	vq->private_data = NULL;
 	vhost_net_buf_unproduce(nvq);
+	vq->private_data = NULL;
 	nvq->rx_ring = NULL;
 	mutex_unlock(&vq->mutex);
 	return sock;
@@ -1538,8 +1562,8 @@ static long vhost_net_set_backend(struct vhost_net *n, unsigned index, int fd)
 		}
 
 		vhost_net_disable_vq(n, vq);
-		vq->private_data = sock;
 		vhost_net_buf_unproduce(nvq);
+		vq->private_data = sock;
 		r = vhost_vq_init_access(vq);
 		if (r)
 			goto err_used;
diff --git a/include/linux/if_tun.h b/include/linux/if_tun.h
index 5bda8cf457b6..bb94843e3829 100644
--- a/include/linux/if_tun.h
+++ b/include/linux/if_tun.h
@@ -11,8 +11,26 @@
 
 #define TUN_XDP_FLAG 0x1UL
 
+/*
+ * tun_msg_ctl types
+ */
+
 #define TUN_MSG_UBUF 1
 #define TUN_MSG_PTR  2
+/*
+ * Used for passing a packet pointer from vhost to tun
+ */
+#define TUN_MSG_PKT  3
+/*
+ * Used for passing an array of pointers from vhost to tun.
+ * tun consumes packets from the ptr ring and stores them in the array.
+ */
+#define TUN_MSG_CONSUME_PKTS    4
+/*
+ * Used for passing an array of pointers from vhost to tun.
+ * tun takes pointers from the array and puts them back into the ptr ring.
+ */
+#define TUN_MSG_UNCONSUME_PKTS  5
 struct tun_msg_ctl {
 	unsigned short type;
 	unsigned short num;
-- 
2.21.0



* [RFC v2 net-next 10/12] tuntap: remove usage of ptr ring in vhost_net
  2019-12-26  2:31 [RFC v2 net-next 00/12] XDP in tx path Prashant Bhole
                   ` (8 preceding siblings ...)
  2019-12-26  2:31 ` [RFC v2 net-next 09/12] vhost_net: use tap recvmsg api to access ptr ring Prashant Bhole
@ 2019-12-26  2:31 ` Prashant Bhole
  2019-12-26  2:31 ` [RFC v2 net-next 11/12] tun: set tx path XDP program Prashant Bhole
                   ` (2 subsequent siblings)
  12 siblings, 0 replies; 22+ messages in thread
From: Prashant Bhole @ 2019-12-26  2:31 UTC (permalink / raw)
  To: David S . Miller, Michael S . Tsirkin, Alexei Starovoitov,
	Daniel Borkmann, Jesper Dangaard Brouer
  Cc: Prashant Bhole, Jason Wang, David Ahern, Jakub Kicinski,
	John Fastabend, Toshiaki Makita, Martin KaFai Lau, Song Liu,
	Yonghong Song, Andrii Nakryiko, netdev

Remove the usage of the tuntap ptr ring from vhost_net and remove the
functions exported by the tuntap drivers to get the ptr ring.

Signed-off-by: Prashant Bhole <prashantbhole.linux@gmail.com>
---
 drivers/net/tap.c      | 13 -------------
 drivers/net/tun.c      | 13 -------------
 drivers/vhost/net.c    | 31 ++++---------------------------
 include/linux/if_tap.h |  5 -----
 include/linux/if_tun.h |  5 -----
 5 files changed, 4 insertions(+), 63 deletions(-)

diff --git a/drivers/net/tap.c b/drivers/net/tap.c
index a5ce44db11a3..fe816a99275d 100644
--- a/drivers/net/tap.c
+++ b/drivers/net/tap.c
@@ -1288,19 +1288,6 @@ struct socket *tap_get_socket(struct file *file)
 }
 EXPORT_SYMBOL_GPL(tap_get_socket);
 
-struct ptr_ring *tap_get_ptr_ring(struct file *file)
-{
-	struct tap_queue *q;
-
-	if (file->f_op != &tap_fops)
-		return ERR_PTR(-EINVAL);
-	q = file->private_data;
-	if (!q)
-		return ERR_PTR(-EBADFD);
-	return &q->ring;
-}
-EXPORT_SYMBOL_GPL(tap_get_ptr_ring);
-
 int tap_queue_resize(struct tap_dev *tap)
 {
 	struct net_device *dev = tap->dev;
diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index 4f28f2387435..d078b4659897 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -3750,19 +3750,6 @@ struct socket *tun_get_socket(struct file *file)
 }
 EXPORT_SYMBOL_GPL(tun_get_socket);
 
-struct ptr_ring *tun_get_tx_ring(struct file *file)
-{
-	struct tun_file *tfile;
-
-	if (file->f_op != &tun_fops)
-		return ERR_PTR(-EINVAL);
-	tfile = file->private_data;
-	if (!tfile)
-		return ERR_PTR(-EBADFD);
-	return &tfile->tx_ring;
-}
-EXPORT_SYMBOL_GPL(tun_get_tx_ring);
-
 module_init(tun_init);
 module_exit(tun_cleanup);
 MODULE_DESCRIPTION(DRV_DESCRIPTION);
diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index 482548d00105..30b5c68193c9 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -122,7 +122,6 @@ struct vhost_net_virtqueue {
 	/* Reference counting for outstanding ubufs.
 	 * Protected by vq mutex. Writers must also take device mutex. */
 	struct vhost_net_ubuf_ref *ubufs;
-	struct ptr_ring *rx_ring;
 	struct vhost_net_buf rxq;
 	/* Batched XDP buffs */
 	struct xdp_buff *xdp;
@@ -997,8 +996,9 @@ static int peek_head_len(struct vhost_net_virtqueue *rvq, struct sock *sk)
 	int len = 0;
 	unsigned long flags;
 
-	if (rvq->rx_ring)
-		return vhost_net_buf_peek(rvq);
+	len = vhost_net_buf_peek(rvq);
+	if (len)
+		return len;
 
 	spin_lock_irqsave(&sk->sk_receive_queue.lock, flags);
 	head = skb_peek(&sk->sk_receive_queue);
@@ -1187,7 +1187,7 @@ static void handle_rx(struct vhost_net *net)
 			goto out;
 		}
 		busyloop_intr = false;
-		if (nvq->rx_ring) {
+		if (!vhost_net_buf_is_empty(&nvq->rxq)) {
 			ctl.type = TUN_MSG_PKT;
 			ctl.ptr = vhost_net_buf_consume(&nvq->rxq);
 			msg.msg_control = &ctl;
@@ -1343,7 +1343,6 @@ static int vhost_net_open(struct inode *inode, struct file *f)
 		n->vqs[i].batched_xdp = 0;
 		n->vqs[i].vhost_hlen = 0;
 		n->vqs[i].sock_hlen = 0;
-		n->vqs[i].rx_ring = NULL;
 		vhost_net_buf_init(&n->vqs[i].rxq);
 	}
 	vhost_dev_init(dev, vqs, VHOST_NET_VQ_MAX,
@@ -1372,7 +1371,6 @@ static struct socket *vhost_net_stop_vq(struct vhost_net *n,
 	vhost_net_disable_vq(n, vq);
 	vhost_net_buf_unproduce(nvq);
 	vq->private_data = NULL;
-	nvq->rx_ring = NULL;
 	mutex_unlock(&vq->mutex);
 	return sock;
 }
@@ -1468,25 +1466,6 @@ static struct socket *get_raw_socket(int fd)
 	return ERR_PTR(r);
 }
 
-static struct ptr_ring *get_tap_ptr_ring(int fd)
-{
-	struct ptr_ring *ring;
-	struct file *file = fget(fd);
-
-	if (!file)
-		return NULL;
-	ring = tun_get_tx_ring(file);
-	if (!IS_ERR(ring))
-		goto out;
-	ring = tap_get_ptr_ring(file);
-	if (!IS_ERR(ring))
-		goto out;
-	ring = NULL;
-out:
-	fput(file);
-	return ring;
-}
-
 static struct socket *get_tap_socket(int fd)
 {
 	struct file *file = fget(fd);
@@ -1570,8 +1549,6 @@ static long vhost_net_set_backend(struct vhost_net *n, unsigned index, int fd)
 		r = vhost_net_enable_vq(n, vq);
 		if (r)
 			goto err_used;
-		if (index == VHOST_NET_VQ_RX)
-			nvq->rx_ring = get_tap_ptr_ring(fd);
 
 		oldubufs = nvq->ubufs;
 		nvq->ubufs = ubufs;
diff --git a/include/linux/if_tap.h b/include/linux/if_tap.h
index 915a187cfabd..68fe366fb185 100644
--- a/include/linux/if_tap.h
+++ b/include/linux/if_tap.h
@@ -4,7 +4,6 @@
 
 #if IS_ENABLED(CONFIG_TAP)
 struct socket *tap_get_socket(struct file *);
-struct ptr_ring *tap_get_ptr_ring(struct file *file);
 #else
 #include <linux/err.h>
 #include <linux/errno.h>
@@ -14,10 +13,6 @@ static inline struct socket *tap_get_socket(struct file *f)
 {
 	return ERR_PTR(-EINVAL);
 }
-static inline struct ptr_ring *tap_get_ptr_ring(struct file *f)
-{
-	return ERR_PTR(-EINVAL);
-}
 #endif /* CONFIG_TAP */
 
 #include <net/sock.h>
diff --git a/include/linux/if_tun.h b/include/linux/if_tun.h
index bb94843e3829..f01a255e076d 100644
--- a/include/linux/if_tun.h
+++ b/include/linux/if_tun.h
@@ -44,7 +44,6 @@ struct tun_xdp_hdr {
 
 #if defined(CONFIG_TUN) || defined(CONFIG_TUN_MODULE)
 struct socket *tun_get_socket(struct file *);
-struct ptr_ring *tun_get_tx_ring(struct file *file);
 bool tun_is_xdp_frame(void *ptr);
 void *tun_xdp_to_ptr(void *ptr);
 void *tun_ptr_to_xdp(void *ptr);
@@ -58,10 +57,6 @@ static inline struct socket *tun_get_socket(struct file *f)
 {
 	return ERR_PTR(-EINVAL);
 }
-static inline struct ptr_ring *tun_get_tx_ring(struct file *f)
-{
-	return ERR_PTR(-EINVAL);
-}
 static inline bool tun_is_xdp_frame(void *ptr)
 {
 	return false;
-- 
2.21.0



* [RFC v2 net-next 11/12] tun: set tx path XDP program
  2019-12-26  2:31 [RFC v2 net-next 00/12] XDP in tx path Prashant Bhole
                   ` (9 preceding siblings ...)
  2019-12-26  2:31 ` [RFC v2 net-next 10/12] tuntap: remove usage of ptr ring in vhost_net Prashant Bhole
@ 2019-12-26  2:31 ` Prashant Bhole
  2019-12-26  2:32 ` [RFC v2 net-next 12/12] tun: run XDP program in tx path Prashant Bhole
  2019-12-26 19:23 ` [RFC v2 net-next 00/12] XDP " Tom Herbert
  12 siblings, 0 replies; 22+ messages in thread
From: Prashant Bhole @ 2019-12-26  2:31 UTC (permalink / raw)
  To: David S . Miller, Michael S . Tsirkin, Alexei Starovoitov,
	Daniel Borkmann, Jesper Dangaard Brouer
  Cc: David Ahern, Jason Wang, David Ahern, Jakub Kicinski,
	John Fastabend, Toshiaki Makita, Martin KaFai Lau, Song Liu,
	Yonghong Song, Andrii Nakryiko, netdev, Prashant Bhole

From: David Ahern <dahern@digitalocean.com>

This patch adds a way to set a tx path XDP program in the tun driver
by handling the XDP_SETUP_PROG_TX and XDP_QUERY_PROG_TX commands in
the ndo_bpf handler.
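
For reference, the cover letter names veth as another candidate driver;
any driver adding the same support would follow this pattern in its
ndo_bpf handler. Below is a minimal sketch with hypothetical foo_*
helpers, modeled on the tun changes in this patch, not code from the
series itself:

static int foo_bpf(struct net_device *dev, struct netdev_bpf *bpf)
{
	switch (bpf->command) {
	case XDP_SETUP_PROG:
		return foo_xdp_set(dev, bpf->prog, false, bpf->extack);
	case XDP_SETUP_PROG_TX:		/* new: attach to the tx path */
		return foo_xdp_set(dev, bpf->prog, true, bpf->extack);
	case XDP_QUERY_PROG:
		bpf->prog_id = foo_xdp_query(dev, false);
		return 0;
	case XDP_QUERY_PROG_TX:		/* new: report the tx path program */
		bpf->prog_id = foo_xdp_query(dev, true);
		return 0;
	default:
		return -EINVAL;
	}
}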

Signed-off-by: David Ahern <dahern@digitalocean.com>
Signed-off-by: Prashant Bhole <prashantbhole.linux@gmail.com>
---
 drivers/net/tun.c | 34 ++++++++++++++++++++++++++--------
 1 file changed, 26 insertions(+), 8 deletions(-)

diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index d078b4659897..8aee7abd53a2 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -239,6 +239,7 @@ struct tun_struct {
 	u32 rx_batched;
 	struct tun_pcpu_stats __percpu *pcpu_stats;
 	struct bpf_prog __rcu *xdp_prog;
+	struct bpf_prog __rcu *xdp_tx_prog;
 	struct tun_prog __rcu *steering_prog;
 	struct tun_prog __rcu *filter_prog;
 	struct ethtool_link_ksettings link_ksettings;
@@ -1189,15 +1190,21 @@ tun_net_get_stats64(struct net_device *dev, struct rtnl_link_stats64 *stats)
 }
 
 static int tun_xdp_set(struct net_device *dev, struct bpf_prog *prog,
-		       struct netlink_ext_ack *extack)
+		       bool tx, struct netlink_ext_ack *extack)
 {
 	struct tun_struct *tun = netdev_priv(dev);
 	struct tun_file *tfile;
 	struct bpf_prog *old_prog;
 	int i;
 
-	old_prog = rtnl_dereference(tun->xdp_prog);
-	rcu_assign_pointer(tun->xdp_prog, prog);
+	if (tx) {
+		old_prog = rtnl_dereference(tun->xdp_tx_prog);
+		rcu_assign_pointer(tun->xdp_tx_prog, prog);
+	} else {
+		old_prog = rtnl_dereference(tun->xdp_prog);
+		rcu_assign_pointer(tun->xdp_prog, prog);
+	}
+
 	if (old_prog)
 		bpf_prog_put(old_prog);
 
@@ -1218,12 +1225,16 @@ static int tun_xdp_set(struct net_device *dev, struct bpf_prog *prog,
 	return 0;
 }
 
-static u32 tun_xdp_query(struct net_device *dev)
+static u32 tun_xdp_query(struct net_device *dev, bool tx)
 {
 	struct tun_struct *tun = netdev_priv(dev);
 	const struct bpf_prog *xdp_prog;
 
-	xdp_prog = rtnl_dereference(tun->xdp_prog);
+	if (tx)
+		xdp_prog = rtnl_dereference(tun->xdp_tx_prog);
+	else
+		xdp_prog = rtnl_dereference(tun->xdp_prog);
+
 	if (xdp_prog)
 		return xdp_prog->aux->id;
 
@@ -1234,13 +1245,20 @@ static int tun_xdp(struct net_device *dev, struct netdev_bpf *xdp)
 {
 	switch (xdp->command) {
 	case XDP_SETUP_PROG:
-		return tun_xdp_set(dev, xdp->prog, xdp->extack);
+		return tun_xdp_set(dev, xdp->prog, false, xdp->extack);
+	case XDP_SETUP_PROG_TX:
+		return tun_xdp_set(dev, xdp->prog, true, xdp->extack);
 	case XDP_QUERY_PROG:
-		xdp->prog_id = tun_xdp_query(dev);
-		return 0;
+		xdp->prog_id = tun_xdp_query(dev, false);
+		break;
+	case XDP_QUERY_PROG_TX:
+		xdp->prog_id = tun_xdp_query(dev, true);
+		break;
 	default:
 		return -EINVAL;
 	}
+
+	return 0;
 }
 
 static int tun_net_change_carrier(struct net_device *dev, bool new_carrier)
-- 
2.21.0



* [RFC v2 net-next 12/12] tun: run XDP program in tx path
  2019-12-26  2:31 [RFC v2 net-next 00/12] XDP in tx path Prashant Bhole
                   ` (10 preceding siblings ...)
  2019-12-26  2:31 ` [RFC v2 net-next 11/12] tun: set tx path XDP program Prashant Bhole
@ 2019-12-26  2:32 ` Prashant Bhole
  2019-12-26 19:23 ` [RFC v2 net-next 00/12] XDP " Tom Herbert
  12 siblings, 0 replies; 22+ messages in thread
From: Prashant Bhole @ 2019-12-26  2:32 UTC (permalink / raw)
  To: David S . Miller, Michael S . Tsirkin, Alexei Starovoitov,
	Daniel Borkmann, Jesper Dangaard Brouer
  Cc: Prashant Bhole, Jason Wang, David Ahern, Jakub Kicinski,
	John Fastabend, Toshiaki Makita, Martin KaFai Lau, Song Liu,
	Yonghong Song, Andrii Nakryiko, netdev

Run the XDP program as soon as a packet is removed from the ptr
ring. Since this is XDP in the tx path, the traditional handling of
the XDP_TX/XDP_REDIRECT actions isn't valid. For this reason we call
do_xdp_generic_core() instead of do_xdp_generic(); do_xdp_generic_core()
just runs the program and leaves the action handling to us.
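
To illustrate what runs at this hook: the tx path program is a regular
XDP program, loaded with the BPF_XDP_EGRESS attach type from patch 01,
so it may not touch the rxq fields of struct xdp_md. A minimal sketch
of a drop-everything filter in the spirit of the xdp1 sample used for
the cover letter benchmarks (section name and includes assume a recent
libbpf; illustrative only):

#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

SEC("xdp")
int xdp_egress_drop(struct xdp_md *ctx)
{
	/* rx_queue_index/ingress_ifindex must not be read here when
	 * the program is attached with BPF_XDP_EGRESS
	 */
	return XDP_DROP;
}

char _license[] SEC("license") = "GPL";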

Signed-off-by: Prashant Bhole <prashantbhole.linux@gmail.com>
---
 drivers/net/tun.c | 147 +++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 144 insertions(+), 3 deletions(-)

diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index 8aee7abd53a2..1afded9252f5 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -131,6 +131,7 @@ struct tap_filter {
 /* MAX_TAP_QUEUES 256 is chosen to allow rx/tx queues to be equal
  * to max number of VCPUs in guest. */
 #define MAX_TAP_QUEUES 256
+#define MAX_TAP_BATCH 64
 #define MAX_TAP_FLOWS  4096
 
 #define TUN_FLOW_EXPIRE (3 * HZ)
@@ -2173,6 +2174,107 @@ static ssize_t tun_put_user(struct tun_struct *tun,
 	return total;
 }
 
+static struct sk_buff *tun_prepare_xdp_skb(struct sk_buff *skb)
+{
+	struct sk_buff *nskb;
+
+	if (skb_shared(skb) || skb_cloned(skb)) {
+		nskb = skb_copy(skb, GFP_ATOMIC);
+		consume_skb(skb);
+		return nskb;
+	}
+
+	return skb;
+}
+
+static u32 tun_do_xdp_tx_generic(struct tun_struct *tun,
+				 struct sk_buff *skb)
+{
+	struct bpf_prog *xdp_prog;
+	struct xdp_buff xdp;
+	u32 act = XDP_PASS;
+
+	xdp_prog = rcu_dereference(tun->xdp_tx_prog);
+	if (xdp_prog) {
+		skb = tun_prepare_xdp_skb(skb);
+		if (!skb) {
+			/* original skb was freed in tun_prepare_xdp_skb() */
+			act = XDP_DROP;
+			goto drop;
+		}
+
+		act = do_xdp_generic_core(skb, &xdp, xdp_prog);
+		switch (act) {
+		case XDP_TX:
+			/* Rx path generic XDP will be called in this path
+			 */
+			local_bh_disable();
+			netif_receive_skb(skb);
+			local_bh_enable();
+			break;
+		case XDP_PASS:
+			break;
+		case XDP_REDIRECT:
+			/* Since we are not handling this case yet, let's free
+			 * skb here. In case of XDP_DROP/XDP_ABORTED, the skb
+			 * was already freed in do_xdp_generic_core()
+			 */
+			kfree_skb(skb);
+			/* fall through */
+		default:
+			bpf_warn_invalid_xdp_action(act);
+			/* fall through */
+		case XDP_ABORTED:
+			trace_xdp_exception(tun->dev, xdp_prog, act);
+			/* fall through */
+		case XDP_DROP:
+			goto drop;
+		}
+	}
+
+	return act;
+drop:
+	this_cpu_inc(tun->pcpu_stats->tx_dropped);
+	return act;
+}
+
+static u32 tun_do_xdp_tx(struct tun_struct *tun, struct tun_file *tfile,
+			 struct xdp_frame *frame)
+{
+	struct bpf_prog *xdp_prog;
+	struct xdp_buff xdp;
+	u32 act = XDP_PASS;
+
+	xdp_prog = rcu_dereference(tun->xdp_tx_prog);
+	if (xdp_prog) {
+		xdp.data_hard_start = frame->data - frame->headroom;
+		xdp.data = frame->data;
+		xdp.data_end = xdp.data + frame->len;
+		xdp.data_meta = xdp.data - frame->metasize;
+
+		act = bpf_prog_run_xdp(xdp_prog, &xdp);
+		switch (act) {
+		case XDP_PASS:
+			break;
+		case XDP_TX:
+			/* fall through */
+		case XDP_REDIRECT:
+			/* fall through */
+		default:
+			bpf_warn_invalid_xdp_action(act);
+			/* fall through */
+		case XDP_ABORTED:
+			trace_xdp_exception(tun->dev, xdp_prog, act);
+			/* fall through */
+		case XDP_DROP:
+			xdp_return_frame_rx_napi(frame);
+			break;
+		}
+	}
+
+	return act;
+}
+
 static void *tun_ring_recv(struct tun_file *tfile, int noblock, int *err)
 {
 	DECLARE_WAITQUEUE(wait, current);
@@ -2590,6 +2694,47 @@ static int tun_sendmsg(struct socket *sock, struct msghdr *m, size_t total_len)
 	return ret;
 }
 
+static int tun_consume_packets(struct tun_file *tfile, void **ptr_array, int n)
+{
+	struct xdp_frame *frame;
+	struct tun_struct *tun;
+	int i, num_ptrs;
+	int pkt_cnt = 0;
+	void *pkts[MAX_TAP_BATCH];
+	void *ptr;
+	u32 act;
+
+	if (unlikely(!tfile))
+		return 0;
+
+	if (n > MAX_TAP_BATCH)
+		n = MAX_TAP_BATCH;
+
+	rcu_read_lock();
+	tun = rcu_dereference(tfile->tun);
+	if (unlikely(!tun)) {
+		rcu_read_unlock();
+		return 0;
+	}
+
+	num_ptrs = ptr_ring_consume_batched(&tfile->tx_ring, pkts, n);
+	for (i = 0; i < num_ptrs; i++) {
+		ptr = pkts[i];
+		if (tun_is_xdp_frame(ptr)) {
+			frame = tun_ptr_to_xdp(ptr);
+			act = tun_do_xdp_tx(tun, tfile, frame);
+		} else {
+			act = tun_do_xdp_tx_generic(tun, ptr);
+		}
+
+		if (act == XDP_PASS)
+			ptr_array[pkt_cnt++] = ptr;
+	}
+
+	rcu_read_unlock();
+	return pkt_cnt;
+}
+
 static int tun_recvmsg(struct socket *sock, struct msghdr *m, size_t total_len,
 		       int flags)
 {
@@ -2610,9 +2755,7 @@ static int tun_recvmsg(struct socket *sock, struct msghdr *m, size_t total_len,
 			ptr = ctl->ptr;
 			break;
 		case TUN_MSG_CONSUME_PKTS:
-			ret = ptr_ring_consume_batched(&tfile->tx_ring,
-						       ctl->ptr,
-						       ctl->num);
+			ret = tun_consume_packets(tfile, ctl->ptr, ctl->num);
 			goto out;
 		case TUN_MSG_UNCONSUME_PKTS:
 			ptr_ring_unconsume(&tfile->tx_ring, ctl->ptr,
-- 
2.21.0
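
For context on the TUN_MSG_CONSUME_PKTS hunk above: after patches 08-10,
vhost_net no longer touches the ptr ring directly and instead drives
this path through the tap recvmsg API. A rough sketch of that calling
convention, inferred from this hunk (the actual vhost_net caller is in
patch 09 and is not shown in this excerpt):

	void *pkts[64];			/* batch-sized scratch array */
	struct tun_msg_ctl ctl = {
		.type = TUN_MSG_CONSUME_PKTS,
		.ptr  = pkts,
		.num  = 64,
	};
	struct msghdr msg = { .msg_control = &ctl };
	int n;

	/* tun_recvmsg() routes this to tun_consume_packets(); on return,
	 * pkts[0..n-1] hold only the entries that got XDP_PASS, each one
	 * either an skb or an encoded xdp_frame (tun_is_xdp_frame()).
	 * Entries not handed to the guest can be pushed back with a
	 * TUN_MSG_UNCONSUME_PKTS call.
	 */
	n = sock->ops->recvmsg(sock, &msg, 0, MSG_DONTWAIT);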



* Re: [RFC v2 net-next 09/12] vhost_net: use tap recvmsg api to access ptr ring
  2019-12-26  2:31 ` [RFC v2 net-next 09/12] vhost_net: use tap recvmsg api to access ptr ring Prashant Bhole
@ 2019-12-26  9:05   ` kbuild test robot
  0 siblings, 0 replies; 22+ messages in thread
From: kbuild test robot @ 2019-12-26  9:05 UTC (permalink / raw)
  To: kbuild-all

[-- Attachment #1: Type: text/plain, Size: 6179 bytes --]

Hi Prashant,

[FYI, it's a private test report for your RFC patch.]
[auto build test ERROR on net-next/master]
[also build test ERROR on next-20191220]
[cannot apply to net/master bpf-next/master bpf/master v5.5-rc3]
[if your patch is applied to the wrong git tree, please drop us a note to help
improve the system. BTW, we also suggest to use '--base' option to specify the
base tree in git format-patch, please see https://stackoverflow.com/a/37406982]

url:    https://github.com/0day-ci/linux/commits/Prashant-Bhole/XDP-in-tx-path/20191226-103951
base:   https://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git 9f6cff995e98258b6b81cc864532f633e5b3a081
config: i386-randconfig-d001-20191225 (attached as .config)
compiler: gcc-7 (Debian 7.5.0-3) 7.5.0
reproduce:
        # save the attached .config to linux build tree
        make ARCH=i386 

If you fix the issue, kindly add following tag
Reported-by: kbuild test robot <lkp@intel.com>

All errors (new ones prefixed by >>):

   ld: drivers/net/tap.o: in function `ptr_ring_unconsume':
>> include/linux/ptr_ring.h:552: undefined reference to `tun_ptr_free'

vim +552 include/linux/ptr_ring.h

2e0ab8ca83c122 Michael S. Tsirkin 2016-06-13  499  
197a5212c3dd70 Michael S. Tsirkin 2017-05-17  500  /*
197a5212c3dd70 Michael S. Tsirkin 2017-05-17  501   * Return entries into ring. Destroy entries that don't fit.
197a5212c3dd70 Michael S. Tsirkin 2017-05-17  502   *
197a5212c3dd70 Michael S. Tsirkin 2017-05-17  503   * Note: this is expected to be a rare slow path operation.
197a5212c3dd70 Michael S. Tsirkin 2017-05-17  504   *
197a5212c3dd70 Michael S. Tsirkin 2017-05-17  505   * Note: producer lock is nested within consumer lock, so if you
197a5212c3dd70 Michael S. Tsirkin 2017-05-17  506   * resize you must make sure all uses nest correctly.
197a5212c3dd70 Michael S. Tsirkin 2017-05-17  507   * In particular if you consume ring in interrupt or BH context, you must
197a5212c3dd70 Michael S. Tsirkin 2017-05-17  508   * disable interrupts/BH when doing so.
197a5212c3dd70 Michael S. Tsirkin 2017-05-17  509   */
197a5212c3dd70 Michael S. Tsirkin 2017-05-17  510  static inline void ptr_ring_unconsume(struct ptr_ring *r, void **batch, int n,
197a5212c3dd70 Michael S. Tsirkin 2017-05-17  511  				      void (*destroy)(void *))
197a5212c3dd70 Michael S. Tsirkin 2017-05-17  512  {
197a5212c3dd70 Michael S. Tsirkin 2017-05-17  513  	unsigned long flags;
197a5212c3dd70 Michael S. Tsirkin 2017-05-17  514  	int head;
197a5212c3dd70 Michael S. Tsirkin 2017-05-17  515  
197a5212c3dd70 Michael S. Tsirkin 2017-05-17  516  	spin_lock_irqsave(&r->consumer_lock, flags);
197a5212c3dd70 Michael S. Tsirkin 2017-05-17  517  	spin_lock(&r->producer_lock);
197a5212c3dd70 Michael S. Tsirkin 2017-05-17  518  
197a5212c3dd70 Michael S. Tsirkin 2017-05-17  519  	if (!r->size)
197a5212c3dd70 Michael S. Tsirkin 2017-05-17  520  		goto done;
197a5212c3dd70 Michael S. Tsirkin 2017-05-17  521  
197a5212c3dd70 Michael S. Tsirkin 2017-05-17  522  	/*
197a5212c3dd70 Michael S. Tsirkin 2017-05-17  523  	 * Clean out buffered entries (for simplicity). This way following code
197a5212c3dd70 Michael S. Tsirkin 2017-05-17  524  	 * can test entries for NULL and if not assume they are valid.
197a5212c3dd70 Michael S. Tsirkin 2017-05-17  525  	 */
197a5212c3dd70 Michael S. Tsirkin 2017-05-17  526  	head = r->consumer_head - 1;
197a5212c3dd70 Michael S. Tsirkin 2017-05-17  527  	while (likely(head >= r->consumer_tail))
197a5212c3dd70 Michael S. Tsirkin 2017-05-17  528  		r->queue[head--] = NULL;
197a5212c3dd70 Michael S. Tsirkin 2017-05-17  529  	r->consumer_tail = r->consumer_head;
197a5212c3dd70 Michael S. Tsirkin 2017-05-17  530  
197a5212c3dd70 Michael S. Tsirkin 2017-05-17  531  	/*
197a5212c3dd70 Michael S. Tsirkin 2017-05-17  532  	 * Go over entries in batch, start moving head back and copy entries.
197a5212c3dd70 Michael S. Tsirkin 2017-05-17  533  	 * Stop when we run into previously unconsumed entries.
197a5212c3dd70 Michael S. Tsirkin 2017-05-17  534  	 */
197a5212c3dd70 Michael S. Tsirkin 2017-05-17  535  	while (n) {
197a5212c3dd70 Michael S. Tsirkin 2017-05-17  536  		head = r->consumer_head - 1;
197a5212c3dd70 Michael S. Tsirkin 2017-05-17  537  		if (head < 0)
197a5212c3dd70 Michael S. Tsirkin 2017-05-17  538  			head = r->size - 1;
197a5212c3dd70 Michael S. Tsirkin 2017-05-17  539  		if (r->queue[head]) {
197a5212c3dd70 Michael S. Tsirkin 2017-05-17  540  			/* This batch entry will have to be destroyed. */
197a5212c3dd70 Michael S. Tsirkin 2017-05-17  541  			goto done;
197a5212c3dd70 Michael S. Tsirkin 2017-05-17  542  		}
197a5212c3dd70 Michael S. Tsirkin 2017-05-17  543  		r->queue[head] = batch[--n];
a259df36d1fbf2 Michael S. Tsirkin 2018-01-26  544  		r->consumer_tail = head;
a259df36d1fbf2 Michael S. Tsirkin 2018-01-26  545  		/* matching READ_ONCE in __ptr_ring_empty for lockless tests */
a259df36d1fbf2 Michael S. Tsirkin 2018-01-26  546  		WRITE_ONCE(r->consumer_head, head);
197a5212c3dd70 Michael S. Tsirkin 2017-05-17  547  	}
197a5212c3dd70 Michael S. Tsirkin 2017-05-17  548  
197a5212c3dd70 Michael S. Tsirkin 2017-05-17  549  done:
197a5212c3dd70 Michael S. Tsirkin 2017-05-17  550  	/* Destroy all entries left in the batch. */
197a5212c3dd70 Michael S. Tsirkin 2017-05-17  551  	while (n)
197a5212c3dd70 Michael S. Tsirkin 2017-05-17 @552  		destroy(batch[--n]);
197a5212c3dd70 Michael S. Tsirkin 2017-05-17  553  	spin_unlock(&r->producer_lock);
197a5212c3dd70 Michael S. Tsirkin 2017-05-17  554  	spin_unlock_irqrestore(&r->consumer_lock, flags);
197a5212c3dd70 Michael S. Tsirkin 2017-05-17  555  }
197a5212c3dd70 Michael S. Tsirkin 2017-05-17  556  

:::::: The code at line 552 was first introduced by commit
:::::: 197a5212c3dd70be267b5cd930be0fb68bb53018 ptr_ring: add ptr_ring_unconsume

:::::: TO: Michael S. Tsirkin <mst@redhat.com>
:::::: CC: David S. Miller <davem@davemloft.net>

---
0-DAY kernel test infrastructure                 Open Source Technology Center
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org Intel Corporation

[-- Attachment #2: config.gz --]
[-- Type: application/gzip, Size: 33631 bytes --]


* Re: [RFC v2 net-next 00/12] XDP in tx path
  2019-12-26  2:31 [RFC v2 net-next 00/12] XDP in tx path Prashant Bhole
                   ` (11 preceding siblings ...)
  2019-12-26  2:32 ` [RFC v2 net-next 12/12] tun: run XDP program in tx path Prashant Bhole
@ 2019-12-26 19:23 ` Tom Herbert
  2019-12-27  1:35   ` Prashant Bhole
  12 siblings, 1 reply; 22+ messages in thread
From: Tom Herbert @ 2019-12-26 19:23 UTC (permalink / raw)
  To: Prashant Bhole
  Cc: David S . Miller, Michael S . Tsirkin, Alexei Starovoitov,
	Daniel Borkmann, Jesper Dangaard Brouer, Jason Wang, David Ahern,
	Jakub Kicinski, John Fastabend, Toshiaki Makita,
	Martin KaFai Lau, Song Liu, Yonghong Song, Andrii Nakryiko,
	Linux Kernel Network Developers

Prashant,

Can you provide some more detail about the expected use cases? I am
particularly interested in whether the intent is to set an XDP-like eBPF
hook in the generic TX path (the examples provided seem limited to
tunnels). For instance, is there an intent to send packets on a device
without ever creating an skbuff, as the analogue of how XDP can receive
packets without needing an skb?

Tom

On Wed, Dec 25, 2019 at 6:33 PM Prashant Bhole
<prashantbhole.linux@gmail.com> wrote:
> [...]


* Re: [RFC v2 net-next 00/12] XDP in tx path
  2019-12-26 19:23 ` [RFC v2 net-next 00/12] XDP " Tom Herbert
@ 2019-12-27  1:35   ` Prashant Bhole
  0 siblings, 0 replies; 22+ messages in thread
From: Prashant Bhole @ 2019-12-27  1:35 UTC (permalink / raw)
  To: Tom Herbert
  Cc: David S . Miller, Michael S . Tsirkin, Alexei Starovoitov,
	Daniel Borkmann, Jesper Dangaard Brouer, Jason Wang, David Ahern,
	Jakub Kicinski, John Fastabend, Toshiaki Makita,
	Martin KaFai Lau, Song Liu, Yonghong Song, Andrii Nakryiko,
	Linux Kernel Network Developers



On 12/27/19 4:23 AM, Tom Herbert wrote:
> Prashant,
> 
> Can you provide some more detail about the expected use cases? I am
> particularly interested in whether the intent is to set an XDP-like eBPF
> hook in the generic TX path (the examples provided seem limited to
> tunnels). For instance, is there an intent to send packets on a device
> without ever creating an skbuff, as the analogue of how XDP can receive
> packets without needing an skb?

This patchset is a result of trying to solve problems in virtual
devices. It just emulates the RX path XDP of the peer device. At least
I have not thought about the idea that you mentioned. Maybe that idea
can be helpful once a separate XDP_TX type of program is introduced
later.

Toke had pointed out one more use case of TX path XDP here:
https://github.com/xdp-project/xdp-project/blob/master/xdp-project.org#xdp-hook-at-tx

Thanks

> [...]


* Re: [RFC v2 net-next 01/12] net: introduce BPF_XDP_EGRESS attach type for XDP
  2019-12-26  2:31 ` [RFC v2 net-next 01/12] net: introduce BPF_XDP_EGRESS attach type for XDP Prashant Bhole
@ 2019-12-27 14:27   ` Jesper Dangaard Brouer
  2019-12-28  0:15     ` Prashant Bhole
  0 siblings, 1 reply; 22+ messages in thread
From: Jesper Dangaard Brouer @ 2019-12-27 14:27 UTC (permalink / raw)
  To: Prashant Bhole
  Cc: David S . Miller, Michael S . Tsirkin, Alexei Starovoitov,
	Daniel Borkmann, Jesper Dangaard Brouer, David Ahern, Jason Wang,
	David Ahern, Jakub Kicinski, John Fastabend, Toshiaki Makita,
	Martin KaFai Lau, Song Liu, Yonghong Song, Andrii Nakryiko,
	netdev

On Thu, 26 Dec 2019 11:31:49 +0900
Prashant Bhole <prashantbhole.linux@gmail.com> wrote:

> This patch introduces a new bpf attach type, BPF_XDP_EGRESS. Programs
> with this attach type will be allowed to run in the tx path. This is
> because we need to prevent the programs from accessing rxq info when
> they are running in the tx path. The verifier can reject programs that
> have this attach type and try to access rxq info.
> 
> The patch also introduces a new netlink attribute, IFLA_XDP_TX, which
> can be used for setting an XDP program in the tx path and for getting
> information about such programs.
> 
> Drivers that want to support tx path XDP need to handle the
> XDP_SETUP_PROG_TX and XDP_QUERY_PROG_TX cases in their ndo_bpf.

Why do you keep the "TX" names when you introduce the "EGRESS"
attachment type?

Netlink attribute IFLA_XDP_TX is particularly confusing.

I personally like that this is called "*_XDP_EGRESS", to avoid
confusion with the XDP_TX action.

BTW, should the XDP_EGRESS program also inspect XDP_TX packets?


> Signed-off-by: David Ahern <dahern@digitalocean.com>
> Co-developed-by: Prashant Bhole <prashantbhole.linux@gmail.com>
> Signed-off-by: Prashant Bhole <prashantbhole.linux@gmail.com>
> ---
>  include/linux/netdevice.h      |   4 +-
>  include/uapi/linux/bpf.h       |   1 +
>  include/uapi/linux/if_link.h   |   1 +
>  net/core/dev.c                 |  34 +++++++---
>  net/core/filter.c              |   8 +++
>  net/core/rtnetlink.c           | 112 ++++++++++++++++++++++++++++++++-
>  tools/include/uapi/linux/bpf.h |   1 +
>  7 files changed, 150 insertions(+), 11 deletions(-)
> 
> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> index 469a297b58c0..ac3e88d86581 100644
> --- a/include/linux/netdevice.h
> +++ b/include/linux/netdevice.h
> @@ -865,8 +865,10 @@ enum bpf_netdev_command {
>  	 */
>  	XDP_SETUP_PROG,
>  	XDP_SETUP_PROG_HW,
> +	XDP_SETUP_PROG_TX,
>  	XDP_QUERY_PROG,
>  	XDP_QUERY_PROG_HW,
> +	XDP_QUERY_PROG_TX,
>  	/* BPF program for offload callbacks, invoked at program load time. */
>  	BPF_OFFLOAD_MAP_ALLOC,
>  	BPF_OFFLOAD_MAP_FREE,
> @@ -3725,7 +3727,7 @@ struct sk_buff *dev_hard_start_xmit(struct sk_buff *skb, struct net_device *dev,
>  
>  typedef int (*bpf_op_t)(struct net_device *dev, struct netdev_bpf *bpf);
>  int dev_change_xdp_fd(struct net_device *dev, struct netlink_ext_ack *extack,
> -		      int fd, u32 flags);
> +		      int fd, u32 flags, bool tx);
>  u32 __dev_xdp_query(struct net_device *dev, bpf_op_t xdp_op,
>  		    enum bpf_netdev_command cmd);
>  int xdp_umem_query(struct net_device *dev, u16 queue_id);
> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> index dbbcf0b02970..23c1841c8086 100644
> --- a/include/uapi/linux/bpf.h
> +++ b/include/uapi/linux/bpf.h
> @@ -203,6 +203,7 @@ enum bpf_attach_type {
>  	BPF_TRACE_RAW_TP,
>  	BPF_TRACE_FENTRY,
>  	BPF_TRACE_FEXIT,
> +	BPF_XDP_EGRESS,
>  	__MAX_BPF_ATTACH_TYPE
>  };
>  
> diff --git a/include/uapi/linux/if_link.h b/include/uapi/linux/if_link.h
> index 1d69f637c5d6..be97c9787140 100644
> --- a/include/uapi/linux/if_link.h
> +++ b/include/uapi/linux/if_link.h
> @@ -170,6 +170,7 @@ enum {
>  	IFLA_PROP_LIST,
>  	IFLA_ALT_IFNAME, /* Alternative ifname */
>  	IFLA_PERM_ADDRESS,
> +	IFLA_XDP_TX,
>  	__IFLA_MAX
>  };



-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer



* Re: [RFC v2 net-next 01/12] net: introduce BPF_XDP_EGRESS attach type for XDP
  2019-12-27 14:27   ` Jesper Dangaard Brouer
@ 2019-12-28  0:15     ` Prashant Bhole
  2020-01-07 11:35       ` Toke Høiland-Jørgensen
  0 siblings, 1 reply; 22+ messages in thread
From: Prashant Bhole @ 2019-12-28  0:15 UTC (permalink / raw)
  To: Jesper Dangaard Brouer
  Cc: David S . Miller, Michael S . Tsirkin, Alexei Starovoitov,
	Daniel Borkmann, Jesper Dangaard Brouer, David Ahern, Jason Wang,
	David Ahern, Jakub Kicinski, John Fastabend, Toshiaki Makita,
	Martin KaFai Lau, Song Liu, Yonghong Song, Andrii Nakryiko,
	netdev



On 12/27/2019 11:27 PM, Jesper Dangaard Brouer wrote:
> On Thu, 26 Dec 2019 11:31:49 +0900
> Prashant Bhole <prashantbhole.linux@gmail.com> wrote:
> 
>> This patch introduces a new bpf attach type, BPF_XDP_EGRESS. Programs
>> with this attach type will be allowed to run in the tx path. This is
>> because we need to prevent the programs from accessing rxq info when
>> they are running in the tx path. The verifier can reject programs that
>> have this attach type and try to access rxq info.
>>
>> The patch also introduces a new netlink attribute, IFLA_XDP_TX, which
>> can be used for setting an XDP program in the tx path and for getting
>> information about such programs.
>>
>> Drivers that want to support tx path XDP need to handle the
>> XDP_SETUP_PROG_TX and XDP_QUERY_PROG_TX cases in their ndo_bpf.
> 
> Why do you keep the "TX" names when you introduce the "EGRESS"
> attachment type?
> 
> Netlink attribute IFLA_XDP_TX is particularly confusing.
> 
> I personally like that this is called "*_XDP_EGRESS", to avoid
> confusion with the XDP_TX action.

It's been named like that because it is likely that a new program
type for the tx path will be introduced later. It could re-use
IFLA_XDP_TX, XDP_SETUP_PROG_TX and XDP_QUERY_PROG_TX. Do you think
they should not be shared by two different types of programs?

> 
> BTW, should the XDP_EGRESS program also inspect XDP_TX packets?

Yes, that makes sense. But I missed handling this case in the tun
driver changes.

Thanks

> [...]


* Re: [RFC v2 net-next 03/12] libbpf: api for getting/setting link xdp options
  2019-12-26  2:31 ` [RFC v2 net-next 03/12] libbpf: api for getting/setting link xdp options Prashant Bhole
@ 2019-12-30  4:49   ` Andrii Nakryiko
  2020-01-03 11:04     ` Prashant Bhole
  0 siblings, 1 reply; 22+ messages in thread
From: Andrii Nakryiko @ 2019-12-30  4:49 UTC (permalink / raw)
  To: Prashant Bhole
  Cc: David S . Miller, Michael S . Tsirkin, Alexei Starovoitov,
	Daniel Borkmann, Jesper Dangaard Brouer, Jason Wang, David Ahern,
	Jakub Kicinski, John Fastabend, Toshiaki Makita,
	Martin KaFai Lau, Song Liu, Yonghong Song, Andrii Nakryiko,
	Networking

On Wed, Dec 25, 2019 at 6:34 PM Prashant Bhole
<prashantbhole.linux@gmail.com> wrote:
>
> This patch introduces and uses new APIs:
>
> struct bpf_link_xdp_opts {
>         struct xdp_link_info *link_info;
>         size_t link_info_sz;
>         __u32 flags;
>         __u32 prog_id;
>         int prog_fd;
> };

Please see the usage of DECLARE_LIBBPF_OPTS and OPTS_VALID/OPTS_GET
(e.g., in bpf_object__open_file). This also seems like a rather
low-level API, so it might be more appropriate to follow the naming of
the low-level API in bpf.h (see Andrey Ignatov's recent
bpf_prog_attach_xattr() changes).

As is, this is not backwards/forwards compatible unless you use the
LIBBPF_OPTS approach (that's what Alexei meant).


>
> enum bpf_link_cmd {
>         BPF_LINK_GET_XDP_INFO,
>         BPF_LINK_GET_XDP_ID,
>         BPF_LINK_SET_XDP_FD,
> };
>
> int bpf_get_link_opts(int ifindex, struct bpf_link_xdp_opts *opts,
>                       enum bpf_link_cmd cmd);
> int bpf_set_link_opts(int ifindex, struct bpf_link_xdp_opts *opts,
>                       enum bpf_link_cmd cmd);
>
> The operations performed by these two functions are equivalent to
> existing APIs.
>
> BPF_LINK_GET_XDP_ID equivalent to bpf_get_link_xdp_id()
> BPF_LINK_SET_XDP_FD equivalent to bpf_set_link_xdp_fd()
> BPF_LINK_GET_XDP_INFO equivalent to bpf_get_link_xdp_info()
>
> It will be easy to extend this API by adding members to struct
> bpf_link_xdp_opts and adding different operations. The next patch
> will extend this API to set an XDP program in the tx path.

Not really, and this has been extensively discussed previously. One of
the problems is old user code linked against a newer libbpf version
(shared library). The new libbpf will assume a struct with more fields,
while old user code will provide a struct that is too short. That's why
all the LIBBPF_OPTS stuff exists.
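
For reference, the opts pattern being referred to: DECLARE_LIBBPF_OPTS()
stores sizeof() of the caller's struct in a leading sz field, which lets
libbpf detect how many fields the (possibly older or newer) caller
actually provided. A small usage sketch against the existing
bpf_object__open_file(), with an illustrative pin path:

#include <bpf/libbpf.h>

struct bpf_object *open_with_opts(void)
{
	/* the macro fills in .sz = sizeof(opts); libbpf compares it
	 * with its own struct size for backward/forward compatibility
	 */
	DECLARE_LIBBPF_OPTS(bpf_object_open_opts, opts,
		.pin_root_path = "/sys/fs/bpf",
	);

	return bpf_object__open_file("xdp_prog.o", &opts);
}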

>
> Signed-off-by: Prashant Bhole <prashantbhole.linux@gmail.com>
> ---
>  tools/lib/bpf/libbpf.h   | 36 +++++++++++++++++++
>  tools/lib/bpf/libbpf.map |  2 ++
>  tools/lib/bpf/netlink.c  | 77 ++++++++++++++++++++++++++++++++++++----
>  3 files changed, 109 insertions(+), 6 deletions(-)
>

[...]


* Re: [RFC v2 net-next 03/12] libbpf: api for getting/setting link xdp options
  2019-12-30  4:49   ` Andrii Nakryiko
@ 2020-01-03 11:04     ` Prashant Bhole
  0 siblings, 0 replies; 22+ messages in thread
From: Prashant Bhole @ 2020-01-03 11:04 UTC (permalink / raw)
  To: Andrii Nakryiko
  Cc: David S . Miller, Michael S . Tsirkin, Alexei Starovoitov,
	Daniel Borkmann, Jesper Dangaard Brouer, Jason Wang, David Ahern,
	Jakub Kicinski, John Fastabend, Toshiaki Makita,
	Martin KaFai Lau, Song Liu, Yonghong Song, Andrii Nakryiko,
	Networking



On 12/30/2019 1:49 PM, Andrii Nakryiko wrote:
> On Wed, Dec 25, 2019 at 6:34 PM Prashant Bhole
> <prashantbhole.linux@gmail.com> wrote:
>>
>> This patch introduces and uses new APIs:
>>
>> struct bpf_link_xdp_opts {
>>          struct xdp_link_info *link_info;
>>          size_t link_info_sz;
>>          __u32 flags;
>>          __u32 prog_id;
>>          int prog_fd;
>> };
> 
> Please see the usage of DECLARE_LIBBPF_OPTS and OPTS_VALID/OPTS_GET
> (e.g., in bpf_object__open_file). This also seems like a rather
> low-level API, so it might be more appropriate to follow the naming of
> the low-level API in bpf.h (see Andrey Ignatov's recent
> bpf_prog_attach_xattr() changes).
> 
> As is, this is not backwards/forwards compatible unless you use the
> LIBBPF_OPTS approach (that's what Alexei meant).

Got it.

> 
> 
>>
>> enum bpf_link_cmd {
>>          BPF_LINK_GET_XDP_INFO,
>>          BPF_LINK_GET_XDP_ID,
>>          BPF_LINK_SET_XDP_FD,
>> };
>>
>> int bpf_get_link_opts(int ifindex, struct bpf_link_xdp_opts *opts,
>>                        enum bpf_link_cmd cmd);
>> int bpf_set_link_opts(int ifindex, struct bpf_link_xdp_opts *opts,
>>                        enum bpf_link_cmd cmd);
>>
>> The operations performed by these two functions are equivalent to
>> existing APIs.
>>
>> BPF_LINK_GET_XDP_ID equivalent to bpf_get_link_xdp_id()
>> BPF_LINK_SET_XDP_FD equivalent to bpf_set_link_xdp_fd()
>> BPF_LINK_GET_XDP_INFO equivalent to bpf_get_link_xdp_info()
>>
>> It will be easy to extend this API by adding members to struct
>> bpf_link_xdp_opts and adding different operations. The next patch
>> will extend this API to set an XDP program in the tx path.
> 
> Not really, and this has been extensively discussed previously. One of
> the problems is old user code linked against a newer libbpf version
> (shared library). The new libbpf will assume a struct with more fields,
> while old user code will provide a struct that is too short. That's why
> all the LIBBPF_OPTS stuff exists.

Got it. Thanks for reviewing.

Prashant

> 
>>
>> Signed-off-by: Prashant Bhole <prashantbhole.linux@gmail.com>
>> ---
>>   tools/lib/bpf/libbpf.h   | 36 +++++++++++++++++++
>>   tools/lib/bpf/libbpf.map |  2 ++
>>   tools/lib/bpf/netlink.c  | 77 ++++++++++++++++++++++++++++++++++++----
>>   3 files changed, 109 insertions(+), 6 deletions(-)
>>
> 
> [...]
> 


* Re: [RFC v2 net-next 01/12] net: introduce BPF_XDP_EGRESS attach type for XDP
  2019-12-28  0:15     ` Prashant Bhole
@ 2020-01-07 11:35       ` Toke Høiland-Jørgensen
  2020-01-11  0:53         ` Prashant Bhole
  0 siblings, 1 reply; 22+ messages in thread
From: Toke Høiland-Jørgensen @ 2020-01-07 11:35 UTC (permalink / raw)
  To: Prashant Bhole, Jesper Dangaard Brouer
  Cc: David S . Miller, Michael S . Tsirkin, Alexei Starovoitov,
	Daniel Borkmann, Jesper Dangaard Brouer, David Ahern, Jason Wang,
	David Ahern, Jakub Kicinski, John Fastabend, Toshiaki Makita,
	Martin KaFai Lau, Song Liu, Yonghong Song, Andrii Nakryiko,
	netdev

Prashant Bhole <prashantbhole.linux@gmail.com> writes:

> On 12/27/2019 11:27 PM, Jesper Dangaard Brouer wrote:
>> On Thu, 26 Dec 2019 11:31:49 +0900
>> Prashant Bhole <prashantbhole.linux@gmail.com> wrote:
>> 
>>> This patch introduces a new bpf attach type, BPF_XDP_EGRESS. Programs
>>> with this attach type will be allowed to run in the tx path. This is
>>> because we need to prevent the programs from accessing rxq info when
>>> they are running in the tx path. The verifier can reject programs that
>>> have this attach type and try to access rxq info.
>>>
>>> The patch also introduces a new netlink attribute, IFLA_XDP_TX, which
>>> can be used for setting an XDP program in the tx path and for getting
>>> information about such programs.
>>>
>>> Drivers that want to support tx path XDP need to handle the
>>> XDP_SETUP_PROG_TX and XDP_QUERY_PROG_TX cases in their ndo_bpf.
>> 
>> Why do you keep the "TX" names when you introduce the "EGRESS"
>> attachment type?
>> 
>> Netlink attribute IFLA_XDP_TX is particularly confusing.
>> 
>> I personally like that this is called "*_XDP_EGRESS", to avoid
>> confusion with the XDP_TX action.
>
> It's been named like that because it is likely that a new program
> type for the tx path will be introduced later. It could re-use
> IFLA_XDP_TX, XDP_SETUP_PROG_TX and XDP_QUERY_PROG_TX. Do you think
> they should not be shared by two different types of programs?

I agree that the *PROG_TX stuff is confusing.

Why not just keep the same XDP attach command, and just make this a new
attach mode? I.e., today you can do

bpf_set_link_xdp_fd(ifindex, prog_fd, XDP_FLAGS_DRV_MODE);

so for this, just add support for:

bpf_set_link_xdp_fd(ifindex, prog_fd, XDP_FLAGS_EGRESS_MODE);

No need for a new command/netlink attribute. We already support multiple
attach modes (HW+DRV), so this should be a straightforward extension,
no?
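
Concretely, the mode flag would then select the netdev_bpf command
inside dev_change_xdp_fd(). A sketch, where XDP_FLAGS_EGRESS_MODE is
the proposed (not yet existing) flag and XDP_SETUP_PROG_TX is the
command this series already defines:

	enum bpf_netdev_command cmd = XDP_SETUP_PROG;

	if (flags & XDP_FLAGS_HW_MODE)
		cmd = XDP_SETUP_PROG_HW;
	else if (flags & XDP_FLAGS_EGRESS_MODE)	/* proposed mode flag */
		cmd = XDP_SETUP_PROG_TX;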

-Toke



* Re: [RFC v2 net-next 01/12] net: introduce BPF_XDP_EGRESS attach type for XDP
  2020-01-07 11:35       ` Toke Høiland-Jørgensen
@ 2020-01-11  0:53         ` Prashant Bhole
  0 siblings, 0 replies; 22+ messages in thread
From: Prashant Bhole @ 2020-01-11  0:53 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen, Jesper Dangaard Brouer
  Cc: David S . Miller, Michael S . Tsirkin, Alexei Starovoitov,
	Daniel Borkmann, Jesper Dangaard Brouer, David Ahern, Jason Wang,
	David Ahern, Jakub Kicinski, John Fastabend, Toshiaki Makita,
	Martin KaFai Lau, Song Liu, Yonghong Song, Andrii Nakryiko,
	netdev



On 1/7/2020 8:35 PM, Toke Høiland-Jørgensen wrote:
> Prashant Bhole <prashantbhole.linux@gmail.com> writes:
> 
>> On 12/27/2019 11:27 PM, Jesper Dangaard Brouer wrote:
>>> On Thu, 26 Dec 2019 11:31:49 +0900
>>> Prashant Bhole <prashantbhole.linux@gmail.com> wrote:
>>>
>>>> This patch introduces a new bpf attach type, BPF_XDP_EGRESS. Programs
>>>> with this attach type will be allowed to run in the tx path. This is
>>>> because we need to prevent the programs from accessing rxq info when
>>>> they are running in the tx path. The verifier can reject programs that
>>>> have this attach type and try to access rxq info.
>>>>
>>>> The patch also introduces a new netlink attribute, IFLA_XDP_TX, which
>>>> can be used for setting an XDP program in the tx path and for getting
>>>> information about such programs.
>>>>
>>>> Drivers that want to support tx path XDP need to handle the
>>>> XDP_SETUP_PROG_TX and XDP_QUERY_PROG_TX cases in their ndo_bpf.
>>>
>>> Why do you keep the "TX" names when you introduce the "EGRESS"
>>> attachment type?
>>>
>>> Netlink attribute IFLA_XDP_TX is particularly confusing.
>>>
>>> I personally like that this is called "*_XDP_EGRESS", to avoid
>>> confusion with the XDP_TX action.
>>
>> It's been named like that because it is likely that a new program
>> type for the tx path will be introduced later. It could re-use
>> IFLA_XDP_TX, XDP_SETUP_PROG_TX and XDP_QUERY_PROG_TX. Do you think
>> they should not be shared by two different types of programs?
> 
> I agree that the *PROG_TX stuff is confusing.

Ok. It seems s/TX/EGRESS is good for now.

> 
> Why not just keep the same XDP attach command, and just make this a new
> attach mode? I.e., today you can do
> 
> bpf_set_link_xdp_fd(ifindex, prog_fd, XDP_FLAGS_DRV_MODE);
> 
> so for this, just add support for:
> 
> bpf_set_link_xdp_fd(ifindex, prog_fd, XDP_FLAGS_EGRESS_MODE);
> 
> No need for a new command/netlink attribute. We already support multiple
> attach modes (HW+DRV), so this should be a straight-forward extension,
> no?

Initially we had implemented it the same way. I am OK with this way too:
- a new attach type, BPF_XDP_EGRESS, for verifier purposes
- a new xdp flag, XDP_FLAGS_EGRESS, for libbpf


Thanks


Thread overview: 22+ messages
2019-12-26  2:31 [RFC v2 net-next 00/12] XDP in tx path Prashant Bhole
2019-12-26  2:31 ` [RFC v2 net-next 01/12] net: introduce BPF_XDP_EGRESS attach type for XDP Prashant Bhole
2019-12-27 14:27   ` Jesper Dangaard Brouer
2019-12-28  0:15     ` Prashant Bhole
2020-01-07 11:35       ` Toke Høiland-Jørgensen
2020-01-11  0:53         ` Prashant Bhole
2019-12-26  2:31 ` [RFC v2 net-next 02/12] tools: sync kernel uapi/linux/if_link.h header Prashant Bhole
2019-12-26  2:31 ` [RFC v2 net-next 03/12] libbpf: api for getting/setting link xdp options Prashant Bhole
2019-12-30  4:49   ` Andrii Nakryiko
2020-01-03 11:04     ` Prashant Bhole
2019-12-26  2:31 ` [RFC v2 net-next 04/12] libbpf: set xdp program in tx path Prashant Bhole
2019-12-26  2:31 ` [RFC v2 net-next 05/12] samples/bpf: xdp1, add XDP tx support Prashant Bhole
2019-12-26  2:31 ` [RFC v2 net-next 06/12] net: core: rename netif_receive_generic_xdp() to do_generic_xdp_core() Prashant Bhole
2019-12-26  2:31 ` [RFC v2 net-next 07/12] net: core: export do_xdp_generic_core() Prashant Bhole
2019-12-26  2:31 ` [RFC v2 net-next 08/12] tuntap: check tun_msg_ctl type at necessary places Prashant Bhole
2019-12-26  2:31 ` [RFC v2 net-next 09/12] vhost_net: use tap recvmsg api to access ptr ring Prashant Bhole
2019-12-26  9:05   ` kbuild test robot
2019-12-26  2:31 ` [RFC v2 net-next 10/12] tuntap: remove usage of ptr ring in vhost_net Prashant Bhole
2019-12-26  2:31 ` [RFC v2 net-next 11/12] tun: set tx path XDP program Prashant Bhole
2019-12-26  2:32 ` [RFC v2 net-next 12/12] tun: run XDP program in tx path Prashant Bhole
2019-12-26 19:23 ` [RFC v2 net-next 00/12] XDP " Tom Herbert
2019-12-27  1:35   ` Prashant Bhole
