* [RFC net-next 00/14] XDP in tx path
@ 2019-12-18  8:10 Prashant Bhole
  2019-12-18  8:10 ` [RFC net-next 01/14] net: add tx path XDP support Prashant Bhole
                   ` (13 more replies)
  0 siblings, 14 replies; 36+ messages in thread
From: Prashant Bhole @ 2019-12-18  8:10 UTC (permalink / raw)
  To: David S . Miller, Michael S . Tsirkin, Alexei Starovoitov,
	Daniel Borkmann, Jesper Dangaard Brouer
  Cc: Prashant Bhole, Jason Wang, David Ahern, Jakub Kicinski,
	John Fastabend, Toshiaki Makita, Martin KaFai Lau, Song Liu,
	Yonghong Song, Andrii Nakryiko, netdev


This series introduces tx path XDP. The RFC also includes usage of the
feature in the tun driver; later, both can be posted separately. Another
possible user of this feature is the veth driver. It can improve
container networking where a veth pair links the host and the container:
the host can enforce an ACL by attaching a tx path XDP program to the
veth interface. A sketch of such a program is shown below.
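
For illustration only (not part of this set), a minimal sketch of such
an ACL program; the blocked UDP port is an arbitrary example and the
parsing assumes IPv4 without IP options:

#include <linux/bpf.h>
#include <linux/if_ether.h>
#include <linux/ip.h>
#include <linux/udp.h>
#include <linux/in.h>
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_endian.h>

SEC("xdp")
int veth_tx_acl(struct xdp_md *ctx)
{
	void *data_end = (void *)(long)ctx->data_end;
	void *data = (void *)(long)ctx->data;
	struct ethhdr *eth = data;
	struct iphdr *iph = (void *)(eth + 1);
	struct udphdr *udph = (void *)(iph + 1);	/* no IP options */

	/* one bounds check covers eth + ip + udp headers */
	if ((void *)(udph + 1) > data_end)
		return XDP_PASS;
	if (eth->h_proto != bpf_htons(ETH_P_IP) ||
	    iph->protocol != IPPROTO_UDP)
		return XDP_PASS;
	/* example policy: drop UDP datagrams to port 9999 */
	if (udph->dest == bpf_htons(9999))
		return XDP_DROP;
	return XDP_PASS;
}

char _license[] SEC("license") = "GPL";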

This was originally a part of Jason Wang's work "XDP offload with
virtio-net" [1]. In order to simplify that work we decided to split
it and introduce tx path XDP separately in this set.

A performance improvement can be seen when an XDP program is attached
to the tun tx path as opposed to the rx path in the guest.

* Case 1: When packets are XDP_REDIRECT'ed towards tun.

                     virtio-net rx XDP      tun tx XDP
  xdp1(XDP_DROP)        2.57 Mpps           12.90 Mpps
  xdp2(XDP_TX)          1.53 Mpps            7.15 Mpps

* Case 2: When packets pass through a bridge towards tun

                     virtio-net rx XDP      tun tx XDP
  xdp1(XDP_DROP)        0.99 Mpps           1.00 Mpps
  xdp2(XDP_TX)          1.19 Mpps           0.97 Mpps

Since this set modifies tun and vhost_net, below are the netperf
performance numbers.

    Netperf test          Before      After   Difference
  UDP_STREAM 18byte        90.14      88.77     -1.51%
  UDP_STREAM 1472byte       6955       6658     -4.27%
  TCP_STREAM                9409       9402     -0.07%
  UDP_RR                   12658      13030     +2.93%
  TCP_RR                   12711      12831     +0.94%

XDP_REDIRECT will be handled later because we need to come up with a
proper way to handle it in the tx path.

Patches 1-4 are related to adding tx path XDP support.
Patches 5-14 implement tx path XDP in tun driver.

[1]: https://netdevconf.info/0x13/session.html?xdp-offload-with-virtio-net


David Ahern (2):
  net: add tx path XDP support
  tun: set tx path XDP program

Jason Wang (2):
  net: core: rename netif_receive_generic_xdp() to do_xdp_generic_core()
  net: core: export do_xdp_generic_core()

Prashant Bhole (10):
  tools: sync kernel uapi/linux/if_link.h header
  libbpf: API for tx path XDP support
  samples/bpf: xdp1, add XDP tx support
  tuntap: check tun_msg_ctl type at necessary places
  vhost_net: use tap recvmsg api to access ptr ring
  tuntap: remove usage of ptr ring in vhost_net
  tun: run XDP program in tx path
  tun: add a way to inject tx path packet into Rx path
  tun: handle XDP_TX action of tx path XDP program
  tun: run xdp prog when tun is read from file interface

 drivers/net/tap.c                  |  42 +++--
 drivers/net/tun.c                  | 278 +++++++++++++++++++++++++----
 drivers/vhost/net.c                |  77 ++++----
 include/linux/if_tap.h             |   5 -
 include/linux/if_tun.h             |  23 ++-
 include/linux/netdevice.h          |   6 +-
 include/uapi/linux/if_link.h       |   1 +
 net/core/dev.c                     |  39 ++--
 net/core/rtnetlink.c               | 112 +++++++++++-
 samples/bpf/xdp1_user.c            |  28 ++-
 tools/include/uapi/linux/if_link.h |   2 +
 tools/lib/bpf/libbpf.h             |   4 +
 tools/lib/bpf/libbpf.map           |   3 +
 tools/lib/bpf/netlink.c            |  77 +++++++-
 14 files changed, 571 insertions(+), 126 deletions(-)

-- 
2.21.0



* [RFC net-next 01/14] net: add tx path XDP support
  2019-12-18  8:10 [RFC net-next 00/14] XDP in tx path Prashant Bhole
@ 2019-12-18  8:10 ` Prashant Bhole
  2019-12-18  8:10 ` [RFC net-next 02/14] tools: sync kernel uapi/linux/if_link.h header Prashant Bhole
                   ` (12 subsequent siblings)
  13 siblings, 0 replies; 36+ messages in thread
From: Prashant Bhole @ 2019-12-18  8:10 UTC (permalink / raw)
  To: David S . Miller, Michael S . Tsirkin, Alexei Starovoitov,
	Daniel Borkmann, Jesper Dangaard Brouer
  Cc: David Ahern, Jason Wang, David Ahern, Jakub Kicinski,
	John Fastabend, Toshiaki Makita, Martin KaFai Lau, Song Liu,
	Yonghong Song, Andrii Nakryiko, netdev, Prashant Bhole

From: David Ahern <dahern@digitalocean.com>

Add a way to run an XDP program in the tx path. Drivers that want to
support tx path XDP need to handle the XDP_SETUP_PROG_TX and
XDP_QUERY_PROG_TX cases in their ndo_bpf handler.
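
For illustration, the dispatch in a driver might look roughly like the
sketch below ("foo" is a hypothetical driver; foo_xdp_set() and
foo_xdp_query() stand in for its setup/query helpers):

static int foo_bpf(struct net_device *dev, struct netdev_bpf *bpf)
{
	switch (bpf->command) {
	case XDP_SETUP_PROG:
		return foo_xdp_set(dev, bpf->prog, false, bpf->extack);
	case XDP_SETUP_PROG_TX:
		return foo_xdp_set(dev, bpf->prog, true, bpf->extack);
	case XDP_QUERY_PROG:
		bpf->prog_id = foo_xdp_query(dev, false);
		return 0;
	case XDP_QUERY_PROG_TX:
		bpf->prog_id = foo_xdp_query(dev, true);
		return 0;
	default:
		return -EINVAL;
	}
}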

Signed-off-by: David Ahern <dahern@digitalocean.com>
Co-developed-by: Prashant Bhole <prashantbhole.linux@gmail.com>
Signed-off-by: Prashant Bhole <prashantbhole.linux@gmail.com>
---
 include/linux/netdevice.h    |   4 +-
 include/uapi/linux/if_link.h |   1 +
 net/core/dev.c               |  31 +++++++---
 net/core/rtnetlink.c         | 112 ++++++++++++++++++++++++++++++++++-
 4 files changed, 137 insertions(+), 11 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 30745068fb39..7b5256d374d2 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -864,8 +864,10 @@ enum bpf_netdev_command {
 	 */
 	XDP_SETUP_PROG,
 	XDP_SETUP_PROG_HW,
+	XDP_SETUP_PROG_TX,
 	XDP_QUERY_PROG,
 	XDP_QUERY_PROG_HW,
+	XDP_QUERY_PROG_TX,
 	/* BPF program for offload callbacks, invoked at program load time. */
 	BPF_OFFLOAD_MAP_ALLOC,
 	BPF_OFFLOAD_MAP_FREE,
@@ -3724,7 +3726,7 @@ struct sk_buff *dev_hard_start_xmit(struct sk_buff *skb, struct net_device *dev,
 
 typedef int (*bpf_op_t)(struct net_device *dev, struct netdev_bpf *bpf);
 int dev_change_xdp_fd(struct net_device *dev, struct netlink_ext_ack *extack,
-		      int fd, u32 flags);
+		      int fd, u32 flags, bool tx);
 u32 __dev_xdp_query(struct net_device *dev, bpf_op_t xdp_op,
 		    enum bpf_netdev_command cmd);
 int xdp_umem_query(struct net_device *dev, u16 queue_id);
diff --git a/include/uapi/linux/if_link.h b/include/uapi/linux/if_link.h
index 1d69f637c5d6..be97c9787140 100644
--- a/include/uapi/linux/if_link.h
+++ b/include/uapi/linux/if_link.h
@@ -170,6 +170,7 @@ enum {
 	IFLA_PROP_LIST,
 	IFLA_ALT_IFNAME, /* Alternative ifname */
 	IFLA_PERM_ADDRESS,
+	IFLA_XDP_TX,
 	__IFLA_MAX
 };
 
diff --git a/net/core/dev.c b/net/core/dev.c
index 2c277b8aba38..a46db72759cf 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -8540,7 +8540,7 @@ u32 __dev_xdp_query(struct net_device *dev, bpf_op_t bpf_op,
 
 static int dev_xdp_install(struct net_device *dev, bpf_op_t bpf_op,
 			   struct netlink_ext_ack *extack, u32 flags,
-			   struct bpf_prog *prog)
+			   struct bpf_prog *prog, bool tx)
 {
 	struct netdev_bpf xdp;
 
@@ -8548,7 +8548,8 @@ static int dev_xdp_install(struct net_device *dev, bpf_op_t bpf_op,
 	if (flags & XDP_FLAGS_HW_MODE)
 		xdp.command = XDP_SETUP_PROG_HW;
 	else
-		xdp.command = XDP_SETUP_PROG;
+		xdp.command = tx ? XDP_SETUP_PROG_TX : XDP_SETUP_PROG;
+
 	xdp.extack = extack;
 	xdp.flags = flags;
 	xdp.prog = prog;
@@ -8562,7 +8563,8 @@ static void dev_xdp_uninstall(struct net_device *dev)
 	bpf_op_t ndo_bpf;
 
 	/* Remove generic XDP */
-	WARN_ON(dev_xdp_install(dev, generic_xdp_install, NULL, 0, NULL));
+	WARN_ON(dev_xdp_install(dev, generic_xdp_install, NULL, 0, NULL,
+				false));
 
 	/* Remove from the driver */
 	ndo_bpf = dev->netdev_ops->ndo_bpf;
@@ -8574,14 +8576,21 @@ static void dev_xdp_uninstall(struct net_device *dev)
 	WARN_ON(ndo_bpf(dev, &xdp));
 	if (xdp.prog_id)
 		WARN_ON(dev_xdp_install(dev, ndo_bpf, NULL, xdp.prog_flags,
-					NULL));
+					NULL, false));
 
 	/* Remove HW offload */
 	memset(&xdp, 0, sizeof(xdp));
 	xdp.command = XDP_QUERY_PROG_HW;
 	if (!ndo_bpf(dev, &xdp) && xdp.prog_id)
 		WARN_ON(dev_xdp_install(dev, ndo_bpf, NULL, xdp.prog_flags,
-					NULL));
+					NULL, false));
+
+	/* Remove tx path XDP program */
+	memset(&xdp, 0, sizeof(xdp));
+	xdp.command = XDP_QUERY_PROG_TX;
+	if (!ndo_bpf(dev, &xdp) && xdp.prog_id)
+		WARN_ON(dev_xdp_install(dev, ndo_bpf, NULL, xdp.prog_flags,
+					NULL, true));
 }
 
 /**
@@ -8594,7 +8603,7 @@ static void dev_xdp_uninstall(struct net_device *dev)
  *	Set or clear a bpf program for a device
  */
 int dev_change_xdp_fd(struct net_device *dev, struct netlink_ext_ack *extack,
-		      int fd, u32 flags)
+		      int fd, u32 flags, bool tx)
 {
 	const struct net_device_ops *ops = dev->netdev_ops;
 	enum bpf_netdev_command query;
@@ -8606,7 +8615,10 @@ int dev_change_xdp_fd(struct net_device *dev, struct netlink_ext_ack *extack,
 	ASSERT_RTNL();
 
 	offload = flags & XDP_FLAGS_HW_MODE;
-	query = offload ? XDP_QUERY_PROG_HW : XDP_QUERY_PROG;
+	if (tx)
+		query = XDP_QUERY_PROG_TX;
+	else
+		query = offload ? XDP_QUERY_PROG_HW : XDP_QUERY_PROG;
 
 	bpf_op = bpf_chk = ops->ndo_bpf;
 	if (!bpf_op && (flags & (XDP_FLAGS_DRV_MODE | XDP_FLAGS_HW_MODE))) {
@@ -8621,7 +8633,8 @@ int dev_change_xdp_fd(struct net_device *dev, struct netlink_ext_ack *extack,
 	if (fd >= 0) {
 		u32 prog_id;
 
-		if (!offload && __dev_xdp_query(dev, bpf_chk, XDP_QUERY_PROG)) {
+		if (!offload && !tx &&
+		    __dev_xdp_query(dev, bpf_chk, XDP_QUERY_PROG)) {
 			NL_SET_ERR_MSG(extack, "native and generic XDP can't be active at the same time");
 			return -EEXIST;
 		}
@@ -8653,7 +8666,7 @@ int dev_change_xdp_fd(struct net_device *dev, struct netlink_ext_ack *extack,
 			return 0;
 	}
 
-	err = dev_xdp_install(dev, bpf_op, extack, flags, prog);
+	err = dev_xdp_install(dev, bpf_op, extack, flags, prog, tx);
 	if (err < 0 && prog)
 		bpf_prog_put(prog);
 
diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index 20bc406f3871..9dc4b2547f62 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -1395,6 +1395,36 @@ static int rtnl_fill_link_ifmap(struct sk_buff *skb, struct net_device *dev)
 	return 0;
 }
 
+static u32 rtnl_xdp_tx_prog_drv(struct net_device *dev)
+{
+	return __dev_xdp_query(dev, dev->netdev_ops->ndo_bpf,
+			       XDP_QUERY_PROG_TX);
+}
+
+static int rtnl_xdp_tx_report_one(struct sk_buff *skb, struct net_device *dev,
+				  u32 *prog_id, u8 *mode, u8 tgt_mode, u32 attr,
+				  u32 (*get_prog_id)(struct net_device *dev))
+{
+	u32 curr_id;
+	int err;
+
+	curr_id = get_prog_id(dev);
+	if (!curr_id)
+		return 0;
+
+	*prog_id = curr_id;
+	err = nla_put_u32(skb, attr, curr_id);
+	if (err)
+		return err;
+
+	if (*mode != XDP_ATTACHED_NONE)
+		*mode = XDP_ATTACHED_MULTI;
+	else
+		*mode = tgt_mode;
+
+	return 0;
+}
+
 static u32 rtnl_xdp_prog_skb(struct net_device *dev)
 {
 	const struct bpf_prog *generic_xdp_prog;
@@ -1486,6 +1516,41 @@ static int rtnl_xdp_fill(struct sk_buff *skb, struct net_device *dev)
 	return err;
 }
 
+static int rtnl_xdp_tx_fill(struct sk_buff *skb, struct net_device *dev)
+{
+	u8 mode = XDP_ATTACHED_NONE;
+	struct nlattr *xdp;
+	u32 prog_id = 0;
+	int err;
+
+	xdp = nla_nest_start_noflag(skb, IFLA_XDP_TX);
+	if (!xdp)
+		return -EMSGSIZE;
+
+	err = rtnl_xdp_tx_report_one(skb, dev, &prog_id, &mode,
+				     XDP_ATTACHED_DRV, IFLA_XDP_DRV_PROG_ID,
+				     rtnl_xdp_tx_prog_drv);
+	if (err)
+		goto err_cancel;
+
+	err = nla_put_u8(skb, IFLA_XDP_ATTACHED, mode);
+	if (err)
+		goto err_cancel;
+
+	if (prog_id && mode != XDP_ATTACHED_MULTI) {
+		err = nla_put_u32(skb, IFLA_XDP_PROG_ID, prog_id);
+		if (err)
+			goto err_cancel;
+	}
+
+	nla_nest_end(skb, xdp);
+	return 0;
+
+err_cancel:
+	nla_nest_cancel(skb, xdp);
+	return err;
+}
+
 static u32 rtnl_get_event(unsigned long event)
 {
 	u32 rtnl_event_type = IFLA_EVENT_NONE;
@@ -1743,6 +1808,9 @@ static int rtnl_fill_ifinfo(struct sk_buff *skb,
 	if (rtnl_xdp_fill(skb, dev))
 		goto nla_put_failure;
 
+	if (rtnl_xdp_tx_fill(skb, dev))
+		goto nla_put_failure;
+
 	if (dev->rtnl_link_ops || rtnl_have_link_slave_info(dev)) {
 		if (rtnl_link_fill(skb, dev) < 0)
 			goto nla_put_failure;
@@ -1827,6 +1895,7 @@ static const struct nla_policy ifla_policy[IFLA_MAX+1] = {
 	[IFLA_ALT_IFNAME]	= { .type = NLA_STRING,
 				    .len = ALTIFNAMSIZ - 1 },
 	[IFLA_PERM_ADDRESS]	= { .type = NLA_REJECT },
+	[IFLA_XDP_TX]		= { .type = NLA_NESTED },
 };
 
 static const struct nla_policy ifla_info_policy[IFLA_INFO_MAX+1] = {
@@ -2801,7 +2870,48 @@ static int do_setlink(const struct sk_buff *skb,
 		if (xdp[IFLA_XDP_FD]) {
 			err = dev_change_xdp_fd(dev, extack,
 						nla_get_s32(xdp[IFLA_XDP_FD]),
-						xdp_flags);
+						xdp_flags, false);
+			if (err)
+				goto errout;
+			status |= DO_SETLINK_NOTIFY;
+		}
+	}
+
+	if (tb[IFLA_XDP_TX]) {
+		struct nlattr *xdp[IFLA_XDP_MAX + 1];
+		u32 xdp_flags = 0;
+
+		err = nla_parse_nested_deprecated(xdp, IFLA_XDP_MAX,
+						  tb[IFLA_XDP_TX],
+						  ifla_xdp_policy, NULL);
+		if (err < 0)
+			goto errout;
+
+		if (xdp[IFLA_XDP_ATTACHED] || xdp[IFLA_XDP_PROG_ID]) {
+			err = -EINVAL;
+			goto errout;
+		}
+
+		if (xdp[IFLA_XDP_FLAGS]) {
+			xdp_flags = nla_get_u32(xdp[IFLA_XDP_FLAGS]);
+			if (xdp_flags & XDP_FLAGS_HW_MODE) {
+				err = -EINVAL;
+				goto errout;
+			}
+			if (xdp_flags & ~XDP_FLAGS_MASK) {
+				err = -EINVAL;
+				goto errout;
+			}
+			if (hweight32(xdp_flags & XDP_FLAGS_MODES) > 1) {
+				err = -EINVAL;
+				goto errout;
+			}
+		}
+
+		if (xdp[IFLA_XDP_FD]) {
+			err = dev_change_xdp_fd(dev, extack,
+						nla_get_s32(xdp[IFLA_XDP_FD]),
+						xdp_flags, true);
 			if (err)
 				goto errout;
 			status |= DO_SETLINK_NOTIFY;
-- 
2.21.0



* [RFC net-next 02/14] tools: sync kernel uapi/linux/if_link.h header
  2019-12-18  8:10 [RFC net-next 00/14] XDP in tx path Prashant Bhole
  2019-12-18  8:10 ` [RFC net-next 01/14] net: add tx path XDP support Prashant Bhole
@ 2019-12-18  8:10 ` Prashant Bhole
  2019-12-18  8:10 ` [RFC net-next 03/14] libbpf: API for tx path XDP support Prashant Bhole
                   ` (11 subsequent siblings)
  13 siblings, 0 replies; 36+ messages in thread
From: Prashant Bhole @ 2019-12-18  8:10 UTC (permalink / raw)
  To: David S . Miller, Michael S . Tsirkin, Alexei Starovoitov,
	Daniel Borkmann, Jesper Dangaard Brouer
  Cc: Prashant Bhole, Jason Wang, David Ahern, Jakub Kicinski,
	John Fastabend, Toshiaki Makita, Martin KaFai Lau, Song Liu,
	Yonghong Song, Andrii Nakryiko, netdev

The tools copy of if_link.h was out of sync with the kernel header,
which was also just updated to add the IFLA_XDP_TX attribute. Sync it.

Signed-off-by: Prashant Bhole <prashantbhole.linux@gmail.com>
---
 tools/include/uapi/linux/if_link.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/tools/include/uapi/linux/if_link.h b/tools/include/uapi/linux/if_link.h
index 8aec8769d944..be97c9787140 100644
--- a/tools/include/uapi/linux/if_link.h
+++ b/tools/include/uapi/linux/if_link.h
@@ -169,6 +169,8 @@ enum {
 	IFLA_MAX_MTU,
 	IFLA_PROP_LIST,
 	IFLA_ALT_IFNAME, /* Alternative ifname */
+	IFLA_PERM_ADDRESS,
+	IFLA_XDP_TX,
 	__IFLA_MAX
 };
 
-- 
2.21.0



* [RFC net-next 03/14] libbpf: API for tx path XDP support
  2019-12-18  8:10 [RFC net-next 00/14] XDP in tx path Prashant Bhole
  2019-12-18  8:10 ` [RFC net-next 01/14] net: add tx path XDP support Prashant Bhole
  2019-12-18  8:10 ` [RFC net-next 02/14] tools: sync kernel uapi/linux/if_link.h header Prashant Bhole
@ 2019-12-18  8:10 ` Prashant Bhole
  2019-12-18 18:20   ` Alexei Starovoitov
  2019-12-18  8:10 ` [RFC net-next 04/14] samples/bpf: xdp1, add XDP tx support Prashant Bhole
                   ` (10 subsequent siblings)
  13 siblings, 1 reply; 36+ messages in thread
From: Prashant Bhole @ 2019-12-18  8:10 UTC (permalink / raw)
  To: David S . Miller, Michael S . Tsirkin, Alexei Starovoitov,
	Daniel Borkmann, Jesper Dangaard Brouer
  Cc: Prashant Bhole, Jason Wang, David Ahern, Jakub Kicinski,
	John Fastabend, Toshiaki Makita, Martin KaFai Lau, Song Liu,
	Yonghong Song, Andrii Nakryiko, netdev, David Ahern

Add new APIs for tx path XDP (usage sketch below):
 - bpf_set_link_xdp_tx_fd
 - bpf_get_link_xdp_tx_id
 - bpf_get_link_xdp_tx_info
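
A minimal usage sketch (error handling trimmed; attach_tx_prog() is
illustrative, and prog_fd is assumed to come from the usual libbpf
load path):

#include <errno.h>
#include <bpf/libbpf.h>
#include <linux/if_link.h>

static int attach_tx_prog(int ifindex, int prog_fd)
{
	__u32 prog_id = 0;
	int err;

	err = bpf_set_link_xdp_tx_fd(ifindex, prog_fd,
				     XDP_FLAGS_UPDATE_IF_NOEXIST);
	if (err)
		return err;

	/* read back the id of the tx path program just attached;
	 * detaching later is bpf_set_link_xdp_tx_fd(ifindex, -1, 0)
	 */
	err = bpf_get_link_xdp_tx_id(ifindex, &prog_id, 0);
	if (err)
		return err;

	return prog_id ? 0 : -ENOENT;
}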

Co-developed-by: David Ahern <dahern@digitalocean.com>
Signed-off-by: David Ahern <dahern@digitalocean.com>
Signed-off-by: Prashant Bhole <prashantbhole.linux@gmail.com>
---
 tools/lib/bpf/libbpf.h   |  4 +++
 tools/lib/bpf/libbpf.map |  3 ++
 tools/lib/bpf/netlink.c  | 77 +++++++++++++++++++++++++++++++++++-----
 3 files changed, 75 insertions(+), 9 deletions(-)

diff --git a/tools/lib/bpf/libbpf.h b/tools/lib/bpf/libbpf.h
index 0dbf4bfba0c4..741e5fee61f6 100644
--- a/tools/lib/bpf/libbpf.h
+++ b/tools/lib/bpf/libbpf.h
@@ -447,6 +447,10 @@ LIBBPF_API int bpf_set_link_xdp_fd(int ifindex, int fd, __u32 flags);
 LIBBPF_API int bpf_get_link_xdp_id(int ifindex, __u32 *prog_id, __u32 flags);
 LIBBPF_API int bpf_get_link_xdp_info(int ifindex, struct xdp_link_info *info,
 				     size_t info_size, __u32 flags);
+LIBBPF_API int bpf_set_link_xdp_tx_fd(int ifindex, int fd, __u32 flags);
+LIBBPF_API int bpf_get_link_xdp_tx_id(int ifindex, __u32 *prog_id, __u32 flags);
+LIBBPF_API int bpf_get_link_xdp_tx_info(int ifindex, struct xdp_link_info *info,
+				     size_t info_size, __u32 flags);
 
 struct perf_buffer;
 
diff --git a/tools/lib/bpf/libbpf.map b/tools/lib/bpf/libbpf.map
index 8ddc2c40e482..28597c2a24c4 100644
--- a/tools/lib/bpf/libbpf.map
+++ b/tools/lib/bpf/libbpf.map
@@ -207,4 +207,7 @@ LIBBPF_0.0.6 {
 		bpf_program__size;
 		btf__find_by_name_kind;
 		libbpf_find_vmlinux_btf_id;
+		bpf_set_link_xdp_tx_fd;
+		bpf_get_link_xdp_tx_id;
+		bpf_get_link_xdp_tx_info;
 } LIBBPF_0.0.5;
diff --git a/tools/lib/bpf/netlink.c b/tools/lib/bpf/netlink.c
index 5065c1aa1061..eca6e80df697 100644
--- a/tools/lib/bpf/netlink.c
+++ b/tools/lib/bpf/netlink.c
@@ -129,7 +129,7 @@ static int bpf_netlink_recv(int sock, __u32 nl_pid, int seq,
 	return ret;
 }
 
-int bpf_set_link_xdp_fd(int ifindex, int fd, __u32 flags)
+static int __bpf_set_link_xdp_fd(int ifindex, int fd, __u32 flags, bool tx)
 {
 	int sock, seq = 0, ret;
 	struct nlattr *nla, *nla_xdp;
@@ -156,7 +156,11 @@ int bpf_set_link_xdp_fd(int ifindex, int fd, __u32 flags)
 	/* started nested attribute for XDP */
 	nla = (struct nlattr *)(((char *)&req)
 				+ NLMSG_ALIGN(req.nh.nlmsg_len));
-	nla->nla_type = NLA_F_NESTED | IFLA_XDP;
+	nla->nla_type = NLA_F_NESTED;
+	if (tx)
+		nla->nla_type |= IFLA_XDP_TX;
+	else
+		nla->nla_type |= IFLA_XDP;
 	nla->nla_len = NLA_HDRLEN;
 
 	/* add XDP fd */
@@ -188,6 +192,16 @@ int bpf_set_link_xdp_fd(int ifindex, int fd, __u32 flags)
 	return ret;
 }
 
+int bpf_set_link_xdp_fd(int ifindex, int fd, __u32 flags)
+{
+	return __bpf_set_link_xdp_fd(ifindex, fd, flags, false);
+}
+
+int bpf_set_link_xdp_tx_fd(int ifindex, int fd, __u32 flags)
+{
+	return __bpf_set_link_xdp_fd(ifindex, fd, flags, true);
+}
+
 static int __dump_link_nlmsg(struct nlmsghdr *nlh,
 			     libbpf_dump_nlmsg_t dump_link_nlmsg, void *cookie)
 {
@@ -203,20 +217,23 @@ static int __dump_link_nlmsg(struct nlmsghdr *nlh,
 	return dump_link_nlmsg(cookie, ifi, tb);
 }
 
-static int get_xdp_info(void *cookie, void *msg, struct nlattr **tb)
+static int __get_xdp_info(void *cookie, void *msg, struct nlattr **tb, bool tx)
 {
 	struct nlattr *xdp_tb[IFLA_XDP_MAX + 1];
 	struct xdp_id_md *xdp_id = cookie;
 	struct ifinfomsg *ifinfo = msg;
+	struct nlattr *attr;
 	int ret;
 
 	if (xdp_id->ifindex && xdp_id->ifindex != ifinfo->ifi_index)
 		return 0;
 
-	if (!tb[IFLA_XDP])
-		return 0;
+	if (tx)
+		attr = tb[IFLA_XDP_TX];
+	else
+		attr = tb[IFLA_XDP];
 
-	ret = libbpf_nla_parse_nested(xdp_tb, IFLA_XDP_MAX, tb[IFLA_XDP], NULL);
+	ret = libbpf_nla_parse_nested(xdp_tb, IFLA_XDP_MAX, attr, NULL);
 	if (ret)
 		return ret;
 
@@ -248,8 +265,22 @@ static int get_xdp_info(void *cookie, void *msg, struct nlattr **tb)
 	return 0;
 }
 
-int bpf_get_link_xdp_info(int ifindex, struct xdp_link_info *info,
-			  size_t info_size, __u32 flags)
+static int get_xdp_tx_info(void *cookie, void *msg, struct nlattr **tb)
+{
+	if (!tb[IFLA_XDP_TX])
+		return 0;
+	return __get_xdp_info(cookie, msg, tb, true);
+}
+
+static int get_xdp_info(void *cookie, void *msg, struct nlattr **tb)
+{
+	if (!tb[IFLA_XDP])
+		return 0;
+	return __get_xdp_info(cookie, msg, tb, false);
+}
+
+static int __bpf_get_link_xdp_info(int ifindex, struct xdp_link_info *info,
+			  size_t info_size, __u32 flags, bool tx)
 {
 	struct xdp_id_md xdp_id = {};
 	int sock, ret;
@@ -272,7 +303,11 @@ int bpf_get_link_xdp_info(int ifindex, struct xdp_link_info *info,
 	xdp_id.ifindex = ifindex;
 	xdp_id.flags = flags;
 
-	ret = libbpf_nl_get_link(sock, nl_pid, get_xdp_info, &xdp_id);
+	if (tx)
+		ret = libbpf_nl_get_link(sock, nl_pid, get_xdp_tx_info,
+					 &xdp_id);
+	else
+		ret = libbpf_nl_get_link(sock, nl_pid, get_xdp_info, &xdp_id);
 	if (!ret) {
 		size_t sz = min(info_size, sizeof(xdp_id.info));
 
@@ -284,6 +319,18 @@ int bpf_get_link_xdp_info(int ifindex, struct xdp_link_info *info,
 	return ret;
 }
 
+int bpf_get_link_xdp_info(int ifindex, struct xdp_link_info *info,
+			  size_t info_size, __u32 flags)
+{
+	return __bpf_get_link_xdp_info(ifindex, info, info_size, flags, false);
+}
+
+int bpf_get_link_xdp_tx_info(int ifindex, struct xdp_link_info *info,
+			  size_t info_size, __u32 flags)
+{
+	return __bpf_get_link_xdp_info(ifindex, info, info_size, flags, true);
+}
+
 static __u32 get_xdp_id(struct xdp_link_info *info, __u32 flags)
 {
 	if (info->attach_mode != XDP_ATTACHED_MULTI)
@@ -310,6 +357,18 @@ int bpf_get_link_xdp_id(int ifindex, __u32 *prog_id, __u32 flags)
 	return ret;
 }
 
+int bpf_get_link_xdp_tx_id(int ifindex, __u32 *prog_id, __u32 flags)
+{
+	struct xdp_link_info info;
+	int ret;
+
+	ret = bpf_get_link_xdp_tx_info(ifindex, &info, sizeof(info), flags);
+	if (!ret)
+		*prog_id = get_xdp_id(&info, flags);
+
+	return ret;
+}
+
 int libbpf_nl_get_link(int sock, unsigned int nl_pid,
 		       libbpf_dump_nlmsg_t dump_link_nlmsg, void *cookie)
 {
-- 
2.21.0



* [RFC net-next 04/14] samples/bpf: xdp1, add XDP tx support
  2019-12-18  8:10 [RFC net-next 00/14] XDP in tx path Prashant Bhole
                   ` (2 preceding siblings ...)
  2019-12-18  8:10 ` [RFC net-next 03/14] libbpf: API for tx path XDP support Prashant Bhole
@ 2019-12-18  8:10 ` Prashant Bhole
  2019-12-18  8:10 ` [RFC net-next 05/14] net: core: rename netif_receive_generic_xdp() to do_xdp_generic_core() Prashant Bhole
                   ` (9 subsequent siblings)
  13 siblings, 0 replies; 36+ messages in thread
From: Prashant Bhole @ 2019-12-18  8:10 UTC (permalink / raw)
  To: David S . Miller, Michael S . Tsirkin, Alexei Starovoitov,
	Daniel Borkmann, Jesper Dangaard Brouer
  Cc: Prashant Bhole, Jason Wang, David Ahern, Jakub Kicinski,
	John Fastabend, Toshiaki Makita, Martin KaFai Lau, Song Liu,
	Yonghong Song, Andrii Nakryiko, netdev

xdp1 and xdp2 now accept a -T flag to attach the XDP program to the tx
path.

Signed-off-by: Prashant Bhole <prashantbhole.linux@gmail.com>
---
 samples/bpf/xdp1_user.c | 28 +++++++++++++++++++++++-----
 1 file changed, 23 insertions(+), 5 deletions(-)

diff --git a/samples/bpf/xdp1_user.c b/samples/bpf/xdp1_user.c
index 3e553eed95a7..16408f5afa4d 100644
--- a/samples/bpf/xdp1_user.c
+++ b/samples/bpf/xdp1_user.c
@@ -21,17 +21,26 @@
 static int ifindex;
 static __u32 xdp_flags = XDP_FLAGS_UPDATE_IF_NOEXIST;
 static __u32 prog_id;
+static bool tx_path;
 
 static void int_exit(int sig)
 {
 	__u32 curr_prog_id = 0;
+	int err;
 
-	if (bpf_get_link_xdp_id(ifindex, &curr_prog_id, xdp_flags)) {
+	if (tx_path)
+		err = bpf_get_link_xdp_tx_id(ifindex, &curr_prog_id, xdp_flags);
+	else
+		err = bpf_get_link_xdp_id(ifindex, &curr_prog_id, xdp_flags);
+	if (err) {
 		printf("bpf_get_link_xdp_id failed\n");
 		exit(1);
 	}
 	if (prog_id == curr_prog_id)
-		bpf_set_link_xdp_fd(ifindex, -1, xdp_flags);
+		if (tx_path)
+			bpf_set_link_xdp_tx_fd(ifindex, -1, xdp_flags);
+		else
+			bpf_set_link_xdp_fd(ifindex, -1, xdp_flags);
 	else if (!curr_prog_id)
 		printf("couldn't find a prog id on a given interface\n");
 	else
@@ -73,7 +82,8 @@ static void usage(const char *prog)
 		"OPTS:\n"
 		"    -S    use skb-mode\n"
 		"    -N    enforce native mode\n"
-		"    -F    force loading prog\n",
+		"    -F    force loading prog\n"
+		"    -T    TX path prog\n",
 		prog);
 }
 
@@ -85,7 +95,7 @@ int main(int argc, char **argv)
 	};
 	struct bpf_prog_info info = {};
 	__u32 info_len = sizeof(info);
-	const char *optstr = "FSN";
+	const char *optstr = "FSNT";
 	int prog_fd, map_fd, opt;
 	struct bpf_object *obj;
 	struct bpf_map *map;
@@ -103,6 +113,9 @@ int main(int argc, char **argv)
 		case 'F':
 			xdp_flags &= ~XDP_FLAGS_UPDATE_IF_NOEXIST;
 			break;
+		case 'T':
+			tx_path = true;
+			break;
 		default:
 			usage(basename(argv[0]));
 			return 1;
@@ -146,7 +159,12 @@ int main(int argc, char **argv)
 	signal(SIGINT, int_exit);
 	signal(SIGTERM, int_exit);
 
-	if (bpf_set_link_xdp_fd(ifindex, prog_fd, xdp_flags) < 0) {
+	if (tx_path)
+		err = bpf_set_link_xdp_tx_fd(ifindex, prog_fd, xdp_flags);
+	else
+		err = bpf_set_link_xdp_fd(ifindex, prog_fd, xdp_flags);
+
+	if (err < 0) {
 		printf("link set xdp fd failed\n");
 		return 1;
 	}
-- 
2.21.0



* [RFC net-next 05/14] net: core: rename netif_receive_generic_xdp() to do_xdp_generic_core()
  2019-12-18  8:10 [RFC net-next 00/14] XDP in tx path Prashant Bhole
                   ` (3 preceding siblings ...)
  2019-12-18  8:10 ` [RFC net-next 04/14] samples/bpf: xdp1, add XDP tx support Prashant Bhole
@ 2019-12-18  8:10 ` Prashant Bhole
  2019-12-18  8:10 ` [RFC net-next 06/14] net: core: export do_xdp_generic_core() Prashant Bhole
                   ` (8 subsequent siblings)
  13 siblings, 0 replies; 36+ messages in thread
From: Prashant Bhole @ 2019-12-18  8:10 UTC (permalink / raw)
  To: David S . Miller, Michael S . Tsirkin, Alexei Starovoitov,
	Daniel Borkmann, Jesper Dangaard Brouer
  Cc: Jason Wang, David Ahern, Jakub Kicinski, John Fastabend,
	Toshiaki Makita, Martin KaFai Lau, Song Liu, Yonghong Song,
	Andrii Nakryiko, netdev, Prashant Bhole

From: Jason Wang <jasowang@redhat.com>

In the skb generic path, we need a way to run an XDP program on an skb
but with customized handling of the XDP actions.
netif_receive_generic_xdp() is more suitable for such cases than
do_xdp_generic().

This patch prepares netif_receive_generic_xdp() to be used as a
general-purpose function by renaming it.

Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Prashant Bhole <prashantbhole.linux@gmail.com>
---
 net/core/dev.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/net/core/dev.c b/net/core/dev.c
index a46db72759cf..6bfe6fadea17 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -4461,9 +4461,9 @@ static struct netdev_rx_queue *netif_get_rxqueue(struct sk_buff *skb)
 	return rxqueue;
 }
 
-static u32 netif_receive_generic_xdp(struct sk_buff *skb,
-				     struct xdp_buff *xdp,
-				     struct bpf_prog *xdp_prog)
+static u32 do_xdp_generic_core(struct sk_buff *skb,
+			       struct xdp_buff *xdp,
+			       struct bpf_prog *xdp_prog)
 {
 	struct netdev_rx_queue *rxqueue;
 	void *orig_data, *orig_data_end;
@@ -4610,7 +4610,7 @@ int do_xdp_generic(struct bpf_prog *xdp_prog, struct sk_buff *skb)
 		u32 act;
 		int err;
 
-		act = netif_receive_generic_xdp(skb, &xdp, xdp_prog);
+		act = do_xdp_generic_core(skb, &xdp, xdp_prog);
 		if (act != XDP_PASS) {
 			switch (act) {
 			case XDP_REDIRECT:
-- 
2.21.0



* [RFC net-next 06/14] net: core: export do_xdp_generic_core()
  2019-12-18  8:10 [RFC net-next 00/14] XDP in tx path Prashant Bhole
                   ` (4 preceding siblings ...)
  2019-12-18  8:10 ` [RFC net-next 05/14] net: core: rename netif_receive_generic_xdp() to do_xdp_generic_core() Prashant Bhole
@ 2019-12-18  8:10 ` Prashant Bhole
  2019-12-18  8:10 ` [RFC net-next 07/14] tuntap: check tun_msg_ctl type at necessary places Prashant Bhole
                   ` (7 subsequent siblings)
  13 siblings, 0 replies; 36+ messages in thread
From: Prashant Bhole @ 2019-12-18  8:10 UTC (permalink / raw)
  To: David S . Miller, Michael S . Tsirkin, Alexei Starovoitov,
	Daniel Borkmann, Jesper Dangaard Brouer
  Cc: Jason Wang, David Ahern, Jakub Kicinski, John Fastabend,
	Toshiaki Makita, Martin KaFai Lau, Song Liu, Yonghong Song,
	Andrii Nakryiko, netdev, Prashant Bhole

From: Jason Wang <jasowang@redhat.com>

Let's export do_xdp_generic_core() as a general-purpose function. It
just runs an XDP program on an skb and does not handle the XDP actions.

Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Prashant Bhole <prashantbhole.linux@gmail.com>
---
 include/linux/netdevice.h | 2 ++
 net/core/dev.c            | 6 +++---
 2 files changed, 5 insertions(+), 3 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 7b5256d374d2..c637754819d3 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -3660,6 +3660,8 @@ static inline void dev_consume_skb_any(struct sk_buff *skb)
 
 void generic_xdp_tx(struct sk_buff *skb, struct bpf_prog *xdp_prog);
 int do_xdp_generic(struct bpf_prog *xdp_prog, struct sk_buff *skb);
+u32 do_xdp_generic_core(struct sk_buff *skb, struct xdp_buff *xdp,
+			struct bpf_prog *xdp_prog);
 int netif_rx(struct sk_buff *skb);
 int netif_rx_ni(struct sk_buff *skb);
 int netif_receive_skb(struct sk_buff *skb);
diff --git a/net/core/dev.c b/net/core/dev.c
index 6bfe6fadea17..c45e8eaebf4f 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -4461,9 +4461,8 @@ static struct netdev_rx_queue *netif_get_rxqueue(struct sk_buff *skb)
 	return rxqueue;
 }
 
-static u32 do_xdp_generic_core(struct sk_buff *skb,
-			       struct xdp_buff *xdp,
-			       struct bpf_prog *xdp_prog)
+u32 do_xdp_generic_core(struct sk_buff *skb, struct xdp_buff *xdp,
+			struct bpf_prog *xdp_prog)
 {
 	struct netdev_rx_queue *rxqueue;
 	void *orig_data, *orig_data_end;
@@ -4574,6 +4573,7 @@ static u32 do_xdp_generic_core(struct sk_buff *skb,
 
 	return act;
 }
+EXPORT_SYMBOL_GPL(do_xdp_generic_core);
 
 /* When doing generic XDP we have to bypass the qdisc layer and the
  * network taps in order to match in-driver-XDP behavior.
-- 
2.21.0



* [RFC net-next 07/14] tuntap: check tun_msg_ctl type at necessary places
  2019-12-18  8:10 [RFC net-next 00/14] XDP in tx path Prashant Bhole
                   ` (5 preceding siblings ...)
  2019-12-18  8:10 ` [RFC net-next 06/14] net: core: export do_xdp_generic_core() Prashant Bhole
@ 2019-12-18  8:10 ` Prashant Bhole
  2019-12-18  8:10 ` [RFC net-next 08/14] vhost_net: use tap recvmsg api to access ptr ring Prashant Bhole
                   ` (6 subsequent siblings)
  13 siblings, 0 replies; 36+ messages in thread
From: Prashant Bhole @ 2019-12-18  8:10 UTC (permalink / raw)
  To: David S . Miller, Michael S . Tsirkin, Alexei Starovoitov,
	Daniel Borkmann, Jesper Dangaard Brouer
  Cc: Prashant Bhole, Jason Wang, David Ahern, Jakub Kicinski,
	John Fastabend, Toshiaki Makita, Martin KaFai Lau, Song Liu,
	Yonghong Song, Andrii Nakryiko, netdev

tun_msg_ctl is used by vhost_net to communicate with tuntap. We will
introduce another type soon. As a preparation, this patch adds
conditions to check the tun_msg_ctl type at the necessary places.

Signed-off-by: Prashant Bhole <prashantbhole.linux@gmail.com>
---
 drivers/net/tap.c | 7 +++++--
 drivers/net/tun.c | 6 +++++-
 2 files changed, 10 insertions(+), 3 deletions(-)

diff --git a/drivers/net/tap.c b/drivers/net/tap.c
index a6d63665ad03..a0a5dc18109a 100644
--- a/drivers/net/tap.c
+++ b/drivers/net/tap.c
@@ -1203,6 +1203,7 @@ static int tap_sendmsg(struct socket *sock, struct msghdr *m,
 	struct tap_queue *q = container_of(sock, struct tap_queue, sock);
 	struct tun_msg_ctl *ctl = m->msg_control;
 	struct xdp_buff *xdp;
+	void *ptr = NULL;
 	int i;
 
 	if (ctl && (ctl->type == TUN_MSG_PTR)) {
@@ -1213,8 +1214,10 @@ static int tap_sendmsg(struct socket *sock, struct msghdr *m,
 		return 0;
 	}
 
-	return tap_get_user(q, ctl ? ctl->ptr : NULL, &m->msg_iter,
-			    m->msg_flags & MSG_DONTWAIT);
+	if (ctl && ctl->type == TUN_MSG_UBUF)
+		ptr = ctl->ptr;
+
+	return tap_get_user(q, ptr, &m->msg_iter, m->msg_flags & MSG_DONTWAIT);
 }
 
 static int tap_recvmsg(struct socket *sock, struct msghdr *m,
diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index 683d371e6e82..1e436d9ec4e1 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -2529,6 +2529,7 @@ static int tun_sendmsg(struct socket *sock, struct msghdr *m, size_t total_len)
 	struct tun_struct *tun = tun_get(tfile);
 	struct tun_msg_ctl *ctl = m->msg_control;
 	struct xdp_buff *xdp;
+	void *ptr = NULL;
 
 	if (!tun)
 		return -EBADFD;
@@ -2560,7 +2561,10 @@ static int tun_sendmsg(struct socket *sock, struct msghdr *m, size_t total_len)
 		goto out;
 	}
 
-	ret = tun_get_user(tun, tfile, ctl ? ctl->ptr : NULL, &m->msg_iter,
+	if (ctl && ctl->type == TUN_MSG_UBUF)
+		ptr = ctl->ptr;
+
+	ret = tun_get_user(tun, tfile, ptr, &m->msg_iter,
 			   m->msg_flags & MSG_DONTWAIT,
 			   m->msg_flags & MSG_MORE);
 out:
-- 
2.21.0



* [RFC net-next 08/14] vhost_net: use tap recvmsg api to access ptr ring
  2019-12-18  8:10 [RFC net-next 00/14] XDP in tx path Prashant Bhole
                   ` (6 preceding siblings ...)
  2019-12-18  8:10 ` [RFC net-next 07/14] tuntap: check tun_msg_ctl type at necessary places Prashant Bhole
@ 2019-12-18  8:10 ` Prashant Bhole
  2019-12-18  8:10 ` [RFC net-next 09/14] tuntap: remove usage of ptr ring in vhost_net Prashant Bhole
                   ` (5 subsequent siblings)
  13 siblings, 0 replies; 36+ messages in thread
From: Prashant Bhole @ 2019-12-18  8:10 UTC (permalink / raw)
  To: David S . Miller, Michael S . Tsirkin, Alexei Starovoitov,
	Daniel Borkmann, Jesper Dangaard Brouer
  Cc: Prashant Bhole, Jason Wang, David Ahern, Jakub Kicinski,
	John Fastabend, Toshiaki Makita, Martin KaFai Lau, Song Liu,
	Yonghong Song, Andrii Nakryiko, netdev

Currently vhost_net directly accesses the ptr ring of the tap driver
to fetch Rx packet pointers. In order to avoid this, this patch
modifies the tap driver's recvmsg API to do the additional task of
fetching Rx packet pointers.

A special struct tun_msg_ctl is already being passed via msg_control
for tun Rx XDP batching. This patch extends tun_msg_ctl usage to send
sub-commands to the recvmsg API. tun_recvmsg() now handles commands to
consume and unconsume packet pointers from the ptr ring.

This will be useful in the implementation of tx path XDP in the tun
driver, where the XDP program will process the packet before it is
passed to vhost_net.
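
Concretely, the caller packs a sub-command into msg_control and issues
a plain recvmsg; a condensed kernel-side sketch of the consume case
(the helper name is illustrative):

/* consume up to 'n' packet pointers from the tap/tun internal ring;
 * returns the number of pointers written into pkts[]
 */
static int fetch_rx_batch(struct socket *sock, void **pkts,
			  unsigned short n)
{
	struct tun_msg_ctl ctl = {
		.type = TUN_MSG_CONSUME_PKTS,
		.ptr = pkts,
		.num = n,
	};
	struct msghdr msg = {
		.msg_control = &ctl,
	};

	return sock->ops->recvmsg(sock, &msg, 0, 0);
}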

Signed-off-by: Prashant Bhole <prashantbhole.linux@gmail.com>
---
 drivers/net/tap.c      | 22 ++++++++++++++++++-
 drivers/net/tun.c      | 24 ++++++++++++++++++++-
 drivers/vhost/net.c    | 48 +++++++++++++++++++++++++++++++-----------
 include/linux/if_tun.h | 18 ++++++++++++++++
 4 files changed, 98 insertions(+), 14 deletions(-)

diff --git a/drivers/net/tap.c b/drivers/net/tap.c
index a0a5dc18109a..a5ce44db11a3 100644
--- a/drivers/net/tap.c
+++ b/drivers/net/tap.c
@@ -1224,8 +1224,28 @@ static int tap_recvmsg(struct socket *sock, struct msghdr *m,
 		       size_t total_len, int flags)
 {
 	struct tap_queue *q = container_of(sock, struct tap_queue, sock);
-	struct sk_buff *skb = m->msg_control;
+	struct tun_msg_ctl *ctl = m->msg_control;
+	struct sk_buff *skb = NULL;
 	int ret;
+
+	if (ctl) {
+		switch (ctl->type) {
+		case TUN_MSG_PKT:
+			skb = ctl->ptr;
+			break;
+		case TUN_MSG_CONSUME_PKTS:
+			return ptr_ring_consume_batched(&q->ring,
+							ctl->ptr,
+							ctl->num);
+		case TUN_MSG_UNCONSUME_PKTS:
+			ptr_ring_unconsume(&q->ring, ctl->ptr, ctl->num,
+					   tun_ptr_free);
+			return 0;
+		default:
+			return -EINVAL;
+		}
+	}
+
 	if (flags & ~(MSG_DONTWAIT|MSG_TRUNC)) {
 		kfree_skb(skb);
 		return -EINVAL;
diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index 1e436d9ec4e1..4f28f2387435 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -2577,7 +2577,8 @@ static int tun_recvmsg(struct socket *sock, struct msghdr *m, size_t total_len,
 {
 	struct tun_file *tfile = container_of(sock, struct tun_file, socket);
 	struct tun_struct *tun = tun_get(tfile);
-	void *ptr = m->msg_control;
+	struct tun_msg_ctl *ctl = m->msg_control;
+	void *ptr = NULL;
 	int ret;
 
 	if (!tun) {
@@ -2585,6 +2586,27 @@ static int tun_recvmsg(struct socket *sock, struct msghdr *m, size_t total_len,
 		goto out_free;
 	}
 
+	if (ctl) {
+		switch (ctl->type) {
+		case TUN_MSG_PKT:
+			ptr = ctl->ptr;
+			break;
+		case TUN_MSG_CONSUME_PKTS:
+			ret = ptr_ring_consume_batched(&tfile->tx_ring,
+						       ctl->ptr,
+						       ctl->num);
+			goto out;
+		case TUN_MSG_UNCONSUME_PKTS:
+			ptr_ring_unconsume(&tfile->tx_ring, ctl->ptr,
+					   ctl->num, tun_ptr_free);
+			ret = 0;
+			goto out;
+		default:
+			ret = -EINVAL;
+			goto out_put_tun;
+		}
+	}
+
 	if (flags & ~(MSG_DONTWAIT|MSG_TRUNC|MSG_ERRQUEUE)) {
 		ret = -EINVAL;
 		goto out_put_tun;
diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index e158159671fa..482548d00105 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -175,24 +175,44 @@ static void *vhost_net_buf_consume(struct vhost_net_buf *rxq)
 
 static int vhost_net_buf_produce(struct vhost_net_virtqueue *nvq)
 {
+	struct vhost_virtqueue *vq = &nvq->vq;
+	struct socket *sock = vq->private_data;
 	struct vhost_net_buf *rxq = &nvq->rxq;
+	struct tun_msg_ctl ctl = {
+		.type = TUN_MSG_CONSUME_PKTS,
+		.ptr = (void *) rxq->queue,
+		.num = VHOST_NET_BATCH,
+	};
+	struct msghdr msg = {
+		.msg_control = &ctl,
+	};
 
 	rxq->head = 0;
-	rxq->tail = ptr_ring_consume_batched(nvq->rx_ring, rxq->queue,
-					      VHOST_NET_BATCH);
+	rxq->tail = sock->ops->recvmsg(sock, &msg, 0, 0);
+	if (WARN_ON_ONCE(rxq->tail < 0))
+		rxq->tail = 0;
+
 	return rxq->tail;
 }
 
 static void vhost_net_buf_unproduce(struct vhost_net_virtqueue *nvq)
 {
+	struct vhost_virtqueue *vq = &nvq->vq;
+	struct socket *sock = vq->private_data;
 	struct vhost_net_buf *rxq = &nvq->rxq;
+	struct tun_msg_ctl ctl = {
+		.type = TUN_MSG_UNCONSUME_PKTS,
+		.ptr = (void *) (rxq->queue + rxq->head),
+		.num = vhost_net_buf_get_size(rxq),
+	};
+	struct msghdr msg = {
+		.msg_control = &ctl,
+	};
 
-	if (nvq->rx_ring && !vhost_net_buf_is_empty(rxq)) {
-		ptr_ring_unconsume(nvq->rx_ring, rxq->queue + rxq->head,
-				   vhost_net_buf_get_size(rxq),
-				   tun_ptr_free);
-		rxq->head = rxq->tail = 0;
-	}
+	if (!vhost_net_buf_is_empty(rxq))
+		sock->ops->recvmsg(sock, &msg, 0, 0);
+
+	rxq->head = rxq->tail = 0;
 }
 
 static int vhost_net_buf_peek_len(void *ptr)
@@ -1109,6 +1129,7 @@ static void handle_rx(struct vhost_net *net)
 		.flags = 0,
 		.gso_type = VIRTIO_NET_HDR_GSO_NONE
 	};
+	struct tun_msg_ctl ctl;
 	size_t total_len = 0;
 	int err, mergeable;
 	s16 headcount;
@@ -1166,8 +1187,11 @@ static void handle_rx(struct vhost_net *net)
 			goto out;
 		}
 		busyloop_intr = false;
-		if (nvq->rx_ring)
-			msg.msg_control = vhost_net_buf_consume(&nvq->rxq);
+		if (nvq->rx_ring) {
+			ctl.type = TUN_MSG_PKT;
+			ctl.ptr = vhost_net_buf_consume(&nvq->rxq);
+			msg.msg_control = &ctl;
+		}
 		/* On overrun, truncate and discard */
 		if (unlikely(headcount > UIO_MAXIOV)) {
 			iov_iter_init(&msg.msg_iter, READ, vq->iov, 1, 1);
@@ -1346,8 +1370,8 @@ static struct socket *vhost_net_stop_vq(struct vhost_net *n,
 	mutex_lock(&vq->mutex);
 	sock = vq->private_data;
 	vhost_net_disable_vq(n, vq);
-	vq->private_data = NULL;
 	vhost_net_buf_unproduce(nvq);
+	vq->private_data = NULL;
 	nvq->rx_ring = NULL;
 	mutex_unlock(&vq->mutex);
 	return sock;
@@ -1538,8 +1562,8 @@ static long vhost_net_set_backend(struct vhost_net *n, unsigned index, int fd)
 		}
 
 		vhost_net_disable_vq(n, vq);
-		vq->private_data = sock;
 		vhost_net_buf_unproduce(nvq);
+		vq->private_data = sock;
 		r = vhost_vq_init_access(vq);
 		if (r)
 			goto err_used;
diff --git a/include/linux/if_tun.h b/include/linux/if_tun.h
index 5bda8cf457b6..bb94843e3829 100644
--- a/include/linux/if_tun.h
+++ b/include/linux/if_tun.h
@@ -11,8 +11,26 @@
 
 #define TUN_XDP_FLAG 0x1UL
 
+/*
+ * tun_msg_ctl types
+ */
+
 #define TUN_MSG_UBUF 1
 #define TUN_MSG_PTR  2
+/*
+ * Used for passing a packet pointer from vhost to tun
+ */
+#define TUN_MSG_PKT  3
+/*
+ * Used for passing an array of pointers from vhost to tun.
+ * tun consumes packets from the ptr ring and stores them in the array.
+ */
+#define TUN_MSG_CONSUME_PKTS    4
+/*
+ * Used for passing an array of pointers from vhost to tun.
+ * tun takes pointers from the array and puts them back into the ring.
+ */
+#define TUN_MSG_UNCONSUME_PKTS  5
 struct tun_msg_ctl {
 	unsigned short type;
 	unsigned short num;
-- 
2.21.0



* [RFC net-next 09/14] tuntap: remove usage of ptr ring in vhost_net
  2019-12-18  8:10 [RFC net-next 00/14] XDP in tx path Prashant Bhole
                   ` (7 preceding siblings ...)
  2019-12-18  8:10 ` [RFC net-next 08/14] vhost_net: use tap recvmsg api to access ptr ring Prashant Bhole
@ 2019-12-18  8:10 ` Prashant Bhole
  2019-12-18  8:10 ` [RFC net-next 10/14] tun: set tx path XDP program Prashant Bhole
                   ` (4 subsequent siblings)
  13 siblings, 0 replies; 36+ messages in thread
From: Prashant Bhole @ 2019-12-18  8:10 UTC (permalink / raw)
  To: David S . Miller, Michael S . Tsirkin, Alexei Starovoitov,
	Daniel Borkmann, Jesper Dangaard Brouer
  Cc: Prashant Bhole, Jason Wang, David Ahern, Jakub Kicinski,
	John Fastabend, Toshiaki Makita, Martin KaFai Lau, Song Liu,
	Yonghong Song, Andrii Nakryiko, netdev

Remove the usage of the tuntap ptr ring in vhost_net, and remove the
functions exported from the tuntap drivers for getting at the ptr ring.

Signed-off-by: Prashant Bhole <prashantbhole.linux@gmail.com>
---
 drivers/net/tap.c      | 13 -------------
 drivers/net/tun.c      | 13 -------------
 drivers/vhost/net.c    | 31 ++++---------------------------
 include/linux/if_tap.h |  5 -----
 include/linux/if_tun.h |  5 -----
 5 files changed, 4 insertions(+), 63 deletions(-)

diff --git a/drivers/net/tap.c b/drivers/net/tap.c
index a5ce44db11a3..fe816a99275d 100644
--- a/drivers/net/tap.c
+++ b/drivers/net/tap.c
@@ -1288,19 +1288,6 @@ struct socket *tap_get_socket(struct file *file)
 }
 EXPORT_SYMBOL_GPL(tap_get_socket);
 
-struct ptr_ring *tap_get_ptr_ring(struct file *file)
-{
-	struct tap_queue *q;
-
-	if (file->f_op != &tap_fops)
-		return ERR_PTR(-EINVAL);
-	q = file->private_data;
-	if (!q)
-		return ERR_PTR(-EBADFD);
-	return &q->ring;
-}
-EXPORT_SYMBOL_GPL(tap_get_ptr_ring);
-
 int tap_queue_resize(struct tap_dev *tap)
 {
 	struct net_device *dev = tap->dev;
diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index 4f28f2387435..d078b4659897 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -3750,19 +3750,6 @@ struct socket *tun_get_socket(struct file *file)
 }
 EXPORT_SYMBOL_GPL(tun_get_socket);
 
-struct ptr_ring *tun_get_tx_ring(struct file *file)
-{
-	struct tun_file *tfile;
-
-	if (file->f_op != &tun_fops)
-		return ERR_PTR(-EINVAL);
-	tfile = file->private_data;
-	if (!tfile)
-		return ERR_PTR(-EBADFD);
-	return &tfile->tx_ring;
-}
-EXPORT_SYMBOL_GPL(tun_get_tx_ring);
-
 module_init(tun_init);
 module_exit(tun_cleanup);
 MODULE_DESCRIPTION(DRV_DESCRIPTION);
diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index 482548d00105..30b5c68193c9 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -122,7 +122,6 @@ struct vhost_net_virtqueue {
 	/* Reference counting for outstanding ubufs.
 	 * Protected by vq mutex. Writers must also take device mutex. */
 	struct vhost_net_ubuf_ref *ubufs;
-	struct ptr_ring *rx_ring;
 	struct vhost_net_buf rxq;
 	/* Batched XDP buffs */
 	struct xdp_buff *xdp;
@@ -997,8 +996,9 @@ static int peek_head_len(struct vhost_net_virtqueue *rvq, struct sock *sk)
 	int len = 0;
 	unsigned long flags;
 
-	if (rvq->rx_ring)
-		return vhost_net_buf_peek(rvq);
+	len = vhost_net_buf_peek(rvq);
+	if (len)
+		return len;
 
 	spin_lock_irqsave(&sk->sk_receive_queue.lock, flags);
 	head = skb_peek(&sk->sk_receive_queue);
@@ -1187,7 +1187,7 @@ static void handle_rx(struct vhost_net *net)
 			goto out;
 		}
 		busyloop_intr = false;
-		if (nvq->rx_ring) {
+		if (!vhost_net_buf_is_empty(&nvq->rxq)) {
 			ctl.type = TUN_MSG_PKT;
 			ctl.ptr = vhost_net_buf_consume(&nvq->rxq);
 			msg.msg_control = &ctl;
@@ -1343,7 +1343,6 @@ static int vhost_net_open(struct inode *inode, struct file *f)
 		n->vqs[i].batched_xdp = 0;
 		n->vqs[i].vhost_hlen = 0;
 		n->vqs[i].sock_hlen = 0;
-		n->vqs[i].rx_ring = NULL;
 		vhost_net_buf_init(&n->vqs[i].rxq);
 	}
 	vhost_dev_init(dev, vqs, VHOST_NET_VQ_MAX,
@@ -1372,7 +1371,6 @@ static struct socket *vhost_net_stop_vq(struct vhost_net *n,
 	vhost_net_disable_vq(n, vq);
 	vhost_net_buf_unproduce(nvq);
 	vq->private_data = NULL;
-	nvq->rx_ring = NULL;
 	mutex_unlock(&vq->mutex);
 	return sock;
 }
@@ -1468,25 +1466,6 @@ static struct socket *get_raw_socket(int fd)
 	return ERR_PTR(r);
 }
 
-static struct ptr_ring *get_tap_ptr_ring(int fd)
-{
-	struct ptr_ring *ring;
-	struct file *file = fget(fd);
-
-	if (!file)
-		return NULL;
-	ring = tun_get_tx_ring(file);
-	if (!IS_ERR(ring))
-		goto out;
-	ring = tap_get_ptr_ring(file);
-	if (!IS_ERR(ring))
-		goto out;
-	ring = NULL;
-out:
-	fput(file);
-	return ring;
-}
-
 static struct socket *get_tap_socket(int fd)
 {
 	struct file *file = fget(fd);
@@ -1570,8 +1549,6 @@ static long vhost_net_set_backend(struct vhost_net *n, unsigned index, int fd)
 		r = vhost_net_enable_vq(n, vq);
 		if (r)
 			goto err_used;
-		if (index == VHOST_NET_VQ_RX)
-			nvq->rx_ring = get_tap_ptr_ring(fd);
 
 		oldubufs = nvq->ubufs;
 		nvq->ubufs = ubufs;
diff --git a/include/linux/if_tap.h b/include/linux/if_tap.h
index 915a187cfabd..68fe366fb185 100644
--- a/include/linux/if_tap.h
+++ b/include/linux/if_tap.h
@@ -4,7 +4,6 @@
 
 #if IS_ENABLED(CONFIG_TAP)
 struct socket *tap_get_socket(struct file *);
-struct ptr_ring *tap_get_ptr_ring(struct file *file);
 #else
 #include <linux/err.h>
 #include <linux/errno.h>
@@ -14,10 +13,6 @@ static inline struct socket *tap_get_socket(struct file *f)
 {
 	return ERR_PTR(-EINVAL);
 }
-static inline struct ptr_ring *tap_get_ptr_ring(struct file *f)
-{
-	return ERR_PTR(-EINVAL);
-}
 #endif /* CONFIG_TAP */
 
 #include <net/sock.h>
diff --git a/include/linux/if_tun.h b/include/linux/if_tun.h
index bb94843e3829..f01a255e076d 100644
--- a/include/linux/if_tun.h
+++ b/include/linux/if_tun.h
@@ -44,7 +44,6 @@ struct tun_xdp_hdr {
 
 #if defined(CONFIG_TUN) || defined(CONFIG_TUN_MODULE)
 struct socket *tun_get_socket(struct file *);
-struct ptr_ring *tun_get_tx_ring(struct file *file);
 bool tun_is_xdp_frame(void *ptr);
 void *tun_xdp_to_ptr(void *ptr);
 void *tun_ptr_to_xdp(void *ptr);
@@ -58,10 +57,6 @@ static inline struct socket *tun_get_socket(struct file *f)
 {
 	return ERR_PTR(-EINVAL);
 }
-static inline struct ptr_ring *tun_get_tx_ring(struct file *f)
-{
-	return ERR_PTR(-EINVAL);
-}
 static inline bool tun_is_xdp_frame(void *ptr)
 {
 	return false;
-- 
2.21.0



* [RFC net-next 10/14] tun: set tx path XDP program
  2019-12-18  8:10 [RFC net-next 00/14] XDP in tx path Prashant Bhole
                   ` (8 preceding siblings ...)
  2019-12-18  8:10 ` [RFC net-next 09/14] tuntap: remove usage of ptr ring in vhost_net Prashant Bhole
@ 2019-12-18  8:10 ` Prashant Bhole
  2019-12-18  8:10 ` [RFC net-next 11/14] tun: run XDP program in tx path Prashant Bhole
                   ` (3 subsequent siblings)
  13 siblings, 0 replies; 36+ messages in thread
From: Prashant Bhole @ 2019-12-18  8:10 UTC (permalink / raw)
  To: David S . Miller, Michael S . Tsirkin, Alexei Starovoitov,
	Daniel Borkmann, Jesper Dangaard Brouer
  Cc: David Ahern, Jason Wang, David Ahern, Jakub Kicinski,
	John Fastabend, Toshiaki Makita, Martin KaFai Lau, Song Liu,
	Yonghong Song, Andrii Nakryiko, netdev, Prashant Bhole

From: David Ahern <dahern@digitalocean.com>

This patch adds a way to set a tx path XDP program in the tun driver
by handling XDP_SETUP_PROG_TX and XDP_QUERY_PROG_TX in its ndo_bpf
handler.

Signed-off-by: David Ahern <dahern@digitalocean.com>
Signed-off-by: Prashant Bhole <prashantbhole.linux@gmail.com>
---
 drivers/net/tun.c | 34 ++++++++++++++++++++++++++--------
 1 file changed, 26 insertions(+), 8 deletions(-)

diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index d078b4659897..8aee7abd53a2 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -239,6 +239,7 @@ struct tun_struct {
 	u32 rx_batched;
 	struct tun_pcpu_stats __percpu *pcpu_stats;
 	struct bpf_prog __rcu *xdp_prog;
+	struct bpf_prog __rcu *xdp_tx_prog;
 	struct tun_prog __rcu *steering_prog;
 	struct tun_prog __rcu *filter_prog;
 	struct ethtool_link_ksettings link_ksettings;
@@ -1189,15 +1190,21 @@ tun_net_get_stats64(struct net_device *dev, struct rtnl_link_stats64 *stats)
 }
 
 static int tun_xdp_set(struct net_device *dev, struct bpf_prog *prog,
-		       struct netlink_ext_ack *extack)
+		       bool tx, struct netlink_ext_ack *extack)
 {
 	struct tun_struct *tun = netdev_priv(dev);
 	struct tun_file *tfile;
 	struct bpf_prog *old_prog;
 	int i;
 
-	old_prog = rtnl_dereference(tun->xdp_prog);
-	rcu_assign_pointer(tun->xdp_prog, prog);
+	if (tx) {
+		old_prog = rtnl_dereference(tun->xdp_tx_prog);
+		rcu_assign_pointer(tun->xdp_tx_prog, prog);
+	} else {
+		old_prog = rtnl_dereference(tun->xdp_prog);
+		rcu_assign_pointer(tun->xdp_prog, prog);
+	}
+
 	if (old_prog)
 		bpf_prog_put(old_prog);
 
@@ -1218,12 +1225,16 @@ static int tun_xdp_set(struct net_device *dev, struct bpf_prog *prog,
 	return 0;
 }
 
-static u32 tun_xdp_query(struct net_device *dev)
+static u32 tun_xdp_query(struct net_device *dev, bool tx)
 {
 	struct tun_struct *tun = netdev_priv(dev);
 	const struct bpf_prog *xdp_prog;
 
-	xdp_prog = rtnl_dereference(tun->xdp_prog);
+	if (tx)
+		xdp_prog = rtnl_dereference(tun->xdp_tx_prog);
+	else
+		xdp_prog = rtnl_dereference(tun->xdp_prog);
+
 	if (xdp_prog)
 		return xdp_prog->aux->id;
 
@@ -1234,13 +1245,20 @@ static int tun_xdp(struct net_device *dev, struct netdev_bpf *xdp)
 {
 	switch (xdp->command) {
 	case XDP_SETUP_PROG:
-		return tun_xdp_set(dev, xdp->prog, xdp->extack);
+		return tun_xdp_set(dev, xdp->prog, false, xdp->extack);
+	case XDP_SETUP_PROG_TX:
+		return tun_xdp_set(dev, xdp->prog, true, xdp->extack);
 	case XDP_QUERY_PROG:
-		xdp->prog_id = tun_xdp_query(dev);
-		return 0;
+		xdp->prog_id = tun_xdp_query(dev, false);
+		break;
+	case XDP_QUERY_PROG_TX:
+		xdp->prog_id = tun_xdp_query(dev, true);
+		break;
 	default:
 		return -EINVAL;
 	}
+
+	return 0;
 }
 
 static int tun_net_change_carrier(struct net_device *dev, bool new_carrier)
-- 
2.21.0



* [RFC net-next 11/14] tun: run XDP program in tx path
  2019-12-18  8:10 [RFC net-next 00/14] XDP in tx path Prashant Bhole
                   ` (9 preceding siblings ...)
  2019-12-18  8:10 ` [RFC net-next 10/14] tun: set tx path XDP program Prashant Bhole
@ 2019-12-18  8:10 ` Prashant Bhole
  2019-12-18 10:07   ` Jesper Dangaard Brouer
  2019-12-18  8:10 ` [RFC net-next 12/14] tun: add a way to inject tx path packet into Rx path Prashant Bhole
                   ` (2 subsequent siblings)
  13 siblings, 1 reply; 36+ messages in thread
From: Prashant Bhole @ 2019-12-18  8:10 UTC (permalink / raw)
  To: David S . Miller, Michael S . Tsirkin, Alexei Starovoitov,
	Daniel Borkmann, Jesper Dangaard Brouer
  Cc: Prashant Bhole, Jason Wang, David Ahern, Jakub Kicinski,
	John Fastabend, Toshiaki Makita, Martin KaFai Lau, Song Liu,
	Yonghong Song, Andrii Nakryiko, netdev

Run the XDP program as soon as the packet is removed from the ptr
ring. Since this is XDP in the tx path, the traditional handling of
the XDP_TX/XDP_REDIRECT actions isn't valid. For this reason we call
do_xdp_generic_core() instead of do_xdp_generic(); do_xdp_generic_core()
just runs the program and leaves the action handling to us.
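
Schematically, the verdict handling stays with the caller (this is a
simplified sketch; the full handling is in the diff below):

	act = do_xdp_generic_core(skb, &xdp, xdp_prog);
	switch (act) {
	case XDP_PASS:
		/* hand the packet to the tx path reader as usual */
		break;
	case XDP_TX:
		/* tx path semantics: reinject the skb into the Rx path */
		break;
	default:
		/* XDP_DROP/XDP_ABORTED: skb was freed by the helper */
		break;
	}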

Signed-off-by: Prashant Bhole <prashantbhole.linux@gmail.com>
---
 drivers/net/tun.c | 149 +++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 146 insertions(+), 3 deletions(-)

diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index 8aee7abd53a2..1afded9252f5 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -131,6 +131,7 @@ struct tap_filter {
 /* MAX_TAP_QUEUES 256 is chosen to allow rx/tx queues to be equal
  * to max number of VCPUs in guest. */
 #define MAX_TAP_QUEUES 256
+#define MAX_TAP_BATCH 64
 #define MAX_TAP_FLOWS  4096
 
 #define TUN_FLOW_EXPIRE (3 * HZ)
@@ -2173,6 +2174,109 @@ static ssize_t tun_put_user(struct tun_struct *tun,
 	return total;
 }
 
+static struct sk_buff *tun_prepare_xdp_skb(struct sk_buff *skb)
+{
+	struct sk_buff *nskb;
+
+	if (skb_shared(skb) || skb_cloned(skb)) {
+		nskb = skb_copy(skb, GFP_ATOMIC);
+		consume_skb(skb);
+		return nskb;
+	}
+
+	return skb;
+}
+
+static u32 tun_do_xdp_tx_generic(struct tun_struct *tun,
+				 struct sk_buff *skb)
+{
+	struct bpf_prog *xdp_prog;
+	struct xdp_buff xdp;
+	u32 act = XDP_PASS;
+
+	xdp_prog = rcu_dereference(tun->xdp_tx_prog);
+	if (xdp_prog) {
+		skb = tun_prepare_xdp_skb(skb);
+		if (!skb) {
+			/* original skb was freed in tun_prepare_xdp_skb() */
+			act = XDP_DROP;
+			goto drop;
+		}
+
+		act = do_xdp_generic_core(skb, &xdp, xdp_prog);
+		switch (act) {
+		case XDP_TX:
+			/* Rx path generic XDP will be called in this path
+			 */
+			local_bh_disable();
+			netif_receive_skb(skb);
+			local_bh_enable();
+			break;
+		case XDP_PASS:
+			break;
+		case XDP_REDIRECT:
+			/* Since we are not handling this case yet, let's free
+			 * skb here. In case of XDP_DROP/XDP_ABORTED, the skb
+			 * was already freed in do_xdp_generic_core()
+			 */
+			kfree_skb(skb);
+			/* fall through */
+		default:
+			bpf_warn_invalid_xdp_action(act);
+			/* fall through */
+		case XDP_ABORTED:
+			trace_xdp_exception(tun->dev, xdp_prog, act);
+			/* fall through */
+		case XDP_DROP:
+			goto drop;
+		}
+	}
+
+	return act;
+drop:
+	this_cpu_inc(tun->pcpu_stats->tx_dropped);
+	return act;
+}
+
+static u32 tun_do_xdp_tx(struct tun_struct *tun, struct tun_file *tfile,
+			 struct xdp_frame *frame)
+{
+	struct bpf_prog *xdp_prog;
+	struct tun_page tpage;
+	struct xdp_buff xdp;
+	u32 act = XDP_PASS;
+	int flush = 0;
+
+	xdp_prog = rcu_dereference(tun->xdp_tx_prog);
+	if (xdp_prog) {
+		xdp.data_hard_start = frame->data - frame->headroom;
+		xdp.data = frame->data;
+		xdp.data_end = xdp.data + frame->len;
+		xdp.data_meta = xdp.data - frame->metasize;
+
+		act = bpf_prog_run_xdp(xdp_prog, &xdp);
+		switch (act) {
+		case XDP_PASS:
+			break;
+		case XDP_TX:
+			/* fall through */
+		case XDP_REDIRECT:
+			/* fall through */
+		default:
+			bpf_warn_invalid_xdp_action(act);
+			/* fall through */
+		case XDP_ABORTED:
+			trace_xdp_exception(tun->dev, xdp_prog, act);
+			/* fall through */
+		case XDP_DROP:
+			xdp_return_frame_rx_napi(frame);
+			break;
+		}
+	}
+
+	return act;
+}
+
 static void *tun_ring_recv(struct tun_file *tfile, int noblock, int *err)
 {
 	DECLARE_WAITQUEUE(wait, current);
@@ -2590,6 +2694,47 @@ static int tun_sendmsg(struct socket *sock, struct msghdr *m, size_t total_len)
 	return ret;
 }
 
+static int tun_consume_packets(struct tun_file *tfile, void **ptr_array, int n)
+{
+	struct xdp_frame *frame;
+	struct tun_struct *tun;
+	int i, num_ptrs;
+	int pkt_cnt = 0;
+	void *pkts[MAX_TAP_BATCH];
+	void *ptr;
+	u32 act;
+
+	if (unlikely(!tfile))
+		return 0;
+
+	if (n > MAX_TAP_BATCH)
+		n = MAX_TAP_BATCH;
+
+	rcu_read_lock();
+	tun = rcu_dereference(tfile->tun);
+	if (unlikely(!tun)) {
+		rcu_read_unlock();
+		return 0;
+	}
+
+	num_ptrs = ptr_ring_consume_batched(&tfile->tx_ring, pkts, n);
+	for (i = 0; i < num_ptrs; i++) {
+		ptr = pkts[i];
+		if (tun_is_xdp_frame(ptr)) {
+			frame = tun_ptr_to_xdp(ptr);
+			act = tun_do_xdp_tx(tun, tfile, frame);
+		} else {
+			act = tun_do_xdp_tx_generic(tun, ptr);
+		}
+
+		if (act == XDP_PASS)
+			ptr_array[pkt_cnt++] = ptr;
+	}
+
+	rcu_read_unlock();
+	return pkt_cnt;
+}
+
 static int tun_recvmsg(struct socket *sock, struct msghdr *m, size_t total_len,
 		       int flags)
 {
@@ -2610,9 +2755,7 @@ static int tun_recvmsg(struct socket *sock, struct msghdr *m, size_t total_len,
 			ptr = ctl->ptr;
 			break;
 		case TUN_MSG_CONSUME_PKTS:
-			ret = ptr_ring_consume_batched(&tfile->tx_ring,
-						       ctl->ptr,
-						       ctl->num);
+			ret = tun_consume_packets(tfile, ctl->ptr, ctl->num);
 			goto out;
 		case TUN_MSG_UNCONSUME_PKTS:
 			ptr_ring_unconsume(&tfile->tx_ring, ctl->ptr,
-- 
2.21.0


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [RFC net-next 12/14] tun: add a way to inject tx path packet into Rx path
  2019-12-18  8:10 [RFC net-next 00/14] XDP in tx path Prashant Bhole
                   ` (10 preceding siblings ...)
  2019-12-18  8:10 ` [RFC net-next 11/14] tun: run XDP program in tx path Prashant Bhole
@ 2019-12-18  8:10 ` Prashant Bhole
  2019-12-18  8:10 ` [RFC net-next 13/14] tun: handle XDP_TX action of tx path XDP program Prashant Bhole
  2019-12-18  8:10 ` [RFC net-next 14/14] tun: run xdp prog when tun is read from file interface Prashant Bhole
  13 siblings, 0 replies; 36+ messages in thread
From: Prashant Bhole @ 2019-12-18  8:10 UTC (permalink / raw)
  To: David S . Miller, Michael S . Tsirkin, Alexei Starovoitov,
	Daniel Borkmann, Jesper Dangaard Brouer
  Cc: Prashant Bhole, Jason Wang, David Ahern, Jakub Kicinski,
	John Fastabend, Toshiaki Makita, Martin KaFai Lau, Song Liu,
	Yonghong Song, Andrii Nakryiko, netdev

In order to support the XDP_TX action from a tx path XDP program, we
need a way to inject tx path packets into the Rx path. Let's modify
the Rx path function tun_xdp_one() for this purpose.

This patch adds a parameter that tells tun_xdp_one() whether the
packet is preceded by a virtio_net header.

Signed-off-by: Prashant Bhole <prashantbhole.linux@gmail.com>
---
 drivers/net/tun.c | 41 +++++++++++++++++++++++++++++++----------
 1 file changed, 31 insertions(+), 10 deletions(-)

diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index 1afded9252f5..0701a7a80346 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -2238,6 +2238,13 @@ static u32 tun_do_xdp_tx_generic(struct tun_struct *tun,
 	return act;
 }
 
+static int tun_xdp_one(struct tun_struct *tun,
+		       struct tun_file *tfile,
+		       struct xdp_buff *xdp, int *flush,
+		       struct tun_page *tpage, int has_hdr);
+
+static void tun_put_page(struct tun_page *tpage);
+
 static u32 tun_do_xdp_tx(struct tun_struct *tun, struct tun_file *tfile,
 			 struct xdp_frame *frame)
 {
@@ -2543,23 +2550,36 @@ static void tun_put_page(struct tun_page *tpage)
 static int tun_xdp_one(struct tun_struct *tun,
 		       struct tun_file *tfile,
 		       struct xdp_buff *xdp, int *flush,
-		       struct tun_page *tpage)
+		       struct tun_page *tpage, int has_hdr)
 {
 	unsigned int datasize = xdp->data_end - xdp->data;
-	struct tun_xdp_hdr *hdr = xdp->data_hard_start;
-	struct virtio_net_hdr *gso = &hdr->gso;
+	struct virtio_net_hdr *gso = NULL;
 	struct tun_pcpu_stats *stats;
-	struct bpf_prog *xdp_prog;
 	struct sk_buff *skb = NULL;
-	u32 rxhash = 0, act;
-	int buflen = hdr->buflen;
-	int err = 0;
+	struct bpf_prog *xdp_prog;
+	struct tun_xdp_hdr *hdr;
+	unsigned int headroom;
 	bool skb_xdp = false;
+	u32 rxhash = 0, act;
 	struct page *page;
+	int err = 0;
+	int buflen;
+
+	if (has_hdr) {
+		hdr = xdp->data_hard_start;
+		gso = &hdr->gso;
+		buflen = hdr->buflen;
+	} else {
+		/* Came here from the tun tx path: there is no
+		 * tun_xdp_hdr in front of the data, so compute the
+		 * headroom and buffer length from the xdp_buff itself.
+		 */
+		xdp->data_hard_start -= sizeof(struct xdp_frame);
+		headroom = xdp->data - xdp->data_hard_start;
+		buflen = datasize + headroom +
+			 SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
+	}
 
 	xdp_prog = rcu_dereference(tun->xdp_prog);
 	if (xdp_prog) {
-		if (gso->gso_type) {
+		if (has_hdr && gso->gso_type) {
 			skb_xdp = true;
 			goto build;
 		}
@@ -2604,7 +2624,8 @@ static int tun_xdp_one(struct tun_struct *tun,
 	skb_reserve(skb, xdp->data - xdp->data_hard_start);
 	skb_put(skb, xdp->data_end - xdp->data);
 
-	if (virtio_net_hdr_to_skb(skb, gso, tun_is_little_endian(tun))) {
+	if (has_hdr &&
+	    virtio_net_hdr_to_skb(skb, gso, tun_is_little_endian(tun))) {
 		this_cpu_inc(tun->pcpu_stats->rx_frame_errors);
 		kfree_skb(skb);
 		err = -EINVAL;
@@ -2668,7 +2689,7 @@ static int tun_sendmsg(struct socket *sock, struct msghdr *m, size_t total_len)
 
 		for (i = 0; i < n; i++) {
 			xdp = &((struct xdp_buff *)ctl->ptr)[i];
-			tun_xdp_one(tun, tfile, xdp, &flush, &tpage);
+			tun_xdp_one(tun, tfile, xdp, &flush, &tpage, true);
 		}
 
 		if (flush)
-- 
2.21.0


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [RFC net-next 13/14] tun: handle XDP_TX action of tx path XDP program
  2019-12-18  8:10 [RFC net-next 00/14] XDP in tx path Prashant Bhole
                   ` (11 preceding siblings ...)
  2019-12-18  8:10 ` [RFC net-next 12/14] tun: add a way to inject tx path packet into Rx path Prashant Bhole
@ 2019-12-18  8:10 ` Prashant Bhole
  2019-12-18  8:10 ` [RFC net-next 14/14] tun: run xdp prog when tun is read from file interface Prashant Bhole
  13 siblings, 0 replies; 36+ messages in thread
From: Prashant Bhole @ 2019-12-18  8:10 UTC (permalink / raw)
  To: David S . Miller, Michael S . Tsirkin, Alexei Starovoitov,
	Daniel Borkmann, Jesper Dangaard Brouer
  Cc: Prashant Bhole, Jason Wang, David Ahern, Jakub Kicinski,
	John Fastabend, Toshiaki Makita, Martin KaFai Lau, Song Liu,
	Yonghong Song, Andrii Nakryiko, netdev

When the action returned by the tx path XDP program is XDP_TX, we
need to inject the packet into the Rx path of tun. This patch injects
such packets into the Rx path using tun_xdp_one().

Signed-off-by: Prashant Bhole <prashantbhole.linux@gmail.com>
---
 drivers/net/tun.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index 0701a7a80346..8e2fe0ad7955 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -2266,7 +2266,13 @@ static u32 tun_do_xdp_tx(struct tun_struct *tun, struct tun_file *tfile,
 		case XDP_PASS:
 			break;
 		case XDP_TX:
-			/* fall through */
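+			/* reinject into the Rx path; tun_xdp_one() also
+			 * runs the Rx XDP program on the packet
+			 */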
+			tpage.page = NULL;
+			tpage.count = 0;
+			tun_xdp_one(tun, tfile, &xdp, &flush, &tpage, false);
+			tun_put_page(&tpage);
+			if (flush)
+				xdp_do_flush_map();
+			break;
 		case XDP_REDIRECT:
 			/* fall through */
 		default:
-- 
2.21.0


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [RFC net-next 14/14] tun: run xdp prog when tun is read from file interface
  2019-12-18  8:10 [RFC net-next 00/14] XDP in tx path Prashant Bhole
                   ` (12 preceding siblings ...)
  2019-12-18  8:10 ` [RFC net-next 13/14] tun: handle XDP_TX action of tx path XDP program Prashant Bhole
@ 2019-12-18  8:10 ` Prashant Bhole
  13 siblings, 0 replies; 36+ messages in thread
From: Prashant Bhole @ 2019-12-18  8:10 UTC (permalink / raw)
  To: David S . Miller, Michael S . Tsirkin, Alexei Starovoitov,
	Daniel Borkmann, Jesper Dangaard Brouer
  Cc: Prashant Bhole, Jason Wang, David Ahern, Jakub Kicinski,
	John Fastabend, Toshiaki Makita, Martin KaFai Lau, Song Liu,
	Yonghong Song, Andrii Nakryiko, netdev

Handle the case where qemu reads packets from tun via the file
read interface: run the tx path XDP program on each packet before
it is copied to userspace.

Signed-off-by: Prashant Bhole <prashantbhole.linux@gmail.com>
---
 drivers/net/tun.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index 8e2fe0ad7955..0a9ac380c1b4 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -2335,8 +2335,10 @@ static ssize_t tun_do_read(struct tun_struct *tun, struct tun_file *tfile,
 			   struct iov_iter *to,
 			   int noblock, void *ptr)
 {
+	struct xdp_frame *frame;
 	ssize_t ret;
 	int err;
+	u32 act;
 
 	tun_debug(KERN_INFO, tun, "tun_do_read\n");
 
@@ -2350,6 +2352,15 @@ static ssize_t tun_do_read(struct tun_struct *tun, struct tun_file *tfile,
 		ptr = tun_ring_recv(tfile, noblock, &err);
 		if (!ptr)
 			return err;
+
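+		/* run the tx path XDP program before the packet is
+		 * copied to userspace
+		 */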
+		if (tun_is_xdp_frame(ptr)) {
+			frame = tun_ptr_to_xdp(ptr);
+			act = tun_do_xdp_tx(tun, tfile, frame);
+		} else {
+			act = tun_do_xdp_tx_generic(tun, ptr);
+		}
+		if (act != XDP_PASS) {
+			/* the packet was consumed (dropped or
+			 * reinjected) by the tx path XDP program
+			 */
+			return err;
+		}
 	}
 
 	if (tun_is_xdp_frame(ptr)) {
-- 
2.21.0


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* Re: [RFC net-next 11/14] tun: run XDP program in tx path
  2019-12-18  8:10 ` [RFC net-next 11/14] tun: run XDP program in tx path Prashant Bhole
@ 2019-12-18 10:07   ` Jesper Dangaard Brouer
  2019-12-18 11:48     ` Toke Høiland-Jørgensen
                       ` (2 more replies)
  0 siblings, 3 replies; 36+ messages in thread
From: Jesper Dangaard Brouer @ 2019-12-18 10:07 UTC (permalink / raw)
  To: Prashant Bhole
  Cc: David S . Miller, Michael S . Tsirkin, Alexei Starovoitov,
	Daniel Borkmann, Jesper Dangaard Brouer, Jason Wang, David Ahern,
	Jakub Kicinski, John Fastabend, Toshiaki Makita,
	Martin KaFai Lau, Song Liu, Yonghong Song, Andrii Nakryiko,
	netdev, Ilias Apalodimas

On Wed, 18 Dec 2019 17:10:47 +0900
Prashant Bhole <prashantbhole.linux@gmail.com> wrote:

> +static u32 tun_do_xdp_tx(struct tun_struct *tun, struct tun_file *tfile,
> +			 struct xdp_frame *frame)
> +{
> +	struct bpf_prog *xdp_prog;
> +	struct tun_page tpage;
> +	struct xdp_buff xdp;
> +	u32 act = XDP_PASS;
> +	int flush = 0;
> +
> +	xdp_prog = rcu_dereference(tun->xdp_tx_prog);
> +	if (xdp_prog) {
> +		xdp.data_hard_start = frame->data - frame->headroom;
> +		xdp.data = frame->data;
> +		xdp.data_end = xdp.data + frame->len;
> +		xdp.data_meta = xdp.data - frame->metasize;

You have not configured xdp.rxq, thus a BPF-prog accessing this will crash.
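
One way to avoid the crash would be to point it at the xdp_rxq_info
that tun already registers per queue for the rx path (a minimal
sketch; whether reusing the rx queue info is the right semantics for
a tx hook is a separate question):

	xdp.rxq = &tfile->xdp_rxq;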

For an XDP TX hook, I want us to provide/give BPF-prog access to some
more information about e.g. the current tx-queue length, or TC-q number.

Question to Daniel or Alexei, can we do this and still keep BPF_PROG_TYPE_XDP?
Or is it better to introduce a new BPF prog type (enum bpf_prog_type)
for XDP TX-hook ?

To Prashant, look at net/core/filter.c in xdp_convert_ctx_access() on
how the BPF instruction rewrites are done, when accessing xdp_rxq_info.
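
For reference, the existing rewrite for rx_queue_index looks roughly
like this (paraphrased sketch; check the actual source):

	case offsetof(struct xdp_md, rx_queue_index):
		*insn++ = BPF_LDX_MEM(BPF_FIELD_SIZEOF(struct xdp_buff, rxq),
				      si->dst_reg, si->src_reg,
				      offsetof(struct xdp_buff, rxq));
		*insn++ = BPF_LDX_MEM(BPF_W, si->dst_reg, si->dst_reg,
				      offsetof(struct xdp_rxq_info,
					       queue_index));
		break;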


> +		act = bpf_prog_run_xdp(xdp_prog, &xdp);
> +		switch (act) {
> +		case XDP_PASS:
> +			break;
> +		case XDP_TX:
> +			/* fall through */
> +		case XDP_REDIRECT:
> +			/* fall through */
> +		default:
> +			bpf_warn_invalid_xdp_action(act);
> +			/* fall through */
> +		case XDP_ABORTED:
> +			trace_xdp_exception(tun->dev, xdp_prog, act);
> +			/* fall through */
> +		case XDP_DROP:
> +			xdp_return_frame_rx_napi(frame);

I'm not sure that it is safe to use "napi" variant here, as you have to
be under same RX-NAPI processing loop for this to be safe.

Notice the "rx" part of the name "xdp_return_frame_rx_napi". 
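
Outside of an rx NAPI context the plain variant should be the safe
default, e.g.:

	xdp_return_frame(frame);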


> +			break;
> +		}
> +	}
> +
> +	return act;
> +}


-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [RFC net-next 11/14] tun: run XDP program in tx path
  2019-12-18 10:07   ` Jesper Dangaard Brouer
@ 2019-12-18 11:48     ` Toke Høiland-Jørgensen
  2019-12-18 16:33       ` David Ahern
  2019-12-18 18:19       ` Alexei Starovoitov
  2019-12-18 16:29     ` David Ahern
  2019-12-19  1:47     ` Prashant Bhole
  2 siblings, 2 replies; 36+ messages in thread
From: Toke Høiland-Jørgensen @ 2019-12-18 11:48 UTC (permalink / raw)
  To: Jesper Dangaard Brouer, Prashant Bhole
  Cc: David S . Miller, Michael S . Tsirkin, Alexei Starovoitov,
	Daniel Borkmann, Jesper Dangaard Brouer, Jason Wang, David Ahern,
	Jakub Kicinski, John Fastabend, Toshiaki Makita,
	Martin KaFai Lau, Song Liu, Yonghong Song, Andrii Nakryiko,
	netdev, Ilias Apalodimas

Jesper Dangaard Brouer <jbrouer@redhat.com> writes:

> On Wed, 18 Dec 2019 17:10:47 +0900
> Prashant Bhole <prashantbhole.linux@gmail.com> wrote:
>
>> +static u32 tun_do_xdp_tx(struct tun_struct *tun, struct tun_file *tfile,
>> +			 struct xdp_frame *frame)
>> +{
>> +	struct bpf_prog *xdp_prog;
>> +	struct tun_page tpage;
>> +	struct xdp_buff xdp;
>> +	u32 act = XDP_PASS;
>> +	int flush = 0;
>> +
>> +	xdp_prog = rcu_dereference(tun->xdp_tx_prog);
>> +	if (xdp_prog) {
>> +		xdp.data_hard_start = frame->data - frame->headroom;
>> +		xdp.data = frame->data;
>> +		xdp.data_end = xdp.data + frame->len;
>> +		xdp.data_meta = xdp.data - frame->metasize;
>
> You have not configured xdp.rxq, thus a BPF-prog accessing this will crash.
>
> For an XDP TX hook, I want us to provide/give BPF-prog access to some
> more information about e.g. the current tx-queue length, or TC-q number.
>
> Question to Daniel or Alexei, can we do this and still keep BPF_PROG_TYPE_XDP?
> Or is it better to introduce a new BPF prog type (enum bpf_prog_type)
> for XDP TX-hook ?

I think a new program type would make the most sense. If/when we
introduce an XDP TX hook[0], it should have different semantics than the
regular XDP hook. I view the XDP TX hook as a hook that executes as the
very last thing before packets leave the interface. It should have
access to different context data as you say, but also I don't think it
makes sense to have XDP_TX and XDP_REDIRECT in an XDP_TX hook. And we
may also want to have a "throttle" return code; or maybe that could be
done via a helper?

In any case, I don't think this "emulated RX hook on the other end of a
virtual device" model that this series introduces is the right semantics
for an XDP TX hook. I can see what you're trying to do, and for virtual
point-to-point links I think it may make sense to emulate the RX hook of
the "other end" on TX. However, form a UAPI perspective, I don't think
we should be calling this a TX hook; logically, it's still an RX hook
on the receive end.

If you guys are up for evolving this design into a "proper" TX hook (as
outlined above and in [0]), that would be awesome, of course. But not
sure what constraints you have on your original problem? Do you
specifically need the "emulated RX hook for unmodified XDP programs"
semantics, or could your problem be solved with a TX hook with different
semantics?

-Toke


[0] We've suggested this in the past, see
https://github.com/xdp-project/xdp-project/blob/master/xdp-project.org#xdp-hook-at-tx


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [RFC net-next 11/14] tun: run XDP program in tx path
  2019-12-18 10:07   ` Jesper Dangaard Brouer
  2019-12-18 11:48     ` Toke Høiland-Jørgensen
@ 2019-12-18 16:29     ` David Ahern
  2019-12-19  1:47     ` Prashant Bhole
  2 siblings, 0 replies; 36+ messages in thread
From: David Ahern @ 2019-12-18 16:29 UTC (permalink / raw)
  To: Jesper Dangaard Brouer, Prashant Bhole
  Cc: David S . Miller, Michael S . Tsirkin, Alexei Starovoitov,
	Daniel Borkmann, Jesper Dangaard Brouer, Jason Wang,
	Jakub Kicinski, John Fastabend, Toshiaki Makita,
	Martin KaFai Lau, Song Liu, Yonghong Song, Andrii Nakryiko,
	netdev, Ilias Apalodimas

On 12/18/19 3:07 AM, Jesper Dangaard Brouer wrote:
> 
> You have not configured xdp.rxq, thus a BPF-prog accessing this will crash.

yes, I had a note in my early patch about this. The current XDP context
is definitely Rx focused.

> 
> For an XDP TX hook, I want us to provide/give BPF-prog access to some
> more information about e.g. the current tx-queue length, or TC-q number.
> 
> Question to Daniel or Alexei, can we do this and still keep BPF_PROG_TYPE_XDP?
> Or is it better to introduce a new BPF prog type (enum bpf_prog_type)
> for XDP TX-hook ?
> 

That is one of the design questions: can the Tx path re-use the existing
uapi for XDP or do we need to create a new one.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [RFC net-next 11/14] tun: run XDP program in tx path
  2019-12-18 11:48     ` Toke Høiland-Jørgensen
@ 2019-12-18 16:33       ` David Ahern
  2019-12-19  2:44         ` Jason Wang
  2019-12-18 18:19       ` Alexei Starovoitov
  1 sibling, 1 reply; 36+ messages in thread
From: David Ahern @ 2019-12-18 16:33 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen, Jesper Dangaard Brouer, Prashant Bhole
  Cc: David S . Miller, Michael S . Tsirkin, Alexei Starovoitov,
	Daniel Borkmann, Jesper Dangaard Brouer, Jason Wang,
	Jakub Kicinski, John Fastabend, Toshiaki Makita,
	Martin KaFai Lau, Song Liu, Yonghong Song, Andrii Nakryiko,
	netdev, Ilias Apalodimas

On 12/18/19 4:48 AM, Toke Høiland-Jørgensen wrote:
> Jesper Dangaard Brouer <jbrouer@redhat.com> writes:
> 
>> On Wed, 18 Dec 2019 17:10:47 +0900
>> Prashant Bhole <prashantbhole.linux@gmail.com> wrote:
>>
>>> +static u32 tun_do_xdp_tx(struct tun_struct *tun, struct tun_file *tfile,
>>> +			 struct xdp_frame *frame)
>>> +{
>>> +	struct bpf_prog *xdp_prog;
>>> +	struct tun_page tpage;
>>> +	struct xdp_buff xdp;
>>> +	u32 act = XDP_PASS;
>>> +	int flush = 0;
>>> +
>>> +	xdp_prog = rcu_dereference(tun->xdp_tx_prog);
>>> +	if (xdp_prog) {
>>> +		xdp.data_hard_start = frame->data - frame->headroom;
>>> +		xdp.data = frame->data;
>>> +		xdp.data_end = xdp.data + frame->len;
>>> +		xdp.data_meta = xdp.data - frame->metasize;
>>
>> You have not configured xdp.rxq, thus a BPF-prog accessing this will crash.
>>
>> For an XDP TX hook, I want us to provide/give BPF-prog access to some
>> more information about e.g. the current tx-queue length, or TC-q number.
>>
>> Question to Daniel or Alexei, can we do this and still keep BPF_PROG_TYPE_XDP?
>> Or is it better to introduce a new BPF prog type (enum bpf_prog_type)
>> for XDP TX-hook ?
> 
> I think a new program type would make the most sense. If/when we
> introduce an XDP TX hook[0], it should have different semantics than the
> regular XDP hook. I view the XDP TX hook as a hook that executes as the
> very last thing before packets leave the interface. It should have
> access to different context data as you say, but also I don't think it
> makes sense to have XDP_TX and XDP_REDIRECT in an XDP_TX hook. And we
> may also want to have a "throttle" return code; or maybe that could be
> done via a helper?

XDP_TX does not make sense in the Tx path. Jason questioned whether
XDP_RX makes sense. There is not a clear use case just yet.

REDIRECT is another one that would be useful as you point out below.

A new program type would allow support for these to be added over time
and not hold up the ability to do XDP_DROP in the Tx path.
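
A minimal sketch of that immediate use case, an ACL-style drop
program on the egress hook (the section name is purely illustrative):

	SEC("xdp_egress")
	int xdp_tx_acl(struct xdp_md *ctx)
	{
		/* parse headers, match against the ACL, ... */
		return XDP_DROP;
	}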

> 
> In any case, I don't think this "emulated RX hook on the other end of a
> virtual device" model that this series introduces is the right semantics
> for an XDP TX hook. I can see what you're trying to do, and for virtual
> point-to-point links I think it may make sense to emulate the RX hook of
> the "other end" on TX. However, form a UAPI perspective, I don't think
> we should be calling this a TX hook; logically, it's still an RX hook
> on the receive end.
> 
> If you guys are up for evolving this design into a "proper" TX hook (as
> outlined above an in [0]), that would be awesome, of course. But not
> sure what constraints you have on your original problem? Do you
> specifically need the "emulated RX hook for unmodified XDP programs"
> semantics, or could your problem be solved with a TX hook with different
> semantics?
> 
> -Toke
> 
> 
> [0] We've suggested this in the past, see
> https://github.com/xdp-project/xdp-project/blob/master/xdp-project.org#xdp-hook-at-tx
> 


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [RFC net-next 11/14] tun: run XDP program in tx path
  2019-12-18 11:48     ` Toke Høiland-Jørgensen
  2019-12-18 16:33       ` David Ahern
@ 2019-12-18 18:19       ` Alexei Starovoitov
  2019-12-19  2:34         ` Prashant Bhole
  1 sibling, 1 reply; 36+ messages in thread
From: Alexei Starovoitov @ 2019-12-18 18:19 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen
  Cc: Jesper Dangaard Brouer, Prashant Bhole, David S . Miller,
	Michael S . Tsirkin, Alexei Starovoitov, Daniel Borkmann,
	Jesper Dangaard Brouer, Jason Wang, David Ahern, Jakub Kicinski,
	John Fastabend, Toshiaki Makita, Martin KaFai Lau, Song Liu,
	Yonghong Song, Andrii Nakryiko, netdev, Ilias Apalodimas

On Wed, Dec 18, 2019 at 12:48:59PM +0100, Toke Høiland-Jørgensen wrote:
> Jesper Dangaard Brouer <jbrouer@redhat.com> writes:
> 
> > On Wed, 18 Dec 2019 17:10:47 +0900
> > Prashant Bhole <prashantbhole.linux@gmail.com> wrote:
> >
> >> +static u32 tun_do_xdp_tx(struct tun_struct *tun, struct tun_file *tfile,
> >> +			 struct xdp_frame *frame)
> >> +{
> >> +	struct bpf_prog *xdp_prog;
> >> +	struct tun_page tpage;
> >> +	struct xdp_buff xdp;
> >> +	u32 act = XDP_PASS;
> >> +	int flush = 0;
> >> +
> >> +	xdp_prog = rcu_dereference(tun->xdp_tx_prog);
> >> +	if (xdp_prog) {
> >> +		xdp.data_hard_start = frame->data - frame->headroom;
> >> +		xdp.data = frame->data;
> >> +		xdp.data_end = xdp.data + frame->len;
> >> +		xdp.data_meta = xdp.data - frame->metasize;
> >
> > You have not configured xdp.rxq, thus a BPF-prog accessing this will crash.
> >
> > For an XDP TX hook, I want us to provide/give BPF-prog access to some
> > more information about e.g. the current tx-queue length, or TC-q number.
> >
> > Question to Daniel or Alexei, can we do this and still keep BPF_PROG_TYPE_XDP?
> > Or is it better to introduce a new BPF prog type (enum bpf_prog_type)
> > for XDP TX-hook ?
> 
> I think a new program type would make the most sense. If/when we
> introduce an XDP TX hook[0], it should have different semantics than the
> regular XDP hook. I view the XDP TX hook as a hook that executes as the
> very last thing before packets leave the interface. It should have
> access to different context data as you say, but also I don't think it
> makes sense to have XDP_TX and XDP_REDIRECT in an XDP_TX hook. And we
> may also want to have a "throttle" return code; or maybe that could be
> done via a helper?
> 
> In any case, I don't think this "emulated RX hook on the other end of a
> virtual device" model that this series introduces is the right semantics
> for an XDP TX hook. I can see what you're trying to do, and for virtual
> point-to-point links I think it may make sense to emulate the RX hook of
> the "other end" on TX. However, form a UAPI perspective, I don't think
> we should be calling this a TX hook; logically, it's still an RX hook
> on the receive end.
> 
> If you guys are up for evolving this design into a "proper" TX hook (as
> outlined above an in [0]), that would be awesome, of course. But not
> sure what constraints you have on your original problem? Do you
> specifically need the "emulated RX hook for unmodified XDP programs"
> semantics, or could your problem be solved with a TX hook with different
> semantics?

I agree with above.
It looks more like existing BPF_PROG_TYPE_XDP, but attached to egress
of veth/tap interface. I think only attachment point makes a difference.
Maybe use expected_attach_type?
Then there will be no need to create new program type.
BPF_PROG_TYPE_XDP will be able to access different fields depending
on expected_attach_type. Like rx-queue length that Jesper is suggesting
will be available only in such case and not for all BPF_PROG_TYPE_XDP progs.
It can be reduced too. Like if there is no xdp.rxq concept for the egress
side of a virtual device, access to that field can be disallowed by the
verifier.
Could you also call it XDP_EGRESS instead of XDP_TX?
I would like to reserve XDP_TX name to what Toke describes as XDP_TX.
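
As a rough illustration of gating fields by attach type in the
verifier (BPF_XDP_EGRESS is a hypothetical enum value here, and the
existing checks are elided):

	static bool xdp_is_valid_access(int off, int size,
					enum bpf_access_type type,
					const struct bpf_prog *prog,
					struct bpf_insn_access_aux *info)
	{
		if (prog->expected_attach_type == BPF_XDP_EGRESS) {
			switch (off) {
			case offsetof(struct xdp_md, rx_queue_index):
				/* no rx queue on the egress side */
				return false;
			}
		}
		/* ... existing xdp_md access checks ... */
	}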


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [RFC net-next 03/14] libbpf: API for tx path XDP support
  2019-12-18  8:10 ` [RFC net-next 03/14] libbpf: API for tx path XDP support Prashant Bhole
@ 2019-12-18 18:20   ` Alexei Starovoitov
  0 siblings, 0 replies; 36+ messages in thread
From: Alexei Starovoitov @ 2019-12-18 18:20 UTC (permalink / raw)
  To: Prashant Bhole
  Cc: David S . Miller, Michael S . Tsirkin, Alexei Starovoitov,
	Daniel Borkmann, Jesper Dangaard Brouer, Jason Wang, David Ahern,
	Jakub Kicinski, John Fastabend, Toshiaki Makita,
	Martin KaFai Lau, Song Liu, Yonghong Song, Andrii Nakryiko,
	netdev, David Ahern

On Wed, Dec 18, 2019 at 05:10:39PM +0900, Prashant Bhole wrote:
> Adds new APIs for tx path XDP:
>  - bpf_set_link_xdp_tx_fd
>  - bpf_get_link_xdp_tx_id
>  - bpf_get_link_xdp_tx_info
> 
> Co-developed-by: David Ahern <dahern@digitalocean.com>
> Signed-off-by: David Ahern <dahern@digitalocean.com>
> Signed-off-by: Prashant Bhole <prashantbhole.linux@gmail.com>
> ---
>  tools/lib/bpf/libbpf.h   |  4 +++
>  tools/lib/bpf/libbpf.map |  3 ++
>  tools/lib/bpf/netlink.c  | 77 +++++++++++++++++++++++++++++++++++-----
>  3 files changed, 75 insertions(+), 9 deletions(-)
> 
> diff --git a/tools/lib/bpf/libbpf.h b/tools/lib/bpf/libbpf.h
> index 0dbf4bfba0c4..741e5fee61f6 100644
> --- a/tools/lib/bpf/libbpf.h
> +++ b/tools/lib/bpf/libbpf.h
> @@ -447,6 +447,10 @@ LIBBPF_API int bpf_set_link_xdp_fd(int ifindex, int fd, __u32 flags);
>  LIBBPF_API int bpf_get_link_xdp_id(int ifindex, __u32 *prog_id, __u32 flags);
>  LIBBPF_API int bpf_get_link_xdp_info(int ifindex, struct xdp_link_info *info,
>  				     size_t info_size, __u32 flags);
> +LIBBPF_API int bpf_set_link_xdp_tx_fd(int ifindex, int fd, __u32 flags);
> +LIBBPF_API int bpf_get_link_xdp_tx_id(int ifindex, __u32 *prog_id, __u32 flags);
> +LIBBPF_API int bpf_get_link_xdp_tx_info(int ifindex, struct xdp_link_info *info,
> +				     size_t info_size, __u32 flags);

please use _opts() style of api extensions.


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [RFC net-next 11/14] tun: run XDP program in tx path
  2019-12-18 10:07   ` Jesper Dangaard Brouer
  2019-12-18 11:48     ` Toke Høiland-Jørgensen
  2019-12-18 16:29     ` David Ahern
@ 2019-12-19  1:47     ` Prashant Bhole
  2 siblings, 0 replies; 36+ messages in thread
From: Prashant Bhole @ 2019-12-19  1:47 UTC (permalink / raw)
  To: Jesper Dangaard Brouer
  Cc: David S . Miller, Michael S . Tsirkin, Alexei Starovoitov,
	Daniel Borkmann, Jesper Dangaard Brouer, Jason Wang, David Ahern,
	Jakub Kicinski, John Fastabend, Toshiaki Makita,
	Martin KaFai Lau, Song Liu, Yonghong Song, Andrii Nakryiko,
	netdev, Ilias Apalodimas



On 12/18/19 7:07 PM, Jesper Dangaard Brouer wrote:
> On Wed, 18 Dec 2019 17:10:47 +0900
> Prashant Bhole <prashantbhole.linux@gmail.com> wrote:
> 
>> +static u32 tun_do_xdp_tx(struct tun_struct *tun, struct tun_file *tfile,
>> +			 struct xdp_frame *frame)
>> +{
>> +	struct bpf_prog *xdp_prog;
>> +	struct tun_page tpage;
>> +	struct xdp_buff xdp;
>> +	u32 act = XDP_PASS;
>> +	int flush = 0;
>> +
>> +	xdp_prog = rcu_dereference(tun->xdp_tx_prog);
>> +	if (xdp_prog) {
>> +		xdp.data_hard_start = frame->data - frame->headroom;
>> +		xdp.data = frame->data;
>> +		xdp.data_end = xdp.data + frame->len;
>> +		xdp.data_meta = xdp.data - frame->metasize;
> 
> You have not configured xdp.rxq, thus a BPF-prog accessing this will crash.
> 
> For an XDP TX hook, I want us to provide/give BPF-prog access to some
> more information about e.g. the current tx-queue length, or TC-q number.
> 
> Question to Daniel or Alexei, can we do this and still keep BPF_PROG_TYPE_XDP?
> Or is it better to introduce a new BPF prog type (enum bpf_prog_type)
> for XDP TX-hook ?
> 
> To Prashant, look at net/core/filter.c in xdp_convert_ctx_access() on
> how the BPF instruction rewrites are done, when accessing xdp_rxq_info.

Got it. I will take care of it next time.

> 
> 
>> +		act = bpf_prog_run_xdp(xdp_prog, &xdp);
>> +		switch (act) {
>> +		case XDP_PASS:
>> +			break;
>> +		case XDP_TX:
>> +			/* fall through */
>> +		case XDP_REDIRECT:
>> +			/* fall through */
>> +		default:
>> +			bpf_warn_invalid_xdp_action(act);
>> +			/* fall through */
>> +		case XDP_ABORTED:
>> +			trace_xdp_exception(tun->dev, xdp_prog, act);
>> +			/* fall through */
>> +		case XDP_DROP:
>> +			xdp_return_frame_rx_napi(frame);
> 
> I'm not sure that it is safe to use "napi" variant here, as you have to
> be under same RX-NAPI processing loop for this to be safe.
> 
> Notice the "rx" part of the name "xdp_return_frame_rx_napi".

You are right, I will fix it next time.

Thanks!

> 
> 
>> +			break;
>> +		}
>> +	}
>> +
>> +	return act;
>> +}
> 
> 

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [RFC net-next 11/14] tun: run XDP program in tx path
  2019-12-18 18:19       ` Alexei Starovoitov
@ 2019-12-19  2:34         ` Prashant Bhole
  2019-12-19 10:15           ` Toke Høiland-Jørgensen
  0 siblings, 1 reply; 36+ messages in thread
From: Prashant Bhole @ 2019-12-19  2:34 UTC (permalink / raw)
  To: Alexei Starovoitov, Toke Høiland-Jørgensen,
	Jesper Dangaard Brouer
  Cc: David S . Miller, Michael S . Tsirkin, Alexei Starovoitov,
	Daniel Borkmann, Jesper Dangaard Brouer, Jason Wang, David Ahern,
	Jakub Kicinski, John Fastabend, Toshiaki Makita,
	Martin KaFai Lau, Song Liu, Yonghong Song, Andrii Nakryiko,
	netdev, Ilias Apalodimas



On 12/19/19 3:19 AM, Alexei Starovoitov wrote:
> On Wed, Dec 18, 2019 at 12:48:59PM +0100, Toke Høiland-Jørgensen wrote:
>> Jesper Dangaard Brouer <jbrouer@redhat.com> writes:
>>
>>> On Wed, 18 Dec 2019 17:10:47 +0900
>>> Prashant Bhole <prashantbhole.linux@gmail.com> wrote:
>>>
>>>> +static u32 tun_do_xdp_tx(struct tun_struct *tun, struct tun_file *tfile,
>>>> +			 struct xdp_frame *frame)
>>>> +{
>>>> +	struct bpf_prog *xdp_prog;
>>>> +	struct tun_page tpage;
>>>> +	struct xdp_buff xdp;
>>>> +	u32 act = XDP_PASS;
>>>> +	int flush = 0;
>>>> +
>>>> +	xdp_prog = rcu_dereference(tun->xdp_tx_prog);
>>>> +	if (xdp_prog) {
>>>> +		xdp.data_hard_start = frame->data - frame->headroom;
>>>> +		xdp.data = frame->data;
>>>> +		xdp.data_end = xdp.data + frame->len;
>>>> +		xdp.data_meta = xdp.data - frame->metasize;
>>>
>>> You have not configured xdp.rxq, thus a BPF-prog accessing this will crash.
>>>
>>> For an XDP TX hook, I want us to provide/give BPF-prog access to some
>>> more information about e.g. the current tx-queue length, or TC-q number.
>>>
>>> Question to Daniel or Alexei, can we do this and still keep BPF_PROG_TYPE_XDP?
>>> Or is it better to introduce a new BPF prog type (enum bpf_prog_type)
>>> for XDP TX-hook ?
>>
>> I think a new program type would make the most sense. If/when we
>> introduce an XDP TX hook[0], it should have different semantics than the
>> regular XDP hook. I view the XDP TX hook as a hook that executes as the
>> very last thing before packets leave the interface. It should have
>> access to different context data as you say, but also I don't think it
>> makes sense to have XDP_TX and XDP_REDIRECT in an XDP_TX hook. And we
>> may also want to have a "throttle" return code; or maybe that could be
>> done via a helper?
>>
>> In any case, I don't think this "emulated RX hook on the other end of a
>> virtual device" model that this series introduces is the right semantics
>> for an XDP TX hook. I can see what you're trying to do, and for virtual
>> point-to-point links I think it may make sense to emulate the RX hook of
>> the "other end" on TX. However, form a UAPI perspective, I don't think
>> we should be calling this a TX hook; logically, it's still an RX hook
>> on the receive end.
>>
>> If you guys are up for evolving this design into a "proper" TX hook (as
>> outlined above an in [0]), that would be awesome, of course. But not
>> sure what constraints you have on your original problem? Do you
>> specifically need the "emulated RX hook for unmodified XDP programs"
>> semantics, or could your problem be solved with a TX hook with different
>> semantics?
> 
> I agree with above.
> It looks more like existing BPF_PROG_TYPE_XDP, but attached to egress
> of veth/tap interface. I think only attachment point makes a difference.
> May be use expected_attach_type ?
> Then there will be no need to create new program type.
> BPF_PROG_TYPE_XDP will be able to access different fields depending
> on expected_attach_type. Like rx-queue length that Jesper is suggesting
> will be available only in such case and not for all BPF_PROG_TYPE_XDP progs.
> It can be reduced too. Like if there is no xdp.rxq concept for egress side
> of virtual device the access to that field can disallowed by the verifier.
> Could you also call it XDP_EGRESS instead of XDP_TX?
> I would like to reserve XDP_TX name to what Toke describes as XDP_TX.
> 

 From the discussion over this set, it makes sense to have a new type of
program. As David suggested, it will make way for changes specific
to the egress path.
On the other hand, XDP offload with virtio-net implementation is based
on "emulated RX hook". How about having this special behavior with
expected_attach_type?

Thanks,
Prashant

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [RFC net-next 11/14] tun: run XDP program in tx path
  2019-12-18 16:33       ` David Ahern
@ 2019-12-19  2:44         ` Jason Wang
  0 siblings, 0 replies; 36+ messages in thread
From: Jason Wang @ 2019-12-19  2:44 UTC (permalink / raw)
  To: David Ahern, Toke Høiland-Jørgensen,
	Jesper Dangaard Brouer, Prashant Bhole
  Cc: David S . Miller, Michael S . Tsirkin, Alexei Starovoitov,
	Daniel Borkmann, Jesper Dangaard Brouer, Jakub Kicinski,
	John Fastabend, Toshiaki Makita, Martin KaFai Lau, Song Liu,
	Yonghong Song, Andrii Nakryiko, netdev, Ilias Apalodimas


On 2019/12/19 上午12:33, David Ahern wrote:
> On 12/18/19 4:48 AM, Toke Høiland-Jørgensen wrote:
>> Jesper Dangaard Brouer <jbrouer@redhat.com> writes:
>>
>>> On Wed, 18 Dec 2019 17:10:47 +0900
>>> Prashant Bhole <prashantbhole.linux@gmail.com> wrote:
>>>
>>>> +static u32 tun_do_xdp_tx(struct tun_struct *tun, struct tun_file *tfile,
>>>> +			 struct xdp_frame *frame)
>>>> +{
>>>> +	struct bpf_prog *xdp_prog;
>>>> +	struct tun_page tpage;
>>>> +	struct xdp_buff xdp;
>>>> +	u32 act = XDP_PASS;
>>>> +	int flush = 0;
>>>> +
>>>> +	xdp_prog = rcu_dereference(tun->xdp_tx_prog);
>>>> +	if (xdp_prog) {
>>>> +		xdp.data_hard_start = frame->data - frame->headroom;
>>>> +		xdp.data = frame->data;
>>>> +		xdp.data_end = xdp.data + frame->len;
>>>> +		xdp.data_meta = xdp.data - frame->metasize;
>>> You have not configured xdp.rxq, thus a BPF-prog accessing this will crash.
>>>
>>> For an XDP TX hook, I want us to provide/give BPF-prog access to some
>>> more information about e.g. the current tx-queue length, or TC-q number.
>>>
>>> Question to Daniel or Alexei, can we do this and still keep BPF_PROG_TYPE_XDP?
>>> Or is it better to introduce a new BPF prog type (enum bpf_prog_type)
>>> for XDP TX-hook ?
>> I think a new program type would make the most sense. If/when we
>> introduce an XDP TX hook[0], it should have different semantics than the
>> regular XDP hook. I view the XDP TX hook as a hook that executes as the
>> very last thing before packets leave the interface. It should have
>> access to different context data as you say, but also I don't think it
>> makes sense to have XDP_TX and XDP_REDIRECT in an XDP_TX hook. And we
>> may also want to have a "throttle" return code; or maybe that could be
>> done via a helper?
> XDP_TX does not make sense in the Tx path. Jason questioned whether
> XDP_RX makes sense. There is not a clear use case just yet.


XDP_RX would allow chaining a TX XDP program and an RX XDP program on
the same interface.

Thanks


>
> REDIRECT is another one that would be useful as you point out below.
>
> A new program type would allow support for these to be added over time
> and not hold up the ability to do XDP_DROP in the Tx path.
>
>> In any case, I don't think this "emulated RX hook on the other end of a
>> virtual device" model that this series introduces is the right semantics
>> for an XDP TX hook. I can see what you're trying to do, and for virtual
>> point-to-point links I think it may make sense to emulate the RX hook of
>> the "other end" on TX. However, form a UAPI perspective, I don't think
>> we should be calling this a TX hook; logically, it's still an RX hook
>> on the receive end.
>>
>> If you guys are up for evolving this design into a "proper" TX hook (as
>> outlined above an in [0]), that would be awesome, of course. But not
>> sure what constraints you have on your original problem? Do you
>> specifically need the "emulated RX hook for unmodified XDP programs"
>> semantics, or could your problem be solved with a TX hook with different
>> semantics?
>>
>> -Toke
>>
>>
>> [0] We've suggested this in the past, see
>> https://github.com/xdp-project/xdp-project/blob/master/xdp-project.org#xdp-hook-at-tx
>>


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [RFC net-next 11/14] tun: run XDP program in tx path
  2019-12-19  2:34         ` Prashant Bhole
@ 2019-12-19 10:15           ` Toke Høiland-Jørgensen
  2019-12-20  0:07             ` Prashant Bhole
  0 siblings, 1 reply; 36+ messages in thread
From: Toke Høiland-Jørgensen @ 2019-12-19 10:15 UTC (permalink / raw)
  To: Prashant Bhole, Alexei Starovoitov, Jesper Dangaard Brouer
  Cc: David S . Miller, Michael S . Tsirkin, Alexei Starovoitov,
	Daniel Borkmann, Jesper Dangaard Brouer, Jason Wang, David Ahern,
	Jakub Kicinski, John Fastabend, Toshiaki Makita,
	Martin KaFai Lau, Song Liu, Yonghong Song, Andrii Nakryiko,
	netdev, Ilias Apalodimas

Prashant Bhole <prashantbhole.linux@gmail.com> writes:

> On 12/19/19 3:19 AM, Alexei Starovoitov wrote:
>> On Wed, Dec 18, 2019 at 12:48:59PM +0100, Toke Høiland-Jørgensen wrote:
>>> Jesper Dangaard Brouer <jbrouer@redhat.com> writes:
>>>
>>>> On Wed, 18 Dec 2019 17:10:47 +0900
>>>> Prashant Bhole <prashantbhole.linux@gmail.com> wrote:
>>>>
>>>>> +static u32 tun_do_xdp_tx(struct tun_struct *tun, struct tun_file *tfile,
>>>>> +			 struct xdp_frame *frame)
>>>>> +{
>>>>> +	struct bpf_prog *xdp_prog;
>>>>> +	struct tun_page tpage;
>>>>> +	struct xdp_buff xdp;
>>>>> +	u32 act = XDP_PASS;
>>>>> +	int flush = 0;
>>>>> +
>>>>> +	xdp_prog = rcu_dereference(tun->xdp_tx_prog);
>>>>> +	if (xdp_prog) {
>>>>> +		xdp.data_hard_start = frame->data - frame->headroom;
>>>>> +		xdp.data = frame->data;
>>>>> +		xdp.data_end = xdp.data + frame->len;
>>>>> +		xdp.data_meta = xdp.data - frame->metasize;
>>>>
>>>> You have not configured xdp.rxq, thus a BPF-prog accessing this will crash.
>>>>
>>>> For an XDP TX hook, I want us to provide/give BPF-prog access to some
>>>> more information about e.g. the current tx-queue length, or TC-q number.
>>>>
>>>> Question to Daniel or Alexei, can we do this and still keep BPF_PROG_TYPE_XDP?
>>>> Or is it better to introduce a new BPF prog type (enum bpf_prog_type)
>>>> for XDP TX-hook ?
>>>
>>> I think a new program type would make the most sense. If/when we
>>> introduce an XDP TX hook[0], it should have different semantics than the
>>> regular XDP hook. I view the XDP TX hook as a hook that executes as the
>>> very last thing before packets leave the interface. It should have
>>> access to different context data as you say, but also I don't think it
>>> makes sense to have XDP_TX and XDP_REDIRECT in an XDP_TX hook. And we
>>> may also want to have a "throttle" return code; or maybe that could be
>>> done via a helper?
>>>
>>> In any case, I don't think this "emulated RX hook on the other end of a
>>> virtual device" model that this series introduces is the right semantics
>>> for an XDP TX hook. I can see what you're trying to do, and for virtual
>>> point-to-point links I think it may make sense to emulate the RX hook of
>>> the "other end" on TX. However, form a UAPI perspective, I don't think
>>> we should be calling this a TX hook; logically, it's still an RX hook
>>> on the receive end.
>>>
>>> If you guys are up for evolving this design into a "proper" TX hook (as
>>> outlined above an in [0]), that would be awesome, of course. But not
>>> sure what constraints you have on your original problem? Do you
>>> specifically need the "emulated RX hook for unmodified XDP programs"
>>> semantics, or could your problem be solved with a TX hook with different
>>> semantics?
>> 
>> I agree with above.
>> It looks more like existing BPF_PROG_TYPE_XDP, but attached to egress
>> of veth/tap interface. I think only attachment point makes a difference.
>> May be use expected_attach_type ?
>> Then there will be no need to create new program type.
>> BPF_PROG_TYPE_XDP will be able to access different fields depending
>> on expected_attach_type. Like rx-queue length that Jesper is suggesting
>> will be available only in such case and not for all BPF_PROG_TYPE_XDP progs.
>> It can be reduced too. Like if there is no xdp.rxq concept for egress side
>> of virtual device the access to that field can disallowed by the verifier.
>> Could you also call it XDP_EGRESS instead of XDP_TX?
>> I would like to reserve XDP_TX name to what Toke describes as XDP_TX.
>> 
>
>  From the discussion over this set, it makes sense to have new type of
> program. As David suggested it will make a way for changes specific
> to egress path.
> On the other hand, XDP offload with virtio-net implementation is based
> on "emulated RX hook". How about having this special behavior with
> expected_attach_type?

Another thought I had re: this was that for these "special" virtual
point-to-point devices we could extend the API to have an ATTACH_PEER
flag. So if you have a pair of veth devices (veth0,veth1) connecting to
each other, you could do either of:

bpf_set_link_xdp_fd(ifindex(veth0), prog_fd, 0);
bpf_set_link_xdp_fd(ifindex(veth1), prog_fd, ATTACH_PEER);

to attach to veth0, and:

bpf_set_link_xdp_fd(ifindex(veth1), prog_fd, 0);
bpf_set_link_xdp_fd(ifindex(veth0), prog_fd, ATTACH_PEER);

to attach to veth1.

This would allow to attach to a device without having the "other end"
visible, and keep the "XDP runs on RX" semantics clear to userspace.
Internally in the kernel we could then turn the "attach to peer"
operation for a tun device into the "emulate on TX" thing you're already
doing?

Would this work for your use case, do you think?

-Toke


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [RFC net-next 11/14] tun: run XDP program in tx path
  2019-12-19 10:15           ` Toke Høiland-Jørgensen
@ 2019-12-20  0:07             ` Prashant Bhole
  2019-12-20  3:24               ` Jason Wang
  0 siblings, 1 reply; 36+ messages in thread
From: Prashant Bhole @ 2019-12-20  0:07 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen, Alexei Starovoitov,
	Jesper Dangaard Brouer
  Cc: David S . Miller, Michael S . Tsirkin, Alexei Starovoitov,
	Daniel Borkmann, Jesper Dangaard Brouer, Jason Wang, David Ahern,
	Jakub Kicinski, John Fastabend, Toshiaki Makita,
	Martin KaFai Lau, Song Liu, Yonghong Song, Andrii Nakryiko,
	netdev, Ilias Apalodimas

Note: Resending my last response. It was not delivered to the netdev
list due to some problem.

On 12/19/19 7:15 PM, Toke Høiland-Jørgensen wrote:
> Prashant Bhole <prashantbhole.linux@gmail.com> writes:
> 
>> On 12/19/19 3:19 AM, Alexei Starovoitov wrote:
>>> On Wed, Dec 18, 2019 at 12:48:59PM +0100, Toke Høiland-Jørgensen wrote:
>>>> Jesper Dangaard Brouer <jbrouer@redhat.com> writes:
>>>>
>>>>> On Wed, 18 Dec 2019 17:10:47 +0900
>>>>> Prashant Bhole <prashantbhole.linux@gmail.com> wrote:
>>>>>
>>>>>> +static u32 tun_do_xdp_tx(struct tun_struct *tun, struct tun_file *tfile,
>>>>>> +			 struct xdp_frame *frame)
>>>>>> +{
>>>>>> +	struct bpf_prog *xdp_prog;
>>>>>> +	struct tun_page tpage;
>>>>>> +	struct xdp_buff xdp;
>>>>>> +	u32 act = XDP_PASS;
>>>>>> +	int flush = 0;
>>>>>> +
>>>>>> +	xdp_prog = rcu_dereference(tun->xdp_tx_prog);
>>>>>> +	if (xdp_prog) {
>>>>>> +		xdp.data_hard_start = frame->data - frame->headroom;
>>>>>> +		xdp.data = frame->data;
>>>>>> +		xdp.data_end = xdp.data + frame->len;
>>>>>> +		xdp.data_meta = xdp.data - frame->metasize;
>>>>>
>>>>> You have not configured xdp.rxq, thus a BPF-prog accessing this will crash.
>>>>>
>>>>> For an XDP TX hook, I want us to provide/give BPF-prog access to some
>>>>> more information about e.g. the current tx-queue length, or TC-q number.
>>>>>
>>>>> Question to Daniel or Alexei, can we do this and still keep BPF_PROG_TYPE_XDP?
>>>>> Or is it better to introduce a new BPF prog type (enum bpf_prog_type)
>>>>> for XDP TX-hook ?
>>>>
>>>> I think a new program type would make the most sense. If/when we
>>>> introduce an XDP TX hook[0], it should have different semantics than the
>>>> regular XDP hook. I view the XDP TX hook as a hook that executes as the
>>>> very last thing before packets leave the interface. It should have
>>>> access to different context data as you say, but also I don't think it
>>>> makes sense to have XDP_TX and XDP_REDIRECT in an XDP_TX hook. And we
>>>> may also want to have a "throttle" return code; or maybe that could be
>>>> done via a helper?
>>>>
>>>> In any case, I don't think this "emulated RX hook on the other end of a
>>>> virtual device" model that this series introduces is the right semantics
>>>> for an XDP TX hook. I can see what you're trying to do, and for virtual
>>>> point-to-point links I think it may make sense to emulate the RX hook of
>>>> the "other end" on TX. However, form a UAPI perspective, I don't think
>>>> we should be calling this a TX hook; logically, it's still an RX hook
>>>> on the receive end.
>>>>
>>>> If you guys are up for evolving this design into a "proper" TX hook (as
>>>> outlined above an in [0]), that would be awesome, of course. But not
>>>> sure what constraints you have on your original problem? Do you
>>>> specifically need the "emulated RX hook for unmodified XDP programs"
>>>> semantics, or could your problem be solved with a TX hook with different
>>>> semantics?
>>>
>>> I agree with above.
>>> It looks more like existing BPF_PROG_TYPE_XDP, but attached to egress
>>> of veth/tap interface. I think only attachment point makes a difference.
>>> May be use expected_attach_type ?
>>> Then there will be no need to create new program type.
>>> BPF_PROG_TYPE_XDP will be able to access different fields depending
>>> on expected_attach_type. Like rx-queue length that Jesper is suggesting
>>> will be available only in such case and not for all BPF_PROG_TYPE_XDP progs.
>>> It can be reduced too. Like if there is no xdp.rxq concept for egress side
>>> of virtual device the access to that field can disallowed by the verifier.
>>> Could you also call it XDP_EGRESS instead of XDP_TX?
>>> I would like to reserve XDP_TX name to what Toke describes as XDP_TX.
>>>
>>
>>   From the discussion over this set, it makes sense to have new type of
>> program. As David suggested it will make a way for changes specific
>> to egress path.
>> On the other hand, XDP offload with virtio-net implementation is based
>> on "emulated RX hook". How about having this special behavior with
>> expected_attach_type?
> 
> Another thought I had re: this was that for these "special" virtual
> point-to-point devices we could extend the API to have an ATTACH_PEER
> flag. So if you have a pair of veth devices (veth0,veth1) connecting to
> each other, you could do either of:
> 
> bpf_set_link_xdp_fd(ifindex(veth0), prog_fd, 0);
> bpf_set_link_xdp_fd(ifindex(veth1), prog_fd, ATTACH_PEER);
> 
> to attach to veth0, and:
> 
> bpf_set_link_xdp_fd(ifindex(veth1), prog_fd, 0);
> bpf_set_link_xdp_fd(ifindex(veth0), prog_fd, ATTACH_PEER);
> 
> to attach to veth0.
> 
> This would allow to attach to a device without having the "other end"
> visible, and keep the "XDP runs on RX" semantics clear to userspace.
> Internally in the kernel we could then turn the "attach to peer"
> operation for a tun device into the "emulate on TX" thing you're already
> doing?
> 
> Would this work for your use case, do you think?
> 
> -Toke
> 

This is nice from a UAPI point of view. It may work for the veth case
but not for XDP offload with virtio-net. Please see the sequence when
a user program in the guest wants to offload a program to tun.

* User program loads the program by setting the offload flag and
   ifindex:

- map_offload_ops->alloc()
   virtio-net sends map info to qemu and it creates map on the host.
- prog_offload_ops->setup()
   New callback just to have a copy of unmodified program. It contains
   original map fds. We replace map fds with fds from the host side.
   Check the program for unsupported helper calls.
- prog_offload_ops->finalize()
   Send the program to qemu and it loads the program to the host.

* User program calls bpf_set_link_xdp_fd()
   virtio-net handles XDP_PROG_SETUP_HW by sending a request to qemu.
   Qemu then attaches the host side program fd to the respective tun
   device by calling bpf_set_link_xdp_fd()
   calling bpf_set_link_xdp_fd()

In the above sequence there is no chance to use the ATTACH_PEER flag.

Here is how other ideas from this discussion can be used:

- Introduce BPF_PROG_TYPE_TX_XDP for egress path. Have a special
   behavior of emulating RX XDP using expected_attach_type flag.
- The emulated RX XDP will be restrictive in terms of helper calls.
- In offload case qemu will load the program BPF_PROG_TYPE_TX_XDP and
   set expected_attach_type.
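
A sketch of the guest-side load under that scheme (the program type
and attach type names are the proposed/hypothetical parts):

	struct bpf_prog_load_attr attr = {
		.file = "xdp_prog.o",
		.prog_type = BPF_PROG_TYPE_TX_XDP,	/* proposed */
		.expected_attach_type = BPF_XDP_EMULATED_RX, /* hypothetical */
		.ifindex = ifindex,	/* offload to this netdev */
	};
	struct bpf_object *obj;
	int prog_fd;

	err = bpf_prog_load_xattr(&attr, &obj, &prog_fd);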

What is your opinion about it? Does the driver implementing egress
XDP need to know what kind of XDP program it is running?

Thanks,
Prashant

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [RFC net-next 11/14] tun: run XDP program in tx path
  2019-12-20  0:07             ` Prashant Bhole
@ 2019-12-20  3:24               ` Jason Wang
  2019-12-20  4:46                 ` Prashant Bhole
  0 siblings, 1 reply; 36+ messages in thread
From: Jason Wang @ 2019-12-20  3:24 UTC (permalink / raw)
  To: Prashant Bhole, Toke Høiland-Jørgensen,
	Alexei Starovoitov, Jesper Dangaard Brouer
  Cc: David S . Miller, Michael S . Tsirkin, Alexei Starovoitov,
	Daniel Borkmann, Jesper Dangaard Brouer, David Ahern,
	Jakub Kicinski, John Fastabend, Toshiaki Makita,
	Martin KaFai Lau, Song Liu, Yonghong Song, Andrii Nakryiko,
	netdev, Ilias Apalodimas


On 2019/12/20 上午8:07, Prashant Bhole wrote:
> Note: Resending my last response. It was not delivered to netdev list
> due to some problem.
>
> On 12/19/19 7:15 PM, Toke Høiland-Jørgensen wrote:
>> Prashant Bhole <prashantbhole.linux@gmail.com> writes:
>>
>>> On 12/19/19 3:19 AM, Alexei Starovoitov wrote:
>>>> On Wed, Dec 18, 2019 at 12:48:59PM +0100, Toke Høiland-Jørgensen 
>>>> wrote:
>>>>> Jesper Dangaard Brouer <jbrouer@redhat.com> writes:
>>>>>
>>>>>> On Wed, 18 Dec 2019 17:10:47 +0900
>>>>>> Prashant Bhole <prashantbhole.linux@gmail.com> wrote:
>>>>>>
>>>>>>> +static u32 tun_do_xdp_tx(struct tun_struct *tun, struct 
>>>>>>> tun_file *tfile,
>>>>>>> +             struct xdp_frame *frame)
>>>>>>> +{
>>>>>>> +    struct bpf_prog *xdp_prog;
>>>>>>> +    struct tun_page tpage;
>>>>>>> +    struct xdp_buff xdp;
>>>>>>> +    u32 act = XDP_PASS;
>>>>>>> +    int flush = 0;
>>>>>>> +
>>>>>>> +    xdp_prog = rcu_dereference(tun->xdp_tx_prog);
>>>>>>> +    if (xdp_prog) {
>>>>>>> +        xdp.data_hard_start = frame->data - frame->headroom;
>>>>>>> +        xdp.data = frame->data;
>>>>>>> +        xdp.data_end = xdp.data + frame->len;
>>>>>>> +        xdp.data_meta = xdp.data - frame->metasize;
>>>>>>
>>>>>> You have not configured xdp.rxq, thus a BPF-prog accessing this 
>>>>>> will crash.
>>>>>>
>>>>>> For an XDP TX hook, I want us to provide/give BPF-prog access to 
>>>>>> some
>>>>>> more information about e.g. the current tx-queue length, or TC-q 
>>>>>> number.
>>>>>>
>>>>>> Question to Daniel or Alexei, can we do this and still keep 
>>>>>> BPF_PROG_TYPE_XDP?
>>>>>> Or is it better to introduce a new BPF prog type (enum 
>>>>>> bpf_prog_type)
>>>>>> for XDP TX-hook ?
>>>>>
>>>>> I think a new program type would make the most sense. If/when we
>>>>> introduce an XDP TX hook[0], it should have different semantics 
>>>>> than the
>>>>> regular XDP hook. I view the XDP TX hook as a hook that executes 
>>>>> as the
>>>>> very last thing before packets leave the interface. It should have
>>>>> access to different context data as you say, but also I don't 
>>>>> think it
>>>>> makes sense to have XDP_TX and XDP_REDIRECT in an XDP_TX hook. And we
>>>>> may also want to have a "throttle" return code; or maybe that 
>>>>> could be
>>>>> done via a helper?
>>>>>
>>>>> In any case, I don't think this "emulated RX hook on the other end 
>>>>> of a
>>>>> virtual device" model that this series introduces is the right 
>>>>> semantics
>>>>> for an XDP TX hook. I can see what you're trying to do, and for 
>>>>> virtual
>>>>> point-to-point links I think it may make sense to emulate the RX 
>>>>> hook of
>>>>> the "other end" on TX. However, form a UAPI perspective, I don't 
>>>>> think
>>>>> we should be calling this a TX hook; logically, it's still an RX hook
>>>>> on the receive end.
>>>>>
>>>>> If you guys are up for evolving this design into a "proper" TX 
>>>>> hook (as
>>>>> outlined above an in [0]), that would be awesome, of course. But not
>>>>> sure what constraints you have on your original problem? Do you
>>>>> specifically need the "emulated RX hook for unmodified XDP programs"
>>>>> semantics, or could your problem be solved with a TX hook with 
>>>>> different
>>>>> semantics?
>>>>
>>>> I agree with above.
>>>> It looks more like existing BPF_PROG_TYPE_XDP, but attached to egress
>>>> of veth/tap interface. I think only attachment point makes a 
>>>> difference.
>>>> May be use expected_attach_type ?
>>>> Then there will be no need to create new program type.
>>>> BPF_PROG_TYPE_XDP will be able to access different fields depending
>>>> on expected_attach_type. Like rx-queue length that Jesper is 
>>>> suggesting
>>>> will be available only in such case and not for all 
>>>> BPF_PROG_TYPE_XDP progs.
>>>> It can be reduced too. Like if there is no xdp.rxq concept for 
>>>> egress side
>>>> of virtual device the access to that field can disallowed by the 
>>>> verifier.
>>>> Could you also call it XDP_EGRESS instead of XDP_TX?
>>>> I would like to reserve XDP_TX name to what Toke describes as XDP_TX.
>>>>
>>>
>>>   From the discussion over this set, it makes sense to have new type of
>>> program. As David suggested it will make a way for changes specific
>>> to egress path.
>>> On the other hand, XDP offload with virtio-net implementation is based
>>> on "emulated RX hook". How about having this special behavior with
>>> expected_attach_type?
>>
>> Another thought I had re: this was that for these "special" virtual
>> point-to-point devices we could extend the API to have an ATTACH_PEER
>> flag. So if you have a pair of veth devices (veth0,veth1) connecting to
>> each other, you could do either of:
>>
>> bpf_set_link_xdp_fd(ifindex(veth0), prog_fd, 0);
>> bpf_set_link_xdp_fd(ifindex(veth1), prog_fd, ATTACH_PEER);
>>
>> to attach to veth0, and:
>>
>> bpf_set_link_xdp_fd(ifindex(veth1), prog_fd, 0);
>> bpf_set_link_xdp_fd(ifindex(veth0), prog_fd, ATTACH_PEER);
>>
>> to attach to veth0.
>>
>> This would allow to attach to a device without having the "other end"
>> visible, and keep the "XDP runs on RX" semantics clear to userspace.
>> Internally in the kernel we could then turn the "attach to peer"
>> operation for a tun device into the "emulate on TX" thing you're already
>> doing?
>>
>> Would this work for your use case, do you think?
>>
>> -Toke
>>
>
> This is nice from UAPI point of view. It may work for veth case but
> not for XDP offload with virtio-net. Please see the sequence when
> a user program in the guest wants to offload a program to tun.
>
> * User program wants to loads the program by setting offload flag and
>   ifindex:
>
> - map_offload_ops->alloc()
>   virtio-net sends map info to qemu and it creates map on the host.
> - prog_offload_ops->setup()
>   New callback just to have a copy of unmodified program. It contains
>   original map fds. We replace map fds with fds from the host side.
>   Check the program for unsupported helpers calls.
> - prog_offload_ops->finalize()
>   Send the program to qemu and it loads the program to the host.
>
> * User program calls bpf_set_link_xdp_fd()
>   virtio-net handles XDP_PROG_SETUP_HW by sending a request to qemu.
>   Qemu then attaches host side program fd to respective tun device by
>   calling bpf_set_link_xdp_fd()
>
> In above sequence there is no chance to use.


For a VM, I think what Toke meant is to consider virtio-net as the peer
of TAP, in which case we could do something like the following in qemu:

bpf_set_link_xdp_fd(ifindex(tap0), prog_fd, ATTACH_PEER);

And the behavior of XDP_TX could be kept as if the XDP program were
attached to the peer of TAP (actually the virtio-net inside the guest).

Thanks


>
> Here is how other ideas from this discussion can be used:
>
> - Introduce BPF_PROG_TYPE_TX_XDP for egress path. Have a special
>   behavior of emulating RX XDP using expected_attach_type flag.
> - The emulated RX XDP will be restrictive in terms of helper calls.
> - In offload case qemu will load the program BPF_PROG_TYPE_TX_XDP and
>   set expected_attach_type.
>
> What is your opinion about it? Does the driver implementing egress
> XDP needs to know what kind of XDP program it is running?
>
> Thanks,
> Prashant
>


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [RFC net-next 11/14] tun: run XDP program in tx path
  2019-12-20  3:24               ` Jason Wang
@ 2019-12-20  4:46                 ` Prashant Bhole
  2019-12-20  7:36                   ` Jason Wang
                                     ` (2 more replies)
  0 siblings, 3 replies; 36+ messages in thread
From: Prashant Bhole @ 2019-12-20  4:46 UTC (permalink / raw)
  To: Jason Wang, Toke Høiland-Jørgensen, Alexei Starovoitov,
	Jesper Dangaard Brouer
  Cc: David S . Miller, Michael S . Tsirkin, Alexei Starovoitov,
	Daniel Borkmann, Jesper Dangaard Brouer, David Ahern,
	Jakub Kicinski, John Fastabend, Toshiaki Makita,
	Martin KaFai Lau, Song Liu, Yonghong Song, Andrii Nakryiko,
	netdev, Ilias Apalodimas



On 12/20/19 12:24 PM, Jason Wang wrote:
> 
> On 2019/12/20 8:07 AM, Prashant Bhole wrote:
>> Note: Resending my last response. It was not delivered to netdev list
>> due to some problem.
>>
>> On 12/19/19 7:15 PM, Toke Høiland-Jørgensen wrote:
>>> Prashant Bhole <prashantbhole.linux@gmail.com> writes:
>>>
>>>> On 12/19/19 3:19 AM, Alexei Starovoitov wrote:
>>>>> On Wed, Dec 18, 2019 at 12:48:59PM +0100, Toke Høiland-Jørgensen 
>>>>> wrote:
>>>>>> Jesper Dangaard Brouer <jbrouer@redhat.com> writes:
>>>>>>
>>>>>>> On Wed, 18 Dec 2019 17:10:47 +0900
>>>>>>> Prashant Bhole <prashantbhole.linux@gmail.com> wrote:
>>>>>>>
>>>>>>>> +static u32 tun_do_xdp_tx(struct tun_struct *tun, struct 
>>>>>>>> tun_file *tfile,
>>>>>>>> +             struct xdp_frame *frame)
>>>>>>>> +{
>>>>>>>> +    struct bpf_prog *xdp_prog;
>>>>>>>> +    struct tun_page tpage;
>>>>>>>> +    struct xdp_buff xdp;
>>>>>>>> +    u32 act = XDP_PASS;
>>>>>>>> +    int flush = 0;
>>>>>>>> +
>>>>>>>> +    xdp_prog = rcu_dereference(tun->xdp_tx_prog);
>>>>>>>> +    if (xdp_prog) {
>>>>>>>> +        xdp.data_hard_start = frame->data - frame->headroom;
>>>>>>>> +        xdp.data = frame->data;
>>>>>>>> +        xdp.data_end = xdp.data + frame->len;
>>>>>>>> +        xdp.data_meta = xdp.data - frame->metasize;
>>>>>>>
>>>>>>> You have not configured xdp.rxq, thus a BPF-prog accessing this 
>>>>>>> will crash.
>>>>>>>
>>>>>>> For an XDP TX hook, I want us to give the BPF-prog access to some
>>>>>>> more information, e.g. the current tx-queue length or TC-q number.
>>>>>>>
>>>>>>> Question to Daniel or Alexei, can we do this and still keep 
>>>>>>> BPF_PROG_TYPE_XDP?
>>>>>>> Or is it better to introduce a new BPF prog type (enum 
>>>>>>> bpf_prog_type)
>>>>>>> for an XDP TX-hook?
>>>>>>
>>>>>> I think a new program type would make the most sense. If/when we
>>>>>> introduce an XDP TX hook[0], it should have different semantics 
>>>>>> than the
>>>>>> regular XDP hook. I view the XDP TX hook as a hook that executes 
>>>>>> as the
>>>>>> very last thing before packets leave the interface. It should have
>>>>>> access to different context data as you say, but also I don't 
>>>>>> think it
>>>>>> makes sense to have XDP_TX and XDP_REDIRECT in an XDP_TX hook. And we
>>>>>> may also want to have a "throttle" return code; or maybe that 
>>>>>> could be
>>>>>> done via a helper?
>>>>>>
>>>>>> In any case, I don't think this "emulated RX hook on the other end 
>>>>>> of a
>>>>>> virtual device" model that this series introduces is the right 
>>>>>> semantics
>>>>>> for an XDP TX hook. I can see what you're trying to do, and for 
>>>>>> virtual
>>>>>> point-to-point links I think it may make sense to emulate the RX 
>>>>>> hook of
>>>>>> the "other end" on TX. However, from a UAPI perspective, I don't
>>>>>> think
>>>>>> we should be calling this a TX hook; logically, it's still an RX hook
>>>>>> on the receive end.
>>>>>>
>>>>>> If you guys are up for evolving this design into a "proper" TX 
>>>>>> hook (as
>>>>>> outlined above and in [0]), that would be awesome, of course. But not
>>>>>> sure what constraints you have on your original problem? Do you
>>>>>> specifically need the "emulated RX hook for unmodified XDP programs"
>>>>>> semantics, or could your problem be solved with a TX hook with 
>>>>>> different
>>>>>> semantics?
>>>>>
>>>>> I agree with the above.
>>>>> It looks more like the existing BPF_PROG_TYPE_XDP, but attached to the
>>>>> egress of a veth/tap interface. I think only the attachment point
>>>>> makes a difference.
>>>>> Maybe use expected_attach_type?
>>>>> Then there will be no need to create a new program type.
>>>>> BPF_PROG_TYPE_XDP will be able to access different fields depending
>>>>> on expected_attach_type. Like the rx-queue length that Jesper is
>>>>> suggesting will be available only in that case and not for all
>>>>> BPF_PROG_TYPE_XDP progs.
>>>>> It can be reduced too. Like if there is no xdp.rxq concept for the
>>>>> egress side of a virtual device, access to that field can be
>>>>> disallowed by the verifier.
>>>>> Could you also call it XDP_EGRESS instead of XDP_TX?
>>>>> I would like to reserve the XDP_TX name for what Toke describes as
>>>>> XDP_TX.
>>>>>
>>>>
>>>>   From the discussion over this set, it makes sense to have a new type
>>>> of program. As David suggested, it will make way for changes specific
>>>> to the egress path.
>>>> On the other hand, the XDP offload with virtio-net implementation is
>>>> based on an "emulated RX hook". How about having this special behavior
>>>> with expected_attach_type?
>>>
>>> Another thought I had re: this was that for these "special" virtual
>>> point-to-point devices we could extend the API to have an ATTACH_PEER
>>> flag. So if you have a pair of veth devices (veth0,veth1) connecting to
>>> each other, you could do either of:
>>>
>>> bpf_set_link_xdp_fd(ifindex(veth0), prog_fd, 0);
>>> bpf_set_link_xdp_fd(ifindex(veth1), prog_fd, ATTACH_PEER);
>>>
>>> to attach to veth0, and:
>>>
>>> bpf_set_link_xdp_fd(ifindex(veth1), prog_fd, 0);
>>> bpf_set_link_xdp_fd(ifindex(veth0), prog_fd, ATTACH_PEER);
>>>
>>> to attach to veth1.
>>>
>>> This would allow attaching to a device without having the "other end"
>>> visible, and keep the "XDP runs on RX" semantics clear to userspace.
>>> Internally in the kernel we could then turn the "attach to peer"
>>> operation for a tun device into the "emulate on TX" thing you're already
>>> doing?
>>>
>>> Would this work for your use case, do you think?
>>>
>>> -Toke
>>>
>>
>> This is nice from a UAPI point of view. It may work for the veth case but
>> not for XDP offload with virtio-net. Please see the sequence when
>> a user program in the guest wants to offload a program to tun.
>>
>> * User program wants to load the program by setting the offload flag and
>>   ifindex:
>>
>> - map_offload_ops->alloc()
>>   virtio-net sends map info to qemu and it creates map on the host.
>> - prog_offload_ops->setup()
>>   New callback just to have a copy of unmodified program. It contains
>>   original map fds. We replace map fds with fds from the host side.
>>   Check the program for unsupported helper calls.
>> - prog_offload_ops->finalize()
>>   Send the program to qemu and it loads the program to the host.
>>
>> * User program calls bpf_set_link_xdp_fd()
>>   virtio-net handles XDP_PROG_SETUP_HW by sending a request to qemu.
>>   Qemu then attaches host side program fd to respective tun device by
>>   calling bpf_set_link_xdp_fd()
>>
>> In the above sequence there is no chance to use ATTACH_PEER.
> 
> 
> For a VM, I think what Toke meant is to consider virtio-net as the peer
> of the TAP; in that case we can do something like the following in qemu:
>
> bpf_set_link_xdp_fd(ifindex(tap0), prog_fd, ATTACH_PEER);
>
> The behavior of XDP_TX could then be kept as if the XDP program were
> attached to the peer of the TAP (actually the virtio-net inside the
> guest).

I think he meant actually attaching the program to the peer, most
probably referring to the use case I mentioned in the cover letter.

"It can improve container networking where veth pair links the host and
the container. Host can set ACL by setting tx path XDP to the veth
iface."

Toke, Can you please clarify?

Thanks!

> 
> Thanks
> 
> 
>>
>> Here is how other ideas from this discussion can be used:
>>
>> - Introduce BPF_PROG_TYPE_TX_XDP for the egress path. Have a special
>>   behavior of emulating RX XDP using the expected_attach_type flag.
>> - The emulated RX XDP will be restrictive in terms of helper calls.
>> - In the offload case qemu will load the program as BPF_PROG_TYPE_TX_XDP
>>   and set expected_attach_type.
>>
>> What is your opinion about it? Does the driver implementing egress
>> XDP need to know what kind of XDP program it is running?
>>
>> Thanks,
>> Prashant
>>
> 

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [RFC net-next 11/14] tun: run XDP program in tx path
  2019-12-20  4:46                 ` Prashant Bhole
@ 2019-12-20  7:36                   ` Jason Wang
  2019-12-20 10:11                   ` Toke Høiland-Jørgensen
  2019-12-20 16:11                   ` David Ahern
  2 siblings, 0 replies; 36+ messages in thread
From: Jason Wang @ 2019-12-20  7:36 UTC (permalink / raw)
  To: Prashant Bhole, Toke Høiland-Jørgensen,
	Alexei Starovoitov, Jesper Dangaard Brouer
  Cc: David S . Miller, Michael S . Tsirkin, Alexei Starovoitov,
	Daniel Borkmann, Jesper Dangaard Brouer, David Ahern,
	Jakub Kicinski, John Fastabend, Toshiaki Makita,
	Martin KaFai Lau, Song Liu, Yonghong Song, Andrii Nakryiko,
	netdev, Ilias Apalodimas


On 2019/12/20 12:46 PM, Prashant Bhole wrote:
>
>
> On 12/20/19 12:24 PM, Jason Wang wrote:
>>
>> On 2019/12/20 8:07 AM, Prashant Bhole wrote:
>>> Note: Resending my last response. It was not delivered to netdev list
>>> due to some problem.
>>>
>>> On 12/19/19 7:15 PM, Toke Høiland-Jørgensen wrote:
>>>> Prashant Bhole <prashantbhole.linux@gmail.com> writes:
>>>>
>>>>> On 12/19/19 3:19 AM, Alexei Starovoitov wrote:
>>>>>> On Wed, Dec 18, 2019 at 12:48:59PM +0100, Toke Høiland-Jørgensen 
>>>>>> wrote:
>>>>>>> Jesper Dangaard Brouer <jbrouer@redhat.com> writes:
>>>>>>>
>>>>>>>> On Wed, 18 Dec 2019 17:10:47 +0900
>>>>>>>> Prashant Bhole <prashantbhole.linux@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> +static u32 tun_do_xdp_tx(struct tun_struct *tun, struct 
>>>>>>>>> tun_file *tfile,
>>>>>>>>> +             struct xdp_frame *frame)
>>>>>>>>> +{
>>>>>>>>> +    struct bpf_prog *xdp_prog;
>>>>>>>>> +    struct tun_page tpage;
>>>>>>>>> +    struct xdp_buff xdp;
>>>>>>>>> +    u32 act = XDP_PASS;
>>>>>>>>> +    int flush = 0;
>>>>>>>>> +
>>>>>>>>> +    xdp_prog = rcu_dereference(tun->xdp_tx_prog);
>>>>>>>>> +    if (xdp_prog) {
>>>>>>>>> +        xdp.data_hard_start = frame->data - frame->headroom;
>>>>>>>>> +        xdp.data = frame->data;
>>>>>>>>> +        xdp.data_end = xdp.data + frame->len;
>>>>>>>>> +        xdp.data_meta = xdp.data - frame->metasize;
>>>>>>>>
>>>>>>>> You have not configured xdp.rxq, thus a BPF-prog accessing this 
>>>>>>>> will crash.
>>>>>>>>
>>>>>>>> For an XDP TX hook, I want us to give the BPF-prog access to some
>>>>>>>> more information, e.g. the current tx-queue length or TC-q number.
>>>>>>>>
>>>>>>>> Question to Daniel or Alexei, can we do this and still keep 
>>>>>>>> BPF_PROG_TYPE_XDP?
>>>>>>>> Or is it better to introduce a new BPF prog type (enum 
>>>>>>>> bpf_prog_type)
>>>>>>>> for an XDP TX-hook?
>>>>>>>
>>>>>>> I think a new program type would make the most sense. If/when we
>>>>>>> introduce an XDP TX hook[0], it should have different semantics 
>>>>>>> than the
>>>>>>> regular XDP hook. I view the XDP TX hook as a hook that executes 
>>>>>>> as the
>>>>>>> very last thing before packets leave the interface. It should have
>>>>>>> access to different context data as you say, but also I don't 
>>>>>>> think it
>>>>>>> makes sense to have XDP_TX and XDP_REDIRECT in an XDP_TX hook. 
>>>>>>> And we
>>>>>>> may also want to have a "throttle" return code; or maybe that 
>>>>>>> could be
>>>>>>> done via a helper?
>>>>>>>
>>>>>>> In any case, I don't think this "emulated RX hook on the other 
>>>>>>> end of a
>>>>>>> virtual device" model that this series introduces is the right 
>>>>>>> semantics
>>>>>>> for an XDP TX hook. I can see what you're trying to do, and for 
>>>>>>> virtual
>>>>>>> point-to-point links I think it may make sense to emulate the RX 
>>>>>>> hook of
>>>>>>> the "other end" on TX. However, from a UAPI perspective, I don't
>>>>>>> think
>>>>>>> we should be calling this a TX hook; logically, it's still an RX 
>>>>>>> hook
>>>>>>> on the receive end.
>>>>>>>
>>>>>>> If you guys are up for evolving this design into a "proper" TX 
>>>>>>> hook (as
>>>>>>> outlined above and in [0]), that would be awesome, of course. But
>>>>>>> not
>>>>>>> sure what constraints you have on your original problem? Do you
>>>>>>> specifically need the "emulated RX hook for unmodified XDP 
>>>>>>> programs"
>>>>>>> semantics, or could your problem be solved with a TX hook with 
>>>>>>> different
>>>>>>> semantics?
>>>>>>
>>>>>> I agree with the above.
>>>>>> It looks more like the existing BPF_PROG_TYPE_XDP, but attached to the
>>>>>> egress of a veth/tap interface. I think only the attachment point
>>>>>> makes a difference.
>>>>>> Maybe use expected_attach_type?
>>>>>> Then there will be no need to create a new program type.
>>>>>> BPF_PROG_TYPE_XDP will be able to access different fields depending
>>>>>> on expected_attach_type. Like the rx-queue length that Jesper is
>>>>>> suggesting will be available only in that case and not for all
>>>>>> BPF_PROG_TYPE_XDP progs.
>>>>>> It can be reduced too. Like if there is no xdp.rxq concept for the
>>>>>> egress side of a virtual device, access to that field can be
>>>>>> disallowed by the verifier.
>>>>>> Could you also call it XDP_EGRESS instead of XDP_TX?
>>>>>> I would like to reserve the XDP_TX name for what Toke describes as
>>>>>> XDP_TX.
>>>>>>
>>>>>
>>>>>   From the discussion over this set, it makes sense to have a new
>>>>> type of program. As David suggested, it will make way for changes
>>>>> specific to the egress path.
>>>>> On the other hand, the XDP offload with virtio-net implementation is
>>>>> based on an "emulated RX hook". How about having this special
>>>>> behavior with expected_attach_type?
>>>>
>>>> Another thought I had re: this was that for these "special" virtual
>>>> point-to-point devices we could extend the API to have an ATTACH_PEER
>>>> flag. So if you have a pair of veth devices (veth0,veth1) 
>>>> connecting to
>>>> each other, you could do either of:
>>>>
>>>> bpf_set_link_xdp_fd(ifindex(veth0), prog_fd, 0);
>>>> bpf_set_link_xdp_fd(ifindex(veth1), prog_fd, ATTACH_PEER);
>>>>
>>>> to attach to veth0, and:
>>>>
>>>> bpf_set_link_xdp_fd(ifindex(veth1), prog_fd, 0);
>>>> bpf_set_link_xdp_fd(ifindex(veth0), prog_fd, ATTACH_PEER);
>>>>
>>>> to attach to veth1.
>>>>
>>>> This would allow attaching to a device without having the "other end"
>>>> visible, and keep the "XDP runs on RX" semantics clear to userspace.
>>>> Internally in the kernel we could then turn the "attach to peer"
>>>> operation for a tun device into the "emulate on TX" thing you're 
>>>> already
>>>> doing?
>>>>
>>>> Would this work for your use case, do you think?
>>>>
>>>> -Toke
>>>>
>>>
>>> This is nice from a UAPI point of view. It may work for the veth case but
>>> not for XDP offload with virtio-net. Please see the sequence when
>>> a user program in the guest wants to offload a program to tun.
>>>
>>> * User program wants to load the program by setting the offload flag and
>>>   ifindex:
>>>
>>> - map_offload_ops->alloc()
>>>   virtio-net sends map info to qemu and it creates map on the host.
>>> - prog_offload_ops->setup()
>>>   New callback just to have a copy of unmodified program. It contains
>>>   original map fds. We replace map fds with fds from the host side.
>>>   Check the program for unsupported helper calls.
>>> - prog_offload_ops->finalize()
>>>   Send the program to qemu and it loads the program to the host.
>>>
>>> * User program calls bpf_set_link_xdp_fd()
>>>   virtio-net handles XDP_PROG_SETUP_HW by sending a request to qemu.
>>>   Qemu then attaches host side program fd to respective tun device by
>>>   calling bpf_set_link_xdp_fd()
>>>
>>> In the above sequence there is no chance to use ATTACH_PEER.
>>
>>
>> For a VM, I think what Toke meant is to consider virtio-net as the peer
>> of the TAP; in that case we can do something like the following in qemu:
>>
>> bpf_set_link_xdp_fd(ifindex(tap0), prog_fd, ATTACH_PEER);
>>
>> The behavior of XDP_TX could then be kept as if the XDP program were
>> attached to the peer of the TAP (actually the virtio-net inside the
>> guest).
>
> I think he meant actually attaching the program to the peer, most
> probably referring to the use case I mentioned in the cover letter.


Interesting; this is actually a kind of XDP offloading. In the case of
veth, we had some discussion with Toshiaki, and I remember he said it
doesn't help much.

Thanks


>
> "It can improve container networking where veth pair links the host and
> the container. Host can set ACL by setting tx path XDP to the veth
> iface."
>
> Toke, Can you please clarify?
>
> Thanks!
>
>>
>> Thanks
>>
>>
>>>
>>> Here is how other ideas from this discussion can be used:
>>>
>>> - Introduce BPF_PROG_TYPE_TX_XDP for the egress path. Have a special
>>>   behavior of emulating RX XDP using the expected_attach_type flag.
>>> - The emulated RX XDP will be restrictive in terms of helper calls.
>>> - In the offload case qemu will load the program as BPF_PROG_TYPE_TX_XDP
>>>   and set expected_attach_type.
>>>
>>> What is your opinion about it? Does the driver implementing egress
>>> XDP need to know what kind of XDP program it is running?
>>>
>>> Thanks,
>>> Prashant
>>>
>>
>


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [RFC net-next 11/14] tun: run XDP program in tx path
  2019-12-20  4:46                 ` Prashant Bhole
  2019-12-20  7:36                   ` Jason Wang
@ 2019-12-20 10:11                   ` Toke Høiland-Jørgensen
  2019-12-20 16:11                   ` David Ahern
  2 siblings, 0 replies; 36+ messages in thread
From: Toke Høiland-Jørgensen @ 2019-12-20 10:11 UTC (permalink / raw)
  To: Prashant Bhole, Jason Wang, Alexei Starovoitov, Jesper Dangaard Brouer
  Cc: David S . Miller, Michael S . Tsirkin, Alexei Starovoitov,
	Daniel Borkmann, Jesper Dangaard Brouer, David Ahern,
	Jakub Kicinski, John Fastabend, Toshiaki Makita,
	Martin KaFai Lau, Song Liu, Yonghong Song, Andrii Nakryiko,
	netdev, Ilias Apalodimas

Prashant Bhole <prashantbhole.linux@gmail.com> writes:

> On 12/20/19 12:24 PM, Jason Wang wrote:
>> 
>> On 2019/12/20 8:07 AM, Prashant Bhole wrote:
>>> Note: Resending my last response. It was not delivered to netdev list
>>> due to some problem.
>>>
>>> On 12/19/19 7:15 PM, Toke Høiland-Jørgensen wrote:
>>>> Prashant Bhole <prashantbhole.linux@gmail.com> writes:
>>>>
>>>>> On 12/19/19 3:19 AM, Alexei Starovoitov wrote:
>>>>>> On Wed, Dec 18, 2019 at 12:48:59PM +0100, Toke Høiland-Jørgensen 
>>>>>> wrote:
>>>>>>> Jesper Dangaard Brouer <jbrouer@redhat.com> writes:
>>>>>>>
>>>>>>>> On Wed, 18 Dec 2019 17:10:47 +0900
>>>>>>>> Prashant Bhole <prashantbhole.linux@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> +static u32 tun_do_xdp_tx(struct tun_struct *tun, struct 
>>>>>>>>> tun_file *tfile,
>>>>>>>>> +             struct xdp_frame *frame)
>>>>>>>>> +{
>>>>>>>>> +    struct bpf_prog *xdp_prog;
>>>>>>>>> +    struct tun_page tpage;
>>>>>>>>> +    struct xdp_buff xdp;
>>>>>>>>> +    u32 act = XDP_PASS;
>>>>>>>>> +    int flush = 0;
>>>>>>>>> +
>>>>>>>>> +    xdp_prog = rcu_dereference(tun->xdp_tx_prog);
>>>>>>>>> +    if (xdp_prog) {
>>>>>>>>> +        xdp.data_hard_start = frame->data - frame->headroom;
>>>>>>>>> +        xdp.data = frame->data;
>>>>>>>>> +        xdp.data_end = xdp.data + frame->len;
>>>>>>>>> +        xdp.data_meta = xdp.data - frame->metasize;
>>>>>>>>
>>>>>>>> You have not configured xdp.rxq, thus a BPF-prog accessing this 
>>>>>>>> will crash.
>>>>>>>>
>>>>>>>> For an XDP TX hook, I want us to give the BPF-prog access to some
>>>>>>>> more information, e.g. the current tx-queue length or TC-q number.
>>>>>>>>
>>>>>>>> Question to Daniel or Alexei, can we do this and still keep 
>>>>>>>> BPF_PROG_TYPE_XDP?
>>>>>>>> Or is it better to introduce a new BPF prog type (enum 
>>>>>>>> bpf_prog_type)
>>>>>>>> for an XDP TX-hook?
>>>>>>>
>>>>>>> I think a new program type would make the most sense. If/when we
>>>>>>> introduce an XDP TX hook[0], it should have different semantics 
>>>>>>> than the
>>>>>>> regular XDP hook. I view the XDP TX hook as a hook that executes 
>>>>>>> as the
>>>>>>> very last thing before packets leave the interface. It should have
>>>>>>> access to different context data as you say, but also I don't 
>>>>>>> think it
>>>>>>> makes sense to have XDP_TX and XDP_REDIRECT in an XDP_TX hook. And we
>>>>>>> may also want to have a "throttle" return code; or maybe that 
>>>>>>> could be
>>>>>>> done via a helper?
>>>>>>>
>>>>>>> In any case, I don't think this "emulated RX hook on the other end 
>>>>>>> of a
>>>>>>> virtual device" model that this series introduces is the right 
>>>>>>> semantics
>>>>>>> for an XDP TX hook. I can see what you're trying to do, and for 
>>>>>>> virtual
>>>>>>> point-to-point links I think it may make sense to emulate the RX 
>>>>>>> hook of
>>>>>>> the "other end" on TX. However, from a UAPI perspective, I don't
>>>>>>> think
>>>>>>> we should be calling this a TX hook; logically, it's still an RX hook
>>>>>>> on the receive end.
>>>>>>>
>>>>>>> If you guys are up for evolving this design into a "proper" TX 
>>>>>>> hook (as
>>>>>>> outlined above and in [0]), that would be awesome, of course. But not
>>>>>>> sure what constraints you have on your original problem? Do you
>>>>>>> specifically need the "emulated RX hook for unmodified XDP programs"
>>>>>>> semantics, or could your problem be solved with a TX hook with 
>>>>>>> different
>>>>>>> semantics?
>>>>>>
>>>>>> I agree with the above.
>>>>>> It looks more like the existing BPF_PROG_TYPE_XDP, but attached to the
>>>>>> egress of a veth/tap interface. I think only the attachment point
>>>>>> makes a difference.
>>>>>> Maybe use expected_attach_type?
>>>>>> Then there will be no need to create a new program type.
>>>>>> BPF_PROG_TYPE_XDP will be able to access different fields depending
>>>>>> on expected_attach_type. Like the rx-queue length that Jesper is
>>>>>> suggesting will be available only in that case and not for all
>>>>>> BPF_PROG_TYPE_XDP progs.
>>>>>> It can be reduced too. Like if there is no xdp.rxq concept for the
>>>>>> egress side of a virtual device, access to that field can be
>>>>>> disallowed by the verifier.
>>>>>> Could you also call it XDP_EGRESS instead of XDP_TX?
>>>>>> I would like to reserve the XDP_TX name for what Toke describes as
>>>>>> XDP_TX.
>>>>>>
>>>>>
>>>>>   From the discussion over this set, it makes sense to have a new type
>>>>> of program. As David suggested, it will make way for changes specific
>>>>> to the egress path.
>>>>> On the other hand, the XDP offload with virtio-net implementation is
>>>>> based on an "emulated RX hook". How about having this special behavior
>>>>> with expected_attach_type?
>>>>
>>>> Another thought I had re: this was that for these "special" virtual
>>>> point-to-point devices we could extend the API to have an ATTACH_PEER
>>>> flag. So if you have a pair of veth devices (veth0,veth1) connecting to
>>>> each other, you could do either of:
>>>>
>>>> bpf_set_link_xdp_fd(ifindex(veth0), prog_fd, 0);
>>>> bpf_set_link_xdp_fd(ifindex(veth1), prog_fd, ATTACH_PEER);
>>>>
>>>> to attach to veth0, and:
>>>>
>>>> bpf_set_link_xdp_fd(ifindex(veth1), prog_fd, 0);
>>>> bpf_set_link_xdp_fd(ifindex(veth0), prog_fd, ATTACH_PEER);
>>>>
>>>> to attach to veth1.
>>>>
>>>> This would allow attaching to a device without having the "other end"
>>>> visible, and keep the "XDP runs on RX" semantics clear to userspace.
>>>> Internally in the kernel we could then turn the "attach to peer"
>>>> operation for a tun device into the "emulate on TX" thing you're already
>>>> doing?
>>>>
>>>> Would this work for your use case, do you think?
>>>>
>>>> -Toke
>>>>
>>>
>>> This is nice from a UAPI point of view. It may work for the veth case but
>>> not for XDP offload with virtio-net. Please see the sequence when
>>> a user program in the guest wants to offload a program to tun.
>>>
>>> * User program wants to load the program by setting the offload flag and
>>>   ifindex:
>>>
>>> - map_offload_ops->alloc()
>>>   virtio-net sends map info to qemu and it creates map on the host.
>>> - prog_offload_ops->setup()
>>>   New callback just to have a copy of unmodified program. It contains
>>>   original map fds. We replace map fds with fds from the host side.
>>>   Check the program for unsupported helper calls.
>>> - prog_offload_ops->finalize()
>>>   Send the program to qemu and it loads the program to the host.
>>>
>>> * User program calls bpf_set_link_xdp_fd()
>>>   virtio-net handles XDP_PROG_SETUP_HW by sending a request to qemu.
>>>   Qemu then attaches host side program fd to respective tun device by
>>>   calling bpf_set_link_xdp_fd()
>>>
>>> In the above sequence there is no chance to use ATTACH_PEER.
>> 
>> 
>> For a VM, I think what Toke meant is to consider virtio-net as the peer
>> of the TAP; in that case we can do something like the following in qemu:
>>
>> bpf_set_link_xdp_fd(ifindex(tap0), prog_fd, ATTACH_PEER);
>>
>> The behavior of XDP_TX could then be kept as if the XDP program were
>> attached to the peer of the TAP (actually the virtio-net inside the
>> guest).
>
> I think he meant actually attaching the program to the peer, most
> probably referring to the use case I mentioned in the cover letter.
>
> "It can improve container networking where veth pair links the host and
> the container. Host can set ACL by setting tx path XDP to the veth
> iface."
>
> Toke, Can you please clarify?

Well, I was mostly talking about the UAPI. I.e., the ATTACH_PEER flag
should *behave as if* the XDP program ran on RX on the peer side.
Whether it actually just attaches on that side, or whether it is
emulated like your "run in TX path" patch does, would have to be
context dependent, I think. For TAP/virtio_net, I think it would be
pretty obvious that we should just emulate it, like Jason said. For
veth I'm not sure; there are some gnarly details if we do actually
move the attachment to the peer interface when that peer is in a
different namespace (a container probably shouldn't be able to install
an XDP program outside its own namespace, for instance). So we could
say we attach to the actual peer if both interfaces are in the same
namespace, and emulate otherwise? IDK?
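
Something like the sketch below, maybe (all the helper names are made
up; this is just to illustrate the dispatch):

/* Hypothetical veth handling of ATTACH_PEER: attach to the real peer
 * only when both ends live in the same netns, so a container can't
 * install a program outside its own namespace; otherwise fall back
 * to emulating the peer's RX hook on the TX path.
 */
static int veth_xdp_attach_peer(struct net_device *dev,
                                struct bpf_prog *prog)
{
        struct veth_priv *priv = netdev_priv(dev);
        struct net_device *peer = rtnl_dereference(priv->peer);

        if (peer && net_eq(dev_net(dev), dev_net(peer)))
                return veth_xdp_install_rx(peer, prog);

        return veth_xdp_install_tx_emulation(dev, prog);
}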

-Toke


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [RFC net-next 11/14] tun: run XDP program in tx path
  2019-12-20  4:46                 ` Prashant Bhole
  2019-12-20  7:36                   ` Jason Wang
  2019-12-20 10:11                   ` Toke Høiland-Jørgensen
@ 2019-12-20 16:11                   ` David Ahern
  2019-12-20 22:17                     ` Prashant Bhole
  2 siblings, 1 reply; 36+ messages in thread
From: David Ahern @ 2019-12-20 16:11 UTC (permalink / raw)
  To: Prashant Bhole, Jason Wang, Toke Høiland-Jørgensen,
	Alexei Starovoitov, Jesper Dangaard Brouer
  Cc: David S . Miller, Michael S . Tsirkin, Alexei Starovoitov,
	Daniel Borkmann, Jesper Dangaard Brouer, Jakub Kicinski,
	John Fastabend, Toshiaki Makita, Martin KaFai Lau, Song Liu,
	Yonghong Song, Andrii Nakryiko, netdev, Ilias Apalodimas

On 12/19/19 9:46 PM, Prashant Bhole wrote:
> 
> "It can improve container networking where veth pair links the host and
> the container. Host can set ACL by setting tx path XDP to the veth
> iface."

Just to be clear, this is the use case of interest to me, not the
offloading. I want programs managed by and viewable by the host OS and
not necessarily viewable by the guest OS or container programs.
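
To make that concrete: the ACL itself would be an ordinary XDP program,
e.g. something as small as the sketch below (the port number is just an
example, and attaching it to the veth tx path would go through the new
API proposed in this series):

/* Host-managed ACL sketch: drop TCP traffic to port 22 headed into
 * the container, pass everything else. The container never sees or
 * manages this program.
 */
#include <linux/bpf.h>
#include <linux/if_ether.h>
#include <linux/in.h>
#include <linux/ip.h>
#include <linux/tcp.h>
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_endian.h>

SEC("xdp")
int veth_tx_acl(struct xdp_md *ctx)
{
        void *data_end = (void *)(long)ctx->data_end;
        void *data = (void *)(long)ctx->data;
        struct ethhdr *eth = data;
        struct iphdr *iph;
        struct tcphdr *tcp;

        if ((void *)(eth + 1) > data_end)
                return XDP_PASS;
        if (eth->h_proto != bpf_htons(ETH_P_IP))
                return XDP_PASS;

        iph = (void *)(eth + 1);
        if ((void *)(iph + 1) > data_end || iph->protocol != IPPROTO_TCP)
                return XDP_PASS;

        tcp = (void *)iph + iph->ihl * 4;
        if ((void *)(tcp + 1) > data_end)
                return XDP_PASS;

        return tcp->dest == bpf_htons(22) ? XDP_DROP : XDP_PASS;
}

char _license[] SEC("license") = "GPL";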

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [RFC net-next 11/14] tun: run XDP program in tx path
  2019-12-20 16:11                   ` David Ahern
@ 2019-12-20 22:17                     ` Prashant Bhole
  2019-12-23  6:05                       ` Jason Wang
  0 siblings, 1 reply; 36+ messages in thread
From: Prashant Bhole @ 2019-12-20 22:17 UTC (permalink / raw)
  To: David Ahern, Jason Wang, Toke Høiland-Jørgensen,
	Alexei Starovoitov, Jesper Dangaard Brouer
  Cc: David S . Miller, Michael S . Tsirkin, Alexei Starovoitov,
	Daniel Borkmann, Jesper Dangaard Brouer, Jakub Kicinski,
	John Fastabend, Toshiaki Makita, Martin KaFai Lau, Song Liu,
	Yonghong Song, Andrii Nakryiko, netdev, Ilias Apalodimas



On 12/21/2019 1:11 AM, David Ahern wrote:
> On 12/19/19 9:46 PM, Prashant Bhole wrote:
>>
>> "It can improve container networking where veth pair links the host and
>> the container. Host can set ACL by setting tx path XDP to the veth
>> iface."
> 
> Just to be clear, this is the use case of interest to me, not the
> offloading. I want programs managed by and viewable by the host OS and
> not necessarily viewable by the guest OS or container programs.
> 

Yes, the plan is to implement this while having a provision to implement
the offload feature on top of it.
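
For reference, the provision would roughly follow the callbacks
described earlier in the thread. A sketch, assuming the offload ops
grow the new ->setup() hook proposed above (this is not an in-tree
structure layout, and the virtnet_* functions are hypothetical):

static const struct bpf_prog_offload_ops virtnet_xdp_offload_ops = {
        /* keep a copy of the unmodified program, with its original
         * map fds replaced by host-side fds
         */
        .setup    = virtnet_xdp_offload_setup,
        /* send the program to qemu, which loads it on the host */
        .finalize = virtnet_xdp_offload_finalize,
};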

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [RFC net-next 11/14] tun: run XDP program in tx path
  2019-12-20 22:17                     ` Prashant Bhole
@ 2019-12-23  6:05                       ` Jason Wang
  2019-12-23  8:09                         ` Prashant Bhole
  0 siblings, 1 reply; 36+ messages in thread
From: Jason Wang @ 2019-12-23  6:05 UTC (permalink / raw)
  To: Prashant Bhole, David Ahern, Toke Høiland-Jørgensen,
	Alexei Starovoitov, Jesper Dangaard Brouer
  Cc: David S . Miller, Michael S . Tsirkin, Alexei Starovoitov,
	Daniel Borkmann, Jesper Dangaard Brouer, Jakub Kicinski,
	John Fastabend, Toshiaki Makita, Martin KaFai Lau, Song Liu,
	Yonghong Song, Andrii Nakryiko, netdev, Ilias Apalodimas


On 2019/12/21 6:17 AM, Prashant Bhole wrote:
>
>
> On 12/21/2019 1:11 AM, David Ahern wrote:
>> On 12/19/19 9:46 PM, Prashant Bhole wrote:
>>>
>>> "It can improve container networking where veth pair links the host and
>>> the container. Host can set ACL by setting tx path XDP to the veth
>>> iface."
>>
>> Just to be clear, this is the use case of interest to me, not the
>> offloading. I want programs managed by and viewable by the host OS and
>> not necessarily viewable by the guest OS or container programs.
>>
>
> Yes, the plan is to implement this while having a provision to implement
> the offload feature on top of it.


I wonder whether it's easier to focus on the TX path first and then
consider building offloading support on top.

Thanks


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [RFC net-next 11/14] tun: run XDP program in tx path
  2019-12-23  6:05                       ` Jason Wang
@ 2019-12-23  8:09                         ` Prashant Bhole
  2019-12-23  8:34                           ` Jason Wang
  0 siblings, 1 reply; 36+ messages in thread
From: Prashant Bhole @ 2019-12-23  8:09 UTC (permalink / raw)
  To: Jason Wang, David Ahern, Toke Høiland-Jørgensen,
	Alexei Starovoitov, Jesper Dangaard Brouer
  Cc: David S . Miller, Michael S . Tsirkin, Alexei Starovoitov,
	Daniel Borkmann, Jesper Dangaard Brouer, Jakub Kicinski,
	John Fastabend, Toshiaki Makita, Martin KaFai Lau, Song Liu,
	Yonghong Song, Andrii Nakryiko, netdev, Ilias Apalodimas



On 12/23/19 3:05 PM, Jason Wang wrote:
> 
> On 2019/12/21 6:17 AM, Prashant Bhole wrote:
>>
>>
>> On 12/21/2019 1:11 AM, David Ahern wrote:
>>> On 12/19/19 9:46 PM, Prashant Bhole wrote:
>>>>
>>>> "It can improve container networking where veth pair links the host and
>>>> the container. Host can set ACL by setting tx path XDP to the veth
>>>> iface."
>>>
>>> Just to be clear, this is the use case of interest to me, not the
>>> offloading. I want programs managed by and viewable by the host OS and
>>> not necessarily viewable by the guest OS or container programs.
>>>
>>
>> Yes, the plan is to implement this while having a provision to implement
>> the offload feature on top of it.
> 
> 
> I wonder whether it's easier to focus on the TX path first and then
> consider building offloading support on top.
Currently I am working on the TX path. I will try to make sure that we
will be able to implement offloading on top of it later.

Thanks.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [RFC net-next 11/14] tun: run XDP program in tx path
  2019-12-23  8:09                         ` Prashant Bhole
@ 2019-12-23  8:34                           ` Jason Wang
  2019-12-23 11:06                             ` Prashant Bhole
  0 siblings, 1 reply; 36+ messages in thread
From: Jason Wang @ 2019-12-23  8:34 UTC (permalink / raw)
  To: Prashant Bhole, David Ahern, Toke Høiland-Jørgensen,
	Alexei Starovoitov, Jesper Dangaard Brouer
  Cc: David S . Miller, Michael S . Tsirkin, Alexei Starovoitov,
	Daniel Borkmann, Jesper Dangaard Brouer, Jakub Kicinski,
	John Fastabend, Toshiaki Makita, Martin KaFai Lau, Song Liu,
	Yonghong Song, Andrii Nakryiko, netdev, Ilias Apalodimas


On 2019/12/23 4:09 PM, Prashant Bhole wrote:
>
>
> On 12/23/19 3:05 PM, Jason Wang wrote:
>>
>> On 2019/12/21 6:17 AM, Prashant Bhole wrote:
>>>
>>>
>>> On 12/21/2019 1:11 AM, David Ahern wrote:
>>>> On 12/19/19 9:46 PM, Prashant Bhole wrote:
>>>>>
>>>>> "It can improve container networking where veth pair links the 
>>>>> host and
>>>>> the container. Host can set ACL by setting tx path XDP to the veth
>>>>> iface."
>>>>
>>>> Just to be clear, this is the use case of interest to me, not the
>>>> offloading. I want programs managed by and viewable by the host OS and
>>>> not necessarily viewable by the guest OS or container programs.
>>>>
>>>
>>> Yes, the plan is to implement this while having a provision to implement
>>> the offload feature on top of it.
>>
>>
>> I wonder whether it's easier to focus on the TX path first and then
>> consider building offloading support on top.
> Currently I am working on the TX path. I will try to make sure that we
> will be able to implement offloading on top of it later.
>
> Thanks.


Right, then I think it's better to drop patch 13 and bring it back in 
the offloading series.

Thanks


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [RFC net-next 11/14] tun: run XDP program in tx path
  2019-12-23  8:34                           ` Jason Wang
@ 2019-12-23 11:06                             ` Prashant Bhole
  0 siblings, 0 replies; 36+ messages in thread
From: Prashant Bhole @ 2019-12-23 11:06 UTC (permalink / raw)
  To: Jason Wang, David Ahern, Toke Høiland-Jørgensen,
	Alexei Starovoitov, Jesper Dangaard Brouer
  Cc: David S . Miller, Michael S . Tsirkin, Alexei Starovoitov,
	Daniel Borkmann, Jesper Dangaard Brouer, Jakub Kicinski,
	John Fastabend, Toshiaki Makita, Martin KaFai Lau, Song Liu,
	Yonghong Song, Andrii Nakryiko, netdev, Ilias Apalodimas



On 12/23/2019 5:34 PM, Jason Wang wrote:
> 
> On 2019/12/23 4:09 PM, Prashant Bhole wrote:
>>
>>
>> On 12/23/19 3:05 PM, Jason Wang wrote:
>>>
>>> On 2019/12/21 6:17 AM, Prashant Bhole wrote:
>>>>
>>>>
>>>> On 12/21/2019 1:11 AM, David Ahern wrote:
>>>>> On 12/19/19 9:46 PM, Prashant Bhole wrote:
>>>>>>
>>>>>> "It can improve container networking where veth pair links the 
>>>>>> host and
>>>>>> the container. Host can set ACL by setting tx path XDP to the veth
>>>>>> iface."
>>>>>
>>>>> Just to be clear, this is the use case of interest to me, not the
>>>>> offloading. I want programs managed by and viewable by the host OS and
>>>>> not necessarily viewable by the guest OS or container programs.
>>>>>
>>>>
>>>> Yes, the plan is to implement this while having a provision to implement
>>>> the offload feature on top of it.
>>>
>>>
>>> I wonder whether it's easier to focus on the TX path first and then
>>> consider building offloading support on top.
>> Currently I am working on the TX path. I will try to make sure that we
>> will be able to implement offloading on top of it later.
>>
>> Thanks.
> 
> 
> Right, then I think it's better to drop patch 13 and bring it back in 
> the offloading series.
> 

Yes, that patch actually makes sense in the offloading series.




^ permalink raw reply	[flat|nested] 36+ messages in thread

end of thread

Thread overview: 36+ messages
2019-12-18  8:10 [RFC net-next 00/14] XDP in tx path Prashant Bhole
2019-12-18  8:10 ` [RFC net-next 01/14] net: add tx path XDP support Prashant Bhole
2019-12-18  8:10 ` [RFC net-next 02/14] tools: sync kernel uapi/linux/if_link.h header Prashant Bhole
2019-12-18  8:10 ` [RFC net-next 03/14] libbpf: API for tx path XDP support Prashant Bhole
2019-12-18 18:20   ` Alexei Starovoitov
2019-12-18  8:10 ` [RFC net-next 04/14] samples/bpf: xdp1, add XDP tx support Prashant Bhole
2019-12-18  8:10 ` [RFC net-next 05/14] net: core: rename netif_receive_generic_xdp() to do_generic_xdp_core() Prashant Bhole
2019-12-18  8:10 ` [RFC net-next 06/14] net: core: export do_xdp_generic_core() Prashant Bhole
2019-12-18  8:10 ` [RFC net-next 07/14] tuntap: check tun_msg_ctl type at necessary places Prashant Bhole
2019-12-18  8:10 ` [RFC net-next 08/14] vhost_net: user tap recvmsg api to access ptr ring Prashant Bhole
2019-12-18  8:10 ` [RFC net-next 09/14] tuntap: remove usage of ptr ring in vhost_net Prashant Bhole
2019-12-18  8:10 ` [RFC net-next 10/14] tun: set tx path XDP program Prashant Bhole
2019-12-18  8:10 ` [RFC net-next 11/14] tun: run XDP program in tx path Prashant Bhole
2019-12-18 10:07   ` Jesper Dangaard Brouer
2019-12-18 11:48     ` Toke Høiland-Jørgensen
2019-12-18 16:33       ` David Ahern
2019-12-19  2:44         ` Jason Wang
2019-12-18 18:19       ` Alexei Starovoitov
2019-12-19  2:34         ` Prashant Bhole
2019-12-19 10:15           ` Toke Høiland-Jørgensen
2019-12-20  0:07             ` Prashant Bhole
2019-12-20  3:24               ` Jason Wang
2019-12-20  4:46                 ` Prashant Bhole
2019-12-20  7:36                   ` Jason Wang
2019-12-20 10:11                   ` Toke Høiland-Jørgensen
2019-12-20 16:11                   ` David Ahern
2019-12-20 22:17                     ` Prashant Bhole
2019-12-23  6:05                       ` Jason Wang
2019-12-23  8:09                         ` Prashant Bhole
2019-12-23  8:34                           ` Jason Wang
2019-12-23 11:06                             ` Prashant Bhole
2019-12-18 16:29     ` David Ahern
2019-12-19  1:47     ` Prashant Bhole
2019-12-18  8:10 ` [RFC net-next 12/14] tun: add a way to inject tx path packet into Rx path Prashant Bhole
2019-12-18  8:10 ` [RFC net-next 13/14] tun: handle XDP_TX action of tx path XDP program Prashant Bhole
2019-12-18  8:10 ` [RFC net-next 14/14] tun: run xdp prog when tun is read from file interface Prashant Bhole
