* [net-next PATCH v2 0/5] XDP for virtio_net
@ 2016-11-20  2:49 John Fastabend
  2016-11-20  2:49 ` [net-next PATCH v2 1/5] net: virtio dynamically disable/enable LRO John Fastabend
                   ` (4 more replies)
  0 siblings, 5 replies; 21+ messages in thread
From: John Fastabend @ 2016-11-20  2:49 UTC (permalink / raw)
  To: daniel, eric.dumazet, mst, kubakici, shm, davem, alexei.starovoitov
  Cc: netdev, bblanco, john.fastabend, john.r.fastabend, brouer, tgraf

This implements XDP for virtio_net in both the mergeable buffers and
big_packets modes. I tested this with vhost_net running on qemu and did
not see any issues.

There are some restrictions for XDP to be enabled (see patch 3 for
more details); a minimal example program that runs within these
constraints is sketched after the list:

  1. LRO must be off
  2. MTU must be less than PAGE_SIZE
  3. queues must be available to dedicate to XDP
  4. num_bufs received in mergeable buffers must be 1
  5. big_packet mode must have all data on single page
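
For illustration, here is a minimal XDP program of the kind that can be
attached once the above conditions hold. This sketch is mine and not part
of the series; the file name and build command are assumptions. It simply
drops every frame, which is handy for smoke-testing the receive paths
touched below:

  /* xdp_drop_kern.c -- build with: clang -O2 -target bpf -c xdp_drop_kern.c */
  #include <linux/bpf.h>

  #ifndef __section
  # define __section(NAME) __attribute__((section(NAME), used))
  #endif

  __section("prog")
  int xdp_drop_all(struct xdp_md *ctx)
  {
          /* Drop every frame that reaches this receive queue */
          return XDP_DROP;
  }

  char __license[] __section("license") = "GPL";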

Please review; any comments/feedback welcome as always.

v2: fixes RCU usage throughout (thanks to Eric) and the
num_online_cpus() usage (thanks to Jakub).

Thanks,
John

---

John Fastabend (4):
      net: virtio dynamically disable/enable LRO
      net: xdp: add invalid buffer warning
      virtio_net: add dedicated XDP transmit queues
      virtio_net: add XDP_TX support

Shrijeet Mukherjee (1):
      virtio_net: Add XDP support


 drivers/net/virtio_net.c |  268 +++++++++++++++++++++++++++++++++++++++++++++-
 include/linux/filter.h   |    1 
 net/core/filter.c        |    6 +
 3 files changed, 270 insertions(+), 5 deletions(-)

^ permalink raw reply	[flat|nested] 21+ messages in thread

* [net-next PATCH v2 1/5] net: virtio dynamically disable/enable LRO
  2016-11-20  2:49 [net-next PATCH v2 0/5] XDP for virtio_net John Fastabend
@ 2016-11-20  2:49 ` John Fastabend
  2016-11-21 23:23   ` Michael S. Tsirkin
  2016-11-20  2:50 ` [net-next PATCH v2 2/5] net: xdp: add invalid buffer warning John Fastabend
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 21+ messages in thread
From: John Fastabend @ 2016-11-20  2:49 UTC (permalink / raw)
  To: daniel, eric.dumazet, mst, kubakici, shm, davem, alexei.starovoitov
  Cc: netdev, bblanco, john.fastabend, john.r.fastabend, brouer, tgraf

This adds support for dynamically setting the LRO feature flag. The
message to control guest features in the backend uses the
CTRL_GUEST_OFFLOADS msg type.

Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
---
 drivers/net/virtio_net.c |   45 ++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 44 insertions(+), 1 deletion(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index ca5239a..8189e5b 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -1419,6 +1419,41 @@ static void virtnet_init_settings(struct net_device *dev)
 	.set_settings = virtnet_set_settings,
 };
 
+static int virtnet_set_features(struct net_device *netdev,
+				netdev_features_t features)
+{
+	struct virtnet_info *vi = netdev_priv(netdev);
+	struct virtio_device *vdev = vi->vdev;
+	struct scatterlist sg;
+	u64 offloads = 0;
+
+	if (features & NETIF_F_LRO)
+		offloads |= (1 << VIRTIO_NET_F_GUEST_TSO4) |
+			    (1 << VIRTIO_NET_F_GUEST_TSO6);
+
+	if (features & NETIF_F_RXCSUM)
+		offloads |= (1 << VIRTIO_NET_F_GUEST_CSUM);
+
+	if (virtio_has_feature(vdev, VIRTIO_NET_F_CTRL_GUEST_OFFLOADS)) {
+		sg_init_one(&sg, &offloads, sizeof(uint64_t));
+		if (!virtnet_send_command(vi,
+					  VIRTIO_NET_CTRL_GUEST_OFFLOADS,
+					  VIRTIO_NET_CTRL_GUEST_OFFLOADS_SET,
+					  &sg)) {
+			dev_warn(&netdev->dev,
+				 "Failed to set guest offloads by virtnet command.\n");
+			return -EINVAL;
+		}
+	} else if (virtio_has_feature(vdev, VIRTIO_NET_F_CTRL_GUEST_OFFLOADS) &&
+		   !virtio_has_feature(vdev, VIRTIO_F_VERSION_1)) {
+		dev_warn(&netdev->dev,
+			 "No support for setting offloads pre version_1.\n");
+		return -EINVAL;
+	}
+
+	return 0;
+}
+
 static const struct net_device_ops virtnet_netdev = {
 	.ndo_open            = virtnet_open,
 	.ndo_stop   	     = virtnet_close,
@@ -1435,6 +1470,7 @@ static void virtnet_init_settings(struct net_device *dev)
 #ifdef CONFIG_NET_RX_BUSY_POLL
 	.ndo_busy_poll		= virtnet_busy_poll,
 #endif
+	.ndo_set_features	= virtnet_set_features,
 };
 
 static void virtnet_config_changed_work(struct work_struct *work)
@@ -1810,6 +1846,12 @@ static int virtnet_probe(struct virtio_device *vdev)
 	if (virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_CSUM))
 		dev->features |= NETIF_F_RXCSUM;
 
+	if (virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_TSO4) &&
+	    virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_TSO6)) {
+		dev->features |= NETIF_F_LRO;
+		dev->hw_features |= NETIF_F_LRO;
+	}
+
 	dev->vlan_features = dev->features;
 
 	/* MTU range: 68 - 65535 */
@@ -2047,7 +2089,8 @@ static int virtnet_restore(struct virtio_device *vdev)
 	VIRTIO_NET_F_CTRL_RX, VIRTIO_NET_F_CTRL_VLAN, \
 	VIRTIO_NET_F_GUEST_ANNOUNCE, VIRTIO_NET_F_MQ, \
 	VIRTIO_NET_F_CTRL_MAC_ADDR, \
-	VIRTIO_NET_F_MTU
+	VIRTIO_NET_F_MTU, \
+	VIRTIO_NET_F_CTRL_GUEST_OFFLOADS
 
 static unsigned int features[] = {
 	VIRTNET_FEATURES,

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [net-next PATCH v2 2/5] net: xdp: add invalid buffer warning
  2016-11-20  2:49 [net-next PATCH v2 0/5] XDP for virtio_net John Fastabend
  2016-11-20  2:49 ` [net-next PATCH v2 1/5] net: virtio dynamically disable/enable LRO John Fastabend
@ 2016-11-20  2:50 ` John Fastabend
  2016-11-20  2:50 ` [net-next PATCH v2 3/5] virtio_net: Add XDP support John Fastabend
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 21+ messages in thread
From: John Fastabend @ 2016-11-20  2:50 UTC (permalink / raw)
  To: daniel, eric.dumazet, mst, kubakici, shm, davem, alexei.starovoitov
  Cc: netdev, bblanco, john.fastabend, john.r.fastabend, brouer, tgraf

This adds a warning for drivers to use when encountering an invalid
buffer for XDP. In normal cases this should not happen, but having a
standard warning is useful for catching cases in virtual/qemu setups
that I may not have expected from the emulation layer.

Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
---
 include/linux/filter.h |    1 +
 net/core/filter.c      |    6 ++++++
 2 files changed, 7 insertions(+)

diff --git a/include/linux/filter.h b/include/linux/filter.h
index 1f09c52..0c79004 100644
--- a/include/linux/filter.h
+++ b/include/linux/filter.h
@@ -595,6 +595,7 @@ int sk_get_filter(struct sock *sk, struct sock_filter __user *filter,
 struct bpf_prog *bpf_patch_insn_single(struct bpf_prog *prog, u32 off,
 				       const struct bpf_insn *patch, u32 len);
 void bpf_warn_invalid_xdp_action(u32 act);
+void bpf_warn_invalid_xdp_buffer(void);
 
 #ifdef CONFIG_BPF_JIT
 extern int bpf_jit_enable;
diff --git a/net/core/filter.c b/net/core/filter.c
index dece94f..b777730 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -2774,6 +2774,12 @@ void bpf_warn_invalid_xdp_action(u32 act)
 }
 EXPORT_SYMBOL_GPL(bpf_warn_invalid_xdp_action);
 
+void bpf_warn_invalid_xdp_buffer(void)
+{
+	WARN_ONCE(1, "Illegal XDP buffer encountered, expect packet loss\n");
+}
+EXPORT_SYMBOL_GPL(bpf_warn_invalid_xdp_buffer);
+
 static u32 sk_filter_convert_ctx_access(enum bpf_access_type type, int dst_reg,
 					int src_reg, int ctx_off,
 					struct bpf_insn *insn_buf,

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [net-next PATCH v2 3/5] virtio_net: Add XDP support
  2016-11-20  2:49 [net-next PATCH v2 0/5] XDP for virtio_net John Fastabend
  2016-11-20  2:49 ` [net-next PATCH v2 1/5] net: virtio dynamically disable/enable LRO John Fastabend
  2016-11-20  2:50 ` [net-next PATCH v2 2/5] net: xdp: add invalid buffer warning John Fastabend
@ 2016-11-20  2:50 ` John Fastabend
  2016-11-21 23:20   ` Michael S. Tsirkin
  2016-11-20  2:51 ` [net-next PATCH v2 4/5] virtio_net: add dedicated XDP transmit queues John Fastabend
  2016-11-20  2:51 ` [net-next PATCH v2 5/5] virtio_net: add XDP_TX support John Fastabend
  4 siblings, 1 reply; 21+ messages in thread
From: John Fastabend @ 2016-11-20  2:50 UTC (permalink / raw)
  To: daniel, eric.dumazet, mst, kubakici, shm, davem, alexei.starovoitov
  Cc: netdev, bblanco, john.fastabend, john.r.fastabend, brouer, tgraf

From: Shrijeet Mukherjee <shrijeet@gmail.com>

This adds XDP support to virtio_net. Some requirements must be
met for XDP to be enabled depending on the mode. First it will
only be supported with LRO disabled so that data is not pushed
across multiple buffers. The MTU must be less than a page size
to avoid having to handle XDP across multiple pages.

If mergeable receive is enabled this first series only supports the
case where header and data are in the same buffer, which we can check
when a packet is received by looking at num_buf. If num_buf is greater
than 1 and an XDP program is loaded the packet is dropped and a warning
is thrown. When any_header_sg is set this does not happen and both
header and data are put in a single buffer as expected, so we check for
this when XDP programs are loaded. Note I have only tested this with
the Linux vhost backend.

If big packets mode is enabled and MTU/LRO conditions above are
met then XDP is allowed.

A follow-on patch can be generated to solve the mergeable receive
case with num_bufs equal to 2. Buffers greater than two may not be
handled as easily.
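
For reference, a rough sketch of the buffer layout this relies on, based
on the standard virtio_net uapi definitions (the diagram and annotations
are mine, not part of the patch):

  struct virtio_net_hdr_mrg_rxbuf {
          struct virtio_net_hdr hdr;
          __virtio16 num_buffers; /* number of merged rx buffers; 1 here */
  };

  /*
   * A single mergeable receive buffer as seen by do_xdp_prog():
   *
   *   page_address(page) + offset
   *   v
   *   +--------------------------+------------------------------+
   *   | virtio_net_hdr_mrg_rxbuf | packet data (len - hdr_len)  |
   *   +--------------------------+------------------------------+
   *                              ^ xdp.data      xdp.data_end ^
   */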

Suggested-by: Shrijeet Mukherjee <shrijeet@gmail.com>
Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
---
 drivers/net/virtio_net.c |  146 +++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 142 insertions(+), 4 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 8189e5b..8f99a53 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -22,6 +22,7 @@
 #include <linux/module.h>
 #include <linux/virtio.h>
 #include <linux/virtio_net.h>
+#include <linux/bpf.h>
 #include <linux/scatterlist.h>
 #include <linux/if_vlan.h>
 #include <linux/slab.h>
@@ -81,6 +82,8 @@ struct receive_queue {
 
 	struct napi_struct napi;
 
+	struct bpf_prog __rcu *xdp_prog;
+
 	/* Chain pages by the private ptr. */
 	struct page *pages;
 
@@ -324,6 +327,38 @@ static struct sk_buff *page_to_skb(struct virtnet_info *vi,
 	return skb;
 }
 
+static u32 do_xdp_prog(struct virtnet_info *vi,
+		       struct bpf_prog *xdp_prog,
+		       struct page *page, int offset, int len)
+{
+	int hdr_padded_len;
+	struct xdp_buff xdp;
+	u32 act;
+	u8 *buf;
+
+	buf = page_address(page) + offset;
+
+	if (vi->mergeable_rx_bufs)
+		hdr_padded_len = sizeof(struct virtio_net_hdr_mrg_rxbuf);
+	else
+		hdr_padded_len = sizeof(struct padded_vnet_hdr);
+
+	xdp.data = buf + hdr_padded_len;
+	xdp.data_end = xdp.data + (len - vi->hdr_len);
+
+	act = bpf_prog_run_xdp(xdp_prog, &xdp);
+	switch (act) {
+	case XDP_PASS:
+		return XDP_PASS;
+	default:
+		bpf_warn_invalid_xdp_action(act);
+	case XDP_TX:
+	case XDP_ABORTED:
+	case XDP_DROP:
+		return XDP_DROP;
+	}
+}
+
 static struct sk_buff *receive_small(struct virtnet_info *vi, void *buf, unsigned int len)
 {
 	struct sk_buff * skb = buf;
@@ -340,9 +375,19 @@ static struct sk_buff *receive_big(struct net_device *dev,
 				   void *buf,
 				   unsigned int len)
 {
+	struct bpf_prog *xdp_prog;
 	struct page *page = buf;
-	struct sk_buff *skb = page_to_skb(vi, rq, page, 0, len, PAGE_SIZE);
+	struct sk_buff *skb;
 
+	xdp_prog = rcu_dereference_bh(rq->xdp_prog);
+	if (xdp_prog) {
+		u32 act = do_xdp_prog(vi, xdp_prog, page, 0, len);
+
+		if (act == XDP_DROP)
+			goto err;
+	}
+
+	skb = page_to_skb(vi, rq, page, 0, len, PAGE_SIZE);
 	if (unlikely(!skb))
 		goto err;
 
@@ -366,10 +411,25 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
 	struct page *page = virt_to_head_page(buf);
 	int offset = buf - page_address(page);
 	unsigned int truesize = max(len, mergeable_ctx_to_buf_truesize(ctx));
+	struct sk_buff *head_skb, *curr_skb;
+	struct bpf_prog *xdp_prog;
 
-	struct sk_buff *head_skb = page_to_skb(vi, rq, page, offset, len,
-					       truesize);
-	struct sk_buff *curr_skb = head_skb;
+	xdp_prog = rcu_dereference_bh(rq->xdp_prog);
+	if (xdp_prog) {
+		u32 act;
+
+		if (num_buf > 1) {
+			bpf_warn_invalid_xdp_buffer();
+			goto err_skb;
+		}
+
+		act = do_xdp_prog(vi, xdp_prog, page, offset, len);
+		if (act == XDP_DROP)
+			goto err_skb;
+	}
+
+	head_skb = page_to_skb(vi, rq, page, offset, len, truesize);
+	curr_skb = head_skb;
 
 	if (unlikely(!curr_skb))
 		goto err_skb;
@@ -1328,6 +1388,13 @@ static int virtnet_set_channels(struct net_device *dev,
 	if (queue_pairs > vi->max_queue_pairs || queue_pairs == 0)
 		return -EINVAL;
 
+	/* For now we don't support modifying channels while XDP is loaded
+	 * also when XDP is loaded all RX queues have XDP programs so we only
+	 * need to check a single RX queue.
+	 */
+	if (vi->rq[0].xdp_prog)
+		return -EINVAL;
+
 	get_online_cpus();
 	err = virtnet_set_queues(vi, queue_pairs);
 	if (!err) {
@@ -1454,6 +1521,68 @@ static int virtnet_set_features(struct net_device *netdev,
 	return 0;
 }
 
+static int virtnet_xdp_set(struct net_device *dev, struct bpf_prog *prog)
+{
+	struct virtnet_info *vi = netdev_priv(dev);
+	struct bpf_prog *old_prog;
+	int i;
+
+	if ((dev->features & NETIF_F_LRO) && prog) {
+		netdev_warn(dev, "can't set XDP while LRO is on, disable LRO first\n");
+		return -EINVAL;
+	}
+
+	if (vi->mergeable_rx_bufs && !vi->any_header_sg) {
+		netdev_warn(dev, "XDP expects header/data in single page\n");
+		return -EINVAL;
+	}
+
+	if (dev->mtu > PAGE_SIZE) {
+		netdev_warn(dev, "XDP requires MTU less than %lu\n", PAGE_SIZE);
+		return -EINVAL;
+	}
+
+	if (prog) {
+		prog = bpf_prog_add(prog, vi->max_queue_pairs - 1);
+		if (IS_ERR(prog))
+			return PTR_ERR(prog);
+	}
+
+	for (i = 0; i < vi->max_queue_pairs; i++) {
+		old_prog = rtnl_dereference(vi->rq[i].xdp_prog);
+		rcu_assign_pointer(vi->rq[i].xdp_prog, prog);
+		if (old_prog)
+			bpf_prog_put(old_prog);
+	}
+
+	return 0;
+}
+
+static bool virtnet_xdp_query(struct net_device *dev)
+{
+	struct virtnet_info *vi = netdev_priv(dev);
+	int i;
+
+	for (i = 0; i < vi->max_queue_pairs; i++) {
+		if (vi->rq[i].xdp_prog)
+			return true;
+	}
+	return false;
+}
+
+static int virtnet_xdp(struct net_device *dev, struct netdev_xdp *xdp)
+{
+	switch (xdp->command) {
+	case XDP_SETUP_PROG:
+		return virtnet_xdp_set(dev, xdp->prog);
+	case XDP_QUERY_PROG:
+		xdp->prog_attached = virtnet_xdp_query(dev);
+		return 0;
+	default:
+		return -EINVAL;
+	}
+}
+
 static const struct net_device_ops virtnet_netdev = {
 	.ndo_open            = virtnet_open,
 	.ndo_stop   	     = virtnet_close,
@@ -1471,6 +1600,7 @@ static int virtnet_set_features(struct net_device *netdev,
 	.ndo_busy_poll		= virtnet_busy_poll,
 #endif
 	.ndo_set_features	= virtnet_set_features,
+	.ndo_xdp		= virtnet_xdp,
 };
 
 static void virtnet_config_changed_work(struct work_struct *work)
@@ -1527,12 +1657,20 @@ static void virtnet_free_queues(struct virtnet_info *vi)
 
 static void free_receive_bufs(struct virtnet_info *vi)
 {
+	struct bpf_prog *old_prog;
 	int i;
 
+	rtnl_lock();
 	for (i = 0; i < vi->max_queue_pairs; i++) {
 		while (vi->rq[i].pages)
 			__free_pages(get_a_page(&vi->rq[i], GFP_KERNEL), 0);
+
+		old_prog = rtnl_dereference(vi->rq[i].xdp_prog);
+		RCU_INIT_POINTER(vi->rq[i].xdp_prog, NULL);
+		if (old_prog)
+			bpf_prog_put(old_prog);
 	}
+	rtnl_unlock();
 }
 
 static void free_receive_page_frags(struct virtnet_info *vi)

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [net-next PATCH v2 4/5] virtio_net: add dedicated XDP transmit queues
  2016-11-20  2:49 [net-next PATCH v2 0/5] XDP for virtio_net John Fastabend
                   ` (2 preceding siblings ...)
  2016-11-20  2:50 ` [net-next PATCH v2 3/5] virtio_net: Add XDP support John Fastabend
@ 2016-11-20  2:51 ` John Fastabend
  2016-11-21 11:45   ` Daniel Borkmann
  2016-11-21 23:13   ` Michael S. Tsirkin
  2016-11-20  2:51 ` [net-next PATCH v2 5/5] virtio_net: add XDP_TX support John Fastabend
  4 siblings, 2 replies; 21+ messages in thread
From: John Fastabend @ 2016-11-20  2:51 UTC (permalink / raw)
  To: daniel, eric.dumazet, mst, kubakici, shm, davem, alexei.starovoitov
  Cc: netdev, bblanco, john.fastabend, john.r.fastabend, brouer, tgraf

XDP requires using isolated transmit queues to avoid interference
with the normal networking stack (BQL, NETDEV_TX_BUSY, etc). This patch
adds an XDP queue per CPU when an XDP program is loaded and does not
expose the queues to the OS via the normal API call to
netif_set_real_num_tx_queues(). This way the stack will never push
an skb to these queues.

However, the virtio/vhost/qemu implementation only allows creating
TX/RX queue pairs at this time, so creating only TX queues was not
possible. And because the associated RX queues are being created
anyway, I went ahead and exposed these to the stack and let the backend
use them. This creates more RX queues visible to the network stack than
TX queues, which is worth mentioning but does not cause any issues as
far as I can tell.
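
To make the queue accounting concrete, a hypothetical example (the
numbers here are mine, not from the patch):

  /*
   * Guest with 4 vCPUs (nr_cpu_ids == 4) and 4 queue pairs in use when
   * an XDP program is loaded; the device must expose >= 8 queue pairs:
   *
   *   txq 0-3 : regular stack queues (real_num_tx_queues stays at 4)
   *   txq 4-7 : XDP_TX queues, one per CPU, never used by the stack
   *   rxq 0-7 : all exposed via netif_set_real_num_rx_queues()
   */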

Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
---
 drivers/net/virtio_net.c |   32 +++++++++++++++++++++++++++++---
 1 file changed, 29 insertions(+), 3 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 8f99a53..80a426c 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -114,6 +114,9 @@ struct virtnet_info {
 	/* # of queue pairs currently used by the driver */
 	u16 curr_queue_pairs;
 
+	/* # of XDP queue pairs currently used by the driver */
+	u16 xdp_queue_pairs;
+
 	/* I like... big packets and I cannot lie! */
 	bool big_packets;
 
@@ -1525,7 +1528,8 @@ static int virtnet_xdp_set(struct net_device *dev, struct bpf_prog *prog)
 {
 	struct virtnet_info *vi = netdev_priv(dev);
 	struct bpf_prog *old_prog;
-	int i;
+	u16 xdp_qp = 0, curr_qp;
+	int err, i;
 
 	if ((dev->features & NETIF_F_LRO) && prog) {
 		netdev_warn(dev, "can't set XDP while LRO is on, disable LRO first\n");
@@ -1542,12 +1546,34 @@ static int virtnet_xdp_set(struct net_device *dev, struct bpf_prog *prog)
 		return -EINVAL;
 	}
 
+	curr_qp = vi->curr_queue_pairs - vi->xdp_queue_pairs;
+	if (prog)
+		xdp_qp = nr_cpu_ids;
+
+	/* XDP requires extra queues for XDP_TX */
+	if (curr_qp + xdp_qp > vi->max_queue_pairs) {
+		netdev_warn(dev, "request %i queues but max is %i\n",
+			    curr_qp + xdp_qp, vi->max_queue_pairs);
+		return -ENOMEM;
+	}
+
+	err = virtnet_set_queues(vi, curr_qp + xdp_qp);
+	if (err) {
+		dev_warn(&dev->dev, "XDP Device queue allocation failure.\n");
+		return err;
+	}
+
 	if (prog) {
-		prog = bpf_prog_add(prog, vi->max_queue_pairs - 1);
-		if (IS_ERR(prog))
+		prog = bpf_prog_add(prog, vi->max_queue_pairs);
+		if (IS_ERR(prog)) {
+			virtnet_set_queues(vi, curr_qp);
 			return PTR_ERR(prog);
+		}
 	}
 
+	vi->xdp_queue_pairs = xdp_qp;
+	netif_set_real_num_rx_queues(dev, curr_qp + xdp_qp);
+
 	for (i = 0; i < vi->max_queue_pairs; i++) {
 		old_prog = rtnl_dereference(vi->rq[i].xdp_prog);
 		rcu_assign_pointer(vi->rq[i].xdp_prog, prog);

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [net-next PATCH v2 5/5] virtio_net: add XDP_TX support
  2016-11-20  2:49 [net-next PATCH v2 0/5] XDP for virtio_net John Fastabend
                   ` (3 preceding siblings ...)
  2016-11-20  2:51 ` [net-next PATCH v2 4/5] virtio_net: add dedicated XDP transmit queues John Fastabend
@ 2016-11-20  2:51 ` John Fastabend
  4 siblings, 0 replies; 21+ messages in thread
From: John Fastabend @ 2016-11-20  2:51 UTC (permalink / raw)
  To: daniel, eric.dumazet, mst, kubakici, shm, davem, alexei.starovoitov
  Cc: netdev, bblanco, john.fastabend, john.r.fastabend, brouer, tgraf

This adds support for the XDP_TX action to virtio_net. When an XDP
program is run and returns the XDP_TX action, the virtio_net XDP
implementation will transmit the packet on a TX queue aligned with the
current CPU that the XDP packet was processed on.

Before sending the packet the header is zeroed. Also, XDP is expected
to handle checksums correctly, so no checksum offload support is
provided.
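
As an illustration of the new path, here is a minimal program that
exercises XDP_TX by reflecting every frame back out the same device with
the MAC addresses swapped. The sketch is mine, not part of the series;
the file name and build command are assumptions:

  /* xdp_tx_reflect_kern.c -- build with:
   *   clang -O2 -target bpf -c xdp_tx_reflect_kern.c
   */
  #include <linux/bpf.h>
  #include <linux/if_ether.h>

  #ifndef __section
  # define __section(NAME) __attribute__((section(NAME), used))
  #endif

  __section("prog")
  int xdp_tx_reflect(struct xdp_md *ctx)
  {
          void *data = (void *)(long)ctx->data;
          void *data_end = (void *)(long)ctx->data_end;
          struct ethhdr *eth = data;
          unsigned char tmp[ETH_ALEN];

          /* Bounds check to satisfy the verifier */
          if (data + sizeof(*eth) > data_end)
                  return XDP_DROP;

          /* Swap source and destination MACs, then bounce the frame */
          __builtin_memcpy(tmp, eth->h_dest, ETH_ALEN);
          __builtin_memcpy(eth->h_dest, eth->h_source, ETH_ALEN);
          __builtin_memcpy(eth->h_source, tmp, ETH_ALEN);

          return XDP_TX;
  }

  char __license[] __section("license") = "GPL";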

Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
---
 drivers/net/virtio_net.c |   57 ++++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 54 insertions(+), 3 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 80a426c..c127494 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -330,12 +330,40 @@ static struct sk_buff *page_to_skb(struct virtnet_info *vi,
 	return skb;
 }
 
+static void virtnet_xdp_xmit(struct virtnet_info *vi,
+			     unsigned int qnum, struct xdp_buff *xdp)
+{
+	struct send_queue *sq = &vi->sq[qnum];
+	struct virtio_net_hdr_mrg_rxbuf *hdr;
+	unsigned int num_sg, len;
+	void *xdp_sent;
+
+	/* Free up any pending old buffers before queueing new ones. */
+	while ((xdp_sent = virtqueue_get_buf(sq->vq, &len)) != NULL) {
+		struct page *page = virt_to_head_page(xdp_sent);
+
+		put_page(page);
+	}
+
+	/* Zero header and leave csum up to XDP layers */
+	hdr = xdp->data;
+	memset(hdr, 0, vi->hdr_len);
+	hdr->hdr.gso_type = VIRTIO_NET_HDR_GSO_NONE;
+	hdr->hdr.flags = VIRTIO_NET_HDR_F_DATA_VALID;
+
+	num_sg = 1;
+	sg_init_one(sq->sg, xdp->data, xdp->data_end - xdp->data);
+	virtqueue_add_outbuf(sq->vq, sq->sg, num_sg, xdp->data, GFP_ATOMIC);
+	virtqueue_kick(sq->vq);
+}
+
 static u32 do_xdp_prog(struct virtnet_info *vi,
 		       struct bpf_prog *xdp_prog,
 		       struct page *page, int offset, int len)
 {
 	int hdr_padded_len;
 	struct xdp_buff xdp;
+	unsigned int qp;
 	u32 act;
 	u8 *buf;
 
@@ -353,9 +381,15 @@ static u32 do_xdp_prog(struct virtnet_info *vi,
 	switch (act) {
 	case XDP_PASS:
 		return XDP_PASS;
+	case XDP_TX:
+		qp = vi->curr_queue_pairs -
+			vi->xdp_queue_pairs +
+			smp_processor_id();
+		xdp.data = buf + (vi->mergeable_rx_bufs ? 0 : 4);
+		virtnet_xdp_xmit(vi, qp, &xdp);
+		return XDP_TX;
 	default:
 		bpf_warn_invalid_xdp_action(act);
-	case XDP_TX:
 	case XDP_ABORTED:
 	case XDP_DROP:
 		return XDP_DROP;
@@ -386,8 +420,15 @@ static struct sk_buff *receive_big(struct net_device *dev,
 	if (xdp_prog) {
 		u32 act = do_xdp_prog(vi, xdp_prog, page, 0, len);
 
-		if (act == XDP_DROP)
+		switch (act) {
+		case XDP_PASS:
+			break;
+		case XDP_TX:
+			goto xdp_xmit;
+		case XDP_DROP:
+		default:
 			goto err;
+		}
 	}
 
 	skb = page_to_skb(vi, rq, page, 0, len, PAGE_SIZE);
@@ -399,6 +440,7 @@ static struct sk_buff *receive_big(struct net_device *dev,
 err:
 	dev->stats.rx_dropped++;
 	give_pages(rq, page);
+xdp_xmit:
 	return NULL;
 }
 
@@ -417,6 +459,7 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
 	struct sk_buff *head_skb, *curr_skb;
 	struct bpf_prog *xdp_prog;
 
+	head_skb = NULL;
 	xdp_prog = rcu_dereference_bh(rq->xdp_prog);
 	if (xdp_prog) {
 		u32 act;
@@ -427,8 +470,15 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
 		}
 
 		act = do_xdp_prog(vi, xdp_prog, page, offset, len);
-		if (act == XDP_DROP)
+		switch (act) {
+		case XDP_PASS:
+			break;
+		case XDP_TX:
+			goto xdp_xmit;
+		case XDP_DROP:
+		default:
 			goto err_skb;
+		}
 	}
 
 	head_skb = page_to_skb(vi, rq, page, offset, len, truesize);
@@ -502,6 +552,7 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
 err_buf:
 	dev->stats.rx_dropped++;
 	dev_kfree_skb(head_skb);
+xdp_xmit:
 	return NULL;
 }
 

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* Re: [net-next PATCH v2 4/5] virtio_net: add dedicated XDP transmit queues
  2016-11-20  2:51 ` [net-next PATCH v2 4/5] virtio_net: add dedicated XDP transmit queues John Fastabend
@ 2016-11-21 11:45   ` Daniel Borkmann
  2016-11-21 15:56     ` John Fastabend
  2016-11-21 23:13   ` Michael S. Tsirkin
  1 sibling, 1 reply; 21+ messages in thread
From: Daniel Borkmann @ 2016-11-21 11:45 UTC (permalink / raw)
  To: John Fastabend, eric.dumazet, mst, kubakici, shm, davem,
	alexei.starovoitov
  Cc: netdev, bblanco, john.r.fastabend, brouer, tgraf

On 11/20/2016 03:51 AM, John Fastabend wrote:
> XDP requires using isolated transmit queues to avoid interference
> with normal networking stack (BQL, NETDEV_TX_BUSY, etc). This patch
> adds a XDP queue per cpu when a XDP program is loaded and does not
> expose the queues to the OS via the normal API call to
> netif_set_real_num_tx_queues(). This way the stack will never push
> an skb to these queues.
>
> However virtio/vhost/qemu implementation only allows for creating
> TX/RX queue pairs at this time so creating only TX queues was not
> possible. And because the associated RX queues are being created I
> went ahead and exposed these to the stack and let the backend use
> them. This creates more RX queues visible to the network stack than
> TX queues which is worth mentioning but does not cause any issues as
> far as I can tell.
>
> Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
> ---
>   drivers/net/virtio_net.c |   32 +++++++++++++++++++++++++++++---
>   1 file changed, 29 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index 8f99a53..80a426c 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -114,6 +114,9 @@ struct virtnet_info {
>   	/* # of queue pairs currently used by the driver */
>   	u16 curr_queue_pairs;
>
> +	/* # of XDP queue pairs currently used by the driver */
> +	u16 xdp_queue_pairs;
> +
>   	/* I like... big packets and I cannot lie! */
>   	bool big_packets;
>
> @@ -1525,7 +1528,8 @@ static int virtnet_xdp_set(struct net_device *dev, struct bpf_prog *prog)
>   {
>   	struct virtnet_info *vi = netdev_priv(dev);
>   	struct bpf_prog *old_prog;
> -	int i;
> +	u16 xdp_qp = 0, curr_qp;
> +	int err, i;
>
>   	if ((dev->features & NETIF_F_LRO) && prog) {
>   		netdev_warn(dev, "can't set XDP while LRO is on, disable LRO first\n");
> @@ -1542,12 +1546,34 @@ static int virtnet_xdp_set(struct net_device *dev, struct bpf_prog *prog)
>   		return -EINVAL;
>   	}
>
> +	curr_qp = vi->curr_queue_pairs - vi->xdp_queue_pairs;
> +	if (prog)
> +		xdp_qp = nr_cpu_ids;
> +
> +	/* XDP requires extra queues for XDP_TX */
> +	if (curr_qp + xdp_qp > vi->max_queue_pairs) {
> +		netdev_warn(dev, "request %i queues but max is %i\n",
> +			    curr_qp + xdp_qp, vi->max_queue_pairs);
> +		return -ENOMEM;
> +	}
> +
> +	err = virtnet_set_queues(vi, curr_qp + xdp_qp);
> +	if (err) {
> +		dev_warn(&dev->dev, "XDP Device queue allocation failure.\n");
> +		return err;
> +	}
> +
>   	if (prog) {
> -		prog = bpf_prog_add(prog, vi->max_queue_pairs - 1);
> -		if (IS_ERR(prog))
> +		prog = bpf_prog_add(prog, vi->max_queue_pairs);

I think this change is not correct, it would be off by one now.
The previous 'vi->max_queue_pairs - 1' was actually correct here.
dev_change_xdp_fd() already gives you a reference (see the doc on
enum xdp_netdev_command in netdevice.h).

> +		if (IS_ERR(prog)) {
> +			virtnet_set_queues(vi, curr_qp);
>   			return PTR_ERR(prog);
> +		}
>   	}
>
> +	vi->xdp_queue_pairs = xdp_qp;
> +	netif_set_real_num_rx_queues(dev, curr_qp + xdp_qp);
> +
>   	for (i = 0; i < vi->max_queue_pairs; i++) {
>   		old_prog = rtnl_dereference(vi->rq[i].xdp_prog);
>   		rcu_assign_pointer(vi->rq[i].xdp_prog, prog);
>

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [net-next PATCH v2 4/5] virtio_net: add dedicated XDP transmit queues
  2016-11-21 11:45   ` Daniel Borkmann
@ 2016-11-21 15:56     ` John Fastabend
  0 siblings, 0 replies; 21+ messages in thread
From: John Fastabend @ 2016-11-21 15:56 UTC (permalink / raw)
  To: Daniel Borkmann, eric.dumazet, mst, kubakici, shm, davem,
	alexei.starovoitov
  Cc: netdev, bblanco, john.r.fastabend, brouer, tgraf

On 16-11-21 03:45 AM, Daniel Borkmann wrote:
> On 11/20/2016 03:51 AM, John Fastabend wrote:
>> XDP requires using isolated transmit queues to avoid interference
>> with normal networking stack (BQL, NETDEV_TX_BUSY, etc). This patch
>> adds a XDP queue per cpu when a XDP program is loaded and does not
>> expose the queues to the OS via the normal API call to
>> netif_set_real_num_tx_queues(). This way the stack will never push
>> an skb to these queues.
>>
>> However virtio/vhost/qemu implementation only allows for creating
>> TX/RX queue pairs at this time so creating only TX queues was not
>> possible. And because the associated RX queues are being created I
>> went ahead and exposed these to the stack and let the backend use
>> them. This creates more RX queues visible to the network stack than
>> TX queues which is worth mentioning but does not cause any issues as
>> far as I can tell.
>>
>> Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
>> ---

[...]

>>       }
>>
>> +    curr_qp = vi->curr_queue_pairs - vi->xdp_queue_pairs;
>> +    if (prog)
>> +        xdp_qp = nr_cpu_ids;
>> +
>> +    /* XDP requires extra queues for XDP_TX */
>> +    if (curr_qp + xdp_qp > vi->max_queue_pairs) {
>> +        netdev_warn(dev, "request %i queues but max is %i\n",
>> +                curr_qp + xdp_qp, vi->max_queue_pairs);
>> +        return -ENOMEM;
>> +    }
>> +
>> +    err = virtnet_set_queues(vi, curr_qp + xdp_qp);
>> +    if (err) {
>> +        dev_warn(&dev->dev, "XDP Device queue allocation failure.\n");
>> +        return err;
>> +    }
>> +
>>       if (prog) {
>> -        prog = bpf_prog_add(prog, vi->max_queue_pairs - 1);
>> -        if (IS_ERR(prog))
>> +        prog = bpf_prog_add(prog, vi->max_queue_pairs);
> 
> I think this change is not correct, it would be off by one now.
> The previous 'vi->max_queue_pairs - 1' was actually correct here.
> dev_change_xdp_fd() already gives you a reference (see the doc on
> enum xdp_netdev_command in netdevice.h).


Right, this was an error; thanks for checking it. I'll send a v3, and
maybe draft a test for XDP ref counting to catch this in the future.

.John

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [net-next PATCH v2 4/5] virtio_net: add dedicated XDP transmit queues
  2016-11-20  2:51 ` [net-next PATCH v2 4/5] virtio_net: add dedicated XDP transmit queues John Fastabend
  2016-11-21 11:45   ` Daniel Borkmann
@ 2016-11-21 23:13   ` Michael S. Tsirkin
  2016-11-22  8:17     ` John Fastabend
  1 sibling, 1 reply; 21+ messages in thread
From: Michael S. Tsirkin @ 2016-11-21 23:13 UTC (permalink / raw)
  To: John Fastabend
  Cc: daniel, eric.dumazet, kubakici, shm, davem, alexei.starovoitov,
	netdev, bblanco, john.r.fastabend, brouer, tgraf

On Sat, Nov 19, 2016 at 06:51:04PM -0800, John Fastabend wrote:
> XDP requires using isolated transmit queues to avoid interference
> with normal networking stack (BQL, NETDEV_TX_BUSY, etc). This patch
> adds a XDP queue per cpu when a XDP program is loaded and does not
> expose the queues to the OS via the normal API call to
> netif_set_real_num_tx_queues(). This way the stack will never push
> an skb to these queues.
> 
> However virtio/vhost/qemu implementation only allows for creating
> TX/RX queue pairs at this time so creating only TX queues was not
> possible. And because the associated RX queues are being created I
> went ahead and exposed these to the stack and let the backend use
> them. This creates more RX queues visible to the network stack than
> TX queues which is worth mentioning but does not cause any issues as
> far as I can tell.
> 
> Signed-off-by: John Fastabend <john.r.fastabend@intel.com>

FYI what's supposed to happen is packets from the same
flow going in the reverse direction will go on the
same queue.

This might come in handy when implementing RX XDP.

> ---
>  drivers/net/virtio_net.c |   32 +++++++++++++++++++++++++++++---
>  1 file changed, 29 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index 8f99a53..80a426c 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -114,6 +114,9 @@ struct virtnet_info {
>  	/* # of queue pairs currently used by the driver */
>  	u16 curr_queue_pairs;
>  
> +	/* # of XDP queue pairs currently used by the driver */
> +	u16 xdp_queue_pairs;
> +
>  	/* I like... big packets and I cannot lie! */
>  	bool big_packets;
>  
> @@ -1525,7 +1528,8 @@ static int virtnet_xdp_set(struct net_device *dev, struct bpf_prog *prog)
>  {
>  	struct virtnet_info *vi = netdev_priv(dev);
>  	struct bpf_prog *old_prog;
> -	int i;
> +	u16 xdp_qp = 0, curr_qp;
> +	int err, i;
>  
>  	if ((dev->features & NETIF_F_LRO) && prog) {
>  		netdev_warn(dev, "can't set XDP while LRO is on, disable LRO first\n");
> @@ -1542,12 +1546,34 @@ static int virtnet_xdp_set(struct net_device *dev, struct bpf_prog *prog)
>  		return -EINVAL;
>  	}
>  
> +	curr_qp = vi->curr_queue_pairs - vi->xdp_queue_pairs;
> +	if (prog)
> +		xdp_qp = nr_cpu_ids;
> +
> +	/* XDP requires extra queues for XDP_TX */
> +	if (curr_qp + xdp_qp > vi->max_queue_pairs) {
> +		netdev_warn(dev, "request %i queues but max is %i\n",
> +			    curr_qp + xdp_qp, vi->max_queue_pairs);
> +		return -ENOMEM;
> +	}
> +
> +	err = virtnet_set_queues(vi, curr_qp + xdp_qp);
> +	if (err) {
> +		dev_warn(&dev->dev, "XDP Device queue allocation failure.\n");
> +		return err;
> +	}
> +
>  	if (prog) {
> -		prog = bpf_prog_add(prog, vi->max_queue_pairs - 1);
> -		if (IS_ERR(prog))
> +		prog = bpf_prog_add(prog, vi->max_queue_pairs);
> +		if (IS_ERR(prog)) {
> +			virtnet_set_queues(vi, curr_qp);
>  			return PTR_ERR(prog);
> +		}
>  	}
>  
> +	vi->xdp_queue_pairs = xdp_qp;
> +	netif_set_real_num_rx_queues(dev, curr_qp + xdp_qp);
> +
>  	for (i = 0; i < vi->max_queue_pairs; i++) {
>  		old_prog = rtnl_dereference(vi->rq[i].xdp_prog);
>  		rcu_assign_pointer(vi->rq[i].xdp_prog, prog);

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [net-next PATCH v2 3/5] virtio_net: Add XDP support
  2016-11-20  2:50 ` [net-next PATCH v2 3/5] virtio_net: Add XDP support John Fastabend
@ 2016-11-21 23:20   ` Michael S. Tsirkin
  2016-11-22  8:27     ` John Fastabend
  0 siblings, 1 reply; 21+ messages in thread
From: Michael S. Tsirkin @ 2016-11-21 23:20 UTC (permalink / raw)
  To: John Fastabend
  Cc: daniel, eric.dumazet, kubakici, shm, davem, alexei.starovoitov,
	netdev, bblanco, john.r.fastabend, brouer, tgraf

On Sat, Nov 19, 2016 at 06:50:33PM -0800, John Fastabend wrote:
> From: Shrijeet Mukherjee <shrijeet@gmail.com>
> 
> This adds XDP support to virtio_net. Some requirements must be
> met for XDP to be enabled depending on the mode. First it will
> only be supported with LRO disabled so that data is not pushed
> across multiple buffers. The MTU must be less than a page size
> to avoid having to handle XDP across multiple pages.
> 
> If mergeable receive is enabled this first series only supports
> the case where header and data are in the same buf which we can
> check when a packet is received by looking at num_buf. If the
> num_buf is greater than 1 and a XDP program is loaded the packet
> is dropped and a warning is thrown. When any_header_sg is set this
> does not happen and both header and data is put in a single buffer
> as expected so we check this when XDP programs are loaded. Note I
> have only tested this with Linux vhost backend.
> 
> If big packets mode is enabled and MTU/LRO conditions above are
> met then XDP is allowed.
> 
> A follow on patch can be generated to solve the mergeable receive
> case with num_bufs equal to 2. Buffers greater than two may not
> be handled has easily.


I would very much prefer support for other layouts without drops
before merging this.
A header by itself can certainly be handled by skipping it.
People wanted to use that e.g. for zero copy.

Anything else can be handled by copying the packet.

> Suggested-by: Shrijeet Mukherjee <shrijeet@gmail.com>
> Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
> ---
>  drivers/net/virtio_net.c |  146 +++++++++++++++++++++++++++++++++++++++++++++-
>  1 file changed, 142 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index 8189e5b..8f99a53 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -22,6 +22,7 @@
>  #include <linux/module.h>
>  #include <linux/virtio.h>
>  #include <linux/virtio_net.h>
> +#include <linux/bpf.h>
>  #include <linux/scatterlist.h>
>  #include <linux/if_vlan.h>
>  #include <linux/slab.h>
> @@ -81,6 +82,8 @@ struct receive_queue {
>  
>  	struct napi_struct napi;
>  
> +	struct bpf_prog __rcu *xdp_prog;
> +
>  	/* Chain pages by the private ptr. */
>  	struct page *pages;
>  
> @@ -324,6 +327,38 @@ static struct sk_buff *page_to_skb(struct virtnet_info *vi,
>  	return skb;
>  }
>  
> +static u32 do_xdp_prog(struct virtnet_info *vi,
> +		       struct bpf_prog *xdp_prog,
> +		       struct page *page, int offset, int len)
> +{
> +	int hdr_padded_len;
> +	struct xdp_buff xdp;
> +	u32 act;
> +	u8 *buf;
> +
> +	buf = page_address(page) + offset;
> +
> +	if (vi->mergeable_rx_bufs)
> +		hdr_padded_len = sizeof(struct virtio_net_hdr_mrg_rxbuf);
> +	else
> +		hdr_padded_len = sizeof(struct padded_vnet_hdr);
> +
> +	xdp.data = buf + hdr_padded_len;
> +	xdp.data_end = xdp.data + (len - vi->hdr_len);
> +
> +	act = bpf_prog_run_xdp(xdp_prog, &xdp);
> +	switch (act) {
> +	case XDP_PASS:
> +		return XDP_PASS;
> +	default:
> +		bpf_warn_invalid_xdp_action(act);
> +	case XDP_TX:
> +	case XDP_ABORTED:
> +	case XDP_DROP:
> +		return XDP_DROP;
> +	}
> +}
> +
>  static struct sk_buff *receive_small(struct virtnet_info *vi, void *buf, unsigned int len)
>  {
>  	struct sk_buff * skb = buf;
> @@ -340,9 +375,19 @@ static struct sk_buff *receive_big(struct net_device *dev,
>  				   void *buf,
>  				   unsigned int len)
>  {
> +	struct bpf_prog *xdp_prog;
>  	struct page *page = buf;
> -	struct sk_buff *skb = page_to_skb(vi, rq, page, 0, len, PAGE_SIZE);
> +	struct sk_buff *skb;
>  
> +	xdp_prog = rcu_dereference_bh(rq->xdp_prog);
> +	if (xdp_prog) {
> +		u32 act = do_xdp_prog(vi, xdp_prog, page, 0, len);
> +
> +		if (act == XDP_DROP)
> +			goto err;
> +	}
> +
> +	skb = page_to_skb(vi, rq, page, 0, len, PAGE_SIZE);
>  	if (unlikely(!skb))
>  		goto err;
>  
> @@ -366,10 +411,25 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
>  	struct page *page = virt_to_head_page(buf);
>  	int offset = buf - page_address(page);
>  	unsigned int truesize = max(len, mergeable_ctx_to_buf_truesize(ctx));
> +	struct sk_buff *head_skb, *curr_skb;
> +	struct bpf_prog *xdp_prog;
>  
> -	struct sk_buff *head_skb = page_to_skb(vi, rq, page, offset, len,
> -					       truesize);
> -	struct sk_buff *curr_skb = head_skb;
> +	xdp_prog = rcu_dereference_bh(rq->xdp_prog);
> +	if (xdp_prog) {
> +		u32 act;
> +
> +		if (num_buf > 1) {
> +			bpf_warn_invalid_xdp_buffer();
> +			goto err_skb;
> +		}
> +
> +		act = do_xdp_prog(vi, xdp_prog, page, offset, len);
> +		if (act == XDP_DROP)
> +			goto err_skb;
> +	}
> +
> +	head_skb = page_to_skb(vi, rq, page, offset, len, truesize);
> +	curr_skb = head_skb;
>  
>  	if (unlikely(!curr_skb))
>  		goto err_skb;
> @@ -1328,6 +1388,13 @@ static int virtnet_set_channels(struct net_device *dev,
>  	if (queue_pairs > vi->max_queue_pairs || queue_pairs == 0)
>  		return -EINVAL;
>  
> +	/* For now we don't support modifying channels while XDP is loaded
> +	 * also when XDP is loaded all RX queues have XDP programs so we only
> +	 * need to check a single RX queue.
> +	 */
> +	if (vi->rq[0].xdp_prog)
> +		return -EINVAL;
> +
>  	get_online_cpus();
>  	err = virtnet_set_queues(vi, queue_pairs);
>  	if (!err) {
> @@ -1454,6 +1521,68 @@ static int virtnet_set_features(struct net_device *netdev,
>  	return 0;
>  }
>  
> +static int virtnet_xdp_set(struct net_device *dev, struct bpf_prog *prog)
> +{
> +	struct virtnet_info *vi = netdev_priv(dev);
> +	struct bpf_prog *old_prog;
> +	int i;
> +
> +	if ((dev->features & NETIF_F_LRO) && prog) {
> +		netdev_warn(dev, "can't set XDP while LRO is on, disable LRO first\n");
> +		return -EINVAL;
> +	}
> +
> +	if (vi->mergeable_rx_bufs && !vi->any_header_sg) {
> +		netdev_warn(dev, "XDP expects header/data in single page\n");
> +		return -EINVAL;
> +	}
> +
> +	if (dev->mtu > PAGE_SIZE) {
> +		netdev_warn(dev, "XDP requires MTU less than %lu\n", PAGE_SIZE);
> +		return -EINVAL;
> +	}
> +
> +	if (prog) {
> +		prog = bpf_prog_add(prog, vi->max_queue_pairs - 1);
> +		if (IS_ERR(prog))
> +			return PTR_ERR(prog);
> +	}
> +
> +	for (i = 0; i < vi->max_queue_pairs; i++) {
> +		old_prog = rtnl_dereference(vi->rq[i].xdp_prog);
> +		rcu_assign_pointer(vi->rq[i].xdp_prog, prog);
> +		if (old_prog)
> +			bpf_prog_put(old_prog);
> +	}
> +
> +	return 0;
> +}
> +
> +static bool virtnet_xdp_query(struct net_device *dev)
> +{
> +	struct virtnet_info *vi = netdev_priv(dev);
> +	int i;
> +
> +	for (i = 0; i < vi->max_queue_pairs; i++) {
> +		if (vi->rq[i].xdp_prog)
> +			return true;
> +	}
> +	return false;
> +}
> +
> +static int virtnet_xdp(struct net_device *dev, struct netdev_xdp *xdp)
> +{
> +	switch (xdp->command) {
> +	case XDP_SETUP_PROG:
> +		return virtnet_xdp_set(dev, xdp->prog);
> +	case XDP_QUERY_PROG:
> +		xdp->prog_attached = virtnet_xdp_query(dev);
> +		return 0;
> +	default:
> +		return -EINVAL;
> +	}
> +}
> +
>  static const struct net_device_ops virtnet_netdev = {
>  	.ndo_open            = virtnet_open,
>  	.ndo_stop   	     = virtnet_close,
> @@ -1471,6 +1600,7 @@ static int virtnet_set_features(struct net_device *netdev,
>  	.ndo_busy_poll		= virtnet_busy_poll,
>  #endif
>  	.ndo_set_features	= virtnet_set_features,
> +	.ndo_xdp		= virtnet_xdp,
>  };
>  
>  static void virtnet_config_changed_work(struct work_struct *work)
> @@ -1527,12 +1657,20 @@ static void virtnet_free_queues(struct virtnet_info *vi)
>  
>  static void free_receive_bufs(struct virtnet_info *vi)
>  {
> +	struct bpf_prog *old_prog;
>  	int i;
>  
> +	rtnl_lock();
>  	for (i = 0; i < vi->max_queue_pairs; i++) {
>  		while (vi->rq[i].pages)
>  			__free_pages(get_a_page(&vi->rq[i], GFP_KERNEL), 0);
> +
> +		old_prog = rtnl_dereference(vi->rq[i].xdp_prog);
> +		RCU_INIT_POINTER(vi->rq[i].xdp_prog, NULL);
> +		if (old_prog)
> +			bpf_prog_put(old_prog);
>  	}
> +	rtnl_unlock();
>  }
>  
>  static void free_receive_page_frags(struct virtnet_info *vi)

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [net-next PATCH v2 1/5] net: virtio dynamically disable/enable LRO
  2016-11-20  2:49 ` [net-next PATCH v2 1/5] net: virtio dynamically disable/enable LRO John Fastabend
@ 2016-11-21 23:23   ` Michael S. Tsirkin
  2016-11-22  8:16     ` John Fastabend
  0 siblings, 1 reply; 21+ messages in thread
From: Michael S. Tsirkin @ 2016-11-21 23:23 UTC (permalink / raw)
  To: John Fastabend
  Cc: daniel, eric.dumazet, kubakici, shm, davem, alexei.starovoitov,
	netdev, bblanco, john.r.fastabend, brouer, tgraf

On Sat, Nov 19, 2016 at 06:49:34PM -0800, John Fastabend wrote:
> This adds support for dynamically setting the LRO feature flag. The
> message to control guest features in the backend uses the
> CTRL_GUEST_OFFLOADS msg type.
> 
> Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
> ---
>  drivers/net/virtio_net.c |   45 ++++++++++++++++++++++++++++++++++++++++++++-
>  1 file changed, 44 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index ca5239a..8189e5b 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -1419,6 +1419,41 @@ static void virtnet_init_settings(struct net_device *dev)
>  	.set_settings = virtnet_set_settings,
>  };
>  
> +static int virtnet_set_features(struct net_device *netdev,
> +				netdev_features_t features)
> +{
> +	struct virtnet_info *vi = netdev_priv(netdev);
> +	struct virtio_device *vdev = vi->vdev;
> +	struct scatterlist sg;
> +	u64 offloads = 0;
> +
> +	if (features & NETIF_F_LRO)
> +		offloads |= (1 << VIRTIO_NET_F_GUEST_TSO4) |
> +			    (1 << VIRTIO_NET_F_GUEST_TSO6);
> +
> +	if (features & NETIF_F_RXCSUM)
> +		offloads |= (1 << VIRTIO_NET_F_GUEST_CSUM);
> +
> +	if (virtio_has_feature(vdev, VIRTIO_NET_F_CTRL_GUEST_OFFLOADS)) {
> +		sg_init_one(&sg, &offloads, sizeof(uint64_t));
> +		if (!virtnet_send_command(vi,
> +					  VIRTIO_NET_CTRL_GUEST_OFFLOADS,
> +					  VIRTIO_NET_CTRL_GUEST_OFFLOADS_SET,
> +					  &sg)) {
> +			dev_warn(&netdev->dev,
> +				 "Failed to set guest offloads by virtnet command.\n");
> +			return -EINVAL;
> +		}
> +	} else if (virtio_has_feature(vdev, VIRTIO_NET_F_CTRL_GUEST_OFFLOADS) &&
> +		   !virtio_has_feature(vdev, VIRTIO_F_VERSION_1)) {
> +		dev_warn(&netdev->dev,
> +			 "No support for setting offloads pre version_1.\n");
> +		return -EINVAL;
> +	}

I don't get this last warning. VIRTIO_NET_F_CTRL_GUEST_OFFLOADS
was exposed by legacy devices; I don't think it's related to
VIRTIO_F_VERSION_1.


> +
> +	return 0;
> +}
> +
>  static const struct net_device_ops virtnet_netdev = {
>  	.ndo_open            = virtnet_open,
>  	.ndo_stop   	     = virtnet_close,
> @@ -1435,6 +1470,7 @@ static void virtnet_init_settings(struct net_device *dev)
>  #ifdef CONFIG_NET_RX_BUSY_POLL
>  	.ndo_busy_poll		= virtnet_busy_poll,
>  #endif
> +	.ndo_set_features	= virtnet_set_features,
>  };
>  
>  static void virtnet_config_changed_work(struct work_struct *work)
> @@ -1810,6 +1846,12 @@ static int virtnet_probe(struct virtio_device *vdev)
>  	if (virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_CSUM))
>  		dev->features |= NETIF_F_RXCSUM;
>  
> +	if (virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_TSO4) &&
> +	    virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_TSO6)) {
> +		dev->features |= NETIF_F_LRO;
> +		dev->hw_features |= NETIF_F_LRO;
> +	}
> +
>  	dev->vlan_features = dev->features;
>  
>  	/* MTU range: 68 - 65535 */
> @@ -2047,7 +2089,8 @@ static int virtnet_restore(struct virtio_device *vdev)
>  	VIRTIO_NET_F_CTRL_RX, VIRTIO_NET_F_CTRL_VLAN, \
>  	VIRTIO_NET_F_GUEST_ANNOUNCE, VIRTIO_NET_F_MQ, \
>  	VIRTIO_NET_F_CTRL_MAC_ADDR, \
> -	VIRTIO_NET_F_MTU
> +	VIRTIO_NET_F_MTU, \
> +	VIRTIO_NET_F_CTRL_GUEST_OFFLOADS
>  
>  static unsigned int features[] = {
>  	VIRTNET_FEATURES,

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [net-next PATCH v2 1/5] net: virtio dynamically disable/enable LRO
  2016-11-21 23:23   ` Michael S. Tsirkin
@ 2016-11-22  8:16     ` John Fastabend
  0 siblings, 0 replies; 21+ messages in thread
From: John Fastabend @ 2016-11-22  8:16 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: daniel, eric.dumazet, kubakici, shm, davem, alexei.starovoitov,
	netdev, bblanco, john.r.fastabend, brouer, tgraf

On 16-11-21 03:23 PM, Michael S. Tsirkin wrote:
> On Sat, Nov 19, 2016 at 06:49:34PM -0800, John Fastabend wrote:
>> This adds support for dynamically setting the LRO feature flag. The
>> message to control guest features in the backend uses the
>> CTRL_GUEST_OFFLOADS msg type.
>>
>> Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
>> ---
>>  drivers/net/virtio_net.c |   45 ++++++++++++++++++++++++++++++++++++++++++++-
>>  1 file changed, 44 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
>> index ca5239a..8189e5b 100644
>> --- a/drivers/net/virtio_net.c
>> +++ b/drivers/net/virtio_net.c
>> @@ -1419,6 +1419,41 @@ static void virtnet_init_settings(struct net_device *dev)
>>  	.set_settings = virtnet_set_settings,
>>  };
>>  
>> +static int virtnet_set_features(struct net_device *netdev,
>> +				netdev_features_t features)
>> +{
>> +	struct virtnet_info *vi = netdev_priv(netdev);
>> +	struct virtio_device *vdev = vi->vdev;
>> +	struct scatterlist sg;
>> +	u64 offloads = 0;
>> +
>> +	if (features & NETIF_F_LRO)
>> +		offloads |= (1 << VIRTIO_NET_F_GUEST_TSO4) |
>> +			    (1 << VIRTIO_NET_F_GUEST_TSO6);
>> +
>> +	if (features & NETIF_F_RXCSUM)
>> +		offloads |= (1 << VIRTIO_NET_F_GUEST_CSUM);
>> +
>> +	if (virtio_has_feature(vdev, VIRTIO_NET_F_CTRL_GUEST_OFFLOADS)) {
>> +		sg_init_one(&sg, &offloads, sizeof(uint64_t));
>> +		if (!virtnet_send_command(vi,
>> +					  VIRTIO_NET_CTRL_GUEST_OFFLOADS,
>> +					  VIRTIO_NET_CTRL_GUEST_OFFLOADS_SET,
>> +					  &sg)) {
>> +			dev_warn(&netdev->dev,
>> +				 "Failed to set guest offloads by virtnet command.\n");
>> +			return -EINVAL;
>> +		}
>> +	} else if (virtio_has_feature(vdev, VIRTIO_NET_F_CTRL_GUEST_OFFLOADS) &&
>> +		   !virtio_has_feature(vdev, VIRTIO_F_VERSION_1)) {
>> +		dev_warn(&netdev->dev,
>> +			 "No support for setting offloads pre version_1.\n");
>> +		return -EINVAL;
>> +	}
> 
> I don't get this last warning. VIRTIO_NET_F_CTRL_GUEST_OFFLOADS
> was exposes by legacy devices, I don't think it's related to
> VIRTIO_F_VERSION_1.
> 

OK looks like I can just drop the else if branch here.

Thanks,
John

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [net-next PATCH v2 4/5] virtio_net: add dedicated XDP transmit queues
  2016-11-21 23:13   ` Michael S. Tsirkin
@ 2016-11-22  8:17     ` John Fastabend
  2016-11-22 14:59       ` Michael S. Tsirkin
  0 siblings, 1 reply; 21+ messages in thread
From: John Fastabend @ 2016-11-22  8:17 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: daniel, eric.dumazet, kubakici, shm, davem, alexei.starovoitov,
	netdev, bblanco, john.r.fastabend, brouer, tgraf

On 16-11-21 03:13 PM, Michael S. Tsirkin wrote:
> On Sat, Nov 19, 2016 at 06:51:04PM -0800, John Fastabend wrote:
>> XDP requires using isolated transmit queues to avoid interference
>> with normal networking stack (BQL, NETDEV_TX_BUSY, etc). This patch
>> adds a XDP queue per cpu when a XDP program is loaded and does not
>> expose the queues to the OS via the normal API call to
>> netif_set_real_num_tx_queues(). This way the stack will never push
>> an skb to these queues.
>>
>> However virtio/vhost/qemu implementation only allows for creating
>> TX/RX queue pairs at this time so creating only TX queues was not
>> possible. And because the associated RX queues are being created I
>> went ahead and exposed these to the stack and let the backend use
>> them. This creates more RX queues visible to the network stack than
>> TX queues which is worth mentioning but does not cause any issues as
>> far as I can tell.
>>
>> Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
> 
> FYI what's supposed to happen is packets from the same
> flow going in the reverse direction will go on the
> same queue.
> 
> This might come in handy when implementing RX XDP.
> 

Yeah, but if it's the first packet and not yet part of a flow then
presumably it can pick any queue; it's certainly worth keeping in mind
though.

.John

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [net-next PATCH v2 3/5] virtio_net: Add XDP support
  2016-11-21 23:20   ` Michael S. Tsirkin
@ 2016-11-22  8:27     ` John Fastabend
  2016-11-22 14:58       ` Michael S. Tsirkin
  0 siblings, 1 reply; 21+ messages in thread
From: John Fastabend @ 2016-11-22  8:27 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: daniel, eric.dumazet, kubakici, shm, davem, alexei.starovoitov,
	netdev, bblanco, john.r.fastabend, brouer, tgraf

On 16-11-21 03:20 PM, Michael S. Tsirkin wrote:
> On Sat, Nov 19, 2016 at 06:50:33PM -0800, John Fastabend wrote:
>> From: Shrijeet Mukherjee <shrijeet@gmail.com>
>>
>> This adds XDP support to virtio_net. Some requirements must be
>> met for XDP to be enabled depending on the mode. First it will
>> only be supported with LRO disabled so that data is not pushed
>> across multiple buffers. The MTU must be less than a page size
>> to avoid having to handle XDP across multiple pages.
>>
>> If mergeable receive is enabled this first series only supports
>> the case where header and data are in the same buf which we can
>> check when a packet is received by looking at num_buf. If the
>> num_buf is greater than 1 and a XDP program is loaded the packet
>> is dropped and a warning is thrown. When any_header_sg is set this
>> does not happen and both header and data is put in a single buffer
>> as expected so we check this when XDP programs are loaded. Note I
>> have only tested this with Linux vhost backend.
>>
>> If big packets mode is enabled and MTU/LRO conditions above are
>> met then XDP is allowed.
>>
>> A follow on patch can be generated to solve the mergeable receive
>> case with num_bufs equal to 2. Buffers greater than two may not
>> be handled has easily.
> 
> 
> I would very much prefer support for other layouts without drops
> before merging this.
> header by itself can certainly be handled by skipping it.
> People wanted to use that e.g. for zero copy.

OK, fair enough, I'll do this now rather than push it out.

> 
> Anything else can be handled by copying the packet.

This, though, I'm not so sure about. The copy is going to be slow, and
I wonder if someone could craft a packet to cause this and use it to
slow down a system.

Also I can't see what would cause this to happen. With mergeable
buffers and LRO off, num_bufs is either 1 or 2 depending on where the
header is. Otherwise with LRO off it should be in a single page. At
least this is the Linux vhost implementation; I guess other
implementations might meet the spec but use num_buf > 2 or multiple
pages even in the non-LRO case.

I tend to think dropping the packet outright is better than copying
it around. At the very least, if we do this we need to put in warnings
so users can see something is mis-configured.

.John

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [net-next PATCH v2 3/5] virtio_net: Add XDP support
  2016-11-22  8:27     ` John Fastabend
@ 2016-11-22 14:58       ` Michael S. Tsirkin
  2016-11-25 21:24         ` John Fastabend
  0 siblings, 1 reply; 21+ messages in thread
From: Michael S. Tsirkin @ 2016-11-22 14:58 UTC (permalink / raw)
  To: John Fastabend
  Cc: daniel, eric.dumazet, kubakici, shm, davem, alexei.starovoitov,
	netdev, bblanco, john.r.fastabend, brouer, tgraf

On Tue, Nov 22, 2016 at 12:27:03AM -0800, John Fastabend wrote:
> On 16-11-21 03:20 PM, Michael S. Tsirkin wrote:
> > On Sat, Nov 19, 2016 at 06:50:33PM -0800, John Fastabend wrote:
> >> From: Shrijeet Mukherjee <shrijeet@gmail.com>
> >>
> >> This adds XDP support to virtio_net. Some requirements must be
> >> met for XDP to be enabled depending on the mode. First it will
> >> only be supported with LRO disabled so that data is not pushed
> >> across multiple buffers. The MTU must be less than a page size
> >> to avoid having to handle XDP across multiple pages.
> >>
> >> If mergeable receive is enabled this first series only supports
> >> the case where header and data are in the same buf which we can
> >> check when a packet is received by looking at num_buf. If the
> >> num_buf is greater than 1 and a XDP program is loaded the packet
> >> is dropped and a warning is thrown. When any_header_sg is set this
> >> does not happen and both header and data is put in a single buffer
> >> as expected so we check this when XDP programs are loaded. Note I
> >> have only tested this with Linux vhost backend.
> >>
> >> If big packets mode is enabled and MTU/LRO conditions above are
> >> met then XDP is allowed.
> >>
> >> A follow on patch can be generated to solve the mergeable receive
> >> case with num_bufs equal to 2. Buffers greater than two may not
> >> be handled has easily.
> > 
> > 
> > I would very much prefer support for other layouts without drops
> > before merging this.
> > header by itself can certainly be handled by skipping it.
> > People wanted to use that e.g. for zero copy.
> 
> OK fair enough I'll do this now rather than push it out.
> 
> > 
> > Anything else can be handled by copying the packet.
> 
> This though I'm not so sure about. The copy is going to be slow and
> I wonder if someone could craft a packet to cause this if it could
> be used to slow down a system.

The device can always linearize if it wants to. If the device is
malicious it's hard for the OS to defend itself.

> Also I can't see what would cause this to happen. With mergeable
> buffers and LRO off the num_bufs is either 1 or 2 depending on where
> the header is. Otherwise with LRO off it should be in a single page.
> At least this is the Linux vhost implementation, I guess other
> implementation might meet spec but use num_buf > 2 or multiple pages
> even in the non LRO case.

Me neither, but then not long ago we always placed the header in a
separate entry, until we saw the extra s/g has measurable overhead.

A broken network is kind of a heavy-handed thing, making debugging
impossible for many people.

> I tend to think dropping the packet out right is better than copying
> it around. At very least if we do this we need to put in warnings so
> users can see something is mis-configured.
> 
> .John

Yes, I think that's a good idea.

-- 
MST

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [net-next PATCH v2 4/5] virtio_net: add dedicated XDP transmit queues
  2016-11-22  8:17     ` John Fastabend
@ 2016-11-22 14:59       ` Michael S. Tsirkin
  0 siblings, 0 replies; 21+ messages in thread
From: Michael S. Tsirkin @ 2016-11-22 14:59 UTC (permalink / raw)
  To: John Fastabend
  Cc: daniel, eric.dumazet, kubakici, shm, davem, alexei.starovoitov,
	netdev, bblanco, john.r.fastabend, brouer, tgraf

On Tue, Nov 22, 2016 at 12:17:40AM -0800, John Fastabend wrote:
> On 16-11-21 03:13 PM, Michael S. Tsirkin wrote:
> > On Sat, Nov 19, 2016 at 06:51:04PM -0800, John Fastabend wrote:
> >> XDP requires using isolated transmit queues to avoid interference
> >> with normal networking stack (BQL, NETDEV_TX_BUSY, etc). This patch
> >> adds a XDP queue per cpu when a XDP program is loaded and does not
> >> expose the queues to the OS via the normal API call to
> >> netif_set_real_num_tx_queues(). This way the stack will never push
> >> an skb to these queues.
> >>
> >> However virtio/vhost/qemu implementation only allows for creating
> >> TX/RX queue pairs at this time so creating only TX queues was not
> >> possible. And because the associated RX queues are being created I
> >> went ahead and exposed these to the stack and let the backend use
> >> them. This creates more RX queues visible to the network stack than
> >> TX queues which is worth mentioning but does not cause any issues as
> >> far as I can tell.
> >>
> >> Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
> > 
> > FYI what's supposed to happen is packets from the same
> > flow going in the reverse direction will go on the
> > same queue.
> > 
> > This might come in handy when implementing RX XDP.
> > 
> 
> Yeah but if its the first packet not part of a flow then presumably it
> can pick any queue but its worth keeping in mind certainly.
> 
> .John

Oh I agree, absolutely. This was just an FYI in case it comes in
useful as an optimization down the road.

-- 
MST

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [net-next PATCH v2 3/5] virtio_net: Add XDP support
  2016-11-22 14:58       ` Michael S. Tsirkin
@ 2016-11-25 21:24         ` John Fastabend
  2016-11-28  3:36           ` Michael S. Tsirkin
  0 siblings, 1 reply; 21+ messages in thread
From: John Fastabend @ 2016-11-25 21:24 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: daniel, eric.dumazet, kubakici, shm, davem, alexei.starovoitov,
	netdev, bblanco, john.r.fastabend, brouer, tgraf

On 16-11-22 06:58 AM, Michael S. Tsirkin wrote:
> On Tue, Nov 22, 2016 at 12:27:03AM -0800, John Fastabend wrote:
>> On 16-11-21 03:20 PM, Michael S. Tsirkin wrote:
>>> On Sat, Nov 19, 2016 at 06:50:33PM -0800, John Fastabend wrote:
>>>> From: Shrijeet Mukherjee <shrijeet@gmail.com>
>>>>
>>>> This adds XDP support to virtio_net. Some requirements must be
>>>> met for XDP to be enabled depending on the mode. First it will
>>>> only be supported with LRO disabled so that data is not pushed
>>>> across multiple buffers. The MTU must be less than a page size
>>>> to avoid having to handle XDP across multiple pages.
>>>>
>>>> If mergeable receive is enabled this first series only supports
>>>> the case where header and data are in the same buf which we can
>>>> check when a packet is received by looking at num_buf. If the
>>>> num_buf is greater than 1 and a XDP program is loaded the packet
>>>> is dropped and a warning is thrown. When any_header_sg is set this
>>>> does not happen and both header and data is put in a single buffer
>>>> as expected so we check this when XDP programs are loaded. Note I
>>>> have only tested this with Linux vhost backend.
>>>>
>>>> If big packets mode is enabled and MTU/LRO conditions above are
>>>> met then XDP is allowed.
>>>>
>>>> A follow on patch can be generated to solve the mergeable receive
>>>> case with num_bufs equal to 2. Buffers greater than two may not
>>>> be handled has easily.
>>>
>>>
>>> I would very much prefer support for other layouts without drops
>>> before merging this.
>>> header by itself can certainly be handled by skipping it.
>>> People wanted to use that e.g. for zero copy.
>>
>> OK fair enough I'll do this now rather than push it out.
>>

Hi Michael,

The header skip logic, however, complicates the xmit handling a fair
amount. Specifically, when we release the buffers after xmit, both
the hdr and data portions need to be released, which requires some
tracking.

Is the header split logic actually in use somewhere today? It looks
like it's not being used in the Linux case. And zero copy RX is, as
best I can tell, currently not supported anywhere, so I would prefer
not to complicate the XDP path at the moment for a possible future
feature.

>>>
>>> Anything else can be handled by copying the packet.

Any idea how to test this? At the moment I have some code to
linearize the data in all cases with more than a single buffer, but
it wasn't clear to me which features I could negotiate with
vhost/qemu to get more than a single buffer in the receive path.
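
For reference, a rough sketch of the kind of linearize step being
described: copy the first buffer plus any follow-on mergeable
buffers into one freshly allocated page before running the XDP
program. The helper name, struct members and receive-queue plumbing
below are assumptions for illustration, not the code under
discussion:

/* Illustrative only: pull up to num_buf mergeable buffers into a
 * single page so the XDP program always sees linear data.
 */
static struct page *xdp_linearize_page(struct receive_queue *rq,
				       u16 num_buf, struct page *p,
				       int offset, int len)
{
	struct page *page = alloc_page(GFP_ATOMIC);
	unsigned int page_off = 0;

	if (!page)
		return NULL;

	memcpy(page_address(page) + page_off,
	       page_address(p) + offset, len);
	page_off += len;

	while (--num_buf) {
		unsigned int buflen;
		void *buf;

		buf = virtqueue_get_buf(rq->vq, &buflen);
		if (unlikely(!buf))
			goto err_buf;

		/* never let the copy spill past the single page */
		if (page_off + buflen > PAGE_SIZE)
			goto err_buf;

		memcpy(page_address(page) + page_off, buf, buflen);
		page_off += buflen;
		put_page(virt_to_head_page(buf));
	}

	return page;

err_buf:
	__free_pages(page, 0);
	return NULL;
}

The caller would then run the BPF program over page_address(page)
and release the page when done.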

Thanks,
John

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [net-next PATCH v2 3/5] virtio_net: Add XDP support
  2016-11-25 21:24         ` John Fastabend
@ 2016-11-28  3:36           ` Michael S. Tsirkin
  2016-11-28  3:56             ` John Fastabend
  0 siblings, 1 reply; 21+ messages in thread
From: Michael S. Tsirkin @ 2016-11-28  3:36 UTC (permalink / raw)
  To: John Fastabend
  Cc: daniel, eric.dumazet, kubakici, shm, davem, alexei.starovoitov,
	netdev, bblanco, john.r.fastabend, brouer, tgraf

On Fri, Nov 25, 2016 at 01:24:03PM -0800, John Fastabend wrote:
> On 16-11-22 06:58 AM, Michael S. Tsirkin wrote:
> > On Tue, Nov 22, 2016 at 12:27:03AM -0800, John Fastabend wrote:
> >> On 16-11-21 03:20 PM, Michael S. Tsirkin wrote:
> >>> On Sat, Nov 19, 2016 at 06:50:33PM -0800, John Fastabend wrote:
> >>>> From: Shrijeet Mukherjee <shrijeet@gmail.com>
> >>>>
> >>>> This adds XDP support to virtio_net. Some requirements must be
> >>>> met for XDP to be enabled depending on the mode. First it will
> >>>> only be supported with LRO disabled so that data is not pushed
> >>>> across multiple buffers. The MTU must be less than a page size
> >>>> to avoid having to handle XDP across multiple pages.
> >>>>
> >>>> If mergeable receive is enabled this first series only supports
> >>>> the case where header and data are in the same buf which we can
> >>>> check when a packet is received by looking at num_buf. If the
> >>>> num_buf is greater than 1 and a XDP program is loaded the packet
> >>>> is dropped and a warning is thrown. When any_header_sg is set this
> >>>> does not happen and both header and data is put in a single buffer
> >>>> as expected so we check this when XDP programs are loaded. Note I
> >>>> have only tested this with Linux vhost backend.
> >>>>
> >>>> If big packets mode is enabled and MTU/LRO conditions above are
> >>>> met then XDP is allowed.
> >>>>
> >>>> A follow on patch can be generated to solve the mergeable receive
> >>>> case with num_bufs equal to 2. Buffers greater than two may not
> >>>> be handled has easily.
> >>>
> >>>
> >>> I would very much prefer support for other layouts without drops
> >>> before merging this.
> >>> header by itself can certainly be handled by skipping it.
> >>> People wanted to use that e.g. for zero copy.
> >>
> >> OK fair enough I'll do this now rather than push it out.
> >>
> 
> Hi Michael,
> 
> The header skip logic however complicates the xmit handling a fair
> amount. Specifically when we release the buffers after xmit then
> both the hdr and data portions need to be released which requires
> some tracking.

I thought you disable all checksum offloads so why not discard the
header immediately?

> Is the header split logic actually in use somewhere today? It looks
> like its not being used in Linux case. And zero copy RX is currently as
> best I can tell not supported anywhere so I would prefer not to
> complicate the XDP path at the moment with a possible future feature.

Well it's part of the documented interface so we never
know who implemented it. Normally if we want to make
restrictions we would do the reverse and add a feature.

We can do this easily, but I'd like to first look into
just handling all possible inputs as the spec asks us to.
I'm a bit too busy with other stuff next week but will
look into this a week after that if you don't beat me to it.

> >>>
> >>> Anything else can be handled by copying the packet.
> 
> Any idea how to test this? At the moment I have some code to linearize
> the data in all cases with more than a single buffer. But wasn't clear
> to me which features I could negotiate with vhost/qemu to get more than
> a single buffer in the receive path.
> 
> Thanks,
> John

ATM you need to hack qemu. Here's a hack to make header completely
separate.


diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
index b68c69d..4866144 100644
--- a/hw/net/virtio-net.c
+++ b/hw/net/virtio-net.c
@@ -1164,6 +1164,7 @@ static ssize_t virtio_net_receive(NetClientState *nc, const uint8_t *buf, size_t
             offset = n->host_hdr_len;
             total += n->guest_hdr_len;
             guest_offset = n->guest_hdr_len;
+            continue;
         } else {
             guest_offset = 0;
         }



here's one that should cap the 1st s/g to 100 bytes:


diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
index b68c69d..7943004 100644
--- a/hw/net/virtio-net.c
+++ b/hw/net/virtio-net.c
@@ -1164,6 +1164,7 @@ static ssize_t virtio_net_receive(NetClientState *nc, const uint8_t *buf, size_t
             offset = n->host_hdr_len;
             total += n->guest_hdr_len;
             guest_offset = n->guest_hdr_len;
+            sg.iov_len = MIN(sg.iov_len, 100);
         } else {
             guest_offset = 0;
         }

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* Re: [net-next PATCH v2 3/5] virtio_net: Add XDP support
  2016-11-28  3:36           ` Michael S. Tsirkin
@ 2016-11-28  3:56             ` John Fastabend
  2016-11-28  4:07               ` Michael S. Tsirkin
  0 siblings, 1 reply; 21+ messages in thread
From: John Fastabend @ 2016-11-28  3:56 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: daniel, eric.dumazet, kubakici, shm, davem, alexei.starovoitov,
	netdev, bblanco, john.r.fastabend, brouer, tgraf

On 16-11-27 07:36 PM, Michael S. Tsirkin wrote:
> On Fri, Nov 25, 2016 at 01:24:03PM -0800, John Fastabend wrote:
>> On 16-11-22 06:58 AM, Michael S. Tsirkin wrote:
>>> On Tue, Nov 22, 2016 at 12:27:03AM -0800, John Fastabend wrote:
>>>> On 16-11-21 03:20 PM, Michael S. Tsirkin wrote:
>>>>> On Sat, Nov 19, 2016 at 06:50:33PM -0800, John Fastabend wrote:
>>>>>> From: Shrijeet Mukherjee <shrijeet@gmail.com>
>>>>>>
>>>>>> This adds XDP support to virtio_net. Some requirements must be
>>>>>> met for XDP to be enabled depending on the mode. First it will
>>>>>> only be supported with LRO disabled so that data is not pushed
>>>>>> across multiple buffers. The MTU must be less than a page size
>>>>>> to avoid having to handle XDP across multiple pages.
>>>>>>
>>>>>> If mergeable receive is enabled this first series only supports
>>>>>> the case where header and data are in the same buf which we can
>>>>>> check when a packet is received by looking at num_buf. If the
>>>>>> num_buf is greater than 1 and a XDP program is loaded the packet
>>>>>> is dropped and a warning is thrown. When any_header_sg is set this
>>>>>> does not happen and both header and data is put in a single buffer
>>>>>> as expected so we check this when XDP programs are loaded. Note I
>>>>>> have only tested this with Linux vhost backend.
>>>>>>
>>>>>> If big packets mode is enabled and MTU/LRO conditions above are
>>>>>> met then XDP is allowed.
>>>>>>
>>>>>> A follow on patch can be generated to solve the mergeable receive
>>>>>> case with num_bufs equal to 2. Buffers greater than two may not
>>>>>> be handled has easily.
>>>>>
>>>>>
>>>>> I would very much prefer support for other layouts without drops
>>>>> before merging this.
>>>>> header by itself can certainly be handled by skipping it.
>>>>> People wanted to use that e.g. for zero copy.
>>>>
>>>> OK fair enough I'll do this now rather than push it out.
>>>>
>>
>> Hi Michael,
>>
>> The header skip logic however complicates the xmit handling a fair
>> amount. Specifically when we release the buffers after xmit then
>> both the hdr and data portions need to be released which requires
>> some tracking.
> 
> I thought you disable all checksum offloads so why not discard the
> header immediately?

Well in the "normal" case where the header is part of the same buffer
we keep it to use the same space for the header on the TX path.

If we discard it in the header split case we have to push the header
somewhere else. In the skb case the cb[] region is used it looks like.
In our case I guess free space at the end of the page could be used.

My thinking is that if we handle the general case of more than one
buffer being used with a copy, we can handle the case above using
the same logic, with no need to treat it as a special case. It seems
to be an odd case that doesn't really exist anyway, at least not in
qemu/Linux. I have not tested anything else.
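
For the common single-buffer layout that reuse looks roughly like
the sketch below: the virtio_net header sits in the same page
immediately ahead of the packet data, so XDP_TX can rewrite it in
place instead of tracking a separate header buffer. The names
(send_queue, hdr_len, sq->sg) follow virtio_net conventions but are
assumptions here, not the patch itself:

/* Sketch only: reuse the in-buffer header space on XDP_TX.  With
 * checksum/GSO offloads disabled a zeroed header is sufficient.
 */
static int xdp_tx_reuse_hdr(struct virtnet_info *vi,
			    struct send_queue *sq,
			    void *data, unsigned int len)
{
	struct virtio_net_hdr_mrg_rxbuf *hdr = data - vi->hdr_len;

	memset(hdr, 0, vi->hdr_len);

	/* one s/g entry covering header plus packet data */
	sg_init_one(sq->sg, hdr, vi->hdr_len + len);
	return virtqueue_add_outbuf(sq->vq, sq->sg, 1, data, GFP_ATOMIC);
}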

> 
>> Is the header split logic actually in use somewhere today? It looks
>> like its not being used in Linux case. And zero copy RX is currently as
>> best I can tell not supported anywhere so I would prefer not to
>> complicate the XDP path at the moment with a possible future feature.
> 
> Well it's part of the documented interface so we never
> know who implemented it. Normally if we want to make
> restrictions we would do the reverse and add a feature.
> 
> We can do this easily, but I'd like to first look into
> just handling all possible inputs as the spec asks us to.
> I'm a bit too busy with other stuff next week but will
> look into this a week after that if you don't beat me to it.
> 

Well, I've almost got it working now with some logic to copy
everything into a single page if we hit this case, so it should be
OK but slow. I'll finish testing this and hopefully send it out in
the next few days.

>>>>>
>>>>> Anything else can be handled by copying the packet.
>>
>> Any idea how to test this? At the moment I have some code to linearize
>> the data in all cases with more than a single buffer. But wasn't clear
>> to me which features I could negotiate with vhost/qemu to get more than
>> a single buffer in the receive path.
>>
>> Thanks,
>> John
> 
> ATM you need to hack qemu. Here's a hack to make header completely
> separate.
> 

Perfect! Hacking qemu for testing is no problem; this helps a lot,
thanks, and saves me time trying to figure out how to get qemu to do
this.

> 
> diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
> index b68c69d..4866144 100644
> --- a/hw/net/virtio-net.c
> +++ b/hw/net/virtio-net.c
> @@ -1164,6 +1164,7 @@ static ssize_t virtio_net_receive(NetClientState *nc, const uint8_t *buf, size_t
>              offset = n->host_hdr_len;
>              total += n->guest_hdr_len;
>              guest_offset = n->guest_hdr_len;
> +            continue;
>          } else {
>              guest_offset = 0;
>          }
> 
> 
> 
> here's one that should cap the 1st s/g to 100 bytes:
> 
> 
> diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
> index b68c69d..7943004 100644
> --- a/hw/net/virtio-net.c
> +++ b/hw/net/virtio-net.c
> @@ -1164,6 +1164,7 @@ static ssize_t virtio_net_receive(NetClientState *nc, const uint8_t *buf, size_t
>              offset = n->host_hdr_len;
>              total += n->guest_hdr_len;
>              guest_offset = n->guest_hdr_len;
> +            sg.iov_len = MIN(sg.iov_len, 100);
>          } else {
>              guest_offset = 0;
>          }
> 

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [net-next PATCH v2 3/5] virtio_net: Add XDP support
  2016-11-28  3:56             ` John Fastabend
@ 2016-11-28  4:07               ` Michael S. Tsirkin
  2016-11-28 23:26                 ` John Fastabend
  0 siblings, 1 reply; 21+ messages in thread
From: Michael S. Tsirkin @ 2016-11-28  4:07 UTC (permalink / raw)
  To: John Fastabend
  Cc: daniel, eric.dumazet, kubakici, shm, davem, alexei.starovoitov,
	netdev, bblanco, john.r.fastabend, brouer, tgraf

On Sun, Nov 27, 2016 at 07:56:09PM -0800, John Fastabend wrote:
> On 16-11-27 07:36 PM, Michael S. Tsirkin wrote:
> > On Fri, Nov 25, 2016 at 01:24:03PM -0800, John Fastabend wrote:
> >> On 16-11-22 06:58 AM, Michael S. Tsirkin wrote:
> >>> On Tue, Nov 22, 2016 at 12:27:03AM -0800, John Fastabend wrote:
> >>>> On 16-11-21 03:20 PM, Michael S. Tsirkin wrote:
> >>>>> On Sat, Nov 19, 2016 at 06:50:33PM -0800, John Fastabend wrote:
> >>>>>> From: Shrijeet Mukherjee <shrijeet@gmail.com>
> >>>>>>
> >>>>>> This adds XDP support to virtio_net. Some requirements must be
> >>>>>> met for XDP to be enabled depending on the mode. First it will
> >>>>>> only be supported with LRO disabled so that data is not pushed
> >>>>>> across multiple buffers. The MTU must be less than a page size
> >>>>>> to avoid having to handle XDP across multiple pages.
> >>>>>>
> >>>>>> If mergeable receive is enabled this first series only supports
> >>>>>> the case where header and data are in the same buf which we can
> >>>>>> check when a packet is received by looking at num_buf. If the
> >>>>>> num_buf is greater than 1 and a XDP program is loaded the packet
> >>>>>> is dropped and a warning is thrown. When any_header_sg is set this
> >>>>>> does not happen and both header and data is put in a single buffer
> >>>>>> as expected so we check this when XDP programs are loaded. Note I
> >>>>>> have only tested this with Linux vhost backend.
> >>>>>>
> >>>>>> If big packets mode is enabled and MTU/LRO conditions above are
> >>>>>> met then XDP is allowed.
> >>>>>>
> >>>>>> A follow on patch can be generated to solve the mergeable receive
> >>>>>> case with num_bufs equal to 2. Buffers greater than two may not
> >>>>>> be handled has easily.
> >>>>>
> >>>>>
> >>>>> I would very much prefer support for other layouts without drops
> >>>>> before merging this.
> >>>>> header by itself can certainly be handled by skipping it.
> >>>>> People wanted to use that e.g. for zero copy.
> >>>>
> >>>> OK fair enough I'll do this now rather than push it out.
> >>>>
> >>
> >> Hi Michael,
> >>
> >> The header skip logic however complicates the xmit handling a fair
> >> amount. Specifically when we release the buffers after xmit then
> >> both the hdr and data portions need to be released which requires
> >> some tracking.
> > 
> > I thought you disable all checksum offloads so why not discard the
> > header immediately?
> 
> Well in the "normal" case where the header is part of the same buffer
> we keep it to use the same space for the header on the TX path.
> 
> If we discard it in the header split case we have to push the header
> somewhere else. In the skb case the cb[] region is used it looks like.
> In our case I guess free space at the end of the page could be used.

You don't have to put the start of the page in the buffer, you can
put an offset there. That will result in some waste in the common
case, but it's just several bytes so likely not a big deal.
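
As a sketch of that idea: post the receive buffer at a small offset
into the page so there is always room for a header in front of the
data; the reserved bytes are the waste mentioned above. Function and
field names below are assumptions for illustration only:

/* Assumed sketch: reserve hdr_len bytes of headroom in each posted
 * receive buffer so a header can be rebuilt in front of the data
 * even if the device delivered it in a separate descriptor.
 */
static int add_recvbuf_with_headroom(struct virtnet_info *vi,
				     struct receive_queue *rq,
				     gfp_t gfp)
{
	unsigned int headroom = vi->hdr_len;
	struct page *page = alloc_page(gfp);
	struct scatterlist sg;

	if (!page)
		return -ENOMEM;

	/* hand the device everything after the reserved headroom */
	sg_init_one(&sg, page_address(page) + headroom,
		    PAGE_SIZE - headroom);

	/* the token returned by virtqueue_get_buf() points at the data */
	return virtqueue_add_inbuf(rq->vq, &sg, 1,
				   page_address(page) + headroom, gfp);
}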

> My thinking is if we handle the general case of more than one buffer
> being used with a copy we can handle the case above using the same
> logic and no need to handle it as a special case. It seems to be an odd
> case that doesn't really exist anyways. At least not in qemu/Linux. I
> have not tested anything else.

OK

> > 
> >> Is the header split logic actually in use somewhere today? It looks
> >> like its not being used in Linux case. And zero copy RX is currently as
> >> best I can tell not supported anywhere so I would prefer not to
> >> complicate the XDP path at the moment with a possible future feature.
> > 
> > Well it's part of the documented interface so we never
> > know who implemented it. Normally if we want to make
> > restrictions we would do the reverse and add a feature.
> > 
> > We can do this easily, but I'd like to first look into
> > just handling all possible inputs as the spec asks us to.
> > I'm a bit too busy with other stuff next week but will
> > look into this a week after that if you don't beat me to it.
> > 
> 
> Well I've almost got it working now with some logic to copy everything
> into a single page if we hit this case so should be OK but slow. I'll
> finish testing this and send it out hopefully in the next few days.
> 
> >>>>>
> >>>>> Anything else can be handled by copying the packet.
> >>
> >> Any idea how to test this? At the moment I have some code to linearize
> >> the data in all cases with more than a single buffer. But wasn't clear
> >> to me which features I could negotiate with vhost/qemu to get more than
> >> a single buffer in the receive path.
> >>
> >> Thanks,
> >> John
> > 
> > ATM you need to hack qemu. Here's a hack to make header completely
> > separate.
> > 
> 
> Perfect! hacking qemu for testing is no problem this helps a lot thanks
> and saves me time trying to figure out how to get qemu to do this.

Pls note I didn't try this at all, so might not work, but should
give you the idea.

> > 
> > diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
> > index b68c69d..4866144 100644
> > --- a/hw/net/virtio-net.c
> > +++ b/hw/net/virtio-net.c
> > @@ -1164,6 +1164,7 @@ static ssize_t virtio_net_receive(NetClientState *nc, const uint8_t *buf, size_t
> >              offset = n->host_hdr_len;
> >              total += n->guest_hdr_len;
> >              guest_offset = n->guest_hdr_len;
> > +            continue;
> >          } else {
> >              guest_offset = 0;
> >          }
> > 
> > 
> > 
> > here's one that should cap the 1st s/g to 100 bytes:
> > 
> > 
> > diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
> > index b68c69d..7943004 100644
> > --- a/hw/net/virtio-net.c
> > +++ b/hw/net/virtio-net.c
> > @@ -1164,6 +1164,7 @@ static ssize_t virtio_net_receive(NetClientState *nc, const uint8_t *buf, size_t
> >              offset = n->host_hdr_len;
> >              total += n->guest_hdr_len;
> >              guest_offset = n->guest_hdr_len;
> > +            sg.iov_len = MIN(sg.iov_len, 100);
> >          } else {
> >              guest_offset = 0;
> >          }
> > 

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [net-next PATCH v2 3/5] virtio_net: Add XDP support
  2016-11-28  4:07               ` Michael S. Tsirkin
@ 2016-11-28 23:26                 ` John Fastabend
  0 siblings, 0 replies; 21+ messages in thread
From: John Fastabend @ 2016-11-28 23:26 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: daniel, eric.dumazet, kubakici, shm, davem, alexei.starovoitov,
	netdev, bblanco, john.r.fastabend, brouer, tgraf

[...]

>> Perfect! hacking qemu for testing is no problem this helps a lot thanks
>> and saves me time trying to figure out how to get qemu to do this.
> 
> Pls note I didn't try this at all, so might not work, but should
> give you the idea.
> 
>>>
>>> diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
>>> index b68c69d..4866144 100644
>>> --- a/hw/net/virtio-net.c
>>> +++ b/hw/net/virtio-net.c
>>> @@ -1164,6 +1164,7 @@ static ssize_t virtio_net_receive(NetClientState *nc, const uint8_t *buf, size_t
>>>              offset = n->host_hdr_len;
>>>              total += n->guest_hdr_len;
>>>              guest_offset = n->guest_hdr_len;
>>> +            continue;
>>>          } else {
>>>              guest_offset = 0;
>>>          }
>>>
>>>
>>>
>>> here's one that should cap the 1st s/g to 100 bytes:
>>>
>>>
>>> diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
>>> index b68c69d..7943004 100644
>>> --- a/hw/net/virtio-net.c
>>> +++ b/hw/net/virtio-net.c
>>> @@ -1164,6 +1164,7 @@ static ssize_t virtio_net_receive(NetClientState *nc, const uint8_t *buf, size_t
>>>              offset = n->host_hdr_len;
>>>              total += n->guest_hdr_len;
>>>              guest_offset = n->guest_hdr_len;
>>> +            sg.iov_len = MIN(sg.iov_len, 100);
>>>          } else {
>>>              guest_offset = 0;
>>>          }
>>>

Here is the patch I went with; I'm using vhost=on:

--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -1777,7 +1777,8 @@ static int translate_desc(struct vhost_virtqueue *vq, u64

                _iov = iov + ret;
                size = node->size - addr + node->start;
-               _iov->iov_len = min((u64)len - s, size);
+               printk("%s: build 100 length headers!\n", __func__);
+               _iov->iov_len = min((u64)len - s, (u64)100);//size);
                _iov->iov_base = (void __user *)(unsigned long)
                        (node->userspace_addr + addr - node->start);
                s += size;


This seems to do the trick, and with 100 I can use 'ping -s' to
generate as many bufs as needed up to the MTU. The patch I have
seems to be working fine; I'll let it run a bit and test it with
some real traffic (not just ping), then push out a v3 assuming I
don't find any issues.

Thanks,
John

^ permalink raw reply	[flat|nested] 21+ messages in thread

end of thread, other threads:[~2016-11-28 23:27 UTC | newest]

Thread overview: 21+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-11-20  2:49 [net-next PATCH v2 0/5] XDP for virtio_net John Fastabend
2016-11-20  2:49 ` [net-next PATCH v2 1/5] net: virtio dynamically disable/enable LRO John Fastabend
2016-11-21 23:23   ` Michael S. Tsirkin
2016-11-22  8:16     ` John Fastabend
2016-11-20  2:50 ` [net-next PATCH v2 2/5] net: xdp: add invalid buffer warning John Fastabend
2016-11-20  2:50 ` [net-next PATCH v2 3/5] virtio_net: Add XDP support John Fastabend
2016-11-21 23:20   ` Michael S. Tsirkin
2016-11-22  8:27     ` John Fastabend
2016-11-22 14:58       ` Michael S. Tsirkin
2016-11-25 21:24         ` John Fastabend
2016-11-28  3:36           ` Michael S. Tsirkin
2016-11-28  3:56             ` John Fastabend
2016-11-28  4:07               ` Michael S. Tsirkin
2016-11-28 23:26                 ` John Fastabend
2016-11-20  2:51 ` [net-next PATCH v2 4/5] virtio_net: add dedicated XDP transmit queues John Fastabend
2016-11-21 11:45   ` Daniel Borkmann
2016-11-21 15:56     ` John Fastabend
2016-11-21 23:13   ` Michael S. Tsirkin
2016-11-22  8:17     ` John Fastabend
2016-11-22 14:59       ` Michael S. Tsirkin
2016-11-20  2:51 ` [net-next PATCH v2 5/5] virtio_net: add XDP_TX support John Fastabend

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.