* [net-next PATCH v5 0/6] XDP for virtio_net
@ 2016-12-07 20:10 John Fastabend
  2016-12-07 20:11 ` [net-next PATCH v5 1/6] net: virtio dynamically disable/enable LRO John Fastabend
                   ` (6 more replies)
  0 siblings, 7 replies; 35+ messages in thread
From: John Fastabend @ 2016-12-07 20:10 UTC (permalink / raw)
  To: daniel, mst, shm, davem, tgraf, alexei.starovoitov
  Cc: john.r.fastabend, netdev, john.fastabend, brouer

This implements XDP for virtio_net in the mergeable buffers and
big_packet modes. I tested this with vhost_net running on qemu and
did not see any issues. For testing num_buf > 1 I added a hack to
the vhost driver to only put 100 bytes per buffer.

There are some restrictions for XDP to be enabled and work well
(see patch 3 for more details):

  1. LRO must be off
  2. MTU must be less than PAGE_SIZE
  3. queues must be available to dedicate to XDP
  4. num_bufs received in mergeable buffers must be 1
  5. big_packet mode must have all data on single page

To test this I used pktgen in the hypervisor and ran the XDP sample
programs xdp1 and xdp2 from ./samples/bpf in the host. The default
mode that is used with these patches with Linux guest and QEMU/Linux
hypervisor is the mergeable buffers mode. I tested this mode for 2+
days running xdp2 without issues. Additionally I did a series of
driver unload/load tests to check the allocate/release paths.

To test the big_packets path I applied the following simple patch against
the virtio driver forcing big_packets mode,

--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -2242,7 +2242,7 @@ static int virtnet_probe(struct virtio_device *vdev)
                vi->big_packets = true;
 
        if (virtio_has_feature(vdev, VIRTIO_NET_F_MRG_RXBUF))
-               vi->mergeable_rx_bufs = true;
+               vi->mergeable_rx_bufs = false;
 
        if (virtio_has_feature(vdev, VIRTIO_NET_F_MRG_RXBUF) ||
            virtio_has_feature(vdev, VIRTIO_F_VERSION_1))

I then repeated the tests with xdp1 and xdp2. After letting them run
for a few hours I called it good enough.

Testing the unexpected case where virtio receives a packet across
multiple buffers required patching the hypervisor vhost driver to
convince it to send these unexpected packets. Then I used ping with
the -s option to trigger the case with multiple buffers. This mode
is not expected to be used, but as MST pointed out, per spec it is
not strictly speaking illegal to generate multi-buffer packets, so
we need some way to handle them. The following patch can be used to
generate multiple buffers:


--- a/drivers/vhost/vhost.c
+++ b/drivers/vhost/vhost.c
@@ -1777,7 +1777,8 @@ static int translate_desc(struct vhost_virtqueue
*vq, u64

                _iov = iov + ret;
                size = node->size - addr + node->start;
-               _iov->iov_len = min((u64)len - s, size);
+               printk("%s: build 100 length headers!\n", __func__);
+               _iov->iov_len = min((u64)len - s, (u64)100);//size);
                _iov->iov_base = (void __user *)(unsigned long)
                        (node->userspace_addr + addr - node->start);
                s += size;

Please review; any comments/feedback welcome as always.

Thanks,
John

---

John Fastabend (6):
      net: virtio dynamically disable/enable LRO
      net: xdp: add invalid buffer warning
      virtio_net: Add XDP support
      virtio_net: add dedicated XDP transmit queues
      virtio_net: add XDP_TX support
      virtio_net: xdp, add slowpath case for non contiguous buffers


 drivers/net/virtio_net.c |  403 +++++++++++++++++++++++++++++++++++++++++++++-
 include/linux/filter.h   |    1 +
 net/core/filter.c        |    6 +
 3 files changed, 402 insertions(+), 8 deletions(-)

^ permalink raw reply	[flat|nested] 35+ messages in thread

* [net-next PATCH v5 1/6] net: virtio dynamically disable/enable LRO
  2016-12-07 20:10 [net-next PATCH v5 0/6] XDP for virtio_net John Fastabend
@ 2016-12-07 20:11 ` John Fastabend
  2016-12-08 21:36   ` Michael S. Tsirkin
  2016-12-07 20:11 ` [net-next PATCH v5 2/6] net: xdp: add invalid buffer warning John Fastabend
                   ` (5 subsequent siblings)
  6 siblings, 1 reply; 35+ messages in thread
From: John Fastabend @ 2016-12-07 20:11 UTC (permalink / raw)
  To: daniel, mst, shm, davem, tgraf, alexei.starovoitov
  Cc: john.r.fastabend, netdev, john.fastabend, brouer

This adds support for dynamically setting the LRO feature flag. The
message to control guest features in the backend uses the
CTRL_GUEST_OFFLOADS msg type.

Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
---
 drivers/net/virtio_net.c |   40 +++++++++++++++++++++++++++++++++++++++-
 1 file changed, 39 insertions(+), 1 deletion(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index a21d93a..a5c47b1 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -1419,6 +1419,36 @@ static void virtnet_init_settings(struct net_device *dev)
 	.set_settings = virtnet_set_settings,
 };
 
+static int virtnet_set_features(struct net_device *netdev,
+				netdev_features_t features)
+{
+	struct virtnet_info *vi = netdev_priv(netdev);
+	struct virtio_device *vdev = vi->vdev;
+	struct scatterlist sg;
+	u64 offloads = 0;
+
+	if (features & NETIF_F_LRO)
+		offloads |= (1 << VIRTIO_NET_F_GUEST_TSO4) |
+			    (1 << VIRTIO_NET_F_GUEST_TSO6);
+
+	if (features & NETIF_F_RXCSUM)
+		offloads |= (1 << VIRTIO_NET_F_GUEST_CSUM);
+
+	if (virtio_has_feature(vdev, VIRTIO_NET_F_CTRL_GUEST_OFFLOADS)) {
+		sg_init_one(&sg, &offloads, sizeof(uint64_t));
+		if (!virtnet_send_command(vi,
+					  VIRTIO_NET_CTRL_GUEST_OFFLOADS,
+					  VIRTIO_NET_CTRL_GUEST_OFFLOADS_SET,
+					  &sg)) {
+			dev_warn(&netdev->dev,
+				 "Failed to set guest offloads by virtnet command.\n");
+			return -EINVAL;
+		}
+	}
+
+	return 0;
+}
+
 static const struct net_device_ops virtnet_netdev = {
 	.ndo_open            = virtnet_open,
 	.ndo_stop   	     = virtnet_close,
@@ -1435,6 +1465,7 @@ static void virtnet_init_settings(struct net_device *dev)
 #ifdef CONFIG_NET_RX_BUSY_POLL
 	.ndo_busy_poll		= virtnet_busy_poll,
 #endif
+	.ndo_set_features	= virtnet_set_features,
 };
 
 static void virtnet_config_changed_work(struct work_struct *work)
@@ -1815,6 +1846,12 @@ static int virtnet_probe(struct virtio_device *vdev)
 	if (virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_CSUM))
 		dev->features |= NETIF_F_RXCSUM;
 
+	if (virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_TSO4) &&
+	    virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_TSO6)) {
+		dev->features |= NETIF_F_LRO;
+		dev->hw_features |= NETIF_F_LRO;
+	}
+
 	dev->vlan_features = dev->features;
 
 	/* MTU range: 68 - 65535 */
@@ -2057,7 +2094,8 @@ static int virtnet_restore(struct virtio_device *vdev)
 	VIRTIO_NET_F_CTRL_RX, VIRTIO_NET_F_CTRL_VLAN, \
 	VIRTIO_NET_F_GUEST_ANNOUNCE, VIRTIO_NET_F_MQ, \
 	VIRTIO_NET_F_CTRL_MAC_ADDR, \
-	VIRTIO_NET_F_MTU
+	VIRTIO_NET_F_MTU, \
+	VIRTIO_NET_F_CTRL_GUEST_OFFLOADS
 
 static unsigned int features[] = {
 	VIRTNET_FEATURES,


* [net-next PATCH v5 2/6] net: xdp: add invalid buffer warning
  2016-12-07 20:10 [net-next PATCH v5 0/6] XDP for virtio_net John Fastabend
  2016-12-07 20:11 ` [net-next PATCH v5 1/6] net: virtio dynamically disable/enable LRO John Fastabend
@ 2016-12-07 20:11 ` John Fastabend
  2016-12-07 20:11 ` [net-next PATCH v5 3/6] virtio_net: Add XDP support John Fastabend
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 35+ messages in thread
From: John Fastabend @ 2016-12-07 20:11 UTC (permalink / raw)
  To: daniel, mst, shm, davem, tgraf, alexei.starovoitov
  Cc: john.r.fastabend, netdev, john.fastabend, brouer

This adds a warning for drivers to use when encountering an invalid
buffer for XDP. In normal cases this should not happen, but having a
standard warning is useful to catch unexpected behavior from the
emulation layer in virtual/qemu setups.

Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
---
 include/linux/filter.h |    1 +
 net/core/filter.c      |    6 ++++++
 2 files changed, 7 insertions(+)

diff --git a/include/linux/filter.h b/include/linux/filter.h
index f078d2b..860f953 100644
--- a/include/linux/filter.h
+++ b/include/linux/filter.h
@@ -600,6 +600,7 @@ int sk_get_filter(struct sock *sk, struct sock_filter __user *filter,
 struct bpf_prog *bpf_patch_insn_single(struct bpf_prog *prog, u32 off,
 				       const struct bpf_insn *patch, u32 len);
 void bpf_warn_invalid_xdp_action(u32 act);
+void bpf_warn_invalid_xdp_buffer(void);
 
 #ifdef CONFIG_BPF_JIT
 extern int bpf_jit_enable;
diff --git a/net/core/filter.c b/net/core/filter.c
index b751202..849f23d 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -2948,6 +2948,12 @@ void bpf_warn_invalid_xdp_action(u32 act)
 }
 EXPORT_SYMBOL_GPL(bpf_warn_invalid_xdp_action);
 
+void bpf_warn_invalid_xdp_buffer(void)
+{
+	WARN_ONCE(1, "Illegal XDP buffer encountered, expect throughput degradation\n");
+}
+EXPORT_SYMBOL_GPL(bpf_warn_invalid_xdp_buffer);
+
 static u32 sk_filter_convert_ctx_access(enum bpf_access_type type, int dst_reg,
 					int src_reg, int ctx_off,
 					struct bpf_insn *insn_buf,


* [net-next PATCH v5 3/6] virtio_net: Add XDP support
  2016-12-07 20:10 [net-next PATCH v5 0/6] XDP for virtio_net John Fastabend
  2016-12-07 20:11 ` [net-next PATCH v5 1/6] net: virtio dynamically disable/enable LRO John Fastabend
  2016-12-07 20:11 ` [net-next PATCH v5 2/6] net: xdp: add invalid buffer warning John Fastabend
@ 2016-12-07 20:11 ` John Fastabend
  2016-12-08  4:48   ` Michael S. Tsirkin
  2016-12-07 20:12 ` [net-next PATCH v5 4/6] virtio_net: add dedicated XDP transmit queues John Fastabend
                   ` (3 subsequent siblings)
  6 siblings, 1 reply; 35+ messages in thread
From: John Fastabend @ 2016-12-07 20:11 UTC (permalink / raw)
  To: daniel, mst, shm, davem, tgraf, alexei.starovoitov
  Cc: john.r.fastabend, netdev, john.fastabend, brouer

From: John Fastabend <john.fastabend@gmail.com>

This adds XDP support to virtio_net. Some requirements must be
met for XDP to be enabled, depending on the mode. First, it will
only be supported with LRO disabled so that data is not pushed
across multiple buffers. Second, the MTU must be less than a page
size to avoid having to handle XDP across multiple pages.

If mergeable receive is enabled this patch only supports the case
where header and data are in the same buffer, which we can check
when a packet is received by looking at num_buf. If num_buf is
greater than 1 and an XDP program is loaded, the packet is dropped
and a warning is thrown. When any_header_sg is set this does not
happen and both header and data are put in a single buffer as
expected, so we check this when XDP programs are loaded. Subsequent
patches will process the packet in a degraded mode to ensure
connectivity and correctness are not lost even if the backend
pushes packets into multiple buffers.

If big packets mode is enabled and MTU/LRO conditions above are
met then XDP is allowed.

This patch was tested with qemu with vhost=on and vhost=off where
mergeable and big_packet modes were forced via hard coding feature
negotiation. Multiple buffers per packet were forced via a small
test patch to vhost.c in the vhost=on qemu mode.

Suggested-by: Shrijeet Mukherjee <shrijeet@gmail.com>
Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
---
 drivers/net/virtio_net.c |  175 +++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 170 insertions(+), 5 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index a5c47b1..a009299 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -22,6 +22,7 @@
 #include <linux/module.h>
 #include <linux/virtio.h>
 #include <linux/virtio_net.h>
+#include <linux/bpf.h>
 #include <linux/scatterlist.h>
 #include <linux/if_vlan.h>
 #include <linux/slab.h>
@@ -81,6 +82,8 @@ struct receive_queue {
 
 	struct napi_struct napi;
 
+	struct bpf_prog __rcu *xdp_prog;
+
 	/* Chain pages by the private ptr. */
 	struct page *pages;
 
@@ -324,6 +327,38 @@ static struct sk_buff *page_to_skb(struct virtnet_info *vi,
 	return skb;
 }
 
+static u32 do_xdp_prog(struct virtnet_info *vi,
+		       struct bpf_prog *xdp_prog,
+		       struct page *page, int offset, int len)
+{
+	int hdr_padded_len;
+	struct xdp_buff xdp;
+	u32 act;
+	u8 *buf;
+
+	buf = page_address(page) + offset;
+
+	if (vi->mergeable_rx_bufs)
+		hdr_padded_len = sizeof(struct virtio_net_hdr_mrg_rxbuf);
+	else
+		hdr_padded_len = sizeof(struct padded_vnet_hdr);
+
+	xdp.data = buf + hdr_padded_len;
+	xdp.data_end = xdp.data + (len - vi->hdr_len);
+
+	act = bpf_prog_run_xdp(xdp_prog, &xdp);
+	switch (act) {
+	case XDP_PASS:
+		return XDP_PASS;
+	default:
+		bpf_warn_invalid_xdp_action(act);
+	case XDP_TX:
+	case XDP_ABORTED:
+	case XDP_DROP:
+		return XDP_DROP;
+	}
+}
+
 static struct sk_buff *receive_small(struct virtnet_info *vi, void *buf, unsigned int len)
 {
 	struct sk_buff * skb = buf;
@@ -340,14 +375,32 @@ static struct sk_buff *receive_big(struct net_device *dev,
 				   void *buf,
 				   unsigned int len)
 {
+	struct bpf_prog *xdp_prog;
 	struct page *page = buf;
-	struct sk_buff *skb = page_to_skb(vi, rq, page, 0, len, PAGE_SIZE);
+	struct sk_buff *skb;
 
+	rcu_read_lock();
+	xdp_prog = rcu_dereference(rq->xdp_prog);
+	if (xdp_prog) {
+		struct virtio_net_hdr_mrg_rxbuf *hdr = buf;
+		u32 act;
+
+		if (unlikely(hdr->hdr.gso_type || hdr->hdr.flags))
+			goto err_xdp;
+		act = do_xdp_prog(vi, xdp_prog, page, 0, len);
+		if (act == XDP_DROP)
+			goto err_xdp;
+	}
+	rcu_read_unlock();
+
+	skb = page_to_skb(vi, rq, page, 0, len, PAGE_SIZE);
 	if (unlikely(!skb))
 		goto err;
 
 	return skb;
 
+err_xdp:
+	rcu_read_unlock();
 err:
 	dev->stats.rx_dropped++;
 	give_pages(rq, page);
@@ -365,11 +418,42 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
 	u16 num_buf = virtio16_to_cpu(vi->vdev, hdr->num_buffers);
 	struct page *page = virt_to_head_page(buf);
 	int offset = buf - page_address(page);
-	unsigned int truesize = max(len, mergeable_ctx_to_buf_truesize(ctx));
+	struct sk_buff *head_skb, *curr_skb;
+	struct bpf_prog *xdp_prog;
+	unsigned int truesize;
+
+	rcu_read_lock();
+	xdp_prog = rcu_dereference(rq->xdp_prog);
+	if (xdp_prog) {
+		u32 act;
+
+		/* No known backend devices should send packets with
+		 * more than a single buffer when XDP conditions are
+		 * met. However it is not strictly illegal so the case
+		 * is handled as an exception and a warning is thrown.
+		 */
+		if (unlikely(num_buf > 1)) {
+			bpf_warn_invalid_xdp_buffer();
+			goto err_xdp;
+		}
 
-	struct sk_buff *head_skb = page_to_skb(vi, rq, page, offset, len,
-					       truesize);
-	struct sk_buff *curr_skb = head_skb;
+		/* Transient failure which in theory could occur if
+		 * in-flight packets from before XDP was enabled reach
+		 * the receive path after XDP is loaded. In practice I
+		 * was not able to create this condition.
+		 */
+		if (unlikely(hdr->hdr.gso_type || hdr->hdr.flags))
+			goto err_xdp;
+
+		act = do_xdp_prog(vi, xdp_prog, page, offset, len);
+		if (act == XDP_DROP)
+			goto err_xdp;
+	}
+	rcu_read_unlock();
+
+	truesize = max(len, mergeable_ctx_to_buf_truesize(ctx));
+	head_skb = page_to_skb(vi, rq, page, offset, len, truesize);
+	curr_skb = head_skb;
 
 	if (unlikely(!curr_skb))
 		goto err_skb;
@@ -423,6 +507,8 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
 	ewma_pkt_len_add(&rq->mrg_avg_pkt_len, head_skb->len);
 	return head_skb;
 
+err_xdp:
+	rcu_read_unlock();
 err_skb:
 	put_page(page);
 	while (--num_buf) {
@@ -1328,6 +1414,13 @@ static int virtnet_set_channels(struct net_device *dev,
 	if (queue_pairs > vi->max_queue_pairs || queue_pairs == 0)
 		return -EINVAL;
 
+	/* For now we don't support modifying channels while XDP is loaded
+	 * also when XDP is loaded all RX queues have XDP programs so we only
+	 * need to check a single RX queue.
+	 */
+	if (vi->rq[0].xdp_prog)
+		return -EINVAL;
+
 	get_online_cpus();
 	err = virtnet_set_queues(vi, queue_pairs);
 	if (!err) {
@@ -1449,6 +1542,69 @@ static int virtnet_set_features(struct net_device *netdev,
 	return 0;
 }
 
+static int virtnet_xdp_set(struct net_device *dev, struct bpf_prog *prog)
+{
+	unsigned long int max_sz = PAGE_SIZE - sizeof(struct padded_vnet_hdr);
+	struct virtnet_info *vi = netdev_priv(dev);
+	struct bpf_prog *old_prog;
+	int i;
+
+	if ((dev->features & NETIF_F_LRO) && prog) {
+		netdev_warn(dev, "can't set XDP while LRO is on, disable LRO first\n");
+		return -EINVAL;
+	}
+
+	if (vi->mergeable_rx_bufs && !vi->any_header_sg) {
+		netdev_warn(dev, "XDP expects header/data in single page\n");
+		return -EINVAL;
+	}
+
+	if (dev->mtu > max_sz) {
+		netdev_warn(dev, "XDP requires MTU less than %lu\n", max_sz);
+		return -EINVAL;
+	}
+
+	if (prog) {
+		prog = bpf_prog_add(prog, vi->max_queue_pairs - 1);
+		if (IS_ERR(prog))
+			return PTR_ERR(prog);
+	}
+
+	for (i = 0; i < vi->max_queue_pairs; i++) {
+		old_prog = rtnl_dereference(vi->rq[i].xdp_prog);
+		rcu_assign_pointer(vi->rq[i].xdp_prog, prog);
+		if (old_prog)
+			bpf_prog_put(old_prog);
+	}
+
+	return 0;
+}
+
+static bool virtnet_xdp_query(struct net_device *dev)
+{
+	struct virtnet_info *vi = netdev_priv(dev);
+	int i;
+
+	for (i = 0; i < vi->max_queue_pairs; i++) {
+		if (vi->rq[i].xdp_prog)
+			return true;
+	}
+	return false;
+}
+
+static int virtnet_xdp(struct net_device *dev, struct netdev_xdp *xdp)
+{
+	switch (xdp->command) {
+	case XDP_SETUP_PROG:
+		return virtnet_xdp_set(dev, xdp->prog);
+	case XDP_QUERY_PROG:
+		xdp->prog_attached = virtnet_xdp_query(dev);
+		return 0;
+	default:
+		return -EINVAL;
+	}
+}
+
 static const struct net_device_ops virtnet_netdev = {
 	.ndo_open            = virtnet_open,
 	.ndo_stop   	     = virtnet_close,
@@ -1466,6 +1622,7 @@ static int virtnet_set_features(struct net_device *netdev,
 	.ndo_busy_poll		= virtnet_busy_poll,
 #endif
 	.ndo_set_features	= virtnet_set_features,
+	.ndo_xdp		= virtnet_xdp,
 };
 
 static void virtnet_config_changed_work(struct work_struct *work)
@@ -1527,12 +1684,20 @@ static void virtnet_free_queues(struct virtnet_info *vi)
 
 static void free_receive_bufs(struct virtnet_info *vi)
 {
+	struct bpf_prog *old_prog;
 	int i;
 
+	rtnl_lock();
 	for (i = 0; i < vi->max_queue_pairs; i++) {
 		while (vi->rq[i].pages)
 			__free_pages(get_a_page(&vi->rq[i], GFP_KERNEL), 0);
+
+		old_prog = rtnl_dereference(vi->rq[i].xdp_prog);
+		RCU_INIT_POINTER(vi->rq[i].xdp_prog, NULL);
+		if (old_prog)
+			bpf_prog_put(old_prog);
 	}
+	rtnl_unlock();
 }
 
 static void free_receive_page_frags(struct virtnet_info *vi)


* [net-next PATCH v5 4/6] virtio_net: add dedicated XDP transmit queues
  2016-12-07 20:10 [net-next PATCH v5 0/6] XDP for virtio_net John Fastabend
                   ` (2 preceding siblings ...)
  2016-12-07 20:11 ` [net-next PATCH v5 3/6] virtio_net: Add XDP support John Fastabend
@ 2016-12-07 20:12 ` John Fastabend
  2016-12-08  5:59   ` Michael S. Tsirkin
  2016-12-07 20:12 ` [net-next PATCH v5 5/6] virtio_net: add XDP_TX support John Fastabend
                   ` (2 subsequent siblings)
  6 siblings, 1 reply; 35+ messages in thread
From: John Fastabend @ 2016-12-07 20:12 UTC (permalink / raw)
  To: daniel, mst, shm, davem, tgraf, alexei.starovoitov
  Cc: john.r.fastabend, netdev, john.fastabend, brouer

XDP requires using isolated transmit queues to avoid interference
with the normal networking stack (BQL, NETDEV_TX_BUSY, etc). This
patch adds an XDP queue per CPU when an XDP program is loaded and
does not expose the queues to the OS via the normal API call to
netif_set_real_num_tx_queues(). This way the stack will never push
an skb to these queues.

However, the virtio/vhost/qemu implementation only allows creating
TX/RX queue pairs at this time, so creating only TX queues was not
possible. And because the associated RX queues are being created
anyway, I went ahead and exposed them to the stack and let the
backend use them. This creates more RX queues visible to the
network stack than TX queues, which is worth mentioning but does
not cause any issues as far as I can tell.

Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
---
 drivers/net/virtio_net.c |   30 ++++++++++++++++++++++++++++--
 1 file changed, 28 insertions(+), 2 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index a009299..28b1196 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -114,6 +114,9 @@ struct virtnet_info {
 	/* # of queue pairs currently used by the driver */
 	u16 curr_queue_pairs;
 
+	/* # of XDP queue pairs currently used by the driver */
+	u16 xdp_queue_pairs;
+
 	/* I like... big packets and I cannot lie! */
 	bool big_packets;
 
@@ -1547,7 +1550,8 @@ static int virtnet_xdp_set(struct net_device *dev, struct bpf_prog *prog)
 	unsigned long int max_sz = PAGE_SIZE - sizeof(struct padded_vnet_hdr);
 	struct virtnet_info *vi = netdev_priv(dev);
 	struct bpf_prog *old_prog;
-	int i;
+	u16 xdp_qp = 0, curr_qp;
+	int i, err;
 
 	if ((dev->features & NETIF_F_LRO) && prog) {
 		netdev_warn(dev, "can't set XDP while LRO is on, disable LRO first\n");
@@ -1564,12 +1568,34 @@ static int virtnet_xdp_set(struct net_device *dev, struct bpf_prog *prog)
 		return -EINVAL;
 	}
 
+	curr_qp = vi->curr_queue_pairs - vi->xdp_queue_pairs;
+	if (prog)
+		xdp_qp = nr_cpu_ids;
+
+	/* XDP requires extra queues for XDP_TX */
+	if (curr_qp + xdp_qp > vi->max_queue_pairs) {
+		netdev_warn(dev, "request %i queues but max is %i\n",
+			    curr_qp + xdp_qp, vi->max_queue_pairs);
+		return -ENOMEM;
+	}
+
+	err = virtnet_set_queues(vi, curr_qp + xdp_qp);
+	if (err) {
+		dev_warn(&dev->dev, "XDP Device queue allocation failure.\n");
+		return err;
+	}
+
 	if (prog) {
 		prog = bpf_prog_add(prog, vi->max_queue_pairs - 1);
-		if (IS_ERR(prog))
+		if (IS_ERR(prog)) {
+			virtnet_set_queues(vi, curr_qp);
 			return PTR_ERR(prog);
+		}
 	}
 
+	vi->xdp_queue_pairs = xdp_qp;
+	netif_set_real_num_rx_queues(dev, curr_qp + xdp_qp);
+
 	for (i = 0; i < vi->max_queue_pairs; i++) {
 		old_prog = rtnl_dereference(vi->rq[i].xdp_prog);
 		rcu_assign_pointer(vi->rq[i].xdp_prog, prog);


* [net-next PATCH v5 5/6] virtio_net: add XDP_TX support
  2016-12-07 20:10 [net-next PATCH v5 0/6] XDP for virtio_net John Fastabend
                   ` (3 preceding siblings ...)
  2016-12-07 20:12 ` [net-next PATCH v5 4/6] virtio_net: add dedicated XDP transmit queues John Fastabend
@ 2016-12-07 20:12 ` John Fastabend
  2016-12-08  6:11   ` Michael S. Tsirkin
  2016-12-07 20:13 ` [net-next PATCH v5 6/6] virtio_net: xdp, add slowpath case for non contiguous buffers John Fastabend
  2016-12-08 19:17 ` [net-next PATCH v5 0/6] XDP for virtio_net David Miller
  6 siblings, 1 reply; 35+ messages in thread
From: John Fastabend @ 2016-12-07 20:12 UTC (permalink / raw)
  To: daniel, mst, shm, davem, tgraf, alexei.starovoitov
  Cc: john.r.fastabend, netdev, john.fastabend, brouer

This adds support for the XDP_TX action to virtio_net. When an XDP
program is run and returns the XDP_TX action the virtio_net XDP
implementation will transmit the packet on a TX queue that aligns
with the current CPU that the XDP packet was processed on.

Before sending the packet the header is zeroed. Also, XDP is
expected to handle checksums correctly, so no checksum offload
support is provided.

Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
---
 drivers/net/virtio_net.c |   99 +++++++++++++++++++++++++++++++++++++++++++---
 1 file changed, 92 insertions(+), 7 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 28b1196..8e5b13c 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -330,12 +330,57 @@ static struct sk_buff *page_to_skb(struct virtnet_info *vi,
 	return skb;
 }
 
+static void virtnet_xdp_xmit(struct virtnet_info *vi,
+			     struct receive_queue *rq,
+			     struct send_queue *sq,
+			     struct xdp_buff *xdp)
+{
+	struct page *page = virt_to_head_page(xdp->data);
+	struct virtio_net_hdr_mrg_rxbuf *hdr;
+	unsigned int num_sg, len;
+	void *xdp_sent;
+	int err;
+
+	/* Free up any pending old buffers before queueing new ones. */
+	while ((xdp_sent = virtqueue_get_buf(sq->vq, &len)) != NULL) {
+		struct page *sent_page = virt_to_head_page(xdp_sent);
+
+		if (vi->mergeable_rx_bufs)
+			put_page(sent_page);
+		else
+			give_pages(rq, sent_page);
+	}
+
+	/* Zero header and leave csum up to XDP layers */
+	hdr = xdp->data;
+	memset(hdr, 0, vi->hdr_len);
+
+	num_sg = 1;
+	sg_init_one(sq->sg, xdp->data, xdp->data_end - xdp->data);
+	err = virtqueue_add_outbuf(sq->vq, sq->sg, num_sg,
+				   xdp->data, GFP_ATOMIC);
+	if (unlikely(err)) {
+		if (vi->mergeable_rx_bufs)
+			put_page(page);
+		else
+			give_pages(rq, page);
+	} else if (!vi->mergeable_rx_bufs) {
+		/* If not mergeable bufs must be big packets so cleanup pages */
+		give_pages(rq, (struct page *)page->private);
+		page->private = 0;
+	}
+
+	virtqueue_kick(sq->vq);
+}
+
 static u32 do_xdp_prog(struct virtnet_info *vi,
+		       struct receive_queue *rq,
 		       struct bpf_prog *xdp_prog,
 		       struct page *page, int offset, int len)
 {
 	int hdr_padded_len;
 	struct xdp_buff xdp;
+	unsigned int qp;
 	u32 act;
 	u8 *buf;
 
@@ -353,9 +398,15 @@ static u32 do_xdp_prog(struct virtnet_info *vi,
 	switch (act) {
 	case XDP_PASS:
 		return XDP_PASS;
+	case XDP_TX:
+		qp = vi->curr_queue_pairs -
+			vi->xdp_queue_pairs +
+			smp_processor_id();
+		xdp.data = buf + (vi->mergeable_rx_bufs ? 0 : 4);
+		virtnet_xdp_xmit(vi, rq, &vi->sq[qp], &xdp);
+		return XDP_TX;
 	default:
 		bpf_warn_invalid_xdp_action(act);
-	case XDP_TX:
 	case XDP_ABORTED:
 	case XDP_DROP:
 		return XDP_DROP;
@@ -390,9 +441,17 @@ static struct sk_buff *receive_big(struct net_device *dev,
 
 		if (unlikely(hdr->hdr.gso_type || hdr->hdr.flags))
 			goto err_xdp;
-		act = do_xdp_prog(vi, xdp_prog, page, 0, len);
-		if (act == XDP_DROP)
+		act = do_xdp_prog(vi, rq, xdp_prog, page, 0, len);
+		switch (act) {
+		case XDP_PASS:
+			break;
+		case XDP_TX:
+			rcu_read_unlock();
+			goto xdp_xmit;
+		case XDP_DROP:
+		default:
 			goto err_xdp;
+		}
 	}
 	rcu_read_unlock();
 
@@ -407,6 +466,7 @@ static struct sk_buff *receive_big(struct net_device *dev,
 err:
 	dev->stats.rx_dropped++;
 	give_pages(rq, page);
+xdp_xmit:
 	return NULL;
 }
 
@@ -425,6 +485,8 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
 	struct bpf_prog *xdp_prog;
 	unsigned int truesize;
 
+	head_skb = NULL;
+
 	rcu_read_lock();
 	xdp_prog = rcu_dereference(rq->xdp_prog);
 	if (xdp_prog) {
@@ -448,9 +510,17 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
 		if (unlikely(hdr->hdr.gso_type || hdr->hdr.flags))
 			goto err_xdp;
 
-		act = do_xdp_prog(vi, xdp_prog, page, offset, len);
-		if (act == XDP_DROP)
+		act = do_xdp_prog(vi, rq, xdp_prog, page, offset, len);
+		switch (act) {
+		case XDP_PASS:
+			break;
+		case XDP_TX:
+			rcu_read_unlock();
+			goto xdp_xmit;
+		case XDP_DROP:
+		default:
 			goto err_xdp;
+		}
 	}
 	rcu_read_unlock();
 
@@ -528,6 +598,7 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
 err_buf:
 	dev->stats.rx_dropped++;
 	dev_kfree_skb(head_skb);
+xdp_xmit:
 	return NULL;
 }
 
@@ -1734,6 +1805,16 @@ static void free_receive_page_frags(struct virtnet_info *vi)
 			put_page(vi->rq[i].alloc_frag.page);
 }
 
+static bool is_xdp_queue(struct virtnet_info *vi, int q)
+{
+	if (q < (vi->curr_queue_pairs - vi->xdp_queue_pairs))
+		return false;
+	else if (q < vi->curr_queue_pairs)
+		return true;
+	else
+		return false;
+}
+
 static void free_unused_bufs(struct virtnet_info *vi)
 {
 	void *buf;
@@ -1741,8 +1822,12 @@ static void free_unused_bufs(struct virtnet_info *vi)
 
 	for (i = 0; i < vi->max_queue_pairs; i++) {
 		struct virtqueue *vq = vi->sq[i].vq;
-		while ((buf = virtqueue_detach_unused_buf(vq)) != NULL)
-			dev_kfree_skb(buf);
+		while ((buf = virtqueue_detach_unused_buf(vq)) != NULL) {
+			if (!is_xdp_queue(vi, i))
+				dev_kfree_skb(buf);
+			else
+				put_page(virt_to_head_page(buf));
+		}
 	}
 
 	for (i = 0; i < vi->max_queue_pairs; i++) {


* [net-next PATCH v5 6/6] virtio_net: xdp, add slowpath case for non contiguous buffers
  2016-12-07 20:10 [net-next PATCH v5 0/6] XDP for virtio_net John Fastabend
                   ` (4 preceding siblings ...)
  2016-12-07 20:12 ` [net-next PATCH v5 5/6] virtio_net: add XDP_TX support John Fastabend
@ 2016-12-07 20:13 ` John Fastabend
  2016-12-08 19:17 ` [net-next PATCH v5 0/6] XDP for virtio_net David Miller
  6 siblings, 0 replies; 35+ messages in thread
From: John Fastabend @ 2016-12-07 20:13 UTC (permalink / raw)
  To: daniel, mst, shm, davem, tgraf, alexei.starovoitov
  Cc: john.r.fastabend, netdev, john.fastabend, brouer

virtio_net XDP support expects receive buffers to be contiguous.
If this is not the case we enable a slowpath to allow connectivity
to continue, but at a significant performance overhead associated
with linearizing the data. To make users painfully aware that XDP
is running in a degraded mode we throw an xdp buffer error.

To linearize packets we allocate a page and copy the segments of
the data, including the header, into it. After this the page can be
handled by XDP code flow as normal.

Then depending on the return code the page is either freed or sent
to the XDP xmit path. There is no attempt to optimize this path.

This case is handled simply as a precaution in case some unknown
backend were to generate packets in this form. To test it I had to
hack qemu and force it to generate these packets. I do not expect
this case to be generated by "real" backends.

Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
---
 drivers/net/virtio_net.c |   75 +++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 74 insertions(+), 1 deletion(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 8e5b13c..20c81bc 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -470,6 +470,64 @@ static struct sk_buff *receive_big(struct net_device *dev,
 	return NULL;
 }
 
+/* The conditions to enable XDP should preclude the underlying device from
+ * sending packets across multiple buffers (num_buf > 1). However per spec
+ * it does not appear to be illegal to do so but rather just against convention.
+ * So in order to avoid making a system unresponsive the packets are pushed
+ * into a page and the XDP program is run. This will be extremely slow and we
+ * push a warning to the user to fix this as soon as possible. Fixing this may
+ * require resolving the underlying hardware to determine why multiple buffers
+ * are being received or simply loading the XDP program in the ingress stack
+ * after the skb is built because there is no advantage to running it here
+ * anymore.
+ */
+static struct page *xdp_linearize_page(struct receive_queue *rq,
+				       u16 num_buf,
+				       struct page *p,
+				       int offset,
+				       unsigned int *len)
+{
+	struct page *page = alloc_page(GFP_ATOMIC);
+	unsigned int page_off = 0;
+
+	if (!page)
+		return NULL;
+
+	memcpy(page_address(page) + page_off, page_address(p) + offset, *len);
+	page_off += *len;
+
+	while (--num_buf) {
+		unsigned int buflen;
+		unsigned long ctx;
+		void *buf;
+		int off;
+
+		ctx = (unsigned long)virtqueue_get_buf(rq->vq, &buflen);
+		if (unlikely(!ctx))
+			goto err_buf;
+
+		/* guard against a misconfigured or uncooperative backend that
+		 * is sending packet larger than the MTU.
+		 */
+		if ((page_off + buflen) > PAGE_SIZE)
+			goto err_buf;
+
+		buf = mergeable_ctx_to_buf_address(ctx);
+		p = virt_to_head_page(buf);
+		off = buf - page_address(p);
+
+		memcpy(page_address(page) + page_off,
+		       page_address(p) + off, buflen);
+		page_off += buflen;
+	}
+
+	*len = page_off;
+	return page;
+err_buf:
+	__free_pages(page, 0);
+	return NULL;
+}
+
 static struct sk_buff *receive_mergeable(struct net_device *dev,
 					 struct virtnet_info *vi,
 					 struct receive_queue *rq,
@@ -490,6 +548,7 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
 	rcu_read_lock();
 	xdp_prog = rcu_dereference(rq->xdp_prog);
 	if (xdp_prog) {
+		struct page *xdp_page;
 		u32 act;
 
 		/* No known backend devices should send packets with
@@ -499,7 +558,15 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
 		 */
 		if (unlikely(num_buf > 1)) {
 			bpf_warn_invalid_xdp_buffer();
-			goto err_xdp;
+
+			/* linearize data for XDP */
+			xdp_page = xdp_linearize_page(rq, num_buf,
+						      page, offset, &len);
+			if (!xdp_page)
+				goto err_xdp;
+			offset = 0;
+		} else {
+			xdp_page = page;
 		}
 
 		/* Transient failure which in theory could occur if
@@ -513,12 +580,18 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
 		act = do_xdp_prog(vi, rq, xdp_prog, page, offset, len);
 		switch (act) {
 		case XDP_PASS:
+			if (unlikely(xdp_page != page))
+				__free_pages(xdp_page, 0);
 			break;
 		case XDP_TX:
+			if (unlikely(xdp_page != page))
+				goto err_xdp;
 			rcu_read_unlock();
 			goto xdp_xmit;
 		case XDP_DROP:
 		default:
+			if (unlikely(xdp_page != page))
+				__free_pages(xdp_page, 0);
 			goto err_xdp;
 		}
 	}

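The copy-and-bounds logic in xdp_linearize_page() above can be modeled as a small standalone userspace sketch. The helper name and fragment sizes below are hypothetical; plain malloc() stands in for alloc_page() and no virtqueue is involved:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define PAGE_SIZE 4096

/* Hypothetical stand-in for xdp_linearize_page(): copy num_frags
 * fragments into one freshly allocated PAGE_SIZE buffer. Returns the
 * buffer and writes the total length, or NULL if the fragments would
 * overflow the page, mirroring the err_buf path in the driver.
 */
static char *linearize(const char **frags, const size_t *lens,
		       int num_frags, size_t *total)
{
	char *page = malloc(PAGE_SIZE);
	size_t off = 0;
	int i;

	if (!page)
		return NULL;

	for (i = 0; i < num_frags; i++) {
		/* guard against fragments pushing past the page */
		if (off + lens[i] > PAGE_SIZE) {
			free(page);
			return NULL;
		}
		memcpy(page + off, frags[i], lens[i]);
		off += lens[i];
	}
	*total = off;
	return page;
}
```

As in the driver, the whole page is given up as soon as one fragment would push the running offset past PAGE_SIZE.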
^ permalink raw reply related	[flat|nested] 35+ messages in thread

* Re: [net-next PATCH v5 3/6] virtio_net: Add XDP support
  2016-12-07 20:11 ` [net-next PATCH v5 3/6] virtio_net: Add XDP support John Fastabend
@ 2016-12-08  4:48   ` Michael S. Tsirkin
  2016-12-08  5:14     ` John Fastabend
  0 siblings, 1 reply; 35+ messages in thread
From: Michael S. Tsirkin @ 2016-12-08  4:48 UTC (permalink / raw)
  To: John Fastabend
  Cc: daniel, shm, davem, tgraf, alexei.starovoitov, john.r.fastabend,
	netdev, brouer

On Wed, Dec 07, 2016 at 12:11:57PM -0800, John Fastabend wrote:
> From: John Fastabend <john.fastabend@gmail.com>
> 
> This adds XDP support to virtio_net. Some requirements must be
> met for XDP to be enabled depending on the mode. First it will
> only be supported with LRO disabled so that data is not pushed
> across multiple buffers. Second the MTU must be less than a page
> size to avoid having to handle XDP across multiple pages.
> 
> If mergeable receive is enabled this patch only supports the case
> where header and data are in the same buf which we can check when
> a packet is received by looking at num_buf. If num_buf is
> greater than 1 and an XDP program is loaded the packet is dropped
> and a warning is thrown. When any_header_sg is set this does not
> happen and both header and data are put in a single buffer as expected
> so we check this when XDP programs are loaded.  Subsequent patches
> will process the packet in a degraded mode to ensure connectivity
> and correctness is not lost even if backend pushes packets into
> multiple buffers.
> 
> If big packets mode is enabled and MTU/LRO conditions above are
> met then XDP is allowed.
> 
> This patch was tested with qemu with vhost=on and vhost=off where
> mergeable and big_packet modes were forced via hard coding feature
> negotiation. Multiple buffers per packet were forced via a small
> test patch to vhost.c in the vhost=on qemu mode.
> 
> Suggested-by: Shrijeet Mukherjee <shrijeet@gmail.com>
> Signed-off-by: John Fastabend <john.r.fastabend@intel.com>

I'd like to note that I don't think disabling LRO is a good
plan long-term. It's really important for virtio performance,
so IMHO we need a fix for that.
I'm guessing that a subset of XDP programs would be quite
happy with just looking at headers, and that is there in the 1st buffer.
So how about teaching XDP that there could be a truncated packet?

Then we won't have to disable LRO.


> ---
>  drivers/net/virtio_net.c |  175 +++++++++++++++++++++++++++++++++++++++++++++-
>  1 file changed, 170 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index a5c47b1..a009299 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -22,6 +22,7 @@
>  #include <linux/module.h>
>  #include <linux/virtio.h>
>  #include <linux/virtio_net.h>
> +#include <linux/bpf.h>
>  #include <linux/scatterlist.h>
>  #include <linux/if_vlan.h>
>  #include <linux/slab.h>
> @@ -81,6 +82,8 @@ struct receive_queue {
>  
>  	struct napi_struct napi;
>  
> +	struct bpf_prog __rcu *xdp_prog;
> +
>  	/* Chain pages by the private ptr. */
>  	struct page *pages;
>  
> @@ -324,6 +327,38 @@ static struct sk_buff *page_to_skb(struct virtnet_info *vi,
>  	return skb;
>  }
>  
> +static u32 do_xdp_prog(struct virtnet_info *vi,
> +		       struct bpf_prog *xdp_prog,
> +		       struct page *page, int offset, int len)
> +{
> +	int hdr_padded_len;
> +	struct xdp_buff xdp;
> +	u32 act;
> +	u8 *buf;
> +
> +	buf = page_address(page) + offset;
> +
> +	if (vi->mergeable_rx_bufs)
> +		hdr_padded_len = sizeof(struct virtio_net_hdr_mrg_rxbuf);
> +	else
> +		hdr_padded_len = sizeof(struct padded_vnet_hdr);
> +
> +	xdp.data = buf + hdr_padded_len;
> +	xdp.data_end = xdp.data + (len - vi->hdr_len);
> +
> +	act = bpf_prog_run_xdp(xdp_prog, &xdp);
> +	switch (act) {
> +	case XDP_PASS:
> +		return XDP_PASS;
> +	default:
> +		bpf_warn_invalid_xdp_action(act);
> +	case XDP_TX:
> +	case XDP_ABORTED:
> +	case XDP_DROP:
> +		return XDP_DROP;
> +	}
> +}
> +
>  static struct sk_buff *receive_small(struct virtnet_info *vi, void *buf, unsigned int len)
>  {
>  	struct sk_buff * skb = buf;
> @@ -340,14 +375,32 @@ static struct sk_buff *receive_big(struct net_device *dev,
>  				   void *buf,
>  				   unsigned int len)
>  {
> +	struct bpf_prog *xdp_prog;
>  	struct page *page = buf;
> -	struct sk_buff *skb = page_to_skb(vi, rq, page, 0, len, PAGE_SIZE);
> +	struct sk_buff *skb;
>  
> +	rcu_read_lock();
> +	xdp_prog = rcu_dereference(rq->xdp_prog);
> +	if (xdp_prog) {
> +		struct virtio_net_hdr_mrg_rxbuf *hdr = buf;
> +		u32 act;
> +
> +		if (unlikely(hdr->hdr.gso_type || hdr->hdr.flags))
> +			goto err_xdp;
> +		act = do_xdp_prog(vi, xdp_prog, page, 0, len);
> +		if (act == XDP_DROP)
> +			goto err_xdp;
> +	}
> +	rcu_read_unlock();
> +
> +	skb = page_to_skb(vi, rq, page, 0, len, PAGE_SIZE);
>  	if (unlikely(!skb))
>  		goto err;
>  
>  	return skb;
>  
> +err_xdp:
> +	rcu_read_unlock();
>  err:
>  	dev->stats.rx_dropped++;
>  	give_pages(rq, page);
> @@ -365,11 +418,42 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
>  	u16 num_buf = virtio16_to_cpu(vi->vdev, hdr->num_buffers);
>  	struct page *page = virt_to_head_page(buf);
>  	int offset = buf - page_address(page);
> -	unsigned int truesize = max(len, mergeable_ctx_to_buf_truesize(ctx));
> +	struct sk_buff *head_skb, *curr_skb;
> +	struct bpf_prog *xdp_prog;
> +	unsigned int truesize;
> +
> +	rcu_read_lock();
> +	xdp_prog = rcu_dereference(rq->xdp_prog);
> +	if (xdp_prog) {
> +		u32 act;
> +
> +		/* No known backend devices should send packets with
> +		 * more than a single buffer when XDP conditions are
> +		 * met. However it is not strictly illegal so the case
> +		 * is handled as an exception and a warning is thrown.
> +		 */
> +		if (unlikely(num_buf > 1)) {
> +			bpf_warn_invalid_xdp_buffer();
> +			goto err_xdp;
> +		}
>  
> -	struct sk_buff *head_skb = page_to_skb(vi, rq, page, offset, len,
> -					       truesize);
> -	struct sk_buff *curr_skb = head_skb;
> +		/* Transient failure which in theory could occur if
> +		 * in-flight packets from before XDP was enabled reach
> +		 * the receive path after XDP is loaded. In practice I
> +		 * was not able to create this condition.
> +		 */
> +		if (unlikely(hdr->hdr.gso_type || hdr->hdr.flags))
> +			goto err_xdp;
> +
> +		act = do_xdp_prog(vi, xdp_prog, page, offset, len);
> +		if (act == XDP_DROP)
> +			goto err_xdp;
> +	}
> +	rcu_read_unlock();
> +
> +	truesize = max(len, mergeable_ctx_to_buf_truesize(ctx));
> +	head_skb = page_to_skb(vi, rq, page, offset, len, truesize);
> +	curr_skb = head_skb;
>  
>  	if (unlikely(!curr_skb))
>  		goto err_skb;
> @@ -423,6 +507,8 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
>  	ewma_pkt_len_add(&rq->mrg_avg_pkt_len, head_skb->len);
>  	return head_skb;
>  
> +err_xdp:
> +	rcu_read_unlock();
>  err_skb:
>  	put_page(page);
>  	while (--num_buf) {
> @@ -1328,6 +1414,13 @@ static int virtnet_set_channels(struct net_device *dev,
>  	if (queue_pairs > vi->max_queue_pairs || queue_pairs == 0)
>  		return -EINVAL;
>  
> +	/* For now we don't support modifying channels while XDP is loaded
> +	 * also when XDP is loaded all RX queues have XDP programs so we only
> +	 * need to check a single RX queue.
> +	 */
> +	if (vi->rq[0].xdp_prog)
> +		return -EINVAL;
> +
>  	get_online_cpus();
>  	err = virtnet_set_queues(vi, queue_pairs);
>  	if (!err) {
> @@ -1449,6 +1542,69 @@ static int virtnet_set_features(struct net_device *netdev,
>  	return 0;
>  }
>  
> +static int virtnet_xdp_set(struct net_device *dev, struct bpf_prog *prog)
> +{
> +	unsigned long int max_sz = PAGE_SIZE - sizeof(struct padded_vnet_hdr);
> +	struct virtnet_info *vi = netdev_priv(dev);
> +	struct bpf_prog *old_prog;
> +	int i;
> +
> +	if ((dev->features & NETIF_F_LRO) && prog) {
> +		netdev_warn(dev, "can't set XDP while LRO is on, disable LRO first\n");
> +		return -EINVAL;
> +	}
> +
> +	if (vi->mergeable_rx_bufs && !vi->any_header_sg) {
> +		netdev_warn(dev, "XDP expects header/data in single page\n");
> +		return -EINVAL;
> +	}
> +
> +	if (dev->mtu > max_sz) {
> +		netdev_warn(dev, "XDP requires MTU less than %lu\n", max_sz);
> +		return -EINVAL;
> +	}
> +
> +	if (prog) {
> +		prog = bpf_prog_add(prog, vi->max_queue_pairs - 1);
> +		if (IS_ERR(prog))
> +			return PTR_ERR(prog);
> +	}
> +
> +	for (i = 0; i < vi->max_queue_pairs; i++) {
> +		old_prog = rtnl_dereference(vi->rq[i].xdp_prog);
> +		rcu_assign_pointer(vi->rq[i].xdp_prog, prog);
> +		if (old_prog)
> +			bpf_prog_put(old_prog);
> +	}
> +
> +	return 0;
> +}
> +
> +static bool virtnet_xdp_query(struct net_device *dev)
> +{
> +	struct virtnet_info *vi = netdev_priv(dev);
> +	int i;
> +
> +	for (i = 0; i < vi->max_queue_pairs; i++) {
> +		if (vi->rq[i].xdp_prog)
> +			return true;
> +	}
> +	return false;
> +}
> +
> +static int virtnet_xdp(struct net_device *dev, struct netdev_xdp *xdp)
> +{
> +	switch (xdp->command) {
> +	case XDP_SETUP_PROG:
> +		return virtnet_xdp_set(dev, xdp->prog);
> +	case XDP_QUERY_PROG:
> +		xdp->prog_attached = virtnet_xdp_query(dev);
> +		return 0;
> +	default:
> +		return -EINVAL;
> +	}
> +}
> +
>  static const struct net_device_ops virtnet_netdev = {
>  	.ndo_open            = virtnet_open,
>  	.ndo_stop   	     = virtnet_close,
> @@ -1466,6 +1622,7 @@ static int virtnet_set_features(struct net_device *netdev,
>  	.ndo_busy_poll		= virtnet_busy_poll,
>  #endif
>  	.ndo_set_features	= virtnet_set_features,
> +	.ndo_xdp		= virtnet_xdp,
>  };
>  
>  static void virtnet_config_changed_work(struct work_struct *work)
> @@ -1527,12 +1684,20 @@ static void virtnet_free_queues(struct virtnet_info *vi)
>  
>  static void free_receive_bufs(struct virtnet_info *vi)
>  {
> +	struct bpf_prog *old_prog;
>  	int i;
>  
> +	rtnl_lock();
>  	for (i = 0; i < vi->max_queue_pairs; i++) {
>  		while (vi->rq[i].pages)
>  			__free_pages(get_a_page(&vi->rq[i], GFP_KERNEL), 0);
> +
> +		old_prog = rtnl_dereference(vi->rq[i].xdp_prog);
> +		RCU_INIT_POINTER(vi->rq[i].xdp_prog, NULL);
> +		if (old_prog)
> +			bpf_prog_put(old_prog);
>  	}
> +	rtnl_unlock();
>  }
>  
>  static void free_receive_page_frags(struct virtnet_info *vi)

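The return-value mapping in do_xdp_prog() above (only XDP_PASS lets a packet continue; XDP_TX, XDP_ABORTED, XDP_DROP and, after a warning, any unrecognized action all become a drop) can be sketched standalone. The helper name is hypothetical; the enum values mirror the XDP uapi of this period:

```c
#include <assert.h>
#include <stdio.h>

/* Action values as in uapi/linux/bpf.h at the time of this series. */
enum xdp_action {
	XDP_ABORTED = 0,
	XDP_DROP,
	XDP_PASS,
	XDP_TX,
};

/* Hypothetical model of do_xdp_prog()'s dispatch: only XDP_PASS lets
 * the packet continue; unrecognized values are warned about and, like
 * XDP_TX/XDP_ABORTED/XDP_DROP in this patch, fall through to a drop.
 */
static unsigned int xdp_act_to_verdict(unsigned int act)
{
	switch (act) {
	case XDP_PASS:
		return XDP_PASS;
	default:
		fprintf(stderr, "invalid xdp action %u\n", act);
		/* fall through */
	case XDP_TX:
	case XDP_ABORTED:
	case XDP_DROP:
		return XDP_DROP;
	}
}
```

Treating XDP_TX as a drop is specific to this patch; the XDP_TX patch later in the series changes that arm.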

* Re: [net-next PATCH v5 3/6] virtio_net: Add XDP support
  2016-12-08  4:48   ` Michael S. Tsirkin
@ 2016-12-08  5:14     ` John Fastabend
  2016-12-08  5:54       ` Michael S. Tsirkin
  0 siblings, 1 reply; 35+ messages in thread
From: John Fastabend @ 2016-12-08  5:14 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: daniel, shm, davem, tgraf, alexei.starovoitov, john.r.fastabend,
	netdev, brouer

On 16-12-07 08:48 PM, Michael S. Tsirkin wrote:
> On Wed, Dec 07, 2016 at 12:11:57PM -0800, John Fastabend wrote:
> 
> I'd like to note that I don't think disabling LRO is a good
> plan long-term. It's really important for virtio performance,
> so IMHO we need a fix for that.
> I'm guessing that a subset of XDP programs would be quite
> happy with just looking at headers, and that is there in the 1st buffer.
> So how about teaching XDP that there could be a truncated packet?
> 
> Then we won't have to disable LRO.
> 

Agreed, long-term we can drop this requirement. This type of improvement
would also allow working with jumbo frames on NICs.

I don't think it should block this patch series though.

.John


* Re: [net-next PATCH v5 3/6] virtio_net: Add XDP support
  2016-12-08  5:14     ` John Fastabend
@ 2016-12-08  5:54       ` Michael S. Tsirkin
  0 siblings, 0 replies; 35+ messages in thread
From: Michael S. Tsirkin @ 2016-12-08  5:54 UTC (permalink / raw)
  To: John Fastabend
  Cc: daniel, shm, davem, tgraf, alexei.starovoitov, john.r.fastabend,
	netdev, brouer

On Wed, Dec 07, 2016 at 09:14:48PM -0800, John Fastabend wrote:
> On 16-12-07 08:48 PM, Michael S. Tsirkin wrote:
> > On Wed, Dec 07, 2016 at 12:11:57PM -0800, John Fastabend wrote:
> > 
> > I'd like to note that I don't think disabling LRO is a good
> > plan long-term. It's really important for virtio performance,
> > so IMHO we need a fix for that.
> > I'm guessing that a subset of XDP programs would be quite
> > happy with just looking at headers, and that is there in the 1st buffer.
> > So how about teaching XDP that there could be a truncated packet?
> > 
> > Then we won't have to disable LRO.
> > 
> 
> Agreed, long-term we can drop this requirement. This type of improvement
> would also allow working with jumbo frames on NICs.
> 
> I don't think it should block this patch series though.
> 
> .John

Right.


* Re: [net-next PATCH v5 4/6] virtio_net: add dedicated XDP transmit queues
  2016-12-07 20:12 ` [net-next PATCH v5 4/6] virtio_net: add dedicated XDP transmit queues John Fastabend
@ 2016-12-08  5:59   ` Michael S. Tsirkin
  2016-12-08 17:10     ` John Fastabend
  0 siblings, 1 reply; 35+ messages in thread
From: Michael S. Tsirkin @ 2016-12-08  5:59 UTC (permalink / raw)
  To: John Fastabend
  Cc: daniel, shm, davem, tgraf, alexei.starovoitov, john.r.fastabend,
	netdev, brouer

On Wed, Dec 07, 2016 at 12:12:23PM -0800, John Fastabend wrote:
> XDP requires using isolated transmit queues to avoid interference
> with normal networking stack (BQL, NETDEV_TX_BUSY, etc).
> This patch
> adds an XDP queue per cpu when an XDP program is loaded and does not
> expose the queues to the OS via the normal API call to
> netif_set_real_num_tx_queues(). This way the stack will never push
> an skb to these queues.
> 
> However virtio/vhost/qemu implementation only allows for creating
> TX/RX queue pairs at this time so creating only TX queues was not
> possible. And because the associated RX queues are being created I
> went ahead and exposed these to the stack and let the backend use
> them. This creates more RX queues visible to the network stack than
> TX queues which is worth mentioning but does not cause any issues as
> far as I can tell.
> 
> Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
> ---
>  drivers/net/virtio_net.c |   30 ++++++++++++++++++++++++++++--
>  1 file changed, 28 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index a009299..28b1196 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -114,6 +114,9 @@ struct virtnet_info {
>  	/* # of queue pairs currently used by the driver */
>  	u16 curr_queue_pairs;
>  
> +	/* # of XDP queue pairs currently used by the driver */
> +	u16 xdp_queue_pairs;
> +
>  	/* I like... big packets and I cannot lie! */
>  	bool big_packets;
>  
> @@ -1547,7 +1550,8 @@ static int virtnet_xdp_set(struct net_device *dev, struct bpf_prog *prog)
>  	unsigned long int max_sz = PAGE_SIZE - sizeof(struct padded_vnet_hdr);
>  	struct virtnet_info *vi = netdev_priv(dev);
>  	struct bpf_prog *old_prog;
> -	int i;
> +	u16 xdp_qp = 0, curr_qp;
> +	int i, err;
>  
>  	if ((dev->features & NETIF_F_LRO) && prog) {
>  		netdev_warn(dev, "can't set XDP while LRO is on, disable LRO first\n");
> @@ -1564,12 +1568,34 @@ static int virtnet_xdp_set(struct net_device *dev, struct bpf_prog *prog)
>  		return -EINVAL;
>  	}
>  
> +	curr_qp = vi->curr_queue_pairs - vi->xdp_queue_pairs;
> +	if (prog)
> +		xdp_qp = nr_cpu_ids;
> +
> +	/* XDP requires extra queues for XDP_TX */
> +	if (curr_qp + xdp_qp > vi->max_queue_pairs) {
> +		netdev_warn(dev, "request %i queues but max is %i\n",
> +			    curr_qp + xdp_qp, vi->max_queue_pairs);
> +		return -ENOMEM;
> +	}

Can't we disable XDP_TX somehow? Many people might only want RX drop,
and extra queues are not always there.


> +
> +	err = virtnet_set_queues(vi, curr_qp + xdp_qp);
> +	if (err) {
> +		dev_warn(&dev->dev, "XDP Device queue allocation failure.\n");
> +		return err;
> +	}
> +
>  	if (prog) {
>  		prog = bpf_prog_add(prog, vi->max_queue_pairs - 1);
> -		if (IS_ERR(prog))
> +		if (IS_ERR(prog)) {
> +			virtnet_set_queues(vi, curr_qp);
>  			return PTR_ERR(prog);
> +		}
>  	}
>  
> +	vi->xdp_queue_pairs = xdp_qp;
> +	netif_set_real_num_rx_queues(dev, curr_qp + xdp_qp);
> +
>  	for (i = 0; i < vi->max_queue_pairs; i++) {
>  		old_prog = rtnl_dereference(vi->rq[i].xdp_prog);
>  		rcu_assign_pointer(vi->rq[i].xdp_prog, prog);

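The queue accounting this patch adds to virtnet_xdp_set() (reserve one dedicated XDP_TX queue per possible CPU, refuse when the device cannot supply them) can be sketched as a standalone helper. The name is hypothetical, and nr_cpu_ids plus the queue counts are plain parameters here:

```c
#include <assert.h>

/* Hypothetical model of the check in virtnet_xdp_set(): returns the
 * total queue pairs to request from the device, or -1 when the device
 * cannot supply one dedicated XDP TX queue per possible CPU,
 * mirroring the -ENOMEM case in the patch.
 */
static int xdp_queues_needed(int curr_queue_pairs, int xdp_queue_pairs,
			     int nr_cpu_ids, int max_queue_pairs,
			     int prog_loaded)
{
	/* queues currently used by the stack, net of any XDP queues */
	int curr_qp = curr_queue_pairs - xdp_queue_pairs;
	int xdp_qp = prog_loaded ? nr_cpu_ids : 0;

	if (curr_qp + xdp_qp > max_queue_pairs)
		return -1;
	return curr_qp + xdp_qp;
}
```

For example, 4 stack queue pairs plus 4 CPUs fit a device maximum of 8, while 8 CPUs would not; unloading the program drops the request back to the stack's own queues.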

* Re: [net-next PATCH v5 5/6] virtio_net: add XDP_TX support
  2016-12-07 20:12 ` [net-next PATCH v5 5/6] virtio_net: add XDP_TX support John Fastabend
@ 2016-12-08  6:11   ` Michael S. Tsirkin
  2016-12-08 18:18     ` John Fastabend
  0 siblings, 1 reply; 35+ messages in thread
From: Michael S. Tsirkin @ 2016-12-08  6:11 UTC (permalink / raw)
  To: John Fastabend
  Cc: daniel, shm, davem, tgraf, alexei.starovoitov, john.r.fastabend,
	netdev, brouer

On Wed, Dec 07, 2016 at 12:12:45PM -0800, John Fastabend wrote:
> This adds support for the XDP_TX action to virtio_net. When an XDP
> program is run and returns the XDP_TX action the virtio_net XDP
> implementation will transmit the packet on a TX queue that aligns
> with the current CPU that the XDP packet was processed on.
> 
> Before sending the packet the header is zeroed.  Also XDP is expected
> to handle checksums correctly so no checksum offload support is
> provided.
> 
> Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
> ---
>  drivers/net/virtio_net.c |   99 +++++++++++++++++++++++++++++++++++++++++++---
>  1 file changed, 92 insertions(+), 7 deletions(-)
> 
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index 28b1196..8e5b13c 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -330,12 +330,57 @@ static struct sk_buff *page_to_skb(struct virtnet_info *vi,
>  	return skb;
>  }
>  
> +static void virtnet_xdp_xmit(struct virtnet_info *vi,
> +			     struct receive_queue *rq,
> +			     struct send_queue *sq,
> +			     struct xdp_buff *xdp)
> +{
> +	struct page *page = virt_to_head_page(xdp->data);
> +	struct virtio_net_hdr_mrg_rxbuf *hdr;
> +	unsigned int num_sg, len;
> +	void *xdp_sent;
> +	int err;
> +
> +	/* Free up any pending old buffers before queueing new ones. */
> +	while ((xdp_sent = virtqueue_get_buf(sq->vq, &len)) != NULL) {
> +		struct page *sent_page = virt_to_head_page(xdp_sent);
> +
> +		if (vi->mergeable_rx_bufs)
> +			put_page(sent_page);
> +		else
> +			give_pages(rq, sent_page);
> +	}

Looks like this is the only place where you do virtqueue_get_buf.
No interrupt handler?
This means that if you fill up the queue, nothing will clean it
and things will get stuck.
Can this be the issue you saw?


> +
> +	/* Zero header and leave csum up to XDP layers */
> +	hdr = xdp->data;
> +	memset(hdr, 0, vi->hdr_len);
> +
> +	num_sg = 1;
> +	sg_init_one(sq->sg, xdp->data, xdp->data_end - xdp->data);
> +	err = virtqueue_add_outbuf(sq->vq, sq->sg, num_sg,
> +				   xdp->data, GFP_ATOMIC);
> +	if (unlikely(err)) {
> +		if (vi->mergeable_rx_bufs)
> +			put_page(page);
> +		else
> +			give_pages(rq, page);
> +	} else if (!vi->mergeable_rx_bufs) {
> +		/* If not mergeable bufs must be big packets so cleanup pages */
> +		give_pages(rq, (struct page *)page->private);
> +		page->private = 0;
> +	}
> +
> +	virtqueue_kick(sq->vq);

Is this unconditional kick a work-around for hang
we could not figure out yet?
I guess this helps because it just slows down the guest.
I don't much like it ...

> +}
> +
>  static u32 do_xdp_prog(struct virtnet_info *vi,
> +		       struct receive_queue *rq,
>  		       struct bpf_prog *xdp_prog,
>  		       struct page *page, int offset, int len)
>  {
>  	int hdr_padded_len;
>  	struct xdp_buff xdp;
> +	unsigned int qp;
>  	u32 act;
>  	u8 *buf;
>  
> @@ -353,9 +398,15 @@ static u32 do_xdp_prog(struct virtnet_info *vi,
>  	switch (act) {
>  	case XDP_PASS:
>  		return XDP_PASS;
> +	case XDP_TX:
> +		qp = vi->curr_queue_pairs -
> +			vi->xdp_queue_pairs +
> +			smp_processor_id();
> +		xdp.data = buf + (vi->mergeable_rx_bufs ? 0 : 4);
> +		virtnet_xdp_xmit(vi, rq, &vi->sq[qp], &xdp);
> +		return XDP_TX;
>  	default:
>  		bpf_warn_invalid_xdp_action(act);
> -	case XDP_TX:
>  	case XDP_ABORTED:
>  	case XDP_DROP:
>  		return XDP_DROP;
> @@ -390,9 +441,17 @@ static struct sk_buff *receive_big(struct net_device *dev,
>  
>  		if (unlikely(hdr->hdr.gso_type || hdr->hdr.flags))
>  			goto err_xdp;
> -		act = do_xdp_prog(vi, xdp_prog, page, 0, len);
> -		if (act == XDP_DROP)
> +		act = do_xdp_prog(vi, rq, xdp_prog, page, 0, len);
> +		switch (act) {
> +		case XDP_PASS:
> +			break;
> +		case XDP_TX:
> +			rcu_read_unlock();
> +			goto xdp_xmit;
> +		case XDP_DROP:
> +		default:
>  			goto err_xdp;
> +		}
>  	}
>  	rcu_read_unlock();
>  
> @@ -407,6 +466,7 @@ static struct sk_buff *receive_big(struct net_device *dev,
>  err:
>  	dev->stats.rx_dropped++;
>  	give_pages(rq, page);
> +xdp_xmit:
>  	return NULL;
>  }
>  
> @@ -425,6 +485,8 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
>  	struct bpf_prog *xdp_prog;
>  	unsigned int truesize;
>  
> +	head_skb = NULL;
> +
>  	rcu_read_lock();
>  	xdp_prog = rcu_dereference(rq->xdp_prog);
>  	if (xdp_prog) {
> @@ -448,9 +510,17 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
>  		if (unlikely(hdr->hdr.gso_type || hdr->hdr.flags))
>  			goto err_xdp;
>  
> -		act = do_xdp_prog(vi, xdp_prog, page, offset, len);
> -		if (act == XDP_DROP)
> +		act = do_xdp_prog(vi, rq, xdp_prog, page, offset, len);
> +		switch (act) {
> +		case XDP_PASS:
> +			break;
> +		case XDP_TX:
> +			rcu_read_unlock();
> +			goto xdp_xmit;
> +		case XDP_DROP:
> +		default:
>  			goto err_xdp;
> +		}
>  	}
>  	rcu_read_unlock();
>  
> @@ -528,6 +598,7 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
>  err_buf:
>  	dev->stats.rx_dropped++;
>  	dev_kfree_skb(head_skb);
> +xdp_xmit:
>  	return NULL;
>  }
>  
> @@ -1734,6 +1805,16 @@ static void free_receive_page_frags(struct virtnet_info *vi)
>  			put_page(vi->rq[i].alloc_frag.page);
>  }
>  
> +static bool is_xdp_queue(struct virtnet_info *vi, int q)
> +{
> +	if (q < (vi->curr_queue_pairs - vi->xdp_queue_pairs))
> +		return false;
> +	else if (q < vi->curr_queue_pairs)
> +		return true;
> +	else
> +		return false;
> +}
> +
>  static void free_unused_bufs(struct virtnet_info *vi)
>  {
>  	void *buf;
> @@ -1741,8 +1822,12 @@ static void free_unused_bufs(struct virtnet_info *vi)
>  
>  	for (i = 0; i < vi->max_queue_pairs; i++) {
>  		struct virtqueue *vq = vi->sq[i].vq;
> -		while ((buf = virtqueue_detach_unused_buf(vq)) != NULL)
> -			dev_kfree_skb(buf);
> +		while ((buf = virtqueue_detach_unused_buf(vq)) != NULL) {
> +			if (!is_xdp_queue(vi, i))
> +				dev_kfree_skb(buf);
> +			else
> +				put_page(virt_to_head_page(buf));
> +		}
>  	}
>  
>  	for (i = 0; i < vi->max_queue_pairs; i++) {

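Two pieces of queue arithmetic from this patch, the per-CPU XDP_TX queue selection in do_xdp_prog() and the is_xdp_queue() range test used during cleanup, can be modeled as standalone helpers with the queue counts passed as plain parameters:

```c
#include <assert.h>

/* XDP TX queues occupy the tail of the currently used queue pairs,
 * i.e. indices [curr - xdp, curr). Standalone re-statement of
 * is_xdp_queue() from the patch.
 */
static int is_xdp_queue(int curr_queue_pairs, int xdp_queue_pairs, int q)
{
	return q >= curr_queue_pairs - xdp_queue_pairs &&
	       q < curr_queue_pairs;
}

/* Per-CPU XDP_TX queue index as computed in do_xdp_prog(), so every
 * CPU lands on its own queue in that tail range.
 */
static int xdp_tx_queue(int curr_queue_pairs, int xdp_queue_pairs, int cpu)
{
	return curr_queue_pairs - xdp_queue_pairs + cpu;
}
```

With 8 queue pairs in use and 4 of them dedicated to XDP, queues 4-7 are the XDP ones, and CPU n transmits on queue 4 + n.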

* Re: [net-next PATCH v5 4/6] virtio_net: add dedicated XDP transmit queues
  2016-12-08  5:59   ` Michael S. Tsirkin
@ 2016-12-08 17:10     ` John Fastabend
  0 siblings, 0 replies; 35+ messages in thread
From: John Fastabend @ 2016-12-08 17:10 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: daniel, shm, davem, tgraf, alexei.starovoitov, john.r.fastabend,
	netdev, brouer

On 16-12-07 09:59 PM, Michael S. Tsirkin wrote:
> On Wed, Dec 07, 2016 at 12:12:23PM -0800, John Fastabend wrote:
>> XDP requires using isolated transmit queues to avoid interference
>> with normal networking stack (BQL, NETDEV_TX_BUSY, etc).
>> This patch
>> adds an XDP queue per cpu when an XDP program is loaded and does not
>> expose the queues to the OS via the normal API call to
>> netif_set_real_num_tx_queues(). This way the stack will never push
>> an skb to these queues.
>>
>> However virtio/vhost/qemu implementation only allows for creating
>> TX/RX queue pairs at this time so creating only TX queues was not
>> possible. And because the associated RX queues are being created I
>> went ahead and exposed these to the stack and let the backend use
>> them. This creates more RX queues visible to the network stack than
>> TX queues which is worth mentioning but does not cause any issues as
>> far as I can tell.
>>
>> Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
>> ---
>>  drivers/net/virtio_net.c |   30 ++++++++++++++++++++++++++++--
>>  1 file changed, 28 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
>> index a009299..28b1196 100644
>> --- a/drivers/net/virtio_net.c
>> +++ b/drivers/net/virtio_net.c
>> @@ -114,6 +114,9 @@ struct virtnet_info {
>>  	/* # of queue pairs currently used by the driver */
>>  	u16 curr_queue_pairs;
>>  
>> +	/* # of XDP queue pairs currently used by the driver */
>> +	u16 xdp_queue_pairs;
>> +
>>  	/* I like... big packets and I cannot lie! */
>>  	bool big_packets;
>>  
>> @@ -1547,7 +1550,8 @@ static int virtnet_xdp_set(struct net_device *dev, struct bpf_prog *prog)
>>  	unsigned long int max_sz = PAGE_SIZE - sizeof(struct padded_vnet_hdr);
>>  	struct virtnet_info *vi = netdev_priv(dev);
>>  	struct bpf_prog *old_prog;
>> -	int i;
>> +	u16 xdp_qp = 0, curr_qp;
>> +	int i, err;
>>  
>>  	if ((dev->features & NETIF_F_LRO) && prog) {
>>  		netdev_warn(dev, "can't set XDP while LRO is on, disable LRO first\n");
>> @@ -1564,12 +1568,34 @@ static int virtnet_xdp_set(struct net_device *dev, struct bpf_prog *prog)
>>  		return -EINVAL;
>>  	}
>>  
>> +	curr_qp = vi->curr_queue_pairs - vi->xdp_queue_pairs;
>> +	if (prog)
>> +		xdp_qp = nr_cpu_ids;
>> +
>> +	/* XDP requires extra queues for XDP_TX */
>> +	if (curr_qp + xdp_qp > vi->max_queue_pairs) {
>> +		netdev_warn(dev, "request %i queues but max is %i\n",
>> +			    curr_qp + xdp_qp, vi->max_queue_pairs);
>> +		return -ENOMEM;
>> +	}
> 
> Can't we disable XDP_TX somehow? Many people might only want RX drop,
> and extra queues are not always there.
> 

Alexei, Daniel, any thoughts on this?

I know we were trying to claim some base level of feature support for
all XDP drivers. I am sympathetic to this argument though; for DDoS we
do not need XDP_TX support. And virtio can become queue-constrained
in some cases.

But, I do not want to silently degrade to RX mode and trying to check
this through the verifier appears challenging. And I'm not thrilled
about more knobs :/ Maybe an escape to force RX mode in sysfs or at
program load time would be OK?

I think this is an improvement that can go on my list along with LRO.

.John


* Re: [net-next PATCH v5 5/6] virtio_net: add XDP_TX support
  2016-12-08  6:11   ` Michael S. Tsirkin
@ 2016-12-08 18:18     ` John Fastabend
  2016-12-08 21:08       ` Michael S. Tsirkin
  2016-12-08 21:18       ` Michael S. Tsirkin
  0 siblings, 2 replies; 35+ messages in thread
From: John Fastabend @ 2016-12-08 18:18 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: daniel, shm, davem, tgraf, alexei.starovoitov, john.r.fastabend,
	netdev, brouer

On 16-12-07 10:11 PM, Michael S. Tsirkin wrote:
> On Wed, Dec 07, 2016 at 12:12:45PM -0800, John Fastabend wrote:
>> This adds support for the XDP_TX action to virtio_net. When an XDP
>> program is run and returns the XDP_TX action the virtio_net XDP
>> implementation will transmit the packet on a TX queue that aligns
>> with the current CPU that the XDP packet was processed on.
>>
>> Before sending the packet the header is zeroed.  Also XDP is expected
>> to handle checksum correctly so no checksum offload  support is
>> provided.
>>
>> Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
>> ---
>>  drivers/net/virtio_net.c |   99 +++++++++++++++++++++++++++++++++++++++++++---
>>  1 file changed, 92 insertions(+), 7 deletions(-)
>>
>> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
>> index 28b1196..8e5b13c 100644
>> --- a/drivers/net/virtio_net.c
>> +++ b/drivers/net/virtio_net.c
>> @@ -330,12 +330,57 @@ static struct sk_buff *page_to_skb(struct virtnet_info *vi,
>>  	return skb;
>>  }
>>  
>> +static void virtnet_xdp_xmit(struct virtnet_info *vi,
>> +			     struct receive_queue *rq,
>> +			     struct send_queue *sq,
>> +			     struct xdp_buff *xdp)
>> +{
>> +	struct page *page = virt_to_head_page(xdp->data);
>> +	struct virtio_net_hdr_mrg_rxbuf *hdr;
>> +	unsigned int num_sg, len;
>> +	void *xdp_sent;
>> +	int err;
>> +
>> +	/* Free up any pending old buffers before queueing new ones. */
>> +	while ((xdp_sent = virtqueue_get_buf(sq->vq, &len)) != NULL) {
>> +		struct page *sent_page = virt_to_head_page(xdp_sent);
>> +
>> +		if (vi->mergeable_rx_bufs)
>> +			put_page(sent_page);
>> +		else
>> +			give_pages(rq, sent_page);
>> +	}
> 
> Looks like this is the only place where you do virtqueue_get_buf.
> No interrupt handler?
> This means that if you fill up the queue, nothing will clean it
> and things will get stuck.

hmm OK, so callbacks should be implemented to do this, with a pair of
virtqueue_enable_cb_prepare()/virtqueue_disable_cb() calls used to
enable and disable the callbacks while packets are enqueued.

Also, can the same condition happen in the normal xmit path via
start_xmit()? It looks like free_old_xmit_skbs, for example, is only
called when a packet is sent, so could we end up holding on to skbs in
that case? I don't see free_old_xmit_skbs being called from any
callbacks.
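The concern can be modeled with a toy ring where completions are only reaped from the transmit path itself, as in the quoted virtnet_xdp_xmit(): a full ring then stays full until the next transmit attempt comes along to reap it. All names and sizes below are illustrative:

```c
#include <assert.h>

/* Toy model of "free up any pending old buffers before queueing new
 * ones": completions are only reaped from the xmit path itself, so the
 * ring can only make progress while new transmits keep arriving. */
#define RING_SIZE 4

struct ring_sim {
	int in_flight;   /* buffers the "device" has not completed yet */
	int completed;   /* buffers the device finished, awaiting reclaim */
};

/* device side: mark one in-flight buffer as done */
static void device_complete(struct ring_sim *r)
{
	if (r->in_flight > 0) {
		r->in_flight--;
		r->completed++;
	}
}

/* driver side: reap completions, then try to enqueue one buffer.
 * Returns 0 on success, -1 when the ring is full. */
static int xdp_xmit_sim(struct ring_sim *r)
{
	while (r->completed > 0)   /* the virtqueue_get_buf() loop */
		r->completed--;
	if (r->in_flight >= RING_SIZE)
		return -1;         /* nothing else will ever reclaim it */
	r->in_flight++;
	return 0;
}
```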

> Can this be the issue you saw?

nope, see below - I was mishandling the big_packets page cleanup path
in the error case.

> 
> 
>> +
>> +	/* Zero header and leave csum up to XDP layers */
>> +	hdr = xdp->data;
>> +	memset(hdr, 0, vi->hdr_len);
>> +
>> +	num_sg = 1;
>> +	sg_init_one(sq->sg, xdp->data, xdp->data_end - xdp->data);
>> +	err = virtqueue_add_outbuf(sq->vq, sq->sg, num_sg,
>> +				   xdp->data, GFP_ATOMIC);
>> +	if (unlikely(err)) {
>> +		if (vi->mergeable_rx_bufs)
>> +			put_page(page);
>> +		else
>> +			give_pages(rq, page);
>> +	} else if (!vi->mergeable_rx_bufs) {
>> +		/* If not mergeable bufs must be big packets so cleanup pages */
>> +		give_pages(rq, (struct page *)page->private);
>> +		page->private = 0;
>> +	}
>> +
>> +	virtqueue_kick(sq->vq);
> 
> Is this unconditional kick a work-around for hang
> we could not figure out yet?

I tracked the original issue down to how I handled the big_packet page
cleanups.

> I guess this helps because it just slows down the guest.
> I don't much like it ...

I left it like this to copy the pattern in the balloon and input
drivers. I can change it back to the previous pattern where it is only
called if there are no errors. It has been running fine with the old
pattern for an hour or so now.

.John

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [net-next PATCH v5 0/6] XDP for virtio_net
  2016-12-07 20:10 [net-next PATCH v5 0/6] XDP for virtio_net John Fastabend
                   ` (5 preceding siblings ...)
  2016-12-07 20:13 ` [net-next PATCH v5 6/6] virtio_net: xdp, add slowpath case for non contiguous buffers John Fastabend
@ 2016-12-08 19:17 ` David Miller
  2016-12-08 19:38   ` Alexei Starovoitov
  2016-12-08 21:16   ` [net-next PATCH v5 0/6] XDP for virtio_net Michael S. Tsirkin
  6 siblings, 2 replies; 35+ messages in thread
From: David Miller @ 2016-12-08 19:17 UTC (permalink / raw)
  To: john.fastabend
  Cc: daniel, mst, shm, tgraf, alexei.starovoitov, john.r.fastabend,
	netdev, brouer

From: John Fastabend <john.fastabend@gmail.com>
Date: Wed, 07 Dec 2016 12:10:47 -0800

> This implements virtio_net for the mergeable buffers and big_packet
> modes. I tested this with vhost_net running on qemu and did not see
> any issues. For testing num_buf > 1 I added a hack to vhost driver
> to only put 100 bytes per buffer.
 ...

So where are we with this?

I'm not too thrilled with the idea of making XDP_TX optional or
something like that.  If someone enables XDP, there is a tradeoff.

I also have reservations about the idea to make jumbo frames work
without giving XDP access to the whole packet.  If it wants to push or
pop a header, it might need to know the whole packet length.  How will
you pass that to the XDP program?

Some kinds of encapsulation require trailers, thus precluding access
to the entire packet precludes those kinds of transformations.

This is why we want simple, linear, buffer access for XDP.

Even the most seemingly minor exception turns into a huge complicated
mess.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [net-next PATCH v5 0/6] XDP for virtio_net
  2016-12-08 19:17 ` [net-next PATCH v5 0/6] XDP for virtio_net David Miller
@ 2016-12-08 19:38   ` Alexei Starovoitov
  2016-12-08 20:46     ` John Fastabend
  2016-12-13  8:46     ` XDP_DROP and XDP_TX (Was: Re: [net-next PATCH v5 0/6] XDP for virtio_net) Jesper Dangaard Brouer
  2016-12-08 21:16   ` [net-next PATCH v5 0/6] XDP for virtio_net Michael S. Tsirkin
  1 sibling, 2 replies; 35+ messages in thread
From: Alexei Starovoitov @ 2016-12-08 19:38 UTC (permalink / raw)
  To: David Miller
  Cc: john.fastabend, daniel, mst, shm, tgraf, john.r.fastabend,
	netdev, brouer

On Thu, Dec 08, 2016 at 02:17:02PM -0500, David Miller wrote:
> From: John Fastabend <john.fastabend@gmail.com>
> Date: Wed, 07 Dec 2016 12:10:47 -0800
> 
> > This implements virtio_net for the mergeable buffers and big_packet
> > modes. I tested this with vhost_net running on qemu and did not see
> > any issues. For testing num_buf > 1 I added a hack to vhost driver
> > to only put 100 bytes per buffer.
>  ...
> 
> So where are we with this?
> 
> I'm not too thrilled with the idea of making XDP_TX optional or
> something like that.  If someone enables XDP, there is a tradeoff.
> 
> I also have reservations about the idea to make jumbo frames work
> without giving XDP access to the whole packet.  If it wants to push or
> pop a header, it might need to know the whole packet length.  How will
> you pass that to the XDP program?
> 
> Some kinds of encapsulation require trailers, thus precluding access
> to the entire packet precludes those kinds of transformations.

+1

> This is why we want simple, linear, buffer access for XDP.
> 
> Even the most seemingly minor exception turns into a huge complicated
> mess.

+1

and from the other thread:
> > Can't we disable XDP_TX somehow? Many people might only want RX drop,
> > and extra queues are not always there.
> >
> 
> Alexei, Daniel, any thoughts on this?

I don't like it.

> I know we were trying to claim some base level of feature support for
> all XDP drivers. I am sympathetic to this argument though for DDOS we
> do not need XDP_TX support. And virtio can become queue constrained
> in some cases.

especially for ddos case doing lro/gro is not helpful.
I frankly don't see a use case where you'd want to steer packets
all the way into the VM just to drop them there.
Without XDP_TX it's too crippled. adjust_head() won't be possible,
packet mangling would have to be disabled and so on.
If xdp program doesn't see raw packet it can only parse the headers of
this jumbo meta-packet and drop it, but for virtio it's really too late.
ddos protection needs to be done at the earliest hw nic receive.
I think if driver claims xdp support it needs to support
drop/pass/tx and adjust_head. For metadata passing up into stack from xdp
we need adjust_head, for encap/decap we need it too. And lro is in the way
of such transformations.
We struggled a lot with cls_bpf due to all metadata inside skb that needs
to be kept correct. Feeding non-raw packets into xdp is a rat hole.
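The adjust_head() requirement boils down to moving the packet start within the buffer's headroom; a userspace model of the bounds check the real bpf_xdp_adjust_head() helper performs (the struct and function here are illustrative):

```c
#include <assert.h>

/* Userspace model of what bpf_xdp_adjust_head() does: move the start
 * of the packet window by delta bytes, bounded by the headroom on one
 * side and the end of the packet on the other. A negative delta grows
 * the packet (push an encap header), a positive one shrinks it. */
struct xdp_buff_sim {
	unsigned char *data_hard_start; /* start of headroom */
	unsigned char *data;            /* current packet start */
	unsigned char *data_end;        /* one past the last byte */
};

static int adjust_head_sim(struct xdp_buff_sim *xdp, int delta)
{
	unsigned char *data = xdp->data + delta;

	if (data < xdp->data_hard_start || data > xdp->data_end)
		return -1; /* would leave the buffer */
	xdp->data = data;
	return 0;
}
```

This is also why LRO gets in the way: once frames are merged, there is no single linear buffer whose start can be moved like this.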

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [net-next PATCH v5 0/6] XDP for virtio_net
  2016-12-08 19:38   ` Alexei Starovoitov
@ 2016-12-08 20:46     ` John Fastabend
  2016-12-08 20:58       ` Michael S. Tsirkin
                         ` (2 more replies)
  2016-12-13  8:46     ` XDP_DROP and XDP_TX (Was: Re: [net-next PATCH v5 0/6] XDP for virtio_net) Jesper Dangaard Brouer
  1 sibling, 3 replies; 35+ messages in thread
From: John Fastabend @ 2016-12-08 20:46 UTC (permalink / raw)
  To: Alexei Starovoitov, David Miller
  Cc: daniel, mst, shm, tgraf, john.r.fastabend, netdev, brouer,
	Michael S. Tsirkin

On 16-12-08 11:38 AM, Alexei Starovoitov wrote:
> On Thu, Dec 08, 2016 at 02:17:02PM -0500, David Miller wrote:
>> From: John Fastabend <john.fastabend@gmail.com>
>> Date: Wed, 07 Dec 2016 12:10:47 -0800
>>
>>> This implements virtio_net for the mergeable buffers and big_packet
>>> modes. I tested this with vhost_net running on qemu and did not see
>>> any issues. For testing num_buf > 1 I added a hack to vhost driver
>>> to only put 100 bytes per buffer.
>>  ...
>>
>> So where are we with this?

There is one possible issue with a hang that Michael pointed out. I can
either spin a v6 or, if you pull this v5 series in, I can post a bugfix
for it. I am not seeing the issue in practice; XDP virtio has been up
and running on my box here for days without issue.

All the concerns below are really future XDP ideas and unrelated to
this series, or at least not required for this series to be applied IMO.

>>
>> I'm not too thrilled with the idea of making XDP_TX optional or
>> something like that.  If someone enables XDP, there is a tradeoff.
>>
>> I also have reservations about the idea to make jumbo frames work
>> without giving XDP access to the whole packet.  If it wants to push or
>> pop a header, it might need to know the whole packet length.  How will
>> you pass that to the XDP program?
>>
>> Some kinds of encapsulation require trailers, thus precluding access
>> to the entire packet precludes those kinds of transformations.
> 
> +1

This was sort of speculative on my side; it is certainly not dependent
on the series here. I agree that we don't want to get into a state
where program X runs here and not there and only runs after doing magic
incantations, etc. I would only propose it if there is a clean way to
implement it.

> 
>> This is why we want simple, linear, buffer access for XDP.
>>
>> Even the most seemingly minor exception turns into a huge complicated
>> mess.
> 
> +1

Yep.

> 
> and from the other thread:
>>> Can't we disable XDP_TX somehow? Many people might only want RX drop,
>>> and extra queues are not always there.
>>>
>>
>> Alexei, Daniel, any thoughts on this?
> 
> I don't like it.
> 

OK alternatively we can make more queues available in virtio which might
be the better solution.

>> I know we were trying to claim some base level of feature support for
>> all XDP drivers. I am sympathetic to this argument though for DDOS we
>> do not need XDP_TX support. And virtio can become queue constrained
>> in some cases.
> 
> especially for ddos case doing lro/gro is not helpful.

Fair enough but disabling LRO to handle the case where you "might" get
a DDOS will hurt normal good traffic.

> I frankly don't see a use case where you'd want to steer packets
> all the way into the VM just to drop them there.

VM to VM traffic is my use case. And in that model we need XDP at the
virtio or vhost layer in case of a malicious/broken/untrusted VM. I
have some vhost patches under development for when net-next opens up
again.

> Without XDP_TX it's too crippled. adjust_head() won't be possible,

Just a nit, but is there any reason not to support adjust_head and then
XDP_PASS? I don't have a use case in mind but also see no reason to
preclude it.

> packet mangling would have to be disabled and so on.
> If xdp program doesn't see raw packet it can only parse the headers of
> this jumbo meta-packet and drop it, but for virtio it's really too late.
> ddos protection needs to be done at the earliest hw nic receive.

VM to VM traffic never touches hw nic.

> I think if driver claims xdp support it needs to support
> drop/pass/tx and adjust_head. For metadata passing up into stack from xdp
> we need adjust_head, for encap/decap we need it too. And lro is in the way
> of such transformations.
> We struggled a lot with cls_bpf due to all metadata inside skb that needs
> to be kept correct. Feeding non-raw packets into xdp is a rat hole.
> 

In summary:

I think it's worth investigating getting LRO working but agree we can't
sacrifice any of the existing features or complicate the code to do it.
If the result of investigating is that it can't be done, then that is
how it is.

Jumbo frames I care very little about in reality, so I should not have
mentioned them.

Requiring XDP drivers to support all features is fine for me; I can
make the virtio queue scheme a bit more flexible. Michael might have
some opinion on this though.

This series shouldn't be blocked by any of the above.

Thanks,
.John

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [net-next PATCH v5 0/6] XDP for virtio_net
  2016-12-08 20:46     ` John Fastabend
@ 2016-12-08 20:58       ` Michael S. Tsirkin
  2016-12-08 21:10         ` Michael S. Tsirkin
  2016-12-08 21:08       ` Alexei Starovoitov
  2016-12-08 22:16       ` David Miller
  2 siblings, 1 reply; 35+ messages in thread
From: Michael S. Tsirkin @ 2016-12-08 20:58 UTC (permalink / raw)
  To: John Fastabend
  Cc: Alexei Starovoitov, David Miller, daniel, shm, tgraf,
	john.r.fastabend, netdev, brouer

On Thu, Dec 08, 2016 at 12:46:07PM -0800, John Fastabend wrote:
> On 16-12-08 11:38 AM, Alexei Starovoitov wrote:
> > On Thu, Dec 08, 2016 at 02:17:02PM -0500, David Miller wrote:
> >> From: John Fastabend <john.fastabend@gmail.com>
> >> Date: Wed, 07 Dec 2016 12:10:47 -0800
> >>
> >>> This implements virtio_net for the mergeable buffers and big_packet
> >>> modes. I tested this with vhost_net running on qemu and did not see
> >>> any issues. For testing num_buf > 1 I added a hack to vhost driver
> >>> to only put 100 bytes per buffer.
> >>  ...
> >>
> >> So where are we with this?
> 
> There is one possible issue with a hang that Michael pointed out. I can
> either spin a v6 or if you pull this v5 series in I can post a bugfix
> for it. I am not seeing the issue in practice; XDP virtio has been up
> and running on my box here for days without issue.


I'd prefer it fixed. Alternatively, apply just 1-3 for now.

> All the concerns below are really future XDP ideas and unrelated to
> this series or at least not required for this series to be applied IMO.
> 
> >>
> >> I'm not too thrilled with the idea of making XDP_TX optional or
> >> something like that.  If someone enables XDP, there is a tradeoff.
> >>
> >> I also have reservations about the idea to make jumbo frames work
> >> without giving XDP access to the whole packet.  If it wants to push or
> >> pop a header, it might need to know the whole packet length.  How will
> >> you pass that to the XDP program?
> >>
> >> Some kinds of encapsulation require trailers, thus precluding access
> >> to the entire packet precludes those kinds of transformations.
> > 
> > +1
> 
> This was sort of speculative on my side it is certainly not dependent on
> the series here. I agree that we don't want to get into a state where
> program X runs here and not there and only runs after doing magic
> incantations, etc. I would only propose it if there is a clean way to
> implement this.
> 
> > 
> >> This is why we want simple, linear, buffer access for XDP.
> >>
> >> Even the most seemingly minor exception turns into a huge complicated
> >> mess.
> > 
> > +1
> 
> Yep.
> 
> > 
> > and from the other thread:
> >>> Can't we disable XDP_TX somehow? Many people might only want RX drop,
> >>> and extra queues are not always there.
> >>>
> >>
> >> Alexei, Daniel, any thoughts on this?
> > 
> > I don't like it.
> > 
> 
> OK alternatively we can make more queues available in virtio which might
> be the better solution.
> 
> >> I know we were trying to claim some base level of feature support for
> >> all XDP drivers. I am sympathetic to this argument though for DDOS we
> >> do not need XDP_TX support. And virtio can become queue constrained
> >> in some cases.
> > 
> > especially for ddos case doing lro/gro is not helpful.
> 
> Fair enough but disabling LRO to handle the case where you "might" get
> a DDOS will hurt normal good traffic.
> 
> > I frankly don't see a use case where you'd want to steer packets
> > all the way into the VM just to drop them there.
> 
> VM to VM traffic is my use case. And in that model we need XDP at the
> virtio or vhost layer in case of a malicious/broken/untrusted VM. I have
> some vhost patches under development for when net-next opens up again.
> 
> > Without XDP_TX it's too crippled. adjust_head() won't be possible,
> 
> Just a nit but any reason not to support adjust_head and then XDP_PASS.
> I don't have a use case in mind but also see no reason to preclude it.
> 
> > packet mangling would have to be disabled and so on.
> > If xdp program doesn't see raw packet it can only parse the headers of
> > this jumbo meta-packet and drop it, but for virtio it's really too late.
> > ddos protection needs to be done at the earliest hw nic receive.
> 
> VM to VM traffic never touches hw nic.
> 
> > I think if driver claims xdp support it needs to support
> > drop/pass/tx and adjust_head. For metadata passing up into stack from xdp
> > we need adjust_head, for encap/decap we need it too. And lro is in the way
> > of such transformations.
> > We struggled a lot with cls_bpf due to all metadata inside skb that needs
> > to be kept correct. Feeding non-raw packets into xdp is a rat hole.
> > 
> 
> In summary:
> 
> I think it's worth investigating getting LRO working but agree we can't
> sacrifice any of the existing features or complicate the code to do it.
> If the result of investigating is it can't be done then that is how it
> is.
> 
> Jumbo frames I care very little about in reality so should not have
> mentioned it.
> 
> Requiring XDP drivers to support all features is fine for me I can make
> the virtio queue scheme a bit more flexible. Michael might have some
> opinion on this though.
> 
> This series shouldn't be blocked by any of the above.
> 
> Thanks,
> .John

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [net-next PATCH v5 5/6] virtio_net: add XDP_TX support
  2016-12-08 18:18     ` John Fastabend
@ 2016-12-08 21:08       ` Michael S. Tsirkin
  2016-12-08 21:18       ` Michael S. Tsirkin
  1 sibling, 0 replies; 35+ messages in thread
From: Michael S. Tsirkin @ 2016-12-08 21:08 UTC (permalink / raw)
  To: John Fastabend
  Cc: daniel, shm, davem, tgraf, alexei.starovoitov, john.r.fastabend,
	netdev, brouer

On Thu, Dec 08, 2016 at 10:18:22AM -0800, John Fastabend wrote:
> On 16-12-07 10:11 PM, Michael S. Tsirkin wrote:
> > On Wed, Dec 07, 2016 at 12:12:45PM -0800, John Fastabend wrote:
> >> This adds support for the XDP_TX action to virtio_net. When an XDP
> >> program is run and returns the XDP_TX action the virtio_net XDP
> >> implementation will transmit the packet on a TX queue that aligns
> >> with the current CPU that the XDP packet was processed on.
> >>
> >> Before sending the packet the header is zeroed.  Also XDP is expected
> >> to handle checksum correctly so no checksum offload  support is
> >> provided.
> >>
> >> Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
> >> ---
> >>  drivers/net/virtio_net.c |   99 +++++++++++++++++++++++++++++++++++++++++++---
> >>  1 file changed, 92 insertions(+), 7 deletions(-)
> >>
> >> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> >> index 28b1196..8e5b13c 100644
> >> --- a/drivers/net/virtio_net.c
> >> +++ b/drivers/net/virtio_net.c
> >> @@ -330,12 +330,57 @@ static struct sk_buff *page_to_skb(struct virtnet_info *vi,
> >>  	return skb;
> >>  }
> >>  
> >> +static void virtnet_xdp_xmit(struct virtnet_info *vi,
> >> +			     struct receive_queue *rq,
> >> +			     struct send_queue *sq,
> >> +			     struct xdp_buff *xdp)
> >> +{
> >> +	struct page *page = virt_to_head_page(xdp->data);
> >> +	struct virtio_net_hdr_mrg_rxbuf *hdr;
> >> +	unsigned int num_sg, len;
> >> +	void *xdp_sent;
> >> +	int err;
> >> +
> >> +	/* Free up any pending old buffers before queueing new ones. */
> >> +	while ((xdp_sent = virtqueue_get_buf(sq->vq, &len)) != NULL) {
> >> +		struct page *sent_page = virt_to_head_page(xdp_sent);
> >> +
> >> +		if (vi->mergeable_rx_bufs)
> >> +			put_page(sent_page);
> >> +		else
> >> +			give_pages(rq, sent_page);
> >> +	}
> > 
> > Looks like this is the only place where you do virtqueue_get_buf.
> > No interrupt handler?
> > This means that if you fill up the queue, nothing will clean it
> > and things will get stuck.
> 
> hmm OK so the callbacks should be implemented to do this and a pair
> of virtqueue_enable_cb_prepare()/virtqueue_disable_cb() used to enable
> and disable callbacks if packets are enqueued.

Oh I didn't realize XDP never stops processing packets,
even if they are never freed.
In that case you do not need callbacks.

> Also in the normal xmit path via start_xmit() will the same condition
> happen? It looks like free_old_xmit_skbs for example is only called if
> a packet is sent could we end up holding on to skbs in this case? I
> don't see free_old_xmit_skbs being called from any callbacks?

Right - all it does is restart the queue. That's why we don't support
BQL right now.

> > Can this be the issue you saw?
> 
> nope see below I was mishandling the big_packets page cleanup path in
> the error case.
> 
> > 
> > 
> >> +
> >> +	/* Zero header and leave csum up to XDP layers */
> >> +	hdr = xdp->data;
> >> +	memset(hdr, 0, vi->hdr_len);
> >> +
> >> +	num_sg = 1;
> >> +	sg_init_one(sq->sg, xdp->data, xdp->data_end - xdp->data);
> >> +	err = virtqueue_add_outbuf(sq->vq, sq->sg, num_sg,
> >> +				   xdp->data, GFP_ATOMIC);
> >> +	if (unlikely(err)) {
> >> +		if (vi->mergeable_rx_bufs)
> >> +			put_page(page);
> >> +		else
> >> +			give_pages(rq, page);
> >> +	} else if (!vi->mergeable_rx_bufs) {
> >> +		/* If not mergeable bufs must be big packets so cleanup pages */
> >> +		give_pages(rq, (struct page *)page->private);
> >> +		page->private = 0;
> >> +	}
> >> +
> >> +	virtqueue_kick(sq->vq);
> > 
> > Is this unconditional kick a work-around for hang
> > we could not figure out yet?
> 
> I tracked the original issue down to how I handled the big_packet page
> cleanups.
> 
> > I guess this helps because it just slows down the guest.
> > I don't much like it ...
> 
> I left it like this copying the pattern in balloon and input drivers. I
> can change it back to the previous pattern where it is only called if
> there is no errors. It has been running fine with the old pattern now
> for an hour or so.
> 
> .John

OK makes sense.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [net-next PATCH v5 0/6] XDP for virtio_net
  2016-12-08 20:46     ` John Fastabend
  2016-12-08 20:58       ` Michael S. Tsirkin
@ 2016-12-08 21:08       ` Alexei Starovoitov
  2016-12-08 22:16       ` David Miller
  2 siblings, 0 replies; 35+ messages in thread
From: Alexei Starovoitov @ 2016-12-08 21:08 UTC (permalink / raw)
  To: John Fastabend
  Cc: David Miller, daniel, mst, shm, tgraf, john.r.fastabend, netdev, brouer

On Thu, Dec 08, 2016 at 12:46:07PM -0800, John Fastabend wrote:
> 
> Fair enough but disabling LRO to handle the case where you "might" get
> a DDOS will hurt normal good traffic.

the xdp_pass path is not optimized right now, so even without VMs there
is some work to do. lro or not-lro is imo secondary.

> > I frankly don't see a use case where you'd want to steer packets
> > all the way into the VM just to drop them there.
> 
> VM to VM traffic is my use case. And in that model we need XDP at the
> virtio or vhost layer in case of a malicious/broken/untrusted VM. I have
> some vhost patches under development for when net-next opens up again.

excellent. looking forward to vhost patches.

> > Without XDP_TX it's too crippled. adjust_head() won't be possible,
> 
> Just a nit but any reason not to support adjust_head and then XDP_PASS.
> I don't have a use case in mind but also see no reason to preclude it.

adjust_head and xdp_pass needs to be supported. No doubt.
the use case is metadata passing between xdp and upper layers.

> In summary:
> 
> I think it's worth investigating getting LRO working but agree we can't
> sacrifice any of the existing features or complicate the code to do it.
> If the result of investigating is it can't be done then that is how it
> is.

agree

> Requiring XDP drivers to support all features is fine for me I can make
> the virtio queue scheme a bit more flexible. Michael might have some
> opinion on this though.

I say right now all the features are pretty much must-have,
but in the future we will become selective.
Like zero-copying a page from dma into user space probably doesn't
make sense to do for virtio.
Multi-port TX from virtio into a phys netdev doesn't make sense either.

> This series shouldn't be blocked by any of the above.

completely agree.
since we abandoned the e1000+xdp patches, virtio+xdp is the only
thing on the table that allows us to do convenient development
and testing of xdp programs.
We've talked about a repository of blessed xdp programs.
They would all need to be routinely and automatically tested.
So virtio+xdp is a must-have feature to me.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [net-next PATCH v5 0/6] XDP for virtio_net
  2016-12-08 20:58       ` Michael S. Tsirkin
@ 2016-12-08 21:10         ` Michael S. Tsirkin
  0 siblings, 0 replies; 35+ messages in thread
From: Michael S. Tsirkin @ 2016-12-08 21:10 UTC (permalink / raw)
  To: John Fastabend
  Cc: Alexei Starovoitov, David Miller, daniel, shm, tgraf,
	john.r.fastabend, netdev, brouer

On Thu, Dec 08, 2016 at 10:58:52PM +0200, Michael S. Tsirkin wrote:
> On Thu, Dec 08, 2016 at 12:46:07PM -0800, John Fastabend wrote:
> > On 16-12-08 11:38 AM, Alexei Starovoitov wrote:
> > > On Thu, Dec 08, 2016 at 02:17:02PM -0500, David Miller wrote:
> > >> From: John Fastabend <john.fastabend@gmail.com>
> > >> Date: Wed, 07 Dec 2016 12:10:47 -0800
> > >>
> > >>> This implements virtio_net for the mergeable buffers and big_packet
> > >>> modes. I tested this with vhost_net running on qemu and did not see
> > >>> any issues. For testing num_buf > 1 I added a hack to vhost driver
> > >>> to only put 100 bytes per buffer.
> > >>  ...
> > >>
> > >> So where are we with this?
> > 
> > There is one possible issue with a hang that Michael pointed out. I can
> > either spin a v6 or if you pull this v5 series in I can post a bugfix
> > for it. I am not seeing the issue in practice; XDP virtio has been up
> > and running on my box here for days without issue.
> 
> 
> I'd prefer it fixed. Alternatively, apply just 1-3 for now.

Looks like there's no issue though after all.
I misunderstood things.


> > All the concerns below are really future XDP ideas and unrelated to
> > this series or at least not required for this series to be applied IMO.
> > 
> > >>
> > >> I'm not too thrilled with the idea of making XDP_TX optional or
> > >> something like that.  If someone enables XDP, there is a tradeoff.
> > >>
> > >> I also have reservations about the idea to make jumbo frames work
> > >> without giving XDP access to the whole packet.  If it wants to push or
> > >> pop a header, it might need to know the whole packet length.  How will
> > >> you pass that to the XDP program?
> > >>
> > >> Some kinds of encapsulation require trailers, thus precluding access
> > >> to the entire packet precludes those kinds of transformations.
> > > 
> > > +1
> > 
> > This was sort of speculative on my side it is certainly not dependent on
> > the series here. I agree that we don't want to get into a state where
> > program X runs here and not there and only runs after doing magic
> > incantations, etc. I would only propose it if there is a clean way to
> > implement this.
> > 
> > > 
> > >> This is why we want simple, linear, buffer access for XDP.
> > >>
> > >> Even the most seemingly minor exception turns into a huge complicated
> > >> mess.
> > > 
> > > +1
> > 
> > Yep.
> > 
> > > 
> > > and from the other thread:
> > >>> Can't we disable XDP_TX somehow? Many people might only want RX drop,
> > >>> and extra queues are not always there.
> > >>>
> > >>
> > >> Alexei, Daniel, any thoughts on this?
> > > 
> > > I don't like it.
> > > 
> > 
> > OK alternatively we can make more queues available in virtio which might
> > be the better solution.
> > 
> > >> I know we were trying to claim some base level of feature support for
> > >> all XDP drivers. I am sympathetic to this argument though for DDOS we
> > >> do not need XDP_TX support. And virtio can become queue constrained
> > >> in some cases.
> > > 
> > > especially for ddos case doing lro/gro is not helpful.
> > 
> > Fair enough but disabling LRO to handle the case where you "might" get
> > a DDOS will hurt normal good traffic.
> > 
> > > I frankly don't see a use case where you'd want to steer packets
> > > all the way into the VM just to drop them there.
> > 
> > VM to VM traffic is my use case. And in that model we need XDP at the
> > virtio or vhost layer in case of a malicious/broken/untrusted VM. I have
> > some vhost patches under development for when net-next opens up again.
> > 
> > > Without XDP_TX it's too crippled. adjust_head() won't be possible,
> > 
> > Just a nit but any reason not to support adjust_head and then XDP_PASS.
> > I don't have a use case in mind but also see no reason to preclude it.
> > 
> > > packet mangling would have to be disabled and so on.
> > > If xdp program doesn't see raw packet it can only parse the headers of
> > > this jumbo meta-packet and drop it, but for virtio it's really too late.
> > > ddos protection needs to be done at the earliest hw nic receive.
> > 
> > VM to VM traffic never touches hw nic.
> > 
> > > I think if driver claims xdp support it needs to support
> > > drop/pass/tx and adjust_head. For metadata passing up into stack from xdp
> > > we need adjust_head, for encap/decap we need it too. And lro is in the way
> > > of such transformations.
> > > We struggled a lot with cls_bpf due to all metadata inside skb that needs
> > > to be kept correct. Feeding non-raw packets into xdp is a rat hole.
> > > 
> > 
> > In summary:
> > 
> > I think it's worth investigating getting LRO working but agree we can't
> > sacrifice any of the existing features or complicate the code to do it.
> > If the result of investigating is it can't be done then that is how it
> > is.
> > 
> > Jumbo frames I care very little about in reality so should not have
> > mentioned it.
> > 
> > Requiring XDP drivers to support all features is fine for me I can make
> > the virtio queue scheme a bit more flexible. Michael might have some
> > opinion on this though.
> > 
> > This series shouldn't be blocked by any of the above.
> > 
> > Thanks,
> > .John

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [net-next PATCH v5 0/6] XDP for virtio_net
  2016-12-08 19:17 ` [net-next PATCH v5 0/6] XDP for virtio_net David Miller
  2016-12-08 19:38   ` Alexei Starovoitov
@ 2016-12-08 21:16   ` Michael S. Tsirkin
  1 sibling, 0 replies; 35+ messages in thread
From: Michael S. Tsirkin @ 2016-12-08 21:16 UTC (permalink / raw)
  To: David Miller
  Cc: john.fastabend, daniel, shm, tgraf, alexei.starovoitov,
	john.r.fastabend, netdev, brouer

On Thu, Dec 08, 2016 at 02:17:02PM -0500, David Miller wrote:
> From: John Fastabend <john.fastabend@gmail.com>
> Date: Wed, 07 Dec 2016 12:10:47 -0800
> 
> > This implements virtio_net for the mergeable buffers and big_packet
> > modes. I tested this with vhost_net running on qemu and did not see
> > any issues. For testing num_buf > 1 I added a hack to vhost driver
> > to only but 100 bytes per buffer.
>  ...
> 
> So where are we with this?
> 
> I'm not too thrilled with the idea of making XDP_TX optional or
> something like that.  If someone enables XDP, there is a tradeoff.

The issue is inability of XDP TX to share xmit queues with net stack.
I'm guessing virtio is not the only card that has a limited
number of queues, is it? Is it really so hard to lock the queue
and check it's running? Could be optional in case resources are there
...

-- 
MST

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [net-next PATCH v5 5/6] virtio_net: add XDP_TX support
  2016-12-08 18:18     ` John Fastabend
  2016-12-08 21:08       ` Michael S. Tsirkin
@ 2016-12-08 21:18       ` Michael S. Tsirkin
  2016-12-08 21:25         ` John Fastabend
  1 sibling, 1 reply; 35+ messages in thread
From: Michael S. Tsirkin @ 2016-12-08 21:18 UTC (permalink / raw)
  To: John Fastabend
  Cc: daniel, shm, davem, tgraf, alexei.starovoitov, john.r.fastabend,
	netdev, brouer

On Thu, Dec 08, 2016 at 10:18:22AM -0800, John Fastabend wrote:
> > I guess this helps because it just slows down the guest.
> > I don't much like it ...
> 
> I left it like this copying the pattern in balloon and input drivers. I
> can change it back to the previous pattern where it is only called if
> there is no errors. It has been running fine with the old pattern now
> for an hour or so.
> 
> .John

I'd prefer internal consistency. Could be a patch on top
if this helps. I'm happy it isn't actually buggy.
Let me know whether you want to post v6 or
want me to ack this one.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [net-next PATCH v5 5/6] virtio_net: add XDP_TX support
  2016-12-08 21:18       ` Michael S. Tsirkin
@ 2016-12-08 21:25         ` John Fastabend
  2016-12-08 21:45           ` Michael S. Tsirkin
  0 siblings, 1 reply; 35+ messages in thread
From: John Fastabend @ 2016-12-08 21:25 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: daniel, shm, davem, tgraf, alexei.starovoitov, john.r.fastabend,
	netdev, brouer

On 16-12-08 01:18 PM, Michael S. Tsirkin wrote:
> On Thu, Dec 08, 2016 at 10:18:22AM -0800, John Fastabend wrote:
>>> I guess this helps because it just slows down the guest.
>>> I don't much like it ...
>>
>> I left it like this copying the pattern in balloon and input drivers. I
>> can change it back to the previous pattern where it is only called if
>> there is no errors. It has been running fine with the old pattern now
>> for an hour or so.
>>
>> .John
> 
> I'd prefer internal consistency. Could be a patch on top
> if this helps. I'm happy it isn't actually buggy.
> Let me know whether you want to post v6 or
> want me to ack this one.
> 

I think because DaveM has closed net-next ACK'ing this set would be
great to get it in Dave's tree. Then I can post a small "fix" so that it
is consistent between normal stack and xdp tx path after that.

Thanks,
John

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [net-next PATCH v5 1/6] net: virtio dynamically disable/enable LRO
  2016-12-07 20:11 ` [net-next PATCH v5 1/6] net: virtio dynamically disable/enable LRO John Fastabend
@ 2016-12-08 21:36   ` Michael S. Tsirkin
  2016-12-09  0:04     ` John Fastabend
  0 siblings, 1 reply; 35+ messages in thread
From: Michael S. Tsirkin @ 2016-12-08 21:36 UTC (permalink / raw)
  To: John Fastabend
  Cc: daniel, shm, davem, tgraf, alexei.starovoitov, john.r.fastabend,
	netdev, brouer

On Wed, Dec 07, 2016 at 12:11:11PM -0800, John Fastabend wrote:
> This adds support for dynamically setting the LRO feature flag. The
> message to control guest features in the backend uses the
> CTRL_GUEST_OFFLOADS msg type.
> 
> Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
> ---
>  drivers/net/virtio_net.c |   40 +++++++++++++++++++++++++++++++++++++++-
>  1 file changed, 39 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index a21d93a..a5c47b1 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -1419,6 +1419,36 @@ static void virtnet_init_settings(struct net_device *dev)
>  	.set_settings = virtnet_set_settings,
>  };
>  
> +static int virtnet_set_features(struct net_device *netdev,
> +				netdev_features_t features)
> +{
> +	struct virtnet_info *vi = netdev_priv(netdev);
> +	struct virtio_device *vdev = vi->vdev;
> +	struct scatterlist sg;
> +	u64 offloads = 0;
> +
> +	if (features & NETIF_F_LRO)
> +		offloads |= (1 << VIRTIO_NET_F_GUEST_TSO4) |
> +			    (1 << VIRTIO_NET_F_GUEST_TSO6);
> +
> +	if (features & NETIF_F_RXCSUM)
> +		offloads |= (1 << VIRTIO_NET_F_GUEST_CSUM);
> +
> +	if (virtio_has_feature(vdev, VIRTIO_NET_F_CTRL_GUEST_OFFLOADS)) {
> +		sg_init_one(&sg, &offloads, sizeof(uint64_t));
> +		if (!virtnet_send_command(vi,
> +					  VIRTIO_NET_CTRL_GUEST_OFFLOADS,
> +					  VIRTIO_NET_CTRL_GUEST_OFFLOADS_SET,
> +					  &sg)) {

Hmm I just realised that this will slow down setups that bridge
virtio net interfaces since bridge calls this if provided.
See below.

> +			dev_warn(&netdev->dev,
> +				 "Failed to set guest offloads by virtnet command.\n");
> +			return -EINVAL;
> +		}
> +	}

Hmm if VIRTIO_NET_F_CTRL_GUEST_OFFLOADS is off, this fails
silently. It might actually be a good idea to avoid
breaking setups.

> +
> +	return 0;
> +}
> +
>  static const struct net_device_ops virtnet_netdev = {
>  	.ndo_open            = virtnet_open,
>  	.ndo_stop   	     = virtnet_close,
> @@ -1435,6 +1465,7 @@ static void virtnet_init_settings(struct net_device *dev)
>  #ifdef CONFIG_NET_RX_BUSY_POLL
>  	.ndo_busy_poll		= virtnet_busy_poll,
>  #endif
> +	.ndo_set_features	= virtnet_set_features,
>  };
>  
>  static void virtnet_config_changed_work(struct work_struct *work)
> @@ -1815,6 +1846,12 @@ static int virtnet_probe(struct virtio_device *vdev)
>  	if (virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_CSUM))
>  		dev->features |= NETIF_F_RXCSUM;
>  
> +	if (virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_TSO4) &&
> +	    virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_TSO6)) {
> +		dev->features |= NETIF_F_LRO;
> +		dev->hw_features |= NETIF_F_LRO;

So the issue is I think that the virtio "LRO" isn't really
LRO, it's typically just GRO forwarded to guests.
So these are easily re-split along MTU boundaries,
which makes it ok to forward these across bridges.

It's not nice that we don't document this in the spec,
but it's the reality and people rely on this.

For now, how about doing a custom thing and just disable/enable
it as XDP is attached/detached?

> +	}
> +
>  	dev->vlan_features = dev->features;
>  
>  	/* MTU range: 68 - 65535 */
> @@ -2057,7 +2094,8 @@ static int virtnet_restore(struct virtio_device *vdev)
>  	VIRTIO_NET_F_CTRL_RX, VIRTIO_NET_F_CTRL_VLAN, \
>  	VIRTIO_NET_F_GUEST_ANNOUNCE, VIRTIO_NET_F_MQ, \
>  	VIRTIO_NET_F_CTRL_MAC_ADDR, \
> -	VIRTIO_NET_F_MTU
> +	VIRTIO_NET_F_MTU, \
> +	VIRTIO_NET_F_CTRL_GUEST_OFFLOADS
>  
>  static unsigned int features[] = {
>  	VIRTNET_FEATURES,

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [net-next PATCH v5 5/6] virtio_net: add XDP_TX support
  2016-12-08 21:25         ` John Fastabend
@ 2016-12-08 21:45           ` Michael S. Tsirkin
  2016-12-08 21:51             ` John Fastabend
  0 siblings, 1 reply; 35+ messages in thread
From: Michael S. Tsirkin @ 2016-12-08 21:45 UTC (permalink / raw)
  To: John Fastabend
  Cc: daniel, shm, davem, tgraf, alexei.starovoitov, john.r.fastabend,
	netdev, brouer

On Thu, Dec 08, 2016 at 01:25:40PM -0800, John Fastabend wrote:
> On 16-12-08 01:18 PM, Michael S. Tsirkin wrote:
> > On Thu, Dec 08, 2016 at 10:18:22AM -0800, John Fastabend wrote:
> >>> I guess this helps because it just slows down the guest.
> >>> I don't much like it ...
> >>
> >> I left it like this copying the pattern in balloon and input drivers. I
> >> can change it back to the previous pattern where it is only called if
> >> there is no errors. It has been running fine with the old pattern now
> >> for an hour or so.
> >>
> >> .John
> > 
> > I'd prefer internal consistency. Could be a patch on top
> > if this helps. I'm happy it isn't actually buggy.
> > Let me know whether you want to post v6 or
> > want me to ack this one.
> > 
> 
> I think because DaveM has closed net-next ACK'ing this set would be
> great to get it in Dave's tree. Then I can post a small "fix" so that it
> is consistent between normal stack and xdp tx path after that.
> 
> Thanks,
> John

I can merge it through my tree too - I don't think there's
any dependency on net-next, is there?

-- 
MST

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [net-next PATCH v5 5/6] virtio_net: add XDP_TX support
  2016-12-08 21:45           ` Michael S. Tsirkin
@ 2016-12-08 21:51             ` John Fastabend
  0 siblings, 0 replies; 35+ messages in thread
From: John Fastabend @ 2016-12-08 21:51 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: daniel, shm, davem, tgraf, alexei.starovoitov, john.r.fastabend,
	netdev, brouer

On 16-12-08 01:45 PM, Michael S. Tsirkin wrote:
> On Thu, Dec 08, 2016 at 01:25:40PM -0800, John Fastabend wrote:
>> On 16-12-08 01:18 PM, Michael S. Tsirkin wrote:
>>> On Thu, Dec 08, 2016 at 10:18:22AM -0800, John Fastabend wrote:
>>>>> I guess this helps because it just slows down the guest.
>>>>> I don't much like it ...
>>>>
>>>> I left it like this copying the pattern in balloon and input drivers. I
>>>> can change it back to the previous pattern where it is only called if
>>>> there is no errors. It has been running fine with the old pattern now
>>>> for an hour or so.
>>>>
>>>> .John
>>>
>>> I'd prefer internal consistency. Could be a patch on top
>>> if this helps. I'm happy it isn't actually buggy.
>>> Let me know whether you want to post v6 or
>>> want me to ack this one.
>>>
>>
>> I think because DaveM has closed net-next ACK'ing this set would be
>> great to get it in Dave's tree. Then I can post a small "fix" so that it
>> is consistent between normal stack and xdp tx path after that.
>>
>> Thanks,
>> John
> 
> I can merge it through my tree too - I don't think there's
> any dependency on net-next, is there?
> 

Its all changes to virtio_net except for patch 2 which is against
filter.c/filter.h which I guess you could possibly get a merge conflict
on. But it would be fairly trivial.

I don't have a preference on which tree it goes through whichever makes
the most sense.

.John

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [net-next PATCH v5 0/6] XDP for virtio_net
  2016-12-08 20:46     ` John Fastabend
  2016-12-08 20:58       ` Michael S. Tsirkin
  2016-12-08 21:08       ` Alexei Starovoitov
@ 2016-12-08 22:16       ` David Miller
  2016-12-09  3:01         ` Michael S. Tsirkin
  2 siblings, 1 reply; 35+ messages in thread
From: David Miller @ 2016-12-08 22:16 UTC (permalink / raw)
  To: john.fastabend
  Cc: alexei.starovoitov, daniel, mst, shm, tgraf, john.r.fastabend,
	netdev, brouer

From: John Fastabend <john.fastabend@gmail.com>
Date: Thu, 8 Dec 2016 12:46:07 -0800

> On 16-12-08 11:38 AM, Alexei Starovoitov wrote:
>> On Thu, Dec 08, 2016 at 02:17:02PM -0500, David Miller wrote:
>>> From: John Fastabend <john.fastabend@gmail.com>
>>> Date: Wed, 07 Dec 2016 12:10:47 -0800
>>>
>>>> This implements virtio_net for the mergeable buffers and big_packet
>>>> modes. I tested this with vhost_net running on qemu and did not see
>>>> any issues. For testing num_buf > 1 I added a hack to vhost driver
>>>> to only but 100 bytes per buffer.
>>>  ...
>>>
>>> So where are we with this?
> 
> There is one possible issue with a hang that Michael pointed out. I can
> either spin a v6 or if you pull this v5 series in I can post a bugfix
> for it. I am not seeing the issue in practice XDP virtio has been up
> and running on my box here for days without issue.

If that's the case then please spin a v6 and I'll apply it.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [net-next PATCH v5 1/6] net: virtio dynamically disable/enable LRO
  2016-12-08 21:36   ` Michael S. Tsirkin
@ 2016-12-09  0:04     ` John Fastabend
  2016-12-09  3:05       ` Michael S. Tsirkin
  2016-12-14 13:31       ` Michael S. Tsirkin
  0 siblings, 2 replies; 35+ messages in thread
From: John Fastabend @ 2016-12-09  0:04 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: daniel, shm, davem, tgraf, alexei.starovoitov, john.r.fastabend,
	netdev, brouer

On 16-12-08 01:36 PM, Michael S. Tsirkin wrote:
> On Wed, Dec 07, 2016 at 12:11:11PM -0800, John Fastabend wrote:
>> This adds support for dynamically setting the LRO feature flag. The
>> message to control guest features in the backend uses the
>> CTRL_GUEST_OFFLOADS msg type.
>>
>> Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
>> ---
>>  drivers/net/virtio_net.c |   40 +++++++++++++++++++++++++++++++++++++++-
>>  1 file changed, 39 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
>> index a21d93a..a5c47b1 100644
>> --- a/drivers/net/virtio_net.c
>> +++ b/drivers/net/virtio_net.c
>> @@ -1419,6 +1419,36 @@ static void virtnet_init_settings(struct net_device *dev)
>>  	.set_settings = virtnet_set_settings,
>>  };
>>  
>> +static int virtnet_set_features(struct net_device *netdev,
>> +				netdev_features_t features)
>> +{
>> +	struct virtnet_info *vi = netdev_priv(netdev);
>> +	struct virtio_device *vdev = vi->vdev;
>> +	struct scatterlist sg;
>> +	u64 offloads = 0;
>> +
>> +	if (features & NETIF_F_LRO)
>> +		offloads |= (1 << VIRTIO_NET_F_GUEST_TSO4) |
>> +			    (1 << VIRTIO_NET_F_GUEST_TSO6);
>> +
>> +	if (features & NETIF_F_RXCSUM)
>> +		offloads |= (1 << VIRTIO_NET_F_GUEST_CSUM);
>> +
>> +	if (virtio_has_feature(vdev, VIRTIO_NET_F_CTRL_GUEST_OFFLOADS)) {
>> +		sg_init_one(&sg, &offloads, sizeof(uint64_t));
>> +		if (!virtnet_send_command(vi,
>> +					  VIRTIO_NET_CTRL_GUEST_OFFLOADS,
>> +					  VIRTIO_NET_CTRL_GUEST_OFFLOADS_SET,
>> +					  &sg)) {
> 
> Hmm I just realised that this will slow down setups that bridge
> virtio net interfaces since bridge calls this if provided.
> See below.


Really? What code is trying to turn off GRO via the GUEST_OFFLOADS LRO
command. My qemu/Linux setup has a set of tap/vhost devices attached to
a bridge and all of them have LRO enabled even with this patch series.

I must missing a setup handler somewhere?

> 
>> +			dev_warn(&netdev->dev,
>> +				 "Failed to set guest offloads by virtnet command.\n");
>> +			return -EINVAL;
>> +		}
>> +	}
> 
> Hmm if VIRTIO_NET_F_CTRL_GUEST_OFFLOADS is off, this fails
> silently. It might actually be a good idea to avoid
> breaking setups.
> 
>> +
>> +	return 0;
>> +}
>> +
>>  static const struct net_device_ops virtnet_netdev = {
>>  	.ndo_open            = virtnet_open,
>>  	.ndo_stop   	     = virtnet_close,
>> @@ -1435,6 +1465,7 @@ static void virtnet_init_settings(struct net_device *dev)
>>  #ifdef CONFIG_NET_RX_BUSY_POLL
>>  	.ndo_busy_poll		= virtnet_busy_poll,
>>  #endif
>> +	.ndo_set_features	= virtnet_set_features,
>>  };
>>  
>>  static void virtnet_config_changed_work(struct work_struct *work)
>> @@ -1815,6 +1846,12 @@ static int virtnet_probe(struct virtio_device *vdev)
>>  	if (virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_CSUM))
>>  		dev->features |= NETIF_F_RXCSUM;
>>  
>> +	if (virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_TSO4) &&
>> +	    virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_TSO6)) {
>> +		dev->features |= NETIF_F_LRO;
>> +		dev->hw_features |= NETIF_F_LRO;
> 
> So the issue is I think that the virtio "LRO" isn't really
> LRO, it's typically just GRO forwarded to guests.
> So these are easily re-split along MTU boundaries,
> which makes it ok to forward these across bridges.
> 
> It's not nice that we don't document this in the spec,
> but it's the reality and people rely on this.
> 
> For now, how about doing a custom thing and just disable/enable
> it as XDP is attached/detached?

The annoying part about doing this is ethtool will say that it is fixed
yet it will be changed by seemingly unrelated operation. I'm not sure I
like the idea to start automatically configuring the link via xdp_set.

> 
>> +	}
>> +
>>  	dev->vlan_features = dev->features;
>>  
>>  	/* MTU range: 68 - 65535 */
>> @@ -2057,7 +2094,8 @@ static int virtnet_restore(struct virtio_device *vdev)
>>  	VIRTIO_NET_F_CTRL_RX, VIRTIO_NET_F_CTRL_VLAN, \
>>  	VIRTIO_NET_F_GUEST_ANNOUNCE, VIRTIO_NET_F_MQ, \
>>  	VIRTIO_NET_F_CTRL_MAC_ADDR, \
>> -	VIRTIO_NET_F_MTU
>> +	VIRTIO_NET_F_MTU, \
>> +	VIRTIO_NET_F_CTRL_GUEST_OFFLOADS
>>  
>>  static unsigned int features[] = {
>>  	VIRTNET_FEATURES,

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [net-next PATCH v5 0/6] XDP for virtio_net
  2016-12-08 22:16       ` David Miller
@ 2016-12-09  3:01         ` Michael S. Tsirkin
  0 siblings, 0 replies; 35+ messages in thread
From: Michael S. Tsirkin @ 2016-12-09  3:01 UTC (permalink / raw)
  To: David Miller
  Cc: john.fastabend, alexei.starovoitov, daniel, shm, tgraf,
	john.r.fastabend, netdev, brouer

On Thu, Dec 08, 2016 at 05:16:02PM -0500, David Miller wrote:
> From: John Fastabend <john.fastabend@gmail.com>
> Date: Thu, 8 Dec 2016 12:46:07 -0800
> 
> > On 16-12-08 11:38 AM, Alexei Starovoitov wrote:
> >> On Thu, Dec 08, 2016 at 02:17:02PM -0500, David Miller wrote:
> >>> From: John Fastabend <john.fastabend@gmail.com>
> >>> Date: Wed, 07 Dec 2016 12:10:47 -0800
> >>>
> >>>> This implements virtio_net for the mergeable buffers and big_packet
> >>>> modes. I tested this with vhost_net running on qemu and did not see
> >>>> any issues. For testing num_buf > 1 I added a hack to vhost driver
> >>>> to only but 100 bytes per buffer.
> >>>  ...
> >>>
> >>> So where are we with this?
> > 
> > There is one possible issue with a hang that Michael pointed out. I can
> > either spin a v6 or if you pull this v5 series in I can post a bugfix
> > for it. I am not seeing the issue in practice XDP virtio has been up
> > and running on my box here for days without issue.
> 
> If that's the case then please spin a v6 and I'll apply it.

I think 1/6 should be reworked, disabling/enabling GUEST_TSO
transparently without touching the LRO flag, and without adding
NDO callback.

-- 
MST

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [net-next PATCH v5 1/6] net: virtio dynamically disable/enable LRO
  2016-12-09  0:04     ` John Fastabend
@ 2016-12-09  3:05       ` Michael S. Tsirkin
  2016-12-14 13:31       ` Michael S. Tsirkin
  1 sibling, 0 replies; 35+ messages in thread
From: Michael S. Tsirkin @ 2016-12-09  3:05 UTC (permalink / raw)
  To: John Fastabend
  Cc: daniel, shm, davem, tgraf, alexei.starovoitov, john.r.fastabend,
	netdev, brouer

On Thu, Dec 08, 2016 at 04:04:58PM -0800, John Fastabend wrote:
> On 16-12-08 01:36 PM, Michael S. Tsirkin wrote:
> > On Wed, Dec 07, 2016 at 12:11:11PM -0800, John Fastabend wrote:
> >> This adds support for dynamically setting the LRO feature flag. The
> >> message to control guest features in the backend uses the
> >> CTRL_GUEST_OFFLOADS msg type.
> >>
> >> Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
> >> ---
> >>  drivers/net/virtio_net.c |   40 +++++++++++++++++++++++++++++++++++++++-
> >>  1 file changed, 39 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> >> index a21d93a..a5c47b1 100644
> >> --- a/drivers/net/virtio_net.c
> >> +++ b/drivers/net/virtio_net.c
> >> @@ -1419,6 +1419,36 @@ static void virtnet_init_settings(struct net_device *dev)
> >>  	.set_settings = virtnet_set_settings,
> >>  };
> >>  
> >> +static int virtnet_set_features(struct net_device *netdev,
> >> +				netdev_features_t features)
> >> +{
> >> +	struct virtnet_info *vi = netdev_priv(netdev);
> >> +	struct virtio_device *vdev = vi->vdev;
> >> +	struct scatterlist sg;
> >> +	u64 offloads = 0;
> >> +
> >> +	if (features & NETIF_F_LRO)
> >> +		offloads |= (1 << VIRTIO_NET_F_GUEST_TSO4) |
> >> +			    (1 << VIRTIO_NET_F_GUEST_TSO6);
> >> +
> >> +	if (features & NETIF_F_RXCSUM)
> >> +		offloads |= (1 << VIRTIO_NET_F_GUEST_CSUM);
> >> +
> >> +	if (virtio_has_feature(vdev, VIRTIO_NET_F_CTRL_GUEST_OFFLOADS)) {
> >> +		sg_init_one(&sg, &offloads, sizeof(uint64_t));
> >> +		if (!virtnet_send_command(vi,
> >> +					  VIRTIO_NET_CTRL_GUEST_OFFLOADS,
> >> +					  VIRTIO_NET_CTRL_GUEST_OFFLOADS_SET,
> >> +					  &sg)) {
> > 
> > Hmm I just realised that this will slow down setups that bridge
> > virtio net interfaces since bridge calls this if provided.
> > See below.
> 
> 
> Really? What code is trying to turn off GRO via the GUEST_OFFLOADS LRO
> command. My qemu/Linux setup has a set of tap/vhost devices attached to
> a bridge and all of them have LRO enabled even with this patch series.
> 
> I must missing a setup handler somewhere?

Turning off LRO, not GRO.

> > 
> >> +			dev_warn(&netdev->dev,
> >> +				 "Failed to set guest offloads by virtnet command.\n");
> >> +			return -EINVAL;
> >> +		}
> >> +	}
> > 
> > Hmm if VIRTIO_NET_F_CTRL_GUEST_OFFLOADS is off, this fails
> > silently. It might actually be a good idea to avoid
> > breaking setups.
> > 
> >> +
> >> +	return 0;
> >> +}
> >> +
> >>  static const struct net_device_ops virtnet_netdev = {
> >>  	.ndo_open            = virtnet_open,
> >>  	.ndo_stop   	     = virtnet_close,
> >> @@ -1435,6 +1465,7 @@ static void virtnet_init_settings(struct net_device *dev)
> >>  #ifdef CONFIG_NET_RX_BUSY_POLL
> >>  	.ndo_busy_poll		= virtnet_busy_poll,
> >>  #endif
> >> +	.ndo_set_features	= virtnet_set_features,
> >>  };
> >>  
> >>  static void virtnet_config_changed_work(struct work_struct *work)
> >> @@ -1815,6 +1846,12 @@ static int virtnet_probe(struct virtio_device *vdev)
> >>  	if (virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_CSUM))
> >>  		dev->features |= NETIF_F_RXCSUM;
> >>  
> >> +	if (virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_TSO4) &&
> >> +	    virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_TSO6)) {
> >> +		dev->features |= NETIF_F_LRO;
> >> +		dev->hw_features |= NETIF_F_LRO;
> > 
> > So the issue is I think that the virtio "LRO" isn't really
> > LRO, it's typically just GRO forwarded to guests.
> > So these are easily re-split along MTU boundaries,
> > which makes it ok to forward these across bridges.
> > 
> > It's not nice that we don't document this in the spec,
> > but it's the reality and people rely on this.
> > 
> > For now, how about doing a custom thing and just disable/enable
> > it as XDP is attached/detached?
> 
> The annoying part about doing this is ethtool will say that it is fixed
> yet it will be changed by seemingly unrelated operation.

ATM ethtool does not tell you LRO is enabled, since
again, this isn't exactly LRO. It's a safe variant that
supports e.g. bridging.

> I'm not sure I
> like the idea to start automatically configuring the link via xdp_set.

That's what I thought up off-hand. Let me think about better
interfaces over the weekend.

> > 
> >> +	}
> >> +
> >>  	dev->vlan_features = dev->features;
> >>  
> >>  	/* MTU range: 68 - 65535 */
> >> @@ -2057,7 +2094,8 @@ static int virtnet_restore(struct virtio_device *vdev)
> >>  	VIRTIO_NET_F_CTRL_RX, VIRTIO_NET_F_CTRL_VLAN, \
> >>  	VIRTIO_NET_F_GUEST_ANNOUNCE, VIRTIO_NET_F_MQ, \
> >>  	VIRTIO_NET_F_CTRL_MAC_ADDR, \
> >> -	VIRTIO_NET_F_MTU
> >> +	VIRTIO_NET_F_MTU, \
> >> +	VIRTIO_NET_F_CTRL_GUEST_OFFLOADS
> >>  
> >>  static unsigned int features[] = {
> >>  	VIRTNET_FEATURES,

^ permalink raw reply	[flat|nested] 35+ messages in thread

* XDP_DROP and XDP_TX (Was: Re: [net-next PATCH v5 0/6] XDP for virtio_net)
  2016-12-08 19:38   ` Alexei Starovoitov
  2016-12-08 20:46     ` John Fastabend
@ 2016-12-13  8:46     ` Jesper Dangaard Brouer
  1 sibling, 0 replies; 35+ messages in thread
From: Jesper Dangaard Brouer @ 2016-12-13  8:46 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: David Miller, john.fastabend, daniel, mst, shm, tgraf,
	john.r.fastabend, netdev, brouer


On Thu, 8 Dec 2016 11:38:16 -0800
Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote:
> On Thu, Dec 08, 2016 at 02:17:02PM -0500, David Miller wrote:
> > From: John Fastabend <john.fastabend@gmail.com>
> > Date: Wed, 07 Dec 2016 12:10:47 -0800
> >   
[...]
> > > Can't we disable XDP_TX somehow? Many people might only want RX drop,
> > > and extra queues are not always there.
> > >  
> > 
> > Alexei, Daniel, any thoughts on this?  
> 
> I don't like it.

I don't know that the use-case virtio XDP_TX is, but I still think we
should implement it for this virtio_net driver, as it allow easier
testing of XDP programs without physical HW.

BUT I do believe XDP_DROP and XDP_TX should be two different capabilities.

I can easily imagine that an older driver only wants to implement the
XDP_DROP facility. The reason is that XDP_TX would require changing too
much driver code, which is a concern for an old, stable and time-proven
driver.

I can also imagine wanting to implement XDP_DROP on my OpenWRT home
router, which I don't think can get an extra HW TX queue for XDP_TX.

I even have a practical use-case for my OpenWRT home router.  I
experience a Windows machine being connected to my home network, it
caused some (WiFi) traffic storm that overloaded the small OpenWRT box,
so much that it couldn't even route packets for other machines
connected (on Ethernet).  This could also be an IoT device or an
infected machine participating in a DDoS attack.  I fairly quickly
identified the bad machine and disconnected it from the network, but
for e.g. conference or hotel networks is can be harder to identify and
disconnect offenders.  An XDP eBPF (ddos) filter could allow the sysadm
to "virtually" disconnect a certain MAC address, and restore operation.


> > I know we were trying to claim some base level of feature support for
> > all XDP drivers. I am sympathetic to this argument though for DDOS we
> > do not need XDP_TX support. And virtio can become queue constrained
> > in some cases.  
> 
> Without XDP_TX it's too crippled. adjust_head() won't be possible,
> packet mangling would have to be disabled and so on.

Even without XDP_TX support, adjust_head() must be supported.  The
XDP_PASS action still requires ability to modify the packet.

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [net-next PATCH v5 1/6] net: virtio dynamically disable/enable LRO
  2016-12-09  0:04     ` John Fastabend
  2016-12-09  3:05       ` Michael S. Tsirkin
@ 2016-12-14 13:31       ` Michael S. Tsirkin
  2016-12-14 17:01         ` John Fastabend
  1 sibling, 1 reply; 35+ messages in thread
From: Michael S. Tsirkin @ 2016-12-14 13:31 UTC (permalink / raw)
  To: John Fastabend
  Cc: daniel, shm, davem, tgraf, alexei.starovoitov, john.r.fastabend,
	netdev, brouer

On Thu, Dec 08, 2016 at 04:04:58PM -0800, John Fastabend wrote:
> On 16-12-08 01:36 PM, Michael S. Tsirkin wrote:
> > On Wed, Dec 07, 2016 at 12:11:11PM -0800, John Fastabend wrote:
> >> This adds support for dynamically setting the LRO feature flag. The
> >> message to control guest features in the backend uses the
> >> CTRL_GUEST_OFFLOADS msg type.
> >>
> >> Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
> >> ---
> >>  drivers/net/virtio_net.c |   40 +++++++++++++++++++++++++++++++++++++++-
> >>  1 file changed, 39 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> >> index a21d93a..a5c47b1 100644
> >> --- a/drivers/net/virtio_net.c
> >> +++ b/drivers/net/virtio_net.c
> >> @@ -1419,6 +1419,36 @@ static void virtnet_init_settings(struct net_device *dev)
> >>  	.set_settings = virtnet_set_settings,
> >>  };
> >>  
> >> +static int virtnet_set_features(struct net_device *netdev,
> >> +				netdev_features_t features)
> >> +{
> >> +	struct virtnet_info *vi = netdev_priv(netdev);
> >> +	struct virtio_device *vdev = vi->vdev;
> >> +	struct scatterlist sg;
> >> +	u64 offloads = 0;
> >> +
> >> +	if (features & NETIF_F_LRO)
> >> +		offloads |= (1 << VIRTIO_NET_F_GUEST_TSO4) |
> >> +			    (1 << VIRTIO_NET_F_GUEST_TSO6);
> >> +
> >> +	if (features & NETIF_F_RXCSUM)
> >> +		offloads |= (1 << VIRTIO_NET_F_GUEST_CSUM);
> >> +
> >> +	if (virtio_has_feature(vdev, VIRTIO_NET_F_CTRL_GUEST_OFFLOADS)) {
> >> +		sg_init_one(&sg, &offloads, sizeof(uint64_t));
> >> +		if (!virtnet_send_command(vi,
> >> +					  VIRTIO_NET_CTRL_GUEST_OFFLOADS,
> >> +					  VIRTIO_NET_CTRL_GUEST_OFFLOADS_SET,
> >> +					  &sg)) {
> > 
> > Hmm I just realised that this will slow down setups that bridge
> > virtio net interfaces since bridge calls this if provided.
> > See below.
> 
> 
> Really? What code is trying to turn off GRO via the GUEST_OFFLOADS LRO
> command. My qemu/Linux setup has a set of tap/vhost devices attached to
> a bridge and all of them have LRO enabled even with this patch series.
> 
> I must missing a setup handler somewhere?
> 
> > 
> >> +			dev_warn(&netdev->dev,
> >> +				 "Failed to set guest offloads by virtnet command.\n");
> >> +			return -EINVAL;
> >> +		}
> >> +	}
> > 
> > Hmm if VIRTIO_NET_F_CTRL_GUEST_OFFLOADS is off, this fails
> > silently. It might actually be a good idea to avoid
> > breaking setups.
> > 
> >> +
> >> +	return 0;
> >> +}
> >> +
> >>  static const struct net_device_ops virtnet_netdev = {
> >>  	.ndo_open            = virtnet_open,
> >>  	.ndo_stop   	     = virtnet_close,
> >> @@ -1435,6 +1465,7 @@ static void virtnet_init_settings(struct net_device *dev)
> >>  #ifdef CONFIG_NET_RX_BUSY_POLL
> >>  	.ndo_busy_poll		= virtnet_busy_poll,
> >>  #endif
> >> +	.ndo_set_features	= virtnet_set_features,
> >>  };
> >>  
> >>  static void virtnet_config_changed_work(struct work_struct *work)
> >> @@ -1815,6 +1846,12 @@ static int virtnet_probe(struct virtio_device *vdev)
> >>  	if (virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_CSUM))
> >>  		dev->features |= NETIF_F_RXCSUM;
> >>  
> >> +	if (virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_TSO4) &&
> >> +	    virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_TSO6)) {
> >> +		dev->features |= NETIF_F_LRO;
> >> +		dev->hw_features |= NETIF_F_LRO;
> > 
> > So the issue is I think that the virtio "LRO" isn't really
> > LRO, it's typically just GRO forwarded to guests.
> > So these are easily re-split along MTU boundaries,
> > which makes it ok to forward these across bridges.
> > 
> > It's not nice that we don't document this in the spec,
> > but it's the reality and people rely on this.
> > 
> > For now, how about doing a custom thing and just disable/enable
> > it as XDP is attached/detached?
> 
> The annoying part about doing this is ethtool will say that it is fixed
> yet it will be changed by seemingly unrelated operation. I'm not sure I
> like the idea to start automatically configuring the link via xdp_set.

I really don't like the idea of dropping performance
by a factor of 3 for people bridging two virtio net
interfaces.

So how about a simple approach for now, just disable
XDP if GUEST_TSO is enabled?

We can discuss better approaches in next version.


> > 
> >> +	}
> >> +
> >>  	dev->vlan_features = dev->features;
> >>  
> >>  	/* MTU range: 68 - 65535 */
> >> @@ -2057,7 +2094,8 @@ static int virtnet_restore(struct virtio_device *vdev)
> >>  	VIRTIO_NET_F_CTRL_RX, VIRTIO_NET_F_CTRL_VLAN, \
> >>  	VIRTIO_NET_F_GUEST_ANNOUNCE, VIRTIO_NET_F_MQ, \
> >>  	VIRTIO_NET_F_CTRL_MAC_ADDR, \
> >> -	VIRTIO_NET_F_MTU
> >> +	VIRTIO_NET_F_MTU, \
> >> +	VIRTIO_NET_F_CTRL_GUEST_OFFLOADS
> >>  
> >>  static unsigned int features[] = {
> >>  	VIRTNET_FEATURES,

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [net-next PATCH v5 1/6] net: virtio dynamically disable/enable LRO
  2016-12-14 13:31       ` Michael S. Tsirkin
@ 2016-12-14 17:01         ` John Fastabend
  2016-12-15 16:35           ` Michael S. Tsirkin
  0 siblings, 1 reply; 35+ messages in thread
From: John Fastabend @ 2016-12-14 17:01 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: daniel, shm, davem, tgraf, alexei.starovoitov, john.r.fastabend,
	netdev, brouer

On 16-12-14 05:31 AM, Michael S. Tsirkin wrote:
> On Thu, Dec 08, 2016 at 04:04:58PM -0800, John Fastabend wrote:
>> On 16-12-08 01:36 PM, Michael S. Tsirkin wrote:
>>> On Wed, Dec 07, 2016 at 12:11:11PM -0800, John Fastabend wrote:
>>>> This adds support for dynamically setting the LRO feature flag. The
>>>> message to control guest features in the backend uses the
>>>> CTRL_GUEST_OFFLOADS msg type.
>>>>
>>>> Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
>>>> ---

[...]

>>>>  
>>>>  static void virtnet_config_changed_work(struct work_struct *work)
>>>> @@ -1815,6 +1846,12 @@ static int virtnet_probe(struct virtio_device *vdev)
>>>>  	if (virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_CSUM))
>>>>  		dev->features |= NETIF_F_RXCSUM;
>>>>  
>>>> +	if (virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_TSO4) &&
>>>> +	    virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_TSO6)) {
>>>> +		dev->features |= NETIF_F_LRO;
>>>> +		dev->hw_features |= NETIF_F_LRO;
>>>
>>> So the issue is I think that the virtio "LRO" isn't really
>>> LRO, it's typically just GRO forwarded to guests.
>>> So these are easily re-split along MTU boundaries,
>>> which makes it ok to forward these across bridges.
>>>
>>> It's not nice that we don't document this in the spec,
>>> but it's the reality and people rely on this.
>>>
>>> For now, how about doing a custom thing and just disable/enable
>>> it as XDP is attached/detached?
>>
>> The annoying part about doing this is that ethtool will say the feature
>> is fixed, yet it will be changed by a seemingly unrelated operation. I'm
>> not sure I like the idea of automatically configuring the link via xdp_set.
> 
> I really don't like the idea of dropping performance
> by a factor of 3 for people bridging two virtio net
> interfaces.
> 
> So how about a simple approach for now, just disable
> XDP if GUEST_TSO is enabled?
> 
> We can discuss better approaches in next version.
> 

So the proposal is to add a check in XDP setup so that

  if (virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_TSO4) ||
      virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_TSO6))
	return -EOPNOTSUPP;

Or whatever is the most appropriate return code? Then we can
disable TSO via qemu-system with guest_tso4=off,guest_tso6=off for
XDP use cases.

Sounds like a reasonable start to me. I'll make the change; should this
go through DaveM's net-next tree, or do you want it in the virtio tree?
Either is fine with me.

Thanks,
John

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [net-next PATCH v5 1/6] net: virtio dynamically disable/enable LRO
  2016-12-14 17:01         ` John Fastabend
@ 2016-12-15 16:35           ` Michael S. Tsirkin
  0 siblings, 0 replies; 35+ messages in thread
From: Michael S. Tsirkin @ 2016-12-15 16:35 UTC (permalink / raw)
  To: John Fastabend
  Cc: daniel, shm, davem, tgraf, alexei.starovoitov, john.r.fastabend,
	netdev, brouer

On Wed, Dec 14, 2016 at 09:01:27AM -0800, John Fastabend wrote:
> On 16-12-14 05:31 AM, Michael S. Tsirkin wrote:
> > On Thu, Dec 08, 2016 at 04:04:58PM -0800, John Fastabend wrote:
> >> On 16-12-08 01:36 PM, Michael S. Tsirkin wrote:
> >>> On Wed, Dec 07, 2016 at 12:11:11PM -0800, John Fastabend wrote:
> >>>> This adds support for dynamically setting the LRO feature flag. The
> >>>> message to control guest features in the backend uses the
> >>>> CTRL_GUEST_OFFLOADS msg type.
> >>>>
> >>>> Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
> >>>> ---
> 
> [...]
> 
> >>>>  
> >>>>  static void virtnet_config_changed_work(struct work_struct *work)
> >>>> @@ -1815,6 +1846,12 @@ static int virtnet_probe(struct virtio_device *vdev)
> >>>>  	if (virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_CSUM))
> >>>>  		dev->features |= NETIF_F_RXCSUM;
> >>>>  
> >>>> +	if (virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_TSO4) &&
> >>>> +	    virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_TSO6)) {
> >>>> +		dev->features |= NETIF_F_LRO;
> >>>> +		dev->hw_features |= NETIF_F_LRO;
> >>>
> >>> So the issue is I think that the virtio "LRO" isn't really
> >>> LRO, it's typically just GRO forwarded to guests.
> >>> So these are easily re-split along MTU boundaries,
> >>> which makes it ok to forward these across bridges.
> >>>
> >>> It's not nice that we don't document this in the spec,
> >>> but it's the reality and people rely on this.
> >>>
> >>> For now, how about doing a custom thing and just disable/enable
> >>> it as XDP is attached/detached?
> >>
> >> The annoying part about doing this is that ethtool will say the feature
> >> is fixed, yet it will be changed by a seemingly unrelated operation. I'm
> >> not sure I like the idea of automatically configuring the link via xdp_set.
> > 
> > I really don't like the idea of dropping performance
> > by a factor of 3 for people bridging two virtio net
> > interfaces.
> > 
> > So how about a simple approach for now, just disable
> > XDP if GUEST_TSO is enabled?
> > 
> > We can discuss better approaches in next version.
> > 
> 
> So the proposal is to add a check in XDP setup so that
> 
>   if (virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_TSO4) ||
>       virtio_has_feature(vdev, VIRTIO_NET_F_GUEST_TSO6))
> 	return -EOPNOTSUPP;
> 
> Or whatever is the most appropriate return code? Then we can
> disable TSO via qemu-system with guest_tso4=off,guest_tso6=off for
> XDP use cases.

Right. It's a start.

> Sounds like a reasonable start to me. I'll make the change; should this
> go through DaveM's net-next tree, or do you want it in the virtio tree?
> Either is fine with me.
> 
> Thanks,
> John

I think I'll merge it because I'm tweaking RX processing too,
and this will likely conflict.

-- 
MST

^ permalink raw reply	[flat|nested] 35+ messages in thread

end of thread, other threads:[~2016-12-15 16:35 UTC | newest]

Thread overview: 35+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-12-07 20:10 [net-next PATCH v5 0/6] XDP for virtio_net John Fastabend
2016-12-07 20:11 ` [net-next PATCH v5 1/6] net: virtio dynamically disable/enable LRO John Fastabend
2016-12-08 21:36   ` Michael S. Tsirkin
2016-12-09  0:04     ` John Fastabend
2016-12-09  3:05       ` Michael S. Tsirkin
2016-12-14 13:31       ` Michael S. Tsirkin
2016-12-14 17:01         ` John Fastabend
2016-12-15 16:35           ` Michael S. Tsirkin
2016-12-07 20:11 ` [net-next PATCH v5 2/6] net: xdp: add invalid buffer warning John Fastabend
2016-12-07 20:11 ` [net-next PATCH v5 3/6] virtio_net: Add XDP support John Fastabend
2016-12-08  4:48   ` Michael S. Tsirkin
2016-12-08  5:14     ` John Fastabend
2016-12-08  5:54       ` Michael S. Tsirkin
2016-12-07 20:12 ` [net-next PATCH v5 4/6] virtio_net: add dedicated XDP transmit queues John Fastabend
2016-12-08  5:59   ` Michael S. Tsirkin
2016-12-08 17:10     ` John Fastabend
2016-12-07 20:12 ` [net-next PATCH v5 5/6] virtio_net: add XDP_TX support John Fastabend
2016-12-08  6:11   ` Michael S. Tsirkin
2016-12-08 18:18     ` John Fastabend
2016-12-08 21:08       ` Michael S. Tsirkin
2016-12-08 21:18       ` Michael S. Tsirkin
2016-12-08 21:25         ` John Fastabend
2016-12-08 21:45           ` Michael S. Tsirkin
2016-12-08 21:51             ` John Fastabend
2016-12-07 20:13 ` [net-next PATCH v5 6/6] virtio_net: xdp, add slowpath case for non contiguous buffers John Fastabend
2016-12-08 19:17 ` [net-next PATCH v5 0/6] XDP for virtio_net David Miller
2016-12-08 19:38   ` Alexei Starovoitov
2016-12-08 20:46     ` John Fastabend
2016-12-08 20:58       ` Michael S. Tsirkin
2016-12-08 21:10         ` Michael S. Tsirkin
2016-12-08 21:08       ` Alexei Starovoitov
2016-12-08 22:16       ` David Miller
2016-12-09  3:01         ` Michael S. Tsirkin
2016-12-13  8:46     ` XDP_DROP and XDP_TX (Was: Re: [net-next PATCH v5 0/6] XDP for virtio_net) Jesper Dangaard Brouer
2016-12-08 21:16   ` [net-next PATCH v5 0/6] XDP for virtio_net Michael S. Tsirkin
