* [net-next PATCH v2 0/5] XDP adjust head support for virtio
@ 2017-02-03  3:14 John Fastabend
  2017-02-03  3:14 ` [net-next PATCH v2 1/5] virtio_net: wrap rtnl_lock in test for calling with lock already held John Fastabend
                   ` (8 more replies)
  0 siblings, 9 replies; 24+ messages in thread
From: John Fastabend @ 2017-02-03  3:14 UTC (permalink / raw)
  To: kubakici, jasowang, ast, mst; +Cc: john.r.fastabend, netdev, john.fastabend

This series adds adjust head support for virtio. The following is my
test setup. I use qemu + virtio as follows,

./x86_64-softmmu/qemu-system-x86_64 \
  -hda /var/lib/libvirt/images/Fedora-test0.img \
  -m 4096  -enable-kvm -smp 2 -netdev tap,id=hn0,queues=4,vhost=on \
  -device virtio-net-pci,netdev=hn0,mq=on,guest_tso4=off,guest_tso6=off,guest_ecn=off,guest_ufo=off,vectors=9

In order to use XDP with virtio, TSO must be turned off in the host
until LRO is supported. The important fields in the above command line
are the following:

  guest_tso4=off,guest_tso6=off,guest_ecn=off,guest_ufo=off

Also note it is possible to consume more queues than can be supported,
because when XDP is enabled it attempts to use a queue per CPU for
XDP_TX transmits. My standard queue count is 'queues=4'.
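
The check that enforces this in the driver looks roughly like the
following (a simplified sketch of what virtnet_xdp_set() does, not the
exact code):

	u16 curr_qp, xdp_qp;

	curr_qp = vi->curr_queue_pairs - vi->xdp_queue_pairs;
	xdp_qp = prog ? nr_cpu_ids : 0;

	/* XDP_TX wants a dedicated TX queue per CPU on top of the
	 * currently active queue pairs, so the request can exceed
	 * what the device was started with.
	 */
	if (curr_qp + xdp_qp > vi->max_queue_pairs)
		return -ENOMEM;	/* raise 'queues=' on the qemu command line */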

After loading the VM I run the relevant XDP test programs in,

  ./samples/bpf

For this series I tested xdp1, xdp2, and xdp_tx_iptunnel. I usually test
with iperf (-d option to get bidirectional traffic), ping, and pktgen.
I also have a modified xdp1 that returns XDP_PASS on any packet to ensure
the normal traffic path to the stack continues to work with XDP loaded.
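
That pass-through program is trivial; a minimal sketch (illustrative,
assuming the usual samples/bpf build environment, not the exact
modified sample) looks like:

	#include <linux/bpf.h>

	#define SEC(NAME) __attribute__((section(NAME), used))

	SEC("xdp")
	int xdp_pass_all(struct xdp_md *ctx)
	{
		/* Unconditionally hand the packet to the normal stack */
		return XDP_PASS;
	}

	char _license[] SEC("license") = "GPL";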

It would be great to automate this soon. At the moment I do it by hand
which is starting to get tedious.

v2: original series dropped trace points after merge.

---

John Fastabend (5):
      virtio_net: wrap rtnl_lock in test for calling with lock already held
      virtio_net: factor out xdp handler for readability
      virtio_net: remove duplicate queue pair binding in XDP
      virtio_net: refactor freeze/restore logic into virtnet reset logic
      virtio_net: XDP support for adjust_head


 drivers/net/virtio_net.c |  338 ++++++++++++++++++++++++++++++----------------
 drivers/virtio/virtio.c  |   42 +++---
 include/linux/virtio.h   |    4 +
 3 files changed, 247 insertions(+), 137 deletions(-)


* [net-next PATCH v2 1/5] virtio_net: wrap rtnl_lock in test for calling with lock already held
  2017-02-03  3:14 [net-next PATCH v2 0/5] XDP adjust head support for virtio John Fastabend
@ 2017-02-03  3:14 ` John Fastabend
  2017-02-06  6:48   ` Jason Wang
  2017-02-03  3:15 ` [net-next PATCH v2 2/5] virtio_net: factor out xdp handler for readability John Fastabend
                   ` (7 subsequent siblings)
  8 siblings, 1 reply; 24+ messages in thread
From: John Fastabend @ 2017-02-03  3:14 UTC (permalink / raw)
  To: kubakici, jasowang, ast, mst; +Cc: john.r.fastabend, netdev, john.fastabend

For the XDP use case, and to allow ethtool reset tests, it is useful
to be able to use the reset paths from contexts where the rtnl lock is
already held.

This requires updating virtnet_set_queues and free_receive_bufs, the
two places where rtnl_lock is taken in virtio_net. To do this we use
the following pattern:

	_foo(...) { do stuff }
	foo(...)  { rtnl_lock(); _foo(...); rtnl_unlock(); }

This allows us to use the freeze()/restore() flow from both contexts.

Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
---
 drivers/net/virtio_net.c |   31 +++++++++++++++++++++----------
 1 file changed, 21 insertions(+), 10 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index bd22cf3..f8ba586 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -1342,7 +1342,7 @@ static void virtnet_ack_link_announce(struct virtnet_info *vi)
 	rtnl_unlock();
 }
 
-static int virtnet_set_queues(struct virtnet_info *vi, u16 queue_pairs)
+static int _virtnet_set_queues(struct virtnet_info *vi, u16 queue_pairs)
 {
 	struct scatterlist sg;
 	struct net_device *dev = vi->dev;
@@ -1368,6 +1368,16 @@ static int virtnet_set_queues(struct virtnet_info *vi, u16 queue_pairs)
 	return 0;
 }
 
+static int virtnet_set_queues(struct virtnet_info *vi, u16 queue_pairs)
+{
+	int err;
+
+	rtnl_lock();
+	err = _virtnet_set_queues(vi, queue_pairs);
+	rtnl_unlock();
+	return err;
+}
+
 static int virtnet_close(struct net_device *dev)
 {
 	struct virtnet_info *vi = netdev_priv(dev);
@@ -1620,7 +1630,7 @@ static int virtnet_set_channels(struct net_device *dev,
 		return -EINVAL;
 
 	get_online_cpus();
-	err = virtnet_set_queues(vi, queue_pairs);
+	err = _virtnet_set_queues(vi, queue_pairs);
 	if (!err) {
 		netif_set_real_num_tx_queues(dev, queue_pairs);
 		netif_set_real_num_rx_queues(dev, queue_pairs);
@@ -1752,7 +1762,7 @@ static int virtnet_xdp_set(struct net_device *dev, struct bpf_prog *prog)
 		return -ENOMEM;
 	}
 
-	err = virtnet_set_queues(vi, curr_qp + xdp_qp);
+	err = _virtnet_set_queues(vi, curr_qp + xdp_qp);
 	if (err) {
 		dev_warn(&dev->dev, "XDP Device queue allocation failure.\n");
 		return err;
@@ -1761,7 +1771,7 @@ static int virtnet_xdp_set(struct net_device *dev, struct bpf_prog *prog)
 	if (prog) {
 		prog = bpf_prog_add(prog, vi->max_queue_pairs - 1);
 		if (IS_ERR(prog)) {
-			virtnet_set_queues(vi, curr_qp);
+			_virtnet_set_queues(vi, curr_qp);
 			return PTR_ERR(prog);
 		}
 	}
@@ -1880,12 +1890,11 @@ static void virtnet_free_queues(struct virtnet_info *vi)
 	kfree(vi->sq);
 }
 
-static void free_receive_bufs(struct virtnet_info *vi)
+static void _free_receive_bufs(struct virtnet_info *vi)
 {
 	struct bpf_prog *old_prog;
 	int i;
 
-	rtnl_lock();
 	for (i = 0; i < vi->max_queue_pairs; i++) {
 		while (vi->rq[i].pages)
 			__free_pages(get_a_page(&vi->rq[i], GFP_KERNEL), 0);
@@ -1895,6 +1904,12 @@ static void free_receive_bufs(struct virtnet_info *vi)
 		if (old_prog)
 			bpf_prog_put(old_prog);
 	}
+}
+
+static void free_receive_bufs(struct virtnet_info *vi)
+{
+	rtnl_lock();
+	_free_receive_bufs(vi);
 	rtnl_unlock();
 }
 
@@ -2333,9 +2348,7 @@ static int virtnet_probe(struct virtio_device *vdev)
 		goto free_unregister_netdev;
 	}
 
-	rtnl_lock();
 	virtnet_set_queues(vi, vi->curr_queue_pairs);
-	rtnl_unlock();
 
 	/* Assume link up if device can't report link status,
 	   otherwise get link status from config. */
@@ -2444,9 +2457,7 @@ static int virtnet_restore(struct virtio_device *vdev)
 
 	netif_device_attach(vi->dev);
 
-	rtnl_lock();
 	virtnet_set_queues(vi, vi->curr_queue_pairs);
-	rtnl_unlock();
 
 	err = virtnet_cpu_notif_add(vi);
 	if (err)


* [net-next PATCH v2 2/5] virtio_net: factor out xdp handler for readability
  2017-02-03  3:14 [net-next PATCH v2 0/5] XDP adjust head support for virtio John Fastabend
  2017-02-03  3:14 ` [net-next PATCH v2 1/5] virtio_net: wrap rtnl_lock in test for calling with lock already held John Fastabend
@ 2017-02-03  3:15 ` John Fastabend
  2017-02-06  6:49   ` Jason Wang
  2017-02-03  3:15 ` [net-next PATCH v2 3/5] virtio_net: remove duplicate queue pair binding in XDP John Fastabend
                   ` (6 subsequent siblings)
  8 siblings, 1 reply; 24+ messages in thread
From: John Fastabend @ 2017-02-03  3:15 UTC (permalink / raw)
  To: kubakici, jasowang, ast, mst; +Cc: john.r.fastabend, netdev, john.fastabend

At this point do_xdp_prog is mostly if/else branches handling the
different modes of virtio_net, so remove it and run the program
directly in the per-mode receive handlers.

Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
---
 drivers/net/virtio_net.c |   86 +++++++++++++++++++---------------------------
 1 file changed, 35 insertions(+), 51 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index f8ba586..3b49363 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -399,52 +399,6 @@ static bool virtnet_xdp_xmit(struct virtnet_info *vi,
 	return true;
 }
 
-static u32 do_xdp_prog(struct virtnet_info *vi,
-		       struct receive_queue *rq,
-		       struct bpf_prog *xdp_prog,
-		       void *data, int len)
-{
-	int hdr_padded_len;
-	struct xdp_buff xdp;
-	void *buf;
-	unsigned int qp;
-	u32 act;
-
-	if (vi->mergeable_rx_bufs) {
-		hdr_padded_len = sizeof(struct virtio_net_hdr_mrg_rxbuf);
-		xdp.data = data + hdr_padded_len;
-		xdp.data_end = xdp.data + (len - vi->hdr_len);
-		buf = data;
-	} else { /* small buffers */
-		struct sk_buff *skb = data;
-
-		xdp.data = skb->data;
-		xdp.data_end = xdp.data + len;
-		buf = skb->data;
-	}
-
-	act = bpf_prog_run_xdp(xdp_prog, &xdp);
-	switch (act) {
-	case XDP_PASS:
-		return XDP_PASS;
-	case XDP_TX:
-		qp = vi->curr_queue_pairs -
-			vi->xdp_queue_pairs +
-			smp_processor_id();
-		xdp.data = buf;
-		if (unlikely(!virtnet_xdp_xmit(vi, rq, &vi->sq[qp], &xdp,
-					       data)))
-			trace_xdp_exception(vi->dev, xdp_prog, act);
-		return XDP_TX;
-	default:
-		bpf_warn_invalid_xdp_action(act);
-	case XDP_ABORTED:
-		trace_xdp_exception(vi->dev, xdp_prog, act);
-	case XDP_DROP:
-		return XDP_DROP;
-	}
-}
-
 static struct sk_buff *receive_small(struct net_device *dev,
 				     struct virtnet_info *vi,
 				     struct receive_queue *rq,
@@ -460,19 +414,34 @@ static struct sk_buff *receive_small(struct net_device *dev,
 	xdp_prog = rcu_dereference(rq->xdp_prog);
 	if (xdp_prog) {
 		struct virtio_net_hdr_mrg_rxbuf *hdr = buf;
+		struct xdp_buff xdp;
+		unsigned int qp;
 		u32 act;
 
 		if (unlikely(hdr->hdr.gso_type || hdr->hdr.flags))
 			goto err_xdp;
-		act = do_xdp_prog(vi, rq, xdp_prog, skb, len);
+
+		xdp.data = skb->data;
+		xdp.data_end = xdp.data + len;
+		act = bpf_prog_run_xdp(xdp_prog, &xdp);
+
 		switch (act) {
 		case XDP_PASS:
 			break;
 		case XDP_TX:
+			qp = vi->curr_queue_pairs -
+				vi->xdp_queue_pairs +
+				smp_processor_id();
+			if (unlikely(!virtnet_xdp_xmit(vi, rq, &vi->sq[qp],
+						       &xdp, skb)))
+				trace_xdp_exception(vi->dev, xdp_prog, act);
 			rcu_read_unlock();
 			goto xdp_xmit;
-		case XDP_DROP:
 		default:
+			bpf_warn_invalid_xdp_action(act);
+		case XDP_ABORTED:
+			trace_xdp_exception(vi->dev, xdp_prog, act);
+		case XDP_DROP:
 			goto err_xdp;
 		}
 	}
@@ -590,6 +559,9 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
 	xdp_prog = rcu_dereference(rq->xdp_prog);
 	if (xdp_prog) {
 		struct page *xdp_page;
+		struct xdp_buff xdp;
+		unsigned int qp;
+		void *data;
 		u32 act;
 
 		/* This happens when rx buffer size is underestimated */
@@ -612,8 +584,11 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
 		if (unlikely(hdr->hdr.gso_type))
 			goto err_xdp;
 
-		act = do_xdp_prog(vi, rq, xdp_prog,
-				  page_address(xdp_page) + offset, len);
+		data = page_address(xdp_page) + offset;
+		xdp.data = data + vi->hdr_len;
+		xdp.data_end = xdp.data + (len - vi->hdr_len);
+		act = bpf_prog_run_xdp(xdp_prog, &xdp);
+
 		switch (act) {
 		case XDP_PASS:
 			/* We can only create skb based on xdp_page. */
@@ -627,13 +602,22 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
 			}
 			break;
 		case XDP_TX:
+			qp = vi->curr_queue_pairs -
+				vi->xdp_queue_pairs +
+				smp_processor_id();
+			if (unlikely(!virtnet_xdp_xmit(vi, rq, &vi->sq[qp],
+						       &xdp, data)))
+				trace_xdp_exception(vi->dev, xdp_prog, act);
 			ewma_pkt_len_add(&rq->mrg_avg_pkt_len, len);
 			if (unlikely(xdp_page != page))
 				goto err_xdp;
 			rcu_read_unlock();
 			goto xdp_xmit;
-		case XDP_DROP:
 		default:
+			bpf_warn_invalid_xdp_action(act);
+		case XDP_ABORTED:
+			trace_xdp_exception(vi->dev, xdp_prog, act);
+		case XDP_DROP:
 			if (unlikely(xdp_page != page))
 				__free_pages(xdp_page, 0);
 			ewma_pkt_len_add(&rq->mrg_avg_pkt_len, len);


* [net-next PATCH v2 3/5] virtio_net: remove duplicate queue pair binding in XDP
  2017-02-03  3:14 [net-next PATCH v2 0/5] XDP adjust head support for virtio John Fastabend
  2017-02-03  3:14 ` [net-next PATCH v2 1/5] virtio_net: wrap rtnl_lock in test for calling with lock already held John Fastabend
  2017-02-03  3:15 ` [net-next PATCH v2 2/5] virtio_net: factor out xdp handler for readability John Fastabend
@ 2017-02-03  3:15 ` John Fastabend
  2017-02-06  7:06   ` Jason Wang
  2017-02-03  3:16 ` [net-next PATCH v2 4/5] virtio_net: refactor freeze/restore logic into virtnet reset logic John Fastabend
                   ` (5 subsequent siblings)
  8 siblings, 1 reply; 24+ messages in thread
From: John Fastabend @ 2017-02-03  3:15 UTC (permalink / raw)
  To: kubakici, jasowang, ast, mst; +Cc: john.r.fastabend, netdev, john.fastabend

Factor out the duplicated queue pair (qp) assignment into
virtnet_xdp_xmit().

Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
---
 drivers/net/virtio_net.c |   20 +++++++-------------
 1 file changed, 7 insertions(+), 13 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 3b49363..dba5afb 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -341,15 +341,19 @@ static struct sk_buff *page_to_skb(struct virtnet_info *vi,
 
 static bool virtnet_xdp_xmit(struct virtnet_info *vi,
 			     struct receive_queue *rq,
-			     struct send_queue *sq,
 			     struct xdp_buff *xdp,
 			     void *data)
 {
 	struct virtio_net_hdr_mrg_rxbuf *hdr;
 	unsigned int num_sg, len;
+	struct send_queue *sq;
+	unsigned int qp;
 	void *xdp_sent;
 	int err;
 
+	qp = vi->curr_queue_pairs - vi->xdp_queue_pairs + smp_processor_id();
+	sq = &vi->sq[qp];
+
 	/* Free up any pending old buffers before queueing new ones. */
 	while ((xdp_sent = virtqueue_get_buf(sq->vq, &len)) != NULL) {
 		if (vi->mergeable_rx_bufs) {
@@ -415,7 +419,6 @@ static struct sk_buff *receive_small(struct net_device *dev,
 	if (xdp_prog) {
 		struct virtio_net_hdr_mrg_rxbuf *hdr = buf;
 		struct xdp_buff xdp;
-		unsigned int qp;
 		u32 act;
 
 		if (unlikely(hdr->hdr.gso_type || hdr->hdr.flags))
@@ -429,11 +432,7 @@ static struct sk_buff *receive_small(struct net_device *dev,
 		case XDP_PASS:
 			break;
 		case XDP_TX:
-			qp = vi->curr_queue_pairs -
-				vi->xdp_queue_pairs +
-				smp_processor_id();
-			if (unlikely(!virtnet_xdp_xmit(vi, rq, &vi->sq[qp],
-						       &xdp, skb)))
+			if (unlikely(!virtnet_xdp_xmit(vi, rq, &xdp, skb)))
 				trace_xdp_exception(vi->dev, xdp_prog, act);
 			rcu_read_unlock();
 			goto xdp_xmit;
@@ -560,7 +559,6 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
 	if (xdp_prog) {
 		struct page *xdp_page;
 		struct xdp_buff xdp;
-		unsigned int qp;
 		void *data;
 		u32 act;
 
@@ -602,11 +600,7 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
 			}
 			break;
 		case XDP_TX:
-			qp = vi->curr_queue_pairs -
-				vi->xdp_queue_pairs +
-				smp_processor_id();
-			if (unlikely(!virtnet_xdp_xmit(vi, rq, &vi->sq[qp],
-						       &xdp, data)))
+			if (unlikely(!virtnet_xdp_xmit(vi, rq, &xdp, data)))
 				trace_xdp_exception(vi->dev, xdp_prog, act);
 			ewma_pkt_len_add(&rq->mrg_avg_pkt_len, len);
 			if (unlikely(xdp_page != page))


* [net-next PATCH v2 4/5] virtio_net: refactor freeze/restore logic into virtnet reset logic
  2017-02-03  3:14 [net-next PATCH v2 0/5] XDP adjust head support for virtio John Fastabend
                   ` (2 preceding siblings ...)
  2017-02-03  3:15 ` [net-next PATCH v2 3/5] virtio_net: remove duplicate queue pair binding in XDP John Fastabend
@ 2017-02-03  3:16 ` John Fastabend
  2017-02-06  7:07   ` Jason Wang
  2017-02-03  3:16 ` [net-next PATCH v2 5/5] virtio_net: XDP support for adjust_head John Fastabend
                   ` (4 subsequent siblings)
  8 siblings, 1 reply; 24+ messages in thread
From: John Fastabend @ 2017-02-03  3:16 UTC (permalink / raw)
  To: kubakici, jasowang, ast, mst; +Cc: john.r.fastabend, netdev, john.fastabend

For XDP we will need to reset the queues to allow the buffer headroom
to be configured. In order to do this we need to essentially run the
freeze()/restore() code path. Unfortunately, the locking requirements
of the freeze/restore and reset paths differ, so we cannot simply
reuse the code.

This patch refactors the code path and adds a reset helper routine.

Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
---
 drivers/net/virtio_net.c |   75 ++++++++++++++++++++++++++++------------------
 drivers/virtio/virtio.c  |   42 ++++++++++++++------------
 include/linux/virtio.h   |    4 ++
 3 files changed, 73 insertions(+), 48 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index dba5afb..07f9076 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -1698,6 +1698,49 @@ static void virtnet_init_settings(struct net_device *dev)
 	.set_settings = virtnet_set_settings,
 };
 
+static void virtnet_freeze_down(struct virtio_device *vdev)
+{
+	struct virtnet_info *vi = vdev->priv;
+	int i;
+
+	/* Make sure no work handler is accessing the device */
+	flush_work(&vi->config_work);
+
+	netif_device_detach(vi->dev);
+	cancel_delayed_work_sync(&vi->refill);
+
+	if (netif_running(vi->dev)) {
+		for (i = 0; i < vi->max_queue_pairs; i++)
+			napi_disable(&vi->rq[i].napi);
+	}
+}
+
+static int init_vqs(struct virtnet_info *vi);
+
+static int virtnet_restore_up(struct virtio_device *vdev)
+{
+	struct virtnet_info *vi = vdev->priv;
+	int err, i;
+
+	err = init_vqs(vi);
+	if (err)
+		return err;
+
+	virtio_device_ready(vdev);
+
+	if (netif_running(vi->dev)) {
+		for (i = 0; i < vi->curr_queue_pairs; i++)
+			if (!try_fill_recv(vi, &vi->rq[i], GFP_KERNEL))
+				schedule_delayed_work(&vi->refill, 0);
+
+		for (i = 0; i < vi->max_queue_pairs; i++)
+			virtnet_napi_enable(&vi->rq[i]);
+	}
+
+	netif_device_attach(vi->dev);
+	return err;
+}
+
 static int virtnet_xdp_set(struct net_device *dev, struct bpf_prog *prog)
 {
 	unsigned long int max_sz = PAGE_SIZE - sizeof(struct padded_vnet_hdr);
@@ -2393,21 +2436,9 @@ static void virtnet_remove(struct virtio_device *vdev)
 static int virtnet_freeze(struct virtio_device *vdev)
 {
 	struct virtnet_info *vi = vdev->priv;
-	int i;
 
 	virtnet_cpu_notif_remove(vi);
-
-	/* Make sure no work handler is accessing the device */
-	flush_work(&vi->config_work);
-
-	netif_device_detach(vi->dev);
-	cancel_delayed_work_sync(&vi->refill);
-
-	if (netif_running(vi->dev)) {
-		for (i = 0; i < vi->max_queue_pairs; i++)
-			napi_disable(&vi->rq[i].napi);
-	}
-
+	virtnet_freeze_down(vdev);
 	remove_vq_common(vi);
 
 	return 0;
@@ -2416,25 +2447,11 @@ static int virtnet_freeze(struct virtio_device *vdev)
 static int virtnet_restore(struct virtio_device *vdev)
 {
 	struct virtnet_info *vi = vdev->priv;
-	int err, i;
+	int err;
 
-	err = init_vqs(vi);
+	err = virtnet_restore_up(vdev);
 	if (err)
 		return err;
-
-	virtio_device_ready(vdev);
-
-	if (netif_running(vi->dev)) {
-		for (i = 0; i < vi->curr_queue_pairs; i++)
-			if (!try_fill_recv(vi, &vi->rq[i], GFP_KERNEL))
-				schedule_delayed_work(&vi->refill, 0);
-
-		for (i = 0; i < vi->max_queue_pairs; i++)
-			virtnet_napi_enable(&vi->rq[i]);
-	}
-
-	netif_device_attach(vi->dev);
-
 	virtnet_set_queues(vi, vi->curr_queue_pairs);
 
 	err = virtnet_cpu_notif_add(vi);
diff --git a/drivers/virtio/virtio.c b/drivers/virtio/virtio.c
index 7062bb0..400d70b 100644
--- a/drivers/virtio/virtio.c
+++ b/drivers/virtio/virtio.c
@@ -100,11 +100,6 @@ static int virtio_uevent(struct device *_dv, struct kobj_uevent_env *env)
 			      dev->id.device, dev->id.vendor);
 }
 
-static void add_status(struct virtio_device *dev, unsigned status)
-{
-	dev->config->set_status(dev, dev->config->get_status(dev) | status);
-}
-
 void virtio_check_driver_offered_feature(const struct virtio_device *vdev,
 					 unsigned int fbit)
 {
@@ -145,14 +140,15 @@ void virtio_config_changed(struct virtio_device *dev)
 }
 EXPORT_SYMBOL_GPL(virtio_config_changed);
 
-static void virtio_config_disable(struct virtio_device *dev)
+void virtio_config_disable(struct virtio_device *dev)
 {
 	spin_lock_irq(&dev->config_lock);
 	dev->config_enabled = false;
 	spin_unlock_irq(&dev->config_lock);
 }
+EXPORT_SYMBOL_GPL(virtio_config_disable);
 
-static void virtio_config_enable(struct virtio_device *dev)
+void virtio_config_enable(struct virtio_device *dev)
 {
 	spin_lock_irq(&dev->config_lock);
 	dev->config_enabled = true;
@@ -161,8 +157,15 @@ static void virtio_config_enable(struct virtio_device *dev)
 	dev->config_change_pending = false;
 	spin_unlock_irq(&dev->config_lock);
 }
+EXPORT_SYMBOL_GPL(virtio_config_enable);
+
+void virtio_add_status(struct virtio_device *dev, unsigned int status)
+{
+	dev->config->set_status(dev, dev->config->get_status(dev) | status);
+}
+EXPORT_SYMBOL_GPL(virtio_add_status);
 
-static int virtio_finalize_features(struct virtio_device *dev)
+int virtio_finalize_features(struct virtio_device *dev)
 {
 	int ret = dev->config->finalize_features(dev);
 	unsigned status;
@@ -173,7 +176,7 @@ static int virtio_finalize_features(struct virtio_device *dev)
 	if (!virtio_has_feature(dev, VIRTIO_F_VERSION_1))
 		return 0;
 
-	add_status(dev, VIRTIO_CONFIG_S_FEATURES_OK);
+	virtio_add_status(dev, VIRTIO_CONFIG_S_FEATURES_OK);
 	status = dev->config->get_status(dev);
 	if (!(status & VIRTIO_CONFIG_S_FEATURES_OK)) {
 		dev_err(&dev->dev, "virtio: device refuses features: %x\n",
@@ -182,6 +185,7 @@ static int virtio_finalize_features(struct virtio_device *dev)
 	}
 	return 0;
 }
+EXPORT_SYMBOL_GPL(virtio_finalize_features);
 
 static int virtio_dev_probe(struct device *_d)
 {
@@ -193,7 +197,7 @@ static int virtio_dev_probe(struct device *_d)
 	u64 driver_features_legacy;
 
 	/* We have a driver! */
-	add_status(dev, VIRTIO_CONFIG_S_DRIVER);
+	virtio_add_status(dev, VIRTIO_CONFIG_S_DRIVER);
 
 	/* Figure out what features the device supports. */
 	device_features = dev->config->get_features(dev);
@@ -247,7 +251,7 @@ static int virtio_dev_probe(struct device *_d)
 
 	return 0;
 err:
-	add_status(dev, VIRTIO_CONFIG_S_FAILED);
+	virtio_add_status(dev, VIRTIO_CONFIG_S_FAILED);
 	return err;
 
 }
@@ -265,7 +269,7 @@ static int virtio_dev_remove(struct device *_d)
 	WARN_ON_ONCE(dev->config->get_status(dev));
 
 	/* Acknowledge the device's existence again. */
-	add_status(dev, VIRTIO_CONFIG_S_ACKNOWLEDGE);
+	virtio_add_status(dev, VIRTIO_CONFIG_S_ACKNOWLEDGE);
 	return 0;
 }
 
@@ -316,7 +320,7 @@ int register_virtio_device(struct virtio_device *dev)
 	dev->config->reset(dev);
 
 	/* Acknowledge that we've seen the device. */
-	add_status(dev, VIRTIO_CONFIG_S_ACKNOWLEDGE);
+	virtio_add_status(dev, VIRTIO_CONFIG_S_ACKNOWLEDGE);
 
 	INIT_LIST_HEAD(&dev->vqs);
 
@@ -325,7 +329,7 @@ int register_virtio_device(struct virtio_device *dev)
 	err = device_register(&dev->dev);
 out:
 	if (err)
-		add_status(dev, VIRTIO_CONFIG_S_FAILED);
+		virtio_add_status(dev, VIRTIO_CONFIG_S_FAILED);
 	return err;
 }
 EXPORT_SYMBOL_GPL(register_virtio_device);
@@ -365,18 +369,18 @@ int virtio_device_restore(struct virtio_device *dev)
 	dev->config->reset(dev);
 
 	/* Acknowledge that we've seen the device. */
-	add_status(dev, VIRTIO_CONFIG_S_ACKNOWLEDGE);
+	virtio_add_status(dev, VIRTIO_CONFIG_S_ACKNOWLEDGE);
 
 	/* Maybe driver failed before freeze.
 	 * Restore the failed status, for debugging. */
 	if (dev->failed)
-		add_status(dev, VIRTIO_CONFIG_S_FAILED);
+		virtio_add_status(dev, VIRTIO_CONFIG_S_FAILED);
 
 	if (!drv)
 		return 0;
 
 	/* We have a driver! */
-	add_status(dev, VIRTIO_CONFIG_S_DRIVER);
+	virtio_add_status(dev, VIRTIO_CONFIG_S_DRIVER);
 
 	ret = virtio_finalize_features(dev);
 	if (ret)
@@ -389,14 +393,14 @@ int virtio_device_restore(struct virtio_device *dev)
 	}
 
 	/* Finally, tell the device we're all set */
-	add_status(dev, VIRTIO_CONFIG_S_DRIVER_OK);
+	virtio_add_status(dev, VIRTIO_CONFIG_S_DRIVER_OK);
 
 	virtio_config_enable(dev);
 
 	return 0;
 
 err:
-	add_status(dev, VIRTIO_CONFIG_S_FAILED);
+	virtio_add_status(dev, VIRTIO_CONFIG_S_FAILED);
 	return ret;
 }
 EXPORT_SYMBOL_GPL(virtio_device_restore);
diff --git a/include/linux/virtio.h b/include/linux/virtio.h
index d5eb547..04b0d3f 100644
--- a/include/linux/virtio.h
+++ b/include/linux/virtio.h
@@ -132,12 +132,16 @@ static inline struct virtio_device *dev_to_virtio(struct device *_dev)
 	return container_of(_dev, struct virtio_device, dev);
 }
 
+void virtio_add_status(struct virtio_device *dev, unsigned int status);
 int register_virtio_device(struct virtio_device *dev);
 void unregister_virtio_device(struct virtio_device *dev);
 
 void virtio_break_device(struct virtio_device *dev);
 
 void virtio_config_changed(struct virtio_device *dev);
+void virtio_config_disable(struct virtio_device *dev);
+void virtio_config_enable(struct virtio_device *dev);
+int virtio_finalize_features(struct virtio_device *dev);
 #ifdef CONFIG_PM_SLEEP
 int virtio_device_freeze(struct virtio_device *dev);
 int virtio_device_restore(struct virtio_device *dev);


* [net-next PATCH v2 5/5] virtio_net: XDP support for adjust_head
  2017-02-03  3:14 [net-next PATCH v2 0/5] XDP adjust head support for virtio John Fastabend
                   ` (3 preceding siblings ...)
  2017-02-03  3:16 ` [net-next PATCH v2 4/5] virtio_net: refactor freeze/restore logic into virtnet reset logic John Fastabend
@ 2017-02-03  3:16 ` John Fastabend
  2017-02-03  4:04   ` Michael S. Tsirkin
  2017-02-06  7:08   ` Jason Wang
  2017-02-03  3:29 ` [net-next PATCH v2 0/5] XDP adjust head support for virtio Alexei Starovoitov
                   ` (3 subsequent siblings)
  8 siblings, 2 replies; 24+ messages in thread
From: John Fastabend @ 2017-02-03  3:16 UTC (permalink / raw)
  To: kubakici, jasowang, ast, mst; +Cc: john.r.fastabend, netdev, john.fastabend

Add support for XDP adjust head by allocating a 256B header region
that XDP programs can grow into. This is only enabled when an XDP
program is loaded.
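
As an illustration of what this enables, a program using the new
headroom might look like the following. This is only a sketch, not
part of this series; the 4-byte tag it pushes is hypothetical, and the
helper declaration follows the samples/bpf style:

	#include <linux/bpf.h>

	#define SEC(NAME) __attribute__((section(NAME), used))

	static int (*bpf_xdp_adjust_head)(void *ctx, int delta) =
		(void *) BPF_FUNC_xdp_adjust_head;

	SEC("xdp")
	int xdp_push_tag(struct xdp_md *ctx)
	{
		void *data, *data_end;

		/* A negative delta grows the packet head into the
		 * reserved headroom; the helper fails if the room is
		 * exhausted.
		 */
		if (bpf_xdp_adjust_head(ctx, -4))
			return XDP_DROP;

		data = (void *)(long)ctx->data;
		data_end = (void *)(long)ctx->data_end;
		if (data + 4 > data_end)	/* verifier bounds check */
			return XDP_DROP;

		__builtin_memset(data, 0, 4);	/* hypothetical tag */
		return XDP_TX;
	}

	char _license[] SEC("license") = "GPL";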

In order to ensure that we do not have to unwind the queue headroom,
push queue setup below bpf_prog_add. It reads better to unwind a prog
ref than to make another queue setup call.

At the moment this code must do a full reset to ensure old buffers
without headroom on program add, or with headroom on program removal,
are not used incorrectly in the datapath. Ideally we would only have
to disable/enable the RX queues being updated, but there is no API in
virtio to do this at the moment, so use the big hammer. In practice
it is likely not that big of a problem, as this will only happen when
XDP is enabled/disabled; changing programs does not require the
reset. There is some risk that the driver may either have an
allocation failure or for some reason fail to correctly negotiate
with the underlying backend; in this case the driver will be left
uninitialized. I have never seen this happen on my test systems, and
for what it's worth the same failure case can occur from probe and
other contexts in the virtio framework.

Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
---
 drivers/net/virtio_net.c |  154 +++++++++++++++++++++++++++++++++++++---------
 1 file changed, 125 insertions(+), 29 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 07f9076..52a18b8 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -42,6 +42,9 @@
 #define GOOD_PACKET_LEN (ETH_HLEN + VLAN_HLEN + ETH_DATA_LEN)
 #define GOOD_COPY_LEN	128
 
+/* Amount of XDP headroom to prepend to packets for use by xdp_adjust_head */
+#define VIRTIO_XDP_HEADROOM 256
+
 /* RX packet size EWMA. The average packet size is used to determine the packet
  * buffer size when refilling RX rings. As the entire RX ring may be refilled
  * at once, the weight is chosen so that the EWMA will be insensitive to short-
@@ -368,6 +371,7 @@ static bool virtnet_xdp_xmit(struct virtnet_info *vi,
 	}
 
 	if (vi->mergeable_rx_bufs) {
+		xdp->data -= sizeof(struct virtio_net_hdr_mrg_rxbuf);
 		/* Zero header and leave csum up to XDP layers */
 		hdr = xdp->data;
 		memset(hdr, 0, vi->hdr_len);
@@ -384,7 +388,9 @@ static bool virtnet_xdp_xmit(struct virtnet_info *vi,
 		num_sg = 2;
 		sg_init_table(sq->sg, 2);
 		sg_set_buf(sq->sg, hdr, vi->hdr_len);
-		skb_to_sgvec(skb, sq->sg + 1, 0, skb->len);
+		skb_to_sgvec(skb, sq->sg + 1,
+			     xdp->data - xdp->data_hard_start,
+			     xdp->data_end - xdp->data);
 	}
 	err = virtqueue_add_outbuf(sq->vq, sq->sg, num_sg,
 				   data, GFP_ATOMIC);
@@ -412,7 +418,6 @@ static struct sk_buff *receive_small(struct net_device *dev,
 	struct bpf_prog *xdp_prog;
 
 	len -= vi->hdr_len;
-	skb_trim(skb, len);
 
 	rcu_read_lock();
 	xdp_prog = rcu_dereference(rq->xdp_prog);
@@ -424,12 +429,16 @@ static struct sk_buff *receive_small(struct net_device *dev,
 		if (unlikely(hdr->hdr.gso_type || hdr->hdr.flags))
 			goto err_xdp;
 
-		xdp.data = skb->data;
+		xdp.data_hard_start = skb->data;
+		xdp.data = skb->data + VIRTIO_XDP_HEADROOM;
 		xdp.data_end = xdp.data + len;
 		act = bpf_prog_run_xdp(xdp_prog, &xdp);
 
 		switch (act) {
 		case XDP_PASS:
+			/* Recalculate length in case bpf program changed it */
+			__skb_pull(skb, xdp.data - xdp.data_hard_start);
+			len = xdp.data_end - xdp.data;
 			break;
 		case XDP_TX:
 			if (unlikely(!virtnet_xdp_xmit(vi, rq, &xdp, skb)))
@@ -446,6 +455,7 @@ static struct sk_buff *receive_small(struct net_device *dev,
 	}
 	rcu_read_unlock();
 
+	skb_trim(skb, len);
 	return skb;
 
 err_xdp:
@@ -494,7 +504,7 @@ static struct page *xdp_linearize_page(struct receive_queue *rq,
 				       unsigned int *len)
 {
 	struct page *page = alloc_page(GFP_ATOMIC);
-	unsigned int page_off = 0;
+	unsigned int page_off = VIRTIO_XDP_HEADROOM;
 
 	if (!page)
 		return NULL;
@@ -530,7 +540,8 @@ static struct page *xdp_linearize_page(struct receive_queue *rq,
 		put_page(p);
 	}
 
-	*len = page_off;
+	/* Headroom does not contribute to packet length */
+	*len = page_off - VIRTIO_XDP_HEADROOM;
 	return page;
 err_buf:
 	__free_pages(page, 0);
@@ -569,7 +580,7 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
 						      page, offset, &len);
 			if (!xdp_page)
 				goto err_xdp;
-			offset = 0;
+			offset = VIRTIO_XDP_HEADROOM;
 		} else {
 			xdp_page = page;
 		}
@@ -582,19 +593,30 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
 		if (unlikely(hdr->hdr.gso_type))
 			goto err_xdp;
 
+		/* Allow consuming headroom but reserve enough space to push
+		 * the descriptor on if we get an XDP_TX return code.
+		 */
 		data = page_address(xdp_page) + offset;
+		xdp.data_hard_start = data - VIRTIO_XDP_HEADROOM + vi->hdr_len;
 		xdp.data = data + vi->hdr_len;
 		xdp.data_end = xdp.data + (len - vi->hdr_len);
 		act = bpf_prog_run_xdp(xdp_prog, &xdp);
 
 		switch (act) {
 		case XDP_PASS:
+			/* recalculate offset to account for any header
+			 * adjustments. Note other cases do not build an
+			 * skb and avoid using offset
+			 */
+			offset = xdp.data -
+					page_address(xdp_page) - vi->hdr_len;
+
 			/* We can only create skb based on xdp_page. */
 			if (unlikely(xdp_page != page)) {
 				rcu_read_unlock();
 				put_page(page);
 				head_skb = page_to_skb(vi, rq, xdp_page,
-						       0, len, PAGE_SIZE);
+						       offset, len, PAGE_SIZE);
 				ewma_pkt_len_add(&rq->mrg_avg_pkt_len, len);
 				return head_skb;
 			}
@@ -761,23 +783,30 @@ static void receive_buf(struct virtnet_info *vi, struct receive_queue *rq,
 	dev_kfree_skb(skb);
 }
 
+static unsigned int virtnet_get_headroom(struct virtnet_info *vi)
+{
+	return vi->xdp_queue_pairs ? VIRTIO_XDP_HEADROOM : 0;
+}
+
 static int add_recvbuf_small(struct virtnet_info *vi, struct receive_queue *rq,
 			     gfp_t gfp)
 {
+	int headroom = GOOD_PACKET_LEN + virtnet_get_headroom(vi);
+	unsigned int xdp_headroom = virtnet_get_headroom(vi);
 	struct sk_buff *skb;
 	struct virtio_net_hdr_mrg_rxbuf *hdr;
 	int err;
 
-	skb = __netdev_alloc_skb_ip_align(vi->dev, GOOD_PACKET_LEN, gfp);
+	skb = __netdev_alloc_skb_ip_align(vi->dev, headroom, gfp);
 	if (unlikely(!skb))
 		return -ENOMEM;
 
-	skb_put(skb, GOOD_PACKET_LEN);
+	skb_put(skb, headroom);
 
 	hdr = skb_vnet_hdr(skb);
 	sg_init_table(rq->sg, 2);
 	sg_set_buf(rq->sg, hdr, vi->hdr_len);
-	skb_to_sgvec(skb, rq->sg + 1, 0, skb->len);
+	skb_to_sgvec(skb, rq->sg + 1, xdp_headroom, skb->len - xdp_headroom);
 
 	err = virtqueue_add_inbuf(rq->vq, rq->sg, 2, skb, gfp);
 	if (err < 0)
@@ -845,24 +874,27 @@ static unsigned int get_mergeable_buf_len(struct ewma_pkt_len *avg_pkt_len)
 	return ALIGN(len, MERGEABLE_BUFFER_ALIGN);
 }
 
-static int add_recvbuf_mergeable(struct receive_queue *rq, gfp_t gfp)
+static int add_recvbuf_mergeable(struct virtnet_info *vi,
+				 struct receive_queue *rq, gfp_t gfp)
 {
 	struct page_frag *alloc_frag = &rq->alloc_frag;
+	unsigned int headroom = virtnet_get_headroom(vi);
 	char *buf;
 	unsigned long ctx;
 	int err;
 	unsigned int len, hole;
 
 	len = get_mergeable_buf_len(&rq->mrg_avg_pkt_len);
-	if (unlikely(!skb_page_frag_refill(len, alloc_frag, gfp)))
+	if (unlikely(!skb_page_frag_refill(len + headroom, alloc_frag, gfp)))
 		return -ENOMEM;
 
 	buf = (char *)page_address(alloc_frag->page) + alloc_frag->offset;
+	buf += headroom; /* advance address leaving hole at front of pkt */
 	ctx = mergeable_buf_to_ctx(buf, len);
 	get_page(alloc_frag->page);
-	alloc_frag->offset += len;
+	alloc_frag->offset += len + headroom;
 	hole = alloc_frag->size - alloc_frag->offset;
-	if (hole < len) {
+	if (hole < len + headroom) {
 		/* To avoid internal fragmentation, if there is very likely not
 		 * enough space for another buffer, add the remaining space to
 		 * the current buffer. This extra space is not included in
@@ -896,7 +928,7 @@ static bool try_fill_recv(struct virtnet_info *vi, struct receive_queue *rq,
 	gfp |= __GFP_COLD;
 	do {
 		if (vi->mergeable_rx_bufs)
-			err = add_recvbuf_mergeable(rq, gfp);
+			err = add_recvbuf_mergeable(vi, rq, gfp);
 		else if (vi->big_packets)
 			err = add_recvbuf_big(vi, rq, gfp);
 		else
@@ -1716,6 +1748,7 @@ static void virtnet_freeze_down(struct virtio_device *vdev)
 }
 
 static int init_vqs(struct virtnet_info *vi);
+static void _remove_vq_common(struct virtnet_info *vi);
 
 static int virtnet_restore_up(struct virtio_device *vdev)
 {
@@ -1741,19 +1774,47 @@ static int virtnet_restore_up(struct virtio_device *vdev)
 	return err;
 }
 
+static int virtnet_reset(struct virtnet_info *vi)
+{
+	struct virtio_device *dev = vi->vdev;
+	int ret;
+
+	virtio_config_disable(dev);
+	dev->failed = dev->config->get_status(dev) & VIRTIO_CONFIG_S_FAILED;
+	virtnet_freeze_down(dev);
+	_remove_vq_common(vi);
+
+	dev->config->reset(dev);
+	virtio_add_status(dev, VIRTIO_CONFIG_S_ACKNOWLEDGE);
+	virtio_add_status(dev, VIRTIO_CONFIG_S_DRIVER);
+
+	ret = virtio_finalize_features(dev);
+	if (ret)
+		goto err;
+
+	ret = virtnet_restore_up(dev);
+	if (ret)
+		goto err;
+	ret = _virtnet_set_queues(vi, vi->curr_queue_pairs);
+	if (ret)
+		goto err;
+
+	virtio_add_status(dev, VIRTIO_CONFIG_S_DRIVER_OK);
+	virtio_config_enable(dev);
+	return 0;
+err:
+	virtio_add_status(dev, VIRTIO_CONFIG_S_FAILED);
+	return ret;
+}
+
 static int virtnet_xdp_set(struct net_device *dev, struct bpf_prog *prog)
 {
 	unsigned long int max_sz = PAGE_SIZE - sizeof(struct padded_vnet_hdr);
 	struct virtnet_info *vi = netdev_priv(dev);
 	struct bpf_prog *old_prog;
-	u16 xdp_qp = 0, curr_qp;
+	u16 oxdp_qp, xdp_qp = 0, curr_qp;
 	int i, err;
 
-	if (prog && prog->xdp_adjust_head) {
-		netdev_warn(dev, "Does not support bpf_xdp_adjust_head()\n");
-		return -EOPNOTSUPP;
-	}
-
 	if (virtio_has_feature(vi->vdev, VIRTIO_NET_F_GUEST_TSO4) ||
 	    virtio_has_feature(vi->vdev, VIRTIO_NET_F_GUEST_TSO6) ||
 	    virtio_has_feature(vi->vdev, VIRTIO_NET_F_GUEST_ECN) ||
@@ -1783,21 +1844,32 @@ static int virtnet_xdp_set(struct net_device *dev, struct bpf_prog *prog)
 		return -ENOMEM;
 	}
 
+	if (prog) {
+		prog = bpf_prog_add(prog, vi->max_queue_pairs - 1);
+		if (IS_ERR(prog))
+			return PTR_ERR(prog);
+	}
+
 	err = _virtnet_set_queues(vi, curr_qp + xdp_qp);
 	if (err) {
 		dev_warn(&dev->dev, "XDP Device queue allocation failure.\n");
-		return err;
+		goto virtio_queue_err;
 	}
 
-	if (prog) {
-		prog = bpf_prog_add(prog, vi->max_queue_pairs - 1);
-		if (IS_ERR(prog)) {
-			_virtnet_set_queues(vi, curr_qp);
-			return PTR_ERR(prog);
-		}
+	oxdp_qp = vi->xdp_queue_pairs;
+
+	/* Changing the headroom in buffers is a disruptive operation because
+	 * existing buffers must be flushed and reallocated. This will happen
+	 * when an XDP program is initially added, or when XDP is disabled by
+	 * removing the program, resulting in the number of XDP queues changing.
+	 */
+	if (vi->xdp_queue_pairs != xdp_qp) {
+		vi->xdp_queue_pairs = xdp_qp;
+		err = virtnet_reset(vi);
+		if (err)
+			goto virtio_reset_err;
 	}
 
-	vi->xdp_queue_pairs = xdp_qp;
 	netif_set_real_num_rx_queues(dev, curr_qp + xdp_qp);
 
 	for (i = 0; i < vi->max_queue_pairs; i++) {
@@ -1808,6 +1880,21 @@ static int virtnet_xdp_set(struct net_device *dev, struct bpf_prog *prog)
 	}
 
 	return 0;
+
+virtio_reset_err:
+	/* On reset error do our best to unwind XDP changes inflight and return
+	 * error up to user space for resolution. The underlying reset hung on
+	 * us so not much we can do here.
+	 */
+	dev_warn(&dev->dev, "XDP reset failure and queues unstable\n");
+	vi->xdp_queue_pairs = oxdp_qp;
+virtio_queue_err:
+	/* On queue set error we can unwind the bpf ref count and user
+	 * space can retry; this is most likely an allocation failure.
+	 */
+	if (prog)
+		bpf_prog_sub(prog, vi->max_queue_pairs - 1);
+	return err;
 }
 
 static bool virtnet_xdp_query(struct net_device *dev)
@@ -2401,6 +2488,15 @@ static int virtnet_probe(struct virtio_device *vdev)
 	return err;
 }
 
+static void _remove_vq_common(struct virtnet_info *vi)
+{
+	vi->vdev->config->reset(vi->vdev);
+	free_unused_bufs(vi);
+	_free_receive_bufs(vi);
+	free_receive_page_frags(vi);
+	virtnet_del_vqs(vi);
+}
+
 static void remove_vq_common(struct virtnet_info *vi)
 {
 	vi->vdev->config->reset(vi->vdev);


* Re: [net-next PATCH v2 0/5] XDP adjust head support for virtio
  2017-02-03  3:14 [net-next PATCH v2 0/5] XDP adjust head support for virtio John Fastabend
                   ` (4 preceding siblings ...)
  2017-02-03  3:16 ` [net-next PATCH v2 5/5] virtio_net: XDP support for adjust_head John Fastabend
@ 2017-02-03  3:29 ` Alexei Starovoitov
  2017-02-03  3:55 ` Jakub Kicinski
                   ` (2 subsequent siblings)
  8 siblings, 0 replies; 24+ messages in thread
From: Alexei Starovoitov @ 2017-02-03  3:29 UTC (permalink / raw)
  To: John Fastabend, kubakici, jasowang, mst; +Cc: john.r.fastabend, netdev

On 2/2/17 7:14 PM, John Fastabend wrote:
> This series adds adjust head support for virtio. The following is my
> test setup. I use qemu + virtio as follows,
>
> ./x86_64-softmmu/qemu-system-x86_64 \
>    -hda /var/lib/libvirt/images/Fedora-test0.img \
>    -m 4096  -enable-kvm -smp 2 -netdev tap,id=hn0,queues=4,vhost=on \
>    -device virtio-net-pci,netdev=hn0,mq=on,guest_tso4=off,guest_tso6=off,guest_ecn=off,guest_ufo=off,vectors=9
>
> In order to use XDP with virtio until LRO is supported TSO must be
> turned off in the host. The important fields in the above command line
> are the following,
>
>    guest_tso4=off,guest_tso6=off,guest_ecn=off,guest_ufo=off

Thank you for sharing the command line! It is pretty hard to figure
this out otherwise.
Too bad you dropped the patch that allows dynamically switching off LRO.
Long term I don't see why the guest shouldn't be allowed to turn that knob.

> Also note it is possible to consume more queues than can be supported
> because when XDP is enabled for retransmit XDP attempts to use a queue
> per cpu. My standard queue count is 'queues=4'.
>
> After loading the VM I run the relevant XDP test programs in,
>
>    ./samples/bpf
>
> For this series I tested xdp1, xdp2, and xdp_tx_iptunnel. I usually test
> with iperf (-d option to get bidirectional traffic), ping, and pktgen.
> I also have a modified xdp1 that returns XDP_PASS on any packet to ensure
> the normal traffic path to the stack continues to work with XDP loaded.

Same here.
XDP testing requires two physical machines with specific NICs,
so it is hard to automate.
At least virtio+xdp gives us the ability to test the programs
automatically. So virtio+xdp will get the most test coverage, and
all hw NICs will be using it as a yardstick. Very important to
make it easy to use.

For bpf and generic xdp bits:
Acked-by: Alexei Starovoitov <ast@kernel.org>


* Re: [net-next PATCH v2 0/5] XDP adjust head support for virtio
  2017-02-03  3:14 [net-next PATCH v2 0/5] XDP adjust head support for virtio John Fastabend
                   ` (5 preceding siblings ...)
  2017-02-03  3:29 ` [net-next PATCH v2 0/5] XDP adjust head support for virtio Alexei Starovoitov
@ 2017-02-03  3:55 ` Jakub Kicinski
  2017-02-05 22:36 ` David Miller
  2017-02-07  4:15 ` Michael S. Tsirkin
  8 siblings, 0 replies; 24+ messages in thread
From: Jakub Kicinski @ 2017-02-03  3:55 UTC (permalink / raw)
  To: John Fastabend; +Cc: jasowang, ast, mst, john.r.fastabend, netdev

On Thu, 02 Feb 2017 19:14:05 -0800, John Fastabend wrote:
> This series adds adjust head support for virtio. ...

XDP bits look good to me too!


* Re: [net-next PATCH v2 5/5] virtio_net: XDP support for adjust_head
  2017-02-03  3:16 ` [net-next PATCH v2 5/5] virtio_net: XDP support for adjust_head John Fastabend
@ 2017-02-03  4:04   ` Michael S. Tsirkin
  2017-02-06  7:08   ` Jason Wang
  1 sibling, 0 replies; 24+ messages in thread
From: Michael S. Tsirkin @ 2017-02-03  4:04 UTC (permalink / raw)
  To: John Fastabend; +Cc: kubakici, jasowang, ast, john.r.fastabend, netdev

On Thu, Feb 02, 2017 at 07:16:29PM -0800, John Fastabend wrote:
> Add support for XDP adjust head by allocating a 256B header region
> that XDP programs can grow into. This is only enabled when an XDP
> program is loaded.
> 
> In order to ensure that we do not have to unwind queue headroom push
> queue setup below bpf_prog_add. It reads better to do a prog ref
> unwind vs another queue setup call.
> 
> At the moment this code must do a full reset to ensure old buffers
> without headroom on program add or with headroom on program removal
> are not used incorrectly in the datapath. Ideally we would only
> have to disable/enable the RX queues being updated but there is no
> API to do this at the moment in virtio so use the big hammer. In
> practice it is likely not that big of a problem as this will only
> happen when XDP is enabled/disabled changing programs does not
> require the reset. There is some risk that the driver may either
> have an allocation failure or for some reason fail to correctly
> negotiate with the underlying backend in this case the driver will
> be left uninitialized. I have not seen this ever happen on my test
> systems and for what it's worth this same failure case can occur
> from probe and other contexts in virtio framework.
> 
> Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
> ---
>  drivers/net/virtio_net.c |  154 +++++++++++++++++++++++++++++++++++++---------
>  1 file changed, 125 insertions(+), 29 deletions(-)
> 
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index 07f9076..52a18b8 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -42,6 +42,9 @@
>  #define GOOD_PACKET_LEN (ETH_HLEN + VLAN_HLEN + ETH_DATA_LEN)
>  #define GOOD_COPY_LEN	128
>  
> +/* Amount of XDP headroom to prepend to packets for use by xdp_adjust_head */
> +#define VIRTIO_XDP_HEADROOM 256
> +
>  /* RX packet size EWMA. The average packet size is used to determine the packet
>   * buffer size when refilling RX rings. As the entire RX ring may be refilled
>   * at once, the weight is chosen so that the EWMA will be insensitive to short-
> @@ -368,6 +371,7 @@ static bool virtnet_xdp_xmit(struct virtnet_info *vi,
>  	}
>  
>  	if (vi->mergeable_rx_bufs) {
> +		xdp->data -= sizeof(struct virtio_net_hdr_mrg_rxbuf);
>  		/* Zero header and leave csum up to XDP layers */
>  		hdr = xdp->data;
>  		memset(hdr, 0, vi->hdr_len);
> @@ -384,7 +388,9 @@ static bool virtnet_xdp_xmit(struct virtnet_info *vi,
>  		num_sg = 2;
>  		sg_init_table(sq->sg, 2);
>  		sg_set_buf(sq->sg, hdr, vi->hdr_len);
> -		skb_to_sgvec(skb, sq->sg + 1, 0, skb->len);
> +		skb_to_sgvec(skb, sq->sg + 1,
> +			     xdp->data - xdp->data_hard_start,
> +			     xdp->data_end - xdp->data);
>  	}
>  	err = virtqueue_add_outbuf(sq->vq, sq->sg, num_sg,
>  				   data, GFP_ATOMIC);
> @@ -412,7 +418,6 @@ static struct sk_buff *receive_small(struct net_device *dev,
>  	struct bpf_prog *xdp_prog;
>  
>  	len -= vi->hdr_len;
> -	skb_trim(skb, len);
>  
>  	rcu_read_lock();
>  	xdp_prog = rcu_dereference(rq->xdp_prog);
> @@ -424,12 +429,16 @@ static struct sk_buff *receive_small(struct net_device *dev,
>  		if (unlikely(hdr->hdr.gso_type || hdr->hdr.flags))
>  			goto err_xdp;
>  
> -		xdp.data = skb->data;
> +		xdp.data_hard_start = skb->data;
> +		xdp.data = skb->data + VIRTIO_XDP_HEADROOM;
>  		xdp.data_end = xdp.data + len;
>  		act = bpf_prog_run_xdp(xdp_prog, &xdp);
>  
>  		switch (act) {
>  		case XDP_PASS:
> +			/* Recalculate length in case bpf program changed it */
> +			__skb_pull(skb, xdp.data - xdp.data_hard_start);
> +			len = xdp.data_end - xdp.data;
>  			break;
>  		case XDP_TX:
>  			if (unlikely(!virtnet_xdp_xmit(vi, rq, &xdp, skb)))
> @@ -446,6 +455,7 @@ static struct sk_buff *receive_small(struct net_device *dev,
>  	}
>  	rcu_read_unlock();
>  
> +	skb_trim(skb, len);
>  	return skb;
>  
>  err_xdp:
> @@ -494,7 +504,7 @@ static struct page *xdp_linearize_page(struct receive_queue *rq,
>  				       unsigned int *len)
>  {
>  	struct page *page = alloc_page(GFP_ATOMIC);
> -	unsigned int page_off = 0;
> +	unsigned int page_off = VIRTIO_XDP_HEADROOM;
>  
>  	if (!page)
>  		return NULL;
> @@ -530,7 +540,8 @@ static struct page *xdp_linearize_page(struct receive_queue *rq,
>  		put_page(p);
>  	}
>  
> -	*len = page_off;
> +	/* Headroom does not contribute to packet length */
> +	*len = page_off - VIRTIO_XDP_HEADROOM;
>  	return page;
>  err_buf:
>  	__free_pages(page, 0);
> @@ -569,7 +580,7 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
>  						      page, offset, &len);
>  			if (!xdp_page)
>  				goto err_xdp;
> -			offset = 0;
> +			offset = VIRTIO_XDP_HEADROOM;
>  		} else {
>  			xdp_page = page;
>  		}
> @@ -582,19 +593,30 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
>  		if (unlikely(hdr->hdr.gso_type))
>  			goto err_xdp;
>  
> +		/* Allow consuming headroom but reserve enough space to push
> +		 * the descriptor on if we get an XDP_TX return code.
> +		 */
>  		data = page_address(xdp_page) + offset;
> +		xdp.data_hard_start = data - VIRTIO_XDP_HEADROOM + vi->hdr_len;
>  		xdp.data = data + vi->hdr_len;
>  		xdp.data_end = xdp.data + (len - vi->hdr_len);
>  		act = bpf_prog_run_xdp(xdp_prog, &xdp);
>  
>  		switch (act) {
>  		case XDP_PASS:
> +			/* recalculate offset to account for any header
> +			 * adjustments. Note other cases do not build an
> +			 * skb and avoid using offset
> +			 */
> +			offset = xdp.data -
> +					page_address(xdp_page) - vi->hdr_len;
> +
>  			/* We can only create skb based on xdp_page. */
>  			if (unlikely(xdp_page != page)) {
>  				rcu_read_unlock();
>  				put_page(page);
>  				head_skb = page_to_skb(vi, rq, xdp_page,
> -						       0, len, PAGE_SIZE);
> +						       offset, len, PAGE_SIZE);
>  				ewma_pkt_len_add(&rq->mrg_avg_pkt_len, len);
>  				return head_skb;
>  			}
> @@ -761,23 +783,30 @@ static void receive_buf(struct virtnet_info *vi, struct receive_queue *rq,
>  	dev_kfree_skb(skb);
>  }
>  
> +static unsigned int virtnet_get_headroom(struct virtnet_info *vi)
> +{
> +	return vi->xdp_queue_pairs ? VIRTIO_XDP_HEADROOM : 0;
> +}
> +

Why not limit the extra headroom to when prog->xdp_adjust_head
is set? People just doing filtering to fight DOS attacks
don't need it at all.



>  static int add_recvbuf_small(struct virtnet_info *vi, struct receive_queue *rq,
>  			     gfp_t gfp)
>  {
> +	int headroom = GOOD_PACKET_LEN + virtnet_get_headroom(vi);
> +	unsigned int xdp_headroom = virtnet_get_headroom(vi);
>  	struct sk_buff *skb;
>  	struct virtio_net_hdr_mrg_rxbuf *hdr;
>  	int err;
>  
> -	skb = __netdev_alloc_skb_ip_align(vi->dev, GOOD_PACKET_LEN, gfp);
> +	skb = __netdev_alloc_skb_ip_align(vi->dev, headroom, gfp);
>  	if (unlikely(!skb))
>  		return -ENOMEM;
>  
> -	skb_put(skb, GOOD_PACKET_LEN);
> +	skb_put(skb, headroom);
>  
>  	hdr = skb_vnet_hdr(skb);
>  	sg_init_table(rq->sg, 2);
>  	sg_set_buf(rq->sg, hdr, vi->hdr_len);
> -	skb_to_sgvec(skb, rq->sg + 1, 0, skb->len);
> +	skb_to_sgvec(skb, rq->sg + 1, xdp_headroom, skb->len - xdp_headroom);
>  
>  	err = virtqueue_add_inbuf(rq->vq, rq->sg, 2, skb, gfp);
>  	if (err < 0)
> @@ -845,24 +874,27 @@ static unsigned int get_mergeable_buf_len(struct ewma_pkt_len *avg_pkt_len)
>  	return ALIGN(len, MERGEABLE_BUFFER_ALIGN);
>  }
>  
> -static int add_recvbuf_mergeable(struct receive_queue *rq, gfp_t gfp)
> +static int add_recvbuf_mergeable(struct virtnet_info *vi,
> +				 struct receive_queue *rq, gfp_t gfp)
>  {
>  	struct page_frag *alloc_frag = &rq->alloc_frag;
> +	unsigned int headroom = virtnet_get_headroom(vi);
>  	char *buf;
>  	unsigned long ctx;
>  	int err;
>  	unsigned int len, hole;
>  
>  	len = get_mergeable_buf_len(&rq->mrg_avg_pkt_len);
> -	if (unlikely(!skb_page_frag_refill(len, alloc_frag, gfp)))
> +	if (unlikely(!skb_page_frag_refill(len + headroom, alloc_frag, gfp)))
>  		return -ENOMEM;
>  
>  	buf = (char *)page_address(alloc_frag->page) + alloc_frag->offset;
> +	buf += headroom; /* advance address leaving hole at front of pkt */
>  	ctx = mergeable_buf_to_ctx(buf, len);
>  	get_page(alloc_frag->page);
> -	alloc_frag->offset += len;
> +	alloc_frag->offset += len + headroom;
>  	hole = alloc_frag->size - alloc_frag->offset;
> -	if (hole < len) {
> +	if (hole < len + headroom) {
>  		/* To avoid internal fragmentation, if there is very likely not
>  		 * enough space for another buffer, add the remaining space to
>  		 * the current buffer. This extra space is not included in
> @@ -896,7 +928,7 @@ static bool try_fill_recv(struct virtnet_info *vi, struct receive_queue *rq,
>  	gfp |= __GFP_COLD;
>  	do {
>  		if (vi->mergeable_rx_bufs)
> -			err = add_recvbuf_mergeable(rq, gfp);
> +			err = add_recvbuf_mergeable(vi, rq, gfp);
>  		else if (vi->big_packets)
>  			err = add_recvbuf_big(vi, rq, gfp);
>  		else
> @@ -1716,6 +1748,7 @@ static void virtnet_freeze_down(struct virtio_device *vdev)
>  }
>  
>  static int init_vqs(struct virtnet_info *vi);
> +static void _remove_vq_common(struct virtnet_info *vi);
>  
>  static int virtnet_restore_up(struct virtio_device *vdev)
>  {
> @@ -1741,19 +1774,47 @@ static int virtnet_restore_up(struct virtio_device *vdev)
>  	return err;
>  }
>  
> +static int virtnet_reset(struct virtnet_info *vi)
> +{
> +	struct virtio_device *dev = vi->vdev;
> +	int ret;
> +
> +	virtio_config_disable(dev);
> +	dev->failed = dev->config->get_status(dev) & VIRTIO_CONFIG_S_FAILED;
> +	virtnet_freeze_down(dev);
> +	_remove_vq_common(vi);
> +
> +	dev->config->reset(dev);
> +	virtio_add_status(dev, VIRTIO_CONFIG_S_ACKNOWLEDGE);
> +	virtio_add_status(dev, VIRTIO_CONFIG_S_DRIVER);
> +
> +	ret = virtio_finalize_features(dev);
> +	if (ret)
> +		goto err;
> +
> +	ret = virtnet_restore_up(dev);
> +	if (ret)
> +		goto err;
> +	ret = _virtnet_set_queues(vi, vi->curr_queue_pairs);
> +	if (ret)
> +		goto err;
> +
> +	virtio_add_status(dev, VIRTIO_CONFIG_S_DRIVER_OK);
> +	virtio_config_enable(dev);
> +	return 0;
> +err:
> +	virtio_add_status(dev, VIRTIO_CONFIG_S_FAILED);
> +	return ret;
> +}
> +
>  static int virtnet_xdp_set(struct net_device *dev, struct bpf_prog *prog)
>  {
>  	unsigned long int max_sz = PAGE_SIZE - sizeof(struct padded_vnet_hdr);
>  	struct virtnet_info *vi = netdev_priv(dev);
>  	struct bpf_prog *old_prog;
> -	u16 xdp_qp = 0, curr_qp;
> +	u16 oxdp_qp, xdp_qp = 0, curr_qp;
>  	int i, err;
>  
> -	if (prog && prog->xdp_adjust_head) {
> -		netdev_warn(dev, "Does not support bpf_xdp_adjust_head()\n");
> -		return -EOPNOTSUPP;
> -	}
> -
>  	if (virtio_has_feature(vi->vdev, VIRTIO_NET_F_GUEST_TSO4) ||
>  	    virtio_has_feature(vi->vdev, VIRTIO_NET_F_GUEST_TSO6) ||
>  	    virtio_has_feature(vi->vdev, VIRTIO_NET_F_GUEST_ECN) ||
> @@ -1783,21 +1844,32 @@ static int virtnet_xdp_set(struct net_device *dev, struct bpf_prog *prog)
>  		return -ENOMEM;
>  	}
>  
> +	if (prog) {
> +		prog = bpf_prog_add(prog, vi->max_queue_pairs - 1);
> +		if (IS_ERR(prog))
> +			return PTR_ERR(prog);
> +	}
> +
>  	err = _virtnet_set_queues(vi, curr_qp + xdp_qp);
>  	if (err) {
>  		dev_warn(&dev->dev, "XDP Device queue allocation failure.\n");
> -		return err;
> +		goto virtio_queue_err;
>  	}
>  
> -	if (prog) {
> -		prog = bpf_prog_add(prog, vi->max_queue_pairs - 1);
> -		if (IS_ERR(prog)) {
> -			_virtnet_set_queues(vi, curr_qp);
> -			return PTR_ERR(prog);
> -		}
> +	oxdp_qp = vi->xdp_queue_pairs;
> +
> +	/* Changing the headroom in buffers is a disruptive operation because
> +	 * existing buffers must be flushed and reallocated. This will happen
> +	 * when an XDP program is initially added, or when XDP is disabled by
> +	 * removing the program, resulting in the number of XDP queues changing.
> +	 */
> +	if (vi->xdp_queue_pairs != xdp_qp) {
> +		vi->xdp_queue_pairs = xdp_qp;
> +		err = virtnet_reset(vi);
> +		if (err)
> +			goto virtio_reset_err;
>  	}
>  
> -	vi->xdp_queue_pairs = xdp_qp;
>  	netif_set_real_num_rx_queues(dev, curr_qp + xdp_qp);
>  
>  	for (i = 0; i < vi->max_queue_pairs; i++) {
> @@ -1808,6 +1880,21 @@ static int virtnet_xdp_set(struct net_device *dev, struct bpf_prog *prog)
>  	}
>  
>  	return 0;
> +
> +virtio_reset_err:
> +	/* On reset error do our best to unwind XDP changes inflight and return
> +	 * error up to user space for resolution. The underlying reset hung on
> +	 * us so not much we can do here.
> +	 */
> +	dev_warn(&dev->dev, "XDP reset failure and queues unstable\n");
> +	vi->xdp_queue_pairs = oxdp_qp;
> +virtio_queue_err:
> +	/* On queue set error we can unwind the bpf ref count and user
> +	 * space can retry; this is most likely an allocation failure.
> +	 */
> +	if (prog)
> +		bpf_prog_sub(prog, vi->max_queue_pairs - 1);
> +	return err;
>  }
>  
>  static bool virtnet_xdp_query(struct net_device *dev)
> @@ -2401,6 +2488,15 @@ static int virtnet_probe(struct virtio_device *vdev)
>  	return err;
>  }
>  
> +static void _remove_vq_common(struct virtnet_info *vi)
> +{
> +	vi->vdev->config->reset(vi->vdev);
> +	free_unused_bufs(vi);
> +	_free_receive_bufs(vi);
> +	free_receive_page_frags(vi);
> +	virtnet_del_vqs(vi);
> +}
> +
>  static void remove_vq_common(struct virtnet_info *vi)
>  {
>  	vi->vdev->config->reset(vi->vdev);


* Re: [net-next PATCH v2 0/5] XDP adjust head support for virtio
  2017-02-03  3:14 [net-next PATCH v2 0/5] XDP adjust head support for virtio John Fastabend
                   ` (6 preceding siblings ...)
  2017-02-03  3:55 ` Jakub Kicinski
@ 2017-02-05 22:36 ` David Miller
  2017-02-06  4:39   ` Michael S. Tsirkin
  2017-02-07  4:15 ` Michael S. Tsirkin
  8 siblings, 1 reply; 24+ messages in thread
From: David Miller @ 2017-02-05 22:36 UTC (permalink / raw)
  To: john.fastabend; +Cc: kubakici, jasowang, ast, mst, john.r.fastabend, netdev

From: John Fastabend <john.fastabend@gmail.com>
Date: Thu, 02 Feb 2017 19:14:05 -0800

> This series adds adjust head support for virtio. The following is my
> test setup. I use qemu + virtio as follows,
> 
> ./x86_64-softmmu/qemu-system-x86_64 \
>   -hda /var/lib/libvirt/images/Fedora-test0.img \
>   -m 4096  -enable-kvm -smp 2 -netdev tap,id=hn0,queues=4,vhost=on \
>   -device virtio-net-pci,netdev=hn0,mq=on,guest_tso4=off,guest_tso6=off,guest_ecn=off,guest_ufo=off,vectors=9
> 
> In order to use XDP with virtio until LRO is supported TSO must be
> turned off in the host. The important fields in the above command line
> are the following,
> 
>   guest_tso4=off,guest_tso6=off,guest_ecn=off,guest_ufo=off
> 
> Also note it is possible to consume more queues than can be supported
> because when XDP is enabled for retransmit XDP attempts to use a queue
> per cpu. My standard queue count is 'queues=4'.
> 
> After loading the VM I run the relevant XDP test programs in,
> 
>   ./samples/bpf
> 
> For this series I tested xdp1, xdp2, and xdp_tx_iptunnel. I usually test
> with iperf (-d option to get bidirectional traffic), ping, and pktgen.
> I also have a modified xdp1 that returns XDP_PASS on any packet to ensure
> the normal traffic path to the stack continues to work with XDP loaded.
> 
> It would be great to automate this soon. At the moment I do it by hand
> which is starting to get tedious.
> 
> v2: original series dropped trace points after merge.

Michael, I just want to apply this right now.

I don't think haggling over whether to allocate the adjust_head area
unconditionally or not is a blocker for this series going in.  That
can be addressed trivially in a follow-on patch.

We want these new reset paths tested as much as possible and each day
we delay this series is detrimental to that goal.

Thanks.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [net-next PATCH v2 0/5] XDP adjust head support for virtio
  2017-02-05 22:36 ` David Miller
@ 2017-02-06  4:39   ` Michael S. Tsirkin
  2017-02-06  7:12     ` Jason Wang
  2017-02-06 16:37     ` David Miller
  0 siblings, 2 replies; 24+ messages in thread
From: Michael S. Tsirkin @ 2017-02-06  4:39 UTC (permalink / raw)
  To: David Miller
  Cc: john.fastabend, kubakici, jasowang, ast, john.r.fastabend, netdev

On Sun, Feb 05, 2017 at 05:36:34PM -0500, David Miller wrote:
> From: John Fastabend <john.fastabend@gmail.com>
> Date: Thu, 02 Feb 2017 19:14:05 -0800
> 
> > This series adds adjust head support for virtio. The following is my
> > test setup. I use qemu + virtio as follows,
> > 
> > ./x86_64-softmmu/qemu-system-x86_64 \
> >   -hda /var/lib/libvirt/images/Fedora-test0.img \
> >   -m 4096  -enable-kvm -smp 2 -netdev tap,id=hn0,queues=4,vhost=on \
> >   -device virtio-net-pci,netdev=hn0,mq=on,guest_tso4=off,guest_tso6=off,guest_ecn=off,guest_ufo=off,vectors=9
> > 
> > In order to use XDP with virtio until LRO is supported TSO must be
> > turned off in the host. The important fields in the above command line
> > are the following,
> > 
> >   guest_tso4=off,guest_tso6=off,guest_ecn=off,guest_ufo=off
> > 
> > Also note it is possible to consume more queues than can be supported
> > because when XDP is enabled for retransmit XDP attempts to use a queue
> > per cpu. My standard queue count is 'queues=4'.
> > 
> > After loading the VM I run the relevant XDP test programs in,
> > 
> >   ./samples/bpf
> > 
> > For this series I tested xdp1, xdp2, and xdp_tx_iptunnel. I usually test
> > with iperf (-d option to get bidirectional traffic), ping, and pktgen.
> > I also have a modified xdp1 that returns XDP_PASS on any packet to ensure
> > the normal traffic path to the stack continues to work with XDP loaded.
> > 
> > It would be great to automate this soon. At the moment I do it by hand
> > which is starting to get tedious.
> > 
> > v2: original series dropped trace points after merge.
> 
> Michael, I just want to apply this right now.
> 
> I don't think haggling over whether to allocate the adjust_head area
> unconditionally or not is a blocker for this series going in.  That
> can be addressed trivially in a follow-on patch.

FYI it would just mean we revert most of this patchset except patches 2 and 3 though.

> We want these new reset paths tested as much as possible and each day
> we delay this series is detrimental to that goal.
> 
> Thanks.

Well the point is to avoid resets completely, at the cost of extra 256 bytes
for packets > 128 bytes on ppc (64k pages) only.

Found a volunteer so I hope to have this idea tested on ppc Tuesday.

And really all we need to do is confirm whether this:
-#define MERGEABLE_BUFFER_MIN_ALIGN_SHIFT ((PAGE_SHIFT + 1) / 2)
+#define MERGEABLE_BUFFER_MIN_ALIGN_SHIFT (PAGE_SHIFT / 2 + 1)

affects performance in a measurable way.
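
For reference, the integer math works out like this (a quick sketch for
the two common page sizes; min buffer alignment is 1 << shift):

	old: (PAGE_SHIFT + 1) / 2   /* 4K: shift 6 (64B),  64K: shift 8 (256B) */
	new: PAGE_SHIFT / 2 + 1     /* 4K: shift 7 (128B), 64K: shift 9 (512B) */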

So I would rather wait another day. But the patches themselves
look correct, from that POV.

Acked-by: Michael S. Tsirkin <mst@redhat.com>

but I would prefer that you waited another day for a Tested-by from me too.

-- 
MST

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [net-next PATCH v2 1/5] virtio_net: wrap rtnl_lock in test for calling with lock already held
  2017-02-03  3:14 ` [net-next PATCH v2 1/5] virtio_net: wrap rtnl_lock in test for calling with lock already held John Fastabend
@ 2017-02-06  6:48   ` Jason Wang
  0 siblings, 0 replies; 24+ messages in thread
From: Jason Wang @ 2017-02-06  6:48 UTC (permalink / raw)
  To: John Fastabend, kubakici, ast, mst; +Cc: john.r.fastabend, netdev



On 2017年02月03日 11:14, John Fastabend wrote:
> For the XDP use case and to allow ethtool reset tests it is useful to
> be able to use reset paths from contexts where the rtnl lock is already
> held.
>
> This requires updating virtnet_set_queues and free_receive_bufs, the
> two places where rtnl_lock is taken in virtio_net. To do this we
> use the following pattern,
>
> 	_foo(...) { do stuff }
> 	foo(...) { rtnl_lock(); _foo(...); rtnl_unlock()};
>
> this allows us to use the freeze()/restore() flow from both contexts.
>
> Signed-off-by: John Fastabend <john.r.fastabend@intel.com>

Acked-by: Jason Wang <jasowang@redhat.com>

> ---
>   drivers/net/virtio_net.c |   31 +++++++++++++++++++++----------
>   1 file changed, 21 insertions(+), 10 deletions(-)
>
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index bd22cf3..f8ba586 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -1342,7 +1342,7 @@ static void virtnet_ack_link_announce(struct virtnet_info *vi)
>   	rtnl_unlock();
>   }
>   
> -static int virtnet_set_queues(struct virtnet_info *vi, u16 queue_pairs)
> +static int _virtnet_set_queues(struct virtnet_info *vi, u16 queue_pairs)
>   {
>   	struct scatterlist sg;
>   	struct net_device *dev = vi->dev;
> @@ -1368,6 +1368,16 @@ static int virtnet_set_queues(struct virtnet_info *vi, u16 queue_pairs)
>   	return 0;
>   }
>   
> +static int virtnet_set_queues(struct virtnet_info *vi, u16 queue_pairs)
> +{
> +	int err;
> +
> +	rtnl_lock();
> +	err = _virtnet_set_queues(vi, queue_pairs);
> +	rtnl_unlock();
> +	return err;
> +}
> +
>   static int virtnet_close(struct net_device *dev)
>   {
>   	struct virtnet_info *vi = netdev_priv(dev);
> @@ -1620,7 +1630,7 @@ static int virtnet_set_channels(struct net_device *dev,
>   		return -EINVAL;
>   
>   	get_online_cpus();
> -	err = virtnet_set_queues(vi, queue_pairs);
> +	err = _virtnet_set_queues(vi, queue_pairs);
>   	if (!err) {
>   		netif_set_real_num_tx_queues(dev, queue_pairs);
>   		netif_set_real_num_rx_queues(dev, queue_pairs);
> @@ -1752,7 +1762,7 @@ static int virtnet_xdp_set(struct net_device *dev, struct bpf_prog *prog)
>   		return -ENOMEM;
>   	}
>   
> -	err = virtnet_set_queues(vi, curr_qp + xdp_qp);
> +	err = _virtnet_set_queues(vi, curr_qp + xdp_qp);
>   	if (err) {
>   		dev_warn(&dev->dev, "XDP Device queue allocation failure.\n");
>   		return err;
> @@ -1761,7 +1771,7 @@ static int virtnet_xdp_set(struct net_device *dev, struct bpf_prog *prog)
>   	if (prog) {
>   		prog = bpf_prog_add(prog, vi->max_queue_pairs - 1);
>   		if (IS_ERR(prog)) {
> -			virtnet_set_queues(vi, curr_qp);
> +			_virtnet_set_queues(vi, curr_qp);
>   			return PTR_ERR(prog);
>   		}
>   	}
> @@ -1880,12 +1890,11 @@ static void virtnet_free_queues(struct virtnet_info *vi)
>   	kfree(vi->sq);
>   }
>   
> -static void free_receive_bufs(struct virtnet_info *vi)
> +static void _free_receive_bufs(struct virtnet_info *vi)
>   {
>   	struct bpf_prog *old_prog;
>   	int i;
>   
> -	rtnl_lock();
>   	for (i = 0; i < vi->max_queue_pairs; i++) {
>   		while (vi->rq[i].pages)
>   			__free_pages(get_a_page(&vi->rq[i], GFP_KERNEL), 0);
> @@ -1895,6 +1904,12 @@ static void free_receive_bufs(struct virtnet_info *vi)
>   		if (old_prog)
>   			bpf_prog_put(old_prog);
>   	}
> +}
> +
> +static void free_receive_bufs(struct virtnet_info *vi)
> +{
> +	rtnl_lock();
> +	_free_receive_bufs(vi);
>   	rtnl_unlock();
>   }
>   
> @@ -2333,9 +2348,7 @@ static int virtnet_probe(struct virtio_device *vdev)
>   		goto free_unregister_netdev;
>   	}
>   
> -	rtnl_lock();
>   	virtnet_set_queues(vi, vi->curr_queue_pairs);
> -	rtnl_unlock();
>   
>   	/* Assume link up if device can't report link status,
>   	   otherwise get link status from config. */
> @@ -2444,9 +2457,7 @@ static int virtnet_restore(struct virtio_device *vdev)
>   
>   	netif_device_attach(vi->dev);
>   
> -	rtnl_lock();
>   	virtnet_set_queues(vi, vi->curr_queue_pairs);
> -	rtnl_unlock();
>   
>   	err = virtnet_cpu_notif_add(vi);
>   	if (err)
>

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [net-next PATCH v2 2/5] virtio_net: factor out xdp handler for readability
  2017-02-03  3:15 ` [net-next PATCH v2 2/5] virtio_net: factor out xdp handler for readability John Fastabend
@ 2017-02-06  6:49   ` Jason Wang
  0 siblings, 0 replies; 24+ messages in thread
From: Jason Wang @ 2017-02-06  6:49 UTC (permalink / raw)
  To: John Fastabend, kubakici, ast, mst; +Cc: john.r.fastabend, netdev



On 2017年02月03日 11:15, John Fastabend wrote:
> At this point the do_xdp_prog is mostly if/else branches handling
> the different modes of virtio_net. So remove it and handle running
> the program in the per-mode handlers.
>
> Signed-off-by: John Fastabend <john.r.fastabend@intel.com>

Acked-by: Jason Wang <jasowang@redhat.com>

> ---
>   drivers/net/virtio_net.c |   86 +++++++++++++++++++---------------------------
>   1 file changed, 35 insertions(+), 51 deletions(-)
>
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index f8ba586..3b49363 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -399,52 +399,6 @@ static bool virtnet_xdp_xmit(struct virtnet_info *vi,
>   	return true;
>   }
>   
> -static u32 do_xdp_prog(struct virtnet_info *vi,
> -		       struct receive_queue *rq,
> -		       struct bpf_prog *xdp_prog,
> -		       void *data, int len)
> -{
> -	int hdr_padded_len;
> -	struct xdp_buff xdp;
> -	void *buf;
> -	unsigned int qp;
> -	u32 act;
> -
> -	if (vi->mergeable_rx_bufs) {
> -		hdr_padded_len = sizeof(struct virtio_net_hdr_mrg_rxbuf);
> -		xdp.data = data + hdr_padded_len;
> -		xdp.data_end = xdp.data + (len - vi->hdr_len);
> -		buf = data;
> -	} else { /* small buffers */
> -		struct sk_buff *skb = data;
> -
> -		xdp.data = skb->data;
> -		xdp.data_end = xdp.data + len;
> -		buf = skb->data;
> -	}
> -
> -	act = bpf_prog_run_xdp(xdp_prog, &xdp);
> -	switch (act) {
> -	case XDP_PASS:
> -		return XDP_PASS;
> -	case XDP_TX:
> -		qp = vi->curr_queue_pairs -
> -			vi->xdp_queue_pairs +
> -			smp_processor_id();
> -		xdp.data = buf;
> -		if (unlikely(!virtnet_xdp_xmit(vi, rq, &vi->sq[qp], &xdp,
> -					       data)))
> -			trace_xdp_exception(vi->dev, xdp_prog, act);
> -		return XDP_TX;
> -	default:
> -		bpf_warn_invalid_xdp_action(act);
> -	case XDP_ABORTED:
> -		trace_xdp_exception(vi->dev, xdp_prog, act);
> -	case XDP_DROP:
> -		return XDP_DROP;
> -	}
> -}
> -
>   static struct sk_buff *receive_small(struct net_device *dev,
>   				     struct virtnet_info *vi,
>   				     struct receive_queue *rq,
> @@ -460,19 +414,34 @@ static struct sk_buff *receive_small(struct net_device *dev,
>   	xdp_prog = rcu_dereference(rq->xdp_prog);
>   	if (xdp_prog) {
>   		struct virtio_net_hdr_mrg_rxbuf *hdr = buf;
> +		struct xdp_buff xdp;
> +		unsigned int qp;
>   		u32 act;
>   
>   		if (unlikely(hdr->hdr.gso_type || hdr->hdr.flags))
>   			goto err_xdp;
> -		act = do_xdp_prog(vi, rq, xdp_prog, skb, len);
> +
> +		xdp.data = skb->data;
> +		xdp.data_end = xdp.data + len;
> +		act = bpf_prog_run_xdp(xdp_prog, &xdp);
> +
>   		switch (act) {
>   		case XDP_PASS:
>   			break;
>   		case XDP_TX:
> +			qp = vi->curr_queue_pairs -
> +				vi->xdp_queue_pairs +
> +				smp_processor_id();
> +			if (unlikely(!virtnet_xdp_xmit(vi, rq, &vi->sq[qp],
> +						       &xdp, skb)))
> +				trace_xdp_exception(vi->dev, xdp_prog, act);
>   			rcu_read_unlock();
>   			goto xdp_xmit;
> -		case XDP_DROP:
>   		default:
> +			bpf_warn_invalid_xdp_action(act);
> +		case XDP_ABORTED:
> +			trace_xdp_exception(vi->dev, xdp_prog, act);
> +		case XDP_DROP:
>   			goto err_xdp;
>   		}
>   	}
> @@ -590,6 +559,9 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
>   	xdp_prog = rcu_dereference(rq->xdp_prog);
>   	if (xdp_prog) {
>   		struct page *xdp_page;
> +		struct xdp_buff xdp;
> +		unsigned int qp;
> +		void *data;
>   		u32 act;
>   
>   		/* This happens when rx buffer size is underestimated */
> @@ -612,8 +584,11 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
>   		if (unlikely(hdr->hdr.gso_type))
>   			goto err_xdp;
>   
> -		act = do_xdp_prog(vi, rq, xdp_prog,
> -				  page_address(xdp_page) + offset, len);
> +		data = page_address(xdp_page) + offset;
> +		xdp.data = data + vi->hdr_len;
> +		xdp.data_end = xdp.data + (len - vi->hdr_len);
> +		act = bpf_prog_run_xdp(xdp_prog, &xdp);
> +
>   		switch (act) {
>   		case XDP_PASS:
>   			/* We can only create skb based on xdp_page. */
> @@ -627,13 +602,22 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
>   			}
>   			break;
>   		case XDP_TX:
> +			qp = vi->curr_queue_pairs -
> +				vi->xdp_queue_pairs +
> +				smp_processor_id();
> +			if (unlikely(!virtnet_xdp_xmit(vi, rq, &vi->sq[qp],
> +						       &xdp, data)))
> +				trace_xdp_exception(vi->dev, xdp_prog, act);
>   			ewma_pkt_len_add(&rq->mrg_avg_pkt_len, len);
>   			if (unlikely(xdp_page != page))
>   				goto err_xdp;
>   			rcu_read_unlock();
>   			goto xdp_xmit;
> -		case XDP_DROP:
>   		default:
> +			bpf_warn_invalid_xdp_action(act);
> +		case XDP_ABORTED:
> +			trace_xdp_exception(vi->dev, xdp_prog, act);
> +		case XDP_DROP:
>   			if (unlikely(xdp_page != page))
>   				__free_pages(xdp_page, 0);
>   			ewma_pkt_len_add(&rq->mrg_avg_pkt_len, len);
>

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [net-next PATCH v2 3/5] virtio_net: remove duplicate queue pair binding in XDP
  2017-02-03  3:15 ` [net-next PATCH v2 3/5] virtio_net: remove duplicate queue pair binding in XDP John Fastabend
@ 2017-02-06  7:06   ` Jason Wang
  0 siblings, 0 replies; 24+ messages in thread
From: Jason Wang @ 2017-02-06  7:06 UTC (permalink / raw)
  To: John Fastabend, kubakici, ast, mst; +Cc: john.r.fastabend, netdev



On 2017年02月03日 11:15, John Fastabend wrote:
> Factor out qp assignment.
>
> Signed-off-by: John Fastabend <john.r.fastabend@intel.com>

Acked-by: Jason Wang <jasowang@redhat.com>

> ---
>   drivers/net/virtio_net.c |   20 +++++++-------------
>   1 file changed, 7 insertions(+), 13 deletions(-)
>
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index 3b49363..dba5afb 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -341,15 +341,19 @@ static struct sk_buff *page_to_skb(struct virtnet_info *vi,
>   
>   static bool virtnet_xdp_xmit(struct virtnet_info *vi,
>   			     struct receive_queue *rq,
> -			     struct send_queue *sq,
>   			     struct xdp_buff *xdp,
>   			     void *data)
>   {
>   	struct virtio_net_hdr_mrg_rxbuf *hdr;
>   	unsigned int num_sg, len;
> +	struct send_queue *sq;
> +	unsigned int qp;
>   	void *xdp_sent;
>   	int err;
>   
> +	qp = vi->curr_queue_pairs - vi->xdp_queue_pairs + smp_processor_id();
> +	sq = &vi->sq[qp];
> +
>   	/* Free up any pending old buffers before queueing new ones. */
>   	while ((xdp_sent = virtqueue_get_buf(sq->vq, &len)) != NULL) {
>   		if (vi->mergeable_rx_bufs) {
> @@ -415,7 +419,6 @@ static struct sk_buff *receive_small(struct net_device *dev,
>   	if (xdp_prog) {
>   		struct virtio_net_hdr_mrg_rxbuf *hdr = buf;
>   		struct xdp_buff xdp;
> -		unsigned int qp;
>   		u32 act;
>   
>   		if (unlikely(hdr->hdr.gso_type || hdr->hdr.flags))
> @@ -429,11 +432,7 @@ static struct sk_buff *receive_small(struct net_device *dev,
>   		case XDP_PASS:
>   			break;
>   		case XDP_TX:
> -			qp = vi->curr_queue_pairs -
> -				vi->xdp_queue_pairs +
> -				smp_processor_id();
> -			if (unlikely(!virtnet_xdp_xmit(vi, rq, &vi->sq[qp],
> -						       &xdp, skb)))
> +			if (unlikely(!virtnet_xdp_xmit(vi, rq, &xdp, skb)))
>   				trace_xdp_exception(vi->dev, xdp_prog, act);
>   			rcu_read_unlock();
>   			goto xdp_xmit;
> @@ -560,7 +559,6 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
>   	if (xdp_prog) {
>   		struct page *xdp_page;
>   		struct xdp_buff xdp;
> -		unsigned int qp;
>   		void *data;
>   		u32 act;
>   
> @@ -602,11 +600,7 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
>   			}
>   			break;
>   		case XDP_TX:
> -			qp = vi->curr_queue_pairs -
> -				vi->xdp_queue_pairs +
> -				smp_processor_id();
> -			if (unlikely(!virtnet_xdp_xmit(vi, rq, &vi->sq[qp],
> -						       &xdp, data)))
> +			if (unlikely(!virtnet_xdp_xmit(vi, rq, &xdp, data)))
>   				trace_xdp_exception(vi->dev, xdp_prog, act);
>   			ewma_pkt_len_add(&rq->mrg_avg_pkt_len, len);
>   			if (unlikely(xdp_page != page))
>

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [net-next PATCH v2 4/5] virtio_net: refactor freeze/restore logic into virtnet reset logic
  2017-02-03  3:16 ` [net-next PATCH v2 4/5] virtio_net: refactor freeze/restore logic into virtnet reset logic John Fastabend
@ 2017-02-06  7:07   ` Jason Wang
  0 siblings, 0 replies; 24+ messages in thread
From: Jason Wang @ 2017-02-06  7:07 UTC (permalink / raw)
  To: John Fastabend, kubakici, ast, mst; +Cc: john.r.fastabend, netdev



On 2017年02月03日 11:16, John Fastabend wrote:
> For XDP we will need to reset the queues to allow for buffer headroom
> to be configured. In order to do this we need to essentially run the
> freeze()/restore() code path. Unfortunately, the locking requirements
> between the freeze/restore and reset paths are different, however, so
> we cannot simply reuse the code.
>
> This patch refactors the code path and adds a reset helper routine.
>
> Signed-off-by: John Fastabend <john.r.fastabend@intel.com>

Acked-by: Jason Wang <jasowang@redhat.com>

> ---
>   drivers/net/virtio_net.c |   75 ++++++++++++++++++++++++++++------------------
>   drivers/virtio/virtio.c  |   42 ++++++++++++++------------
>   include/linux/virtio.h   |    4 ++
>   3 files changed, 73 insertions(+), 48 deletions(-)
>
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index dba5afb..07f9076 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -1698,6 +1698,49 @@ static void virtnet_init_settings(struct net_device *dev)
>   	.set_settings = virtnet_set_settings,
>   };
>   
> +static void virtnet_freeze_down(struct virtio_device *vdev)
> +{
> +	struct virtnet_info *vi = vdev->priv;
> +	int i;
> +
> +	/* Make sure no work handler is accessing the device */
> +	flush_work(&vi->config_work);
> +
> +	netif_device_detach(vi->dev);
> +	cancel_delayed_work_sync(&vi->refill);
> +
> +	if (netif_running(vi->dev)) {
> +		for (i = 0; i < vi->max_queue_pairs; i++)
> +			napi_disable(&vi->rq[i].napi);
> +	}
> +}
> +
> +static int init_vqs(struct virtnet_info *vi);
> +
> +static int virtnet_restore_up(struct virtio_device *vdev)
> +{
> +	struct virtnet_info *vi = vdev->priv;
> +	int err, i;
> +
> +	err = init_vqs(vi);
> +	if (err)
> +		return err;
> +
> +	virtio_device_ready(vdev);
> +
> +	if (netif_running(vi->dev)) {
> +		for (i = 0; i < vi->curr_queue_pairs; i++)
> +			if (!try_fill_recv(vi, &vi->rq[i], GFP_KERNEL))
> +				schedule_delayed_work(&vi->refill, 0);
> +
> +		for (i = 0; i < vi->max_queue_pairs; i++)
> +			virtnet_napi_enable(&vi->rq[i]);
> +	}
> +
> +	netif_device_attach(vi->dev);
> +	return err;
> +}
> +
>   static int virtnet_xdp_set(struct net_device *dev, struct bpf_prog *prog)
>   {
>   	unsigned long int max_sz = PAGE_SIZE - sizeof(struct padded_vnet_hdr);
> @@ -2393,21 +2436,9 @@ static void virtnet_remove(struct virtio_device *vdev)
>   static int virtnet_freeze(struct virtio_device *vdev)
>   {
>   	struct virtnet_info *vi = vdev->priv;
> -	int i;
>   
>   	virtnet_cpu_notif_remove(vi);
> -
> -	/* Make sure no work handler is accessing the device */
> -	flush_work(&vi->config_work);
> -
> -	netif_device_detach(vi->dev);
> -	cancel_delayed_work_sync(&vi->refill);
> -
> -	if (netif_running(vi->dev)) {
> -		for (i = 0; i < vi->max_queue_pairs; i++)
> -			napi_disable(&vi->rq[i].napi);
> -	}
> -
> +	virtnet_freeze_down(vdev);
>   	remove_vq_common(vi);
>   
>   	return 0;
> @@ -2416,25 +2447,11 @@ static int virtnet_freeze(struct virtio_device *vdev)
>   static int virtnet_restore(struct virtio_device *vdev)
>   {
>   	struct virtnet_info *vi = vdev->priv;
> -	int err, i;
> +	int err;
>   
> -	err = init_vqs(vi);
> +	err = virtnet_restore_up(vdev);
>   	if (err)
>   		return err;
> -
> -	virtio_device_ready(vdev);
> -
> -	if (netif_running(vi->dev)) {
> -		for (i = 0; i < vi->curr_queue_pairs; i++)
> -			if (!try_fill_recv(vi, &vi->rq[i], GFP_KERNEL))
> -				schedule_delayed_work(&vi->refill, 0);
> -
> -		for (i = 0; i < vi->max_queue_pairs; i++)
> -			virtnet_napi_enable(&vi->rq[i]);
> -	}
> -
> -	netif_device_attach(vi->dev);
> -
>   	virtnet_set_queues(vi, vi->curr_queue_pairs);
>   
>   	err = virtnet_cpu_notif_add(vi);
> diff --git a/drivers/virtio/virtio.c b/drivers/virtio/virtio.c
> index 7062bb0..400d70b 100644
> --- a/drivers/virtio/virtio.c
> +++ b/drivers/virtio/virtio.c
> @@ -100,11 +100,6 @@ static int virtio_uevent(struct device *_dv, struct kobj_uevent_env *env)
>   			      dev->id.device, dev->id.vendor);
>   }
>   
> -static void add_status(struct virtio_device *dev, unsigned status)
> -{
> -	dev->config->set_status(dev, dev->config->get_status(dev) | status);
> -}
> -
>   void virtio_check_driver_offered_feature(const struct virtio_device *vdev,
>   					 unsigned int fbit)
>   {
> @@ -145,14 +140,15 @@ void virtio_config_changed(struct virtio_device *dev)
>   }
>   EXPORT_SYMBOL_GPL(virtio_config_changed);
>   
> -static void virtio_config_disable(struct virtio_device *dev)
> +void virtio_config_disable(struct virtio_device *dev)
>   {
>   	spin_lock_irq(&dev->config_lock);
>   	dev->config_enabled = false;
>   	spin_unlock_irq(&dev->config_lock);
>   }
> +EXPORT_SYMBOL_GPL(virtio_config_disable);
>   
> -static void virtio_config_enable(struct virtio_device *dev)
> +void virtio_config_enable(struct virtio_device *dev)
>   {
>   	spin_lock_irq(&dev->config_lock);
>   	dev->config_enabled = true;
> @@ -161,8 +157,15 @@ static void virtio_config_enable(struct virtio_device *dev)
>   	dev->config_change_pending = false;
>   	spin_unlock_irq(&dev->config_lock);
>   }
> +EXPORT_SYMBOL_GPL(virtio_config_enable);
> +
> +void virtio_add_status(struct virtio_device *dev, unsigned int status)
> +{
> +	dev->config->set_status(dev, dev->config->get_status(dev) | status);
> +}
> +EXPORT_SYMBOL_GPL(virtio_add_status);
>   
> -static int virtio_finalize_features(struct virtio_device *dev)
> +int virtio_finalize_features(struct virtio_device *dev)
>   {
>   	int ret = dev->config->finalize_features(dev);
>   	unsigned status;
> @@ -173,7 +176,7 @@ static int virtio_finalize_features(struct virtio_device *dev)
>   	if (!virtio_has_feature(dev, VIRTIO_F_VERSION_1))
>   		return 0;
>   
> -	add_status(dev, VIRTIO_CONFIG_S_FEATURES_OK);
> +	virtio_add_status(dev, VIRTIO_CONFIG_S_FEATURES_OK);
>   	status = dev->config->get_status(dev);
>   	if (!(status & VIRTIO_CONFIG_S_FEATURES_OK)) {
>   		dev_err(&dev->dev, "virtio: device refuses features: %x\n",
> @@ -182,6 +185,7 @@ static int virtio_finalize_features(struct virtio_device *dev)
>   	}
>   	return 0;
>   }
> +EXPORT_SYMBOL_GPL(virtio_finalize_features);
>   
>   static int virtio_dev_probe(struct device *_d)
>   {
> @@ -193,7 +197,7 @@ static int virtio_dev_probe(struct device *_d)
>   	u64 driver_features_legacy;
>   
>   	/* We have a driver! */
> -	add_status(dev, VIRTIO_CONFIG_S_DRIVER);
> +	virtio_add_status(dev, VIRTIO_CONFIG_S_DRIVER);
>   
>   	/* Figure out what features the device supports. */
>   	device_features = dev->config->get_features(dev);
> @@ -247,7 +251,7 @@ static int virtio_dev_probe(struct device *_d)
>   
>   	return 0;
>   err:
> -	add_status(dev, VIRTIO_CONFIG_S_FAILED);
> +	virtio_add_status(dev, VIRTIO_CONFIG_S_FAILED);
>   	return err;
>   
>   }
> @@ -265,7 +269,7 @@ static int virtio_dev_remove(struct device *_d)
>   	WARN_ON_ONCE(dev->config->get_status(dev));
>   
>   	/* Acknowledge the device's existence again. */
> -	add_status(dev, VIRTIO_CONFIG_S_ACKNOWLEDGE);
> +	virtio_add_status(dev, VIRTIO_CONFIG_S_ACKNOWLEDGE);
>   	return 0;
>   }
>   
> @@ -316,7 +320,7 @@ int register_virtio_device(struct virtio_device *dev)
>   	dev->config->reset(dev);
>   
>   	/* Acknowledge that we've seen the device. */
> -	add_status(dev, VIRTIO_CONFIG_S_ACKNOWLEDGE);
> +	virtio_add_status(dev, VIRTIO_CONFIG_S_ACKNOWLEDGE);
>   
>   	INIT_LIST_HEAD(&dev->vqs);
>   
> @@ -325,7 +329,7 @@ int register_virtio_device(struct virtio_device *dev)
>   	err = device_register(&dev->dev);
>   out:
>   	if (err)
> -		add_status(dev, VIRTIO_CONFIG_S_FAILED);
> +		virtio_add_status(dev, VIRTIO_CONFIG_S_FAILED);
>   	return err;
>   }
>   EXPORT_SYMBOL_GPL(register_virtio_device);
> @@ -365,18 +369,18 @@ int virtio_device_restore(struct virtio_device *dev)
>   	dev->config->reset(dev);
>   
>   	/* Acknowledge that we've seen the device. */
> -	add_status(dev, VIRTIO_CONFIG_S_ACKNOWLEDGE);
> +	virtio_add_status(dev, VIRTIO_CONFIG_S_ACKNOWLEDGE);
>   
>   	/* Maybe driver failed before freeze.
>   	 * Restore the failed status, for debugging. */
>   	if (dev->failed)
> -		add_status(dev, VIRTIO_CONFIG_S_FAILED);
> +		virtio_add_status(dev, VIRTIO_CONFIG_S_FAILED);
>   
>   	if (!drv)
>   		return 0;
>   
>   	/* We have a driver! */
> -	add_status(dev, VIRTIO_CONFIG_S_DRIVER);
> +	virtio_add_status(dev, VIRTIO_CONFIG_S_DRIVER);
>   
>   	ret = virtio_finalize_features(dev);
>   	if (ret)
> @@ -389,14 +393,14 @@ int virtio_device_restore(struct virtio_device *dev)
>   	}
>   
>   	/* Finally, tell the device we're all set */
> -	add_status(dev, VIRTIO_CONFIG_S_DRIVER_OK);
> +	virtio_add_status(dev, VIRTIO_CONFIG_S_DRIVER_OK);
>   
>   	virtio_config_enable(dev);
>   
>   	return 0;
>   
>   err:
> -	add_status(dev, VIRTIO_CONFIG_S_FAILED);
> +	virtio_add_status(dev, VIRTIO_CONFIG_S_FAILED);
>   	return ret;
>   }
>   EXPORT_SYMBOL_GPL(virtio_device_restore);
> diff --git a/include/linux/virtio.h b/include/linux/virtio.h
> index d5eb547..04b0d3f 100644
> --- a/include/linux/virtio.h
> +++ b/include/linux/virtio.h
> @@ -132,12 +132,16 @@ static inline struct virtio_device *dev_to_virtio(struct device *_dev)
>   	return container_of(_dev, struct virtio_device, dev);
>   }
>   
> +void virtio_add_status(struct virtio_device *dev, unsigned int status);
>   int register_virtio_device(struct virtio_device *dev);
>   void unregister_virtio_device(struct virtio_device *dev);
>   
>   void virtio_break_device(struct virtio_device *dev);
>   
>   void virtio_config_changed(struct virtio_device *dev);
> +void virtio_config_disable(struct virtio_device *dev);
> +void virtio_config_enable(struct virtio_device *dev);
> +int virtio_finalize_features(struct virtio_device *dev);
>   #ifdef CONFIG_PM_SLEEP
>   int virtio_device_freeze(struct virtio_device *dev);
>   int virtio_device_restore(struct virtio_device *dev);
>

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [net-next PATCH v2 5/5] virtio_net: XDP support for adjust_head
  2017-02-03  3:16 ` [net-next PATCH v2 5/5] virtio_net: XDP support for adjust_head John Fastabend
  2017-02-03  4:04   ` Michael S. Tsirkin
@ 2017-02-06  7:08   ` Jason Wang
  2017-02-06 19:29     ` John Fastabend
  1 sibling, 1 reply; 24+ messages in thread
From: Jason Wang @ 2017-02-06  7:08 UTC (permalink / raw)
  To: John Fastabend, kubakici, ast, mst; +Cc: john.r.fastabend, netdev



On 2017年02月03日 11:16, John Fastabend wrote:
> Add support for XDP adjust head by allocating a 256B header region
> that XDP programs can grow into. This is only enabled when an XDP
> program is loaded.
>
> In order to ensure that we do not have to unwind queue headroom, push
> queue setup below bpf_prog_add. It reads better to do a prog ref
> unwind vs another queue setup call.
>
> At the moment this code must do a full reset to ensure old buffers
> without headroom on program add, or with headroom on program removal,
> are not used incorrectly in the datapath. Ideally we would only
> have to disable/enable the RX queues being updated, but there is no
> API to do this at the moment in virtio, so use the big hammer. In
> practice it is likely not that big of a problem, as this will only
> happen when XDP is enabled/disabled; changing programs does not
> require the reset. There is some risk that the driver may either
> have an allocation failure or for some reason fail to correctly
> negotiate with the underlying backend; in this case the driver will
> be left uninitialized. I have not seen this ever happen on my test
> systems, and for what it's worth this same failure case can occur
> from probe and other contexts in the virtio framework.
>
> Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
> ---
>   drivers/net/virtio_net.c |  154 +++++++++++++++++++++++++++++++++++++---------
>   1 file changed, 125 insertions(+), 29 deletions(-)
>
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index 07f9076..52a18b8 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -42,6 +42,9 @@
>   #define GOOD_PACKET_LEN (ETH_HLEN + VLAN_HLEN + ETH_DATA_LEN)
>   #define GOOD_COPY_LEN	128
>   
> +/* Amount of XDP headroom to prepend to packets for use by xdp_adjust_head */
> +#define VIRTIO_XDP_HEADROOM 256
> +
>   /* RX packet size EWMA. The average packet size is used to determine the packet
>    * buffer size when refilling RX rings. As the entire RX ring may be refilled
>    * at once, the weight is chosen so that the EWMA will be insensitive to short-
> @@ -368,6 +371,7 @@ static bool virtnet_xdp_xmit(struct virtnet_info *vi,
>   	}
>   
>   	if (vi->mergeable_rx_bufs) {
> +		xdp->data -= sizeof(struct virtio_net_hdr_mrg_rxbuf);
>   		/* Zero header and leave csum up to XDP layers */
>   		hdr = xdp->data;
>   		memset(hdr, 0, vi->hdr_len);
> @@ -384,7 +388,9 @@ static bool virtnet_xdp_xmit(struct virtnet_info *vi,
>   		num_sg = 2;
>   		sg_init_table(sq->sg, 2);
>   		sg_set_buf(sq->sg, hdr, vi->hdr_len);
> -		skb_to_sgvec(skb, sq->sg + 1, 0, skb->len);
> +		skb_to_sgvec(skb, sq->sg + 1,
> +			     xdp->data - xdp->data_hard_start,
> +			     xdp->data_end - xdp->data);
>   	}
>   	err = virtqueue_add_outbuf(sq->vq, sq->sg, num_sg,
>   				   data, GFP_ATOMIC);
> @@ -412,7 +418,6 @@ static struct sk_buff *receive_small(struct net_device *dev,
>   	struct bpf_prog *xdp_prog;
>   
>   	len -= vi->hdr_len;
> -	skb_trim(skb, len);
>   
>   	rcu_read_lock();
>   	xdp_prog = rcu_dereference(rq->xdp_prog);
> @@ -424,12 +429,16 @@ static struct sk_buff *receive_small(struct net_device *dev,
>   		if (unlikely(hdr->hdr.gso_type || hdr->hdr.flags))
>   			goto err_xdp;
>   
> -		xdp.data = skb->data;
> +		xdp.data_hard_start = skb->data;
> +		xdp.data = skb->data + VIRTIO_XDP_HEADROOM;
>   		xdp.data_end = xdp.data + len;
>   		act = bpf_prog_run_xdp(xdp_prog, &xdp);
>   
>   		switch (act) {
>   		case XDP_PASS:
> +			/* Recalculate length in case bpf program changed it */
> +			__skb_pull(skb, xdp.data - xdp.data_hard_start);

But skb->len were trimmed to len below which seems wrong.

> +			len = xdp.data_end - xdp.data;
>   			break;
>   		case XDP_TX:
>   			if (unlikely(!virtnet_xdp_xmit(vi, rq, &xdp, skb)))
> @@ -446,6 +455,7 @@ static struct sk_buff *receive_small(struct net_device *dev,
>   	}
>   	rcu_read_unlock();
>   
> +	skb_trim(skb, len);
>   	return skb;
>   
>   err_xdp:
> @@ -494,7 +504,7 @@ static struct page *xdp_linearize_page(struct receive_queue *rq,
>   				       unsigned int *len)
>   {
>   	struct page *page = alloc_page(GFP_ATOMIC);
> -	unsigned int page_off = 0;
> +	unsigned int page_off = VIRTIO_XDP_HEADROOM;
>   
>   	if (!page)
>   		return NULL;
> @@ -530,7 +540,8 @@ static struct page *xdp_linearize_page(struct receive_queue *rq,
>   		put_page(p);
>   	}
>   
> -	*len = page_off;
> +	/* Headroom does not contribute to packet length */
> +	*len = page_off - VIRTIO_XDP_HEADROOM;
>   	return page;
>   err_buf:
>   	__free_pages(page, 0);
> @@ -569,7 +580,7 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
>   						      page, offset, &len);
>   			if (!xdp_page)
>   				goto err_xdp;
> -			offset = 0;
> +			offset = VIRTIO_XDP_HEADROOM;
>   		} else {
>   			xdp_page = page;
>   		}
> @@ -582,19 +593,30 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
>   		if (unlikely(hdr->hdr.gso_type))
>   			goto err_xdp;
>   
> +		/* Allow consuming headroom but reserve enough space to push
> +		 * the descriptor on if we get an XDP_TX return code.
> +		 */
>   		data = page_address(xdp_page) + offset;
> +		xdp.data_hard_start = data - VIRTIO_XDP_HEADROOM + vi->hdr_len;

Should be data - VIRTIO_XDP_HEADROOM I think?

>   		xdp.data = data + vi->hdr_len;
>   		xdp.data_end = xdp.data + (len - vi->hdr_len);
>   		act = bpf_prog_run_xdp(xdp_prog, &xdp);
>   
>   		switch (act) {
>   		case XDP_PASS:
> +			/* recalculate offset to account for any header
> +			 * adjustments. Note other cases do not build an
> +			 * skb and avoid using offset
> +			 */
> +			offset = xdp.data -
> +					page_address(xdp_page) - vi->hdr_len;
> +
>   			/* We can only create skb based on xdp_page. */
>   			if (unlikely(xdp_page != page)) {
>   				rcu_read_unlock();
>   				put_page(page);
>   				head_skb = page_to_skb(vi, rq, xdp_page,
> -						       0, len, PAGE_SIZE);
> +						       offset, len, PAGE_SIZE);
>   				ewma_pkt_len_add(&rq->mrg_avg_pkt_len, len);
>   				return head_skb;
>   			}
> @@ -761,23 +783,30 @@ static void receive_buf(struct virtnet_info *vi, struct receive_queue *rq,
>   	dev_kfree_skb(skb);
>   }
>   
> +static unsigned int virtnet_get_headroom(struct virtnet_info *vi)
> +{
> +	return vi->xdp_queue_pairs ? VIRTIO_XDP_HEADROOM : 0;
> +}
> +
>   static int add_recvbuf_small(struct virtnet_info *vi, struct receive_queue *rq,
>   			     gfp_t gfp)
>   {
> +	int headroom = GOOD_PACKET_LEN + virtnet_get_headroom(vi);
> +	unsigned int xdp_headroom = virtnet_get_headroom(vi);
>   	struct sk_buff *skb;
>   	struct virtio_net_hdr_mrg_rxbuf *hdr;
>   	int err;
>   
> -	skb = __netdev_alloc_skb_ip_align(vi->dev, GOOD_PACKET_LEN, gfp);
> +	skb = __netdev_alloc_skb_ip_align(vi->dev, headroom, gfp);
>   	if (unlikely(!skb))
>   		return -ENOMEM;
>   
> -	skb_put(skb, GOOD_PACKET_LEN);
> +	skb_put(skb, headroom);
>   
>   	hdr = skb_vnet_hdr(skb);
>   	sg_init_table(rq->sg, 2);
>   	sg_set_buf(rq->sg, hdr, vi->hdr_len);
> -	skb_to_sgvec(skb, rq->sg + 1, 0, skb->len);
> +	skb_to_sgvec(skb, rq->sg + 1, xdp_headroom, skb->len - xdp_headroom);
>   
>   	err = virtqueue_add_inbuf(rq->vq, rq->sg, 2, skb, gfp);
>   	if (err < 0)
> @@ -845,24 +874,27 @@ static unsigned int get_mergeable_buf_len(struct ewma_pkt_len *avg_pkt_len)
>   	return ALIGN(len, MERGEABLE_BUFFER_ALIGN);
>   }
>   
> -static int add_recvbuf_mergeable(struct receive_queue *rq, gfp_t gfp)
> +static int add_recvbuf_mergeable(struct virtnet_info *vi,
> +				 struct receive_queue *rq, gfp_t gfp)
>   {
>   	struct page_frag *alloc_frag = &rq->alloc_frag;
> +	unsigned int headroom = virtnet_get_headroom(vi);
>   	char *buf;
>   	unsigned long ctx;
>   	int err;
>   	unsigned int len, hole;
>   
>   	len = get_mergeable_buf_len(&rq->mrg_avg_pkt_len);
> -	if (unlikely(!skb_page_frag_refill(len, alloc_frag, gfp)))
> +	if (unlikely(!skb_page_frag_refill(len + headroom, alloc_frag, gfp)))
>   		return -ENOMEM;
>   
>   	buf = (char *)page_address(alloc_frag->page) + alloc_frag->offset;
> +	buf += headroom; /* advance address leaving hole at front of pkt */

Note: the headroom will reduce the possibility of frag coalescing, which
may hurt performance to some degree.

[...]
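
For context, a minimal sketch of an XDP program that exercises the new
headroom, in the style of the samples/bpf programs the series was tested
with (the duplicated Ethernet header is illustrative only, and the
bpf_xdp_adjust_head() declaration is assumed to come from
samples/bpf/bpf_helpers.h):

#include <linux/bpf.h>
#include <linux/if_ether.h>
#include "bpf_helpers.h"

SEC("xdp_adjust_demo")
int xdp_adjust_demo_prog(struct xdp_md *xdp)
{
	void *data, *data_end;
	struct ethhdr *new_eth, *old_eth;

	/* Grow the frame at the head; this fails unless the driver
	 * reserved headroom (256B with this series applied).
	 */
	if (bpf_xdp_adjust_head(xdp, 0 - (int)sizeof(struct ethhdr)))
		return XDP_DROP;

	/* Packet pointers must be reloaded after adjust_head(). */
	data = (void *)(long)xdp->data;
	data_end = (void *)(long)xdp->data_end;
	if (data + 2 * sizeof(struct ethhdr) > data_end)
		return XDP_DROP;

	/* Copy the original MAC header into the new front space. */
	new_eth = data;
	old_eth = data + sizeof(struct ethhdr);
	__builtin_memcpy(new_eth, old_eth, sizeof(*new_eth));

	return XDP_TX;
}

char _license[] SEC("license") = "GPL";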

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [net-next PATCH v2 0/5] XDP adjust head support for virtio
  2017-02-06  4:39   ` Michael S. Tsirkin
@ 2017-02-06  7:12     ` Jason Wang
  2017-02-06 16:37     ` David Miller
  1 sibling, 0 replies; 24+ messages in thread
From: Jason Wang @ 2017-02-06  7:12 UTC (permalink / raw)
  To: Michael S. Tsirkin, David Miller
  Cc: john.fastabend, kubakici, ast, john.r.fastabend, netdev



On 2017年02月06日 12:39, Michael S. Tsirkin wrote:
> On Sun, Feb 05, 2017 at 05:36:34PM -0500, David Miller wrote:
>> From: John Fastabend <john.fastabend@gmail.com>
>> Date: Thu, 02 Feb 2017 19:14:05 -0800
>>
>>> This series adds adjust head support for virtio. The following is my
>>> test setup. I use qemu + virtio as follows,
>>>
>>> ./x86_64-softmmu/qemu-system-x86_64 \
>>>    -hda /var/lib/libvirt/images/Fedora-test0.img \
>>>    -m 4096  -enable-kvm -smp 2 -netdev tap,id=hn0,queues=4,vhost=on \
>>>    -device virtio-net-pci,netdev=hn0,mq=on,guest_tso4=off,guest_tso6=off,guest_ecn=off,guest_ufo=off,vectors=9
>>>
>>> In order to use XDP with virtio until LRO is supported TSO must be
>>> turned off in the host. The important fields in the above command line
>>> are the following,
>>>
>>>    guest_tso4=off,guest_tso6=off,guest_ecn=off,guest_ufo=off
>>>
>>> Also note it is possible to consume more queues than can be supported
>>> because when XDP is enabled for retransmit XDP attempts to use a queue
>>> per cpu. My standard queue count is 'queues=4'.
>>>
>>> After loading the VM I run the relevant XDP test programs in,
>>>
>>>    ./samples/bpf
>>>
>>> For this series I tested xdp1, xdp2, and xdp_tx_iptunnel. I usually test
>>> with iperf (-d option to get bidirectional traffic), ping, and pktgen.
>>> I also have a modified xdp1 that returns XDP_PASS on any packet to ensure
>>> the normal traffic path to the stack continues to work with XDP loaded.
>>>
>>> It would be great to automate this soon. At the moment I do it by hand
>>> which is starting to get tedious.
>>>
>>> v2: original series dropped trace points after merge.
>> Michael, I just want to apply this right now.
>>
>> I don't think haggling over whether to allocate the adjust_head area
>> unconditionally or not is a blocker for this series going in.  That
>> can be addressed trivially in a follow-on patch.
> FYI it would just mean we revert most of this patchset except patches 2 and 3 though.
>
>> We want these new reset paths tested as much as possible and each day
>> we delay this series is detrimental to that goal.
>>
>> Thanks.
> Well the point is to avoid resets completely, at the cost of extra 256 bytes
> for packets > 128 bytes on ppc (64k pages) only.
>
> Found a volunteer so I hope to have this idea tested on ppc Tuesday.
>
> And really all we need to do is confirm whether this:
> -#define MERGEABLE_BUFFER_MIN_ALIGN_SHIFT ((PAGE_SHIFT + 1) / 2)
> +#define MERGEABLE_BUFFER_MIN_ALIGN_SHIFT (PAGE_SHIFT / 2 + 1)
>
> affects performance in a measurable way.

Ok, but we still need to drop some packets with this approach, I believe, and
does it work if we allow the size of the headroom to change in the future?

Thanks

>
> So I would rather wait another day. But the patches themselves
> look correct, from that POV.
>
> Acked-by: Michael S. Tsirkin <mst@redhat.com>
>
> but I would prefer that you waited another day for a Tested-by from me too.
>

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [net-next PATCH v2 0/5] XDP adjust head support for virtio
  2017-02-06  4:39   ` Michael S. Tsirkin
  2017-02-06  7:12     ` Jason Wang
@ 2017-02-06 16:37     ` David Miller
  1 sibling, 0 replies; 24+ messages in thread
From: David Miller @ 2017-02-06 16:37 UTC (permalink / raw)
  To: mst; +Cc: john.fastabend, kubakici, jasowang, ast, john.r.fastabend, netdev

From: "Michael S. Tsirkin" <mst@redhat.com>
Date: Mon, 6 Feb 2017 06:39:54 +0200

> Well the point is to avoid resets completely, at the cost of extra 256 bytes
> for packets > 128 bytes on ppc (64k pages) only.
> 
> Found a volunteer so I hope to have this idea tested on ppc Tuesday.
> 
> And really all we need to do is confirm whether this:
> -#define MERGEABLE_BUFFER_MIN_ALIGN_SHIFT ((PAGE_SHIFT + 1) / 2)
> +#define MERGEABLE_BUFFER_MIN_ALIGN_SHIFT (PAGE_SHIFT / 2 + 1)
> 
> affects performance in a measurable way.
> 
> So I would rather wait another day. But the patches themselves
> look correct, from that POV.
> 
> Acked-by: Michael S. Tsirkin <mst@redhat.com>
> 
> but I would prefer that you waited another day for a Tested-by from me too.

Ok.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [net-next PATCH v2 5/5] virtio_net: XDP support for adjust_head
  2017-02-06  7:08   ` Jason Wang
@ 2017-02-06 19:29     ` John Fastabend
  2017-02-07  2:23       ` Jason Wang
  0 siblings, 1 reply; 24+ messages in thread
From: John Fastabend @ 2017-02-06 19:29 UTC (permalink / raw)
  To: Jason Wang, kubakici, ast, mst; +Cc: john.r.fastabend, netdev

On 17-02-05 11:08 PM, Jason Wang wrote:
> 
> 
> On 2017年02月03日 11:16, John Fastabend wrote:
>> Add support for XDP adjust head by allocating a 256B header region
>> that XDP programs can grow into. This is only enabled when an XDP
>> program is loaded.
>>
>> In order to ensure that we do not have to unwind queue headroom, push
>> queue setup below bpf_prog_add. It reads better to do a prog ref
>> unwind vs another queue setup call.
>>
>> At the moment this code must do a full reset to ensure old buffers
>> without headroom on program add, or with headroom on program removal,
>> are not used incorrectly in the datapath. Ideally we would only
>> have to disable/enable the RX queues being updated, but there is no
>> API to do this at the moment in virtio, so use the big hammer. In
>> practice it is likely not that big of a problem, as this will only
>> happen when XDP is enabled/disabled; changing programs does not
>> require the reset. There is some risk that the driver may either
>> have an allocation failure or for some reason fail to correctly
>> negotiate with the underlying backend; in this case the driver will
>> be left uninitialized. I have not seen this ever happen on my test
>> systems, and for what it's worth this same failure case can occur
>> from probe and other contexts in the virtio framework.
>>
>> Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
>> ---


[...]

>> @@ -412,7 +418,6 @@ static struct sk_buff *receive_small(struct net_device *dev,
>>       struct bpf_prog *xdp_prog;
>>         len -= vi->hdr_len;
>> -    skb_trim(skb, len);
>>         rcu_read_lock();
>>       xdp_prog = rcu_dereference(rq->xdp_prog);
>> @@ -424,12 +429,16 @@ static struct sk_buff *receive_small(struct net_device
>> *dev,
>>           if (unlikely(hdr->hdr.gso_type || hdr->hdr.flags))
>>               goto err_xdp;
>>   -        xdp.data = skb->data;
>> +        xdp.data_hard_start = skb->data;
>> +        xdp.data = skb->data + VIRTIO_XDP_HEADROOM;
>>           xdp.data_end = xdp.data + len;
>>           act = bpf_prog_run_xdp(xdp_prog, &xdp);
>>             switch (act) {
>>           case XDP_PASS:
>> +            /* Recalculate length in case bpf program changed it */
>> +            __skb_pull(skb, xdp.data - xdp.data_hard_start);
> 
> But skb->len were trimmed to len below which seems wrong.

I believe this is correct and it passes my basic iperf/ping tests.

When we are using small buffers with XDP, skb->data points to the front
of the buffer. This space includes the XDP headroom. When we pass the skb up
to the stack we need to pull this off and point to the start of the data. But
there is still likely a bunch of room at the end of the buffer, assuming the
packet is smaller than the buffer size.
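
A quick sketch of the arithmetic with concrete numbers (assuming
VIRTIO_XDP_HEADROOM = 256, a 1000B frame, and a program that pushes a
14B header via bpf_xdp_adjust_head(&xdp, -14)):

	xdp.data_hard_start = skb->data;	/* front of the buffer         */
	xdp.data = skb->data + 256;		/* frame starts after headroom */
	/* adjust_head(-14) moves xdp.data to skb->data + 242 */
	__skb_pull(skb, xdp.data - xdp.data_hard_start);  /* pulls 242B */
	len = xdp.data_end - xdp.data;		/* 1000 + 14 = 1014B           */
	skb_trim(skb, len);			/* final skb->len is 1014      */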

> 
>> +            len = xdp.data_end - xdp.data;
>>               break;
>>           case XDP_TX:
>>               if (unlikely(!virtnet_xdp_xmit(vi, rq, &xdp, skb)))
>> @@ -446,6 +455,7 @@ static struct sk_buff *receive_small(struct net_device *dev,
>>       }
>>       rcu_read_unlock();
>>   +    skb_trim(skb, len);

So here we trim the packet to set the length to the actual payload size. The
'len' parameter passed into receive_small does not include the headroom, so
this gives us the correct length of the payload.

Make sense?

>>       return skb;
>>     err_xdp:

[...]

>> @@ -569,7 +580,7 @@ static struct sk_buff *receive_mergeable(struct net_device
>> *dev,
>>                                 page, offset, &len);
>>               if (!xdp_page)
>>                   goto err_xdp;
>> -            offset = 0;
>> +            offset = VIRTIO_XDP_HEADROOM;
>>           } else {
>>               xdp_page = page;
>>           }
>> @@ -582,19 +593,30 @@ static struct sk_buff *receive_mergeable(struct
>> net_device *dev,
>>           if (unlikely(hdr->hdr.gso_type))
>>               goto err_xdp;
>>   +        /* Allow consuming headroom but reserve enough space to push
>> +         * the descriptor on if we get an XDP_TX return code.
>> +         */
>>           data = page_address(xdp_page) + offset;
>> +        xdp.data_hard_start = data - VIRTIO_XDP_HEADROOM + vi->hdr_len;
> 
> Should be data - VIRTIO_XDP_HEADROOM I think?
> 

If the XDP program does an adjust_head() and then an XDP_TX, I want to ensure
we reserve enough headroom to push the header onto the buffer when the packet
is sent. So the additional hdr_len reserve here is intentional. Otherwise we
would need to detect this and do some type of linearize action.
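
Spelling the layout out (a sketch, with VIRTIO_XDP_HEADROOM = 256 and
data pointing at the vnet header in the page):

	start of headroom:    data - 256
	xdp.data_hard_start = data - 256 + vi->hdr_len
	xdp.data =            data + vi->hdr_len

So even if the program pulls xdp.data all the way down to
xdp.data_hard_start, virtnet_xdp_xmit() can still push the vnet header
back on with "xdp->data -= vi->hdr_len" and stay inside the buffer.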

>>           xdp.data = data + vi->hdr_len;
>>           xdp.data_end = xdp.data + (len - vi->hdr_len);
>>           act = bpf_prog_run_xdp(xdp_prog, &xdp);
>>             switch (act) {
>>           case XDP_PASS:
>> +            /* recalculate offset to account for any header
>> +             * adjustments. Note other cases do not build an
>> +             * skb and avoid using offset
>> +             */
>> +            offset = xdp.data -
>> +                    page_address(xdp_page) - vi->hdr_len;
>> +
>>               /* We can only create skb based on xdp_page. */
>>               if (unlikely(xdp_page != page)) {
>>                   rcu_read_unlock();
>>                   put_page(page);
>>                   head_skb = page_to_skb(vi, rq, xdp_page,
>> -                               0, len, PAGE_SIZE);
>> +                               offset, len, PAGE_SIZE);
>>                   ewma_pkt_len_add(&rq->mrg_avg_pkt_len, len);
>>                   return head_skb;

[...]

>>   -static int add_recvbuf_mergeable(struct receive_queue *rq, gfp_t gfp)
>> +static int add_recvbuf_mergeable(struct virtnet_info *vi,
>> +                 struct receive_queue *rq, gfp_t gfp)
>>   {
>>       struct page_frag *alloc_frag = &rq->alloc_frag;
>> +    unsigned int headroom = virtnet_get_headroom(vi);
>>       char *buf;
>>       unsigned long ctx;
>>       int err;
>>       unsigned int len, hole;
>>         len = get_mergeable_buf_len(&rq->mrg_avg_pkt_len);
>> -    if (unlikely(!skb_page_frag_refill(len, alloc_frag, gfp)))
>> +    if (unlikely(!skb_page_frag_refill(len + headroom, alloc_frag, gfp)))
>>           return -ENOMEM;
>>         buf = (char *)page_address(alloc_frag->page) + alloc_frag->offset;
>> +    buf += headroom; /* advance address leaving hole at front of pkt */
> 
> Note: the headroom will reduce the possibility of frag coalescing, which may
> hurt performance to some degree.
> 
> [...]

Right, there are a few other performance optimizations I am looking at in
virtio as well, but those should go in as a follow-on series.

Specifically, I'm looking at recycling buffers to see what sort of performance
increase we can get out of that. Many of the hardware drivers do this and see
a performance boost from it. However, dynamic buffer sizes like this make it a
bit challenging.

.John

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [net-next PATCH v2 5/5] virtio_net: XDP support for adjust_head
  2017-02-06 19:29     ` John Fastabend
@ 2017-02-07  2:23       ` Jason Wang
  0 siblings, 0 replies; 24+ messages in thread
From: Jason Wang @ 2017-02-07  2:23 UTC (permalink / raw)
  To: John Fastabend, kubakici, ast, mst; +Cc: john.r.fastabend, netdev



On 2017年02月07日 03:29, John Fastabend wrote:
> On 17-02-05 11:08 PM, Jason Wang wrote:
>>
>> On 2017年02月03日 11:16, John Fastabend wrote:
>>> Add support for XDP adjust head by allocating a 256B header region
>>> that XDP programs can grow into. This is only enabled when an XDP
>>> program is loaded.
>>>
>>> In order to ensure that we do not have to unwind queue headroom, push
>>> queue setup below bpf_prog_add. It reads better to do a prog ref
>>> unwind vs another queue setup call.
>>>
>>> At the moment this code must do a full reset to ensure old buffers
>>> without headroom on program add, or with headroom on program removal,
>>> are not used incorrectly in the datapath. Ideally we would only
>>> have to disable/enable the RX queues being updated, but there is no
>>> API to do this at the moment in virtio, so use the big hammer. In
>>> practice it is likely not that big of a problem, as this will only
>>> happen when XDP is enabled/disabled; changing programs does not
>>> require the reset. There is some risk that the driver may either
>>> have an allocation failure or for some reason fail to correctly
>>> negotiate with the underlying backend; in this case the driver will
>>> be left uninitialized. I have not seen this ever happen on my test
>>> systems, and for what it's worth this same failure case can occur
>>> from probe and other contexts in the virtio framework.
>>>
>>> Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
>>> ---
>
> [...]
>
>>> @@ -412,7 +418,6 @@ static struct sk_buff *receive_small(struct net_device *dev,
>>>        struct bpf_prog *xdp_prog;
>>>          len -= vi->hdr_len;
>>> -    skb_trim(skb, len);
>>>          rcu_read_lock();
>>>        xdp_prog = rcu_dereference(rq->xdp_prog);
>>> @@ -424,12 +429,16 @@ static struct sk_buff *receive_small(struct net_device
>>> *dev,
>>>            if (unlikely(hdr->hdr.gso_type || hdr->hdr.flags))
>>>                goto err_xdp;
>>>    -        xdp.data = skb->data;
>>> +        xdp.data_hard_start = skb->data;
>>> +        xdp.data = skb->data + VIRTIO_XDP_HEADROOM;
>>>            xdp.data_end = xdp.data + len;
>>>            act = bpf_prog_run_xdp(xdp_prog, &xdp);
>>>              switch (act) {
>>>            case XDP_PASS:
>>> +            /* Recalculate length in case bpf program changed it */
>>> +            __skb_pull(skb, xdp.data - xdp.data_hard_start);
>> But skb->len were trimmed to len below which seems wrong.
> I believe this is correct and it passes my basic iperf/ping tests.
>
> When we are using small buffers with XDP, skb->data points to the front
> of the buffer. This space includes the XDP headroom. When we pass the skb up
> to the stack we need to pull this off and point to the start of the data. But
> there is still likely a bunch of room at the end of the buffer, assuming the
> packet is smaller than the buffer size.
>
>>> +            len = xdp.data_end - xdp.data;
>>>                break;
>>>            case XDP_TX:
>>>                if (unlikely(!virtnet_xdp_xmit(vi, rq, &xdp, skb)))
>>> @@ -446,6 +455,7 @@ static struct sk_buff *receive_small(struct net_device *dev,
>>>        }
>>>        rcu_read_unlock();
>>>    +    skb_trim(skb, len);
> So here we trim the packet to set the length to the actual payload size. The
> 'len' parameter passed into receive_small does not include the headroom, so
> this gives us the correct length of the payload.
>
> Make sense?

Yes, you are right.

>
>>>        return skb;
>>>      err_xdp:
> [...]
>
>>> @@ -569,7 +580,7 @@ static struct sk_buff *receive_mergeable(struct net_device
>>> *dev,
>>>                                  page, offset, &len);
>>>                if (!xdp_page)
>>>                    goto err_xdp;
>>> -            offset = 0;
>>> +            offset = VIRTIO_XDP_HEADROOM;
>>>            } else {
>>>                xdp_page = page;
>>>            }
>>> @@ -582,19 +593,30 @@ static struct sk_buff *receive_mergeable(struct
>>> net_device *dev,
>>>            if (unlikely(hdr->hdr.gso_type))
>>>                goto err_xdp;
>>>    +        /* Allow consuming headroom but reserve enough space to push
>>> +         * the descriptor on if we get an XDP_TX return code.
>>> +         */
>>>            data = page_address(xdp_page) + offset;
>>> +        xdp.data_hard_start = data - VIRTIO_XDP_HEADROOM + vi->hdr_len;
>> Should be data - VIRTIO_XDP_HEADROOM I think?
>>
> If the XDP program does an adjust_head() and then an XDP_TX, I want to ensure
> we reserve enough headroom to push the header onto the buffer when the packet
> is sent. So the additional hdr_len reserve here is intentional. Otherwise we
> would need to detect this and do some type of linearize action.

I get the point.

>
>>>            xdp.data = data + vi->hdr_len;
>>>            xdp.data_end = xdp.data + (len - vi->hdr_len);
>>>            act = bpf_prog_run_xdp(xdp_prog, &xdp);
>>>              switch (act) {
>>>            case XDP_PASS:
>>> +            /* recalculate offset to account for any header
>>> +             * adjustments. Note other cases do not build an
>>> +             * skb and avoid using offset
>>> +             */
>>> +            offset = xdp.data -
>>> +                    page_address(xdp_page) - vi->hdr_len;
>>> +
>>>                /* We can only create skb based on xdp_page. */
>>>                if (unlikely(xdp_page != page)) {
>>>                    rcu_read_unlock();
>>>                    put_page(page);
>>>                    head_skb = page_to_skb(vi, rq, xdp_page,
>>> -                               0, len, PAGE_SIZE);
>>> +                               offset, len, PAGE_SIZE);
>>>                    ewma_pkt_len_add(&rq->mrg_avg_pkt_len, len);
>>>                    return head_skb;
> [...]
>
>>>    -static int add_recvbuf_mergeable(struct receive_queue *rq, gfp_t gfp)
>>> +static int add_recvbuf_mergeable(struct virtnet_info *vi,
>>> +                 struct receive_queue *rq, gfp_t gfp)
>>>    {
>>>        struct page_frag *alloc_frag = &rq->alloc_frag;
>>> +    unsigned int headroom = virtnet_get_headroom(vi);
>>>        char *buf;
>>>        unsigned long ctx;
>>>        int err;
>>>        unsigned int len, hole;
>>>          len = get_mergeable_buf_len(&rq->mrg_avg_pkt_len);
>>> -    if (unlikely(!skb_page_frag_refill(len, alloc_frag, gfp)))
>>> +    if (unlikely(!skb_page_frag_refill(len + headroom, alloc_frag, gfp)))
>>>            return -ENOMEM;
>>>          buf = (char *)page_address(alloc_frag->page) + alloc_frag->offset;
>>> +    buf += headroom; /* advance address leaving hole at front of pkt */
>> Note: the headroom will reduce the possibility of frag coalescing, which may
>> hurt performance to some degree.
>>
>> [...]
> Right, there are a few other performance optimizations I am looking at in
> virtio as well, but those should go in as a follow-on series.
>
> Specifically, I'm looking at recycling buffers to see what sort of performance
> increase we can get out of that. Many of the hardware drivers do this and see
> a performance boost from it. However, dynamic buffer sizes like this make it a
> bit challenging.
>
> .John

Right.

Acked-by: Jason Wang <jasowang@redhat.com>

Thanks

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [net-next PATCH v2 0/5] XDP adjust head support for virtio
  2017-02-03  3:14 [net-next PATCH v2 0/5] XDP adjust head support for virtio John Fastabend
                   ` (7 preceding siblings ...)
  2017-02-05 22:36 ` David Miller
@ 2017-02-07  4:15 ` Michael S. Tsirkin
  2017-02-07 15:05   ` David Miller
  2017-02-08 16:39   ` John Fastabend
  8 siblings, 2 replies; 24+ messages in thread
From: Michael S. Tsirkin @ 2017-02-07  4:15 UTC (permalink / raw)
  To: John Fastabend; +Cc: kubakici, jasowang, ast, john.r.fastabend, netdev

On Thu, Feb 02, 2017 at 07:14:05PM -0800, John Fastabend wrote:
> This series adds adjust head support for virtio. The following is my
> test setup. I use qemu + virtio as follows,
> 
> ./x86_64-softmmu/qemu-system-x86_64 \
>   -hda /var/lib/libvirt/images/Fedora-test0.img \
>   -m 4096  -enable-kvm -smp 2 -netdev tap,id=hn0,queues=4,vhost=on \
>   -device virtio-net-pci,netdev=hn0,mq=on,guest_tso4=off,guest_tso6=off,guest_ecn=off,guest_ufo=off,vectors=9
> 
> In order to use XDP with virtio until LRO is supported TSO must be
> turned off in the host. The important fields in the above command line
> are the following,
> 
>   guest_tso4=off,guest_tso6=off,guest_ecn=off,guest_ufo=off
> 
> Also note it is possible to consume more queues than can be supported
> because when XDP is enabled for retransmit XDP attempts to use a queue
> per cpu. My standard queue count is 'queues=4'.
> 
> After loading the VM I run the relevant XDP test programs in,
> 
>   ./samples/bpf
> 
> For this series I tested xdp1, xdp2, and xdp_tx_iptunnel. I usually test
> with iperf (-d option to get bidirectional traffic), ping, and pktgen.
> I also have a modified xdp1 that returns XDP_PASS on any packet to ensure
> the normal traffic path to the stack continues to work with XDP loaded.
> 
> It would be great to automate this soon. At the moment I do it by hand,
> which is starting to get tedious.
> 
> v2: original series dropped trace points after merge.

So I'd say ok, let's go ahead and merge this for now.

However, I came up with a new idea for the future and I'd like to show
where I'm going.  The idea is that we don't use s/g buffers on RX, so we
have a pointer per descriptor untapped.  So we can allow users to stick
their own pointer in there, if they promise not to use s/g on this vq.
With a full extra pointer to play with, we can go wild.

Take a look but it doesn't even build yet.
Need to roll it out to all devices etc.

--->

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

--

diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index 409aeaa..b59e95e 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -263,6 +263,7 @@ static inline int virtqueue_add(struct virtqueue *_vq,
 				unsigned int out_sgs,
 				unsigned int in_sgs,
 				void *data,
+				void *ctx,
 				gfp_t gfp)
 {
 	struct vring_virtqueue *vq = to_vvq(_vq);
@@ -275,6 +276,7 @@ static inline int virtqueue_add(struct virtqueue *_vq,
 	START_USE(vq);
 
 	BUG_ON(data == NULL);
+	BUG_ON(ctx && vq->indirect);
 
 	if (unlikely(vq->broken)) {
 		END_USE(vq);
@@ -389,6 +391,8 @@ static inline int virtqueue_add(struct virtqueue *_vq,
 	vq->desc_state[head].data = data;
 	if (indirect)
 		vq->desc_state[head].indir_desc = desc;
+	if (ctx)
+		vq->desc_state[head].indir_desc = ctx;
 
 	/* Put entry in available array (but don't update avail->idx until they
 	 * do sync). */
@@ -461,7 +465,8 @@ int virtqueue_add_sgs(struct virtqueue *_vq,
 		for (sg = sgs[i]; sg; sg = sg_next(sg))
 			total_sg++;
 	}
-	return virtqueue_add(_vq, sgs, total_sg, out_sgs, in_sgs, data, gfp);
+	return virtqueue_add(_vq, sgs, total_sg, out_sgs, in_sgs,
+			     data, NULL, gfp);
 }
 EXPORT_SYMBOL_GPL(virtqueue_add_sgs);
 
@@ -483,7 +488,7 @@ int virtqueue_add_outbuf(struct virtqueue *vq,
 			 void *data,
 			 gfp_t gfp)
 {
-	return virtqueue_add(vq, &sg, num, 1, 0, data, gfp);
+	return virtqueue_add(vq, &sg, num, 1, 0, data, NULL, gfp);
 }
 EXPORT_SYMBOL_GPL(virtqueue_add_outbuf);
 
@@ -505,7 +510,31 @@ int virtqueue_add_inbuf(struct virtqueue *vq,
 			void *data,
 			gfp_t gfp)
 {
-	return virtqueue_add(vq, &sg, num, 0, 1, data, gfp);
+	return virtqueue_add(vq, &sg, num, 0, 1, data, NULL, gfp);
+}
+EXPORT_SYMBOL_GPL(virtqueue_add_inbuf);
+
+/**
+ * virtqueue_add_inbuf_ctx - expose input buffers to other end
+ * @vq: the struct virtqueue we're talking about.
+ * @sg: scatterlist (must be well-formed and terminated!)
+ * @num: the number of entries in @sg writable by other side
+ * @data: the token identifying the buffer.
+ * @ctx: extra context for the token
+ * @gfp: how to do memory allocations (if necessary).
+ *
+ * Caller must ensure we don't call this with other virtqueue operations
+ * at the same time (except where noted).
+ *
+ * Returns zero or a negative error (ie. ENOSPC, ENOMEM, EIO).
+ */
+int virtqueue_add_inbuf_ctx(struct virtqueue *vq,
+			struct scatterlist *sg, unsigned int num,
+			void *data,
+			void *ctx,
+			gfp_t gfp)
+{
+	return virtqueue_add(vq, &sg, num, 0, 1, data, ctx, gfp);
 }
-EXPORT_SYMBOL_GPL(virtqueue_add_inbuf);
+EXPORT_SYMBOL_GPL(virtqueue_add_inbuf_ctx);
 
@@ -598,7 +627,8 @@ bool virtqueue_kick(struct virtqueue *vq)
 }
 EXPORT_SYMBOL_GPL(virtqueue_kick);
 
-static void detach_buf(struct vring_virtqueue *vq, unsigned int head)
+static void detach_buf(struct vring_virtqueue *vq, unsigned int head,
+		       void **ctx)
 {
 	unsigned int i, j;
 	__virtio16 nextflag = cpu_to_virtio16(vq->vq.vdev, VRING_DESC_F_NEXT);
@@ -623,7 +653,10 @@ static void detach_buf(struct vring_virtqueue *vq, unsigned int head)
 	vq->vq.num_free++;
 
 	/* Free the indirect table, if any, now that it's unmapped. */
-	if (vq->desc_state[head].indir_desc) {
+	if (!vq->desc_state[head].indir_desc)
+		return;
+
+	if (vq->indirect) {
 		struct vring_desc *indir_desc = vq->desc_state[head].indir_desc;
 		u32 len = virtio32_to_cpu(vq->vq.vdev, vq->vring.desc[head].len);
 
@@ -635,8 +668,10 @@ static void detach_buf(struct vring_virtqueue *vq, unsigned int head)
 			vring_unmap_one(vq, &indir_desc[j]);
 
 		kfree(vq->desc_state[head].indir_desc);
-		vq->desc_state[head].indir_desc = NULL;
+	} else if (ctx) {
+		*ctx = vq->desc_state[head].indir_desc;
 	}
+	vq->desc_state[head].indir_desc = NULL;
 }
 
 static inline bool more_used(const struct vring_virtqueue *vq)
@@ -660,7 +695,8 @@ static inline bool more_used(const struct vring_virtqueue *vq)
  * Returns NULL if there are no used buffers, or the "data" token
  * handed to virtqueue_add_*().
  */
-void *virtqueue_get_buf(struct virtqueue *_vq, unsigned int *len)
+void *virtqueue_get_buf_ctx(struct virtqueue *_vq, unsigned int *len,
+			    void **ctx)
 {
 	struct vring_virtqueue *vq = to_vvq(_vq);
 	void *ret;
@@ -698,7 +734,7 @@ void *virtqueue_get_buf(struct virtqueue *_vq, unsigned int *len)
 
 	/* detach_buf clears data, so grab it now. */
 	ret = vq->desc_state[i].data;
-	detach_buf(vq, i);
+	detach_buf(vq, i, ctx);
 	vq->last_used_idx++;
 	/* If we expect an interrupt for the next entry, tell host
 	 * by writing event index and flush out the write before
@@ -715,8 +751,13 @@ void *virtqueue_get_buf(struct virtqueue *_vq, unsigned int *len)
 	END_USE(vq);
 	return ret;
 }
-EXPORT_SYMBOL_GPL(virtqueue_get_buf);
+EXPORT_SYMBOL_GPL(virtqueue_get_buf_ctx);
 
+void *virtqueue_get_buf(struct virtqueue *_vq, unsigned int *len)
+{
+	return virtqueue_get_buf_ctx(_vq, len, NULL);
+}
+EXPORT_SYMBOL_GPL(virtqueue_get_buf);
 /**
  * virtqueue_disable_cb - disable callbacks
  * @vq: the struct virtqueue we're talking about.
@@ -870,6 +911,7 @@ void *virtqueue_detach_unused_buf(struct virtqueue *_vq)
 	struct vring_virtqueue *vq = to_vvq(_vq);
 	unsigned int i;
 	void *buf;
+	void *ctx;
 
 	START_USE(vq);
 
@@ -878,7 +920,7 @@ void *virtqueue_detach_unused_buf(struct virtqueue *_vq)
 			continue;
 		/* detach_buf clears data, so grab it now. */
 		buf = vq->desc_state[i].data;
-		detach_buf(vq, i);
+		detach_buf(vq, i, NULL);
 		vq->avail_idx_shadow--;
 		vq->vring.avail->idx = cpu_to_virtio16(_vq->vdev, vq->avail_idx_shadow);
 		END_USE(vq);
@@ -916,6 +958,7 @@ struct virtqueue *__vring_new_virtqueue(unsigned int index,
 					struct vring vring,
 					struct virtio_device *vdev,
 					bool weak_barriers,
+					bool context,
 					bool (*notify)(struct virtqueue *),
 					void (*callback)(struct virtqueue *),
 					const char *name)
@@ -950,7 +993,8 @@ struct virtqueue *__vring_new_virtqueue(unsigned int index,
 	vq->last_add_time_valid = false;
 #endif
 
-	vq->indirect = virtio_has_feature(vdev, VIRTIO_RING_F_INDIRECT_DESC);
+	vq->indirect = virtio_has_feature(vdev, VIRTIO_RING_F_INDIRECT_DESC) &&
+		!context;
 	vq->event = virtio_has_feature(vdev, VIRTIO_RING_F_EVENT_IDX);
 
 	/* No callback?  Tell other side not to bother us. */
@@ -1019,6 +1063,7 @@ struct virtqueue *vring_create_virtqueue(
 	struct virtio_device *vdev,
 	bool weak_barriers,
 	bool may_reduce_num,
+	bool context,
 	bool (*notify)(struct virtqueue *),
 	void (*callback)(struct virtqueue *),
 	const char *name)
@@ -1058,7 +1103,7 @@ struct virtqueue *vring_create_virtqueue(
 	queue_size_in_bytes = vring_size(num, vring_align);
 	vring_init(&vring, num, queue, vring_align);
 
-	vq = __vring_new_virtqueue(index, vring, vdev, weak_barriers,
+	vq = __vring_new_virtqueue(index, vring, vdev, weak_barriers, context,
 				   notify, callback, name);
 	if (!vq) {
 		vring_free_queue(vdev, queue_size_in_bytes, queue,
@@ -1079,6 +1124,7 @@ struct virtqueue *vring_new_virtqueue(unsigned int index,
 				      unsigned int vring_align,
 				      struct virtio_device *vdev,
 				      bool weak_barriers,
+				      bool context,
 				      void *pages,
 				      bool (*notify)(struct virtqueue *vq),
 				      void (*callback)(struct virtqueue *vq),
@@ -1086,7 +1132,7 @@ struct virtqueue *vring_new_virtqueue(unsigned int index,
 {
 	struct vring vring;
 	vring_init(&vring, num, pages, vring_align);
-	return __vring_new_virtqueue(index, vring, vdev, weak_barriers,
+	return __vring_new_virtqueue(index, vring, vdev, weak_barriers, context,
 				     notify, callback, name);
 }
 EXPORT_SYMBOL_GPL(vring_new_virtqueue);
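
For illustration, driver-side usage of the proposed API might look like
the sketch below (my_post_rx_buf and my_receive are hypothetical names;
the vq must have been created with the new 'context' flag so indirect
descriptors are off, and only single-entry buffers may be posted):

static int my_post_rx_buf(struct virtqueue *vq, void *buf,
			  unsigned int len, void *my_ctx)
{
	struct scatterlist sg;

	sg_init_one(&sg, buf, len);
	/* data token identifies the buffer; ctx rides in the spare
	 * per-descriptor pointer */
	return virtqueue_add_inbuf_ctx(vq, &sg, 1, buf, my_ctx, GFP_ATOMIC);
}

static void *my_receive(struct virtqueue *vq, unsigned int *len,
			void **my_ctx)
{
	/* the extra pointer comes back exactly as stored at add time */
	return virtqueue_get_buf_ctx(vq, len, my_ctx);
}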

-- 
MST

^ permalink raw reply related	[flat|nested] 24+ messages in thread

* Re: [net-next PATCH v2 0/5] XDP adjust head support for virtio
  2017-02-07  4:15 ` Michael S. Tsirkin
@ 2017-02-07 15:05   ` David Miller
  2017-02-08 16:39   ` John Fastabend
  1 sibling, 0 replies; 24+ messages in thread
From: David Miller @ 2017-02-07 15:05 UTC (permalink / raw)
  To: mst; +Cc: john.fastabend, kubakici, jasowang, ast, john.r.fastabend, netdev

From: "Michael S. Tsirkin" <mst@redhat.com>
Date: Tue, 7 Feb 2017 06:15:13 +0200

> On Thu, Feb 02, 2017 at 07:14:05PM -0800, John Fastabend wrote:
>> This series adds adjust head support for virtio. The following is my
>> test setup. I use qemu + virtio as follows,
>> 
>> ./x86_64-softmmu/qemu-system-x86_64 \
>>   -hda /var/lib/libvirt/images/Fedora-test0.img \
>>   -m 4096  -enable-kvm -smp 2 -netdev tap,id=hn0,queues=4,vhost=on \
>>   -device virtio-net-pci,netdev=hn0,mq=on,guest_tso4=off,guest_tso6=off,guest_ecn=off,guest_ufo=off,vectors=9
>> 
>> In order to use XDP with virtio until LRO is supported TSO must be
>> turned off in the host. The important fields in the above command line
>> are the following,
>> 
>>   guest_tso4=off,guest_tso6=off,guest_ecn=off,guest_ufo=off
>> 
>> Also note it is possible to consume more queues than can be supported,
>> because when XDP is enabled, XDP attempts to use one transmit queue
>> per CPU for XDP_TX. My standard queue count is 'queues=4'.
>> 
>> After loading the VM I run the relevant XDP test programs in,
>> 
>>   ./samples/bpf
>> 
>> For this series I tested xdp1, xdp2, and xdp_tx_iptunnel. I usually test
>> with iperf (-d option to get bidirectional traffic), ping, and pktgen.
>> I also have a modified xdp1 that returns XDP_PASS on any packet to ensure
>> the normal traffic path to the stack continues to work with XDP loaded.
>> 
>> It would be great to automate this soon. At the moment I do it by hand,
>> which is starting to get tedious.
>> 
>> v2: original series dropped trace points after merge.
> 
> So I'd say ok, let's go ahead and merge this for now.

Ok, done.  Thanks everyone.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [net-next PATCH v2 0/5] XDP adjust head support for virtio
  2017-02-07  4:15 ` Michael S. Tsirkin
  2017-02-07 15:05   ` David Miller
@ 2017-02-08 16:39   ` John Fastabend
  2017-02-08 16:50     ` Michael S. Tsirkin
  1 sibling, 1 reply; 24+ messages in thread
From: John Fastabend @ 2017-02-08 16:39 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: kubakici, jasowang, ast, john.r.fastabend, netdev

[...]

> However, I came up with a new idea for the future and I'd like to show
> where I'm going.  The idea is that we don't use s/g buffers on RX, so we
> have a pointer per descriptor untapped.  So we can allow users to stick
> their own pointer in there, if they promise not to use s/g on this vq.
> With a full extra pointer to play with, we can go wild.

I looked at this quickly; it seems like it would work and would allow us
to avoid the reset. However, it seems like a lot of churn to avoid a
single reset. I don't see the reset itself as being that bad an
operation, though I agree it is not ideal.

Are there any other use cases for this other than XDP?

> 
> Take a look but it doesn't even build yet.
> Need to roll it out to all devices etc.
> 
> --->
> 

[...]

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [net-next PATCH v2 0/5] XDP adjust head support for virtio
  2017-02-08 16:39   ` John Fastabend
@ 2017-02-08 16:50     ` Michael S. Tsirkin
  0 siblings, 0 replies; 24+ messages in thread
From: Michael S. Tsirkin @ 2017-02-08 16:50 UTC (permalink / raw)
  To: John Fastabend; +Cc: kubakici, jasowang, ast, john.r.fastabend, netdev

On Wed, Feb 08, 2017 at 08:39:13AM -0800, John Fastabend wrote:
> [...]
> 
> > However, I came up with a new idea for the future and I'd like to show
> > where I'm going.  The idea is that we don't use s/g buffers on RX, so we
> > have a pointer per descriptor untapped.  So we can allow users to stick
> > their own pointer in there, if they promise not to use s/g on this vq.
> > With a full extra pointer to play with, we can go wild.
> 
> I looked at this quickly; it seems like it would work and would allow us
> to avoid the reset. However, it seems like a lot of churn to avoid a
> single reset. I don't see the reset itself as being that bad an
> operation, though I agree it is not ideal.
> 
> Are there any other use cases for this other than XDP?

Well, in fact this would allow reducing MERGEABLE_BUFFER_ALIGN to
L1_CACHE_BYTES, so we would save space per packet for regular
networking.

The idea to use build_skb would also benefit accordingly.

I guess ndo_set_rx_headroom could benefit if we were to implement that.
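
For instance (a hypothetical sketch building on the _ctx API above, not
a posted patch): the mergeable receive path currently packs truesize
into the low bits of the data token, which is what forces the 256-byte
MERGEABLE_BUFFER_ALIGN; with a spare ctx pointer the truesize can
travel separately and buffers need only cache-line alignment:

static int post_mergeable_buf(struct virtqueue *vq, char *buf,
			      unsigned int len)
{
	struct scatterlist sg;

	sg_init_one(&sg, buf, len);
	/* truesize rides in ctx instead of the token's low bits */
	return virtqueue_add_inbuf_ctx(vq, &sg, 1, buf,
				       (void *)(unsigned long)len,
				       GFP_ATOMIC);
}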



> > 
> > Take a look but it doesn't even build yet.
> > Need to roll it out to all devices etc.
> > 
> > --->
> > 
> 
> [...]

^ permalink raw reply	[flat|nested] 24+ messages in thread

end of thread, other threads:[~2017-02-08 16:58 UTC | newest]

Thread overview: 24+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-02-03  3:14 [net-next PATCH v2 0/5] XDP adjust head support for virtio John Fastabend
2017-02-03  3:14 ` [net-next PATCH v2 1/5] virtio_net: wrap rtnl_lock in test for calling with lock already held John Fastabend
2017-02-06  6:48   ` Jason Wang
2017-02-03  3:15 ` [net-next PATCH v2 2/5] virtio_net: factor out xdp handler for readability John Fastabend
2017-02-06  6:49   ` Jason Wang
2017-02-03  3:15 ` [net-next PATCH v2 3/5] virtio_net: remove duplicate queue pair binding in XDP John Fastabend
2017-02-06  7:06   ` Jason Wang
2017-02-03  3:16 ` [net-next PATCH v2 4/5] virtio_net: refactor freeze/restore logic into virtnet reset logic John Fastabend
2017-02-06  7:07   ` Jason Wang
2017-02-03  3:16 ` [net-next PATCH v2 5/5] virtio_net: XDP support for adjust_head John Fastabend
2017-02-03  4:04   ` Michael S. Tsirkin
2017-02-06  7:08   ` Jason Wang
2017-02-06 19:29     ` John Fastabend
2017-02-07  2:23       ` Jason Wang
2017-02-03  3:29 ` [net-next PATCH v2 0/5] XDP adjust head support for virtio Alexei Starovoitov
2017-02-03  3:55 ` Jakub Kicinski
2017-02-05 22:36 ` David Miller
2017-02-06  4:39   ` Michael S. Tsirkin
2017-02-06  7:12     ` Jason Wang
2017-02-06 16:37     ` David Miller
2017-02-07  4:15 ` Michael S. Tsirkin
2017-02-07 15:05   ` David Miller
2017-02-08 16:39   ` John Fastabend
2017-02-08 16:50     ` Michael S. Tsirkin
