* [net PATCH v5 0/6] virtio_net XDP fixes and adjust_header support
@ 2017-01-17 22:19 John Fastabend
  2017-01-17 22:19 ` [net PATCH v5 1/6] virtio_net: use dev_kfree_skb for small buffer XDP receive John Fastabend
                   ` (6 more replies)
  0 siblings, 7 replies; 40+ messages in thread
From: John Fastabend @ 2017-01-17 22:19 UTC (permalink / raw)
  To: jasowang, mst
  Cc: john.r.fastabend, netdev, john.fastabend, alexei.starovoitov, daniel

This series has a fix to handle the small buffer free logic correctly
and also adds adjust_head support.

I pushed adjust_head at net (even though it's rc3) to avoid having
to push another exception case into virtio_net to catch if the
program uses adjust_head and then block it. If there are any strong
objections to this we can push it at net-next and use a patch from
Jakub to add the exception handling, but then user space has to deal
with it either via try/fail logic or via kernel version checks. Granted,
we already have some cases that need to be configured to enable XDP,
but I don't see any reason to add yet another one when we can fix it
now vs. delaying a kernel version.


v2: fix spelling error, convert unsigned -> unsigned int
v3: git crashed while sending v2, so retrying; sorry for the noise
v4: changed layout of rtnl_lock fixes (Stephen)
    moved reset logic into virtio core with new patch (MST)
    fixed up linearize and some code cleanup (Jason)

    Otherwise did some generic code cleanup, so it might be a bit
    cleaner this time; at least that is the hope.
v5: fixed rtnl_lock issue (DaveM)

    In order to fix the rtnl_lock issue and also to address Jason's
    comment questioning the need for a generic virtio_device_reset
    routine, I exported some virtio core routines and then wrote a
    virtio_net reset routine. This is the cleanest solution I
    came up with today, and I do not at this time have any need
    for a more generic reset. If folks don't like this I could
    revert back to the v3 variant, but Stephen pointed out that the
    pattern used there is also not ideal.

Thanks for the review.

---

John Fastabend (6):
      virtio_net: use dev_kfree_skb for small buffer XDP receive
      virtio_net: wrap rtnl_lock in test for calling with lock already held
      virtio_net: factor out xdp handler for readability
      virtio_net: remove duplicate queue pair binding in XDP
      virtio_net: refactor freeze/restore logic into virtnet reset logic
      virtio_net: XDP support for adjust_head


 drivers/net/virtio_net.c |  332 ++++++++++++++++++++++++++++++----------------
 drivers/virtio/virtio.c  |   42 +++---
 include/linux/virtio.h   |    4 +
 3 files changed, 247 insertions(+), 131 deletions(-)

^ permalink raw reply	[flat|nested] 40+ messages in thread

* [net PATCH v5 1/6] virtio_net: use dev_kfree_skb for small buffer XDP receive
  2017-01-17 22:19 [net PATCH v5 0/6] virtio_net XDP fixes and adjust_header support John Fastabend
@ 2017-01-17 22:19 ` John Fastabend
  2017-01-18 15:48   ` Michael S. Tsirkin
  2017-01-23 21:08   ` Michael S. Tsirkin
  2017-01-17 22:20 ` [net PATCH v5 2/6] virtio_net: wrap rtnl_lock in test for calling with lock already held John Fastabend
                   ` (5 subsequent siblings)
  6 siblings, 2 replies; 40+ messages in thread
From: John Fastabend @ 2017-01-17 22:19 UTC (permalink / raw)
  To: jasowang, mst
  Cc: john.r.fastabend, netdev, john.fastabend, alexei.starovoitov, daniel

In the small buffer case, during driver unload we currently use
put_page instead of dev_kfree_skb. Resolve this by adding a check
for the virtnet mode when checking the XDP queue type. Also rename
the function so that the code reads correctly to match the additional
check.

Fixes: bb91accf2733 ("virtio-net: XDP support for small buffers")
Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Acked-by: Jason Wang <jasowang@redhat.com>
---
 drivers/net/virtio_net.c |    8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 4a10500..d97bb71 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -1890,8 +1890,12 @@ static void free_receive_page_frags(struct virtnet_info *vi)
 			put_page(vi->rq[i].alloc_frag.page);
 }
 
-static bool is_xdp_queue(struct virtnet_info *vi, int q)
+static bool is_xdp_raw_buffer_queue(struct virtnet_info *vi, int q)
 {
+	/* For small receive mode always use kfree_skb variants */
+	if (!vi->mergeable_rx_bufs)
+		return false;
+
 	if (q < (vi->curr_queue_pairs - vi->xdp_queue_pairs))
 		return false;
 	else if (q < vi->curr_queue_pairs)
@@ -1908,7 +1912,7 @@ static void free_unused_bufs(struct virtnet_info *vi)
 	for (i = 0; i < vi->max_queue_pairs; i++) {
 		struct virtqueue *vq = vi->sq[i].vq;
 		while ((buf = virtqueue_detach_unused_buf(vq)) != NULL) {
-			if (!is_xdp_queue(vi, i))
+			if (!is_xdp_raw_buffer_queue(vi, i))
 				dev_kfree_skb(buf);
 			else
 				put_page(virt_to_head_page(buf));


* [net PATCH v5 2/6] virtio_net: wrap rtnl_lock in test for calling with lock already held
  2017-01-17 22:19 [net PATCH v5 0/6] virtio_net XDP fixes and adjust_header support John Fastabend
  2017-01-17 22:19 ` [net PATCH v5 1/6] virtio_net: use dev_kfree_skb for small buffer XDP receive John Fastabend
@ 2017-01-17 22:20 ` John Fastabend
  2017-01-17 22:21 ` [net PATCH v5 3/6] virtio_net: factor out xdp handler for readability John Fastabend
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 40+ messages in thread
From: John Fastabend @ 2017-01-17 22:20 UTC (permalink / raw)
  To: jasowang, mst
  Cc: john.r.fastabend, netdev, john.fastabend, alexei.starovoitov, daniel

For the XDP use case, and to allow ethtool reset tests, it is useful
to be able to use the reset paths from contexts where the rtnl lock
is already held.

This requires updating virtnet_set_queues and free_receive_bufs, the
two places where rtnl_lock is taken in virtio_net. To do this we
use the following pattern:

	_foo(...) { do stuff }
	foo(...) { rtnl_lock(); _foo(...); rtnl_unlock()};

This allows us to use the freeze()/restore() flow from both contexts.

Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
---
 drivers/net/virtio_net.c |   31 +++++++++++++++++++++----------
 1 file changed, 21 insertions(+), 10 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index d97bb71..ba0efee 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -1331,7 +1331,7 @@ static void virtnet_ack_link_announce(struct virtnet_info *vi)
 	rtnl_unlock();
 }
 
-static int virtnet_set_queues(struct virtnet_info *vi, u16 queue_pairs)
+static int _virtnet_set_queues(struct virtnet_info *vi, u16 queue_pairs)
 {
 	struct scatterlist sg;
 	struct net_device *dev = vi->dev;
@@ -1357,6 +1357,16 @@ static int virtnet_set_queues(struct virtnet_info *vi, u16 queue_pairs)
 	return 0;
 }
 
+static int virtnet_set_queues(struct virtnet_info *vi, u16 queue_pairs)
+{
+	int err;
+
+	rtnl_lock();
+	err = _virtnet_set_queues(vi, queue_pairs);
+	rtnl_unlock();
+	return err;
+}
+
 static int virtnet_close(struct net_device *dev)
 {
 	struct virtnet_info *vi = netdev_priv(dev);
@@ -1609,7 +1619,7 @@ static int virtnet_set_channels(struct net_device *dev,
 		return -EINVAL;
 
 	get_online_cpus();
-	err = virtnet_set_queues(vi, queue_pairs);
+	err = _virtnet_set_queues(vi, queue_pairs);
 	if (!err) {
 		netif_set_real_num_tx_queues(dev, queue_pairs);
 		netif_set_real_num_rx_queues(dev, queue_pairs);
@@ -1736,7 +1746,7 @@ static int virtnet_xdp_set(struct net_device *dev, struct bpf_prog *prog)
 		return -ENOMEM;
 	}
 
-	err = virtnet_set_queues(vi, curr_qp + xdp_qp);
+	err = _virtnet_set_queues(vi, curr_qp + xdp_qp);
 	if (err) {
 		dev_warn(&dev->dev, "XDP Device queue allocation failure.\n");
 		return err;
@@ -1745,7 +1755,7 @@ static int virtnet_xdp_set(struct net_device *dev, struct bpf_prog *prog)
 	if (prog) {
 		prog = bpf_prog_add(prog, vi->max_queue_pairs - 1);
 		if (IS_ERR(prog)) {
-			virtnet_set_queues(vi, curr_qp);
+			_virtnet_set_queues(vi, curr_qp);
 			return PTR_ERR(prog);
 		}
 	}
@@ -1864,12 +1874,11 @@ static void virtnet_free_queues(struct virtnet_info *vi)
 	kfree(vi->sq);
 }
 
-static void free_receive_bufs(struct virtnet_info *vi)
+static void _free_receive_bufs(struct virtnet_info *vi)
 {
 	struct bpf_prog *old_prog;
 	int i;
 
-	rtnl_lock();
 	for (i = 0; i < vi->max_queue_pairs; i++) {
 		while (vi->rq[i].pages)
 			__free_pages(get_a_page(&vi->rq[i], GFP_KERNEL), 0);
@@ -1879,6 +1888,12 @@ static void free_receive_bufs(struct virtnet_info *vi)
 		if (old_prog)
 			bpf_prog_put(old_prog);
 	}
+}
+
+static void free_receive_bufs(struct virtnet_info *vi)
+{
+	rtnl_lock();
+	_free_receive_bufs(vi);
 	rtnl_unlock();
 }
 
@@ -2317,9 +2332,7 @@ static int virtnet_probe(struct virtio_device *vdev)
 		goto free_unregister_netdev;
 	}
 
-	rtnl_lock();
 	virtnet_set_queues(vi, vi->curr_queue_pairs);
-	rtnl_unlock();
 
 	/* Assume link up if device can't report link status,
 	   otherwise get link status from config. */
@@ -2428,9 +2441,7 @@ static int virtnet_restore(struct virtio_device *vdev)
 
 	netif_device_attach(vi->dev);
 
-	rtnl_lock();
 	virtnet_set_queues(vi, vi->curr_queue_pairs);
-	rtnl_unlock();
 
 	err = virtnet_cpu_notif_add(vi);
 	if (err)


* [net PATCH v5 3/6] virtio_net: factor out xdp handler for readability
  2017-01-17 22:19 [net PATCH v5 0/6] virtio_net XDP fixes and adjust_header support John Fastabend
  2017-01-17 22:19 ` [net PATCH v5 1/6] virtio_net: use dev_kfree_skb for small buffer XDP receive John Fastabend
  2017-01-17 22:20 ` [net PATCH v5 2/6] virtio_net: wrap rtnl_lock in test for calling with lock already held John Fastabend
@ 2017-01-17 22:21 ` John Fastabend
  2017-01-18 15:48   ` Michael S. Tsirkin
  2017-01-17 22:21 ` [net PATCH v5 4/6] virtio_net: remove duplicate queue pair binding in XDP John Fastabend
                   ` (3 subsequent siblings)
  6 siblings, 1 reply; 40+ messages in thread
From: John Fastabend @ 2017-01-17 22:21 UTC (permalink / raw)
  To: jasowang, mst
  Cc: john.r.fastabend, netdev, john.fastabend, alexei.starovoitov, daniel

At this point do_xdp_prog is mostly if/else branches handling
the different modes of virtio_net. So remove it and handle running
the program in the per-mode handlers.

Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
---
 drivers/net/virtio_net.c |   75 +++++++++++++++++-----------------------------
 1 file changed, 27 insertions(+), 48 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index ba0efee..6de0cbe 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -388,49 +388,6 @@ static void virtnet_xdp_xmit(struct virtnet_info *vi,
 	virtqueue_kick(sq->vq);
 }
 
-static u32 do_xdp_prog(struct virtnet_info *vi,
-		       struct receive_queue *rq,
-		       struct bpf_prog *xdp_prog,
-		       void *data, int len)
-{
-	int hdr_padded_len;
-	struct xdp_buff xdp;
-	void *buf;
-	unsigned int qp;
-	u32 act;
-
-	if (vi->mergeable_rx_bufs) {
-		hdr_padded_len = sizeof(struct virtio_net_hdr_mrg_rxbuf);
-		xdp.data = data + hdr_padded_len;
-		xdp.data_end = xdp.data + (len - vi->hdr_len);
-		buf = data;
-	} else { /* small buffers */
-		struct sk_buff *skb = data;
-
-		xdp.data = skb->data;
-		xdp.data_end = xdp.data + len;
-		buf = skb->data;
-	}
-
-	act = bpf_prog_run_xdp(xdp_prog, &xdp);
-	switch (act) {
-	case XDP_PASS:
-		return XDP_PASS;
-	case XDP_TX:
-		qp = vi->curr_queue_pairs -
-			vi->xdp_queue_pairs +
-			smp_processor_id();
-		xdp.data = buf;
-		virtnet_xdp_xmit(vi, rq, &vi->sq[qp], &xdp, data);
-		return XDP_TX;
-	default:
-		bpf_warn_invalid_xdp_action(act);
-	case XDP_ABORTED:
-	case XDP_DROP:
-		return XDP_DROP;
-	}
-}
-
 static struct sk_buff *receive_small(struct net_device *dev,
 				     struct virtnet_info *vi,
 				     struct receive_queue *rq,
@@ -446,19 +403,30 @@ static struct sk_buff *receive_small(struct net_device *dev,
 	xdp_prog = rcu_dereference(rq->xdp_prog);
 	if (xdp_prog) {
 		struct virtio_net_hdr_mrg_rxbuf *hdr = buf;
+		struct xdp_buff xdp;
+		unsigned int qp;
 		u32 act;
 
 		if (unlikely(hdr->hdr.gso_type || hdr->hdr.flags))
 			goto err_xdp;
-		act = do_xdp_prog(vi, rq, xdp_prog, skb, len);
+
+		xdp.data = skb->data;
+		xdp.data_end = xdp.data + len;
+		act = bpf_prog_run_xdp(xdp_prog, &xdp);
 		switch (act) {
 		case XDP_PASS:
 			break;
 		case XDP_TX:
+			qp = vi->curr_queue_pairs -
+				vi->xdp_queue_pairs +
+				smp_processor_id();
+			virtnet_xdp_xmit(vi, rq, &vi->sq[qp], &xdp, skb);
 			rcu_read_unlock();
 			goto xdp_xmit;
-		case XDP_DROP:
 		default:
+			bpf_warn_invalid_xdp_action(act);
+		case XDP_ABORTED:
+		case XDP_DROP:
 			goto err_xdp;
 		}
 	}
@@ -576,6 +544,9 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
 	xdp_prog = rcu_dereference(rq->xdp_prog);
 	if (xdp_prog) {
 		struct page *xdp_page;
+		struct xdp_buff xdp;
+		unsigned int qp;
+		void *data;
 		u32 act;
 
 		/* This happens when rx buffer size is underestimated */
@@ -598,8 +569,10 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
 		if (unlikely(hdr->hdr.gso_type))
 			goto err_xdp;
 
-		act = do_xdp_prog(vi, rq, xdp_prog,
-				  page_address(xdp_page) + offset, len);
+		data = page_address(xdp_page) + offset;
+		xdp.data = data + vi->hdr_len;
+		xdp.data_end = xdp.data + (len - vi->hdr_len);
+		act = bpf_prog_run_xdp(xdp_prog, &xdp);
 		switch (act) {
 		case XDP_PASS:
 			/* We can only create skb based on xdp_page. */
@@ -613,13 +586,19 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
 			}
 			break;
 		case XDP_TX:
+			qp = vi->curr_queue_pairs -
+				vi->xdp_queue_pairs +
+				smp_processor_id();
+			virtnet_xdp_xmit(vi, rq, &vi->sq[qp], &xdp, data);
 			ewma_pkt_len_add(&rq->mrg_avg_pkt_len, len);
 			if (unlikely(xdp_page != page))
 				goto err_xdp;
 			rcu_read_unlock();
 			goto xdp_xmit;
-		case XDP_DROP:
 		default:
+			bpf_warn_invalid_xdp_action(act);
+		case XDP_ABORTED:
+		case XDP_DROP:
 			if (unlikely(xdp_page != page))
 				__free_pages(xdp_page, 0);
 			ewma_pkt_len_add(&rq->mrg_avg_pkt_len, len);


* [net PATCH v5 4/6] virtio_net: remove duplicate queue pair binding in XDP
  2017-01-17 22:19 [net PATCH v5 0/6] virtio_net XDP fixes and adjust_header support John Fastabend
                   ` (2 preceding siblings ...)
  2017-01-17 22:21 ` [net PATCH v5 3/6] virtio_net: factor out xdp handler for readability John Fastabend
@ 2017-01-17 22:21 ` John Fastabend
  2017-01-18 15:49   ` Michael S. Tsirkin
  2017-01-17 22:22 ` [net PATCH v5 5/6] virtio_net: refactor freeze/restore logic into virtnet reset logic John Fastabend
                   ` (2 subsequent siblings)
  6 siblings, 1 reply; 40+ messages in thread
From: John Fastabend @ 2017-01-17 22:21 UTC (permalink / raw)
  To: jasowang, mst
  Cc: john.r.fastabend, netdev, john.fastabend, alexei.starovoitov, daniel

Factor out the duplicated qp assignment into virtnet_xdp_xmit.

Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
---
 drivers/net/virtio_net.c |   18 +++++++-----------
 1 file changed, 7 insertions(+), 11 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 6de0cbe..922ca66 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -332,15 +332,19 @@ static struct sk_buff *page_to_skb(struct virtnet_info *vi,
 
 static void virtnet_xdp_xmit(struct virtnet_info *vi,
 			     struct receive_queue *rq,
-			     struct send_queue *sq,
 			     struct xdp_buff *xdp,
 			     void *data)
 {
 	struct virtio_net_hdr_mrg_rxbuf *hdr;
 	unsigned int num_sg, len;
+	struct send_queue *sq;
+	unsigned int qp;
 	void *xdp_sent;
 	int err;
 
+	qp = vi->curr_queue_pairs - vi->xdp_queue_pairs + smp_processor_id();
+	sq = &vi->sq[qp];
+
 	/* Free up any pending old buffers before queueing new ones. */
 	while ((xdp_sent = virtqueue_get_buf(sq->vq, &len)) != NULL) {
 		if (vi->mergeable_rx_bufs) {
@@ -404,7 +408,6 @@ static struct sk_buff *receive_small(struct net_device *dev,
 	if (xdp_prog) {
 		struct virtio_net_hdr_mrg_rxbuf *hdr = buf;
 		struct xdp_buff xdp;
-		unsigned int qp;
 		u32 act;
 
 		if (unlikely(hdr->hdr.gso_type || hdr->hdr.flags))
@@ -417,10 +420,7 @@ static struct sk_buff *receive_small(struct net_device *dev,
 		case XDP_PASS:
 			break;
 		case XDP_TX:
-			qp = vi->curr_queue_pairs -
-				vi->xdp_queue_pairs +
-				smp_processor_id();
-			virtnet_xdp_xmit(vi, rq, &vi->sq[qp], &xdp, skb);
+			virtnet_xdp_xmit(vi, rq, &xdp, skb);
 			rcu_read_unlock();
 			goto xdp_xmit;
 		default:
@@ -545,7 +545,6 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
 	if (xdp_prog) {
 		struct page *xdp_page;
 		struct xdp_buff xdp;
-		unsigned int qp;
 		void *data;
 		u32 act;
 
@@ -586,10 +585,7 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
 			}
 			break;
 		case XDP_TX:
-			qp = vi->curr_queue_pairs -
-				vi->xdp_queue_pairs +
-				smp_processor_id();
-			virtnet_xdp_xmit(vi, rq, &vi->sq[qp], &xdp, data);
+			virtnet_xdp_xmit(vi, rq, &xdp, data);
 			ewma_pkt_len_add(&rq->mrg_avg_pkt_len, len);
 			if (unlikely(xdp_page != page))
 				goto err_xdp;


* [net PATCH v5 5/6] virtio_net: refactor freeze/restore logic into virtnet reset logic
  2017-01-17 22:19 [net PATCH v5 0/6] virtio_net XDP fixes and adjust_header support John Fastabend
                   ` (3 preceding siblings ...)
  2017-01-17 22:21 ` [net PATCH v5 4/6] virtio_net: remove duplicate queue pair binding in XDP John Fastabend
@ 2017-01-17 22:22 ` John Fastabend
  2017-01-18 15:50   ` Michael S. Tsirkin
  2017-01-17 22:22 ` [net PATCH v5 6/6] virtio_net: XDP support for adjust_head John Fastabend
  2017-01-18 15:48 ` [net PATCH v5 0/6] virtio_net XDP fixes and adjust_header support Michael S. Tsirkin
  6 siblings, 1 reply; 40+ messages in thread
From: John Fastabend @ 2017-01-17 22:22 UTC (permalink / raw)
  To: jasowang, mst
  Cc: john.r.fastabend, netdev, john.fastabend, alexei.starovoitov, daniel

For XDP we will need to reset the queues to allow the buffer headroom
to be configured. In order to do this we need to essentially run the
freeze()/restore() code path. Unfortunately, the locking requirements
of the freeze/restore and reset paths differ, so we can not simply
reuse the code.

This patch refactors the code path and adds a reset helper routine.

Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
---
 drivers/net/virtio_net.c |   75 ++++++++++++++++++++++++++++------------------
 drivers/virtio/virtio.c  |   42 ++++++++++++++------------
 include/linux/virtio.h   |    4 ++
 3 files changed, 73 insertions(+), 48 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 922ca66..62dbf4b 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -1684,6 +1684,49 @@ static void virtnet_init_settings(struct net_device *dev)
 	.set_settings = virtnet_set_settings,
 };
 
+static void virtnet_freeze_down(struct virtio_device *vdev)
+{
+	struct virtnet_info *vi = vdev->priv;
+	int i;
+
+	/* Make sure no work handler is accessing the device */
+	flush_work(&vi->config_work);
+
+	netif_device_detach(vi->dev);
+	cancel_delayed_work_sync(&vi->refill);
+
+	if (netif_running(vi->dev)) {
+		for (i = 0; i < vi->max_queue_pairs; i++)
+			napi_disable(&vi->rq[i].napi);
+	}
+}
+
+static int init_vqs(struct virtnet_info *vi);
+
+static int virtnet_restore_up(struct virtio_device *vdev)
+{
+	struct virtnet_info *vi = vdev->priv;
+	int err, i;
+
+	err = init_vqs(vi);
+	if (err)
+		return err;
+
+	virtio_device_ready(vdev);
+
+	if (netif_running(vi->dev)) {
+		for (i = 0; i < vi->curr_queue_pairs; i++)
+			if (!try_fill_recv(vi, &vi->rq[i], GFP_KERNEL))
+				schedule_delayed_work(&vi->refill, 0);
+
+		for (i = 0; i < vi->max_queue_pairs; i++)
+			virtnet_napi_enable(&vi->rq[i]);
+	}
+
+	netif_device_attach(vi->dev);
+	return err;
+}
+
 static int virtnet_xdp_set(struct net_device *dev, struct bpf_prog *prog)
 {
 	unsigned long int max_sz = PAGE_SIZE - sizeof(struct padded_vnet_hdr);
@@ -2374,21 +2417,9 @@ static void virtnet_remove(struct virtio_device *vdev)
 static int virtnet_freeze(struct virtio_device *vdev)
 {
 	struct virtnet_info *vi = vdev->priv;
-	int i;
 
 	virtnet_cpu_notif_remove(vi);
-
-	/* Make sure no work handler is accessing the device */
-	flush_work(&vi->config_work);
-
-	netif_device_detach(vi->dev);
-	cancel_delayed_work_sync(&vi->refill);
-
-	if (netif_running(vi->dev)) {
-		for (i = 0; i < vi->max_queue_pairs; i++)
-			napi_disable(&vi->rq[i].napi);
-	}
-
+	virtnet_freeze_down(vdev);
 	remove_vq_common(vi);
 
 	return 0;
@@ -2397,25 +2428,11 @@ static int virtnet_freeze(struct virtio_device *vdev)
 static int virtnet_restore(struct virtio_device *vdev)
 {
 	struct virtnet_info *vi = vdev->priv;
-	int err, i;
+	int err;
 
-	err = init_vqs(vi);
+	err = virtnet_restore_up(vdev);
 	if (err)
 		return err;
-
-	virtio_device_ready(vdev);
-
-	if (netif_running(vi->dev)) {
-		for (i = 0; i < vi->curr_queue_pairs; i++)
-			if (!try_fill_recv(vi, &vi->rq[i], GFP_KERNEL))
-				schedule_delayed_work(&vi->refill, 0);
-
-		for (i = 0; i < vi->max_queue_pairs; i++)
-			virtnet_napi_enable(&vi->rq[i]);
-	}
-
-	netif_device_attach(vi->dev);
-
 	virtnet_set_queues(vi, vi->curr_queue_pairs);
 
 	err = virtnet_cpu_notif_add(vi);
diff --git a/drivers/virtio/virtio.c b/drivers/virtio/virtio.c
index 7062bb0..400d70b 100644
--- a/drivers/virtio/virtio.c
+++ b/drivers/virtio/virtio.c
@@ -100,11 +100,6 @@ static int virtio_uevent(struct device *_dv, struct kobj_uevent_env *env)
 			      dev->id.device, dev->id.vendor);
 }
 
-static void add_status(struct virtio_device *dev, unsigned status)
-{
-	dev->config->set_status(dev, dev->config->get_status(dev) | status);
-}
-
 void virtio_check_driver_offered_feature(const struct virtio_device *vdev,
 					 unsigned int fbit)
 {
@@ -145,14 +140,15 @@ void virtio_config_changed(struct virtio_device *dev)
 }
 EXPORT_SYMBOL_GPL(virtio_config_changed);
 
-static void virtio_config_disable(struct virtio_device *dev)
+void virtio_config_disable(struct virtio_device *dev)
 {
 	spin_lock_irq(&dev->config_lock);
 	dev->config_enabled = false;
 	spin_unlock_irq(&dev->config_lock);
 }
+EXPORT_SYMBOL_GPL(virtio_config_disable);
 
-static void virtio_config_enable(struct virtio_device *dev)
+void virtio_config_enable(struct virtio_device *dev)
 {
 	spin_lock_irq(&dev->config_lock);
 	dev->config_enabled = true;
@@ -161,8 +157,15 @@ static void virtio_config_enable(struct virtio_device *dev)
 	dev->config_change_pending = false;
 	spin_unlock_irq(&dev->config_lock);
 }
+EXPORT_SYMBOL_GPL(virtio_config_enable);
+
+void virtio_add_status(struct virtio_device *dev, unsigned int status)
+{
+	dev->config->set_status(dev, dev->config->get_status(dev) | status);
+}
+EXPORT_SYMBOL_GPL(virtio_add_status);
 
-static int virtio_finalize_features(struct virtio_device *dev)
+int virtio_finalize_features(struct virtio_device *dev)
 {
 	int ret = dev->config->finalize_features(dev);
 	unsigned status;
@@ -173,7 +176,7 @@ static int virtio_finalize_features(struct virtio_device *dev)
 	if (!virtio_has_feature(dev, VIRTIO_F_VERSION_1))
 		return 0;
 
-	add_status(dev, VIRTIO_CONFIG_S_FEATURES_OK);
+	virtio_add_status(dev, VIRTIO_CONFIG_S_FEATURES_OK);
 	status = dev->config->get_status(dev);
 	if (!(status & VIRTIO_CONFIG_S_FEATURES_OK)) {
 		dev_err(&dev->dev, "virtio: device refuses features: %x\n",
@@ -182,6 +185,7 @@ static int virtio_finalize_features(struct virtio_device *dev)
 	}
 	return 0;
 }
+EXPORT_SYMBOL_GPL(virtio_finalize_features);
 
 static int virtio_dev_probe(struct device *_d)
 {
@@ -193,7 +197,7 @@ static int virtio_dev_probe(struct device *_d)
 	u64 driver_features_legacy;
 
 	/* We have a driver! */
-	add_status(dev, VIRTIO_CONFIG_S_DRIVER);
+	virtio_add_status(dev, VIRTIO_CONFIG_S_DRIVER);
 
 	/* Figure out what features the device supports. */
 	device_features = dev->config->get_features(dev);
@@ -247,7 +251,7 @@ static int virtio_dev_probe(struct device *_d)
 
 	return 0;
 err:
-	add_status(dev, VIRTIO_CONFIG_S_FAILED);
+	virtio_add_status(dev, VIRTIO_CONFIG_S_FAILED);
 	return err;
 
 }
@@ -265,7 +269,7 @@ static int virtio_dev_remove(struct device *_d)
 	WARN_ON_ONCE(dev->config->get_status(dev));
 
 	/* Acknowledge the device's existence again. */
-	add_status(dev, VIRTIO_CONFIG_S_ACKNOWLEDGE);
+	virtio_add_status(dev, VIRTIO_CONFIG_S_ACKNOWLEDGE);
 	return 0;
 }
 
@@ -316,7 +320,7 @@ int register_virtio_device(struct virtio_device *dev)
 	dev->config->reset(dev);
 
 	/* Acknowledge that we've seen the device. */
-	add_status(dev, VIRTIO_CONFIG_S_ACKNOWLEDGE);
+	virtio_add_status(dev, VIRTIO_CONFIG_S_ACKNOWLEDGE);
 
 	INIT_LIST_HEAD(&dev->vqs);
 
@@ -325,7 +329,7 @@ int register_virtio_device(struct virtio_device *dev)
 	err = device_register(&dev->dev);
 out:
 	if (err)
-		add_status(dev, VIRTIO_CONFIG_S_FAILED);
+		virtio_add_status(dev, VIRTIO_CONFIG_S_FAILED);
 	return err;
 }
 EXPORT_SYMBOL_GPL(register_virtio_device);
@@ -365,18 +369,18 @@ int virtio_device_restore(struct virtio_device *dev)
 	dev->config->reset(dev);
 
 	/* Acknowledge that we've seen the device. */
-	add_status(dev, VIRTIO_CONFIG_S_ACKNOWLEDGE);
+	virtio_add_status(dev, VIRTIO_CONFIG_S_ACKNOWLEDGE);
 
 	/* Maybe driver failed before freeze.
 	 * Restore the failed status, for debugging. */
 	if (dev->failed)
-		add_status(dev, VIRTIO_CONFIG_S_FAILED);
+		virtio_add_status(dev, VIRTIO_CONFIG_S_FAILED);
 
 	if (!drv)
 		return 0;
 
 	/* We have a driver! */
-	add_status(dev, VIRTIO_CONFIG_S_DRIVER);
+	virtio_add_status(dev, VIRTIO_CONFIG_S_DRIVER);
 
 	ret = virtio_finalize_features(dev);
 	if (ret)
@@ -389,14 +393,14 @@ int virtio_device_restore(struct virtio_device *dev)
 	}
 
 	/* Finally, tell the device we're all set */
-	add_status(dev, VIRTIO_CONFIG_S_DRIVER_OK);
+	virtio_add_status(dev, VIRTIO_CONFIG_S_DRIVER_OK);
 
 	virtio_config_enable(dev);
 
 	return 0;
 
 err:
-	add_status(dev, VIRTIO_CONFIG_S_FAILED);
+	virtio_add_status(dev, VIRTIO_CONFIG_S_FAILED);
 	return ret;
 }
 EXPORT_SYMBOL_GPL(virtio_device_restore);
diff --git a/include/linux/virtio.h b/include/linux/virtio.h
index d5eb547..04b0d3f 100644
--- a/include/linux/virtio.h
+++ b/include/linux/virtio.h
@@ -132,12 +132,16 @@ static inline struct virtio_device *dev_to_virtio(struct device *_dev)
 	return container_of(_dev, struct virtio_device, dev);
 }
 
+void virtio_add_status(struct virtio_device *dev, unsigned int status);
 int register_virtio_device(struct virtio_device *dev);
 void unregister_virtio_device(struct virtio_device *dev);
 
 void virtio_break_device(struct virtio_device *dev);
 
 void virtio_config_changed(struct virtio_device *dev);
+void virtio_config_disable(struct virtio_device *dev);
+void virtio_config_enable(struct virtio_device *dev);
+int virtio_finalize_features(struct virtio_device *dev);
 #ifdef CONFIG_PM_SLEEP
 int virtio_device_freeze(struct virtio_device *dev);
 int virtio_device_restore(struct virtio_device *dev);


* [net PATCH v5 6/6] virtio_net: XDP support for adjust_head
  2017-01-17 22:19 [net PATCH v5 0/6] virtio_net XDP fixes and adjust_header support John Fastabend
                   ` (4 preceding siblings ...)
  2017-01-17 22:22 ` [net PATCH v5 5/6] virtio_net: refactor freeze/restore logic into virtnet reset logic John Fastabend
@ 2017-01-17 22:22 ` John Fastabend
  2017-01-18  3:35   ` Jason Wang
                     ` (2 more replies)
  2017-01-18 15:48 ` [net PATCH v5 0/6] virtio_net XDP fixes and adjust_header support Michael S. Tsirkin
  6 siblings, 3 replies; 40+ messages in thread
From: John Fastabend @ 2017-01-17 22:22 UTC (permalink / raw)
  To: jasowang, mst
  Cc: john.r.fastabend, netdev, john.fastabend, alexei.starovoitov, daniel

Add support for XDP adjust_head by allocating a 256B headroom region
that XDP programs can grow into. This is only enabled when an XDP
program is loaded.

In order to ensure that we do not have to unwind queue headroom, push
the queue setup below bpf_prog_add. It reads better to do a prog ref
unwind vs. another queue setup call.

At the moment this code must do a full reset to ensure old buffers
without headroom (on program add) or with headroom (on program removal)
are not used incorrectly in the datapath. Ideally we would only
have to disable/enable the RX queues being updated, but there is no
API to do this at the moment in virtio, so use the big hammer. In
practice it is likely not that big of a problem, as this will only
happen when XDP is enabled/disabled; changing programs does not
require the reset. There is some risk that the driver may either
have an allocation failure or for some reason fail to correctly
negotiate with the underlying backend; in this case the driver will
be left uninitialized. I have never seen this happen on my test
systems, and for what it's worth the same failure case can occur
from probe and other contexts in the virtio framework.

Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
---
 drivers/net/virtio_net.c |  149 +++++++++++++++++++++++++++++++++++++++-------
 1 file changed, 125 insertions(+), 24 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 62dbf4b..3b129b4 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -41,6 +41,9 @@
 #define GOOD_PACKET_LEN (ETH_HLEN + VLAN_HLEN + ETH_DATA_LEN)
 #define GOOD_COPY_LEN	128
 
+/* Amount of XDP headroom to prepend to packets for use by xdp_adjust_head */
+#define VIRTIO_XDP_HEADROOM 256
+
 /* RX packet size EWMA. The average packet size is used to determine the packet
  * buffer size when refilling RX rings. As the entire RX ring may be refilled
  * at once, the weight is chosen so that the EWMA will be insensitive to short-
@@ -359,6 +362,7 @@ static void virtnet_xdp_xmit(struct virtnet_info *vi,
 	}
 
 	if (vi->mergeable_rx_bufs) {
+		xdp->data -= sizeof(struct virtio_net_hdr_mrg_rxbuf);
 		/* Zero header and leave csum up to XDP layers */
 		hdr = xdp->data;
 		memset(hdr, 0, vi->hdr_len);
@@ -375,7 +379,9 @@ static void virtnet_xdp_xmit(struct virtnet_info *vi,
 		num_sg = 2;
 		sg_init_table(sq->sg, 2);
 		sg_set_buf(sq->sg, hdr, vi->hdr_len);
-		skb_to_sgvec(skb, sq->sg + 1, 0, skb->len);
+		skb_to_sgvec(skb, sq->sg + 1,
+			     xdp->data - xdp->data_hard_start,
+			     xdp->data_end - xdp->data);
 	}
 	err = virtqueue_add_outbuf(sq->vq, sq->sg, num_sg,
 				   data, GFP_ATOMIC);
@@ -401,7 +407,6 @@ static struct sk_buff *receive_small(struct net_device *dev,
 	struct bpf_prog *xdp_prog;
 
 	len -= vi->hdr_len;
-	skb_trim(skb, len);
 
 	rcu_read_lock();
 	xdp_prog = rcu_dereference(rq->xdp_prog);
@@ -413,11 +418,15 @@ static struct sk_buff *receive_small(struct net_device *dev,
 		if (unlikely(hdr->hdr.gso_type || hdr->hdr.flags))
 			goto err_xdp;
 
-		xdp.data = skb->data;
+		xdp.data_hard_start = skb->data;
+		xdp.data = skb->data + VIRTIO_XDP_HEADROOM;
 		xdp.data_end = xdp.data + len;
 		act = bpf_prog_run_xdp(xdp_prog, &xdp);
 		switch (act) {
 		case XDP_PASS:
+			/* Recalculate length in case bpf program changed it */
+			__skb_pull(skb, xdp.data - xdp.data_hard_start);
+			len = xdp.data_end - xdp.data;
 			break;
 		case XDP_TX:
 			virtnet_xdp_xmit(vi, rq, &xdp, skb);
@@ -432,6 +441,7 @@ static struct sk_buff *receive_small(struct net_device *dev,
 	}
 	rcu_read_unlock();
 
+	skb_trim(skb, len);
 	return skb;
 
 err_xdp:
@@ -480,7 +490,7 @@ static struct page *xdp_linearize_page(struct receive_queue *rq,
 				       unsigned int *len)
 {
 	struct page *page = alloc_page(GFP_ATOMIC);
-	unsigned int page_off = 0;
+	unsigned int page_off = VIRTIO_XDP_HEADROOM;
 
 	if (!page)
 		return NULL;
@@ -516,7 +526,8 @@ static struct page *xdp_linearize_page(struct receive_queue *rq,
 		put_page(p);
 	}
 
-	*len = page_off;
+	/* Headroom does not contribute to packet length */
+	*len = page_off - VIRTIO_XDP_HEADROOM;
 	return page;
 err_buf:
 	__free_pages(page, 0);
@@ -555,7 +566,7 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
 						      page, offset, &len);
 			if (!xdp_page)
 				goto err_xdp;
-			offset = 0;
+			offset = VIRTIO_XDP_HEADROOM;
 		} else {
 			xdp_page = page;
 		}
@@ -568,18 +579,29 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
 		if (unlikely(hdr->hdr.gso_type))
 			goto err_xdp;
 
+		/* Allow consuming headroom but reserve enough space to push
+		 * the descriptor on if we get an XDP_TX return code.
+		 */
 		data = page_address(xdp_page) + offset;
+		xdp.data_hard_start = data - VIRTIO_XDP_HEADROOM + vi->hdr_len;
 		xdp.data = data + vi->hdr_len;
 		xdp.data_end = xdp.data + (len - vi->hdr_len);
 		act = bpf_prog_run_xdp(xdp_prog, &xdp);
 		switch (act) {
 		case XDP_PASS:
+			/* Recalculate offset to account for any header
+			 * adjustments. Note the other cases do not build
+			 * an skb and avoid using offset.
+			 */
+			offset = xdp.data -
+					page_address(xdp_page) - vi->hdr_len;
+
 			/* We can only create skb based on xdp_page. */
 			if (unlikely(xdp_page != page)) {
 				rcu_read_unlock();
 				put_page(page);
 				head_skb = page_to_skb(vi, rq, xdp_page,
-						       0, len, PAGE_SIZE);
+						       offset, len, PAGE_SIZE);
 				ewma_pkt_len_add(&rq->mrg_avg_pkt_len, len);
 				return head_skb;
 			}
@@ -744,23 +766,30 @@ static void receive_buf(struct virtnet_info *vi, struct receive_queue *rq,
 	dev_kfree_skb(skb);
 }
 
+static unsigned int virtnet_get_headroom(struct virtnet_info *vi)
+{
+	return vi->xdp_queue_pairs ? VIRTIO_XDP_HEADROOM : 0;
+}
+
 static int add_recvbuf_small(struct virtnet_info *vi, struct receive_queue *rq,
 			     gfp_t gfp)
 {
+	int headroom = GOOD_PACKET_LEN + virtnet_get_headroom(vi);
+	unsigned int xdp_headroom = virtnet_get_headroom(vi);
 	struct sk_buff *skb;
 	struct virtio_net_hdr_mrg_rxbuf *hdr;
 	int err;
 
-	skb = __netdev_alloc_skb_ip_align(vi->dev, GOOD_PACKET_LEN, gfp);
+	skb = __netdev_alloc_skb_ip_align(vi->dev, headroom, gfp);
 	if (unlikely(!skb))
 		return -ENOMEM;
 
-	skb_put(skb, GOOD_PACKET_LEN);
+	skb_put(skb, headroom);
 
 	hdr = skb_vnet_hdr(skb);
 	sg_init_table(rq->sg, 2);
 	sg_set_buf(rq->sg, hdr, vi->hdr_len);
-	skb_to_sgvec(skb, rq->sg + 1, 0, skb->len);
+	skb_to_sgvec(skb, rq->sg + 1, xdp_headroom, skb->len - xdp_headroom);
 
 	err = virtqueue_add_inbuf(rq->vq, rq->sg, 2, skb, gfp);
 	if (err < 0)
@@ -828,24 +857,27 @@ static unsigned int get_mergeable_buf_len(struct ewma_pkt_len *avg_pkt_len)
 	return ALIGN(len, MERGEABLE_BUFFER_ALIGN);
 }
 
-static int add_recvbuf_mergeable(struct receive_queue *rq, gfp_t gfp)
+static int add_recvbuf_mergeable(struct virtnet_info *vi,
+				 struct receive_queue *rq, gfp_t gfp)
 {
 	struct page_frag *alloc_frag = &rq->alloc_frag;
+	unsigned int headroom = virtnet_get_headroom(vi);
 	char *buf;
 	unsigned long ctx;
 	int err;
 	unsigned int len, hole;
 
 	len = get_mergeable_buf_len(&rq->mrg_avg_pkt_len);
-	if (unlikely(!skb_page_frag_refill(len, alloc_frag, gfp)))
+	if (unlikely(!skb_page_frag_refill(len + headroom, alloc_frag, gfp)))
 		return -ENOMEM;
 
 	buf = (char *)page_address(alloc_frag->page) + alloc_frag->offset;
+	buf += headroom; /* advance address leaving hole at front of pkt */
 	ctx = mergeable_buf_to_ctx(buf, len);
 	get_page(alloc_frag->page);
-	alloc_frag->offset += len;
+	alloc_frag->offset += len + headroom;
 	hole = alloc_frag->size - alloc_frag->offset;
-	if (hole < len) {
+	if (hole < len + headroom) {
 		/* To avoid internal fragmentation, if there is very likely not
 		 * enough space for another buffer, add the remaining space to
 		 * the current buffer. This extra space is not included in
@@ -879,7 +911,7 @@ static bool try_fill_recv(struct virtnet_info *vi, struct receive_queue *rq,
 	gfp |= __GFP_COLD;
 	do {
 		if (vi->mergeable_rx_bufs)
-			err = add_recvbuf_mergeable(rq, gfp);
+			err = add_recvbuf_mergeable(vi, rq, gfp);
 		else if (vi->big_packets)
 			err = add_recvbuf_big(vi, rq, gfp);
 		else
@@ -1702,6 +1734,7 @@ static void virtnet_freeze_down(struct virtio_device *vdev)
 }
 
 static int init_vqs(struct virtnet_info *vi);
+static void _remove_vq_common(struct virtnet_info *vi);
 
 static int virtnet_restore_up(struct virtio_device *vdev)
 {
@@ -1727,12 +1760,45 @@ static int virtnet_restore_up(struct virtio_device *vdev)
 	return err;
 }
 
+static int virtnet_reset(struct virtnet_info *vi)
+{
+	struct virtio_device *dev = vi->vdev;
+	int ret;
+
+	virtio_config_disable(dev);
+	dev->failed = dev->config->get_status(dev) & VIRTIO_CONFIG_S_FAILED;
+	virtnet_freeze_down(dev);
+	_remove_vq_common(vi);
+
+	dev->config->reset(dev);
+	virtio_add_status(dev, VIRTIO_CONFIG_S_ACKNOWLEDGE);
+	virtio_add_status(dev, VIRTIO_CONFIG_S_DRIVER);
+
+	ret = virtio_finalize_features(dev);
+	if (ret)
+		goto err;
+
+	ret = virtnet_restore_up(dev);
+	if (ret)
+		goto err;
+	ret = _virtnet_set_queues(vi, vi->curr_queue_pairs);
+	if (ret)
+		goto err;
+
+	virtio_add_status(dev, VIRTIO_CONFIG_S_DRIVER_OK);
+	virtio_config_enable(dev);
+	return 0;
+err:
+	virtio_add_status(dev, VIRTIO_CONFIG_S_FAILED);
+	return ret;
+}
+
 static int virtnet_xdp_set(struct net_device *dev, struct bpf_prog *prog)
 {
 	unsigned long int max_sz = PAGE_SIZE - sizeof(struct padded_vnet_hdr);
 	struct virtnet_info *vi = netdev_priv(dev);
 	struct bpf_prog *old_prog;
-	u16 xdp_qp = 0, curr_qp;
+	u16 oxdp_qp, xdp_qp = 0, curr_qp;
 	int i, err;
 
 	if (virtio_has_feature(vi->vdev, VIRTIO_NET_F_GUEST_TSO4) ||
@@ -1764,21 +1830,32 @@ static int virtnet_xdp_set(struct net_device *dev, struct bpf_prog *prog)
 		return -ENOMEM;
 	}
 
+	if (prog) {
+		prog = bpf_prog_add(prog, vi->max_queue_pairs - 1);
+		if (IS_ERR(prog))
+			return PTR_ERR(prog);
+	}
+
 	err = _virtnet_set_queues(vi, curr_qp + xdp_qp);
 	if (err) {
 		dev_warn(&dev->dev, "XDP Device queue allocation failure.\n");
-		return err;
+		goto virtio_queue_err;
 	}
 
-	if (prog) {
-		prog = bpf_prog_add(prog, vi->max_queue_pairs - 1);
-		if (IS_ERR(prog)) {
-			_virtnet_set_queues(vi, curr_qp);
-			return PTR_ERR(prog);
-		}
+	oxdp_qp = vi->xdp_queue_pairs;
+
+	/* Changing the headroom in buffers is a disruptive operation because
+	 * existing buffers must be flushed and reallocated. This will happen
+	 * when an XDP program is initially added or XDP is disabled by removing
+	 * the XDP program, resulting in the number of XDP queues changing.
+	 */
+	if (vi->xdp_queue_pairs != xdp_qp) {
+		vi->xdp_queue_pairs = xdp_qp;
+		err = virtnet_reset(vi);
+		if (err)
+			goto virtio_reset_err;
 	}
 
-	vi->xdp_queue_pairs = xdp_qp;
 	netif_set_real_num_rx_queues(dev, curr_qp + xdp_qp);
 
 	for (i = 0; i < vi->max_queue_pairs; i++) {
@@ -1789,6 +1866,21 @@ static int virtnet_xdp_set(struct net_device *dev, struct bpf_prog *prog)
 	}
 
 	return 0;
+
+virtio_reset_err:
+	/* On reset error do our best to unwind the in-flight XDP changes and
+	 * return the error up to user space for resolution. The underlying
+	 * reset hung on us, so there is not much we can do here.
+	 */
+	dev_warn(&dev->dev, "XDP reset failure and queues unstable\n");
+	vi->xdp_queue_pairs = oxdp_qp;
+virtio_queue_err:
+	/* On queue set error we can unwind the bpf ref count and user space
+	 * can retry; this is most likely an allocation failure.
+	 */
+	if (prog)
+		bpf_prog_sub(prog, vi->max_queue_pairs - 1);
+	return err;
 }
 
 static bool virtnet_xdp_query(struct net_device *dev)
@@ -2382,6 +2474,15 @@ static int virtnet_probe(struct virtio_device *vdev)
 	return err;
 }
 
+static void _remove_vq_common(struct virtnet_info *vi)
+{
+	vi->vdev->config->reset(vi->vdev);
+	free_unused_bufs(vi);
+	_free_receive_bufs(vi);
+	free_receive_page_frags(vi);
+	virtnet_del_vqs(vi);
+}
+
 static void remove_vq_common(struct virtnet_info *vi)
 {
 	vi->vdev->config->reset(vi->vdev);

^ permalink raw reply related	[flat|nested] 40+ messages in thread

* Re: [net PATCH v5 6/6] virtio_net: XDP support for adjust_head
  2017-01-17 22:22 ` [net PATCH v5 6/6] virtio_net: XDP support for adjust_head John Fastabend
@ 2017-01-18  3:35   ` Jason Wang
  2017-01-18 15:15   ` Michael S. Tsirkin
  2017-01-23 19:22   ` Michael S. Tsirkin
  2 siblings, 0 replies; 40+ messages in thread
From: Jason Wang @ 2017-01-18  3:35 UTC (permalink / raw)
  To: John Fastabend, mst; +Cc: john.r.fastabend, netdev, alexei.starovoitov, daniel



On 2017-01-18 06:22, John Fastabend wrote:
>   
> +static int virtnet_reset(struct virtnet_info *vi)
> +{
> +	struct virtio_device *dev = vi->vdev;
> +	int ret;
> +
> +	virtio_config_disable(dev);
> +	dev->failed = dev->config->get_status(dev) & VIRTIO_CONFIG_S_FAILED;
> +	virtnet_freeze_down(dev);
> +	_remove_vq_common(vi);
> +
> +	dev->config->reset(dev);
> +	virtio_add_status(dev, VIRTIO_CONFIG_S_ACKNOWLEDGE);
> +	virtio_add_status(dev, VIRTIO_CONFIG_S_DRIVER);
> +
> +	ret = virtio_finalize_features(dev);
> +	if (ret)
> +		goto err;
> +
> +	ret = virtnet_restore_up(dev);
> +	if (ret)
> +		goto err;
> +	ret = _virtnet_set_queues(vi, vi->curr_queue_pairs);
> +	if (ret)
> +		goto err;
> +
> +	virtio_add_status(dev, VIRTIO_CONFIG_S_DRIVER_OK);
> +	virtio_config_enable(dev);
> +	return 0;
> +err:
> +	virtio_add_status(dev, VIRTIO_CONFIG_S_FAILED);
> +	return ret;
> +}
> +

Hi John:

I still prefer not to open-code (part of) virtio_device_freeze() and
virtio_device_restore() here. How about:

1) introduce __virtio_device_freeze/__virtio_device_restore which
accept a freeze/restore function pointer
2) for virtio_device_freeze/virtio_device_restore just pass
drv->freeze/drv->restore (locked version)
3) for virtnet_reset(), we can pass the unlocked versions of freeze and
restore

Just my preference; if both Michael and you stick to this, I'm also fine.

Thanks


* Re: [net PATCH v5 6/6] virtio_net: XDP support for adjust_head
  2017-01-17 22:22 ` [net PATCH v5 6/6] virtio_net: XDP support for adjust_head John Fastabend
  2017-01-18  3:35   ` Jason Wang
@ 2017-01-18 15:15   ` Michael S. Tsirkin
  2017-01-19  3:05     ` Jason Wang
  2017-01-23 19:22   ` Michael S. Tsirkin
  2 siblings, 1 reply; 40+ messages in thread
From: Michael S. Tsirkin @ 2017-01-18 15:15 UTC (permalink / raw)
  To: John Fastabend
  Cc: jasowang, john.r.fastabend, netdev, alexei.starovoitov, daniel

On Tue, Jan 17, 2017 at 02:22:59PM -0800, John Fastabend wrote:
> Add support for XDP adjust head by allocating a 256B header region
> that XDP programs can grow into. This is only enabled when an XDP
> program is loaded.
> 
> In order to ensure that we do not have to unwind queue headroom, push
> queue setup below bpf_prog_add. It reads better to do a prog ref
> unwind vs another queue setup call.
> 
> At the moment this code must do a full reset to ensure old buffers
> without headroom on program add, or with headroom on program removal,
> are not used incorrectly in the datapath. Ideally we would only
> have to disable/enable the RX queues being updated, but there is no
> API to do this at the moment in virtio, so use the big hammer. In
> practice it is likely not that big of a problem, as this will only
> happen when XDP is enabled/disabled; changing programs does not
> require the reset. There is some risk that the driver may either
> have an allocation failure or for some reason fail to correctly
> negotiate with the underlying backend; in this case the driver will
> be left uninitialized. I have not seen this ever happen on my test
> systems, and for what it's worth this same failure case can occur
> from probe and other contexts in the virtio framework.
> 
> Signed-off-by: John Fastabend <john.r.fastabend@intel.com>

I've been thinking about it - can't we drop
old buffers without the headroom which were posted before
XDP was attached?

Avoiding the reset would be much nicer.

Thoughts?

> ---
>  drivers/net/virtio_net.c |  149 +++++++++++++++++++++++++++++++++++++++-------
>  1 file changed, 125 insertions(+), 24 deletions(-)
> 
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index 62dbf4b..3b129b4 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -41,6 +41,9 @@
>  #define GOOD_PACKET_LEN (ETH_HLEN + VLAN_HLEN + ETH_DATA_LEN)
>  #define GOOD_COPY_LEN	128
>  
> +/* Amount of XDP headroom to prepend to packets for use by xdp_adjust_head */
> +#define VIRTIO_XDP_HEADROOM 256
> +
> [...]


* Re: [net PATCH v5 0/6] virtio_net XDP fixes and adjust_header support
  2017-01-17 22:19 [net PATCH v5 0/6] virtio_net XDP fixes and adjust_header support John Fastabend
                   ` (5 preceding siblings ...)
  2017-01-17 22:22 ` [net PATCH v5 6/6] virtio_net: XDP support for adjust_head John Fastabend
@ 2017-01-18 15:48 ` Michael S. Tsirkin
  6 siblings, 0 replies; 40+ messages in thread
From: Michael S. Tsirkin @ 2017-01-18 15:48 UTC (permalink / raw)
  To: John Fastabend
  Cc: jasowang, john.r.fastabend, netdev, alexei.starovoitov, daniel

On Tue, Jan 17, 2017 at 02:19:27PM -0800, John Fastabend wrote:
> This has a fix to handle small buffer free logic correctly and then
> also adds adjust head support.
> 
> I pushed adjust head at net (even though its rc3) to avoid having
> to push another exception case into virtio_net to catch if the
> program uses adjust_head and then block it. If there are any strong
> objections to this we can push it at net-next and use a patch from
> Jakub to add the exception handling but then user space has to deal
> with it either via try/fail logic or via kernel version checks. Granted
> we already have some cases that need to be configured to enable XDP
> but I don't see any reason to have yet another one when we can fix it
> now vs delaying a kernel version.

1, 3 and 4 definitely look good to me.
I don't like the big hammer approach that other patches
take though. Sent some comments, and I'd like to ponder it for a
couple of days.



> 
> v2: fix spelling error, convert unsigned -> unsigned int
> v3: v2 git crashed during send so retrying sorry for the noise
> v4: changed layout of rtnl_lock fixes (Stephen)
>     moved reset logic into virtio core with new patch (MST)
>     fixed up linearize and some code cleanup (Jason)
> 
>     Otherwise did some generic code cleanup so might be a bit
>     cleaner this time at least that is the hope.
> v5: fixed rtnl_lock issue (DaveM)
> 
>     In order to fix rtnl_lock issue and also to address Jason's
>     comment questioning the need for a generic virtio_device_reset
>     routine I exported some virtio core routines and then wrote
>     virtio_net reset routine. This is the cleanest solution I
>     came up with today and I do not at this time have any need
>     for a more generic reset. If folks don't like this I could
>     revert back to v3 variant but Stephen pointed out that the
>     pattern used there is also not ideal.
> 
> Thanks for the review.
> 
> ---
> 
> John Fastabend (6):
>       virtio_net: use dev_kfree_skb for small buffer XDP receive
>       virtio_net: wrap rtnl_lock in test for calling with lock already held
>       virtio_net: factor out xdp handler for readability
>       virtio_net: remove duplicate queue pair binding in XDP
>       virtio_net: refactor freeze/restore logic into virtnet reset logic
>       virtio_net: XDP support for adjust_head
> 
> 
>  drivers/net/virtio_net.c |  332 ++++++++++++++++++++++++++++++----------------
>  drivers/virtio/virtio.c  |   42 +++---
>  include/linux/virtio.h   |    4 +
>  3 files changed, 247 insertions(+), 131 deletions(-)
> 
> --
> Signature


* Re: [net PATCH v5 1/6] virtio_net: use dev_kfree_skb for small buffer XDP receive
  2017-01-17 22:19 ` [net PATCH v5 1/6] virtio_net: use dev_kfree_skb for small buffer XDP receive John Fastabend
@ 2017-01-18 15:48   ` Michael S. Tsirkin
  2017-01-23 21:08   ` Michael S. Tsirkin
  1 sibling, 0 replies; 40+ messages in thread
From: Michael S. Tsirkin @ 2017-01-18 15:48 UTC (permalink / raw)
  To: John Fastabend
  Cc: jasowang, john.r.fastabend, netdev, alexei.starovoitov, daniel

On Tue, Jan 17, 2017 at 02:19:50PM -0800, John Fastabend wrote:
> In the small buffer case during driver unload we currently use
> put_page instead of dev_kfree_skb. Resolve this by adding a check
> for virtnet mode when checking XDP queue type. Also rename the
> function so that the code reads correctly to match the additional
> check.
> 
> Fixes: bb91accf2733 ("virtio-net: XDP support for small buffers")
> Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
> Acked-by: Jason Wang <jasowang@redhat.com>

Acked-by: Michael S. Tsirkin <mst@redhat.com>

> ---
>  drivers/net/virtio_net.c |    8 ++++++--
>  1 file changed, 6 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index 4a10500..d97bb71 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -1890,8 +1890,12 @@ static void free_receive_page_frags(struct virtnet_info *vi)
>  			put_page(vi->rq[i].alloc_frag.page);
>  }
>  
> -static bool is_xdp_queue(struct virtnet_info *vi, int q)
> +static bool is_xdp_raw_buffer_queue(struct virtnet_info *vi, int q)
>  {
> +	/* For small receive mode always use kfree_skb variants */
> +	if (!vi->mergeable_rx_bufs)
> +		return false;
> +
>  	if (q < (vi->curr_queue_pairs - vi->xdp_queue_pairs))
>  		return false;
>  	else if (q < vi->curr_queue_pairs)
> @@ -1908,7 +1912,7 @@ static void free_unused_bufs(struct virtnet_info *vi)
>  	for (i = 0; i < vi->max_queue_pairs; i++) {
>  		struct virtqueue *vq = vi->sq[i].vq;
>  		while ((buf = virtqueue_detach_unused_buf(vq)) != NULL) {
> -			if (!is_xdp_queue(vi, i))
> +			if (!is_xdp_raw_buffer_queue(vi, i))
>  				dev_kfree_skb(buf);
>  			else
>  				put_page(virt_to_head_page(buf));


* Re: [net PATCH v5 3/6] virtio_net: factor out xdp handler for readability
  2017-01-17 22:21 ` [net PATCH v5 3/6] virtio_net: factor out xdp handler for readability John Fastabend
@ 2017-01-18 15:48   ` Michael S. Tsirkin
  0 siblings, 0 replies; 40+ messages in thread
From: Michael S. Tsirkin @ 2017-01-18 15:48 UTC (permalink / raw)
  To: John Fastabend
  Cc: jasowang, john.r.fastabend, netdev, alexei.starovoitov, daniel

On Tue, Jan 17, 2017 at 02:21:07PM -0800, John Fastabend wrote:
> At this point do_xdp_prog is mostly if/else branches handling
> the different modes of virtio_net, so remove it and run the
> program in the per-mode handlers.
> 
> Signed-off-by: John Fastabend <john.r.fastabend@intel.com>

Acked-by: Michael S. Tsirkin <mst@redhat.com>

> ---
>  drivers/net/virtio_net.c |   75 +++++++++++++++++-----------------------------
>  1 file changed, 27 insertions(+), 48 deletions(-)
> 
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index ba0efee..6de0cbe 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -388,49 +388,6 @@ static void virtnet_xdp_xmit(struct virtnet_info *vi,
>  	virtqueue_kick(sq->vq);
>  }
>  
> -static u32 do_xdp_prog(struct virtnet_info *vi,
> -		       struct receive_queue *rq,
> -		       struct bpf_prog *xdp_prog,
> -		       void *data, int len)
> -{
> -	int hdr_padded_len;
> -	struct xdp_buff xdp;
> -	void *buf;
> -	unsigned int qp;
> -	u32 act;
> -
> -	if (vi->mergeable_rx_bufs) {
> -		hdr_padded_len = sizeof(struct virtio_net_hdr_mrg_rxbuf);
> -		xdp.data = data + hdr_padded_len;
> -		xdp.data_end = xdp.data + (len - vi->hdr_len);
> -		buf = data;
> -	} else { /* small buffers */
> -		struct sk_buff *skb = data;
> -
> -		xdp.data = skb->data;
> -		xdp.data_end = xdp.data + len;
> -		buf = skb->data;
> -	}
> -
> -	act = bpf_prog_run_xdp(xdp_prog, &xdp);
> -	switch (act) {
> -	case XDP_PASS:
> -		return XDP_PASS;
> -	case XDP_TX:
> -		qp = vi->curr_queue_pairs -
> -			vi->xdp_queue_pairs +
> -			smp_processor_id();
> -		xdp.data = buf;
> -		virtnet_xdp_xmit(vi, rq, &vi->sq[qp], &xdp, data);
> -		return XDP_TX;
> -	default:
> -		bpf_warn_invalid_xdp_action(act);
> -	case XDP_ABORTED:
> -	case XDP_DROP:
> -		return XDP_DROP;
> -	}
> -}
> -
>  static struct sk_buff *receive_small(struct net_device *dev,
>  				     struct virtnet_info *vi,
>  				     struct receive_queue *rq,
> @@ -446,19 +403,30 @@ static struct sk_buff *receive_small(struct net_device *dev,
>  	xdp_prog = rcu_dereference(rq->xdp_prog);
>  	if (xdp_prog) {
>  		struct virtio_net_hdr_mrg_rxbuf *hdr = buf;
> +		struct xdp_buff xdp;
> +		unsigned int qp;
>  		u32 act;
>  
>  		if (unlikely(hdr->hdr.gso_type || hdr->hdr.flags))
>  			goto err_xdp;
> -		act = do_xdp_prog(vi, rq, xdp_prog, skb, len);
> +
> +		xdp.data = skb->data;
> +		xdp.data_end = xdp.data + len;
> +		act = bpf_prog_run_xdp(xdp_prog, &xdp);
>  		switch (act) {
>  		case XDP_PASS:
>  			break;
>  		case XDP_TX:
> +			qp = vi->curr_queue_pairs -
> +				vi->xdp_queue_pairs +
> +				smp_processor_id();
> +			virtnet_xdp_xmit(vi, rq, &vi->sq[qp], &xdp, skb);
>  			rcu_read_unlock();
>  			goto xdp_xmit;
> -		case XDP_DROP:
>  		default:
> +			bpf_warn_invalid_xdp_action(act);
> +		case XDP_ABORTED:
> +		case XDP_DROP:
>  			goto err_xdp;
>  		}
>  	}
> @@ -576,6 +544,9 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
>  	xdp_prog = rcu_dereference(rq->xdp_prog);
>  	if (xdp_prog) {
>  		struct page *xdp_page;
> +		struct xdp_buff xdp;
> +		unsigned int qp;
> +		void *data;
>  		u32 act;
>  
>  		/* This happens when rx buffer size is underestimated */
> @@ -598,8 +569,10 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
>  		if (unlikely(hdr->hdr.gso_type))
>  			goto err_xdp;
>  
> -		act = do_xdp_prog(vi, rq, xdp_prog,
> -				  page_address(xdp_page) + offset, len);
> +		data = page_address(xdp_page) + offset;
> +		xdp.data = data + vi->hdr_len;
> +		xdp.data_end = xdp.data + (len - vi->hdr_len);
> +		act = bpf_prog_run_xdp(xdp_prog, &xdp);
>  		switch (act) {
>  		case XDP_PASS:
>  			/* We can only create skb based on xdp_page. */
> @@ -613,13 +586,19 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
>  			}
>  			break;
>  		case XDP_TX:
> +			qp = vi->curr_queue_pairs -
> +				vi->xdp_queue_pairs +
> +				smp_processor_id();
> +			virtnet_xdp_xmit(vi, rq, &vi->sq[qp], &xdp, data);
>  			ewma_pkt_len_add(&rq->mrg_avg_pkt_len, len);
>  			if (unlikely(xdp_page != page))
>  				goto err_xdp;
>  			rcu_read_unlock();
>  			goto xdp_xmit;
> -		case XDP_DROP:
>  		default:
> +			bpf_warn_invalid_xdp_action(act);
> +		case XDP_ABORTED:
> +		case XDP_DROP:
>  			if (unlikely(xdp_page != page))
>  				__free_pages(xdp_page, 0);
>  			ewma_pkt_len_add(&rq->mrg_avg_pkt_len, len);
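The reworked switch in this diff folds the old do_xdp_prog() dispatch inline: an unknown verdict triggers bpf_warn_invalid_xdp_action() and then falls through to the XDP_ABORTED/XDP_DROP path. A condensed, compilable model of that fall-through (hypothetical names, not the driver's code):

```c
#include <stdio.h>

/* Values mirror the kernel's enum xdp_action; names are stand-ins. */
enum sk_xdp_action { SK_XDP_ABORTED, SK_XDP_DROP, SK_XDP_PASS, SK_XDP_TX };

/* Returns 1 for "pass to stack", 2 for "transmit", 0 for "drop",
 * modeling the switch in receive_small()/receive_mergeable() above. */
static int sk_xdp_verdict(enum sk_xdp_action act)
{
	switch (act) {
	case SK_XDP_PASS:
		return 1;		/* build an skb, hand it to the stack */
	case SK_XDP_TX:
		return 2;		/* xmit from the paired XDP queue */
	default:
		fprintf(stderr, "invalid XDP action %d\n", act);
		/* fall through: unknown verdicts share the drop path */
	case SK_XDP_ABORTED:
	case SK_XDP_DROP:
		return 0;		/* free the buffer */
	}
}
```

Placing default: before the ABORTED/DROP labels is what lets the warning share the drop path without duplicating it.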

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [net PATCH v5 4/6] virtio_net: remove duplicate queue pair binding in XDP
  2017-01-17 22:21 ` [net PATCH v5 4/6] virtio_net: remove duplicate queue pair binding in XDP John Fastabend
@ 2017-01-18 15:49   ` Michael S. Tsirkin
  0 siblings, 0 replies; 40+ messages in thread
From: Michael S. Tsirkin @ 2017-01-18 15:49 UTC (permalink / raw)
  To: John Fastabend
  Cc: jasowang, john.r.fastabend, netdev, alexei.starovoitov, daniel

On Tue, Jan 17, 2017 at 02:21:44PM -0800, John Fastabend wrote:
> Factor out qp assignment.
> 
> Signed-off-by: John Fastabend <john.r.fastabend@intel.com>

Acked-by: Michael S. Tsirkin <mst@redhat.com>

> ---
>  drivers/net/virtio_net.c |   18 +++++++-----------
>  1 file changed, 7 insertions(+), 11 deletions(-)
> 
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index 6de0cbe..922ca66 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -332,15 +332,19 @@ static struct sk_buff *page_to_skb(struct virtnet_info *vi,
>  
>  static void virtnet_xdp_xmit(struct virtnet_info *vi,
>  			     struct receive_queue *rq,
> -			     struct send_queue *sq,
>  			     struct xdp_buff *xdp,
>  			     void *data)
>  {
>  	struct virtio_net_hdr_mrg_rxbuf *hdr;
>  	unsigned int num_sg, len;
> +	struct send_queue *sq;
> +	unsigned int qp;
>  	void *xdp_sent;
>  	int err;
>  
> +	qp = vi->curr_queue_pairs - vi->xdp_queue_pairs + smp_processor_id();
> +	sq = &vi->sq[qp];
> +
>  	/* Free up any pending old buffers before queueing new ones. */
>  	while ((xdp_sent = virtqueue_get_buf(sq->vq, &len)) != NULL) {
>  		if (vi->mergeable_rx_bufs) {
> @@ -404,7 +408,6 @@ static struct sk_buff *receive_small(struct net_device *dev,
>  	if (xdp_prog) {
>  		struct virtio_net_hdr_mrg_rxbuf *hdr = buf;
>  		struct xdp_buff xdp;
> -		unsigned int qp;
>  		u32 act;
>  
>  		if (unlikely(hdr->hdr.gso_type || hdr->hdr.flags))
> @@ -417,10 +420,7 @@ static struct sk_buff *receive_small(struct net_device *dev,
>  		case XDP_PASS:
>  			break;
>  		case XDP_TX:
> -			qp = vi->curr_queue_pairs -
> -				vi->xdp_queue_pairs +
> -				smp_processor_id();
> -			virtnet_xdp_xmit(vi, rq, &vi->sq[qp], &xdp, skb);
> +			virtnet_xdp_xmit(vi, rq, &xdp, skb);
>  			rcu_read_unlock();
>  			goto xdp_xmit;
>  		default:
> @@ -545,7 +545,6 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
>  	if (xdp_prog) {
>  		struct page *xdp_page;
>  		struct xdp_buff xdp;
> -		unsigned int qp;
>  		void *data;
>  		u32 act;
>  
> @@ -586,10 +585,7 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
>  			}
>  			break;
>  		case XDP_TX:
> -			qp = vi->curr_queue_pairs -
> -				vi->xdp_queue_pairs +
> -				smp_processor_id();
> -			virtnet_xdp_xmit(vi, rq, &vi->sq[qp], &xdp, data);
> +			virtnet_xdp_xmit(vi, rq, &xdp, data);
>  			ewma_pkt_len_add(&rq->mrg_avg_pkt_len, len);
>  			if (unlikely(xdp_page != page))
>  				goto err_xdp;


* Re: [net PATCH v5 5/6] virtio_net: refactor freeze/restore logic into virtnet reset logic
  2017-01-17 22:22 ` [net PATCH v5 5/6] virtio_net: refactor freeze/restore logic into virtnet reset logic John Fastabend
@ 2017-01-18 15:50   ` Michael S. Tsirkin
  0 siblings, 0 replies; 40+ messages in thread
From: Michael S. Tsirkin @ 2017-01-18 15:50 UTC (permalink / raw)
  To: John Fastabend
  Cc: jasowang, john.r.fastabend, netdev, alexei.starovoitov, daniel

On Tue, Jan 17, 2017 at 02:22:23PM -0800, John Fastabend wrote:
> For XDP we will need to reset the queues to allow for buffer headroom
> to be configured. In order to do this we need to essentially run the
> freeze()/restore() code path. Unfortunately, the locking requirements
> of the freeze/restore and reset paths differ, so we cannot simply
> reuse the code.
> 
> This patch refactors the code path and adds a reset helper routine.
> 
> Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
> ---
>  drivers/net/virtio_net.c |   75 ++++++++++++++++++++++++++++------------------
>  drivers/virtio/virtio.c  |   42 ++++++++++++++------------
>  include/linux/virtio.h   |    4 ++
>  3 files changed, 73 insertions(+), 48 deletions(-)
> 
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index 922ca66..62dbf4b 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -1684,6 +1684,49 @@ static void virtnet_init_settings(struct net_device *dev)
>  	.set_settings = virtnet_set_settings,
>  };
>  
> +static void virtnet_freeze_down(struct virtio_device *vdev)
> +{
> +	struct virtnet_info *vi = vdev->priv;
> +	int i;
> +
> +	/* Make sure no work handler is accessing the device */
> +	flush_work(&vi->config_work);
> +
> +	netif_device_detach(vi->dev);
> +	cancel_delayed_work_sync(&vi->refill);
> +
> +	if (netif_running(vi->dev)) {
> +		for (i = 0; i < vi->max_queue_pairs; i++)
> +			napi_disable(&vi->rq[i].napi);
> +	}
> +}
> +
> +static int init_vqs(struct virtnet_info *vi);

I dislike forward declarations for static functions -
if you are trying to make the diff more readable
(understandable), then please move this function before its
use in a follow-up patch. The same applies to the next patch.

> +
> +static int virtnet_restore_up(struct virtio_device *vdev)
> +{
> +	struct virtnet_info *vi = vdev->priv;
> +	int err, i;
> +
> +	err = init_vqs(vi);
> +	if (err)
> +		return err;
> +
> +	virtio_device_ready(vdev);
> +
> +	if (netif_running(vi->dev)) {
> +		for (i = 0; i < vi->curr_queue_pairs; i++)
> +			if (!try_fill_recv(vi, &vi->rq[i], GFP_KERNEL))
> +				schedule_delayed_work(&vi->refill, 0);
> +
> +		for (i = 0; i < vi->max_queue_pairs; i++)
> +			virtnet_napi_enable(&vi->rq[i]);
> +	}
> +
> +	netif_device_attach(vi->dev);
> +	return err;
> +}
> +
>  static int virtnet_xdp_set(struct net_device *dev, struct bpf_prog *prog)
>  {
>  	unsigned long int max_sz = PAGE_SIZE - sizeof(struct padded_vnet_hdr);
> @@ -2374,21 +2417,9 @@ static void virtnet_remove(struct virtio_device *vdev)
>  static int virtnet_freeze(struct virtio_device *vdev)
>  {
>  	struct virtnet_info *vi = vdev->priv;
> -	int i;
>  
>  	virtnet_cpu_notif_remove(vi);
> -
> -	/* Make sure no work handler is accessing the device */
> -	flush_work(&vi->config_work);
> -
> -	netif_device_detach(vi->dev);
> -	cancel_delayed_work_sync(&vi->refill);
> -
> -	if (netif_running(vi->dev)) {
> -		for (i = 0; i < vi->max_queue_pairs; i++)
> -			napi_disable(&vi->rq[i].napi);
> -	}
> -
> +	virtnet_freeze_down(vdev);
>  	remove_vq_common(vi);
>  
>  	return 0;
> @@ -2397,25 +2428,11 @@ static int virtnet_freeze(struct virtio_device *vdev)
>  static int virtnet_restore(struct virtio_device *vdev)
>  {
>  	struct virtnet_info *vi = vdev->priv;
> -	int err, i;
> +	int err;
>  
> -	err = init_vqs(vi);
> +	err = virtnet_restore_up(vdev);
>  	if (err)
>  		return err;
> -
> -	virtio_device_ready(vdev);
> -
> -	if (netif_running(vi->dev)) {
> -		for (i = 0; i < vi->curr_queue_pairs; i++)
> -			if (!try_fill_recv(vi, &vi->rq[i], GFP_KERNEL))
> -				schedule_delayed_work(&vi->refill, 0);
> -
> -		for (i = 0; i < vi->max_queue_pairs; i++)
> -			virtnet_napi_enable(&vi->rq[i]);
> -	}
> -
> -	netif_device_attach(vi->dev);
> -
>  	virtnet_set_queues(vi, vi->curr_queue_pairs);
>  
>  	err = virtnet_cpu_notif_add(vi);
> diff --git a/drivers/virtio/virtio.c b/drivers/virtio/virtio.c
> index 7062bb0..400d70b 100644
> --- a/drivers/virtio/virtio.c
> +++ b/drivers/virtio/virtio.c
> @@ -100,11 +100,6 @@ static int virtio_uevent(struct device *_dv, struct kobj_uevent_env *env)
>  			      dev->id.device, dev->id.vendor);
>  }
>  
> -static void add_status(struct virtio_device *dev, unsigned status)
> -{
> -	dev->config->set_status(dev, dev->config->get_status(dev) | status);
> -}
> -
>  void virtio_check_driver_offered_feature(const struct virtio_device *vdev,
>  					 unsigned int fbit)
>  {
> @@ -145,14 +140,15 @@ void virtio_config_changed(struct virtio_device *dev)
>  }
>  EXPORT_SYMBOL_GPL(virtio_config_changed);
>  
> -static void virtio_config_disable(struct virtio_device *dev)
> +void virtio_config_disable(struct virtio_device *dev)
>  {
>  	spin_lock_irq(&dev->config_lock);
>  	dev->config_enabled = false;
>  	spin_unlock_irq(&dev->config_lock);
>  }
> +EXPORT_SYMBOL_GPL(virtio_config_disable);
>  
> -static void virtio_config_enable(struct virtio_device *dev)
> +void virtio_config_enable(struct virtio_device *dev)
>  {
>  	spin_lock_irq(&dev->config_lock);
>  	dev->config_enabled = true;
> @@ -161,8 +157,15 @@ static void virtio_config_enable(struct virtio_device *dev)
>  	dev->config_change_pending = false;
>  	spin_unlock_irq(&dev->config_lock);
>  }
> +EXPORT_SYMBOL_GPL(virtio_config_enable);
> +
> +void virtio_add_status(struct virtio_device *dev, unsigned int status)
> +{
> +	dev->config->set_status(dev, dev->config->get_status(dev) | status);
> +}
> +EXPORT_SYMBOL_GPL(virtio_add_status);
>  
> -static int virtio_finalize_features(struct virtio_device *dev)
> +int virtio_finalize_features(struct virtio_device *dev)
>  {
>  	int ret = dev->config->finalize_features(dev);
>  	unsigned status;
> @@ -173,7 +176,7 @@ static int virtio_finalize_features(struct virtio_device *dev)
>  	if (!virtio_has_feature(dev, VIRTIO_F_VERSION_1))
>  		return 0;
>  
> -	add_status(dev, VIRTIO_CONFIG_S_FEATURES_OK);
> +	virtio_add_status(dev, VIRTIO_CONFIG_S_FEATURES_OK);
>  	status = dev->config->get_status(dev);
>  	if (!(status & VIRTIO_CONFIG_S_FEATURES_OK)) {
>  		dev_err(&dev->dev, "virtio: device refuses features: %x\n",
> @@ -182,6 +185,7 @@ static int virtio_finalize_features(struct virtio_device *dev)
>  	}
>  	return 0;
>  }
> +EXPORT_SYMBOL_GPL(virtio_finalize_features);
>  
>  static int virtio_dev_probe(struct device *_d)
>  {
> @@ -193,7 +197,7 @@ static int virtio_dev_probe(struct device *_d)
>  	u64 driver_features_legacy;
>  
>  	/* We have a driver! */
> -	add_status(dev, VIRTIO_CONFIG_S_DRIVER);
> +	virtio_add_status(dev, VIRTIO_CONFIG_S_DRIVER);
>  
>  	/* Figure out what features the device supports. */
>  	device_features = dev->config->get_features(dev);
> @@ -247,7 +251,7 @@ static int virtio_dev_probe(struct device *_d)
>  
>  	return 0;
>  err:
> -	add_status(dev, VIRTIO_CONFIG_S_FAILED);
> +	virtio_add_status(dev, VIRTIO_CONFIG_S_FAILED);
>  	return err;
>  
>  }
> @@ -265,7 +269,7 @@ static int virtio_dev_remove(struct device *_d)
>  	WARN_ON_ONCE(dev->config->get_status(dev));
>  
>  	/* Acknowledge the device's existence again. */
> -	add_status(dev, VIRTIO_CONFIG_S_ACKNOWLEDGE);
> +	virtio_add_status(dev, VIRTIO_CONFIG_S_ACKNOWLEDGE);
>  	return 0;
>  }
>  
> @@ -316,7 +320,7 @@ int register_virtio_device(struct virtio_device *dev)
>  	dev->config->reset(dev);
>  
>  	/* Acknowledge that we've seen the device. */
> -	add_status(dev, VIRTIO_CONFIG_S_ACKNOWLEDGE);
> +	virtio_add_status(dev, VIRTIO_CONFIG_S_ACKNOWLEDGE);
>  
>  	INIT_LIST_HEAD(&dev->vqs);
>  
> @@ -325,7 +329,7 @@ int register_virtio_device(struct virtio_device *dev)
>  	err = device_register(&dev->dev);
>  out:
>  	if (err)
> -		add_status(dev, VIRTIO_CONFIG_S_FAILED);
> +		virtio_add_status(dev, VIRTIO_CONFIG_S_FAILED);
>  	return err;
>  }
>  EXPORT_SYMBOL_GPL(register_virtio_device);
> @@ -365,18 +369,18 @@ int virtio_device_restore(struct virtio_device *dev)
>  	dev->config->reset(dev);
>  
>  	/* Acknowledge that we've seen the device. */
> -	add_status(dev, VIRTIO_CONFIG_S_ACKNOWLEDGE);
> +	virtio_add_status(dev, VIRTIO_CONFIG_S_ACKNOWLEDGE);
>  
>  	/* Maybe driver failed before freeze.
>  	 * Restore the failed status, for debugging. */
>  	if (dev->failed)
> -		add_status(dev, VIRTIO_CONFIG_S_FAILED);
> +		virtio_add_status(dev, VIRTIO_CONFIG_S_FAILED);
>  
>  	if (!drv)
>  		return 0;
>  
>  	/* We have a driver! */
> -	add_status(dev, VIRTIO_CONFIG_S_DRIVER);
> +	virtio_add_status(dev, VIRTIO_CONFIG_S_DRIVER);
>  
>  	ret = virtio_finalize_features(dev);
>  	if (ret)
> @@ -389,14 +393,14 @@ int virtio_device_restore(struct virtio_device *dev)
>  	}
>  
>  	/* Finally, tell the device we're all set */
> -	add_status(dev, VIRTIO_CONFIG_S_DRIVER_OK);
> +	virtio_add_status(dev, VIRTIO_CONFIG_S_DRIVER_OK);
>  
>  	virtio_config_enable(dev);
>  
>  	return 0;
>  
>  err:
> -	add_status(dev, VIRTIO_CONFIG_S_FAILED);
> +	virtio_add_status(dev, VIRTIO_CONFIG_S_FAILED);
>  	return ret;
>  }
>  EXPORT_SYMBOL_GPL(virtio_device_restore);
> diff --git a/include/linux/virtio.h b/include/linux/virtio.h
> index d5eb547..04b0d3f 100644
> --- a/include/linux/virtio.h
> +++ b/include/linux/virtio.h
> @@ -132,12 +132,16 @@ static inline struct virtio_device *dev_to_virtio(struct device *_dev)
>  	return container_of(_dev, struct virtio_device, dev);
>  }
>  
> +void virtio_add_status(struct virtio_device *dev, unsigned int status);
>  int register_virtio_device(struct virtio_device *dev);
>  void unregister_virtio_device(struct virtio_device *dev);
>  
>  void virtio_break_device(struct virtio_device *dev);
>  
>  void virtio_config_changed(struct virtio_device *dev);
> +void virtio_config_disable(struct virtio_device *dev);
> +void virtio_config_enable(struct virtio_device *dev);
> +int virtio_finalize_features(struct virtio_device *dev);
>  #ifdef CONFIG_PM_SLEEP
>  int virtio_device_freeze(struct virtio_device *dev);
>  int virtio_device_restore(struct virtio_device *dev);


* Re: [net PATCH v5 6/6] virtio_net: XDP support for adjust_head
  2017-01-18 15:15   ` Michael S. Tsirkin
@ 2017-01-19  3:05     ` Jason Wang
  2017-01-19 21:11       ` Michael S. Tsirkin
  0 siblings, 1 reply; 40+ messages in thread
From: Jason Wang @ 2017-01-19  3:05 UTC (permalink / raw)
  To: Michael S. Tsirkin, John Fastabend
  Cc: john.r.fastabend, netdev, alexei.starovoitov, daniel



On 2017年01月18日 23:15, Michael S. Tsirkin wrote:
> On Tue, Jan 17, 2017 at 02:22:59PM -0800, John Fastabend wrote:
>> Add support for XDP adjust head by allocating a 256B header region
>> that XDP programs can grow into. This is only enabled when a XDP
>> program is loaded.
>>
>> In order to ensure that we do not have to unwind queue headroom push
>> queue setup below bpf_prog_add. It reads better to do a prog ref
>> unwind vs another queue setup call.
>>
>> At the moment this code must do a full reset to ensure old buffers
>> without headroom on program add or with headroom on program removal
>> are not used incorrectly in the datapath. Ideally we would only
>> have to disable/enable the RX queues being updated but there is no
>> API to do this at the moment in virtio so use the big hammer. In
>> practice it is likely not that big of a problem as this will only
>> happen when XDP is enabled/disabled changing programs does not
>> require the reset. There is some risk that the driver may either
>> have an allocation failure or for some reason fail to correctly
>> negotiate with the underlying backend in this case the driver will
>> be left uninitialized. I have not seen this ever happen on my test
>> systems and for what its worth this same failure case can occur
>> from probe and other contexts in virtio framework.
>>
>> Signed-off-by: John Fastabend<john.r.fastabend@intel.com>
> I've been thinking about it - can't we drop
> old buffers without the head room which were posted before
> xdp attached?
>
> Avoiding the reset would be much nicer.
>
> Thoughts?
>

As discussed before, the device may be using them at the same time, so
it's not safe. Or do you mean detect them after XDP is set and drop the
buffers without headroom? That looks sub-optimal.

Thanks
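The headroom at issue is the 256B region patch 6/6 reserves in front of each receive buffer so that bpf_xdp_adjust_head() can move xdp->data backwards; buffers posted before the program was attached lack that slack. A toy model of the pointer math (hypothetical struct and helper, not the kernel implementation):

```c
#include <stddef.h>

#define SK_XDP_HEADROOM 256	/* matches the region the patch reserves */

struct sk_xdp_buff {
	unsigned char *hard_start;	/* true start of the buffer */
	unsigned char *data;		/* current packet start */
	unsigned char *data_end;
};

/* Move data by delta (negative grows the packet into the headroom),
 * failing if the result would leave the buffer, roughly like
 * bpf_xdp_adjust_head(). */
static int sk_adjust_head(struct sk_xdp_buff *xdp, int delta)
{
	unsigned char *data = xdp->data + delta;

	if (data < xdp->hard_start || data > xdp->data_end)
		return -1;
	xdp->data = data;
	return 0;
}
```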


* Re: [net PATCH v5 6/6] virtio_net: XDP support for adjust_head
  2017-01-19  3:05     ` Jason Wang
@ 2017-01-19 21:11       ` Michael S. Tsirkin
  2017-01-20  3:26         ` Jason Wang
                           ` (2 more replies)
  0 siblings, 3 replies; 40+ messages in thread
From: Michael S. Tsirkin @ 2017-01-19 21:11 UTC (permalink / raw)
  To: Jason Wang
  Cc: John Fastabend, john.r.fastabend, netdev, alexei.starovoitov, daniel

On Thu, Jan 19, 2017 at 11:05:40AM +0800, Jason Wang wrote:
> 
> 
> On 2017年01月18日 23:15, Michael S. Tsirkin wrote:
> > On Tue, Jan 17, 2017 at 02:22:59PM -0800, John Fastabend wrote:
> > > Add support for XDP adjust head by allocating a 256B header region
> > > that XDP programs can grow into. This is only enabled when a XDP
> > > program is loaded.
> > > 
> > > In order to ensure that we do not have to unwind queue headroom push
> > > queue setup below bpf_prog_add. It reads better to do a prog ref
> > > unwind vs another queue setup call.
> > > 
> > > At the moment this code must do a full reset to ensure old buffers
> > > without headroom on program add or with headroom on program removal
> > > are not used incorrectly in the datapath. Ideally we would only
> > > have to disable/enable the RX queues being updated but there is no
> > > API to do this at the moment in virtio so use the big hammer. In
> > > practice it is likely not that big of a problem as this will only
> > > happen when XDP is enabled/disabled changing programs does not
> > > require the reset. There is some risk that the driver may either
> > > have an allocation failure or for some reason fail to correctly
> > > negotiate with the underlying backend in this case the driver will
> > > be left uninitialized. I have not seen this ever happen on my test
> > > systems and for what its worth this same failure case can occur
> > > from probe and other contexts in virtio framework.
> > > 
> > > Signed-off-by: John Fastabend<john.r.fastabend@intel.com>
> > I've been thinking about it - can't we drop
> > old buffers without the head room which were posted before
> > xdp attached?
> > 
> > Avoiding the reset would be much nicer.
> > 
> > Thoughts?
> > 
> 
> As been discussed before, device may use them in the same time so it's not
> safe. Or do you mean detect them after xdp were set and drop the buffer
> without head room, this looks sub-optimal.
> 
> Thanks

Yes, this is what I mean. Why is this suboptimal? It's a single branch
in the code. Yes, we might lose some packets, but the big hammer of a
device reset will likely lose more.

-- 
MST


* Re: [net PATCH v5 6/6] virtio_net: XDP support for adjust_head
  2017-01-19 21:11       ` Michael S. Tsirkin
@ 2017-01-20  3:26         ` Jason Wang
  2017-01-20  3:39           ` John Fastabend
  2017-01-20  3:38         ` John Fastabend
  2017-01-20 16:59         ` David Laight
  2 siblings, 1 reply; 40+ messages in thread
From: Jason Wang @ 2017-01-20  3:26 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: John Fastabend, john.r.fastabend, netdev, alexei.starovoitov, daniel



On 2017年01月20日 05:11, Michael S. Tsirkin wrote:
> On Thu, Jan 19, 2017 at 11:05:40AM +0800, Jason Wang wrote:
>>
>> On 2017年01月18日 23:15, Michael S. Tsirkin wrote:
>>> On Tue, Jan 17, 2017 at 02:22:59PM -0800, John Fastabend wrote:
>>>> Add support for XDP adjust head by allocating a 256B header region
>>>> that XDP programs can grow into. This is only enabled when a XDP
>>>> program is loaded.
>>>>
>>>> In order to ensure that we do not have to unwind queue headroom push
>>>> queue setup below bpf_prog_add. It reads better to do a prog ref
>>>> unwind vs another queue setup call.
>>>>
>>>> At the moment this code must do a full reset to ensure old buffers
>>>> without headroom on program add or with headroom on program removal
>>>> are not used incorrectly in the datapath. Ideally we would only
>>>> have to disable/enable the RX queues being updated but there is no
>>>> API to do this at the moment in virtio so use the big hammer. In
>>>> practice it is likely not that big of a problem as this will only
>>>> happen when XDP is enabled/disabled changing programs does not
>>>> require the reset. There is some risk that the driver may either
>>>> have an allocation failure or for some reason fail to correctly
>>>> negotiate with the underlying backend in this case the driver will
>>>> be left uninitialized. I have not seen this ever happen on my test
>>>> systems and for what its worth this same failure case can occur
>>>> from probe and other contexts in virtio framework.
>>>>
>>>> Signed-off-by: John Fastabend<john.r.fastabend@intel.com>
>>> I've been thinking about it - can't we drop
>>> old buffers without the head room which were posted before
>>> xdp attached?
>>>
>>> Avoiding the reset would be much nicer.
>>>
>>> Thoughts?
>>>
>> As been discussed before, device may use them in the same time so it's not
>> safe. Or do you mean detect them after xdp were set and drop the buffer
>> without head room, this looks sub-optimal.
>>
>> Thanks
> Yes, this is what I mean.  Why is this suboptimal? It's a single branch
> in code. Yes we might lose some packets but the big hammer of device
> reset will likely lose more.
>

Maybe I was wrong, but I think drivers should try their best to avoid
dropping packets. (And look at mlx4; it does something similar to this
patch.)

Thanks


* Re: [net PATCH v5 6/6] virtio_net: XDP support for adjust_head
  2017-01-19 21:11       ` Michael S. Tsirkin
  2017-01-20  3:26         ` Jason Wang
@ 2017-01-20  3:38         ` John Fastabend
  2017-01-20 16:59         ` David Laight
  2 siblings, 0 replies; 40+ messages in thread
From: John Fastabend @ 2017-01-20  3:38 UTC (permalink / raw)
  To: Michael S. Tsirkin, Jason Wang
  Cc: john.r.fastabend, netdev, alexei.starovoitov, daniel

On 17-01-19 01:11 PM, Michael S. Tsirkin wrote:
> On Thu, Jan 19, 2017 at 11:05:40AM +0800, Jason Wang wrote:
>>
>>
>> On 2017年01月18日 23:15, Michael S. Tsirkin wrote:
>>> On Tue, Jan 17, 2017 at 02:22:59PM -0800, John Fastabend wrote:
>>>> Add support for XDP adjust head by allocating a 256B header region
>>>> that XDP programs can grow into. This is only enabled when a XDP
>>>> program is loaded.
>>>>
>>>> In order to ensure that we do not have to unwind queue headroom push
>>>> queue setup below bpf_prog_add. It reads better to do a prog ref
>>>> unwind vs another queue setup call.
>>>>
>>>> At the moment this code must do a full reset to ensure old buffers
>>>> without headroom on program add or with headroom on program removal
>>>> are not used incorrectly in the datapath. Ideally we would only
>>>> have to disable/enable the RX queues being updated but there is no
>>>> API to do this at the moment in virtio so use the big hammer. In
>>>> practice it is likely not that big of a problem as this will only
>>>> happen when XDP is enabled/disabled changing programs does not
>>>> require the reset. There is some risk that the driver may either
>>>> have an allocation failure or for some reason fail to correctly
>>>> negotiate with the underlying backend in this case the driver will
>>>> be left uninitialized. I have not seen this ever happen on my test
>>>> systems and for what its worth this same failure case can occur
>>>> from probe and other contexts in virtio framework.
>>>>
>>>> Signed-off-by: John Fastabend<john.r.fastabend@intel.com>
>>> I've been thinking about it - can't we drop
>>> old buffers without the head room which were posted before
>>> xdp attached?
>>>
>>> Avoiding the reset would be much nicer.
>>>
>>> Thoughts?
>>>
>>
>> As been discussed before, device may use them in the same time so it's not
>> safe. Or do you mean detect them after xdp were set and drop the buffer
>> without head room, this looks sub-optimal.
>>
>> Thanks
> 
> Yes, this is what I mean.  Why is this suboptimal? It's a single branch
> in code. Yes we might lose some packets but the big hammer of device
> reset will likely lose more.
> 

Maybe I'm not following: is the suggestion to drop the packets for all
outstanding buffers after XDP is set up, until we have reallocated all
of the buffers? In that case we can't just detach the buffers; we have
to wait until the backend retires them by using them, correct?

But when the XDP setup call returns, we need to guarantee that the
buffers and the driver are set up. Otherwise the next n packets get
dropped, and if there is no traffic at the moment, that could happen at
some undetermined point in the future. This would be very buggy.

Did I miss something?

Thanks,
John


* Re: [net PATCH v5 6/6] virtio_net: XDP support for adjust_head
  2017-01-20  3:26         ` Jason Wang
@ 2017-01-20  3:39           ` John Fastabend
  0 siblings, 0 replies; 40+ messages in thread
From: John Fastabend @ 2017-01-20  3:39 UTC (permalink / raw)
  To: Jason Wang, Michael S. Tsirkin
  Cc: john.r.fastabend, netdev, alexei.starovoitov, daniel

On 17-01-19 07:26 PM, Jason Wang wrote:
> 
> 
> On 2017年01月20日 05:11, Michael S. Tsirkin wrote:
>> On Thu, Jan 19, 2017 at 11:05:40AM +0800, Jason Wang wrote:
>>>
>>> On 2017年01月18日 23:15, Michael S. Tsirkin wrote:
>>>> On Tue, Jan 17, 2017 at 02:22:59PM -0800, John Fastabend wrote:
>>>>> Add support for XDP adjust head by allocating a 256B header region
>>>>> that XDP programs can grow into. This is only enabled when a XDP
>>>>> program is loaded.
>>>>>
>>>>> In order to ensure that we do not have to unwind queue headroom push
>>>>> queue setup below bpf_prog_add. It reads better to do a prog ref
>>>>> unwind vs another queue setup call.
>>>>>
>>>>> At the moment this code must do a full reset to ensure old buffers
>>>>> without headroom on program add or with headroom on program removal
>>>>> are not used incorrectly in the datapath. Ideally we would only
>>>>> have to disable/enable the RX queues being updated but there is no
>>>>> API to do this at the moment in virtio so use the big hammer. In
>>>>> practice it is likely not that big of a problem as this will only
>>>>> happen when XDP is enabled/disabled changing programs does not
>>>>> require the reset. There is some risk that the driver may either
>>>>> have an allocation failure or for some reason fail to correctly
>>>>> negotiate with the underlying backend in this case the driver will
>>>>> be left uninitialized. I have not seen this ever happen on my test
>>>>> systems and for what its worth this same failure case can occur
>>>>> from probe and other contexts in virtio framework.
>>>>>
>>>>> Signed-off-by: John Fastabend<john.r.fastabend@intel.com>
>>>> I've been thinking about it - can't we drop
>>>> old buffers without the head room which were posted before
>>>> xdp attached?
>>>>
>>>> Avoiding the reset would be much nicer.
>>>>
>>>> Thoughts?
>>>>
>>> As been discussed before, device may use them in the same time so it's not
>>> safe. Or do you mean detect them after xdp were set and drop the buffer
>>> without head room, this looks sub-optimal.
>>>
>>> Thanks
>> Yes, this is what I mean.  Why is this suboptimal? It's a single branch
>> in code. Yes we might lose some packets but the big hammer of device
>> reset will likely lose more.
>>
> 
> Maybe I was wrong but I think driver should try their best to avoid dropping
> packets. (And look at mlx4, it did something similar to this patch).
> 
> Thanks

+1. Sorry, I didn't see your reply as I was typing mine. Bottom line:
when the XDP setup call returns, I believe the driver must be ready to
accept packets, or managing XDP will be problematic.

.John


* RE: [net PATCH v5 6/6] virtio_net: XDP support for adjust_head
  2017-01-19 21:11       ` Michael S. Tsirkin
  2017-01-20  3:26         ` Jason Wang
  2017-01-20  3:38         ` John Fastabend
@ 2017-01-20 16:59         ` David Laight
  2017-01-20 17:48           ` Michael S. Tsirkin
  2 siblings, 1 reply; 40+ messages in thread
From: David Laight @ 2017-01-20 16:59 UTC (permalink / raw)
  To: 'Michael S. Tsirkin', Jason Wang
  Cc: John Fastabend, john.r.fastabend, netdev, alexei.starovoitov, daniel

From: Michael S. Tsirkin
> Sent: 19 January 2017 21:12
> > On 2017年01月18日 23:15, Michael S. Tsirkin wrote:
> > > On Tue, Jan 17, 2017 at 02:22:59PM -0800, John Fastabend wrote:
> > > > Add support for XDP adjust head by allocating a 256B header region
> > > > that XDP programs can grow into. This is only enabled when a XDP
> > > > program is loaded.
> > > >
> > > > In order to ensure that we do not have to unwind queue headroom push
> > > > queue setup below bpf_prog_add. It reads better to do a prog ref
> > > > unwind vs another queue setup call.
> > > >
> > > > At the moment this code must do a full reset to ensure old buffers
> > > > without headroom on program add or with headroom on program removal
> > > > are not used incorrectly in the datapath. Ideally we would only
> > > > have to disable/enable the RX queues being updated but there is no
> > > > API to do this at the moment in virtio so use the big hammer. In
> > > > practice it is likely not that big of a problem as this will only
> > > > happen when XDP is enabled/disabled changing programs does not
> > > > require the reset. There is some risk that the driver may either
> > > > have an allocation failure or for some reason fail to correctly
> > > > negotiate with the underlying backend in this case the driver will
> > > > be left uninitialized. I have not seen this ever happen on my test
> > > > systems and for what its worth this same failure case can occur
> > > > from probe and other contexts in virtio framework.
> > > >
> > > > Signed-off-by: John Fastabend<john.r.fastabend@intel.com>
> > > I've been thinking about it - can't we drop
> > > old buffers without the head room which were posted before
> > > xdp attached?
> > >
> > > Avoiding the reset would be much nicer.
> > >
> > > Thoughts?
> > >
> >
> > As been discussed before, device may use them in the same time so it's not
> > safe. Or do you mean detect them after xdp were set and drop the buffer
> > without head room, this looks sub-optimal.
> >
> > Thanks
> 
> Yes, this is what I mean.  Why is this suboptimal? It's a single branch
> in code. Yes we might lose some packets but the big hammer of device
> reset will likely lose more.

Why not let the hardware receive into the 'small' buffer (without
headroom) and do a copy when a frame is received?
Replace the buffers with 'big' ones for the next receive.
A data copy on a ring full of buffers won't really be noticed.

	David
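
David's transitional scheme, sketched in isolation: a frame that lands in an
old, headroom-less buffer gets copied into a replacement buffer that reserves
headroom up front. This is a minimal userspace sketch, not driver code;
XDP_HEADROOM mirrors the 256B VIRTIO_XDP_HEADROOM from the patch, and the
helper name is illustrative:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define XDP_HEADROOM 256 /* illustrative; mirrors VIRTIO_XDP_HEADROOM */

/* Copy a frame that arrived in a pre-XDP buffer (no headroom) into a
 * freshly allocated buffer that reserves XDP_HEADROOM bytes in front,
 * so the XDP program can still grow the packet header downwards. */
static unsigned char *copy_to_headroom_buf(const unsigned char *old_buf,
					   size_t len)
{
	unsigned char *buf = malloc(XDP_HEADROOM + len);

	if (!buf)
		return NULL;
	/* frame data starts after the reserved headroom */
	memcpy(buf + XDP_HEADROOM, old_buf, len);
	return buf;
}
```

The point of the scheme is that this copy only happens for the at most one
ring's worth of buffers posted before XDP was attached.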


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [net PATCH v5 6/6] virtio_net: XDP support for adjust_head
  2017-01-20 16:59         ` David Laight
@ 2017-01-20 17:48           ` Michael S. Tsirkin
  2017-01-22  2:51             ` Jason Wang
  0 siblings, 1 reply; 40+ messages in thread
From: Michael S. Tsirkin @ 2017-01-20 17:48 UTC (permalink / raw)
  To: David Laight
  Cc: Jason Wang, John Fastabend, john.r.fastabend, netdev,
	alexei.starovoitov, daniel

On Fri, Jan 20, 2017 at 04:59:11PM +0000, David Laight wrote:
> From: Michael S. Tsirkin
> > Sent: 19 January 2017 21:12
> > > On 2017年01月18日 23:15, Michael S. Tsirkin wrote:
> > > > On Tue, Jan 17, 2017 at 02:22:59PM -0800, John Fastabend wrote:
> > > > > Add support for XDP adjust head by allocating a 256B header region
> > > > > that XDP programs can grow into. This is only enabled when a XDP
> > > > > program is loaded.
> > > > >
> > > > > In order to ensure that we do not have to unwind queue headroom push
> > > > > queue setup below bpf_prog_add. It reads better to do a prog ref
> > > > > unwind vs another queue setup call.
> > > > >
> > > > > At the moment this code must do a full reset to ensure old buffers
> > > > > without headroom on program add or with headroom on program removal
> > > > > are not used incorrectly in the datapath. Ideally we would only
> > > > > have to disable/enable the RX queues being updated but there is no
> > > > > API to do this at the moment in virtio so use the big hammer. In
> > > > > practice it is likely not that big of a problem, as this will only
> > > > > happen when XDP is enabled/disabled; changing programs does not
> > > > > require the reset. There is some risk that the driver may either
> > > > > have an allocation failure or for some reason fail to correctly
> > > > > negotiate with the underlying backend; in this case the driver will
> > > > > be left uninitialized. I have never seen this happen on my test
> > > > > systems and, for what it's worth, this same failure case can occur
> > > > > from probe and other contexts in the virtio framework.
> > > > >
> > > > > Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
> > > > I've been thinking about it - can't we drop
> > > > old buffers without the head room which were posted before
> > > > xdp attached?
> > > >
> > > > Avoiding the reset would be much nicer.
> > > >
> > > > Thoughts?
> > > >
> > >
> > > As discussed before, the device may use them at the same time, so it's not
> > > safe. Or do you mean detect them after XDP is set and drop the buffers
> > > without headroom? That looks sub-optimal.
> > >
> > > Thanks
> > 
> > Yes, this is what I mean.  Why is this suboptimal? It's a single branch
> > in code. Yes we might lose some packets but the big hammer of device
> > reset will likely lose more.
> 
> Why not let the hardware receive into the 'small' buffer (without
> headroom) and do a copy when a frame is received?
> Replace the buffers with 'big' ones for the next receive.
> A data copy on a ring full of buffers won't really be noticed.
> 
> 	David
> 

I like that. John?

-- 
MST


* Re: [net PATCH v5 6/6] virtio_net: XDP support for adjust_head
  2017-01-20 17:48           ` Michael S. Tsirkin
@ 2017-01-22  2:51             ` Jason Wang
  2017-01-22  4:14               ` John Fastabend
  0 siblings, 1 reply; 40+ messages in thread
From: Jason Wang @ 2017-01-22  2:51 UTC (permalink / raw)
  To: Michael S. Tsirkin, David Laight
  Cc: John Fastabend, john.r.fastabend, netdev, alexei.starovoitov, daniel



On 2017年01月21日 01:48, Michael S. Tsirkin wrote:
> On Fri, Jan 20, 2017 at 04:59:11PM +0000, David Laight wrote:
>> From: Michael S. Tsirkin
>>> Sent: 19 January 2017 21:12
>>>> On 2017年01月18日 23:15, Michael S. Tsirkin wrote:
>>>>> On Tue, Jan 17, 2017 at 02:22:59PM -0800, John Fastabend wrote:
>>>>>> Add support for XDP adjust head by allocating a 256B header region
>>>>>> that XDP programs can grow into. This is only enabled when a XDP
>>>>>> program is loaded.
>>>>>>
>>>>>> In order to ensure that we do not have to unwind queue headroom push
>>>>>> queue setup below bpf_prog_add. It reads better to do a prog ref
>>>>>> unwind vs another queue setup call.
>>>>>>
>>>>>> At the moment this code must do a full reset to ensure old buffers
>>>>>> without headroom on program add or with headroom on program removal
>>>>>> are not used incorrectly in the datapath. Ideally we would only
>>>>>> have to disable/enable the RX queues being updated but there is no
>>>>>> API to do this at the moment in virtio so use the big hammer. In
>>>>>> practice it is likely not that big of a problem, as this will only
>>>>>> happen when XDP is enabled/disabled; changing programs does not
>>>>>> require the reset. There is some risk that the driver may either
>>>>>> have an allocation failure or for some reason fail to correctly
>>>>>> negotiate with the underlying backend; in this case the driver will
>>>>>> be left uninitialized. I have never seen this happen on my test
>>>>>> systems and, for what it's worth, this same failure case can occur
>>>>>> from probe and other contexts in the virtio framework.
>>>>>>
>>>>>> Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
>>>>> I've been thinking about it - can't we drop
>>>>> old buffers without the head room which were posted before
>>>>> xdp attached?
>>>>>
>>>>> Avoiding the reset would be much nicer.
>>>>>
>>>>> Thoughts?
>>>>>
>>>> As discussed before, the device may use them at the same time, so it's not
>>>> safe. Or do you mean detect them after XDP is set and drop the buffers
>>>> without headroom? That looks sub-optimal.
>>>>
>>>> Thanks
>>> Yes, this is what I mean.  Why is this suboptimal? It's a single branch
>>> in code. Yes we might lose some packets but the big hammer of device
>>> reset will likely lose more.
>> Why not let the hardware receive into the 'small' buffer (without
>> headroom) and do a copy when a frame is received?
>> Replace the buffers with 'big' ones for the next receive.
>> A data copy on a ring full of buffers won't really be noticed.
>>
>> 	David
>>
> I like that. John?
>

This works; I'd prefer it only if it results in simpler code than the
reset does (but I suspect it won't).

Thanks


* Re: [net PATCH v5 6/6] virtio_net: XDP support for adjust_head
  2017-01-22  2:51             ` Jason Wang
@ 2017-01-22  4:14               ` John Fastabend
  2017-01-23 17:02                 ` Michael S. Tsirkin
  0 siblings, 1 reply; 40+ messages in thread
From: John Fastabend @ 2017-01-22  4:14 UTC (permalink / raw)
  To: Jason Wang, Michael S. Tsirkin, David Laight
  Cc: john.r.fastabend, netdev, alexei.starovoitov, daniel

On 17-01-21 06:51 PM, Jason Wang wrote:
> 
> 
> On 2017年01月21日 01:48, Michael S. Tsirkin wrote:
>> On Fri, Jan 20, 2017 at 04:59:11PM +0000, David Laight wrote:
>>> From: Michael S. Tsirkin
>>>> Sent: 19 January 2017 21:12
>>>>> On 2017年01月18日 23:15, Michael S. Tsirkin wrote:
>>>>>> On Tue, Jan 17, 2017 at 02:22:59PM -0800, John Fastabend wrote:
>>>>>>> Add support for XDP adjust head by allocating a 256B header region
>>>>>>> that XDP programs can grow into. This is only enabled when a XDP
>>>>>>> program is loaded.
>>>>>>>
>>>>>>> In order to ensure that we do not have to unwind queue headroom push
>>>>>>> queue setup below bpf_prog_add. It reads better to do a prog ref
>>>>>>> unwind vs another queue setup call.
>>>>>>>
>>>>>>> At the moment this code must do a full reset to ensure old buffers
>>>>>>> without headroom on program add or with headroom on program removal
>>>>>>> are not used incorrectly in the datapath. Ideally we would only
>>>>>>> have to disable/enable the RX queues being updated but there is no
>>>>>>> API to do this at the moment in virtio so use the big hammer. In
>>>>>>> practice it is likely not that big of a problem, as this will only
>>>>>>> happen when XDP is enabled/disabled; changing programs does not
>>>>>>> require the reset. There is some risk that the driver may either
>>>>>>> have an allocation failure or for some reason fail to correctly
>>>>>>> negotiate with the underlying backend; in this case the driver will
>>>>>>> be left uninitialized. I have never seen this happen on my test
>>>>>>> systems and, for what it's worth, this same failure case can occur
>>>>>>> from probe and other contexts in the virtio framework.
>>>>>>>
>>>>>>> Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
>>>>>> I've been thinking about it - can't we drop
>>>>>> old buffers without the head room which were posted before
>>>>>> xdp attached?
>>>>>>
>>>>>> Avoiding the reset would be much nicer.
>>>>>>
>>>>>> Thoughts?
>>>>>>
>>>>> As discussed before, the device may use them at the same time, so it's not
>>>>> safe. Or do you mean detect them after XDP is set and drop the buffers
>>>>> without headroom? That looks sub-optimal.
>>>>>
>>>>> Thanks
>>>> Yes, this is what I mean.  Why is this suboptimal? It's a single branch
>>>> in code. Yes we might lose some packets but the big hammer of device
>>>> reset will likely lose more.
>>> Why not let the hardware receive into the 'small' buffer (without
>>> headroom) and do a copy when a frame is received?
>>> Replace the buffers with 'big' ones for the next receive.
>>> A data copy on a ring full of buffers won't really be noticed.
>>>
>>>     David
>>>
>> I like that. John?
>>
> 
> This works; I'd prefer it only if it results in simpler code than the reset does (but I suspect it won't).
> 
> Thanks

Before the reset path I looked at doing this, but it seems to require tracking
whether a buffer has headroom on a per-buffer basis, and I don't see a good
spot to put a bit like this. It could go in the inbuf 'ctx' added by
virtqueue_add_inbuf, but I would need to change the current usage of ctx,
which in the mergeable case at least is just a simple pointer today. I don't
like this because it complicates both the normal path and the XDP hotpath.

Otherwise we could somehow mark the ring at the point where XDP is enabled so
that it can learn when a full iteration around the ring has completed. But I
can't see a simple way to make this work either.

I don't know; the reset looks straightforward to me, and although not ideal
it is fairly common in hardware-based drivers during configuration changes.
I'm open to any ideas on where to put the metadata to track headroom though.

Thanks,
John


* Re: [net PATCH v5 6/6] virtio_net: XDP support for adjust_head
  2017-01-22  4:14               ` John Fastabend
@ 2017-01-23 17:02                 ` Michael S. Tsirkin
  0 siblings, 0 replies; 40+ messages in thread
From: Michael S. Tsirkin @ 2017-01-23 17:02 UTC (permalink / raw)
  To: John Fastabend
  Cc: Jason Wang, David Laight, john.r.fastabend, netdev,
	alexei.starovoitov, daniel

On Sat, Jan 21, 2017 at 08:14:19PM -0800, John Fastabend wrote:
> On 17-01-21 06:51 PM, Jason Wang wrote:
> > 
> > 
> > On 2017年01月21日 01:48, Michael S. Tsirkin wrote:
> >> On Fri, Jan 20, 2017 at 04:59:11PM +0000, David Laight wrote:
> >>> From: Michael S. Tsirkin
> >>>> Sent: 19 January 2017 21:12
> >>>>> On 2017年01月18日 23:15, Michael S. Tsirkin wrote:
> >>>>>> On Tue, Jan 17, 2017 at 02:22:59PM -0800, John Fastabend wrote:
> >>>>>>> Add support for XDP adjust head by allocating a 256B header region
> >>>>>>> that XDP programs can grow into. This is only enabled when a XDP
> >>>>>>> program is loaded.
> >>>>>>>
> >>>>>>> In order to ensure that we do not have to unwind queue headroom push
> >>>>>>> queue setup below bpf_prog_add. It reads better to do a prog ref
> >>>>>>> unwind vs another queue setup call.
> >>>>>>>
> >>>>>>> At the moment this code must do a full reset to ensure old buffers
> >>>>>>> without headroom on program add or with headroom on program removal
> >>>>>>> are not used incorrectly in the datapath. Ideally we would only
> >>>>>>> have to disable/enable the RX queues being updated but there is no
> >>>>>>> API to do this at the moment in virtio so use the big hammer. In
> >>>>>>> practice it is likely not that big of a problem, as this will only
> >>>>>>> happen when XDP is enabled/disabled; changing programs does not
> >>>>>>> require the reset. There is some risk that the driver may either
> >>>>>>> have an allocation failure or for some reason fail to correctly
> >>>>>>> negotiate with the underlying backend; in this case the driver will
> >>>>>>> be left uninitialized. I have never seen this happen on my test
> >>>>>>> systems and, for what it's worth, this same failure case can occur
> >>>>>>> from probe and other contexts in the virtio framework.
> >>>>>>>
> >>>>>>> Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
> >>>>>> I've been thinking about it - can't we drop
> >>>>>> old buffers without the head room which were posted before
> >>>>>> xdp attached?
> >>>>>>
> >>>>>> Avoiding the reset would be much nicer.
> >>>>>>
> >>>>>> Thoughts?
> >>>>>>
> >>>>> As discussed before, the device may use them at the same time, so it's not
> >>>>> safe. Or do you mean detect them after XDP is set and drop the buffers
> >>>>> without headroom? That looks sub-optimal.
> >>>>>
> >>>>> Thanks
> >>>> Yes, this is what I mean.  Why is this suboptimal? It's a single branch
> >>>> in code. Yes we might lose some packets but the big hammer of device
> >>>> reset will likely lose more.
> >>> Why not let the hardware receive into the 'small' buffer (without
> >>> headroom) and do a copy when a frame is received?
> >>> Replace the buffers with 'big' ones for the next receive.
> >>> A data copy on a ring full of buffers won't really be noticed.
> >>>
> >>>     David
> >>>
> >> I like that. John?
> >>
> > 
> > This works; I'd prefer it only if it results in simpler code than the reset does (but I suspect it won't).
> > 
> > Thanks
> 
> Before the reset path I looked at doing this, but it seems to require tracking
> whether a buffer has headroom on a per-buffer basis, and I don't see a good
> spot to put a bit like this. It could go in the inbuf 'ctx' added by
> virtqueue_add_inbuf, but I would need to change the current usage of ctx,
> which in the mergeable case at least is just a simple pointer today. I don't
> like this because it complicates both the normal path and the XDP hotpath.
> 
> Otherwise we could somehow mark the ring at the point where XDP is enabled so
> that it can learn when a full iteration around the ring has completed. But I
> can't see a simple way to make this work either.
> 
> I don't know; the reset looks straightforward to me, and although not ideal
> it is fairly common in hardware-based drivers during configuration changes.
> I'm open to any ideas on where to put the metadata to track headroom though.
> 
> Thanks,
> John


Well with 4K pages we actually have 4 spare bits to use.
In fact this means we could reduce the mergeable buffer alignment.
It starts getting strange with 64K pages where we are
out of space, and I just noticed that with bigger pages
virtio is actually broken.

So let me fix it up first of all, and on top - maybe we can just
increase the alignment for 64k pages and up?
Truesize alignment to 512 is still reasonable and presumably
these 64k page boxes have lots of memory.
Would it make sense to tweak SK_RMEM_MAX up for larger
page sizes?
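
[For readers following the spare-bits discussion: the driver packs truesize
into the low bits of the MERGEABLE_BUFFER_ALIGN-aligned buffer pointer. A
standalone userspace sketch of the encode/decode, with helper names shortened
from the driver's mergeable_buf_to_ctx() and friends. With 4K pages,
truesize <= 4096 means size - 1 <= 15, so only 4 of the 8 low bits are
actually used, which is where the four spare bits come from.]

```c
#include <assert.h>
#include <stdlib.h>

#define MERGEABLE_BUFFER_ALIGN 256 /* low 8 bits of the pointer are free */

/* Encode: the buffer address is ALIGN-aligned, so its low bits can carry
 * (truesize / ALIGN) - 1.  Mirrors mergeable_buf_to_ctx(). */
static unsigned long buf_to_ctx(void *buf, unsigned int truesize)
{
	unsigned int size = truesize / MERGEABLE_BUFFER_ALIGN;

	return (unsigned long)buf | (size - 1);
}

/* Decode the buffer address: mask off the size bits. */
static void *ctx_to_buf(unsigned long ctx)
{
	return (void *)(ctx & ~(unsigned long)(MERGEABLE_BUFFER_ALIGN - 1));
}

/* Decode truesize: reverse the (size - 1) encoding above. */
static unsigned int ctx_to_truesize(unsigned long ctx)
{
	return ((ctx & (MERGEABLE_BUFFER_ALIGN - 1)) + 1) *
	       MERGEABLE_BUFFER_ALIGN;
}
```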

-- 
MST


* Re: [net PATCH v5 6/6] virtio_net: XDP support for adjust_head
  2017-01-17 22:22 ` [net PATCH v5 6/6] virtio_net: XDP support for adjust_head John Fastabend
  2017-01-18  3:35   ` Jason Wang
  2017-01-18 15:15   ` Michael S. Tsirkin
@ 2017-01-23 19:22   ` Michael S. Tsirkin
  2017-01-23 20:09     ` Michael S. Tsirkin
  2 siblings, 1 reply; 40+ messages in thread
From: Michael S. Tsirkin @ 2017-01-23 19:22 UTC (permalink / raw)
  To: John Fastabend
  Cc: jasowang, john.r.fastabend, netdev, alexei.starovoitov, daniel

On Tue, Jan 17, 2017 at 02:22:59PM -0800, John Fastabend wrote:
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index 62dbf4b..3b129b4 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -41,6 +41,9 @@
>  #define GOOD_PACKET_LEN (ETH_HLEN + VLAN_HLEN + ETH_DATA_LEN)
>  #define GOOD_COPY_LEN	128
>  
> +/* Amount of XDP headroom to prepend to packets for use by xdp_adjust_head */
> +#define VIRTIO_XDP_HEADROOM 256
> +
>  /* RX packet size EWMA. The average packet size is used to determine the packet
>   * buffer size when refilling RX rings. As the entire RX ring may be refilled
>   * at once, the weight is chosen so that the EWMA will be insensitive to short-

I wonder where this number comes from?  It is quite a lot and
means that using XDP_PASS will slow down any sockets on top of it.
Which in turn means people will try to remove XDP when not in use,
causing resets.  E.g. build_skb (which I have a patch to switch to) uses
the much more reasonable NET_SKB_PAD.

-- 
MST


* Re: [net PATCH v5 6/6] virtio_net: XDP support for adjust_head
  2017-01-23 19:22   ` Michael S. Tsirkin
@ 2017-01-23 20:09     ` Michael S. Tsirkin
  2017-01-23 22:12       ` John Fastabend
  0 siblings, 1 reply; 40+ messages in thread
From: Michael S. Tsirkin @ 2017-01-23 20:09 UTC (permalink / raw)
  To: John Fastabend
  Cc: jasowang, john.r.fastabend, netdev, alexei.starovoitov, daniel

On Mon, Jan 23, 2017 at 09:22:36PM +0200, Michael S. Tsirkin wrote:
> On Tue, Jan 17, 2017 at 02:22:59PM -0800, John Fastabend wrote:
> > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > index 62dbf4b..3b129b4 100644
> > --- a/drivers/net/virtio_net.c
> > +++ b/drivers/net/virtio_net.c
> > @@ -41,6 +41,9 @@
> >  #define GOOD_PACKET_LEN (ETH_HLEN + VLAN_HLEN + ETH_DATA_LEN)
> >  #define GOOD_COPY_LEN	128
> >  
> > +/* Amount of XDP headroom to prepend to packets for use by xdp_adjust_head */
> > +#define VIRTIO_XDP_HEADROOM 256
> > +
> >  /* RX packet size EWMA. The average packet size is used to determine the packet
> >   * buffer size when refilling RX rings. As the entire RX ring may be refilled
> >   * at once, the weight is chosen so that the EWMA will be insensitive to short-
> 
> I wonder where this number comes from?  It is quite a lot and
> means that using XDP_PASS will slow down any sockets on top of it.
> Which in turn means people will try to remove XDP when not in use,
> causing resets.  E.g. build_skb (which I have a patch to switch to) uses
> the much more reasonable NET_SKB_PAD.
> 
> -- 
> MST


Let me show you a patch that I've been cooking.  What is missing there
is handling corner cases like e.g.  when ring size is ~4 entries so
using smaller buffers might mean we no longer have enough space to store
a full packet.  So it looks like I have to maintain the skb copy path
for this hardware.

With this patch, standard configuration has NET_SKB_PAD + NET_IP_ALIGN
bytes head padding. Would this be enough for XDP? If yes we do not
need the resets.

Thoughts?

--->

virtio_net: switch to build_skb for mrg_rxbuf

For small packets data copy was observed to
take up about 15% CPU time. Switch to build_skb
and avoid the copy when using mergeable rx buffers.

As a bonus, medium-size skbs that fit in a page will be
completely linear.

Of course, we now need to lower the lower bound on packet size,
to make sure a sane number of skbs fits in rx socket buffer.
By how much? I don't know yet.

It might also be useful to prefetch the packet buffer since
net stack will likely use it soon.

Lightly tested, in particular, I didn't yet test what this
actually does to performance - sending this out for early
feedback/flames.

TODO: it appears that Linux won't handle correctly the case of first
buffer being very small (or consisting exclusively of virtio header).
This is already the case for current code, need to fix.
TODO: might be unfair to the last packet in a fragment as we include
remaining space if any in its truesize.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

----

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index b425fa1..a6b996f 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -38,6 +38,8 @@ module_param(gso, bool, 0444);
 
 /* FIXME: MTU in config. */
 #define GOOD_PACKET_LEN (ETH_HLEN + VLAN_HLEN + ETH_DATA_LEN)
+//#define MIN_PACKET_ALLOC GOOD_PACKET_LEN
+#define MIN_PACKET_ALLOC 128
 #define GOOD_COPY_LEN	128
 
 /* RX packet size EWMA. The average packet size is used to determine the packet
@@ -246,6 +248,9 @@ static void *mergeable_ctx_to_buf_address(unsigned long mrg_ctx)
 static unsigned long mergeable_buf_to_ctx(void *buf, unsigned int truesize)
 {
 	unsigned int size = truesize / MERGEABLE_BUFFER_ALIGN;
+
+	BUG_ON((unsigned long)buf & (MERGEABLE_BUFFER_ALIGN - 1));
+	BUG_ON(size - 1 >= MERGEABLE_BUFFER_ALIGN);
 	return (unsigned long)buf | (size - 1);
 }
 
@@ -354,25 +359,54 @@ static struct sk_buff *receive_big(struct net_device *dev,
 	return NULL;
 }
 
+#define VNET_SKB_PAD (NET_SKB_PAD + NET_IP_ALIGN)
+#define VNET_SKB_BUG (VNET_SKB_PAD < sizeof(struct virtio_net_hdr_mrg_rxbuf))
+#define VNET_SKB_LEN(len) ((len) - sizeof(struct virtio_net_hdr_mrg_rxbuf))
+#define VNET_SKB_OFF VNET_SKB_LEN(VNET_SKB_PAD)
+
+static struct sk_buff *vnet_build_skb(struct virtnet_info *vi,
+				      void *buf,
+				      unsigned int len, unsigned int truesize)
+{
+	struct sk_buff *skb = build_skb(buf, truesize);
+
+	if (!skb)
+		return NULL;
+
+	skb_reserve(skb, VNET_SKB_PAD);
+	skb_put(skb, VNET_SKB_LEN(len));
+
+	return skb;
+}
+
 static struct sk_buff *receive_mergeable(struct net_device *dev,
 					 struct virtnet_info *vi,
 					 struct receive_queue *rq,
 					 unsigned long ctx,
-					 unsigned int len)
+					 unsigned int len,
+					 struct virtio_net_hdr_mrg_rxbuf *hdr)
 {
 	void *buf = mergeable_ctx_to_buf_address(ctx);
-	struct virtio_net_hdr_mrg_rxbuf *hdr = buf;
-	u16 num_buf = virtio16_to_cpu(vi->vdev, hdr->num_buffers);
+	u16 num_buf;
 	struct page *page = virt_to_head_page(buf);
-	int offset = buf - page_address(page);
-	unsigned int truesize = max(len, mergeable_ctx_to_buf_truesize(ctx));
+	unsigned int truesize = mergeable_ctx_to_buf_truesize(ctx);
+	int offset;
+	struct sk_buff *head_skb;
+	struct sk_buff *curr_skb;
+
+	BUG_ON(len > truesize);
 
-	struct sk_buff *head_skb = page_to_skb(vi, rq, page, offset, len,
-					       truesize);
-	struct sk_buff *curr_skb = head_skb;
+	/* copy header: build_skb will overwrite it */
+	memcpy(hdr, buf + VNET_SKB_OFF, sizeof *hdr);
+
+	head_skb = vnet_build_skb(vi, buf, len, truesize);
+	curr_skb = head_skb;
+
+	num_buf = virtio16_to_cpu(vi->vdev, hdr->num_buffers);
 
 	if (unlikely(!curr_skb))
 		goto err_skb;
+
 	while (--num_buf) {
 		int num_skb_frags;
 
@@ -386,7 +420,7 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
 			goto err_buf;
 		}
 
-		buf = mergeable_ctx_to_buf_address(ctx);
+		buf = mergeable_ctx_to_buf_address(ctx) + VNET_SKB_OFF;
 		page = virt_to_head_page(buf);
 
 		num_skb_frags = skb_shinfo(curr_skb)->nr_frags;
@@ -403,7 +437,8 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
 			head_skb->truesize += nskb->truesize;
 			num_skb_frags = 0;
 		}
-		truesize = max(len, mergeable_ctx_to_buf_truesize(ctx));
+		truesize = mergeable_ctx_to_buf_truesize(ctx);
+		BUG_ON(len > truesize);
 		if (curr_skb != head_skb) {
 			head_skb->data_len += len;
 			head_skb->len += len;
@@ -449,6 +484,7 @@ static void receive_buf(struct virtnet_info *vi, struct receive_queue *rq,
 	struct virtnet_stats *stats = this_cpu_ptr(vi->stats);
 	struct sk_buff *skb;
 	struct virtio_net_hdr_mrg_rxbuf *hdr;
+	struct virtio_net_hdr_mrg_rxbuf hdr0;
 
 	if (unlikely(len < vi->hdr_len + ETH_HLEN)) {
 		pr_debug("%s: short packet %i\n", dev->name, len);
@@ -465,17 +501,24 @@ static void receive_buf(struct virtnet_info *vi, struct receive_queue *rq,
 		return;
 	}
 
-	if (vi->mergeable_rx_bufs)
-		skb = receive_mergeable(dev, vi, rq, (unsigned long)buf, len);
-	else if (vi->big_packets)
+	if (vi->mergeable_rx_bufs) {
+		skb = receive_mergeable(dev, vi, rq, (unsigned long)buf, len,
+					&hdr0);
+		if (unlikely(!skb))
+			return;
+		hdr = &hdr0;
+	} else if (vi->big_packets) {
 		skb = receive_big(dev, vi, rq, buf, len);
-	else
+		if (unlikely(!skb))
+			return;
+		hdr = skb_vnet_hdr(skb);
+	} else {
 		skb = receive_small(vi, buf, len);
+		if (unlikely(!skb))
+			return;
+		hdr = skb_vnet_hdr(skb);
+	}
 
-	if (unlikely(!skb))
-		return;
-
-	hdr = skb_vnet_hdr(skb);
 
 	u64_stats_update_begin(&stats->rx_syncp);
 	stats->rx_bytes += skb->len;
@@ -581,11 +624,14 @@ static int add_recvbuf_big(struct virtnet_info *vi, struct receive_queue *rq,
 
 static unsigned int get_mergeable_buf_len(struct ewma_pkt_len *avg_pkt_len)
 {
-	const size_t hdr_len = sizeof(struct virtio_net_hdr_mrg_rxbuf);
+	unsigned int hdr;
 	unsigned int len;
 
-	len = hdr_len + clamp_t(unsigned int, ewma_pkt_len_read(avg_pkt_len),
-			GOOD_PACKET_LEN, PAGE_SIZE - hdr_len);
+	hdr = ALIGN(VNET_SKB_PAD + SKB_DATA_ALIGN(sizeof(struct skb_shared_info)),
+		    MERGEABLE_BUFFER_ALIGN);
+
+	len = hdr + clamp_t(unsigned int, ewma_pkt_len_read(avg_pkt_len),
+			    MIN_PACKET_ALLOC, PAGE_SIZE - hdr);
 	return ALIGN(len, MERGEABLE_BUFFER_ALIGN);
 }
 
@@ -601,8 +647,11 @@ static int add_recvbuf_mergeable(struct receive_queue *rq, gfp_t gfp)
 	if (unlikely(!skb_page_frag_refill(len, alloc_frag, gfp)))
 		return -ENOMEM;
 
-	buf = (char *)page_address(alloc_frag->page) + alloc_frag->offset;
-	ctx = mergeable_buf_to_ctx(buf, len);
+	BUILD_BUG_ON(VNET_SKB_BUG);
+
+	buf = (char *)page_address(alloc_frag->page) + alloc_frag->offset +
+		VNET_SKB_OFF;
+	//ctx = mergeable_buf_to_ctx(buf - VNET_SKB_OFF, len);
 	get_page(alloc_frag->page);
 	alloc_frag->offset += len;
 	hole = alloc_frag->size - alloc_frag->offset;
@@ -615,8 +664,10 @@ static int add_recvbuf_mergeable(struct receive_queue *rq, gfp_t gfp)
 		len += hole;
 		alloc_frag->offset += hole;
 	}
+	ctx = mergeable_buf_to_ctx(buf - VNET_SKB_OFF, len);
 
-	sg_init_one(rq->sg, buf, len);
+	sg_init_one(rq->sg, buf,
+		    len - VNET_SKB_OFF - SKB_DATA_ALIGN(sizeof(struct skb_shared_info)));
 	err = virtqueue_add_inbuf(rq->vq, rq->sg, 1, (void *)ctx, gfp);
 	if (err < 0)
 		put_page(virt_to_head_page(buf));
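
[As a sanity check of the sizing logic in get_mergeable_buf_len() above, a
userspace rendering of the same arithmetic. The NET_SKB_PAD + NET_IP_ALIGN
value and the skb_shared_info size are illustrative assumptions for a
4K-page x86-64 build, not taken from the patch.]

```c
#include <assert.h>

#define MERGEABLE_BUFFER_ALIGN 256
#define PAGE_SIZE 4096
#define MIN_PACKET_ALLOC 128
/* Illustrative values for a 4K-page x86-64 build:
 * NET_SKB_PAD + NET_IP_ALIGN = 64,
 * SKB_DATA_ALIGN(sizeof(struct skb_shared_info)) approximated as 320. */
#define VNET_SKB_PAD 64
#define SHINFO_SZ    320

#define ALIGN_UP(x, a) (((x) + (a) - 1) & ~((a) - 1))

static unsigned int clamp_uint(unsigned int v, unsigned int lo,
			       unsigned int hi)
{
	return v < lo ? lo : (v > hi ? hi : v);
}

/* Mirrors the patch's get_mergeable_buf_len(): reserve head padding plus
 * room for skb_shared_info, clamp the payload to the EWMA estimate, and
 * round the whole allocation to the mergeable-buffer alignment. */
static unsigned int mergeable_buf_len(unsigned int avg_pkt_len)
{
	unsigned int hdr = ALIGN_UP(VNET_SKB_PAD + SHINFO_SZ,
				    MERGEABLE_BUFFER_ALIGN);
	unsigned int len = hdr + clamp_uint(avg_pkt_len, MIN_PACKET_ALLOC,
					    PAGE_SIZE - hdr);

	return ALIGN_UP(len, MERGEABLE_BUFFER_ALIGN);
}
```

Under these assumptions hdr rounds up to 512, so tiny average packets still
allocate 768B per buffer and large averages saturate at PAGE_SIZE.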


* Re: [net PATCH v5 1/6] virtio_net: use dev_kfree_skb for small buffer XDP receive
  2017-01-17 22:19 ` [net PATCH v5 1/6] virtio_net: use dev_kfree_skb for small buffer XDP receive John Fastabend
  2017-01-18 15:48   ` Michael S. Tsirkin
@ 2017-01-23 21:08   ` Michael S. Tsirkin
  2017-01-23 21:57     ` John Fastabend
  2017-01-24 19:43     ` David Miller
  1 sibling, 2 replies; 40+ messages in thread
From: Michael S. Tsirkin @ 2017-01-23 21:08 UTC (permalink / raw)
  To: John Fastabend
  Cc: jasowang, john.r.fastabend, netdev, alexei.starovoitov, daniel

On Tue, Jan 17, 2017 at 02:19:50PM -0800, John Fastabend wrote:
> In the small buffer case during driver unload we currently use
> put_page instead of dev_kfree_skb. Resolve this by adding a check
> for virtnet mode when checking XDP queue type. Also rename the
> function so that the code reads correctly with the additional
> check.
> 
> Fixes: bb91accf2733 ("virtio-net: XDP support for small buffers")
> Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
> Acked-by: Jason Wang <jasowang@redhat.com>

Acked-by: Michael S. Tsirkin <mst@redhat.com>

I think we definitely want this one in -net as it's
a bugfix.

> ---
>  drivers/net/virtio_net.c |    8 ++++++--
>  1 file changed, 6 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index 4a10500..d97bb71 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -1890,8 +1890,12 @@ static void free_receive_page_frags(struct virtnet_info *vi)
>  			put_page(vi->rq[i].alloc_frag.page);
>  }
>  
> -static bool is_xdp_queue(struct virtnet_info *vi, int q)
> +static bool is_xdp_raw_buffer_queue(struct virtnet_info *vi, int q)
>  {
> +	/* For small receive mode always use kfree_skb variants */
> +	if (!vi->mergeable_rx_bufs)
> +		return false;
> +
>  	if (q < (vi->curr_queue_pairs - vi->xdp_queue_pairs))
>  		return false;
>  	else if (q < vi->curr_queue_pairs)
> @@ -1908,7 +1912,7 @@ static void free_unused_bufs(struct virtnet_info *vi)
>  	for (i = 0; i < vi->max_queue_pairs; i++) {
>  		struct virtqueue *vq = vi->sq[i].vq;
>  		while ((buf = virtqueue_detach_unused_buf(vq)) != NULL) {
> -			if (!is_xdp_queue(vi, i))
> +			if (!is_xdp_raw_buffer_queue(vi, i))
>  				dev_kfree_skb(buf);
>  			else
>  				put_page(virt_to_head_page(buf));
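
[The queue classification in this patch is easy to check in isolation. A
userspace sketch with a cut-down stand-in for struct virtnet_info; the
struct name and field subset here are illustrative.]

```c
#include <assert.h>
#include <stdbool.h>

/* Minimal stand-in for the fields the check needs. */
struct vi_cfg {
	bool mergeable_rx_bufs;
	int curr_queue_pairs;
	int xdp_queue_pairs;
};

/* Mirrors the patched is_xdp_raw_buffer_queue(): only the last
 * xdp_queue_pairs of the active queues hold raw XDP buffers, and in
 * small-receive mode every queue still holds skbs (the fix). */
static bool is_xdp_raw_buffer_queue(const struct vi_cfg *vi, int q)
{
	/* For small receive mode always use kfree_skb variants */
	if (!vi->mergeable_rx_bufs)
		return false;

	if (q < (vi->curr_queue_pairs - vi->xdp_queue_pairs))
		return false;
	else if (q < vi->curr_queue_pairs)
		return true;
	else
		return false;
}
```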


* Re: [net PATCH v5 1/6] virtio_net: use dev_kfree_skb for small buffer XDP receive
  2017-01-23 21:08   ` Michael S. Tsirkin
@ 2017-01-23 21:57     ` John Fastabend
  2017-01-24 19:43     ` David Miller
  1 sibling, 0 replies; 40+ messages in thread
From: John Fastabend @ 2017-01-23 21:57 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: jasowang, john.r.fastabend, netdev, alexei.starovoitov, daniel

On 17-01-23 01:08 PM, Michael S. Tsirkin wrote:
> On Tue, Jan 17, 2017 at 02:19:50PM -0800, John Fastabend wrote:
>> In the small buffer case during driver unload we currently use
>> put_page instead of dev_kfree_skb. Resolve this by adding a check
>> for virtnet mode when checking XDP queue type. Also rename the
>> function so that the code reads correctly with the additional
>> check.
>>
>> Fixes: bb91accf2733 ("virtio-net: XDP support for small buffers")
>> Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
>> Acked-by: Jason Wang <jasowang@redhat.com>
> 
> Acked-by: Michael S. Tsirkin <mst@redhat.com>
> 
> I think we definitely want this one in -net as it's
> a bugfix.
> 

Agreed, let me pull this fix out of the series and submit it for
net.


* Re: [net PATCH v5 6/6] virtio_net: XDP support for adjust_head
  2017-01-23 20:09     ` Michael S. Tsirkin
@ 2017-01-23 22:12       ` John Fastabend
  2017-01-23 22:28         ` Michael S. Tsirkin
  0 siblings, 1 reply; 40+ messages in thread
From: John Fastabend @ 2017-01-23 22:12 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: jasowang, john.r.fastabend, netdev, alexei.starovoitov, daniel

On 17-01-23 12:09 PM, Michael S. Tsirkin wrote:
> On Mon, Jan 23, 2017 at 09:22:36PM +0200, Michael S. Tsirkin wrote:
>> On Tue, Jan 17, 2017 at 02:22:59PM -0800, John Fastabend wrote:
>>> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
>>> index 62dbf4b..3b129b4 100644
>>> --- a/drivers/net/virtio_net.c
>>> +++ b/drivers/net/virtio_net.c
>>> @@ -41,6 +41,9 @@
>>>  #define GOOD_PACKET_LEN (ETH_HLEN + VLAN_HLEN + ETH_DATA_LEN)
>>>  #define GOOD_COPY_LEN	128
>>>  
>>> +/* Amount of XDP headroom to prepend to packets for use by xdp_adjust_head */
>>> +#define VIRTIO_XDP_HEADROOM 256
>>> +
>>>  /* RX packet size EWMA. The average packet size is used to determine the packet
>>>   * buffer size when refilling RX rings. As the entire RX ring may be refilled
>>>   * at once, the weight is chosen so that the EWMA will be insensitive to short-
>>
>> I wonder where this number comes from?  It is quite a lot and
>> means that using XDP_PASS will slow down any sockets on top of it.
>> Which in turn means people will try to remove XDP when not in use,
>> causing resets.  E.g. build_skb (which I have a patch to switch to) uses
>> the much more reasonable NET_SKB_PAD.

I just used the value Alexei (or someone?) came up with. I think it needs to
be large enough to avoid a copy in header-encap cases. So at minimum

  VXLAN_HDR + OUTER_UDP + OUTER_IPV6_HDR + OUTER_MAC =
     8      +     8     +        40      +      14   =  70

The choice of VXLAN hdr was sort of arbitrary but seems good for estimates. For
what its worth there is also a ndo_set_rx_headroom could we use that to set it
and choose a reasonable default.

>>
>> -- 
>> MST
> 
> 
> Let me show you a patch that I've been cooking.  What is missing there
> is handling corner cases like e.g.  when ring size is ~4 entries so
> using smaller buffers might mean we no longer have enough space to store
> a full packet.  So it looks like I have to maintain the skb copy path
> for this hardware.
> 
> With this patch, standard configuration has NET_SKB_PAD + NET_IP_ALIGN
> bytes head padding. Would this be enough for XDP? If yes we do not
> need the resets.

Based on the above, that seems a bit small (L1_CACHE_BYTES + 2)? How tricky
would it be to add support for ndo_set_rx_headroom?

> 
> Thoughts?

I'll take a look at the patch this afternoon. Thanks.

> 
> --->
> 
> virtio_net: switch to build_skb for mrg_rxbuf
> 
> For small packets data copy was observed to
> take up about 15% CPU time. Switch to build_skb
> and avoid the copy when using mergeable rx buffers.
> 
> As a bonus, medium-size skbs that fit in a page will be
> completely linear.
> 
> Of course, we now need to lower the lower bound on packet size,
> to make sure a sane number of skbs fits in rx socket buffer.
> By how much? I don't know yet.
> 
> It might also be useful to prefetch the packet buffer since
> net stack will likely use it soon.
> 
> Lightly tested, in particular, I didn't yet test what this
> actually does to performance - sending this out for early
> feedback/flames.
> 
> TODO: it appears that Linux won't handle correctly the case of first
> buffer being very small (or consisting exclusively of virtio header).
> This is already the case for current code, need to fix.
> TODO: might be unfair to the last packet in a fragment as we include
> remaining space if any in its truesize.
> 
> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> 
> ----
> 
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index b425fa1..a6b996f 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -38,6 +38,8 @@ module_param(gso, bool, 0444);
>  
>  /* FIXME: MTU in config. */
>  #define GOOD_PACKET_LEN (ETH_HLEN + VLAN_HLEN + ETH_DATA_LEN)
> +//#define MIN_PACKET_ALLOC GOOD_PACKET_LEN
> +#define MIN_PACKET_ALLOC 128
>  #define GOOD_COPY_LEN	128
>  
>  /* RX packet size EWMA. The average packet size is used to determine the packet
> @@ -246,6 +248,9 @@ static void *mergeable_ctx_to_buf_address(unsigned long mrg_ctx)
>  static unsigned long mergeable_buf_to_ctx(void *buf, unsigned int truesize)
>  {
>  	unsigned int size = truesize / MERGEABLE_BUFFER_ALIGN;
> +
> +	BUG_ON((unsigned long)buf & (MERGEABLE_BUFFER_ALIGN - 1));
> +	BUG_ON(size - 1 >= MERGEABLE_BUFFER_ALIGN);
>  	return (unsigned long)buf | (size - 1);
>  }
>  
> @@ -354,25 +359,54 @@ static struct sk_buff *receive_big(struct net_device *dev,
>  	return NULL;
>  }
>  
> +#define VNET_SKB_PAD (NET_SKB_PAD + NET_IP_ALIGN)
> +#define VNET_SKB_BUG (VNET_SKB_PAD < sizeof(struct virtio_net_hdr_mrg_rxbuf))
> +#define VNET_SKB_LEN(len) ((len) - sizeof(struct virtio_net_hdr_mrg_rxbuf))
> +#define VNET_SKB_OFF VNET_SKB_LEN(VNET_SKB_PAD)
> +
> +static struct sk_buff *vnet_build_skb(struct virtnet_info *vi,
> +				      void *buf,
> +				      unsigned int len, unsigned int truesize)
> +{
> +	struct sk_buff *skb = build_skb(buf, truesize);
> +
> +	if (!skb)
> +		return NULL;
> +
> +	skb_reserve(skb, VNET_SKB_PAD);
> +	skb_put(skb, VNET_SKB_LEN(len));
> +
> +	return skb;
> +}
> +
>  static struct sk_buff *receive_mergeable(struct net_device *dev,
>  					 struct virtnet_info *vi,
>  					 struct receive_queue *rq,
>  					 unsigned long ctx,
> -					 unsigned int len)
> +					 unsigned int len,
> +					 struct virtio_net_hdr_mrg_rxbuf *hdr)
>  {
>  	void *buf = mergeable_ctx_to_buf_address(ctx);
> -	struct virtio_net_hdr_mrg_rxbuf *hdr = buf;
> -	u16 num_buf = virtio16_to_cpu(vi->vdev, hdr->num_buffers);
> +	u16 num_buf;
>  	struct page *page = virt_to_head_page(buf);
> -	int offset = buf - page_address(page);
> -	unsigned int truesize = max(len, mergeable_ctx_to_buf_truesize(ctx));
> +	unsigned int truesize = mergeable_ctx_to_buf_truesize(ctx);
> +	int offset;
> +	struct sk_buff *head_skb;
> +	struct sk_buff *curr_skb;
> +
> +	BUG_ON(len > truesize);
>  
> -	struct sk_buff *head_skb = page_to_skb(vi, rq, page, offset, len,
> -					       truesize);
> -	struct sk_buff *curr_skb = head_skb;
> +	/* copy header: build_skb will overwrite it */
> +	memcpy(hdr, buf + VNET_SKB_OFF, sizeof *hdr);
> +
> +	head_skb = vnet_build_skb(vi, buf, len, truesize);
> +	curr_skb = head_skb;
> +
> +	num_buf = virtio16_to_cpu(vi->vdev, hdr->num_buffers);
>  
>  	if (unlikely(!curr_skb))
>  		goto err_skb;
> +
>  	while (--num_buf) {
>  		int num_skb_frags;
>  
> @@ -386,7 +420,7 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
>  			goto err_buf;
>  		}
>  
> -		buf = mergeable_ctx_to_buf_address(ctx);
> +		buf = mergeable_ctx_to_buf_address(ctx) + VNET_SKB_OFF;
>  		page = virt_to_head_page(buf);
>  
>  		num_skb_frags = skb_shinfo(curr_skb)->nr_frags;
> @@ -403,7 +437,8 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
>  			head_skb->truesize += nskb->truesize;
>  			num_skb_frags = 0;
>  		}
> -		truesize = max(len, mergeable_ctx_to_buf_truesize(ctx));
> +		truesize = mergeable_ctx_to_buf_truesize(ctx);
> +		BUG_ON(len > truesize);
>  		if (curr_skb != head_skb) {
>  			head_skb->data_len += len;
>  			head_skb->len += len;
> @@ -449,6 +484,7 @@ static void receive_buf(struct virtnet_info *vi, struct receive_queue *rq,
>  	struct virtnet_stats *stats = this_cpu_ptr(vi->stats);
>  	struct sk_buff *skb;
>  	struct virtio_net_hdr_mrg_rxbuf *hdr;
> +	struct virtio_net_hdr_mrg_rxbuf hdr0;
>  
>  	if (unlikely(len < vi->hdr_len + ETH_HLEN)) {
>  		pr_debug("%s: short packet %i\n", dev->name, len);
> @@ -465,17 +501,24 @@ static void receive_buf(struct virtnet_info *vi, struct receive_queue *rq,
>  		return;
>  	}
>  
> -	if (vi->mergeable_rx_bufs)
> -		skb = receive_mergeable(dev, vi, rq, (unsigned long)buf, len);
> -	else if (vi->big_packets)
> +	if (vi->mergeable_rx_bufs) {
> +		skb = receive_mergeable(dev, vi, rq, (unsigned long)buf, len,
> +					&hdr0);
> +		if (unlikely(!skb))
> +			return;
> +		hdr = &hdr0;
> +	} else if (vi->big_packets) {
>  		skb = receive_big(dev, vi, rq, buf, len);
> -	else
> +		if (unlikely(!skb))
> +			return;
> +		hdr = skb_vnet_hdr(skb);
> +	} else {
>  		skb = receive_small(vi, buf, len);
> +		if (unlikely(!skb))
> +			return;
> +		hdr = skb_vnet_hdr(skb);
> +	}
>  
> -	if (unlikely(!skb))
> -		return;
> -
> -	hdr = skb_vnet_hdr(skb);
>  
>  	u64_stats_update_begin(&stats->rx_syncp);
>  	stats->rx_bytes += skb->len;
> @@ -581,11 +624,14 @@ static int add_recvbuf_big(struct virtnet_info *vi, struct receive_queue *rq,
>  
>  static unsigned int get_mergeable_buf_len(struct ewma_pkt_len *avg_pkt_len)
>  {
> -	const size_t hdr_len = sizeof(struct virtio_net_hdr_mrg_rxbuf);
> +	unsigned int hdr;
>  	unsigned int len;
>  
> -	len = hdr_len + clamp_t(unsigned int, ewma_pkt_len_read(avg_pkt_len),
> -			GOOD_PACKET_LEN, PAGE_SIZE - hdr_len);
> +	hdr = ALIGN(VNET_SKB_PAD + SKB_DATA_ALIGN(sizeof(struct skb_shared_info)),
> +		    MERGEABLE_BUFFER_ALIGN);
> +
> +	len = hdr + clamp_t(unsigned int, ewma_pkt_len_read(avg_pkt_len),
> +			    MIN_PACKET_ALLOC, PAGE_SIZE - hdr);
>  	return ALIGN(len, MERGEABLE_BUFFER_ALIGN);
>  }
>  
> @@ -601,8 +647,11 @@ static int add_recvbuf_mergeable(struct receive_queue *rq, gfp_t gfp)
>  	if (unlikely(!skb_page_frag_refill(len, alloc_frag, gfp)))
>  		return -ENOMEM;
>  
> -	buf = (char *)page_address(alloc_frag->page) + alloc_frag->offset;
> -	ctx = mergeable_buf_to_ctx(buf, len);
> +	BUILD_BUG_ON(VNET_SKB_BUG);
> +
> +	buf = (char *)page_address(alloc_frag->page) + alloc_frag->offset +
> +		VNET_SKB_OFF;
> +	//ctx = mergeable_buf_to_ctx(buf - VNET_SKB_OFF, len);
>  	get_page(alloc_frag->page);
>  	alloc_frag->offset += len;
>  	hole = alloc_frag->size - alloc_frag->offset;
> @@ -615,8 +664,10 @@ static int add_recvbuf_mergeable(struct receive_queue *rq, gfp_t gfp)
>  		len += hole;
>  		alloc_frag->offset += hole;
>  	}
> +	ctx = mergeable_buf_to_ctx(buf - VNET_SKB_OFF, len);
>  
> -	sg_init_one(rq->sg, buf, len);
> +	sg_init_one(rq->sg, buf,
> +		    len - VNET_SKB_OFF - SKB_DATA_ALIGN(sizeof(struct skb_shared_info)));
>  	err = virtqueue_add_inbuf(rq->vq, rq->sg, 1, (void *)ctx, gfp);
>  	if (err < 0)
>  		put_page(virt_to_head_page(buf));
> 

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [net PATCH v5 6/6] virtio_net: XDP support for adjust_head
  2017-01-23 22:12       ` John Fastabend
@ 2017-01-23 22:28         ` Michael S. Tsirkin
  0 siblings, 0 replies; 40+ messages in thread
From: Michael S. Tsirkin @ 2017-01-23 22:28 UTC (permalink / raw)
  To: John Fastabend
  Cc: jasowang, john.r.fastabend, netdev, alexei.starovoitov, daniel

On Mon, Jan 23, 2017 at 02:12:47PM -0800, John Fastabend wrote:
> On 17-01-23 12:09 PM, Michael S. Tsirkin wrote:
> > On Mon, Jan 23, 2017 at 09:22:36PM +0200, Michael S. Tsirkin wrote:
> >> On Tue, Jan 17, 2017 at 02:22:59PM -0800, John Fastabend wrote:
> >>> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> >>> index 62dbf4b..3b129b4 100644
> >>> --- a/drivers/net/virtio_net.c
> >>> +++ b/drivers/net/virtio_net.c
> >>> @@ -41,6 +41,9 @@
> >>>  #define GOOD_PACKET_LEN (ETH_HLEN + VLAN_HLEN + ETH_DATA_LEN)
> >>>  #define GOOD_COPY_LEN	128
> >>>  
> >>> +/* Amount of XDP headroom to prepend to packets for use by xdp_adjust_head */
> >>> +#define VIRTIO_XDP_HEADROOM 256
> >>> +
> >>>  /* RX packet size EWMA. The average packet size is used to determine the packet
> >>>   * buffer size when refilling RX rings. As the entire RX ring may be refilled
> >>>   * at once, the weight is chosen so that the EWMA will be insensitive to short-
> >>
> >> I wonder where does this number come from?  This is quite a lot and
> >> means that using XDP_PASS will slow down any sockets on top of it.
> >> Which in turn means people will try to remove XDP when not in use,
> >> causing resets.  E.g. build_skb (which I have a patch to switch to) uses
> >> a much more reasonable NET_SKB_PAD.
> 
> I just used the value Alexei (or someone?) came up with. I think it needs to be
> large enough to avoid a copy in the header encap cases. So at minimum
> 
>   VXLAN_HDR + OUTER_UDP + OUTER_IPV6_HDR + OUTER_MAC =
>      8      +     8     +        40      +      14   =  70
> 
> The choice of VXLAN hdr was sort of arbitrary but seems good for estimates. For
> what it's worth, there is also ndo_set_rx_headroom; could we use that to set it
> and choose a reasonable default?
> 
> >>
> >> -- 
> >> MST
> > 
> > 
> > Let me show you a patch that I've been cooking.  What is missing there
> > is handling corner cases like e.g.  when ring size is ~4 entries so
> > using smaller buffers might mean we no longer have enough space to store
> > a full packet.  So it looks like I have to maintain the skb copy path
> > for this hardware.
> > 
> > With this patch, standard configuration has NET_SKB_PAD + NET_IP_ALIGN
> > bytes head padding. Would this be enough for XDP? If yes we do not
> > need the resets.
> 
> Based on the above, that seems a bit small (L1_CACHE_BYTES + 2)? How tricky
> would it be to add support for ndo_set_rx_headroom?

Dunno, but then what? Expose it to userspace and let the admin
make the decision for us?


> > 
> > Thoughts?
> 
> I'll take a look at the patch this afternoon. Thanks.
> 
> > 
> > --->
> > 
> > virtio_net: switch to build_skb for mrg_rxbuf
> > 
> > For small packets data copy was observed to
> > take up about 15% CPU time. Switch to build_skb
> > and avoid the copy when using mergeable rx buffers.
> > 
> > As a bonus, medium-size skbs that fit in a page will be
> > completely linear.
> > 
> > Of course, we now need to lower the lower bound on packet size,
> > to make sure a sane number of skbs fits in rx socket buffer.
> > By how much? I don't know yet.
> > 
> > It might also be useful to prefetch the packet buffer since
> > net stack will likely use it soon.
> > 
> > Lightly tested, in particular, I didn't yet test what this
> > actually does to performance - sending this out for early
> > feedback/flames.
> > 
> > TODO: it appears that Linux won't handle correctly the case of first
> > buffer being very small (or consisting exclusively of virtio header).
> > This is already the case for current code, need to fix.
> > TODO: might be unfair to the last packet in a fragment as we include
> > remaining space if any in its truesize.
> > 
> > Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> > 
> > ----
> > 
> > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > index b425fa1..a6b996f 100644
> > --- a/drivers/net/virtio_net.c
> > +++ b/drivers/net/virtio_net.c
> > @@ -38,6 +38,8 @@ module_param(gso, bool, 0444);
> >  
> >  /* FIXME: MTU in config. */
> >  #define GOOD_PACKET_LEN (ETH_HLEN + VLAN_HLEN + ETH_DATA_LEN)
> > +//#define MIN_PACKET_ALLOC GOOD_PACKET_LEN
> > +#define MIN_PACKET_ALLOC 128
> >  #define GOOD_COPY_LEN	128
> >  
> >  /* RX packet size EWMA. The average packet size is used to determine the packet
> > @@ -246,6 +248,9 @@ static void *mergeable_ctx_to_buf_address(unsigned long mrg_ctx)
> >  static unsigned long mergeable_buf_to_ctx(void *buf, unsigned int truesize)
> >  {
> >  	unsigned int size = truesize / MERGEABLE_BUFFER_ALIGN;
> > +
> > +	BUG_ON((unsigned long)buf & (MERGEABLE_BUFFER_ALIGN - 1));
> > +	BUG_ON(size - 1 >= MERGEABLE_BUFFER_ALIGN);
> >  	return (unsigned long)buf | (size - 1);
> >  }
> >  
> > @@ -354,25 +359,54 @@ static struct sk_buff *receive_big(struct net_device *dev,
> >  	return NULL;
> >  }
> >  
> > +#define VNET_SKB_PAD (NET_SKB_PAD + NET_IP_ALIGN)
> > +#define VNET_SKB_BUG (VNET_SKB_PAD < sizeof(struct virtio_net_hdr_mrg_rxbuf))
> > +#define VNET_SKB_LEN(len) ((len) - sizeof(struct virtio_net_hdr_mrg_rxbuf))
> > +#define VNET_SKB_OFF VNET_SKB_LEN(VNET_SKB_PAD)
> > +
> > +static struct sk_buff *vnet_build_skb(struct virtnet_info *vi,
> > +				      void *buf,
> > +				      unsigned int len, unsigned int truesize)
> > +{
> > +	struct sk_buff *skb = build_skb(buf, truesize);
> > +
> > +	if (!skb)
> > +		return NULL;
> > +
> > +	skb_reserve(skb, VNET_SKB_PAD);
> > +	skb_put(skb, VNET_SKB_LEN(len));
> > +
> > +	return skb;
> > +}
> > +
> >  static struct sk_buff *receive_mergeable(struct net_device *dev,
> >  					 struct virtnet_info *vi,
> >  					 struct receive_queue *rq,
> >  					 unsigned long ctx,
> > -					 unsigned int len)
> > +					 unsigned int len,
> > +					 struct virtio_net_hdr_mrg_rxbuf *hdr)
> >  {
> >  	void *buf = mergeable_ctx_to_buf_address(ctx);
> > -	struct virtio_net_hdr_mrg_rxbuf *hdr = buf;
> > -	u16 num_buf = virtio16_to_cpu(vi->vdev, hdr->num_buffers);
> > +	u16 num_buf;
> >  	struct page *page = virt_to_head_page(buf);
> > -	int offset = buf - page_address(page);
> > -	unsigned int truesize = max(len, mergeable_ctx_to_buf_truesize(ctx));
> > +	unsigned int truesize = mergeable_ctx_to_buf_truesize(ctx);
> > +	int offset;
> > +	struct sk_buff *head_skb;
> > +	struct sk_buff *curr_skb;
> > +
> > +	BUG_ON(len > truesize);
> >  
> > -	struct sk_buff *head_skb = page_to_skb(vi, rq, page, offset, len,
> > -					       truesize);
> > -	struct sk_buff *curr_skb = head_skb;
> > +	/* copy header: build_skb will overwrite it */
> > +	memcpy(hdr, buf + VNET_SKB_OFF, sizeof *hdr);
> > +
> > +	head_skb = vnet_build_skb(vi, buf, len, truesize);
> > +	curr_skb = head_skb;
> > +
> > +	num_buf = virtio16_to_cpu(vi->vdev, hdr->num_buffers);
> >  
> >  	if (unlikely(!curr_skb))
> >  		goto err_skb;
> > +
> >  	while (--num_buf) {
> >  		int num_skb_frags;
> >  
> > @@ -386,7 +420,7 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
> >  			goto err_buf;
> >  		}
> >  
> > -		buf = mergeable_ctx_to_buf_address(ctx);
> > +		buf = mergeable_ctx_to_buf_address(ctx) + VNET_SKB_OFF;
> >  		page = virt_to_head_page(buf);
> >  
> >  		num_skb_frags = skb_shinfo(curr_skb)->nr_frags;
> > @@ -403,7 +437,8 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
> >  			head_skb->truesize += nskb->truesize;
> >  			num_skb_frags = 0;
> >  		}
> > -		truesize = max(len, mergeable_ctx_to_buf_truesize(ctx));
> > +		truesize = mergeable_ctx_to_buf_truesize(ctx);
> > +		BUG_ON(len > truesize);
> >  		if (curr_skb != head_skb) {
> >  			head_skb->data_len += len;
> >  			head_skb->len += len;
> > @@ -449,6 +484,7 @@ static void receive_buf(struct virtnet_info *vi, struct receive_queue *rq,
> >  	struct virtnet_stats *stats = this_cpu_ptr(vi->stats);
> >  	struct sk_buff *skb;
> >  	struct virtio_net_hdr_mrg_rxbuf *hdr;
> > +	struct virtio_net_hdr_mrg_rxbuf hdr0;
> >  
> >  	if (unlikely(len < vi->hdr_len + ETH_HLEN)) {
> >  		pr_debug("%s: short packet %i\n", dev->name, len);
> > @@ -465,17 +501,24 @@ static void receive_buf(struct virtnet_info *vi, struct receive_queue *rq,
> >  		return;
> >  	}
> >  
> > -	if (vi->mergeable_rx_bufs)
> > -		skb = receive_mergeable(dev, vi, rq, (unsigned long)buf, len);
> > -	else if (vi->big_packets)
> > +	if (vi->mergeable_rx_bufs) {
> > +		skb = receive_mergeable(dev, vi, rq, (unsigned long)buf, len,
> > +					&hdr0);
> > +		if (unlikely(!skb))
> > +			return;
> > +		hdr = &hdr0;
> > +	} else if (vi->big_packets) {
> >  		skb = receive_big(dev, vi, rq, buf, len);
> > -	else
> > +		if (unlikely(!skb))
> > +			return;
> > +		hdr = skb_vnet_hdr(skb);
> > +	} else {
> >  		skb = receive_small(vi, buf, len);
> > +		if (unlikely(!skb))
> > +			return;
> > +		hdr = skb_vnet_hdr(skb);
> > +	}
> >  
> > -	if (unlikely(!skb))
> > -		return;
> > -
> > -	hdr = skb_vnet_hdr(skb);
> >  
> >  	u64_stats_update_begin(&stats->rx_syncp);
> >  	stats->rx_bytes += skb->len;
> > @@ -581,11 +624,14 @@ static int add_recvbuf_big(struct virtnet_info *vi, struct receive_queue *rq,
> >  
> >  static unsigned int get_mergeable_buf_len(struct ewma_pkt_len *avg_pkt_len)
> >  {
> > -	const size_t hdr_len = sizeof(struct virtio_net_hdr_mrg_rxbuf);
> > +	unsigned int hdr;
> >  	unsigned int len;
> >  
> > -	len = hdr_len + clamp_t(unsigned int, ewma_pkt_len_read(avg_pkt_len),
> > -			GOOD_PACKET_LEN, PAGE_SIZE - hdr_len);
> > +	hdr = ALIGN(VNET_SKB_PAD + SKB_DATA_ALIGN(sizeof(struct skb_shared_info)),
> > +		    MERGEABLE_BUFFER_ALIGN);
> > +
> > +	len = hdr + clamp_t(unsigned int, ewma_pkt_len_read(avg_pkt_len),
> > +			    MIN_PACKET_ALLOC, PAGE_SIZE - hdr);
> >  	return ALIGN(len, MERGEABLE_BUFFER_ALIGN);
> >  }
> >  
> > @@ -601,8 +647,11 @@ static int add_recvbuf_mergeable(struct receive_queue *rq, gfp_t gfp)
> >  	if (unlikely(!skb_page_frag_refill(len, alloc_frag, gfp)))
> >  		return -ENOMEM;
> >  
> > -	buf = (char *)page_address(alloc_frag->page) + alloc_frag->offset;
> > -	ctx = mergeable_buf_to_ctx(buf, len);
> > +	BUILD_BUG_ON(VNET_SKB_BUG);
> > +
> > +	buf = (char *)page_address(alloc_frag->page) + alloc_frag->offset +
> > +		VNET_SKB_OFF;
> > +	//ctx = mergeable_buf_to_ctx(buf - VNET_SKB_OFF, len);
> >  	get_page(alloc_frag->page);
> >  	alloc_frag->offset += len;
> >  	hole = alloc_frag->size - alloc_frag->offset;
> > @@ -615,8 +664,10 @@ static int add_recvbuf_mergeable(struct receive_queue *rq, gfp_t gfp)
> >  		len += hole;
> >  		alloc_frag->offset += hole;
> >  	}
> > +	ctx = mergeable_buf_to_ctx(buf - VNET_SKB_OFF, len);
> >  
> > -	sg_init_one(rq->sg, buf, len);
> > +	sg_init_one(rq->sg, buf,
> > +		    len - VNET_SKB_OFF - SKB_DATA_ALIGN(sizeof(struct skb_shared_info)));
> >  	err = virtqueue_add_inbuf(rq->vq, rq->sg, 1, (void *)ctx, gfp);
> >  	if (err < 0)
> >  		put_page(virt_to_head_page(buf));
> > 

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [net PATCH v5 1/6] virtio_net: use dev_kfree_skb for small buffer XDP receive
  2017-01-23 21:08   ` Michael S. Tsirkin
  2017-01-23 21:57     ` John Fastabend
@ 2017-01-24 19:43     ` David Miller
  2017-01-24 20:08       ` Michael S. Tsirkin
  1 sibling, 1 reply; 40+ messages in thread
From: David Miller @ 2017-01-24 19:43 UTC (permalink / raw)
  To: mst
  Cc: john.fastabend, jasowang, john.r.fastabend, netdev,
	alexei.starovoitov, daniel

From: "Michael S. Tsirkin" <mst@redhat.com>
Date: Mon, 23 Jan 2017 23:08:35 +0200

> On Tue, Jan 17, 2017 at 02:19:50PM -0800, John Fastabend wrote:
>> In the small buffer case during driver unload we currently use
>> put_page instead of dev_kfree_skb. Resolve this by adding a check
>> for virtnet mode when checking XDP queue type. Also name the
>> function so that the code reads correctly to match the additional
>> check.
>> 
>> Fixes: bb91accf2733 ("virtio-net: XDP support for small buffers")
>> Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
>> Acked-by: Jason Wang <jasowang@redhat.com>
> 
> Acked-by: Michael S. Tsirkin <mst@redhat.com>
> 
> I think we definitely want this one in -net as it's
> a bugfix.

This whole series is a bug fix; we must have adjust_header XDP
support in the virtio_net driver before v4.10 goes out, as it is
a required base feature for XDP.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [net PATCH v5 1/6] virtio_net: use dev_kfree_skb for small buffer XDP receive
  2017-01-24 19:43     ` David Miller
@ 2017-01-24 20:08       ` Michael S. Tsirkin
  2017-01-24 20:11         ` David Miller
  2017-01-25  2:57         ` Jason Wang
  0 siblings, 2 replies; 40+ messages in thread
From: Michael S. Tsirkin @ 2017-01-24 20:08 UTC (permalink / raw)
  To: David Miller
  Cc: john.fastabend, jasowang, john.r.fastabend, netdev,
	alexei.starovoitov, daniel

On Tue, Jan 24, 2017 at 02:43:28PM -0500, David Miller wrote:
> From: "Michael S. Tsirkin" <mst@redhat.com>
> Date: Mon, 23 Jan 2017 23:08:35 +0200
> 
> > On Tue, Jan 17, 2017 at 02:19:50PM -0800, John Fastabend wrote:
> >> In the small buffer case during driver unload we currently use
> >> put_page instead of dev_kfree_skb. Resolve this by adding a check
> >> for virtnet mode when checking XDP queue type. Also name the
> >> function so that the code reads correctly to match the additional
> >> check.
> >> 
> >> Fixes: bb91accf2733 ("virtio-net: XDP support for small buffers")
> >> Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
> >> Acked-by: Jason Wang <jasowang@redhat.com>
> > 
> > Acked-by: Michael S. Tsirkin <mst@redhat.com>
> > 
> > I think we definitely want this one in -net as it's
> > a bugfix.
> 
> This whole series is a bug fix, we must have adjust_header XDP
> support in the virtio_net driver before v4.10 goes out, it is
> a required base feature for XDP.

I have to say device resets outside probe have a huge potential
to uncover hypervisor bugs. I am rather uncomfortable
doing that after -rc1.

How about a module option to disable it by default?
We can then ship a partial implementation in 4.10
and work on completing it in 4.11.

-- 
MST

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [net PATCH v5 1/6] virtio_net: use dev_kfree_skb for small buffer XDP receive
  2017-01-24 20:08       ` Michael S. Tsirkin
@ 2017-01-24 20:11         ` David Miller
  2017-01-24 20:54           ` Michael S. Tsirkin
  2017-01-25  2:57         ` Jason Wang
  1 sibling, 1 reply; 40+ messages in thread
From: David Miller @ 2017-01-24 20:11 UTC (permalink / raw)
  To: mst
  Cc: john.fastabend, jasowang, john.r.fastabend, netdev,
	alexei.starovoitov, daniel

From: "Michael S. Tsirkin" <mst@redhat.com>
Date: Tue, 24 Jan 2017 22:08:33 +0200

> On Tue, Jan 24, 2017 at 02:43:28PM -0500, David Miller wrote:
>> From: "Michael S. Tsirkin" <mst@redhat.com>
>> Date: Mon, 23 Jan 2017 23:08:35 +0200
>> 
>> > On Tue, Jan 17, 2017 at 02:19:50PM -0800, John Fastabend wrote:
>> >> In the small buffer case during driver unload we currently use
>> >> put_page instead of dev_kfree_skb. Resolve this by adding a check
>> >> for virtnet mode when checking XDP queue type. Also name the
>> >> function so that the code reads correctly to match the additional
>> >> check.
>> >> 
>> >> Fixes: bb91accf2733 ("virtio-net: XDP support for small buffers")
>> >> Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
>> >> Acked-by: Jason Wang <jasowang@redhat.com>
>> > 
>> > Acked-by: Michael S. Tsirkin <mst@redhat.com>
>> > 
>> > I think we definitely want this one in -net as it's
>> > a bugfix.
>> 
>> This whole series is a bug fix, we must have adjust_header XDP
>> support in the virtio_net driver before v4.10 goes out, it is
>> a required base feature for XDP.
> 
> I have to say device resets outside probe have a huge potential
> to uncover hypervisor bugs. I am rather uncomfortable
> doing that after -rc1.
> 
> How about a module option to disable it by default?
> We can then ship a partial implementation in 4.10
> and work on completing it in 4.11.

XDP programmers must be able to assume a base set of features being
present, adjust_header is one of them.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [net PATCH v5 1/6] virtio_net: use dev_kfree_skb for small buffer XDP receive
  2017-01-24 20:11         ` David Miller
@ 2017-01-24 20:54           ` Michael S. Tsirkin
  0 siblings, 0 replies; 40+ messages in thread
From: Michael S. Tsirkin @ 2017-01-24 20:54 UTC (permalink / raw)
  To: David Miller
  Cc: john.fastabend, jasowang, john.r.fastabend, netdev,
	alexei.starovoitov, daniel

On Tue, Jan 24, 2017 at 03:11:39PM -0500, David Miller wrote:
> From: "Michael S. Tsirkin" <mst@redhat.com>
> Date: Tue, 24 Jan 2017 22:08:33 +0200
> 
> > On Tue, Jan 24, 2017 at 02:43:28PM -0500, David Miller wrote:
> >> From: "Michael S. Tsirkin" <mst@redhat.com>
> >> Date: Mon, 23 Jan 2017 23:08:35 +0200
> >> 
> >> > On Tue, Jan 17, 2017 at 02:19:50PM -0800, John Fastabend wrote:
> >> >> In the small buffer case during driver unload we currently use
> >> >> put_page instead of dev_kfree_skb. Resolve this by adding a check
> >> >> for virtnet mode when checking XDP queue type. Also name the
> >> >> function so that the code reads correctly to match the additional
> >> >> check.
> >> >> 
> >> >> Fixes: bb91accf2733 ("virtio-net: XDP support for small buffers")
> >> >> Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
> >> >> Acked-by: Jason Wang <jasowang@redhat.com>
> >> > 
> >> > Acked-by: Michael S. Tsirkin <mst@redhat.com>
> >> > 
> >> > I think we definitely want this one in -net as it's
> >> > a bugfix.
> >> 
> >> This whole series is a bug fix, we must have adjust_header XDP
> >> support in the virtio_net driver before v4.10 goes out, it is
> >> a required base feature for XDP.
> > 
> > I have to say device resets outside probe have a huge potential
> > to uncover hypervisor bugs. I am rather uncomfortable
> > doing that after -rc1.
> > 
> > How about a module option to disable it by default?
> > We can then ship a partial implementation in 4.10
> > and work on completing it in 4.11.
> 
> XDP programmers must be able to assume a base set of features being
> present, adjust_header is one of them.

Let's make all of XDP depend on this extra_headroom option then?

-- 
MST

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [net PATCH v5 1/6] virtio_net: use dev_kfree_skb for small buffer XDP receive
  2017-01-24 20:08       ` Michael S. Tsirkin
  2017-01-24 20:11         ` David Miller
@ 2017-01-25  2:57         ` Jason Wang
  2017-01-25  3:23           ` Michael S. Tsirkin
  1 sibling, 1 reply; 40+ messages in thread
From: Jason Wang @ 2017-01-25  2:57 UTC (permalink / raw)
  To: Michael S. Tsirkin, David Miller
  Cc: john.fastabend, john.r.fastabend, netdev, alexei.starovoitov, daniel



On 2017-01-25 04:08, Michael S. Tsirkin wrote:
> On Tue, Jan 24, 2017 at 02:43:28PM -0500, David Miller wrote:
>> From: "Michael S. Tsirkin" <mst@redhat.com>
>> Date: Mon, 23 Jan 2017 23:08:35 +0200
>>
>>> On Tue, Jan 17, 2017 at 02:19:50PM -0800, John Fastabend wrote:
>>>> In the small buffer case during driver unload we currently use
>>>> put_page instead of dev_kfree_skb. Resolve this by adding a check
>>>> for virtnet mode when checking XDP queue type. Also name the
>>>> function so that the code reads correctly to match the additional
>>>> check.
>>>>
>>>> Fixes: bb91accf2733 ("virtio-net: XDP support for small buffers")
>>>> Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
>>>> Acked-by: Jason Wang <jasowang@redhat.com>
>>> Acked-by: Michael S. Tsirkin <mst@redhat.com>
>>>
>>> I think we definitely want this one in -net as it's
>>> a bugfix.
>> This whole series is a bug fix, we must have adjust_header XDP
>> support in the virtio_net driver before v4.10 goes out, it is
>> a required base feature for XDP.
> I have to say device resets outside probe have a huge potential
> to uncover hypervisor bugs.

Maybe not, if it reuses most of the current code? We've already used
these code paths for sleep and hibernation.

Thanks

>   I am rather uncomfortable
> doing that after -rc1.
>
> How about a module option to disable it by default?
> We can then ship a partial implementation in 4.10
> and work on completing it in 4.11.
>

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [net PATCH v5 1/6] virtio_net: use dev_kfree_skb for small buffer XDP receive
  2017-01-25  2:57         ` Jason Wang
@ 2017-01-25  3:23           ` Michael S. Tsirkin
  2017-01-25  4:02             ` John Fastabend
  0 siblings, 1 reply; 40+ messages in thread
From: Michael S. Tsirkin @ 2017-01-25  3:23 UTC (permalink / raw)
  To: Jason Wang
  Cc: David Miller, john.fastabend, john.r.fastabend, netdev,
	alexei.starovoitov, daniel

On Wed, Jan 25, 2017 at 10:57:12AM +0800, Jason Wang wrote:
> 
> 
> On 2017-01-25 04:08, Michael S. Tsirkin wrote:
> > On Tue, Jan 24, 2017 at 02:43:28PM -0500, David Miller wrote:
> > > From: "Michael S. Tsirkin" <mst@redhat.com>
> > > Date: Mon, 23 Jan 2017 23:08:35 +0200
> > > 
> > > > On Tue, Jan 17, 2017 at 02:19:50PM -0800, John Fastabend wrote:
> > > > > In the small buffer case during driver unload we currently use
> > > > > put_page instead of dev_kfree_skb. Resolve this by adding a check
> > > > > for virtnet mode when checking XDP queue type. Also name the
> > > > > function so that the code reads correctly to match the additional
> > > > > check.
> > > > > 
> > > > > Fixes: bb91accf2733 ("virtio-net: XDP support for small buffers")
> > > > > Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
> > > > > Acked-by: Jason Wang <jasowang@redhat.com>
> > > > Acked-by: Michael S. Tsirkin <mst@redhat.com>
> > > > 
> > > > I think we definitely want this one in -net as it's
> > > > a bugfix.
> > > This whole series is a bug fix, we must have adjust_header XDP
> > > support in the virtio_net driver before v4.10 goes out, it is
> > > a required base feature for XDP.
> > I have to say device resets outside probe have a huge potential
> > to uncover hypervisor bugs.
> 
> Maybe not, if it reuses most of the current code? We've already used them
> in sleep or hibernation?
> 
> Thanks

Except almost no one uses sleep or hibernate with VMs.  I'm not saying
it's a bad idea, just that it needs a lot of testing before release and
we won't get enough if we merge at this point.

> >   I am rather uncomfortable
> > doing that after -rc1.
> > 
> > How about a module option to disable it by default?
> > We can then ship a partial implementation in 4.10
> > and work on completing it in 4.11.
> > 

To clarify, I'm thinking of an option similar to enable_xdp,
with all packets getting a 256-byte headroom for 4.10.

Then we can consider our options for 4.11.

-- 
MST

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [net PATCH v5 1/6] virtio_net: use dev_kfree_skb for small buffer XDP receive
  2017-01-25  3:23           ` Michael S. Tsirkin
@ 2017-01-25  4:02             ` John Fastabend
  2017-01-25  5:46               ` Jason Wang
  2017-01-25 14:45               ` Michael S. Tsirkin
  0 siblings, 2 replies; 40+ messages in thread
From: John Fastabend @ 2017-01-25  4:02 UTC (permalink / raw)
  To: Michael S. Tsirkin, Jason Wang
  Cc: David Miller, john.r.fastabend, netdev, alexei.starovoitov, daniel

On 17-01-24 07:23 PM, Michael S. Tsirkin wrote:
> On Wed, Jan 25, 2017 at 10:57:12AM +0800, Jason Wang wrote:
>>
>>
>> On 2017-01-25 04:08, Michael S. Tsirkin wrote:
>>> On Tue, Jan 24, 2017 at 02:43:28PM -0500, David Miller wrote:
>>>> From: "Michael S. Tsirkin" <mst@redhat.com>
>>>> Date: Mon, 23 Jan 2017 23:08:35 +0200
>>>>
>>>>> On Tue, Jan 17, 2017 at 02:19:50PM -0800, John Fastabend wrote:
>>>>>> In the small buffer case during driver unload we currently use
>>>>>> put_page instead of dev_kfree_skb. Resolve this by adding a check
>>>>>> for virtnet mode when checking XDP queue type. Also name the
>>>>>> function so that the code reads correctly to match the additional
>>>>>> check.
>>>>>>
>>>>>> Fixes: bb91accf2733 ("virtio-net: XDP support for small buffers")
>>>>>> Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
>>>>>> Acked-by: Jason Wang <jasowang@redhat.com>
>>>>> Acked-by: Michael S. Tsirkin <mst@redhat.com>
>>>>>
>>>>> I think we definitely want this one in -net as it's
>>>>> a bugfix.
>>>> This whole series is a bug fix, we must have adjust_header XDP
>>>> support in the virtio_net driver before v4.10 goes out, it is
>>>> a required base feature for XDP.
>>> I have to say device resets outside probe have a huge potential
>>> to uncover hypervisor bugs.
>>
>> Maybe not if it reuses most of the current code, since we've already used it
>> in sleep or hibernation?
>>
>> Thanks
> 
> Except almost no one uses sleep or hibernate with VMs.  I'm not saying
> it's a bad idea, just that it needs a lot of testing before release and
> we won't get enough if we merge at this point.
> 

Then it would seem like a good thing to have another user of these paths to
find the bugs, versus letting them sit there for the poor folks who do use
sleep/hibernate.

>>>   I am rather uncomfortable
>>> doing that after -rc1.
>>>
>>> How about a module option to disable it by default?
>>> We can then ship a partial implementation in 4.10
>>> and work on completing it in 4.11.
>>>

Ugh, I would prefer to avoid module options. This will only happen if users
push an XDP program into the driver anyway.

> 
> To clarify, I'm thinking of an option similar to enable_xdp,
> and having all packets carry a 256-byte headroom for 4.10.

An option where? On the QEMU side, in the driver? Is the reset really that bad?
Coming from the hardware driver side, lots of configuration changes can cause
resets. I agree it's not overly elegant, but could follow-on patches be used to
make it prettier if possible?

I know folks prefer to avoid tuning knobs, but I think exposing the headroom
configuration to users might not be a bad idea. After all, these same users are
already programming maps and eBPF code. A simple tuning knob should not be a
big deal, and reasonable defaults would of course be used. That is a net-next
debate though.

> 
> Consider our options for 4.11.
> 

Finally, just to point out, here are the drivers with XDP support in the latest
net tree:

	mlx/mlx5
	mlx/mlx4
	qlogic/qede
	netronome/nfp
	virtio_net

And here are the ones with adjust_head support:

	mlx/mlx4

So we currently have the same feature gap in all the other drivers except one,
although I do not think that is a very good excuse. Let's figure out what we
should do about virtio.

Thanks,
John


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [net PATCH v5 1/6] virtio_net: use dev_kfree_skb for small buffer XDP receive
  2017-01-25  4:02             ` John Fastabend
@ 2017-01-25  5:46               ` Jason Wang
  2017-01-25 14:47                 ` Michael S. Tsirkin
  2017-01-25 14:45               ` Michael S. Tsirkin
  1 sibling, 1 reply; 40+ messages in thread
From: Jason Wang @ 2017-01-25  5:46 UTC (permalink / raw)
  To: John Fastabend, Michael S. Tsirkin
  Cc: David Miller, john.r.fastabend, netdev, alexei.starovoitov, daniel



On 2017-01-25 12:02, John Fastabend wrote:
> On 17-01-24 07:23 PM, Michael S. Tsirkin wrote:
>> On Wed, Jan 25, 2017 at 10:57:12AM +0800, Jason Wang wrote:
>>> On 2017-01-25 04:08, Michael S. Tsirkin wrote:
>>>> On Tue, Jan 24, 2017 at 02:43:28PM -0500, David Miller wrote:
>>>>> From: "Michael S. Tsirkin"<mst@redhat.com>
>>>>> Date: Mon, 23 Jan 2017 23:08:35 +0200
>>>>>
>>>>>> On Tue, Jan 17, 2017 at 02:19:50PM -0800, John Fastabend wrote:
>>>>>>> In the small buffer case during driver unload we currently use
>>>>>>> put_page instead of dev_kfree_skb. Resolve this by adding a check
>>>>>>> for virtnet mode when checking XDP queue type. Also name the
>>>>>>> function so that the code reads correctly to match the additional
>>>>>>> check.
>>>>>>>
>>>>>>> Fixes: bb91accf2733 ("virtio-net: XDP support for small buffers")
>>>>>>> Signed-off-by: John Fastabend<john.r.fastabend@intel.com>
>>>>>>> Acked-by: Jason Wang<jasowang@redhat.com>
>>>>>> Acked-by: Michael S. Tsirkin<mst@redhat.com>
>>>>>>
>>>>>> I think we definitely want this one in -net as it's
>>>>>> a bugfix.
>>>>> This whole series is a bug fix, we must have adjust_header XDP
>>>>> support in the virtio_net driver before v4.10 goes out, it is
>>>>> a required base feature for XDP.
>>>> I have to say device resets outside probe have a huge potential
>>>> to uncover hypervisor bugs.
>>> Maybe not if it reuses most of the current code, since we've already used it
>>> in sleep or hibernation?
>>>
>>> Thanks
>> Except almost no one uses sleep or hibernate with VMs.  I'm not saying
>> it's a bad idea, just that it needs a lot of testing before release and
>> we won't get enough if we merge at this point.
>>
> Then it would seem like a good thing to have another user of these paths to
> find the bugs, versus letting them sit there for the poor folks who do use
> sleep/hibernate.
>

Yes, and uncovering hypervisor bugs now is better than uncovering them in
the future.

Thanks

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [net PATCH v5 1/6] virtio_net: use dev_kfree_skb for small buffer XDP receive
  2017-01-25  4:02             ` John Fastabend
  2017-01-25  5:46               ` Jason Wang
@ 2017-01-25 14:45               ` Michael S. Tsirkin
  1 sibling, 0 replies; 40+ messages in thread
From: Michael S. Tsirkin @ 2017-01-25 14:45 UTC (permalink / raw)
  To: John Fastabend
  Cc: Jason Wang, David Miller, john.r.fastabend, netdev,
	alexei.starovoitov, daniel

On Tue, Jan 24, 2017 at 08:02:29PM -0800, John Fastabend wrote:
> On 17-01-24 07:23 PM, Michael S. Tsirkin wrote:
> > On Wed, Jan 25, 2017 at 10:57:12AM +0800, Jason Wang wrote:
> >>
> >>
> >> On 2017-01-25 04:08, Michael S. Tsirkin wrote:
> >>> On Tue, Jan 24, 2017 at 02:43:28PM -0500, David Miller wrote:
> >>>> From: "Michael S. Tsirkin" <mst@redhat.com>
> >>>> Date: Mon, 23 Jan 2017 23:08:35 +0200
> >>>>
> >>>>> On Tue, Jan 17, 2017 at 02:19:50PM -0800, John Fastabend wrote:
> >>>>>> In the small buffer case during driver unload we currently use
> >>>>>> put_page instead of dev_kfree_skb. Resolve this by adding a check
> >>>>>> for virtnet mode when checking XDP queue type. Also name the
> >>>>>> function so that the code reads correctly to match the additional
> >>>>>> check.
> >>>>>>
> >>>>>> Fixes: bb91accf2733 ("virtio-net: XDP support for small buffers")
> >>>>>> Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
> >>>>>> Acked-by: Jason Wang <jasowang@redhat.com>
> >>>>> Acked-by: Michael S. Tsirkin <mst@redhat.com>
> >>>>>
> >>>>> I think we definitely want this one in -net as it's
> >>>>> a bugfix.
> >>>> This whole series is a bug fix, we must have adjust_header XDP
> >>>> support in the virtio_net driver before v4.10 goes out, it is
> >>>> a required base feature for XDP.
> >>> I have to say device resets outside probe have a huge potential
> >>> to uncover hypervisor bugs.
> >>
> >> Maybe not if it reuses most of the current code, since we've already used it
> >> in sleep or hibernation?
> >>
> >> Thanks
> > 
> > Except almost no one uses sleep or hibernate with VMs.  I'm not saying
> > it's a bad idea, just that it needs a lot of testing before release and
> > we won't get enough if we merge at this point.
> > 
> 
> Then it would seem like a good thing to have another user of these paths to
> find the bugs, versus letting them sit there for the poor folks who do use
> sleep/hibernate.

Absolutely. But -rc6 is not the time to test the waters, IMO.

> >>>   I am rather uncomfortable
> >>> doing that after -rc1.
> >>>
> >>> How about a module option to disable it by default?
> >>> We can then ship a partial implementation in 4.10
> >>> and work on completing it in 4.11.
> >>>
> 
> Ugh, I would prefer to avoid module options. This will only happen if users
> push an XDP program into the driver anyway.

Again, I agree; it's an idea for a stopgap measure so we can have
something in 4.10, also assuming that a 256-byte headroom is a must.

> > 
> > To clarify, I'm thinking of an option similar to enable_xdp,
> > and having all packets carry a 256-byte headroom for 4.10.
> 
> An option where? On the QEMU side, in the driver? Is the reset really that bad?
> Coming from the hardware driver side, lots of configuration changes can cause
> resets. I agree it's not overly elegant, but could follow-on patches be used to
> make it prettier if possible?

Again, I agree, and it's not that bad; it's just not something we should
do past rc5.

> I know folks prefer to avoid tuning knobs, but I think exposing the headroom
> configuration to users might not be a bad idea. After all, these same users are
> already programming maps and eBPF code. A simple tuning knob should not be a
> big deal, and reasonable defaults would of course be used. That is a net-next
> debate though.

No arguments from my side here.

> > 
> > Consider our options for 4.11.
> > 
> 
> Finally, just to point out, here are the drivers with XDP support in the latest
> net tree:
> 
> 	mlx/mlx5
> 	mlx/mlx4
> 	qlogic/qede
> 	netronome/nfp
> 	virtio_net
> 
> And here are the ones with adjust_head support:
> 
> 	mlx/mlx4

The above seems to imply that an interface for userspace to detect the amount
of headroom would be beneficial.

> 
> So we currently have the same feature gap in all the other drivers except one,
> although I do not think that is a very good excuse. Let's figure out what we
> should do about virtio.
> 
> Thanks,
> John

If we can simply defer adjust_head patches to 4.11 then that's fine.

-- 
MST

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [net PATCH v5 1/6] virtio_net: use dev_kfree_skb for small buffer XDP receive
  2017-01-25  5:46               ` Jason Wang
@ 2017-01-25 14:47                 ` Michael S. Tsirkin
  0 siblings, 0 replies; 40+ messages in thread
From: Michael S. Tsirkin @ 2017-01-25 14:47 UTC (permalink / raw)
  To: Jason Wang
  Cc: John Fastabend, David Miller, john.r.fastabend, netdev,
	alexei.starovoitov, daniel

On Wed, Jan 25, 2017 at 01:46:46PM +0800, Jason Wang wrote:
> > Then it would seem like a good thing to have another user of these paths to
> > find the bugs, versus letting them sit there for the poor folks who do use
> > sleep/hibernate.
> > 
> 
> Yes, and uncovering hypervisor bugs now is better than uncovering them in the
> future.
> 
> Thanks

Not really, all the uncovering should happen in -next or early rc.
Right now we need to fix what has been uncovered so far.

-- 
MST

^ permalink raw reply	[flat|nested] 40+ messages in thread

end of thread, other threads:[~2017-01-25 14:47 UTC | newest]

Thread overview: 40+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-01-17 22:19 [net PATCH v5 0/6] virtio_net XDP fixes and adjust_header support John Fastabend
2017-01-17 22:19 ` [net PATCH v5 1/6] virtio_net: use dev_kfree_skb for small buffer XDP receive John Fastabend
2017-01-18 15:48   ` Michael S. Tsirkin
2017-01-23 21:08   ` Michael S. Tsirkin
2017-01-23 21:57     ` John Fastabend
2017-01-24 19:43     ` David Miller
2017-01-24 20:08       ` Michael S. Tsirkin
2017-01-24 20:11         ` David Miller
2017-01-24 20:54           ` Michael S. Tsirkin
2017-01-25  2:57         ` Jason Wang
2017-01-25  3:23           ` Michael S. Tsirkin
2017-01-25  4:02             ` John Fastabend
2017-01-25  5:46               ` Jason Wang
2017-01-25 14:47                 ` Michael S. Tsirkin
2017-01-25 14:45               ` Michael S. Tsirkin
2017-01-17 22:20 ` [net PATCH v5 2/6] virtio_net: wrap rtnl_lock in test for calling with lock already held John Fastabend
2017-01-17 22:21 ` [net PATCH v5 3/6] virtio_net: factor out xdp handler for readability John Fastabend
2017-01-18 15:48   ` Michael S. Tsirkin
2017-01-17 22:21 ` [net PATCH v5 4/6] virtio_net: remove duplicate queue pair binding in XDP John Fastabend
2017-01-18 15:49   ` Michael S. Tsirkin
2017-01-17 22:22 ` [net PATCH v5 5/6] virtio_net: refactor freeze/restore logic into virtnet reset logic John Fastabend
2017-01-18 15:50   ` Michael S. Tsirkin
2017-01-17 22:22 ` [net PATCH v5 6/6] virtio_net: XDP support for adjust_head John Fastabend
2017-01-18  3:35   ` Jason Wang
2017-01-18 15:15   ` Michael S. Tsirkin
2017-01-19  3:05     ` Jason Wang
2017-01-19 21:11       ` Michael S. Tsirkin
2017-01-20  3:26         ` Jason Wang
2017-01-20  3:39           ` John Fastabend
2017-01-20  3:38         ` John Fastabend
2017-01-20 16:59         ` David Laight
2017-01-20 17:48           ` Michael S. Tsirkin
2017-01-22  2:51             ` Jason Wang
2017-01-22  4:14               ` John Fastabend
2017-01-23 17:02                 ` Michael S. Tsirkin
2017-01-23 19:22   ` Michael S. Tsirkin
2017-01-23 20:09     ` Michael S. Tsirkin
2017-01-23 22:12       ` John Fastabend
2017-01-23 22:28         ` Michael S. Tsirkin
2017-01-18 15:48 ` [net PATCH v5 0/6] virtio_net XDP fixes and adjust_header support Michael S. Tsirkin
