* [net PATCH v3 0/5] virtio_net XDP fixes and adjust_header support
@ 2017-01-13  2:50 John Fastabend
  2017-01-13  2:50 ` [net PATCH v3 1/5] virtio_net: use dev_kfree_skb for small buffer XDP receive John Fastabend
                   ` (4 more replies)
  0 siblings, 5 replies; 15+ messages in thread
From: John Fastabend @ 2017-01-13  2:50 UTC (permalink / raw)
  To: jasowang, mst
  Cc: john.r.fastabend, netdev, john.fastabend, alexei.starovoitov, daniel

This series fixes the small buffer free logic and then adds
adjust_head support.

I pushed adjust_head at net (even though it's rc3) to avoid having
to push another exception case into virtio_net to catch if the
program uses adjust_head and then block it. If there are any strong
objections to this we can push it at net-next and use a patch from
Jakub to add the exception handling, but then user space has to deal
with it either via try/fail logic or via kernel version checks. Granted,
we already have some cases that need to be configured to enable XDP,
but I don't see any reason to add yet another one when we can fix it
now vs delaying a kernel version.


v2: fix spelling error, convert unsigned -> unsigned int
v3: git crashed while sending v2, so retrying; sorry for the noise

---

John Fastabend (5):
      virtio_net: use dev_kfree_skb for small buffer XDP receive
      net: virtio: wrap rtnl_lock in test for calling with lock already held
      virtio_net: factor out xdp handler for readability
      virtio_net: remove duplicate queue pair binding in XDP
      virtio_net: XDP support for adjust_head


 drivers/net/virtio_net.c |  251 ++++++++++++++++++++++++++++++++--------------
 drivers/virtio/virtio.c  |    9 +-
 include/linux/virtio.h   |    3 +
 3 files changed, 183 insertions(+), 80 deletions(-)

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [net PATCH v3 1/5] virtio_net: use dev_kfree_skb for small buffer XDP receive
  2017-01-13  2:50 [net PATCH v3 0/5] virtio_net XDP fixes and adjust_header support John Fastabend
@ 2017-01-13  2:50 ` John Fastabend
  2017-01-13  3:47   ` Jason Wang
  2017-01-13  2:51 ` [net PATCH v3 2/5] net: virtio: wrap rtnl_lock in test for calling with lock already held John Fastabend
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 15+ messages in thread
From: John Fastabend @ 2017-01-13  2:50 UTC (permalink / raw)
  To: jasowang, mst
  Cc: john.r.fastabend, netdev, john.fastabend, alexei.starovoitov, daniel

In the small buffer case during driver unload we currently use
put_page instead of dev_kfree_skb. Resolve this by adding a check
for the virtnet mode when checking the XDP queue type. Also rename
the function so that the code reads correctly with the additional
check.

Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
---
 drivers/net/virtio_net.c |    8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 4a10500..d97bb71 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -1890,8 +1890,12 @@ static void free_receive_page_frags(struct virtnet_info *vi)
 			put_page(vi->rq[i].alloc_frag.page);
 }
 
-static bool is_xdp_queue(struct virtnet_info *vi, int q)
+static bool is_xdp_raw_buffer_queue(struct virtnet_info *vi, int q)
 {
+	/* For small receive mode always use kfree_skb variants */
+	if (!vi->mergeable_rx_bufs)
+		return false;
+
 	if (q < (vi->curr_queue_pairs - vi->xdp_queue_pairs))
 		return false;
 	else if (q < vi->curr_queue_pairs)
@@ -1908,7 +1912,7 @@ static void free_unused_bufs(struct virtnet_info *vi)
 	for (i = 0; i < vi->max_queue_pairs; i++) {
 		struct virtqueue *vq = vi->sq[i].vq;
 		while ((buf = virtqueue_detach_unused_buf(vq)) != NULL) {
-			if (!is_xdp_queue(vi, i))
+			if (!is_xdp_raw_buffer_queue(vi, i))
 				dev_kfree_skb(buf);
 			else
 				put_page(virt_to_head_page(buf));


* [net PATCH v3 2/5] net: virtio: wrap rtnl_lock in test for calling with lock already held
  2017-01-13  2:50 [net PATCH v3 0/5] virtio_net XDP fixes and adjust_header support John Fastabend
  2017-01-13  2:50 ` [net PATCH v3 1/5] virtio_net: use dev_kfree_skb for small buffer XDP receive John Fastabend
@ 2017-01-13  2:51 ` John Fastabend
  2017-01-13 16:34   ` Stephen Hemminger
  2017-01-13  2:51 ` [net PATCH v3 3/5] virtio_net: factor out xdp handler for readability John Fastabend
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 15+ messages in thread
From: John Fastabend @ 2017-01-13  2:51 UTC (permalink / raw)
  To: jasowang, mst
  Cc: john.r.fastabend, netdev, john.fastabend, alexei.starovoitov, daniel

For the XDP use case, and to allow ethtool reset tests, it is useful
to be able to use the reset routines from contexts where the rtnl
lock is already held.

Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
---
 drivers/net/virtio_net.c |   16 +++++++++-------
 1 file changed, 9 insertions(+), 7 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index d97bb71..43cb2e0 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -1864,12 +1864,13 @@ static void virtnet_free_queues(struct virtnet_info *vi)
 	kfree(vi->sq);
 }
 
-static void free_receive_bufs(struct virtnet_info *vi)
+static void free_receive_bufs(struct virtnet_info *vi, bool need_lock)
 {
 	struct bpf_prog *old_prog;
 	int i;
 
-	rtnl_lock();
+	if (need_lock)
+		rtnl_lock();
 	for (i = 0; i < vi->max_queue_pairs; i++) {
 		while (vi->rq[i].pages)
 			__free_pages(get_a_page(&vi->rq[i], GFP_KERNEL), 0);
@@ -1879,7 +1880,8 @@ static void free_receive_bufs(struct virtnet_info *vi)
 		if (old_prog)
 			bpf_prog_put(old_prog);
 	}
-	rtnl_unlock();
+	if (need_lock)
+		rtnl_unlock();
 }
 
 static void free_receive_page_frags(struct virtnet_info *vi)
@@ -2351,14 +2353,14 @@ static int virtnet_probe(struct virtio_device *vdev)
 	return err;
 }
 
-static void remove_vq_common(struct virtnet_info *vi)
+static void remove_vq_common(struct virtnet_info *vi, bool lock)
 {
 	vi->vdev->config->reset(vi->vdev);
 
 	/* Free unused buffers in both send and recv, if any. */
 	free_unused_bufs(vi);
 
-	free_receive_bufs(vi);
+	free_receive_bufs(vi, lock);
 
 	free_receive_page_frags(vi);
 
@@ -2376,7 +2378,7 @@ static void virtnet_remove(struct virtio_device *vdev)
 
 	unregister_netdev(vi->dev);
 
-	remove_vq_common(vi);
+	remove_vq_common(vi, true);
 
 	free_percpu(vi->stats);
 	free_netdev(vi->dev);
@@ -2401,7 +2403,7 @@ static int virtnet_freeze(struct virtio_device *vdev)
 			napi_disable(&vi->rq[i].napi);
 	}
 
-	remove_vq_common(vi);
+	remove_vq_common(vi, true);
 
 	return 0;
 }


* [net PATCH v3 3/5] virtio_net: factor out xdp handler for readability
  2017-01-13  2:50 [net PATCH v3 0/5] virtio_net XDP fixes and adjust_header support John Fastabend
  2017-01-13  2:50 ` [net PATCH v3 1/5] virtio_net: use dev_kfree_skb for small buffer XDP receive John Fastabend
  2017-01-13  2:51 ` [net PATCH v3 2/5] net: virtio: wrap rtnl_lock in test for calling with lock already held John Fastabend
@ 2017-01-13  2:51 ` John Fastabend
  2017-01-13  7:40   ` Jason Wang
  2017-01-13  2:51 ` [net PATCH v3 4/5] virtio_net: remove duplicate queue pair binding in XDP John Fastabend
  2017-01-13  2:52 ` [net PATCH v3 5/5] virtio_net: XDP support for adjust_head John Fastabend
  4 siblings, 1 reply; 15+ messages in thread
From: John Fastabend @ 2017-01-13  2:51 UTC (permalink / raw)
  To: jasowang, mst
  Cc: john.r.fastabend, netdev, john.fastabend, alexei.starovoitov, daniel

At this point do_xdp_prog() is mostly if/else branches handling
the different modes of virtio_net, so remove it and run the program
directly in the per-mode receive handlers.

Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
---
 drivers/net/virtio_net.c |   76 +++++++++++++++++-----------------------------
 1 file changed, 28 insertions(+), 48 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 43cb2e0..ec54644 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -388,49 +388,6 @@ static void virtnet_xdp_xmit(struct virtnet_info *vi,
 	virtqueue_kick(sq->vq);
 }
 
-static u32 do_xdp_prog(struct virtnet_info *vi,
-		       struct receive_queue *rq,
-		       struct bpf_prog *xdp_prog,
-		       void *data, int len)
-{
-	int hdr_padded_len;
-	struct xdp_buff xdp;
-	void *buf;
-	unsigned int qp;
-	u32 act;
-
-	if (vi->mergeable_rx_bufs) {
-		hdr_padded_len = sizeof(struct virtio_net_hdr_mrg_rxbuf);
-		xdp.data = data + hdr_padded_len;
-		xdp.data_end = xdp.data + (len - vi->hdr_len);
-		buf = data;
-	} else { /* small buffers */
-		struct sk_buff *skb = data;
-
-		xdp.data = skb->data;
-		xdp.data_end = xdp.data + len;
-		buf = skb->data;
-	}
-
-	act = bpf_prog_run_xdp(xdp_prog, &xdp);
-	switch (act) {
-	case XDP_PASS:
-		return XDP_PASS;
-	case XDP_TX:
-		qp = vi->curr_queue_pairs -
-			vi->xdp_queue_pairs +
-			smp_processor_id();
-		xdp.data = buf;
-		virtnet_xdp_xmit(vi, rq, &vi->sq[qp], &xdp, data);
-		return XDP_TX;
-	default:
-		bpf_warn_invalid_xdp_action(act);
-	case XDP_ABORTED:
-	case XDP_DROP:
-		return XDP_DROP;
-	}
-}
-
 static struct sk_buff *receive_small(struct net_device *dev,
 				     struct virtnet_info *vi,
 				     struct receive_queue *rq,
@@ -446,19 +403,30 @@ static struct sk_buff *receive_small(struct net_device *dev,
 	xdp_prog = rcu_dereference(rq->xdp_prog);
 	if (xdp_prog) {
 		struct virtio_net_hdr_mrg_rxbuf *hdr = buf;
+		struct xdp_buff xdp;
+		unsigned int qp;
 		u32 act;
 
 		if (unlikely(hdr->hdr.gso_type || hdr->hdr.flags))
 			goto err_xdp;
-		act = do_xdp_prog(vi, rq, xdp_prog, skb, len);
+
+		xdp.data = skb->data;
+		xdp.data_end = xdp.data + len;
+		act = bpf_prog_run_xdp(xdp_prog, &xdp);
 		switch (act) {
 		case XDP_PASS:
 			break;
 		case XDP_TX:
+			qp = vi->curr_queue_pairs -
+				vi->xdp_queue_pairs +
+				smp_processor_id();
+			virtnet_xdp_xmit(vi, rq, &vi->sq[qp], &xdp, skb);
 			rcu_read_unlock();
 			goto xdp_xmit;
-		case XDP_DROP:
 		default:
+			bpf_warn_invalid_xdp_action(act);
+		case XDP_ABORTED:
+		case XDP_DROP:
 			goto err_xdp;
 		}
 	}
@@ -575,7 +543,11 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
 	rcu_read_lock();
 	xdp_prog = rcu_dereference(rq->xdp_prog);
 	if (xdp_prog) {
+		int desc_room = sizeof(struct virtio_net_hdr_mrg_rxbuf);
 		struct page *xdp_page;
+		struct xdp_buff xdp;
+		unsigned int qp;
+		void *data;
 		u32 act;
 
 		/* This happens when rx buffer size is underestimated */
@@ -598,8 +570,10 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
 		if (unlikely(hdr->hdr.gso_type))
 			goto err_xdp;
 
-		act = do_xdp_prog(vi, rq, xdp_prog,
-				  page_address(xdp_page) + offset, len);
+		data = page_address(xdp_page) + offset;
+		xdp.data = data + desc_room;
+		xdp.data_end = xdp.data + (len - vi->hdr_len);
+		act = bpf_prog_run_xdp(xdp_prog, &xdp);
 		switch (act) {
 		case XDP_PASS:
 			/* We can only create skb based on xdp_page. */
@@ -613,13 +587,19 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
 			}
 			break;
 		case XDP_TX:
+			qp = vi->curr_queue_pairs -
+				vi->xdp_queue_pairs +
+				smp_processor_id();
+			virtnet_xdp_xmit(vi, rq, &vi->sq[qp], &xdp, data);
 			ewma_pkt_len_add(&rq->mrg_avg_pkt_len, len);
 			if (unlikely(xdp_page != page))
 				goto err_xdp;
 			rcu_read_unlock();
 			goto xdp_xmit;
-		case XDP_DROP:
 		default:
+			bpf_warn_invalid_xdp_action(act);
+		case XDP_ABORTED:
+		case XDP_DROP:
 			if (unlikely(xdp_page != page))
 				__free_pages(xdp_page, 0);
 			ewma_pkt_len_add(&rq->mrg_avg_pkt_len, len);


* [net PATCH v3 4/5] virtio_net: remove duplicate queue pair binding in XDP
  2017-01-13  2:50 [net PATCH v3 0/5] virtio_net XDP fixes and adjust_header support John Fastabend
                   ` (2 preceding siblings ...)
  2017-01-13  2:51 ` [net PATCH v3 3/5] virtio_net: factor out xdp handler for readability John Fastabend
@ 2017-01-13  2:51 ` John Fastabend
  2017-01-13  2:52 ` [net PATCH v3 5/5] virtio_net: XDP support for adjust_head John Fastabend
  4 siblings, 0 replies; 15+ messages in thread
From: John Fastabend @ 2017-01-13  2:51 UTC (permalink / raw)
  To: jasowang, mst
  Cc: john.r.fastabend, netdev, john.fastabend, alexei.starovoitov, daniel

Factor out qp assignment.

Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
---
 drivers/net/virtio_net.c |   18 +++++++-----------
 1 file changed, 7 insertions(+), 11 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index ec54644..6041828 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -332,15 +332,19 @@ static struct sk_buff *page_to_skb(struct virtnet_info *vi,
 
 static void virtnet_xdp_xmit(struct virtnet_info *vi,
 			     struct receive_queue *rq,
-			     struct send_queue *sq,
 			     struct xdp_buff *xdp,
 			     void *data)
 {
 	struct virtio_net_hdr_mrg_rxbuf *hdr;
 	unsigned int num_sg, len;
+	struct send_queue *sq;
+	unsigned int qp;
 	void *xdp_sent;
 	int err;
 
+	qp = vi->curr_queue_pairs - vi->xdp_queue_pairs + smp_processor_id();
+	sq = &vi->sq[qp];
+
 	/* Free up any pending old buffers before queueing new ones. */
 	while ((xdp_sent = virtqueue_get_buf(sq->vq, &len)) != NULL) {
 		if (vi->mergeable_rx_bufs) {
@@ -404,7 +408,6 @@ static struct sk_buff *receive_small(struct net_device *dev,
 	if (xdp_prog) {
 		struct virtio_net_hdr_mrg_rxbuf *hdr = buf;
 		struct xdp_buff xdp;
-		unsigned int qp;
 		u32 act;
 
 		if (unlikely(hdr->hdr.gso_type || hdr->hdr.flags))
@@ -417,10 +420,7 @@ static struct sk_buff *receive_small(struct net_device *dev,
 		case XDP_PASS:
 			break;
 		case XDP_TX:
-			qp = vi->curr_queue_pairs -
-				vi->xdp_queue_pairs +
-				smp_processor_id();
-			virtnet_xdp_xmit(vi, rq, &vi->sq[qp], &xdp, skb);
+			virtnet_xdp_xmit(vi, rq, &xdp, skb);
 			rcu_read_unlock();
 			goto xdp_xmit;
 		default:
@@ -546,7 +546,6 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
 		int desc_room = sizeof(struct virtio_net_hdr_mrg_rxbuf);
 		struct page *xdp_page;
 		struct xdp_buff xdp;
-		unsigned int qp;
 		void *data;
 		u32 act;
 
@@ -587,10 +586,7 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
 			}
 			break;
 		case XDP_TX:
-			qp = vi->curr_queue_pairs -
-				vi->xdp_queue_pairs +
-				smp_processor_id();
-			virtnet_xdp_xmit(vi, rq, &vi->sq[qp], &xdp, data);
+			virtnet_xdp_xmit(vi, rq, &xdp, data);
 			ewma_pkt_len_add(&rq->mrg_avg_pkt_len, len);
 			if (unlikely(xdp_page != page))
 				goto err_xdp;


* [net PATCH v3 5/5] virtio_net: XDP support for adjust_head
  2017-01-13  2:50 [net PATCH v3 0/5] virtio_net XDP fixes and adjust_header support John Fastabend
                   ` (3 preceding siblings ...)
  2017-01-13  2:51 ` [net PATCH v3 4/5] virtio_net: remove duplicate queue pair binding in XDP John Fastabend
@ 2017-01-13  2:52 ` John Fastabend
  2017-01-13  7:41   ` Jason Wang
  4 siblings, 1 reply; 15+ messages in thread
From: John Fastabend @ 2017-01-13  2:52 UTC (permalink / raw)
  To: jasowang, mst
  Cc: john.r.fastabend, netdev, john.fastabend, alexei.starovoitov, daniel

Add support for XDP adjust_head by allocating a 256B headroom region
that XDP programs can grow into. This is only enabled when an XDP
program is loaded.

In order to ensure that we do not have to unwind queue headroom, push
queue setup below bpf_prog_add. It reads better to do a prog ref
unwind vs another queue setup call.

At the moment this code must do a full reset to ensure old buffers
without headroom on program add, or with headroom on program removal,
are not used incorrectly in the datapath. Ideally we would only
have to disable/enable the RX queues being updated, but there is no
API to do this at the moment in virtio, so use the big hammer. In
practice it is likely not that big of a problem, as this will only
happen when XDP is enabled/disabled; changing programs does not
require the reset. There is some risk that the driver may either
have an allocation failure or for some reason fail to correctly
negotiate with the underlying backend; in this case the driver will
be left uninitialized. I have never seen this happen on my test
systems, and for what it's worth this same failure case can occur
from probe and other contexts in the virtio framework.

Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
---
 drivers/net/virtio_net.c |  155 ++++++++++++++++++++++++++++++++++++++++------
 drivers/virtio/virtio.c  |    9 ++-
 include/linux/virtio.h   |    3 +
 3 files changed, 144 insertions(+), 23 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 6041828..8b897e7 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -28,6 +28,7 @@
 #include <linux/slab.h>
 #include <linux/cpu.h>
 #include <linux/average.h>
+#include <linux/pci.h>
 #include <net/busy_poll.h>
 
 static int napi_weight = NAPI_POLL_WEIGHT;
@@ -159,6 +160,9 @@ struct virtnet_info {
 	/* Ethtool settings */
 	u8 duplex;
 	u32 speed;
+
+	/* Headroom allocated in RX Queue */
+	unsigned int headroom;
 };
 
 struct padded_vnet_hdr {
@@ -359,6 +363,7 @@ static void virtnet_xdp_xmit(struct virtnet_info *vi,
 	}
 
 	if (vi->mergeable_rx_bufs) {
+		xdp->data -= sizeof(struct virtio_net_hdr_mrg_rxbuf);
 		/* Zero header and leave csum up to XDP layers */
 		hdr = xdp->data;
 		memset(hdr, 0, vi->hdr_len);
@@ -375,7 +380,9 @@ static void virtnet_xdp_xmit(struct virtnet_info *vi,
 		num_sg = 2;
 		sg_init_table(sq->sg, 2);
 		sg_set_buf(sq->sg, hdr, vi->hdr_len);
-		skb_to_sgvec(skb, sq->sg + 1, 0, skb->len);
+		skb_to_sgvec(skb, sq->sg + 1,
+			     xdp->data - xdp->data_hard_start,
+			     xdp->data_end - xdp->data);
 	}
 	err = virtqueue_add_outbuf(sq->vq, sq->sg, num_sg,
 				   data, GFP_ATOMIC);
@@ -401,7 +408,6 @@ static struct sk_buff *receive_small(struct net_device *dev,
 	struct bpf_prog *xdp_prog;
 
 	len -= vi->hdr_len;
-	skb_trim(skb, len);
 
 	rcu_read_lock();
 	xdp_prog = rcu_dereference(rq->xdp_prog);
@@ -413,11 +419,15 @@ static struct sk_buff *receive_small(struct net_device *dev,
 		if (unlikely(hdr->hdr.gso_type || hdr->hdr.flags))
 			goto err_xdp;
 
-		xdp.data = skb->data;
+		xdp.data_hard_start = skb->data;
+		xdp.data = skb->data + vi->headroom;
 		xdp.data_end = xdp.data + len;
 		act = bpf_prog_run_xdp(xdp_prog, &xdp);
 		switch (act) {
 		case XDP_PASS:
+			/* Recalculate length in case bpf program changed it */
+			len = xdp.data_end - xdp.data;
+			__skb_pull(skb, xdp.data - xdp.data_hard_start);
 			break;
 		case XDP_TX:
 			virtnet_xdp_xmit(vi, rq, &xdp, skb);
@@ -432,6 +442,7 @@ static struct sk_buff *receive_small(struct net_device *dev,
 	}
 	rcu_read_unlock();
 
+	skb_trim(skb, len);
 	return skb;
 
 err_xdp:
@@ -569,7 +580,11 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
 		if (unlikely(hdr->hdr.gso_type))
 			goto err_xdp;
 
+		/* Allow consuming headroom but reserve enough space to push
+		 * the descriptor on if we get an XDP_TX return code.
+		 */
 		data = page_address(xdp_page) + offset;
+		xdp.data_hard_start = data - vi->headroom + desc_room;
 		xdp.data = data + desc_room;
 		xdp.data_end = xdp.data + (len - vi->hdr_len);
 		act = bpf_prog_run_xdp(xdp_prog, &xdp);
@@ -748,20 +763,21 @@ static void receive_buf(struct virtnet_info *vi, struct receive_queue *rq,
 static int add_recvbuf_small(struct virtnet_info *vi, struct receive_queue *rq,
 			     gfp_t gfp)
 {
+	int headroom = GOOD_PACKET_LEN + vi->headroom;
 	struct sk_buff *skb;
 	struct virtio_net_hdr_mrg_rxbuf *hdr;
 	int err;
 
-	skb = __netdev_alloc_skb_ip_align(vi->dev, GOOD_PACKET_LEN, gfp);
+	skb = __netdev_alloc_skb_ip_align(vi->dev, headroom, gfp);
 	if (unlikely(!skb))
 		return -ENOMEM;
 
-	skb_put(skb, GOOD_PACKET_LEN);
+	skb_put(skb, headroom);
 
 	hdr = skb_vnet_hdr(skb);
 	sg_init_table(rq->sg, 2);
 	sg_set_buf(rq->sg, hdr, vi->hdr_len);
-	skb_to_sgvec(skb, rq->sg + 1, 0, skb->len);
+	skb_to_sgvec(skb, rq->sg + 1, vi->headroom, skb->len - vi->headroom);
 
 	err = virtqueue_add_inbuf(rq->vq, rq->sg, 2, skb, gfp);
 	if (err < 0)
@@ -829,24 +845,27 @@ static unsigned int get_mergeable_buf_len(struct ewma_pkt_len *avg_pkt_len)
 	return ALIGN(len, MERGEABLE_BUFFER_ALIGN);
 }
 
-static int add_recvbuf_mergeable(struct receive_queue *rq, gfp_t gfp)
+static int add_recvbuf_mergeable(struct virtnet_info *vi,
+				 struct receive_queue *rq, gfp_t gfp)
 {
 	struct page_frag *alloc_frag = &rq->alloc_frag;
+	unsigned int headroom = vi->headroom;
 	char *buf;
 	unsigned long ctx;
 	int err;
 	unsigned int len, hole;
 
 	len = get_mergeable_buf_len(&rq->mrg_avg_pkt_len);
-	if (unlikely(!skb_page_frag_refill(len, alloc_frag, gfp)))
+	if (unlikely(!skb_page_frag_refill(len + headroom, alloc_frag, gfp)))
 		return -ENOMEM;
 
 	buf = (char *)page_address(alloc_frag->page) + alloc_frag->offset;
+	buf += headroom; /* advance address leaving hole at front of pkt */
 	ctx = mergeable_buf_to_ctx(buf, len);
 	get_page(alloc_frag->page);
-	alloc_frag->offset += len;
+	alloc_frag->offset += len + headroom;
 	hole = alloc_frag->size - alloc_frag->offset;
-	if (hole < len) {
+	if (hole < len + headroom) {
 		/* To avoid internal fragmentation, if there is very likely not
 		 * enough space for another buffer, add the remaining space to
 		 * the current buffer. This extra space is not included in
@@ -880,7 +899,7 @@ static bool try_fill_recv(struct virtnet_info *vi, struct receive_queue *rq,
 	gfp |= __GFP_COLD;
 	do {
 		if (vi->mergeable_rx_bufs)
-			err = add_recvbuf_mergeable(rq, gfp);
+			err = add_recvbuf_mergeable(vi, rq, gfp);
 		else if (vi->big_packets)
 			err = add_recvbuf_big(vi, rq, gfp);
 		else
@@ -1675,12 +1694,90 @@ static void virtnet_init_settings(struct net_device *dev)
 	.set_settings = virtnet_set_settings,
 };
 
+#define VIRTIO_XDP_HEADROOM 256
+
+static int init_vqs(struct virtnet_info *vi);
+static void remove_vq_common(struct virtnet_info *vi, bool lock);
+
+/* Reset virtio device with RTNL held this is very similar to the
+ * freeze()/restore() logic except we need to ensure locking. It is
+ * possible that this routine may fail and leave the driver in a
+ * failed state. However assuming the driver negotiated correctly
+ * at probe time we _should_ be able to (re)negotiate driver again.
+ */
+static int virtnet_xdp_reset(struct virtnet_info *vi)
+{
+	struct virtio_device *vdev = vi->vdev;
+	unsigned int status;
+	int i, ret;
+
+	/* Disable and unwind rings */
+	virtio_config_disable(vdev);
+	vdev->failed = vdev->config->get_status(vdev) & VIRTIO_CONFIG_S_FAILED;
+
+	netif_device_detach(vi->dev);
+	cancel_delayed_work_sync(&vi->refill);
+	if (netif_running(vi->dev)) {
+		for (i = 0; i < vi->max_queue_pairs; i++)
+			napi_disable(&vi->rq[i].napi);
+	}
+
+	remove_vq_common(vi, false);
+
+	/* Do a reset per virtio spec recommendation */
+	vdev->config->reset(vdev);
+
+	/* Acknowledge that we've seen the device. */
+	status = vdev->config->get_status(vdev);
+	vdev->config->set_status(vdev, status | VIRTIO_CONFIG_S_ACKNOWLEDGE);
+
+	/* Notify driver is up and finalize features per specification. The
+	 * error code from finalize features is checked here but should not
+	 * fail because we assume features were previously synced successfully.
+	 */
+	status = vdev->config->get_status(vdev);
+	vdev->config->set_status(vdev, status | VIRTIO_CONFIG_S_DRIVER);
+	ret = virtio_finalize_features(vdev);
+	if (ret) {
+		netdev_warn(vi->dev, "virtio_finalize_features failed during reset aborting\n");
+		goto err;
+	}
+
+	ret = init_vqs(vi);
+	if (ret) {
+		netdev_warn(vi->dev, "init_vqs failed during reset aborting\n");
+		goto err;
+	}
+	virtio_device_ready(vi->vdev);
+
+	if (netif_running(vi->dev)) {
+		for (i = 0; i < vi->curr_queue_pairs; i++)
+			if (!try_fill_recv(vi, &vi->rq[i], GFP_KERNEL))
+				schedule_delayed_work(&vi->refill, 0);
+
+		for (i = 0; i < vi->max_queue_pairs; i++)
+			virtnet_napi_enable(&vi->rq[i]);
+	}
+	netif_device_attach(vi->dev);
+	/* Finally, tell the device we're all set */
+	status = vdev->config->get_status(vdev);
+	vdev->config->set_status(vdev, status | VIRTIO_CONFIG_S_DRIVER_OK);
+	virtio_config_enable(vdev);
+
+	return 0;
+err:
+	status = vdev->config->get_status(vdev);
+	vdev->config->set_status(vdev, status | VIRTIO_CONFIG_S_FAILED);
+	return ret;
+}
+
 static int virtnet_xdp_set(struct net_device *dev, struct bpf_prog *prog)
 {
 	unsigned long int max_sz = PAGE_SIZE - sizeof(struct padded_vnet_hdr);
 	struct virtnet_info *vi = netdev_priv(dev);
 	struct bpf_prog *old_prog;
 	u16 xdp_qp = 0, curr_qp;
+	unsigned int old_hr;
 	int i, err;
 
 	if (virtio_has_feature(vi->vdev, VIRTIO_NET_F_GUEST_TSO4) ||
@@ -1712,18 +1809,31 @@ static int virtnet_xdp_set(struct net_device *dev, struct bpf_prog *prog)
 		return -ENOMEM;
 	}
 
-	err = virtnet_set_queues(vi, curr_qp + xdp_qp);
-	if (err) {
-		dev_warn(&dev->dev, "XDP Device queue allocation failure.\n");
-		return err;
-	}
-
+	old_hr = vi->headroom;
 	if (prog) {
 		prog = bpf_prog_add(prog, vi->max_queue_pairs - 1);
-		if (IS_ERR(prog)) {
-			virtnet_set_queues(vi, curr_qp);
+		if (IS_ERR(prog))
 			return PTR_ERR(prog);
-		}
+		vi->headroom = VIRTIO_XDP_HEADROOM;
+	} else {
+		vi->headroom = 0;
+	}
+
+	/* Changing the headroom in buffers is a disruptive operation because
+	 * existing buffers must be flushed and reallocated. This will happen
+	 * when an xdp program is initially added or xdp is disabled by removing
+	 * the xdp program.
+	 */
+	if (old_hr != vi->headroom) {
+		err = virtnet_xdp_reset(vi);
+		if (err)
+			goto err_reset;
+	}
+
+	err = virtnet_set_queues(vi, curr_qp + xdp_qp);
+	if (err) {
+		dev_warn(&dev->dev, "XDP Device queue allocation failure.\n");
+		goto err_reset;
 	}
 
 	vi->xdp_queue_pairs = xdp_qp;
@@ -1737,6 +1847,11 @@ static int virtnet_xdp_set(struct net_device *dev, struct bpf_prog *prog)
 	}
 
 	return 0;
+err_reset:
+	if (prog)
+		bpf_prog_sub(prog, vi->max_queue_pairs - 1);
+	vi->headroom = old_hr;
+	return err;
 }
 
 static bool virtnet_xdp_query(struct net_device *dev)
diff --git a/drivers/virtio/virtio.c b/drivers/virtio/virtio.c
index 7062bb0..0e922b9 100644
--- a/drivers/virtio/virtio.c
+++ b/drivers/virtio/virtio.c
@@ -145,14 +145,15 @@ void virtio_config_changed(struct virtio_device *dev)
 }
 EXPORT_SYMBOL_GPL(virtio_config_changed);
 
-static void virtio_config_disable(struct virtio_device *dev)
+void virtio_config_disable(struct virtio_device *dev)
 {
 	spin_lock_irq(&dev->config_lock);
 	dev->config_enabled = false;
 	spin_unlock_irq(&dev->config_lock);
 }
+EXPORT_SYMBOL_GPL(virtio_config_disable);
 
-static void virtio_config_enable(struct virtio_device *dev)
+void virtio_config_enable(struct virtio_device *dev)
 {
 	spin_lock_irq(&dev->config_lock);
 	dev->config_enabled = true;
@@ -161,8 +162,9 @@ static void virtio_config_enable(struct virtio_device *dev)
 	dev->config_change_pending = false;
 	spin_unlock_irq(&dev->config_lock);
 }
+EXPORT_SYMBOL_GPL(virtio_config_enable);
 
-static int virtio_finalize_features(struct virtio_device *dev)
+int virtio_finalize_features(struct virtio_device *dev)
 {
 	int ret = dev->config->finalize_features(dev);
 	unsigned status;
@@ -182,6 +184,7 @@ static int virtio_finalize_features(struct virtio_device *dev)
 	}
 	return 0;
 }
+EXPORT_SYMBOL_GPL(virtio_finalize_features);
 
 static int virtio_dev_probe(struct device *_d)
 {
diff --git a/include/linux/virtio.h b/include/linux/virtio.h
index d5eb547..eac8f05 100644
--- a/include/linux/virtio.h
+++ b/include/linux/virtio.h
@@ -137,6 +137,9 @@ static inline struct virtio_device *dev_to_virtio(struct device *_dev)
 
 void virtio_break_device(struct virtio_device *dev);
 
+void virtio_config_disable(struct virtio_device *dev);
+void virtio_config_enable(struct virtio_device *dev);
+int virtio_finalize_features(struct virtio_device *dev);
 void virtio_config_changed(struct virtio_device *dev);
 #ifdef CONFIG_PM_SLEEP
 int virtio_device_freeze(struct virtio_device *dev);


* Re: [net PATCH v3 1/5] virtio_net: use dev_kfree_skb for small buffer XDP receive
  2017-01-13  2:50 ` [net PATCH v3 1/5] virtio_net: use dev_kfree_skb for small buffer XDP receive John Fastabend
@ 2017-01-13  3:47   ` Jason Wang
  0 siblings, 0 replies; 15+ messages in thread
From: Jason Wang @ 2017-01-13  3:47 UTC (permalink / raw)
  To: John Fastabend, mst; +Cc: john.r.fastabend, netdev, alexei.starovoitov, daniel



On 2017-01-13 10:50, John Fastabend wrote:
> In the small buffer case during driver unload we currently use
> put_page instead of dev_kfree_skb. Resolve this by adding a check
> for virtnet mode when checking XDP queue type. Also name the
> function so that the code reads correctly to match the additional
> check.
>
> Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
> ---
>   drivers/net/virtio_net.c |    8 ++++++--
>   1 file changed, 6 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index 4a10500..d97bb71 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -1890,8 +1890,12 @@ static void free_receive_page_frags(struct virtnet_info *vi)
>   			put_page(vi->rq[i].alloc_frag.page);
>   }
>   
> -static bool is_xdp_queue(struct virtnet_info *vi, int q)
> +static bool is_xdp_raw_buffer_queue(struct virtnet_info *vi, int q)
>   {
> +	/* For small receive mode always use kfree_skb variants */
> +	if (!vi->mergeable_rx_bufs)
> +		return false;
> +
>   	if (q < (vi->curr_queue_pairs - vi->xdp_queue_pairs))
>   		return false;
>   	else if (q < vi->curr_queue_pairs)
> @@ -1908,7 +1912,7 @@ static void free_unused_bufs(struct virtnet_info *vi)
>   	for (i = 0; i < vi->max_queue_pairs; i++) {
>   		struct virtqueue *vq = vi->sq[i].vq;
>   		while ((buf = virtqueue_detach_unused_buf(vq)) != NULL) {
> -			if (!is_xdp_queue(vi, i))
> +			if (!is_xdp_raw_buffer_queue(vi, i))
>   				dev_kfree_skb(buf);
>   			else
>   				put_page(virt_to_head_page(buf));
>

Acked-by: Jason Wang <jasowang@redhat.com>


* Re: [net PATCH v3 3/5] virtio_net: factor out xdp handler for readability
  2017-01-13  2:51 ` [net PATCH v3 3/5] virtio_net: factor out xdp handler for readability John Fastabend
@ 2017-01-13  7:40   ` Jason Wang
  2017-01-13 19:56     ` John Fastabend
  0 siblings, 1 reply; 15+ messages in thread
From: Jason Wang @ 2017-01-13  7:40 UTC (permalink / raw)
  To: John Fastabend, mst; +Cc: john.r.fastabend, netdev, alexei.starovoitov, daniel



On 2017-01-13 10:51, John Fastabend wrote:
> At this point the do_xdp_prog is mostly if/else branches handling
> the different modes of virtio_net. So remove it and handle running
> the program in the per mode handlers.
>
> Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
> ---
>   drivers/net/virtio_net.c |   76 +++++++++++++++++-----------------------------
>   1 file changed, 28 insertions(+), 48 deletions(-)
>
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index 43cb2e0..ec54644 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -388,49 +388,6 @@ static void virtnet_xdp_xmit(struct virtnet_info *vi,
>   	virtqueue_kick(sq->vq);
>   }
>   

[...]

>   
>   		/* This happens when rx buffer size is underestimated */
> @@ -598,8 +570,10 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
>   		if (unlikely(hdr->hdr.gso_type))
>   			goto err_xdp;
>   
> -		act = do_xdp_prog(vi, rq, xdp_prog,
> -				  page_address(xdp_page) + offset, len);
> +		data = page_address(xdp_page) + offset;
> +		xdp.data = data + desc_room;
> +		xdp.data_end = xdp.data + (len - vi->hdr_len);

It looks like desc_room is always vi->hdr_len.

> +		act = bpf_prog_run_xdp(xdp_prog, &xdp);
>   		switch (act) {
>   		case XDP_PASS:
>   			/* We can only create skb based on xdp_page. */
> @@ -613,13 +587,19 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
>   			}
>   			break;
>   		case XDP_TX:
> +			qp = vi->curr_queue_pairs -
> +				vi->xdp_queue_pairs +
> +				smp_processor_id();
> +			virtnet_xdp_xmit(vi, rq, &vi->sq[qp], &xdp, data);
>   			ewma_pkt_len_add(&rq->mrg_avg_pkt_len, len);
>   			if (unlikely(xdp_page != page))
>   				goto err_xdp;
>   			rcu_read_unlock();
>   			goto xdp_xmit;
> -		case XDP_DROP:
>   		default:
> +			bpf_warn_invalid_xdp_action(act);
> +		case XDP_ABORTED:
> +		case XDP_DROP:
>   			if (unlikely(xdp_page != page))
>   				__free_pages(xdp_page, 0);
>   			ewma_pkt_len_add(&rq->mrg_avg_pkt_len, len);
>


* Re: [net PATCH v3 5/5] virtio_net: XDP support for adjust_head
  2017-01-13  2:52 ` [net PATCH v3 5/5] virtio_net: XDP support for adjust_head John Fastabend
@ 2017-01-13  7:41   ` Jason Wang
  2017-01-13 20:08     ` John Fastabend
  0 siblings, 1 reply; 15+ messages in thread
From: Jason Wang @ 2017-01-13  7:41 UTC (permalink / raw)
  To: John Fastabend, mst; +Cc: john.r.fastabend, netdev, alexei.starovoitov, daniel



On 2017-01-13 10:52, John Fastabend wrote:
> Add support for XDP adjust head by allocating a 256B header region
> that XDP programs can grow into. This is only enabled when an XDP
> program is loaded.
>
> In order to ensure that we do not have to unwind queue headroom,
> push queue setup below bpf_prog_add. It reads better to do a prog
> ref unwind vs another queue setup call.
>
> At the moment this code must do a full reset to ensure old buffers
> without headroom on program add, or with headroom on program removal,
> are not used incorrectly in the datapath. Ideally we would only
> have to disable/enable the RX queues being updated, but there is no
> API to do this at the moment in virtio, so use the big hammer. In
> practice it is likely not that big of a problem, as this will only
> happen when XDP is enabled/disabled; changing programs does not
> require the reset. There is some risk that the driver may either
> have an allocation failure or for some reason fail to correctly
> negotiate with the underlying backend; in this case the driver will
> be left uninitialized. I have not seen this ever happen on my test
> systems, and for what it's worth this same failure case can occur
> from probe and other contexts in the virtio framework.
>
> Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
> ---
>   drivers/net/virtio_net.c |  155 ++++++++++++++++++++++++++++++++++++++++------
>   drivers/virtio/virtio.c  |    9 ++-
>   include/linux/virtio.h   |    3 +
>   3 files changed, 144 insertions(+), 23 deletions(-)
>
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index 6041828..8b897e7 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -28,6 +28,7 @@
>   #include <linux/slab.h>
>   #include <linux/cpu.h>
>   #include <linux/average.h>
> +#include <linux/pci.h>
>   #include <net/busy_poll.h>
>   
>   static int napi_weight = NAPI_POLL_WEIGHT;
> @@ -159,6 +160,9 @@ struct virtnet_info {
>   	/* Ethtool settings */
>   	u8 duplex;
>   	u32 speed;
> +
> +	/* Headroom allocated in RX Queue */
> +	unsigned int headroom;

If this cannot be changed in any way, better to use a macro instead of a
field here. And there's then no need to add an extra parameter to
add_recvbuf_mergeable().

>   };
>   
>   struct padded_vnet_hdr {
> @@ -359,6 +363,7 @@ static void virtnet_xdp_xmit(struct virtnet_info *vi,
>   	}
>   
>   	if (vi->mergeable_rx_bufs) {
> +		xdp->data -= sizeof(struct virtio_net_hdr_mrg_rxbuf);

I fail to understand why this is needed. We should have excluded the vnet
header from xdp->data even before bpf_prog_run_xdp().

>   		/* Zero header and leave csum up to XDP layers */
>   		hdr = xdp->data;
>   		memset(hdr, 0, vi->hdr_len);
> @@ -375,7 +380,9 @@ static void virtnet_xdp_xmit(struct virtnet_info *vi,
>   		num_sg = 2;
>   		sg_init_table(sq->sg, 2);
>   		sg_set_buf(sq->sg, hdr, vi->hdr_len);
> -		skb_to_sgvec(skb, sq->sg + 1, 0, skb->len);
> +		skb_to_sgvec(skb, sq->sg + 1,
> +			     xdp->data - xdp->data_hard_start,
> +			     xdp->data_end - xdp->data);
>   	}
>   	err = virtqueue_add_outbuf(sq->vq, sq->sg, num_sg,
>   				   data, GFP_ATOMIC);
> @@ -401,7 +408,6 @@ static struct sk_buff *receive_small(struct net_device *dev,
>   	struct bpf_prog *xdp_prog;
>   
>   	len -= vi->hdr_len;
> -	skb_trim(skb, len);
>   
>   	rcu_read_lock();
>   	xdp_prog = rcu_dereference(rq->xdp_prog);
> @@ -413,11 +419,15 @@ static struct sk_buff *receive_small(struct net_device *dev,
>   		if (unlikely(hdr->hdr.gso_type || hdr->hdr.flags))
>   			goto err_xdp;
>   
> -		xdp.data = skb->data;
> +		xdp.data_hard_start = skb->data;
> +		xdp.data = skb->data + vi->headroom;
>   		xdp.data_end = xdp.data + len;
>   		act = bpf_prog_run_xdp(xdp_prog, &xdp);
>   		switch (act) {
>   		case XDP_PASS:
> +			/* Recalculate length in case bpf program changed it */
> +			len = xdp.data_end - xdp.data;
> +			__skb_pull(skb, xdp.data - xdp.data_hard_start);

How about doing this just after bpf_prog_run_xdp() for XDP_TX too? This is
more readable and there's no need to change the xmit path.

>   			break;
>   		case XDP_TX:
>   			virtnet_xdp_xmit(vi, rq, &xdp, skb);
> @@ -432,6 +442,7 @@ static struct sk_buff *receive_small(struct net_device *dev,
>   	}
>   	rcu_read_unlock();
>   
> +	skb_trim(skb, len);
>   	return skb;
>   
>   err_xdp:
> @@ -569,7 +580,11 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
>   		if (unlikely(hdr->hdr.gso_type))
>   			goto err_xdp;
>   
> +		/* Allow consuming headroom but reserve enough space to push
> +		 * the descriptor on if we get an XDP_TX return code.
> +		 */
>   		data = page_address(xdp_page) + offset;
> +		xdp.data_hard_start = data - vi->headroom + desc_room;

Two possible issues here:

1) If we want to adjust the header after linearizing, we should reserve
room for that page, but I don't see any code for this.
2) If the header has been adjusted, it looks like we need to change the
offset value; otherwise, page_to_skb() won't build a correct skb for us
for XDP_PASS.

>   		xdp.data = data + desc_room;
>   		xdp.data_end = xdp.data + (len - vi->hdr_len);
>   		act = bpf_prog_run_xdp(xdp_prog, &xdp);
> @@ -748,20 +763,21 @@ static void receive_buf(struct virtnet_info *vi, struct receive_queue *rq,
>   static int add_recvbuf_small(struct virtnet_info *vi, struct receive_queue *rq,
>   			     gfp_t gfp)
>   {
> +	int headroom = GOOD_PACKET_LEN + vi->headroom;
>   	struct sk_buff *skb;
>   	struct virtio_net_hdr_mrg_rxbuf *hdr;
>   	int err;
>   
> -	skb = __netdev_alloc_skb_ip_align(vi->dev, GOOD_PACKET_LEN, gfp);
> +	skb = __netdev_alloc_skb_ip_align(vi->dev, headroom, gfp);
>   	if (unlikely(!skb))
>   		return -ENOMEM;
>   
> -	skb_put(skb, GOOD_PACKET_LEN);
> +	skb_put(skb, headroom);
>   
>   	hdr = skb_vnet_hdr(skb);
>   	sg_init_table(rq->sg, 2);
>   	sg_set_buf(rq->sg, hdr, vi->hdr_len);
> -	skb_to_sgvec(skb, rq->sg + 1, 0, skb->len);
> +	skb_to_sgvec(skb, rq->sg + 1, vi->headroom, skb->len - vi->headroom);
>   
>   	err = virtqueue_add_inbuf(rq->vq, rq->sg, 2, skb, gfp);
>   	if (err < 0)
> @@ -829,24 +845,27 @@ static unsigned int get_mergeable_buf_len(struct ewma_pkt_len *avg_pkt_len)
>   	return ALIGN(len, MERGEABLE_BUFFER_ALIGN);
>   }
>   
> -static int add_recvbuf_mergeable(struct receive_queue *rq, gfp_t gfp)
> +static int add_recvbuf_mergeable(struct virtnet_info *vi,
> +				 struct receive_queue *rq, gfp_t gfp)
>   {
>   	struct page_frag *alloc_frag = &rq->alloc_frag;
> +	unsigned int headroom = vi->headroom;
>   	char *buf;
>   	unsigned long ctx;
>   	int err;
>   	unsigned int len, hole;
>   
>   	len = get_mergeable_buf_len(&rq->mrg_avg_pkt_len);
> -	if (unlikely(!skb_page_frag_refill(len, alloc_frag, gfp)))
> +	if (unlikely(!skb_page_frag_refill(len + headroom, alloc_frag, gfp)))
>   		return -ENOMEM;
>   
>   	buf = (char *)page_address(alloc_frag->page) + alloc_frag->offset;
> +	buf += headroom; /* advance address leaving hole at front of pkt */
>   	ctx = mergeable_buf_to_ctx(buf, len);
>   	get_page(alloc_frag->page);
> -	alloc_frag->offset += len;
> +	alloc_frag->offset += len + headroom;
>   	hole = alloc_frag->size - alloc_frag->offset;
> -	if (hole < len) {
> +	if (hole < len + headroom) {
>   		/* To avoid internal fragmentation, if there is very likely not
>   		 * enough space for another buffer, add the remaining space to
>   		 * the current buffer. This extra space is not included in
> @@ -880,7 +899,7 @@ static bool try_fill_recv(struct virtnet_info *vi, struct receive_queue *rq,
>   	gfp |= __GFP_COLD;
>   	do {
>   		if (vi->mergeable_rx_bufs)
> -			err = add_recvbuf_mergeable(rq, gfp);
> +			err = add_recvbuf_mergeable(vi, rq, gfp);
>   		else if (vi->big_packets)
>   			err = add_recvbuf_big(vi, rq, gfp);
>   		else
> @@ -1675,12 +1694,90 @@ static void virtnet_init_settings(struct net_device *dev)
>   	.set_settings = virtnet_set_settings,
>   };
>   
> +#define VIRTIO_XDP_HEADROOM 256
> +
> +static int init_vqs(struct virtnet_info *vi);
> +static void remove_vq_common(struct virtnet_info *vi, bool lock);
> +
> +/* Reset virtio device with RTNL held; this is very similar to the
> + * freeze()/restore() logic except we need to ensure locking. It is
> + * possible that this routine may fail and leave the driver in a
> + * failed state. However, assuming the driver negotiated correctly
> + * at probe time, we _should_ be able to (re)negotiate the driver again.
> + */

Instead of duplicating code and exporting helpers, why not use
virtio_device_freeze()/virtio_device_restore()? For rtnl_lock in
restore, you can probably avoid it by checking rtnl_is_locked() first?

> +static int virtnet_xdp_reset(struct virtnet_info *vi)
> +{
> +	struct virtio_device *vdev = vi->vdev;
> +	unsigned int status;
> +	int i, ret;
> +
> +	/* Disable and unwind rings */
> +	virtio_config_disable(vdev);
> +	vdev->failed = vdev->config->get_status(vdev) & VIRTIO_CONFIG_S_FAILED;
> +
> +	netif_device_detach(vi->dev);
> +	cancel_delayed_work_sync(&vi->refill);
> +	if (netif_running(vi->dev)) {
> +		for (i = 0; i < vi->max_queue_pairs; i++)
> +			napi_disable(&vi->rq[i].napi);
> +	}
> +
> +	remove_vq_common(vi, false);
> +
> +	/* Do a reset per virtio spec recommendation */
> +	vdev->config->reset(vdev);
> +
> +	/* Acknowledge that we've seen the device. */
> +	status = vdev->config->get_status(vdev);
> +	vdev->config->set_status(vdev, status | VIRTIO_CONFIG_S_ACKNOWLEDGE);
> +
> +	/* Notify driver is up and finalize features per specification. The
> +	 * error code from finalize features is checked here but should not
> +	 * fail because we assume features were previously synced successfully.
> +	 */
> +	status = vdev->config->get_status(vdev);
> +	vdev->config->set_status(vdev, status | VIRTIO_CONFIG_S_DRIVER);
> +	ret = virtio_finalize_features(vdev);
> +	if (ret) {
> +		netdev_warn(vi->dev, "virtio_finalize_features failed during reset aborting\n");
> +		goto err;
> +	}
> +
> +	ret = init_vqs(vi);
> +	if (ret) {
> +		netdev_warn(vi->dev, "init_vqs failed during reset aborting\n");
> +		goto err;
> +	}
> +	virtio_device_ready(vi->vdev);
> +
> +	if (netif_running(vi->dev)) {
> +		for (i = 0; i < vi->curr_queue_pairs; i++)
> +			if (!try_fill_recv(vi, &vi->rq[i], GFP_KERNEL))
> +				schedule_delayed_work(&vi->refill, 0);
> +
> +		for (i = 0; i < vi->max_queue_pairs; i++)
> +			virtnet_napi_enable(&vi->rq[i]);
> +	}
> +	netif_device_attach(vi->dev);
> +	/* Finally, tell the device we're all set */
> +	status = vdev->config->get_status(vdev);
> +	vdev->config->set_status(vdev, status | VIRTIO_CONFIG_S_DRIVER_OK);
> +	virtio_config_enable(vdev);
> +
> +	return 0;
> +err:
> +	status = vdev->config->get_status(vdev);
> +	vdev->config->set_status(vdev, status | VIRTIO_CONFIG_S_FAILED);
> +	return ret;
> +}

[...]


* Re: [net PATCH v3 2/5] net: virtio: wrap rtnl_lock in test for calling with lock already held
  2017-01-13  2:51 ` [net PATCH v3 2/5] net: virtio: wrap rtnl_lock in test for calling with lock already held John Fastabend
@ 2017-01-13 16:34   ` Stephen Hemminger
  2017-01-13 17:31     ` John Fastabend
  0 siblings, 1 reply; 15+ messages in thread
From: Stephen Hemminger @ 2017-01-13 16:34 UTC (permalink / raw)
  To: John Fastabend
  Cc: jasowang, mst, john.r.fastabend, netdev, alexei.starovoitov, daniel

On Thu, 12 Jan 2017 18:51:00 -0800
John Fastabend <john.fastabend@gmail.com> wrote:

>  
> -static void free_receive_bufs(struct virtnet_info *vi)
> +static void free_receive_bufs(struct virtnet_info *vi, bool need_lock)
>  {
>  	struct bpf_prog *old_prog;
>  	int i;
>  
> -	rtnl_lock();
> +	if (need_lock)
> +		rtnl_lock();
>  	for (i = 0; i < vi->max_queue_pairs; i++) {
>  		while (vi->rq[i].pages)
>  			__free_pages(get_a_page(&vi->rq[i], GFP_KERNEL), 0);
> @@ -1879,7 +1880,8 @@ static void free_receive_bufs(struct virtnet_info *vi)
>  		if (old_prog)
>  			bpf_prog_put(old_prog);
>  	}
> -	rtnl_unlock();
> +	if (need_lock)
> +		rtnl_unlock();
>  }

Conditional locking is a bad idea; sparse complains about it, and it is a
later source of bugs. The more typical way of doing this in the kernel is:

void _foo(some args)
{
	ASSERT_RTNL();

	...
}

void foo(some args)
{
	rtnl_lock();
	_foo(some args)
	rtnl_unlock();
}


* Re: [net PATCH v3 2/5] net: virtio: wrap rtnl_lock in test for calling with lock already held
  2017-01-13 16:34   ` Stephen Hemminger
@ 2017-01-13 17:31     ` John Fastabend
  2017-01-13 23:56       ` John Fastabend
  0 siblings, 1 reply; 15+ messages in thread
From: John Fastabend @ 2017-01-13 17:31 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: jasowang, mst, john.r.fastabend, netdev, alexei.starovoitov, daniel

On 17-01-13 08:34 AM, Stephen Hemminger wrote:
> On Thu, 12 Jan 2017 18:51:00 -0800
> John Fastabend <john.fastabend@gmail.com> wrote:
> 
>>  
>> -static void free_receive_bufs(struct virtnet_info *vi)
>> +static void free_receive_bufs(struct virtnet_info *vi, bool need_lock)
>>  {
>>  	struct bpf_prog *old_prog;
>>  	int i;
>>  
>> -	rtnl_lock();
>> +	if (need_lock)
>> +		rtnl_lock();
>>  	for (i = 0; i < vi->max_queue_pairs; i++) {
>>  		while (vi->rq[i].pages)
>>  			__free_pages(get_a_page(&vi->rq[i], GFP_KERNEL), 0);
>> @@ -1879,7 +1880,8 @@ static void free_receive_bufs(struct virtnet_info *vi)
>>  		if (old_prog)
>>  			bpf_prog_put(old_prog);
>>  	}
>> -	rtnl_unlock();
>> +	if (need_lock)
>> +		rtnl_unlock();
>>  }
> 
> Conditional locking is a bad idea; sparse complains about it, and it is a
> later source of bugs. The more typical way of doing this in the kernel is:

OK I'll use the normal form.

> 
> void _foo(some args)
> {
> 	ASSERT_RTNL();
> 
> 	...
> }
> 
> void foo(some args)
> {
> 	rtnl_lock();
> 	_foo(some args)
> 	rtnl_unlock();
> }
> 
> 


* Re: [net PATCH v3 3/5] virtio_net: factor out xdp handler for readability
  2017-01-13  7:40   ` Jason Wang
@ 2017-01-13 19:56     ` John Fastabend
  0 siblings, 0 replies; 15+ messages in thread
From: John Fastabend @ 2017-01-13 19:56 UTC (permalink / raw)
  To: Jason Wang, mst; +Cc: john.r.fastabend, netdev, alexei.starovoitov, daniel

On 17-01-12 11:40 PM, Jason Wang wrote:
> 
> 
> On 2017-01-13 10:51, John Fastabend wrote:
>> At this point the do_xdp_prog is mostly if/else branches handling
>> the different modes of virtio_net. So remove it and handle running
>> the program in the per mode handlers.
>>
>> Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
>> ---
>>   drivers/net/virtio_net.c |   76 +++++++++++++++++-----------------------------
>>   1 file changed, 28 insertions(+), 48 deletions(-)
>>
>> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
>> index 43cb2e0..ec54644 100644
>> --- a/drivers/net/virtio_net.c
>> +++ b/drivers/net/virtio_net.c
>> @@ -388,49 +388,6 @@ static void virtnet_xdp_xmit(struct virtnet_info *vi,
>>       virtqueue_kick(sq->vq);
>>   }
>>   
> 
> [...]
> 
>>             /* This happens when rx buffer size is underestimated */
>> @@ -598,8 +570,10 @@ static struct sk_buff *receive_mergeable(struct
>> net_device *dev,
>>           if (unlikely(hdr->hdr.gso_type))
>>               goto err_xdp;
>>   -        act = do_xdp_prog(vi, rq, xdp_prog,
>> -                  page_address(xdp_page) + offset, len);
>> +        data = page_address(xdp_page) + offset;
>> +        xdp.data = data + desc_room;
>> +        xdp.data_end = xdp.data + (len - vi->hdr_len);
> 
> It looks like desc_room is always vi->hdr_len.
> 

Seems to be the case. I'll just use vi->hdr_len and remove the variable.

Thanks.

>> +        act = bpf_prog_run_xdp(xdp_prog, &xdp);
>>           switch (act) {
>>           case XDP_PASS:
>>               /* We can only create skb based on xdp_page. */
>> @@ -613,13 +587,19 @@ static struct sk_buff *receive_mergeable(struct
>> net_device *dev,
>>               }
>>               break;
>>           case XDP_TX:
>> +            qp = vi->curr_queue_pairs -
>> +                vi->xdp_queue_pairs +
>> +                smp_processor_id();
>> +            virtnet_xdp_xmit(vi, rq, &vi->sq[qp], &xdp, data);
>>               ewma_pkt_len_add(&rq->mrg_avg_pkt_len, len);
>>               if (unlikely(xdp_page != page))
>>                   goto err_xdp;
>>               rcu_read_unlock();
>>               goto xdp_xmit;
>> -        case XDP_DROP:
>>           default:
>> +            bpf_warn_invalid_xdp_action(act);
>> +        case XDP_ABORTED:
>> +        case XDP_DROP:
>>               if (unlikely(xdp_page != page))
>>                   __free_pages(xdp_page, 0);
>>               ewma_pkt_len_add(&rq->mrg_avg_pkt_len, len);
>>
> 


* Re: [net PATCH v3 5/5] virtio_net: XDP support for adjust_head
  2017-01-13  7:41   ` Jason Wang
@ 2017-01-13 20:08     ` John Fastabend
  2017-01-14  0:45       ` John Fastabend
  0 siblings, 1 reply; 15+ messages in thread
From: John Fastabend @ 2017-01-13 20:08 UTC (permalink / raw)
  To: Jason Wang, mst; +Cc: john.r.fastabend, netdev, alexei.starovoitov, daniel

On 17-01-12 11:41 PM, Jason Wang wrote:
> 
> 
> On 2017-01-13 10:52, John Fastabend wrote:
>> Add support for XDP adjust head by allocating a 256B header region
>> that XDP programs can grow into. This is only enabled when an XDP
>> program is loaded.
>>
>> In order to ensure that we do not have to unwind queue headroom,
>> push queue setup below bpf_prog_add. It reads better to do a prog
>> ref unwind vs another queue setup call.
>>
>> At the moment this code must do a full reset to ensure old buffers
>> without headroom on program add, or with headroom on program removal,
>> are not used incorrectly in the datapath. Ideally we would only
>> have to disable/enable the RX queues being updated, but there is no
>> API to do this at the moment in virtio, so use the big hammer. In
>> practice it is likely not that big of a problem, as this will only
>> happen when XDP is enabled/disabled; changing programs does not
>> require the reset. There is some risk that the driver may either
>> have an allocation failure or for some reason fail to correctly
>> negotiate with the underlying backend; in this case the driver will
>> be left uninitialized. I have not seen this ever happen on my test
>> systems, and for what it's worth this same failure case can occur
>> from probe and other contexts in the virtio framework.
>>
>> Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
>> ---
>>   drivers/net/virtio_net.c |  155 ++++++++++++++++++++++++++++++++++++++++------
>>   drivers/virtio/virtio.c  |    9 ++-
>>   include/linux/virtio.h   |    3 +
>>   3 files changed, 144 insertions(+), 23 deletions(-)
>>
>> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
>> index 6041828..8b897e7 100644
>> --- a/drivers/net/virtio_net.c
>> +++ b/drivers/net/virtio_net.c
>> @@ -28,6 +28,7 @@
>>   #include <linux/slab.h>
>>   #include <linux/cpu.h>
>>   #include <linux/average.h>
>> +#include <linux/pci.h>
>>   #include <net/busy_poll.h>
>>     static int napi_weight = NAPI_POLL_WEIGHT;
>> @@ -159,6 +160,9 @@ struct virtnet_info {
>>       /* Ethtool settings */
>>       u8 duplex;
>>       u32 speed;
>> +
>> +    /* Headroom allocated in RX Queue */
>> +    unsigned int headroom;
> 
> If this cannot be changed in any way, better to use a macro instead of a field
> here. And there's then no need to add an extra parameter to
> add_recvbuf_mergeable().

OK, originally I thought this might be dynamic, but I agree there's no need
for it here.

> 
>>   };
>>     struct padded_vnet_hdr {
>> @@ -359,6 +363,7 @@ static void virtnet_xdp_xmit(struct virtnet_info *vi,
>>       }
>>         if (vi->mergeable_rx_bufs) {
>> +        xdp->data -= sizeof(struct virtio_net_hdr_mrg_rxbuf);
> 
> I fail to understand why this is needed. We should have excluded the vnet
> header from xdp->data even before bpf_prog_run_xdp().
> 
>>           /* Zero header and leave csum up to XDP layers */
>>           hdr = xdp->data;
>>           memset(hdr, 0, vi->hdr_len);
>> @@ -375,7 +380,9 @@ static void virtnet_xdp_xmit(struct virtnet_info *vi,
>>           num_sg = 2;
>>           sg_init_table(sq->sg, 2);
>>           sg_set_buf(sq->sg, hdr, vi->hdr_len);
>> -        skb_to_sgvec(skb, sq->sg + 1, 0, skb->len);
>> +        skb_to_sgvec(skb, sq->sg + 1,
>> +                 xdp->data - xdp->data_hard_start,
>> +                 xdp->data_end - xdp->data);
>>       }
>>       err = virtqueue_add_outbuf(sq->vq, sq->sg, num_sg,
>>                      data, GFP_ATOMIC);
>> @@ -401,7 +408,6 @@ static struct sk_buff *receive_small(struct net_device *dev,
>>       struct bpf_prog *xdp_prog;
>>         len -= vi->hdr_len;
>> -    skb_trim(skb, len);
>>         rcu_read_lock();
>>       xdp_prog = rcu_dereference(rq->xdp_prog);
>> @@ -413,11 +419,15 @@ static struct sk_buff *receive_small(struct net_device
>> *dev,
>>           if (unlikely(hdr->hdr.gso_type || hdr->hdr.flags))
>>               goto err_xdp;
>>   -        xdp.data = skb->data;
>> +        xdp.data_hard_start = skb->data;
>> +        xdp.data = skb->data + vi->headroom;
>>           xdp.data_end = xdp.data + len;
>>           act = bpf_prog_run_xdp(xdp_prog, &xdp);
>>           switch (act) {
>>           case XDP_PASS:
>> +            /* Recalculate length in case bpf program changed it */
>> +            len = xdp.data_end - xdp.data;
>> +            __skb_pull(skb, xdp.data - xdp.data_hard_start);
> 
> How about doing this just after bpf_prog_run_xdp() for XDP_TX too? This is more
> readable and there's no need to change the xmit path.

Agreed will do.

> 
>>               break;
>>           case XDP_TX:
>>               virtnet_xdp_xmit(vi, rq, &xdp, skb);
>> @@ -432,6 +442,7 @@ static struct sk_buff *receive_small(struct net_device *dev,
>>       }
>>       rcu_read_unlock();
>>   +    skb_trim(skb, len);
>>       return skb;
>>     err_xdp:
>> @@ -569,7 +580,11 @@ static struct sk_buff *receive_mergeable(struct
>> net_device *dev,
>>           if (unlikely(hdr->hdr.gso_type))
>>               goto err_xdp;
>>   +        /* Allow consuming headroom but reserve enough space to push
>> +         * the descriptor on if we get an XDP_TX return code.
>> +         */
>>           data = page_address(xdp_page) + offset;
>> +        xdp.data_hard_start = data - vi->headroom + desc_room;
> 
> Two possible issues here:
> 
> 1) If we want to adjust the header after linearizing, we should reserve room
> for that page, but I don't see any code for this.
> 2) If the header has been adjusted, it looks like we need to change the offset
> value; otherwise, page_to_skb() won't build a correct skb for us for XDP_PASS.
> 

Both correct, thanks. I'll add a couple of sample programs to catch this as well.

>>           xdp.data = data + desc_room;
>>           xdp.data_end = xdp.data + (len - vi->hdr_len);
>>           act = bpf_prog_run_xdp(xdp_prog, &xdp);
>> @@ -748,20 +763,21 @@ static void receive_buf(struct virtnet_info *vi, struct
>> receive_queue *rq,
>>   static int add_recvbuf_small(struct virtnet_info *vi, struct receive_queue *rq,
>>                    gfp_t gfp)
>>   {
>> +    int headroom = GOOD_PACKET_LEN + vi->headroom;
>>       struct sk_buff *skb;
>>       struct virtio_net_hdr_mrg_rxbuf *hdr;
>>       int err;
>>   -    skb = __netdev_alloc_skb_ip_align(vi->dev, GOOD_PACKET_LEN, gfp);
>> +    skb = __netdev_alloc_skb_ip_align(vi->dev, headroom, gfp);
>>       if (unlikely(!skb))
>>           return -ENOMEM;
>>   -    skb_put(skb, GOOD_PACKET_LEN);
>> +    skb_put(skb, headroom);
>>         hdr = skb_vnet_hdr(skb);
>>       sg_init_table(rq->sg, 2);
>>       sg_set_buf(rq->sg, hdr, vi->hdr_len);
>> -    skb_to_sgvec(skb, rq->sg + 1, 0, skb->len);
>> +    skb_to_sgvec(skb, rq->sg + 1, vi->headroom, skb->len - vi->headroom);
>>         err = virtqueue_add_inbuf(rq->vq, rq->sg, 2, skb, gfp);
>>       if (err < 0)
>> @@ -829,24 +845,27 @@ static unsigned int get_mergeable_buf_len(struct
>> ewma_pkt_len *avg_pkt_len)
>>       return ALIGN(len, MERGEABLE_BUFFER_ALIGN);
>>   }
>>   -static int add_recvbuf_mergeable(struct receive_queue *rq, gfp_t gfp)
>> +static int add_recvbuf_mergeable(struct virtnet_info *vi,
>> +                 struct receive_queue *rq, gfp_t gfp)
>>   {
>>       struct page_frag *alloc_frag = &rq->alloc_frag;
>> +    unsigned int headroom = vi->headroom;
>>       char *buf;
>>       unsigned long ctx;
>>       int err;
>>       unsigned int len, hole;
>>         len = get_mergeable_buf_len(&rq->mrg_avg_pkt_len);
>> -    if (unlikely(!skb_page_frag_refill(len, alloc_frag, gfp)))
>> +    if (unlikely(!skb_page_frag_refill(len + headroom, alloc_frag, gfp)))
>>           return -ENOMEM;
>>         buf = (char *)page_address(alloc_frag->page) + alloc_frag->offset;
>> +    buf += headroom; /* advance address leaving hole at front of pkt */
>>       ctx = mergeable_buf_to_ctx(buf, len);
>>       get_page(alloc_frag->page);
>> -    alloc_frag->offset += len;
>> +    alloc_frag->offset += len + headroom;
>>       hole = alloc_frag->size - alloc_frag->offset;
>> -    if (hole < len) {
>> +    if (hole < len + headroom) {
>>           /* To avoid internal fragmentation, if there is very likely not
>>            * enough space for another buffer, add the remaining space to
>>            * the current buffer. This extra space is not included in
>> @@ -880,7 +899,7 @@ static bool try_fill_recv(struct virtnet_info *vi, struct
>> receive_queue *rq,
>>       gfp |= __GFP_COLD;
>>       do {
>>           if (vi->mergeable_rx_bufs)
>> -            err = add_recvbuf_mergeable(rq, gfp);
>> +            err = add_recvbuf_mergeable(vi, rq, gfp);
>>           else if (vi->big_packets)
>>               err = add_recvbuf_big(vi, rq, gfp);
>>           else
>> @@ -1675,12 +1694,90 @@ static void virtnet_init_settings(struct net_device *dev)
>>       .set_settings = virtnet_set_settings,
>>   };
>>   +#define VIRTIO_XDP_HEADROOM 256
>> +
>> +static int init_vqs(struct virtnet_info *vi);
>> +static void remove_vq_common(struct virtnet_info *vi, bool lock);
>> +
>> +/* Reset virtio device with RTNL held this is very similar to the
>> + * freeze()/restore() logic except we need to ensure locking. It is
>> + * possible that this routine may fail and leave the driver in a
>> + * failed state. However assuming the driver negotiated correctly
>> + * at probe time we _should_ be able to (re)negotiate driver again.
>> + */
> 
> Instead of duplicating code and exporting helpers, why not use
> virtio_device_freeze()/virtio_device_restore()? For rtnl_lock in restore, you
> can probably avoid it by checking rtnl_is_locked() first?

freeze/restore does virtnet_cpu_notif_* work that is not needed
in this case. But the overhead here is maybe minimal.

Michael wanted to create a generic virtio_reset() so let me
give that a try and that should clean this up as well.

.John

> 
>> +static int virtnet_xdp_reset(struct virtnet_info *vi)
>> +{
>> +    struct virtio_device *vdev = vi->vdev;
>> +    unsigned int status;
>> +    int i, ret;
>> +
>> +    /* Disable and unwind rings */
>> +    virtio_config_disable(vdev);
>> +    vdev->failed = vdev->config->get_status(vdev) & VIRTIO_CONFIG_S_FAILED;
>> +
>> +    netif_device_detach(vi->dev);
>> +    cancel_delayed_work_sync(&vi->refill);
>> +    if (netif_running(vi->dev)) {
>> +        for (i = 0; i < vi->max_queue_pairs; i++)
>> +            napi_disable(&vi->rq[i].napi);
>> +    }
>> +
>> +    remove_vq_common(vi, false);
>> +
>> +    /* Do a reset per virtio spec recommendation */
>> +    vdev->config->reset(vdev);
>> +
>> +    /* Acknowledge that we've seen the device. */
>> +    status = vdev->config->get_status(vdev);
>> +    vdev->config->set_status(vdev, status | VIRTIO_CONFIG_S_ACKNOWLEDGE);
>> +
>> +    /* Notify driver is up and finalize features per specification. The
>> +     * error code from finalize features is checked here but should not
>> +     * fail because we assume features were previously synced successfully.
>> +     */
>> +    status = vdev->config->get_status(vdev);
>> +    vdev->config->set_status(vdev, status | VIRTIO_CONFIG_S_DRIVER);
>> +    ret = virtio_finalize_features(vdev);
>> +    if (ret) {
>> +        netdev_warn(vi->dev, "virtio_finalize_features failed during reset
>> aborting\n");
>> +        goto err;
>> +    }
>> +
>> +    ret = init_vqs(vi);
>> +    if (ret) {
>> +        netdev_warn(vi->dev, "init_vqs failed during reset aborting\n");
>> +        goto err;
>> +    }
>> +    virtio_device_ready(vi->vdev);
>> +
>> +    if (netif_running(vi->dev)) {
>> +        for (i = 0; i < vi->curr_queue_pairs; i++)
>> +            if (!try_fill_recv(vi, &vi->rq[i], GFP_KERNEL))
>> +                schedule_delayed_work(&vi->refill, 0);
>> +
>> +        for (i = 0; i < vi->max_queue_pairs; i++)
>> +            virtnet_napi_enable(&vi->rq[i]);
>> +    }
>> +    netif_device_attach(vi->dev);
>> +    /* Finally, tell the device we're all set */
>> +    status = vdev->config->get_status(vdev);
>> +    vdev->config->set_status(vdev, status | VIRTIO_CONFIG_S_DRIVER_OK);
>> +    virtio_config_enable(vdev);
>> +
>> +    return 0;
>> +err:
>> +    status = vdev->config->get_status(vdev);
>> +    vdev->config->set_status(vdev, status | VIRTIO_CONFIG_S_FAILED);
>> +    return ret;
>> +}
> 
> [...]


* Re: [net PATCH v3 2/5] net: virtio: wrap rtnl_lock in test for calling with lock already held
  2017-01-13 17:31     ` John Fastabend
@ 2017-01-13 23:56       ` John Fastabend
  0 siblings, 0 replies; 15+ messages in thread
From: John Fastabend @ 2017-01-13 23:56 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: jasowang, mst, john.r.fastabend, netdev, alexei.starovoitov, daniel

On 17-01-13 09:31 AM, John Fastabend wrote:
> On 17-01-13 08:34 AM, Stephen Hemminger wrote:
>> On Thu, 12 Jan 2017 18:51:00 -0800
>> John Fastabend <john.fastabend@gmail.com> wrote:
>>
>>>  
>>> -static void free_receive_bufs(struct virtnet_info *vi)
>>> +static void free_receive_bufs(struct virtnet_info *vi, bool need_lock)
>>>  {
>>>  	struct bpf_prog *old_prog;
>>>  	int i;
>>>  
>>> -	rtnl_lock();
>>> +	if (need_lock)
>>> +		rtnl_lock();
>>>  	for (i = 0; i < vi->max_queue_pairs; i++) {
>>>  		while (vi->rq[i].pages)
>>>  			__free_pages(get_a_page(&vi->rq[i], GFP_KERNEL), 0);
>>> @@ -1879,7 +1880,8 @@ static void free_receive_bufs(struct virtnet_info *vi)
>>>  		if (old_prog)
>>>  			bpf_prog_put(old_prog);
>>>  	}
>>> -	rtnl_unlock();
>>> +	if (need_lock)
>>> +		rtnl_unlock();
>>>  }
>>
>> Conditional locking is bad idea; sparse complains about it and is later source
>> of bugs. The more typical way of doing this in kernel is:
> 
> OK I'll use the normal form.
> 
>>
>> void _foo(some args)
>> {
>> 	ASSERT_RTNL();
>>
>> 	...
>> }
>>
>> void foo(some args)
>> {
>> 	rtnl_lock();
>> 	_foo(some args)
>> 	rtnl_unlock();
>> }
>>
>>
> 

Actually, doing this without an rtnl_try_lock() is going to create two more
callbacks in the virtio core just for virtio_net. None of the other users
appear to have locking restrictions. How about the following? It at least
avoids the argument passing and the if/else around the locks themselves,
though it still needs the if around rtnl_is_locked().

--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -1864,12 +1864,11 @@ static void virtnet_free_queues(struct virtnet_info *vi)
        kfree(vi->sq);
 }

-static void free_receive_bufs(struct virtnet_info *vi)
+static void _free_receive_bufs(struct virtnet_info *vi)
 {
        struct bpf_prog *old_prog;
        int i;

-       rtnl_lock();
        for (i = 0; i < vi->max_queue_pairs; i++) {
                while (vi->rq[i].pages)
                        __free_pages(get_a_page(&vi->rq[i], GFP_KERNEL), 0);
@@ -1879,6 +1878,12 @@ static void free_receive_bufs(struct virtnet_info *vi)
                if (old_prog)
                        bpf_prog_put(old_prog);
        }
+}
+
+static void free_receive_bufs(struct virtnet_info *vi)
+{
+       rtnl_lock();
+       _free_receive_bufs(vi);
        rtnl_unlock();
 }

@@ -2358,7 +2363,10 @@ static void remove_vq_common(struct virtnet_info *vi)
        /* Free unused buffers in both send and recv, if any. */
        free_unused_bufs(vi);

-       free_receive_bufs(vi);
+       if (rtnl_is_locked())
+               _free_receive_bufs(vi);
+       else
+               free_receive_bufs(vi);

        free_receive_page_frags(vi);

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [net PATCH v3 5/5] virtio_net: XDP support for adjust_head
  2017-01-13 20:08     ` John Fastabend
@ 2017-01-14  0:45       ` John Fastabend
  0 siblings, 0 replies; 15+ messages in thread
From: John Fastabend @ 2017-01-14  0:45 UTC (permalink / raw)
  To: Jason Wang, mst; +Cc: john.r.fastabend, netdev, alexei.starovoitov, daniel

On 17-01-13 12:08 PM, John Fastabend wrote:
> On 17-01-12 11:41 PM, Jason Wang wrote:
>>
>>
>>> On 2017-01-13 10:52, John Fastabend wrote:
>>> Add support for XDP adjust head by allocating a 256B header region
>>> that XDP programs can grow into. This is only enabled when a XDP
>>> program is loaded.
>>>
>>> In order to ensure that we do not have to unwind queue headroom, push
>>> queue setup below bpf_prog_add. It reads better to do a prog ref
>>> unwind vs. another queue setup call.
>>>
>>> At the moment this code must do a full reset to ensure old buffers
>>> without headroom on program add, or with headroom on program removal,
>>> are not used incorrectly in the datapath. Ideally we would only
>>> have to disable/enable the RX queues being updated, but there is no
>>> API in virtio to do this at the moment, so use the big hammer. In
>>> practice it is likely not that big of a problem, as this will only
>>> happen when XDP is enabled/disabled; changing programs does not
>>> require the reset. There is some risk that the driver may either
>>> hit an allocation failure or for some reason fail to correctly
>>> negotiate with the underlying backend; in that case the driver will
>>> be left uninitialized. I have never seen this happen on my test
>>> systems, and for what it's worth the same failure case can occur
>>> from probe and other contexts in the virtio framework.
>>>
>>> Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
>>> ---
>>>   drivers/net/virtio_net.c |  155 ++++++++++++++++++++++++++++++++++++++++------
>>>   drivers/virtio/virtio.c  |    9 ++-
>>>   include/linux/virtio.h   |    3 +
>>>   3 files changed, 144 insertions(+), 23 deletions(-)
>>>
>>> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
>>> index 6041828..8b897e7 100644
>>> --- a/drivers/net/virtio_net.c
>>> +++ b/drivers/net/virtio_net.c
>>> @@ -28,6 +28,7 @@
>>>   #include <linux/slab.h>
>>>   #include <linux/cpu.h>
>>>   #include <linux/average.h>
>>> +#include <linux/pci.h>
>>>   #include <net/busy_poll.h>
>>>     static int napi_weight = NAPI_POLL_WEIGHT;
>>> @@ -159,6 +160,9 @@ struct virtnet_info {
>>>       /* Ethtool settings */
>>>       u8 duplex;
>>>       u32 speed;
>>> +
>>> +    /* Headroom allocated in RX Queue */
>>> +    unsigned int headroom;
>>
>> If this cannot be changed in any way, better to use a macro instead of a field
>> here. Then there's no need to add an extra parameter to
>> add_recvbuf_mergeable().
> 
> OK, originally I thought this might be dynamic, but I agree there's no
> need for it here.
> 

Well, there is a bit of an order-of-operations issue that means we need at
least some bit here to tell us an enablement is pending.

The problem is that when we do the reset we need to know that headroom for
XDP is needed. But we can't use the xdp_prog values, because xdp_prog
cannot be attached to a device that is up without headroom, otherwise the
program could fail. Plus, reset via freeze/restore tears these structures
down and rebuilds them.

How about a boolean here instead of an unsigned int:

	bool xdp_headroom_needed;

That seems better than an int.

Thanks,
John

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2017-01-14  0:46 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-01-13  2:50 [net PATCH v3 0/5] virtio_net XDP fixes and adjust_header support John Fastabend
2017-01-13  2:50 ` [net PATCH v3 1/5] virtio_net: use dev_kfree_skb for small buffer XDP receive John Fastabend
2017-01-13  3:47   ` Jason Wang
2017-01-13  2:51 ` [net PATCH v3 2/5] net: virtio: wrap rtnl_lock in test for calling with lock already held John Fastabend
2017-01-13 16:34   ` Stephen Hemminger
2017-01-13 17:31     ` John Fastabend
2017-01-13 23:56       ` John Fastabend
2017-01-13  2:51 ` [net PATCH v3 3/5] virtio_net: factor out xdp handler for readability John Fastabend
2017-01-13  7:40   ` Jason Wang
2017-01-13 19:56     ` John Fastabend
2017-01-13  2:51 ` [net PATCH v3 4/5] virtio_net: remove duplicate queue pair binding in XDP John Fastabend
2017-01-13  2:52 ` [net PATCH v3 5/5] virtio_net: XDP support for adjust_head John Fastabend
2017-01-13  7:41   ` Jason Wang
2017-01-13 20:08     ` John Fastabend
2017-01-14  0:45       ` John Fastabend
