All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH net-next 0/2] virtio-net: re enable XDP_REDIRECT for mergeable buffer
@ 2018-03-01  3:19 ` Jason Wang
  0 siblings, 0 replies; 36+ messages in thread
From: Jason Wang @ 2018-03-01  3:19 UTC (permalink / raw)
  To: mst, virtualization, netdev, linux-kernel
  Cc: brouer, john.fastabend, Jason Wang

Hi:

This series tries to re-enable XDP_REDIRECT for mergeable buffer which
was removed since commit 7324f5399b06 ("virtio_net: disable
XDP_REDIRECT in receive_mergeable() case"). Main concerns are:

- not enough tailroom was reserved which breaks cpumap
- complex logic like EWMA and linearizing during XDP processing

Fix those by:

- reserve enough tailroom during refill
- disable EWMA and use fixed size of rx buffer
- drop linearizing logic and offload it to generic XDP routine, this
  could happen only when the buffer were refilled before XDP set, so
  we could simply ignore the negative performance impact.

Please review.

Thanks

Jason Wang (2):
  virtio-net: re enable XDP_REDIRECT for mergeable buffer
  virtio-net: simplify XDP handling in small buffer

 drivers/net/virtio_net.c | 186 ++++++++++++++++++-----------------------------
 1 file changed, 70 insertions(+), 116 deletions(-)

-- 
2.7.4

^ permalink raw reply	[flat|nested] 36+ messages in thread

* [PATCH net-next 0/2] virtio-net: re enable XDP_REDIRECT for mergeable buffer
@ 2018-03-01  3:19 ` Jason Wang
  0 siblings, 0 replies; 36+ messages in thread
From: Jason Wang @ 2018-03-01  3:19 UTC (permalink / raw)
  To: mst, virtualization, netdev, linux-kernel; +Cc: john.fastabend, brouer

Hi:

This series tries to re-enable XDP_REDIRECT for mergeable buffer which
was removed since commit 7324f5399b06 ("virtio_net: disable
XDP_REDIRECT in receive_mergeable() case"). Main concerns are:

- not enough tailroom was reserved which breaks cpumap
- complex logic like EWMA and linearizing during XDP processing

Fix those by:

- reserve enough tailroom during refill
- disable EWMA and use fixed size of rx buffer
- drop linearizing logic and offload it to generic XDP routine, this
  could happen only when the buffer were refilled before XDP set, so
  we could simply ignore the negative performance impact.

Please review.

Thanks

Jason Wang (2):
  virtio-net: re enable XDP_REDIRECT for mergeable buffer
  virtio-net: simplify XDP handling in small buffer

 drivers/net/virtio_net.c | 186 ++++++++++++++++++-----------------------------
 1 file changed, 70 insertions(+), 116 deletions(-)

-- 
2.7.4

^ permalink raw reply	[flat|nested] 36+ messages in thread

* [PATCH net-next 1/2] virtio-net: re enable XDP_REDIRECT for mergeable buffer
  2018-03-01  3:19 ` Jason Wang
  (?)
  (?)
@ 2018-03-01  3:19 ` Jason Wang
  2018-03-01  8:41   ` Jesper Dangaard Brouer
                     ` (2 more replies)
  -1 siblings, 3 replies; 36+ messages in thread
From: Jason Wang @ 2018-03-01  3:19 UTC (permalink / raw)
  To: mst, virtualization, netdev, linux-kernel
  Cc: brouer, john.fastabend, Jason Wang

XDP_REDIRECT support for mergeable buffer was removed since commit
7324f5399b06 ("virtio_net: disable XDP_REDIRECT in receive_mergeable()
case"). This is because we don't reserve enough tailroom for struct
skb_shared_info which breaks XDP assumption. Other complaints are, the
complex linearize logic and EWMA estimation may increase the
possibility of linearizing.

Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 drivers/net/virtio_net.c | 107 +++++++++++++++++++++++++++++------------------
 1 file changed, 67 insertions(+), 40 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 9bb9e56..81190ba 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -537,6 +537,26 @@ static struct page *xdp_linearize_page(struct receive_queue *rq,
 	return NULL;
 }
 
+static struct sk_buff *virtnet_skb_xdp(struct receive_queue *rq,
+				       struct sk_buff *skb)
+{
+	struct bpf_prog *xdp_prog;
+	int ret;
+
+	rcu_read_lock();
+	xdp_prog = rcu_dereference(rq->xdp_prog);
+	if (xdp_prog) {
+		ret = do_xdp_generic(xdp_prog, skb);
+		if (ret != XDP_PASS) {
+			rcu_read_unlock();
+			return NULL;
+		}
+	}
+	rcu_read_unlock();
+
+	return skb;
+}
+
 static struct sk_buff *receive_small(struct net_device *dev,
 				     struct virtnet_info *vi,
 				     struct receive_queue *rq,
@@ -689,31 +709,30 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
 	struct bpf_prog *xdp_prog;
 	unsigned int truesize;
 	unsigned int headroom = mergeable_ctx_to_headroom(ctx);
-	bool sent;
+	bool sent, skb_xdp = false;
+	int err;
 
 	head_skb = NULL;
 
 	rcu_read_lock();
 	xdp_prog = rcu_dereference(rq->xdp_prog);
 	if (xdp_prog) {
-		struct page *xdp_page;
 		struct xdp_buff xdp;
 		void *data;
 		u32 act;
 
-		/* This happens when rx buffer size is underestimated */
+		/* This happens when rx buffer size is underestimated
+		 * or headroom is not enough because of the buffer
+		 * was refilled before XDP is set. In both cases,
+		 * for simplicity, we will offload them to generic
+		 * XDP routine. This should only happen for the first
+		 * several packets, so we don't care much about its
+		 * performance.
+		 */
 		if (unlikely(num_buf > 1 ||
 			     headroom < virtnet_get_headroom(vi))) {
-			/* linearize data for XDP */
-			xdp_page = xdp_linearize_page(rq, &num_buf,
-						      page, offset,
-						      VIRTIO_XDP_HEADROOM,
-						      &len);
-			if (!xdp_page)
-				goto err_xdp;
-			offset = VIRTIO_XDP_HEADROOM;
-		} else {
-			xdp_page = page;
+			skb_xdp = true;
+			goto skb_xdp;
 		}
 
 		/* Transient failure which in theory could occur if
@@ -727,7 +746,7 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
 		/* Allow consuming headroom but reserve enough space to push
 		 * the descriptor on if we get an XDP_TX return code.
 		 */
-		data = page_address(xdp_page) + offset;
+		data = page_address(page) + offset;
 		xdp.data_hard_start = data - VIRTIO_XDP_HEADROOM + vi->hdr_len;
 		xdp.data = data + vi->hdr_len;
 		xdp_set_data_meta_invalid(&xdp);
@@ -736,9 +755,6 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
 
 		act = bpf_prog_run_xdp(xdp_prog, &xdp);
 
-		if (act != XDP_PASS)
-			ewma_pkt_len_add(&rq->mrg_avg_pkt_len, len);
-
 		switch (act) {
 		case XDP_PASS:
 			/* recalculate offset to account for any header
@@ -746,28 +762,22 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
 			 * skb and avoid using offset
 			 */
 			offset = xdp.data -
-					page_address(xdp_page) - vi->hdr_len;
-
-			/* We can only create skb based on xdp_page. */
-			if (unlikely(xdp_page != page)) {
-				rcu_read_unlock();
-				put_page(page);
-				head_skb = page_to_skb(vi, rq, xdp_page,
-						       offset, len, PAGE_SIZE);
-				return head_skb;
-			}
+					page_address(page) - vi->hdr_len;
 			break;
 		case XDP_TX:
 			sent = __virtnet_xdp_xmit(vi, &xdp);
 			if (unlikely(!sent)) {
 				trace_xdp_exception(vi->dev, xdp_prog, act);
-				if (unlikely(xdp_page != page))
-					put_page(xdp_page);
 				goto err_xdp;
 			}
 			*xdp_xmit = true;
-			if (unlikely(xdp_page != page))
+			rcu_read_unlock();
+			goto xdp_xmit;
+		case XDP_REDIRECT:
+			err = xdp_do_redirect(dev, &xdp, xdp_prog);
+			if (err)
 				goto err_xdp;
+			*xdp_xmit = true;
 			rcu_read_unlock();
 			goto xdp_xmit;
 		default:
@@ -775,13 +785,12 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
 		case XDP_ABORTED:
 			trace_xdp_exception(vi->dev, xdp_prog, act);
 		case XDP_DROP:
-			if (unlikely(xdp_page != page))
-				__free_pages(xdp_page, 0);
 			goto err_xdp;
 		}
 	}
 	rcu_read_unlock();
 
+skb_xdp:
 	truesize = mergeable_ctx_to_truesize(ctx);
 	if (unlikely(len > truesize)) {
 		pr_debug("%s: rx error: len %u exceeds truesize %lu\n",
@@ -848,7 +857,11 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
 		}
 	}
 
-	ewma_pkt_len_add(&rq->mrg_avg_pkt_len, head_skb->len);
+	if (skb_xdp)
+		head_skb = virtnet_skb_xdp(rq, head_skb);
+	else
+		ewma_pkt_len_add(&rq->mrg_avg_pkt_len, head_skb->len);
+
 	return head_skb;
 
 err_xdp:
@@ -1013,13 +1026,18 @@ static int add_recvbuf_big(struct virtnet_info *vi, struct receive_queue *rq,
 }
 
 static unsigned int get_mergeable_buf_len(struct receive_queue *rq,
-					  struct ewma_pkt_len *avg_pkt_len)
+					  struct ewma_pkt_len *avg_pkt_len,
+					  unsigned int room)
 {
 	const size_t hdr_len = sizeof(struct virtio_net_hdr_mrg_rxbuf);
 	unsigned int len;
 
-	len = hdr_len + clamp_t(unsigned int, ewma_pkt_len_read(avg_pkt_len),
+	if (room)
+		return PAGE_SIZE - room;
+
+	len = hdr_len +	clamp_t(unsigned int, ewma_pkt_len_read(avg_pkt_len),
 				rq->min_buf_len, PAGE_SIZE - hdr_len);
+
 	return ALIGN(len, L1_CACHE_BYTES);
 }
 
@@ -1028,21 +1046,27 @@ static int add_recvbuf_mergeable(struct virtnet_info *vi,
 {
 	struct page_frag *alloc_frag = &rq->alloc_frag;
 	unsigned int headroom = virtnet_get_headroom(vi);
+	unsigned int tailroom = headroom ? sizeof(struct skb_shared_info) : 0;
+	unsigned int room = SKB_DATA_ALIGN(headroom + tailroom);
 	char *buf;
 	void *ctx;
 	int err;
 	unsigned int len, hole;
 
-	len = get_mergeable_buf_len(rq, &rq->mrg_avg_pkt_len);
-	if (unlikely(!skb_page_frag_refill(len + headroom, alloc_frag, gfp)))
+	/* Extra tailroom is needed to satisfy XDP's assumption. This
+	 * means rx frags coalescing won't work, but consider we've
+	 * disabled GSO for XDP, it won't be a big issue.
+	 */
+	len = get_mergeable_buf_len(rq, &rq->mrg_avg_pkt_len, room);
+	if (unlikely(!skb_page_frag_refill(len + room, alloc_frag, gfp)))
 		return -ENOMEM;
 
 	buf = (char *)page_address(alloc_frag->page) + alloc_frag->offset;
 	buf += headroom; /* advance address leaving hole at front of pkt */
 	get_page(alloc_frag->page);
-	alloc_frag->offset += len + headroom;
+	alloc_frag->offset += len + room;
 	hole = alloc_frag->size - alloc_frag->offset;
-	if (hole < len + headroom) {
+	if (hole < len + room) {
 		/* To avoid internal fragmentation, if there is very likely not
 		 * enough space for another buffer, add the remaining space to
 		 * the current buffer.
@@ -2576,12 +2600,15 @@ static ssize_t mergeable_rx_buffer_size_show(struct netdev_rx_queue *queue,
 {
 	struct virtnet_info *vi = netdev_priv(queue->dev);
 	unsigned int queue_index = get_netdev_rx_queue_index(queue);
+	unsigned int headroom = virtnet_get_headroom(vi);
+	unsigned int tailroom = headroom ? sizeof(struct skb_shared_info) : 0;
 	struct ewma_pkt_len *avg;
 
 	BUG_ON(queue_index >= vi->max_queue_pairs);
 	avg = &vi->rq[queue_index].mrg_avg_pkt_len;
 	return sprintf(buf, "%u\n",
-		       get_mergeable_buf_len(&vi->rq[queue_index], avg));
+		       get_mergeable_buf_len(&vi->rq[queue_index], avg,
+				       SKB_DATA_ALIGN(headroom + tailroom)));
 }
 
 static struct rx_queue_attribute mergeable_rx_buffer_size_attribute =
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH net-next 1/2] virtio-net: re enable XDP_REDIRECT for mergeable buffer
  2018-03-01  3:19 ` Jason Wang
  (?)
@ 2018-03-01  3:19 ` Jason Wang
  -1 siblings, 0 replies; 36+ messages in thread
From: Jason Wang @ 2018-03-01  3:19 UTC (permalink / raw)
  To: mst, virtualization, netdev, linux-kernel; +Cc: john.fastabend, brouer

XDP_REDIRECT support for mergeable buffer was removed since commit
7324f5399b06 ("virtio_net: disable XDP_REDIRECT in receive_mergeable()
case"). This is because we don't reserve enough tailroom for struct
skb_shared_info which breaks XDP assumption. Other complaints are, the
complex linearize logic and EWMA estimation may increase the
possibility of linearizing.

Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 drivers/net/virtio_net.c | 107 +++++++++++++++++++++++++++++------------------
 1 file changed, 67 insertions(+), 40 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 9bb9e56..81190ba 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -537,6 +537,26 @@ static struct page *xdp_linearize_page(struct receive_queue *rq,
 	return NULL;
 }
 
+static struct sk_buff *virtnet_skb_xdp(struct receive_queue *rq,
+				       struct sk_buff *skb)
+{
+	struct bpf_prog *xdp_prog;
+	int ret;
+
+	rcu_read_lock();
+	xdp_prog = rcu_dereference(rq->xdp_prog);
+	if (xdp_prog) {
+		ret = do_xdp_generic(xdp_prog, skb);
+		if (ret != XDP_PASS) {
+			rcu_read_unlock();
+			return NULL;
+		}
+	}
+	rcu_read_unlock();
+
+	return skb;
+}
+
 static struct sk_buff *receive_small(struct net_device *dev,
 				     struct virtnet_info *vi,
 				     struct receive_queue *rq,
@@ -689,31 +709,30 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
 	struct bpf_prog *xdp_prog;
 	unsigned int truesize;
 	unsigned int headroom = mergeable_ctx_to_headroom(ctx);
-	bool sent;
+	bool sent, skb_xdp = false;
+	int err;
 
 	head_skb = NULL;
 
 	rcu_read_lock();
 	xdp_prog = rcu_dereference(rq->xdp_prog);
 	if (xdp_prog) {
-		struct page *xdp_page;
 		struct xdp_buff xdp;
 		void *data;
 		u32 act;
 
-		/* This happens when rx buffer size is underestimated */
+		/* This happens when rx buffer size is underestimated
+		 * or headroom is not enough because of the buffer
+		 * was refilled before XDP is set. In both cases,
+		 * for simplicity, we will offload them to generic
+		 * XDP routine. This should only happen for the first
+		 * several packets, so we don't care much about its
+		 * performance.
+		 */
 		if (unlikely(num_buf > 1 ||
 			     headroom < virtnet_get_headroom(vi))) {
-			/* linearize data for XDP */
-			xdp_page = xdp_linearize_page(rq, &num_buf,
-						      page, offset,
-						      VIRTIO_XDP_HEADROOM,
-						      &len);
-			if (!xdp_page)
-				goto err_xdp;
-			offset = VIRTIO_XDP_HEADROOM;
-		} else {
-			xdp_page = page;
+			skb_xdp = true;
+			goto skb_xdp;
 		}
 
 		/* Transient failure which in theory could occur if
@@ -727,7 +746,7 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
 		/* Allow consuming headroom but reserve enough space to push
 		 * the descriptor on if we get an XDP_TX return code.
 		 */
-		data = page_address(xdp_page) + offset;
+		data = page_address(page) + offset;
 		xdp.data_hard_start = data - VIRTIO_XDP_HEADROOM + vi->hdr_len;
 		xdp.data = data + vi->hdr_len;
 		xdp_set_data_meta_invalid(&xdp);
@@ -736,9 +755,6 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
 
 		act = bpf_prog_run_xdp(xdp_prog, &xdp);
 
-		if (act != XDP_PASS)
-			ewma_pkt_len_add(&rq->mrg_avg_pkt_len, len);
-
 		switch (act) {
 		case XDP_PASS:
 			/* recalculate offset to account for any header
@@ -746,28 +762,22 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
 			 * skb and avoid using offset
 			 */
 			offset = xdp.data -
-					page_address(xdp_page) - vi->hdr_len;
-
-			/* We can only create skb based on xdp_page. */
-			if (unlikely(xdp_page != page)) {
-				rcu_read_unlock();
-				put_page(page);
-				head_skb = page_to_skb(vi, rq, xdp_page,
-						       offset, len, PAGE_SIZE);
-				return head_skb;
-			}
+					page_address(page) - vi->hdr_len;
 			break;
 		case XDP_TX:
 			sent = __virtnet_xdp_xmit(vi, &xdp);
 			if (unlikely(!sent)) {
 				trace_xdp_exception(vi->dev, xdp_prog, act);
-				if (unlikely(xdp_page != page))
-					put_page(xdp_page);
 				goto err_xdp;
 			}
 			*xdp_xmit = true;
-			if (unlikely(xdp_page != page))
+			rcu_read_unlock();
+			goto xdp_xmit;
+		case XDP_REDIRECT:
+			err = xdp_do_redirect(dev, &xdp, xdp_prog);
+			if (err)
 				goto err_xdp;
+			*xdp_xmit = true;
 			rcu_read_unlock();
 			goto xdp_xmit;
 		default:
@@ -775,13 +785,12 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
 		case XDP_ABORTED:
 			trace_xdp_exception(vi->dev, xdp_prog, act);
 		case XDP_DROP:
-			if (unlikely(xdp_page != page))
-				__free_pages(xdp_page, 0);
 			goto err_xdp;
 		}
 	}
 	rcu_read_unlock();
 
+skb_xdp:
 	truesize = mergeable_ctx_to_truesize(ctx);
 	if (unlikely(len > truesize)) {
 		pr_debug("%s: rx error: len %u exceeds truesize %lu\n",
@@ -848,7 +857,11 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
 		}
 	}
 
-	ewma_pkt_len_add(&rq->mrg_avg_pkt_len, head_skb->len);
+	if (skb_xdp)
+		head_skb = virtnet_skb_xdp(rq, head_skb);
+	else
+		ewma_pkt_len_add(&rq->mrg_avg_pkt_len, head_skb->len);
+
 	return head_skb;
 
 err_xdp:
@@ -1013,13 +1026,18 @@ static int add_recvbuf_big(struct virtnet_info *vi, struct receive_queue *rq,
 }
 
 static unsigned int get_mergeable_buf_len(struct receive_queue *rq,
-					  struct ewma_pkt_len *avg_pkt_len)
+					  struct ewma_pkt_len *avg_pkt_len,
+					  unsigned int room)
 {
 	const size_t hdr_len = sizeof(struct virtio_net_hdr_mrg_rxbuf);
 	unsigned int len;
 
-	len = hdr_len + clamp_t(unsigned int, ewma_pkt_len_read(avg_pkt_len),
+	if (room)
+		return PAGE_SIZE - room;
+
+	len = hdr_len +	clamp_t(unsigned int, ewma_pkt_len_read(avg_pkt_len),
 				rq->min_buf_len, PAGE_SIZE - hdr_len);
+
 	return ALIGN(len, L1_CACHE_BYTES);
 }
 
@@ -1028,21 +1046,27 @@ static int add_recvbuf_mergeable(struct virtnet_info *vi,
 {
 	struct page_frag *alloc_frag = &rq->alloc_frag;
 	unsigned int headroom = virtnet_get_headroom(vi);
+	unsigned int tailroom = headroom ? sizeof(struct skb_shared_info) : 0;
+	unsigned int room = SKB_DATA_ALIGN(headroom + tailroom);
 	char *buf;
 	void *ctx;
 	int err;
 	unsigned int len, hole;
 
-	len = get_mergeable_buf_len(rq, &rq->mrg_avg_pkt_len);
-	if (unlikely(!skb_page_frag_refill(len + headroom, alloc_frag, gfp)))
+	/* Extra tailroom is needed to satisfy XDP's assumption. This
+	 * means rx frags coalescing won't work, but consider we've
+	 * disabled GSO for XDP, it won't be a big issue.
+	 */
+	len = get_mergeable_buf_len(rq, &rq->mrg_avg_pkt_len, room);
+	if (unlikely(!skb_page_frag_refill(len + room, alloc_frag, gfp)))
 		return -ENOMEM;
 
 	buf = (char *)page_address(alloc_frag->page) + alloc_frag->offset;
 	buf += headroom; /* advance address leaving hole at front of pkt */
 	get_page(alloc_frag->page);
-	alloc_frag->offset += len + headroom;
+	alloc_frag->offset += len + room;
 	hole = alloc_frag->size - alloc_frag->offset;
-	if (hole < len + headroom) {
+	if (hole < len + room) {
 		/* To avoid internal fragmentation, if there is very likely not
 		 * enough space for another buffer, add the remaining space to
 		 * the current buffer.
@@ -2576,12 +2600,15 @@ static ssize_t mergeable_rx_buffer_size_show(struct netdev_rx_queue *queue,
 {
 	struct virtnet_info *vi = netdev_priv(queue->dev);
 	unsigned int queue_index = get_netdev_rx_queue_index(queue);
+	unsigned int headroom = virtnet_get_headroom(vi);
+	unsigned int tailroom = headroom ? sizeof(struct skb_shared_info) : 0;
 	struct ewma_pkt_len *avg;
 
 	BUG_ON(queue_index >= vi->max_queue_pairs);
 	avg = &vi->rq[queue_index].mrg_avg_pkt_len;
 	return sprintf(buf, "%u\n",
-		       get_mergeable_buf_len(&vi->rq[queue_index], avg));
+		       get_mergeable_buf_len(&vi->rq[queue_index], avg,
+				       SKB_DATA_ALIGN(headroom + tailroom)));
 }
 
 static struct rx_queue_attribute mergeable_rx_buffer_size_attribute =
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH net-next 2/2] virtio-net: simplify XDP handling in small buffer
  2018-03-01  3:19 ` Jason Wang
                   ` (2 preceding siblings ...)
  (?)
@ 2018-03-01  3:19 ` Jason Wang
  2018-03-01  8:02   ` Jesper Dangaard Brouer
  2018-03-01  8:02   ` Jesper Dangaard Brouer
  -1 siblings, 2 replies; 36+ messages in thread
From: Jason Wang @ 2018-03-01  3:19 UTC (permalink / raw)
  To: mst, virtualization, netdev, linux-kernel
  Cc: brouer, john.fastabend, Jason Wang

We used to do data copy through xdp_linearize_page() for the buffer
without sufficient headroom, it brings extra complexity without
helping for the performance. So this patch remove it and switch to use
generic XDP routine to handle this case.

Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 drivers/net/virtio_net.c | 93 ++++++------------------------------------------
 1 file changed, 10 insertions(+), 83 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 81190ba..3f14948 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -474,69 +474,6 @@ static unsigned int virtnet_get_headroom(struct virtnet_info *vi)
 	return vi->xdp_queue_pairs ? VIRTIO_XDP_HEADROOM : 0;
 }
 
-/* We copy the packet for XDP in the following cases:
- *
- * 1) Packet is scattered across multiple rx buffers.
- * 2) Headroom space is insufficient.
- *
- * This is inefficient but it's a temporary condition that
- * we hit right after XDP is enabled and until queue is refilled
- * with large buffers with sufficient headroom - so it should affect
- * at most queue size packets.
- * Afterwards, the conditions to enable
- * XDP should preclude the underlying device from sending packets
- * across multiple buffers (num_buf > 1), and we make sure buffers
- * have enough headroom.
- */
-static struct page *xdp_linearize_page(struct receive_queue *rq,
-				       u16 *num_buf,
-				       struct page *p,
-				       int offset,
-				       int page_off,
-				       unsigned int *len)
-{
-	struct page *page = alloc_page(GFP_ATOMIC);
-
-	if (!page)
-		return NULL;
-
-	memcpy(page_address(page) + page_off, page_address(p) + offset, *len);
-	page_off += *len;
-
-	while (--*num_buf) {
-		unsigned int buflen;
-		void *buf;
-		int off;
-
-		buf = virtqueue_get_buf(rq->vq, &buflen);
-		if (unlikely(!buf))
-			goto err_buf;
-
-		p = virt_to_head_page(buf);
-		off = buf - page_address(p);
-
-		/* guard against a misconfigured or uncooperative backend that
-		 * is sending packet larger than the MTU.
-		 */
-		if ((page_off + buflen) > PAGE_SIZE) {
-			put_page(p);
-			goto err_buf;
-		}
-
-		memcpy(page_address(page) + page_off,
-		       page_address(p) + off, buflen);
-		page_off += buflen;
-		put_page(p);
-	}
-
-	/* Headroom does not contribute to packet length */
-	*len = page_off - VIRTIO_XDP_HEADROOM;
-	return page;
-err_buf:
-	__free_pages(page, 0);
-	return NULL;
-}
-
 static struct sk_buff *virtnet_skb_xdp(struct receive_queue *rq,
 				       struct sk_buff *skb)
 {
@@ -573,8 +510,7 @@ static struct sk_buff *receive_small(struct net_device *dev,
 			      SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
 	struct page *page = virt_to_head_page(buf);
 	unsigned int delta = 0;
-	struct page *xdp_page;
-	bool sent;
+	bool sent, skb_xdp = false;
 	int err;
 
 	len -= vi->hdr_len;
@@ -590,25 +526,14 @@ static struct sk_buff *receive_small(struct net_device *dev,
 		if (unlikely(hdr->hdr.gso_type))
 			goto err_xdp;
 
+		/* This happnes when headroom is not enough because
+		 * the buffer was refilled before XDP is set. This
+		 * only happen for several packets, for simplicity,
+		 * offload them to generic XDP routine.
+		 */
 		if (unlikely(xdp_headroom < virtnet_get_headroom(vi))) {
-			int offset = buf - page_address(page) + header_offset;
-			unsigned int tlen = len + vi->hdr_len;
-			u16 num_buf = 1;
-
-			xdp_headroom = virtnet_get_headroom(vi);
-			header_offset = VIRTNET_RX_PAD + xdp_headroom;
-			headroom = vi->hdr_len + header_offset;
-			buflen = SKB_DATA_ALIGN(GOOD_PACKET_LEN + headroom) +
-				 SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
-			xdp_page = xdp_linearize_page(rq, &num_buf, page,
-						      offset, header_offset,
-						      &tlen);
-			if (!xdp_page)
-				goto err_xdp;
-
-			buf = page_address(xdp_page);
-			put_page(page);
-			page = xdp_page;
+			skb_xdp = true;
+			goto skb_xdp;
 		}
 
 		xdp.data_hard_start = buf + VIRTNET_RX_PAD + vi->hdr_len;
@@ -650,6 +575,7 @@ static struct sk_buff *receive_small(struct net_device *dev,
 	}
 	rcu_read_unlock();
 
+skb_xdp:
 	skb = build_skb(buf, buflen);
 	if (!skb) {
 		put_page(page);
@@ -662,6 +588,7 @@ static struct sk_buff *receive_small(struct net_device *dev,
 		memcpy(skb_vnet_hdr(skb), buf, vi->hdr_len);
 	} /* keep zeroed vnet hdr since packet was changed by bpf */
 
+	skb = virtnet_skb_xdp(rq, skb);
 err:
 	return skb;
 
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH net-next 2/2] virtio-net: simplify XDP handling in small buffer
  2018-03-01  3:19 ` Jason Wang
                   ` (3 preceding siblings ...)
  (?)
@ 2018-03-01  3:19 ` Jason Wang
  -1 siblings, 0 replies; 36+ messages in thread
From: Jason Wang @ 2018-03-01  3:19 UTC (permalink / raw)
  To: mst, virtualization, netdev, linux-kernel; +Cc: john.fastabend, brouer

We used to do data copy through xdp_linearize_page() for the buffer
without sufficient headroom, it brings extra complexity without
helping for the performance. So this patch remove it and switch to use
generic XDP routine to handle this case.

Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 drivers/net/virtio_net.c | 93 ++++++------------------------------------------
 1 file changed, 10 insertions(+), 83 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 81190ba..3f14948 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -474,69 +474,6 @@ static unsigned int virtnet_get_headroom(struct virtnet_info *vi)
 	return vi->xdp_queue_pairs ? VIRTIO_XDP_HEADROOM : 0;
 }
 
-/* We copy the packet for XDP in the following cases:
- *
- * 1) Packet is scattered across multiple rx buffers.
- * 2) Headroom space is insufficient.
- *
- * This is inefficient but it's a temporary condition that
- * we hit right after XDP is enabled and until queue is refilled
- * with large buffers with sufficient headroom - so it should affect
- * at most queue size packets.
- * Afterwards, the conditions to enable
- * XDP should preclude the underlying device from sending packets
- * across multiple buffers (num_buf > 1), and we make sure buffers
- * have enough headroom.
- */
-static struct page *xdp_linearize_page(struct receive_queue *rq,
-				       u16 *num_buf,
-				       struct page *p,
-				       int offset,
-				       int page_off,
-				       unsigned int *len)
-{
-	struct page *page = alloc_page(GFP_ATOMIC);
-
-	if (!page)
-		return NULL;
-
-	memcpy(page_address(page) + page_off, page_address(p) + offset, *len);
-	page_off += *len;
-
-	while (--*num_buf) {
-		unsigned int buflen;
-		void *buf;
-		int off;
-
-		buf = virtqueue_get_buf(rq->vq, &buflen);
-		if (unlikely(!buf))
-			goto err_buf;
-
-		p = virt_to_head_page(buf);
-		off = buf - page_address(p);
-
-		/* guard against a misconfigured or uncooperative backend that
-		 * is sending packet larger than the MTU.
-		 */
-		if ((page_off + buflen) > PAGE_SIZE) {
-			put_page(p);
-			goto err_buf;
-		}
-
-		memcpy(page_address(page) + page_off,
-		       page_address(p) + off, buflen);
-		page_off += buflen;
-		put_page(p);
-	}
-
-	/* Headroom does not contribute to packet length */
-	*len = page_off - VIRTIO_XDP_HEADROOM;
-	return page;
-err_buf:
-	__free_pages(page, 0);
-	return NULL;
-}
-
 static struct sk_buff *virtnet_skb_xdp(struct receive_queue *rq,
 				       struct sk_buff *skb)
 {
@@ -573,8 +510,7 @@ static struct sk_buff *receive_small(struct net_device *dev,
 			      SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
 	struct page *page = virt_to_head_page(buf);
 	unsigned int delta = 0;
-	struct page *xdp_page;
-	bool sent;
+	bool sent, skb_xdp = false;
 	int err;
 
 	len -= vi->hdr_len;
@@ -590,25 +526,14 @@ static struct sk_buff *receive_small(struct net_device *dev,
 		if (unlikely(hdr->hdr.gso_type))
 			goto err_xdp;
 
+		/* This happnes when headroom is not enough because
+		 * the buffer was refilled before XDP is set. This
+		 * only happen for several packets, for simplicity,
+		 * offload them to generic XDP routine.
+		 */
 		if (unlikely(xdp_headroom < virtnet_get_headroom(vi))) {
-			int offset = buf - page_address(page) + header_offset;
-			unsigned int tlen = len + vi->hdr_len;
-			u16 num_buf = 1;
-
-			xdp_headroom = virtnet_get_headroom(vi);
-			header_offset = VIRTNET_RX_PAD + xdp_headroom;
-			headroom = vi->hdr_len + header_offset;
-			buflen = SKB_DATA_ALIGN(GOOD_PACKET_LEN + headroom) +
-				 SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
-			xdp_page = xdp_linearize_page(rq, &num_buf, page,
-						      offset, header_offset,
-						      &tlen);
-			if (!xdp_page)
-				goto err_xdp;
-
-			buf = page_address(xdp_page);
-			put_page(page);
-			page = xdp_page;
+			skb_xdp = true;
+			goto skb_xdp;
 		}
 
 		xdp.data_hard_start = buf + VIRTNET_RX_PAD + vi->hdr_len;
@@ -650,6 +575,7 @@ static struct sk_buff *receive_small(struct net_device *dev,
 	}
 	rcu_read_unlock();
 
+skb_xdp:
 	skb = build_skb(buf, buflen);
 	if (!skb) {
 		put_page(page);
@@ -662,6 +588,7 @@ static struct sk_buff *receive_small(struct net_device *dev,
 		memcpy(skb_vnet_hdr(skb), buf, vi->hdr_len);
 	} /* keep zeroed vnet hdr since packet was changed by bpf */
 
+	skb = virtnet_skb_xdp(rq, skb);
 err:
 	return skb;
 
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* Re: [PATCH net-next 2/2] virtio-net: simplify XDP handling in small buffer
  2018-03-01  3:19 ` [PATCH net-next 2/2] virtio-net: simplify XDP handling in small buffer Jason Wang
  2018-03-01  8:02   ` Jesper Dangaard Brouer
@ 2018-03-01  8:02   ` Jesper Dangaard Brouer
  2018-03-01  8:49     ` Jason Wang
  2018-03-01  8:49     ` Jason Wang
  1 sibling, 2 replies; 36+ messages in thread
From: Jesper Dangaard Brouer @ 2018-03-01  8:02 UTC (permalink / raw)
  To: Jason Wang
  Cc: mst, virtualization, netdev, linux-kernel, john.fastabend, brouer


On Thu,  1 Mar 2018 11:19:05 +0800 Jason Wang <jasowang@redhat.com> wrote:

> We used to do data copy through xdp_linearize_page() for the buffer
> without sufficient headroom, it brings extra complexity without
> helping for the performance. So this patch remove it and switch to use
> generic XDP routine to handle this case.

I don't like where this is going.  I don't like intermixing the native
XDP and generic XDP in this way, for several reasons:

1. XDP generic is not feature complete, e.g. cpumap will drop these
   packets. It might not be possible to implement some features, think
   of (AF_XDP) zero-copy.

2. This can easily cause out-of-order packets.

3. It makes it harder to troubleshoot, when diagnosing issues
   around #1, we have a hard time determining what path an XDP packet
   took (the xdp tracepoints doesn't know).


[...]
> @@ -590,25 +526,14 @@ static struct sk_buff *receive_small(struct net_device *dev,
>  		if (unlikely(hdr->hdr.gso_type))
>  			goto err_xdp;
>  
> +		/* This happnes when headroom is not enough because
> +		 * the buffer was refilled before XDP is set. This
> +		 * only happen for several packets, for simplicity,
> +		 * offload them to generic XDP routine.

In my practical tests, I also saw that sometime my ping packets were
traveling this code-path, even after a long time when XDP were attached.

This worries me a bit, for troubleshooting purposes... as this can give
a strange user experience given point #1.


> +		 */
>  		if (unlikely(xdp_headroom < virtnet_get_headroom(vi))) {
> -			int offset = buf - page_address(page) + header_offset;
> -			unsigned int tlen = len + vi->hdr_len;
> -			u16 num_buf = 1;


-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH net-next 2/2] virtio-net: simplify XDP handling in small buffer
  2018-03-01  3:19 ` [PATCH net-next 2/2] virtio-net: simplify XDP handling in small buffer Jason Wang
@ 2018-03-01  8:02   ` Jesper Dangaard Brouer
  2018-03-01  8:02   ` Jesper Dangaard Brouer
  1 sibling, 0 replies; 36+ messages in thread
From: Jesper Dangaard Brouer @ 2018-03-01  8:02 UTC (permalink / raw)
  To: Jason Wang
  Cc: mst, netdev, john.fastabend, linux-kernel, virtualization, brouer


On Thu,  1 Mar 2018 11:19:05 +0800 Jason Wang <jasowang@redhat.com> wrote:

> We used to do data copy through xdp_linearize_page() for the buffer
> without sufficient headroom, it brings extra complexity without
> helping for the performance. So this patch remove it and switch to use
> generic XDP routine to handle this case.

I don't like where this is going.  I don't like intermixing the native
XDP and generic XDP in this way, for several reasons:

1. XDP generic is not feature complete, e.g. cpumap will drop these
   packets. It might not be possible to implement some features, think
   of (AF_XDP) zero-copy.

2. This can easily cause out-of-order packets.

3. It makes it harder to troubleshoot, when diagnosing issues
   around #1, we have a hard time determining what path an XDP packet
   took (the xdp tracepoints doesn't know).


[...]
> @@ -590,25 +526,14 @@ static struct sk_buff *receive_small(struct net_device *dev,
>  		if (unlikely(hdr->hdr.gso_type))
>  			goto err_xdp;
>  
> +		/* This happnes when headroom is not enough because
> +		 * the buffer was refilled before XDP is set. This
> +		 * only happen for several packets, for simplicity,
> +		 * offload them to generic XDP routine.

In my practical tests, I also saw that sometime my ping packets were
traveling this code-path, even after a long time when XDP were attached.

This worries me a bit, for troubleshooting purposes... as this can give
a strange user experience given point #1.


> +		 */
>  		if (unlikely(xdp_headroom < virtnet_get_headroom(vi))) {
> -			int offset = buf - page_address(page) + header_offset;
> -			unsigned int tlen = len + vi->hdr_len;
> -			u16 num_buf = 1;


-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH net-next 1/2] virtio-net: re enable XDP_REDIRECT for mergeable buffer
  2018-03-01  3:19 ` Jason Wang
  2018-03-01  8:41   ` Jesper Dangaard Brouer
@ 2018-03-01  8:41   ` Jesper Dangaard Brouer
  2018-03-01  9:11       ` Jason Wang
  2018-03-01 13:36     ` Michael S. Tsirkin
  2 siblings, 1 reply; 36+ messages in thread
From: Jesper Dangaard Brouer @ 2018-03-01  8:41 UTC (permalink / raw)
  To: Jason Wang
  Cc: mst, virtualization, netdev, linux-kernel, john.fastabend, brouer


On Thu,  1 Mar 2018 11:19:04 +0800 Jason Wang <jasowang@redhat.com> wrote:

> XDP_REDIRECT support for mergeable buffer was removed since commit
> 7324f5399b06 ("virtio_net: disable XDP_REDIRECT in receive_mergeable()
> case"). This is because we don't reserve enough tailroom for struct
> skb_shared_info which breaks XDP assumption. Other complaints are, the
> complex linearize logic and EWMA estimation may increase the
> possibility of linearizing.

This patch also have the intermixing issues, I mentioned for patch 2/2.

On Thu, 1 Mar 2018 09:02:06 +0100
Jesper Dangaard Brouer <brouer@redhat.com> wrote:

> 1. XDP generic is not feature complete, e.g. cpumap will drop these
>    packets. It might not be possible to implement some features, think
>    of (AF_XDP) zero-copy.
> 
> 2. This can easily cause out-of-order packets.
> 
> 3. It makes it harder to troubleshoot, when diagnosing issues
>    around #1, we have a hard time determining what path an XDP packet
>    took (the xdp tracepoints doesn't know).


It is slightly better, as it is consistent in calling XDP-generic in
the XDP_REDIRECT action, which an action under heavy development, here
we want the freedom to develop in different code tempi.  And some
features might never be available in XDP-generic. Thus, when a feature
is missing/broken it will be consistent for the user.

The remaining question is how will a user know that XDP "mode" she is
using?  The user clearly loaded an XDP-native program, and expect the
associated performance, but XDP_REDIRECT will be using the slow
XDP-generic code path...



> Signed-off-by: Jason Wang <jasowang@redhat.com>
> ---
>  drivers/net/virtio_net.c | 107 +++++++++++++++++++++++++++++------------------
>  1 file changed, 67 insertions(+), 40 deletions(-)
> 
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index 9bb9e56..81190ba 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
[...]
> @@ -689,31 +709,30 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
>  	struct bpf_prog *xdp_prog;
>  	unsigned int truesize;
>  	unsigned int headroom = mergeable_ctx_to_headroom(ctx);
> -	bool sent;
> +	bool sent, skb_xdp = false;
> +	int err;
>  
>  	head_skb = NULL;
>  
>  	rcu_read_lock();
>  	xdp_prog = rcu_dereference(rq->xdp_prog);
>  	if (xdp_prog) {
> -		struct page *xdp_page;
>  		struct xdp_buff xdp;
>  		void *data;
>  		u32 act;
>  
> -		/* This happens when rx buffer size is underestimated */
> +		/* This happens when rx buffer size is underestimated
> +		 * or headroom is not enough because of the buffer
> +		 * was refilled before XDP is set. In both cases,
> +		 * for simplicity, we will offload them to generic
> +		 * XDP routine. This should only happen for the first
> +		 * several packets, so we don't care much about its
> +		 * performance.
> +		 */
>  		if (unlikely(num_buf > 1 ||
>  			     headroom < virtnet_get_headroom(vi))) {

I think you also need to check the tailroom here? (AFAIK this is hidden
in the len_to_ctx as the "truesize").

> -			/* linearize data for XDP */
> -			xdp_page = xdp_linearize_page(rq, &num_buf,
> -						      page, offset,
> -						      VIRTIO_XDP_HEADROOM,
> -						      &len);
> -			if (!xdp_page)
> -				goto err_xdp;
> -			offset = VIRTIO_XDP_HEADROOM;
> -		} else {
> -			xdp_page = page;
> +			skb_xdp = true;
> +			goto skb_xdp;
>  		}
>  
>  		/* Transient failure which in theory could occur if


-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH net-next 1/2] virtio-net: re enable XDP_REDIRECT for mergeable buffer
  2018-03-01  3:19 ` Jason Wang
@ 2018-03-01  8:41   ` Jesper Dangaard Brouer
  2018-03-01  8:41   ` Jesper Dangaard Brouer
  2018-03-01 13:36     ` Michael S. Tsirkin
  2 siblings, 0 replies; 36+ messages in thread
From: Jesper Dangaard Brouer @ 2018-03-01  8:41 UTC (permalink / raw)
  To: Jason Wang
  Cc: mst, netdev, john.fastabend, linux-kernel, virtualization, brouer


On Thu,  1 Mar 2018 11:19:04 +0800 Jason Wang <jasowang@redhat.com> wrote:

> XDP_REDIRECT support for mergeable buffer was removed since commit
> 7324f5399b06 ("virtio_net: disable XDP_REDIRECT in receive_mergeable()
> case"). This is because we don't reserve enough tailroom for struct
> skb_shared_info which breaks XDP assumption. Other complaints are, the
> complex linearize logic and EWMA estimation may increase the
> possibility of linearizing.

This patch also have the intermixing issues, I mentioned for patch 2/2.

On Thu, 1 Mar 2018 09:02:06 +0100
Jesper Dangaard Brouer <brouer@redhat.com> wrote:

> 1. XDP generic is not feature complete, e.g. cpumap will drop these
>    packets. It might not be possible to implement some features, think
>    of (AF_XDP) zero-copy.
> 
> 2. This can easily cause out-of-order packets.
> 
> 3. It makes it harder to troubleshoot, when diagnosing issues
>    around #1, we have a hard time determining what path an XDP packet
>    took (the xdp tracepoints doesn't know).


It is slightly better, as it is consistent in calling XDP-generic in
the XDP_REDIRECT action, which an action under heavy development, here
we want the freedom to develop in different code tempi.  And some
features might never be available in XDP-generic. Thus, when a feature
is missing/broken it will be consistent for the user.

The remaining question is how will a user know that XDP "mode" she is
using?  The user clearly loaded an XDP-native program, and expect the
associated performance, but XDP_REDIRECT will be using the slow
XDP-generic code path...



> Signed-off-by: Jason Wang <jasowang@redhat.com>
> ---
>  drivers/net/virtio_net.c | 107 +++++++++++++++++++++++++++++------------------
>  1 file changed, 67 insertions(+), 40 deletions(-)
> 
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index 9bb9e56..81190ba 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
[...]
> @@ -689,31 +709,30 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
>  	struct bpf_prog *xdp_prog;
>  	unsigned int truesize;
>  	unsigned int headroom = mergeable_ctx_to_headroom(ctx);
> -	bool sent;
> +	bool sent, skb_xdp = false;
> +	int err;
>  
>  	head_skb = NULL;
>  
>  	rcu_read_lock();
>  	xdp_prog = rcu_dereference(rq->xdp_prog);
>  	if (xdp_prog) {
> -		struct page *xdp_page;
>  		struct xdp_buff xdp;
>  		void *data;
>  		u32 act;
>  
> -		/* This happens when rx buffer size is underestimated */
> +		/* This happens when rx buffer size is underestimated
> +		 * or headroom is not enough because of the buffer
> +		 * was refilled before XDP is set. In both cases,
> +		 * for simplicity, we will offload them to generic
> +		 * XDP routine. This should only happen for the first
> +		 * several packets, so we don't care much about its
> +		 * performance.
> +		 */
>  		if (unlikely(num_buf > 1 ||
>  			     headroom < virtnet_get_headroom(vi))) {

I think you also need to check the tailroom here? (AFAIK this is hidden
in the len_to_ctx as the "truesize").

> -			/* linearize data for XDP */
> -			xdp_page = xdp_linearize_page(rq, &num_buf,
> -						      page, offset,
> -						      VIRTIO_XDP_HEADROOM,
> -						      &len);
> -			if (!xdp_page)
> -				goto err_xdp;
> -			offset = VIRTIO_XDP_HEADROOM;
> -		} else {
> -			xdp_page = page;
> +			skb_xdp = true;
> +			goto skb_xdp;
>  		}
>  
>  		/* Transient failure which in theory could occur if


-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH net-next 2/2] virtio-net: simplify XDP handling in small buffer
  2018-03-01  8:02   ` Jesper Dangaard Brouer
@ 2018-03-01  8:49     ` Jason Wang
  2018-03-01  9:15         ` Jesper Dangaard Brouer
  2018-03-01  8:49     ` Jason Wang
  1 sibling, 1 reply; 36+ messages in thread
From: Jason Wang @ 2018-03-01  8:49 UTC (permalink / raw)
  To: Jesper Dangaard Brouer
  Cc: mst, virtualization, netdev, linux-kernel, john.fastabend



On 2018年03月01日 16:02, Jesper Dangaard Brouer wrote:
> On Thu,  1 Mar 2018 11:19:05 +0800 Jason Wang <jasowang@redhat.com> wrote:
>
>> We used to do data copy through xdp_linearize_page() for the buffer
>> without sufficient headroom, it brings extra complexity without
>> helping for the performance. So this patch remove it and switch to use
>> generic XDP routine to handle this case.
> I don't like where this is going.  I don't like intermixing the native
> XDP and generic XDP in this way, for several reasons:
>
> 1. XDP generic is not feature complete, e.g. cpumap will drop these
>     packets.

Well, then we'd better fix it, otherwise it would be even worse than not 
having it. For cpumap, it can be done through queuing skb in the pointer 
ring with some encoding/decoding.

>   It might not be possible to implement some features, think
>     of (AF_XDP) zero-copy.

Yes, but can AF_XDP support header adjustment? Consider the assumption 
of zerocopy, I was considering maybe we need a way to tell AF_XDP of 
whether or not zerocopy is supported by the driver.

>
> 2. This can easily cause out-of-order packets.

I may miss something, but it looks to me packets were still delivered in 
order? Or you mean the packets that was dropped by cpumap?

>
> 3. It makes it harder to troubleshoot, when diagnosing issues
>     around #1, we have a hard time determining what path an XDP packet
>     took (the xdp tracepoints doesn't know).

Looks like we can address this by adding tracepoints in generic code path.

>
>
> [...]
>> @@ -590,25 +526,14 @@ static struct sk_buff *receive_small(struct net_device *dev,
>>   		if (unlikely(hdr->hdr.gso_type))
>>   			goto err_xdp;
>>   
>> +		/* This happnes when headroom is not enough because
>> +		 * the buffer was refilled before XDP is set. This
>> +		 * only happen for several packets, for simplicity,
>> +		 * offload them to generic XDP routine.
> In my practical tests, I also saw that sometime my ping packets were
> traveling this code-path, even after a long time when XDP were attached.

I don't see this, probably a bug somewhere. I would like to reproduce 
and see, how do you do the test? If there's just very few packets were 
received after XDP, we may still use the buffer that was refilled before 
XDP, that's by design.

>
> This worries me a bit, for troubleshooting purposes... as this can give
> a strange user experience given point #1.



I can stick to the xdp_linearize_page() logic if you don't like generic 
XDP stuffs, but it may not work for AF_XDP neither. Consider AF_XDP has 
still some distance of being merged, I'm not sure we could even care 
about it.

Thanks

>
>
>> +		 */
>>   		if (unlikely(xdp_headroom < virtnet_get_headroom(vi))) {
>> -			int offset = buf - page_address(page) + header_offset;
>> -			unsigned int tlen = len + vi->hdr_len;
>> -			u16 num_buf = 1;
>

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH net-next 2/2] virtio-net: simplify XDP handling in small buffer
  2018-03-01  8:02   ` Jesper Dangaard Brouer
  2018-03-01  8:49     ` Jason Wang
@ 2018-03-01  8:49     ` Jason Wang
  1 sibling, 0 replies; 36+ messages in thread
From: Jason Wang @ 2018-03-01  8:49 UTC (permalink / raw)
  To: Jesper Dangaard Brouer
  Cc: netdev, virtualization, john.fastabend, linux-kernel, mst



On 2018年03月01日 16:02, Jesper Dangaard Brouer wrote:
> On Thu,  1 Mar 2018 11:19:05 +0800 Jason Wang <jasowang@redhat.com> wrote:
>
>> We used to do data copy through xdp_linearize_page() for the buffer
>> without sufficient headroom, it brings extra complexity without
>> helping for the performance. So this patch remove it and switch to use
>> generic XDP routine to handle this case.
> I don't like where this is going.  I don't like intermixing the native
> XDP and generic XDP in this way, for several reasons:
>
> 1. XDP generic is not feature complete, e.g. cpumap will drop these
>     packets.

Well, then we'd better fix it, otherwise it would be even worse than not 
having it. For cpumap, it can be done through queuing skb in the pointer 
ring with some encoding/decoding.

>   It might not be possible to implement some features, think
>     of (AF_XDP) zero-copy.

Yes, but can AF_XDP support header adjustment? Consider the assumption 
of zerocopy, I was considering maybe we need a way to tell AF_XDP of 
whether or not zerocopy is supported by the driver.

>
> 2. This can easily cause out-of-order packets.

I may miss something, but it looks to me packets were still delivered in 
order? Or you mean the packets that was dropped by cpumap?

>
> 3. It makes it harder to troubleshoot, when diagnosing issues
>     around #1, we have a hard time determining what path an XDP packet
>     took (the xdp tracepoints doesn't know).

Looks like we can address this by adding tracepoints in generic code path.

>
>
> [...]
>> @@ -590,25 +526,14 @@ static struct sk_buff *receive_small(struct net_device *dev,
>>   		if (unlikely(hdr->hdr.gso_type))
>>   			goto err_xdp;
>>   
>> +		/* This happnes when headroom is not enough because
>> +		 * the buffer was refilled before XDP is set. This
>> +		 * only happen for several packets, for simplicity,
>> +		 * offload them to generic XDP routine.
> In my practical tests, I also saw that sometime my ping packets were
> traveling this code-path, even after a long time when XDP were attached.

I don't see this, probably a bug somewhere. I would like to reproduce 
and see, how do you do the test? If there's just very few packets were 
received after XDP, we may still use the buffer that was refilled before 
XDP, that's by design.

>
> This worries me a bit, for troubleshooting purposes... as this can give
> a strange user experience given point #1.



I can stick to the xdp_linearize_page() logic if you don't like generic 
XDP stuffs, but it may not work for AF_XDP neither. Consider AF_XDP has 
still some distance of being merged, I'm not sure we could even care 
about it.

Thanks

>
>
>> +		 */
>>   		if (unlikely(xdp_headroom < virtnet_get_headroom(vi))) {
>> -			int offset = buf - page_address(page) + header_offset;
>> -			unsigned int tlen = len + vi->hdr_len;
>> -			u16 num_buf = 1;
>

_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH net-next 0/2] virtio-net: re enable XDP_REDIRECT for mergeable buffer
  2018-03-01  3:19 ` Jason Wang
@ 2018-03-01  9:10   ` Jesper Dangaard Brouer
  -1 siblings, 0 replies; 36+ messages in thread
From: Jesper Dangaard Brouer @ 2018-03-01  9:10 UTC (permalink / raw)
  To: Jason Wang
  Cc: mst, virtualization, netdev, linux-kernel, john.fastabend, brouer

On Thu,  1 Mar 2018 11:19:03 +0800
Jason Wang <jasowang@redhat.com> wrote:

> This series tries to re-enable XDP_REDIRECT for mergeable buffer which
> was removed since commit 7324f5399b06 ("virtio_net: disable
> XDP_REDIRECT in receive_mergeable() case"). Main concerns are:
> 
> - not enough tailroom was reserved which breaks cpumap

To address this at a more fundamental level, I would suggest that we/you
instead extend XDP to know it's buffers "frame" size/end.  (The
assumption use to be, xdp_buff->data_hard_start + PAGE_SIZE, but
ixgbe+virtio_net broke that assumption).

It should actually be fairly easy to implement:
 * Simply extend xdp_buff with a "data_hard_end" pointer.

Now cpumap is more safe... instead of crashing, it can now know/see when
it is safe to create an SKB using build_skb (could fallback to
dev_alloc_skb).

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH net-next 0/2] virtio-net: re enable XDP_REDIRECT for mergeable buffer
@ 2018-03-01  9:10   ` Jesper Dangaard Brouer
  0 siblings, 0 replies; 36+ messages in thread
From: Jesper Dangaard Brouer @ 2018-03-01  9:10 UTC (permalink / raw)
  To: Jason Wang
  Cc: mst, netdev, john.fastabend, linux-kernel, virtualization, brouer

On Thu,  1 Mar 2018 11:19:03 +0800
Jason Wang <jasowang@redhat.com> wrote:

> This series tries to re-enable XDP_REDIRECT for mergeable buffer which
> was removed since commit 7324f5399b06 ("virtio_net: disable
> XDP_REDIRECT in receive_mergeable() case"). Main concerns are:
> 
> - not enough tailroom was reserved which breaks cpumap

To address this at a more fundamental level, I would suggest that we/you
instead extend XDP to know it's buffers "frame" size/end.  (The
assumption use to be, xdp_buff->data_hard_start + PAGE_SIZE, but
ixgbe+virtio_net broke that assumption).

It should actually be fairly easy to implement:
 * Simply extend xdp_buff with a "data_hard_end" pointer.

Now cpumap is more safe... instead of crashing, it can now know/see when
it is safe to create an SKB using build_skb (could fallback to
dev_alloc_skb).

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH net-next 1/2] virtio-net: re enable XDP_REDIRECT for mergeable buffer
  2018-03-01  8:41   ` Jesper Dangaard Brouer
@ 2018-03-01  9:11       ` Jason Wang
  0 siblings, 0 replies; 36+ messages in thread
From: Jason Wang @ 2018-03-01  9:11 UTC (permalink / raw)
  To: Jesper Dangaard Brouer
  Cc: mst, virtualization, netdev, linux-kernel, john.fastabend



On 2018年03月01日 16:41, Jesper Dangaard Brouer wrote:
> On Thu,  1 Mar 2018 11:19:04 +0800 Jason Wang <jasowang@redhat.com> wrote:
>
>> XDP_REDIRECT support for mergeable buffer was removed since commit
>> 7324f5399b06 ("virtio_net: disable XDP_REDIRECT in receive_mergeable()
>> case"). This is because we don't reserve enough tailroom for struct
>> skb_shared_info which breaks XDP assumption. Other complaints are, the
>> complex linearize logic and EWMA estimation may increase the
>> possibility of linearizing.
> This patch also have the intermixing issues, I mentioned for patch 2/2.
>
> On Thu, 1 Mar 2018 09:02:06 +0100
> Jesper Dangaard Brouer <brouer@redhat.com> wrote:
>
>> 1. XDP generic is not feature complete, e.g. cpumap will drop these
>>     packets. It might not be possible to implement some features, think
>>     of (AF_XDP) zero-copy.
>>
>> 2. This can easily cause out-of-order packets.
>>
>> 3. It makes it harder to troubleshoot, when diagnosing issues
>>     around #1, we have a hard time determining what path an XDP packet
>>     took (the xdp tracepoints doesn't know).
>
> It is slightly better, as it is consistent in calling XDP-generic in
> the XDP_REDIRECT action, which an action under heavy development, here
> we want the freedom to develop in different code tempi.

Looks not, this patch did:

+               case XDP_REDIRECT:
+                       err = xdp_do_redirect(dev, &xdp, xdp_prog);
+                       if (err)
                                 goto err_xdp;
+                       *xdp_xmit = true;
                         rcu_read_unlock();
                         goto xdp_xmit;

So native version is used for most cases. XDP-generic will be only used 
for the packets of insufficient headroom which should be rare (just 
first few packets).

>    And some
> features might never be available in XDP-generic. Thus, when a feature
> is missing/broken it will be consistent for the user.

This sounds bad.

>
> The remaining question is how will a user know that XDP "mode" she is
> using?

So for this patch, skb mode will only happen for the first several 
packets. So mostly native mode. And it's even not clear to me if user 
need to know about this since the performance should be the same.

>    The user clearly loaded an XDP-native program, and expect the
> associated performance, but XDP_REDIRECT will be using the slow
> XDP-generic code path...

As I replied above, XDP_REDIRECT use native mode except for the first 
few packets.

>
>
>> Signed-off-by: Jason Wang <jasowang@redhat.com>
>> ---
>>   drivers/net/virtio_net.c | 107 +++++++++++++++++++++++++++++------------------
>>   1 file changed, 67 insertions(+), 40 deletions(-)
>>
>> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
>> index 9bb9e56..81190ba 100644
>> --- a/drivers/net/virtio_net.c
>> +++ b/drivers/net/virtio_net.c
> [...]
>> @@ -689,31 +709,30 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
>>   	struct bpf_prog *xdp_prog;
>>   	unsigned int truesize;
>>   	unsigned int headroom = mergeable_ctx_to_headroom(ctx);
>> -	bool sent;
>> +	bool sent, skb_xdp = false;
>> +	int err;
>>   
>>   	head_skb = NULL;
>>   
>>   	rcu_read_lock();
>>   	xdp_prog = rcu_dereference(rq->xdp_prog);
>>   	if (xdp_prog) {
>> -		struct page *xdp_page;
>>   		struct xdp_buff xdp;
>>   		void *data;
>>   		u32 act;
>>   
>> -		/* This happens when rx buffer size is underestimated */
>> +		/* This happens when rx buffer size is underestimated
>> +		 * or headroom is not enough because of the buffer
>> +		 * was refilled before XDP is set. In both cases,
>> +		 * for simplicity, we will offload them to generic
>> +		 * XDP routine. This should only happen for the first
>> +		 * several packets, so we don't care much about its
>> +		 * performance.
>> +		 */
>>   		if (unlikely(num_buf > 1 ||
>>   			     headroom < virtnet_get_headroom(vi))) {
> I think you also need to check the tailroom here? (AFAIK this is hidden
> in the len_to_ctx as the "truesize").

add_recvbuf_mergeable() did this:

     unsigned int headroom = virtnet_get_headroom(vi);
     unsigned int tailroom = headroom ? sizeof(struct skb_shared_info) : 0;

So if headroom is not zero, it implies tailroom here.

Thanks

>
>> -			/* linearize data for XDP */
>> -			xdp_page = xdp_linearize_page(rq, &num_buf,
>> -						      page, offset,
>> -						      VIRTIO_XDP_HEADROOM,
>> -						      &len);
>> -			if (!xdp_page)
>> -				goto err_xdp;
>> -			offset = VIRTIO_XDP_HEADROOM;
>> -		} else {
>> -			xdp_page = page;
>> +			skb_xdp = true;
>> +			goto skb_xdp;
>>   		}
>>   
>>   		/* Transient failure which in theory could occur if
>

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH net-next 1/2] virtio-net: re enable XDP_REDIRECT for mergeable buffer
@ 2018-03-01  9:11       ` Jason Wang
  0 siblings, 0 replies; 36+ messages in thread
From: Jason Wang @ 2018-03-01  9:11 UTC (permalink / raw)
  To: Jesper Dangaard Brouer
  Cc: netdev, virtualization, john.fastabend, linux-kernel, mst



On 2018年03月01日 16:41, Jesper Dangaard Brouer wrote:
> On Thu,  1 Mar 2018 11:19:04 +0800 Jason Wang <jasowang@redhat.com> wrote:
>
>> XDP_REDIRECT support for mergeable buffer was removed since commit
>> 7324f5399b06 ("virtio_net: disable XDP_REDIRECT in receive_mergeable()
>> case"). This is because we don't reserve enough tailroom for struct
>> skb_shared_info which breaks XDP assumption. Other complaints are, the
>> complex linearize logic and EWMA estimation may increase the
>> possibility of linearizing.
> This patch also have the intermixing issues, I mentioned for patch 2/2.
>
> On Thu, 1 Mar 2018 09:02:06 +0100
> Jesper Dangaard Brouer <brouer@redhat.com> wrote:
>
>> 1. XDP generic is not feature complete, e.g. cpumap will drop these
>>     packets. It might not be possible to implement some features, think
>>     of (AF_XDP) zero-copy.
>>
>> 2. This can easily cause out-of-order packets.
>>
>> 3. It makes it harder to troubleshoot, when diagnosing issues
>>     around #1, we have a hard time determining what path an XDP packet
>>     took (the xdp tracepoints doesn't know).
>
> It is slightly better, as it is consistent in calling XDP-generic in
> the XDP_REDIRECT action, which an action under heavy development, here
> we want the freedom to develop in different code tempi.

Looks not, this patch did:

+               case XDP_REDIRECT:
+                       err = xdp_do_redirect(dev, &xdp, xdp_prog);
+                       if (err)
                                 goto err_xdp;
+                       *xdp_xmit = true;
                         rcu_read_unlock();
                         goto xdp_xmit;

So native version is used for most cases. XDP-generic will be only used 
for the packets of insufficient headroom which should be rare (just 
first few packets).

>    And some
> features might never be available in XDP-generic. Thus, when a feature
> is missing/broken it will be consistent for the user.

This sounds bad.

>
> The remaining question is how will a user know that XDP "mode" she is
> using?

So for this patch, skb mode will only happen for the first several 
packets. So mostly native mode. And it's even not clear to me if user 
need to know about this since the performance should be the same.

>    The user clearly loaded an XDP-native program, and expect the
> associated performance, but XDP_REDIRECT will be using the slow
> XDP-generic code path...

As I replied above, XDP_REDIRECT use native mode except for the first 
few packets.

>
>
>> Signed-off-by: Jason Wang <jasowang@redhat.com>
>> ---
>>   drivers/net/virtio_net.c | 107 +++++++++++++++++++++++++++++------------------
>>   1 file changed, 67 insertions(+), 40 deletions(-)
>>
>> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
>> index 9bb9e56..81190ba 100644
>> --- a/drivers/net/virtio_net.c
>> +++ b/drivers/net/virtio_net.c
> [...]
>> @@ -689,31 +709,30 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
>>   	struct bpf_prog *xdp_prog;
>>   	unsigned int truesize;
>>   	unsigned int headroom = mergeable_ctx_to_headroom(ctx);
>> -	bool sent;
>> +	bool sent, skb_xdp = false;
>> +	int err;
>>   
>>   	head_skb = NULL;
>>   
>>   	rcu_read_lock();
>>   	xdp_prog = rcu_dereference(rq->xdp_prog);
>>   	if (xdp_prog) {
>> -		struct page *xdp_page;
>>   		struct xdp_buff xdp;
>>   		void *data;
>>   		u32 act;
>>   
>> -		/* This happens when rx buffer size is underestimated */
>> +		/* This happens when rx buffer size is underestimated
>> +		 * or headroom is not enough because of the buffer
>> +		 * was refilled before XDP is set. In both cases,
>> +		 * for simplicity, we will offload them to generic
>> +		 * XDP routine. This should only happen for the first
>> +		 * several packets, so we don't care much about its
>> +		 * performance.
>> +		 */
>>   		if (unlikely(num_buf > 1 ||
>>   			     headroom < virtnet_get_headroom(vi))) {
> I think you also need to check the tailroom here? (AFAIK this is hidden
> in the len_to_ctx as the "truesize").

add_recvbuf_mergeable() did this:

     unsigned int headroom = virtnet_get_headroom(vi);
     unsigned int tailroom = headroom ? sizeof(struct skb_shared_info) : 0;

So if headroom is not zero, it implies tailroom here.

Thanks

>
>> -			/* linearize data for XDP */
>> -			xdp_page = xdp_linearize_page(rq, &num_buf,
>> -						      page, offset,
>> -						      VIRTIO_XDP_HEADROOM,
>> -						      &len);
>> -			if (!xdp_page)
>> -				goto err_xdp;
>> -			offset = VIRTIO_XDP_HEADROOM;
>> -		} else {
>> -			xdp_page = page;
>> +			skb_xdp = true;
>> +			goto skb_xdp;
>>   		}
>>   
>>   		/* Transient failure which in theory could occur if
>

_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH net-next 2/2] virtio-net: simplify XDP handling in small buffer
  2018-03-01  8:49     ` Jason Wang
@ 2018-03-01  9:15         ` Jesper Dangaard Brouer
  0 siblings, 0 replies; 36+ messages in thread
From: Jesper Dangaard Brouer @ 2018-03-01  9:15 UTC (permalink / raw)
  To: Jason Wang
  Cc: mst, virtualization, netdev, linux-kernel, john.fastabend, brouer

On Thu, 1 Mar 2018 16:49:24 +0800
Jason Wang <jasowang@redhat.com> wrote:

> > 2. This can easily cause out-of-order packets.  
> 
> I may miss something, but it looks to me packets were still delivered
> in order? Or you mean the packets that was dropped by cpumap?

No. Packets can now travel two code paths to the egress device. (1) XDP
native via ndp_xdp_xmit via direct delivery into a lockfree/dedicated
TX queue, (2) via normal network stack which can involve being queue in
a qdisc.  Do you see the possibility of the reorder now?

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH net-next 2/2] virtio-net: simplify XDP handling in small buffer
@ 2018-03-01  9:15         ` Jesper Dangaard Brouer
  0 siblings, 0 replies; 36+ messages in thread
From: Jesper Dangaard Brouer @ 2018-03-01  9:15 UTC (permalink / raw)
  To: Jason Wang
  Cc: mst, netdev, john.fastabend, linux-kernel, virtualization, brouer

On Thu, 1 Mar 2018 16:49:24 +0800
Jason Wang <jasowang@redhat.com> wrote:

> > 2. This can easily cause out-of-order packets.  
> 
> I may miss something, but it looks to me packets were still delivered
> in order? Or you mean the packets that was dropped by cpumap?

No. Packets can now travel two code paths to the egress device. (1) XDP
native via ndp_xdp_xmit via direct delivery into a lockfree/dedicated
TX queue, (2) via normal network stack which can involve being queue in
a qdisc.  Do you see the possibility of the reorder now?

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH net-next 0/2] virtio-net: re enable XDP_REDIRECT for mergeable buffer
  2018-03-01  9:10   ` Jesper Dangaard Brouer
  (?)
  (?)
@ 2018-03-01  9:23   ` Jason Wang
  2018-03-01 10:35       ` Jesper Dangaard Brouer
  -1 siblings, 1 reply; 36+ messages in thread
From: Jason Wang @ 2018-03-01  9:23 UTC (permalink / raw)
  To: Jesper Dangaard Brouer
  Cc: mst, virtualization, netdev, linux-kernel, john.fastabend



On 2018年03月01日 17:10, Jesper Dangaard Brouer wrote:
> On Thu,  1 Mar 2018 11:19:03 +0800
> Jason Wang <jasowang@redhat.com> wrote:
>
>> This series tries to re-enable XDP_REDIRECT for mergeable buffer which
>> was removed since commit 7324f5399b06 ("virtio_net: disable
>> XDP_REDIRECT in receive_mergeable() case"). Main concerns are:
>>
>> - not enough tailroom was reserved which breaks cpumap
> To address this at a more fundamental level, I would suggest that we/you
> instead extend XDP to know it's buffers "frame" size/end.  (The
> assumption use to be, xdp_buff->data_hard_start + PAGE_SIZE, but
> ixgbe+virtio_net broke that assumption).
>
> It should actually be fairly easy to implement:
>   * Simply extend xdp_buff with a "data_hard_end" pointer.

Right, and then cpumap can warn and drop packets with insufficient 
tailroom. But it should be a patch on top of this I think.

Thanks

>
> Now cpumap is more safe... instead of crashing, it can now know/see when
> it is safe to create an SKB using build_skb (could fallback to
> dev_alloc_skb).
>

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH net-next 0/2] virtio-net: re enable XDP_REDIRECT for mergeable buffer
  2018-03-01  9:10   ` Jesper Dangaard Brouer
  (?)
@ 2018-03-01  9:23   ` Jason Wang
  -1 siblings, 0 replies; 36+ messages in thread
From: Jason Wang @ 2018-03-01  9:23 UTC (permalink / raw)
  To: Jesper Dangaard Brouer
  Cc: netdev, virtualization, john.fastabend, linux-kernel, mst



On 2018年03月01日 17:10, Jesper Dangaard Brouer wrote:
> On Thu,  1 Mar 2018 11:19:03 +0800
> Jason Wang <jasowang@redhat.com> wrote:
>
>> This series tries to re-enable XDP_REDIRECT for mergeable buffer which
>> was removed since commit 7324f5399b06 ("virtio_net: disable
>> XDP_REDIRECT in receive_mergeable() case"). Main concerns are:
>>
>> - not enough tailroom was reserved which breaks cpumap
> To address this at a more fundamental level, I would suggest that we/you
> instead extend XDP to know it's buffers "frame" size/end.  (The
> assumption use to be, xdp_buff->data_hard_start + PAGE_SIZE, but
> ixgbe+virtio_net broke that assumption).
>
> It should actually be fairly easy to implement:
>   * Simply extend xdp_buff with a "data_hard_end" pointer.

Right, and then cpumap can warn and drop packets with insufficient 
tailroom. But it should be a patch on top of this I think.

Thanks

>
> Now cpumap is more safe... instead of crashing, it can now know/see when
> it is safe to create an SKB using build_skb (could fallback to
> dev_alloc_skb).
>

_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH net-next 2/2] virtio-net: simplify XDP handling in small buffer
  2018-03-01  9:15         ` Jesper Dangaard Brouer
  (?)
  (?)
@ 2018-03-01  9:24         ` Jason Wang
  -1 siblings, 0 replies; 36+ messages in thread
From: Jason Wang @ 2018-03-01  9:24 UTC (permalink / raw)
  To: Jesper Dangaard Brouer
  Cc: mst, virtualization, netdev, linux-kernel, john.fastabend



On 2018年03月01日 17:15, Jesper Dangaard Brouer wrote:
> On Thu, 1 Mar 2018 16:49:24 +0800
> Jason Wang <jasowang@redhat.com> wrote:
>
>>> 2. This can easily cause out-of-order packets.
>> I may miss something, but it looks to me packets were still delivered
>> in order? Or you mean the packets that was dropped by cpumap?
> No. Packets can now travel two code paths to the egress device. (1) XDP
> native via ndp_xdp_xmit via direct delivery into a lockfree/dedicated
> TX queue, (2) via normal network stack which can involve being queue in
> a qdisc.  Do you see the possibility of the reorder now?
>

I see, thanks. But consider this could only happen for first few 
packets, not sure it was worth to worry about it.

Thanks

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH net-next 2/2] virtio-net: simplify XDP handling in small buffer
  2018-03-01  9:15         ` Jesper Dangaard Brouer
  (?)
@ 2018-03-01  9:24         ` Jason Wang
  -1 siblings, 0 replies; 36+ messages in thread
From: Jason Wang @ 2018-03-01  9:24 UTC (permalink / raw)
  To: Jesper Dangaard Brouer
  Cc: netdev, virtualization, john.fastabend, linux-kernel, mst



On 2018年03月01日 17:15, Jesper Dangaard Brouer wrote:
> On Thu, 1 Mar 2018 16:49:24 +0800
> Jason Wang <jasowang@redhat.com> wrote:
>
>>> 2. This can easily cause out-of-order packets.
>> I may miss something, but it looks to me packets were still delivered
>> in order? Or you mean the packets that was dropped by cpumap?
> No. Packets can now travel two code paths to the egress device. (1) XDP
> native via ndp_xdp_xmit via direct delivery into a lockfree/dedicated
> TX queue, (2) via normal network stack which can involve being queue in
> a qdisc.  Do you see the possibility of the reorder now?
>

I see, thanks. But consider this could only happen for first few 
packets, not sure it was worth to worry about it.

Thanks
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH net-next 0/2] virtio-net: re enable XDP_REDIRECT for mergeable buffer
  2018-03-01  9:23   ` Jason Wang
@ 2018-03-01 10:35       ` Jesper Dangaard Brouer
  0 siblings, 0 replies; 36+ messages in thread
From: Jesper Dangaard Brouer @ 2018-03-01 10:35 UTC (permalink / raw)
  To: Jason Wang
  Cc: mst, virtualization, netdev, linux-kernel, john.fastabend,
	brouer, Alexei Starovoitov

On Thu, 1 Mar 2018 17:23:37 +0800
Jason Wang <jasowang@redhat.com> wrote:

> On 2018年03月01日 17:10, Jesper Dangaard Brouer wrote:
> > On Thu,  1 Mar 2018 11:19:03 +0800
> > Jason Wang <jasowang@redhat.com> wrote:
> >  
> >> This series tries to re-enable XDP_REDIRECT for mergeable buffer which
> >> was removed since commit 7324f5399b06 ("virtio_net: disable
> >> XDP_REDIRECT in receive_mergeable() case"). Main concerns are:
> >>
> >> - not enough tailroom was reserved which breaks cpumap  
>
> > To address this at a more fundamental level, I would suggest that we/you
> > instead extend XDP to know it's buffers "frame" size/end.  (The
> > assumption use to be, xdp_buff->data_hard_start + PAGE_SIZE, but
> > ixgbe+virtio_net broke that assumption).
> >
> > It should actually be fairly easy to implement:
> >   * Simply extend xdp_buff with a "data_hard_end" pointer.  
> 
> Right, and then cpumap can warn and drop packets with insufficient 
> tailroom.
>
> But it should be a patch on top of this I think.

Hmmm, not really.  If we/you instead fix the issue of XDP doesn't know
the end/size of the frame, then we don't need this mixed XDP
generic/native code path mixing.

You could re-enable native redirect, and push the responsibility to
cpumap for detecting this too-small frame "missing tailroom" (and avoid
crashing...). (If we really want to support this, cpumap could fallback
to dev_alloc_skb, and handle it gracefully).

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH net-next 0/2] virtio-net: re enable XDP_REDIRECT for mergeable buffer
@ 2018-03-01 10:35       ` Jesper Dangaard Brouer
  0 siblings, 0 replies; 36+ messages in thread
From: Jesper Dangaard Brouer @ 2018-03-01 10:35 UTC (permalink / raw)
  To: Jason Wang
  Cc: mst, netdev, john.fastabend, linux-kernel, virtualization,
	brouer, Alexei Starovoitov

On Thu, 1 Mar 2018 17:23:37 +0800
Jason Wang <jasowang@redhat.com> wrote:

> On 2018年03月01日 17:10, Jesper Dangaard Brouer wrote:
> > On Thu,  1 Mar 2018 11:19:03 +0800
> > Jason Wang <jasowang@redhat.com> wrote:
> >  
> >> This series tries to re-enable XDP_REDIRECT for mergeable buffer which
> >> was removed since commit 7324f5399b06 ("virtio_net: disable
> >> XDP_REDIRECT in receive_mergeable() case"). Main concerns are:
> >>
> >> - not enough tailroom was reserved which breaks cpumap  
>
> > To address this at a more fundamental level, I would suggest that we/you
> > instead extend XDP to know it's buffers "frame" size/end.  (The
> > assumption use to be, xdp_buff->data_hard_start + PAGE_SIZE, but
> > ixgbe+virtio_net broke that assumption).
> >
> > It should actually be fairly easy to implement:
> >   * Simply extend xdp_buff with a "data_hard_end" pointer.  
> 
> Right, and then cpumap can warn and drop packets with insufficient 
> tailroom.
>
> But it should be a patch on top of this I think.

Hmmm, not really.  If we/you instead fix the issue of XDP doesn't know
the end/size of the frame, then we don't need this mixed XDP
generic/native code path mixing.

You could re-enable native redirect, and push the responsibility to
cpumap for detecting this too-small frame "missing tailroom" (and avoid
crashing...). (If we really want to support this, cpumap could fallback
to dev_alloc_skb, and handle it gracefully).

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH net-next 0/2] virtio-net: re enable XDP_REDIRECT for mergeable buffer
  2018-03-01 10:35       ` Jesper Dangaard Brouer
  (?)
@ 2018-03-01 13:15       ` Jason Wang
  2018-03-01 14:16           ` Jesper Dangaard Brouer
  -1 siblings, 1 reply; 36+ messages in thread
From: Jason Wang @ 2018-03-01 13:15 UTC (permalink / raw)
  To: Jesper Dangaard Brouer
  Cc: mst, virtualization, netdev, linux-kernel, john.fastabend,
	Alexei Starovoitov



On 2018年03月01日 18:35, Jesper Dangaard Brouer wrote:
> On Thu, 1 Mar 2018 17:23:37 +0800
> Jason Wang <jasowang@redhat.com> wrote:
>
>> On 2018年03月01日 17:10, Jesper Dangaard Brouer wrote:
>>> On Thu,  1 Mar 2018 11:19:03 +0800
>>> Jason Wang <jasowang@redhat.com> wrote:
>>>   
>>>> This series tries to re-enable XDP_REDIRECT for mergeable buffer which
>>>> was removed since commit 7324f5399b06 ("virtio_net: disable
>>>> XDP_REDIRECT in receive_mergeable() case"). Main concerns are:
>>>>
>>>> - not enough tailroom was reserved which breaks cpumap
>>> To address this at a more fundamental level, I would suggest that we/you
>>> instead extend XDP to know it's buffers "frame" size/end.  (The
>>> assumption use to be, xdp_buff->data_hard_start + PAGE_SIZE, but
>>> ixgbe+virtio_net broke that assumption).
>>>
>>> It should actually be fairly easy to implement:
>>>    * Simply extend xdp_buff with a "data_hard_end" pointer.
>> Right, and then cpumap can warn and drop packets with insufficient
>> tailroom.
>>
>> But it should be a patch on top of this I think.
> Hmmm, not really.  If we/you instead fix the issue of XDP doesn't know
> the end/size of the frame, then we don't need this mixed XDP
> generic/native code path mixing.

I know this but I'm still a little bit confused. According to the commit 
log of 7324f5399b06 ("virtio_net: disable XDP_REDIRECT in 
receive_mergeable() case"), you said:

"""
     The longer explaination is that receive_mergeable() tries to
     work-around and satisfy these XDP requiresments e.g. by having a
     function xdp_linearize_page() that allocates and memcpy RX buffers
     around (in case packet is scattered across multiple rx buffers).  This
     does currently satisfy XDP_PASS, XDP_DROP and XDP_TX (but only because
     we have not implemented bpf_xdp_adjust_tail yet).
"""

So I consider the tailroom is a must for the (future) tail adjustment.

>
> You could re-enable native redirect, and push the responsibility to
> cpumap for detecting this too-small frame "missing tailroom" (and avoid
> crashing...). (If we really want to support this, cpumap could fallback
> to dev_alloc_skb, and handle it gracefully).
>

Right but it will be slower than build_skb().

Thanks

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH net-next 0/2] virtio-net: re enable XDP_REDIRECT for mergeable buffer
  2018-03-01 10:35       ` Jesper Dangaard Brouer
  (?)
  (?)
@ 2018-03-01 13:15       ` Jason Wang
  -1 siblings, 0 replies; 36+ messages in thread
From: Jason Wang @ 2018-03-01 13:15 UTC (permalink / raw)
  To: Jesper Dangaard Brouer
  Cc: mst, netdev, john.fastabend, linux-kernel, virtualization,
	Alexei Starovoitov



On 2018年03月01日 18:35, Jesper Dangaard Brouer wrote:
> On Thu, 1 Mar 2018 17:23:37 +0800
> Jason Wang <jasowang@redhat.com> wrote:
>
>> On 2018年03月01日 17:10, Jesper Dangaard Brouer wrote:
>>> On Thu,  1 Mar 2018 11:19:03 +0800
>>> Jason Wang <jasowang@redhat.com> wrote:
>>>   
>>>> This series tries to re-enable XDP_REDIRECT for mergeable buffer which
>>>> was removed since commit 7324f5399b06 ("virtio_net: disable
>>>> XDP_REDIRECT in receive_mergeable() case"). Main concerns are:
>>>>
>>>> - not enough tailroom was reserved which breaks cpumap
>>> To address this at a more fundamental level, I would suggest that we/you
>>> instead extend XDP to know it's buffers "frame" size/end.  (The
>>> assumption use to be, xdp_buff->data_hard_start + PAGE_SIZE, but
>>> ixgbe+virtio_net broke that assumption).
>>>
>>> It should actually be fairly easy to implement:
>>>    * Simply extend xdp_buff with a "data_hard_end" pointer.
>> Right, and then cpumap can warn and drop packets with insufficient
>> tailroom.
>>
>> But it should be a patch on top of this I think.
> Hmmm, not really.  If we/you instead fix the issue of XDP doesn't know
> the end/size of the frame, then we don't need this mixed XDP
> generic/native code path mixing.

I know this but I'm still a little bit confused. According to the commit 
log of 7324f5399b06 ("virtio_net: disable XDP_REDIRECT in 
receive_mergeable() case"), you said:

"""
     The longer explaination is that receive_mergeable() tries to
     work-around and satisfy these XDP requiresments e.g. by having a
     function xdp_linearize_page() that allocates and memcpy RX buffers
     around (in case packet is scattered across multiple rx buffers).  This
     does currently satisfy XDP_PASS, XDP_DROP and XDP_TX (but only because
     we have not implemented bpf_xdp_adjust_tail yet).
"""

So I consider the tailroom is a must for the (future) tail adjustment.

>
> You could re-enable native redirect, and push the responsibility to
> cpumap for detecting this too-small frame "missing tailroom" (and avoid
> crashing...). (If we really want to support this, cpumap could fallback
> to dev_alloc_skb, and handle it gracefully).
>

Right but it will be slower than build_skb().

Thanks

_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH net-next 1/2] virtio-net: re enable XDP_REDIRECT for mergeable buffer
  2018-03-01  3:19 ` Jason Wang
@ 2018-03-01 13:36     ` Michael S. Tsirkin
  2018-03-01  8:41   ` Jesper Dangaard Brouer
  2018-03-01 13:36     ` Michael S. Tsirkin
  2 siblings, 0 replies; 36+ messages in thread
From: Michael S. Tsirkin @ 2018-03-01 13:36 UTC (permalink / raw)
  To: Jason Wang; +Cc: virtualization, netdev, linux-kernel, brouer, john.fastabend

On Thu, Mar 01, 2018 at 11:19:04AM +0800, Jason Wang wrote:
> XDP_REDIRECT support for mergeable buffer was removed since commit
> 7324f5399b06 ("virtio_net: disable XDP_REDIRECT in receive_mergeable()
> case"). This is because we don't reserve enough tailroom for struct
> skb_shared_info which breaks XDP assumption. Other complaints are, the
> complex linearize logic and EWMA estimation may increase the
> possibility of linearizing.
> 
> Signed-off-by: Jason Wang <jasowang@redhat.com>

How about reposting just this patch for net? It's pretty small
and this way we don't have broken redirect there.
Probably keeping the linearize in here to reduce the
amount of changes.

> ---
>  drivers/net/virtio_net.c | 107 +++++++++++++++++++++++++++++------------------
>  1 file changed, 67 insertions(+), 40 deletions(-)
> 
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index 9bb9e56..81190ba 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -537,6 +537,26 @@ static struct page *xdp_linearize_page(struct receive_queue *rq,
>  	return NULL;
>  }
>  
> +static struct sk_buff *virtnet_skb_xdp(struct receive_queue *rq,
> +				       struct sk_buff *skb)
> +{
> +	struct bpf_prog *xdp_prog;
> +	int ret;
> +
> +	rcu_read_lock();
> +	xdp_prog = rcu_dereference(rq->xdp_prog);
> +	if (xdp_prog) {
> +		ret = do_xdp_generic(xdp_prog, skb);
> +		if (ret != XDP_PASS) {
> +			rcu_read_unlock();
> +			return NULL;
> +		}
> +	}
> +	rcu_read_unlock();
> +
> +	return skb;
> +}
> +
>  static struct sk_buff *receive_small(struct net_device *dev,
>  				     struct virtnet_info *vi,
>  				     struct receive_queue *rq,
> @@ -689,31 +709,30 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
>  	struct bpf_prog *xdp_prog;
>  	unsigned int truesize;
>  	unsigned int headroom = mergeable_ctx_to_headroom(ctx);
> -	bool sent;
> +	bool sent, skb_xdp = false;
> +	int err;
>  
>  	head_skb = NULL;
>  
>  	rcu_read_lock();
>  	xdp_prog = rcu_dereference(rq->xdp_prog);
>  	if (xdp_prog) {
> -		struct page *xdp_page;
>  		struct xdp_buff xdp;
>  		void *data;
>  		u32 act;
>  
> -		/* This happens when rx buffer size is underestimated */
> +		/* This happens when rx buffer size is underestimated
> +		 * or headroom is not enough because of the buffer
> +		 * was refilled before XDP is set. In both cases,
> +		 * for simplicity, we will offload them to generic
> +		 * XDP routine. This should only happen for the first
> +		 * several packets, so we don't care much about its
> +		 * performance.
> +		 */
>  		if (unlikely(num_buf > 1 ||
>  			     headroom < virtnet_get_headroom(vi))) {
> -			/* linearize data for XDP */
> -			xdp_page = xdp_linearize_page(rq, &num_buf,
> -						      page, offset,
> -						      VIRTIO_XDP_HEADROOM,
> -						      &len);
> -			if (!xdp_page)
> -				goto err_xdp;
> -			offset = VIRTIO_XDP_HEADROOM;
> -		} else {
> -			xdp_page = page;
> +			skb_xdp = true;
> +			goto skb_xdp;
>  		}
>  
>  		/* Transient failure which in theory could occur if
> @@ -727,7 +746,7 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
>  		/* Allow consuming headroom but reserve enough space to push
>  		 * the descriptor on if we get an XDP_TX return code.
>  		 */
> -		data = page_address(xdp_page) + offset;
> +		data = page_address(page) + offset;
>  		xdp.data_hard_start = data - VIRTIO_XDP_HEADROOM + vi->hdr_len;
>  		xdp.data = data + vi->hdr_len;
>  		xdp_set_data_meta_invalid(&xdp);
> @@ -736,9 +755,6 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
>  
>  		act = bpf_prog_run_xdp(xdp_prog, &xdp);
>  
> -		if (act != XDP_PASS)
> -			ewma_pkt_len_add(&rq->mrg_avg_pkt_len, len);
> -
>  		switch (act) {
>  		case XDP_PASS:
>  			/* recalculate offset to account for any header
> @@ -746,28 +762,22 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
>  			 * skb and avoid using offset
>  			 */
>  			offset = xdp.data -
> -					page_address(xdp_page) - vi->hdr_len;
> -
> -			/* We can only create skb based on xdp_page. */
> -			if (unlikely(xdp_page != page)) {
> -				rcu_read_unlock();
> -				put_page(page);
> -				head_skb = page_to_skb(vi, rq, xdp_page,
> -						       offset, len, PAGE_SIZE);
> -				return head_skb;
> -			}
> +					page_address(page) - vi->hdr_len;
>  			break;
>  		case XDP_TX:
>  			sent = __virtnet_xdp_xmit(vi, &xdp);
>  			if (unlikely(!sent)) {
>  				trace_xdp_exception(vi->dev, xdp_prog, act);
> -				if (unlikely(xdp_page != page))
> -					put_page(xdp_page);
>  				goto err_xdp;
>  			}
>  			*xdp_xmit = true;
> -			if (unlikely(xdp_page != page))
> +			rcu_read_unlock();
> +			goto xdp_xmit;
> +		case XDP_REDIRECT:
> +			err = xdp_do_redirect(dev, &xdp, xdp_prog);
> +			if (err)
>  				goto err_xdp;
> +			*xdp_xmit = true;
>  			rcu_read_unlock();
>  			goto xdp_xmit;
>  		default:
> @@ -775,13 +785,12 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
>  		case XDP_ABORTED:
>  			trace_xdp_exception(vi->dev, xdp_prog, act);
>  		case XDP_DROP:
> -			if (unlikely(xdp_page != page))
> -				__free_pages(xdp_page, 0);
>  			goto err_xdp;
>  		}
>  	}
>  	rcu_read_unlock();
>  
> +skb_xdp:
>  	truesize = mergeable_ctx_to_truesize(ctx);
>  	if (unlikely(len > truesize)) {
>  		pr_debug("%s: rx error: len %u exceeds truesize %lu\n",
> @@ -848,7 +857,11 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
>  		}
>  	}
>  
> -	ewma_pkt_len_add(&rq->mrg_avg_pkt_len, head_skb->len);
> +	if (skb_xdp)
> +		head_skb = virtnet_skb_xdp(rq, head_skb);
> +	else
> +		ewma_pkt_len_add(&rq->mrg_avg_pkt_len, head_skb->len);
> +
>  	return head_skb;
>  
>  err_xdp:
> @@ -1013,13 +1026,18 @@ static int add_recvbuf_big(struct virtnet_info *vi, struct receive_queue *rq,
>  }
>  
>  static unsigned int get_mergeable_buf_len(struct receive_queue *rq,
> -					  struct ewma_pkt_len *avg_pkt_len)
> +					  struct ewma_pkt_len *avg_pkt_len,
> +					  unsigned int room)
>  {
>  	const size_t hdr_len = sizeof(struct virtio_net_hdr_mrg_rxbuf);
>  	unsigned int len;
>  
> -	len = hdr_len + clamp_t(unsigned int, ewma_pkt_len_read(avg_pkt_len),
> +	if (room)
> +		return PAGE_SIZE - room;
> +
> +	len = hdr_len +	clamp_t(unsigned int, ewma_pkt_len_read(avg_pkt_len),
>  				rq->min_buf_len, PAGE_SIZE - hdr_len);
> +
>  	return ALIGN(len, L1_CACHE_BYTES);
>  }
>  
> @@ -1028,21 +1046,27 @@ static int add_recvbuf_mergeable(struct virtnet_info *vi,
>  {
>  	struct page_frag *alloc_frag = &rq->alloc_frag;
>  	unsigned int headroom = virtnet_get_headroom(vi);
> +	unsigned int tailroom = headroom ? sizeof(struct skb_shared_info) : 0;
> +	unsigned int room = SKB_DATA_ALIGN(headroom + tailroom);
>  	char *buf;
>  	void *ctx;
>  	int err;
>  	unsigned int len, hole;
>  
> -	len = get_mergeable_buf_len(rq, &rq->mrg_avg_pkt_len);
> -	if (unlikely(!skb_page_frag_refill(len + headroom, alloc_frag, gfp)))
> +	/* Extra tailroom is needed to satisfy XDP's assumption. This
> +	 * means rx frags coalescing won't work, but consider we've
> +	 * disabled GSO for XDP, it won't be a big issue.
> +	 */
> +	len = get_mergeable_buf_len(rq, &rq->mrg_avg_pkt_len, room);
> +	if (unlikely(!skb_page_frag_refill(len + room, alloc_frag, gfp)))
>  		return -ENOMEM;
>  
>  	buf = (char *)page_address(alloc_frag->page) + alloc_frag->offset;
>  	buf += headroom; /* advance address leaving hole at front of pkt */
>  	get_page(alloc_frag->page);
> -	alloc_frag->offset += len + headroom;
> +	alloc_frag->offset += len + room;
>  	hole = alloc_frag->size - alloc_frag->offset;
> -	if (hole < len + headroom) {
> +	if (hole < len + room) {
>  		/* To avoid internal fragmentation, if there is very likely not
>  		 * enough space for another buffer, add the remaining space to
>  		 * the current buffer.
> @@ -2576,12 +2600,15 @@ static ssize_t mergeable_rx_buffer_size_show(struct netdev_rx_queue *queue,
>  {
>  	struct virtnet_info *vi = netdev_priv(queue->dev);
>  	unsigned int queue_index = get_netdev_rx_queue_index(queue);
> +	unsigned int headroom = virtnet_get_headroom(vi);
> +	unsigned int tailroom = headroom ? sizeof(struct skb_shared_info) : 0;
>  	struct ewma_pkt_len *avg;
>  
>  	BUG_ON(queue_index >= vi->max_queue_pairs);
>  	avg = &vi->rq[queue_index].mrg_avg_pkt_len;
>  	return sprintf(buf, "%u\n",
> -		       get_mergeable_buf_len(&vi->rq[queue_index], avg));
> +		       get_mergeable_buf_len(&vi->rq[queue_index], avg,
> +				       SKB_DATA_ALIGN(headroom + tailroom)));
>  }
>  
>  static struct rx_queue_attribute mergeable_rx_buffer_size_attribute =
> -- 
> 2.7.4

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH net-next 1/2] virtio-net: re enable XDP_REDIRECT for mergeable buffer
@ 2018-03-01 13:36     ` Michael S. Tsirkin
  0 siblings, 0 replies; 36+ messages in thread
From: Michael S. Tsirkin @ 2018-03-01 13:36 UTC (permalink / raw)
  To: Jason Wang; +Cc: netdev, brouer, john.fastabend, linux-kernel, virtualization

On Thu, Mar 01, 2018 at 11:19:04AM +0800, Jason Wang wrote:
> XDP_REDIRECT support for mergeable buffer was removed since commit
> 7324f5399b06 ("virtio_net: disable XDP_REDIRECT in receive_mergeable()
> case"). This is because we don't reserve enough tailroom for struct
> skb_shared_info which breaks XDP assumption. Other complaints are, the
> complex linearize logic and EWMA estimation may increase the
> possibility of linearizing.
> 
> Signed-off-by: Jason Wang <jasowang@redhat.com>

How about reposting just this patch for net? It's pretty small
and this way we don't have broken redirect there.
Probably keeping the linearize in here to reduce the
amount of changes.

> ---
>  drivers/net/virtio_net.c | 107 +++++++++++++++++++++++++++++------------------
>  1 file changed, 67 insertions(+), 40 deletions(-)
> 
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index 9bb9e56..81190ba 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -537,6 +537,26 @@ static struct page *xdp_linearize_page(struct receive_queue *rq,
>  	return NULL;
>  }
>  
> +static struct sk_buff *virtnet_skb_xdp(struct receive_queue *rq,
> +				       struct sk_buff *skb)
> +{
> +	struct bpf_prog *xdp_prog;
> +	int ret;
> +
> +	rcu_read_lock();
> +	xdp_prog = rcu_dereference(rq->xdp_prog);
> +	if (xdp_prog) {
> +		ret = do_xdp_generic(xdp_prog, skb);
> +		if (ret != XDP_PASS) {
> +			rcu_read_unlock();
> +			return NULL;
> +		}
> +	}
> +	rcu_read_unlock();
> +
> +	return skb;
> +}
> +
>  static struct sk_buff *receive_small(struct net_device *dev,
>  				     struct virtnet_info *vi,
>  				     struct receive_queue *rq,
> @@ -689,31 +709,30 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
>  	struct bpf_prog *xdp_prog;
>  	unsigned int truesize;
>  	unsigned int headroom = mergeable_ctx_to_headroom(ctx);
> -	bool sent;
> +	bool sent, skb_xdp = false;
> +	int err;
>  
>  	head_skb = NULL;
>  
>  	rcu_read_lock();
>  	xdp_prog = rcu_dereference(rq->xdp_prog);
>  	if (xdp_prog) {
> -		struct page *xdp_page;
>  		struct xdp_buff xdp;
>  		void *data;
>  		u32 act;
>  
> -		/* This happens when rx buffer size is underestimated */
> +		/* This happens when rx buffer size is underestimated
> +		 * or headroom is not enough because of the buffer
> +		 * was refilled before XDP is set. In both cases,
> +		 * for simplicity, we will offload them to generic
> +		 * XDP routine. This should only happen for the first
> +		 * several packets, so we don't care much about its
> +		 * performance.
> +		 */
>  		if (unlikely(num_buf > 1 ||
>  			     headroom < virtnet_get_headroom(vi))) {
> -			/* linearize data for XDP */
> -			xdp_page = xdp_linearize_page(rq, &num_buf,
> -						      page, offset,
> -						      VIRTIO_XDP_HEADROOM,
> -						      &len);
> -			if (!xdp_page)
> -				goto err_xdp;
> -			offset = VIRTIO_XDP_HEADROOM;
> -		} else {
> -			xdp_page = page;
> +			skb_xdp = true;
> +			goto skb_xdp;
>  		}
>  
>  		/* Transient failure which in theory could occur if
> @@ -727,7 +746,7 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
>  		/* Allow consuming headroom but reserve enough space to push
>  		 * the descriptor on if we get an XDP_TX return code.
>  		 */
> -		data = page_address(xdp_page) + offset;
> +		data = page_address(page) + offset;
>  		xdp.data_hard_start = data - VIRTIO_XDP_HEADROOM + vi->hdr_len;
>  		xdp.data = data + vi->hdr_len;
>  		xdp_set_data_meta_invalid(&xdp);
> @@ -736,9 +755,6 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
>  
>  		act = bpf_prog_run_xdp(xdp_prog, &xdp);
>  
> -		if (act != XDP_PASS)
> -			ewma_pkt_len_add(&rq->mrg_avg_pkt_len, len);
> -
>  		switch (act) {
>  		case XDP_PASS:
>  			/* recalculate offset to account for any header
> @@ -746,28 +762,22 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
>  			 * skb and avoid using offset
>  			 */
>  			offset = xdp.data -
> -					page_address(xdp_page) - vi->hdr_len;
> -
> -			/* We can only create skb based on xdp_page. */
> -			if (unlikely(xdp_page != page)) {
> -				rcu_read_unlock();
> -				put_page(page);
> -				head_skb = page_to_skb(vi, rq, xdp_page,
> -						       offset, len, PAGE_SIZE);
> -				return head_skb;
> -			}
> +					page_address(page) - vi->hdr_len;
>  			break;
>  		case XDP_TX:
>  			sent = __virtnet_xdp_xmit(vi, &xdp);
>  			if (unlikely(!sent)) {
>  				trace_xdp_exception(vi->dev, xdp_prog, act);
> -				if (unlikely(xdp_page != page))
> -					put_page(xdp_page);
>  				goto err_xdp;
>  			}
>  			*xdp_xmit = true;
> -			if (unlikely(xdp_page != page))
> +			rcu_read_unlock();
> +			goto xdp_xmit;
> +		case XDP_REDIRECT:
> +			err = xdp_do_redirect(dev, &xdp, xdp_prog);
> +			if (err)
>  				goto err_xdp;
> +			*xdp_xmit = true;
>  			rcu_read_unlock();
>  			goto xdp_xmit;
>  		default:
> @@ -775,13 +785,12 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
>  		case XDP_ABORTED:
>  			trace_xdp_exception(vi->dev, xdp_prog, act);
>  		case XDP_DROP:
> -			if (unlikely(xdp_page != page))
> -				__free_pages(xdp_page, 0);
>  			goto err_xdp;
>  		}
>  	}
>  	rcu_read_unlock();
>  
> +skb_xdp:
>  	truesize = mergeable_ctx_to_truesize(ctx);
>  	if (unlikely(len > truesize)) {
>  		pr_debug("%s: rx error: len %u exceeds truesize %lu\n",
> @@ -848,7 +857,11 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
>  		}
>  	}
>  
> -	ewma_pkt_len_add(&rq->mrg_avg_pkt_len, head_skb->len);
> +	if (skb_xdp)
> +		head_skb = virtnet_skb_xdp(rq, head_skb);
> +	else
> +		ewma_pkt_len_add(&rq->mrg_avg_pkt_len, head_skb->len);
> +
>  	return head_skb;
>  
>  err_xdp:
> @@ -1013,13 +1026,18 @@ static int add_recvbuf_big(struct virtnet_info *vi, struct receive_queue *rq,
>  }
>  
>  static unsigned int get_mergeable_buf_len(struct receive_queue *rq,
> -					  struct ewma_pkt_len *avg_pkt_len)
> +					  struct ewma_pkt_len *avg_pkt_len,
> +					  unsigned int room)
>  {
>  	const size_t hdr_len = sizeof(struct virtio_net_hdr_mrg_rxbuf);
>  	unsigned int len;
>  
> -	len = hdr_len + clamp_t(unsigned int, ewma_pkt_len_read(avg_pkt_len),
> +	if (room)
> +		return PAGE_SIZE - room;
> +
> +	len = hdr_len +	clamp_t(unsigned int, ewma_pkt_len_read(avg_pkt_len),
>  				rq->min_buf_len, PAGE_SIZE - hdr_len);
> +
>  	return ALIGN(len, L1_CACHE_BYTES);
>  }
>  
> @@ -1028,21 +1046,27 @@ static int add_recvbuf_mergeable(struct virtnet_info *vi,
>  {
>  	struct page_frag *alloc_frag = &rq->alloc_frag;
>  	unsigned int headroom = virtnet_get_headroom(vi);
> +	unsigned int tailroom = headroom ? sizeof(struct skb_shared_info) : 0;
> +	unsigned int room = SKB_DATA_ALIGN(headroom + tailroom);
>  	char *buf;
>  	void *ctx;
>  	int err;
>  	unsigned int len, hole;
>  
> -	len = get_mergeable_buf_len(rq, &rq->mrg_avg_pkt_len);
> -	if (unlikely(!skb_page_frag_refill(len + headroom, alloc_frag, gfp)))
> +	/* Extra tailroom is needed to satisfy XDP's assumption. This
> +	 * means rx frags coalescing won't work, but consider we've
> +	 * disabled GSO for XDP, it won't be a big issue.
> +	 */
> +	len = get_mergeable_buf_len(rq, &rq->mrg_avg_pkt_len, room);
> +	if (unlikely(!skb_page_frag_refill(len + room, alloc_frag, gfp)))
>  		return -ENOMEM;
>  
>  	buf = (char *)page_address(alloc_frag->page) + alloc_frag->offset;
>  	buf += headroom; /* advance address leaving hole at front of pkt */
>  	get_page(alloc_frag->page);
> -	alloc_frag->offset += len + headroom;
> +	alloc_frag->offset += len + room;
>  	hole = alloc_frag->size - alloc_frag->offset;
> -	if (hole < len + headroom) {
> +	if (hole < len + room) {
>  		/* To avoid internal fragmentation, if there is very likely not
>  		 * enough space for another buffer, add the remaining space to
>  		 * the current buffer.
> @@ -2576,12 +2600,15 @@ static ssize_t mergeable_rx_buffer_size_show(struct netdev_rx_queue *queue,
>  {
>  	struct virtnet_info *vi = netdev_priv(queue->dev);
>  	unsigned int queue_index = get_netdev_rx_queue_index(queue);
> +	unsigned int headroom = virtnet_get_headroom(vi);
> +	unsigned int tailroom = headroom ? sizeof(struct skb_shared_info) : 0;
>  	struct ewma_pkt_len *avg;
>  
>  	BUG_ON(queue_index >= vi->max_queue_pairs);
>  	avg = &vi->rq[queue_index].mrg_avg_pkt_len;
>  	return sprintf(buf, "%u\n",
> -		       get_mergeable_buf_len(&vi->rq[queue_index], avg));
> +		       get_mergeable_buf_len(&vi->rq[queue_index], avg,
> +				       SKB_DATA_ALIGN(headroom + tailroom)));
>  }
>  
>  static struct rx_queue_attribute mergeable_rx_buffer_size_attribute =
> -- 
> 2.7.4

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH net-next 0/2] virtio-net: re enable XDP_REDIRECT for mergeable buffer
  2018-03-01 10:35       ` Jesper Dangaard Brouer
@ 2018-03-01 13:40         ` Michael S. Tsirkin
  -1 siblings, 0 replies; 36+ messages in thread
From: Michael S. Tsirkin @ 2018-03-01 13:40 UTC (permalink / raw)
  To: Jesper Dangaard Brouer
  Cc: Jason Wang, virtualization, netdev, linux-kernel, john.fastabend,
	Alexei Starovoitov

On Thu, Mar 01, 2018 at 11:35:31AM +0100, Jesper Dangaard Brouer wrote:
> On Thu, 1 Mar 2018 17:23:37 +0800
> Jason Wang <jasowang@redhat.com> wrote:
> 
> > On 2018年03月01日 17:10, Jesper Dangaard Brouer wrote:
> > > On Thu,  1 Mar 2018 11:19:03 +0800
> > > Jason Wang <jasowang@redhat.com> wrote:
> > >  
> > >> This series tries to re-enable XDP_REDIRECT for mergeable buffer which
> > >> was removed since commit 7324f5399b06 ("virtio_net: disable
> > >> XDP_REDIRECT in receive_mergeable() case"). Main concerns are:
> > >>
> > >> - not enough tailroom was reserved which breaks cpumap  
> >
> > > To address this at a more fundamental level, I would suggest that we/you
> > > instead extend XDP to know it's buffers "frame" size/end.  (The
> > > assumption use to be, xdp_buff->data_hard_start + PAGE_SIZE, but
> > > ixgbe+virtio_net broke that assumption).
> > >
> > > It should actually be fairly easy to implement:
> > >   * Simply extend xdp_buff with a "data_hard_end" pointer.  
> > 
> > Right, and then cpumap can warn and drop packets with insufficient 
> > tailroom.
> >
> > But it should be a patch on top of this I think.
> 
> Hmmm, not really.  If we/you instead fix the issue of XDP doesn't know
> the end/size of the frame, then we don't need this mixed XDP
> generic/native code path mixing.
> 
> You could re-enable native redirect, and push the responsibility to
> cpumap for detecting this too-small frame "missing tailroom" (and avoid
> crashing...). (If we really want to support this, cpumap could fallback
> to dev_alloc_skb, and handle it gracefully).

Yea, we probably should.

However it's not nice that redirect is now gone in net.
IMHO a smaller version of patch 1/2 (without using generic code)
should go into net. tailroom tracking and fallback to dev_alloc_skb
can go into net-next.

> -- 
> Best regards,
>   Jesper Dangaard Brouer
>   MSc.CS, Principal Kernel Engineer at Red Hat
>   LinkedIn: http://www.linkedin.com/in/brouer

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH net-next 0/2] virtio-net: re enable XDP_REDIRECT for mergeable buffer
@ 2018-03-01 13:40         ` Michael S. Tsirkin
  0 siblings, 0 replies; 36+ messages in thread
From: Michael S. Tsirkin @ 2018-03-01 13:40 UTC (permalink / raw)
  To: Jesper Dangaard Brouer
  Cc: netdev, john.fastabend, linux-kernel, virtualization, Alexei Starovoitov

On Thu, Mar 01, 2018 at 11:35:31AM +0100, Jesper Dangaard Brouer wrote:
> On Thu, 1 Mar 2018 17:23:37 +0800
> Jason Wang <jasowang@redhat.com> wrote:
> 
> > On 2018年03月01日 17:10, Jesper Dangaard Brouer wrote:
> > > On Thu,  1 Mar 2018 11:19:03 +0800
> > > Jason Wang <jasowang@redhat.com> wrote:
> > >  
> > >> This series tries to re-enable XDP_REDIRECT for mergeable buffer which
> > >> was removed since commit 7324f5399b06 ("virtio_net: disable
> > >> XDP_REDIRECT in receive_mergeable() case"). Main concerns are:
> > >>
> > >> - not enough tailroom was reserved which breaks cpumap  
> >
> > > To address this at a more fundamental level, I would suggest that we/you
> > > instead extend XDP to know it's buffers "frame" size/end.  (The
> > > assumption use to be, xdp_buff->data_hard_start + PAGE_SIZE, but
> > > ixgbe+virtio_net broke that assumption).
> > >
> > > It should actually be fairly easy to implement:
> > >   * Simply extend xdp_buff with a "data_hard_end" pointer.  
> > 
> > Right, and then cpumap can warn and drop packets with insufficient 
> > tailroom.
> >
> > But it should be a patch on top of this I think.
> 
> Hmmm, not really.  If we/you instead fix the issue of XDP doesn't know
> the end/size of the frame, then we don't need this mixed XDP
> generic/native code path mixing.
> 
> You could re-enable native redirect, and push the responsibility to
> cpumap for detecting this too-small frame "missing tailroom" (and avoid
> crashing...). (If we really want to support this, cpumap could fallback
> to dev_alloc_skb, and handle it gracefully).

Yea, we probably should.

However it's not nice that redirect is now gone in net.
IMHO a smaller version of patch 1/2 (without using generic code)
should go into net. tailroom tracking and fallback to dev_alloc_skb
can go into net-next.

> -- 
> Best regards,
>   Jesper Dangaard Brouer
>   MSc.CS, Principal Kernel Engineer at Red Hat
>   LinkedIn: http://www.linkedin.com/in/brouer
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH net-next 0/2] virtio-net: re enable XDP_REDIRECT for mergeable buffer
  2018-03-01 13:15       ` Jason Wang
@ 2018-03-01 14:16           ` Jesper Dangaard Brouer
  0 siblings, 0 replies; 36+ messages in thread
From: Jesper Dangaard Brouer @ 2018-03-01 14:16 UTC (permalink / raw)
  To: Jason Wang
  Cc: mst, virtualization, netdev, linux-kernel, john.fastabend,
	Alexei Starovoitov, brouer

On Thu, 1 Mar 2018 21:15:36 +0800
Jason Wang <jasowang@redhat.com> wrote:

> On 2018年03月01日 18:35, Jesper Dangaard Brouer wrote:
> > On Thu, 1 Mar 2018 17:23:37 +0800
> > Jason Wang <jasowang@redhat.com> wrote:
> >  
> >> On 2018年03月01日 17:10, Jesper Dangaard Brouer wrote:  
> >>> On Thu,  1 Mar 2018 11:19:03 +0800
> >>> Jason Wang <jasowang@redhat.com> wrote:
> >>>     
> >>>> This series tries to re-enable XDP_REDIRECT for mergeable buffer which
> >>>> was removed since commit 7324f5399b06 ("virtio_net: disable
> >>>> XDP_REDIRECT in receive_mergeable() case"). Main concerns are:
> >>>>
> >>>> - not enough tailroom was reserved which breaks cpumap  
> >>> To address this at a more fundamental level, I would suggest that we/you
> >>> instead extend XDP to know it's buffers "frame" size/end.  (The
> >>> assumption use to be, xdp_buff->data_hard_start + PAGE_SIZE, but
> >>> ixgbe+virtio_net broke that assumption).
> >>>
> >>> It should actually be fairly easy to implement:
> >>>    * Simply extend xdp_buff with a "data_hard_end" pointer.  
> >> Right, and then cpumap can warn and drop packets with insufficient
> >> tailroom.
> >>
> >> But it should be a patch on top of this I think.  
> > Hmmm, not really.  If we/you instead fix the issue of XDP doesn't know
> > the end/size of the frame, then we don't need this mixed XDP
> > generic/native code path mixing.  
> 
> I know this but I'm still a little bit confused. According to the commit 
> log of 7324f5399b06 ("virtio_net: disable XDP_REDIRECT in 
> receive_mergeable() case"), you said:
> 
> """
>      The longer explaination is that receive_mergeable() tries to
>      work-around and satisfy these XDP requiresments e.g. by having a
>      function xdp_linearize_page() that allocates and memcpy RX buffers
>      around (in case packet is scattered across multiple rx buffers).  This
>      does currently satisfy XDP_PASS, XDP_DROP and XDP_TX (but only because
>      we have not implemented bpf_xdp_adjust_tail yet).
> """
> 
> So I consider the tailroom is a must for the (future) tail adjustment.

That is true, BUT implementing the "data_hard_end" extension is a
pre-requisite.  It will also be to catch the issue of too little
tail-room if/when implementing bpf_xdp_adjust_tail().

It is of-cause a "nice-to-have", to fix this virtio_net driver's
receive_mergeable() call to have enough tail-room, but I don't see it
as a solution to the fundamental problem.


> > You could re-enable native redirect, and push the responsibility to
> > cpumap for detecting this too-small frame "missing tailroom" (and avoid
> > crashing...). (If we really want to support this, cpumap could fallback
> > to dev_alloc_skb, and handle it gracefully).
> >  
> 
> Right but it will be slower than build_skb().

True, but bad argument in this context, as you are already using a
similar function call napi_alloc_skb().  And it will be even slower to
call generic-XDP code path.

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH net-next 0/2] virtio-net: re enable XDP_REDIRECT for mergeable buffer
@ 2018-03-01 14:16           ` Jesper Dangaard Brouer
  0 siblings, 0 replies; 36+ messages in thread
From: Jesper Dangaard Brouer @ 2018-03-01 14:16 UTC (permalink / raw)
  To: Jason Wang
  Cc: mst, netdev, john.fastabend, linux-kernel, virtualization,
	brouer, Alexei Starovoitov

On Thu, 1 Mar 2018 21:15:36 +0800
Jason Wang <jasowang@redhat.com> wrote:

> On 2018年03月01日 18:35, Jesper Dangaard Brouer wrote:
> > On Thu, 1 Mar 2018 17:23:37 +0800
> > Jason Wang <jasowang@redhat.com> wrote:
> >  
> >> On 2018年03月01日 17:10, Jesper Dangaard Brouer wrote:  
> >>> On Thu,  1 Mar 2018 11:19:03 +0800
> >>> Jason Wang <jasowang@redhat.com> wrote:
> >>>     
> >>>> This series tries to re-enable XDP_REDIRECT for mergeable buffer which
> >>>> was removed since commit 7324f5399b06 ("virtio_net: disable
> >>>> XDP_REDIRECT in receive_mergeable() case"). Main concerns are:
> >>>>
> >>>> - not enough tailroom was reserved which breaks cpumap  
> >>> To address this at a more fundamental level, I would suggest that we/you
> >>> instead extend XDP to know it's buffers "frame" size/end.  (The
> >>> assumption use to be, xdp_buff->data_hard_start + PAGE_SIZE, but
> >>> ixgbe+virtio_net broke that assumption).
> >>>
> >>> It should actually be fairly easy to implement:
> >>>    * Simply extend xdp_buff with a "data_hard_end" pointer.  
> >> Right, and then cpumap can warn and drop packets with insufficient
> >> tailroom.
> >>
> >> But it should be a patch on top of this I think.  
> > Hmmm, not really.  If we/you instead fix the issue of XDP doesn't know
> > the end/size of the frame, then we don't need this mixed XDP
> > generic/native code path mixing.  
> 
> I know this but I'm still a little bit confused. According to the commit 
> log of 7324f5399b06 ("virtio_net: disable XDP_REDIRECT in 
> receive_mergeable() case"), you said:
> 
> """
>      The longer explaination is that receive_mergeable() tries to
>      work-around and satisfy these XDP requiresments e.g. by having a
>      function xdp_linearize_page() that allocates and memcpy RX buffers
>      around (in case packet is scattered across multiple rx buffers).  This
>      does currently satisfy XDP_PASS, XDP_DROP and XDP_TX (but only because
>      we have not implemented bpf_xdp_adjust_tail yet).
> """
> 
> So I consider the tailroom is a must for the (future) tail adjustment.

That is true, BUT implementing the "data_hard_end" extension is a
pre-requisite.  It will also be to catch the issue of too little
tail-room if/when implementing bpf_xdp_adjust_tail().

It is of-cause a "nice-to-have", to fix this virtio_net driver's
receive_mergeable() call to have enough tail-room, but I don't see it
as a solution to the fundamental problem.


> > You could re-enable native redirect, and push the responsibility to
> > cpumap for detecting this too-small frame "missing tailroom" (and avoid
> > crashing...). (If we really want to support this, cpumap could fallback
> > to dev_alloc_skb, and handle it gracefully).
> >  
> 
> Right but it will be slower than build_skb().

True, but bad argument in this context, as you are already using a
similar function call napi_alloc_skb().  And it will be even slower to
call generic-XDP code path.

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer
_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH net-next 0/2] virtio-net: re enable XDP_REDIRECT for mergeable buffer
  2018-03-01 14:16           ` Jesper Dangaard Brouer
@ 2018-03-02  4:17             ` Jason Wang
  -1 siblings, 0 replies; 36+ messages in thread
From: Jason Wang @ 2018-03-02  4:17 UTC (permalink / raw)
  To: Jesper Dangaard Brouer
  Cc: mst, virtualization, netdev, linux-kernel, john.fastabend,
	Alexei Starovoitov



On 2018年03月01日 22:16, Jesper Dangaard Brouer wrote:
> On Thu, 1 Mar 2018 21:15:36 +0800
> Jason Wang <jasowang@redhat.com> wrote:
>
>> On 2018年03月01日 18:35, Jesper Dangaard Brouer wrote:
>>> On Thu, 1 Mar 2018 17:23:37 +0800
>>> Jason Wang <jasowang@redhat.com> wrote:
>>>   
>>>> On 2018年03月01日 17:10, Jesper Dangaard Brouer wrote:
>>>>> On Thu,  1 Mar 2018 11:19:03 +0800
>>>>> Jason Wang <jasowang@redhat.com> wrote:
>>>>>      
>>>>>> This series tries to re-enable XDP_REDIRECT for mergeable buffer which
>>>>>> was removed since commit 7324f5399b06 ("virtio_net: disable
>>>>>> XDP_REDIRECT in receive_mergeable() case"). Main concerns are:
>>>>>>
>>>>>> - not enough tailroom was reserved which breaks cpumap
>>>>> To address this at a more fundamental level, I would suggest that we/you
>>>>> instead extend XDP to know it's buffers "frame" size/end.  (The
>>>>> assumption use to be, xdp_buff->data_hard_start + PAGE_SIZE, but
>>>>> ixgbe+virtio_net broke that assumption).
>>>>>
>>>>> It should actually be fairly easy to implement:
>>>>>     * Simply extend xdp_buff with a "data_hard_end" pointer.
>>>> Right, and then cpumap can warn and drop packets with insufficient
>>>> tailroom.
>>>>
>>>> But it should be a patch on top of this I think.
>>> Hmmm, not really.  If we/you instead fix the issue of XDP doesn't know
>>> the end/size of the frame, then we don't need this mixed XDP
>>> generic/native code path mixing.
>> I know this but I'm still a little bit confused. According to the commit
>> log of 7324f5399b06 ("virtio_net: disable XDP_REDIRECT in
>> receive_mergeable() case"), you said:
>>
>> """
>>       The longer explaination is that receive_mergeable() tries to
>>       work-around and satisfy these XDP requiresments e.g. by having a
>>       function xdp_linearize_page() that allocates and memcpy RX buffers
>>       around (in case packet is scattered across multiple rx buffers).  This
>>       does currently satisfy XDP_PASS, XDP_DROP and XDP_TX (but only because
>>       we have not implemented bpf_xdp_adjust_tail yet).
>> """
>>
>> So I consider the tailroom is a must for the (future) tail adjustment.
> That is true, BUT implementing the "data_hard_end" extension is a
> pre-requisite.  It will also be to catch the issue of too little
> tail-room if/when implementing bpf_xdp_adjust_tail().
>
> It is of-cause a "nice-to-have", to fix this virtio_net driver's
> receive_mergeable() call to have enough tail-room, but I don't see it
> as a solution to the fundamental problem.
>
>
>>> You could re-enable native redirect, and push the responsibility to
>>> cpumap for detecting this too-small frame "missing tailroom" (and avoid
>>> crashing...). (If we really want to support this, cpumap could fallback
>>> to dev_alloc_skb, and handle it gracefully).
>>>   
>> Right but it will be slower than build_skb().
> True, but bad argument in this context, as you are already using a
> similar function call napi_alloc_skb().  And it will be even slower to
> call generic-XDP code path.
>

Well, there's no generic skb implementation for cpumap redirection so I 
think we're talking about native XDP for cpumap, In this case, we won't 
even use napi_alloc_skb().

Thanks

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH net-next 0/2] virtio-net: re enable XDP_REDIRECT for mergeable buffer
@ 2018-03-02  4:17             ` Jason Wang
  0 siblings, 0 replies; 36+ messages in thread
From: Jason Wang @ 2018-03-02  4:17 UTC (permalink / raw)
  To: Jesper Dangaard Brouer
  Cc: mst, netdev, john.fastabend, linux-kernel, virtualization,
	Alexei Starovoitov



On 2018年03月01日 22:16, Jesper Dangaard Brouer wrote:
> On Thu, 1 Mar 2018 21:15:36 +0800
> Jason Wang <jasowang@redhat.com> wrote:
>
>> On 2018年03月01日 18:35, Jesper Dangaard Brouer wrote:
>>> On Thu, 1 Mar 2018 17:23:37 +0800
>>> Jason Wang <jasowang@redhat.com> wrote:
>>>   
>>>> On 2018年03月01日 17:10, Jesper Dangaard Brouer wrote:
>>>>> On Thu,  1 Mar 2018 11:19:03 +0800
>>>>> Jason Wang <jasowang@redhat.com> wrote:
>>>>>      
>>>>>> This series tries to re-enable XDP_REDIRECT for mergeable buffer which
>>>>>> was removed since commit 7324f5399b06 ("virtio_net: disable
>>>>>> XDP_REDIRECT in receive_mergeable() case"). Main concerns are:
>>>>>>
>>>>>> - not enough tailroom was reserved which breaks cpumap
>>>>> To address this at a more fundamental level, I would suggest that we/you
>>>>> instead extend XDP to know it's buffers "frame" size/end.  (The
>>>>> assumption use to be, xdp_buff->data_hard_start + PAGE_SIZE, but
>>>>> ixgbe+virtio_net broke that assumption).
>>>>>
>>>>> It should actually be fairly easy to implement:
>>>>>     * Simply extend xdp_buff with a "data_hard_end" pointer.
>>>> Right, and then cpumap can warn and drop packets with insufficient
>>>> tailroom.
>>>>
>>>> But it should be a patch on top of this I think.
>>> Hmmm, not really.  If we/you instead fix the issue of XDP doesn't know
>>> the end/size of the frame, then we don't need this mixed XDP
>>> generic/native code path mixing.
>> I know this but I'm still a little bit confused. According to the commit
>> log of 7324f5399b06 ("virtio_net: disable XDP_REDIRECT in
>> receive_mergeable() case"), you said:
>>
>> """
>>       The longer explaination is that receive_mergeable() tries to
>>       work-around and satisfy these XDP requiresments e.g. by having a
>>       function xdp_linearize_page() that allocates and memcpy RX buffers
>>       around (in case packet is scattered across multiple rx buffers).  This
>>       does currently satisfy XDP_PASS, XDP_DROP and XDP_TX (but only because
>>       we have not implemented bpf_xdp_adjust_tail yet).
>> """
>>
>> So I consider the tailroom is a must for the (future) tail adjustment.
> That is true, BUT implementing the "data_hard_end" extension is a
> pre-requisite.  It will also be to catch the issue of too little
> tail-room if/when implementing bpf_xdp_adjust_tail().
>
> It is of-cause a "nice-to-have", to fix this virtio_net driver's
> receive_mergeable() call to have enough tail-room, but I don't see it
> as a solution to the fundamental problem.
>
>
>>> You could re-enable native redirect, and push the responsibility to
>>> cpumap for detecting this too-small frame "missing tailroom" (and avoid
>>> crashing...). (If we really want to support this, cpumap could fallback
>>> to dev_alloc_skb, and handle it gracefully).
>>>   
>> Right but it will be slower than build_skb().
> True, but bad argument in this context, as you are already using a
> similar function call napi_alloc_skb().  And it will be even slower to
> call generic-XDP code path.
>

Well, there's no generic skb implementation for cpumap redirection so I 
think we're talking about native XDP for cpumap, In this case, we won't 
even use napi_alloc_skb().

Thanks

_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH net-next 1/2] virtio-net: re enable XDP_REDIRECT for mergeable buffer
  2018-03-01 13:36     ` Michael S. Tsirkin
@ 2018-03-02  4:20       ` Jason Wang
  -1 siblings, 0 replies; 36+ messages in thread
From: Jason Wang @ 2018-03-02  4:20 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: virtualization, netdev, linux-kernel, brouer, john.fastabend



On 2018年03月01日 21:36, Michael S. Tsirkin wrote:
> On Thu, Mar 01, 2018 at 11:19:04AM +0800, Jason Wang wrote:
>> XDP_REDIRECT support for mergeable buffer was removed since commit
>> 7324f5399b06 ("virtio_net: disable XDP_REDIRECT in receive_mergeable()
>> case"). This is because we don't reserve enough tailroom for struct
>> skb_shared_info which breaks XDP assumption. Other complaints are, the
>> complex linearize logic and EWMA estimation may increase the
>> possibility of linearizing.
>>
>> Signed-off-by: Jason Wang<jasowang@redhat.com>
> How about reposting just this patch for net? It's pretty small
> and this way we don't have broken redirect there.
> Probably keeping the linearize in here to reduce the
> amount of changes.
>

Looks possible, let me post a version for net.

Thanks

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH net-next 1/2] virtio-net: re enable XDP_REDIRECT for mergeable buffer
@ 2018-03-02  4:20       ` Jason Wang
  0 siblings, 0 replies; 36+ messages in thread
From: Jason Wang @ 2018-03-02  4:20 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: netdev, brouer, john.fastabend, linux-kernel, virtualization



On 2018年03月01日 21:36, Michael S. Tsirkin wrote:
> On Thu, Mar 01, 2018 at 11:19:04AM +0800, Jason Wang wrote:
>> XDP_REDIRECT support for mergeable buffer was removed since commit
>> 7324f5399b06 ("virtio_net: disable XDP_REDIRECT in receive_mergeable()
>> case"). This is because we don't reserve enough tailroom for struct
>> skb_shared_info which breaks XDP assumption. Other complaints are, the
>> complex linearize logic and EWMA estimation may increase the
>> possibility of linearizing.
>>
>> Signed-off-by: Jason Wang<jasowang@redhat.com>
> How about reposting just this patch for net? It's pretty small
> and this way we don't have broken redirect there.
> Probably keeping the linearize in here to reduce the
> amount of changes.
>

Looks possible, let me post a version for net.

Thanks

_______________________________________________
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

^ permalink raw reply	[flat|nested] 36+ messages in thread

end of thread, other threads:[~2018-03-02  4:21 UTC | newest]

Thread overview: 36+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-03-01  3:19 [PATCH net-next 0/2] virtio-net: re enable XDP_REDIRECT for mergeable buffer Jason Wang
2018-03-01  3:19 ` Jason Wang
2018-03-01  3:19 ` [PATCH net-next 1/2] " Jason Wang
2018-03-01  3:19 ` Jason Wang
2018-03-01  8:41   ` Jesper Dangaard Brouer
2018-03-01  8:41   ` Jesper Dangaard Brouer
2018-03-01  9:11     ` Jason Wang
2018-03-01  9:11       ` Jason Wang
2018-03-01 13:36   ` Michael S. Tsirkin
2018-03-01 13:36     ` Michael S. Tsirkin
2018-03-02  4:20     ` Jason Wang
2018-03-02  4:20       ` Jason Wang
2018-03-01  3:19 ` [PATCH net-next 2/2] virtio-net: simplify XDP handling in small buffer Jason Wang
2018-03-01  8:02   ` Jesper Dangaard Brouer
2018-03-01  8:02   ` Jesper Dangaard Brouer
2018-03-01  8:49     ` Jason Wang
2018-03-01  9:15       ` Jesper Dangaard Brouer
2018-03-01  9:15         ` Jesper Dangaard Brouer
2018-03-01  9:24         ` Jason Wang
2018-03-01  9:24         ` Jason Wang
2018-03-01  8:49     ` Jason Wang
2018-03-01  3:19 ` Jason Wang
2018-03-01  9:10 ` [PATCH net-next 0/2] virtio-net: re enable XDP_REDIRECT for mergeable buffer Jesper Dangaard Brouer
2018-03-01  9:10   ` Jesper Dangaard Brouer
2018-03-01  9:23   ` Jason Wang
2018-03-01  9:23   ` Jason Wang
2018-03-01 10:35     ` Jesper Dangaard Brouer
2018-03-01 10:35       ` Jesper Dangaard Brouer
2018-03-01 13:15       ` Jason Wang
2018-03-01 14:16         ` Jesper Dangaard Brouer
2018-03-01 14:16           ` Jesper Dangaard Brouer
2018-03-02  4:17           ` Jason Wang
2018-03-02  4:17             ` Jason Wang
2018-03-01 13:15       ` Jason Wang
2018-03-01 13:40       ` Michael S. Tsirkin
2018-03-01 13:40         ` Michael S. Tsirkin

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.