From: "Michael S. Tsirkin" <mst@redhat.com>
To: John Fastabend <john.fastabend@gmail.com>
Cc: jasowang@redhat.com, john.r.fastabend@intel.com,
	netdev@vger.kernel.org, alexei.starovoitov@gmail.com,
	daniel@iogearbox.net
Subject: Re: [net PATCH v5 6/6] virtio_net: XDP support for adjust_head
Date: Tue, 24 Jan 2017 00:28:53 +0200	[thread overview]
Message-ID: <20170124002828-mutt-send-email-mst@kernel.org> (raw)
In-Reply-To: <58867FDF.4070708@gmail.com>

On Mon, Jan 23, 2017 at 02:12:47PM -0800, John Fastabend wrote:
> On 17-01-23 12:09 PM, Michael S. Tsirkin wrote:
> > On Mon, Jan 23, 2017 at 09:22:36PM +0200, Michael S. Tsirkin wrote:
> >> On Tue, Jan 17, 2017 at 02:22:59PM -0800, John Fastabend wrote:
> >>> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> >>> index 62dbf4b..3b129b4 100644
> >>> --- a/drivers/net/virtio_net.c
> >>> +++ b/drivers/net/virtio_net.c
> >>> @@ -41,6 +41,9 @@
> >>>  #define GOOD_PACKET_LEN (ETH_HLEN + VLAN_HLEN + ETH_DATA_LEN)
> >>>  #define GOOD_COPY_LEN	128
> >>>  
> >>> +/* Amount of XDP headroom to prepend to packets for use by xdp_adjust_head */
> >>> +#define VIRTIO_XDP_HEADROOM 256
> >>> +
> >>>  /* RX packet size EWMA. The average packet size is used to determine the packet
> >>>   * buffer size when refilling RX rings. As the entire RX ring may be refilled
> >>>   * at once, the weight is chosen so that the EWMA will be insensitive to short-
> >>
> >> I wonder where this number comes from?  It is quite a lot, and it
> >> means that using XDP_PASS will slow down any sockets on top of it.
> >> Which in turn means people will try to remove XDP when not in use,
> >> causing resets.  E.g. build_skb (which I have a patch to switch to) uses
> >> a much more reasonable NET_SKB_PAD.
> 
> I just used the value Alexei (or someone?) came up with. I think it needs to be
> large enough to avoid a copy in header encapsulation cases. So the minimum is
> 
>   VXLAN_HDR + OUTER_UDP + OUTER_IPV6_HDR + OUTER_MAC =
>      8      +     8     +        40      +      14   =  70
> 
> The choice of the VXLAN header was somewhat arbitrary, but it seems good for
> estimates. For what it's worth, there is also ndo_set_rx_headroom; could we use
> that to set it and choose a reasonable default?
> 
> >>
> >> -- 
> >> MST
> > 
> > 
> > Let me show you a patch that I've been cooking.  What is missing there
> > is handling of corner cases, e.g. when the ring size is ~4 entries,
> > where using smaller buffers might mean we no longer have enough space
> > to store a full packet.  So it looks like I have to maintain the skb
> > copy path for this hardware.
> > 
> > With this patch, standard configuration has NET_SKB_PAD + NET_IP_ALIGN
> > bytes head padding. Would this be enough for XDP? If yes we do not
> > need the resets.
> 
> Based on the above, that seems a bit small (L1_CACHE_BYTES + 2)? How tricky
> would it be to add support for ndo_set_rx_headroom?

Dunno, but then what? Expose it to userspace and let the admin
make the decision for us?


> > 
> > Thoughts?
> 
> I'll take a look at the patch this afternoon. Thanks.
> 
> > 
> > --->
> > 
> > virtio_net: switch to build_skb for mrg_rxbuf
> > 
> > For small packets data copy was observed to
> > take up about 15% CPU time. Switch to build_skb
> > and avoid the copy when using mergeable rx buffers.
> > 
> > As a bonus, medium-size skbs that fit in a page will be
> > completely linear.
> > 
> > Of course, we now need to lower the lower bound on packet size
> > to make sure a sane number of skbs fits in the rx socket buffer.
> > By how much? I don't know yet.
> > 
> > It might also be useful to prefetch the packet buffer since
> > net stack will likely use it soon.
> > 
> > Lightly tested; in particular, I have not yet tested what this
> > actually does to performance. Sending this out for early
> > feedback/flames.
> > 
> > TODO: it appears that Linux won't correctly handle the case of the first
> > buffer being very small (or consisting exclusively of the virtio header).
> > This is already the case with the current code; it needs to be fixed.
> > TODO: might be unfair to the last packet in a fragment as we include
> > remaining space if any in its truesize.
> > 
> > Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> > 
> > ----
> > 
> > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > index b425fa1..a6b996f 100644
> > --- a/drivers/net/virtio_net.c
> > +++ b/drivers/net/virtio_net.c
> > @@ -38,6 +38,8 @@ module_param(gso, bool, 0444);
> >  
> >  /* FIXME: MTU in config. */
> >  #define GOOD_PACKET_LEN (ETH_HLEN + VLAN_HLEN + ETH_DATA_LEN)
> > +//#define MIN_PACKET_ALLOC GOOD_PACKET_LEN
> > +#define MIN_PACKET_ALLOC 128
> >  #define GOOD_COPY_LEN	128
> >  
> >  /* RX packet size EWMA. The average packet size is used to determine the packet
> > @@ -246,6 +248,9 @@ static void *mergeable_ctx_to_buf_address(unsigned long mrg_ctx)
> >  static unsigned long mergeable_buf_to_ctx(void *buf, unsigned int truesize)
> >  {
> >  	unsigned int size = truesize / MERGEABLE_BUFFER_ALIGN;
> > +
> > +	BUG_ON((unsigned long)buf & (MERGEABLE_BUFFER_ALIGN - 1));
> > +	BUG_ON(size - 1 >= MERGEABLE_BUFFER_ALIGN);
> >  	return (unsigned long)buf | (size - 1);
> >  }
> >  
> > @@ -354,25 +359,54 @@ static struct sk_buff *receive_big(struct net_device *dev,
> >  	return NULL;
> >  }
> >  
> > +#define VNET_SKB_PAD (NET_SKB_PAD + NET_IP_ALIGN)
> > +#define VNET_SKB_BUG (VNET_SKB_PAD < sizeof(struct virtio_net_hdr_mrg_rxbuf))
> > +#define VNET_SKB_LEN(len) ((len) - sizeof(struct virtio_net_hdr_mrg_rxbuf))
> > +#define VNET_SKB_OFF VNET_SKB_LEN(VNET_SKB_PAD)
> > +
> > +static struct sk_buff *vnet_build_skb(struct virtnet_info *vi,
> > +				      void *buf,
> > +				      unsigned int len, unsigned int truesize)
> > +{
> > +	struct sk_buff *skb = build_skb(buf, truesize);
> > +
> > +	if (!skb)
> > +		return NULL;
> > +
> > +	skb_reserve(skb, VNET_SKB_PAD);
> > +	skb_put(skb, VNET_SKB_LEN(len));
> > +
> > +	return skb;
> > +}
> > +
> >  static struct sk_buff *receive_mergeable(struct net_device *dev,
> >  					 struct virtnet_info *vi,
> >  					 struct receive_queue *rq,
> >  					 unsigned long ctx,
> > -					 unsigned int len)
> > +					 unsigned int len,
> > +					 struct virtio_net_hdr_mrg_rxbuf *hdr)
> >  {
> >  	void *buf = mergeable_ctx_to_buf_address(ctx);
> > -	struct virtio_net_hdr_mrg_rxbuf *hdr = buf;
> > -	u16 num_buf = virtio16_to_cpu(vi->vdev, hdr->num_buffers);
> > +	u16 num_buf;
> >  	struct page *page = virt_to_head_page(buf);
> > -	int offset = buf - page_address(page);
> > -	unsigned int truesize = max(len, mergeable_ctx_to_buf_truesize(ctx));
> > +	unsigned int truesize = mergeable_ctx_to_buf_truesize(ctx);
> > +	int offset;
> > +	struct sk_buff *head_skb;
> > +	struct sk_buff *curr_skb;
> > +
> > +	BUG_ON(len > truesize);
> >  
> > -	struct sk_buff *head_skb = page_to_skb(vi, rq, page, offset, len,
> > -					       truesize);
> > -	struct sk_buff *curr_skb = head_skb;
> > +	/* copy header: build_skb will overwrite it */
> > +	memcpy(hdr, buf + VNET_SKB_OFF, sizeof *hdr);
> > +
> > +	head_skb = vnet_build_skb(vi, buf, len, truesize);
> > +	curr_skb = head_skb;
> > +
> > +	num_buf = virtio16_to_cpu(vi->vdev, hdr->num_buffers);
> >  
> >  	if (unlikely(!curr_skb))
> >  		goto err_skb;
> > +
> >  	while (--num_buf) {
> >  		int num_skb_frags;
> >  
> > @@ -386,7 +420,7 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
> >  			goto err_buf;
> >  		}
> >  
> > -		buf = mergeable_ctx_to_buf_address(ctx);
> > +		buf = mergeable_ctx_to_buf_address(ctx) + VNET_SKB_OFF;
> >  		page = virt_to_head_page(buf);
> >  
> >  		num_skb_frags = skb_shinfo(curr_skb)->nr_frags;
> > @@ -403,7 +437,8 @@ static struct sk_buff *receive_mergeable(struct net_device *dev,
> >  			head_skb->truesize += nskb->truesize;
> >  			num_skb_frags = 0;
> >  		}
> > -		truesize = max(len, mergeable_ctx_to_buf_truesize(ctx));
> > +		truesize = mergeable_ctx_to_buf_truesize(ctx);
> > +		BUG_ON(len > truesize);
> >  		if (curr_skb != head_skb) {
> >  			head_skb->data_len += len;
> >  			head_skb->len += len;
> > @@ -449,6 +484,7 @@ static void receive_buf(struct virtnet_info *vi, struct receive_queue *rq,
> >  	struct virtnet_stats *stats = this_cpu_ptr(vi->stats);
> >  	struct sk_buff *skb;
> >  	struct virtio_net_hdr_mrg_rxbuf *hdr;
> > +	struct virtio_net_hdr_mrg_rxbuf hdr0;
> >  
> >  	if (unlikely(len < vi->hdr_len + ETH_HLEN)) {
> >  		pr_debug("%s: short packet %i\n", dev->name, len);
> > @@ -465,17 +501,24 @@ static void receive_buf(struct virtnet_info *vi, struct receive_queue *rq,
> >  		return;
> >  	}
> >  
> > -	if (vi->mergeable_rx_bufs)
> > -		skb = receive_mergeable(dev, vi, rq, (unsigned long)buf, len);
> > -	else if (vi->big_packets)
> > +	if (vi->mergeable_rx_bufs) {
> > +		skb = receive_mergeable(dev, vi, rq, (unsigned long)buf, len,
> > +					&hdr0);
> > +		if (unlikely(!skb))
> > +			return;
> > +		hdr = &hdr0;
> > +	} else if (vi->big_packets) {
> >  		skb = receive_big(dev, vi, rq, buf, len);
> > -	else
> > +		if (unlikely(!skb))
> > +			return;
> > +		hdr = skb_vnet_hdr(skb);
> > +	} else {
> >  		skb = receive_small(vi, buf, len);
> > +		if (unlikely(!skb))
> > +			return;
> > +		hdr = skb_vnet_hdr(skb);
> > +	}
> >  
> > -	if (unlikely(!skb))
> > -		return;
> > -
> > -	hdr = skb_vnet_hdr(skb);
> >  
> >  	u64_stats_update_begin(&stats->rx_syncp);
> >  	stats->rx_bytes += skb->len;
> > @@ -581,11 +624,14 @@ static int add_recvbuf_big(struct virtnet_info *vi, struct receive_queue *rq,
> >  
> >  static unsigned int get_mergeable_buf_len(struct ewma_pkt_len *avg_pkt_len)
> >  {
> > -	const size_t hdr_len = sizeof(struct virtio_net_hdr_mrg_rxbuf);
> > +	unsigned int hdr;
> >  	unsigned int len;
> >  
> > -	len = hdr_len + clamp_t(unsigned int, ewma_pkt_len_read(avg_pkt_len),
> > -			GOOD_PACKET_LEN, PAGE_SIZE - hdr_len);
> > +	hdr = ALIGN(VNET_SKB_PAD + SKB_DATA_ALIGN(sizeof(struct skb_shared_info)),
> > +		    MERGEABLE_BUFFER_ALIGN);
> > +
> > +	len = hdr + clamp_t(unsigned int, ewma_pkt_len_read(avg_pkt_len),
> > +			    MIN_PACKET_ALLOC, PAGE_SIZE - hdr);
> >  	return ALIGN(len, MERGEABLE_BUFFER_ALIGN);
> >  }
> >  
> > @@ -601,8 +647,11 @@ static int add_recvbuf_mergeable(struct receive_queue *rq, gfp_t gfp)
> >  	if (unlikely(!skb_page_frag_refill(len, alloc_frag, gfp)))
> >  		return -ENOMEM;
> >  
> > -	buf = (char *)page_address(alloc_frag->page) + alloc_frag->offset;
> > -	ctx = mergeable_buf_to_ctx(buf, len);
> > +	BUILD_BUG_ON(VNET_SKB_BUG);
> > +
> > +	buf = (char *)page_address(alloc_frag->page) + alloc_frag->offset +
> > +		VNET_SKB_OFF;
> > +	//ctx = mergeable_buf_to_ctx(buf - VNET_SKB_OFF, len);
> >  	get_page(alloc_frag->page);
> >  	alloc_frag->offset += len;
> >  	hole = alloc_frag->size - alloc_frag->offset;
> > @@ -615,8 +664,10 @@ static int add_recvbuf_mergeable(struct receive_queue *rq, gfp_t gfp)
> >  		len += hole;
> >  		alloc_frag->offset += hole;
> >  	}
> > +	ctx = mergeable_buf_to_ctx(buf - VNET_SKB_OFF, len);
> >  
> > -	sg_init_one(rq->sg, buf, len);
> > +	sg_init_one(rq->sg, buf,
> > +		    len - VNET_SKB_OFF - SKB_DATA_ALIGN(sizeof(struct skb_shared_info)));
> >  	err = virtqueue_add_inbuf(rq->vq, rq->sg, 1, (void *)ctx, gfp);
> >  	if (err < 0)
> >  		put_page(virt_to_head_page(buf));
> > 

Thread overview: 40+ messages
2017-01-17 22:19 [net PATCH v5 0/6] virtio_net XDP fixes and adjust_header support John Fastabend
2017-01-17 22:19 ` [net PATCH v5 1/6] virtio_net: use dev_kfree_skb for small buffer XDP receive John Fastabend
2017-01-18 15:48   ` Michael S. Tsirkin
2017-01-23 21:08   ` Michael S. Tsirkin
2017-01-23 21:57     ` John Fastabend
2017-01-24 19:43     ` David Miller
2017-01-24 20:08       ` Michael S. Tsirkin
2017-01-24 20:11         ` David Miller
2017-01-24 20:54           ` Michael S. Tsirkin
2017-01-25  2:57         ` Jason Wang
2017-01-25  3:23           ` Michael S. Tsirkin
2017-01-25  4:02             ` John Fastabend
2017-01-25  5:46               ` Jason Wang
2017-01-25 14:47                 ` Michael S. Tsirkin
2017-01-25 14:45               ` Michael S. Tsirkin
2017-01-17 22:20 ` [net PATCH v5 2/6] virtio_net: wrap rtnl_lock in test for calling with lock already held John Fastabend
2017-01-17 22:21 ` [net PATCH v5 3/6] virtio_net: factor out xdp handler for readability John Fastabend
2017-01-18 15:48   ` Michael S. Tsirkin
2017-01-17 22:21 ` [net PATCH v5 4/6] virtio_net: remove duplicate queue pair binding in XDP John Fastabend
2017-01-18 15:49   ` Michael S. Tsirkin
2017-01-17 22:22 ` [net PATCH v5 5/6] virtio_net: refactor freeze/restore logic into virtnet reset logic John Fastabend
2017-01-18 15:50   ` Michael S. Tsirkin
2017-01-17 22:22 ` [net PATCH v5 6/6] virtio_net: XDP support for adjust_head John Fastabend
2017-01-18  3:35   ` Jason Wang
2017-01-18 15:15   ` Michael S. Tsirkin
2017-01-19  3:05     ` Jason Wang
2017-01-19 21:11       ` Michael S. Tsirkin
2017-01-20  3:26         ` Jason Wang
2017-01-20  3:39           ` John Fastabend
2017-01-20  3:38         ` John Fastabend
2017-01-20 16:59         ` David Laight
2017-01-20 17:48           ` Michael S. Tsirkin
2017-01-22  2:51             ` Jason Wang
2017-01-22  4:14               ` John Fastabend
2017-01-23 17:02                 ` Michael S. Tsirkin
2017-01-23 19:22   ` Michael S. Tsirkin
2017-01-23 20:09     ` Michael S. Tsirkin
2017-01-23 22:12       ` John Fastabend
2017-01-23 22:28         ` Michael S. Tsirkin [this message]
2017-01-18 15:48 ` [net PATCH v5 0/6] virtio_net XDP fixes and adjust_header support Michael S. Tsirkin
