From: "Michael S. Tsirkin" <mst@redhat.com>
To: Jesse Brandeburg <jesse.brandeburg@intel.com>
Cc: netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
	kvm@vger.kernel.org, virtualization@lists.linux-foundation.org
Subject: Re: [RFC PATCH net-next 10/12] vhost_net: build xdp buff
Date: Tue, 22 May 2018 01:21:11 +0300	[thread overview]
Message-ID: <20180522012008-mutt-send-email-mst__38174.064565874$1526941158$gmane$org@kernel.org> (raw)
In-Reply-To: <20180521095611.00005caa@intel.com>

On Mon, May 21, 2018 at 09:56:11AM -0700, Jesse Brandeburg wrote:
> On Mon, 21 May 2018 17:04:31 +0800 Jason wrote:
> > This patch implements XDP buffer building in vhost_net. The idea is
> > to do the userspace copy in vhost_net and build the XDP buff based on
> > the page. Vhost_net can then submit one or an array of XDP buffs to
> > the underlying socket (e.g. TUN). TUN can choose to do XDP or call
> > build_skb() to build an skb. To support building the skb, the vnet
> > header is also stored in the headroom of the XDP buff.
> > 
> > Doing the userspace copy and XDP buff building here is key to
> > achieving XDP batching in TUN: since TUN no longer needs to care
> > about the userspace copy, it can disable preemption across several
> > XDP buffs and batch them through XDP.
> > 
> > TODO: reserve headroom based on what TUN XDP requires.
> > 
> > Signed-off-by: Jason Wang <jasowang@redhat.com>
> > ---
> >  drivers/vhost/net.c | 74 +++++++++++++++++++++++++++++++++++++++++++++++++++++
> >  1 file changed, 74 insertions(+)
> > 
> > diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
> > index f0639d7..1209e84 100644
> > --- a/drivers/vhost/net.c
> > +++ b/drivers/vhost/net.c
> > @@ -492,6 +492,80 @@ static bool vhost_has_more_pkts(struct vhost_net *net,
> >  	       likely(!vhost_exceeds_maxpend(net));
> >  }
> >  
> > +#define VHOST_NET_HEADROOM 256
> > +#define VHOST_NET_RX_PAD (NET_IP_ALIGN + NET_SKB_PAD)
> > +
> > +static int vhost_net_build_xdp(struct vhost_net_virtqueue *nvq,
> > +			       struct iov_iter *from,
> > +			       struct xdp_buff *xdp)
> > +{
> > +	struct vhost_virtqueue *vq = &nvq->vq;
> > +	struct page_frag *alloc_frag = &current->task_frag;
> > +	struct virtio_net_hdr *gso;
> > +	size_t len = iov_iter_count(from);
> > +	int buflen = SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
> > +	int pad = SKB_DATA_ALIGN(VHOST_NET_RX_PAD + VHOST_NET_HEADROOM
> > +				 + nvq->sock_hlen);
> > +	int sock_hlen = nvq->sock_hlen;
> > +	void *buf;
> > +	int copied;
> > +
> > +	if (len < nvq->sock_hlen)
> > +		return -EFAULT;
> > +
> > +	if (SKB_DATA_ALIGN(len + pad) +
> > +	    SKB_DATA_ALIGN(sizeof(struct skb_shared_info)) > PAGE_SIZE)
> > +		return -ENOSPC;
> > +
> > +	buflen += SKB_DATA_ALIGN(len + pad);
> 
> maybe store the result of SKB_DATA_ALIGN in a local instead of doing
> the work twice?

I don't mind, but I guess gcc can always do it itself?

> > +	alloc_frag->offset = ALIGN((u64)alloc_frag->offset, SMP_CACHE_BYTES);
> > +	if (unlikely(!skb_page_frag_refill(buflen, alloc_frag, GFP_KERNEL)))
> > +		return -ENOMEM;
> > +
> > +	buf = (char *)page_address(alloc_frag->page) + alloc_frag->offset;
> > +
> > +	/* We store two kinds of metadata in the header which will be
> > +	 * used for XDP_PASS to do build_skb():
> > +	 * offset 0: buflen
> > +	 * offset sizeof(int): vnet header
> > +	 */
> > +	copied = copy_page_from_iter(alloc_frag->page,
> > +				     alloc_frag->offset + sizeof(int), sock_hlen, from);
> > +	if (copied != sock_hlen)
> > +		return -EFAULT;
> > +
> > +	gso = (struct virtio_net_hdr *)(buf + sizeof(int));
> > +
> > +	if ((gso->flags & VIRTIO_NET_HDR_F_NEEDS_CSUM) &&
> > +	    vhost16_to_cpu(vq, gso->csum_start) +
> > +	    vhost16_to_cpu(vq, gso->csum_offset) + 2 >
> > +	    vhost16_to_cpu(vq, gso->hdr_len)) {
> > +		gso->hdr_len = cpu_to_vhost16(vq,
> > +			       vhost16_to_cpu(vq, gso->csum_start) +
> > +			       vhost16_to_cpu(vq, gso->csum_offset) + 2);
> > +
> > +		if (vhost16_to_cpu(vq, gso->hdr_len) > len)
> > +			return -EINVAL;
> > +	}
> > +
> > +	len -= sock_hlen;
> > +	copied = copy_page_from_iter(alloc_frag->page,
> > +				     alloc_frag->offset + pad,
> > +				     len, from);
> > +	if (copied != len)
> > +		return -EFAULT;
> > +
> > +	xdp->data_hard_start = buf;
> > +	xdp->data = buf + pad;
> > +	xdp->data_end = xdp->data + len;
> > +	*(int *)(xdp->data_hard_start)= buflen;
> 
> space before =
> 
> > +
> > +	get_page(alloc_frag->page);
> > +	alloc_frag->offset += buflen;
> > +
> > +	return 0;
> > +}
> > +
> >  static void handle_tx_copy(struct vhost_net *net)
> >  {
> >  	struct vhost_net_virtqueue *nvq = &net->vqs[VHOST_NET_VQ_TX];

Thread overview: 54+ messages
2018-05-21  9:04 [RFC PATCH net-next 00/12] XDP batching for TUN/vhost_net Jason Wang
2018-05-21  9:04 ` [RFC PATCH net-next 01/12] vhost_net: introduce helper to initialize tx iov iter Jason Wang
2018-05-21 16:24   ` Jesse Brandeburg
2018-05-22 12:26     ` Jason Wang
2018-05-21  9:04 ` [RFC PATCH net-next 02/12] vhost_net: introduce vhost_exceeds_weight() Jason Wang
2018-05-21 16:29   ` Jesse Brandeburg
2018-05-22 12:27     ` Jason Wang
2018-05-21  9:04 ` [RFC PATCH net-next 03/12] vhost_net: introduce vhost_has_more_pkts() Jason Wang
2018-05-21 16:39   ` Jesse Brandeburg
2018-05-22 12:31     ` Jason Wang
2018-05-21  9:04 ` [RFC PATCH net-next 04/12] vhost_net: split out datacopy logic Jason Wang
2018-05-21 16:46   ` Jesse Brandeburg
2018-05-22 12:39     ` Jason Wang
2018-05-21  9:04 ` [RFC PATCH net-next 05/12] vhost_net: batch update used ring for datacopy TX Jason Wang
2018-05-21  9:04 ` [RFC PATCH net-next 06/12] tuntap: enable premmption early Jason Wang
2018-05-21 14:32   ` Michael S. Tsirkin
2018-05-21  9:04 ` [RFC PATCH net-next 07/12] tuntap: simplify error handling in tun_build_skb() Jason Wang
2018-05-21  9:04 ` [RFC PATCH net-next 08/12] tuntap: tweak on the path of non-xdp case in tun_build_skb() Jason Wang
2018-05-21  9:04 ` [RFC PATCH net-next 09/12] tuntap: split out XDP logic Jason Wang
2018-05-21  9:04 ` [RFC PATCH net-next 10/12] vhost_net: build xdp buff Jason Wang
2018-05-21 16:56   ` Jesse Brandeburg
2018-05-21 22:21     ` Michael S. Tsirkin [this message]
2018-05-22 12:41     ` Jason Wang
2018-05-21  9:04 ` [RFC PATCH net-next 11/12] vhost_net: passing raw xdp buff to tun Jason Wang
2018-05-21  9:04 ` [RFC PATCH net-next 12/12] vhost_net: batch submitting XDP buffers to underlayer sockets Jason Wang
2018-05-21 14:33   ` Michael S. Tsirkin
2018-05-25 17:53 ` [RFC PATCH net-next 00/12] XDP batching for TUN/vhost_net Michael S. Tsirkin
