From mboxrd@z Thu Jan  1 00:00:00 1970
Subject: Re: [PATCH net-next] virtio-net: switch to use build_skb() for small buffer
From: Jason Wang
To: "Michael S. Tsirkin"
Cc: virtualization@lists.linux-foundation.org, netdev@vger.kernel.org,
	linux-kernel@vger.kernel.org
Date: Wed, 22 Feb 2017 18:36:05 +0800
In-Reply-To: <20170222053753-mutt-send-email-mst@kernel.org>
References: <1487666788-9658-1-git-send-email-jasowang@redhat.com>
	<20170221162849-mutt-send-email-mst@kernel.org>
	<35244d06-2d0d-3f2e-0cd1-43137988c869@redhat.com>
	<20170222050601-mutt-send-email-mst@kernel.org>
	<49577dad-3f6f-e6b3-4f93-2b919b066068@redhat.com>
	<20170222053753-mutt-send-email-mst@kernel.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 8bit
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Thunderbird/45.7.0
X-Mailing-List: linux-kernel@vger.kernel.org

On 2017-02-22 11:42, Michael S. Tsirkin wrote:
> On Wed, Feb 22, 2017 at 11:17:50AM +0800, Jason Wang wrote:
>>
>> On 2017-02-22 11:06, Michael S. Tsirkin wrote:
>>> On Wed, Feb 22, 2017 at 10:58:08AM +0800, Jason Wang wrote:
>>>> On 2017-02-21 22:37, Michael S. Tsirkin wrote:
>>>>> On Tue, Feb 21, 2017 at 04:46:28PM +0800, Jason Wang wrote:
>>>>>> This patch switches to build_skb() for small buffers, which gives
>>>>>> better performance for both TCP and XDP (since we can work on the
>>>>>> page before skb creation). It also removes a lot of XDP code, since
>>>>>> both mergeable and small buffers now use page frags during refill.
>>>>>>
>>>>>>                      Before   | After
>>>>>> XDP_DROP(xdp1) 64B : 11.1Mpps | 14.4Mpps
>>>>>>
>>>>>> Tested with xdp1/xdp2/xdp_ip_tx_tunnel and netperf.
>>>>>>
>>>>>> Signed-off-by: Jason Wang
>>>>> Thanks!
>>>>> I had a similar patch for mergeable too, though it's trickier there
>>>>> as the host has a lot of flexibility in sizing buffers.
>>>>> Looks like a good intermediate step to me.
>>>> Yes, I think it's more tricky for the case of mergeable buffers:
>>>>
>>>> 1) we need to reserve NET_SKB_PAD + NET_IP_ALIGN for each buffer,
>>>>    which will break rx frag coalescing
>>>> 2) we need tailroom for skb_shinfo, so it won't work for all packet
>>>>    sizes
>>>>
>>>> Thanks
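To make 1) and 2) concrete: every buffer handed to build_skb() has to
be carved up roughly as below. This is only a sketch, not code from
either patch, and rx_buf_to_skb() is a made-up helper name:

#include <linux/skbuff.h>

/*
 * buf                                                      buf + total
 * |<- headroom ->|<----- packet data ----->|<--- shinfo tailroom --->|
 *  NET_SKB_PAD +  what the device DMAs      SKB_DATA_ALIGN(sizeof(
 *  NET_IP_ALIGN                             struct skb_shared_info))
 *
 * Only the middle region may be posted to the device; the reservations
 * on both sides are per buffer, hence points 1) and 2).
 */
static struct sk_buff *rx_buf_to_skb(void *buf, unsigned int headroom,
				     unsigned int len)
{
	/* size passed to build_skb() must include the shinfo tailroom */
	unsigned int total = headroom + len +
			     SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
	struct sk_buff *skb = build_skb(buf, total);

	if (unlikely(!skb))
		return NULL;
	skb_reserve(skb, headroom);	/* step over the reserved headroom */
	skb_put(skb, len);		/* bytes the device actually wrote */
	return skb;
}
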
>>> Have you seen my prototype?
>> Not yet, please give me a pointer.
>>
>>> It works with qemu in practice,
>>> just needs to cover a bunch of corner cases.
>> Sounds good. If you wish, I can help finalize it.
>>
>> Thanks
>>
>>>>> Acked-by: Michael S. Tsirkin
>>>>>
>
> Great! Here it is. I think it's on top of 4.9 or so. Issues to address:
>
> - When the ring is very small, this might leave us unable to fit even
>   a single packet in the whole queue. Limit the logic to rings that
>   are big enough?
> - If e.g. the header is split across many buffers, this won't work
>   correctly. Detect and copy?
>
> There could be more issues that I forgot. It might be a good idea to
> finish my buffer ctx hack first. That will simplify the logic.
>
>
> commit 77c177091a7c8473e3f0ed148afa2b4765b8b0f7
> Author: Michael S. Tsirkin
> Date:   Sun Feb 21 20:22:31 2016 +0200
>
>     virtio_net: switch to build_skb for mrg_rxbuf
>
>     For small packets, the data copy was observed to take up about 15%
>     of CPU time. Switch to build_skb and avoid the copy when using
>     mergeable rx buffers.
>
>     As a bonus, medium-size skbs that fit in a page will be completely
>     linear.
>
>     Of course, we now need to lower the lower bound on packet size, to
>     make sure a sane number of skbs fits in the rx socket buffer. By
>     how much? I don't know yet.
>
>     It might also be useful to prefetch the packet buffer, since the
>     net stack will likely use it soon.
>
>     TODO: it appears that Linux won't correctly handle the case of the
>     first buffer being very small (or consisting exclusively of the
>     virtio header). This is already the case for the current code, so
>     don't bother testing yet.
>     TODO: might be unfair to the last packet in a fragment, as we
>     include any remaining space in its truesize.
>
>     Signed-off-by: Michael S. Tsirkin
>
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index b425fa1..a6b996f 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -38,6 +38,8 @@ module_param(gso, bool, 0444);
> [...]
> -	if (unlikely(!skb))
> -		return;
> -
> -	hdr = skb_vnet_hdr(skb);
>
>  	u64_stats_update_begin(&stats->rx_syncp);
>  	stats->rx_bytes += skb->len;
> @@ -581,11 +624,14 @@ static int add_recvbuf_big(struct virtnet_info *vi, struct receive_queue *rq,
>
>  static unsigned int get_mergeable_buf_len(struct ewma_pkt_len *avg_pkt_len)
>  {
> -	const size_t hdr_len = sizeof(struct virtio_net_hdr_mrg_rxbuf);
> +	unsigned int hdr;
>  	unsigned int len;
>
> -	len = hdr_len + clamp_t(unsigned int, ewma_pkt_len_read(avg_pkt_len),
> -				GOOD_PACKET_LEN, PAGE_SIZE - hdr_len);
> +	hdr = ALIGN(VNET_SKB_PAD + SKB_DATA_ALIGN(sizeof(struct skb_shared_info)),
> +		    MERGEABLE_BUFFER_ALIGN);

So we need to reserve sizeof(struct skb_shared_info) for each buffer.
If we have several 4K buffers for a single large packet, about 8% of
truesize is wasted? (Back-of-envelope numbers at the end of this mail.)

> +
> +	len = hdr + clamp_t(unsigned int, ewma_pkt_len_read(avg_pkt_len),
> +			    MIN_PACKET_ALLOC, PAGE_SIZE - hdr);
>  	return ALIGN(len, MERGEABLE_BUFFER_ALIGN);
>  }
>
> @@ -601,8 +647,11 @@ static int add_recvbuf_mergeable(struct receive_queue *rq, gfp_t gfp)
>  	if (unlikely(!skb_page_frag_refill(len, alloc_frag, gfp)))
>  		return -ENOMEM;
>
> -	buf = (char *)page_address(alloc_frag->page) + alloc_frag->offset;
> -	ctx = mergeable_buf_to_ctx(buf, len);
> +	BUILD_BUG_ON(VNET_SKB_BUG);
> +
> +	buf = (char *)page_address(alloc_frag->page) + alloc_frag->offset +
> +	      VNET_SKB_OFF;

Rx frag coalescing won't work any more, and we will end up with more
frags for large packets.

Thanks

> +	//ctx = mergeable_buf_to_ctx(buf - VNET_SKB_OFF, len);
>  	get_page(alloc_frag->page);
>  	alloc_frag->offset += len;
>  	hole = alloc_frag->size - alloc_frag->offset;
> @@ -615,8 +664,10 @@
>  		len += hole;
>  		alloc_frag->offset += hole;
>  	}
> +	ctx = mergeable_buf_to_ctx(buf - VNET_SKB_OFF, len);
>
> -	sg_init_one(rq->sg, buf, len);
> +	sg_init_one(rq->sg, buf,
> +		    len - VNET_SKB_OFF - SKB_DATA_ALIGN(sizeof(struct skb_shared_info)));
>  	err = virtqueue_add_inbuf(rq->vq, rq->sg, 1, (void *)ctx, gfp);
>  	if (err < 0)
>  		put_page(virt_to_head_page(buf));
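
P.S. Here are the back-of-envelope numbers behind the "about 8%"
question above, as a standalone userspace check. The 320-byte
skb_shared_info size and the 64-byte cache line are typical x86-64
assumptions, not measured values:

#include <stdio.h>

#define SMP_CACHE_BYTES   64	/* assumed x86-64 cache line size */
#define SHINFO_SIZE       320	/* assumed sizeof(struct skb_shared_info) */
#define SKB_DATA_ALIGN(x) (((x) + SMP_CACHE_BYTES - 1) & ~(SMP_CACHE_BYTES - 1))

int main(void)
{
	/* 320 is already a multiple of 64, so alignment adds nothing here */
	unsigned int tail = SKB_DATA_ALIGN(SHINFO_SIZE);

	printf("per-buffer shinfo tailroom: %u bytes\n", tail);
	printf("overhead on a 4096-byte buffer: %.1f%%\n",
	       100.0 * tail / 4096);
	return 0;
}

This prints 320 bytes and 7.8%, i.e. roughly 8% of every 4K buffer goes
to the shinfo reservation alone, before counting the VNET_SKB_PAD
headroom.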