From mboxrd@z Thu Jan  1 00:00:00 1970
Subject: Re: [PATCH net-next] virtio-net: switch to use build_skb() for small buffer
From: Jason Wang
To: "Michael S. Tsirkin"
Cc: virtualization@lists.linux-foundation.org, netdev@vger.kernel.org,
	linux-kernel@vger.kernel.org
Date: Wed, 22 Feb 2017 18:36:05 +0800
In-Reply-To: <20170222053753-mutt-send-email-mst@kernel.org>
References: <1487666788-9658-1-git-send-email-jasowang@redhat.com>
	<20170221162849-mutt-send-email-mst@kernel.org>
	<35244d06-2d0d-3f2e-0cd1-43137988c869@redhat.com>
	<20170222050601-mutt-send-email-mst@kernel.org>
	<49577dad-3f6f-e6b3-4f93-2b919b066068@redhat.com>
	<20170222053753-mutt-send-email-mst@kernel.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Transfer-Encoding: 8bit
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Thunderbird/45.7.0
X-Mailing-List: linux-kernel@vger.kernel.org

On 2017-02-22 11:42, Michael S. Tsirkin wrote:
> On Wed, Feb 22, 2017 at 11:17:50AM +0800, Jason Wang wrote:
>>
>> On 2017-02-22 11:06, Michael S. Tsirkin wrote:
>>> On Wed, Feb 22, 2017 at 10:58:08AM +0800, Jason Wang wrote:
>>>> On 2017-02-21 22:37, Michael S. Tsirkin wrote:
>>>>> On Tue, Feb 21, 2017 at 04:46:28PM +0800, Jason Wang wrote:
>>>>>> This patch switches to build_skb() for small buffers, which gives
>>>>>> better performance for both TCP and XDP (since we can work on the
>>>>>> page before skb creation). It also removes a lot of XDP code, since
>>>>>> both mergeable and small buffers now use page frags during refill.
>>>>>>
>>>>>>                      Before   | After
>>>>>> XDP_DROP(xdp1) 64B : 11.1Mpps | 14.4Mpps
>>>>>>
>>>>>> Tested with xdp1/xdp2/xdp_ip_tx_tunnel and netperf.
>>>>>>
>>>>>> Signed-off-by: Jason Wang
>>>>> Thanks!
>>>>> I had a similar patch for mergeable too, though it's trickier there
>>>>> as the host has a lot of flexibility in sizing buffers.
>>>>> Looks like a good intermediate step to me.
>>>> Yes, I think it's more tricky for the case of mergeable buffers:
>>>>
>>>> 1) we need to reserve NET_SKB_PAD + NET_IP_ALIGN for each buffer,
>>>>    which will break rx frag coalescing
>>>> 2) we need tailroom for skb_shinfo, so it won't work for all packet
>>>>    sizes
>>>>
>>>> Thanks
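To make 1) and 2) concrete: every buffer handed to build_skb() has to
be carved up roughly as below. This is only a sketch, not code from
either patch, and rx_buf_to_skb() is a made-up helper name:

#include <linux/skbuff.h>

/*
 * buf                                                      buf + total
 * |<- headroom ->|<----- packet data ----->|<--- shinfo tailroom --->|
 *  NET_SKB_PAD +  what the device DMAs      SKB_DATA_ALIGN(sizeof(
 *  NET_IP_ALIGN                             struct skb_shared_info))
 *
 * Only the middle region may be posted to the device; the reservations
 * on both sides are per buffer, hence points 1) and 2).
 */
static struct sk_buff *rx_buf_to_skb(void *buf, unsigned int headroom,
				     unsigned int len)
{
	/* size passed to build_skb() must include the shinfo tailroom */
	unsigned int total = headroom + len +
			     SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
	struct sk_buff *skb = build_skb(buf, total);

	if (unlikely(!skb))
		return NULL;
	skb_reserve(skb, headroom);	/* step over the reserved headroom */
	skb_put(skb, len);		/* bytes the device actually wrote */
	return skb;
}
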
>>> Have you seen my prototype?
>> Not yet, please give me a pointer.
>>
>>> It works with qemu in practice,
>>> just needs to cover a bunch of corner cases.
>> Sounds good. If you wish, I can help finalize it.
>>
>> Thanks
>>
>>>>> Acked-by: Michael S. Tsirkin
>>>>>
>
> Great! Here it is. I think it's on top of 4.9 or so. Issues to address:
>
> - When the ring is very small, this might leave us unable to fit even
>   a single packet in the whole queue. Limit the logic to rings that
>   are big enough?
> - If e.g. the header is split across many buffers, this won't work
>   correctly. Detect and copy?
>
> There could be more issues that I forgot. It might be a good idea to
> finish my buffer ctx hack first. That will simplify the logic.
>
>
> commit 77c177091a7c8473e3f0ed148afa2b4765b8b0f7
> Author: Michael S. Tsirkin
> Date:   Sun Feb 21 20:22:31 2016 +0200
>
>     virtio_net: switch to build_skb for mrg_rxbuf
>
>     For small packets, the data copy was observed to take up about 15%
>     of CPU time. Switch to build_skb and avoid the copy when using
>     mergeable rx buffers.
>
>     As a bonus, medium-size skbs that fit in a page will be completely
>     linear.
>
>     Of course, we now need to lower the lower bound on packet size, to
>     make sure a sane number of skbs fits in the rx socket buffer. By
>     how much? I don't know yet.
>
>     It might also be useful to prefetch the packet buffer, since the
>     net stack will likely use it soon.
>
>     TODO: it appears that Linux won't correctly handle the case of the
>     first buffer being very small (or consisting exclusively of the
>     virtio header). This is already the case for the current code, so
>     don't bother testing yet.
>     TODO: might be unfair to the last packet in a fragment, as we
>     include any remaining space in its truesize.
>
>     Signed-off-by: Michael S. Tsirkin
>
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index b425fa1..a6b996f 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -38,6 +38,8 @@ module_param(gso, bool, 0444);
> [...]
> -	if (unlikely(!skb))
> -		return;
> -
> -	hdr = skb_vnet_hdr(skb);
>
>  	u64_stats_update_begin(&stats->rx_syncp);
>  	stats->rx_bytes += skb->len;
> @@ -581,11 +624,14 @@ static int add_recvbuf_big(struct virtnet_info *vi, struct receive_queue *rq,
>
>  static unsigned int get_mergeable_buf_len(struct ewma_pkt_len *avg_pkt_len)
>  {
> -	const size_t hdr_len = sizeof(struct virtio_net_hdr_mrg_rxbuf);
> +	unsigned int hdr;
>  	unsigned int len;
>
> -	len = hdr_len + clamp_t(unsigned int, ewma_pkt_len_read(avg_pkt_len),
> -				GOOD_PACKET_LEN, PAGE_SIZE - hdr_len);
> +	hdr = ALIGN(VNET_SKB_PAD + SKB_DATA_ALIGN(sizeof(struct skb_shared_info)),
> +		    MERGEABLE_BUFFER_ALIGN);

So we need to reserve sizeof(struct skb_shared_info) for each buffer.
If we have several 4K buffers for a single large packet, about 8% of
truesize is wasted? (Back-of-envelope numbers at the end of this mail.)

> +
> +	len = hdr + clamp_t(unsigned int, ewma_pkt_len_read(avg_pkt_len),
> +			    MIN_PACKET_ALLOC, PAGE_SIZE - hdr);
>  	return ALIGN(len, MERGEABLE_BUFFER_ALIGN);
>  }
>
> @@ -601,8 +647,11 @@ static int add_recvbuf_mergeable(struct receive_queue *rq, gfp_t gfp)
>  	if (unlikely(!skb_page_frag_refill(len, alloc_frag, gfp)))
>  		return -ENOMEM;
>
> -	buf = (char *)page_address(alloc_frag->page) + alloc_frag->offset;
> -	ctx = mergeable_buf_to_ctx(buf, len);
> +	BUILD_BUG_ON(VNET_SKB_BUG);
> +
> +	buf = (char *)page_address(alloc_frag->page) + alloc_frag->offset +
> +	      VNET_SKB_OFF;

Rx frag coalescing won't work any more, and we will end up with more
frags for large packets.

Thanks

> +	//ctx = mergeable_buf_to_ctx(buf - VNET_SKB_OFF, len);
>  	get_page(alloc_frag->page);
>  	alloc_frag->offset += len;
>  	hole = alloc_frag->size - alloc_frag->offset;
> @@ -615,8 +664,10 @@
>  		len += hole;
>  		alloc_frag->offset += hole;
>  	}
> +	ctx = mergeable_buf_to_ctx(buf - VNET_SKB_OFF, len);
>
> -	sg_init_one(rq->sg, buf, len);
> +	sg_init_one(rq->sg, buf,
> +		    len - VNET_SKB_OFF - SKB_DATA_ALIGN(sizeof(struct skb_shared_info)));
>  	err = virtqueue_add_inbuf(rq->vq, rq->sg, 1, (void *)ctx, gfp);
>  	if (err < 0)
>  		put_page(virt_to_head_page(buf));
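
P.S. Here are the back-of-envelope numbers behind the "about 8%"
question above, as a standalone userspace check. The 320-byte
skb_shared_info size and the 64-byte cache line are typical x86-64
assumptions, not measured values:

#include <stdio.h>

#define SMP_CACHE_BYTES   64	/* assumed x86-64 cache line size */
#define SHINFO_SIZE       320	/* assumed sizeof(struct skb_shared_info) */
#define SKB_DATA_ALIGN(x) (((x) + SMP_CACHE_BYTES - 1) & ~(SMP_CACHE_BYTES - 1))

int main(void)
{
	/* 320 is already a multiple of 64, so alignment adds nothing here */
	unsigned int tail = SKB_DATA_ALIGN(SHINFO_SIZE);

	printf("per-buffer shinfo tailroom: %u bytes\n", tail);
	printf("overhead on a 4096-byte buffer: %.1f%%\n",
	       100.0 * tail / 4096);
	return 0;
}

This prints 320 bytes and 7.8%, i.e. roughly 8% of every 4K buffer goes
to the shinfo reservation alone, before counting the VNET_SKB_PAD
headroom.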