From: "Yao, Lei A"
Subject: Re: [RFC PATCH] net/virtio: Align Virtio-net header on cache line in receive path
Date: Wed, 8 Mar 2017 06:01:03 +0000
To: Maxime Coquelin, Yuanhan Liu
Cc: "Liang, Cunming", "Tan, Jianfeng", dev@dpdk.org, "Wang, Zhihong"

> -----Original Message-----
> From: Maxime Coquelin [mailto:maxime.coquelin@redhat.com]
> Sent: Monday, March 6, 2017 10:11 PM
> To: Yuanhan Liu
> Cc: Liang, Cunming; Tan, Jianfeng; dev@dpdk.org; Wang, Zhihong; Yao, Lei A
> Subject: Re: [RFC PATCH] net/virtio: Align Virtio-net header on cache line
> in receive path
>
>
> On 03/06/2017 09:46 AM, Yuanhan Liu wrote:
> > On Wed, Mar 01, 2017 at 08:36:24AM +0100, Maxime Coquelin wrote:
> >>
> >> On 02/23/2017 06:49 AM, Yuanhan Liu wrote:
> >>> On Wed, Feb 22, 2017 at 10:36:36AM +0100, Maxime Coquelin wrote:
> >>>>
> >>>> On 02/22/2017 02:37 AM, Yuanhan Liu wrote:
> >>>>> On Tue, Feb 21, 2017 at 06:32:43PM +0100, Maxime Coquelin wrote:
> >>>>>> This patch aligns the Virtio-net header on a cache-line boundary to
> >>>>>> optimize cache utilization, as it puts the Virtio-net header (which
> >>>>>> is always accessed) on the same cache line as the packet header.
> >>>>>>
> >>>>>> For example, with an application that forwards packets at L2 level,
> >>>>>> a single cache line will be accessed with this patch, instead of
> >>>>>> two before.
> >>>>>
> >>>>> I'm assuming you were testing pkt size <= (64 - hdr_size)?
> >>>>
> >>>> No, I tested with 64-byte packets only.
> >>>
> >>> Oh, my bad, I overlooked it. While you were saying "a single cache
> >>> line", I was thinking of putting the virtio net hdr and the "whole"
> >>> packet data in a single cache line, which is not possible for pkt
> >>> size 64B.
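
Just to be sure we are picturing the same layout change: with the default
128B mbuf headroom and a 12B mergeable virtio-net header, my reading is
that today the header ends right at the headroom boundary, so the header
and the first bytes of the packet sit on two different 64B cache lines,
while aligning the header on a cache-line boundary would make them share
one line for the first 52B of the packet. A minimal standalone sketch of
the offset arithmetic (the offsets and names below are only illustrative,
not taken from the patch):

#include <stdio.h>
#include <stdint.h>

#define CACHE_LINE_SZ  64   /* x86 cache-line size */
#define HEADROOM       128  /* default RTE_PKTMBUF_HEADROOM */
#define VNET_HDR_LEN   12   /* mergeable virtio-net header */

int main(void)
{
	/* Current layout: packet data starts at HEADROOM and the
	 * virtio-net header is placed just before it, so the two
	 * land on different cache lines. */
	uint32_t hdr_off = HEADROOM - VNET_HDR_LEN;  /* 116 -> line 1 */
	uint32_t pkt_off = HEADROOM;                 /* 128 -> line 2 */
	printf("before: hdr in line %u, pkt in line %u\n",
	       hdr_off / CACHE_LINE_SZ, pkt_off / CACHE_LINE_SZ);

	/* Aligned layout: the header starts on a cache-line boundary
	 * and the packet data follows it within the same line. */
	hdr_off = HEADROOM;                          /* 128 -> line 2 */
	pkt_off = HEADROOM + VNET_HDR_LEN;           /* 140 -> line 2 */
	printf("after:  hdr in line %u, pkt in line %u\n",
	       hdr_off / CACHE_LINE_SZ, pkt_off / CACHE_LINE_SZ);

	return 0;
}

In the second layout the Ethernet header an L2 forwarder touches shares
the virtio-net header's cache line, which is where the expected saving
comes from.
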
> >>>
> >>>> I ran some more tests this morning with different packet sizes,
> >>>> and also with changing the mbuf size on the guest side to have
> >>>> multi-buffer packets:
> >>>>
> >>>> +-------+--------+--------+-------------------------+
> >>>> | Txpkt | Rxmbuf | v17.02 | v17.02 + vnet hdr align |
> >>>> +-------+--------+--------+-------------------------+
> >>>> |    64 |   2048 |  11.05 |                   11.78 |
> >>>> |   128 |   2048 |  10.66 |                   11.48 |
> >>>> |   256 |   2048 |  10.47 |                   11.21 |
> >>>> |   512 |   2048 |  10.22 |                   10.88 |
> >>>> |  1024 |   2048 |   7.65 |                    7.84 |
> >>>> |  1500 |   2048 |   6.25 |                    6.45 |
> >>>> |  2000 |   2048 |   5.31 |                    5.43 |
> >>>> |  2048 |   2048 |   5.32 |                    4.25 |
> >>>> |  1500 |    512 |   3.89 |                    3.98 |
> >>>> |  2048 |    512 |   1.96 |                    2.02 |
> >>>> +-------+--------+--------+-------------------------+
> >>>
> >>> Could you share more info, say, is it a PVP test? Is mergeable on?
> >>> What's the fwd mode?
> >>
> >> No, this is not a PVP benchmark; I have neither another server nor a
> >> packet generator connected back-to-back to my Haswell machine.
> >>
> >> This is a simple micro-benchmark: vhost PMD in txonly, Virtio PMD in
> >> rxonly. In this configuration, mergeable is ON and no offloads are
> >> disabled in the QEMU cmdline.
> >
> > Okay, I see. So the boost, as you have stated, comes from going from
> > two cache-line accesses to one. Before that, vhost writes 2 cache
> > lines, while the virtio pmd reads 2 cache lines: one for reading the
> > header, another one for reading the ether header to update xstats
> > (there is no other ether access in the fwd mode you tested).
> >
> >> That's why I would be interested in more testing on recent hardware
> >> with a PVP benchmark. Is it something that could be run in the Intel lab?
> >
> > I think Yao Lei could help on that? But as stated, I think it may
> > break the performance for big packets. And I also won't expect a big
> > boost even for 64B in the PVP test, judging that it's only a 6% boost
> > in micro-benchmarking.
> That would be great.
> Note that on Sandy Bridge, on which I see a drop in perf with the
> micro-benchmark, I get a 4% gain on the PVP benchmark. So on recent
> hardware that shows a gain on the micro-benchmark, I'm curious about
> the gain with the PVP bench.
>

Hi Maxime, Yuanhan,

I have run the PVP and loopback performance tests on my Ivy Bridge server.
OS: Ubuntu 16.04
CPU: Intel(R) Xeon(R) CPU E5-2680 v2 @ 2.80GHz
Kernel: 4.4.0
gcc: 5.4.0
I use MAC forwarding for the test.
The performance baseline is commit f5472703c0bdfc29c46fc4b2ca445bce3dc08c9f,
"eal: optimize aligned memcpy on x86".
I can see a big performance drop on the mergeable and non-mergeable paths
after applying this patch.

Mergeable path, loopback test
packet size    Performance change
64             -21.76%
128            -17.79%
260            -20.25%
520            -14.80%
1024           -9.34%
1500           -6.16%

Non-mergeable path, loopback test
packet size    Performance change
64             -13.72%
128            -10.35%
260            -16.40%
520            -14.78%
1024           -10.48%
1500           -6.91%

Mergeable path, PVP test
packet size    Performance change
64             -16.33%

Non-mergeable path, PVP test
packet size    Performance change
64             -8.69%

Best Regards
Lei

> Cheers,
> Maxime