From: "Yao, Lei A"
Subject: Re: [RFC PATCH] net/virtio: Align Virtio-net header on cache line in receive path
Date: Wed, 8 Mar 2017 06:01:03 +0000
To: Maxime Coquelin, Yuanhan Liu
Cc: "Liang, Cunming", "Tan, Jianfeng", dev@dpdk.org, "Wang, Zhihong"

> -----Original Message-----
> From: Maxime Coquelin [mailto:maxime.coquelin@redhat.com]
> Sent: Monday, March 6, 2017 10:11 PM
> To: Yuanhan Liu
> Cc: Liang, Cunming; Tan, Jianfeng; dev@dpdk.org; Wang, Zhihong; Yao, Lei A
> Subject: Re: [RFC PATCH] net/virtio: Align Virtio-net header on cache line
> in receive path
>
>
> On 03/06/2017 09:46 AM, Yuanhan Liu wrote:
> > On Wed, Mar 01, 2017 at 08:36:24AM +0100, Maxime Coquelin wrote:
> >>
> >> On 02/23/2017 06:49 AM, Yuanhan Liu wrote:
> >>> On Wed, Feb 22, 2017 at 10:36:36AM +0100, Maxime Coquelin wrote:
> >>>>
> >>>> On 02/22/2017 02:37 AM, Yuanhan Liu wrote:
> >>>>> On Tue, Feb 21, 2017 at 06:32:43PM +0100, Maxime Coquelin wrote:
> >>>>>> This patch aligns the Virtio-net header on a cache-line boundary to
> >>>>>> optimize cache utilization, as it puts the Virtio-net header (which
> >>>>>> is always accessed) on the same cache line as the packet header.
> >>>>>>
> >>>>>> For example, with an application that forwards packets at L2 level,
> >>>>>> a single cache line will be accessed with this patch, instead of
> >>>>>> two before.
> >>>>>
> >>>>> I'm assuming you were testing pkt size <= (64 - hdr_size)?
> >>>>
> >>>> No, I tested with 64-byte packets only.
> >>>
> >>> Oh, my bad, I overlooked it. While you were saying "a single cache
> >>> line", I was thinking of putting the virtio net hdr and the "whole"
> >>> packet data in a single cache line, which is not possible for pkt
> >>> size 64B.
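
Just to be sure we are picturing the same layout change: with the default
128B mbuf headroom and a 12B mergeable virtio-net header, my reading is
that today the header ends right at the headroom boundary, so the header
and the first bytes of the packet sit on two different 64B cache lines,
while aligning the header on a cache-line boundary would make them share
one line for the first 52B of the packet. A minimal standalone sketch of
the offset arithmetic (the offsets and names below are only illustrative,
not taken from the patch):

#include <stdio.h>
#include <stdint.h>

#define CACHE_LINE_SZ  64   /* x86 cache-line size */
#define HEADROOM       128  /* default RTE_PKTMBUF_HEADROOM */
#define VNET_HDR_LEN   12   /* mergeable virtio-net header */

int main(void)
{
	/* Current layout: packet data starts at HEADROOM and the
	 * virtio-net header is placed just before it, so the two
	 * land on different cache lines. */
	uint32_t hdr_off = HEADROOM - VNET_HDR_LEN;  /* 116 -> line 1 */
	uint32_t pkt_off = HEADROOM;                 /* 128 -> line 2 */
	printf("before: hdr in line %u, pkt in line %u\n",
	       hdr_off / CACHE_LINE_SZ, pkt_off / CACHE_LINE_SZ);

	/* Aligned layout: the header starts on a cache-line boundary
	 * and the packet data follows it within the same line. */
	hdr_off = HEADROOM;                          /* 128 -> line 2 */
	pkt_off = HEADROOM + VNET_HDR_LEN;           /* 140 -> line 2 */
	printf("after:  hdr in line %u, pkt in line %u\n",
	       hdr_off / CACHE_LINE_SZ, pkt_off / CACHE_LINE_SZ);

	return 0;
}

In the second layout the Ethernet header an L2 forwarder touches shares
the virtio-net header's cache line, which is where the expected saving
comes from.
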
> >>>
> >>>> I ran some more tests this morning with different packet sizes,
> >>>> and also with changing the mbuf size on the guest side to have
> >>>> multi-buffer packets:
> >>>>
> >>>> +-------+--------+--------+-------------------------+
> >>>> | Txpkt | Rxmbuf | v17.02 | v17.02 + vnet hdr align |
> >>>> +-------+--------+--------+-------------------------+
> >>>> |    64 |   2048 |  11.05 |                   11.78 |
> >>>> |   128 |   2048 |  10.66 |                   11.48 |
> >>>> |   256 |   2048 |  10.47 |                   11.21 |
> >>>> |   512 |   2048 |  10.22 |                   10.88 |
> >>>> |  1024 |   2048 |   7.65 |                    7.84 |
> >>>> |  1500 |   2048 |   6.25 |                    6.45 |
> >>>> |  2000 |   2048 |   5.31 |                    5.43 |
> >>>> |  2048 |   2048 |   5.32 |                    4.25 |
> >>>> |  1500 |    512 |   3.89 |                    3.98 |
> >>>> |  2048 |    512 |   1.96 |                    2.02 |
> >>>> +-------+--------+--------+-------------------------+
> >>>
> >>> Could you share more info, say, is it a PVP test? Is mergeable on?
> >>> What's the fwd mode?
> >>
> >> No, this is not a PVP benchmark; I have neither another server nor a
> >> packet generator connected back-to-back to my Haswell machine.
> >>
> >> This is a simple micro-benchmark: vhost PMD in txonly, Virtio PMD in
> >> rxonly. In this configuration, mergeable is ON and no offloads are
> >> disabled in the QEMU cmdline.
> >
> > Okay, I see. So the boost, as you have stated, comes from going from
> > two cache-line accesses to one. Before that, vhost writes 2 cache
> > lines, while the virtio pmd reads 2 cache lines: one for reading the
> > header, another one for reading the ether header to update xstats
> > (there is no other ether access in the fwd mode you tested).
> >
> >> That's why I would be interested in more testing on recent hardware
> >> with a PVP benchmark. Is it something that could be run in the Intel lab?
> >
> > I think Yao Lei could help on that? But as stated, I think it may
> > break the performance for big packets. And I also won't expect a big
> > boost even for 64B in the PVP test, judging that it's only a 6% boost
> > in micro-benchmarking.
> That would be great.
> Note that on Sandy Bridge, on which I see a drop in perf with the
> micro-benchmark, I get a 4% gain on the PVP benchmark. So on recent
> hardware that shows a gain on the micro-benchmark, I'm curious about
> the gain with the PVP bench.
>

Hi Maxime, Yuanhan,

I have run the PVP and loopback performance tests on my Ivy Bridge server.
OS: Ubuntu 16.04
CPU: Intel(R) Xeon(R) CPU E5-2680 v2 @ 2.80GHz
Kernel: 4.4.0
gcc: 5.4.0
I use MAC forwarding for the test.
The performance baseline is commit f5472703c0bdfc29c46fc4b2ca445bce3dc08c9f,
"eal: optimize aligned memcpy on x86".
I can see a big performance drop on the mergeable and non-mergeable paths
after applying this patch.

Mergeable path, loopback test
packet size    Performance change
64             -21.76%
128            -17.79%
260            -20.25%
520            -14.80%
1024           -9.34%
1500           -6.16%

Non-mergeable path, loopback test
packet size    Performance change
64             -13.72%
128            -10.35%
260            -16.40%
520            -14.78%
1024           -10.48%
1500           -6.91%

Mergeable path, PVP test
packet size    Performance change
64             -16.33%

Non-mergeable path, PVP test
packet size    Performance change
64             -8.69%

Best Regards
Lei

> Cheers,
> Maxime