From mboxrd@z Thu Jan  1 00:00:00 1970
From: Jianbo Liu <jianbo.liu@linaro.org>
Subject: Re: [PATCH v3 0/5] vhost: optimize enqueue
Date: Mon, 26 Sep 2016 12:24:25 +0800
Message-ID: <CAP4Qi3-_04uDZGm8TiiVROQuQ9OkQPgG_ubGMKHk6QN28NniaA@mail.gmail.com>
References: <1471319402-112998-1-git-send-email-zhihong.wang@intel.com>
 <8F6C2BD409508844A0EFC19955BE09414E7B6204@SHSMSX103.ccr.corp.intel.com>
 <CAP4Qi3_DxAnvs0jX1P=G_PiLnRRbP5Wty-eU-OPE_81RGCAuTA@mail.gmail.com>
 <1536480.IYe8r5XoNN@xps13>
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Cc: dev@dpdk.org, "Wang, Zhihong" <zhihong.wang@intel.com>,
 Yuanhan Liu <yuanhan.liu@linux.intel.com>,
 Maxime Coquelin <maxime.coquelin@redhat.com>
To: Thomas Monjalon <thomas.monjalon@6wind.com>
Return-path: <dev-bounces@dpdk.org>
Received: from mail-yw0-f170.google.com (mail-yw0-f170.google.com
 [209.85.161.170]) by dpdk.org (Postfix) with ESMTP id 48A3C37B1
 for <dev@dpdk.org>; Mon, 26 Sep 2016 06:24:27 +0200 (CEST)
Received: by mail-yw0-f170.google.com with SMTP id i129so150674305ywb.0
 for <dev@dpdk.org>; Sun, 25 Sep 2016 21:24:27 -0700 (PDT)
In-Reply-To: <1536480.IYe8r5XoNN@xps13>

Hi Thomas,

On 23 September 2016 at 21:41, Thomas Monjalon
<thomas.monjalon@6wind.com> wrote:
> 2016-09-23 18:41, Jianbo Liu:
>> On 23 September 2016 at 10:56, Wang, Zhihong <zhihong.wang@intel.com> wrote:
>> .....
>> > This is expected because the 2nd patch is just a baseline and all optimization
>> > patches are organized in the rest of this patch set.
>> >
>> > I think you can do bottleneck analysis on ARM to see what's slowing down the
>> > perf, there might be some micro-arch complications there, mostly likely in
>> > memcpy.
>> >
>> > Do you use glibc's memcpy? I suggest to hand-crafted it on your own.
>> >
>> > Could you publish the mrg_rxbuf=on data also? Since it's more widely used
>> > in terms of spec integrity.
>> >
>> I don't think it will be helpful for you, considering the differences
>> between x86 and arm.
>> So please move on with this patchset...
>
> Jianbo,
> I don't understand.
> You said that the 2nd patch is a regression:
> -       volatile uint16_t       last_used_idx;
> +       uint16_t                last_used_idx;
>
No, I meant the patch "vhost: rewrite enqueue".

> And the overrall series lead to performance regression
> for packets > 512 B, right?
> But we don't know wether you have tested the v6 or not.
Yes, I tested v6 and found a performance regression for packet sizes >= 512B.

>
> Zhihong talked about some improvements possible in rte_memcpy.
> ARM64 is using libc memcpy in rte_memcpy.
>
> Now you seem to give up.
> Does it mean you accept having a regression in 16.11 release?
> Are you working on rte_memcpy?
This patchset actually improves performance according to Zhihong's
results on the x86 platform, and I also see an improvement on ARM, at
least with small packets.
I don't want to give up, but I need more time to find the reason
for the regression. I think rte_memcpy is definitely one of the ways
to improve performance, but could it really be the cause of the regression?
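
One way to check whether the copy itself is the bottleneck would be to time
plain libc memcpy() per packet size, outside of vhost. The following is only
a minimal sketch under stated assumptions: bench_memcpy_ns() and
print_memcpy_table() are hypothetical helper names, the packet sizes are
illustrative, the compiler barrier is GCC/Clang-specific, and the buffers stay
hot in cache, so the numbers would not reflect the real enqueue path.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L /* for clock_gettime() */
#endif
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/*
 * Hypothetical helper: average nanoseconds per libc memcpy() of `len`
 * bytes over `iters` copies. Returns a negative value on bad arguments
 * or allocation failure.
 */
static double bench_memcpy_ns(size_t len, unsigned long iters)
{
	struct timespec t0, t1;
	uint8_t *src, *dst;
	unsigned long i;
	double elapsed_ns;

	if (len == 0 || iters == 0)
		return -1.0;

	src = malloc(len);
	dst = malloc(len);
	if (src == NULL || dst == NULL) {
		free(src);
		free(dst);
		return -1.0;
	}
	memset(src, 0xa5, len);

	clock_gettime(CLOCK_MONOTONIC, &t0);
	for (i = 0; i < iters; i++) {
		src[0] = (uint8_t)i; /* change the input on every round */
		memcpy(dst, src, len);
		/* Compiler barrier (GCC/Clang) so the copy is not elided. */
		__asm__ volatile("" : : "r"(dst) : "memory");
	}
	clock_gettime(CLOCK_MONOTONIC, &t1);

	elapsed_ns = (double)(t1.tv_sec - t0.tv_sec) * 1e9 +
		     (double)(t1.tv_nsec - t0.tv_nsec);

	free(src);
	free(dst);
	return elapsed_ns / (double)iters;
}

/* Hypothetical helper: print one line per (illustrative) packet size. */
static void print_memcpy_table(void)
{
	static const size_t sizes[] = { 64, 128, 256, 512, 1024, 1518 };
	size_t i;

	for (i = 0; i < sizeof(sizes) / sizeof(sizes[0]); i++) {
		double ns = bench_memcpy_ns(sizes[i], 1000000UL);

		assert(ns >= 0.0); /* arguments above are always valid */
		printf("%4zu B: %.1f ns per copy\n", sizes[i], ns);
	}
}
```

Calling print_memcpy_table() from main() on both the x86 and the ARM board
would show whether the per-copy cost grows disproportionately at 512B and
above on ARM. If it does not, memcpy is probably not the main cause.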