From: Jianbo Liu
Subject: Re: [PATCH v3 0/5] vhost: optimize enqueue
Date: Mon, 26 Sep 2016 13:38:58 +0800
To: "Wang, Zhihong"
Cc: Thomas Monjalon, dev@dpdk.org, Yuanhan Liu, Maxime Coquelin

On 26 September 2016 at 13:25, Wang, Zhihong wrote:
>
>
>> -----Original Message-----
>> From: Jianbo Liu [mailto:jianbo.liu@linaro.org]
>> Sent: Monday, September 26, 2016 1:13 PM
>> To: Wang, Zhihong
>> Cc: Thomas Monjalon; dev@dpdk.org; Yuanhan Liu; Maxime Coquelin
>> Subject: Re: [dpdk-dev] [PATCH v3 0/5] vhost: optimize enqueue
>>
>> On 25 September 2016 at 13:41, Wang, Zhihong wrote:
>> >
>> >> -----Original Message-----
>> >> From: Thomas Monjalon [mailto:thomas.monjalon@6wind.com]
>> >> Sent: Friday, September 23, 2016 9:41 PM
>> >> To: Jianbo Liu
>> >> Cc: dev@dpdk.org; Wang, Zhihong; Yuanhan Liu; Maxime Coquelin
>> >>
>> ....
>> > This patch does help on ARM for small packets like 64B-sized ones,
>> > which actually proves the similarity between x86 and ARM in terms
>> > of the caching optimization in this patch.
>> >
>> > My estimation is based on:
>> >
>> > 1. The last patch is for mrg_rxbuf=on, and since you said it helps
>> > perf, we can ignore it for now when we discuss mrg_rxbuf=off.
>> >
>> > 2. Vhost enqueue perf =
>> > Ring overhead + Virtio header overhead + Data memcpy overhead
>> >
>> > 3. This patch helps small-packet traffic, which means it helps
>> > the ring + virtio header operations.
>> >
>> > 4. So, when you say perf drops when packet size is larger than 512B,
>> > this is most likely caused by memcpy on ARM not working well
>> > with this patch.
>> >
>> > I'm not saying glibc's memcpy is not good enough; it's just that
>> > this is a rather special use case. And since we see that specialized
>> > memcpy + this patch gives significantly better performance than
>> > other combinations on x86, we suggest hand-crafting a specialized
>> > memcpy for it.
>> >
>> > Of course on ARM this is still just my speculation, and we need to
>> > either prove it or find the actual root cause.
>> >
>> > It would be **REALLY HELPFUL** if you could help test this patch on
>> > ARM for the mrg_rxbuf=on cases, to see whether this patch helps ARM
>> > at all, since mrg_rxbuf=on is the more widely used case.
>> >
>> Actually it's worse than mrg_rxbuf=off.
>
> I mean comparing the perf of original vs. original + patch with
> mrg_rxbuf turned on. Is there any perf improvement?
>
Yes, orig + patch + on is better than orig + on, but orig + patch + on
is worse than orig + patch + off.
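
[Editor's note: to make the "hand-crafted specialized memcpy" suggestion
above concrete, here is a minimal C sketch. It is NOT DPDK's actual
rte_memcpy (which uses vector loads/stores and alignment handling), and
small_pkt_copy is a hypothetical name invented for this illustration.
It shows the general idea: a copy routine tuned to the small buffer
sizes typical of vhost enqueue can keep the 64B hot path short, which
is what the thread claims beats a general-purpose glibc memcpy on x86
for this workload.]

    /* Minimal sketch of a copy routine specialized for small packets.
     * Hypothetical example, not DPDK code. */
    #include <stdint.h>
    #include <stddef.h>
    #include <string.h>
    #include <stdio.h>

    static inline void
    small_pkt_copy(void *dst, const void *src, size_t len)
    {
        uint8_t *d = dst;
        const uint8_t *s = src;

        /* Copy 8 bytes per iteration; a fixed-size memcpy like this
         * compiles to a single load/store pair on x86 and ARMv8, so a
         * 64B packet needs only eight iterations and no byte tail. */
        while (len >= 8) {
            memcpy(d, s, 8);
            d += 8;
            s += 8;
            len -= 8;
        }

        /* Byte-at-a-time tail for odd lengths. */
        while (len--)
            *d++ = *s++;
    }

    int main(void)
    {
        uint8_t src[64], dst[64] = {0};

        for (int i = 0; i < 64; i++)
            src[i] = (uint8_t)i;

        small_pkt_copy(dst, src, sizeof(src)); /* one 64B "packet" */
        printf("dst[63] = %u\n", dst[63]);     /* prints 63 */
        return 0;
    }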