From mboxrd@z Thu Jan  1 00:00:00 1970
From: Jianbo Liu <jianbo.liu@linaro.org>
Subject: Re: [PATCH v3 0/5] vhost: optimize enqueue
Date: Mon, 10 Oct 2016 14:57:46 +0800
Message-ID: <CAP4Qi3-5_5aAATEqnHhNtH0j6Q_VkzF4xbZQGZYbN5KA6oEX_g@mail.gmail.com>
References: <CAP4Qi3-cSgHDPC3Wne3RSL0t=Z-vhYUPsPWH6VAXsXsHYX6ShQ@mail.gmail.com>
 <8F6C2BD409508844A0EFC19955BE09414E7B5581@SHSMSX103.ccr.corp.intel.com>
 <CAP4Qi39-KD8pY-3M31asoDV+dja27XzFTsBMq9ignoawdL8=HQ@mail.gmail.com>
 <20160922022903.GJ23158@yliu-dev.sh.intel.com>
 <CAP4Qi392=aOMrSyTu-5qwpSLpwK-NVdHp-aztT-xT=BcRPWoew@mail.gmail.com>
 <8F6C2BD409508844A0EFC19955BE09414E7B5DAE@SHSMSX103.ccr.corp.intel.com>
 <CAP4Qi39YF6SoaiSaka0ioZFWb-2uzWZUbNP4CK7LqCQosaSmWQ@mail.gmail.com>
 <20160927102123.GL25823@yliu-dev.sh.intel.com>
 <8F6C2BD409508844A0EFC19955BE09414E7B7C0B@SHSMSX103.ccr.corp.intel.com>
 <8F6C2BD409508844A0EFC19955BE09414E7BBE7D@SHSMSX103.ccr.corp.intel.com>
 <20161010024428.GT1597@yliu-dev.sh.intel.com>
 <CAP4Qi3_jk5+=diZhq8V7FecUkZge1_riEq-bopY35n71Q_pANQ@mail.gmail.com>
 <8F6C2BD409508844A0EFC19955BE09414E7BC1BD@SHSMSX103.ccr.corp.intel.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Cc: Yuanhan Liu <yuanhan.liu@linux.intel.com>,
 Maxime Coquelin <maxime.coquelin@redhat.com>, "dev@dpdk.org" <dev@dpdk.org>
To: "Wang, Zhihong" <zhihong.wang@intel.com>
Return-path: <dev-bounces@dpdk.org>
Received: from mail-qk0-f176.google.com (mail-qk0-f176.google.com
 [209.85.220.176]) by dpdk.org (Postfix) with ESMTP id A0DB96936
 for <dev@dpdk.org>; Mon, 10 Oct 2016 08:57:47 +0200 (CEST)
Received: by mail-qk0-f176.google.com with SMTP id f128so73152840qkb.1
 for <dev@dpdk.org>; Sun, 09 Oct 2016 23:57:47 -0700 (PDT)
In-Reply-To: <8F6C2BD409508844A0EFC19955BE09414E7BC1BD@SHSMSX103.ccr.corp.intel.com>
List-Id: patches and discussions about DPDK <dev.dpdk.org>
List-Unsubscribe: <http://dpdk.org/ml/options/dev>,
 <mailto:dev-request@dpdk.org?subject=unsubscribe>
List-Archive: <http://dpdk.org/ml/archives/dev/>
List-Post: <mailto:dev@dpdk.org>
List-Help: <mailto:dev-request@dpdk.org?subject=help>
List-Subscribe: <http://dpdk.org/ml/listinfo/dev>,
 <mailto:dev-request@dpdk.org?subject=subscribe>
Errors-To: dev-bounces@dpdk.org
Sender: "dev" <dev-bounces@dpdk.org>

On 10 October 2016 at 14:22, Wang, Zhihong <zhihong.wang@intel.com> wrote:
>
>
>> -----Original Message-----
>> From: Jianbo Liu [mailto:jianbo.liu@linaro.org]
>> Sent: Monday, October 10, 2016 1:32 PM
>> To: Yuanhan Liu <yuanhan.liu@linux.intel.com>
>> Cc: Wang, Zhihong <zhihong.wang@intel.com>; Maxime Coquelin
>> <maxime.coquelin@redhat.com>; dev@dpdk.org
>> Subject: Re: [dpdk-dev] [PATCH v3 0/5] vhost: optimize enqueue
>>
>> On 10 October 2016 at 10:44, Yuanhan Liu <yuanhan.liu@linux.intel.com>
>> wrote:
>> > On Sun, Oct 09, 2016 at 12:09:07PM +0000, Wang, Zhihong wrote:
>> >> > > > Tested with testpmd, host: txonly, guest: rxonly
>> >> > > > size (bytes)     improvement (%)
>> >> > > > 64                    4.12
>> >> > > > 128                   6
>> >> > > > 256                   2.65
>> >> > > > 512                   -1.12
>> >> > > > 1024                 -7.02
>> >> > >
>> >> > > There is a difference between Zhihong's code and the old I spotted in
>> >> > > the first time: Zhihong removed the avail_idx prefetch. I understand
>> >> > > the prefetch becomes a bit tricky when mrg-rx code path is
>> considered;
>> >> > > thus, I didn't comment on that.
>> >> > >
>> >> > > That's one of the difference that, IMO, could drop a regression. I then
>> >> > > finally got a chance to add it back.
>> >> > >
>> >> > > A rough test shows it improves the performance of 1400B packet size
>> >> > greatly
>> >> > > in the "txonly in host and rxonly in guest" case: +33% is the number I
>> get
>> >> > > with my test server (Ivybridge).
>> >> >
>> >> > Thanks Yuanhan! I'll validate this on x86.
>> >>
>> >> Hi Yuanhan,
>> >>
>> >> Seems your code doesn't perform correctly. I write a new version
>> >> of avail idx prefetch but didn't see any perf benefit.
>> >>
>> >> To be honest I doubt the benefit of this idea. The previous mrg_off
>> >> code has this method but doesn't give any benefits.
>> >
>> > Good point. I thought of that before, too. But you know that I made it
>> > in rush, that I didn't think further and test more.
>> >
>> > I looked the code a bit closer this time, and spotted a bug: the prefetch
>> > actually didn't happen, due to following code piece:
>> >
>> >         if (vq->next_avail_idx >= NR_AVAIL_IDX_PREFETCH) {
>> >                 prefetch_avail_idx(vq);
>> >                 ...
>> >         }
>> >
>> > Since vq->next_avail_idx is set to 0 at the entrance of enqueue path,
>> > prefetch_avail_idx() will be called. The fix is easy though: just put
>> > prefetch_avail_idx before invoking enqueue_packet.
>> >
>> > In summary, Zhihong is right, I see no more gains with that fix :(
>> >
>> > However, as stated, that's kind of the only difference I found between
>> > yours and the old code, that maybe it's still worthwhile to have a
>> > test on ARM, Jianbo?
>> >
>> I haven't tested it, but I think it could be no improvement for ARM either.
>>
>> A smalll suggestion for enqueue_packet:
>>
>> .....
>> +       /* start copy from mbuf to desc */
>> +       while (mbuf_avail || mbuf->next) {
>> .....
>>
>> Considering pkt_len is in the first cache line (same as data_len),
>> while next pointer is in the second cache line,
>> is it better to check the total packet len, instead of the last mbuf's
>> next pointer to jump out of while loop and avoid possible cache miss?
>
> Jianbo,
>
> Thanks for the reply!
>
> This idea sounds good, but it won't help the general perf in my
> opinion, since the 2nd cache line is accessed anyway prior in
> virtio_enqueue_offload.
>
Yes, you are right. I'm thinking of prefetching beforehand.
And if it's a chained mbuf, virtio_enqueue_offload will not be called
in next loop.

> Also this would bring a NULL check when actually access mbuf->next.
>
> BTW, could you please publish the number of:
>
>  1. mrg_rxbuf=on, comparison between original and original + this patch
>
>  2. mrg_rxbuf=off, comparison between original and original + this patch
>
> So we can have a whole picture of how this patch impact on ARM platform.
>
I think you already have got many results in my previous emails.
Sorry I can't test right now and busy with other things.