From: Maxime Coquelin
To: Zhihong Wang, dev@dpdk.org
Cc: yuanhan.liu@linux.intel.com
Subject: Re: [PATCH v3 0/5] vhost: optimize enqueue
Date: Mon, 22 Aug 2016 12:01:47 +0200
Message-ID: <2a5320b9-c9dd-f7e3-e659-7d14cd1e5620@redhat.com>
References: <1471319402-112998-1-git-send-email-zhihong.wang@intel.com>
 <1471585430-125925-1-git-send-email-zhihong.wang@intel.com>

On 08/22/2016 10:11 AM, Maxime Coquelin wrote:
> Hi Zhihong,
>
> On 08/19/2016 07:43 AM, Zhihong Wang wrote:
> > This patch set optimizes the vhost enqueue function.
> >
> > It implements the vhost logic from scratch in a single function
> > designed for high performance and good maintainability, and
> > significantly improves CPU efficiency by optimizing cache access,
> > which means:
> >
> > * For fast frontends (e.g. the DPDK virtio PMD), higher performance
> >   (maximum throughput) can be achieved.
> >
> > * For slow frontends (e.g. kernel virtio-net), better scalability
> >   can be achieved; each vhost core can support more connections
> >   since it takes fewer cycles to handle each frontend.
> >
> > The main optimization techniques are:
> >
> > 1. Reorder code to reduce CPU pipeline stall cycles.
> >
> > 2. Batch updates to the used ring for better efficiency.
> >
> > 3. Prefetch descriptors to hide cache latency.
> >
> > 4. Remove the useless volatile attribute to allow compiler
> >    optimization.
>
> Thanks for these details, they are helpful to understand where the
> perf gain comes from.
> I would suggest adding this information as comments in the code
> where/if it makes sense. If it is more of a general comment, at least
> add it to the commit message of the patch introducing it.
> Indeed, adding it to the cover letter is fine, but the information is
> lost as soon as the series is applied.
>
> You don't mention any figures, so I set up a benchmark on my side to
> evaluate your series. It indeed shows an interesting performance gain.
>
> My setup consists of one host running a guest.
> The guest generates as many 64-byte packets as possible using
> pktgen-dpdk. The host forwards received packets back to the guest
> using testpmd on the vhost PMD interface. The guest's vCPUs are
> pinned to physical CPUs.
>
> I tested it with and without your v1 patch, with and without the
> rx-mergeable feature turned ON.
> Results are the average of 8 runs of 60 seconds each:
>
> Rx-Mergeable ON : 7.72Mpps
> Rx-Mergeable ON + "vhost: optimize enqueue" v1: 9.19Mpps
> Rx-Mergeable OFF: 10.52Mpps
> Rx-Mergeable OFF + "vhost: optimize enqueue" v1: 10.60Mpps

I forgot to add that, before this series, I think we should first fix
the Windows bug. Otherwise we will need a dedicated fix for the stable
branch.

Regards,
Maxime
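
P.S. For readers following along, here is a rough, self-contained
sketch of what techniques 2 and 3 above (batched used-ring updates and
descriptor prefetching) can look like. The types and function below are
simplified stand-ins based on the virtio spec, not the actual code from
the series; it assumes the queue size is a power of two (as virtio
requires) and a burst of at most 64 packets.

/* Illustrative sketch only -- not the code from the series. */
#include <stdint.h>

struct vring_desc      { uint64_t addr; uint32_t len;
                         uint16_t flags; uint16_t next; };
struct vring_used_elem { uint32_t id; uint32_t len; };
struct vring_used      { uint16_t flags; uint16_t idx;
                         struct vring_used_elem ring[]; };

/* DPDK code would use rte_prefetch0(); this is the portable GCC hint. */
static inline void prefetch(const void *p) { __builtin_prefetch(p, 0, 3); }

/* Enqueue a burst of packets whose head descriptor indices have
 * already been read from the avail ring into avail_ids[]. */
static void
enqueue_burst(struct vring_desc *desc, struct vring_used *used,
              const uint16_t *avail_ids, uint16_t count, uint16_t size)
{
    struct vring_used_elem shadow[64]; /* local shadow of used entries */
    uint16_t used_idx = used->idx;
    uint16_t i;

    for (i = 0; i < count; i++) {
        /* Technique 3: prefetch the next descriptor while the current
         * packet is processed, hiding the cache-miss latency. */
        if (i + 1 < count)
            prefetch(&desc[avail_ids[i + 1]]);

        struct vring_desc *d = &desc[avail_ids[i]];
        /* ... copy the packet into the buffer at d->addr ... */
        shadow[i].id  = avail_ids[i];
        shadow[i].len = d->len; /* really: number of bytes written */
    }

    /* Technique 2: publish all used entries in one pass instead of one
     * guest-visible store per packet, then a single write barrier and
     * index update (DPDK code would use rte_smp_wmb() here). */
    for (i = 0; i < count; i++)
        used->ring[(uint16_t)(used_idx + i) & (size - 1)] = shadow[i];

    __atomic_thread_fence(__ATOMIC_RELEASE);
    used->idx = used_idx + count;
}

The point of the shadow array is that the used ring is shared with the
guest, so touching it once per burst rather than once per packet keeps
its cache lines from bouncing on every packet.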