From: Tiwei Bie
Subject: Re: [PATCH] vhost: adaptively batch small guest memory copies
Date: Fri, 1 Sep 2017 18:33:23 +0800
Message-ID: <20170901103322.GA10109@debian-ZGViaWFuCg>
References: <20170824021939.21306-1-tiwei.bie@intel.com> <8697fb77-a1d6-c3de-2bc4-2a9956fbad36@redhat.com>
In-Reply-To: <8697fb77-a1d6-c3de-2bc4-2a9956fbad36@redhat.com>
To: Maxime Coquelin
Cc: dev@dpdk.org, yliu@fridaylinux.org, Zhihong Wang, Zhiyong Yang, Santosh Shukla, Jerin Jacob, hemant.agrawal@nxp.com
List-Id: DPDK patches and discussions

On Fri, Sep 01, 2017 at 11:45:42AM +0200, Maxime Coquelin wrote:
> On 08/24/2017 04:19 AM, Tiwei Bie wrote:
> > This patch adaptively batches the small guest memory copies.
> > By batching the small copies, the efficiency of executing the
> > memory LOAD instructions can be improved greatly, because the
> > memory LOAD latency can be effectively hidden by the pipeline.
> > We saw great performance boosts in the small-packet PVP test.
> >
> > This patch improves the performance for small packets, and it
> > distinguishes packets by size. So although the performance for
> > big packets doesn't change, this makes it relatively easy to
> > apply special optimizations to big packets too.
>
> Do you mean that if we batched unconditionally, whatever the size,
> we would see a performance drop for larger (>256 B) packets?

Yeah, you are right.

> The other question is about indirect descriptors. My understanding of
> the patch is that the number of batched copies is limited to the
> queue size.
> In theory, we could have more than that with indirect descriptors
> (first indirect desc for the vnet header, second one for the packet).
>
> So in the worst case, we would have the first small copies being
> batched, but not the last ones if there are more than the queue size.
> So, I think it works, but I'd like your confirmation.

Yeah, you are right. If the number of small copies is larger than the
queue size, the last ones won't be batched anymore.

> > Signed-off-by: Tiwei Bie
> > Signed-off-by: Zhihong Wang
> > Signed-off-by: Zhiyong Yang
> > ---
> > This optimization depends on the CPU's internal pipeline design,
> > so further tests (e.g. on ARM) from the community are appreciated.
>
> Agreed. I think it is important to have this tested on ARM platforms
> at least, to ensure it doesn't introduce a regression.
>
> Adding Santosh, Jerin & Hemant in Cc, who might know who could run
> the tests.

Thank you very much! :-)

Best regards,
Tiwei Bie
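P.S. For readers following the thread, here is a minimal, hypothetical C
sketch of the batching idea being discussed. It is not the actual patch
code: the names (`batch_copy_elem`, `do_copy`, `flush_batched_copies`)
and the constants (`BATCH_SIZE` standing in for the virtqueue size,
`SMALL_COPY_THRESHOLD` for the small-packet cutoff) are illustrative
assumptions. Small copies are deferred into a bounded batch and executed
back to back at flush time, so consecutive memory LOADs can overlap in
the CPU pipeline; large copies, and small copies that overflow the
batch, are performed immediately, matching the behavior described above.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical deferred-copy element (illustrative, not the patch code). */
struct batch_copy_elem {
    void *dst;
    const void *src;
    size_t len;
};

#define BATCH_SIZE 64             /* stand-in for the virtqueue size */
#define SMALL_COPY_THRESHOLD 256  /* larger copies bypass the batch */

static struct batch_copy_elem batch[BATCH_SIZE];
static size_t nr_batched;

/* Execute all pending small copies back to back, so their memory
 * LOADs can be pipelined, then reset the batch. */
static void flush_batched_copies(void)
{
    for (size_t i = 0; i < nr_batched; i++)
        memcpy(batch[i].dst, batch[i].src, batch[i].len);
    nr_batched = 0;
}

/* Defer small copies into the batch; copy large ones (or small ones
 * arriving when the batch is already full) immediately. */
static void do_copy(void *dst, const void *src, size_t len)
{
    if (len > SMALL_COPY_THRESHOLD || nr_batched == BATCH_SIZE) {
        memcpy(dst, src, len);  /* not batched: too big, or batch full */
        return;
    }
    batch[nr_batched].dst = dst;
    batch[nr_batched].src = src;
    batch[nr_batched].len = len;
    nr_batched++;
}
```

Because the batch array is bounded, copies deferred after it fills are
simply done inline, which is the "last ones won't be batched" case from
the indirect-descriptor discussion above.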