From: Tiwei Bie
Subject: Re: [PATCH] vhost: adaptively batch small guest memory copies
Date: Fri, 1 Sep 2017 18:33:23 +0800
Message-ID: <20170901103322.GA10109@debian-ZGViaWFuCg>
References: <20170824021939.21306-1-tiwei.bie@intel.com> <8697fb77-a1d6-c3de-2bc4-2a9956fbad36@redhat.com>
In-Reply-To: <8697fb77-a1d6-c3de-2bc4-2a9956fbad36@redhat.com>
To: Maxime Coquelin
Cc: dev@dpdk.org, yliu@fridaylinux.org, Zhihong Wang, Zhiyong Yang, Santosh Shukla, Jerin Jacob, hemant.agrawal@nxp.com
List-Id: DPDK patches and discussions

On Fri, Sep 01, 2017 at 11:45:42AM +0200, Maxime Coquelin wrote:
> On 08/24/2017 04:19 AM, Tiwei Bie wrote:
> > This patch adaptively batches the small guest memory copies.
> > By batching the small copies, the efficiency of executing the
> > memory LOAD instructions can be improved greatly, because the
> > memory LOAD latency can be effectively hidden by the pipeline.
> > We saw great performance boosts in the small-packet PVP test.
> >
> > This patch improves the performance for small packets, and it
> > distinguishes packets by size. So although the performance for
> > big packets doesn't change, this makes it relatively easy to
> > apply special optimizations to big packets too.
>
> Do you mean that if we batched unconditionally, whatever the size,
> we would see a performance drop for larger (>256 B) packets?

Yeah, you are right.

> The other question is about indirect descriptors. My understanding of
> the patch is that the number of batched copies is limited to the
> queue size.
> In theory, we could have more than that with indirect descriptors
> (first indirect desc for the vnet header, second one for the packet).
>
> So in the worst case, we would have the first small copies being
> batched, but not the last ones if there are more than the queue size.
> So, I think it works, but I'd like your confirmation.

Yeah, you are right. If the number of small copies is larger than the
queue size, the last ones won't be batched anymore.

> > Signed-off-by: Tiwei Bie
> > Signed-off-by: Zhihong Wang
> > Signed-off-by: Zhiyong Yang
> > ---
> > This optimization depends on the CPU's internal pipeline design,
> > so further tests (e.g. on ARM) from the community are appreciated.
>
> Agreed. I think it is important to have this tested on ARM platforms
> at least, to ensure it doesn't introduce a regression.
>
> Adding Santosh, Jerin & Hemant in Cc, who might know who could run
> the tests.

Thank you very much! :-)

Best regards,
Tiwei Bie
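P.S. For readers following the thread, here is a minimal, hypothetical C
sketch of the batching idea being discussed. It is not the actual patch
code: the names (`batch_copy_elem`, `do_copy`, `flush_batched_copies`)
and the constants (`BATCH_SIZE` standing in for the virtqueue size,
`SMALL_COPY_THRESHOLD` for the small-packet cutoff) are illustrative
assumptions. Small copies are deferred into a bounded batch and executed
back to back at flush time, so consecutive memory LOADs can overlap in
the CPU pipeline; large copies, and small copies that overflow the
batch, are performed immediately, matching the behavior described above.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical deferred-copy element (illustrative, not the patch code). */
struct batch_copy_elem {
    void *dst;
    const void *src;
    size_t len;
};

#define BATCH_SIZE 64             /* stand-in for the virtqueue size */
#define SMALL_COPY_THRESHOLD 256  /* larger copies bypass the batch */

static struct batch_copy_elem batch[BATCH_SIZE];
static size_t nr_batched;

/* Execute all pending small copies back to back, so their memory
 * LOADs can be pipelined, then reset the batch. */
static void flush_batched_copies(void)
{
    for (size_t i = 0; i < nr_batched; i++)
        memcpy(batch[i].dst, batch[i].src, batch[i].len);
    nr_batched = 0;
}

/* Defer small copies into the batch; copy large ones (or small ones
 * arriving when the batch is already full) immediately. */
static void do_copy(void *dst, const void *src, size_t len)
{
    if (len > SMALL_COPY_THRESHOLD || nr_batched == BATCH_SIZE) {
        memcpy(dst, src, len);  /* not batched: too big, or batch full */
        return;
    }
    batch[nr_batched].dst = dst;
    batch[nr_batched].src = src;
    batch[nr_batched].len = len;
    nr_batched++;
}
```

Because the batch array is bounded, copies deferred after it fills are
simply done inline, which is the "last ones won't be batched" case from
the indirect-descriptor discussion above.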