From: Thomas Monjalon
Subject: Re: [PATCH v3 0/5] vhost: optimize enqueue
Date: Fri, 23 Sep 2016 15:41:08 +0200
Message-ID: <1536480.IYe8r5XoNN@xps13>
To: Jianbo Liu
Cc: dev@dpdk.org, "Wang, Zhihong", Yuanhan Liu, Maxime Coquelin

2016-09-23 18:41, Jianbo Liu:
> On 23 September 2016 at 10:56, Wang, Zhihong wrote:
> .....
> > This is expected because the 2nd patch is just a baseline and all optimization
> > patches are organized in the rest of this patch set.
> >
> > I think you can do bottleneck analysis on ARM to see what's slowing down the
> > perf, there might be some micro-arch complications there, mostly likely in
> > memcpy.
> >
> > Do you use glibc's memcpy? I suggest to hand-crafted it on your own.
> >
> > Could you publish the mrg_rxbuf=on data also? Since it's more widely used
> > in terms of spec integrity.
> >
> I don't think it will be helpful for you, considering the differences
> between x86 and arm.
> So please move on with this patchset...

Jianbo,
I don't understand.
You said that the 2nd patch is a regression:
-       volatile uint16_t last_used_idx;
+       uint16_t last_used_idx;

And the overall series leads to a performance regression
for packets > 512 B, right?
But we don't know whether you have tested the v6 or not.

Zhihong talked about some improvements possible in rte_memcpy.
ARM64 is using libc memcpy in rte_memcpy.

Now you seem to give up.
Does it mean you accept having a regression in the 16.11 release?
Are you working on rte_memcpy?