From: Jianbo Liu
Subject: Re: [PATCH v3 0/5] vhost: optimize enqueue
Date: Mon, 26 Sep 2016 12:24:25 +0800
To: Thomas Monjalon
Cc: dev@dpdk.org, "Wang, Zhihong", Yuanhan Liu, Maxime Coquelin
References: <1471319402-112998-1-git-send-email-zhihong.wang@intel.com> <8F6C2BD409508844A0EFC19955BE09414E7B6204@SHSMSX103.ccr.corp.intel.com> <1536480.IYe8r5XoNN@xps13>
In-Reply-To: <1536480.IYe8r5XoNN@xps13>

Hi Thomas,

On 23 September 2016 at 21:41, Thomas Monjalon wrote:
> 2016-09-23 18:41, Jianbo Liu:
>> On 23 September 2016 at 10:56, Wang, Zhihong wrote:
>> .....
>> > This is expected because the 2nd patch is just a baseline and all optimization
>> > patches are organized in the rest of this patch set.
>> >
>> > I think you can do bottleneck analysis on ARM to see what's slowing down the
>> > perf, there might be some micro-arch complications there, most likely in
>> > memcpy.
>> >
>> > Do you use glibc's memcpy? I suggest hand-crafting it on your own.
>> >
>> > Could you publish the mrg_rxbuf=on data also? Since it's more widely used
>> > in terms of spec integrity.
>> >
>> I don't think it will be helpful for you, considering the differences
>> between x86 and arm.
>> So please move on with this patchset...
>
> Jianbo,
> I don't understand.
> You said that the 2nd patch is a regression:
> -       volatile uint16_t       last_used_idx;
> +       uint16_t                last_used_idx;
>
No, I meant "vhost: rewrite enqueue".

> And the overall series leads to a performance regression
> for packets > 512 B, right?
> But we don't know whether you have tested the v6 or not.

Yes, I tested v6, and found a performance regression for sizes >= 512B.

>
> Zhihong talked about some improvements possible in rte_memcpy.
> ARM64 is using libc memcpy in rte_memcpy.
>
> Now you seem to give up.
> Does it mean you accept having a regression in 16.11 release?
> Are you working on rte_memcpy?

This patchset actually improves performance according to Zhihong's
results on the x86 platform. And I also get an improvement, at least with
small-size packets, on ARM.
I don't want to give up, but I need more time to find out the reason
for the regression. I think rte_memcpy is definitely one of the ways
to improve performance, but could it be the reason for the regression?
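
One rough way to check whether the copy routine alone could account for a
regression that only shows up at >= 512B is to time memcpy() per packet size
outside of vhost. The following is a minimal sketch, not code from this patch
set or from DPDK: the helper names copy_ns() and print_copy_profile() and the
list of packet sizes are assumptions made for illustration, and the compiler
barrier assumes GCC or Clang on Linux.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/*
 * Hypothetical helper (not a DPDK API): average cost in nanoseconds of one
 * memcpy() of `len` bytes, measured over `iters` back-to-back copies.
 * Returns a negative value if the buffers cannot be allocated.
 */
double copy_ns(size_t len, size_t iters)
{
	assert(len > 0 && iters > 0);

	uint8_t *src = malloc(len);
	uint8_t *dst = malloc(len);
	if (src == NULL || dst == NULL) {
		free(src);
		free(dst);
		return -1.0;
	}
	memset(src, 0xa5, len);

	struct timespec t0, t1;
	clock_gettime(CLOCK_MONOTONIC, &t0);
	for (size_t i = 0; i < iters; i++) {
		memcpy(dst, src, len);
		/* GCC/Clang compiler barrier so the copy is not optimized away. */
		__asm__ volatile("" : : "r"(dst) : "memory");
	}
	clock_gettime(CLOCK_MONOTONIC, &t1);

	double elapsed_ns = (double)(t1.tv_sec - t0.tv_sec) * 1e9 +
			    (double)(t1.tv_nsec - t0.tv_nsec);

	free(src);
	free(dst);
	return elapsed_ns / (double)iters;
}

/* Assumed packet sizes for illustration; adjust to the sizes under test. */
void print_copy_profile(size_t iters)
{
	static const size_t pkt_sizes[] = { 64, 128, 256, 512, 1024, 1518 };

	for (size_t i = 0; i < sizeof(pkt_sizes) / sizeof(pkt_sizes[0]); i++)
		printf("%4zu B: %.1f ns/copy\n",
		       pkt_sizes[i], copy_ns(pkt_sizes[i], iters));
}
```

Calling print_copy_profile(1000000) from a small main() on both platforms
gives a per-size cost table. This only measures hot-cache copies between two
fixed buffers, so it can hint at a jump in copy cost around 512B but cannot
by itself confirm or rule out memcpy as the cause inside the enqueue path.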