From: Jianbo Liu
Subject: Re: [PATCH v3 0/5] vhost: optimize enqueue
Date: Mon, 26 Sep 2016 13:38:58 +0800
To: "Wang, Zhihong"
Cc: Thomas Monjalon, dev@dpdk.org, Yuanhan Liu, Maxime Coquelin

On 26 September 2016 at 13:25, Wang, Zhihong wrote:
>
>
>> -----Original Message-----
>> From: Jianbo Liu [mailto:jianbo.liu@linaro.org]
>> Sent: Monday, September 26, 2016 1:13 PM
>> To: Wang, Zhihong
>> Cc: Thomas Monjalon; dev@dpdk.org; Yuanhan Liu; Maxime Coquelin
>> Subject: Re: [dpdk-dev] [PATCH v3 0/5] vhost: optimize enqueue
>>
>> On 25 September 2016 at 13:41, Wang, Zhihong wrote:
>> >
>> >> -----Original Message-----
>> >> From: Thomas Monjalon [mailto:thomas.monjalon@6wind.com]
>> >> Sent: Friday, September 23, 2016 9:41 PM
>> >> To: Jianbo Liu
>> >> Cc: dev@dpdk.org; Wang, Zhihong; Yuanhan Liu; Maxime Coquelin
>> >>
>> ....
>> > This patch does help on ARM for small packets like 64B-sized ones,
>> > which actually proves the similarity between x86 and ARM in terms
>> > of the caching optimization in this patch.
>> >
>> > My estimation is based on:
>> >
>> > 1. The last patch is for mrg_rxbuf=on, and since you said it helps
>> > perf, we can ignore it for now when we discuss mrg_rxbuf=off.
>> >
>> > 2. Vhost enqueue perf =
>> > Ring overhead + Virtio header overhead + Data memcpy overhead
>> >
>> > 3. This patch helps small-packet traffic, which means it helps
>> > the ring + virtio header operations.
>> >
>> > 4. So, when you say perf drops when packet size is larger than 512B,
>> > this is most likely caused by memcpy on ARM not working well
>> > with this patch.
>> >
>> > I'm not saying glibc's memcpy is not good enough; it's just that
>> > this is a rather special use case. And since we see that specialized
>> > memcpy + this patch gives significantly better performance than
>> > other combinations on x86, we suggest hand-crafting a specialized
>> > memcpy for it.
>> >
>> > Of course on ARM this is still just my speculation, and we need to
>> > either prove it or find the actual root cause.
>> >
>> > It would be **REALLY HELPFUL** if you could help test this patch on
>> > ARM for the mrg_rxbuf=on cases, to see whether this patch helps ARM
>> > at all, since mrg_rxbuf=on is the more widely used case.
>> >
>> Actually it's worse than mrg_rxbuf=off.
>
> I mean comparing the perf of original vs. original + patch with
> mrg_rxbuf turned on. Is there any perf improvement?
>
Yes, orig + patch + on is better than orig + on, but orig + patch + on
is worse than orig + patch + off.
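
[Editor's note: to make the "hand-crafted specialized memcpy" suggestion
above concrete, here is a minimal C sketch. It is NOT DPDK's actual
rte_memcpy (which uses vector loads/stores and alignment handling), and
small_pkt_copy is a hypothetical name invented for this illustration.
It shows the general idea: a copy routine tuned to the small buffer
sizes typical of vhost enqueue can keep the 64B hot path short, which
is what the thread claims beats a general-purpose glibc memcpy on x86
for this workload.]

    /* Minimal sketch of a copy routine specialized for small packets.
     * Hypothetical example, not DPDK code. */
    #include <stdint.h>
    #include <stddef.h>
    #include <string.h>
    #include <stdio.h>

    static inline void
    small_pkt_copy(void *dst, const void *src, size_t len)
    {
        uint8_t *d = dst;
        const uint8_t *s = src;

        /* Copy 8 bytes per iteration; a fixed-size memcpy like this
         * compiles to a single load/store pair on x86 and ARMv8, so a
         * 64B packet needs only eight iterations and no byte tail. */
        while (len >= 8) {
            memcpy(d, s, 8);
            d += 8;
            s += 8;
            len -= 8;
        }

        /* Byte-at-a-time tail for odd lengths. */
        while (len--)
            *d++ = *s++;
    }

    int main(void)
    {
        uint8_t src[64], dst[64] = {0};

        for (int i = 0; i < 64; i++)
            src[i] = (uint8_t)i;

        small_pkt_copy(dst, src, sizeof(src)); /* one 64B "packet" */
        printf("dst[63] = %u\n", dst[63]);     /* prints 63 */
        return 0;
    }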