From: Jianbo Liu
Subject: Re: [PATCH v3 0/5] vhost: optimize enqueue
Date: Mon, 26 Sep 2016 13:12:46 +0800
To: "Wang, Zhihong"
Cc: Thomas Monjalon, "dev@dpdk.org", Yuanhan Liu, Maxime Coquelin

On 25 September 2016 at 13:41, Wang, Zhihong wrote:
>
>
>> -----Original Message-----
>> From: Thomas Monjalon [mailto:thomas.monjalon@6wind.com]
>> Sent: Friday, September 23, 2016 9:41 PM
>> To: Jianbo Liu
>> Cc: dev@dpdk.org; Wang, Zhihong ; Yuanhan Liu
>> ; Maxime Coquelin
>> ....
>
> This patch does help on ARM for small packets such as 64B ones,
> which actually proves that x86 and ARM behave similarly with respect
> to the caching optimization in this patch.
>
> My estimation is based on:
>
> 1. The last patch is for mrg_rxbuf=on, and since you said it helps
>    perf, we can ignore it for now while we discuss mrg_rxbuf=off.
>
> 2. Vhost enqueue perf =
>    Ring overhead + Virtio header overhead + Data memcpy overhead
>
> 3. This patch helps small-packet traffic, which means it helps
>    ring + virtio header operations.
>
> 4.
>    So, when you say perf drops when the packet size is larger than
>    512B, this is most likely caused by memcpy on ARM not working
>    well with this patch.
>
> I'm not saying glibc's memcpy is not good enough; it's just that
> this is a rather special use case. And since we see that a specialized
> memcpy + this patch gives significantly better performance than other
> combinations on x86, we suggest hand-crafting a specialized memcpy
> for it.
>
> Of course, on ARM this is still just my speculation, and we need to
> either prove it or find the actual root cause.
>
> It would be **REALLY HELPFUL** if you could help test this patch on
> ARM for mrg_rxbuf=on cases to see whether this patch is in fact helpful
> to ARM at all, since mrg_rxbuf=on is the more widely used case.
>
Actually, it's worse than with mrg_rxbuf=off.
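The estimate in points 2 to 4 can be written as a simple per-packet cost model. This is an illustrative sketch only: the struct, the function names, and the choice to express each term in cycles are assumptions, not code from the patch or from DPDK, and no values here are measured. It shows why a change that only trims the fixed ring and virtio header terms is most visible on small packets, while the memcpy term dominates as packets grow.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-packet cost model for vhost enqueue, following
 *   enqueue = ring overhead + virtio header overhead + data memcpy overhead.
 * The fields are placeholders to be filled in from real measurements. */
struct enqueue_cost_model {
    double ring_cycles;          /* fixed per-packet ring/descriptor cost */
    double vhdr_cycles;          /* fixed per-packet virtio-net header cost */
    double copy_cycles_per_byte; /* data memcpy cost per payload byte */
};

/* Total modeled cost of enqueueing one packet of pkt_len bytes. */
static double enqueue_cycles(const struct enqueue_cost_model *m, size_t pkt_len)
{
    assert(m->ring_cycles >= 0.0 && m->vhdr_cycles >= 0.0);
    assert(m->copy_cycles_per_byte >= 0.0);
    return m->ring_cycles + m->vhdr_cycles +
           m->copy_cycles_per_byte * (double)pkt_len;
}

/* Fraction of the modeled cost spent in memcpy. With a positive per-byte
 * cost it grows with pkt_len and tends toward 1, so an optimization that
 * only shrinks the fixed ring/header terms matters most for small packets. */
static double memcpy_share(const struct enqueue_cost_model *m, size_t pkt_len)
{
    double total = enqueue_cycles(m, pkt_len);
    return total > 0.0 ? m->copy_cycles_per_byte * (double)pkt_len / total : 0.0;
}
```

Plugging in per-packet and per-byte costs measured on a given platform would be one way to check whether a drop above 512B is consistent with the memcpy term alone.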
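As for what a "specialized memcpy" might look like, one common approach for short copies is to move data in fixed-size chunks and finish with a single overlapping chunk instead of a byte-by-byte tail. The sketch below is hypothetical: `copy_specialized` and `copy16` are illustrative names, it is neither the patch's code nor DPDK's `rte_memcpy`, and the 16-byte chunk size is an assumption that would need tuning per architecture (for example, ARM versus x86).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Copy exactly 16 bytes. A fixed-size memcpy lets the compiler emit plain
 * load/store instructions instead of a call into the C library. */
static inline void copy16(uint8_t *dst, const uint8_t *src)
{
    memcpy(dst, src, 16);
}

/* Hypothetical size-specialized copy; dst and src must not overlap.
 * n < 16: byte loop. n >= 16: 16-byte chunks, then one overlapping
 * 16-byte chunk that ends exactly at the last byte of the buffer. */
static void copy_specialized(void *dst_v, const void *src_v, size_t n)
{
    uint8_t *dst = dst_v;
    const uint8_t *src = src_v;

    assert(n == 0 || (dst_v != NULL && src_v != NULL));
    if (n < 16) {
        for (size_t i = 0; i < n; i++)
            dst[i] = src[i];
        return;
    }
    /* Bulk: copy 16-byte chunks while more than 16 bytes remain. */
    while (n > 16) {
        copy16(dst, src);
        dst += 16;
        src += 16;
        n -= 16;
    }
    /* Tail: 1..16 bytes remain. Step back so the chunk covers the last
     * 16 bytes; bytes already written are rewritten with the same values. */
    copy16(dst - (16 - n), src - (16 - n));
}
```

Whether something like this beats glibc's memcpy on a given ARM core is exactly the open question in this thread, so it would need to be benchmarked rather than assumed.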