From mboxrd@z Thu Jan 1 00:00:00 1970 From: "Wang, Zhihong" Subject: Re: [PATCH v3 0/5] vhost: optimize enqueue Date: Wed, 12 Oct 2016 12:22:08 +0000 Message-ID: <8F6C2BD409508844A0EFC19955BE09414E7CD78E@SHSMSX103.ccr.corp.intel.com> References: <1471319402-112998-1-git-send-email-zhihong.wang@intel.com> <1471585430-125925-1-git-send-email-zhihong.wang@intel.com> <8F6C2BD409508844A0EFC19955BE09414E7B5581@SHSMSX103.ccr.corp.intel.com> <20160922022903.GJ23158@yliu-dev.sh.intel.com> <20161012025307.GG16751@yliu-dev.sh.intel.com> Mime-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: quoted-printable Cc: "dev@dpdk.org" To: Yuanhan Liu , Jianbo Liu , Thomas Monjalon , Maxime Coquelin Return-path: Received: from mga05.intel.com (mga05.intel.com [192.55.52.43]) by dpdk.org (Postfix) with ESMTP id B5D2C6A73 for ; Wed, 12 Oct 2016 14:22:12 +0200 (CEST) In-Reply-To: <20161012025307.GG16751@yliu-dev.sh.intel.com> Content-Language: en-US List-Id: patches and discussions about DPDK List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Sender: "dev" > -----Original Message----- > From: Yuanhan Liu [mailto:yuanhan.liu@linux.intel.com] > Sent: Wednesday, October 12, 2016 10:53 AM > To: Wang, Zhihong ; Jianbo Liu > Cc: Maxime Coquelin ; dev@dpdk.org; Thomas > Monjalon > Subject: Re: [dpdk-dev] [PATCH v3 0/5] vhost: optimize enqueue >=20 > On Thu, Sep 22, 2016 at 01:47:45PM +0800, Jianbo Liu wrote: > > On 22 September 2016 at 10:29, Yuanhan Liu > wrote: > > > On Wed, Sep 21, 2016 at 08:54:11PM +0800, Jianbo Liu wrote: > > >> >> > My setup consists of one host running a guest. > > >> >> > The guest generates as much 64bytes packets as possible using > > >> >> > > >> >> Have you tested with other different packet size? > > >> >> My testing shows that performance is dropping when packet size is= more > > >> >> than 256. > > >> > > > >> > > > >> > Hi Jianbo, > > >> > > > >> > Thanks for reporting this. > > >> > > > >> > 1. Are you running the vector frontend with mrg_rxbuf=3Doff? > > >> > > > Yes, my testing is mrg_rxbuf=3Doff, but not vector frontend PMD. > > > > >> > 2. Could you please specify what CPU you're running? Is it Haswel= l > > >> > or Ivy Bridge? > > >> > > > It's an ARM server. > > > > >> > 3. How many percentage of drop are you seeing? > > The testing result: > > size (bytes) improvement (%) > > 64 3.92 > > 128 11.51 > > 256 24.16 > > 512 -13.79 > > 1024 -22.51 > > 1500 -12.22 > > A correction is that performance is dropping if byte size is larger tha= n 512. >=20 > I have thought of this twice. Unfortunately, I think I may need NACK this > series. >=20 > Merging two code path into one is really good: as you stated, it improves > the maintainability. But only if we see no performance regression on both > path after the refactor. Unfortunately, that's not the case here: it hurt= s > the performance for one code path (non-mrg Rx). >=20 > That makes me think we may should not do the code path merge at all. I th= ink > that also aligns with what you have said before (internally): we could do= the > merge if it gives comparable performance before and after that. >=20 > Besides that, I don't quite like the way you did in patch 2 (rewrite enqu= eue): > you made a lot of changes in one patch. That means if something wrong > happened, > it is hard to narrow down which change introduces that regression. Badly, > that's exactly what we met here. Weeks have been passed, I see no progres= s. >=20 > That's the reason we like the idea of "one patch only does one thing, an > atomic thing". Yuanhan, folks, Thanks for the analysis. I disagree here though. I analyze, develop, benchmark on x86 platforms, where this patch works great. I've been trying to analyze on ARM too but it takes time and I've had a schedule. Also since the ARM perf issue comes when it's v6 already, I might not be able to make it in time. However that's what I have to do for this patch to be merged in this or the next release. In the meantime, may I suggest we consider the possibility to have dedicated codes for **perf critical paths** for different kinds of architecture? It can be hard for a person to have both the knowledge and the development environment for multiple archs at the same time. Moreover, different optimization techniques might be required for different archs, so it's hard and unnecessary to make a function works for all archs, sometimes it's just not the right thing to do. Thanks Zhihong >=20 > So I will apply the first patch (it's a bug fixing patch) and ask you to > refactor the rest, without the code path merge. >=20 > I think we could still have a good maintainability code base if we introd= uce > more common helper functions that can be used on both Rx path, or even on > Tx path (such as update_used_ring, or shadow_used_ring). >=20 > It's a bit late for too many changes for v16.11. I think you could just > grab patch 6 (vhost: optimize cache access) to the old mrg-Rx code path, > if that also helps the performance? Let us handle the left in next releas= e, > such as shadow used ring. >=20 > Thanks. >=20 > --yliu