On Thu, Sep 22, 2016 at 05:01:41PM +0800, Jianbo Liu wrote:
> On 22 September 2016 at 14:58, Wang, Zhihong wrote:
> >
> >
> >> -----Original Message-----
> >> From: Jianbo Liu [mailto:jianbo.liu@linaro.org]
> >> Sent: Thursday, September 22, 2016 1:48 PM
> >> To: Yuanhan Liu
> >> Cc: Wang, Zhihong ; Maxime Coquelin
> >> ; dev@dpdk.org
> >> Subject: Re: [dpdk-dev] [PATCH v3 0/5] vhost: optimize enqueue
> >>
> >> On 22 September 2016 at 10:29, Yuanhan Liu
> >> wrote:
> >> > On Wed, Sep 21, 2016 at 08:54:11PM +0800, Jianbo Liu wrote:
> >> >> >> > My setup consists of one host running a guest.
> >> >> >> > The guest generates as many 64-byte packets as possible using
> >> >> >>
> >> >> >> Have you tested with other packet sizes?
> >> >> >> My testing shows that performance is dropping when the packet size
> >> >> >> is more than 256.
> >> >> >
> >> >> >
> >> >> > Hi Jianbo,
> >> >> >
> >> >> > Thanks for reporting this.
> >> >> >
> >> >> > 1. Are you running the vector frontend with mrg_rxbuf=off?
> >> >> >
> >> Yes, my testing is with mrg_rxbuf=off, but not with the vector frontend PMD.
> >>
> >> >> > 2. Could you please specify what CPU you're running? Is it Haswell
> >> >> > or Ivy Bridge?
> >> >> >
> >> It's an ARM server.
> >>
> >> >> > 3. What percentage of drop are you seeing?
> >> The testing result:
> >>
> >> size (bytes)   improvement (%)
> >>   64             3.92
> >>  128            11.51
> >>  256            24.16
> >>  512           -13.79
> >> 1024           -22.51
> >> 1500           -12.22
> >>
> >> A correction is that performance is dropping when the packet size is 512 or larger.
> >
> >
> > Jianbo,
> >
> > Could you please verify whether this patch really causes the enqueue perf
> > to drop?
> >
> > You can test the enqueue path only by setting the guest to do rxonly, and
> > comparing the mpps reported by "show port stats all" in the guest.
> >
> >
> Tested with testpmd, host: txonly, guest: rxonly
>
> size (bytes)   improvement (%)
>   64             4.12
>  128             6
>  256             2.65
>  512            -1.12
> 1024            -7.02

There is a difference between Zhihong's code and the old code that I
spotted the first time around: Zhihong removed the avail_idx prefetch.
I understand the prefetch becomes a bit tricky when the mrg-rx code
path is considered; thus, I didn't comment on that.

That's one of the differences that, IMO, could cause such a regression.

I then finally got a chance to add it back (see the illustrative sketch
at the end of this mail).

A rough test shows it greatly improves the performance for 1400B
packets in the "txonly in host and rxonly in guest" case: +33% is the
number I get with my test server (Ivy Bridge).

I guess this might help your case as well. Would you mind giving it a
test and letting me know the results?

BTW, I made it in a rush; I haven't tested the mrg-rx code path yet.

Thanks.

	--yliu
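
To make the prefetch idea concrete, here is a minimal, hypothetical
sketch (not the actual patch) of prefetching the avail ring one slot
ahead in the enqueue loop. It assumes the DPDK vhost internals of that
period (struct vhost_virtqueue with its avail ring and size fields, and
rte_prefetch0() from rte_prefetch.h); the exact placement in the real
enqueue path may well differ.

#include <stdint.h>
#include <rte_prefetch.h>
#include "vhost.h"	/* illustrative: whatever header declares struct vhost_virtqueue */

/*
 * Hypothetical sketch only: prefetch the next avail ring slot one
 * iteration ahead, so the descriptor index is already in cache when
 * the following iteration reads it.
 */
static inline void
copy_burst_with_avail_prefetch(struct vhost_virtqueue *vq,
			       uint16_t start, uint16_t count)
{
	uint16_t i;

	for (i = 0; i < count; i++) {
		uint16_t slot = (start + i) & (vq->size - 1);

		/* Hide the avail ring load latency behind the copy of
		 * the current packet. */
		if (i + 1 < count)
			rte_prefetch0(&vq->avail->ring[(slot + 1) &
						       (vq->size - 1)]);

		/* ... read vq->avail->ring[slot], then copy mbuf i into
		 * the guest buffers of that descriptor chain ... */
	}
}

The point is simply that the load of the next descriptor index is
issued early enough to overlap with the copy work of the current
packet, which matters most for larger packets where each copy takes
longer.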