On Thu, Sep 22, 2016 at 05:01:41PM +0800, Jianbo Liu wrote:
> On 22 September 2016 at 14:58, Wang, Zhihong wrote:
> >
> >
> >> -----Original Message-----
> >> From: Jianbo Liu [mailto:jianbo.liu@linaro.org]
> >> Sent: Thursday, September 22, 2016 1:48 PM
> >> To: Yuanhan Liu
> >> Cc: Wang, Zhihong ; Maxime Coquelin
> >> ; dev@dpdk.org
> >> Subject: Re: [dpdk-dev] [PATCH v3 0/5] vhost: optimize enqueue
> >>
> >> On 22 September 2016 at 10:29, Yuanhan Liu
> >> wrote:
> >> > On Wed, Sep 21, 2016 at 08:54:11PM +0800, Jianbo Liu wrote:
> >> >> >> > My setup consists of one host running a guest.
> >> >> >> > The guest generates as many 64-byte packets as possible using
> >> >> >>
> >> >> >> Have you tested with other packet sizes?
> >> >> >> My testing shows that performance is dropping when the packet size
> >> >> >> is more than 256.
> >> >> >
> >> >> >
> >> >> > Hi Jianbo,
> >> >> >
> >> >> > Thanks for reporting this.
> >> >> >
> >> >> > 1. Are you running the vector frontend with mrg_rxbuf=off?
> >> >> >
> >> Yes, my testing is with mrg_rxbuf=off, but not with the vector frontend PMD.
> >>
> >> >> > 2. Could you please specify what CPU you're running? Is it Haswell
> >> >> > or Ivy Bridge?
> >> >> >
> >> It's an ARM server.
> >>
> >> >> > 3. What percentage of drop are you seeing?
> >> The testing result:
> >>
> >> size (bytes)   improvement (%)
> >>   64             3.92
> >>  128            11.51
> >>  256            24.16
> >>  512           -13.79
> >> 1024           -22.51
> >> 1500           -12.22
> >>
> >> A correction is that performance is dropping when the packet size is 512 or larger.
> >
> >
> > Jianbo,
> >
> > Could you please verify whether this patch really causes the enqueue perf
> > to drop?
> >
> > You can test the enqueue path only by setting the guest to do rxonly, and
> > comparing the mpps reported by "show port stats all" in the guest.
> >
> >
> Tested with testpmd, host: txonly, guest: rxonly
>
> size (bytes)   improvement (%)
>   64             4.12
>  128             6
>  256             2.65
>  512            -1.12
> 1024            -7.02

There is a difference between Zhihong's code and the old code that I
spotted the first time around: Zhihong removed the avail_idx prefetch.
I understand the prefetch becomes a bit tricky when the mrg-rx code
path is considered; thus, I didn't comment on that.

That's one of the differences that, IMO, could cause such a regression.

I then finally got a chance to add it back (see the illustrative sketch
at the end of this mail).

A rough test shows it greatly improves the performance for 1400B
packets in the "txonly in host and rxonly in guest" case: +33% is the
number I get with my test server (Ivy Bridge).

I guess this might help your case as well. Would you mind giving it a
test and letting me know the results?

BTW, I made it in a rush; I haven't tested the mrg-rx code path yet.

Thanks.

	--yliu
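
To make the prefetch idea concrete, here is a minimal, hypothetical
sketch (not the actual patch) of prefetching the avail ring one slot
ahead in the enqueue loop. It assumes the DPDK vhost internals of that
period (struct vhost_virtqueue with its avail ring and size fields, and
rte_prefetch0() from rte_prefetch.h); the exact placement in the real
enqueue path may well differ.

#include <stdint.h>
#include <rte_prefetch.h>
#include "vhost.h"	/* illustrative: whatever header declares struct vhost_virtqueue */

/*
 * Hypothetical sketch only: prefetch the next avail ring slot one
 * iteration ahead, so the descriptor index is already in cache when
 * the following iteration reads it.
 */
static inline void
copy_burst_with_avail_prefetch(struct vhost_virtqueue *vq,
			       uint16_t start, uint16_t count)
{
	uint16_t i;

	for (i = 0; i < count; i++) {
		uint16_t slot = (start + i) & (vq->size - 1);

		/* Hide the avail ring load latency behind the copy of
		 * the current packet. */
		if (i + 1 < count)
			rte_prefetch0(&vq->avail->ring[(slot + 1) &
						       (vq->size - 1)]);

		/* ... read vq->avail->ring[slot], then copy mbuf i into
		 * the guest buffers of that descriptor chain ... */
	}
}

The point is simply that the load of the next descriptor index is
issued early enough to overlap with the copy work of the current
packet, which matters most for larger packets where each copy takes
longer.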