On Thu, Dec 13, 2018 at 11:08:04AM +0800, jiangyiwen wrote:
> On 2018/12/12 23:37, Michael S. Tsirkin wrote:
> > On Wed, Dec 12, 2018 at 05:29:31PM +0800, jiangyiwen wrote:
> >> When vhost support VIRTIO_VSOCK_F_MRG_RXBUF feature,
> >> it will merge big packet into rx vq.
> >>
> >> Signed-off-by: Yiwen Jiang <jiangyiwen@huawei.com>
> > 
> > I feel this approach jumps into making interface changes for
> > optimizations too quickly. For example, what prevents us
> > from taking a big buffer, prepending each chunk
> > with the header and writing it out without
> > host/guest interface changes?
> > 
> > This should allow optimizations such as vhost_add_used_n
> > batching.
> > 
> > I realize a header in each packet does have a cost,
> > but it also has advantages such as improved robustness,
> > I'd like to see more of an apples to apples comparison
> > of the performance gain from skipping them.
> > 
> > 
> 
> Hi Michael,
> 
> I don't fully understand what you mean, do you want to
> see a performance comparison that before performance and
> only use batching?
> 
> In my opinion, guest don't fill big buffer in rx vq because
> the balance performance and guest memory pressure, add
> mergeable feature can improve big packets performance,
> as for small packets, I try to find out the reason, may be
> the fluctuation of test results, or in mergeable mode, when
> Host send a 4k packet to Guest, we should call vhost_get_vq_desc()
> twice in host(hdr + 4k data), and in guest we also should call
> virtqueue_get_buf() twice.

I like the idea of making optimizations in small steps and measuring the
effect of each step.  This way we'll know which aspect caused the
differences in benchmark results.

Stefan