On Thu, Dec 13, 2018 at 11:08:04AM +0800, jiangyiwen wrote:
> On 2018/12/12 23:37, Michael S. Tsirkin wrote:
> > On Wed, Dec 12, 2018 at 05:29:31PM +0800, jiangyiwen wrote:
> >> When vhost support VIRTIO_VSOCK_F_MRG_RXBUF feature,
> >> it will merge big packet into rx vq.
> >>
> >> Signed-off-by: Yiwen Jiang
> >
> > I feel this approach jumps into making interface changes for
> > optimizations too quickly. For example, what prevents us
> > from taking a big buffer, prepending each chunk
> > with the header and writing it out without
> > host/guest interface changes?
> >
> > This should allow optimizations such as vhost_add_used_n
> > batching.
> >
> > I realize a header in each packet does have a cost,
> > but it also has advantages such as improved robustness,
> > I'd like to see more of an apples to apples comparison
> > of the performance gain from skipping them.
> >
> >
>
> Hi Michael,
>
> I don't fully understand what you mean, do you want to
> see a performance comparison that before performance and
> only use batching?
>
> In my opinion, guest don't fill big buffer in rx vq because
> the balance performance and guest memory pressure, add
> mergeable feature can improve big packets performance,
> as for small packets, I try to find out the reason, may be
> the fluctuation of test results, or in mergeable mode, when
> Host send a 4k packet to Guest, we should call vhost_get_vq_desc()
> twice in host(hdr + 4k data), and in guest we also should call
> virtqueue_get_buf() twice.

I like the idea of making optimizations in small steps and measuring the
effect of each step. This way we'll know which aspect caused the
differences in benchmark results.

Stefan
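
To make the alternative Michael describes concrete (take a big buffer, prepend
each chunk with a header, write it out with no host/guest interface change),
here is a rough user-space sketch. It is not vhost code and not from the patch:
struct chunk_hdr and pack_chunks() are hypothetical names made up for
illustration, and the one-field header only stands in for the real vsock
packet header.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical per-chunk header; a stand-in, NOT struct virtio_vsock_hdr. */
struct chunk_hdr {
	uint32_t len;	/* number of payload bytes following this header */
};

/*
 * Split `payload` into chunks of at most `chunk_max` payload bytes and write
 * them into `out` back to back, each chunk preceded by its own header.
 *
 * Returns the number of chunks written, or -1 if `out` is too small.
 * On success, *out_used holds the total number of bytes written.
 */
long pack_chunks(const uint8_t *payload, size_t payload_len, size_t chunk_max,
		 uint8_t *out, size_t out_cap, size_t *out_used)
{
	size_t off = 0, used = 0;
	long nchunks = 0;

	assert(chunk_max > 0);

	while (off < payload_len) {
		size_t take = payload_len - off;
		struct chunk_hdr hdr;

		if (take > chunk_max)
			take = chunk_max;

		/* Every chunk pays for its own header in the output. */
		if (out_cap - used < sizeof(hdr) + take)
			return -1;

		hdr.len = (uint32_t)take;
		memcpy(out + used, &hdr, sizeof(hdr));
		memcpy(out + used + sizeof(hdr), payload + off, take);

		used += sizeof(hdr) + take;
		off += take;
		nchunks++;
	}

	*out_used = used;
	return nchunks;
}
```

With a 10000-byte payload and chunk_max = 4096 this produces three chunks
(4096 + 4096 + 1808 bytes), each carrying its own header. That per-chunk
header is the cost being weighed above against a self-describing stream, and
it is the kind of single step whose effect could be measured on its own.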