* virtio ring layout changes for optimal single-stream performance
@ 2016-01-21 13:39 ` Michael S. Tsirkin
  0 siblings, 0 replies; 11+ messages in thread
From: Michael S. Tsirkin @ 2016-01-21 13:39 UTC (permalink / raw)
  To: virtio
  Cc: virtio-dev, kvm, qemu-devel, dev, virtualization, Xie, Huawei,
	linux-kernel

Hi all!
I have been experimenting with alternative virtio ring layouts,
in order to speed up single stream performance.

I have just posted a benchmark I wrote for the purpose, and a (partial)
alternative layout implementation.  This achieves a 20-40% reduction in
virtio overhead in the (default) polling mode.

http://article.gmane.org/gmane.linux.kernel.virtualization/26889

The layout is trying to be as simple as possible, to reduce
the number of cache lines bouncing between CPUs.
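
As a rough illustration of what cache line bouncing means here (this is
NOT the layout from the posted patch, only a generic sketch): when two
CPUs write fields that share a cache line, the line ping-pongs between
them. Giving each writer its own line avoids that. The struct name, the
field names and the 64-byte line size below are all assumptions made for
the example.

```c
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE 64  /* assumed line size; typical for x86 */

/* Hypothetical ring bookkeeping, one cache line per writer. */
struct demo_ring_state {
    /* Written by the producer (driver side), read by the consumer. */
    alignas(CACHE_LINE) uint16_t producer_idx;
    /* Written by the consumer (device side), read by the producer. */
    alignas(CACHE_LINE) uint16_t consumer_idx;
};

/* Compile-time check that the two indices do not share a line. */
_Static_assert(offsetof(struct demo_ring_state, consumer_idx) == CACHE_LINE,
               "indices must live on separate cache lines");
```

The trade-off is memory: the two 16-bit indices now occupy two full
lines instead of sharing one.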

For benchmarking, the idea is to emulate virtio in user-space,
artificially adding overhead for e.g. signalling to match what happens
in case of a VM.

I'd be very curious to get feedback on this. In particular, some people
discussed using vectored operations to format the virtio ring - would that
conflict with this work?

You are all welcome to post enhancements or more layout alternatives as
patches.

TODO:
- documentation+discussion of interaction with CPU caching
- thorough benchmarking of different configurations/hosts
- experiment with event index replacements
- better emulate vmexit/vmentry cost overhead
- virtio spec proposal

Thanks!
-- 
MST

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: virtio ring layout changes for optimal single-stream performance
  2016-01-21 13:39 ` Michael S. Tsirkin
  (?)
@ 2016-01-21 15:38   ` Cornelia Huck
  -1 siblings, 0 replies; 11+ messages in thread
From: Cornelia Huck @ 2016-01-21 15:38 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: virtio, virtio-dev, kvm, dev, linux-kernel, qemu-devel,
	virtualization, Xie, Huawei

On Thu, 21 Jan 2016 15:39:26 +0200
"Michael S. Tsirkin" <mst@redhat.com> wrote:

> Hi all!
> I have been experimenting with alternative virtio ring layouts,
> in order to speed up single stream performance.
> 
> I have just posted a benchmark I wrote for the purpose, and a (partial)
> alternative layout implementation.  This achieves 20-40% reduction in
> virtio overhead in the (default) polling mode.
> 
> http://article.gmane.org/gmane.linux.kernel.virtualization/26889
> 
> The layout is trying to be as simple as possible, to reduce
> the number of cache lines bouncing between CPUs.

Some kind of diagram or textual description would really help to review
this.

> 
> For benchmarking, the idea is to emulate virtio in user-space,
> artificially adding overhead for e.g. signalling to match what happens
> in case of a VM.

Hm... is this overhead comparable enough between different platforms so
that you can get a halfway realistic scenario? What about things like
endianness conversions?

> 
> I'd be very curious to get feedback on this, in particular, some people
> discussed using vectored operations to format virtio ring - would it
> conflict with this work?
> 
> You are all welcome to post enhancements or more layout alternatives as
> patches.

Let me see if I can find time to experiment a bit.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: virtio ring layout changes for optimal single-stream performance
  2016-01-21 15:38   ` Cornelia Huck
  (?)
@ 2016-01-21 19:03     ` Michael S. Tsirkin
  -1 siblings, 0 replies; 11+ messages in thread
From: Michael S. Tsirkin @ 2016-01-21 19:03 UTC (permalink / raw)
  To: Cornelia Huck
  Cc: virtio, virtio-dev, kvm, dev, linux-kernel, qemu-devel,
	virtualization, Xie, Huawei

On Thu, Jan 21, 2016 at 04:38:36PM +0100, Cornelia Huck wrote:
> On Thu, 21 Jan 2016 15:39:26 +0200
> "Michael S. Tsirkin" <mst@redhat.com> wrote:
> 
> > Hi all!
> > I have been experimenting with alternative virtio ring layouts,
> > in order to speed up single stream performance.
> > 
> > I have just posted a benchmark I wrote for the purpose, and a (partial)
> > alternative layout implementation.  This achieves 20-40% reduction in
> > virtio overhead in the (default) polling mode.
> > 
> > http://article.gmane.org/gmane.linux.kernel.virtualization/26889
> > 
> > The layout is trying to be as simple as possible, to reduce
> > the number of cache lines bouncing between CPUs.
> 
> Some kind of diagram or textual description would really help to review
> this.
> 
> > 
> > For benchmarking, the idea is to emulate virtio in user-space,
> > artificially adding overhead for e.g. signalling to match what happens
> > in case of a VM.
> 
> Hm... is this overhead comparable enough between different platforms so
> that you can get a halfway realistic scenario?

On x86 it seems pretty stable.
It's a question of setting VMEXIT_CYCLES and VMENTRY_CYCLES correctly.
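
For readers without the benchmark source at hand, a minimal sketch of
how a user-space test could charge an emulated exit/entry cost is shown
below. Only the names VMEXIT_CYCLES and VMENTRY_CYCLES come from this
thread; the values, the helper functions and the busy-wait approach are
assumptions for illustration and may differ from what the posted
benchmark actually does.

```c
#include <stdint.h>
#include <time.h>

#if defined(__x86_64__) || defined(__i386__)
#include <x86intrin.h>
/* Read the x86 time-stamp counter. */
static inline uint64_t read_cycles(void) { return __rdtsc(); }
#else
/* Fallback for other architectures: nanoseconds stand in for cycles. */
static inline uint64_t read_cycles(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}
#endif

/* Placeholder costs; these values are assumptions, not measurements. */
#define VMEXIT_CYCLES  1200
#define VMENTRY_CYCLES 800

/* Spin until at least `cycles` counter ticks have elapsed. */
static void burn_cycles(uint64_t cycles)
{
    uint64_t start = read_cycles();
    while (read_cycles() - start < cycles)
        ;
}

/* Hypothetical stand-in for a guest-to-host notification (kick). */
static void emulated_kick(void)
{
    burn_cycles(VMEXIT_CYCLES);   /* guest exits to the hypervisor */
    /* ... the host-side handler would run here ... */
    burn_cycles(VMENTRY_CYCLES);  /* hypervisor resumes the guest */
}
```

With this shape, tuning for another host comes down to measuring real
exit and entry latencies there and adjusting the two constants.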

> What about things like
> endianness conversions?

I didn't bother with them yet.

> > 
> > I'd be very curious to get feedback on this, in particular, some people
> > discussed using vectored operations to format virtio ring - would it
> > conflict with this work?
> > 
> > You are all welcome to post enhancements or more layout alternatives as
> > patches.
> 
> Let me see if I can find time to experiment a bit.

^ permalink raw reply	[flat|nested] 11+ messages in thread
