* [Qemu-devel] [RFC] QEMU disk I/O limits (56+ messages in thread)
From: Zhi Yong Wu @ 2011-05-30 5:09 UTC
To: qemu-devel, kvm
Cc: kwolf, vgoyal, guijianfeng, herbert, stefanha, aliguori, raharper, luowenj, wuzhy, zhanx, zhaoyang, llim

Hello, all,

I plan to work on a feature called "Disk I/O limits" for the qemu-kvm project. This feature will let the user cap the amount of disk I/O performed by a VM. It matters when storage resources are shared among multiple VMs: if some VMs do excessive disk I/O, they hurt the performance of the other VMs.

More detail is available here:
http://wiki.qemu.org/Features/DiskIOLimits

1.) Why we need per-drive disk I/O limits
On Linux, the cgroup blkio controller already supports I/O throttling on block devices. However, there is no single mechanism for disk I/O throttling across all underlying storage types (image file, LVM, NFS, Ceph), and for some types there is no way to throttle at all.

The disk I/O limits feature introduces QEMU block layer I/O limits together with command-line and QMP interfaces for configuring them. This allows I/O limits to be imposed across all underlying storage types using a single interface.

2.) How disk I/O limits will be implemented
The QEMU block layer will introduce a per-drive disk I/O request queue for those disks whose "disk I/O limits" feature is enabled. It can control disk I/O limits individually for each disk when multiple disks are attached to a VM, enabling use cases like unlimited local disk access combined with limited shared storage access.

In a multiple-I/O-threads scenario, when an application in a VM issues a block I/O request, the request is intercepted by the QEMU block layer, which calculates the drive's runtime I/O rate and determines whether it has gone beyond its limits. If so, the request is enqueued on the per-drive queue; otherwise it is serviced immediately.

3.) How the users enable and play with it
The QEMU -drive option will be extended so that disk I/O limits can be specified on the command line, for example -drive [iops=xxx,][throughput=xxx] or -drive [iops_rd=xxx,][iops_wr=xxx,][throughput=xxx]. When such an argument is specified, the "disk I/O limits" feature is enabled for that drive. The feature will also let users change per-drive disk I/O limits at runtime using QMP commands.

Regards,

Zhiyong Wu
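The queue-and-dispatch scheme described in section 2 can be sketched as a token-bucket check per drive. This is an illustrative model of the RFC's idea only, not QEMU code; the class and method names are invented for the sketch:

```python
import time
from collections import deque

class DriveThrottle:
    """Per-drive I/O limit sketch: admit a request while the current
    rate is under the configured iops limit, otherwise enqueue it on
    the per-drive queue (illustrative, not the eventual QEMU API)."""

    def __init__(self, iops_limit):
        self.iops_limit = iops_limit       # allowed requests per second
        self.tokens = float(iops_limit)    # start with a full bucket
        self.last = time.monotonic()
        self.pending = deque()             # the per-drive request queue

    def _refill(self):
        # Accrue tokens proportionally to elapsed time, capped at the limit.
        now = time.monotonic()
        self.tokens = min(float(self.iops_limit),
                          self.tokens + (now - self.last) * self.iops_limit)
        self.last = now

    def submit(self, request):
        """Return True if the request is serviced now, False if deferred."""
        self._refill()
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True                    # under the limit: service it
        self.pending.append(request)       # over the limit: queue it
        return False

throttle = DriveThrottle(iops_limit=2)
results = [throttle.submit(f"req{i}") for i in range(5)]
# the first two requests are admitted immediately; the rest are queued
```

A dispatch loop (or timer callback, as in the real block layer) would later pop `pending` entries as tokens accrue; that part is omitted here.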
* Re: [Qemu-devel] [RFC] QEMU disk I/O limits
From: Vivek Goyal @ 2011-05-31 13:45 UTC
To: Zhi Yong Wu
Cc: qemu-devel, kvm, kwolf, guijianfeng, herbert, stefanha, aliguori, raharper, luowenj, wuzhy, zhanx, zhaoyang, llim

On Mon, May 30, 2011 at 01:09:23PM +0800, Zhi Yong Wu wrote:
> I have prepared to work on a feature called "Disk I/O limits" for qemu-kvm projeect.
> This feature will enable the user to cap disk I/O amount performed by a VM.
> [...]

Hi Zhiyong,

Why not use the kernel blkio controller for this? Why reinvent the wheel and implement the feature again in qemu?

Thanks
Vivek
* Re: [Qemu-devel] [RFC] QEMU disk I/O limits
From: Anthony Liguori @ 2011-05-31 13:50 UTC
To: Vivek Goyal
Cc: kwolf, stefanha, kvm, guijianfeng, qemu-devel, wuzhy, herbert, Zhi Yong Wu, luowenj, zhanx, zhaoyang, llim, Ryan A Harper

On 05/31/2011 08:45 AM, Vivek Goyal wrote:
> Why not use kernel blkio controller for this and why reinvent the wheel
> and implement the feature again in qemu?

The blkio controller only works for block devices. It doesn't work when using files.

Regards,

Anthony Liguori
* Re: [Qemu-devel] [RFC] QEMU disk I/O limits
From: Vivek Goyal @ 2011-05-31 14:04 UTC
To: Anthony Liguori
Cc: kwolf, stefanha, kvm, guijianfeng, qemu-devel, wuzhy, herbert, Zhi Yong Wu, luowenj, zhanx, zhaoyang, llim, Ryan A Harper

On Tue, May 31, 2011 at 08:50:40AM -0500, Anthony Liguori wrote:
> blkio controller only works for block devices. It doesn't work when
> using files.

So can't we come up with something to easily determine which device backs this file? That still won't work for NFS-backed storage, though.

Thanks
Vivek
* Re: [Qemu-devel] [RFC] QEMU disk I/O limits
From: Anthony Liguori @ 2011-05-31 14:25 UTC
To: Vivek Goyal
Cc: kwolf, stefanha, kvm, guijianfeng, qemu-devel, wuzhy, herbert, Zhi Yong Wu, luowenj, zhanx, zhaoyang, llim, Ryan A Harper

On 05/31/2011 09:04 AM, Vivek Goyal wrote:
> So can't we comeup with something to easily determine which device backs
> up this file? Though that will still not work for NFS backed storage
> though.

Right.

Additionally, in QEMU we can rate limit based on concepts that make sense to a guest. We can limit the actual I/O ops visible to the guest, which means we'll get consistent performance regardless of whether the backing file is qcow2, raw, LVM, or raw over NFS.

The kernel just doesn't have enough information to do a good job here.

Regards,

Anthony Liguori
* Re: [Qemu-devel] [RFC] QEMU disk I/O limits
From: Vivek Goyal @ 2011-05-31 17:59 UTC
To: Anthony Liguori
Cc: kwolf, stefanha, kvm, guijianfeng, qemu-devel, wuzhy, herbert, Zhi Yong Wu, luowenj, zhanx, zhaoyang, llim, Ryan A Harper, Mike Snitzer, Joe Thornber

On Tue, May 31, 2011 at 09:25:31AM -0500, Anthony Liguori wrote:
> Additionally, in QEMU, we can rate limit based on concepts that make
> sense to a guest. We can limit the actual I/O ops visible to the
> guest which means that we'll get consistent performance regardless
> of whether the backing file is qcow2, raw, LVM, or raw over NFS.

Are you referring to merging taking place, which can change the definition of IOPS as seen by the guest? We do throttling at the bio level and no merging takes place there, so IOPS as seen by the guest and as seen by the throttling logic should be the same. Readahead would be one exception, where any readahead data is charged to the guest.

Device throttling and interaction with the file system is still an issue for the IO controller (things like journalling lead to serialization), where a faster group can get blocked behind a slower group. That's why, at the moment, the recommendation is to export devices/partitions directly to virtual machines if throttling is to be used, and not to share a file system across VMs.

> The kernel just doesn't have enough information to do a good job here.

[CCing a couple of device mapper folks for thoughts on the below]

When I think more about it, this problem is very similar to other features like snapshotting: should we implement snapshotting in qemu, or use some kernel-based solution like dm-snapshot or dm-multisnap? I don't have a good answer for that. Has this debate been settled already? I see that development is happening in the kernel for providing dm snapshot capabilities, and Mike Snitzer also mentioned the possibility of using dm-loop to cover the case of files over NFS etc.

Some thoughts in general, though:

- Any kernel-based solution is generic and can be used in other contexts too, like containers or bare metal.

- In some cases the kernel can implement throttling more efficiently. For example, if a block device has multiple partitions and these partitions are exported to VMs, the kernel can maintain a single queue and a single set of timers to manage all the VMs doing IO to that device. In a user-space solution we would have to manage as many queues and timers as there are VMs. So a kernel implementation can be more efficient in certain cases.

- Things like dm-loop essentially introduce another block layer on top of the file system layer. I personally think that does not sound very clean and might slow things down, though I don't have any data. Has there been any discussion/conclusion on this?

- A qemu-based scheme will work well with all kinds of targets. To use a kernel-based scheme, one would have to switch to kernel-provided snapshotting schemes (dm-snapshot or dm-multisnap etc.); otherwise a READ might come from a base image which is on another device, and we would not throttle the VM.

Thanks
Vivek
* Re: [Qemu-devel] [RFC] QEMU disk I/O limits
From: Anthony Liguori @ 2011-05-31 18:39 UTC
To: Vivek Goyal
Cc: kwolf, stefanha, kvm, guijianfeng, Mike Snitzer, qemu-devel, wuzhy, herbert, Joe Thornber, Zhi Yong Wu, luowenj, zhanx, zhaoyang, llim, Ryan A Harper

On 05/31/2011 12:59 PM, Vivek Goyal wrote:
> Are you referring to merging taking place which can change the definition
> of IOPS as seen by guest?

No. With qcow2, it may take multiple real IOPs for what the guest sees as one IOP.

That's really the main argument I'm making here. The only entity that knows what a guest IOP corresponds to is QEMU. On the backend, it may end up being a network request, multiple BIOs to physical disks, file access, etc. That's why QEMU is the right place to do the throttling for this use case.

That doesn't mean device-level throttling isn't useful, just that for virtualization it makes more sense to do it in QEMU.

Regards,

Anthony Liguori
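Anthony's amplification point can be made concrete with a toy model: a guest write into a qcow2 image may cost extra host I/Os when the target cluster is not yet allocated (metadata updates are needed in addition to the data write). The I/O counts below are illustrative only, not the real qcow2 cost model:

```python
def host_ios_for_guest_write(cluster_allocated):
    """Toy model of guest-IOP vs. host-IOP amplification for qcow2.

    A write to an already-allocated cluster is a single host I/O; a
    write that allocates a new cluster also has to update the L2 table
    and the refcount table (illustrative counts only).
    """
    if cluster_allocated:
        return 1          # data write only
    return 1 + 1 + 1      # data write + L2 table update + refcount update

# Three guest IOPs (one to an allocated cluster, two allocating writes)
# turn into seven host I/Os, so throttling host-side BIOs and throttling
# guest-visible IOPs measure different things.
costs = [host_ios_for_guest_write(a) for a in (True, False, False)]
total_host_ios = sum(costs)
```

This is why counting at the bio level, below the image format, cannot recover the guest-visible request rate that QEMU sees.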
* Re: [Qemu-devel] [RFC]QEMU disk I/O limits 2011-05-31 18:39 ` Anthony Liguori @ 2011-05-31 19:24 ` Vivek Goyal -1 siblings, 0 replies; 56+ messages in thread From: Vivek Goyal @ 2011-05-31 19:24 UTC (permalink / raw) To: Anthony Liguori Cc: kwolf, stefanha, kvm, guijianfeng, Mike Snitzer, qemu-devel, wuzhy, herbert, Joe Thornber, Zhi Yong Wu, luowenj, zhanx, zhaoyang, llim, Ryan A Harper On Tue, May 31, 2011 at 01:39:47PM -0500, Anthony Liguori wrote: > On 05/31/2011 12:59 PM, Vivek Goyal wrote: > >On Tue, May 31, 2011 at 09:25:31AM -0500, Anthony Liguori wrote: > >>On 05/31/2011 09:04 AM, Vivek Goyal wrote: > >>>On Tue, May 31, 2011 at 08:50:40AM -0500, Anthony Liguori wrote: > >>>>On 05/31/2011 08:45 AM, Vivek Goyal wrote: > >>>>>On Mon, May 30, 2011 at 01:09:23PM +0800, Zhi Yong Wu wrote: > >>>>>>Hello, all, > >>>>>> > >>>>>> I have prepared to work on a feature called "Disk I/O limits" for qemu-kvm projeect. > >>>>>> This feature will enable the user to cap disk I/O amount performed by a VM.It is important for some storage resources to be shared among multi-VMs. As you've known, if some of VMs are doing excessive disk I/O, they will hurt the performance of other VMs. > >>>>>> > >>>>> > >>>>>Hi Zhiyong, > >>>>> > >>>>>Why not use kernel blkio controller for this and why reinvent the wheel > >>>>>and implement the feature again in qemu? > >>>> > >>>>blkio controller only works for block devices. It doesn't work when > >>>>using files. > >>> > >>>So can't we comeup with something to easily determine which device backs > >>>up this file? Though that will still not work for NFS backed storage > >>>though. > >> > >>Right. > >> > >>Additionally, in QEMU, we can rate limit based on concepts that make > >>sense to a guest. We can limit the actual I/O ops visible to the > >>guest which means that we'll get consistent performance regardless > >>of whether the backing file is qcow2, raw, LVM, or raw over NFS. 
> >> > > > >Are you referring to merging taking place which can change the definition > >of IOPS as seen by guest? > > No, with qcow2, it may take multiple real IOPs for what the guest > sees as an IOP. > > That's really the main argument I'm making here. The only entity > that knows what a guest IOP corresponds to is QEMU. On the backend, > it may end up being a network request, multiple BIOs to physical > disks, file access, etc. Ok, so we seem to be talking of two requirements. - A consistent experience to guest - Isolation between VMs. If this qcow2 mapping/metadata overhead is not significant, then we don't have to worry about IOPs perceived by guest. It will be more or less same. If it is significant then we provide a more consistent experience to guest but then weaken the isolation between guests and might overload the backend storage, and in turn might not get the expected IOPS for the guest anyway. So I think these two things are not independent. I agree though that the advantage of qemu is that everything is a file and handling all the complex configurations becomes very easy. Having said that, to provide a consistent experience to guest, you also need to know where IO from guest is going and whether the underlying storage system can support that kind of IO or not. IO limits are of not much use if these are put in place in isolation without knowing where IO is going and how many VMs are doing IO to it. Otherwise there are no guarantees/estimates on minimum bandwidth for guests, hence there is no consistent experience. Thanks Vivek ^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [Qemu-devel] [RFC]QEMU disk I/O limits 2011-05-31 19:24 ` Vivek Goyal (?) @ 2011-05-31 23:30 ` Anthony Liguori 2011-06-01 13:20 ` Vivek Goyal 2011-06-04 8:54 ` Blue Swirl -1 siblings, 2 replies; 56+ messages in thread From: Anthony Liguori @ 2011-05-31 23:30 UTC (permalink / raw) To: Vivek Goyal Cc: kwolf, stefanha, Mike Snitzer, guijianfeng, qemu-devel, wuzhy, herbert, Joe Thornber, Zhi Yong Wu, luowenj, kvm, zhanx, zhaoyang, llim, Ryan A Harper On 05/31/2011 02:24 PM, Vivek Goyal wrote: > On Tue, May 31, 2011 at 01:39:47PM -0500, Anthony Liguori wrote: >> On 05/31/2011 12:59 PM, Vivek Goyal wrote: > Ok, so we seem to be talking of two requirements. > > - A consistent experience to guest > - Isolation between VMs. > > If this qcow2 mapping/metada overhead is not significant, then we > don't have to worry about IOPs perceived by guest. It will be more or less > same. If it is significant then we provide more consistent experience to > guest but then weaken the isolation between guest and might overload the > backend storage and in turn might not get the expected IOPS for the > guest anyway. That's quite a bit of hand waving considering your following argument is that you can't be precise enough at the QEMU level. > So I think these two things are not independent. > > I agree though that advantage of qemu is that everything is a file > and handling all the complex configuraitons becomes very easy. > > Having said that, to provide a consistent experience to guest, you > also need to know where IO from guest is going and whether underlying > storage system can support that kind of IO or not. > > IO limits are of not much use if if these are put in isolation without > knowing where IO is going and how many VMs are doing IO to it. Otherwise > there are no gurantees/estimates on minimum bandwidth for guests hence > there is no consistent experience. Consistent and maximum are two very different things. QEMU can, very effectively, enforce a maximum I/O rate. 
This can then be used to provide mostly consistent performance across different generations of hardware, to implement service levels in a tiered offering, etc. The level of consistency will then depend on whether you overcommit your hardware and how you have it configured. Consistency is very hard because at the end of the day, you still have shared resources. Even with blkio, I presume one guest can still impact another guest by forcing the disk to do excessive seeking or something of that nature. So absolute consistency can't be the requirement for the use-case. The use-cases we are interested in are really more about providing caps than anything else. Regards, Anthony Liguori > > Thanks > Vivek > ^ permalink raw reply [flat|nested] 56+ messages in thread
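The maximum-rate enforcement Anthony describes is typically implemented with token-bucket accounting in the block layer. A minimal sketch of that idea follows; this is illustrative Python, not QEMU's actual code, and every name in it is hypothetical:

```python
class TokenBucket:
    """Illustrative token-bucket limiter. 'capacity' tokens refill at
    'rate' tokens per second; each I/O request consumes one token, so
    'rate' is the enforced maximum IOPS and 'capacity' the burst size."""

    def __init__(self, rate, capacity, now=0.0):
        self.rate = rate          # refill rate, tokens (requests) per second
        self.capacity = capacity  # maximum burst size
        self.tokens = capacity
        self.last = now

    def allow(self, now):
        """Return True if a request may be serviced at time 'now';
        otherwise the proposed per-drive queue would hold it."""
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


# Cap at 100 IOPS with a burst of 10: a burst of 20 requests at t=0
# drains the bucket after 10; the rest would be queued, not dropped.
bucket = TokenBucket(rate=100, capacity=10)
serviced = sum(bucket.allow(0.0) for _ in range(20))
print(serviced)  # 10
```

A real implementation would also dequeue throttled requests as tokens refill, and the same structure covers a bytes-per-second limit by consuming request-size tokens instead of one per request.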
* Re: [Qemu-devel] [RFC]QEMU disk I/O limits 2011-05-31 23:30 ` Anthony Liguori @ 2011-06-01 13:20 ` Vivek Goyal 2011-06-01 21:15 ` Stefan Hajnoczi 2011-06-04 8:54 ` Blue Swirl 1 sibling, 1 reply; 56+ messages in thread From: Vivek Goyal @ 2011-06-01 13:20 UTC (permalink / raw) To: Anthony Liguori Cc: kwolf, stefanha, Mike Snitzer, guijianfeng, qemu-devel, wuzhy, herbert, Joe Thornber, Zhi Yong Wu, luowenj, kvm, zhanx, zhaoyang, llim, Ryan A Harper On Tue, May 31, 2011 at 06:30:09PM -0500, Anthony Liguori wrote: [..] > The level of consistency will then depend on whether you overcommit > your hardware and how you have it configured. Agreed. > > Consistency is very hard because at the end of the day, you still > have shared resources. Even with blkio, I presume one guest can > still impact another guest by forcing the disk to do excessive > seeking or something of that nature. > > So absolutely consistency can't be the requirement for the use-case. > The use-cases we are interested really are more about providing caps > than anything else. I think both qemu and the kernel can do the job. The only thing which seriously favors a throttling implementation in qemu is the ability to handle a wide variety of backend files (NFS, qcow, libcurl based devices etc). So what I am arguing is that your previous reason, that qemu can do a better job because it knows the effective IOPS of the guest, is not necessarily a very good reason. To me the simplicity of being able to handle everything as a file and do the throttling there is the most compelling reason to do this implementation in qemu. Thanks Vivek ^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [Qemu-devel] [RFC]QEMU disk I/O limits 2011-06-01 13:20 ` Vivek Goyal @ 2011-06-01 21:15 ` Stefan Hajnoczi 0 siblings, 0 replies; 56+ messages in thread From: Stefan Hajnoczi @ 2011-06-01 21:15 UTC (permalink / raw) To: Vivek Goyal Cc: Anthony Liguori, kwolf, stefanha, Mike Snitzer, guijianfeng, qemu-devel, wuzhy, herbert, Joe Thornber, Zhi Yong Wu, luowenj, kvm, zhanx, zhaoyang, llim, Ryan A Harper On Wed, Jun 1, 2011 at 2:20 PM, Vivek Goyal <vgoyal@redhat.com> wrote: > On Tue, May 31, 2011 at 06:30:09PM -0500, Anthony Liguori wrote: > > [..] >> The level of consistency will then depend on whether you overcommit >> your hardware and how you have it configured. > > Agreed. > >> >> Consistency is very hard because at the end of the day, you still >> have shared resources. Even with blkio, I presume one guest can >> still impact another guest by forcing the disk to do excessive >> seeking or something of that nature. >> >> So absolutely consistency can't be the requirement for the use-case. >> The use-cases we are interested really are more about providing caps >> than anything else. > > I think both qemu and kenrel can do the job. The only thing which > seriously favors throttling implementation in qemu is the ability > to handle wide variety of backend files (NFS, qcow, libcurl based > devices etc). > > So what I am arguing is that your previous reason that qemu can do > a better job because it knows effective IOPS of guest, is not > necessarily a very good reason. To me simplicity of being able to handle > everything as file and do the throttling is the most compelling reason > to do this implementation in qemu. The variety of backends is the reason to go for a QEMU-based approach. If there were kernel mechanisms to handle non-block backends that would be great. cgroups NFS? 
Of course for something like Sheepdog or Ceph it becomes quite hard to do it in the kernel at all since they are userspace libraries that speak their protocol over sockets, and you really don't have insight into what I/O operations they are doing from the kernel. One issue that concerns me is how effective iops and throughput are as capping mechanisms. If you cap throughput then you're likely to affect sequential I/O but do little against random I/O, which can hog the disk with a seeky I/O pattern. If you limit iops you can cap random I/O but artificially limit sequential I/O, which may be able to perform a high number of iops without hogging the disk due to seek times at all. One proposed solution here (I think Christoph Hellwig suggested it) is to do something like merging sequential I/O counting so that multiple sequential I/Os only count as 1 iop. I like the idea of a proportional share of disk utilization but doing that from QEMU is problematic since we only know when we issued an I/O to the kernel, not when it's actually being serviced by the disk - there could be queue wait times in the block layer that we don't know about - so we end up with a magic number for disk utilization which may not be a very meaningful number. So given the constraints and the backends we need to support, disk I/O limits in QEMU with iops and throughput limits seem like the approach we need. Stefan ^ permalink raw reply [flat|nested] 56+ messages in thread
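The sequential-merge accounting idea mentioned above (attributed in the thread to Christoph Hellwig) can be sketched as a simple counting pass: a run of back-to-back requests is charged as a single iop, and only a seek starts a new one. This is an illustrative sketch, not anything from QEMU:

```python
def count_iops(requests):
    """Charge a run of strictly sequential requests as one I/O op.

    'requests' is a list of (offset, length) byte ranges in submission
    order; a request is sequential if it starts exactly where the
    previous one ended.
    """
    iops = 0
    next_expected = None
    for offset, length in requests:
        if offset != next_expected:   # seek: charge a new iop
            iops += 1
        next_expected = offset + length
    return iops


# Four back-to-back 4 KiB reads count as one iop; seeky I/O counts each.
seq = [(0, 4096), (4096, 4096), (8192, 4096), (12288, 4096)]
random_io = [(0, 4096), (1048576, 4096), (65536, 4096)]
print(count_iops(seq))        # 1
print(count_iops(random_io))  # 3
```

Under this accounting, a sequential streamer is no longer penalized by an iops cap, while a random workload still consumes its full iop budget.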
* Re: [Qemu-devel] [RFC]QEMU disk I/O limits 2011-06-01 21:15 ` Stefan Hajnoczi @ 2011-06-01 21:42 ` Vivek Goyal -1 siblings, 0 replies; 56+ messages in thread From: Vivek Goyal @ 2011-06-01 21:42 UTC (permalink / raw) To: Stefan Hajnoczi Cc: Anthony Liguori, kwolf, stefanha, Mike Snitzer, guijianfeng, qemu-devel, wuzhy, herbert, Joe Thornber, Zhi Yong Wu, luowenj, kvm, zhanx, zhaoyang, llim, Ryan A Harper On Wed, Jun 01, 2011 at 10:15:30PM +0100, Stefan Hajnoczi wrote: > On Wed, Jun 1, 2011 at 2:20 PM, Vivek Goyal <vgoyal@redhat.com> wrote: > > On Tue, May 31, 2011 at 06:30:09PM -0500, Anthony Liguori wrote: > > > > [..] > >> The level of consistency will then depend on whether you overcommit > >> your hardware and how you have it configured. > > > > Agreed. > > > >> > >> Consistency is very hard because at the end of the day, you still > >> have shared resources. Even with blkio, I presume one guest can > >> still impact another guest by forcing the disk to do excessive > >> seeking or something of that nature. > >> > >> So absolutely consistency can't be the requirement for the use-case. > >> The use-cases we are interested really are more about providing caps > >> than anything else. > > > > I think both qemu and kenrel can do the job. The only thing which > > seriously favors throttling implementation in qemu is the ability > > to handle wide variety of backend files (NFS, qcow, libcurl based > > devices etc). > > > > So what I am arguing is that your previous reason that qemu can do > > a better job because it knows effective IOPS of guest, is not > > necessarily a very good reason. To me simplicity of being able to handle > > everything as file and do the throttling is the most compelling reason > > to do this implementation in qemu. > > The variety of backends is the reason to go for a QEMU-based approach. > If there were kernel mechanisms to handle non-block backends that > would be great. cgroups NFS? 
I agree that because qemu can handle a variety of backends it becomes a very good reason to do throttling in qemu. The kernel currently does not handle files over NFS. There were some suggestions of using a loop or device mapper loop device on top of NFS images and then implementing block device policies like throttling. But I am not convinced that it is a good idea. To cover the case of NFS we probably shall have to implement something in NFS or something more generic in VFS. But I am not sure if file system guys will like it, or whether it is even worth it at this point of time given the fact that the primary use case is qemu and qemu can easily implement this functionality. > > Of course for something like Sheepdog or Ceph it becomes quite hard to > do it in the kernel at all since they are userspace libraries that > speak their protocol over sockets, and you really don't have sinight > into what I/O operations they are doing from the kernel. Agreed. This is another reason why doing it in qemu makes sense. > > One issue that concerns me is how effective iops and throughput are as > capping mechanisms. If you cap throughput then you're likely to > affect sequential I/O but do little against random I/O which can hog > the disk with a seeky I/O pattern. If you limit iops you can cap > random I/O but artifically limit sequential I/O, which may be able to > perform a high number of iops without hogging the disk due to seek > times at all. One proposed solution here (I think Christoph Hellwig > suggested it) is to do something like merging sequential I/O counting > so that multiple sequential I/Os only count as 1 iop. One of the things we at least need to do is allow specifying both bps and iops rules together so that random IO with high iops does not create havoc, and sequential or large size IO with low iops and high bps does not overload the system. I am not sure how IO shows up in qemu but will the elevator in the guest make sure that a lot of sequential IO is merged together? 
For dependent READS, I think counting multiple sequential reads as 1 iop might help. I think this is one optimization one can do once throttling starts working in qemu and see if it is a real concern. > > I like the idea of a proportional share of disk utilization but doing > that from QEMU is problematic since we only know when we issued an I/O > to the kernel, not when it's actually being serviced by the disk - > there could be queue wait times in the block layer that we don't know > about - so we end up with a magic number for disk utilization which > may not be a very meaningful number. To be able to implement proportional IO one should be able to see all IO from all clients in one place. Qemu knows about the IO of only its own guest and not other guests running on the system. So I think qemu can't implement proportional IO. > > So given the constraints and the backends we need to support, disk I/O > limits in QEMU with iops and throughput limits seem like the approach > we need. For qemu yes. For other non-qemu usages we will still require a kernel mechanism of throttling. Thanks Vivek ^ permalink raw reply [flat|nested] 56+ messages in thread
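Vivek's point about specifying bps and iops rules together comes down to admitting a request only if it passes both limits: the iops cap catches small random I/O, the bps cap catches large sequential I/O. A rough sketch under simplified per-second accounting (illustrative names, not QEMU's):

```python
class DualLimiter:
    """Combine a bytes-per-second and an ops-per-second rule. A request
    is admitted only if neither limit would be exceeded. Accounting is
    per 1-second slice for simplicity; a real implementation would use
    finer-grained buckets."""

    def __init__(self, bps_limit, iops_limit):
        self.bps_limit = bps_limit
        self.iops_limit = iops_limit
        self.window = None   # current 1-second accounting window
        self.bytes = 0
        self.iops = 0

    def admit(self, now, nbytes):
        window = int(now)
        if window != self.window:          # new second: reset counters
            self.window, self.bytes, self.iops = window, 0, 0
        if (self.bytes + nbytes > self.bps_limit
                or self.iops + 1 > self.iops_limit):
            return False                   # would exceed a limit: queue it
        self.bytes += nbytes
        self.iops += 1
        return True


# 1 MiB/s and 100 IOPS: small random requests hit the iops cap,
# large sequential requests hit the bps cap.
lim = DualLimiter(bps_limit=1 << 20, iops_limit=100)
small = sum(lim.admit(0.0, 512) for _ in range(200))       # iops-bound
lim2 = DualLimiter(bps_limit=1 << 20, iops_limit=100)
big = sum(lim2.admit(0.0, 256 * 1024) for _ in range(10))  # bps-bound
print(small, big)  # 100 4
```

Neither workload can overload the backend on its own: 200 tiny requests stop at the 100-iop rule, while ten 256 KiB requests stop at the 1 MiB byte rule after four.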
* Re: [Qemu-devel] [RFC]QEMU disk I/O limits 2011-06-01 21:42 ` Vivek Goyal @ 2011-06-01 22:28 ` Stefan Hajnoczi -1 siblings, 0 replies; 56+ messages in thread From: Stefan Hajnoczi @ 2011-06-01 22:28 UTC (permalink / raw) To: Vivek Goyal Cc: Anthony Liguori, kwolf, stefanha, Mike Snitzer, guijianfeng, qemu-devel, wuzhy, herbert, Joe Thornber, Zhi Yong Wu, luowenj, kvm, zhanx, zhaoyang, llim, Ryan A Harper On Wed, Jun 1, 2011 at 10:42 PM, Vivek Goyal <vgoyal@redhat.com> wrote: > On Wed, Jun 01, 2011 at 10:15:30PM +0100, Stefan Hajnoczi wrote: >> One issue that concerns me is how effective iops and throughput are as >> capping mechanisms. If you cap throughput then you're likely to >> affect sequential I/O but do little against random I/O which can hog >> the disk with a seeky I/O pattern. If you limit iops you can cap >> random I/O but artifically limit sequential I/O, which may be able to >> perform a high number of iops without hogging the disk due to seek >> times at all. One proposed solution here (I think Christoph Hellwig >> suggested it) is to do something like merging sequential I/O counting >> so that multiple sequential I/Os only count as 1 iop. > > One of the things we atleast need to do is allow specifying both > bps and iops rule together so that random IO with high iops does > not create havoc and seqential or large size IO with low iops and > high bps does not overload the system. > > I am not sure how IO shows up in qemu but will elevator in guest > make sure that lot of sequential IO is merged together? For dependent > READS, I think counting multiple sequential reads as 1 iops might > help. I think this is one optimization one can do once throttling > starts working in qemu and see if it is a real concern. The guest can use an I/O scheduler, so for Linux guests we see the typical effects of cfq. Requests do get merged by the guest before being submitted to QEMU. Okay, good idea. 
Zhi Yong's test plan includes tests with multiple VMs and both iops and throughput limits at the same time. If workloads turn up that cause issues it would be possible to count sequential I/Os as 1 iop. >> >> I like the idea of a proportional share of disk utilization but doing >> that from QEMU is problematic since we only know when we issued an I/O >> to the kernel, not when it's actually being serviced by the disk - >> there could be queue wait times in the block layer that we don't know >> about - so we end up with a magic number for disk utilization which >> may not be a very meaningful number. > > To be able to implement proportional IO one should be able to see > all IO from all clients at one place. Qemu knows about IO of only > its guest and not other guests running on the system. So I think > qemu can't implement proportion IO. Yeah :( >> >> So given the constraints and the backends we need to support, disk I/O >> limits in QEMU with iops and throughput limits seem like the approach >> we need. > > For qemu yes. For other non-qemu usages we will still require a kernel > mechanism of throttling. Definitely. In fact I like the idea of using blkio-controller for raw image files on local file systems or LVM volumes. Hopefully the end-user API (libvirt interface) through which QEMU disk I/O limits get exposed will complement the existing blkiotune (blkio-controller) virsh command. Stefan ^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [Qemu-devel] [RFC]QEMU disk I/O limits 2011-05-31 23:30 ` Anthony Liguori @ 2011-06-04 8:54 ` Blue Swirl 2011-06-04 8:54 ` Blue Swirl 1 sibling, 0 replies; 56+ messages in thread From: Blue Swirl @ 2011-06-04 8:54 UTC (permalink / raw) To: Anthony Liguori Cc: Vivek Goyal, kwolf, stefanha, Mike Snitzer, guijianfeng, qemu-devel, wuzhy, herbert, Joe Thornber, Zhi Yong Wu, luowenj, kvm, zhanx, zhaoyang, llim, Ryan A Harper On Wed, Jun 1, 2011 at 2:30 AM, Anthony Liguori <anthony@codemonkey.ws> wrote: > On 05/31/2011 02:24 PM, Vivek Goyal wrote: >> >> On Tue, May 31, 2011 at 01:39:47PM -0500, Anthony Liguori wrote: >>> >>> On 05/31/2011 12:59 PM, Vivek Goyal wrote: >> >> Ok, so we seem to be talking of two requirements. >> >> - A consistent experience to guest >> - Isolation between VMs. >> >> If this qcow2 mapping/metadata overhead is not significant, then we >> don't have to worry about IOPs perceived by guest. It will be more or less >> same. If it is significant then we provide more consistent experience to >> guest but then weaken the isolation between guests and might overload the >> backend storage and in turn might not get the expected IOPS for the >> guest anyway. > > That's quite a bit of hand waving considering your following argument is > that you can't be precise enough at the QEMU level. > >> So I think these two things are not independent. >> >> I agree though that advantage of qemu is that everything is a file >> and handling all the complex configurations becomes very easy. >> >> Having said that, to provide a consistent experience to guest, you >> also need to know where IO from guest is going and whether underlying >> storage system can support that kind of IO or not. >> >> IO limits are of not much use if these are put in isolation without >> knowing where IO is going and how many VMs are doing IO to it. Otherwise >> there are no guarantees/estimates on minimum bandwidth for guests hence >> there is no consistent experience.
> > Consistent and maximum are two very different things. > > QEMU can, very effectively, enforce a maximum I/O rate. This can then be > used to provide mostly consistent performance across different generations > of hardware, to implement service levels in a tiered offering, etc. What is the point of view, guest or host? It is not possible to enforce any rates which would make sense to guests without taking into account the guest clock and execution speed. If instead you mean the host rate (which would not be in sync with I/O rates seen by the guest), then I'd suppose metadata accesses would also matter, and then the host facilities should produce the same results. On the positive side, they may only exist on newer Linux and not on other OSes, so introducing them to QEMU would not be such a bad idea. > The level of consistency will then depend on whether you overcommit your > hardware and how you have it configured. > > Consistency is very hard because at the end of the day, you still have > shared resources. Even with blkio, I presume one guest can still impact > another guest by forcing the disk to do excessive seeking or something of > that nature. > > So absolute consistency can't be the requirement for the use-case. The > use-cases we are interested in really are more about providing caps than > anything else. > > Regards, > > Anthony Liguori > >> >> Thanks >> Vivek >> > > > ^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [RFC]QEMU disk I/O limits 2011-05-31 18:39 ` Anthony Liguori @ 2011-05-31 20:48 ` Mike Snitzer -1 siblings, 0 replies; 56+ messages in thread From: Mike Snitzer @ 2011-05-31 20:48 UTC (permalink / raw) To: Anthony Liguori Cc: Vivek Goyal, kwolf, stefanha, kvm, guijianfeng, qemu-devel, wuzhy, herbert, Joe Thornber, Zhi Yong Wu, luowenj, zhanx, zhaoyang, llim, Ryan A Harper On Tue, May 31 2011 at 2:39pm -0400, Anthony Liguori <anthony@codemonkey.ws> wrote: > On 05/31/2011 12:59 PM, Vivek Goyal wrote: > >On Tue, May 31, 2011 at 09:25:31AM -0500, Anthony Liguori wrote: > >>On 05/31/2011 09:04 AM, Vivek Goyal wrote: > >>>On Tue, May 31, 2011 at 08:50:40AM -0500, Anthony Liguori wrote: > >>>>On 05/31/2011 08:45 AM, Vivek Goyal wrote: > >>>>>On Mon, May 30, 2011 at 01:09:23PM +0800, Zhi Yong Wu wrote: > >>>>>>Hello, all, > >>>>>> > >>>>>> I have prepared to work on a feature called "Disk I/O limits" for the qemu-kvm project. > >>>>>> This feature will enable the user to cap the disk I/O amount > performed by a VM. It is important for some storage resources to be > shared among multiple VMs. As you know, if some of the VMs are doing > excessive disk I/O, they will hurt the performance of other VMs. > >>>>>> > >>>>> > >>>>>Hi Zhiyong, > >>>>> > >>>>>Why not use kernel blkio controller for this and why reinvent the wheel > >>>>>and implement the feature again in qemu? > >>>> > >>>>blkio controller only works for block devices. It doesn't work when > >>>>using files. > >>> > >>>So can't we come up with something to easily determine which device backs > >>>up this file? That will still not work for NFS backed storage > >>>though. > >> > >>Right. > >> > >>Additionally, in QEMU, we can rate limit based on concepts that make > >>sense to a guest. We can limit the actual I/O ops visible to the > >>guest which means that we'll get consistent performance regardless > >>of whether the backing file is qcow2, raw, LVM, or raw over NFS.
> >> > > >Are you referring to merging taking place which can change the definition > >of IOPS as seen by guest? > > No, with qcow2, it may take multiple real IOPs for what the guest > sees as an IOP. > > That's really the main argument I'm making here. The only entity > that knows what a guest IOP corresponds to is QEMU. On the backend, > it may end up being a network request, multiple BIOs to physical > disks, file access, etc. Couldn't QEMU give a hint to the kernel about the ratio of guest IOPs to real IOPs? Or is QEMU blind to the real IOPs that correspond to a guest IOP? If QEMU is truly blind to the amount of real IOPs, then couldn't QEMU-driven throttling cause physical resources to be oversubscribed (underestimating the backend work it is creating)? Mike ^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [Qemu-devel] [RFC]QEMU disk I/O limits 2011-05-31 20:48 ` [Qemu-devel] " Mike Snitzer (?) @ 2011-05-31 22:22 ` Anthony Liguori -1 siblings, 0 replies; 56+ messages in thread From: Anthony Liguori @ 2011-05-31 22:22 UTC (permalink / raw) To: Mike Snitzer Cc: kwolf, stefanha, kvm, guijianfeng, qemu-devel, wuzhy, herbert, Joe Thornber, Zhi Yong Wu, luowenj, zhanx, zhaoyang, llim, Ryan A Harper, Vivek Goyal On 05/31/2011 03:48 PM, Mike Snitzer wrote: > On Tue, May 31 2011 at 2:39pm -0400, > Anthony Liguori<anthony@codemonkey.ws> wrote: > >>> Are you referring to merging taking place which can change the definition >>> of IOPS as seen by guest? >> >> No, with qcow2, it may take multiple real IOPs for what the guest >> sees as an IOP. >> >> That's really the main argument I'm making here. The only entity >> that knows what a guest IOP corresponds to is QEMU. On the backend, >> it may end up being a network request, multiple BIOs to physical >> disks, file access, etc. > > Couldn't QEMU give a hint to the kernel about the ratio of guest IOP to > real IOPs? Or is QEMU blind to the real IOPs that correspond to a guest > IOP? Perhaps, but how does that work when the disk image is backed by NFS? And even if you had a VFS-level API, we can do things like libcurl-based block devices in QEMU. So unless you tried to do level-5 traffic throttling, which hopefully you'll agree is total overkill, we're going to need to have this functionality in QEMU no matter what. Regards, Anthony Liguori ^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [Qemu-devel][RFC]QEMU disk I/O limits 2011-05-31 13:45 ` [Qemu-devel] [RFC]QEMU " Vivek Goyal @ 2011-05-31 13:56 ` Daniel P. Berrange -1 siblings, 0 replies; 56+ messages in thread From: Daniel P. Berrange @ 2011-05-31 13:56 UTC (permalink / raw) To: Vivek Goyal Cc: Zhi Yong Wu, qemu-devel, kvm, kwolf, guijianfeng, herbert, stefanha, aliguori, raharper, luowenj, wuzhy, zhanx, zhaoyang, llim On Tue, May 31, 2011 at 09:45:37AM -0400, Vivek Goyal wrote: > On Mon, May 30, 2011 at 01:09:23PM +0800, Zhi Yong Wu wrote: > > Hello, all, > > > > I have prepared to work on a feature called "Disk I/O limits" for the qemu-kvm project. > > This feature will enable the user to cap the disk I/O amount performed by a VM. It is important for some storage resources to be shared among multiple VMs. As you know, if some of the VMs are doing excessive disk I/O, they will hurt the performance of other VMs. > > > > Hi Zhiyong, > > Why not use kernel blkio controller for this and why reinvent the wheel > and implement the feature again in qemu? The finest level of granularity offered by cgroups applies limits per QEMU process. So the blkio controller can't be used to apply controls directly to individual disks used by QEMU, only the VM as a whole. For networking we can use the 'net_cls' cgroups controller for the process as a whole, or attach 'tc' to individual TAP devices for per-NIC throttling, both of which ultimately use the same kernel functionality. I don't see an equivalent option for throttling individual disks that would reuse functionality from the blkio controller. Regards, Daniel -- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :| ^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [Qemu-devel][RFC]QEMU disk I/O limits 2011-05-31 13:56 ` [Qemu-devel] [RFC]QEMU " Daniel P. Berrange @ 2011-05-31 14:10 ` Vivek Goyal -1 siblings, 0 replies; 56+ messages in thread From: Vivek Goyal @ 2011-05-31 14:10 UTC (permalink / raw) To: Daniel P. Berrange Cc: Zhi Yong Wu, qemu-devel, kvm, kwolf, guijianfeng, herbert, stefanha, aliguori, raharper, luowenj, wuzhy, zhanx, zhaoyang, llim On Tue, May 31, 2011 at 02:56:46PM +0100, Daniel P. Berrange wrote: > On Tue, May 31, 2011 at 09:45:37AM -0400, Vivek Goyal wrote: > > On Mon, May 30, 2011 at 01:09:23PM +0800, Zhi Yong Wu wrote: > > > Hello, all, > > > > > > I have prepared to work on a feature called "Disk I/O limits" for the qemu-kvm project. > > > This feature will enable the user to cap the disk I/O amount performed by a VM. It is important for some storage resources to be shared among multiple VMs. As you know, if some of the VMs are doing excessive disk I/O, they will hurt the performance of other VMs. > > > > > > > Hi Zhiyong, > > > > Why not use kernel blkio controller for this and why reinvent the wheel > > and implement the feature again in qemu? > > The finest level of granularity offered by cgroups applies limits per QEMU > process. So the blkio controller can't be used to apply controls directly > to individual disks used by QEMU, only the VM as a whole. So are multiple VMs using the same disk? Then put those VMs in the same cgroup and apply the limit on that disk. Or if you want to put a system-wide limit on a disk, then put all VMs in the root cgroup and put the limit on the root cgroup. I fail to understand what the exact requirement here is. I thought the biggest use case was isolating one VM from another when they might be sharing the same device. Hence we were interested in putting a per-VM limit on a disk and not a system-wide limit on a disk (independent of VM). Thanks Vivek ^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [Qemu-devel][RFC]QEMU disk I/O limits 2011-05-31 14:10 ` [Qemu-devel] [RFC]QEMU " Vivek Goyal @ 2011-05-31 14:19 ` Daniel P. Berrange -1 siblings, 0 replies; 56+ messages in thread From: Daniel P. Berrange @ 2011-05-31 14:19 UTC (permalink / raw) To: Vivek Goyal Cc: Zhi Yong Wu, qemu-devel, kvm, kwolf, guijianfeng, herbert, stefanha, aliguori, raharper, luowenj, wuzhy, zhanx, zhaoyang, llim On Tue, May 31, 2011 at 10:10:37AM -0400, Vivek Goyal wrote: > On Tue, May 31, 2011 at 02:56:46PM +0100, Daniel P. Berrange wrote: > > On Tue, May 31, 2011 at 09:45:37AM -0400, Vivek Goyal wrote: > > > On Mon, May 30, 2011 at 01:09:23PM +0800, Zhi Yong Wu wrote: > > > > Hello, all, > > > > > > > > I have prepared to work on a feature called "Disk I/O limits" for qemu-kvm projeect. > > > > This feature will enable the user to cap disk I/O amount performed by a VM.It is important for some storage resources to be shared among multi-VMs. As you've known, if some of VMs are doing excessive disk I/O, they will hurt the performance of other VMs. > > > > > > > > > > Hi Zhiyong, > > > > > > Why not use kernel blkio controller for this and why reinvent the wheel > > > and implement the feature again in qemu? > > > > The finest level of granularity offered by cgroups apply limits per QEMU > > process. So the blkio controller can't be used to apply controls directly > > to individual disks used by QEMU, only the VM as a whole. > > So are multiple VMs using same disk. Then put multiple VMs in same > cgroup and apply the limit on that disk. > > Or if you want to put a system wide limit on a disk, then put all > VMs in root cgroup and put limit on root cgroups. > > I fail to understand what's the exact requirement here. I thought > the biggest use case was isolation one VM from other which might > be sharing same device. Hence we were interested in putting > per VM limit on disk and not a system wide limit on disk (independent > of VM). 
No, it isn't about putting limits on a disk independent of a VM. It is about one VM having multiple disks, and wanting to set different policies for each of its virtual disks. eg qemu-kvm -drive file=/dev/sda1 -drive file=/dev/sdb3 and wanting to say that sda1 is limited to 10 MB/s, while sdb3 is limited to 50 MB/s. You can't do that kind of thing with cgroups, because it can only control the entire process, not individual resources within the process. Daniel -- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :| ^ permalink raw reply [flat|nested] 56+ messages in thread
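Daniel's per-disk example maps onto the -drive extensions proposed at the top of the thread. A hypothetical invocation is sketched below; the option names (iops=, throughput=) were explicitly not final at the time of this RFC, and the byte values are assumptions chosen to match the 10 MB/s and 50 MB/s figures above:

```shell
# Hypothetical syntax based on the RFC's proposed -drive extensions
# (-drive [iops=xxx,][throughput=xxx]); option names were not final.
# Cap sda1 at ~10 MB/s and sdb3 at ~50 MB/s, per the example above.
qemu-kvm \
    -drive file=/dev/sda1,throughput=10485760 \
    -drive file=/dev/sdb3,throughput=52428800
```

Because the limit is attached to each -drive rather than to the QEMU process, the two disks get independent policies, which is exactly what a per-process cgroup cannot express.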
* Re: [Qemu-devel][RFC]QEMU disk I/O limits 2011-05-31 14:19 ` [Qemu-devel] [RFC]QEMU " Daniel P. Berrange @ 2011-05-31 14:28 ` Vivek Goyal -1 siblings, 0 replies; 56+ messages in thread From: Vivek Goyal @ 2011-05-31 14:28 UTC (permalink / raw) To: Daniel P. Berrange Cc: Zhi Yong Wu, qemu-devel, kvm, kwolf, guijianfeng, herbert, stefanha, aliguori, raharper, luowenj, wuzhy, zhanx, zhaoyang, llim On Tue, May 31, 2011 at 03:19:56PM +0100, Daniel P. Berrange wrote: > On Tue, May 31, 2011 at 10:10:37AM -0400, Vivek Goyal wrote: > > On Tue, May 31, 2011 at 02:56:46PM +0100, Daniel P. Berrange wrote: > > > On Tue, May 31, 2011 at 09:45:37AM -0400, Vivek Goyal wrote: > > > > On Mon, May 30, 2011 at 01:09:23PM +0800, Zhi Yong Wu wrote: > > > > > Hello, all, > > > > > > > > > > I have prepared to work on a feature called "Disk I/O limits" for qemu-kvm projeect. > > > > > This feature will enable the user to cap disk I/O amount performed by a VM.It is important for some storage resources to be shared among multi-VMs. As you've known, if some of VMs are doing excessive disk I/O, they will hurt the performance of other VMs. > > > > > > > > > > > > > Hi Zhiyong, > > > > > > > > Why not use kernel blkio controller for this and why reinvent the wheel > > > > and implement the feature again in qemu? > > > > > > The finest level of granularity offered by cgroups apply limits per QEMU > > > process. So the blkio controller can't be used to apply controls directly > > > to individual disks used by QEMU, only the VM as a whole. > > > > So are multiple VMs using same disk. Then put multiple VMs in same > > cgroup and apply the limit on that disk. > > > > Or if you want to put a system wide limit on a disk, then put all > > VMs in root cgroup and put limit on root cgroups. > > > > I fail to understand what's the exact requirement here. I thought > > the biggest use case was isolation one VM from other which might > > be sharing same device. 
Hence we were interested in putting > > per VM limit on disk and not a system wide limit on disk (independent > > of VM). > > No, it isn't about putting limits on a disk independent of a VM. It is > about one VM having multiple disks, and wanting to set different policies > for each of its virtual disks. eg > > qemu-kvm -drive file=/dev/sda1 -drive file=/dev/sdb3 > > and wanting to say that sda1 is limited to 10 MB/s, while sdb3 is > limited to 50 MB/s. You can't do that kind of thing with cgroups, > because it can only control the entire process, not individual > resources within the process. With the IO controller you can do that. Limits are "per cgroup per disk". So once you have put a VM in a cgroup, you can specify two different limits for the two disks for that cgroup. There are 4 relevant files per cgroup. blkio.throttle.read_bps_device blkio.throttle.write_bps_device blkio.throttle.read_iops_device blkio.throttle.write_iops_device And the syntax of these files is: device_major:device_minor <rate_limit> Ex. 8:16 1024000 This means from a specified cgroup, on the disk with major:minor 8:16, don't allow read BW higher than 1024000 bytes per second. Thanks Vivek ^ permalink raw reply [flat|nested] 56+ messages in thread
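The blkio.throttle interface Vivek describes can be driven directly from a shell. A sketch follows, assuming a cgroup v1 blkio hierarchy mounted at /sys/fs/cgroup/blkio, a backing disk with major:minor 8:16, and a QEMU PID in $QEMU_PID (all of these are system-specific assumptions):

```shell
# Sketch of the cgroup v1 blkio.throttle interface described above.
# Assumes blkio is mounted at /sys/fs/cgroup/blkio, the backing disk
# is major:minor 8:16, and $QEMU_PID is the VM's QEMU process.

# Create a cgroup for the VM and move the QEMU process into it.
mkdir /sys/fs/cgroup/blkio/vm1
echo "$QEMU_PID" > /sys/fs/cgroup/blkio/vm1/tasks

# Cap reads from this cgroup on disk 8:16 at 1024000 bytes/sec
# (Vivek's example line: "8:16 1024000").
echo "8:16 1024000" > /sys/fs/cgroup/blkio/vm1/blkio.throttle.read_bps_device

# A different limit can be set for a second disk from the same cgroup,
# e.g. cap writes on major:minor 8:32 at 200 iops.
echo "8:32 200" > /sys/fs/cgroup/blkio/vm1/blkio.throttle.write_iops_device
```

Note that this still throttles the whole process's traffic to each block device, which is why it works per physical disk but cannot distinguish two image files backed by the same device.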
* Re: [Qemu-devel][RFC]QEMU disk I/O limits 2011-05-31 14:19 ` [Qemu-devel] [RFC]QEMU " Daniel P. Berrange @ 2011-05-31 15:28 ` Ryan Harper -1 siblings, 0 replies; 56+ messages in thread From: Ryan Harper @ 2011-05-31 15:28 UTC (permalink / raw) To: Daniel P. Berrange Cc: Vivek Goyal, Zhi Yong Wu, qemu-devel, kvm, kwolf, guijianfeng, herbert, stefanha, Anthony Liguori, Ryan A Harper, luowenj, wuzhy, zhanx, zhaoyang, llim * Daniel P. Berrange <berrange@redhat.com> [2011-05-31 09:25]: > On Tue, May 31, 2011 at 10:10:37AM -0400, Vivek Goyal wrote: > > On Tue, May 31, 2011 at 02:56:46PM +0100, Daniel P. Berrange wrote: > > > On Tue, May 31, 2011 at 09:45:37AM -0400, Vivek Goyal wrote: > > > > On Mon, May 30, 2011 at 01:09:23PM +0800, Zhi Yong Wu wrote: > > > > > Hello, all, > > > > > > > > > > I have prepared to work on a feature called "Disk I/O limits" for qemu-kvm projeect. > > > > > This feature will enable the user to cap disk I/O amount performed by a VM.It is important for some storage resources to be shared among multi-VMs. As you've known, if some of VMs are doing excessive disk I/O, they will hurt the performance of other VMs. > > > > > > > > > > > > > Hi Zhiyong, > > > > > > > > Why not use kernel blkio controller for this and why reinvent the wheel > > > > and implement the feature again in qemu? > > > > > > The finest level of granularity offered by cgroups apply limits per QEMU > > > process. So the blkio controller can't be used to apply controls directly > > > to individual disks used by QEMU, only the VM as a whole. > > > > So are multiple VMs using same disk. Then put multiple VMs in same > > cgroup and apply the limit on that disk. > > > > Or if you want to put a system wide limit on a disk, then put all > > VMs in root cgroup and put limit on root cgroups. > > > > I fail to understand what's the exact requirement here. I thought > > the biggest use case was isolation one VM from other which might > > be sharing same device. 
Hence we were interested in putting > > per VM limit on disk and not a system wide limit on disk (independent > > of VM). > > No, it isn't about putting limits on a disk independant of a VM. It is > about one VM having multiple disks, and wanting to set different policies > for each of its virtual disks. eg > > qemu-kvm -drive file=/dev/sda1 -drive file=/dev/sdb3 > > and wanting to say that sda1 is limited to 10 MB/s, while sdb3 is > limited to 50 MB/s. You can't do that kind of thing with cgroups, > because it can only control the entire process, not individual > resources within the process. yes, but with files: qemu-kvm -drive file=/path/to/local/vm/images -drive file=/path/to/shared/storage -- Ryan Harper Software Engineer; Linux Technology Center IBM Corp., Austin, Tx ryanh@us.ibm.com ^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [Qemu-devel][RFC]QEMU disk I/O limits 2011-05-30 5:09 ` [Qemu-devel] [RFC]QEMU " Zhi Yong Wu @ 2011-05-31 19:55 ` Vivek Goyal -1 siblings, 0 replies; 56+ messages in thread From: Vivek Goyal @ 2011-05-31 19:55 UTC (permalink / raw) To: Zhi Yong Wu Cc: qemu-devel, kvm, kwolf, guijianfeng, herbert, stefanha, aliguori, raharper, luowenj, wuzhy, zhanx, zhaoyang, llim On Mon, May 30, 2011 at 01:09:23PM +0800, Zhi Yong Wu wrote: [..] > 3.) How the users enable and play with it > QEMU -drive option will be extended so that disk I/O limits can be specified on its command line, such as -drive [iops=xxx,][throughput=xxx] or -drive [iops_rd=xxx,][iops_wr=xxx,][throughput=xxx] etc. When this argument is specified, it means that "disk I/O limits" feature is enabled for this drive disk. What does the throughput interface look like? Is it bytes per second or something else? Do we have read and write variants for throughput, as we have for iops? If you have a bytes interface (as the kernel does), then "bps_rd" and "bps_wr" might be good names for the throughput interface too. Thanks Vivek ^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [Qemu-devel] [RFC]QEMU disk I/O limits 2011-05-31 19:55 ` [Qemu-devel] [RFC]QEMU " Vivek Goyal @ 2011-06-01 3:12 ` Zhi Yong Wu -1 siblings, 0 replies; 56+ messages in thread From: Zhi Yong Wu @ 2011-06-01 3:12 UTC (permalink / raw) To: Vivek Goyal Cc: vgoyal, kwolf, stefanha, kvm, guijianfeng, qemu-devel, wuzhy, herbert, ejt, wuzhy, luowenj, zhanx, zhaoyang, llim, raharper On Tue, May 31, 2011 at 03:55:49PM -0400, Vivek Goyal wrote: >Date: Tue, 31 May 2011 15:55:49 -0400 >From: Vivek Goyal <vgoyal@redhat.com> >To: Zhi Yong Wu <wuzhy@linux.vnet.ibm.com> >Cc: kwolf@redhat.com, aliguori@us.ibm.com, stefanha@linux.vnet.ibm.com, > kvm@vger.kernel.org, guijianfeng@cn.fujitsu.com, > qemu-devel@nongnu.org, wuzhy@cn.ibm.com, > herbert@gondor.hengli.com.au, luowenj@cn.ibm.com, zhanx@cn.ibm.com, > zhaoyang@cn.ibm.com, llim@redhat.com, raharper@us.ibm.com >Subject: Re: [Qemu-devel] [RFC]QEMU disk I/O limits >User-Agent: Mutt/1.5.21 (2010-09-15) > >On Mon, May 30, 2011 at 01:09:23PM +0800, Zhi Yong Wu wrote: > >[..] >> 3.) How the users enable and play with it >> QEMU -drive option will be extended so that disk I/O limits can be specified on its command line, such as -drive [iops=xxx,][throughput=xxx] or -drive [iops_rd=xxx,][iops_wr=xxx,][throughput=xxx] etc. When this argument is specified, it means that "disk I/O limits" feature is enabled for this drive disk. > >How does throughput interface look like? is it bytes per second or something >else? HI, Vivek, It will be a value based on bytes per second. > >Do we have read and write variants for throughput as we have for iops. QEMU code has two variants "rd_bytes, wr_bytes", but we maybe need to get their bytes per second. > >if you have bytes interface(as kenrel does), then "bps_rd" and "bps_wr" >might be good names too for thoughput interface. I agree with you, and can change them as your suggestions. Regards, Zhiyong Wu > >Thanks >Vivek > ^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [Qemu-devel] [RFC]QEMU disk I/O limits 2011-06-01 3:12 ` Zhi Yong Wu @ 2011-06-02 9:33 ` Michal Suchanek -1 siblings, 0 replies; 56+ messages in thread From: Michal Suchanek @ 2011-06-02 9:33 UTC (permalink / raw) To: Zhi Yong Wu Cc: Vivek Goyal, kwolf, stefanha, kvm, guijianfeng, qemu-devel, wuzhy, herbert, ejt, luowenj, zhanx, zhaoyang, llim, raharper On 1 June 2011 05:12, Zhi Yong Wu <wuzhy@linux.vnet.ibm.com> wrote: > On Tue, May 31, 2011 at 03:55:49PM -0400, Vivek Goyal wrote: >>Date: Tue, 31 May 2011 15:55:49 -0400 >>From: Vivek Goyal <vgoyal@redhat.com> >>To: Zhi Yong Wu <wuzhy@linux.vnet.ibm.com> >>Cc: kwolf@redhat.com, aliguori@us.ibm.com, stefanha@linux.vnet.ibm.com, >> kvm@vger.kernel.org, guijianfeng@cn.fujitsu.com, >> qemu-devel@nongnu.org, wuzhy@cn.ibm.com, >> herbert@gondor.hengli.com.au, luowenj@cn.ibm.com, zhanx@cn.ibm.com, >> zhaoyang@cn.ibm.com, llim@redhat.com, raharper@us.ibm.com >>Subject: Re: [Qemu-devel] [RFC]QEMU disk I/O limits >>User-Agent: Mutt/1.5.21 (2010-09-15) >> >>On Mon, May 30, 2011 at 01:09:23PM +0800, Zhi Yong Wu wrote: >> >>[..] >>> 3.) How the users enable and play with it >>> QEMU -drive option will be extended so that disk I/O limits can be specified on its command line, such as -drive [iops=xxx,][throughput=xxx] or -drive [iops_rd=xxx,][iops_wr=xxx,][throughput=xxx] etc. When this argument is specified, it means that "disk I/O limits" feature is enabled for this drive disk. >> >>How does throughput interface look like? is it bytes per second or something >>else? > HI, Vivek, > It will be a value based on bytes per second. > >> >>Do we have read and write variants for throughput as we have for iops. > QEMU code has two variants "rd_bytes, wr_bytes", but we maybe need to get their bytes per second. > >> >>if you have bytes interface(as kenrel does), then "bps_rd" and "bps_wr" >>might be good names too for thoughput interface. > I agree with you, and can change them as your suggestions. 
> Changing them this way is not going to be an improvement. While rd_bytes and wr_bytes lack a time-interval specification, bps_rd and bps_wr are ambiguous: is that bits or bytes? Sure, there could be some distinction by capitalization, but that does not apply since qemu arguments are all lowercase. Thanks Michal ^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [Qemu-devel] [RFC]QEMU disk I/O limits 2011-06-02 9:33 ` Michal Suchanek @ 2011-06-03 6:56 ` Zhi Yong Wu -1 siblings, 0 replies; 56+ messages in thread From: Zhi Yong Wu @ 2011-06-03 6:56 UTC (permalink / raw) To: Michal Suchanek Cc: Vivek Goyal, kwolf, stefanha, kvm, guijianfeng, qemu-devel, wuzhy, herbert, ejt, luowenj, raharper On Thu, Jun 2, 2011 at 5:33 PM, Michal Suchanek <hramrach@centrum.cz> wrote: > On 1 June 2011 05:12, Zhi Yong Wu <wuzhy@linux.vnet.ibm.com> wrote: >> On Tue, May 31, 2011 at 03:55:49PM -0400, Vivek Goyal wrote: >>>Date: Tue, 31 May 2011 15:55:49 -0400 >>>From: Vivek Goyal <vgoyal@redhat.com> >>>To: Zhi Yong Wu <wuzhy@linux.vnet.ibm.com> >>>Cc: kwolf@redhat.com, aliguori@us.ibm.com, stefanha@linux.vnet.ibm.com, >>> kvm@vger.kernel.org, guijianfeng@cn.fujitsu.com, >>> qemu-devel@nongnu.org, wuzhy@cn.ibm.com, >>> herbert@gondor.hengli.com.au, luowenj@cn.ibm.com, zhanx@cn.ibm.com, >>> zhaoyang@cn.ibm.com, llim@redhat.com, raharper@us.ibm.com >>>Subject: Re: [Qemu-devel] [RFC]QEMU disk I/O limits >>>User-Agent: Mutt/1.5.21 (2010-09-15) >>> >>>On Mon, May 30, 2011 at 01:09:23PM +0800, Zhi Yong Wu wrote: >>> >>>[..] >>>> 3.) How the users enable and play with it >>>> QEMU -drive option will be extended so that disk I/O limits can be specified on its command line, such as -drive [iops=xxx,][throughput=xxx] or -drive [iops_rd=xxx,][iops_wr=xxx,][throughput=xxx] etc. When this argument is specified, it means that "disk I/O limits" feature is enabled for this drive disk. >>> >>>How does throughput interface look like? is it bytes per second or something >>>else? >> HI, Vivek, >> It will be a value based on bytes per second. >> >>> >>>Do we have read and write variants for throughput as we have for iops. >> QEMU code has two variants "rd_bytes, wr_bytes", but we maybe need to get their bytes per second. >> >>> >>>if you have bytes interface(as kenrel does), then "bps_rd" and "bps_wr" >>>might be good names too for thoughput interface. 
>> I agree with you, and can change them as your suggestions. >> > > Changing them this way is not going to be an improvement. While > rd_bytes and wr_bytes lack the time interval specification bps_rd and Right, rd_bytes and wr_bytes lack that. > bps_wr is ambiguous. Is that bits? bytes? Sure, there should be some If we implement them, they will be bytes. > distinction by capitalization but that does not apply since qemu > arguments are all lowercase. Michal, maybe you misunderstood what I meant. The two variables rd_bytes and wr_bytes exist in the block.c file and are not qemu arguments; bps_rd and bps_wr, however, will be added as qemu arguments. Regards, Zhiyong Wu > > Thanks > > Michal > -- > To unsubscribe from this list: send the line "unsubscribe kvm" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > -- Regards, Zhi Yong Wu ^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [Qemu-devel] [RFC]QEMU disk I/O limits 2011-05-31 19:55 ` [Qemu-devel] [RFC]QEMU " Vivek Goyal @ 2011-06-01 3:19 ` Zhi Yong Wu -1 siblings, 0 replies; 56+ messages in thread From: Zhi Yong Wu @ 2011-06-01 3:19 UTC (permalink / raw) To: Vivek Goyal Cc: kwolf, stefanha, kvm, guijianfeng, qemu-devel, wuzhy, herbert, ejt, wuzhy, luowenj, zhanx, zhaoyang, llim, raharper, vgoyal On Tue, May 31, 2011 at 03:55:49PM -0400, Vivek Goyal wrote: >Date: Tue, 31 May 2011 15:55:49 -0400 >From: Vivek Goyal <vgoyal@redhat.com> >To: Zhi Yong Wu <wuzhy@linux.vnet.ibm.com> >Cc: kwolf@redhat.com, aliguori@us.ibm.com, stefanha@linux.vnet.ibm.com, > kvm@vger.kernel.org, guijianfeng@cn.fujitsu.com, > qemu-devel@nongnu.org, wuzhy@cn.ibm.com, > herbert@gondor.hengli.com.au, luowenj@cn.ibm.com, zhanx@cn.ibm.com, > zhaoyang@cn.ibm.com, llim@redhat.com, raharper@us.ibm.com >Subject: Re: [Qemu-devel] [RFC]QEMU disk I/O limits >User-Agent: Mutt/1.5.21 (2010-09-15) > >On Mon, May 30, 2011 at 01:09:23PM +0800, Zhi Yong Wu wrote: > >[..] >> 3.) How the users enable and play with it >> QEMU -drive option will be extended so that disk I/O limits can be specified on its command line, such as -drive [iops=xxx,][throughput=xxx] or -drive [iops_rd=xxx,][iops_wr=xxx,][throughput=xxx] etc. When this argument is specified, it means that "disk I/O limits" feature is enabled for this drive disk. > >How does throughput interface look like? is it bytes per second or something >else? Given your suggestion, its form will look like below: -drive [iops=xxx][,bps=xxx] or -drive [iops_rd=xxx][,iops_wr=xxx][,bps_rd=xxx][,bps_wr=xxx] Regards, Zhiyong Wu > >Do we have read and write variants for throughput as we have for iops. > >if you have bytes interface(as kenrel does), then "bps_rd" and "bps_wr" >might be good names too for thoughput interface. > >Thanks >Vivek > ^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [Qemu-devel] [RFC]QEMU disk I/O limits 2011-06-01 3:19 ` Zhi Yong Wu (?) @ 2011-06-01 13:32 ` Vivek Goyal 2011-06-02 6:07 ` Zhi Yong Wu -1 siblings, 1 reply; 56+ messages in thread From: Vivek Goyal @ 2011-06-01 13:32 UTC (permalink / raw) To: Zhi Yong Wu Cc: kwolf, stefanha, kvm, guijianfeng, qemu-devel, wuzhy, herbert, ejt, luowenj, zhanx, zhaoyang, llim, raharper On Wed, Jun 01, 2011 at 11:19:58AM +0800, Zhi Yong Wu wrote: > On Tue, May 31, 2011 at 03:55:49PM -0400, Vivek Goyal wrote: > >Date: Tue, 31 May 2011 15:55:49 -0400 > >From: Vivek Goyal <vgoyal@redhat.com> > >To: Zhi Yong Wu <wuzhy@linux.vnet.ibm.com> > >Cc: kwolf@redhat.com, aliguori@us.ibm.com, stefanha@linux.vnet.ibm.com, > > kvm@vger.kernel.org, guijianfeng@cn.fujitsu.com, > > qemu-devel@nongnu.org, wuzhy@cn.ibm.com, > > herbert@gondor.hengli.com.au, luowenj@cn.ibm.com, zhanx@cn.ibm.com, > > zhaoyang@cn.ibm.com, llim@redhat.com, raharper@us.ibm.com > >Subject: Re: [Qemu-devel] [RFC]QEMU disk I/O limits > >User-Agent: Mutt/1.5.21 (2010-09-15) > > > >On Mon, May 30, 2011 at 01:09:23PM +0800, Zhi Yong Wu wrote: > > > >[..] > >> 3.) How the users enable and play with it > >> QEMU -drive option will be extended so that disk I/O limits can be specified on its command line, such as -drive [iops=xxx,][throughput=xxx] or -drive [iops_rd=xxx,][iops_wr=xxx,][throughput=xxx] etc. When this argument is specified, it means that "disk I/O limits" feature is enabled for this drive disk. > > > >How does throughput interface look like? is it bytes per second or something > >else? > Given your suggestion, its form will look like below: > > -drive [iops=xxx][,bps=xxx] or -drive [iops_rd=xxx][,iops_wr=xxx][,bps_rd=xxx][,bps_wr=xxx] Can one specify both iops and bps rule for the same drive? Thanks Vivek ^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [Qemu-devel] [RFC]QEMU disk I/O limits 2011-06-01 13:32 ` Vivek Goyal @ 2011-06-02 6:07 ` Zhi Yong Wu 0 siblings, 0 replies; 56+ messages in thread From: Zhi Yong Wu @ 2011-06-02 6:07 UTC (permalink / raw) To: Vivek Goyal Cc: kwolf, stefanha, Mike Snitzer, guijianfeng, qemu-devel, wuzhy, herbert, Joe Thornber, Zhi Yong Wu, luowenj, kvm, zhanx, zhaoyang, llim, Ryan A Harper On Wed, Jun 01, 2011 at 09:32:32AM -0400, Vivek Goyal wrote: >Date: Wed, 1 Jun 2011 09:32:32 -0400 >From: Vivek Goyal <vgoyal@redhat.com> >To: Zhi Yong Wu <wuzhy@linux.vnet.ibm.com> >Cc: kwolf@redhat.com, stefanha@linux.vnet.ibm.com, kvm@vger.kernel.org, > guijianfeng@cn.fujitsu.com, qemu-devel@nongnu.org, wuzhy@cn.ibm.com, > herbert@gondor.hengli.com.au, ejt@redhat.com, luowenj@cn.ibm.com, > zhanx@cn.ibm.com, zhaoyang@cn.ibm.com, llim@redhat.com, > raharper@us.ibm.com >Subject: Re: [Qemu-devel] [RFC]QEMU disk I/O limits >User-Agent: Mutt/1.5.21 (2010-09-15) > >On Wed, Jun 01, 2011 at 11:19:58AM +0800, Zhi Yong Wu wrote: >> On Tue, May 31, 2011 at 03:55:49PM -0400, Vivek Goyal wrote: >> >Date: Tue, 31 May 2011 15:55:49 -0400 >> >From: Vivek Goyal <vgoyal@redhat.com> >> >To: Zhi Yong Wu <wuzhy@linux.vnet.ibm.com> >> >Cc: kwolf@redhat.com, aliguori@us.ibm.com, stefanha@linux.vnet.ibm.com, >> > kvm@vger.kernel.org, guijianfeng@cn.fujitsu.com, >> > qemu-devel@nongnu.org, wuzhy@cn.ibm.com, >> > herbert@gondor.hengli.com.au, luowenj@cn.ibm.com, zhanx@cn.ibm.com, >> > zhaoyang@cn.ibm.com, llim@redhat.com, raharper@us.ibm.com >> >Subject: Re: [Qemu-devel] [RFC]QEMU disk I/O limits >> >User-Agent: Mutt/1.5.21 (2010-09-15) >> > >> >On Mon, May 30, 2011 at 01:09:23PM +0800, Zhi Yong Wu wrote: >> > >> >[..] >> >> 3.) How the users enable and play with it >> >> QEMU -drive option will be extended so that disk I/O limits can be specified on its command line, such as -drive [iops=xxx,][throughput=xxx] or -drive [iops_rd=xxx,][iops_wr=xxx,][throughput=xxx] etc. 
When this argument is specified, it means that "disk I/O limits" feature is enabled for this drive disk. >> > >> >How does throughput interface look like? is it bytes per second or something >> >else? >> Given your suggestion, its form will look like below: >> >> -drive [iops=xxx][,bps=xxx] or -drive [iops_rd=xxx][,iops_wr=xxx][,bps_rd=xxx][,bps_wr=xxx] > >Can one specify both iops and bps rule for the same drive? Right. When both are specified, they will jointly limit the runtime I/O rate. Regards, Zhiyong Wu > >Thanks >Vivek ^ permalink raw reply [flat|nested] 56+ messages in thread
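As a concrete illustration of the option syntax agreed in this exchange, a small helper can assemble such a -drive argument. The helper `drive_arg` and the example values are hypothetical, and the option names are the ones proposed in the thread (the final QEMU syntax may differ):

```python
def drive_arg(path, iops=None, iops_rd=None, iops_wr=None,
              bps=None, bps_rd=None, bps_wr=None):
    """Build a -drive option string with the proposed I/O limit keys.

    iops/bps are combined limits; the _rd/_wr variants split reads and writes.
    Only keys the caller supplies are emitted, matching the optional syntax
    -drive [iops=xxx][,bps=xxx].
    """
    opts = [f"file={path}"]
    for key, val in (("iops", iops), ("iops_rd", iops_rd), ("iops_wr", iops_wr),
                     ("bps", bps), ("bps_rd", bps_rd), ("bps_wr", bps_wr)):
        if val is not None:
            opts.append(f"{key}={val}")
    return "-drive " + ",".join(opts)

# A drive capped at 100 IOPS and 10 MiB/s total:
print(drive_arg("/dev/sda1", iops=100, bps=10 * 1024 * 1024))
```

Per the answer above, a request is serviced only while it stays within both the iops and the bps rule when the two are given together.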
* Re: [Qemu-devel] [RFC]QEMU disk I/O limits 2011-05-30 5:09 ` [Qemu-devel] [RFC]QEMU " Zhi Yong Wu @ 2011-06-02 6:17 ` Sasha Levin -1 siblings, 0 replies; 56+ messages in thread From: Sasha Levin @ 2011-06-02 6:17 UTC (permalink / raw) To: Zhi Yong Wu Cc: qemu-devel, kvm, kwolf, aliguori, herbert, guijianfeng, wuzhy, luowenj, zhanx, zhaoyang, llim, raharper, vgoyal, stefanha Hi, On Mon, 2011-05-30 at 13:09 +0800, Zhi Yong Wu wrote: > Hello, all, > > I have prepared to work on a feature called "Disk I/O limits" for the qemu-kvm project. > This feature will enable the user to cap the disk I/O amount performed by a VM. It is important when some storage resource is shared among multiple VMs. As you know, if some VMs are doing excessive disk I/O, they will hurt the performance of other VMs. > > More detail is available here: > http://wiki.qemu.org/Features/DiskIOLimits > > 1.) Why we need per-drive disk I/O limits > As you know, on Linux the cgroup blkio-controller supports I/O throttling on block devices. However, there is no single mechanism for disk I/O throttling across all underlying storage types (image file, LVM, NFS, Ceph), and for some types there is no way to throttle at all. > > The disk I/O limits feature introduces QEMU block layer I/O limits together with command-line and QMP interfaces for configuring limits. This allows I/O limits to be imposed across all underlying storage types using a single interface. > > 2.) How disk I/O limits will be implemented > The QEMU block layer will introduce a per-drive disk I/O request queue for those disks whose "disk I/O limits" feature is enabled. It can control disk I/O limits individually for each disk when multiple disks are attached to a VM, enabling use cases like unlimited local disk access but shared storage access with limits.
> In mutliple I/O threads scenario, when an application in a VM issues a block I/O request, this request will be intercepted by QEMU block layer, then it will calculate disk runtime I/O rate and determine if it has go beyond its limits. If yes, this I/O request will enqueue to that introduced queue; otherwise it will be serviced. > > 3.) How the users enable and play with it > QEMU -drive option will be extended so that disk I/O limits can be specified on its command line, such as -drive [iops=xxx,][throughput=xxx] or -drive [iops_rd=xxx,][iops_wr=xxx,][throughput=xxx] etc. When this argument is specified, it means that "disk I/O limits" feature is enabled for this drive disk. > The feature will also provide users with the ability to change per-drive disk I/O limits at runtime using QMP commands. I'm wondering if you've considered adding a 'burst' parameter - something which will not limit (or limit less) the io ops or the throughput for the first 'x' ms in a given time window. > Regards, > > Zhiyong Wu > -- Sasha. ^ permalink raw reply [flat|nested] 56+ messages in thread
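The interception flow described in section 2 - compute the drive's current rate, then either service the request or park it on the per-drive queue - can be sketched as a simplified single-threaded model. All names and the slice-based accounting are illustrative assumptions, not QEMU's actual block layer:

```python
import collections
import time

class DriveThrottle:
    """Illustrative per-drive throttle queue; not QEMU's implementation."""
    def __init__(self, bps_limit, slice_s=1.0):
        self.bps_limit = bps_limit          # bytes per second allowed
        self.slice_s = slice_s              # accounting window in seconds
        self.slice_start = time.monotonic()
        self.bytes_in_slice = 0
        self.queue = collections.deque()    # requests waiting for budget

    def _has_budget(self, nbytes):
        return self.bytes_in_slice + nbytes <= self.bps_limit * self.slice_s

    def submit(self, request, nbytes, service):
        """Intercept a request: service it now or enqueue it."""
        now = time.monotonic()
        if now - self.slice_start >= self.slice_s:
            # New accounting slice: reset the budget, drain waiters first.
            self.slice_start = now
            self.bytes_in_slice = 0
            while self.queue and self._has_budget(self.queue[0][1]):
                queued_req, queued_bytes, queued_service = self.queue.popleft()
                self.bytes_in_slice += queued_bytes
                queued_service(queued_req)
        if not self.queue and self._has_budget(nbytes):
            self.bytes_in_slice += nbytes
            service(request)        # under the limit: service immediately
        else:
            self.queue.append((request, nbytes, service))  # over: park it
```

Queued requests are drained in FIFO order before new ones, so a throttled drive preserves request ordering, which matters for guests that assume in-order completion per queue.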
* Re: [Qemu-devel] [RFC]QEMU disk I/O limits
  2011-06-02 6:17 ` Sasha Levin
@ 2011-06-02 6:29 ` Zhi Yong Wu
  2011-06-02 7:15   ` Sasha Levin
  0 siblings, 1 reply; 56+ messages in thread
From: Zhi Yong Wu @ 2011-06-02 6:29 UTC (permalink / raw)
To: Sasha Levin
Cc: kwolf, aliguori, herbert, kvm, guijianfeng, qemu-devel, wuzhy, luowenj, zhanx, zhaoyang, llim, raharper, vgoyal, stefanha

On Thu, Jun 02, 2011 at 09:17:06AM +0300, Sasha Levin wrote:
>Hi,
>
>On Mon, 2011-05-30 at 13:09 +0800, Zhi Yong Wu wrote:
>> [original RFC text snipped]
>
>I'm wondering if you've considered adding a 'burst' parameter -
>something which will not limit (or limit less) the I/O ops or the
>throughput for the first 'x' ms in a given time window.
Currently no. Could you let us know in what scenario it would make sense?

Regards,

Zhiyong Wu

>
>> Regards,
>>
>> Zhiyong Wu
>
>--
>
>Sasha.

^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [Qemu-devel] [RFC]QEMU disk I/O limits
  2011-06-02 6:29 ` Zhi Yong Wu
@ 2011-06-02 7:15 ` Sasha Levin
  2011-06-02 8:18   ` Zhi Yong Wu
  0 siblings, 1 reply; 56+ messages in thread
From: Sasha Levin @ 2011-06-02 7:15 UTC (permalink / raw)
To: Zhi Yong Wu
Cc: kwolf, aliguori, herbert, kvm, guijianfeng, qemu-devel, wuzhy, luowenj, zhanx, zhaoyang, llim, raharper, vgoyal, stefanha

On Thu, 2011-06-02 at 14:29 +0800, Zhi Yong Wu wrote:
> On Thu, Jun 02, 2011 at 09:17:06AM +0300, Sasha Levin wrote:
> >[original RFC text snipped]
> >
> >I'm wondering if you've considered adding a 'burst' parameter -
> >something which will not limit (or limit less) the io ops or the
> >throughput for the first 'x' ms in a given time window.
> Currently no, Do you let us know what scenario it will make sense to?

My assumption is that most guests are not doing constant disk I/O access. Instead, the operations are usually short and happen on a small scale (a relatively small number of bytes accessed).

For example: multiple-table DB lookups, serving a website, file servers.

Basically, if I need to do a DB lookup which needs 50MB of data from a disk which is limited to 10MB/s, I'd rather let it burst for 1 second and complete the lookup faster instead of having it read data for 5 seconds.

If the guest then starts running multiple lookups one after the other, that's when I would like to limit it.

--

Sasha.

^ permalink raw reply [flat|nested] 56+ messages in thread
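Sasha's 'burst' idea corresponds to a classic token bucket: tokens accumulate at the steady rate while the drive is idle, up to a bank (the burst allowance), so a one-off large read can proceed at full speed while sustained load is held to the configured rate. A minimal sketch under those assumed semantics; this is not QEMU code:

```python
import time

class TokenBucket:
    """Token bucket with a burst allowance; illustrative only."""
    def __init__(self, rate_bps, burst_bytes):
        self.rate = rate_bps            # steady refill rate, bytes/s
        self.capacity = burst_bytes     # max tokens an idle drive can bank
        self.tokens = burst_bytes       # start full: first accesses may burst
        self.last = time.monotonic()

    def _refill(self):
        now = time.monotonic()
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now

    def try_consume(self, nbytes):
        """True if the request may proceed now; False if it must wait."""
        self._refill()
        if self.tokens >= nbytes:
            self.tokens -= nbytes
            return True
        return False

# A drive limited to 10 MB/s steady, with a 50 MB burst bank:
tb = TokenBucket(rate_bps=10 * 1024**2, burst_bytes=50 * 1024**2)
assert tb.try_consume(50 * 1024**2)      # the one-off DB lookup bursts through
assert not tb.try_consume(50 * 1024**2)  # a back-to-back lookup is throttled
```

This gives exactly the behavior Sasha describes: the isolated 50MB lookup finishes fast, while repeated lookups drain the bank and fall back to the 10MB/s steady rate.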
* Re: [Qemu-devel] [RFC]QEMU disk I/O limits
  2011-06-02 7:15 ` Sasha Levin
@ 2011-06-02 8:18 ` Zhi Yong Wu
  0 siblings, 0 replies; 56+ messages in thread
From: Zhi Yong Wu @ 2011-06-02 8:18 UTC (permalink / raw)
To: Sasha Levin
Cc: kwolf, aliguori, kvm, guijianfeng, qemu-devel, wuzhy, luowenj, stefanha

On Thu, Jun 02, 2011 at 10:15:02AM +0300, Sasha Levin wrote:
>On Thu, 2011-06-02 at 14:29 +0800, Zhi Yong Wu wrote:
>> [original RFC text and earlier replies snipped]
>> >
>> >I'm wondering if you've considered adding a 'burst' parameter -
>> >something which will not limit (or limit less) the io ops or the
>> >throughput for the first 'x' ms in a given time window.
>> Currently no, Do you let us know what scenario it will make sense to?
>
>My assumption is that most guests are not doing constant disk I/O
>access. Instead, the operations are usually short and happen on small
>scale (relatively small amount of bytes accessed).
>
>For example: Multiple table DB lookup, serving a website, file servers.
>
>Basically, if I need to do a DB lookup which needs 50MB of data from a
>disk which is limited to 10MB/s, I'd rather let it burst for 1 second
>and complete the lookup faster instead of having it read data for 5
>seconds.
>
>If the guest now starts running multiple lookups one after the other,
>thats when I would like to limit.

Hi, Sasha. If the iops or bps parameters are not specified for -drive, the disk's I/O rate will not be limited. Moreover, QMP commands will be extended to support changing or disabling disk I/O limits at runtime; if you'd like a disk's I/O rate to be unlimited, you can use them to disable the feature. I'm not sure whether this answers your question.

Regards,

Zhiyong Wu

>
>--
>
>Sasha.

^ permalink raw reply [flat|nested] 56+ messages in thread
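The runtime-tunable limits Zhiyong mentions eventually landed in QEMU as the block_set_io_throttle QMP command, where setting every limit to 0 removes throttling; at the time of this RFC that command did not yet exist, so the sketch below targets the merged command's shape, and the socket path and device name are hypothetical:

```python
import json
import socket

def build_throttle_cmd(device, iops=0, iops_rd=0, iops_wr=0,
                       bps=0, bps_rd=0, bps_wr=0):
    """Build a block_set_io_throttle command; all-zero limits disable throttling."""
    return {"execute": "block_set_io_throttle",
            "arguments": {"device": device,
                          "iops": iops, "iops_rd": iops_rd, "iops_wr": iops_wr,
                          "bps": bps, "bps_rd": bps_rd, "bps_wr": bps_wr}}

def send_qmp(sock_path, cmd):
    """Speak the QMP handshake and send one command (needs a live QEMU
    started with e.g. -qmp unix:/tmp/qmp.sock,server,nowait)."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as s:
        s.connect(sock_path)
        f = s.makefile("rw")
        f.readline()                                   # greeting banner
        f.write(json.dumps({"execute": "qmp_capabilities"}) + "\n")
        f.flush()
        f.readline()                                   # capabilities ack
        f.write(json.dumps(cmd) + "\n")
        f.flush()
        return json.loads(f.readline())

# Cap a hypothetical drive at 1 MB/s, then lift the limit again:
cap = build_throttle_cmd("virtio0", bps=1024 * 1024)
uncap = build_throttle_cmd("virtio0")   # all zeros: throttling disabled
```

So "disable at runtime" is simply the same command with every limit set to zero, which answers Sasha's unlimited-disk case without any extra switch.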
end of thread, other threads:[~2011-06-04 8:55 UTC | newest]

Thread overview: 56+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-05-30  5:09 [Qemu-devel][RFC]QEMU disk I/O limits Zhi Yong Wu
2011-05-30  5:09 ` [Qemu-devel] [RFC]QEMU " Zhi Yong Wu
2011-05-31 13:45 ` [Qemu-devel][RFC]QEMU " Vivek Goyal
2011-05-31 13:45 ` [Qemu-devel] [RFC]QEMU " Vivek Goyal
2011-05-31 13:50   ` Anthony Liguori
2011-05-31 13:50   ` [Qemu-devel] " Anthony Liguori
2011-05-31 14:04     ` Vivek Goyal
2011-05-31 14:04     ` [Qemu-devel] " Vivek Goyal
2011-05-31 14:25       ` Anthony Liguori
2011-05-31 17:59         ` Vivek Goyal
2011-05-31 17:59         ` Vivek Goyal
2011-05-31 18:39           ` Anthony Liguori
2011-05-31 18:39           ` Anthony Liguori
2011-05-31 19:24             ` Vivek Goyal
2011-05-31 19:24             ` Vivek Goyal
2011-05-31 23:30               ` Anthony Liguori
2011-06-01 13:20                 ` Vivek Goyal
2011-06-01 21:15                   ` Stefan Hajnoczi
2011-06-01 21:15                   ` Stefan Hajnoczi
2011-06-01 21:42                     ` Vivek Goyal
2011-06-01 21:42                     ` Vivek Goyal
2011-06-01 22:28                       ` Stefan Hajnoczi
2011-06-01 22:28                       ` Stefan Hajnoczi
2011-06-04  8:54                         ` Blue Swirl
2011-06-04  8:54                         ` Blue Swirl
2011-05-31 20:48           ` Mike Snitzer
2011-05-31 20:48           ` [Qemu-devel] " Mike Snitzer
2011-05-31 22:22             ` Anthony Liguori
2011-05-31 13:56 ` [Qemu-devel][RFC]QEMU " Daniel P. Berrange
2011-05-31 13:56 ` [Qemu-devel] [RFC]QEMU " Daniel P. Berrange
2011-05-31 14:10   ` [Qemu-devel][RFC]QEMU " Vivek Goyal
2011-05-31 14:10   ` [Qemu-devel] [RFC]QEMU " Vivek Goyal
2011-05-31 14:19     ` [Qemu-devel][RFC]QEMU " Daniel P. Berrange
2011-05-31 14:19     ` [Qemu-devel] [RFC]QEMU " Daniel P. Berrange
2011-05-31 14:28       ` [Qemu-devel][RFC]QEMU " Vivek Goyal
2011-05-31 14:28       ` [Qemu-devel] [RFC]QEMU " Vivek Goyal
2011-05-31 15:28 ` [Qemu-devel][RFC]QEMU " Ryan Harper
2011-05-31 15:28 ` [Qemu-devel] [RFC]QEMU " Ryan Harper
2011-05-31 19:55   ` [Qemu-devel][RFC]QEMU " Vivek Goyal
2011-05-31 19:55   ` [Qemu-devel] [RFC]QEMU " Vivek Goyal
2011-06-01  3:12     ` Zhi Yong Wu
2011-06-01  3:12     ` Zhi Yong Wu
2011-06-02  9:33       ` Michal Suchanek
2011-06-02  9:33       ` Michal Suchanek
2011-06-03  6:56         ` Zhi Yong Wu
2011-06-03  6:56         ` Zhi Yong Wu
2011-06-01  3:19   ` Zhi Yong Wu
2011-06-01  3:19   ` Zhi Yong Wu
2011-06-01 13:32     ` Vivek Goyal
2011-06-02  6:07       ` Zhi Yong Wu
2011-06-02  6:17 ` Sasha Levin
2011-06-02  6:17 ` Sasha Levin
2011-06-02  6:29   ` Zhi Yong Wu
2011-06-02  7:15     ` Sasha Levin
2011-06-02  8:18       ` Zhi Yong Wu
2011-06-02  8:18       ` Zhi Yong Wu