From: Stefan Hajnoczi <stefanha@gmail.com>
To: Vivek Goyal <vgoyal@redhat.com>
Cc: Anthony Liguori <anthony@codemonkey.ws>,
	kwolf@redhat.com, stefanha@linux.vnet.ibm.com,
	Mike Snitzer <snitzer@redhat.com>,
	guijianfeng@cn.fujitsu.com, qemu-devel@nongnu.org,
	wuzhy@cn.ibm.com, herbert@gondor.hengli.com.au,
	Joe Thornber <ejt@redhat.com>,
	Zhi Yong Wu <wuzhy@linux.vnet.ibm.com>,
	luowenj@cn.ibm.com, kvm@vger.kernel.org, zhanx@cn.ibm.com,
	zhaoyang@cn.ibm.com, llim@redhat.com,
	Ryan A Harper <raharper@us.ibm.com>
Subject: Re: [Qemu-devel] [RFC]QEMU disk I/O limits
Date: Wed, 1 Jun 2011 23:28:27 +0100
Message-ID: <BANLkTinvO2Sku5jGwDu98EWa56BUhgvx6A@mail.gmail.com>
In-Reply-To: <20110601214212.GB17449@redhat.com>

On Wed, Jun 1, 2011 at 10:42 PM, Vivek Goyal <vgoyal@redhat.com> wrote:
> On Wed, Jun 01, 2011 at 10:15:30PM +0100, Stefan Hajnoczi wrote:
>> One issue that concerns me is how effective iops and throughput are as
>> capping mechanisms.  If you cap throughput then you're likely to
>> affect sequential I/O but do little against random I/O, which can hog
>> the disk with a seeky I/O pattern.  If you limit iops you can cap
>> random I/O but artificially limit sequential I/O, which may be able to
>> perform a high number of iops without hogging the disk at all, since
>> it avoids long seek times.  One proposed solution here (I think
>> Christoph Hellwig suggested it) is to merge sequential I/Os in the
>> accounting so that multiple sequential I/Os only count as 1 iop.
>
> One of the things we at least need to do is allow specifying both
> bps and iops rules together, so that random IO with high iops does
> not create havoc and sequential or large-size IO with low iops and
> high bps does not overload the system.
>
> I am not sure how IO shows up in qemu, but will the elevator in the
> guest make sure that a lot of sequential IO is merged together? For
> dependent READs, I think counting multiple sequential reads as 1 iop
> might help. This is one optimization that can be done once throttling
> starts working in qemu, if it turns out to be a real concern.

The guest can use an I/O scheduler, so for Linux guests we see the
typical effects of cfq.  Requests do get merged by the guest before
being submitted to QEMU.
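
For reference, the active elevator can be inspected from inside the
guest (the device name vda is just an example):

  # Inside the guest: see which elevator is handling the virtual disk
  $ cat /sys/block/vda/queue/scheduler
  noop deadline [cfq]

  # Switch elevators at runtime to compare merging behavior
  $ echo deadline > /sys/block/vda/queue/scheduler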

Okay, good idea.  Zhi Yong's test plan includes tests with multiple
VMs and both iops and throughput limits at the same time.  If
workloads turn up that cause issues it would be possible to count
sequential I/Os as 1 iop.
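
If it does become necessary, the accounting could be as simple as
this rough sketch (not actual QEMU code, all names are made up for
illustration):

  /* Sketch of sequential-merge iops accounting: charge 1 iop for the
   * first request of a sequential run; requests that start exactly
   * where the previous one ended are not charged again. */
  #include <stdbool.h>
  #include <stdint.h>

  struct iops_account {
      int64_t next_sector;    /* first sector after the last request */
      uint64_t iops_charged;  /* total iops charged for throttling */
  };

  static void account_request(struct iops_account *a,
                              int64_t sector, int nb_sectors)
  {
      bool sequential = (sector == a->next_sector);

      if (!sequential) {
          a->iops_charged++;
      }
      a->next_sector = sector + nb_sectors;
  }

A real implementation would probably also want to expire next_sector
after a timeout so that a slow sequential stream does not stay free
forever.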

>>
>> I like the idea of a proportional share of disk utilization, but doing
>> that from QEMU is problematic since we only know when we issued an I/O
>> to the kernel, not when it is actually being serviced by the disk -
>> there could be queue wait times in the block layer that we don't know
>> about - so we would end up with a magic number for disk utilization
>> that may not be very meaningful.
>
> To be able to implement proportional IO one should be able to see
> all IO from all clients in one place. Qemu knows only about the IO
> of its own guest, not of the other guests running on the system. So
> I think qemu can't implement proportional IO.

Yeah :(

>>
>> So given the constraints and the backends we need to support, disk I/O
>> limits in QEMU with iops and throughput limits seem like the approach
>> we need.
>
> For qemu, yes. For other non-qemu usages we will still require a
> kernel throttling mechanism.

Definitely.  In fact I like the idea of using blkio-controller for raw
image files on local file systems or LVM volumes.
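
For a raw image on an LVM volume, something along these lines should
work with the blkio throttling policy merged in 2.6.37 (the cgroup
path, device numbers, and values here are just examples):

  # On the host: cap the guest's logical volume at 10 MB/s reads
  # and 200 read iops, then move the QEMU process into the cgroup
  $ mkdir /sys/fs/cgroup/blkio/vm1
  $ echo "253:0 10485760" > \
      /sys/fs/cgroup/blkio/vm1/blkio.throttle.read_bps_device
  $ echo "253:0 200" > \
      /sys/fs/cgroup/blkio/vm1/blkio.throttle.read_iops_device
  $ echo $QEMU_PID > /sys/fs/cgroup/blkio/vm1/tasks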

Hopefully the end-user API (libvirt interface) through which QEMU
disk I/O limits get exposed will complement the existing blkiotune
(blkio-controller) virsh command.
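
Today that looks like this for the proportional weight (the domain
name guest1 is made up):

  # Existing cgroup-backed knob: relative blkio weight for a domain
  $ virsh blkiotune guest1 --weight 500

The QEMU limits would then add absolute bps/iops caps per disk on
top of that relative weight.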

Stefan
