From: Vivek Goyal <vgoyal@redhat.com>
To: Stefan Hajnoczi <stefanha@gmail.com>
Cc: Anthony Liguori <anthony@codemonkey.ws>,
	kwolf@redhat.com, stefanha@linux.vnet.ibm.com,
	Mike Snitzer <snitzer@redhat.com>,
	guijianfeng@cn.fujitsu.com, qemu-devel@nongnu.org,
	wuzhy@cn.ibm.com, herbert@gondor.hengli.com.au,
	Joe Thornber <ejt@redhat.com>,
	Zhi Yong Wu <wuzhy@linux.vnet.ibm.com>,
	luowenj@cn.ibm.com, kvm@vger.kernel.org, zhanx@cn.ibm.com,
	zhaoyang@cn.ibm.com, llim@redhat.com,
	Ryan A Harper <raharper@us.ibm.com>
Subject: Re: [Qemu-devel] [RFC]QEMU disk I/O limits
Date: Wed, 1 Jun 2011 17:42:12 -0400
Message-ID: <20110601214212.GB17449@redhat.com>
In-Reply-To: <BANLkTinn4ysHFncnVrfNgay07JBpnFsqsw@mail.gmail.com>

On Wed, Jun 01, 2011 at 10:15:30PM +0100, Stefan Hajnoczi wrote:
> On Wed, Jun 1, 2011 at 2:20 PM, Vivek Goyal <vgoyal@redhat.com> wrote:
> > On Tue, May 31, 2011 at 06:30:09PM -0500, Anthony Liguori wrote:
> >
> > [..]
> >> The level of consistency will then depend on whether you overcommit
> >> your hardware and how you have it configured.
> >
> > Agreed.
> >
> >>
> >> Consistency is very hard because at the end of the day, you still
> >> have shared resources.  Even with blkio, I presume one guest can
> >> still impact another guest by forcing the disk to do excessive
> >> seeking or something of that nature.
> >>
> >> So absolute consistency can't be the requirement for the use-case.
> >> The use-cases we are interested in are really more about providing
> >> caps than anything else.
> >
> > I think both qemu and the kernel can do the job. The only thing which
> > seriously favors a throttling implementation in qemu is its ability
> > to handle a wide variety of backend files (NFS, qcow, libcurl-based
> > devices, etc.).
> >
> > So what I am arguing is that your previous reason, that qemu can do
> > a better job because it knows the effective IOPS of the guest, is not
> > necessarily a very good one. To me, the simplicity of being able to
> > handle everything as a file and do the throttling there is the most
> > compelling reason to do this implementation in qemu.
> 
> The variety of backends is the reason to go for a QEMU-based approach.
>  If there were kernel mechanisms to handle non-block backends, that
> would be great.  cgroups for NFS?

I agree that because qemu can handle a variety of backends, that is a
very good reason to do the throttling in qemu. The kernel currently has
no way to throttle files over NFS.

There were some suggestions of using a loop device or a device-mapper
loop device on top of NFS images and then implementing block-device
policies like throttling there. But I am not convinced that it is a
good idea.

To cover the NFS case we would probably have to implement something in
NFS itself, or something more generic in the VFS. But I am not sure the
file system folks would like that, or whether it is even worthwhile at
this point, given that the primary use case is qemu and qemu can easily
implement this functionality.

> 
> Of course for something like Sheepdog or Ceph it becomes quite hard to
> do it in the kernel at all since they are userspace libraries that
> speak their protocol over sockets, and you really don't have insight
> into what I/O operations they are doing from the kernel.

Agreed. This is another reason why doing it in qemu makes sense.

> 
> One issue that concerns me is how effective iops and throughput are as
> capping mechanisms.  If you cap throughput then you're likely to
> affect sequential I/O but do little against random I/O which can hog
> the disk with a seeky I/O pattern.  If you limit iops you can cap
> random I/O but artificially limit sequential I/O, which may be able to
> perform a high number of iops without hogging the disk due to seek
> times at all.  One proposed solution here (I think Christoph Hellwig
> suggested it) is to do something like merging sequential I/O counting
> so that multiple sequential I/Os only count as 1 iop.

One of the things we at least need to do is allow specifying both a bps
rule and an iops rule together, so that random IO with high iops does
not create havoc and sequential or large-size IO with low iops but high
bps does not overload the system.
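
(To illustrate, here is a rough sketch of how the two limits could be
enforced together; the names and structure are hypothetical, not taken
from any existing qemu code. A request is admitted only when both a
bytes bucket and an ops bucket have budget, and it drains both:)

    #include <stdbool.h>
    #include <stdint.h>

    /* Hypothetical token buckets, refilled elsewhere at the configured
     * bps (bytes/sec) and iops (ops/sec) rates. */
    struct throttle_bucket {
        int64_t level;              /* tokens currently available */
    };

    static bool throttle_admit(struct throttle_bucket *bytes,
                               struct throttle_bucket *ops,
                               int64_t request_bytes)
    {
        if (bytes->level < request_bytes || ops->level < 1) {
            return false;           /* queue until the next refill */
        }
        bytes->level -= request_bytes;
        ops->level -= 1;
        return true;
    }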

I am not sure how the IO shows up in qemu, but will the elevator in the
guest make sure that most sequential IO is merged together? For
dependent READs, I think counting multiple sequential reads as 1 iop
might help. This is an optimization one can add once throttling is
working in qemu, if it turns out to be a real concern.
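
(Again purely a sketch, assuming requests are seen in submission order:
remember where the previous request ended and charge an iop only when
the new request is not contiguous with it.)

    #include <stdbool.h>
    #include <stdint.h>

    /* Hypothetical accounting helper: a run of contiguous sequential
     * reads is charged as a single iop. */
    static int64_t last_end_offset = -1;

    static bool charge_iop(int64_t offset, int64_t len)
    {
        bool contiguous = (offset == last_end_offset);
        last_end_offset = offset + len;
        return !contiguous;  /* only non-contiguous requests cost an iop */
    }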

> 
> I like the idea of a proportional share of disk utilization but doing
> that from QEMU is problematic since we only know when we issued an I/O
> to the kernel, not when it's actually being serviced by the disk -
> there could be queue wait times in the block layer that we don't know
> about - so we end up with a magic number for disk utilization which
> may not be a very meaningful number.

To be able to implement proportional IO, one has to be able to see all
IO from all clients in one place. Qemu knows only about the IO of its
own guest, not about the other guests running on the system. So I think
qemu can't implement proportional IO.

> 
> So given the constraints and the backends we need to support, disk I/O
> limits in QEMU with iops and throughput limits seem like the approach
> we need.

For qemu, yes. For other, non-qemu use cases we will still need a
throttling mechanism in the kernel.
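
(For the plain block-device case that mechanism already exists in the
form of the blkio controller's throttle files, where a rule is a
"major:minor value" string written into the cgroup. A minimal C sketch,
assuming the v1 blkio controller is mounted at /sys/fs/cgroup/blkio and
a "vm1" cgroup already exists; the device numbers are examples only:)

    #include <fcntl.h>
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>

    /* Cap reads from device 8:16 to 1 MB/s for the "vm1" cgroup. */
    int main(void)
    {
        const char *path =
            "/sys/fs/cgroup/blkio/vm1/blkio.throttle.read_bps_device";
        const char *rule = "8:16 1048576\n";
        int fd = open(path, O_WRONLY);

        if (fd < 0 || write(fd, rule, strlen(rule)) < 0) {
            perror("blkio throttle");
            return 1;
        }
        close(fd);
        return 0;
    }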

Thanks
Vivek
