From: Stefan Hajnoczi <stefanha@gmail.com>
To: Vivek Goyal <vgoyal@redhat.com>
Cc: Anthony Liguori <anthony@codemonkey.ws>,
	kwolf@redhat.com, stefanha@linux.vnet.ibm.com,
	Mike Snitzer <snitzer@redhat.com>,
	guijianfeng@cn.fujitsu.com, qemu-devel@nongnu.org,
	wuzhy@cn.ibm.com, herbert@gondor.hengli.com.au,
	Joe Thornber <ejt@redhat.com>,
	Zhi Yong Wu <wuzhy@linux.vnet.ibm.com>,
	luowenj@cn.ibm.com, kvm@vger.kernel.org, zhanx@cn.ibm.com,
	zhaoyang@cn.ibm.com, llim@redhat.com,
	Ryan A Harper <raharper@us.ibm.com>
Subject: Re: [Qemu-devel] [RFC]QEMU disk I/O limits
Date: Wed, 1 Jun 2011 23:28:27 +0100
Message-ID: <BANLkTinvO2Sku5jGwDu98EWa56BUhgvx6A@mail.gmail.com>
In-Reply-To: <20110601214212.GB17449@redhat.com>

On Wed, Jun 1, 2011 at 10:42 PM, Vivek Goyal <vgoyal@redhat.com> wrote:
> On Wed, Jun 01, 2011 at 10:15:30PM +0100, Stefan Hajnoczi wrote:
>> One issue that concerns me is how effective iops and throughput are as
>> capping mechanisms.  If you cap throughput then you're likely to
>> affect sequential I/O but do little against random I/O, which can hog
>> the disk with a seeky I/O pattern.  If you limit iops you can cap
>> random I/O but artificially limit sequential I/O, which may be able to
>> perform a high number of iops without hogging the disk at all, since
>> it avoids long seek times.  One proposed solution here (I think
>> Christoph Hellwig suggested it) is to merge sequential I/Os in the
>> accounting so that multiple sequential I/Os only count as 1 iop.
>
> One of the things we at least need to do is allow specifying both
> bps and iops rules together, so that random IO with high iops does
> not create havoc and sequential or large-size IO with low iops and
> high bps does not overload the system.
>
> I am not sure how IO shows up in qemu, but will the elevator in the
> guest make sure that a lot of sequential IO is merged together? For
> dependent READs, I think counting multiple sequential reads as 1 iop
> might help. This is one optimization that can be done once throttling
> starts working in qemu, if it turns out to be a real concern.

The guest can use an I/O scheduler, so for Linux guests we see the
typical effects of cfq.  Requests do get merged by the guest before
being submitted to QEMU.
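
For reference, the active elevator can be inspected from inside the
guest (the device name vda is just an example):

  # Inside the guest: see which elevator is handling the virtual disk
  $ cat /sys/block/vda/queue/scheduler
  noop deadline [cfq]

  # Switch elevators at runtime to compare merging behavior
  $ echo deadline > /sys/block/vda/queue/scheduler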

Okay, good idea.  Zhi Yong's test plan includes tests with multiple
VMs and both iops and throughput limits at the same time.  If
workloads turn up that cause issues it would be possible to count
sequential I/Os as 1 iop.
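
If it does become necessary, the accounting could be as simple as
this rough sketch (not actual QEMU code, all names are made up for
illustration):

  /* Sketch of sequential-merge iops accounting: charge 1 iop for the
   * first request of a sequential run; requests that start exactly
   * where the previous one ended are not charged again. */
  #include <stdbool.h>
  #include <stdint.h>

  struct iops_account {
      int64_t next_sector;    /* first sector after the last request */
      uint64_t iops_charged;  /* total iops charged for throttling */
  };

  static void account_request(struct iops_account *a,
                              int64_t sector, int nb_sectors)
  {
      bool sequential = (sector == a->next_sector);

      if (!sequential) {
          a->iops_charged++;
      }
      a->next_sector = sector + nb_sectors;
  }

A real implementation would probably also want to expire next_sector
after a timeout so that a slow sequential stream does not stay free
forever.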

>>
>> I like the idea of a proportional share of disk utilization, but doing
>> that from QEMU is problematic since we only know when we issued an I/O
>> to the kernel, not when it is actually being serviced by the disk -
>> there could be queue wait times in the block layer that we don't know
>> about - so we would end up with a magic number for disk utilization
>> that may not be very meaningful.
>
> To be able to implement proportional IO one should be able to see
> all IO from all clients in one place. Qemu knows only about the IO
> of its own guest, not of the other guests running on the system. So
> I think qemu can't implement proportional IO.

Yeah :(

>>
>> So given the constraints and the backends we need to support, disk I/O
>> limits in QEMU with iops and throughput limits seem like the approach
>> we need.
>
> For qemu, yes. For other non-qemu usages we will still require a
> kernel throttling mechanism.

Definitely.  In fact I like the idea of using blkio-controller for raw
image files on local file systems or LVM volumes.
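
For a raw image on an LVM volume, something along these lines should
work with the blkio throttling policy merged in 2.6.37 (the cgroup
path, device numbers, and values here are just examples):

  # On the host: cap the guest's logical volume at 10 MB/s reads
  # and 200 read iops, then move the QEMU process into the cgroup
  $ mkdir /sys/fs/cgroup/blkio/vm1
  $ echo "253:0 10485760" > \
      /sys/fs/cgroup/blkio/vm1/blkio.throttle.read_bps_device
  $ echo "253:0 200" > \
      /sys/fs/cgroup/blkio/vm1/blkio.throttle.read_iops_device
  $ echo $QEMU_PID > /sys/fs/cgroup/blkio/vm1/tasks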

Hopefully the end-user API (libvirt interface) through which QEMU
disk I/O limits get exposed will complement the existing blkiotune
(blkio-controller) virsh command.
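
Today that looks like this for the proportional weight (the domain
name guest1 is made up):

  # Existing cgroup-backed knob: relative blkio weight for a domain
  $ virsh blkiotune guest1 --weight 500

The QEMU limits would then add absolute bps/iops caps per disk on
top of that relative weight.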

Stefan
