From mboxrd@z Thu Jan  1 00:00:00 1970
From: Zhi Yong Wu <zwu.kernel@gmail.com>
Subject: Re: [PATCH v2 1/1] The codes V2 for QEMU disk I/O limits.
Date: Fri, 29 Jul 2011 10:09:30 +0800
Message-ID: <CAEH94Ljti-TpeRMcqRhu7aMT_vpZjfDjOv+cy3CxvDjaPtA1DQ@mail.gmail.com>
References: <1311670746-20498-1-git-send-email-wuzhy@linux.vnet.ibm.com>
	<1311670746-20498-2-git-send-email-wuzhy@linux.vnet.ibm.com>
	<20110726192618.GA8126@amt.cnet>
	<CAEH94Liq-0jNH4mC5nCib+SBkYKoCO1vwy4LemiNQzWVSwYAsg@mail.gmail.com>
	<20110727154913.GA6334@amt.cnet>
	<CAEH94LibyMN9=8gGBYE15Wva7GR0BBBWmbjTaWdcLO8psbHGYA@mail.gmail.com>
	<20110728144211.GA17754@amt.cnet>
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: QUOTED-PRINTABLE
Cc: Zhi Yong Wu <wuzhy@linux.vnet.ibm.com>, qemu-devel@nongnu.org,
	kvm@vger.kernel.org, aliguori@us.ibm.com,
	stefanha@linux.vnet.ibm.com, ryanh@us.ibm.com, kwolf@redhat.com,
	vgoyal@redhat.com
To: Marcelo Tosatti <mtosatti@redhat.com>
Return-path: <kvm-owner@vger.kernel.org>
Received: from mail-gx0-f174.google.com ([209.85.161.174]:56694 "EHLO
	mail-gx0-f174.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1754688Ab1G2CJb convert rfc822-to-8bit (ORCPT
	<rfc822;kvm@vger.kernel.org>); Thu, 28 Jul 2011 22:09:31 -0400
Received: by gxk21 with SMTP id 21so2260582gxk.19
        for <kvm@vger.kernel.org>; Thu, 28 Jul 2011 19:09:30 -0700 (PDT)
In-Reply-To: <20110728144211.GA17754@amt.cnet>
Sender: kvm-owner@vger.kernel.org
List-ID: <kvm.vger.kernel.org>

On Thu, Jul 28, 2011 at 10:42 PM, Marcelo Tosatti <mtosatti@redhat.com> wrote:
> On Thu, Jul 28, 2011 at 12:24:48PM +0800, Zhi Yong Wu wrote:
>> On Wed, Jul 27, 2011 at 11:49 PM, Marcelo Tosatti <mtosatti@redhat.com> wrote:
>> > On Wed, Jul 27, 2011 at 06:17:15PM +0800, Zhi Yong Wu wrote:
>> >> >> +        wait_time = 1;
>> >> >> +    }
>> >> >> +
>> >> >> +    wait_time = wait_time + (slice_time - elapsed_time);
>> >> >> +    if (wait) {
>> >> >> +        *wait = wait_time * BLOCK_IO_SLICE_TIME * 10 + 1;
>> >> >> +    }
>> >> >
>> >> > The guest can keep submitting requests where "wait_time = 1" above,
>> >> > and the timer will be rearmed continuously in the future.
>> >
>> > This is wrong.
>> >
>> >>  Can't you
>> >> > simply arm the timer to the next slice start? _Some_ data must be
>> >> > transfered by then, anyway (and nothing can be transfered earlier than
>> >> > that).
>> >
>> > This is valid.
>> >
>> >> Sorry, i have got what you mean. Can you elaborate in more detail?
>> >
>> > Sorry, the bug i mentioned about timer being rearmed does not exist.
>> >
>> > But arming the timer for the last request as its done is confusing/unnecessary.
>> >
>> > The timer processes queued requests, but the timer is armed accordingly
>> > to the last queued request in the slice. For example, if a request is
>> > submitted 1ms before the slice ends, the timer is armed 100ms +
>> > (slice_time - elapsed_time) in the future.
>>
>> If the timer is simply amred to the next slice start, this timer will
>> be a periodic timer, either the I/O rate can not be throttled under
>> the limits, or the enqueued request can be delayed to handled, this
>> will lower I/O rate seriously than the limits.
>
> Yes, periodic but disarmed when there is no queueing. I don't understand
> your point about low I/O rate.
We hope that while the runtime I/O rate is throttled below the limits, it
still stays very close to them. If the timer is simply armed to the next
slice start, an enqueued request could wait quite a long time before it is
handled, which could push the actual I/O rate far below the limits and
seriously hurt I/O performance.
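
For illustration only, here is a minimal C sketch of the two timer-arming
strategies under discussion. It is not code from the patch: the helper
names and the millisecond units are assumptions made for this example, and
the first helper just restates the behaviour described in the review
("100ms + (slice_time - elapsed_time)").

```c
#include <stdio.h>

/*
 * Hypothetical helpers, not taken from the patch. All times are in
 * milliseconds, which is an assumption made for this sketch.
 */

/*
 * Timer armed relative to the last queued request, as described in the
 * review: a fixed base wait plus the remainder of the current slice.
 */
static long wait_after_last_request_ms(long base_wait_ms, long slice_ms,
                                       long elapsed_ms)
{
    return base_wait_ms + (slice_ms - elapsed_ms);
}

/*
 * Timer simply armed to the start of the next slice: only the remainder
 * of the current slice.
 */
static long wait_until_next_slice_ms(long slice_ms, long elapsed_ms)
{
    return slice_ms - elapsed_ms;
}
```

With the numbers from the review (a 100ms slice and a request queued 99ms
into it), the first helper returns 101 and the second returns 1. Under the
second strategy a queued request never waits longer than one slice.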

>
>> Maybe the slice time should be variable with current I/O rate. What do
>> you think of it?
>
> Not sure if its necessary. The slice should be small to avoid excessive
> work on timer context, but the advantage of increasing the slice is
> very small if any. BTW, 10ms seems a better default than 100ms.
Thanks for your comments. If you're interested, please take a look at the
V3 code; it adds support for a variable slice time.

>
>


-- 
Regards,

Zhi Yong Wu
