Date: Fri, 29 Aug 2014 18:32:20 +0200
From: Benoît Canet
Message-ID: <20140829163219.GB16755@irqsave.net>
References: <20140828143809.GB28789@irqsave.net> <20140829160446.GC20193@stefanha-thinkpad.redhat.com>
In-Reply-To: <20140829160446.GC20193@stefanha-thinkpad.redhat.com>
Subject: Re: [Qemu-devel] IO accounting overhaul
To: Stefan Hajnoczi
Cc: Benoît Canet, kwolf@redhat.com, libvir-list@redhat.com, qemu-devel@nongnu.org, armbru@redhat.com, stefanha@redhat.com, anshul.makkar@profitbricks.com

On Friday 29 Aug 2014 at 17:04:46 (+0100), Stefan Hajnoczi wrote:
> On Thu, Aug 28, 2014 at 04:38:09PM +0200, Benoît Canet wrote:
> > I collected some items of a cloud provider wishlist regarding I/O accounting.
> > 
> > In a cloud, I/O accounting can serve three purposes: billing, helping the
> > customers, and doing metrology to help the cloud provider find hidden costs.
> > 
> > I'll cover the two former topics in this mail because they are the most
> > important business-wise.
> > 
> > 1) preferred place to collect billing I/O accounting data:
> > ----------------------------------------------------------
> > For billing purposes the collected data must be as close as possible to what the
> > customer would see by running iostat in his VM.
> > 
> > The first conclusion we can draw is that the choice of collecting I/O accounting
> > data used for billing in the block device models is right.
> 
> I agree. When statistics are collected at lower layers it becomes hard
> for the end user to understand numbers that include hidden costs for
> image formats, network protocols, etc.
> 
> > 2) what to do with occurrences of rare events:
> > ----------------------------------------------
> > 
> > Another point is that QEMU developers agree that they don't know which policy
> > to apply to some I/O accounting events.
> > Must QEMU discard invalid write I/Os or account them as done?
> > Must QEMU count a failed read I/O as done?
> > 
> > When discussing this with a cloud provider the following became clear: these
> > decisions are really specific to each cloud provider and QEMU should not
> > implement them.
> > The right thing to do is to add accounting counters to collect these events.
> > 
> > Moreover these rare events are precious troubleshooting data, so that's an
> > additional reason not to toss them.
> 
> Sounds good; network interface statistics also include error counters.
> 
> > 3) list of block I/O accounting metrics wished for billing and helping the customers
> > ------------------------------------------------------------------------------------
> > 
> > Basic I/O accounting data will end up making the customers' bills.
> > Extra I/O accounting information would be a precious help for the cloud provider
> > to implement a monitoring panel like Amazon CloudWatch.
> 
> One thing to be aware of is that counters inside QEMU cannot be trusted.
> If a malicious guest can overwrite memory in QEMU then the counters can
> be manipulated.
> 
> For most purposes this should be okay. Just be aware that evil guests
> could manipulate their counters if a security hole is found in QEMU.
> 
> > Here is the list of counters and statistics I would like to help implement
> > in QEMU.
> > 
> > This is the most important part of the mail and the one I would like the
> > community to review the most.
> > 
> > Once this list is settled I would proceed to implement the required
> > infrastructure in QEMU before using it in the device models.
> > 
> > /* volume of data transferred by the I/Os */
> > read_bytes
> > write_bytes
> > 
> > /* operation count */
> > read_ios
> > write_ios
> > flush_ios
> > 
> > /* how many invalid I/Os the guest submitted */
> > invalid_read_ios
> > invalid_write_ios
> > invalid_flush_ios
> > 
> > /* how many I/O errors happened */
> > read_ios_error
> > write_ios_error
> > flush_ios_error
> > 
> > /* account the time spent doing I/Os */
> > total_read_time
> > total_write_time
> > total_flush_time
> > 
> > /* since when the volume has been idle */
> > qvolume_idleness_time
> 
> ? s/qv/v/

It's the time the volume spent being idle. Amazon reports it in its tools.

> 
> > 
> > /* the following would compute latencies for slices of 1 second then toss the
> > * result and start a new slice. A weighted summation of the instant latencies
> > * could help to implement this.
> > */
> > 1s_read_average_latency
> > 1s_write_average_latency
> > 1s_flush_average_latency
> > 
> > /* the former three numbers could be used to further compute a 1 minute slice value */
> > 1m_read_average_latency
> > 1m_write_average_latency
> > 1m_flush_average_latency
> > 
> > /* the former three numbers could be used to further compute a 1 hour slice value */
> > 1h_read_average_latency
> > 1h_write_average_latency
> > 1h_flush_average_latency
> > 
> > /* 1 second average number of requests in flight */
> > 1s_read_queue_depth
> > 1s_write_queue_depth
> > 
> > /* 1 minute average number of requests in flight */
> > 1m_read_queue_depth
> > 1m_write_queue_depth
> > 
> > /* 1 hour average number of requests in flight */
> > 1h_read_queue_depth
> > 1h_write_queue_depth
> 
> I think libvirt captures similar data. At least virt-manager displays
> graphs with similar data (maybe for CPU, memory, or network instead of
> disk).
> 
> > 4) Making this happen
> > ---------------------
> > 
> > Outscale wants to make these I/O stats happen and gave me the go-ahead to do
> > whatever grunt work is required to do so.
> > That said, we could collaborate on some parts of the work.
> 
> Seems like a nice improvement to the query-blockstats available today.
> 
> CCing libvirt for management stack ideas.
> 
> Stefan
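
For concreteness, here is a rough sketch of how the basic counters and the
per-slice average latencies could be laid out. All names are placeholders I
made up for illustration; none of this is existing QEMU code or API:

```c
#include <stdint.h>

/* Placeholder per-device accounting structure mirroring the wishlist above. */
typedef struct BlockAcctStats {
    uint64_t read_bytes;          /* volume of data transferred */
    uint64_t write_bytes;
    uint64_t read_ios;            /* operation counts */
    uint64_t write_ios;
    uint64_t flush_ios;
    uint64_t invalid_read_ios;    /* invalid I/Os submitted by the guest */
    uint64_t invalid_write_ios;
    uint64_t invalid_flush_ios;
    uint64_t read_ios_error;      /* failed I/Os */
    uint64_t write_ios_error;
    uint64_t flush_ios_error;
    uint64_t total_read_time_ns;  /* accumulated time spent doing I/Os */
    uint64_t total_write_time_ns;
    uint64_t total_flush_time_ns;
} BlockAcctStats;

/* One latency slice: sum the instant latencies, then average and reset
 * when the slice window (1s, 1m, 1h) ends. */
typedef struct LatencySlice {
    uint64_t sum_ns;
    uint64_t count;
} LatencySlice;

/* Account one completed read and feed its latency into the current slice. */
static void acct_read_done(BlockAcctStats *s, LatencySlice *sl,
                           uint64_t bytes, uint64_t latency_ns)
{
    s->read_bytes += bytes;
    s->read_ios += 1;
    s->total_read_time_ns += latency_ns;
    sl->sum_ns += latency_ns;
    sl->count += 1;
}

/* Close the current slice: return its average latency and start a new one. */
static uint64_t slice_close(LatencySlice *sl)
{
    uint64_t avg = sl->count ? sl->sum_ns / sl->count : 0;
    sl->sum_ns = 0;
    sl->count = 0;
    return avg;
}
```

The 1m and 1h figures would then be derived by aggregating the closed 1s
slices rather than by keeping a second set of raw counters.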