Date: Fri, 29 Aug 2014 18:32:20 +0200
From: Benoît Canet
Message-ID: <20140829163219.GB16755@irqsave.net>
References: <20140828143809.GB28789@irqsave.net> <20140829160446.GC20193@stefanha-thinkpad.redhat.com>
In-Reply-To: <20140829160446.GC20193@stefanha-thinkpad.redhat.com>
Subject: Re: [Qemu-devel] IO accounting overhaul
To: Stefan Hajnoczi
Cc: Benoît Canet, kwolf@redhat.com, libvir-list@redhat.com, qemu-devel@nongnu.org, armbru@redhat.com, stefanha@redhat.com, anshul.makkar@profitbricks.com

On Friday 29 Aug 2014 at 17:04:46 (+0100), Stefan Hajnoczi wrote:
> On Thu, Aug 28, 2014 at 04:38:09PM +0200, Benoît Canet wrote:
> > I collected some items of a cloud provider wishlist regarding I/O accounting.
> > 
> > In a cloud, I/O accounting can serve three purposes: billing, helping the
> > customers, and doing metrology to help the cloud provider find hidden costs.
> > 
> > I'll cover the two former topics in this mail because they are the most
> > important business-wise.
> > 
> > 1) preferred place to collect billing I/O accounting data:
> > ----------------------------------------------------------
> > For billing purposes the collected data must be as close as possible to what the
> > customer would see by running iostat in his VM.
> > 
> > The first conclusion we can draw is that the choice of collecting I/O accounting
> > data used for billing in the block device models is right.
> 
> I agree. When statistics are collected at lower layers it becomes hard
> for the end user to understand numbers that include hidden costs for
> image formats, network protocols, etc.
> 
> > 2) what to do with occurrences of rare events:
> > ----------------------------------------------
> > 
> > Another point is that QEMU developers agree that they don't know which policy
> > to apply to some I/O accounting events.
> > Must QEMU discard invalid write I/Os or account them as done?
> > Must QEMU count a failed read I/O as done?
> > 
> > When discussing this with a cloud provider the following became clear: these
> > decisions are really specific to each cloud provider and QEMU should not
> > implement them.
> > The right thing to do is to add accounting counters to collect these events.
> > 
> > Moreover these rare events are precious troubleshooting data, so that's an
> > additional reason not to toss them.
> 
> Sounds good; network interface statistics also include error counters.
> 
> > 3) list of block I/O accounting metrics wished for billing and helping the customers
> > ------------------------------------------------------------------------------------
> > 
> > Basic I/O accounting data will end up making the customers' bills.
> > Extra I/O accounting information would be a precious help for the cloud provider
> > to implement a monitoring panel like Amazon CloudWatch.
> 
> One thing to be aware of is that counters inside QEMU cannot be trusted.
> If a malicious guest can overwrite memory in QEMU then the counters can
> be manipulated.
> 
> For most purposes this should be okay. Just be aware that evil guests
> could manipulate their counters if a security hole is found in QEMU.
> 
> > Here is the list of counters and statistics I would like to help implement
> > in QEMU.
> > 
> > This is the most important part of the mail and the one I would like the
> > community to review the most.
> > 
> > Once this list is settled I would proceed to implement the required
> > infrastructure in QEMU before using it in the device models.
> > 
> > /* volume of data transferred by the I/Os */
> > read_bytes
> > write_bytes
> > 
> > /* operation count */
> > read_ios
> > write_ios
> > flush_ios
> > 
> > /* how many invalid I/Os the guest submitted */
> > invalid_read_ios
> > invalid_write_ios
> > invalid_flush_ios
> > 
> > /* how many I/O errors happened */
> > read_ios_error
> > write_ios_error
> > flush_ios_error
> > 
> > /* account the time spent doing I/Os */
> > total_read_time
> > total_write_time
> > total_flush_time
> > 
> > /* since when the volume has been idle */
> > qvolume_idleness_time
> 
> ? s/qv/v/

It's the time the volume spent being idle. Amazon reports it in its tools.

> 
> > 
> > /* the following would compute latencies for slices of 1 second then toss the
> > * result and start a new slice. A weighted summation of the instant latencies
> > * could help to implement this.
> > */
> > 1s_read_average_latency
> > 1s_write_average_latency
> > 1s_flush_average_latency
> > 
> > /* the former three numbers could be used to further compute a 1 minute slice value */
> > 1m_read_average_latency
> > 1m_write_average_latency
> > 1m_flush_average_latency
> > 
> > /* the former three numbers could be used to further compute a 1 hour slice value */
> > 1h_read_average_latency
> > 1h_write_average_latency
> > 1h_flush_average_latency
> > 
> > /* 1 second average number of requests in flight */
> > 1s_read_queue_depth
> > 1s_write_queue_depth
> > 
> > /* 1 minute average number of requests in flight */
> > 1m_read_queue_depth
> > 1m_write_queue_depth
> > 
> > /* 1 hour average number of requests in flight */
> > 1h_read_queue_depth
> > 1h_write_queue_depth
> 
> I think libvirt captures similar data. At least virt-manager displays
> graphs with similar data (maybe for CPU, memory, or network instead of
> disk).
> 
> > 4) Making this happen
> > ---------------------
> > 
> > Outscale wants to make these I/O stats happen and gave me the go-ahead to do
> > whatever grunt work is required to do so.
> > That said, we could collaborate on some parts of the work.
> 
> Seems like a nice improvement to the query-blockstats available today.
> 
> CCing libvirt for management stack ideas.
> 
> Stefan
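
For concreteness, here is a rough sketch of how the basic counters and the
per-slice average latencies could be laid out. All names are placeholders I
made up for illustration; none of this is existing QEMU code or API:

```c
#include <stdint.h>

/* Placeholder per-device accounting structure mirroring the wishlist above. */
typedef struct BlockAcctStats {
    uint64_t read_bytes;          /* volume of data transferred */
    uint64_t write_bytes;
    uint64_t read_ios;            /* operation counts */
    uint64_t write_ios;
    uint64_t flush_ios;
    uint64_t invalid_read_ios;    /* invalid I/Os submitted by the guest */
    uint64_t invalid_write_ios;
    uint64_t invalid_flush_ios;
    uint64_t read_ios_error;      /* failed I/Os */
    uint64_t write_ios_error;
    uint64_t flush_ios_error;
    uint64_t total_read_time_ns;  /* accumulated time spent doing I/Os */
    uint64_t total_write_time_ns;
    uint64_t total_flush_time_ns;
} BlockAcctStats;

/* One latency slice: sum the instant latencies, then average and reset
 * when the slice window (1s, 1m, 1h) ends. */
typedef struct LatencySlice {
    uint64_t sum_ns;
    uint64_t count;
} LatencySlice;

/* Account one completed read and feed its latency into the current slice. */
static void acct_read_done(BlockAcctStats *s, LatencySlice *sl,
                           uint64_t bytes, uint64_t latency_ns)
{
    s->read_bytes += bytes;
    s->read_ios += 1;
    s->total_read_time_ns += latency_ns;
    sl->sum_ns += latency_ns;
    sl->count += 1;
}

/* Close the current slice: return its average latency and start a new one. */
static uint64_t slice_close(LatencySlice *sl)
{
    uint64_t avg = sl->count ? sl->sum_ns / sl->count : 0;
    sl->sum_ns = 0;
    sl->count = 0;
    return avg;
}
```

The 1m and 1h figures would then be derived by aggregating the closed 1s
slices rather than by keeping a second set of raw counters.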