Date: Fri, 29 Aug 2014 17:04:46 +0100
From: Stefan Hajnoczi
To: Benoît Canet
Cc: kwolf@redhat.com, libvir-list@redhat.com, qemu-devel@nongnu.org, armbru@redhat.com, stefanha@redhat.com, anshul.makkar@profitbricks.com
Subject: Re: [Qemu-devel] IO accounting overhaul
Message-ID: <20140829160446.GC20193@stefanha-thinkpad.redhat.com>
In-Reply-To: <20140828143809.GB28789@irqsave.net>

On Thu, Aug 28, 2014 at 04:38:09PM +0200, Benoît Canet wrote:
> I collected some items from a cloud provider's wishlist regarding I/O accounting.
>
> In a cloud, I/O accounting can serve 3 purposes: billing, helping the customers,
> and doing metrology to help the cloud provider find hidden costs.
>
> I'll cover the first two topics in this mail because they are the most important
> business-wise.
>
> 1) preferred place to collect billing IO accounting data:
> ---------------------------------------------------------
> For billing purposes the collected data must be as close as possible to what the
> customer would see by running iostat in his VM.
>
> The first conclusion we can draw is that collecting the IO accounting data used
> for billing in the block device models is the right choice.

I agree.  When statistics are collected at lower layers it becomes hard
for the end user to understand numbers that include hidden costs for
image formats, network protocols, etc.

> 2) what to do with occurrences of rare events:
> ----------------------------------------------
>
> Another point is that QEMU developers agree that they don't know which policy
> to apply to some I/O accounting events.
> Must QEMU discard invalid write I/Os or account them as done?
> Must QEMU count a failed read I/O as done?
>
> When discussing this with a cloud provider the following emerged: these decisions
> are really specific to each cloud provider and QEMU should not implement them.
> The right thing to do is to add accounting counters to collect these events.
>
> Moreover, these rare events are precious troubleshooting data, which is an
> additional reason not to toss them.

Sounds good, network interface statistics also include error counters.

> 3) list of block I/O accounting metrics wished for billing and helping the customers
> -----------------------------------------------------------------------------------
>
> Basic I/O accounting data will end up making the customers' bills.
> Extra I/O accounting information would be a precious help for the cloud provider
> to implement a monitoring panel like Amazon CloudWatch.

One thing to be aware of is that counters inside QEMU cannot be trusted.
If a malicious guest can overwrite memory in QEMU then the counters can
be manipulated.  For most purposes this should be okay.
Just be aware that evil guests could manipulate their counters if a
security hole is found in QEMU.

> Here is the list of counters and statistics I would like to help implement in QEMU.
>
> This is the most important part of the mail and the one I would most like the
> community to review.
>
> Once this list is settled I would proceed to implement the required infrastructure
> in QEMU before using it in the device models.
>
> /* volume of data transferred by the IOs */
> read_bytes
> write_bytes
>
> /* operation count */
> read_ios
> write_ios
> flush_ios
>
> /* how many invalid IOs the guest submits */
> invalid_read_ios
> invalid_write_ios
> invalid_flush_ios
>
> /* how many IO errors happened */
> read_ios_error
> write_ios_error
> flush_ios_error
>
> /* account the time spent doing IOs */
> total_read_time
> total_write_time
> total_flush_time
>
> /* time since the volume became idle */
> qvolume_iddleness_time ?
>
> /* the following would compute latencies for slices of 1 second, then toss the
>  * result and start a new slice. A weighted summation of the instant latencies
>  * could help to implement this.
>  */
> 1s_read_average_latency
> 1s_write_average_latency
> 1s_flush_average_latency
>
> /* the former three numbers could be used to further compute a 1 minute slice value */
> 1m_read_average_latency
> 1m_write_average_latency
> 1m_flush_average_latency
>
> /* the former three numbers could be used to further compute a 1 hour slice value */
> 1h_read_average_latency
> 1h_write_average_latency
> 1h_flush_average_latency
>
> /* 1 second average number of requests in flight */
> 1s_read_queue_depth
> 1s_write_queue_depth
>
> /* 1 minute average number of requests in flight */
> 1m_read_queue_depth
> 1m_write_queue_depth
>
> /* 1 hour average number of requests in flight */
> 1h_read_queue_depth
> 1h_write_queue_depth

I think libvirt captures similar data.
At least virt-manager displays graphs with similar data (maybe for CPU,
memory, or network instead of disk).

> 4) Making this happen
> ---------------------
>
> Outscale wants to make these IO stats happen and gave me the go-ahead to do
> whatever grunt work is required to do so.
> That said, we could collaborate on some parts of the work.

Seems like a nice improvement to the query-blockstats available today.

CCing libvirt for management stack ideas.

Stefan
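Below is a minimal C sketch of how the counters from point 3 and the 1-second latency slices might be kept in a device model. It assumes the caller passes monotonic timestamps in nanoseconds. Every name here (struct io_stats, io_stats_account, struct latency_slice, latency_slice_add) is hypothetical and is not QEMU's existing block accounting API. Following point 2, invalid and failed requests only bump their own counters, so the billing policy stays with the cloud provider.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical request types and outcomes, mirroring the proposed counters. */
enum io_type { IO_READ, IO_WRITE, IO_FLUSH, IO_TYPE_MAX };
enum io_outcome { IO_DONE, IO_INVALID, IO_FAILED };

struct io_stats {
    uint64_t bytes[IO_TYPE_MAX];         /* read_bytes, write_bytes */
    uint64_t ios[IO_TYPE_MAX];           /* read_ios, write_ios, flush_ios */
    uint64_t invalid_ios[IO_TYPE_MAX];   /* invalid_*_ios */
    uint64_t failed_ios[IO_TYPE_MAX];    /* *_ios_error */
    uint64_t total_time_ns[IO_TYPE_MAX]; /* total_*_time */
};

/* Record one finished request.  In this sketch only successful requests
 * add to the byte, operation and time counters; invalid and failed ones
 * are counted separately and their billing is left to the provider. */
static void io_stats_account(struct io_stats *s, enum io_type type,
                             enum io_outcome outcome, uint64_t bytes,
                             uint64_t latency_ns)
{
    assert(type < IO_TYPE_MAX);
    switch (outcome) {
    case IO_INVALID:
        s->invalid_ios[type]++;
        break;
    case IO_FAILED:
        s->failed_ios[type]++;
        break;
    case IO_DONE:
        s->ios[type]++;
        s->bytes[type] += bytes;
        s->total_time_ns[type] += latency_ns;
        break;
    }
}

/* Average latency over fixed slices (e.g. 1 s).  The average of a finished
 * slice is published lazily, by the first request arriving after it ends. */
struct latency_slice {
    uint64_t slice_ns;    /* slice length */
    uint64_t start_ns;    /* start of the current slice */
    uint64_t sum_ns;      /* sum of latencies completed in the current slice */
    uint64_t count;       /* requests completed in the current slice */
    uint64_t last_avg_ns; /* average of the most recently finished slice */
};

static void latency_slice_add(struct latency_slice *ls, uint64_t now_ns,
                              uint64_t latency_ns)
{
    assert(ls->slice_ns > 0 && now_ns >= ls->start_ns);
    uint64_t elapsed = now_ns - ls->start_ns;

    if (elapsed >= ls->slice_ns) {
        /* A finished slice with no requests publishes 0.  If more than one
         * boundary was crossed, the latest finished slice was empty. */
        ls->last_avg_ns = (elapsed < 2 * ls->slice_ns && ls->count)
                              ? ls->sum_ns / ls->count : 0;
        /* Toss the old slice and start a new one aligned to the grid. */
        ls->start_ns = now_ns - elapsed % ls->slice_ns;
        ls->sum_ns = 0;
        ls->count = 0;
    }
    ls->sum_ns += latency_ns;
    ls->count++;
}
```

For the 1-minute values, the sketch could be extended so that each finished second's sum and count feed a second accumulator of the same shape. Dividing total sums by total counts, rather than averaging the per-second averages, gives the request-weighted summation suggested in the proposal.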