* [Qemu-devel] IO accounting overhaul
@ 2014-08-28 14:38 Benoît Canet
  2014-08-29 16:04 ` Stefan Hajnoczi
  2014-09-01  9:52 ` Markus Armbruster
  0 siblings, 2 replies; 14+ messages in thread
From: Benoît Canet @ 2014-08-28 14:38 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, anshul.makkar, armbru, stefanha


Hi,

I collected some items of a cloud provider's wishlist regarding I/O accounting.

In a cloud, I/O accounting can serve three purposes: billing, helping the customers,
and doing metrology to help the cloud provider seek out hidden costs.

I'll cover the first two topics in this mail because they are the most important
business-wise.

1) preferred place to collect billing I/O accounting data:
--------------------------------------------------------
For billing purposes the collected data must be as close as possible to what the
customer would see by running iostat in his VM.

The first conclusion we can draw is that the choice of collecting the I/O accounting
data used for billing in the block device models is right.

2) what to do with occurrences of rare events:
---------------------------------------------

Another point is that QEMU developers agree that they don't know which policy
to apply to some I/O accounting events.
Must QEMU discard invalid write I/Os or account them as done?
Must QEMU count a failed read I/O as done?

When discussing this with a cloud provider the following appeared: these decisions
are really specific to each cloud provider and QEMU should not implement them.
The right thing to do is to add accounting counters to collect these events.

Moreover, these rare events are precious troubleshooting data, so that's an additional
reason not to toss them.

3) list of block I/O accounting metrics wished for billing and helping the customers
-----------------------------------------------------------------------------------

Basic I/O accounting data will end up making the customers' bills.
Extra I/O accounting information would be a precious help for the cloud provider
to implement a monitoring panel like Amazon CloudWatch.

Here is the list of counters and statistics I would like to help implement in QEMU.

This is the most important part of the mail and the one I would like the community
to review the most.

Once this list is settled I would proceed to implement the required infrastructure
in QEMU before using it in the device models.

/* volume of data transferred by the I/Os */
read_bytes
write_bytes

/* operation count */
read_ios
write_ios
flush_ios

/* how many invalid I/Os the guest submits */
invalid_read_ios
invalid_write_ios
invalid_flush_ios

/* how many I/O errors happened */
read_ios_error
write_ios_error
flush_ios_error

/* account the time spent doing I/Os */
total_read_time
total_write_time
total_flush_time

/* since when the volume is iddle */
qvolume_iddleness_time

/* the following would compute latencies for slices of 1 second then toss the
 * result and start a new slice. A weighted summation of the instant latencies
 * could help to implement this.
 */
1s_read_average_latency
1s_write_average_latency
1s_flush_average_latency

/* the previous three numbers could be used to further compute a 1 minute slice value */
1m_read_average_latency
1m_write_average_latency
1m_flush_average_latency

/* the previous three numbers could be used to further compute a 1 hour slice value */
1h_read_average_latency
1h_write_average_latency
1h_flush_average_latency

/* 1 second average number of requests in flight */
1s_read_queue_depth
1s_write_queue_depth

/* 1 minute average number of requests in flight */
1m_read_queue_depth
1m_write_queue_depth

/* 1 hour average number of requests in flight */
1h_read_queue_depth
1h_write_queue_depth
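
To make this list more concrete, here is a rough sketch of how these counters
could be grouped per device. This is only an illustration of the proposal, not
existing QEMU code, and all names are tentative (BDRV_MAX_IOTYPE is the existing
read/write/flush bound from the block layer):

/* tentative sketch -- one slice per averaging window (1s, 1m, 1h) */
typedef struct BlockAcctSlice {
    uint64_t latency_sum_ns[BDRV_MAX_IOTYPE];  /* summed latencies in the slice */
    uint64_t nr_done[BDRV_MAX_IOTYPE];         /* operations completed in the slice */
    uint64_t queue_depth_sum[BDRV_MAX_IOTYPE]; /* to derive average requests in flight */
} BlockAcctSlice;

typedef struct BlockAcctStats {
    uint64_t nr_bytes[BDRV_MAX_IOTYPE];        /* read_bytes, write_bytes */
    uint64_t nr_ops[BDRV_MAX_IOTYPE];          /* read_ios, write_ios, flush_ios */
    uint64_t invalid_ops[BDRV_MAX_IOTYPE];     /* invalid_*_ios */
    uint64_t failed_ops[BDRV_MAX_IOTYPE];      /* *_ios_error */
    uint64_t total_time_ns[BDRV_MAX_IOTYPE];   /* total_*_time */
    int64_t last_access_time_ns;               /* to derive the idle time */
    BlockAcctSlice slice_1s, slice_1m, slice_1h;
} BlockAcctStats;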

4) Making this happen
-------------------------

Outscale wants to make these I/O stats happen and gave me the go-ahead to do whatever
grunt work is required to do so.
That said, we could collaborate on some parts of the work.

Best regards

Benoît


* Re: [Qemu-devel] IO accounting overhaul
  2014-08-28 14:38 [Qemu-devel] IO accounting overhaul Benoît Canet
@ 2014-08-29 16:04 ` Stefan Hajnoczi
  2014-08-29 16:32   ` Benoît Canet
  2014-09-01  9:52 ` Markus Armbruster
  1 sibling, 1 reply; 14+ messages in thread
From: Stefan Hajnoczi @ 2014-08-29 16:04 UTC (permalink / raw)
  To: Benoît Canet
  Cc: kwolf, libvir-list, qemu-devel, armbru, stefanha, anshul.makkar


On Thu, Aug 28, 2014 at 04:38:09PM +0200, Benoît Canet wrote:
> I collected some items of a cloud provider wishlist regarding I/O accouting.
> 
> In a cloud I/O accouting can have 3 purpose: billing, helping the customers
> and doing metrology to help the cloud provider seeks hidden costs.
> 
> I'll cover the two former topic in this mail because they are the most important
> business wize.
> 
> 1) prefered place to collect billing IO accounting data:
> --------------------------------------------------------
> For billing purpose the collected data must be as close as possible to what the
> customer would see by using iostats in his vm.
> 
> The first conclusion we can draw is that the choice of collecting IO accouting
> data used for billing in the block devices models is right.

I agree.  When statistics are collected at lower layers it becomes hard
for the end user to understand numbers that include hidden costs for
image formats, network protocols, etc.

> 2) what to do with occurences of rare events:
> ---------------------------------------------
> 
> Another point is that QEMU developpers agree that they don't know which policy
> to apply to some I/O accounting events.
> Must QEMU discard invalid I/O write IO or account them as done ?
> Must QEMU count a failed read I/O as done ?
> 
> When discusting this with a cloud provider the following appears: these decisions
> are really specific to each cloud provider and QEMU should not implement them.
> The right thing to do is to add accouting counters to collect these events.
> 
> Moreover these rare events are precious troubleshooting data so it's an additional
> reason not to toss them.

Sounds good, network interface statistics also include error counters.

> 3) list of block I/O accouting metrics wished for billing and helping the customers
> -----------------------------------------------------------------------------------
> 
> Basic I/O accouting data will end up making the customers bills.
> Extra I/O accouting informations would be a precious help for the cloud provider
> to implement a monitoring panel like Amazon Cloudwatch.

One thing to be aware of is that counters inside QEMU cannot be trusted.
If a malicious guest can overwrite memory in QEMU then the counters can
be manipulated.

For most purposes this should be okay.  Just be aware that evil guests
could manipulate their counters if a security hole is found in QEMU.

> Here is the list of counters and statitics I would like to help implement in QEMU.
> 
> This is the most important part of the mail and the one I would like the community
> review the most.
> 
> Once this list is settled I would proceed to implement the required infrastructure
> in QEMU before using it in the device models.
> 
> /* volume of data transfered by the IOs */
> read_bytes
> write_bytes
> 
> /* operation count */
> read_ios
> write_ios
> flush_ios
> 
> /* how many invalid IOs the guest submit */
> invalid_read_ios
> invalid_write_ios
> invalid_flush_ios
> 
> /* how many io error happened */
> read_ios_error
> write_ios_error
> flush_ios_error
> 
> /* account the time passed doing IOs */
> total_read_time
> total_write_time
> total_flush_time
> 
> /* since when the volume is iddle */
> qvolume_iddleness_time

?

> 
> /* the following would compute latecies for slices of 1 seconds then toss the
>  * result and start a new slice. A weighted sumation of the instant latencies
>  * could help to implement this.
>  */
> 1s_read_average_latency
> 1s_write_average_latency
> 1s_flush_average_latency
> 
> /* the former three numbers could be used to further compute a 1 minute slice value */
> 1m_read_average_latency
> 1m_write_average_latency
> 1m_flush_average_latency
> 
> /* the former three numbers could be used to further compute a 1 hours slice value */
> 1h_read_average_latency
> 1h_write_average_latency
> 1h_flush_average_latency
> 
> /* 1 second average number of requests in flight */
> 1s_read_queue_depth
> 1s_write_queue_depth
> 
> /* 1 minute average number of requests in flight */
> 1m_read_queue_depth
> 1m_write_queue_depth
> 
> /* 1 hours average number of requests in flight */
> 1h_read_queue_depth
> 1h_write_queue_depth

I think libvirt captures similar data.  At least virt-manager displays
graphs with similar data (maybe for CPU, memory, or network instead of
disk).

> 4) Making this happen
> -------------------------
> 
> Outscale want to make these IO stat happen and gave me the go to do whatever
> grunt is required to do so.
> That said we could collaborate on some part of the work.

Seems like a nice improvement to the query-blockstats available today.

CCing libvirt for management stack ideas.

Stefan



* Re: [Qemu-devel] IO accounting overhaul
  2014-08-29 16:04 ` Stefan Hajnoczi
@ 2014-08-29 16:32   ` Benoît Canet
  0 siblings, 0 replies; 14+ messages in thread
From: Benoît Canet @ 2014-08-29 16:32 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: Benoît Canet, kwolf, libvir-list, qemu-devel, armbru,
	stefanha, anshul.makkar

On Friday 29 Aug 2014 at 17:04:46 (+0100), Stefan Hajnoczi wrote:
> On Thu, Aug 28, 2014 at 04:38:09PM +0200, Benoît Canet wrote:
> > I collected some items of a cloud provider wishlist regarding I/O accouting.
> > 
> > In a cloud I/O accouting can have 3 purpose: billing, helping the customers
> > and doing metrology to help the cloud provider seeks hidden costs.
> > 
> > I'll cover the two former topic in this mail because they are the most important
> > business wize.
> > 
> > 1) prefered place to collect billing IO accounting data:
> > --------------------------------------------------------
> > For billing purpose the collected data must be as close as possible to what the
> > customer would see by using iostats in his vm.
> > 
> > The first conclusion we can draw is that the choice of collecting IO accouting
> > data used for billing in the block devices models is right.
> 
> I agree.  When statistics are collected at lower layers it becomes are
> for the end user to understand numbers that include hidden costs for
> image formats, network protocols, etc.
> 
> > 2) what to do with occurences of rare events:
> > ---------------------------------------------
> > 
> > Another point is that QEMU developpers agree that they don't know which policy
> > to apply to some I/O accounting events.
> > Must QEMU discard invalid I/O write IO or account them as done ?
> > Must QEMU count a failed read I/O as done ?
> > 
> > When discusting this with a cloud provider the following appears: these decisions
> > are really specific to each cloud provider and QEMU should not implement them.
> > The right thing to do is to add accouting counters to collect these events.
> > 
> > Moreover these rare events are precious troubleshooting data so it's an additional
> > reason not to toss them.
> 
> Sounds good, network interface statistics also include error counters.
> 
> > 3) list of block I/O accouting metrics wished for billing and helping the customers
> > -----------------------------------------------------------------------------------
> > 
> > Basic I/O accouting data will end up making the customers bills.
> > Extra I/O accouting informations would be a precious help for the cloud provider
> > to implement a monitoring panel like Amazon Cloudwatch.
> 
> One thing to be aware of is that counters inside QEMU cannot be trusted.
> If a malicious guest can overwrite memory in QEMU then the counters can
> be manipulated.
> 
> For most purposes this should be okay.  Just be aware that evil guests
> could manipulate their counters if a security hole is found in QEMU.
> 
> > Here is the list of counters and statitics I would like to help implement in QEMU.
> > 
> > This is the most important part of the mail and the one I would like the community
> > review the most.
> > 
> > Once this list is settled I would proceed to implement the required infrastructure
> > in QEMU before using it in the device models.
> > 
> > /* volume of data transfered by the IOs */
> > read_bytes
> > write_bytes
> > 
> > /* operation count */
> > read_ios
> > write_ios
> > flush_ios
> > 
> > /* how many invalid IOs the guest submit */
> > invalid_read_ios
> > invalid_write_ios
> > invalid_flush_ios
> > 
> > /* how many io error happened */
> > read_ios_error
> > write_ios_error
> > flush_ios_error
> > 
> > /* account the time passed doing IOs */
> > total_read_time
> > total_write_time
> > total_flush_time
> > 
> > /* since when the volume is iddle */
> > qvolume_iddleness_time
> 
> ?

s/qv/v/

It's the time the volume has spent being idle.
Amazon reports it in its tools.

> 
> > 
> > /* the following would compute latecies for slices of 1 seconds then toss the
> >  * result and start a new slice. A weighted sumation of the instant latencies
> >  * could help to implement this.
> >  */
> > 1s_read_average_latency
> > 1s_write_average_latency
> > 1s_flush_average_latency
> > 
> > /* the former three numbers could be used to further compute a 1 minute slice value */
> > 1m_read_average_latency
> > 1m_write_average_latency
> > 1m_flush_average_latency
> > 
> > /* the former three numbers could be used to further compute a 1 hours slice value */
> > 1h_read_average_latency
> > 1h_write_average_latency
> > 1h_flush_average_latency
> > 
> > /* 1 second average number of requests in flight */
> > 1s_read_queue_depth
> > 1s_write_queue_depth
> > 
> > /* 1 minute average number of requests in flight */
> > 1m_read_queue_depth
> > 1m_write_queue_depth
> > 
> > /* 1 hours average number of requests in flight */
> > 1h_read_queue_depth
> > 1h_write_queue_depth
> 
> I think libvirt captures similar data.  At least virt-manager displays
> graphs with similar data (maybe for CPU, memory, or network instead of
> disk).
> 
> > 4) Making this happen
> > -------------------------
> > 
> > Outscale want to make these IO stat happen and gave me the go to do whatever
> > grunt is required to do so.
> > That said we could collaborate on some part of the work.
> 
> Seems like a nice improvement to the query-blockstats available today.
> 
> CCing libvirt for management stack ideas.
> 
> Stefan


* Re: [Qemu-devel] IO accounting overhaul
  2014-08-28 14:38 [Qemu-devel] IO accounting overhaul Benoît Canet
  2014-08-29 16:04 ` Stefan Hajnoczi
@ 2014-09-01  9:52 ` Markus Armbruster
  2014-09-01 10:44   ` [Qemu-devel] [libvirt] " Benoît Canet
  1 sibling, 1 reply; 14+ messages in thread
From: Markus Armbruster @ 2014-09-01  9:52 UTC (permalink / raw)
  To: Benoît Canet; +Cc: kwolf, anshul.makkar, libvir-list, qemu-devel, stefanha

Cc'ing libvirt following Stefan's lead.

Benoît Canet <benoit.canet@irqsave.net> writes:

> Hi,
>
> I collected some items of a cloud provider wishlist regarding I/O accouting.

Feedback from real power-users, lovely!

> In a cloud I/O accouting can have 3 purpose: billing, helping the customers
> and doing metrology to help the cloud provider seeks hidden costs.
>
> I'll cover the two former topic in this mail because they are the most important
> business wize.
>
> 1) prefered place to collect billing IO accounting data:
> --------------------------------------------------------
> For billing purpose the collected data must be as close as possible to what the
> customer would see by using iostats in his vm.

Good point.

> The first conclusion we can draw is that the choice of collecting IO accouting
> data used for billing in the block devices models is right.

Slightly rephrasing: doing I/O accounting in the block device models is
right for billing.

There may be other uses for I/O accounting, with different preferences.
For instance, data on how exactly guest I/O gets translated to host I/O
as it flows through the nodes in the block graph could be useful.

Doesn't diminish the need for accurate billing information, of course.

> 2) what to do with occurences of rare events:
> ---------------------------------------------
>
> Another point is that QEMU developpers agree that they don't know which policy
> to apply to some I/O accounting events.
> Must QEMU discard invalid I/O write IO or account them as done ?
> Must QEMU count a failed read I/O as done ?
>
> When discusting this with a cloud provider the following appears:
> these decisions
> are really specific to each cloud provider and QEMU should not implement them.

Good point, consistent with the old advice to avoid baking policy into
inappropriately low levels of the stack.

> The right thing to do is to add accouting counters to collect these events.
>
> Moreover these rare events are precious troubleshooting data so it's
> an additional
> reason not to toss them.

Another good point.

> 3) list of block I/O accouting metrics wished for billing and helping
> the customers
> -----------------------------------------------------------------------------------
>
> Basic I/O accouting data will end up making the customers bills.
> Extra I/O accouting informations would be a precious help for the cloud provider
> to implement a monitoring panel like Amazon Cloudwatch.

These are the first two from your list of three purposes, i.e. the ones
you promised to cover here.

> Here is the list of counters and statitics I would like to help
> implement in QEMU.
>
> This is the most important part of the mail and the one I would like
> the community
> review the most.
>
> Once this list is settled I would proceed to implement the required
> infrastructure
> in QEMU before using it in the device models.

For context, let me recap how I/O accounting works now.

The BlockDriverState abstract data type (short: BDS) can hold the
following accounting data:

    uint64_t nr_bytes[BDRV_MAX_IOTYPE];
    uint64_t nr_ops[BDRV_MAX_IOTYPE];
    uint64_t total_time_ns[BDRV_MAX_IOTYPE];
    uint64_t wr_highest_sector;

where BDRV_MAX_IOTYPE enumerates read, write, flush.

wr_highest_sector is a high watermark updated by the block layer as it
writes sectors.

The other three are *not* touched by the block layer.  Instead, the
block layer provides a pair of functions for device models to update
them:

    void bdrv_acct_start(BlockDriverState *bs, BlockAcctCookie *cookie,
            int64_t bytes, enum BlockAcctType type);
    void bdrv_acct_done(BlockDriverState *bs, BlockAcctCookie *cookie);

bdrv_acct_start() initializes cookie for a read, write, or flush
operation of a certain size.  The size of a flush is always zero.

bdrv_acct_done() adds the operations to the BDS's accounting data.
total_time_ns is incremented by the time between _start() and _done().

You may call _start() without calling _done().  That's a feature.
Device models use it to avoid accounting some requests.

Device models are not supposed to mess with cookie directly, only
through these two functions.
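
For illustration, a typical call sequence in a device model looks roughly like
this (simplified sketch; real device models usually embed the cookie in their
request structure and use the AIO interfaces):

    BlockAcctCookie cookie;
    int ret;

    bdrv_acct_start(s->bs, &cookie, nb_sectors * BDRV_SECTOR_SIZE,
                    BDRV_ACCT_READ);
    ret = bdrv_read(s->bs, sector_num, buf, nb_sectors);
    if (ret >= 0) {
        bdrv_acct_done(s->bs, &cookie);    /* success gets accounted */
    }
    /* on error, the request is simply never accounted today */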

Some device models implement accounting, some don't.  The ones that do
don't agree on how to count invalid guest requests (the ones not passed
to block layer) and failed requests (passed to block layer and failed
there).  It's a mess in part caused by us never writing down what
exactly device models are expected to do.

Accounting data is used by "query-blockstats", and nothing else.

Corollary: even though every BDS holds accounting data, only the ones in
"top" BDSes ever get used.  This is a common block layer blemish, and
we're working on cleaning it up.

If a device model doesn't implement accounting, query-blockstats lies.
Fortunately, its lies are pretty transparent (everything's zero) as long
as you don't do things like connecting a backend to a device model that
doesn't implement accounting after disconnecting it from a device model
that does.  Still, I'd welcome a more honest QMP interface.

For me, this accounting data belongs to the device model, not the BDS.
Naturally, the block device models should use common infrastructure.  I
guess they use the block layer only because it's obvious infrastructure
they share.  Clumsy design.

> /* volume of data transfered by the IOs */
> read_bytes
> write_bytes

This is nr_bytes[BDRV_ACCT_READ] and nr_bytes[BDRV_ACCT_WRITE].

nr_bytes[BDRV_ACCT_FLUSH] is always zero.

Should this count only actual I/O, i.e. accumulated size of successful
operations?

> /* operation count */
> read_ios
> write_ios
> flush_ios
>
> /* how many invalid IOs the guest submit */
> invalid_read_ios
> invalid_write_ios
> invalid_flush_ios
>
> /* how many io error happened */
> read_ios_error
> write_ios_error
> flush_ios_error

This is nr_ops[BDRV_ACCT_READ], nr_ops[BDRV_ACCT_WRITE],
nr_ops[BDRV_ACCT_FLUSH] split up into successful, invalid and failed.

> /* account the time passed doing IOs */
> total_read_time
> total_write_time
> total_flush_time

This is total_time_ns[BDRV_ACCT_READ], total_time_ns[BDRV_ACCT_WRITE],
total_time_ns[BDRV_ACCT_FLUSH].

I guess this should count both successful and failed I/O.  Could throw
in invalid, too, but it's probably too quick to matter.

Please specify the unit clearly.  Either total_FOO_time_ns or total_FOO_ns
would work for me.

> /* since when the volume is iddle */
> qvolume_iddleness_time

"idle"

The obvious way to maintain this information with the current code
would be saving the value of get_clock() in bdrv_acct_done().
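
Roughly like this -- today's bdrv_acct_done() body (paraphrased from memory)
plus one new line, where last_acct_done_ns would be a new field:

    void bdrv_acct_done(BlockDriverState *bs, BlockAcctCookie *cookie)
    {
        assert(cookie->type < BDRV_MAX_IOTYPE);
        bs->nr_bytes[cookie->type] += cookie->bytes;
        bs->nr_ops[cookie->type]++;
        bs->total_time_ns[cookie->type] += get_clock() - cookie->start_time_ns;
        bs->last_acct_done_ns = get_clock();    /* new: "idle since" timestamp */
    }

"Idle since" is then get_clock() - bs->last_acct_done_ns at query time.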

> /* the following would compute latecies for slices of 1 seconds then toss the
>  * result and start a new slice. A weighted sumation of the instant latencies
>  * could help to implement this.
>  */
> 1s_read_average_latency
> 1s_write_average_latency
> 1s_flush_average_latency
>
> /* the former three numbers could be used to further compute a 1
> minute slice value */
> 1m_read_average_latency
> 1m_write_average_latency
> 1m_flush_average_latency
>
> /* the former three numbers could be used to further compute a 1 hours
> slice value */
> 1h_read_average_latency
> 1h_write_average_latency
> 1h_flush_average_latency

This is something like "what we added to total_FOO_time in the last
completed 1s / 1m / 1h time slice divided by the number of additions".
Just another way to accumulate the same raw data, thus no worries.

> /* 1 second average number of requests in flight */
> 1s_read_queue_depth
> 1s_write_queue_depth
>
> /* 1 minute average number of requests in flight */
> 1m_read_queue_depth
> 1m_write_queue_depth
>
> /* 1 hours average number of requests in flight */
> 1h_read_queue_depth
> 1h_write_queue_depth

I guess this involves counting bdrv_acct_start() and bdrv_acct_done().
The "you need not call bdrv_acct_done()" feature may get in the way.
Solvable.
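
One way to do it (sketch only, all fields hypothetical): have _start() and
_done() -- and whatever ends up accounting requests that never complete --
update an in-flight counter plus a time-weighted sum:

    static void acct_update_queue_depth(BlockDriverState *bs,
                                        enum BlockAcctType type, int delta)
    {
        int64_t now = get_clock();

        /* accumulate depth * elapsed time since the last change ... */
        bs->depth_time_sum_ns[type] += bs->in_flight[type]
                                       * (now - bs->last_depth_change_ns[type]);
        bs->last_depth_change_ns[type] = now;
        /* ... then apply the change: +1 from _start(), -1 from _done() */
        bs->in_flight[type] += delta;
    }

The average queue depth over a slice is depth_time_sum_ns divided by the slice
length.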

Permit me a short detour into the other use for I/O accounting I
mentioned: data on how exactly guest I/O gets translated to host I/O as
it flows through the nodes in the block graph.  Do you think this would
be pretty much the same data, just collected at different points?

> 4) Making this happen
> -------------------------
>
> Outscale want to make these IO stat happen and gave me the go to do whatever
> grunt is required to do so.
> That said we could collaborate on some part of the work.

Cool!

A quick stab at tasks:

* QMP interface, either a compatible extension of query-blockstats or a
  new one.

* Rough idea on how to do the shared infrastructure.

* Implement (can be split up into several tasks if desired)


* Re: [Qemu-devel] [libvirt]  IO accounting overhaul
  2014-09-01  9:52 ` Markus Armbruster
@ 2014-09-01 10:44   ` Benoît Canet
  2014-09-01 11:41     ` Markus Armbruster
  0 siblings, 1 reply; 14+ messages in thread
From: Benoît Canet @ 2014-09-01 10:44 UTC (permalink / raw)
  To: Markus Armbruster
  Cc: Benoît Canet, kwolf, libvir-list, qemu-devel, stefanha,
	anshul.makkar

On Monday 01 Sep 2014 at 11:52:00 (+0200), Markus Armbruster wrote:
> Cc'ing libvirt following Stefan's lead.
> 
> Benoît Canet <benoit.canet@irqsave.net> writes:
> 
> > Hi,
> >
> > I collected some items of a cloud provider wishlist regarding I/O accouting.
> 
> Feedback from real power-users, lovely!
> 
> > In a cloud I/O accouting can have 3 purpose: billing, helping the customers
> > and doing metrology to help the cloud provider seeks hidden costs.
> >
> > I'll cover the two former topic in this mail because they are the most important
> > business wize.
> >
> > 1) prefered place to collect billing IO accounting data:
> > --------------------------------------------------------
> > For billing purpose the collected data must be as close as possible to what the
> > customer would see by using iostats in his vm.
> 
> Good point.
> 
> > The first conclusion we can draw is that the choice of collecting IO accouting
> > data used for billing in the block devices models is right.
> 
> Slightly rephrasing: doing I/O accounting in the block device models is
> right for billing.
> 
> There may be other uses for I/O accounting, with different preferences.
> For instance, data on how exactly guest I/O gets translated to host I/O
> as it flows through the nodes in the block graph could be useful.

I think this is the third point, which I named metrology.
Basically it boils down to "Where are the hidden I/O costs of the QEMU block layer?".

> 
> Doesn't diminish the need for accurate billing information, of course.
> 
> > 2) what to do with occurences of rare events:
> > ---------------------------------------------
> >
> > Another point is that QEMU developpers agree that they don't know which policy
> > to apply to some I/O accounting events.
> > Must QEMU discard invalid I/O write IO or account them as done ?
> > Must QEMU count a failed read I/O as done ?
> >
> > When discusting this with a cloud provider the following appears:
> > these decisions
> > are really specific to each cloud provider and QEMU should not implement them.
> 
> Good point, consistent with the old advice to avoid baking policy into
> inappropriately low levels of the stack.
> 
> > The right thing to do is to add accouting counters to collect these events.
> >
> > Moreover these rare events are precious troubleshooting data so it's
> > an additional
> > reason not to toss them.
> 
> Another good point.
> 
> > 3) list of block I/O accouting metrics wished for billing and helping
> > the customers
> > -----------------------------------------------------------------------------------
> >
> > Basic I/O accouting data will end up making the customers bills.
> > Extra I/O accouting informations would be a precious help for the cloud provider
> > to implement a monitoring panel like Amazon Cloudwatch.
> 
> These are the first two from your list of three purposes, i.e. the ones
> you promised to cover here.
> 
> > Here is the list of counters and statitics I would like to help
> > implement in QEMU.
> >
> > This is the most important part of the mail and the one I would like
> > the community
> > review the most.
> >
> > Once this list is settled I would proceed to implement the required
> > infrastructure
> > in QEMU before using it in the device models.
> 
> For context, let me recap how I/O accounting works now.
> 
> The BlockDriverState abstract data type (short: BDS) can hold the
> following accounting data:
> 
>     uint64_t nr_bytes[BDRV_MAX_IOTYPE];
>     uint64_t nr_ops[BDRV_MAX_IOTYPE];
>     uint64_t total_time_ns[BDRV_MAX_IOTYPE];
>     uint64_t wr_highest_sector;
> 
> where BDRV_MAX_IOTYPE enumerates read, write, flush.
> 
> wr_highest_sector is a high watermark updated by the block layer as it
> writes sectors.
> 
> The other three are *not* touched by the block layer.  Instead, the
> block layer provides a pair of functions for device models to update
> them:
> 
>     void bdrv_acct_start(BlockDriverState *bs, BlockAcctCookie *cookie,
>             int64_t bytes, enum BlockAcctType type);
>     void bdrv_acct_done(BlockDriverState *bs, BlockAcctCookie *cookie);
> 
> bdrv_acct_start() initializes cookie for a read, write, or flush
> operation of a certain size.  The size of a flush is always zero.
> 
> bdrv_acct_done() adds the operations to the BDS's accounting data.
> total_time_ns is incremented by the time between _start() and _done().
> 
> You may call _start() without calling _done().  That's a feature.
> Device models use it to avoid accounting some requests.
> 
> Device models are not supposed to mess with cookie directly, only
> through these two functions.
> 
> Some device models implement accounting, some don't.  The ones that do
> don't agree on how to count invalid guest requests (the ones not passed
> to block layer) and failed requests (passed to block layer and failed
> there).  It's a mess in part caused by us never writing down what
> exactly device models are expected to do.
> 
> Accounting data is used by "query-blockstats", and nothing else.
> 
> Corollary: even though every BDS holds accounting data, only the ones in
> "top" BDSes ever get used.  This is a common block layer blemish, and
> we're working on cleaning it up.
> 
> If a device model doesn't implement accounting, query-blockstats lies.
> Fortunately, its lies are pretty transparent (everything's zero) as long
> as you don't do things like connecting a backend to a device model that
> doesn't implement accounting after disconnecting it from a device model
> that does.  Still, I'd welcome a more honest QMP interface.
> 
> For me, this accounting data belongs to the device model, not the BDS.
> Naturally, the block device models should use common infrastructure.  I
> guess they use the block layer only because it's obvious infrastructure
> they share.  Clumsy design.
> 
> > /* volume of data transfered by the IOs */
> > read_bytes
> > write_bytes
> 
> This is nr_bytes[BDRV_ACCT_READ] and nr_bytes[BDRV_ACCT_WRITE].
> 
> nr_bytes[BDRV_ACCT_FLUSH] is always zero.
> 
> Should this count only actual I/O, i.e. accumulated size of successful
> operations?
> 
> > /* operation count */
> > read_ios
> > write_ios
> > flush_ios
> >
> > /* how many invalid IOs the guest submit */
> > invalid_read_ios
> > invalid_write_ios
> > invalid_flush_ios
> >
> > /* how many io error happened */
> > read_ios_error
> > write_ios_error
> > flush_ios_error
> 
> This is nr_ops[BDRV_ACCT_READ], nr_ops[BDRV_ACCT_WRITE],
> nr_ops[BDRV_ACCT_FLUSH] split up into successful, invalid and failed.
> 
> > /* account the time passed doing IOs */
> > total_read_time
> > total_write_time
> > total_flush_time
> 
> This is total_time_ns[BDRV_ACCT_READ], total_time_ns[BDRV_ACCT_WRITE],
> total_time_ns[BDRV_ACCT_FLUSH].
> 
> I guess this should count both successful and failed I/O.  Could throw
> in invalid, too, but it's probably too quick to matter.
> 
> Please specify the unit clearly.  Both total_FOO_time_ns or total_FOO_ns
> would work for me.

Yes _ns is fine for me too.

> 
> > /* since when the volume is iddle */
> > qvolume_iddleness_time
> 
> "idle"
> 
> The obvious way to maintain this information with the current could
> would be saving the value of get_clock() in bdrv_acct_done().
> 
> > /* the following would compute latecies for slices of 1 seconds then toss the
> >  * result and start a new slice. A weighted sumation of the instant latencies
> >  * could help to implement this.
> >  */
> > 1s_read_average_latency
> > 1s_write_average_latency
> > 1s_flush_average_latency
> >
> > /* the former three numbers could be used to further compute a 1
> > minute slice value */
> > 1m_read_average_latency
> > 1m_write_average_latency
> > 1m_flush_average_latency
> >
> > /* the former three numbers could be used to further compute a 1 hours
> > slice value */
> > 1h_read_average_latency
> > 1h_write_average_latency
> > 1h_flush_average_latency
> 
> This is something like "what we added to total_FOO_time in the last
> completed 1s / 1m / 1h time slice divided by the number of additions".
> Just another way to accumulate the same raw data, thus no worries.
> 
> > /* 1 second average number of requests in flight */
> > 1s_read_queue_depth
> > 1s_write_queue_depth
> >
> > /* 1 minute average number of requests in flight */
> > 1m_read_queue_depth
> > 1m_write_queue_depth
> >
> > /* 1 hours average number of requests in flight */
> > 1h_read_queue_depth
> > 1h_write_queue_depth
> 
> I guess this involves counting bdrv_acct_start() and bdrv_acct_done().
> The "you need not call bdrv_acct_done()" feature may get in the way.
> Solvable.
> 
> Permit me a short detour into the other use for I/O accounting I
> mentioned: data on how exactly guest I/O gets translated to host I/O as
> it flows through the nodes in the block graph.  Do you think this would
> be pretty much the same data, just collected at different points?

That's something I would like to take care of in a further sub-project:
optionally collecting the same data for each BDS of the graph.

> 
> > 4) Making this happen
> > -------------------------
> >
> > Outscale want to make these IO stat happen and gave me the go to do whatever
> > grunt is required to do so.
> > That said we could collaborate on some part of the work.
> 
> Cool!
> 
> A quick stab at tasks:
> 
> * QMP interface, either a compatible extension of query-blockstats or a
>   new one.

I would like to extend query-blockstats at first, but I would also
like to postpone the QMP interface changes and just write the
shared infrastructure and deploy it in the device models.

> 
> * Rough idea on how to do the shared infrastructure.

- API-wise I am thinking about adding
bdrv_acct_invalid() and
bdrv_acct_failed(), and systematically issuing bdrv_acct_start() as soon as possible
(a rough sketch of what these could look like follows after the next point).

- To calculate the averages I am thinking about a global timer firing every second
and iterating over the BDS list to do the computations even when there is no
I/O activity. Is it acceptable to have a qemu_mutex per statistics structure?
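
Here is the rough sketch mentioned above (names and exact semantics tentative;
invalid_ops and failed_ops would be new fields next to the existing counters):

void bdrv_acct_invalid(BlockDriverState *bs, enum BlockAcctType type)
{
    /* the request never reached the block layer: there is no cookie and no
     * latency to account, just count the event */
    bs->invalid_ops[type]++;
}

void bdrv_acct_failed(BlockDriverState *bs, BlockAcctCookie *cookie)
{
    assert(cookie->type < BDRV_MAX_IOTYPE);
    bs->failed_ops[cookie->type]++;
    /* optionally still account the time spent on the failed request */
    bs->total_time_ns[cookie->type] += get_clock() - cookie->start_time_ns;
}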

> 
> * Implement (can be split up into several tasks if desired)

First I would like to write a series implementing a backward-compatible API and get it
merged.

Then the deployment of the new API specifics in the device models can be split up and parallelized.

Best regards

Benoît

> 
> --
> libvir-list mailing list
> libvir-list@redhat.com
> https://www.redhat.com/mailman/listinfo/libvir-list


* Re: [Qemu-devel] [libvirt]  IO accounting overhaul
  2014-09-01 10:44   ` [Qemu-devel] [libvirt] " Benoît Canet
@ 2014-09-01 11:41     ` Markus Armbruster
  2014-09-01 13:38       ` Benoît Canet
  2014-09-05 14:30       ` Kevin Wolf
  0 siblings, 2 replies; 14+ messages in thread
From: Markus Armbruster @ 2014-09-01 11:41 UTC (permalink / raw)
  To: Benoît Canet; +Cc: kwolf, libvir-list, qemu-devel, stefanha, anshul.makkar

Benoît Canet <benoit.canet@irqsave.net> writes:

> The Monday 01 Sep 2014 à 11:52:00 (+0200), Markus Armbruster wrote :
>> Cc'ing libvirt following Stefan's lead.
>> 
>> Benoît Canet <benoit.canet@irqsave.net> writes:
>> 
>> > Hi,
>> >
>> > I collected some items of a cloud provider wishlist regarding I/O accouting.
>> 
>> Feedback from real power-users, lovely!
>> 
>> > In a cloud I/O accouting can have 3 purpose: billing, helping the customers
>> > and doing metrology to help the cloud provider seeks hidden costs.
>> >
>> > I'll cover the two former topic in this mail because they are the
>> > most important
>> > business wize.
>> >
>> > 1) prefered place to collect billing IO accounting data:
>> > --------------------------------------------------------
>> > For billing purpose the collected data must be as close as
>> > possible to what the
>> > customer would see by using iostats in his vm.
>> 
>> Good point.
>> 
>> > The first conclusion we can draw is that the choice of collecting
>> > IO accouting
>> > data used for billing in the block devices models is right.
>> 
>> Slightly rephrasing: doing I/O accounting in the block device models is
>> right for billing.
>> 
>> There may be other uses for I/O accounting, with different preferences.
>> For instance, data on how exactly guest I/O gets translated to host I/O
>> as it flows through the nodes in the block graph could be useful.
>
> I think this is the third point that I named as metrology.
> Basically it boils down to "Where are the hidden IO costs of the QEMU
> block layer".

Understood.

>> Doesn't diminish the need for accurate billing information, of course.
>> 
>> > 2) what to do with occurences of rare events:
>> > ---------------------------------------------
>> >
>> > Another point is that QEMU developpers agree that they don't know
>> > which policy
>> > to apply to some I/O accounting events.
>> > Must QEMU discard invalid I/O write IO or account them as done ?
>> > Must QEMU count a failed read I/O as done ?
>> >
>> > When discusting this with a cloud provider the following appears:
>> > these decisions
>> > are really specific to each cloud provider and QEMU should not
>> > implement them.
>> 
>> Good point, consistent with the old advice to avoid baking policy into
>> inappropriately low levels of the stack.
>> 
>> > The right thing to do is to add accouting counters to collect these events.
>> >
>> > Moreover these rare events are precious troubleshooting data so it's
>> > an additional
>> > reason not to toss them.
>> 
>> Another good point.
>> 
>> > 3) list of block I/O accouting metrics wished for billing and helping
>> > the customers
>> > -----------------------------------------------------------------------------------
>> >
>> > Basic I/O accouting data will end up making the customers bills.
>> > Extra I/O accouting informations would be a precious help for the
>> > cloud provider
>> > to implement a monitoring panel like Amazon Cloudwatch.
>> 
>> These are the first two from your list of three purposes, i.e. the ones
>> you promised to cover here.
>> 
>> > Here is the list of counters and statitics I would like to help
>> > implement in QEMU.
>> >
>> > This is the most important part of the mail and the one I would like
>> > the community
>> > review the most.
>> >
>> > Once this list is settled I would proceed to implement the required
>> > infrastructure
>> > in QEMU before using it in the device models.
>> 
>> For context, let me recap how I/O accounting works now.
>> 
>> The BlockDriverState abstract data type (short: BDS) can hold the
>> following accounting data:
>> 
>>     uint64_t nr_bytes[BDRV_MAX_IOTYPE];
>>     uint64_t nr_ops[BDRV_MAX_IOTYPE];
>>     uint64_t total_time_ns[BDRV_MAX_IOTYPE];
>>     uint64_t wr_highest_sector;
>> 
>> where BDRV_MAX_IOTYPE enumerates read, write, flush.
>> 
>> wr_highest_sector is a high watermark updated by the block layer as it
>> writes sectors.
>> 
>> The other three are *not* touched by the block layer.  Instead, the
>> block layer provides a pair of functions for device models to update
>> them:
>> 
>>     void bdrv_acct_start(BlockDriverState *bs, BlockAcctCookie *cookie,
>>             int64_t bytes, enum BlockAcctType type);
>>     void bdrv_acct_done(BlockDriverState *bs, BlockAcctCookie *cookie);
>> 
>> bdrv_acct_start() initializes cookie for a read, write, or flush
>> operation of a certain size.  The size of a flush is always zero.
>> 
>> bdrv_acct_done() adds the operations to the BDS's accounting data.
>> total_time_ns is incremented by the time between _start() and _done().
>> 
>> You may call _start() without calling _done().  That's a feature.
>> Device models use it to avoid accounting some requests.
>> 
>> Device models are not supposed to mess with cookie directly, only
>> through these two functions.
>> 
>> Some device models implement accounting, some don't.  The ones that do
>> don't agree on how to count invalid guest requests (the ones not passed
>> to block layer) and failed requests (passed to block layer and failed
>> there).  It's a mess in part caused by us never writing down what
>> exactly device models are expected to do.
>> 
>> Accounting data is used by "query-blockstats", and nothing else.
>> 
>> Corollary: even though every BDS holds accounting data, only the ones in
>> "top" BDSes ever get used.  This is a common block layer blemish, and
>> we're working on cleaning it up.
>> 
>> If a device model doesn't implement accounting, query-blockstats lies.
>> Fortunately, its lies are pretty transparent (everything's zero) as long
>> as you don't do things like connecting a backend to a device model that
>> doesn't implement accounting after disconnecting it from a device model
>> that does.  Still, I'd welcome a more honest QMP interface.
>> 
>> For me, this accounting data belongs to the device model, not the BDS.
>> Naturally, the block device models should use common infrastructure.  I
>> guess they use the block layer only because it's obvious infrastructure
>> they share.  Clumsy design.
>> 
>> > /* volume of data transfered by the IOs */
>> > read_bytes
>> > write_bytes
>> 
>> This is nr_bytes[BDRV_ACCT_READ] and nr_bytes[BDRV_ACCT_WRITE].
>> 
>> nr_bytes[BDRV_ACCT_FLUSH] is always zero.
>> 
>> Should this count only actual I/O, i.e. accumulated size of successful
>> operations?
>> 
>> > /* operation count */
>> > read_ios
>> > write_ios
>> > flush_ios
>> >
>> > /* how many invalid IOs the guest submit */
>> > invalid_read_ios
>> > invalid_write_ios
>> > invalid_flush_ios
>> >
>> > /* how many io error happened */
>> > read_ios_error
>> > write_ios_error
>> > flush_ios_error
>> 
>> This is nr_ops[BDRV_ACCT_READ], nr_ops[BDRV_ACCT_WRITE],
>> nr_ops[BDRV_ACCT_FLUSH] split up into successful, invalid and failed.
>> 
>> > /* account the time passed doing IOs */
>> > total_read_time
>> > total_write_time
>> > total_flush_time
>> 
>> This is total_time_ns[BDRV_ACCT_READ], total_time_ns[BDRV_ACCT_WRITE],
>> total_time_ns[BDRV_ACCT_FLUSH].
>> 
>> I guess this should count both successful and failed I/O.  Could throw
>> in invalid, too, but it's probably too quick to matter.
>> 
>> Please specify the unit clearly.  Both total_FOO_time_ns or total_FOO_ns
>> would work for me.
>
> Yes _ns is fine for me too.
>
>> 
>> > /* since when the volume is iddle */
>> > qvolume_iddleness_time
>> 
>> "idle"
>> 
>> The obvious way to maintain this information with the current could
>> would be saving the value of get_clock() in bdrv_acct_done().
>> 
>> > /* the following would compute latecies for slices of 1 seconds then toss the
>> >  * result and start a new slice. A weighted sumation of the instant latencies
>> >  * could help to implement this.
>> >  */
>> > 1s_read_average_latency
>> > 1s_write_average_latency
>> > 1s_flush_average_latency
>> >
>> > /* the former three numbers could be used to further compute a 1
>> > minute slice value */
>> > 1m_read_average_latency
>> > 1m_write_average_latency
>> > 1m_flush_average_latency
>> >
>> > /* the former three numbers could be used to further compute a 1 hours
>> > slice value */
>> > 1h_read_average_latency
>> > 1h_write_average_latency
>> > 1h_flush_average_latency
>> 
>> This is something like "what we added to total_FOO_time in the last
>> completed 1s / 1m / 1h time slice divided by the number of additions".
>> Just another way to accumulate the same raw data, thus no worries.
>> 
>> > /* 1 second average number of requests in flight */
>> > 1s_read_queue_depth
>> > 1s_write_queue_depth
>> >
>> > /* 1 minute average number of requests in flight */
>> > 1m_read_queue_depth
>> > 1m_write_queue_depth
>> >
>> > /* 1 hours average number of requests in flight */
>> > 1h_read_queue_depth
>> > 1h_write_queue_depth
>> 
>> I guess this involves counting bdrv_acct_start() and bdrv_acct_done().
>> The "you need not call bdrv_acct_done()" feature may get in the way.
>> Solvable.
>> 
>> Permit me a short detour into the other use for I/O accounting I
>> mentioned: data on how exactly guest I/O gets translated to host I/O as
>> it flows through the nodes in the block graph.  Do you think this would
>> be pretty much the same data, just collected at different points?
>
> That something I would like to take care in a further sub project.

I'm asking now because it impacts where I'd prefer the shared
infrastructure to live.

> Optionally collecting the same data for each BDS of the graph.

If that's the case, keeping the shared infrastructure in the block layer
makes sense.

BDS member acct then holds I/O stats for the BDS.  We currently use it
for something else: I/O stats of the device model backed by this BDS.
That needs to move elsewhere.  Two places come to mind:

1. BlockBackend, when it's available (I resumed working on it last week
   for a bit).  Superficially attractive, because it's close to what we
   have now, but then we have to deal with what to do when the backend
   gets disconnected from its device model, then connected to another
   one.

2. The device models that actually implement I/O accounting.  Since
   query-blockstats names a backend rather than a device model, we need
   a BlockDevOps callback to fetch the stats.  Fetch fails when the
   callback is null.  Lets us distinguish "no stats yet" and "device
   model can't do stats", thus permits a QMP interface that doesn't lie.

Right now, I like (2) better.
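
A sketch of (2), with get_acct_stats and BlockAcctStats purely illustrative
names, not existing code:

    struct BlockDevOps {
        /* ... existing callbacks ... */

        /*
         * Fill *stats with the device model's I/O accounting data.
         * Left null by device models that don't implement accounting, so
         * query-blockstats can say so instead of reporting zeroes.
         */
        void (*get_acct_stats)(void *opaque, BlockAcctStats *stats);
    };

query-blockstats would reach the device model through the backend and fail the
fetch (or mark the stats "not available") when the callback is null.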

>> > 4) Making this happen
>> > -------------------------
>> >
>> > Outscale want to make these IO stat happen and gave me the go to do whatever
>> > grunt is required to do so.
>> > That said we could collaborate on some part of the work.
>> 
>> Cool!
>> 
>> A quick stab at tasks:
>> 
>> * QMP interface, either a compatible extension of query-blockstats or a
>>   new one.
>
> I would like to extend query-blockstat in a first time but I also
> would like to postpone the QMP interface changes and just write the
> shared infrastructure and deploy it in the device models.

Implementing improved data collection need not wait for the QMP design.

>> * Rough idea on how to do the shared infrastructure.
>
> -API wize I think about adding
> bdrv_acct_invalid() and
> bdrv_acct_failed() and systematically issuing a bdrv_acct_start() asap.

Complication: partial success.  Example:

1. Guest requests a read of N sectors.

2. Device model calls
   bdrv_acct_start(s->bs, &req->acct, N * BDRV_SECTOR_SIZE, BDRV_ACCT_READ)

3. Device model examines the request, and deems it valid.

4. Device model passes it to the block layer.

5. Block layer does its thing, but for some reason only M < N sectors
   can be read.  Block layer returns M.

6. What's the device model to do now?  Both bdrv_acct_failed() and
   bdrv_acct_done() would be wrong.

   Should the device model account for a read of size M?  This ignores
   the partial failure.

   Should it split the read into a successful and a failed part for
   accounting purposes?  This miscounts the number of reads.

> -To calculate the averages I think about a global timer firing every seconds
> and iterating of the bds list to make the computations even when there is no
> IO activity. Is it acceptable to have a qemu_mutex by statitic structure ?

Let me try to sketch something simpler.  For brevity, I cover only a
single slice, but adding more is just more of the same.

Have an end-of-slice timestamp.

When accounting an operation, compare current time to end-of-slice (we
already get the current time to measure latency, so this is cheap).  If
end-of-slice is in the past, zero the slice's stats, and update
end-of-slice.

When something asks for stats, do the same, to avoid returning stale
slice stats.
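
In code, something like this (sketch; a single slice, hypothetical fields):

    static void acct_refresh_slice(BlockAcctStats *stats, int64_t now)
    {
        if (now >= stats->slice_end_ns) {
            /* slice expired: toss its contents and start a new one */
            memset(&stats->slice_1s, 0, sizeof(stats->slice_1s));
            stats->slice_end_ns = now + 1000000000LL;    /* 1s in ns */
        }
    }

called with get_clock() both from the accounting functions and from the stats
query.  That should sidestep both the global timer and the per-statistics mutex
question.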

>> * Implement (can be split up into several tasks if desired)
>
> First I would like to write a series implementing a
> backward-compatible API and get it
> merged.
>
> Then the deployment of the new API specifics in the devices models can
> be splitted/parallelized.

Works for me.


* Re: [Qemu-devel] [libvirt]    IO accounting overhaul
  2014-09-01 11:41     ` Markus Armbruster
@ 2014-09-01 13:38       ` Benoît Canet
  2014-09-02 13:59         ` Markus Armbruster
  2014-09-05 14:30       ` Kevin Wolf
  1 sibling, 1 reply; 14+ messages in thread
From: Benoît Canet @ 2014-09-01 13:38 UTC (permalink / raw)
  To: Markus Armbruster
  Cc: Benoît Canet, kwolf, libvir-list, qemu-devel, stefanha,
	anshul.makkar

On Monday 01 Sep 2014 at 13:41:01 (+0200), Markus Armbruster wrote:
> Benoît Canet <benoit.canet@irqsave.net> writes:
> 
> > The Monday 01 Sep 2014 à 11:52:00 (+0200), Markus Armbruster wrote :
> >> Cc'ing libvirt following Stefan's lead.
> >> 
> >> Benoît Canet <benoit.canet@irqsave.net> writes:
> >> 
> >> > Hi,
> >> >
> >> > I collected some items of a cloud provider wishlist regarding I/O accouting.
> >> 
> >> Feedback from real power-users, lovely!
> >> 
> >> > In a cloud I/O accouting can have 3 purpose: billing, helping the customers
> >> > and doing metrology to help the cloud provider seeks hidden costs.
> >> >
> >> > I'll cover the two former topic in this mail because they are the
> >> > most important
> >> > business wize.
> >> >
> >> > 1) prefered place to collect billing IO accounting data:
> >> > --------------------------------------------------------
> >> > For billing purpose the collected data must be as close as
> >> > possible to what the
> >> > customer would see by using iostats in his vm.
> >> 
> >> Good point.
> >> 
> >> > The first conclusion we can draw is that the choice of collecting
> >> > IO accouting
> >> > data used for billing in the block devices models is right.
> >> 
> >> Slightly rephrasing: doing I/O accounting in the block device models is
> >> right for billing.
> >> 
> >> There may be other uses for I/O accounting, with different preferences.
> >> For instance, data on how exactly guest I/O gets translated to host I/O
> >> as it flows through the nodes in the block graph could be useful.
> >
> > I think this is the third point that I named as metrology.
> > Basically it boils down to "Where are the hidden IO costs of the QEMU
> > block layer".
> 
> Understood.
> 
> >> Doesn't diminish the need for accurate billing information, of course.
> >> 
> >> > 2) what to do with occurences of rare events:
> >> > ---------------------------------------------
> >> >
> >> > Another point is that QEMU developpers agree that they don't know
> >> > which policy
> >> > to apply to some I/O accounting events.
> >> > Must QEMU discard invalid I/O write IO or account them as done ?
> >> > Must QEMU count a failed read I/O as done ?
> >> >
> >> > When discusting this with a cloud provider the following appears:
> >> > these decisions
> >> > are really specific to each cloud provider and QEMU should not
> >> > implement them.
> >> 
> >> Good point, consistent with the old advice to avoid baking policy into
> >> inappropriately low levels of the stack.
> >> 
> >> > The right thing to do is to add accouting counters to collect these events.
> >> >
> >> > Moreover these rare events are precious troubleshooting data so it's
> >> > an additional
> >> > reason not to toss them.
> >> 
> >> Another good point.
> >> 
> >> > 3) list of block I/O accouting metrics wished for billing and helping
> >> > the customers
> >> > -----------------------------------------------------------------------------------
> >> >
> >> > Basic I/O accouting data will end up making the customers bills.
> >> > Extra I/O accouting informations would be a precious help for the
> >> > cloud provider
> >> > to implement a monitoring panel like Amazon Cloudwatch.
> >> 
> >> These are the first two from your list of three purposes, i.e. the ones
> >> you promised to cover here.
> >> 
> >> > Here is the list of counters and statitics I would like to help
> >> > implement in QEMU.
> >> >
> >> > This is the most important part of the mail and the one I would like
> >> > the community
> >> > review the most.
> >> >
> >> > Once this list is settled I would proceed to implement the required
> >> > infrastructure
> >> > in QEMU before using it in the device models.
> >> 
> >> For context, let me recap how I/O accounting works now.
> >> 
> >> The BlockDriverState abstract data type (short: BDS) can hold the
> >> following accounting data:
> >> 
> >>     uint64_t nr_bytes[BDRV_MAX_IOTYPE];
> >>     uint64_t nr_ops[BDRV_MAX_IOTYPE];
> >>     uint64_t total_time_ns[BDRV_MAX_IOTYPE];
> >>     uint64_t wr_highest_sector;
> >> 
> >> where BDRV_MAX_IOTYPE enumerates read, write, flush.
> >> 
> >> wr_highest_sector is a high watermark updated by the block layer as it
> >> writes sectors.
> >> 
> >> The other three are *not* touched by the block layer.  Instead, the
> >> block layer provides a pair of functions for device models to update
> >> them:
> >> 
> >>     void bdrv_acct_start(BlockDriverState *bs, BlockAcctCookie *cookie,
> >>             int64_t bytes, enum BlockAcctType type);
> >>     void bdrv_acct_done(BlockDriverState *bs, BlockAcctCookie *cookie);
> >> 
> >> bdrv_acct_start() initializes cookie for a read, write, or flush
> >> operation of a certain size.  The size of a flush is always zero.
> >> 
> >> bdrv_acct_done() adds the operations to the BDS's accounting data.
> >> total_time_ns is incremented by the time between _start() and _done().
> >> 
> >> You may call _start() without calling _done().  That's a feature.
> >> Device models use it to avoid accounting some requests.
> >> 
> >> Device models are not supposed to mess with cookie directly, only
> >> through these two functions.
> >> 
> >> Some device models implement accounting, some don't.  The ones that do
> >> don't agree on how to count invalid guest requests (the ones not passed
> >> to block layer) and failed requests (passed to block layer and failed
> >> there).  It's a mess in part caused by us never writing down what
> >> exactly device models are expected to do.
> >> 
> >> Accounting data is used by "query-blockstats", and nothing else.
> >> 
> >> Corollary: even though every BDS holds accounting data, only the ones in
> >> "top" BDSes ever get used.  This is a common block layer blemish, and
> >> we're working on cleaning it up.
> >> 
> >> If a device model doesn't implement accounting, query-blockstats lies.
> >> Fortunately, its lies are pretty transparent (everything's zero) as long
> >> as you don't do things like connecting a backend to a device model that
> >> doesn't implement accounting after disconnecting it from a device model
> >> that does.  Still, I'd welcome a more honest QMP interface.
> >> 
> >> For me, this accounting data belongs to the device model, not the BDS.
> >> Naturally, the block device models should use common infrastructure.  I
> >> guess they use the block layer only because it's obvious infrastructure
> >> they share.  Clumsy design.
> >> 
> >> > /* volume of data transfered by the IOs */
> >> > read_bytes
> >> > write_bytes
> >> 
> >> This is nr_bytes[BDRV_ACCT_READ] and nr_bytes[BDRV_ACCT_WRITE].
> >> 
> >> nr_bytes[BDRV_ACCT_FLUSH] is always zero.
> >> 
> >> Should this count only actual I/O, i.e. accumulated size of successful
> >> operations?
> >> 
> >> > /* operation count */
> >> > read_ios
> >> > write_ios
> >> > flush_ios
> >> >
> >> > /* how many invalid IOs the guest submit */
> >> > invalid_read_ios
> >> > invalid_write_ios
> >> > invalid_flush_ios
> >> >
> >> > /* how many io error happened */
> >> > read_ios_error
> >> > write_ios_error
> >> > flush_ios_error
> >> 
> >> This is nr_ops[BDRV_ACCT_READ], nr_ops[BDRV_ACCT_WRITE],
> >> nr_ops[BDRV_ACCT_FLUSH] split up into successful, invalid and failed.
> >> 
> >> > /* account the time passed doing IOs */
> >> > total_read_time
> >> > total_write_time
> >> > total_flush_time
> >> 
> >> This is total_time_ns[BDRV_ACCT_READ], total_time_ns[BDRV_ACCT_WRITE],
> >> total_time_ns[BDRV_ACCT_FLUSH].
> >> 
> >> I guess this should count both successful and failed I/O.  Could throw
> >> in invalid, too, but it's probably too quick to matter.
> >> 
> >> Please specify the unit clearly.  Both total_FOO_time_ns or total_FOO_ns
> >> would work for me.
> >
> > Yes _ns is fine for me too.
> >
> >> 
> >> > /* since when the volume is iddle */
> >> > qvolume_iddleness_time
> >> 
> >> "idle"
> >> 
> >> The obvious way to maintain this information with the current could
> >> would be saving the value of get_clock() in bdrv_acct_done().
> >> 
> >> > /* the following would compute latecies for slices of 1 seconds then toss the
> >> >  * result and start a new slice. A weighted sumation of the instant latencies
> >> >  * could help to implement this.
> >> >  */
> >> > 1s_read_average_latency
> >> > 1s_write_average_latency
> >> > 1s_flush_average_latency
> >> >
> >> > /* the former three numbers could be used to further compute a 1
> >> > minute slice value */
> >> > 1m_read_average_latency
> >> > 1m_write_average_latency
> >> > 1m_flush_average_latency
> >> >
> >> > /* the former three numbers could be used to further compute a 1 hours
> >> > slice value */
> >> > 1h_read_average_latency
> >> > 1h_write_average_latency
> >> > 1h_flush_average_latency
> >> 
> >> This is something like "what we added to total_FOO_time in the last
> >> completed 1s / 1m / 1h time slice divided by the number of additions".
> >> Just another way to accumulate the same raw data, thus no worries.
> >> 
> >> > /* 1 second average number of requests in flight */
> >> > 1s_read_queue_depth
> >> > 1s_write_queue_depth
> >> >
> >> > /* 1 minute average number of requests in flight */
> >> > 1m_read_queue_depth
> >> > 1m_write_queue_depth
> >> >
> >> > /* 1 hours average number of requests in flight */
> >> > 1h_read_queue_depth
> >> > 1h_write_queue_depth
> >> 
> >> I guess this involves counting bdrv_acct_start() and bdrv_acct_done().
> >> The "you need not call bdrv_acct_done()" feature may get in the way.
> >> Solvable.
> >> 
> >> Permit me a short detour into the other use for I/O accounting I
> >> mentioned: data on how exactly guest I/O gets translated to host I/O as
> >> it flows through the nodes in the block graph.  Do you think this would
> >> be pretty much the same data, just collected at different points?
> >
> > That's something I would like to take care of in a further subproject.
> 
> I'm asking now because it impacts where I'd prefer the shared
> infrastructure to live.
> 
> > Optionally collecting the same data for each BDS of the graph.
> 
> If that's the case, keeping the shared infrastructure in the block layer
> makes sense.
> 
> BDS member acct then holds I/O stats for the BDS.  We currently use it
> for something else: I/O stats of the device model backed by this BDS.
> That needs to move elsewhere.  Two places come to mind:
> 
> 1. BlockBackend, when it's available (I resumed working on it last week
>    for a bit).  Superficially attractive, because it's close to what we
>    have now, but then we have to deal with what to do when the backend
>    gets disconnected from its device model, then connected to another
>    one.
> 
> 2. The device models that actually implement I/O accounting.  Since
>    query-blockstats names a backend rather than a device model, we need
>    a BlockDevOps callback to fetch the stats.  Fetch fails when the
>    callback is null.  Lets us distinguish "no stats yet" and "device
>    model can't do stats", thus permits a QMP interface that doesn't lie.
> 
> Right now, I like (2) better.

ok
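
Something like a new optional hook in BlockDevOps, I guess.  A rough
sketch (the stats container and the callback name below are placeholders,
nothing of this exists yet):

    /* hypothetical container for what a device model hands back */
    typedef struct BlockAcctStats {
        uint64_t nr_bytes[BDRV_MAX_IOTYPE];
        uint64_t nr_ops[BDRV_MAX_IOTYPE];
        uint64_t total_time_ns[BDRV_MAX_IOTYPE];
    } BlockAcctStats;

    struct BlockDevOps {
        /* ... existing callbacks ... */

        /* Fill *stats with the device model's I/O accounting data.
         * A NULL callback means "device model can't do stats", so
         * query-blockstats can say so instead of lying with zeroes. */
        void (*collect_acct_stats)(void *opaque, BlockAcctStats *stats);
    };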

> 
> >> > 4) Making this happen
> >> > -------------------------
> >> >
> >> > Outscale wants to make these IO stats happen and gave me the go-ahead to do
> >> > whatever grunt work is required to do so.
> >> > That said, we could collaborate on some parts of the work.
> >> 
> >> Cool!
> >> 
> >> A quick stab at tasks:
> >> 
> >> * QMP interface, either a compatible extension of query-blockstats or a
> >>   new one.
> >
> > I would like to extend query-blockstat in a first time but I also
> > would like to postpone the QMP interface changes and just write the
> > shared infrastructure and deploy it in the device models.
> 
> Implementing improved data collection need not wait for the QMP design.
> 
> >> * Rough idea on how to do the shared infrastructure.
> >
> > -API wize I think about adding
> > bdrv_acct_invalid() and
> > bdrv_acct_failed() and systematically issuing a bdrv_acct_start() asap.
> 
> Complication: partial success.  Example:
> 
> 1. Guest requests a read of N sectors.
> 
> 2. Device model calls
>    bdrv_acct_start(s->bs, &req->acct, N * BDRV_SECTOR_SIZE, BDRV_ACCT_READ)
> 
> 3. Device model examines the request, and deems it valid.
> 
> 4. Device model passes it to the block layer.
> 
> 5. Block layer does its thing, but for some reason only M < N sectors
>    can be read.  Block layer returns M.
> 
> 6. What's the device model to do now?  Both bdrv_acct_failed() and
>    bdrv_acct_done() would be wrong.
> 
>    Should the device model account for a read of size M?  This ignores
>    the partial failure.
> 
>    Should it split the read into a successful and a failed part for
>    accounting purposes?  This miscounts the number of reads.

Maybe a bdrv_acct_partial() that accounts the size of the data actually read
in the bandwidth counter and takes care of counting this event.
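
A rough sketch of what I have in mind (the helper and the partial_ops
counter are new, names to be discussed):

    void bdrv_acct_partial(BlockDriverState *bs, BlockAcctCookie *cookie,
                           int64_t bytes_done)
    {
        /* account only what actually made it to or from the guest */
        bs->nr_bytes[cookie->type] += bytes_done;
        bs->nr_ops[cookie->type]++;
        bs->partial_ops[cookie->type]++;    /* new "rare event" counter */
        bs->total_time_ns[cookie->type] += get_clock() - cookie->start_time_ns;
    }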

Maybe we will discover other rare events to account for.

> 
> > -To calculate the averages I think about a global timer firing every second
> > and iterating over the BDS list to make the computations even when there is no
> > IO activity. Is it acceptable to have a qemu_mutex per statistics structure?
> 
> Let me try to sketch something simpler.  For brevity, I cover only a
> single slice, but adding more is just more of the same.
> 
> Have an end-of-slice timestamp.
> 
> When accounting an operation, compare current time to end-of-slice (we
> already get the current time to measure latency, so this is cheap).  If
> end-of-slice is in the past, zero the slice's stats, and update
> end-of-slice.
> 
> When something asks for stats, do the same, to avoid returning stale
> slice stats.
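
Something like this, I guess (sketch only, the struct and the names are
invented):

    typedef struct BlockAcctTimedStats {
        int64_t slice_end_ns;    /* end of the slice being accumulated */
        int64_t slice_len_ns;    /* 1 s here; 1 m and 1 h are more of the same */
        uint64_t nr_ops[BDRV_MAX_IOTYPE];
        uint64_t total_time_ns[BDRV_MAX_IOTYPE];
    } BlockAcctTimedStats;

    /* called both when accounting a request and when stats are queried */
    static void block_acct_roll_slice(BlockAcctTimedStats *s, int64_t now_ns)
    {
        if (now_ns < s->slice_end_ns) {
            return;
        }
        memset(s->nr_ops, 0, sizeof(s->nr_ops));
        memset(s->total_time_ns, 0, sizeof(s->total_time_ns));
        s->slice_end_ns = now_ns + s->slice_len_ns;
    }

The average latency of a slice is then total_time_ns[type] / nr_ops[type].
A real version probably also wants to keep the last completed slice around,
so a query right after a rollover does not return a nearly empty one.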
> 
> >> * Implement (can be split up into several tasks if desired)
> >
> > First I would like to write a series implementing a
> > backward-compatible API and get it
> > merged.
> >
> > Then the deployment of the new API specifics in the device models can
> > be split up and parallelized.
> 
> Works for me.
> 

* Re: [Qemu-devel] [libvirt]    IO accounting overhaul
  2014-09-01 13:38       ` Benoît Canet
@ 2014-09-02 13:59         ` Markus Armbruster
  0 siblings, 0 replies; 14+ messages in thread
From: Markus Armbruster @ 2014-09-02 13:59 UTC (permalink / raw)
  To: Benoît Canet; +Cc: kwolf, libvir-list, qemu-devel, stefanha, anshul.makkar

Benoît Canet <benoit.canet@irqsave.net> writes:

> The Monday 01 Sep 2014 à 13:41:01 (+0200), Markus Armbruster wrote :
>> Benoît Canet <benoit.canet@irqsave.net> writes:
>> 
>> > The Monday 01 Sep 2014 à 11:52:00 (+0200), Markus Armbruster wrote :
[...]
>> >> A quick stab at tasks:
>> >> 
>> >> * QMP interface, either a compatible extension of query-blockstats or a
>> >>   new one.
>> >
>> > I would like to extend query-blockstat in a first time but I also
>> > would like to postpone the QMP interface changes and just write the
>> > shared infrastructure and deploy it in the device models.
>> 
>> Implementing improved data collection need not wait for the QMP design.
>> 
>> >> * Rough idea on how to do the shared infrastructure.
>> >
>> > -API wize I think about adding
>> > bdrv_acct_invalid() and
>> > bdrv_acct_failed() and systematically issuing a bdrv_acct_start() asap.
>> 
>> Complication: partial success.  Example:
>> 
>> 1. Guest requests a read of N sectors.
>> 
>> 2. Device model calls
>>    bdrv_acct_start(s->bs, &req->acct, N * BDRV_SECTOR_SIZE, BDRV_ACCT_READ)
>> 
>> 3. Device model examines the request, and deems it valid.
>> 
>> 4. Device model passes it to the block layer.
>> 
>> 5. Block layer does its thing, but for some reason only M < N sectors
>>    can be read.  Block layer returns M.
>> 
>> 6. What's the device model to do now?  Both bdrv_acct_failed() and
>>    bdrv_acct_done() would be wrong.
>> 
>>    Should the device model account for a read of size M?  This ignores
>>    the partial failure.
>> 
>>    Should it split the read into a successful and a failed part for
>>    accounting purposes?  This miscounts the number of reads.
>
> Maybe a bdrv_acct_partial() that accounts the size of the data actually
> read in the bandwidth counter and takes care of counting this event.

Two sub-questions:

a. What's a convenient interface for device models to report that the
   operation announced with bdrv_acct_start() succeeded only partially?

   bdrv_acct_partial() sounds okay.

   Or maybe pass the #bytes actually done to bdrv_acct_done(): equal to the
   #bytes announced to bdrv_acct_start() means complete success, less means
   partial success.  You could even have negative mean complete failure.
   Could perhaps be more concise.  A rough sketch of this variant follows
   after b.

   I trust you'll develop a preference while making the device models
   use your new interface.

b. How to count a partially successful operation?  In other words, what
   should your answer to a. do?

   I guess I'd be fine with simply counting short I/O as if the request
   had the short size.  But if we decide differently, changing the code
   accordingly should be trivial, so just start with whatever you think
   is right, and leave the debate (if any) to patch review.
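
The sketch promised under a. (the signature change and the failed_ops
counter are hypothetical, nothing of this exists today):

    /* bytes_done: what actually completed; negative means complete failure */
    void bdrv_acct_done(BlockDriverState *bs, BlockAcctCookie *cookie,
                        int64_t bytes_done)
    {
        if (bytes_done < 0) {
            bs->failed_ops[cookie->type]++;       /* hypothetical counter */
        } else {
            /* may be less than cookie->bytes on partial success */
            bs->nr_bytes[cookie->type] += bytes_done;
            bs->nr_ops[cookie->type]++;
        }
        bs->total_time_ns[cookie->type] += get_clock() - cookie->start_time_ns;
    }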

> Maybe we will discover other rare events to account for.

Yes, but we can worry about it when we run into it.

[...]


* Re: [Qemu-devel] [libvirt]  IO accounting overhaul
  2014-09-01 11:41     ` Markus Armbruster
  2014-09-01 13:38       ` Benoît Canet
@ 2014-09-05 14:30       ` Kevin Wolf
  2014-09-05 14:56         ` Benoît Canet
                           ` (3 more replies)
  1 sibling, 4 replies; 14+ messages in thread
From: Kevin Wolf @ 2014-09-05 14:30 UTC (permalink / raw)
  To: Markus Armbruster
  Cc: Benoît Canet, libvir-list, qemu-devel, stefanha, anshul.makkar

Am 01.09.2014 um 13:41 hat Markus Armbruster geschrieben:
> Benoît Canet <benoit.canet@irqsave.net> writes:
> 
> > The Monday 01 Sep 2014 à 11:52:00 (+0200), Markus Armbruster wrote :
> >> Cc'ing libvirt following Stefan's lead.
> >> 
> >> Benoît Canet <benoit.canet@irqsave.net> writes:
> >> > /* the following would compute latecies for slices of 1 seconds then toss the
> >> >  * result and start a new slice. A weighted sumation of the instant latencies
> >> >  * could help to implement this.
> >> >  */
> >> > 1s_read_average_latency
> >> > 1s_write_average_latency
> >> > 1s_flush_average_latency
> >> >
> >> > /* the former three numbers could be used to further compute a 1
> >> > minute slice value */
> >> > 1m_read_average_latency
> >> > 1m_write_average_latency
> >> > 1m_flush_average_latency
> >> >
> >> > /* the former three numbers could be used to further compute a 1 hours
> >> > slice value */
> >> > 1h_read_average_latency
> >> > 1h_write_average_latency
> >> > 1h_flush_average_latency
> >> 
> >> This is something like "what we added to total_FOO_time in the last
> >> completed 1s / 1m / 1h time slice divided by the number of additions".
> >> Just another way to accumulate the same raw data, thus no worries.
> >> 
> >> > /* 1 second average number of requests in flight */
> >> > 1s_read_queue_depth
> >> > 1s_write_queue_depth
> >> >
> >> > /* 1 minute average number of requests in flight */
> >> > 1m_read_queue_depth
> >> > 1m_write_queue_depth
> >> >
> >> > /* 1 hours average number of requests in flight */
> >> > 1h_read_queue_depth
> >> > 1h_write_queue_depth

I don't think I agree with putting fixed time periods like 1 s/min/h
into qemu. What you need there is policy and we should probably make
it configurable.

Do we need accounting for multiple time periods at the same time or
would it be enough to have one and make its duration an option?

> > Optionally collecting the same data for each BDS of the graph.
> 
> If that's the case, keeping the shared infrastructure in the block layer
> makes sense.
> 
> BDS member acct then holds I/O stats for the BDS.  We currently use it
> for something else: I/O stats of the device model backed by this BDS.
> That needs to move elsewhere.  Two places come to mind:
> 
> 1. BlockBackend, when it's available (I resumed working on it last week
>    for a bit).  Superficially attractive, because it's close to what we
>    have now, but then we have to deal with what to do when the backend
>    gets disconnected from its device model, then connected to another
>    one.
> 
> 2. The device models that actually implement I/O accounting.  Since
>    query-blockstats names a backend rather than a device model, we need
>    a BlockDevOps callback to fetch the stats.  Fetch fails when the
>    callback is null.  Lets us distinguish "no stats yet" and "device
>    model can't do stats", thus permits a QMP interface that doesn't lie.
> 
> Right now, I like (2) better.

So let's say I have some block device, which is attached to a guest
device for a while, but then I detach it and continue using it in a
different place (maybe another guest device or a block job). Should we
really reset all counters in query-blockstats to 0?

I think as a user I would be surprised by this, because I still refer
to it by the same name (the device_name, which will be in the BB), so
it's the same thing for me and the total requests include everything
that was ever issued against it.

> > -API wize I think about adding
> > bdrv_acct_invalid() and
> > bdrv_acct_failed() and systematically issuing a bdrv_acct_start() asap.
> 
> Complication: partial success.  Example:
> 
> 1. Guest requests a read of N sectors.
> 
> 2. Device model calls
>    bdrv_acct_start(s->bs, &req->acct, N * BDRV_SECTOR_SIZE, BDRV_ACCT_READ)
> 
> 3. Device model examines the request, and deems it valid.
> 
> 4. Device model passes it to the block layer.
> 
> 5. Block layer does its thing, but for some reason only M < N sectors
>    can be read.  Block layer returns M.

No, it returns -errno.

> 6. What's the device model to do now?  Both bdrv_acct_failed() and
>    bdrv_acct_done() would be wrong.
> 
>    Should the device model account for a read of size M?  This ignores
>    the partial failure.
> 
>    Should it split the read into a successful and a failed part for
>    accounting purposes?  This miscounts the number of reads.

I think we should simply account it as a failed request because this is
what it will look like for the guest. If you want the partial data that
was internally issued, you need to look at different statistics
(probably those of bs->file).

Kevin


* Re: [Qemu-devel] [libvirt]  IO accounting overhaul
  2014-09-05 14:30       ` Kevin Wolf
@ 2014-09-05 14:56         ` Benoît Canet
  2014-09-05 14:57         ` Benoît Canet
                           ` (2 subsequent siblings)
  3 siblings, 0 replies; 14+ messages in thread
From: Benoît Canet @ 2014-09-05 14:56 UTC (permalink / raw)
  To: Kevin Wolf
  Cc: Benoît Canet, libvir-list, Markus Armbruster, qemu-devel,
	stefanha, anshul.makkar

The Friday 05 Sep 2014 à 16:30:31 (+0200), Kevin Wolf wrote :
> Am 01.09.2014 um 13:41 hat Markus Armbruster geschrieben:
> > Benoît Canet <benoit.canet@irqsave.net> writes:
> > 
> > > The Monday 01 Sep 2014 à 11:52:00 (+0200), Markus Armbruster wrote :
> > >> Cc'ing libvirt following Stefan's lead.
> > >> 
> > >> Benoît Canet <benoit.canet@irqsave.net> writes:
> > >> > /* the following would compute latecies for slices of 1 seconds then toss the
> > >> >  * result and start a new slice. A weighted sumation of the instant latencies
> > >> >  * could help to implement this.
> > >> >  */
> > >> > 1s_read_average_latency
> > >> > 1s_write_average_latency
> > >> > 1s_flush_average_latency
> > >> >
> > >> > /* the former three numbers could be used to further compute a 1
> > >> > minute slice value */
> > >> > 1m_read_average_latency
> > >> > 1m_write_average_latency
> > >> > 1m_flush_average_latency
> > >> >
> > >> > /* the former three numbers could be used to further compute a 1 hours
> > >> > slice value */
> > >> > 1h_read_average_latency
> > >> > 1h_write_average_latency
> > >> > 1h_flush_average_latency
> > >> 
> > >> This is something like "what we added to total_FOO_time in the last
> > >> completed 1s / 1m / 1h time slice divided by the number of additions".
> > >> Just another way to accumulate the same raw data, thus no worries.
> > >> 
> > >> > /* 1 second average number of requests in flight */
> > >> > 1s_read_queue_depth
> > >> > 1s_write_queue_depth
> > >> >
> > >> > /* 1 minute average number of requests in flight */
> > >> > 1m_read_queue_depth
> > >> > 1m_write_queue_depth
> > >> >
> > >> > /* 1 hours average number of requests in flight */
> > >> > 1h_read_queue_depth
> > >> > 1h_write_queue_depth
> 
> I don't think I agree with putting fixed time periods like 1 s/min/h
> into qemu. What you need there is policy and we should probably make
> it configurable.
> 
> Do we need accounting for multiple time periods at the same time or
> would it be enough to have one and make its duration an option?
> 
> > > Optionally collecting the same data for each BDS of the graph.
> > 
> > If that's the case, keeping the shared infrastructure in the block layer
> > makes sense.
> > 
> > BDS member acct then holds I/O stats for the BDS.  We currently use it
> > for something else: I/O stats of the device model backed by this BDS.
> > That needs to move elsewhere.  Two places come to mind:
> > 
> > 1. BlockBackend, when it's available (I resumed working on it last week
> >    for a bit).  Superficially attractive, because it's close to what we
> >    have now, but then we have to deal with what to do when the backend
> >    gets disconnected from its device model, then connected to another
> >    one.
> > 
> > 2. The device models that actually implement I/O accounting.  Since
> >    query-blockstats names a backend rather than a device model, we need
> >    a BlockDevOps callback to fetch the stats.  Fetch fails when the
> >    callback is null.  Lets us distinguish "no stats yet" and "device
> >    model can't do stats", thus permits a QMP interface that doesn't lie.
> > 
> > Right now, I like (2) better.
> 
> So let's say I have some block device, which is attached to a guest
> device for a while, but then I detach it and continue using it in a
> different place (maybe another guest device or a block job). Should we
> really reset all counters in query-blockstats to 0?
> 
> I think as I user I would be surprised about this, because I still refer
> to it by the same name (the device_name, which will be in the BB), so
> it's the same thing for me and the total requests include everything
> that was ever issued against it.

It all depends on where you are in the user food chain.
If you are a cloud end user you are interested in the stats associated with
the hardware of your /dev/vdx because that's what iostat allows you to see.

> 
> > > -API wize I think about adding
> > > bdrv_acct_invalid() and
> > > bdrv_acct_failed() and systematically issuing a bdrv_acct_start() asap.
> > 
> > Complication: partial success.  Example:
> > 
> > 1. Guest requests a read of N sectors.
> > 
> > 2. Device model calls
> >    bdrv_acct_start(s->bs, &req->acct, N * BDRV_SECTOR_SIZE, BDRV_ACCT_READ)
> > 
> > 3. Device model examines the request, and deems it valid.
> > 
> > 4. Device model passes it to the block layer.
> > 
> > 5. Block layer does its thing, but for some reason only M < N sectors
> >    can be read.  Block layer returns M.
> 
> No, it returns -errno.
> 
> > 6. What's the device model to do now?  Both bdrv_acct_failed() and
> >    bdrv_acct_done() would be wrong.
> > 
> >    Should the device model account for a read of size M?  This ignores
> >    the partial failure.
> > 
> >    Should it split the read into a successful and a failed part for
> >    accounting purposes?  This miscounts the number of reads.
> 
> I think we should simply account it as a failed request because this is
> what it will look like for the guest. If you want the partial data that
> was internally issued, you need to look at different statistics
> (probably those of bs->file).
> 
> Kevin


* Re: [Qemu-devel] [libvirt]    IO accounting overhaul
  2014-09-05 14:30       ` Kevin Wolf
  2014-09-05 14:56         ` Benoît Canet
@ 2014-09-05 14:57         ` Benoît Canet
  2014-09-05 15:24         ` Benoît Canet
  2014-09-08  7:12         ` Markus Armbruster
  3 siblings, 0 replies; 14+ messages in thread
From: Benoît Canet @ 2014-09-05 14:57 UTC (permalink / raw)
  To: Kevin Wolf
  Cc: Benoît Canet, libvir-list, Markus Armbruster, qemu-devel,
	stefanha, anshul.makkar

The Friday 05 Sep 2014 à 16:30:31 (+0200), Kevin Wolf wrote :
> Am 01.09.2014 um 13:41 hat Markus Armbruster geschrieben:
> > Benoît Canet <benoit.canet@irqsave.net> writes:
> > 
> > > The Monday 01 Sep 2014 à 11:52:00 (+0200), Markus Armbruster wrote :
> > >> Cc'ing libvirt following Stefan's lead.
> > >> 
> > >> Benoît Canet <benoit.canet@irqsave.net> writes:
> > >> > /* the following would compute latecies for slices of 1 seconds then toss the
> > >> >  * result and start a new slice. A weighted sumation of the instant latencies
> > >> >  * could help to implement this.
> > >> >  */
> > >> > 1s_read_average_latency
> > >> > 1s_write_average_latency
> > >> > 1s_flush_average_latency
> > >> >
> > >> > /* the former three numbers could be used to further compute a 1
> > >> > minute slice value */
> > >> > 1m_read_average_latency
> > >> > 1m_write_average_latency
> > >> > 1m_flush_average_latency
> > >> >
> > >> > /* the former three numbers could be used to further compute a 1 hours
> > >> > slice value */
> > >> > 1h_read_average_latency
> > >> > 1h_write_average_latency
> > >> > 1h_flush_average_latency
> > >> 
> > >> This is something like "what we added to total_FOO_time in the last
> > >> completed 1s / 1m / 1h time slice divided by the number of additions".
> > >> Just another way to accumulate the same raw data, thus no worries.
> > >> 
> > >> > /* 1 second average number of requests in flight */
> > >> > 1s_read_queue_depth
> > >> > 1s_write_queue_depth
> > >> >
> > >> > /* 1 minute average number of requests in flight */
> > >> > 1m_read_queue_depth
> > >> > 1m_write_queue_depth
> > >> >
> > >> > /* 1 hours average number of requests in flight */
> > >> > 1h_read_queue_depth
> > >> > 1h_write_queue_depth
> 
> I don't think I agree with putting fixed time periods like 1 s/min/h
> into qemu. What you need there is policy and we should probably make
> it configurable.

I agree.

> 
> Do we need accounting for multiple time periods at the same time or
> would it be enough to have one and make its duration an option?

I don't know yet.

> 
> > > Optionally collecting the same data for each BDS of the graph.
> > 
> > If that's the case, keeping the shared infrastructure in the block layer
> > makes sense.
> > 
> > BDS member acct then holds I/O stats for the BDS.  We currently use it
> > for something else: I/O stats of the device model backed by this BDS.
> > That needs to move elsewhere.  Two places come to mind:
> > 
> > 1. BlockBackend, when it's available (I resumed working on it last week
> >    for a bit).  Superficially attractive, because it's close to what we
> >    have now, but then we have to deal with what to do when the backend
> >    gets disconnected from its device model, then connected to another
> >    one.
> > 
> > 2. The device models that actually implement I/O accounting.  Since
> >    query-blockstats names a backend rather than a device model, we need
> >    a BlockDevOps callback to fetch the stats.  Fetch fails when the
> >    callback is null.  Lets us distinguish "no stats yet" and "device
> >    model can't do stats", thus permits a QMP interface that doesn't lie.
> > 
> > Right now, I like (2) better.
> 
> So let's say I have some block device, which is attached to a guest
> device for a while, but then I detach it and continue using it in a
> different place (maybe another guest device or a block job). Should we
> really reset all counters in query-blockstats to 0?
> 
> I think as I user I would be surprised about this, because I still refer
> to it by the same name (the device_name, which will be in the BB), so
> it's the same thing for me and the total requests include everything
> that was ever issued against it.
> 
> > > -API wize I think about adding
> > > bdrv_acct_invalid() and
> > > bdrv_acct_failed() and systematically issuing a bdrv_acct_start() asap.
> > 
> > Complication: partial success.  Example:
> > 
> > 1. Guest requests a read of N sectors.
> > 
> > 2. Device model calls
> >    bdrv_acct_start(s->bs, &req->acct, N * BDRV_SECTOR_SIZE, BDRV_ACCT_READ)
> > 
> > 3. Device model examines the request, and deems it valid.
> > 
> > 4. Device model passes it to the block layer.
> > 
> > 5. Block layer does its thing, but for some reason only M < N sectors
> >    can be read.  Block layer returns M.
> 
> No, it returns -errno.
> 
> > 6. What's the device model to do now?  Both bdrv_acct_failed() and
> >    bdrv_acct_done() would be wrong.
> > 
> >    Should the device model account for a read of size M?  This ignores
> >    the partial failure.
> > 
> >    Should it split the read into a successful and a failed part for
> >    accounting purposes?  This miscounts the number of reads.
> 
> I think we should simply account it as a failed request because this is
> what it will look like for the guest. If you want the partial data that
> was internally issued, you need to look at different statistics
> (probably those of bs->file).
> 
> Kevin
> 

* Re: [Qemu-devel] [libvirt]    IO accounting overhaul
  2014-09-05 14:30       ` Kevin Wolf
  2014-09-05 14:56         ` Benoît Canet
  2014-09-05 14:57         ` Benoît Canet
@ 2014-09-05 15:24         ` Benoît Canet
  2014-09-08  7:12         ` Markus Armbruster
  3 siblings, 0 replies; 14+ messages in thread
From: Benoît Canet @ 2014-09-05 15:24 UTC (permalink / raw)
  To: Kevin Wolf
  Cc: Benoît Canet, libvir-list, Markus Armbruster, qemu-devel,
	stefanha, anshul.makkar

The Friday 05 Sep 2014 à 16:30:31 (+0200), Kevin Wolf wrote :
> Am 01.09.2014 um 13:41 hat Markus Armbruster geschrieben:
> > Benoît Canet <benoit.canet@irqsave.net> writes:
> > 
> > > The Monday 01 Sep 2014 à 11:52:00 (+0200), Markus Armbruster wrote :
> > >> Cc'ing libvirt following Stefan's lead.
> > >> 
> > >> Benoît Canet <benoit.canet@irqsave.net> writes:
> > >> > /* the following would compute latecies for slices of 1 seconds then toss the
> > >> >  * result and start a new slice. A weighted sumation of the instant latencies
> > >> >  * could help to implement this.
> > >> >  */
> > >> > 1s_read_average_latency
> > >> > 1s_write_average_latency
> > >> > 1s_flush_average_latency
> > >> >
> > >> > /* the former three numbers could be used to further compute a 1
> > >> > minute slice value */
> > >> > 1m_read_average_latency
> > >> > 1m_write_average_latency
> > >> > 1m_flush_average_latency
> > >> >
> > >> > /* the former three numbers could be used to further compute a 1 hours
> > >> > slice value */
> > >> > 1h_read_average_latency
> > >> > 1h_write_average_latency
> > >> > 1h_flush_average_latency
> > >> 
> > >> This is something like "what we added to total_FOO_time in the last
> > >> completed 1s / 1m / 1h time slice divided by the number of additions".
> > >> Just another way to accumulate the same raw data, thus no worries.
> > >> 
> > >> > /* 1 second average number of requests in flight */
> > >> > 1s_read_queue_depth
> > >> > 1s_write_queue_depth
> > >> >
> > >> > /* 1 minute average number of requests in flight */
> > >> > 1m_read_queue_depth
> > >> > 1m_write_queue_depth
> > >> >
> > >> > /* 1 hours average number of requests in flight */
> > >> > 1h_read_queue_depth
> > >> > 1h_write_queue_depth
> 

I asked a cloud provider for some input.

> I don't think I agree with putting fixed time periods like 1 s/min/h
> into qemu. What you need there is policy and we should probably make
> it configurable.

yes, baking policy in qemu is bad.

> 
> Do we need accounting for multiple time periods at the same time or
> would it be enough to have one and make its duration an option?

Having multiple time periods (up to 3) at once would be cool.
Having big knobs to turn in the configuration (one per period) would be
preferable to a configuration nightmare of one or more settings per device.
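
For instance something as simple as a single global table (purely
illustrative, nothing of this exists):

    /* up to three global accounting periods, in ns; 0 = unused.
     * Filled once from the configuration, then applied to every device. */
    #define BLOCK_ACCT_MAX_SLICES 3
    static int64_t block_acct_slice_len_ns[BLOCK_ACCT_MAX_SLICES];

Each device would then just instantiate one slice structure per configured
period.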

> 
> > > Optionally collecting the same data for each BDS of the graph.
> > 
> > If that's the case, keeping the shared infrastructure in the block layer
> > makes sense.
> > 
> > BDS member acct then holds I/O stats for the BDS.  We currently use it
> > for something else: I/O stats of the device model backed by this BDS.
> > That needs to move elsewhere.  Two places come to mind:
> > 
> > 1. BlockBackend, when it's available (I resumed working on it last week
> >    for a bit).  Superficially attractive, because it's close to what we
> >    have now, but then we have to deal with what to do when the backend
> >    gets disconnected from its device model, then connected to another
> >    one.
> > 
> > 2. The device models that actually implement I/O accounting.  Since
> >    query-blockstats names a backend rather than a device model, we need
> >    a BlockDevOps callback to fetch the stats.  Fetch fails when the
> >    callback is null.  Lets us distinguish "no stats yet" and "device
> >    model can't do stats", thus permits a QMP interface that doesn't lie.
> > 
> > Right now, I like (2) better.
> 
> So let's say I have some block device, which is attached to a guest
> device for a while, but then I detach it and continue using it in a
> different place (maybe another guest device or a block job). Should we
> really reset all counters in query-blockstats to 0?
> 
> I think as I user I would be surprised about this, because I still refer
> to it by the same name (the device_name, which will be in the BB), so
> it's the same thing for me and the total requests include everything
> that was ever issued against it.

This particular cloud provider thinks that associating the stats with the
emulated hardware is the least worrisome to manage.

So now we need to discuss this further in the community.

Best regards

Benoît

> 
> > > -API wize I think about adding
> > > bdrv_acct_invalid() and
> > > bdrv_acct_failed() and systematically issuing a bdrv_acct_start() asap.
> > 
> > Complication: partial success.  Example:
> > 
> > 1. Guest requests a read of N sectors.
> > 
> > 2. Device model calls
> >    bdrv_acct_start(s->bs, &req->acct, N * BDRV_SECTOR_SIZE, BDRV_ACCT_READ)
> > 
> > 3. Device model examines the request, and deems it valid.
> > 
> > 4. Device model passes it to the block layer.
> > 
> > 5. Block layer does its thing, but for some reason only M < N sectors
> >    can be read.  Block layer returns M.
> 
> No, it returns -errno.
> 
> > 6. What's the device model to do now?  Both bdrv_acct_failed() and
> >    bdrv_acct_done() would be wrong.
> > 
> >    Should the device model account for a read of size M?  This ignores
> >    the partial failure.
> > 
> >    Should it split the read into a successful and a failed part for
> >    accounting purposes?  This miscounts the number of reads.
> 
> I think we should simply account it as a failed request because this is
> what it will look like for the guest. If you want the partial data that
> was internally issued, you need to look at different statistics
> (probably those of bs->file).
> 
> Kevin
> 

* Re: [Qemu-devel] [libvirt]  IO accounting overhaul
  2014-09-05 14:30       ` Kevin Wolf
                           ` (2 preceding siblings ...)
  2014-09-05 15:24         ` Benoît Canet
@ 2014-09-08  7:12         ` Markus Armbruster
  2014-09-08  9:12           ` Kevin Wolf
  3 siblings, 1 reply; 14+ messages in thread
From: Markus Armbruster @ 2014-09-08  7:12 UTC (permalink / raw)
  To: Kevin Wolf
  Cc: Benoît Canet, libvir-list, qemu-devel, stefanha, anshul.makkar

Kevin Wolf <kwolf@redhat.com> writes:

> Am 01.09.2014 um 13:41 hat Markus Armbruster geschrieben:
>> Benoît Canet <benoit.canet@irqsave.net> writes:
>> 
>> > The Monday 01 Sep 2014 à 11:52:00 (+0200), Markus Armbruster wrote :
>> >> Cc'ing libvirt following Stefan's lead.
>> >> 
>> >> Benoît Canet <benoit.canet@irqsave.net> writes:
>> >> > /* the following would compute latecies for slices of 1 seconds
>> >> > then toss the
>> >> >  * result and start a new slice. A weighted sumation of the
>> >> > instant latencies
>> >> >  * could help to implement this.
>> >> >  */
>> >> > 1s_read_average_latency
>> >> > 1s_write_average_latency
>> >> > 1s_flush_average_latency
>> >> >
>> >> > /* the former three numbers could be used to further compute a 1
>> >> > minute slice value */
>> >> > 1m_read_average_latency
>> >> > 1m_write_average_latency
>> >> > 1m_flush_average_latency
>> >> >
>> >> > /* the former three numbers could be used to further compute a 1 hours
>> >> > slice value */
>> >> > 1h_read_average_latency
>> >> > 1h_write_average_latency
>> >> > 1h_flush_average_latency
>> >> 
>> >> This is something like "what we added to total_FOO_time in the last
>> >> completed 1s / 1m / 1h time slice divided by the number of additions".
>> >> Just another way to accumulate the same raw data, thus no worries.
>> >> 
>> >> > /* 1 second average number of requests in flight */
>> >> > 1s_read_queue_depth
>> >> > 1s_write_queue_depth
>> >> >
>> >> > /* 1 minute average number of requests in flight */
>> >> > 1m_read_queue_depth
>> >> > 1m_write_queue_depth
>> >> >
>> >> > /* 1 hours average number of requests in flight */
>> >> > 1h_read_queue_depth
>> >> > 1h_write_queue_depth
>
> I don't think I agree with putting fixed time periods like 1 s/min/h
> into qemu. What you need there is policy and we should probably make
> it configurable.
>
> Do we need accounting for multiple time periods at the same time or
> would it be enough to have one and make its duration an option?
>
>> > Optionally collecting the same data for each BDS of the graph.
>> 
>> If that's the case, keeping the shared infrastructure in the block layer
>> makes sense.
>> 
>> BDS member acct then holds I/O stats for the BDS.  We currently use it
>> for something else: I/O stats of the device model backed by this BDS.
>> That needs to move elsewhere.  Two places come to mind:
>> 
>> 1. BlockBackend, when it's available (I resumed working on it last week
>>    for a bit).  Superficially attractive, because it's close to what we
>>    have now, but then we have to deal with what to do when the backend
>>    gets disconnected from its device model, then connected to another
>>    one.
>> 
>> 2. The device models that actually implement I/O accounting.  Since
>>    query-blockstats names a backend rather than a device model, we need
>>    a BlockDevOps callback to fetch the stats.  Fetch fails when the
>>    callback is null.  Lets us distinguish "no stats yet" and "device
>>    model can't do stats", thus permits a QMP interface that doesn't lie.
>> 
>> Right now, I like (2) better.
>
> So let's say I have some block device, which is attached to a guest
> device for a while, but then I detach it and continue using it in a
> different place (maybe another guest device or a block job). Should we
> really reset all counters in query-blockstats to 0?
>
> I think as I user I would be surprised about this, because I still refer
> to it by the same name (the device_name, which will be in the BB), so
> it's the same thing for me and the total requests include everything
> that was ever issued against it.

In my opinion, what's wrong here is the user interface: query-blockstats
lets you query device model I/O statistics, but they're reported for the
backend rather than the device model.  This is confusing.

Once you accept that these statistics measure device model behavior,
it's no longer surprising they go away on device model destruction.

We may want to measure BDS behavior, too.  But that's a separate set of
stats.

>> > -API wize I think about adding
>> > bdrv_acct_invalid() and
>> > bdrv_acct_failed() and systematically issuing a bdrv_acct_start() asap.
>> 
>> Complication: partial success.  Example:
>> 
>> 1. Guest requests a read of N sectors.
>> 
>> 2. Device model calls
>>    bdrv_acct_start(s->bs, &req->acct, N * BDRV_SECTOR_SIZE, BDRV_ACCT_READ)
>> 
>> 3. Device model examines the request, and deems it valid.
>> 
>> 4. Device model passes it to the block layer.
>> 
>> 5. Block layer does its thing, but for some reason only M < N sectors
>>    can be read.  Block layer returns M.
>
> No, it returns -errno.

Really?

Consider a device that can arrange for a DMA of multiple sectors
into/from guest memory, where partial success can happen, and the device
can tell the OS how much I/O succeeded then.

Now let's build a device model.  Consider a read.  The guest passes some
guest memory to fill to the device model.  The device model passes it on
to the block layer.  The block layer succeeds only partially.  Now the
device model needs to figure out how much succeeded, so it can tell the
guest OS.  How does it do that?

>> 6. What's the device model to do now?  Both bdrv_acct_failed() and
>>    bdrv_acct_done() would be wrong.
>> 
>>    Should the device model account for a read of size M?  This ignores
>>    the partial failure.
>> 
>>    Should it split the read into a successful and a failed part for
>>    accounting purposes?  This miscounts the number of reads.
>
> I think we should simply account it as a failed request because this is
> what it will look like for the guest. If you want the partial data that
> was internally issued, you need to look at different statistics
> (probably those of bs->file).

I doubt "complete success" and "complete failure" is all guests could
ever see.


* Re: [Qemu-devel] [libvirt]  IO accounting overhaul
  2014-09-08  7:12         ` Markus Armbruster
@ 2014-09-08  9:12           ` Kevin Wolf
  0 siblings, 0 replies; 14+ messages in thread
From: Kevin Wolf @ 2014-09-08  9:12 UTC (permalink / raw)
  To: Markus Armbruster
  Cc: Benoît Canet, libvir-list, qemu-devel, stefanha, anshul.makkar

Am 08.09.2014 um 09:12 hat Markus Armbruster geschrieben:
> Kevin Wolf <kwolf@redhat.com> writes:
> 
> > Am 01.09.2014 um 13:41 hat Markus Armbruster geschrieben:
> >> Benoît Canet <benoit.canet@irqsave.net> writes:
> >> 
> >> > The Monday 01 Sep 2014 à 11:52:00 (+0200), Markus Armbruster wrote :
> >> >> Cc'ing libvirt following Stefan's lead.
> >> >> 
> >> >> Benoît Canet <benoit.canet@irqsave.net> writes:
> >> >> > /* the following would compute latecies for slices of 1 seconds
> >> >> > then toss the
> >> >> >  * result and start a new slice. A weighted sumation of the
> >> >> > instant latencies
> >> >> >  * could help to implement this.
> >> >> >  */
> >> >> > 1s_read_average_latency
> >> >> > 1s_write_average_latency
> >> >> > 1s_flush_average_latency
> >> >> >
> >> >> > /* the former three numbers could be used to further compute a 1
> >> >> > minute slice value */
> >> >> > 1m_read_average_latency
> >> >> > 1m_write_average_latency
> >> >> > 1m_flush_average_latency
> >> >> >
> >> >> > /* the former three numbers could be used to further compute a 1 hours
> >> >> > slice value */
> >> >> > 1h_read_average_latency
> >> >> > 1h_write_average_latency
> >> >> > 1h_flush_average_latency
> >> >> 
> >> >> This is something like "what we added to total_FOO_time in the last
> >> >> completed 1s / 1m / 1h time slice divided by the number of additions".
> >> >> Just another way to accumulate the same raw data, thus no worries.
> >> >> 
> >> >> > /* 1 second average number of requests in flight */
> >> >> > 1s_read_queue_depth
> >> >> > 1s_write_queue_depth
> >> >> >
> >> >> > /* 1 minute average number of requests in flight */
> >> >> > 1m_read_queue_depth
> >> >> > 1m_write_queue_depth
> >> >> >
> >> >> > /* 1 hours average number of requests in flight */
> >> >> > 1h_read_queue_depth
> >> >> > 1h_write_queue_depth
> >
> > I don't think I agree with putting fixed time periods like 1 s/min/h
> > into qemu. What you need there is policy and we should probably make
> > it configurable.
> >
> > Do we need accounting for multiple time periods at the same time or
> > would it be enough to have one and make its duration an option?
> >
> >> > Optionally collecting the same data for each BDS of the graph.
> >> 
> >> If that's the case, keeping the shared infrastructure in the block layer
> >> makes sense.
> >> 
> >> BDS member acct then holds I/O stats for the BDS.  We currently use it
> >> for something else: I/O stats of the device model backed by this BDS.
> >> That needs to move elsewhere.  Two places come to mind:
> >> 
> >> 1. BlockBackend, when it's available (I resumed working on it last week
> >>    for a bit).  Superficially attractive, because it's close to what we
> >>    have now, but then we have to deal with what to do when the backend
> >>    gets disconnected from its device model, then connected to another
> >>    one.
> >> 
> >> 2. The device models that actually implement I/O accounting.  Since
> >>    query-blockstats names a backend rather than a device model, we need
> >>    a BlockDevOps callback to fetch the stats.  Fetch fails when the
> >>    callback is null.  Lets us distinguish "no stats yet" and "device
> >>    model can't do stats", thus permits a QMP interface that doesn't lie.
> >> 
> >> Right now, I like (2) better.
> >
> > So let's say I have some block device, which is attached to a guest
> > device for a while, but then I detach it and continue using it in a
> > different place (maybe another guest device or a block job). Should we
> > really reset all counters in query-blockstats to 0?
> >
> > I think as I user I would be surprised about this, because I still refer
> > to it by the same name (the device_name, which will be in the BB), so
> > it's the same thing for me and the total requests include everything
> > that was ever issued against it.
> 
> In my opinion, what's wrong here is the user interface: query-blockstats
> lets you query device model I/O statistics, but they're reported for the
> backend rather than the device model.  This is confusing.
> 
> Once you accept that these statistics measure device model behavior,
> it's no longer surprising they go away on device model destruction.
>
> We may want to measure BDS behavior, too.  But that's a separate set of
> stats.

If they measure device model behaviour rather than backend behaviour
(which results in the same numbers as long as you keep them attached),
then the API is broken because it should be using the qdev ID to
identify the device, and not the backend's device_name.

So I would argue that we're measuring the backend (even though as an
implementation detail we don't measure _in_ the backend), and if we
want to measure the device model, too, _that_ is what needs a new
interface.

> >> > -API wize I think about adding
> >> > bdrv_acct_invalid() and
> >> > bdrv_acct_failed() and systematically issuing a bdrv_acct_start() asap.
> >> 
> >> Complication: partial success.  Example:
> >> 
> >> 1. Guest requests a read of N sectors.
> >> 
> >> 2. Device model calls
> >>    bdrv_acct_start(s->bs, &req->acct, N * BDRV_SECTOR_SIZE, BDRV_ACCT_READ)
> >> 
> >> 3. Device model examines the request, and deems it valid.
> >> 
> >> 4. Device model passes it to the block layer.
> >> 
> >> 5. Block layer does its thing, but for some reason only M < N sectors
> >>    can be read.  Block layer returns M.
> >
> > No, it returns -errno.
> 
> Really?
> 
> Consider a device that can arrange for a DMA of multiple sectors
> into/from guest memory, where partial success can happen, and the device
> can tell the OS how much I/O succeeded then.

I don't think we have any such device and can't think of any that we
might want to implement. But let's assume the existence of such a device
for the sake of the argument.

> Now let's build a device model.  Consider a read.  The guest passes some
> guest memory to fill to the device model.  The device model passes it on
> to the block layer.  The block layer succeeds only partially.  Now the
> device model needs to figure out how much succeeded, so it can tell the
> guest OS.  How does it do that?

Currently it doesn't. The block layer doesn't have a concept of
"succeeding partially". A request either succeeds fully or it fails. If
you want to expose partial success to the guest, you would have to
fundamentally change the block layer read/write APIs (probably it's
"just" allowing short reads/writes, but today everyone relies on the
fact that those don't happen).

In practice, we would probably just never emulate partial success but
signal failure even to a device that could handle partial success.
Trying to be clever in corner cases is hardly ever a good idea anyway.

> >> 6. What's the device model to do now?  Both bdrv_acct_failed() and
> >>    bdrv_acct_done() would be wrong.
> >> 
> >>    Should the device model account for a read of size M?  This ignores
> >>    the partial failure.
> >> 
> >>    Should it split the read into a successful and a failed part for
> >>    accounting purposes?  This miscounts the number of reads.
> >
> > I think we should simply account it as a failed request because this is
> > what it will look like for the guest. If you want the partial data that
> > was internally issued, you need to look at different statistics
> > (probably those of bs->file).
> 
> I doubt "complete success" and "complete failure" is all guests could
> ever see.

If you define everything that isn't a complete success as a failure,
then you've covered all cases.

Kevin


Thread overview: 14+ messages
2014-08-28 14:38 [Qemu-devel] IO accounting overhaul Benoît Canet
2014-08-29 16:04 ` Stefan Hajnoczi
2014-08-29 16:32   ` Benoît Canet
2014-09-01  9:52 ` Markus Armbruster
2014-09-01 10:44   ` [Qemu-devel] [libvirt] " Benoît Canet
2014-09-01 11:41     ` Markus Armbruster
2014-09-01 13:38       ` Benoît Canet
2014-09-02 13:59         ` Markus Armbruster
2014-09-05 14:30       ` Kevin Wolf
2014-09-05 14:56         ` Benoît Canet
2014-09-05 14:57         ` Benoît Canet
2014-09-05 15:24         ` Benoît Canet
2014-09-08  7:12         ` Markus Armbruster
2014-09-08  9:12           ` Kevin Wolf
