* Interpreting ceph osd pool stats output
@ 2017-03-10  2:37 Paul Cuzner
  2017-03-10  9:55 ` John Spray
  2017-03-20 14:20 ` Ruben Kerkhof
  0 siblings, 2 replies; 18+ messages in thread
From: Paul Cuzner @ 2017-03-10  2:37 UTC (permalink / raw)
  To: ceph-devel

Hi,

I've been putting together a collectd plugin for ceph, since the old
ones I could find no longer work. I'm gathering data from the mon's
admin socket, merged with a couple of commands I issue through the
rados mon_command interface.
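In case it helps, the mon_command call is roughly this (a minimal
python-rados sketch assuming the default ceph.conf path and an admin
keyring - not the actual plugin code):

import json
import rados

# Assumptions: default conf path and a client.admin keyring are available.
cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
cluster.connect()

# "osd pool stats" issued through the mon_command interface, asking for JSON.
cmd = json.dumps({"prefix": "osd pool stats", "format": "json"})
ret, outbuf, errs = cluster.mon_command(cmd, b'')
if ret == 0:
    pool_stats = json.loads(outbuf)
cluster.shutdown()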

Nothing complicated, but the data has me a little confused.

When I run "osd pool stats" I get *two* different sets of metrics that
describe client i/o and recovery i/o. Since the metrics are different
I can't merge them to get a consistent view of what the cluster is
doing as a whole at any given point in time. For example, client i/o
reports in bytes_sec, but the recovery dict is empty and the
recovery_rate is in objects_sec...

i.e.

}, {
    "pool_name": "rados-bench-cbt",
    "pool_id": 86,
    "recovery": {},
    "recovery_rate": {
        "recovering_objects_per_sec": 3530,
        "recovering_bytes_per_sec": 14462655,
        "recovering_keys_per_sec": 0,
        "num_objects_recovered": 7148,
        "num_bytes_recovered": 29278208,
        "num_keys_recovered": 0
    },
    "client_io_rate": {}

This is running Jewel - 10.2.5-37.el7cp

Is this a bug or a 'feature' :)

Cheers,

Paul C


* Re: Interpreting ceph osd pool stats output
  2017-03-10  2:37 Interpreting ceph osd pool stats output Paul Cuzner
@ 2017-03-10  9:55 ` John Spray
  2017-03-10 20:52   ` Paul Cuzner
  2017-03-20 14:20 ` Ruben Kerkhof
  1 sibling, 1 reply; 18+ messages in thread
From: John Spray @ 2017-03-10  9:55 UTC (permalink / raw)
  To: Paul Cuzner; +Cc: Ceph Development

The reason they're different is that they originate from separate
internal counters:
 * The client_io_rate bits come from
https://github.com/ceph/ceph/blob/jewel/src/mon/PGMap.cc#L1212
 * The recovery bits come from
https://github.com/ceph/ceph/blob/jewel/src/mon/PGMap.cc#L1146

Not sure what you mean about bytes_sec vs objects_sec: client IO and
recovery rate each have both objects and bytes counters.

The empty dicts are something that annoys me too; some of the output
functions have an if() right at the start that drops the output when
none of the deltas are nonzero.  I doubt anyone would have a big
problem with changing these to output the zeros rather than skipping
the fields.
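In the meantime, a consumer of the JSON can just treat a missing dict
or field as zero, e.g. (a rough python sketch against the structure
you posted, not anything that exists in the tree):

def rate(pool_entry, section, field):
    # Treat an omitted section ("client_io_rate"/"recovery_rate")
    # or an omitted field as zero.
    return pool_entry.get(section, {}).get(field, 0)

# e.g. for one entry from the parsed "osd pool stats" list:
# client_read_bps = rate(pool, "client_io_rate", "read_bytes_sec")
# recovery_bps    = rate(pool, "recovery_rate", "recovering_bytes_per_sec")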

BTW I'm not sure it's smart to merge these in practice: it would result
in showing users a "your cluster is doing 10GB/s" statistic while
their workload is crawling because all that IO is really recovery.
Confusing.

John


On Fri, Mar 10, 2017 at 2:37 AM, Paul Cuzner <pcuzner@redhat.com> wrote:
> Hi,
>
> I've been putting together a collectd plugin for ceph - since the old
> one's I could find no longer work. I'm gathering data from the mon's
> admin socket, merged with a couple of commands I issue through the
> rados mon_command interface.
>
> Nothing complicated, but the data has me a little confused
>
> When I run "osd pool stats" I get *two* different sets of metrics that
> describe client i/o and recovery i/o. Since the metrics are different
> I can't merge them to get a consistent view of what the cluster is
> doing as a whole at any given point in time. For example, client i/o
> reports in bytes_sec, but the recovery dict is empty and the
> recovery_rate is in objects_sec...
>
> i.e.
>
> }, {
> "pool_name": "rados-bench-cbt",
> "pool_id": 86,
> "recovery": {},
> "recovery_rate": {
> "recovering_objects_per_sec": 3530,
> "recovering_bytes_per_sec": 14462655,
> "recovering_keys_per_sec": 0,
> "num_objects_recovered": 7148,
> "num_bytes_recovered": 29278208,
> "num_keys_recovered": 0
> },
> "client_io_rate": {}
>
> This is running Jewel - 10.2.5-37.el7cp
>
> Is this a bug or a 'feature' :)
>
> Cheers,
>
> Paul C


* Re: Interpreting ceph osd pool stats output
  2017-03-10  9:55 ` John Spray
@ 2017-03-10 20:52   ` Paul Cuzner
  2017-03-11 20:49     ` John Spray
  0 siblings, 1 reply; 18+ messages in thread
From: Paul Cuzner @ 2017-03-10 20:52 UTC (permalink / raw)
  To: John Spray; +Cc: Ceph Development

Thanks John

This is weird then. When I look at the data with client load I see the
following:
{
    "pool_name": "default.rgw.buckets.index",
    "pool_id": 94,
    "recovery": {},
    "recovery_rate": {},
    "client_io_rate": {
        "read_bytes_sec": 19242365,
        "write_bytes_sec": 0,
        "read_op_per_sec": 12514,
        "write_op_per_sec": 0
    }

No object-related counters - they're all block-based. The plugin I
have rolls up the block metrics across all pools to provide total
client load.
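The roll-up itself is nothing clever - roughly along these lines (a
simplified sketch, not the exact plugin code):

def total_client_io(pool_stats):
    # Sum the per-pool client_io_rate counters; pools with an
    # empty or missing dict contribute zero.
    totals = {"read_bytes_sec": 0, "write_bytes_sec": 0,
              "read_op_per_sec": 0, "write_op_per_sec": 0}
    for pool in pool_stats:
        rate = pool.get("client_io_rate", {})
        for field in totals:
            totals[field] += rate.get(field, 0)
    return totals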

And as in the prior email, the recovery_rate counters are object-related.

As far as merging the stats is concerned, I *do* believe it's useful
info for the admin to know - and maybe even the admin's boss :)

It would answer questions like how busy the cluster is as a whole,
and with both client and recovery metrics aligned you could then drill
down into the client/recovery components. It might also be interesting
to derive a ratio metric of client:recovery and maybe key off that for
automation (alerts/notifications, automated tuning, etc.)

On Fri, Mar 10, 2017 at 10:55 PM, John Spray <jspray@redhat.com> wrote:
> The reason they're different is that they originate from separate
> internal counters:
>  * The client_io_rate bits come from
> https://github.com/ceph/ceph/blob/jewel/src/mon/PGMap.cc#L1212
>  * The recovery bits come from
> https://github.com/ceph/ceph/blob/jewel/src/mon/PGMap.cc#L1146
>
> Not sure what you mean about bytes_sec vs objects_sec: client io and
> recovery rate both have both objects and bytes counters.
>
> The empty dicts are something that annoys me too, some of the output
> functions have an if() right at the start that drops the output when
> none of the deltas are nonzero.  I doubt anyone would have a big
> problem with changing these to output the zeros rather than skipping
> the fields.
>
> BTW I'm not sure it's smart to merge these in practice: would result
> in showing users a "your cluster is doing 10GB/s" statistics while
> their workload is crawling because all that IO is really recovery.
> Confusing.
>
> John
>
>
> On Fri, Mar 10, 2017 at 2:37 AM, Paul Cuzner <pcuzner@redhat.com> wrote:
>> Hi,
>>
>> I've been putting together a collectd plugin for ceph - since the old
>> one's I could find no longer work. I'm gathering data from the mon's
>> admin socket, merged with a couple of commands I issue through the
>> rados mon_command interface.
>>
>> Nothing complicated, but the data has me a little confused
>>
>> When I run "osd pool stats" I get *two* different sets of metrics that
>> describe client i/o and recovery i/o. Since the metrics are different
>> I can't merge them to get a consistent view of what the cluster is
>> doing as a whole at any given point in time. For example, client i/o
>> reports in bytes_sec, but the recovery dict is empty and the
>> recovery_rate is in objects_sec...
>>
>> i.e.
>>
>> }, {
>> "pool_name": "rados-bench-cbt",
>> "pool_id": 86,
>> "recovery": {},
>> "recovery_rate": {
>> "recovering_objects_per_sec": 3530,
>> "recovering_bytes_per_sec": 14462655,
>> "recovering_keys_per_sec": 0,
>> "num_objects_recovered": 7148,
>> "num_bytes_recovered": 29278208,
>> "num_keys_recovered": 0
>> },
>> "client_io_rate": {}
>>
>> This is running Jewel - 10.2.5-37.el7cp
>>
>> Is this a bug or a 'feature' :)
>>
>> Cheers,
>>
>> Paul C


* Re: Interpreting ceph osd pool stats output
  2017-03-10 20:52   ` Paul Cuzner
@ 2017-03-11 20:49     ` John Spray
  2017-03-11 21:24       ` Paul Cuzner
  0 siblings, 1 reply; 18+ messages in thread
From: John Spray @ 2017-03-11 20:49 UTC (permalink / raw)
  To: Paul Cuzner; +Cc: Ceph Development

On Fri, Mar 10, 2017 at 8:52 PM, Paul Cuzner <pcuzner@redhat.com> wrote:
> Thanks John
>
> This is weird then. When I look at the data with client load I see the
> following;
> {
> "pool_name": "default.rgw.buckets.index",
> "pool_id": 94,
> "recovery": {},
> "recovery_rate": {},
> "client_io_rate": {
> "read_bytes_sec": 19242365,
> "write_bytes_sec": 0,
> "read_op_per_sec": 12514,
> "write_op_per_sec": 0
> }
>
> No object related counters - they're all block based. The plugin I
> have rolls-up the block metrics across all pools to provide total
> client load.

Where are you getting the idea that these counters have to do with
block storage?  What Ceph is telling you about here is the number of
operations (or bytes in those operations) being handled by OSDs.

John

> And as in the prior email recovery_rate counters are object related.
>
> As far as merging the stats is concerned, I *do* believe it's useful
> info for the admin to know - and maybe even the admin's boss :)
>
> It would answer questions like - how busy is the cluster as a whole,
> and with both client and recovery metrics aligned you could then drill
> down into client/recovery components. It might also be interesting to
> derive a ratio metric of client:recovery and maybe key of that for
> automation (alerts/notifications, automated tuning etc etc)
>
>
>
>
>
>
>
>
>
>
> On Fri, Mar 10, 2017 at 10:55 PM, John Spray <jspray@redhat.com> wrote:
>> The reason they're different is that they originate from separate
>> internal counters:
>>  * The client_io_rate bits come from
>> https://github.com/ceph/ceph/blob/jewel/src/mon/PGMap.cc#L1212
>>  * The recovery bits come from
>> https://github.com/ceph/ceph/blob/jewel/src/mon/PGMap.cc#L1146
>>
>> Not sure what you mean about bytes_sec vs objects_sec: client io and
>> recovery rate both have both objects and bytes counters.
>>
>> The empty dicts are something that annoys me too, some of the output
>> functions have an if() right at the start that drops the output when
>> none of the deltas are nonzero.  I doubt anyone would have a big
>> problem with changing these to output the zeros rather than skipping
>> the fields.
>>
>> BTW I'm not sure it's smart to merge these in practice: would result
>> in showing users a "your cluster is doing 10GB/s" statistics while
>> their workload is crawling because all that IO is really recovery.
>> Confusing.
>>
>> John
>>
>>
>> On Fri, Mar 10, 2017 at 2:37 AM, Paul Cuzner <pcuzner@redhat.com> wrote:
>>> Hi,
>>>
>>> I've been putting together a collectd plugin for ceph - since the old
>>> one's I could find no longer work. I'm gathering data from the mon's
>>> admin socket, merged with a couple of commands I issue through the
>>> rados mon_command interface.
>>>
>>> Nothing complicated, but the data has me a little confused
>>>
>>> When I run "osd pool stats" I get *two* different sets of metrics that
>>> describe client i/o and recovery i/o. Since the metrics are different
>>> I can't merge them to get a consistent view of what the cluster is
>>> doing as a whole at any given point in time. For example, client i/o
>>> reports in bytes_sec, but the recovery dict is empty and the
>>> recovery_rate is in objects_sec...
>>>
>>> i.e.
>>>
>>> }, {
>>> "pool_name": "rados-bench-cbt",
>>> "pool_id": 86,
>>> "recovery": {},
>>> "recovery_rate": {
>>> "recovering_objects_per_sec": 3530,
>>> "recovering_bytes_per_sec": 14462655,
>>> "recovering_keys_per_sec": 0,
>>> "num_objects_recovered": 7148,
>>> "num_bytes_recovered": 29278208,
>>> "num_keys_recovered": 0
>>> },
>>> "client_io_rate": {}
>>>
>>> This is running Jewel - 10.2.5-37.el7cp
>>>
>>> Is this a bug or a 'feature' :)
>>>
>>> Cheers,
>>>
>>> Paul C


* Re: Interpreting ceph osd pool stats output
  2017-03-11 20:49     ` John Spray
@ 2017-03-11 21:24       ` Paul Cuzner
  2017-03-12 12:13         ` John Spray
  0 siblings, 1 reply; 18+ messages in thread
From: Paul Cuzner @ 2017-03-11 21:24 UTC (permalink / raw)
  To: John Spray; +Cc: Ceph Development

On Sun, Mar 12, 2017 at 9:49 AM, John Spray <jspray@redhat.com> wrote:
> On Fri, Mar 10, 2017 at 8:52 PM, Paul Cuzner <pcuzner@redhat.com> wrote:
>> Thanks John
>>
>> This is weird then. When I look at the data with client load I see the
>> following;
>> {
>> "pool_name": "default.rgw.buckets.index",
>> "pool_id": 94,
>> "recovery": {},
>> "recovery_rate": {},
>> "client_io_rate": {
>> "read_bytes_sec": 19242365,
>> "write_bytes_sec": 0,
>> "read_op_per_sec": 12514,
>> "write_op_per_sec": 0
>> }
>>
>> No object related counters - they're all block based. The plugin I
>> have rolls-up the block metrics across all pools to provide total
>> client load.
>
> Where are you getting the idea that these counters have to do with
> block storage?  What Ceph is telling you about here is the number of
> operations (or bytes in those operations) being handled by OSDs.
>

Perhaps it's my poor choice of words - apologies.

read_op_per_sec is the read IOP count to the OSDs from client activity
against the pool.

My point is that client I/O is expressed in these terms, but recovery
activity is not. I was hoping that both recovery and client I/O would
be reported in the same way so you gain a view of the activity of the
system as a whole. I can sum bytes_sec from client I/O with the
recovery_rate bytes per sec, which is something, but I can't see inside
recovery activity to see how much is read or write, or how much IOP
load is coming from recovery.
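So for a whole-cluster figure, the best I can do is something like the
following (just a sketch, and subject to the caveat that the two sets
of counters come from different places):

def cluster_bytes_sec(pool_stats):
    # Crude roll-up: client read+write bytes/sec plus recovery
    # bytes/sec across all pools, returned separately so a
    # client:recovery ratio could also be derived.
    client = recovery = 0
    for pool in pool_stats:
        io = pool.get("client_io_rate", {})
        client += io.get("read_bytes_sec", 0) + io.get("write_bytes_sec", 0)
        recovery += pool.get("recovery_rate", {}).get(
            "recovering_bytes_per_sec", 0)
    return client, recovery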


* Re: Interpreting ceph osd pool stats output
  2017-03-11 21:24       ` Paul Cuzner
@ 2017-03-12 12:13         ` John Spray
  2017-03-13 21:50           ` Paul Cuzner
  0 siblings, 1 reply; 18+ messages in thread
From: John Spray @ 2017-03-12 12:13 UTC (permalink / raw)
  To: Paul Cuzner; +Cc: Ceph Development

On Sat, Mar 11, 2017 at 9:24 PM, Paul Cuzner <pcuzner@redhat.com> wrote:
> On Sun, Mar 12, 2017 at 9:49 AM, John Spray <jspray@redhat.com> wrote:
>> On Fri, Mar 10, 2017 at 8:52 PM, Paul Cuzner <pcuzner@redhat.com> wrote:
>>> Thanks John
>>>
>>> This is weird then. When I look at the data with client load I see the
>>> following;
>>> {
>>> "pool_name": "default.rgw.buckets.index",
>>> "pool_id": 94,
>>> "recovery": {},
>>> "recovery_rate": {},
>>> "client_io_rate": {
>>> "read_bytes_sec": 19242365,
>>> "write_bytes_sec": 0,
>>> "read_op_per_sec": 12514,
>>> "write_op_per_sec": 0
>>> }
>>>
>>> No object related counters - they're all block based. The plugin I
>>> have rolls-up the block metrics across all pools to provide total
>>> client load.
>>
>> Where are you getting the idea that these counters have to do with
>> block storage?  What Ceph is telling you about here is the number of
>> operations (or bytes in those operations) being handled by OSDs.
>>
>
> Perhaps it's my poor choice of words - apologies.
>
> read_op_per_sec is read IOP count to the OSDs from client activity
> against the pool
>
> My point is that client-io is expressed in these terms, but recovery
> activity is not. I was hoping that both recovery and client I/O would
> be reported in the same way so you gain a view of the activity of the
> system as a whole. I can sum bytes_sec from client i/o with
> recovery_rate bytes_sec, which is something, but I can't see inside
> recovery activity to see how much is read or write, or how much IOP
> load is coming from recovery.

What would it mean to you for a recovery operation (one OSD sending
some data to another OSD) to be read vs. write?

John


* Re: Interpreting ceph osd pool stats output
  2017-03-12 12:13         ` John Spray
@ 2017-03-13 21:50           ` Paul Cuzner
  2017-03-13 22:13             ` John Spray
  0 siblings, 1 reply; 18+ messages in thread
From: Paul Cuzner @ 2017-03-13 21:50 UTC (permalink / raw)
  To: John Spray; +Cc: Ceph Development

Fundamentally, the metrics that describe the IO the OSD performs in
response to a recovery operation should be the same as the metrics for
client I/O. So in the context of a recovery operation, one OSD would
report a read (recovery source) and another report a write (recovery
target), together with their corresponding num_bytes. To my mind this
provides transparency, and maybe helps potential automation.

On Mon, Mar 13, 2017 at 1:13 AM, John Spray <jspray@redhat.com> wrote:
> On Sat, Mar 11, 2017 at 9:24 PM, Paul Cuzner <pcuzner@redhat.com> wrote:
>> On Sun, Mar 12, 2017 at 9:49 AM, John Spray <jspray@redhat.com> wrote:
>>> On Fri, Mar 10, 2017 at 8:52 PM, Paul Cuzner <pcuzner@redhat.com> wrote:
>>>> Thanks John
>>>>
>>>> This is weird then. When I look at the data with client load I see the
>>>> following;
>>>> {
>>>> "pool_name": "default.rgw.buckets.index",
>>>> "pool_id": 94,
>>>> "recovery": {},
>>>> "recovery_rate": {},
>>>> "client_io_rate": {
>>>> "read_bytes_sec": 19242365,
>>>> "write_bytes_sec": 0,
>>>> "read_op_per_sec": 12514,
>>>> "write_op_per_sec": 0
>>>> }
>>>>
>>>> No object related counters - they're all block based. The plugin I
>>>> have rolls-up the block metrics across all pools to provide total
>>>> client load.
>>>
>>> Where are you getting the idea that these counters have to do with
>>> block storage?  What Ceph is telling you about here is the number of
>>> operations (or bytes in those operations) being handled by OSDs.
>>>
>>
>> Perhaps it's my poor choice of words - apologies.
>>
>> read_op_per_sec is read IOP count to the OSDs from client activity
>> against the pool
>>
>> My point is that client-io is expressed in these terms, but recovery
>> activity is not. I was hoping that both recovery and client I/O would
>> be reported in the same way so you gain a view of the activity of the
>> system as a whole. I can sum bytes_sec from client i/o with
>> recovery_rate bytes_sec, which is something, but I can't see inside
>> recovery activity to see how much is read or write, or how much IOP
>> load is coming from recovery.
>
> What would it mean to you for a recovery operation (one OSD sending
> some data to another OSD) to be read vs. write?
>
> John


* Re: Interpreting ceph osd pool stats output
  2017-03-13 21:50           ` Paul Cuzner
@ 2017-03-13 22:13             ` John Spray
  2017-03-13 22:14               ` John Spray
  0 siblings, 1 reply; 18+ messages in thread
From: John Spray @ 2017-03-13 22:13 UTC (permalink / raw)
  To: Paul Cuzner; +Cc: Ceph Development

On Mon, Mar 13, 2017 at 9:50 PM, Paul Cuzner <pcuzner@redhat.com> wrote:
> Fundamentally, the metrics that describe the IO the OSD performs in
> response to a recovery operation should be the same as the metrics for
> client I/O.

Ah, so the key part here I think is "describe the IO that the OSD
performs" -- the counters you've been looking at do not do that.  They
describe the ops the OSD is servicing, *not* the (disk) IO the OSD is
doing as a result.

That's why you don't get an apples-to-apples comparison between client
IO and recovery -- if you were looking at disk IO stats from both, it
would be perfectly reasonable to combine/compare them.  When you're
looking at Ceph's own counters of client ops vs. recovery activity,
that no longer makes sense.

> So in the context of a recovery operation, one OSD would
> report a read (recovery source) and another report a write (recovery
> target), together with their corresponding num_bytes. To my mind this
> provides transparency, and maybe helps potential automation.

Okay, so if we were talking about disk IO counters, this would
probably make sense (one read wouldn't necessarily correspond to one
write), but if you had a counter that was telling you how many Ceph
recovery push/pull ops were "reading" (being sent) vs "writing" (being
received) the totals would just be zero.

John

>

>
>
>
>
>
> On Mon, Mar 13, 2017 at 1:13 AM, John Spray <jspray@redhat.com> wrote:
>> On Sat, Mar 11, 2017 at 9:24 PM, Paul Cuzner <pcuzner@redhat.com> wrote:
>>> On Sun, Mar 12, 2017 at 9:49 AM, John Spray <jspray@redhat.com> wrote:
>>>> On Fri, Mar 10, 2017 at 8:52 PM, Paul Cuzner <pcuzner@redhat.com> wrote:
>>>>> Thanks John
>>>>>
>>>>> This is weird then. When I look at the data with client load I see the
>>>>> following;
>>>>> {
>>>>> "pool_name": "default.rgw.buckets.index",
>>>>> "pool_id": 94,
>>>>> "recovery": {},
>>>>> "recovery_rate": {},
>>>>> "client_io_rate": {
>>>>> "read_bytes_sec": 19242365,
>>>>> "write_bytes_sec": 0,
>>>>> "read_op_per_sec": 12514,
>>>>> "write_op_per_sec": 0
>>>>> }
>>>>>
>>>>> No object related counters - they're all block based. The plugin I
>>>>> have rolls-up the block metrics across all pools to provide total
>>>>> client load.
>>>>
>>>> Where are you getting the idea that these counters have to do with
>>>> block storage?  What Ceph is telling you about here is the number of
>>>> operations (or bytes in those operations) being handled by OSDs.
>>>>
>>>
>>> Perhaps it's my poor choice of words - apologies.
>>>
>>> read_op_per_sec is read IOP count to the OSDs from client activity
>>> against the pool
>>>
>>> My point is that client-io is expressed in these terms, but recovery
>>> activity is not. I was hoping that both recovery and client I/O would
>>> be reported in the same way so you gain a view of the activity of the
>>> system as a whole. I can sum bytes_sec from client i/o with
>>> recovery_rate bytes_sec, which is something, but I can't see inside
>>> recovery activity to see how much is read or write, or how much IOP
>>> load is coming from recovery.
>>
>> What would it mean to you for a recovery operation (one OSD sending
>> some data to another OSD) to be read vs. write?
>>
>> John


* Re: Interpreting ceph osd pool stats output
  2017-03-13 22:13             ` John Spray
@ 2017-03-13 22:14               ` John Spray
  2017-03-14  3:13                 ` Paul Cuzner
  0 siblings, 1 reply; 18+ messages in thread
From: John Spray @ 2017-03-13 22:14 UTC (permalink / raw)
  To: Paul Cuzner; +Cc: Ceph Development

On Mon, Mar 13, 2017 at 10:13 PM, John Spray <jspray@redhat.com> wrote:
> On Mon, Mar 13, 2017 at 9:50 PM, Paul Cuzner <pcuzner@redhat.com> wrote:
>> Fundamentally, the metrics that describe the IO the OSD performs in
>> response to a recovery operation should be the same as the metrics for
>> client I/O.
>
> Ah, so the key part here I think is "describe the IO that the OSD
> performs" -- the counters you've been looking at do not do that.  They
> describe the ops the OSD is servicing, *not* the (disk) IO the OSD is
> doing as a result.
>
> That's why you don't get an apples-to-apples comparison between client
> IO and recovery -- if you were looking at disk IO stats from both, it
> would be perfectly reasonable to combine/compare them.  When you're
> looking at Ceph's own counters of client ops vs. recovery activity,
> that no longer makes sense.
>
>> So in the context of a recovery operation, one OSD would
>> report a read (recovery source) and another report a write (recovery
>> target), together with their corresponding num_bytes. To my mind this
>> provides transparency, and maybe helps potential automation.
>
> Okay, so if we were talking about disk IO counters, this would
> probably make sense (one read wouldn't necessarily correspond to one
> write), but if you had a counter that was telling you how many Ceph
> recovery push/pull ops were "reading" (being sent) vs "writing" (being
> received) the totals would just be zero.

Sorry, that should have said the totals would just be equal.

John

>
> John
>
>>
>
>>
>>
>>
>>
>>
>> On Mon, Mar 13, 2017 at 1:13 AM, John Spray <jspray@redhat.com> wrote:
>>> On Sat, Mar 11, 2017 at 9:24 PM, Paul Cuzner <pcuzner@redhat.com> wrote:
>>>> On Sun, Mar 12, 2017 at 9:49 AM, John Spray <jspray@redhat.com> wrote:
>>>>> On Fri, Mar 10, 2017 at 8:52 PM, Paul Cuzner <pcuzner@redhat.com> wrote:
>>>>>> Thanks John
>>>>>>
>>>>>> This is weird then. When I look at the data with client load I see the
>>>>>> following;
>>>>>> {
>>>>>> "pool_name": "default.rgw.buckets.index",
>>>>>> "pool_id": 94,
>>>>>> "recovery": {},
>>>>>> "recovery_rate": {},
>>>>>> "client_io_rate": {
>>>>>> "read_bytes_sec": 19242365,
>>>>>> "write_bytes_sec": 0,
>>>>>> "read_op_per_sec": 12514,
>>>>>> "write_op_per_sec": 0
>>>>>> }
>>>>>>
>>>>>> No object related counters - they're all block based. The plugin I
>>>>>> have rolls-up the block metrics across all pools to provide total
>>>>>> client load.
>>>>>
>>>>> Where are you getting the idea that these counters have to do with
>>>>> block storage?  What Ceph is telling you about here is the number of
>>>>> operations (or bytes in those operations) being handled by OSDs.
>>>>>
>>>>
>>>> Perhaps it's my poor choice of words - apologies.
>>>>
>>>> read_op_per_sec is read IOP count to the OSDs from client activity
>>>> against the pool
>>>>
>>>> My point is that client-io is expressed in these terms, but recovery
>>>> activity is not. I was hoping that both recovery and client I/O would
>>>> be reported in the same way so you gain a view of the activity of the
>>>> system as a whole. I can sum bytes_sec from client i/o with
>>>> recovery_rate bytes_sec, which is something, but I can't see inside
>>>> recovery activity to see how much is read or write, or how much IOP
>>>> load is coming from recovery.
>>>
>>> What would it mean to you for a recovery operation (one OSD sending
>>> some data to another OSD) to be read vs. write?
>>>
>>> John


* Re: Interpreting ceph osd pool stats output
  2017-03-13 22:14               ` John Spray
@ 2017-03-14  3:13                 ` Paul Cuzner
  2017-03-14  9:49                   ` John Spray
  0 siblings, 1 reply; 18+ messages in thread
From: Paul Cuzner @ 2017-03-14  3:13 UTC (permalink / raw)
  To: John Spray; +Cc: Ceph Development

First of all - thanks John for your patience!

I guess I still can't get past the different metrics being used -
client I/O is described in one way, recovery in another, and yet
fundamentally they both send ops to the OSDs, right? To me, what's
interesting is that the recovery_rate metrics from pool stats seem to
be a higher-level 'product' of lower-level information - for example,
recovering_objects_per_sec: is this not a product of multiple
read/write ops to OSDs?

Also, don't get me wrong - the recovery_rate dict is cool and it gives
a great view of object-level recovery - I was just hoping for common
metrics for the OSD ops that are shared by client and recovery
activity.

Since this isn't the case, what's the recommended way to determine how
busy a cluster is - across recovery and client (rbd/rgw) requests?


On Tue, Mar 14, 2017 at 11:14 AM, John Spray <jspray@redhat.com> wrote:
> On Mon, Mar 13, 2017 at 10:13 PM, John Spray <jspray@redhat.com> wrote:
>> On Mon, Mar 13, 2017 at 9:50 PM, Paul Cuzner <pcuzner@redhat.com> wrote:
>>> Fundamentally, the metrics that describe the IO the OSD performs in
>>> response to a recovery operation should be the same as the metrics for
>>> client I/O.
>>
>> Ah, so the key part here I think is "describe the IO that the OSD
>> performs" -- the counters you've been looking at do not do that.  They
>> describe the ops the OSD is servicing, *not* the (disk) IO the OSD is
>> doing as a result.
>>
>> That's why you don't get an apples-to-apples comparison between client
>> IO and recovery -- if you were looking at disk IO stats from both, it
>> would be perfectly reasonable to combine/compare them.  When you're
>> looking at Ceph's own counters of client ops vs. recovery activity,
>> that no longer makes sense.
>>
>>> So in the context of a recovery operation, one OSD would
>>> report a read (recovery source) and another report a write (recovery
>>> target), together with their corresponding num_bytes. To my mind this
>>> provides transparency, and maybe helps potential automation.
>>
>> Okay, so if we were talking about disk IO counters, this would
>> probably make sense (one read wouldn't necessarily correspond to one
>> write), but if you had a counter that was telling you how many Ceph
>> recovery push/pull ops were "reading" (being sent) vs "writing" (being
>> received) the totals would just be zero.
>
> Sorry, that should have said the totals would just be equal.
>
> John
>
>>
>> John
>>
>>>
>>
>>>
>>>
>>>
>>>
>>>
>>> On Mon, Mar 13, 2017 at 1:13 AM, John Spray <jspray@redhat.com> wrote:
>>>> On Sat, Mar 11, 2017 at 9:24 PM, Paul Cuzner <pcuzner@redhat.com> wrote:
>>>>> On Sun, Mar 12, 2017 at 9:49 AM, John Spray <jspray@redhat.com> wrote:
>>>>>> On Fri, Mar 10, 2017 at 8:52 PM, Paul Cuzner <pcuzner@redhat.com> wrote:
>>>>>>> Thanks John
>>>>>>>
>>>>>>> This is weird then. When I look at the data with client load I see the
>>>>>>> following;
>>>>>>> {
>>>>>>> "pool_name": "default.rgw.buckets.index",
>>>>>>> "pool_id": 94,
>>>>>>> "recovery": {},
>>>>>>> "recovery_rate": {},
>>>>>>> "client_io_rate": {
>>>>>>> "read_bytes_sec": 19242365,
>>>>>>> "write_bytes_sec": 0,
>>>>>>> "read_op_per_sec": 12514,
>>>>>>> "write_op_per_sec": 0
>>>>>>> }
>>>>>>>
>>>>>>> No object related counters - they're all block based. The plugin I
>>>>>>> have rolls-up the block metrics across all pools to provide total
>>>>>>> client load.
>>>>>>
>>>>>> Where are you getting the idea that these counters have to do with
>>>>>> block storage?  What Ceph is telling you about here is the number of
>>>>>> operations (or bytes in those operations) being handled by OSDs.
>>>>>>
>>>>>
>>>>> Perhaps it's my poor choice of words - apologies.
>>>>>
>>>>> read_op_per_sec is read IOP count to the OSDs from client activity
>>>>> against the pool
>>>>>
>>>>> My point is that client-io is expressed in these terms, but recovery
>>>>> activity is not. I was hoping that both recovery and client I/O would
>>>>> be reported in the same way so you gain a view of the activity of the
>>>>> system as a whole. I can sum bytes_sec from client i/o with
>>>>> recovery_rate bytes_sec, which is something, but I can't see inside
>>>>> recovery activity to see how much is read or write, or how much IOP
>>>>> load is coming from recovery.
>>>>
>>>> What would it mean to you for a recovery operation (one OSD sending
>>>> some data to another OSD) to be read vs. write?
>>>>
>>>> John


* Re: Interpreting ceph osd pool stats output
  2017-03-14  3:13                 ` Paul Cuzner
@ 2017-03-14  9:49                   ` John Spray
  2017-03-14 13:13                     ` Sage Weil
  0 siblings, 1 reply; 18+ messages in thread
From: John Spray @ 2017-03-14  9:49 UTC (permalink / raw)
  To: Paul Cuzner; +Cc: Ceph Development

On Tue, Mar 14, 2017 at 3:13 AM, Paul Cuzner <pcuzner@redhat.com> wrote:
> First of all - thanks John for your patience!
>
> I guess, I still can't get past the different metrics being used -
> client I/O is described in one way, recovery in another and yet
> fundamentally they both send ops to the OSD's right? To me, what's
> interesting is that the recovery_rate metrics from pool stats seems to
> be a higher level 'product' of lower level information - for example
> recovering_objects_per_sec : is this not a product of multiple
> read/write ops to OSD's?

While there is data being moved around, it would be misleading to say
it's all just ops.  The path that client ops go down is different to
the path that recovery messages go down.  Recovery data is gathered up
into big vectors of object extents that are sent between OSDs, whereas
client ops are sent individually from clients.  An OSD servicing 10
writes from 10 different clients is not directly comparable to an OSD
servicing an MOSDPush message from another OSD that happens to contain
updates to 10 objects.

Client ops are also logically meaningful to consumers of the
cluster, while the recovery stuff is a total implementation detail.
The implementation of recovery could change any time, and any counter
generated from it will only be meaningful to someone who understands
how recovery works on that particular version of the ceph code.

> Also, don't get me wrong - the recovery_rate dict is cool and it gives
> a great view of object level recovery - I was just hoping for common
> metrics for the OSD ops that are shared by client and recovery
> activity.
>
> Since this isn't the case, what's the recommended way to determine how
> busy a cluster is - across recovery and client (rbd/rgw) requests?

I would say again that how busy a cluster is doing its job (client
IO) is a very separate thing from how busy it is doing internal
housekeeping.  Imagine exposing this as a speedometer dial in a GUI
(as people sometimes do) -- a cluster that was killing itself with
recovery and completely blocking its clients would look like it was
going nice and fast.  In my view, exposing two separate numbers is the
right thing to do, not a shortcoming.

If you truly want to come up with some kind of single metric then you
can: you could take the rate of change of the objects recovered, for
example.  If you wanted to, you could think of finishing recovery of
one object as an "op".  I would tend to think of this as the job of a
higher-level tool though, rather than a collectd plugin.  Especially
if the collectd plugin is meant to be general purpose, it should avoid
inventing things like this.
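To make the rate-of-change idea concrete, it would be something like
this (purely illustrative python; sample() stands in for whatever
returns the cumulative num_objects_recovered value):

import time

def recovered_objects_per_sec(sample, interval=10):
    # Difference the cumulative counter between two polls and
    # normalise by the polling interval.
    prev = sample()
    time.sleep(interval)
    curr = sample()
    return (curr - prev) / float(interval)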

John

>
>
>
>
>
>
>
>
>
> .
>
> On Tue, Mar 14, 2017 at 11:14 AM, John Spray <jspray@redhat.com> wrote:
>> On Mon, Mar 13, 2017 at 10:13 PM, John Spray <jspray@redhat.com> wrote:
>>> On Mon, Mar 13, 2017 at 9:50 PM, Paul Cuzner <pcuzner@redhat.com> wrote:
>>>> Fundamentally, the metrics that describe the IO the OSD performs in
>>>> response to a recovery operation should be the same as the metrics for
>>>> client I/O.
>>>
>>> Ah, so the key part here I think is "describe the IO that the OSD
>>> performs" -- the counters you've been looking at do not do that.  They
>>> describe the ops the OSD is servicing, *not* the (disk) IO the OSD is
>>> doing as a result.
>>>
>>> That's why you don't get an apples-to-apples comparison between client
>>> IO and recovery -- if you were looking at disk IO stats from both, it
>>> would be perfectly reasonable to combine/compare them.  When you're
>>> looking at Ceph's own counters of client ops vs. recovery activity,
>>> that no longer makes sense.
>>>
>>>> So in the context of a recovery operation, one OSD would
>>>> report a read (recovery source) and another report a write (recovery
>>>> target), together with their corresponding num_bytes. To my mind this
>>>> provides transparency, and maybe helps potential automation.
>>>
>>> Okay, so if we were talking about disk IO counters, this would
>>> probably make sense (one read wouldn't necessarily correspond to one
>>> write), but if you had a counter that was telling you how many Ceph
>>> recovery push/pull ops were "reading" (being sent) vs "writing" (being
>>> received) the totals would just be zero.
>>
>> Sorry, that should have said the totals would just be equal.
>>
>> John
>>
>>>
>>> John
>>>
>>>>
>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> On Mon, Mar 13, 2017 at 1:13 AM, John Spray <jspray@redhat.com> wrote:
>>>>> On Sat, Mar 11, 2017 at 9:24 PM, Paul Cuzner <pcuzner@redhat.com> wrote:
>>>>>> On Sun, Mar 12, 2017 at 9:49 AM, John Spray <jspray@redhat.com> wrote:
>>>>>>> On Fri, Mar 10, 2017 at 8:52 PM, Paul Cuzner <pcuzner@redhat.com> wrote:
>>>>>>>> Thanks John
>>>>>>>>
>>>>>>>> This is weird then. When I look at the data with client load I see the
>>>>>>>> following;
>>>>>>>> {
>>>>>>>> "pool_name": "default.rgw.buckets.index",
>>>>>>>> "pool_id": 94,
>>>>>>>> "recovery": {},
>>>>>>>> "recovery_rate": {},
>>>>>>>> "client_io_rate": {
>>>>>>>> "read_bytes_sec": 19242365,
>>>>>>>> "write_bytes_sec": 0,
>>>>>>>> "read_op_per_sec": 12514,
>>>>>>>> "write_op_per_sec": 0
>>>>>>>> }
>>>>>>>>
>>>>>>>> No object related counters - they're all block based. The plugin I
>>>>>>>> have rolls-up the block metrics across all pools to provide total
>>>>>>>> client load.
>>>>>>>
>>>>>>> Where are you getting the idea that these counters have to do with
>>>>>>> block storage?  What Ceph is telling you about here is the number of
>>>>>>> operations (or bytes in those operations) being handled by OSDs.
>>>>>>>
>>>>>>
>>>>>> Perhaps it's my poor choice of words - apologies.
>>>>>>
>>>>>> read_op_per_sec is read IOP count to the OSDs from client activity
>>>>>> against the pool
>>>>>>
>>>>>> My point is that client-io is expressed in these terms, but recovery
>>>>>> activity is not. I was hoping that both recovery and client I/O would
>>>>>> be reported in the same way so you gain a view of the activity of the
>>>>>> system as a whole. I can sum bytes_sec from client i/o with
>>>>>> recovery_rate bytes_sec, which is something, but I can't see inside
>>>>>> recovery activity to see how much is read or write, or how much IOP
>>>>>> load is coming from recovery.
>>>>>
>>>>> What would it mean to you for a recovery operation (one OSD sending
>>>>> some data to another OSD) to be read vs. write?
>>>>>
>>>>> John


* Re: Interpreting ceph osd pool stats output
  2017-03-14  9:49                   ` John Spray
@ 2017-03-14 13:13                     ` Sage Weil
  2017-03-20  3:57                       ` Paul Cuzner
  0 siblings, 1 reply; 18+ messages in thread
From: Sage Weil @ 2017-03-14 13:13 UTC (permalink / raw)
  To: John Spray; +Cc: Paul Cuzner, Ceph Development

On Tue, 14 Mar 2017, John Spray wrote:
> On Tue, Mar 14, 2017 at 3:13 AM, Paul Cuzner <pcuzner@redhat.com> wrote:
> > First of all - thanks John for your patience!
> >
> > I guess, I still can't get past the different metrics being used -
> > client I/O is described in one way, recovery in another and yet
> > fundamentally they both send ops to the OSD's right? To me, what's
> > interesting is that the recovery_rate metrics from pool stats seems to
> > be a higher level 'product' of lower level information - for example
> > recovering_objects_per_sec : is this not a product of multiple
> > read/write ops to OSD's?
> 
> While there is data being moved around, it would be misleading to say
> it's all just ops.  The path that client ops go down is different to
> the path that recovery messages go down.  Recovery data is gathered up
> into big vectors of object extents that are sent between OSDs, client
> ops are sent individually from clients.  An OSD servicing 10 writes
> from 10 different clients is not directly comparable to an OSD
> servicing an MOSDPush message from another OSD that happens to contain
> updates to 10 objects.
> 
> Client ops are also a logically meaningful to consumers of the
> cluster, while the recovery stuff is a total implementation detail.
> The implementation of recovery could change any time, and any counter
> generated from it will only be meaningful to someone who understands
> how recovery works on that particular version of the ceph code.
> 
> > Also, don't get me wrong - the recovery_rate dict is cool and it gives
> > a great view of object level recovery - I was just hoping for common
> > metrics for the OSD ops that are shared by client and recovery
> > activity.
> >
> > Since this isn't the case, what's the recommended way to determine how
> > busy a cluster is - across recovery and client (rbd/rgw) requests?
> 
> I would say again that how busy a cluster is doing it's job (client
> IO) is a very separate thing from how busy it is doing internal
> housekeeping.  Imagine exposing this as a speedometer dial in a GUI
> (as people sometimes do) -- a cluster that was killing itself with
> recovery and completely blocking it's clients would look like it was
> going nice and fast.  In my view, exposing two separate numbers is the
> right thing to do, not a shortcoming.
> 
> If you truly want to come up with some kind of single metric then you
> can: you could take the rate of change of the objects recovered for
> example.  If you wanted to, you could think of finishing recovery of
> one object as an "op".  I would tend to think of this as the job of a
> higher level tool though, rather than a collectd plugin.  Especially
> if the collectd plugin is meant to be general purpose, it should avoid
> inventing things like this.

I think the only other option is to take a measurement at a lower layer.  
BlueStore doesn't currently but could easily have metrics for bytes read 
and written.  But again, this is a secondary product of client and 
recovery: a client write, for example, will result in 3 writes across 3
OSDs (in a 3x replicated pool).
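Back of the envelope, with illustrative numbers only:

# 3x replicated pool: each client write lands on 3 OSDs.
replica_count = 3
client_write_mb_s = 100.0                             # hypothetical client write rate
raw_write_mb_s = client_write_mb_s * replica_count    # ~300 MB/s at the disks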

sage


 > 
> John
> 
> >
> >
> >
> >
> >
> >
> >
> >
> >
> > .
> >
> > On Tue, Mar 14, 2017 at 11:14 AM, John Spray <jspray@redhat.com> wrote:
> >> On Mon, Mar 13, 2017 at 10:13 PM, John Spray <jspray@redhat.com> wrote:
> >>> On Mon, Mar 13, 2017 at 9:50 PM, Paul Cuzner <pcuzner@redhat.com> wrote:
> >>>> Fundamentally, the metrics that describe the IO the OSD performs in
> >>>> response to a recovery operation should be the same as the metrics for
> >>>> client I/O.
> >>>
> >>> Ah, so the key part here I think is "describe the IO that the OSD
> >>> performs" -- the counters you've been looking at do not do that.  They
> >>> describe the ops the OSD is servicing, *not* the (disk) IO the OSD is
> >>> doing as a result.
> >>>
> >>> That's why you don't get an apples-to-apples comparison between client
> >>> IO and recovery -- if you were looking at disk IO stats from both, it
> >>> would be perfectly reasonable to combine/compare them.  When you're
> >>> looking at Ceph's own counters of client ops vs. recovery activity,
> >>> that no longer makes sense.
> >>>
> >>>> So in the context of a recovery operation, one OSD would
> >>>> report a read (recovery source) and another report a write (recovery
> >>>> target), together with their corresponding num_bytes. To my mind this
> >>>> provides transparency, and maybe helps potential automation.
> >>>
> >>> Okay, so if we were talking about disk IO counters, this would
> >>> probably make sense (one read wouldn't necessarily correspond to one
> >>> write), but if you had a counter that was telling you how many Ceph
> >>> recovery push/pull ops were "reading" (being sent) vs "writing" (being
> >>> received) the totals would just be zero.
> >>
> >> Sorry, that should have said the totals would just be equal.
> >>
> >> John
> >>
> >>>
> >>> John
> >>>
> >>>>
> >>>
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>
> >>>> On Mon, Mar 13, 2017 at 1:13 AM, John Spray <jspray@redhat.com> wrote:
> >>>>> On Sat, Mar 11, 2017 at 9:24 PM, Paul Cuzner <pcuzner@redhat.com> wrote:
> >>>>>> On Sun, Mar 12, 2017 at 9:49 AM, John Spray <jspray@redhat.com> wrote:
> >>>>>>> On Fri, Mar 10, 2017 at 8:52 PM, Paul Cuzner <pcuzner@redhat.com> wrote:
> >>>>>>>> Thanks John
> >>>>>>>>
> >>>>>>>> This is weird then. When I look at the data with client load I see the
> >>>>>>>> following;
> >>>>>>>> {
> >>>>>>>> "pool_name": "default.rgw.buckets.index",
> >>>>>>>> "pool_id": 94,
> >>>>>>>> "recovery": {},
> >>>>>>>> "recovery_rate": {},
> >>>>>>>> "client_io_rate": {
> >>>>>>>> "read_bytes_sec": 19242365,
> >>>>>>>> "write_bytes_sec": 0,
> >>>>>>>> "read_op_per_sec": 12514,
> >>>>>>>> "write_op_per_sec": 0
> >>>>>>>> }
> >>>>>>>>
> >>>>>>>> No object related counters - they're all block based. The plugin I
> >>>>>>>> have rolls-up the block metrics across all pools to provide total
> >>>>>>>> client load.
> >>>>>>>
> >>>>>>> Where are you getting the idea that these counters have to do with
> >>>>>>> block storage?  What Ceph is telling you about here is the number of
> >>>>>>> operations (or bytes in those operations) being handled by OSDs.
> >>>>>>>
> >>>>>>
> >>>>>> Perhaps it's my poor choice of words - apologies.
> >>>>>>
> >>>>>> read_op_per_sec is read IOP count to the OSDs from client activity
> >>>>>> against the pool
> >>>>>>
> >>>>>> My point is that client-io is expressed in these terms, but recovery
> >>>>>> activity is not. I was hoping that both recovery and client I/O would
> >>>>>> be reported in the same way so you gain a view of the activity of the
> >>>>>> system as a whole. I can sum bytes_sec from client i/o with
> >>>>>> recovery_rate bytes_sec, which is something, but I can't see inside
> >>>>>> recovery activity to see how much is read or write, or how much IOP
> >>>>>> load is coming from recovery.
> >>>>>
> >>>>> What would it mean to you for a recovery operation (one OSD sending
> >>>>> some data to another OSD) to be read vs. write?
> >>>>>
> >>>>> John


* Re: Interpreting ceph osd pool stats output
  2017-03-14 13:13                     ` Sage Weil
@ 2017-03-20  3:57                       ` Paul Cuzner
  2017-03-20  7:54                         ` Brad Hubbard
  0 siblings, 1 reply; 18+ messages in thread
From: Paul Cuzner @ 2017-03-20  3:57 UTC (permalink / raw)
  To: John Spray; +Cc: Ceph Development, Sage Weil

John/Sage, thanks for the clarification and info. At this stage, I'll
stick with the data I have, bearing in mind John's caveats.

The challenge in understanding the load going on in a cluster is
definitely interesting since the choke points are different depending
on whether you look at the cluster through a hardware or software
'lens'.

I think the interesting question is: how does a customer know how
'full' their cluster is from a performance standpoint - i.e. when do I
need to buy more or different hardware? Holy grail type stuff :)

Is there any work going on in this space, perhaps analyzing the
underlying components within the cluster, like CPU, RAM or disk
utilization rates across the nodes?



On Wed, Mar 15, 2017 at 2:13 AM, Sage Weil <sweil@redhat.com> wrote:
> On Tue, 14 Mar 2017, John Spray wrote:
>> On Tue, Mar 14, 2017 at 3:13 AM, Paul Cuzner <pcuzner@redhat.com> wrote:
>> > First of all - thanks John for your patience!
>> >
>> > I guess, I still can't get past the different metrics being used -
>> > client I/O is described in one way, recovery in another and yet
>> > fundamentally they both send ops to the OSD's right? To me, what's
>> > interesting is that the recovery_rate metrics from pool stats seems to
>> > be a higher level 'product' of lower level information - for example
>> > recovering_objects_per_sec : is this not a product of multiple
>> > read/write ops to OSD's?
>>
>> While there is data being moved around, it would be misleading to say
>> it's all just ops.  The path that client ops go down is different to
>> the path that recovery messages go down.  Recovery data is gathered up
>> into big vectors of object extents that are sent between OSDs, client
>> ops are sent individually from clients.  An OSD servicing 10 writes
>> from 10 different clients is not directly comparable to an OSD
>> servicing an MOSDPush message from another OSD that happens to contain
>> updates to 10 objects.
>>
>> Client ops are also a logically meaningful to consumers of the
>> cluster, while the recovery stuff is a total implementation detail.
>> The implementation of recovery could change any time, and any counter
>> generated from it will only be meaningful to someone who understands
>> how recovery works on that particular version of the ceph code.
>>
>> > Also, don't get me wrong - the recovery_rate dict is cool and it gives
>> > a great view of object level recovery - I was just hoping for common
>> > metrics for the OSD ops that are shared by client and recovery
>> > activity.
>> >
>> > Since this isn't the case, what's the recommended way to determine how
>> > busy a cluster is - across recovery and client (rbd/rgw) requests?
>>
>> I would say again that how busy a cluster is doing it's job (client
>> IO) is a very separate thing from how busy it is doing internal
>> housekeeping.  Imagine exposing this as a speedometer dial in a GUI
>> (as people sometimes do) -- a cluster that was killing itself with
>> recovery and completely blocking it's clients would look like it was
>> going nice and fast.  In my view, exposing two separate numbers is the
>> right thing to do, not a shortcoming.
>>
>> If you truly want to come up with some kind of single metric then you
>> can: you could take the rate of change of the objects recovered for
>> example.  If you wanted to, you could think of finishing recovery of
>> one object as an "op".  I would tend to think of this as the job of a
>> higher level tool though, rather than a collectd plugin.  Especially
>> if the collectd plugin is meant to be general purpose, it should avoid
>> inventing things like this.
>
> I think the only other option is to take a measurement at a lower layer.
> BlueStore doesn't currently but could easily have metrics for bytes read
> and written.  But again, this is a secondary product of client and
> recovery: a client write, for example, will result in 3 writes across 3
> osds (in a 3x replicated pool).
>
> sage
>
>
>  >
>> John
>>
>> >
>> >
>> >
>> >
>> >
>> >
>> >
>> >
>> >
>> > .
>> >
>> > On Tue, Mar 14, 2017 at 11:14 AM, John Spray <jspray@redhat.com> wrote:
>> >> On Mon, Mar 13, 2017 at 10:13 PM, John Spray <jspray@redhat.com> wrote:
>> >>> On Mon, Mar 13, 2017 at 9:50 PM, Paul Cuzner <pcuzner@redhat.com> wrote:
>> >>>> Fundamentally, the metrics that describe the IO the OSD performs in
>> >>>> response to a recovery operation should be the same as the metrics for
>> >>>> client I/O.
>> >>>
>> >>> Ah, so the key part here I think is "describe the IO that the OSD
>> >>> performs" -- the counters you've been looking at do not do that.  They
>> >>> describe the ops the OSD is servicing, *not* the (disk) IO the OSD is
>> >>> doing as a result.
>> >>>
>> >>> That's why you don't get an apples-to-apples comparison between client
>> >>> IO and recovery -- if you were looking at disk IO stats from both, it
>> >>> would be perfectly reasonable to combine/compare them.  When you're
>> >>> looking at Ceph's own counters of client ops vs. recovery activity,
>> >>> that no longer makes sense.
>> >>>
>> >>>> So in the context of a recovery operation, one OSD would
>> >>>> report a read (recovery source) and another report a write (recovery
>> >>>> target), together with their corresponding num_bytes. To my mind this
>> >>>> provides transparency, and maybe helps potential automation.
>> >>>
>> >>> Okay, so if we were talking about disk IO counters, this would
>> >>> probably make sense (one read wouldn't necessarily correspond to one
>> >>> write), but if you had a counter that was telling you how many Ceph
>> >>> recovery push/pull ops were "reading" (being sent) vs "writing" (being
>> >>> received) the totals would just be zero.
>> >>
>> >> Sorry, that should have said the totals would just be equal.
>> >>
>> >> John
>> >>
>> >>>
>> >>> John
>> >>>
>> >>>>
>> >>>
>> >>>>
>> >>>>
>> >>>>
>> >>>>
>> >>>>
>> >>>> On Mon, Mar 13, 2017 at 1:13 AM, John Spray <jspray@redhat.com> wrote:
>> >>>>> On Sat, Mar 11, 2017 at 9:24 PM, Paul Cuzner <pcuzner@redhat.com> wrote:
>> >>>>>> On Sun, Mar 12, 2017 at 9:49 AM, John Spray <jspray@redhat.com> wrote:
>> >>>>>>> On Fri, Mar 10, 2017 at 8:52 PM, Paul Cuzner <pcuzner@redhat.com> wrote:
>> >>>>>>>> Thanks John
>> >>>>>>>>
>> >>>>>>>> This is weird then. When I look at the data with client load I see the
>> >>>>>>>> following;
>> >>>>>>>> {
>> >>>>>>>> "pool_name": "default.rgw.buckets.index",
>> >>>>>>>> "pool_id": 94,
>> >>>>>>>> "recovery": {},
>> >>>>>>>> "recovery_rate": {},
>> >>>>>>>> "client_io_rate": {
>> >>>>>>>> "read_bytes_sec": 19242365,
>> >>>>>>>> "write_bytes_sec": 0,
>> >>>>>>>> "read_op_per_sec": 12514,
>> >>>>>>>> "write_op_per_sec": 0
>> >>>>>>>> }
>> >>>>>>>>
>> >>>>>>>> No object related counters - they're all block based. The plugin I
>> >>>>>>>> have rolls-up the block metrics across all pools to provide total
>> >>>>>>>> client load.
>> >>>>>>>
>> >>>>>>> Where are you getting the idea that these counters have to do with
>> >>>>>>> block storage?  What Ceph is telling you about here is the number of
>> >>>>>>> operations (or bytes in those operations) being handled by OSDs.
>> >>>>>>>
>> >>>>>>
>> >>>>>> Perhaps it's my poor choice of words - apologies.
>> >>>>>>
>> >>>>>> read_op_per_sec is read IOP count to the OSDs from client activity
>> >>>>>> against the pool
>> >>>>>>
>> >>>>>> My point is that client-io is expressed in these terms, but recovery
>> >>>>>> activity is not. I was hoping that both recovery and client I/O would
>> >>>>>> be reported in the same way so you gain a view of the activity of the
>> >>>>>> system as a whole. I can sum bytes_sec from client i/o with
>> >>>>>> recovery_rate bytes_sec, which is something, but I can't see inside
>> >>>>>> recovery activity to see how much is read or write, or how much IOP
>> >>>>>> load is coming from recovery.
>> >>>>>
>> >>>>> What would it mean to you for a recovery operation (one OSD sending
>> >>>>> some data to another OSD) to be read vs. write?
>> >>>>>
>> >>>>> John


* Re: Interpreting ceph osd pool stats output
  2017-03-20  3:57                       ` Paul Cuzner
@ 2017-03-20  7:54                         ` Brad Hubbard
  2017-03-20  8:40                           ` Paul Cuzner
  0 siblings, 1 reply; 18+ messages in thread
From: Brad Hubbard @ 2017-03-20  7:54 UTC (permalink / raw)
  To: Paul Cuzner; +Cc: John Spray, Ceph Development, Sage Weil



On Mon, Mar 20, 2017 at 1:57 PM, Paul Cuzner <pcuzner@redhat.com> wrote:
> John/Sage, thanks for the clarification and info. At this stage, I'll
> stick with the data I have with John's caveats.
>
> The challenge in understanding the load going on in a cluster is
> definitely interesting since the choke points are different depending
> on whether you look at the cluster through a hardware or software
> 'lens'.
>
> I think the interesting question is how does a customer know how
> 'full' their cluster is from a performance standpoint - ie. when do I
> need to buy more or different hardware? Holy grail type stuff :)
>
> Is there any work going on in this space, perhaps analyzing the
> underlying components within the cluster like cpu, ram or disk util
> rates across the nodes?

Wouldn't this be reinventing the wheel, since it's something that tools like
pcp (collectd?) already do very well?

>
>
>
> On Wed, Mar 15, 2017 at 2:13 AM, Sage Weil <sweil@redhat.com> wrote:
>> On Tue, 14 Mar 2017, John Spray wrote:
>>> On Tue, Mar 14, 2017 at 3:13 AM, Paul Cuzner <pcuzner@redhat.com> wrote:
>>> > First of all - thanks John for your patience!
>>> >
>>> > I guess, I still can't get past the different metrics being used -
>>> > client I/O is described in one way, recovery in another and yet
>>> > fundamentally they both send ops to the OSD's right? To me, what's
>>> > interesting is that the recovery_rate metrics from pool stats seems to
>>> > be a higher level 'product' of lower level information - for example
>>> > recovering_objects_per_sec : is this not a product of multiple
>>> > read/write ops to OSD's?
>>>
>>> While there is data being moved around, it would be misleading to say
>>> it's all just ops.  The path that client ops go down is different to
>>> the path that recovery messages go down.  Recovery data is gathered up
>>> into big vectors of object extents that are sent between OSDs, client
>>> ops are sent individually from clients.  An OSD servicing 10 writes
>>> from 10 different clients is not directly comparable to an OSD
>>> servicing an MOSDPush message from another OSD that happens to contain
>>> updates to 10 objects.
>>>
>>> Client ops are also a logically meaningful to consumers of the
>>> cluster, while the recovery stuff is a total implementation detail.
>>> The implementation of recovery could change any time, and any counter
>>> generated from it will only be meaningful to someone who understands
>>> how recovery works on that particular version of the ceph code.
>>>
>>> > Also, don't get me wrong - the recovery_rate dict is cool and it gives
>>> > a great view of object level recovery - I was just hoping for common
>>> > metrics for the OSD ops that are shared by client and recovery
>>> > activity.
>>> >
>>> > Since this isn't the case, what's the recommended way to determine how
>>> > busy a cluster is - across recovery and client (rbd/rgw) requests?
>>>
>> >>> I would say again that how busy a cluster is doing its job (client
>>> IO) is a very separate thing from how busy it is doing internal
>>> housekeeping.  Imagine exposing this as a speedometer dial in a GUI
>>> (as people sometimes do) -- a cluster that was killing itself with
>> >>> recovery and completely blocking its clients would look like it was
>>> going nice and fast.  In my view, exposing two separate numbers is the
>>> right thing to do, not a shortcoming.
>>>
>>> If you truly want to come up with some kind of single metric then you
>>> can: you could take the rate of change of the objects recovered for
>>> example.  If you wanted to, you could think of finishing recovery of
>>> one object as an "op".  I would tend to think of this as the job of a
>>> higher level tool though, rather than a collectd plugin.  Especially
>>> if the collectd plugin is meant to be general purpose, it should avoid
>>> inventing things like this.
>>
>> I think the only other option is to take a measurement at a lower layer.
>> BlueStore doesn't currently but could easily have metrics for bytes read
>> and written.  But again, this is a secondary product of client and
>> recovery: a client write, for example, will result in 3 writes across 3
>> osds (in a 3x replicated pool).
>>
>> sage
>>
>>
>>  >
>>> John
>>>
>>> >
>>> >
>>> >
>>> >
>>> >
>>> >
>>> >
>>> >
>>> >
>>> > .
>>> >
>>> > On Tue, Mar 14, 2017 at 11:14 AM, John Spray <jspray@redhat.com> wrote:
>>> >> On Mon, Mar 13, 2017 at 10:13 PM, John Spray <jspray@redhat.com> wrote:
>>> >>> On Mon, Mar 13, 2017 at 9:50 PM, Paul Cuzner <pcuzner@redhat.com> wrote:
>>> >>>> Fundamentally, the metrics that describe the IO the OSD performs in
>>> >>>> response to a recovery operation should be the same as the metrics for
>>> >>>> client I/O.
>>> >>>
>>> >>> Ah, so the key part here I think is "describe the IO that the OSD
>>> >>> performs" -- the counters you've been looking at do not do that.  They
>>> >>> describe the ops the OSD is servicing, *not* the (disk) IO the OSD is
>>> >>> doing as a result.
>>> >>>
>>> >>> That's why you don't get an apples-to-apples comparison between client
>>> >>> IO and recovery -- if you were looking at disk IO stats from both, it
>>> >>> would be perfectly reasonable to combine/compare them.  When you're
>>> >>> looking at Ceph's own counters of client ops vs. recovery activity,
>>> >>> that no longer makes sense.
>>> >>>
>>> >>>> So in the context of a recovery operation, one OSD would
>>> >>>> report a read (recovery source) and another report a write (recovery
>>> >>>> target), together with their corresponding num_bytes. To my mind this
>>> >>>> provides transparency, and maybe helps potential automation.
>>> >>>
>>> >>> Okay, so if we were talking about disk IO counters, this would
>>> >>> probably make sense (one read wouldn't necessarily correspond to one
>>> >>> write), but if you had a counter that was telling you how many Ceph
>>> >>> recovery push/pull ops were "reading" (being sent) vs "writing" (being
>>> >>> received) the totals would just be zero.
>>> >>
>>> >> Sorry, that should have said the totals would just be equal.
>>> >>
>>> >> John
>>> >>
>>> >>>
>>> >>> John
>>> >>>
>>> >>>>
>>> >>>
>>> >>>>
>>> >>>>
>>> >>>>
>>> >>>>
>>> >>>>
>>> >>>> On Mon, Mar 13, 2017 at 1:13 AM, John Spray <jspray@redhat.com> wrote:
>>> >>>>> On Sat, Mar 11, 2017 at 9:24 PM, Paul Cuzner <pcuzner@redhat.com> wrote:
>>> >>>>>> On Sun, Mar 12, 2017 at 9:49 AM, John Spray <jspray@redhat.com> wrote:
>>> >>>>>>> On Fri, Mar 10, 2017 at 8:52 PM, Paul Cuzner <pcuzner@redhat.com> wrote:
>>> >>>>>>>> Thanks John
>>> >>>>>>>>
>>> >>>>>>>> This is weird then. When I look at the data with client load I see the
>>> >>>>>>>> following;
>>> >>>>>>>> {
>>> >>>>>>>> "pool_name": "default.rgw.buckets.index",
>>> >>>>>>>> "pool_id": 94,
>>> >>>>>>>> "recovery": {},
>>> >>>>>>>> "recovery_rate": {},
>>> >>>>>>>> "client_io_rate": {
>>> >>>>>>>> "read_bytes_sec": 19242365,
>>> >>>>>>>> "write_bytes_sec": 0,
>>> >>>>>>>> "read_op_per_sec": 12514,
>>> >>>>>>>> "write_op_per_sec": 0
>>> >>>>>>>> }
>>> >>>>>>>>
>>> >>>>>>>> No object related counters - they're all block based. The plugin I
>>> >>>>>>>> have rolls-up the block metrics across all pools to provide total
>>> >>>>>>>> client load.
>>> >>>>>>>
>>> >>>>>>> Where are you getting the idea that these counters have to do with
>>> >>>>>>> block storage?  What Ceph is telling you about here is the number of
>>> >>>>>>> operations (or bytes in those operations) being handled by OSDs.
>>> >>>>>>>
>>> >>>>>>
>>> >>>>>> Perhaps it's my poor choice of words - apologies.
>>> >>>>>>
>>> >>>>>> read_op_per_sec is read IOP count to the OSDs from client activity
>>> >>>>>> against the pool
>>> >>>>>>
>>> >>>>>> My point is that client-io is expressed in these terms, but recovery
>>> >>>>>> activity is not. I was hoping that both recovery and client I/O would
>>> >>>>>> be reported in the same way so you gain a view of the activity of the
>>> >>>>>> system as a whole. I can sum bytes_sec from client i/o with
>>> >>>>>> recovery_rate bytes_sec, which is something, but I can't see inside
>>> >>>>>> recovery activity to see how much is read or write, or how much IOP
>>> >>>>>> load is coming from recovery.
>>> >>>>>
>>> >>>>> What would it mean to you for a recovery operation (one OSD sending
>>> >>>>> some data to another OSD) to be read vs. write?
>>> >>>>>
>>> >>>>> John
>>> --
>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>>> the body of a message to majordomo@vger.kernel.org
>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>
>>>
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html



-- 
Cheers,
Brad

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Interpreting ceph osd pool stats output
  2017-03-20  7:54                         ` Brad Hubbard
@ 2017-03-20  8:40                           ` Paul Cuzner
  2017-03-20  8:41                             ` Paul Cuzner
  0 siblings, 1 reply; 18+ messages in thread
From: Paul Cuzner @ 2017-03-20  8:40 UTC (permalink / raw)
  To: Brad Hubbard; +Cc: John Spray, Ceph Development, Sage Weil

I was suggesting inventing the data collector - more about how
(formulas, etc.) and which metrics we aggregate to derive meaningful
metrics. pcp, collectd etc give us a single component - what's the
framework that ties all those pieces together to give us the
cluster-wide view? If there is something out there, great... I'm not a
fan of reinventing the wheel either :)
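
To make that concrete, here is a rough per-pool 'formula' of the kind I
have in mind - purely illustrative, not an existing plugin. The field
names come from the "osd pool stats" json shown earlier in the thread;
the function and gauge names are made up:

# Hedged sketch: derive per-pool gauges from one entry of
# "ceph osd pool stats -f json".  Client and recovery activity are
# kept as separate numbers, per John's caveat about not collapsing
# them into a single figure.
def pool_gauges(pool_stats):
    cio = pool_stats.get("client_io_rate", {})   # {} when the pool is idle
    rec = pool_stats.get("recovery_rate", {})    # {} when no recovery is running
    return {
        "pool": pool_stats.get("pool_name", "unknown"),
        # client-facing load, in the units the cluster reports
        "client_bytes_sec": (cio.get("read_bytes_sec", 0) +
                             cio.get("write_bytes_sec", 0)),
        "client_ops_sec": (cio.get("read_op_per_sec", 0) +
                           cio.get("write_op_per_sec", 0)),
        # internal housekeeping, reported as objects/bytes recovered
        "recovery_bytes_sec": rec.get("recovering_bytes_per_sec", 0),
        "recovery_objects_sec": rec.get("recovering_objects_per_sec", 0),
    }

Whatever framework ties the collectors together would then just sum
these per-pool dicts to get the cluster-wide picture, while still
reporting client and recovery load side by side.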



On Mon, Mar 20, 2017 at 8:54 PM, Brad Hubbard <bhubbard@redhat.com> wrote:
>
>
> On Mon, Mar 20, 2017 at 1:57 PM, Paul Cuzner <pcuzner@redhat.com> wrote:
>> John/Sage, thanks for the clarification and info. At this stage, I'll
>> stick with the data I have with John's caveats.
>>
>> The challenge in understanding the load going on in a cluster is
>> definitely interesting since the choke points are different depending
>> on whether you look at the cluster through a hardware or software
>> 'lens'.
>>
>> I think the interesting question is how does a customer know how
>> 'full' their cluster is from a performance standpoint - ie. when do I
>> need to buy more or different hardware? Holy grail type stuff :)
>>
>> Is there any work going on in this space, perhaps analyzing the
>> underlying components within the cluster like cpu, ram or disk util
>> rates across the nodes?
>
> Wouldn't this be reinventing the wheel, since this is something that tools
> like pcp (or collectd?) already do very well?
>
>>
>>
>>
>> On Wed, Mar 15, 2017 at 2:13 AM, Sage Weil <sweil@redhat.com> wrote:
>>> On Tue, 14 Mar 2017, John Spray wrote:
>>>> On Tue, Mar 14, 2017 at 3:13 AM, Paul Cuzner <pcuzner@redhat.com> wrote:
>>>> > First of all - thanks John for your patience!
>>>> >
>>>> > I guess, I still can't get past the different metrics being used -
>>>> > client I/O is described in one way, recovery in another and yet
>>>> > fundamentally they both send ops to the OSDs, right? To me, what's
>>>> > interesting is that the recovery_rate metrics from pool stats seems to
>>>> > be a higher level 'product' of lower level information - for example
>>>> > recovering_objects_per_sec : is this not a product of multiple
>>>> > read/write ops to OSDs?
>>>>
>>>> While there is data being moved around, it would be misleading to say
>>>> it's all just ops.  The path that client ops go down is different to
>>>> the path that recovery messages go down.  Recovery data is gathered up
>>>> into big vectors of object extents that are sent between OSDs, client
>>>> ops are sent individually from clients.  An OSD servicing 10 writes
>>>> from 10 different clients is not directly comparable to an OSD
>>>> servicing an MOSDPush message from another OSD that happens to contain
>>>> updates to 10 objects.
>>>>
>>>> Client ops are also logically meaningful to consumers of the
>>>> cluster, while the recovery stuff is a total implementation detail.
>>>> The implementation of recovery could change any time, and any counter
>>>> generated from it will only be meaningful to someone who understands
>>>> how recovery works on that particular version of the ceph code.
>>>>
>>>> > Also, don't get me wrong - the recovery_rate dict is cool and it gives
>>>> > a great view of object level recovery - I was just hoping for common
>>>> > metrics for the OSD ops that are shared by client and recovery
>>>> > activity.
>>>> >
>>>> > Since this isn't the case, what's the recommended way to determine how
>>>> > busy a cluster is - across recovery and client (rbd/rgw) requests?
>>>>
>>>> I would say again that how busy a cluster is doing its job (client
>>>> IO) is a very separate thing from how busy it is doing internal
>>>> housekeeping.  Imagine exposing this as a speedometer dial in a GUI
>>>> (as people sometimes do) -- a cluster that was killing itself with
>>>> recovery and completely blocking its clients would look like it was
>>>> going nice and fast.  In my view, exposing two separate numbers is the
>>>> right thing to do, not a shortcoming.
>>>>
>>>> If you truly want to come up with some kind of single metric then you
>>>> can: you could take the rate of change of the objects recovered for
>>>> example.  If you wanted to, you could think of finishing recovery of
>>>> one object as an "op".  I would tend to think of this as the job of a
>>>> higher level tool though, rather than a collectd plugin.  Especially
>>>> if the collectd plugin is meant to be general purpose, it should avoid
>>>> inventing things like this.
>>>
>>> I think the only other option is to take a measurement at a lower layer.
>>> BlueStore doesn't currently but could easily have metrics for bytes read
>>> and written.  But again, this is a secondary product of client and
>>> recovery: a client write, for example, will result in 3 writes across 3
>>> osds (in a 3x replicated pool).
>>>
>>> sage
>>>
>>>
>>>  >
>>>> John
>>>>
>>>> >
>>>> >
>>>> >
>>>> >
>>>> >
>>>> >
>>>> >
>>>> >
>>>> >
>>>> > .
>>>> >
>>>> > On Tue, Mar 14, 2017 at 11:14 AM, John Spray <jspray@redhat.com> wrote:
>>>> >> On Mon, Mar 13, 2017 at 10:13 PM, John Spray <jspray@redhat.com> wrote:
>>>> >>> On Mon, Mar 13, 2017 at 9:50 PM, Paul Cuzner <pcuzner@redhat.com> wrote:
>>>> >>>> Fundamentally, the metrics that describe the IO the OSD performs in
>>>> >>>> response to a recovery operation should be the same as the metrics for
>>>> >>>> client I/O.
>>>> >>>
>>>> >>> Ah, so the key part here I think is "describe the IO that the OSD
>>>> >>> performs" -- the counters you've been looking at do not do that.  They
>>>> >>> describe the ops the OSD is servicing, *not* the (disk) IO the OSD is
>>>> >>> doing as a result.
>>>> >>>
>>>> >>> That's why you don't get an apples-to-apples comparison between client
>>>> >>> IO and recovery -- if you were looking at disk IO stats from both, it
>>>> >>> would be perfectly reasonable to combine/compare them.  When you're
>>>> >>> looking at Ceph's own counters of client ops vs. recovery activity,
>>>> >>> that no longer makes sense.
>>>> >>>
>>>> >>>> So in the context of a recovery operation, one OSD would
>>>> >>>> report a read (recovery source) and another report a write (recovery
>>>> >>>> target), together with their corresponding num_bytes. To my mind this
>>>> >>>> provides transparency, and maybe helps potential automation.
>>>> >>>
>>>> >>> Okay, so if we were talking about disk IO counters, this would
>>>> >>> probably make sense (one read wouldn't necessarily correspond to one
>>>> >>> write), but if you had a counter that was telling you how many Ceph
>>>> >>> recovery push/pull ops were "reading" (being sent) vs "writing" (being
>>>> >>> received) the totals would just be zero.
>>>> >>
>>>> >> Sorry, that should have said the totals would just be equal.
>>>> >>
>>>> >> John
>>>> >>
>>>> >>>
>>>> >>> John
>>>> >>>
>>>> >>>>
>>>> >>>
>>>> >>>>
>>>> >>>>
>>>> >>>>
>>>> >>>>
>>>> >>>>
>>>> >>>> On Mon, Mar 13, 2017 at 1:13 AM, John Spray <jspray@redhat.com> wrote:
>>>> >>>>> On Sat, Mar 11, 2017 at 9:24 PM, Paul Cuzner <pcuzner@redhat.com> wrote:
>>>> >>>>>> On Sun, Mar 12, 2017 at 9:49 AM, John Spray <jspray@redhat.com> wrote:
>>>> >>>>>>> On Fri, Mar 10, 2017 at 8:52 PM, Paul Cuzner <pcuzner@redhat.com> wrote:
>>>> >>>>>>>> Thanks John
>>>> >>>>>>>>
>>>> >>>>>>>> This is weird then. When I look at the data with client load I see the
>>>> >>>>>>>> following;
>>>> >>>>>>>> {
>>>> >>>>>>>> "pool_name": "default.rgw.buckets.index",
>>>> >>>>>>>> "pool_id": 94,
>>>> >>>>>>>> "recovery": {},
>>>> >>>>>>>> "recovery_rate": {},
>>>> >>>>>>>> "client_io_rate": {
>>>> >>>>>>>> "read_bytes_sec": 19242365,
>>>> >>>>>>>> "write_bytes_sec": 0,
>>>> >>>>>>>> "read_op_per_sec": 12514,
>>>> >>>>>>>> "write_op_per_sec": 0
>>>> >>>>>>>> }
>>>> >>>>>>>>
>>>> >>>>>>>> No object related counters - they're all block based. The plugin I
>>>> >>>>>>>> have rolls-up the block metrics across all pools to provide total
>>>> >>>>>>>> client load.
>>>> >>>>>>>
>>>> >>>>>>> Where are you getting the idea that these counters have to do with
>>>> >>>>>>> block storage?  What Ceph is telling you about here is the number of
>>>> >>>>>>> operations (or bytes in those operations) being handled by OSDs.
>>>> >>>>>>>
>>>> >>>>>>
>>>> >>>>>> Perhaps it's my poor choice of words - apologies.
>>>> >>>>>>
>>>> >>>>>> read_op_per_sec is read IOP count to the OSDs from client activity
>>>> >>>>>> against the pool
>>>> >>>>>>
>>>> >>>>>> My point is that client-io is expressed in these terms, but recovery
>>>> >>>>>> activity is not. I was hoping that both recovery and client I/O would
>>>> >>>>>> be reported in the same way so you gain a view of the activity of the
>>>> >>>>>> system as a whole. I can sum bytes_sec from client i/o with
>>>> >>>>>> recovery_rate bytes_sec, which is something, but I can't see inside
>>>> >>>>>> recovery activity to see how much is read or write, or how much IOP
>>>> >>>>>> load is coming from recovery.
>>>> >>>>>
>>>> >>>>> What would it mean to you for a recovery operation (one OSD sending
>>>> >>>>> some data to another OSD) to be read vs. write?
>>>> >>>>>
>>>> >>>>> John
>>>> --
>>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>>>> the body of a message to majordomo@vger.kernel.org
>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>>
>>>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>
>
>
> --
> Cheers,
> Brad

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Interpreting ceph osd pool stats output
  2017-03-20  8:40                           ` Paul Cuzner
@ 2017-03-20  8:41                             ` Paul Cuzner
  0 siblings, 0 replies; 18+ messages in thread
From: Paul Cuzner @ 2017-03-20  8:41 UTC (permalink / raw)
  To: Brad Hubbard; +Cc: John Spray, Ceph Development, Sage Weil

s/i was/i wasn't/

doh...it's late

On Mon, Mar 20, 2017 at 9:40 PM, Paul Cuzner <pcuzner@redhat.com> wrote:
> I was suggesting inventing the data collector - more about how
> (formulas, etc.) and which metrics we aggregate to derive meaningful
> metrics. pcp, collectd etc give us a single component - what's the
> framework that ties all those pieces together to give us the
> cluster-wide view? If there is something out there, great... I'm not a
> fan of reinventing the wheel either :)
>
>
>
> On Mon, Mar 20, 2017 at 8:54 PM, Brad Hubbard <bhubbard@redhat.com> wrote:
>>
>>
>> On Mon, Mar 20, 2017 at 1:57 PM, Paul Cuzner <pcuzner@redhat.com> wrote:
>>> John/Sage, thanks for the clarification and info. At this stage, I'll
>>> stick with the data I have with John's caveats.
>>>
>>> The challenge in understanding the load going on in a cluster is
>>> definitely interesting since the choke points are different depending
>>> on whether you look at the cluster through a hardware or software
>>> 'lens'.
>>>
>>> I think the interesting question is how does a customer know how
>>> 'full' their cluster is from a performance standpoint - ie. when do I
>>> need to buy more or different hardware? Holy grail type stuff :)
>>>
>>> Is there any work going on in this space, perhaps analyzing the
>>> underlying components within the cluster like cpu, ram or disk util
>>> rates across the nodes?
>>
>> Wouldn't this be reinventing the wheel, since this is something that tools
>> like pcp (or collectd?) already do very well?
>>
>>>
>>>
>>>
>>> On Wed, Mar 15, 2017 at 2:13 AM, Sage Weil <sweil@redhat.com> wrote:
>>>> On Tue, 14 Mar 2017, John Spray wrote:
>>>>> On Tue, Mar 14, 2017 at 3:13 AM, Paul Cuzner <pcuzner@redhat.com> wrote:
>>>>> > First of all - thanks John for your patience!
>>>>> >
>>>>> > I guess, I still can't get past the different metrics being used -
>>>>> > client I/O is described in one way, recovery in another and yet
>>>>> > fundamentally they both send ops to the OSDs, right? To me, what's
>>>>> > interesting is that the recovery_rate metrics from pool stats seems to
>>>>> > be a higher level 'product' of lower level information - for example
>>>>> > recovering_objects_per_sec : is this not a product of multiple
>>>>> > read/write ops to OSDs?
>>>>>
>>>>> While there is data being moved around, it would be misleading to say
>>>>> it's all just ops.  The path that client ops go down is different to
>>>>> the path that recovery messages go down.  Recovery data is gathered up
>>>>> into big vectors of object extents that are sent between OSDs, client
>>>>> ops are sent individually from clients.  An OSD servicing 10 writes
>>>>> from 10 different clients is not directly comparable to an OSD
>>>>> servicing an MOSDPush message from another OSD that happens to contain
>>>>> updates to 10 objects.
>>>>>
>>>>> Client ops are also logically meaningful to consumers of the
>>>>> cluster, while the recovery stuff is a total implementation detail.
>>>>> The implementation of recovery could change any time, and any counter
>>>>> generated from it will only be meaningful to someone who understands
>>>>> how recovery works on that particular version of the ceph code.
>>>>>
>>>>> > Also, don't get me wrong - the recovery_rate dict is cool and it gives
>>>>> > a great view of object level recovery - I was just hoping for common
>>>>> > metrics for the OSD ops that are shared by client and recovery
>>>>> > activity.
>>>>> >
>>>>> > Since this isn't the case, what's the recommended way to determine how
>>>>> > busy a cluster is - across recovery and client (rbd/rgw) requests?
>>>>>
>>>>> I would say again that how busy a cluster is doing its job (client
>>>>> IO) is a very separate thing from how busy it is doing internal
>>>>> housekeeping.  Imagine exposing this as a speedometer dial in a GUI
>>>>> (as people sometimes do) -- a cluster that was killing itself with
>>>>> recovery and completely blocking its clients would look like it was
>>>>> going nice and fast.  In my view, exposing two separate numbers is the
>>>>> right thing to do, not a shortcoming.
>>>>>
>>>>> If you truly want to come up with some kind of single metric then you
>>>>> can: you could take the rate of change of the objects recovered for
>>>>> example.  If you wanted to, you could think of finishing recovery of
>>>>> one object as an "op".  I would tend to think of this as the job of a
>>>>> higher level tool though, rather than a collectd plugin.  Especially
>>>>> if the collectd plugin is meant to be general purpose, it should avoid
>>>>> inventing things like this.
>>>>
>>>> I think the only other option is to take a measurement at a lower layer.
>>>> BlueStore doesn't currently but could easily have metrics for bytes read
>>>> and written.  But again, this is a secondary product of client and
>>>> recovery: a client write, for example, will result in 3 writes across 3
>>>> osds (in a 3x replicated pool).
>>>>
>>>> sage
>>>>
>>>>
>>>>  >
>>>>> John
>>>>>
>>>>> >
>>>>> >
>>>>> >
>>>>> >
>>>>> >
>>>>> >
>>>>> >
>>>>> >
>>>>> >
>>>>> > .
>>>>> >
>>>>> > On Tue, Mar 14, 2017 at 11:14 AM, John Spray <jspray@redhat.com> wrote:
>>>>> >> On Mon, Mar 13, 2017 at 10:13 PM, John Spray <jspray@redhat.com> wrote:
>>>>> >>> On Mon, Mar 13, 2017 at 9:50 PM, Paul Cuzner <pcuzner@redhat.com> wrote:
>>>>> >>>> Fundamentally, the metrics that describe the IO the OSD performs in
>>>>> >>>> response to a recovery operation should be the same as the metrics for
>>>>> >>>> client I/O.
>>>>> >>>
>>>>> >>> Ah, so the key part here I think is "describe the IO that the OSD
>>>>> >>> performs" -- the counters you've been looking at do not do that.  They
>>>>> >>> describe the ops the OSD is servicing, *not* the (disk) IO the OSD is
>>>>> >>> doing as a result.
>>>>> >>>
>>>>> >>> That's why you don't get an apples-to-apples comparison between client
>>>>> >>> IO and recovery -- if you were looking at disk IO stats from both, it
>>>>> >>> would be perfectly reasonable to combine/compare them.  When you're
>>>>> >>> looking at Ceph's own counters of client ops vs. recovery activity,
>>>>> >>> that no longer makes sense.
>>>>> >>>
>>>>> >>>> So in the context of a recovery operation, one OSD would
>>>>> >>>> report a read (recovery source) and another report a write (recovery
>>>>> >>>> target), together with their corresponding num_bytes. To my mind this
>>>>> >>>> provides transparency, and maybe helps potential automation.
>>>>> >>>
>>>>> >>> Okay, so if we were talking about disk IO counters, this would
>>>>> >>> probably make sense (one read wouldn't necessarily correspond to one
>>>>> >>> write), but if you had a counter that was telling you how many Ceph
>>>>> >>> recovery push/pull ops were "reading" (being sent) vs "writing" (being
>>>>> >>> received) the totals would just be zero.
>>>>> >>
>>>>> >> Sorry, that should have said the totals would just be equal.
>>>>> >>
>>>>> >> John
>>>>> >>
>>>>> >>>
>>>>> >>> John
>>>>> >>>
>>>>> >>>>
>>>>> >>>
>>>>> >>>>
>>>>> >>>>
>>>>> >>>>
>>>>> >>>>
>>>>> >>>>
>>>>> >>>> On Mon, Mar 13, 2017 at 1:13 AM, John Spray <jspray@redhat.com> wrote:
>>>>> >>>>> On Sat, Mar 11, 2017 at 9:24 PM, Paul Cuzner <pcuzner@redhat.com> wrote:
>>>>> >>>>>> On Sun, Mar 12, 2017 at 9:49 AM, John Spray <jspray@redhat.com> wrote:
>>>>> >>>>>>> On Fri, Mar 10, 2017 at 8:52 PM, Paul Cuzner <pcuzner@redhat.com> wrote:
>>>>> >>>>>>>> Thanks John
>>>>> >>>>>>>>
>>>>> >>>>>>>> This is weird then. When I look at the data with client load I see the
>>>>> >>>>>>>> following;
>>>>> >>>>>>>> {
>>>>> >>>>>>>> "pool_name": "default.rgw.buckets.index",
>>>>> >>>>>>>> "pool_id": 94,
>>>>> >>>>>>>> "recovery": {},
>>>>> >>>>>>>> "recovery_rate": {},
>>>>> >>>>>>>> "client_io_rate": {
>>>>> >>>>>>>> "read_bytes_sec": 19242365,
>>>>> >>>>>>>> "write_bytes_sec": 0,
>>>>> >>>>>>>> "read_op_per_sec": 12514,
>>>>> >>>>>>>> "write_op_per_sec": 0
>>>>> >>>>>>>> }
>>>>> >>>>>>>>
>>>>> >>>>>>>> No object related counters - they're all block based. The plugin I
>>>>> >>>>>>>> have rolls-up the block metrics across all pools to provide total
>>>>> >>>>>>>> client load.
>>>>> >>>>>>>
>>>>> >>>>>>> Where are you getting the idea that these counters have to do with
>>>>> >>>>>>> block storage?  What Ceph is telling you about here is the number of
>>>>> >>>>>>> operations (or bytes in those operations) being handled by OSDs.
>>>>> >>>>>>>
>>>>> >>>>>>
>>>>> >>>>>> Perhaps it's my poor choice of words - apologies.
>>>>> >>>>>>
>>>>> >>>>>> read_op_per_sec is read IOP count to the OSDs from client activity
>>>>> >>>>>> against the pool
>>>>> >>>>>>
>>>>> >>>>>> My point is that client-io is expressed in these terms, but recovery
>>>>> >>>>>> activity is not. I was hoping that both recovery and client I/O would
>>>>> >>>>>> be reported in the same way so you gain a view of the activity of the
>>>>> >>>>>> system as a whole. I can sum bytes_sec from client i/o with
>>>>> >>>>>> recovery_rate bytes_sec, which is something, but I can't see inside
>>>>> >>>>>> recovery activity to see how much is read or write, or how much IOP
>>>>> >>>>>> load is coming from recovery.
>>>>> >>>>>
>>>>> >>>>> What would it mean to you for a recovery operation (one OSD sending
>>>>> >>>>> some data to another OSD) to be read vs. write?
>>>>> >>>>>
>>>>> >>>>> John
>>>>> --
>>>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>>>>> the body of a message to majordomo@vger.kernel.org
>>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>>>
>>>>>
>>> --
>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>>> the body of a message to majordomo@vger.kernel.org
>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>
>>
>>
>> --
>> Cheers,
>> Brad

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Interpreting ceph osd pool stats output
  2017-03-10  2:37 Interpreting ceph osd pool stats output Paul Cuzner
  2017-03-10  9:55 ` John Spray
@ 2017-03-20 14:20 ` Ruben Kerkhof
  2017-03-21  1:31   ` Paul Cuzner
  1 sibling, 1 reply; 18+ messages in thread
From: Ruben Kerkhof @ 2017-03-20 14:20 UTC (permalink / raw)
  To: Paul Cuzner; +Cc: ceph-devel

On Fri, Mar 10, 2017 at 3:37 AM, Paul Cuzner <pcuzner@redhat.com> wrote:
> Hi,

Hi Paul,

>
> I've been putting together a collectd plugin for ceph - since the old
> one's I could find no longer work.

Did you try the ceph plugin in upstream collectd?
It has been in collectd since 5.5.

Kind regards,

Ruben Kerkhof

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: Interpreting ceph osd pool stats output
  2017-03-20 14:20 ` Ruben Kerkhof
@ 2017-03-21  1:31   ` Paul Cuzner
  0 siblings, 0 replies; 18+ messages in thread
From: Paul Cuzner @ 2017-03-21  1:31 UTC (permalink / raw)
  To: Ruben Kerkhof; +Cc: ceph-devel

I'm working on downstream stuff (RHEL) which made the plugins a little
harder to find - but yes, I've tried the ceph plugin (@ 5.7).

With the plugin on a mon I can see all the counters from perf dump,
but since I'm working on benchmarking, I wanted more :)

- how many hosts in the cluster
- what the client ops performance looks like, by pool and in total
- what recovery work is going on, by pool and in total

tbh, I haven't used the plugin on the osd nodes - I've been looking
for higher-level metrics to try to get a cluster-wide view (see the
sketch below).
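
Something like this - just a sketch, assuming the python-rados
bindings (rados.Rados.mon_command) and that "osd tree" returns json
the same way "osd pool stats" does; the helper and key names I've
introduced (mon_json, cluster_summary, the totals dict) are made up:

import json
import rados

def mon_json(cluster, prefix):
    # issue a mon command and decode the json reply
    ret, out, errs = cluster.mon_command(
        json.dumps({"prefix": prefix, "format": "json"}), b'')
    if ret != 0:
        raise RuntimeError("%s failed: %s" % (prefix, errs))
    return json.loads(out)

def cluster_summary(conffile='/etc/ceph/ceph.conf'):
    cluster = rados.Rados(conffile=conffile)
    cluster.connect()
    try:
        pools = mon_json(cluster, "osd pool stats")   # list of per-pool dicts
        tree = mon_json(cluster, "osd tree")          # CRUSH tree nodes
    finally:
        cluster.shutdown()

    totals = {"hosts": sum(1 for n in tree.get("nodes", [])
                           if n.get("type") == "host"),
              "client_bytes_sec": 0, "client_ops_sec": 0,
              "recovery_bytes_sec": 0, "recovery_objects_sec": 0}
    for p in pools:
        cio = p.get("client_io_rate", {})
        rec = p.get("recovery_rate", {})
        totals["client_bytes_sec"] += (cio.get("read_bytes_sec", 0) +
                                       cio.get("write_bytes_sec", 0))
        totals["client_ops_sec"] += (cio.get("read_op_per_sec", 0) +
                                     cio.get("write_op_per_sec", 0))
        totals["recovery_bytes_sec"] += rec.get("recovering_bytes_per_sec", 0)
        totals["recovery_objects_sec"] += rec.get("recovering_objects_per_sec", 0)
    return totals

if __name__ == '__main__':
    print(cluster_summary())

Per John's earlier caveat, I'd still surface the client and recovery
totals side by side rather than fold them into one 'cluster busy'
number.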




On Tue, Mar 21, 2017 at 3:20 AM, Ruben Kerkhof <ruben@rubenkerkhof.com> wrote:
> On Fri, Mar 10, 2017 at 3:37 AM, Paul Cuzner <pcuzner@redhat.com> wrote:
>> Hi,
>
> Hi Paul,
>
>>
>> I've been putting together a collectd plugin for ceph - since the old
>> one's I could find no longer work.
>
> Did you try the ceph plugin in upstream collectd?
> It has been in collectd since 5.5.
>
> Kind regards,
>
> Ruben Kerkhof

^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2017-03-21  1:33 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-03-10  2:37 Interpreting ceph osd pool stats output Paul Cuzner
2017-03-10  9:55 ` John Spray
2017-03-10 20:52   ` Paul Cuzner
2017-03-11 20:49     ` John Spray
2017-03-11 21:24       ` Paul Cuzner
2017-03-12 12:13         ` John Spray
2017-03-13 21:50           ` Paul Cuzner
2017-03-13 22:13             ` John Spray
2017-03-13 22:14               ` John Spray
2017-03-14  3:13                 ` Paul Cuzner
2017-03-14  9:49                   ` John Spray
2017-03-14 13:13                     ` Sage Weil
2017-03-20  3:57                       ` Paul Cuzner
2017-03-20  7:54                         ` Brad Hubbard
2017-03-20  8:40                           ` Paul Cuzner
2017-03-20  8:41                             ` Paul Cuzner
2017-03-20 14:20 ` Ruben Kerkhof
2017-03-21  1:31   ` Paul Cuzner
