* Blocked / Slow requests in health JSON from Mon/Mgr
From: Wido den Hollander @ 2017-11-27 10:29 UTC (permalink / raw)
To: ceph-devel
Hi,
For the Zabbix plugin for the Mgr I want to report the number of blocked and/or slow requests the cluster is experiencing.
There is no item with an integer value for this in the JSON returned by the Monitors.
What would be the easiest way to obtain these values in a Mgr Module?
Or would we need to expand the JSON the MON reports?
I'd like to create a trigger in Zabbix that alerts an admin when the number of slow requests is > X.
Right now you would have to parse a string, which isn't very stable.
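To illustrate why the string route is fragile, here is a minimal sketch in Python (the language Mgr modules are written in). It assumes the health JSON has a `checks` map with a `REQUEST_SLOW` entry whose summary message reads like "35 slow requests are blocked > 32 sec". That layout and wording are assumptions for illustration, and the function name is hypothetical.

```python
import json
import re

# Assumed wording of the summary message. It is not a stable interface,
# which is exactly the problem with this approach.
SLOW_RE = re.compile(r"(\d+) slow requests are blocked")


def slow_requests_from_health(health_json: str) -> int:
    """Return the slow request count parsed from the summary string, or 0."""
    health = json.loads(health_json)
    check = health.get("checks", {}).get("REQUEST_SLOW")
    if check is None:
        return 0
    match = SLOW_RE.search(check["summary"]["message"])
    # If the wording ever changes, the regex silently stops matching
    # and the count drops to 0 instead of raising an error.
    return int(match.group(1)) if match else 0


# Hand-written sample input, not captured from a real cluster.
sample = json.dumps({
    "status": "HEALTH_WARN",
    "checks": {
        "REQUEST_SLOW": {
            "severity": "HEALTH_WARN",
            "summary": {"message": "35 slow requests are blocked > 32 sec"},
        }
    },
})
print(slow_requests_from_health(sample))
```

A reworded message would make this return 0 without any warning, so a monitoring trigger built on it could go quiet at the worst moment.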
Any ideas?
Wido
* Re: Blocked / Slow requests in health JSON from Mon/Mgr
From: John Spray @ 2017-11-27 11:19 UTC (permalink / raw)
To: Wido den Hollander; +Cc: ceph-devel
On Mon, Nov 27, 2017 at 10:29 AM, Wido den Hollander <wido@42on.com> wrote:
> Hi,
>
> For the Zabbix plugin for the Mgr I wanted to report the amount of block and/or slow requests the cluster is experiencing.
>
> There is no item with a int value in the JSON returned by the Monitors.
>
> What would be the easiest way to obtain these values in a Mgr Module?
>
> Or would we need to expand the JSON the MON reports?
>
> I'd like to make a trigger in Zabbix that if num slow requests is > X a admin is alerted.
>
> Right now you would have to parse a string which isn't very stable.
Kefu has been working on the health checks for slow requests:
https://github.com/ceph/ceph/pull/18614
https://github.com/ceph/ceph/pull/19114
Currently, health checks are very string-ish, but I would really like
them to have more machine-readable stuff (i.e. expand the
health_check_t structure with a generic map to store json-encodable
metadata), and populate that in the same places we generate strings
(e.g. in this instance where PGMap generates the
REQUEST_SLOW/REQUEST_STUCK health checks).
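As a rough sketch of that idea, modelled in Python rather than in the C++ where health_check_t actually lives: the class below is a hypothetical mirror of the structure with an added generic metadata map, filled in at the same point where the summary string is built. The field names, the metadata keys, and the helper function are all assumptions for illustration, not the real definition.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Any, Dict, List


@dataclass
class HealthCheck:
    """Hypothetical stand-in for health_check_t with a metadata map added."""
    severity: str
    summary: str
    detail: List[str] = field(default_factory=list)
    # Generic map holding JSON-encodable values for machine consumers.
    metadata: Dict[str, Any] = field(default_factory=dict)

    def to_json(self) -> str:
        return json.dumps(asdict(self), sort_keys=True)


def make_request_slow_check(num_slow: int, warn_age_sec: int) -> HealthCheck:
    """Build the string and the metadata from the same numbers, in one place."""
    return HealthCheck(
        severity="HEALTH_WARN",
        summary=f"{num_slow} slow requests are blocked > {warn_age_sec} sec",
        metadata={"slow_requests": num_slow, "warn_age_sec": warn_age_sec},
    )


check = make_request_slow_check(35, 32)
print(check.to_json())
```

Consumers would read `metadata` and ignore the summary wording, so the human-readable string could change freely.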
BTW, I'm curious about the use case for thresholding slow requests on
the number of slow requests: wouldn't you want to alert the admin even
if there was only one? If there are false positives then maybe
mon_osd_warn_op_age is the thing to adjust.
John
* Re: Blocked / Slow requests in health JSON from Mon/Mgr
From: Wido den Hollander @ 2017-11-27 12:59 UTC (permalink / raw)
To: John Spray; +Cc: ceph-devel
> On 27 November 2017 at 12:19, John Spray <jspray@redhat.com> wrote:
>
>
> On Mon, Nov 27, 2017 at 10:29 AM, Wido den Hollander <wido@42on.com> wrote:
> > Hi,
> >
> > For the Zabbix plugin for the Mgr I wanted to report the amount of block and/or slow requests the cluster is experiencing.
> >
> > There is no item with a int value in the JSON returned by the Monitors.
> >
> > What would be the easiest way to obtain these values in a Mgr Module?
> >
> > Or would we need to expand the JSON the MON reports?
> >
> > I'd like to make a trigger in Zabbix that if num slow requests is > X a admin is alerted.
> >
> > Right now you would have to parse a string which isn't very stable.
>
> Kefu has been working on the health checks for slow requests:
> https://github.com/ceph/ceph/pull/18614
> https://github.com/ceph/ceph/pull/19114
>
> Currently, health checks are very string-ish, but I would really like
> them to have more machine-readable stuff (i.e. expand the
> health_check_t structure with a generic map to store json-encodable
> metadata), and populate that in the same places we generate strings
> (e.g. in this instance where PGMap generates the
> REQUEST_SLOW/REQUEST_STUCK health checks).
>
Good! That would be nice. Something like this in the JSON:
{
    "block_requests": 13,
    "slow_requests": 35
}
The Zabbix module could pick these values up and send them to Zabbix for further processing.
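A minimal sketch of how a module might consume such a JSON, assuming the two keys from the example above. The `ceph.` item key prefix and the function name are hypothetical, and the actual sending to Zabbix is left out.

```python
import json
from typing import Dict

KEY_PREFIX = "ceph."  # hypothetical Zabbix item key prefix
COUNTERS = ("block_requests", "slow_requests")  # keys from the example JSON


def zabbix_items(counters_json: str) -> Dict[str, int]:
    """Map the integer counters to Zabbix item keys, defaulting to 0."""
    counters = json.loads(counters_json)
    items = {}
    for name in COUNTERS:
        value = counters.get(name, 0)
        # Reject anything that is not a plain integer (bool is an int subclass).
        if isinstance(value, bool) or not isinstance(value, int):
            raise ValueError(f"{name} is not an integer: {value!r}")
        items[KEY_PREFIX + name] = value
    return items


print(zabbix_items('{"block_requests": 13, "slow_requests": 35}'))
```

With real integers in the JSON, no string parsing is needed and the values can go straight into graphs and triggers.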
> BTW, I'm curious about the use case for thresholding slow requests on
> the number of slow requests: wouldn't you want to alert the admin even
> if there was only one? If there are false positives then maybe
> mon_osd_warn_op_age is the thing to adjust
>
Well, not sure. You probably want to alert if one or more occur.
I would just like an integer somewhere in the JSON that I can use in Zabbix for graphing and alerting.
Wido
> John