* Blocked / Slow requests in health JSON from Mon/Mgr
@ 2017-11-27 10:29 Wido den Hollander
  2017-11-27 11:19 ` John Spray
  0 siblings, 1 reply; 3+ messages in thread
From: Wido den Hollander @ 2017-11-27 10:29 UTC (permalink / raw)
  To: ceph-devel

Hi,

For the Zabbix plugin for the Mgr I want to report the number of blocked and/or slow requests the cluster is experiencing.

There is no item with an integer value for this in the JSON returned by the Monitors.

What would be the easiest way to obtain these values in a Mgr module?

Or would we need to expand the JSON the MON reports?

I'd like to make a trigger in Zabbix so that an admin is alerted if the number of slow requests is > X.

Right now you would have to parse a string, which isn't very stable.
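
To illustrate why string parsing is fragile, here is a minimal Python sketch of what a module would have to do today. The layout of the health report and the wording of the summary message are assumptions made up for this example, and count_slow_requests is a hypothetical helper, not an existing Mgr API.

```python
import json
import re

# Hypothetical health report; the layout and the message wording are
# assumptions for illustration, not copied from a real cluster.
SAMPLE_HEALTH = json.dumps({
    "status": "HEALTH_WARN",
    "checks": {
        "REQUEST_SLOW": {
            "severity": "HEALTH_WARN",
            "summary": {"message": "35 slow requests are blocked > 32 sec"},
        }
    },
})

# Matches a leading integer followed by "slow request" or "slow requests".
SLOW_RE = re.compile(r"^(\d+)\s+slow requests?\b")


def count_slow_requests(health_json):
    """Return the slow request count scraped from the summary text, or 0."""
    health = json.loads(health_json)
    check = health.get("checks", {}).get("REQUEST_SLOW")
    if check is None:
        return 0
    message = check.get("summary", {}).get("message", "")
    match = SLOW_RE.match(message)
    # If the wording of the message ever changes, this silently returns 0.
    return int(match.group(1)) if match else 0


print(count_slow_requests(SAMPLE_HEALTH))
```

The weak point is the regular expression: it is tied to the exact wording of a human-readable message, so a rewording would make the count drop to 0 without any error.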

Any ideas?

Wido


* Re: Blocked / Slow requests in health JSON from Mon/Mgr
  2017-11-27 10:29 Blocked / Slow requests in health JSON from Mon/Mgr Wido den Hollander
@ 2017-11-27 11:19 ` John Spray
  2017-11-27 12:59   ` Wido den Hollander
  0 siblings, 1 reply; 3+ messages in thread
From: John Spray @ 2017-11-27 11:19 UTC (permalink / raw)
  To: Wido den Hollander; +Cc: ceph-devel

On Mon, Nov 27, 2017 at 10:29 AM, Wido den Hollander <wido@42on.com> wrote:
> Hi,
>
> For the Zabbix plugin for the Mgr I want to report the number of blocked and/or slow requests the cluster is experiencing.
>
> There is no item with an integer value for this in the JSON returned by the Monitors.
>
> What would be the easiest way to obtain these values in a Mgr module?
>
> Or would we need to expand the JSON the MON reports?
>
> I'd like to make a trigger in Zabbix so that an admin is alerted if the number of slow requests is > X.
>
> Right now you would have to parse a string, which isn't very stable.

Kefu has been working on the health checks for slow requests:
https://github.com/ceph/ceph/pull/18614
https://github.com/ceph/ceph/pull/19114

Currently, health checks are very string-ish, but I would really like
them to have more machine-readable stuff (i.e. expand the
health_check_t structure with a generic map to store json-encodable
metadata), and populate that in the same places we generate strings
(e.g. in this instance where PGMap generates the
REQUEST_SLOW/REQUEST_STUCK health checks).
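
As a rough sketch of that idea, written in Python rather than the C++ of health_check_t: the HealthCheck class, its field names, the metadata keys, and the message wording below are all assumptions for illustration, not the actual Ceph structure. The point is only that the counters are stored next to the string, from the same inputs, at the place where the string is built.

```python
import json
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class HealthCheck:
    """Python stand-in for a health check record (hypothetical layout)."""
    severity: str
    summary: str
    detail: List[str] = field(default_factory=list)
    # The proposed addition: a generic map of JSON-encodable metadata.
    metadata: Dict[str, Any] = field(default_factory=dict)

    def to_json_obj(self):
        """Return a plain dict that json.dumps can serialize."""
        return {
            "severity": self.severity,
            "summary": {"message": self.summary},
            "detail": [{"message": d} for d in self.detail],
            "metadata": self.metadata,
        }


def make_slow_request_check(num_slow, age_sec):
    """Build the human-readable string and the metadata from the same inputs."""
    return HealthCheck(
        severity="HEALTH_WARN",
        summary="%d slow requests are blocked > %d sec" % (num_slow, age_sec),
        metadata={"slow_requests": num_slow, "age_sec": age_sec},
    )


check = make_slow_request_check(35, 32)
print(json.dumps({"REQUEST_SLOW": check.to_json_obj()}, indent=2))
```

A consumer would then read the integer from the metadata map and never touch the summary text.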

BTW, I'm curious about the use case for thresholding slow requests on
the number of slow requests: wouldn't you want to alert the admin even
if there was only one?  If there are false positives then maybe
mon_osd_warn_op_age is the thing to adjust.

John


* Re: Blocked / Slow requests in health JSON from Mon/Mgr
  2017-11-27 11:19 ` John Spray
@ 2017-11-27 12:59   ` Wido den Hollander
  0 siblings, 0 replies; 3+ messages in thread
From: Wido den Hollander @ 2017-11-27 12:59 UTC (permalink / raw)
  To: John Spray; +Cc: ceph-devel


> On 27 November 2017 at 12:19, John Spray <jspray@redhat.com> wrote:
> 
> Kefu has been working on the health checks for slow requests:
> https://github.com/ceph/ceph/pull/18614
> https://github.com/ceph/ceph/pull/19114
> 
> Currently, health checks are very string-ish, but I would really like
> them to have more machine-readable stuff (i.e. expand the
> health_check_t structure with a generic map to store json-encodable
> metadata), and populate that in the same places we generate strings
> (e.g. in this instance where PGMap generates the
> REQUEST_SLOW/REQUEST_STUCK health checks).
> 

Good! That would be nice, something like this in the JSON:

{
    "block_requests": 13,
    "slow_requests": 35
}

The Zabbix module could pick these values up and send them to Zabbix for further processing.
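
A minimal Python sketch of how the module side could look, assuming the two counters arrive exactly as in the example above. The Zabbix item keys and the helper names are made up for illustration; in practice the threshold would more likely live in a Zabbix trigger than in the module itself.

```python
import json

# The example payload from above; where it would sit inside the health
# report is not settled in this thread.
PAYLOAD = '{"block_requests": 13, "slow_requests": 35}'

# Hypothetical Zabbix item keys, chosen only for this sketch.
ITEM_KEYS = {
    "block_requests": "ceph.num_blocked_requests",
    "slow_requests": "ceph.num_slow_requests",
}


def to_zabbix_items(payload):
    """Map the integer counters in the payload to Zabbix key/value pairs."""
    data = json.loads(payload)
    items = {}
    for field_name, item_key in ITEM_KEYS.items():
        # A missing counter is reported as 0 so graphs have no gaps.
        items[item_key] = int(data.get(field_name, 0))
    return items


def should_alert(items, threshold=0):
    """True if any counter is above the threshold (default: one or more)."""
    return any(value > threshold for value in items.values())


items = to_zabbix_items(PAYLOAD)
print(items)
print(should_alert(items))
```

With threshold=0 this alerts on a single slow or blocked request; a larger value gives the "> X" behaviour.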

> BTW, I'm curious about the use case for thresholding slow requests on
> the number of slow requests: wouldn't you want to alert the admin even
> if there was only one?  If there are false positives then maybe
> mon_osd_warn_op_age is the thing to adjust.
> 

Well, not sure. You probably want to alert if one or more occur.

I would just want an integer in the JSON somewhere that I can use in Zabbix for graphing and alerting.

Wido

> John

