Re: Metrics vs Logging, Continued

* Re: Metrics vs Logging, Continued
@ 2018-01-03  2:22 Christopher Covington
  2018-01-03  4:08 ` Deepak Kodihalli
  2018-01-04 15:39 ` Michael E Brown
  0 siblings, 2 replies; 10+ messages in thread
From: Christopher Covington @ 2018-01-03  2:22 UTC (permalink / raw)
  To: openbmc, Michael_E_Brown, venture

Hi Michael, Patrick,

I probably should have hopped on this list months ago. Thanks for your patience as I come up to
speed on your code, configure my mail client to suit this list, and so on.

> Prometheus metrics is fundamentally a pull model, not a push model. If you have a pull model,
> it greatly simplifies the dependencies:

>	- Pull metrics internally or externally (daemons listen on 127.0.0.1, optionally reverse proxy
>	  that through your web service).

An option for on-demand metrics (as opposed to periodic, always-on monitoring) is nice. I would
use it to more highly scrutinize upgrades in progress for example.

>	- Optionally run the metrics server or not depending on configuration.

I agree it should fail gracefully when there is no server present, and think this generalizes to
other network services, even NTP and DHCP.

>	- Pull model naturally self-limits in performance-limited cases... you don’t have a thundering
>	  herd of daemons trying to push metrics. In case metrics server gets loaded it will naturally
>	  slow down polls to backend daemons.

At large scale you'll either need multiple pollers or load-balancing for the receiving server. I'm
not sure what the best solution is. Is load-balancing perhaps more commonplace?

> But what I think would be pretty nice is if you could point graphana/Prometheus towards every
> BMC on your network to get nice graphs of temp, fan speeds, etc.

For metrics/counters, I've been centrally pulling/polling from a fleet running the following RESTful
API:

https://github.com/facebook/openbmc/tree/helium/common/recipes-rest/rest-api/files

But polling the whole fleet doesn't seem ideal, so I'm wondering about a push model.

Prometheus looks interesting, thanks for the pointer. It does seem to support a push model
https://prometheus.io/docs/instrumenting/pushing/

Do Go language applications run reasonably well on ASpeed 2400 SoCs?

I've heard that OpenWRT uses collectd: https://wiki.openwrt.org/doc/howto/statistic.collectd

Thanks,
Christopher Covington

^ permalink raw reply	[flat|nested] 10+ messages in thread