From: Kun Yi <kunyi@google.com>
Date: Tue, 9 Apr 2019 09:25:46 -0700
Subject: BMC health metrics (again!)
To: OpenBMC Maillist <openbmc@lists.ozlabs.org>

Hello there,

This topic has been brought up several times on the mailing list and offline, but in general it seems we as a community didn't reach a consensus on what things would be the most valuable to monitor, and how to monitor them. While a general-purpose monitoring infrastructure for OpenBMC seems to be a hard problem, I have some simple ideas that I hope can provide immediate and direct benefits.

1. Monitoring host IPMI link reliability (host side)

The essentials I want are "IPMI commands sent" and "IPMI commands succeeded" counts over time. More metrics like response time would be helpful as well. The issue to address here: when some IPMI sensor readings are flaky, it would be really helpful to be able to tell from IPMI command stats whether it is a hardware issue or an IPMI issue. Moreover, it would be a very useful regression test metric for rolling out new BMC software.
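To make the "commands sent" / "commands succeeded" idea concrete, below is a minimal sketch of the kind of counter I have in mind, in Python for brevity. The send function it wraps is hypothetical; it is not tied to any real ipmitool or ipmid interface.

import time
from dataclasses import dataclass

@dataclass
class IpmiLinkStats:
    """Running counters for link reliability; the field names are made up."""
    sent: int = 0
    succeeded: int = 0
    total_latency_s: float = 0.0

    def record(self, ok: bool, latency_s: float) -> None:
        self.sent += 1
        self.succeeded += int(ok)
        self.total_latency_s += latency_s

    @property
    def success_ratio(self) -> float:
        return self.succeeded / self.sent if self.sent else 1.0

def instrument(send_fn, stats: IpmiLinkStats):
    """Wrap a hypothetical 'send one IPMI command' callable with counting."""
    def wrapper(*args, **kwargs):
        start = time.monotonic()
        try:
            response = send_fn(*args, **kwargs)
        except Exception:
            stats.record(ok=False, latency_s=time.monotonic() - start)
            raise
        stats.record(ok=True, latency_s=time.monotonic() - start)
        return response
    return wrapper

The same counters, dumped periodically, could then be compared across BMC software versions as a regression metric.
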
Looking at the host IPMI side, there are some metrics exposed through /proc/ipmi/0/si_stats if the ipmi_si driver is used, but I haven't dug into whether it contains information mapping to the interrupts. Time to read the source code, I guess.
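For the driver-side counters, a rough sketch of a sampler is below. It assumes /proc/ipmi/0/si_stats prints simple "name: value" lines and treats the counter names as opaque, since I haven't confirmed the exact format (it varies by kernel/driver version).

import time

SI_STATS_PATH = "/proc/ipmi/0/si_stats"

def read_si_stats(path=SI_STATS_PATH):
    """Parse the stats file into {counter name: integer value}."""
    counters = {}
    with open(path) as f:
        for line in f:
            name, sep, value = line.rpartition(":")
            if sep and value.strip().lstrip("-").isdigit():
                counters[name.strip()] = int(value)
    return counters

def watch(interval_s=60):
    """Print per-interval deltas, which is what a link-health metric needs."""
    prev = read_si_stats()
    while True:
        time.sleep(interval_s)
        cur = read_si_stats()
        print({name: cur[name] - prev.get(name, 0) for name in cur})
        prev = cur

if __name__ == "__main__":
    watch()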

Another idea would be to instrument caller libraries like the interfaces in ipmitool, though I feel that approach is harder due to fragmentation of IPMI libraries.

2. Read and expose core BMC performance metrics from procfs

This is straightforward: have a smallish daemon (or bmc-state-manager) read, parse, and process procfs and put the values on D-Bus. Core metrics I'm interested in getting this way: load average, memory, disk used/available, net stats... The values can then simply be exported as IPMI sensors or Redfish resource properties.
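A rough sketch of the collection half, in Python for brevity (the metric names are made up, and the D-Bus publishing step is intentionally left out; a real implementation would more likely be a small C++ daemon using sdbusplus):

import shutil

def load_average():
    """1/5/15-minute load averages from /proc/loadavg."""
    with open("/proc/loadavg") as f:
        one, five, fifteen = f.read().split()[:3]
    return {"Load1": float(one), "Load5": float(five), "Load15": float(fifteen)}

def memory_kib():
    """Selected fields from /proc/meminfo, in KiB."""
    wanted = {"MemTotal", "MemFree", "MemAvailable"}
    out = {}
    with open("/proc/meminfo") as f:
        for line in f:
            key, _, rest = line.partition(":")
            if key in wanted:
                out[key] = int(rest.split()[0])
    return out

def rootfs_usage():
    """Used/free bytes on the root filesystem."""
    usage = shutil.disk_usage("/")
    return {"RootfsUsed": usage.used, "RootfsFree": usage.free}

if __name__ == "__main__":
    metrics = {**load_average(), **memory_kib(), **rootfs_usage()}
    for name, value in sorted(metrics.items()):
        print(f"{name}: {value}")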

A nice byproduct of this effort would be a procfs parsing library. Since different platforms will probably have different monitoring requirements and the procfs output format is not standardized, I'm thinking the user would just provide a configuration file containing a list of (procfs path, property regex, D-Bus property name) entries, and compile-time generated code would provide an object for each property.
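To illustrate, one possible shape for such a configuration, interpreted at runtime here for simplicity (all paths, regexes, and property names below are hypothetical placeholders; the real thing would generate code at compile time as described above):

import re

# Hypothetical configuration: (procfs path, property regex, D-Bus property name).
# The regex's first capture group is taken as the property value.
CONFIG = [
    ("/proc/loadavg", r"^(\S+)", "LoadAverage1Min"),
    ("/proc/meminfo", r"^MemAvailable:\s+(\d+) kB", "MemAvailableKiB"),
    ("/proc/uptime", r"^(\S+)", "UptimeSeconds"),
]

def collect(config=CONFIG):
    """Evaluate each (path, regex, name) entry and return {name: raw value}."""
    values = {}
    for path, pattern, prop_name in config:
        with open(path) as f:
            match = re.search(pattern, f.read(), re.MULTILINE)
        if match:
            values[prop_name] = match.group(1)
    return values

if __name__ == "__main__":
    for name, value in collect().items():
        print(f"{name} = {value}")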

All of this is merely thoughts and nothing concrete. With that said, it would be really great if you could provide some feedback such as "I want this, but I really need that feature", or let me know it's all implemented already :)

If this seems valuable, after gathering more feedback on feature requirements, I'm going to turn them into design docs and upload them for review.

--
Regards,
Kun