linux-scsi.vger.kernel.org archive mirror
* On FC host statistics ( /sys/class/fc_host/hostX/statistics) and monitoring plugins
From: Ulrich Windl @ 2020-05-19 12:23 UTC
  To: linux-scsi

Hi!

I've developed a monitoring plugin that reads /sys/class/fc_host/hostX/statistics for numbers.
I'm not very happy about it for the following reasons:

1) Some counters (e.g. fcp_frame_alloc_failures) are not supported by some drivers/HBAs (e.g. QLE2690), and the value read from the file is "0xffffffffffffffff". The source seems to set this to -1, but it reads back as an unsigned value. A real 64-bit counter is very unlikely to ever reach this value, but it's still possible.

2) While the statistics counters seem to be 64 bits wide, I've experienced a "wrap around" at fewer bits (maybe around 40) for the bfa driver. I have no idea whether it's a hardware restriction or a firmware/driver bug, however. I did my best to make sure it's not a problem in my plugin (assuming those counters are read atomically when using a single read()).

3) The bfa driver has one (maybe even two) "offlines" event counters, but as far as I can tell they are not exported to the fc_host statistics. I've seen several "bfa 0000:0e:00.0: Target (WWN = 50:01:43:80:11:36:89:da) connectivity lost for initiator (WWN = 10:00:00:05:1e:fb:3e:a0)" messages in syslog, but I'd like to see such events through the statistics.

4) The bfa driver (QLogic-815) sets link_failure_count to 1 after a clean reboot. This makes the monitoring plugin report a "link failure". That's not very nice.
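
A minimal Python sketch of how a plugin could cope with points 1), 2) and 4), under stated assumptions. It treats 0xffffffffffffffff as "not supported", which would misread a genuine counter at that value, as noted in 1). It handles wrap-around by taking differences modulo an assumed counter width. Alerting on differences between samples rather than on absolute values would also keep a link_failure_count of 1 after boot from raising an alarm. The function names and the `bits` parameter are illustrative only; they are not part of any existing plugin or kernel interface.

```python
import os

# How an unsupported counter (set to -1 by the driver) reads back from sysfs.
UNSUPPORTED = 0xFFFFFFFFFFFFFFFF


def parse_stat(text):
    """Return the counter as an int, or None if it reads back as -1."""
    value = int(text.strip(), 16)  # base 16 accepts the "0x" prefix
    return None if value == UNSUPPORTED else value


def read_host_stats(stats_dir):
    """Read every counter file in one .../statistics directory."""
    stats = {}
    for name in sorted(os.listdir(stats_dir)):
        if name == "reset_statistics":  # shown as "()" in the listings; not a counter
            continue
        try:
            with open(os.path.join(stats_dir, name)) as f:
                stats[name] = parse_stat(f.read())
        except (OSError, ValueError):
            stats[name] = None  # unreadable or unparsable: treat as unsupported
    return stats


def counter_delta(prev, cur, bits=64):
    """Increment between two samples, assuming at most one wrap at `bits` bits."""
    return (cur - prev) % (1 << bits)
```

For example, counter_delta(prev, cur, bits=40) would give the right increment across a wrap if the bfa counters really wrap at 40 bits; that width is only a guess.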

My idea (probably more universal than FC host statistics alone) was to provide another file (maybe named "statistics") that lists the names of the implemented statistics counters (i.e. leaving out those set to -1), together with the number of significant bits (like 32 or 64) and the type of each value (like "counter", "gauge", "boolean", "enum", "string", etc.).
"string" would be free text (I doubt it makes sense for statistics, but anyhow), "enum" would be single-word tokens (e.g. _not_ " NPort (fabric via point-to-point)"), "counter" would count bytes or events (maybe a separate type "event_count[er]" would make sense), and "gauge" would be a non-monotonic value like utilization...

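To make the proposal concrete, here is a sketch of how a plugin might consume such a descriptor file. The line format (name, significant bits and type, separated by whitespace) is purely an assumption for illustration; the proposed file does not exist, and StatInfo and parse_descriptor are hypothetical names.

```python
from typing import NamedTuple

# Value types named in the proposal above.
VALID_TYPES = {"counter", "gauge", "boolean", "enum", "string"}


class StatInfo(NamedTuple):
    name: str
    bits: int   # number of significant bits, e.g. 32 or 64
    kind: str   # one of VALID_TYPES


def parse_descriptor(text):
    """Parse hypothetical 'name bits type' lines into {name: StatInfo}."""
    infos = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 3:
            continue  # skip blank or malformed lines
        name, bits, kind = fields
        if not bits.isdigit() or kind not in VALID_TYPES:
            continue  # skip entries this sketch does not understand
        infos[name] = StatInfo(name, int(bits), kind)
    return infos
```

With such a file, a plugin would only poll the names listed there and could use the stated width when computing counter differences, instead of guessing.
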
Finally, an example of what the existing "statistics" directory contains (4.12.14-95.51-default from SLES12 SP4):
/sys/class/fc_host/host0/statistics/dumped_frames: 0x0
/sys/class/fc_host/host0/statistics/error_frames: 0x0
/sys/class/fc_host/host0/statistics/fc_no_free_exch: 0x0
/sys/class/fc_host/host0/statistics/fc_no_free_exch_xid: 0x0
/sys/class/fc_host/host0/statistics/fc_non_bls_resp: 0x0
/sys/class/fc_host/host0/statistics/fc_seq_not_found: 0x0
/sys/class/fc_host/host0/statistics/fc_xid_busy: 0x0
/sys/class/fc_host/host0/statistics/fc_xid_not_found: 0x0
/sys/class/fc_host/host0/statistics/fcp_control_requests: 0x0
/sys/class/fc_host/host0/statistics/fcp_frame_alloc_failures: 0x0
/sys/class/fc_host/host0/statistics/fcp_input_megabytes: 0x0
/sys/class/fc_host/host0/statistics/fcp_input_requests: 0x0
/sys/class/fc_host/host0/statistics/fcp_output_megabytes: 0x0
/sys/class/fc_host/host0/statistics/fcp_output_requests: 0x0
/sys/class/fc_host/host0/statistics/fcp_packet_aborts: 0x0
/sys/class/fc_host/host0/statistics/fcp_packet_alloc_failures: 0x0
/sys/class/fc_host/host0/statistics/invalid_crc_count: 0x0
/sys/class/fc_host/host0/statistics/invalid_tx_word_count: 0x0
/sys/class/fc_host/host0/statistics/link_failure_count: 0x1
/sys/class/fc_host/host0/statistics/lip_count: 0x0
/sys/class/fc_host/host0/statistics/loss_of_signal_count: 0x0
/sys/class/fc_host/host0/statistics/loss_of_sync_count: 0x0
/sys/class/fc_host/host0/statistics/nos_count: 0x0
/sys/class/fc_host/host0/statistics/prim_seq_protocol_err_count: 0x0
/sys/class/fc_host/host0/statistics/reset_statistics: ()
/sys/class/fc_host/host0/statistics/rx_frames: 0x18221
/sys/class/fc_host/host0/statistics/rx_words: 0x1a3e8f9
/sys/class/fc_host/host0/statistics/seconds_since_last_reset: 0x2a79
/sys/class/fc_host/host0/statistics/tx_frames: 0x77f4
/sys/class/fc_host/host0/statistics/tx_words: 0x82483

And here's what 4.12.14-122.17-default (SLES12 SP5) contains for a different FC host:
/sys/class/fc_host/host3/statistics/dumped_frames: 0x0
/sys/class/fc_host/host3/statistics/error_frames: 0x0
/sys/class/fc_host/host3/statistics/fc_no_free_exch: 0xffffffffffffffff
/sys/class/fc_host/host3/statistics/fc_no_free_exch_xid: 0xffffffffffffffff
/sys/class/fc_host/host3/statistics/fc_non_bls_resp: 0xffffffffffffffff
/sys/class/fc_host/host3/statistics/fc_seq_not_found: 0xffffffffffffffff
/sys/class/fc_host/host3/statistics/fc_xid_busy: 0xffffffffffffffff
/sys/class/fc_host/host3/statistics/fc_xid_not_found: 0xffffffffffffffff
/sys/class/fc_host/host3/statistics/fcp_control_requests: 0x19
/sys/class/fc_host/host3/statistics/fcp_frame_alloc_failures: 0xffffffffffffffff
/sys/class/fc_host/host3/statistics/fcp_input_megabytes: 0x2829
/sys/class/fc_host/host3/statistics/fcp_input_requests: 0x114b54e
/sys/class/fc_host/host3/statistics/fcp_output_megabytes: 0x11cbf
/sys/class/fc_host/host3/statistics/fcp_output_requests: 0xd87b98
/sys/class/fc_host/host3/statistics/fcp_packet_aborts: 0xffffffffffffffff
/sys/class/fc_host/host3/statistics/fcp_packet_alloc_failures: 0xffffffffffffffff
/sys/class/fc_host/host3/statistics/invalid_crc_count: 0x0
/sys/class/fc_host/host3/statistics/invalid_tx_word_count: 0x0
/sys/class/fc_host/host3/statistics/link_failure_count: 0x0
/sys/class/fc_host/host3/statistics/lip_count: 0x0
/sys/class/fc_host/host3/statistics/loss_of_signal_count: 0x0
/sys/class/fc_host/host3/statistics/loss_of_sync_count: 0x0
/sys/class/fc_host/host3/statistics/nos_count: 0x0
/sys/class/fc_host/host3/statistics/prim_seq_protocol_err_count: 0x0
/sys/class/fc_host/host3/statistics/rx_frames: 0x43a5ec7
/sys/class/fc_host/host3/statistics/rx_words: 0x2829a23ac
/sys/class/fc_host/host3/statistics/seconds_since_last_reset: 0x57dffc
/sys/class/fc_host/host3/statistics/tx_frames: 0x4b9e39d
/sys/class/fc_host/host3/statistics/tx_words: 0x11cbf9bc00

So it's hard to tell which FC HBA supports which statistics numbers...
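
As a rough illustration, listings in the "path: value" form shown above could be scanned for the -1 pattern to see which counters each host appears not to support. This is only a sketch: unsupported_by_host is a made-up name, and it relies on the assumption that 0xffffffffffffffff always means "not implemented".

```python
import re

# Matches lines such as
# "/sys/class/fc_host/host3/statistics/fc_xid_busy: 0xffffffffffffffff"
LINE_RE = re.compile(
    r"^/sys/class/fc_host/(host\d+)/statistics/(\w+): (0x[0-9a-fA-F]+)$"
)

UNSUPPORTED = 0xFFFFFFFFFFFFFFFF  # assumed marker for "not implemented"


def unsupported_by_host(listing):
    """Map each host in the listing to the counters that read back as -1."""
    result = {}
    for line in listing.splitlines():
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # e.g. the "reset_statistics: ()" line
        host, name, value = m.group(1), m.group(2), int(m.group(3), 16)
        names = result.setdefault(host, [])
        if value == UNSUPPORTED:
            names.append(name)
    return result
```

Applied to the two listings above, this would report an empty list for host0 and the fc_* and fcp_*_failures/aborts counters for host3, but it cannot distinguish a counter that is unsupported from one that is supported and simply still at 0.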

The only message I see from bfa after boot (regarding link_failure_count set to 1) is "kernel: bfa 0000:0e:00.0: Logical port online: WWN = 10:00:00:05:1e:fb:3e:a0 Role = Initiator"...

Regards,
Ulrich Windl



* Re: On FC host statistics ( /sys/class/fc_host/hostX/statistics) and monitoring plugins
From: Martin K. Petersen @ 2020-05-27  2:05 UTC
  To: Ulrich Windl; +Cc: linux-scsi, James Smart


Ulrich,

> 1) Some numbers (e.g. fcp_frame_alloc_failures) are not supported by
> some drivers (e.g. QLE2690) and the value read from the file is
> "0xffffffffffffffff". The source seems to set this to -1, but when
> reading it back it looks like unsigned. For a 64-bit counter it's
> quite unlikely to read this value, but it's still possible.

I agree that's messy.

> 2) While statistics counters seems to be 64 bits, I've experienced a
> "wrap around" at fewer bit positions (maybe like 40 bits) for the bfa
> driver. I have no idea whether it's a hardware restriction or a
> firmware/driver bug, however. I did my best to make sure it's not a
> problem of my plugin (assuming those counters are read atomically when
> using one read())

bfa has been dead for about 5 years so don't expect any fixes in that
department.

> My idea was (probably more universal than restricted to FC host
> statistics) to provide another file (maybe named "statistics") that
> lists the names of implemented statistics counters (i.e.: leaving out
> those set to -1) together with the significant bits (like 32 or 64),
> the type of the value (like "counter", "gauge", "boolean", "enum",
> "string", etc.)
> "string" would be free text (I doubt it will make sense for
> statistics, but anyhow), "enum" would be single word tokens
> (e.g. _not_ " NPort (fabric via point-to-point)"), "counter" would
> count bytes or events (maybe a type "event_count[er]" may make sense),
> and "gauge" would be a non-monotonic value like utilization...

I'm not a particularly big fan of -1 reporting. But it seems that the
path of least resistance is to fix the sysfs unsigned issue.

-- 
Martin K. Petersen	Oracle Linux Engineering
