* [RFC PATCH net-next 0/6] devlink: Add device metric support
@ 2020-08-17 12:50 Ido Schimmel
  2020-08-17 12:50 ` [RFC PATCH net-next 1/6] devlink: Add device metric infrastructure Ido Schimmel
                   ` (6 more replies)
  0 siblings, 7 replies; 29+ messages in thread
From: Ido Schimmel @ 2020-08-17 12:50 UTC (permalink / raw)
  To: netdev
  Cc: davem, kuba, jiri, amcohen, danieller, mlxsw, roopa, dsahern,
	andrew, f.fainelli, vivien.didelot, saeedm, tariqt, ayal, eranbe,
	mkubecek, Ido Schimmel

From: Ido Schimmel <idosch@nvidia.com>

This patch set extends devlink to allow device drivers to expose device
metrics to user space in a standard and extensible fashion, as opposed
to the driver-specific debugfs approach.

This is joint work with Amit Cohen and Danielle Ratson, done during a
two-day company hackathon.

Motivation
==========

Certain devices have metrics (e.g., diagnostic counters, histograms)
that are useful to expose to user space for testing and debugging
purposes.

Currently, there is no standardized interface through which these
metrics can be exposed and most drivers resort to debugfs, which is not
very welcome in the networking subsystem, for good reasons. For one, it
is not a stable interface on which users can rely. Secondly, it results
in duplicated code and inconsistent interfaces when drivers implement
similar functionality.

While Ethernet drivers can expose per-port counters to user space via
ethtool, they cannot expose device-wide metrics or configurable metrics
such as counters that can be enabled / disabled or histogram agents.

Solution overview
=================

Currently, the only supported metric type is a counter, but histograms
will be added in the future. The proposed interface is:

devlink dev metric show [ DEV metric METRIC | group GROUP ]
devlink dev metric set DEV metric METRIC [ group GROUP ]

Device drivers can dynamically register or unregister their metrics with
devlink by calling devlink_metric_counter_create() /
devlink_metric_destroy().
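
A minimal driver-side sketch of the flow (modeled on the netdevsim
patch later in this series; the foo_* names are placeholders):

static int foo_counter_get(struct devlink_metric *metric, u64 *p_val)
{
	struct foo_priv *foo = devlink_metric_priv(metric);

	/* foo_hw_read_counter() stands in for a driver-specific read of
	 * the counter from the device.
	 */
	*p_val = foo_hw_read_counter(foo);

	return 0;
}

static const struct devlink_metric_ops foo_counter_ops = {
	.counter_get = foo_counter_get,
};

	/* e.g., during driver initialization: */
	metric = devlink_metric_counter_create(devlink, "foo_counter",
					       &foo_counter_ops, foo);
	if (IS_ERR(metric))
		return PTR_ERR(metric);

	/* and during teardown: */
	devlink_metric_destroy(devlink, metric);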

Grouping allows user space to associate certain metrics with a group so
that they can be queried from the kernel with a single request and
retrieved in a single response filtered by the kernel (i.e., the kernel
sets NLM_F_DUMP_FILTERED).

Example
=======

Instantiate two netdevsim devices:

# echo "10 1" > /sys/bus/netdevsim/new_device
# echo "20 1" > /sys/bus/netdevsim/new_device

Dump all available metrics:

# devlink -s dev metric show
netdevsim/netdevsim10:
   metric dummy_counter type counter group 0 value 2
netdevsim/netdevsim20:
   metric dummy_counter type counter group 0 value 2

Dump a specific metric:

# devlink -s dev metric show netdevsim/netdevsim10 metric dummy_counter
netdevsim/netdevsim10:
   metric dummy_counter type counter group 0 value 3

Set a metric to a different group:

# devlink dev metric set netdevsim/netdevsim10 metric dummy_counter group 10

Dump all metrics in a specific group:

# devlink -s dev metric show group 10
netdevsim/netdevsim10:
   metric dummy_counter type counter group 10 value 4

Future extensions
=================

1. Enablement and disablement of metrics. This is useful when a metric
adds latency while enabled or consumes limited resources (e.g.,
counters or histogram agents). It is up to the device driver to decide
whether a metric is enabled by default. Proposed interface:

devlink dev metric set DEV metric METRIC [ group GROUP ]
	[ enable { true | false } ]

2. Histogram metrics. Some devices have the ability to calculate
histograms in hardware by sampling a specific parameter multiple times
per second. For example, the transmission queue depth of a port. This
enables the debugging of microbursts which would otherwise be invisible.
While this can be achieved in software using BPF, it is not applicable
when the data plane is offloaded as the CPU does not see the traffic.
Proposed interface:

devlink dev metric set DEV metric METRIC [ group GROUP ]
	[ enable { true | false } ] [ hist_type { linear | exp } ]
	[ hist_sample_interval SAMPLE ] [ hist_min MIN ] [ hist_max MAX ]
	[ hist_buckets BUCKETS ]

3. Per-port metrics. While all metrics could be exposed as global ones
and namespaced per port by naming them accordingly, there is value in
allowing user space to dump all the metrics related to a certain port.
Proposed interface:

devlink port metric set DEV/PORT_INDEX metric METRIC [ group GROUP ]
	[ enable { true | false } ] [ hist_type { linear | exp } ]
	[ hist_sample_interval SAMPLE ] [ hist_min MIN ] [ hist_max MAX ]
	[ hist_buckets BUCKETS ]
devlink port metric show [ DEV/PORT_INDEX metric METRIC | group GROUP ]

To avoid duplicating ethtool functionality, we can decide to expose via
this interface only:

1. Configurable metrics
2. Metrics that are not specific to Ethernet ports

TODO
====

1. Add devlink-metric man page
2. Add selftests over mlxsw

Ido Schimmel (6):
  devlink: Add device metric infrastructure
  netdevsim: Add devlink metric support
  selftests: netdevsim: Add devlink metric tests
  mlxsw: reg: Add Tunneling NVE Counters Register
  mlxsw: reg: Add Tunneling NVE Counters Register Version 2
  mlxsw: spectrum_nve: Expose VXLAN counters via devlink-metric

 .../networking/devlink/devlink-metric.rst     |  37 ++
 Documentation/networking/devlink/index.rst    |   1 +
 Documentation/networking/devlink/mlxsw.rst    |  36 ++
 drivers/net/ethernet/mellanox/mlxsw/reg.h     | 104 ++++++
 .../ethernet/mellanox/mlxsw/spectrum_nve.h    |  10 +
 .../mellanox/mlxsw/spectrum_nve_vxlan.c       | 285 +++++++++++++++
 drivers/net/netdevsim/dev.c                   |  92 ++++-
 drivers/net/netdevsim/netdevsim.h             |   1 +
 include/net/devlink.h                         |  18 +
 include/uapi/linux/devlink.h                  |  19 +
 net/core/devlink.c                            | 346 ++++++++++++++++++
 .../drivers/net/netdevsim/devlink.sh          |  49 ++-
 12 files changed, 995 insertions(+), 3 deletions(-)
 create mode 100644 Documentation/networking/devlink/devlink-metric.rst

-- 
2.26.2


* [RFC PATCH net-next 1/6] devlink: Add device metric infrastructure
  2020-08-17 12:50 [RFC PATCH net-next 0/6] devlink: Add device metric support Ido Schimmel
@ 2020-08-17 12:50 ` Ido Schimmel
  2020-08-17 14:12   ` Andrew Lunn
  2020-08-17 12:50 ` [RFC PATCH net-next 2/6] netdevsim: Add devlink metric support Ido Schimmel
                   ` (5 subsequent siblings)
  6 siblings, 1 reply; 29+ messages in thread
From: Ido Schimmel @ 2020-08-17 12:50 UTC (permalink / raw)
  To: netdev
  Cc: davem, kuba, jiri, amcohen, danieller, mlxsw, roopa, dsahern,
	andrew, f.fainelli, vivien.didelot, saeedm, tariqt, ayal, eranbe,
	mkubecek, Ido Schimmel

From: Ido Schimmel <idosch@nvidia.com>

Add an infrastructure that allows device drivers to dynamically register
and unregister their supported metrics with devlink. The metrics and
their values are exposed to user space, which can decide to group certain
metrics together. This allows user space to request a filtered dump of
only the metrics that are members of the provided group.

Currently, the only supported metric type is a counter, but histograms
will be added in the future for devices that implement histogram agents
in hardware.

Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Signed-off-by: Danielle Ratson <danieller@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
---
 .../networking/devlink/devlink-metric.rst     |  26 ++
 Documentation/networking/devlink/index.rst    |   1 +
 include/net/devlink.h                         |  18 +
 include/uapi/linux/devlink.h                  |  19 +
 net/core/devlink.c                            | 346 ++++++++++++++++++
 5 files changed, 410 insertions(+)
 create mode 100644 Documentation/networking/devlink/devlink-metric.rst

diff --git a/Documentation/networking/devlink/devlink-metric.rst b/Documentation/networking/devlink/devlink-metric.rst
new file mode 100644
index 000000000000..cf5c5b4e4077
--- /dev/null
+++ b/Documentation/networking/devlink/devlink-metric.rst
@@ -0,0 +1,26 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+==============
+Devlink Metric
+==============
+
+The ``devlink-metric`` mechanism allows device drivers to expose device metrics
+to user space in a standard and extensible fashion. It provides an alternative
+to the driver-specific debugfs interface.
+
+Metric Types
+============
+
+The ``devlink-metric`` mechanism supports the following metric types:
+
+  * ``counter``: Monotonically increasing. Cannot be reset.
+
+Metrics Documentation
+=====================
+
+All the metrics exposed by a device driver must be clearly documented in the
+driver-specific ``devlink`` documentation under
+``Documentation/networking/devlink/``.
+
+When possible, a selftest (under ``tools/testing/selftests/drivers/``) should
+also be provided to ensure the metrics are updated under the right conditions.
diff --git a/Documentation/networking/devlink/index.rst b/Documentation/networking/devlink/index.rst
index 7684ae5c4a4a..b6f353384968 100644
--- a/Documentation/networking/devlink/index.rst
+++ b/Documentation/networking/devlink/index.rst
@@ -21,6 +21,7 @@ general.
    devlink-region
    devlink-resource
    devlink-trap
+   devlink-metric
 
 Driver-specific documentation
 -----------------------------
diff --git a/include/net/devlink.h b/include/net/devlink.h
index 8f3c8a443238..f4754075dc43 100644
--- a/include/net/devlink.h
+++ b/include/net/devlink.h
@@ -36,6 +36,7 @@ struct devlink {
 	struct list_head trap_list;
 	struct list_head trap_group_list;
 	struct list_head trap_policer_list;
+	struct list_head metric_list;
 	const struct devlink_ops *ops;
 	struct xarray snapshot_ids;
 	struct device *dev;
@@ -990,6 +991,16 @@ enum devlink_trap_group_generic_id {
 		.min_burst = _min_burst,				      \
 	}
 
+struct devlink_metric;
+
+/**
+ * struct devlink_metric_ops - Metric operations.
+ * @counter_get: Get the counter value. Must not be NULL for counter metrics.
+ */
+struct devlink_metric_ops {
+	int (*counter_get)(struct devlink_metric *metric, u64 *p_val);
+};
+
 struct devlink_ops {
 	int (*reload_down)(struct devlink *devlink, bool netns_change,
 			   struct netlink_ext_ack *extack);
@@ -1405,6 +1416,13 @@ devlink_trap_policers_unregister(struct devlink *devlink,
 				 const struct devlink_trap_policer *policers,
 				 size_t policers_count);
 
+void *devlink_metric_priv(struct devlink_metric *metric);
+struct devlink_metric *
+devlink_metric_counter_create(struct devlink *devlink, const char *name,
+			      const struct devlink_metric_ops *ops, void *priv);
+void devlink_metric_destroy(struct devlink *devlink,
+			    struct devlink_metric *metric);
+
 #if IS_ENABLED(CONFIG_NET_DEVLINK)
 
 void devlink_compat_running_version(struct net_device *dev,
diff --git a/include/uapi/linux/devlink.h b/include/uapi/linux/devlink.h
index cfef4245ea5a..ebb555cb7cf7 100644
--- a/include/uapi/linux/devlink.h
+++ b/include/uapi/linux/devlink.h
@@ -122,6 +122,11 @@ enum devlink_command {
 	DEVLINK_CMD_TRAP_POLICER_NEW,
 	DEVLINK_CMD_TRAP_POLICER_DEL,
 
+	DEVLINK_CMD_METRIC_GET,		/* can dump */
+	DEVLINK_CMD_METRIC_SET,
+	DEVLINK_CMD_METRIC_NEW,
+	DEVLINK_CMD_METRIC_DEL,
+
 	/* add new commands above here */
 	__DEVLINK_CMD_MAX,
 	DEVLINK_CMD_MAX = __DEVLINK_CMD_MAX - 1
@@ -272,6 +277,14 @@ enum {
 	DEVLINK_ATTR_TRAP_METADATA_TYPE_FA_COOKIE,
 };
 
+/**
+ * enum devlink_metric_type - Metric type.
+ * @DEVLINK_METRIC_TYPE_COUNTER: Counter. Monotonically increasing.
+ */
+enum devlink_metric_type {
+	DEVLINK_METRIC_TYPE_COUNTER,
+};
+
 enum devlink_attr {
 	/* don't change the order or add anything between, this is ABI! */
 	DEVLINK_ATTR_UNSPEC,
@@ -458,6 +471,12 @@ enum devlink_attr {
 	DEVLINK_ATTR_PORT_LANES,			/* u32 */
 	DEVLINK_ATTR_PORT_SPLITTABLE,			/* u8 */
 
+	DEVLINK_ATTR_METRIC_NAME,		/* string */
+	/* enum devlink_metric_type */
+	DEVLINK_ATTR_METRIC_TYPE,		/* u8 */
+	DEVLINK_ATTR_METRIC_COUNTER_VALUE,	/* u64 */
+	DEVLINK_ATTR_METRIC_GROUP,		/* u32 */
+
 	/* add new attributes above here, update the policy in devlink.c */
 
 	__DEVLINK_ATTR_MAX,
diff --git a/net/core/devlink.c b/net/core/devlink.c
index e674f0f46dc2..94c0a1e09242 100644
--- a/net/core/devlink.c
+++ b/net/core/devlink.c
@@ -6994,6 +6994,218 @@ static int devlink_nl_cmd_trap_policer_set_doit(struct sk_buff *skb,
 	return devlink_trap_policer_set(devlink, policer_item, info);
 }
 
+/**
+ * struct devlink_metric - Metric attributes.
+ * @name: Metric name.
+ * @ops: Metric operations.
+ * @list: Member of 'metric_list'
+ * @type: Metric type.
+ * @group: Group number. '0' is the default group number.
+ * @priv: Metric private information.
+ */
+struct devlink_metric {
+	const char *name;
+	const struct devlink_metric_ops *ops;
+	struct list_head list;
+	enum devlink_metric_type type;
+	u32 group;
+	void *priv;
+};
+
+static struct devlink_metric *
+devlink_metric_lookup(struct devlink *devlink, const char *name)
+{
+	struct devlink_metric *metric;
+
+	list_for_each_entry(metric, &devlink->metric_list, list) {
+		if (!strcmp(metric->name, name))
+			return metric;
+	}
+
+	return NULL;
+}
+
+static struct devlink_metric *
+devlink_metric_get_from_info(struct devlink *devlink, struct genl_info *info)
+{
+	struct nlattr *attr;
+
+	if (!info->attrs[DEVLINK_ATTR_METRIC_NAME])
+		return NULL;
+	attr = info->attrs[DEVLINK_ATTR_METRIC_NAME];
+
+	return devlink_metric_lookup(devlink, nla_data(attr));
+}
+
+static int devlink_nl_metric_counter_fill(struct sk_buff *msg,
+					  struct devlink_metric *metric)
+{
+	u64 val;
+	int err;
+
+	err = metric->ops->counter_get(metric, &val);
+	if (err)
+		return err;
+
+	if (nla_put_u64_64bit(msg, DEVLINK_ATTR_METRIC_COUNTER_VALUE, val,
+			      DEVLINK_ATTR_PAD))
+		return -EMSGSIZE;
+
+	return 0;
+}
+
+static int
+devlink_nl_metric_fill(struct sk_buff *msg, struct devlink *devlink,
+		       struct devlink_metric *metric, enum devlink_command cmd,
+		       u32 portid, u32 seq, int flags)
+{
+	void *hdr;
+
+	hdr = genlmsg_put(msg, portid, seq, &devlink_nl_family, flags, cmd);
+	if (!hdr)
+		return -EMSGSIZE;
+
+	if (devlink_nl_put_handle(msg, devlink))
+		goto nla_put_failure;
+
+	if (nla_put_string(msg, DEVLINK_ATTR_METRIC_NAME, metric->name))
+		goto nla_put_failure;
+
+	if (nla_put_u8(msg, DEVLINK_ATTR_METRIC_TYPE, metric->type))
+		goto nla_put_failure;
+
+	if (metric->type == DEVLINK_METRIC_TYPE_COUNTER &&
+	    devlink_nl_metric_counter_fill(msg, metric))
+		goto nla_put_failure;
+
+	if (nla_put_u32(msg, DEVLINK_ATTR_METRIC_GROUP, metric->group))
+		goto nla_put_failure;
+
+	genlmsg_end(msg, hdr);
+
+	return 0;
+
+nla_put_failure:
+	genlmsg_cancel(msg, hdr);
+	return -EMSGSIZE;
+}
+
+static int devlink_nl_cmd_metric_get_doit(struct sk_buff *skb,
+					  struct genl_info *info)
+{
+	struct netlink_ext_ack *extack = info->extack;
+	struct devlink *devlink = info->user_ptr[0];
+	struct devlink_metric *metric;
+	struct sk_buff *msg;
+	int err;
+
+	if (list_empty(&devlink->metric_list))
+		return -EOPNOTSUPP;
+
+	metric = devlink_metric_get_from_info(devlink, info);
+	if (!metric) {
+		NL_SET_ERR_MSG_MOD(extack, "Device did not register this metric");
+		return -ENOENT;
+	}
+
+	msg = nlmsg_new(NLMSG_DEFAULT_SIZE, GFP_KERNEL);
+	if (!msg)
+		return -ENOMEM;
+
+	err = devlink_nl_metric_fill(msg, devlink, metric,
+				     DEVLINK_CMD_METRIC_NEW, info->snd_portid,
+				     info->snd_seq, 0);
+	if (err)
+		goto err_metric_fill;
+
+	return genlmsg_reply(msg, info);
+
+err_metric_fill:
+	nlmsg_free(msg);
+	return err;
+}
+
+static int devlink_nl_cmd_metric_get_dumpit(struct sk_buff *msg,
+					    struct netlink_callback *cb)
+{
+	const struct genl_dumpit_info *info = genl_dumpit_info(cb);
+	enum devlink_command cmd = DEVLINK_CMD_METRIC_NEW;
+	u32 portid = NETLINK_CB(cb->skb).portid;
+	struct devlink_metric *metric;
+	struct devlink *devlink;
+	int start = cb->args[0];
+	int flags = NLM_F_MULTI;
+	u32 group = 0;
+	int idx = 0;
+	int err;
+
+	if (info->attrs[DEVLINK_ATTR_METRIC_GROUP]) {
+		group = nla_get_u32(info->attrs[DEVLINK_ATTR_METRIC_GROUP]);
+		flags |= NLM_F_DUMP_FILTERED;
+	}
+
+	mutex_lock(&devlink_mutex);
+	list_for_each_entry(devlink, &devlink_list, list) {
+		if (!net_eq(devlink_net(devlink), sock_net(msg->sk)))
+			continue;
+		mutex_lock(&devlink->lock);
+		list_for_each_entry(metric, &devlink->metric_list, list) {
+			if (idx < start) {
+				idx++;
+				continue;
+			}
+			if (group && metric->group != group) {
+				idx++;
+				continue;
+			}
+			err = devlink_nl_metric_fill(msg, devlink, metric, cmd,
+						     portid, cb->nlh->nlmsg_seq,
+						     flags);
+			if (err) {
+				mutex_unlock(&devlink->lock);
+				goto out;
+			}
+			idx++;
+		}
+		mutex_unlock(&devlink->lock);
+	}
+out:
+	mutex_unlock(&devlink_mutex);
+
+	cb->args[0] = idx;
+	return msg->len;
+}
+
+static void devlink_metric_group_set(struct devlink_metric *metric,
+				     struct genl_info *info)
+{
+	if (!info->attrs[DEVLINK_ATTR_METRIC_GROUP])
+		return;
+
+	metric->group = nla_get_u32(info->attrs[DEVLINK_ATTR_METRIC_GROUP]);
+}
+
+static int devlink_nl_cmd_metric_set_doit(struct sk_buff *skb,
+					  struct genl_info *info)
+{
+	struct netlink_ext_ack *extack = info->extack;
+	struct devlink *devlink = info->user_ptr[0];
+	struct devlink_metric *metric;
+
+	if (list_empty(&devlink->metric_list))
+		return -EOPNOTSUPP;
+
+	metric = devlink_metric_get_from_info(devlink, info);
+	if (!metric) {
+		NL_SET_ERR_MSG_MOD(extack, "Device did not register this metric");
+		return -ENOENT;
+	}
+
+	devlink_metric_group_set(metric, info);
+
+	return 0;
+}
+
 static const struct nla_policy devlink_nl_policy[DEVLINK_ATTR_MAX + 1] = {
 	[DEVLINK_ATTR_UNSPEC] = { .strict_start_type =
 		DEVLINK_ATTR_TRAP_POLICER_ID },
@@ -7039,6 +7251,9 @@ static const struct nla_policy devlink_nl_policy[DEVLINK_ATTR_MAX + 1] = {
 	[DEVLINK_ATTR_TRAP_POLICER_RATE] = { .type = NLA_U64 },
 	[DEVLINK_ATTR_TRAP_POLICER_BURST] = { .type = NLA_U64 },
 	[DEVLINK_ATTR_PORT_FUNCTION] = { .type = NLA_NESTED },
+	[DEVLINK_ATTR_METRIC_NAME] = { .type = NLA_NUL_STRING },
+	[DEVLINK_ATTR_METRIC_TYPE] = { .type = NLA_U8 },
+	[DEVLINK_ATTR_METRIC_GROUP] = { .type = NLA_U32 },
 };
 
 static const struct genl_ops devlink_nl_ops[] = {
@@ -7347,6 +7562,17 @@ static const struct genl_ops devlink_nl_ops[] = {
 		.doit = devlink_nl_cmd_trap_policer_set_doit,
 		.flags = GENL_ADMIN_PERM,
 	},
+	{
+		.cmd = DEVLINK_CMD_METRIC_GET,
+		.doit = devlink_nl_cmd_metric_get_doit,
+		.dumpit = devlink_nl_cmd_metric_get_dumpit,
+		/* can be retrieved by unprivileged users */
+	},
+	{
+		.cmd = DEVLINK_CMD_METRIC_SET,
+		.doit = devlink_nl_cmd_metric_set_doit,
+		.flags = GENL_ADMIN_PERM,
+	},
 };
 
 static struct genl_family devlink_nl_family __ro_after_init = {
@@ -7396,6 +7622,7 @@ struct devlink *devlink_alloc(const struct devlink_ops *ops, size_t priv_size)
 	INIT_LIST_HEAD(&devlink->trap_list);
 	INIT_LIST_HEAD(&devlink->trap_group_list);
 	INIT_LIST_HEAD(&devlink->trap_policer_list);
+	INIT_LIST_HEAD(&devlink->metric_list);
 	mutex_init(&devlink->lock);
 	mutex_init(&devlink->reporters_lock);
 	return devlink;
@@ -7480,6 +7707,7 @@ void devlink_free(struct devlink *devlink)
 {
 	mutex_destroy(&devlink->reporters_lock);
 	mutex_destroy(&devlink->lock);
+	WARN_ON(!list_empty(&devlink->metric_list));
 	WARN_ON(!list_empty(&devlink->trap_policer_list));
 	WARN_ON(!list_empty(&devlink->trap_group_list));
 	WARN_ON(!list_empty(&devlink->trap_list));
@@ -9484,6 +9712,124 @@ devlink_trap_policers_unregister(struct devlink *devlink,
 }
 EXPORT_SYMBOL_GPL(devlink_trap_policers_unregister);
 
+/**
+ * devlink_metric_priv - Return metric private information.
+ * @metric: Metric.
+ *
+ * Return: Metric private information that was passed from device-driver
+ * during metric creation.
+ */
+void *devlink_metric_priv(struct devlink_metric *metric)
+{
+	return metric->priv;
+}
+EXPORT_SYMBOL_GPL(devlink_metric_priv);
+
+static void devlink_metric_notify(struct devlink *devlink,
+				  struct devlink_metric *metric,
+				  enum devlink_command cmd)
+{
+	struct sk_buff *msg;
+	int err;
+
+	WARN_ON_ONCE(cmd != DEVLINK_CMD_METRIC_NEW &&
+		     cmd != DEVLINK_CMD_METRIC_DEL);
+
+	msg = nlmsg_new(NLMSG_DEFAULT_SIZE, GFP_KERNEL);
+	if (!msg)
+		return;
+
+	err = devlink_nl_metric_fill(msg, devlink, metric, cmd, 0, 0, 0);
+	if (err) {
+		nlmsg_free(msg);
+		return;
+	}
+
+	genlmsg_multicast_netns(&devlink_nl_family, devlink_net(devlink),
+				msg, 0, DEVLINK_MCGRP_CONFIG, GFP_KERNEL);
+}
+
+/**
+ * devlink_metric_counter_create - Create metric of counter type.
+ * @devlink: devlink.
+ * @name: Metric name.
+ * @ops: Metric operations.
+ * @priv: Metric private information.
+ *
+ * All metrics must be documented in the per-device documentation under
+ * Documentation/networking/devlink/.
+ *
+ * Return: Error pointer on failure.
+ */
+struct devlink_metric *
+devlink_metric_counter_create(struct devlink *devlink, const char *name,
+			      const struct devlink_metric_ops *ops, void *priv)
+{
+	struct devlink_metric *metric;
+	int err;
+
+	if (!ops || !ops->counter_get || !name)
+		return ERR_PTR(-EINVAL);
+
+	mutex_lock(&devlink->lock);
+
+	if (devlink_metric_lookup(devlink, name)) {
+		err = -EEXIST;
+		goto err_exists;
+	}
+
+	metric = kzalloc(sizeof(*metric), GFP_KERNEL);
+	if (!metric) {
+		err = -ENOMEM;
+		goto err_alloc_metric;
+	}
+
+	metric->name = kstrdup(name, GFP_KERNEL);
+	if (!metric->name) {
+		err = -ENOMEM;
+		goto err_alloc_metric_name;
+	}
+
+	metric->ops = ops;
+	metric->type = DEVLINK_METRIC_TYPE_COUNTER;
+	metric->group = 0;
+	metric->priv = priv;
+
+	list_add_tail(&metric->list, &devlink->metric_list);
+	devlink_metric_notify(devlink, metric, DEVLINK_CMD_METRIC_NEW);
+
+	mutex_unlock(&devlink->lock);
+
+	return metric;
+
+err_alloc_metric_name:
+	kfree(metric);
+err_alloc_metric:
+err_exists:
+	mutex_unlock(&devlink->lock);
+	return ERR_PTR(err);
+}
+EXPORT_SYMBOL_GPL(devlink_metric_counter_create);
+
+/**
+ * devlink_metric_destroy - Destroy metric.
+ * @devlink: devlink.
+ * @metric: Metric.
+ */
+void devlink_metric_destroy(struct devlink *devlink,
+			    struct devlink_metric *metric)
+{
+	mutex_lock(&devlink->lock);
+
+	devlink_metric_notify(devlink, metric, DEVLINK_CMD_METRIC_DEL);
+	list_del(&metric->list);
+	kfree(metric->name);
+	kfree(metric);
+
+	mutex_unlock(&devlink->lock);
+}
+EXPORT_SYMBOL_GPL(devlink_metric_destroy);
+
 static void __devlink_compat_running_version(struct devlink *devlink,
 					     char *buf, size_t len)
 {
-- 
2.26.2


* [RFC PATCH net-next 2/6] netdevsim: Add devlink metric support
  2020-08-17 12:50 [RFC PATCH net-next 0/6] devlink: Add device metric support Ido Schimmel
  2020-08-17 12:50 ` [RFC PATCH net-next 1/6] devlink: Add device metric infrastructure Ido Schimmel
@ 2020-08-17 12:50 ` Ido Schimmel
  2020-08-17 12:50 ` [RFC PATCH net-next 3/6] selftests: netdevsim: Add devlink metric tests Ido Schimmel
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 29+ messages in thread
From: Ido Schimmel @ 2020-08-17 12:50 UTC (permalink / raw)
  To: netdev
  Cc: davem, kuba, jiri, amcohen, danieller, mlxsw, roopa, dsahern,
	andrew, f.fainelli, vivien.didelot, saeedm, tariqt, ayal, eranbe,
	mkubecek, Ido Schimmel

From: Ido Schimmel <idosch@nvidia.com>

Register a dummy counter with devlink that is incremented by one
whenever queried.

Allow the query to fail by writing to a file in debugfs so that error
paths in the core infrastructure can be exercised.
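
For example (using the debugfs knob added here together with the devlink
command from the cover letter; $DEBUGFS_DIR is the netdevsim device's
debugfs directory, as used by the selftest added later in this series):

# echo "y" > $DEBUGFS_DIR/metric/fail_counter_get
# devlink -s dev metric show netdevsim/netdevsim10 metric dummy_counter
  (expected to fail while the knob is set)
# echo "n" > $DEBUGFS_DIR/metric/fail_counter_get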

Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Signed-off-by: Danielle Ratson <danieller@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
---
 drivers/net/netdevsim/dev.c       | 92 ++++++++++++++++++++++++++++++-
 drivers/net/netdevsim/netdevsim.h |  1 +
 2 files changed, 91 insertions(+), 2 deletions(-)

diff --git a/drivers/net/netdevsim/dev.c b/drivers/net/netdevsim/dev.c
index 32f339fedb21..075d2d4e22a5 100644
--- a/drivers/net/netdevsim/dev.c
+++ b/drivers/net/netdevsim/dev.c
@@ -692,6 +692,81 @@ static void nsim_dev_traps_exit(struct devlink *devlink)
 	kfree(nsim_dev->trap_data);
 }
 
+struct nsim_metric_data {
+	struct devlink_metric *dummy_counter;
+	struct dentry *ddir;
+	u64 dummy_counter_value;
+	bool fail_counter_get;
+};
+
+static int nsim_dev_dummy_counter_get(struct devlink_metric *metric, u64 *p_val)
+{
+	struct nsim_dev *nsim_dev = devlink_metric_priv(metric);
+	u64 *cnt;
+
+	if (nsim_dev->metric_data->fail_counter_get)
+		return -EINVAL;
+
+	cnt = &nsim_dev->metric_data->dummy_counter_value;
+	*p_val = (*cnt)++;
+
+	return 0;
+}
+
+static const struct devlink_metric_ops nsim_dev_dummy_counter_ops = {
+	.counter_get = nsim_dev_dummy_counter_get,
+};
+
+static int nsim_dev_metric_init(struct nsim_dev *nsim_dev)
+{
+	struct devlink *devlink = priv_to_devlink(nsim_dev);
+	struct nsim_metric_data *nsim_metric_data;
+	struct devlink_metric *dummy_counter;
+	int err;
+
+	nsim_metric_data = kzalloc(sizeof(*nsim_metric_data), GFP_KERNEL);
+	if (!nsim_metric_data)
+		return -ENOMEM;
+	nsim_dev->metric_data = nsim_metric_data;
+
+	dummy_counter = devlink_metric_counter_create(devlink, "dummy_counter",
+						      &nsim_dev_dummy_counter_ops,
+						      nsim_dev);
+	if (IS_ERR(dummy_counter)) {
+		err = PTR_ERR(dummy_counter);
+		goto err_free_metric_data;
+	}
+	nsim_metric_data->dummy_counter = dummy_counter;
+
+	nsim_metric_data->ddir = debugfs_create_dir("metric", nsim_dev->ddir);
+	if (IS_ERR(nsim_metric_data->ddir)) {
+		err = PTR_ERR(nsim_metric_data->ddir);
+		goto err_dummy_counter_destroy;
+	}
+
+	nsim_metric_data->fail_counter_get = false;
+	debugfs_create_bool("fail_counter_get", 0600, nsim_metric_data->ddir,
+			    &nsim_metric_data->fail_counter_get);
+
+	return 0;
+
+err_dummy_counter_destroy:
+	devlink_metric_destroy(devlink, dummy_counter);
+err_free_metric_data:
+	kfree(nsim_metric_data);
+	return err;
+}
+
+static void nsim_dev_metric_exit(struct nsim_dev *nsim_dev)
+{
+	struct nsim_metric_data *nsim_metric_data = nsim_dev->metric_data;
+	struct devlink *devlink = priv_to_devlink(nsim_dev);
+
+	debugfs_remove_recursive(nsim_metric_data->ddir);
+	devlink_metric_destroy(devlink, nsim_metric_data->dummy_counter);
+	kfree(nsim_metric_data);
+}
+
 static int nsim_dev_reload_create(struct nsim_dev *nsim_dev,
 				  struct netlink_ext_ack *extack);
 static void nsim_dev_reload_destroy(struct nsim_dev *nsim_dev);
@@ -1008,10 +1083,14 @@ static int nsim_dev_reload_create(struct nsim_dev *nsim_dev,
 	if (err)
 		goto err_traps_exit;
 
-	err = nsim_dev_port_add_all(nsim_dev, nsim_bus_dev->port_count);
+	err = nsim_dev_metric_init(nsim_dev);
 	if (err)
 		goto err_health_exit;
 
+	err = nsim_dev_port_add_all(nsim_dev, nsim_bus_dev->port_count);
+	if (err)
+		goto err_metric_exit;
+
 	nsim_dev->take_snapshot = debugfs_create_file("take_snapshot",
 						      0200,
 						      nsim_dev->ddir,
@@ -1019,6 +1098,8 @@ static int nsim_dev_reload_create(struct nsim_dev *nsim_dev,
 						&nsim_dev_take_snapshot_fops);
 	return 0;
 
+err_metric_exit:
+	nsim_dev_metric_exit(nsim_dev);
 err_health_exit:
 	nsim_dev_health_exit(nsim_dev);
 err_traps_exit:
@@ -1089,10 +1170,14 @@ int nsim_dev_probe(struct nsim_bus_dev *nsim_bus_dev)
 	if (err)
 		goto err_debugfs_exit;
 
-	err = nsim_bpf_dev_init(nsim_dev);
+	err = nsim_dev_metric_init(nsim_dev);
 	if (err)
 		goto err_health_exit;
 
+	err = nsim_bpf_dev_init(nsim_dev);
+	if (err)
+		goto err_metric_exit;
+
 	err = nsim_dev_port_add_all(nsim_dev, nsim_bus_dev->port_count);
 	if (err)
 		goto err_bpf_dev_exit;
@@ -1103,6 +1188,8 @@ int nsim_dev_probe(struct nsim_bus_dev *nsim_bus_dev)
 
 err_bpf_dev_exit:
 	nsim_bpf_dev_exit(nsim_dev);
+err_metric_exit:
+	nsim_dev_metric_exit(nsim_dev);
 err_health_exit:
 	nsim_dev_health_exit(nsim_dev);
 err_debugfs_exit:
@@ -1133,6 +1220,7 @@ static void nsim_dev_reload_destroy(struct nsim_dev *nsim_dev)
 		return;
 	debugfs_remove(nsim_dev->take_snapshot);
 	nsim_dev_port_del_all(nsim_dev);
+	nsim_dev_metric_exit(nsim_dev);
 	nsim_dev_health_exit(nsim_dev);
 	nsim_dev_traps_exit(devlink);
 	nsim_dev_dummy_region_exit(nsim_dev);
diff --git a/drivers/net/netdevsim/netdevsim.h b/drivers/net/netdevsim/netdevsim.h
index 284f7092241d..5f9a99bc4022 100644
--- a/drivers/net/netdevsim/netdevsim.h
+++ b/drivers/net/netdevsim/netdevsim.h
@@ -171,6 +171,7 @@ struct nsim_dev {
 	struct nsim_bus_dev *nsim_bus_dev;
 	struct nsim_fib_data *fib_data;
 	struct nsim_trap_data *trap_data;
+	struct nsim_metric_data *metric_data;
 	struct dentry *ddir;
 	struct dentry *ports_ddir;
 	struct dentry *take_snapshot;
-- 
2.26.2


* [RFC PATCH net-next 3/6] selftests: netdevsim: Add devlink metric tests
  2020-08-17 12:50 [RFC PATCH net-next 0/6] devlink: Add device metric support Ido Schimmel
  2020-08-17 12:50 ` [RFC PATCH net-next 1/6] devlink: Add device metric infrastructure Ido Schimmel
  2020-08-17 12:50 ` [RFC PATCH net-next 2/6] netdevsim: Add devlink metric support Ido Schimmel
@ 2020-08-17 12:50 ` Ido Schimmel
  2020-08-17 12:50 ` [RFC PATCH net-next 4/6] mlxsw: reg: Add Tunneling NVE Counters Register Ido Schimmel
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 29+ messages in thread
From: Ido Schimmel @ 2020-08-17 12:50 UTC (permalink / raw)
  To: netdev
  Cc: davem, kuba, jiri, amcohen, danieller, mlxsw, roopa, dsahern,
	andrew, f.fainelli, vivien.didelot, saeedm, tariqt, ayal, eranbe,
	mkubecek, Ido Schimmel

From: Ido Schimmel <idosch@nvidia.com>

Test the existing functionality of the devlink metric infrastructure.
Tests will be added for any new functionality.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
---
 .../networking/devlink/devlink-metric.rst     | 11 +++++
 .../drivers/net/netdevsim/devlink.sh          | 49 ++++++++++++++++++-
 2 files changed, 59 insertions(+), 1 deletion(-)

diff --git a/Documentation/networking/devlink/devlink-metric.rst b/Documentation/networking/devlink/devlink-metric.rst
index cf5c5b4e4077..8a4515df1bc0 100644
--- a/Documentation/networking/devlink/devlink-metric.rst
+++ b/Documentation/networking/devlink/devlink-metric.rst
@@ -24,3 +24,14 @@ driver-specific ``devlink`` documentation under
 
 When possible, a selftest (under ``tools/testing/selftests/drivers/``) should
 also be provided to ensure the metrics are updated under the right conditions.
+
+Testing
+=======
+
+See ``tools/testing/selftests/drivers/net/netdevsim/devlink.sh`` for a
+test covering the core infrastructure. Test cases should be added for any new
+functionality.
+
+Device drivers should focus their tests on device-specific functionality, such
+as making sure the exposed metrics are correctly incremented and read from the
+device.
diff --git a/tools/testing/selftests/drivers/net/netdevsim/devlink.sh b/tools/testing/selftests/drivers/net/netdevsim/devlink.sh
index de4b32fc4223..4ca345e227bc 100755
--- a/tools/testing/selftests/drivers/net/netdevsim/devlink.sh
+++ b/tools/testing/selftests/drivers/net/netdevsim/devlink.sh
@@ -5,7 +5,8 @@ lib_dir=$(dirname $0)/../../../net/forwarding
 
 ALL_TESTS="fw_flash_test params_test regions_test reload_test \
 	   netns_reload_test resource_test dev_info_test \
-	   empty_reporter_test dummy_reporter_test"
+	   empty_reporter_test dummy_reporter_test \
+	   metric_counter_test"
 NUM_NETIFS=0
 source $lib_dir/lib.sh
 
@@ -486,6 +487,52 @@ dummy_reporter_test()
 	log_test "dummy reporter test"
 }
 
+metric_counter_value_get()
+{
+	local metric=$1; shift
+
+	cmd_jq "devlink -jps dev metric show $DL_HANDLE metric $metric" \
+		'.[][][]["value"]'
+}
+
+metric_group_get()
+{
+	local metric=$1; shift
+
+	cmd_jq "devlink -jp dev metric show $DL_HANDLE metric $metric" \
+		'.[][][]["group"]'
+}
+
+metric_counter_test()
+{
+	RET=0
+
+	local val_t0=$(metric_counter_value_get dummy_counter)
+	local val_t1=$(metric_counter_value_get dummy_counter)
+	(( val_t0 < val_t1 ))
+	check_err $? "Expected to read a higher value in second read"
+
+	echo "y" > $DEBUGFS_DIR/metric/fail_counter_get
+	metric_counter_value_get dummy_counter
+	check_fail $? "Unexpected success of counter get"
+	echo "n" > $DEBUGFS_DIR/metric/fail_counter_get
+
+	devlink dev metric set $DL_HANDLE metric dummy_counter group 10
+
+	(( 10 == $(metric_group_get dummy_counter) ))
+	check_err $? "Expected \"dummy_counter\" to be in group 10"
+
+	devlink dev metric show group 10 | grep -q "dummy_counter"
+	check_err $? "Expected \"dummy_counter\" to be dumped"
+
+	devlink dev metric show group 20 | grep -q "dummy_counter"
+	check_fail $? "Did not expect to see \"dummy_counter\" in group 20"
+
+	devlink dev metric set $DL_HANDLE metric dummy_counter group 0
+
+	log_test "metric counter test"
+}
+
 setup_prepare()
 {
 	modprobe netdevsim
-- 
2.26.2


* [RFC PATCH net-next 4/6] mlxsw: reg: Add Tunneling NVE Counters Register
  2020-08-17 12:50 [RFC PATCH net-next 0/6] devlink: Add device metric support Ido Schimmel
                   ` (2 preceding siblings ...)
  2020-08-17 12:50 ` [RFC PATCH net-next 3/6] selftests: netdevsim: Add devlink metric tests Ido Schimmel
@ 2020-08-17 12:50 ` Ido Schimmel
  2020-08-17 12:50 ` [RFC PATCH net-next 5/6] mlxsw: reg: Add Tunneling NVE Counters Register Version 2 Ido Schimmel
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 29+ messages in thread
From: Ido Schimmel @ 2020-08-17 12:50 UTC (permalink / raw)
  To: netdev
  Cc: davem, kuba, jiri, amcohen, danieller, mlxsw, roopa, dsahern,
	andrew, f.fainelli, vivien.didelot, saeedm, tariqt, ayal, eranbe,
	mkubecek, Ido Schimmel

From: Ido Schimmel <idosch@nvidia.com>

The TNCR register exposes counters of NVE encapsulation and
decapsulation on Spectrum-1.
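
For reference, reading the counters through this register looks roughly
as follows (this mirrors how the register is used later in this series):

	char tncr_pl[MLXSW_REG_TNCR_LEN];
	u64 encap_packets;
	int err;

	/* Pack a query that does not clear the counters. */
	mlxsw_reg_tncr_pack(tncr_pl, false);
	err = mlxsw_reg_query(mlxsw_sp->core, MLXSW_REG(tncr), tncr_pl);
	if (err)
		return err;
	encap_packets = mlxsw_reg_tncr_count_encap_get(tncr_pl);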

Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Signed-off-by: Danielle Ratson <danieller@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
---
 drivers/net/ethernet/mellanox/mlxsw/reg.h | 51 +++++++++++++++++++++++
 1 file changed, 51 insertions(+)

diff --git a/drivers/net/ethernet/mellanox/mlxsw/reg.h b/drivers/net/ethernet/mellanox/mlxsw/reg.h
index 079b080de7f7..9f19127caf83 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/reg.h
+++ b/drivers/net/ethernet/mellanox/mlxsw/reg.h
@@ -10070,6 +10070,56 @@ static inline void mlxsw_reg_tngcr_pack(char *payload,
 	mlxsw_reg_tngcr_nve_group_size_flood_set(payload, 1);
 }
 
+/* TNCR - Tunneling NVE Counters Register
+ * --------------------------------------
+ * The TNCR register exposes counters of NVE encapsulation and decapsulation.
+ *
+ * Note: Not supported by Spectrum-2 onwards.
+ */
+#define MLXSW_REG_TNCR_ID 0xA002
+#define MLXSW_REG_TNCR_LEN 0x30
+
+MLXSW_REG_DEFINE(tncr, MLXSW_REG_TNCR_ID, MLXSW_REG_TNCR_LEN);
+
+/* reg_tncr_clear_counters
+ * Clear counters.
+ * Access: OP
+ */
+MLXSW_ITEM32(reg, tncr, clear_counters, 0x00, 31, 1);
+
+/* reg_tncr_count_encap
+ * Count number of packets which did encapsulation to an NVE tunnel.
+ * Access: RO
+ *
+ * Note: Multicast packets which are encapsulated multiple times are counted
+ * multiple times.
+ */
+MLXSW_ITEM64(reg, tncr, count_encap, 0x10, 0, 64);
+
+/* reg_tncr_count_decap
+ * Count number of packets which did decapsulation from an NVE tunnel.
+ * Access: RO
+ */
+MLXSW_ITEM64(reg, tncr, count_decap, 0x18, 0, 64);
+
+/* reg_tncr_count_decap_errors
+ * Count number of packets which had decapsulation errors from an NVE tunnel.
+ * Access: RO
+ */
+MLXSW_ITEM64(reg, tncr, count_decap_errors, 0x20, 0, 64);
+
+/* reg_tncr_count_decap_discards
+ * Count number of packets which had decapsulation discards from an NVE tunnel.
+ * Access: RO
+ */
+MLXSW_ITEM64(reg, tncr, count_decap_discards, 0x28, 0, 64);
+
+static inline void mlxsw_reg_tncr_pack(char *payload, bool clear_counters)
+{
+	MLXSW_REG_ZERO(tncr, payload);
+	mlxsw_reg_tncr_clear_counters_set(payload, clear_counters);
+}
+
 /* TNUMT - Tunneling NVE Underlay Multicast Table Register
  * -------------------------------------------------------
  * The TNUMT register is for building the underlay MC table. It is used
@@ -11001,6 +11051,7 @@ static const struct mlxsw_reg_info *mlxsw_reg_infos[] = {
 	MLXSW_REG(mtptpt),
 	MLXSW_REG(mgpir),
 	MLXSW_REG(tngcr),
+	MLXSW_REG(tncr),
 	MLXSW_REG(tnumt),
 	MLXSW_REG(tnqcr),
 	MLXSW_REG(tnqdr),
-- 
2.26.2


* [RFC PATCH net-next 5/6] mlxsw: reg: Add Tunneling NVE Counters Register Version 2
  2020-08-17 12:50 [RFC PATCH net-next 0/6] devlink: Add device metric support Ido Schimmel
                   ` (3 preceding siblings ...)
  2020-08-17 12:50 ` [RFC PATCH net-next 4/6] mlxsw: reg: Add Tunneling NVE Counters Register Ido Schimmel
@ 2020-08-17 12:50 ` Ido Schimmel
  2020-08-17 12:50 ` [RFC PATCH net-next 6/6] mlxsw: spectrum_nve: Expose VXLAN counters via devlink-metric Ido Schimmel
  2020-08-19  0:24 ` [RFC PATCH net-next 0/6] devlink: Add device metric support Jakub Kicinski
  6 siblings, 0 replies; 29+ messages in thread
From: Ido Schimmel @ 2020-08-17 12:50 UTC (permalink / raw)
  To: netdev
  Cc: davem, kuba, jiri, amcohen, danieller, mlxsw, roopa, dsahern,
	andrew, f.fainelli, vivien.didelot, saeedm, tariqt, ayal, eranbe,
	mkubecek, Ido Schimmel

From: Ido Schimmel <idosch@nvidia.com>

The TNCR-V2 register exposes counters of NVE encapsulation and
decapsulation on Spectrum-2 onwards.

Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Signed-off-by: Danielle Ratson <danieller@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
---
 drivers/net/ethernet/mellanox/mlxsw/reg.h | 53 +++++++++++++++++++++++
 1 file changed, 53 insertions(+)

diff --git a/drivers/net/ethernet/mellanox/mlxsw/reg.h b/drivers/net/ethernet/mellanox/mlxsw/reg.h
index 9f19127caf83..c891fc590ddd 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/reg.h
+++ b/drivers/net/ethernet/mellanox/mlxsw/reg.h
@@ -10210,6 +10210,58 @@ static inline void mlxsw_reg_tnumt_pack(char *payload,
 	mlxsw_reg_tnumt_record_size_set(payload, record_size);
 }
 
+/* TNCR-V2 - Tunneling NVE Counters Register Version 2
+ * ---------------------------------------------------
+ * The TNCR-V2 register exposes counters of NVE encapsulation and
+ * decapsulation.
+ *
+ * Note: Not supported by Spectrum-1.
+ */
+#define MLXSW_REG_TNCR2_ID 0xA004
+#define MLXSW_REG_TNCR2_LEN 0x38
+
+MLXSW_REG_DEFINE(tncr2, MLXSW_REG_TNCR2_ID, MLXSW_REG_TNCR2_LEN);
+
+/* reg_tncr2_clear_counters
+ * Clear counters.
+ * Access: OP
+ */
+MLXSW_ITEM32(reg, tncr2, clear_counters, 0x00, 31, 1);
+
+enum mlxsw_reg_tncr2_tunnel_port {
+	MLXSW_REG_TNCR2_TUNNEL_PORT_NVE,
+	MLXSW_REG_TNCR2_TUNNEL_PORT_VPLS,
+	MLXSW_REG_TNCR2_TUNNEL_FLEX_TUNNEL0,
+	MLXSW_REG_TNCR2_TUNNEL_FLEX_TUNNEL1,
+};
+
+/* reg_tncr2_tunnel_port
+ * Tunnel port.
+ * Access: Index
+ */
+MLXSW_ITEM32(reg, tncr2, tunnel_port, 0x00, 0, 4);
+
+/* reg_tncr2_count_decap_discards
+ * Count number of packets which had decapsulation discards from an NVE tunnel.
+ * Access: RO
+ */
+MLXSW_ITEM64(reg, tncr2, count_decap_discards, 0x28, 0, 64);
+
+/* reg_tncr2_count_encap_discards
+ * Count number of packets which had encapsulation discards to an NVE tunnel.
+ * Access: RO
+ */
+MLXSW_ITEM64(reg, tncr2, count_encap_discards, 0x30, 0, 64);
+
+static inline void mlxsw_reg_tncr2_pack(char *payload,
+					enum mlxsw_reg_tncr2_tunnel_port tport,
+					bool clear_counters)
+{
+	MLXSW_REG_ZERO(tncr2, payload);
+	mlxsw_reg_tncr2_clear_counters_set(payload, clear_counters);
+	mlxsw_reg_tncr2_tunnel_port_set(payload, tport);
+}
+
 /* TNQCR - Tunneling NVE QoS Configuration Register
  * ------------------------------------------------
  * The TNQCR register configures how QoS is set in encapsulation into the
@@ -11053,6 +11105,7 @@ static const struct mlxsw_reg_info *mlxsw_reg_infos[] = {
 	MLXSW_REG(tngcr),
 	MLXSW_REG(tncr),
 	MLXSW_REG(tnumt),
+	MLXSW_REG(tncr2),
 	MLXSW_REG(tnqcr),
 	MLXSW_REG(tnqdr),
 	MLXSW_REG(tneem),
-- 
2.26.2


* [RFC PATCH net-next 6/6] mlxsw: spectrum_nve: Expose VXLAN counters via devlink-metric
  2020-08-17 12:50 [RFC PATCH net-next 0/6] devlink: Add device metric support Ido Schimmel
                   ` (4 preceding siblings ...)
  2020-08-17 12:50 ` [RFC PATCH net-next 5/6] mlxsw: reg: Add Tunneling NVE Counters Register Version 2 Ido Schimmel
@ 2020-08-17 12:50 ` Ido Schimmel
  2020-08-17 14:29   ` Andrew Lunn
  2020-08-19  0:24 ` [RFC PATCH net-next 0/6] devlink: Add device metric support Jakub Kicinski
  6 siblings, 1 reply; 29+ messages in thread
From: Ido Schimmel @ 2020-08-17 12:50 UTC (permalink / raw)
  To: netdev
  Cc: davem, kuba, jiri, amcohen, danieller, mlxsw, roopa, dsahern,
	andrew, f.fainelli, vivien.didelot, saeedm, tariqt, ayal, eranbe,
	mkubecek, Ido Schimmel

From: Ido Schimmel <idosch@nvidia.com>

The Spectrum ASICs have a single hardware VTEP that is able to perform
VXLAN encapsulation and decapsulation. The VTEP is logically mapped by
mlxsw to the multiple VXLAN netdevs that are using it. Exposing the
counters of this VTEP via each of these netdevs would be both
inaccurate and confusing for users.

Instead, expose the counters of the VTEP via devlink-metric. Note that
Spectrum-1 supports a different set of counters compared to newer ASICs
in the Spectrum family.

Signed-off-by: Amit Cohen <amcohen@nvidia.com>
Signed-off-by: Danielle Ratson <danieller@nvidia.com>
Signed-off-by: Ido Schimmel <idosch@nvidia.com>
---
 Documentation/networking/devlink/mlxsw.rst    |  36 +++
 .../ethernet/mellanox/mlxsw/spectrum_nve.h    |  10 +
 .../mellanox/mlxsw/spectrum_nve_vxlan.c       | 285 ++++++++++++++++++
 3 files changed, 331 insertions(+)

diff --git a/Documentation/networking/devlink/mlxsw.rst b/Documentation/networking/devlink/mlxsw.rst
index cf857cb4ba8f..5d95056a571a 100644
--- a/Documentation/networking/devlink/mlxsw.rst
+++ b/Documentation/networking/devlink/mlxsw.rst
@@ -79,3 +79,39 @@ Driver-specific Traps
        routed through a disabled router interface (RIF). This can happen during
        RIF dismantle, when the RIF is first disabled before being removed
        completely
+
+Metrics
+=======
+
+.. list-table:: List of metrics registered by ``mlxsw``
+   :widths: 5 5 20 70
+
+   * - Name
+     - Type
+     - Supported platforms
+     - Description
+   * - ``nve_vxlan_encap``
+     - ``counter``
+     - Spectrum-1 only
+     - Counts number of packets that were VXLAN encapsulated by the device. A
+       packet sent to multiple VTEPs is counted multiple times
+   * - ``nve_vxlan_decap``
+     - ``counter``
+     - Spectrum-1 only
+     - Counts number of VXLAN packets that were decapsulated (successfully or
+       otherwise) by the device
+   * - ``nve_vxlan_decap_errors``
+     - ``counter``
+     - Spectrum-1 only
+     - Counts number of VXLAN packets that encountered decapsulation errors.
+       This includes overlay packets with a VLAN tag, ECN mismatch between
+       overlay and underlay, multicast overlay source MAC, overlay source MAC
+       equals overlay destination MAC and packets too short to decapsulate
+   * - ``nve_vxlan_decap_discards``
+     - ``counter``
+     - All
+     - Counts number of VXLAN packets that were discarded during decapsulation.
+       In Spectrum-1 this includes packets that had to be VXLAN decapsulated
+       when VXLAN decapsulation is disabled and fragmented overlay packets. In
+       Spectrum-2 this includes ``nve_vxlan_decap_errors`` errors and a missing
+       mapping between VNI and filtering identifier (FID)
diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum_nve.h b/drivers/net/ethernet/mellanox/mlxsw/spectrum_nve.h
index 12f664f42f21..249adea4d547 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum_nve.h
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_nve.h
@@ -6,6 +6,7 @@
 
 #include <linux/netlink.h>
 #include <linux/rhashtable.h>
+#include <net/devlink.h>
 
 #include "spectrum.h"
 
@@ -20,10 +21,19 @@ struct mlxsw_sp_nve_config {
 	union mlxsw_sp_l3addr ul_sip;
 };
 
+struct mlxsw_sp_nve_metrics {
+	struct devlink_metric *counter_encap;
+	struct devlink_metric *counter_decap;
+	struct devlink_metric *counter_decap_errors;
+	struct devlink_metric *counter_decap_discards;
+	struct devlink_metric *counter_encap_discards;
+};
+
 struct mlxsw_sp_nve {
 	struct mlxsw_sp_nve_config config;
 	struct rhashtable mc_list_ht;
 	struct mlxsw_sp *mlxsw_sp;
+	struct mlxsw_sp_nve_metrics metrics;
 	const struct mlxsw_sp_nve_ops **nve_ops_arr;
 	unsigned int num_nve_tunnels;	/* Protected by RTNL */
 	unsigned int num_max_mc_entries[MLXSW_SP_L3_PROTO_MAX];
diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum_nve_vxlan.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum_nve_vxlan.c
index 05517c7feaa5..7b71fecb3b96 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum_nve_vxlan.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_nve_vxlan.c
@@ -4,6 +4,7 @@
 #include <linux/netdevice.h>
 #include <linux/netlink.h>
 #include <linux/random.h>
+#include <net/devlink.h>
 #include <net/vxlan.h>
 
 #include "reg.h"
@@ -220,6 +221,173 @@ static int mlxsw_sp1_nve_vxlan_rtdp_set(struct mlxsw_sp *mlxsw_sp,
 	return mlxsw_reg_write(mlxsw_sp->core, MLXSW_REG(rtdp), rtdp_pl);
 }
 
+static int
+mlxsw_sp1_nve_vxlan_common_counter_get(struct devlink_metric *metric,
+				       char *tncr_pl)
+{
+	struct mlxsw_sp *mlxsw_sp = devlink_metric_priv(metric);
+
+	mlxsw_reg_tncr_pack(tncr_pl, false);
+
+	return mlxsw_reg_query(mlxsw_sp->core, MLXSW_REG(tncr), tncr_pl);
+}
+
+static int
+mlxsw_sp1_nve_vxlan_encap_counter_get(struct devlink_metric *metric,
+				      u64 *p_val)
+{
+	char tncr_pl[MLXSW_REG_TNCR_LEN];
+	int err;
+
+	err = mlxsw_sp1_nve_vxlan_common_counter_get(metric, tncr_pl);
+	if (err)
+		return err;
+
+	*p_val = mlxsw_reg_tncr_count_encap_get(tncr_pl);
+
+	return 0;
+}
+
+static const struct devlink_metric_ops mlxsw_sp1_nve_vxlan_encap_ops = {
+	.counter_get = mlxsw_sp1_nve_vxlan_encap_counter_get,
+};
+
+static int
+mlxsw_sp1_nve_vxlan_decap_counter_get(struct devlink_metric *metric,
+				      u64 *p_val)
+{
+	char tncr_pl[MLXSW_REG_TNCR_LEN];
+	int err;
+
+	err = mlxsw_sp1_nve_vxlan_common_counter_get(metric, tncr_pl);
+	if (err)
+		return err;
+
+	*p_val = mlxsw_reg_tncr_count_decap_get(tncr_pl);
+
+	return 0;
+}
+
+static const struct devlink_metric_ops mlxsw_sp1_nve_vxlan_decap_ops = {
+	.counter_get = mlxsw_sp1_nve_vxlan_decap_counter_get,
+};
+
+static int
+mlxsw_sp1_nve_vxlan_decap_errors_counter_get(struct devlink_metric *metric,
+					     u64 *p_val)
+{
+	char tncr_pl[MLXSW_REG_TNCR_LEN];
+	int err;
+
+	err = mlxsw_sp1_nve_vxlan_common_counter_get(metric, tncr_pl);
+	if (err)
+		return err;
+
+	*p_val = mlxsw_reg_tncr_count_decap_errors_get(tncr_pl);
+
+	return 0;
+}
+
+static const struct devlink_metric_ops mlxsw_sp1_nve_vxlan_decap_errors_ops = {
+	.counter_get = mlxsw_sp1_nve_vxlan_decap_errors_counter_get,
+};
+
+static int
+mlxsw_sp1_nve_vxlan_decap_discards_counter_get(struct devlink_metric *metric,
+					       u64 *p_val)
+{
+	char tncr_pl[MLXSW_REG_TNCR_LEN];
+	int err;
+
+	err = mlxsw_sp1_nve_vxlan_common_counter_get(metric, tncr_pl);
+	if (err)
+		return err;
+
+	*p_val = mlxsw_reg_tncr_count_decap_discards_get(tncr_pl);
+
+	return 0;
+}
+
+static const struct devlink_metric_ops mlxsw_sp1_nve_vxlan_decap_discards_ops = {
+	.counter_get = mlxsw_sp1_nve_vxlan_decap_discards_counter_get,
+};
+
+static int mlxsw_sp1_nve_vxlan_counters_clear(struct mlxsw_sp *mlxsw_sp)
+{
+	char tncr_pl[MLXSW_REG_TNCR_LEN];
+
+	mlxsw_reg_tncr_pack(tncr_pl, true);
+
+	/* Clear operation is implemented on query. */
+	return mlxsw_reg_query(mlxsw_sp->core, MLXSW_REG(tncr), tncr_pl);
+}
+
+static int mlxsw_sp1_nve_vxlan_metrics_init(struct mlxsw_sp *mlxsw_sp)
+{
+	struct mlxsw_sp_nve_metrics *metrics = &mlxsw_sp->nve->metrics;
+	struct devlink *devlink = priv_to_devlink(mlxsw_sp->core);
+	int err;
+
+	err = mlxsw_sp1_nve_vxlan_counters_clear(mlxsw_sp);
+	if (err)
+		return err;
+
+	metrics->counter_encap =
+		devlink_metric_counter_create(devlink, "nve_vxlan_encap",
+					      &mlxsw_sp1_nve_vxlan_encap_ops,
+					      mlxsw_sp);
+	if (IS_ERR(metrics->counter_encap))
+		return PTR_ERR(metrics->counter_encap);
+
+	metrics->counter_decap =
+		devlink_metric_counter_create(devlink, "nve_vxlan_decap",
+					      &mlxsw_sp1_nve_vxlan_decap_ops,
+					      mlxsw_sp);
+	if (IS_ERR(metrics->counter_decap)) {
+		err = PTR_ERR(metrics->counter_decap);
+		goto err_counter_decap;
+	}
+
+	metrics->counter_decap_errors =
+		devlink_metric_counter_create(devlink, "nve_vxlan_decap_errors",
+					      &mlxsw_sp1_nve_vxlan_decap_errors_ops,
+					      mlxsw_sp);
+	if (IS_ERR(metrics->counter_decap_errors)) {
+		err = PTR_ERR(metrics->counter_decap_errors);
+		goto err_counter_decap_errors;
+	}
+
+	metrics->counter_decap_discards =
+		devlink_metric_counter_create(devlink, "nve_vxlan_decap_discards",
+					      &mlxsw_sp1_nve_vxlan_decap_discards_ops,
+					      mlxsw_sp);
+	if (IS_ERR(metrics->counter_decap_discards)) {
+		err = PTR_ERR(metrics->counter_decap_discards);
+		goto err_counter_decap_discards;
+	}
+
+	return 0;
+
+err_counter_decap_discards:
+	devlink_metric_destroy(devlink, metrics->counter_decap_errors);
+err_counter_decap_errors:
+	devlink_metric_destroy(devlink, metrics->counter_decap);
+err_counter_decap:
+	devlink_metric_destroy(devlink, metrics->counter_encap);
+	return err;
+}
+
+static void mlxsw_sp1_nve_vxlan_metrics_fini(struct mlxsw_sp *mlxsw_sp)
+{
+	struct mlxsw_sp_nve_metrics *metrics = &mlxsw_sp->nve->metrics;
+	struct devlink *devlink = priv_to_devlink(mlxsw_sp->core);
+
+	devlink_metric_destroy(devlink, metrics->counter_decap_discards);
+	devlink_metric_destroy(devlink, metrics->counter_decap_errors);
+	devlink_metric_destroy(devlink, metrics->counter_decap);
+	devlink_metric_destroy(devlink, metrics->counter_encap);
+}
+
 static int mlxsw_sp1_nve_vxlan_init(struct mlxsw_sp_nve *nve,
 				    const struct mlxsw_sp_nve_config *config)
 {
@@ -238,6 +406,10 @@ static int mlxsw_sp1_nve_vxlan_init(struct mlxsw_sp_nve *nve,
 	if (err)
 		goto err_rtdp_set;
 
+	err = mlxsw_sp1_nve_vxlan_metrics_init(mlxsw_sp);
+	if (err)
+		goto err_metrics_init;
+
 	err = mlxsw_sp_router_nve_promote_decap(mlxsw_sp, config->ul_tb_id,
 						config->ul_proto,
 						&config->ul_sip,
@@ -248,6 +420,8 @@ static int mlxsw_sp1_nve_vxlan_init(struct mlxsw_sp_nve *nve,
 	return 0;
 
 err_promote_decap:
+	mlxsw_sp1_nve_vxlan_metrics_fini(mlxsw_sp);
+err_metrics_init:
 err_rtdp_set:
 	mlxsw_sp1_nve_vxlan_config_clear(mlxsw_sp);
 err_config_set:
@@ -262,6 +436,7 @@ static void mlxsw_sp1_nve_vxlan_fini(struct mlxsw_sp_nve *nve)
 
 	mlxsw_sp_router_nve_demote_decap(mlxsw_sp, config->ul_tb_id,
 					 config->ul_proto, &config->ul_sip);
+	mlxsw_sp1_nve_vxlan_metrics_fini(mlxsw_sp);
 	mlxsw_sp1_nve_vxlan_config_clear(mlxsw_sp);
 	__mlxsw_sp_nve_inc_parsing_depth_put(mlxsw_sp, 0);
 }
@@ -360,6 +535,109 @@ static int mlxsw_sp2_nve_vxlan_rtdp_set(struct mlxsw_sp *mlxsw_sp,
 	return mlxsw_reg_write(mlxsw_sp->core, MLXSW_REG(rtdp), rtdp_pl);
 }
 
+static int
+mlxsw_sp2_nve_vxlan_common_counter_get(struct devlink_metric *metric,
+				       char *tncr2_pl)
+{
+	struct mlxsw_sp *mlxsw_sp = devlink_metric_priv(metric);
+
+	mlxsw_reg_tncr2_pack(tncr2_pl, MLXSW_REG_TNCR2_TUNNEL_PORT_NVE, false);
+
+	return mlxsw_reg_query(mlxsw_sp->core, MLXSW_REG(tncr2), tncr2_pl);
+}
+
+static int
+mlxsw_sp2_nve_vxlan_decap_discards_counter_get(struct devlink_metric *metric,
+					       u64 *p_val)
+{
+	char tncr2_pl[MLXSW_REG_TNCR2_LEN];
+	int err;
+
+	err = mlxsw_sp2_nve_vxlan_common_counter_get(metric, tncr2_pl);
+	if (err)
+		return err;
+
+	*p_val = mlxsw_reg_tncr2_count_decap_discards_get(tncr2_pl);
+
+	return 0;
+}
+
+static const struct devlink_metric_ops mlxsw_sp2_nve_vxlan_decap_discards_ops = {
+	.counter_get = mlxsw_sp2_nve_vxlan_decap_discards_counter_get,
+};
+
+static int
+mlxsw_sp2_nve_vxlan_encap_discards_counter_get(struct devlink_metric *metric,
+					       u64 *p_val)
+{
+	char tncr2_pl[MLXSW_REG_TNCR2_LEN];
+	int err;
+
+	err = mlxsw_sp2_nve_vxlan_common_counter_get(metric, tncr2_pl);
+	if (err)
+		return err;
+
+	*p_val = mlxsw_reg_tncr2_count_encap_discards_get(tncr2_pl);
+
+	return 0;
+}
+
+static const struct devlink_metric_ops mlxsw_sp2_nve_vxlan_encap_discards_ops = {
+	.counter_get = mlxsw_sp2_nve_vxlan_encap_discards_counter_get,
+};
+
+static int mlxsw_sp2_nve_vxlan_counters_clear(struct mlxsw_sp *mlxsw_sp)
+{
+	char tncr2_pl[MLXSW_REG_TNCR2_LEN];
+
+	mlxsw_reg_tncr2_pack(tncr2_pl, MLXSW_REG_TNCR2_TUNNEL_PORT_NVE, true);
+
+	/* Clear operation is implemented on query. */
+	return mlxsw_reg_query(mlxsw_sp->core, MLXSW_REG(tncr2), tncr2_pl);
+}
+
+static int mlxsw_sp2_nve_vxlan_metrics_init(struct mlxsw_sp *mlxsw_sp)
+{
+	struct mlxsw_sp_nve_metrics *metrics = &mlxsw_sp->nve->metrics;
+	struct devlink *devlink = priv_to_devlink(mlxsw_sp->core);
+	int err;
+
+	err = mlxsw_sp2_nve_vxlan_counters_clear(mlxsw_sp);
+	if (err)
+		return err;
+
+	metrics->counter_decap_discards =
+		devlink_metric_counter_create(devlink, "nve_vxlan_decap_discards",
+					      &mlxsw_sp2_nve_vxlan_decap_discards_ops,
+					      mlxsw_sp);
+	if (IS_ERR(metrics->counter_decap_discards))
+		return PTR_ERR(metrics->counter_decap_discards);
+
+	metrics->counter_encap_discards =
+		devlink_metric_counter_create(devlink, "nve_vxlan_encap_discards",
+					      &mlxsw_sp2_nve_vxlan_encap_discards_ops,
+					      mlxsw_sp);
+	if (IS_ERR(metrics->counter_encap_discards)) {
+		err = PTR_ERR(metrics->counter_encap_discards);
+		goto err_counter_encap_discards;
+	}
+
+	return 0;
+
+err_counter_encap_discards:
+	devlink_metric_destroy(devlink, metrics->counter_decap_discards);
+	return err;
+}
+
+static void mlxsw_sp2_nve_vxlan_metrics_fini(struct mlxsw_sp *mlxsw_sp)
+{
+	struct mlxsw_sp_nve_metrics *metrics = &mlxsw_sp->nve->metrics;
+	struct devlink *devlink = priv_to_devlink(mlxsw_sp->core);
+
+	devlink_metric_destroy(devlink, metrics->counter_encap_discards);
+	devlink_metric_destroy(devlink, metrics->counter_decap_discards);
+}
+
 static int mlxsw_sp2_nve_vxlan_init(struct mlxsw_sp_nve *nve,
 				    const struct mlxsw_sp_nve_config *config)
 {
@@ -379,6 +657,10 @@ static int mlxsw_sp2_nve_vxlan_init(struct mlxsw_sp_nve *nve,
 	if (err)
 		goto err_rtdp_set;
 
+	err = mlxsw_sp2_nve_vxlan_metrics_init(mlxsw_sp);
+	if (err)
+		goto err_metrics_init;
+
 	err = mlxsw_sp_router_nve_promote_decap(mlxsw_sp, config->ul_tb_id,
 						config->ul_proto,
 						&config->ul_sip,
@@ -389,6 +671,8 @@ static int mlxsw_sp2_nve_vxlan_init(struct mlxsw_sp_nve *nve,
 	return 0;
 
 err_promote_decap:
+	mlxsw_sp2_nve_vxlan_metrics_fini(mlxsw_sp);
+err_metrics_init:
 err_rtdp_set:
 	mlxsw_sp2_nve_vxlan_config_clear(mlxsw_sp);
 err_config_set:
@@ -403,6 +687,7 @@ static void mlxsw_sp2_nve_vxlan_fini(struct mlxsw_sp_nve *nve)
 
 	mlxsw_sp_router_nve_demote_decap(mlxsw_sp, config->ul_tb_id,
 					 config->ul_proto, &config->ul_sip);
+	mlxsw_sp2_nve_vxlan_metrics_fini(mlxsw_sp);
 	mlxsw_sp2_nve_vxlan_config_clear(mlxsw_sp);
 	__mlxsw_sp_nve_inc_parsing_depth_put(mlxsw_sp, 0);
 }
-- 
2.26.2


* Re: [RFC PATCH net-next 1/6] devlink: Add device metric infrastructure
  2020-08-17 12:50 ` [RFC PATCH net-next 1/6] devlink: Add device metric infrastructure Ido Schimmel
@ 2020-08-17 14:12   ` Andrew Lunn
  0 siblings, 0 replies; 29+ messages in thread
From: Andrew Lunn @ 2020-08-17 14:12 UTC (permalink / raw)
  To: Ido Schimmel
  Cc: netdev, davem, kuba, jiri, amcohen, danieller, mlxsw, roopa,
	dsahern, f.fainelli, vivien.didelot, saeedm, tariqt, ayal,
	eranbe, mkubecek, Ido Schimmel

On Mon, Aug 17, 2020 at 03:50:54PM +0300, Ido Schimmel wrote:
> From: Ido Schimmel <idosch@nvidia.com>
> 
> Add an infrastructure that allows device drivers to dynamically register
> and unregister their supported metrics with devlink. The metrics and
> their values are exposed to user space, which can decide to group certain
> metrics together. This allows user space to request a filtered dump of
> only the metrics that are members of the provided group.
> 
> Currently, the only supported metric type is a counter, but histograms
> will be added in the future for devices that implement histogram agents
> in hardware.

Hi Ido, Amit

Some initial thoughts.

I said this during netdevconf: I think we need some way to group
metrics together. The example you gave was supporting the counters for
a TCAM and VXLAN offload. I expect users will want to get just the
TCAM counters, or just the VXLAN counters.

Maybe one way to support this is to allow the group to be passed to
the create function, rather than defaulting it to 0? The driver can
then split the metrics up, if it wants to. Otherwise, provide some
other sort of identifier which can be used, maybe a hardware block name?
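
Something like this, as a very rough sketch (the group_id parameter,
the MLXSW_SP_METRIC_GROUP_VXLAN define and the struct type names are
all made up here, use whatever the series ends up calling them):

	struct devlink_metric *
	devlink_metric_counter_create(struct devlink *devlink, const char *name,
				      u32 group_id,
				      const struct devlink_metric_ops *ops,
				      void *priv);

	/* e.g. keep the TCAM and the VXLAN counters in separate groups */
	metrics->counter_encap =
		devlink_metric_counter_create(devlink, "nve_vxlan_encap",
					      MLXSW_SP_METRIC_GROUP_VXLAN,
					      &mlxsw_sp1_nve_vxlan_encap_ops,
					      mlxsw_sp);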

One big difference between this API and normal netlink statistics is
that each devlink counter is totally independent of every other
devlink counter. You cannot compare counters, because they are not
read atomically. Most hardware I come across supports snapshots of the
counters. So with the current ethtool counters, you snapshot them,
read them all into one buffer, and then return them to user space. The
rtnl lock prevents two snapshots from running at the same time.
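
i.e. the usual driver pattern looks roughly like this (the foo_*
helpers, struct foo_priv and FOO_N_STATS are made-up placeholders):

	static void foo_get_ethtool_stats(struct net_device *dev,
					  struct ethtool_stats *stats,
					  u64 *data)
	{
		struct foo_priv *priv = netdev_priv(dev);
		int i;

		/* RTNL is held here, so two snapshots cannot race */
		foo_hw_snapshot_counters(priv);
		for (i = 0; i < FOO_N_STATS; i++)
			data[i] = foo_hw_read_counter(priv, i);
	}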

	Andrew


* Re: [RFC PATCH net-next 6/6] mlxsw: spectrum_nve: Expose VXLAN counters via devlink-metric
  2020-08-17 12:50 ` [RFC PATCH net-next 6/6] mlxsw: spectrum_nve: Expose VXLAN counters via devlink-metric Ido Schimmel
@ 2020-08-17 14:29   ` Andrew Lunn
  2020-08-18  6:59     ` Ido Schimmel
  0 siblings, 1 reply; 29+ messages in thread
From: Andrew Lunn @ 2020-08-17 14:29 UTC (permalink / raw)
  To: Ido Schimmel
  Cc: netdev, davem, kuba, jiri, amcohen, danieller, mlxsw, roopa,
	dsahern, f.fainelli, vivien.didelot, saeedm, tariqt, ayal,
	eranbe, mkubecek, Ido Schimmel

> +static int mlxsw_sp1_nve_vxlan_metrics_init(struct mlxsw_sp *mlxsw_sp)
> +{
> +	struct mlxsw_sp_nve_metrics *metrics = &mlxsw_sp->nve->metrics;
> +	struct devlink *devlink = priv_to_devlink(mlxsw_sp->core);
> +	int err;
> +
> +	err = mlxsw_sp1_nve_vxlan_counters_clear(mlxsw_sp);
> +	if (err)
> +		return err;
> +
> +	metrics->counter_encap =
> +		devlink_metric_counter_create(devlink, "nve_vxlan_encap",
> +					      &mlxsw_sp1_nve_vxlan_encap_ops,
> +					      mlxsw_sp);
> +	if (IS_ERR(metrics->counter_encap))
> +		return PTR_ERR(metrics->counter_encap);
> +
> +	metrics->counter_decap =
> +		devlink_metric_counter_create(devlink, "nve_vxlan_decap",
> +					      &mlxsw_sp1_nve_vxlan_decap_ops,
> +					      mlxsw_sp);
> +	if (IS_ERR(metrics->counter_decap)) {
> +		err = PTR_ERR(metrics->counter_decap);
> +		goto err_counter_decap;
> +	}
> +
> +	metrics->counter_decap_errors =
> +		devlink_metric_counter_create(devlink, "nve_vxlan_decap_errors",
> +					      &mlxsw_sp1_nve_vxlan_decap_errors_ops,
> +					      mlxsw_sp);
> +	if (IS_ERR(metrics->counter_decap_errors)) {
> +		err = PTR_ERR(metrics->counter_decap_errors);
> +		goto err_counter_decap_errors;
> +	}
> +
> +	metrics->counter_decap_discards =
> +		devlink_metric_counter_create(devlink, "nve_vxlan_decap_discards",
> +					      &mlxsw_sp1_nve_vxlan_decap_discards_ops,
> +					      mlxsw_sp);
> +	if (IS_ERR(metrics->counter_decap_discards)) {
> +		err = PTR_ERR(metrics->counter_decap_discards);
> +		goto err_counter_decap_discards;
> +	}
> +
> +	return 0;

Looking at this, I wonder about the scalability of this API. With just
4 counters it looks pretty ugly. What about 50 counters?

Maybe move the name into the ops structure. Then add a call
devlink_metric_counters_create() where you can pass an array of ops
structures and its size? There are plenty of other examples in the
kernel, e.g. sysfs groups, hwmon, etc. where you register a large
bunch of things with the core with a single call.
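
Roughly (just a sketch; the ops structure layout, the value_get
callback and the mlxsw_sp1_nve_vxlan_metric_ops array are invented
here):

	struct devlink_metric_ops {
		const char *name;
		int (*counter_value_get)(struct devlink_metric *metric,
					 u64 *p_value);
	};

	int devlink_metric_counters_create(struct devlink *devlink,
					   const struct devlink_metric_ops *ops,
					   unsigned int count, void *priv);

	err = devlink_metric_counters_create(devlink,
					     mlxsw_sp1_nve_vxlan_metric_ops,
					     ARRAY_SIZE(mlxsw_sp1_nve_vxlan_metric_ops),
					     mlxsw_sp);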

> +static void mlxsw_sp1_nve_vxlan_metrics_fini(struct mlxsw_sp *mlxsw_sp)
> +{
> +	struct mlxsw_sp_nve_metrics *metrics = &mlxsw_sp->nve->metrics;
> +	struct devlink *devlink = priv_to_devlink(mlxsw_sp->core);
> +
> +	devlink_metric_destroy(devlink, metrics->counter_decap_discards);
> +	devlink_metric_destroy(devlink, metrics->counter_decap_errors);
> +	devlink_metric_destroy(devlink, metrics->counter_decap);
> +	devlink_metric_destroy(devlink, metrics->counter_encap);
> +}

I guess the most frequent use case is to remove all counters,
e.g. driver unload, or when probe fails. So maybe provide a
devlink_metric_destroy_all(devlink) ?
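
i.e. just:

	void devlink_metric_destroy_all(struct devlink *devlink);

and let the devlink core clean up whatever is still registered.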

    Andrew


* Re: [RFC PATCH net-next 6/6] mlxsw: spectrum_nve: Expose VXLAN counters via devlink-metric
  2020-08-17 14:29   ` Andrew Lunn
@ 2020-08-18  6:59     ` Ido Schimmel
  0 siblings, 0 replies; 29+ messages in thread
From: Ido Schimmel @ 2020-08-18  6:59 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: netdev, davem, kuba, jiri, amcohen, danieller, mlxsw, roopa,
	dsahern, f.fainelli, vivien.didelot, saeedm, tariqt, ayal,
	eranbe, mkubecek, Ido Schimmel

On Mon, Aug 17, 2020 at 04:29:52PM +0200, Andrew Lunn wrote:
> > +static int mlxsw_sp1_nve_vxlan_metrics_init(struct mlxsw_sp *mlxsw_sp)
> > +{
> > +	struct mlxsw_sp_nve_metrics *metrics = &mlxsw_sp->nve->metrics;
> > +	struct devlink *devlink = priv_to_devlink(mlxsw_sp->core);
> > +	int err;
> > +
> > +	err = mlxsw_sp1_nve_vxlan_counters_clear(mlxsw_sp);
> > +	if (err)
> > +		return err;
> > +
> > +	metrics->counter_encap =
> > +		devlink_metric_counter_create(devlink, "nve_vxlan_encap",
> > +					      &mlxsw_sp1_nve_vxlan_encap_ops,
> > +					      mlxsw_sp);
> > +	if (IS_ERR(metrics->counter_encap))
> > +		return PTR_ERR(metrics->counter_encap);
> > +
> > +	metrics->counter_decap =
> > +		devlink_metric_counter_create(devlink, "nve_vxlan_decap",
> > +					      &mlxsw_sp1_nve_vxlan_decap_ops,
> > +					      mlxsw_sp);
> > +	if (IS_ERR(metrics->counter_decap)) {
> > +		err = PTR_ERR(metrics->counter_decap);
> > +		goto err_counter_decap;
> > +	}
> > +
> > +	metrics->counter_decap_errors =
> > +		devlink_metric_counter_create(devlink, "nve_vxlan_decap_errors",
> > +					      &mlxsw_sp1_nve_vxlan_decap_errors_ops,
> > +					      mlxsw_sp);
> > +	if (IS_ERR(metrics->counter_decap_errors)) {
> > +		err = PTR_ERR(metrics->counter_decap_errors);
> > +		goto err_counter_decap_errors;
> > +	}
> > +
> > +	metrics->counter_decap_discards =
> > +		devlink_metric_counter_create(devlink, "nve_vxlan_decap_discards",
> > +					      &mlxsw_sp1_nve_vxlan_decap_discards_ops,
> > +					      mlxsw_sp);
> > +	if (IS_ERR(metrics->counter_decap_discards)) {
> > +		err = PTR_ERR(metrics->counter_decap_discards);
> > +		goto err_counter_decap_discards;
> > +	}
> > +
> > +	return 0;
> 
> Looking at this, I wonder about the scalability of this API. With just
> 4 counters it looks pretty ugly. What about 50 counters?
> 
> Maybe move the name into the ops structure. Then add a call
> devlink_metric_counters_create() where you can pass an array of ops
> structures and its size? There are plenty of other examples in the
> kernel, e.g. sysfs groups, hwmon, etc. where you register a large
> bunch of things with the core with a single call.

Yes, good suggestion. Will add the ability to register multiple metrics
at once.

> 
> > +static void mlxsw_sp1_nve_vxlan_metrics_fini(struct mlxsw_sp *mlxsw_sp)
> > +{
> > +	struct mlxsw_sp_nve_metrics *metrics = &mlxsw_sp->nve->metrics;
> > +	struct devlink *devlink = priv_to_devlink(mlxsw_sp->core);
> > +
> > +	devlink_metric_destroy(devlink, metrics->counter_decap_discards);
> > +	devlink_metric_destroy(devlink, metrics->counter_decap_errors);
> > +	devlink_metric_destroy(devlink, metrics->counter_decap);
> > +	devlink_metric_destroy(devlink, metrics->counter_encap);
> > +}
> 
> I guess the most frequent use case is to remove all counters,
> e.g. driver unload, or when probe fails. So maybe provide a
> devlink_metric_destroy_all(devlink) ?

If we are going to add something like devlink_metric_counters_create(),
then we can also add devlink_metrics_destroy(), which will remove all
the provided metrics in one call. I prefer it over _all() because then
it is symmetric with the _create() operation.
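
i.e. roughly (exact signature TBD, just to show the symmetry with the
array-based create):

	void devlink_metrics_destroy(struct devlink *devlink,
				     const struct devlink_metric_ops *ops,
				     unsigned int count);

With that, mlxsw_sp1_nve_vxlan_metrics_fini() becomes a single call
instead of four devlink_metric_destroy() calls.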

Thanks!


* Re: [RFC PATCH net-next 0/6] devlink: Add device metric support
  2020-08-17 12:50 [RFC PATCH net-next 0/6] devlink: Add device metric support Ido Schimmel
                   ` (5 preceding siblings ...)
  2020-08-17 12:50 ` [RFC PATCH net-next 6/6] mlxsw: spectrum_nve: Expose VXLAN counters via devlink-metric Ido Schimmel
@ 2020-08-19  0:24 ` Jakub Kicinski
  2020-08-19  2:43   ` David Ahern
  6 siblings, 1 reply; 29+ messages in thread
From: Jakub Kicinski @ 2020-08-19  0:24 UTC (permalink / raw)
  To: Ido Schimmel
  Cc: netdev, davem, jiri, amcohen, danieller, mlxsw, roopa, dsahern,
	andrew, f.fainelli, vivien.didelot, saeedm, tariqt, ayal, eranbe,
	mkubecek, Ido Schimmel

On Mon, 17 Aug 2020 15:50:53 +0300 Ido Schimmel wrote:
> From: Ido Schimmel <idosch@nvidia.com>
> 
> This patch set extends devlink to allow device drivers to expose device
> metrics to user space in a standard and extensible fashion, as opposed
> to the driver-specific debugfs approach.

I feel like all those loose hardware interfaces are a huge maintenance
burden. I don't know what the solution is, but the status quo is not
great.

I spend way too much time patrolling ethtool -S outputs already.

I've done my absolute best to make sure that something as simple as
updating device firmware can be done in a vendor-agnostic fashion, 
and I'm not sure I've succeeded. Every single vendor comes up with
their own twists.

Long story short, I'm extremely unexcited about another interface where
drivers expose random strings of their own picking. Maybe the udp_tunnel
module could have "global" stats?

This would be a good topic for netconf, or LPC hallway discussion :(


* Re: [RFC PATCH net-next 0/6] devlink: Add device metric support
  2020-08-19  0:24 ` [RFC PATCH net-next 0/6] devlink: Add device metric support Jakub Kicinski
@ 2020-08-19  2:43   ` David Ahern
  2020-08-19  3:35     ` Jakub Kicinski
  0 siblings, 1 reply; 29+ messages in thread
From: David Ahern @ 2020-08-19  2:43 UTC (permalink / raw)
  To: Jakub Kicinski, Ido Schimmel
  Cc: netdev, davem, jiri, amcohen, danieller, mlxsw, roopa, andrew,
	f.fainelli, vivien.didelot, saeedm, tariqt, ayal, eranbe,
	mkubecek, Ido Schimmel

On 8/18/20 6:24 PM, Jakub Kicinski wrote:
> On Mon, 17 Aug 2020 15:50:53 +0300 Ido Schimmel wrote:
>> From: Ido Schimmel <idosch@nvidia.com>
>>
>> This patch set extends devlink to allow device drivers to expose device
>> metrics to user space in a standard and extensible fashion, as opposed
>> to the driver-specific debugfs approach.
> 
> I feel like all those loose hardware interfaces are a huge maintenance
> burden. I don't know what the solution is, but the status quo is not
> great.

I don't agree with the 'loose' characterization. Ido and team are
pushing what is arguably a modern version of `ethtool -S`, so it
provides a better API for retrieving data.

> 
> I spend way too much time patrolling ethtool -S outputs already.
> 

But that's the nature of detailed stats which are often essential to
ensuring the system is operating as expected or debugging some problem.
Commonality is certainly desired in names when relevant to be able to
build tooling around the stats. As an example, per-queue stats have been
essential to me for recent investigations. ethq has been really helpful
in crossing NIC vendors and viewing those stats as it handles the
per-vendor naming differences, but it requires changes to show anything
else - errors per queue, xdp stats, drops, etc. This part could be simpler.

As for this set, I believe the metrics exposed here are more unique to
switch ASICs. At least one company I know of has built a business model
around exposing detailed telemetry of switch ASICs, so clearly some find
them quite valuable.


* Re: [RFC PATCH net-next 0/6] devlink: Add device metric support
  2020-08-19  2:43   ` David Ahern
@ 2020-08-19  3:35     ` Jakub Kicinski
  2020-08-19  4:30       ` Florian Fainelli
  0 siblings, 1 reply; 29+ messages in thread
From: Jakub Kicinski @ 2020-08-19  3:35 UTC (permalink / raw)
  To: David Ahern
  Cc: Ido Schimmel, netdev, davem, jiri, amcohen, danieller, mlxsw,
	roopa, andrew, f.fainelli, vivien.didelot, tariqt, ayal,
	mkubecek, Ido Schimmel

On Tue, 18 Aug 2020 20:43:11 -0600 David Ahern wrote:
> On 8/18/20 6:24 PM, Jakub Kicinski wrote:
> > On Mon, 17 Aug 2020 15:50:53 +0300 Ido Schimmel wrote:  
> >> From: Ido Schimmel <idosch@nvidia.com>
> >>
> >> This patch set extends devlink to allow device drivers to expose device
> >> metrics to user space in a standard and extensible fashion, as opposed
> >> to the driver-specific debugfs approach.  
> > 
> > I feel like all those loose hardware interfaces are a huge maintenance
> > burden. I don't know what the solution is, but the status quo is not
> > great.  
> 
> I don't agree with the 'loose' characterization.

Loose as in not bound by any standard or best practices.

> Ido and team are pushing what is arguably a modern version of
> `ethtool -S`, so it provides a better API for retrieving data.

ethtool -S is absolutely terrible. Everybody comes up with their own
names for IEEE stats, and dumps stats which clearly have corresponding
fields in rtnl_link_stats64 there. We don't need a modern ethtool -S,
we need to get away from that mess.

> > I spend way too much time patrolling ethtool -S outputs already.
> 
> But that's the nature of detailed stats which are often essential to
> ensuring the system is operating as expected or debugging some problem.
> Commonality is certainly desired in names when relevant to be able to
> build tooling around the stats.

There are stats which are clearly detailed and device specific, 
but what ends up happening is that people expose very much not
implementation specific stats through the free form interfaces, 
because it's the easiest. 

And users are left picking up the pieces, having to ask vendors what
each stat means, and trying to create abstractions in their user space
glue.

> As an example, per-queue stats have been
> essential to me for recent investigations. ethq has been really helpful
> in crossing NIC vendors and viewing those stats as it handles the
> per-vendor naming differences, but it requires changes to show anything
> else - errors per queue, xdp stats, drops, etc. This part could be simpler.

Sounds like you're agreeing with me?

> As for this set, I believe the metrics exposed here are more unique to
> switch ASICs.

This is the list from patch 6:

   * - ``nve_vxlan_encap``
   * - ``nve_vxlan_decap``
   * - ``nve_vxlan_decap_errors``
   * - ``nve_vxlan_decap_discards``

What's so unique?

> At least one company I know of has built a business model
> around exposing detailed telemetry of switch ASICs, so clearly some find
> them quite valuable.

It's a question of interface, not the value of exposed data.

If I have to download vendor documentation and tooling, or adapt my own
scripts for every new vendor, I could have as well downloaded an SDK.


* Re: [RFC PATCH net-next 0/6] devlink: Add device metric support
  2020-08-19  3:35     ` Jakub Kicinski
@ 2020-08-19  4:30       ` Florian Fainelli
  2020-08-19 16:18         ` Jakub Kicinski
  0 siblings, 1 reply; 29+ messages in thread
From: Florian Fainelli @ 2020-08-19  4:30 UTC (permalink / raw)
  To: Jakub Kicinski, David Ahern
  Cc: Ido Schimmel, netdev, davem, jiri, amcohen, danieller, mlxsw,
	roopa, andrew, vivien.didelot, tariqt, ayal, mkubecek,
	Ido Schimmel



On 8/18/2020 8:35 PM, Jakub Kicinski wrote:
> On Tue, 18 Aug 2020 20:43:11 -0600 David Ahern wrote:
>> On 8/18/20 6:24 PM, Jakub Kicinski wrote:
>>> On Mon, 17 Aug 2020 15:50:53 +0300 Ido Schimmel wrote:
>>>> From: Ido Schimmel <idosch@nvidia.com>
>>>>
>>>> This patch set extends devlink to allow device drivers to expose device
>>>> metrics to user space in a standard and extensible fashion, as opposed
>>>> to the driver-specific debugfs approach.
>>>
>>> I feel like all those loose hardware interfaces are a huge maintenance
>>> burden. I don't know what the solution is, but the status quo is not
>>> great.
>>
>> I don't agree with the 'loose' characterization.
> 
> Loose as in not bound by any standard or best practices.
> 
>> Ido and team are pushing what is arguably a modern version of
>> `ethtool -S`, so it provides a better API for retrieving data.
> 
> ethtool -S is absolutely terrible. Everybody comes up with their own
> names for IEEE stats, and dumps stats which clearly have corresponding
> fields in rtnl_link_stats64 there. We don't need a modern ethtool -S,
> we need to get away from that mess.
> 
>>> I spend way too much time patrolling ethtool -S outputs already.
>>
>> But that's the nature of detailed stats which are often essential to
>> ensuring the system is operating as expected or debugging some problem.
>> Commonality is certainly desired in names when relevant to be able to
>> build tooling around the stats.
> 
> There are stats which are clearly detailed and device specific,
> but what ends up happening is that people expose very much not
> implementation specific stats through the free form interfaces,
> because it's the easiest.
> 
> And users are left picking up the pieces, having to ask vendors what
> each stat means, and trying to create abstractions in their user space
> glue.

Should we require vendors to either provide a Documentation/ entry for 
each statistic they have (and be guaranteed that it will be outdated 
unless someone notices), or would you rather have the statistics 
description be part of the devlink interface itself? Should we define 
namespaces such that standard metrics should be under the standard 
namespace and the vendor standard is the wild west?

> 
>> As an example, per-queue stats have been
>> essential to me for recent investigations. ethq has been really helpful
>> in crossing NIC vendors and viewing those stats as it handles the
>> per-vendor naming differences, but it requires changes to show anything
>> else - errors per queue, xdp stats, drops, etc. This part could be simpler.
> 
> Sounds like you're agreeing with me?
> 
>> As for this set, I believe the metrics exposed here are more unique to
>> switch ASICs.
> 
> This is the list from patch 6:
> 
>     * - ``nve_vxlan_encap``
>     * - ``nve_vxlan_decap``
>     * - ``nve_vxlan_decap_errors``
>     * - ``nve_vxlan_decap_discards``
> 
> What's so unique?
> 
>> At least one company I know of has built a business model
>> around exposing detailed telemetry of switch ASICs, so clearly some find
>> them quite valuable.
> 
> It's a question of interface, not the value of exposed data.
> 
> If I have to download vendor documentation and tooling, or adapt my own
> scripts for every new vendor, I could have as well downloaded an SDK.

Are not you being a bit over dramatic here with your example? At least 
you can run the same command to obtain the stats regardless of the 
driver and vendor, so from that perspective Linux continues to be the 
abstraction and that is not broken.
-- 
Florian


* Re: [RFC PATCH net-next 0/6] devlink: Add device metric support
  2020-08-19  4:30       ` Florian Fainelli
@ 2020-08-19 16:18         ` Jakub Kicinski
  2020-08-19 17:20           ` Florian Fainelli
  0 siblings, 1 reply; 29+ messages in thread
From: Jakub Kicinski @ 2020-08-19 16:18 UTC (permalink / raw)
  To: Florian Fainelli
  Cc: David Ahern, Ido Schimmel, netdev, davem, jiri, amcohen,
	danieller, mlxsw, roopa, andrew, vivien.didelot, tariqt, ayal,
	mkubecek, Ido Schimmel

On Tue, 18 Aug 2020 21:30:16 -0700 Florian Fainelli wrote:
> >>> I spend way too much time patrolling ethtool -S outputs already.  
> >>
> >> But that's the nature of detailed stats which are often essential to
> >> ensuring the system is operating as expected or debugging some problem.
> >> Commonality is certainly desired in names when relevant to be able to
> >> build tooling around the stats.  
> > 
> > There are stats which are clearly detailed and device specific,
> > but what ends up happening is that people expose very much not
> > implementation specific stats through the free form interfaces,
> > because it's the easiest.
> > 
> > And users are left picking up the pieces, having to ask vendors what
> > each stat means, and trying to create abstractions in their user space
> > glue.  
> 
> Should we require vendors to either provide a Documentation/ entry for 
> each statistic they have (and be guaranteed that it will be outdated 
> unless someone notices), or would you rather have the statistics 
> description be part of the devlink interface itself? Should we define 
> namespaces such that standard metrics should be under the standard 
> namespace and the vendor standard is the wild west?

I'm trying to find a solution which will not require a policeman to
constantly monitor the compliance. Please see my effort to ensure
drivers document and use the same ethtool -S stats in the TLS offload
implementations. I've been trying to improve this situation for a long
time, and it's getting old.

Please focus on the stats this set adds, instead of fantasizing of what
could be. These are absolutely not implementation specific!

> > If I have to download vendor documentation and tooling, or adapt my own
> > scripts for every new vendor, I could have as well downloaded an SDK.  
> 
> Are not you being a bit over dramatic here with your example? 

I hope not. It's very hard/impossible today to run a fleet of Linux
machines without resorting to vendor tooling.

> At least  you can run the same command to obtain the stats regardless
> of the driver and vendor, so from that perspective Linux continues to
> be the abstraction and that is not broken.

Format of the data is no abstraction.


* Re: [RFC PATCH net-next 0/6] devlink: Add device metric support
  2020-08-19 16:18         ` Jakub Kicinski
@ 2020-08-19 17:20           ` Florian Fainelli
  2020-08-19 18:07             ` Jakub Kicinski
  0 siblings, 1 reply; 29+ messages in thread
From: Florian Fainelli @ 2020-08-19 17:20 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: David Ahern, Ido Schimmel, netdev, davem, jiri, amcohen,
	danieller, mlxsw, roopa, andrew, vivien.didelot, tariqt, ayal,
	mkubecek, Ido Schimmel

On 8/19/20 9:18 AM, Jakub Kicinski wrote:
> On Tue, 18 Aug 2020 21:30:16 -0700 Florian Fainelli wrote:
>>>>> I spend way too much time patrolling ethtool -S outputs already.  
>>>>
>>>> But that's the nature of detailed stats which are often essential to
>>>> ensuring the system is operating as expected or debugging some problem.
>>>> Commonality is certainly desired in names when relevant to be able to
>>>> build tooling around the stats.  
>>>
>>> There are stats which are clearly detailed and device specific,
>>> but what ends up happening is that people expose very much not
>>> implementation specific stats through the free form interfaces,
>>> because it's the easiest.
>>>
>>> And users are left picking up the pieces, having to ask vendors what
>>> each stat means, and trying to create abstractions in their user space
>>> glue.  
>>
>> Should we require vendors to either provide a Documentation/ entry for 
>> each statistic they have (and be guaranteed that it will be outdated 
>> unless someone notices), or would you rather have the statistics 
>> description be part of the devlink interface itself? Should we define 
>> namespaces such that standard metrics should be under the standard 
>> namespace and the vendor standard is the wild west?
> 
> I'm trying to find a solution which will not require a policeman to
> constantly monitor the compliance. Please see my effort to ensure
> drivers document and use the same ethtool -S stats in the TLS offload
> implementations. I've been trying to improve this situation for a long
> time, and it's getting old.

Which is why I am asking genuinely what do you think should be done
besides doing more code reviews? It does not seem to me that there is an
easy way to catch new stats being added with tools/scripts/whatever and
then determine what they are about, right?

> 
> Please focus on the stats this set adds, instead of fantasizing of what
> could be. These are absolutely not implementation specific!

Not sure if fantasizing is quite what I would use. I am just pointing
out that given the inability to standardize on statistics maybe we
should have namespaces and try our best to have everything fit into the
standard namespace along with a standard set of names, and push back
whenever we see vendor stats being added (or more pragmatically, ask
what they are). But maybe this very idea is moot.

> 
>>> If I have to download vendor documentation and tooling, or adapt my own
>>> scripts for every new vendor, I could have as well downloaded an SDK.  
>>
>> Are not you being a bit over dramatic here with your example? 
> 
> I hope not. It's very hard/impossible today to run a fleet of Linux
> machines without resorting to vendor tooling.

Your argument was putting on the same level resorting to vendor tooling
to extract meaningful statistics/counters versus using a SDK to operate
the hardware (this is how I understood it), and I do not believe this is
fair.

> 
>> At least  you can run the same command to obtain the stats regardless
>> of the driver and vendor, so from that perspective Linux continues to
>> be the abstraction and that is not broken.
> 
> Format of the data is no abstraction.
> 
-- 
Florian


* Re: [RFC PATCH net-next 0/6] devlink: Add device metric support
  2020-08-19 17:20           ` Florian Fainelli
@ 2020-08-19 18:07             ` Jakub Kicinski
  2020-08-20 14:35               ` David Ahern
  0 siblings, 1 reply; 29+ messages in thread
From: Jakub Kicinski @ 2020-08-19 18:07 UTC (permalink / raw)
  To: Florian Fainelli
  Cc: David Ahern, Ido Schimmel, netdev, davem, jiri, amcohen,
	danieller, mlxsw, roopa, andrew, vivien.didelot, tariqt, ayal,
	mkubecek, Ido Schimmel

On Wed, 19 Aug 2020 10:20:08 -0700 Florian Fainelli wrote:
> > I'm trying to find a solution which will not require a policeman to
> > constantly monitor the compliance. Please see my effort to ensure
> > drivers document and use the same ethtool -S stats in the TLS offload
> > implementations. I've been trying to improve this situation for a long
> > time, and it's getting old.  
> 
> Which is why I am asking genuinely what do you think should be done
> besides doing more code reviews? It does not seem to me that there is an
> easy way to catch new stats being added with tools/scripts/whatever and
> then determine what they are about, right?

I don't have a great way forward in mind, sadly. All I can think of is
that we should try to create more well defined interfaces and steer
away from free-form ones.

Example, here if the stats are vxlan decap/encap/error - we should
expose that from the vxlan module. That way vxlan module defines one
set of stats for everyone.

In general unless we attach stats to the object they relate to, we will
end up building parallel structures for exposing statistics from the
drivers. I posted a set once which was implementing hierarchical stats,
but I've abandoned it for this reason.

> > Please focus on the stats this set adds, instead of fantasizing of what
> > could be. These are absolutely not implementation specific!  
> 
> Not sure if fantasizing is quite what I would use. I am just pointing
> out that given the inability to standardize on statistics maybe we
> should have namespaces and try our best to have everything fit into the
> standard namespace along with a standard set of names, and push back
> whenever we see vendor stats being added (or more pragmatically, ask
> what they are). But maybe this very idea is moot.

IDK. I just don't feel like this is going to fly, see how many names
people invented for the CRC error statistic in ethtool -S, even tho
there is a standard stat for that! And users are actually parsing the
output of ethtool -S to get CRC stats because (a) it became the go-to
place for NIC stats and (b) some drivers forget to report in the
standard place.

The cover letter says this set replaces the bad debugfs with a good,
standard API. It may look good and standard for _vendors_ because they
will know where to dump their counters, but it makes very little
difference for _users_. If I have to parse names for every vendor I use,
I can as well add a per-vendor debugfs path to my script.

The bar for implementation-specific driver stats has to be high.

> >>> If I have to download vendor documentation and tooling, or adapt my own
> >>> scripts for every new vendor, I could have as well downloaded an SDK.    
> >>
> >> Are not you being a bit over dramatic here with your example?   
> > 
> > I hope not. It's very hard/impossible today to run a fleet of Linux
> > machines without resorting to vendor tooling.  
> 
> Your argument was putting on the same level resorting to vendor tooling
> to extract meaningful statistics/counters versus using a SDK to operate
> the hardware (this is how I understood it), and I do not believe this is
> fair.

Okay, fair. I just think that in datacenter deployments we are way
closer to the SDK model than people may want to admit.


* Re: [RFC PATCH net-next 0/6] devlink: Add device metric support
  2020-08-19 18:07             ` Jakub Kicinski
@ 2020-08-20 14:35               ` David Ahern
  2020-08-20 16:09                 ` Jakub Kicinski
  0 siblings, 1 reply; 29+ messages in thread
From: David Ahern @ 2020-08-20 14:35 UTC (permalink / raw)
  To: Jakub Kicinski, Florian Fainelli
  Cc: Ido Schimmel, netdev, davem, jiri, amcohen, danieller, mlxsw,
	roopa, andrew, vivien.didelot, tariqt, ayal, mkubecek,
	Ido Schimmel

On 8/19/20 12:07 PM, Jakub Kicinski wrote:
> On Wed, 19 Aug 2020 10:20:08 -0700 Florian Fainelli wrote:
>>> I'm trying to find a solution which will not require a policeman to
>>> constantly monitor the compliance. Please see my effort to ensure
>>> drivers document and use the same ethtool -S stats in the TLS offload
>>> implementations. I've been trying to improve this situation for a long
>>> time, and it's getting old.  
>>
>> Which is why I am asking genuinely what do you think should be done
>> besides doing more code reviews? It does not seem to me that there is an
>> easy way to catch new stats being added with tools/scripts/whatever and
>> then determine what they are about, right?
> 
> I don't have a great way forward in mind, sadly. All I can think of is
> that we should try to create more well defined interfaces and steer
> away from free-form ones.

There is a lot of value in free-form too.

> 
> Example, here if the stats are vxlan decap/encap/error - we should
> expose that from the vxlan module. That way vxlan module defines one
> set of stats for everyone.
> 
> In general unless we attach stats to the object they relate to, we will
> end up building parallel structures for exposing statistics from the
> drivers. I posted a set once which was implementing hierarchical stats,
> but I've abandoned it for this reason.
> 
>>> Please focus on the stats this set adds, instead of fantasizing of what
>>> could be. These are absolutely not implementation specific!  
>>
>> Not sure if fantasizing is quite what I would use. I am just pointing
>> out that given the inability to standardize on statistics maybe we
>> should have namespaces and try our best to have everything fit into the
>> standard namespace along with a standard set of names, and push back
>> whenever we see vendor stats being added (or more pragmatically, ask
>> what they are). But maybe this very idea is moot.
> 
> IDK. I just don't feel like this is going to fly, see how many names
> people invented for the CRC error statistic in ethtool -S, even tho
> there is a standard stat for that! And users are actually parsing the
> output of ethtool -S to get CRC stats because (a) it became the go-to
> place for NIC stats and (b) some drivers forget to report in the
> standard place.
> 
> The cover letter says this set replaces the bad debugfs with a good,
> standard API. It may look good and standard for _vendors_ because they
> will know where to dump their counters, but it makes very little
> difference for _users_. If I have to parse names for every vendor I use,
> I can as well add a per-vendor debugfs path to my script.
> 
> The bar for implementation-specific driver stats has to be high.

My take away from this is you do not like the names - the strings side
of it.

Do you object to the netlink API? The netlink API via devlink?

'perf' has json files to describe and document counters
(tools/perf/pmu-events). Would something like that be acceptable as a
form of in-tree documentation of counters? (vs Documentation/networking
or URLs like
https://community.mellanox.com/s/article/understanding-mlx5-ethtool-counters)

> 
>>>>> If I have to download vendor documentation and tooling, or adapt my own
>>>>> scripts for every new vendor, I could have as well downloaded an SDK.    
>>>>
>>>> Are not you being a bit over dramatic here with your example?   
>>>
>>> I hope not. It's very hard/impossible today to run a fleet of Linux
>>> machines without resorting to vendor tooling.  
>>
>> Your argument was putting on the same level resorting to vendor tooling
>> to extract meaningful statistics/counters versus using a SDK to operate
>> the hardware (this is how I understood it), and I do not believe this is
>> fair.
> 
> Okay, fair. I just think that in datacenter deployments we are way
> closer to the SDK model than people may want to admit.
> 

I do not agree with that; the SDK model means you *must* use vendor code
to make something work. Your argument here is about labels for stats and
an understanding of their meaning.


* Re: [RFC PATCH net-next 0/6] devlink: Add device metric support
  2020-08-20 14:35               ` David Ahern
@ 2020-08-20 16:09                 ` Jakub Kicinski
  2020-08-21 10:30                   ` Ido Schimmel
  0 siblings, 1 reply; 29+ messages in thread
From: Jakub Kicinski @ 2020-08-20 16:09 UTC (permalink / raw)
  To: David Ahern
  Cc: Florian Fainelli, Ido Schimmel, netdev, davem, jiri, amcohen,
	danieller, mlxsw, roopa, andrew, vivien.didelot, tariqt, ayal,
	mkubecek, Ido Schimmel

On Thu, 20 Aug 2020 08:35:25 -0600 David Ahern wrote:
> On 8/19/20 12:07 PM, Jakub Kicinski wrote:
> > I don't have a great way forward in mind, sadly. All I can think of is
> > that we should try to create more well defined interfaces and steer
> > away from free-form ones.  
> 
> There is a lot of value in free-form too.

On Tue, 18 Aug 2020 20:35:01 -0700 Jakub Kicinski wrote:
> It's a question of interface, not the value of exposed data.

> > Example, here if the stats are vxlan decap/encap/error - we should
> > expose that from the vxlan module. That way vxlan module defines one
> > set of stats for everyone.
> > 
> > In general unless we attach stats to the object they relate to, we will
> > end up building parallel structures for exposing statistics from the
> > drivers. I posted a set once which was implementing hierarchical stats,
> > but I've abandoned it for this reason.
> > > [...]
> > 
> > IDK. I just don't feel like this is going to fly, see how many names
> > people invented for the CRC error statistic in ethtool -S, even tho
> > there is a standard stat for that! And users are actually parsing the
> > output of ethtool -S to get CRC stats because (a) it became the go-to
> > place for NIC stats and (b) some drivers forget to report in the
> > standard place.
> > 
> > The cover letter says this set replaces the bad debugfs with a good,
> > standard API. It may look good and standard for _vendors_ because they
> > will know where to dump their counters, but it makes very little
> > difference for _users_. If I have to parse names for every vendor I use,
> > I can as well add a per-vendor debugfs path to my script.
> > 
> > The bar for implementation-specific driver stats has to be high.  
> 
> My take away from this is you do not like the names - the strings side
> of it.
> 
> Do you object to the netlink API? The netlink API via devlink?
> 
> 'perf' has json files to describe and document counters
> (tools/perf/pmu-events). Would something like that be acceptable as a
> form of in-tree documentation of counters? (vs Documentation/networking
> or URLs like
> https://community.mellanox.com/s/article/understanding-mlx5-ethtool-counters)

Please refer to what I said twice now about the definition of the stats
exposed here belonging with the VxLAN code, not the driver.

> > Okay, fair. I just think that in datacenter deployments we are way
> > closer to the SDK model than people may want to admit.
> 
> I do not agree with that; the SDK model means you *must* use vendor code
> to make something work. Your argument here is about labels for stats and
> an understanding of their meaning.

Sure, no "must" for passing packets, but you "must" use vendor tooling
to operate a fleet.

Since everybody already has vendor tools what value does this API add?
I still need per vendor logic. Let's try to build APIs which will
actually make user's life easier, which users will want to switch to.


* Re: [RFC PATCH net-next 0/6] devlink: Add device metric support
  2020-08-20 16:09                 ` Jakub Kicinski
@ 2020-08-21 10:30                   ` Ido Schimmel
  2020-08-21 16:53                     ` Jakub Kicinski
  0 siblings, 1 reply; 29+ messages in thread
From: Ido Schimmel @ 2020-08-21 10:30 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: David Ahern, Florian Fainelli, netdev, davem, jiri, amcohen,
	danieller, mlxsw, roopa, andrew, vivien.didelot, tariqt, ayal,
	mkubecek, Ido Schimmel

On Thu, Aug 20, 2020 at 09:09:42AM -0700, Jakub Kicinski wrote:
> On Thu, 20 Aug 2020 08:35:25 -0600 David Ahern wrote:
> > On 8/19/20 12:07 PM, Jakub Kicinski wrote:
> > > I don't have a great way forward in mind, sadly. All I can think of is
> > > that we should try to create more well defined interfaces and steer
> > > away from free-form ones.  
> > 
> > There is a lot of value in free-form too.
> 
> On Tue, 18 Aug 2020 20:35:01 -0700 Jakub Kicinski wrote:
> > It's a question of interface, not the value of exposed data.
> 
> > > Example, here if the stats are vxlan decap/encap/error - we should
> > > expose that from the vxlan module. That way vxlan module defines one
> > > set of stats for everyone.
> > > 
> > > In general unless we attach stats to the object they relate to, we will
> > > end up building parallel structures for exposing statistics from the
> > > drivers. I posted a set once which was implementing hierarchical stats,
> > > but I've abandoned it for this reason.
> > > > [...]
> > > 
> > > IDK. I just don't feel like this is going to fly, see how many names
> > > people invented for the CRC error statistic in ethtool -S, even tho
> > > there is a standard stat for that! And users are actually parsing the
> > > output of ethtool -S to get CRC stats because (a) it became the go-to
> > > place for NIC stats and (b) some drivers forget to report in the
> > > standard place.
> > > 
> > > The cover letter says this set replaces the bad debugfs with a good,
> > > standard API. It may look good and standard for _vendors_ because they
> > > will know where to dump their counters, but it makes very little
> > > difference for _users_. If I have to parse names for every vendor I use,
> > > I can as well add a per-vendor debugfs path to my script.
> > > 
> > > The bar for implementation-specific driver stats has to be high.  
> > 
> > My take away from this is you do not like the names - the strings side
> > of it.
> > 
> > Do you object to the netlink API? The netlink API via devlink?
> > 
> > 'perf' has json files to describe and document counters
> > (tools/perf/pmu-events). Would something like that be acceptable as a
> > form of in-tree documentation of counters? (vs Documentation/networking
> > or URLs like
> > https://community.mellanox.com/s/article/understanding-mlx5-ethtool-counters)
> 
> Please refer to what I said twice now about the definition of the stats
> exposed here belonging with the VxLAN code, not the driver.

Please refer to the changelog:

"The Spectrum ASICs have a single hardware VTEP that is able to perform
VXLAN encapsulation and decapsulation. The VTEP is logically mapped by
mlxsw to the multiple VXLAN netdevs that are using it. Exposing the
counters of this VTEP via the multiple VXLAN netdevs that are using it
would be both inaccurate and confusing for users.
    
Instead, expose the counters of the VTEP via devlink-metric. Note that
Spectrum-1 supports a different set of counters compared to newer ASICs
in the Spectrum family."

Hardware implementations will rarely fit 1:1 to the nice and discrete
software implementations that they try to accelerate. The purpose of
this API is exposing metrics specific to these hardware implementations.
This results in better visibility which can be leveraged for faster
debugging and more thorough testing.

The reason I came up with this interface is not the specific VXLAN
metrics that bother you, but a new platform we are working on. It uses
the ASIC as a cache that refers lookups to an external device in case of
cache misses. It is completely transparent to user space (you get better
scale), but the driver is very much aware of this stuff as it needs to
insert objects (e.g., routes) in a way that will minimize cache misses.
Just checking that ping works is hardly enough. We must be able to read
the cache counters to ensure we do not see cache misses when we do not
expect them.
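
With the interface proposed here that is just one more counter
registration in the driver, e.g. (sketch only, the metric name and the
ops name are made up for illustration):

	metrics->counter_cache_misses =
		devlink_metric_counter_create(devlink, "route_cache_misses",
					      &mlxsw_sp_route_cache_misses_ops,
					      mlxsw_sp);
	if (IS_ERR(metrics->counter_cache_misses))
		return PTR_ERR(metrics->counter_cache_misses);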

As another example, consider the algorithmic TCAM implementation we have
in Spectrum-2 for ACLs [1]. While a user simply adds / deletes filters,
the driver needs to jump through multiple hoops in order to program them
in a way that will result in a better scale and reduced latency. We
currently do not have an interface through which we can expose metrics
related to this specific implementation.

[1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=756cd36626f773e9a72a39c1dd12da4deacfacdf

> 
> > > Okay, fair. I just think that in datacenter deployments we are way
> > > closer to the SDK model than people may want to admit.
> > 
> > I do not agree with that; the SDK model means you *must* use vendor code
> > to make something work. Your argument here is about labels for stats and
> > an understanding of their meaning.
> 
> Sure, no "must" for passing packets, but you "must" use vendor tooling
> to operate a fleet.
> 
> Since everybody already has vendor tools what value does this API add?

We don't have any "vendor tools" to get this information. Our team is
doing everything it possibly can in order to move away from such an
approach.

> I still need per vendor logic. Let's try to build APIs which will
> actually make user's life easier, which users will want to switch to.

Developers are also users and they should be able to read whatever
information they need from the device in order to help them do their
work. You have a multitude of tools (e.g., kprobes, tracepoints) to get
better visibility into the software data path. Commonality is not a
reason to be blind as a bat when looking into the hardware data path.


* Re: [RFC PATCH net-next 0/6] devlink: Add device metric support
  2020-08-21 10:30                   ` Ido Schimmel
@ 2020-08-21 16:53                     ` Jakub Kicinski
  2020-08-21 19:12                       ` David Ahern
  0 siblings, 1 reply; 29+ messages in thread
From: Jakub Kicinski @ 2020-08-21 16:53 UTC (permalink / raw)
  To: Ido Schimmel
  Cc: David Ahern, Florian Fainelli, netdev, davem, jiri, amcohen,
	danieller, mlxsw, roopa, andrew, vivien.didelot, tariqt, ayal,
	mkubecek, Ido Schimmel

On Fri, 21 Aug 2020 13:30:21 +0300 Ido Schimmel wrote:
> On Thu, Aug 20, 2020 at 09:09:42AM -0700, Jakub Kicinski wrote:
> > On Thu, 20 Aug 2020 08:35:25 -0600 David Ahern wrote:  
> > > On 8/19/20 12:07 PM, Jakub Kicinski wrote:  
> > > > I don't have a great way forward in mind, sadly. All I can think of is
> > > > that we should try to create more well defined interfaces and steer
> > > > away from free-form ones.    
> > > 
> > > There is a lot of value in free-form too.  
> > 
> > On Tue, 18 Aug 2020 20:35:01 -0700 Jakub Kicinski wrote:  
> > > It's a question of interface, not the value of exposed data.  
> >   
> > > > Example, here if the stats are vxlan decap/encap/error - we should
> > > > expose that from the vxlan module. That way vxlan module defines one
> > > > set of stats for everyone.
> > > > 
> > > > In general unless we attach stats to the object they relate to, we will
> > > > end up building parallel structures for exposing statistics from the
> > > > drivers. I posted a set once which was implementing hierarchical stats,
> > > > but I've abandoned it for this reason.  
> > > > > [...]  
> > > > 
> > > > IDK. I just don't feel like this is going to fly, see how many names
> > > > people invented for the CRC error statistic in ethtool -S, even tho
> > > > there is a standard stat for that! And users are actually parsing the
> > > > output of ethtool -S to get CRC stats because (a) it became the go-to
> > > > place for NIC stats and (b) some drivers forget to report in the
> > > > standard place.
> > > > 
> > > > The cover letter says this set replaces the bad debugfs with a good,
> > > > standard API. It may look good and standard for _vendors_ because they
> > > > will know where to dump their counters, but it makes very little
> > > > difference for _users_. If I have to parse names for every vendor I use,
> > > > I can as well add a per-vendor debugfs path to my script.
> > > > 
> > > > The bar for implementation-specific driver stats has to be high.    
> > > 
> > > My take away from this is you do not like the names - the strings side
> > > of it.
> > > 
> > > Do you object to the netlink API? The netlink API via devlink?
> > > 
> > > 'perf' has json files to describe and document counters
> > > (tools/perf/pmu-events). Would something like that be acceptable as a
> > > form of in-tree documentation of counters? (vs Documentation/networking
> > > or URLs like
> > > https://community.mellanox.com/s/article/understanding-mlx5-ethtool-counters)  
> > 
> > Please refer to what I said twice now about the definition of the stats
> > exposed here belonging with the VxLAN code, not the driver.  
> 
> Please refer to the changelog:
> 
> "The Spectrum ASICs have a single hardware VTEP that is able to perform
> VXLAN encapsulation and decapsulation. The VTEP is logically mapped by
> mlxsw to the multiple VXLAN netdevs that are using it. Exposing the
> counters of this VTEP via the multiple VXLAN netdevs that are using it
> would be both inaccurate and confusing for users.
>     
> Instead, expose the counters of the VTEP via devlink-metric. Note that
> Spectrum-1 supports a different set of counters compared to newer ASICs
> in the Spectrum family."

I read your cover letter; I didn't say the metrics have to be reported
on particular netdevs.

> Hardware implementations will rarely fit 1:1 to the nice and discrete
> software implementations that they try to accelerate. The purpose of
> this API is exposing metrics specific to these hardware implementations.
> This results in better visibility which can be leveraged for faster
> debugging and more thorough testing.
> 
> The reason I came up with this interface is not the specific VXLAN
> metrics that bother you, but a new platform we are working on. It uses
> the ASIC as a cache that refers lookups to an external device in case of
> cache misses. It is completely transparent to user space (you get better
> scale), but the driver is very much aware of this stuff as it needs to
> insert objects (e.g., routes) in a way that will minimize cache misses.
> Just checking that ping works is hardly enough. We must be able to read
> the cache counters to ensure we do not see cache misses when we do not
> expect them.
> 
> As another example, consider the algorithmic TCAM implementation we have
> in Spectrum-2 for ACLs [1]. While a user simply adds / deletes filters,
> the driver needs to jump through multiple hoops in order to program them
> in a way that will result in a better scale and reduced latency. We
> currently do not have an interface through which we can expose metrics
> related to this specific implementation.
> 
> [1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=756cd36626f773e9a72a39c1dd12da4deacfacdf
> 
> >   
> > > > Okay, fair. I just think that in datacenter deployments we are way
> > > > closer to the SDK model than people may want to admit.  
> > > 
> > > I do not agree with that; the SDK model means you *must* use vendor code
> > > to make something work. Your argument here is about labels for stats and
> > > an understanding of their meaning.  
> > 
> > Sure, no "must" for passing packets, but you "must" use vendor tooling
> > to operate a fleet.
> > 
> > Since everybody already has vendor tools what value does this API add?  
> 
> We don't have any "vendor tools" to get this information. Our team is
> doing everything it possibly can in order to move away from such an
> approach.

For clarity I wasn't speaking about your switches, sadly I have no
experience with those.

> > I still need per vendor logic. Let's try to build APIs which will
> > actually make user's life easier, which users will want to switch to.  
> 
> Developers are also users and they should be able to read whatever
> information they need from the device in order to help them do their
> work. You have a multitude of tools (e.g., kprobes, tracepoints) to get
> better visibility into the software data path. Commonality is not a
> reason to be blind as a bat when looking into the hardware data path.

How many times do I have to say that I'm not arguing against the value
of the data? 

If you open up this interface either someone will police it, or it will
become a dumpster.


* Re: [RFC PATCH net-next 0/6] devlink: Add device metric support
  2020-08-21 16:53                     ` Jakub Kicinski
@ 2020-08-21 19:12                       ` David Ahern
  2020-08-21 23:50                         ` Jakub Kicinski
  0 siblings, 1 reply; 29+ messages in thread
From: David Ahern @ 2020-08-21 19:12 UTC (permalink / raw)
  To: Jakub Kicinski, Ido Schimmel
  Cc: Florian Fainelli, netdev, davem, jiri, amcohen, danieller, mlxsw,
	roopa, andrew, vivien.didelot, tariqt, ayal, mkubecek,
	Ido Schimmel

On 8/21/20 10:53 AM, Jakub Kicinski wrote:
> How many times do I have to say that I'm not arguing against the value
> of the data? 
> 
> If you open up this interface either someone will police it, or it will
> become a dumpster.

I am not following what you are proposing as a solution. You do not like
Ido's idea of stats going through devlink, but you are not being clear
on what you think is a better way.

You say vxlan stats belong in the vxlan driver, but the stats do not
have to be reported on particular netdevs. How then do h/w stats get
exposed via vxlan code?


* Re: [RFC PATCH net-next 0/6] devlink: Add device metric support
  2020-08-21 19:12                       ` David Ahern
@ 2020-08-21 23:50                         ` Jakub Kicinski
  2020-08-21 23:59                           ` David Ahern
  0 siblings, 1 reply; 29+ messages in thread
From: Jakub Kicinski @ 2020-08-21 23:50 UTC (permalink / raw)
  To: David Ahern
  Cc: Ido Schimmel, Florian Fainelli, netdev, davem, jiri, amcohen,
	danieller, mlxsw, roopa, andrew, vivien.didelot, tariqt, ayal,
	mkubecek, Ido Schimmel

On Fri, 21 Aug 2020 13:12:59 -0600 David Ahern wrote:
> On 8/21/20 10:53 AM, Jakub Kicinski wrote:
> > How many times do I have to say that I'm not arguing against the value
> > of the data? 
> > 
> > If you open up this interface either someone will police it, or it will
> > become a dumpster.  
> 
> I am not following what you are proposing as a solution. You do not like
> Ido's idea of stats going through devlink, but you are not being clear
> on what you think is a better way.
> 
> You say vxlan stats belong in the vxlan driver, but the stats do not
> have to be reported on particular netdevs. How then do h/w stats get
> exposed via vxlan code?

No strong preference, for TLS I've done:

# cat /proc/net/tls_stat 
TlsCurrTxSw                     	0
TlsCurrRxSw                     	0
TlsCurrTxDevice                 	0
TlsCurrRxDevice                 	0
TlsTxSw                         	0
TlsRxSw                         	0
TlsTxDevice                     	0
TlsRxDevice                     	0
TlsDecryptError                 	0
TlsRxDeviceResync               	0

We can add something over netlink, I opted for simplicity since global
stats don't have to scale with number of interfaces. 


* Re: [RFC PATCH net-next 0/6] devlink: Add device metric support
  2020-08-21 23:50                         ` Jakub Kicinski
@ 2020-08-21 23:59                           ` David Ahern
  2020-08-22  0:37                             ` Jakub Kicinski
  0 siblings, 1 reply; 29+ messages in thread
From: David Ahern @ 2020-08-21 23:59 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Ido Schimmel, Florian Fainelli, netdev, davem, jiri, amcohen,
	danieller, mlxsw, roopa, andrew, vivien.didelot, tariqt, ayal,
	mkubecek, Ido Schimmel

On 8/21/20 5:50 PM, Jakub Kicinski wrote:
> On Fri, 21 Aug 2020 13:12:59 -0600 David Ahern wrote:
>> On 8/21/20 10:53 AM, Jakub Kicinski wrote:
>>> How many times do I have to say that I'm not arguing against the value
>>> of the data? 
>>>
>>> If you open up this interface either someone will police it, or it will
>>> become a dumpster.  
>>
>> I am not following what you are proposing as a solution. You do not like
>> Ido's idea of stats going through devlink, but you are not being clear
>> on what you think is a better way.
>>
>> You say vxlan stats belong in the vxlan driver, but the stats do not
>> have to be reported on particular netdevs. How then do h/w stats get
>> exposed via vxlan code?
> 
> No strong preference, for TLS I've done:

But you clearly *do* have a strong preference.

> 
> # cat /proc/net/tls_stat 

I do not agree with adding files under /proc/net for this.

> TlsCurrTxSw                     	0
> TlsCurrRxSw                     	0
> TlsCurrTxDevice                 	0
> TlsCurrRxDevice                 	0
> TlsTxSw                         	0
> TlsRxSw                         	0
> TlsTxDevice                     	0
> TlsRxDevice                     	0
> TlsDecryptError                 	0
> TlsRxDeviceResync               	0
> 
> We can add something over netlink, I opted for simplicity since global
> stats don't have to scale with number of interfaces. 
> 

IMHO, netlink is the right "channel" to move data from kernel to
userspace, and opting in to *specific* stats is a must have feature.

I think devlink is the right framework given that the stats are device
based but not specific to any particular netdev instance. Further, this
allows hardware stats to be distinguished from software stats; if they
were tied to vxlan as a protocol and somehow pulled from the vxlan driver,
those would be combined into one (at least that is how my mind is thinking of this).

####

Let's say the direction is for these specific stats (as opposed to the
general problem that Ido and others are considering) to be pulled from
the vxlan driver. How does that driver get access to hardware stats?
vxlan is a protocol and not tied to devices. How should the connection
be made?


* Re: [RFC PATCH net-next 0/6] devlink: Add device metric support
  2020-08-21 23:59                           ` David Ahern
@ 2020-08-22  0:37                             ` Jakub Kicinski
  2020-08-22  1:18                               ` David Ahern
  0 siblings, 1 reply; 29+ messages in thread
From: Jakub Kicinski @ 2020-08-22  0:37 UTC (permalink / raw)
  To: David Ahern
  Cc: Ido Schimmel, Florian Fainelli, netdev, davem, jiri, amcohen,
	danieller, mlxsw, roopa, andrew, vivien.didelot, tariqt, ayal,
	mkubecek, Ido Schimmel

On Fri, 21 Aug 2020 17:59:57 -0600 David Ahern wrote:
> On 8/21/20 5:50 PM, Jakub Kicinski wrote:
> > On Fri, 21 Aug 2020 13:12:59 -0600 David Ahern wrote:  
> >> I am not following what you are proposing as a solution. You do not like
> >> Ido's idea of stats going through devlink, but you are not being clear
> >> on what you think is a better way.
> >>
> >> You say vxlan stats belong in the vxlan driver, but the stats do not
> >> have to be reported on particular netdevs. How then do h/w stats get
> >> exposed via vxlan code?  
> > 
> > No strong preference, for TLS I've done:  
> 
> But you clearly *do* have a strong preference.

I'm answering your question.

The question is "How then do h/w stats get exposed via vxlan code?"

Please note that the question includes "via vxlan code".

So no, I have no preference as long as it's "via vxlan code", and not
directly from the driver with a vendor-invented name.

> > # cat /proc/net/tls_stat   
> 
> I do not agree with adding files under /proc/net for this.

Yeah it's not the best, with higher LoC a better solution should be
within reach.

> > TlsCurrTxSw                     	0
> > TlsCurrRxSw                     	0
> > TlsCurrTxDevice                 	0
> > TlsCurrRxDevice                 	0
> > TlsTxSw                         	0
> > TlsRxSw                         	0
> > TlsTxDevice                     	0
> > TlsRxDevice                     	0
> > TlsDecryptError                 	0
> > TlsRxDeviceResync               	0
> > 
> > We can add something over netlink, I opted for simplicity since global
> > stats don't have to scale with number of interfaces. 
> 
> IMHO, netlink is the right "channel" to move data from kernel to
> userspace, and opting in to *specific* stats is a must-have feature.
> 
> I think devlink is the right framework given that the stats are device
> based but not specific to any particular netdev instance. 

I'd be careful with the "not specific to any particular netdev
instance". A perfect API would be flexible when it comes to scoping :)

> Further, this
> allows hardware stats to be kept distinct from software stats; if they
> were tied to vxlan as a protocol and somehow pulled from the vxlan
> driver, the two would be combined into one (at least that is how my
> mind is thinking of this).

Right, for tls the stats which have "Device" in the name are hardware.
But netlink will have better ways of separating the two.
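For instance (attribute names invented here just to show the split;
this is not an existing uAPI), a dump could carry the software and
device counters in separate nests:

/* Hypothetical netlink attribute layout; not an existing uAPI. */
enum {
        TLS_STATS_ATTR_UNSPEC,
        TLS_STATS_ATTR_SW,      /* nest: counters maintained by the stack */
        TLS_STATS_ATTR_DEVICE,  /* nest: counters read back from the device */
        __TLS_STATS_ATTR_MAX,
};
#define TLS_STATS_ATTR_MAX (__TLS_STATS_ATTR_MAX - 1)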

> ####
> 
> Let's say the direction is for these specific stats (as opposed to the
> general problem that Ido and others are considering) to be pulled from
> the vxlan driver. How does that driver get access to hardware stats?
> vxlan is a protocol and not tied to devices. How should the connection
> be made?

Drivers which offload VxLAN already have a dependency on it, right?
They can just register with it and get queried on dump. Or if we want
scoping we can piggyback on whatever object the stats are scoped to.
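Something roughly like the below is what I mean (completely made up,
the vxlan module has no such hook today):

#include <linux/list.h>
#include <linux/types.h>

/* Completely made up: a registration hook in the vxlan module through
 * which an offloading driver reports its hardware counters, so that
 * the vxlan code itself can fold them into whatever stats it exposes. */
struct vxlan_hw_stats {
        u64 rx_decap_packets;
        u64 rx_decap_errors;
        u64 tx_encap_packets;
};

struct vxlan_hw_stats_cb {
        struct list_head list;  /* linked into a list owned by vxlan */
        void (*get_stats)(void *priv, struct vxlan_hw_stats *stats);
        void *priv;
};

int vxlan_register_hw_stats_cb(struct vxlan_hw_stats_cb *cb);
void vxlan_unregister_hw_stats_cb(struct vxlan_hw_stats_cb *cb);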

*If* we scope on HW objects do we need to worry about some user some
day wanting to have stats per vxlan netdev and per HW instance?

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [RFC PATCH net-next 0/6] devlink: Add device metric support
  2020-08-22  0:37                             ` Jakub Kicinski
@ 2020-08-22  1:18                               ` David Ahern
  2020-08-22 16:27                                 ` Jakub Kicinski
  0 siblings, 1 reply; 29+ messages in thread
From: David Ahern @ 2020-08-22  1:18 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: Ido Schimmel, Florian Fainelli, netdev, davem, jiri, amcohen,
	danieller, mlxsw, roopa, andrew, vivien.didelot, tariqt, ayal,
	mkubecek, Ido Schimmel

On 8/21/20 6:37 PM, Jakub Kicinski wrote:
>>> # cat /proc/net/tls_stat   
>>
>> I do not agree with adding files under /proc/net for this.
> 
> Yeah it's not the best, with higher LoC a better solution should be
> within reach.

The duplicity here is mind-boggling. TLS stats from hardware are on par
with Ido's *example* of vxlan stats from an ASIC. You agree that
/proc/net files are wrong, but you did it anyway and now you want the
next person to solve the problem you did not want to tackle but have
strong opinions on.

Ido has a history of thinking through problems and solutions in a proper
Linux Way. netlink is the right API, and devlink was created for
'device' stuff versus 'netdev' stuff. Hence, I agree with this
*framework* for extracting asic stats.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [RFC PATCH net-next 0/6] devlink: Add device metric support
  2020-08-22  1:18                               ` David Ahern
@ 2020-08-22 16:27                                 ` Jakub Kicinski
  2020-08-23  7:04                                   ` Ido Schimmel
  0 siblings, 1 reply; 29+ messages in thread
From: Jakub Kicinski @ 2020-08-22 16:27 UTC (permalink / raw)
  To: David Ahern
  Cc: Ido Schimmel, Florian Fainelli, netdev, davem, jiri, amcohen,
	danieller, mlxsw, roopa, andrew, vivien.didelot, tariqt, ayal,
	mkubecek, Ido Schimmel

On Fri, 21 Aug 2020 19:18:37 -0600 David Ahern wrote:
> On 8/21/20 6:37 PM, Jakub Kicinski wrote:
> >>> # cat /proc/net/tls_stat     
> >>
> >> I do not agree with adding files under /proc/net for this.  
> > 
> > Yeah it's not the best, with higher LoC a better solution should be
> > within reach.  
> 
> The duplicity here is mind-boggling. TLS stats from hardware are on par
> with Ido's *example* of vxlan stats from an ASIC. You agree that
> /proc/net files are wrong,

I didn't say /proc/net was wrong, I'm just trying to be agreeable.
Maybe I need to improve my command of the English language.

AFAIK /proc/net is where protocol stats are.

> but you did it anyway and now you want the
> next person to solve the problem you did not want to tackle but have
> strong opinions on.

I have no need or interest in vxlan stats.

> Ido has a history of thinking through problems and solutions in a proper
> Linux Way. netlink is the right API, and devlink was created for
> 'device' stuff versus 'netdev' stuff. Hence, I agree with this
> *framework* for extracting asic stats.

You seem to focus on less relevant points. I primarily care about the
statistics being defined and identified by Linux, not every vendor for
themselves.

No question about Ido's ability and contributions, but then again 
(from the cover letter):

> This a joint work [...] during a two-day company hackathon.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [RFC PATCH net-next 0/6] devlink: Add device metric support
  2020-08-22 16:27                                 ` Jakub Kicinski
@ 2020-08-23  7:04                                   ` Ido Schimmel
  2020-08-24 19:11                                     ` Jakub Kicinski
  0 siblings, 1 reply; 29+ messages in thread
From: Ido Schimmel @ 2020-08-23  7:04 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: David Ahern, Florian Fainelli, netdev, davem, jiri, amcohen,
	danieller, mlxsw, roopa, andrew, vivien.didelot, tariqt, ayal,
	mkubecek, Ido Schimmel

On Sat, Aug 22, 2020 at 09:27:39AM -0700, Jakub Kicinski wrote:
> On Fri, 21 Aug 2020 19:18:37 -0600 David Ahern wrote:
> > On 8/21/20 6:37 PM, Jakub Kicinski wrote:
> > >>> # cat /proc/net/tls_stat     
> > >>
> > >> I do not agree with adding files under /proc/net for this.  
> > > 
> > > Yeah it's not the best, with higher LoC a better solution should be
> > > within reach.  
> > 
> > The duplicity here is mind-boggling. TLS stats from hardware are on par
> > with Ido's *example* of vxlan stats from an ASIC. You agree that
> > /proc/net files are wrong,
> 
> I didn't say /proc/net was wrong, I'm just trying to be agreeable.
> Maybe I need to improve my command of the English language.
> 
> AFAIK /proc/net is where protocol stats are.
> 
> > but you did it anyway and now you want the
> > next person to solve the problem you did not want to tackle but have
> > strong opinions on.
> 
> I have no need or interest in vxlan stats.
> 
> > Ido has a history of thinking through problems and solutions in a proper
> > Linux Way. netlink is the right API, and devlink was created for
> > 'device' stuff versus 'netdev' stuff. Hence, I agree with this
> > *framework* for extracting asic stats.
> 
> You seem to focus on less relevant points. I primarily care about the
> statistics being defined and identified by Linux, not every vendor for
> themselves.

I'm trying to understand how we can move this forward. The issue is with
the specific VXLAN metrics, but do you generally agree with the need for
the framework? See my two other examples: cache counters and algorithmic
TCAM counters.

> No question about Ido's ability and contributions, but then again 
> (from the cover letter):
> 
> > This a joint work [...] during a two-day company hackathon.

The *implementation* was done during the hackathon. The *design* was
done before. I sent the team three design proposals (CLI / netlink
API / device drivers API) before we landed on this. I started thinking
about this idea a few months ago, and the hackathon was simply a good
opportunity to implement and showcase it.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [RFC PATCH net-next 0/6] devlink: Add device metric support
  2020-08-23  7:04                                   ` Ido Schimmel
@ 2020-08-24 19:11                                     ` Jakub Kicinski
  0 siblings, 0 replies; 29+ messages in thread
From: Jakub Kicinski @ 2020-08-24 19:11 UTC (permalink / raw)
  To: Ido Schimmel
  Cc: David Ahern, Florian Fainelli, netdev, davem, jiri, amcohen,
	danieller, mlxsw, roopa, andrew, vivien.didelot, tariqt, ayal,
	mkubecek, Ido Schimmel

On Sun, 23 Aug 2020 10:04:34 +0300 Ido Schimmel wrote:
> > You seem to focus on less relevant points. I primarily care about the
> > statistics being defined and identified by Linux, not every vendor for
> > themselves.  
> 
> I'm trying to understand how we can move this forward. The issue is with
> the specific VXLAN metrics, but do you generally agree with the need for
> the framework? See my two other examples: cache counters and algorithmic
> TCAM counters.

Yes, we will likely need a way to report design-specific performance
counters no matter what. That said, I would prefer to pave the way for
exposing standardized stats first, so the reviewers (e.g. myself) have
a clear place to point folks to. 

My last attempt was to just try to standardize the strings for the
per-netdev TLS offload stats (those are in addition to the /proc stats),
and document them in Documentation/. It turned out to have quite a
high review overhead, and the convergence is not satisfactory.

The only strong use I have right now is FEC stats, and I'm planning to
add IEEE-based counters to devlink ports. The scoping of MAC/PHY
counters to dl-port is, I hope, reasonable, although it remains to be
seen what phy folks think about it.

As I previously said - I think that protocol stats are best exported
from the protocol driver, otherwise the API may need to grow parallel
hierarchies. E.g. the semantics of per-queue NIC counters get confusing
unless they are reported with the information about the queues - sadly
no API for that exists. In particular, the lifetime of objects is hard
to match with the lifetime of statistics. The same goes for
low-granularity counters related to traffic classification.

Long story short, it's a complicated topic, and IDK how much of it I can
expect you to tackle. At a minimum I'd like it if we had a clear
separation between Linux/standard stats that drivers should share,
and justifiably implementation-specific values.

Neither the DEVLINK_..GENERIC identifiers nor trying to standardize on
strings works for me as a reviewer, or as an infrastructure engineer.

^ permalink raw reply	[flat|nested] 29+ messages in thread

end of thread, other threads:[~2020-08-24 19:11 UTC | newest]

Thread overview: 29+ messages
2020-08-17 12:50 [RFC PATCH net-next 0/6] devlink: Add device metric support Ido Schimmel
2020-08-17 12:50 ` [RFC PATCH net-next 1/6] devlink: Add device metric infrastructure Ido Schimmel
2020-08-17 14:12   ` Andrew Lunn
2020-08-17 12:50 ` [RFC PATCH net-next 2/6] netdevsim: Add devlink metric support Ido Schimmel
2020-08-17 12:50 ` [RFC PATCH net-next 3/6] selftests: netdevsim: Add devlink metric tests Ido Schimmel
2020-08-17 12:50 ` [RFC PATCH net-next 4/6] mlxsw: reg: Add Tunneling NVE Counters Register Ido Schimmel
2020-08-17 12:50 ` [RFC PATCH net-next 5/6] mlxsw: reg: Add Tunneling NVE Counters Register Version 2 Ido Schimmel
2020-08-17 12:50 ` [RFC PATCH net-next 6/6] mlxsw: spectrum_nve: Expose VXLAN counters via devlink-metric Ido Schimmel
2020-08-17 14:29   ` Andrew Lunn
2020-08-18  6:59     ` Ido Schimmel
2020-08-19  0:24 ` [RFC PATCH net-next 0/6] devlink: Add device metric support Jakub Kicinski
2020-08-19  2:43   ` David Ahern
2020-08-19  3:35     ` Jakub Kicinski
2020-08-19  4:30       ` Florian Fainelli
2020-08-19 16:18         ` Jakub Kicinski
2020-08-19 17:20           ` Florian Fainelli
2020-08-19 18:07             ` Jakub Kicinski
2020-08-20 14:35               ` David Ahern
2020-08-20 16:09                 ` Jakub Kicinski
2020-08-21 10:30                   ` Ido Schimmel
2020-08-21 16:53                     ` Jakub Kicinski
2020-08-21 19:12                       ` David Ahern
2020-08-21 23:50                         ` Jakub Kicinski
2020-08-21 23:59                           ` David Ahern
2020-08-22  0:37                             ` Jakub Kicinski
2020-08-22  1:18                               ` David Ahern
2020-08-22 16:27                                 ` Jakub Kicinski
2020-08-23  7:04                                   ` Ido Schimmel
2020-08-24 19:11                                     ` Jakub Kicinski
