All of lore.kernel.org
 help / color / mirror / Atom feed
* [RFC 00/14] netlink/hierarchical stats
@ 2019-01-28 23:44 Jakub Kicinski
  2019-01-28 23:44 ` [RFC 01/14] nfp: remove unused structure Jakub Kicinski
                   ` (15 more replies)
  0 siblings, 16 replies; 27+ messages in thread
From: Jakub Kicinski @ 2019-01-28 23:44 UTC (permalink / raw)
  To: davem
  Cc: oss-drivers, netdev, jiri, f.fainelli, andrew, mkubecek, dsahern,
	simon.horman, jesse.brandeburg, maciejromanfijalkowski,
	vasundhara-v.volam, michael.chan, shalomt, idosch,
	Jakub Kicinski

Hi!

As I tried to explain in my slides at netconf 2018 we are lacking
an expressive, standard API to report device statistics.

Networking silicon generally maintains some IEEE 802.3 and/or RMON
statistics.  Today those all end up in ethtool -S.  Here is a simple
attempt (admittedly very imprecise) of counting how many names driver
authors invented for IETF RFC2819 etherStatsPkts512to1023Octets
statistics (RX and TX):

$ git grep '".*512.*1023.*"' -- drivers/net/ | \
    sed -e 's/.*"\(.*\)".*/\1/' | sort | uniq | wc -l
63

Interestingly only two drivers in the tree use the name the standard
gave us (etherStatsPkts512to1023, modulo case).

I set out to working on this set in an attempt to give drivers a way
to express clearly to user space standard-compliant counters.

Second most common use for custom statistics is per-queue counters.
This is where the "hierarchical" part of this set comes in, as
groups can be nested, and user space tools can handle the aggregation
inside the groups if needed.

This set also tries to address the problem of users not knowing if
a statistic is reported by hardware or the driver.  Many modern drivers
use some prefix in ethtool -S to indicate MAC/PHY stats.  At a quick
glance: Netronome uses "mac.", Intel "port." and Mellanox "_phy".
In this set, netlink attributes describe whether a group of statistics
is RX or TX, maintained by device or driver.

The purpose of this patch set is _not_ to replace ethtool -S.  It is
an incredibly useful tool, and we will certainly continue using it.
However, for standard-based and commonly maintained statistics a more
structured API seems warranted.

There are two things missing from these patches, which I initially
planned to address as well: filtering, and refresh rate control.

Filtering doesn't need much explanation, users should be able to request
only a subset of statistics (like only SW stats or only given ID).  The
bitmap of statistics in each group is there for filtering later on.

By refresh control I mean the ability for user space to indicate how
"fresh" values it expects.  Sometimes reading the HW counters requires
slow register reads or FW communication, in such cases drivers may cache
the result.  (Privileged) user space should be able to add a "not older
than" timestamp to indicate how fresh statistics it expects.  And vice
versa, drivers can then also put the timestamp of when the statistics
were last refreshed in the dump for more precise bandwidth estimation.

Jakub Kicinski (14):
  nfp: remove unused structure
  nfp: constify parameter to nfp_port_from_netdev()
  net: hstats: add basic/core functionality
  net: hstats: allow hierarchies to be built
  nfp: very basic hstat support
  net: hstats: allow iterators
  net: hstats: help in iteration over directions
  nfp: hstats: make use of iteration for direction
  nfp: hstats: add driver and device per queue statistics
  net: hstats: add IEEE 802.3 and common IETF MIB/RMON stats
  nfp: hstats: add IEEE/RMON ethernet port/MAC stats
  net: hstats: add markers for partial groups
  nfp: hstats: add a partial group of per-8021Q prio stats
  Documentation: networking: describe new hstat API

 Documentation/networking/hstats.rst           | 590 +++++++++++++++
 .../networking/hstats_flow_example.dot        |  11 +
 Documentation/networking/index.rst            |   1 +
 drivers/net/ethernet/netronome/nfp/Makefile   |   1 +
 .../net/ethernet/netronome/nfp/nfp_hstat.c    | 474 ++++++++++++
 drivers/net/ethernet/netronome/nfp/nfp_main.c |   1 +
 drivers/net/ethernet/netronome/nfp/nfp_main.h |   2 +
 drivers/net/ethernet/netronome/nfp/nfp_net.h  |  10 +-
 .../ethernet/netronome/nfp/nfp_net_common.c   |   1 +
 .../net/ethernet/netronome/nfp/nfp_net_repr.h |   2 +-
 drivers/net/ethernet/netronome/nfp/nfp_port.c |   2 +-
 drivers/net/ethernet/netronome/nfp/nfp_port.h |   2 +-
 include/linux/netdevice.h                     |   9 +
 include/net/hstats.h                          | 176 +++++
 include/uapi/linux/if_link.h                  | 107 +++
 net/core/Makefile                             |   2 +-
 net/core/hstats.c                             | 682 ++++++++++++++++++
 net/core/rtnetlink.c                          |  21 +
 18 files changed, 2084 insertions(+), 10 deletions(-)
 create mode 100644 Documentation/networking/hstats.rst
 create mode 100644 Documentation/networking/hstats_flow_example.dot
 create mode 100644 drivers/net/ethernet/netronome/nfp/nfp_hstat.c
 create mode 100644 include/net/hstats.h
 create mode 100644 net/core/hstats.c

-- 
2.19.2


^ permalink raw reply	[flat|nested] 27+ messages in thread

* [RFC 01/14] nfp: remove unused structure
  2019-01-28 23:44 [RFC 00/14] netlink/hierarchical stats Jakub Kicinski
@ 2019-01-28 23:44 ` Jakub Kicinski
  2019-01-28 23:44 ` [RFC 02/14] nfp: constify parameter to nfp_port_from_netdev() Jakub Kicinski
                   ` (14 subsequent siblings)
  15 siblings, 0 replies; 27+ messages in thread
From: Jakub Kicinski @ 2019-01-28 23:44 UTC (permalink / raw)
  To: davem
  Cc: oss-drivers, netdev, jiri, f.fainelli, andrew, mkubecek, dsahern,
	simon.horman, jesse.brandeburg, maciejromanfijalkowski,
	vasundhara-v.volam, michael.chan, shalomt, idosch,
	Jakub Kicinski

Remove struct nfp_pair, it used to be used for TC BPF offload,
but now we only offload direct action mode which doesn't have
explicit stats.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
---
 drivers/net/ethernet/netronome/nfp/nfp_net.h | 5 -----
 1 file changed, 5 deletions(-)

diff --git a/drivers/net/ethernet/netronome/nfp/nfp_net.h b/drivers/net/ethernet/netronome/nfp/nfp_net.h
index be37c2d6151c..320ec3900a32 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_net.h
+++ b/drivers/net/ethernet/netronome/nfp/nfp_net.h
@@ -447,11 +447,6 @@ static inline bool nfp_net_fw_ver_eq(struct nfp_net_fw_version *fw_ver,
 	       fw_ver->minor == minor;
 }
 
-struct nfp_stat_pair {
-	u64 pkts;
-	u64 bytes;
-};
-
 /**
  * struct nfp_net_dp - NFP network device datapath data structure
  * @dev:		Backpointer to struct device
-- 
2.19.2


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [RFC 02/14] nfp: constify parameter to nfp_port_from_netdev()
  2019-01-28 23:44 [RFC 00/14] netlink/hierarchical stats Jakub Kicinski
  2019-01-28 23:44 ` [RFC 01/14] nfp: remove unused structure Jakub Kicinski
@ 2019-01-28 23:44 ` Jakub Kicinski
  2019-01-28 23:44 ` [RFC 03/14] net: hstats: add basic/core functionality Jakub Kicinski
                   ` (13 subsequent siblings)
  15 siblings, 0 replies; 27+ messages in thread
From: Jakub Kicinski @ 2019-01-28 23:44 UTC (permalink / raw)
  To: davem
  Cc: oss-drivers, netdev, jiri, f.fainelli, andrew, mkubecek, dsahern,
	simon.horman, jesse.brandeburg, maciejromanfijalkowski,
	vasundhara-v.volam, michael.chan, shalomt, idosch,
	Jakub Kicinski

Make nfp_port_from_netdev() take a const parameter, otherwise
it can't be used with upcoming stats code.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
---
 drivers/net/ethernet/netronome/nfp/nfp_net.h      | 2 +-
 drivers/net/ethernet/netronome/nfp/nfp_net_repr.h | 2 +-
 drivers/net/ethernet/netronome/nfp/nfp_port.c     | 2 +-
 drivers/net/ethernet/netronome/nfp/nfp_port.h     | 2 +-
 4 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/netronome/nfp/nfp_net.h b/drivers/net/ethernet/netronome/nfp/nfp_net.h
index 320ec3900a32..93de25b39bc1 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_net.h
+++ b/drivers/net/ethernet/netronome/nfp/nfp_net.h
@@ -839,7 +839,7 @@ extern const char nfp_driver_version[];
 
 extern const struct net_device_ops nfp_net_netdev_ops;
 
-static inline bool nfp_netdev_is_nfp_net(struct net_device *netdev)
+static inline bool nfp_netdev_is_nfp_net(const struct net_device *netdev)
 {
 	return netdev->netdev_ops == &nfp_net_netdev_ops;
 }
diff --git a/drivers/net/ethernet/netronome/nfp/nfp_net_repr.h b/drivers/net/ethernet/netronome/nfp/nfp_net_repr.h
index e0f13dfe1f39..7f2df74c2a7f 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_net_repr.h
+++ b/drivers/net/ethernet/netronome/nfp/nfp_net_repr.h
@@ -75,7 +75,7 @@ enum nfp_repr_type {
 
 extern const struct net_device_ops nfp_repr_netdev_ops;
 
-static inline bool nfp_netdev_is_nfp_repr(struct net_device *netdev)
+static inline bool nfp_netdev_is_nfp_repr(const struct net_device *netdev)
 {
 	return netdev->netdev_ops == &nfp_repr_netdev_ops;
 }
diff --git a/drivers/net/ethernet/netronome/nfp/nfp_port.c b/drivers/net/ethernet/netronome/nfp/nfp_port.c
index 86bc149ca231..81a23f9bdfc0 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_port.c
+++ b/drivers/net/ethernet/netronome/nfp/nfp_port.c
@@ -12,7 +12,7 @@
 #include "nfp_net.h"
 #include "nfp_port.h"
 
-struct nfp_port *nfp_port_from_netdev(struct net_device *netdev)
+struct nfp_port *nfp_port_from_netdev(const struct net_device *netdev)
 {
 	if (nfp_netdev_is_nfp_net(netdev)) {
 		struct nfp_net *nn = netdev_priv(netdev);
diff --git a/drivers/net/ethernet/netronome/nfp/nfp_port.h b/drivers/net/ethernet/netronome/nfp/nfp_port.h
index b2479a2a49e5..24de9250d564 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_port.h
+++ b/drivers/net/ethernet/netronome/nfp/nfp_port.h
@@ -105,7 +105,7 @@ static inline bool nfp_port_is_vnic(const struct nfp_port *port)
 int
 nfp_port_set_features(struct net_device *netdev, netdev_features_t features);
 
-struct nfp_port *nfp_port_from_netdev(struct net_device *netdev);
+struct nfp_port *nfp_port_from_netdev(const struct net_device *netdev);
 struct nfp_port *
 nfp_port_from_id(struct nfp_pf *pf, enum nfp_port_type type, unsigned int id);
 struct nfp_eth_table_port *__nfp_port_get_eth_port(struct nfp_port *port);
-- 
2.19.2


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [RFC 03/14] net: hstats: add basic/core functionality
  2019-01-28 23:44 [RFC 00/14] netlink/hierarchical stats Jakub Kicinski
  2019-01-28 23:44 ` [RFC 01/14] nfp: remove unused structure Jakub Kicinski
  2019-01-28 23:44 ` [RFC 02/14] nfp: constify parameter to nfp_port_from_netdev() Jakub Kicinski
@ 2019-01-28 23:44 ` Jakub Kicinski
  2019-01-30  4:18   ` David Ahern
  2019-01-28 23:44 ` [RFC 04/14] net: hstats: allow hierarchies to be built Jakub Kicinski
                   ` (12 subsequent siblings)
  15 siblings, 1 reply; 27+ messages in thread
From: Jakub Kicinski @ 2019-01-28 23:44 UTC (permalink / raw)
  To: davem
  Cc: oss-drivers, netdev, jiri, f.fainelli, andrew, mkubecek, dsahern,
	simon.horman, jesse.brandeburg, maciejromanfijalkowski,
	vasundhara-v.volam, michael.chan, shalomt, idosch,
	Jakub Kicinski

Add basic hierarchical stats.  For now there is no hierarchies
or other fancy features.  An ndo is added, and drivers can return
multiple groups of stats by adding the to them dump with
rtnl_hstat_add_grp().

Each group has attributes (qualifiers) which designate the direction
of the statistic (TX vs RX) and the source (device vs driver).

A handful of common statistics maintained by Linux drivers is added.

Dumping machinery is a little involved to make extensions easy.
A simple stack-based machine is employed which will in due course
help keep tracking of children and iteration over classifiers.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
---
 include/linux/netdevice.h    |   9 +
 include/net/hstats.h         |  94 +++++++
 include/uapi/linux/if_link.h |  45 ++++
 net/core/Makefile            |   2 +-
 net/core/hstats.c            | 497 +++++++++++++++++++++++++++++++++++
 net/core/rtnetlink.c         |  21 ++
 6 files changed, 667 insertions(+), 1 deletion(-)
 create mode 100644 include/net/hstats.h
 create mode 100644 net/core/hstats.c

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index e675ef97a426..9f16036312f9 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -941,6 +941,8 @@ struct dev_ifalias {
 	char ifalias[];
 };
 
+struct rtnl_hstat_req;
+
 /*
  * This structure defines the management hooks for network devices.
  * The following hooks can be defined; unless noted otherwise, they are
@@ -1245,6 +1247,11 @@ struct dev_ifalias {
  *	that got dropped are freed/returned via xdp_return_frame().
  *	Returns negative number, means general error invoking ndo, meaning
  *	no frames were xmit'ed and core-caller will free all frames.
+ * int (*ndo_hstat_get_groups)(const struct net_device *dev,
+ *			       struct rtnl_hstat_req *req);
+ *	This function is used to retrieve driver's groups of hierarchical stats.
+ *	Driver should use rtnl_hstat_add_grp() to report its groups.
+ *	See Documentation/networking/hstats.rst for details.
  */
 struct net_device_ops {
 	int			(*ndo_init)(struct net_device *dev);
@@ -1441,6 +1448,8 @@ struct net_device_ops {
 						u32 flags);
 	int			(*ndo_xsk_async_xmit)(struct net_device *dev,
 						      u32 queue_id);
+	int			(*ndo_hstat_get_groups)(const struct net_device *dev,
+							struct rtnl_hstat_req *req);
 };
 
 /**
diff --git a/include/net/hstats.h b/include/net/hstats.h
new file mode 100644
index 000000000000..c2e8b379237a
--- /dev/null
+++ b/include/net/hstats.h
@@ -0,0 +1,94 @@
+// SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
+/* Copyright (C) 2019 Netronome Systems, Inc. */
+
+#ifndef _NET_HSTATS_H
+#define _NET_HSTATS_H
+
+#include <linux/if_link.h>
+#include <linux/kernel.h>
+#include <net/netlink.h>
+
+struct net_device;
+struct sk_buff;
+
+/* Internal driver/core qualifiers used as indexes in qualifier tables
+ * and translated into IFLA_HSTATS_QUAL_* in dumps.
+ */
+enum {
+	RTNL_HSTATS_QUAL_TYPE,
+	RTNL_HSTATS_QUAL_DIRECTION,
+
+	RTNL_HSTATS_QUAL_CNT
+};
+
+struct hstat_dumper;
+struct rtnl_hstat_group;
+
+struct rtnl_hstat_req {
+	int err;
+	struct sk_buff *skb;
+	struct hstat_dumper *dumper;
+};
+
+struct rtnl_hstat_qualifier {
+	unsigned int constant;
+};
+
+/**
+ * struct rtnl_hstat_group - node in the hstat hierarchy
+ * @qualifiers:	attributes describing this group
+ * @stats_cnt:	number of stats in the bitmask
+ * @stats:	bitmask of stats present
+ * @get_stats:	driver callback for dumping the stats
+ */
+struct rtnl_hstat_group {
+	/* Note: this is *not* indexed with IFLA_* attributes! */
+	struct rtnl_hstat_qualifier qualifiers[RTNL_HSTATS_QUAL_CNT];
+	/* Can't use bitmaps - words are variable length */
+	unsigned int stats_cnt;
+	u64 stats[DIV_ROUND_UP(IFLA_HSTATS_STAT_MAX + 1, 64)];
+	int (*get_stats)(struct net_device *dev, struct rtnl_hstat_req *req,
+			 const struct rtnl_hstat_group *grp);
+};
+
+void rtnl_hstat_add_grp(struct rtnl_hstat_req *req,
+			const struct rtnl_hstat_group *grp);
+
+static inline void
+rtnl_hstat_dump(struct rtnl_hstat_req *req, const int id, const u64 val)
+{
+	if (req->err)
+		return;
+	if (nla_put_u64_64bit(req->skb, id, val, IFLA_HSTATS_STAT_UNSPEC))
+		req->err = -EMSGSIZE;
+}
+
+size_t rtnl_get_link_hstats_size(const struct net_device *dev);
+size_t rtnl_get_link_hstats(struct sk_buff *skb, struct net_device *dev,
+			    int *prividx);
+
+enum {
+#define RTNL_HSTAT_BIT(_name, _word) \
+	RTNL_HSTATS_STAT_ ## _name ## _BIT = \
+		BIT_ULL(IFLA_HSTATS_STAT_ ## _name - 1 - ((_word) * 64))
+
+	/* Common Linux stats */
+	RTNL_HSTAT_BIT(LINUX_PKTS, 0),
+	RTNL_HSTAT_BIT(LINUX_BYTES, 0),
+	RTNL_HSTAT_BIT(LINUX_BUSY, 0),
+	RTNL_HSTAT_BIT(LINUX_CSUM_PARTIAL, 0),
+	RTNL_HSTAT_BIT(LINUX_CSUM_COMPLETE, 0),
+	RTNL_HSTAT_BIT(LINUX_CSUM_UNNECESSARY, 0),
+	RTNL_HSTAT_BIT(LINUX_SEGMENTATION_OFFLOAD_PKTS, 0),
+#undef RTNL_HSTAT_BIT
+};
+
+/* Helper defines for common qualifier sets */
+#define RTNL_HSTATS_QUALS_BASIC(type, dir)				\
+	[RTNL_HSTATS_QUAL_TYPE] = {					\
+		.constant	= IFLA_HSTATS_QUAL_TYPE_ ##type,	\
+	},								\
+	[RTNL_HSTATS_QUAL_DIRECTION] = {				\
+		.constant	= IFLA_HSTATS_QUAL_DIR_ ##dir,		\
+	}
+#endif
diff --git a/include/uapi/linux/if_link.h b/include/uapi/linux/if_link.h
index 5b225ff63b48..55fcef81e142 100644
--- a/include/uapi/linux/if_link.h
+++ b/include/uapi/linux/if_link.h
@@ -910,6 +910,7 @@ enum {
 	IFLA_STATS_LINK_XSTATS_SLAVE,
 	IFLA_STATS_LINK_OFFLOAD_XSTATS,
 	IFLA_STATS_AF_SPEC,
+	IFLA_STATS_LINK_HSTATS,
 	__IFLA_STATS_MAX,
 };
 
@@ -938,6 +939,50 @@ enum {
 };
 #define IFLA_OFFLOAD_XSTATS_MAX (__IFLA_OFFLOAD_XSTATS_MAX - 1)
 
+/* These are embedded into IFLA_STATS_LINK_HSTATS:
+ * See Documentation/networking/hstats.rst for details.
+ */
+enum {
+	IFLA_HSTATS_UNSPEC,
+	IFLA_HSTATS_GROUP,
+	IFLA_HSTATS_STATS,
+	IFLA_HSTATS_QUAL_TYPE,
+	IFLA_HSTATS_QUAL_DIRECTION,
+	__IFLA_HSTATS_MAX,
+};
+#define IFLA_HSTATS_MAX (__IFLA_HSTATS_MAX - 1)
+
+enum {
+	IFLA_HSTATS_QUAL_TYPE_UNSPEC,
+	IFLA_HSTATS_QUAL_TYPE_DEV,
+	IFLA_HSTATS_QUAL_TYPE_DRV,
+	__IFLA_HSTATS_QUAL_TYPE_MAX,
+};
+#define IFLA_HSTATS_QUAL_TYPE_MAX (__IFLA_HSTATS_QUAL_TYPE_MAX - 1)
+
+enum {
+	IFLA_HSTATS_QUAL_DIR_UNSPEC,
+	IFLA_HSTATS_QUAL_DIR_RX,
+	IFLA_HSTATS_QUAL_DIR_TX,
+	__IFLA_HSTATS_QUAL_DIR_MAX,
+};
+#define IFLA_HSTATS_QUAL_DIR_MAX (__IFLA_HSTATS_QUAL_DIR_MAX - 1)
+
+enum {
+	IFLA_HSTATS_STAT_UNSPEC,
+	/* Common statistics */
+	IFLA_HSTATS_STAT_LINUX_PKTS, /* 0 */
+	IFLA_HSTATS_STAT_LINUX_BYTES,
+	IFLA_HSTATS_STAT_LINUX_BUSY,
+	IFLA_HSTATS_STAT_LINUX_CSUM_PARTIAL,
+	IFLA_HSTATS_STAT_LINUX_CSUM_COMPLETE,
+	IFLA_HSTATS_STAT_LINUX_CSUM_UNNECESSARY,
+	IFLA_HSTATS_STAT_LINUX_SEGMENTATION_OFFLOAD_PKTS,
+
+	__IFLA_HSTATS_STAT_MAX,
+};
+#define IFLA_HSTATS_STAT_MAX (__IFLA_HSTATS_STAT_MAX - 1)
+
 /* XDP section */
 
 #define XDP_FLAGS_UPDATE_IF_NOEXIST	(1U << 0)
diff --git a/net/core/Makefile b/net/core/Makefile
index fccd31e0e7f7..30635dfbbe9b 100644
--- a/net/core/Makefile
+++ b/net/core/Makefile
@@ -11,7 +11,7 @@ obj-$(CONFIG_SYSCTL) += sysctl_net_core.o
 obj-y		     += dev.o ethtool.o dev_addr_lists.o dst.o netevent.o \
 			neighbour.o rtnetlink.o utils.o link_watch.o filter.o \
 			sock_diag.o dev_ioctl.o tso.o sock_reuseport.o \
-			fib_notifier.o xdp.o
+			fib_notifier.o xdp.o hstats.o
 
 obj-y += net-sysfs.o
 obj-$(CONFIG_PAGE_POOL) += page_pool.o
diff --git a/net/core/hstats.c b/net/core/hstats.c
new file mode 100644
index 000000000000..183a1c5dd93a
--- /dev/null
+++ b/net/core/hstats.c
@@ -0,0 +1,497 @@
+// SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
+/* Copyright (C) 2019 Netronome Systems, Inc. */
+
+#include <linux/bitmap.h>
+#include <linux/err.h>
+#include <linux/if_link.h>
+#include <linux/kernel.h>
+#include <linux/netdevice.h>
+#include <net/hstats.h>
+#include <net/netlink.h>
+
+/* We deploy a simple stack-based dumper to walk the hierarchies.
+ * This is the documentation format for quick analysis of the state machine:
+ *
+ *   Header (in case there are move than one possibility):
+ *
+ * o |  direct action 1 \ __ these are performed by the code
+ * r |  direct action 2 /
+ * d |  --------------
+ * e | |  STACK CMD 1 | \ __  these are popped from the stack and run
+ * r v |  STACK CMD 2 | /     in order after current handler completes
+ *      ============== <---- top of the stack before current handler
+ */
+enum hstat_dumper_cmd {
+	/*      open grp
+	 *   put const quals
+	 *   ---------------
+	 *  |   DUMP STATS  |
+	 *  |    CLOSE grp  |
+	 *   ===============
+	 */
+	HSTAT_DCMD_GRP_LOAD,
+	/* dump all statitics
+	 */
+	HSTAT_DCMD_GRP_DUMP,
+	/* close grp */
+	HSTAT_DCMD_GRP_CLOSE,
+	/* count root group (netlink restart index) */
+	HSTAT_DCMD_ROOT_GRP_DONE,
+};
+
+struct hstat_dumper {
+	struct sk_buff *skb;
+	struct net_device *dev;
+	/* For sizing we only have a const pointer to dev */
+	const struct net_device *const_dev;
+	int err;
+
+	/* For calculating skb size */
+	bool sizing;
+	size_t size;
+
+	int current_root_grp;
+	int last_completed_root_grp;
+
+	u8 *cmd_stack;
+	size_t cmd_stack_top;
+	size_t cmd_stack_len;
+};
+
+struct hstat_dumper_cmd_simple {
+	u64 cmd;
+};
+
+struct hstat_dumper_cmd_grp_load {
+	const struct rtnl_hstat_group *grp;
+	u64 cmd;
+};
+
+struct hstat_dumper_cmd_grp_dump {
+	const struct rtnl_hstat_group *grp;
+	u64 cmd;
+};
+
+struct hstat_dumper_cmd_grp_close {
+	struct nlattr *nl_attr;
+	u64 cmd;
+};
+
+/* RTNL helpers */
+static const int rtnl_qual2ifla[RTNL_HSTATS_QUAL_CNT] = {
+	[RTNL_HSTATS_QUAL_TYPE]		= IFLA_HSTATS_QUAL_TYPE,
+	[RTNL_HSTATS_QUAL_DIRECTION]	= IFLA_HSTATS_QUAL_DIRECTION,
+};
+
+static bool rtnl_hstat_qualifier_present(const struct rtnl_hstat_qualifier *q)
+{
+	return q->constant;
+}
+
+/* Dumper basics */
+static u64 hstat_dumper_peek_cmd(struct hstat_dumper *dumper)
+{
+	return *(u64 *)(dumper->cmd_stack + dumper->cmd_stack_top - 8);
+}
+
+static int hstat_dumper_discard(struct hstat_dumper *dumper, size_t len)
+{
+	if (WARN_ON_ONCE(dumper->cmd_stack_top < len))
+		return -EINVAL;
+	dumper->cmd_stack_top -= len;
+	return 0;
+}
+
+static int hstat_dumper_pop(struct hstat_dumper *dumper, void *dst, size_t len)
+{
+	if (WARN_ON_ONCE(dumper->cmd_stack_top < len))
+		return -EINVAL;
+	dumper->cmd_stack_top -= len;
+	memcpy(dst, dumper->cmd_stack + dumper->cmd_stack_top, len);
+	return 0;
+}
+
+static bool hstat_dumper_done(struct hstat_dumper *dumper)
+{
+	return !dumper->cmd_stack_top;
+}
+
+static int hstat_dumper_error(struct hstat_dumper *dumper)
+{
+	if (WARN_ON_ONCE(dumper->cmd_stack_top && dumper->cmd_stack_top < 8))
+		return -EINVAL;
+	return 0;
+}
+
+static struct hstat_dumper *
+hstat_dumper_init(struct sk_buff *skb, const struct net_device *const_dev,
+		  struct net_device *dev, int *prividx)
+{
+	struct hstat_dumper *dumper;
+
+	dumper = kzalloc(sizeof(*dumper), GFP_KERNEL);
+	if (!dumper)
+		return NULL;
+	dumper->cmd_stack = kmalloc(8096, GFP_KERNEL);
+	if (!dumper->cmd_stack) {
+		kfree(dumper);
+		return NULL;
+	}
+	dumper->cmd_stack_len = 8096;
+
+	dumper->skb = skb;
+	dumper->dev = dev;
+	dumper->const_dev = const_dev;
+	if (prividx)
+		dumper->last_completed_root_grp = *prividx;
+	else
+		dumper->sizing = true;
+
+	return dumper;
+}
+
+static void hstat_dumper_destroy(struct hstat_dumper *dumper)
+{
+	kfree(dumper->cmd_stack);
+	kfree(dumper);
+}
+
+/* Dumper pushers */
+static int
+__hstat_dumper_push_cmd(struct hstat_dumper *dumper, void *data, size_t len)
+{
+	/* All structures pushed must be multiple of 8 w/ cmd as last member */
+	if (WARN_ON_ONCE(len % 8))
+		return -EINVAL;
+
+	while (dumper->cmd_stack_len - dumper->cmd_stack_top < len) {
+		void *st;
+
+		st = krealloc(dumper->cmd_stack, dumper->cmd_stack_len * 2,
+			      GFP_KERNEL);
+		if (!st)
+			return -ENOMEM;
+
+		dumper->cmd_stack = st;
+		dumper->cmd_stack_len *= 2;
+	}
+
+	memcpy(dumper->cmd_stack + dumper->cmd_stack_top, data, len);
+	dumper->cmd_stack_top += len;
+	return 0;
+}
+
+static int
+hstat_dumper_push_grp_load(struct hstat_dumper *dumper,
+			   const struct rtnl_hstat_group *grp)
+{
+	struct hstat_dumper_cmd_grp_load cmd = {
+		.cmd =		HSTAT_DCMD_GRP_LOAD,
+		.grp =		grp,
+	};
+
+	return __hstat_dumper_push_cmd(dumper, &cmd, sizeof(cmd));
+}
+
+static int
+hstat_dumper_push_dump(struct hstat_dumper *dumper,
+		       const struct rtnl_hstat_group *grp)
+{
+	struct hstat_dumper_cmd_grp_dump cmd = {
+		.cmd =		HSTAT_DCMD_GRP_DUMP,
+		.grp =		grp,
+	};
+
+	return __hstat_dumper_push_cmd(dumper, &cmd, sizeof(cmd));
+}
+
+static int
+hstat_dumper_push_grp_close(struct hstat_dumper *dumper, struct nlattr *nl_grp)
+{
+	struct hstat_dumper_cmd_grp_close cmd = {
+		.cmd =		HSTAT_DCMD_GRP_CLOSE,
+		.nl_attr =	nl_grp,
+	};
+
+	return __hstat_dumper_push_cmd(dumper, &cmd, sizeof(cmd));
+}
+
+static int
+hstat_dumper_push_root_grp_done(struct hstat_dumper *dumper)
+{
+	struct hstat_dumper_cmd_simple cmd = { HSTAT_DCMD_ROOT_GRP_DONE };
+
+	return __hstat_dumper_push_cmd(dumper, &cmd, sizeof(cmd));
+}
+
+/* Dumper actions */
+static int hstat_dumper_open_grp(struct hstat_dumper *dumper)
+{
+	struct nlattr *nl_grp;
+	int err;
+
+	if (dumper->sizing) {
+		dumper->size += nla_total_size(0); /* IFLA_HSTATS_GROUP */
+		return 0;
+	}
+
+	/* Open group nlattr and push onto the stack a close command */
+	nl_grp = nla_nest_start(dumper->skb, IFLA_HSTATS_GROUP);
+	if (!nl_grp)
+		return -EMSGSIZE;
+
+	err = hstat_dumper_push_grp_close(dumper, nl_grp);
+	if (err) {
+		nla_nest_cancel(dumper->skb, nl_grp);
+		return err;
+	}
+
+	return 0;
+}
+
+static int
+hstat_dumper_grp_put_stats(struct hstat_dumper *dumper,
+			   const struct rtnl_hstat_group *grp)
+{
+	struct rtnl_hstat_req dump_req;
+	struct nlattr *nl_stats;
+	int err;
+
+	WARN_ON_ONCE(!grp->stats_cnt != !grp->get_stats);
+
+	if (!grp->stats_cnt)
+		return 0;
+
+	if (dumper->sizing) {
+		dumper->size += nla_total_size(0);
+		dumper->size += grp->stats_cnt * nla_total_size(8);
+		return 0;
+	}
+
+	nl_stats = nla_nest_start(dumper->skb, IFLA_HSTATS_STATS);
+	if (!nl_stats)
+		return -EMSGSIZE;
+
+	memset(&dump_req, 0, sizeof(dump_req));
+	dump_req.dumper = dumper;
+	dump_req.skb = dumper->skb;
+
+	err = grp->get_stats(dumper->dev, &dump_req, grp);
+	if (err)
+		goto err_cancel_stats;
+	err = dump_req.err;
+	if (err)
+		goto err_cancel_stats;
+
+	nla_nest_end(dumper->skb, nl_stats);
+	return 0;
+
+err_cancel_stats:
+	nla_nest_cancel(dumper->skb, nl_stats);
+	return err;
+}
+
+static int
+hstat_dumper_put_qual(struct hstat_dumper *dumper, int i, u32 val)
+{
+	if (dumper->sizing) {
+		dumper->size += nla_total_size(sizeof(u32));
+		return 0;
+	}
+
+	return nla_put_u32(dumper->skb, rtnl_qual2ifla[i], val);
+}
+
+/* Dumper handlers */
+static int hstat_dumper_grp_load(struct hstat_dumper *dumper)
+{
+	struct hstat_dumper_cmd_grp_load cmd;
+	int i, err;
+
+	err = hstat_dumper_pop(dumper, &cmd, sizeof(cmd));
+	if (err)
+		return err;
+	if (dumper->err)
+		return 0;
+
+	if (dumper->current_root_grp < dumper->last_completed_root_grp)
+		return 0;
+
+	err = hstat_dumper_open_grp(dumper);
+	if (err)
+		return err;
+
+	for (i = 0; i < RTNL_HSTATS_QUAL_CNT; i++) {
+		const struct rtnl_hstat_qualifier *q;
+
+		q = &cmd.grp->qualifiers[i];
+		if (!rtnl_hstat_qualifier_present(q))
+			continue;
+
+		if (q->constant) {
+			err = hstat_dumper_put_qual(dumper, i, q->constant);
+			if (err)
+				return err;
+		}
+	}
+
+	return hstat_dumper_push_dump(dumper, cmd.grp);
+}
+
+static int hstat_dumper_grp_dump(struct hstat_dumper *dumper)
+{
+	struct hstat_dumper_cmd_grp_dump cmd;
+	int err;
+
+	err = hstat_dumper_pop(dumper, &cmd, sizeof(cmd));
+	if (err)
+		return err;
+	if (dumper->err)
+		return 0;
+
+	err = hstat_dumper_grp_put_stats(dumper, cmd.grp);
+	if (err)
+		return err;
+
+	return 0;
+}
+
+static int hstat_dumper_grp_close(struct hstat_dumper *dumper)
+{
+	struct hstat_dumper_cmd_grp_close cmd;
+	int err;
+
+	err = hstat_dumper_pop(dumper, &cmd, sizeof(cmd));
+	if (err)
+		return err;
+
+	if (!dumper->err)
+		nla_nest_end(dumper->skb, cmd.nl_attr);
+	else
+		nla_nest_cancel(dumper->skb, cmd.nl_attr);
+	return 0;
+}
+
+static int hstat_dumper_root_grp_done(struct hstat_dumper *dumper)
+{
+	int err;
+
+	err = hstat_dumper_discard(dumper, sizeof(u64));
+	if (err)
+		return err;
+	if (dumper->err)
+		return 0;
+
+	dumper->current_root_grp++;
+	return 0;
+}
+
+static int hstat_dumper_run(struct hstat_dumper *dumper)
+{
+	do {
+		int err;
+		u64 cmd;
+
+		err = hstat_dumper_error(dumper);
+		if (err)
+			return err;
+		if (hstat_dumper_done(dumper))
+			return 0;
+
+		cmd = hstat_dumper_peek_cmd(dumper);
+		switch (cmd) {
+		case HSTAT_DCMD_ROOT_GRP_DONE:
+			err = hstat_dumper_root_grp_done(dumper);
+			break;
+		case HSTAT_DCMD_GRP_LOAD:
+			err = hstat_dumper_grp_load(dumper);
+			break;
+		case HSTAT_DCMD_GRP_CLOSE:
+			err = hstat_dumper_grp_close(dumper);
+			break;
+		case HSTAT_DCMD_GRP_DUMP:
+			err = hstat_dumper_grp_dump(dumper);
+			break;
+		}
+		if (err && !dumper->err)
+			/* Record the errror hand keep invoking handlers,
+			 * handlers will see the error and only do clean up.
+			 */
+			dumper->err = err;
+	} while (true);
+
+	return dumper->err;
+}
+
+/* Driver helpers */
+void
+rtnl_hstat_add_grp(struct rtnl_hstat_req *req,
+		   const struct rtnl_hstat_group *grp)
+{
+	if (!req->err)
+		req->err = hstat_dumper_push_root_grp_done(req->dumper);
+	if (!req->err)
+		req->err = hstat_dumper_push_grp_load(req->dumper, grp);
+}
+EXPORT_SYMBOL(rtnl_hstat_add_grp);
+
+/* Stack call points */
+static size_t
+__rtnl_get_link_hstats(struct sk_buff *skb, const struct net_device *const_dev,
+		       struct net_device *dev, int *prividx)
+{
+	struct hstat_dumper *dumper;
+	struct rtnl_hstat_req req;
+	ssize_t ret;
+
+	if (!dev->netdev_ops || !dev->netdev_ops->ndo_hstat_get_groups)
+		return -ENODATA;
+
+	dumper = hstat_dumper_init(skb, const_dev, dev, prividx);
+	if (!dumper)
+		return -ENOMEM;
+
+	memset(&req, 0, sizeof(req));
+	req.dumper = dumper;
+
+	ret = dev->netdev_ops->ndo_hstat_get_groups(dev, &req);
+	if (ret < 0)
+		goto exit_dumper_destroy;
+	ret = req.err;
+	if (ret)
+		goto exit_dumper_destroy;
+
+	if (hstat_dumper_done(dumper))
+		return -ENODATA;
+
+	ret = hstat_dumper_run(dumper);
+	if (prividx) {
+		if (ret)
+			*prividx = dumper->current_root_grp;
+		else
+			*prividx = 0;
+	} else if (ret >= 0) {
+		ret = dumper->size;
+	}
+
+exit_dumper_destroy:
+	hstat_dumper_destroy(dumper);
+	return ret;
+}
+
+size_t rtnl_get_link_hstats_size(const struct net_device *dev)
+{
+	ssize_t ret;
+
+	ret = __rtnl_get_link_hstats(NULL, dev, NULL, NULL);
+	return ret;
+}
+
+size_t
+rtnl_get_link_hstats(struct sk_buff *skb, struct net_device *dev, int *prividx)
+{
+	ssize_t ret;
+
+	ret = __rtnl_get_link_hstats(skb, dev, dev, prividx);
+	return ret;
+}
diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index f5a98082ac7a..a8112d0dca57 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -50,6 +50,7 @@
 #include <net/ip.h>
 #include <net/protocol.h>
 #include <net/arp.h>
+#include <net/hstats.h>
 #include <net/route.h>
 #include <net/udp.h>
 #include <net/tcp.h>
@@ -4871,6 +4872,23 @@ static int rtnl_fill_statsinfo(struct sk_buff *skb, struct net_device *dev,
 		*idxattr = 0;
 	}
 
+	if (stats_attr_valid(filter_mask, IFLA_STATS_LINK_HSTATS, *idxattr)) {
+		*idxattr = IFLA_STATS_LINK_HSTATS;
+		attr = nla_nest_start(skb, IFLA_STATS_LINK_HSTATS);
+		if (!attr)
+			goto nla_put_failure;
+
+		err = rtnl_get_link_hstats(skb, dev, prividx);
+		if (err == -ENODATA)
+			nla_nest_cancel(skb, attr);
+		else
+			nla_nest_end(skb, attr);
+
+		if (err && err != -ENODATA)
+			goto nla_put_failure;
+		*idxattr = 0;
+	}
+
 	nlmsg_end(skb, nlh);
 
 	return 0;
@@ -4946,6 +4964,9 @@ static size_t if_nlmsg_stats_size(const struct net_device *dev,
 		rcu_read_unlock();
 	}
 
+	if (stats_attr_valid(filter_mask, IFLA_STATS_LINK_HSTATS, 0))
+		size += rtnl_get_link_hstats_size(dev);
+
 	return size;
 }
 
-- 
2.19.2


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [RFC 04/14] net: hstats: allow hierarchies to be built
  2019-01-28 23:44 [RFC 00/14] netlink/hierarchical stats Jakub Kicinski
                   ` (2 preceding siblings ...)
  2019-01-28 23:44 ` [RFC 03/14] net: hstats: add basic/core functionality Jakub Kicinski
@ 2019-01-28 23:44 ` Jakub Kicinski
  2019-01-28 23:44 ` [RFC 05/14] nfp: very basic hstat support Jakub Kicinski
                   ` (11 subsequent siblings)
  15 siblings, 0 replies; 27+ messages in thread
From: Jakub Kicinski @ 2019-01-28 23:44 UTC (permalink / raw)
  To: davem
  Cc: oss-drivers, netdev, jiri, f.fainelli, andrew, mkubecek, dsahern,
	simon.horman, jesse.brandeburg, maciejromanfijalkowski,
	vasundhara-v.volam, michael.chan, shalomt, idosch,
	Jakub Kicinski

Allow groups to have other groups attached as children.  Child
groups allow embedding different groups in the same root group
and simplify definition of dependent stats (as qualifiers don't
have to be repeated).

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
---
 include/net/hstats.h |  6 ++++++
 net/core/hstats.c    | 14 ++++++++++++++
 2 files changed, 20 insertions(+)

diff --git a/include/net/hstats.h b/include/net/hstats.h
index c2e8b379237a..cbbdaf93d408 100644
--- a/include/net/hstats.h
+++ b/include/net/hstats.h
@@ -37,18 +37,24 @@ struct rtnl_hstat_qualifier {
 /**
  * struct rtnl_hstat_group - node in the hstat hierarchy
  * @qualifiers:	attributes describing this group
+ * @has_children: @children array is present and NULL-terminated
  * @stats_cnt:	number of stats in the bitmask
  * @stats:	bitmask of stats present
  * @get_stats:	driver callback for dumping the stats
+ * @children:	NULL-terminated array of groups inheriting the qualifiers
+ *		@has_children has to be set for core to parse the array
  */
 struct rtnl_hstat_group {
 	/* Note: this is *not* indexed with IFLA_* attributes! */
 	struct rtnl_hstat_qualifier qualifiers[RTNL_HSTATS_QUAL_CNT];
+	bool has_children;
 	/* Can't use bitmaps - words are variable length */
 	unsigned int stats_cnt;
 	u64 stats[DIV_ROUND_UP(IFLA_HSTATS_STAT_MAX + 1, 64)];
 	int (*get_stats)(struct net_device *dev, struct rtnl_hstat_req *req,
 			 const struct rtnl_hstat_group *grp);
+
+	const struct rtnl_hstat_group *children[];
 };
 
 void rtnl_hstat_add_grp(struct rtnl_hstat_req *req,
diff --git a/net/core/hstats.c b/net/core/hstats.c
index 183a1c5dd93a..b409dd40e0c9 100644
--- a/net/core/hstats.c
+++ b/net/core/hstats.c
@@ -31,6 +31,10 @@ enum hstat_dumper_cmd {
 	 */
 	HSTAT_DCMD_GRP_LOAD,
 	/* dump all statitics
+	 *   ---------------
+	 *  |  LOAD child0  |
+	 *  |  LOAD child1  |
+	 *   ===============
 	 */
 	HSTAT_DCMD_GRP_DUMP,
 	/* close grp */
@@ -353,6 +357,16 @@ static int hstat_dumper_grp_dump(struct hstat_dumper *dumper)
 	if (err)
 		return err;
 
+	if (cmd.grp->has_children) {
+		const struct rtnl_hstat_group *const *grp;
+
+		for (grp = cmd.grp->children; *grp; grp++) {
+			err = hstat_dumper_push_grp_load(dumper, *grp);
+			if (err)
+				return err;
+		}
+	}
+
 	return 0;
 }
 
-- 
2.19.2


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [RFC 05/14] nfp: very basic hstat support
  2019-01-28 23:44 [RFC 00/14] netlink/hierarchical stats Jakub Kicinski
                   ` (3 preceding siblings ...)
  2019-01-28 23:44 ` [RFC 04/14] net: hstats: allow hierarchies to be built Jakub Kicinski
@ 2019-01-28 23:44 ` Jakub Kicinski
  2019-01-28 23:44 ` [RFC 06/14] net: hstats: allow iterators Jakub Kicinski
                   ` (10 subsequent siblings)
  15 siblings, 0 replies; 27+ messages in thread
From: Jakub Kicinski @ 2019-01-28 23:44 UTC (permalink / raw)
  To: davem
  Cc: oss-drivers, netdev, jiri, f.fainelli, andrew, mkubecek, dsahern,
	simon.horman, jesse.brandeburg, maciejromanfijalkowski,
	vasundhara-v.volam, michael.chan, shalomt, idosch,
	Jakub Kicinski

Expose basic vNIC device statistics via hstat.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
---
 drivers/net/ethernet/netronome/nfp/Makefile   |  1 +
 .../net/ethernet/netronome/nfp/nfp_hstat.c    | 70 +++++++++++++++++++
 drivers/net/ethernet/netronome/nfp/nfp_net.h  |  3 +
 .../ethernet/netronome/nfp/nfp_net_common.c   |  1 +
 4 files changed, 75 insertions(+)
 create mode 100644 drivers/net/ethernet/netronome/nfp/nfp_hstat.c

diff --git a/drivers/net/ethernet/netronome/nfp/Makefile b/drivers/net/ethernet/netronome/nfp/Makefile
index 47c708f08ade..4721abe9bfbf 100644
--- a/drivers/net/ethernet/netronome/nfp/Makefile
+++ b/drivers/net/ethernet/netronome/nfp/Makefile
@@ -19,6 +19,7 @@ nfp-objs := \
 	    nfp_app.o \
 	    nfp_app_nic.o \
 	    nfp_devlink.o \
+	    nfp_hstat.o \
 	    nfp_hwmon.o \
 	    nfp_main.o \
 	    nfp_net_common.o \
diff --git a/drivers/net/ethernet/netronome/nfp/nfp_hstat.c b/drivers/net/ethernet/netronome/nfp/nfp_hstat.c
new file mode 100644
index 000000000000..9480d3b6caa5
--- /dev/null
+++ b/drivers/net/ethernet/netronome/nfp/nfp_hstat.c
@@ -0,0 +1,70 @@
+// SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
+/* Copyright (C) 2019 Netronome Systems, Inc. */
+
+#include <net/hstats.h>
+
+#include "nfp_net.h"
+
+/* NFD per-vNIC stats */
+static int
+nfp_hstat_vnic_nfd_basic_get_rx(struct net_device *netdev,
+				struct rtnl_hstat_req *req,
+				const struct rtnl_hstat_group *grp)
+{
+	struct nfp_net *nn = netdev_priv(netdev);
+
+	rtnl_hstat_dump(req, IFLA_HSTATS_STAT_LINUX_PKTS,
+			nn_readq(nn, NFP_NET_CFG_STATS_RX_FRAMES));
+	rtnl_hstat_dump(req, IFLA_HSTATS_STAT_LINUX_BYTES,
+			nn_readq(nn, NFP_NET_CFG_STATS_RX_OCTETS));
+	return 0;
+}
+
+static const struct rtnl_hstat_group nfp_hstat_vnic_nfd_rx = {
+	.qualifiers = {
+		RTNL_HSTATS_QUALS_BASIC(DEV, RX),
+	},
+
+	.get_stats = nfp_hstat_vnic_nfd_basic_get_rx,
+	.stats	= {
+		[0] =	RTNL_HSTATS_STAT_LINUX_PKTS_BIT |
+			RTNL_HSTATS_STAT_LINUX_BYTES_BIT,
+	},
+	.stats_cnt = 2,
+};
+
+static int
+nfp_hstat_vnic_nfd_basic_get_tx(struct net_device *netdev,
+				struct rtnl_hstat_req *req,
+				const struct rtnl_hstat_group *grp)
+{
+	struct nfp_net *nn = netdev_priv(netdev);
+
+	rtnl_hstat_dump(req, IFLA_HSTATS_STAT_LINUX_PKTS,
+			nn_readq(nn, NFP_NET_CFG_STATS_TX_FRAMES));
+	rtnl_hstat_dump(req, IFLA_HSTATS_STAT_LINUX_BYTES,
+			nn_readq(nn, NFP_NET_CFG_STATS_TX_OCTETS));
+	return 0;
+}
+
+static const struct rtnl_hstat_group nfp_hstat_vnic_nfd_tx = {
+	.qualifiers = {
+		RTNL_HSTATS_QUALS_BASIC(DEV, TX),
+	},
+
+	.get_stats = nfp_hstat_vnic_nfd_basic_get_tx,
+	.stats	= {
+		[0] =	RTNL_HSTATS_STAT_LINUX_PKTS_BIT |
+			RTNL_HSTATS_STAT_LINUX_BYTES_BIT,
+	},
+	.stats_cnt = 2,
+};
+
+int nfp_net_hstat_get_groups(const struct net_device *netdev,
+			     struct rtnl_hstat_req *req)
+{
+	rtnl_hstat_add_grp(req, &nfp_hstat_vnic_nfd_rx);
+	rtnl_hstat_add_grp(req, &nfp_hstat_vnic_nfd_tx);
+
+	return 0;
+}
diff --git a/drivers/net/ethernet/netronome/nfp/nfp_net.h b/drivers/net/ethernet/netronome/nfp/nfp_net.h
index 93de25b39bc1..08396a23edeb 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_net.h
+++ b/drivers/net/ethernet/netronome/nfp/nfp_net.h
@@ -105,6 +105,7 @@ struct nfp_eth_table_port;
 struct nfp_net;
 struct nfp_net_r_vector;
 struct nfp_port;
+struct rtnl_hstat_req;
 
 /* Convenience macro for wrapping descriptor index on ring size */
 #define D_IDX(ring, idx)	((idx) & ((ring)->cnt - 1))
@@ -910,4 +911,6 @@ static inline void nfp_net_debugfs_dir_clean(struct dentry **dir)
 }
 #endif /* CONFIG_NFP_DEBUG */
 
+int nfp_net_hstat_get_groups(const struct net_device *dev,
+			     struct rtnl_hstat_req *req);
 #endif /* _NFP_NET_H_ */
diff --git a/drivers/net/ethernet/netronome/nfp/nfp_net_common.c b/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
index 7d2d4241498f..87ebfc3f0471 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
+++ b/drivers/net/ethernet/netronome/nfp/nfp_net_common.c
@@ -3531,6 +3531,7 @@ const struct net_device_ops nfp_net_netdev_ops = {
 	.ndo_udp_tunnel_add	= nfp_net_add_vxlan_port,
 	.ndo_udp_tunnel_del	= nfp_net_del_vxlan_port,
 	.ndo_bpf		= nfp_net_xdp,
+	.ndo_hstat_get_groups	= nfp_net_hstat_get_groups,
 };
 
 /**
-- 
2.19.2


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [RFC 06/14] net: hstats: allow iterators
  2019-01-28 23:44 [RFC 00/14] netlink/hierarchical stats Jakub Kicinski
                   ` (4 preceding siblings ...)
  2019-01-28 23:44 ` [RFC 05/14] nfp: very basic hstat support Jakub Kicinski
@ 2019-01-28 23:44 ` Jakub Kicinski
  2019-01-28 23:45 ` [RFC 07/14] net: hstats: help in iteration over directions Jakub Kicinski
                   ` (9 subsequent siblings)
  15 siblings, 0 replies; 27+ messages in thread
From: Jakub Kicinski @ 2019-01-28 23:44 UTC (permalink / raw)
  To: davem
  Cc: oss-drivers, netdev, jiri, f.fainelli, andrew, mkubecek, dsahern,
	simon.horman, jesse.brandeburg, maciejromanfijalkowski,
	vasundhara-v.volam, michael.chan, shalomt, idosch,
	Jakub Kicinski

Add support for iterative type of qualifiers.  User can set
min/max values in the group and the group will be dumped
multiple times, each time driver can read the current state
of the qualifier with rtnl_hstat_qual_get() to access the
right statistics.

Dump will look like this:
[group]
  [const qualifiers]
  [group (iter #0)]
    [iter qualifiers]
    [stats]
  [group (iter #1)]
    [iter qualifiers]
    [stats]
  ...

This means that if group contains iterative qualifiers it
will itself never contain statistics, its statistics will
only be reported in its subgroups where iterators have
a value assigned.

Dumper needs to keep track of which qualifiers where set
in a given group (iterations may nest).

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
---
 include/net/hstats.h         |   9 ++
 include/uapi/linux/if_link.h |   3 +
 net/core/hstats.c            | 168 +++++++++++++++++++++++++++++++++--
 3 files changed, 173 insertions(+), 7 deletions(-)

diff --git a/include/net/hstats.h b/include/net/hstats.h
index cbbdaf93d408..00f4d9334422 100644
--- a/include/net/hstats.h
+++ b/include/net/hstats.h
@@ -17,6 +17,9 @@ struct sk_buff;
 enum {
 	RTNL_HSTATS_QUAL_TYPE,
 	RTNL_HSTATS_QUAL_DIRECTION,
+	RTNL_HSTATS_QUAL_QUEUE,
+	RTNL_HSTATS_QUAL_PRIORITY,
+	RTNL_HSTATS_QUAL_TC,
 
 	RTNL_HSTATS_QUAL_CNT
 };
@@ -32,6 +35,10 @@ struct rtnl_hstat_req {
 
 struct rtnl_hstat_qualifier {
 	unsigned int constant;
+	unsigned int min;
+	unsigned int max;
+	int (*get_max)(const struct net_device *dev,
+		       const struct rtnl_hstat_group *grp);
 };
 
 /**
@@ -59,6 +66,8 @@ struct rtnl_hstat_group {
 
 void rtnl_hstat_add_grp(struct rtnl_hstat_req *req,
 			const struct rtnl_hstat_group *grp);
+bool rtnl_hstat_qual_is_set(struct rtnl_hstat_req *req, int qual);
+int rtnl_hstat_qual_get(struct rtnl_hstat_req *req, int qual);
 
 static inline void
 rtnl_hstat_dump(struct rtnl_hstat_req *req, const int id, const u64 val)
diff --git a/include/uapi/linux/if_link.h b/include/uapi/linux/if_link.h
index 55fcef81e142..b33d38ff5b47 100644
--- a/include/uapi/linux/if_link.h
+++ b/include/uapi/linux/if_link.h
@@ -948,6 +948,9 @@ enum {
 	IFLA_HSTATS_STATS,
 	IFLA_HSTATS_QUAL_TYPE,
 	IFLA_HSTATS_QUAL_DIRECTION,
+	IFLA_HSTATS_QUAL_QUEUE,
+	IFLA_HSTATS_QUAL_PRIORITY,
+	IFLA_HSTATS_QUAL_TC,
 	__IFLA_HSTATS_MAX,
 };
 #define IFLA_HSTATS_MAX (__IFLA_HSTATS_MAX - 1)
diff --git a/net/core/hstats.c b/net/core/hstats.c
index b409dd40e0c9..c689ebfdaeaa 100644
--- a/net/core/hstats.c
+++ b/net/core/hstats.c
@@ -22,14 +22,28 @@
  *      ============== <---- top of the stack before current handler
  */
 enum hstat_dumper_cmd {
-	/*      open grp
-	 *   put const quals
-	 *   ---------------
-	 *  |   DUMP STATS  |
-	 *  |    CLOSE grp  |
-	 *   ===============
+	/* Non-iterative group:		Iterative group:
+	 *
+	 *      open grp		     open grp
+	 *   put const quals		 put const quals
+	 *   ---------------		 ---------------
+	 *  |   DUMP STATS  |		|      ITER     |
+	 *  |    CLOSE grp  |		|    CLOSE grp  |
+	 *   ===============		 ===============
 	 */
 	HSTAT_DCMD_GRP_LOAD,
+	/* Non-last iteration:		Last iteration:
+	 *
+	 *      open grp		     open grp
+	 *  put current quals		put current quals
+	 *   increment quals		 ---------------
+	 *   ---------------		|   DUMP STATS  |
+	 *  |   DUMP STATS  |		|    CLOSE grp  |
+	 *  |    CLOSE grp  |		 ===============
+	 *  |      ITER     |
+	 *   ===============
+	 */
+	HSTAT_DCMD_GRP_ITER,
 	/* dump all statitics
 	 *   ---------------
 	 *  |  LOAD child0  |
@@ -48,6 +62,8 @@ struct hstat_dumper {
 	struct net_device *dev;
 	/* For sizing we only have a const pointer to dev */
 	const struct net_device *const_dev;
+	u32 quals[RTNL_HSTATS_QUAL_CNT];
+	unsigned long quals_set;
 	int err;
 
 	/* For calculating skb size */
@@ -62,6 +78,12 @@ struct hstat_dumper {
 	size_t cmd_stack_len;
 };
 
+struct hstat_qualifier_state {
+	u32 cur;
+	u32 min;
+	u32 max;
+};
+
 struct hstat_dumper_cmd_simple {
 	u64 cmd;
 };
@@ -71,12 +93,19 @@ struct hstat_dumper_cmd_grp_load {
 	u64 cmd;
 };
 
+struct hstat_dumper_cmd_grp_iter {
+	struct hstat_qualifier_state *quals;
+	const struct rtnl_hstat_group *grp;
+	u64 cmd;
+};
+
 struct hstat_dumper_cmd_grp_dump {
 	const struct rtnl_hstat_group *grp;
 	u64 cmd;
 };
 
 struct hstat_dumper_cmd_grp_close {
+	unsigned long quals_set;
 	struct nlattr *nl_attr;
 	u64 cmd;
 };
@@ -85,11 +114,14 @@ struct hstat_dumper_cmd_grp_close {
 static const int rtnl_qual2ifla[RTNL_HSTATS_QUAL_CNT] = {
 	[RTNL_HSTATS_QUAL_TYPE]		= IFLA_HSTATS_QUAL_TYPE,
 	[RTNL_HSTATS_QUAL_DIRECTION]	= IFLA_HSTATS_QUAL_DIRECTION,
+	[RTNL_HSTATS_QUAL_QUEUE]	= IFLA_HSTATS_QUAL_QUEUE,
+	[RTNL_HSTATS_QUAL_PRIORITY]	= IFLA_HSTATS_QUAL_PRIORITY,
+	[RTNL_HSTATS_QUAL_TC]		= IFLA_HSTATS_QUAL_TC,
 };
 
 static bool rtnl_hstat_qualifier_present(const struct rtnl_hstat_qualifier *q)
 {
-	return q->constant;
+	return q->constant || q->max || q->get_max;
 }
 
 /* Dumper basics */
@@ -197,6 +229,20 @@ hstat_dumper_push_grp_load(struct hstat_dumper *dumper,
 	return __hstat_dumper_push_cmd(dumper, &cmd, sizeof(cmd));
 }
 
+static int
+hstat_dumper_push_grp_iter(struct hstat_dumper *dumper,
+			   const struct rtnl_hstat_group *grp,
+			   struct hstat_qualifier_state *quals)
+{
+	struct hstat_dumper_cmd_grp_iter cmd = {
+		.cmd =		HSTAT_DCMD_GRP_ITER,
+		.grp =		grp,
+		.quals =	quals,
+	};
+
+	return __hstat_dumper_push_cmd(dumper, &cmd, sizeof(cmd));
+}
+
 static int
 hstat_dumper_push_dump(struct hstat_dumper *dumper,
 		       const struct rtnl_hstat_group *grp)
@@ -215,6 +261,7 @@ hstat_dumper_push_grp_close(struct hstat_dumper *dumper, struct nlattr *nl_grp)
 	struct hstat_dumper_cmd_grp_close cmd = {
 		.cmd =		HSTAT_DCMD_GRP_CLOSE,
 		.nl_attr =	nl_grp,
+		.quals_set =	dumper->quals_set,
 	};
 
 	return __hstat_dumper_push_cmd(dumper, &cmd, sizeof(cmd));
@@ -303,12 +350,18 @@ hstat_dumper_put_qual(struct hstat_dumper *dumper, int i, u32 val)
 		return 0;
 	}
 
+	/* Qualifiers cannot be overwritten once set */
+	if (WARN_ON_ONCE(__test_and_set_bit(i, &dumper->quals_set)))
+		return -EINVAL;
+	dumper->quals[i] = val;
+
 	return nla_put_u32(dumper->skb, rtnl_qual2ifla[i], val);
 }
 
 /* Dumper handlers */
 static int hstat_dumper_grp_load(struct hstat_dumper *dumper)
 {
+	struct hstat_qualifier_state *quals = NULL;
 	struct hstat_dumper_cmd_grp_load cmd;
 	int i, err;
 
@@ -336,7 +389,89 @@ static int hstat_dumper_grp_load(struct hstat_dumper *dumper)
 			err = hstat_dumper_put_qual(dumper, i, q->constant);
 			if (err)
 				return err;
+		} else {
+			int max;
+
+			/* Each iteration point has its own set of iterators,
+			 * this allows iterating different group over different
+			 * sets of qualifiers.
+			 */
+			if (!quals) {
+				quals = kcalloc(RTNL_HSTATS_QUAL_CNT,
+						sizeof(*quals), GFP_KERNEL);
+				if (!quals)
+					return -ENOMEM;
+			}
+
+			max = q->max ?: q->get_max(dumper->const_dev, cmd.grp);
+			if (max < 0)
+				return max;
+
+			if (WARN_ON_ONCE(q->min > max))
+				return -EINVAL;
+			quals[i].min = q->min;
+			quals[i].cur = q->min;
+			quals[i].max = max;
+		}
+	}
+
+	if (quals)
+		return hstat_dumper_push_grp_iter(dumper, cmd.grp, quals);
+	else
+		return hstat_dumper_push_dump(dumper, cmd.grp);
+}
+
+static int hstat_dumper_grp_iter(struct hstat_dumper *dumper)
+{
+	struct hstat_dumper_cmd_grp_iter cmd;
+	int i, err;
+	bool done;
+
+	err = hstat_dumper_pop(dumper, &cmd, sizeof(cmd));
+	if (err)
+		return err;
+	if (dumper->err) {
+		kfree(cmd.quals);
+		return 0;
+	}
+
+	/* Find out if iteration is done */
+	for (i = 0; i < RTNL_HSTATS_QUAL_CNT; i++)
+		if (cmd.quals[i].cur + 1 < cmd.quals[i].max)
+			break;
+	done = i == RTNL_HSTATS_QUAL_CNT;
+	if (!done) {
+		err = hstat_dumper_push_grp_iter(dumper, cmd.grp, cmd.quals);
+		if (err)
+			return err;
+	}
+
+	err = hstat_dumper_open_grp(dumper);
+	if (err)
+		return err;
+
+	for (i = 0; i < RTNL_HSTATS_QUAL_CNT; i++) {
+		if (!cmd.quals[i].max)
+			continue;
+
+		err = hstat_dumper_put_qual(dumper, i, cmd.quals[i].cur);
+		if (err)
+			return err;
+	}
+
+	if (!done) {
+		for (i = 0; i < RTNL_HSTATS_QUAL_CNT; i++) {
+			if (cmd.quals[i].cur >= cmd.quals[i].max)
+				continue;
+
+			cmd.quals[i].cur++;
+			if (cmd.quals[i].cur == cmd.quals[i].max)
+				cmd.quals[i].cur = cmd.quals[i].min;
+			else
+				break;
 		}
+	} else {
+		kfree(cmd.quals);
 	}
 
 	return hstat_dumper_push_dump(dumper, cmd.grp);
@@ -379,6 +514,7 @@ static int hstat_dumper_grp_close(struct hstat_dumper *dumper)
 	if (err)
 		return err;
 
+	dumper->quals_set = cmd.quals_set;
 	if (!dumper->err)
 		nla_nest_end(dumper->skb, cmd.nl_attr);
 	else
@@ -417,6 +553,9 @@ static int hstat_dumper_run(struct hstat_dumper *dumper)
 		case HSTAT_DCMD_ROOT_GRP_DONE:
 			err = hstat_dumper_root_grp_done(dumper);
 			break;
+		case HSTAT_DCMD_GRP_ITER:
+			err = hstat_dumper_grp_iter(dumper);
+			break;
 		case HSTAT_DCMD_GRP_LOAD:
 			err = hstat_dumper_grp_load(dumper);
 			break;
@@ -449,6 +588,21 @@ rtnl_hstat_add_grp(struct rtnl_hstat_req *req,
 }
 EXPORT_SYMBOL(rtnl_hstat_add_grp);
 
+bool rtnl_hstat_qual_is_set(struct rtnl_hstat_req *req, int qual)
+{
+	return test_bit(qual, &req->dumper->quals_set);
+}
+EXPORT_SYMBOL(rtnl_hstat_qual_is_set);
+
+int rtnl_hstat_qual_get(struct rtnl_hstat_req *req, int qual)
+{
+	if (!test_bit(qual, &req->dumper->quals_set))
+		return U32_MAX;
+
+	return req->dumper->quals[qual];
+}
+EXPORT_SYMBOL(rtnl_hstat_qual_get);
+
 /* Stack call points */
 static size_t
 __rtnl_get_link_hstats(struct sk_buff *skb, const struct net_device *const_dev,
-- 
2.19.2


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [RFC 07/14] net: hstats: help in iteration over directions
  2019-01-28 23:44 [RFC 00/14] netlink/hierarchical stats Jakub Kicinski
                   ` (5 preceding siblings ...)
  2019-01-28 23:44 ` [RFC 06/14] net: hstats: allow iterators Jakub Kicinski
@ 2019-01-28 23:45 ` Jakub Kicinski
  2019-01-28 23:45 ` [RFC 08/14] nfp: hstats: make use of iteration for direction Jakub Kicinski
                   ` (8 subsequent siblings)
  15 siblings, 0 replies; 27+ messages in thread
From: Jakub Kicinski @ 2019-01-28 23:45 UTC (permalink / raw)
  To: davem
  Cc: oss-drivers, netdev, jiri, f.fainelli, andrew, mkubecek, dsahern,
	simon.horman, jesse.brandeburg, maciejromanfijalkowski,
	vasundhara-v.volam, michael.chan, shalomt, idosch,
	Jakub Kicinski

Devices (or drivers) often keep the same statistics in both
directions, aid iterating over such statistics with some
helpers.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
---
 include/net/hstats.h | 15 +++++++++++++++
 1 file changed, 15 insertions(+)

diff --git a/include/net/hstats.h b/include/net/hstats.h
index 00f4d9334422..bb83f50768b1 100644
--- a/include/net/hstats.h
+++ b/include/net/hstats.h
@@ -69,6 +69,12 @@ void rtnl_hstat_add_grp(struct rtnl_hstat_req *req,
 bool rtnl_hstat_qual_is_set(struct rtnl_hstat_req *req, int qual);
 int rtnl_hstat_qual_get(struct rtnl_hstat_req *req, int qual);
 
+static inline bool rtnl_hstat_is_rx(struct rtnl_hstat_req *req)
+{
+	return rtnl_hstat_qual_get(req, RTNL_HSTATS_QUAL_DIRECTION) ==
+		IFLA_HSTATS_QUAL_DIR_RX;
+}
+
 static inline void
 rtnl_hstat_dump(struct rtnl_hstat_req *req, const int id, const u64 val)
 {
@@ -106,4 +112,13 @@ enum {
 	[RTNL_HSTATS_QUAL_DIRECTION] = {				\
 		.constant	= IFLA_HSTATS_QUAL_DIR_ ##dir,		\
 	}
+
+#define RTNL_HSTATS_QUALS_BASIC_BIDIR(type)				\
+	[RTNL_HSTATS_QUAL_TYPE] = {					\
+		.constant	= IFLA_HSTATS_QUAL_TYPE_ ##type,	\
+	},								\
+	[RTNL_HSTATS_QUAL_DIRECTION] = {				\
+		.min		= IFLA_HSTATS_QUAL_DIR_RX,		\
+		.max		= IFLA_HSTATS_QUAL_DIR_TX + 1,		\
+	}
 #endif
-- 
2.19.2


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [RFC 08/14] nfp: hstats: make use of iteration for direction
  2019-01-28 23:44 [RFC 00/14] netlink/hierarchical stats Jakub Kicinski
                   ` (6 preceding siblings ...)
  2019-01-28 23:45 ` [RFC 07/14] net: hstats: help in iteration over directions Jakub Kicinski
@ 2019-01-28 23:45 ` Jakub Kicinski
  2019-01-28 23:45 ` [RFC 09/14] nfp: hstats: add driver and device per queue statistics Jakub Kicinski
                   ` (7 subsequent siblings)
  15 siblings, 0 replies; 27+ messages in thread
From: Jakub Kicinski @ 2019-01-28 23:45 UTC (permalink / raw)
  To: davem
  Cc: oss-drivers, netdev, jiri, f.fainelli, andrew, mkubecek, dsahern,
	simon.horman, jesse.brandeburg, maciejromanfijalkowski,
	vasundhara-v.volam, michael.chan, shalomt, idosch,
	Jakub Kicinski

Basic RX and TX stats are the same, just at a slightly different
offset, use iteration to save code.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
---
 .../net/ethernet/netronome/nfp/nfp_hstat.c    | 48 +++++--------------
 1 file changed, 12 insertions(+), 36 deletions(-)

diff --git a/drivers/net/ethernet/netronome/nfp/nfp_hstat.c b/drivers/net/ethernet/netronome/nfp/nfp_hstat.c
index 9480d3b6caa5..d339008333bc 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_hstat.c
+++ b/drivers/net/ethernet/netronome/nfp/nfp_hstat.c
@@ -7,52 +7,29 @@
 
 /* NFD per-vNIC stats */
 static int
-nfp_hstat_vnic_nfd_basic_get_rx(struct net_device *netdev,
-				struct rtnl_hstat_req *req,
-				const struct rtnl_hstat_group *grp)
+nfp_hstat_vnic_nfd_basic_get(struct net_device *netdev,
+			     struct rtnl_hstat_req *req,
+			     const struct rtnl_hstat_group *grp)
 {
 	struct nfp_net *nn = netdev_priv(netdev);
+	u32 off;
 
-	rtnl_hstat_dump(req, IFLA_HSTATS_STAT_LINUX_PKTS,
-			nn_readq(nn, NFP_NET_CFG_STATS_RX_FRAMES));
-	rtnl_hstat_dump(req, IFLA_HSTATS_STAT_LINUX_BYTES,
-			nn_readq(nn, NFP_NET_CFG_STATS_RX_OCTETS));
-	return 0;
-}
-
-static const struct rtnl_hstat_group nfp_hstat_vnic_nfd_rx = {
-	.qualifiers = {
-		RTNL_HSTATS_QUALS_BASIC(DEV, RX),
-	},
-
-	.get_stats = nfp_hstat_vnic_nfd_basic_get_rx,
-	.stats	= {
-		[0] =	RTNL_HSTATS_STAT_LINUX_PKTS_BIT |
-			RTNL_HSTATS_STAT_LINUX_BYTES_BIT,
-	},
-	.stats_cnt = 2,
-};
-
-static int
-nfp_hstat_vnic_nfd_basic_get_tx(struct net_device *netdev,
-				struct rtnl_hstat_req *req,
-				const struct rtnl_hstat_group *grp)
-{
-	struct nfp_net *nn = netdev_priv(netdev);
+	off = rtnl_hstat_is_rx(req) ?
+		0 : NFP_NET_CFG_STATS_TX_OCTETS - NFP_NET_CFG_STATS_RX_OCTETS;
 
 	rtnl_hstat_dump(req, IFLA_HSTATS_STAT_LINUX_PKTS,
-			nn_readq(nn, NFP_NET_CFG_STATS_TX_FRAMES));
+			nn_readq(nn, NFP_NET_CFG_STATS_RX_FRAMES + off));
 	rtnl_hstat_dump(req, IFLA_HSTATS_STAT_LINUX_BYTES,
-			nn_readq(nn, NFP_NET_CFG_STATS_TX_OCTETS));
+			nn_readq(nn, NFP_NET_CFG_STATS_RX_OCTETS + off));
 	return 0;
 }
 
-static const struct rtnl_hstat_group nfp_hstat_vnic_nfd_tx = {
+static const struct rtnl_hstat_group nfp_hstat_vnic_nfd = {
 	.qualifiers = {
-		RTNL_HSTATS_QUALS_BASIC(DEV, TX),
+		RTNL_HSTATS_QUALS_BASIC_BIDIR(DEV),
 	},
 
-	.get_stats = nfp_hstat_vnic_nfd_basic_get_tx,
+	.get_stats = nfp_hstat_vnic_nfd_basic_get,
 	.stats	= {
 		[0] =	RTNL_HSTATS_STAT_LINUX_PKTS_BIT |
 			RTNL_HSTATS_STAT_LINUX_BYTES_BIT,
@@ -63,8 +40,7 @@ static const struct rtnl_hstat_group nfp_hstat_vnic_nfd_tx = {
 int nfp_net_hstat_get_groups(const struct net_device *netdev,
 			     struct rtnl_hstat_req *req)
 {
-	rtnl_hstat_add_grp(req, &nfp_hstat_vnic_nfd_rx);
-	rtnl_hstat_add_grp(req, &nfp_hstat_vnic_nfd_tx);
+	rtnl_hstat_add_grp(req, &nfp_hstat_vnic_nfd);
 
 	return 0;
 }
-- 
2.19.2


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [RFC 09/14] nfp: hstats: add driver and device per queue statistics
  2019-01-28 23:44 [RFC 00/14] netlink/hierarchical stats Jakub Kicinski
                   ` (7 preceding siblings ...)
  2019-01-28 23:45 ` [RFC 08/14] nfp: hstats: make use of iteration for direction Jakub Kicinski
@ 2019-01-28 23:45 ` Jakub Kicinski
  2019-01-28 23:45 ` [RFC 10/14] net: hstats: add IEEE 802.3 and common IETF MIB/RMON stats Jakub Kicinski
                   ` (6 subsequent siblings)
  15 siblings, 0 replies; 27+ messages in thread
From: Jakub Kicinski @ 2019-01-28 23:45 UTC (permalink / raw)
  To: davem
  Cc: oss-drivers, netdev, jiri, f.fainelli, andrew, mkubecek, dsahern,
	simon.horman, jesse.brandeburg, maciejromanfijalkowski,
	vasundhara-v.volam, michael.chan, shalomt, idosch,
	Jakub Kicinski

Report basic stats for all queues, per-queue.  Separate
device and driver statistics into different groups.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
---
 .../net/ethernet/netronome/nfp/nfp_hstat.c    | 133 ++++++++++++++++++
 1 file changed, 133 insertions(+)

diff --git a/drivers/net/ethernet/netronome/nfp/nfp_hstat.c b/drivers/net/ethernet/netronome/nfp/nfp_hstat.c
index d339008333bc..9300996e756c 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_hstat.c
+++ b/drivers/net/ethernet/netronome/nfp/nfp_hstat.c
@@ -37,9 +37,142 @@ static const struct rtnl_hstat_group nfp_hstat_vnic_nfd = {
 	.stats_cnt = 2,
 };
 
+/* NFD per-Q stats */
+static int
+nfp_hstat_vnic_nfd_get_queues(const struct net_device *dev,
+			      const struct rtnl_hstat_group *grp)
+{
+	struct nfp_net *nn = netdev_priv(dev);
+
+	return nn->max_r_vecs;
+}
+
+static int
+nfp_hstat_vnic_nfd_pq_get(struct net_device *dev,
+			  struct rtnl_hstat_req *req,
+			  const struct rtnl_hstat_group *grp)
+{
+	struct nfp_net *nn = netdev_priv(dev);
+	u32 queue, off;
+
+	queue = rtnl_hstat_qual_get(req, RTNL_HSTATS_QUAL_QUEUE);
+
+	off = NFP_NET_CFG_TXR_STATS(queue);
+	off += rtnl_hstat_is_rx(req) ?
+		0 : NFP_NET_CFG_RXR_STATS_BASE - NFP_NET_CFG_TXR_STATS_BASE;
+
+	rtnl_hstat_dump(req, IFLA_HSTATS_STAT_LINUX_PKTS, nn_readq(nn, off));
+	rtnl_hstat_dump(req, IFLA_HSTATS_STAT_LINUX_BYTES,
+			nn_readq(nn, off + 8));
+	return 0;
+}
+
+static const struct rtnl_hstat_group nfp_hstat_vnic_nfd_pq = {
+	.qualifiers = {
+		RTNL_HSTATS_QUALS_BASIC_BIDIR(DEV),
+		[RTNL_HSTATS_QUAL_QUEUE] = {
+			.get_max	= nfp_hstat_vnic_nfd_get_queues,
+		},
+	},
+
+	.get_stats = nfp_hstat_vnic_nfd_pq_get,
+	.stats	= {
+		[0] =	RTNL_HSTATS_STAT_LINUX_PKTS_BIT |
+			RTNL_HSTATS_STAT_LINUX_BYTES_BIT,
+	},
+	.stats_cnt = 2,
+};
+
+/* vNIC software stats */
+static int
+nfp_hstat_vnic_sw_rx_get(struct net_device *dev, struct rtnl_hstat_req *req,
+			 const struct rtnl_hstat_group *grp)
+{
+	struct nfp_net *nn = netdev_priv(dev);
+	struct nfp_net_r_vector *r_vec;
+
+	r_vec = &nn->r_vecs[rtnl_hstat_qual_get(req, RTNL_HSTATS_QUAL_QUEUE)];
+
+	rtnl_hstat_dump(req, IFLA_HSTATS_STAT_LINUX_PKTS, r_vec->rx_pkts);
+	rtnl_hstat_dump(req, IFLA_HSTATS_STAT_LINUX_BYTES, r_vec->rx_bytes);
+	rtnl_hstat_dump(req, IFLA_HSTATS_STAT_LINUX_CSUM_PARTIAL,
+			r_vec->hw_csum_rx_complete);
+	rtnl_hstat_dump(req, IFLA_HSTATS_STAT_LINUX_CSUM_UNNECESSARY,
+			r_vec->hw_csum_rx_ok + r_vec->hw_csum_rx_inner_ok);
+	return 0;
+}
+
+static const struct rtnl_hstat_group nfp_hstat_vnic_sw_rx = {
+	.qualifiers = {
+		RTNL_HSTATS_QUALS_BASIC(DRV, RX),
+		[RTNL_HSTATS_QUAL_QUEUE] = {
+			.get_max	= nfp_hstat_vnic_nfd_get_queues,
+		},
+	},
+
+	.get_stats = nfp_hstat_vnic_sw_rx_get,
+	.stats	= {
+		[0] =	RTNL_HSTATS_STAT_LINUX_PKTS_BIT |
+			RTNL_HSTATS_STAT_LINUX_BYTES_BIT |
+			RTNL_HSTATS_STAT_LINUX_CSUM_PARTIAL_BIT |
+			RTNL_HSTATS_STAT_LINUX_CSUM_UNNECESSARY_BIT,
+
+	},
+	.stats_cnt = 4,
+};
+
+static int
+nfp_hstat_vnic_sw_tx_get(struct net_device *dev, struct rtnl_hstat_req *req,
+			 const struct rtnl_hstat_group *grp)
+{
+	struct nfp_net *nn = netdev_priv(dev);
+	struct nfp_net_r_vector *r_vec;
+
+	r_vec = &nn->r_vecs[rtnl_hstat_qual_get(req, RTNL_HSTATS_QUAL_QUEUE)];
+
+	rtnl_hstat_dump(req, IFLA_HSTATS_STAT_LINUX_PKTS, r_vec->tx_pkts);
+	rtnl_hstat_dump(req, IFLA_HSTATS_STAT_LINUX_BYTES, r_vec->tx_bytes);
+	rtnl_hstat_dump(req, IFLA_HSTATS_STAT_LINUX_BUSY, r_vec->tx_busy);
+	rtnl_hstat_dump(req, IFLA_HSTATS_STAT_LINUX_CSUM_PARTIAL,
+			r_vec->hw_csum_tx + r_vec->hw_csum_tx_inner);
+	rtnl_hstat_dump(req, IFLA_HSTATS_STAT_LINUX_SEGMENTATION_OFFLOAD_PKTS,
+			r_vec->tx_lso);
+	return 0;
+}
+
+static const struct rtnl_hstat_group nfp_hstat_vnic_sw_tx = {
+	.qualifiers = {
+		RTNL_HSTATS_QUALS_BASIC(DRV, TX),
+		[RTNL_HSTATS_QUAL_QUEUE] = {
+			.get_max	= nfp_hstat_vnic_nfd_get_queues,
+		},
+	},
+
+	.get_stats = nfp_hstat_vnic_sw_tx_get,
+	.stats	= {
+		[0] =	RTNL_HSTATS_STAT_LINUX_PKTS_BIT |
+			RTNL_HSTATS_STAT_LINUX_BYTES_BIT |
+			RTNL_HSTATS_STAT_LINUX_BUSY_BIT |
+			RTNL_HSTATS_STAT_LINUX_CSUM_PARTIAL_BIT |
+			RTNL_HSTATS_STAT_LINUX_SEGMENTATION_OFFLOAD_PKTS_BIT,
+	},
+	.stats_cnt = 5,
+};
+
+static const struct rtnl_hstat_group nfp_hstat_vnic_sw = {
+	.has_children = true,
+	.children = {
+		&nfp_hstat_vnic_sw_rx,
+		&nfp_hstat_vnic_sw_tx,
+		NULL,
+	},
+};
+
 int nfp_net_hstat_get_groups(const struct net_device *netdev,
 			     struct rtnl_hstat_req *req)
 {
+	rtnl_hstat_add_grp(req, &nfp_hstat_vnic_sw);
+	rtnl_hstat_add_grp(req, &nfp_hstat_vnic_nfd_pq);
 	rtnl_hstat_add_grp(req, &nfp_hstat_vnic_nfd);
 
 	return 0;
-- 
2.19.2


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [RFC 10/14] net: hstats: add IEEE 802.3 and common IETF MIB/RMON stats
  2019-01-28 23:44 [RFC 00/14] netlink/hierarchical stats Jakub Kicinski
                   ` (8 preceding siblings ...)
  2019-01-28 23:45 ` [RFC 09/14] nfp: hstats: add driver and device per queue statistics Jakub Kicinski
@ 2019-01-28 23:45 ` Jakub Kicinski
  2019-01-28 23:45 ` [RFC 11/14] nfp: hstats: add IEEE/RMON ethernet port/MAC stats Jakub Kicinski
                   ` (5 subsequent siblings)
  15 siblings, 0 replies; 27+ messages in thread
From: Jakub Kicinski @ 2019-01-28 23:45 UTC (permalink / raw)
  To: davem
  Cc: oss-drivers, netdev, jiri, f.fainelli, andrew, mkubecek, dsahern,
	simon.horman, jesse.brandeburg, maciejromanfijalkowski,
	vasundhara-v.volam, michael.chan, shalomt, idosch,
	Jakub Kicinski

Add IDs for standard-based counters commonly maintained by ethernet
hardware (IEEE 802.3, IETF RFC2819, IETF RFC2863).  The lists are
not intended to be complete, reserve some ID space for adding
more stats in the middle if they are found useful later.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
---
 include/net/hstats.h         | 50 ++++++++++++++++++++++++++++++++++++
 include/uapi/linux/if_link.h | 50 ++++++++++++++++++++++++++++++++++++
 2 files changed, 100 insertions(+)

diff --git a/include/net/hstats.h b/include/net/hstats.h
index bb83f50768b1..dd6d5dc567cb 100644
--- a/include/net/hstats.h
+++ b/include/net/hstats.h
@@ -101,6 +101,56 @@ enum {
 	RTNL_HSTAT_BIT(LINUX_CSUM_COMPLETE, 0),
 	RTNL_HSTAT_BIT(LINUX_CSUM_UNNECESSARY, 0),
 	RTNL_HSTAT_BIT(LINUX_SEGMENTATION_OFFLOAD_PKTS, 0),
+
+	/* IETF RFC2819 ("Remote Network Monitoring MIB") */
+	RTNL_HSTAT_BIT(RFC2819_etherStatsDropEvents, 1),
+	RTNL_HSTAT_BIT(RFC2819_etherStatsOctets, 1),
+	RTNL_HSTAT_BIT(RFC2819_etherStatsPkts, 1),
+	RTNL_HSTAT_BIT(RFC2819_etherStatsBroadcastPkts, 1),
+	RTNL_HSTAT_BIT(RFC2819_etherStatsMulticastPkts, 1),
+	RTNL_HSTAT_BIT(RFC2819_etherStatsCRCAlignErrors, 1),
+	RTNL_HSTAT_BIT(RFC2819_etherStatsUndersizePkts, 1),
+	RTNL_HSTAT_BIT(RFC2819_etherStatsOversizePkts, 1),
+	RTNL_HSTAT_BIT(RFC2819_etherStatsFragments, 1),
+	RTNL_HSTAT_BIT(RFC2819_etherStatsJabbers, 1),
+	RTNL_HSTAT_BIT(RFC2819_etherStatsCollisions, 1),
+	RTNL_HSTAT_BIT(RFC2819_etherStatsPkts64Octets, 1),
+	RTNL_HSTAT_BIT(RFC2819_etherStatsPkts65to127Octets, 1),
+	RTNL_HSTAT_BIT(RFC2819_etherStatsPkts128to255Octets, 1),
+	RTNL_HSTAT_BIT(RFC2819_etherStatsPkts256to511Octets, 1),
+	RTNL_HSTAT_BIT(RFC2819_etherStatsPkts512to1023Octets, 1),
+	RTNL_HSTAT_BIT(RFC2819_etherStatsPkts1024to1518Octets, 1),
+	/* Extensions to IETF RFC2819 */
+	RTNL_HSTAT_BIT(RFC2819EXT_etherStatsPkts1024toMaxOctets, 1),
+	RTNL_HSTAT_BIT(RFC2819EXT_etherStatsPkts1519toMaxOctets, 1),
+	RTNL_HSTAT_BIT(RFC2819EXT_etherStatsPkts1024to2047Octets, 1),
+	RTNL_HSTAT_BIT(RFC2819EXT_etherStatsPkts2048to4095Octets, 1),
+	RTNL_HSTAT_BIT(RFC2819EXT_etherStatsPkts4096to8191Octets, 1),
+	RTNL_HSTAT_BIT(RFC2819EXT_etherStatsPkts8192toMaxOctets, 1),
+
+	/* IETF RFC2863 ("The Interfaces Group MIB") */
+	RTNL_HSTAT_BIT(RFC2863_UcastPkts, 1),
+	RTNL_HSTAT_BIT(RFC2863_Errors, 1),
+	RTNL_HSTAT_BIT(RFC2863_Discards, 1),
+
+	/* IEEE 802.3 */
+	RTNL_HSTAT_BIT(IEEE8023_FramesOK, 2),
+	RTNL_HSTAT_BIT(IEEE8023_OctetsOK, 2),
+	RTNL_HSTAT_BIT(IEEE8023_MulticastFramesOK, 2),
+	RTNL_HSTAT_BIT(IEEE8023_BroadcastFramesOK, 2),
+	RTNL_HSTAT_BIT(IEEE8023_FrameCheckSequenceErrors, 2),
+	RTNL_HSTAT_BIT(IEEE8023_AlignmentErrors, 2),
+	RTNL_HSTAT_BIT(IEEE8023_InRangeLengthErrors, 2),
+	RTNL_HSTAT_BIT(IEEE8023_OutOfRangeLengthField, 2),
+	RTNL_HSTAT_BIT(IEEE8023_FrameTooLongErrors, 2),
+	RTNL_HSTAT_BIT(IEEE8023_CollisionFrames, 2),
+	RTNL_HSTAT_BIT(IEEE8023_SQETestErrors, 2),
+	RTNL_HSTAT_BIT(IEEE8023_SymbolErrorDuringCarrier, 2),
+	RTNL_HSTAT_BIT(IEEE8023_MACControlFrames, 2),
+	RTNL_HSTAT_BIT(IEEE8023_UnsupportedOpcodes, 2),
+	RTNL_HSTAT_BIT(IEEE8023_PAUSEMACCtrlFrames, 2),
+	RTNL_HSTAT_BIT(IEEE8023_FECCorrectedBlocks, 2),
+	RTNL_HSTAT_BIT(IEEE8023_FECUncorrectableBlocks, 2),
 #undef RTNL_HSTAT_BIT
 };
 
diff --git a/include/uapi/linux/if_link.h b/include/uapi/linux/if_link.h
index b33d38ff5b47..c58468100a06 100644
--- a/include/uapi/linux/if_link.h
+++ b/include/uapi/linux/if_link.h
@@ -982,6 +982,56 @@ enum {
 	IFLA_HSTATS_STAT_LINUX_CSUM_UNNECESSARY,
 	IFLA_HSTATS_STAT_LINUX_SEGMENTATION_OFFLOAD_PKTS,
 
+	/* IETF RFC2819 ("Remote Network Monitoring MIB") */
+	IFLA_HSTATS_STAT_RFC2819_etherStatsDropEvents = 65, /* 1 */
+	IFLA_HSTATS_STAT_RFC2819_etherStatsOctets,
+	IFLA_HSTATS_STAT_RFC2819_etherStatsPkts,
+	IFLA_HSTATS_STAT_RFC2819_etherStatsBroadcastPkts,
+	IFLA_HSTATS_STAT_RFC2819_etherStatsMulticastPkts,
+	IFLA_HSTATS_STAT_RFC2819_etherStatsCRCAlignErrors,
+	IFLA_HSTATS_STAT_RFC2819_etherStatsUndersizePkts,
+	IFLA_HSTATS_STAT_RFC2819_etherStatsOversizePkts,
+	IFLA_HSTATS_STAT_RFC2819_etherStatsFragments,
+	IFLA_HSTATS_STAT_RFC2819_etherStatsJabbers,
+	IFLA_HSTATS_STAT_RFC2819_etherStatsCollisions,
+	IFLA_HSTATS_STAT_RFC2819_etherStatsPkts64Octets,
+	IFLA_HSTATS_STAT_RFC2819_etherStatsPkts65to127Octets,
+	IFLA_HSTATS_STAT_RFC2819_etherStatsPkts128to255Octets,
+	IFLA_HSTATS_STAT_RFC2819_etherStatsPkts256to511Octets,
+	IFLA_HSTATS_STAT_RFC2819_etherStatsPkts512to1023Octets,
+	IFLA_HSTATS_STAT_RFC2819_etherStatsPkts1024to1518Octets,
+	/* Extensions to IETF RFC2819 */
+	IFLA_HSTATS_STAT_RFC2819EXT_etherStatsPkts1024toMaxOctets,
+	IFLA_HSTATS_STAT_RFC2819EXT_etherStatsPkts1519toMaxOctets,
+	IFLA_HSTATS_STAT_RFC2819EXT_etherStatsPkts1024to2047Octets,
+	IFLA_HSTATS_STAT_RFC2819EXT_etherStatsPkts2048to4095Octets,
+	IFLA_HSTATS_STAT_RFC2819EXT_etherStatsPkts4096to8191Octets,
+	IFLA_HSTATS_STAT_RFC2819EXT_etherStatsPkts8192toMaxOctets,
+
+	/* IETF RFC2863 ("The Interfaces Group MIB") */
+	IFLA_HSTATS_STAT_RFC2863_UcastPkts = 97,
+	IFLA_HSTATS_STAT_RFC2863_Errors,
+	IFLA_HSTATS_STAT_RFC2863_Discards,
+
+	/* IEEE 802.3 */
+	IFLA_HSTATS_STAT_IEEE8023_FramesOK = 129, /* 2 */
+	IFLA_HSTATS_STAT_IEEE8023_OctetsOK,
+	IFLA_HSTATS_STAT_IEEE8023_MulticastFramesOK,
+	IFLA_HSTATS_STAT_IEEE8023_BroadcastFramesOK,
+	IFLA_HSTATS_STAT_IEEE8023_FrameCheckSequenceErrors,
+	IFLA_HSTATS_STAT_IEEE8023_AlignmentErrors,
+	IFLA_HSTATS_STAT_IEEE8023_InRangeLengthErrors,
+	IFLA_HSTATS_STAT_IEEE8023_OutOfRangeLengthField,
+	IFLA_HSTATS_STAT_IEEE8023_FrameTooLongErrors,
+	IFLA_HSTATS_STAT_IEEE8023_CollisionFrames,
+	IFLA_HSTATS_STAT_IEEE8023_SQETestErrors,
+	IFLA_HSTATS_STAT_IEEE8023_SymbolErrorDuringCarrier,
+	IFLA_HSTATS_STAT_IEEE8023_MACControlFrames,
+	IFLA_HSTATS_STAT_IEEE8023_UnsupportedOpcodes,
+	IFLA_HSTATS_STAT_IEEE8023_PAUSEMACCtrlFrames,
+	IFLA_HSTATS_STAT_IEEE8023_FECCorrectedBlocks,
+	IFLA_HSTATS_STAT_IEEE8023_FECUncorrectableBlocks,
+
 	__IFLA_HSTATS_STAT_MAX,
 };
 #define IFLA_HSTATS_STAT_MAX (__IFLA_HSTATS_STAT_MAX - 1)
-- 
2.19.2


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [RFC 11/14] nfp: hstats: add IEEE/RMON ethernet port/MAC stats
  2019-01-28 23:44 [RFC 00/14] netlink/hierarchical stats Jakub Kicinski
                   ` (9 preceding siblings ...)
  2019-01-28 23:45 ` [RFC 10/14] net: hstats: add IEEE 802.3 and common IETF MIB/RMON stats Jakub Kicinski
@ 2019-01-28 23:45 ` Jakub Kicinski
  2019-01-28 23:45 ` [RFC 12/14] net: hstats: add markers for partial groups Jakub Kicinski
                   ` (4 subsequent siblings)
  15 siblings, 0 replies; 27+ messages in thread
From: Jakub Kicinski @ 2019-01-28 23:45 UTC (permalink / raw)
  To: davem
  Cc: oss-drivers, netdev, jiri, f.fainelli, andrew, mkubecek, dsahern,
	simon.horman, jesse.brandeburg, maciejromanfijalkowski,
	vasundhara-v.volam, michael.chan, shalomt, idosch,
	Jakub Kicinski

Add groups for MAC statistics, NFP's MAC/PHY maintains a mix
of IEEE 802.3 and RMON-compatible statistics.  Use the RMON
stats for driver drops and error counters as well.

Since the MAC stats are quite large initialize the hstat
bitmap when driver is loaded.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
---
 .../net/ethernet/netronome/nfp/nfp_hstat.c    | 257 +++++++++++++++++-
 drivers/net/ethernet/netronome/nfp/nfp_main.c |   1 +
 drivers/net/ethernet/netronome/nfp/nfp_main.h |   2 +
 3 files changed, 257 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/netronome/nfp/nfp_hstat.c b/drivers/net/ethernet/netronome/nfp/nfp_hstat.c
index 9300996e756c..cd97cd2676f6 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_hstat.c
+++ b/drivers/net/ethernet/netronome/nfp/nfp_hstat.c
@@ -4,6 +4,217 @@
 #include <net/hstats.h>
 
 #include "nfp_net.h"
+#include "nfp_port.h"
+
+/* MAC stats */
+static const struct nfp_stat_pair {
+	u32 attr;
+	u32 offset;
+} nfp_mac_stats_rx[] = {
+	{
+		IFLA_HSTATS_STAT_RFC2819_etherStatsOctets,
+		NFP_MAC_STATS_RX_IN_OCTETS
+	}, {
+		IFLA_HSTATS_STAT_IEEE8023_FrameTooLongErrors,
+		NFP_MAC_STATS_RX_FRAME_TOO_LONG_ERRORS
+	}, {
+		IFLA_HSTATS_STAT_IEEE8023_InRangeLengthErrors,
+		NFP_MAC_STATS_RX_RANGE_LENGTH_ERRORS
+	},
+	/* missing NFP_MAC_STATS_RX_VLAN_RECEIVED_OK, no standard counter */
+	{
+		IFLA_HSTATS_STAT_RFC2863_Errors,
+		NFP_MAC_STATS_RX_IN_ERRORS
+	}, {
+		IFLA_HSTATS_STAT_RFC2819_etherStatsBroadcastPkts,
+		NFP_MAC_STATS_RX_IN_BROADCAST_PKTS
+	}, {
+		IFLA_HSTATS_STAT_RFC2819_etherStatsDropEvents,
+		NFP_MAC_STATS_RX_DROP_EVENTS
+	}, {
+		IFLA_HSTATS_STAT_IEEE8023_AlignmentErrors,
+		NFP_MAC_STATS_RX_ALIGNMENT_ERRORS
+	}, {
+		IFLA_HSTATS_STAT_IEEE8023_PAUSEMACCtrlFrames,
+		NFP_MAC_STATS_RX_PAUSE_MAC_CTRL_FRAMES
+	}, {
+		IFLA_HSTATS_STAT_IEEE8023_FramesOK,
+		NFP_MAC_STATS_RX_FRAMES_RECEIVED_OK
+	}, {
+		IFLA_HSTATS_STAT_IEEE8023_FrameCheckSequenceErrors,
+		NFP_MAC_STATS_RX_FRAME_CHECK_SEQUENCE_ERRORS
+	}, {
+		IFLA_HSTATS_STAT_RFC2863_UcastPkts,
+		NFP_MAC_STATS_RX_UNICAST_PKTS
+	}, {
+		IFLA_HSTATS_STAT_RFC2819_etherStatsMulticastPkts,
+		NFP_MAC_STATS_RX_MULTICAST_PKTS
+	}, {
+		IFLA_HSTATS_STAT_RFC2819_etherStatsPkts,
+		NFP_MAC_STATS_RX_PKTS
+	}, {
+		IFLA_HSTATS_STAT_RFC2819_etherStatsUndersizePkts,
+		NFP_MAC_STATS_RX_UNDERSIZE_PKTS
+	}, {
+		IFLA_HSTATS_STAT_RFC2819_etherStatsPkts64Octets,
+		NFP_MAC_STATS_RX_PKTS_64_OCTETS
+	}, {
+		IFLA_HSTATS_STAT_RFC2819_etherStatsPkts65to127Octets,
+		NFP_MAC_STATS_RX_PKTS_65_TO_127_OCTETS
+	}, {
+		IFLA_HSTATS_STAT_RFC2819_etherStatsPkts512to1023Octets,
+		NFP_MAC_STATS_RX_PKTS_512_TO_1023_OCTETS
+	}, {
+		IFLA_HSTATS_STAT_RFC2819_etherStatsPkts1024to1518Octets,
+		NFP_MAC_STATS_RX_PKTS_1024_TO_1518_OCTETS
+	}, {
+		IFLA_HSTATS_STAT_RFC2819_etherStatsJabbers,
+		NFP_MAC_STATS_RX_JABBERS
+	}, {
+		IFLA_HSTATS_STAT_RFC2819_etherStatsFragments,
+		NFP_MAC_STATS_RX_FRAGMENTS
+	}, {
+		IFLA_HSTATS_STAT_RFC2819_etherStatsPkts128to255Octets,
+		NFP_MAC_STATS_RX_PKTS_128_TO_255_OCTETS
+	}, {
+		IFLA_HSTATS_STAT_RFC2819_etherStatsPkts256to511Octets,
+		NFP_MAC_STATS_RX_PKTS_256_TO_511_OCTETS
+	}, {
+		IFLA_HSTATS_STAT_RFC2819EXT_etherStatsPkts1519toMaxOctets,
+		NFP_MAC_STATS_RX_PKTS_1519_TO_MAX_OCTETS
+	}, {
+		IFLA_HSTATS_STAT_RFC2819_etherStatsOversizePkts,
+		NFP_MAC_STATS_RX_OVERSIZE_PKTS
+	}, {
+		IFLA_HSTATS_STAT_IEEE8023_MACControlFrames,
+		NFP_MAC_STATS_RX_MAC_CTRL_FRAMES_RECEIVED
+	}
+}, nfp_mac_stats_tx[] = {
+	{
+		IFLA_HSTATS_STAT_RFC2819_etherStatsOctets,
+		NFP_MAC_STATS_TX_OUT_OCTETS
+	},
+	/* missing NFP_MAC_STATS_TX_VLAN_TRANSMITTED_OK, no standard counter */
+	{
+		IFLA_HSTATS_STAT_RFC2863_Errors,
+		NFP_MAC_STATS_TX_OUT_ERRORS
+	}, {
+		IFLA_HSTATS_STAT_RFC2819_etherStatsBroadcastPkts,
+		NFP_MAC_STATS_TX_BROADCAST_PKTS
+	}, {
+		IFLA_HSTATS_STAT_RFC2819_etherStatsPkts64Octets,
+		NFP_MAC_STATS_TX_PKTS_64_OCTETS
+	}, {
+		IFLA_HSTATS_STAT_RFC2819_etherStatsPkts256to511Octets,
+		NFP_MAC_STATS_TX_PKTS_256_TO_511_OCTETS
+	}, {
+		IFLA_HSTATS_STAT_RFC2819_etherStatsPkts512to1023Octets,
+		NFP_MAC_STATS_TX_PKTS_512_TO_1023_OCTETS
+	}, {
+		IFLA_HSTATS_STAT_IEEE8023_PAUSEMACCtrlFrames,
+		NFP_MAC_STATS_TX_PAUSE_MAC_CTRL_FRAMES
+	}, {
+		IFLA_HSTATS_STAT_IEEE8023_FramesOK,
+		NFP_MAC_STATS_TX_FRAMES_TRANSMITTED_OK
+	}, {
+		IFLA_HSTATS_STAT_RFC2863_UcastPkts,
+		NFP_MAC_STATS_TX_UNICAST_PKTS
+	}, {
+		IFLA_HSTATS_STAT_RFC2819_etherStatsMulticastPkts,
+		NFP_MAC_STATS_TX_MULTICAST_PKTS
+	}, {
+		IFLA_HSTATS_STAT_RFC2819_etherStatsPkts65to127Octets,
+		NFP_MAC_STATS_TX_PKTS_65_TO_127_OCTETS
+	}, {
+		IFLA_HSTATS_STAT_RFC2819_etherStatsPkts128to255Octets,
+		NFP_MAC_STATS_TX_PKTS_128_TO_255_OCTETS
+	}, {
+		IFLA_HSTATS_STAT_RFC2819_etherStatsPkts1024to1518Octets,
+		NFP_MAC_STATS_TX_PKTS_1024_TO_1518_OCTETS
+	}, {
+		IFLA_HSTATS_STAT_RFC2819EXT_etherStatsPkts1519toMaxOctets,
+		NFP_MAC_STATS_TX_PKTS_1519_TO_MAX_OCTETS
+	}
+};
+
+static int
+nfp_hstat_mac_get(struct net_device *netdev, struct rtnl_hstat_req *req,
+		  const struct rtnl_hstat_group *grp)
+{
+	const struct nfp_stat_pair *pairs;
+	struct nfp_port *port;
+	unsigned int dir, i;
+
+	port = nfp_port_from_netdev(netdev);
+	if (!__nfp_port_get_eth_port(port) || !port->eth_stats)
+		return -EINVAL;
+
+	dir = rtnl_hstat_qual_get(req, RTNL_HSTATS_QUAL_DIRECTION);
+	pairs = dir == IFLA_HSTATS_QUAL_DIR_RX ?
+		nfp_mac_stats_rx : nfp_mac_stats_tx;
+
+	for (i = 0; i < grp->stats_cnt; i++)
+		rtnl_hstat_dump(req, pairs[i].attr,
+				readq(port->eth_stats + pairs[i].offset));
+	return 0;
+}
+
+static struct rtnl_hstat_group nfp_hstat_mac_rx __ro_after_init = {
+	.qualifiers = {
+		RTNL_HSTATS_QUALS_BASIC(DEV, RX),
+	},
+
+	.get_stats = nfp_hstat_mac_get,
+};
+
+static struct rtnl_hstat_group nfp_hstat_mac_tx __ro_after_init = {
+	.qualifiers = {
+		RTNL_HSTATS_QUALS_BASIC(DEV, TX),
+	},
+
+	.get_stats = nfp_hstat_mac_get,
+};
+
+static const struct rtnl_hstat_group nfp_hstat_mac = {
+	.has_children = true,
+	.children = {
+		&nfp_hstat_mac_rx,
+		&nfp_hstat_mac_tx,
+		NULL,
+	},
+};
+
+static int
+nfp_hstat_mac_head_drop(struct net_device *netdev, struct rtnl_hstat_req *req,
+			const struct rtnl_hstat_group *grp)
+{
+	struct nfp_port *port;
+	unsigned int off, dir;
+
+	port = nfp_port_from_netdev(netdev);
+	if (!__nfp_port_get_eth_port(port) || !port->eth_stats)
+		return -EINVAL;
+
+	dir = rtnl_hstat_qual_get(req, RTNL_HSTATS_QUAL_DIRECTION);
+	off = dir == IFLA_HSTATS_QUAL_DIR_RX ?
+		NFP_MAC_STATS_RX_MAC_HEAD_DROP : NFP_MAC_STATS_TX_QUEUE_DROP;
+
+	rtnl_hstat_dump(req, IFLA_HSTATS_STAT_RFC2819_etherStatsDropEvents,
+			readq(port->eth_stats + off));
+	return 0;
+}
+
+static const struct rtnl_hstat_group nfp_hstat_tm = {
+	.qualifiers = {
+		RTNL_HSTATS_QUALS_BASIC_BIDIR(DEV),
+	},
+
+	.get_stats = nfp_hstat_mac_head_drop,
+	.stats	= {
+		[1] =	RTNL_HSTATS_STAT_RFC2819_etherStatsDropEvents_BIT,
+	},
+	.stats_cnt = 1,
+};
 
 /* NFD per-vNIC stats */
 static int
@@ -21,6 +232,10 @@ nfp_hstat_vnic_nfd_basic_get(struct net_device *netdev,
 			nn_readq(nn, NFP_NET_CFG_STATS_RX_FRAMES + off));
 	rtnl_hstat_dump(req, IFLA_HSTATS_STAT_LINUX_BYTES,
 			nn_readq(nn, NFP_NET_CFG_STATS_RX_OCTETS + off));
+	rtnl_hstat_dump(req, IFLA_HSTATS_STAT_RFC2863_Errors,
+			nn_readq(nn, NFP_NET_CFG_STATS_RX_ERRORS + off));
+	rtnl_hstat_dump(req, IFLA_HSTATS_STAT_RFC2863_Discards,
+			nn_readq(nn, NFP_NET_CFG_STATS_RX_DISCARDS + off));
 	return 0;
 }
 
@@ -33,8 +248,10 @@ static const struct rtnl_hstat_group nfp_hstat_vnic_nfd = {
 	.stats	= {
 		[0] =	RTNL_HSTATS_STAT_LINUX_PKTS_BIT |
 			RTNL_HSTATS_STAT_LINUX_BYTES_BIT,
+		[2] =	RTNL_HSTATS_STAT_RFC2863_Errors_BIT |
+			RTNL_HSTATS_STAT_RFC2863_Discards_BIT,
 	},
-	.stats_cnt = 2,
+	.stats_cnt = 4,
 };
 
 /* NFD per-Q stats */
@@ -99,6 +316,8 @@ nfp_hstat_vnic_sw_rx_get(struct net_device *dev, struct rtnl_hstat_req *req,
 			r_vec->hw_csum_rx_complete);
 	rtnl_hstat_dump(req, IFLA_HSTATS_STAT_LINUX_CSUM_UNNECESSARY,
 			r_vec->hw_csum_rx_ok + r_vec->hw_csum_rx_inner_ok);
+	rtnl_hstat_dump(req, IFLA_HSTATS_STAT_RFC2819_etherStatsDropEvents,
+			r_vec->rx_drops);
 	return 0;
 }
 
@@ -116,9 +335,10 @@ static const struct rtnl_hstat_group nfp_hstat_vnic_sw_rx = {
 			RTNL_HSTATS_STAT_LINUX_BYTES_BIT |
 			RTNL_HSTATS_STAT_LINUX_CSUM_PARTIAL_BIT |
 			RTNL_HSTATS_STAT_LINUX_CSUM_UNNECESSARY_BIT,
+		[1] =	RTNL_HSTATS_STAT_RFC2819_etherStatsDropEvents_BIT,
 
 	},
-	.stats_cnt = 4,
+	.stats_cnt = 5,
 };
 
 static int
@@ -137,6 +357,7 @@ nfp_hstat_vnic_sw_tx_get(struct net_device *dev, struct rtnl_hstat_req *req,
 			r_vec->hw_csum_tx + r_vec->hw_csum_tx_inner);
 	rtnl_hstat_dump(req, IFLA_HSTATS_STAT_LINUX_SEGMENTATION_OFFLOAD_PKTS,
 			r_vec->tx_lso);
+	rtnl_hstat_dump(req, IFLA_HSTATS_STAT_RFC2863_Errors, r_vec->tx_errors);
 	return 0;
 }
 
@@ -155,8 +376,9 @@ static const struct rtnl_hstat_group nfp_hstat_vnic_sw_tx = {
 			RTNL_HSTATS_STAT_LINUX_BUSY_BIT |
 			RTNL_HSTATS_STAT_LINUX_CSUM_PARTIAL_BIT |
 			RTNL_HSTATS_STAT_LINUX_SEGMENTATION_OFFLOAD_PKTS_BIT,
+		[1] =	RTNL_HSTATS_STAT_RFC2863_Errors_BIT,
 	},
-	.stats_cnt = 5,
+	.stats_cnt = 6,
 };
 
 static const struct rtnl_hstat_group nfp_hstat_vnic_sw = {
@@ -171,9 +393,38 @@ static const struct rtnl_hstat_group nfp_hstat_vnic_sw = {
 int nfp_net_hstat_get_groups(const struct net_device *netdev,
 			     struct rtnl_hstat_req *req)
 {
+	struct nfp_port *port;
+
 	rtnl_hstat_add_grp(req, &nfp_hstat_vnic_sw);
 	rtnl_hstat_add_grp(req, &nfp_hstat_vnic_nfd_pq);
 	rtnl_hstat_add_grp(req, &nfp_hstat_vnic_nfd);
 
+	port = nfp_port_from_netdev(netdev);
+	if (__nfp_port_get_eth_port(port) && port->eth_stats) {
+		rtnl_hstat_add_grp(req, &nfp_hstat_tm);
+		rtnl_hstat_add_grp(req, &nfp_hstat_mac);
+	}
+
 	return 0;
 }
+
+void __init nfp_net_hstat_init(void)
+{
+	unsigned int i;
+
+	for (i = 0; i < ARRAY_SIZE(nfp_mac_stats_rx); i++) {
+		unsigned int attr;
+
+		attr = nfp_mac_stats_rx[i].attr;
+		nfp_hstat_mac_rx.stats[attr / 64] |= BIT_ULL(attr % 64);
+	}
+	nfp_hstat_mac_rx.stats_cnt = ARRAY_SIZE(nfp_mac_stats_rx);
+
+	for (i = 0; i < ARRAY_SIZE(nfp_mac_stats_tx); i++) {
+		unsigned int attr;
+
+		attr = nfp_mac_stats_tx[i].attr;
+		nfp_hstat_mac_tx.stats[attr / 64] |= BIT_ULL(attr % 64);
+	}
+	nfp_hstat_mac_tx.stats_cnt = ARRAY_SIZE(nfp_mac_stats_tx);
+}
diff --git a/drivers/net/ethernet/netronome/nfp/nfp_main.c b/drivers/net/ethernet/netronome/nfp/nfp_main.c
index 6c10e8d119e4..4e366fde3478 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_main.c
+++ b/drivers/net/ethernet/netronome/nfp/nfp_main.c
@@ -724,6 +724,7 @@ static int __init nfp_main_init(void)
 		nfp_driver_name);
 
 	nfp_net_debugfs_create();
+	nfp_net_hstat_init();
 
 	err = pci_register_driver(&nfp_pci_driver);
 	if (err < 0)
diff --git a/drivers/net/ethernet/netronome/nfp/nfp_main.h b/drivers/net/ethernet/netronome/nfp/nfp_main.h
index a3613a2e0aa5..367afb869df6 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_main.h
+++ b/drivers/net/ethernet/netronome/nfp/nfp_main.h
@@ -145,6 +145,8 @@ extern struct pci_driver nfp_netvf_pci_driver;
 
 extern const struct devlink_ops nfp_devlink_ops;
 
+void nfp_net_hstat_init(void);
+
 int nfp_net_pci_probe(struct nfp_pf *pf);
 void nfp_net_pci_remove(struct nfp_pf *pf);
 
-- 
2.19.2


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [RFC 12/14] net: hstats: add markers for partial groups
  2019-01-28 23:44 [RFC 00/14] netlink/hierarchical stats Jakub Kicinski
                   ` (10 preceding siblings ...)
  2019-01-28 23:45 ` [RFC 11/14] nfp: hstats: add IEEE/RMON ethernet port/MAC stats Jakub Kicinski
@ 2019-01-28 23:45 ` Jakub Kicinski
  2019-01-28 23:45 ` [RFC 13/14] nfp: hstats: add a partial group of per-8021Q prio stats Jakub Kicinski
                   ` (3 subsequent siblings)
  15 siblings, 0 replies; 27+ messages in thread
From: Jakub Kicinski @ 2019-01-28 23:45 UTC (permalink / raw)
  To: davem
  Cc: oss-drivers, netdev, jiri, f.fainelli, andrew, mkubecek, dsahern,
	simon.horman, jesse.brandeburg, maciejromanfijalkowski,
	vasundhara-v.volam, michael.chan, shalomt, idosch,
	Jakub Kicinski

Hstats try to be hierarchy-neutral to the consumer, i.e. if users
only care about total statistics they should be able to add given
statistic throughout the hierarchy.  Events can't be counted
multiple times (inside one root group, that's why multiple are
allowed), and they can't be missing.

Sometimes, however, HW doesn't give use enough information, or
software may not see all events it wants to count.  Add an attribute
to indicate to user space that given statistic group does not report
all events reliably.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
---
 include/net/hstats.h         |  2 ++
 include/uapi/linux/if_link.h |  9 +++++++++
 net/core/hstats.c            | 17 +++++++++++++++++
 3 files changed, 28 insertions(+)

diff --git a/include/net/hstats.h b/include/net/hstats.h
index dd6d5dc567cb..3e10694d3d66 100644
--- a/include/net/hstats.h
+++ b/include/net/hstats.h
@@ -45,6 +45,7 @@ struct rtnl_hstat_qualifier {
  * struct rtnl_hstat_group - node in the hstat hierarchy
  * @qualifiers:	attributes describing this group
  * @has_children: @children array is present and NULL-terminated
+ * @partial_flags: partial flags
  * @stats_cnt:	number of stats in the bitmask
  * @stats:	bitmask of stats present
  * @get_stats:	driver callback for dumping the stats
@@ -54,6 +55,7 @@ struct rtnl_hstat_qualifier {
 struct rtnl_hstat_group {
 	/* Note: this is *not* indexed with IFLA_* attributes! */
 	struct rtnl_hstat_qualifier qualifiers[RTNL_HSTATS_QUAL_CNT];
+	u8 partial_flags;
 	bool has_children;
 	/* Can't use bitmaps - words are variable length */
 	unsigned int stats_cnt;
diff --git a/include/uapi/linux/if_link.h b/include/uapi/linux/if_link.h
index c58468100a06..28ac4574bb0e 100644
--- a/include/uapi/linux/if_link.h
+++ b/include/uapi/linux/if_link.h
@@ -946,6 +946,7 @@ enum {
 	IFLA_HSTATS_UNSPEC,
 	IFLA_HSTATS_GROUP,
 	IFLA_HSTATS_STATS,
+	IFLA_HSTATS_PARTIAL,
 	IFLA_HSTATS_QUAL_TYPE,
 	IFLA_HSTATS_QUAL_DIRECTION,
 	IFLA_HSTATS_QUAL_QUEUE,
@@ -955,6 +956,14 @@ enum {
 };
 #define IFLA_HSTATS_MAX (__IFLA_HSTATS_MAX - 1)
 
+enum {
+	IFLA_HSTATS_PARTIAL_UNSPEC,
+	IFLA_HSTATS_PARTIAL_CLASSIFIER = 1,
+	IFLA_HSTATS_PARTIAL_SLOWPATH = 2,
+	__IFLA_HSTATS_PARTIAL_MAX,
+};
+#define IFLA_HSTATS_PARTIAL_MAX (__IFLA_HSTATS_PARTIAL_MAX - 1)
+
 enum {
 	IFLA_HSTATS_QUAL_TYPE_UNSPEC,
 	IFLA_HSTATS_QUAL_TYPE_DEV,
diff --git a/net/core/hstats.c b/net/core/hstats.c
index c689ebfdaeaa..c9b7264d98dc 100644
--- a/net/core/hstats.c
+++ b/net/core/hstats.c
@@ -358,6 +358,17 @@ hstat_dumper_put_qual(struct hstat_dumper *dumper, int i, u32 val)
 	return nla_put_u32(dumper->skb, rtnl_qual2ifla[i], val);
 }
 
+static int
+hstat_dumper_put_partials(struct hstat_dumper *dumper, u32 val)
+{
+	if (dumper->sizing) {
+		dumper->size += nla_total_size(sizeof(u32));
+		return 0;
+	}
+
+	return nla_put_u32(dumper->skb, IFLA_HSTATS_PARTIAL, val);
+}
+
 /* Dumper handlers */
 static int hstat_dumper_grp_load(struct hstat_dumper *dumper)
 {
@@ -378,6 +389,12 @@ static int hstat_dumper_grp_load(struct hstat_dumper *dumper)
 	if (err)
 		return err;
 
+	if (cmd.grp->partial_flags) {
+		err = hstat_dumper_put_partials(dumper, cmd.grp->partial_flags);
+		if (err)
+			return err;
+	}
+
 	for (i = 0; i < RTNL_HSTATS_QUAL_CNT; i++) {
 		const struct rtnl_hstat_qualifier *q;
 
-- 
2.19.2


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [RFC 13/14] nfp: hstats: add a partial group of per-8021Q prio stats
  2019-01-28 23:44 [RFC 00/14] netlink/hierarchical stats Jakub Kicinski
                   ` (11 preceding siblings ...)
  2019-01-28 23:45 ` [RFC 12/14] net: hstats: add markers for partial groups Jakub Kicinski
@ 2019-01-28 23:45 ` Jakub Kicinski
  2019-01-28 23:45 ` [RFC 14/14] Documentation: networking: describe new hstat API Jakub Kicinski
                   ` (2 subsequent siblings)
  15 siblings, 0 replies; 27+ messages in thread
From: Jakub Kicinski @ 2019-01-28 23:45 UTC (permalink / raw)
  To: davem
  Cc: oss-drivers, netdev, jiri, f.fainelli, andrew, mkubecek, dsahern,
	simon.horman, jesse.brandeburg, maciejromanfijalkowski,
	vasundhara-v.volam, michael.chan, shalomt, idosch,
	Jakub Kicinski

The MAC maintains counts of PFC per-prio pause frames.  Unfortunately,
I couldn't find such counters in any standard, so add a partial
group (HW doesn't give us "non-PFC" count) with those counters.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
---
 .../net/ethernet/netronome/nfp/nfp_hstat.c    | 44 +++++++++++++++++++
 1 file changed, 44 insertions(+)

diff --git a/drivers/net/ethernet/netronome/nfp/nfp_hstat.c b/drivers/net/ethernet/netronome/nfp/nfp_hstat.c
index cd97cd2676f6..ced4edca8b73 100644
--- a/drivers/net/ethernet/netronome/nfp/nfp_hstat.c
+++ b/drivers/net/ethernet/netronome/nfp/nfp_hstat.c
@@ -216,6 +216,49 @@ static const struct rtnl_hstat_group nfp_hstat_tm = {
 	.stats_cnt = 1,
 };
 
+static int
+nfp_hstat_mac_pp_pause(struct net_device *netdev, struct rtnl_hstat_req *req,
+		       const struct rtnl_hstat_group *grp)
+{
+	static const u32 remap_rx[] = {
+		0xe0, 0xe8, 0xb0, 0xb8, 0xf0, 0xf7, 0x100, 0x108
+	};
+	static const u32 remap_tx[] = {
+		0x1c0, 0x1c8, 0x1e0, 0x1e8, 0x1d0, 0x1d8, 0x1f0, 0x1f8
+	};
+	struct nfp_port *port;
+	const u32 *remap;
+	u8 dir, prio;
+
+	port = nfp_port_from_netdev(netdev);
+	if (!__nfp_port_get_eth_port(port) || !port->eth_stats)
+		return -EINVAL;
+
+	prio = rtnl_hstat_qual_get(req, RTNL_HSTATS_QUAL_PRIORITY);
+	dir = rtnl_hstat_qual_get(req, RTNL_HSTATS_QUAL_DIRECTION);
+	remap = dir == IFLA_HSTATS_QUAL_DIR_RX ? remap_rx : remap_tx;
+
+	rtnl_hstat_dump(req, IFLA_HSTATS_STAT_IEEE8023_PAUSEMACCtrlFrames,
+			readq(port->eth_stats + remap[prio]));
+	return 0;
+}
+
+static const struct rtnl_hstat_group nfp_hstat_pp_pause = {
+	.qualifiers = {
+		RTNL_HSTATS_QUALS_BASIC_BIDIR(DEV),
+		[RTNL_HSTATS_QUAL_PRIORITY] = {
+			.max	= 8,
+		},
+	},
+	.partial_flags = IFLA_HSTATS_PARTIAL_CLASSIFIER,
+
+	.get_stats = nfp_hstat_mac_pp_pause,
+	.stats	= {
+		[1] =	RTNL_HSTATS_STAT_IEEE8023_PAUSEMACCtrlFrames_BIT,
+	},
+	.stats_cnt = 1,
+};
+
 /* NFD per-vNIC stats */
 static int
 nfp_hstat_vnic_nfd_basic_get(struct net_device *netdev,
@@ -402,6 +445,7 @@ int nfp_net_hstat_get_groups(const struct net_device *netdev,
 	port = nfp_port_from_netdev(netdev);
 	if (__nfp_port_get_eth_port(port) && port->eth_stats) {
 		rtnl_hstat_add_grp(req, &nfp_hstat_tm);
+		rtnl_hstat_add_grp(req, &nfp_hstat_pp_pause);
 		rtnl_hstat_add_grp(req, &nfp_hstat_mac);
 	}
 
-- 
2.19.2


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [RFC 14/14] Documentation: networking: describe new hstat API
  2019-01-28 23:44 [RFC 00/14] netlink/hierarchical stats Jakub Kicinski
                   ` (12 preceding siblings ...)
  2019-01-28 23:45 ` [RFC 13/14] nfp: hstats: add a partial group of per-8021Q prio stats Jakub Kicinski
@ 2019-01-28 23:45 ` Jakub Kicinski
  2019-01-30 22:14 ` [RFC 00/14] netlink/hierarchical stats Roopa Prabhu
  2019-02-06 20:12 ` Florian Fainelli
  15 siblings, 0 replies; 27+ messages in thread
From: Jakub Kicinski @ 2019-01-28 23:45 UTC (permalink / raw)
  To: davem
  Cc: oss-drivers, netdev, jiri, f.fainelli, andrew, mkubecek, dsahern,
	simon.horman, jesse.brandeburg, maciejromanfijalkowski,
	vasundhara-v.volam, michael.chan, shalomt, idosch,
	Jakub Kicinski

Add hstat documentation for users and driver developers.

Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
---
 Documentation/networking/hstats.rst           | 590 ++++++++++++++++++
 .../networking/hstats_flow_example.dot        |  11 +
 Documentation/networking/index.rst            |   1 +
 3 files changed, 602 insertions(+)
 create mode 100644 Documentation/networking/hstats.rst
 create mode 100644 Documentation/networking/hstats_flow_example.dot

diff --git a/Documentation/networking/hstats.rst b/Documentation/networking/hstats.rst
new file mode 100644
index 000000000000..8856770a4909
--- /dev/null
+++ b/Documentation/networking/hstats.rst
@@ -0,0 +1,590 @@
+.. SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
+
+=================================
+Netlink device hierarchical stats
+=================================
+
+Overview
+========
+
+*hstats* are standardized driver statistics.  Both device and host driver
+statistics can be exposed.
+
+*hstats* are a partial replacement for `ethtool -S`, standard-based statistics
+and statistics common across many drivers should be added to *hstats*.
+Once statistic is exposed via *hstats* it should not be added to `ethtool -S`
+(drivers which already expose it can keep doing so, but no new driver allowed).
+
+Apart from common statistics identifiers *hstats* provide:
+ - clear indication if statistic is maintained (unlike more compact
+   fixed-structure stats, e.g. `struct rtnl_link_stats64`);
+ - hierarchical structure (see :ref:`groups` and :ref:`hierarchies`);
+ - exposes :ref:`qualifiers` - these are attributes of the statistics which
+   help determine the source of the statistics;
+ - provide flexible grouping mechanism which should reduce code duplication
+   amongst drivers.
+
+Netlink message structure
+=========================
+
+Basic structure of the `IFLA_STATS_LINK_HSTATS` attribute looks like this:
+
+::
+
+ [IFLA_STATS_LINK_HSTATS]
+     [IFLA_HSTATS_GROUP]
+         [IFLA_HSTATS_QUAL_xxx]
+         [IFLA_HSTATS_QUAL_yyy]
+         [IFLA_HSTATS_STATS]
+             [IFLA_HSTATS_STAT_a]
+             [IFLA_HSTATS_STAT_b]
+         [IFLA_HSTATS_GROUP]
+             [IFLA_HSTATS_QUAL_zzz]
+             [IFLA_HSTATS_STATS]
+                 [IFLA_HSTATS_STAT_c]
+                 [IFLA_HSTATS_STAT_d]
+         [IFLA_HSTATS_GROUP]
+             [IFLA_HSTATS_QUAL_zzz]
+             [IFLA_HSTATS_STATS]
+                 [IFLA_HSTATS_STAT_c]
+                 [IFLA_HSTATS_STAT_d]
+
+.. _groups:
+
+Groups
+------
+
+`IFLA_STATS_LINK_HSTATS` may contain zero or more `IFLA_HSTATS_GROUP`.
+
+The statistics within a root level `IFLA_HSTATS_GROUP` must not count
+events multiple times.
+
+The statistics within a root level `IFLA_HSTATS_GROUP` must include all
+of the events, excluding groups where `IFLA_HSTATS_PARTIAL` attribute is
+present and set to non-zero value or children of such groups
+(see :ref:`partials`).  This removes the need to sum up statistics in the
+kernel, for example statistics maintained per queue by the driver can
+be reported only per queue and user space can do the adding.
+
+The purpose of multiple root level groups is to allow users tracking the flow
+of packets through pipelines.  Let's consider the following RX NIC pipeline:
+
+.. _example_flow:
+
+.. kernel-figure::  hstats_flow_example.dot
+   :alt:    statistics flow
+
+   Example device statistics flow diagram
+
+Each rectangle represents a root level `IFLA_STATS_LINK_HSTATS`.  Users should
+be able to add all `IFLA_HSTATS_STAT_IEEE8023_FramesOK` within each of those
+groups, and compare them to see at which stage of the processing pipeline
+packets disappear.
+
+.. _qualifiers:
+
+Qualifiers
+----------
+
+Each `IFLA_HSTATS_GROUP` may have a set of qualifiers.  Qualifiers are
+attributes which apply to the statistics like being a device statistic,
+being an RX statistic, or counting events/packets of a specific queue.
+See :ref:`qualifiers_list`.
+
+.. _hierarchies:
+
+Hierarchies
+-----------
+
+Each `IFLA_HSTATS_GROUP` can have further `IFLA_HSTATS_GROUP` s nested inside.
+Qualifiers of a parent group apply to the child group.  The hierarchies have no
+perscribed meaning, or set structure.  They are mostly present for the
+convenience of the driver authors.
+
+.. _partials:
+
+Partial groups
+--------------
+
+`IFLA_HSTATS_PARTIAL` attribute declares groups contents as incomplete.
+As explained in :ref:`groups` if a statistic is present it's sum within
+a root group must add up to total number of events seen at a given stage
+of processing.  This mostly relates to iterative qualifiers like
+:ref:`qual_queue` or :ref:`qual_prio`.
+
+Counting events multiple times towards different statistics, including
+similar statistics from different families (e.g. IEEE stats vs IETF stats)
+is perfectly normal and acceptable.
+
+`IFLA_HSTATS_PARTIAL` is inherited to children the same as :ref:`qualifiers`.
+It can carry one of the following flags:
+
+IFLA_HSTATS_PARTIAL_CLASSIFIER
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Statistics from this group contain more detailed breakdown which may not
+be complete.
+
+For example, HW may maintain the number of packets received per priority,
+but packets received without a priority tag may not be counted separately.
+Since non-tagged packets are not counted the sum of statistics will not
+be complete.
+
+IFLA_HSTATS_PARTIAL_SLOWPATH
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Statistics from this group count slow path events.  This attribute is used
+if hardware is capable of offload packet forwarding.  Some packets may still
+reach the CPU for example if they were purposefully trapped for monitoring,
+or no forwarding rule matched.
+
+User space guidelines
+=====================
+
+User space `hstat` tool can be used to query statistics.  This tools is part
+of the `iproute2` package.
+
+Device driver guidelines
+========================
+
+Defining groups
+---------------
+
+The definition of a group of statistics has four sections:
+
+.. code-block:: c
+
+  static const struct rtnl_hstat_group my_group = {
+	/* Qualifiers */
+	.qualifiers = {
+		[RTNL_HSTATS_QUAL_TYPE] = {
+			.constant	= IFLA_HSTATS_QUAL_TYPE_DEV,
+		},
+		[RTNL_HSTATS_QUAL_DIRECTION] = {
+			.constant	= IFLA_HSTATS_QUAL_DIR_RX,
+		},
+	},
+	/* Partials */
+	.partial_flags = IFLA_HSTATS_PARTIAL_CLASSIFIER,
+
+	/* Statistics */
+	.get_stats = nfp_hstat_vnic_nfd_basic_get,
+	.stats	= {
+		/* Word 0 stats */
+		[0] =	RTNL_HSTATS_STAT_LINUX_CSUM_COMPLETE_BIT,
+		/* Word 1 stats */
+		[1] =	RTNL_HSTATS_STAT_RFC2819_etherStatsPkts_BIT |
+			RTNL_HSTATS_STAT_RFC2819_etherStatsOctets_BIT,
+	},
+	.stats_cnt = 3,
+
+	/* Child groups */
+	.has_children	= true,
+	.children	= {
+		&my_child_group,
+		NULL
+	},
+  };
+
+Each section can be omitted, i.e. group may not specify any qualifiers,
+statistics or have no children.  :ref:`qual_type` and :ref:`qual_direction`
+qualifiers must be specified before any statistics is declared, that is to
+say current group or one of parent groups must set them before any statistics
+are reported.
+
+If any qualifier is iterative (`max` or `get_max` set instead of `constant`)
+the group will be dumped `max` times, the driver can find out about parameters
+of current invocation with `rtnl_hstat_qual_get` helper.
+
+Partial flags are optional, and mark group as incomplete (see :ref:`partials`).
+
+Statistics are dumped by invoking `get_stats` callback.  Driver should report
+the statistics with `rtnl_hstat_dump` helper.  The list of reported statistics
+contains all the statistics driver will dump.  It is used for sizing netlink
+replies and filtering.  Drivers must not dump more statistics or different
+statistics than defined.  The definition of the bitmap requires a little bit
+of care, as there is currently no automatic way to assign statistics to the
+correct 64 bit word.
+
+.. note::
+   Qualifier constants and parameters to `rtnl_hstat_dump` take constants
+   defined in include/uapi/linux/if_link.h, other constants come from
+   include/net/hstats.h and have the `RTNL_` prefix.
+
+If any child groups are attached the `has_children` member must be set to `true`
+and `children` table must be NULL-terminated.  Child groups may be used in
+different parent groups, but care has to be taken to never create loops.
+
+Root groups should generally not have the direction set as a constant.
+Most root groups should contain both RX and TX subgroups, rather than having
+RX-only root groups and TX-only root groups.
+
+Reporting groups
+----------------
+
+Drivers have to define one or more `rtnl_hstat_group`.
+
+They should then implement `ndo_hstat_get_groups` callback.  Inside the callback
+they can report every statistic group via `rtnl_hstat_add_grp` helper.
+
+Counter resets
+--------------
+
+Counters should not reset their values while the driver is bound to the device,
+e.g. when link is brought down.
+
+Counters may reset on driver load/probe or when device is explicitly reset
+via devlink.
+
+For iterative qualifiers like queues or priorities drivers should maintain
+statistics for all queues ever allocated (or simply always report for max).
+
+.. _qualifiers_list:
+
+List of qualifiers
+==================
+
+.. _qual_type:
+
+Type
+----
+
+Indicates the source of the statistic.  Statistics can either be maintained
+by the device or the driver.  Statistics may be calculated based on multiple
+counters (e.g. two counters may need to be added), but such calculation must
+not include statistics of different types.
+
+IFLA_HSTATS_QUAL_TYPE_DEV
+~~~~~~~~~~~~~~~~~~~~~~~~~
+Device statistic, counter is maintained by the device and only reported by
+the driver.  There is currently no distinction between hardware and
+microcode/firmware statistics.
+
+IFLA_HSTATS_QUAL_TYPE_DRV
+~~~~~~~~~~~~~~~~~~~~~~~~~
+Driver statistic, fully maintained by software.
+
+.. _qual_direction:
+
+Direction
+---------
+
+Direction allows drivers to indicate whether statistics is counting incoming
+or outgoing frames.  This is useful as it limmits the number of statistic IDs
+which have to be defined, but also often allows drivers to simplify the
+reporting if HW maintains the statitics for both directions in the same format,
+just in a different location.
+
+Direction and PCIe ports
+~~~~~~~~~~~~~~~~~~~~~~~~
+For PCIe port netdevs (representors) all statistics are expressed from
+device's prespective.  If frame is submitted for transmission on a PCIe
+interface it will be seen as "received" by the port (representor).
+
+This means it is possible to see transmission side-only statistics in receive
+direction and vice versa (e.g. `IFLA_HSTATS_STAT_LINUX_CSUM_PARTIAL`).
+
+Device statistics reported on port (representor) interfaces, count the total
+number of events on given device port.  This includes both frames forwarded
+via the "fast path" and explicitly directed to the device port from the
+port (representor) interface.
+
+Driver statistics necessarily count only the "fallback path" frames, and
+should always be reported with `IFLA_HSTATS_PARTIAL` set to
+`IFLA_HSTATS_PARTIAL_SLOWPATH`.
+
+IFLA_HSTATS_QUAL_DIR_RX
+~~~~~~~~~~~~~~~~~~~~~~~
+received
+
+IFLA_HSTATS_QUAL_DIR_TX
+~~~~~~~~~~~~~~~~~~~~~~~
+transmitted
+
+.. _qual_queue:
+
+Queue
+-----
+
+Queue (or channel) is a Receive Side Scaling construct allowing multiple
+consumers and producers to communicate with the device.  It is usually
+applicable to the PCIe interface "side" of the device.
+
+Number of reported queues is in no way indicative of how many queues are
+currently enabled on the system.
+
+.. _qual_prio:
+
+Priority
+--------
+
+Priority is defined as Layer 2 IEEE 802.1Q frame priority.
+
+Traffic class
+-------------
+
+Traffic class is any non-Layer 2 queuing and priritization.
+
+List of statistics
+==================
+
+All statistics are 64 bits.
+
+Common statistics
+-----------------
+
+Statistics from this section are not based on any standard but rather ones
+which commonly appear in Linux drivers.
+
+The fact that statistic ID is assigned for some event does not constitute
+a recommendation that given statistic is implemented by drivers.  Maintaining
+statistics has a performance cost associated and should be considered carefully.
+
+.. _stat_linux_pkts:
+
+IFLA_HSTATS_STAT_LINUX_PKTS
+~~~~~~~~~~~~~~~~~~~~~~~~~~~
+Number of packets which were successfully handed over to the next layer.
+This means packets passed up during reception and passed down during
+transmission.  This differs from both IETF and IEEE definition of basic
+packet counters, which generally count at the same layer boundary in both
+directions.
+
+No error or discard counters are included in this counter.
+
+IFLA_HSTATS_STAT_LINUX_BYTES
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+Same as :ref:`stat_linux_pkts` but counts bytes instead of packets.
+
+IFLA_HSTATS_STAT_LINUX_BUSY
+~~~~~~~~~~~~~~~~~~~~~~~~~~~
+Number of times the driver returned `NETDEV_TX_BUSY` to the stack from
+in TX handler.
+
+This statistic is defined in to-device direction only.
+
+IFLA_HSTATS_STAT_LINUX_CSUM_PARTIAL
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+Number of successfully handled descriptors with CHECKSUM_PARTIAL offload to
+the hardware.
+
+This statistic is defined in to-device direction only.
+
+IFLA_HSTATS_STAT_LINUX_CSUM_COMPLETE
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+Number of successfully handled descriptors indicating CHECKSUM_COMPLETE.
+
+This statistic is defined in from-device direction only.
+
+IFLA_HSTATS_STAT_LINUX_CSUM_UNNECESSARY
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+Number of successfully handled descriptors indicating CHECKSUM_UNNECESSARY.
+
+This statistic is defined in from-device direction only.
+
+IFLA_HSTATS_STAT_LINUX_SEGMENTATION_OFFLOAD_PKTS
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+Number of successfully handled segmentation- and reassembly-offload descriptors,
+regardless of transport protocol (TSO, USO, etc.)
+
+For transmission it means the number of frames requiring segmentation submitted
+successfully to the device's transmission function (driver) or number of
+correctly parsed descriptors for such packets (device).
+
+For reception it can be used to count Generic Receive Offload frames, but *not*
+Large Receive Offload frames.
+
+This statistic is incremented once per each frame pre-segmentation as seen
+by the Linux stack, not once per each frame on the wire.
+
+IETF RFC2819 statistics
+-----------------------
+IETF RFC2819 ("Remote Network Monitoring MIB")-compatible statistics.
+
+IFLA_HSTATS_STAT_RFC2819_etherStatsDropEvents
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+etherStatsDropEvents counter
+
+IFLA_HSTATS_STAT_RFC2819_etherStatsOctets
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+etherStatsOctets counter
+
+IFLA_HSTATS_STAT_RFC2819_etherStatsPkts
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+etherStatsPkts counter
+
+IFLA_HSTATS_STAT_RFC2819_etherStatsBroadcastPkts
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+etherStatsBroadcastPkts counter
+
+IFLA_HSTATS_STAT_RFC2819_etherStatsMulticastPkts
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+etherStatsMulticastPkts counter
+
+IFLA_HSTATS_STAT_RFC2819_etherStatsCRCAlignErrors
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+etherStatsCRCAlignErrors counter
+
+IFLA_HSTATS_STAT_RFC2819_etherStatsUndersizePkts
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+etherStatsUndersizePkts counter
+
+IFLA_HSTATS_STAT_RFC2819_etherStatsOversizePkts
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+etherStatsOversizePkts counter
+
+IFLA_HSTATS_STAT_RFC2819_etherStatsFragments
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+etherStatsFragments counter
+
+IFLA_HSTATS_STAT_RFC2819_etherStatsJabbers
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+etherStatsJabbers counter
+
+IFLA_HSTATS_STAT_RFC2819_etherStatsCollisions
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+etherStatsCollisions counter
+
+IFLA_HSTATS_STAT_RFC2819_etherStatsPkts64Octets
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+etherStatsPkts64Octets counter
+
+IFLA_HSTATS_STAT_RFC2819_etherStatsPkts65to127Octets
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+etherStatsPkts65to127Octets counter
+
+IFLA_HSTATS_STAT_RFC2819_etherStatsPkts128to255Octets
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+etherStatsPkts128to255Octets counter
+
+IFLA_HSTATS_STAT_RFC2819_etherStatsPkts256to511Octets
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+etherStatsPkts256to511Octets counter
+
+IFLA_HSTATS_STAT_RFC2819_etherStatsPkts512to1023Octets
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+etherStatsPkts512to1023Octets counter
+
+IFLA_HSTATS_STAT_RFC2819_etherStatsPkts1024to1518Octets
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+etherStatsPkts1024to1518Octets counter
+
+Extended size statistics
+------------------------
+IETF RFC2819 defines simple packet-size based statistics for packets between
+64 and 1518 octets.  However, larger frames are commonly supported.  This
+sroup defines additional packet-size based statistics.
+
+IFLA_HSTATS_STAT_RFC2819EXT_etherStatsPkts1024toMaxOctets
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+Definition follows etherStatsPkts65to127Octets with lower range bound of 1024,
+and no higher bound.
+
+IFLA_HSTATS_STAT_RFC2819EXT_etherStatsPkts1519toMaxOctets
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+Definition follows etherStatsPkts65to127Octets with lower range bound of 1519,
+and no higher bound.
+
+IFLA_HSTATS_STAT_RFC2819EXT_etherStatsPkts1024to2047Octets
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+Definition follows etherStatsPkts65to127Octets with lower range bound of 1024,
+and higher bound of 2047.
+
+IFLA_HSTATS_STAT_RFC2819EXT_etherStatsPkts2048to4095Octets
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+Definition follows etherStatsPkts65to127Octets with lower range bound of 2048,
+and higher bound of 4095.
+
+IFLA_HSTATS_STAT_RFC2819EXT_etherStatsPkts4096to8191Octets
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+Definition follows etherStatsPkts65to127Octets with lower range bound of 4096,
+and higher bound of 8191.
+
+IFLA_HSTATS_STAT_RFC2819EXT_etherStatsPkts8192toMaxOctets
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+Definition follows etherStatsPkts65to127Octets with lower range bound of 8192,
+and no higher bound.
+
+IETF RFC2863 statistics
+-----------------------
+IETF RFC2863 ("The Interfaces Group MIB")-compatible statistics.
+
+IFLA_HSTATS_STAT_RFC2863_errors
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+`ifInErrors` or `ifOutErrors` depending on :ref:`qual_direction`.
+
+IFLA_HSTATS_STAT_RFC2863_discards
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+`ifInDiscards` or `ifOutDiscards` depending on :ref:`qual_direction`.
+
+IEEE 802.3 statistics
+---------------------
+IEEE 802.3 standard statistics.  Statistics which indicate two corresponding
+IEEE 802.3 attributes can be used in both directions, those which only
+mention a single attribute are undefined in the other direction.
+
+IFLA_HSTATS_STAT_IEEE8023_FramesOK
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+`aFramesReceivedOK` or `aFramesTransmittedOK` depending
+on :ref:`qual_direction`.
+
+IFLA_HSTATS_STAT_IEEE8023_OctetsOK
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+`aOctetsReceivedOK` or `aOctetsTransmittedOK` depending
+on :ref:`qual_direction`.
+
+IFLA_HSTATS_STAT_IEEE8023_MulticastFramesOK
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+`aMulticastFramesReceivedOK` or `aMulticastFramesXmittedOK` depending
+on :ref:`qual_direction`.
+
+IFLA_HSTATS_STAT_IEEE8023_BroadcastFramesOK
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+`aBroadcastFramesReceivedOK` or `aBroadcastFramesXmittedOK` depending
+on :ref:`qual_direction`.
+
+IFLA_HSTATS_STAT_IEEE8023_FrameCheckSequenceErrors
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+`aFrameCheckSequenceErrors`
+
+IFLA_HSTATS_STAT_IEEE8023_AlignmentErrors
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+`aAlignmentErrors`
+
+IFLA_HSTATS_STAT_IEEE8023_InRangeLengthErrors
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+`aInRangeLengthErrors`
+
+IFLA_HSTATS_STAT_IEEE8023_OutOfRangeLengthField
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+`aOutOfRangeLengthField`
+
+IFLA_HSTATS_STAT_IEEE8023_FrameTooLongErrors
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+`aFrameTooLongErrors`
+
+IFLA_HSTATS_STAT_IEEE8023_SymbolErrorDuringCarrier
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+`aSymbolErrorDuringCarrier`
+
+IFLA_HSTATS_STAT_IEEE8023_MACControlFrames
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+`aMACControlFramesReceived` or `aMACControlFramesTransmitted` depending
+on :ref:`qual_direction`.
+
+IFLA_HSTATS_STAT_IEEE8023_UnsupportedOpcodes
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+`aUnsupportedOpcodesReceived`
+
+IFLA_HSTATS_STAT_IEEE8023_PAUSEMACCtrlFrames
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+`aPAUSEMACCtrlFramesReceived` or `aPAUSEMACCtrlFramesTransmitted` depending
+on :ref:`qual_direction`.
+
+IFLA_HSTATS_STAT_IEEE8023_FECCorrectedBlocks
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+`aFECCorrectedBlocks`
+
+IFLA_HSTATS_STAT_IEEE8023_FECUncorrectableBlocks
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+`aFECUncorrectableBlocks`
diff --git a/Documentation/networking/hstats_flow_example.dot b/Documentation/networking/hstats_flow_example.dot
new file mode 100644
index 000000000000..7dbb41adf5f6
--- /dev/null
+++ b/Documentation/networking/hstats_flow_example.dot
@@ -0,0 +1,11 @@
+digraph L {
+  node [shape=record];
+  rankdir=RL;
+
+  a  [label="Device\n\nbasic ingress\nstats",color=blue]
+  b  [label="Device\n\nper-priority\nstats",color=blue]
+  c  [label="Device\n\nper-queue\nPCI interface\nstats",color=blue]
+  d  [label="Host\n\nper-queue\nstats",color=forestgreen]
+
+  a -> b -> c -> d
+}
\ No newline at end of file
diff --git a/Documentation/networking/index.rst b/Documentation/networking/index.rst
index 59e86de662cd..b075de41e401 100644
--- a/Documentation/networking/index.rst
+++ b/Documentation/networking/index.rst
@@ -32,6 +32,7 @@ Linux Networking Documentation
    alias
    bridge
    snmp_counter
+   hstats
 
 .. only::  subproject
 
-- 
2.19.2


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* Re: [RFC 03/14] net: hstats: add basic/core functionality
  2019-01-28 23:44 ` [RFC 03/14] net: hstats: add basic/core functionality Jakub Kicinski
@ 2019-01-30  4:18   ` David Ahern
  2019-01-30 17:44     ` Jakub Kicinski
  0 siblings, 1 reply; 27+ messages in thread
From: David Ahern @ 2019-01-30  4:18 UTC (permalink / raw)
  To: Jakub Kicinski, davem
  Cc: oss-drivers, netdev, jiri, f.fainelli, andrew, mkubecek,
	simon.horman, jesse.brandeburg, maciejromanfijalkowski,
	vasundhara-v.volam, michael.chan, shalomt, idosch

On 1/28/19 4:44 PM, Jakub Kicinski wrote:
> @@ -4946,6 +4964,9 @@ static size_t if_nlmsg_stats_size(const struct net_device *dev,
>  		rcu_read_unlock();
>  	}
>  
> +	if (stats_attr_valid(filter_mask, IFLA_STATS_LINK_HSTATS, 0))

filter_mask is populated by RTEXT_FILTER_ from
include/uapi/linux/rtnetlink.h

> +		size += rtnl_get_link_hstats_size(dev);

rtnl_get_link_hstats_size == __rtnl_get_link_hstats can return < 0.

> +
>  	return size;
>  }
>  
> 


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [RFC 03/14] net: hstats: add basic/core functionality
  2019-01-30  4:18   ` David Ahern
@ 2019-01-30 17:44     ` Jakub Kicinski
  2019-01-30 22:33       ` David Ahern
  0 siblings, 1 reply; 27+ messages in thread
From: Jakub Kicinski @ 2019-01-30 17:44 UTC (permalink / raw)
  To: David Ahern
  Cc: davem, oss-drivers, netdev, jiri, f.fainelli, andrew, mkubecek,
	simon.horman, jesse.brandeburg, maciejromanfijalkowski,
	vasundhara-v.volam, michael.chan, shalomt, idosch

On Tue, 29 Jan 2019 21:18:28 -0700, David Ahern wrote:
> On 1/28/19 4:44 PM, Jakub Kicinski wrote:
> > @@ -4946,6 +4964,9 @@ static size_t if_nlmsg_stats_size(const struct net_device *dev,
> >  		rcu_read_unlock();
> >  	}
> >  
> > +	if (stats_attr_valid(filter_mask, IFLA_STATS_LINK_HSTATS, 0))  
> 
> filter_mask is populated by RTEXT_FILTER_ from
> include/uapi/linux/rtnetlink.h

ext_filter_mask is from IFLA_EXT_MASK and has RTEXT_FILTER_ bits set.
Here the mask is from struct if_stats_msg::filter_mask of RTM_GETSTATS.
Am I missing the point? :S

> > +		size += rtnl_get_link_hstats_size(dev);  
> 
> rtnl_get_link_hstats_size == __rtnl_get_link_hstats can return < 0.

Ups!  Thank you!

In general how much do you dislike this code? :)

> > +
> >  	return size;
> >  }
> >  
> >   
> 


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [RFC 00/14] netlink/hierarchical stats
  2019-01-28 23:44 [RFC 00/14] netlink/hierarchical stats Jakub Kicinski
                   ` (13 preceding siblings ...)
  2019-01-28 23:45 ` [RFC 14/14] Documentation: networking: describe new hstat API Jakub Kicinski
@ 2019-01-30 22:14 ` Roopa Prabhu
  2019-01-31  0:24   ` Jakub Kicinski
  2019-02-06 20:12 ` Florian Fainelli
  15 siblings, 1 reply; 27+ messages in thread
From: Roopa Prabhu @ 2019-01-30 22:14 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: David Miller, oss-drivers, netdev, Jiří Pírko,
	Florian Fainelli, Andrew Lunn, Michal Kubecek, David Ahern,
	Simon Horman, Brandeburg, Jesse, maciejromanfijalkowski,
	vasundhara-v.volam, Michael Chan, shalomt, Ido Schimmel

On Mon, Jan 28, 2019 at 3:45 PM Jakub Kicinski
<jakub.kicinski@netronome.com> wrote:
>
> Hi!
>
> As I tried to explain in my slides at netconf 2018 we are lacking
> an expressive, standard API to report device statistics.
>
> Networking silicon generally maintains some IEEE 802.3 and/or RMON
> statistics.  Today those all end up in ethtool -S.  Here is a simple
> attempt (admittedly very imprecise) of counting how many names driver
> authors invented for IETF RFC2819 etherStatsPkts512to1023Octets
> statistics (RX and TX):
>
> $ git grep '".*512.*1023.*"' -- drivers/net/ | \
>     sed -e 's/.*"\(.*\)".*/\1/' | sort | uniq | wc -l
> 63
>
> Interestingly only two drivers in the tree use the name the standard
> gave us (etherStatsPkts512to1023, modulo case).
>
> I set out to working on this set in an attempt to give drivers a way
> to express clearly to user space standard-compliant counters.
>
> Second most common use for custom statistics is per-queue counters.
> This is where the "hierarchical" part of this set comes in, as
> groups can be nested, and user space tools can handle the aggregation
> inside the groups if needed.
>
> This set also tries to address the problem of users not knowing if
> a statistic is reported by hardware or the driver.  Many modern drivers
> use some prefix in ethtool -S to indicate MAC/PHY stats.  At a quick
> glance: Netronome uses "mac.", Intel "port." and Mellanox "_phy".
> In this set, netlink attributes describe whether a group of statistics
> is RX or TX, maintained by device or driver.
>
> The purpose of this patch set is _not_ to replace ethtool -S.  It is
> an incredibly useful tool, and we will certainly continue using it.
> However, for standard-based and commonly maintained statistics a more
> structured API seems warranted.
>
> There are two things missing from these patches, which I initially
> planned to address as well: filtering, and refresh rate control.
>
> Filtering doesn't need much explanation, users should be able to request
> only a subset of statistics (like only SW stats or only given ID).  The
> bitmap of statistics in each group is there for filtering later on.
>
> By refresh control I mean the ability for user space to indicate how
> "fresh" values it expects.  Sometimes reading the HW counters requires
> slow register reads or FW communication, in such cases drivers may cache
> the result.  (Privileged) user space should be able to add a "not older
> than" timestamp to indicate how fresh statistics it expects.  And vice
> versa, drivers can then also put the timestamp of when the statistics
> were last refreshed in the dump for more precise bandwidth estimation.


Jakub, Glad to see hw stats in the RTM_*STATS api. I do see you
mention 'partial' support for ethtool stats. I understand the reason
you say its partial.
But while at it, why not also include the ability to have driver
extensible stats here ? ie make it complete. We have talked about
making all hw stats available
via the RTM_*STATS api in the past..., so just want to make sure the
new HSTATS infra you are adding to the RTM_*STATS api
covers or at-least makes it possible to include driver extensible
stats in the future where the driver gets to define the stats id +
value (This is very useful).
 It would be nice if you can account for that in this new HSTATS API.





>
> Jakub Kicinski (14):
>   nfp: remove unused structure
>   nfp: constify parameter to nfp_port_from_netdev()
>   net: hstats: add basic/core functionality
>   net: hstats: allow hierarchies to be built
>   nfp: very basic hstat support
>   net: hstats: allow iterators
>   net: hstats: help in iteration over directions
>   nfp: hstats: make use of iteration for direction
>   nfp: hstats: add driver and device per queue statistics
>   net: hstats: add IEEE 802.3 and common IETF MIB/RMON stats
>   nfp: hstats: add IEEE/RMON ethernet port/MAC stats
>   net: hstats: add markers for partial groups
>   nfp: hstats: add a partial group of per-8021Q prio stats
>   Documentation: networking: describe new hstat API
>
>  Documentation/networking/hstats.rst           | 590 +++++++++++++++
>  .../networking/hstats_flow_example.dot        |  11 +
>  Documentation/networking/index.rst            |   1 +
>  drivers/net/ethernet/netronome/nfp/Makefile   |   1 +
>  .../net/ethernet/netronome/nfp/nfp_hstat.c    | 474 ++++++++++++
>  drivers/net/ethernet/netronome/nfp/nfp_main.c |   1 +
>  drivers/net/ethernet/netronome/nfp/nfp_main.h |   2 +
>  drivers/net/ethernet/netronome/nfp/nfp_net.h  |  10 +-
>  .../ethernet/netronome/nfp/nfp_net_common.c   |   1 +
>  .../net/ethernet/netronome/nfp/nfp_net_repr.h |   2 +-
>  drivers/net/ethernet/netronome/nfp/nfp_port.c |   2 +-
>  drivers/net/ethernet/netronome/nfp/nfp_port.h |   2 +-
>  include/linux/netdevice.h                     |   9 +
>  include/net/hstats.h                          | 176 +++++
>  include/uapi/linux/if_link.h                  | 107 +++
>  net/core/Makefile                             |   2 +-
>  net/core/hstats.c                             | 682 ++++++++++++++++++
>  net/core/rtnetlink.c                          |  21 +
>  18 files changed, 2084 insertions(+), 10 deletions(-)
>  create mode 100644 Documentation/networking/hstats.rst
>  create mode 100644 Documentation/networking/hstats_flow_example.dot
>  create mode 100644 drivers/net/ethernet/netronome/nfp/nfp_hstat.c
>  create mode 100644 include/net/hstats.h
>  create mode 100644 net/core/hstats.c
>
> --
> 2.19.2
>

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [RFC 03/14] net: hstats: add basic/core functionality
  2019-01-30 17:44     ` Jakub Kicinski
@ 2019-01-30 22:33       ` David Ahern
  0 siblings, 0 replies; 27+ messages in thread
From: David Ahern @ 2019-01-30 22:33 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: davem, oss-drivers, netdev, jiri, f.fainelli, andrew, mkubecek,
	simon.horman, jesse.brandeburg, maciejromanfijalkowski,
	vasundhara-v.volam, michael.chan, shalomt, idosch

On 1/30/19 10:44 AM, Jakub Kicinski wrote:
> On Tue, 29 Jan 2019 21:18:28 -0700, David Ahern wrote:
>> On 1/28/19 4:44 PM, Jakub Kicinski wrote:
>>> @@ -4946,6 +4964,9 @@ static size_t if_nlmsg_stats_size(const struct net_device *dev,
>>>  		rcu_read_unlock();
>>>  	}
>>>  
>>> +	if (stats_attr_valid(filter_mask, IFLA_STATS_LINK_HSTATS, 0))  
>>
>> filter_mask is populated by RTEXT_FILTER_ from
>> include/uapi/linux/rtnetlink.h
> 
> ext_filter_mask is from IFLA_EXT_MASK and has RTEXT_FILTER_ bits set.
> Here the mask is from struct if_stats_msg::filter_mask of RTM_GETSTATS.
> Am I missing the point? :S

nm. I confused the two filter_mask's.

> 
>>> +		size += rtnl_get_link_hstats_size(dev);  
>>
>> rtnl_get_link_hstats_size == __rtnl_get_link_hstats can return < 0.
> 
> Ups!  Thank you!
> 
> In general how much do you dislike this code? :)
> 

Not having spent much time on the stats details it is hard to follow -
i.e, requires a fair bit of time iterative over the core patches.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [RFC 00/14] netlink/hierarchical stats
  2019-01-30 22:14 ` [RFC 00/14] netlink/hierarchical stats Roopa Prabhu
@ 2019-01-31  0:24   ` Jakub Kicinski
  2019-01-31 16:16     ` Roopa Prabhu
  0 siblings, 1 reply; 27+ messages in thread
From: Jakub Kicinski @ 2019-01-31  0:24 UTC (permalink / raw)
  To: Roopa Prabhu
  Cc: David Miller, oss-drivers, netdev, Jiří Pírko,
	Florian Fainelli, Andrew Lunn, Michal Kubecek, David Ahern,
	Simon Horman, Brandeburg, Jesse, maciejromanfijalkowski,
	vasundhara-v.volam, Michael Chan, shalomt, Ido Schimmel

On Wed, 30 Jan 2019 14:14:34 -0800, Roopa Prabhu wrote:
> On Mon, Jan 28, 2019 at 3:45 PM Jakub Kicinski wrote:
> > Hi!
> >
> > As I tried to explain in my slides at netconf 2018 we are lacking
> > an expressive, standard API to report device statistics.
> >
> > Networking silicon generally maintains some IEEE 802.3 and/or RMON
> > statistics.  Today those all end up in ethtool -S.  Here is a simple
> > attempt (admittedly very imprecise) of counting how many names driver
> > authors invented for IETF RFC2819 etherStatsPkts512to1023Octets
> > statistics (RX and TX):
> >
> > $ git grep '".*512.*1023.*"' -- drivers/net/ | \
> >     sed -e 's/.*"\(.*\)".*/\1/' | sort | uniq | wc -l
> > 63
> >
> > Interestingly only two drivers in the tree use the name the standard
> > gave us (etherStatsPkts512to1023, modulo case).
> >
> > I set out to working on this set in an attempt to give drivers a way
> > to express clearly to user space standard-compliant counters.
> >
> > Second most common use for custom statistics is per-queue counters.
> > This is where the "hierarchical" part of this set comes in, as
> > groups can be nested, and user space tools can handle the aggregation
> > inside the groups if needed.
> >
> > This set also tries to address the problem of users not knowing if
> > a statistic is reported by hardware or the driver.  Many modern drivers
> > use some prefix in ethtool -S to indicate MAC/PHY stats.  At a quick
> > glance: Netronome uses "mac.", Intel "port." and Mellanox "_phy".
> > In this set, netlink attributes describe whether a group of statistics
> > is RX or TX, maintained by device or driver.
> >
> > The purpose of this patch set is _not_ to replace ethtool -S.  It is
> > an incredibly useful tool, and we will certainly continue using it.
> > However, for standard-based and commonly maintained statistics a more
> > structured API seems warranted.
> >
> > There are two things missing from these patches, which I initially
> > planned to address as well: filtering, and refresh rate control.
> >
> > Filtering doesn't need much explanation, users should be able to request
> > only a subset of statistics (like only SW stats or only given ID).  The
> > bitmap of statistics in each group is there for filtering later on.
> >
> > By refresh control I mean the ability for user space to indicate how
> > "fresh" values it expects.  Sometimes reading the HW counters requires
> > slow register reads or FW communication, in such cases drivers may cache
> > the result.  (Privileged) user space should be able to add a "not older
> > than" timestamp to indicate how fresh statistics it expects.  And vice
> > versa, drivers can then also put the timestamp of when the statistics
> > were last refreshed in the dump for more precise bandwidth estimation.  
> 
> Jakub, Glad to see hw stats in the RTM_*STATS api. I do see you
> mention 'partial' support for ethtool stats. I understand the reason
> you say its partial.
> But while at it, why not also include the ability to have driver
> extensible stats here ? ie make it complete. We have talked about
> making all hw stats available
> via the RTM_*STATS api in the past..., so just want to make sure the
> new HSTATS infra you are adding to the RTM_*STATS api
> covers or at-least makes it possible to include driver extensible
> stats in the future where the driver gets to define the stats id +
> value (This is very useful).
>  It would be nice if you can account for that in this new HSTATS API.

My thinking was that we should leave truly custom/strange stats to
ethtool API which works quite well for that and at the same time be
very accepting of people adding new IDs to HSTAT (only requirement is
basically defining the meaning very clearly).  

For the first stab I looked at two drivers and added all the stats that
were common.

Given this set is identifying statistics by ID - how would we make that
extensible to drivers?  Would we go back to strings or have some
"driver specific" ID space?

Is there any particular type of statistic you'd expect drivers to want
to add?  For NICs I think IEEE/RMON should pretty much cover the
silicon ones, but I don't know much about switches :)

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [RFC 00/14] netlink/hierarchical stats
  2019-01-31  0:24   ` Jakub Kicinski
@ 2019-01-31 16:16     ` Roopa Prabhu
  2019-01-31 16:31       ` Roopa Prabhu
  0 siblings, 1 reply; 27+ messages in thread
From: Roopa Prabhu @ 2019-01-31 16:16 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: David Miller, oss-drivers, netdev, Jiří Pírko,
	Florian Fainelli, Andrew Lunn, Michal Kubecek, David Ahern,
	Simon Horman, Brandeburg, Jesse, maciejromanfijalkowski,
	vasundhara-v.volam, Michael Chan, shalomt, Ido Schimmel

On Wed, Jan 30, 2019 at 4:24 PM Jakub Kicinski
<jakub.kicinski@netronome.com> wrote:
>
> On Wed, 30 Jan 2019 14:14:34 -0800, Roopa Prabhu wrote:
> > On Mon, Jan 28, 2019 at 3:45 PM Jakub Kicinski wrote:
> > > Hi!
> > >
> > > As I tried to explain in my slides at netconf 2018 we are lacking
> > > an expressive, standard API to report device statistics.
> > >
> > > Networking silicon generally maintains some IEEE 802.3 and/or RMON
> > > statistics.  Today those all end up in ethtool -S.  Here is a simple
> > > attempt (admittedly very imprecise) of counting how many names driver
> > > authors invented for IETF RFC2819 etherStatsPkts512to1023Octets
> > > statistics (RX and TX):
> > >
> > > $ git grep '".*512.*1023.*"' -- drivers/net/ | \
> > >     sed -e 's/.*"\(.*\)".*/\1/' | sort | uniq | wc -l
> > > 63
> > >
> > > Interestingly only two drivers in the tree use the name the standard
> > > gave us (etherStatsPkts512to1023, modulo case).
> > >
> > > I set out to working on this set in an attempt to give drivers a way
> > > to express clearly to user space standard-compliant counters.
> > >
> > > Second most common use for custom statistics is per-queue counters.
> > > This is where the "hierarchical" part of this set comes in, as
> > > groups can be nested, and user space tools can handle the aggregation
> > > inside the groups if needed.
> > >
> > > This set also tries to address the problem of users not knowing if
> > > a statistic is reported by hardware or the driver.  Many modern drivers
> > > use some prefix in ethtool -S to indicate MAC/PHY stats.  At a quick
> > > glance: Netronome uses "mac.", Intel "port." and Mellanox "_phy".
> > > In this set, netlink attributes describe whether a group of statistics
> > > is RX or TX, maintained by device or driver.
> > >
> > > The purpose of this patch set is _not_ to replace ethtool -S.  It is
> > > an incredibly useful tool, and we will certainly continue using it.
> > > However, for standard-based and commonly maintained statistics a more
> > > structured API seems warranted.
> > >
> > > There are two things missing from these patches, which I initially
> > > planned to address as well: filtering, and refresh rate control.
> > >
> > > Filtering doesn't need much explanation, users should be able to request
> > > only a subset of statistics (like only SW stats or only given ID).  The
> > > bitmap of statistics in each group is there for filtering later on.
> > >
> > > By refresh control I mean the ability for user space to indicate how
> > > "fresh" values it expects.  Sometimes reading the HW counters requires
> > > slow register reads or FW communication, in such cases drivers may cache
> > > the result.  (Privileged) user space should be able to add a "not older
> > > than" timestamp to indicate how fresh statistics it expects.  And vice
> > > versa, drivers can then also put the timestamp of when the statistics
> > > were last refreshed in the dump for more precise bandwidth estimation.
> >
> > Jakub, Glad to see hw stats in the RTM_*STATS api. I do see you
> > mention 'partial' support for ethtool stats. I understand the reason
> > you say its partial.
> > But while at it, why not also include the ability to have driver
> > extensible stats here ? ie make it complete. We have talked about
> > making all hw stats available
> > via the RTM_*STATS api in the past..., so just want to make sure the
> > new HSTATS infra you are adding to the RTM_*STATS api
> > covers or at-least makes it possible to include driver extensible
> > stats in the future where the driver gets to define the stats id +
> > value (This is very useful).
> >  It would be nice if you can account for that in this new HSTATS API.
>
> My thinking was that we should leave truly custom/strange stats to
> ethtool API which works quite well for that and at the same time be
> very accepting of people adding new IDs to HSTAT (only requirement is
> basically defining the meaning very clearly).

that sounds reasonable. But the 'defining meaning clearly' gets tricky
sometimes.
The vendor who gets their ID or meaning first wins :) and the rest
will have to live with
ethtool and explain to rest of the world that ethtool is more reliable
for their hardware :)

I am also concerned that this getting the ID into common HSTAT ID
space will  slow down the process of adding new counters
for vendors. Which will lead to vendors sticking with ethtool API. It
would be great if people can get all stats in one place and not rely
on another API for 'more'.

>
> For the first stab I looked at two drivers and added all the stats that
> were common.
>
> Given this set is identifying statistics by ID - how would we make that
> extensible to drivers?  Would we go back to strings or have some
> "driver specific" ID space?

I was looking for ideas from you really, to see if you had considered
this. agree per driver ID space seems ugly.
ethtool strings are great today...if we can control the duplication.
But thinking some more..., i did see some
patches recently for vendor specific parameter (with ID) space in
devlink. maybe something like that will be
reasonable ?

>
> Is there any particular type of statistic you'd expect drivers to want
> to add?  For NICs I think IEEE/RMON should pretty much cover the
> silicon ones, but I don't know much about switches :)

I will have to go through the list. But switch asics do support
flexible stats/counters that can be attached at various points.
And new chip versions come with more support. Having that flexibility
to expose/extend such stats incrementally is very valuable on a per
hardware/vendor basis.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [RFC 00/14] netlink/hierarchical stats
  2019-01-31 16:16     ` Roopa Prabhu
@ 2019-01-31 16:31       ` Roopa Prabhu
  2019-01-31 19:30         ` Jakub Kicinski
  0 siblings, 1 reply; 27+ messages in thread
From: Roopa Prabhu @ 2019-01-31 16:31 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: David Miller, oss-drivers, netdev, Jiří Pírko,
	Florian Fainelli, Andrew Lunn, Michal Kubecek, David Ahern,
	Simon Horman, Brandeburg, Jesse, maciejromanfijalkowski,
	vasundhara-v.volam, Michael Chan, shalomt, Ido Schimmel

On Thu, Jan 31, 2019 at 8:16 AM Roopa Prabhu <roopa@cumulusnetworks.com> wrote:
>
> On Wed, Jan 30, 2019 at 4:24 PM Jakub Kicinski
> <jakub.kicinski@netronome.com> wrote:
> >
> > On Wed, 30 Jan 2019 14:14:34 -0800, Roopa Prabhu wrote:
> > > On Mon, Jan 28, 2019 at 3:45 PM Jakub Kicinski wrote:
> > > > Hi!
> > > >
> > > > As I tried to explain in my slides at netconf 2018 we are lacking
> > > > an expressive, standard API to report device statistics.
> > > >
> > > > Networking silicon generally maintains some IEEE 802.3 and/or RMON
> > > > statistics.  Today those all end up in ethtool -S.  Here is a simple
> > > > attempt (admittedly very imprecise) of counting how many names driver
> > > > authors invented for IETF RFC2819 etherStatsPkts512to1023Octets
> > > > statistics (RX and TX):
> > > >
> > > > $ git grep '".*512.*1023.*"' -- drivers/net/ | \
> > > >     sed -e 's/.*"\(.*\)".*/\1/' | sort | uniq | wc -l
> > > > 63
> > > >
> > > > Interestingly only two drivers in the tree use the name the standard
> > > > gave us (etherStatsPkts512to1023, modulo case).
> > > >
> > > > I set out to working on this set in an attempt to give drivers a way
> > > > to express clearly to user space standard-compliant counters.
> > > >
> > > > Second most common use for custom statistics is per-queue counters.
> > > > This is where the "hierarchical" part of this set comes in, as
> > > > groups can be nested, and user space tools can handle the aggregation
> > > > inside the groups if needed.
> > > >
> > > > This set also tries to address the problem of users not knowing if
> > > > a statistic is reported by hardware or the driver.  Many modern drivers
> > > > use some prefix in ethtool -S to indicate MAC/PHY stats.  At a quick
> > > > glance: Netronome uses "mac.", Intel "port." and Mellanox "_phy".
> > > > In this set, netlink attributes describe whether a group of statistics
> > > > is RX or TX, maintained by device or driver.
> > > >
> > > > The purpose of this patch set is _not_ to replace ethtool -S.  It is
> > > > an incredibly useful tool, and we will certainly continue using it.
> > > > However, for standard-based and commonly maintained statistics a more
> > > > structured API seems warranted.
> > > >
> > > > There are two things missing from these patches, which I initially
> > > > planned to address as well: filtering, and refresh rate control.
> > > >
> > > > Filtering doesn't need much explanation, users should be able to request
> > > > only a subset of statistics (like only SW stats or only given ID).  The
> > > > bitmap of statistics in each group is there for filtering later on.
> > > >
> > > > By refresh control I mean the ability for user space to indicate how
> > > > "fresh" values it expects.  Sometimes reading the HW counters requires
> > > > slow register reads or FW communication, in such cases drivers may cache
> > > > the result.  (Privileged) user space should be able to add a "not older
> > > > than" timestamp to indicate how fresh statistics it expects.  And vice
> > > > versa, drivers can then also put the timestamp of when the statistics
> > > > were last refreshed in the dump for more precise bandwidth estimation.
> > >
> > > Jakub, Glad to see hw stats in the RTM_*STATS api. I do see you
> > > mention 'partial' support for ethtool stats. I understand the reason
> > > you say its partial.
> > > But while at it, why not also include the ability to have driver
> > > extensible stats here ? ie make it complete. We have talked about
> > > making all hw stats available
> > > via the RTM_*STATS api in the past..., so just want to make sure the
> > > new HSTATS infra you are adding to the RTM_*STATS api
> > > covers or at-least makes it possible to include driver extensible
> > > stats in the future where the driver gets to define the stats id +
> > > value (This is very useful).
> > >  It would be nice if you can account for that in this new HSTATS API.
> >
> > My thinking was that we should leave truly custom/strange stats to
> > ethtool API which works quite well for that and at the same time be
> > very accepting of people adding new IDs to HSTAT (only requirement is
> > basically defining the meaning very clearly).
>
> that sounds reasonable. But the 'defining meaning clearly' gets tricky
> sometimes.
> The vendor who gets their ID or meaning first wins :) and the rest
> will have to live with
> ethtool and explain to rest of the world that ethtool is more reliable
> for their hardware :)
>
> I am also concerned that this getting the ID into common HSTAT ID
> space will  slow down the process of adding new counters
> for vendors. Which will lead to vendors sticking with ethtool API. It
> would be great if people can get all stats in one place and not rely
> on another API for 'more'.
>
> >
> > For the first stab I looked at two drivers and added all the stats that
> > were common.
> >
> > Given this set is identifying statistics by ID - how would we make that
> > extensible to drivers?  Would we go back to strings or have some
> > "driver specific" ID space?
>
> I was looking for ideas from you really, to see if you had considered
> this. agree per driver ID space seems ugly.
> ethtool strings are great today...if we can control the duplication.
> But thinking some more..., i did see some
> patches recently for vendor specific parameter (with ID) space in
> devlink. maybe something like that will be
> reasonable ?
>
> >
> > Is there any particular type of statistic you'd expect drivers to want
> > to add?  For NICs I think IEEE/RMON should pretty much cover the
> > silicon ones, but I don't know much about switches :)
>
> I will have to go through the list. But switch asics do support
> flexible stats/counters that can be attached at various points.
> And new chip versions come with more support. Having that flexibility
> to expose/extend such stats incrementally is very valuable on a per
> hardware/vendor basis.

Just want to clarify that I am suggesting a nested HSTATS extension
infra for drivers (just like ethtool).
'Common stats' stays at the top-level.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [RFC 00/14] netlink/hierarchical stats
  2019-01-31 16:31       ` Roopa Prabhu
@ 2019-01-31 19:30         ` Jakub Kicinski
  2019-02-02 23:14           ` Roopa Prabhu
  0 siblings, 1 reply; 27+ messages in thread
From: Jakub Kicinski @ 2019-01-31 19:30 UTC (permalink / raw)
  To: Roopa Prabhu
  Cc: David Miller, oss-drivers, netdev, Jiří Pírko,
	Florian Fainelli, Andrew Lunn, Michal Kubecek, David Ahern,
	Simon Horman, Brandeburg, Jesse, maciejromanfijalkowski,
	vasundhara-v.volam, Michael Chan, shalomt, Ido Schimmel

On Thu, 31 Jan 2019 08:31:51 -0800, Roopa Prabhu wrote:
> On Thu, Jan 31, 2019 at 8:16 AM Roopa Prabhu wrote:
> > On Wed, Jan 30, 2019 at 4:24 PM Jakub Kicinski wrote:  
> > > On Wed, 30 Jan 2019 14:14:34 -0800, Roopa Prabhu wrote:  
> > >
> > > My thinking was that we should leave truly custom/strange stats to
> > > ethtool API which works quite well for that and at the same time be
> > > very accepting of people adding new IDs to HSTAT (only requirement is
> > > basically defining the meaning very clearly).  
> >
> > that sounds reasonable. But the 'defining meaning clearly' gets tricky
> > sometimes.
> > The vendor who gets their ID or meaning first wins :) and the rest
> > will have to live with
> > ethtool and explain to rest of the world that ethtool is more reliable
> > for their hardware :)

Right, that's the trade off inherent to standardization.  I don't see
any way to work around the fact that the definition may not fit all.

What I want as a end user and what I want for my customers is the
ability to switch the NIC on their system and not spend two months
"integrating" into their automation :(  If the definition of statistics
is not solid we're back to square one.

> > I am also concerned that this getting the ID into common HSTAT ID
> > space will  slow down the process of adding new counters
> > for vendors. Which will lead to vendors sticking with ethtool API. 

I feel like whatever we did here will end up looking much like the
ethtool interface, which is why I decided to leave that part out.
Ethtool -S works pretty well for custom stats.  Standard and structured
stats don't fit with it in any way, the two seem best left separate.

> > It would be great if people can get all stats in one place and not
> > rely on another API for 'more'.

One place in the driver or for the user?  I'm happy to add the code to
ethtool to also dump hstats and render them in a standard way.  In fact
the tool I have for testing has a "simplified" output format which
looks exactly like ethtool -S.

One place for the driver to report is hard, as I said I think the
custom stats are best left with ethtool.  Adding an extra incentive to
standardize.

> > > For the first stab I looked at two drivers and added all the stats that
> > > were common.
> > >
> > > Given this set is identifying statistics by ID - how would we make that
> > > extensible to drivers?  Would we go back to strings or have some
> > > "driver specific" ID space?  
> >
> > I was looking for ideas from you really, to see if you had considered
> > this. agree per driver ID space seems ugly.
> > ethtool strings are great today...if we can control the duplication.
> > But thinking some more..., i did see some
> > patches recently for vendor specific parameter (with ID) space in
> > devlink. maybe something like that will be
> > reasonable ?

I thought about this for a year and I basically came to the conclusion
I can't find any perfect solution, if there is one.

The devlink parameters are useful, but as anticipated they became the
laziest excuse of an ABI... Don't get me started ;)

> > > Is there any particular type of statistic you'd expect drivers to want
> > > to add?  For NICs I think IEEE/RMON should pretty much cover the
> > > silicon ones, but I don't know much about switches :)  
> >
> > I will have to go through the list. But switch asics do support
> > flexible stats/counters that can be attached at various points.
> > And new chip versions come with more support. Having that flexibility
> > to expose/extend such stats incrementally is very valuable on a per
> > hardware/vendor basis.  

Yes, I'm not too familiar with those counters.  Do they need to be
enabled to start counting?  Do they have performance impact?  Can the
"sample" events perf-style?  How is the condition on which they trigger
defined?  Is it maybe just "match a packet and increment a counter"?
Would such counters benefit from hierarchical structure?

I was trying to cover the long standing use cases - namely the
IEEE/RMON stats which all MAC have had for years and per queue stats
which all drivers have had for years.  But if we can cater to more
cases I'm open.

> Just want to clarify that I am suggesting a nested HSTATS extension
> infra for drivers (just like ethtool).
> 'Common stats' stays at the top-level.

I got a concept of groups here.  The dump generally looks like this:

[root group A (say MAC stats)]
  [sub group RX]
  [sub group TX]
[root group B (say PCIe stats)]
  [sub group RX]
  [sub group TX]
[root group C (say per-q driver stats]
  [sub group RX]
    [q1 group]
    [q2 group]
    [q3 group]
  [sub group TX]
    [q1 group]
    [q2 group]
    [q3 group]

Each root group representing a "point in the pipeline".

So it's not too hard to add a root group with whatever, the questions
are move how would it benefit over existing ethtool if the stats are
custom anyway?  Hm..

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [RFC 00/14] netlink/hierarchical stats
  2019-01-31 19:30         ` Jakub Kicinski
@ 2019-02-02 23:14           ` Roopa Prabhu
  2019-02-06  4:45             ` Jakub Kicinski
  0 siblings, 1 reply; 27+ messages in thread
From: Roopa Prabhu @ 2019-02-02 23:14 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: David Miller, oss-drivers, netdev, Jiří Pírko,
	Florian Fainelli, Andrew Lunn, Michal Kubecek, David Ahern,
	Simon Horman, Brandeburg, Jesse, Maciek Fijałkowski,
	vasundhara-v.volam, Michael Chan, shalomt, Ido Schimmel

On Thu, Jan 31, 2019 at 11:31 AM Jakub Kicinski
<jakub.kicinski@netronome.com> wrote:
>
> On Thu, 31 Jan 2019 08:31:51 -0800, Roopa Prabhu wrote:
> > On Thu, Jan 31, 2019 at 8:16 AM Roopa Prabhu wrote:
> > > On Wed, Jan 30, 2019 at 4:24 PM Jakub Kicinski wrote:
> > > > On Wed, 30 Jan 2019 14:14:34 -0800, Roopa Prabhu wrote:
> > > >
> > > > My thinking was that we should leave truly custom/strange stats to
> > > > ethtool API which works quite well for that and at the same time be
> > > > very accepting of people adding new IDs to HSTAT (only requirement is
> > > > basically defining the meaning very clearly).
> > >
> > > that sounds reasonable. But the 'defining meaning clearly' gets tricky
> > > sometimes.
> > > The vendor who gets their ID or meaning first wins :) and the rest
> > > will have to live with
> > > ethtool and explain to rest of the world that ethtool is more reliable
> > > for their hardware :)
>
> Right, that's the trade off inherent to standardization.  I don't see
> any way to work around the fact that the definition may not fit all.
>
> What I want as a end user and what I want for my customers is the
> ability to switch the NIC on their system and not spend two months
> "integrating" into their automation :(  If the definition of statistics
> is not solid we're back to square one.

agree. And I am with you on standardizing them.

>
> > > I am also concerned that this getting the ID into common HSTAT ID
> > > space will  slow down the process of adding new counters
> > > for vendors. Which will lead to vendors sticking with ethtool API.
>
> I feel like whatever we did here will end up looking much like the
> ethtool interface, which is why I decided to leave that part out.
> Ethtool -S works pretty well for custom stats.  Standard and structured
> stats don't fit with it in any way, the two seem best left separate.

understand the fear. My only point was getting them together in a
single API is better so that they don't get developed separately and
we don't end up with duplicate stats code.

>
> > > It would be great if people can get all stats in one place and not
> > > rely on another API for 'more'.
>
> One place in the driver or for the user?  I'm happy to add the code to
> ethtool to also dump hstats and render them in a standard way.  In fact
> the tool I have for testing has a "simplified" output format which
> looks exactly like ethtool -S.
>
> One place for the driver to report is hard, as I said I think the
> custom stats are best left with ethtool.  Adding an extra incentive to
> standardize.
>
> > > > For the first stab I looked at two drivers and added all the stats that
> > > > were common.
> > > >
> > > > Given this set is identifying statistics by ID - how would we make that
> > > > extensible to drivers?  Would we go back to strings or have some
> > > > "driver specific" ID space?
> > >
> > > I was looking for ideas from you really, to see if you had considered
> > > this. agree per driver ID space seems ugly.
> > > ethtool strings are great today...if we can control the duplication.
> > > But thinking some more..., i did see some
> > > patches recently for vendor specific parameter (with ID) space in
> > > devlink. maybe something like that will be
> > > reasonable ?
>
> I thought about this for a year and I basically came to the conclusion
> I can't find any perfect solution, if there is one.
>
> The devlink parameters are useful, but as anticipated they became the
> laziest excuse of an ABI... Don't get me started ;)
>
> > > > Is there any particular type of statistic you'd expect drivers to want
> > > > to add?  For NICs I think IEEE/RMON should pretty much cover the
> > > > silicon ones, but I don't know much about switches :)
> > >
> > > I will have to go through the list. But switch asics do support
> > > flexible stats/counters that can be attached at various points.
> > > And new chip versions come with more support. Having that flexibility
> > > to expose/extend such stats incrementally is very valuable on a per
> > > hardware/vendor basis.
>
> Yes, I'm not too familiar with those counters.  Do they need to be
> enabled to start counting?

yes correct.

> Do they have performance impact?

I have not heard of any performance impact...but they are not enabled
by default because of limited counter resource pool.

> Can the
> "sample" events perf-style?

I don't think so

> How is the condition on which they trigger
> defined?  Is it maybe just "match a packet and increment a counter"?

yes, something like that.

> Would such counters benefit from hierarchical structure?

hmm not sure.


One thing though, for most of these flexible counters and their
attachment points in hardware, we can count them on logical devices or
other objects in software like per vlan, vni, route stats etc.

>
> I was trying to cover the long standing use cases - namely the
> IEEE/RMON stats which all MAC have had for years and per queue stats
> which all drivers have had for years.  But if we can cater to more
> cases I'm open.
>
> > Just want to clarify that I am suggesting a nested HSTATS extension
> > infra for drivers (just like ethtool).
> > 'Common stats' stays at the top-level.
>
> I got a concept of groups here.  The dump generally looks like this:
>
> [root group A (say MAC stats)]
>   [sub group RX]
>   [sub group TX]
> [root group B (say PCIe stats)]
>   [sub group RX]
>   [sub group TX]
> [root group C (say per-q driver stats]
>   [sub group RX]
>     [q1 group]
>     [q2 group]
>     [q3 group]
>   [sub group TX]
>     [q1 group]
>     [q2 group]
>     [q3 group]
>
> Each root group representing a "point in the pipeline".
>
> So it's not too hard to add a root group with whatever, the questions
> are move how would it benefit over existing ethtool if the stats are
> custom anyway?  Hm..

It wouldn't. I am only saying that the netlink stats API is the new
place to move stats.
Ethtool stats will have to move to netlink some day and I don't see a
need to draw a hardline on saying no to ethtool custom stats moving to
the netlink based common stats API. And unless there is a good
migration path for a new hardware stats API that is all inclusive,
there is a higher chance of continued development on the older
hardware stats API.
I have no objections to having a standard set of stats (this is
essentially what we have for software stats too).

I don't want to block your series from going forward without HW custom
stats extensions. And IIUC your API is extensible and does not
preclude anyone from adding the ability to include HW custom stats
extensions in the future with enough justification. That is good for
me.

To take a random example, we expose the following stats on our
switches via ethtool. I have not used them personally but they
correspond to respective hardware counters. Is there any room for such
stats in the new HSTATS netlink API or they will have to stick to
ethtool ?
I believe people will need per-queue counters for this.

     HwIfOutWredDrops: 0
     HwIfOutQ0WredDrops: 0
     HwIfOutQ1WredDrops: 0
     HwIfOutQ2WredDrops: 0
     HwIfOutQ3WredDrops: 0
     HwIfOutQ4WredDrops: 0
     HwIfOutQ5WredDrops: 0
     HwIfOutQ6WredDrops: 0
     HwIfOutQ7WredDrops: 0
     HwIfOutQ8WredDrops: 0

     HwIfOutQ9WredDrops: 0

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [RFC 00/14] netlink/hierarchical stats
  2019-02-02 23:14           ` Roopa Prabhu
@ 2019-02-06  4:45             ` Jakub Kicinski
  0 siblings, 0 replies; 27+ messages in thread
From: Jakub Kicinski @ 2019-02-06  4:45 UTC (permalink / raw)
  To: Roopa Prabhu
  Cc: David Miller, oss-drivers, netdev, Jiří Pírko,
	Florian Fainelli, Andrew Lunn, Michal Kubecek, David Ahern,
	Simon Horman, Brandeburg, Jesse, Maciek Fijałkowski,
	vasundhara-v.volam, Michael Chan, shalomt, Ido Schimmel

On Sat, 2 Feb 2019 15:14:44 -0800, Roopa Prabhu wrote:
> On Thu, Jan 31, 2019 at 11:31 AM Jakub Kicinski wrote:
> > On Thu, 31 Jan 2019 08:31:51 -0800, Roopa Prabhu wrote:  
> > > On Thu, Jan 31, 2019 at 8:16 AM Roopa Prabhu wrote:  
> > > > On Wed, Jan 30, 2019 at 4:24 PM Jakub Kicinski wrote:  
> > > > > On Wed, 30 Jan 2019 14:14:34 -0800, Roopa Prabhu wrote:
> > > > >
> > > > > My thinking was that we should leave truly custom/strange stats to
> > > > > ethtool API which works quite well for that and at the same time be
> > > > > very accepting of people adding new IDs to HSTAT (only requirement is
> > > > > basically defining the meaning very clearly).  
> > > >
> > > > that sounds reasonable. But the 'defining meaning clearly' gets tricky
> > > > sometimes.
> > > > The vendor who gets their ID or meaning first wins :) and the rest
> > > > will have to live with
> > > > ethtool and explain to rest of the world that ethtool is more reliable
> > > > for their hardware :)  
> >
> > Right, that's the trade off inherent to standardization.  I don't see
> > any way to work around the fact that the definition may not fit all.
> >
> > What I want as a end user and what I want for my customers is the
> > ability to switch the NIC on their system and not spend two months
> > "integrating" into their automation :(  If the definition of statistics
> > is not solid we're back to square one.  
> 
> agree. And I am with you on standardizing them.
> 
> >  
> > > > I am also concerned that this getting the ID into common HSTAT ID
> > > > space will  slow down the process of adding new counters
> > > > for vendors. Which will lead to vendors sticking with ethtool API.  
> >
> > I feel like whatever we did here will end up looking much like the
> > ethtool interface, which is why I decided to leave that part out.
> > Ethtool -S works pretty well for custom stats.  Standard and structured
> > stats don't fit with it in any way, the two seem best left separate.  
> 
> understand the fear. My only point was getting them together in a
> single API is better so that they don't get developed separately and
> we don't end up with duplicate stats code.
> 
> >  
> > > > It would be great if people can get all stats in one place and not
> > > > rely on another API for 'more'.  
> >
> > One place in the driver or for the user?  I'm happy to add the code to
> > ethtool to also dump hstats and render them in a standard way.  In fact
> > the tool I have for testing has a "simplified" output format which
> > looks exactly like ethtool -S.
> >
> > One place for the driver to report is hard, as I said I think the
> > custom stats are best left with ethtool.  Adding an extra incentive to
> > standardize.
> >  
> > > > > For the first stab I looked at two drivers and added all the stats that
> > > > > were common.
> > > > >
> > > > > Given this set is identifying statistics by ID - how would we make that
> > > > > extensible to drivers?  Would we go back to strings or have some
> > > > > "driver specific" ID space?  
> > > >
> > > > I was looking for ideas from you really, to see if you had considered
> > > > this. agree per driver ID space seems ugly.
> > > > ethtool strings are great today...if we can control the duplication.
> > > > But thinking some more..., i did see some
> > > > patches recently for vendor specific parameter (with ID) space in
> > > > devlink. maybe something like that will be
> > > > reasonable ?  
> >
> > I thought about this for a year and I basically came to the conclusion
> > I can't find any perfect solution, if there is one.
> >
> > The devlink parameters are useful, but as anticipated they became the
> > laziest excuse of an ABI... Don't get me started ;)
> >  
> > > > > Is there any particular type of statistic you'd expect drivers to want
> > > > > to add?  For NICs I think IEEE/RMON should pretty much cover the
> > > > > silicon ones, but I don't know much about switches :)  
> > > >
> > > > I will have to go through the list. But switch asics do support
> > > > flexible stats/counters that can be attached at various points.
> > > > And new chip versions come with more support. Having that flexibility
> > > > to expose/extend such stats incrementally is very valuable on a per
> > > > hardware/vendor basis.  
> >
> > Yes, I'm not too familiar with those counters.  Do they need to be
> > enabled to start counting?  
> 
> yes correct.
> 
> > Do they have performance impact?  
> 
> I have not heard of any performance impact...but they are not enabled
> by default because of limited counter resource pool.

I see.. I'd personally see that as something that we probably either
support via perf, or new devlink perf creation.  Those are perf events,
not stats to me.  Devlink would probably suit fixed HW better, and
perf could feel slightly more natural to certain NICs (*ekhm* perf
traces of offloaded BPF programs).

> > Can the
> > "sample" events perf-style?  
> 
> I don't think so
> 
> > How is the condition on which they trigger
> > defined?  Is it maybe just "match a packet and increment a counter"?  
> 
> yes, something like that.
> 
> > Would such counters benefit from hierarchical structure?  
> 
> hmm not sure.
> 
> 
> One thing though, for most of these flexible counters and their
> attachment points in hardware, we can count them on logical devices or
> other objects in software like per vlan, vni, route stats etc.
> 
> >
> > I was trying to cover the long standing use cases - namely the
> > IEEE/RMON stats which all MAC have had for years and per queue stats
> > which all drivers have had for years.  But if we can cater to more
> > cases I'm open.
> >  
> > > Just want to clarify that I am suggesting a nested HSTATS extension
> > > infra for drivers (just like ethtool).
> > > 'Common stats' stays at the top-level.  
> >
> > I got a concept of groups here.  The dump generally looks like this:
> >
> > [root group A (say MAC stats)]
> >   [sub group RX]
> >   [sub group TX]
> > [root group B (say PCIe stats)]
> >   [sub group RX]
> >   [sub group TX]
> > [root group C (say per-q driver stats]
> >   [sub group RX]
> >     [q1 group]
> >     [q2 group]
> >     [q3 group]
> >   [sub group TX]
> >     [q1 group]
> >     [q2 group]
> >     [q3 group]
> >
> > Each root group representing a "point in the pipeline".
> >
> > So it's not too hard to add a root group with whatever, the questions
> > are move how would it benefit over existing ethtool if the stats are
> > custom anyway?  Hm..  
> 
> It wouldn't. I am only saying that the netlink stats API is the new
> place to move stats.
> Ethtool stats will have to move to netlink some day and I don't see a
> need to draw a hardline on saying no to ethtool custom stats moving to
> the netlink based common stats API. And unless there is a good
> migration path for a new hardware stats API that is all inclusive,
> there is a higher chance of continued development on the older
> hardware stats API.
> I have no objections to having a standard set of stats (this is
> essentially what we have for software stats too).
> 
> I don't want to block your series from going forward without HW custom
> stats extensions. And IIUC your API is extensible and does not
> preclude anyone from adding the ability to include HW custom stats
> extensions in the future with enough justification. That is good for
> me.

Would you be more interested in seeing the similarity in API on the
driver side or on the netlink side?  I was hoping to leave the legacy
stats in ethtool (soon to be running over netlink as well) for the
time being.  I wish we had some form of library on the iproute2 side 
we could evolve together with the kernel libbpf-style :(

> To take a random example, we expose the following stats on our
> switches via ethtool. I have not used them personally but they
> correspond to respective hardware counters. Is there any room for such
> stats in the new HSTATS netlink API or they will have to stick to
> ethtool ?
> I believe people will need per-queue counters for this.
> 
>      HwIfOutWredDrops: 0
>      HwIfOutQ0WredDrops: 0
>      HwIfOutQ1WredDrops: 0
>      HwIfOutQ2WredDrops: 0
>      HwIfOutQ3WredDrops: 0
>      HwIfOutQ4WredDrops: 0
>      HwIfOutQ5WredDrops: 0
>      HwIfOutQ6WredDrops: 0
>      HwIfOutQ7WredDrops: 0
>      HwIfOutQ8WredDrops: 0
>      HwIfOutQ9WredDrops: 0

Well, yes, so these are clearly enough defined stats, and I'd be very
happy to add an ID for you for those... if those shouldn't be reported
in the tc qdisc red stats that should be used to configure WRED :(

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [RFC 00/14] netlink/hierarchical stats
  2019-01-28 23:44 [RFC 00/14] netlink/hierarchical stats Jakub Kicinski
                   ` (14 preceding siblings ...)
  2019-01-30 22:14 ` [RFC 00/14] netlink/hierarchical stats Roopa Prabhu
@ 2019-02-06 20:12 ` Florian Fainelli
  2019-02-07 16:23   ` Jakub Kicinski
  15 siblings, 1 reply; 27+ messages in thread
From: Florian Fainelli @ 2019-02-06 20:12 UTC (permalink / raw)
  To: Jakub Kicinski, davem
  Cc: oss-drivers, netdev, jiri, andrew, mkubecek, dsahern,
	simon.horman, jesse.brandeburg, maciejromanfijalkowski,
	vasundhara-v.volam, michael.chan, shalomt, idosch

On 1/28/19 3:44 PM, Jakub Kicinski wrote:
> Hi!
> 
> As I tried to explain in my slides at netconf 2018 we are lacking
> an expressive, standard API to report device statistics.
> 
> Networking silicon generally maintains some IEEE 802.3 and/or RMON
> statistics.  Today those all end up in ethtool -S.  Here is a simple
> attempt (admittedly very imprecise) of counting how many names driver
> authors invented for IETF RFC2819 etherStatsPkts512to1023Octets
> statistics (RX and TX):
> 
> $ git grep '".*512.*1023.*"' -- drivers/net/ | \
>     sed -e 's/.*"\(.*\)".*/\1/' | sort | uniq | wc -l
> 63
> 
> Interestingly only two drivers in the tree use the name the standard
> gave us (etherStatsPkts512to1023, modulo case).
> 
> I set out to working on this set in an attempt to give drivers a way
> to express clearly to user space standard-compliant counters.
> 
> Second most common use for custom statistics is per-queue counters.
> This is where the "hierarchical" part of this set comes in, as
> groups can be nested, and user space tools can handle the aggregation
> inside the groups if needed.
> 
> This set also tries to address the problem of users not knowing if
> a statistic is reported by hardware or the driver.  Many modern drivers
> use some prefix in ethtool -S to indicate MAC/PHY stats.  At a quick
> glance: Netronome uses "mac.", Intel "port." and Mellanox "_phy".
> In this set, netlink attributes describe whether a group of statistics
> is RX or TX, maintained by device or driver.
> 
> The purpose of this patch set is _not_ to replace ethtool -S.  It is
> an incredibly useful tool, and we will certainly continue using it.
> However, for standard-based and commonly maintained statistics a more
> structured API seems warranted.
> 
> There are two things missing from these patches, which I initially
> planned to address as well: filtering, and refresh rate control.
> 
> Filtering doesn't need much explanation, users should be able to request
> only a subset of statistics (like only SW stats or only given ID).  The
> bitmap of statistics in each group is there for filtering later on.
> 
> By refresh control I mean the ability for user space to indicate how
> "fresh" values it expects.  Sometimes reading the HW counters requires
> slow register reads or FW communication, in such cases drivers may cache
> the result.  (Privileged) user space should be able to add a "not older
> than" timestamp to indicate how fresh statistics it expects.  And vice
> versa, drivers can then also put the timestamp of when the statistics
> were last refreshed in the dump for more precise bandwidth estimation.

Another thing that we cannot quite do with ethtool right now, at least
not easily, is something like the following use case.

You have some filtering/classification capable hardware, and the HW can
count the number of times a rule has been hit/missed. The number of
rules programmed into the HW is dynamic and depends on use case so
dumping them all is not convenient for e.g.: hundreds/thousands of rules.

You would want to return only the rules that are active/enabled, and not
the full possible range of rules. With ethtool, this is not possible
because you have to define the strings first, and in a second call, you
are going to get the dump and fill in the data returned to user-space...

I will review more in depth, but the idea looks great so far.

> 
> Jakub Kicinski (14):
>   nfp: remove unused structure
>   nfp: constify parameter to nfp_port_from_netdev()
>   net: hstats: add basic/core functionality
>   net: hstats: allow hierarchies to be built
>   nfp: very basic hstat support
>   net: hstats: allow iterators
>   net: hstats: help in iteration over directions
>   nfp: hstats: make use of iteration for direction
>   nfp: hstats: add driver and device per queue statistics
>   net: hstats: add IEEE 802.3 and common IETF MIB/RMON stats
>   nfp: hstats: add IEEE/RMON ethernet port/MAC stats
>   net: hstats: add markers for partial groups
>   nfp: hstats: add a partial group of per-8021Q prio stats
>   Documentation: networking: describe new hstat API
> 
>  Documentation/networking/hstats.rst           | 590 +++++++++++++++
>  .../networking/hstats_flow_example.dot        |  11 +
>  Documentation/networking/index.rst            |   1 +
>  drivers/net/ethernet/netronome/nfp/Makefile   |   1 +
>  .../net/ethernet/netronome/nfp/nfp_hstat.c    | 474 ++++++++++++
>  drivers/net/ethernet/netronome/nfp/nfp_main.c |   1 +
>  drivers/net/ethernet/netronome/nfp/nfp_main.h |   2 +
>  drivers/net/ethernet/netronome/nfp/nfp_net.h  |  10 +-
>  .../ethernet/netronome/nfp/nfp_net_common.c   |   1 +
>  .../net/ethernet/netronome/nfp/nfp_net_repr.h |   2 +-
>  drivers/net/ethernet/netronome/nfp/nfp_port.c |   2 +-
>  drivers/net/ethernet/netronome/nfp/nfp_port.h |   2 +-
>  include/linux/netdevice.h                     |   9 +
>  include/net/hstats.h                          | 176 +++++
>  include/uapi/linux/if_link.h                  | 107 +++
>  net/core/Makefile                             |   2 +-
>  net/core/hstats.c                             | 682 ++++++++++++++++++
>  net/core/rtnetlink.c                          |  21 +
>  18 files changed, 2084 insertions(+), 10 deletions(-)
>  create mode 100644 Documentation/networking/hstats.rst
>  create mode 100644 Documentation/networking/hstats_flow_example.dot
>  create mode 100644 drivers/net/ethernet/netronome/nfp/nfp_hstat.c
>  create mode 100644 include/net/hstats.h
>  create mode 100644 net/core/hstats.c
> 


-- 
Florian

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [RFC 00/14] netlink/hierarchical stats
  2019-02-06 20:12 ` Florian Fainelli
@ 2019-02-07 16:23   ` Jakub Kicinski
  0 siblings, 0 replies; 27+ messages in thread
From: Jakub Kicinski @ 2019-02-07 16:23 UTC (permalink / raw)
  To: Florian Fainelli
  Cc: davem, oss-drivers, netdev, jiri, andrew, mkubecek, dsahern,
	simon.horman, jesse.brandeburg, maciejromanfijalkowski,
	vasundhara-v.volam, michael.chan, shalomt, idosch

On Wed, 6 Feb 2019 12:12:39 -0800, Florian Fainelli wrote:
> > By refresh control I mean the ability for user space to indicate how
> > "fresh" values it expects.  Sometimes reading the HW counters requires
> > slow register reads or FW communication, in such cases drivers may cache
> > the result.  (Privileged) user space should be able to add a "not older
> > than" timestamp to indicate how fresh statistics it expects.  And vice
> > versa, drivers can then also put the timestamp of when the statistics
> > were last refreshed in the dump for more precise bandwidth estimation.  
> 
> Another thing that we cannot quite do with ethtool right now, at least
> not easily, is something like the following use case.
> 
> You have some filtering/classification capable hardware, and the HW can
> count the number of times a rule has been hit/missed. The number of
> rules programmed into the HW is dynamic and depends on use case so
> dumping them all is not convenient for e.g.: hundreds/thousands of rules.

That raises the inevitable question of what is the source of the rules
i.e. which API has been used to configure them?

> You would want to return only the rules that are active/enabled, and not
> the full possible range of rules. With ethtool, this is not possible
> because you have to define the strings first, and in a second call, you
> are going to get the dump and fill in the data returned to user-space...

Interesting, if the driver is caching the stats it can remember both
last refresh and last change and return only the statistics which
changed since time X.  Would the "last changed" time stamp be of any
use to user space?  Probably not, right?

> I will review more in depth, but the idea looks great so far.

Thanks!

^ permalink raw reply	[flat|nested] 27+ messages in thread

end of thread, other threads:[~2019-02-07 16:23 UTC | newest]

Thread overview: 27+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-01-28 23:44 [RFC 00/14] netlink/hierarchical stats Jakub Kicinski
2019-01-28 23:44 ` [RFC 01/14] nfp: remove unused structure Jakub Kicinski
2019-01-28 23:44 ` [RFC 02/14] nfp: constify parameter to nfp_port_from_netdev() Jakub Kicinski
2019-01-28 23:44 ` [RFC 03/14] net: hstats: add basic/core functionality Jakub Kicinski
2019-01-30  4:18   ` David Ahern
2019-01-30 17:44     ` Jakub Kicinski
2019-01-30 22:33       ` David Ahern
2019-01-28 23:44 ` [RFC 04/14] net: hstats: allow hierarchies to be built Jakub Kicinski
2019-01-28 23:44 ` [RFC 05/14] nfp: very basic hstat support Jakub Kicinski
2019-01-28 23:44 ` [RFC 06/14] net: hstats: allow iterators Jakub Kicinski
2019-01-28 23:45 ` [RFC 07/14] net: hstats: help in iteration over directions Jakub Kicinski
2019-01-28 23:45 ` [RFC 08/14] nfp: hstats: make use of iteration for direction Jakub Kicinski
2019-01-28 23:45 ` [RFC 09/14] nfp: hstats: add driver and device per queue statistics Jakub Kicinski
2019-01-28 23:45 ` [RFC 10/14] net: hstats: add IEEE 802.3 and common IETF MIB/RMON stats Jakub Kicinski
2019-01-28 23:45 ` [RFC 11/14] nfp: hstats: add IEEE/RMON ethernet port/MAC stats Jakub Kicinski
2019-01-28 23:45 ` [RFC 12/14] net: hstats: add markers for partial groups Jakub Kicinski
2019-01-28 23:45 ` [RFC 13/14] nfp: hstats: add a partial group of per-8021Q prio stats Jakub Kicinski
2019-01-28 23:45 ` [RFC 14/14] Documentation: networking: describe new hstat API Jakub Kicinski
2019-01-30 22:14 ` [RFC 00/14] netlink/hierarchical stats Roopa Prabhu
2019-01-31  0:24   ` Jakub Kicinski
2019-01-31 16:16     ` Roopa Prabhu
2019-01-31 16:31       ` Roopa Prabhu
2019-01-31 19:30         ` Jakub Kicinski
2019-02-02 23:14           ` Roopa Prabhu
2019-02-06  4:45             ` Jakub Kicinski
2019-02-06 20:12 ` Florian Fainelli
2019-02-07 16:23   ` Jakub Kicinski

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.