All of lore.kernel.org
 help / color / mirror / Atom feed
* [patch net-next RFC 0/8] Add support for pipeline debug (dpipe)
@ 2017-02-16 15:22 Jiri Pirko
  2017-02-16 15:22 ` [patch net-next RFC 1/8] devlink: Support " Jiri Pirko
                   ` (10 more replies)
  0 siblings, 11 replies; 22+ messages in thread
From: Jiri Pirko @ 2017-02-16 15:22 UTC (permalink / raw)
  To: netdev
  Cc: davem, arkadis, idosch, mlxsw, jhs, ivecera, roopa, f.fainelli,
	vivien.didelot, john.fastabend, andrew

From: Jiri Pirko <jiri@mellanox.com>

Arkadi says:

While doing the hardware offloading process much of the hardware
specifics cannot be presented. An example for such is the routing
LPM algorithm which differ in hardware implementation from the
kernel software implementation. The only information the user receives
is whether specific route is offloaded or not, but he cannot really
understand the underlying implementation nor get the specific statistics
related to that process.

Another example is ACL offload using TC which is commonly implemented
using TCAM memory. Currently there is no capability to gain visibility
into the TCAM structure and to debug suboptimal resource allocation.

This patchset introduces capability for exporting the ASICs pipeline
abstraction via devlink infrastructure, which should serve as an
complementary tool. This infrastructure allows the user to get visibility
into the ASIC by modeling it as a set of match/action tables.

The main objects defined:
Table - abstraction for a single pipeline stage. Contains the
        available match/actions and counter availability.
Entry - entry in a specific table with specific matches/actions
        values and dedicated counter.
Header/field - tuples which describes the tables behavior.

As an example one of the ASIC's L3 blocks will be modeled. The egress
rif (router interface) table is the final step in the L3 pipeline
processing which does match on the internal rif index which was
determined before by the routing logic. The erif table determines
whether to forward or drop the packet and updates the corresponding
rif L3 statistics.

To expose this internal resources a special metadata header will
be introduced that describes the internal information gathered by
the ASIC's pipeline and contains the following fields: rif_port_index,
forward and drop.

Some internal hardware resources have direct mapping to kernel
objects. For example the rif_port_index is mapped to the net-devices
ifindex. By providing this mapping the users gains visibility into
the offloading process.

Follow-up work will include exporting more L3 tables which will give
visibility into the routing process.

First stage is adding support for dpipe in devlink. Next add support
in spectrum driver. Finally implement egress router interface
(erif) table for spectrum ASIC as an example.

Arkadi Sharshevsky (8):
  devlink: Support for pipeline debug (dpipe)
  mlxsw: spectrum: Add support for flow counter allocator
  mlxsw: reg: Add counter fields to RITR register
  mlxsw: spectrum: Add placeholder for dpipe
  mlxsw: spectrum: Add definition for egress rif table
  mlxsw: reg: Add Router Interface Counter Register
  mlxsw: spectrum: Support for counters on router interfaces
  mlxsw: spectrum: Add Support for erif table entries access

 drivers/net/ethernet/mellanox/mlxsw/Makefile       |   3 +-
 drivers/net/ethernet/mellanox/mlxsw/reg.h          | 178 +++++
 drivers/net/ethernet/mellanox/mlxsw/resources.h    |   2 +
 drivers/net/ethernet/mellanox/mlxsw/spectrum.c     | 163 +++++
 drivers/net/ethernet/mellanox/mlxsw/spectrum.h     |  21 +
 drivers/net/ethernet/mellanox/mlxsw/spectrum_cnt.c | 182 +++++
 drivers/net/ethernet/mellanox/mlxsw/spectrum_cnt.h |  56 ++
 .../net/ethernet/mellanox/mlxsw/spectrum_dpipe.c   | 303 +++++++++
 .../net/ethernet/mellanox/mlxsw/spectrum_dpipe.h   |  43 ++
 include/net/devlink.h                              | 224 +++++-
 include/uapi/linux/devlink.h                       |  50 +-
 net/core/devlink.c                                 | 747 +++++++++++++++++++++
 12 files changed, 1969 insertions(+), 3 deletions(-)
 create mode 100644 drivers/net/ethernet/mellanox/mlxsw/spectrum_cnt.c
 create mode 100644 drivers/net/ethernet/mellanox/mlxsw/spectrum_cnt.h
 create mode 100644 drivers/net/ethernet/mellanox/mlxsw/spectrum_dpipe.c
 create mode 100644 drivers/net/ethernet/mellanox/mlxsw/spectrum_dpipe.h

-- 
2.7.4

^ permalink raw reply	[flat|nested] 22+ messages in thread

* [patch net-next RFC 1/8] devlink: Support for pipeline debug (dpipe)
  2017-02-16 15:22 [patch net-next RFC 0/8] Add support for pipeline debug (dpipe) Jiri Pirko
@ 2017-02-16 15:22 ` Jiri Pirko
  2017-02-16 15:57   ` John Fastabend
  2017-02-17  8:49   ` Simon Horman
  2017-02-16 15:22 ` [patch net-next RFC 2/8] mlxsw: spectrum: Add support for flow counter allocator Jiri Pirko
                   ` (9 subsequent siblings)
  10 siblings, 2 replies; 22+ messages in thread
From: Jiri Pirko @ 2017-02-16 15:22 UTC (permalink / raw)
  To: netdev
  Cc: davem, arkadis, idosch, mlxsw, jhs, ivecera, roopa, f.fainelli,
	vivien.didelot, john.fastabend, andrew

From: Arkadi Sharshevsky <arkadis@mellanox.com>

The pipeline debug is used to export the pipeline abstractions
for the main objects - tables, headers and entries. The only support for
set is for changing the counter parameter on specific table.

The basic structures:

Header - can represent a real protocol header information or internal
         metadata. Generic protocol headers like IPv4 can be shared
         between drivers. Each driver can add local headers.

Field - part of a header. Can represent protocol field or specific
        ASIC metadata field. Hardware special metadata fields can
        be mapped to different resources, for example switch ASIC
       	ports can have internal number which from the systems
        point of view is mapped to netdeivce ifindex.

Hfield - Specific combination of header:field. This object is used
         to represent the table behavior in terms of match/action.

Hfield_val - Represents value of specific Hfield.

Table - represents a hardware block which can be described with
        match/action behavior. The match/action can be done on the
        packets data or on the internal metadata that it gathered
        along the packets traversal throw the pipeline which is vendor
        specific and should be exported in order to provide
        understanding of ASICs behavior.

Entry - represents single record in a specific table. The entry is
        identified by specific combination of Hfield_vals for match
        /action.

Prior to accessing the tables/entries the drivers provide the header/
field data base which is used by driver to user-space. The data base
is split between the shared headers and unique headers.

Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
---
 include/net/devlink.h        | 224 ++++++++++++-
 include/uapi/linux/devlink.h |  50 ++-
 net/core/devlink.c           | 747 +++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 1019 insertions(+), 2 deletions(-)

diff --git a/include/net/devlink.h b/include/net/devlink.h
index d29e5fc..d480669 100644
--- a/include/net/devlink.h
+++ b/include/net/devlink.h
@@ -25,6 +25,8 @@ struct devlink {
 	struct list_head list;
 	struct list_head port_list;
 	struct list_head sb_list;
+	struct list_head dpipe_table_list;
+	struct devlink_dpipe_headers *dpipe_headers;
 	const struct devlink_ops *ops;
 	struct device *dev;
 	possible_net_t _net;
@@ -49,6 +51,157 @@ struct devlink_sb_pool_info {
 	enum devlink_sb_threshold_type threshold_type;
 };
 
+/**
+ * struct devlink_dpipe_field - dpipe field object
+ * @name: field name
+ * @id: id inside header fields array
+ * @bitwidth: bitwidth
+ * @mapping_type: mapping type
+ */
+struct devlink_dpipe_field {
+	const char *name;
+	unsigned int id;
+	unsigned int bitwidth;
+	enum devlink_dpipe_field_mapping_type mapping_type;
+};
+
+/**
+ * struct devlink_dpipe_header - dpipe header object
+ * @name: header name
+ * @id: id, global/local detrmined by global bit
+ * @fields: fields
+ * @fields_count: number of fields
+ * @global: indicates of header is shared like most protocol header
+ *	    or driver specific
+ */
+struct devlink_dpipe_header {
+	const char *name;
+	unsigned int id;
+	struct devlink_dpipe_field *fields;
+	unsigned int fields_count;
+	bool global;
+};
+
+/**
+ * struct devlink_dpipe_hfield - represents header:field tuple
+ * @header: header
+ * @filed_id: field index
+ */
+struct devlink_dpipe_hfield {
+	struct devlink_dpipe_header *header;
+	unsigned int field_id;
+};
+
+/**
+ * struct devlink_dpipe_hfield_val - represents a value of specific
+ *				     tuple
+ * @hfield: the tuple the value is relevant for
+ * @mapping_value: in case the field has some mapping this value
+ *                 specified the mapping value
+ * @value_size: value size
+ * @value: the value
+ * @mask: bit mask
+ */
+struct devlink_dpipe_hfield_val {
+	struct devlink_dpipe_hfield *hfield;
+	unsigned int mapping_value;
+	unsigned int value_size;
+	void *value;
+	void *mask;
+};
+
+/**
+ * struct devlink_dpipe_entry - table entry object
+ * @index: index of the entry in the table
+ * @matches: tuple match values
+ * @matches_count: count of matches tuples
+ * @actions: tuple actions values
+ * @actions_count: count of actions values
+ * @counter: value of counter
+ * @counter_valid: Specify if value is valid from hardware
+ */
+struct devlink_dpipe_entry {
+	unsigned int index;
+	struct devlink_dpipe_hfield_val *matches;
+	unsigned int matches_count;
+	struct devlink_dpipe_hfield_val *actions;
+	unsigned int actions_count;
+	u64 counter;
+	bool counter_valid;
+};
+
+/**
+ * struct devlink_dpipe_dump_ctx - context provided to driver in order
+ *				   to dump
+ * @info: info
+ * @cmd: devlink command
+ * @skb: skb
+ * @nest: top attribute
+ * @hdr: hdr
+ */
+struct devlink_dpipe_dump_ctx {
+	struct genl_info *info;
+	enum devlink_command cmd;
+	struct sk_buff *skb;
+	struct nlattr *nest;
+	void *hdr;
+};
+
+struct devlink_dpipe_table_ops;
+
+/**
+ * struct devlink_dpipe_table - table object
+ * @priv: private
+ * @name: table name
+ * @size: maximum number of entries
+ * @matches: header:field match tuples
+ * @matches_count: count of match tuples
+ * @actions: header:field action tuples
+ * @actions_count: count of action tuples
+ * @counters_enabled: indicates if counters are active
+ * @counter_control_extern: indicates if counter control is in dpipe or
+ *			    external tool
+ * @table_ops: table operations
+ * @rcu: rcu
+ */
+struct devlink_dpipe_table {
+	void *priv;
+	struct list_head list;
+	const char *name;
+	int size;
+	struct devlink_dpipe_hfield *matches;
+	unsigned int matches_count;
+	struct devlink_dpipe_hfield *actions;
+	unsigned int actions_count;
+	bool counters_enabled;
+	bool counter_control_extern;
+	struct devlink_dpipe_table_ops *table_ops;
+	struct rcu_head rcu;
+};
+
+/**
+ * struct devlink_dpipe_table_ops - dpipe_table ops
+ * @entries_dump - dumps all active entries in the table
+ * @counter_set_update - when changing the counter status hardware sync
+ *			 maybe needed to allocate/free counter related
+ *			 resources
+ */
+struct devlink_dpipe_table_ops {
+	int (*entries_dump)(void *priv, bool counters_enabled,
+			    struct devlink_dpipe_dump_ctx *dump_ctx);
+	int (*counter_set_update)(void *priv, bool enable);
+};
+
+/**
+ * struct devlink_dpipe_headers - dpipe headers
+ * @headers - header array can be shared(global bit) or local
+ * @headers_count - count of headers
+ */
+struct devlink_dpipe_headers {
+	struct devlink_dpipe_header **headers;
+	unsigned int headers_count;
+};
+
 struct devlink_ops {
 	int (*port_type_set)(struct devlink_port *devlink_port,
 			     enum devlink_port_type port_type);
@@ -132,7 +285,26 @@ int devlink_sb_register(struct devlink *devlink, unsigned int sb_index,
 			u16 egress_pools_count, u16 ingress_tc_count,
 			u16 egress_tc_count);
 void devlink_sb_unregister(struct devlink *devlink, unsigned int sb_index);
-
+int devlink_dpipe_table_register(struct devlink *devlink,
+				 const char *table_name,
+				 struct devlink_dpipe_table_ops *table_ops,
+				 void *priv,
+				 struct devlink_dpipe_hfield *matches,
+				 unsigned int matches_count,
+				 struct devlink_dpipe_hfield *actions,
+				 unsigned int actions_count,
+				 bool counter_control_extern);
+void devlink_dpipe_table_unregister(struct devlink *devlink,
+				    const char *table_name);
+int devlink_dpipe_headers_register(struct devlink *devlink,
+				   struct devlink_dpipe_headers *dpipe_headers);
+void devlink_dpipe_headers_unregister(struct devlink *devlink);
+bool devlink_dpipe_table_counter_enabled(struct devlink *devlink,
+					 const char *table_name);
+int devlink_dpipe_entry_prepare_ctx(struct devlink_dpipe_dump_ctx *dump_ctx);
+int devlink_dpipe_entry_append_ctx(struct devlink_dpipe_dump_ctx *dump_ctx,
+				   struct devlink_dpipe_entry *entry);
+int devlink_dpipe_entry_close_ctx(struct devlink_dpipe_dump_ctx *dump_ctx);
 #else
 
 static inline struct devlink *devlink_alloc(const struct devlink_ops *ops,
@@ -200,6 +372,56 @@ static inline void devlink_sb_unregister(struct devlink *devlink,
 {
 }
 
+int devlink_dpipe_table_register(struct devlink *devlink,
+				 const char *table_name,
+				 struct devlink_dpipe_table_ops *table_ops,
+				 void *priv,
+				 struct devlink_dpipe_hfield *matches,
+				 unsigned int matches_count,
+				 struct devlink_dpipe_hfield *actions,
+				 unsigned int actions_count,
+				 bool counter_control_extern)
+{
+	return 0;
+}
+
+void devlink_dpipe_table_unregister(struct devlink *devlink,
+				    const char *table_name);
+{
+}
+
+int devlink_dpipe_headers_register(struct devlink *devlink,
+				   struct devlink_dpipe_headers *dpipe_headers);
+{
+	return 0;
+}
+
+void devlink_dpipe_headers_unregister(struct devlink *devlink)
+{
+}
+
+bool devlink_dpipe_table_counter_enabled(struct devlink *devlink,
+					 const char *table_name)
+{
+	return false;
+}
+
+int devlink_dpipe_entry_prepare_ctx(struct devlink_dpipe_dump_ctx *dump_ctx)
+{
+	return 0;
+}
+
+int devlink_dpipe_entry_append_ctx(struct devlink_dpipe_dump_ctx *dump_ctx,
+				   struct devlink_dpipe_entry *entry)
+{
+	return 0;
+}
+
+int devlink_dpipe_entry_close_ctx(struct devlink_dpipe_dump_ctx *dump_ctx)
+{
+	return 0;
+}
+
 #endif
 
 #endif /* _NET_DEVLINK_H_ */
diff --git a/include/uapi/linux/devlink.h b/include/uapi/linux/devlink.h
index 0f1f3a1..bbab55d 100644
--- a/include/uapi/linux/devlink.h
+++ b/include/uapi/linux/devlink.h
@@ -65,8 +65,12 @@ enum devlink_command {
 #define DEVLINK_CMD_ESWITCH_MODE_SET /* obsolete, never use this! */ \
 	DEVLINK_CMD_ESWITCH_SET
 
-	/* add new commands above here */
+	DEVLINK_CMD_DPIPE_TABLES_GET,
+	DEVLINK_CMD_DPIPE_ENTRIES_GET,
+	DEVLINK_CMD_DPIPE_HEADERS_GET,
+	DEVLINK_CMD_DPIPE_TABLE_COUNTER_SET,
 
+	/* add new commands above here */
 	__DEVLINK_CMD_MAX,
 	DEVLINK_CMD_MAX = __DEVLINK_CMD_MAX - 1
 };
@@ -148,10 +152,54 @@ enum devlink_attr {
 	DEVLINK_ATTR_ESWITCH_MODE,		/* u16 */
 	DEVLINK_ATTR_ESWITCH_INLINE_MODE,	/* u8 */
 
+	DEVLINK_ATTR_DPIPE_TABLES,		/* nested */
+	DEVLINK_ATTR_DPIPE_TABLE,		/* nested */
+	DEVLINK_ATTR_DPIPE_TABLE_NAME,		/* string */
+	DEVLINK_ATTR_DPIPE_TABLE_SIZE,		/* u32 */
+	DEVLINK_ATTR_DPIPE_TABLE_MATCHES,	/* nested */
+	DEVLINK_ATTR_DPIPE_TABLE_ACTIONS,	/* nested */
+	DEVLINK_ATTR_DPIPE_TABLE_COUNTERS_ENABLED,	/* flag */
+
+	DEVLINK_ATTR_DPIPE_HFIELD,		/* nested */
+	DEVLINK_ATTR_DPIPE_HFIELD_VALUE,
+	DEVLINK_ATTR_DPIPE_HFIELD_MASK,
+	DEVLINK_ATTR_DPIPE_HFIELD_MAPPING,	/* u32 */
+
+	DEVLINK_ATTR_DPIPE_ENTRIES,		/* nested */
+	DEVLINK_ATTR_DPIPE_ENTRY,		/* nested */
+	DEVLINK_ATTR_DPIPE_ENTRY_INDEX,		/* u32 */
+	DEVLINK_ATTR_DPIPE_ENTRY_MATCHES,	/* nested */
+	DEVLINK_ATTR_DPIPE_ENTRY_ACTIONS,	/* nested */
+	DEVLINK_ATTR_DPIPE_ENTRY_COUNTER,	/* u64 */
+	DEVLINK_ATTR_DPIPE_ENTRY_COUNTER_VALID,	/* u32 */
+
+	DEVLINK_ATTR_DPIPE_HEADERS,		/* nested */
+	DEVLINK_ATTR_DPIPE_HEADER,		/* nested */
+	DEVLINK_ATTR_DPIPE_HEADER_NAME,		/* string */
+	DEVLINK_ATTR_DPIPE_HEADER_ID,		/* u32 */
+	DEVLINK_ATTR_DPIPE_HEADER_FIELDS,	/* string */
+	DEVLINK_ATTR_DPIPE_HEADER_GLOBAL,	/* flag */
+
+	DEVLINK_ATTR_DPIPE_FIELD,		/* nested */
+	DEVLINK_ATTR_DPIPE_FIELD_NAME,		/* string */
+	DEVLINK_ATTR_DPIPE_FIELD_ID,		/* u32 */
+	DEVLINK_ATTR_DPIPE_FIELD_BITWIDTH,	/* u32 */
+	DEVLINK_ATTR_DPIPE_FIELD_MAPPING_TYPE,	/* u32 */
+
+	DEVLINK_ATTR_PAD,
+
 	/* add new attributes above here, update the policy in devlink.c */
 
 	__DEVLINK_ATTR_MAX,
 	DEVLINK_ATTR_MAX = __DEVLINK_ATTR_MAX - 1
 };
 
+/* Mapping between internal resource described by the field and system
+ * structure
+ */
+enum devlink_dpipe_field_mapping_type {
+	DEVLINK_DPIPE_FIELD_MAPPING_TYPE_NONE,
+	DEVLINK_DPIPE_FIELD_MAPPING_TYPE_IFINDEX,
+};
+
 #endif /* _UAPI_LINUX_DEVLINK_H_ */
diff --git a/net/core/devlink.c b/net/core/devlink.c
index e9c1e6a..286d7d2 100644
--- a/net/core/devlink.c
+++ b/net/core/devlink.c
@@ -1493,8 +1493,588 @@ static int devlink_nl_cmd_eswitch_set_doit(struct sk_buff *skb,
 		if (err)
 			return err;
 	}
+	return 0;
+}
+
+static int devlink_dpipe_hfield_put(struct sk_buff *skb,
+				    struct devlink_dpipe_hfield *hfield)
+{
+	struct devlink_dpipe_header *header = hfield->header;
+	struct devlink_dpipe_field *field = &header->fields[hfield->field_id];
+
+	if (nla_put_u32(skb, DEVLINK_ATTR_DPIPE_HEADER_ID, header->id) ||
+	    nla_put_u32(skb, DEVLINK_ATTR_DPIPE_FIELD_ID, field->id))
+		return -EMSGSIZE;
+
+	if (header->global)
+		if (nla_put_flag(skb, DEVLINK_ATTR_DPIPE_HEADER_GLOBAL))
+			return -EMSGSIZE;
+	return 0;
+}
+
+static int devlink_dpipe_hfield_val_put(struct sk_buff *skb,
+					struct devlink_dpipe_hfield_val *val)
+{
+	struct devlink_dpipe_header *header;
+	struct devlink_dpipe_field *field;
+
+	if (!val->hfield)
+		return -EINVAL;
+	if (devlink_dpipe_hfield_put(skb, val->hfield))
+		return -EMSGSIZE;
+	if (nla_put(skb, DEVLINK_ATTR_DPIPE_HFIELD_VALUE,
+		    val->value_size, val->value))
+		return -EMSGSIZE;
+	if (val->mask)
+		if (nla_put(skb, DEVLINK_ATTR_DPIPE_HFIELD_MASK,
+			    val->value_size, val->mask))
+			return -EMSGSIZE;
+
+	header = val->hfield->header;
+	field = &header->fields[val->hfield->field_id];
+	if (field->mapping_type)
+		if (nla_put_u32(skb, DEVLINK_ATTR_DPIPE_HFIELD_MAPPING,
+				val->mapping_value))
+			return -EMSGSIZE;
+	return 0;
+}
+
+static int devlink_dpipe_hfield_vals_put(struct sk_buff *skb,
+					 struct devlink_dpipe_hfield_val *vals,
+					 unsigned int vals_count)
+{
+	struct nlattr *hfield_attr;
+	int i;
+	int err;
+
+	for (i = 0; i < vals_count; i++) {
+		hfield_attr = nla_nest_start(skb, DEVLINK_ATTR_DPIPE_HFIELD);
+		if (!hfield_attr)
+			return -EMSGSIZE;
+		err = devlink_dpipe_hfield_val_put(skb, &vals[i]);
+		if (err)
+			goto err_hfield_val_put;
+		nla_nest_end(skb, hfield_attr);
+	}
+	return 0;
+
+err_hfield_val_put:
+	nla_nest_cancel(skb, hfield_attr);
+	return err;
+}
+
+static int devlink_dpipe_hfields_put(struct sk_buff *skb,
+				     struct devlink_dpipe_hfield *hfields,
+				     unsigned int hfields_count)
+{
+	struct nlattr *hfield_attr;
+	int i;
+
+	for (i = 0; i < hfields_count; i++) {
+		hfield_attr = nla_nest_start(skb, DEVLINK_ATTR_DPIPE_HFIELD);
+		if (!hfield_attr)
+			return -EMSGSIZE;
+		if (devlink_dpipe_hfield_put(skb, &hfields[i]))
+			goto nla_put_failure;
+		nla_nest_end(skb, hfield_attr);
+	}
+	return 0;
+
+nla_put_failure:
+	nla_nest_cancel(skb, hfield_attr);
+	return -EMSGSIZE;
+}
+
+static int devlink_dpipe_table_put(struct sk_buff *skb,
+				   struct devlink_dpipe_table *table)
+{
+	struct nlattr *matches_attr, *actions_attr, *table_attr;
+
+	table_attr = nla_nest_start(skb, DEVLINK_ATTR_DPIPE_TABLE);
+	if (!table_attr)
+		return -EMSGSIZE;
+
+	if (nla_put_string(skb, DEVLINK_ATTR_DPIPE_TABLE_NAME, table->name) ||
+	    nla_put_u32(skb, DEVLINK_ATTR_DPIPE_TABLE_SIZE, table->size))
+		goto nla_put_failure;
+	if (table->counters_enabled)
+		if (nla_put_flag(skb,
+				 DEVLINK_ATTR_DPIPE_TABLE_COUNTERS_ENABLED))
+			goto nla_put_failure;
+
+	matches_attr = nla_nest_start(skb, DEVLINK_ATTR_DPIPE_TABLE_MATCHES);
+	if (!matches_attr)
+		goto nla_put_failure;
+
+	if (devlink_dpipe_hfields_put(skb, table->matches,
+				      table->matches_count)) {
+		nla_nest_cancel(skb, matches_attr);
+		goto nla_put_failure;
+	}
+
+	actions_attr = nla_nest_start(skb, DEVLINK_ATTR_DPIPE_TABLE_ACTIONS);
+	if (!actions_attr)
+		goto nla_put_failure;
+
+	if (devlink_dpipe_hfields_put(skb, table->actions,
+				      table->actions_count)) {
+		nla_nest_cancel(skb, actions_attr);
+		goto nla_put_failure;
+	}
+
+	nla_nest_end(skb, actions_attr);
+	nla_nest_end(skb, table_attr);
+	return 0;
+
+nla_put_failure:
+	nla_nest_cancel(skb, table_attr);
+	return -EMSGSIZE;
+}
+
+static int devlink_dpipe_send_and_alloc_skb(struct sk_buff **pskb,
+					    struct genl_info *info)
+{
+	int err;
+
+	if (*pskb) {
+		err = genlmsg_reply(*pskb, info);
+		if (err)
+			return err;
+	}
+	*pskb = genlmsg_new(GENLMSG_DEFAULT_SIZE, GFP_KERNEL);
+	if (!*pskb)
+		return -ENOMEM;
+	return 0;
+}
+
+static int devlink_dpipe_tables_fill(struct genl_info *info,
+				     enum devlink_command cmd, int flags,
+				     struct list_head *dpipe_tables)
+
+{
+	struct devlink *devlink = info->user_ptr[0];
+	struct devlink_dpipe_table *table;
+	struct nlattr *tables_attr;
+	struct sk_buff *skb = NULL;
+	struct nlmsghdr *nlh;
+	bool incomplete;
+	void *hdr;
+	int i;
+	int err;
+
+	table = list_first_entry(dpipe_tables,
+				 struct devlink_dpipe_table, list);
+start_again:
+	err = devlink_dpipe_send_and_alloc_skb(&skb, info);
+	if (err)
+		return err;
+
+	hdr = genlmsg_put(skb, info->snd_portid, info->snd_seq,
+			  &devlink_nl_family, NLM_F_MULTI, cmd);
+	if (!hdr)
+		return -EMSGSIZE;
 
+	if (devlink_nl_put_handle(skb, devlink))
+		goto nla_put_failure;
+	tables_attr = nla_nest_start(skb, DEVLINK_ATTR_DPIPE_TABLES);
+	if (!tables_attr)
+		goto nla_put_failure;
+
+	i = 0;
+	incomplete = false;
+	list_for_each_entry_from(table, dpipe_tables, list) {
+		err = devlink_dpipe_table_put(skb, table);
+		if (err) {
+			if (!i)
+				goto err_table_put;
+			incomplete = true;
+			break;
+		}
+		i++;
+	}
+
+	nla_nest_end(skb, tables_attr);
+	genlmsg_end(skb, hdr);
+	if (incomplete)
+		goto start_again;
+
+send_done:
+	nlh = nlmsg_put(skb, info->snd_portid, info->snd_seq,
+			NLMSG_DONE, 0, flags | NLM_F_MULTI);
+	if (!nlh) {
+		err = devlink_dpipe_send_and_alloc_skb(&skb, info);
+		if (err)
+			goto err_skb_send_alloc;
+		goto send_done;
+	}
+
+	return genlmsg_reply(skb, info);
+
+nla_put_failure:
+	err = -EMSGSIZE;
+err_table_put:
+err_skb_send_alloc:
+	genlmsg_cancel(skb, hdr);
+	nlmsg_free(skb);
+	return err;
+}
+
+static int devlink_nl_cmd_dpipe_tables_get(struct sk_buff *skb,
+					   struct genl_info *info)
+{
+	struct devlink *devlink = info->user_ptr[0];
+
+	return devlink_dpipe_tables_fill(info, DEVLINK_CMD_DPIPE_TABLES_GET, 0,
+					 &devlink->dpipe_table_list);
+}
+
+static int devlink_dpipe_entry_put(struct sk_buff *skb,
+				   struct devlink_dpipe_entry *entry)
+{
+	struct nlattr *entry_attr, *matches_attr, *actions_attr;
+	int err;
+
+	entry_attr = nla_nest_start(skb, DEVLINK_ATTR_DPIPE_ENTRY);
+	if (!entry_attr)
+		return  -EMSGSIZE;
+
+	if (nla_put_u32(skb, DEVLINK_ATTR_DPIPE_ENTRY_INDEX, entry->index))
+		goto nla_put_failure;
+	if (entry->counter_valid)
+		if (nla_put_u64_64bit(skb, DEVLINK_ATTR_DPIPE_ENTRY_COUNTER,
+				      entry->counter, DEVLINK_ATTR_PAD))
+			goto nla_put_failure;
+
+	matches_attr = nla_nest_start(skb, DEVLINK_ATTR_DPIPE_ENTRY_MATCHES);
+	if (!matches_attr)
+		goto nla_put_failure;
+
+	err = devlink_dpipe_hfield_vals_put(skb, entry->matches,
+					    entry->matches_count);
+	if (err) {
+		nla_nest_cancel(skb, matches_attr);
+		goto err_hfield_vals_put;
+	}
+	nla_nest_end(skb, matches_attr);
+
+	actions_attr = nla_nest_start(skb, DEVLINK_ATTR_DPIPE_ENTRY_ACTIONS);
+	if (!actions_attr)
+		goto nla_put_failure;
+
+	err = devlink_dpipe_hfield_vals_put(skb, entry->actions,
+					    entry->actions_count);
+	if (err) {
+		nla_nest_cancel(skb, actions_attr);
+		goto err_hfield_vals_put;
+	}
+	nla_nest_end(skb, actions_attr);
+
+	nla_nest_end(skb, entry_attr);
 	return 0;
+
+nla_put_failure:
+	err = -EMSGSIZE;
+err_hfield_vals_put:
+	nla_nest_cancel(skb, entry_attr);
+	return err;
+}
+
+static struct devlink_dpipe_table *
+devlink_dpipe_table_find(struct list_head *dpipe_tables,
+			 const char *table_name)
+{
+	struct devlink_dpipe_table *table;
+
+	list_for_each_entry_rcu(table, dpipe_tables, list) {
+		if (!strcmp(table->name, table_name))
+			return table;
+	}
+	return NULL;
+}
+
+int devlink_dpipe_entry_prepare_ctx(struct devlink_dpipe_dump_ctx *dump_ctx)
+{
+	struct devlink *devlink;
+	int err;
+
+	err = devlink_dpipe_send_and_alloc_skb(&dump_ctx->skb,
+					       dump_ctx->info);
+	if (err)
+		return err;
+
+	dump_ctx->hdr = genlmsg_put(dump_ctx->skb,
+				    dump_ctx->info->snd_portid,
+				    dump_ctx->info->snd_seq,
+				    &devlink_nl_family, NLM_F_MULTI,
+				    dump_ctx->cmd);
+	if (!dump_ctx->hdr)
+		goto nla_put_failure;
+
+	devlink = dump_ctx->info->user_ptr[0];
+	if (devlink_nl_put_handle(dump_ctx->skb, devlink))
+		goto nla_put_failure;
+	dump_ctx->nest = nla_nest_start(dump_ctx->skb,
+					DEVLINK_ATTR_DPIPE_ENTRIES);
+	if (!dump_ctx->nest)
+		goto nla_put_failure;
+	return 0;
+
+nla_put_failure:
+	genlmsg_cancel(dump_ctx->skb, dump_ctx->hdr);
+	nlmsg_free(dump_ctx->skb);
+	return -EMSGSIZE;
+}
+
+int devlink_dpipe_entry_append_ctx(struct devlink_dpipe_dump_ctx *dump_ctx,
+				   struct devlink_dpipe_entry *entry)
+{
+	return devlink_dpipe_entry_put(dump_ctx->skb, entry);
+}
+
+int devlink_dpipe_entry_close_ctx(struct devlink_dpipe_dump_ctx *dump_ctx)
+{
+	nla_nest_end(dump_ctx->skb, dump_ctx->nest);
+	genlmsg_end(dump_ctx->skb, dump_ctx->hdr);
+	return 0;
+}
+
+static int devlink_dpipe_entries_fill(struct genl_info *info,
+				      enum devlink_command cmd, int flags,
+				      struct devlink_dpipe_table *table)
+{
+	struct devlink_dpipe_dump_ctx dump_ctx;
+	struct nlmsghdr *nlh;
+	int err;
+
+	dump_ctx.skb = NULL;
+	dump_ctx.cmd = cmd;
+	dump_ctx.info = info;
+
+	err = table->table_ops->entries_dump(table->priv,
+					     table->counters_enabled,
+					     &dump_ctx);
+	if (err)
+		goto err_entries_dump;
+
+send_done:
+	nlh = nlmsg_put(dump_ctx.skb, info->snd_portid, info->snd_seq,
+			NLMSG_DONE, 0, flags | NLM_F_MULTI);
+	if (!nlh) {
+		err = devlink_dpipe_send_and_alloc_skb(&dump_ctx.skb, info);
+		if (err)
+			goto err_skb_send_alloc;
+		goto send_done;
+	}
+	return genlmsg_reply(dump_ctx.skb, info);
+
+err_entries_dump:
+err_skb_send_alloc:
+	genlmsg_cancel(dump_ctx.skb, dump_ctx.hdr);
+	nlmsg_free(dump_ctx.skb);
+	return err;
+}
+
+static int devlink_nl_cmd_dpipe_entries_get(struct sk_buff *skb,
+					    struct genl_info *info)
+{
+	struct devlink *devlink = info->user_ptr[0];
+	struct devlink_dpipe_table *table;
+	const char *table_name;
+
+	if (!info->attrs[DEVLINK_ATTR_DPIPE_TABLE_NAME])
+		return -EINVAL;
+
+	table_name = nla_data(info->attrs[DEVLINK_ATTR_DPIPE_TABLE_NAME]);
+	table = devlink_dpipe_table_find(&devlink->dpipe_table_list,
+					 table_name);
+	if (!table)
+		return -EINVAL;
+
+	if (!table->table_ops->entries_dump)
+		return -EINVAL;
+
+	return devlink_dpipe_entries_fill(info, DEVLINK_CMD_DPIPE_ENTRIES_GET,
+					  0, table);
+}
+
+static int devlink_dpipe_fields_put(struct sk_buff *skb,
+				    const struct devlink_dpipe_header *header)
+{
+	struct devlink_dpipe_field *field;
+	struct nlattr *field_attr;
+	int i;
+
+	for (i = 0; i < header->fields_count; i++) {
+		field = &header->fields[i];
+		field_attr = nla_nest_start(skb, DEVLINK_ATTR_DPIPE_FIELD);
+		if (!field_attr)
+			return -EMSGSIZE;
+		if (nla_put_string(skb, DEVLINK_ATTR_DPIPE_FIELD_NAME, field->name) ||
+		    nla_put_u32(skb, DEVLINK_ATTR_DPIPE_FIELD_ID, field->id) ||
+		    nla_put_u32(skb, DEVLINK_ATTR_DPIPE_FIELD_BITWIDTH, field->bitwidth))
+			goto nla_put_failure;
+		if (field->mapping_type)
+			if (nla_put_u32(skb,
+					DEVLINK_ATTR_DPIPE_FIELD_MAPPING_TYPE,
+					field->mapping_type))
+				goto nla_put_failure;
+		nla_nest_end(skb, field_attr);
+	}
+	return 0;
+
+nla_put_failure:
+	nla_nest_cancel(skb, field_attr);
+	return -EMSGSIZE;
+}
+
+static int devlink_dpipe_header_put(struct sk_buff *skb,
+				    struct devlink_dpipe_header *header)
+{
+	struct nlattr *fields_attr, *header_attr;
+	int err;
+
+	header_attr = nla_nest_start(skb, DEVLINK_ATTR_DPIPE_HEADER);
+	if (!header)
+		return -EMSGSIZE;
+
+	if (nla_put_string(skb, DEVLINK_ATTR_DPIPE_HEADER_NAME, header->name) ||
+	    nla_put_u32(skb, DEVLINK_ATTR_DPIPE_HEADER_ID, header->id))
+		goto nla_put_failure;
+
+	fields_attr = nla_nest_start(skb, DEVLINK_ATTR_DPIPE_HEADER_FIELDS);
+	if (!fields_attr)
+		goto nla_put_failure;
+
+	err = devlink_dpipe_fields_put(skb, header);
+	if (err) {
+		nla_nest_cancel(skb, fields_attr);
+		goto nla_put_failure;
+	}
+	nla_nest_end(skb, fields_attr);
+	nla_nest_end(skb, header_attr);
+	return 0;
+
+nla_put_failure:
+	err = -EMSGSIZE;
+	nla_nest_cancel(skb, header_attr);
+	return err;
+}
+
+static int devlink_dpipe_headers_fill(struct genl_info *info,
+				      enum devlink_command cmd, int flags,
+				      struct devlink_dpipe_headers *
+				      dpipe_headers)
+{
+	struct devlink *devlink = info->user_ptr[0];
+	struct nlattr *headers_attr;
+	struct sk_buff *skb = NULL;
+	struct nlmsghdr *nlh;
+	void *hdr;
+	int i, j;
+	int err;
+
+	i = 0;
+start_again:
+	err = devlink_dpipe_send_and_alloc_skb(&skb, info);
+	if (err)
+		return err;
+
+	hdr = genlmsg_put(skb, info->snd_portid, info->snd_seq,
+			  &devlink_nl_family, NLM_F_MULTI, cmd);
+	if (!hdr)
+		return -EMSGSIZE;
+
+	if (devlink_nl_put_handle(skb, devlink))
+		goto nla_put_failure;
+	headers_attr = nla_nest_start(skb, DEVLINK_ATTR_DPIPE_HEADERS);
+	if (!headers_attr)
+		goto nla_put_failure;
+
+	j = 0;
+	for (; i < dpipe_headers->headers_count; i++) {
+		err = devlink_dpipe_header_put(skb, dpipe_headers->headers[i]);
+		if (err) {
+			if (!j)
+				goto err_table_put;
+			break;
+		}
+		j++;
+	}
+	nla_nest_end(skb, headers_attr);
+	genlmsg_end(skb, hdr);
+	if (i != dpipe_headers->headers_count)
+		goto start_again;
+
+send_done:
+	nlh = nlmsg_put(skb, info->snd_portid, info->snd_seq,
+			NLMSG_DONE, 0, flags | NLM_F_MULTI);
+	if (!nlh) {
+		err = devlink_dpipe_send_and_alloc_skb(&skb, info);
+		if (err)
+			goto err_skb_send_alloc;
+		goto send_done;
+	}
+	return genlmsg_reply(skb, info);
+
+nla_put_failure:
+	err = -EMSGSIZE;
+err_table_put:
+err_skb_send_alloc:
+	genlmsg_cancel(skb, hdr);
+	nlmsg_free(skb);
+	return err;
+}
+
+static int devlink_nl_cmd_dpipe_headers_get(struct sk_buff *skb,
+					    struct genl_info *info)
+{
+	struct devlink *devlink = info->user_ptr[0];
+
+	return devlink_dpipe_headers_fill(info, DEVLINK_CMD_DPIPE_HEADERS_GET,
+					  0, devlink->dpipe_headers);
+}
+
+static int devlink_dpipe_table_counter_set(struct devlink *devlink,
+					   const char *table_name,
+					   bool enable)
+{
+	struct devlink_dpipe_table *table;
+
+	table = devlink_dpipe_table_find(&devlink->dpipe_table_list,
+					 table_name);
+	if (!table)
+		return -EINVAL;
+
+	if (table->counter_control_extern)
+		return -EOPNOTSUPP;
+
+	if (!(table->counters_enabled ^ enable))
+		return 0;
+
+	table->counters_enabled = enable;
+	if (table->table_ops->counter_set_update)
+		table->table_ops->counter_set_update(table->priv, enable);
+	return 0;
+}
+
+static int devlink_nl_cmd_dpipe_table_counter_set(struct sk_buff *skb,
+						  struct genl_info *info)
+{
+	struct devlink *devlink = info->user_ptr[0];
+	const char *table_name;
+	int counter_enable;
+
+	if (!info->attrs[DEVLINK_ATTR_DPIPE_TABLE_NAME])
+		return -EINVAL;
+	table_name = nla_data(info->attrs[DEVLINK_ATTR_DPIPE_TABLE_NAME]);
+
+	if (!info->attrs[DEVLINK_ATTR_DPIPE_TABLE_COUNTERS_ENABLED])
+		counter_enable = false;
+	else
+		counter_enable = true;
+
+	return devlink_dpipe_table_counter_set(devlink, table_name,
+					       counter_enable);
 }
 
 static const struct nla_policy devlink_nl_policy[DEVLINK_ATTR_MAX + 1] = {
@@ -1512,6 +2092,8 @@ static const struct nla_policy devlink_nl_policy[DEVLINK_ATTR_MAX + 1] = {
 	[DEVLINK_ATTR_SB_TC_INDEX] = { .type = NLA_U16 },
 	[DEVLINK_ATTR_ESWITCH_MODE] = { .type = NLA_U16 },
 	[DEVLINK_ATTR_ESWITCH_INLINE_MODE] = { .type = NLA_U8 },
+	[DEVLINK_ATTR_DPIPE_TABLE_NAME] = { .type = NLA_NUL_STRING },
+	[DEVLINK_ATTR_DPIPE_TABLE_COUNTERS_ENABLED] = { .type =  NLA_U32 },
 };
 
 static const struct genl_ops devlink_nl_ops[] = {
@@ -1644,6 +2226,34 @@ static const struct genl_ops devlink_nl_ops[] = {
 		.flags = GENL_ADMIN_PERM,
 		.internal_flags = DEVLINK_NL_FLAG_NEED_DEVLINK,
 	},
+	{
+		.cmd = DEVLINK_CMD_DPIPE_TABLES_GET,
+		.doit = devlink_nl_cmd_dpipe_tables_get,
+		.policy = devlink_nl_policy,
+		.flags = GENL_ADMIN_PERM,
+		.internal_flags = DEVLINK_NL_FLAG_NEED_DEVLINK,
+	},
+	{
+		.cmd = DEVLINK_CMD_DPIPE_ENTRIES_GET,
+		.doit = devlink_nl_cmd_dpipe_entries_get,
+		.policy = devlink_nl_policy,
+		.flags = GENL_ADMIN_PERM,
+		.internal_flags = DEVLINK_NL_FLAG_NEED_DEVLINK,
+	},
+	{
+		.cmd = DEVLINK_CMD_DPIPE_HEADERS_GET,
+		.doit = devlink_nl_cmd_dpipe_headers_get,
+		.policy = devlink_nl_policy,
+		.flags = GENL_ADMIN_PERM,
+		.internal_flags = DEVLINK_NL_FLAG_NEED_DEVLINK,
+	},
+	{
+		.cmd = DEVLINK_CMD_DPIPE_TABLE_COUNTER_SET,
+		.doit = devlink_nl_cmd_dpipe_table_counter_set,
+		.policy = devlink_nl_policy,
+		.flags = GENL_ADMIN_PERM,
+		.internal_flags = DEVLINK_NL_FLAG_NEED_DEVLINK,
+	},
 };
 
 static struct genl_family devlink_nl_family __ro_after_init = {
@@ -1680,6 +2290,7 @@ struct devlink *devlink_alloc(const struct devlink_ops *ops, size_t priv_size)
 	devlink_net_set(devlink, &init_net);
 	INIT_LIST_HEAD(&devlink->port_list);
 	INIT_LIST_HEAD(&devlink->sb_list);
+	INIT_LIST_HEAD_RCU(&devlink->dpipe_table_list);
 	return devlink;
 }
 EXPORT_SYMBOL_GPL(devlink_alloc);
@@ -1880,6 +2491,142 @@ void devlink_sb_unregister(struct devlink *devlink, unsigned int sb_index)
 }
 EXPORT_SYMBOL_GPL(devlink_sb_unregister);
 
+/**
+ *	devlink_dpipe_headers_register - register dpipe headers
+ *
+ *	@devlink: devlink
+ *	@dpipe_headers: dpipe header array
+ *
+ *	Register the headers supported by hardware.
+ */
+int devlink_dpipe_headers_register(struct devlink *devlink,
+				   struct devlink_dpipe_headers *dpipe_headers)
+{
+	mutex_lock(&devlink_mutex);
+	devlink->dpipe_headers = dpipe_headers;
+	mutex_unlock(&devlink_mutex);
+	return 0;
+}
+EXPORT_SYMBOL_GPL(devlink_dpipe_headers_register);
+
+/**
+ *	devlink_dpipe_headers_unregister - unregister dpipe headers
+ *
+ *	@devlink: devlink
+ *
+ *	Unregister the headers supported by hardware.
+ */
+void devlink_dpipe_headers_unregister(struct devlink *devlink)
+{
+	mutex_lock(&devlink_mutex);
+	devlink->dpipe_headers = NULL;
+	mutex_unlock(&devlink_mutex);
+}
+EXPORT_SYMBOL_GPL(devlink_dpipe_headers_unregister);
+
+/**
+ *	devlink_dpipe_table_counter_enabled - check if counter allocation
+ *					      required
+ *	@devlink: devlink
+ *	@table_name: tables name
+ *
+ *	Used by driver to check if counter allocation is required.
+ *	After counter allocation is turned on the table entries
+ *	are updated to include counter statistics.
+ *
+ *	After that point on the driver must respect the counter
+ *	state so that each entry added to the table is added
+ *	with a counter.
+ */
+bool devlink_dpipe_table_counter_enabled(struct devlink *devlink,
+					 const char *table_name)
+{
+	struct devlink_dpipe_table *table;
+	bool enabled;
+
+	rcu_read_lock();
+	table = devlink_dpipe_table_find(&devlink->dpipe_table_list,
+					 table_name);
+	enabled = false;
+	if (table)
+		enabled = table->counters_enabled;
+	rcu_read_unlock();
+	return enabled;
+}
+EXPORT_SYMBOL_GPL(devlink_dpipe_table_counter_enabled);
+
+/**
+ *	devlink_dpipe_table_register - register dpipe table
+ *
+ *	@devlink: devlink
+ *	@table_name: table name
+ *	@table_ops: table ops
+ *	@priv: priv
+ *	@matches: match tuples
+ *	@matches_count: count of matches count
+ *	@actions: action tuples
+ *	@actions_count: count of action tuples
+ *	@counter_control_extern: external control for counters
+ */
+int devlink_dpipe_table_register(struct devlink *devlink,
+				 const char *table_name,
+				 struct devlink_dpipe_table_ops *table_ops,
+				 void *priv,
+				 struct devlink_dpipe_hfield *matches,
+				 unsigned int matches_count,
+				 struct devlink_dpipe_hfield *actions,
+				 unsigned int actions_count,
+				 bool counter_control_extern)
+{
+	struct devlink_dpipe_table *table;
+
+	if (devlink_dpipe_table_find(&devlink->dpipe_table_list, table_name))
+		return -EEXIST;
+
+	table = kzalloc(sizeof(*table), GFP_KERNEL);
+	if (!table)
+		return -ENOMEM;
+
+	table->name = table_name;
+	table->table_ops = table_ops;
+	table->priv = priv;
+	table->matches = matches;
+	table->matches_count = matches_count;
+	table->actions = actions;
+	table->actions_count = actions_count;
+	table->counter_control_extern = counter_control_extern;
+	mutex_lock(&devlink_mutex);
+	list_add_tail_rcu(&table->list, &devlink->dpipe_table_list);
+	mutex_unlock(&devlink_mutex);
+	return 0;
+}
+EXPORT_SYMBOL_GPL(devlink_dpipe_table_register);
+
+/**
+ *	devlink_dpipe_table_unregister - unregister dpipe table
+ *
+ *	@devlink: devlink
+ *	@table_name: table name
+ */
+void devlink_dpipe_table_unregister(struct devlink *devlink,
+				    const char *table_name)
+{
+	struct devlink_dpipe_table *table;
+
+	mutex_lock(&devlink_mutex);
+	table = devlink_dpipe_table_find(&devlink->dpipe_table_list,
+					 table_name);
+	if (!table)
+		goto unlock;
+	list_del_rcu(&table->list);
+	mutex_unlock(&devlink_mutex);
+	kfree_rcu(table, rcu);
+	return;
+unlock:
+	mutex_unlock(&devlink_mutex);
+}
+EXPORT_SYMBOL_GPL(devlink_dpipe_table_unregister);
+
 static int __init devlink_module_init(void)
 {
 	return genl_register_family(&devlink_nl_family);
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [patch net-next RFC 2/8] mlxsw: spectrum: Add support for flow counter allocator
  2017-02-16 15:22 [patch net-next RFC 0/8] Add support for pipeline debug (dpipe) Jiri Pirko
  2017-02-16 15:22 ` [patch net-next RFC 1/8] devlink: Support " Jiri Pirko
@ 2017-02-16 15:22 ` Jiri Pirko
  2017-02-16 15:22 ` [patch net-next RFC 3/8] mlxsw: reg: Add counter fields to RITR register Jiri Pirko
                   ` (8 subsequent siblings)
  10 siblings, 0 replies; 22+ messages in thread
From: Jiri Pirko @ 2017-02-16 15:22 UTC (permalink / raw)
  To: netdev
  Cc: davem, arkadis, idosch, mlxsw, jhs, ivecera, roopa, f.fainelli,
	vivien.didelot, john.fastabend, andrew

From: Arkadi Sharshevsky <arkadis@mellanox.com>

Add implementation for flow counter allocator. The ASIC has special memory
pool for various counting purposes. Counter memory is distributed between
equal size banks.

The static sub-pool configuration should specify the following parameters
for each sub-pool:
- Number of required banks.
- Maximum entry size.

Each module can add dedicated sub-pool or use existing one.

Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlxsw/Makefile       |   3 +-
 drivers/net/ethernet/mellanox/mlxsw/resources.h    |   2 +
 drivers/net/ethernet/mellanox/mlxsw/spectrum.c     |  10 ++
 drivers/net/ethernet/mellanox/mlxsw/spectrum.h     |   2 +
 drivers/net/ethernet/mellanox/mlxsw/spectrum_cnt.c | 177 +++++++++++++++++++++
 drivers/net/ethernet/mellanox/mlxsw/spectrum_cnt.h |  49 ++++++
 6 files changed, 242 insertions(+), 1 deletion(-)
 create mode 100644 drivers/net/ethernet/mellanox/mlxsw/spectrum_cnt.c
 create mode 100644 drivers/net/ethernet/mellanox/mlxsw/spectrum_cnt.h

diff --git a/drivers/net/ethernet/mellanox/mlxsw/Makefile b/drivers/net/ethernet/mellanox/mlxsw/Makefile
index 6b6c30d..95fcacf 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/Makefile
+++ b/drivers/net/ethernet/mellanox/mlxsw/Makefile
@@ -15,7 +15,8 @@ obj-$(CONFIG_MLXSW_SPECTRUM)	+= mlxsw_spectrum.o
 mlxsw_spectrum-objs		:= spectrum.o spectrum_buffers.o \
 				   spectrum_switchdev.o spectrum_router.o \
 				   spectrum_kvdl.o spectrum_acl_tcam.o \
-				   spectrum_acl.o spectrum_flower.o
+				   spectrum_acl.o spectrum_flower.o \
+				   spectrum_cnt.o
 mlxsw_spectrum-$(CONFIG_MLXSW_SPECTRUM_DCB)	+= spectrum_dcb.o
 obj-$(CONFIG_MLXSW_MINIMAL)	+= mlxsw_minimal.o
 mlxsw_minimal-objs		:= minimal.o
diff --git a/drivers/net/ethernet/mellanox/mlxsw/resources.h b/drivers/net/ethernet/mellanox/mlxsw/resources.h
index bce8c2e..e435ae6 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/resources.h
+++ b/drivers/net/ethernet/mellanox/mlxsw/resources.h
@@ -43,6 +43,7 @@ enum mlxsw_res_id {
 	MLXSW_RES_ID_KVD_SINGLE_MIN_SIZE,
 	MLXSW_RES_ID_KVD_DOUBLE_MIN_SIZE,
 	MLXSW_RES_ID_MAX_TRAP_GROUPS,
+	MLXSW_RES_ID_CNT_POOL_SIZE,
 	MLXSW_RES_ID_MAX_SPAN,
 	MLXSW_RES_ID_MAX_SYSTEM_PORT,
 	MLXSW_RES_ID_MAX_LAG,
@@ -75,6 +76,7 @@ static u16 mlxsw_res_ids[] = {
 	[MLXSW_RES_ID_KVD_SINGLE_MIN_SIZE] = 0x1002,
 	[MLXSW_RES_ID_KVD_DOUBLE_MIN_SIZE] = 0x1003,
 	[MLXSW_RES_ID_MAX_TRAP_GROUPS] = 0x2201,
+	[MLXSW_RES_ID_CNT_POOL_SIZE] = 0x2410,
 	[MLXSW_RES_ID_MAX_SPAN] = 0x2420,
 	[MLXSW_RES_ID_MAX_SYSTEM_PORT] = 0x2502,
 	[MLXSW_RES_ID_MAX_LAG] = 0x2520,
diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum.c
index 16484f2..f7b8cf4 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum.c
@@ -66,6 +66,7 @@
 #include "port.h"
 #include "trap.h"
 #include "txheader.h"
+#include "spectrum_cnt.h"
 
 static const char mlxsw_sp_driver_name[] = "mlxsw_spectrum";
 static const char mlxsw_sp_driver_version[] = "1.0";
@@ -3224,6 +3225,12 @@ static int mlxsw_sp_init(struct mlxsw_core *mlxsw_core,
 		goto err_acl_init;
 	}
 
+	err = mlxsw_sp_counter_pool_init(mlxsw_sp);
+	if (err) {
+		dev_err(mlxsw_sp->bus_info->dev, "Failed to init counter pool\n");
+		goto err_counter_pool_init;
+	}
+
 	err = mlxsw_sp_ports_create(mlxsw_sp);
 	if (err) {
 		dev_err(mlxsw_sp->bus_info->dev, "Failed to create ports\n");
@@ -3233,6 +3240,8 @@ static int mlxsw_sp_init(struct mlxsw_core *mlxsw_core,
 	return 0;
 
 err_ports_create:
+	mlxsw_sp_counter_pool_fini(mlxsw_sp);
+err_counter_pool_init:
 	mlxsw_sp_acl_fini(mlxsw_sp);
 err_acl_init:
 	mlxsw_sp_span_fini(mlxsw_sp);
@@ -3255,6 +3264,7 @@ static void mlxsw_sp_fini(struct mlxsw_core *mlxsw_core)
 	struct mlxsw_sp *mlxsw_sp = mlxsw_core_driver_priv(mlxsw_core);
 
 	mlxsw_sp_ports_remove(mlxsw_sp);
+	mlxsw_sp_counter_pool_fini(mlxsw_sp);
 	mlxsw_sp_acl_fini(mlxsw_sp);
 	mlxsw_sp_span_fini(mlxsw_sp);
 	mlxsw_sp_router_fini(mlxsw_sp);
diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum.h b/drivers/net/ethernet/mellanox/mlxsw/spectrum.h
index 13ec85e..1aec58a 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum.h
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum.h
@@ -269,6 +269,7 @@ struct mlxsw_sp_router {
 };
 
 struct mlxsw_sp_acl;
+struct mlxsw_sp_counter_pool;
 
 struct mlxsw_sp {
 	struct {
@@ -304,6 +305,7 @@ struct mlxsw_sp {
 		DECLARE_BITMAP(usage, MLXSW_SP_KVD_LINEAR_SIZE);
 	} kvdl;
 
+	struct mlxsw_sp_counter_pool *counter_pool;
 	struct {
 		struct mlxsw_sp_span_entry *entries;
 		int entries_count;
diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum_cnt.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum_cnt.c
new file mode 100644
index 0000000..a0344c6
--- /dev/null
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_cnt.c
@@ -0,0 +1,177 @@
+/*
+ * drivers/net/ethernet/mellanox/mlxsw/spectrum_cnt.c
+ * Copyright (c) 2017 Mellanox Technologies. All rights reserved.
+ * Copyright (c) 2017 Arkadi Sharshevsky <arkadis@mellanox.com>
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions are met:
+ *
+ * 1. Redistributions of source code must retain the above copyright
+ *    notice, this list of conditions and the following disclaimer.
+ * 2. Redistributions in binary form must reproduce the above copyright
+ *    notice, this list of conditions and the following disclaimer in the
+ *    documentation and/or other materials provided with the distribution.
+ * 3. Neither the names of the copyright holders nor the names of its
+ *    contributors may be used to endorse or promote products derived from
+ *    this software without specific prior written permission.
+ *
+ * Alternatively, this software may be distributed under the terms of the
+ * GNU General Public License ("GPL") version 2 as published by the Free
+ * Software Foundation.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
+ * AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+ * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
+ * ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE
+ * LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
+ * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
+ * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
+ * INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
+ * CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
+ * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+ * POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include <linux/kernel.h>
+#include <linux/bitops.h>
+
+#include "spectrum_cnt.h"
+
+#define MLXSW_SW_COUNTER_POOL_BANK_SIZE 4096
+
+struct mlxsw_sp_counter_sub_pool {
+	unsigned int base_index;
+	unsigned int size;
+	unsigned int entry_size;
+	unsigned int bank_count;
+};
+
+struct mlxsw_sp_counter_pool {
+	unsigned int pool_size;
+	unsigned long *usage; /* Usage bitmap */
+	struct mlxsw_sp_counter_sub_pool *sub_pools;
+	unsigned int sub_pools_count;
+};
+
+static struct mlxsw_sp_counter_sub_pool mlxsw_sp_counter_sub_pools[] = {};
+
+int mlxsw_sp_counter_pool_init(struct mlxsw_sp *mlxsw_sp)
+{
+	struct mlxsw_sp_counter_sub_pool *sub_pool, *sub_pool_prev;
+	struct mlxsw_sp_counter_pool *pool;
+	unsigned int total_bank_config = 0;
+	unsigned int map_size;
+	unsigned int i;
+	int err;
+
+	if (!MLXSW_CORE_RES_VALID(mlxsw_sp->core, CNT_POOL_SIZE))
+		return -EIO;
+
+	pool = kzalloc(sizeof(*pool), GFP_KERNEL);
+	if (!pool)
+		return -ENOMEM;
+	mlxsw_sp->counter_pool = pool;
+
+	pool->pool_size = MLXSW_CORE_RES_GET(mlxsw_sp->core,
+					     CNT_POOL_SIZE);
+	/* Check config is valid, no bank over subscription */
+	for (i = 0; i < ARRAY_SIZE(mlxsw_sp_counter_sub_pools); i++)
+		total_bank_config += mlxsw_sp_counter_sub_pools[i].bank_count;
+
+	/* The last bank can be not fully used */
+	if (total_bank_config > pool->pool_size /
+				MLXSW_SW_COUNTER_POOL_BANK_SIZE + 1) {
+		err = -EINVAL;
+		goto err_bank_config;
+	}
+
+	map_size = BITS_TO_LONGS(pool->pool_size) * sizeof(unsigned long);
+	pool->usage = kzalloc(map_size, GFP_KERNEL);
+	if (!pool->usage) {
+		err = -ENOMEM;
+		goto err_usage_alloc;
+	}
+	pool->sub_pools = mlxsw_sp_counter_sub_pools;
+	pool->sub_pools_count = ARRAY_SIZE(mlxsw_sp_counter_sub_pools);
+
+	/* Allocation is based on bank count which should be
+	 * specified for each sub pool statically.
+	 */
+	for (i = 0; i < pool->sub_pools_count; i++) {
+		sub_pool = &pool->sub_pools[i];
+		if (i == 0) {
+			sub_pool->base_index = 0;
+		} else {
+			sub_pool_prev = &pool->sub_pools[i - 1];
+			sub_pool->base_index = sub_pool_prev->base_index +
+					       sub_pool_prev->size;
+		}
+		sub_pool->size = sub_pool->bank_count *
+				 MLXSW_SW_COUNTER_POOL_BANK_SIZE;
+		if (sub_pool->base_index + sub_pool->size > pool->pool_size)
+			sub_pool->size = pool->pool_size - sub_pool->base_index;
+	}
+	return 0;
+
+err_usage_alloc:
+err_bank_config:
+	kfree(mlxsw_sp->counter_pool);
+	return err;
+}
+
+void mlxsw_sp_counter_pool_fini(struct mlxsw_sp *mlxsw_sp)
+{
+	kfree(mlxsw_sp->counter_pool->usage);
+	kfree(mlxsw_sp->counter_pool);
+}
+
+int mlxsw_sp_counter_alloc(struct mlxsw_sp *mlxsw_sp, unsigned int sub_pool_id,
+			   unsigned int entry_size)
+{
+	struct mlxsw_sp_counter_pool *pool = mlxsw_sp->counter_pool;
+	struct mlxsw_sp_counter_sub_pool *sub_pool;
+	unsigned int entry_index;
+	unsigned int stop_index;
+	int i;
+
+	if (!pool->usage)
+		return -EINVAL;
+
+	if (sub_pool_id > pool->sub_pools_count - 1)
+		return -EINVAL;
+
+	sub_pool = &mlxsw_sp_counter_sub_pools[sub_pool_id];
+	if (entry_size > sub_pool->entry_size)
+		return -EINVAL;
+
+	stop_index = sub_pool->base_index + sub_pool->size;
+	entry_index = sub_pool->base_index;
+	entry_index = find_next_zero_bit(pool->usage, stop_index, entry_index);
+
+	/* The sub-pools can contain non-integer number of entries so
+	 * we must check for overflow
+	 */
+	if (entry_index + sub_pool->entry_size > stop_index)
+		return -ENOBUFS;
+	for (i = 0; i < sub_pool->entry_size; i++)
+		set_bit(entry_index + i, pool->usage);
+	return entry_index;
+}
+
+void mlxsw_sp_counter_free(struct mlxsw_sp *mlxsw_sp, unsigned int sub_pool_id,
+			   unsigned int entry_index)
+{
+	struct mlxsw_sp_counter_pool *pool = mlxsw_sp->counter_pool;
+	struct mlxsw_sp_counter_sub_pool *sub_pool;
+	int i;
+
+	if (entry_index > pool->pool_size)
+		return;
+
+	if (sub_pool_id > pool->sub_pools_count - 1)
+		return;
+
+	sub_pool = &mlxsw_sp_counter_sub_pools[sub_pool_id];
+	for (i = 0; i < sub_pool->entry_size; i++)
+		clear_bit(entry_index + i, pool->usage);
+}
diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum_cnt.h b/drivers/net/ethernet/mellanox/mlxsw/spectrum_cnt.h
new file mode 100644
index 0000000..d300b89
--- /dev/null
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_cnt.h
@@ -0,0 +1,49 @@
+/*
+ * drivers/net/ethernet/mellanox/mlxsw/spectrum_cnt.h
+ * Copyright (c) 2017 Mellanox Technologies. All rights reserved.
+ * Copyright (c) 2017 Arkadi Sharshevsky <arkdis@mellanox.com>
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions are met:
+ *
+ * 1. Redistributions of source code must retain the above copyright
+ *    notice, this list of conditions and the following disclaimer.
+ * 2. Redistributions in binary form must reproduce the above copyright
+ *    notice, this list of conditions and the following disclaimer in the
+ *    documentation and/or other materials provided with the distribution.
+ * 3. Neither the names of the copyright holders nor the names of its
+ *    contributors may be used to endorse or promote products derived from
+ *    this software without specific prior written permission.
+ *
+ * Alternatively, this software may be distributed under the terms of the
+ * GNU General Public License ("GPL") version 2 as published by the Free
+ * Software Foundation.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
+ * AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+ * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
+ * ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE
+ * LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
+ * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
+ * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
+ * INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
+ * CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
+ * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+ * POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef _MLXSW_SPECTRUM_CNT_H
+#define _MLXSW_SPECTRUM_CNT_H
+
+#include "spectrum.h"
+
+enum mlxsw_sp_counter_sub_pool_id;
+
+int mlxsw_sp_counter_alloc(struct mlxsw_sp *mlxsw_sp, unsigned int sub_pool_id,
+			   unsigned int entry_size);
+void mlxsw_sp_counter_free(struct mlxsw_sp *mlxsw_sp, unsigned int sub_pool_id,
+			   unsigned int entry_index);
+int mlxsw_sp_counter_pool_init(struct mlxsw_sp *mlxsw_sp);
+void mlxsw_sp_counter_pool_fini(struct mlxsw_sp *mlxsw_sp);
+
+#endif
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [patch net-next RFC 3/8] mlxsw: reg: Add counter fields to RITR register
  2017-02-16 15:22 [patch net-next RFC 0/8] Add support for pipeline debug (dpipe) Jiri Pirko
  2017-02-16 15:22 ` [patch net-next RFC 1/8] devlink: Support " Jiri Pirko
  2017-02-16 15:22 ` [patch net-next RFC 2/8] mlxsw: spectrum: Add support for flow counter allocator Jiri Pirko
@ 2017-02-16 15:22 ` Jiri Pirko
  2017-02-16 15:22 ` [patch net-next RFC 4/8] mlxsw: spectrum: Add placeholder for dpipe Jiri Pirko
                   ` (7 subsequent siblings)
  10 siblings, 0 replies; 22+ messages in thread
From: Jiri Pirko @ 2017-02-16 15:22 UTC (permalink / raw)
  To: netdev
  Cc: davem, arkadis, idosch, mlxsw, jhs, ivecera, roopa, f.fainelli,
	vivien.didelot, john.fastabend, andrew

From: Arkadi Sharshevsky <arkadis@mellanox.com>

Update RITR for counter support. This allows adding counters for
ASIC's router ports.

Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlxsw/reg.h | 54 +++++++++++++++++++++++++++++++
 1 file changed, 54 insertions(+)

diff --git a/drivers/net/ethernet/mellanox/mlxsw/reg.h b/drivers/net/ethernet/mellanox/mlxsw/reg.h
index 0899e2d..6066689 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/reg.h
+++ b/drivers/net/ethernet/mellanox/mlxsw/reg.h
@@ -4125,6 +4125,60 @@ MLXSW_ITEM32(reg, ritr, sp_if_system_port, 0x08, 0, 16);
  */
 MLXSW_ITEM32(reg, ritr, sp_if_vid, 0x18, 0, 12);
 
+/* Shared between ingress/egress */
+enum mlxsw_reg_ritr_counter_set_type {
+	/* No Count. */
+	MLXSW_REG_RITR_COUNTER_SET_TYPE_NO_COUNT = 0x0,
+	/* Basic. Used for router interfaces, counting the following:
+	 *	- Error and Discard counters.
+	 *	- Unicast, Multicast and Broadcast counters. Sharing the
+	 *	  same set of counters for the different type of traffic
+	 *	  (IPv4, IPv6 and mpls).
+	 */
+	MLXSW_REG_RITR_COUNTER_SET_TYPE_BASIC = 0x9,
+};
+
+/* reg_ritr_ingress_counter_index
+ * Counter Index for flow counter.
+ * Access: RW
+ */
+MLXSW_ITEM32(reg, ritr, ingress_counter_index, 0x38, 0, 24);
+
+/* reg_ritr_ingress_counter_set_type
+ * Igress Counter Set Type for router interface counter.
+ * Access: RW
+ */
+MLXSW_ITEM32(reg, ritr, ingress_counter_set_type, 0x38, 24, 8);
+
+/* reg_ritr_egress_counter_index
+ * Counter Index for flow counter.
+ * Access: RW
+ */
+MLXSW_ITEM32(reg, ritr, egress_counter_index, 0x3C, 0, 24);
+
+/* reg_ritr_egress_counter_set_type
+ * Egress Counter Set Type for router interface counter.
+ * Access: RW
+ */
+MLXSW_ITEM32(reg, ritr, egress_counter_set_type, 0x3C, 24, 8);
+
+static inline void mlxsw_reg_ritr_counter_pack(char *payload, u32 index,
+					       bool enable, bool egress)
+{
+	enum mlxsw_reg_ritr_counter_set_type set_type;
+
+	if (enable)
+		set_type = MLXSW_REG_RITR_COUNTER_SET_TYPE_BASIC;
+	else
+		set_type = MLXSW_REG_RITR_COUNTER_SET_TYPE_NO_COUNT;
+	mlxsw_reg_ritr_egress_counter_set_type_set(payload, set_type);
+
+	if (egress)
+		mlxsw_reg_ritr_egress_counter_index_set(payload, index);
+	else
+		mlxsw_reg_ritr_ingress_counter_index_set(payload, index);
+}
+
 static inline void mlxsw_reg_ritr_rif_pack(char *payload, u16 rif)
 {
 	MLXSW_REG_ZERO(ritr, payload);
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [patch net-next RFC 4/8] mlxsw: spectrum: Add placeholder for dpipe
  2017-02-16 15:22 [patch net-next RFC 0/8] Add support for pipeline debug (dpipe) Jiri Pirko
                   ` (2 preceding siblings ...)
  2017-02-16 15:22 ` [patch net-next RFC 3/8] mlxsw: reg: Add counter fields to RITR register Jiri Pirko
@ 2017-02-16 15:22 ` Jiri Pirko
  2017-02-16 15:22 ` [patch net-next RFC 5/8] mlxsw: spectrum: Add definition for egress rif table Jiri Pirko
                   ` (6 subsequent siblings)
  10 siblings, 0 replies; 22+ messages in thread
From: Jiri Pirko @ 2017-02-16 15:22 UTC (permalink / raw)
  To: netdev
  Cc: davem, arkadis, idosch, mlxsw, jhs, ivecera, roopa, f.fainelli,
	vivien.didelot, john.fastabend, andrew

From: Arkadi Sharshevsky <arkadis@mellanox.com>

Add placeholder for dpipe. Support for specific tables and headers will
be introduced in following patches. The headers are shared between all
mlxsw_sp instances.

Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlxsw/Makefile       |  2 +-
 drivers/net/ethernet/mellanox/mlxsw/spectrum.c     | 10 ++++
 .../net/ethernet/mellanox/mlxsw/spectrum_dpipe.c   | 57 ++++++++++++++++++++++
 .../net/ethernet/mellanox/mlxsw/spectrum_dpipe.h   | 41 ++++++++++++++++
 4 files changed, 109 insertions(+), 1 deletion(-)
 create mode 100644 drivers/net/ethernet/mellanox/mlxsw/spectrum_dpipe.c
 create mode 100644 drivers/net/ethernet/mellanox/mlxsw/spectrum_dpipe.h

diff --git a/drivers/net/ethernet/mellanox/mlxsw/Makefile b/drivers/net/ethernet/mellanox/mlxsw/Makefile
index 95fcacf..2fb8c65 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/Makefile
+++ b/drivers/net/ethernet/mellanox/mlxsw/Makefile
@@ -16,7 +16,7 @@ mlxsw_spectrum-objs		:= spectrum.o spectrum_buffers.o \
 				   spectrum_switchdev.o spectrum_router.o \
 				   spectrum_kvdl.o spectrum_acl_tcam.o \
 				   spectrum_acl.o spectrum_flower.o \
-				   spectrum_cnt.o
+				   spectrum_cnt.o spectrum_dpipe.o
 mlxsw_spectrum-$(CONFIG_MLXSW_SPECTRUM_DCB)	+= spectrum_dcb.o
 obj-$(CONFIG_MLXSW_MINIMAL)	+= mlxsw_minimal.o
 mlxsw_minimal-objs		:= minimal.o
diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum.c
index f7b8cf4..cb66416 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum.c
@@ -67,6 +67,7 @@
 #include "trap.h"
 #include "txheader.h"
 #include "spectrum_cnt.h"
+#include "spectrum_dpipe.h"
 
 static const char mlxsw_sp_driver_name[] = "mlxsw_spectrum";
 static const char mlxsw_sp_driver_version[] = "1.0";
@@ -3231,6 +3232,12 @@ static int mlxsw_sp_init(struct mlxsw_core *mlxsw_core,
 		goto err_counter_pool_init;
 	}
 
+	err = mlxsw_sp_dpipe_init(mlxsw_sp);
+	if (err) {
+		dev_err(mlxsw_sp->bus_info->dev, "Failed to init pipeline debug\n");
+		goto err_dpipe_init;
+	}
+
 	err = mlxsw_sp_ports_create(mlxsw_sp);
 	if (err) {
 		dev_err(mlxsw_sp->bus_info->dev, "Failed to create ports\n");
@@ -3240,6 +3247,8 @@ static int mlxsw_sp_init(struct mlxsw_core *mlxsw_core,
 	return 0;
 
 err_ports_create:
+	mlxsw_sp_dpipe_fini(mlxsw_sp);
+err_dpipe_init:
 	mlxsw_sp_counter_pool_fini(mlxsw_sp);
 err_counter_pool_init:
 	mlxsw_sp_acl_fini(mlxsw_sp);
@@ -3264,6 +3273,7 @@ static void mlxsw_sp_fini(struct mlxsw_core *mlxsw_core)
 	struct mlxsw_sp *mlxsw_sp = mlxsw_core_driver_priv(mlxsw_core);
 
 	mlxsw_sp_ports_remove(mlxsw_sp);
+	mlxsw_sp_dpipe_fini(mlxsw_sp);
 	mlxsw_sp_counter_pool_fini(mlxsw_sp);
 	mlxsw_sp_acl_fini(mlxsw_sp);
 	mlxsw_sp_span_fini(mlxsw_sp);
diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum_dpipe.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum_dpipe.c
new file mode 100644
index 0000000..7c87bc8
--- /dev/null
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_dpipe.c
@@ -0,0 +1,57 @@
+/*
+ * drivers/net/ethernet/mellanox/mlxsw/spectrum_dpipe.c
+ * Copyright (c) 2017 Mellanox Technologies. All rights reserved.
+ * Copyright (c) 2017 Arkadi Sharshevsky <arakdis@mellanox.com>
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions are met:
+ *
+ * 1. Redistributions of source code must retain the above copyright
+ *    notice, this list of conditions and the following disclaimer.
+ * 2. Redistributions in binary form must reproduce the above copyright
+ *    notice, this list of conditions and the following disclaimer in the
+ *    documentation and/or other materials provided with the distribution.
+ * 3. Neither the names of the copyright holders nor the names of its
+ *    contributors may be used to endorse or promote products derived from
+ *    this software without specific prior written permission.
+ *
+ * Alternatively, this software may be distributed under the terms of the
+ * GNU General Public License ("GPL") version 2 as published by the Free
+ * Software Foundation.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
+ * AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+ * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
+ * ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE
+ * LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
+ * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
+ * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
+ * INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
+ * CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
+ * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+ * POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include <linux/kernel.h>
+#include <net/devlink.h>
+
+#include "spectrum.h"
+#include "spectrum_dpipe.h"
+
+static struct devlink_dpipe_header *mlxsw_dpipe_headers[] = {};
+
+static struct devlink_dpipe_headers mlxsw_sp_dpipe_headers = {
+	.headers = mlxsw_dpipe_headers,
+	.headers_count = ARRAY_SIZE(mlxsw_dpipe_headers),
+};
+
+int mlxsw_sp_dpipe_init(struct mlxsw_sp *mlxsw_sp)
+{
+	return devlink_dpipe_headers_register(priv_to_devlink(mlxsw_sp->core),
+					      &mlxsw_sp_dpipe_headers);
+}
+
+void mlxsw_sp_dpipe_fini(struct mlxsw_sp *mlxsw_sp)
+{
+	devlink_dpipe_headers_unregister(priv_to_devlink(mlxsw_sp->core));
+}
diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum_dpipe.h b/drivers/net/ethernet/mellanox/mlxsw/spectrum_dpipe.h
new file mode 100644
index 0000000..ad16259
--- /dev/null
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_dpipe.h
@@ -0,0 +1,41 @@
+/*
+ * drivers/net/ethernet/mellanox/mlxsw/spectrum_dpipe.h
+ * Copyright (c) 2017 Mellanox Technologies. All rights reserved.
+ * Copyright (c) 2017 Arkadi Sharshevsky <arkadis@mellanox.com>
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions are met:
+ *
+ * 1. Redistributions of source code must retain the above copyright
+ *    notice, this list of conditions and the following disclaimer.
+ * 2. Redistributions in binary form must reproduce the above copyright
+ *    notice, this list of conditions and the following disclaimer in the
+ *    documentation and/or other materials provided with the distribution.
+ * 3. Neither the names of the copyright holders nor the names of its
+ *    contributors may be used to endorse or promote products derived from
+ *    this software without specific prior written permission.
+ *
+ * Alternatively, this software may be distributed under the terms of the
+ * GNU General Public License ("GPL") version 2 as published by the Free
+ * Software Foundation.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
+ * AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+ * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
+ * ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE
+ * LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
+ * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
+ * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
+ * INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
+ * CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
+ * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
+ * POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#ifndef _MLXSW_PIPELINE_H_
+#define _MLXSW_PIPELINE_H_
+
+int mlxsw_sp_dpipe_init(struct mlxsw_sp *mlxsw_sp);
+void mlxsw_sp_dpipe_fini(struct mlxsw_sp *mlxsw_sp);
+
+#endif /* _MLXSW_PIPELINE_H_*/
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [patch net-next RFC 5/8] mlxsw: spectrum: Add definition for egress rif table
  2017-02-16 15:22 [patch net-next RFC 0/8] Add support for pipeline debug (dpipe) Jiri Pirko
                   ` (3 preceding siblings ...)
  2017-02-16 15:22 ` [patch net-next RFC 4/8] mlxsw: spectrum: Add placeholder for dpipe Jiri Pirko
@ 2017-02-16 15:22 ` Jiri Pirko
  2017-02-16 15:22 ` [patch net-next RFC 6/8] mlxsw: reg: Add Router Interface Counter Register Jiri Pirko
                   ` (5 subsequent siblings)
  10 siblings, 0 replies; 22+ messages in thread
From: Jiri Pirko @ 2017-02-16 15:22 UTC (permalink / raw)
  To: netdev
  Cc: davem, arkadis, idosch, mlxsw, jhs, ivecera, roopa, f.fainelli,
	vivien.didelot, john.fastabend, andrew

From: Arkadi Sharshevsky <arkadis@mellanox.com>

Add definition for egress router interface table. This table describes
the final part in the routing pipeline. This table matches the egress
interface index (rif index, which is set by the previous stages and
determine the out port) and makes the decision of forwarding the packet
towards the L2 logic or dropping it.

The metadata header is added to represent this internal information.
The rif index field is mapped logically to netdevice ifindex.

Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
---
 .../net/ethernet/mellanox/mlxsw/spectrum_dpipe.c   | 89 +++++++++++++++++++++-
 .../net/ethernet/mellanox/mlxsw/spectrum_dpipe.h   |  2 +
 2 files changed, 87 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum_dpipe.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum_dpipe.c
index 7c87bc8..41e8e33 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum_dpipe.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_dpipe.c
@@ -38,20 +38,101 @@
 #include "spectrum.h"
 #include "spectrum_dpipe.h"
 
-static struct devlink_dpipe_header *mlxsw_dpipe_headers[] = {};
+enum mlxsw_sp_field_metadata_id {
+	MLXSW_SP_DPIPE_FIELD_METADATA_ERIF_PORT,
+	MLXSW_SP_DPIPE_FIELD_METADATA_FORWARD_OUT,
+	MLXSW_SP_DPIPE_FIELD_METADATA_DROP,
+};
+
+static struct devlink_dpipe_field mlxsw_sp_dpipe_fields_metadata[] = {
+	{ .name = "egress_rif_port",
+	  .id = MLXSW_SP_DPIPE_FIELD_METADATA_ERIF_PORT,
+	  .bitwidth = 32,
+	  .mapping_type = DEVLINK_DPIPE_FIELD_MAPPING_TYPE_IFINDEX,
+	},
+	{ .name = "forward_out",
+	  .id = MLXSW_SP_DPIPE_FIELD_METADATA_FORWARD_OUT,
+	  .bitwidth = 1,
+	},
+	{ .name = "drop",
+	  .id = MLXSW_SP_DPIPE_FIELD_METADATA_DROP,
+	  .bitwidth = 1,
+	},
+};
+
+enum mlxsw_sp_dpipe_header_id {
+	MLXSW_SP_DPIPE_HEADER_METADATA,
+};
+
+static struct devlink_dpipe_header mlxsw_sp_dpipe_header_metadata = {
+	.name = "mlxsw_meta",
+	.id = MLXSW_SP_DPIPE_HEADER_METADATA,
+	.fields = mlxsw_sp_dpipe_fields_metadata,
+	.fields_count = ARRAY_SIZE(mlxsw_sp_dpipe_fields_metadata),
+};
+
+static struct devlink_dpipe_header *mlxsw_dpipe_headers[] = {
+	&mlxsw_sp_dpipe_header_metadata,
+};
 
 static struct devlink_dpipe_headers mlxsw_sp_dpipe_headers = {
 	.headers = mlxsw_dpipe_headers,
 	.headers_count = ARRAY_SIZE(mlxsw_dpipe_headers),
 };
 
+static struct devlink_dpipe_hfield mlxsw_sp_dpipe_matches_erif[] = {
+	{ .header = &mlxsw_sp_dpipe_header_metadata,
+	  .field_id = MLXSW_SP_DPIPE_FIELD_METADATA_ERIF_PORT,
+	},
+};
+
+enum mlxsw_dp_dpipe_actions_erif_id {
+	MLXSW_SP_DPIPE_TABLE_ERIF_ACTION_DROP,
+	MLXSW_SP_DPIPE_TABLE_ERIF_ACTION_FOWARD,
+};
+
+static struct devlink_dpipe_hfield mlxsw_sp_dpipe_actions_erif[] = {
+	{ .header = &mlxsw_sp_dpipe_header_metadata,
+	  .field_id = MLXSW_SP_DPIPE_FIELD_METADATA_DROP,
+	},
+	{ .header =  &mlxsw_sp_dpipe_header_metadata,
+	  .field_id = MLXSW_SP_DPIPE_FIELD_METADATA_FORWARD_OUT,
+	},
+};
+
+static struct devlink_dpipe_table_ops mlxsw_sp_erif_ops = {};
+
 int mlxsw_sp_dpipe_init(struct mlxsw_sp *mlxsw_sp)
 {
-	return devlink_dpipe_headers_register(priv_to_devlink(mlxsw_sp->core),
-					      &mlxsw_sp_dpipe_headers);
+	struct devlink *devlink = priv_to_devlink(mlxsw_sp->core);
+	int err;
+
+	err = devlink_dpipe_headers_register(devlink,
+					     &mlxsw_sp_dpipe_headers);
+	if (err)
+		return err;
+	err = devlink_dpipe_table_register(devlink,
+					   MLXSW_SP_DPIPE_TABLE_NAME_ERIF,
+					   &mlxsw_sp_erif_ops,
+					   mlxsw_sp,
+					   mlxsw_sp_dpipe_matches_erif,
+					   ARRAY_SIZE(mlxsw_sp_dpipe_matches_erif),
+					   mlxsw_sp_dpipe_actions_erif,
+					   ARRAY_SIZE(mlxsw_sp_dpipe_actions_erif),
+					   false);
+	if (err)
+		goto err_erif_register;
+	return 0;
+
+err_erif_register:
+	devlink_dpipe_headers_unregister(priv_to_devlink(mlxsw_sp->core));
+	return err;
 }
 
 void mlxsw_sp_dpipe_fini(struct mlxsw_sp *mlxsw_sp)
 {
-	devlink_dpipe_headers_unregister(priv_to_devlink(mlxsw_sp->core));
+	struct devlink *devlink = priv_to_devlink(mlxsw_sp->core);
+
+	devlink_dpipe_table_unregister(devlink, MLXSW_SP_DPIPE_TABLE_NAME_ERIF);
+	devlink_dpipe_headers_unregister(devlink);
 }
diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum_dpipe.h b/drivers/net/ethernet/mellanox/mlxsw/spectrum_dpipe.h
index ad16259..d208929 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum_dpipe.h
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_dpipe.h
@@ -38,4 +38,6 @@
 int mlxsw_sp_dpipe_init(struct mlxsw_sp *mlxsw_sp);
 void mlxsw_sp_dpipe_fini(struct mlxsw_sp *mlxsw_sp);
 
+#define MLXSW_SP_DPIPE_TABLE_NAME_ERIF "mlxsw_erif"
+
 #endif /* _MLXSW_PIPELINE_H_*/
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [patch net-next RFC 6/8] mlxsw: reg: Add Router Interface Counter Register
  2017-02-16 15:22 [patch net-next RFC 0/8] Add support for pipeline debug (dpipe) Jiri Pirko
                   ` (4 preceding siblings ...)
  2017-02-16 15:22 ` [patch net-next RFC 5/8] mlxsw: spectrum: Add definition for egress rif table Jiri Pirko
@ 2017-02-16 15:22 ` Jiri Pirko
  2017-02-16 15:22 ` [patch net-next RFC 7/8] mlxsw: spectrum: Support for counters on router interfaces Jiri Pirko
                   ` (4 subsequent siblings)
  10 siblings, 0 replies; 22+ messages in thread
From: Jiri Pirko @ 2017-02-16 15:22 UTC (permalink / raw)
  To: netdev
  Cc: davem, arkadis, idosch, mlxsw, jhs, ivecera, roopa, f.fainelli,
	vivien.didelot, john.fastabend, andrew

From: Arkadi Sharshevsky <arkadis@mellanox.com>

The RICNT register retrieves per port performance counter. It will be
used to query the router interfaces statistics.

Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlxsw/reg.h | 124 ++++++++++++++++++++++++++++++
 1 file changed, 124 insertions(+)

diff --git a/drivers/net/ethernet/mellanox/mlxsw/reg.h b/drivers/net/ethernet/mellanox/mlxsw/reg.h
index 6066689..a3b95a9 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/reg.h
+++ b/drivers/net/ethernet/mellanox/mlxsw/reg.h
@@ -4339,6 +4339,129 @@ static inline void mlxsw_reg_ratr_eth_entry_pack(char *payload,
 	mlxsw_reg_ratr_eth_destination_mac_memcpy_to(payload, dest_mac);
 }
 
+/* RICNT - Router Interface Counter Register
+ * -----------------------------------------
+ * The RICNT register retrieves per port performance counters
+ */
+#define MLXSW_REG_RICNT_ID 0x800B
+#define MLXSW_REG_RICNT_LEN 0x100
+
+MLXSW_REG_DEFINE(ricnt, MLXSW_REG_RICNT_ID, MLXSW_REG_RICNT_LEN);
+
+/* reg_ricnt_counter_index
+ * Counter index
+ * Access: RW
+ */
+MLXSW_ITEM32(reg, ricnt, counter_index, 0x04, 0, 24);
+
+enum mlxsw_reg_ricnt_counter_set_type {
+	/* No Count. */
+	MLXSW_REG_RICNT_COUNTER_SET_TYPE_NO_COUNT = 0x00,
+	/* Basic. Used for router interfaces, counting the following:
+	 *	- Error and Discard counters.
+	 *	- Unicast, Multicast and Broadcast counters. Sharing the
+	 *	  same set of counters for the different type of traffic
+	 *	  (IPv4, IPv6 and mpls).
+	 */
+	MLXSW_REG_RICNT_COUNTER_SET_TYPE_BASIC = 0x09,
+};
+
+/* reg_ricnt_counter_set_type
+ * Counter Set Type for router interface counter
+ * Access: RW
+ */
+MLXSW_ITEM32(reg, ricnt, counter_set_type, 0x04, 24, 8);
+
+enum mlxsw_reg_ricnt_opcode {
+	/* Nop. Supported only for read access*/
+	MLXSW_REG_RICNT_OPCODE_NOP = 0x00,
+	/* Clear. Setting the clr bit will reset the counter value for
+	 * all counters of the specified Router Interface.
+	 */
+	MLXSW_REG_RICNT_OPCODE_CLEAR = 0x08,
+};
+
+/* reg_ricnt_opcode
+ * Opcode
+ * Access: RW
+ */
+MLXSW_ITEM32(reg, ricnt, op, 0x00, 28, 4);
+
+/* reg_ricnt_good_unicast_packets
+ * good unicast packets.
+ * Access: RW
+ */
+MLXSW_ITEM64(reg, ricnt, good_unicast_packets, 0x08, 0, 64);
+
+/* reg_ricnt_good_multicast_packets
+ * good multicast packets.
+ * Access: RW
+ */
+MLXSW_ITEM64(reg, ricnt, good_multicast_packets, 0x10, 0, 64);
+
+/* reg_ricnt_good_broadcast_packets
+ * good broadcast packets
+ * Access: RW
+ */
+MLXSW_ITEM64(reg, ricnt, good_broadcast_packets, 0x18, 0, 64);
+
+/* reg_ricnt_good_unicast_bytes
+ * A count of L3 data and padding octets not including L2 headers
+ * for good unicast frames.
+ * Access: RW
+ */
+MLXSW_ITEM64(reg, ricnt, good_unicast_bytes, 0x20, 0, 64);
+
+/* reg_ricnt_good_multicast_bytes
+ * A count of L3 data and padding octets not including L2 headers
+ * for good multicast frames.
+ * Access: RW
+ */
+MLXSW_ITEM64(reg, ricnt, good_multicast_bytes, 0x28, 0, 64);
+
+/* reg_ritr_good_broadcast_bytes
+ * A count of L3 data and padding octets not including L2 headers
+ * for good broadcast frames.
+ * Access: RW
+ */
+MLXSW_ITEM64(reg, ricnt, good_broadcast_bytes, 0x30, 0, 64);
+
+/* reg_ricnt_error_packets
+ * A count of errored frames that do not pass the router checks.
+ * Access: RW
+ */
+MLXSW_ITEM64(reg, ricnt, error_packets, 0x38, 0, 64);
+
+/* reg_ricnt_discrad_packets
+ * A count of non-errored frames that do not pass the router checks.
+ * Access: RW
+ */
+MLXSW_ITEM64(reg, ricnt, discard_packets, 0x40, 0, 64);
+
+/* reg_ricnt_error_bytes
+ * A count of L3 data and padding octets not including L2 headers
+ * for errored frames.
+ * Access: RW
+ */
+MLXSW_ITEM64(reg, ricnt, error_bytes, 0x48, 0, 64);
+
+/* reg_ricnt_discard_bytes
+ * A count of L3 data and padding octets not including L2 headers
+ * for non-errored frames that do not pass the router checks.
+ * Access: RW
+ */
+MLXSW_ITEM64(reg, ricnt, discard_bytes, 0x50, 0, 64);
+
+static inline void mlxsw_reg_ricnt_pack(char *payload, u32 index,
+					enum mlxsw_reg_ricnt_opcode op)
+{
+	MLXSW_REG_ZERO(ricnt, payload);
+	mlxsw_reg_ricnt_op_set(payload, op);
+	mlxsw_reg_ricnt_counter_index_set(payload, index);
+	mlxsw_reg_ricnt_counter_set_type_set(payload,
+					     MLXSW_REG_RICNT_COUNTER_SET_TYPE_BASIC);
+}
+
 /* RALTA - Router Algorithmic LPM Tree Allocation Register
  * -------------------------------------------------------
  * RALTA is used to allocate the LPM trees of the SHSPM method.
@@ -6014,6 +6137,7 @@ static const struct mlxsw_reg_info *mlxsw_reg_infos[] = {
 	MLXSW_REG(rgcr),
 	MLXSW_REG(ritr),
 	MLXSW_REG(ratr),
+	MLXSW_REG(ricnt),
 	MLXSW_REG(ralta),
 	MLXSW_REG(ralst),
 	MLXSW_REG(raltb),
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [patch net-next RFC 7/8] mlxsw: spectrum: Support for counters on router interfaces
  2017-02-16 15:22 [patch net-next RFC 0/8] Add support for pipeline debug (dpipe) Jiri Pirko
                   ` (5 preceding siblings ...)
  2017-02-16 15:22 ` [patch net-next RFC 6/8] mlxsw: reg: Add Router Interface Counter Register Jiri Pirko
@ 2017-02-16 15:22 ` Jiri Pirko
  2017-02-16 15:22 ` [patch net-next RFC 8/8] mlxsw: spectrum: Add Support for erif table entries access Jiri Pirko
                   ` (3 subsequent siblings)
  10 siblings, 0 replies; 22+ messages in thread
From: Jiri Pirko @ 2017-02-16 15:22 UTC (permalink / raw)
  To: netdev
  Cc: davem, arkadis, idosch, mlxsw, jhs, ivecera, roopa, f.fainelli,
	vivien.didelot, john.fastabend, andrew

From: Arkadi Sharshevsky <arkadis@mellanox.com>

Add support for counter allocation on router interfaces. The allocation
depends on the counter state of relevant table. In case the counting is
disabled or no counters left the counter index will be set as invalid.

Also a counter pool for router allocation is added.

Signed-off-by: Arakdi Sharshevsky <arkadis@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlxsw/spectrum.c     | 143 +++++++++++++++++++++
 drivers/net/ethernet/mellanox/mlxsw/spectrum.h     |  19 +++
 drivers/net/ethernet/mellanox/mlxsw/spectrum_cnt.c |   7 +-
 drivers/net/ethernet/mellanox/mlxsw/spectrum_cnt.h |   9 +-
 4 files changed, 176 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum.c
index cb66416..9a8c563 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum.c
@@ -3318,6 +3318,138 @@ static struct mlxsw_config_profile mlxsw_sp_config_profile = {
 	.resource_query_enable		= 1,
 };
 
+static void mlxsw_sp_rif_counter_set(struct mlxsw_sp_rif *r,
+				     int counter_index, bool valid,
+				     enum mlxsw_sp_rif_counter_dir dir)
+{
+	switch (dir) {
+	case MLXSW_SP_RIF_COUNTER_EGRESS:
+		r->counter_egress = counter_index;
+		r->counter_egress_valid = valid;
+		break;
+	case MLXSW_SP_RIF_COUNTER_INGRESS:
+		r->counter_ingress = counter_index;
+		r->counter_ingress_valid = valid;
+		break;
+	}
+}
+
+static int mlxsw_sp_rif_counter_get(struct mlxsw_sp_rif *r,
+				    enum mlxsw_sp_rif_counter_dir dir)
+{
+	switch (dir) {
+	case MLXSW_SP_RIF_COUNTER_EGRESS:
+		if (r->counter_egress_valid)
+			return r->counter_egress;
+		break;
+	case MLXSW_SP_RIF_COUNTER_INGRESS:
+		if (r->counter_ingress_valid)
+			return r->counter_ingress;
+		break;
+	}
+	return -EINVAL;
+}
+
+static int mlxsw_sp_rif_counter_edit(struct mlxsw_sp *mlxsw_sp, u16 rif,
+				     int counter_index, bool enable,
+				     enum mlxsw_sp_rif_counter_dir dir)
+{
+	char ritr_pl[MLXSW_REG_RITR_LEN];
+	bool is_egress = false;
+	int err;
+
+	if (dir == MLXSW_SP_RIF_COUNTER_EGRESS)
+		is_egress = true;
+	mlxsw_reg_ritr_rif_pack(ritr_pl, rif);
+	err = mlxsw_reg_query(mlxsw_sp->core, MLXSW_REG(ritr), ritr_pl);
+	if (err)
+		return err;
+
+	mlxsw_reg_ritr_counter_pack(ritr_pl, counter_index, enable,
+				    is_egress);
+	return mlxsw_reg_write(mlxsw_sp->core, MLXSW_REG(ritr), ritr_pl);
+}
+
+int mlxsw_sp_rif_counter_value_get(struct mlxsw_sp *mlxsw_sp,
+				   struct mlxsw_sp_rif *r,
+				   enum mlxsw_sp_rif_counter_dir dir, u64 *cnt)
+{
+	char ricnt_pl[MLXSW_REG_RICNT_LEN];
+	int counter_index;
+	int err;
+
+	counter_index = mlxsw_sp_rif_counter_get(r, dir);
+	if (counter_index < 0)
+		return -EINVAL;
+	mlxsw_reg_ricnt_pack(ricnt_pl, counter_index,
+			     MLXSW_REG_RICNT_OPCODE_NOP);
+	err = mlxsw_reg_query(mlxsw_sp->core, MLXSW_REG(ricnt), ricnt_pl);
+	if (err)
+		return err;
+	*cnt = mlxsw_reg_ricnt_good_unicast_packets_get(ricnt_pl);
+	return 0;
+}
+
+static int mlxsw_sp_rif_counter_clear(struct mlxsw_sp *mlxsw_sp,
+				      int counter_index)
+{
+	char ricnt_pl[MLXSW_REG_RICNT_LEN];
+
+	mlxsw_reg_ricnt_pack(ricnt_pl, counter_index,
+			     MLXSW_REG_RICNT_OPCODE_CLEAR);
+	return mlxsw_reg_write(mlxsw_sp->core, MLXSW_REG(ricnt), ricnt_pl);
+}
+
+int mlxsw_sp_rif_counter_alloc(struct mlxsw_sp *mlxsw_sp,
+			       struct mlxsw_sp_rif *r,
+			       enum mlxsw_sp_rif_counter_dir dir)
+{
+	int counter_index;
+	int err;
+
+	counter_index = mlxsw_sp_counter_alloc(mlxsw_sp,
+					       MLXSW_SP_COUNTER_SUB_POOL_RIF,
+					       MLXSW_SP_COUNTER_SIZE_RIF_BASIC);
+	if (counter_index < 0) {
+		err = counter_index;
+		return err;
+	}
+
+	err = mlxsw_sp_rif_counter_clear(mlxsw_sp, counter_index);
+	if (err)
+		goto err_counter_clear;
+
+	err = mlxsw_sp_rif_counter_edit(mlxsw_sp, r->rif, counter_index,
+					true, dir);
+	if (err)
+		goto err_counter_edit;
+
+	mlxsw_sp_rif_counter_set(r, counter_index, true, dir);
+	return 0;
+
+err_counter_edit:
+err_counter_clear:
+	mlxsw_sp_counter_free(mlxsw_sp, MLXSW_SP_COUNTER_SUB_POOL_RIF,
+			      counter_index);
+	return err;
+}
+
+void mlxsw_sp_rif_counter_free(struct mlxsw_sp *mlxsw_sp,
+			       struct mlxsw_sp_rif *r,
+			       enum mlxsw_sp_rif_counter_dir dir)
+{
+	int counter_index;
+
+	counter_index = mlxsw_sp_rif_counter_get(r, dir);
+	if (counter_index < 0)
+		return;
+	mlxsw_sp_rif_counter_edit(mlxsw_sp, r->rif,
+				  counter_index, false, dir);
+	mlxsw_sp_counter_free(mlxsw_sp, MLXSW_SP_COUNTER_SUB_POOL_RIF,
+			      counter_index);
+	mlxsw_sp_rif_counter_set(r, 0, false, dir);
+}
+
 static struct mlxsw_driver mlxsw_sp_driver = {
 	.kind				= mlxsw_sp_driver_name,
 	.priv_size			= sizeof(struct mlxsw_sp),
@@ -3549,6 +3681,15 @@ mlxsw_sp_vport_rif_sp_create(struct mlxsw_sp_port *mlxsw_sp_vport,
 	f->r = r;
 	mlxsw_sp->rifs[rif] = r;
 
+	if (devlink_dpipe_table_counter_enabled(priv_to_devlink(mlxsw_sp->core),
+						MLXSW_SP_DPIPE_TABLE_NAME_ERIF)){
+		err = mlxsw_sp_rif_counter_alloc(mlxsw_sp, r,
+						 MLXSW_SP_RIF_COUNTER_EGRESS);
+		if (err)
+			netdev_dbg(mlxsw_sp_vport->dev,
+				   "Counter alloc Failed err=%d\n", err);
+	}
+
 	return r;
 
 err_rif_alloc:
@@ -3570,6 +3711,8 @@ static void mlxsw_sp_vport_rif_sp_destroy(struct mlxsw_sp_port *mlxsw_sp_vport,
 	u16 rif = r->rif;
 
 	mlxsw_sp_router_rif_gone_sync(mlxsw_sp, r);
+	mlxsw_sp_rif_counter_free(mlxsw_sp, r, MLXSW_SP_RIF_COUNTER_EGRESS);
+	mlxsw_sp_rif_counter_free(mlxsw_sp, r, MLXSW_SP_RIF_COUNTER_INGRESS);
 
 	mlxsw_sp->rifs[rif] = NULL;
 	f->r = NULL;
diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum.h b/drivers/net/ethernet/mellanox/mlxsw/spectrum.h
index 1aec58a..1fb2a43 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum.h
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum.h
@@ -85,6 +85,11 @@
 
 #define MLXSW_SP_CELL_FACTOR 2	/* 2 * cell_size / (IPG + cell_size + 1) */
 
+enum mlxsw_sp_rif_counter_dir {
+	MLXSW_SP_RIF_COUNTER_INGRESS,
+	MLXSW_SP_RIF_COUNTER_EGRESS,
+};
+
 static inline u16 mlxsw_sp_pfc_delay_get(int mtu, u16 delay)
 {
 	delay = MLXSW_SP_BYTES_TO_CELLS(DIV_ROUND_UP(delay, BITS_PER_BYTE));
@@ -116,6 +121,10 @@ struct mlxsw_sp_rif {
 	unsigned char addr[ETH_ALEN];
 	int mtu;
 	u16 rif;
+	unsigned int counter_ingress;
+	bool counter_ingress_valid;
+	unsigned int counter_egress;
+	bool counter_egress_valid;
 };
 
 struct mlxsw_sp_mid {
@@ -711,4 +720,14 @@ int mlxsw_sp_flower_replace(struct mlxsw_sp_port *mlxsw_sp_port, bool ingress,
 void mlxsw_sp_flower_destroy(struct mlxsw_sp_port *mlxsw_sp_port, bool ingress,
 			     struct tc_cls_flower_offload *f);
 
+int mlxsw_sp_rif_counter_value_get(struct mlxsw_sp *mlxsw_sp,
+				   struct mlxsw_sp_rif *r,
+				   enum mlxsw_sp_rif_counter_dir dir,
+				   u64 *cnt);
+void mlxsw_sp_rif_counter_free(struct mlxsw_sp *mlxsw_sp,
+			       struct mlxsw_sp_rif *r,
+			       enum mlxsw_sp_rif_counter_dir dir);
+int mlxsw_sp_rif_counter_alloc(struct mlxsw_sp *mlxsw_sp,
+			       struct mlxsw_sp_rif *r,
+			       enum mlxsw_sp_rif_counter_dir dir);
 #endif
diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum_cnt.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum_cnt.c
index a0344c6..3e3ca6b 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum_cnt.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_cnt.c
@@ -53,7 +53,12 @@ struct mlxsw_sp_counter_pool {
 	unsigned int sub_pools_count;
 };
 
-static struct mlxsw_sp_counter_sub_pool mlxsw_sp_counter_sub_pools[] = {};
+static struct mlxsw_sp_counter_sub_pool mlxsw_sp_counter_sub_pools[] = {
+	[MLXSW_SP_COUNTER_SUB_POOL_RIF] = {
+		.bank_count = 2,
+		.entry_size = MLXSW_SP_COUNTER_SIZE_RIF_BASIC,
+	},
+};
 
 int mlxsw_sp_counter_pool_init(struct mlxsw_sp *mlxsw_sp)
 {
diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum_cnt.h b/drivers/net/ethernet/mellanox/mlxsw/spectrum_cnt.h
index d300b89..aaf4c90 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum_cnt.h
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_cnt.h
@@ -37,7 +37,14 @@
 
 #include "spectrum.h"
 
-enum mlxsw_sp_counter_sub_pool_id;
+enum mlxsw_sp_counter_sub_pool_id {
+	MLXSW_SP_COUNTER_SUB_POOL_RIF,
+};
+
+/* Counter indexes number needed for different resources */
+enum mlxsw_sp_counter_sizes {
+	MLXSW_SP_COUNTER_SIZE_RIF_BASIC = 10,
+};
 
 int mlxsw_sp_counter_alloc(struct mlxsw_sp *mlxsw_sp, unsigned int sub_pool_id,
 			   unsigned int entry_size);
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [patch net-next RFC 8/8] mlxsw: spectrum: Add Support for erif table entries access
  2017-02-16 15:22 [patch net-next RFC 0/8] Add support for pipeline debug (dpipe) Jiri Pirko
                   ` (6 preceding siblings ...)
  2017-02-16 15:22 ` [patch net-next RFC 7/8] mlxsw: spectrum: Support for counters on router interfaces Jiri Pirko
@ 2017-02-16 15:22 ` Jiri Pirko
  2017-02-16 15:51 ` [patch net-next RFC 0/8] Add support for pipeline debug (dpipe) John Fastabend
                   ` (2 subsequent siblings)
  10 siblings, 0 replies; 22+ messages in thread
From: Jiri Pirko @ 2017-02-16 15:22 UTC (permalink / raw)
  To: netdev
  Cc: davem, arkadis, idosch, mlxsw, jhs, ivecera, roopa, f.fainelli,
	vivien.didelot, john.fastabend, andrew

From: Arkadi Sharshevsky <arkadis@mellanox.com>

Implement dpipe's table ops for erif table which provide:
1. Getting the entries in the table with the associate values.
	- match on "mlxsw_meta:erif_index"
	- action on "mlxsw_meta:forwared_out"
2. Synchronize the hardware in case of enabling/disabling counters which
   mean removing erif counters from all interfaces.

Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
---
 .../net/ethernet/mellanox/mlxsw/spectrum_dpipe.c   | 167 ++++++++++++++++++++-
 1 file changed, 166 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum_dpipe.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum_dpipe.c
index 41e8e33..a7eebee 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum_dpipe.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_dpipe.c
@@ -100,7 +100,172 @@ static struct devlink_dpipe_hfield mlxsw_sp_dpipe_actions_erif[] = {
 	},
 };
 
-static struct devlink_dpipe_table_ops mlxsw_sp_erif_ops = {};
+static void mlxsw_sp_erif_entry_clear(struct devlink_dpipe_entry *entry)
+{
+	struct devlink_dpipe_hfield_val *val;
+	unsigned int val_count, val_index;
+
+	val = entry->actions;
+	val_count = entry->actions_count;
+	for (val_index = 0; val_index < val_count; val_index++) {
+		kfree(val[val_index].value);
+		kfree(val[val_index].mask);
+	}
+
+	val = entry->matches;
+	val_count = entry->matches_count;
+	for (val_index = 0; val_index < val_count; val_index++) {
+		kfree(val[val_index].value);
+		kfree(val[val_index].mask);
+	}
+}
+
+static int mlxsw_sp_erif_entry_get(struct mlxsw_sp *mlxsw_sp,
+				   struct devlink_dpipe_entry *entry,
+				   struct mlxsw_sp_rif *r,
+				   bool counters_enabled)
+{
+	u32 *action_value;
+	u32 *rif_value;
+	u64 cnt;
+	int err;
+
+	/* Set Match RIF index */
+	rif_value = entry->matches->value;
+	*rif_value = r->rif;
+	entry->matches->mapping_value = r->dev->ifindex;
+
+	/* Set Action Forwarding */
+	action_value = entry->actions->value;
+	*action_value = 1;
+
+	entry->counter_valid = false;
+	entry->counter = 0;
+	if (!counters_enabled)
+		return 0;
+
+	entry->index = r->rif;
+	err = mlxsw_sp_rif_counter_value_get(mlxsw_sp, r,
+					     MLXSW_SP_RIF_COUNTER_EGRESS,
+					     &cnt);
+	if (!err) {
+		entry->counter = cnt;
+		entry->counter_valid = true;
+	}
+	return 0;
+}
+
+static int mlxsw_sp_erif_entry_prepare(struct devlink_dpipe_entry *entry,
+				       struct devlink_dpipe_hfield_val *match,
+				       struct devlink_dpipe_hfield_val *action)
+{
+	entry->matches = match;
+	entry->matches_count = 1;
+
+	entry->actions = action;
+	entry->actions_count = 1;
+
+	match->hfield = mlxsw_sp_dpipe_matches_erif;
+	match->value_size = sizeof(u32);
+	match->value = kmalloc(match->value_size, GFP_KERNEL);
+	if (!match->value)
+		return -ENOMEM;
+
+	action->hfield = &mlxsw_sp_dpipe_actions_erif[MLXSW_SP_DPIPE_TABLE_ERIF_ACTION_FOWARD];
+	action->value_size = sizeof(u32);
+	action->value = kmalloc(action->value_size, GFP_KERNEL);
+	if (!action->value)
+		goto err_action_alloc;
+	return 0;
+
+err_action_alloc:
+	kfree(match->value);
+	return -ENOMEM;
+}
+
+static int mlxsw_sp_table_erif_entries_dump(void *priv, bool counters_enabled,
+					    struct devlink_dpipe_dump_ctx *
+					    dump_ctx)
+{
+	struct devlink_dpipe_hfield_val match = {0}, action = {0};
+	struct devlink_dpipe_entry entry = {0};
+	struct mlxsw_sp *mlxsw_sp = priv;
+	unsigned int rif_count;
+	int i, j;
+	int err;
+
+	err = mlxsw_sp_erif_entry_prepare(&entry, &match, &action);
+	if (err)
+		return err;
+
+	rif_count = MLXSW_CORE_RES_GET(mlxsw_sp->core, MAX_RIFS);
+	rtnl_lock();
+	i = 0;
+start_again:
+	err = devlink_dpipe_entry_prepare_ctx(dump_ctx);
+	if (err)
+		return err;
+	j = 0;
+	for (; i < rif_count; i++) {
+		if (!mlxsw_sp->rifs[i])
+			continue;
+		err = mlxsw_sp_erif_entry_get(mlxsw_sp, &entry,
+					      mlxsw_sp->rifs[i],
+					      counters_enabled);
+		if (err)
+			goto err_entry_get;
+		err = devlink_dpipe_entry_append_ctx(dump_ctx, &entry);
+		if (err) {
+			if (err == -EMSGSIZE) {
+				if (!j)
+					goto err_entry_append;
+				break;
+			}
+			goto err_entry_append;
+		}
+		j++;
+	}
+
+	devlink_dpipe_entry_close_ctx(dump_ctx);
+	if (i != rif_count)
+		goto start_again;
+	rtnl_unlock();
+
+	mlxsw_sp_erif_entry_clear(&entry);
+	return 0;
+err_entry_append:
+err_entry_get:
+	rtnl_unlock();
+	mlxsw_sp_erif_entry_clear(&entry);
+	return err;
+}
+
+static int mlxsw_sp_table_erif_counter_update(void *priv, bool enable)
+{
+	struct mlxsw_sp *mlxsw_sp = priv;
+	int i;
+
+	rtnl_lock();
+	for (i = 0; i < MLXSW_CORE_RES_GET(mlxsw_sp->core, MAX_RIFS); i++) {
+		if (!mlxsw_sp->rifs[i])
+			continue;
+		if (enable)
+			mlxsw_sp_rif_counter_alloc(mlxsw_sp,
+						   mlxsw_sp->rifs[i],
+						   MLXSW_SP_RIF_COUNTER_EGRESS);
+		else
+			mlxsw_sp_rif_counter_free(mlxsw_sp,
+						  mlxsw_sp->rifs[i],
+						  MLXSW_SP_RIF_COUNTER_EGRESS);
+	}
+	rtnl_unlock();
+	return 0;
+}
+
+static struct devlink_dpipe_table_ops mlxsw_sp_erif_ops = {
+	.entries_dump = mlxsw_sp_table_erif_entries_dump,
+	.counter_set_update = mlxsw_sp_table_erif_counter_update,
+};
 
 int mlxsw_sp_dpipe_init(struct mlxsw_sp *mlxsw_sp)
 {
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [patch net-next RFC 0/8] Add support for pipeline debug (dpipe)
  2017-02-16 15:22 [patch net-next RFC 0/8] Add support for pipeline debug (dpipe) Jiri Pirko
                   ` (7 preceding siblings ...)
  2017-02-16 15:22 ` [patch net-next RFC 8/8] mlxsw: spectrum: Add Support for erif table entries access Jiri Pirko
@ 2017-02-16 15:51 ` John Fastabend
  2017-02-16 16:26   ` Jiri Pirko
  2017-02-16 16:11 ` Andrew Lunn
  2017-02-16 17:04 ` Andrew Lunn
  10 siblings, 1 reply; 22+ messages in thread
From: John Fastabend @ 2017-02-16 15:51 UTC (permalink / raw)
  To: Jiri Pirko, netdev
  Cc: davem, arkadis, idosch, mlxsw, jhs, ivecera, roopa, f.fainelli,
	vivien.didelot, andrew

On 17-02-16 07:22 AM, Jiri Pirko wrote:
> From: Jiri Pirko <jiri@mellanox.com>
> 
> Arkadi says:
> 
> While doing the hardware offloading process much of the hardware
> specifics cannot be presented. An example for such is the routing
> LPM algorithm which differ in hardware implementation from the
> kernel software implementation. The only information the user receives
> is whether specific route is offloaded or not, but he cannot really
> understand the underlying implementation nor get the specific statistics
> related to that process.
> 

Agreed! Analyzing performance anomalies and optimizations is nearly
impossible without this.

> Another example is ACL offload using TC which is commonly implemented
> using TCAM memory. Currently there is no capability to gain visibility
> into the TCAM structure and to debug suboptimal resource allocation.

Yep.

> 
> This patchset introduces capability for exporting the ASICs pipeline
> abstraction via devlink infrastructure, which should serve as an
> complementary tool. This infrastructure allows the user to get visibility
> into the ASIC by modeling it as a set of match/action tables.
> 
> The main objects defined:
> Table - abstraction for a single pipeline stage. Contains the
>         available match/actions and counter availability.
> Entry - entry in a specific table with specific matches/actions
>         values and dedicated counter.
> Header/field - tuples which describes the tables behavior.
> 

We also need to understand the table topology on devices that have
flexible table topologies. For example some tables are processed
in parallel while others are sequential. One bug I've seen is an ACL
rule being inserted into a table in front of the tunnel engine for
example when it needed to be after the tunnel engine. What resulted
was incorrect drop of packets. That is just one example that was
fairly easy to diagnose more subtle issues are possible.

It looks like this could be added later as another nested block without
breaking compatibility.

> As an example one of the ASIC's L3 blocks will be modeled. The egress
> rif (router interface) table is the final step in the L3 pipeline
> processing which does match on the internal rif index which was
> determined before by the routing logic. The erif table determines
> whether to forward or drop the packet and updates the corresponding
> rif L3 statistics.
> 
> To expose this internal resources a special metadata header will
> be introduced that describes the internal information gathered by
> the ASIC's pipeline and contains the following fields: rif_port_index,
> forward and drop.
> 
> Some internal hardware resources have direct mapping to kernel
> objects. For example the rif_port_index is mapped to the net-devices
> ifindex. By providing this mapping the users gains visibility into
> the offloading process.
> 
> Follow-up work will include exporting more L3 tables which will give
> visibility into the routing process.
> 
> First stage is adding support for dpipe in devlink. Next add support
> in spectrum driver. Finally implement egress router interface
> (erif) table for spectrum ASIC as an example.
> 

+1 perhaps not surprisingly in general the idea looks great to me.

Another thought once something like this is in place. We currently have
a provisioning step that is done out of band typically via firmware
configuration at the moment where table sizes are specified, this is because
many of the table resources are shared across device. It seems like this might
be the right place to expose that to "expert" users at some point once
all this dpipe interface is worked out. Just a thought lets not get to
hung up on it now though.

Thanks,
John

> Arkadi Sharshevsky (8):
>   devlink: Support for pipeline debug (dpipe)
>   mlxsw: spectrum: Add support for flow counter allocator
>   mlxsw: reg: Add counter fields to RITR register
>   mlxsw: spectrum: Add placeholder for dpipe
>   mlxsw: spectrum: Add definition for egress rif table
>   mlxsw: reg: Add Router Interface Counter Register
>   mlxsw: spectrum: Support for counters on router interfaces
>   mlxsw: spectrum: Add Support for erif table entries access
> 
>  drivers/net/ethernet/mellanox/mlxsw/Makefile       |   3 +-
>  drivers/net/ethernet/mellanox/mlxsw/reg.h          | 178 +++++
>  drivers/net/ethernet/mellanox/mlxsw/resources.h    |   2 +
>  drivers/net/ethernet/mellanox/mlxsw/spectrum.c     | 163 +++++
>  drivers/net/ethernet/mellanox/mlxsw/spectrum.h     |  21 +
>  drivers/net/ethernet/mellanox/mlxsw/spectrum_cnt.c | 182 +++++
>  drivers/net/ethernet/mellanox/mlxsw/spectrum_cnt.h |  56 ++
>  .../net/ethernet/mellanox/mlxsw/spectrum_dpipe.c   | 303 +++++++++
>  .../net/ethernet/mellanox/mlxsw/spectrum_dpipe.h   |  43 ++
>  include/net/devlink.h                              | 224 +++++-
>  include/uapi/linux/devlink.h                       |  50 +-
>  net/core/devlink.c                                 | 747 +++++++++++++++++++++
>  12 files changed, 1969 insertions(+), 3 deletions(-)
>  create mode 100644 drivers/net/ethernet/mellanox/mlxsw/spectrum_cnt.c
>  create mode 100644 drivers/net/ethernet/mellanox/mlxsw/spectrum_cnt.h
>  create mode 100644 drivers/net/ethernet/mellanox/mlxsw/spectrum_dpipe.c
>  create mode 100644 drivers/net/ethernet/mellanox/mlxsw/spectrum_dpipe.h
> 

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [patch net-next RFC 1/8] devlink: Support for pipeline debug (dpipe)
  2017-02-16 15:22 ` [patch net-next RFC 1/8] devlink: Support " Jiri Pirko
@ 2017-02-16 15:57   ` John Fastabend
  2017-02-17  8:49   ` Simon Horman
  1 sibling, 0 replies; 22+ messages in thread
From: John Fastabend @ 2017-02-16 15:57 UTC (permalink / raw)
  To: Jiri Pirko, netdev
  Cc: davem, arkadis, idosch, mlxsw, jhs, ivecera, roopa, f.fainelli,
	vivien.didelot, andrew

On 17-02-16 07:22 AM, Jiri Pirko wrote:
> From: Arkadi Sharshevsky <arkadis@mellanox.com>
> 
> The pipeline debug is used to export the pipeline abstractions
> for the main objects - tables, headers and entries. The only support for
> set is for changing the counter parameter on specific table.
> 

Couple quick comments this morning I'll review the code later today.

> The basic structures:
> 
> Header - can represent a real protocol header information or internal
>          metadata. Generic protocol headers like IPv4 can be shared
>          between drivers. Each driver can add local headers.
> 
> Field - part of a header. Can represent protocol field or specific
>         ASIC metadata field. Hardware special metadata fields can
>         be mapped to different resources, for example switch ASIC
>        	ports can have internal number which from the systems
>         point of view is mapped to netdeivce ifindex.
> 
> Hfield - Specific combination of header:field. This object is used
>          to represent the table behavior in terms of match/action.
> 
> Hfield_val - Represents value of specific Hfield.
> 
> Table - represents a hardware block which can be described with
>         match/action behavior. The match/action can be done on the
>         packets data or on the internal metadata that it gathered
>         along the packets traversal throw the pipeline which is vendor
>         specific and should be exported in order to provide
>         understanding of ASICs behavior.
> 
> Entry - represents single record in a specific table. The entry is
>         identified by specific combination of Hfield_vals for match
>         /action.

Should actions have their own type? I think this helps user space and
in general keep things clean.

> 
> Prior to accessing the tables/entries the drivers provide the header/
> field data base which is used by driver to user-space. The data base
> is split between the shared headers and unique headers.
> 

Having a header definition file in uapi that has known field:header values
is a nice optimization for user space that deals with multiple devices.
This way I can avoid string parsing and just look for my ipv6 header if
I am writing a simple user space application. More advanced (and I'm guessing
the stuff being used on the mlx switch or at least a lot of the applications
I use) probably don't need this but it is helpful on the lower end less
specialized equipment.

Thanks for doing this quick scan looks nice.
John

> Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
> Signed-off-by: Jiri Pirko <jiri@mellanox.com>
> ---
>  include/net/devlink.h        | 224 ++++++++++++-
>  include/uapi/linux/devlink.h |  50 ++-
>  net/core/devlink.c           | 747 +++++++++++++++++++++++++++++++++++++++++++
>  3 files changed, 1019 insertions(+), 2 deletions(-)
> 
> diff --git a/include/net/devlink.h b/include/net/devlink.h
> index d29e5fc..d480669 100644
> --- a/include/net/devlink.h
> +++ b/include/net/devlink.h
> @@ -25,6 +25,8 @@ struct devlink {
>  	struct list_head list;
>  	struct list_head port_list;
>  	struct list_head sb_list;
> +	struct list_head dpipe_table_list;
> +	struct devlink_dpipe_headers *dpipe_headers;
>  	const struct devlink_ops *ops;
>  	struct device *dev;
>  	possible_net_t _net;
> @@ -49,6 +51,157 @@ struct devlink_sb_pool_info {
>  	enum devlink_sb_threshold_type threshold_type;
>  };
>  
> +/**
> + * struct devlink_dpipe_field - dpipe field object
> + * @name: field name
> + * @id: id inside header fields array
> + * @bitwidth: bitwidth
> + * @mapping_type: mapping type
> + */
> +struct devlink_dpipe_field {
> +	const char *name;
> +	unsigned int id;
> +	unsigned int bitwidth;
> +	enum devlink_dpipe_field_mapping_type mapping_type;
> +};
> +
> +/**
> + * struct devlink_dpipe_header - dpipe header object
> + * @name: header name
> + * @id: id, global/local detrmined by global bit
> + * @fields: fields
> + * @fields_count: number of fields
> + * @global: indicates of header is shared like most protocol header
> + *	    or driver specific
> + */
> +struct devlink_dpipe_header {
> +	const char *name;
> +	unsigned int id;
> +	struct devlink_dpipe_field *fields;
> +	unsigned int fields_count;
> +	bool global;
> +};
> +
> +/**
> + * struct devlink_dpipe_hfield - represents header:field tuple
> + * @header: header
> + * @filed_id: field index
> + */
> +struct devlink_dpipe_hfield {
> +	struct devlink_dpipe_header *header;
> +	unsigned int field_id;
> +};
> +
> +/**
> + * struct devlink_dpipe_hfield_val - represents a value of specific
> + *				     tuple
> + * @hfield: the tuple the value is relevant for
> + * @mapping_value: in case the field has some mapping this value
> + *                 specified the mapping value
> + * @value_size: value size
> + * @value: the value
> + * @mask: bit mask
> + */
> +struct devlink_dpipe_hfield_val {
> +	struct devlink_dpipe_hfield *hfield;
> +	unsigned int mapping_value;
> +	unsigned int value_size;
> +	void *value;
> +	void *mask;
> +};
> +
> +/**
> + * struct devlink_dpipe_entry - table entry object
> + * @index: index of the entry in the table
> + * @matches: tuple match values
> + * @matches_count: count of matches tuples
> + * @actions: tuple actions values
> + * @actions_count: count of actions values
> + * @counter: value of counter
> + * @counter_valid: Specify if value is valid from hardware
> + */
> +struct devlink_dpipe_entry {
> +	unsigned int index;
> +	struct devlink_dpipe_hfield_val *matches;
> +	unsigned int matches_count;
> +	struct devlink_dpipe_hfield_val *actions;
> +	unsigned int actions_count;
> +	u64 counter;
> +	bool counter_valid;
> +};
> +
> +/**
> + * struct devlink_dpipe_dump_ctx - context provided to driver in order
> + *				   to dump
> + * @info: info
> + * @cmd: devlink command
> + * @skb: skb
> + * @nest: top attribute
> + * @hdr: hdr
> + */
> +struct devlink_dpipe_dump_ctx {
> +	struct genl_info *info;
> +	enum devlink_command cmd;
> +	struct sk_buff *skb;
> +	struct nlattr *nest;
> +	void *hdr;
> +};
> +
> +struct devlink_dpipe_table_ops;
> +
> +/**
> + * struct devlink_dpipe_table - table object
> + * @priv: private
> + * @name: table name
> + * @size: maximum number of entries
> + * @matches: header:field match tuples
> + * @matches_count: count of match tuples
> + * @actions: header:field action tuples
> + * @actions_count: count of action tuples
> + * @counters_enabled: indicates if counters are active
> + * @counter_control_extern: indicates if counter control is in dpipe or
> + *			    external tool
> + * @table_ops: table operations
> + * @rcu: rcu
> + */
> +struct devlink_dpipe_table {
> +	void *priv;
> +	struct list_head list;
> +	const char *name;
> +	int size;
> +	struct devlink_dpipe_hfield *matches;
> +	unsigned int matches_count;
> +	struct devlink_dpipe_hfield *actions;
> +	unsigned int actions_count;
> +	bool counters_enabled;
> +	bool counter_control_extern;
> +	struct devlink_dpipe_table_ops *table_ops;
> +	struct rcu_head rcu;
> +};
> +
> +/**
> + * struct devlink_dpipe_table_ops - dpipe_table ops
> + * @entries_dump - dumps all active entries in the table
> + * @counter_set_update - when changing the counter status hardware sync
> + *			 maybe needed to allocate/free counter related
> + *			 resources
> + */
> +struct devlink_dpipe_table_ops {
> +	int (*entries_dump)(void *priv, bool counters_enabled,
> +			    struct devlink_dpipe_dump_ctx *dump_ctx);
> +	int (*counter_set_update)(void *priv, bool enable);
> +};
> +
> +/**
> + * struct devlink_dpipe_headers - dpipe headers
> + * @headers - header array can be shared(global bit) or local
> + * @headers_count - count of headers
> + */
> +struct devlink_dpipe_headers {
> +	struct devlink_dpipe_header **headers;
> +	unsigned int headers_count;
> +};
> +
>  struct devlink_ops {
>  	int (*port_type_set)(struct devlink_port *devlink_port,
>  			     enum devlink_port_type port_type);
> @@ -132,7 +285,26 @@ int devlink_sb_register(struct devlink *devlink, unsigned int sb_index,
>  			u16 egress_pools_count, u16 ingress_tc_count,
>  			u16 egress_tc_count);
>  void devlink_sb_unregister(struct devlink *devlink, unsigned int sb_index);
> -
> +int devlink_dpipe_table_register(struct devlink *devlink,
> +				 const char *table_name,
> +				 struct devlink_dpipe_table_ops *table_ops,
> +				 void *priv,
> +				 struct devlink_dpipe_hfield *matches,
> +				 unsigned int matches_count,
> +				 struct devlink_dpipe_hfield *actions,
> +				 unsigned int actions_count,
> +				 bool counter_control_extern);
> +void devlink_dpipe_table_unregister(struct devlink *devlink,
> +				    const char *table_name);
> +int devlink_dpipe_headers_register(struct devlink *devlink,
> +				   struct devlink_dpipe_headers *dpipe_headers);
> +void devlink_dpipe_headers_unregister(struct devlink *devlink);
> +bool devlink_dpipe_table_counter_enabled(struct devlink *devlink,
> +					 const char *table_name);
> +int devlink_dpipe_entry_prepare_ctx(struct devlink_dpipe_dump_ctx *dump_ctx);
> +int devlink_dpipe_entry_append_ctx(struct devlink_dpipe_dump_ctx *dump_ctx,
> +				   struct devlink_dpipe_entry *entry);
> +int devlink_dpipe_entry_close_ctx(struct devlink_dpipe_dump_ctx *dump_ctx);
>  #else
>  
>  static inline struct devlink *devlink_alloc(const struct devlink_ops *ops,
> @@ -200,6 +372,56 @@ static inline void devlink_sb_unregister(struct devlink *devlink,
>  {
>  }
>  
> +int devlink_dpipe_table_register(struct devlink *devlink,
> +				 const char *table_name,
> +				 struct devlink_dpipe_table_ops *table_ops,
> +				 void *priv,
> +				 struct devlink_dpipe_hfield *matches,
> +				 unsigned int matches_count,
> +				 struct devlink_dpipe_hfield *actions,
> +				 unsigned int actions_count,
> +				 bool counter_control_extern)
> +{
> +	return 0;
> +}
> +
> +void devlink_dpipe_table_unregister(struct devlink *devlink,
> +				    const char *table_name);
> +{
> +}
> +
> +int devlink_dpipe_headers_register(struct devlink *devlink,
> +				   struct devlink_dpipe_headers *dpipe_headers);
> +{
> +	return 0;
> +}
> +
> +void devlink_dpipe_headers_unregister(struct devlink *devlink)
> +{
> +}
> +
> +bool devlink_dpipe_table_counter_enabled(struct devlink *devlink,
> +					 const char *table_name)
> +{
> +	return false;
> +}
> +
> +int devlink_dpipe_entry_prepare_ctx(struct devlink_dpipe_dump_ctx *dump_ctx)
> +{
> +	return 0;
> +}
> +
> +int devlink_dpipe_entry_append_ctx(struct devlink_dpipe_dump_ctx *dump_ctx,
> +				   struct devlink_dpipe_entry *entry)
> +{
> +	return 0;
> +}
> +
> +int devlink_dpipe_entry_close_ctx(struct devlink_dpipe_dump_ctx *dump_ctx)
> +{
> +	return 0;
> +}
> +
>  #endif
>  
>  #endif /* _NET_DEVLINK_H_ */
> diff --git a/include/uapi/linux/devlink.h b/include/uapi/linux/devlink.h
> index 0f1f3a1..bbab55d 100644
> --- a/include/uapi/linux/devlink.h
> +++ b/include/uapi/linux/devlink.h
> @@ -65,8 +65,12 @@ enum devlink_command {
>  #define DEVLINK_CMD_ESWITCH_MODE_SET /* obsolete, never use this! */ \
>  	DEVLINK_CMD_ESWITCH_SET
>  
> -	/* add new commands above here */
> +	DEVLINK_CMD_DPIPE_TABLES_GET,
> +	DEVLINK_CMD_DPIPE_ENTRIES_GET,
> +	DEVLINK_CMD_DPIPE_HEADERS_GET,
> +	DEVLINK_CMD_DPIPE_TABLE_COUNTER_SET,
>  
> +	/* add new commands above here */
>  	__DEVLINK_CMD_MAX,
>  	DEVLINK_CMD_MAX = __DEVLINK_CMD_MAX - 1
>  };
> @@ -148,10 +152,54 @@ enum devlink_attr {
>  	DEVLINK_ATTR_ESWITCH_MODE,		/* u16 */
>  	DEVLINK_ATTR_ESWITCH_INLINE_MODE,	/* u8 */
>  
> +	DEVLINK_ATTR_DPIPE_TABLES,		/* nested */
> +	DEVLINK_ATTR_DPIPE_TABLE,		/* nested */
> +	DEVLINK_ATTR_DPIPE_TABLE_NAME,		/* string */
> +	DEVLINK_ATTR_DPIPE_TABLE_SIZE,		/* u32 */
> +	DEVLINK_ATTR_DPIPE_TABLE_MATCHES,	/* nested */
> +	DEVLINK_ATTR_DPIPE_TABLE_ACTIONS,	/* nested */
> +	DEVLINK_ATTR_DPIPE_TABLE_COUNTERS_ENABLED,	/* flag */
> +
> +	DEVLINK_ATTR_DPIPE_HFIELD,		/* nested */
> +	DEVLINK_ATTR_DPIPE_HFIELD_VALUE,
> +	DEVLINK_ATTR_DPIPE_HFIELD_MASK,
> +	DEVLINK_ATTR_DPIPE_HFIELD_MAPPING,	/* u32 */
> +
> +	DEVLINK_ATTR_DPIPE_ENTRIES,		/* nested */
> +	DEVLINK_ATTR_DPIPE_ENTRY,		/* nested */
> +	DEVLINK_ATTR_DPIPE_ENTRY_INDEX,		/* u32 */
> +	DEVLINK_ATTR_DPIPE_ENTRY_MATCHES,	/* nested */
> +	DEVLINK_ATTR_DPIPE_ENTRY_ACTIONS,	/* nested */
> +	DEVLINK_ATTR_DPIPE_ENTRY_COUNTER,	/* u64 */
> +	DEVLINK_ATTR_DPIPE_ENTRY_COUNTER_VALID,	/* u32 */
> +
> +	DEVLINK_ATTR_DPIPE_HEADERS,		/* nested */
> +	DEVLINK_ATTR_DPIPE_HEADER,		/* nested */
> +	DEVLINK_ATTR_DPIPE_HEADER_NAME,		/* string */
> +	DEVLINK_ATTR_DPIPE_HEADER_ID,		/* u32 */
> +	DEVLINK_ATTR_DPIPE_HEADER_FIELDS,	/* string */
> +	DEVLINK_ATTR_DPIPE_HEADER_GLOBAL,	/* flag */
> +
> +	DEVLINK_ATTR_DPIPE_FIELD,		/* nested */
> +	DEVLINK_ATTR_DPIPE_FIELD_NAME,		/* string */
> +	DEVLINK_ATTR_DPIPE_FIELD_ID,		/* u32 */
> +	DEVLINK_ATTR_DPIPE_FIELD_BITWIDTH,	/* u32 */
> +	DEVLINK_ATTR_DPIPE_FIELD_MAPPING_TYPE,	/* u32 */
> +
> +	DEVLINK_ATTR_PAD,
> +
>  	/* add new attributes above here, update the policy in devlink.c */
>  
>  	__DEVLINK_ATTR_MAX,
>  	DEVLINK_ATTR_MAX = __DEVLINK_ATTR_MAX - 1
>  };
>  
> +/* Mapping between internal resource described by the field and system
> + * structure
> + */
> +enum devlink_dpipe_field_mapping_type {
> +	DEVLINK_DPIPE_FIELD_MAPPING_TYPE_NONE,
> +	DEVLINK_DPIPE_FIELD_MAPPING_TYPE_IFINDEX,
> +};
> +
>  #endif /* _UAPI_LINUX_DEVLINK_H_ */
> diff --git a/net/core/devlink.c b/net/core/devlink.c
> index e9c1e6a..286d7d2 100644
> --- a/net/core/devlink.c
> +++ b/net/core/devlink.c
> @@ -1493,8 +1493,588 @@ static int devlink_nl_cmd_eswitch_set_doit(struct sk_buff *skb,
>  		if (err)
>  			return err;
>  	}
> +	return 0;
> +}
> +
> +static int devlink_dpipe_hfield_put(struct sk_buff *skb,
> +				    struct devlink_dpipe_hfield *hfield)
> +{
> +	struct devlink_dpipe_header *header = hfield->header;
> +	struct devlink_dpipe_field *field = &header->fields[hfield->field_id];
> +
> +	if (nla_put_u32(skb, DEVLINK_ATTR_DPIPE_HEADER_ID, header->id) ||
> +	    nla_put_u32(skb, DEVLINK_ATTR_DPIPE_FIELD_ID, field->id))
> +		return -EMSGSIZE;
> +
> +	if (header->global)
> +		if (nla_put_flag(skb, DEVLINK_ATTR_DPIPE_HEADER_GLOBAL))
> +			return -EMSGSIZE;
> +	return 0;
> +}
> +
> +static int devlink_dpipe_hfield_val_put(struct sk_buff *skb,
> +					struct devlink_dpipe_hfield_val *val)
> +{
> +	struct devlink_dpipe_header *header;
> +	struct devlink_dpipe_field *field;
> +
> +	if (!val->hfield)
> +		return -EINVAL;
> +	if (devlink_dpipe_hfield_put(skb, val->hfield))
> +		return -EMSGSIZE;
> +	if (nla_put(skb, DEVLINK_ATTR_DPIPE_HFIELD_VALUE,
> +		    val->value_size, val->value))
> +		return -EMSGSIZE;
> +	if (val->mask)
> +		if (nla_put(skb, DEVLINK_ATTR_DPIPE_HFIELD_MASK,
> +			    val->value_size, val->mask))
> +			return -EMSGSIZE;
> +
> +	header = val->hfield->header;
> +	field = &header->fields[val->hfield->field_id];
> +	if (field->mapping_type)
> +		if (nla_put_u32(skb, DEVLINK_ATTR_DPIPE_HFIELD_MAPPING,
> +				val->mapping_value))
> +			return -EMSGSIZE;
> +	return 0;
> +}
> +
> +static int devlink_dpipe_hfield_vals_put(struct sk_buff *skb,
> +					 struct devlink_dpipe_hfield_val *vals,
> +					 unsigned int vals_count)
> +{
> +	struct nlattr *hfield_attr;
> +	int i;
> +	int err;
> +
> +	for (i = 0; i < vals_count; i++) {
> +		hfield_attr = nla_nest_start(skb, DEVLINK_ATTR_DPIPE_HFIELD);
> +		if (!hfield_attr)
> +			return -EMSGSIZE;
> +		err = devlink_dpipe_hfield_val_put(skb, &vals[i]);
> +		if (err)
> +			goto err_hfield_val_put;
> +		nla_nest_end(skb, hfield_attr);
> +	}
> +	return 0;
> +
> +err_hfield_val_put:
> +	nla_nest_cancel(skb, hfield_attr);
> +	return err;
> +}
> +
> +static int devlink_dpipe_hfields_put(struct sk_buff *skb,
> +				     struct devlink_dpipe_hfield *hfields,
> +				     unsigned int hfields_count)
> +{
> +	struct nlattr *hfield_attr;
> +	int i;
> +
> +	for (i = 0; i < hfields_count; i++) {
> +		hfield_attr = nla_nest_start(skb, DEVLINK_ATTR_DPIPE_HFIELD);
> +		if (!hfield_attr)
> +			return -EMSGSIZE;
> +		if (devlink_dpipe_hfield_put(skb, &hfields[i]))
> +			goto nla_put_failure;
> +		nla_nest_end(skb, hfield_attr);
> +	}
> +	return 0;
> +
> +nla_put_failure:
> +	nla_nest_cancel(skb, hfield_attr);
> +	return -EMSGSIZE;
> +}
> +
> +static int devlink_dpipe_table_put(struct sk_buff *skb,
> +				   struct devlink_dpipe_table *table)
> +{
> +	struct nlattr *matches_attr, *actions_attr, *table_attr;
> +
> +	table_attr = nla_nest_start(skb, DEVLINK_ATTR_DPIPE_TABLE);
> +	if (!table_attr)
> +		return -EMSGSIZE;
> +
> +	if (nla_put_string(skb, DEVLINK_ATTR_DPIPE_TABLE_NAME, table->name) ||
> +	    nla_put_u32(skb, DEVLINK_ATTR_DPIPE_TABLE_SIZE, table->size))
> +		goto nla_put_failure;
> +	if (table->counters_enabled)
> +		if (nla_put_flag(skb,
> +				 DEVLINK_ATTR_DPIPE_TABLE_COUNTERS_ENABLED))
> +			goto nla_put_failure;
> +
> +	matches_attr = nla_nest_start(skb, DEVLINK_ATTR_DPIPE_TABLE_MATCHES);
> +	if (!matches_attr)
> +		goto nla_put_failure;
> +
> +	if (devlink_dpipe_hfields_put(skb, table->matches,
> +				      table->matches_count)) {
> +		nla_nest_cancel(skb, matches_attr);
> +		goto nla_put_failure;
> +	}
> +
> +	actions_attr = nla_nest_start(skb, DEVLINK_ATTR_DPIPE_TABLE_ACTIONS);
> +	if (!actions_attr)
> +		goto nla_put_failure;
> +
> +	if (devlink_dpipe_hfields_put(skb, table->actions,
> +				      table->actions_count)) {
> +		nla_nest_cancel(skb, actions_attr);
> +		goto nla_put_failure;
> +	}
> +
> +	nla_nest_end(skb, actions_attr);
> +	nla_nest_end(skb, table_attr);
> +	return 0;
> +
> +nla_put_failure:
> +	nla_nest_cancel(skb, table_attr);
> +	return -EMSGSIZE;
> +}
> +
> +static int devlink_dpipe_send_and_alloc_skb(struct sk_buff **pskb,
> +					    struct genl_info *info)
> +{
> +	int err;
> +
> +	if (*pskb) {
> +		err = genlmsg_reply(*pskb, info);
> +		if (err)
> +			return err;
> +	}
> +	*pskb = genlmsg_new(GENLMSG_DEFAULT_SIZE, GFP_KERNEL);
> +	if (!*pskb)
> +		return -ENOMEM;
> +	return 0;
> +}
> +
> +static int devlink_dpipe_tables_fill(struct genl_info *info,
> +				     enum devlink_command cmd, int flags,
> +				     struct list_head *dpipe_tables)
> +
> +{
> +	struct devlink *devlink = info->user_ptr[0];
> +	struct devlink_dpipe_table *table;
> +	struct nlattr *tables_attr;
> +	struct sk_buff *skb = NULL;
> +	struct nlmsghdr *nlh;
> +	bool incomplete;
> +	void *hdr;
> +	int i;
> +	int err;
> +
> +	table = list_first_entry(dpipe_tables,
> +				 struct devlink_dpipe_table, list);
> +start_again:
> +	err = devlink_dpipe_send_and_alloc_skb(&skb, info);
> +	if (err)
> +		return err;
> +
> +	hdr = genlmsg_put(skb, info->snd_portid, info->snd_seq,
> +			  &devlink_nl_family, NLM_F_MULTI, cmd);
> +	if (!hdr)
> +		return -EMSGSIZE;
>  
> +	if (devlink_nl_put_handle(skb, devlink))
> +		goto nla_put_failure;
> +	tables_attr = nla_nest_start(skb, DEVLINK_ATTR_DPIPE_TABLES);
> +	if (!tables_attr)
> +		goto nla_put_failure;
> +
> +	i = 0;
> +	incomplete = false;
> +	list_for_each_entry_from(table, dpipe_tables, list) {
> +		err = devlink_dpipe_table_put(skb, table);
> +		if (err) {
> +			if (!i)
> +				goto err_table_put;
> +			incomplete = true;
> +			break;
> +		}
> +		i++;
> +	}
> +
> +	nla_nest_end(skb, tables_attr);
> +	genlmsg_end(skb, hdr);
> +	if (incomplete)
> +		goto start_again;
> +
> +send_done:
> +	nlh = nlmsg_put(skb, info->snd_portid, info->snd_seq,
> +			NLMSG_DONE, 0, flags | NLM_F_MULTI);
> +	if (!nlh) {
> +		err = devlink_dpipe_send_and_alloc_skb(&skb, info);
> +		if (err)
> +			goto err_skb_send_alloc;
> +		goto send_done;
> +	}
> +
> +	return genlmsg_reply(skb, info);
> +
> +nla_put_failure:
> +	err = -EMSGSIZE;
> +err_table_put:
> +err_skb_send_alloc:
> +	genlmsg_cancel(skb, hdr);
> +	nlmsg_free(skb);
> +	return err;
> +}
> +
> +static int devlink_nl_cmd_dpipe_tables_get(struct sk_buff *skb,
> +					   struct genl_info *info)
> +{
> +	struct devlink *devlink = info->user_ptr[0];
> +
> +	return devlink_dpipe_tables_fill(info, DEVLINK_CMD_DPIPE_TABLES_GET, 0,
> +					 &devlink->dpipe_table_list);
> +}
> +
> +static int devlink_dpipe_entry_put(struct sk_buff *skb,
> +				   struct devlink_dpipe_entry *entry)
> +{
> +	struct nlattr *entry_attr, *matches_attr, *actions_attr;
> +	int err;
> +
> +	entry_attr = nla_nest_start(skb, DEVLINK_ATTR_DPIPE_ENTRY);
> +	if (!entry_attr)
> +		return  -EMSGSIZE;
> +
> +	if (nla_put_u32(skb, DEVLINK_ATTR_DPIPE_ENTRY_INDEX, entry->index))
> +		goto nla_put_failure;
> +	if (entry->counter_valid)
> +		if (nla_put_u64_64bit(skb, DEVLINK_ATTR_DPIPE_ENTRY_COUNTER,
> +				      entry->counter, DEVLINK_ATTR_PAD))
> +			goto nla_put_failure;
> +
> +	matches_attr = nla_nest_start(skb, DEVLINK_ATTR_DPIPE_ENTRY_MATCHES);
> +	if (!matches_attr)
> +		goto nla_put_failure;
> +
> +	err = devlink_dpipe_hfield_vals_put(skb, entry->matches,
> +					    entry->matches_count);
> +	if (err) {
> +		nla_nest_cancel(skb, matches_attr);
> +		goto err_hfield_vals_put;
> +	}
> +	nla_nest_end(skb, matches_attr);
> +
> +	actions_attr = nla_nest_start(skb, DEVLINK_ATTR_DPIPE_ENTRY_ACTIONS);
> +	if (!actions_attr)
> +		goto nla_put_failure;
> +
> +	err = devlink_dpipe_hfield_vals_put(skb, entry->actions,
> +					    entry->actions_count);
> +	if (err) {
> +		nla_nest_cancel(skb, actions_attr);
> +		goto err_hfield_vals_put;
> +	}
> +	nla_nest_end(skb, actions_attr);
> +
> +	nla_nest_end(skb, entry_attr);
>  	return 0;
> +
> +nla_put_failure:
> +	err = -EMSGSIZE;
> +err_hfield_vals_put:
> +	nla_nest_cancel(skb, entry_attr);
> +	return err;
> +}
> +
> +static struct devlink_dpipe_table *
> +devlink_dpipe_table_find(struct list_head *dpipe_tables,
> +			 const char *table_name)
> +{
> +	struct devlink_dpipe_table *table;
> +
> +	list_for_each_entry_rcu(table, dpipe_tables, list) {
> +		if (!strcmp(table->name, table_name))
> +			return table;
> +	}
> +	return NULL;
> +}
> +
> +int devlink_dpipe_entry_prepare_ctx(struct devlink_dpipe_dump_ctx *dump_ctx)
> +{
> +	struct devlink *devlink;
> +	int err;
> +
> +	err = devlink_dpipe_send_and_alloc_skb(&dump_ctx->skb,
> +					       dump_ctx->info);
> +	if (err)
> +		return err;
> +
> +	dump_ctx->hdr = genlmsg_put(dump_ctx->skb,
> +				    dump_ctx->info->snd_portid,
> +				    dump_ctx->info->snd_seq,
> +				    &devlink_nl_family, NLM_F_MULTI,
> +				    dump_ctx->cmd);
> +	if (!dump_ctx->hdr)
> +		goto nla_put_failure;
> +
> +	devlink = dump_ctx->info->user_ptr[0];
> +	if (devlink_nl_put_handle(dump_ctx->skb, devlink))
> +		goto nla_put_failure;
> +	dump_ctx->nest = nla_nest_start(dump_ctx->skb,
> +					DEVLINK_ATTR_DPIPE_ENTRIES);
> +	if (!dump_ctx->nest)
> +		goto nla_put_failure;
> +	return 0;
> +
> +nla_put_failure:
> +	genlmsg_cancel(dump_ctx->skb, dump_ctx->hdr);
> +	nlmsg_free(dump_ctx->skb);
> +	return -EMSGSIZE;
> +}
> +
> +int devlink_dpipe_entry_append_ctx(struct devlink_dpipe_dump_ctx *dump_ctx,
> +				   struct devlink_dpipe_entry *entry)
> +{
> +	return devlink_dpipe_entry_put(dump_ctx->skb, entry);
> +}
> +
> +int devlink_dpipe_entry_close_ctx(struct devlink_dpipe_dump_ctx *dump_ctx)
> +{
> +	nla_nest_end(dump_ctx->skb, dump_ctx->nest);
> +	genlmsg_end(dump_ctx->skb, dump_ctx->hdr);
> +	return 0;
> +}
> +
> +static int devlink_dpipe_entries_fill(struct genl_info *info,
> +				      enum devlink_command cmd, int flags,
> +				      struct devlink_dpipe_table *table)
> +{
> +	struct devlink_dpipe_dump_ctx dump_ctx;
> +	struct nlmsghdr *nlh;
> +	int err;
> +
> +	dump_ctx.skb = NULL;
> +	dump_ctx.cmd = cmd;
> +	dump_ctx.info = info;
> +
> +	err = table->table_ops->entries_dump(table->priv,
> +					     table->counters_enabled,
> +					     &dump_ctx);
> +	if (err)
> +		goto err_entries_dump;
> +
> +send_done:
> +	nlh = nlmsg_put(dump_ctx.skb, info->snd_portid, info->snd_seq,
> +			NLMSG_DONE, 0, flags | NLM_F_MULTI);
> +	if (!nlh) {
> +		err = devlink_dpipe_send_and_alloc_skb(&dump_ctx.skb, info);
> +		if (err)
> +			goto err_skb_send_alloc;
> +		goto send_done;
> +	}
> +	return genlmsg_reply(dump_ctx.skb, info);
> +
> +err_entries_dump:
> +err_skb_send_alloc:
> +	genlmsg_cancel(dump_ctx.skb, dump_ctx.hdr);
> +	nlmsg_free(dump_ctx.skb);
> +	return err;
> +}
> +
> +static int devlink_nl_cmd_dpipe_entries_get(struct sk_buff *skb,
> +					    struct genl_info *info)
> +{
> +	struct devlink *devlink = info->user_ptr[0];
> +	struct devlink_dpipe_table *table;
> +	const char *table_name;
> +
> +	if (!info->attrs[DEVLINK_ATTR_DPIPE_TABLE_NAME])
> +		return -EINVAL;
> +
> +	table_name = nla_data(info->attrs[DEVLINK_ATTR_DPIPE_TABLE_NAME]);
> +	table = devlink_dpipe_table_find(&devlink->dpipe_table_list,
> +					 table_name);
> +	if (!table)
> +		return -EINVAL;
> +
> +	if (!table->table_ops->entries_dump)
> +		return -EINVAL;
> +
> +	return devlink_dpipe_entries_fill(info, DEVLINK_CMD_DPIPE_ENTRIES_GET,
> +					  0, table);
> +}
> +
> +static int devlink_dpipe_fields_put(struct sk_buff *skb,
> +				    const struct devlink_dpipe_header *header)
> +{
> +	struct devlink_dpipe_field *field;
> +	struct nlattr *field_attr;
> +	int i;
> +
> +	for (i = 0; i < header->fields_count; i++) {
> +		field = &header->fields[i];
> +		field_attr = nla_nest_start(skb, DEVLINK_ATTR_DPIPE_FIELD);
> +		if (!field_attr)
> +			return -EMSGSIZE;
> +		if (nla_put_string(skb, DEVLINK_ATTR_DPIPE_FIELD_NAME, field->name) ||
> +		    nla_put_u32(skb, DEVLINK_ATTR_DPIPE_FIELD_ID, field->id) ||
> +		    nla_put_u32(skb, DEVLINK_ATTR_DPIPE_FIELD_BITWIDTH, field->bitwidth))
> +			goto nla_put_failure;
> +		if (field->mapping_type)
> +			if (nla_put_u32(skb,
> +					DEVLINK_ATTR_DPIPE_FIELD_MAPPING_TYPE,
> +					field->mapping_type))
> +				goto nla_put_failure;
> +		nla_nest_end(skb, field_attr);
> +	}
> +	return 0;
> +
> +nla_put_failure:
> +	nla_nest_cancel(skb, field_attr);
> +	return -EMSGSIZE;
> +}
> +
> +static int devlink_dpipe_header_put(struct sk_buff *skb,
> +				    struct devlink_dpipe_header *header)
> +{
> +	struct nlattr *fields_attr, *header_attr;
> +	int err;
> +
> +	header_attr = nla_nest_start(skb, DEVLINK_ATTR_DPIPE_HEADER);
> +	if (!header)
> +		return -EMSGSIZE;
> +
> +	if (nla_put_string(skb, DEVLINK_ATTR_DPIPE_HEADER_NAME, header->name) ||
> +	    nla_put_u32(skb, DEVLINK_ATTR_DPIPE_HEADER_ID, header->id))
> +		goto nla_put_failure;
> +
> +	fields_attr = nla_nest_start(skb, DEVLINK_ATTR_DPIPE_HEADER_FIELDS);
> +	if (!fields_attr)
> +		goto nla_put_failure;
> +
> +	err = devlink_dpipe_fields_put(skb, header);
> +	if (err) {
> +		nla_nest_cancel(skb, fields_attr);
> +		goto nla_put_failure;
> +	}
> +	nla_nest_end(skb, fields_attr);
> +	nla_nest_end(skb, header_attr);
> +	return 0;
> +
> +nla_put_failure:
> +	err = -EMSGSIZE;
> +	nla_nest_cancel(skb, header_attr);
> +	return err;
> +}
> +
> +static int devlink_dpipe_headers_fill(struct genl_info *info,
> +				      enum devlink_command cmd, int flags,
> +				      struct devlink_dpipe_headers *
> +				      dpipe_headers)
> +{
> +	struct devlink *devlink = info->user_ptr[0];
> +	struct nlattr *headers_attr;
> +	struct sk_buff *skb = NULL;
> +	struct nlmsghdr *nlh;
> +	void *hdr;
> +	int i, j;
> +	int err;
> +
> +	i = 0;
> +start_again:
> +	err = devlink_dpipe_send_and_alloc_skb(&skb, info);
> +	if (err)
> +		return err;
> +
> +	hdr = genlmsg_put(skb, info->snd_portid, info->snd_seq,
> +			  &devlink_nl_family, NLM_F_MULTI, cmd);
> +	if (!hdr)
> +		return -EMSGSIZE;
> +
> +	if (devlink_nl_put_handle(skb, devlink))
> +		goto nla_put_failure;
> +	headers_attr = nla_nest_start(skb, DEVLINK_ATTR_DPIPE_HEADERS);
> +	if (!headers_attr)
> +		goto nla_put_failure;
> +
> +	j = 0;
> +	for (; i < dpipe_headers->headers_count; i++) {
> +		err = devlink_dpipe_header_put(skb, dpipe_headers->headers[i]);
> +		if (err) {
> +			if (!j)
> +				goto err_table_put;
> +			break;
> +		}
> +		j++;
> +	}
> +	nla_nest_end(skb, headers_attr);
> +	genlmsg_end(skb, hdr);
> +	if (i != dpipe_headers->headers_count)
> +		goto start_again;
> +
> +send_done:
> +	nlh = nlmsg_put(skb, info->snd_portid, info->snd_seq,
> +			NLMSG_DONE, 0, flags | NLM_F_MULTI);
> +	if (!nlh) {
> +		err = devlink_dpipe_send_and_alloc_skb(&skb, info);
> +		if (err)
> +			goto err_skb_send_alloc;
> +		goto send_done;
> +	}
> +	return genlmsg_reply(skb, info);
> +
> +nla_put_failure:
> +	err = -EMSGSIZE;
> +err_table_put:
> +err_skb_send_alloc:
> +	genlmsg_cancel(skb, hdr);
> +	nlmsg_free(skb);
> +	return err;
> +}
> +
> +static int devlink_nl_cmd_dpipe_headers_get(struct sk_buff *skb,
> +					    struct genl_info *info)
> +{
> +	struct devlink *devlink = info->user_ptr[0];
> +
> +	return devlink_dpipe_headers_fill(info, DEVLINK_CMD_DPIPE_HEADERS_GET,
> +					  0, devlink->dpipe_headers);
> +}
> +
> +static int devlink_dpipe_table_counter_set(struct devlink *devlink,
> +					   const char *table_name,
> +					   bool enable)
> +{
> +	struct devlink_dpipe_table *table;
> +
> +	table = devlink_dpipe_table_find(&devlink->dpipe_table_list,
> +					 table_name);
> +	if (!table)
> +		return -EINVAL;
> +
> +	if (table->counter_control_extern)
> +		return -EOPNOTSUPP;
> +
> +	if (!(table->counters_enabled ^ enable))
> +		return 0;
> +
> +	table->counters_enabled = enable;
> +	if (table->table_ops->counter_set_update)
> +		table->table_ops->counter_set_update(table->priv, enable);
> +	return 0;
> +}
> +
> +static int devlink_nl_cmd_dpipe_table_counter_set(struct sk_buff *skb,
> +						  struct genl_info *info)
> +{
> +	struct devlink *devlink = info->user_ptr[0];
> +	const char *table_name;
> +	int counter_enable;
> +
> +	if (!info->attrs[DEVLINK_ATTR_DPIPE_TABLE_NAME])
> +		return -EINVAL;
> +	table_name = nla_data(info->attrs[DEVLINK_ATTR_DPIPE_TABLE_NAME]);
> +
> +	if (!info->attrs[DEVLINK_ATTR_DPIPE_TABLE_COUNTERS_ENABLED])
> +		counter_enable = false;
> +	else
> +		counter_enable = true;
> +
> +	return devlink_dpipe_table_counter_set(devlink, table_name,
> +					       counter_enable);
>  }
>  
>  static const struct nla_policy devlink_nl_policy[DEVLINK_ATTR_MAX + 1] = {
> @@ -1512,6 +2092,8 @@ static const struct nla_policy devlink_nl_policy[DEVLINK_ATTR_MAX + 1] = {
>  	[DEVLINK_ATTR_SB_TC_INDEX] = { .type = NLA_U16 },
>  	[DEVLINK_ATTR_ESWITCH_MODE] = { .type = NLA_U16 },
>  	[DEVLINK_ATTR_ESWITCH_INLINE_MODE] = { .type = NLA_U8 },
> +	[DEVLINK_ATTR_DPIPE_TABLE_NAME] = { .type = NLA_NUL_STRING },
> +	[DEVLINK_ATTR_DPIPE_TABLE_COUNTERS_ENABLED] = { .type =  NLA_U32 },
>  };
>  
>  static const struct genl_ops devlink_nl_ops[] = {
> @@ -1644,6 +2226,34 @@ static const struct genl_ops devlink_nl_ops[] = {
>  		.flags = GENL_ADMIN_PERM,
>  		.internal_flags = DEVLINK_NL_FLAG_NEED_DEVLINK,
>  	},
> +	{
> +		.cmd = DEVLINK_CMD_DPIPE_TABLES_GET,
> +		.doit = devlink_nl_cmd_dpipe_tables_get,
> +		.policy = devlink_nl_policy,
> +		.flags = GENL_ADMIN_PERM,
> +		.internal_flags = DEVLINK_NL_FLAG_NEED_DEVLINK,
> +	},
> +	{
> +		.cmd = DEVLINK_CMD_DPIPE_ENTRIES_GET,
> +		.doit = devlink_nl_cmd_dpipe_entries_get,
> +		.policy = devlink_nl_policy,
> +		.flags = GENL_ADMIN_PERM,
> +		.internal_flags = DEVLINK_NL_FLAG_NEED_DEVLINK,
> +	},
> +	{
> +		.cmd = DEVLINK_CMD_DPIPE_HEADERS_GET,
> +		.doit = devlink_nl_cmd_dpipe_headers_get,
> +		.policy = devlink_nl_policy,
> +		.flags = GENL_ADMIN_PERM,
> +		.internal_flags = DEVLINK_NL_FLAG_NEED_DEVLINK,
> +	},
> +	{
> +		.cmd = DEVLINK_CMD_DPIPE_TABLE_COUNTER_SET,
> +		.doit = devlink_nl_cmd_dpipe_table_counter_set,
> +		.policy = devlink_nl_policy,
> +		.flags = GENL_ADMIN_PERM,
> +		.internal_flags = DEVLINK_NL_FLAG_NEED_DEVLINK,
> +	},
>  };
>  
>  static struct genl_family devlink_nl_family __ro_after_init = {
> @@ -1680,6 +2290,7 @@ struct devlink *devlink_alloc(const struct devlink_ops *ops, size_t priv_size)
>  	devlink_net_set(devlink, &init_net);
>  	INIT_LIST_HEAD(&devlink->port_list);
>  	INIT_LIST_HEAD(&devlink->sb_list);
> +	INIT_LIST_HEAD_RCU(&devlink->dpipe_table_list);
>  	return devlink;
>  }
>  EXPORT_SYMBOL_GPL(devlink_alloc);
> @@ -1880,6 +2491,142 @@ void devlink_sb_unregister(struct devlink *devlink, unsigned int sb_index)
>  }
>  EXPORT_SYMBOL_GPL(devlink_sb_unregister);
>  
> +/**
> + *	devlink_dpipe_headers_register - register dpipe headers
> + *
> + *	@devlink: devlink
> + *	@dpipe_headers: dpipe header array
> + *
> + *	Register the headers supported by hardware.
> + */
> +int devlink_dpipe_headers_register(struct devlink *devlink,
> +				   struct devlink_dpipe_headers *dpipe_headers)
> +{
> +	mutex_lock(&devlink_mutex);
> +	devlink->dpipe_headers = dpipe_headers;
> +	mutex_unlock(&devlink_mutex);
> +	return 0;
> +}
> +EXPORT_SYMBOL_GPL(devlink_dpipe_headers_register);
> +
> +/**
> + *	devlink_dpipe_headers_unregister - unregister dpipe headers
> + *
> + *	@devlink: devlink
> + *
> + *	Unregister the headers supported by hardware.
> + */
> +void devlink_dpipe_headers_unregister(struct devlink *devlink)
> +{
> +	mutex_lock(&devlink_mutex);
> +	devlink->dpipe_headers = NULL;
> +	mutex_unlock(&devlink_mutex);
> +}
> +EXPORT_SYMBOL_GPL(devlink_dpipe_headers_unregister);
> +
> +/**
> + *	devlink_dpipe_table_counter_enabled - check if counter allocation
> + *					      required
> + *	@devlink: devlink
> + *	@table_name: tables name
> + *
> + *	Used by driver to check if counter allocation is required.
> + *	After counter allocation is turned on the table entries
> + *	are updated to include counter statistics.
> + *
> + *	After that point on the driver must respect the counter
> + *	state so that each entry added to the table is added
> + *	with a counter.
> + */
> +bool devlink_dpipe_table_counter_enabled(struct devlink *devlink,
> +					 const char *table_name)
> +{
> +	struct devlink_dpipe_table *table;
> +	bool enabled;
> +
> +	rcu_read_lock();
> +	table = devlink_dpipe_table_find(&devlink->dpipe_table_list,
> +					 table_name);
> +	enabled = false;
> +	if (table)
> +		enabled = table->counters_enabled;
> +	rcu_read_unlock();
> +	return enabled;
> +}
> +EXPORT_SYMBOL_GPL(devlink_dpipe_table_counter_enabled);
> +
> +/**
> + *	devlink_dpipe_table_register - register dpipe table
> + *
> + *	@devlink: devlink
> + *	@table_name: table name
> + *	@table_ops: table ops
> + *	@priv: priv
> + *	@matches: match tuples
> + *	@matches_count: count of matches count
> + *	@actions: action tuples
> + *	@actions_count: count of action tuples
> + *	@counter_control_extern: external control for counters
> + */
> +int devlink_dpipe_table_register(struct devlink *devlink,
> +				 const char *table_name,
> +				 struct devlink_dpipe_table_ops *table_ops,
> +				 void *priv,
> +				 struct devlink_dpipe_hfield *matches,
> +				 unsigned int matches_count,
> +				 struct devlink_dpipe_hfield *actions,
> +				 unsigned int actions_count,
> +				 bool counter_control_extern)
> +{
> +	struct devlink_dpipe_table *table;
> +
> +	if (devlink_dpipe_table_find(&devlink->dpipe_table_list, table_name))
> +		return -EEXIST;
> +
> +	table = kzalloc(sizeof(*table), GFP_KERNEL);
> +	if (!table)
> +		return -ENOMEM;
> +
> +	table->name = table_name;
> +	table->table_ops = table_ops;
> +	table->priv = priv;
> +	table->matches = matches;
> +	table->matches_count = matches_count;
> +	table->actions = actions;
> +	table->actions_count = actions_count;
> +	table->counter_control_extern = counter_control_extern;
> +	mutex_lock(&devlink_mutex);
> +	list_add_tail_rcu(&table->list, &devlink->dpipe_table_list);
> +	mutex_unlock(&devlink_mutex);
> +	return 0;
> +}
> +EXPORT_SYMBOL_GPL(devlink_dpipe_table_register);
> +
> +/**
> + *	devlink_dpipe_table_unregister - unregister dpipe table
> + *
> + *	@devlink: devlink
> + *	@table_name: table name
> + */
> +void devlink_dpipe_table_unregister(struct devlink *devlink,
> +				    const char *table_name)
> +{
> +	struct devlink_dpipe_table *table;
> +
> +	mutex_lock(&devlink_mutex);
> +	table = devlink_dpipe_table_find(&devlink->dpipe_table_list,
> +					 table_name);
> +	if (!table)
> +		goto unlock;
> +	list_del_rcu(&table->list);
> +	mutex_unlock(&devlink_mutex);
> +	kfree_rcu(table, rcu);
> +	return;
> +unlock:
> +	mutex_unlock(&devlink_mutex);
> +}
> +EXPORT_SYMBOL_GPL(devlink_dpipe_table_unregister);
> +
>  static int __init devlink_module_init(void)
>  {
>  	return genl_register_family(&devlink_nl_family);
> 

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [patch net-next RFC 0/8] Add support for pipeline debug (dpipe)
  2017-02-16 15:22 [patch net-next RFC 0/8] Add support for pipeline debug (dpipe) Jiri Pirko
                   ` (8 preceding siblings ...)
  2017-02-16 15:51 ` [patch net-next RFC 0/8] Add support for pipeline debug (dpipe) John Fastabend
@ 2017-02-16 16:11 ` Andrew Lunn
  2017-02-16 16:20   ` Jiri Pirko
  2017-02-16 17:04 ` Andrew Lunn
  10 siblings, 1 reply; 22+ messages in thread
From: Andrew Lunn @ 2017-02-16 16:11 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: netdev, davem, arkadis, idosch, mlxsw, jhs, ivecera, roopa,
	f.fainelli, vivien.didelot, john.fastabend

On Thu, Feb 16, 2017 at 04:22:36PM +0100, Jiri Pirko wrote:
> From: Jiri Pirko <jiri@mellanox.com>
> 
> Arkadi says:

Hi Jiri, Arkadi

It is not mentioned here, but i assume you have a followup patchset
which extends the devlink command to enumerate what tables are
available and to print them?

What i seem to be missing is headers in uapi which links kernel and
userspace together. There are lots of enums defined, but few seem to
be exported to user space for the table parser/printer to use.

Hopefully it will be more obvious when the user space patches are
available.

Thanks
	Andrew

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [patch net-next RFC 0/8] Add support for pipeline debug (dpipe)
  2017-02-16 16:11 ` Andrew Lunn
@ 2017-02-16 16:20   ` Jiri Pirko
  2017-02-16 16:40     ` Andrew Lunn
  0 siblings, 1 reply; 22+ messages in thread
From: Jiri Pirko @ 2017-02-16 16:20 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: netdev, davem, arkadis, idosch, mlxsw, jhs, ivecera, roopa,
	f.fainelli, vivien.didelot, john.fastabend

Thu, Feb 16, 2017 at 05:11:50PM CET, andrew@lunn.ch wrote:
>On Thu, Feb 16, 2017 at 04:22:36PM +0100, Jiri Pirko wrote:
>> From: Jiri Pirko <jiri@mellanox.com>
>> 
>> Arkadi says:
>
>Hi Jiri, Arkadi
>
>It is not mentioned here, but i assume you have a followup patchset
>which extends the devlink command to enumerate what tables are
>available and to print them?

DEVLINK_CMD_DPIPE_TABLES_GET
command gets you all tables. But it gets it with the content. I guess
there could be some command to instruct devlink just to dump tables
without the content.


>
>What i seem to be missing is headers in uapi which links kernel and
>userspace together. There are lots of enums defined, but few seem to
>be exported to user space for the table parser/printer to use.

Not needed. All info is passed via devlink interface.


>
>Hopefully it will be more obvious when the user space patches are
>available.

I'll ask Arkadi to send the devlink userspace patches as RFC as well.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [patch net-next RFC 0/8] Add support for pipeline debug (dpipe)
  2017-02-16 15:51 ` [patch net-next RFC 0/8] Add support for pipeline debug (dpipe) John Fastabend
@ 2017-02-16 16:26   ` Jiri Pirko
  0 siblings, 0 replies; 22+ messages in thread
From: Jiri Pirko @ 2017-02-16 16:26 UTC (permalink / raw)
  To: John Fastabend
  Cc: netdev, davem, arkadis, idosch, mlxsw, jhs, ivecera, roopa,
	f.fainelli, vivien.didelot, andrew

Thu, Feb 16, 2017 at 04:51:12PM CET, john.fastabend@gmail.com wrote:
>On 17-02-16 07:22 AM, Jiri Pirko wrote:
>> From: Jiri Pirko <jiri@mellanox.com>
>> 
>> Arkadi says:
>> 
>> While doing the hardware offloading process much of the hardware
>> specifics cannot be presented. An example for such is the routing
>> LPM algorithm which differ in hardware implementation from the
>> kernel software implementation. The only information the user receives
>> is whether specific route is offloaded or not, but he cannot really
>> understand the underlying implementation nor get the specific statistics
>> related to that process.
>> 
>
>Agreed! Analyzing performance anomalies and optimizations is nearly
>impossible without this.
>
>> Another example is ACL offload using TC which is commonly implemented
>> using TCAM memory. Currently there is no capability to gain visibility
>> into the TCAM structure and to debug suboptimal resource allocation.
>
>Yep.
>
>> 
>> This patchset introduces capability for exporting the ASICs pipeline
>> abstraction via devlink infrastructure, which should serve as an
>> complementary tool. This infrastructure allows the user to get visibility
>> into the ASIC by modeling it as a set of match/action tables.
>> 
>> The main objects defined:
>> Table - abstraction for a single pipeline stage. Contains the
>>         available match/actions and counter availability.
>> Entry - entry in a specific table with specific matches/actions
>>         values and dedicated counter.
>> Header/field - tuples which describes the tables behavior.
>> 
>
>We also need to understand the table topology on devices that have
>flexible table topologies. For example some tables are processed
>in parallel while others are sequential. One bug I've seen is an ACL
>rule being inserted into a table in front of the tunnel engine for
>example when it needed to be after the tunnel engine. What resulted
>was incorrect drop of packets. That is just one example that was
>fairly easy to diagnose more subtle issues are possible.
>
>It looks like this could be added later as another nested block without
>breaking compatibility.

How do you want to do it? There could be multiple tables linked to one
table, depending on actions. Also, there could be default goto table
unrelated to any match or action.
Not sure how to add this to the UAPI. But I agree we need that.


>
>> As an example one of the ASIC's L3 blocks will be modeled. The egress
>> rif (router interface) table is the final step in the L3 pipeline
>> processing which does match on the internal rif index which was
>> determined before by the routing logic. The erif table determines
>> whether to forward or drop the packet and updates the corresponding
>> rif L3 statistics.
>> 
>> To expose this internal resources a special metadata header will
>> be introduced that describes the internal information gathered by
>> the ASIC's pipeline and contains the following fields: rif_port_index,
>> forward and drop.
>> 
>> Some internal hardware resources have direct mapping to kernel
>> objects. For example the rif_port_index is mapped to the net-devices
>> ifindex. By providing this mapping the users gains visibility into
>> the offloading process.
>> 
>> Follow-up work will include exporting more L3 tables which will give
>> visibility into the routing process.
>> 
>> First stage is adding support for dpipe in devlink. Next add support
>> in spectrum driver. Finally implement egress router interface
>> (erif) table for spectrum ASIC as an example.
>> 
>
>+1 perhaps not surprisingly in general the idea looks great to me.
>
>Another thought once something like this is in place. We currently have
>a provisioning step that is done out of band typically via firmware
>configuration at the moment where table sizes are specified, this is because
>many of the table resources are shared across device. It seems like this might
>be the right place to expose that to "expert" users at some point once
>all this dpipe interface is worked out. Just a thought lets not get to
>hung up on it now though.

Yep. We have a similar problem in general. But we rather need to divide
some shared resources that are not strictly bound to the tables, rather
very loosely. For that, we need to introduce a different interface.

Do you know about the need to setup size of specific table that could
not be done dynamically whenever the resources are needed?


>
>Thanks,
>John
>
>> Arkadi Sharshevsky (8):
>>   devlink: Support for pipeline debug (dpipe)
>>   mlxsw: spectrum: Add support for flow counter allocator
>>   mlxsw: reg: Add counter fields to RITR register
>>   mlxsw: spectrum: Add placeholder for dpipe
>>   mlxsw: spectrum: Add definition for egress rif table
>>   mlxsw: reg: Add Router Interface Counter Register
>>   mlxsw: spectrum: Support for counters on router interfaces
>>   mlxsw: spectrum: Add Support for erif table entries access
>> 
>>  drivers/net/ethernet/mellanox/mlxsw/Makefile       |   3 +-
>>  drivers/net/ethernet/mellanox/mlxsw/reg.h          | 178 +++++
>>  drivers/net/ethernet/mellanox/mlxsw/resources.h    |   2 +
>>  drivers/net/ethernet/mellanox/mlxsw/spectrum.c     | 163 +++++
>>  drivers/net/ethernet/mellanox/mlxsw/spectrum.h     |  21 +
>>  drivers/net/ethernet/mellanox/mlxsw/spectrum_cnt.c | 182 +++++
>>  drivers/net/ethernet/mellanox/mlxsw/spectrum_cnt.h |  56 ++
>>  .../net/ethernet/mellanox/mlxsw/spectrum_dpipe.c   | 303 +++++++++
>>  .../net/ethernet/mellanox/mlxsw/spectrum_dpipe.h   |  43 ++
>>  include/net/devlink.h                              | 224 +++++-
>>  include/uapi/linux/devlink.h                       |  50 +-
>>  net/core/devlink.c                                 | 747 +++++++++++++++++++++
>>  12 files changed, 1969 insertions(+), 3 deletions(-)
>>  create mode 100644 drivers/net/ethernet/mellanox/mlxsw/spectrum_cnt.c
>>  create mode 100644 drivers/net/ethernet/mellanox/mlxsw/spectrum_cnt.h
>>  create mode 100644 drivers/net/ethernet/mellanox/mlxsw/spectrum_dpipe.c
>>  create mode 100644 drivers/net/ethernet/mellanox/mlxsw/spectrum_dpipe.h
>> 
>

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [patch net-next RFC 0/8] Add support for pipeline debug (dpipe)
  2017-02-16 16:20   ` Jiri Pirko
@ 2017-02-16 16:40     ` Andrew Lunn
  2017-02-16 16:48       ` Jiri Pirko
  0 siblings, 1 reply; 22+ messages in thread
From: Andrew Lunn @ 2017-02-16 16:40 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: netdev, davem, arkadis, idosch, mlxsw, jhs, ivecera, roopa,
	f.fainelli, vivien.didelot, john.fastabend

On Thu, Feb 16, 2017 at 05:20:00PM +0100, Jiri Pirko wrote:
> Thu, Feb 16, 2017 at 05:11:50PM CET, andrew@lunn.ch wrote:
> >On Thu, Feb 16, 2017 at 04:22:36PM +0100, Jiri Pirko wrote:
> >> From: Jiri Pirko <jiri@mellanox.com>
> >> 
> >> Arkadi says:
> >
> >Hi Jiri, Arkadi
> >
> >It is not mentioned here, but i assume you have a followup patchset
> >which extends the devlink command to enumerate what tables are
> >available and to print them?
> 
> DEVLINK_CMD_DPIPE_TABLES_GET
> command gets you all tables. But it gets it with the content. I guess
> there could be some command to instruct devlink just to dump tables
> without the content.

This does not sound like it will scale very well, as the number of
tables increases. Also, at least in the DSA world, getting a table can
be an expensive operation, lots of MDIO/SPI/I2C transactions. I'd
prefer to be able to just get one specific table, rather than dump
them all.

I can however see cases when it does make sense to have an atomic,
dump everything operation, where you can trust to be consistent across
tables.

> >Hopefully it will be more obvious when the user space patches are
> >available.
> 
> I'll ask Arkadi to send the devlink userspace patches as RFC as well.

Great, thanks.

       Andrew

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [patch net-next RFC 0/8] Add support for pipeline debug (dpipe)
  2017-02-16 16:40     ` Andrew Lunn
@ 2017-02-16 16:48       ` Jiri Pirko
  2017-02-16 21:20         ` arkadis
  0 siblings, 1 reply; 22+ messages in thread
From: Jiri Pirko @ 2017-02-16 16:48 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: netdev, davem, arkadis, idosch, mlxsw, jhs, ivecera, roopa,
	f.fainelli, vivien.didelot, john.fastabend

Thu, Feb 16, 2017 at 05:40:29PM CET, andrew@lunn.ch wrote:
>On Thu, Feb 16, 2017 at 05:20:00PM +0100, Jiri Pirko wrote:
>> Thu, Feb 16, 2017 at 05:11:50PM CET, andrew@lunn.ch wrote:
>> >On Thu, Feb 16, 2017 at 04:22:36PM +0100, Jiri Pirko wrote:
>> >> From: Jiri Pirko <jiri@mellanox.com>
>> >> 
>> >> Arkadi says:
>> >
>> >Hi Jiri, Arkadi
>> >
>> >It is not mentioned here, but i assume you have a followup patchset
>> >which extends the devlink command to enumerate what tables are
>> >available and to print them?
>> 
>> DEVLINK_CMD_DPIPE_TABLES_GET
>> command gets you all tables. But it gets it with the content. I guess
>> there could be some command to instruct devlink just to dump tables
>> without the content.
>
>This does not sound like it will scale very well, as the number of
>tables increases. Also, at least in the DSA world, getting a table can
>be an expensive operation, lots of MDIO/SPI/I2C transactions. I'd
>prefer to be able to just get one specific table, rather than dump
>them all.

Understood. We'll try to figure out how to make this done.


>
>I can however see cases when it does make sense to have an atomic,
>dump everything operation, where you can trust to be consistent across
>tables.
>
>> >Hopefully it will be more obvious when the user space patches are
>> >available.
>> 
>> I'll ask Arkadi to send the devlink userspace patches as RFC as well.
>
>Great, thanks.
>
>       Andrew

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [patch net-next RFC 0/8] Add support for pipeline debug (dpipe)
  2017-02-16 15:22 [patch net-next RFC 0/8] Add support for pipeline debug (dpipe) Jiri Pirko
                   ` (9 preceding siblings ...)
  2017-02-16 16:11 ` Andrew Lunn
@ 2017-02-16 17:04 ` Andrew Lunn
  2017-02-16 18:40   ` Jiri Pirko
  10 siblings, 1 reply; 22+ messages in thread
From: Andrew Lunn @ 2017-02-16 17:04 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: netdev, davem, arkadis, idosch, mlxsw, jhs, ivecera, roopa,
	f.fainelli, vivien.didelot, john.fastabend

On Thu, Feb 16, 2017 at 04:22:36PM +0100, Jiri Pirko wrote:
> From: Jiri Pirko <jiri@mellanox.com>

Hi Jiri, Arkadi

Just to make sure we are all on the same page here:

This purely RFC. It is not intended for the merge window starting next
week?

This is a new binary API, something we have to live with forever. So i
think before it gets merged into a Linus kernel, we need time to think
about all the different sorts of tables we want to dump, what sort of
fields they have, and can we represent them using this API. That is
going to take some time.

      Andrew

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [patch net-next RFC 0/8] Add support for pipeline debug (dpipe)
  2017-02-16 17:04 ` Andrew Lunn
@ 2017-02-16 18:40   ` Jiri Pirko
  0 siblings, 0 replies; 22+ messages in thread
From: Jiri Pirko @ 2017-02-16 18:40 UTC (permalink / raw)
  To: Andrew Lunn
  Cc: netdev, davem, arkadis, idosch, mlxsw, jhs, ivecera, roopa,
	f.fainelli, vivien.didelot, john.fastabend

Thu, Feb 16, 2017 at 06:04:46PM CET, andrew@lunn.ch wrote:
>On Thu, Feb 16, 2017 at 04:22:36PM +0100, Jiri Pirko wrote:
>> From: Jiri Pirko <jiri@mellanox.com>
>
>Hi Jiri, Arkadi
>
>Just to make sure we are all on the same page here:
>
>This purely RFC. It is not intended for the merge window starting next
>week?

No. We are aiming next net-next open.


>
>This is a new binary API, something we have to live with forever. So i
>think before it gets merged into a Linus kernel, we need time to think
>about all the different sorts of tables we want to dump, what sort of
>fields they have, and can we represent them using this API. That is
>going to take some time.

Agreed.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [patch net-next RFC 0/8] Add support for pipeline debug (dpipe)
  2017-02-16 16:48       ` Jiri Pirko
@ 2017-02-16 21:20         ` arkadis
  0 siblings, 0 replies; 22+ messages in thread
From: arkadis @ 2017-02-16 21:20 UTC (permalink / raw)
  To: Jiri Pirko, Andrew Lunn
  Cc: netdev, davem, idosch, mlxsw, jhs, ivecera, roopa, f.fainelli,
	vivien.didelot, john.fastabend



On 02/16/2017 06:48 PM, Jiri Pirko wrote:
> Thu, Feb 16, 2017 at 05:40:29PM CET, andrew@lunn.ch wrote:
>> On Thu, Feb 16, 2017 at 05:20:00PM +0100, Jiri Pirko wrote:
>>> Thu, Feb 16, 2017 at 05:11:50PM CET, andrew@lunn.ch wrote:
>>>> On Thu, Feb 16, 2017 at 04:22:36PM +0100, Jiri Pirko wrote:
>>>>> From: Jiri Pirko <jiri@mellanox.com>
>>>>>
>>>>> Arkadi says:
>>>>
>>>> Hi Jiri, Arkadi
>>>>
>>>> It is not mentioned here, but i assume you have a followup patchset
>>>> which extends the devlink command to enumerate what tables are
>>>> available and to print them?
>>>
>>> DEVLINK_CMD_DPIPE_TABLES_GET
>>> command gets you all tables. But it gets it with the content. I guess
>>> there could be some command to instruct devlink just to dump tables
>>> without the content.
>>
>> This does not sound like it will scale very well, as the number of
>> tables increases. Also, at least in the DSA world, getting a table can
>> be an expensive operation, lots of MDIO/SPI/I2C transactions. I'd
>> prefer to be able to just get one specific table, rather than dump
>> them all.
> 
> Understood. We'll try to figure out how to make this done.
> 

Get tables should give general table description including name,
behavior in terms of match/action, max entries, lookup way (TCAM, hash)
etc. To get the actual entries you have to use dump entries on a
per-table resolution. So you need to use DEVLINK_DPIPE_ENTRIES_GET on
a specific table.

First, one can see the tables in order to understand the hardware
structure (for debug purposes or just to gain visibility into the
offload process), and then dump the specific table of interest.

Some tables for example TCAM memory can be divided into regions.
So it is reasonable to aggregate tables under other tables (kind of
a zoom in) and be able to dump all the TCAM with one dump operation and
yet gain access of dumping each TCAM region separately.

Some things are currently missing like the above logic. Furthermore
the interaction between tables should be exported as well so that the
user receives graph of table interaction.

As I see it, all this behavior can be added and this basic API stays
valid.

> 
>>
>> I can however see cases when it does make sense to have an atomic,
>> dump everything operation, where you can trust to be consistent across
>> tables.
>>
>>>> Hopefully it will be more obvious when the user space patches are
>>>> available.
>>>
>>> I'll ask Arkadi to send the devlink userspace patches as RFC as well.
>>
>> Great, thanks.
>>
>>       Andrew

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [patch net-next RFC 1/8] devlink: Support for pipeline debug (dpipe)
  2017-02-16 15:22 ` [patch net-next RFC 1/8] devlink: Support " Jiri Pirko
  2017-02-16 15:57   ` John Fastabend
@ 2017-02-17  8:49   ` Simon Horman
  2017-02-18  7:38     ` Jiri Pirko
  1 sibling, 1 reply; 22+ messages in thread
From: Simon Horman @ 2017-02-17  8:49 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: netdev, davem, arkadis, idosch, mlxsw, jhs, ivecera, roopa,
	f.fainelli, vivien.didelot, john.fastabend, andrew

Hi Jiri,

On Thu, Feb 16, 2017 at 04:22:37PM +0100, Jiri Pirko wrote:
> From: Arkadi Sharshevsky <arkadis@mellanox.com>
> 
> The pipeline debug is used to export the pipeline abstractions
> for the main objects - tables, headers and entries. The only support for
> set is for changing the counter parameter on specific table.
> 
> The basic structures:
> 
> Header - can represent a real protocol header information or internal
>          metadata. Generic protocol headers like IPv4 can be shared
>          between drivers. Each driver can add local headers.
> 
> Field - part of a header. Can represent protocol field or specific
>         ASIC metadata field. Hardware special metadata fields can
>         be mapped to different resources, for example switch ASIC
>        	ports can have internal number which from the systems
>         point of view is mapped to netdeivce ifindex.
> 
> Hfield - Specific combination of header:field. This object is used
>          to represent the table behavior in terms of match/action.
> 
> Hfield_val - Represents value of specific Hfield.
> 
> Table - represents a hardware block which can be described with
>         match/action behavior. The match/action can be done on the
>         packets data or on the internal metadata that it gathered
>         along the packets traversal throw the pipeline which is vendor
>         specific and should be exported in order to provide
>         understanding of ASICs behavior.
> 
> Entry - represents single record in a specific table. The entry is
>         identified by specific combination of Hfield_vals for match
>         /action.
> 
> Prior to accessing the tables/entries the drivers provide the header/
> field data base which is used by driver to user-space. The data base
> is split between the shared headers and unique headers.

Thanks for posting this. In general I think it looks quite promising.

After a first pass over the code I have the following
specific comments:

> 
> Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
> Signed-off-by: Jiri Pirko <jiri@mellanox.com>
> ---
>  include/net/devlink.h        | 224 ++++++++++++-
>  include/uapi/linux/devlink.h |  50 ++-
>  net/core/devlink.c           | 747 +++++++++++++++++++++++++++++++++++++++++++
>  3 files changed, 1019 insertions(+), 2 deletions(-)
> 
> diff --git a/include/net/devlink.h b/include/net/devlink.h

...

> +/**
> + * struct devlink_dpipe_entry - table entry object
> + * @index: index of the entry in the table
> + * @matches: tuple match values
> + * @matches_count: count of matches tuples
> + * @actions: tuple actions values
> + * @actions_count: count of actions values
> + * @counter: value of counter
> + * @counter_valid: Specify if value is valid from hardware
> + */
> +struct devlink_dpipe_entry {
> +	unsigned int index;

I'm not sure what I understand what index is but I assume that it is a
unique identifier for the flow within the table. From the point of view
of having enough indexes for all entries in the table an unsigned int seems
adequate. But I see use-cases that have significantly wider identifiers.

I'm wondering what your thoughts are on supporting wider identifiers.
Perhaps they belong in the match?

> +	struct devlink_dpipe_hfield_val *matches;
> +	unsigned int matches_count;
> +	struct devlink_dpipe_hfield_val *actions;
> +	unsigned int actions_count;
> +	u64 counter;

I'm unclear on what counter is. Is it the number of times the action entry
has been used (hit)? If so I think some provision for richer per-hit
counters for entries would be useful. At least number of hits and number of
bytes.  But perhaps it would be useful to allow hardware to describe its
per-entry counters?

> +	bool counter_valid;
> +};

...

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [patch net-next RFC 1/8] devlink: Support for pipeline debug (dpipe)
  2017-02-17  8:49   ` Simon Horman
@ 2017-02-18  7:38     ` Jiri Pirko
  2017-02-20  8:27       ` Simon Horman
  0 siblings, 1 reply; 22+ messages in thread
From: Jiri Pirko @ 2017-02-18  7:38 UTC (permalink / raw)
  To: Simon Horman
  Cc: netdev, davem, arkadis, idosch, mlxsw, jhs, ivecera, roopa,
	f.fainelli, vivien.didelot, john.fastabend, andrew

Fri, Feb 17, 2017 at 09:49:07AM CET, simon.horman@netronome.com wrote:
>Hi Jiri,
>
>On Thu, Feb 16, 2017 at 04:22:37PM +0100, Jiri Pirko wrote:
>> From: Arkadi Sharshevsky <arkadis@mellanox.com>
>> 
>> The pipeline debug is used to export the pipeline abstractions
>> for the main objects - tables, headers and entries. The only support for
>> set is for changing the counter parameter on specific table.
>> 
>> The basic structures:
>> 
>> Header - can represent a real protocol header information or internal
>>          metadata. Generic protocol headers like IPv4 can be shared
>>          between drivers. Each driver can add local headers.
>> 
>> Field - part of a header. Can represent protocol field or specific
>>         ASIC metadata field. Hardware special metadata fields can
>>         be mapped to different resources, for example switch ASIC
>>        	ports can have internal number which from the systems
>>         point of view is mapped to netdeivce ifindex.
>> 
>> Hfield - Specific combination of header:field. This object is used
>>          to represent the table behavior in terms of match/action.
>> 
>> Hfield_val - Represents value of specific Hfield.
>> 
>> Table - represents a hardware block which can be described with
>>         match/action behavior. The match/action can be done on the
>>         packets data or on the internal metadata that it gathered
>>         along the packets traversal throw the pipeline which is vendor
>>         specific and should be exported in order to provide
>>         understanding of ASICs behavior.
>> 
>> Entry - represents single record in a specific table. The entry is
>>         identified by specific combination of Hfield_vals for match
>>         /action.
>> 
>> Prior to accessing the tables/entries the drivers provide the header/
>> field data base which is used by driver to user-space. The data base
>> is split between the shared headers and unique headers.
>
>Thanks for posting this. In general I think it looks quite promising.
>
>After a first pass over the code I have the following
>specific comments:
>
>> 
>> Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
>> Signed-off-by: Jiri Pirko <jiri@mellanox.com>
>> ---
>>  include/net/devlink.h        | 224 ++++++++++++-
>>  include/uapi/linux/devlink.h |  50 ++-
>>  net/core/devlink.c           | 747 +++++++++++++++++++++++++++++++++++++++++++
>>  3 files changed, 1019 insertions(+), 2 deletions(-)
>> 
>> diff --git a/include/net/devlink.h b/include/net/devlink.h
>
>...
>
>> +/**
>> + * struct devlink_dpipe_entry - table entry object
>> + * @index: index of the entry in the table
>> + * @matches: tuple match values
>> + * @matches_count: count of matches tuples
>> + * @actions: tuple actions values
>> + * @actions_count: count of actions values
>> + * @counter: value of counter
>> + * @counter_valid: Specify if value is valid from hardware
>> + */
>> +struct devlink_dpipe_entry {
>> +	unsigned int index;
>
>I'm not sure what I understand what index is but I assume that it is a
>unique identifier for the flow within the table. From the point of view

It is an index of the entry within the table, yes.


>of having enough indexes for all entries in the table an unsigned int seems
>adequate. But I see use-cases that have significantly wider identifiers.
>
>I'm wondering what your thoughts are on supporting wider identifiers.

We can make this u64, no problem.


>Perhaps they belong in the match?
>
>> +	struct devlink_dpipe_hfield_val *matches;
>> +	unsigned int matches_count;
>> +	struct devlink_dpipe_hfield_val *actions;
>> +	unsigned int actions_count;
>> +	u64 counter;
>
>I'm unclear on what counter is. Is it the number of times the action entry
>has been used (hit)? If so I think some provision for richer per-hit

Yes.

>counters for entries would be useful. At least number of hits and number of
>bytes.  But perhaps it would be useful to allow hardware to describe its
>per-entry counters?

Yeah, we were thinking about having this toggle per-entry. The thing is,
you should be able to control per-entry over standard API. For example,
for TCAM entry, you should use TC toggle to control counter on/off.

This table-wide toggle is useful for debugging purposes of entries that
are not exposed over standard API.

Do you see a need to toggle this per-entry?


Thanks for the review!

>
>> +	bool counter_valid;
>> +};
>
>...

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [patch net-next RFC 1/8] devlink: Support for pipeline debug (dpipe)
  2017-02-18  7:38     ` Jiri Pirko
@ 2017-02-20  8:27       ` Simon Horman
  0 siblings, 0 replies; 22+ messages in thread
From: Simon Horman @ 2017-02-20  8:27 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: netdev, davem, arkadis, idosch, mlxsw, jhs, ivecera, roopa,
	f.fainelli, vivien.didelot, john.fastabend, andrew

On Sat, Feb 18, 2017 at 08:38:51AM +0100, Jiri Pirko wrote:
> Fri, Feb 17, 2017 at 09:49:07AM CET, simon.horman@netronome.com wrote:
> >Hi Jiri,
> >
> >On Thu, Feb 16, 2017 at 04:22:37PM +0100, Jiri Pirko wrote:
> >> From: Arkadi Sharshevsky <arkadis@mellanox.com>
> >> 
> >> The pipeline debug is used to export the pipeline abstractions
> >> for the main objects - tables, headers and entries. The only support for
> >> set is for changing the counter parameter on specific table.
> >> 
> >> The basic structures:
> >> 
> >> Header - can represent a real protocol header information or internal
> >>          metadata. Generic protocol headers like IPv4 can be shared
> >>          between drivers. Each driver can add local headers.
> >> 
> >> Field - part of a header. Can represent protocol field or specific
> >>         ASIC metadata field. Hardware special metadata fields can
> >>         be mapped to different resources, for example switch ASIC
> >>        	ports can have internal number which from the systems
> >>         point of view is mapped to netdeivce ifindex.
> >> 
> >> Hfield - Specific combination of header:field. This object is used
> >>          to represent the table behavior in terms of match/action.
> >> 
> >> Hfield_val - Represents value of specific Hfield.
> >> 
> >> Table - represents a hardware block which can be described with
> >>         match/action behavior. The match/action can be done on the
> >>         packets data or on the internal metadata that it gathered
> >>         along the packets traversal throw the pipeline which is vendor
> >>         specific and should be exported in order to provide
> >>         understanding of ASICs behavior.
> >> 
> >> Entry - represents single record in a specific table. The entry is
> >>         identified by specific combination of Hfield_vals for match
> >>         /action.
> >> 
> >> Prior to accessing the tables/entries the drivers provide the header/
> >> field data base which is used by driver to user-space. The data base
> >> is split between the shared headers and unique headers.
> >
> >Thanks for posting this. In general I think it looks quite promising.
> >
> >After a first pass over the code I have the following
> >specific comments:
> >
> >> 
> >> Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com>
> >> Signed-off-by: Jiri Pirko <jiri@mellanox.com>
> >> ---
> >>  include/net/devlink.h        | 224 ++++++++++++-
> >>  include/uapi/linux/devlink.h |  50 ++-
> >>  net/core/devlink.c           | 747 +++++++++++++++++++++++++++++++++++++++++++
> >>  3 files changed, 1019 insertions(+), 2 deletions(-)
> >> 
> >> diff --git a/include/net/devlink.h b/include/net/devlink.h
> >
> >...
> >
> >> +/**
> >> + * struct devlink_dpipe_entry - table entry object
> >> + * @index: index of the entry in the table
> >> + * @matches: tuple match values
> >> + * @matches_count: count of matches tuples
> >> + * @actions: tuple actions values
> >> + * @actions_count: count of actions values
> >> + * @counter: value of counter
> >> + * @counter_valid: Specify if value is valid from hardware
> >> + */
> >> +struct devlink_dpipe_entry {
> >> +	unsigned int index;
> >
> >I'm not sure what I understand what index is but I assume that it is a
> >unique identifier for the flow within the table. From the point of view
> 
> It is an index of the entry within the table, yes.
> 
> 
> >of having enough indexes for all entries in the table an unsigned int seems
> >adequate. But I see use-cases that have significantly wider identifiers.
> >
> >I'm wondering what your thoughts are on supporting wider identifiers.
> 
> We can make this u64, no problem.
> 
> 
> >Perhaps they belong in the match?
> >
> >> +	struct devlink_dpipe_hfield_val *matches;
> >> +	unsigned int matches_count;
> >> +	struct devlink_dpipe_hfield_val *actions;
> >> +	unsigned int actions_count;
> >> +	u64 counter;
> >
> >I'm unclear on what counter is. Is it the number of times the action entry
> >has been used (hit)? If so I think some provision for richer per-hit
> 
> Yes.
> 
> >counters for entries would be useful. At least number of hits and number of
> >bytes.  But perhaps it would be useful to allow hardware to describe its
> >per-entry counters?
> 
> Yeah, we were thinking about having this toggle per-entry. The thing is,
> you should be able to control per-entry over standard API. For example,
> for TCAM entry, you should use TC toggle to control counter on/off.
> 
> This table-wide toggle is useful for debugging purposes of entries that
> are not exposed over standard API.
> 
> Do you see a need to toggle this per-entry?

For the use-cases I am currently aware of I would not need a per-entry toggle.

> Thanks for the review!
> 
> >
> >> +	bool counter_valid;
> >> +};
> >
> >...

^ permalink raw reply	[flat|nested] 22+ messages in thread

end of thread, other threads:[~2017-02-20  8:27 UTC | newest]

Thread overview: 22+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-02-16 15:22 [patch net-next RFC 0/8] Add support for pipeline debug (dpipe) Jiri Pirko
2017-02-16 15:22 ` [patch net-next RFC 1/8] devlink: Support " Jiri Pirko
2017-02-16 15:57   ` John Fastabend
2017-02-17  8:49   ` Simon Horman
2017-02-18  7:38     ` Jiri Pirko
2017-02-20  8:27       ` Simon Horman
2017-02-16 15:22 ` [patch net-next RFC 2/8] mlxsw: spectrum: Add support for flow counter allocator Jiri Pirko
2017-02-16 15:22 ` [patch net-next RFC 3/8] mlxsw: reg: Add counter fields to RITR register Jiri Pirko
2017-02-16 15:22 ` [patch net-next RFC 4/8] mlxsw: spectrum: Add placeholder for dpipe Jiri Pirko
2017-02-16 15:22 ` [patch net-next RFC 5/8] mlxsw: spectrum: Add definition for egress rif table Jiri Pirko
2017-02-16 15:22 ` [patch net-next RFC 6/8] mlxsw: reg: Add Router Interface Counter Register Jiri Pirko
2017-02-16 15:22 ` [patch net-next RFC 7/8] mlxsw: spectrum: Support for counters on router interfaces Jiri Pirko
2017-02-16 15:22 ` [patch net-next RFC 8/8] mlxsw: spectrum: Add Support for erif table entries access Jiri Pirko
2017-02-16 15:51 ` [patch net-next RFC 0/8] Add support for pipeline debug (dpipe) John Fastabend
2017-02-16 16:26   ` Jiri Pirko
2017-02-16 16:11 ` Andrew Lunn
2017-02-16 16:20   ` Jiri Pirko
2017-02-16 16:40     ` Andrew Lunn
2017-02-16 16:48       ` Jiri Pirko
2017-02-16 21:20         ` arkadis
2017-02-16 17:04 ` Andrew Lunn
2017-02-16 18:40   ` Jiri Pirko

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.