All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH net-next 00/11] Add drop monitor for offloaded data paths
@ 2019-07-07  7:58 Ido Schimmel
  2019-07-07  7:58 ` [PATCH net-next 01/11] devlink: Create helper to fill port type information Ido Schimmel
                   ` (14 more replies)
  0 siblings, 15 replies; 48+ messages in thread
From: Ido Schimmel @ 2019-07-07  7:58 UTC (permalink / raw)
  To: netdev
  Cc: davem, jiri, mlxsw, dsahern, roopa, nikolay, andy, pablo,
	jakub.kicinski, pieter.jansenvanvuuren, andrew, f.fainelli,
	vivien.didelot, Ido Schimmel

From: Ido Schimmel <idosch@mellanox.com>

Users have several ways to debug the kernel and understand why a packet
was dropped. For example, using "drop monitor" and "perf". Both
utilities trace kfree_skb(), which is the function called when a packet
is freed as part of a failure. The information provided by these tools
is invaluable when trying to understand the cause of a packet loss.

In recent years, large portions of the kernel data path were offloaded
to capable devices. Today, it is possible to perform L2 and L3
forwarding in hardware, as well as tunneling (IP-in-IP and VXLAN).
Different TC classifiers and actions are also offloaded to capable
devices, at both ingress and egress.

However, when the data path is offloaded it is not possible to achieve
the same level of introspection as tools such "perf" and "drop monitor"
become irrelevant.

This patchset aims to solve this by allowing users to monitor packets
that the underlying device decided to drop along with relevant metadata
such as the drop reason and ingress port.

The above is achieved by exposing a fundamental capability of devices
capable of data path offloading - packet trapping. While the common use
case for packet trapping is the trapping of packets required for the
correct functioning of the control plane (e.g., STP, BGP packets),
packets can also be trapped due to other reasons such as exceptions
(e.g., TTL error) and drops (e.g., blackhole route).

Given this ability is not specific to a port, but rather to a device, it
is exposed using devlink. Each capable driver is expected to register
its supported packet traps with devlink and report trapped packets to
devlink as they income. devlink will perform accounting of received
packets and bytes and will potentially generate an event to user space
using a new generic netlink multicast group.

While this patchset is concerned with traps corresponding to dropped
packets, the interface itself is generic and can be used to expose traps
corresponding to control packets in the future. The API is vendor
neutral and similar to the API exposed by SAI which is implemented by
several vendors already.

The implementation in this patchset is on top of both mlxsw and
netdevsim so that people could experiment with the interface and provide
useful feedback.

Patches #1-#4 add the devlink-trap infrastructure.

Patches #5-#6 add an example implementation of netdevsim.

Patches #7-#11 add a real world implementation over mlxsw.

Tests for both the core infrastructure (over netdevsim) and mlxsw will
be sent separately as RFC as they are dependent on the acceptance of the
iproute2 changes.

Example
=======

Instantiate netdevsim
---------------------

# echo "10 1" > /sys/bus/netdevsim/new_device
# ip link set dev eth0 up

List supported traps
--------------------

# devlink trap show
netdevsim/netdevsim10:
  name source_mac_is_multicast type drop generic true report false action drop group l2_drops
  name vlan_tag_mismatch type drop generic true report false action drop group l2_drops
  name ingress_vlan_filter type drop generic true report false action drop group l2_drops
  name ingress_spanning_tree_filter type drop generic true report false action drop group l2_drops
  name port_list_is_empty type drop generic true report false action drop group l2_drops
  name port_loopback_filter type drop generic true report false action drop group l2_drops
  name fid_miss type exception generic false report false action trap group l2_drops
  name blackhole_route type drop generic true report false action drop group l3_drops
  name ttl_value_is_too_small type exception generic true report false action trap group l3_drops
  name tail_drop type drop generic true report false action drop group buffer_drops

Enable a trap
-------------

# devlink trap set netdevsim/netdevsim10 trap blackhole_route action trap report true

Query statistics
----------------

# devlink -s trap show netdevsim/netdevsim10 trap blackhole_route
netdevsim/netdevsim10:
  name blackhole_route type drop generic true report true action trap group l3_drops
    stats:
        rx:
          bytes 18744 packets 132

Monitor dropped packets
-----------------------

# devlink -v mon trap-report
[trap-report,report] netdevsim/netdevsim10: name blackhole_route type drop group l3_drops length 142 timestamp Sun Jun 30 20:26:12 2019 835605178 nsec
  input_port:
    netdevsim/netdevsim10/0: type eth netdev eth0

Future plans
============

* Provide more drop reasons as well as more metadata

v1:
* Rename trap names to make them more generic
* Change policer settings in mlxsw

Ido Schimmel (11):
  devlink: Create helper to fill port type information
  devlink: Add packet trap infrastructure
  devlink: Add generic packet traps and groups
  Documentation: Add devlink-trap documentation
  netdevsim: Add devlink-trap support
  Documentation: Add description of netdevsim traps
  mlxsw: core: Add API to set trap action
  mlxsw: reg: Add new trap action
  mlxsw: Add layer 2 discard trap IDs
  mlxsw: Add trap group for layer 2 discards
  mlxsw: spectrum: Add devlink-trap support

 .../networking/devlink-trap-netdevsim.rst     |   20 +
 Documentation/networking/devlink-trap.rst     |  190 +++
 Documentation/networking/index.rst            |    2 +
 drivers/net/ethernet/mellanox/mlxsw/Makefile  |    2 +-
 drivers/net/ethernet/mellanox/mlxsw/core.c    |   64 +
 drivers/net/ethernet/mellanox/mlxsw/core.h    |   12 +
 drivers/net/ethernet/mellanox/mlxsw/reg.h     |   10 +
 .../net/ethernet/mellanox/mlxsw/spectrum.c    |   17 +
 .../net/ethernet/mellanox/mlxsw/spectrum.h    |   13 +
 .../ethernet/mellanox/mlxsw/spectrum_trap.c   |  270 ++++
 drivers/net/ethernet/mellanox/mlxsw/trap.h    |    7 +
 drivers/net/netdevsim/dev.c                   |  273 +++-
 drivers/net/netdevsim/netdevsim.h             |    1 +
 include/net/devlink.h                         |  175 +++
 include/uapi/linux/devlink.h                  |   68 +
 net/core/devlink.c                            | 1312 ++++++++++++++++-
 16 files changed, 2409 insertions(+), 27 deletions(-)
 create mode 100644 Documentation/networking/devlink-trap-netdevsim.rst
 create mode 100644 Documentation/networking/devlink-trap.rst
 create mode 100644 drivers/net/ethernet/mellanox/mlxsw/spectrum_trap.c

-- 
2.20.1


^ permalink raw reply	[flat|nested] 48+ messages in thread

* [PATCH net-next 01/11] devlink: Create helper to fill port type information
  2019-07-07  7:58 [PATCH net-next 00/11] Add drop monitor for offloaded data paths Ido Schimmel
@ 2019-07-07  7:58 ` Ido Schimmel
  2019-07-07  7:58 ` [PATCH net-next 02/11] devlink: Add packet trap infrastructure Ido Schimmel
                   ` (13 subsequent siblings)
  14 siblings, 0 replies; 48+ messages in thread
From: Ido Schimmel @ 2019-07-07  7:58 UTC (permalink / raw)
  To: netdev
  Cc: davem, jiri, mlxsw, dsahern, roopa, nikolay, andy, pablo,
	jakub.kicinski, pieter.jansenvanvuuren, andrew, f.fainelli,
	vivien.didelot, Ido Schimmel

From: Ido Schimmel <idosch@mellanox.com>

The function that fills port attributes in a netlink message fills the
port type attributes together with other attributes such as the device
handle.

The port type attributes will also need to be filled for trapped packets
by a subsequent patch.

Prevent code duplication and create a helper that can be used from both
places.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
---
 net/core/devlink.c | 49 +++++++++++++++++++++++++++-------------------
 1 file changed, 29 insertions(+), 20 deletions(-)

diff --git a/net/core/devlink.c b/net/core/devlink.c
index 89c533778135..755a9a32015e 100644
--- a/net/core/devlink.c
+++ b/net/core/devlink.c
@@ -527,6 +527,33 @@ static int devlink_nl_port_attrs_put(struct sk_buff *msg,
 	return 0;
 }
 
+static int devlink_nl_port_type_fill(struct sk_buff *msg,
+				     const struct devlink_port *devlink_port)
+{
+	if (nla_put_u16(msg, DEVLINK_ATTR_PORT_TYPE, devlink_port->type))
+		return -EMSGSIZE;
+
+	if (devlink_port->type == DEVLINK_PORT_TYPE_ETH) {
+		struct net_device *netdev = devlink_port->type_dev;
+
+		if (netdev &&
+		    (nla_put_u32(msg, DEVLINK_ATTR_PORT_NETDEV_IFINDEX,
+				 netdev->ifindex) ||
+		     nla_put_string(msg, DEVLINK_ATTR_PORT_NETDEV_NAME,
+				    netdev->name)))
+			return -EMSGSIZE;
+	} else if (devlink_port->type == DEVLINK_PORT_TYPE_IB) {
+		struct ib_device *ibdev = devlink_port->type_dev;
+
+		if (ibdev &&
+		    nla_put_string(msg, DEVLINK_ATTR_PORT_IBDEV_NAME,
+				   ibdev->name))
+			return -EMSGSIZE;
+	}
+
+	return 0;
+}
+
 static int devlink_nl_port_fill(struct sk_buff *msg, struct devlink *devlink,
 				struct devlink_port *devlink_port,
 				enum devlink_command cmd, u32 portid,
@@ -544,30 +571,12 @@ static int devlink_nl_port_fill(struct sk_buff *msg, struct devlink *devlink,
 		goto nla_put_failure;
 
 	spin_lock(&devlink_port->type_lock);
-	if (nla_put_u16(msg, DEVLINK_ATTR_PORT_TYPE, devlink_port->type))
-		goto nla_put_failure_type_locked;
 	if (devlink_port->desired_type != DEVLINK_PORT_TYPE_NOTSET &&
 	    nla_put_u16(msg, DEVLINK_ATTR_PORT_DESIRED_TYPE,
 			devlink_port->desired_type))
 		goto nla_put_failure_type_locked;
-	if (devlink_port->type == DEVLINK_PORT_TYPE_ETH) {
-		struct net_device *netdev = devlink_port->type_dev;
-
-		if (netdev &&
-		    (nla_put_u32(msg, DEVLINK_ATTR_PORT_NETDEV_IFINDEX,
-				 netdev->ifindex) ||
-		     nla_put_string(msg, DEVLINK_ATTR_PORT_NETDEV_NAME,
-				    netdev->name)))
-			goto nla_put_failure_type_locked;
-	}
-	if (devlink_port->type == DEVLINK_PORT_TYPE_IB) {
-		struct ib_device *ibdev = devlink_port->type_dev;
-
-		if (ibdev &&
-		    nla_put_string(msg, DEVLINK_ATTR_PORT_IBDEV_NAME,
-				   ibdev->name))
-			goto nla_put_failure_type_locked;
-	}
+	if (devlink_nl_port_type_fill(msg, devlink_port))
+		goto nla_put_failure_type_locked;
 	spin_unlock(&devlink_port->type_lock);
 	if (devlink_nl_port_attrs_put(msg, devlink_port))
 		goto nla_put_failure;
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCH net-next 02/11] devlink: Add packet trap infrastructure
  2019-07-07  7:58 [PATCH net-next 00/11] Add drop monitor for offloaded data paths Ido Schimmel
  2019-07-07  7:58 ` [PATCH net-next 01/11] devlink: Create helper to fill port type information Ido Schimmel
@ 2019-07-07  7:58 ` Ido Schimmel
  2019-07-07  7:58 ` [PATCH net-next 03/11] devlink: Add generic packet traps and groups Ido Schimmel
                   ` (12 subsequent siblings)
  14 siblings, 0 replies; 48+ messages in thread
From: Ido Schimmel @ 2019-07-07  7:58 UTC (permalink / raw)
  To: netdev
  Cc: davem, jiri, mlxsw, dsahern, roopa, nikolay, andy, pablo,
	jakub.kicinski, pieter.jansenvanvuuren, andrew, f.fainelli,
	vivien.didelot, Ido Schimmel

From: Ido Schimmel <idosch@mellanox.com>

Add the basic packet trap infrastructure that allows device drivers to
register their supported packet traps and trap groups with devlink.

Each driver is expected to provide basic information about each
supported trap, such as name and ID, but also the supported metadata
types that will accompany each packet trapped via the trap. The
currently supported metadata type is just the input port, but more will
be added in the future. For example, output port and traffic class.

Trap groups allow users to set the reporting state as well as the action
of all member traps. In addition, users can retrieve per-group
statistics in case per-trap statistics are too narrow. In the future,
the trap group object can be extended with more attributes, such as
policer settings which will limit the amount of traffic generated by
member traps towards the CPU.

Beside registering their packet traps with devlink, drivers are also
expected to report trapped packets to devlink along with relevant
metadata. devlink will maintain packets and bytes statistics for each
packet trap and will potentially report the trapped packet with its
metadata to userspace via a new generic netlink multicast group.

The interface towards the drivers is simple and allows devlink to set
the action of the trap. Currently, only two actions are supported:
'trap' and 'drop'. When set to 'trap', the device is expected to provide
the sole copy of the packet to the driver which will pass it to devlink.
When set to 'drop', the device is expected to drop the packet and not
send a copy to the driver. In the future, more actions can be added,
such as 'mirror'.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
---
 include/net/devlink.h        |  129 ++++
 include/uapi/linux/devlink.h |   68 ++
 net/core/devlink.c           | 1251 +++++++++++++++++++++++++++++++++-
 3 files changed, 1443 insertions(+), 5 deletions(-)

diff --git a/include/net/devlink.h b/include/net/devlink.h
index 6625ea068d5e..22614d3a742a 100644
--- a/include/net/devlink.h
+++ b/include/net/devlink.h
@@ -14,6 +14,7 @@
 #include <linux/netdevice.h>
 #include <linux/spinlock.h>
 #include <linux/workqueue.h>
+#include <linux/refcount.h>
 #include <net/net_namespace.h>
 #include <uapi/linux/devlink.h>
 
@@ -31,6 +32,8 @@ struct devlink {
 	struct list_head reporter_list;
 	struct mutex reporters_lock; /* protects reporter_list */
 	struct devlink_dpipe_headers *dpipe_headers;
+	struct list_head trap_list;
+	struct list_head trap_group_list;
 	const struct devlink_ops *ops;
 	struct device *dev;
 	possible_net_t _net;
@@ -477,6 +480,89 @@ struct devlink_health_reporter_ops {
 			struct devlink_fmsg *fmsg);
 };
 
+/**
+ * struct devlink_trap_group - Immutable packet trap group attributes.
+ * @name: Trap group name.
+ * @id: Trap group identifier.
+ * @generic: Whether the trap group is generic or not.
+ *
+ * Describes immutable attributes of packet trap groups that drivers register
+ * with devlink.
+ */
+struct devlink_trap_group {
+	const char *name;
+	u16 id;
+	bool generic;
+};
+
+#define DEVLINK_TRAP_METADATA_TYPE_F_IN_PORT	BIT(0)
+
+/**
+ * struct devlink_trap - Immutable packet trap attributes.
+ * @type: Trap type.
+ * @init_action: Initial trap action.
+ * @generic: Whether the trap is generic or not.
+ * @id: Trap identifier.
+ * @name: Trap name.
+ * @group: Immutable packet trap group attributes.
+ * @metadata_cap: Metadata types that can be provided by the trap.
+ *
+ * Describes immutable attributes of packet traps that drivers register with
+ * devlink.
+ */
+struct devlink_trap {
+	enum devlink_trap_type type;
+	enum devlink_trap_action init_action;
+	bool generic;
+	u16 id;
+	const char *name;
+	struct devlink_trap_group group;
+	u32 metadata_cap;
+};
+
+enum devlink_trap_generic_id {
+	/* Add new generic trap IDs above */
+	__DEVLINK_TRAP_GENERIC_ID_MAX,
+	DEVLINK_TRAP_GENERIC_ID_MAX = __DEVLINK_TRAP_GENERIC_ID_MAX - 1,
+};
+
+enum devlink_trap_group_generic_id {
+	/* Add new generic trap group IDs above */
+	__DEVLINK_TRAP_GROUP_GENERIC_ID_MAX,
+	DEVLINK_TRAP_GROUP_GENERIC_ID_MAX =
+		__DEVLINK_TRAP_GROUP_GENERIC_ID_MAX - 1,
+};
+
+#define DEVLINK_TRAP_GENERIC(_type, _init_action, _id, _group, _metadata_cap) \
+	{								      \
+		.type = DEVLINK_TRAP_TYPE_##_type,			      \
+		.init_action = DEVLINK_TRAP_ACTION_##_init_action,	      \
+		.generic = true,					      \
+		.id = DEVLINK_TRAP_GENERIC_ID_##_id,			      \
+		.name = DEVLINK_TRAP_GENERIC_NAME_##_id,		      \
+		.group = _group,					      \
+		.metadata_cap = _metadata_cap,				      \
+	}
+
+#define DEVLINK_TRAP_DRIVER(_type, _init_action, _id, _name, _group,	      \
+			    _metadata_cap)				      \
+	{								      \
+		.type = DEVLINK_TRAP_TYPE_##_type,			      \
+		.init_action = DEVLINK_TRAP_ACTION_##_init_action,	      \
+		.generic = false,					      \
+		.id = _id,						      \
+		.name = _name,						      \
+		.group = _group,					      \
+		.metadata_cap = _metadata_cap,				      \
+	}
+
+#define DEVLINK_TRAP_GROUP_GENERIC(_id)					      \
+	{								      \
+		.name = DEVLINK_TRAP_GROUP_GENERIC_NAME_##_id,		      \
+		.id = DEVLINK_TRAP_GROUP_GENERIC_ID_##_id,		      \
+		.generic = true,					      \
+	}
+
 struct devlink_ops {
 	int (*reload)(struct devlink *devlink, struct netlink_ext_ack *extack);
 	int (*port_type_set)(struct devlink_port *devlink_port,
@@ -538,6 +624,38 @@ struct devlink_ops {
 	int (*flash_update)(struct devlink *devlink, const char *file_name,
 			    const char *component,
 			    struct netlink_ext_ack *extack);
+	/**
+	 * @trap_init: Trap initialization function.
+	 *
+	 * Should be used by device drivers to initialize the trap in the
+	 * underlying device. Drivers should also store the provided trap
+	 * context, so that they could efficiently pass it to
+	 * devlink_trap_report() when the trap is triggered.
+	 */
+	int (*trap_init)(struct devlink *devlink,
+			 const struct devlink_trap *trap, void *trap_ctx);
+	/**
+	 * @trap_fini: Trap de-initialization function.
+	 *
+	 * Should be used by device drivers to de-initialize the trap in the
+	 * underlying device.
+	 */
+	void (*trap_fini)(struct devlink *devlink,
+			  const struct devlink_trap *trap, void *trap_ctx);
+	/**
+	 * @trap_action_set: Trap action set function.
+	 */
+	int (*trap_action_set)(struct devlink *devlink,
+			       const struct devlink_trap *trap,
+			       enum devlink_trap_action action);
+	/**
+	 * @trap_group_init: Trap group initialization function.
+	 *
+	 * Should be used by device drivers to initialize the trap group in the
+	 * underlying device.
+	 */
+	int (*trap_group_init)(struct devlink *devlink,
+			       const struct devlink_trap_group *group);
 };
 
 static inline void *devlink_priv(struct devlink *devlink)
@@ -747,6 +865,17 @@ void devlink_flash_update_status_notify(struct devlink *devlink,
 					unsigned long done,
 					unsigned long total);
 
+int devlink_traps_register(struct devlink *devlink,
+			   const struct devlink_trap *traps,
+			   size_t traps_count, void *priv);
+void devlink_traps_unregister(struct devlink *devlink,
+			      const struct devlink_trap *traps,
+			      size_t traps_count);
+void devlink_trap_report(struct devlink *devlink,
+			 struct sk_buff *skb, void *trap_ctx,
+			 struct devlink_port *in_devlink_port);
+void *devlink_trap_ctx_priv(void *trap_ctx);
+
 #if IS_ENABLED(CONFIG_NET_DEVLINK)
 
 void devlink_compat_running_version(struct net_device *dev,
diff --git a/include/uapi/linux/devlink.h b/include/uapi/linux/devlink.h
index 5287b42c181f..c1c0a6e2a71a 100644
--- a/include/uapi/linux/devlink.h
+++ b/include/uapi/linux/devlink.h
@@ -16,6 +16,7 @@
 #define DEVLINK_GENL_NAME "devlink"
 #define DEVLINK_GENL_VERSION 0x1
 #define DEVLINK_GENL_MCGRP_CONFIG_NAME "config"
+#define DEVLINK_GENL_MCGRP_TRAP_NAME "trap"
 
 enum devlink_command {
 	/* don't change the order or add anything between, this is ABI! */
@@ -107,6 +108,17 @@ enum devlink_command {
 	DEVLINK_CMD_FLASH_UPDATE_END,		/* notification only */
 	DEVLINK_CMD_FLASH_UPDATE_STATUS,	/* notification only */
 
+	DEVLINK_CMD_TRAP_GET,		/* can dump */
+	DEVLINK_CMD_TRAP_SET,
+	DEVLINK_CMD_TRAP_NEW,
+	DEVLINK_CMD_TRAP_DEL,
+	DEVLINK_CMD_TRAP_REPORT,
+
+	DEVLINK_CMD_TRAP_GROUP_GET,	/* can dump */
+	DEVLINK_CMD_TRAP_GROUP_SET,
+	DEVLINK_CMD_TRAP_GROUP_NEW,
+	DEVLINK_CMD_TRAP_GROUP_DEL,
+
 	/* add new commands above here */
 	__DEVLINK_CMD_MAX,
 	DEVLINK_CMD_MAX = __DEVLINK_CMD_MAX - 1
@@ -186,6 +198,47 @@ enum devlink_param_fw_load_policy_value {
 	DEVLINK_PARAM_FW_LOAD_POLICY_VALUE_FLASH,
 };
 
+enum {
+	DEVLINK_ATTR_STATS_RX_PACKETS,
+	DEVLINK_ATTR_STATS_RX_BYTES,
+
+	__DEVLINK_ATTR_STATS_MAX,
+	DEVLINK_ATTR_STATS_MAX = __DEVLINK_ATTR_STATS_MAX - 1
+};
+
+/**
+ * enum devlink_trap_action - Packet trap action.
+ * @DEVLINK_TRAP_ACTION_DROP: Packet is dropped by the device and a copy is not
+ *                            sent to the CPU.
+ * @DEVLINK_TRAP_ACTION_TRAP: The sole copy of the packet is sent to the CPU.
+ */
+enum devlink_trap_action {
+	DEVLINK_TRAP_ACTION_DROP,
+	DEVLINK_TRAP_ACTION_TRAP,
+};
+
+/**
+ * enum devlink_trap_type - Packet trap type.
+ * @DEVLINK_TRAP_TYPE_DROP: Trap reason is a drop. Trapped packets are only
+ *                          processed by devlink and not injected to the
+ *                          kernel's Rx path.
+ * @DEVLINK_TRAP_TYPE_EXCEPTION: Trap reason is an exception. Packet was not
+ *                               forwarded as intended due to an exception
+ *                               (e.g., missing neighbour entry) and trapped to
+ *                               control plane for resolution. Trapped packets
+ *                               are processed by devlink and injected to
+ *                               the kernel's Rx path.
+ */
+enum devlink_trap_type {
+	DEVLINK_TRAP_TYPE_DROP,
+	DEVLINK_TRAP_TYPE_EXCEPTION,
+};
+
+enum {
+	/* Trap can report input port as metadata */
+	DEVLINK_ATTR_TRAP_METADATA_TYPE_IN_PORT,
+};
+
 enum devlink_attr {
 	/* don't change the order or add anything between, this is ABI! */
 	DEVLINK_ATTR_UNSPEC,
@@ -337,6 +390,21 @@ enum devlink_attr {
 	DEVLINK_ATTR_FLASH_UPDATE_STATUS_DONE,	/* u64 */
 	DEVLINK_ATTR_FLASH_UPDATE_STATUS_TOTAL,	/* u64 */
 
+	DEVLINK_ATTR_STATS,				/* nested */
+
+	DEVLINK_ATTR_TRAP_NAME,				/* string */
+	DEVLINK_ATTR_TRAP_REPORT_ENABLED,		/* u8 */
+	/* enum devlink_trap_action */
+	DEVLINK_ATTR_TRAP_ACTION,			/* u8 */
+	/* enum devlink_trap_type */
+	DEVLINK_ATTR_TRAP_TYPE,				/* u8 */
+	DEVLINK_ATTR_TRAP_GENERIC,			/* flag */
+	DEVLINK_ATTR_TRAP_METADATA,			/* nested */
+	DEVLINK_ATTR_TRAP_TIMESTAMP,			/* struct timespec */
+	DEVLINK_ATTR_TRAP_IN_PORT,			/* nested */
+	DEVLINK_ATTR_TRAP_PAYLOAD,			/* binary */
+	DEVLINK_ATTR_TRAP_GROUP_NAME,			/* string */
+
 	/* add new attributes above here, update the policy in devlink.c */
 
 	__DEVLINK_ATTR_MAX,
diff --git a/net/core/devlink.c b/net/core/devlink.c
index 755a9a32015e..44a850ec3d5f 100644
--- a/net/core/devlink.c
+++ b/net/core/devlink.c
@@ -18,6 +18,8 @@
 #include <linux/spinlock.h>
 #include <linux/refcount.h>
 #include <linux/workqueue.h>
+#include <linux/u64_stats_sync.h>
+#include <linux/timekeeping.h>
 #include <rdma/ib_verbs.h>
 #include <net/netlink.h>
 #include <net/genetlink.h>
@@ -449,10 +451,12 @@ static struct genl_family devlink_nl_family;
 
 enum devlink_multicast_groups {
 	DEVLINK_MCGRP_CONFIG,
+	DEVLINK_MCGRP_TRAP,
 };
 
 static const struct genl_multicast_group devlink_nl_mcgrps[] = {
 	[DEVLINK_MCGRP_CONFIG] = { .name = DEVLINK_GENL_MCGRP_CONFIG_NAME },
+	[DEVLINK_MCGRP_TRAP] = { .name = DEVLINK_GENL_MCGRP_TRAP_NAME },
 };
 
 static int devlink_nl_put_handle(struct sk_buff *msg, struct devlink *devlink)
@@ -570,14 +574,14 @@ static int devlink_nl_port_fill(struct sk_buff *msg, struct devlink *devlink,
 	if (nla_put_u32(msg, DEVLINK_ATTR_PORT_INDEX, devlink_port->index))
 		goto nla_put_failure;
 
-	spin_lock(&devlink_port->type_lock);
+	spin_lock_bh(&devlink_port->type_lock);
 	if (devlink_port->desired_type != DEVLINK_PORT_TYPE_NOTSET &&
 	    nla_put_u16(msg, DEVLINK_ATTR_PORT_DESIRED_TYPE,
 			devlink_port->desired_type))
 		goto nla_put_failure_type_locked;
 	if (devlink_nl_port_type_fill(msg, devlink_port))
 		goto nla_put_failure_type_locked;
-	spin_unlock(&devlink_port->type_lock);
+	spin_unlock_bh(&devlink_port->type_lock);
 	if (devlink_nl_port_attrs_put(msg, devlink_port))
 		goto nla_put_failure;
 
@@ -585,7 +589,7 @@ static int devlink_nl_port_fill(struct sk_buff *msg, struct devlink *devlink,
 	return 0;
 
 nla_put_failure_type_locked:
-	spin_unlock(&devlink_port->type_lock);
+	spin_unlock_bh(&devlink_port->type_lock);
 nla_put_failure:
 	genlmsg_cancel(msg, hdr);
 	return -EMSGSIZE;
@@ -5146,6 +5150,623 @@ devlink_nl_cmd_health_reporter_dump_clear_doit(struct sk_buff *skb,
 	return 0;
 }
 
+struct devlink_stats {
+	u64 rx_bytes;
+	u64 rx_packets;
+	struct u64_stats_sync syncp;
+};
+
+/**
+ * struct devlink_trap_group_item - Packet trap group attributes.
+ * @group: Immutable packet trap group attributes.
+ * @refcount: Number of trap items using the group.
+ * @list: trap_group_list member.
+ * @stats: Trap group statistics.
+ *
+ * Describes packet trap group attributes. Created by devlink during trap
+ * registration.
+ */
+struct devlink_trap_group_item {
+	const struct devlink_trap_group *group;
+	refcount_t refcount;
+	struct list_head list;
+	struct devlink_stats __percpu *stats;
+};
+
+/**
+ * struct devlink_trap_item - Packet trap attributes.
+ * @trap: Immutable packet trap attributes.
+ * @group_item: Associated group item.
+ * @list: trap_list member.
+ * @report: Whether to report trapped packets or not.
+ * @action: Trap action.
+ * @stats: Trap statistics.
+ * @priv: Driver private information.
+ *
+ * Describes both mutable and immutable packet trap attributes. Created by
+ * devlink during trap registration and used for all trap related operations.
+ */
+struct devlink_trap_item {
+	const struct devlink_trap *trap;
+	struct devlink_trap_group_item *group_item;
+	struct list_head list;
+	bool report;
+	enum devlink_trap_action action;
+	struct devlink_stats __percpu *stats;
+	void *priv;
+};
+
+static struct devlink_trap_item *
+devlink_trap_item_lookup(struct devlink *devlink, const char *name)
+{
+	struct devlink_trap_item *trap_item;
+
+	list_for_each_entry(trap_item, &devlink->trap_list, list) {
+		if (!strcmp(trap_item->trap->name, name))
+			return trap_item;
+	}
+
+	return NULL;
+}
+
+static struct devlink_trap_item *
+devlink_trap_item_get_from_info(struct devlink *devlink,
+				struct genl_info *info)
+{
+	struct nlattr *attr;
+
+	if (!info->attrs[DEVLINK_ATTR_TRAP_NAME])
+		return NULL;
+	attr = info->attrs[DEVLINK_ATTR_TRAP_NAME];
+
+	return devlink_trap_item_lookup(devlink, nla_data(attr));
+}
+
+static int
+devlink_trap_action_get_from_info(struct genl_info *info,
+				  enum devlink_trap_action *p_trap_action)
+{
+	u8 val;
+
+	val = nla_get_u8(info->attrs[DEVLINK_ATTR_TRAP_ACTION]);
+	switch (val) {
+	case DEVLINK_TRAP_ACTION_DROP: /* fall-through */
+	case DEVLINK_TRAP_ACTION_TRAP:
+		*p_trap_action = val;
+		break;
+	default:
+		return -EINVAL;
+	}
+
+	return 0;
+}
+
+static int devlink_trap_metadata_put(struct sk_buff *msg,
+				     const struct devlink_trap *trap)
+{
+	struct nlattr *attr;
+
+	attr = nla_nest_start(msg, DEVLINK_ATTR_TRAP_METADATA);
+	if (!attr)
+		return -EMSGSIZE;
+
+	if ((trap->metadata_cap & DEVLINK_TRAP_METADATA_TYPE_F_IN_PORT) &&
+	    nla_put_flag(msg, DEVLINK_ATTR_TRAP_METADATA_TYPE_IN_PORT))
+		goto nla_put_failure;
+
+	nla_nest_end(msg, attr);
+
+	return 0;
+
+nla_put_failure:
+	nla_nest_cancel(msg, attr);
+	return -EMSGSIZE;
+}
+
+static void devlink_trap_stats_read(struct devlink_stats __percpu *trap_stats,
+				    struct devlink_stats *stats)
+{
+	int i;
+
+	memset(stats, 0, sizeof(*stats));
+	for_each_possible_cpu(i) {
+		struct devlink_stats *cpu_stats;
+		u64 rx_packets, rx_bytes;
+		unsigned int start;
+
+		cpu_stats = per_cpu_ptr(trap_stats, i);
+		do {
+			start = u64_stats_fetch_begin_irq(&cpu_stats->syncp);
+			rx_packets = cpu_stats->rx_packets;
+			rx_bytes = cpu_stats->rx_bytes;
+		} while (u64_stats_fetch_retry_irq(&cpu_stats->syncp, start));
+
+		stats->rx_packets += rx_packets;
+		stats->rx_bytes += rx_bytes;
+	}
+}
+
+static int devlink_trap_stats_put(struct sk_buff *msg,
+				  struct devlink_stats __percpu *trap_stats)
+{
+	struct devlink_stats stats;
+	struct nlattr *attr;
+
+	devlink_trap_stats_read(trap_stats, &stats);
+
+	attr = nla_nest_start(msg, DEVLINK_ATTR_STATS);
+	if (!attr)
+		return -EMSGSIZE;
+
+	if (nla_put_u64_64bit(msg, DEVLINK_ATTR_STATS_RX_PACKETS,
+			      stats.rx_packets, DEVLINK_ATTR_PAD))
+		goto nla_put_failure;
+
+	if (nla_put_u64_64bit(msg, DEVLINK_ATTR_STATS_RX_BYTES,
+			      stats.rx_bytes, DEVLINK_ATTR_PAD))
+		goto nla_put_failure;
+
+	nla_nest_end(msg, attr);
+
+	return 0;
+
+nla_put_failure:
+	nla_nest_cancel(msg, attr);
+	return -EMSGSIZE;
+}
+
+static int devlink_nl_trap_fill(struct sk_buff *msg, struct devlink *devlink,
+				const struct devlink_trap_item *trap_item,
+				enum devlink_command cmd, u32 portid, u32 seq,
+				int flags)
+{
+	struct devlink_trap_group_item *group_item = trap_item->group_item;
+	void *hdr;
+	int err;
+
+	hdr = genlmsg_put(msg, portid, seq, &devlink_nl_family, flags, cmd);
+	if (!hdr)
+		return -EMSGSIZE;
+
+	if (devlink_nl_put_handle(msg, devlink))
+		goto nla_put_failure;
+
+	if (nla_put_string(msg, DEVLINK_ATTR_TRAP_GROUP_NAME,
+			   group_item->group->name))
+		goto nla_put_failure;
+
+	if (nla_put_string(msg, DEVLINK_ATTR_TRAP_NAME, trap_item->trap->name))
+		goto nla_put_failure;
+
+	if (nla_put_u8(msg, DEVLINK_ATTR_TRAP_TYPE, trap_item->trap->type))
+		goto nla_put_failure;
+
+	if (trap_item->trap->generic &&
+	    nla_put_flag(msg, DEVLINK_ATTR_TRAP_GENERIC))
+		goto nla_put_failure;
+
+	if (nla_put_u8(msg, DEVLINK_ATTR_TRAP_REPORT_ENABLED,
+		       trap_item->report))
+		goto nla_put_failure;
+
+	if (nla_put_u8(msg, DEVLINK_ATTR_TRAP_ACTION, trap_item->action))
+		goto nla_put_failure;
+
+	err = devlink_trap_metadata_put(msg, trap_item->trap);
+	if (err)
+		goto nla_put_failure;
+
+	err = devlink_trap_stats_put(msg, trap_item->stats);
+	if (err)
+		goto nla_put_failure;
+
+	genlmsg_end(msg, hdr);
+
+	return 0;
+
+nla_put_failure:
+	genlmsg_cancel(msg, hdr);
+	return -EMSGSIZE;
+}
+
+static int devlink_nl_cmd_trap_get_doit(struct sk_buff *skb,
+					struct genl_info *info)
+{
+	struct netlink_ext_ack *extack = info->extack;
+	struct devlink *devlink = info->user_ptr[0];
+	struct devlink_trap_item *trap_item;
+	struct sk_buff *msg;
+	int err;
+
+	if (list_empty(&devlink->trap_list))
+		return -EOPNOTSUPP;
+
+	trap_item = devlink_trap_item_get_from_info(devlink, info);
+	if (!trap_item) {
+		NL_SET_ERR_MSG_MOD(extack, "Device did not register this trap");
+		return -ENOENT;
+	}
+
+	msg = nlmsg_new(NLMSG_DEFAULT_SIZE, GFP_KERNEL);
+	if (!msg)
+		return -ENOMEM;
+
+	err = devlink_nl_trap_fill(msg, devlink, trap_item,
+				   DEVLINK_CMD_TRAP_NEW, info->snd_portid,
+				   info->snd_seq, 0);
+	if (err)
+		goto err_trap_fill;
+
+	return genlmsg_reply(msg, info);
+
+err_trap_fill:
+	nlmsg_free(msg);
+	return err;
+}
+
+static int devlink_nl_cmd_trap_get_dumpit(struct sk_buff *msg,
+					  struct netlink_callback *cb)
+{
+	struct devlink_trap_item *trap_item;
+	struct devlink *devlink;
+	int start = cb->args[0];
+	int idx = 0;
+	int err;
+
+	mutex_lock(&devlink_mutex);
+	list_for_each_entry(devlink, &devlink_list, list) {
+		if (!net_eq(devlink_net(devlink), sock_net(msg->sk)))
+			continue;
+		mutex_lock(&devlink->lock);
+		list_for_each_entry(trap_item, &devlink->trap_list, list) {
+			if (idx < start) {
+				idx++;
+				continue;
+			}
+			err = devlink_nl_trap_fill(msg, devlink, trap_item,
+						   DEVLINK_CMD_TRAP_NEW,
+						   NETLINK_CB(cb->skb).portid,
+						   cb->nlh->nlmsg_seq,
+						   NLM_F_MULTI);
+			if (err) {
+				mutex_unlock(&devlink->lock);
+				goto out;
+			}
+			idx++;
+		}
+		mutex_unlock(&devlink->lock);
+	}
+out:
+	mutex_unlock(&devlink_mutex);
+
+	cb->args[0] = idx;
+	return msg->len;
+}
+
+static int __devlink_trap_action_set(struct devlink *devlink,
+				     struct devlink_trap_item *trap_item,
+				     enum devlink_trap_action trap_action,
+				     struct netlink_ext_ack *extack)
+{
+	int err;
+
+	if (trap_item->action != trap_action &&
+	    trap_item->trap->type != DEVLINK_TRAP_TYPE_DROP) {
+		NL_SET_ERR_MSG_MOD(extack, "Cannot change action of non-drop traps. Skipping");
+		return 0;
+	}
+
+	err = devlink->ops->trap_action_set(devlink, trap_item->trap,
+					    trap_action);
+	if (err)
+		return err;
+
+	trap_item->action = trap_action;
+
+	return 0;
+}
+
+static int devlink_trap_action_set(struct devlink *devlink,
+				   struct devlink_trap_item *trap_item,
+				   struct genl_info *info)
+{
+	enum devlink_trap_action trap_action;
+	int err;
+
+	if (!info->attrs[DEVLINK_ATTR_TRAP_ACTION])
+		return 0;
+
+	err = devlink_trap_action_get_from_info(info, &trap_action);
+	if (err) {
+		NL_SET_ERR_MSG_MOD(info->extack, "Invalid trap action");
+		return -EINVAL;
+	}
+
+	return __devlink_trap_action_set(devlink, trap_item, trap_action,
+					 info->extack);
+}
+
+static void devlink_trap_report_set(struct devlink *devlink,
+				    struct devlink_trap_item *trap_item,
+				    struct genl_info *info)
+{
+	bool report;
+
+	if (!info->attrs[DEVLINK_ATTR_TRAP_REPORT_ENABLED])
+		return;
+
+	report = !!nla_get_u8(info->attrs[DEVLINK_ATTR_TRAP_REPORT_ENABLED]);
+	trap_item->report = report;
+}
+
+static int devlink_nl_cmd_trap_set_doit(struct sk_buff *skb,
+					struct genl_info *info)
+{
+	struct netlink_ext_ack *extack = info->extack;
+	struct devlink *devlink = info->user_ptr[0];
+	struct devlink_trap_item *trap_item;
+	int err;
+
+	if (list_empty(&devlink->trap_list))
+		return -EOPNOTSUPP;
+
+	trap_item = devlink_trap_item_get_from_info(devlink, info);
+	if (!trap_item) {
+		NL_SET_ERR_MSG_MOD(extack, "Device did not register this trap");
+		return -ENOENT;
+	}
+
+	err = devlink_trap_action_set(devlink, trap_item, info);
+	if (err)
+		return err;
+
+	devlink_trap_report_set(devlink, trap_item, info);
+
+	return 0;
+}
+
+static struct devlink_trap_group_item *
+devlink_trap_group_item_lookup(struct devlink *devlink, const char *name)
+{
+	struct devlink_trap_group_item *group_item;
+
+	list_for_each_entry(group_item, &devlink->trap_group_list, list) {
+		if (!strcmp(group_item->group->name, name))
+			return group_item;
+	}
+
+	return NULL;
+}
+
+static struct devlink_trap_group_item *
+devlink_trap_group_item_get_from_info(struct devlink *devlink,
+				      struct genl_info *info)
+{
+	char *name;
+
+	if (!info->attrs[DEVLINK_ATTR_TRAP_GROUP_NAME])
+		return NULL;
+	name = nla_data(info->attrs[DEVLINK_ATTR_TRAP_GROUP_NAME]);
+
+	return devlink_trap_group_item_lookup(devlink, name);
+}
+
+static int
+devlink_nl_trap_group_fill(struct sk_buff *msg, struct devlink *devlink,
+			   const struct devlink_trap_group_item *group_item,
+			   enum devlink_command cmd, u32 portid, u32 seq,
+			   int flags)
+{
+	void *hdr;
+	int err;
+
+	hdr = genlmsg_put(msg, portid, seq, &devlink_nl_family, flags, cmd);
+	if (!hdr)
+		return -EMSGSIZE;
+
+	if (devlink_nl_put_handle(msg, devlink))
+		goto nla_put_failure;
+
+	if (nla_put_string(msg, DEVLINK_ATTR_TRAP_GROUP_NAME,
+			   group_item->group->name))
+		goto nla_put_failure;
+
+	if (group_item->group->generic &&
+	    nla_put_flag(msg, DEVLINK_ATTR_TRAP_GENERIC))
+		goto nla_put_failure;
+
+	err = devlink_trap_stats_put(msg, group_item->stats);
+	if (err)
+		goto nla_put_failure;
+
+	genlmsg_end(msg, hdr);
+
+	return 0;
+
+nla_put_failure:
+	genlmsg_cancel(msg, hdr);
+	return -EMSGSIZE;
+}
+
+static int devlink_nl_cmd_trap_group_get_doit(struct sk_buff *skb,
+					      struct genl_info *info)
+{
+	struct netlink_ext_ack *extack = info->extack;
+	struct devlink *devlink = info->user_ptr[0];
+	struct devlink_trap_group_item *group_item;
+	struct sk_buff *msg;
+	int err;
+
+	if (list_empty(&devlink->trap_group_list))
+		return -EOPNOTSUPP;
+
+	group_item = devlink_trap_group_item_get_from_info(devlink, info);
+	if (!group_item) {
+		NL_SET_ERR_MSG_MOD(extack, "Device did not register this trap group");
+		return -ENOENT;
+	}
+
+	msg = nlmsg_new(NLMSG_DEFAULT_SIZE, GFP_KERNEL);
+	if (!msg)
+		return -ENOMEM;
+
+	err = devlink_nl_trap_group_fill(msg, devlink, group_item,
+					 DEVLINK_CMD_TRAP_GROUP_NEW,
+					 info->snd_portid, info->snd_seq, 0);
+	if (err)
+		goto err_trap_group_fill;
+
+	return genlmsg_reply(msg, info);
+
+err_trap_group_fill:
+	nlmsg_free(msg);
+	return err;
+}
+
+static int devlink_nl_cmd_trap_group_get_dumpit(struct sk_buff *msg,
+						struct netlink_callback *cb)
+{
+	enum devlink_command cmd = DEVLINK_CMD_TRAP_GROUP_NEW;
+	struct devlink_trap_group_item *group_item;
+	u32 portid = NETLINK_CB(cb->skb).portid;
+	struct devlink *devlink;
+	int start = cb->args[0];
+	int idx = 0;
+	int err;
+
+	mutex_lock(&devlink_mutex);
+	list_for_each_entry(devlink, &devlink_list, list) {
+		if (!net_eq(devlink_net(devlink), sock_net(msg->sk)))
+			continue;
+		mutex_lock(&devlink->lock);
+		list_for_each_entry(group_item, &devlink->trap_group_list,
+				    list) {
+			if (idx < start) {
+				idx++;
+				continue;
+			}
+			err = devlink_nl_trap_group_fill(msg, devlink,
+							 group_item, cmd,
+							 portid,
+							 cb->nlh->nlmsg_seq,
+							 NLM_F_MULTI);
+			if (err) {
+				mutex_unlock(&devlink->lock);
+				goto out;
+			}
+			idx++;
+		}
+		mutex_unlock(&devlink->lock);
+	}
+out:
+	mutex_unlock(&devlink_mutex);
+
+	cb->args[0] = idx;
+	return msg->len;
+}
+
+static int
+__devlink_trap_group_action_set(struct devlink *devlink,
+				struct devlink_trap_group_item *group_item,
+				enum devlink_trap_action trap_action,
+				struct netlink_ext_ack *extack)
+{
+	const char *group_name = group_item->group->name;
+	struct devlink_trap_item *trap_item;
+	int err;
+
+	list_for_each_entry(trap_item, &devlink->trap_list, list) {
+		if (strcmp(trap_item->trap->group.name, group_name))
+			continue;
+		err = __devlink_trap_action_set(devlink, trap_item,
+						trap_action, extack);
+		if (err)
+			return err;
+	}
+
+	return 0;
+}
+
+static int
+devlink_trap_group_action_set(struct devlink *devlink,
+			      struct devlink_trap_group_item *group_item,
+			      struct genl_info *info)
+{
+	enum devlink_trap_action trap_action;
+	int err;
+
+	if (!info->attrs[DEVLINK_ATTR_TRAP_ACTION])
+		return 0;
+
+	err = devlink_trap_action_get_from_info(info, &trap_action);
+	if (err) {
+		NL_SET_ERR_MSG_MOD(info->extack, "Invalid trap action");
+		return -EINVAL;
+	}
+
+	err = __devlink_trap_group_action_set(devlink, group_item, trap_action,
+					      info->extack);
+	if (err)
+		return err;
+
+	return 0;
+}
+
+static void
+__devlink_trap_group_report_set(struct devlink *devlink,
+				struct devlink_trap_group_item *group_item,
+				bool report)
+{
+	const char *group_name = group_item->group->name;
+	struct devlink_trap_item *trap_item;
+
+	list_for_each_entry(trap_item, &devlink->trap_list, list) {
+		if (strcmp(trap_item->trap->group.name, group_name))
+			continue;
+		trap_item->report = report;
+	}
+}
+
+static void
+devlink_trap_group_report_set(struct devlink *devlink,
+			      struct devlink_trap_group_item *group_item,
+			      struct genl_info *info)
+{
+	bool report;
+
+	if (!info->attrs[DEVLINK_ATTR_TRAP_REPORT_ENABLED])
+		return;
+
+	report = !!nla_get_u8(info->attrs[DEVLINK_ATTR_TRAP_REPORT_ENABLED]);
+	__devlink_trap_group_report_set(devlink, group_item, report);
+}
+
+static int devlink_nl_cmd_trap_group_set_doit(struct sk_buff *skb,
+					      struct genl_info *info)
+{
+	struct netlink_ext_ack *extack = info->extack;
+	struct devlink *devlink = info->user_ptr[0];
+	struct devlink_trap_group_item *group_item;
+	int err;
+
+	if (list_empty(&devlink->trap_group_list))
+		return -EOPNOTSUPP;
+
+	group_item = devlink_trap_group_item_get_from_info(devlink, info);
+	if (!group_item) {
+		NL_SET_ERR_MSG_MOD(extack, "Device did not register this trap group");
+		return -ENOENT;
+	}
+
+	err = devlink_trap_group_action_set(devlink, group_item, info);
+	if (err)
+		return err;
+
+	devlink_trap_group_report_set(devlink, group_item, info);
+
+	return 0;
+}
+
 static const struct nla_policy devlink_nl_policy[DEVLINK_ATTR_MAX + 1] = {
 	[DEVLINK_ATTR_BUS_NAME] = { .type = NLA_NUL_STRING },
 	[DEVLINK_ATTR_DEV_NAME] = { .type = NLA_NUL_STRING },
@@ -5176,6 +5797,10 @@ static const struct nla_policy devlink_nl_policy[DEVLINK_ATTR_MAX + 1] = {
 	[DEVLINK_ATTR_HEALTH_REPORTER_AUTO_RECOVER] = { .type = NLA_U8 },
 	[DEVLINK_ATTR_FLASH_UPDATE_FILE_NAME] = { .type = NLA_NUL_STRING },
 	[DEVLINK_ATTR_FLASH_UPDATE_COMPONENT] = { .type = NLA_NUL_STRING },
+	[DEVLINK_ATTR_TRAP_NAME] = { .type = NLA_NUL_STRING },
+	[DEVLINK_ATTR_TRAP_REPORT_ENABLED] = { .type = NLA_U8 },
+	[DEVLINK_ATTR_TRAP_ACTION] = { .type = NLA_U8 },
+	[DEVLINK_ATTR_TRAP_GROUP_NAME] = { .type = NLA_NUL_STRING },
 };
 
 static const struct genl_ops devlink_nl_ops[] = {
@@ -5475,6 +6100,32 @@ static const struct genl_ops devlink_nl_ops[] = {
 		.flags = GENL_ADMIN_PERM,
 		.internal_flags = DEVLINK_NL_FLAG_NEED_DEVLINK,
 	},
+	{
+		.cmd = DEVLINK_CMD_TRAP_GET,
+		.doit = devlink_nl_cmd_trap_get_doit,
+		.dumpit = devlink_nl_cmd_trap_get_dumpit,
+		.internal_flags = DEVLINK_NL_FLAG_NEED_DEVLINK,
+		/* can be retrieved by unprivileged users */
+	},
+	{
+		.cmd = DEVLINK_CMD_TRAP_SET,
+		.doit = devlink_nl_cmd_trap_set_doit,
+		.flags = GENL_ADMIN_PERM,
+		.internal_flags = DEVLINK_NL_FLAG_NEED_DEVLINK,
+	},
+	{
+		.cmd = DEVLINK_CMD_TRAP_GROUP_GET,
+		.doit = devlink_nl_cmd_trap_group_get_doit,
+		.dumpit = devlink_nl_cmd_trap_group_get_dumpit,
+		.internal_flags = DEVLINK_NL_FLAG_NEED_DEVLINK,
+		/* can be retrieved by unprivileged users */
+	},
+	{
+		.cmd = DEVLINK_CMD_TRAP_GROUP_SET,
+		.doit = devlink_nl_cmd_trap_group_set_doit,
+		.flags = GENL_ADMIN_PERM,
+		.internal_flags = DEVLINK_NL_FLAG_NEED_DEVLINK,
+	},
 };
 
 static struct genl_family devlink_nl_family __ro_after_init = {
@@ -5520,6 +6171,8 @@ struct devlink *devlink_alloc(const struct devlink_ops *ops, size_t priv_size)
 	INIT_LIST_HEAD(&devlink->param_list);
 	INIT_LIST_HEAD(&devlink->region_list);
 	INIT_LIST_HEAD(&devlink->reporter_list);
+	INIT_LIST_HEAD(&devlink->trap_list);
+	INIT_LIST_HEAD(&devlink->trap_group_list);
 	mutex_init(&devlink->lock);
 	mutex_init(&devlink->reporters_lock);
 	return devlink;
@@ -5566,6 +6219,8 @@ void devlink_free(struct devlink *devlink)
 {
 	mutex_destroy(&devlink->reporters_lock);
 	mutex_destroy(&devlink->lock);
+	WARN_ON(!list_empty(&devlink->trap_group_list));
+	WARN_ON(!list_empty(&devlink->trap_list));
 	WARN_ON(!list_empty(&devlink->reporter_list));
 	WARN_ON(!list_empty(&devlink->region_list));
 	WARN_ON(!list_empty(&devlink->param_list));
@@ -5670,10 +6325,10 @@ static void __devlink_port_type_set(struct devlink_port *devlink_port,
 	if (WARN_ON(!devlink_port->registered))
 		return;
 	devlink_port_type_warn_cancel(devlink_port);
-	spin_lock(&devlink_port->type_lock);
+	spin_lock_bh(&devlink_port->type_lock);
 	devlink_port->type = type;
 	devlink_port->type_dev = type_dev;
-	spin_unlock(&devlink_port->type_lock);
+	spin_unlock_bh(&devlink_port->type_lock);
 	devlink_port_notify(devlink_port, DEVLINK_CMD_PORT_NEW);
 }
 
@@ -6751,6 +7406,592 @@ int devlink_region_snapshot_create(struct devlink_region *region, u64 data_len,
 }
 EXPORT_SYMBOL_GPL(devlink_region_snapshot_create);
 
+#define DEVLINK_TRAP(_id, _type)					      \
+	{								      \
+		.type = DEVLINK_TRAP_TYPE_##_type,			      \
+		.id = DEVLINK_TRAP_GENERIC_ID_##_id,			      \
+		.name = DEVLINK_TRAP_GENERIC_NAME_##_id,		      \
+	}
+
+static const struct devlink_trap devlink_trap_generic[] = {
+};
+
+#define DEVLINK_TRAP_GROUP(_id)						      \
+	{								      \
+		.id = DEVLINK_TRAP_GROUP_GENERIC_ID_##_id,		      \
+		.name = DEVLINK_TRAP_GROUP_GENERIC_NAME_##_id,		      \
+	}
+
+static const struct devlink_trap_group devlink_trap_group_generic[] = {
+};
+
+static int devlink_trap_generic_verify(const struct devlink_trap *trap)
+{
+	if (trap->id > DEVLINK_TRAP_GENERIC_ID_MAX)
+		return -EINVAL;
+
+	if (strcmp(trap->name, devlink_trap_generic[trap->id].name))
+		return -EINVAL;
+
+	if (trap->type != devlink_trap_generic[trap->id].type)
+		return -EINVAL;
+
+	return 0;
+}
+
+static int devlink_trap_driver_verify(const struct devlink_trap *trap)
+{
+	int i;
+
+	if (trap->id <= DEVLINK_TRAP_GENERIC_ID_MAX)
+		return -EINVAL;
+
+	for (i = 0; i < ARRAY_SIZE(devlink_trap_generic); i++) {
+		if (!strcmp(trap->name, devlink_trap_generic[i].name))
+			return -EEXIST;
+	}
+
+	return 0;
+}
+
+static int devlink_trap_verify(const struct devlink_trap *trap)
+{
+	if (!trap || !trap->name || !trap->group.name)
+		return -EINVAL;
+
+	if (trap->generic)
+		return devlink_trap_generic_verify(trap);
+	else
+		return devlink_trap_driver_verify(trap);
+}
+
+static int
+devlink_trap_group_generic_verify(const struct devlink_trap_group *group)
+{
+	if (group->id > DEVLINK_TRAP_GROUP_GENERIC_ID_MAX)
+		return -EINVAL;
+
+	if (strcmp(group->name, devlink_trap_group_generic[group->id].name))
+		return -EINVAL;
+
+	return 0;
+}
+
+static int
+devlink_trap_group_driver_verify(const struct devlink_trap_group *group)
+{
+	int i;
+
+	if (group->id <= DEVLINK_TRAP_GROUP_GENERIC_ID_MAX)
+		return -EINVAL;
+
+	for (i = 0; i < ARRAY_SIZE(devlink_trap_group_generic); i++) {
+		if (!strcmp(group->name, devlink_trap_group_generic[i].name))
+			return -EEXIST;
+	}
+
+	return 0;
+}
+
+static int devlink_trap_group_verify(const struct devlink_trap_group *group)
+{
+	if (group->generic)
+		return devlink_trap_group_generic_verify(group);
+	else
+		return devlink_trap_group_driver_verify(group);
+}
+
+static void
+devlink_trap_group_notify(struct devlink *devlink,
+			  const struct devlink_trap_group_item *group_item,
+			  enum devlink_command cmd)
+{
+	struct sk_buff *msg;
+	int err;
+
+	WARN_ON_ONCE(cmd != DEVLINK_CMD_TRAP_GROUP_NEW &&
+		     cmd != DEVLINK_CMD_TRAP_GROUP_DEL);
+
+	msg = nlmsg_new(NLMSG_DEFAULT_SIZE, GFP_KERNEL);
+	if (!msg)
+		return;
+
+	err = devlink_nl_trap_group_fill(msg, devlink, group_item, cmd, 0, 0,
+					 0);
+	if (err) {
+		nlmsg_free(msg);
+		return;
+	}
+
+	genlmsg_multicast_netns(&devlink_nl_family, devlink_net(devlink),
+				msg, 0, DEVLINK_MCGRP_CONFIG, GFP_KERNEL);
+}
+
+static struct devlink_trap_group_item *
+devlink_trap_group_item_create(struct devlink *devlink,
+			       const struct devlink_trap_group *group)
+{
+	struct devlink_trap_group_item *group_item;
+	int err;
+
+	err = devlink_trap_group_verify(group);
+	if (err)
+		return ERR_PTR(err);
+
+	group_item = kzalloc(sizeof(*group_item), GFP_KERNEL);
+	if (!group_item)
+		return ERR_PTR(-ENOMEM);
+
+	group_item->stats = netdev_alloc_pcpu_stats(struct devlink_stats);
+	if (!group_item->stats) {
+		err = -ENOMEM;
+		goto err_stats_alloc;
+	}
+
+	group_item->group = group;
+	refcount_set(&group_item->refcount, 1);
+
+	if (devlink->ops->trap_group_init) {
+		err = devlink->ops->trap_group_init(devlink, group);
+		if (err)
+			goto err_group_init;
+	}
+
+	list_add_tail(&group_item->list, &devlink->trap_group_list);
+	devlink_trap_group_notify(devlink, group_item,
+				  DEVLINK_CMD_TRAP_GROUP_NEW);
+
+	return group_item;
+
+err_group_init:
+	free_percpu(group_item->stats);
+err_stats_alloc:
+	kfree(group_item);
+	return ERR_PTR(err);
+}
+
+static void
+devlink_trap_group_item_destroy(struct devlink *devlink,
+				struct devlink_trap_group_item *group_item)
+{
+	devlink_trap_group_notify(devlink, group_item,
+				  DEVLINK_CMD_TRAP_GROUP_DEL);
+	list_del(&group_item->list);
+	free_percpu(group_item->stats);
+	kfree(group_item);
+}
+
+static struct devlink_trap_group_item *
+devlink_trap_group_item_get(struct devlink *devlink,
+			    const struct devlink_trap_group *group)
+{
+	struct devlink_trap_group_item *group_item;
+
+	group_item = devlink_trap_group_item_lookup(devlink, group->name);
+	if (group_item) {
+		refcount_inc(&group_item->refcount);
+		return group_item;
+	}
+
+	return devlink_trap_group_item_create(devlink, group);
+}
+
+static void
+devlink_trap_group_item_put(struct devlink *devlink,
+			    struct devlink_trap_group_item *group_item)
+{
+	if (!refcount_dec_and_test(&group_item->refcount))
+		return;
+
+	devlink_trap_group_item_destroy(devlink, group_item);
+}
+
+static int
+devlink_trap_item_group_link(struct devlink *devlink,
+			     struct devlink_trap_item *trap_item)
+{
+	struct devlink_trap_group_item *group_item;
+
+	group_item = devlink_trap_group_item_get(devlink,
+						 &trap_item->trap->group);
+	if (IS_ERR(group_item))
+		return PTR_ERR(group_item);
+
+	trap_item->group_item = group_item;
+
+	return 0;
+}
+
+static void
+devlink_trap_item_group_unlink(struct devlink *devlink,
+			       struct devlink_trap_item *trap_item)
+{
+	devlink_trap_group_item_put(devlink, trap_item->group_item);
+}
+
+static void devlink_trap_notify(struct devlink *devlink,
+				const struct devlink_trap_item *trap_item,
+				enum devlink_command cmd)
+{
+	struct sk_buff *msg;
+	int err;
+
+	WARN_ON_ONCE(cmd != DEVLINK_CMD_TRAP_NEW &&
+		     cmd != DEVLINK_CMD_TRAP_DEL);
+
+	msg = nlmsg_new(NLMSG_DEFAULT_SIZE, GFP_KERNEL);
+	if (!msg)
+		return;
+
+	err = devlink_nl_trap_fill(msg, devlink, trap_item, cmd, 0, 0, 0);
+	if (err) {
+		nlmsg_free(msg);
+		return;
+	}
+
+	genlmsg_multicast_netns(&devlink_nl_family, devlink_net(devlink),
+				msg, 0, DEVLINK_MCGRP_CONFIG, GFP_KERNEL);
+}
+
+static int
+devlink_trap_register(struct devlink *devlink,
+		      const struct devlink_trap *trap, void *priv)
+{
+	struct devlink_trap_item *trap_item;
+	int err;
+
+	if (devlink_trap_item_lookup(devlink, trap->name))
+		return -EEXIST;
+
+	trap_item = kzalloc(sizeof(*trap_item), GFP_KERNEL);
+	if (!trap_item)
+		return -ENOMEM;
+
+	trap_item->stats = netdev_alloc_pcpu_stats(struct devlink_stats);
+	if (!trap_item->stats) {
+		err = -ENOMEM;
+		goto err_stats_alloc;
+	}
+
+	trap_item->trap = trap;
+	trap_item->report = false;
+	trap_item->action = trap->init_action;
+	trap_item->priv = priv;
+
+	err = devlink_trap_item_group_link(devlink, trap_item);
+	if (err)
+		goto err_group_link;
+
+	err = devlink->ops->trap_init(devlink, trap, trap_item);
+	if (err)
+		goto err_trap_init;
+
+	list_add_tail(&trap_item->list, &devlink->trap_list);
+	devlink_trap_notify(devlink, trap_item, DEVLINK_CMD_TRAP_NEW);
+
+	return 0;
+
+err_trap_init:
+	devlink_trap_item_group_unlink(devlink, trap_item);
+err_group_link:
+	free_percpu(trap_item->stats);
+err_stats_alloc:
+	kfree(trap_item);
+	return err;
+}
+
+static void devlink_trap_unregister(struct devlink *devlink,
+				    const struct devlink_trap *trap)
+{
+	struct devlink_trap_item *trap_item;
+
+	trap_item = devlink_trap_item_lookup(devlink, trap->name);
+	if (WARN_ON_ONCE(!trap_item))
+		return;
+
+	devlink_trap_notify(devlink, trap_item, DEVLINK_CMD_TRAP_DEL);
+	list_del(&trap_item->list);
+	if (devlink->ops->trap_fini)
+		devlink->ops->trap_fini(devlink, trap, trap_item);
+	devlink_trap_item_group_unlink(devlink, trap_item);
+	free_percpu(trap_item->stats);
+	kfree(trap_item);
+}
+
+static void devlink_trap_disable(struct devlink *devlink,
+				 const struct devlink_trap *trap)
+{
+	struct devlink_trap_item *trap_item;
+
+	trap_item = devlink_trap_item_lookup(devlink, trap->name);
+	if (WARN_ON_ONCE(!trap_item))
+		return;
+
+	devlink->ops->trap_action_set(devlink, trap, DEVLINK_TRAP_ACTION_DROP);
+	trap_item->action = DEVLINK_TRAP_ACTION_DROP;
+}
+
+/**
+ * devlink_traps_register - Register packet traps with devlink.
+ * @devlink: devlink.
+ * @traps: Packet traps.
+ * @traps_count: Count of provided packet traps.
+ * @priv: Driver private information.
+ *
+ * Return: Non-zero value on failure.
+ */
+int devlink_traps_register(struct devlink *devlink,
+			   const struct devlink_trap *traps,
+			   size_t traps_count, void *priv)
+{
+	int i, err;
+
+	if (!devlink->ops->trap_init || !devlink->ops->trap_action_set)
+		return -EINVAL;
+
+	mutex_lock(&devlink->lock);
+	for (i = 0; i < traps_count; i++) {
+		const struct devlink_trap *trap = &traps[i];
+
+		err = devlink_trap_verify(trap);
+		if (err)
+			goto err_trap_verify;
+
+		err = devlink_trap_register(devlink, trap, priv);
+		if (err)
+			goto err_trap_register;
+	}
+	mutex_unlock(&devlink->lock);
+
+	return 0;
+
+err_trap_register:
+err_trap_verify:
+	for (i--; i >= 0; i--)
+		devlink_trap_unregister(devlink, &traps[i]);
+	mutex_unlock(&devlink->lock);
+	return err;
+}
+EXPORT_SYMBOL_GPL(devlink_traps_register);
+
+/**
+ * devlink_traps_unregister - Unregister packet traps from devlink.
+ * @devlink: devlink.
+ * @traps: Packet traps.
+ * @traps_count: Count of provided packet traps.
+ */
+void devlink_traps_unregister(struct devlink *devlink,
+			      const struct devlink_trap *traps,
+			      size_t traps_count)
+{
+	int i;
+
+	mutex_lock(&devlink->lock);
+	/* Make sure we do not have any packets in-flight while unregistering
+	 * traps by disabling all of them and waiting for a grace period.
+	 */
+	for (i = traps_count - 1; i >= 0; i--)
+		devlink_trap_disable(devlink, &traps[i]);
+	synchronize_rcu();
+	for (i = traps_count - 1; i >= 0; i--)
+		devlink_trap_unregister(devlink, &traps[i]);
+	mutex_unlock(&devlink->lock);
+}
+EXPORT_SYMBOL_GPL(devlink_traps_unregister);
+
+static void
+devlink_trap_stats_update(struct devlink_stats __percpu *trap_stats,
+			  size_t skb_len)
+{
+	struct devlink_stats *stats;
+
+	stats = this_cpu_ptr(trap_stats);
+	u64_stats_update_begin(&stats->syncp);
+	stats->rx_bytes += skb_len;
+	stats->rx_packets++;
+	u64_stats_update_end(&stats->syncp);
+}
+
+static size_t devlink_nl_trap_in_port_size(void)
+{
+	       /* DEVLINK_ATTR_TRAP_IN_PORT nest */
+	return nla_total_size(0) +
+	       /* DEVLINK_ATTR_PORT_INDEX */
+	       nla_total_size(sizeof(u32)) +
+	       /* DEVLINK_ATTR_PORT_TYPE */
+	       nla_total_size(sizeof(u16)) +
+	       /* DEVLINK_ATTR_PORT_NETDEV_IFINDEX */
+	       nla_total_size(sizeof(u32)) +
+	       /* DEVLINK_ATTR_PORT_{NETDEV, IBDEV}_NAME */
+	       nla_total_size(max_t(int, IFNAMSIZ, IB_DEVICE_NAME_MAX) + 1);
+}
+
+static size_t
+devlink_nl_trap_report_size(struct devlink *devlink,
+			    const struct devlink_trap_item *trap_item,
+			    size_t skb_len)
+{
+	size_t size = nlmsg_msg_size(GENL_HDRLEN + devlink_nl_family.hdrsize);
+	struct devlink_trap_group_item *group_item = trap_item->group_item;
+
+	return NLMSG_ALIGN(size) +
+	       /* DEVLINK_ATTR_BUS_NAME */
+	       nla_total_size(strlen(devlink->dev->bus->name) + 1) +
+	       /* DEVLINK_ATTR_DEV_NAME */
+	       nla_total_size(strlen(dev_name(devlink->dev)) + 1) +
+	       /* DEVLINK_ATTR_TRAP_GROUP_NAME */
+	       nla_total_size(strlen(group_item->group->name) + 1) +
+	       /* DEVLINK_ATTR_TRAP_NAME */
+	       nla_total_size(strlen(trap_item->trap->name) + 1) +
+	       /* DEVLINK_ATTR_TRAP_TYPE */
+	       nla_total_size(sizeof(u8)) +
+	       /* DEVLINK_ATTR_TRAP_TIMESTAMP */
+	       nla_total_size(sizeof(struct timespec)) +
+	       /* DEVLINK_ATTR_TRAP_IN_PORT */
+	       devlink_nl_trap_in_port_size() +
+	       /* DEVLINK_ATTR_TRAP_PAYLOAD */
+	       nla_total_size(skb_len);
+}
+
+static int
+devlink_nl_trap_report_in_port_put(struct sk_buff *msg,
+				   const struct devlink_trap *trap,
+				   struct devlink_port *in_devlink_port)
+{
+	struct nlattr *attr;
+
+	if (!(trap->metadata_cap & DEVLINK_TRAP_METADATA_TYPE_F_IN_PORT))
+		return 0;
+
+	attr = nla_nest_start(msg, DEVLINK_ATTR_TRAP_IN_PORT);
+	if (!attr)
+		return -EMSGSIZE;
+
+	if (nla_put_u32(msg, DEVLINK_ATTR_PORT_INDEX, in_devlink_port->index))
+		goto nla_put_failure;
+
+	spin_lock(&in_devlink_port->type_lock);
+
+	if (devlink_nl_port_type_fill(msg, in_devlink_port))
+		goto unlock_nla_put_failure;
+
+	spin_unlock(&in_devlink_port->type_lock);
+
+	nla_nest_end(msg, attr);
+
+	return 0;
+
+unlock_nla_put_failure:
+	spin_unlock(&in_devlink_port->type_lock);
+nla_put_failure:
+	nla_nest_cancel(msg, attr);
+	return -EMSGSIZE;
+}
+
+static int
+devlink_nl_trap_report_fill(struct sk_buff *msg, struct devlink *devlink,
+			    const struct devlink_trap_item *trap_item,
+			    struct sk_buff *skb,
+			    struct devlink_port *in_devlink_port)
+{
+	struct devlink_trap_group_item *group_item = trap_item->group_item;
+	struct timespec ts = ktime_to_timespec(ktime_get_real());
+	const struct devlink_trap *trap = trap_item->trap;
+	struct nlattr *attr;
+	void *hdr;
+	int err;
+
+	hdr = genlmsg_put(msg, 0, 0, &devlink_nl_family, 0,
+			  DEVLINK_CMD_TRAP_REPORT);
+	if (!hdr)
+		return -EMSGSIZE;
+
+	if (devlink_nl_put_handle(msg, devlink))
+		goto nla_put_failure;
+
+	if (nla_put_string(msg, DEVLINK_ATTR_TRAP_GROUP_NAME,
+			   group_item->group->name))
+		goto nla_put_failure;
+
+	if (nla_put_string(msg, DEVLINK_ATTR_TRAP_NAME, trap->name))
+		goto nla_put_failure;
+
+	if (nla_put_u8(msg, DEVLINK_ATTR_TRAP_TYPE, trap->type))
+		goto nla_put_failure;
+
+	if (nla_put(msg, DEVLINK_ATTR_TRAP_TIMESTAMP, sizeof(ts), &ts))
+		goto nla_put_failure;
+
+	err = devlink_nl_trap_report_in_port_put(msg, trap, in_devlink_port);
+	if (err)
+		goto nla_put_failure;
+
+	attr = skb_put(msg, nla_total_size(skb->len));
+	attr->nla_type = DEVLINK_ATTR_TRAP_PAYLOAD;
+	attr->nla_len = nla_attr_size(skb->len);
+	if (skb_copy_bits(skb, 0, nla_data(attr), skb->len))
+		goto nla_put_failure;
+
+	genlmsg_end(msg, hdr);
+
+	return 0;
+
+nla_put_failure:
+	genlmsg_cancel(msg, hdr);
+	return -EMSGSIZE;
+}
+
+/**
+ * devlink_trap_report - Report trapped packet to user space.
+ * @devlink: devlink.
+ * @skb: Trapped packet.
+ * @trap_ctx: Trap context.
+ * @in_devlink_port: Input devlink port.
+ */
+void devlink_trap_report(struct devlink *devlink, struct sk_buff *skb,
+			 void *trap_ctx, struct devlink_port *in_devlink_port)
+{
+	struct devlink_trap_item *trap_item = trap_ctx;
+	struct sk_buff *msg;
+	int err;
+
+	devlink_trap_stats_update(trap_item->stats, skb->len);
+	devlink_trap_stats_update(trap_item->group_item->stats, skb->len);
+
+	if (!trap_item->report)
+		return;
+
+	msg = nlmsg_new(devlink_nl_trap_report_size(devlink, trap_item,
+						    skb->len), GFP_ATOMIC);
+	if (!msg)
+		return;
+
+	err = devlink_nl_trap_report_fill(msg, devlink, trap_item, skb,
+					  in_devlink_port);
+	if (err) {
+		nlmsg_free(msg);
+		return;
+	}
+
+	genlmsg_multicast_netns(&devlink_nl_family, devlink_net(devlink),
+				msg, 0, DEVLINK_MCGRP_TRAP, GFP_ATOMIC);
+}
+EXPORT_SYMBOL_GPL(devlink_trap_report);
+
+/**
+ * devlink_trap_ctx_priv - Trap context to driver private information.
+ * @trap_ctx: Trap context.
+ *
+ * Return: Driver private information passed during registration.
+ */
+void *devlink_trap_ctx_priv(void *trap_ctx)
+{
+	struct devlink_trap_item *trap_item = trap_ctx;
+
+	return trap_item->priv;
+}
+EXPORT_SYMBOL_GPL(devlink_trap_ctx_priv);
+
 static void __devlink_compat_running_version(struct devlink *devlink,
 					     char *buf, size_t len)
 {
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCH net-next 03/11] devlink: Add generic packet traps and groups
  2019-07-07  7:58 [PATCH net-next 00/11] Add drop monitor for offloaded data paths Ido Schimmel
  2019-07-07  7:58 ` [PATCH net-next 01/11] devlink: Create helper to fill port type information Ido Schimmel
  2019-07-07  7:58 ` [PATCH net-next 02/11] devlink: Add packet trap infrastructure Ido Schimmel
@ 2019-07-07  7:58 ` Ido Schimmel
  2019-07-07  7:58 ` [PATCH net-next 04/11] Documentation: Add devlink-trap documentation Ido Schimmel
                   ` (11 subsequent siblings)
  14 siblings, 0 replies; 48+ messages in thread
From: Ido Schimmel @ 2019-07-07  7:58 UTC (permalink / raw)
  To: netdev
  Cc: davem, jiri, mlxsw, dsahern, roopa, nikolay, andy, pablo,
	jakub.kicinski, pieter.jansenvanvuuren, andrew, f.fainelli,
	vivien.didelot, Ido Schimmel

From: Ido Schimmel <idosch@mellanox.com>

Add generic packet traps and groups that can report dropped packets as
well as exceptions such as TTL error.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
---
 include/net/devlink.h | 40 ++++++++++++++++++++++++++++++++++++++++
 net/core/devlink.c    | 12 ++++++++++++
 2 files changed, 52 insertions(+)

diff --git a/include/net/devlink.h b/include/net/devlink.h
index 22614d3a742a..9121eb45b1c0 100644
--- a/include/net/devlink.h
+++ b/include/net/devlink.h
@@ -521,18 +521,58 @@ struct devlink_trap {
 };
 
 enum devlink_trap_generic_id {
+	DEVLINK_TRAP_GENERIC_ID_SMAC_MC,
+	DEVLINK_TRAP_GENERIC_ID_VLAN_TAG_MISMATCH,
+	DEVLINK_TRAP_GENERIC_ID_INGRESS_VLAN_FILTER,
+	DEVLINK_TRAP_GENERIC_ID_INGRESS_STP_FILTER,
+	DEVLINK_TRAP_GENERIC_ID_EMPTY_TX_LIST,
+	DEVLINK_TRAP_GENERIC_ID_PORT_LOOPBACK_FILTER,
+	DEVLINK_TRAP_GENERIC_ID_BLACKHOLE_ROUTE,
+	DEVLINK_TRAP_GENERIC_ID_TTL_ERROR,
+	DEVLINK_TRAP_GENERIC_ID_TAIL_DROP,
+
 	/* Add new generic trap IDs above */
 	__DEVLINK_TRAP_GENERIC_ID_MAX,
 	DEVLINK_TRAP_GENERIC_ID_MAX = __DEVLINK_TRAP_GENERIC_ID_MAX - 1,
 };
 
 enum devlink_trap_group_generic_id {
+	DEVLINK_TRAP_GROUP_GENERIC_ID_L2_DROPS,
+	DEVLINK_TRAP_GROUP_GENERIC_ID_L3_DROPS,
+	DEVLINK_TRAP_GROUP_GENERIC_ID_BUFFER_DROPS,
+
 	/* Add new generic trap group IDs above */
 	__DEVLINK_TRAP_GROUP_GENERIC_ID_MAX,
 	DEVLINK_TRAP_GROUP_GENERIC_ID_MAX =
 		__DEVLINK_TRAP_GROUP_GENERIC_ID_MAX - 1,
 };
 
+#define DEVLINK_TRAP_GENERIC_NAME_SMAC_MC \
+	"source_mac_is_multicast"
+#define DEVLINK_TRAP_GENERIC_NAME_VLAN_TAG_MISMATCH \
+	"vlan_tag_mismatch"
+#define DEVLINK_TRAP_GENERIC_NAME_INGRESS_VLAN_FILTER \
+	"ingress_vlan_filter"
+#define DEVLINK_TRAP_GENERIC_NAME_INGRESS_STP_FILTER \
+	"ingress_spanning_tree_filter"
+#define DEVLINK_TRAP_GENERIC_NAME_EMPTY_TX_LIST \
+	"port_list_is_empty"
+#define DEVLINK_TRAP_GENERIC_NAME_PORT_LOOPBACK_FILTER \
+	"port_loopback_filter"
+#define DEVLINK_TRAP_GENERIC_NAME_BLACKHOLE_ROUTE \
+	"blackhole_route"
+#define DEVLINK_TRAP_GENERIC_NAME_TTL_ERROR \
+	"ttl_value_is_too_small"
+#define DEVLINK_TRAP_GENERIC_NAME_TAIL_DROP \
+	"tail_drop"
+
+#define DEVLINK_TRAP_GROUP_GENERIC_NAME_L2_DROPS \
+	"l2_drops"
+#define DEVLINK_TRAP_GROUP_GENERIC_NAME_L3_DROPS \
+	"l3_drops"
+#define DEVLINK_TRAP_GROUP_GENERIC_NAME_BUFFER_DROPS \
+	"buffer_drops"
+
 #define DEVLINK_TRAP_GENERIC(_type, _init_action, _id, _group, _metadata_cap) \
 	{								      \
 		.type = DEVLINK_TRAP_TYPE_##_type,			      \
diff --git a/net/core/devlink.c b/net/core/devlink.c
index 44a850ec3d5f..172ede6ba1b1 100644
--- a/net/core/devlink.c
+++ b/net/core/devlink.c
@@ -7414,6 +7414,15 @@ EXPORT_SYMBOL_GPL(devlink_region_snapshot_create);
 	}
 
 static const struct devlink_trap devlink_trap_generic[] = {
+	DEVLINK_TRAP(SMAC_MC, DROP),
+	DEVLINK_TRAP(VLAN_TAG_MISMATCH, DROP),
+	DEVLINK_TRAP(INGRESS_VLAN_FILTER, DROP),
+	DEVLINK_TRAP(INGRESS_STP_FILTER, DROP),
+	DEVLINK_TRAP(EMPTY_TX_LIST, DROP),
+	DEVLINK_TRAP(PORT_LOOPBACK_FILTER, DROP),
+	DEVLINK_TRAP(BLACKHOLE_ROUTE, DROP),
+	DEVLINK_TRAP(TTL_ERROR, EXCEPTION),
+	DEVLINK_TRAP(TAIL_DROP, DROP),
 };
 
 #define DEVLINK_TRAP_GROUP(_id)						      \
@@ -7423,6 +7432,9 @@ static const struct devlink_trap devlink_trap_generic[] = {
 	}
 
 static const struct devlink_trap_group devlink_trap_group_generic[] = {
+	DEVLINK_TRAP_GROUP(L2_DROPS),
+	DEVLINK_TRAP_GROUP(L3_DROPS),
+	DEVLINK_TRAP_GROUP(BUFFER_DROPS),
 };
 
 static int devlink_trap_generic_verify(const struct devlink_trap *trap)
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCH net-next 04/11] Documentation: Add devlink-trap documentation
  2019-07-07  7:58 [PATCH net-next 00/11] Add drop monitor for offloaded data paths Ido Schimmel
                   ` (2 preceding siblings ...)
  2019-07-07  7:58 ` [PATCH net-next 03/11] devlink: Add generic packet traps and groups Ido Schimmel
@ 2019-07-07  7:58 ` Ido Schimmel
  2019-07-07  7:58 ` [PATCH net-next 05/11] netdevsim: Add devlink-trap support Ido Schimmel
                   ` (10 subsequent siblings)
  14 siblings, 0 replies; 48+ messages in thread
From: Ido Schimmel @ 2019-07-07  7:58 UTC (permalink / raw)
  To: netdev
  Cc: davem, jiri, mlxsw, dsahern, roopa, nikolay, andy, pablo,
	jakub.kicinski, pieter.jansenvanvuuren, andrew, f.fainelli,
	vivien.didelot, Ido Schimmel

From: Ido Schimmel <idosch@mellanox.com>

Add initial documentation of the devlink-trap mechanism, explaining the
background, motivation and the semantics of the interface.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
---
 Documentation/networking/devlink-trap.rst | 179 ++++++++++++++++++++++
 Documentation/networking/index.rst        |   1 +
 include/net/devlink.h                     |   6 +
 3 files changed, 186 insertions(+)
 create mode 100644 Documentation/networking/devlink-trap.rst

diff --git a/Documentation/networking/devlink-trap.rst b/Documentation/networking/devlink-trap.rst
new file mode 100644
index 000000000000..4a4fd42f5372
--- /dev/null
+++ b/Documentation/networking/devlink-trap.rst
@@ -0,0 +1,179 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+============
+Devlink Trap
+============
+
+Background
+==========
+
+Devices capable of offloading the kernel's datapath and perform functions such
+as bridging and routing must also be able to send specific packets to the
+kernel (i.e., the CPU) for processing.
+
+For example, a device acting as a multicast-aware bridge must be able to send
+IGMP membership reports to the kernel for processing by the bridge module.
+Without processing such packets, the bridge module could never populate its
+MDB.
+
+As another example, consider a device acting as router which has received an IP
+packet with a TTL of 1. Upon routing the packet the device must send it to the
+kernel so that it will route it as well and generate an ICMP Time Exceeded
+error datagram. Without letting the kernel route such packets itself, utilities
+such as ``traceroute`` could never work.
+
+The fundamental ability of sending certain packets to the kernel for processing
+is called "packet trapping".
+
+Overview
+========
+
+The ``devlink-trap`` mechanism allows capable device drivers to register their
+supported packet traps with ``devlink`` and report trapped packets to
+``devlink`` for further analysis.
+
+Upon receiving trapped packets, ``devlink`` will perform a per-trap packets and
+bytes accounting and potentially report the packet to user space via a netlink
+event along with all the provided metadata (e.g., trap reason, timestamp, input
+port). This is especially useful for drop traps (see :ref:`Trap-Types`)
+as it allows users to obtain further visibility into packet drops that would
+otherwise be invisible.
+
+The following diagram provides a general overview of ``devlink-trap``::
+
+                                    Netlink event: Packet w/ metadata
+                                  ^
+                                  |
+         Userspace                |
+	+---------------------------------------------------+
+	 Kernel                   |
+				  |
+			     +----+----+
+			     |         |      Kernel's Rx path
+			     | devlink |      (non-drop traps)
+			     |         |
+			     +----^----+      ^
+				  |           |
+				  +-----------+
+				  |
+			  +-------+-------+
+			  |               |
+			  | Device driver |
+			  |               |
+			  +-------^-------+
+	 Kernel                   |
+	+---------------------------------------------------+
+	 Hardware                 |
+				  | Trapped packet
+				  |
+			       +--+---+
+			       |      |
+			       | ASIC |
+			       |      |
+			       +------+
+
+.. _Trap-Types:
+
+Trap Types
+==========
+
+The ``devlink-trap`` mechanism supports the following packet trap types:
+
+  * ``drop``: Trapped packets were dropped by the underlying device. Packets
+    are only processed by ``devlink`` and not injected to the kernel's Rx path.
+    The trap action (see :ref:`Trap-Actions`) can be changed.
+  * ``exception``: Trapped packets were not forwarded as intended by the
+    underlying device due to an exception (e.g., TTL error, missing neighbour
+    entry) and trapped to the control plane for resolution. Packets are
+    processed by ``devlink`` and injected to the kernel's Rx path. Changing the
+    action of such traps is not allowed, as it can easily break the control
+    plane.
+
+.. _Trap-Actions:
+
+Trap Actions
+============
+
+The ``devlink-trap`` mechanism supports the following packet trap actions:
+
+  * ``trap``: The sole copy of the packet is sent to the CPU.
+  * ``drop``: The packet is dropped by the underlying device and a copy is not
+    sent to the CPU.
+
+Generic Packet Traps
+====================
+
+Generic packet traps are used to describe traps that trap well-defined packets
+or packets that are trapped due to well-defined conditions (e.g., TTL error).
+Such traps can be shared by multiple device drivers and their description must
+be added to the following table:
+
+.. list-table:: List of Generic Packet Traps
+   :widths: 5 5 90
+
+   * - Name
+     - Type
+     - Description
+   * - ``source_mac_is_multicast``
+     - ``drop``
+     - Traps incoming packets that the device decided to drop because of a
+       multicast source MAC
+   * - ``vlan_tag_mismatch``
+     - ``drop``
+     - Traps incoming packets that the device decided to drop in case of VLAN
+       tag mismatch: The ingress bridge port is not configured with a PVID and
+       the packet is untagged or prio-tagged
+   * - ``ingress_vlan_filter``
+     - ``drop``
+     - Traps incoming packets that the device decided to drop in case they are
+       tagged with a VLAN that is not configured on the ingress bridge port
+   * - ``ingress_spanning_tree_filter``
+     - ``drop``
+     - Traps incoming packets that the device decided to drop in case the STP
+       state of the ingress bridge port is not "forwarding"
+   * - ``port_list_is_empty``
+     - ``drop``
+     - Traps packets that the device decided to drop in case they need to be
+       flooded and the flood list is empty
+   * - ``port_loopback_filter``
+     - ``drop``
+     - Traps packets that the device decided to drop in case after layer 2
+       forwarding the only port from which they should be transmitted through
+       is the port from which they were received
+   * - ``blackhole_route``
+     - ``drop``
+     - Traps packets that the device decided to drop in case they hit a
+       blackhole route
+   * - ``ttl_value_is_too_small``
+     - ``exception``
+     - Traps unicast packets that should be forwarded by the device whose TTL
+       was decremented to 0 or less
+   * - ``tail_drop``
+     - ``drop``
+     - Traps packets that the device decided to drop because they could not be
+       enqueued to a transmission queue which is full
+
+Generic Packet Trap Groups
+==========================
+
+Generic packet trap groups are used to aggregate logically related packet
+traps. These groups allow the user to batch operations such as setting the
+trap action and report state of all member traps. In addition, ``devlink-trap``
+can report aggregated per-group packets and bytes statistics, in case per-trap
+statistics are too narrow. The description of these groups must be added to the
+following table:
+
+.. list-table:: List of Generic Packet Trap Groups
+   :widths: 10 90
+
+   * - Name
+     - Description
+   * - ``l2_drops``
+     - Contains packet traps for packets that were dropped by the device during
+       layer 2 forwarding (i.e., bridge)
+   * - ``l3_drops``
+     - Contains packet traps for packets that were dropped by the device or hit
+       an exception (e.g., TTL error) during layer 3 forwarding
+   * - ``buffer_drops``
+     - Contains packet traps for packets that were dropped by the device due to
+       an enqueue decision
diff --git a/Documentation/networking/index.rst b/Documentation/networking/index.rst
index a46fca264bee..86a814e4d450 100644
--- a/Documentation/networking/index.rst
+++ b/Documentation/networking/index.rst
@@ -14,6 +14,7 @@ Contents:
    device_drivers/index
    dsa/index
    devlink-info-versions
+   devlink-trap
    ieee802154
    kapi
    z8530book
diff --git a/include/net/devlink.h b/include/net/devlink.h
index 9121eb45b1c0..5a578750f6b4 100644
--- a/include/net/devlink.h
+++ b/include/net/devlink.h
@@ -520,6 +520,9 @@ struct devlink_trap {
 	u32 metadata_cap;
 };
 
+/* All traps must be documented in
+ * Documentation/networking/devlink-trap.rst
+ */
 enum devlink_trap_generic_id {
 	DEVLINK_TRAP_GENERIC_ID_SMAC_MC,
 	DEVLINK_TRAP_GENERIC_ID_VLAN_TAG_MISMATCH,
@@ -536,6 +539,9 @@ enum devlink_trap_generic_id {
 	DEVLINK_TRAP_GENERIC_ID_MAX = __DEVLINK_TRAP_GENERIC_ID_MAX - 1,
 };
 
+/* All trap groups must be documented in
+ * Documentation/networking/devlink-trap.rst
+ */
 enum devlink_trap_group_generic_id {
 	DEVLINK_TRAP_GROUP_GENERIC_ID_L2_DROPS,
 	DEVLINK_TRAP_GROUP_GENERIC_ID_L3_DROPS,
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCH net-next 05/11] netdevsim: Add devlink-trap support
  2019-07-07  7:58 [PATCH net-next 00/11] Add drop monitor for offloaded data paths Ido Schimmel
                   ` (3 preceding siblings ...)
  2019-07-07  7:58 ` [PATCH net-next 04/11] Documentation: Add devlink-trap documentation Ido Schimmel
@ 2019-07-07  7:58 ` Ido Schimmel
  2019-07-07  7:58 ` [PATCH net-next 06/11] Documentation: Add description of netdevsim traps Ido Schimmel
                   ` (9 subsequent siblings)
  14 siblings, 0 replies; 48+ messages in thread
From: Ido Schimmel @ 2019-07-07  7:58 UTC (permalink / raw)
  To: netdev
  Cc: davem, jiri, mlxsw, dsahern, roopa, nikolay, andy, pablo,
	jakub.kicinski, pieter.jansenvanvuuren, andrew, f.fainelli,
	vivien.didelot, Ido Schimmel

From: Ido Schimmel <idosch@mellanox.com>

Have netdevsim register its trap groups and traps with devlink during
initialization and periodically report trapped packets to devlink core.

Since netdevsim is not a real device, the trapped packets are emulated
using a workqueue that periodically reports a UDP packet with a random
5-tuple from each active packet trap and from each running netdev.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
---
 drivers/net/netdevsim/dev.c       | 270 +++++++++++++++++++++++++++++-
 drivers/net/netdevsim/netdevsim.h |   1 +
 2 files changed, 270 insertions(+), 1 deletion(-)

diff --git a/drivers/net/netdevsim/dev.c b/drivers/net/netdevsim/dev.c
index c5c417a3c0ce..2d8c60c3326c 100644
--- a/drivers/net/netdevsim/dev.c
+++ b/drivers/net/netdevsim/dev.c
@@ -17,11 +17,21 @@
 
 #include <linux/debugfs.h>
 #include <linux/device.h>
+#include <linux/etherdevice.h>
+#include <linux/inet.h>
+#include <linux/jiffies.h>
+#include <linux/kernel.h>
 #include <linux/list.h>
 #include <linux/mutex.h>
 #include <linux/random.h>
+#include <linux/workqueue.h>
+#include <linux/random.h>
 #include <linux/rtnetlink.h>
 #include <net/devlink.h>
+#include <net/ip.h>
+#include <uapi/linux/devlink.h>
+#include <uapi/linux/ip.h>
+#include <uapi/linux/udp.h>
 
 #include "netdevsim.h"
 
@@ -196,6 +206,205 @@ static int nsim_dev_resources_register(struct devlink *devlink)
 	return err;
 }
 
+struct nsim_trap_item {
+	void *trap_ctx;
+	enum devlink_trap_action action;
+};
+
+struct nsim_trap_data {
+	struct delayed_work trap_report_dw;
+	struct nsim_trap_item *trap_items_arr;
+	struct nsim_dev *nsim_dev;
+	spinlock_t trap_lock;	/* Protects trap_items_arr */
+};
+
+enum {
+	NSIM_TRAP_ID_BASE = DEVLINK_TRAP_GENERIC_ID_MAX,
+	NSIM_TRAP_ID_FID_MISS,
+};
+
+#define NSIM_TRAP_NAME_FID_MISS "fid_miss"
+
+#define NSIM_TRAP_METADATA DEVLINK_TRAP_METADATA_TYPE_F_IN_PORT
+
+#define NSIM_TRAP_DROP(_id, _group_id)					      \
+	DEVLINK_TRAP_GENERIC(DROP, DROP, _id,				      \
+			     DEVLINK_TRAP_GROUP_GENERIC(_group_id),	      \
+			     NSIM_TRAP_METADATA)
+#define NSIM_TRAP_EXCEPTION(_id, _group_id)				      \
+	DEVLINK_TRAP_GENERIC(EXCEPTION, TRAP, _id,			      \
+			     DEVLINK_TRAP_GROUP_GENERIC(_group_id),	      \
+			     NSIM_TRAP_METADATA)
+#define NSIM_TRAP_DRIVER_EXCEPTION(_id, _group_id)			      \
+	DEVLINK_TRAP_DRIVER(EXCEPTION, TRAP, NSIM_TRAP_ID_##_id,	      \
+			    NSIM_TRAP_NAME_##_id,			      \
+			    DEVLINK_TRAP_GROUP_GENERIC(_group_id),	      \
+			    NSIM_TRAP_METADATA)
+
+static const struct devlink_trap nsim_traps_arr[] = {
+	NSIM_TRAP_DROP(SMAC_MC, L2_DROPS),
+	NSIM_TRAP_DROP(VLAN_TAG_MISMATCH, L2_DROPS),
+	NSIM_TRAP_DROP(INGRESS_VLAN_FILTER, L2_DROPS),
+	NSIM_TRAP_DROP(INGRESS_STP_FILTER, L2_DROPS),
+	NSIM_TRAP_DROP(EMPTY_TX_LIST, L2_DROPS),
+	NSIM_TRAP_DROP(PORT_LOOPBACK_FILTER, L2_DROPS),
+	NSIM_TRAP_DRIVER_EXCEPTION(FID_MISS, L2_DROPS),
+	NSIM_TRAP_DROP(BLACKHOLE_ROUTE, L3_DROPS),
+	NSIM_TRAP_EXCEPTION(TTL_ERROR, L3_DROPS),
+	NSIM_TRAP_DROP(TAIL_DROP, BUFFER_DROPS),
+};
+
+#define NSIM_TRAP_L4_DATA_LEN 100
+
+static struct sk_buff *nsim_dev_trap_skb_build(void)
+{
+	int tot_len, data_len = NSIM_TRAP_L4_DATA_LEN;
+	struct sk_buff *skb;
+	struct udphdr *udph;
+	struct ethhdr *eth;
+	struct iphdr *iph;
+
+	skb = alloc_skb(NLMSG_GOODSIZE, GFP_ATOMIC);
+	if (!skb)
+		return NULL;
+	tot_len = sizeof(struct iphdr) + sizeof(struct udphdr) + data_len;
+
+	eth = skb_put(skb, sizeof(struct ethhdr));
+	eth_random_addr(eth->h_dest);
+	eth_random_addr(eth->h_source);
+	eth->h_proto = htons(ETH_P_IP);
+	skb->protocol = htons(ETH_P_IP);
+
+	iph = skb_put(skb, sizeof(struct iphdr));
+	iph->protocol = IPPROTO_UDP;
+	iph->saddr = in_aton("192.0.2.1");
+	iph->daddr = in_aton("198.51.100.1");
+	iph->version = 0x4;
+	iph->frag_off = 0;
+	iph->ihl = 0x5;
+	iph->tot_len = htons(tot_len);
+	ip_send_check(iph);
+
+	udph = skb_put_zero(skb, sizeof(struct udphdr) + data_len);
+	get_random_bytes(&udph->source, sizeof(u16));
+	get_random_bytes(&udph->dest, sizeof(u16));
+	udph->len = htons(sizeof(struct udphdr) + data_len);
+
+	return skb;
+}
+
+static void nsim_dev_trap_report(struct nsim_dev_port *nsim_dev_port)
+{
+	struct nsim_dev *nsim_dev = nsim_dev_port->ns->nsim_dev;
+	struct nsim_trap_data *nsim_trap_data = nsim_dev->trap_data;
+	struct devlink *devlink = priv_to_devlink(nsim_dev);
+	int i;
+
+	spin_lock(&nsim_trap_data->trap_lock);
+	for (i = 0; i < ARRAY_SIZE(nsim_traps_arr); i++) {
+		struct nsim_trap_item *nsim_trap_item;
+		struct sk_buff *skb;
+
+		nsim_trap_item = &nsim_trap_data->trap_items_arr[i];
+		if (nsim_trap_item->action == DEVLINK_TRAP_ACTION_DROP)
+			continue;
+
+		skb = nsim_dev_trap_skb_build();
+		if (!skb)
+			continue;
+		skb->dev = nsim_dev_port->ns->netdev;
+
+		devlink_trap_report(devlink, skb, nsim_trap_item->trap_ctx,
+				    &nsim_dev_port->devlink_port);
+		consume_skb(skb);
+	}
+	spin_unlock(&nsim_trap_data->trap_lock);
+}
+
+#define NSIM_TRAP_REPORT_INTERVAL_MS	100
+
+static void nsim_dev_trap_report_work(struct work_struct *work)
+{
+	struct nsim_trap_data *nsim_trap_data;
+	struct nsim_dev_port *nsim_dev_port;
+	struct nsim_dev *nsim_dev;
+
+	nsim_trap_data = container_of(work, struct nsim_trap_data,
+				      trap_report_dw.work);
+	nsim_dev = nsim_trap_data->nsim_dev;
+
+	/* For each running port and enabled packet trap, generate a UDP
+	 * packet with a random 5-tuple and report it.
+	 */
+	mutex_lock(&nsim_dev->port_list_lock);
+	list_for_each_entry(nsim_dev_port, &nsim_dev->port_list, list) {
+		if (!netif_running(nsim_dev_port->ns->netdev))
+			continue;
+
+		nsim_dev_trap_report(nsim_dev_port);
+	}
+	mutex_unlock(&nsim_dev->port_list_lock);
+
+	schedule_delayed_work(&nsim_dev->trap_data->trap_report_dw,
+			      msecs_to_jiffies(NSIM_TRAP_REPORT_INTERVAL_MS));
+}
+
+static int nsim_dev_traps_init(struct devlink *devlink)
+{
+	struct nsim_dev *nsim_dev = devlink_priv(devlink);
+	struct nsim_trap_data *nsim_trap_data;
+	int err;
+
+	nsim_trap_data = kzalloc(sizeof(*nsim_trap_data), GFP_KERNEL);
+	if (!nsim_trap_data)
+		return -ENOMEM;
+
+	nsim_trap_data->trap_items_arr = kcalloc(ARRAY_SIZE(nsim_traps_arr),
+						 sizeof(struct nsim_trap_item),
+						 GFP_KERNEL);
+	if (!nsim_trap_data->trap_items_arr) {
+		err = -ENOMEM;
+		goto err_trap_data_free;
+	}
+
+	/* The lock is used to protect the action state of the registered
+	 * traps. The value is written by user and read in delayed work when
+	 * iterating over all the traps.
+	 */
+	spin_lock_init(&nsim_trap_data->trap_lock);
+	nsim_trap_data->nsim_dev = nsim_dev;
+	nsim_dev->trap_data = nsim_trap_data;
+
+	err = devlink_traps_register(devlink, nsim_traps_arr,
+				     ARRAY_SIZE(nsim_traps_arr), NULL);
+	if (err)
+		goto err_trap_items_free;
+
+	INIT_DELAYED_WORK(&nsim_dev->trap_data->trap_report_dw,
+			  nsim_dev_trap_report_work);
+	schedule_delayed_work(&nsim_dev->trap_data->trap_report_dw,
+			      msecs_to_jiffies(NSIM_TRAP_REPORT_INTERVAL_MS));
+
+	return 0;
+
+err_trap_items_free:
+	kfree(nsim_trap_data->trap_items_arr);
+err_trap_data_free:
+	kfree(nsim_trap_data);
+	return err;
+}
+
+static void nsim_dev_traps_exit(struct devlink *devlink)
+{
+	struct nsim_dev *nsim_dev = devlink_priv(devlink);
+
+	cancel_delayed_work_sync(&nsim_dev->trap_data->trap_report_dw);
+	devlink_traps_unregister(devlink, nsim_traps_arr,
+				 ARRAY_SIZE(nsim_traps_arr));
+	kfree(nsim_dev->trap_data->trap_items_arr);
+	kfree(nsim_dev->trap_data);
+}
+
 static int nsim_dev_reload(struct devlink *devlink,
 			   struct netlink_ext_ack *extack)
 {
@@ -262,9 +471,61 @@ static int nsim_dev_flash_update(struct devlink *devlink, const char *file_name,
 	return 0;
 }
 
+static struct nsim_trap_item *
+nsim_dev_trap_item_lookup(struct nsim_dev *nsim_dev, u16 trap_id)
+{
+	struct nsim_trap_data *nsim_trap_data = nsim_dev->trap_data;
+	int i;
+
+	for (i = 0; i < ARRAY_SIZE(nsim_traps_arr); i++) {
+		if (nsim_traps_arr[i].id == trap_id)
+			return &nsim_trap_data->trap_items_arr[i];
+	}
+
+	return NULL;
+}
+
+static int nsim_dev_devlink_trap_init(struct devlink *devlink,
+				      const struct devlink_trap *trap,
+				      void *trap_ctx)
+{
+	struct nsim_dev *nsim_dev = devlink_priv(devlink);
+	struct nsim_trap_item *nsim_trap_item;
+
+	nsim_trap_item = nsim_dev_trap_item_lookup(nsim_dev, trap->id);
+	if (WARN_ON(!nsim_trap_item))
+		return -ENOENT;
+
+	nsim_trap_item->trap_ctx = trap_ctx;
+	nsim_trap_item->action = trap->init_action;
+
+	return 0;
+}
+
+static int
+nsim_dev_devlink_trap_action_set(struct devlink *devlink,
+				 const struct devlink_trap *trap,
+				 enum devlink_trap_action action)
+{
+	struct nsim_dev *nsim_dev = devlink_priv(devlink);
+	struct nsim_trap_item *nsim_trap_item;
+
+	nsim_trap_item = nsim_dev_trap_item_lookup(nsim_dev, trap->id);
+	if (WARN_ON(!nsim_trap_item))
+		return -ENOENT;
+
+	spin_lock(&nsim_dev->trap_data->trap_lock);
+	nsim_trap_item->action = action;
+	spin_unlock(&nsim_dev->trap_data->trap_lock);
+
+	return 0;
+}
+
 static const struct devlink_ops nsim_dev_devlink_ops = {
 	.reload = nsim_dev_reload,
 	.flash_update = nsim_dev_flash_update,
+	.trap_init = nsim_dev_devlink_trap_init,
+	.trap_action_set = nsim_dev_devlink_trap_action_set,
 };
 
 static struct nsim_dev *
@@ -299,10 +560,14 @@ nsim_dev_create(struct nsim_bus_dev *nsim_bus_dev, unsigned int port_count)
 	if (err)
 		goto err_resources_unregister;
 
-	err = nsim_dev_debugfs_init(nsim_dev);
+	err = nsim_dev_traps_init(devlink);
 	if (err)
 		goto err_dl_unregister;
 
+	err = nsim_dev_debugfs_init(nsim_dev);
+	if (err)
+		goto err_traps_exit;
+
 	err = nsim_bpf_dev_init(nsim_dev);
 	if (err)
 		goto err_debugfs_exit;
@@ -311,6 +576,8 @@ nsim_dev_create(struct nsim_bus_dev *nsim_bus_dev, unsigned int port_count)
 
 err_debugfs_exit:
 	nsim_dev_debugfs_exit(nsim_dev);
+err_traps_exit:
+	nsim_dev_traps_exit(devlink);
 err_dl_unregister:
 	devlink_unregister(devlink);
 err_resources_unregister:
@@ -328,6 +595,7 @@ static void nsim_dev_destroy(struct nsim_dev *nsim_dev)
 
 	nsim_bpf_dev_exit(nsim_dev);
 	nsim_dev_debugfs_exit(nsim_dev);
+	nsim_dev_traps_exit(devlink);
 	devlink_unregister(devlink);
 	devlink_resources_unregister(devlink, NULL);
 	nsim_fib_destroy(nsim_dev->fib_data);
diff --git a/drivers/net/netdevsim/netdevsim.h b/drivers/net/netdevsim/netdevsim.h
index 79c05af2a7c0..a3dd73ae252e 100644
--- a/drivers/net/netdevsim/netdevsim.h
+++ b/drivers/net/netdevsim/netdevsim.h
@@ -145,6 +145,7 @@ struct nsim_dev_port {
 struct nsim_dev {
 	struct nsim_bus_dev *nsim_bus_dev;
 	struct nsim_fib_data *fib_data;
+	struct nsim_trap_data *trap_data;
 	struct dentry *ddir;
 	struct dentry *ports_ddir;
 	struct bpf_offload_dev *bpf_dev;
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCH net-next 06/11] Documentation: Add description of netdevsim traps
  2019-07-07  7:58 [PATCH net-next 00/11] Add drop monitor for offloaded data paths Ido Schimmel
                   ` (4 preceding siblings ...)
  2019-07-07  7:58 ` [PATCH net-next 05/11] netdevsim: Add devlink-trap support Ido Schimmel
@ 2019-07-07  7:58 ` Ido Schimmel
  2019-07-07  7:58 ` [PATCH net-next 07/11] mlxsw: core: Add API to set trap action Ido Schimmel
                   ` (8 subsequent siblings)
  14 siblings, 0 replies; 48+ messages in thread
From: Ido Schimmel @ 2019-07-07  7:58 UTC (permalink / raw)
  To: netdev
  Cc: davem, jiri, mlxsw, dsahern, roopa, nikolay, andy, pablo,
	jakub.kicinski, pieter.jansenvanvuuren, andrew, f.fainelli,
	vivien.didelot, Ido Schimmel

From: Ido Schimmel <idosch@mellanox.com>

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
---
 .../networking/devlink-trap-netdevsim.rst     | 20 +++++++++++++++++++
 Documentation/networking/devlink-trap.rst     | 11 ++++++++++
 Documentation/networking/index.rst            |  1 +
 drivers/net/netdevsim/dev.c                   |  3 +++
 4 files changed, 35 insertions(+)
 create mode 100644 Documentation/networking/devlink-trap-netdevsim.rst

diff --git a/Documentation/networking/devlink-trap-netdevsim.rst b/Documentation/networking/devlink-trap-netdevsim.rst
new file mode 100644
index 000000000000..b721c9415473
--- /dev/null
+++ b/Documentation/networking/devlink-trap-netdevsim.rst
@@ -0,0 +1,20 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+======================
+Devlink Trap netdevsim
+======================
+
+Driver-specific Traps
+=====================
+
+.. list-table:: List of Driver-specific Traps Registered by ``netdevsim``
+   :widths: 5 5 90
+
+   * - Name
+     - Type
+     - Description
+   * - ``fid_miss``
+     - ``exception``
+     - When a packet enters the device it is classified to a filtering
+       indentifier (FID) based on the ingress port and VLAN. This trap is used
+       to trap packets for which a FID could not be found
diff --git a/Documentation/networking/devlink-trap.rst b/Documentation/networking/devlink-trap.rst
index 4a4fd42f5372..906cdeb0d1f3 100644
--- a/Documentation/networking/devlink-trap.rst
+++ b/Documentation/networking/devlink-trap.rst
@@ -153,6 +153,17 @@ be added to the following table:
      - Traps packets that the device decided to drop because they could not be
        enqueued to a transmission queue which is full
 
+Driver-specific Packet Traps
+============================
+
+Device drivers can register driver-specific packet traps, but these must be
+clearly documented. Such traps can correspond to device-specific exceptions and
+help debug packet drops caused by these exceptions. The following list includes
+links to the description of driver-specific traps registered by various device
+drivers:
+
+  * :doc:`/devlink-trap-netdevsim`
+
 Generic Packet Trap Groups
 ==========================
 
diff --git a/Documentation/networking/index.rst b/Documentation/networking/index.rst
index 86a814e4d450..37eabc17894c 100644
--- a/Documentation/networking/index.rst
+++ b/Documentation/networking/index.rst
@@ -15,6 +15,7 @@ Contents:
    dsa/index
    devlink-info-versions
    devlink-trap
+   devlink-trap-netdevsim
    ieee802154
    kapi
    z8530book
diff --git a/drivers/net/netdevsim/dev.c b/drivers/net/netdevsim/dev.c
index 2d8c60c3326c..2892b3154940 100644
--- a/drivers/net/netdevsim/dev.c
+++ b/drivers/net/netdevsim/dev.c
@@ -218,6 +218,9 @@ struct nsim_trap_data {
 	spinlock_t trap_lock;	/* Protects trap_items_arr */
 };
 
+/* All driver-specific traps must be documented in
+ * Documentation/networking/devlink-trap-netdevsim.rst
+ */
 enum {
 	NSIM_TRAP_ID_BASE = DEVLINK_TRAP_GENERIC_ID_MAX,
 	NSIM_TRAP_ID_FID_MISS,
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCH net-next 07/11] mlxsw: core: Add API to set trap action
  2019-07-07  7:58 [PATCH net-next 00/11] Add drop monitor for offloaded data paths Ido Schimmel
                   ` (5 preceding siblings ...)
  2019-07-07  7:58 ` [PATCH net-next 06/11] Documentation: Add description of netdevsim traps Ido Schimmel
@ 2019-07-07  7:58 ` Ido Schimmel
  2019-07-07  7:58 ` [PATCH net-next 08/11] mlxsw: reg: Add new " Ido Schimmel
                   ` (7 subsequent siblings)
  14 siblings, 0 replies; 48+ messages in thread
From: Ido Schimmel @ 2019-07-07  7:58 UTC (permalink / raw)
  To: netdev
  Cc: davem, jiri, mlxsw, dsahern, roopa, nikolay, andy, pablo,
	jakub.kicinski, pieter.jansenvanvuuren, andrew, f.fainelli,
	vivien.didelot, Ido Schimmel

From: Ido Schimmel <idosch@mellanox.com>

Up until now the action of a trap was never changed during its lifetime.
This is going to change by subsequent patches that will allow devlink to
control the action of certain traps.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlxsw/core.c | 12 ++++++++++++
 drivers/net/ethernet/mellanox/mlxsw/core.h |  3 +++
 2 files changed, 15 insertions(+)

diff --git a/drivers/net/ethernet/mellanox/mlxsw/core.c b/drivers/net/ethernet/mellanox/mlxsw/core.c
index 17ceac7505e5..6ec07ecfb5f6 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/core.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/core.c
@@ -1477,6 +1477,18 @@ void mlxsw_core_trap_unregister(struct mlxsw_core *mlxsw_core,
 }
 EXPORT_SYMBOL(mlxsw_core_trap_unregister);
 
+int mlxsw_core_trap_action_set(struct mlxsw_core *mlxsw_core,
+			       const struct mlxsw_listener *listener,
+			       enum mlxsw_reg_hpkt_action action)
+{
+	char hpkt_pl[MLXSW_REG_HPKT_LEN];
+
+	mlxsw_reg_hpkt_pack(hpkt_pl, action, listener->trap_id,
+			    listener->trap_group, listener->is_ctrl);
+	return mlxsw_reg_write(mlxsw_core, MLXSW_REG(hpkt), hpkt_pl);
+}
+EXPORT_SYMBOL(mlxsw_core_trap_action_set);
+
 static u64 mlxsw_core_tid_get(struct mlxsw_core *mlxsw_core)
 {
 	return atomic64_inc_return(&mlxsw_core->emad.tid);
diff --git a/drivers/net/ethernet/mellanox/mlxsw/core.h b/drivers/net/ethernet/mellanox/mlxsw/core.h
index 8efcff4b59cb..19cea16c30bb 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/core.h
+++ b/drivers/net/ethernet/mellanox/mlxsw/core.h
@@ -128,6 +128,9 @@ int mlxsw_core_trap_register(struct mlxsw_core *mlxsw_core,
 void mlxsw_core_trap_unregister(struct mlxsw_core *mlxsw_core,
 				const struct mlxsw_listener *listener,
 				void *priv);
+int mlxsw_core_trap_action_set(struct mlxsw_core *mlxsw_core,
+			       const struct mlxsw_listener *listener,
+			       enum mlxsw_reg_hpkt_action action);
 
 typedef void mlxsw_reg_trans_cb_t(struct mlxsw_core *mlxsw_core, char *payload,
 				  size_t payload_len, unsigned long cb_priv);
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCH net-next 08/11] mlxsw: reg: Add new trap action
  2019-07-07  7:58 [PATCH net-next 00/11] Add drop monitor for offloaded data paths Ido Schimmel
                   ` (6 preceding siblings ...)
  2019-07-07  7:58 ` [PATCH net-next 07/11] mlxsw: core: Add API to set trap action Ido Schimmel
@ 2019-07-07  7:58 ` Ido Schimmel
  2019-07-07  7:58 ` [PATCH net-next 09/11] mlxsw: Add layer 2 discard trap IDs Ido Schimmel
                   ` (6 subsequent siblings)
  14 siblings, 0 replies; 48+ messages in thread
From: Ido Schimmel @ 2019-07-07  7:58 UTC (permalink / raw)
  To: netdev
  Cc: davem, jiri, mlxsw, dsahern, roopa, nikolay, andy, pablo,
	jakub.kicinski, pieter.jansenvanvuuren, andrew, f.fainelli,
	vivien.didelot, Ido Schimmel

From: Ido Schimmel <idosch@mellanox.com>

Subsequent patches will add discard traps support in mlxsw. The driver
cannot configure such traps with a normal trap action, but need to use
exception trap action, which also increments an error counter.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlxsw/reg.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/net/ethernet/mellanox/mlxsw/reg.h b/drivers/net/ethernet/mellanox/mlxsw/reg.h
index ead36702549a..55211218ec1a 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/reg.h
+++ b/drivers/net/ethernet/mellanox/mlxsw/reg.h
@@ -5559,6 +5559,7 @@ enum mlxsw_reg_hpkt_action {
 	MLXSW_REG_HPKT_ACTION_DISCARD,
 	MLXSW_REG_HPKT_ACTION_SOFT_DISCARD,
 	MLXSW_REG_HPKT_ACTION_TRAP_AND_SOFT_DISCARD,
+	MLXSW_REG_HPKT_ACTION_TRAP_EXCEPTION_TO_CPU,
 };
 
 /* reg_hpkt_action
@@ -5569,6 +5570,7 @@ enum mlxsw_reg_hpkt_action {
  * 3 - Discard.
  * 4 - Soft discard (allow other traps to act on the packet).
  * 5 - Trap and soft discard (allow other traps to overwrite this trap).
+ * 6 - Trap to CPU (CPU receives sole copy) and count it as error.
  * Access: RW
  *
  * Note: Must be set to 0 (forward) for event trap IDs, as they are already
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCH net-next 09/11] mlxsw: Add layer 2 discard trap IDs
  2019-07-07  7:58 [PATCH net-next 00/11] Add drop monitor for offloaded data paths Ido Schimmel
                   ` (7 preceding siblings ...)
  2019-07-07  7:58 ` [PATCH net-next 08/11] mlxsw: reg: Add new " Ido Schimmel
@ 2019-07-07  7:58 ` Ido Schimmel
  2019-07-07  7:58 ` [PATCH net-next 10/11] mlxsw: Add trap group for layer 2 discards Ido Schimmel
                   ` (5 subsequent siblings)
  14 siblings, 0 replies; 48+ messages in thread
From: Ido Schimmel @ 2019-07-07  7:58 UTC (permalink / raw)
  To: netdev
  Cc: davem, jiri, mlxsw, dsahern, roopa, nikolay, andy, pablo,
	jakub.kicinski, pieter.jansenvanvuuren, andrew, f.fainelli,
	vivien.didelot, Ido Schimmel

From: Ido Schimmel <idosch@mellanox.com>

Add the trap IDs used to report layer 2 drops.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlxsw/trap.h | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/drivers/net/ethernet/mellanox/mlxsw/trap.h b/drivers/net/ethernet/mellanox/mlxsw/trap.h
index 19202bdb5105..7618f084cae9 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/trap.h
+++ b/drivers/net/ethernet/mellanox/mlxsw/trap.h
@@ -66,6 +66,13 @@ enum {
 	MLXSW_TRAP_ID_NVE_ENCAP_ARP = 0xBD,
 	MLXSW_TRAP_ID_ROUTER_ALERT_IPV4 = 0xD6,
 	MLXSW_TRAP_ID_ROUTER_ALERT_IPV6 = 0xD7,
+	MLXSW_TRAP_ID_DISCARD_ING_PACKET_SMAC_MC = 0x140,
+	MLXSW_TRAP_ID_DISCARD_ING_SWITCH_VTAG_ALLOW = 0x148,
+	MLXSW_TRAP_ID_DISCARD_ING_SWITCH_VLAN = 0x149,
+	MLXSW_TRAP_ID_DISCARD_ING_SWITCH_STP = 0x14A,
+	MLXSW_TRAP_ID_DISCARD_LOOKUP_SWITCH_UC = 0x150,
+	MLXSW_TRAP_ID_DISCARD_LOOKUP_SWITCH_MC_NULL = 0x151,
+	MLXSW_TRAP_ID_DISCARD_LOOKUP_SWITCH_LB = 0x152,
 	MLXSW_TRAP_ID_ACL0 = 0x1C0,
 	/* Multicast trap used for routes with trap action */
 	MLXSW_TRAP_ID_ACL1 = 0x1C1,
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCH net-next 10/11] mlxsw: Add trap group for layer 2 discards
  2019-07-07  7:58 [PATCH net-next 00/11] Add drop monitor for offloaded data paths Ido Schimmel
                   ` (8 preceding siblings ...)
  2019-07-07  7:58 ` [PATCH net-next 09/11] mlxsw: Add layer 2 discard trap IDs Ido Schimmel
@ 2019-07-07  7:58 ` Ido Schimmel
  2019-07-07  7:58 ` [PATCH net-next 11/11] mlxsw: spectrum: Add devlink-trap support Ido Schimmel
                   ` (4 subsequent siblings)
  14 siblings, 0 replies; 48+ messages in thread
From: Ido Schimmel @ 2019-07-07  7:58 UTC (permalink / raw)
  To: netdev
  Cc: davem, jiri, mlxsw, dsahern, roopa, nikolay, andy, pablo,
	jakub.kicinski, pieter.jansenvanvuuren, andrew, f.fainelli,
	vivien.didelot, Ido Schimmel

From: Ido Schimmel <idosch@mellanox.com>

Discard trap groups are defined in a different enum so that they could
all share the same policer ID: MLXSW_REG_HTGT_TRAP_GROUP_MAX + 1.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlxsw/reg.h | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/drivers/net/ethernet/mellanox/mlxsw/reg.h b/drivers/net/ethernet/mellanox/mlxsw/reg.h
index 55211218ec1a..22138d15079f 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/reg.h
+++ b/drivers/net/ethernet/mellanox/mlxsw/reg.h
@@ -5422,6 +5422,14 @@ enum mlxsw_reg_htgt_trap_group {
 	MLXSW_REG_HTGT_TRAP_GROUP_SP_LBERROR,
 	MLXSW_REG_HTGT_TRAP_GROUP_SP_PTP0,
 	MLXSW_REG_HTGT_TRAP_GROUP_SP_PTP1,
+
+	__MLXSW_REG_HTGT_TRAP_GROUP_MAX,
+	MLXSW_REG_HTGT_TRAP_GROUP_MAX = __MLXSW_REG_HTGT_TRAP_GROUP_MAX - 1
+};
+
+enum mlxsw_reg_htgt_discard_trap_group {
+	MLXSW_REG_HTGT_DISCARD_TRAP_GROUP_BASE = MLXSW_REG_HTGT_TRAP_GROUP_MAX,
+	MLXSW_REG_HTGT_TRAP_GROUP_SP_L2_DISCARDS,
 };
 
 /* reg_htgt_trap_group
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCH net-next 11/11] mlxsw: spectrum: Add devlink-trap support
  2019-07-07  7:58 [PATCH net-next 00/11] Add drop monitor for offloaded data paths Ido Schimmel
                   ` (9 preceding siblings ...)
  2019-07-07  7:58 ` [PATCH net-next 10/11] mlxsw: Add trap group for layer 2 discards Ido Schimmel
@ 2019-07-07  7:58 ` Ido Schimmel
  2019-07-07  8:01 ` [PATCH iproute2-next 0/7] " Ido Schimmel
                   ` (3 subsequent siblings)
  14 siblings, 0 replies; 48+ messages in thread
From: Ido Schimmel @ 2019-07-07  7:58 UTC (permalink / raw)
  To: netdev
  Cc: davem, jiri, mlxsw, dsahern, roopa, nikolay, andy, pablo,
	jakub.kicinski, pieter.jansenvanvuuren, andrew, f.fainelli,
	vivien.didelot, Ido Schimmel

From: Ido Schimmel <idosch@mellanox.com>

Register supported packet traps (layer 2 drops only, currently) and
associated trap group with devlink during driver initialization. The
code is placed in a new file (spectrum_trap.c) to which the existing
packet traps will be moved to once they are exposed via devlink.

The amount of traffic generated by these packet drop traps is capped at
700Kpps to ensure the CPU is not overwhelmed by incoming packets.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
---
 drivers/net/ethernet/mellanox/mlxsw/Makefile  |   2 +-
 drivers/net/ethernet/mellanox/mlxsw/core.c    |  52 ++++
 drivers/net/ethernet/mellanox/mlxsw/core.h    |   9 +
 .../net/ethernet/mellanox/mlxsw/spectrum.c    |  17 ++
 .../net/ethernet/mellanox/mlxsw/spectrum.h    |  13 +
 .../ethernet/mellanox/mlxsw/spectrum_trap.c   | 270 ++++++++++++++++++
 6 files changed, 362 insertions(+), 1 deletion(-)
 create mode 100644 drivers/net/ethernet/mellanox/mlxsw/spectrum_trap.c

diff --git a/drivers/net/ethernet/mellanox/mlxsw/Makefile b/drivers/net/ethernet/mellanox/mlxsw/Makefile
index 171b36bd8a4e..0e86a581d45b 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/Makefile
+++ b/drivers/net/ethernet/mellanox/mlxsw/Makefile
@@ -29,7 +29,7 @@ mlxsw_spectrum-objs		:= spectrum.o spectrum_buffers.o \
 				   spectrum_mr_tcam.o spectrum_mr.o \
 				   spectrum_qdisc.o spectrum_span.o \
 				   spectrum_nve.o spectrum_nve_vxlan.o \
-				   spectrum_dpipe.o
+				   spectrum_dpipe.o spectrum_trap.o
 mlxsw_spectrum-$(CONFIG_MLXSW_SPECTRUM_DCB)	+= spectrum_dcb.o
 mlxsw_spectrum-$(CONFIG_PTP_1588_CLOCK)		+= spectrum_ptp.o
 obj-$(CONFIG_MLXSW_MINIMAL)	+= mlxsw_minimal.o
diff --git a/drivers/net/ethernet/mellanox/mlxsw/core.c b/drivers/net/ethernet/mellanox/mlxsw/core.c
index 6ec07ecfb5f6..963a2b4b61b1 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/core.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/core.c
@@ -1017,6 +1017,54 @@ static int mlxsw_devlink_flash_update(struct devlink *devlink,
 					  component, extack);
 }
 
+static int mlxsw_devlink_trap_init(struct devlink *devlink,
+				   const struct devlink_trap *trap,
+				   void *trap_ctx)
+{
+	struct mlxsw_core *mlxsw_core = devlink_priv(devlink);
+	struct mlxsw_driver *mlxsw_driver = mlxsw_core->driver;
+
+	if (!mlxsw_driver->trap_init)
+		return -EOPNOTSUPP;
+	return mlxsw_driver->trap_init(mlxsw_core, trap, trap_ctx);
+}
+
+static void mlxsw_devlink_trap_fini(struct devlink *devlink,
+				    const struct devlink_trap *trap,
+				    void *trap_ctx)
+{
+	struct mlxsw_core *mlxsw_core = devlink_priv(devlink);
+	struct mlxsw_driver *mlxsw_driver = mlxsw_core->driver;
+
+	if (!mlxsw_driver->trap_fini)
+		return;
+	mlxsw_driver->trap_fini(mlxsw_core, trap, trap_ctx);
+}
+
+static int mlxsw_devlink_trap_action_set(struct devlink *devlink,
+					 const struct devlink_trap *trap,
+					 enum devlink_trap_action action)
+{
+	struct mlxsw_core *mlxsw_core = devlink_priv(devlink);
+	struct mlxsw_driver *mlxsw_driver = mlxsw_core->driver;
+
+	if (!mlxsw_driver->trap_action_set)
+		return -EOPNOTSUPP;
+	return mlxsw_driver->trap_action_set(mlxsw_core, trap, action);
+}
+
+static int
+mlxsw_devlink_trap_group_init(struct devlink *devlink,
+			      const struct devlink_trap_group *group)
+{
+	struct mlxsw_core *mlxsw_core = devlink_priv(devlink);
+	struct mlxsw_driver *mlxsw_driver = mlxsw_core->driver;
+
+	if (!mlxsw_driver->trap_group_init)
+		return -EOPNOTSUPP;
+	return mlxsw_driver->trap_group_init(mlxsw_core, group);
+}
+
 static const struct devlink_ops mlxsw_devlink_ops = {
 	.reload				= mlxsw_devlink_core_bus_device_reload,
 	.port_type_set			= mlxsw_devlink_port_type_set,
@@ -1034,6 +1082,10 @@ static const struct devlink_ops mlxsw_devlink_ops = {
 	.sb_occ_tc_port_bind_get	= mlxsw_devlink_sb_occ_tc_port_bind_get,
 	.info_get			= mlxsw_devlink_info_get,
 	.flash_update			= mlxsw_devlink_flash_update,
+	.trap_init			= mlxsw_devlink_trap_init,
+	.trap_fini			= mlxsw_devlink_trap_fini,
+	.trap_action_set		= mlxsw_devlink_trap_action_set,
+	.trap_group_init		= mlxsw_devlink_trap_group_init,
 };
 
 static int
diff --git a/drivers/net/ethernet/mellanox/mlxsw/core.h b/drivers/net/ethernet/mellanox/mlxsw/core.h
index 19cea16c30bb..b65a17d49e43 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/core.h
+++ b/drivers/net/ethernet/mellanox/mlxsw/core.h
@@ -292,6 +292,15 @@ struct mlxsw_driver {
 	int (*flash_update)(struct mlxsw_core *mlxsw_core,
 			    const char *file_name, const char *component,
 			    struct netlink_ext_ack *extack);
+	int (*trap_init)(struct mlxsw_core *mlxsw_core,
+			 const struct devlink_trap *trap, void *trap_ctx);
+	void (*trap_fini)(struct mlxsw_core *mlxsw_core,
+			  const struct devlink_trap *trap, void *trap_ctx);
+	int (*trap_action_set)(struct mlxsw_core *mlxsw_core,
+			       const struct devlink_trap *trap,
+			       enum devlink_trap_action action);
+	int (*trap_group_init)(struct mlxsw_core *mlxsw_core,
+			       const struct devlink_trap_group *group);
 	void (*txhdr_construct)(struct sk_buff *skb,
 				const struct mlxsw_tx_info *tx_info);
 	int (*resources_register)(struct mlxsw_core *mlxsw_core);
diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum.c
index ce285fbeebd3..c7e1c32b8af2 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum.c
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum.c
@@ -4635,6 +4635,12 @@ static int mlxsw_sp_init(struct mlxsw_core *mlxsw_core,
 		goto err_traps_init;
 	}
 
+	err = mlxsw_sp_devlink_traps_init(mlxsw_sp);
+	if (err) {
+		dev_err(mlxsw_sp->bus_info->dev, "Failed to initialize devlink traps\n");
+		goto err_devlink_traps_init;
+	}
+
 	err = mlxsw_sp_buffers_init(mlxsw_sp);
 	if (err) {
 		dev_err(mlxsw_sp->bus_info->dev, "Failed to initialize buffers\n");
@@ -4768,6 +4774,8 @@ static int mlxsw_sp_init(struct mlxsw_core *mlxsw_core,
 err_lag_init:
 	mlxsw_sp_buffers_fini(mlxsw_sp);
 err_buffers_init:
+	mlxsw_sp_devlink_traps_fini(mlxsw_sp);
+err_devlink_traps_init:
 	mlxsw_sp_traps_fini(mlxsw_sp);
 err_traps_init:
 	mlxsw_sp_fids_fini(mlxsw_sp);
@@ -4840,6 +4848,7 @@ static void mlxsw_sp_fini(struct mlxsw_core *mlxsw_core)
 	mlxsw_sp_span_fini(mlxsw_sp);
 	mlxsw_sp_lag_fini(mlxsw_sp);
 	mlxsw_sp_buffers_fini(mlxsw_sp);
+	mlxsw_sp_devlink_traps_fini(mlxsw_sp);
 	mlxsw_sp_traps_fini(mlxsw_sp);
 	mlxsw_sp_fids_fini(mlxsw_sp);
 	mlxsw_sp_kvdl_fini(mlxsw_sp);
@@ -5201,6 +5210,10 @@ static struct mlxsw_driver mlxsw_sp1_driver = {
 	.sb_occ_port_pool_get		= mlxsw_sp_sb_occ_port_pool_get,
 	.sb_occ_tc_port_bind_get	= mlxsw_sp_sb_occ_tc_port_bind_get,
 	.flash_update			= mlxsw_sp_flash_update,
+	.trap_init			= mlxsw_sp_trap_init,
+	.trap_fini			= mlxsw_sp_trap_fini,
+	.trap_action_set		= mlxsw_sp_trap_action_set,
+	.trap_group_init		= mlxsw_sp_trap_group_init,
 	.txhdr_construct		= mlxsw_sp_txhdr_construct,
 	.resources_register		= mlxsw_sp1_resources_register,
 	.kvd_sizes_get			= mlxsw_sp_kvd_sizes_get,
@@ -5231,6 +5244,10 @@ static struct mlxsw_driver mlxsw_sp2_driver = {
 	.sb_occ_port_pool_get		= mlxsw_sp_sb_occ_port_pool_get,
 	.sb_occ_tc_port_bind_get	= mlxsw_sp_sb_occ_tc_port_bind_get,
 	.flash_update			= mlxsw_sp_flash_update,
+	.trap_init			= mlxsw_sp_trap_init,
+	.trap_fini			= mlxsw_sp_trap_fini,
+	.trap_action_set		= mlxsw_sp_trap_action_set,
+	.trap_group_init		= mlxsw_sp_trap_group_init,
 	.txhdr_construct		= mlxsw_sp_txhdr_construct,
 	.resources_register		= mlxsw_sp2_resources_register,
 	.params_register		= mlxsw_sp2_params_register,
diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum.h b/drivers/net/ethernet/mellanox/mlxsw/spectrum.h
index abbb563db440..d0f96fa7cf9d 100644
--- a/drivers/net/ethernet/mellanox/mlxsw/spectrum.h
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum.h
@@ -950,4 +950,17 @@ void mlxsw_sp_port_nve_fini(struct mlxsw_sp_port *mlxsw_sp_port);
 int mlxsw_sp_nve_init(struct mlxsw_sp *mlxsw_sp);
 void mlxsw_sp_nve_fini(struct mlxsw_sp *mlxsw_sp);
 
+/* spectrum_trap.c */
+int mlxsw_sp_devlink_traps_init(struct mlxsw_sp *mlxsw_sp);
+void mlxsw_sp_devlink_traps_fini(struct mlxsw_sp *mlxsw_sp);
+int mlxsw_sp_trap_init(struct mlxsw_core *mlxsw_core,
+		       const struct devlink_trap *trap, void *trap_ctx);
+void mlxsw_sp_trap_fini(struct mlxsw_core *mlxsw_core,
+			const struct devlink_trap *trap, void *trap_ctx);
+int mlxsw_sp_trap_action_set(struct mlxsw_core *mlxsw_core,
+			     const struct devlink_trap *trap,
+			     enum devlink_trap_action action);
+int mlxsw_sp_trap_group_init(struct mlxsw_core *mlxsw_core,
+			     const struct devlink_trap_group *group);
+
 #endif
diff --git a/drivers/net/ethernet/mellanox/mlxsw/spectrum_trap.c b/drivers/net/ethernet/mellanox/mlxsw/spectrum_trap.c
new file mode 100644
index 000000000000..bf9f74838577
--- /dev/null
+++ b/drivers/net/ethernet/mellanox/mlxsw/spectrum_trap.c
@@ -0,0 +1,270 @@
+// SPDX-License-Identifier: BSD-3-Clause OR GPL-2.0
+/* Copyright (c) 2019 Mellanox Technologies. All rights reserved */
+
+#include <linux/kernel.h>
+#include <net/devlink.h>
+#include <uapi/linux/devlink.h>
+
+#include "core.h"
+#include "reg.h"
+#include "spectrum.h"
+
+#define MLXSW_SP_TRAP_METADATA DEVLINK_TRAP_METADATA_TYPE_F_IN_PORT
+
+static void mlxsw_sp_rx_drop_listener(struct sk_buff *skb, u8 local_port,
+				      void *priv);
+
+#define MLXSW_SP_TRAP_DROP(_id, _group_id)				      \
+	DEVLINK_TRAP_GENERIC(DROP, DROP, _id,				      \
+			     DEVLINK_TRAP_GROUP_GENERIC(_group_id),	      \
+			     MLXSW_SP_TRAP_METADATA)
+
+#define MLXSW_SP_RXL_DISCARD(_id, _group_id)				      \
+	MLXSW_RXL(mlxsw_sp_rx_drop_listener, DISCARD_##_id, DISCARD, false,   \
+		  SP_##_group_id, DISCARD)
+
+static struct devlink_trap mlxsw_sp_traps_arr[] = {
+	MLXSW_SP_TRAP_DROP(SMAC_MC, L2_DROPS),
+	MLXSW_SP_TRAP_DROP(VLAN_TAG_MISMATCH, L2_DROPS),
+	MLXSW_SP_TRAP_DROP(INGRESS_VLAN_FILTER, L2_DROPS),
+	MLXSW_SP_TRAP_DROP(INGRESS_STP_FILTER, L2_DROPS),
+	MLXSW_SP_TRAP_DROP(EMPTY_TX_LIST, L2_DROPS),
+	MLXSW_SP_TRAP_DROP(PORT_LOOPBACK_FILTER, L2_DROPS),
+};
+
+static struct mlxsw_listener mlxsw_sp_listeners_arr[] = {
+	MLXSW_SP_RXL_DISCARD(ING_PACKET_SMAC_MC, L2_DISCARDS),
+	MLXSW_SP_RXL_DISCARD(ING_SWITCH_VTAG_ALLOW, L2_DISCARDS),
+	MLXSW_SP_RXL_DISCARD(ING_SWITCH_VLAN, L2_DISCARDS),
+	MLXSW_SP_RXL_DISCARD(ING_SWITCH_STP, L2_DISCARDS),
+	MLXSW_SP_RXL_DISCARD(LOOKUP_SWITCH_UC, L2_DISCARDS),
+	MLXSW_SP_RXL_DISCARD(LOOKUP_SWITCH_MC_NULL, L2_DISCARDS),
+	MLXSW_SP_RXL_DISCARD(LOOKUP_SWITCH_LB, L2_DISCARDS),
+};
+
+/* Mapping between hardware trap and devlink trap. Multiple hardware traps can
+ * be mapped to the same devlink trap. Order is according to
+ * 'mlxsw_sp_listeners_arr'.
+ */
+static u16 mlxsw_sp_listener_devlink_map[] = {
+	DEVLINK_TRAP_GENERIC_ID_SMAC_MC,
+	DEVLINK_TRAP_GENERIC_ID_VLAN_TAG_MISMATCH,
+	DEVLINK_TRAP_GENERIC_ID_INGRESS_VLAN_FILTER,
+	DEVLINK_TRAP_GENERIC_ID_INGRESS_STP_FILTER,
+	DEVLINK_TRAP_GENERIC_ID_EMPTY_TX_LIST,
+	DEVLINK_TRAP_GENERIC_ID_EMPTY_TX_LIST,
+	DEVLINK_TRAP_GENERIC_ID_PORT_LOOPBACK_FILTER,
+};
+
+static int mlxsw_sp_rx_listener(struct mlxsw_sp *mlxsw_sp, struct sk_buff *skb,
+				u8 local_port,
+				struct mlxsw_sp_port *mlxsw_sp_port)
+{
+	struct mlxsw_sp_port_pcpu_stats *pcpu_stats;
+
+	if (unlikely(!mlxsw_sp_port)) {
+		dev_warn_ratelimited(mlxsw_sp->bus_info->dev, "Port %d: skb received for non-existent port\n",
+				     local_port);
+		kfree_skb(skb);
+		return -EINVAL;
+	}
+
+	skb->dev = mlxsw_sp_port->dev;
+
+	pcpu_stats = this_cpu_ptr(mlxsw_sp_port->pcpu_stats);
+	u64_stats_update_begin(&pcpu_stats->syncp);
+	pcpu_stats->rx_packets++;
+	pcpu_stats->rx_bytes += skb->len;
+	u64_stats_update_end(&pcpu_stats->syncp);
+
+	return 0;
+}
+
+static void mlxsw_sp_rx_drop_listener(struct sk_buff *skb, u8 local_port,
+				      void *trap_ctx)
+{
+	struct devlink_port *in_devlink_port;
+	struct mlxsw_sp_port *mlxsw_sp_port;
+	struct mlxsw_sp *mlxsw_sp;
+	struct devlink *devlink;
+
+	mlxsw_sp = devlink_trap_ctx_priv(trap_ctx);
+	mlxsw_sp_port = mlxsw_sp->ports[local_port];
+
+	if (mlxsw_sp_rx_listener(mlxsw_sp, skb, local_port, mlxsw_sp_port))
+		return;
+
+	devlink = priv_to_devlink(mlxsw_sp->core);
+	in_devlink_port = mlxsw_core_port_devlink_port_get(mlxsw_sp->core,
+							   local_port);
+	devlink_trap_report(devlink, skb, trap_ctx, in_devlink_port);
+	consume_skb(skb);
+}
+
+int mlxsw_sp_devlink_traps_init(struct mlxsw_sp *mlxsw_sp)
+{
+	struct devlink *devlink = priv_to_devlink(mlxsw_sp->core);
+
+	if (WARN_ON(ARRAY_SIZE(mlxsw_sp_listener_devlink_map) !=
+		    ARRAY_SIZE(mlxsw_sp_listeners_arr)))
+		return -EINVAL;
+
+	return devlink_traps_register(devlink, mlxsw_sp_traps_arr,
+				      ARRAY_SIZE(mlxsw_sp_traps_arr),
+				      mlxsw_sp);
+}
+
+void mlxsw_sp_devlink_traps_fini(struct mlxsw_sp *mlxsw_sp)
+{
+	struct devlink *devlink = priv_to_devlink(mlxsw_sp->core);
+
+	devlink_traps_unregister(devlink, mlxsw_sp_traps_arr,
+				 ARRAY_SIZE(mlxsw_sp_traps_arr));
+}
+
+int mlxsw_sp_trap_init(struct mlxsw_core *mlxsw_core,
+		       const struct devlink_trap *trap, void *trap_ctx)
+{
+	int i;
+
+	for (i = 0; i < ARRAY_SIZE(mlxsw_sp_listener_devlink_map); i++) {
+		struct mlxsw_listener *listener;
+		int err;
+
+		if (mlxsw_sp_listener_devlink_map[i] != trap->id)
+			continue;
+		listener = &mlxsw_sp_listeners_arr[i];
+
+		err = mlxsw_core_trap_register(mlxsw_core, listener, trap_ctx);
+		if (err)
+			return err;
+	}
+
+	return 0;
+}
+
+void mlxsw_sp_trap_fini(struct mlxsw_core *mlxsw_core,
+			const struct devlink_trap *trap, void *trap_ctx)
+{
+	int i;
+
+	for (i = 0; i < ARRAY_SIZE(mlxsw_sp_listener_devlink_map); i++) {
+		struct mlxsw_listener *listener;
+
+		if (mlxsw_sp_listener_devlink_map[i] != trap->id)
+			continue;
+		listener = &mlxsw_sp_listeners_arr[i];
+
+		mlxsw_core_trap_unregister(mlxsw_core, listener, trap_ctx);
+	}
+}
+
+int mlxsw_sp_trap_action_set(struct mlxsw_core *mlxsw_core,
+			     const struct devlink_trap *trap,
+			     enum devlink_trap_action action)
+{
+	int i;
+
+	for (i = 0; i < ARRAY_SIZE(mlxsw_sp_listener_devlink_map); i++) {
+		enum mlxsw_reg_hpkt_action hw_action;
+		struct mlxsw_listener *listener;
+		int err;
+
+		if (mlxsw_sp_listener_devlink_map[i] != trap->id)
+			continue;
+		listener = &mlxsw_sp_listeners_arr[i];
+
+		switch (action) {
+		case DEVLINK_TRAP_ACTION_DROP:
+			hw_action = MLXSW_REG_HPKT_ACTION_DISCARD;
+			break;
+		case DEVLINK_TRAP_ACTION_TRAP:
+			/* Drop traps need to use TRAP_EXCEPTION, so that an
+			 * error is counted.
+			 */
+			hw_action = MLXSW_REG_HPKT_ACTION_TRAP_EXCEPTION_TO_CPU;
+			if (trap->type != DEVLINK_TRAP_TYPE_DROP)
+				hw_action = MLXSW_REG_HPKT_ACTION_TRAP_TO_CPU;
+			break;
+		default:
+			return -EINVAL;
+		}
+
+		err = mlxsw_core_trap_action_set(mlxsw_core, listener,
+						 hw_action);
+		if (err)
+			return err;
+	}
+
+	return 0;
+}
+
+#define MLXSW_SP_DISCARD_POLICER_ID	(MLXSW_REG_HTGT_TRAP_GROUP_MAX + 1)
+
+static int
+mlxsw_sp_trap_group_policer_init(struct mlxsw_sp *mlxsw_sp,
+				 const struct devlink_trap_group *group)
+{
+	enum mlxsw_reg_qpcr_ir_units ir_units;
+	char qpcr_pl[MLXSW_REG_QPCR_LEN];
+	u16 policer_id;
+	u8 burst_size;
+	bool is_bytes;
+	u32 rate;
+
+	switch (group->id) {
+	case DEVLINK_TRAP_GROUP_GENERIC_ID_L2_DROPS:
+		policer_id = MLXSW_SP_DISCARD_POLICER_ID;
+		ir_units = MLXSW_REG_QPCR_IR_UNITS_M;
+		is_bytes = false;
+		rate = 700 * 1024; /* 700Kpps */
+		burst_size = 7;
+		break;
+	default:
+		return -EINVAL;
+	}
+
+	mlxsw_reg_qpcr_pack(qpcr_pl, policer_id, ir_units, is_bytes, rate,
+			    burst_size);
+	return mlxsw_reg_write(mlxsw_sp->core, MLXSW_REG(qpcr), qpcr_pl);
+}
+
+static int
+__mlxsw_sp_trap_group_init(struct mlxsw_sp *mlxsw_sp,
+			   const struct devlink_trap_group *group)
+{
+	char htgt_pl[MLXSW_REG_HTGT_LEN];
+	u8 priority, tc, group_id;
+	u16 policer_id;
+
+	switch (group->id) {
+	case DEVLINK_TRAP_GROUP_GENERIC_ID_L2_DROPS:
+		group_id = MLXSW_REG_HTGT_TRAP_GROUP_SP_L2_DISCARDS;
+		policer_id = MLXSW_SP_DISCARD_POLICER_ID;
+		priority = 0;
+		tc = 1;
+		break;
+	default:
+		return -EINVAL;
+	}
+
+	mlxsw_reg_htgt_pack(htgt_pl, group_id, policer_id, priority, tc);
+	return mlxsw_reg_write(mlxsw_sp->core, MLXSW_REG(htgt), htgt_pl);
+}
+
+int mlxsw_sp_trap_group_init(struct mlxsw_core *mlxsw_core,
+			     const struct devlink_trap_group *group)
+{
+	struct mlxsw_sp *mlxsw_sp = mlxsw_core_driver_priv(mlxsw_core);
+	int err;
+
+	err = mlxsw_sp_trap_group_policer_init(mlxsw_sp, group);
+	if (err)
+		return err;
+
+	err = __mlxsw_sp_trap_group_init(mlxsw_sp, group);
+	if (err)
+		return err;
+
+	return 0;
+}
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCH iproute2-next 0/7] Add devlink-trap support
  2019-07-07  7:58 [PATCH net-next 00/11] Add drop monitor for offloaded data paths Ido Schimmel
                   ` (10 preceding siblings ...)
  2019-07-07  7:58 ` [PATCH net-next 11/11] mlxsw: spectrum: Add devlink-trap support Ido Schimmel
@ 2019-07-07  8:01 ` Ido Schimmel
  2019-07-07  8:01   ` [PATCH iproute2-next 1/7] devlink: Increase number of supported options Ido Schimmel
                     ` (6 more replies)
  2019-07-07  8:03 ` [RFC PATCH net-next 0/5] selftests: Add devlink-trap selftests Ido Schimmel
                   ` (2 subsequent siblings)
  14 siblings, 7 replies; 48+ messages in thread
From: Ido Schimmel @ 2019-07-07  8:01 UTC (permalink / raw)
  To: netdev
  Cc: davem, jiri, mlxsw, dsahern, roopa, nikolay, andy, pablo,
	jakub.kicinski, pieter.jansenvanvuuren, andrew, f.fainelli,
	vivien.didelot, Ido Schimmel

From: Ido Schimmel <idosch@mellanox.com>

This patchset adds devlink-trap support in iproute2.

Patch #1 increases the number of options devlink can handle.

Patches #2-#4 gradually add support for all devlink-trap commands.

Patches #5-#6 perform small tweaks in devlink code.

Patch #7 adds a man page for devlink-trap.

See individual commit messages for example usage and output.

Ido Schimmel (7):
  devlink: Increase number of supported options
  devlink: Add devlink trap set and show commands
  devlink: Add devlink trap group set and show commands
  devlink: Add devlink trap monitor support
  devlink: Set NETLINK_NO_ENOBUFS when monitoring events
  devlink: Add fflush() to print functions
  devlink: Add man page for devlink-trap

 devlink/devlink.c          | 568 +++++++++++++++++++++++++++++++++++--
 devlink/mnlg.c             |  12 +
 devlink/mnlg.h             |   2 +
 man/man8/devlink-monitor.8 |   3 +-
 man/man8/devlink-trap.8    | 166 +++++++++++
 man/man8/devlink.8         |  11 +-
 6 files changed, 742 insertions(+), 20 deletions(-)
 create mode 100644 man/man8/devlink-trap.8

-- 
2.20.1


^ permalink raw reply	[flat|nested] 48+ messages in thread

* [PATCH iproute2-next 1/7] devlink: Increase number of supported options
  2019-07-07  8:01 ` [PATCH iproute2-next 0/7] " Ido Schimmel
@ 2019-07-07  8:01   ` Ido Schimmel
  2019-07-07  8:01   ` [PATCH iproute2-next 2/7] devlink: Add devlink trap set and show commands Ido Schimmel
                     ` (5 subsequent siblings)
  6 siblings, 0 replies; 48+ messages in thread
From: Ido Schimmel @ 2019-07-07  8:01 UTC (permalink / raw)
  To: netdev
  Cc: davem, jiri, mlxsw, dsahern, roopa, nikolay, andy, pablo,
	jakub.kicinski, pieter.jansenvanvuuren, andrew, f.fainelli,
	vivien.didelot, Ido Schimmel

From: Ido Schimmel <idosch@mellanox.com>

Currently, the number of supported options is capped at 32 which is a
problem given we are about to add a few more and go over the limit.

Increase the limit to 64 options.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
---
 devlink/devlink.c | 24 ++++++++++++------------
 1 file changed, 12 insertions(+), 12 deletions(-)

diff --git a/devlink/devlink.c b/devlink/devlink.c
index 559f624e3666..f631c8241a24 100644
--- a/devlink/devlink.c
+++ b/devlink/devlink.c
@@ -213,7 +213,7 @@ static void ifname_map_free(struct ifname_map *ifname_map)
 #define DL_OPT_HEALTH_REPORTER_AUTO_RECOVER	BIT(28)
 
 struct dl_opts {
-	uint32_t present; /* flags of present items */
+	uint64_t present; /* flags of present items */
 	char *bus_name;
 	char *dev_name;
 	uint32_t port_index;
@@ -713,7 +713,7 @@ static int dl_argv_handle_port(struct dl *dl, char **p_bus_name,
 
 static int dl_argv_handle_both(struct dl *dl, char **p_bus_name,
 			       char **p_dev_name, uint32_t *p_port_index,
-			       uint32_t *p_handle_bit)
+			       uint64_t *p_handle_bit)
 {
 	char *str = dl_argv_next(dl);
 	unsigned int slash_count;
@@ -993,7 +993,7 @@ static int param_cmode_get(const char *cmodestr,
 }
 
 struct dl_args_metadata {
-	uint32_t o_flag;
+	uint64_t o_flag;
 	char err_msg[DL_ARGS_REQUIRED_MAX_ERR_LEN];
 };
 
@@ -1020,10 +1020,10 @@ static const struct dl_args_metadata dl_args_required[] = {
 	{DL_OPT_HEALTH_REPORTER_NAME, "Reporter's name is expected."},
 };
 
-static int dl_args_finding_required_validate(uint32_t o_required,
-					     uint32_t o_found)
+static int dl_args_finding_required_validate(uint64_t o_required,
+					     uint64_t o_found)
 {
-	uint32_t o_flag;
+	uint64_t o_flag;
 	int i;
 
 	for (i = 0; i < ARRAY_SIZE(dl_args_required); i++) {
@@ -1036,16 +1036,16 @@ static int dl_args_finding_required_validate(uint32_t o_required,
 	return 0;
 }
 
-static int dl_argv_parse(struct dl *dl, uint32_t o_required,
-			 uint32_t o_optional)
+static int dl_argv_parse(struct dl *dl, uint64_t o_required,
+			 uint64_t o_optional)
 {
 	struct dl_opts *opts = &dl->opts;
-	uint32_t o_all = o_required | o_optional;
-	uint32_t o_found = 0;
+	uint64_t o_all = o_required | o_optional;
+	uint64_t o_found = 0;
 	int err;
 
 	if (o_required & DL_OPT_HANDLE && o_required & DL_OPT_HANDLEP) {
-		uint32_t handle_bit;
+		uint64_t handle_bit;
 
 		err = dl_argv_handle_both(dl, &opts->bus_name, &opts->dev_name,
 					  &opts->port_index, &handle_bit);
@@ -1424,7 +1424,7 @@ static void dl_opts_put(struct nlmsghdr *nlh, struct dl *dl)
 }
 
 static int dl_argv_parse_put(struct nlmsghdr *nlh, struct dl *dl,
-			     uint32_t o_required, uint32_t o_optional)
+			     uint64_t o_required, uint64_t o_optional)
 {
 	int err;
 
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCH iproute2-next 2/7] devlink: Add devlink trap set and show commands
  2019-07-07  8:01 ` [PATCH iproute2-next 0/7] " Ido Schimmel
  2019-07-07  8:01   ` [PATCH iproute2-next 1/7] devlink: Increase number of supported options Ido Schimmel
@ 2019-07-07  8:01   ` Ido Schimmel
  2019-07-07  8:01   ` [PATCH iproute2-next 3/7] devlink: Add devlink trap group " Ido Schimmel
                     ` (4 subsequent siblings)
  6 siblings, 0 replies; 48+ messages in thread
From: Ido Schimmel @ 2019-07-07  8:01 UTC (permalink / raw)
  To: netdev
  Cc: davem, jiri, mlxsw, dsahern, roopa, nikolay, andy, pablo,
	jakub.kicinski, pieter.jansenvanvuuren, andrew, f.fainelli,
	vivien.didelot, Ido Schimmel

From: Ido Schimmel <idosch@mellanox.com>

The trap set command allows the user to set the action and reporting
state of an individual trap. Example:

# devlink trap set netdevsim/netdevsim10 trap blackhole_route action trap report true

The trap show command allows the user to get the current status of an
individual trap or a dump of all traps in case one is not specified.
When '-s' is specified the trap's statistics are shown. When '-v' is
specified the metadata types the trap can provide are shown. Example:

# devlink -jvps trap show netdevsim/netdevsim10 trap blackhole_route
{
    "trap": {
        "netdevsim/netdevsim10": [ {
                "name": "blackhole_route",
                "type": "drop",
                "generic": true,
                "report": true,
                "action": "trap",
                "group": "l3_drops",
                "metadata": [ "input_port" ],
                "stats": {
                    "rx": {
                        "bytes": 0,
                        "packets": 0
                    }
                }
            } ]
    }
}

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
---
 devlink/devlink.c | 313 +++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 308 insertions(+), 5 deletions(-)

diff --git a/devlink/devlink.c b/devlink/devlink.c
index f631c8241a24..90f41cbd2a9c 100644
--- a/devlink/devlink.c
+++ b/devlink/devlink.c
@@ -211,6 +211,9 @@ static void ifname_map_free(struct ifname_map *ifname_map)
 #define DL_OPT_HEALTH_REPORTER_NAME	BIT(27)
 #define DL_OPT_HEALTH_REPORTER_GRACEFUL_PERIOD	BIT(27)
 #define DL_OPT_HEALTH_REPORTER_AUTO_RECOVER	BIT(28)
+#define DL_OPT_TRAP_NAME		BIT(29)
+#define DL_OPT_TRAP_REPORT		BIT(30)
+#define DL_OPT_TRAP_ACTION		BIT(31)
 
 struct dl_opts {
 	uint64_t present; /* flags of present items */
@@ -247,6 +250,9 @@ struct dl_opts {
 	const char *reporter_name;
 	uint64_t reporter_graceful_period;
 	bool reporter_auto_recover;
+	const char *trap_name;
+	bool trap_report;
+	enum devlink_trap_action trap_action;
 };
 
 struct dl {
@@ -260,6 +266,7 @@ struct dl {
 	bool json_output;
 	bool pretty_output;
 	bool verbose;
+	bool stats;
 	struct {
 		bool present;
 		char *bus_name;
@@ -414,6 +421,22 @@ static const enum mnl_attr_data_type devlink_policy[DEVLINK_ATTR_MAX + 1] = {
 	[DEVLINK_ATTR_HEALTH_REPORTER_RECOVER_COUNT] = MNL_TYPE_U64,
 	[DEVLINK_ATTR_HEALTH_REPORTER_DUMP_TS] = MNL_TYPE_U64,
 	[DEVLINK_ATTR_HEALTH_REPORTER_GRACEFUL_PERIOD] = MNL_TYPE_U64,
+	[DEVLINK_ATTR_STATS] = MNL_TYPE_NESTED,
+	[DEVLINK_ATTR_TRAP_NAME] = MNL_TYPE_STRING,
+	[DEVLINK_ATTR_TRAP_REPORT_ENABLED] = MNL_TYPE_U8,
+	[DEVLINK_ATTR_TRAP_ACTION] = MNL_TYPE_U8,
+	[DEVLINK_ATTR_TRAP_TYPE] = MNL_TYPE_U8,
+	[DEVLINK_ATTR_TRAP_GENERIC] = MNL_TYPE_FLAG,
+	[DEVLINK_ATTR_TRAP_METADATA] = MNL_TYPE_NESTED,
+	[DEVLINK_ATTR_TRAP_TIMESTAMP] = MNL_TYPE_BINARY,
+	[DEVLINK_ATTR_TRAP_IN_PORT] = MNL_TYPE_NESTED,
+	[DEVLINK_ATTR_TRAP_GROUP_NAME] = MNL_TYPE_STRING,
+};
+
+static const enum mnl_attr_data_type
+devlink_stats_policy[DEVLINK_ATTR_STATS_MAX + 1] = {
+	[DEVLINK_ATTR_STATS_RX_PACKETS] = MNL_TYPE_U64,
+	[DEVLINK_ATTR_STATS_RX_BYTES] = MNL_TYPE_U64,
 };
 
 static int attr_cb(const struct nlattr *attr, void *data)
@@ -432,6 +455,25 @@ static int attr_cb(const struct nlattr *attr, void *data)
 	return MNL_CB_OK;
 }
 
+static int attr_stats_cb(const struct nlattr *attr, void *data)
+{
+	const struct nlattr **tb = data;
+	int type;
+
+	/* Allow the tool to work on top of newer kernels that might contain
+	 * more attributes.
+	 */
+	if (mnl_attr_type_valid(attr, DEVLINK_ATTR_STATS_MAX) < 0)
+		return MNL_CB_OK;
+
+	type = mnl_attr_get_type(attr);
+	if (mnl_attr_validate(attr, devlink_stats_policy[type]) < 0)
+		return MNL_CB_ERROR;
+
+	tb[type] = attr;
+	return MNL_CB_OK;
+}
+
 static int ifname_map_cb(const struct nlmsghdr *nlh, void *data)
 {
 	struct nlattr *tb[DEVLINK_ATTR_MAX + 1] = {};
@@ -992,6 +1034,20 @@ static int param_cmode_get(const char *cmodestr,
 	return 0;
 }
 
+static int trap_action_get(const char *actionstr,
+			   enum devlink_trap_action *p_action)
+{
+	if (strcmp(actionstr, "drop") == 0) {
+		*p_action = DEVLINK_TRAP_ACTION_DROP;
+	} else if (strcmp(actionstr, "trap") == 0) {
+		*p_action = DEVLINK_TRAP_ACTION_TRAP;
+	} else {
+		pr_err("Unknown trap action \"%s\"\n", actionstr);
+		return -EINVAL;
+	}
+	return 0;
+}
+
 struct dl_args_metadata {
 	uint64_t o_flag;
 	char err_msg[DL_ARGS_REQUIRED_MAX_ERR_LEN];
@@ -1018,6 +1074,7 @@ static const struct dl_args_metadata dl_args_required[] = {
 	{DL_OPT_REGION_ADDRESS,	      "Region address value expected."},
 	{DL_OPT_REGION_LENGTH,	      "Region length value expected."},
 	{DL_OPT_HEALTH_REPORTER_NAME, "Reporter's name is expected."},
+	{DL_OPT_TRAP_NAME,            "Trap's name is expected."},
 };
 
 static int dl_args_finding_required_validate(uint64_t o_required,
@@ -1307,6 +1364,32 @@ static int dl_argv_parse(struct dl *dl, uint64_t o_required,
 			if (err)
 				return err;
 			o_found |= DL_OPT_HEALTH_REPORTER_AUTO_RECOVER;
+		} else if (dl_argv_match(dl, "trap") &&
+			   (o_all & DL_OPT_TRAP_NAME)) {
+			dl_arg_inc(dl);
+			err = dl_argv_str(dl, &opts->trap_name);
+			if (err)
+				return err;
+			o_found |= DL_OPT_TRAP_NAME;
+		} else if (dl_argv_match(dl, "report") &&
+			   (o_all & DL_OPT_TRAP_REPORT)) {
+			dl_arg_inc(dl);
+			err = dl_argv_bool(dl, &opts->trap_report);
+			if (err)
+				return err;
+			o_found |= DL_OPT_TRAP_REPORT;
+		} else if (dl_argv_match(dl, "action") &&
+			   (o_all & DL_OPT_TRAP_ACTION)) {
+			const char *actionstr;
+
+			dl_arg_inc(dl);
+			err = dl_argv_str(dl, &actionstr);
+			if (err)
+				return err;
+			err = trap_action_get(actionstr, &opts->trap_action);
+			if (err)
+				return err;
+			o_found |= DL_OPT_TRAP_ACTION;
 		} else {
 			pr_err("Unknown option \"%s\"\n", dl_argv(dl));
 			return -EINVAL;
@@ -1420,6 +1503,15 @@ static void dl_opts_put(struct nlmsghdr *nlh, struct dl *dl)
 	if (opts->present & DL_OPT_HEALTH_REPORTER_AUTO_RECOVER)
 		mnl_attr_put_u8(nlh, DEVLINK_ATTR_HEALTH_REPORTER_AUTO_RECOVER,
 				opts->reporter_auto_recover);
+	if (opts->present & DL_OPT_TRAP_NAME)
+		mnl_attr_put_strz(nlh, DEVLINK_ATTR_TRAP_NAME,
+				  opts->trap_name);
+	if (opts->present & DL_OPT_TRAP_REPORT)
+		mnl_attr_put_u8(nlh, DEVLINK_ATTR_TRAP_REPORT_ENABLED,
+				opts->trap_report);
+	if (opts->present & DL_OPT_TRAP_ACTION)
+		mnl_attr_put_u8(nlh, DEVLINK_ATTR_TRAP_ACTION,
+				opts->trap_action);
 
 }
 
@@ -1926,6 +2018,30 @@ static void pr_out_entry_end(struct dl *dl)
 		__pr_out_newline();
 }
 
+static void pr_out_stats(struct dl *dl, struct nlattr *nla_stats)
+{
+	struct nlattr *tb[DEVLINK_ATTR_STATS_MAX + 1] = {};
+	int err;
+
+	if (!dl->stats)
+		return;
+
+	err = mnl_attr_parse_nested(nla_stats, attr_stats_cb, tb);
+	if (err != MNL_CB_OK)
+		return;
+
+	pr_out_object_start(dl, "stats");
+	pr_out_object_start(dl, "rx");
+	if (tb[DEVLINK_ATTR_STATS_RX_BYTES])
+		pr_out_u64(dl, "bytes",
+			   mnl_attr_get_u64(tb[DEVLINK_ATTR_STATS_RX_BYTES]));
+	if (tb[DEVLINK_ATTR_STATS_RX_PACKETS])
+		pr_out_u64(dl, "packets",
+			   mnl_attr_get_u64(tb[DEVLINK_ATTR_STATS_RX_PACKETS]));
+	pr_out_object_end(dl);
+	pr_out_object_end(dl);
+}
+
 static const char *param_cmode_name(uint8_t cmode)
 {
 	switch (cmode) {
@@ -3712,6 +3828,10 @@ static const char *cmd_name(uint8_t cmd)
 	case DEVLINK_CMD_REGION_SET: return "set";
 	case DEVLINK_CMD_REGION_NEW: return "new";
 	case DEVLINK_CMD_REGION_DEL: return "del";
+	case DEVLINK_CMD_TRAP_GET: return "get";
+	case DEVLINK_CMD_TRAP_SET: return "set";
+	case DEVLINK_CMD_TRAP_NEW: return "new";
+	case DEVLINK_CMD_TRAP_DEL: return "del";
 	default: return "<unknown cmd>";
 	}
 }
@@ -3740,6 +3860,11 @@ static const char *cmd_obj(uint8_t cmd)
 	case DEVLINK_CMD_REGION_NEW:
 	case DEVLINK_CMD_REGION_DEL:
 		return "region";
+	case DEVLINK_CMD_TRAP_GET:
+	case DEVLINK_CMD_TRAP_SET:
+	case DEVLINK_CMD_TRAP_NEW:
+	case DEVLINK_CMD_TRAP_DEL:
+		return "trap";
 	default: return "<unknown obj>";
 	}
 }
@@ -3765,6 +3890,7 @@ static bool cmd_filter_check(struct dl *dl, uint8_t cmd)
 }
 
 static void pr_out_region(struct dl *dl, struct nlattr **tb);
+static void pr_out_trap(struct dl *dl, struct nlattr **tb, bool array);
 
 static int cmd_mon_show_cb(const struct nlmsghdr *nlh, void *data)
 {
@@ -3820,6 +3946,23 @@ static int cmd_mon_show_cb(const struct nlmsghdr *nlh, void *data)
 		pr_out_mon_header(genl->cmd);
 		pr_out_region(dl, tb);
 		break;
+	case DEVLINK_CMD_TRAP_GET: /* fall through */
+	case DEVLINK_CMD_TRAP_SET: /* fall through */
+	case DEVLINK_CMD_TRAP_NEW: /* fall through */
+	case DEVLINK_CMD_TRAP_DEL:
+		mnl_attr_parse(nlh, sizeof(*genl), attr_cb, tb);
+		if (!tb[DEVLINK_ATTR_BUS_NAME] || !tb[DEVLINK_ATTR_DEV_NAME] ||
+		    !tb[DEVLINK_ATTR_TRAP_NAME] ||
+		    !tb[DEVLINK_ATTR_TRAP_TYPE] ||
+		    !tb[DEVLINK_ATTR_TRAP_REPORT_ENABLED] ||
+		    !tb[DEVLINK_ATTR_TRAP_ACTION] ||
+		    !tb[DEVLINK_ATTR_TRAP_GROUP_NAME] ||
+		    !tb[DEVLINK_ATTR_TRAP_METADATA] ||
+		    !tb[DEVLINK_ATTR_STATS])
+			return MNL_CB_ERROR;
+		pr_out_mon_header(genl->cmd);
+		pr_out_trap(dl, tb, false);
+		break;
 	}
 	return MNL_CB_OK;
 }
@@ -3833,7 +3976,8 @@ static int cmd_mon_show(struct dl *dl)
 	while ((cur_obj = dl_argv_index(dl, index++))) {
 		if (strcmp(cur_obj, "all") != 0 &&
 		    strcmp(cur_obj, "dev") != 0 &&
-		    strcmp(cur_obj, "port") != 0) {
+		    strcmp(cur_obj, "port") != 0 &&
+		    strcmp(cur_obj, "trap") != 0) {
 			pr_err("Unknown object \"%s\"\n", cur_obj);
 			return -EINVAL;
 		}
@@ -3850,7 +3994,7 @@ static int cmd_mon_show(struct dl *dl)
 static void cmd_mon_help(void)
 {
 	pr_err("Usage: devlink monitor [ all | OBJECT-LIST ]\n"
-	       "where  OBJECT-LIST := { dev | port }\n");
+	       "where  OBJECT-LIST := { dev | port | trap }\n");
 }
 
 static int cmd_mon(struct dl *dl)
@@ -6273,12 +6417,164 @@ static int cmd_health(struct dl *dl)
 	return -ENOENT;
 }
 
+static const char *trap_type_name(uint8_t type)
+{
+	switch (type) {
+	case DEVLINK_TRAP_TYPE_DROP:
+		return "drop";
+	case DEVLINK_TRAP_TYPE_EXCEPTION:
+		return "exception";
+	default:
+		return "<unknown type>";
+	}
+}
+
+static const char *trap_action_name(uint8_t action)
+{
+	switch (action) {
+	case DEVLINK_TRAP_ACTION_DROP:
+		return "drop";
+	case DEVLINK_TRAP_ACTION_TRAP:
+		return "trap";
+	default:
+		return "<unknown action>";
+	}
+}
+
+static const char *trap_metadata_name(const struct nlattr *attr)
+{
+	switch (attr->nla_type) {
+	case DEVLINK_ATTR_TRAP_METADATA_TYPE_IN_PORT:
+		return "input_port";
+	default:
+		return "<unknown metadata type>";
+	}
+}
+static void pr_out_trap_metadata(struct dl *dl, struct nlattr *attr)
+{
+	struct nlattr *attr_metadata;
+
+	pr_out_array_start(dl, "metadata");
+	mnl_attr_for_each_nested(attr_metadata, attr)
+		pr_out_str_value(dl, trap_metadata_name(attr_metadata));
+	pr_out_array_end(dl);
+}
+
+static void pr_out_trap(struct dl *dl, struct nlattr **tb, bool array)
+{
+	uint8_t action = mnl_attr_get_u8(tb[DEVLINK_ATTR_TRAP_ACTION]);
+	uint8_t type = mnl_attr_get_u8(tb[DEVLINK_ATTR_TRAP_TYPE]);
+
+	if (array)
+		pr_out_handle_start_arr(dl, tb);
+	else
+		__pr_out_handle_start(dl, tb, true, false);
+
+	pr_out_str(dl, "name", mnl_attr_get_str(tb[DEVLINK_ATTR_TRAP_NAME]));
+	pr_out_str(dl, "type", trap_type_name(type));
+	pr_out_bool(dl, "generic", !!tb[DEVLINK_ATTR_TRAP_GENERIC]);
+	pr_out_bool(dl, "report",
+		    !!mnl_attr_get_u8(tb[DEVLINK_ATTR_TRAP_REPORT_ENABLED]));
+	pr_out_str(dl, "action", trap_action_name(action));
+	pr_out_str(dl, "group",
+		   mnl_attr_get_str(tb[DEVLINK_ATTR_TRAP_GROUP_NAME]));
+	if (dl->verbose)
+		pr_out_trap_metadata(dl, tb[DEVLINK_ATTR_TRAP_METADATA]);
+	pr_out_stats(dl, tb[DEVLINK_ATTR_STATS]);
+	pr_out_handle_end(dl);
+}
+
+static int cmd_trap_show_cb(const struct nlmsghdr *nlh, void *data)
+{
+	struct genlmsghdr *genl = mnl_nlmsg_get_payload(nlh);
+	struct nlattr *tb[DEVLINK_ATTR_MAX + 1] = {};
+	struct dl *dl = data;
+
+	mnl_attr_parse(nlh, sizeof(*genl), attr_cb, tb);
+	if (!tb[DEVLINK_ATTR_BUS_NAME] || !tb[DEVLINK_ATTR_DEV_NAME] ||
+	    !tb[DEVLINK_ATTR_TRAP_NAME] || !tb[DEVLINK_ATTR_TRAP_TYPE] ||
+	    !tb[DEVLINK_ATTR_TRAP_REPORT_ENABLED] ||
+	    !tb[DEVLINK_ATTR_TRAP_ACTION] ||
+	    !tb[DEVLINK_ATTR_TRAP_GROUP_NAME] ||
+	    !tb[DEVLINK_ATTR_TRAP_METADATA] || !tb[DEVLINK_ATTR_STATS])
+		return MNL_CB_ERROR;
+
+	pr_out_trap(dl, tb, true);
+
+	return MNL_CB_OK;
+}
+
+static void cmd_trap_help(void)
+{
+	pr_err("Usage: devlink trap set DEV trap TRAP [ report { true | false } ]\n");
+	pr_err("                                      [ action { trap | drop } ]\n");
+	pr_err("       devlink trap show [ DEV trap TRAP ]\n");
+}
+
+static int cmd_trap_show(struct dl *dl)
+{
+	uint16_t flags = NLM_F_REQUEST | NLM_F_ACK;
+	struct nlmsghdr *nlh;
+	int err;
+
+	if (dl_argc(dl) == 0)
+		flags |= NLM_F_DUMP;
+
+	nlh = mnlg_msg_prepare(dl->nlg, DEVLINK_CMD_TRAP_GET, flags);
+
+	if (dl_argc(dl) > 0) {
+		err = dl_argv_parse_put(nlh, dl,
+					DL_OPT_HANDLE | DL_OPT_TRAP_NAME, 0);
+		if (err)
+			return err;
+	}
+
+	pr_out_section_start(dl, "trap");
+	err = _mnlg_socket_sndrcv(dl->nlg, nlh, cmd_trap_show_cb, dl);
+	pr_out_section_end(dl);
+
+	return err;
+}
+
+static int cmd_trap_set(struct dl *dl)
+{
+	struct nlmsghdr *nlh;
+	int err;
+
+	nlh = mnlg_msg_prepare(dl->nlg, DEVLINK_CMD_TRAP_SET,
+			       NLM_F_REQUEST | NLM_F_ACK);
+
+	err = dl_argv_parse_put(nlh, dl, DL_OPT_HANDLE | DL_OPT_TRAP_NAME,
+				DL_OPT_TRAP_REPORT | DL_OPT_TRAP_ACTION);
+	if (err)
+		return err;
+
+	return _mnlg_socket_sndrcv(dl->nlg, nlh, NULL, NULL);
+}
+
+static int cmd_trap(struct dl *dl)
+{
+	if (dl_argv_match(dl, "help")) {
+		cmd_trap_help();
+		return 0;
+	} else if (dl_argv_match(dl, "show") ||
+		   dl_argv_match(dl, "list") || dl_no_arg(dl)) {
+		dl_arg_inc(dl);
+		return cmd_trap_show(dl);
+	} else if (dl_argv_match(dl, "set")) {
+		dl_arg_inc(dl);
+		return cmd_trap_set(dl);
+	}
+	pr_err("Command \"%s\" not found\n", dl_argv(dl));
+	return -ENOENT;
+}
+
 static void help(void)
 {
 	pr_err("Usage: devlink [ OPTIONS ] OBJECT { COMMAND | help }\n"
 	       "       devlink [ -f[orce] ] -b[atch] filename\n"
-	       "where  OBJECT := { dev | port | sb | monitor | dpipe | resource | region | health }\n"
-	       "       OPTIONS := { -V[ersion] | -n[o-nice-names] | -j[son] | -p[retty] | -v[erbose] }\n");
+	       "where  OBJECT := { dev | port | sb | monitor | dpipe | resource | region | health | trap }\n"
+	       "       OPTIONS := { -V[ersion] | -n[o-nice-names] | -j[son] | -p[retty] | -v[erbose] -s[tatistics] }\n");
 }
 
 static int dl_cmd(struct dl *dl, int argc, char **argv)
@@ -6313,6 +6609,9 @@ static int dl_cmd(struct dl *dl, int argc, char **argv)
 	} else if (dl_argv_match(dl, "health")) {
 		dl_arg_inc(dl);
 		return cmd_health(dl);
+	} else if (dl_argv_match(dl, "trap")) {
+		dl_arg_inc(dl);
+		return cmd_trap(dl);
 	}
 	pr_err("Object \"%s\" not found\n", dl_argv(dl));
 	return -ENOENT;
@@ -6422,6 +6721,7 @@ int main(int argc, char **argv)
 		{ "json",		no_argument,		NULL, 'j' },
 		{ "pretty",		no_argument,		NULL, 'p' },
 		{ "verbose",		no_argument,		NULL, 'v' },
+		{ "statistics",		no_argument,		NULL, 's' },
 		{ NULL, 0, NULL, 0 }
 	};
 	const char *batch_file = NULL;
@@ -6437,7 +6737,7 @@ int main(int argc, char **argv)
 		return EXIT_FAILURE;
 	}
 
-	while ((opt = getopt_long(argc, argv, "Vfb:njpv",
+	while ((opt = getopt_long(argc, argv, "Vfb:njpvs",
 				  long_options, NULL)) >= 0) {
 
 		switch (opt) {
@@ -6463,6 +6763,9 @@ int main(int argc, char **argv)
 		case 'v':
 			dl->verbose = true;
 			break;
+		case 's':
+			dl->stats = true;
+			break;
 		default:
 			pr_err("Unknown option.\n");
 			help();
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCH iproute2-next 3/7] devlink: Add devlink trap group set and show commands
  2019-07-07  8:01 ` [PATCH iproute2-next 0/7] " Ido Schimmel
  2019-07-07  8:01   ` [PATCH iproute2-next 1/7] devlink: Increase number of supported options Ido Schimmel
  2019-07-07  8:01   ` [PATCH iproute2-next 2/7] devlink: Add devlink trap set and show commands Ido Schimmel
@ 2019-07-07  8:01   ` Ido Schimmel
  2019-07-07  8:01   ` [PATCH iproute2-next 4/7] devlink: Add devlink trap monitor support Ido Schimmel
                     ` (3 subsequent siblings)
  6 siblings, 0 replies; 48+ messages in thread
From: Ido Schimmel @ 2019-07-07  8:01 UTC (permalink / raw)
  To: netdev
  Cc: davem, jiri, mlxsw, dsahern, roopa, nikolay, andy, pablo,
	jakub.kicinski, pieter.jansenvanvuuren, andrew, f.fainelli,
	vivien.didelot, Ido Schimmel

From: Ido Schimmel <idosch@mellanox.com>

These commands are similar to the trap set and show commands, but
operate on a trap group and not individual traps. Example:

# devlink trap group set netdevsim/netdevsim10 group l3_drops action trap report true
# devlink -jps trap group show netdevsim/netdevsim10 group l3_drops
{
    "trap_group": {
        "netdevsim/netdevsim10": [ {
                "name": "l3_drops",
                "generic": true,
                "stats": {
                    "rx": {
                        "bytes": 0,
                        "packets": 0
                    }
                }
            } ]
    }
}

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
---
 devlink/devlink.c | 136 +++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 134 insertions(+), 2 deletions(-)

diff --git a/devlink/devlink.c b/devlink/devlink.c
index 90f41cbd2a9c..ad62b6162f0b 100644
--- a/devlink/devlink.c
+++ b/devlink/devlink.c
@@ -214,6 +214,7 @@ static void ifname_map_free(struct ifname_map *ifname_map)
 #define DL_OPT_TRAP_NAME		BIT(29)
 #define DL_OPT_TRAP_REPORT		BIT(30)
 #define DL_OPT_TRAP_ACTION		BIT(31)
+#define DL_OPT_TRAP_GROUP_NAME		BIT(32)
 
 struct dl_opts {
 	uint64_t present; /* flags of present items */
@@ -251,6 +252,7 @@ struct dl_opts {
 	uint64_t reporter_graceful_period;
 	bool reporter_auto_recover;
 	const char *trap_name;
+	const char *trap_group_name;
 	bool trap_report;
 	enum devlink_trap_action trap_action;
 };
@@ -1075,6 +1077,7 @@ static const struct dl_args_metadata dl_args_required[] = {
 	{DL_OPT_REGION_LENGTH,	      "Region length value expected."},
 	{DL_OPT_HEALTH_REPORTER_NAME, "Reporter's name is expected."},
 	{DL_OPT_TRAP_NAME,            "Trap's name is expected."},
+	{DL_OPT_TRAP_GROUP_NAME,      "Trap group's name is expected."},
 };
 
 static int dl_args_finding_required_validate(uint64_t o_required,
@@ -1371,6 +1374,13 @@ static int dl_argv_parse(struct dl *dl, uint64_t o_required,
 			if (err)
 				return err;
 			o_found |= DL_OPT_TRAP_NAME;
+		} else if (dl_argv_match(dl, "group") &&
+			   (o_all & DL_OPT_TRAP_GROUP_NAME)) {
+			dl_arg_inc(dl);
+			err = dl_argv_str(dl, &opts->trap_group_name);
+			if (err)
+				return err;
+			o_found |= DL_OPT_TRAP_GROUP_NAME;
 		} else if (dl_argv_match(dl, "report") &&
 			   (o_all & DL_OPT_TRAP_REPORT)) {
 			dl_arg_inc(dl);
@@ -1506,6 +1516,9 @@ static void dl_opts_put(struct nlmsghdr *nlh, struct dl *dl)
 	if (opts->present & DL_OPT_TRAP_NAME)
 		mnl_attr_put_strz(nlh, DEVLINK_ATTR_TRAP_NAME,
 				  opts->trap_name);
+	if (opts->present & DL_OPT_TRAP_GROUP_NAME)
+		mnl_attr_put_strz(nlh, DEVLINK_ATTR_TRAP_GROUP_NAME,
+				  opts->trap_group_name);
 	if (opts->present & DL_OPT_TRAP_REPORT)
 		mnl_attr_put_u8(nlh, DEVLINK_ATTR_TRAP_REPORT_ENABLED,
 				opts->trap_report);
@@ -3832,6 +3845,10 @@ static const char *cmd_name(uint8_t cmd)
 	case DEVLINK_CMD_TRAP_SET: return "set";
 	case DEVLINK_CMD_TRAP_NEW: return "new";
 	case DEVLINK_CMD_TRAP_DEL: return "del";
+	case DEVLINK_CMD_TRAP_GROUP_GET: return "get";
+	case DEVLINK_CMD_TRAP_GROUP_SET: return "set";
+	case DEVLINK_CMD_TRAP_GROUP_NEW: return "new";
+	case DEVLINK_CMD_TRAP_GROUP_DEL: return "del";
 	default: return "<unknown cmd>";
 	}
 }
@@ -3865,6 +3882,11 @@ static const char *cmd_obj(uint8_t cmd)
 	case DEVLINK_CMD_TRAP_NEW:
 	case DEVLINK_CMD_TRAP_DEL:
 		return "trap";
+	case DEVLINK_CMD_TRAP_GROUP_GET:
+	case DEVLINK_CMD_TRAP_GROUP_SET:
+	case DEVLINK_CMD_TRAP_GROUP_NEW:
+	case DEVLINK_CMD_TRAP_GROUP_DEL:
+		return "trap-group";
 	default: return "<unknown obj>";
 	}
 }
@@ -3891,6 +3913,7 @@ static bool cmd_filter_check(struct dl *dl, uint8_t cmd)
 
 static void pr_out_region(struct dl *dl, struct nlattr **tb);
 static void pr_out_trap(struct dl *dl, struct nlattr **tb, bool array);
+static void pr_out_trap_group(struct dl *dl, struct nlattr **tb, bool array);
 
 static int cmd_mon_show_cb(const struct nlmsghdr *nlh, void *data)
 {
@@ -3963,6 +3986,18 @@ static int cmd_mon_show_cb(const struct nlmsghdr *nlh, void *data)
 		pr_out_mon_header(genl->cmd);
 		pr_out_trap(dl, tb, false);
 		break;
+	case DEVLINK_CMD_TRAP_GROUP_GET: /* fall through */
+	case DEVLINK_CMD_TRAP_GROUP_SET: /* fall through */
+	case DEVLINK_CMD_TRAP_GROUP_NEW: /* fall through */
+	case DEVLINK_CMD_TRAP_GROUP_DEL:
+		mnl_attr_parse(nlh, sizeof(*genl), attr_cb, tb);
+		if (!tb[DEVLINK_ATTR_BUS_NAME] || !tb[DEVLINK_ATTR_DEV_NAME] ||
+		    !tb[DEVLINK_ATTR_TRAP_GROUP_NAME] ||
+		    !tb[DEVLINK_ATTR_STATS])
+			return MNL_CB_ERROR;
+		pr_out_mon_header(genl->cmd);
+		pr_out_trap_group(dl, tb, false);
+		break;
 	}
 	return MNL_CB_OK;
 }
@@ -3977,7 +4012,8 @@ static int cmd_mon_show(struct dl *dl)
 		if (strcmp(cur_obj, "all") != 0 &&
 		    strcmp(cur_obj, "dev") != 0 &&
 		    strcmp(cur_obj, "port") != 0 &&
-		    strcmp(cur_obj, "trap") != 0) {
+		    strcmp(cur_obj, "trap") != 0 &&
+		    strcmp(cur_obj, "trap-group") != 0) {
 			pr_err("Unknown object \"%s\"\n", cur_obj);
 			return -EINVAL;
 		}
@@ -3994,7 +4030,7 @@ static int cmd_mon_show(struct dl *dl)
 static void cmd_mon_help(void)
 {
 	pr_err("Usage: devlink monitor [ all | OBJECT-LIST ]\n"
-	       "where  OBJECT-LIST := { dev | port | trap }\n");
+	       "where  OBJECT-LIST := { dev | port | trap | trap-group }\n");
 }
 
 static int cmd_mon(struct dl *dl)
@@ -6509,6 +6545,9 @@ static void cmd_trap_help(void)
 	pr_err("Usage: devlink trap set DEV trap TRAP [ report { true | false } ]\n");
 	pr_err("                                      [ action { trap | drop } ]\n");
 	pr_err("       devlink trap show [ DEV trap TRAP ]\n");
+	pr_err("       devlink trap group set DEV group GROUP [ report { true | false } ]\n");
+	pr_err("                                              [ action { trap | drop } ]\n");
+	pr_err("       devlink trap group show [ DEV group GROUP ]\n");
 }
 
 static int cmd_trap_show(struct dl *dl)
@@ -6552,6 +6591,96 @@ static int cmd_trap_set(struct dl *dl)
 	return _mnlg_socket_sndrcv(dl->nlg, nlh, NULL, NULL);
 }
 
+static void pr_out_trap_group(struct dl *dl, struct nlattr **tb, bool array)
+{
+	if (array)
+		pr_out_handle_start_arr(dl, tb);
+	else
+		__pr_out_handle_start(dl, tb, true, false);
+
+	pr_out_str(dl, "name",
+		   mnl_attr_get_str(tb[DEVLINK_ATTR_TRAP_GROUP_NAME]));
+	pr_out_bool(dl, "generic", !!tb[DEVLINK_ATTR_TRAP_GENERIC]);
+	pr_out_stats(dl, tb[DEVLINK_ATTR_STATS]);
+	pr_out_handle_end(dl);
+}
+
+static int cmd_trap_group_show_cb(const struct nlmsghdr *nlh, void *data)
+{
+	struct genlmsghdr *genl = mnl_nlmsg_get_payload(nlh);
+	struct nlattr *tb[DEVLINK_ATTR_MAX + 1] = {};
+	struct dl *dl = data;
+
+	mnl_attr_parse(nlh, sizeof(*genl), attr_cb, tb);
+	if (!tb[DEVLINK_ATTR_BUS_NAME] || !tb[DEVLINK_ATTR_DEV_NAME] ||
+	    !tb[DEVLINK_ATTR_TRAP_GROUP_NAME] || !tb[DEVLINK_ATTR_STATS])
+		return MNL_CB_ERROR;
+
+	pr_out_trap_group(dl, tb, true);
+
+	return MNL_CB_OK;
+}
+
+static int cmd_trap_group_show(struct dl *dl)
+{
+	uint16_t flags = NLM_F_REQUEST | NLM_F_ACK;
+	struct nlmsghdr *nlh;
+	int err;
+
+	if (dl_argc(dl) == 0)
+		flags |= NLM_F_DUMP;
+
+	nlh = mnlg_msg_prepare(dl->nlg, DEVLINK_CMD_TRAP_GROUP_GET, flags);
+
+	if (dl_argc(dl) > 0) {
+		err = dl_argv_parse_put(nlh, dl,
+					DL_OPT_HANDLE | DL_OPT_TRAP_GROUP_NAME,
+					0);
+		if (err)
+			return err;
+	}
+
+	pr_out_section_start(dl, "trap_group");
+	err = _mnlg_socket_sndrcv(dl->nlg, nlh, cmd_trap_group_show_cb, dl);
+	pr_out_section_end(dl);
+
+	return err;
+}
+
+static int cmd_trap_group_set(struct dl *dl)
+{
+	struct nlmsghdr *nlh;
+	int err;
+
+	nlh = mnlg_msg_prepare(dl->nlg, DEVLINK_CMD_TRAP_GROUP_SET,
+			       NLM_F_REQUEST | NLM_F_ACK);
+
+	err = dl_argv_parse_put(nlh, dl,
+				DL_OPT_HANDLE | DL_OPT_TRAP_GROUP_NAME,
+				DL_OPT_TRAP_REPORT | DL_OPT_TRAP_ACTION);
+	if (err)
+		return err;
+
+	return _mnlg_socket_sndrcv(dl->nlg, nlh, NULL, NULL);
+}
+
+static int cmd_trap_group(struct dl *dl)
+{
+	if (dl_argv_match(dl, "help")) {
+		cmd_trap_help();
+		return 0;
+	} else if (dl_argv_match(dl, "show") ||
+		   dl_argv_match(dl, "list") || dl_no_arg(dl)) {
+		dl_arg_inc(dl);
+		return cmd_trap_group_show(dl);
+	} else if (dl_argv_match(dl, "set")) {
+		dl_arg_inc(dl);
+		return cmd_trap_group_set(dl);
+	}
+	pr_err("Command \"%s\" not found\n", dl_argv(dl));
+	return -ENOENT;
+}
+
 static int cmd_trap(struct dl *dl)
 {
 	if (dl_argv_match(dl, "help")) {
@@ -6564,6 +6693,9 @@ static int cmd_trap(struct dl *dl)
 	} else if (dl_argv_match(dl, "set")) {
 		dl_arg_inc(dl);
 		return cmd_trap_set(dl);
+	} else if (dl_argv_match(dl, "group")) {
+		dl_arg_inc(dl);
+		return cmd_trap_group(dl);
 	}
 	pr_err("Command \"%s\" not found\n", dl_argv(dl));
 	return -ENOENT;
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCH iproute2-next 4/7] devlink: Add devlink trap monitor support
  2019-07-07  8:01 ` [PATCH iproute2-next 0/7] " Ido Schimmel
                     ` (2 preceding siblings ...)
  2019-07-07  8:01   ` [PATCH iproute2-next 3/7] devlink: Add devlink trap group " Ido Schimmel
@ 2019-07-07  8:01   ` Ido Schimmel
  2019-07-07  8:01   ` [PATCH iproute2-next 5/7] devlink: Set NETLINK_NO_ENOBUFS when monitoring events Ido Schimmel
                     ` (2 subsequent siblings)
  6 siblings, 0 replies; 48+ messages in thread
From: Ido Schimmel @ 2019-07-07  8:01 UTC (permalink / raw)
  To: netdev
  Cc: davem, jiri, mlxsw, dsahern, roopa, nikolay, andy, pablo,
	jakub.kicinski, pieter.jansenvanvuuren, andrew, f.fainelli,
	vivien.didelot, Ido Schimmel

From: Ido Schimmel <idosch@mellanox.com>

According to the reporting state of individual traps, a notification is
sent to user space about each trapped packet. Allow the user to monitor
such events by having iproute2 subscribe to the
'DEVLINK_GENL_MCGRP_TRAP_NAME' group. An error is not returned in case
subscription failed in order not to cause regression with old kernels
and new iproute2.

When '-v' is specified trap metadata (e.g., input port) is also shown.
Example:

# devlink -jvp mon trap-report
[trap-report,report]
"netdevsim/netdevsim10": {
    "name": "blackhole_route",
    "type": "drop",
    "group": "l3_drops",
    "length": 146,
    "timestamp": "Tue May 28 13:03:09 2019 801360651 nsec",
    "input_port": {
        "netdevsim/netdevsim10/0": {
            "type": "eth",
            "netdev": "eth0"
        }
    }
}

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
---
 devlink/devlink.c | 79 +++++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 77 insertions(+), 2 deletions(-)

diff --git a/devlink/devlink.c b/devlink/devlink.c
index ad62b6162f0b..b9fce850ee00 100644
--- a/devlink/devlink.c
+++ b/devlink/devlink.c
@@ -3849,6 +3849,7 @@ static const char *cmd_name(uint8_t cmd)
 	case DEVLINK_CMD_TRAP_GROUP_SET: return "set";
 	case DEVLINK_CMD_TRAP_GROUP_NEW: return "new";
 	case DEVLINK_CMD_TRAP_GROUP_DEL: return "del";
+	case DEVLINK_CMD_TRAP_REPORT: return "report";
 	default: return "<unknown cmd>";
 	}
 }
@@ -3887,6 +3888,8 @@ static const char *cmd_obj(uint8_t cmd)
 	case DEVLINK_CMD_TRAP_GROUP_NEW:
 	case DEVLINK_CMD_TRAP_GROUP_DEL:
 		return "trap-group";
+	case DEVLINK_CMD_TRAP_REPORT:
+		return "trap-report";
 	default: return "<unknown obj>";
 	}
 }
@@ -3914,6 +3917,7 @@ static bool cmd_filter_check(struct dl *dl, uint8_t cmd)
 static void pr_out_region(struct dl *dl, struct nlattr **tb);
 static void pr_out_trap(struct dl *dl, struct nlattr **tb, bool array);
 static void pr_out_trap_group(struct dl *dl, struct nlattr **tb, bool array);
+static void pr_out_trap_report(struct dl *dl, struct nlattr **tb);
 
 static int cmd_mon_show_cb(const struct nlmsghdr *nlh, void *data)
 {
@@ -3998,6 +4002,18 @@ static int cmd_mon_show_cb(const struct nlmsghdr *nlh, void *data)
 		pr_out_mon_header(genl->cmd);
 		pr_out_trap_group(dl, tb, false);
 		break;
+	case DEVLINK_CMD_TRAP_REPORT:
+		mnl_attr_parse(nlh, sizeof(*genl), attr_cb, tb);
+		if (!tb[DEVLINK_ATTR_BUS_NAME] || !tb[DEVLINK_ATTR_DEV_NAME] ||
+		    !tb[DEVLINK_ATTR_TRAP_NAME] ||
+		    !tb[DEVLINK_ATTR_TRAP_TYPE] ||
+		    !tb[DEVLINK_ATTR_TRAP_GROUP_NAME] ||
+		    !tb[DEVLINK_ATTR_TRAP_PAYLOAD] ||
+		    !tb[DEVLINK_ATTR_TRAP_TIMESTAMP])
+			return MNL_CB_ERROR;
+		pr_out_mon_header(genl->cmd);
+		pr_out_trap_report(dl, tb);
+		break;
 	}
 	return MNL_CB_OK;
 }
@@ -4013,7 +4029,8 @@ static int cmd_mon_show(struct dl *dl)
 		    strcmp(cur_obj, "dev") != 0 &&
 		    strcmp(cur_obj, "port") != 0 &&
 		    strcmp(cur_obj, "trap") != 0 &&
-		    strcmp(cur_obj, "trap-group") != 0) {
+		    strcmp(cur_obj, "trap-group") != 0 &&
+		    strcmp(cur_obj, "trap-report") != 0) {
 			pr_err("Unknown object \"%s\"\n", cur_obj);
 			return -EINVAL;
 		}
@@ -4021,6 +4038,10 @@ static int cmd_mon_show(struct dl *dl)
 	err = _mnlg_socket_group_add(dl->nlg, DEVLINK_GENL_MCGRP_CONFIG_NAME);
 	if (err)
 		return err;
+	/* Do not bail in order to be compatible with old kernels that do not
+	 * support this multicast group.
+	 */
+	mnlg_socket_group_add(dl->nlg, DEVLINK_GENL_MCGRP_TRAP_NAME);
 	err = _mnlg_socket_recv_run(dl->nlg, cmd_mon_show_cb, dl);
 	if (err)
 		return err;
@@ -4030,7 +4051,7 @@ static int cmd_mon_show(struct dl *dl)
 static void cmd_mon_help(void)
 {
 	pr_err("Usage: devlink monitor [ all | OBJECT-LIST ]\n"
-	       "where  OBJECT-LIST := { dev | port | trap | trap-group }\n");
+	       "where  OBJECT-LIST := { dev | port | trap | trap-group | trap-report }\n");
 }
 
 static int cmd_mon(struct dl *dl)
@@ -6486,6 +6507,60 @@ static const char *trap_metadata_name(const struct nlattr *attr)
 		return "<unknown metadata type>";
 	}
 }
+
+static void pr_out_trap_report_timestamp(struct dl *dl,
+					 const struct nlattr *attr)
+{
+	struct timespec *ts;
+	struct tm *tm;
+	char buf[80];
+	char *tstr;
+
+	ts = mnl_attr_get_payload(attr);
+	tm = localtime(&ts->tv_sec);
+
+	tstr = asctime(tm);
+	tstr[strlen(tstr) - 1] = 0;
+	snprintf(buf, sizeof(buf), "%s %09ld nsec", tstr, ts->tv_nsec);
+
+	pr_out_str(dl, "timestamp", buf);
+}
+
+static void pr_out_trap_report_port(struct dl *dl, struct nlattr *attr,
+				    const char *name, struct nlattr **tb)
+{
+	int err;
+
+	if (!dl->verbose)
+		return;
+
+	err = mnl_attr_parse_nested(attr, attr_cb, tb);
+	if (err != MNL_CB_OK)
+		return;
+
+	pr_out_object_start(dl, name);
+	pr_out_port(dl, tb);
+	pr_out_object_end(dl);
+}
+
+static void pr_out_trap_report(struct dl *dl, struct nlattr **tb)
+{
+	uint8_t type = mnl_attr_get_u8(tb[DEVLINK_ATTR_TRAP_TYPE]);
+
+	__pr_out_handle_start(dl, tb, true, false);
+	pr_out_str(dl, "name", mnl_attr_get_str(tb[DEVLINK_ATTR_TRAP_NAME]));
+	pr_out_str(dl, "type", trap_type_name(type));
+	pr_out_str(dl, "group",
+		   mnl_attr_get_str(tb[DEVLINK_ATTR_TRAP_GROUP_NAME]));
+	pr_out_uint(dl, "length",
+		    mnl_attr_get_payload_len(tb[DEVLINK_ATTR_TRAP_PAYLOAD]));
+	pr_out_trap_report_timestamp(dl, tb[DEVLINK_ATTR_TRAP_TIMESTAMP]);
+	if (tb[DEVLINK_ATTR_TRAP_IN_PORT])
+		pr_out_trap_report_port(dl, tb[DEVLINK_ATTR_TRAP_IN_PORT],
+					"input_port", tb);
+	pr_out_handle_end(dl);
+}
+
 static void pr_out_trap_metadata(struct dl *dl, struct nlattr *attr)
 {
 	struct nlattr *attr_metadata;
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCH iproute2-next 5/7] devlink: Set NETLINK_NO_ENOBUFS when monitoring events
  2019-07-07  8:01 ` [PATCH iproute2-next 0/7] " Ido Schimmel
                     ` (3 preceding siblings ...)
  2019-07-07  8:01   ` [PATCH iproute2-next 4/7] devlink: Add devlink trap monitor support Ido Schimmel
@ 2019-07-07  8:01   ` Ido Schimmel
  2019-07-07  8:01   ` [PATCH iproute2-next 6/7] devlink: Add fflush() to print functions Ido Schimmel
  2019-07-07  8:02   ` [PATCH iproute2-next 7/7] devlink: Add man page for devlink-trap Ido Schimmel
  6 siblings, 0 replies; 48+ messages in thread
From: Ido Schimmel @ 2019-07-07  8:01 UTC (permalink / raw)
  To: netdev
  Cc: davem, jiri, mlxsw, dsahern, roopa, nikolay, andy, pablo,
	jakub.kicinski, pieter.jansenvanvuuren, andrew, f.fainelli,
	vivien.didelot, Ido Schimmel

From: Ido Schimmel <idosch@mellanox.com>

If we lost an event, there is nothing we can do in order to recover it.
Set above mentioned socket option on the netlink socket when monitoring
devlink events, so that `devlink monitor` will not abort in case we are
not draining the receive buffer fast enough.

The number of events we lost can be retrieved using:

# cat /proc/net/netlink | grep `pidof devlink` | awk '{ print $9 }'

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
---
 devlink/devlink.c | 22 +++++++++++++++++++++-
 devlink/mnlg.c    | 12 ++++++++++++
 devlink/mnlg.h    |  2 ++
 3 files changed, 35 insertions(+), 1 deletion(-)

diff --git a/devlink/devlink.c b/devlink/devlink.c
index b9fce850ee00..817b74259ec3 100644
--- a/devlink/devlink.c
+++ b/devlink/devlink.c
@@ -140,6 +140,19 @@ static int _mnlg_socket_group_add(struct mnlg_socket *nlg,
 	return 0;
 }
 
+static int _mnlg_socket_setsockopt(struct mnlg_socket *nlg, int type,
+				   void *buf, socklen_t len)
+{
+	int err;
+
+	err = mnlg_socket_setsockopt(nlg, type, buf, len);
+	if (err < 0) {
+		pr_err("Failed to call mnlg_socket_setsockopt\n");
+		return -errno;
+	}
+	return 0;
+}
+
 struct ifname_map {
 	struct list_head list;
 	char *bus_name;
@@ -4020,7 +4033,7 @@ static int cmd_mon_show_cb(const struct nlmsghdr *nlh, void *data)
 
 static int cmd_mon_show(struct dl *dl)
 {
-	int err;
+	int err, one = 1;
 	unsigned int index = 0;
 	const char *cur_obj;
 
@@ -4035,6 +4048,13 @@ static int cmd_mon_show(struct dl *dl)
 			return -EINVAL;
 		}
 	}
+	/* It is possible to lose some events if we are not draining the socket
+	 * receive buffer fast enough. Keep processing events and do not abort.
+	 */
+	err = _mnlg_socket_setsockopt(dl->nlg, NETLINK_NO_ENOBUFS, &one,
+				      sizeof(one));
+	if (err)
+		return err;
 	err = _mnlg_socket_group_add(dl->nlg, DEVLINK_GENL_MCGRP_CONFIG_NAME);
 	if (err)
 		return err;
diff --git a/devlink/mnlg.c b/devlink/mnlg.c
index ee125df042f0..23e6e794b508 100644
--- a/devlink/mnlg.c
+++ b/devlink/mnlg.c
@@ -231,6 +231,18 @@ int mnlg_socket_group_add(struct mnlg_socket *nlg, const char *group_name)
 	return 0;
 }
 
+int mnlg_socket_setsockopt(struct mnlg_socket *nlg, int type, void *buf,
+			   socklen_t len)
+{
+	int err;
+
+	err = mnl_socket_setsockopt(nlg->nl, type, buf, len);
+	if (err < 0)
+		return err;
+
+	return 0;
+}
+
 static int get_family_id_attr_cb(const struct nlattr *attr, void *data)
 {
 	const struct nlattr **tb = data;
diff --git a/devlink/mnlg.h b/devlink/mnlg.h
index 4d1babf3b4c2..49154215729e 100644
--- a/devlink/mnlg.h
+++ b/devlink/mnlg.h
@@ -21,6 +21,8 @@ struct nlmsghdr *mnlg_msg_prepare(struct mnlg_socket *nlg, uint8_t cmd,
 int mnlg_socket_send(struct mnlg_socket *nlg, const struct nlmsghdr *nlh);
 int mnlg_socket_recv_run(struct mnlg_socket *nlg, mnl_cb_t data_cb, void *data);
 int mnlg_socket_group_add(struct mnlg_socket *nlg, const char *group_name);
+int mnlg_socket_setsockopt(struct mnlg_socket *nlg, int type, void *buf,
+			   socklen_t len);
 struct mnlg_socket *mnlg_socket_open(const char *family_name, uint8_t version);
 void mnlg_socket_close(struct mnlg_socket *nlg);
 
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCH iproute2-next 6/7] devlink: Add fflush() to print functions
  2019-07-07  8:01 ` [PATCH iproute2-next 0/7] " Ido Schimmel
                     ` (4 preceding siblings ...)
  2019-07-07  8:01   ` [PATCH iproute2-next 5/7] devlink: Set NETLINK_NO_ENOBUFS when monitoring events Ido Schimmel
@ 2019-07-07  8:01   ` Ido Schimmel
  2019-07-07  8:02   ` [PATCH iproute2-next 7/7] devlink: Add man page for devlink-trap Ido Schimmel
  6 siblings, 0 replies; 48+ messages in thread
From: Ido Schimmel @ 2019-07-07  8:01 UTC (permalink / raw)
  To: netdev
  Cc: davem, jiri, mlxsw, dsahern, roopa, nikolay, andy, pablo,
	jakub.kicinski, pieter.jansenvanvuuren, andrew, f.fainelli,
	vivien.didelot, Ido Schimmel

From: Ido Schimmel <idosch@mellanox.com>

Similar to commit 09e0528cf977 ("ip: mroute: add fflush to
print_mroute"), we need to add fflush() to the print functions to get
the following to work:

# devlink mon &> log

Without it, stdout output is buffered and not written to the disk.

This is useful when writing tests that rely on devlink-monitor output.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
---
 devlink/devlink.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/devlink/devlink.c b/devlink/devlink.c
index 817b74259ec3..6fa3be69ff1a 100644
--- a/devlink/devlink.c
+++ b/devlink/devlink.c
@@ -58,6 +58,7 @@ static int g_new_line_count;
 		}						\
 		fprintf(stdout, ##args);			\
 		g_new_line_count = 0;				\
+		fflush(stdout);					\
 	} while (0)
 
 #define pr_out_sp(num, args...)					\
@@ -66,6 +67,7 @@ static int g_new_line_count;
 		if (ret < num)					\
 			fprintf(stdout, "%*s", num - ret, "");	\
 		g_new_line_count = 0;				\
+		fflush(stdout);					\
 	} while (0)
 
 static int g_indent_level;
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCH iproute2-next 7/7] devlink: Add man page for devlink-trap
  2019-07-07  8:01 ` [PATCH iproute2-next 0/7] " Ido Schimmel
                     ` (5 preceding siblings ...)
  2019-07-07  8:01   ` [PATCH iproute2-next 6/7] devlink: Add fflush() to print functions Ido Schimmel
@ 2019-07-07  8:02   ` Ido Schimmel
  6 siblings, 0 replies; 48+ messages in thread
From: Ido Schimmel @ 2019-07-07  8:02 UTC (permalink / raw)
  To: netdev
  Cc: davem, jiri, mlxsw, dsahern, roopa, nikolay, andy, pablo,
	jakub.kicinski, pieter.jansenvanvuuren, andrew, f.fainelli,
	vivien.didelot, Ido Schimmel

From: Ido Schimmel <idosch@mellanox.com>

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
---
 man/man8/devlink-monitor.8 |   3 +-
 man/man8/devlink-trap.8    | 166 +++++++++++++++++++++++++++++++++++++
 man/man8/devlink.8         |  11 ++-
 3 files changed, 178 insertions(+), 2 deletions(-)
 create mode 100644 man/man8/devlink-trap.8

diff --git a/man/man8/devlink-monitor.8 b/man/man8/devlink-monitor.8
index 13fe641dc8f5..84f0442aa272 100644
--- a/man/man8/devlink-monitor.8
+++ b/man/man8/devlink-monitor.8
@@ -21,7 +21,7 @@ command is the first in the command line and then the object list.
 .I OBJECT-LIST
 is the list of object types that we want to monitor.
 It may contain
-.BR dev ", " port ".
+.BR dev ", " port ", " trap ", " trap-group ", " trap-report .
 
 .B devlink
 opens Devlink Netlink socket, listens on it and dumps state changes.
@@ -31,6 +31,7 @@ opens Devlink Netlink socket, listens on it and dumps state changes.
 .BR devlink-dev (8),
 .BR devlink-sb (8),
 .BR devlink-port (8),
+.BR devlink-trap (8),
 .br
 
 .SH AUTHOR
diff --git a/man/man8/devlink-trap.8 b/man/man8/devlink-trap.8
new file mode 100644
index 000000000000..8877df484749
--- /dev/null
+++ b/man/man8/devlink-trap.8
@@ -0,0 +1,166 @@
+.TH DEVLINK\-TRAP 8 "11 June 2019" "iproute2" "Linux"
+.SH NAME
+devlink-trap \- devlink trap configuration
+.SH SYNOPSIS
+.sp
+.ad l
+.in +8
+.ti -8
+.B devlink
+.RI "[ " OPTIONS " ]"
+.B trap
+.RI "{ " COMMAND " |"
+.BR help " }"
+.sp
+
+.ti -8
+.IR OPTIONS " := { "
+\fB\-v\fR[\fIerbose\fR] |
+\fB\-s\fR[\fItatistics\fR] }
+
+.ti -8
+.B "devlink trap show"
+.RI "[ " DEV
+.B trap
+.IR TRAP " ]"
+
+.ti -8
+.BI "devlink trap set " DEV " trap " TRAP
+.RB "[ " report " { " true " | " false " } ]"
+.br
+.RB "[ " action " { " trap " | " drop " } ]"
+
+.ti -8
+.B "devlink trap group show"
+.RI "[ " DEV
+.B group
+.IR GROUP " ]"
+
+.ti -8
+.BI "devlink trap group set " DEV " group " GROUP
+.RB "[ " report " { " true " | " false " } ]"
+.br
+.RB "[ " action " { " trap " | " drop " } ]"
+
+.ti -8
+.B devlink trap help
+
+.SH "DESCRIPTION"
+.SS devlink trap show - display available packet traps and their attributes
+
+.PP
+.I "DEV"
+- specifies the devlink device from which to show packet traps.
+If this argument is omitted all packet traps of all devices are listed.
+
+.PP
+.BI "trap " TRAP
+- specifies the packet trap.
+Only applicable if a devlink device is also specified.
+
+.SS devlink trap set - set attributes of a packet trap
+
+.PP
+.I "DEV"
+- specifies the devlink device the packet trap belongs to.
+
+.PP
+.BI "trap " TRAP
+- specifies the packet trap.
+
+.TP
+.BR report " { " true " | " false " } "
+whether to report trapped packets to user space or not.
+
+.TP
+.BR action " { " trap " | " drop " } "
+packet trap action.
+
+.I trap
+- the sole copy of the packet is sent to the CPU.
+
+.I drop
+- the packet is dropped by the underlying device and a copy is not sent to the CPU.
+
+.SS devlink trap group show - display available packet trap groups and their attributes
+
+.PP
+.I "DEV"
+- specifies the devlink device from which to show packet trap groups.
+If this argument is omitted all packet trap groups of all devices are listed.
+
+.PP
+.BI "group " GROUP
+- specifies the packet trap group.
+Only applicable if a devlink device is also specified.
+
+.SS devlink trap group set - set attributes of a packet trap group
+
+.PP
+.I "DEV"
+- specifies the devlink device the packet trap group belongs to.
+
+.PP
+.BI "group " GROUP
+- specifies the packet trap group.
+
+.TP
+.BR report " { " true " | " false " } "
+whether to report trapped packets from traps member in the trap group to user
+space or not.
+
+.TP
+.BR action " { " trap " | " drop " } "
+packet trap action. The action is set for all the packet traps member in the
+trap group. The actions of non-drop traps cannot be changed and are thus
+skipped.
+
+.SH "EXAMPLES"
+.PP
+devlink trap show
+.RS 4
+List available packet traps.
+.RE
+.PP
+devlink trap group show
+.RS 4
+List available packet trap groups.
+.RE
+.PP
+devlink -vs trap show pci/0000:01:00.0 trap source_mac_is_multicast
+.RS 4
+Show attributes and statistics of a specific packet trap.
+.RE
+.PP
+devlink -s trap group show pci/0000:01:00.0 group l2_drops
+.RS 4
+Show attributes and statistics of a specific packet trap group.
+.RE
+.PP
+devlink trap set pci/0000:01:00.0 trap source_mac_is_multicast action trap
+.RS 4
+Set the action of a specific packet trap to 'trap'.
+.RE
+.PP
+devlink trap set pci/0000:01:00.0 trap source_mac_is_multicast report true
+.RS 4
+Set the report status a specific packet trap to 'true'.
+.RE
+.PP
+devlink trap group set pci/0000:01:00.0 group l2_drops report true
+.RS 4
+Set the report status a specific packet trap group to 'true'.
+.RE
+.PP
+devlink -v monitor trap-report
+.RS 4
+Monitor trapped packets.
+
+.SH SEE ALSO
+.BR devlink (8),
+.BR devlink-dev (8),
+.BR devlink-monitor (8),
+.br
+
+.SH AUTHOR
+Ido Schimmel <idosch@mellanox.com>
diff --git a/man/man8/devlink.8 b/man/man8/devlink.8
index 13d4dcd908b3..12d489440a3d 100644
--- a/man/man8/devlink.8
+++ b/man/man8/devlink.8
@@ -7,7 +7,7 @@ devlink \- Devlink tool
 .in +8
 .ti -8
 .B devlink
-.RI "[ " OPTIONS " ] { " dev | port | monitor | sb | resource | region | health " } { " COMMAND " | "
+.RI "[ " OPTIONS " ] { " dev | port | monitor | sb | resource | region | health | trap " } { " COMMAND " | "
 .BR help " }"
 .sp
 
@@ -51,6 +51,10 @@ When combined with -j generate a pretty JSON output.
 .BR "\-v" , " --verbose"
 Turn on verbose output.
 
+.TP
+.BR "\-s" , " --statistics"
+Output statistics.
+
 .SS
 .I OBJECT
 
@@ -82,6 +86,10 @@ Turn on verbose output.
 .B health
 - devlink reporting and recovery
 
+.TP
+.B trap
+- devlink trap configuration
+
 .SS
 .I COMMAND
 
@@ -114,6 +122,7 @@ Exit status is 0 if command was successful or a positive integer upon failure.
 .BR devlink-resource (8),
 .BR devlink-region (8),
 .BR devlink-health (8),
+.BR devlink-trap (8),
 .br
 
 .SH REPORTING BUGS
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [RFC PATCH net-next 0/5] selftests: Add devlink-trap selftests
  2019-07-07  7:58 [PATCH net-next 00/11] Add drop monitor for offloaded data paths Ido Schimmel
                   ` (11 preceding siblings ...)
  2019-07-07  8:01 ` [PATCH iproute2-next 0/7] " Ido Schimmel
@ 2019-07-07  8:03 ` Ido Schimmel
  2019-07-07  8:03   ` [RFC PATCH net-next 1/5] selftests: devlink_trap: Add test cases for devlink-trap Ido Schimmel
                     ` (4 more replies)
  2019-07-07  8:15 ` [PATCH net-next 00/11] Add drop monitor for offloaded data paths Ido Schimmel
  2019-07-07 19:45 ` David Miller
  14 siblings, 5 replies; 48+ messages in thread
From: Ido Schimmel @ 2019-07-07  8:03 UTC (permalink / raw)
  To: netdev
  Cc: davem, jiri, mlxsw, dsahern, roopa, nikolay, andy, pablo,
	jakub.kicinski, pieter.jansenvanvuuren, andrew, f.fainelli,
	vivien.didelot, Ido Schimmel

From: Ido Schimmel <idosch@mellanox.com>

Patch #1 adds test cases for the generic devlink-trap infrastructure
over netdevsim.

Patch #2 mentions the previous patch in the devlink-trap documentation.
This is meant to serve as a reminder for people to add test cases when
the infrastructure is extended.

Patches #3-#5 add test cases over mlxsw.

Ido Schimmel (5):
  selftests: devlink_trap: Add test cases for devlink-trap
  Documentation: Add a section for devlink-trap testing
  selftests: forwarding: devlink_lib: Add devlink-trap helpers
  selftests: mlxsw: Add test cases for devlink-trap L2 drops
  selftests: mlxsw: Add a test case for devlink-trap

 Documentation/networking/devlink-trap.rst     |   9 +
 .../drivers/net/mlxsw/devlink_trap.sh         | 130 +++
 .../net/mlxsw/devlink_trap_l2_drops.sh        | 487 +++++++++++
 tools/testing/selftests/net/Makefile          |   2 +-
 tools/testing/selftests/net/config            |   1 +
 tools/testing/selftests/net/devlink_trap.sh   | 792 ++++++++++++++++++
 .../selftests/net/forwarding/devlink_lib.sh   | 125 +++
 7 files changed, 1545 insertions(+), 1 deletion(-)
 create mode 100755 tools/testing/selftests/drivers/net/mlxsw/devlink_trap.sh
 create mode 100755 tools/testing/selftests/drivers/net/mlxsw/devlink_trap_l2_drops.sh
 create mode 100755 tools/testing/selftests/net/devlink_trap.sh

-- 
2.20.1


^ permalink raw reply	[flat|nested] 48+ messages in thread

* [RFC PATCH net-next 1/5] selftests: devlink_trap: Add test cases for devlink-trap
  2019-07-07  8:03 ` [RFC PATCH net-next 0/5] selftests: Add devlink-trap selftests Ido Schimmel
@ 2019-07-07  8:03   ` Ido Schimmel
  2019-07-07  8:03   ` [RFC PATCH net-next 2/5] Documentation: Add a section for devlink-trap testing Ido Schimmel
                     ` (3 subsequent siblings)
  4 siblings, 0 replies; 48+ messages in thread
From: Ido Schimmel @ 2019-07-07  8:03 UTC (permalink / raw)
  To: netdev
  Cc: davem, jiri, mlxsw, dsahern, roopa, nikolay, andy, pablo,
	jakub.kicinski, pieter.jansenvanvuuren, andrew, f.fainelli,
	vivien.didelot, Ido Schimmel

From: Ido Schimmel <idosch@mellanox.com>

Add test cases for devlink-trap on top of the netdevsim implementation.

The tests focus on the devlink-trap core infrastructure and user space
API. They test both good and bad flows and also dismantle of the netdev
and devlink device used to report trapped packets.

This allows device drivers to focus their tests on device-specific
functionality.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
---
 tools/testing/selftests/net/Makefile        |   2 +-
 tools/testing/selftests/net/config          |   1 +
 tools/testing/selftests/net/devlink_trap.sh | 792 ++++++++++++++++++++
 3 files changed, 794 insertions(+), 1 deletion(-)
 create mode 100755 tools/testing/selftests/net/devlink_trap.sh

diff --git a/tools/testing/selftests/net/Makefile b/tools/testing/selftests/net/Makefile
index 1b24e36b4047..bb9ac949ba0e 100644
--- a/tools/testing/selftests/net/Makefile
+++ b/tools/testing/selftests/net/Makefile
@@ -10,7 +10,7 @@ TEST_PROGS += fib_tests.sh fib-onlink-tests.sh pmtu.sh udpgso.sh ip_defrag.sh
 TEST_PROGS += udpgso_bench.sh fib_rule_tests.sh msg_zerocopy.sh psock_snd.sh
 TEST_PROGS += udpgro_bench.sh udpgro.sh test_vxlan_under_vrf.sh reuseport_addr_any.sh
 TEST_PROGS += test_vxlan_fdb_changelink.sh so_txtime.sh ipv6_flowlabel.sh
-TEST_PROGS += tcp_fastopen_backup_key.sh
+TEST_PROGS += tcp_fastopen_backup_key.sh devlink_trap.sh
 TEST_PROGS_EXTENDED := in_netns.sh
 TEST_GEN_FILES =  socket
 TEST_GEN_FILES += psock_fanout psock_tpacket msg_zerocopy reuseport_addr_any
diff --git a/tools/testing/selftests/net/config b/tools/testing/selftests/net/config
index e4b878d95ba0..615857b89ead 100644
--- a/tools/testing/selftests/net/config
+++ b/tools/testing/selftests/net/config
@@ -28,3 +28,4 @@ CONFIG_NFT_CHAIN_NAT_IPV4=m
 CONFIG_NET_SCH_FQ=m
 CONFIG_NET_SCH_ETF=m
 CONFIG_TEST_BLACKHOLE_DEV=m
+CONFIG_NETDEVSIM=y
diff --git a/tools/testing/selftests/net/devlink_trap.sh b/tools/testing/selftests/net/devlink_trap.sh
new file mode 100755
index 000000000000..504d9dbe9d69
--- /dev/null
+++ b/tools/testing/selftests/net/devlink_trap.sh
@@ -0,0 +1,792 @@
+#!/bin/bash
+# SPDX-License-Identifier: GPL-2.0
+#
+# This test is for checking devlink-trap functionality. It makes use of
+# netdevsim which implements the required callbacks.
+
+##############################################################################
+# Global variables
+
+# Kselftest framework requirement - SKIP code is 4.
+KSFT_SKIP=4
+# Exit status to return at the end. Set in case one of the tests fails.
+EXIT_STATUS=0
+# Per-test return value. Clear at the beginning of each test.
+RET=0
+
+# netdevsim-specific variables.
+NETDEVSIM_PATH=/sys/bus/netdevsim/
+DEV_ADDR=1337
+SLEEP_TIME=.3
+DEVLINK_DEV=""
+NETDEV=""
+DEV=""
+
+##############################################################################
+# Dependencies
+
+if [ "$(id -u)" -ne 0 ]; then
+	echo "SKIP: Need root privileges"
+	exit $KSFT_SKIP
+fi
+
+if [ ! -d "$NETDEVSIM_PATH" ]; then
+	echo "SKIP: No netdevsim support"
+	exit $KSFT_SKIP
+fi
+
+if [ -d "${NETDEVSIM_PATH}/devices/netdevsim${DEV_ADDR}" ]; then
+	echo "SKIP: Device netdevsim${DEV_ADDR} already exists"
+	exit $KSFT_SKIP
+fi
+
+if [ ! -x "$(command -v jq)" ]; then
+	echo "SKIP: Could not run test without jq tool"
+	exit $KSFT_SKIP
+fi
+
+if [ ! -x "$(command -v udevadm)" ]; then
+	echo "SKIP: Could not run test without udevadm tool"
+	exit $KSFT_SKIP
+fi
+
+if [ ! -x "$(command -v ip)" ]; then
+	echo "SKIP: Could not run test without ip tool"
+	exit $KSFT_SKIP
+fi
+
+if [ ! -x "$(command -v devlink)" ]; then
+	echo "SKIP: Could not run test without devlink tool"
+	exit $KSFT_SKIP
+fi
+
+devlink help 2>&1 | grep -q "trap"
+if [ $? -ne 0 ]; then
+	echo "SKIP: iproute2 too old, missing devlink-trap"
+	exit $KSFT_SKIP
+fi
+
+##############################################################################
+# Helpers
+
+check_err()
+{
+	local err=$1; shift
+	local msg=$1; shift
+
+	if [[ $RET -eq 0 && $err -ne 0 ]]; then
+		RET=$err
+		RETMSG=$msg
+	fi
+}
+
+check_fail()
+{
+	local err=$1; shift
+	local msg=$1; shift
+
+	if [[ $RET -eq 0 && $err -eq 0 ]]; then
+		RET=1
+		RETMSG=$msg
+	fi
+}
+
+log_test()
+{
+	local test_name=$1
+	local opt_str=$2
+
+	if [[ $# -eq 2 ]]; then
+		opt_str="($opt_str)"
+	fi
+
+	if [[ $RET -ne 0 ]]; then
+		EXIT_STATUS=1
+		printf "TEST: %-60s  [FAIL]\n" "$test_name $opt_str"
+		if [[ ! -z "$RETMSG" ]]; then
+			printf "\t%s\n" "$RETMSG"
+		fi
+		return 1
+	fi
+
+	printf "TEST: %-60s  [ OK ]\n" "$test_name $opt_str"
+	return 0
+}
+
+netdevsim_dev_create()
+{
+	echo "$DEV_ADDR 0" > ${NETDEVSIM_PATH}/new_device
+}
+
+netdevsim_dev_destroy()
+{
+	echo "$DEV_ADDR" > ${NETDEVSIM_PATH}/del_device
+}
+
+netdevsim_port_create()
+{
+	echo 1 > ${NETDEVSIM_PATH}/devices/${DEV}/new_port
+}
+
+netdevsim_port_destroy()
+{
+	echo 1 > ${NETDEVSIM_PATH}/devices/${DEV}/del_port
+}
+
+##############################################################################
+# Trap helpers
+
+devlink_traps_num_get()
+{
+	devlink -j trap | jq '.[]["'$DEVLINK_DEV'"] | length'
+}
+
+devlink_traps_get()
+{
+	devlink -j trap | jq -r '.[]["'$DEVLINK_DEV'"][].name'
+}
+
+devlink_trap_report_set()
+{
+	local trap_name=$1; shift
+	local report=$1; shift
+
+	# Pipe output to /dev/null to avoid expected warnings.
+	devlink trap set $DEVLINK_DEV trap $trap_name report $report \
+		&> /dev/null
+}
+
+devlink_trap_report_get()
+{
+	local trap_name=$1; shift
+
+	devlink -j trap show $DEVLINK_DEV trap $trap_name \
+		| jq '.[][][].report'
+}
+
+devlink_trap_type_get()
+{
+	local trap_name=$1; shift
+
+	devlink -j trap show $DEVLINK_DEV trap $trap_name \
+		| jq -r '.[][][].type'
+}
+
+devlink_trap_action_set()
+{
+	local trap_name=$1; shift
+	local action=$1; shift
+
+	# Pipe output to /dev/null to avoid expected warnings.
+	devlink trap set $DEVLINK_DEV trap $trap_name \
+		action $action &> /dev/null
+}
+
+devlink_trap_action_get()
+{
+	local trap_name=$1; shift
+
+	devlink -j trap show $DEVLINK_DEV trap $trap_name \
+		| jq -r '.[][][].action'
+}
+
+devlink_trap_group_get()
+{
+	devlink -j trap show $DEVLINK_DEV trap $trap_name \
+		| jq -r '.[][][].group'
+}
+
+devlink_trap_metadata_test()
+{
+	local trap_name=$1; shift
+	local metadata=$1; shift
+
+	devlink -jv trap show $DEVLINK_DEV trap $trap_name \
+		| jq -e '.[][][].metadata | contains(["'$metadata'"])' \
+		&> /dev/null
+}
+
+devlink_trap_rx_packets_get()
+{
+	local trap_name=$1; shift
+
+	devlink -js trap show $DEVLINK_DEV trap $trap_name \
+		| jq '.[][][]["stats"]["rx"]["packets"]'
+}
+
+devlink_trap_rx_bytes_get()
+{
+	local trap_name=$1; shift
+
+	devlink -js trap show $DEVLINK_DEV trap $trap_name \
+		| jq '.[][][]["stats"]["rx"]["bytes"]'
+}
+
+devlink_trap_stats_idle_test()
+{
+	local trap_name=$1; shift
+	local t0_packets t0_bytes
+	local t1_packets t1_bytes
+
+	t0_packets=$(devlink_trap_rx_packets_get $trap_name)
+	t0_bytes=$(devlink_trap_rx_bytes_get $trap_name)
+
+	sleep $SLEEP_TIME
+
+	t1_packets=$(devlink_trap_rx_packets_get $trap_name)
+	t1_bytes=$(devlink_trap_rx_bytes_get $trap_name)
+
+	if [[ $t0_packets -eq $t1_packets && $t0_bytes -eq $t1_bytes ]]; then
+		return 0
+	else
+		return 1
+	fi
+}
+
+devlink_trap_mon_idle_test()
+{
+	local trap_name=$1; shift
+	local devlink_pid
+	local tmp_dir
+	local rc
+
+	tmp_dir="$(mktemp -d)"
+
+	timeout $SLEEP_TIME devlink mon trap-report &> ${tmp_dir}/reports.txt
+
+	grep -e "$DEVLINK_DEV" ${tmp_dir}/reports.txt | grep -q -e "$trap_name"
+
+	# Return true if no reports were found.
+	if [ $? -ne 0 ]; then
+		rc=0
+	else
+		rc=1
+	fi
+
+	rm -rf $tmp_dir
+
+	return $rc
+}
+
+devlink_traps_enable_all()
+{
+	local trap_name
+
+	for trap_name in $(devlink_traps_get); do
+		devlink_trap_report_set $trap_name "true"
+		devlink_trap_action_set $trap_name "trap"
+	done
+}
+
+devlink_traps_disable_all()
+{
+	for trap_name in $(devlink_traps_get); do
+		devlink_trap_report_set $trap_name "false"
+		devlink_trap_action_set $trap_name "drop"
+	done
+}
+
+##############################################################################
+# Trap group helpers
+
+devlink_trap_groups_get()
+{
+	devlink -j trap group | jq -r '.[]["'$DEVLINK_DEV'"][].name'
+}
+
+devlink_trap_group_report_set()
+{
+	local group_name=$1; shift
+	local report=$1; shift
+
+	# Pipe output to /dev/null to avoid expected warnings.
+	devlink trap group set $DEVLINK_DEV group $group_name report $report \
+		&> /dev/null
+}
+
+devlink_trap_group_action_set()
+{
+	local group_name=$1; shift
+	local action=$1; shift
+
+	# Pipe output to /dev/null to avoid expected warnings.
+	devlink trap group set $DEVLINK_DEV group $group_name action $action \
+		&> /dev/null
+}
+
+devlink_trap_group_rx_packets_get()
+{
+	local group_name=$1; shift
+
+	devlink -js trap group show $DEVLINK_DEV group $group_name \
+		| jq '.[][][]["stats"]["rx"]["packets"]'
+}
+
+devlink_trap_group_rx_bytes_get()
+{
+	local group_name=$1; shift
+
+	devlink -js trap group show $DEVLINK_DEV group $group_name \
+		| jq '.[][][]["stats"]["rx"]["bytes"]'
+}
+
+devlink_trap_group_stats_idle_test()
+{
+	local group_name=$1; shift
+	local t0_packets t0_bytes
+	local t1_packets t1_bytes
+
+	t0_packets=$(devlink_trap_group_rx_packets_get $group_name)
+	t0_bytes=$(devlink_trap_group_rx_bytes_get $group_name)
+
+	sleep $SLEEP_TIME
+
+	t1_packets=$(devlink_trap_group_rx_packets_get $group_name)
+	t1_bytes=$(devlink_trap_group_rx_bytes_get $group_name)
+
+	if [[ $t0_packets -eq $t1_packets && $t0_bytes -eq $t1_bytes ]]; then
+		return 0
+	else
+		return 1
+	fi
+}
+
+##############################################################################
+# Initialization
+
+setup_prepare()
+{
+	local netdev
+
+	DEV=netdevsim${DEV_ADDR}
+	DEVLINK_DEV=netdevsim/${DEV}
+
+	netdevsim_dev_create
+
+	if [ ! -d "${NETDEVSIM_PATH}/devices/${DEV}" ]; then
+		echo "Failed to create netdevsim device"
+		exit 1
+	fi
+
+	netdevsim_port_create
+
+	if [ ! -d "${NETDEVSIM_PATH}/devices/${DEV}/net/" ]; then
+		echo "Failed to create netdevsim port"
+		exit 1
+	fi
+
+	# Wait for udev to rename newly created netdev.
+	udevadm settle
+
+	NETDEV=$(ls ${NETDEVSIM_PATH}/devices/${DEV}/net/)
+}
+
+cleanup()
+{
+	netdevsim_port_destroy
+	netdevsim_dev_destroy
+}
+
+##############################################################################
+# Tests
+
+init_test()
+{
+	RET=0
+
+	test $(devlink_traps_num_get) -ne 0
+	check_err $? "No traps were registered"
+
+	devlink -j trap \
+		| jq -e '.[]["'$DEVLINK_DEV'"][] | select (.report == true)'
+	check_fail $? "Traps were not registered with report set to 'false'"
+
+	log_test "Initialization"
+}
+
+trap_report_test()
+{
+	local trap_name
+
+	RET=0
+
+	for trap_name in $(devlink_traps_get); do
+		devlink_trap_report_set $trap_name "true"
+		if [ $(devlink_trap_report_get $trap_name) != "true" ]; then
+			check_err 1 "Trap $trap_name did not enable reporting"
+		fi
+
+		devlink_trap_report_set $trap_name "false"
+		if [ $(devlink_trap_report_get $trap_name) != "false" ]; then
+			check_err 1 "Trap $trap_name did not disable reporting"
+		fi
+	done
+
+	log_test "Trap report"
+}
+
+trap_action_test()
+{
+	local orig_action
+	local trap_name
+	local action
+
+	RET=0
+
+	for trap_name in $(devlink_traps_get); do
+		# The action of non-drop traps cannot be changed.
+		if [ $(devlink_trap_type_get $trap_name) = "drop" ]; then
+			devlink_trap_action_set $trap_name "trap"
+			action=$(devlink_trap_action_get $trap_name)
+			if [ $action != "trap" ]; then
+				check_err 1 "Trap $trap_name did not change action to trap"
+			fi
+
+			devlink_trap_action_set $trap_name "drop"
+			action=$(devlink_trap_action_get $trap_name)
+			if [ $action != "drop" ]; then
+				check_err 1 "Trap $trap_name did not change action to drop"
+			fi
+		else
+			orig_action=$(devlink_trap_action_get $trap_name)
+
+			devlink_trap_action_set $trap_name "trap"
+			action=$(devlink_trap_action_get $trap_name)
+			if [ $action != $orig_action ]; then
+				check_err 1 "Trap $trap_name changed action when should not"
+			fi
+
+			devlink_trap_action_set $trap_name "drop"
+			action=$(devlink_trap_action_get $trap_name)
+			if [ $action != $orig_action ]; then
+				check_err 1 "Trap $trap_name changed action when should not"
+			fi
+		fi
+	done
+
+	log_test "Trap action"
+}
+
+trap_metadata_test()
+{
+	local trap_name
+
+	RET=0
+
+	for trap_name in $(devlink_traps_get); do
+		devlink_trap_metadata_test $trap_name "input_port"
+		check_err $? "Input port not reported as metadata of trap $trap_name"
+	done
+
+	log_test "Trap metadata"
+}
+
+bad_trap_test()
+{
+	RET=0
+
+	devlink_trap_action_set "made_up_trap" "drop"
+	check_fail $? "Did not get an error for non-existing trap"
+
+	log_test "Non-existing trap"
+}
+
+bad_trap_action_test()
+{
+	local traps_arr
+	local trap_name
+
+	RET=0
+
+	# Pick first trap.
+	traps_arr=($(devlink_traps_get))
+	trap_name=${traps_arr[0]}
+
+	devlink_trap_action_set $trap_name "made_up_action"
+	check_fail $? "Did not get an error for non-existing trap action"
+
+	log_test "Non-existing trap action"
+}
+
+bad_trap_report_test()
+{
+	local traps_arr
+	local trap_name
+
+	RET=0
+
+	# Pick first trap.
+	traps_arr=($(devlink_traps_get))
+	trap_name=${traps_arr[0]}
+
+	devlink_trap_report_set $trap_name "made_up_report"
+	check_fail $? "Did not get an error for invalid trap report value"
+
+	log_test "Non-existing trap report value"
+}
+
+trap_stats_test()
+{
+	local trap_name
+
+	RET=0
+
+	for trap_name in $(devlink_traps_get); do
+		devlink_trap_stats_idle_test $trap_name
+		check_err $? "Stats of trap $trap_name not idle when netdev down"
+
+		ip link set dev $NETDEV up
+
+		if [ $(devlink_trap_type_get $trap_name) = "drop" ]; then
+			devlink_trap_action_set $trap_name "trap"
+			devlink_trap_stats_idle_test $trap_name
+			check_fail $? "Stats of trap $trap_name idle when action is trap"
+
+			devlink_trap_action_set $trap_name "drop"
+			devlink_trap_stats_idle_test $trap_name
+			check_err $? "Stats of trap $trap_name not idle when action is drop"
+		else
+			devlink_trap_stats_idle_test $trap_name
+			check_fail $? "Stats of non-drop trap $trap_name idle when should not"
+		fi
+
+		ip link set dev $NETDEV down
+	done
+
+	log_test "Trap statistics"
+}
+
+trap_mon_test()
+{
+	local trap_name
+
+	RET=0
+
+	ip link set dev $NETDEV up
+
+	for trap_name in $(devlink_traps_get); do
+		devlink_trap_mon_idle_test $trap_name
+		check_err $? "Received trap reports for trap $trap_name when reporting disabled"
+
+		devlink_trap_action_set $trap_name "trap"
+		devlink_trap_mon_idle_test $trap_name
+		check_err $? "Received trap reports for trap $trap_name when reporting disabled and action set to trap"
+
+		devlink_trap_report_set $trap_name "true"
+		devlink_trap_mon_idle_test $trap_name
+		check_fail $? "Did not receive trap reports for trap $trap_name when reporting enabled"
+
+		devlink_trap_report_set $trap_name "false"
+		devlink_trap_mon_idle_test $trap_name
+		check_err $? "Received trap reports for trap $trap_name after disabling reporting"
+
+		devlink_trap_action_set $trap_name "drop"
+	done
+
+	ip link set dev $NETDEV down
+
+	log_test "Trap monitor"
+}
+
+trap_group_report_test()
+{
+	local curr_group group_name
+	local trap_name
+	local report
+
+	RET=0
+
+	for group_name in $(devlink_trap_groups_get); do
+		devlink_trap_group_report_set $group_name "true"
+
+		for trap_name in $(devlink_traps_get); do
+			curr_group=$(devlink_trap_group_get $trap_name)
+			if [ $curr_group != $group_name ]; then
+				continue
+			fi
+
+			report=$(devlink_trap_report_get $trap_name)
+			if [ $report != "true" ]; then
+				check_err 1 "Trap $trap_name did not enable reporting"
+			fi
+		done
+
+		devlink_trap_group_report_set $group_name "false"
+
+		for trap_name in $(devlink_traps_get); do
+			curr_group=$(devlink_trap_group_get $trap_name)
+			if [ $curr_group != $group_name ]; then
+				continue
+			fi
+
+			report=$(devlink_trap_report_get $trap_name)
+			if [ $report != "false" ]; then
+				check_err 1 "Trap $trap_name did not disable reporting"
+			fi
+		done
+	done
+
+	log_test "Trap group report"
+}
+
+trap_group_action_test()
+{
+	local curr_group group_name
+	local trap_name
+	local trap_type
+	local action
+
+	RET=0
+
+	for group_name in $(devlink_trap_groups_get); do
+		devlink_trap_group_action_set $group_name "trap"
+
+		for trap_name in $(devlink_traps_get); do
+			curr_group=$(devlink_trap_group_get $trap_name)
+			if [ $curr_group != $group_name ]; then
+				continue
+			fi
+
+			trap_type=$(devlink_trap_type_get $trap_name)
+			if [ $trap_type != "drop" ]; then
+				continue
+			fi
+
+			action=$(devlink_trap_action_get $trap_name)
+			if [ $action != "trap" ]; then
+				check_err 1 "Trap $trap_name did not change action to trap"
+			fi
+		done
+
+		devlink_trap_group_action_set $group_name "drop"
+
+		for trap_name in $(devlink_traps_get); do
+			curr_group=$(devlink_trap_group_get $trap_name)
+			if [ $curr_group != $group_name ]; then
+				continue
+			fi
+
+			trap_type=$(devlink_trap_type_get $trap_name)
+			if [ $trap_type != "drop" ]; then
+				continue
+			fi
+
+			action=$(devlink_trap_action_get $trap_name)
+			if [ $action != "drop" ]; then
+				check_err 1 "Trap $trap_name did not change action to drop"
+			fi
+		done
+	done
+
+	log_test "Trap group action"
+}
+
+bad_trap_group_test()
+{
+	RET=0
+
+	devlink_trap_group_action_set "made_up_trap_group" "drop"
+	check_fail $? "Did not get an error for non-existing trap group"
+
+	log_test "Non-existing trap group"
+}
+
+trap_group_stats_test()
+{
+	local group_name
+
+	RET=0
+
+	for group_name in $(devlink_trap_groups_get); do
+		devlink_trap_group_stats_idle_test $group_name
+		check_err $? "Stats of trap group $group_name not idle when netdev down"
+
+		ip link set dev $NETDEV up
+
+		devlink_trap_group_action_set $group_name "trap"
+		devlink_trap_group_stats_idle_test $group_name
+		check_fail $? "Stats of trap group $group_name idle when action is trap"
+
+		devlink_trap_group_action_set $group_name "drop"
+		ip link set dev $NETDEV down
+	done
+
+	log_test "Trap group statistics"
+}
+
+port_del_test()
+{
+	local group_name
+	local i
+
+	# The test never fails. It is meant to exercise different code paths
+	# and make sure we properly dismantle a port while packets are
+	# in-flight.
+	RET=0
+
+	# Make sure we have packets reported to user space.
+	devlink_traps_enable_all
+
+	for i in $(seq 1 10); do
+		ip link set dev $NETDEV up
+
+		sleep $SLEEP_TIME
+
+		netdevsim_port_destroy
+		netdevsim_port_create
+		udevadm settle
+	done
+
+	devlink_traps_disable_all
+
+	log_test "Port delete"
+}
+
+dev_del_test()
+{
+	local group_name
+	local i
+
+	# The test never fails. It is meant to exercise different code paths
+	# and make sure we properly unregister traps while packets are
+	# in-flight.
+	RET=0
+
+	# Make sure we have packets reported to user space.
+	devlink_traps_enable_all
+
+	for i in $(seq 1 10); do
+		ip link set dev $NETDEV up
+
+		sleep $SLEEP_TIME
+
+		cleanup
+		setup_prepare
+	done
+
+	devlink_traps_disable_all
+
+	log_test "Device delete"
+}
+
+trap cleanup EXIT
+
+# Each test should make sure that initial state is resumed at the end.
+setup_prepare
+init_test
+trap_report_test
+trap_action_test
+trap_metadata_test
+bad_trap_test
+bad_trap_action_test
+bad_trap_report_test
+trap_stats_test
+trap_mon_test
+trap_group_report_test
+trap_group_action_test
+bad_trap_group_test
+trap_group_stats_test
+port_del_test
+dev_del_test
+
+exit $EXIT_STATUS
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [RFC PATCH net-next 2/5] Documentation: Add a section for devlink-trap testing
  2019-07-07  8:03 ` [RFC PATCH net-next 0/5] selftests: Add devlink-trap selftests Ido Schimmel
  2019-07-07  8:03   ` [RFC PATCH net-next 1/5] selftests: devlink_trap: Add test cases for devlink-trap Ido Schimmel
@ 2019-07-07  8:03   ` Ido Schimmel
  2019-07-07  8:03   ` [RFC PATCH net-next 3/5] selftests: forwarding: devlink_lib: Add devlink-trap helpers Ido Schimmel
                     ` (2 subsequent siblings)
  4 siblings, 0 replies; 48+ messages in thread
From: Ido Schimmel @ 2019-07-07  8:03 UTC (permalink / raw)
  To: netdev
  Cc: davem, jiri, mlxsw, dsahern, roopa, nikolay, andy, pablo,
	jakub.kicinski, pieter.jansenvanvuuren, andrew, f.fainelli,
	vivien.didelot, Ido Schimmel

From: Ido Schimmel <idosch@mellanox.com>

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
---
 Documentation/networking/devlink-trap.rst | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/Documentation/networking/devlink-trap.rst b/Documentation/networking/devlink-trap.rst
index 906cdeb0d1f3..b29aac6fe5fd 100644
--- a/Documentation/networking/devlink-trap.rst
+++ b/Documentation/networking/devlink-trap.rst
@@ -188,3 +188,12 @@ following table:
    * - ``buffer_drops``
      - Contains packet traps for packets that were dropped by the device due to
        an enqueue decision
+
+Testing
+=======
+
+See ``tools/testing/selftests/net/devlink_trap.sh`` for a test covering the
+core infrastructure. Test cases should be added for any new functionality.
+
+Device drivers should focus their tests on device-specific functionality, such
+as the triggering of supported packet traps.
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [RFC PATCH net-next 3/5] selftests: forwarding: devlink_lib: Add devlink-trap helpers
  2019-07-07  8:03 ` [RFC PATCH net-next 0/5] selftests: Add devlink-trap selftests Ido Schimmel
  2019-07-07  8:03   ` [RFC PATCH net-next 1/5] selftests: devlink_trap: Add test cases for devlink-trap Ido Schimmel
  2019-07-07  8:03   ` [RFC PATCH net-next 2/5] Documentation: Add a section for devlink-trap testing Ido Schimmel
@ 2019-07-07  8:03   ` Ido Schimmel
  2019-07-07  8:03   ` [RFC PATCH net-next 4/5] selftests: mlxsw: Add test cases for devlink-trap L2 drops Ido Schimmel
  2019-07-07  8:03   ` [RFC PATCH net-next 5/5] selftests: mlxsw: Add a test case for devlink-trap Ido Schimmel
  4 siblings, 0 replies; 48+ messages in thread
From: Ido Schimmel @ 2019-07-07  8:03 UTC (permalink / raw)
  To: netdev
  Cc: davem, jiri, mlxsw, dsahern, roopa, nikolay, andy, pablo,
	jakub.kicinski, pieter.jansenvanvuuren, andrew, f.fainelli,
	vivien.didelot, Ido Schimmel

From: Ido Schimmel <idosch@mellanox.com>

Add helpers to interact with devlink-trap, such as setting the action of
a trap and retrieving statistics.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
---
 .../selftests/net/forwarding/devlink_lib.sh   | 125 ++++++++++++++++++
 1 file changed, 125 insertions(+)

diff --git a/tools/testing/selftests/net/forwarding/devlink_lib.sh b/tools/testing/selftests/net/forwarding/devlink_lib.sh
index 8553a67a2322..101a63f8fc5f 100644
--- a/tools/testing/selftests/net/forwarding/devlink_lib.sh
+++ b/tools/testing/selftests/net/forwarding/devlink_lib.sh
@@ -27,6 +27,12 @@ if [ $? -ne 0 ]; then
 	exit 1
 fi
 
+devlink help 2>&1 | grep "trap" &> /dev/null
+if [ $? -ne 0 ]; then
+	echo "SKIP: iproute2 too old, missing devlink-trap"
+	exit $KSFT_SKIP
+fi
+
 ##############################################################################
 # Devlink helpers
 
@@ -190,3 +196,122 @@ devlink_tc_bind_pool_th_restore()
 	devlink sb tc bind set $port tc $tc type $dir \
 		pool ${orig[0]} th ${orig[1]}
 }
+
+devlink_trap_report_set()
+{
+	local trap_name=$1; shift
+	local report=$1; shift
+
+	# Pipe output to /dev/null to avoid expected warnings.
+	devlink trap set $DEVLINK_DEV trap $trap_name report $report \
+		&> /dev/null
+}
+
+devlink_trap_action_set()
+{
+	local trap_name=$1; shift
+	local action=$1; shift
+
+	# Pipe output to /dev/null to avoid expected warnings.
+	devlink trap set $DEVLINK_DEV trap $trap_name \
+		action $action &> /dev/null
+}
+
+devlink_trap_rx_packets_get()
+{
+	local trap_name=$1; shift
+
+	devlink -js trap show $DEVLINK_DEV trap $trap_name \
+		| jq '.[][][]["stats"]["rx"]["packets"]'
+}
+
+devlink_trap_rx_bytes_get()
+{
+	local trap_name=$1; shift
+
+	devlink -js trap show $DEVLINK_DEV trap $trap_name \
+		| jq '.[][][]["stats"]["rx"]["bytes"]'
+}
+
+devlink_trap_stats_idle_test()
+{
+	local trap_name=$1; shift
+	local t0_packets t0_bytes
+	local t1_packets t1_bytes
+
+	t0_packets=$(devlink_trap_rx_packets_get $trap_name)
+	t0_bytes=$(devlink_trap_rx_bytes_get $trap_name)
+
+	sleep 1
+
+	t1_packets=$(devlink_trap_rx_packets_get $trap_name)
+	t1_bytes=$(devlink_trap_rx_bytes_get $trap_name)
+
+	if [[ $t0_packets -eq $t1_packets && $t0_bytes -eq $t1_bytes ]]; then
+		return 0
+	else
+		return 1
+	fi
+}
+
+devlink_trap_group_rx_packets_get()
+{
+	local group_name=$1; shift
+
+	devlink -js trap group show $DEVLINK_DEV group $group_name \
+		| jq '.[][][]["stats"]["rx"]["packets"]'
+}
+
+devlink_trap_group_rx_bytes_get()
+{
+	local group_name=$1; shift
+
+	devlink -js trap group show $DEVLINK_DEV group $group_name \
+		| jq '.[][][]["stats"]["rx"]["bytes"]'
+}
+
+devlink_trap_group_stats_idle_test()
+{
+	local group_name=$1; shift
+	local t0_packets t0_bytes
+	local t1_packets t1_bytes
+
+	t0_packets=$(devlink_trap_group_rx_packets_get $group_name)
+	t0_bytes=$(devlink_trap_group_rx_bytes_get $group_name)
+
+	sleep 1
+
+	t1_packets=$(devlink_trap_group_rx_packets_get $group_name)
+	t1_bytes=$(devlink_trap_group_rx_bytes_get $group_name)
+
+	if [[ $t0_packets -eq $t1_packets && $t0_bytes -eq $t1_bytes ]]; then
+		return 0
+	else
+		return 1
+	fi
+}
+
+devlink_trap_mon_input_port_test()
+{
+	local trap_name=$1; shift
+	local input_port=$1; shift
+	local devlink_pid
+	local tmp_dir
+	local rc
+
+	tmp_dir="$(mktemp -d)"
+
+	devlink_trap_report_set $trap_name "true"
+	timeout 1 devlink -v mon trap-report &> ${tmp_dir}/reports.txt
+	devlink_trap_report_set $trap_name "false"
+
+	grep -e "$trap_name" ${tmp_dir}/reports.txt -A 2 \
+		| grep -q -e "$input_port"
+
+	# Return false if no reports were found.
+	rc=$?
+
+	rm -rf $tmp_dir
+
+	return $rc
+}
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [RFC PATCH net-next 4/5] selftests: mlxsw: Add test cases for devlink-trap L2 drops
  2019-07-07  8:03 ` [RFC PATCH net-next 0/5] selftests: Add devlink-trap selftests Ido Schimmel
                     ` (2 preceding siblings ...)
  2019-07-07  8:03   ` [RFC PATCH net-next 3/5] selftests: forwarding: devlink_lib: Add devlink-trap helpers Ido Schimmel
@ 2019-07-07  8:03   ` Ido Schimmel
  2019-07-07  8:03   ` [RFC PATCH net-next 5/5] selftests: mlxsw: Add a test case for devlink-trap Ido Schimmel
  4 siblings, 0 replies; 48+ messages in thread
From: Ido Schimmel @ 2019-07-07  8:03 UTC (permalink / raw)
  To: netdev
  Cc: davem, jiri, mlxsw, dsahern, roopa, nikolay, andy, pablo,
	jakub.kicinski, pieter.jansenvanvuuren, andrew, f.fainelli,
	vivien.didelot, Ido Schimmel

From: Ido Schimmel <idosch@mellanox.com>

Test that each supported packet trap is triggered under the right
conditions and that packets are indeed dropped and not forwarded.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
---
 .../net/mlxsw/devlink_trap_l2_drops.sh        | 487 ++++++++++++++++++
 1 file changed, 487 insertions(+)
 create mode 100755 tools/testing/selftests/drivers/net/mlxsw/devlink_trap_l2_drops.sh

diff --git a/tools/testing/selftests/drivers/net/mlxsw/devlink_trap_l2_drops.sh b/tools/testing/selftests/drivers/net/mlxsw/devlink_trap_l2_drops.sh
new file mode 100755
index 000000000000..365a204bd193
--- /dev/null
+++ b/tools/testing/selftests/drivers/net/mlxsw/devlink_trap_l2_drops.sh
@@ -0,0 +1,487 @@
+#!/bin/bash
+# SPDX-License-Identifier: GPL-2.0
+#
+# Test devlink-trap L2 drops functionality over mlxsw. Each registered L2 drop
+# packet trap is tested to make sure it is triggered under the right
+# conditions.
+
+lib_dir=$(dirname $0)/../../../net/forwarding
+
+ALL_TESTS="
+	source_mac_is_multicast_test
+	vlan_tag_mismatch_test
+	ingress_vlan_filter_test
+	ingress_stp_filter_test
+	port_list_is_empty_test
+	port_loopback_filter_test
+"
+NUM_NETIFS=4
+source $lib_dir/tc_common.sh
+source $lib_dir/lib.sh
+source $lib_dir/devlink_lib.sh
+
+h1_create()
+{
+	simple_if_init $h1
+}
+
+h1_destroy()
+{
+	simple_if_fini $h1
+}
+
+h2_create()
+{
+	simple_if_init $h2
+}
+
+h2_destroy()
+{
+	simple_if_fini $h2
+}
+
+switch_create()
+{
+	ip link add dev br0 type bridge vlan_filtering 1 mcast_snooping 0
+
+	ip link set dev $swp1 master br0
+	ip link set dev $swp2 master br0
+
+	ip link set dev br0 up
+	ip link set dev $swp1 up
+	ip link set dev $swp2 up
+
+	tc qdisc add dev $swp2 clsact
+}
+
+switch_destroy()
+{
+	tc qdisc del dev $swp2 clsact
+
+	ip link set dev $swp2 down
+	ip link set dev $swp1 down
+
+	ip link del dev br0
+}
+
+setup_prepare()
+{
+	h1=${NETIFS[p1]}
+	swp1=${NETIFS[p2]}
+
+	swp2=${NETIFS[p3]}
+	h2=${NETIFS[p4]}
+
+	vrf_prepare
+
+	h1_create
+	h2_create
+
+	switch_create
+}
+
+cleanup()
+{
+	pre_cleanup
+
+	switch_destroy
+
+	h2_destroy
+	h1_destroy
+
+	vrf_cleanup
+}
+
+l2_drops_test()
+{
+	local trap_name=$1; shift
+	local group_name=$1; shift
+
+	# This is the common part of all the tests. It checks that stats are
+	# initially idle, then non-idle after changing the trap action and
+	# finally idle again. It also makes sure the packets are dropped and
+	# never forwarded.
+	devlink_trap_stats_idle_test $trap_name
+	check_err $? "Trap stats not idle with initial drop action"
+	devlink_trap_group_stats_idle_test $group_name
+	check_err $? "Trap group stats not idle with initial drop action"
+
+	devlink_trap_action_set $trap_name "trap"
+
+	devlink_trap_stats_idle_test $trap_name
+	check_fail $? "Trap stats idle after setting action to trap"
+	devlink_trap_group_stats_idle_test $group_name
+	check_fail $? "Trap group stats idle after setting action to trap"
+
+	devlink_trap_mon_input_port_test $trap_name $swp1
+	check_err $? "Port $swp1 not reported as input port"
+
+	devlink_trap_action_set $trap_name "drop"
+
+	devlink_trap_stats_idle_test $trap_name
+	check_err $? "Trap stats not idle after setting action to drop"
+	devlink_trap_group_stats_idle_test $group_name
+	check_err $? "Trap group stats not idle after setting action to drop"
+
+	tc_check_packets "dev $swp2 egress" 101 0
+	check_err $? "Packets were not dropped"
+}
+
+l2_drops_cleanup()
+{
+	local mz_pid=$1; shift
+
+	kill $mz_pid && wait $mz_pid &> /dev/null
+	tc filter del dev $swp2 egress protocol ip pref 1 handle 101 flower
+}
+
+source_mac_is_multicast_test()
+{
+	local trap_name="source_mac_is_multicast"
+	local smac=01:02:03:04:05:06
+	local group_name="l2_drops"
+	local mz_pid
+
+	tc filter add dev $swp2 egress protocol ip pref 1 handle 101 \
+		flower src_mac $smac action drop
+
+	$MZ $h1 -c 0 -p 100 -a $smac -b bcast -t ip -d 1msec -q &
+	mz_pid=$!
+
+	RET=0
+
+	l2_drops_test $trap_name $group_name
+
+	log_test "Source MAC is multicast"
+
+	l2_drops_cleanup $mz_pid
+}
+
+__vlan_tag_mismatch_test()
+{
+	local trap_name="vlan_tag_mismatch"
+	local dmac=de:ad:be:ef:13:37
+	local group_name="l2_drops"
+	local opt=$1; shift
+	local mz_pid
+
+	# Remove PVID flag. This should prevent untagged and prio-tagged
+	# packets from entering the bridge.
+	bridge vlan add vid 1 dev $swp1 untagged master
+
+	tc filter add dev $swp2 egress protocol ip pref 1 handle 101 \
+		flower dst_mac $dmac action drop
+
+	$MZ $h1 "$opt" -c 0 -p 100 -a own -b $dmac -t ip -d 1msec -q &
+	mz_pid=$!
+
+	l2_drops_test $trap_name $group_name
+
+	# Add PVID and make sure packets are no longer dropped.
+	bridge vlan add vid 1 dev $swp1 pvid untagged master
+	devlink_trap_action_set $trap_name "trap"
+
+	devlink_trap_stats_idle_test $trap_name
+	check_err $? "Trap stats not idle when packets should not be dropped"
+	devlink_trap_group_stats_idle_test $group_name
+	check_err $? "Trap group stats not idle with when packets should not be dropped"
+
+	tc_check_packets "dev $swp2 egress" 101 0
+	check_fail $? "Packets not forwarded when should"
+
+	devlink_trap_action_set $trap_name "drop"
+
+	l2_drops_cleanup $mz_pid
+}
+
+vlan_tag_mismatch_untagged_test()
+{
+	RET=0
+
+	__vlan_tag_mismatch_test
+
+	log_test "VLAN tag mismatch - untagged packets"
+}
+
+vlan_tag_mismatch_vid_0_test()
+{
+	RET=0
+
+	__vlan_tag_mismatch_test "-Q 0"
+
+	log_test "VLAN tag mismatch - prio-tagged packets"
+}
+
+vlan_tag_mismatch_test()
+{
+	vlan_tag_mismatch_untagged_test
+	vlan_tag_mismatch_vid_0_test
+}
+
+ingress_vlan_filter_test()
+{
+	local trap_name="ingress_vlan_filter"
+	local dmac=de:ad:be:ef:13:37
+	local group_name="l2_drops"
+	local mz_pid
+	local vid=10
+
+	bridge vlan add vid $vid dev $swp2 master
+	# During initialization the firmware enables all the VLAN filters and
+	# the driver does not turn them off since the traffic will be discarded
+	# by the STP filter whose default is DISCARD state. Add the VID on the
+	# ingress bridge port and then remove it to make sure it is not member
+	# in the VLAN.
+	bridge vlan add vid $vid dev $swp1 master
+	bridge vlan del vid $vid dev $swp1 master
+
+	RET=0
+
+	tc filter add dev $swp2 egress protocol ip pref 1 handle 101 \
+		flower dst_mac $dmac action drop
+
+	$MZ $h1 -Q $vid -c 0 -p 100 -a own -b $dmac -t ip -d 1msec -q &
+	mz_pid=$!
+
+	l2_drops_test $trap_name $group_name
+
+	# Add the VLAN on the bridge port and make sure packets are no longer
+	# dropped.
+	bridge vlan add vid $vid dev $swp1 master
+	devlink_trap_action_set $trap_name "trap"
+
+	devlink_trap_stats_idle_test $trap_name
+	check_err $? "Trap stats not idle when packets should not be dropped"
+	devlink_trap_group_stats_idle_test $group_name
+	check_err $? "Trap group stats not idle with when packets should not be dropped"
+
+	tc_check_packets "dev $swp2 egress" 101 0
+	check_fail $? "Packets not forwarded when should"
+
+	devlink_trap_action_set $trap_name "drop"
+
+	log_test "Ingress VLAN filter"
+
+	l2_drops_cleanup $mz_pid
+
+	bridge vlan del vid $vid dev $swp1 master
+	bridge vlan del vid $vid dev $swp2 master
+}
+
+__ingress_stp_filter_test()
+{
+	local trap_name="ingress_spanning_tree_filter"
+	local dmac=de:ad:be:ef:13:37
+	local group_name="l2_drops"
+	local state=$1; shift
+	local mz_pid
+	local vid=20
+
+	bridge vlan add vid $vid dev $swp2 master
+	bridge vlan add vid $vid dev $swp1 master
+	ip link set dev $swp1 type bridge_slave state $state
+
+	tc filter add dev $swp2 egress protocol ip pref 1 handle 101 \
+		flower dst_mac $dmac action drop
+
+	$MZ $h1 -Q $vid -c 0 -p 100 -a own -b $dmac -t ip -d 1msec -q &
+	mz_pid=$!
+
+	l2_drops_test $trap_name $group_name
+
+	# Change STP state to forwarding and make sure packets are no longer
+	# dropped.
+	ip link set dev $swp1 type bridge_slave state 3
+	devlink_trap_action_set $trap_name "trap"
+
+	devlink_trap_stats_idle_test $trap_name
+	check_err $? "Trap stats not idle when packets should not be dropped"
+	devlink_trap_group_stats_idle_test $group_name
+	check_err $? "Trap group stats not idle with when packets should not be dropped"
+
+	tc_check_packets "dev $swp2 egress" 101 0
+	check_fail $? "Packets not forwarded when should"
+
+	devlink_trap_action_set $trap_name "drop"
+
+	l2_drops_cleanup $mz_pid
+
+	bridge vlan del vid $vid dev $swp1 master
+	bridge vlan del vid $vid dev $swp2 master
+}
+
+ingress_stp_filter_listening_test()
+{
+	local state=$1; shift
+
+	RET=0
+
+	__ingress_stp_filter_test $state
+
+	log_test "Ingress STP filter - listening state"
+}
+
+ingress_stp_filter_learning_test()
+{
+	local state=$1; shift
+
+	RET=0
+
+	__ingress_stp_filter_test $state
+
+	log_test "Ingress STP filter - learning state"
+}
+
+ingress_stp_filter_test()
+{
+	ingress_stp_filter_listening_test 1
+	ingress_stp_filter_learning_test 2
+}
+
+port_list_is_empty_uc_test()
+{
+	local trap_name="port_list_is_empty"
+	local dmac=de:ad:be:ef:13:37
+	local group_name="l2_drops"
+	local mz_pid
+
+	# Disable unicast flooding on both ports, so that packets cannot egress
+	# any port.
+	ip link set dev $swp1 type bridge_slave flood off
+	ip link set dev $swp2 type bridge_slave flood off
+
+	RET=0
+
+	tc filter add dev $swp2 egress protocol ip pref 1 handle 101 \
+		flower dst_mac $dmac action drop
+
+	$MZ $h1 -c 0 -p 100 -a own -b $dmac -t ip -d 1msec -q &
+	mz_pid=$!
+
+	l2_drops_test $trap_name $group_name
+
+	# Allow packets to be flooded to one port.
+	ip link set dev $swp2 type bridge_slave flood on
+	devlink_trap_action_set $trap_name "trap"
+
+	devlink_trap_stats_idle_test $trap_name
+	check_err $? "Trap stats not idle when packets should not be dropped"
+	devlink_trap_group_stats_idle_test $group_name
+	check_err $? "Trap group stats not idle with when packets should not be dropped"
+
+	tc_check_packets "dev $swp2 egress" 101 0
+	check_fail $? "Packets not forwarded when should"
+
+	devlink_trap_action_set $trap_name "drop"
+
+	log_test "Port list is empty - unicast"
+
+	l2_drops_cleanup $mz_pid
+
+	ip link set dev $swp1 type bridge_slave flood on
+}
+
+port_list_is_empty_mc_test()
+{
+	local trap_name="port_list_is_empty"
+	local dmac=01:00:5e:00:00:01
+	local group_name="l2_drops"
+	local dip=239.0.0.1
+	local mz_pid
+
+	# Disable multicast flooding on both ports, so that packets cannot
+	# egress any port. We also need to flush IP addresses from the bridge
+	# in order to prevent packets from being flooded to the router port.
+	ip link set dev $swp1 type bridge_slave mcast_flood off
+	ip link set dev $swp2 type bridge_slave mcast_flood off
+	ip address flush dev br0
+
+	RET=0
+
+	tc filter add dev $swp2 egress protocol ip pref 1 handle 101 \
+		flower dst_mac $dmac action drop
+
+	$MZ $h1 -c 0 -p 100 -a own -b $dmac -t ip -B $dip -d 1msec -q &
+	mz_pid=$!
+
+	l2_drops_test $trap_name $group_name
+
+	# Allow packets to be flooded to one port.
+	ip link set dev $swp2 type bridge_slave mcast_flood on
+	devlink_trap_action_set $trap_name "trap"
+
+	devlink_trap_stats_idle_test $trap_name
+	check_err $? "Trap stats not idle when packets should not be dropped"
+	devlink_trap_group_stats_idle_test $group_name
+	check_err $? "Trap group stats not idle with when packets should not be dropped"
+
+	tc_check_packets "dev $swp2 egress" 101 0
+	check_fail $? "Packets not forwarded when should"
+
+	devlink_trap_action_set $trap_name "drop"
+
+	log_test "Port list is empty - multicast"
+
+	l2_drops_cleanup $mz_pid
+
+	ip link set dev $swp1 type bridge_slave mcast_flood on
+}
+
+port_list_is_empty_test()
+{
+	port_list_is_empty_uc_test
+	port_list_is_empty_mc_test
+}
+
+port_loopback_filter_uc_test()
+{
+	local trap_name="port_loopback_filter"
+	local dmac=de:ad:be:ef:13:37
+	local group_name="l2_drops"
+	local mz_pid
+
+	# Make sure packets can only egress the input port.
+	ip link set dev $swp2 type bridge_slave flood off
+
+	RET=0
+
+	tc filter add dev $swp2 egress protocol ip pref 1 handle 101 \
+		flower dst_mac $dmac action drop
+
+	$MZ $h1 -c 0 -p 100 -a own -b $dmac -t ip -d 1msec -q &
+	mz_pid=$!
+
+	l2_drops_test $trap_name $group_name
+
+	# Allow packets to be flooded.
+	ip link set dev $swp2 type bridge_slave flood on
+	devlink_trap_action_set $trap_name "trap"
+
+	devlink_trap_stats_idle_test $trap_name
+	check_err $? "Trap stats not idle when packets should not be dropped"
+	devlink_trap_group_stats_idle_test $group_name
+	check_err $? "Trap group stats not idle with when packets should not be dropped"
+
+	tc_check_packets "dev $swp2 egress" 101 0
+	check_fail $? "Packets not forwarded when should"
+
+	devlink_trap_action_set $trap_name "drop"
+
+	log_test "Port loopback filter - unicast"
+
+	l2_drops_cleanup $mz_pid
+}
+
+port_loopback_filter_test()
+{
+	port_loopback_filter_uc_test
+}
+
+trap cleanup EXIT
+
+setup_prepare
+setup_wait
+
+tests_run
+
+exit $EXIT_STATUS
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [RFC PATCH net-next 5/5] selftests: mlxsw: Add a test case for devlink-trap
  2019-07-07  8:03 ` [RFC PATCH net-next 0/5] selftests: Add devlink-trap selftests Ido Schimmel
                     ` (3 preceding siblings ...)
  2019-07-07  8:03   ` [RFC PATCH net-next 4/5] selftests: mlxsw: Add test cases for devlink-trap L2 drops Ido Schimmel
@ 2019-07-07  8:03   ` Ido Schimmel
  4 siblings, 0 replies; 48+ messages in thread
From: Ido Schimmel @ 2019-07-07  8:03 UTC (permalink / raw)
  To: netdev
  Cc: davem, jiri, mlxsw, dsahern, roopa, nikolay, andy, pablo,
	jakub.kicinski, pieter.jansenvanvuuren, andrew, f.fainelli,
	vivien.didelot, Ido Schimmel

From: Ido Schimmel <idosch@mellanox.com>

Test generic devlink-trap functionality over mlxsw. These tests are not
specific to a single trap, but do not check the devlink-trap common
infrastructure either.

Currently, the only test case is device deletion (by reloading the
driver) while packets are being trapped.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
---
 .../drivers/net/mlxsw/devlink_trap.sh         | 130 ++++++++++++++++++
 1 file changed, 130 insertions(+)
 create mode 100755 tools/testing/selftests/drivers/net/mlxsw/devlink_trap.sh

diff --git a/tools/testing/selftests/drivers/net/mlxsw/devlink_trap.sh b/tools/testing/selftests/drivers/net/mlxsw/devlink_trap.sh
new file mode 100755
index 000000000000..8e1d0dc44256
--- /dev/null
+++ b/tools/testing/selftests/drivers/net/mlxsw/devlink_trap.sh
@@ -0,0 +1,130 @@
+#!/bin/bash
+# SPDX-License-Identifier: GPL-2.0
+#
+# Test generic devlink-trap functionality over mlxsw. These tests are not
+# specific to a single trap, but do not check the devlink-trap common
+# infrastructure either.
+
+lib_dir=$(dirname $0)/../../../net/forwarding
+
+ALL_TESTS="
+	dev_del_test
+"
+NUM_NETIFS=4
+source $lib_dir/tc_common.sh
+source $lib_dir/lib.sh
+source $lib_dir/devlink_lib.sh
+
+h1_create()
+{
+	simple_if_init $h1
+}
+
+h1_destroy()
+{
+	simple_if_fini $h1
+}
+
+h2_create()
+{
+	simple_if_init $h2
+}
+
+h2_destroy()
+{
+	simple_if_fini $h2
+}
+
+switch_create()
+{
+	ip link add dev br0 type bridge vlan_filtering 1 mcast_snooping 0
+
+	ip link set dev $swp1 master br0
+	ip link set dev $swp2 master br0
+
+	ip link set dev br0 up
+	ip link set dev $swp1 up
+	ip link set dev $swp2 up
+}
+
+switch_destroy()
+{
+	ip link set dev $swp2 down
+	ip link set dev $swp1 down
+
+	ip link del dev br0
+}
+
+setup_prepare()
+{
+	h1=${NETIFS[p1]}
+	swp1=${NETIFS[p2]}
+
+	swp2=${NETIFS[p3]}
+	h2=${NETIFS[p4]}
+
+	vrf_prepare
+
+	h1_create
+	h2_create
+
+	switch_create
+}
+
+cleanup()
+{
+	pre_cleanup
+
+	switch_destroy
+
+	h2_destroy
+	h1_destroy
+
+	vrf_cleanup
+}
+
+dev_del_test()
+{
+	local trap_name="source_mac_is_multicast"
+	local smac=01:02:03:04:05:06
+	local num_iter=5
+	local mz_pid
+	local i
+
+	$MZ $h1 -c 0 -p 100 -a $smac -b bcast -t ip -q &
+	mz_pid=$!
+
+	# The purpose of this test is to make sure we correctly dismantle a
+	# port while packets are trapped from it. This is done by reloading the
+	# the driver while the 'ingress_smac_mc_drop' trap is triggered.
+	RET=0
+
+	for i in $(seq 1 $num_iter); do
+		log_info "Iteration $i / $num_iter"
+
+		devlink_trap_action_set $trap_name "trap"
+		devlink_trap_report_set $trap_name "true"
+		sleep 1
+
+		devlink_reload
+		# Allow netdevices to be re-created following the reload
+		sleep 20
+
+		cleanup
+		setup_prepare
+		setup_wait
+	done
+
+	log_test "Device delete"
+
+	kill $mz_pid && wait $mz_pid &> /dev/null
+}
+
+trap cleanup EXIT
+
+setup_prepare
+setup_wait
+
+tests_run
+
+exit $EXIT_STATUS
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 48+ messages in thread

* Re: [PATCH net-next 00/11] Add drop monitor for offloaded data paths
  2019-07-07  7:58 [PATCH net-next 00/11] Add drop monitor for offloaded data paths Ido Schimmel
                   ` (12 preceding siblings ...)
  2019-07-07  8:03 ` [RFC PATCH net-next 0/5] selftests: Add devlink-trap selftests Ido Schimmel
@ 2019-07-07  8:15 ` Ido Schimmel
  2019-07-07 19:45 ` David Miller
  14 siblings, 0 replies; 48+ messages in thread
From: Ido Schimmel @ 2019-07-07  8:15 UTC (permalink / raw)
  To: netdev
  Cc: davem, jiri, mlxsw, dsahern, roopa, nikolay, andy, pablo,
	jakub.kicinski, pieter.jansenvanvuuren, andrew, f.fainelli,
	vivien.didelot, Ido Schimmel

More info that I did not want to put in the cover letter:

Wrote a Wireshark dissector [1] for devlink packets. It dissects various
devlink packets and is especially useful for trap report packets. Both
the metadata about the trapped packet (e.g., trap reason) and also the
trapped Ethernet packet itself are dissected and displayed. Example [6].
Will submit it after the kernel infrastructure is accepted.

Jiri wrote devlink_exporter [2] for Prometheus [3]. This allows one to
export trap statistics as metrics to Prometheus. Eventually, this should
be integrated into node_exporter [4]. We also experimented with
Cloudflare's ebpf_exporter [5], which allows for more fine-grained
statistics (e.g., number of drops per {5-tuple, drop reason}).

I imagine that Wireshark will be mainly used by kernel engineers that
need to pinpoint a specific problem, whereas Prometheus (and similar
time series databases) will be used by network engineers that need to
monitor the network 24x7.

[1] https://github.com/idosch/wireshark/blob/devlink/epan/dissectors/packet-netlink-devlink.c
[2] https://github.com/jpirko/prometheus-devlink-exporter/blob/master/devlink-exporter.py
[3] https://prometheus.io/
[4] https://github.com/prometheus/node_exporter
[5] https://blog.cloudflare.com/introducing-ebpf_exporter/
[6]
Linux netlink (cooked header)
    Link-layer address type: Netlink (824)
    Family: Generic (0x0010)
Linux Generic Netlink protocol
    Netlink message header (type: 0x0013)
        Length: 224
        Family ID: 0x13 (devlink)
        Flags: 0x0000
            .... .... .... ...0 = Request: 0
            .... .... .... ..0. = Multipart message: 0
            .... .... .... .0.. = Ack: 0
            .... .... .... 0... = Echo: 0
            .... .... ...0 .... = Dump inconsistent: 0
            .... .... ..0. .... = Dump filtered: 0
        Sequence: 0
        Port ID: 0
    Command: Trap report (65)
    Family Version: 1
    Reserved
Linux devlink (device netlink) protocol
    Attribute: Bus name: pci
        Len: 8
        Type: 0x0001, Bus name (1)
            0... .... .... .... = Nested: 0
            .0.. .... .... .... = Network byte order: 0
            Attribute type: Bus name (1)
        Bus name: pci
    Attribute: Device name: 0000:01:00.0
        Len: 17
        Type: 0x0002, Device name (2)
            0... .... .... .... = Nested: 0
            .0.. .... .... .... = Network byte order: 0
            Attribute type: Device name (2)
        Device name: 0000:01:00.0
    Attribute: Trap group name: l2_drops
        Len: 13
        Type: 0x0089, Trap group name (137)
            0... .... .... .... = Nested: 0
            .0.. .... .... .... = Network byte order: 0
            Attribute type: Trap group name (137)
        Trap group name: l2_drops
    Attribute: Trap name: source_mac_is_multicast
        Len: 28
        Type: 0x0080, Trap name (128)
            0... .... .... .... = Nested: 0
            .0.. .... .... .... = Network byte order: 0
            Attribute type: Trap name (128)
        Trap name: source_mac_is_multicast
    Attribute: Trap type
        Len: 5
        Type: 0x0083, Trap type (131)
            0... .... .... .... = Nested: 0
            .0.. .... .... .... = Network byte order: 0
            Attribute type: Trap type (131)
        Trap type: Drop (0)
    Attribute: Trap timestamp
        Len: 20
        Type: 0x0086, Trap timestamp (134)
            0... .... .... .... = Nested: 0
            .0.. .... .... .... = Network byte order: 0
            Attribute type: Trap timestamp (134)
        Trap timestamp: Jul  6, 2019 18:16:11.396492223 IDT
    Attribute: Trap input port
        Len: 40
        Type: 0x8087, Nested, Trap input port (135)
            1... .... .... .... = Nested: 1
            .0.. .... .... .... = Network byte order: 0
            Attribute type: Unknown (32903)
        Attribute: Port index: 17
            Len: 8
            Type: 0x0003, Port index (3)
                0... .... .... .... = Nested: 0
                .0.. .... .... .... = Network byte order: 0
                Attribute type: Port index (3)
            Port index: 17
        Attribute: Port type
            Len: 6
            Type: 0x0004, Port type (4)
                0... .... .... .... = Nested: 0
                .0.. .... .... .... = Network byte order: 0
                Attribute type: Port type (4)
            Port type: Ethernet (2)
        Attribute: Net device index: 133
            Len: 8
            Type: 0x0006, Net device index (6)
                0... .... .... .... = Nested: 0
                .0.. .... .... .... = Network byte order: 0
                Attribute type: Net device index (6)
            Port net device index: 133
        Attribute: Net device name: swp3
            Len: 9
            Type: 0x0007, Net device name (7)
                0... .... .... .... = Nested: 0
                .0.. .... .... .... = Network byte order: 0
                Attribute type: Net device name (7)
            Port net device name: swp3
    Attribute: Trap payload
        Len: 64
        Type: 0x0088, Trap payload (136)
            0... .... .... .... = Nested: 0
            .0.. .... .... .... = Network byte order: 0
            Attribute type: Trap payload (136)
        Ethernet II, Src: Woonsang_04:05:06 (01:02:03:04:05:06), Dst: Mellanox_ff:27:d1 (7c:fe:90:ff:27:d1)
            Destination: Mellanox_ff:27:d1 (7c:fe:90:ff:27:d1)
                Address: Mellanox_ff:27:d1 (7c:fe:90:ff:27:d1)
                .... ..0. .... .... .... .... = LG bit: Globally unique address (factory default)
                .... ...0 .... .... .... .... = IG bit: Individual address (unicast)
            Source: Woonsang_04:05:06 (01:02:03:04:05:06)
                [Expert Info (Warning/Protocol): Source MAC must not be a group address: IEEE 802.3-2002, Section 3.2.3(b)]
                    [Source MAC must not be a group address: IEEE 802.3-2002, Section 3.2.3(b)]
                    [Severity level: Warning]
                    [Group: Protocol]
                Address: Woonsang_04:05:06 (01:02:03:04:05:06)
                .... ..0. .... .... .... .... = LG bit: Globally unique address (factory default)
                .... ...1 .... .... .... .... = IG bit: Group address (multicast/broadcast)
            Type: IPv4 (0x0800)
            Trailer: 000000000000000000000000000000000000000000000000…
        Internet Protocol Version 4, Src: 192.0.2.1, Dst: 192.0.2.2
            0100 .... = Version: 4
            .... 0101 = Header Length: 20 bytes (5)
            Differentiated Services Field: 0x00 (DSCP: CS0, ECN: Not-ECT)
                0000 00.. = Differentiated Services Codepoint: Default (0)
                .... ..00 = Explicit Congestion Notification: Not ECN-Capable Transport (0)
            Total Length: 20
            Identification: 0x0000 (0)
            Flags: 0x0000
                0... .... .... .... = Reserved bit: Not set
                .0.. .... .... .... = Don't fragment: Not set
                ..0. .... .... .... = More fragments: Not set
            ...0 0000 0000 0000 = Fragment offset: 0
            Time to live: 255
            Protocol: IPv6 Hop-by-Hop Option (0)
            Header checksum: 0x37e6 [validation disabled]
            [Header checksum status: Unverified]
            Source: 192.0.2.1
            Destination: 192.0.2.2

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH net-next 00/11] Add drop monitor for offloaded data paths
  2019-07-07  7:58 [PATCH net-next 00/11] Add drop monitor for offloaded data paths Ido Schimmel
                   ` (13 preceding siblings ...)
  2019-07-07  8:15 ` [PATCH net-next 00/11] Add drop monitor for offloaded data paths Ido Schimmel
@ 2019-07-07 19:45 ` David Miller
  2019-07-08 13:19   ` Ido Schimmel
  2019-07-11 12:39   ` Ido Schimmel
  14 siblings, 2 replies; 48+ messages in thread
From: David Miller @ 2019-07-07 19:45 UTC (permalink / raw)
  To: idosch
  Cc: netdev, jiri, mlxsw, dsahern, roopa, nikolay, andy, pablo,
	jakub.kicinski, pieter.jansenvanvuuren, andrew, f.fainelli,
	vivien.didelot, idosch

From: Ido Schimmel <idosch@idosch.org>
Date: Sun,  7 Jul 2019 10:58:17 +0300

> Users have several ways to debug the kernel and understand why a packet
> was dropped. For example, using "drop monitor" and "perf". Both
> utilities trace kfree_skb(), which is the function called when a packet
> is freed as part of a failure. The information provided by these tools
> is invaluable when trying to understand the cause of a packet loss.
> 
> In recent years, large portions of the kernel data path were offloaded
> to capable devices. Today, it is possible to perform L2 and L3
> forwarding in hardware, as well as tunneling (IP-in-IP and VXLAN).
> Different TC classifiers and actions are also offloaded to capable
> devices, at both ingress and egress.
> 
> However, when the data path is offloaded it is not possible to achieve
> the same level of introspection as tools such "perf" and "drop monitor"
> become irrelevant.
> 
> This patchset aims to solve this by allowing users to monitor packets
> that the underlying device decided to drop along with relevant metadata
> such as the drop reason and ingress port.

We are now going to have 5 or so ways to capture packets passing through
the system, this is nonsense.

AF_PACKET, kfree_skb drop monitor, perf, XDP perf events, and now this
devlink thing.

This is insanity, too many ways to do the same thing and therefore the
worst possible user experience.

Pick _ONE_ method to trap packets and forward normal kfree_skb events,
XDP perf events, and these taps there too.

I mean really, think about it from the average user's perspective.  To
see all drops/pkts I have to attach a kfree_skb tracepoint, and not just
listen on devlink but configure a special tap thing beforehand and then
if someone is using XDP I gotta setup another perf event buffer capture
thing too.

Sorry, this isn't where we are going.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH net-next 00/11] Add drop monitor for offloaded data paths
  2019-07-07 19:45 ` David Miller
@ 2019-07-08 13:19   ` Ido Schimmel
  2019-07-08 22:51     ` Jakub Kicinski
  2019-07-11 12:39   ` Ido Schimmel
  1 sibling, 1 reply; 48+ messages in thread
From: Ido Schimmel @ 2019-07-08 13:19 UTC (permalink / raw)
  To: David Miller
  Cc: netdev, jiri, mlxsw, dsahern, roopa, nikolay, andy, pablo,
	jakub.kicinski, pieter.jansenvanvuuren, andrew, f.fainelli,
	vivien.didelot, idosch

On Sun, Jul 07, 2019 at 12:45:41PM -0700, David Miller wrote:
> From: Ido Schimmel <idosch@idosch.org>
> Date: Sun,  7 Jul 2019 10:58:17 +0300
> 
> > Users have several ways to debug the kernel and understand why a packet
> > was dropped. For example, using "drop monitor" and "perf". Both
> > utilities trace kfree_skb(), which is the function called when a packet
> > is freed as part of a failure. The information provided by these tools
> > is invaluable when trying to understand the cause of a packet loss.
> > 
> > In recent years, large portions of the kernel data path were offloaded
> > to capable devices. Today, it is possible to perform L2 and L3
> > forwarding in hardware, as well as tunneling (IP-in-IP and VXLAN).
> > Different TC classifiers and actions are also offloaded to capable
> > devices, at both ingress and egress.
> > 
> > However, when the data path is offloaded it is not possible to achieve
> > the same level of introspection as tools such "perf" and "drop monitor"
> > become irrelevant.
> > 
> > This patchset aims to solve this by allowing users to monitor packets
> > that the underlying device decided to drop along with relevant metadata
> > such as the drop reason and ingress port.
> 
> We are now going to have 5 or so ways to capture packets passing through
> the system, this is nonsense.
> 
> AF_PACKET, kfree_skb drop monitor, perf, XDP perf events, and now this
> devlink thing.
> 
> This is insanity, too many ways to do the same thing and therefore the
> worst possible user experience.
> 
> Pick _ONE_ method to trap packets and forward normal kfree_skb events,
> XDP perf events, and these taps there too.
> 
> I mean really, think about it from the average user's perspective.  To
> see all drops/pkts I have to attach a kfree_skb tracepoint, and not just
> listen on devlink but configure a special tap thing beforehand and then
> if someone is using XDP I gotta setup another perf event buffer capture
> thing too.

Let me try to explain again because I probably wasn't clear enough. The
devlink-trap mechanism is not doing the same thing as other solutions.

The packets we are capturing in this patchset are packets that the
kernel (the CPU) never saw up until now - they were silently dropped by
the underlying device performing the packet forwarding instead of the
CPU.

For each such packet we get valuable metadata from the underlying device
such as the drop reason and the ingress port. With time, even more
reasons and metadata could be provided (e.g., egress port, traffic
class). Netlink provides a structured and extensible way to report the
packet along with the metadata to interested users. The tc-sample action
uses a similar concept.

I would like to emphasize that these dropped packets are not injected to
the kernel's receive path and therefore not subject to kfree_skb() and
related infrastructure. There is no need to waste CPU cycles on packets
we already know were dropped (and why). Further, hardware tail/early
drops will not be dropped by the kernel, given its qdiscs are probably
empty.

Regarding the use of devlink, current ASICs can forward packets at
6.4Tb/s. We do not want to overwhelm the CPU with dropped packets and
therefore we give users the ability to control - via devlink - the
trapping of certain packets to the CPU and their reporting to user
space. In the future, devlink-trap can be extended to support the
configuration of the hardware policers of each trap.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH net-next 00/11] Add drop monitor for offloaded data paths
  2019-07-08 13:19   ` Ido Schimmel
@ 2019-07-08 22:51     ` Jakub Kicinski
  2019-07-09 12:38       ` Ido Schimmel
  0 siblings, 1 reply; 48+ messages in thread
From: Jakub Kicinski @ 2019-07-08 22:51 UTC (permalink / raw)
  To: Ido Schimmel
  Cc: David Miller, netdev, jiri, mlxsw, dsahern, roopa, nikolay, andy,
	pablo, pieter.jansenvanvuuren, andrew, f.fainelli,
	vivien.didelot, idosch, Alexei Starovoitov

On Mon, 8 Jul 2019 16:19:08 +0300, Ido Schimmel wrote:
> On Sun, Jul 07, 2019 at 12:45:41PM -0700, David Miller wrote:
> > From: Ido Schimmel <idosch@idosch.org>
> > Date: Sun,  7 Jul 2019 10:58:17 +0300
> >   
> > > Users have several ways to debug the kernel and understand why a packet
> > > was dropped. For example, using "drop monitor" and "perf". Both
> > > utilities trace kfree_skb(), which is the function called when a packet
> > > is freed as part of a failure. The information provided by these tools
> > > is invaluable when trying to understand the cause of a packet loss.
> > > 
> > > In recent years, large portions of the kernel data path were offloaded
> > > to capable devices. Today, it is possible to perform L2 and L3
> > > forwarding in hardware, as well as tunneling (IP-in-IP and VXLAN).
> > > Different TC classifiers and actions are also offloaded to capable
> > > devices, at both ingress and egress.
> > > 
> > > However, when the data path is offloaded it is not possible to achieve
> > > the same level of introspection as tools such "perf" and "drop monitor"
> > > become irrelevant.
> > > 
> > > This patchset aims to solve this by allowing users to monitor packets
> > > that the underlying device decided to drop along with relevant metadata
> > > such as the drop reason and ingress port.  
> > 
> > We are now going to have 5 or so ways to capture packets passing through
> > the system, this is nonsense.
> > 
> > AF_PACKET, kfree_skb drop monitor, perf, XDP perf events, and now this
> > devlink thing.
> > 
> > This is insanity, too many ways to do the same thing and therefore the
> > worst possible user experience.
> > 
> > Pick _ONE_ method to trap packets and forward normal kfree_skb events,
> > XDP perf events, and these taps there too.
> > 
> > I mean really, think about it from the average user's perspective.  To
> > see all drops/pkts I have to attach a kfree_skb tracepoint, and not just
> > listen on devlink but configure a special tap thing beforehand and then
> > if someone is using XDP I gotta setup another perf event buffer capture
> > thing too.  
> 
> Let me try to explain again because I probably wasn't clear enough. The
> devlink-trap mechanism is not doing the same thing as other solutions.
> 
> The packets we are capturing in this patchset are packets that the
> kernel (the CPU) never saw up until now - they were silently dropped by
> the underlying device performing the packet forwarding instead of the
> CPU.

When you say silently dropped do you mean that mlxsw as of today
doesn't have any counters exposed for those events?

If we wanted to consolidate this into something existing we can either
 (a) add similar traps in the kernel data path;
 (b) make these traps extension of statistics.

My knee jerk reaction to seeing the patches was that it adds a new
place where device statistics are reported. Users who want to know why
things are dropped will not get detailed breakdown from ethtool -S which
for better or worse is the one stop shop for device stats today. 

Having thought about it some more, however, I think that having a
forwarding "exception" object and hanging statistics off of it is a
better design, even if we need to deal with some duplication to get
there.

IOW having an way to "trap all packets which would increment a
statistic" (option (b) above) is probably a bad design.

As for (a) I wonder how many of those events have a corresponding event
in the kernel stack? If we could add corresponding trace points and
just feed those from the device driver, that'd obviously be a holy
grail. Not to mention that requiring trace points to be added to the
core would make Alexei happy:

http://vger.kernel.org/netconf2019_files/netconf2019_slides_ast.pdf#page=3

;)

That's my $.02, not very insightful.

> For each such packet we get valuable metadata from the underlying device
> such as the drop reason and the ingress port. With time, even more
> reasons and metadata could be provided (e.g., egress port, traffic
> class). Netlink provides a structured and extensible way to report the
> packet along with the metadata to interested users. The tc-sample action
> uses a similar concept.
> 
> I would like to emphasize that these dropped packets are not injected to
> the kernel's receive path and therefore not subject to kfree_skb() and
> related infrastructure. There is no need to waste CPU cycles on packets
> we already know were dropped (and why). Further, hardware tail/early
> drops will not be dropped by the kernel, given its qdiscs are probably
> empty.
> 
> Regarding the use of devlink, current ASICs can forward packets at
> 6.4Tb/s. We do not want to overwhelm the CPU with dropped packets and
> therefore we give users the ability to control - via devlink - the
> trapping of certain packets to the CPU and their reporting to user
> space. In the future, devlink-trap can be extended to support the
> configuration of the hardware policers of each trap.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH net-next 00/11] Add drop monitor for offloaded data paths
  2019-07-08 22:51     ` Jakub Kicinski
@ 2019-07-09 12:38       ` Ido Schimmel
  2019-07-09 22:34         ` Jakub Kicinski
  0 siblings, 1 reply; 48+ messages in thread
From: Ido Schimmel @ 2019-07-09 12:38 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: David Miller, netdev, jiri, mlxsw, dsahern, roopa, nikolay, andy,
	pablo, pieter.jansenvanvuuren, andrew, f.fainelli,
	vivien.didelot, idosch, Alexei Starovoitov

On Mon, Jul 08, 2019 at 03:51:58PM -0700, Jakub Kicinski wrote:
> On Mon, 8 Jul 2019 16:19:08 +0300, Ido Schimmel wrote:
> > On Sun, Jul 07, 2019 at 12:45:41PM -0700, David Miller wrote:
> > > From: Ido Schimmel <idosch@idosch.org>
> > > Date: Sun,  7 Jul 2019 10:58:17 +0300
> > >   
> > > > Users have several ways to debug the kernel and understand why a packet
> > > > was dropped. For example, using "drop monitor" and "perf". Both
> > > > utilities trace kfree_skb(), which is the function called when a packet
> > > > is freed as part of a failure. The information provided by these tools
> > > > is invaluable when trying to understand the cause of a packet loss.
> > > > 
> > > > In recent years, large portions of the kernel data path were offloaded
> > > > to capable devices. Today, it is possible to perform L2 and L3
> > > > forwarding in hardware, as well as tunneling (IP-in-IP and VXLAN).
> > > > Different TC classifiers and actions are also offloaded to capable
> > > > devices, at both ingress and egress.
> > > > 
> > > > However, when the data path is offloaded it is not possible to achieve
> > > > the same level of introspection as tools such "perf" and "drop monitor"
> > > > become irrelevant.
> > > > 
> > > > This patchset aims to solve this by allowing users to monitor packets
> > > > that the underlying device decided to drop along with relevant metadata
> > > > such as the drop reason and ingress port.  
> > > 
> > > We are now going to have 5 or so ways to capture packets passing through
> > > the system, this is nonsense.
> > > 
> > > AF_PACKET, kfree_skb drop monitor, perf, XDP perf events, and now this
> > > devlink thing.
> > > 
> > > This is insanity, too many ways to do the same thing and therefore the
> > > worst possible user experience.
> > > 
> > > Pick _ONE_ method to trap packets and forward normal kfree_skb events,
> > > XDP perf events, and these taps there too.
> > > 
> > > I mean really, think about it from the average user's perspective.  To
> > > see all drops/pkts I have to attach a kfree_skb tracepoint, and not just
> > > listen on devlink but configure a special tap thing beforehand and then
> > > if someone is using XDP I gotta setup another perf event buffer capture
> > > thing too.  
> > 
> > Let me try to explain again because I probably wasn't clear enough. The
> > devlink-trap mechanism is not doing the same thing as other solutions.
> > 
> > The packets we are capturing in this patchset are packets that the
> > kernel (the CPU) never saw up until now - they were silently dropped by
> > the underlying device performing the packet forwarding instead of the
> > CPU.

Jakub,

It seems to me that most of the criticism is about consolidation of
interfaces because you believe I'm doing something you can already do
today, but this is not the case.

Switch ASICs have dedicated traps for specific packets. Usually, these
packets are control packets (e.g., ARP, BGP) which are required for the
correct functioning of the control plane. You can see this in the SAI
interface, which is an abstraction layer over vendors' SDKs:

https://github.com/opencomputeproject/SAI/blob/master/inc/saihostif.h#L157

We need to be able to configure the hardware policers of these traps and
read their statistics to understand how many packets they dropped. We
currently do not have a way to do any of that and we rely on hardcoded
defaults in the driver which do not fit every use case (from
experience):

https://elixir.bootlin.com/linux/v5.2/source/drivers/net/ethernet/mellanox/mlxsw/spectrum.c#L4103

We plan to extend devlink-trap mechanism to cover all these use cases. I
hope you agree that this functionality belongs in devlink given it is a
device-specific configuration and not a netdev-specific one.

That being said, in its current form, this mechanism is focused on traps
that correlate to packets the device decided to drop as this is very
useful for debugging.

Given that the entire configuration is done via devlink and that devlink
stores all the information about these traps, it seems logical to also
report these packets and their metadata to user space as devlink events.

If this is not desirable, we can try to call into drop_monitor from
devlink and add a new command (e.g., NET_DM_CMD_HW_ALERT), which will
encode all the information we currently have in DEVLINK_CMD_TRAP_REPORT.

IMO, this is less desirable, as instead of having one tool (devlink) to
interact with this mechanism we will need two (devlink & dropwatch).

Below I tried to answer all your questions and refer to all the points
you brought up.

> When you say silently dropped do you mean that mlxsw as of today
> doesn't have any counters exposed for those events?

Some of these packets are counted, but not all of them.

> If we wanted to consolidate this into something existing we can either
>  (a) add similar traps in the kernel data path;
>  (b) make these traps extension of statistics.
> 
> My knee jerk reaction to seeing the patches was that it adds a new
> place where device statistics are reported.

Not at all. This would be a step back. We can already count discards due
to VLAN membership on ingress on a per-port basis. A software maintained
global counter does not buy us anything.

By also getting the dropped packet - coupled with the drop reason and
ingress port - you can understand exactly why and on which VLAN the
packet was dropped. I wrote a Wireshark dissector for these netlink
packets to make our life easier. You can see the details in my comment
to the cover letter:

https://marc.info/?l=linux-netdev&m=156248736710238&w=2

In case you do not care about individual packets, but still want more
fine-grained statistics for your monitoring application, you can use
eBPF. For example, one thing we did is attaching a kprobe to
devlink_trap_report() with an eBPF program that dissects the incoming
skbs and maintains a counter per-{5 tuple, drop reason}. With
ebpf_exporter you can export these statistics to Prometheus on which you
can run queries and visualize the results with Grafana. This is
especially useful for tail and early drops since it allows you to
understand which flows contribute to most of the drops.

> Users who want to know why things are dropped will not get detailed
> breakdown from ethtool -S which for better or worse is the one stop
> shop for device stats today.

I hope I managed to explain why counters are not enough, but I also want
to point out that ethtool statistics are not properly documented and
this hinders their effectiveness. I did my best to document the exposed
traps in order to avoid the same fate:

https://patchwork.ozlabs.org/patch/1128585/

In addition, there are selftests to show how each trap can be triggered
to reduce the ambiguity even further:

https://patchwork.ozlabs.org/patch/1128610/

And a note in the documentation to make sure future functionality is
tested as well:

https://patchwork.ozlabs.org/patch/1128608/

> Having thought about it some more, however, I think that having a
> forwarding "exception" object and hanging statistics off of it is a
> better design, even if we need to deal with some duplication to get
> there.
> 
> IOW having an way to "trap all packets which would increment a
> statistic" (option (b) above) is probably a bad design.
> 
> As for (a) I wonder how many of those events have a corresponding event
> in the kernel stack?

Generic packet drops all have a corresponding kfree_skb() calls in the
kernel, but that does not mean that every packet dropped by the hardware
would also be dropped by the kernel if it were to be injected to its Rx
path. In my reply to Dave I gave buffer drops as an example.

There are also situations in which packets can be dropped due to
device-specific exceptions and these do not have a corresponding drop
reason in the kernel. See example here:

https://patchwork.ozlabs.org/patch/1128587/

> If we could add corresponding trace points and just feed those from
> the device driver, that'd obviously be a holy grail.

Unlike tracepoints, netlink gives you a structured and extensible
interface. For example, in Spectrum-1 we cannot provide the Tx port for
early/tail drops, whereas for Spectrum-2 and later we can. With netlink,
we can just omit the DEVLINK_ATTR_TRAP_OUT_PORT attribute for
Spectrum-1. You also get a programmatic interface that you can query for
this information:

# devlink -v trap show netdevsim/netdevsim10 trap ingress_vlan_filter
netdevsim/netdevsim10:
  name ingress_vlan_filter type drop generic true report false action drop group l2_drops
    metadata:
       input_port

Thanks

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH net-next 00/11] Add drop monitor for offloaded data paths
  2019-07-09 12:38       ` Ido Schimmel
@ 2019-07-09 22:34         ` Jakub Kicinski
  2019-07-10 11:20           ` Ido Schimmel
  0 siblings, 1 reply; 48+ messages in thread
From: Jakub Kicinski @ 2019-07-09 22:34 UTC (permalink / raw)
  To: Ido Schimmel
  Cc: David Miller, netdev, jiri, mlxsw, dsahern, roopa, nikolay, andy,
	pablo, pieter.jansenvanvuuren, andrew, f.fainelli,
	vivien.didelot, idosch, Alexei Starovoitov

On Tue, 9 Jul 2019 15:38:44 +0300, Ido Schimmel wrote:
> On Mon, Jul 08, 2019 at 03:51:58PM -0700, Jakub Kicinski wrote:
> > On Mon, 8 Jul 2019 16:19:08 +0300, Ido Schimmel wrote:  
> > > On Sun, Jul 07, 2019 at 12:45:41PM -0700, David Miller wrote:  
> > > > From: Ido Schimmel <idosch@idosch.org>
> > > > Date: Sun,  7 Jul 2019 10:58:17 +0300
> > > >     
> > > > > Users have several ways to debug the kernel and understand why a packet
> > > > > was dropped. For example, using "drop monitor" and "perf". Both
> > > > > utilities trace kfree_skb(), which is the function called when a packet
> > > > > is freed as part of a failure. The information provided by these tools
> > > > > is invaluable when trying to understand the cause of a packet loss.
> > > > > 
> > > > > In recent years, large portions of the kernel data path were offloaded
> > > > > to capable devices. Today, it is possible to perform L2 and L3
> > > > > forwarding in hardware, as well as tunneling (IP-in-IP and VXLAN).
> > > > > Different TC classifiers and actions are also offloaded to capable
> > > > > devices, at both ingress and egress.
> > > > > 
> > > > > However, when the data path is offloaded it is not possible to achieve
> > > > > the same level of introspection as tools such "perf" and "drop monitor"
> > > > > become irrelevant.
> > > > > 
> > > > > This patchset aims to solve this by allowing users to monitor packets
> > > > > that the underlying device decided to drop along with relevant metadata
> > > > > such as the drop reason and ingress port.    
> > > > 
> > > > We are now going to have 5 or so ways to capture packets passing through
> > > > the system, this is nonsense.
> > > > 
> > > > AF_PACKET, kfree_skb drop monitor, perf, XDP perf events, and now this
> > > > devlink thing.
> > > > 
> > > > This is insanity, too many ways to do the same thing and therefore the
> > > > worst possible user experience.
> > > > 
> > > > Pick _ONE_ method to trap packets and forward normal kfree_skb events,
> > > > XDP perf events, and these taps there too.
> > > > 
> > > > I mean really, think about it from the average user's perspective.  To
> > > > see all drops/pkts I have to attach a kfree_skb tracepoint, and not just
> > > > listen on devlink but configure a special tap thing beforehand and then
> > > > if someone is using XDP I gotta setup another perf event buffer capture
> > > > thing too.    
> > > 
> > > Let me try to explain again because I probably wasn't clear enough. The
> > > devlink-trap mechanism is not doing the same thing as other solutions.
> > > 
> > > The packets we are capturing in this patchset are packets that the
> > > kernel (the CPU) never saw up until now - they were silently dropped by
> > > the underlying device performing the packet forwarding instead of the
> > > CPU.  
> 
> Jakub,
> 
> It seems to me that most of the criticism is about consolidation of
> interfaces because you believe I'm doing something you can already do
> today, but this is not the case.

To be clear I'm not opposed to the patches, I'm just trying to
facilitate a discussion.

> Switch ASICs have dedicated traps for specific packets. Usually, these
> packets are control packets (e.g., ARP, BGP) which are required for the
> correct functioning of the control plane. You can see this in the SAI
> interface, which is an abstraction layer over vendors' SDKs:
> 
> https://github.com/opencomputeproject/SAI/blob/master/inc/saihostif.h#L157
> 
> We need to be able to configure the hardware policers of these traps and
> read their statistics to understand how many packets they dropped. We
> currently do not have a way to do any of that and we rely on hardcoded
> defaults in the driver which do not fit every use case (from
> experience):
> 
> https://elixir.bootlin.com/linux/v5.2/source/drivers/net/ethernet/mellanox/mlxsw/spectrum.c#L4103
> 
> We plan to extend devlink-trap mechanism to cover all these use cases. I
> hope you agree that this functionality belongs in devlink given it is a
> device-specific configuration and not a netdev-specific one.

No disagreement on providing knobs for traps.

> That being said, in its current form, this mechanism is focused on traps
> that correlate to packets the device decided to drop as this is very
> useful for debugging.

That'd be mixing two things - trap configuration and tracing exceptions
in one API. That's a little suboptimal but not too terrible, especially
if there is a higher level APIs users can default to.

> Given that the entire configuration is done via devlink and that devlink
> stores all the information about these traps, it seems logical to also
> report these packets and their metadata to user space as devlink events.
> 
> If this is not desirable, we can try to call into drop_monitor from
> devlink and add a new command (e.g., NET_DM_CMD_HW_ALERT), which will
> encode all the information we currently have in DEVLINK_CMD_TRAP_REPORT.
> 
> IMO, this is less desirable, as instead of having one tool (devlink) to
> interact with this mechanism we will need two (devlink & dropwatch).
> 
> Below I tried to answer all your questions and refer to all the points
> you brought up.
> 
> > When you say silently dropped do you mean that mlxsw as of today
> > doesn't have any counters exposed for those events?  
> 
> Some of these packets are counted, but not all of them.
> 
> > If we wanted to consolidate this into something existing we can either
> >  (a) add similar traps in the kernel data path;
> >  (b) make these traps extension of statistics.
> > 
> > My knee jerk reaction to seeing the patches was that it adds a new
> > place where device statistics are reported.  
> 
> Not at all. This would be a step back. We can already count discards due
> to VLAN membership on ingress on a per-port basis. A software maintained
> global counter does not buy us anything.
> 
> By also getting the dropped packet - coupled with the drop reason and
> ingress port - you can understand exactly why and on which VLAN the
> packet was dropped. I wrote a Wireshark dissector for these netlink
> packets to make our life easier. You can see the details in my comment
> to the cover letter:
> 
> https://marc.info/?l=linux-netdev&m=156248736710238&w=2
> 
> In case you do not care about individual packets, but still want more
> fine-grained statistics for your monitoring application, you can use
> eBPF. For example, one thing we did is attaching a kprobe to
> devlink_trap_report() with an eBPF program that dissects the incoming
> skbs and maintains a counter per-{5 tuple, drop reason}. With
> ebpf_exporter you can export these statistics to Prometheus on which you
> can run queries and visualize the results with Grafana. This is
> especially useful for tail and early drops since it allows you to
> understand which flows contribute to most of the drops.

No question that the mechanism is useful.

> > Users who want to know why things are dropped will not get detailed
> > breakdown from ethtool -S which for better or worse is the one stop
> > shop for device stats today.  
> 
> I hope I managed to explain why counters are not enough, but I also want
> to point out that ethtool statistics are not properly documented and
> this hinders their effectiveness. I did my best to document the exposed
> traps in order to avoid the same fate:
> 
> https://patchwork.ozlabs.org/patch/1128585/
> 
> In addition, there are selftests to show how each trap can be triggered
> to reduce the ambiguity even further:
> 
> https://patchwork.ozlabs.org/patch/1128610/
> 
> And a note in the documentation to make sure future functionality is
> tested as well:
> 
> https://patchwork.ozlabs.org/patch/1128608/
> 
> > Having thought about it some more, however, I think that having a
> > forwarding "exception" object and hanging statistics off of it is a
> > better design, even if we need to deal with some duplication to get
> > there.
> > 
> > IOW having an way to "trap all packets which would increment a
> > statistic" (option (b) above) is probably a bad design.
> > 
> > As for (a) I wonder how many of those events have a corresponding event
> > in the kernel stack?  
> 
> Generic packet drops all have a corresponding kfree_skb() calls in the
> kernel, but that does not mean that every packet dropped by the hardware
> would also be dropped by the kernel if it were to be injected to its Rx
> path.

The notion that all SW events get captured by kfree_skb() would not be
correct. We have the kfree_skb(), and xdp_exception(), and drivers can
drop packets if various allocations fail.. the situation is already not
great.

I think that having a single useful place where users can look to see
all traffic exception events would go a long way. Software side as I
mentioned is pretty brutal, IDK how many users are actually willing to
decode stack traces to figure out why their system is dropping
packets :/  

> In my reply to Dave I gave buffer drops as an example.

The example of buffer drops is also probably the case where having the
packet is least useful, but yes, I definitely agree devices need a way
of reporting events that can't happen in SW.

> There are also situations in which packets can be dropped due to
> device-specific exceptions and these do not have a corresponding drop
> reason in the kernel. See example here:
> 
> https://patchwork.ozlabs.org/patch/1128587/
> 
> > If we could add corresponding trace points and just feed those from
> > the device driver, that'd obviously be a holy grail.  
> 
> Unlike tracepoints, netlink gives you a structured and extensible
> interface. For example, in Spectrum-1 we cannot provide the Tx port for
> early/tail drops, whereas for Spectrum-2 and later we can. With netlink,
> we can just omit the DEVLINK_ATTR_TRAP_OUT_PORT attribute for
> Spectrum-1. You also get a programmatic interface that you can query for
> this information:
> 
> # devlink -v trap show netdevsim/netdevsim10 trap ingress_vlan_filter
> netdevsim/netdevsim10:
>   name ingress_vlan_filter type drop generic true report false action drop group l2_drops
>     metadata:
>        input_port

Right, you can set or not set skb fields to some extent but its
definitely not as flexible as netlink.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH net-next 00/11] Add drop monitor for offloaded data paths
  2019-07-09 22:34         ` Jakub Kicinski
@ 2019-07-10 11:20           ` Ido Schimmel
  2019-07-10 11:39             ` Toke Høiland-Jørgensen
  0 siblings, 1 reply; 48+ messages in thread
From: Ido Schimmel @ 2019-07-10 11:20 UTC (permalink / raw)
  To: Jakub Kicinski
  Cc: David Miller, netdev, jiri, mlxsw, dsahern, roopa, nikolay, andy,
	pablo, pieter.jansenvanvuuren, andrew, f.fainelli,
	vivien.didelot, idosch, Alexei Starovoitov

On Tue, Jul 09, 2019 at 03:34:30PM -0700, Jakub Kicinski wrote:
> On Tue, 9 Jul 2019 15:38:44 +0300, Ido Schimmel wrote:
> > On Mon, Jul 08, 2019 at 03:51:58PM -0700, Jakub Kicinski wrote:
> > > On Mon, 8 Jul 2019 16:19:08 +0300, Ido Schimmel wrote:  
> > > > On Sun, Jul 07, 2019 at 12:45:41PM -0700, David Miller wrote:  
> > > > > From: Ido Schimmel <idosch@idosch.org>
> > > > > Date: Sun,  7 Jul 2019 10:58:17 +0300
> > > > >     
> > > > > > Users have several ways to debug the kernel and understand why a packet
> > > > > > was dropped. For example, using "drop monitor" and "perf". Both
> > > > > > utilities trace kfree_skb(), which is the function called when a packet
> > > > > > is freed as part of a failure. The information provided by these tools
> > > > > > is invaluable when trying to understand the cause of a packet loss.
> > > > > > 
> > > > > > In recent years, large portions of the kernel data path were offloaded
> > > > > > to capable devices. Today, it is possible to perform L2 and L3
> > > > > > forwarding in hardware, as well as tunneling (IP-in-IP and VXLAN).
> > > > > > Different TC classifiers and actions are also offloaded to capable
> > > > > > devices, at both ingress and egress.
> > > > > > 
> > > > > > However, when the data path is offloaded it is not possible to achieve
> > > > > > the same level of introspection as tools such "perf" and "drop monitor"
> > > > > > become irrelevant.
> > > > > > 
> > > > > > This patchset aims to solve this by allowing users to monitor packets
> > > > > > that the underlying device decided to drop along with relevant metadata
> > > > > > such as the drop reason and ingress port.    
> > > > > 
> > > > > We are now going to have 5 or so ways to capture packets passing through
> > > > > the system, this is nonsense.
> > > > > 
> > > > > AF_PACKET, kfree_skb drop monitor, perf, XDP perf events, and now this
> > > > > devlink thing.
> > > > > 
> > > > > This is insanity, too many ways to do the same thing and therefore the
> > > > > worst possible user experience.
> > > > > 
> > > > > Pick _ONE_ method to trap packets and forward normal kfree_skb events,
> > > > > XDP perf events, and these taps there too.
> > > > > 
> > > > > I mean really, think about it from the average user's perspective.  To
> > > > > see all drops/pkts I have to attach a kfree_skb tracepoint, and not just
> > > > > listen on devlink but configure a special tap thing beforehand and then
> > > > > if someone is using XDP I gotta setup another perf event buffer capture
> > > > > thing too.    
> > > > 
> > > > Let me try to explain again because I probably wasn't clear enough. The
> > > > devlink-trap mechanism is not doing the same thing as other solutions.
> > > > 
> > > > The packets we are capturing in this patchset are packets that the
> > > > kernel (the CPU) never saw up until now - they were silently dropped by
> > > > the underlying device performing the packet forwarding instead of the
> > > > CPU.  
> > 
> > Jakub,
> > 
> > It seems to me that most of the criticism is about consolidation of
> > interfaces because you believe I'm doing something you can already do
> > today, but this is not the case.
> 
> To be clear I'm not opposed to the patches, I'm just trying to
> facilitate a discussion.

Sure, sorry if it came out the wrong way. I appreciate your feedback and
the time you have spent on this subject.

> 
> > Switch ASICs have dedicated traps for specific packets. Usually, these
> > packets are control packets (e.g., ARP, BGP) which are required for the
> > correct functioning of the control plane. You can see this in the SAI
> > interface, which is an abstraction layer over vendors' SDKs:
> > 
> > https://github.com/opencomputeproject/SAI/blob/master/inc/saihostif.h#L157
> > 
> > We need to be able to configure the hardware policers of these traps and
> > read their statistics to understand how many packets they dropped. We
> > currently do not have a way to do any of that and we rely on hardcoded
> > defaults in the driver which do not fit every use case (from
> > experience):
> > 
> > https://elixir.bootlin.com/linux/v5.2/source/drivers/net/ethernet/mellanox/mlxsw/spectrum.c#L4103
> > 
> > We plan to extend devlink-trap mechanism to cover all these use cases. I
> > hope you agree that this functionality belongs in devlink given it is a
> > device-specific configuration and not a netdev-specific one.
> 
> No disagreement on providing knobs for traps.
> 
> > That being said, in its current form, this mechanism is focused on traps
> > that correlate to packets the device decided to drop as this is very
> > useful for debugging.
> 
> That'd be mixing two things - trap configuration and tracing exceptions
> in one API. That's a little suboptimal but not too terrible, especially
> if there is a higher level APIs users can default to.

TBH, initially I was only focused on the drops, but then it occurred to
me that this is a too narrow scope. These traps are only a subset of the
complete list of traps we have and we have similar requirements for both
(statistics, setting policers etc.). Therefore, I decided to design this
interface in a more generic way, so that it could support the different
use cases.

> 
> > Given that the entire configuration is done via devlink and that devlink
> > stores all the information about these traps, it seems logical to also
> > report these packets and their metadata to user space as devlink events.
> > 
> > If this is not desirable, we can try to call into drop_monitor from
> > devlink and add a new command (e.g., NET_DM_CMD_HW_ALERT), which will
> > encode all the information we currently have in DEVLINK_CMD_TRAP_REPORT.
> > 
> > IMO, this is less desirable, as instead of having one tool (devlink) to
> > interact with this mechanism we will need two (devlink & dropwatch).
> > 
> > Below I tried to answer all your questions and refer to all the points
> > you brought up.
> > 
> > > When you say silently dropped do you mean that mlxsw as of today
> > > doesn't have any counters exposed for those events?  
> > 
> > Some of these packets are counted, but not all of them.
> > 
> > > If we wanted to consolidate this into something existing we can either
> > >  (a) add similar traps in the kernel data path;
> > >  (b) make these traps extension of statistics.
> > > 
> > > My knee jerk reaction to seeing the patches was that it adds a new
> > > place where device statistics are reported.  
> > 
> > Not at all. This would be a step back. We can already count discards due
> > to VLAN membership on ingress on a per-port basis. A software maintained
> > global counter does not buy us anything.
> > 
> > By also getting the dropped packet - coupled with the drop reason and
> > ingress port - you can understand exactly why and on which VLAN the
> > packet was dropped. I wrote a Wireshark dissector for these netlink
> > packets to make our life easier. You can see the details in my comment
> > to the cover letter:
> > 
> > https://marc.info/?l=linux-netdev&m=156248736710238&w=2
> > 
> > In case you do not care about individual packets, but still want more
> > fine-grained statistics for your monitoring application, you can use
> > eBPF. For example, one thing we did is attaching a kprobe to
> > devlink_trap_report() with an eBPF program that dissects the incoming
> > skbs and maintains a counter per-{5 tuple, drop reason}. With
> > ebpf_exporter you can export these statistics to Prometheus on which you
> > can run queries and visualize the results with Grafana. This is
> > especially useful for tail and early drops since it allows you to
> > understand which flows contribute to most of the drops.
> 
> No question that the mechanism is useful.
> 
> > > Users who want to know why things are dropped will not get detailed
> > > breakdown from ethtool -S which for better or worse is the one stop
> > > shop for device stats today.  
> > 
> > I hope I managed to explain why counters are not enough, but I also want
> > to point out that ethtool statistics are not properly documented and
> > this hinders their effectiveness. I did my best to document the exposed
> > traps in order to avoid the same fate:
> > 
> > https://patchwork.ozlabs.org/patch/1128585/
> > 
> > In addition, there are selftests to show how each trap can be triggered
> > to reduce the ambiguity even further:
> > 
> > https://patchwork.ozlabs.org/patch/1128610/
> > 
> > And a note in the documentation to make sure future functionality is
> > tested as well:
> > 
> > https://patchwork.ozlabs.org/patch/1128608/
> > 
> > > Having thought about it some more, however, I think that having a
> > > forwarding "exception" object and hanging statistics off of it is a
> > > better design, even if we need to deal with some duplication to get
> > > there.
> > > 
> > > IOW having an way to "trap all packets which would increment a
> > > statistic" (option (b) above) is probably a bad design.
> > > 
> > > As for (a) I wonder how many of those events have a corresponding event
> > > in the kernel stack?  
> > 
> > Generic packet drops all have a corresponding kfree_skb() calls in the
> > kernel, but that does not mean that every packet dropped by the hardware
> > would also be dropped by the kernel if it were to be injected to its Rx
> > path.
> 
> The notion that all SW events get captured by kfree_skb() would not be
> correct.

I meant that the generic drop reasons I'm exposing with this patchset
all correspond to reasons for which the kernel would drop packets.

> We have the kfree_skb(), and xdp_exception(), and drivers can
> drop packets if various allocations fail.. the situation is already not
> great.
> 
> I think that having a single useful place where users can look to see
> all traffic exception events would go a long way. 

I believe this was Dave's point as well. We have one tool to monitor
kfree_skb() drops and with this patchset we will have another to monitor
HW drops. As I mentioned in my previous reply, I will look into sending
the events via drop_monitor by calling into it from devlink.

I'm not involved with XDP (as you might have noticed), but I assume
drop_monitor could be extended for this use case as well by doing
register_trace_xdp_exception(). Then you could monitor SW, HW and XDP
events using a single netlink channel, potentially split into different
multicast groups to allow user space programs to receive only the events
they care about.

> Software side as I mentioned is pretty brutal, IDK how many users are
> actually willing to decode stack traces to figure out why their system
> is dropping packets :/
> 
> > In my reply to Dave I gave buffer drops as an example.
> 
> The example of buffer drops is also probably the case where having the
> packet is least useful, but yes, I definitely agree devices need a way
> of reporting events that can't happen in SW.
> 
> > There are also situations in which packets can be dropped due to
> > device-specific exceptions and these do not have a corresponding drop
> > reason in the kernel. See example here:
> > 
> > https://patchwork.ozlabs.org/patch/1128587/
> > 
> > > If we could add corresponding trace points and just feed those from
> > > the device driver, that'd obviously be a holy grail.  
> > 
> > Unlike tracepoints, netlink gives you a structured and extensible
> > interface. For example, in Spectrum-1 we cannot provide the Tx port for
> > early/tail drops, whereas for Spectrum-2 and later we can. With netlink,
> > we can just omit the DEVLINK_ATTR_TRAP_OUT_PORT attribute for
> > Spectrum-1. You also get a programmatic interface that you can query for
> > this information:
> > 
> > # devlink -v trap show netdevsim/netdevsim10 trap ingress_vlan_filter
> > netdevsim/netdevsim10:
> >   name ingress_vlan_filter type drop generic true report false action drop group l2_drops
> >     metadata:
> >        input_port
> 
> Right, you can set or not set skb fields to some extent but its
> definitely not as flexible as netlink.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH net-next 00/11] Add drop monitor for offloaded data paths
  2019-07-10 11:20           ` Ido Schimmel
@ 2019-07-10 11:39             ` Toke Høiland-Jørgensen
  0 siblings, 0 replies; 48+ messages in thread
From: Toke Høiland-Jørgensen @ 2019-07-10 11:39 UTC (permalink / raw)
  To: Ido Schimmel, Jakub Kicinski
  Cc: David Miller, netdev, jiri, mlxsw, dsahern, roopa, nikolay, andy,
	pablo, pieter.jansenvanvuuren, andrew, f.fainelli,
	vivien.didelot, idosch, Alexei Starovoitov

Ido Schimmel <idosch@idosch.org> writes:

> On Tue, Jul 09, 2019 at 03:34:30PM -0700, Jakub Kicinski wrote:
>> On Tue, 9 Jul 2019 15:38:44 +0300, Ido Schimmel wrote:
>> > On Mon, Jul 08, 2019 at 03:51:58PM -0700, Jakub Kicinski wrote:
>> > > On Mon, 8 Jul 2019 16:19:08 +0300, Ido Schimmel wrote:  
>> > > > On Sun, Jul 07, 2019 at 12:45:41PM -0700, David Miller wrote:  
>> > > > > From: Ido Schimmel <idosch@idosch.org>
>> > > > > Date: Sun,  7 Jul 2019 10:58:17 +0300
>> > > > >     
>> > > > > > Users have several ways to debug the kernel and understand why a packet
>> > > > > > was dropped. For example, using "drop monitor" and "perf". Both
>> > > > > > utilities trace kfree_skb(), which is the function called when a packet
>> > > > > > is freed as part of a failure. The information provided by these tools
>> > > > > > is invaluable when trying to understand the cause of a packet loss.
>> > > > > > 
>> > > > > > In recent years, large portions of the kernel data path were offloaded
>> > > > > > to capable devices. Today, it is possible to perform L2 and L3
>> > > > > > forwarding in hardware, as well as tunneling (IP-in-IP and VXLAN).
>> > > > > > Different TC classifiers and actions are also offloaded to capable
>> > > > > > devices, at both ingress and egress.
>> > > > > > 
>> > > > > > However, when the data path is offloaded it is not possible to achieve
>> > > > > > the same level of introspection as tools such "perf" and "drop monitor"
>> > > > > > become irrelevant.
>> > > > > > 
>> > > > > > This patchset aims to solve this by allowing users to monitor packets
>> > > > > > that the underlying device decided to drop along with relevant metadata
>> > > > > > such as the drop reason and ingress port.    
>> > > > > 
>> > > > > We are now going to have 5 or so ways to capture packets passing through
>> > > > > the system, this is nonsense.
>> > > > > 
>> > > > > AF_PACKET, kfree_skb drop monitor, perf, XDP perf events, and now this
>> > > > > devlink thing.
>> > > > > 
>> > > > > This is insanity, too many ways to do the same thing and therefore the
>> > > > > worst possible user experience.
>> > > > > 
>> > > > > Pick _ONE_ method to trap packets and forward normal kfree_skb events,
>> > > > > XDP perf events, and these taps there too.
>> > > > > 
>> > > > > I mean really, think about it from the average user's perspective.  To
>> > > > > see all drops/pkts I have to attach a kfree_skb tracepoint, and not just
>> > > > > listen on devlink but configure a special tap thing beforehand and then
>> > > > > if someone is using XDP I gotta setup another perf event buffer capture
>> > > > > thing too.    
>> > > > 
>> > > > Let me try to explain again because I probably wasn't clear enough. The
>> > > > devlink-trap mechanism is not doing the same thing as other solutions.
>> > > > 
>> > > > The packets we are capturing in this patchset are packets that the
>> > > > kernel (the CPU) never saw up until now - they were silently dropped by
>> > > > the underlying device performing the packet forwarding instead of the
>> > > > CPU.  
>> > 
>> > Jakub,
>> > 
>> > It seems to me that most of the criticism is about consolidation of
>> > interfaces because you believe I'm doing something you can already do
>> > today, but this is not the case.
>> 
>> To be clear I'm not opposed to the patches, I'm just trying to
>> facilitate a discussion.
>
> Sure, sorry if it came out the wrong way. I appreciate your feedback and
> the time you have spent on this subject.
>
>> 
>> > Switch ASICs have dedicated traps for specific packets. Usually, these
>> > packets are control packets (e.g., ARP, BGP) which are required for the
>> > correct functioning of the control plane. You can see this in the SAI
>> > interface, which is an abstraction layer over vendors' SDKs:
>> > 
>> > https://github.com/opencomputeproject/SAI/blob/master/inc/saihostif.h#L157
>> > 
>> > We need to be able to configure the hardware policers of these traps and
>> > read their statistics to understand how many packets they dropped. We
>> > currently do not have a way to do any of that and we rely on hardcoded
>> > defaults in the driver which do not fit every use case (from
>> > experience):
>> > 
>> > https://elixir.bootlin.com/linux/v5.2/source/drivers/net/ethernet/mellanox/mlxsw/spectrum.c#L4103
>> > 
>> > We plan to extend devlink-trap mechanism to cover all these use cases. I
>> > hope you agree that this functionality belongs in devlink given it is a
>> > device-specific configuration and not a netdev-specific one.
>> 
>> No disagreement on providing knobs for traps.
>> 
>> > That being said, in its current form, this mechanism is focused on traps
>> > that correlate to packets the device decided to drop as this is very
>> > useful for debugging.
>> 
>> That'd be mixing two things - trap configuration and tracing exceptions
>> in one API. That's a little suboptimal but not too terrible, especially
>> if there is a higher level APIs users can default to.
>
> TBH, initially I was only focused on the drops, but then it occurred to
> me that this is a too narrow scope. These traps are only a subset of the
> complete list of traps we have and we have similar requirements for both
> (statistics, setting policers etc.). Therefore, I decided to design this
> interface in a more generic way, so that it could support the different
> use cases.
>
>> 
>> > Given that the entire configuration is done via devlink and that devlink
>> > stores all the information about these traps, it seems logical to also
>> > report these packets and their metadata to user space as devlink events.
>> > 
>> > If this is not desirable, we can try to call into drop_monitor from
>> > devlink and add a new command (e.g., NET_DM_CMD_HW_ALERT), which will
>> > encode all the information we currently have in DEVLINK_CMD_TRAP_REPORT.
>> > 
>> > IMO, this is less desirable, as instead of having one tool (devlink) to
>> > interact with this mechanism we will need two (devlink & dropwatch).
>> > 
>> > Below I tried to answer all your questions and refer to all the points
>> > you brought up.
>> > 
>> > > When you say silently dropped do you mean that mlxsw as of today
>> > > doesn't have any counters exposed for those events?  
>> > 
>> > Some of these packets are counted, but not all of them.
>> > 
>> > > If we wanted to consolidate this into something existing we can either
>> > >  (a) add similar traps in the kernel data path;
>> > >  (b) make these traps extension of statistics.
>> > > 
>> > > My knee jerk reaction to seeing the patches was that it adds a new
>> > > place where device statistics are reported.  
>> > 
>> > Not at all. This would be a step back. We can already count discards due
>> > to VLAN membership on ingress on a per-port basis. A software maintained
>> > global counter does not buy us anything.
>> > 
>> > By also getting the dropped packet - coupled with the drop reason and
>> > ingress port - you can understand exactly why and on which VLAN the
>> > packet was dropped. I wrote a Wireshark dissector for these netlink
>> > packets to make our life easier. You can see the details in my comment
>> > to the cover letter:
>> > 
>> > https://marc.info/?l=linux-netdev&m=156248736710238&w=2
>> > 
>> > In case you do not care about individual packets, but still want more
>> > fine-grained statistics for your monitoring application, you can use
>> > eBPF. For example, one thing we did is attaching a kprobe to
>> > devlink_trap_report() with an eBPF program that dissects the incoming
>> > skbs and maintains a counter per-{5 tuple, drop reason}. With
>> > ebpf_exporter you can export these statistics to Prometheus on which you
>> > can run queries and visualize the results with Grafana. This is
>> > especially useful for tail and early drops since it allows you to
>> > understand which flows contribute to most of the drops.
>> 
>> No question that the mechanism is useful.
>> 
>> > > Users who want to know why things are dropped will not get detailed
>> > > breakdown from ethtool -S which for better or worse is the one stop
>> > > shop for device stats today.  
>> > 
>> > I hope I managed to explain why counters are not enough, but I also want
>> > to point out that ethtool statistics are not properly documented and
>> > this hinders their effectiveness. I did my best to document the exposed
>> > traps in order to avoid the same fate:
>> > 
>> > https://patchwork.ozlabs.org/patch/1128585/
>> > 
>> > In addition, there are selftests to show how each trap can be triggered
>> > to reduce the ambiguity even further:
>> > 
>> > https://patchwork.ozlabs.org/patch/1128610/
>> > 
>> > And a note in the documentation to make sure future functionality is
>> > tested as well:
>> > 
>> > https://patchwork.ozlabs.org/patch/1128608/
>> > 
>> > > Having thought about it some more, however, I think that having a
>> > > forwarding "exception" object and hanging statistics off of it is a
>> > > better design, even if we need to deal with some duplication to get
>> > > there.
>> > > 
>> > > IOW having an way to "trap all packets which would increment a
>> > > statistic" (option (b) above) is probably a bad design.
>> > > 
>> > > As for (a) I wonder how many of those events have a corresponding event
>> > > in the kernel stack?  
>> > 
>> > Generic packet drops all have a corresponding kfree_skb() calls in the
>> > kernel, but that does not mean that every packet dropped by the hardware
>> > would also be dropped by the kernel if it were to be injected to its Rx
>> > path.
>> 
>> The notion that all SW events get captured by kfree_skb() would not be
>> correct.
>
> I meant that the generic drop reasons I'm exposing with this patchset
> all correspond to reasons for which the kernel would drop packets.
>
>> We have the kfree_skb(), and xdp_exception(), and drivers can
>> drop packets if various allocations fail.. the situation is already not
>> great.
>> 
>> I think that having a single useful place where users can look to see
>> all traffic exception events would go a long way. 
>
> I believe this was Dave's point as well. We have one tool to monitor
> kfree_skb() drops and with this patchset we will have another to monitor
> HW drops. As I mentioned in my previous reply, I will look into sending
> the events via drop_monitor by calling into it from devlink.
>
> I'm not involved with XDP (as you might have noticed), but I assume
> drop_monitor could be extended for this use case as well by doing
> register_trace_xdp_exception(). Then you could monitor SW, HW and XDP
> events using a single netlink channel, potentially split into different
> multicast groups to allow user space programs to receive only the events
> they care about.

Yes, having XDP hook into this would be good! This ties in nicely with
the need for better statistics for XDP (see
https://github.com/xdp-project/xdp-project/blob/master/xdp-project.org#consistency-for-statistics-with-xdp).

Unfortunately, it's not enough to just hook into the xdp_exception trace
point, as that is generally only triggered on XDP_ABORTED, not if the
XDP program just decides to drop the packet with XDP_DROP. So we may
need another mechanism to hook this if it's going to comprehensive; and
we'd probably want a way to do this that doesn't impose a performance
penalty when the drop monitor is not enabled...

-Toke

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH net-next 00/11] Add drop monitor for offloaded data paths
  2019-07-07 19:45 ` David Miller
  2019-07-08 13:19   ` Ido Schimmel
@ 2019-07-11 12:39   ` Ido Schimmel
  2019-07-11 19:02     ` David Miller
                       ` (2 more replies)
  1 sibling, 3 replies; 48+ messages in thread
From: Ido Schimmel @ 2019-07-11 12:39 UTC (permalink / raw)
  To: David Miller, nhorman
  Cc: netdev, jiri, mlxsw, dsahern, roopa, nikolay, andy, pablo,
	jakub.kicinski, pieter.jansenvanvuuren, andrew, f.fainelli,
	vivien.didelot, idosch

On Sun, Jul 07, 2019 at 12:45:41PM -0700, David Miller wrote:
> From: Ido Schimmel <idosch@idosch.org>
> Date: Sun,  7 Jul 2019 10:58:17 +0300
> 
> > Users have several ways to debug the kernel and understand why a packet
> > was dropped. For example, using "drop monitor" and "perf". Both
> > utilities trace kfree_skb(), which is the function called when a packet
> > is freed as part of a failure. The information provided by these tools
> > is invaluable when trying to understand the cause of a packet loss.
> > 
> > In recent years, large portions of the kernel data path were offloaded
> > to capable devices. Today, it is possible to perform L2 and L3
> > forwarding in hardware, as well as tunneling (IP-in-IP and VXLAN).
> > Different TC classifiers and actions are also offloaded to capable
> > devices, at both ingress and egress.
> > 
> > However, when the data path is offloaded it is not possible to achieve
> > the same level of introspection as tools such "perf" and "drop monitor"
> > become irrelevant.
> > 
> > This patchset aims to solve this by allowing users to monitor packets
> > that the underlying device decided to drop along with relevant metadata
> > such as the drop reason and ingress port.
> 
> We are now going to have 5 or so ways to capture packets passing through
> the system, this is nonsense.
> 
> AF_PACKET, kfree_skb drop monitor, perf, XDP perf events, and now this
> devlink thing.
> 
> This is insanity, too many ways to do the same thing and therefore the
> worst possible user experience.
> 
> Pick _ONE_ method to trap packets and forward normal kfree_skb events,
> XDP perf events, and these taps there too.
> 
> I mean really, think about it from the average user's perspective.  To
> see all drops/pkts I have to attach a kfree_skb tracepoint, and not just
> listen on devlink but configure a special tap thing beforehand and then
> if someone is using XDP I gotta setup another perf event buffer capture
> thing too.

Dave,

Before I start working on v2, I would like to get your feedback on the
high level plan. Also adding Neil who is the maintainer of drop_monitor
(and counterpart DropWatch tool [1]).

IIUC, the problem you point out is that users need to use different
tools to monitor packet drops based on where these drops occur
(SW/HW/XDP).

Therefore, my plan is to extend the existing drop_monitor netlink
channel to also cover HW drops. I will add a new message type and a new
multicast group for HW drops and encode in the message what is currently
encoded in the devlink events.

I would like to emphasize that the configuration of whether these
dropped packets are even sent to the CPU from the device still needs to
reside in devlink given this is the go-to tool for device-specific
configuration. In addition, these drop traps are a small subset of the
entire packet traps devices support and all have similar needs such as
HW policer configuration and statistics.

In the future we might also want to report events that indicate the
formation of possible problems. For example, in case packets are queued
above a certain threshold or for long periods of time. I hope we could
re-use drop_monitor for this as well, thereby making it the go-to
channel for diagnosing current and to-be problems in the data path.

Thanks

[1] https://github.com/nhorman/dropwatch

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH net-next 00/11] Add drop monitor for offloaded data paths
  2019-07-11 12:39   ` Ido Schimmel
@ 2019-07-11 19:02     ` David Miller
  2019-07-11 23:53     ` Neil Horman
  2019-07-14  2:38     ` David Miller
  2 siblings, 0 replies; 48+ messages in thread
From: David Miller @ 2019-07-11 19:02 UTC (permalink / raw)
  To: idosch
  Cc: nhorman, netdev, jiri, mlxsw, dsahern, roopa, nikolay, andy,
	pablo, jakub.kicinski, pieter.jansenvanvuuren, andrew,
	f.fainelli, vivien.didelot, idosch

From: Ido Schimmel <idosch@idosch.org>
Date: Thu, 11 Jul 2019 15:39:09 +0300

> Before I start working on v2, I would like to get your feedback on the
> high level plan. Also adding Neil who is the maintainer of drop_monitor
> (and counterpart DropWatch tool [1]).

I'll try to get back to this, but right now the merge window is completely
consuming me at the moment so you will have to exercise extreme patience.

Thank you.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH net-next 00/11] Add drop monitor for offloaded data paths
  2019-07-11 12:39   ` Ido Schimmel
  2019-07-11 19:02     ` David Miller
@ 2019-07-11 23:53     ` Neil Horman
  2019-07-12  3:40       ` Florian Fainelli
                         ` (2 more replies)
  2019-07-14  2:38     ` David Miller
  2 siblings, 3 replies; 48+ messages in thread
From: Neil Horman @ 2019-07-11 23:53 UTC (permalink / raw)
  To: Ido Schimmel
  Cc: David Miller, netdev, jiri, mlxsw, dsahern, roopa, nikolay, andy,
	pablo, jakub.kicinski, pieter.jansenvanvuuren, andrew,
	f.fainelli, vivien.didelot, idosch

On Thu, Jul 11, 2019 at 03:39:09PM +0300, Ido Schimmel wrote:
> On Sun, Jul 07, 2019 at 12:45:41PM -0700, David Miller wrote:
> > From: Ido Schimmel <idosch@idosch.org>
> > Date: Sun,  7 Jul 2019 10:58:17 +0300
> > 
> > > Users have several ways to debug the kernel and understand why a packet
> > > was dropped. For example, using "drop monitor" and "perf". Both
> > > utilities trace kfree_skb(), which is the function called when a packet
> > > is freed as part of a failure. The information provided by these tools
> > > is invaluable when trying to understand the cause of a packet loss.
> > > 
> > > In recent years, large portions of the kernel data path were offloaded
> > > to capable devices. Today, it is possible to perform L2 and L3
> > > forwarding in hardware, as well as tunneling (IP-in-IP and VXLAN).
> > > Different TC classifiers and actions are also offloaded to capable
> > > devices, at both ingress and egress.
> > > 
> > > However, when the data path is offloaded it is not possible to achieve
> > > the same level of introspection as tools such "perf" and "drop monitor"
> > > become irrelevant.
> > > 
> > > This patchset aims to solve this by allowing users to monitor packets
> > > that the underlying device decided to drop along with relevant metadata
> > > such as the drop reason and ingress port.
> > 
> > We are now going to have 5 or so ways to capture packets passing through
> > the system, this is nonsense.
> > 
> > AF_PACKET, kfree_skb drop monitor, perf, XDP perf events, and now this
> > devlink thing.
> > 
> > This is insanity, too many ways to do the same thing and therefore the
> > worst possible user experience.
> > 
> > Pick _ONE_ method to trap packets and forward normal kfree_skb events,
> > XDP perf events, and these taps there too.
> > 
> > I mean really, think about it from the average user's perspective.  To
> > see all drops/pkts I have to attach a kfree_skb tracepoint, and not just
> > listen on devlink but configure a special tap thing beforehand and then
> > if someone is using XDP I gotta setup another perf event buffer capture
> > thing too.
> 
> Dave,
> 
> Before I start working on v2, I would like to get your feedback on the
> high level plan. Also adding Neil who is the maintainer of drop_monitor
> (and counterpart DropWatch tool [1]).
> 
> IIUC, the problem you point out is that users need to use different
> tools to monitor packet drops based on where these drops occur
> (SW/HW/XDP).
> 
> Therefore, my plan is to extend the existing drop_monitor netlink
> channel to also cover HW drops. I will add a new message type and a new
> multicast group for HW drops and encode in the message what is currently
> encoded in the devlink events.
> 
A few things here:
IIRC we don't announce individual hardware drops, drivers record them in
internal structures, and they are retrieved on demand via ethtool calls, so you
will either need to include some polling (probably not a very performant idea),
or some sort of flagging mechanism to indicate that on the next message sent to
user space you should go retrieve hw stats from a given interface.  I certainly
wouldn't mind seeing this happen, but its more work than just adding a new
netlink message.

Also, regarding XDP drops, we wont see them if the xdp program is offloaded to
hardware (you'll need your hw drop gathering mechanism for that), but for xdp
programs run on the cpu, dropwatch should alrady catch those.  I.e. if the xdp
program returns a DROP result for a packet being processed, the OS will call
kfree_skb on its behalf, and dropwatch wil call that.

> I would like to emphasize that the configuration of whether these
> dropped packets are even sent to the CPU from the device still needs to
> reside in devlink given this is the go-to tool for device-specific
> configuration. In addition, these drop traps are a small subset of the
> entire packet traps devices support and all have similar needs such as
> HW policer configuration and statistics.
> 
> In the future we might also want to report events that indicate the
> formation of possible problems. For example, in case packets are queued
> above a certain threshold or for long periods of time. I hope we could
> re-use drop_monitor for this as well, thereby making it the go-to
> channel for diagnosing current and to-be problems in the data path.
> 
Thats an interesting idea, but dropwatch certainly isn't currently setup for
that kind of messaging.  It may be worth creating a v2 of the netlink protocol
and really thinking out what you want to communicate.

Best
Neil

> Thanks
> 
> [1] https://github.com/nhorman/dropwatch
> 

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH net-next 00/11] Add drop monitor for offloaded data paths
  2019-07-11 23:53     ` Neil Horman
@ 2019-07-12  3:40       ` Florian Fainelli
  2019-07-12 12:05         ` Neil Horman
  2019-07-12  9:27       ` Toke Høiland-Jørgensen
  2019-07-12 13:52       ` Ido Schimmel
  2 siblings, 1 reply; 48+ messages in thread
From: Florian Fainelli @ 2019-07-12  3:40 UTC (permalink / raw)
  To: Neil Horman, Ido Schimmel
  Cc: David Miller, netdev, jiri, mlxsw, dsahern, roopa, nikolay, andy,
	pablo, jakub.kicinski, pieter.jansenvanvuuren, andrew,
	vivien.didelot, idosch



On 7/11/2019 4:53 PM, Neil Horman wrote:
>> I would like to emphasize that the configuration of whether these
>> dropped packets are even sent to the CPU from the device still needs to
>> reside in devlink given this is the go-to tool for device-specific
>> configuration. In addition, these drop traps are a small subset of the
>> entire packet traps devices support and all have similar needs such as
>> HW policer configuration and statistics.
>>
>> In the future we might also want to report events that indicate the
>> formation of possible problems. For example, in case packets are queued
>> above a certain threshold or for long periods of time. I hope we could
>> re-use drop_monitor for this as well, thereby making it the go-to
>> channel for diagnosing current and to-be problems in the data path.
>>
> Thats an interesting idea, but dropwatch certainly isn't currently setup for
> that kind of messaging.  It may be worth creating a v2 of the netlink protocol
> and really thinking out what you want to communicate.

Is not what you describe more or less what Ido has been doing here with
this patch series?
-- 
Florian

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH net-next 00/11] Add drop monitor for offloaded data paths
  2019-07-11 23:53     ` Neil Horman
  2019-07-12  3:40       ` Florian Fainelli
@ 2019-07-12  9:27       ` Toke Høiland-Jørgensen
  2019-07-12 12:18         ` Neil Horman
  2019-07-12 13:52       ` Ido Schimmel
  2 siblings, 1 reply; 48+ messages in thread
From: Toke Høiland-Jørgensen @ 2019-07-12  9:27 UTC (permalink / raw)
  To: Neil Horman, Ido Schimmel
  Cc: David Miller, netdev, jiri, mlxsw, dsahern, roopa, nikolay, andy,
	pablo, jakub.kicinski, pieter.jansenvanvuuren, andrew,
	f.fainelli, vivien.didelot, idosch

Neil Horman <nhorman@tuxdriver.com> writes:

> On Thu, Jul 11, 2019 at 03:39:09PM +0300, Ido Schimmel wrote:
>> On Sun, Jul 07, 2019 at 12:45:41PM -0700, David Miller wrote:
>> > From: Ido Schimmel <idosch@idosch.org>
>> > Date: Sun,  7 Jul 2019 10:58:17 +0300
>> > 
>> > > Users have several ways to debug the kernel and understand why a packet
>> > > was dropped. For example, using "drop monitor" and "perf". Both
>> > > utilities trace kfree_skb(), which is the function called when a packet
>> > > is freed as part of a failure. The information provided by these tools
>> > > is invaluable when trying to understand the cause of a packet loss.
>> > > 
>> > > In recent years, large portions of the kernel data path were offloaded
>> > > to capable devices. Today, it is possible to perform L2 and L3
>> > > forwarding in hardware, as well as tunneling (IP-in-IP and VXLAN).
>> > > Different TC classifiers and actions are also offloaded to capable
>> > > devices, at both ingress and egress.
>> > > 
>> > > However, when the data path is offloaded it is not possible to achieve
>> > > the same level of introspection as tools such "perf" and "drop monitor"
>> > > become irrelevant.
>> > > 
>> > > This patchset aims to solve this by allowing users to monitor packets
>> > > that the underlying device decided to drop along with relevant metadata
>> > > such as the drop reason and ingress port.
>> > 
>> > We are now going to have 5 or so ways to capture packets passing through
>> > the system, this is nonsense.
>> > 
>> > AF_PACKET, kfree_skb drop monitor, perf, XDP perf events, and now this
>> > devlink thing.
>> > 
>> > This is insanity, too many ways to do the same thing and therefore the
>> > worst possible user experience.
>> > 
>> > Pick _ONE_ method to trap packets and forward normal kfree_skb events,
>> > XDP perf events, and these taps there too.
>> > 
>> > I mean really, think about it from the average user's perspective.  To
>> > see all drops/pkts I have to attach a kfree_skb tracepoint, and not just
>> > listen on devlink but configure a special tap thing beforehand and then
>> > if someone is using XDP I gotta setup another perf event buffer capture
>> > thing too.
>> 
>> Dave,
>> 
>> Before I start working on v2, I would like to get your feedback on the
>> high level plan. Also adding Neil who is the maintainer of drop_monitor
>> (and counterpart DropWatch tool [1]).
>> 
>> IIUC, the problem you point out is that users need to use different
>> tools to monitor packet drops based on where these drops occur
>> (SW/HW/XDP).
>> 
>> Therefore, my plan is to extend the existing drop_monitor netlink
>> channel to also cover HW drops. I will add a new message type and a new
>> multicast group for HW drops and encode in the message what is currently
>> encoded in the devlink events.
>> 
> A few things here:
> IIRC we don't announce individual hardware drops, drivers record them in
> internal structures, and they are retrieved on demand via ethtool calls, so you
> will either need to include some polling (probably not a very performant idea),
> or some sort of flagging mechanism to indicate that on the next message sent to
> user space you should go retrieve hw stats from a given interface.  I certainly
> wouldn't mind seeing this happen, but its more work than just adding a new
> netlink message.
>
> Also, regarding XDP drops, we wont see them if the xdp program is offloaded to
> hardware (you'll need your hw drop gathering mechanism for that), but for xdp
> programs run on the cpu, dropwatch should alrady catch those.  I.e. if the xdp
> program returns a DROP result for a packet being processed, the OS will call
> kfree_skb on its behalf, and dropwatch wil call that.

There is no skb by the time an XDP program runs, so this is not true. As
I mentioned upthread, there's a tracepoint that will get called if an
error occurs (or the program returns XDP_ABORTED), but in most cases,
XDP_DROP just means that the packet silently disappears...

-Toke

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH net-next 00/11] Add drop monitor for offloaded data paths
  2019-07-12  3:40       ` Florian Fainelli
@ 2019-07-12 12:05         ` Neil Horman
  0 siblings, 0 replies; 48+ messages in thread
From: Neil Horman @ 2019-07-12 12:05 UTC (permalink / raw)
  To: Florian Fainelli
  Cc: Ido Schimmel, David Miller, netdev, jiri, mlxsw, dsahern, roopa,
	nikolay, andy, pablo, jakub.kicinski, pieter.jansenvanvuuren,
	andrew, vivien.didelot, idosch

On Thu, Jul 11, 2019 at 08:40:34PM -0700, Florian Fainelli wrote:
> 
> 
> On 7/11/2019 4:53 PM, Neil Horman wrote:
> >> I would like to emphasize that the configuration of whether these
> >> dropped packets are even sent to the CPU from the device still needs to
> >> reside in devlink given this is the go-to tool for device-specific
> >> configuration. In addition, these drop traps are a small subset of the
> >> entire packet traps devices support and all have similar needs such as
> >> HW policer configuration and statistics.
> >>
> >> In the future we might also want to report events that indicate the
> >> formation of possible problems. For example, in case packets are queued
> >> above a certain threshold or for long periods of time. I hope we could
> >> re-use drop_monitor for this as well, thereby making it the go-to
> >> channel for diagnosing current and to-be problems in the data path.
> >>
> > Thats an interesting idea, but dropwatch certainly isn't currently setup for
> > that kind of messaging.  It may be worth creating a v2 of the netlink protocol
> > and really thinking out what you want to communicate.
> 
> Is not what you describe more or less what Ido has been doing here with
> this patch series?
possibly, I was only CCed on this thread halfway throught the conversation, and
only on the cover letter, I've not had a chance to look at the entire series

Neil

> -- 
> Florian
> 

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH net-next 00/11] Add drop monitor for offloaded data paths
  2019-07-12  9:27       ` Toke Høiland-Jørgensen
@ 2019-07-12 12:18         ` Neil Horman
  2019-07-12 12:33           ` Toke Høiland-Jørgensen
  0 siblings, 1 reply; 48+ messages in thread
From: Neil Horman @ 2019-07-12 12:18 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen
  Cc: Ido Schimmel, David Miller, netdev, jiri, mlxsw, dsahern, roopa,
	nikolay, andy, pablo, jakub.kicinski, pieter.jansenvanvuuren,
	andrew, f.fainelli, vivien.didelot, idosch

On Fri, Jul 12, 2019 at 11:27:55AM +0200, Toke Høiland-Jørgensen wrote:
> Neil Horman <nhorman@tuxdriver.com> writes:
> 
> > On Thu, Jul 11, 2019 at 03:39:09PM +0300, Ido Schimmel wrote:
> >> On Sun, Jul 07, 2019 at 12:45:41PM -0700, David Miller wrote:
> >> > From: Ido Schimmel <idosch@idosch.org>
> >> > Date: Sun,  7 Jul 2019 10:58:17 +0300
> >> > 
> >> > > Users have several ways to debug the kernel and understand why a packet
> >> > > was dropped. For example, using "drop monitor" and "perf". Both
> >> > > utilities trace kfree_skb(), which is the function called when a packet
> >> > > is freed as part of a failure. The information provided by these tools
> >> > > is invaluable when trying to understand the cause of a packet loss.
> >> > > 
> >> > > In recent years, large portions of the kernel data path were offloaded
> >> > > to capable devices. Today, it is possible to perform L2 and L3
> >> > > forwarding in hardware, as well as tunneling (IP-in-IP and VXLAN).
> >> > > Different TC classifiers and actions are also offloaded to capable
> >> > > devices, at both ingress and egress.
> >> > > 
> >> > > However, when the data path is offloaded it is not possible to achieve
> >> > > the same level of introspection as tools such "perf" and "drop monitor"
> >> > > become irrelevant.
> >> > > 
> >> > > This patchset aims to solve this by allowing users to monitor packets
> >> > > that the underlying device decided to drop along with relevant metadata
> >> > > such as the drop reason and ingress port.
> >> > 
> >> > We are now going to have 5 or so ways to capture packets passing through
> >> > the system, this is nonsense.
> >> > 
> >> > AF_PACKET, kfree_skb drop monitor, perf, XDP perf events, and now this
> >> > devlink thing.
> >> > 
> >> > This is insanity, too many ways to do the same thing and therefore the
> >> > worst possible user experience.
> >> > 
> >> > Pick _ONE_ method to trap packets and forward normal kfree_skb events,
> >> > XDP perf events, and these taps there too.
> >> > 
> >> > I mean really, think about it from the average user's perspective.  To
> >> > see all drops/pkts I have to attach a kfree_skb tracepoint, and not just
> >> > listen on devlink but configure a special tap thing beforehand and then
> >> > if someone is using XDP I gotta setup another perf event buffer capture
> >> > thing too.
> >> 
> >> Dave,
> >> 
> >> Before I start working on v2, I would like to get your feedback on the
> >> high level plan. Also adding Neil who is the maintainer of drop_monitor
> >> (and counterpart DropWatch tool [1]).
> >> 
> >> IIUC, the problem you point out is that users need to use different
> >> tools to monitor packet drops based on where these drops occur
> >> (SW/HW/XDP).
> >> 
> >> Therefore, my plan is to extend the existing drop_monitor netlink
> >> channel to also cover HW drops. I will add a new message type and a new
> >> multicast group for HW drops and encode in the message what is currently
> >> encoded in the devlink events.
> >> 
> > A few things here:
> > IIRC we don't announce individual hardware drops, drivers record them in
> > internal structures, and they are retrieved on demand via ethtool calls, so you
> > will either need to include some polling (probably not a very performant idea),
> > or some sort of flagging mechanism to indicate that on the next message sent to
> > user space you should go retrieve hw stats from a given interface.  I certainly
> > wouldn't mind seeing this happen, but its more work than just adding a new
> > netlink message.
> >
> > Also, regarding XDP drops, we wont see them if the xdp program is offloaded to
> > hardware (you'll need your hw drop gathering mechanism for that), but for xdp
> > programs run on the cpu, dropwatch should alrady catch those.  I.e. if the xdp
> > program returns a DROP result for a packet being processed, the OS will call
> > kfree_skb on its behalf, and dropwatch wil call that.
> 
> There is no skb by the time an XDP program runs, so this is not true. As
> I mentioned upthread, there's a tracepoint that will get called if an
> error occurs (or the program returns XDP_ABORTED), but in most cases,
> XDP_DROP just means that the packet silently disappears...
> 
As I noted, thats only true for xdp programs that are offloaded to hardware, I
was only speaking for XDP programs that run on the cpu.  For the former case, we
obviously need some other mechanism to detect drops, but for cpu executed xdp
programs, the OS is responsible for freeing skbs associated with programs the
return XDP_DROP.

Neil

> -Toke
> 

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH net-next 00/11] Add drop monitor for offloaded data paths
  2019-07-12 12:18         ` Neil Horman
@ 2019-07-12 12:33           ` Toke Høiland-Jørgensen
  2019-07-13  0:40             ` Neil Horman
  0 siblings, 1 reply; 48+ messages in thread
From: Toke Høiland-Jørgensen @ 2019-07-12 12:33 UTC (permalink / raw)
  To: Neil Horman
  Cc: Ido Schimmel, David Miller, netdev, jiri, mlxsw, dsahern, roopa,
	nikolay, andy, pablo, jakub.kicinski, pieter.jansenvanvuuren,
	andrew, f.fainelli, vivien.didelot, idosch

Neil Horman <nhorman@tuxdriver.com> writes:

> On Fri, Jul 12, 2019 at 11:27:55AM +0200, Toke Høiland-Jørgensen wrote:
>> Neil Horman <nhorman@tuxdriver.com> writes:
>> 
>> > On Thu, Jul 11, 2019 at 03:39:09PM +0300, Ido Schimmel wrote:
>> >> On Sun, Jul 07, 2019 at 12:45:41PM -0700, David Miller wrote:
>> >> > From: Ido Schimmel <idosch@idosch.org>
>> >> > Date: Sun,  7 Jul 2019 10:58:17 +0300
>> >> > 
>> >> > > Users have several ways to debug the kernel and understand why a packet
>> >> > > was dropped. For example, using "drop monitor" and "perf". Both
>> >> > > utilities trace kfree_skb(), which is the function called when a packet
>> >> > > is freed as part of a failure. The information provided by these tools
>> >> > > is invaluable when trying to understand the cause of a packet loss.
>> >> > > 
>> >> > > In recent years, large portions of the kernel data path were offloaded
>> >> > > to capable devices. Today, it is possible to perform L2 and L3
>> >> > > forwarding in hardware, as well as tunneling (IP-in-IP and VXLAN).
>> >> > > Different TC classifiers and actions are also offloaded to capable
>> >> > > devices, at both ingress and egress.
>> >> > > 
>> >> > > However, when the data path is offloaded it is not possible to achieve
>> >> > > the same level of introspection as tools such "perf" and "drop monitor"
>> >> > > become irrelevant.
>> >> > > 
>> >> > > This patchset aims to solve this by allowing users to monitor packets
>> >> > > that the underlying device decided to drop along with relevant metadata
>> >> > > such as the drop reason and ingress port.
>> >> > 
>> >> > We are now going to have 5 or so ways to capture packets passing through
>> >> > the system, this is nonsense.
>> >> > 
>> >> > AF_PACKET, kfree_skb drop monitor, perf, XDP perf events, and now this
>> >> > devlink thing.
>> >> > 
>> >> > This is insanity, too many ways to do the same thing and therefore the
>> >> > worst possible user experience.
>> >> > 
>> >> > Pick _ONE_ method to trap packets and forward normal kfree_skb events,
>> >> > XDP perf events, and these taps there too.
>> >> > 
>> >> > I mean really, think about it from the average user's perspective.  To
>> >> > see all drops/pkts I have to attach a kfree_skb tracepoint, and not just
>> >> > listen on devlink but configure a special tap thing beforehand and then
>> >> > if someone is using XDP I gotta setup another perf event buffer capture
>> >> > thing too.
>> >> 
>> >> Dave,
>> >> 
>> >> Before I start working on v2, I would like to get your feedback on the
>> >> high level plan. Also adding Neil who is the maintainer of drop_monitor
>> >> (and counterpart DropWatch tool [1]).
>> >> 
>> >> IIUC, the problem you point out is that users need to use different
>> >> tools to monitor packet drops based on where these drops occur
>> >> (SW/HW/XDP).
>> >> 
>> >> Therefore, my plan is to extend the existing drop_monitor netlink
>> >> channel to also cover HW drops. I will add a new message type and a new
>> >> multicast group for HW drops and encode in the message what is currently
>> >> encoded in the devlink events.
>> >> 
>> > A few things here:
>> > IIRC we don't announce individual hardware drops, drivers record them in
>> > internal structures, and they are retrieved on demand via ethtool calls, so you
>> > will either need to include some polling (probably not a very performant idea),
>> > or some sort of flagging mechanism to indicate that on the next message sent to
>> > user space you should go retrieve hw stats from a given interface.  I certainly
>> > wouldn't mind seeing this happen, but its more work than just adding a new
>> > netlink message.
>> >
>> > Also, regarding XDP drops, we wont see them if the xdp program is offloaded to
>> > hardware (you'll need your hw drop gathering mechanism for that), but for xdp
>> > programs run on the cpu, dropwatch should alrady catch those.  I.e. if the xdp
>> > program returns a DROP result for a packet being processed, the OS will call
>> > kfree_skb on its behalf, and dropwatch wil call that.
>> 
>> There is no skb by the time an XDP program runs, so this is not true. As
>> I mentioned upthread, there's a tracepoint that will get called if an
>> error occurs (or the program returns XDP_ABORTED), but in most cases,
>> XDP_DROP just means that the packet silently disappears...
>> 
> As I noted, thats only true for xdp programs that are offloaded to hardware, I
> was only speaking for XDP programs that run on the cpu.  For the former case, we
> obviously need some other mechanism to detect drops, but for cpu executed xdp
> programs, the OS is responsible for freeing skbs associated with programs the
> return XDP_DROP.

Ah, I think maybe you're thinking of generic XDP (also referred to as
skb mode)? That is a separate mode; an XDP program loaded in "native
mode" (or "driver mode") runs on the CPU, but before the skb is created;
this is the common case for XDP, and there is no skb and thus no drop
notification in this mode.

There is *also* an offload mode for XDP programs, but that is only
supported by netronome cards thus far, so not as commonly used...

-Toke

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH net-next 00/11] Add drop monitor for offloaded data paths
  2019-07-11 23:53     ` Neil Horman
  2019-07-12  3:40       ` Florian Fainelli
  2019-07-12  9:27       ` Toke Høiland-Jørgensen
@ 2019-07-12 13:52       ` Ido Schimmel
  2019-07-14 11:29         ` Neil Horman
  2 siblings, 1 reply; 48+ messages in thread
From: Ido Schimmel @ 2019-07-12 13:52 UTC (permalink / raw)
  To: Neil Horman
  Cc: David Miller, netdev, jiri, mlxsw, dsahern, roopa, nikolay, andy,
	pablo, jakub.kicinski, pieter.jansenvanvuuren, andrew,
	f.fainelli, vivien.didelot, idosch

On Thu, Jul 11, 2019 at 07:53:54PM -0400, Neil Horman wrote:
> A few things here:
> IIRC we don't announce individual hardware drops, drivers record them in
> internal structures, and they are retrieved on demand via ethtool calls, so you
> will either need to include some polling (probably not a very performant idea),
> or some sort of flagging mechanism to indicate that on the next message sent to
> user space you should go retrieve hw stats from a given interface.  I certainly
> wouldn't mind seeing this happen, but its more work than just adding a new
> netlink message.

Neil,

The idea of this series is to pass the dropped packets themselves to
user space along with metadata, such as the drop reason and the ingress
port. In the future more metadata could be added thanks to the
extensible nature of netlink.

In v1 these packets were notified to user space as devlink events
and my plan for v2 is to send them as drop_monitor events, given it's an
existing generic netlink channel used to monitor SW drops. This will
allow users to listen on one netlink channel to diagnose potential
problems in either SW or HW (and hopefully XDP in the future).

Please note that the packets I'm talking about are packets users
currently do not see. They are dropped - potentially silently - by the
underlying device, thereby making it hard to debug whatever issues you
might be experiencing in your network.

The control path that determines if these packets are even sent to the
CPU from the HW needs to remain in devlink for the reasons I outlined in
my previous reply. However, the monitoring of these drops will be over
drop_monitor. This is similar to what we are currently doing with
tc-sample, where the control path that triggers the sampling and
determines the sampling rate and truncation is done over rtnetlink (tc),
but the sampled packets are notified over the generic netlink psample
channel.

To make it more real, you can check the example of the dissected devlink
message that notifies the drop of a packet due to a multicast source
MAC: https://marc.info/?l=linux-netdev&m=156248736710238&w=2

I will obviously have to create another Wireshark dissector for
drop_monitor in order to get the same information.

> Thats an interesting idea, but dropwatch certainly isn't currently setup for
> that kind of messaging.  It may be worth creating a v2 of the netlink protocol
> and really thinking out what you want to communicate.

I don't think we need a v2 of the netlink protocol. My current plan is
to extend the existing protocol with: New message type (e.g.,
NET_DM_CMD_HW_ALERT), new multicast group and a set of attributes to
encode the information that is currently encoded in the example message
I pasted above.

Thanks

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH net-next 00/11] Add drop monitor for offloaded data paths
  2019-07-12 12:33           ` Toke Høiland-Jørgensen
@ 2019-07-13  0:40             ` Neil Horman
  2019-07-13  8:07               ` Toke Høiland-Jørgensen
  0 siblings, 1 reply; 48+ messages in thread
From: Neil Horman @ 2019-07-13  0:40 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen
  Cc: Ido Schimmel, David Miller, netdev, jiri, mlxsw, dsahern, roopa,
	nikolay, andy, pablo, jakub.kicinski, pieter.jansenvanvuuren,
	andrew, f.fainelli, vivien.didelot, idosch

On Fri, Jul 12, 2019 at 02:33:29PM +0200, Toke Høiland-Jørgensen wrote:
> Neil Horman <nhorman@tuxdriver.com> writes:
> 
> > On Fri, Jul 12, 2019 at 11:27:55AM +0200, Toke Høiland-Jørgensen wrote:
> >> Neil Horman <nhorman@tuxdriver.com> writes:
> >> 
> >> > On Thu, Jul 11, 2019 at 03:39:09PM +0300, Ido Schimmel wrote:
> >> >> On Sun, Jul 07, 2019 at 12:45:41PM -0700, David Miller wrote:
> >> >> > From: Ido Schimmel <idosch@idosch.org>
> >> >> > Date: Sun,  7 Jul 2019 10:58:17 +0300
> >> >> > 
> >> >> > > Users have several ways to debug the kernel and understand why a packet
> >> >> > > was dropped. For example, using "drop monitor" and "perf". Both
> >> >> > > utilities trace kfree_skb(), which is the function called when a packet
> >> >> > > is freed as part of a failure. The information provided by these tools
> >> >> > > is invaluable when trying to understand the cause of a packet loss.
> >> >> > > 
> >> >> > > In recent years, large portions of the kernel data path were offloaded
> >> >> > > to capable devices. Today, it is possible to perform L2 and L3
> >> >> > > forwarding in hardware, as well as tunneling (IP-in-IP and VXLAN).
> >> >> > > Different TC classifiers and actions are also offloaded to capable
> >> >> > > devices, at both ingress and egress.
> >> >> > > 
> >> >> > > However, when the data path is offloaded it is not possible to achieve
> >> >> > > the same level of introspection as tools such "perf" and "drop monitor"
> >> >> > > become irrelevant.
> >> >> > > 
> >> >> > > This patchset aims to solve this by allowing users to monitor packets
> >> >> > > that the underlying device decided to drop along with relevant metadata
> >> >> > > such as the drop reason and ingress port.
> >> >> > 
> >> >> > We are now going to have 5 or so ways to capture packets passing through
> >> >> > the system, this is nonsense.
> >> >> > 
> >> >> > AF_PACKET, kfree_skb drop monitor, perf, XDP perf events, and now this
> >> >> > devlink thing.
> >> >> > 
> >> >> > This is insanity, too many ways to do the same thing and therefore the
> >> >> > worst possible user experience.
> >> >> > 
> >> >> > Pick _ONE_ method to trap packets and forward normal kfree_skb events,
> >> >> > XDP perf events, and these taps there too.
> >> >> > 
> >> >> > I mean really, think about it from the average user's perspective.  To
> >> >> > see all drops/pkts I have to attach a kfree_skb tracepoint, and not just
> >> >> > listen on devlink but configure a special tap thing beforehand and then
> >> >> > if someone is using XDP I gotta setup another perf event buffer capture
> >> >> > thing too.
> >> >> 
> >> >> Dave,
> >> >> 
> >> >> Before I start working on v2, I would like to get your feedback on the
> >> >> high level plan. Also adding Neil who is the maintainer of drop_monitor
> >> >> (and counterpart DropWatch tool [1]).
> >> >> 
> >> >> IIUC, the problem you point out is that users need to use different
> >> >> tools to monitor packet drops based on where these drops occur
> >> >> (SW/HW/XDP).
> >> >> 
> >> >> Therefore, my plan is to extend the existing drop_monitor netlink
> >> >> channel to also cover HW drops. I will add a new message type and a new
> >> >> multicast group for HW drops and encode in the message what is currently
> >> >> encoded in the devlink events.
> >> >> 
> >> > A few things here:
> >> > IIRC we don't announce individual hardware drops, drivers record them in
> >> > internal structures, and they are retrieved on demand via ethtool calls, so you
> >> > will either need to include some polling (probably not a very performant idea),
> >> > or some sort of flagging mechanism to indicate that on the next message sent to
> >> > user space you should go retrieve hw stats from a given interface.  I certainly
> >> > wouldn't mind seeing this happen, but its more work than just adding a new
> >> > netlink message.
> >> >
> >> > Also, regarding XDP drops, we wont see them if the xdp program is offloaded to
> >> > hardware (you'll need your hw drop gathering mechanism for that), but for xdp
> >> > programs run on the cpu, dropwatch should alrady catch those.  I.e. if the xdp
> >> > program returns a DROP result for a packet being processed, the OS will call
> >> > kfree_skb on its behalf, and dropwatch wil call that.
> >> 
> >> There is no skb by the time an XDP program runs, so this is not true. As
> >> I mentioned upthread, there's a tracepoint that will get called if an
> >> error occurs (or the program returns XDP_ABORTED), but in most cases,
> >> XDP_DROP just means that the packet silently disappears...
> >> 
> > As I noted, thats only true for xdp programs that are offloaded to hardware, I
> > was only speaking for XDP programs that run on the cpu.  For the former case, we
> > obviously need some other mechanism to detect drops, but for cpu executed xdp
> > programs, the OS is responsible for freeing skbs associated with programs the
> > return XDP_DROP.
> 
> Ah, I think maybe you're thinking of generic XDP (also referred to as
> skb mode)? That is a separate mode; an XDP program loaded in "native
Yes, was I not clear about that?
Neil

> mode" (or "driver mode") runs on the CPU, but before the skb is created;
> this is the common case for XDP, and there is no skb and thus no drop
> notification in this mode.
> 
> There is *also* an offload mode for XDP programs, but that is only
> supported by netronome cards thus far, so not as commonly used...
> 
> -Toke
> 

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH net-next 00/11] Add drop monitor for offloaded data paths
  2019-07-13  0:40             ` Neil Horman
@ 2019-07-13  8:07               ` Toke Høiland-Jørgensen
  0 siblings, 0 replies; 48+ messages in thread
From: Toke Høiland-Jørgensen @ 2019-07-13  8:07 UTC (permalink / raw)
  To: Neil Horman
  Cc: Ido Schimmel, David Miller, netdev, jiri, mlxsw, dsahern, roopa,
	nikolay, andy, pablo, jakub.kicinski, pieter.jansenvanvuuren,
	andrew, f.fainelli, vivien.didelot, idosch

>Neil Horman <nhorman@tuxdriver.com> writes:

> On Fri, Jul 12, 2019 at 02:33:29PM +0200, Toke Høiland-Jørgensen wrote:
>> Neil Horman <nhorman@tuxdriver.com> writes:
>> 
>> > On Fri, Jul 12, 2019 at 11:27:55AM +0200, Toke Høiland-Jørgensen wrote:
>> >> Neil Horman <nhorman@tuxdriver.com> writes:
>> >> 
>> >> > On Thu, Jul 11, 2019 at 03:39:09PM +0300, Ido Schimmel wrote:
>> >> >> On Sun, Jul 07, 2019 at 12:45:41PM -0700, David Miller wrote:
>> >> >> > From: Ido Schimmel <idosch@idosch.org>
>> >> >> > Date: Sun,  7 Jul 2019 10:58:17 +0300
>> >> >> > 
>> >> >> > > Users have several ways to debug the kernel and understand why a packet
>> >> >> > > was dropped. For example, using "drop monitor" and "perf". Both
>> >> >> > > utilities trace kfree_skb(), which is the function called when a packet
>> >> >> > > is freed as part of a failure. The information provided by these tools
>> >> >> > > is invaluable when trying to understand the cause of a packet loss.
>> >> >> > > 
>> >> >> > > In recent years, large portions of the kernel data path were offloaded
>> >> >> > > to capable devices. Today, it is possible to perform L2 and L3
>> >> >> > > forwarding in hardware, as well as tunneling (IP-in-IP and VXLAN).
>> >> >> > > Different TC classifiers and actions are also offloaded to capable
>> >> >> > > devices, at both ingress and egress.
>> >> >> > > 
>> >> >> > > However, when the data path is offloaded it is not possible to achieve
>> >> >> > > the same level of introspection as tools such "perf" and "drop monitor"
>> >> >> > > become irrelevant.
>> >> >> > > 
>> >> >> > > This patchset aims to solve this by allowing users to monitor packets
>> >> >> > > that the underlying device decided to drop along with relevant metadata
>> >> >> > > such as the drop reason and ingress port.
>> >> >> > 
>> >> >> > We are now going to have 5 or so ways to capture packets passing through
>> >> >> > the system, this is nonsense.
>> >> >> > 
>> >> >> > AF_PACKET, kfree_skb drop monitor, perf, XDP perf events, and now this
>> >> >> > devlink thing.
>> >> >> > 
>> >> >> > This is insanity, too many ways to do the same thing and therefore the
>> >> >> > worst possible user experience.
>> >> >> > 
>> >> >> > Pick _ONE_ method to trap packets and forward normal kfree_skb events,
>> >> >> > XDP perf events, and these taps there too.
>> >> >> > 
>> >> >> > I mean really, think about it from the average user's perspective.  To
>> >> >> > see all drops/pkts I have to attach a kfree_skb tracepoint, and not just
>> >> >> > listen on devlink but configure a special tap thing beforehand and then
>> >> >> > if someone is using XDP I gotta setup another perf event buffer capture
>> >> >> > thing too.
>> >> >> 
>> >> >> Dave,
>> >> >> 
>> >> >> Before I start working on v2, I would like to get your feedback on the
>> >> >> high level plan. Also adding Neil who is the maintainer of drop_monitor
>> >> >> (and counterpart DropWatch tool [1]).
>> >> >> 
>> >> >> IIUC, the problem you point out is that users need to use different
>> >> >> tools to monitor packet drops based on where these drops occur
>> >> >> (SW/HW/XDP).
>> >> >> 
>> >> >> Therefore, my plan is to extend the existing drop_monitor netlink
>> >> >> channel to also cover HW drops. I will add a new message type and a new
>> >> >> multicast group for HW drops and encode in the message what is currently
>> >> >> encoded in the devlink events.
>> >> >> 
>> >> > A few things here:
>> >> > IIRC we don't announce individual hardware drops, drivers record them in
>> >> > internal structures, and they are retrieved on demand via ethtool calls, so you
>> >> > will either need to include some polling (probably not a very performant idea),
>> >> > or some sort of flagging mechanism to indicate that on the next message sent to
>> >> > user space you should go retrieve hw stats from a given interface.  I certainly
>> >> > wouldn't mind seeing this happen, but its more work than just adding a new
>> >> > netlink message.
>> >> >
>> >> > Also, regarding XDP drops, we wont see them if the xdp program is offloaded to
>> >> > hardware (you'll need your hw drop gathering mechanism for that), but for xdp
>> >> > programs run on the cpu, dropwatch should alrady catch those.  I.e. if the xdp
>> >> > program returns a DROP result for a packet being processed, the OS will call
>> >> > kfree_skb on its behalf, and dropwatch wil call that.
>> >> 
>> >> There is no skb by the time an XDP program runs, so this is not true. As
>> >> I mentioned upthread, there's a tracepoint that will get called if an
>> >> error occurs (or the program returns XDP_ABORTED), but in most cases,
>> >> XDP_DROP just means that the packet silently disappears...
>> >> 
>> > As I noted, thats only true for xdp programs that are offloaded to hardware, I
>> > was only speaking for XDP programs that run on the cpu.  For the former case, we
>> > obviously need some other mechanism to detect drops, but for cpu executed xdp
>> > programs, the OS is responsible for freeing skbs associated with programs the
>> > return XDP_DROP.
>> 
>> Ah, I think maybe you're thinking of generic XDP (also referred to as
>> skb mode)? That is a separate mode; an XDP program loaded in "native
> Yes, was I not clear about that?

No, not really. "Generic XDP" is not the same as "XDP"; the generic mode
is more of a debug mode (as far as I'm concerned at least). So in the
common case, it is absolutely not the case that the kernel will end up
calling kfree_skb after an XDP_DROP; so I got somewhat thrown off by
your insistence that it would... :)

-Toke

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH net-next 00/11] Add drop monitor for offloaded data paths
  2019-07-11 12:39   ` Ido Schimmel
  2019-07-11 19:02     ` David Miller
  2019-07-11 23:53     ` Neil Horman
@ 2019-07-14  2:38     ` David Miller
  2 siblings, 0 replies; 48+ messages in thread
From: David Miller @ 2019-07-14  2:38 UTC (permalink / raw)
  To: idosch
  Cc: nhorman, netdev, jiri, mlxsw, dsahern, roopa, nikolay, andy,
	pablo, jakub.kicinski, pieter.jansenvanvuuren, andrew,
	f.fainelli, vivien.didelot, idosch

From: Ido Schimmel <idosch@idosch.org>
Date: Thu, 11 Jul 2019 15:39:09 +0300

> Before I start working on v2, I would like to get your feedback on the
> high level plan. Also adding Neil who is the maintainer of drop_monitor
> (and counterpart DropWatch tool [1]).
> 
> IIUC, the problem you point out is that users need to use different
> tools to monitor packet drops based on where these drops occur
> (SW/HW/XDP).
> 
> Therefore, my plan is to extend the existing drop_monitor netlink
> channel to also cover HW drops. I will add a new message type and a new
> multicast group for HW drops and encode in the message what is currently
> encoded in the devlink events.

Consolidating the reporting to merge with the drop monitor facilities
is definitely a step in the right direction.

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH net-next 00/11] Add drop monitor for offloaded data paths
  2019-07-12 13:52       ` Ido Schimmel
@ 2019-07-14 11:29         ` Neil Horman
  2019-07-14 12:43           ` Ido Schimmel
  0 siblings, 1 reply; 48+ messages in thread
From: Neil Horman @ 2019-07-14 11:29 UTC (permalink / raw)
  To: Ido Schimmel
  Cc: David Miller, netdev, jiri, mlxsw, dsahern, roopa, nikolay, andy,
	pablo, jakub.kicinski, pieter.jansenvanvuuren, andrew,
	f.fainelli, vivien.didelot, idosch

On Fri, Jul 12, 2019 at 04:52:30PM +0300, Ido Schimmel wrote:
> On Thu, Jul 11, 2019 at 07:53:54PM -0400, Neil Horman wrote:
> > A few things here:
> > IIRC we don't announce individual hardware drops, drivers record them in
> > internal structures, and they are retrieved on demand via ethtool calls, so you
> > will either need to include some polling (probably not a very performant idea),
> > or some sort of flagging mechanism to indicate that on the next message sent to
> > user space you should go retrieve hw stats from a given interface.  I certainly
> > wouldn't mind seeing this happen, but its more work than just adding a new
> > netlink message.
> 
> Neil,
> 
> The idea of this series is to pass the dropped packets themselves to
> user space along with metadata, such as the drop reason and the ingress
> port. In the future more metadata could be added thanks to the
> extensible nature of netlink.
> 
I had experimented with this idea previously.  Specifically I had investigated
the possibility of creating a dummy net_device that received only dropped
packets so that utilities like tcpdump could monitor the interface for said
packets along with the metadata that described where they dropped.

The concern I had was, as Dave mentioned, that you would wind up with either a
head of line blocking issue, or simply lots of lost "dropped" packets due to
queue overflow on receive, which kind of defeated the purpose of drop monitor.

That said, I like the idea, and if we can find a way around the fact that we
could potentially receive way more dropped packets than we could bounce back to
userspace, it would be a major improvement.

> In v1 these packets were notified to user space as devlink events
> and my plan for v2 is to send them as drop_monitor events, given it's an
> existing generic netlink channel used to monitor SW drops. This will
> allow users to listen on one netlink channel to diagnose potential
> problems in either SW or HW (and hopefully XDP in the future).
> 
Yeah, I'm supportive of that.

> Please note that the packets I'm talking about are packets users
> currently do not see. They are dropped - potentially silently - by the
> underlying device, thereby making it hard to debug whatever issues you
> might be experiencing in your network.
> 
Right I get that, you want the ability to register a listener of sorts to
monitor drops in hardware and report that back to user space as an drop even
with a location that (instead of being a kernel address, is a 'special location'
representing a hardware instance.  Makes sense.  Having that be a location +
counter tuple would make sense, but I don't think we can pass the skb itself (as
you mention above), without seeing significant loss.

> The control path that determines if these packets are even sent to the
> CPU from the HW needs to remain in devlink for the reasons I outlined in
> my previous reply. However, the monitoring of these drops will be over
> drop_monitor. This is similar to what we are currently doing with
> tc-sample, where the control path that triggers the sampling and
> determines the sampling rate and truncation is done over rtnetlink (tc),
> but the sampled packets are notified over the generic netlink psample
> channel.
> 
> To make it more real, you can check the example of the dissected devlink
> message that notifies the drop of a packet due to a multicast source
> MAC: https://marc.info/?l=linux-netdev&m=156248736710238&w=2
> 
> I will obviously have to create another Wireshark dissector for
> drop_monitor in order to get the same information.
> 
yes, Of course.
> > Thats an interesting idea, but dropwatch certainly isn't currently setup for
> > that kind of messaging.  It may be worth creating a v2 of the netlink protocol
> > and really thinking out what you want to communicate.
> 
> I don't think we need a v2 of the netlink protocol. My current plan is
> to extend the existing protocol with: New message type (e.g.,
> NET_DM_CMD_HW_ALERT), new multicast group and a set of attributes to
> encode the information that is currently encoded in the example message
> I pasted above.
> 
Ok, that makes sense.  I think we already do some very rudimentary version of
that (see trace_napi_poll_hit).  Here we check the device we receive frames on
to see if its rx_dropped count has increased, and if it has, store that as a
drop count in the NULL location.  Thats obviously insufficient, but I wonder if
its worth looking at using the dm_hw_stat_delta to encode and record those event
for sending with your new message type.

> Thanks
> 

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH net-next 00/11] Add drop monitor for offloaded data paths
  2019-07-14 11:29         ` Neil Horman
@ 2019-07-14 12:43           ` Ido Schimmel
  0 siblings, 0 replies; 48+ messages in thread
From: Ido Schimmel @ 2019-07-14 12:43 UTC (permalink / raw)
  To: Neil Horman
  Cc: David Miller, netdev, jiri, mlxsw, dsahern, roopa, nikolay, andy,
	pablo, jakub.kicinski, pieter.jansenvanvuuren, andrew,
	f.fainelli, vivien.didelot, idosch

On Sun, Jul 14, 2019 at 07:29:04AM -0400, Neil Horman wrote:
> On Fri, Jul 12, 2019 at 04:52:30PM +0300, Ido Schimmel wrote:
> > On Thu, Jul 11, 2019 at 07:53:54PM -0400, Neil Horman wrote:
> > > A few things here:
> > > IIRC we don't announce individual hardware drops, drivers record them in
> > > internal structures, and they are retrieved on demand via ethtool calls, so you
> > > will either need to include some polling (probably not a very performant idea),
> > > or some sort of flagging mechanism to indicate that on the next message sent to
> > > user space you should go retrieve hw stats from a given interface.  I certainly
> > > wouldn't mind seeing this happen, but its more work than just adding a new
> > > netlink message.
> > 
> > Neil,
> > 
> > The idea of this series is to pass the dropped packets themselves to
> > user space along with metadata, such as the drop reason and the ingress
> > port. In the future more metadata could be added thanks to the
> > extensible nature of netlink.
> > 
> I had experimented with this idea previously.  Specifically I had investigated
> the possibility of creating a dummy net_device that received only dropped
> packets so that utilities like tcpdump could monitor the interface for said
> packets along with the metadata that described where they dropped.
> 
> The concern I had was, as Dave mentioned, that you would wind up with either a
> head of line blocking issue, or simply lots of lost "dropped" packets due to
> queue overflow on receive, which kind of defeated the purpose of drop monitor.
> 
> That said, I like the idea, and if we can find a way around the fact that we
> could potentially receive way more dropped packets than we could bounce back to
> userspace, it would be a major improvement.

We don't necessarily need the entire packet, so we could add an option
to truncate them and get only the headers. tc-sample does this (see man
tc-sample).

Also, packets that are dropped for the same reason and belong to the
same flow are not necessarily interesting. We could add an eBPF filter
on the netlink socket which will only allow "unique" packets to be
enqueued. Where "unique" is defined as {drop reason, 5-tuple}. In the
case of SW drops "drop reason" is instruction pointer (call site of
kfree_skb()) and in the case of HW drops it's the drop reason provided
by the device. The duplicate copies will be counted in the eBPF map.

> 
> > In v1 these packets were notified to user space as devlink events
> > and my plan for v2 is to send them as drop_monitor events, given it's an
> > existing generic netlink channel used to monitor SW drops. This will
> > allow users to listen on one netlink channel to diagnose potential
> > problems in either SW or HW (and hopefully XDP in the future).
> > 
> Yeah, I'm supportive of that.
> 
> > Please note that the packets I'm talking about are packets users
> > currently do not see. They are dropped - potentially silently - by the
> > underlying device, thereby making it hard to debug whatever issues you
> > might be experiencing in your network.
> > 
> Right I get that, you want the ability to register a listener of sorts to
> monitor drops in hardware and report that back to user space as an drop even
> with a location that (instead of being a kernel address, is a 'special location'
> representing a hardware instance.  Makes sense.  Having that be a location +
> counter tuple would make sense, but I don't think we can pass the skb itself (as
> you mention above), without seeing significant loss.

Getting the skb itself will be an option users will need to enable in
drop_monitor. If it is not enabled, then default behavior is maintained
and you only get notifications that contain the location + counter tuple
you mentioned.

> 
> > The control path that determines if these packets are even sent to the
> > CPU from the HW needs to remain in devlink for the reasons I outlined in
> > my previous reply. However, the monitoring of these drops will be over
> > drop_monitor. This is similar to what we are currently doing with
> > tc-sample, where the control path that triggers the sampling and
> > determines the sampling rate and truncation is done over rtnetlink (tc),
> > but the sampled packets are notified over the generic netlink psample
> > channel.
> > 
> > To make it more real, you can check the example of the dissected devlink
> > message that notifies the drop of a packet due to a multicast source
> > MAC: https://marc.info/?l=linux-netdev&m=156248736710238&w=2
> > 
> > I will obviously have to create another Wireshark dissector for
> > drop_monitor in order to get the same information.
> > 
> yes, Of course.
> > > Thats an interesting idea, but dropwatch certainly isn't currently setup for
> > > that kind of messaging.  It may be worth creating a v2 of the netlink protocol
> > > and really thinking out what you want to communicate.
> > 
> > I don't think we need a v2 of the netlink protocol. My current plan is
> > to extend the existing protocol with: New message type (e.g.,
> > NET_DM_CMD_HW_ALERT), new multicast group and a set of attributes to
> > encode the information that is currently encoded in the example message
> > I pasted above.
> > 
> Ok, that makes sense.  I think we already do some very rudimentary version of
> that (see trace_napi_poll_hit).  Here we check the device we receive frames on
> to see if its rx_dropped count has increased, and if it has, store that as a
> drop count in the NULL location.  Thats obviously insufficient, but I wonder if
> its worth looking at using the dm_hw_stat_delta to encode and record those event
> for sending with your new message type.

Will check.

Thanks, Neil.

^ permalink raw reply	[flat|nested] 48+ messages in thread

end of thread, other threads:[~2019-07-14 12:44 UTC | newest]

Thread overview: 48+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-07-07  7:58 [PATCH net-next 00/11] Add drop monitor for offloaded data paths Ido Schimmel
2019-07-07  7:58 ` [PATCH net-next 01/11] devlink: Create helper to fill port type information Ido Schimmel
2019-07-07  7:58 ` [PATCH net-next 02/11] devlink: Add packet trap infrastructure Ido Schimmel
2019-07-07  7:58 ` [PATCH net-next 03/11] devlink: Add generic packet traps and groups Ido Schimmel
2019-07-07  7:58 ` [PATCH net-next 04/11] Documentation: Add devlink-trap documentation Ido Schimmel
2019-07-07  7:58 ` [PATCH net-next 05/11] netdevsim: Add devlink-trap support Ido Schimmel
2019-07-07  7:58 ` [PATCH net-next 06/11] Documentation: Add description of netdevsim traps Ido Schimmel
2019-07-07  7:58 ` [PATCH net-next 07/11] mlxsw: core: Add API to set trap action Ido Schimmel
2019-07-07  7:58 ` [PATCH net-next 08/11] mlxsw: reg: Add new " Ido Schimmel
2019-07-07  7:58 ` [PATCH net-next 09/11] mlxsw: Add layer 2 discard trap IDs Ido Schimmel
2019-07-07  7:58 ` [PATCH net-next 10/11] mlxsw: Add trap group for layer 2 discards Ido Schimmel
2019-07-07  7:58 ` [PATCH net-next 11/11] mlxsw: spectrum: Add devlink-trap support Ido Schimmel
2019-07-07  8:01 ` [PATCH iproute2-next 0/7] " Ido Schimmel
2019-07-07  8:01   ` [PATCH iproute2-next 1/7] devlink: Increase number of supported options Ido Schimmel
2019-07-07  8:01   ` [PATCH iproute2-next 2/7] devlink: Add devlink trap set and show commands Ido Schimmel
2019-07-07  8:01   ` [PATCH iproute2-next 3/7] devlink: Add devlink trap group " Ido Schimmel
2019-07-07  8:01   ` [PATCH iproute2-next 4/7] devlink: Add devlink trap monitor support Ido Schimmel
2019-07-07  8:01   ` [PATCH iproute2-next 5/7] devlink: Set NETLINK_NO_ENOBUFS when monitoring events Ido Schimmel
2019-07-07  8:01   ` [PATCH iproute2-next 6/7] devlink: Add fflush() to print functions Ido Schimmel
2019-07-07  8:02   ` [PATCH iproute2-next 7/7] devlink: Add man page for devlink-trap Ido Schimmel
2019-07-07  8:03 ` [RFC PATCH net-next 0/5] selftests: Add devlink-trap selftests Ido Schimmel
2019-07-07  8:03   ` [RFC PATCH net-next 1/5] selftests: devlink_trap: Add test cases for devlink-trap Ido Schimmel
2019-07-07  8:03   ` [RFC PATCH net-next 2/5] Documentation: Add a section for devlink-trap testing Ido Schimmel
2019-07-07  8:03   ` [RFC PATCH net-next 3/5] selftests: forwarding: devlink_lib: Add devlink-trap helpers Ido Schimmel
2019-07-07  8:03   ` [RFC PATCH net-next 4/5] selftests: mlxsw: Add test cases for devlink-trap L2 drops Ido Schimmel
2019-07-07  8:03   ` [RFC PATCH net-next 5/5] selftests: mlxsw: Add a test case for devlink-trap Ido Schimmel
2019-07-07  8:15 ` [PATCH net-next 00/11] Add drop monitor for offloaded data paths Ido Schimmel
2019-07-07 19:45 ` David Miller
2019-07-08 13:19   ` Ido Schimmel
2019-07-08 22:51     ` Jakub Kicinski
2019-07-09 12:38       ` Ido Schimmel
2019-07-09 22:34         ` Jakub Kicinski
2019-07-10 11:20           ` Ido Schimmel
2019-07-10 11:39             ` Toke Høiland-Jørgensen
2019-07-11 12:39   ` Ido Schimmel
2019-07-11 19:02     ` David Miller
2019-07-11 23:53     ` Neil Horman
2019-07-12  3:40       ` Florian Fainelli
2019-07-12 12:05         ` Neil Horman
2019-07-12  9:27       ` Toke Høiland-Jørgensen
2019-07-12 12:18         ` Neil Horman
2019-07-12 12:33           ` Toke Høiland-Jørgensen
2019-07-13  0:40             ` Neil Horman
2019-07-13  8:07               ` Toke Høiland-Jørgensen
2019-07-12 13:52       ` Ido Schimmel
2019-07-14 11:29         ` Neil Horman
2019-07-14 12:43           ` Ido Schimmel
2019-07-14  2:38     ` David Miller

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.